Modular polynomials via isogeny volcanoes

We present a new algorithm to compute the classical modular polynomial Phi_n in the rings Z[X,Y] and (Z/mZ)[X,Y], for a prime n and any positive integer m. Our approach uses the graph of n-isogenies to efficiently compute Phi_n mod p for many primes p of a suitable form, and then applies the Chinese Remainder Theorem (CRT). Under the Generalized Riemann Hypothesis (GRH), we achieve an expected running time of O(n^3 (log n)^3 log log n), and compute Phi_n mod m using O(n^2 (log n)^2 + n^2 log m) space. We have used the new algorithm to compute Phi_n with n over 5000, and Phi_n mod m with n over 20000. We also consider several modular functions g for which Phi_n^g is smaller than Phi_n, allowing us to handle n over 60000.


Introduction
For a prime l, the classical modular polynomial Φ l is the minimal polynomial of the function j(lz) over the field C(j), where j(z) is the modular j-function.The polynomial Φ l parametrizes elliptic curves E together with an isogeny E → E ′ of degree l.From classical results, we know that Φ l lies in the ring Z[X, Y ] and satisfies Φ l (X, Y ) = Φ l (Y, X), with degree l + 1 in both variables [57, §69].
The fact that the moduli interpretation of Φ l remains valid modulo primes p = l was crucial to the improvements made by Atkin and Elkies to Schoof's pointcounting algorithm [21,49].More recently, the polynomials Φ l mod p have been used to compute Hilbert class polynomials [2,53], and to determine the endomorphism ring of an elliptic curve over a finite field [6].Explicitly computing Φ l is notoriously difficult, primarily due to its large size.As shown in [19], the logarithmic height of its largest coefficient is 6l log l + O(l), thus its total size is (1) O(l 3 log l).
As this bound suggests, the size of Φ l grows quite rapidly; the binary representation of Φ 79 already exceeds one megabyte, and Φ 659 is larger than a gigabyte.The polynomial Φ l can be computed by comparing coefficients in the Fourier expansions of j(z) and j(lz), an approach considered by several authors [8,21,35,37,39,43,45].As detailed in [8], this only requires integer arithmetic and may be performed modulo p for any prime p > 2l + 2. The time to compute Φ l mod p is then O(l 3+ε (log p) 1+ε ), and for a sufficiently large p this yields an O(l 4+ε ) time algorithm to compute Φ l over Z. Alternatively (and preferably), one computes Φ l modulo several smaller primes and applies the Chinese Remainder Theorem, as suggested in [8,35,43,45].
An alternative CRT-based approach appears in [16].This algorithm uses isogenies between supersingular elliptic curves defined over a finite field, and computes Φ l mod p in time O(l 4+ε (log p) 2+ε + (log p) 4+ε ), under the GRH.Although slower than using Fourier expansions, this approach relies solely on the fact that Φ l parametrizes isogenies.
In [23], Enge uses interpolation and fast floating-point evaluations to compute Φ l ∈ Z[X, Y ] in time O(l 3 (log l) 4+ε ), under reasonable heuristic assumptions.The complexity of this method is nearly optimal, quasi-linear in the size of Φ l .However, most applications actually use Φ l in a finite field F p n , and Φ l mod p may be much smaller than Φ l .In general, Enge's algorithm can compute Φ l and reduce it modulo p much faster than either of the methods above can compute Φ l mod p, but this may use an excessive amount of space.For large l this approach becomes impractical, even when Φ l mod p is reasonably small.
Here we present a new method to compute Φ l , either over the integers or modulo an arbitrary positive integer m, including m ≤ l.Our algorithm is both asymptotically and practically faster than alternative methods, and achieves essentially optimal space complexity.More precisely, we prove the following result.
Theorem 1.Let l denote an odd prime and m a positive integer.Algorithm 6.1 correctly computes Φ l ∈ (Z/mZ)[X, Y ].Under the GRH, it runs in expected time O(l 3 log 3 l log log l), using O(l 2 log lm) expected space.
To compute Φ l over Z, the modulus m is made large enough to uniquely determine the coefficients, via an explicit height bound proven in [12].Our algorithm is of the Las Vegas type, a probabilistic algorithm whose output is unconditionally correct; the GRH is only used to analyze its running time.We have used it to compute Φ l for all l < 3600, and many larger l up to 5003.The largest previous computation of which we are aware has l equal to 1009.Working modulo m we can go further; we have computed Φ l modulo a 256-bit integer m with l = 20011.
Applications that rely on Φ l can often improve their running times by using alternative modular polynomials that have smaller coefficients.Our algorithm can be adapted to compute polynomials Φ g l relating g(z) and g(lz), for modular functions g that share certain properties with j.This includes the cube root γ 2 of j, and we are then able to compute Φ l mod m more quickly by reconstructing it from Φ γ2 l mod m, capitalizing on a suggestion in [21].Other examples include simple and double eta-quotients, the Atkin functions, and the Weber f-function.The last is especially attractive, since the modular polynomials for f are approximately 1728 times smaller than those for j.This has allowed us to compute modular polynomials Φ f l with l as large as 60013.The outline of this article is as follows.In Section 2 we give a rough overview of our new algorithm.The theory behind the algorithm is presented in Sections 3-5.We present the algorithm, prove its correctness and analyze its runtime in Section 6. Section 7 deals with modular polynomials for modular functions other than j, and a final Section 8 contains computational results.

Overview
Our basic strategy is a standard CRT approach: we compute Φ l mod p for various primes p and use the Chinese Remainder Theorem to recover Φ l ∈ Z[X, Y ].Alternatively, the explicit CRT (mod m) allows us to directly compute Φ l ∈ (Z/mZ)[X, Y ], via [4,Thm. 3.1].By applying the algorithm of [53, §6], this can be accomplished in O(l 2 log lm) space, even though the total size of all the Φ l mod p is O(l 3 log l).
Our method for computing Φ l mod p is new, and applies only to certain primes p. Strategic prime selection has been used effectively in other CRT-based algorithms, such as [53], and it is especially helpful here.Working in the finite field F p , we select l + 2 distinct values j i , compute Φ l (X, j i ) ∈ F p [X] for each, and then interpolate the coefficients of Φ l ∈ (F p [Y ])[X] as polynomials in F p [Y ].The key lies in our choice of p, which allows us to select particular interpolation points that greatly facilitate the computation.We are then able to compute Φ l mod p in expected time (2) O(l 2 (log p) 3 log log p).
In contrast to the methods above, this is quasi-linear in the size of Φ l mod p.
Our algorithm exploits the structure of the l-isogeny graph G l defined on the set of j-invariants of elliptic curves over F p .Each edge in this graph corresponds to an l-isogeny; the edge (j 1 , j 2 ) is present if and only if Φ l (j 1 , j 2 ) = 0.As described in [28,40], the ordinary components of this graph have a particular structure known as an l-volcano.Depicted in Figure 1 are a set of four l-volcanoes, each with two levels: the surface (at the top), and the floor (on the bottom).Note that each vertex j i on the surface has l + 1 neighbors; these are the roots of Φ l (X, j i ) ∈ F p [X], and there are at least l + 2 such j i .This configuration contains enough information to compute the l+2 polynomials Φ l (X, j i ) that we need to interpolate Φ l (X, Y ) mod p.It is not an arrangement that is likely to arise by chance; it is achieved by our choice of the order O and the primes p that we use.To further simply our task, we choose p so that vertices on the surface correspond to curves with F p -rational l-torsion.Our ability to obtain such primes is guaranteed by Theorems 4.1 and 4.4, proven in Section 4.
The curves on the surface all have the same endomorphism ring type, isomorphic to an imaginary quadratic order O. Their j-invariants are precisely the roots of the Hilbert class polynomial H O ∈ Z[X].As described in [2], the roots of H O may be enumerated via the action of the ideal class group cl(O).To do so efficiently, we use an algorithm of [53] to compute a polycyclic presentation for cl(O) that allows us to enumerate the roots of H O via isogenies of low degree, typically much smaller than l.We may use this presentation to determine the action of any element of cl(O), including those that act via l-isogenies.This allows us to identify the lisogeny cycles that form the surfaces of the volcanoes in Figure 1.
Similarly, the vertices on the floor are the roots of H R , where R is the order of index l in O, and we use a polycyclic presentation of cl(R) to enumerate them.To identity children of a common parent (siblings), we exploit the fact that siblings lie in a cycle of l 2 -isogenies, which we identify using our presentation of cl(R).It remains only to connect each parent to one of its children.This may be achieved by using Vélu's formula [54] to compute an l-isogeny from the surface to the floor.By matching each parent to a group of siblings, we avoid the need to compute an l-isogeny to every child, which is critical to obtaining the complexity bound in (2).
Below is a simplified version of the algorithm to compute Φ l mod p. Algorithm 2.1.Given l, p, and O, compute Φ l mod p as follows: 1. Find a root of H O over F p . 2. Enumerate the roots j i of H O and identify the l-isogeny cycles.3.For each j i find an l-isogenous j on the floor.4. Enumerate the roots of H R and identify the l 2 -isogeny cycles.5.For each )[X] using the j i and the polynomials Φ l (X, j i ).
Algorithm 2.1 assumes that l, p, and O satisfy the conditions of Theorem 4.1, and that h(O) ≥ l + 2. We use the same O for each p, so H O may be precomputed.Note that we do not compute H R , we enumerate its roots by applying the Galois action of cl(R) to a root obtained in Step 3.
A more detailed version of Algorithm 2.1 appears in Section 6 together with Algorithm 6.1, which selects the order O and the primes p, and performs the CRT computations needed to determine Φ l over Z, or modulo m.

Orders in imaginary quadratic fields
It is a classical fact that the endomorphism ring of an ordinary elliptic curve over a finite field is isomorphic to an imaginary quadratic order O.The order O is necessarily contained in the maximal order O K of its fraction field K, but we quite often have O O K .As most textbooks on algebraic number theory focus on maximal orders, we first develop some useful tools for working with non-maximal orders.To simplify the presentation, we work throughout with fields of discriminant d K < −4, ensuring that we always have the unit groups O * = O * K = {±1}.We use dK p to denote the Kronecker symbol, which is −1, 0, or 1 as the prime p splits, ramifies, or remains inert in K (respectively).
Let O be a (not-necessarily maximal) order in a quadratic field K of discriminant d K < −4.Let N be a positive integer prime to the conductor u = [O K : O].The order R = Z + N O has index N in O, and its ideal class group cl(R) is an extension of cl(O).More precisely, as in [52,Thm. 6.7], there is an exact sequence where ϕ maps the class [I] to the class [IO].For R-ideals prime to uN , the underlying map as in [20,Prop. 7.20].
We have a particular interest in the kernel of the map ϕ.
Lemma 3.1.In the exact sequence above, if N = p n is a power of an unramified odd prime p, then ker ϕ is cyclic of order p n−1 p − dK p .
We now apply [17,Cor. 4.2.11] to compute the structure of (O/p n O) * : In both cases, the factor (Z/p n−1 Z) of (Z/p n Z) * is a maximal cyclic subgroup of the Sylow p-subgroup of (O/p n O) * , and must correspond to a direct summand.Thus the p-rank of the quotient (O/p n O) * /(Z/p n Z) * is 1.The order of the factor (Z/(p−1)Z) of (Z/p n Z) * is not divisible by p, and must correspond to a subgroup of a cyclic factor of (O/p n O) * in both cases.It follows that the quotient is cyclic.The calculations above also show that # Even when ker ϕ is not necessarily cyclic, the size of ker ϕ is as in Lemma 3.1.More generally, the exact sequence (3) can be used to derive the formula ( 5) as in [20,Thm. 7.24].We now describe a particular representation of ker ϕ when N = l is prime.In this case ker ϕ is cyclic, of order l − dK l ; this follows from Lemma 3.1 for l > 2, and from (5) for l = 2. Let O = Z[τ ] for some τ ∈ K that is coprime to l.There are exactly l + 1 index l subrings of O: the order R, and rings S i = lZ + (τ + i)Z, for i from 0 to l − 1.Each O-ideal of norm l corresponds to one of the S i .The remaining S i are fractional invertible R-ideals corresponding to proper R-ideals These are all non-principal and inequivalent in cl(R), and each lies in ker ϕ, since we have J i O = lO.The invertible J i are exactly the non-trivial elements of ker ϕ.We summarize with the following lemma.Lemma 3.2.If N = l is prime in the exact sequence (3), then the R-ideal lR and the invertible R-ideals J i defined in (6) are representatives for ker ϕ.In particular, ker ϕ is generated by the class of an invertible R-ideal with norm l 2 .This representation of ker ϕ has proven useful in other settings [15].We use it to obtain the l 2 -isogeny cycles we need in Step 4 of Algorithm 2.1.
We conclude this section with a theorem that allows us to construct arbitrarily large class groups that are generated by elements of bounded norm.Theorem 3.3.Let O be an order in a quadratic field of discriminant d K < −4, and let p ∤ disc(O) be an odd prime.Let P be a set of primes that do not divide p[O K : O].For n ∈ Z ≥0 , let R n denote the order Z + p n O, and let G n be the subgroup of cl(R n ) generated by the set S n of classes of R n -ideals with norms in P.
Then if , and proceed by induction on n.
For each prime q ∈ P and every n, there are exactly 1 − dK q ideals in R n of norm q, and φ n+1 maps S n+1 onto S n and G n+1 onto G n .By the inductive hypothesis, G n = cl(R n ), therefore G n+1 intersects every coset of ker φ n+1 ⊂ ker ϕ n+1 .To prove G n+1 = cl(R n+1 ), it suffices to show ker ϕ n+1 ⊂ G n+1 .
The groups ker ϕ n and ker ϕ n+1 are cyclic, by Lemma 3.1, since p is odd and unramified.Let α n be a generator for ker ϕ n .Since # ker ϕ n is divisible by p, α n cannot be a pth power in ker ϕ n .Expressing α n in terms of S n , we see that φ −1 n+1 (α n ) must intersect G n+1 .Let α n+1 lie in this intersection, and note that α n+1 ∈ ker ϕ n+1 .The order of α n+1 must be a multiple of |α n | = # ker ϕ n , and α n+1 cannot be a pth power in ker ϕ n+1 .It follows that α n+1 has order # ker ϕ n+1 , hence it generates ker ϕ n+1 , proving ker ϕ n+1 ⊂ G n+1 as desired.
To see Theorem 3.3 in action, let O be the order of discriminant D = −7, let p = 3, and let P = {2}.The class group of the order R n of discriminant 3 2n D happens to be generated by an ideal of norm 2 when n = 2, and the theorem then implies that this holds for all n.This allows us to construct arbitrarily large cyclic class groups, each generated by an ideal of norm 2.
We remark that Theorem 3.3 may be extended to handle p = 2 if the condition  The first main theorem of complex multiplication [20,Thm. 11.1] states that for any complex elliptic curve E with endomorphism ring O. Furthermore, the minimal polynomial H O of j(E) over K actually has coefficients in Z, and its degree is  The second main theorem of complex multiplication [20,Thm. 11.39] states that where x(E[N ]) denotes the set of x-coordinates of the N -torsion points of an elliptic curve E with endomorphism ring O.The Galois invariance of the Weil pairing The Frobenius endomorphism of the non-isomorphic quadratic twist Ẽ/F p corresponds to −π p , and we then have # Ẽ(F p ) = p + 1 − tr(−π p ) ≡ 2 mod l.For l = 2 this implies that Ẽ has trivial l-torsion over F p , proving (1).
To prove (2), let E ′ /F p be a curve with endomorphism ring R that is l-isogenous to E. The Frobenius endomorphism of E ′ also corresponds to π p under an isomorphism End(E ′ ) ∼ −→ R. The cardinality of E ′ (F p ) is thus equal to the cardinality of E(F p ) and therefore divisible by l 2 .However, since p does not split completely in the ring class field of index l 2 in O, we cannot have π p ≡ 1 mod lR.It follows that E ′ [l] ⊂ E ′ (F p ) and E ′ (F p ) must contain a point of order l 2 .As above, the quadratic twist of E ′ must have trivial l-torsion over F p , proving (2).
Provided the order O in Theorem 4.1 also satisfies h(O) ≥ l + 2, we can achieve the desired setting for Algorithm 2.1.We say such an order is suitable for l.
To determine the coefficients Φ l via the Chinese Remainder Theorem, we need to compute Φ l mod p for many primes p satisfying Theorem 4.1.We necessarily have p > l, since p ≡ 1 mod l, and the height bound [19] on the coefficients of Φ l implies that 6l + O(1) primes suffice.We now show these primes exist and bound their size, assuming the GRH.For this purpose we define a suitable family of orders.Definition 4.2.Let S be the set of odd primes and let T be the set of all imaginary quadratic orders.A suitable family of orders is a function F : S → T such that: (1) for all l ∈ S the order F(l) is suitable for l.
(2) there exist effective constants c 1 , c 2 ∈ R >0 such that for all l ∈ S the bounds , and for l > 3 let F(l) be the order O of discriminant −7 • 3 2n , where n is the least integer for which h Letting c 1 = 4 and c 2 = 205, we see that F is a suitable family of orders.
Theorem 4.4.Let F be a suitable family of orders and let c 0 ∈ R >0 be an arbitrary constant.Then for each prime l > 2 the set of primes p for which l, O = F(l) and p satisfy the conditions of Theorem 4.1 has positive density.
Assuming the GRH, there is an effective constant c ∈ R >0 such that at least c 0 l 3 (log l) 3 of these primes are bounded by B = cl 6 (log l) 4 , for all primes l > 7.
Proof.For a prime l > 2, let O = F(l) have fraction field K, and let u The ray class field K l,O and the ring class field K S for the order S = Z + l 2 O are both invariant under the action of complex conjugation, hence both are Galois extensions of Q.One finds that and the Chebotarëv density theorem [47,Thm. 13.4] yields the unconditional claim.
To prove the conditional claim, we apply an effective Chebotarëv bound to the extension K l,O /Q, assuming the GRH for the Dedekind zeta function of K l,O .
The extension K l,O /K is abelian of conductor dividing lu, with degree nh(O), ) , by Hasse's Führerdiskriminantenproduktformel [47, Thm.VII.11.9].We then have where π(x, K l,O /Q) counts the primes up to x ∈ R >0 that split completely in K l,O , and c 3 ∈ R >0 is an effectively computable constant, independent of l.
If we now suppose x = cl 6 (log l) 4 , and apply Li(x) ∼ x log x and nh(O) ≤ c 1 l 3 , we may choose c ∈ R >0 so that Li(x)/(2nh(O)) is greater than the RHS of ( 7) by an arbitrarily large constant factor.In particular, for any c 4 ∈ R >0 there is an effectively computable choice of c that ensures π(x, K l,O /Q) ≥ c 4 l 3 (log l) 3 , independent of l.Moreover, for the least such c we have c/c 4 → 1 as c 4 → ∞.
We now show that most of these primes do not split completely in K S .Any prime p that splits completely in K l,O must split completely in the ring class field for R = Z + lO.Putting D = disc(O), we then have with t, v ∈ Z >0 and t ≡ 2 mod l.If v ≡ 0 mod l, then p cannot split completely in K S .For p ≤ cl 6 (log l) 4 , we have v ≤ 2c 1/2 l(log l) 2 and t ≤ 2c 1/2 l 3 (log l) 2 , since D ≥ l 2 , hence there are at most 2c 1/2 (log l) 2 positive v ≡ 0 mod l, and at most 2c 1/2 l 2 (log l) 2 + 1 positive t ≡ 2 mod l, that satisfy (8).
It follows that no more than 4cl 2 (log l) 4 + 2c 1/2 (log l) 2 primes p ≤ cl 6 (log l) 4 split completely in K S .For a sufficiently large choice of c 4 , we can choose c so that π(x, K l,O /Q) ≥ c 4 l 3 (log l) 3 and at the same time ensure that for all primes l > 7, since we then have l/(log l) bounded above 4. Theorem 4.4 guarantees we can obtain a sufficient number of primes p for use with Algorithm 2.1.In fact, as is typical for such bounds, it provides far more than we need.The task of finding these primes is addressed in Section 6.1.

4.3.
Computing the CM action.The Galois action of Gal(K O /K) ∼ = cl(O) on the set Ell O (F p ) may be explicitly computed using isogenies, as described in [2].Let the prime p split completely in the ring class field K O , and let E/F p be an elliptic curve with End(E) ∼ = O.Fixing an isomorphism End(E) in which the ideal group of O acts on the set Ell O (F p ).This action factors through the class group, and the cl(O)-action is transitive and free.Equivalently, Ell O (F p ) is a torsor for cl(O); for each pair (j 1 , j 2 ) of elements in Ell O there is a unique element of cl(O) whose action sends j 1 to j 2 .Now let l 0 be an invertible O-ideal of prime norm l 0 = p.The curves E and E/E[l 0 ] are l 0 -isogenous, hence Φ l0 (j 0 , j l0 0 ) = 0, where j 0 = j(E).To compute the action of l 0 , we need to find the corresponding root of Φ l0 (X, j 0 ) ∈ F p [X].We assume that Φ l0 (X, Y ) is known, either via one of the algorithms from the introduction, or by a previous application of Algorithm 6.1.The polynomial Φ l0 (X, j 0 ) ∈ F p [X] has either 1 or 2 roots that lie in Ell O (F p ), depending on whether l 0 ramifies or splits (it is not inert).These roots correspond to the actions of l 0 and its inverse l -1 0 , which coincide when l 0 ramifies.Our fixed isomorphism End(E) ∼ −→ O maps the Frobenius endomorphism of E to an element π p ∈ O ⊂ O K with norm p.We then have the norm equation where t = tr(π p ), and v is the index of Z[π p ] in O K .When l 0 does not divide v, the order Z[π p ] is maximal at l 0 and the only roots of Φ l0 (X, j 0 ) over F p are those in Ell O (F p ). Otherwise Φ l0 (X, j 0 ) has l + 1 roots in F p , and those in Ell O (F p ) lie on the surface of the l 0 -volcano containing j, as described in [28].The roots on the surface can be readily distinguished, as in [53, §4], for example, but typically we choose p with l 0 ∤ v so that every root of Φ l0 (X, j 0 ) in F p is on the surface.When l 0 splits and does not divide v, the actions of l 0 and l -1 0 may be distinguished as described in [11, §5] and [29, §3].The kernels of the two l 0 -isogenies are subgroups of E[l 0 ].A standard component of the SEA algorithm computes a polynomial F l0 (X), whose roots are the abscissa of the points in one of these kernels [21,49].In our setting l 0 splits in Z[π p ], and provided l 0 ∤ v, the action of π p on E[l 0 ] has two distinct eigenvalues corresponding to the two kernels.Expressing the ideal l 0 in the form (l 0 , c + dπ p ) yields the eigenvalue λ = −c/d mod l 0 .We may then use F l0 (X) to test whether π p 's action is equivalent to multiplication by λ in the corresponding kernel.See [11] for an example and further details.
As a practical optimization (see Section 6.7), we avoid the need to ever make this distinction.The asymptotic complexity of computing the action of l 0 is the same in any case.Given Φ l0 ∈ F p [X, Y ], the j-invariant j l0 0 may be computed using an expected O(l 2 0 + M(l 0 ) log p) operations in F p .
Here M(n) denotes the complexity of multiplying two polynomials of degree less than n, as in [55,Def. 8.26].Naïvely, M(n) = O(n 2 ), Karatsuba's algorithm yields M(n) = O(n log 2 3 ), and FFT-based methods achieve M(n) = O(n log n log log n).
Proof.We first compute gcd(X p − X, Φ l0 (X, j 0 )), the product of the distinct linear factors of Φ l0 (X, j 0 ) over F p .Instantiating f (X) = Φ l0 (X, j 0 ) uses O(l 2 0 ) operations in F p , exponentiating X p mod f uses O(M(l 0 ) log p) operations in F p , and the fast Euclidean algorithm [55, §11.1] obtains gcd(X p −X, f ) using O(M(l 0 ) log l 0 ) = O(l 2 0 ) operations in F p .This gcd has degree at most 2, since Z[π p ] is maximal to l 0 , and we may find its roots using an expected O(log p) F p -operations [55,Cor. 14.16].
The desired root j l0 0 is then distinguished as outlined above.We first compute the eigenvalue λ − c/d mod l 0 , where l 0 = (l 0 , c + dπ p ), using O(l 2 0 ) bit operations.Applying [9, Thm.2.1], the kernel polynomial F l0 (X) can be computed using O(M(l 0 )) operations in F p .To compare (X p , Y p ) to the scalar multiple λ • (X, Y ), we compute X p , Y p , and the required division polynomials ψ n (X, Y ), modulo F l0 (X) and the curve equation for E, as in the SEA algorithm [7,Ch. VII].This uses O((log l 0 + log p)M(l 0 )) = O(l 2 0 + M(l 0 ) log p) operations in F p .

Mapping the CM torsor
The previous section made explicit the Galois action corresponding to an element of cl(O) ∼ = Gal(K O /K) represented by an ideal l 0 of prime norm l 0 .We now use this to enumerate the set Ell O (F p ), and at the same time compute a map that explicitly identifies the action of each element of cl(O).To do this efficiently it is critical to work with generators whose norms are small, since the cost of computing the action of l 0 increases quadratically with its norm.5.1.Polycyclic presentations.As a finite abelian group, each element of cl(O) can be uniquely represented using a basis.However, as noted in [53, §5.3], the norms arising in a basis may need to be much larger than those in a set of generators.Thus we are led to consider polycyclic presentations.
Let α = (α 1 , . . ., α k ) be a sequence of generators for G, and let G i = α 1 , . . ., α i denote the subgroup generated by α 1 , . . ., α i .The composition series is then polycyclic, meaning that each quotient G i+1 /G i is a cyclic group.The sequence r(α) = (r 1 , . . ., r k ) of relative orders for α is defined by Each r i necessarily divides |α i |, and for i > 1 we typically have r i < |α i |.The sequences α and r(α) allow us to uniquely represent each β ∈ G in the form (10)  The vector x in (10) is the discrete logarithm or exponent vector of β.We let and note that the map x → α x defines a bijection from X(α) to G.
We now consider the case G = cl(O), where O is an order in a quadratic field K of discriminant d K < −4.Let P = (p 1 , p 2 , p 3 , . ..) be an increasing sequence of primes with the property that cl(O) is generated by the classes of invertible ideals with norms in P. By Dirichlet's density theorem [20, Thm.9.12], any sequence containing all but a finite set of primes works, and from [20,Cor. 7.17] we know that some finite prefix of P actually suffices.For O = O K we may take P to be the sequence of primes less than |d K /3| 1/2 , by [13, Prop.9.5

.2].
There is a unique lexicographically minimal subsequence (l 1 , . . ., l k ) of P that corresponds to a polycyclic sequence α = (α 1 , . . ., α k ) for cl(O) in which α i is represented by an ideal of norm l i , and r(α) has r i > 1.When l i splits there are two possibilities for α i .To fix a choice, let α i be the ideal class represented by the unique binary quadratic form ax 2 + bxy + cy 2 of discriminant D = disc(O) with a = l i and b nonnegative [13, §3.4], corresponding to the ideal l i = (l i , (−b+ √ D)/2).We call α the polycyclic presentation of cl(O) determined by P. We use l(α) to denote the sequence of norms (l 1 , . . ., l k ), but note that this also depends on P; each l i is the least prime in P that is the norm of an ideal in α i .
We may compute α by applying [53, Alg.2.1] to an implicit sequence of generators γ = (γ 1 , γ 2 , γ 3 , . ..) corresponding to the subsequence of P for which there exists an invertible O-ideal of norm p i .The algorithm computes r i for each γ i in turn, and if we find that r i > 1, we append γ i to an initially empty vector α.We terminate when r i = h(O), a value which we assume has been precomputed.

Suitable presentations.
When P is the sequence of all primes, the norms l(α) for the polycyclic presentation α of cl(O) determined by P are as small as possible.However, when working in a finite field F p , we may wish to ensure that each norm l i does not divide v = [O K : Z[π p ]], as noted in Section 4.3.This is achieved by excluding from P primes that divide v, and we call the corresponding α the presentation of cl(O) suitable for p.This may cause us to use norms that are slightly larger than optimal.We now show that, provided we work with a family of orders that satisfies certain (easily met) constraints, the norms in every suitable presentation are quite small, assuming the GRH.Then under the GRH, for every O in F and every prime p that splits completely in K O , the presentation α of cl(O) suitable for p has norms l(α) for which where v is defined by 4p = t 2 − v 2 disc(O), the function ω(v) counts the distinct prime factors of v, and c is an effective constant that depends only on c 3 .
Proof.Let O, p, v, and α be as above.Let R be the order of index s 2 O in O K , and let γ be the presentation of R determined by the increasing sequence of primes that do not divide v.It follows from Theorem 3.3 that max l(α) ≤ max l(γ).
Let K R be the ring class field for R and let π C (x, K R /Q) count the primes bounded by x ∈ R >0 whose Frobenius symbol (under the Artin map) lies in the conjugacy class C of Gal(K R /Q).We may bound [K R : Q], # Gal(K R /Q), and disc(K R ) by constants that depend only on c 3 , independent of O.Under the GRH, the Chebotarëv bound of [41, Thm.1.1] then yields for some effective constant c 4 ∈ R >0 and all x > 2, where c 4 depends only on c 3 .For an effective constant c depending on c 3 , setting x = cω(v) log(ω(v) + 1) yields π C (x, K R /Q) > ω(v).In this case the Frobenius symbol of at least one prime not dividing v lies in C, and this applies to every C.It follows that every class in cl(R) contains an element whose norm is a prime bounded by x that does not divide v.We then have max l(α) ≤ max l(γ) ≤ x = cω(v) log(ω(v) + 1), as desired.
The family of orders in Example 4.3 satisfies the requirements of Theorem 5.1.We note that provided log p = O(log l), we have ω(v) = O(log l/ log log l), and the theorem then yields an O(log l) bound on the norms l(α).This is sharper than the more general O(log 2 l) bound implied by [1].In fact, by [32,Thm. 431], one expects ω(v) = O(log log p), which yields a bound of O(log log l log log log l).

5.3.
Realizing the CM torsor.We now consider how to explicitly map cl(O) to the torsor Ell O (F p ), so that we may then compute the action of any element (or subgroup) of cl(O) on any element of Ell O (F p ), without needing to compute any further isogenies.We use the presentation α of cl(O) suitable for p, and the table T : X(α) → cl(O) described in Section 5.1.As above, we have α = ([l 1 ], . . ., [l k ]), with norms l(α) = (l 1 , . . ., l k ) and relative orders r(α) = (r 1 , . . ., r k ).We assume the modular polynomials Φ l1 , . . ., Φ l k are known, since the l i are small.
To enumerate Ell O (F p ) we use [53,Alg. 1.3], but we augment this algorithm to also compute an explicit bijection φ : cl where the j i are distinct elements of Ell O (F p ).As explained in Section 4.3, each step in this path is computed by finding a root j i of Φ(X, j i−1 ) that lies in Ell O (F p ).When l k splits in K we have two choices for j 1 , and the correct choice may be determined using a kernel polynomial as outlined in Section 4.3.For i > 1 we use the polynomial Φ(X, j i−1 )/(X − j i−2 ), which has exactly one root When k > 1, the enumeration of Ell O (F p ) proceeds recursively.For each j i in (11) we compute a path of l k−1 isogenies containing r k−1 distinct j-invariants.Eventually, every element of Ell O (F p ) is enumerated exactly once [53,Prop. 5].For each j n ∈ Ell O (F p ) we also compute a vector x ∈ X(α) that describes the path used to reach j n from j 0 , where x i indicates the number of steps taken on an l i -isogeny path.By correctly choosing the direction of each path, we ensure that each j n is the image of j 0 under the cl(O)-action of This yields the desired bijection φ; since α x uniquely represents some β ∈ cl(O), we may set φ(α x ) = j n , a process facilitated by the map T : X(α) → cl(O).
The bijection φ allows us translate any computation in the group cl(O) to the torsor Ell O (F p ).In particular, by enumerating the cyclic subgroup H ⊆ cl(O) generated by [l], where l is an ideal of norm l, we obtain the l-isogeny cycle containing j 0 , corresponding to the surface of one of the l-volcanoes in Figure 1.Doing the same for each coset of H partitions Ell O (F p ) into l-isogeny cycles.
This may also be applied to the order R = Z + lO.After obtaining a bijection from cl(R) to Ell R (F p ), we enumerate the kernel of the map ϕ : cl(R) → cl(O) from the exact sequence of (3).Here we use one of the generators of norm l 2 guaranteed by Lemma 3.2.Enumerating the cosets of ker ϕ then partitions Ell R (F p ) into l 2isogeny cycles of siblings with a common l-isogenous parent in Ell O (F p ).

The algorithm
We now present our algorithm to compute the modular polynomial Φ l using the Chinese Remainder Theorem (CRT).Algorithm 6.1 follows the standard pattern of a CRT-based algorithm; the details lie in Algorithm 6.2, which selects a set of primes S, and in Algorithm 2.1, which computes Φ l modulo each prime p ∈ S.
The computation of Φ l ∈ Z[X, Y ] may be viewed as a special case of computing Φ l ∈ (Z/mZ)[X, Y ], where m is the product of the primes in S. The choice of S ensures that this m is large enough to uniquely determine Φ l ∈ Z[X, Y ].Algorithm 6.1.Let l be an odd prime, let m be a positive integer, and let O = F(l) lie in a suitable family of orders F. Compute Φ l ∈ (Z/mZ)[X, Y ] as follows: 1 The polynomial H O computed in Step 1 may be obtained using any of several algorithms whose running time is quasi-linear in disc(O), including [11,22,53].
6.1.Selecting primes.The primes p in the set S selected by Algorithm 6.1 must satisfy the conditions of Theorem 4.1 in order to use them in Algorithm 2.1.We require p to split completely in the ray class field K l,O , but to not split completely in the ring class field for the order Z + l 2 O. Equivalently, we need p ∤ D to satisfy (12) 4p = t 2 − v 2 l 2 D, with t ≡ 2 mod l and l ∤ v, where D = disc(O).To apply the CRT, we also require ( 13) From [12], we use the explicit bound ( 14) on the logarithmic height of Φ l to achieve this.We then have #S = O(l), by (12).Heuristically, it is easy to find primes that satisfy (12).If D ≡ 1 mod 8, fix v = 2, otherwise fix v = 1.Then, for increasing t ≡ 2 mod l with the correct parity, test whether p = (t 2 − v 2 l 2 D)/4 is prime.We expect to need O(l log l) primality tests, and each can be accomplished in time polynomial in log l, although typically p is small enough to make an attempted factorization more efficient.We could obtain slightly smaller p's by letting v vary, but it is more convenient to fix v so that we can use the same presentation of cl(O) and cl(R) for every p.This approach is easy to implement and very fast in practice.
However, in order to prove Theorem 1 we must take a more cautious approach.Even assuming the GRH, we cannot guarantee we will find any primes with a fixed value of v. On the other hand, Theorem 4.4 implies that if we construct random integers p ≤ x satisfying (12), for sufficiently large x we have p prime with probability Ω(1/ log x), and under the GRH, x = O(l 6 (log l) 4 ) is large enough.Additionally, we would like to avoid v's with many prime factors, so that we may more profitably apply Theorem 5.1.By Lemma 8.1 of the appendix, the restriction ω(v) ≤ (log(log v + 3)) 2 eliminates a negligible proportion of the integers v ∈ [1, x].
We now present Algorithm 6.2, emphasizing that its purpose is to facilitate the proof of Theorem 1.In practice we use the heuristic procedure described above.Algorithm 6.2.Let l be an odd prime, let D < −4 be a discriminant, and let B l be as in (14).Construct the set S as follows: 1. Set n ← (B l + 2 log 2)/ log(l 2 |D|/4) and then x ← 4l 2 |D|n log n.
Set b ← 0 and S ← ∅.In Step 3a, the integer t is generated as t = al + 2 using a uniformly random integer a ∈ [0, V /l − 2].The computation of ω(v) in Step 3b is performed by factoring v.Note that #S ≤ n, and p ≤ x for all p ∈ S. Lemma 6.3.Let F be a suitable family of orders, let l be an odd prime and let D = disc(F(l)).Given inputs l and D, the expected running time of Algorithm 6.2 is finite.Under the GRH we also have the following:
(2) There is a constant c < 1 such that for all l > 7 and k ∈ Z >0 , the algorithm terminates with log x ≤ (6 + k) log l with probability at least 1 − c −k log l .
Proof.To analyze Algorithm 6.2, we count the number of times Step 4 is executed, referring to the period between each execution as an iteration.By Theorem 4.4, the set of primes that satisfy Theorem 4.1, equivalently, those that satisfy (12), has positive density.Here we may use the natural density, via [38, Thm.4.3.e].For every fixed odd prime l, this implies a lower bound of Ω(1/ log x) on the probability that a random integer in [1, x] is a prime that satisfies (12).Each integer p tested by Algorithm 6.2 necessarily satisfies ( 12), hence such a p is prime with probability Ω(1/ log x).By Lemma 8.1 in the appendix, the probability that a candidate p is skipped due to the test in Step 3b is o(1/ log x).This implies that for all sufficiently large x, the probability that Algorithm 6.2 terminates in a given iteration is bounded above zero, and the expected running time is finite.Now assume the GRH and let l > 7. Applying Theorem 4.4 with c 0 = 1, there are at least l 3 (log l) 3 primes p < c 1 l 6 (log l) 4 that satisfy (12), for some constant c 1 ∈ R >0 that does not depend on l.Let x 0 be the least value of x ≥ c 1 l 6 (log l) 4 ).When x = x 0 we have V T /l ≤ 8c 1 l 3 (log l) 4 , since |D| ≥ l 2 , and the probability that a given primality test succeeds is at least 8c 1 / log l ≥ c 2 / log x, for some constant c 2 ∈ R >0 .From the inequality (7) in the proof of Theorem 4.4, one finds that this holds for all x ≥ x 0 , with the same constants.As above, Step 3b has negligible impact, and for x ≥ x 0 the probability that the algorithm terminates in a given iteration is at least c, for some constant c ∈ R >0 independent of l and x.
We now consider the running time as a function of l, fixing an arbitrary ε ∈ R >0 .It takes O(log l) iterations to achieve x = x 0 , assuming that we don't terminate earlier, and we execute Steps 3b and 3c a total of O(l(log l) 2 ) times during this process.The computation of ω(v) and the primality test of p can both be achieved in expected time subexponential in log x, by [44], yielding an O(l 1+ε ) bound on the time to reach x = x 0 , since log x 0 = O(log(l)).
For x ≥ x 0 , the probability of reaching each subsequent iteration declines exponentially, while the cost of Steps 3b and 3c grows subexponentially, implying that the total expected running time is also O(l 1+ε ), proving (1).
Claim (2) follows from the same analysis.We have log x 0 = (6 + o(1)) log l and add log 2 to log x in each iteration.Once x = x 0 , it takes more than k log l iterations to reach log x > (6 + k) log l.There is a probability of at least c that the algorithm terminates in each subsequent iteration, yielding the bound in (2).6.2.CRT computations.The computations involved in Steps 3, 4b, and 5 of Algorithm 6.1 are described in detail in [53, §6].We summarize briefly here.
Given S = {p i }, let M = p i , M i = M/p i , and [55, §10.3], we can use fast Chinese remaindering to efficiently compute (15) c ≡ c i a i M i mod M.
Provided that M > 2|c|, we can then lift the result from Z/M Z to Z.When m is "large," by which we mean log m = Ω(log M ), we compute c mod M , lift to Z, and then reduce to Z/mZ.In this scenario, Step 4b simply stores the coefficients c i and Step 3 can be deferred to Step 5.
When m is "small," by which we mean M(log m) = O(log 3 l log log l), we instead use the explicit CRT modulo m.Assuming M > 4|c|, we may apply (16) c where r is the closest integer to s = c i a i /p i , by [4,Thm. 3.1].In this scenario, we update the sum C = c i a i M i mod m and an approximation to s in Step 4b as each c i is computed.This uses O(log m + log l) space per coefficient, rather than the O(l log l) space used to compute c mod M .The postcomputation in Step 5 determines r from the approximation to s and computes c mod m via (16).
When m is neither small nor large, a hybrid approach is used, see [ 1. Compute the presentations α of cl(O) and α ′ of cl(R) suitable for p.
2. Find a root of j 0 of H O (X) over F p .
3. Use α to enumerate Ell O (F p ) from j 0 , and identify the l-isogeny cycles.4. For distinct j 0 , . . ., j l+1 ∈ Ell O (F p ): a. Construct a curve E i with j(E i ) = j i such that l divides #E i (F p ). b. Generate a random point P ∈ E i (F p ) of order l.c.Use E i and P to compute an l-isogenous curve , otherwise return to Step 3b. 5. Use α ′ to enumerate Ell R (F p ) from j ′ 0 and identify the l 2 -isogeny cycles.6.For i from 0 to l + 1: a.Let j i0 , . . ., j il consist of the neighbors of j i in its l-isogeny cycle in Ell O (F p ) together with the l 2 -isogeny cycle of Ell R (F p ) containing Steps 2, 6, and 7 involve standard computations with polynomials over finite fields, as described in [55], for example.Step 1 is addressed in Section 5.1, and Steps 3 and 5 are the topic of Section 5.3.Only Step 4 merits further discussion here.
The existence of the curve E i constructed in Step 4a is guaranteed by Theorem 4.1.The trace of Frobenius t of the desired curve is uniquely determined by the norm equation for p and the constraint t ≡ 2 mod l, as in (12).With k = j i /(1728 − j i ), the curve E/F p defined by y 2 = x 3 + 3kx + 2k has j(E) = j i , and we may determine whether it is E or its quadratic twist that has trace t by attempting to generate a point of order l on both curves in Step 4b.
To obtain a point P of order l, we generate a random point Q uniformly distributed over E i (F p ), compute the scalar multiple P = nQ, where n = (p + 1 − t)/l, and then check that P = 0.This will be true with probability 1 − 1/l 2 , since Theorem 4.1 implies that the Sylow l-subgroup of Thus we expect to succeed within 1 + O(1/l 2 ) attempts, and the expected cost of generating P is O(log p) operations in F p .
Note that E i (F p ) contains l + 1 distinct subgroups of order l, each corresponding to a distinct l-isogenous j-invariant.At most 2 of these lie in Ell O (F p ). Thus in Step 4d we have j(E ′ i ) ∈ Ell O (F p ) with probability at least 1 − 2/(l + 1) and expect to need 1 + O(1/l) random points P to obtain such an E ′ i .The curve E ′ i is the image of an l-isogeny whose kernel is generated by P , obtained via Algorithm 6.4.

6.4.
Isogenies from subgroups.Let E/F p be an elliptic curve.Given a cyclic subgroup H ⊆ E(F p ), Vélu's formulas construct an isogeny E → E ′ with H as its kernel [54].Typically H is specified by a polynomial whose roots in the algebraic closure F p are the x-coordinates of the points in H.In our setting we may specify H as the subgroup generated by a point P ∈ E(F p ), and work entirely in F p rather than an extension field.Additionally, the order l of H is odd and p > 3, allowing us to simplify the formulas.Algorithm 6.4.Let l > 2 and p > 3 be primes, let E/F p be an elliptic curve defined by y 2 = x 3 + Ax + B, and let P = (P x , P y ) be a point on E(F p ) of order l.Compute the image E ′ /F p of the l-isogeny with kernel H = P as follows: 1. Set t ← 0, w ← 0, and Q ← P 2. Repeat (l − 1)/2 times: a. Set s ← 6Q 2 x + 2A and then set u ← 4Q 2 y + sQ x .b. Set t ← t + s, w ← w + u, and The addition Q + P in Step 2b is performed using the group operation in E(F p ).The complexity of Algorithm 6.4 is O(l) operations in F p .6.5.Complexity analysis.We first bound the complexity of Algorithm 2.1, as used by Algorithm 6.1.Lemma 6.5.Let F be a suitable family of orders that satisfies the conditions of Theorem 5.1.For an odd prime l, let O = F(l) and let D = disc(O).Let p be a prime in the set S selected by Algorithm 6.2 on input l and D. Assuming the GRH, the expected running time of Algorithm 2.1 is O(l 2 (log p) 3 log log p).
Proof.We note that p = t 2 − v 2 l 2 D > l 4 , thus log l < log p, and recall that the bitcomplexity of multiplying two polynomials of degree O(l) in F p [X] may by bounded by O(M(l log p)), using Kronecker substitution, see Corollaries 8.28 and 9.8 of [55].
In the analysis below we use O(M(l log p)) = O(l(log p) 2 log log p), via the bound M(n) = O(n log n log log n) for fast multiplication [48].To bound the cost of operations in F p , we use M(n) = O(n 2 /(log n) c ), see Remark 6.6, and bound the cost of inversions by O(M(log p) log log p) = O((log p) 2 ), via [55,Cor. 11.10].
We now bound the (expected) cost of each step in Algorithm 2.1: 1. We have h(O) < h(R) = O(l 2 ).As described in Section 5.1, the cost of computing the presentations α and α ′ is O(l 2 ) operations in cl(O) and cl(R).Using binary quadratic forms, each group operation has complexity O((log l) 2 ), by [5], yielding an O(l 2 (log l) All but O(l) of the O(l 2 ) steps taken when enumerating Ell R (F p ) involve these elements, and, as in Step 3, we obtain a cost of O(l 2 (log p) 3 ) for these steps.Assuming the GRH, the remaining elements of α ′ all have norm O((log |D|) 2 ) = O((log l) 2 ), via [1], yielding a cost of O(l(log l) 4 (log p) 3 ) for these steps.Thus the expected time to enumerate Ell R (F p ) is O(l 2 (log p) 3 ), which dominates the O(l 2 log l) time to identify the l 2 -isogeny cycles.6.Using a product tree we may compute k (X −j ik ) in time O(M(l log p) log l), yielding a total cost of O(l 2 (log p) 3 log log p) for Step 6. 7. Using a product tree and fast interpolation [55,Alg. 10.11], we also obtain a cost of O(l 2 (log p) 3 log log p) for Step 7.Here we use the O(M(l log p)) bit-complexity of polynomial multiplication in F p [X] to bound the cost at each level, rather than using the bound in [55,Cor. 10.12].
The bound O(l 2 (log p) 3 log log p) applies to every step, completing the proof.
We are now ready to prove our main theorem.Proof.We first argue correctness.By Lemma 6.3, Algorithm 6.2 obtains a set of primes S that satisfy Theorem 4.1, with p∈S p > 4|c|, for every coefficient c of Φ l .We now claim that for p ∈ S, Algorithm 2.1 obtains, for each of l + 2 distinct jinvariants j i , a list of l + 1 distinct j-invariants j ik of l-isogenous curves.Granting the claim, we may invoke standard properties of Φ l to show that Algorithm correctly interpolates Φ l ∈ F p [X, Y ], see [56,Thm. 12.19] and [42, Thm.5.3], for example.
The correctness of Algorithm 6.1 then follows from the CRT and/or the explicit CRT mod m, via [4, Thm.3.1], as described in Section 6.2.As usual, let O = F(l) have fraction field K, and let R = Z + lO.The claim above rests on three facts: (1) the explicit CM-action described in Section 4.3 is correct, (2) any two l 2 -isogenous elements of Ell R (F p ) must be l-isogenous to exactly one and the same element of Ell O (F p ), and (3) each l 2 isogeny cycle in R contains exactly l − dK l elements.We note that (1) follows from the theory of complex multiplication and the properties of Φ l0 guaranteed by [56, Thm.12.19], (2) follows from the l-volcano structure, as shown by [28, §2.2] and [40,Prop. 23], and ( 3) is explicitly proven in Lemma 3.1.
We now assume the GRH and bound the complexity of Algorithm 6.1.Lemma 6.3 shows that the expected size of the largest p ∈ S is O(log l), and we have #S = O(l).Applying [53, §6] yields an O(l 2 log lm) space bound for m ∈ Z >0 .
By [53, Thm.1], the expected time to compute H O in Step 1 is O(l 2+ε ), and Lemma 6.3 gives an expected time of O(l 1+ε ) for Step 2, for any ε ∈ R >0 .Additionally, we have log p > (6 + k) log l for all p ∈ S with probability approaching 1 exponentially as k increases.The time complexity of all remaining steps in Algorithm 6.1, including calls to Algorithm 2.1, depends polynomially on log p, hence we may bound the expected running time assuming log p = O(log l).
Regardless of the exact cutoff used, if M(log m) = O((log l) 3 log log l) whenever we consider m "small", we may apply the results of [53, §6] to obtain a bound of O(l 3 (log l) 3 log log l) on the expected time for all CRT computations, for every m ∈ Z >0 .Since F satisfies the conditions of Theorem 5.1, we may apply Lemma 6.5 with log p = O(log l) to obtain an O(l 2 (log l) 3 log log l) bound on the expected time of each call to Algorithm 2.1.Applying #S = O(l) completes the proof.

Remark on M(n).
To bound the complexity of multiplication in F p , and of polynomials of low degree, we assume only that M(n) = O(n c ) for some c < 2, already achieved by Karatsuba's algorithm [55,Alg. 8.1].In the practical range of l, the elements of F p actually fit in a single 64-bit machine word and can be multiplied very quickly.We do assume FFT-based multiplication is used for multiplying polynomials of degree l in F p [X], and this assumption is met in practice; most of the computations described in Section 8 make heavy use of the FFT.6.7.Selecting a suitable order.The family of orders used in Theorem 1 suffices to prove the complexity bound, but we can simplify the implementation and improve performance with some additional constraints on the order O. Let us fix a bound b < l (say b = 256, for large l), and a small prime l 0 < l (typically l 0 = 2).As above, R is the order of index l in O, and O K is the maximal order.We seek an order O for which the following hold: ( (2) The groups cl(O) and cl(R) are either generated by a single ideal with norm l 0 , or by two ideals with norms l 0 and l 1 , where l 1 ≤ b is ramified.The first condition ensures that O is suitable for l and allows us to to obtain a root of H O (X) using only polynomials of degree at most b.This is accomplished by finding a root of H OK (X) and descending to the proper level of the l ′ -isogeny volcano for each prime l ′ ≤ b dividing the conductor of O, as in [53, §4.1].The second condition allows us to realize the torsors for cl(O) and cl(R) either by walking a single l 0 -isogeny cycle, or by walking two l 0 -isogeny cycles connected by a single l 1 -isogeny.In the latter case we orient the two cycles by computing one extra l 1 -isogeny.In both cases we avoid the need to ever distinguish the action of an ideal and its inverse, simplifying the computation described in Section 5.3.
Subject to these conditions, we also wish to minimize h(O) ≥ l + 2. To find such orders we enumerate fundamental discriminants d K < −4 with dK l1 = 1 and h(d K ) ≤ b, and for each d K we select b-smooth integers u for which h(u 2 d K ) is slightly greater than l + 2 and test whether condition (2) holds.In practice we are almost always able to obtain O with h(O) within a few percent of l + 2.

Modular functions other than j
Let g be a modular function of level N , and let l ∤ N be a prime.We define the modular polynomial Φ g l of level l for g as the minimal polynomial of the function g(lz) over the field C(g).Much of the theory for the classical modular polynomial of the j-function generalizes to g.In particular, if the Fourier expansion of g has integer coefficients then we have Φ g l ∈ Z(g)[X].The following lemma gives us further information in this case.
Lemma 7.1.Let g be a modular function and let l be a prime not dividing the level of g.Suppose that Φ g l has integer coefficients.If g is invariant under the action of either The proof follows the symmetry proof for Φ l (X, Y ), see [42,Thm. 5.3].
To apply our method we additionally require that Φ g l have degree l + 1.This is not true in general, but it does hold in many cases of interest.
The polynomial Φ g l should not be confused with the minimal polynomial of g as an element of C(j), which we denote Ψ g (X, J).The polynomial Ψ g depends only on g, not l, and we assume it is known (for our purposes, it effectively defines g).Given Ψ g , our goal is to efficiently compute Φ g l for a prime l ∤ N .We wish to adapt Algorithm 2.1 to compute Φ g l ∈ F p [X].We may then apply Algorithm 6.1 to recover Φ g l over the integers or modulo some integer m via the Chinese Remainder Theorem.To simplify matters, we place some additional restrictions on the order O that we use in Algorithm 6.1.Specifically, we require that there is a generator τ ∈ H of O = Z[τ ] with the property that where K O is the ring class field for the order O.We say that g is a class invariant for O in this case.If we now take a prime p that splits completely in K O and E/F p an elliptic curve with endomorphism ring O, then the polynomial has at least one root in F p .Indeed, the value h = g(τ ) mod p for a prime p|p of K O satisfies Ψ g (h, j(E)) = 0.
We can analyze how many roots the polynomial Ψ g (X, j(E)) ∈ F p [X] has using a combination of Deuring lifting and Shimura reciprocity.We refer to [10, §6.7] for a detailed description of the techniques involved and only state the result here.Let g i : H → C be the roots of Ψ g (X, j).The functions g i are modular of level N , and they are permuted by the Galois group GL 2 (Z/N Z) of the field of all modular functions of level N .To state our result, we will associate a matrix A ∈ GL 2 (Z/N Z) This reduces the cost of all the significant components of Algorithm 2.1 by a factor of approximately 3. The reduction in the cost of the interpolations in Step 7 is actually greater than this, since its complexity is superlinear in the degree.
The total size of Φ γ2 l is approximately 9 times smaller than Φ l , and with the optimizations above, the time to compute it is effectively reduced by the same factor.A small amount of additional time is required to compute the cube roots of the elements in Ell O (F p ) and Ell R (F p ), but even this can be avoided.
Provided we have already computed Φ γ2 l ′ for some small values of l ′ (specifically, for the primes l 0 and l 1 of Section 6.7), we may use these polynomials to directly enumerate sets Ell γ2 O (F p ) and Ell γ2 R (F p ) containing the cube roots of the elements in Ell O (F p ) and Ell R (F p ) respectively.We need only compute the cube roots of j 0 and j ′ 0 as starting points.This third optimization yields a small but useful improvement in the case of γ 2 , and plays a critical role in the examples that follow.
7.2.Recovering Φ l from Φ γ2 l .Having computed Φ γ2 l , we note that Φ l may be computed via [21,Eq. 23]: ), where ω = e 2πi/3 .For computation in Z, or modulo m, it is more convenient to express Φ γ2 l in terms of polynomials P 0 , P , where b = 2 when l ≡ 1 mod 3, and b = 0 when l ≡ 2 mod 3. We then have (22) Φ l = P 3 0 Y b + (P 3 1 − 3P 0 P 1 P 2 )XY + P 3 2 X 2 Y 2−b .Using Kronecker substitution and fast multiplication, it is possible to evaluate (22) in time O(l 3 (log l) 2+ε ), which is asymptotically faster than Algorithm 6.1.This suggests that we might more efficiently compute Φ l by recovering it from Φ γ2 l , but we do not find this to be true in practice: it actually takes longer to evaluate (22) than it does to compute Φ l directly.This can be explained by two factors.First, the F p -operations used in Algorithm 2.1 effectively have unit cost for word-size primes, making it faster than Theorem 1 would suggest for all but very large l.Secondly, the evaluation of ( 22) becomes extremely memory intensive when l is large.However, if we are computing Φ l mod m with log m ≪ l log l, then the time to apply (22) modulo m is negligible.In this situation it is quite advantageous to derive Φ l mod m from Φ γ2 l mod m, as may be seen in Table 3 of Section 8. where ζ 48 = e πi 24 and η(z) is the Dedekind eta function.This is a modular function of level 48 that satisfies γ 2 = (f 24 − 16)/f 8 , see [57, p. 179], thus we have Ψ f (X, j) = (X 24 − 16) 3 − X 24 j.
Asymptotically, we expect to be able to reduce the height bound B l by a factor of deg X Ψ f / deg j Ψ f = 72 when computing Φ f l .One can derive an explicit bound along the lines of ( 18), but this tends to overestimate the O(l) term quite significantly, so in practice for large l we use the heuristic bound which has been verified for every prime l between 2400 and 10000.The modular polynomial Φ f l is also sparse: the coefficient of X a Y b can be nonzero only when (24) la + b ≡ l + 1 mod 24, as shown in [57, p. 266].Thus Φ f l is roughly 72 • 24 = 1728 times smaller than Φ l .By applying the technique described above for γ 2 , mutatis mutandi, we can actually compute Φ f l more than 1728 times faster than Φ l for large values of l, as may be seen in see Table 2 of Section 8.When applying Algorithm 6.1, we now insist that the O have discriminant D ≡ 1 mod 8 and 3 ∤ D, since the Weber function yields class invariants in (at least) this case, see [30], for example.
Since −f will also yield class invariants for O, the polynomial Ψ f (X, j 0 ) will always have at least two roots.The following lemma tells us that we can impose a congruence condition on p to ensure that we have exactly two roots.Lemma 7.3.Let p ≡ 11 mod 12 be prime and let j 0 be the j-invariant of an elliptic curve E/F p with End(E) isomorphic to an imaginary quadratic order O with discriminant D ≡ 1 mod 8 and 3 ∤ D. Then Ψ f (X, j 0 ) ∈ F p [X] has exactly two roots in F p , and these are of the form x 0 and −x 0 .
Note that if the lemma applies to O, it also applies to the order R of index l in O.
Proof.We only have to apply theorem 7.2.The action of A on the roots of Ψ f (X, j 0 ) is computed in [10, §6.7], and this yields the lemma.Given a j-invariant j 0 ∈ F p that corresponds to j(τ 0 ) ∈ K O , we cannot readily determine which of the roots x 0 and −x 0 of Ψ f (X, j 0 ) actually corresponds to f(τ 0 ).The functions f and −f yield distinct class invariants, but they share the same modular polynomials, since Φ (24).Thus for the initial j 0 obtained in Step 2 of Algorithm 2.1, it does not matter whether we pick x 0 or −x 0 as a root of Ψ f (X, j 0 ), and we need not be concerned with making a consistent choice for each prime p. However it is critical that while computing Φ f l mod p we make a consistent choice of sign for each j-invariant we convert to an "f-invariant" (a root of Ψ f (X, j i ) mod p).This makes it impractical to enumerate j-invariants and convert them en masse.Instead, as described for γ 2 above, we use modular polynomials Φ f l ′ for small l ′ to enumerate sets Ell f O (F p ) and Ell f R (F p ) from starting points x 0 and x ′ 0 satisfying Ψ f (x 0 , j 0 ) = Ψ f (x ′ 0 , j ′ 0 ) = 0.This ensures that signs are chosen consistently within each of these sets, we only need to check that the sign choices for the two sets are consistent with each other.
To do so, we use the fact that the coefficient of X l Y l in Φ f l (X, Y ) is −1.This is shown for Φ l in [57, §69], and the same argument applies to Φ f l .We modify Algorithm 2.1 to compute the coefficient of X l Y l in Φ f l mod p in between Steps 5 and 6.We do this twice, switching the signs in Ell f O (F p ) the second time, and expect exactly one of these computations to yield −1, thereby determining a consistent choice of signs.This test should be regarded as a heuristic, since we do not rule out the possibility that both choices produce −1.However, in the course of extensive In some cases one can obtain smaller modular polynomials by considering suitable roots of the functions defined above.For example, a sixth root of f 3 also yields class invariants (this is shown for w 2  3 in [24]), and the corresponding modular polynomials are sparser and of lower height (by a factor of 6).
Our algorithm also applies to the Atkin modular functions, which we denote A N .These are (optimal) modular functions for X + 0 (N ) invariant under the Atkin-Lehner involution, see [21,45] for further details.The polynomials Ψ AN are known as Atkin modular polynomials, and are available in computer algebra systems such as Magma [14] and Sage [51].For primes N < 32, and also N in the set {41, 47, 59, 71}, we have deg j Ψ AN = 2 and can compute polynomials Φ AN l for odd primes l = N .This is done in essentially the same way as with the double eta-quotients, except that now N is prime and Ψ AN (X, j 0 ) has just two roots, rather than four.For these A N , the height bound can be reduced by a factor of approximately (N + 1)/2.
Finally, we note an alternative approach applicable to both eta-quotients and the Atkin modular functions.If we choose O so that the prime factors of N are all ramified, then there is actually a unique x 0 ∈ F p corresponding to each j 0 in Ell O (F p ) and Ell R (F p ), that is, Ψ g (X, j 0 ) has exactly one root in F p .In this scenario we can simply enumerate j-invariants as usual and then replace each j i with a corresponding x i .This is not as efficient and places stricter requirements on O, but it allows us to compute Φ g l without needing to know Φ g l ′ for any l ′ .This provides a convenient way to "bootstrap" the process.In fact all of the modular polynomials Φ g l we have considered can eventually be obtained via Algorithm 6.1, starting from the polynomials Ψ g and Φ 2 .

Computational results
We have applied our algorithm to compute polynomials Φ g l for all the modular functions discussed in Section 7 and every applicable l up to 1000.For the functions j, γ 2 , and f we have gone further, and present details of these computations here.8.1.Implementation.The algorithms described in this paper were implemented using the GNU C/C++ compiler [50] and the GMP library [31] on a 64-bit Linux platform.Multiplication of large polynomials is handled by the zn poly library developed by Harvey [33,34].
The hardware platform included four 3.0 GHz AMD Phenom II processors, each with four cores and 8GB of memory.Up to 16 cores were used in the larger tests, with essentially linear speedup.For consistency we report total CPU times, noting that in a multi-threaded implementation, disk and network I/O can be overlapped with CPU activity so that all computations are CPU bound.
As a practical optimization, we do not use the Hilbert class polynomial H O in Step 1 of Algorithm 6.1.Instead, we compute the minimal polynomial of some more favorable class invariant, as described in [27], which is then used to obtain a j-invariant.Additionally, as noted in Section 6.7, it suffices to compute a class polynomial for the maximal order containing O. With these optimizations the time spent computing class polynomials is completely negligible (well under one second).
Another important optimization is the use of polynomial gcds to accelerate rootfinding when walking paths in the isogeny graph, a technique developed in [27, §2].This greatly accelerates the enumeration of the sets Ell O (F p ) and Ell R (F p ) in Steps 3 and 5 of Algorithm  1 and 2 provide performance data for computations of Φ l and Φ f l using Algorithm 6.1.For each l we list: • The discriminant D of the suitable order O.
• The number of CRT primes n = #S used.
• The height bound B l in bits and the actual bit-size b l of the largest coefficient.
• The total size of Φ l (resp.Φ f l ) in megabytes (1MB = 10 6 bytes), computed as the sum of the coefficient sizes, with symmetric terms counted only once.
• The total CPU time, in seconds.This includes the time to select O.
• The throughput, defined as the total size divided by the total CPU time.In the last column of Table 1 one can see the quasilinear performance of Algorithm 6.1 as a function of the size of Φ l , and the constant factors appear to be advantageous relative to other algorithms.For example, computing Φ 1009 with the evaluation/interpolation algorithm of [23] uses approximately 100000 CPU seconds (scaled to our hardware platform), while Algorithm 6.1 needs less than 3000.
The first five rows of Table 2 may be compared to the corresponding rows of Table 1 to see the performance advantage gained when computing modular polynomials for the Weber f function rather than j.As expected, these polynomials are approximately 1728 times smaller, and the speedup achieved by Algorithm 6.1 is even better; we already achieve a speedup of around 1800 when l = 1009, and this increases to to over 3000 when l = 5003.This can be explained by the superlinear complexity of interpolation, as well as the superior cache utilization achieved by condensing the sparse coefficients of Φ f l , as described in Section 7.1.As noted in Section 7.3, we used a heuristic height bound for the computations in Table 2.The gap between the values of b l and B l in each case gives us high confidence in the results (the probability of this occurring by chance is negligible).Instead, Algorithm 6.1 derives Φ l mod m from the computations of Φ l mod p, for p ∈ S, using the explicit CRT.The same set S is used as when computing Φ l over Z, so the running time is largely independent of m, but using the explicit CRT (mod m) yields a noticeable speedup when log m is significantly smaller than 6l log l.For example, when l = 1009 it takes approximately 2300 seconds to compute Φ l mod m, for the m listed in Table 3, versus about 2900 seconds to compute Φ l over Z.
In addition to computing Φ l mod m directly, we may also obtain Φ l mod m by first computing Φ γ2 l mod m and then applying (22), as discussed in Section 7.1.The time to compute Φ γ2 l mod m is essentially independent of m, but the time to apply (22) is not.Even so, for the 256-bit and 1024-bit m that we used, computing Φ l mod m in this fashion is much faster than computing Φ l mod m directly; for l = 1009 we achieve times of 223 and 403 seconds, respectively.As when computing Φ f l , this speedup improves superlinearly, and for large l is noticeably greater than the expected factor of 9.
When computing Φ γ2 l mod m we used the height bound B γ2 l = 2l log l + 8l given by (18).The timings in Table 3 would be further improved if the heuristic bound B γ2 l = 2l log l + 4l were used instead.While we conjecture that this bound holds for all l > 60, caution is warranted when applying heuristic bounds to computations performed modulo m: the impact of an incorrect height bound may not be immediately apparent, as it is when computing Φ l over Z.
The computations listed in Tables 1 and 2 were practically limited by space, not time.The largest computations took only a day or two when run on 16 cores, but required nearly a terabyte of disk storage.However when computing Φ l mod m, we can handle larger values of l without using an excessive amount of space.When l = 20011, for example, the total size of Φ l is over 30 terabytes, but we are able to compute Φ l modulo a 256-bit integer m using less than 10 gigabytes.Columns Φ * l list the total time to obtain Φ l by computing Φ γ 2 l and applying (22).
For any fixed c 0 ∈ R >0 and all sufficiently large x the RHS of ( 26) is smaller than log(c 0 x/ log c x). Thus k∈I τ k (x) is o(x/(log x) c ), as desired.

figure 1 .
figure 1.A set of l-volcanoes arising from Theorem 4.1.In this example l = 7 splits into ideals of order 3 in cl(O) and we have h(O) = 12 surface curves and h(R) = 72 floor curves.
), and easily generalizes to treat families of orders lying in O that have b-smooth conductors, for any constant b.

4. Explicit CM theory 4 . 1 .
The theory of complex multiplication (CM).As in Section 3, let O be an order in a quadratic field K of discriminant d K < −4.We fix an algebraic closure of K.It follows from class field theory that there is a unique field K O with the property that the Artin map induces an isomorphismGal(K O /K) ∼ −→ cl(O)between the Galois group of K O /K and the ideal class group of O.The field K O is called the ring class field for the order O.If O is the maximal order of K, then K O is the Hilbert class field of K, the maximal totally unramified abelian extension of K.In general, primes dividing [O K : O] ramify in the ring class field.
of an element of O.The equation 4p = t 2 − v 2 D is often called the norm equation.For a positive integer N , there is a unique extension K N,O of the ring class field K O such that the Artin map induces an isomorphismGal(K N,O , K O ) ∼ −→ (O/N O) * /{±1}.The field K N,O is the ray class field of conductor N for O.When O = O K , this is simply the ray class field of conductor N , and for N = 1 we recover the ring class field K O = K 1,O .The ring class field K R of the order R = Z + N O is a subfield of the ray class field K N,O .The Galois group of K R /K O is isomorphic to (O/N O) * /(Z/N Z) * , the kernel of the map ϕ in (3).

Lemma 4 . 5 .
Let l 0 and p be distinct odd primes, and let O = Z[i], Z[ζ 3 ] be an imaginary quadratic order.Let j 0 = j(E) ∈ Ell O (F p ), fix an isomorphism End(E) ∼ −→ O, and let π p ∈ O denote the image of the Frobenius endomorphism.Let l 0 be an invertible O-ideal of norm l 0 , and assume Z[π p ] is maximal at l 0 .

Theorem 5 . 1 .
Let c 3 ∈ R >0 be a fixed constant, and let F be a suitable family of orders with the following additional property: if O is an order in F whose fraction field K has discriminant d K , and s O denotes the square-free part of [O K : O], then s O is coprime to 2d k and both s O and |d K | are bounded by c 3 .

3 .
Repeat ⌈2N log x⌉ times: a. Construct an integer p = (t 2 −v 2 l 2 D)/4 using uniformly random integers v ∈ [1, V ] and t ∈ [1, T ], subject to l ∤ v, t ≡ 2 mod l, and t ≡ vD mod 2. b.If ω(v) > (log(log v + 3)) 2 then go to Step 3d.c.If p / ∈ S and p is prime then set S ← S ∪ {p} and b ← b + log p. d.If b > B l + 2 log 2 then output S and terminate.4. Set x ← 2x and go to Step 2.
2 ) = O(l 2 (log p) 2 ) bound on Step 1. 2. Using Berlekamp's probabilistic root-finding algorithm [3, §7] with a fast GCD computation [55, Alg.11.4], the expected time to find a single root of H D ∈ F p [X] may be bounded by O(M(l)(log l + log p)) operations in F p .This implies an O(l(log p) 3 ) bound on Step 2. 3.For p ∈ S we have ω(v) ≤ (log log p) 2 , yielding an O((log log p) 2 log log log p) bound on max l(α).From Lemma 4.5, O((log log p) 4 log p) operations in F p suffice to compute the action of any element of α.We obtain an O(l(log p) 3 ) bound on the time to enumerate Ell O (F p ), which dominates the O(l log l) time to identify the l-isogeny cycles.4. From the discussion in Section 6.3 and the complexity of Algorithm 6.4, we expect to use O(l 2 + l log p) operations in F p during Step 4. This yields an O(l 2 (log p) 2 + l(log p) 3 ) bound. 5. Recall that the surjective map ϕ : cl(R) → cl(O) in (3) preserves the norms of representative ideals.The subgroup of cl(R) generated by invertible R-ideals with norms in l(α) contains ϕ −1 (cl(O)).It follows that the elements of α ′ with norm at most max l(α) generate a subgroup of size at least h(O) > l.

Theorem 1 .
Let F be the suitable family of orders in Example 4.3.Let l be an odd prime and let m ∈ Z >0 .Given inputs l, m, and O = F(l), Algorithm 6.1 correctly computes Φ l ∈ (Z/mZ)[X, Y ].Under the GRH, its expected running time is O(l 3 log 3 l log log l), using O(l 2 log lm) expected space.
The polynomial H O is known as the Hilbert class polynomial .If p is a prime that splits completely in the extension K O /Q, then H O splits into distinct linear factors in F p [X].Its roots are the j-invariants of the elliptic curves E/F p with End(E) ∼ = O, a set we denote Ell O (F p ).Let D = disc(O).The primes that split completely in K O are precisely the primes p ∤ D that are the norm (2)mes that split completely in the ray class field.We are specifically interested in primes p that split completely in the ray class field K l,O , where l is an odd prime.For such p we can achieve the desired setting for Algorithm 2.1, as depicted in Figure1.Let l > 2 be prime, and let O ⊂ Z[i], Z[ζ 3 ] be an imaginary quadratic order that is maximal at l. Let R = Z + lO be the order of index l in O. Let p be a prime that splits completely in the ray class field K l,O , but does not split completely in the ring class field for the order of index l 2 in O.The inclusions K O ⊆ K R ⊆ K l,O imply that both H O and H R split into linear factors in F p .Each j-invariant in Ell O (F p ), resp.Ell R (F p ), corresponds to two distinct isomorphism classes over F p , since these curves are ordinary andO = Z[i], Z[ζ 3].We will show that exactly one of these satisfies (1), resp.(2).Since p splits completely in the ray class field K l,O , we can factor p = π p π p ∈ O with π p ≡ 1 mod lO.Let E/F p be an elliptic curve with endomorphism ring O whose Frobenius endomorphism corresponds to π p under one of the two isomorphisms End(E) and is called a power relation, see [36, §8.1].A generic algorithm to compute r(α) and the s(α, i) can be found in [53, Alg.2.1].
. Compute the Hilbert class polynomial H O ∈ Z[X].The suitable family of orders F is as defined in Section 4.2, see Definition 4.2.

Table 1 .
6.1.As a result, most of the computation (typically over 75%) is spent interpolating polynomials in Steps 6 and 7. Computations of Φ l over Z.

Table 2 .
Table 3 gives timings for computations of Φ l modulo 256-bit and 1024-bit primes m.The values of m are arbitrary, and, in particular, they are not of a form suitable for direct computation with Algorithm 2.1.Computations of Φ f l over Z.

Table 3 .
Computations of Φ l and Φ γ2 l modulo m.