Phase transitions in phylogeny
HTML articles powered by AMS MathViewer
- by Elchanan Mossel
- Trans. Amer. Math. Soc. 356 (2004), 2379-2404
- DOI: https://doi.org/10.1090/S0002-9947-03-03382-8
- Published electronically: October 28, 2003
- PDF | Request permission
Abstract:
We apply the theory of Markov random fields on trees to derive a phase transition in the number of samples needed in order to reconstruct phylogenies. We consider the Cavender-Farris-Neyman model of evolution on trees, where all the inner nodes have degree at least $3$, and the net transition on each edge is bounded by $\epsilon$. Motivated by a conjecture by M. Steel, we show that if $2 (1 - 2 \epsilon )^2 > 1$, then for balanced trees, the topology of the underlying tree, having $n$ leaves, can be reconstructed from $O(\log n)$ samples (characters) at the leaves. On the other hand, we show that if $2 (1 - 2 \epsilon )^2 < 1$, then there exist topologies which require at least $n^{\Omega (1)}$ samples for reconstruction. Our results are the first rigorous results to establish the role of phase transitions for Markov random fields on trees, as studied in probability, statistical physics and information theory, for the study of phylogenies in mathematical biology.References
- Noga Alon and Joel H. Spencer, The probabilistic method, 2nd ed., Wiley-Interscience Series in Discrete Mathematics and Optimization, Wiley-Interscience [John Wiley & Sons], New York, 2000. With an appendix on the life and work of Paul Erdős. MR 1885388, DOI 10.1002/0471722154
- A. Ambainis, R. Depser, M. Farach-Colton and S. Kannan (1999) Tight bounds on learnability of evolution, preprint.
- S. L. Bezrukov, Isoperimetric problems in discrete spaces, Extremal problems for finite sets (Visegrád, 1991) Bolyai Soc. Math. Stud., vol. 3, János Bolyai Math. Soc., Budapest, 1994, pp. 59–91. MR 1319157
- P. M. Bleher, J. Ruiz, and V. A. Zagrebnov, On the purity of the limiting Gibbs state for the Ising model on the Bethe lattice, J. Statist. Phys. 79 (1995), no. 1-2, 473–482. MR 1325591, DOI 10.1007/BF02179399
- James A. Cavender, Taxonomy with confidence, Math. Biosci. 40 (1978), no. 3-4, 271–280. MR 503936, DOI 10.1016/0025-5564(78)90089-5
- Thomas M. Cover and Joy A. Thomas, Elements of information theory, Wiley Series in Telecommunications, John Wiley & Sons, Inc., New York, 1991. A Wiley-Interscience Publication. MR 1122806, DOI 10.1002/0471200611
- Mary Cryan, Leslie Ann Goldberg, and Paul W. Goldberg, Evolutionary trees can be learned in polynomial time in the two-state general Markov model, SIAM J. Comput. 31 (2001), no. 2, 375–397. MR 1861281, DOI 10.1137/S0097539798342496
- Péter L. Erdős, Michael A. Steel, László A. Székely, and Tandy J. Warnow, A few logs suffice to build (almost) all trees. I, Random Structures Algorithms 14 (1999), no. 2, 153–184. MR 1667319, DOI 10.1002/(SICI)1098-2418(199903)14:2<153::AID-RSA3>3.3.CO;2-I
- Péter L. Erdős, Michael A. Steel, László A. Székely, and Tandy J. Warnow, A few logs suffice to build (almost) all trees. II, Theoret. Comput. Sci. 221 (1999), no. 1-2, 77–118. ICALP ’97 (Bologna). MR 1700821, DOI 10.1016/S0304-3975(99)00028-6
- William Evans, Claire Kenyon, Yuval Peres, and Leonard J. Schulman, Broadcasting on trees and the Ising model, Ann. Appl. Probab. 10 (2000), no. 2, 410–433. MR 1768240, DOI 10.1214/aoap/1019487349
- J. S. Farris (1973) A probability model for inferring evolutionary trees, Syst. Zool., 22, 250–256.
- P. Frankl and Z. Füredi, A short proof for a theorem of Harper about Hamming-spheres, Discrete Math. 34 (1981), no. 3, 311–313. MR 613409, DOI 10.1016/0012-365X(81)90009-1
- Hans-Otto Georgii, Gibbs measures and phase transitions, De Gruyter Studies in Mathematics, vol. 9, Walter de Gruyter & Co., Berlin, 1988. MR 956646, DOI 10.1515/9783110850147
- Geoffrey Grimmett, Percolation, 2nd ed., Grundlehren der mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 321, Springer-Verlag, Berlin, 1999. MR 1707339, DOI 10.1007/978-3-662-03981-6
- L. H. Harper, Optimal numberings and isoperimetric problems on graphs, J. Combinatorial Theory 1 (1966), 385–393. MR 200192
- Yasunari Higuchi, Remarks on the limiting Gibbs states on a $(d+1)$-tree, Publ. Res. Inst. Math. Sci. 13 (1977/78), no. 2, 335–348. MR 0676482, DOI 10.2977/prims/1195189812
- Dmitry Ioffe, On the extremality of the disordered state for the Ising model on the Bethe lattice, Lett. Math. Phys. 37 (1996), no. 2, 137–143. MR 1391195, DOI 10.1007/BF00416016
- S. Janson and E. Mossel (2003). Robust reconstruction on trees is determined by the second eigenvalue, submitted.
- 42nd IEEE Symposium on Foundations of Computer Science, IEEE Computer Society, Los Alamitos, CA, 2001. MR 1948690
- H. Kesten and B. P. Stigum, Additional limit theorems for indecomposable multidimensional Galton-Watson processes, Ann. Math. Statist. 37 (1966), 1463–1481. MR 200979, DOI 10.1214/aoms/1177699139
- Thomas M. Liggett, Interacting particle systems, Grundlehren der mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 276, Springer-Verlag, New York, 1985. MR 776231, DOI 10.1007/978-1-4613-8542-4
- Russell Lyons, The Ising model and percolation on trees and tree-like graphs, Comm. Math. Phys. 125 (1989), no. 2, 337–353. MR 1016874
- Elchanan Mossel, Recursive reconstruction on periodic trees, Random Structures Algorithms 13 (1998), no. 1, 81–97. MR 1634344, DOI 10.1002/(SICI)1098-2418(199808)13:1<81::AID-RSA5>3.3.CO;2-K
- Elchanan Mossel, Reconstruction on trees: beating the second eigenvalue, Ann. Appl. Probab. 11 (2001), no. 1, 285–300. MR 1825467, DOI 10.1214/aoap/998926994
- E. Mossel (2003) On the impossibility of reconstructing ancestral data and phylogenetic trees, Jour. Comput. Biol., to appear.
- E. Mossel and Y. Peres (2003) Information flow on trees, Ann. Appl. Probab., to appear.
- E. Mossel and M. A. Steel (2003) A phase transition for a random cluster model on phylogenetic trees, submitted.
- Jerzy Neyman, Molecular studies of evolution: A source of novel statistical problems, Statistical decision theory and related topics (Proc. Sympos., Purdue Univ., Lafayette, Ind., 1970) Academic Press, New York, 1971, pp. 1–27. MR 0327321
- Yuval Peres, Probability on trees: an introductory climb, Lectures on probability theory and statistics (Saint-Flour, 1997) Lecture Notes in Math., vol. 1717, Springer, Berlin, 1999, pp. 193–280. MR 1746302, DOI 10.1007/978-3-540-48115-7_{3}
- Frank Spitzer, Markov random fields on an infinite tree, Ann. Probability 3 (1975), no. 3, 387–398. MR 378152, DOI 10.1214/aop/1176996347
- M. A. Steel (1994) Recovering a tree from the leaf colourations it generates under a Markov model, App. Math. Lett., 7(2), 19–24.
- M.A. Steel, L.A. Székely and M. D. Hendy (1994). Reconstructing trees when sequence sites evolve at variable rates, Jour. Comput. Biol., 1, 153–163.
- M. A. Steel (2001), My favorite conjecture, http://www.math.canterbury.ac.nz/$\sim$mathmas/ conjecture.pdf.
Bibliographic Information
- Elchanan Mossel
- Affiliation: Department of Statistics, Evans Hall, University of California, Berkeley, California 94720-3860
- MR Author ID: 637297
- Email: mossel@stat.berkeley.edu
- Received by editor(s): May 21, 2002
- Received by editor(s) in revised form: April 10, 2003
- Published electronically: October 28, 2003
- Additional Notes: The author was supported by a Miller Fellowship. Most of the research reported here was conducted while the author was a PostDoc in theory group, Microsoft Research.
- © Copyright 2003 American Mathematical Society
- Journal: Trans. Amer. Math. Soc. 356 (2004), 2379-2404
- MSC (2000): Primary 60K35, 92D15; Secondary 60J85, 82B26
- DOI: https://doi.org/10.1090/S0002-9947-03-03382-8
- MathSciNet review: 2048522