Skip to main content
Log in

Dynamical sources in information theory: A general analysis of trie structures

  • Published:
Algorithmica Aims and scope Submit manuscript

Abstract

Digital trees, also known as tries, are a general purpose flexible data structure that implements dictionaries built on sets of words. An analysis is given of three major representations of tries in the form of array-tries, list tries, and bst-tries (“ternary search tries”). The size and the search costs of the corresponding representations are analysed precisely in the average case, while a complete distributional analysis of the height of tries is given. The unifying data model used is that of dynamical sources and it encompasses classical models like those of memoryless sources with independent symbols, of finite Markov chains, and of nonuniform densities. The probabilistic behaviour of the main parameters, namely, size, path length, or height, appears to be determined by two intrinsic characteristics of the source: the entropy and the probability of letter coincidence. These characteristics are themselves related in a natural way to spectral properties of specific transfer operators of the Ruelle type.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Allen, B., and Munro, I. Self-organizing binary search trees.Journal of the ACM 25(4) (Oct. 1978), 526–535.

    Article  MATH  MathSciNet  Google Scholar 

  2. Bedford, T., Keane, M., and Series, C.Ergodic Theory, Symbolic Dynamics and Hyperbolic Spaces. Oxford University Press, Oxford, 1991.

    MATH  Google Scholar 

  3. Bentley, J., and Sedgewick, R. Fast algorithms for sorting and searching strings. InProceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms (New Orleans, January 1997), pp. 360–369. SIAM Press, Philadelphia, PA, 1997.

    Google Scholar 

  4. Bogomolny, E.B., and Carioli, M. Quantum maps from transfer operators.Physica D 67 (1993), 88–112.

    Article  MATH  MathSciNet  Google Scholar 

  5. Burge, W. H. An analysis of binary search trees formed from sequences of nondistinct keys.Journal of the ACM 23(3) (July 1976), 451–454.

    Article  MATH  MathSciNet  Google Scholar 

  6. Clampett, H. A. Randomized binary searching with tree structures.Communications of the ACM 7(3) (March 1964), 163–165.

    Article  MATH  Google Scholar 

  7. Clément, J., Flajolet, P., and Vallée, B. The analysis of hybrid trie structures. InProceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms (Philadelphia, 1998), pp. 531–539. SIAM Press, Philadelphia, PA, 1988.

    Google Scholar 

  8. Clément, J., and Zimmermann, P. An efficient spell-checker based on ternary search tries (provisional title). Research report, Institut National de Recherche en Informatique et en Automatique, 2000. In preparation.

  9. Daudé, H., Flajolet, P., and Vallée, B. An average-case analysis of the Gaussian algorithm for lattice reduction.Combinatorics, Probability and Computing 6 (1997), 397–433.

    Article  MATH  MathSciNet  Google Scholar 

  10. De Bruijn, N. G.Asymptotic Methods in Analysis. Dover, 1981. first edition, A reprint of the third North-Holland edition, Amsterdam, 1970 (1958).

    Google Scholar 

  11. Devroye, L. A probabilistic analysis of the height of tries and of the complexity of triesort.Acta Informatica 21 (1984), 229–237.

    Article  MATH  MathSciNet  Google Scholar 

  12. Devroye, L. A study of trie-like structures under the density model.The Annals of Applied Probability 2(2) (1992), 402–434.

    Article  MATH  MathSciNet  Google Scholar 

  13. Efrat, I. Dynamics of the continued fraction map and the spectral theory ofSL(2,Z).Inventiones Mathematicae 114 (1993), 207–218.

    Article  MATH  MathSciNet  Google Scholar 

  14. Fagin, R., Nievergelt, J., Pippenger, N., and Strong, R. Extendible hashing: a fast access method for dynamic files.ACM Transactions on Database Systems 4 (1979), 315–344.

    Article  Google Scholar 

  15. Faivre, C. Distribution of Lévy’s constants for quadratic numbers.Acta Arithmetica 61(1) (1992), 13–34.

    MATH  MathSciNet  Google Scholar 

  16. Fayolle, G., Flajolet, P., and Hofri, M. On a functional equation arising in the analysis of a protocol for a multi-accessbroadcast channel.Advances in Applied Probability 18 (1986), 441–472.

    Article  MATH  MathSciNet  Google Scholar 

  17. Flajolet, P. On the performance evaluation of extendible hashing and trie searching.Acta Informatica 20 (1983), 345–369.

    Article  MATH  MathSciNet  Google Scholar 

  18. Flajolet, P., Gardy, D., and Thimonier, L. Birthday paradox, coupon collectors, caching algorithms, and self-organizing search.Discrete Applied Mathematics 39 (1992), 207–229.

    Article  MATH  MathSciNet  Google Scholar 

  19. Flajolet, P., Gourdon, X., and Dumas, P. Mellin transforms and asymptotics: harmonic sums.Theoretical Computer Science 144(1–2) (June 1995), 3–58.

    Article  MATH  MathSciNet  Google Scholar 

  20. Flajolet, P., and Puech, C. Partial match retrieval of multidimensional data.Journal of the ACM 33(2) (1986), 371–407.

    Article  MathSciNet  Google Scholar 

  21. Flajolet, P., and Steyaert, J.-M. A branching process arising in dynamic hashing, trie searching and polynomial factorization. InAutomata, Languages and Programming, M. Nielsen and E. M. Schmidt, eds., vol. 140 of Lecture Notes in Computer Science, pp. 239–251. Springer-Verlag, Berlin, 1982. Proceedings of 9th ICALP Colloquium, Aarhus, Denmark, July 1982.

    Chapter  Google Scholar 

  22. Flajolet, P., and Vallée, B. Continued fraction algorithms, functional operators and structure constants.Theoretical Computer Science 194 (1998), 1–34.

    Article  MATH  MathSciNet  Google Scholar 

  23. Gonnet, G. H., and Baeza-Yates, R.Handbook of Algorithms and Data Structures: in Pascal and C, second edn. Addison-Wesley, Reading, MA, 1991.

    Google Scholar 

  24. Grothendieck, A.Produits tensoriels topologiques et espaces nucléaires. No. 16 in Memoirs of the American Mathematical Society. A.M.S., Providence, RI, 1955.

    Google Scholar 

  25. Grothendieck, A. La théorie de Fredholm.Bulletin de la Société Mathématique de France 84 (1956), 319–384.

    MATH  MathSciNet  Google Scholar 

  26. Jacquet, P., and Régnier, M. Trie partitioning process: limiting distributions. InCAAP ’86, P. Franchi-Zanetacchi, ed., vol. 214 of Lecture Notes in Computer Science, pp. 196–210. Springer-Verlag, Berlin, 1986. Proceedings of the 11th Colloquium on Trees in Algebra and Programming, Nice France, March 1986.

    Google Scholar 

  27. Jacquet, P., and Szpankowski, W. Analysis of digital tries with Markovian dependency.IEEE Transactions on Information Theory 37(5) (1991), 1470–1475.

    Article  Google Scholar 

  28. Jacquet, P., and Szpankowski, W. Autocorrelation on words and its applications: analysis of suffix trees by string-ruler approach.Journal of Combinatorial Theory. Series A 66(2) (1994), 237–269.

    Article  MATH  MathSciNet  Google Scholar 

  29. Jacquet, P., and Szpankowski, W. Asymptotic behavior of the Lempel-Ziv parsing scheme and digital search trees.Theoretical Computer Science 144(1–2) (1995), 161–197.

    Article  MATH  MathSciNet  Google Scholar 

  30. Jacquet, P., and Szpankowski, W. Analytical de-Poissonization and its applications.Theoretical Computer Science 201(1–2) (1998), 1–62.

    Article  MATH  MathSciNet  Google Scholar 

  31. Janson, S. Private communication.

  32. Kato, T.Perturbation Theory for Linear Operators. Springer-Verlag, New York, 1980.

    MATH  Google Scholar 

  33. Kirschenhofer, P., and Prodinger, H. Eine Anwendung der Theorie der Modulfunktionen in der Informatik.Sitzungsberichte der Österreichischen Akademie der Wissenschaften 197 (1988), 339–366.

    MATH  MathSciNet  Google Scholar 

  34. Kirschenhofer, P., and Prodinger, H. On some applications of formulæ of Ramanujan in the analysis of algorithms.Mathematika 38 (1991), 14–33.

    Article  MATH  MathSciNet  Google Scholar 

  35. Kirschenhofer, P., Prodinger, H., and Szpankowski, W. Analysis of a splitting process arising in probabilistic counting and other related algorithms.Random Structures & Algorithms 9 (1996), 379–402.

    Article  MATH  MathSciNet  Google Scholar 

  36. Knuth, D. E.The Art of Computer Programming, third edn., vol. 3: Sorting and Searching. Addison-Wesley, Reading, MA, 1998.

    Google Scholar 

  37. Krasnoselskii, M.Positive Solutions of Operator Equations. P. Noordhoff, Groningen, 1964.

    Google Scholar 

  38. Larson, P. A. Dynamic hashing.BIT 18 (1978), 184–201.

    Article  MATH  MathSciNet  Google Scholar 

  39. Lind, D., and Marcus, B.An Introduction to Symbolic Dynamics and Coding. Cambridge University Press, Cambridge, 1995.

    MATH  Google Scholar 

  40. Lothaire, M.Combinatorics on Words, vol. 17 of Encyclopedia of Mathematics and its Applications. Addison-Wesley, Reading, MA, 1983.

    MATH  Google Scholar 

  41. Mahmoud, H.Evolution of Random Search Trees. Wiley, New York, 1992.

    MATH  Google Scholar 

  42. Mayer, D. H. Private communication.

  43. Mayer, D. H. Spectral properties of certain composition operators arising in statistical mechanics.Communications in Mathematical Physics 68 (1979), 1–8.

    Article  MATH  MathSciNet  Google Scholar 

  44. Mayer, D. H. On composition operators on Banach spaces of holomorphic functions.Journal of Functional Analysis 35 (1980), 191–206.

    Article  MATH  MathSciNet  Google Scholar 

  45. Pittel, B. Paths in a random digital tree: limiting distributions.Advances in Applied Probability 18(1) (1986), 139–155.

    Article  MATH  MathSciNet  Google Scholar 

  46. Pollicott, M. A complex Ruelle-Perron-Frobenius theorem and two counterexamples.Ergodic Theory and Dynamical Systems 4 (1984), 135–146.

    Article  MATH  MathSciNet  Google Scholar 

  47. Prodinger, H. Combinatorics of geometrically distributed random variables: left-to-right maxima.Discrete Mathematics 153 (1996), 253–270.

    Article  MATH  MathSciNet  Google Scholar 

  48. Régnier, M. On the average height of trees in in digital search and dynamic hashing.Information Processing Letters 13 (1982), 64–66.

    Article  Google Scholar 

  49. Rivest, R. L. Partial match retrieval algorithms.SIAM Journal on Computing 5 (1976), 19–50.

    Article  MATH  MathSciNet  Google Scholar 

  50. Ruelle, D.Thermodynamic Formalism. Addison-Wesley, Reading, MA, 1978.

    MATH  Google Scholar 

  51. Ruelle, D.Dynamical Zeta Functions for Piecewise Monotone Maps of the Interval, vol. 4 of CRM Monograph Series. American Mathematical Society, Providence, RI, 1994.

    MATH  Google Scholar 

  52. Schwartz, H. Composition operators in ℋp. Ph.D. thesis, University of Toledo, 1969.

  53. Sedgewick, R.Algorithms in C: Fundamentals, Data Structures, Sorting, Searching, third edn. Addison-Wesley, Reading, MA, 1988.

    Google Scholar 

  54. Shapiro, J. Compact composition operators on spaces of boundary regular holomorphic functions.Proceedings of the AMS 100 (1997), 49–57.

    Article  Google Scholar 

  55. Shapiro, J., and Taylor, P. Compact, nuclear, and Hilbert-Schmidt composition operators on ℋ2.Indiana University Mathematics Journal 23 (1973), 471–496.

    Article  MATH  MathSciNet  Google Scholar 

  56. Szpankowski, W. Some results onV-ary asymmetric tries.Journal of Algorithms 9 (1988), 224–244.

    Article  MATH  MathSciNet  Google Scholar 

  57. Szpankowski, W. On the height of digital trees and related problems.Algorithmica 6(2) (1991), 256–277.

    Article  MATH  MathSciNet  Google Scholar 

  58. Titchmarsh, E. C., and Heath-Brown, D. R.The Theory of the Riemann Zeta-Function, second edn. Oxford Science Publications, Oxford, 1986.

    MATH  Google Scholar 

  59. Trabb Pardo, L. Set representation and set intersection. Technical report, Stanford University, 1978.

  60. Vallée, B. Algorithms for computing signs of 2×2 determinants: dynamics and average-case analysis. InAlgorithms—ESA ’97, G. W. R. Burkhard, ed., no. 1136 in Lecture Notes in Computer Science, pp. 486–499. Springer-Verlag, Berlin, 1997. Proceedings of the Fifth European Symposium on Algorithms, Graz, September 1997.

    Google Scholar 

  61. Vallée, B. Opérateurs de Ruelle-Mayer généralisés et analyse des algorithmes d’Euclide et de Gauss.Acta Arithmetica 81(2) (1997), 101–144.

    MATH  MathSciNet  Google Scholar 

  62. Vallée, B. Dynamics of the binary euclidean algorithm: functional analysis and operators.Algorithmica 22(4) (1998), 660–685.

    Article  MATH  MathSciNet  Google Scholar 

  63. Vallée, B. Dynamical sources in information theory: fundamental intervals and words prefixes.Algorithmica, this issue.

  64. Vallée, B., and Lemée, C. Average-case analyses of three algorithms for computing the jacobi symbol. Rapport de Recherche de l’Université de Caen, Les Cahiers du GREYC # 1, 1999.

  65. Yao, A. C.-C. A note on the analysis of extendible hashing.Information Processing Letters 11 (1980), 84–86.

    Article  MATH  MathSciNet  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Additional information

Communicated by H. Prodinger and W. Szpankowski.

This work was supported in part by the Long Term Research Project ALCOM-IT (# 20244) of the European Union.

Online publication October 6, 2000.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Clément, J., Flajolet, P. & Vallée, B. Dynamical sources in information theory: A general analysis of trie structures. Algorithmica 29, 307–369 (2001). https://doi.org/10.1007/BF02679623

Download citation

  • Received:

  • Revised:

  • Issue Date:

  • DOI: https://doi.org/10.1007/BF02679623

Key Words

Navigation