Remote Access Theory of Probability and Mathematical Statistics

Theory of Probability and Mathematical Statistics

ISSN 1547-7363(online) ISSN 0094-9000(print)

 
 

 

PageRank for networks, graphs, and Markov chains


Authors: C. Engström and S. Silvestrov
Original publication: Teoriya Imovirnostei ta Matematichna Statistika, tom 96 (2017).
Journal: Theor. Probability and Math. Statist. 96 (2018), 59-82
MSC (2010): Primary 60K20; Secondary 05C82, 90B15, 90B18
DOI: https://doi.org/10.1090/tpms/1034
Published electronically: October 5, 2018
Full-text PDF

Abstract | References | Similar Articles | Additional Information

Abstract:

In this work it is described how a partitioning of a graph into components can be used to calculate PageRank in a large network and how such a partitioning can be used to re-calculate PageRank as the network changes. Although calculating PageRank is considered a problem, it is worth noing that the same partitioning method could be used when working with Markov chains in general or solving linear systems as long as the method used for solving a single component is chosen appropriately. An algorithm for calculating PageRank using a modified partitioning of the graph into strongly connected components is described.

Moreover, the paper also focuses on the calculation of PageRank in a changing graph from two different perspectives, by considering specific types of changes in the graph and calculating the difference in rank before and after certain types of edge additions or removals between components. Moreover, some common specific types of graphs for which it is possible to find analytic expressions for PageRank are considered, and in particular the complete bipartite graph and how PageRank can be calculated for such a graph. Finally, several open directions and problems are described.


References [Enhancements On Off] (What's this?)

  • [1] J. Leskovec, Google web graph. Google programming contest, http://snap.stanford.edu/data/web-Google.html.
  • [2] Fredrik K. Andersson and Sergei D. Silvestrov, The mathematics of internet search engines, Acta Appl. Math. 104 (2008), no. 2, 211–242. MR 2443279, https://doi.org/10.1007/s10440-008-9254-y
  • [3] Konstantin E. Avrachenkov, Jerzy A. Filar, and Phil G. Howlett, Analytic perturbation theory and its applications, Society for Industrial and Applied Mathematics, Philadelphia, PA, 2013. MR 3137641
  • [4] A. Arasu, J. Novak, A. Tomkins, and J. Tomlin, Pagerank computation and the structure of the web: Experiments and algorithms, Proceedings of the Eleventh International Conference on World Wide Web, Alternate Poster Tracks, 2002.
  • [5] L. Becchetti and C. Castillo, The distribution of pagerank follows a power-law only for particular values of the damping factor, Proceedings of the 15th international conference on World Wide Web, ACM Press, 2006.
  • [6] Richard Bellman, Introduction to matrix analysis, Classics in Applied Mathematics, vol. 12, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1995. Reprint of the 1960 original. MR 1312360
  • [7] Abraham Berman and Robert J. Plemmons, Nonnegative matrices in the mathematical sciences, Classics in Applied Mathematics, vol. 9, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1994. Revised reprint of the 1979 original. MR 1298430
  • [8] S. Brin and L. Page, The anatomy of a large-scale hypertextual web search engine, Proceedings of the Seventh International Conference on World Wide Web, Amsterdam, The Netherlands, 1998, Elsevier Science Publishers B.V., pp. 107-117.
  • [9] D. Dhyani, S. S. Bhowmick, and W.-K. Ng, Deriving and verifying statistical distribution of a hyperlink-based web page quality metric, Data Knowl. Eng. 46 (2003), no. 3, 291-315.
  • [10] C. Engström and S. Silvestrov, A componentwise pagerank algorithm, Applied Stochastic Models and Data Analysis (ASMDA 2015), The 16th Conference of the ASMDA International Society, 2015, pp. 186-199.
  • [11] C. Engström and S. Silvestrov, Non-normalized pagerank and random walks on n-partite graphs, 3rd Stochastic Modeling Techniques and Data Analysis International Conference (SMTDA2014), 2015, pp. 193-202.
  • [12] C. Engström and S. Silvestrov, Using graph partitioning to calculate pagerank in a changing network, 4th Stochastic Modeling Techniques and Data Analysis International Conference (SMTDA2016), 2016, pp. 155-164.
  • [13] C. Engström and S. Silvestrov, Calculating pagerank in a changing network with added or removed edges, International Conference in Nonlinear Problems in Aviation and Aerospace (ICNPAA), AIP Conference Proceedings, vol. 1798, 2017.
  • [14] Christopher Engström and Sergei Silvestrov, Generalisation of the damping factor in PageRank for weighted networks, Modern problems in insurance mathematics, EAA Ser., Springer, Cham, 2014, pp. 313–333. MR 3330692
  • [15] Christopher Engström and Sergei Silvestrov, PageRank, connecting a line of nodes with a complete graph, Engineering mathematics. II, Springer Proc. Math. Stat., vol. 179, Springer, Cham, 2016, pp. 249–274. MR 3630582
  • [16] Christopher Engström and Sergei Silvestrov, PageRank, connecting a line of nodes with a complete graph, Engineering mathematics. II, Springer Proc. Math. Stat., vol. 179, Springer, Cham, 2016, pp. 249–274. MR 3630582
  • [17] F. R. Gantmacher, The theory of matrices. Vols. 1, 2, Translated by K. A. Hirsch, Chelsea Publishing Co., New York, 1959. MR 0107649
  • [18] Mats Gyllenberg and Dmitrii S. Silvestrov, Quasi-stationary phenomena in nonlinearly perturbed stochastic systems, De Gruyter Expositions in Mathematics, vol. 44, Walter de Gruyter GmbH & Co. KG, Berlin, 2008. MR 2456816
  • [19] T. Haveliwala and S. Kamvar, The second eigenvalue of the google matrix, Technical Report 2003-20, Stanford InfoLab, 2003.
  • [20] H. Ishii, R. Tempo, E.-W. Bai, and F. Dabbene, Distributed randomized pagerank computation based on web aggregation, Decision and Control, 2009, Held jointly with the 2009 28th Chinese Control Conference, CDC/CCC 2009, Proceedings of the 48th IEEE Conference, 3026-3031.
  • [21] S. Kamvar and T. Haveliwala, The condition number of the pagerank problem, Technical Report 2003-36, Stanford InfoLab, June 2003.
  • [22] Sepandar Kamvar, Taher Haveliwala, and Gene Golub, Adaptive methods for the computation of PageRank, Linear Algebra Appl. 386 (2004), 51–65. MR 2066607, https://doi.org/10.1016/j.laa.2003.12.008
  • [23] Vladimir S. Koroliuk and Nikolaos Limnios, Stochastic systems in merging phase space, World Scientific Publishing Co. Pte. Ltd., Hackensack, NJ, 2005. MR 2205562
  • [24] Peter Lancaster, Theory of matrices, Academic Press, New York-London, 1969. MR 0245579
  • [25] Amy N. Langville and Carl D. Meyer, A reordering for the PageRank problem, SIAM J. Sci. Comput. 27 (2006), no. 6, 2112–2120. MR 2211442, https://doi.org/10.1137/040607551
  • [26] Chris P. Lee, Gene H. Golub, and Stefanos A. Zenios, A two-stage algorithm for computing PageRank and multistage generalizations, Internet Math. 4 (2007), no. 4, 299–327. MR 2522947
  • [27] J. Leskovec and A. Krevl, SNAP Datasets: Stanford large network dataset collection, http://snap.stanford.edu/data, June 2014.
  • [28] Dmitrii Silvestrov and Sergei Silvestrov, Asymptotic expansions for stationary distributions of perturbed semi-Markov processes, Engineering mathematics. II, Springer Proc. Math. Stat., vol. 179, Springer, Cham, 2016, pp. 151–222. MR 3630580
  • [29] D. Silvestrov and S. Silvestrov, Asymptotic expansions for stationary distributions of nonlinearly perturbed semi-Markov processes. I, II (2016). Part I: arXiv:1603.04734, Part II: arXiv:1603.04743.
  • [30] Dmitrii Silvestrov and Sergei Silvestrov, Nonlinearly perturbed semi-Markov processes, SpringerBriefs in Probability and Mathematical Statistics, Springer, Cham, 2017. MR 3700359
  • [31] Robert Tarjan, Depth-first search and linear graph algorithms, SIAM J. Comput. 1 (1972), no. 2, 146–160. MR 0304178, https://doi.org/10.1137/0201010
  • [32] Q. Yu, Z. Miao, G. Wu, and Y. Wei, Lumping algorithms for computing google's pagerank and its derivative, with attention to unreferenced nodes, Information Retrieval 15 (2012), no. 6, 503-526.

Similar Articles

Retrieve articles in Theory of Probability and Mathematical Statistics with MSC (2010): 60K20, 05C82, 90B15, 90B18

Retrieve articles in all journals with MSC (2010): 60K20, 05C82, 90B15, 90B18


Additional Information

C. Engström
Affiliation: Division of Applied Mathematics, The School of Education, Culture and Communication, Mälardalen University, Box 883, 721 23 Västerås, Sweden
Email: christopher.engstrom@mdh.se

S. Silvestrov
Affiliation: Division of Applied Mathematics, The School of Education, Culture and Communication, Mälardalen University, Box 883, 721 23 Västerås, Sweden
Email: sergei.silvestrov@mdh.se

DOI: https://doi.org/10.1090/tpms/1034
Keywords: PageRank, random walk, Markov chain, graph, strongly connected component
Received by editor(s): March 13, 2017
Published electronically: October 5, 2018
Dedicated: Dedicated to Professor Dmitrii Silvestrov on his 70th birthday
Article copyright: © Copyright 2018 American Mathematical Society