Graphical posterior predictive classification: Bayesian model averaging with particle Gibbs
Authors:
Tatjana Pavlenko and Felix L. Rios
Journal:
Theor. Probability and Math. Statist. 109 (2023), 81–99
MSC (2020):
Primary 54C40, 14E20; Secondary 46E25, 20C20
DOI:
https://doi.org/10.1090/tpms/1198
Published electronically:
October 3, 2023
MathSciNet review:
4652995
Abstract: In this study, we present a multi-class graphical Bayesian predictive classifier that incorporates model-selection uncertainty into the standard Bayesian formalism. For each class, the dependence structure underlying the observed features is represented by a set of decomposable Gaussian graphical models. Emphasis is then placed on Bayesian model averaging, which takes full account of the class-specific model uncertainty by averaging over the posterior graph probabilities. An explicit evaluation of these model probabilities is well known to be infeasible, since the number of decomposable graphs grows super-exponentially with the number of features. To address this issue, we consider the particle Gibbs strategy of J. Olsson, T. Pavlenko, and F. L. Rios [Electron. J. Statist. 13 (2019), no. 2, 2865–2897] for posterior sampling from decomposable graphical models, which uses the so-called Christmas tree algorithm of J. Olsson, T. Pavlenko, and F. L. Rios [Stat. Comput. 32 (2022), no. 5, Paper No. 80, 18] as its proposal kernel. We also derive a strong hyper Markov law, which we call the hyper normal Wishart law, that allows the resulting Bayesian calculations to be performed locally. The proposed predictive graphical classifier outperforms both the ordinary Bayesian predictive rule, which does not account for model uncertainty, and a number of out-of-the-box classifiers.
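To illustrate how a strong hyper Markov law lets the predictive computations run locally, here is a minimal Python sketch of graph-averaged predictive classification. It rests on several assumptions that are not taken from the paper: (i) each sampled decomposable graph is given as a list of cliques and a list of separators (with multiplicity), for example read off junction trees produced by a sampler such as the particle Gibbs chain, which is not implemented here; (ii) a normal-inverse-Wishart prior whose clique and separator marginals share a single parameter `delta` stands in for the hyper normal Wishart law, whose exact parametrization may differ; and (iii) graph samples are equally weighted, as draws from an MCMC chain would be. Under these assumptions the predictive density of a new observation under a graph is the product of clique-level multivariate Student-t predictive densities divided by the separator-level ones, and the model-averaged density is the Monte Carlo mean over the sampled graphs. All function names, the `hyper` dictionary, and the toy data are hypothetical.

```python
import math
import numpy as np

def mvt_logpdf(x, loc, scale, df):
    """Log density of a multivariate Student-t with given location, scale matrix and df."""
    p = loc.shape[0]
    diff = x - loc
    _, logdet = np.linalg.slogdet(scale)
    maha = float(diff @ np.linalg.solve(scale, diff))
    return (math.lgamma((df + p) / 2) - math.lgamma(df / 2)
            - 0.5 * p * math.log(df * math.pi) - 0.5 * logdet
            - 0.5 * (df + p) * math.log1p(maha / df))

def marginal_predictive_logpdf(x_new, data, idx, hyper):
    """Posterior predictive log density of x_new[idx] given data[:, idx] under an
    (assumed) normal-inverse-Wishart prior restricted to the coordinates in idx."""
    idx = list(idx)
    p, X = len(idx), data[:, idx]
    n = X.shape[0]
    m0 = hyper["mu0"][idx]
    psi0 = hyper["psi0"][np.ix_(idx, idx)]
    kappa0, delta = hyper["kappa0"], hyper["delta"]
    nu0 = delta + p - 1                  # makes the predictive df equal delta + n on every subset
    xbar = X.mean(axis=0)
    centred = X - xbar
    dev = (xbar - m0)[:, None]
    kn, nun = kappa0 + n, nu0 + n
    mun = (kappa0 * m0 + n * xbar) / kn
    psin = psi0 + centred.T @ centred + (kappa0 * n / kn) * (dev @ dev.T)
    df = nun - p + 1
    return mvt_logpdf(x_new[idx], mun, psin * (kn + 1) / (kn * df), df)

def graph_predictive_logpdf(x_new, data, graph, hyper):
    """Local computation: sum of clique terms minus sum of (non-empty) separator terms."""
    cliques, separators = graph
    total = sum(marginal_predictive_logpdf(x_new, data, c, hyper) for c in cliques)
    total -= sum(marginal_predictive_logpdf(x_new, data, s, hyper)
                 for s in separators if len(s) > 0)
    return total

def bma_predictive_logpdf(x_new, data, graph_samples, hyper):
    """Monte Carlo model averaging: log of the mean predictive density over sampled graphs."""
    logs = np.array([graph_predictive_logpdf(x_new, data, g, hyper) for g in graph_samples])
    top = logs.max()                     # max-shift for numerical stability
    return float(top + math.log(np.exp(logs - top).mean()))

def classify(x_new, class_data, class_graphs, log_priors, hyper):
    """Return the class maximising log prior + log model-averaged predictive density."""
    scores = {k: log_priors[k] + bma_predictive_logpdf(x_new, class_data[k], class_graphs[k], hyper)
              for k in class_data}
    return max(scores, key=scores.get)

# Toy usage with hypothetical data and hypothetical graph samples.
rng = np.random.default_rng(0)
d = 3
hyper = {"mu0": np.zeros(d), "psi0": np.eye(d), "kappa0": 0.01, "delta": 3.0}
class_data = {0: rng.normal(0.0, 1.0, (50, d)), 1: rng.normal(5.0, 1.0, (50, d))}
chain = ([(0, 1), (1, 2)], [(1,)])      # cliques {0,1}, {1,2}; separator {1}
empty = ([(0,), (1,), (2,)], [])        # no edges
samples = {0: [chain, empty], 1: [chain, chain]}
log_priors = {0: math.log(0.5), 1: math.log(0.5)}
print(classify(np.zeros(d), class_data, samples, log_priors, hyper))
```

Working on the log scale with a max-shift keeps the average over graphs numerically stable when the per-graph predictive densities differ by many orders of magnitude.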
References
- Christophe Andrieu, Arnaud Doucet, and Roman Holenstein, Particle Markov chain Monte Carlo methods, J. R. Stat. Soc. Ser. B Stat. Methodol. 72 (2010), no. 3, 269–342. MR 2758115, DOI 10.1111/j.1467-9868.2009.00736.x
- Jose-M. Bernardo and Adrian F. M. Smith, Bayesian theory, Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics, John Wiley & Sons, Ltd., Chichester, 1994. MR 1274699, DOI 10.1002/9780470316870
- Simon Byrne and A. Philip Dawid, Structural Markov graph laws for Bayesian model uncertainty, Ann. Statist. 43 (2015), no. 4, 1647–1681. MR 3357874, DOI 10.1214/15-AOS1319
- Nicolas Chopin and Sumeetpal S. Singh, On particle Gibbs sampling, Bernoulli 21 (2015), no. 3, 1855–1883. MR 3352064, DOI 10.3150/14-BEJ629
- Merlise Clyde and Edward I. George, Model uncertainty, Statist. Sci. 19 (2004), no. 1, 81–94. MR 2082148, DOI 10.1214/088342304000000035
- Jukka Corander, Yaqiong Cui, and Timo Koski, Inductive inference and partition exchangeability in classification, Algorithmic probability and friends, Lecture Notes in Comput. Sci., vol. 7070, Springer, Heidelberg, 2013, pp. 91–105. MR 3128216, DOI 10.1007/978-3-642-44958-1_7
- Jukka Corander, Yaqiong Cui, Timo Koski, and Jukka Sirén, Have I seen you before? Principles of Bayesian predictive classification revisited, Stat. Comput. 23 (2013), no. 1, 59–73. MR 3018350, DOI 10.1007/s11222-011-9291-7
- J. Corander, T. Koski, T. Pavlenko, and A. Tillander, Bayesian block-diagonal predictive classifier for Gaussian data, Synergies of Soft Computing and Statistics for Intelligent Data Analysis, Springer, Berlin, Heidelberg, 2013, pp. 543–551.
- Yaqiong Cui, Jukka Sirén, Timo Koski, and Jukka Corander, Simultaneous predictive Gaussian classifiers, J. Classification 33 (2016), no. 1, 73–102. MR 3503204, DOI 10.1007/s00357-016-9197-3
- A. P. Dawid and B. Q. Fang, Conjugate Bayes discrimination with infinitely many variables, J. Multivariate Anal. 41 (1992), no. 1, 27–42. MR 1156679, DOI 10.1016/0047-259X(92)90055-K
- A. P. Dawid and S. L. Lauritzen, Hyper-Markov laws in the statistical analysis of decomposable graphical models, Ann. Statist. 21 (1993), no. 3, 1272–1317. MR 1241267, DOI 10.1214/aos/1176349260
- Seymour Geisser, Posterior odds for multivariate normal classifications, J. Roy. Statist. Soc. Ser. B 26 (1964), 69–76. MR 174133, DOI 10.1111/j.2517-6161.1964.tb00540.x
- Seymour Geisser, Predictive discrimination, Multivariate Analysis (Proc. Internat. Sympos., Dayton, Ohio, 1965) Academic Press, New York-London, 1966, pp. 149–163. MR 211539
- Seymour Geisser, Predictive inference: an introduction, Monographs on Statistics and Applied Probability, vol. 55, Chapman and Hall, New York, 1993. MR 1252174, DOI 10.1007/978-1-4899-4467-2
- Peter J. Green and Alun Thomas, Sampling decomposable graphs using a Markov chain on junction trees, Biometrika 100 (2013), no. 1, 91–110. MR 3034326, DOI 10.1093/biomet/ass052
- Peter J. Green and Alun Thomas, A structural Markov property for decomposable graph laws that allows control of clique intersections, Biometrika 105 (2018), no. 1, 19–29. MR 3768862, DOI 10.1093/biomet/asx072
- Robert E. Kass and Adrian E. Raftery, Bayes factors, J. Amer. Statist. Assoc. 90 (1995), no. 430, 773–795. MR 3363402, DOI 10.1080/01621459.1995.10476572
- Steffen L. Lauritzen, Graphical models, Oxford Statistical Science Series, vol. 17, The Clarendon Press, Oxford University Press, New York, 1996. Oxford Science Publications. MR 1419991
- D. Madigan and A. E. Raftery, Model selection and accounting for model uncertainty in graphical models using Occam’s window, J. Amer. Statist. Assoc. 89 (1994), no. 428, 1535–1546.
- D. Madigan, J. York, and D. Allard, Bayesian graphical models for discrete data, Internat. Statist. Rev. 63 (1995), no. 2, 215–232.
- Henrik Nyman, Jie Xiong, Johan Pensar, and Jukka Corander, Marginal and simultaneous predictive classification using stratified graphical models, Adv. Data Anal. Classif. 10 (2016), no. 3, 305–326. MR 3541238, DOI 10.1007/s11634-015-0199-5
- Jimmy Olsson, Tatjana Pavlenko, and Felix L. Rios, Bayesian learning of weakly structural Markov graph laws using sequential Monte Carlo methods, Electron. J. Stat. 13 (2019), no. 2, 2865–2897. MR 3998930, DOI 10.1214/19-EJS1585
- Jimmy Olsson, Tatjana Pavlenko, and Felix L. Rios, Sequential sampling of junction trees for decomposable graphs, Stat. Comput. 32 (2022), no. 5, Paper No. 80, 18. MR 4487691, DOI 10.1007/s11222-022-10113-2
- Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort et al., Scikit-learn: machine learning in Python, J. Mach. Learn. Res. 12 (2011), 2825–2830. MR 2854348
- A. Reiss and D. Stricker, Creating and benchmarking a new dataset for physical activity monitoring, Proceedings of the 5th International Conference on Pervasive Technologies Related to Assistive Environments, ACM, 2012, p. 40.
- A. Reiss and D. Stricker, Introducing a new benchmarked dataset for activity monitoring, 2012 16th International Symposium on Wearable Computers (ISWC), IEEE, 2012, pp. 108–109.
- F. L. Rios, G. Moffa, and J. Kuipers, Benchpress: a scalable and versatile workflow for benchmarking structure learning algorithms for graphical models, arXiv preprint arXiv:2107.03863 (2021).
- B. D. Ripley, Pattern recognition and neural networks, Cambridge University Press, Cambridge, 2007. Reprint of the 1996 original. MR 2451352
- Alun Thomas and Peter J. Green, Enumerating the junction trees of a decomposable graph, J. Comput. Graph. Statist. 18 (2009), no. 4, 930–940. MR 2598034, DOI 10.1198/jcgs.2009.07129
- Nicholas C. Wormald, Counting labelled chordal graphs, Graphs Combin. 1 (1985), no. 2, 193–200. MR 951781, DOI 10.1007/BF02582944
Additional Information
Tatjana Pavlenko
Affiliation:
Department of Statistics, Uppsala University, Box 513, 751 20 Uppsala, Sweden
Email:
tatjana.pavlenko@statistik.uu.se
Felix L. Rios
Affiliation:
Department of Mathematics, KTH Royal Institute of Technology, SE-100 44 Stockholm, Sweden
Email:
flrios@kth.se
Keywords:
Decomposable graphical models,
strong hyper Markov law,
particle Markov chain Monte Carlo
Received by editor(s):
March 31, 2022
Accepted for publication:
February 3, 2023
Additional Notes:
The first author was supported in part by the AI4Research Grant, Uppsala University.
Article copyright:
© Copyright 2023
Taras Shevchenko National University of Kyiv