Test for mean matrix in GMANOVA model under heteroscedasticity and non-normality for high-dimensional data
Authors:
Takayuki Yamada, Tetsuto Himeno, Annika Tillander and Tatjana Pavlenko
Journal:
Theor. Probability and Math. Statist. 109 (2023), 129-158
MSC (2020):
Primary 62H15, 62E20; Secondary 62H10
DOI:
https://doi.org/10.1090/tpms/1200
Published electronically:
October 3, 2023
MathSciNet review:
4652997
Abstract: This paper develops a unified testing methodology for high-dimensional generalized multivariate analysis of variance (GMANOVA) models. We derive a test of the bilateral linear hypothesis on the mean matrix in a general scenario where the dimension of the observed vector may exceed the sample size, the design may be unbalanced, the population distribution may be non-normal, and the underlying group covariance matrices may be unequal. The suggested methodology is suitable for many inferential problems, such as the one-way MANOVA test and the test of a multivariate linear hypothesis on the mean in the polynomial growth curve model. As a key component of our test procedure, we propose a bias-corrected estimator of the Frobenius norm of the mean matrix. We derive the null and non-null asymptotic distributions of the test statistic under a general high-dimensional asymptotic framework that allows the dimensionality to arbitrarily exceed the sample size of a group. The finite-sample accuracy of the proposed test is investigated through simulations conducted for several high-dimensional scenarios and various underlying population distributions, in combination with different within-group covariance structures. As a practical demonstration, we consider a daily Canadian temperature dataset that exhibits group structure, and conclude that the interaction of latitude and longitude has no effect in predicting the temperature.
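To illustrate why a bias correction is needed when estimating the norm of a mean in high dimensions, the following is a minimal sketch for the much simpler one-sample, mean-vector case. It is not the estimator proposed in the paper, whose exact form is not given in the abstract. It uses the standard fact that, for i.i.d. observations with mean mu and covariance Sigma, the plug-in quantity ||x_bar||^2 has expectation ||mu||^2 + tr(Sigma)/n, so subtracting tr(S)/n (with S the sample covariance) removes the bias. All function names and the simulated data are assumptions made for illustration.

```python
import numpy as np


def naive_sq_norm(x):
    """Plug-in estimate ||x_bar||^2 of ||mu||^2 (biased upward by tr(Sigma)/n)."""
    x_bar = x.mean(axis=0)
    return float(x_bar @ x_bar)


def bias_corrected_sq_norm(x):
    """Unbiased estimate of ||mu||^2, computed as ||x_bar||^2 - tr(S)/n."""
    n = x.shape[0]
    x_bar = x.mean(axis=0)
    centered = x - x_bar
    # Trace of the sample covariance matrix, without forming the p x p matrix.
    trace_s = float((centered ** 2).sum()) / (n - 1)
    return float(x_bar @ x_bar) - trace_s / n


def u_statistic_sq_norm(x):
    """Algebraically equivalent form: mean of x_i' x_j over ordered pairs i != j."""
    n = x.shape[0]
    total = x.sum(axis=0)
    # Sum over i != j of x_i' x_j = ||sum_i x_i||^2 - sum_i ||x_i||^2.
    cross = float(total @ total) - float((x ** 2).sum())
    return cross / (n * (n - 1))


if __name__ == "__main__":
    # Hypothetical setting: n = 10 observations in p = 500 dimensions, true mean zero.
    rng = np.random.default_rng(0)
    x = rng.standard_normal((10, 500))
    print("naive:         ", naive_sq_norm(x))
    print("bias-corrected:", bias_corrected_sq_norm(x))
    print("U-statistic:   ", u_statistic_sq_norm(x))
```

With a true mean of zero, the naive estimate concentrates around p/n rather than zero, while the corrected estimate fluctuates around zero. This is the kind of effect that makes bias correction essential when the dimension exceeds the sample size.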
Additional Information
Takayuki Yamada
Affiliation:
Faculty of Data Science, Kyoto Women’s University, 35 Kitahiyoshi-cho, Imakumano, Higashiyama-ku, Kyoto 605-8501, Japan
Email:
yamadatak@kyoto-wu.ac.jp
Tetsuto Himeno
Affiliation:
Faculty of Data Science, Shiga University, 1-1-1 Banba, Hikone, Shiga 522-8522, Japan
Annika Tillander
Affiliation:
Department of Computer and Information Science, Linköping University, 581 83 Linköping, Sweden
Tatjana Pavlenko
Affiliation:
Department of Statistics, Uppsala University, Box 513, 751 20 Uppsala, Sweden
Keywords:
Asymptotic distribution,
bilateral linear hypothesis on mean matrix,
bias correction approach,
$(N,p)$-asymptotic
Received by editor(s):
April 17, 2022
Accepted for publication:
January 25, 2023
Additional Notes:
The first author was supported in part by the Ministry of Education, Science, Sports, and Culture, a Grant-in-Aid for Scientific Research (C), 18K03419, 2018-2022.
The second author was supported in part by the JSPS, Grant-in-Aid for Scientific Research, Young Scientists (B), KAKENHI Grant Number JP16K16018, 2016-2020.
Article copyright:
© Copyright 2023
Taras Shevchenko National University of Kyiv