Test for mean matrix in GMANOVA model under heteroscedasticity and non-normality for high-dimensional data

Authors:
Takayuki Yamada, Tetsuto Himeno, Annika Tillander and Tatjana Pavlenko

Journal:
Theor. Probability and Math. Statist. **109** (2023), 129–158

MSC (2020):
Primary 62H15, 62E20; Secondary 62H10

DOI:
https://doi.org/10.1090/tpms/1200

Published electronically:
October 3, 2023

MathSciNet review:
4652997

Abstract: This paper develops a unified testing methodology for high-dimensional generalized multivariate analysis of variance (GMANOVA) models. We derive a test of the bilateral linear hypothesis on the mean matrix in a general scenario where the dimension of the observed vector may exceed the sample size, the design may be unbalanced, the population distribution may be non-normal, and the underlying group covariance matrices may be unequal. The suggested methodology is suitable for many inferential problems, such as the one-way MANOVA test and the test of a multivariate linear hypothesis on the mean in the polynomial growth curve model. As a key component of our test procedure, we propose a bias-corrected estimator of the Frobenius norm of the mean matrix. We derive the null and non-null asymptotic distributions of the test statistic under a general high-dimensional asymptotic framework that allows the dimensionality to arbitrarily exceed the sample size of a group. The accuracy of the proposed test in finite-sample settings is investigated through simulations conducted for several high-dimensional scenarios and various underlying population distributions, in combination with different within-group covariance structures. As a practical demonstration, we analyze a daily Canadian temperature dataset that exhibits group structure and conclude that the interaction of latitude and longitude has no effect on the prediction of temperature.
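To illustrate why a bias correction of a squared norm matters in high dimension, the following minimal sketch treats the simplest one-sample case. It is not the estimator proposed in the paper: it only uses the standard identity $E\|\bar{x}\|^2 = \|\mu\|^2 + \operatorname{tr}(\Sigma)/n$, so that $\|\bar{x}\|^2 - \operatorname{tr}(S)/n$ is unbiased for $\|\mu\|^2$ when $S$ is the sample covariance with divisor $n-1$. The function names, the sample size, the dimension, and the Gaussian data with zero mean and identity covariance are all assumptions made for illustration.

```python
import random


def naive_sq_norm(xs):
    """Plug-in estimate ||x_bar||^2 of ||mu||^2 (biased upward by tr(Sigma)/n)."""
    n, p = len(xs), len(xs[0])
    mean = [sum(x[k] for x in xs) / n for k in range(p)]
    return sum(m * m for m in mean)


def corrected_sq_norm(xs):
    """||x_bar||^2 - tr(S)/n, where S is the sample covariance (divisor n - 1)."""
    n, p = len(xs), len(xs[0])
    mean = [sum(x[k] for x in xs) / n for k in range(p)]
    # Trace of S: total squared deviation from the mean, divided by n - 1.
    trace_s = sum((x[k] - mean[k]) ** 2 for x in xs for k in range(p)) / (n - 1)
    return sum(m * m for m in mean) - trace_s / n


def average_estimates(n=5, p=200, reps=300, seed=0):
    """Average both estimators over simulated samples with mu = 0, Sigma = I.

    The true value of ||mu||^2 is 0 here, while tr(Sigma)/n = p/n.
    """
    rng = random.Random(seed)
    naive = corrected = 0.0
    for _ in range(reps):
        xs = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
        naive += naive_sq_norm(xs)
        corrected += corrected_sq_norm(xs)
    return naive / reps, corrected / reps


if __name__ == "__main__":
    naive_avg, corrected_avg = average_estimates()
    print("average plug-in estimate:  ", naive_avg)
    print("average corrected estimate:", corrected_avg)
```

With the dimension much larger than the sample size, the plug-in estimate averages close to $p/n$ even though the true squared norm is zero, whereas the corrected estimate averages close to zero. The setting of the paper (a mean matrix, several groups, unequal covariances) is more general than this sketch.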

References
- T. W. Anderson,
*An introduction to multivariate statistical analysis*, 3rd ed., Wiley Series in Probability and Statistics, Wiley-Interscience [John Wiley & Sons], Hoboken, NJ, 2003. MR **1990662**
- Zhidong Bai, Kwok Pui Choi, and Yasunori Fujikoshi,
*Limiting behavior of eigenvalues in high-dimensional MANOVA via RMT*, Ann. Statist. **46** (2018), no. 6A, 2985–3013. MR **3851762**, DOI 10.1214/17-AOS1646
- Zhidong Bai and Hewa Saranadasa,
*Effect of high dimension: by an example of a two sample problem*, Statist. Sinica **6** (1996), no. 2, 311–329. MR **1399305**
- T. Tony Cai, Weidong Liu, and Yin Xia,
*Two-sample test of high dimensional means under dependence*, J. R. Stat. Soc. Ser. B. Stat. Methodol. **76** (2014), no. 2, 349–372. MR **3164870**, DOI 10.1111/rssb.12034
- T. Tony Cai and Yin Xia,
*High-dimensional sparse MANOVA*, J. Multivariate Anal. **131** (2014), 174–196. MR **3252643**, DOI 10.1016/j.jmva.2014.07.002
- Song Xi Chen, Jun Li, and Ping-Shou Zhong,
*Two-sample and ANOVA tests for high dimensional means*, Ann. Statist. **47** (2019), no. 3, 1443–1474. MR **3911118**, DOI 10.1214/18-AOS1720
- Song Xi Chen and Ying-Li Qin,
*A two-sample test for high-dimensional data with applications to gene-set testing*, Ann. Statist. **38** (2010), no. 2, 808–835. MR **2604697**, DOI 10.1214/09-AOS716
- Yasunori Fujikoshi, Vladimir V. Ulyanov, and Ryoichi Shimizu,
*Multivariate statistics*, Wiley Series in Probability and Statistics, John Wiley & Sons, Inc., Hoboken, NJ, 2010. High-dimensional and large-sample approximations. MR **2640807**, DOI 10.1002/9780470539873
- Anil K. Ghosh and Munmun Biswas,
*Distribution-free high-dimensional two-sample tests based on discriminating hyperplanes*, TEST **25** (2016), no. 3, 525–547. MR **3531841**, DOI 10.1007/s11749-015-0467-x
- C. C. Heyde and B. M. Brown,
*On the departure from normality of a certain class of martingales*, Ann. Math. Statist. **41** (1970), 2161–2165. MR **293702**, DOI 10.1214/aoms/1177696722
- Sayantee Jana, Narayanaswamy Balakrishnan, Dietrich von Rosen, and Jemila Seid Hamid,
*High dimensional extension of the growth curve model and its application in genetics*, Stat. Methods Appl. **26** (2017), no. 2, 273–292. MR **3652497**, DOI 10.1007/s10260-016-0369-4
- Robb J. Muirhead,
*Aspects of multivariate statistical theory*, Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons, Inc., New York, 1982. MR **652932**, DOI 10.1002/9780470316559
- J. O. Ramsay and B. W. Silverman,
*Functional data analysis*, 2nd ed., Springer Series in Statistics, Springer, New York, 2005. MR **2168993**, DOI 10.1007/b98888
- M. S. Srivastava,
*Methods of multivariate statistics*, Wiley Series in Probability and Statistics, Wiley-Interscience [John Wiley & Sons], New York, 2002. MR **1915968**
- Muni S. Srivastava and Tatsuya Kubokawa,
*Tests for multivariate analysis of variance in high dimension under non-normality*, J. Multivariate Anal. **115** (2013), 204–216. MR **3004555**, DOI 10.1016/j.jmva.2012.10.011
- Muni S. Srivastava and Martin Singull,
*Test for the mean matrix in a growth curve model for high dimensions*, Comm. Statist. Theory Methods **46** (2017), no. 13, 6668–6683. MR **3631538**, DOI 10.1080/03610926.2015.1132328
- Sho Takahashi and Nobumichi Shutoh,
*Tests for parallelism and flatness hypotheses of two mean vectors in high-dimensional settings*, J. Stat. Comput. Simul. **86** (2016), no. 6, 1150–1165. MR **3441561**, DOI 10.1080/00949655.2015.1055269
- Dietrich von Rosen,
*Bilinear regression analysis*, Lecture Notes in Statistics, vol. 220, Springer, Cham, 2018. An introduction. MR **3823252**, DOI 10.1007/978-3-319-78784-8
- Lan Wang, Bo Peng, and Runze Li,
*A high-dimensional nonparametric multivariate test for mean vector*, J. Amer. Statist. Assoc. **110** (2015), no. 512, 1658–1669. MR **3449062**, DOI 10.1080/01621459.2014.988215
- Wei Wang, Nan Lin, and Xiang Tang,
*Robust two-sample test of high-dimensional mean vectors under dependence*, J. Multivariate Anal. **169** (2019), 312–329. MR **3875602**, DOI 10.1016/j.jmva.2018.09.013
- Takayuki Yamada and Tetsuto Himeno,
*Testing homogeneity of mean vectors under heteroscedasticity in high-dimension*, J. Multivariate Anal. **139** (2015), 7–27. MR **3349477**, DOI 10.1016/j.jmva.2015.02.005
- Takayuki Yamada and Tetsuro Sakurai,
*Asymptotic power comparison of three tests in GMANOVA when the number of observed points is large*, Statist. Probab. Lett. **82** (2012), no. 3, 692–698. MR **2887488**, DOI 10.1016/j.spl.2011.12.004
- Bu Zhou, Jia Guo, and Jin-Ting Zhang,
*High-dimensional general linear hypothesis testing under heteroscedasticity*, J. Statist. Plann. Inference **188** (2017), 36–54. MR **3648316**, DOI 10.1016/j.jspi.2017.03.005

Additional Information

**Takayuki Yamada**

Affiliation:
Faculty of Data Science, Kyoto Women’s University, 35 Kitahiyoshi-cho, Imakumano, Higashiyama-ku, Kyoto 605-8501, Japan

Email:
yamadatak@kyoto-wu.ac.jp

**Tetsuto Himeno**

Affiliation:
Faculty of Data Science, Shiga University, 1-1-1 Banba, Hikone, Shiga 522-8522, Japan

**Annika Tillander**

Affiliation:
Department of Computer and Information Science, Linköping University, 581 83 Linköping, Sweden

**Tatjana Pavlenko**

Affiliation:
Department of Statistics, Uppsala University, Box 513, 751 20 Uppsala, Sweden

Keywords:
Asymptotic distribution,
bilateral linear hypothesis on mean matrix,
bias correction approach,
$(N,p)$-asymptotic

Received by editor(s):
April 17, 2022

Accepted for publication:
January 25, 2023

Additional Notes:
The first author was supported in part by the Ministry of Education, Science, Sports, and Culture, a Grant-in-Aid for Scientific Research (C), 18K03419, 2018-2022.

The second author was supported in part by the JSPS, Grant-in-Aid for Scientific Research, Young Scientists (B), KAKENHI Grant Number JP16K16018, 2016-2020.

Article copyright:
© Copyright 2023 Taras Shevchenko National University of Kyiv