
Bulletin of the American Mathematical Society

The Bulletin publishes expository articles on contemporary mathematical research, written in a way that gives insight to mathematicians who may not be experts in the particular topic. The Bulletin also publishes reviews of selected books in mathematics and short articles in the Mathematical Perspectives section, both by invitation only.

ISSN 1088-9485 (online) ISSN 0273-0979 (print)

The 2024 MCQ for Bulletin of the American Mathematical Society is 0.84.

The Mathematical Citation Quotient (MCQ) measures journal impact by looking at citations over a five-year period.

Compositional sparsity of learnable functions

by Tomaso Poggio and Maia Fraser;
Bull. Amer. Math. Soc. 61 (2024), 438-456
DOI: https://doi.org/10.1090/bull/1820
Published electronically: May 15, 2024

Abstract:

Neural networks have demonstrated impressive success in various domains, raising the question of what fundamental principles underlie the effectiveness of the best AI systems and quite possibly of human intelligence. This perspective argues that compositional sparsity, the property that a compositional function has “few” constituent functions, each depending on only a small subset of the inputs, is a key principle underlying successful learning architectures. Surprisingly, all functions that are efficiently Turing computable have a compositionally sparse representation. Furthermore, deep networks that are also sparse can exploit this general property to avoid the “curse of dimensionality”. This framework suggests interesting implications about the role that machine learning may play in mathematics.
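
As a concrete illustration (a minimal example constructed here, not one taken from the paper), consider a function of eight variables written as a binary-tree composition in which every constituent function depends on only two of the inputs:

\[ f(x_1,\dots,x_8)=h_{3}\Bigl(h_{2,1}\bigl(h_{1,1}(x_1,x_2),\,h_{1,2}(x_3,x_4)\bigr),\ h_{2,2}\bigl(h_{1,3}(x_5,x_6),\,h_{1,4}(x_7,x_8)\bigr)\Bigr). \]

Here the constituent functions $h_{i,j}$ (notation introduced only for this example) are all bivariate, so a deep network whose connectivity mirrors the tree only ever has to approximate functions of two variables. Roughly speaking, for $m$-smooth constituents the number of parameters per node scales like $\varepsilon^{-2/m}$ rather than the $\varepsilon^{-8/m}$ required by a generic eight-variable function; this is the sense in which compositionally sparse functions allow sparse deep networks to escape the curse of dimensionality.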
Bibliographic Information
  • Tomaso Poggio
  • Affiliation: Center for Brains, Minds and Machines, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
  • MR Author ID: 219649
  • ORCID: 0000-0002-3944-0455
  • Email: tp@csail.mit.edu
  • Maia Fraser
  • Affiliation: Center for Brains, Minds and Machines, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
  • MR Author ID: 613740
  • ORCID: 0000-0003-4240-7178
  • Email: fraser.maia@gmail.com
  • Received by editor(s): October 21, 2023
  • Published electronically: May 15, 2024
  • Additional Notes: The material in the present paper is based upon work supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216. This research was also sponsored by grants from the National Science Foundation (NSF-0640097, NSF-0827427), the Natural Sciences and Engineering Research Council of Canada (NSERC RGPIN-2017-06901), AFOSR-THRL (FA8650-05-C-7262), and Lockheed Martin.
  • © Copyright 2024 American Mathematical Society
  • Journal: Bull. Amer. Math. Soc. 61 (2024), 438-456
  • MSC (2020): Primary 68-XX, 41-XX, 03-XX
  • DOI: https://doi.org/10.1090/bull/1820
  • MathSciNet review: 4751011