# Journal of the American Mathematical Society

Published by the American Mathematical Society, the Journal of the American Mathematical Society (JAMS) is devoted to research articles of the highest quality in all areas of pure and applied mathematics.

ISSN 1088-6834 (online) ISSN 0894-0347 (print)

The 2020 MCQ for Journal of the American Mathematical Society is 4.79.

What is MCQ? The Mathematical Citation Quotient (MCQ) measures journal impact by looking at citations over a five-year period. Subscribers to MathSciNet may click through for more detailed information.

by Charles Fefferman, Sanjoy Mitter and Hariharan Narayanan
J. Amer. Math. Soc. 29 (2016), 983-1049

## Abstract:

The hypothesis that high dimensional data tend to lie in the vicinity of a low dimensional manifold is the basis of manifold learning. The goal of this paper is to develop an algorithm (with accompanying complexity guarantees) for testing the existence of a manifold that fits a probability distribution supported in a separable Hilbert space, only using i.i.d. samples from that distribution. More precisely, our setting is the following. Suppose that data are drawn independently at random from a probability distribution $\mathcal {P}$ supported on the unit ball of a separable Hilbert space $\mathcal {H}$. Let $\mathcal {G}(d, V, \tau )$ be the set of submanifolds of the unit ball of $\mathcal {H}$ whose volume is at most $V$ and reach (which is the supremum of all $r$ such that any point at a distance less than $r$ has a unique nearest point on the manifold) is at least $\tau$. Let $\mathcal {L}(\mathcal {M}, \mathcal {P})$ denote the mean-squared distance of a random point from the probability distribution $\mathcal {P}$ to $\mathcal {M}$. We obtain an algorithm that tests the manifold hypothesis in the following sense.

The algorithm takes i.i.d. random samples from $\mathcal {P}$ as input and determines which of the following two is true (at least one must be):

1. There exists $\mathcal {M} \in \mathcal {G}(d, CV, \frac {\tau }{C})$ such that $\mathcal {L}(\mathcal {M}, \mathcal {P}) \leq C {\epsilon }.$

2. There exists no $\mathcal {M} \in \mathcal {G}(d, V/C, C\tau )$ such that $\mathcal {L}(\mathcal {M}, \mathcal {P}) \leq \frac {\epsilon }{C}.$

The answer is correct with probability at least $1-\delta$.

Similar Articles
• Charles Fefferman
• Affiliation: Department of Mathematics, Princeton University, Princeton, New Jersey 08544
• MR Author ID: 65640
• Email: cf@math.princeton.edu
• Sanjoy Mitter
• Affiliation: Laboratory for Information and Decision Systems, MIT, Cambridge, Massachusetts 02139
• Email: mitter@mit.edu
• Hariharan Narayanan
• Affiliation: Department of Statistics and Department of Mathematics, University of Washington, Seattle, Washington 98195
• Email: harin@uw.edu
• Received by editor(s): March 9, 2014
• Received by editor(s) in revised form: February 3, 2015, and August 9, 2015
• Published electronically: February 9, 2016
• Additional Notes: The first author was supported by NSF grant DMS 1265524, AFOSR grant FA9550-12-1-0425 and U.S.-Israel Binational Science Foundation grant 2014055.
The second author was supported by NSF grant EECS-1135843.