Remote Access Journal of the American Mathematical Society
Green Open Access

Journal of the American Mathematical Society

ISSN 1088-6834(online) ISSN 0894-0347(print)



Testing the manifold hypothesis

Authors: Charles Fefferman, Sanjoy Mitter and Hariharan Narayanan
Journal: J. Amer. Math. Soc. 29 (2016), 983-1049
MSC (2010): Primary 62G08, 62H15; Secondary 55R10, 57R40
Published electronically: February 9, 2016
Full-text PDF

Abstract | References | Similar Articles | Additional Information

Abstract: The hypothesis that high dimensional data tend to lie in the vicinity of a low dimensional manifold is the basis of manifold learning. The goal of this paper is to develop an algorithm (with accompanying complexity guarantees) for testing the existence of a manifold that fits a probability distribution supported in a separable Hilbert space, only using i.i.d. samples from that distribution. More precisely, our setting is the following. Suppose that data are drawn independently at random from a probability distribution $ \mathcal {P}$ supported on the unit ball of a separable Hilbert space $ \mathcal {H}$. Let $ \mathcal {G}(d, V, \tau )$ be the set of submanifolds of the unit ball of $ \mathcal {H}$ whose volume is at most $ V$ and reach (which is the supremum of all $ r$ such that any point at a distance less than $ r$ has a unique nearest point on the manifold) is at least $ \tau $. Let $ \mathcal {L}(\mathcal {M}, \mathcal {P})$ denote the mean-squared distance of a random point from the probability distribution $ \mathcal {P}$ to $ \mathcal {M}$. We obtain an algorithm that tests the manifold hypothesis in the following sense.

The algorithm takes i.i.d. random samples from $ \mathcal {P}$ as input and determines which of the following two is true (at least one must be):

  1. There exists $ \mathcal {M} \in \mathcal {G}(d, CV, \frac {\tau }{C})$ such that $ \mathcal {L}(\mathcal {M}, \mathcal {P}) \leq C {\epsilon }.$
  2. There exists no $ \mathcal {M} \in \mathcal {G}(d, V/C, C\tau )$ such that $ \mathcal {L}(\mathcal {M}, \mathcal {P}) \leq \frac {\epsilon }{C}.$
The answer is correct with probability at least $ 1-\delta $.

References [Enhancements On Off] (What's this?)

Similar Articles

Retrieve articles in Journal of the American Mathematical Society with MSC (2010): 62G08, 62H15, 55R10, 57R40

Retrieve articles in all journals with MSC (2010): 62G08, 62H15, 55R10, 57R40

Additional Information

Charles Fefferman
Affiliation: Department of Mathematics, Princeton University, Princeton, New Jersey 08544

Sanjoy Mitter
Affiliation: Laboratory for Information and Decision Systems, MIT, Cambridge, Massachusetts 02139

Hariharan Narayanan
Affiliation: Department of Statistics and Department of Mathematics, University of Washington, Seattle, Washington 98195

Received by editor(s): March 9, 2014
Received by editor(s) in revised form: February 3, 2015, and August 9, 2015
Published electronically: February 9, 2016
Additional Notes: The first author was supported by NSF grant DMS 1265524, AFOSR grant FA9550-12-1-0425 and U.S.-Israel Binational Science Foundation grant 2014055.
The second author was supported by NSF grant EECS-1135843.
Article copyright: © Copyright 2016 American Mathmatical Society