Quarterly of Applied Mathematics

Online ISSN 1552-4485; Print ISSN 0033-569X
Building a telescope to look into high-dimensional image spaces


Authors: Mitch Hill, Erik Nijkamp and Song-Chun Zhu
Journal: Quart. Appl. Math. 77 (2019), 269-321
MSC (2010): Primary 65C40
DOI: https://doi.org/10.1090/qam/1532
Published electronically: January 25, 2019
MathSciNet review: 3932961

Abstract: In Grenander’s work, an image pattern is represented by a probability distribution whose density is concentrated on different low-dimensional subspaces in the high-dimensional image space. Such probability densities have an astronomical number of local modes corresponding to typical pattern appearances. Related groups of modes can join to form macroscopic image basins (known as Hopfield memories in the neural network community) that represent pattern concepts. Grenander pioneered the practice of approximating an unknown image density with a Gibbs density. Recent works continue this paradigm and use neural networks that capture high-order image statistics to learn Gibbs models capable of synthesizing realistic images of many patterns. However, characterizing a learned probability density to uncover the Hopfield memories of the model, encoded by the structure of the local modes, remains an open challenge. In this work, we present novel computational experiments that map and visualize the local mode structure of Gibbs densities. Efficient mapping requires identifying the global basins without enumerating the countless modes. Inspired by Grenander’s jump-diffusion method, we propose a new MCMC tool called Attraction-Diffusion (AD) that can capture the macroscopic structure of highly non-convex densities by measuring the metastability of local modes. AD alters the target density with a magnetization potential that penalizes distance from a known mode, then runs an MCMC chain on the altered density to measure the stability of the initial chain state. Using a low-dimensional generator network to facilitate exploration, we map image spaces with up to 12,288 dimensions ($64\times 64$ pixels in RGB). Our work shows: (1) AD can efficiently map highly non-convex probability densities, (2) metastable regions of pattern probability densities contain coherent groups of images, and (3) the perceptibility of differences between training images influences the metastability of image basins.
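
To make the AD procedure concrete, the sketch below illustrates the idea in Python, assuming Langevin dynamics as the MCMC sampler and an L2 magnetization term. The function name attraction_diffusion and the parameters alpha (magnetization strength), eps (Langevin step size), and tol (absorption radius) are illustrative choices, not the paper's exact formulation or tuning.

    import numpy as np

    def attraction_diffusion(grad_U, x_init, x_star, alpha,
                             eps=0.01, n_steps=1000, tol=1.0):
        """Minimal sketch of Attraction-Diffusion (AD), under assumed settings.

        Runs Langevin dynamics on the magnetized energy
            U_alpha(x) = U(x) + alpha * ||x - x_star||,
        starting from x_init, and reports whether the chain is pulled
        into the neighborhood of the known mode x_star.
        """
        x = x_init.astype(float).copy()
        for _ in range(n_steps):
            diff = x - x_star
            dist = np.linalg.norm(diff)
            if dist < tol:
                # The chain reached the target mode: the initial state and
                # x_star share a macroscopic basin at this magnetization.
                return True, x
            # Gradient of the magnetization term alpha * ||x - x_star||.
            grad_mag = alpha * diff / (dist + 1e-12)
            # Langevin update on the altered energy U + magnetization.
            noise = np.sqrt(2.0 * eps) * np.random.randn(*x.shape)
            x = x - eps * (grad_U(x) + grad_mag) + noise
        # The chain resisted the magnetization pull: the initial state
        # is metastable relative to x_star.
        return False, x

    # Toy usage on a 1-D double-well energy U(x) = (x^2 - 1)^2, whose
    # modes at -1 and +1 stand in for two local modes of an image density.
    grad_U = lambda x: 4.0 * x * (x ** 2 - 1.0)
    absorbed, _ = attraction_diffusion(grad_U, np.array([-1.0]),
                                       np.array([1.0]), alpha=0.5)

The binary outcome is the diagnostic: if even a weak magnetization pulls the chain from its starting mode into the target mode, the two modes lie in the same macroscopic basin; if the chain stays trapped, the starting mode is metastable relative to the target. Per the abstract, the paper pairs this test with a low-dimensional generator network to make exploration of the full image space tractable.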


References

  • Yves F. Atchadé and Jun S. Liu, The Wang-Landau algorithm in general state spaces: applications and convergence analysis, Statist. Sinica 20 (2010), no. 1, 209–233. MR 2640691
  • A. J. Ballard, J. D. Stevenson, R. Das, and D. J. Wales, Energy landscapes for a machine learning application to series data, Journal of Chemical Physics 144 (2016), 124119.
  • O. M. Becker and M. Karplus, The topology of multidimensional potential energy surfaces: Theory and application to peptide structure and kinetics, Journal of Chemical Physics 106 (1997), no. 4.
  • A. Bovier and F. den Hollander, Metastability: A potential theoretic approach, International Congress of Mathematicians 3 (2006), 499–518.
  • C. J. Cerjan and W. H. Miller, On finding transition states, Journal of Chemical Physics 75 (1981), 2800.
  • P. Chaudhari, A. Choromanska, S. Soatto, Y. A. LeCun, C. Baldassi, C. Borgs, J. Chayes, L. Sagun, and R. Zecchina, Entropy-SGD: biasing gradient descent into wide valleys, ICLR (2017).
  • P. Chaudhari and S. Soatto, On the energy landscape of deep networks, arXiv:1511.06485 (2015).
  • A. Choromanska, M. Henaff, M. Mathieu, G. B. Arous, and Y. A. LeCun, The loss surfaces of multilayer networks, AISTATS (2015).
  • Ritankar Das and David J. Wales, Machine learning landscapes and predictions for patient outcomes, R. Soc. Open Sci. 4 (2017), no. 7, July, 170175, 19. MR 3688315, DOI https://doi.org/10.1098/rsos.170175
  • D. Erhan, Y. Bengio, A. Courville, and P. Vincent, Visualizing higher-layer features of a deep network, Technical Report, Université de Montréal (2009).
  • R. P. Feynman and A. R. Hibbs, Quantum mechanics and path integrals, McGraw-Hill, New York, 1965.
  • Stuart Geman and Chii-Ruey Hwang, Diffusions for global optimization, SIAM J. Control Optim. 24 (1986), no. 5, 1031–1043. MR 854068, DOI https://doi.org/10.1137/0324060
  • S. Geman and D. Geman, Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Trans. PAMI 6 (1984), 721–741.
  • I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, Generative adversarial nets, Advances in Neural Information Processing Systems (2014), 2672–2680.
  • Ulf Grenander, General pattern theory, Oxford Mathematical Monographs, The Clarendon Press, Oxford University Press, New York, 1993. A mathematical study of regular structures; Oxford Science Publications. MR 1270904
  • Ulf Grenander and Michael I. Miller, Representations of knowledge in complex systems, J. Roy. Statist. Soc. Ser. B 56 (1994), no. 4, 549–603. With discussion and a reply by the authors. MR 1293234
  • U. Grenander, Probability models for clutter in natural images, IEEE Trans. Pattern Analysis and Machine Intelligence 23 (2001), no. 4.
  • Ulf Grenander and Michael I. Miller, Pattern theory: from representation to inference, Oxford University Press, Oxford, 2007. MR 2285439
  • T. A. Halgren and W. N. Lipscomb, The synchronous-transit method for determining reaction pathways and locating molecular transition states, Chemical Physics Letters 49 (1977), no. 2, 225–232.
  • T. Han, Y. Lu, S.-C. Zhu, and Y. N. Wu, Alternating back-propagation for generator network, arXiv:1606.08571 (2016).
  • G. Hinton, Training products of experts by minimizing contrastive divergence, Neural Computation 14 (2002), 1771–1800.
  • J. J. Hopfield, Neural networks and physical systems with emergent collective computational abilities, Proc. Nat. Acad. Sci. U.S.A. 79 (1982), no. 8, 2554–2558. MR 652033, DOI https://doi.org/10.1073/pnas.79.8.2554
  • B. Julesz, Visual pattern discrimination, IRE Trans. Information Theory 8 (1962), no. 2, 84–92.
  • B. Julesz, Textons, the elements of texture perception, and their interactions, Nature 290 (1981), 91–97.
  • A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet classification with deep convolutional neural networks, NIPS (2012), 1097–1105.
  • A. Kuki and P. G. Wolynes, Electron tunneling paths in proteins, Science 236 (1987), 1647–1652.
  • David P. Landau and Kurt Binder, A guide to Monte Carlo simulations in statistical physics, 3rd ed., Cambridge University Press, Cambridge, 2009. MR 2559932
  • Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (1998), no. 11, 2278–2324.
  • Faming Liang, A generalized Wang-Landau algorithm for Monte Carlo computation, J. Amer. Statist. Assoc. 100 (2005), no. 472, 1311–1327. MR 2236444, DOI https://doi.org/10.1198/016214505000000259
  • Po-Ling Loh and Martin J. Wainwright, Regularized $M$-estimators with nonconvexity: statistical and algorithmic theory for local optima, J. Mach. Learn. Res. 16 (2015), 559–616. MR 3335800
  • Y. Lu, S.-C. Zhu, and Y. N. Wu, Learning FRAME models using CNN filters, Thirtieth AAAI Conference on Artificial Intelligence (2016).
  • A. Mahendran and A. Vedaldi, Visualizing deep convolutional neural networks using natural pre-images, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015), 5188–5196.
  • G. F. Montufar, R. Pascanu, K. Cho, and Y. Bengio, On the number of linear regions of deep neural networks, Advances in Neural Information Processing Systems (NIPS), 2014, pp. 2924–2932.
  • A. Mordvintsev, C. Olah, and M. Tyka, Inceptionism: Going deeper into neural networks, Google Research Blog (2015).
  • Radford M. Neal, MCMC using Hamiltonian dynamics, Handbook of Markov chain Monte Carlo, Chapman & Hall/CRC Handb. Mod. Stat. Methods, CRC Press, Boca Raton, FL, 2011, pp. 113–162. MR 2858447
  • A. Nguyen, A. Dosovitskiy, J. Yosinski, T. Brox, and J. Clune, Synthesizing the preferred inputs for neurons in neural networks via deep generator networks, NIPS (2016).
  • J. N. Onuchic and P. G. Wolynes, Theory of protein folding, Current Opinion in Structural Biology 14 (2004), 70–75.
  • M. Pavlovskaia, K. Tu, and S.-C. Zhu, Mapping the energy landscape of non-convex learning problems, arXiv preprint arXiv:1410.0576 (2014).
  • Zhangzhang Si, Haifeng Gong, Song-Chun Zhu, and Ying Nian Wu, Learning active basis models by EM-type algorithms, Statist. Sci. 25 (2010), no. 4, 458–475. MR 2807764, DOI https://doi.org/10.1214/09-STS281
  • J. Simons, P. Joergensen, H. Taylor, and J. Ozment, Walking on potential energy surfaces, Journal of Physical Chemistry 89 (1985), 684.
  • T. Tieleman, Training restricted Boltzmann machines using approximations to the likelihood gradient, ICML (2008), 1064–1071.
  • L. van der Maaten and G. Hinton, Visualizing data using t-SNE, Journal of Machine Learning Research 9 (2008), no. 85, 2579–2605.
  • A. Vedaldi, K. Lenc, and G. Ankush, MatConvNet: convolutional neural networks for MATLAB, Proceedings of the ACM Int. Conf. on Multimedia (2015).
  • D. J. Wales, The energy landscape as a unifying theme in molecular science, Phil. Trans. R. Soc. A 363 (2005), 357–377.
  • D. J. Wales and S. A. Trygubenko, A doubly nudged elastic band method for finding transition states, Journal of Chemical Physics 120 (2004), 2082–2094.
  • F. Wang and D. P. Landau, Efficient multiple-range random walk algorithm to calculate the density of states, Physical Review Letters 86 (2001), 2050–2053.
  • Ying Nian Wu, Cheng-En Guo, and Song-Chun Zhu, From information scaling of natural images to regimes of statistical models, Quart. Appl. Math. 66 (2008), no. 1, 81–122. MR 2396653, DOI https://doi.org/10.1090/S0033-569X-07-01063-2
  • J. Xie, W. Hu, S.-C. Zhu, and Y. N. Wu, A theory of generative ConvNet, International Conference on Machine Learning (2016).
  • J. Xie, Y. Lu, and Y. N. Wu, Cooperative learning of energy-based model and latent variable model via MCMC teaching, AAAI (2018).
  • Y. Zeng, P. Xiao, and G. Henkelman, Unification of algorithms for minimum mode optimization, Journal of Chemical Physics 140 (2014), 044115.
  • Q. Zhou, Random walk over basins of attraction to construct Ising energy landscapes, Physical Review Letters 106 (2011), 180602.
  • S.-C. Zhu, X. Liu, and Y. N. Wu, Exploring texture ensembles by efficient Markov chain Monte Carlo, IEEE Trans. PAMI 22 (2000), 245–261.
  • S.-C. Zhu, Y. N. Wu, and D. Mumford, Filters, random fields and maximum entropy (FRAME): toward a unified theory for texture modeling, International Journal of Computer Vision 27 (1998), no. 2, 107–126.

Additional Information

Mitch Hill
Affiliation: Department of Mathematics, University of California, Los Angeles, Los Angeles, California 90095
Email: mkhill@ucla.edu

Erik Nijkamp
Affiliation: Department of Mathematics, University of California, Los Angeles, Los Angeles, California 90095
Email: enijkamp@ucla.edu

Song-Chun Zhu
Affiliation: Department of Mathematics, University of California, Los Angeles, Los Angeles, California 90095
MR Author ID: 712282
Email: sczhu@stat.ucla.edu

Received by editor(s): February 17, 2018
Received by editor(s) in revised form: October 18, 2018
Published electronically: January 25, 2019
Additional Notes: This work was supported by DARPA #W911NF-16-1-0579. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ASC170063.
Article copyright: © Copyright 2019 Brown University