Quarterly of Applied Mathematics

Online ISSN 1552-4485; Print ISSN 0033-569X
Building a telescope to look into high-dimensional image spaces


Authors: Mitch Hill, Erik Nijkamp and Song-Chun Zhu
Journal: Quart. Appl. Math. 77 (2019), 269-321
MSC (2010): Primary 65C40
DOI: https://doi.org/10.1090/qam/1532
Published electronically: January 25, 2019
MathSciNet review: 3932961

Abstract: In Grenander’s work, an image pattern is represented by a probability distribution whose density is concentrated on different low-dimensional subspaces in the high-dimensional image space. Such probability densities have an astronomical number of local modes corresponding to typical pattern appearances. Related groups of modes can join to form macroscopic image basins (known as Hopfield memories in the neural network community) that represent pattern concepts. Grenander pioneered the practice of approximating an unknown image density with a Gibbs density. Recent works continue this paradigm and use neural networks that capture high-order image statistics to learn Gibbs models capable of synthesizing realistic images of many patterns. However, characterizing a learned probability density to uncover the Hopfield memories of the model, encoded by the structure of the local modes, remains an open challenge. In this work, we present novel computational experiments that map and visualize the local mode structure of Gibbs densities. Efficient mapping requires identifying the global basins without enumerating the countless modes. Inspired by Grenander’s jump-diffusion method, we propose a new MCMC tool called Attraction-Diffusion (AD) that can capture the macroscopic structure of highly non-convex densities by measuring the metastability of local modes. AD alters the target density with a magnetization potential that penalizes distance from a known mode, then runs an MCMC chain on the altered density to measure the stability of the initial chain state. Using a low-dimensional generator network to facilitate exploration, we map image spaces with up to 12,288 dimensions ($64\times 64$ pixels in RGB). Our work shows: (1) AD can efficiently map highly non-convex probability densities, (2) metastable regions of pattern probability densities contain coherent groups of images, and (3) the perceptibility of differences between training images influences the metastability of image basins.
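
To make the AD procedure concrete, the sketch below illustrates the idea in Python, assuming Langevin dynamics as the MCMC sampler and an L2 magnetization term. The function name attraction_diffusion and the parameters alpha (magnetization strength), eps (Langevin step size), and tol (absorption radius) are illustrative choices, not the paper's exact formulation or tuning.

    import numpy as np

    def attraction_diffusion(grad_U, x_init, x_star, alpha,
                             eps=0.01, n_steps=1000, tol=1.0):
        """Minimal sketch of Attraction-Diffusion (AD), under assumed settings.

        Runs Langevin dynamics on the magnetized energy
            U_alpha(x) = U(x) + alpha * ||x - x_star||,
        starting from x_init, and reports whether the chain is pulled
        into the neighborhood of the known mode x_star.
        """
        x = x_init.astype(float).copy()
        for _ in range(n_steps):
            diff = x - x_star
            dist = np.linalg.norm(diff)
            if dist < tol:
                # The chain reached the target mode: the initial state and
                # x_star share a macroscopic basin at this magnetization.
                return True, x
            # Gradient of the magnetization term alpha * ||x - x_star||.
            grad_mag = alpha * diff / (dist + 1e-12)
            # Langevin update on the altered energy U + magnetization.
            noise = np.sqrt(2.0 * eps) * np.random.randn(*x.shape)
            x = x - eps * (grad_U(x) + grad_mag) + noise
        # The chain resisted the magnetization pull: the initial state
        # is metastable relative to x_star.
        return False, x

    # Toy usage on a 1-D double-well energy U(x) = (x^2 - 1)^2, whose
    # modes at -1 and +1 stand in for two local modes of an image density.
    grad_U = lambda x: 4.0 * x * (x ** 2 - 1.0)
    absorbed, _ = attraction_diffusion(grad_U, np.array([-1.0]),
                                       np.array([1.0]), alpha=0.5)

The binary outcome is the diagnostic: if even a weak magnetization pulls the chain from its starting mode into the target mode, the two modes lie in the same macroscopic basin; if the chain stays trapped, the starting mode is metastable relative to the target. Per the abstract, the paper pairs this test with a low-dimensional generator network to make exploration of the full image space tractable.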


References

  • Yves F. Atchadé and Jun S. Liu, The Wang-Landau algorithm in general state spaces: applications and convergence analysis, Statist. Sinica 20 (2010), no. 1, 209–233. MR 2640691
  • A. J. Ballard, J. D. Stevenson, R. Das, and D. J. Wales, Energy landscapes for a machine learning application to series data, Journal of Chemical Physics 144 (2016), 124119.
  • O. M. Becker and M. Karplus, The topology of multidimensional potential energy surfaces: Theory and application to peptide structure and kinetics, Journal of Chemical Physics 106 (1997), no. 4.
  • A. Bovier and F. den Hollander, Metastability: A potential theoretic approach, International Congress of Mathematicians 3 (2006), 499–518.
  • C. J. Cerjan and W. H. Miller, On finding transition states, Journal of Chemical Physics 75 (1981), 2800.
  • P. Chaudhari, A. Choromanska, S. Soatto, Y. A. LeCun, C. Baldassi, C. Borgs, J. Chayes, L. Sagun, and R. Zecchina, Entropy-SGD: biasing gradient descent into wide valleys, ICLR (2017).
  • P. Chaudhari and S. Soatto, On the energy landscape of deep networks, arXiv:1511.06485 (2015).
  • A. Choromanska, M. Henaff, M. Mathieu, G. B. Arous, and Y. A. LeCun, The loss surfaces of multilayer networks, AISTATS (2015).
  • Ritankar Das and David J. Wales, Machine learning landscapes and predictions for patient outcomes, R. Soc. Open Sci. 4 (2017), no. 7, July, 170175, 19. MR 3688315, DOI https://doi.org/10.1098/rsos.170175
  • D. Erhan, Y. Bengio, A. Courville, and P. Vincent, Visualizing higher-layer features of a deep network, Technical Report, Université de Montréal (2009).
  • R. P. Feynman and A. R. Hibbs, Quantum mechanics and path integrals, McGraw-Hill, New York, 1965.
  • Stuart Geman and Chii-Ruey Hwang, Diffusions for global optimization, SIAM J. Control Optim. 24 (1986), no. 5, 1031–1043. MR 854068, DOI https://doi.org/10.1137/0324060
  • S. Geman and D. Geman, Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Trans. PAMI 6 (1984), 721–741.
  • I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, Generative adversarial nets, Advances in Neural Information Processing Systems (2014), 2672–2680.
  • Ulf Grenander, General pattern theory, Oxford Mathematical Monographs, The Clarendon Press, Oxford University Press, New York, 1993. A mathematical study of regular structures; Oxford Science Publications. MR 1270904
  • Ulf Grenander and Michael I. Miller, Representations of knowledge in complex systems, J. Roy. Statist. Soc. Ser. B 56 (1994), no. 4, 549–603. With discussion and a reply by the authors. MR 1293234
  • U. Grenander, Probability models for clutter in natural images, IEEE Trans. Pattern Analysis and Machine Intelligence 23 (2001), no. 4.
  • Ulf Grenander and Michael I. Miller, Pattern theory: from representation to inference, Oxford University Press, Oxford, 2007. MR 2285439
  • T. A. Halgren and W. N. Lipscomb, The synchronous-transit method for determining reaction pathways and locating molecular transition states, Chemical Physics Letters 49 (1977), no. 2, 225–232.
  • T. Han, Y. Lu, S.-C. Zhu, and Y. N. Wu, Alternating back-propagation for generator network, arXiv:1606.08571 (2016).
  • G. Hinton, Training products of experts by minimizing contrastive divergence, Neural Computation 14 (2002), 1771–1800.
  • J. J. Hopfield, Neural networks and physical systems with emergent collective computational abilities, Proc. Nat. Acad. Sci. U.S.A. 79 (1982), no. 8, 2554–2558. MR 652033, DOI https://doi.org/10.1073/pnas.79.8.2554
  • B. Julesz, Visual pattern discrimination, IRE Trans. Information Theory 8 (1962), no. 2, 84–92.
  • B. Julesz, Textons, the elements of texture perception, and their interactions, Nature 290 (1981), 91–97.
  • A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet classification with deep convolutional neural networks, NIPS (2012), 1097–1105.
  • A. Kuki and P. G. Wolynes, Electron tunneling paths in proteins, Science 236 (1987), 1647–1652.
  • David P. Landau and Kurt Binder, A guide to Monte Carlo simulations in statistical physics, 3rd ed., Cambridge University Press, Cambridge, 2009. MR 2559932
  • Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (1998), no. 11, 2278–2324.
  • Faming Liang, A generalized Wang-Landau algorithm for Monte Carlo computation, J. Amer. Statist. Assoc. 100 (2005), no. 472, 1311–1327. MR 2236444, DOI https://doi.org/10.1198/016214505000000259
  • Po-Ling Loh and Martin J. Wainwright, Regularized $M$-estimators with nonconvexity: statistical and algorithmic theory for local optima, J. Mach. Learn. Res. 16 (2015), 559–616. MR 3335800
  • Y. Lu, S.-C. Zhu, and Y. N. Wu, Learning FRAME models using CNN filters, Thirtieth AAAI Conference on Artificial Intelligence (2016).
  • A. Mahendran and A. Vedaldi, Visualizing deep convolutional neural networks using natural pre-images, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015), 5188–5196.
  • G. F. Montufar, R. Pascanu, K. Cho, and Y. Bengio, On the number of linear regions of deep neural networks, Advances in Neural Information Processing Systems (NIPS), 2014, pp. 2924–2932.
  • A. Mordvintsev, C. Olah, and M. Tyka, Inceptionism: Going deeper into neural networks, Google Research Blog (2015).
  • Radford M. Neal, MCMC using Hamiltonian dynamics, Handbook of Markov chain Monte Carlo, Chapman & Hall/CRC Handb. Mod. Stat. Methods, CRC Press, Boca Raton, FL, 2011, pp. 113–162. MR 2858447
  • A. Nguyen, A. Dosovitskiy, J. Yosinski, T. Brox, and J. Clune, Synthesizing the preferred inputs for neurons in neural networks via deep generator networks, NIPS (2016).
  • J. N. Onuchic and P. G. Wolynes, Theory of protein folding, Current Opinion in Structural Biology 14 (2004), 70–75.
  • M. Pavlovskaia, K. Tu, and S.-C. Zhu, Mapping the energy landscape of non-convex learning problems, arXiv preprint arXiv:1410.0576 (2014).
  • Zhangzhang Si, Haifeng Gong, Song-Chun Zhu, and Ying Nian Wu, Learning active basis models by EM-type algorithms, Statist. Sci. 25 (2010), no. 4, 458–475. MR 2807764, DOI https://doi.org/10.1214/09-STS281
  • J. Simons, P. Joergensen, H. Taylor, and J. Ozment, Walking on potential energy surfaces, Journal of Physical Chemistry 89 (1985), 684.
  • T. Tieleman, Training restricted Boltzmann machines using approximations to the likelihood gradient, ICML (2008), 1064–1071.
  • L. van der Maaten and G. Hinton, Visualizing data using t-SNE, Journal of Machine Learning Research 9 (2008), no. 85, 2579–2605.
  • A. Vedaldi, K. Lenc, and G. Ankush, MatConvNet: convolutional neural networks for MATLAB, Proceedings of the ACM Int. Conf. on Multimedia (2015).
  • D. J. Wales, The energy landscape as a unifying theme in molecular science, Phil. Trans. R. Soc. A 363 (2005), 357–377.
  • D. J. Wales and S. A. Trygubenko, A doubly nudged elastic band method for finding transition states, Journal of Chemical Physics 120 (2004), 2082–2094.
  • F. Wang and D. P. Landau, Efficient multiple-range random walk algorithm to calculate the density of states, Physical Review Letters 86 (2001), 2050–2053.
  • Ying Nian Wu, Cheng-En Guo, and Song-Chun Zhu, From information scaling of natural images to regimes of statistical models, Quart. Appl. Math. 66 (2008), no. 1, 81–122. MR 2396653, DOI https://doi.org/10.1090/S0033-569X-07-01063-2
  • J. Xie, W. Hu, S.-C. Zhu, and Y. N. Wu, A theory of generative ConvNet, International Conference on Machine Learning (2016).
  • J. Xie, Y. Lu, and Y. N. Wu, Cooperative learning of energy-based model and latent variable model via MCMC teaching, AAAI (2018).
  • Y. Zeng, P. Xiao, and G. Henkelman, Unification of algorithms for minimum mode optimization, Journal of Chemical Physics 140 (2014), 044115.
  • Q. Zhou, Random walk over basins of attraction to construct Ising energy landscapes, Physical Review Letters 106 (2011), 180602.
  • S.-C. Zhu, X. Liu, and Y. N. Wu, Exploring texture ensembles by efficient Markov chain Monte Carlo, IEEE Trans. PAMI 22 (2000), 245–261.
  • S.-C. Zhu, Y. N. Wu, and D. Mumford, Filters, random fields and maximum entropy (FRAME): toward a unified theory for texture modeling, International Journal of Computer Vision 27 (1998), no. 2, 107–126.

Additional Information

Mitch Hill
Affiliation: Department of Mathematics, University of California, Los Angeles, Los Angeles, California 90095
Email: mkhill@ucla.edu

Erik Nijkamp
Affiliation: Department of Mathematics, University of California, Los Angeles, Los Angeles, California 90095
Email: enijkamp@ucla.edu

Song-Chun Zhu
Affiliation: Department of Mathematics, University of California, Los Angeles, Los Angeles, California 90095
MR Author ID: 712282
Email: sczhu@stat.ucla.edu

Received by editor(s): February 17, 2018
Received by editor(s) in revised form: October 18, 2018
Published electronically: January 25, 2019
Additional Notes: This work was supported by DARPA #W911NF-16-1-0579. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ASC170063.
Article copyright: © Copyright 2019 Brown University