Quarterly of Applied Mathematics
Online ISSN 1552-4485; Print ISSN 0033-569X

Unsupervised learning of compositional sparse code for natural image representation


Authors: Yi Hong, Zhangzhang Si, Wenze Hu, Song-Chun Zhu and Ying Nian Wu
Journal: Quart. Appl. Math. 72 (2014), 373-406
MSC (2000): Primary 62M40
Published electronically: November 14, 2013

Abstract: This article proposes an unsupervised method for learning a compositional sparse code for representing natural images. Our method is built on the original sparse coding framework, in which a dictionary of basis functions, often in the form of localized, elongated, and oriented wavelets, allows each image to be represented by a linear combination of a small number of basis functions automatically selected from the dictionary. In our compositional sparse code, the representational units are composite: they are compositional patterns formed by the basis functions. These compositional patterns can be viewed as shape templates. We propose an unsupervised method for learning a dictionary of frequently occurring templates from training images, so that each training image can be represented by a small number of templates automatically selected from the learned dictionary. The compositional sparse code approximates the raw image, which consists of a large number of pixel intensities, using a small number of templates, thus facilitating the signal-to-symbol transition and allowing a symbolic description of the image. The current form of our model consists of two layers of representational units (basis functions and shape templates); it is possible to extend it to multiple layers of hierarchy. Experiments show that our method is capable of learning meaningful compositional sparse codes, and that the learned templates are useful for image classification.


References

  • [1] M. Aharon, M. Elad, and A. M. Bruckstein. K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54, 4311-4322, 2006.
  • [2] C. C. Chang and C. J. Lin. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2, 1-27, 2011.
  • [3] Scott Shaobing Chen, David L. Donoho, and Michael A. Saunders, Atomic decomposition by basis pursuit, SIAM J. Sci. Comput. 20 (1998), no. 1, 33–61. MR 1639094 (99h:94013), http://dx.doi.org/10.1137/S1064827596304010
  • [4] J. Chen and X. Huo. Sparse representations for multiple measurement vectors (MMV) in an overcomplete dictionary. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, 4, 257-260, 2005.
  • [5] G. Csurka, C. Dance, L. Fan, J. Willamowski, and C. Bray. Visual categorization with bags of keypoints. Workshop of European Conference on Computer Vision, 2004.
  • [6] J. Daugman. Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. Journal of the Optical Society of America A, 2, 1160-1169, 1985.
  • [7] A. P. Dempster, N. M. Laird, and D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. Roy. Statist. Soc. Ser. B 39 (1977), no. 1, 1–38. With discussion. MR 0501537 (58 #18858)
  • [8] David L. Donoho and Xiaoming Huo, Uncertainty principles and ideal atomic decomposition, IEEE Trans. Inform. Theory 47 (2001), no. 7, 2845–2862. MR 1872845 (2002k:94012), http://dx.doi.org/10.1109/18.959265
  • [9] R. E. Fan, K. W. Chang, C. J. Hsieh, X. R. Wang, and C. J. Lin. LIBLINEAR: a library for large linear classification. Journal of Machine Learning Research, 9, 1871-1874, 2008.
  • [10] L. Fei-Fei, R. Fergus, and P. Perona. Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. CVPR Workshop, 2004.
  • [11] V. Ferrari, F. Jurie, and C. Schmid. From images to shape models for object detection. International Journal of Computer Vision, 87, 284-303, 2010.
  • [12] S. Fidler, M. Boben, and A. Leonardis. Similarity-based cross-layered hierarchical representation for object categorization. IEEE Conference on Computer Vision and Pattern Recognition, 2008.
  • [13] Chris Fraley and Adrian E. Raftery, Model-based clustering, discriminant analysis, and density estimation, J. Amer. Statist. Assoc. 97 (2002), no. 458, 611–631. MR 1951635, http://dx.doi.org/10.1198/016214502760047131
  • [14] Jerome H. Friedman, Exploratory projection pursuit, J. Amer. Statist. Assoc. 82 (1987), no. 397, 249–266. MR 883353 (88c:62004)
  • [15] Stuart Geman, Daniel F. Potter, and Zhiyi Chi, Composition systems, Quart. Appl. Math. 60 (2002), no. 4, 707–736. MR 1939008 (2003i:68129)
  • [16] Y. Jin and S. Geman. Context and hierarchy in a probabilistic image model. IEEE Conference on Computer Vision and Pattern Recognition, 2006.
  • [17] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. IEEE Conference on Computer Vision and Pattern Recognition, 2006.
  • [18] H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. International Conference on Machine Learning, 2009.
  • [19] C. Lo, Amazing Chinese Characters, Panda Media Co., 2002.
  • [20] K. Lounici, A. B. Tsybakov, M. Pontil, and S. A. van de Geer. Taking advantage of sparsity in multi-task learning. Proceedings of the 22nd Conference on Learning Theory, 2009.
  • [21] D. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60, 91-110, 2004.
  • [22] S. Mallat and Z. Zhang. Matching pursuits with time-frequency dictionaries. IEEE Transactions on Signal Processing, 41, 3397-3415, 1993.
  • [23] M. Marszalek and C. Schmid. Accurate object localization with shape masks. IEEE Conference on Computer Vision and Pattern Recognition, 2007.
  • [24] Boris Mirkin, Mathematical classification and clustering, Nonconvex Optimization and its Applications, vol. 11, Kluwer Academic Publishers, Dordrecht, 1996. MR 1480413 (99c:62167)
  • [25] Guillaume Obozinski, Martin J. Wainwright, and Michael I. Jordan, Support union recovery in high-dimensional multivariate regression, Ann. Statist. 39 (2011), no. 1, 1–47. MR 2797839 (2012d:62224), http://dx.doi.org/10.1214/09-AOS776
  • [26] B. A. Olshausen and D. J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381, 607-609, 1996.
  • [27] B. A. Olshausen, P. Sallee, and M. S. Lewicki. Learning sparse image codes using a wavelet pyramid architecture. Advances in Neural Information Processing Systems, 13, 887-893, 2001.
  • [28] S. D. Pietra, V. D. Pietra, and J. Lafferty. Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19, 380-393, 1997.
  • [29] M. Riesenhuber and T. Poggio. Hierarchical models of object recognition in cortex. Nature Neuroscience, 2, 1019-1025, 1999.
  • [30] Jorma Rissanen, Information and complexity in statistical modeling, Information Science and Statistics, Springer, New York, 2007. MR 2287233 (2008f:62003)
  • [31] Gideon Schwarz, Estimating the dimension of a model, Ann. Statist. 6 (1978), no. 2, 461–464. MR 0468014 (57 #7855)
  • [32] J. Tropp, A. Gilbert, and M. Strauss. Algorithms for simultaneous sparse approximation. Part I: Greedy pursuit. Signal Processing, 86, 572-588, 2006.
  • [33] Robert Tibshirani, Regression shrinkage and selection via the lasso, J. Roy. Statist. Soc. Ser. B 58 (1996), no. 1, 267–288. MR 1379242 (96j:62134)
  • [34] Vladimir N. Vapnik, The nature of statistical learning theory, 2nd ed., Statistics for Engineering and Information Science, Springer-Verlag, New York, 2000. MR 1719582 (2001c:68110)
  • [35] Z. Si and S. C. Zhu. Learning and-or templates for object modeling and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, to appear, 2013.
  • [36] Ying Nian Wu, Cheng-En Guo, and Song-Chun Zhu, From information scaling of natural images to regimes of statistical models, Quart. Appl. Math. 66 (2008), no. 1, 81–122. MR 2396653 (2009a:62395), http://dx.doi.org/10.1090/S0033-569X-07-01063-2
  • [37] Ying Nian Wu, Zhangzhang Si, Haifeng Gong, and Song-Chun Zhu, Learning active basis model for object detection and recognition, Int. J. Comput. Vis. 90 (2010), no. 2, 198–235. MR 2719010, http://dx.doi.org/10.1007/s11263-009-0287-0
  • [38] Ming Yuan and Yi Lin, Model selection and estimation in regression with grouped variables, J. R. Stat. Soc. Ser. B Stat. Methodol. 68 (2006), no. 1, 49–67. MR 2212574, http://dx.doi.org/10.1111/j.1467-9868.2005.00532.x
  • [39] M. Zeiler, G. Taylor, and R. Fergus. Adaptive deconvolutional networks for mid and high level feature learning. International Conference on Computer Vision, 2011.
  • [40] L. Zhu, C. Lin, H. Huang, Y. Chen, and A. Yuille. Unsupervised structure learning: hierarchical recursive composition, suspicious coincidence and competitive exclusion. European Conference on Computer Vision, 2008.
  • [41] S. C. Zhu, C. Guo, Y. Wang, and Z. Xu. What are textons? International Journal of Computer Vision, 62, 121-143, 2005.
  • [42] S. C. Zhu and D. B. Mumford. A stochastic grammar of images. Foundations and Trends in Computer Graphics and Vision, 2, 259-362, 2006.
  • [43] S. C. Zhu, Y. N. Wu, and D. B. Mumford. Minimax entropy principle and its application to texture modeling. Neural Computation, 9, 1627-1660, 1997.



Additional Information

Yi Hong
Affiliation: Department of Computer Science, University of California, Los Angeles, California 90024
Email: yihong@cs.ucla.edu

Zhangzhang Si
Affiliation: Google Inc.
Email: zhangzhang.si@gmail.com

Wenze Hu
Affiliation: Department of Statistics, University of California, Los Angeles, California 90024
Email: wenzehu@ucla.edu

Song-Chun Zhu
Affiliation: Department of Statistics, University of California, Los Angeles, California 90024
Email: sczhu@stat.ucla.edu

Ying Nian Wu
Affiliation: Department of Statistics, University of California, Los Angeles, California 90024
Email: ywu@stat.ucla.edu

DOI: http://dx.doi.org/10.1090/S0033-569X-2013-01361-5
PII: S 0033-569X(2013)01361-5
Received by editor(s): October 23, 2012
Received by editor(s) in revised form: February 10, 2013
Published electronically: November 14, 2013
Article copyright: © Copyright 2013 Brown University



The Quarterly of Applied Mathematics is distributed by the American Mathematical Society for Brown University.
© 2014 Brown University
Comments: qam-query@ams.org