Unsupervised learning of compositional sparse code for natural image representation

Hong, Yi; Si, Zhangzhang; Hu, Wenze; Zhu, Song-Chun; Wu, Ying Nian

doi:10.1090/S0033-569X-2013-01361-5

Authors: Yi Hong, Zhangzhang Si, Wenze Hu, Song-Chun Zhu and Ying Nian Wu
Journal: Quart. Appl. Math. 72 (2014), 373-406
MSC (2000): Primary 62M40
DOI: https://doi.org/10.1090/S0033-569X-2013-01361-5
Published electronically: November 14, 2013
MathSciNet review: 3186243

Abstract: This article proposes an unsupervised method for learning compositional sparse code for representing natural images. Our method is built upon the original sparse coding framework where there is a dictionary of basis functions often in the form of localized, elongated and oriented wavelets, so that each image can be represented by a linear combination of a small number of basis functions automatically selected from the dictionary. In our compositional sparse code, the representational units are composite: they are compositional patterns formed by the basis functions. These compositional patterns can be viewed as shape templates. We propose an unsupervised learning method for learning a dictionary of frequently occurring templates from training images, so that each training image can be represented by a small number of templates automatically selected from the learned dictionary. The compositional sparse code approximates the raw image of a large number of pixel intensities using a small number of templates, thus facilitating the signal-to-symbol transition and allowing a symbolic description of the image. The current form of our model consists of two layers of representational units (basis functions and shape templates). It is possible to extend it to multiple layers of hierarchy. Experiments show that our method is capable of learning meaningful compositional sparse code, and the learned templates are useful for image classification.
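The selection step of the original sparse coding framework, in which a few basis functions are chosen from a dictionary and linearly combined to approximate a signal, can be illustrated with a greedy matching pursuit loop. This is only a minimal sketch under stated assumptions: the abstract does not say which selection algorithm the authors use, the toy dictionary and signal are hypothetical, the atoms are assumed to have unit norm, and the names `dot` and `matching_pursuit` are illustrative rather than taken from the paper.

```python
# Illustrative matching pursuit in plain Python; not the authors' implementation.

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(signal, dictionary, n_atoms):
    """Greedily pick up to n_atoms unit-norm atoms to approximate signal.

    Returns a list of (atom_index, coefficient) pairs and the final residual.
    """
    residual = list(signal)
    code = []
    for _ in range(n_atoms):
        # Select the atom with the largest absolute response to the residual.
        responses = [dot(residual, atom) for atom in dictionary]
        best = max(range(len(dictionary)), key=lambda k: abs(responses[k]))
        coef = responses[best]
        if coef == 0.0:
            break  # residual is orthogonal to every atom
        code.append((best, coef))
        # Subtract the selected component from the residual.
        residual = [r - coef * a for r, a in zip(residual, dictionary[best])]
    return code, residual


# Toy dictionary of three unit-norm atoms in R^3 (hypothetical values).
dictionary = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
signal = [3.0, 0.0, -2.0]
code, residual = matching_pursuit(signal, dictionary, n_atoms=2)
```

Here `code` is the sparse representation: a short list of selected atoms and their coefficients. In the compositional setting described in the abstract, the selected units would be shape templates composed of such basis functions rather than the individual basis functions themselves.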

References

M. Aharon, M. Elad, and A. M. Bruckstein. The K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54, 4311-4322, 2006.

C. C. Chang and C. J. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2, 1-27, 2011.

Scott Shaobing Chen, David L. Donoho, and Michael A. Saunders, Atomic decomposition by basis pursuit, SIAM J. Sci. Comput. 20 (1998), no. 1, 33–61. MR 1639094, DOI https://doi.org/10.1137/S1064827596304010

J. Chen and X. Huo. Sparse representations for multiple measurement vectors (MMV) in an overcomplete dictionary. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, 4, 257–260, 2005.

G. Csurka, C. Dance, L. Fan, J. Willamowski, and C. Bray. Visual categorization with bags of keypoints. Workshop of European Conference on Computer Vision, 2004.

J. Daugman. Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. Journal of the Optical Society of America A, 2, 1160-1169, 1985.

A. P. Dempster, N. M. Laird, and D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. Roy. Statist. Soc. Ser. B 39 (1977), no. 1, 1–38. With discussion. MR 501537

David L. Donoho and Xiaoming Huo, Uncertainty principles and ideal atomic decomposition, IEEE Trans. Inform. Theory 47 (2001), no. 7, 2845–2862. MR 1872845, DOI https://doi.org/10.1109/18.959265

R. E. Fan, K. W. Chang, C. J. Hsieh, X. R. Wang, and C. J. Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9, 1871-1874, 2008.

L. Fei-Fei, R. Fergus, and P. Perona. Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. CVPR Workshop, 2004.

V. Ferrari, F. Jurie, and C. Schmid. From images to shape models for object detection. International Journal of Computer Vision, 87, 284–303, 2010.

S. Fidler, M. Boben, and A. Leonardis. Similarity-based cross-layered hierarchical representation for object categorization. IEEE Conference on Computer Vision and Pattern Recognition, 2008.

Chris Fraley and Adrian E. Raftery, Model-based clustering, discriminant analysis, and density estimation, J. Amer. Statist. Assoc. 97 (2002), no. 458, 611–631. MR 1951635, DOI https://doi.org/10.1198/016214502760047131

Jerome H. Friedman, Exploratory projection pursuit, J. Amer. Statist. Assoc. 82 (1987), no. 397, 249–266. MR 883353

Stuart Geman, Daniel F. Potter, and Zhiyi Chi, Composition systems, Quart. Appl. Math. 60 (2002), no. 4, 707–736. MR 1939008, DOI https://doi.org/10.1090/qam/1939008

Y. Jin and S. Geman. Context and hierarchy in a probabilistic image model. IEEE Conference on Computer Vision and Pattern Recognition, 2006.

S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. IEEE Conference on Computer Vision and Pattern Recognition, 2006.

H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. International Conference on Machine Learning, 2009.

C. Lo, Amazing Chinese Characters, Panda Media Co., 2002.

K. Lounici, A. B. Tsybakov, M. Pontil, and S. A. van de Geer. Taking advantage of sparsity in multi-task learning. Proceedings of the 22nd Conference on Learning Theory, 2009.

D. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60, 91–110, 2004.

S. Mallat and Z. Zhang. Matching pursuit in a time-frequency dictionary. IEEE Transactions on Signal Processing, 41, 3397-3415, 1993.

M. Marszalek and C. Schmid. Accurate object localization with shape masks. IEEE Conference on Computer Vision and Pattern Recognition, 2007.

Boris Mirkin, Mathematical classification and clustering, Nonconvex Optimization and its Applications, vol. 11, Kluwer Academic Publishers, Dordrecht, 1996. MR 1480413

Guillaume Obozinski, Martin J. Wainwright, and Michael I. Jordan, Support union recovery in high-dimensional multivariate regression, Ann. Statist. 39 (2011), no. 1, 1–47. MR 2797839, DOI https://doi.org/10.1214/09-AOS776

B. A. Olshausen and D. J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381, 607-609, 1996.

B. A. Olshausen, P. Sallee, and M. S. Lewicki. Learning sparse image codes using a wavelet pyramid architecture. Advances in Neural Information Processing Systems, 13, 887-893, 2001.

S. D. Pietra, V. D. Pietra, and J. Lafferty. Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19, 380-393, 1997.

M. Riesenhuber and T. Poggio. Hierarchical models of object recognition in cortex. Nature Neuroscience, 2, 1019–1025, 1999.

Jorma Rissanen, Information and complexity in statistical modeling, Information Science and Statistics, Springer, New York, 2007. MR 2287233

Gideon Schwarz, Estimating the dimension of a model, Ann. Statist. 6 (1978), no. 2, 461–464. MR 468014

J. Tropp, A. Gilbert, and M. Strauss. Algorithms for simultaneous sparse approximation. Part I: Greedy pursuit. Signal Processing, 86, 572–588, 2006.

Robert Tibshirani, Regression shrinkage and selection via the lasso, J. Roy. Statist. Soc. Ser. B 58 (1996), no. 1, 267–288. MR 1379242

Vladimir N. Vapnik, The nature of statistical learning theory, 2nd ed., Statistics for Engineering and Information Science, Springer-Verlag, New York, 2000. MR 1719582

Z. Si and S. C. Zhu. Learning and-or templates for object modeling and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, to appear, 2013.

Ying Nian Wu, Cheng-En Guo, and Song-Chun Zhu, From information scaling of natural images to regimes of statistical models, Quart. Appl. Math. 66 (2008), no. 1, 81–122. MR 2396653, DOI https://doi.org/10.1090/S0033-569X-07-01063-2

Ying Nian Wu, Zhangzhang Si, Haifeng Gong, and Song-Chun Zhu, Learning active basis model for object detection and recognition, Int. J. Comput. Vis. 90 (2010), no. 2, 198–235. MR 2719010, DOI https://doi.org/10.1007/s11263-009-0287-0

Ming Yuan and Yi Lin, Model selection and estimation in regression with grouped variables, J. R. Stat. Soc. Ser. B Stat. Methodol. 68 (2006), no. 1, 49–67. MR 2212574, DOI https://doi.org/10.1111/j.1467-9868.2005.00532.x

M. Zeiler, G. Taylor, and R. Fergus. Adaptive deconvolutional networks for mid and high level feature learning. International Conference on Computer Vision, 2011.

L. Zhu, C. Lin, H. Huang, Y. Chen, and A. Yuille. Unsupervised structure learning: hierarchical recursive composition, suspicious coincidence and competitive exclusion. European Conference on Computer Vision, 2008.

S. C. Zhu, C. Guo, Y. Wang, and Z. Xu. What are textons? International Journal of Computer Vision, 62, 121–143, 2005.

S. C. Zhu and D. B. Mumford. A stochastic grammar of images. Foundations and Trends in Computer Graphics and Vision, 2, 259–362, 2006.

S. C. Zhu, Y. N. Wu, and D. B. Mumford. Minimax entropy principle and its application to texture modeling. Neural Computation, 9, 1627-1660, 1997.