Deformable classifiers

Authors: Jiajun Shen and Yali Amit
Journal: Quart. Appl. Math. 77 (2019), 207-226
MSC (2010): Primary 62H35
DOI: https://doi.org/10.1090/qam/1525
Published electronically: January 18, 2019
MathSciNet review: 3932959

Abstract: Geometric variations of objects that do not change the object class pose a major challenge for object recognition. These variations may be rigid or non-rigid transformations. In this paper, we design a framework for training deformable classifiers in which latent transformation variables are introduced, and a transformation of the object image to a reference instantiation is computed in terms of the classifier output, separately for each class. The classifier outputs for each class, after transformation, are compared to yield the final decision. As a by-product of the classification, this yields a transformation of the input object to a reference pose, which can be used for downstream tasks such as the computation of object support. We apply a two-step training mechanism for our framework, which alternates between optimizing over the latent transformation variables and over the classifier parameters to minimize the loss function. We show that multilayer perceptrons, also known as deep networks, are well suited for this approach and achieve state-of-the-art results on the rotated MNIST and Google Earth datasets, and produce competitive results on MNIST and CIFAR-10 when trained on smaller subsets of the training data.
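
The scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and makes strong simplifying assumptions: a linear classifier stands in for the deep networks used in the paper, the latent transformation ranges over only the four 90-degree rotations, and the loss is taken to be softmax cross-entropy. All function names (candidate_transforms, best_per_class, predict, train) are hypothetical.

```python
import numpy as np


def candidate_transforms(image):
    """Assumed finite transformation set: the four 90-degree rotations."""
    return [np.rot90(image, k) for k in range(4)]


def best_per_class(weights, image):
    """For each class, pick the transformation that maximises its score.

    Returns the flattened candidates (n_transforms, d), the chosen
    transformation index per class, and the resulting score per class.
    """
    stack = np.stack([t.ravel() for t in candidate_transforms(image)])
    scores = weights @ stack.T  # shape (n_classes, n_transforms)
    best = scores.argmax(axis=1)
    return stack, best, scores[np.arange(len(weights)), best]


def predict(weights, image):
    """Compare the per-class optimised scores; also return the chosen pose."""
    _, best, scores = best_per_class(weights, image)
    label = int(scores.argmax())
    return label, int(best[label])


def train(images, labels, n_classes, epochs=20, lr=0.1, seed=0):
    """Alternate between latent transformations and classifier weights."""
    rng = np.random.default_rng(seed)
    weights = 0.01 * rng.standard_normal((n_classes, images[0].size))
    for _ in range(epochs):
        for image, y in zip(images, labels):
            # Step 1: weights fixed, optimise the latent transformation
            # separately for each class.
            stack, best, scores = best_per_class(weights, image)
            # Step 2: transformations fixed, take one gradient step on the
            # softmax cross-entropy loss (an assumed choice of loss).
            probs = np.exp(scores - scores.max())
            probs /= probs.sum()
            probs[y] -= 1.0
            weights -= lr * probs[:, None] * stack[best]
    return weights


# Toy data (assumption): class 0 is a corner blob seen at random rotations,
# class 1 is a rotation-symmetric centre blob.
corner = np.zeros((6, 6))
corner[:2, :2] = 1.0
centre = np.zeros((6, 6))
centre[2:4, 2:4] = 1.0
images = [np.rot90(corner, k) for k in range(4)] + [centre] * 4
labels = [0] * 4 + [1] * 4

weights = train(images, labels, n_classes=2)
for k in range(4):
    label, pose = predict(weights, np.rot90(corner, k))
    print(f"input rotation {k}: predicted class {label}, chosen transform {pose}")
```

In this sketch the transformation index selected for the winning class plays the role of the by-product mentioned in the abstract: applying it to the input maps the object to the pose the classifier has adopted as its reference.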
