
What Is an Equivariant Neural Network?

Lek-Heng Lim
Bradley J. Nelson

Communicated by Notices Associate Editor Emilie Purvine

We explain equivariant neural networks, a notion underlying breakthroughs in machine learning from deep convolutional neural networks for computer vision [KSH12] to AlphaFold 2 for protein structure prediction [JEP21], without assuming knowledge of equivariance or neural networks. The basic mathematical ideas are simple but are often obscured by engineering complications that come with practical realizations. We extract and focus on the mathematical aspects, and limit ourselves to a cursory treatment of the engineering issues at the end. We also include some materials with machine learning practitioners in mind.

Let $X$ and $Y$ be sets, and $f\colon X \to Y$ a function. If a group $G$ acts on both $X$ and $Y$, and this action commutes with the function $f$:

$$f(g \cdot x) = g \cdot f(x) \quad \text{for all } g \in G,\ x \in X, \tag{1}$$

then we say that $f$ is $G$-equivariant. The special case where $G$ acts trivially on $Y$ is called $G$-invariant. Linear equivariant maps are well-studied in representation theory and continuous equivariant maps are well-studied in topology. The novelty of equivariant neural networks is that they are usually nonlinear and sometimes discontinuous, even when $X$ and $Y$ are vector spaces and the actions of $G$ are linear.

Equivariance is ubiquitous in applications where symmetries in the input space $X$ produce symmetries in the output space $Y$. We consider a simple example. An image may be regarded as a function $v\colon \mathbb{R}^2 \to \mathbb{R}^3$, with each pixel $(x,y) \in \mathbb{R}^2$ assigned some RGB color $v(x,y) \in [0,1]^3$. A simplifying assumption here is that pixels and colors can take values in a continuum. Let $X$ be the set of all images. Let the group $G = \{1, \tau\} \cong \mathbb{Z}/2$ act on $X$ via top-bottom reflection, i.e., $\tau \cdot v$ is the image whose value at $(x,y)$ is $v(x,-y)$. Let $\Phi\colon X \to X$,

$$\Phi(v)(x,y) = \begin{cases} (1,1,1) & \text{if } v(x,y) = (1,1,1),\\ (0,0,0) & \text{if } v(x,y) \ne (1,1,1).\end{cases}$$

[Graphic: butterfly image illustrating the decoloring map.]

Here $(0,0,0)$ and $(1,1,1)$ are the RGB encodings for pitch black and pure white respectively. So the map $\Phi\colon X \to X$, $v \mapsto \Phi(v)$, transforms a color image into a black-and-white image. It does not matter whether we do a top-bottom reflection first or remove color first; the result is always the same, i.e., $\Phi(\tau \cdot v) = \tau \cdot \Phi(v)$ for all $v \in X$. Hence the decoloring map $\Phi$ is $G$-equivariant.
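For readers who prefer to see this computationally, the following minimal NumPy sketch checks the same commutation on a discretized image; the function names, the $4\times 4$ test array, and the use of NumPy are our own illustrative choices, not anything prescribed above.

```python
import numpy as np

def decolor(img):
    """Send every non-white pixel to pitch black and keep pure white pixels white."""
    white = np.all(img == 1.0, axis=-1)   # (H, W) mask of pure-white pixels
    out = np.zeros_like(img)              # everything becomes black...
    out[white] = 1.0                      # ...except the white pixels
    return out

def flip_tb(img):
    """Top-bottom reflection of an image stored as an (H, W, 3) array."""
    return img[::-1, :, :]

rng = np.random.default_rng(0)
img = rng.choice([0.0, 0.25, 0.5, 1.0], size=(4, 4, 3))  # a tiny random "image"
img[0, 0] = 1.0                                          # ensure at least one pure-white pixel

# Equivariance: decoloring commutes with the top-bottom reflection.
assert np.array_equal(decolor(flip_tb(img)), flip_tb(decolor(img)))
```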

Our choice of an image with left-right symmetry presents another opportunity to illustrate the notion. If we choose coordinates so that the vertical axis passes through the center of the butterfly image, then as a function $v\colon \mathbb{R}^2 \to \mathbb{R}^3$, it is invariant under the action of $\mathbb{Z}/2$ on $\mathbb{R}^2$ via $(x,y) \mapsto (-x,y)$, i.e., $v(-x,y) = v(x,y)$. Note that the $G$-equivariance of $\Phi$ has nothing to do with this.

[Graphic: the butterfly image, which is symmetric about its vertical axis.]

A takeaway of these examples is that nonlinear and discontinuous functions may very well be equivariant. However, the best known context for discussing equivariant maps is when $f$ is an intertwining operator, i.e., a linear map between vector spaces $V$ and $W$ equipped with a linear action of $G$. In this case, an equivalent formulation of $G$-equivariance takes the following form: Given linear representations of $G$ on $V$ and $W$, i.e., homomorphisms $\rho_V\colon G \to \mathrm{GL}(V)$ and $\rho_W\colon G \to \mathrm{GL}(W)$, a linear map $f\colon V \to W$ is said to be $G$-equivariant if

$$f\bigl(\rho_V(g)v\bigr) = \rho_W(g)f(v) \quad\text{for all } g \in G,\ v \in V.$$

Intertwining operators preserve eigenvalues and, when $G$ is a Lie group, the action of its Lie algebra, properties that are crucial to their use in physics [BH10].

Nevertheless the restriction to linear maps is unnecessary. The de Rham problem asks: if $V = W = \mathbb{R}^n$ and $f$ is merely required to be a homeomorphism, does condition (1) imply that $f$ must be a linear map? De Rham conjectured this to be the case but it was disproved in [CS81], launching a fruitful study of nonlinear similarity, i.e., nonlinear homeomorphisms $f$ with

$$f \circ \rho_V(g) = \rho_W(g) \circ f \quad\text{for all } g \in G,$$

in algebraic topology and algebraic K-theory. More generally, equivariant continuous maps under continuous group actions have been thoroughly studied in equivariant topology [May96].

An equivariant neural network [CW16] is an equivariant map constructed from alternately composing equivariant linear maps with nonlinear ones like the decoloring map above. That neural networks can be readily made equivariant is a consequence of two straightforward observations:

(i) the composition of two $G$-equivariant functions $f_1\colon X \to Y$, $f_2\colon Y \to Z$ is $G$-equivariant;

(ii) the linear combination of two $G$-equivariant functions $f_1, f_2\colon X \to Y$ is $G$-equivariant;

even when $f_1$, $f_2$ are nonlinear. Although an equivariant neural network is nonlinear, it uses intertwining operators as building blocks, and (1) plays a key role. In some applications, like the wave function $\psi$ discussed below, the input or possibly some hidden layers may not be vector spaces; for simplicity we assume that they are and their $G$-actions are linear.
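To spell out why (i) holds, here is the one-line verification, writing $g\cdot$ for the $G$-action on each of $X$, $Y$, $Z$:

$$(f_2 \circ f_1)(g\cdot x) = f_2\bigl(g\cdot f_1(x)\bigr) = g\cdot f_2\bigl(f_1(x)\bigr) = g\cdot (f_2 \circ f_1)(x).$$

Observation (ii) is just as direct once $Y$ is a vector space on which $G$ acts linearly: $(\lambda f_1 + \mu f_2)(g\cdot x) = \lambda\, g\cdot f_1(x) + \mu\, g\cdot f_2(x) = g\cdot(\lambda f_1 + \mu f_2)(x)$.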

In machine learning applications, the map $f$ is learned from data. A major advantage of requiring equivariance in a neural network is that it allows one to greatly narrow down the search space for the parameters that define $f$. To demonstrate this, we begin with a simplified case that avoids group representations. A feed-forward neural network is a function $f\colon \mathbb{R}^n \to \mathbb{R}^n$ obtained by alternately composing affine maps $\alpha_i\colon \mathbb{R}^n \to \mathbb{R}^n$, $i = 1, \dots, k$, with a nonlinear function $\sigma\colon \mathbb{R} \to \mathbb{R}$, i.e.,

$$f = \alpha_k \circ \sigma \circ \alpha_{k-1} \circ \sigma \circ \cdots \circ \alpha_2 \circ \sigma \circ \alpha_1,$$

giving $f\colon \mathbb{R}^n \to \mathbb{R}^n$. The depth, also known as the number of layers, is $k$ and the width, also known as the number of neurons, is $n$. The simplifying assumption, which will be dropped later, is that our neural network has constant width $n$ throughout all layers. The nonlinear function $\sigma$ is called an activation, with the ReLU (rectified linear unit) function $\sigma(x) = \max(x, 0)$ for $x \in \mathbb{R}$ a standard choice. In a slight abuse of notation, the activation is extended to vector inputs by evaluating coordinatewise:

$$\sigma(x_1, \dots, x_n) = \bigl(\sigma(x_1), \dots, \sigma(x_n)\bigr). \tag{2}$$

In this sense, $\sigma$ is called a pointwise nonlinearity. The affine function $\alpha_i$ is defined by $\alpha_i(x) = A_i x + b_i$ for some $A_i \in \mathbb{R}^{n \times n}$ called the weight matrix and some $b_i \in \mathbb{R}^n$ called the bias vector. We do not include a bias in the last layer.
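As a concrete, if toy, illustration of this constant-width architecture, here is a minimal NumPy sketch; the width, depth, and random parameters below are placeholder choices of ours, not anything taken from the text.

```python
import numpy as np

def relu(x):
    """The ReLU activation, applied coordinatewise as a pointwise nonlinearity."""
    return np.maximum(x, 0.0)

def feed_forward(x, weights, biases):
    """Alternate the affine maps x -> A @ x + b with the pointwise ReLU.
    No activation and no bias are applied after the last affine map."""
    for A, b in zip(weights[:-1], biases):
        x = relu(A @ x + b)
    return weights[-1] @ x

rng = np.random.default_rng(1)
n, k = 5, 3                                        # constant width n, depth k
weights = [rng.standard_normal((n, n)) for _ in range(k)]
biases = [rng.standard_normal(n) for _ in range(k - 1)]
y = feed_forward(rng.standard_normal(n), weights, biases)
```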

Although convenient, it is somewhat misguided to lump the bias and weight together in an affine function. Each bias is intended to serve as a threshold for the activation and should be part of it, detached from the weight that transforms the input. If one would like to incorporate translations, one may do so by going up one dimension, observing that

$$\begin{bmatrix} A & b \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ 1 \end{bmatrix} = \begin{bmatrix} Ax + b \\ 1 \end{bmatrix}.$$

Hence, a better, but mathematically equivalent, description of $f$ would be as the composition

$$f = A_k \circ \sigma_{b_{k-1}} \circ A_{k-1} \circ \cdots \circ A_2 \circ \sigma_{b_1} \circ A_1,$$

where we identify $A_i \in \mathbb{R}^{n\times n}$ with the linear operator $\mathbb{R}^n \to \mathbb{R}^n$, $x \mapsto A_i x$, and for any $b \in \mathbb{R}^n$ we define $\sigma_b\colon \mathbb{R}^n \to \mathbb{R}^n$ by $\sigma_b(x) = \sigma(x - b)$. We will drop the composition symbol $\circ$ to avoid clutter and write

$$f = A_k \sigma_{b_{k-1}} A_{k-1} \cdots A_2 \sigma_{b_1} A_1$$

as if it were a product of matrices. For example, with the aforementioned ReLU as $\sigma$,

$$\sigma_b(x) = \bigl(\max(x_1 - b_1, 0), \dots, \max(x_n - b_n, 0)\bigr), \tag{3}$$

and $b$ plays the role of a threshold for activation as was intended in [Ros58, p. 392] and [MP43, p. 120].

A major computational issue with neural networks is the large number of unknown parameters, namely the entries of the weights $A_i$ and biases $b_i$, that have to be fit with data, especially for wide neural networks where $n$ is large. To get an idea of the numbers involved in realistic situations, $n$ may be on the order of millions of pixels for image-based tasks, whereas the depth $k$ is typically tens to hundreds of layers. Computational cost aside, one may not have enough data to fit so many parameters. Thus, many successful applications of neural networks require that we identify, based on the problem at hand, an appropriate low-dimensional subset of $\mathbb{R}^{n\times n}$ from which we will find our weights $A_i$. For example, for a signal processing problem, we might restrict $A_i$ to be Toeplitz matrices; the convolutional neural networks for image recognition in [KSH12], an article that launched the deep learning revolution, essentially restrict $A_i$ to so-called block-Toeplitz–Toeplitz-block or BTTB matrices. For 1D inputs with a single channel, i.e., inputs from $\mathbb{R}^n$, a general weight matrix requires $n^2$ parameters, whereas a Toeplitz one just needs $2n - 1$ parameters and an $m$-banded Toeplitz one requires just $2m + 1$. For 2D inputs with $c$ channels such as color images, i.e., inputs from $\mathbb{R}^{n \times n \times c}$, a general weight matrix would have required a staggering $n^4 c^2$ parameters, whereas a BTTB one just needs $(2n-1)^2 c^2$, and a local BTTB one with $(2m+1) \times (2m+1)$ bands requires just $(2m+1)^2 c^2$. These are all simplified versions of the convolutional layers in a convolutional neural network, which are a quintessential example of equivariant neural networks [CW16], and in fact every equivariant neural network may be regarded as a generalized convolutional neural network in an appropriate sense [KT18].
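To make the 1D parameter count concrete, here is a small sketch of our own (assuming SciPy is available) that builds a banded Toeplitz weight matrix acting on $\mathbb{R}^8$ from three free parameters, in contrast to the 64 entries of a dense weight matrix.

```python
import numpy as np
from scipy.linalg import toeplitz

n, m = 8, 1                              # input length and band half-width
kernel = np.array([0.25, 0.5, 0.25])     # the 2*m + 1 free parameters

# Embed the kernel in an m-banded Toeplitz weight matrix acting on R^n.
col = np.zeros(n); col[:m + 1] = kernel[m:]       # first column
row = np.zeros(n); row[:m + 1] = kernel[m::-1]    # first row
A = toeplitz(col, row)

print(A.size)        # 64 entries in a dense n-by-n weight matrix
print(kernel.size)   # but only 2*m + 1 = 3 free parameters here
```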

To see how equivariance naturally restricts the range of possible $A_i$, let $G \subseteq \mathrm{GL}(n, \mathbb{R})$ be a matrix group. Then $f$ is $G$-equivariant if

$$f(gv) = gf(v) \quad\text{for all } g \in G,\ v \in \mathbb{R}^n, \tag{4}$$

and an equivariant neural network is simply a feed-forward neural network $f$ that satisfies (4). The key to its construction is just that

$$f(gv) = A_k \sigma_{b_{k-1}} A_{k-1} \cdots A_2 \sigma_{b_1} A_1 g v,$$

and the last expression equals $gf(v)$ if we have

$$A_i g = g A_i \quad\text{and}\quad \sigma_{b_j} \circ g = g \circ \sigma_{b_j} \tag{5}$$

for all $g \in G$, for $i = 1, \dots, k$, and for $j = 1, \dots, k-1$. The condition on the right is satisfied by any pointwise nonlinearity that takes the form in (3), i.e., has all coordinates of $b_j$ equal to some $\beta \in \mathbb{R}$; we will elaborate on this later. The condition on the left limits the possible weights $A_i$ for $f$ to a (generally) much smaller subspace of matrices that commute with all elements of $G$. Finding this subspace (in fact a subalgebra) of intertwining operators,

$$\{A \in \mathbb{R}^{n \times n} : Ag = gA \text{ for all } g \in G\}, \tag{6}$$

is a well-studied problem in group representation theory; a general purpose approach is to compute the null space of a matrix built from the generators of $G$ and, if $G$ is continuous, its Lie algebra [FWW21]. We caution the reader that $G$ will generally be a very low-dimensional subset of $\mathrm{GL}(n, \mathbb{R})$, as will become obvious from our example below in (8). It will be pointless to pick, say, $G = \mathrm{GL}(n, \mathbb{R})$, as the set in (6) will then be just $\{\lambda I : \lambda \in \mathbb{R}\}$, clearly too small to serve as meaningful weights for any neural network. Indeed, such a matrix group will usually arise as the homomorphic image of a representation $\rho\colon G \to \mathrm{GL}(n, \mathbb{R})$ of an abstract group $G$, i.e., the image $\rho(G)$ will play the role of the matrix group in (6). In any case, we will need to bring in group representations to address a different issue.
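The null space computation alluded to above can be sketched in a few lines; the example below is our own, taking the group to be the cyclic group of order 4 acting on $\mathbb{R}^4$ by cyclic shifts, whose commutant (6) is exactly the 4-dimensional algebra of circulant matrices.

```python
import numpy as np
from scipy.linalg import null_space

n = 4
g = np.roll(np.eye(n), 1, axis=0)      # cyclic shift: a generator of C_4 as permutation matrices

# The constraint A g - g A = 0 is linear in A; vectorize it (column-major) as
# (g^T kron I - I kron g) vec(A) = 0 and compute the null space.
M = np.kron(g.T, np.eye(n)) - np.kron(np.eye(n), g)
basis = null_space(M)                  # each column is a vectorized commuting matrix

print(basis.shape[1])                  # 4, the dimension of the circulant (intertwiner) algebra
A = basis[:, 0].reshape(n, n, order="F")
assert np.allclose(A @ g, g @ A)
```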

In general, neural networks have different widths in each layer:

$$f = A_k \sigma_{b_{k-1}} A_{k-1} \cdots A_2 \sigma_{b_1} A_1\colon \mathbb{R}^{n_0} \to \mathbb{R}^{n_k},$$

with $A_i \in \mathbb{R}^{n_i \times n_{i-1}}$, $b_i \in \mathbb{R}^{n_i}$, $i = 1, \dots, k$, $n_0, n_1, \dots, n_k \in \mathbb{N}$. The simplified case treated above assumes that $n_0 = n_1 = \cdots = n_k = n$. It is easy to accommodate this slight complication by introducing group representations to equip every layer with its own homomorphic copy of $G$. Instead of fixing $G$ to be some subgroup of $\mathrm{GL}(n, \mathbb{R})$, $G$ may now be any abstract group but we introduce a homomorphism

$$\rho_i\colon G \to \mathrm{GL}(n_i, \mathbb{R})$$

in each layer, and replace the equivariant condition (5) with the more general (1), i.e.,

$$f\bigl(\rho_0(g)v\bigr) = \rho_k(g)f(v) \quad\text{for all } g \in G,\ v \in \mathbb{R}^{n_0},$$

or, equivalently,

$$A_i \rho_{i-1}(g) = \rho_i(g) A_i \quad\text{and}\quad \sigma_{b_i} \circ \rho_i(g) = \rho_i(g) \circ \sigma_{b_i} \tag{7}$$

for all $g \in G$. In case (7) evokes memories of Schur's Lemma, we would like to stress that the representations $\rho_i$ are in general very far from being irreducible and that the map $f$ is nonlinear. Indeed the scenario described by Schur's Lemma is undesirable for equivariant neural networks: As we pointed out earlier, we do not want to restrict our weight matrices to the form $\lambda I$ or a direct sum of these.

We summarize our discussion with a formal definition.

Definition 1.

Let $k, n_0, n_1, \dots, n_k \in \mathbb{N}$, $A_i \in \mathbb{R}^{n_i \times n_{i-1}}$, $i = 1, \dots, k$, $b_i \in \mathbb{R}^{n_i}$, $i = 1, \dots, k-1$, and $\sigma\colon \mathbb{R} \to \mathbb{R}$ be a continuous function. Let $G$ be a group and $\rho_i\colon G \to \mathrm{GL}(n_i, \mathbb{R})$, $i = 0, 1, \dots, k$, be its representations. The $k$-layer feed-forward neural network given by

$$f = A_k \sigma_{b_{k-1}} A_{k-1} \cdots A_2 \sigma_{b_1} A_1\colon \mathbb{R}^{n_0} \to \mathbb{R}^{n_k}$$

is a $G$-equivariant neural network with respect to $\rho_0, \rho_1, \dots, \rho_k$ if (7) holds for all $i = 1, \dots, k$. Here $\sigma_{b_i}\colon \mathbb{R}^{n_i} \to \mathbb{R}^{n_i}$, $i = 1, \dots, k-1$, is a pointwise nonlinearity as in (2).
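Here is a toy sanity check of Definition 1, entirely of our own devising: take $G = S_2$ acting by the swap permutation representation in every layer of width 2, weights from the commutant (6) of that representation (matrices of the form $aI + b\,P_{\mathrm{swap}}$), and a bias with equal coordinates.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

swap = np.array([[0.0, 1.0], [1.0, 0.0]])      # the nontrivial element of S_2 acting on R^2

def commuting_weight(a, b):
    """A weight matrix in the commutant (6) of S_2: it commutes with the swap."""
    return a * np.eye(2) + b * swap

A1, A2 = commuting_weight(1.0, 2.0), commuting_weight(-0.5, 3.0)
b1 = np.array([0.7, 0.7])                      # bias with all coordinates equal, as in (3)

def f(v):
    """A 2-layer equivariant network in the sense of Definition 1."""
    return A2 @ relu(A1 @ v - b1)

v = np.array([2.0, -1.0])
assert np.allclose(f(swap @ v), swap @ f(v))   # the equivariance condition (4) holds
```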

Figure 1. A $k$-layer $\sigma$-activated feed-forward neural network, also known as a multilayer perceptron.

[Figure 1 graphic not shown.]

A word of caution is in order here. What we call a neural network [MP43], i.e., the alternate composition of activations with affine maps, is sometimes also called a multilayer perceptron [Ros58]; a standard depiction is shown in Figure 1. When it is fit with data, one would invariably feed its output into a loss function, and that is usually not equivariant; or one might chain together multiple units of multilayer perceptrons into larger frameworks like autoencoders, generative adversarial networks, transformers, etc., that contain other nonequivariant components. In the literature, the term “neural network” sometimes refers to the entire framework collectively. In our article, it just refers to the multilayer perceptron, which is the part that is equivariant.

We will use an insightful toy example as illustration. Let $X = (\mathbb{R}^3)^n$ be the set of possible positions of $n$ unit-weight masses, $x_1, \dots, x_n \in \mathbb{R}^3$, and compute the center of mass

$$f(x_1, \dots, x_n) = \frac{1}{n}\sum_{i=1}^n x_i, \tag{8}$$

with $Y = \mathbb{R}^3$. We use the same system of coordinates for each copy of $\mathbb{R}^3$ in $X$ and $Y$. If we work in a different coordinate system, the position of the center of mass remains unchanged but its coordinates will change accordingly. For simplicity, we consider a linear change of coordinates, represented by the action of a matrix $A \in \mathrm{GL}(3, \mathbb{R})$ on each point in $\mathbb{R}^3$. By linearity,

$$f(Ax_1, \dots, Ax_n) = \frac{1}{n}\sum_{i=1}^n Ax_i = A f(x_1, \dots, x_n),$$

so $f$ is $\mathrm{GL}(3, \mathbb{R})$-equivariant. Since each mass has the same unit weight, $f$ is also invariant under permutations of the input points. Let $\pi \in S_n$, which acts on $X$ via $\pi \cdot (x_1, \dots, x_n) = (x_{\pi^{-1}(1)}, \dots, x_{\pi^{-1}(n)})$ and acts trivially on $Y$ via $\pi \cdot y = y$. As the sum in (8) is permutation invariant,

$$f(x_{\pi^{-1}(1)}, \dots, x_{\pi^{-1}(n)}) = f(x_1, \dots, x_n),$$

so $f$ is $S_n$-invariant. Combining our two group actions, we see that $f$ is $\mathrm{GL}(3, \mathbb{R}) \times S_n$-equivariant. Note that the group here is $\mathrm{GL}(3, \mathbb{R}) \times S_n$, which has much lower dimension than $\mathrm{GL}(3n, \mathbb{R})$ for large $n$. This is typical in equivariant neural networks.
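A quick numerical check of both symmetries of (8); the array shapes and random test data below are our own choices.

```python
import numpy as np

def center_of_mass(points):
    """Center of mass of n unit-weight points stored as an (n, 3) array; see (8)."""
    return points.mean(axis=0)

rng = np.random.default_rng(2)
pts = rng.standard_normal((5, 3))     # n = 5 points in R^3
A = rng.standard_normal((3, 3))       # a (generically invertible) change of coordinates
perm = rng.permutation(5)

# GL(3)-equivariance: changing coordinates of the points changes those of the center of mass.
assert np.allclose(center_of_mass(pts @ A.T), A @ center_of_mass(pts))
# S_n-invariance: relabeling the points leaves the center of mass unchanged.
assert np.allclose(center_of_mass(pts[perm]), center_of_mass(pts))
```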

In this simple example, we not only know $f$ but have an explicit expression for it. In general, there are many functions that we know should be equivariant or invariant to certain group actions, but for which we do not know any simple closed-form expression; and this is where it helps to assume that $f$ is given by some neural network whose parameters could be determined by fitting it with data, or, if it is used as an ansatz, by plugging into some differential or integral equations. A simple data-fitting example is provided by semantic segmentation in images, which seeks to classify pixels as belonging to one of several types of objects. If we rotate or mirror an image, we expect that the pixel labels should follow the pixels. A more realistic version of the center of mass example would be a molecule represented by positions of its atoms, which comes up in chemical property or drug response predictions. Here we want equivariance with respect to coordinate transformations, but we wish to preserve pairwise distances between atoms and chirality, so the natural group to use is $\mathrm{SO}(3)$ [KLT18] or the special Euclidean group $\mathrm{SE}(3)$ [WGW18, FWFW20]. The much-publicized protein structure prediction engine of DeepMind's AlphaFold 2 relies on an $\mathrm{SE}(3)$-equivariant neural network and an $\mathrm{SE}(3)$-invariant attention module [JEP21]. In [TEW21], $\mathrm{SE}(3)$-equivariant convolution is used to improve accuracy assessments of RNA structure models.

Another straightforward example comes from computational quantum chemistry, where one seeks a solution to a Schrödinger equation: if we write $x_1, \dots, x_n \in \mathbb{R}^3$ for the positions of $n$ particles, then the wave function $\psi$ of $n$ identical spin-$\frac{1}{2}$ fermions is antisymmetric, i.e.,

$$\psi(x_{\pi(1)}, \dots, x_{\pi(n)}) = \operatorname{sgn}(\pi)\,\psi(x_1, \dots, x_n)$$

for all $\pi \in S_n$. In other words, the increasingly popular antisymmetric neural networks [HSN20] are $S_n$-equivariant neural networks. Even without going into the details, the reader could well imagine that restricting to neural networks that are antisymmetric is a savings compared with having to consider all possible neural networks. More esoteric examples in particle physics call for Lorentz groups of various stripes like $\mathrm{O}(1,3)$, $\mathrm{SO}(1,3)$, $\mathrm{SO}^+(1,3)$, or $\mathrm{SL}(2,\mathbb{C})$, which are used in Lorentz-equivariant neural networks to identify top quarks in data from high-energy physics experiments [BAO20].
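To make the antisymmetry constraint tangible, here is a brute-force sketch of our own that antisymmetrizes an arbitrary function of three particles by summing over all $3! = 6$ permutations with signs; practical antisymmetric networks such as those in [HSN20] achieve the same constraint far more efficiently with determinant-based constructions.

```python
import numpy as np
from itertools import permutations

def antisymmetrize(phi, xs):
    """Sum phi over all permutations of its n arguments, weighted by the permutation's sign."""
    n = len(xs)
    total = 0.0
    for perm in permutations(range(n)):
        sign = np.linalg.det(np.eye(n)[list(perm)])   # +1 or -1
        total += sign * phi([xs[i] for i in perm])
    return total

phi = lambda xs: np.sin(xs[0][0]) + xs[1] @ xs[2]      # an arbitrary, non-symmetric function
xs = [np.array([0.1, 0.2, 0.3]), np.array([1.0, 0.0, 2.0]), np.array([0.5, -1.0, 0.4])]

# Exchanging two particles flips the sign of the antisymmetrized function.
swapped = [xs[1], xs[0], xs[2]]
assert np.isclose(antisymmetrize(phi, swapped), -antisymmetrize(phi, xs))
```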

We now discuss the equivariant condition for pointwise nonlinearities $\sigma_b$. It is instructive to look at a simple numerical example. Suppose we apply a pointwise nonlinearity $\sigma$ (say, ReLU) and a permutation matrix

$$P = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$$

to a vector $x = (1, -2, 3)$. We see that $\sigma(Px) = P\sigma(x)$:

$$\sigma(Px) = \sigma(3, 1, -2) = (3, 1, 0) = P(1, 0, 3) = P\sigma(x),$$

which clearly holds more generally, i.e., $\sigma(Px) = P\sigma(x)$ for any permutation matrix $P$ and any pointwise nonlinearity $\sigma$. The bottom line is that the permutation matrix $P$ comes from a representation of $S_n$; and since $P$ acts on the indices of $x$ and $\sigma$ acts on the values of $x$, the two actions are always independent of each other. More generally, it is easy to see that if we include a bias term $b$, then $\sigma_b$ is $S_n$-equivariant as long as $b$ has all coordinates equal [CW16]. This does not necessarily hold for more general $b$: take the example above and set the bias to be $b = (0, 1, 0)$. Then

$$\sigma_b(Px) = \sigma(3, 0, -2) = (3, 0, 0),$$

but $P\sigma_b(x) = P\sigma(1, -3, 3) = P(1, 0, 3) = (3, 1, 0)$. So $\sigma_b(Px) \ne P\sigma_b(x)$. Going beyond pointwise nonlinearity is a nontrivial issue and is crucial when the neural network requires more than just $S_n$-equivariance. We will say a few words about this below.
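The same check in NumPy, mirroring the numbers above (which are our own illustrative choices):

```python
import numpy as np

relu = lambda x: np.maximum(x, 0.0)
P = np.eye(3)[[2, 0, 1]]              # permutation matrix sending (x1, x2, x3) to (x3, x1, x2)
x = np.array([1.0, -2.0, 3.0])

# A pointwise nonlinearity commutes with any permutation of the coordinates.
assert np.array_equal(relu(P @ x), P @ relu(x))

# With a bias whose coordinates are not all equal, equivariance fails.
b = np.array([0.0, 1.0, 0.0])
print(relu(P @ x - b))                # [3. 0. 0.]
print(P @ relu(x - b))                # [3. 1. 0.]
```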

The mathematical ideas that we have described are all fairly straightforward. Indeed the technical challenges in equivariant neural networks are mostly about getting these mathematical ideas to work in real-life situations, which is what we have swept under the “engineering complications” rug. We will discuss a few of these, but as engineering complications go, they invariably depend on the problem at hand and every case is different.

The butterfly image example presented at the beginning already concealed several difficulties. While we have assumed that images are functions $v\colon \mathbb{R}^2 \to \mathbb{R}^3$, in real life they are sampled on a grid, i.e., pixels are discrete, and a more realistic model would be $v\colon \mathbb{Z}^2 \to \mathbb{R}^3$. Instead of a straightforward $\mathrm{SE}(2)$-equivariance as one might expect for imaging problems, one instead finds discussions of equivariance [CW16] with respect to wallpaper groups like

$$\mathbb{Z}^2$$

for translation in $\mathbb{Z}^2$; or

$$p4 = \mathbb{Z}^2 \rtimes \mathbb{Z}/4$$

that augments $\mathbb{Z}^2$ with right-angle rotations; or

$$p4m = \mathbb{Z}^2 \rtimes D_4$$

that further augments $p4$ with reflections. The reason for these choices is that they have to send pixels to pixels.

There is also the important issue of aliasing [Zha19]. When pixels are discrete, rotation will involve interpolation, and pointwise nonlinearities introduce higher-order harmonics that produce aliasing and break equivariance [FW21]. This can happen even with discrete translations like those in the groups above for standard convolutional neural networks. Dealing with aliasing and choosing equivariant activations that do not compromise expressive power [MFSL19] are important problems that cannot be overemphasized; dealing with these issues constitutes a mainstay of the research and development in equivariant neural networks.

In reality the pixels of an image are not just discrete but also finite in number. So instead of $v\colon \mathbb{Z}^2 \to \mathbb{R}^3$, an $m$-pixel image is more accurately a function on some discrete finite subset of points $\{p_1, \dots, p_m\} \subseteq \mathbb{Z}^2$. Since these points are fixed, we may conveniently regard the image as an element of $\mathbb{R}^m \oplus \mathbb{R}^m \oplus \mathbb{R}^m$, with each copy of $\mathbb{R}^m$ representing one of three color channels. In such cases the output of each layer should not be treated simply as a vector space $\mathbb{R}^{n_i}$ but as a direct sum $\mathbb{R}^{m_1} \oplus \cdots \oplus \mathbb{R}^{m_c}$, with the number of summands $c$ and their dimensions $m_1, \dots, m_c$ depending on the layer and the application. The weight matrix $A_i$ would then have a corresponding block structure and the representation $\rho_i$ takes the form $\rho_{i,1} \oplus \cdots \oplus \rho_{i,c}$ with $\rho_{i,j}\colon G \to \mathrm{GL}(m_j, \mathbb{R})$.

For molecular structure prediction problems, in [FWFW20, TEW21], the input is a collection of points augmented with various information in addition to location coordinates, giving the input layer a direct sum structure that propagates through later layers. Just to give a flavor of what is involved, in [TEW21], the inputs are atom positions in a model of an RNA molecule, with an encoding of atom type, and the output is an estimate of the root mean square error of the model's structure; in one example in [FWFW20], the inputs encode position, velocity, and charge of particles, and the output is an estimate of the location and velocity of each particle after some amount of time. In both examples, the function that maps inputs to outputs has no known expression but is known to be equivariant with respect to $\mathrm{SE}(3)$, i.e., translations and rotations of the coordinate system. The weight matrix has a block structure $A = (A_{jl})$ compatible with the direct sum decompositions of its input and output layers, and $A$ is equivariant if each block $A_{jl}$ is equivariant. Equivariance constrains each block to depend entirely on the relative input locations $x_j - x_l$, and the permitted matrices can be expressed in terms of radial kernels, spherical harmonics, and Clebsch–Gordan coefficients [WGW18].

The engineering aspects of equivariant neural networks are many and varied. While we have selectively discussed a few that are more common and mathematical in nature, we have also ignored many that are specific to the application at hand and often messy. We avoided most jargon used in the original literature as it tends to be mathematically imprecise or application specific. Nevertheless, we stress that equivariant neural networks are ultimately used in an engineering context and a large part of their success has to do with overcoming real engineering challenges.

Acknowledgment

We would like to thank the two anonymous reviewers and the handling editor for their many useful suggestions and comments. We would also like to thank Yuehaw Khoo, Risi Kondor, Zehua Lai, and Jiangying Zhou for helpful discussions. The simplified constant width case was adapted from [Lim21, Example 2.16], where it was used to illustrate tensor transformation rules. We thank StackExchange user ‘Ulysses’ for his TikZ butterfly figure in https://tex.stackexchange.com/a/495243.

References

[BAO20]
Alexander Bogatskiy, Brandon Anderson, Jan Offermann, Marwah Roussi, David Miller, and Risi Kondor, Lorentz group equivariant neural network for particle physics, Proceedings of the 37th International Conference on Machine Learning (ICML), 2020, pp. 992–1002.
[BH10]
John Baez and John Huerta, The algebra of grand unified theories, Bull. Amer. Math. Soc. (N.S.) 47 (2010), no. 3, 483–552.
[CS81]
Sylvain E. Cappell and Julius L. Shaneson, Nonlinear similarity, Ann. of Math. 113 (1981), no. 2, 315–355.
[CW16]
Taco Cohen and Max Welling, Group equivariant convolutional networks, Proceedings of the 33rd International Conference on Machine Learning (ICML), 2016, pp. 2990–2999.
[FW21]
Daniel Franzen and Michael Wand, General nonlinearities in SO(2)-equivariant CNNs, Advances in Neural Information Processing Systems (NeurIPS), 2021, pp. 1970–1981.
[FWFW20]
Fabian Fuchs, Daniel E. Worrall, Volker Fischer, and Max Welling, SE(3)-transformers: 3D roto-translation equivariant attention networks, Advances in Neural Information Processing Systems (NeurIPS), 2020, pp. 1970–1981.
[FWW21]
Marc Finzi, Max Welling, and Andrew Gordon Wilson, A practical method for constructing equivariant multilayer perceptrons for arbitrary matrix groups, Proceedings of the 38th International Conference on Machine Learning (ICML), 2021, pp. 3318–3328.
[HSN20]
Jan Hermann, Zeno Schätzle, and Frank Noé, Deep-neural-network solution of the electronic Schrödinger equation, Nat. Chem. 12 (2020), 891–897.
[JEP21]
John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, Alex Bridgland, Clemens Meyer, Simon A. A. Kohl, Andrew J. Ballard, Andrew Cowie, Bernardino Romera-Paredes, Stanislav Nikolov, Rishub Jain, Jonas Adler, Trevor Back, Stig Petersen, David Reiman, Ellen Clancy, Michal Zielinski, Martin Steinegger, Michalina Pacholska, Tamas Berghammer, Sebastian Bodenstein, David Silver, Oriol Vinyals, Andrew W. Senior, Koray Kavukcuoglu, Pushmeet Kohli, and Demis Hassabis, Highly accurate protein structure prediction with AlphaFold, Nature 596 (2021), 583–589.
[KLT18]
Risi Kondor, Zhen Lin, and Shubhendu Trivedi, Clebsch–Gordan nets: a fully Fourier space spherical convolutional neural network, Advances in Neural Information Processing Systems (NeurIPS), 2018, pp. 10117–10126.
[KSH12]
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton, ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems (NIPS), 2012, pp. 1097–1105.
[KT18]
Risi Kondor and Shubhendu Trivedi, On the generalization of equivariance and convolution in neural networks to the action of compact groups, Proceedings of the 35th International Conference on Machine Learning (ICML), 2018, pp. 2747–2755.
[Lim21]
Lek-Heng Lim, Tensors in computations, Acta Numer. 30 (2021), 555–764.
[May96]
J. P. May, Equivariant homotopy and cohomology theory, CBMS Regional Conference Series in Mathematics, vol. 91, Published for the Conference Board of the Mathematical Sciences, Washington, DC; by the American Mathematical Society, Providence, RI, 1996. With contributions by M. Cole, G. Comezaña, S. Costenoble, A. D. Elmendorf, J. P. C. Greenlees, L. G. Lewis, Jr., R. J. Piacenza, G. Triantafillou, and S. Waner.
[MFSL19]
Haggai Maron, Ethan Fetaya, Nimrod Segol, and Yaron Lipman, On the universality of invariant networks, Proceedings of the 36th International Conference on Machine Learning (ICML), 2019, pp. 4363–4371.
[MP43]
Warren S. McCulloch and Walter Pitts, A logical calculus of the ideas immanent in nervous activity, Bull. Math. Biophys. 5 (1943), 115–133.
[Ros58]
Frank Rosenblatt, The perceptron: A probabilistic model for information storage and organization in the brain, Psychol. Rev. 65 (1958), 386–408.
[TEW21]
Raphael J. L. Townshend, Stephan Eismann, Andrew M. Watkins, Ramya Rangan, Maria Karelina, Rhiju Das, and Ron O. Dror, Geometric deep learning of RNA structure, Science 373 (2021), no. 6558, 1047–1051.
[WGW18]
Maurice Weiler, Mario Geiger, Max Welling, Wouter Boomsma, and Taco S. Cohen, 3D steerable CNNs: Learning rotationally equivariant features in volumetric data, Advances in Neural Information Processing Systems (NeurIPS), 2018, pp. 10381–10392.
[Zha19]
Richard Zhang, Making convolutional networks shift-invariant again, Proceedings of the 36th International Conference on Machine Learning (ICML), 2019, pp. 7324–7334.

Credits

Butterfly images are courtesy of Ulysses via StackExchange.

Figure 1 is courtesy of Lek-Heng Lim and Bradley J. Nelson.

Photo of Lek-Heng Lim is courtesy of Sou-Cheng Choi.

Photo of Bradley J. Nelson is courtesy of Xiaotong Suo.