Notices of the American Mathematical Society



Machine Learning and Invariant Theory

Ben Blum-Smith
Soledad Villar

1. Introduction

Modern machine learning has not only surpassed the state of the art in many engineering and scientific problems, but has also had an impact on society at large, and will likely continue to do so. This includes deep learning, large language models, diffusion models, etc. In this article, we give an account of certain mathematical principles that are used in the definition of some of these machine learning models, and we explain how classical invariant theory plays a role in them. Due to space constraints we leave out many relevant references. A version of this manuscript with a longer set of references is available on arXiv [1].

In supervised machine learning, we typically have a training set $\{(x_i, y_i)\}_{i=1}^N$, where the $x_i \in \mathcal{X}$ are the data points and the $y_i \in \mathcal{Y}$ are the labels. A typical example is image recognition, where the $x_i$ are images and the $y_i$ are image labels (say, "cat" or "dog"), encoded as vectors. The goal is to find a function $f$ in a hypothesis space $\mathcal{F}$ that not only approximately interpolates the training data ($f(x_i) \approx y_i$), but also performs well on unseen (or held-out) data. The function $f$ is called the trained model, predictor, or estimator. In practice, one parametrizes the class of functions with some parameters $\theta$ varying over a space $\Theta$ of parameter values sitting inside some $\mathbb{R}^p$; in other words, $\mathcal{F} = \{f_\theta : \theta \in \Theta\}$. Then one uses local optimization (in $\Theta$) to find a function in $\mathcal{F}$ that locally and approximately minimizes a prespecified empirical loss function $\ell$, which compares a candidate function $f_\theta$'s values on the $x_i$ with the "true" target values $y_i$. In other words, one approximately solves $\hat\theta \in \arg\min_{\theta \in \Theta} \sum_{i=1}^N \ell\big(f_\theta(x_i), y_i\big)$, and then takes $\hat f = f_{\hat\theta}$.
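To fix notation with a concrete toy case, here is a minimal sketch (ours, not from the article) of this setup: a linear hypothesis class $f_\theta(x) = \theta^\top x$, a squared empirical loss, and plain gradient descent as the local optimization; all sizes and hyperparameters below are illustrative choices.

```python
import numpy as np

# Toy training set {(x_i, y_i)}: x_i in R^5, y_i in R, generated for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                      # data points x_i (rows)
theta_true = rng.normal(size=5)
y = X @ theta_true + 0.1 * rng.normal(size=100)    # labels y_i

def f(theta, X):
    # Hypothesis class: f_theta(x) = <theta, x>, parameters theta in R^5.
    return X @ theta

def empirical_loss(theta, X, y):
    # Squared loss comparing f_theta(x_i) with the target values y_i.
    return np.mean((f(theta, X) - y) ** 2)

# Local optimization in theta: plain gradient descent on the empirical loss.
theta = np.zeros(5)
learning_rate = 0.1
for _ in range(500):
    grad = 2 * X.T @ (f(theta, X) - y) / len(y)
    theta -= learning_rate * grad

print(empirical_loss(theta, X, y))   # small: the trained model nearly interpolates the data
```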

Modern machine learning performs regressions on classes of functions that are typically overparameterized (the dimension of the space of parameters is much larger than the number of training samples), and in many cases, several functions in the hypothesis class can interpolate the data perfectly. Deep learning models can even interpolate plain noise, or fit images to random labels. Moreover, the optimization problem is typically nonconvex. Therefore the model performance depends heavily on how the class of functions is parameterized and on the optimization algorithms employed.

The parameterization of the hypothesis class of functions is what in deep learning is typically referred to as the architecture. In recent years, the most successful architectures have been ones that use properties or heuristics regarding the structure of the data (and the problem) to design the class of functions: convolutional neural networks for images, recurrent neural networks for time series, graph neural networks for graph-structured data, transformers, etc. Many of these design choices are related to the symmetries of the problem: for instance, convolutional neural networks can be translation equivariant, and transformers can be permutation invariant.

When the learning problem comes from the physical sciences, there are concrete sets of rules that the function being modeled must obey, and these rules often entail symmetries. The rules (and symmetries) typically come from coordinate freedoms and conservation laws [19]. One classical example of these coordinate freedoms is the scaling symmetry that comes from dimensional analysis (for instance, if the input data to the model is rescaled to change everything that has units of kilograms to pounds, the predictions should scale accordingly). In order to do machine learning on physical systems, researchers have designed models that are consistent with physical law; this is the case for physics-informed machine learning, neural ODEs and PDEs, and equivariant machine learning.

Given data spaces $\mathcal{X}$ and $\mathcal{Y}$ and a group $G$ acting on both of them, a function $f : \mathcal{X} \to \mathcal{Y}$ is equivariant if $f(g \cdot x) = g \cdot f(x)$ for all $g \in G$ and all $x \in \mathcal{X}$. Many physical problems are equivariant with respect to rotations, permutations, or scalings. For instance, consider a problem where one uses data to predict the dynamics of a folding protein or uses simulated data to emulate the dynamics of a turbulent fluid. Equivariant machine learning restricts the hypothesis space to a class of equivariant functions. The philosophy is that every function that the machine learning model can express is equivariant, and therefore consistent with physical law.

Symmetries were used for machine learning (and in particular neural networks) in early works [16], and more recently they have been revisited in the context of deep learning. There are three main ways to implement symmetries. The simplest one parameterizes the invariant and equivariant functions with respect to discrete groups by averaging arbitrary functions over the group orbit [3]. The second approach, explained in the next section, uses classical representation theory to parameterize the space of equivariant functions (see for instance [8]). The third approach, the main point of this article, uses invariant theory.

As an example, we briefly discuss graph neural networks (GNNs), which have been a very popular area of research in the past couple of years. GNNs can be seen as equivariant functions that take a graph represented by its adjacency matrix $A \in \mathbb{R}^{n \times n}$ and possible node features $X \in \mathbb{R}^{n \times d}$, and output an embedding $f(A, X)$ so that $f(\Pi A \Pi^\top, \Pi X) = \Pi\, f(A, X)$ for all $n \times n$ permutation matrices $\Pi$. Graph neural networks are typically implemented as variants of graph convolutions or message passing, which are equivariant by definition. However, many equivariant functions cannot be expressed with these architectures. Several recent works analyze the expressive power of different GNN architectures in connection to the graph isomorphism problem.
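As an illustration of the permutation symmetry (a generic message-passing-style layer of our own devising, not any specific published architecture), the sketch below checks numerically that a layer of the form $(A, X) \mapsto \tanh(A X W)$ satisfies $f(\Pi A \Pi^\top, \Pi X) = \Pi\, f(A, X)$; the weight matrix $W$ and the choice of nonlinearity are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d_in, d_out = 6, 4, 3
A = rng.integers(0, 2, size=(n, n)).astype(float)
A = np.triu(A, 1); A = A + A.T                 # symmetric adjacency matrix of a random graph
X = rng.normal(size=(n, d_in))                 # node features
W = rng.normal(size=(d_in, d_out))             # weights (fixed here; learned in practice)

def layer(A, X):
    # One message-passing-style layer: aggregate neighbor features, mix channels, nonlinearity.
    return np.tanh(A @ X @ W)

P = np.eye(n)[rng.permutation(n)]              # a random permutation matrix Pi

lhs = layer(P @ A @ P.T, P @ X)                # relabel the nodes, then apply the layer
rhs = P @ layer(A, X)                          # apply the layer, then relabel the output rows
print(np.allclose(lhs, rhs))                   # True: the layer is permutation equivariant
```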

Beyond graphs, equivariant machine learning models have been extremely successful at predicting molecular structures and dynamics, protein folding, protein binding, and simulating turbulence and climate effects, to name a few applications. Theoretical developments have shown the universality of certain equivariant models, as well as generalization improvements of equivariant machine learning models over nonequivariant baselines. There has been some recent work studying the inductive bias of equivariant machine learning, and its relationship with data augmentation. See [1] for a list of references on these topics.

2. Equivariant Convolutions and Multilayer Perceptrons

Modern deep learning models have evolved from the classical artificial neural network known as the perceptron. The multilayer perceptron model takes an input $x \in \mathbb{R}^{d_0}$ and outputs $f(x)$, defined to be the composition of affine linear maps and nonlinear entrywise functions. Namely,

$f(x) = L_k \circ \rho \circ L_{k-1} \circ \cdots \circ \rho \circ L_1(x), \qquad (1)$

where $\rho$ is the (fixed) entrywise nonlinear function and $L_1, \ldots, L_k$ are affine linear maps to be learned from the data. The linear maps can be expressed as $L_i(z) = W_i z + b_i$, where $W_i \in \mathbb{R}^{d_i \times d_{i-1}}$ and $b_i \in \mathbb{R}^{d_i}$. In this example each function is defined by the parameters $\theta = (W_i, b_i)_{i=1}^{k}$.
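A minimal sketch (ours) of the forward pass (1) in NumPy, with arbitrary layer widths, $\rho = \tanh$, and a placeholder initialization of the parameters $\theta = (W_i, b_i)$.

```python
import numpy as np

rng = np.random.default_rng(2)
widths = [8, 16, 16, 3]          # d_0, d_1, d_2, d_3: arbitrary choices of layer widths
rho = np.tanh                    # the fixed entrywise nonlinearity

# Parameters theta = (W_i, b_i): one affine map L_i(z) = W_i z + b_i per layer.
params = [(rng.normal(size=(m, k)) / np.sqrt(k), np.zeros(m))
          for k, m in zip(widths[:-1], widths[1:])]

def mlp(x, params):
    # f(x) = L_k(rho(L_{k-1}(... rho(L_1(x)) ...))); no nonlinearity after the last affine map.
    z = x
    for W, b in params[:-1]:
        z = rho(W @ z + b)
    W, b = params[-1]
    return W @ z + b

x = rng.normal(size=widths[0])
print(mlp(x, params).shape)      # (3,)
```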

The first neural network that was explicitly equivariant with respect to a group action is the convolutional neural network. The observation is that if the input images are regarded as signals on the discrete torus $(\mathbb{Z}/n\mathbb{Z})^2$, the linear equivariant maps are cross-correlations (which in ML are referred to as convolutions) with fixed filters. The idea of restricting the linear maps to satisfy symmetry constraints was generalized to equivariance with respect to discrete rotations and translations, and to general homogeneous spaces. Note that when working with arbitrary groups there are restrictions on the nonlinear functions $\rho$ for the model to be equivariant.

Classical results in neural networks show that certain multilayer perceptrons can universally approximate any continuous function in the limit where the number of neurons goes to infinity. However, that is not true in general in the equivariant case. Namely, functions expressed as (1) where the $L_i$ are linear and equivariant may not universally approximate all continuous equivariant functions. In some cases, there may not even exist nontrivial linear equivariant maps.

One popular idea to address this issue is to extend this model to use equivariant linear maps on tensors. Now the $L_i$ are linear equivariant maps between tensor powers of the input space (where the action on a tensor product is defined as the tensor product of the action in each component and extended linearly). Now the question is: how can we parameterize the space of such functions to do machine learning? The answer is via Schur's lemma.

A representation of a group $G$ is a map $\rho : G \to \mathrm{GL}(V)$ that satisfies $\rho(gh) = \rho(g)\,\rho(h)$ (where $V$ is a vector space and $\mathrm{GL}(V)$, as usual, denotes the automorphisms of $V$, that is, invertible linear maps $V \to V$). A group action of $G$ on $V$ (written as $g \cdot v$) is equivalent to a group representation $\rho$ such that $\rho(g)\, v = g \cdot v$. We extend the action to the tensor product $V^{\otimes k}$ so that the group acts independently in every tensor factor (i.e., in every dimension or mode), namely $g \cdot (v_1 \otimes \cdots \otimes v_k) = (g \cdot v_1) \otimes \cdots \otimes (g \cdot v_k)$.

The first step is to note that a linear equivariant map corresponds to a map between group representations, that is, a linear map $\phi : V \to W$ such that $\phi\big(\rho_V(g)\, v\big) = \rho_W(g)\, \phi(v)$ for all $g \in G$ and $v \in V$. Homomorphisms between group representations are easily parametrizable if we decompose the representations in terms of irreducible representations (aka irreps):

$V \cong \bigoplus_i V_i^{\oplus m_i}, \qquad W \cong \bigoplus_i V_i^{\oplus n_i}. \qquad (2)$

In particular, Schur's Lemma says that a map between two irreps over $\mathbb{C}$ is zero (if they are not isomorphic) or a multiple of the identity (if they are).

The equivariant neural-network approach consists in decomposing the group representations in terms of irreps and explicitly parameterizing the maps [9]. In general, it is not obvious how to decompose an arbitrary group representation into irreps. However, in the case where $G = \mathrm{SO}(3)$, the decomposition of a tensor representation as a sum of irreps is given by the Clebsch–Gordan decomposition:

$V_{\ell_1} \otimes V_{\ell_2} \cong \bigoplus_{\ell = |\ell_1 - \ell_2|}^{\ell_1 + \ell_2} V_\ell, \qquad (3)$

where $V_\ell$ denotes the irrep of $\mathrm{SO}(3)$ of dimension $2\ell + 1$.

The Clebsch–Gordan decomposition not only gives the decomposition of the right side of (3), but it also gives the explicit change of coordinates. This decomposition is fundamental for implementing the equivariant 3D point-cloud methods defined in [8] and other works (see references in [1]). Moreover, recent work [6] shows that the classes of functions used in practice are universal, meaning that every continuous SO(3)-equivariant function can be approximated uniformly on compact sets by those neural networks. However, there exists a clear limitation to this approach: even though decompositions into irreps are broadly studied in mathematics (plethysm), finding the explicit transformation that allows us to write the decomposition of tensor representations into irreps is a hard problem in general. It is called the Clebsch–Gordan problem.
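As a small sanity check (ours), the dimensions on the two sides of (3) agree, since $\dim V_\ell = 2\ell + 1$:

```python
# Dimension count for the SO(3) Clebsch-Gordan decomposition (3):
# dim(V_{l1} (x) V_{l2}) = (2 l1 + 1)(2 l2 + 1) and dim V_l = 2 l + 1.
def dims_agree(l1, l2):
    lhs = (2 * l1 + 1) * (2 * l2 + 1)
    rhs = sum(2 * l + 1 for l in range(abs(l1 - l2), l1 + l2 + 1))
    return lhs == rhs

print(all(dims_agree(l1, l2) for l1 in range(6) for l2 in range(6)))   # True
```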

3. Invariant Theory for Machine Learning

An alternative but related approach to the linear equivariant layers described above is the approach based on invariant theory, the focus of this article. In particular, the authors of this note and collaborators [18] explain that for some physically relevant groups—the orthogonal group, the special orthogonal group, and the Lorentz group—one can use classical invariant theory to design universally expressive equivariant machine learning models that are expressed in terms of the generators of the algebra of invariant polynomials. Following an idea attributed to B. Malgrange (that we learned from G. Schwarz), it is shown how to use the generators of the algebra of invariant polynomials to produce a parameterization of equivariant functions for a specific set of groups and actions.

To illustrate, let us focus on $\mathrm{O}(d)$-equivariant functions $f : (\mathbb{R}^d)^n \to \mathbb{R}^d$, namely functions such that $f(Q v_1, \ldots, Q v_n) = Q\, f(v_1, \ldots, v_n)$ for all $Q \in \mathrm{O}(d)$ and all $v_1, \ldots, v_n \in \mathbb{R}^d$ (for instance, the prediction of the position and velocity of the center of mass of a particle system). The method of B. Malgrange (explicated below) leads to the conclusion that all such functions can be expressed as

$f(v_1, \ldots, v_n) = \sum_{i=1}^{n} f_i(v_1, \ldots, v_n)\, v_i, \qquad (4)$

where the $f_i : (\mathbb{R}^d)^n \to \mathbb{R}$ are $\mathrm{O}(d)$-invariant functions. Classical invariant theory shows that a function $(\mathbb{R}^d)^n \to \mathbb{R}$ is $\mathrm{O}(d)$-invariant if and only if it is a function of the pairwise inner products $\langle v_i, v_j \rangle$. So, in actuality, (4) can be rewritten

$f(v_1, \ldots, v_n) = \sum_{i=1}^{n} h_i\big( (\langle v_j, v_k \rangle)_{j,k=1}^{n} \big)\, v_i, \qquad (5)$

where the $h_i$ are arbitrary functions. In other words, the pairwise inner products generate the algebra of invariant polynomials for this action, and every equivariant map is a linear combination of the vectors $v_i$ themselves with coefficients in this algebra.
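A numerical sketch (ours) of the parametrization (5): we build $f$ from placeholder coefficient functions $h_i$ of the Gram matrix of pairwise inner products, and check $\mathrm{O}(d)$-equivariance against a random orthogonal matrix.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 3, 4
V = rng.normal(size=(n, d))                 # rows are the input vectors v_1, ..., v_n

def h(i, G):
    # Placeholder O(d)-invariant coefficient functions of the Gram matrix entries.
    return np.sin(G[i, i]) + G.sum() / (1.0 + i)

def f(V):
    # f(v_1,...,v_n) = sum_i h_i( (<v_j, v_k>)_{j,k} ) v_i  -- equation (5).
    G = V @ V.T                             # Gram matrix of pairwise inner products
    return sum(h(i, G) * V[i] for i in range(len(V)))

Q, _ = np.linalg.qr(rng.normal(size=(d, d)))    # a random orthogonal matrix

print(np.allclose(f(V @ Q.T), f(V) @ Q.T))      # True: f(Q v_1, ..., Q v_n) = Q f(v_1, ..., v_n)
```

Any choice of the $h_i$ gives an equivariant map here, because the Gram matrix is unchanged when all the inputs are transformed by the same orthogonal matrix.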

In this article, we explicate the method of B. Malgrange in full generality, showing how to convert knowledge of the algebra of invariant polynomials into a characterization of the equivariant polynomial (or smooth) maps. In Section 4 we explain the general philosophy of the method, and in Section 5 we give the precise algebraic development, formulated as an algorithm to produce parametrizations of equivariant maps given adequate knowledge of the underlying invariant theory. In Section 6, we work through several examples.

We note that, for machine learning purposes, it is not critical that the functions $h_i$ be defined on invariant polynomials, nor that they themselves be polynomials. In the following, we focus on polynomials because the ideas were developed in the context of invariant theory; the arguments explicated below are set in this classical context. However, in [18], the idea is to preprocess the data by converting the tuple $(v_1, \ldots, v_n)$ to the tuple of dot products $(\langle v_i, v_j \rangle)_{i \leq j}$, and then treat the latter as input to the $h_i$'s, which are then learned using a machine learning architecture of one's choice. Therefore, the $h_i$'s are not polynomials but belong to whatever function class is output by the chosen architecture. Meanwhile, some recent works [2, 5, 11] have proposed alternative classes of separating invariants that can be used in place of the classical algebra generators as input to the $h_i$'s, and may have better numerical stability properties. This is a promising research direction.

4. Big Picture

We are given a group $G$, and finite-dimensional linear $G$-representations $V$ and $W$ over a field $k$. (We can take $k = \mathbb{R}$ or $\mathbb{C}$.) We want to understand the equivariant polynomial maps $V \to W$. We assume we have a way to understand $G$-invariant polynomials on spaces related to $V$ and $W$, and the goal is to leverage that knowledge to understand the equivariant maps.

The following is a philosophical discussion, essentially to answer the question: why should it be possible to do this? It is not precise; its purpose is just to guide thinking. Below in Section 5 we show how to actually compute the equivariant polynomials given adequate knowledge of the invariants. That section is rigorous.

The first observation is that any reasonable family of maps $V \to W$ (for example linear, polynomial, smooth, continuous, etc.) has a natural $G$-action induced from the actions on $V$ and $W$, and that the $G$-equivariant maps in such a family are precisely the fixed points of this action, as we now explain. This observation is a standard insight in representation and invariant theory.

Let $\mathcal{M}$ be the set of maps $V \to W$ of whatever kind, and let $\mathrm{GL}(V)$ (respectively $\mathrm{GL}(W)$) be the group of linear invertible maps from $V$ (respectively $W$) to itself. Given $g \in G$ and $f \in \mathcal{M}$ and $v \in V$, we define the map $g \cdot f$ by

$(g \cdot f)(v) = \rho_W(g)\, f\big(\rho_V(g)^{-1} v\big), \qquad (6)$

where $\rho_V : G \to \mathrm{GL}(V)$ and $\rho_W : G \to \mathrm{GL}(W)$ are the group homomorphisms defining the representations $V$ and $W$. The algebraic manipulation to verify that this is really a group action is routine and not that illuminating. A perhaps more transparent way to understand this definition of the action as "the right one" is that it is precisely the formula needed to make this square commute:

$\begin{array}{ccc} V & \xrightarrow{\;f\;} & W \\ \rho_V(g)\,\big\downarrow & & \big\downarrow\,\rho_W(g) \\ V & \xrightarrow{\;g \cdot f\;} & W \end{array}$

It follows from the definition of this action that the condition $g \cdot f = f$ for all $g \in G$ is equivalent to the statement that $f$ is $G$-equivariant. The square above automatically commutes, so $g \cdot f = f$ is the same as saying that the square below commutes—

$\begin{array}{ccc} V & \xrightarrow{\;f\;} & W \\ \rho_V(g)\,\big\downarrow & & \big\downarrow\,\rho_W(g) \\ V & \xrightarrow{\;f\;} & W \end{array}$

—and this is what it means for $f$ to be equivariant.

An important special case of (6) is the action of $G$ on $V^*$, the linear dual of $V$. This is the case $W = k$ with trivial action, and for $\lambda \in V^*$, (6) reduces to $(g \cdot \lambda)(v) = \lambda\big(\rho_V(g)^{-1} v\big)$. This is known as the contragredient action. We will utilize it momentarily with $W$ in the place of $V$.

The second observation is that maps $V \to W$ can be identified with functions from a bigger space to the underlying field $k$ by "currying," and this change in point of view preserves the group action. Again, this is a standard maneuver in algebra. Specifically, given any map $f : V \to W$, we obtain a function $\tilde f : V \times W^* \to k$, defined by the formula

$\tilde f(v, \lambda) = \lambda\big(f(v)\big).$

Note that the function $\tilde f(v, \lambda)$ is linear homogeneous in $\lambda$. Conversely, given any function $F : V \times W^* \to k$ that is linear homogeneous in the second coordinate, we can recover a map $f : V \to W$ such that $F = \tilde f$, by taking $f(v)$ to be the element of $W$ identified along the canonical isomorphism $W \cong (W^*)^*$ with the functional on $W^*$ that sends $\lambda$ to $F(v, \lambda)$—this functional is guaranteed to exist by the fact that $F$ is linear homogeneous in the second coordinate. An observation we will exploit in the next section is that the desired functional is actually the gradient of $F$ with respect to $\lambda$.

This construction gives an identification of the maps $V \to W$ with a subset of the functions $V \times W^* \to k$. Furthermore, there is a natural action of $G$ on functions $V \times W^* \to k$, defined precisely by the above formula (6) with $V \times W^*$ in place of $V$, $k$ in place of $W$, and trivial action on $k$;⁠Footnote1 and the identification described here preserves this action. Therefore, the fixed points for the $G$-action on maps $V \to W$ correspond with fixed points for the $G$-action on functions $V \times W^* \to k$, which are invariant functions (since the action of $G$ on $k$ is trivial).

1

The action of $G$ on $V \times W^*$ is defined by acting separately on each factor; the action on $W^*$ is the contragredient representation defined above.

What has been achieved is the reinterpretation of equivariant maps $V \to W$ first as fixed points of a $G$-action, and then as invariant functions $V \times W^* \to k$. Thus, knowledge of invariant functions can be parlayed into knowledge of equivariant maps.

5. Equivariance from Invariants

With the above imprecise philosophical discussion as a guide, Algorithm 1 shows how in practice to get from a description of the invariant polynomials on $V \times W^*$ to the equivariant polynomial (or smooth) maps $V \to W$. The technique given here is attributed to B. Malgrange; see [13, Proposition 2.2], where it is used to obtain the smooth equivariant maps, and [15, Proposition 6.8], where it is used to obtain holomorphic equivariant maps. Variants on this method are used to compute equivariant maps in [7, Sections 2.1–2.2], [12, Section 3.12], [4, Section 4.2.3], and [20, Section 4].

The goal of the algorithm is to provide a parametrization of equivariant maps. That said, the proof of correctness is constructive: as an ancillary benefit, it furnishes a method for taking an arbitrary equivariant map given by explicit polynomial expressions for the coordinates and expressing it in terms of this parametrization.

Algorithm 1.

Malgrange’s method for getting equivariant functions

Input: Bihomogeneous generators $q_1, \ldots, q_r$ for $k[V \times W^*]^G$.
(1)

Order the generators so that $q_1, \ldots, q_s$ are of degree $0$ and $q_{s+1}, \ldots, q_m$ are of degree $1$ in $W^*$. Discard $q_{m+1}, \ldots, q_r$ (of higher degree in $W^*$).

(2)

Choose a basis $w_1, \ldots, w_\ell$ for $W$, and let $\lambda_1, \ldots, \lambda_\ell$ be the dual basis, so an arbitrary element $\lambda \in W^*$ can be written

$\lambda = \sum_{j=1}^{\ell} c_j \lambda_j$

with $c_1, \ldots, c_\ell \in k$.

(3)

For $i = s+1, \ldots, m$, and for $v \in V$, let $F_i(v)$ be the gradient of $q_i(v, \lambda)$ with respect to $\lambda$, identified with an element of $W$ along the canonical isomorphism $(W^*)^* \cong W$; explicitly,

$F_i(v) = \sum_{j=1}^{\ell} \frac{\partial q_i}{\partial c_j}(v, \lambda)\, w_j.$

Then each $F_i$ is a function $V \to W$.

Output: Equivariant polynomial functions $F_{s+1}, \ldots, F_m$ from $V$ to $W$ such that any equivariant polynomial (or smooth, if $G$ is compact) map $f : V \to W$ can be written as

$f(v) = \sum_{i=s+1}^{m} g_i\big(q_1(v), \ldots, q_s(v)\big)\, F_i(v),$

where the $g_i$ are arbitrary polynomial (or smooth, if $G$ is compact) functions.

Remark: Because the $q_i$'s are being differentiated with respect to variables in which they are linear, Step 3 could alternatively have been stated: for each $i$, define $F_i(v)$ as the vector in $W$ whose coefficients with respect to the $w_j$'s are just the coefficients of the $\lambda_j$'s in $q_i(v, \lambda)$.
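For concreteness, here is a small computer-algebra sketch (ours, in SymPy) of Step 3 for the orthogonal-group example of Section 6 with $d = n = 2$: the degree-1 generators are the pairings $\langle v_i, \lambda \rangle$, and their gradients in the dual coordinates recover $F_i(v_1, v_2) = v_i$. The symbol names are ours.

```python
import sympy as sp

d, n = 2, 2
# Coordinates of the input vectors v_1, ..., v_n and of the dual variable lambda.
v = [[sp.Symbol(f"v{i}_{a}") for a in range(d)] for i in range(1, n + 1)]
c = [sp.Symbol(f"c{a}") for a in range(d)]      # lambda = c_0 lambda_0 + ... + c_{d-1} lambda_{d-1}

dot = lambda x, y: sum(xa * ya for xa, ya in zip(x, y))

# Bihomogeneous generators of the invariant algebra for O(d) acting diagonally:
deg0 = [dot(v[i], v[j]) for i in range(n) for j in range(i, n)]   # <v_i, v_j>: degree 0 in lambda
deg1 = [dot(v[i], c) for i in range(n)]                           # <v_i, lambda>: degree 1 in lambda

# Step 3: F_i is the gradient of the degree-1 generator <v_i, lambda> in the coordinates of lambda.
F = [[sp.diff(q, ca) for ca in c] for q in deg1]
print(F)    # [[v1_0, v1_1], [v2_0, v2_1]]  -- that is, F_i(v_1, v_2) = v_i
```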

We now exposit in detail Algorithm 1 and its proof of correctness, in the case where $f$ is a polynomial map; for simplicity we take $k = \mathbb{R}$. The argument is similar for smooth or holomorphic maps, except that one needs an additional theorem to arrive at the expression (7) below. If $G$ is a compact Lie group, the needed theorem is proven in [14] for smooth maps, and in [10] for holomorphic maps over $\mathbb{C}$.

We begin with linear representations $V$ and $W$ of a group $G$ over $\mathbb{R}$. We take $W^*$ to be the contragredient representation to $W$, defined above. (If $G$ is compact, we can work in a coordinate system in which the action of $G$ on $W$ is orthogonal, and then we may ignore the distinction between $W$ and $W^*$, as discussed above.) We suppose we have an explicit set of polynomials $q_1, \ldots, q_r$ that generate the algebra of invariant polynomials on the vector space $V \times W^*$ (denoted $\mathbb{R}[V \times W^*]^G$)—in other words, they have the property that any invariant polynomial can be written as a polynomial in these. We also assume they are bihomogeneous, i.e., independently homogeneous in $V$ and in $W^*$. To reduce notational clutter we suppress the maps specifying the actions of $G$ on $V$ and $W$ (which were called $\rho_V$ and $\rho_W$ in the previous section), writing the image of $v \in V$ (respectively $w \in W$, $\lambda \in W^*$) under the action of an element $g \in G$ as $g v$ (respectively $g w$, $g \lambda$). We suppose $q_1, \ldots, q_s$ are degree $0$ in $\lambda$ (so they are functions of $v$ alone), $q_{s+1}, \ldots, q_m$ are degree $1$ in $\lambda$, and $q_{m+1}, \ldots, q_r$ are degree $\geq 2$ in $\lambda$.

Now we consider an arbitrary $G$-equivariant polynomial function

$f : V \to W.$

We let $\lambda \in W^*$ be arbitrary, and, as in the previous section, we construct the function

$\tilde f(v, \lambda) = \lambda\big(f(v)\big).$

Equivariance of $f$ implies that $\tilde f$ is invariant:

$\tilde f(g v, g \lambda) = (g \lambda)\big(f(g v)\big) = (g \lambda)\big(g\, f(v)\big) = \lambda\big(g^{-1} g\, f(v)\big) = \lambda\big(f(v)\big) = \tilde f(v, \lambda).$

From the invariance of $\tilde f$, and the fact that $q_1, \ldots, q_r$ generate the algebra of invariant polynomials on $V \times W^*$, we have an equality of the form

$\tilde f(v, \lambda) = p\big(q_1(v, \lambda), \ldots, q_r(v, \lambda)\big), \qquad (7)$

where $p$ is a polynomial. Note that $q_1, \ldots, q_s$ do not depend on $\lambda$, while $q_{s+1}, \ldots, q_r$ do.

We now fix $v$ and take the gradient of both sides of (7) with respect to $\lambda$, viewed as an element of $W$.⁠Footnote2 Choosing dual bases $w_1, \ldots, w_\ell$ for $W$ and $\lambda_1, \ldots, \lambda_\ell$ for $W^*$, and writing $\lambda = \sum_{j=1}^{\ell} c_j \lambda_j$, we can express the operator $\nabla_\lambda$ acting on a smooth function $F(v, \lambda)$ explicitly by the formula

$\nabla_\lambda F(v, \lambda) = \sum_{j=1}^{\ell} \frac{\partial F}{\partial c_j}(v, \lambda)\, w_j.$

2

In the background, we are using canonical isomorphisms to identify $W^*$ with all its tangent spaces, and $(W^*)^*$ with $W$.

Applying $\nabla_\lambda$ to the left side of (7), we get

$\nabla_\lambda \tilde f(v, \lambda) = \nabla_\lambda\, \lambda\big(f(v)\big) = f(v),$

so $\nabla_\lambda$ recovers $f$ from $\tilde f$. (Indeed, this was the point.) Meanwhile, applying $\nabla_\lambda$ to the right side of (7), writing $\partial_i p$ for the partial derivative of $p$ with respect to its $i$th argument, and using the chain rule, we get

$\sum_{i=1}^{r} \partial_i p\big(q_1(v, \lambda), \ldots, q_r(v, \lambda)\big)\, \nabla_\lambda q_i(v, \lambda).$

Combining these, we conclude

$f(v) = \sum_{i=1}^{r} \partial_i p\big(q_1(v, \lambda), \ldots, q_r(v, \lambda)\big)\, \nabla_\lambda q_i(v, \lambda). \qquad (8)$

Now we observe that $\nabla_\lambda q_i(v, \lambda) = 0$ if $i \leq s$, because in those cases $q_i$ is constant with respect to $\lambda$. But meanwhile, the left side of (8) does not depend on $\lambda$, and it follows that the right side does not either; thus we can evaluate it at our favorite choice of $\lambda$; we take $\lambda = 0$. Upon doing this, $\nabla_\lambda q_i(v, \lambda)$ also becomes $0$ for $i > m$, because in these cases $q_i$ is homogeneous of degree at least $2$ in $\lambda$, so its partial derivatives with respect to the $c_j$ remain homogeneous of degree at least $1$ in $\lambda$, thus they vanish at $\lambda = 0$. Meanwhile, $q_i(v, \lambda)$ itself vanishes at $\lambda = 0$ for $i > s$, so that the $(s+1)$st to $r$th arguments of each $\partial_i p$ vanish. Abbreviating

$\partial_i p\big(q_1(v), \ldots, q_s(v), 0, \ldots, 0\big)$

as $g_i(v)$—a polynomial in $q_1(v), \ldots, q_s(v)$—we may thus rewrite (8) as

$f(v) = \sum_{i=s+1}^{m} g_i(v)\, \nabla_\lambda q_i(v, \lambda).$

Finally, we observe that, being linear homogeneous in $\lambda$ for $s+1 \leq i \leq m$, $q_i(v, \lambda)$ has a gradient $\nabla_\lambda q_i(v, \lambda)$ that is degree $0$ in $\lambda$, i.e., it does not depend on $\lambda$. So we may call it $F_i(v)$ as in the algorithm, and we have finally expressed $f$ as the sum $f(v) = \sum_{i=s+1}^{m} g_i(v)\, F_i(v)$, as promised.

6. Examples

In this section we apply Malgrange's method to parametrize equivariant functions in various examples. In all cases, for positive integers $d$ and $n$, we take a group $G$ of $d \times d$ matrices, equipped with its canonical action on $\mathbb{R}^d$, and we are looking for equivariant maps

$f : (\mathbb{R}^d)^n \to \mathbb{R}^d$

from an $n$-tuple of vectors to a single vector. The underlying invariant theory is provided by Weyl's The Classical Groups in each case.

The orthogonal group. We parametrize maps $f : (\mathbb{R}^d)^n \to \mathbb{R}^d$ that are equivariant for $G = \mathrm{O}(d)$. By the Riesz representation theorem, we can identify $\mathbb{R}^d$ with its dual $(\mathbb{R}^d)^*$ along the map $v \mapsto \langle v, \cdot \rangle$, where $\langle \cdot, \cdot \rangle$ is the standard dot product $\langle v, w \rangle = \sum_{a=1}^{d} v_a w_a$. Since $\mathrm{O}(d)$ preserves this product (by definition), this identification is equivariant with respect to the $\mathrm{O}(d)$-action, thus $(\mathbb{R}^d)^*$ is isomorphic with $\mathbb{R}^d$ as a representation of $\mathrm{O}(d)$. We may therefore ignore the difference between $W = \mathbb{R}^d$ and $W^*$ in applying the algorithm.

Thus, we consider the ring of $\mathrm{O}(d)$-invariant polynomials on tuples

$(v_1, \ldots, v_n, \lambda) \in (\mathbb{R}^d)^{n+1}.$

We begin with bihomogeneous generators for this ring. By a classical theorem of Weyl known as the first fundamental theorem for $\mathrm{O}(d)$, they are the dot products $\langle v_i, v_j \rangle$ for $1 \leq i \leq j \leq n$; $\langle v_i, \lambda \rangle$ for $1 \leq i \leq n$; and $\langle \lambda, \lambda \rangle$. These are ordered by their degree in $\lambda$ as in Step 1 of the algorithm; we discard $\langle \lambda, \lambda \rangle$ as it is degree $2$ in $\lambda$.

We can work in the standard basis for $\mathbb{R}^d$, and we have identified it with its dual, so Step 2 is done as well.

Applying Step 3, we take the generators of degree $1$ in $\lambda$, which are

$\langle v_1, \lambda \rangle, \ \ldots, \ \langle v_n, \lambda \rangle.$

Taking the gradients, we get

$F_i(v_1, \ldots, v_n) = \nabla_\lambda \langle v_i, \lambda \rangle = \big( (v_i)_1, \ldots, (v_i)_d \big) = v_i,$

where $(v_i)_a$ denotes the $a$th coordinate of $v_i$. Thus the $F_i$ yielded by the algorithm is nothing but projection to the $i$th input vector. Meanwhile, the arguments of the $g_i$ of the algorithm are the algebra generators of degree zero in $\lambda$, namely the dot products $\langle v_i, v_j \rangle$; thus the output of the algorithm is precisely the representation described in (5) and the paragraph following.

The Lorentz and symplectic groups. If we replace $\mathrm{O}(d)$ with the Lorentz group $\mathrm{O}(1, d-1)$, or (in case $d$ is even) the symplectic group $\mathrm{Sp}(d, \mathbb{R})$, the entire discussion above can be copied verbatim, except with the standard dot product being replaced everywhere by the Minkowski product $\langle v, w \rangle = v_1 w_1 - v_2 w_2 - \cdots - v_d w_d$ in the former case, or the standard skew-symmetric bilinear form $\omega(v, w) = v^\top J w$ (where $J$ is block diagonal with $90°$-rotation matrices as blocks) in the latter. We also need to use these respective products in place of the standard dot product to identify $\mathbb{R}^d$ equivariantly with its dual representation. The key point is that the invariant theory works the same way (see [12, Sec. 9.3] for a concise modern treatment, noting that $\mathrm{O}(d)$ and $\mathrm{O}(1, d-1)$ have the same complexification).
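The sketch below (ours) repeats the numerical equivariance check for the Lorentz group with $d = 4$: the Minkowski form $\eta = \mathrm{diag}(1, -1, -1, -1)$ replaces the dot product in the invariants, and the sample group element is a boost along one axis.

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])          # Minkowski form on R^{1,3}

def boost(beta):
    # A Lorentz boost along the first spatial axis: a sample element of O(1,3).
    g = 1.0 / np.sqrt(1.0 - beta ** 2)
    L = np.eye(4)
    L[0, 0] = L[1, 1] = g
    L[0, 1] = L[1, 0] = -g * beta
    return L

rng = np.random.default_rng(4)
V = rng.normal(size=(3, 4))                     # rows are three 4-vectors v_1, v_2, v_3
L = boost(0.6)

def f(V):
    # Same parametrization as (5), with Minkowski products as the invariant inputs.
    G = V @ eta @ V.T
    return sum((np.cos(G[i, i]) + G.sum()) * V[i] for i in range(len(V)))

print(np.allclose(L.T @ eta @ L, eta))          # True: L preserves the Minkowski form
print(np.allclose(f(V @ L.T), f(V) @ L.T))      # True: f is Lorentz equivariant
```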

The special orthogonal group. Now we consider $G = \mathrm{SO}(d)$. We can once again identify $\mathbb{R}^d$ with its dual. However, this time, in Step 1, the list of bihomogeneous generators is longer: in addition to the dot products $\langle v_i, v_j \rangle$ and $\langle v_i, \lambda \rangle$ (and $\langle \lambda, \lambda \rangle$, which will be discarded), we have determinants $\det(v_{j_1}, \ldots, v_{j_d})$ for $1 \leq j_1 < \cdots < j_d \leq n$, and $\det(v_{i_1}, \ldots, v_{i_{d-1}}, \lambda)$ for $1 \leq i_1 < \cdots < i_{d-1} \leq n$. The former are of degree $0$ in $\lambda$ while the latter are of degree $1$. Thus, the latter contribute to our list of $F_i$'s in Step 3, while the former figure in the arguments of the $g_i$'s. Carrying out Step 3 in this case, we find that

$\nabla_\lambda \det(v_{i_1}, \ldots, v_{i_{d-1}}, \lambda)$

is exactly the generalized cross product of the vectors $v_{i_1}, \ldots, v_{i_{d-1}}$. Thus we must add to the $F_i$'s a generalized cross product for each $(d-1)$-subset of our input vectors; in the end the parametrization of equivariant maps looks like

$f(v_1, \ldots, v_n) = \sum_{i=1}^{n} g_i\big( (\langle v_j, v_k \rangle)_{j \leq k},\, (\det v_J)_{J} \big)\, v_i \;+\; \sum_{I} g_I\big( (\langle v_j, v_k \rangle)_{j \leq k},\, (\det v_J)_{J} \big)\, v_{\times I},$

where $[n] = \{1, \ldots, n\}$, $I$ ranges over $\binom{[n]}{d-1}$, which represents the set of $(d-1)$-subsets of $[n]$, $\det v_J$ is shorthand for $\det(v_{j_1}, \ldots, v_{j_d})$ where $J = \{j_1 < \cdots < j_d\} \in \binom{[n]}{d}$, and $v_{\times I}$ is shorthand for the generalized cross product of the vectors $v_{i_1}, \ldots, v_{i_{d-1}}$ where $I = \{i_1 < \cdots < i_{d-1}\}$.
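A sketch (ours) of the generalized cross product, computed as the vector of signed $(d-1) \times (d-1)$ minors and characterized by $\langle \times(v_1, \ldots, v_{d-1}), w \rangle = \det(v_1, \ldots, v_{d-1}, w)$, together with a numerical check of $\mathrm{SO}(d)$-equivariance.

```python
import numpy as np

def generalized_cross(vs):
    # Cross product of d-1 vectors in R^d: the vector x with <x, w> = det(v_1, ..., v_{d-1}, w)
    # for all w, computed via signed minors of the (d-1) x d matrix whose rows are the v_i.
    M = np.asarray(vs)
    d = M.shape[1]
    return np.array([(-1) ** (d - 1 + a) * np.linalg.det(np.delete(M, a, axis=1))
                     for a in range(d)])

rng = np.random.default_rng(5)
d = 4
vs = rng.normal(size=(d - 1, d))

Q, _ = np.linalg.qr(rng.normal(size=(d, d)))    # random orthogonal matrix...
if np.linalg.det(Q) < 0:                        # ...with the sign fixed so that Q is in SO(d)
    Q[:, 0] *= -1

lhs = generalized_cross(vs @ Q.T)               # cross product of the rotated vectors
rhs = Q @ generalized_cross(vs)                 # rotation of the cross product
print(np.allclose(lhs, rhs))                    # True for Q in SO(d); the sign flips if det Q = -1
```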

The special linear group. We include an example where we cannot identify $\mathbb{R}^d$ with its dual representation. Namely, we take $G = \mathrm{SL}(d)$. As $\mathrm{SL}(d)$ does not preserve any bilinear form on $\mathbb{R}^d$, we must regard $(\mathbb{R}^d)^*$ as a distinct representation. Thus in Step 1 we must consider the polynomial invariants on

$(v_1, \ldots, v_n, \lambda) \in (\mathbb{R}^d)^n \times (\mathbb{R}^d)^*.$

A generating set of homogeneous invariants is given by the canonical pairings $\lambda(v_i)$, $1 \leq i \leq n$, and the determinants $\det v_J$, $J \in \binom{[n]}{d}$, where we have used the same shorthand as above for a $d$-subset $J$ of $[n]$ and the determinant whose columns are the $v_j$'s indexed by $J$. The former are degree $1$ in $\lambda$ while the latter are degree $0$. So, writing $\lambda = \sum_{a=1}^{d} c_a \lambda_a$ as in Step 2, we have

$\lambda(v_i) = \sum_{a=1}^{d} c_a\, (v_i)_a,$

and again in Step 3 we get

$F_i(v_1, \ldots, v_n) = \nabla_\lambda\, \lambda(v_i) = v_i,$

with the computation identical to the one above for the orthogonal group. Thus the algorithm outputs that an arbitrary $\mathrm{SL}(d)$-equivariant polynomial map $(\mathbb{R}^d)^n \to \mathbb{R}^d$ has the form

$f(v_1, \ldots, v_n) = \sum_{i=1}^{n} g_i\big( (\det v_J)_{J \in \binom{[n]}{d}} \big)\, v_i.$
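Analogously, a numerical sketch (ours) for $\mathrm{SL}(d)$: the coefficient functions take the $d \times d$ determinants of subsets of the inputs as their arguments (the particular $g_i$ below are placeholders), and equivariance is checked against a random matrix rescaled to have determinant 1.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(6)
d, n = 3, 5
V = rng.normal(size=(n, d))                     # rows are v_1, ..., v_n

def dets(V):
    # Degree-0 invariants for SL(d): determinants of all d-subsets of the input vectors.
    k = V.shape[1]
    return np.array([np.linalg.det(V[list(J)]) for J in combinations(range(len(V)), k)])

def f(V):
    # f(v_1, ..., v_n) = sum_i g_i( (det v_J)_J ) v_i, with placeholder coefficient functions g_i.
    D = dets(V)
    return sum((np.sin(D).sum() + i * np.cos(D).sum()) * V[i] for i in range(len(V)))

A = rng.normal(size=(d, d))                     # a random element of SL(d):
A = A / np.abs(np.linalg.det(A)) ** (1.0 / d)   # rescale so that |det A| = 1 ...
if np.linalg.det(A) < 0:
    A[:, 0] *= -1                               # ... and fix the sign so that det A = +1

print(np.isclose(np.linalg.det(A), 1.0))        # True
print(np.allclose(f(V @ A.T), f(V) @ A.T))      # True: f is SL(d)-equivariant
```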

The symmetric group. We give an example where the algebra of invariants is generated by something other than determinants and bilinear forms, so the $F_i$'s output by the algorithm are not just the $v_i$'s themselves and generalized cross products. Take $G = S_d$, the symmetric group on $d$ letters, acting on $\mathbb{R}^d$ by permutations of the coordinates. As $S_d$ realized in this way is a subgroup of $\mathrm{O}(d)$, we can once again identify $\mathbb{R}^d$ with its dual.

By the fundamental theorem on symmetric polynomials, the algebra of $S_d$-invariant polynomials on a single vector $x = (x_1, \ldots, x_d)$ is generated by the elementary symmetric polynomials in the coordinates, $e_1(x) = \sum_a x_a$, $e_2(x) = \sum_{a < b} x_a x_b$, etc., where $a, b$ index the coordinates of the vector $x$. Weyl showed that for any $n$, the algebra of invariant polynomials on an $n$-tuple of vectors $x^{(1)}, \ldots, x^{(n)} \in \mathbb{R}^d$ is generated by the polarized elementary symmetric polynomials

$\sum_a x^{(i)}_a, \qquad \sum_{a \neq b} x^{(i)}_a x^{(j)}_b, \qquad \sum_{a, b, c \text{ distinct}} x^{(i)}_a x^{(j)}_b x^{(k)}_c,$

etc., where $x^{(i)}, x^{(j)}, x^{(k)}, \ldots$ can be any of the vectors $x^{(1)}, \ldots, x^{(n)}$, distinct or not. (Up to a scalar multiple, one recovers the original, unpolarized elementary symmetric polynomials of a single vector by setting $i = j = k = \cdots$.) Running the algorithm to parametrize equivariant functions $(\mathbb{R}^d)^n \to \mathbb{R}^d$, we write down the algebra of invariants on $(\mathbb{R}^d)^n \times (\mathbb{R}^d)^*$, and see that the algebra generators of degree 1 in $\lambda$ have the form of polarized elementary symmetric polynomials in which exactly one of the argument vectors is $\lambda$, where, again, the remaining arguments can be distinct or not. The gradients of these become the $F_i$'s of Step 3. For the sake of explicitness, we fix $n = 2$, $d = 3$, and write out the results as column vectors. We get (up to scalar multiples, and writing $u = x^{(1)}$, $v = x^{(2)}$)

$\begin{pmatrix}1\\1\\1\end{pmatrix},\quad \begin{pmatrix}u_2+u_3\\ u_1+u_3\\ u_1+u_2\end{pmatrix},\quad \begin{pmatrix}v_2+v_3\\ v_1+v_3\\ v_1+v_2\end{pmatrix},\quad \begin{pmatrix}u_2u_3\\ u_1u_3\\ u_1u_2\end{pmatrix},\quad \begin{pmatrix}u_2v_3+u_3v_2\\ u_1v_3+u_3v_1\\ u_1v_2+u_2v_1\end{pmatrix},\quad \begin{pmatrix}v_2v_3\\ v_1v_3\\ v_1v_2\end{pmatrix}.$

Thus any $S_3$-equivariant polynomial map $(\mathbb{R}^3)^2 \to \mathbb{R}^3$ is a linear combination of these six maps, with coefficients that are polynomials in the polarized elementary symmetric polynomials of $u$ and $v$, i.e., those with arguments among $u$ and $v$ only.
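The six maps can be generated mechanically. The sketch below (ours, in SymPy) writes down the polarized elementary symmetric polynomials of degree 1 in $\lambda$ for $n = 2$, $d = 3$, and takes their gradients as in Step 3; the normalization of the polarization is a convention, so the output agrees with the display above up to scalar multiples.

```python
import itertools
import sympy as sp

d = 3
u = sp.symbols("u1 u2 u3")     # first input vector x^(1)
v = sp.symbols("v1 v2 v3")     # second input vector x^(2)
c = sp.symbols("c1 c2 c3")     # coordinates of the dual variable lambda

def polarized_e(vectors):
    # Polarization of the elementary symmetric polynomial e_k: sum over all ways of
    # assigning the k argument vectors to k distinct coordinate slots.
    expr = 0
    for idx in itertools.permutations(range(d), len(vectors)):
        term = 1
        for vec, a in zip(vectors, idx):
            term = term * vec[a]
        expr = expr + term
    return sp.expand(expr)

# The algebra generators of degree 1 in lambda: exactly one argument equal to lambda.
deg1 = [polarized_e([c]),
        polarized_e([u, c]), polarized_e([v, c]),
        polarized_e([u, u, c]), polarized_e([u, v, c]), polarized_e([v, v, c])]

# Step 3 of Algorithm 1: the six equivariant maps are the gradients with respect to lambda.
for q in deg1:
    print([sp.diff(q, ci) for ci in c])
```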

7. Discussion

This article gives a gentle introduction to equivariant machine learning, and it explains how to parameterize equivariant maps in two different ways. One requires knowledge of the irreducible representations and Clebsch–Gordan coefficients, and the other requires knowledge of the generators of the algebra of invariant polynomials on $V \times W^*$. The main focus is on the latter, which is useful for designing equivariant machine learning models with respect to groups and actions where the invariants are known and are computationally tractable. This is not a panacea: sometimes the algebra of invariant polynomials is too large, or outright not known. Both these issues come up, for example, in the action of permutations on symmetric matrices by conjugation, the relevant one for graph neural networks. The invariant ring has not been fully described as of this writing, except for small $n$, where the number of generators increases very rapidly with $n$; see [17].

From a practitioner's point of view, it is not yet clear which of these approaches will behave better in practice. We conjecture that Malgrange's construction applied to nonpolynomial invariants such as those of [2, 5, 11], as described at the end of Section 3, is a promising direction for some applications, especially because some of them exhibit desirable stability properties.

Acknowledgments

The authors thank Gerald Schwarz (Brandeis University) for introducing us to Malgrange’s method. The authors also thank Jaume de Dios Pont (UCLA), David W. Hogg (NYU), Teresa Huang (JHU), and Peter Olver (UMN) for useful discussions. SV and BBS are partially supported by ONR N00014-22-1-2126. SV was also partially supported by the NSF-Simons Research Collaboration on the Mathematical and Scientific Foundations of Deep Learning (MoDL) (NSF DMS 2031985), NSF CISE 2212457, and an AI2AI Amazon research award.

References

[1]
B. Blum-Smith and S. Villar, Machine learning and invariant theory, arXiv preprint arXiv:2209.14991, 2023.
[2]
J. Cahill, J. W. Iverson, D. G. Mixon, and D. Packer, Group-invariant max filtering, arXiv preprint arXiv:2205.14039, 2022.
[3]
T. Cohen and M. Welling, Group equivariant convolutional networks, Proceedings of the 33rd International Conference on Machine Learning, PMLR, 2990–2999, 2016.
[4]
Harm Derksen and Gregor Kemper, Computational invariant theory, Second enlarged edition, Encyclopaedia of Mathematical Sciences, vol. 130, Springer, Heidelberg, 2015. With two appendices by Vladimir L. Popov, and an addendum by Norbert A'Campo and Popov; Invariant Theory and Algebraic Transformation Groups, VIII, DOI 10.1007/978-3-662-48422-7. MR3445218.
[5]
N. Dym and S. J. Gortler, Low dimensional invariant embeddings for universal geometric learning, arXiv preprint arXiv:2205.02956, 2022.
[6]
N. Dym and H. Maron, On the universality of rotation equivariant point cloud networks, International Conference on Learning Representations, 2021.
[7]
Karin Gatermann, Computer algebra methods for equivariant dynamical systems, Lecture Notes in Mathematics, vol. 1728, Springer-Verlag, Berlin, 2000, DOI 10.1007/BFb0104059. MR1755001.
[8]
M. Geiger and T. Smidt, e3nn: Euclidean neural networks, arXiv preprint arXiv:2207.09453, 2022.
[9]
R. Kondor, Z. Lin, and S. Trivedi, Clebsch–Gordan nets: a fully Fourier space spherical convolutional neural network, Advances in Neural Information Processing Systems 31 (2018), 10117–10126.
[10]
Domingo Luna, Fonctions différentiables invariantes sous l'opération d'un groupe réductif (French, with English summary), Ann. Inst. Fourier (Grenoble) 26 (1976), no. 1, ix, 33–49. MR423398.
[11]
Peter J. Olver, Invariants of finite and discrete group actions via moving frames, Bull. Iranian Math. Soc. 49 (2023), no. 2, Paper No. 11, 12, DOI 10.1007/s41980-023-00744-0. MR4549776.
[12]
V. L. Popov and E. B. Vinberg, Invariant theory, Algebraic geometry IV, Springer, 1994, pp. 123–278.
[13]
F. Ronga, Stabilité locale des applications équivariantes, Differential topology and geometry (Proc. Colloq., Dijon, 1974), Lecture Notes in Math., Vol. 484, Springer, Berlin, 1975, pp. 23–35. MR0445526.
[14]
Gerald W. Schwarz, Smooth functions invariant under the action of a compact Lie group, Topology 14 (1975), 63–68, DOI 10.1016/0040-9383(75)90036-1. MR370643.
[15]
Gerald W. Schwarz, Lifting smooth homotopies of orbit spaces, Inst. Hautes Études Sci. Publ. Math. 51 (1980), 37–135. MR573821.
[16]
J. Shawe-Taylor, Building symmetries into feedforward networks, 1989 First IEE International Conference on Artificial Neural Networks (Conf. Publ. No. 313), 158–162, IET, 1989.
[17]
N. M. Thiéry, Algebraic invariants of graphs; a study based on computer exploration, ACM SIGSAM Bulletin 34 (2000), no. 3, 9–20.
[18]
S. Villar, D. W. Hogg, K. Storey-Fisher, W. Yao, and B. Blum-Smith, Scalars are universal: Equivariant machine learning, structured like classical physics, Advances in Neural Information Processing Systems 34 (2021), 28848–28863.
[19]
S. Villar, D. W. Hogg, W. Yao, G. A. Kevrekidis, and B. Schölkopf, The passive symmetries of machine learning, arXiv preprint arXiv:2301.13724, 2023.
[20]
Patrick A. Worfolk, Zeros of equivariant vector fields: algorithms for an invariant approach, J. Symbolic Comput. 17 (1994), no. 6, 487–511, DOI 10.1006/jsco.1994.1031. MR1300350.

Credits

Opening image is courtesy of filo via Getty.

Photo of Ben Blum-Smith is courtesy of Ryan Lash/TED.

Photo of Soledad Villar is courtesy of Erwin List/Soledad Villar.