Some thoughts on automation and mathematical research

By Akshay Venkatesh

Abstract

I discuss how mathematicians come to a shared notion of what is important, and how automated reasoning might affect that process.

The deeper one digs the spade, the harder the digging gets; maybe it has become too hard for us unless we are given some outside help, be it even by such devilish devices as high-speed computing machines.
H. Weyl [6]

1. Introduction

In 2017, DeepMind’s AlphaZero taught itself chess and Go “overnight,” surpassing human performance and apparently reconstructing a good part of accumulated knowledge about chess openings. We will consider a thought experiment:

What if, in ten years, “Alephzero” (written $\aleph(0)$) does the same for mathematics?

“Mathematics” for the purpose of this essay means “research in pure mathematics.” Our starting point is to imagine that $\aleph(0)$ teaches itself high school and college mathematics and works its way through all of the exercises in the Springer-Verlag Graduate Texts in Mathematics series. The next morning, it is let loose upon the world: mathematicians download its children and run them with our own computing resources. What happens next, say, in the subsequent decade?

This is indeed a thought experiment, for it is clearly unrealistic: By restricting our horizon to ten or twenty years in the future, we allow ourselves to consider the question in isolation from the social changes that would likely accompany this kind of technological advance, and also allow ourselves to avoid thinking about more extreme types of machine intelligence: we model $\aleph(0)$ as a power tool and not as a sentient collaborator. Nonetheless, I have found the exercise to be clarifying.

We may comfort ourselves with the thought that, in reality, the premise is so far in the future that we need not think about it. But if we allow even a remote possibility that this might happen in twenty years (the timescale between commencing an undergraduate degree and obtaining tenure), it certainly merits our grappling with the possibilities. I suggest that:

Human mathematics may go on as before in many respects, just as many other professions have adapted to automation. Indeed, the resulting mathematics will be inestimably more powerful than ours, in the sense that its ability to solve any specific question will be vastly greater.

However, the resulting field will be greatly altered; its central questions and values will be very different from those to which we are accustomed, rendering it all but unrecognizable to us.

The main point I want to make here is that the mechanization of our cognitive processes will enhance our ability to do mathematics but also will alter our understanding of what mathematics is. We cannot meaningfully assess the first point without taking into account the second. To look at it seriously we must examine, at a minimum, the effect of automation on those processes by which our field decides which questions are interesting and fruitful; as practitioners we rarely stop to think about these, but, even setting aside our current purpose, there are many reasons not to leave the examination of such matters entirely to historians and sociologists of science.

In the remainder of the essay, I will discuss how value and consensus are constructed and maintained in current research mathematics, and then consider how $\aleph(0)$ will affect some of these processes.

2. Preliminary observations

We should begin by observing that human mathematical research is in no danger of being killed. There is a very large gap between the ease of asking a question and the difficulty of answering it; and for a meaningful notion of human research it is sufficient that we understand the questions but cannot solve them readily.

It is tempting to wonder about the specifics of $\aleph(0)$’s capabilities. Will it be able to visualize higher dimensions? Will it produce proofs that are displeasing, or even oracular insight without proofs? Will it surpass us at all mathematical reasoning tasks (a scenario that we should certainly not dismiss)? Indeed, it is very hard to imagine the exact structure of post-$\aleph(0)$ mathematics without some understanding of such issues. But we can still hope to obtain insight without such details simply by thinking of extreme versions of commonplace phenomena. For example, many consequences of the development of $\aleph(0)$ will resemble the consequences of a very large increase in the number of working mathematicians. The experience of producing alien insight without proof would also not be wholly foreign to us, for our colleagues in physics departments have done this for a long time, and with less electricity consumed.

It is similarly irrelevant to our current purpose to know whether $\aleph(0)$ can enter mathematical realms that are essentially beyond our comprehension. We will regard this as the proverbial tree falling in an unpopulated forest, i.e., we are interested only in the effect on humans.

3. Value and consensus in mathematics

What follows is, evidently, a crude analysis of one part of a very complicated system. However, the specifics do not matter so much; the key take-away for our purposes will be that how we value mathematics is an active process inextricably woven in with the actual doing of mathematics.

There are infinitely many mathematical problems, and a finite number of mathematicians. Very few mathematicians substantively interact with a typical problem, and conversely a single mathematician can be aware of only a small part of the mathematical landscape. By what mechanism, then, does it happen that there is a substantial measure of consensus on what the important problems are, at least at a given time, and even stronger consensus on who is doing important work? I don’t mean to suggest, of course, that we mathematicians have anything near unanimity on such issues. However, my impression is that we have much more of it than other academic fields.

The valuation mechanism is fundamentally important because it constrains with an iron, if invisible, hand, the mathematics we can feasibly do. It is responsible for selecting what we are exposed to in talks, seminars, and papers, and for incentivizing some questions over others. In a sense, it defines what mathematics is at any given time. So it is crucial to carefully examine how this value structure evolves. The points I am about to make are very simple ones, instinctively grasped by mathematicians in our working lives, but they are not often enunciated explicitly.

There are some obvious (overlapping) mechanisms that influence the construction of value:

(a) External validation (for example, the influence of applied fields such as cryptography or fluid mechanics);

(b) Processes that direct our attention (e.g., seminars, conferences, journals, prizes, influence of individual charisma, social media);

(c) Infrastructure (e.g., the organization of the educational system, the hiring process, and the grant process);

(d) Aesthetic considerations.

We shall assume these mechanisms will evolve slowly relative to the transition we want to study, and so we will not discuss them. This is clearly not entirely realistic, and point (b) is particularly important, both because it has evolved very rapidly in recent times (e.g., through the creation of giant online seminars), and because it mediates the processes discussed below.

In any case, (a)–(d) miss a crucial part of the picture because they are not specific to mathematics, and I think they do not adequately explain why mathematics should have a higher level of consensus than other academic fields. There is one feature of mathematics that stands out: it has distinguished a specific class of scholarly communications (proofs), which are defined by the fact that they should induce uniform agreement about their validity without any need for replication [footnote 1]. It is reasonable to suppose that our elevated level of broad consensus is ultimately derived from our much higher level of consensus on the narrow issue of validity of proof. I will assume this is so, although it is by no means obvious; to investigate this point further, it would be useful to compare with fields such as physics, economics, and computer science where proof plays a substantial but less central role. In any case, it becomes important to study how consensus might propagate from a restricted setting to a broader one.

Footnote 1. In fact, in practice, the correctness of mathematical proofs is at least partly maintained by a process of replication, and it is currently an interesting topic of discussion how close modern proofs are to being formally valid. However, all that is important for us here is that a proof is generally understood to mean an argument compelling consensus.

There are many situations, such as the price mechanism in a free market or the Elo rating system of chess, where information is propagated through a network by repeated local transactions, thereby arriving at a consensus even when individual actors have only local information. I suggest that a similar mechanism, which we could informally call

(e) Free trade in ideas [footnote 2],

is a crucial component of the valuation mechanism in mathematics. I will describe it as a Bayesian process of updating our mental landscape of mathematics and mathematicians as we receive information about it. Models of this type have been studied extensively in different contexts (see [3] and [5] for examples from computer science and cognitive science, respectively).

Footnote 2. This phrase, suggesting a market metaphor for an intellectual process, appears in the dissent of Justice Oliver Wendell Holmes in a famous decision of the United States Supreme Court [1]; that text continues “the best test of truth is the power of the thought to get itself accepted in the competition of the market.”
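As a concrete instance of consensus emerging from repeated local transactions, the Elo system mentioned above can be sketched in a few lines. This is only an illustration of the standard Elo formulas; the function names and the step size `k = 32` are assumptions chosen for the example, not part of the argument.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return both players' new ratings after a single game.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    k is the step size; 32 is an assumed, commonly used value.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # The update is purely local: only the two participants change.
    return rating_a + delta, rating_b - delta


# Two equally rated players; A wins, so A gains exactly what B loses.
new_a, new_b = elo_update(1500.0, 1500.0, 1.0)
```

Each game touches only two ratings, yet after many games the ratings of players who have never met become comparable. That is the feature the analogy with mathematics relies on.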

Tautologically, the value we assign to a work of mathematics is purely subjective, in the sense that it depends solely on the perception of that work, and not on any objective quality. Through what means is a work of mathematics perceived by other mathematicians? The size and complexity of modern mathematics means that most papers are almost incomprehensible to us; our opinion of them can then only repeat that of others. The only people who can be involved in the formation of opinion about a given paper or a given question are those who interact with it in some way. Now, the set of people who study the details of any argument themselves is very small; a much larger group acquire, instead, an awareness of its relationship to other existing work. This can be acquired quite incidentally, e.g., through attending talks, reading or refereeing papers, reading or writing recommendation letters, and other less formal methods. Let us, proceeding by way of example, examine how such awareness of the relationship between different works can shape opinion.

Suppose that we learn of a relationship between two hitherto unrelated conjectures in our field:

$$\text{conjecture } P \implies \text{conjecture } Q. \tag{1}$$

This could mean that (i) conjecture $P$ is more important than we thought, or that (ii) conjecture $Q$ is easier than we thought. In practice we decide (to some extent unconsciously) according to the prior uncertainty of our beliefs: if $Q$ is a conjecture of long standing, option (i) is more likely, and if $P$ is a conjecture of long standing, option (ii) is more likely. Nor does $P$ need to imply $Q$ for this conclusion; they need only be linked in some substantive way. A similar situation occurs if

$$\text{mathematician } X \text{ proves conjecture } P; \tag{2}$$

this is possible evidence that either $X$ is a good mathematician, or that $P$ is an easy conjecture, and in practice we again choose in a fashion dependent on our prior information. In either of the situations (1) or (2), our views and uncertainty about both interacting parties are altered.
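The second situation, in which a mathematician proves a conjecture, can be made concrete with a toy Bayesian calculation. The sketch below is purely illustrative: the two-valued notions of "skill" and "difficulty", the likelihood table, and the function name are all assumptions introduced here, and the numbers are invented.

```python
# Hypothetical likelihoods: probability that the proof appears,
# given the mathematician's skill and the conjecture's difficulty.
LIKELIHOOD = {
    ("strong", "easy"): 0.9,
    ("strong", "hard"): 0.3,
    ("ordinary", "easy"): 0.5,
    ("ordinary", "hard"): 0.02,
}


def update_on_proof(prior_strong: float, prior_hard: float):
    """Posterior (P(strong), P(hard)) after observing that the proof exists.

    Skill and difficulty are assumed independent a priori.
    """
    p_skill = {"strong": prior_strong, "ordinary": 1.0 - prior_strong}
    p_diff = {"hard": prior_hard, "easy": 1.0 - prior_hard}
    joint = {
        (s, d): p_skill[s] * p_diff[d] * LIKELIHOOD[(s, d)]
        for s in p_skill
        for d in p_diff
    }
    total = sum(joint.values())
    post_strong = sum(v for (s, _), v in joint.items() if s == "strong") / total
    post_hard = sum(v for (_, d), v in joint.items() if d == "hard") / total
    return post_strong, post_hard


# Long-standing conjecture, unknown mathematician: opinion of the person moves most.
newcomer = update_on_proof(prior_strong=0.5, prior_hard=0.95)
# Established mathematician, recent conjecture: opinion of the problem moves most.
veteran = update_on_proof(prior_strong=0.95, prior_hard=0.5)
```

In both calls the same event is observed, but the belief that shifts most is the one held with less prior certainty, which is the pattern described in the text.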

The intellectual activity in a field involves innumerable interactions of this general type. (It is a gross oversimplification to reduce mathematics to a collection of events of type (1) and (2), but we will adopt this very crude model for our current discussion, keeping in mind its obvious limitations.) The endless iteration of the resulting value negotiations is an important means by which the value of problem $P$ is established within the “vicinity” of $P$, i.e., among those people to whom problem $P$ is visible, and is also perhaps the dominant means of establishing a status hierarchy [footnote 3] among workers in that community. The specifics of how this mechanism works are, of course, heavily influenced by what defines “visible,” in particular the processes mentioned in (b). Now, although two observers $X$ and $Y$ in the same field do not observe the same interactions and they do not, in general, interpret identically those that they do both see, there is nonetheless a substantial fraction on which they agree precisely because of the concept of rigorous proof. This reduces the discrepancy between the value systems deduced by $X$ and $Y$.

Footnote 3. This phrase may evoke various negative connotations; however, I want to avoid discussing normative issues here, and make only the point that such hierarchies both influence and are influenced by the hierarchy of importance assigned to scientific problems.

To spell out: when will a new conjecture $P$ acquire a high value in this model? This will be so, to the greatest extent, if both of the processes (1) and (2) just described raise its status, which is to say:

(a) It is difficult: many people try to solve $P$ and fail.

(b) It is central: $P$ is linked with many other questions of (prior) importance.

An interesting empirical study of the relative status of different research fields within mathematics has been carried out by Schlenker [4]. He examines which subfields of mathematics have the most “prestige,” this notion being defined via bibliometrics, prizes, and departmental rankings; to explain his results, he hypothesizes that fields of high “prestige” are distinguished by a focus on a small number of central questions.

How does this hypothesis relate to our discussion? We just noted that our simple model predicts the role of centrality in determining status. The role of the small number is that problems require many repeated attempts at a solution (strictly, many repeated visible attempts) to certify their difficulty. This is only possible when the number of workers is large relative to the number of questions.
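The dependence on the ratio of workers to questions can be illustrated with a small calculation. This is a rough sketch under invented assumptions: each visible attempt is treated as an independent trial, a problem is either "easy" or "hard", and the per-attempt success rates and function names are hypothetical.

```python
def posterior_hard(
    failed_attempts: int,
    prior_hard: float = 0.5,
    p_solve_easy: float = 0.3,
    p_solve_hard: float = 0.01,
) -> float:
    """Probability that a problem is hard after the given number of visible failures.

    The per-attempt success rates are assumed values for illustration.
    """
    like_hard = (1.0 - p_solve_hard) ** failed_attempts
    like_easy = (1.0 - p_solve_easy) ** failed_attempts
    weighted_hard = prior_hard * like_hard
    return weighted_hard / (weighted_hard + (1.0 - prior_hard) * like_easy)


def attempts_per_question(workers: int, questions: int) -> int:
    """Visible attempts per question if workers spread evenly over questions."""
    return workers // questions


# Same number of workers, different numbers of open questions.
focused = posterior_hard(attempts_per_question(100, 10))    # ten attempts each
diffuse = posterior_hard(attempts_per_question(100, 100))   # one attempt each
```

With the same hundred workers, a field organized around ten questions can certify each as difficult with high confidence, whereas a field with a hundred questions barely moves from its prior.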

But then, why do some fields have fewer central questions than others? I cannot see any meaningful or intrinsic sense in which one field has “fewer” problems than another. Partly the emergence of central questions may reflect the structure of the mathematics itself, which is very difficult to quantify, but a readily visible factor is the extent of barriers to working on new problems. Where such barriers are low (as, for example, in combinatorics) [footnote 4], the set of problems under investigation can be relatively large in comparison to the number of workers in the field.

Footnote 4. A colleague of mine, in reading this, felt that it might be interpreted as demeaning combinatorics. My intention is in fact quite the opposite. If anything I hope that analyzing the origins of our conceptions about “depth” will make us think more critically about those conceptions.

It is also interesting to consider failures of consensus, which may arise because different observers see different parts of the network. Consider, for example, problems $P$ that are common to two fields $A$ and $B$ which otherwise have little overlap. Observers from field $A$ and those from field $B$ then see $P$ within entirely different “contexts” and its importance may be perceived rather differently within the two fields. This can even happen when field $B$ is an offshoot of field $A$, or potentially when $A$ and $B$ are the same field at different times. Increases in the overlap of $A$ and $B$ would probably lead to equalization.

I have attempted here to mechanistically model some part of how valuation in mathematics operates in practice, but I am not advocating any position on how it should work. To discuss this, we would first need to clarify what the goals, internal and external, of mathematics research are; such a discussion can obviously go on without end, which is fortunate, because in our post-$\aleph(0)$ existence the fundamental role for humans may be exactly to carry on this conversation.

4. The impact of mechanization

We have offered a very rough model of part of our valuative mechanism via (Bayesian) interaction in a network of mathematicians and problems. We now consider how $\aleph(0)$ will affect this network and alter the resulting outcome.

Perceived difficulty is, as we have seen, an essential component of our construction of value. No matter the specifics, $\aleph(0)$ will alter our ability to solve questions and therefore our perception of their difficulty. The parts of the mathematical process that can be sped up the most by $\aleph(0)$ will have the greatest reductions in their perceived difficulty, and, according to our model above, will suffer the greatest reduction in status. Similar patterns occur in many instances of automation.

The centrality of questions (that is to say, their relationship to others) is another component of the way we value mathematics, and we expect $\aleph(0)$ to change this too. Let us suppose that the energies of $\aleph(0)$ are partly directed towards reworking the existing literature: revisiting and supplying proofs of known results rather than examining open questions. As we have emphasized, the number of mathematicians who have thought about a specific question is typically very small, and it is likely that very many parts of the literature would be greatly revised even through careful re-examination by many human mathematicians. It is not unlikely that we will see a scenario that has happened surprisingly rarely in recent history: the replacement of long elaborate proofs by short overlooked ones. What effect might a five-page combinatorial proof of the Weil conjectures have? Even if such an extreme scenario does not occur, it seems very likely that the web of relationships between standard lemmas and theorems will be altered. This discussion also suggests why the operators of $\aleph(0)$ may be induced to revisit old problems over studying new ones: besides settling concerns about formal correctness, the shifting of foundations has a larger social impact than adding new levels.

Finally, $\aleph(0)$ will greatly expand the entire landscape of questions considered mathematically interesting. Such inflation can happen through many different paths; it is not necessary for $\aleph(0)$ to explicitly generate questions on its own, for new mathematics always generates new questions, and correspondingly any process accelerating research in mathematics will accelerate the creation of new questions. (If $\aleph(0)$ does all the proving and we do all the questioning, the result is not so different from a scenario where $\aleph(0)$ is capable of generating its own mathematical conjectures.) Now we already saw that fields with an oversupply of problems relative to the number of workers may lose status, particularly if those problems do not organize around central ones. Since the existence of $\aleph(0)$ will increase both the number of problems and the effective number of workers, it is not clear how this will play out; but certainly we may expect great variability from the current situation. In such an expanded landscape many currently central problems may become peripheral.

These three points already suggest a great shift in what problems and fields will attract the most attention. However, the process may extend beyond this, and affect, for example, the balance between heuristics and rigor, the role of aesthetic considerations, the extent of consensus, and the placement of boundaries such as those between professional and amateur mathematics or pure and applied mathematics. ($\aleph(0)$ will likely level the playing field between professional mathematicians and other interested parties.) The definitions and concepts which structure our perception of mathematical reality are designed, in part, to ease the cognitive load of interacting with intricate structure; if this load is partly borne by a machine, it is possible that new definitions and categorizations may lead to radical reframings. To analyze the specifics is obviously impossible without a better idea of the abilities of $\aleph(0)$, but whatever direction it goes, it will go far.

An important limitation on rapid change in a subject is the length of the professional career. Those who can most readily enter a new field are the young, and the extent to which this is possible is limited by the structure of hiring; senior scientists are slower to change their view of what is valuable. Nonetheless, since it will presumably be infeasible to do research without making use of mechanized assistants in the post-$\aleph(0)$ age, the impacts that we have detailed above will likely extend to senior mathematicians also, although their effects will be more extreme for younger mathematicians.

In the normal development of any scholarly field the way we assign importance and value is continuously changing and evolving. What distinguishes our scenario is the breadth and magnitude of these effects and the short timescale over which they are likely to occur; developments that previously took several mathematical generations may be compressed into a few short years.

It is natural to look to history for metaphors. Post-mechanization mathematics may look to us as modern mathematics might impress those working a century ago, but I think this does not go far enough: the impact of $\aleph(0)$ on mathematical cognition may be much greater. To find a suitable parallel for this effect on our thought process, we might consider, for example, the introduction of algebraic notation in mathematics.

It is important for us to consider seriously the possibility of such developments.

5. An afterword

Added March 2023. This essay was not originally written with the intention of publication, but rather with the hope of conveying the urgency of reflecting upon these issues to the mathematical community.

Kumar Murty and the Fields Institute kindly took these concerns seriously enough to devote the 2022 Fields Symposium to this topic, broadly construed—the “changing face of mathematical research.” I learned a great deal from this wonderful meeting, and I was particularly delighted to interact with scholars from the humanities who share an interest in these topics. They have much to contribute to discourse among mathematicians, a point that is made at more length in Michael Harris’s contribution to this volume. Indeed, that we can acquire knowledge about the world in ways inaccessible to the direct reach of the senses, and that this knowledge may overflow the capacity of our individual brains—these are hardly new developments, and there is much to be learned by examining parallels in the history of human thought.

Although what I have written is informal and limited in its scope, and, in particular, does not examine any of the underlying philosophical issues, I hope that—if only as a call to think critically about what it is that we are doing—it is appropriate for this present issue of the Bulletin.

6. Acknowledgments

I would like to thank all the participants of the STMS seminar for providing a stimulating atmosphere in which to explore these ideas. I also thank Ken Alder, Aravind Asok, Brian Conrad, Harald Helfgott, David Nirenberg, Patrick Shafto, and David Treumann for interesting discussion and suggestions. That writing is a useful metaphor was suggested by Ken; Patrick pointed out the reference [5], and Brian pointed out several errors and suggested many interesting examples to consider. Finally, I note that the topics here have a substantial overlap with an excellent recent essay [2] by Jeremy Avigad, which also includes many interesting historical and mathematical examples.


References

[1] Abrams v. United States, 250 U.S. 616 (1919).
[2] J. Avigad, Varieties of mathematical understanding, Bull. Amer. Math. Soc. (N.S.) 59 (2022), no. 1, 99–117, DOI 10.1090/bull/1726. MR4340829.
[3] J. Pearl, Reverend Bayes on inference engines: A distributed hierarchical approach, Proceedings of the Second National Conference on Artificial Intelligence, AAAI Press, Menlo Park, California, 1982.
[4] J.-M. Schlenker, The prestige and status of research fields within mathematics, arXiv:2008.13244.
[5] S. Sloman, B. Love, and W. Ahn, Feature centrality and conceptual coherence, Cognitive Science 22 (1998), no. 2.
[6] H. Weyl, address at Princeton Bicentennial Conference, 1946.

Article Information

MSC 2020
Primary: 00A30 (Philosophy of mathematics), 00A35 (Methodology of mathematics)
Author Information
Akshay Venkatesh
School of Mathematics, Institute for Advanced Study, Princeton, New Jersey 08540
akshay.venkatesh@gmail.com
Additional Notes

This essay, originally written in February 2022, is a writeup and expansion of a talk I gave at the Institute for Advanced Study in November 2021 as part of an ongoing interdisciplinary seminar examining some of the impacts of machine learning. Some minor changes, including the addition of an afterword, were made in March 2023.

The author was supported by NSF grant DMS-1931087.

Journal Information
Bulletin of the American Mathematical Society, Volume 61, Issue 2, ISSN 1088-9485, published by the American Mathematical Society, Providence, Rhode Island.
Publication History
Copyright Information
Copyright 2024 American Mathematical Society
Article References
  • DOI 10.1090/bull/1834
  • MathSciNet Review: 4726987
  • A. Venkatesh, Some thoughts on automation and mathematical research, Bull. Amer. Math. Soc. 61 (2024), no. 2, 203–210.
