Probabilistic view of voting, paradoxes, and manipulation
Abstract
The Marquis de Condorcet, a French philosopher, mathematician, and political scientist, studied mathematical aspects of voting in the eighteenth century. Condorcet was interested in studying voting rules as procedures for aggregating noisy signals and in the paradoxical nature of ranking three or more alternatives. We survey some of the main mathematical models, tools, and results in a theory that studies probabilistic aspects of social choice. Our journey will take us through major results in mathematical economics from the second half of the twentieth century, through the theory of Boolean functions and their influences and through recent results in Gaussian geometry and functional inequalities.
1. Introduction
The Marquis de Condorcet, a French philosopher, mathematician, and political scientist, studied mathematical aspects of voting in the eighteenth century. It is remarkable that already in the eighteenth century Condorcet was an advocate of equal rights for women and people of all races and of free and equal public education Reference 73. His applied interest in democratic processes led him to write an influential paper in 1785 Reference 16, where in particular he was interested in voting as an aggregation procedure and where he pointed out the paradoxical nature of voting in the presence of three or more alternatives.
1.1. The law of large numbers and Condorcet’s jury theorem
In what is known as Condorcet’s jury theorem, Condorcet considered the following setup. There are voters and two alternatives denoted (which stands for and ) (which stands for Each voter obtains a signal which indicates which of the alternatives is preferable. The assumption is that there is an a priori better alternative and that each voter independently obtains the correct information with probability ). and incorrect information with probability The . voters then take a majority vote to decide the winner. Without loss of generality, we may assume that the correct alternative is and therefore the individual signals are i.i.d. (independent and identically distributed) random variables where and Let . denote the Majority function, i.e., the function that returns the most popular values among its inputs. Condorcet’s jury theorem is the following.
The first part of the theorem is immediate from the law of large numbers (which was known at the time), so the novel contribution was the second part. In the early days of modern democracy, Condorcet used his model to argue that the more people participating in decision making, the more likely that the correct decision is arrived at. We leave the proof of the second part of the theorem as an exercise.
1.2. Condorcet’s paradox and Arrow’s theorem
As hinted earlier, things are more interesting when there are three or more alternatives. In the same 1785 paper, Condorcet proposed the following paradox. Consider three voters named , and , and three alternatives named , , and , Each voter ranks the three alternatives in one of six linear orders. While it is tempting to represent the orders as elements of the permutation group . it will be more useful for us to use the following representation. Voter , preference is given by where , if she prefers to and , otherwise; if she prefers to and , otherwise; and if she prefers to and , otherwise. Each of the six rankings corresponds to one of the vectors in .
Condorcet considered three voters, with rankings given by , , or in our notation by the rows of the following matrix: ,
How should we decide how to aggregate the individual rankings? If we use the majority rule to decide between each pair of preferences, then we apply the majority rule on each of the columns of the matrix and conclude that overall preference is In other words, the overall preference is that . the overall preference is that , and the overall preference is that , This does not correspond to an order! This is what is known as Condorcet’s paradox. .
Almost two hundred years later, Ken Arrow asked if perhaps the paradox is the result of using the majority function to decide between every pair of alternatives? Can we avoid paradoxes if we aggregate pairwise preferences using a different function?
One function that never results in paradoxes is the dictator function as the aggregate ranking is .
Arrow in his famous theorem Reference 1Reference 2 proved the following.
With the right notation and formulation, the proof of Arrow’s theorem is very short (see Reference 3Reference 50).
Arrow also considered a more general setting where , and , are allowed to be different functions. In this case, there are other functions that never result in a paradox. For example if and for all inputs, then can be arbitrary. This corresponds to the case where the second alternative is ignored (always ranked last) and the choice between and is determined by Arrow’s theorem in this setting can be stated as follows: .
1.3. Manipulation and the Gibbard–Satterthwaite theorem
A naturally desirable property of a voting system is strategy-proofness (a.k.a. nonmanipulability): no voter should benefit from voting strategically, i.e., voting not according to her true preferences. However, Gibbard Reference 28 and Satterthwaite Reference 71 showed that no reasonable voting system can be strategy proof. Before stating the result, let us specify the model more formally.
The setting here is different than the setup of Arrow’s theorem: We consider voters electing a winner among alternatives. The voters specify their opinion by ranking the alternatives, and the winner is determined according to some predefined social choice function (SCF) of all the voters’ rankings, where denotes the set of all possible total orderings of the alternatives. We call a collection of rankings by the voters a ranking profile. We say that an SCF is manipulable if there exists a ranking profile where a voter who knows how all other voters vote, can achieve a more desirable outcome of the election according to her true preferences by voting in a way that does not reflect her true preferences.
For example, consider Borda voting, where each candidate receives a score which is the sum of its ranks, and the candidate with the lowest score wins. If the individual rankings are then , is the winner, but if the second voter were to vote instead, then will become the winner, so the second voter is incentivized to cast a vote that is different from her true ranking.
This theorem has contributed to the realization that it is unlikely to expect truthfulness in voting. There are many proofs of the Gibbard–Satterthwaite theorem, but all are more complex than the proof of Arrow’s theorem given above. We will not provide proof of the theorem here.
1.4. Judgment aggregation
A recent line of work is devoted to the problem of judgment aggregation. In the legal literature, Kornhauser and Sager Reference 43 discuss a situation where three cases are considered in court, and by law, one should rule against if and only if there is a ruling against both and When several judges are involved, their opinions should be aggregated using a function . that preserves this law, that is, satisfies
where
1.5. Modern perspectives
Work since the 1980s addressed novel aspects of aggregation of votes. Condorcet’s jury theorem assumes a probability distribution over the voters but is restricted to a specific aggregation function (majority) while Arrow’s theorem considers general aggregation functions but involves no probability model. There are many interesting questions that can be asked by combining the two perspectives. First, it is natural to ask about the aggregation properties of Boolean functions. The study of aggregation properties of Boolean functions was fundamental to the development of the area of analysis of Boolean functions since the 1980s, starting with the work of Ben-Or and Linial Reference 6 and of Kahn, Kalai, and Linial Reference 36. Second, we can ask questions regarding the probability of manipulation and paradoxes, questions that were analyzed since the early 2000s, starting with the works by Kalai, Nisan, Friedgut, and collaborators Reference 26Reference 37Reference 38.
The theory that was developed is intimately connected to the area of property testing in theoretical computer science and to additive combinatorics. Moreover, we will see that some of the main results and techniques have a discrete isoperimetric flavor. We discuss some of these topics and their connections as follows.
In Section 2, we will start by studying the question of noise stability of Boolean functions that were originally studied by Benjamini, Kalai, and Schramm in the context of percolation Reference 7 and which later played an important role in analyzing the probability of paradoxes, starting with the work of Kalai Reference 37. This work as well as motivation from theoretical computer science Reference 41, led to the proof of the Majority Is Stablest theorem by Mossel, O’Donnell, and Oleszkiewicz Reference 57Reference 58. The proof of these results will require some of the main analytical tools in the area, including, notably, hypercontraction Reference 4Reference 10Reference 29Reference 66, the invariance principle Reference 58, and Borell’s Gaussian noise-stability result Reference 12.
In Section 3, we sketch proofs of quantitative versions of Arrow’s theorem. We will follow Reference 52 by first proving a Gaussian version of Arrow’s theorem, as well as a quantitive version using reverse hypercontraction Reference 11. Combining the two, we will prove a general quantitive Arrow’s theorem Reference 52. We will also present Kalai’s original proof of a quantitative Arrow’s theorem Reference 37, which is less general but uses only hypercontraction via Reference 25.
Different tools were used in proving different quantitive versions of the manipulation theorem. The first proof, which applies only to three alternatives, uses a reduction to a quantitive Arrow’s theorem Reference 24Reference 26. In Section 4 we discuss later approaches that apply for any number of candidates and use reverse hypercontraction as a major tool Reference 32Reference 62. The classical proofs of manipulation theorems often use long paths of voting profiles. The most general proof in Reference 62 will quantify such arguments using geometric tools from the theory of Markov chains.
Recent work Reference 23 established quantitative versions of the result by List and Pettit by showing that if
1.6. Probability of paradox for the majority
As a preview of what’s to come, we will compute the asymptotic probability of a nontransitive outcome in the Condorcet setup with three alternatives and where voters vote uniformly at random. The first reference to this computation is by Guilbaud (1952); see Reference 14 and Reference 5Reference 27.
Let us denote the Majority function by
How do we analyze the probability of a paradox? The following simple fact was used in Reference 37: Since the binary predicate
we can write
which due to symmetry can be written as
Moreover, the uniform distribution over
where
and
In particular, the probability of paradox does not vanish as
2. Noise stability
2.1. Boolean noise stability
Consider the following thought experiment. Suppose the voters in binary voting obtain independent uniform signals:
Now consider the following process that produces a vector
How should we interpret
to be as large as possible if
We can also write the noisy inner product in terms of the noise operator
This section discusses noise stability of various families of functions and analytical properties of the operator
The noise operator plays a key role in the theory of hypercontraction Reference 10. Note that
and that
Basic properties of this operator can be revealed using its eigenfunctions, i.e., the Fourier basis. The following proposition is straightforward to prove.
The following folklore result is easily derived by explicitly writing the Fourier expression of
Theorem 2.4 allows a quick proof of a version of Arrow’s theorem by Kalai Reference 37:
From the voting perspective, there is something a little disappointing about Theorem 2.4. It says that if we want to maximize robustness, then among all balanced functions dictator is optimal. However, in the theory of voting dictators are usually not considered good voting schemes. From a mathematical perspective, it is disappointing that there is something special about
To the best of our knowledge, the value is not known for all