
# Threshold Phenomena for Random Discrete Structures

Communicated by *Notices* Associate Editor Emilie Purvine

## 1. Erdős–Rényi Model

To begin, we briefly introduce a model of random graphs. Recall that a graph is a mathematical structure that consists of vertices (nodes) and edges.

Roughly speaking, a random graph in this article means that, given a vertex set, the existence of each potential edge is decided at random. We will specifically focus on the *Erdős–Rényi random graph* (denoted by $G(n,p)$), which is defined as follows.

Consider $n$ vertices that are labelled from $1$ to $n$.

Observe that on those $n$ vertices, there are potentially $\binom{n}{2}$ edges, that is, the edges labelled $e_1, e_2, \ldots, e_{\binom{n}{2}}$. Given a probability $p$, include each of the $\binom{n}{2}$ potential edges with probability $p$, where the choice of each edge is made independently from the choices of the other edges.

In reality, when we consider $G(n,p)$, $n$ is a large (yet finite) number that tends to infinity, and $p$ is usually a function of $n$ that tends to zero as $n \to \infty$. For example, $p = 1/n$, $p = \log n/n$, etc.
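To make the model concrete, here is a minimal Python sketch of sampling $G(n,p)$ (the function name `sample_gnp` is ours, chosen for illustration): each of the $\binom{n}{2}$ potential edges is included independently with probability $p$.

```python
import random

def sample_gnp(n, p, seed=None):
    """Sample an Erdos-Renyi graph G(n, p): each of the C(n, 2)
    potential edges on vertices 1..n is included independently
    with probability p. Returns the edge set."""
    rng = random.Random(seed)
    edges = set()
    for u in range(1, n + 1):
        for v in range(u + 1, n + 1):
            if rng.random() < p:
                edges.add((u, v))
    return edges

g = sample_gnp(10, 0.5, seed=0)
```

For instance, with $p = 0$ the returned edge set is always empty, and with $p = 1$ it is the complete graph $K_n$ — the two extreme cases mentioned below.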

As we saw in Example 1.1, a random graph is a random variable with a certain probability distribution (as opposed to a fixed graph) that depends on the values of $n$ and $p$. Assuming $n$ is given, the structure of $G(n,p)$ changes as the value of $p$ changes, and in order to understand $G(n,p)$ we ask questions about its structure, such as

*What is the probability that $G(n,p)$ is connected?*

or

*What is the probability that $G(n,p)$ contains a triangle?*

Basically, for *any* property of interest, we can ask

*What is the probability that $G(n,p)$ has that property?*

In those questions, usually we are interested in understanding the *typical* structure/behavior of $G(n,p)$. Observe that, unless $p = 0$ or $p = 1$, there is always a positive probability that all of the edges in $G(n,p)$ are absent, or all of them are present (see Examples 1.2, 1.3). But in this article, we would rather ignore such extreme events that happen with a tiny probability, and focus on properties that $G(n,p)$ possesses with a probability close to 1.

We often use language and tools from probability theory to describe and understand behaviors of $G(n,p)$. Below we discuss some very basic examples.

We will write $f \ll g$ if $f/g \to 0$ as $n \to \infty$.

One important object in graph theory is the *complete graph*, a graph with all the potential edges present. The complete graph on $n$ vertices is denoted by $K_n$.

We can easily imagine that, unless $p$ is very close to $1$, it is extremely unlikely that $G(n,p)$ is complete. Indeed,

$$\mathbb{P}(G(n,p) = K_n) = p^{\binom{n}{2}}$$

(since we want all the edges present), which tends to 0 unless $1-p$ is of order at most $n^{-2}$.

Similarly, we can compute the probability that $G(n,p)$ is “empty” (let’s denote this by $G(n,p) = \emptyset$), meaning that no edges are present.Footnote^{1} The probability for this event is

$$\mathbb{P}(G(n,p) = \emptyset) = (1-p)^{\binom{n}{2}}.$$

When $p$ is small, $1-p$ is approximately $e^{-p}$, so the above computation tells us that

$$\mathbb{P}(G(n,p) = \emptyset) \approx e^{-p\binom{n}{2}},$$

which tends to $1$ if $p \ll n^{-2}$.

How many edges does $G(n,p)$ typically have? The natural first step to answer this question is computing the *expected* number of edges in $G(n,p)$. Using *linearity of expectation*,

$$\mathbb{E}[\text{number of edges in } G(n,p)] = \binom{n}{2}p.$$

For example, if $p = 1/n$, then the expected number of edges in $G(n,1/n)$ is $\binom{n}{2}\frac{1}{n} \approx n/2$. But does this really imply that $G(n,1/n)$ *typically* has about $n/2$ edges? The answer to this question is related to the fascinating topic of “concentration of a probability measure.” We will very briefly discuss this topic in Example 2.5.
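As a quick sanity check on the linearity-of-expectation computation, one can estimate the average edge count of $G(n,p)$ by simulation (a rough Monte Carlo sketch; the helper name is ours):

```python
import random

def avg_edge_count(n, p, trials=200, seed=1):
    """Monte Carlo estimate of the expected number of edges in G(n, p):
    sample `trials` independent graphs and average their edge counts."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += sum(1 for u in range(n) for v in range(u + 1, n)
                     if rng.random() < p)
    return total / trials

# linearity of expectation predicts C(30, 2) * (1/30) = 435/30 = 14.5
estimate = avg_edge_count(30, 1 / 30)
```

The estimate hovers near $\binom{30}{2}\cdot\frac{1}{30} = 14.5$, matching the formula above.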

Similarly, we can compute the expected number of *triangles* (the complete graph $K_3$) in $G(n,p)$:

$$\mathbb{E}[\text{number of triangles in } G(n,p)] = \binom{n}{3}p^3.$$

The above computation tells us that

$$\binom{n}{3}p^3 \to 0 \quad \text{if } p \ll 1/n,$$

from which we can conclude that $G(n,p)$ is typically triangle-free if $p \ll 1/n$. (If the expectation tends to 0, then there is little chance that $G(n,p)$ contains a triangle.)

On the contrary, we *cannot* conclude that $G(n,p)$ *typically* contains many triangles for $p \gg 1/n$ from the above expectation computation. Just think about a lottery with an enormous prize but a minuscule chance of winning to see that a large expectation does not necessarily imply a high chance of the occurrence of an event. In general, showing that a desired structure typically *exists* in $G(n,p)$ is a very challenging task, and this became a motivation for the *Kahn–Kalai conjecture* that we will discuss in the latter sections.

## 2. Threshold Phenomena

One striking thing about $G(n,p)$ is that the appearance and disappearance of certain properties are abrupt. Probably one of the most well-known examples that exhibit threshold phenomena of $G(n,p)$ is the appearance of the *giant component.* A *component* of a graph is a maximal connected subgraph. For example, the graph in Figure 4 consists of four components, and the size (the number of vertices) of each component can be read off from the figure.

**Figure 4.**

A graph that consists of four components.

For $G(n,p)$, observe that, when $p = 0$, the size of a largest component of $G(n,0)$ is 1; in this case all of the edges are absent with probability 1, so each of the components is an isolated vertex. On the other hand, when $p = 1$, $G(n,1)$ is the complete graph with probability 1, so the size of its largest component is $n$.

**Figure 5.**

$G(n,0)$ and $G(n,1)$.

Then what if $p$ is strictly between $0$ and $1$?

What’s the (typical) size of a largest component in $G(n,p)$?

Of course, one would naturally guess that as $p$ increases from $0$ to $1$, the typical size of a largest component in $G(n,p)$ would also increase from 1 to $n$. But what is really interesting here is that there is a “sudden jump” in this increment.

In the following statement and everywhere else, *with high probability* means that the probability that the event under consideration occurs tends to 1 as $n \to \infty$.

For any $\varepsilon > 0$, the size of a largest component of $G(n,p)$ is

$$\begin{cases} \text{at most } C\log n & \text{if } p = \frac{1-\varepsilon}{n},\\[2pt] \text{at least } cn & \text{if } p = \frac{1+\varepsilon}{n}, \end{cases}$$

with high probability, where $C = C(\varepsilon)$ and $c = c(\varepsilon)$ depend only on $\varepsilon$.

The above theorem says that if $p$ is “slightly smaller” than $1/n$, then typically all of the components of $G(n,p)$ are very small (note that $\log n$ is much smaller than the number of vertices, $n$).
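The sudden jump described above is easy to observe experimentally. Below is a small sketch (function name ours) that samples $G(n,p)$ and measures its largest component with a union–find structure, at $p$ slightly below and slightly above $1/n$:

```python
import random

def largest_component(n, p, rng):
    """Size of a largest component of one sample of G(n, p),
    computed with a union-find (disjoint-set) structure."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:           # edge present: merge components
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv
    sizes = {}
    for x in range(n):
        r = find(x)
        sizes[r] = sizes.get(r, 0) + 1
    return max(sizes.values())

rng = random.Random(0)
n = 2000
sub = largest_component(n, 0.5 / n, rng)   # p below 1/n
sup = largest_component(n, 1.5 / n, rng)   # p above 1/n
```

With $n = 2000$, the subcritical run typically yields a largest component of only a few dozen vertices, while the supercritical run yields one of linear size.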

**Figure 6.**

$G(n,p)$ with all components small.

On the other hand, if $p$ is “slightly larger” than $1/n$, then the size of a largest component of $G(n,p)$ is as large as linear in $n$. It is also well-known that all other components are very small (at most of order $\log n$), and this unique largest component is called the *giant component*.

**Figure 7.**

$G(n,p)$ with the giant component.

So around the value $p = 1/n$, the giant component “suddenly” appears, and therefore the structure of $G(n,p)$ also drastically changes. This is one example of the *threshold phenomena* that $G(n,p)$ exhibits, and the value $1/n$ is a *threshold function* for the property of having the giant component. (The formal definition of a threshold function is given in Definition 2.3. See also the definition of *the threshold* in Section 5.)

The abrupt appearance of the giant component of $G(n,p)$ is just one instance of vast threshold phenomena for *random discrete structures*. In this article, we will mostly deal with $G(n,p)$ for the sake of concreteness, but there will be a brief discussion about a more general setting in Section 5.

Now we introduce the formal definition of a threshold function due to Erdős and Rényi. Recall that, in $G(n,p)$, all the vertices are labelled $1, \ldots, n$. A *graph property* is a property that is invariant under graph isomorphisms (i.e., relabelling the vertices), such as connectivity, planarity, etc. We use $\mathcal{F}$ for a graph property, and $G \in \mathcal{F}$ denotes that $G$ has property $\mathcal{F}$.

Given a graph property $\mathcal{F}$, we say that $p^* = p^*(n)$ is a *threshold function*Footnote^{2} (or simply a *threshold*) for $\mathcal{F}$ if

$$\mathbb{P}(G(n,p) \in \mathcal{F}) \to \begin{cases} 0 & \text{if } p \ll p^*,\\ 1 & \text{if } p \gg p^*. \end{cases}$$

^{2} By the definition, a threshold function is determined only up to a constant factor, and is thus not unique; but conventionally people also call it *the* threshold function. In this article, we will separately define *the threshold* in Section 5, which is distinguished from a threshold function.

For example, $1/n$ is a threshold function for the existence of the giant component.

Note that it is not obvious at all whether a given graph property would admit a threshold function. Erdős and Rényi proved that many graph properties have a threshold function, and about 20 years later, Bollobás and Thomason proved that, in fact, a wide class of properties admits a threshold function. In what follows, an *increasing (graph) property* is a property that is preserved under addition of edges. For example, connectivity is an increasing property, because if a graph is connected then it remains connected no matter which edges are added.

Every increasing property has a threshold function.

Now it immediately follows from the above theorem that all the properties that we have mentioned so far—connectivity, planarity,Footnote^{3} having the giant component, etc.—have a threshold function (thus exhibit a threshold phenomenon). How fascinating it is!

On the other hand, knowing that a property has a threshold function does not tell us anything about the *value* of that threshold function. So it naturally became a central interest in the study of random graphs to find a threshold function for various increasing properties. One of the most studied classes of increasing properties is *subgraph containment*, i.e., the question of for what $p$ the random graph $G(n,p)$ is likely/unlikely to contain a copy of a given graph $H$. Figure 8 shows some of the well-known threshold functions for various subgraph containments (and that for connectivity).

**Figure 8.**

Some well-known thresholds.

Figure 8 says that $1/n$ is a threshold function for the property of containing a triangle. Recall from the definition of a threshold that this means:

- (i)
if $p \ll 1/n$, then $\mathbb{P}(G(n,p) \text{ contains a triangle}) \to 0$; and

- (ii)
if $p \gg 1/n$, then $\mathbb{P}(G(n,p) \text{ contains a triangle}) \to 1$.

We have already justified (i) in Example 1.2 by showing that

$$\mathbb{E}[\text{number of triangles}] = \binom{n}{3}p^3 \to 0 \quad \text{if } p \ll 1/n.$$

However, showing (ii) is an entirely different story. As discussed in Remark 1.7, the fact that

$$\mathbb{E}[\text{number of triangles}] \to \infty$$

does not necessarily imply that $G(n,p)$ typically contains many triangles. Here we briefly describe one technique, called the *second moment method*, that we can use to show (ii): let $X$ be the number of triangles in $G(n,p)$; then $X$ is a random variable. By showing that the variance of $X$ is very small, which implies that $X$ is “concentrated around” $\mathbb{E}X$, we can derive (from the fact that $\mathbb{E}X$ is huge) that typically the number of triangles in $G(n,p)$ is huge. We remark that the second moment method is only a tiny part of the much broader topic of *concentration of a probability measure.*
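This concentration is easy to see numerically: one can sample triangle counts at some $p$ well above $1/n$ and compare them with the expectation $\binom{n}{3}p^3$ (a quick illustrative sketch; names ours):

```python
import math
import random
from itertools import combinations

def triangle_count(n, p, rng):
    """Count the triangles in one sample of G(n, p)."""
    adj = [[False] * n for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u][v] = adj[v][u] = True
    return sum(1 for a, b, c in combinations(range(n), 3)
               if adj[a][b] and adj[a][c] and adj[b][c])

rng = random.Random(0)
n = 60
p = 5 / n                          # well above the threshold 1/n
mean = math.comb(n, 3) * p ** 3    # expected count, about 19.8 here
counts = [triangle_count(n, p, rng) for _ in range(100)]
```

The sampled counts cluster around $\binom{60}{3}(5/60)^3 \approx 19.8$, in line with what the second moment method predicts.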

We stress that, in general, finding a threshold function for a given increasing property is a very hard task. To illustrate this point, let’s consider one of the most basic objects in graph theory, a *spanning tree*—a tree that contains all of the vertices.

**Figure 9.**

A connected graph and a spanning tree in it.

The question of finding a threshold function for $G(n,p)$ containing a spanning treeFootnote^{4} was one of the first questions studied by Erdős and Rényi. Already in their seminal paper 6, Erdős and Rényi showed that a threshold function for containing a spanning tree is $\log n/n$. However, the difficulty of this problem immensely changes if we require $G(n,p)$ to contain a *specific* (up to isomorphism) spanning tree (or, more broadly, a spanning graphFootnote^{5}). For example, one of the biggest open questions in this area back in the 1960s was finding a threshold function for containing a *Hamiltonian cycle* (a cycle that contains all of the vertices).

**Figure 10.**

A graph and a Hamiltonian cycle in it.

This problem was famously solved by Pósa in 1976.

A threshold function for $G(n,p)$ to contain a Hamiltonian cycle is $\log n/n$.

Note that both the threshold function for containing a spanning tree and that for containing a Hamiltonian cycle are of order $\log n/n$, even though the latter is a stronger requirement. Later we will see (in the discussion that follows Example 4.6) that $\log n/n$ is actually an easy lower bound on both threshold functions. It has long been conjectured that for *any* spanning treeFootnote^{6} with a constant maximum degree, its threshold function is of order $\log n/n$. This conjecture was only very recently proved by Montgomery 14.

## 3. The Kahn–Kalai Conjecture: A Preview

Henceforth, $\mathcal{F}$ always denotes an increasing property.

In 2006, Jeff Kahn and Gil Kalai 12 posed an extremely bold conjecture that captures the location of threshold functions for *any* increasing properties. Its formal statement will be given in Conjecture 4.11 (graph version) and Theorem 5.7 (abstract version), and in this section we will give an informal description of this conjecture first. All of the terms not defined here will be discussed in the forthcoming sections.

Given an increasing property $\mathcal{F}$, we are interested in locating its threshold function, $p_c(\mathcal{F})$.Footnote^{7} But again, this is in general a very hard task.

Kahn and Kalai introduced another quantity, which they named the *expectation threshold* and denoted by $q(\mathcal{F})$, and which is associated with some sort of expectation calculation, as its name indicates. By its definition (Definition 4.5),

$$q(\mathcal{F}) \lesssim p_c(\mathcal{F}),$$

and, in particular, $q(\mathcal{F})$ is easy to compute for many interesting increasing properties $\mathcal{F}$. So $q(\mathcal{F})$ provides an “easy” lower bound on the hard parameter $p_c(\mathcal{F})$. A really fascinating part is that Kahn and Kalai then conjectured that $p_c(\mathcal{F})$ is, in fact, bounded *above* by $q(\mathcal{F})$ multiplied by some tiny (logarithmic) quantity!

So this conjecture asserts that, for any increasing $\mathcal{F}$, the hard parameter $p_c(\mathcal{F})$ is actually well-predicted by the (much) easier $q(\mathcal{F})$!

The graph version of this conjecture (Conjecture 4.11) is still open, but the abstract version (Theorem 5.7) was recently proved in 15.

## 4. Motivating Examples

The conjecture of Kahn and Kalai is very strong, and even the authors of the conjecture wrote in their paper 12 that “it would probably be more sensible to conjecture that it is *not* true.” The fundamental question that motivated this conjecture was:

What drives thresholds?

All of the examples in this section are carefully chosen to show the motivation behind the conjecture.

Recall that the definition of a threshold (Definition 2.3) doesn’t distinguish constant factors. So in this section, we will use the convenient notations $\lesssim$, $\gtrsim$, and $\asymp$ to mean (respectively) $\le$, $\ge$, and $=$ up to constant factors. Finally, write $p_c(H)$ for a threshold function for the property of containing a copy of $H$, for notational simplicity.

Let $H$ be the graph in Figure 11. Let’s find $p_c(H)$.

**Figure 11.**

Graph $H$.

In Example 2.5, we observed that there is a connection between a threshold function and computing expectations. As we did in Examples 1.4 and 1.6,

$$\mathbb{E}[\text{number of copies of } H \text{ in } G(n,p)] \asymp n^4 p^5,$$

where $n^4$ is because the number of copies of $H$ in $K_n$ is of order $n^4$ (since $H$ has four vertices), and $p^5$ is precisely the probability that a given copy is entirely present (since $H$ has five edges). So we have

$$\mathbb{E}[\text{number of copies of } H] \to \begin{cases} 0 & \text{if } p \ll n^{-4/5},\\ \infty & \text{if } p \gg n^{-4/5}, \end{cases} \tag{1}$$

and let’s (informally) call the value $n^{-4/5}$ the *threshold for the expectation of* $H$.

This name makes sense since $n^{-4/5}$ is where the expected number of copies of $H$ drastically changes. Note that (1) tells us that

$$\mathbb{P}(G(n,p) \supseteq H) \to 0 \quad \text{if } p \ll n^{-4/5},$$

so, by the definition of a threshold, we have

$$p_c(H) \gtrsim n^{-4/5}.$$

This way, we can always easily find a lower bound on $p_c(H)$ for any graph $H$.

What is interesting here is that, for $H$ in Figure 11, we can actually show that

$$p_c(H) \asymp n^{-4/5}$$

using the second moment method (discussed in Example 2.5). This tells us the rather surprising fact that $p_c(H)$ is actually *equal* to the threshold for the expectation of $H$.

**Dream.** Maybe $p_c(H)$ is always equal to the threshold for the expectation of $H$, for *any* graph $H$?

The next example shows that the above dream is too dreamy to be true.

Consider $H'$ in Figure 12 this time. Notice that $H'$ is the graph $H$ in Figure 11 with a “tail.”

**Figure 12.**

Graph $H'$.

By repeating a similar computation as before, we have

$$\mathbb{E}[\text{number of copies of } H'] \asymp n^5 p^6,$$

so the threshold for the expectation of $H'$ is $n^{-5/6}$. Again, this gives that

$$\mathbb{P}(G(n,p) \supseteq H') \to 0 \quad \text{if } p \ll n^{-5/6},$$

so we have $p_c(H') \gtrsim n^{-5/6}$. However, the actual threshold is

$$p_c(H') \asymp n^{-4/5},$$

which is much larger than the lower bound.

**Figure 13.**

Gap between $p_c(H')$ and the expectational lower bound.

This is interesting, because Figure 13 tells us that when $n^{-5/6} \ll p \ll n^{-4/5}$, $G(n,p)$ contains a huge number of copies of $H'$ “on average,” but still it is very unlikely that $G(n,p)$ actually contains $H'$. What happens in this interval?

Here is an explanation. Recall from Example 4.2 that if $p \ll n^{-4/5}$, then $G(n,p)$ is unlikely to contain $H$. But

$$\mathbb{P}(G(n,p) \supseteq H') \le \mathbb{P}(G(n,p) \supseteq H),$$

because $H$ is a subgraph of $H'$! So when $p \ll n^{-4/5}$, it is highly unlikely that $G(n,p)$ contains $H'$, because it is already unlikely that $G(n,p)$ contains $H$. However, if $G(n,p)$ happens to contain $H$, then that copy of $H$ typically has lots of “tails,” as in Figure 14. This produces a huge number of copies of $H'$ in $G(n,p)$ “on average.”

**Figure 14.**

A copy of $H$ with many “tails.”

Maybe you have noticed the similarity between this example and the example of a lottery in Remark 1.7.

In Example 4.3, $p_c(H')$ is not predicted by the expected number of copies of $H'$; thus the **Dream** is broken. However, it still shows that $p_c(H')$ is predicted by the expected number of copies of *some* subgraph of $H'$, and, intriguingly, this holds true in general. To provide its formal statement, define the *density* of a graph $H$ by

$$d(H) = \frac{|E(H)|}{|V(H)|}.$$

The next theorem tells us the exciting fact that we can find $p_c(H)$ by just looking at the densest subgraph of $H$, as long as $H$ is fixed.Footnote^{8}

For any *fixed* graph $H$, $p_c(H)$ is equal to the threshold for the expectation of the *densest* subgraph of $H$.

For example, in Example 4.2, the densest subgraph of $H$ is $H$ itself, so $p_c(H)$ is determined by the expectation of $H$. This also determines $p_c(H')$ in Example 4.3, since the densest subgraph of $H'$ is again $H$.
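This is easy to explore by brute force for small graphs. The sketch below (names ours) computes the maximum subgraph density $\max_{H'' \subseteq H} |E(H'')|/|V(H'')|$; since a simple graph with four vertices and five edges is necessarily $K_4$ minus an edge, we use that as a stand-in for the graph of Figure 11, and attach a pendant “tail” vertex for the graph of Figure 12:

```python
from itertools import combinations

def max_subgraph_density(vertices, edges):
    """Brute-force the maximum density |E(H'')| / |V(H'')| over all
    nonempty vertex subsets (fine for tiny graphs)."""
    best = 0.0
    for k in range(1, len(vertices) + 1):
        for subset in combinations(vertices, k):
            s = set(subset)
            e = sum(1 for (u, v) in edges if u in s and v in s)
            best = max(best, e / k)
    return best

# stand-ins for Figures 11 and 12: H is K_4 minus an edge
# (4 vertices, 5 edges); H' adds a pendant "tail" vertex
H_vertices = [1, 2, 3, 4]
H_edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4)]
Hp_vertices = H_vertices + [5]
Hp_edges = H_edges + [(4, 5)]
```

Both graphs have maximum subgraph density $5/4$: the tail lowers the overall density of $H'$ to $6/5$, but its densest subgraph is still $H$, so the threshold $n^{-4/5}$ is unchanged, matching the discussion above.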

Motivated by the preceding examples and Theorem 4.4, we give a formal definition of the *expectation threshold*.

For any graph $H$, the expectation threshold for $H$ is

$$q(H) := \max_{H'' \subseteq H} \bigl(\text{threshold for the expectation of } H''\bigr).$$

Observe that

$$p_c(H) \gtrsim q(H), \tag{2}$$

and in particular, Theorem 4.4 gives that

$$p_c(H) \asymp q(H) \quad \text{for any fixed graph } H.$$

Note that this gives a beautiful answer to Question 4.1 whenever $\mathcal{F}$ is the property of containing a fixed graph.

Theorem 4.4 characterizes threshold functions for any fixed graphs. To extend our exploration, in this example we consider a graph that *grows* as $n$ grows. We say a graph $M$ is a *matching* if $M$ is a disjoint union of edges; $M$ is a *perfect matching* if $M$ is a matching that contains all $n$ vertices. Write PM for perfect matching.

**Figure 15.**

A matching (above) and a perfect matching (below).

Keeping Question 4.1 in mind, let’s first check the validity of Theorem 4.4 for a perfect matching,Footnote^{9} which is not a fixed graph. By repeating a similar computation as before, we obtain that

$$\mathbb{E}[\text{number of PMs in } G(n,p)] = \frac{n!}{2^{n/2}(n/2)!}\,p^{n/2},$$

which tends to $\infty$ if $p \gg 1/n$. In fact, it is easy to compute (by considering all subgraphs of a perfect matching) that $q(\mathrm{PM}) \asymp 1/n$, so by (2),

$$p_c(\mathrm{PM}) \gtrsim 1/n.$$
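To see why this expectation blows up once $p \gg 1/n$, here is a back-of-the-envelope Stirling computation (a sketch; $\Theta^*$ hides factors polynomial in $n$, which do not affect the conclusion):

$$\frac{n!}{2^{n/2}(n/2)!}\,p^{n/2} = \Theta^*\!\left(\frac{(n/e)^n}{(n/e)^{n/2}}\right)p^{n/2} = \Theta^*\!\left(\left(\frac{np}{e}\right)^{n/2}\right),$$

which tends to infinity whenever $np > e$, and in particular whenever $p \gg 1/n$.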

However, unlike threshold functions for fixed graphs, $p_c(\mathrm{PM})$ is *not* equal to $q(\mathrm{PM})$; it was proved by Erdős and Rényi that

$$p_c(\mathrm{PM}) \asymp \frac{\log n}{n}. \tag{3}$$

**Figure 16.**

Gap between $q(\mathrm{PM})$ and $p_c(\mathrm{PM})$.

Notice that, in Figure 16, what happens in the gap is *fundamentally* different from that in Figure 13. When $1/n \ll p \ll \log n/n$, $G(n,p)$ contains huge numbers of PMs and *all their subgraphs* “on average.” This means the absence of a subgraph of a PM is not the obstruction that keeps $G(n,p)$ from having a PM when $1/n \ll p \ll \log n/n$. Then what happens here, and what’s the real obstruction?

It turns out that we have

$$p_c(\mathrm{PM}) \gtrsim \frac{\log n}{n}$$

for a very simple reason: the existence of an isolated vertexFootnote^{10} in $G(n,p)$. It is well-known that if $p \ll \log n/n$, then $G(n,p)$ contains an isolated vertex with high probability (this phenomenon is elaborated in Example 4.7). Of course, if there is an isolated vertex in a graph, then this graph cannot contain a perfect matching.

So (3) expresses the very compelling fact that once $p$ is large enough that $G(n,p)$ avoids isolated vertices, $G(n,p)$ contains a perfect matching!

The existence of an isolated vertex in $G(n,p)$ is essentially equivalent to the *coupon collector’s problem*:

Each box of cereal contains a random coupon, and there are $n$ different types of coupons. If all coupons are equally likely, then how many boxes of cereal do we (typically) need to buy to collect all $n$ coupons?

The well-known answer to this question is that we need to buy about $n\log n$ boxes of cereal. This phenomenon can be translated to $G(n,p)$ in the following way: the $n$ vertices are regarded as coupons. If a vertex is contained in a (random) edge in $G(n,p)$, then that vertex is regarded as being “collected.” Note that if $p \ll \log n/n$, then typically the number of edges in $G(n,p)$ is much smaller than $n\log n$, and then the coupon collector’s problem says that there is typically an “uncollected coupon,” which is an isolated vertex.
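The $n\log n$ answer is easy to verify by simulation (a small sketch; the function name is ours):

```python
import random

def boxes_needed(n, rng):
    """Buy random coupons (boxes) until all n types have appeared;
    return how many boxes were bought."""
    seen = set()
    boxes = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        boxes += 1
    return boxes

rng = random.Random(0)
n = 500
avg = sum(boxes_needed(n, rng) for _ in range(50)) / 50
# theory predicts roughly n * ln(n), i.e., about 3100 boxes for n = 500
```

The average lands a bit above $n\ln n \approx 3100$ for $n = 500$ (the precise expectation is $n$ times the harmonic number $H_n$).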

Observe that, in Example 4.6, the “coupon-collector behavior” of