Local well-posedness for quasi-linear problems: A primer

By Mihaela Ifrim and Daniel Tataru


Proving local well-posedness for quasi-linear problems in partial differential equations presents a number of difficulties, some of which are universal and others of which are more problem specific. On one hand, a common standard for what well-posedness should mean has existed for a long time, going back to Hadamard. On the other hand, in terms of getting there, there are by now many variations, and also many misconceptions.

The aim of these expository notes is to collect a number of both classical and more recent ideas in this direction, and to assemble them into a cohesive roadmap that can then be adapted to the reader’s problem of choice.

1. Introduction

Local well-posedness is the first question to ask for any evolution problem in partial differential equations (PDEs). These notes, prepared by the authors for a summer graduate school seminar at MSRI Reference 13 in 2020, aim to discuss ideas and strategies for local well-posedness in quasi-linear and fully nonlinear evolution equations, primarily of hyperbolic type. We hope to persuade the reader that the structure presented here should be adopted as the standard for proving these results. Of course, there are many possible variations, and we try to point out some of them in our many remarks. While a few of the ideas here can be found in several of the classical books (see, e.g., Reference 30, Reference 10, Reference 3, Reference 24), some of the others have appeared only in articles devoted to specific problems and have never been collected together, to the best of our knowledge.

1.1. Nonlinear evolutions

For our exposition we will adopt a two-track structure, where we will broadly discuss ideas for a general problem and, in parallel, implement these ideas on a simple, classical, concrete example.

Our general problem will be a nonlinear PDE of the form

$$\partial_t u = N(u), \qquad u(0) = u_0, \tag{1.1}$$

i.e., a first-order system in time, where we think of $u$ as a scalar- or a vector-valued function belonging to a scale of either real or complex Sobolev spaces. This scale will be chosen to be $H^s$ for the purpose of this discussion, though in practice it often has to be adapted to the class of problems to be considered. The nonlinearity $N$ represents a nonlinear function of $u$ and its derivatives,

$$N(u) = N\big(\{\partial^\alpha u\}_{|\alpha| \leq k}\big),$$

where we will refer to $k$ as the order of the evolution. Here typical examples include $k = 1$ (hyperbolic equations), $k = 2$ (Schrödinger-type evolutions) and $k = 3$ (Korteweg–de Vries-type evolutions). But many other situations arise in models which are nonlocal, e.g., in water waves one encounters $k = \frac12$ for gravity waves (resp., $k = \frac32$ for capillary waves).

Some problems are most naturally formulated as second-order evolutions in time, for instance nonlinear wave equations. While some such problems admit also good first-order in time formulations (e.g., the compressible Euler flow), it is sometimes better to treat them as second order. Regardless, our roadmap still applies, with obvious adjustments.

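Our model problem will be a classical first-order symmetric hyperbolic system in $\mathbb{R} \times \mathbb{R}^d$, of the form

$$\partial_t u = A^j(u)\, \partial_j u, \qquad u(0) = u_0, \tag{1.2}$$

where $u$ takes values in $\mathbb{R}^m$ and the matrices $A^j$ are symmetric and smooth as functions of $u$. Here the order of the nonlinearity is $k = 1$, and the scale of Sobolev spaces to be used is indeed the $H^s$ Sobolev scale.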

1.2. What is well-posedness?

To set the expectations for our problems, we recall the classical Hadamard standard for well-posedness, formulated relative to our chosen scale of spaces.

Definition 1.1.

The problem Equation 1.1 is locally well-posed in a Sobolev space $H^s$ if the following properties are satisfied:

(i) Existence: For each $u_0 \in H^s$ there exists some time $T > 0$ and a solution $u \in C([0,T]; H^s)$.

(ii) Uniqueness: The above solution is unique in $C([0,T]; H^s)$.

(iii) Continuous dependence: The data to solution map is continuous from $H^s$ to $C([0,T]; H^s)$.

As a historical remark, we note that Hadamard primarily discussed the question of well-posedness in the context of linear PDEs, specifically for the Laplace and wave equations, beginning with an incipient form in Reference 8 and a more developed form in Reference 9. It is in the latter reference where the continuous dependence is discussed, seemingly inspired by Cauchy’s theorem for ordinary differential equations.

The above definition should not be taken as universal, but rather as a good starting point, which may need to be adjusted depending on the problem. Consider for instance the uniqueness statement, which, as given in Definition 1.1(ii), is in the strongest form, which is often referred to as unconditional uniqueness. Often this may need to be relaxed somewhat, particularly when low regularity solutions are concerned. Some common variations concerning uniqueness are as follows:


The solutions in Definition 1.1(i) are shown to belong to a smaller space, $C([0,T]; H^s) \cap X$, and then the uniqueness in Definition 1.1(ii) holds in the same class.


Unconditional uniqueness holds a priori only in a more regular class $H^{s_1}$ with $s_1 > s$, but the data to solution map extends continuously as a map from $H^s$ to $C([0,T]; H^s)$.

Since we are discussing nonlinear equations here, the lifespan of the solutions need not be infinite, i.e., there is always the possibility that solutions may blow up in finite time. In particular, in the context of well-posed problems it is natural to consider the notion of maximal lifespan, which is the largest $T$ for which the solution exists in $C([0,T); H^s)$; here the limit of $u(t)$ as $t$ approaches $T$ cannot exist, or else the solution may be continued further.

In this context, the last property in Definition 1.1 should be interpreted to mean in particular that, for a solution $u \in C([0,T]; H^s)$, small perturbations of the initial data yield solutions which are also defined in $[0,T]$. This in turn implies that the maximal lifespan is lower semicontinuous as a function of $u_0 \in H^s$.

In view of the above discussion, it is always interesting to provide more precise assertions about the lifespan of solutions, or, equivalently, continuation (or blow-up) criteria for the solutions. Some interesting examples are as follows:


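The lifespan is bounded from below uniformly for data in a bounded set,

$$T_{\max}(u_0) \geq T(\|u_0\|_{H^s}) > 0.$$

This implies a blow-up criterion as follows:

$$\lim_{t \to T_{\max}} \|u(t)\|_{H^s} = \infty \qquad \text{whenever } T_{\max} < \infty.$$


The blowup may be characterized in terms of weaker bounds,

$$\lim_{t \to T_{\max}} \|u(t)\|_{Y} = \infty,$$

relative to a Banach topology $Y \supset H^s$, or perhaps a time integrated version thereof,

$$\int_0^{T_{\max}} \|u(t)\|_{Y}\, dt = \infty.$$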

To conclude our discussion of the above definition, we note that many well-posedness statements also provide additional properties for the flow:

Higher regularity

If the initial data has more regularity $u_0 \in H^{\sigma}$ with $\sigma > s$, then this regularity carries over to the solution, $u \in C([0,T]; H^{\sigma})$, with bounds and lifespan bounds depending only on the $H^s$ size of the data.

Weak Lipschitz bounds

On bounded sets in $H^s$, the flow is Lipschitz in a weaker topology (e.g., $L^2$, and up to $H^{s-1}$ in our model problem).

Both of these properties are often an integral part of a complete theory and frequently also serve as intermediate steps in establishing the main well-posedness result.

In all of the above discussion, a common denominator remains the fact that the data to solution map is locally continuous but not uniformly continuous. It is very natural indeed to redefine (expand) the notion of quasi-linear evolution equations to include all flows that share this property.

In many problems of this type, one is interested not only in local well-posedness in some Sobolev space $H^s$, but also in lowering the exponent $s$ as much as possible. We will refer to such solutions as rough solutions. Then, a natural question is what kind of regularity thresholds one should expect or aim for in such problems. One clue in this direction comes from the scaling symmetry, whenever available. As an example, our model problem exhibits the scaling symmetry

$$u(t,x) \to u(\lambda t, \lambda x), \qquad \lambda > 0.$$

The scale-invariant initial-data Sobolev space corresponding to this symmetry is the homogeneous space $\dot H^{s_c}$, where $s_c = \frac{d}{2}$. This space is called the critical Sobolev space, and it should heuristically be thought of as an absolute lower bound for any reasonable well-posedness result. Whereas in some semilinear dispersive evolutions one can actually reach this threshold, in nonlinear flows it seems to be out of reach in general.
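As a quick check of the critical exponent, the following short computation may help. It is only a sketch, written under the assumption that the problem is posed in $d$ space dimensions and that the data is rescaled as $u_0(x) \to u_0(\lambda x)$; the symbol $u_0^{\lambda}$ is introduced here for illustration and is not part of the notation of these notes.

```latex
% Rescaled data (the notation u_0^\lambda is ours, for illustration only)
u_0^{\lambda}(x) := u_0(\lambda x), \qquad \lambda > 0 .

% Fourier transform of the rescaled data
\widehat{u_0^{\lambda}}(\xi) = \lambda^{-d}\, \widehat{u_0}(\xi/\lambda) .

% Homogeneous Sobolev norm, after the change of variables \xi = \lambda \eta
\| u_0^{\lambda} \|_{\dot H^{s}}^2
  = \int |\xi|^{2s}\, \lambda^{-2d}\, |\widehat{u_0}(\xi/\lambda)|^2 \, d\xi
  = \lambda^{2s - d}\, \| u_0 \|_{\dot H^{s}}^2 .

% The norm is scale invariant exactly when 2s - d = 0
s_c = \frac{d}{2} .
```

For $s \neq s_c$ the rescaling changes the size of the data, which is the usual heuristic reason why the critical space serves as a threshold.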

1.3. A set of results for the model problem

In order to state the results, we begin with a discussion of control parameters. We will use two such control parameters. The first one is

$$A := \|u\|_{L^\infty}.$$

This is a scale-invariant quantity, which appears in the implicit constants in all of our bounds. Our second control parameter is

$$B := \|\nabla u\|_{L^\infty},$$

which instead will be shown to control the energy growth in all the energy estimates. Precisely, $B$ plays the role of the weaker norm mentioned in the blow-up discussion above.

The primary well-posedness result for the model problem is as follows:

Theorem 1.

The equation Equation 1.2 is locally well-posed in $H^s$ in the Hadamard sense for $s > \frac{d}{2} + 1$.

The reader will notice that this result is one derivative above scaling. It is also optimal in some cases, including the scalar case (where the problem can be solved locally using the method of characteristics), but it is not optimal in many other cases where the system is dispersive.
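To illustrate the scalar case, one can look at a one-dimensional scalar equation of the same form as the model problem. The following is a minimal sketch; the names $a$, $X(t)$, $w$, $c$ and $w_0$ are introduced only for this illustration and do not come from the text.

```latex
% Scalar model in one space dimension (illustrative special case)
\partial_t u = a(u)\, \partial_x u, \qquad u(0) = u_0 .

% Characteristics: u is constant along curves with \dot X = -a(u)
\dot X(t) = -a\big(u(t, X(t))\big), \qquad
\frac{d}{dt}\, u(t, X(t)) = \partial_t u - a(u)\, \partial_x u = 0 .

% Differentiating the equation in x, with w := \partial_x u
\partial_t w = a(u)\, \partial_x w + a'(u)\, w^2 .

% Along a characteristic this is a Riccati equation with constant c := a'(u_0(X(0)))
\frac{d}{dt}\, w(t, X(t)) = c\, w^2
\quad \Longrightarrow \quad
w(t, X(t)) = \frac{w_0}{1 - c\, w_0\, t}, \qquad w_0 := \partial_x u_0(X(0)) .
```

In this sketch the solution remains smooth as long as $\partial_x u$ stays bounded, while the gradient blows up at time $1/(c\, w_0)$ along any characteristic with $c\, w_0 > 0$. This is consistent with the role played by Lipschitz control in the lifespan discussion.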

For the uniqueness result we have in effect a stronger statement that only requires Lipschitz bounds for $u$. This however does not improve the scaling comparison relative to the critical spaces:

Theorem 2.

Uniqueness holds in the Lipschitz class, and we have the difference bound

This is exactly the kind of weak Lipschitz bound discussed earlier. With a bit of additional effort, for the solutions in Theorem 1 this may be extended to a larger range of Sobolev spaces,

The small price to pay here is that now the implicit constant in the estimate depends not only on and but also on the norms of and in .

A key role in the proof of the well-posedness result is played by the energy estimates, which are also of independent interest:

Theorem 3.

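The following bounds hold for solutions $u$ to Equation 1.2 for all $s \geq 0$:

$$\|u(t)\|_{H^s} \lesssim e^{C \int_0^t B(\tau)\, d\tau}\, \|u_0\|_{H^s}, \qquad B = \|\nabla u\|_{L^\infty}, \quad C = C(\|u\|_{L^\infty}).$$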

Finally, as a corollary of the last result, we obtain a continuation criterion for solutions:

Theorem 4.

Solutions can be continued in $H^s$ for as long as $\int_0^t \|\nabla u(\tau)\|_{L^\infty}\, d\tau$ remains finite.

Theorem 1 was first proved by Kato Reference 16, borrowing ideas from nonlinear semigroup theory; see, e.g., Barbu’s book Reference 4. The existence and uniqueness part, as well as the energy estimates, can also be found in standard references; e.g., in the books of Taylor Reference 30, Hörmander Reference 10, and Sogge Reference 24 (in the last two the wave equation is considered, but the idea is similar). However, interestingly enough, the continuous dependence part is missing in all these references. We did find presentations of continuous dependence arguments inspired by Kato’s work in Chemin’s book Reference 3, and also on Tao’s blog Reference 26.

Our objective for the remainder of the paper will be to provide complete proofs for Theorems 1, 2, 3, and 4, which readers may take as a guide for their problem of choice. While these results are not new in the model case we consider, to the best of our knowledge this is the first time that the proofs of these results are presented in this manner. Along the way, we will also provide extensive comments and pointers to alternative methods developed over the years.

In particular, we would emphasize the frequency envelope approach for the regularization and continuous dependence parts, as well as the time discretization approach for the existence proof. The frequency envelope approach has been repeatedly used by the authors, jointly with different collaborators, in a number of papers (see, e.g., Reference 23, Reference 29, Reference 18, Reference 12, Reference 15), with some of the ideas crystallizing along the way. The version of the existence proof based on time discretization is in some sense very classical, going back to ideas which originally appeared in the context of semigroup theory; however, its implementation is inspired by the authors’ recent work Reference 15, though the situation considered here is considerably simpler.

1.4. An outline of these notes

Our strategy will be, in each section, to provide some ideas and a broader discussion in the context of the general equation Equation 1.1 and then show how this works in detail in the context of our chosen example Equation 1.2.

In Section 2 we introduce the paradifferential form of our equations, both the main equation and its linearization. This is an idea that goes back to work of Bony Reference 6 and helps clarify the roles played by different frequency interaction modes in the equation. Another very useful reference here is Métivier’s more recent book Reference 21.

Section 3 is devoted to the energy estimates in multiple contexts. These are presented for the full equation, for its linearization, for its associated linear paradifferential flow, and for differences of solutions. The latter, in turn, yields the uniqueness part of the well-posedness theorem. A common misconception here has been that for well-posedness it suffices to prove energy estimates for the full equation. Instead, in our presentation we regard the bound for the linearized problem as fundamental, though, at the implementation level, it is the paradifferential flow bound that can be found at the core.

Section 4 provides two approaches for the existence part of the well-posedness theorem. The first one, more classical, is based on an iteration scheme, which works well on our model problem but may run into implementation issues in more complex problems. The second approach, which we regard as more robust, relies on time discretization, and is somewhat related to nonlinear semigroup theory, which also inspired Kato’s work. Two other possible strategies, which have played a role historically, are briefly outlined.

Section 5 introduces Tao’s notion of frequency envelopes (see for example Reference 27), which is very well suited to track the flow of energy as time progresses. This is then used to show how rough solutions can be obtained as uniform limits of smooth solutions. This is a key step in many well-posedness arguments, and helps decouple the regularity for the initial existence result from the rough data results.

Finally, Section 6 is devoted to the continuous dependence result, where we provide the modern frequency-envelope-based approach. At the same time, for a clean, elegant reinterpretation of Kato’s original strategy, we refer the reader to Tao’s blog Reference 26.

2. A menagerie of related equations

While ultimately one would want all the results stated in terms of the full nonlinear equation, any successful approach to quasi-linear problems needs to also consider a succession of closely related linear equations as well as associated reformulations of the nonlinear flow. Here we aim to motivate and describe these related flows, stripping away technicalities.

2.1. The linearized equation

This plays a key role in comparing different solutions; we will write it in the form

$$\partial_t v = DN(u)\, v, \tag{2.1}$$

where $DN$ stands for the differential of $N$, which in our setting is a partial differential operator of order $k$. One may also reinterpret the equation for the difference of two solutions as a perturbed linearized equation with a quadratic source term. Some caution is required here, because often some structure is lost in doing this, and the question is whether or not that is too much.

In the particular case of Equation 1.2, the linearized equation takes the form

$$\partial_t v = A^j(u)\, \partial_j v + \big(DA^j(u)\, v\big)\, \partial_j u.$$
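As a sketch of where the linearized equation comes from, assume the model problem has the form $\partial_t u = A^j(u)\, \partial_j u$ and consider a smooth one-parameter family of solutions; the family $u^{h}$ and the parameter $h$ are introduced here only for illustration.

```latex
% One-parameter family of solutions u^h with u^0 = u (notation for this sketch)
\partial_t u^{h} = A^j(u^{h})\, \partial_j u^{h}, \qquad
v := \frac{d}{dh}\Big|_{h=0} u^{h} .

% Differentiate in h at h = 0, using the chain and product rules
\partial_t v
  = \big( DA^j(u)\, v \big)\, \partial_j u + A^j(u)\, \partial_j v .
```

The first term on the right is of order zero in $v$ but involves a derivative of $u$, while the second is the transport-type term carrying the leading order derivative of $v$.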

2.2. The linear paradifferential equation

One distinguishing feature of quasi-linear evolutions is that the nonlinearity cannot be interpreted as perturbative. Nevertheless, one may seek to separate parts of the nonlinearity which can be seen as perturbative, at least at high regularity, in order to better isolate and understand the nonperturbative part.

To narrow things down, consider a nonlinear term which is quadratic, say of the form , and consider the three modes of interaction between these terms, according to the Littlewood–Paley trichotomy or paraproduct decomposition,

where the three terms represent the low-high, high-low, and the high-high frequency interactions, respectively. The high-high interactions in the last term are always perturbative at high regularity, so they are placed into the perturbative box. But one cannot do the same with the low-high or high-low interactions, which are kept on the nonperturbative side. This is closely related to the linearization and, indeed, at the end of the day, we are left with a paradifferential style nonperturbative part of our evolution, which we can formally write as

Here, one can naively use Bony’s notion of a paraproduct Reference 6 to define the linear operator as

where is a placeholder for the argument of the nonlinearity . However, there are also other related choices one can make; see for instance the discussion at the end of this subsection. For a discussion on the use of paradifferential calculus in nonlinear PDEs (though not the above notation), we refer the reader to Metivier’s book Reference 21.
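For concreteness, the trichotomy can be written out for a generic product of two functions. The following is a minimal sketch in one common normalization; the names $f$, $g$, $P_k$, $\Pi$ and the frequency gap $4$ are conventions assumed for this illustration rather than notation fixed by the text.

```latex
% Littlewood--Paley pieces: f_k := P_k f, \quad f_{<k} := \sum_{j<k} f_j
% Paraproduct with a fixed frequency gap (here 4, an arbitrary choice)
T_f g := \sum_{k} f_{<k-4}\, g_k .

% Bony's decomposition of a product into low-high, high-low and high-high parts
f g = T_f g + T_g f + \Pi(f,g), \qquad
\Pi(f,g) := \sum_{|j-k| \leq 4} f_j\, g_k .
```

With this convention every pair of dyadic pieces $f_j g_k$ appears in exactly one of the three terms, according to whether $j < k - 4$, $k < j - 4$, or $|j - k| \leq 4$.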

One can think of the above evolution as a linear evolution of high frequency waves on a low frequency background. Then one can interpret solving the nonperturbative part of our evolution as an infinite-dimensional triangular system, where each dyadic frequency of the solution is obtained at some step by solving a linear system with coefficients depending only on the lower components, and in turn it affects the coefficients of the equations for the higher frequency components. Of course, this should only be understood in a philosophical sense, because a variable coefficient flow in general does not preserve frequency localizations. This can sometimes be achieved with careful choices of the paraproduct quantizations, but it never seems worthwhile to implement, as the perturbative terms will mix frequencies anyway and add tails.

Turning to our model problem, in a direct interpretation the associated paradifferential equation will have the form

$$\partial_t w = T_{A^j(u)}\, \partial_j w + T_{DA^j(u)\, \partial_j u}\, w.$$

However, upon closer examination one may see several choices that could be made. Considering for instance the first paraproduct, which of the following expressions would make the better choice at frequency ?

The last one may seem the most complicated, but it is also the most accurate. In many cases, including our model problem, it makes no difference in practice. However, one should be aware that often a simpler choice, which is made for convenience in one problem, might not work in a more complex setting.

Remark 2.1.

Here the frequency gap, which was set to be equal to in the above formulas, is chosen rather arbitrarily; its role is simply to enforce the frequency separation between the coefficients and the leading term. On occasion, particularly in large data problems, it is also useful to work instead with a large frequency gap as a proxy for smallness; see, e.g., Reference 25.

2.3. The paradifferential formulation of the main equations

Consider first our general equation Equation 1.1, which we can write in the form

Here one would hope that the paradifferential source term can be seen as perturbative, in the sense that

Similarly, we can write the linearized equation Equation 2.1 in the same format,

with the appropriate nonlinearity . This is still based on the paradifferential equation Equation 2.5 but can no longer be interpreted as the direct paralinearization of the linearized equation. This is because the expression also contains some low-high interactions, precisely those where is the low frequency factor.

3. Energy estimates

Energy estimates are a critical part of any well-posedness result, even if they do not tell the entire story. In this section we begin with a heuristic discussion of several ideas in the general case and then continue with some more concrete analysis in the model case.

3.1. The general case

Consider first the energy estimates for the general problem Equation 1.1, where it is simpler to think of this in the paradifferential formulation Equation 2.3. An energy estimate for this problem is an estimate that allows us to control the time evolution of the Sobolev norms of the solution. In the simplest formulation, the idea would be to prove that

with a constant $C$ that at the very least depends on the $H^s$ norm of $u$.
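One common way to use such an estimate, assuming it is available in differential form, is through Gronwall's inequality. The display below is a sketch under that assumption; the time-dependent constant $C(t)$ is our notation and need not match the precise form of the estimate in the text.

```latex
% Assumed differential form of the energy estimate (a sketch)
\frac{d}{dt}\, \|u(t)\|_{H^s}^2 \leq C(t)\, \|u(t)\|_{H^s}^2 .

% Gronwall's inequality then gives
\|u(t)\|_{H^s}^2 \leq \exp\Big( \int_0^t C(\tau)\, d\tau \Big)\, \|u(0)\|_{H^s}^2 .
```

In particular, the $H^s$ norm cannot blow up on a time interval where $\int_0^t C(\tau)\, d\tau$ stays finite, which is why it pays to make the constant depend on as weak a norm as possible.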

There are two points that one should take into account when considering such estimates. The first is that it is often useful to strengthen such bounds by relaxing the dependence of the constant on the $H^s$ norm of $u$. Heuristically, the idea is that this constant measures the effect of nonlinear interactions, which are strongest when our functions are pointwise large, not only large in an $L^2$ sense. Thus, it is often possible to replace the constant with an analogue of the uniform control norm $B = \|\nabla u\|_{L^\infty}$ in the model case, perhaps with some additional implicit dependence on another scale-invariant uniform control parameter, such as $A = \|u\|_{L^\infty}$. See however the discussion in Remark 3.2.

A second point is that, although it is tempting to try to work directly with the $H^s$ norm, it is often the case that the straight $H^s$ norm is not well adapted to the structure of the problem; see, e.g., what happens in water waves Reference 2, Reference 12. Then it is useful to construct energy functionals adapted to the problem at hand. For these energies we should aim for the following properties.


Energy equivalence:


Energy propagation:

where the control parameter satisfies

Now consider our main equation written in the form Equation 2.3. For the perturbative part of the nonlinearity we hope to have some boundedness,

This in turn allows us to reduce nonlinear energy bounds of the form Equation 3.2 to similar bounds for the linear paradifferential equation Equation 2.5. One may legitimately worry here that some structure is lost when we decouple the paradifferential coefficients from the evolution variable; however, the point is that these two objects are indeed separate, as they represent different frequencies of the solution.

Remark 3.1.

In our discussion here we took the simplified view that bounds for begin at . But this is not always the case in practice, and often one needs to identify the lower range for where this works; see, e.g., the nonlinear wave equation Reference 23, the wave map equation Reference 29, or the water wave problem considered in Reference 1.

Now consider the paradifferential evolution Equation 2.5, and begin with the case by setting . Then we need to produce a linearized type energy so that the solutions satisfy

Then the associated nonlinear energy at would be

If the energy is simply the squared $L^2$ norm of the solution, then the bound Equation 3.5 would simply require that the paradifferential operator is essentially antisymmetric in $L^2$. If that is not true, then the backup plan is to find an equivalent Hilbert norm on $L^2$ so that the antisymmetry holds. Some care is needed however; if this norm depends on $u$, then this dependence needs to be mild.
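To see what essential antisymmetry means in the simplest setting, consider the differential (rather than paradifferential) analogue for a system with symmetric coefficients, as in the model problem. This is a sketch under the assumption that the matrices $A^j(u)$ are symmetric and that $w$ decays at infinity; the linear flow below is a simplified stand-in and not the paradifferential equation itself.

```latex
% Simplified linear flow with symmetric coefficients A^j(u)
\partial_t w = A^j(u)\, \partial_j w .

% Symmetry of A^j gives
%   \partial_j ( w \cdot A^j w ) = 2\, w \cdot A^j \partial_j w + w \cdot (\partial_j A^j)\, w
\frac{d}{dt} \|w\|_{L^2}^2
  = 2 \int w \cdot A^j(u)\, \partial_j w \, dx
  = - \int w \cdot \big( \partial_j [A^j(u)] \big)\, w \, dx .

% Only one derivative of u enters the bound
\frac{d}{dt} \|w\|_{L^2}^2
  \leq \big\| \partial_j [A^j(u)] \big\|_{L^\infty}\, \|w\|_{L^2}^2
  \lesssim C(\|u\|_{L^\infty})\, \|\nabla u\|_{L^\infty}\, \|w\|_{L^2}^2 .
```

The leading order part of the operator cancels after integration by parts, and what remains is a bounded operator on $L^2$ whose size is controlled by the Lipschitz norm of $u$.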

The next step is to consider a larger $s$. By interpolation it suffices to work with integer $s$, in which case one might simply differentiate Equation 2.3,

Here we would be done if the last commutator is bounded from $H^s$ into $L^2$. In principle that would be the case almost automatically, at least when the order of $N$ is at most one. One can heuristically associate this with the finite speed of propagation in the high frequency limit.

Remark 3.2.

The case $k > 1$, which corresponds to an infinite speed of propagation, is often more delicate; see, e.g., Reference 17, Reference 18, Reference 19 for quasi-linear Schrödinger flows or Reference 14 for capillary waves. There one needs to further develop the function space structure based on either dispersive properties of solutions or on normal forms analysis.

3.2. Coifman–Meyer and Moser type estimates

Before considering our model problem, we briefly review some standard bilinear and nonlinear estimates that play a role later on. In the context of bilinear estimates, a standard tool is to consider the Littlewood–Paley paraproduct type decomposition of the product of two functions, which leads to Coifman–Meyer type estimates; see Reference 7, Reference 22:

Proposition 3.3.

Using the standard paraproduct notations, one has the following estimates,

as well as the commutator bound

Here $P_k$ is the Littlewood–Paley projection onto frequencies $\approx 2^k$.

These results are standard in the harmonic/microlocal analysis community. For nonlinear expressions we use Moser type estimates instead:

Proposition 3.4.

The following Moser estimate holds for a smooth function $F$, with $F(0) = 0$, and $s \geq 0$:

$$\|F(u)\|_{H^s} \lesssim C(\|u\|_{L^\infty})\, \|u\|_{H^s}.$$

Of course many more extensions of both the bilinear and the nonlinear estimates above are available.

3.3. The model case

We now turn our attention to our model problem, where, if we adopt the expression Equation 2.4 for the paradifferential flow, the source term is given by

We can rewrite this in the form

For this expression we can show that it always plays a perturbative role:

Proposition 3.5.

The above nonlinearity satisfies the following bounds:


Sobolev bounds:


Difference bounds:

as well as

The next-to-last bound shows in particular that is Lipschitz in for . The simplification in the case is also useful in order to bound differences of solutions in the topology.


(i) We use the expression Equation 3.9 for . The first term can be estimated using a version of the Coifman–Meyer estimates and Moser estimates by

For the second term we use again paraproduct bounds and Moser estimates to get

The third term is similar to the second.

(ii) First, we note the representation

which we use to separate factors. Here is a smooth function of and . Then taking differences in the first term of , we need two estimates