From Bézier to BernsteinBézier curves are ubiquitous in computer graphics. ...
Bill Casselman
IntroductionBézier curves are ubiquitous in computer graphics. They were introduced implicitly into theoretical mathematics long before computers, primarily by the French mathematician Charles Hermite and the Russian mathematician Sergei Bernstein. But it was only the work of Pierre Bézier, an employee of the automobile maker Renault, and of Paul de Casteljau, of Citroen, that made these curves familiar to graphics specialists. Recently, the polynomials defined by Bernstein have become again of interest to mathematicians. Bézier curves
A quadratic Bézier curve is determined by three control points P_{0}, P_{1}, and P_{2}. It has the parametric form
A cubic Bézier curve is determined by four control points P_{0}, P_{1}, P_{2}, and P_{3}. It has the parametric form
Again, B(0) = P_{0} and B(1) = P_{3}. The velocity vector at t is
When t=0 the velocity vector is 3(P_{1}P_{0}) and when t=1 it is 3(P_{3}P_{2}). Thus, the path starts at P_{0}, ends at P_{3}; when it leaves P_{0} it is heading towards P_{1}, and when it arrives at P_{3} it is coming from the direction of P_{2}. Otherwise, the relationship between the path and the control points is intuitively weak. In all these cases, the coefficients of the points P_{i} in the parametric equation are terms which appear in the binomial expansion, respectively
A Bézier curve of one degree can be reproduced by one of higher degree. For example, the quadratic Bézier curve with control points P_{0}, P_{1}, P_{2} is the same as the cubic Bézier curve with control points
This can be checked by a simple algebraic calculation. Calculus and Bézier curvesSuppose B(t) to be a cubic Bézier curve. We know that its coordinates are functions of t, and that position and velocity at endpoints are determined by the equations:
These conditions determine B(t) uniquely, because of a simple result first observed by the French mathematician Charles Hermite. The conditions above are really conditions on the coordinates of B(t), so the claim follows from this: THEOREM. Given numbers y_{0}, v_{0}, y_{1}, v_{1}, there exists a unique polynomial P(x) of degree 3 such that
We can guess a formula for P(t), from that of Bézier cubic curves:
As for uniqueness, if we have two such polynomials, let D(x) = a + bx + cx^{2} + dx^{3} be their difference. Then
These tell us immediately that a=b=0, and these four equations immediately reduce to two:
which have only the solutions c=d=0. Q.E.D. This is a special case of the main result of Hermite, according to which we may find a unique polynomial of degree n1 satisfying a total of n conditions on values of the low order derivatives at possibly different points. This is called Hermite interpolation. This Theorem suggests a converse process. Suppose we are given a formula for a function f(x), as well as a formula for its derivative f'(x), and suppose we want to graph the function in a range a ≤ x ≤ b. We could just draw a bunch of very small line segments, but we can get a smoother curve by using Bézier cubic curves. Because if we are given a mediumly small range [c, d] it seems like a reasonable idea to approximate the function over that range by a cubic polynomial P(x) that satisfies these conditions:
To see how Hermite's result applies, replace P varying over the range [c,d] by Q(t) where t varies over [0,1], and with
To this Q we can apply Hermite's formula. A little algebra will then show that the graph between c and d will be approximated by the Bézier cubic with control points
where h=dc. There is one situation where this technique is definitely a good idea. Suppose that
where we do not have a formula for the indefinite integral of g(x). For example, we might have
Then we must approximate the integral numerically, say by Simpson's rule, so we have a recursive estimate
and, by the Fundamental Theorem of Calculus, the graph between x and x+h for small h is approximated by the Bézier curve with control points
where the last is computed according to Simpson's rule. The same idea can be nicely used to plot the trajectories of systems of differential equations in the plane, using a numerical approximation to go from f(x) to f(x+h), such as one of the RungeKutta formulas. Why Bézier curves?There is something at first a bit odd about cubic Bézier curves, the fact that the curves don't actually go through the control points. So let's imagine an alternate way to draw curves that go nicely from one point P_{0} to another point P_{3}, with a cubic parametrization P(t). The only technique that suggests itself naturally is choosing two points P_{1} and P_{2}, and having the curve pass through them, not just near them as Bézier curves do. This means also choosing parameter values t_{1} and t_{2} with P(t_{1}) = P_{1} and P(t_{2}) = P_{2}. The first choice that comes to mind is t_{1} = 1/3, t_{2}=2/3. But it is well known (and explained clearly in the book by Henrici referred to later) that the curves chosen in this way look very odd. What turns out to be more reasonable is to choose t_{1} closer to 0, t_{2} closer to 1. But this means that we are very close to fixing values of the velocity at 0 and 1, which is essentially what Bézier curves do.
Bézier curves and fontsOne of the most common uses of Bézier curves is in the design of fonts. Cubic Bézier curves are used in Type 1 fonts, and quadratic Bézier curves are used in True Type fonts. Cubic Bézier curves are also used in the TEX fonts designed by Donald Knuth, and one of the clearest explanations is in his book MetaFont: the Program. Bernstein polynomialsThe formulas for the coordinates of Bézier curves
are special cases of a more general formula for polynomials of all degrees
where C_{n,k} is the binomial coefficient n!/k!(nk)! Thus B_{n,x }(0)= x_{0}, B_{n,x}(1) = x_{n}. Here x is the array (x_{i}). These polynomials were first defined by the Russian mathematician Sergei Bernstein around 1910. The coefficients of a Bernstein polynomial of degree n are the coefficients in the binomial expansion of [(1t) + t]^{n}. They are all nonnegative, and their sum is 1. The value of B_{n}(t) is therefore a weighted average of the numbers x_{i} in the array x. The term C_{n,k} (1t)^{nk}t^{k} is also the probability of k successes in n trials in which the probability of a success is t. This is not an unimportant fact, as we shall see. They have a number of fundamental properties that we have seen already for Bézier curves:
Thus both B'_{n,x}(0) and B'_{n,x}(1) have simple expressions. Otherwise, the exact relationship between the polynomial and its control values is rather obscure. One other useful feature is that
A related fact, which is crucial in the application of Bernstein polynomials to optimization problems:
And here is the proper generalization of de Casteljau's algorithm:
These polynomials were invented by Sergei Bernstein in order to prove a fundamental result in approximation theory, as we shall see in the next section. They are impractical in this role, but in recent years they have proven important in optimization theory. See, for example, the thesis of Roland Zumkeller (linked to in the References). This is gratifying  almost anyone who has seen Bernstein's proof of Weierstrass' theorem must have felt that these polynomials were destined to play other roles, as well. Bernstein's proof of Weierstrass' approximation theoremIn about 1885 Karl Weierstrass, at the age of 70, published a proof of one of the theorems for which he is most famous: Any continuous function on the interval [0,1] may be uniformly approximated, arbitrarily closely, by polynomials. The definition of continuous functions is relatively simple, and this definition is in close accord with intuition, so it was apparently surprising to learn, as mathematicians of the nineteenth century acquired more and more knowledge about them, that continuous functions could exhibit rather strange behavior. Weierstrass' theorem was perhaps considered refreshing evidence that they were not so wildly behaved after all. Many outstanding mathematicians soon came up with proofs rather different from Weierstrass' original one. One of the most satisfying was that found by Sergei Bernstein, and it was apparently for this purpose that he came up with the polynomials now named after him. Unlike other versions, his provides a very explicit converging sequence of polynomials. Suppose f is a continuous function on [0,1]. For each n let f_{n} be the array of n+1 values f(k/n) for k = 0, ..., n. Explicitly, we have
THEOREM. Suppose f(x) to be a continuous function on [0,1]. The polynomials B_{n,f }(x) converge uniformly on [0,1] to f(x) as n goes to infinity.
and here is a table of the Bernstein approximations of t1/2 at t=1/2 for various values of n:
As you can guess, the convergence here is of order about 1/n^{1/2} around t=1/2, and better at other points. We can see that although one virtue of Bernstein polynomials is that they can approximate arbitrary continuous functions, on functions that one sees in practice they converge at an impractically slow rate. Bernstein's proof of convergence is very elegant. It is a variation on a simple argument used in a slightly different context, that of local averaging. Suppose f to be a continuous function on [0,1]. Define for each n a function
In other words, the value of f_{n} at x is the average of its values at x1/2n and x+1/2n, an interval of width 1/n. Or, to give slightly more complicated examples:
or
These are now different weighted averages of f near x. Bernstein's definition of f_{n}(x) is also a weighted average of values of f around x, but the weighting depends on x. The value of the polynomial B_{n,f } at x is also a weighted average of f around x, although that is not at all apparent at first sight. It is a sum of values of f at certain point k/n, and what happens is that only the terms for k/n near x make a sizable contribution. Here are some graphs of the weighting for various values of x, with n=100:
Why is this? The weighting function is the distribution of k successes in n trials with a probability of success equal to x. The average number of successes  the mean value of this probability distribution  is x, and the real basis of Bernstein's reasoning is that the standard deviation of this distribution  the quantitative measure of its spread  is
As n grows, this shrinks. What this means is that the probability distribution clusters more narrowly around the mean value x, and more or less uniformly for all x. The proof uses this observation. Since the sum of the binomial coefficients is 1, we have
We can divide the sum over the point k/n into those which are near x and those far away, since we expect the sum over the near ones will be the major contribution. Say we choose some small number δ and sum over k/n with xk/n ≤ δ and those with xk/n > δ. The first is
If we choose δ small enough so  f(x)  f(k/n)  < ε/2 whenever x  k/n ≤ δ, this will be less than ε/2. Now let let M be the maximum spread of f on [0,1]. The second sum is
which is at most M times the sum
We must now choose n so that this sum is at most ε/2M. Being able to do so is plausible, since as n increases the range over which C_{n,k} (1x)^{nk} x^{k} is sizable becomes narrower. There are many ways to make this idea rigorous  the simplest is to use Chebyshev's inequality. This says that if p_{i} is a discrete probability distribution with mean m and standard deviation σ then the probability of t  m ≥ s σ is at most 1/s^{2}. This is a very elementary fact, and we'll come back to justify it in a moment. Given it, we see that if n > 2M x(1x) / ε δ^{2} then the sum above is less than ε/2. Proof of Chebyshev's inequality. Since t  m/ sσ > 1 if and only if t  m^{2}/s^{2} σ^{2} > 1, the sum is that of all p_{i} over all t_{i} with t_{i}  m^{2}/s^{2} σ^{2} ≥ 1. But this is at most the sum of t_{i}m^{2}/s^{2}σ^{2} over the same region, which is in turn is at most the sum over all i of the same terms, which is (by definition of standard deviation)
ReferencesTo find out more ...... about Bézier curvesThe first two books contain articles by Pierre Bézier on the history of his work with the curves named after him.
Various works by Donald Knuth explain clearly the role of Bézier curves in font design.
The next two items are about interpolation of various kinds.
... about Bernstein polynomialsWikipedia has several useful entries involving Bernstein polynomials and interpolation. There is a major site at Technion University concerned with approximation theory:
Bernstein polynomials are not practical for polynomial approximation, but in recent years it has become apparent that they are practical for optimization. The next item is one of the more interesting places where this is discussed.
Bill Casselman Those who can access JSTOR can find some of the papers mentioned above there. For those with access, the American Mathematical Society's MathSciNet can be used to get additional bibliographic information and reviews of some these materials. Some of the items above can be accessed via the ACM Portal , which also provides bibliographic services. 
Welcome to the These web essays are designed for those who have already discovered the joys of mathematics as well as for those who may be uncomfortable with mathematics. Search Feature Column Feature Column at a glance 
Comments: Email Webmaster 
© Copyright
, American Mathematical Society


