
# Grace Wahba and the Wisconsin Spline School

Communicated by *Notices* Associate Editor Richard Levine

Grace Wahba (née Goldsmith, born August 3, 1934) joined the Department of Statistics at the University of Wisconsin–Madison in 1967 and remained on its faculty until her retirement in 2019. During a luminous career at Madison spanning more than half a century, Grace graduated 39 doctoral students and has over 400 academic descendants to date. Over the years, Grace has worked on countless interesting problems, covering a broad spectrum from theory to applications. Above all, Grace is best known as the mother of the Wisconsin spline school, the primary driving force behind a rich family of data smoothing methods developed by herself, her collaborators, and generations of students. Some of Grace's life stories can be found in a recent entertaining piece by Nychka, Ma, and Bates [10]. Here, we try to complement that piece with some technical discussions concerning the Wisconsin spline school.

As Grace's former students, our first set of reading assignments consisted of the defining document of reproducing kernel Hilbert spaces by Aronszajn [1] and three early papers by Kimeldorf and Wahba [4, 5, 6]. As we shall demonstrate below, some of those early results had far-reaching impacts on developments decades later. According to [10], many of the ideas of Kimeldorf and Wahba were inspired by discussions during tea parties at the Mathematics Research Center at UW–Madison in the late 1960s/early 1970s, with participants including Isaac Schoenberg, Carl de Boor, and Larry Schumaker.

In the sections to follow, we shall outline the smoothing spline approach to data smoothing, addressing numerous aspects and noting similarities and differences compared to related techniques. We highlight Grace's many original contributions, but otherwise focus on the flow of presentation; for more accurate attributions of credit, one may consult the foreword of [16] and the bibliographic notes in [3].

## 1. Smoothing Splines

Given pairs $(x_1, Y_1), \dots, (x_n, Y_n)$, $x_i \in [0,1]$, one may obtain a smoothing spline via the minimization of

$$\frac{1}{n}\sum_{i=1}^{n}\bigl(Y_i - \eta(x_i)\bigr)^2 + \lambda \int_0^1 \bigl(\eta''(x)\bigr)^2\,dx. \tag{1}$$

The minimizer of (1) is a piecewise cubic polynomial, twice continuously differentiable, with the third derivative jumping at the distinct $x_i$'s.

In the mathematics or numerical analysis literature, a spline typically refers to a piecewise polynomial; one is concerned with function interpolation or approximation, and the data are exact evaluations of the function.

With stochastic data, one does not have exact samples of the function and needs statistical models. A regression model behind (1) has $Y_i = \eta(x_i) + \epsilon_i$, where the $\epsilon_i$'s are independent Gaussian errors with mean zero.

The first term in (1) pushes for a close fit of $\eta$ to the data, the second term penalizes the roughness of $\eta$, and the smoothing parameter $\lambda$ controls the tradeoff between the two.

Figure 1 illustrates the cubic smoothing spline of (1) applied to some data from a simulated motorcycle accident, found in R package MASS as data frame mcycle, where $x$ is the time after impact and $Y$ is the head acceleration.
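To make the tradeoff in (1) concrete, here is a small numerical sketch of our own (not code from the article): a discretized analogue of (1), a Whittaker-type smoother in which the integral of the squared second derivative is replaced by a sum of squared second differences.

```python
import numpy as np

def discrete_smoother(y, lam):
    """Minimize (1/n)*||y - f||^2 + lam*||D2 f||^2 over vectors f,
    where D2 is the second-difference matrix; a discrete analogue of
    the cubic smoothing spline objective."""
    n = len(y)
    D2 = np.diff(np.eye(n), n=2, axis=0)      # (n-2) x n second differences
    # Normal equations: (I/n + lam * D2'D2) f = y/n
    A = np.eye(n) / n + lam * D2.T @ D2
    return np.linalg.solve(A, y / n)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(50)

rough_fit = discrete_smoother(y, lam=1e-9)    # tiny lam: nearly interpolates
smooth_fit = discrete_smoother(y, lam=1e-2)   # large lam: heavily smoothed
```

Driving `lam` to zero recovers the data; increasing it trades fidelity for smoothness, exactly the balance struck by $\lambda$ in (1).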

Regression analysis, a primary tool of supervised learning, is widely used in applications. Traditional parametric regression was developed when data were scarce, restricting the estimate to rigid parametric families of functions.

Many non-parametric regression methods exist, and most perform comparably well in one dimension. The real challenge lies in the modeling/estimation of functions on multi-dimensional domains, where generalizations of (1) lead to unparalleled operational convenience and a rich collection of modeling tools.

### 1.1. Penalized likelihood method

The penalized likelihood method results from an abstraction of (1). To estimate a function of interest $\eta$ from data, one minimizes

$$L(\eta) + \frac{\lambda}{2} J(\eta), \tag{2}$$

where $L(\eta)$ is a minus log likelihood measuring the goodness of fit of $\eta$ to the data and $J(\eta)$ is a quadratic functional quantifying the roughness of $\eta$.

In a step towards an abstract treatment of such problems, Kimeldorf and Wahba [4, 5, 6] studied penalized least squares with general quadratic penalties in Hilbert spaces.

A smoothing spline is defined as the solution to a variational problem of the form given in (2). Depending on the configuration, it may or may not reduce to a piecewise polynomial.

The first term in (1) is proportional to the minus log likelihood of the Gaussian regression model stated above. Two more examples of $L(\eta)$ follow.

**Example 1.** For binary data $Y_i \in \{0, 1\}$ with $P(Y_i = 1) = e^{\eta(x_i)} / (1 + e^{\eta(x_i)})$, one may take $L(\eta) = -\frac{1}{n} \sum_{i=1}^n \bigl\{ Y_i \eta(x_i) - \log(1 + e^{\eta(x_i)}) \bigr\}$.

**Example 2.** For observations $X_i$ from a probability density $f(x) = e^{\eta(x)} / \int e^{\eta}$ on a bounded domain, one may take $L(\eta) = -\frac{1}{n} \sum_{i=1}^n \eta(X_i) + \log \int e^{\eta}$.

Example 1 is a special case of non-Gaussian regression. A variant of Example 2 was studied by Silverman [11].
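For a concrete sense of how (2) is minimized with non-Gaussian data, here is a sketch of our own construction (not from the article): Newton's method for a Bernoulli likelihood with a simple quadratic penalty on the coefficients of a small polynomial basis. The basis and penalty matrix are stand-ins for the actual spline machinery.

```python
import numpy as np

def penalized_logistic(X, y, lam, P, iters=50):
    """Newton iterations minimizing the penalized likelihood form
    -(1/n)*loglik(beta) + (lam/2)*beta'P beta with a Bernoulli likelihood."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(iters):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))                 # fitted probabilities
        grad = -X.T @ (y - mu) / n + lam * P @ beta
        W = mu * (1.0 - mu)                             # Bernoulli variances
        hess = X.T @ (X * W[:, None]) / n + lam * P
        beta = beta - np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 200)
X = np.column_stack([np.ones_like(x), x, x**2, x**3])   # toy basis
p_true = 1 / (1 + np.exp(-(4 * x - 2)))
y = (rng.random(200) < p_true).astype(float)
P = np.diag([0.0, 1.0, 1.0, 1.0])                       # constant unpenalized
beta_hat = penalized_logistic(X, y, lam=1e-3, P=P)
```

The unpenalized constant mimics the null space of a roughness penalty; the Newton update is the same iteratively reweighted scheme used for spline-based non-Gaussian regression.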

Shown in Figure 2 is an example of one-dimensional density estimation using the penalized likelihood method of Example 2.

The penalized likelihood of (2) is in fact performing constrained maximum likelihood estimation, minimizing $L(\eta)$ subject to $J(\eta) \le \rho$ via the Lagrange method, with $\lambda$ appearing as the Lagrange multiplier.

The null space of $J(\eta)$, $\mathcal{N}_J = \{\eta : J(\eta) = 0\}$, collects the functions that are not penalized; for the cubic smoothing spline of (1), it consists of the linear functions.

### 1.2. Reproducing kernel Hilbert spaces

The minimization of (2) is implicitly in the space $\{\eta : J(\eta) < \infty\}$, which, properly equipped, forms a reproducing kernel Hilbert space.

A reproducing kernel Hilbert space is a Hilbert space $\mathcal{H}$ of functions on a domain $\mathcal{X}$ in which the evaluation functionals $\eta \mapsto \eta(x)$ are continuous for every $x \in \mathcal{X}$.

By the Riesz representation theorem, there exists a reproducing kernel, a non-negative definite bivariate function $R(x, y)$ on $\mathcal{X}$, satisfying $\langle R(x, \cdot), \eta(\cdot) \rangle = \eta(x)$ for every $\eta \in \mathcal{H}$.

A reproducing kernel Hilbert space can also be generated from its reproducing kernel (*any* non-negative definite function qualifies), as the "column space" of the kernel: the completion of $\mathrm{span}\{R(x, \cdot) : x \in \mathcal{X}\}$ under the inner product induced by $\langle R(x, \cdot), R(y, \cdot) \rangle = R(x, y)$.

For use in (2), one takes a space $\mathcal{H} \subseteq \{\eta : J(\eta) < \infty\}$ in which $J(\eta)$ is a square semi-norm with a finite-dimensional null space.

Facts concerning tensor-sum decompositions of reproducing kernel Hilbert spaces can be found in [3], Theorem 2.5, and technical details of Example 3 are in Craven and Wahba [2].
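By the Kimeldorf–Wahba representer theorem, the minimizer of a penalized least squares problem in such a space has the finite-dimensional form $\eta = \sum_\nu d_\nu \phi_\nu + \sum_i c_i R(x_i, \cdot)$, with $\{\phi_\nu\}$ a basis of the null space, reducing the variational problem to a linear system. A minimal sketch of our own, assuming the cubic-spline kernel on $[0,1]$ under the side conditions $f(0) = f'(0) = 0$ (so the null-space basis is $\{1, x\}$):

```python
import numpy as np

def cubic_kernel(x, y):
    """R(x, y) = integral_0^min(x,y) (x-u)(y-u) du: a reproducing kernel
    for the penalty integral of (f'')^2 on [0,1] under the side
    conditions f(0) = f'(0) = 0."""
    m = np.minimum.outer(x, y)
    return np.multiply.outer(x, y) * m - np.add.outer(x, y) * m**2 / 2 + m**3 / 3

def smoothing_spline_fit(x, y, lam):
    """Fitted values of the penalized least squares minimizer, via the
    representer-theorem linear system:
        (K + n*lam*I) c + S d = y,   S'c = 0."""
    n = len(x)
    K = cubic_kernel(x, x)
    S = np.column_stack([np.ones(n), x])          # null-space basis {1, x}
    A = np.block([[K + n * lam * np.eye(n), S],
                  [S.T, np.zeros((2, 2))]])
    b = np.concatenate([y, np.zeros(2)])
    sol = np.linalg.solve(A, b)
    c, d = sol[:n], sol[n:]
    return K @ c + S @ d                          # eta evaluated at the x_i

x = np.linspace(0.05, 1.0, 30)
y = np.sin(3 * x)
fitted = smoothing_spline_fit(x, y, lam=1e-10)    # near-interpolation
```

The same $n + m$ dimensional system underlies smoothing spline computation in general, whatever the kernel and null space.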

The theory of reproducing kernel Hilbert spaces provides an abstract mathematical framework encompassing a great variety of problems. The abstract setting allows many important issues, such as the computation and the asymptotic convergence of the minimizers of (2), to be treated in a unified fashion. As summarized in Grace's 1990 monograph [16], much of her work up to that date, from approximation theory to spline smoothing, fits under this general framework.

### 1.3. Tensor product splines

A statistical model should be interpretable, which distinguishes it from mere function approximation or some black-box predictor/classifier. Two main challenges in the non-parametric modeling of multivariate data are weak interpretability and the curse of dimensionality, both of which may be alleviated via the hierarchical structure of a functional ANOVA decomposition.

#### Functional ANOVA decomposition

Consider a bivariate function $\eta(x_1, x_2)$ on a product domain $\mathcal{X}_1 \times \mathcal{X}_2$. One may decompose

$$\eta = \bigl(A_1 + (I - A_1)\bigr)\bigl(A_2 + (I - A_2)\bigr)\eta = \eta_\emptyset + \eta_1 + \eta_2 + \eta_{12},$$

where $A_1, A_2$ are averaging operators acting on the respective arguments, satisfying $A 1 = 1$; $\eta_\emptyset = A_1 A_2 \eta$ is the constant term, $\eta_1 = (I - A_1) A_2 \eta$ and $\eta_2 = A_1 (I - A_2) \eta$ are the main effects, and $\eta_{12} = (I - A_1)(I - A_2) \eta$ is the interaction.

Examples of averaging operators include $A\eta = \int \eta \, d\mu$ for a probability measure $\mu$ on the domain and $A\eta = \eta(x_0)$ for a point $x_0$ in the domain.

For a function of $d$ variables on a product domain, one similarly has

$$\eta = \prod_{j=1}^{d} \bigl(A_j + (I - A_j)\bigr)\eta = \sum_{S} \eta_S,$$

where the sum is over all subsets $S \subseteq \{1, \dots, d\}$, and the term $\eta_S = \bigl(\prod_{j \in S}(I - A_j)\bigr)\bigl(\prod_{j \notin S} A_j\bigr)\eta$ involves only the variables in $S$.

Selective term elimination in functional ANOVA decompositions helps to combat the curse of dimensionality in estimation and facilitates the interpretation of the analysis. For example, the so-called additive models, those containing only main effects, are easier to estimate and interpret than ones involving interactions. As with classical ANOVA models on discrete domains, the inclusion of higher-order interactions is to be avoided in practice.
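As a toy numerical check of our own, with the averaging operators taken as plain means over a grid (one admissible choice of $A_1, A_2$), the bivariate decomposition and its side conditions can be verified directly:

```python
import numpy as np

# Discrete functional ANOVA on a grid; averaging operators A1, A2 are
# plain means over the respective axes.
n1, n2 = 40, 50
x1 = np.linspace(0, 1, n1)
x2 = np.linspace(0, 1, n2)
F = np.exp(np.add.outer(x1, x2))            # eta(x1, x2) = exp(x1 + x2)

mean_all = F.mean()                         # eta_empty: A1 A2 eta
main1 = F.mean(axis=1) - mean_all           # eta_1(x1): (I - A1) A2 eta
main2 = F.mean(axis=0) - mean_all           # eta_2(x2): A1 (I - A2) eta
inter = F - mean_all - main1[:, None] - main2[None, :]   # eta_12
```

The four pieces sum back to `F`, and each main effect and each slice-average of the interaction averages to zero, which is what makes the terms identifiable.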


#### Tensor product spaces

For the estimation of a multivariate function via (2), one needs a reproducing kernel Hilbert space of functions on the product domain.

To construct a reproducing kernel Hilbert space, it suffices to specify a reproducing kernel. The following example illustrates a construction on a product domain.

The construction described in Example 4 is at an abstract level, requiring few specifics of cubic splines on $[0,1]$.
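Since sums and elementwise (Schur) products of non-negative definite kernels are again non-negative definite, tensor-sum ANOVA kernels can be assembled directly from marginal kernels. A small sketch of our own of this assembly, using the cubic-spline marginal kernel under the side conditions $f(0) = f'(0) = 0$:

```python
import numpy as np

def cubic_kernel_1d(s, t):
    """A non-negative definite kernel on [0,1] (cubic-spline kernel under
    the side conditions f(0) = f'(0) = 0), used as a marginal building block."""
    m = np.minimum.outer(s, t)
    return np.multiply.outer(s, t) * m - np.add.outer(s, t) * m**2 / 2 + m**3 / 3

rng = np.random.default_rng(2)
u = rng.random(25)                  # first coordinates of 25 design points
v = rng.random(25)                  # second coordinates
K1 = cubic_kernel_1d(u, u)          # marginal kernel in x1
K2 = cubic_kernel_1d(v, v)          # marginal kernel in x2

# Tensor-sum ANOVA kernel: two main-effect terms plus a product-form
# interaction term (Schur product of the marginals).
K = K1 + K2 + K1 * K2
eigvals = np.linalg.eigvalsh((K + K.T) / 2)
```

Selective term elimination amounts to dropping summands of `K`; an additive model, for instance, keeps only `K1 + K2`.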


As an illustration, we fit the logistic regression model of Example 1 to the wesdr data found in R package gss, concerning the progression of diabetic retinopathy. One has a binary response indicating progression of the disease, along with covariates including the duration of diabetes, glycosylated hemoglobin, and body mass index.

Grace's signature was all over the tensor product spline technique, from the inception of the idea to the ensuing rigorous developments, involving several of her students including the present authors; see, e.g., [16], Chap. 10, and [18].

### 1.4. More splines

Real intervals are the domains most often encountered in practice; they can be mapped onto $[0,1]$.

We now present a variety of configurations tuned to various situations, which may be used directly in (2) on the respective designated domains, or be used as building blocks to construct tensor product splines.

#### Periodic splines

To accommodate recurring patterns such as circadian rhythms or seasonal effects, one may consider only periodic functions on a real interval. Mapping the interval onto the unit circle, one obtains periodic splines.
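On an equally spaced grid over one period, periodic smoothing diagonalizes in the discrete Fourier basis, with each coefficient shrunk by a factor depending on its frequency. A sketch of our own, assuming the roughness penalty $\int (f'')^2$:

```python
import numpy as np

def periodic_smoother(y, lam):
    """Periodic smoothing on an equally spaced grid over one period.
    For f(x) = sum_k c_k exp(2*pi*i*k*x), the penalty integral of (f'')^2
    equals sum_k (2*pi*k)^4 |c_k|^2, so each Fourier coefficient is
    shrunk by 1 / (1 + lam*(2*pi*k)^4)."""
    n = len(y)
    k = np.fft.fftfreq(n, d=1.0 / n)              # integer frequencies
    shrink = 1.0 / (1.0 + lam * (2 * np.pi * k) ** 4)
    return np.real(np.fft.ifft(np.fft.fft(y) * shrink))

n = 128
x = np.arange(n) / n
y = np.sin(2 * np.pi * x) + 0.05 * np.cos(6 * np.pi * x)
smooth = periodic_smoother(y, lam=1e-4)
```

The mean (frequency zero) passes through untouched, reflecting the constant null space, while higher frequencies are damped at the rate $k^{-4}$.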

#### L-splines

On $[0,1]$, one may replace the penalty $\int_0^1 (\eta'')^2\,dx$ in (1) by $\int_0^1 (L\eta)^2\,dx$, where $L$ is a more general linear differential operator; the minimizers of the resulting variational problems are known as L-splines.

When the null space of $L$ is chosen to contain a plausible parametric model for the data, the resulting L-spline fit favors that model, improving over the cubic spline when the model is adequate.