Proper scoring rules enable decision-theoretically principled comparisons of probabilistic forecasts. New scoring rules can be constructed by identifying the predictive distribution with an element of a parametric family and then applying a known scoring rule. We introduce a condition which ensures propriety in this construction and thereby obtain novel proper scoring rules.
1. Introduction
In order to account for the inherent uncertainty of future quantities or events, it is preferable to issue forecasts in the form of probability distributions [6]. One way to measure the predictive ability of such probabilistic forecasts is to assign a score, or loss, $S(F,y)$ to each pair of forecast distribution $F \in \mathcal{F}$ and observation $y \in \mathsf{O}$. In this setting $\mathsf{O}$ is a topological space with Borel $\sigma$-algebra $\mathcal{O}$ and $\mathcal{F}$ is a class of probability distributions on $(\mathsf{O}, \mathcal{O})$. A scoring rule is a function $S: \mathcal{F}\times \mathsf{O}\rightarrow \bar{\mathbb{R}}$ such that for all $F,G \in \mathcal{F}$ the expectation
$$\mathbb{E}_G\, S(F,Y) \coloneq \int_{\mathsf{O}} S(F,y) \,\mathrm{d}G(y)$$
is well defined. Here we let $\bar{\mathbb{R}}\coloneq [-\infty , \infty ]$ be the extended real line. The scoring rule $S$ is proper relative to $\mathcal{F}$ if
$$\mathbb{E}_G\, S(G,Y) \le \mathbb{E}_G\, S(F,Y)$$
for all $F, G \in \mathcal{F}$. It is strictly proper if equality holds if and only if $F = G$. If a forecaster believes that the quantity $y$ is drawn from the distribution $G$ and receives a penalty $S(F,y)$ for reporting $F$, then propriety ensures that reporting her true belief $F=G$ is an optimal strategy in expectation. For recent reviews of the theory and application of proper scoring rules we refer to [2], [7], and [3].
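As a small numerical illustration of propriety, the following sketch evaluates the expected score of normal forecasts under a standard normal truth $G$. It uses the logarithmic score (the negative log density, introduced in the next paragraph) and the closed-form expectation for normal distributions. The function name `expected_log_score` and the parameter grid are illustrative choices, not part of the paper.

```python
import math

def expected_log_score(mu, sigma, mu_g=0.0, sigma_g=1.0):
    """E_G[-log f(Y)] for the forecast N(mu, sigma^2) when Y ~ G = N(mu_g, sigma_g^2).

    Uses E_G[(Y - mu)^2] = sigma_g^2 + (mu_g - mu)^2.
    """
    return (0.5 * math.log(2 * math.pi * sigma ** 2)
            + (sigma_g ** 2 + (mu_g - mu) ** 2) / (2 * sigma ** 2))

# Expected score when the forecaster reports her true belief F = G.
truth = expected_log_score(0.0, 1.0)

# Candidate forecasts on a coarse (mu, sigma) grid that contains the truth (0, 1).
grid = [(m / 4, s / 4) for m in range(-8, 9) for s in range(1, 13)]
best = min(grid, key=lambda p: expected_log_score(*p))

print("expected score at the truth:", truth)
print("grid minimiser:", best)
```

On this grid no forecast attains a smaller expected penalty than the truth, which is what propriety requires. A grid search is of course only a sanity check, not a proof.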
Various proper scoring rules have been proposed in the literature, in particular for the special situation where each member of $\mathcal{F}$ admits a density with respect to some $\sigma$-finite measure on $(\mathsf{O}, \mathcal{O})$. The logarithmic score [8] is defined via
$$\mathrm{LogS}(F,y) \coloneq -\log f(y),$$
where $f$ denotes the probability density function of $F$. It is the most popular strictly proper scoring rule for densities since it connects to various fundamental statistical concepts, such as maximum-likelihood estimation, information criteria, and Bayes factors [7]. For $\mathsf{O}= \mathbb{R}^d$ a popular scoring rule which depends on the first two moments only is the Dawid-Sebastiani (DS) score [4]. If $\mathcal{F}$ is a class of distributions with finite second moments, then it is given by
$$\mathrm{DS}(F,y) \coloneq \log \det \Sigma _F + (y - \mu _F)^\top \Sigma _F^{-1} (y - \mu _F),$$
where $\mu _F$ and $\Sigma _F$ denote the mean and the covariance matrix of the predictive distribution $F$. The DS score is proper, but not strictly proper, as distributions with the same first and second moments attain the same score.
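The following sketch illustrates, in the univariate case $d = 1$, why the DS score is proper but not strictly proper. It assumes the usual form $\log \sigma^2 + (y-\mu)^2/\sigma^2$ of the score; the helper names and the normal-versus-Laplace comparison are illustrative assumptions.

```python
import math

def ds_score(mu, var, y):
    """Univariate Dawid-Sebastiani score: log(var) + (y - mu)^2 / var."""
    return math.log(var) + (y - mu) ** 2 / var

def expected_ds_score(mu, var, mean_g, var_g):
    """E_G[DS(F, Y)] when G has mean mean_g and variance var_g.

    Uses E_G[(Y - mu)^2] = var_g + (mean_g - mu)^2, so only two moments of G enter.
    """
    return math.log(var) + (var_g + (mean_g - mu) ** 2) / var

# Two different forecasts with identical first two moments:
# a standard normal and a Laplace distribution with scale 1/sqrt(2),
# whose variance is 2 * scale^2 = 1. The DS score only sees (mean, variance).
normal_moments = (0.0, 1.0)
laplace_scale = 1 / math.sqrt(2)
laplace_moments = (0.0, 2 * laplace_scale ** 2)

y_obs = 0.8
same_score = math.isclose(ds_score(*normal_moments, y_obs),
                          ds_score(*laplace_moments, y_obs))

# Propriety check on a grid: under G with mean 0 and variance 1,
# no (mu, var) pair has a smaller expected score than the true moments.
truth = expected_ds_score(0.0, 1.0, 0.0, 1.0)
grid_ok = all(expected_ds_score(m / 4, v / 4, 0.0, 1.0) >= truth
              for m in range(-8, 9) for v in range(1, 13))

print(same_score, grid_ok)
```

Both forecasts receive the same score for every observation, so the score cannot distinguish them, even though the true moments still minimise the expected score.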
This work is motivated by the fact that, up to unimportant constants, the DS score of $F$ equals the logarithmic score of a multivariate normal distribution with the same mean and covariance matrix as $F$. More precisely,
$$\mathrm{DS}(F,y) = -2 \log \varphi ( y \mid \mu _F, \Sigma _F) - d \log (2\pi ),$$
where $\varphi ( \cdot \mid \mu , \Sigma )$ denotes the density of the multivariate normal distribution with mean $\mu$ and covariance matrix $\Sigma$. This connection raises the question of under which conditions we obtain a proper scoring rule by identifying the predictive distribution $F$ with an element of a parametric family (e.g. the normal distributions) and then applying another proper scoring rule (e.g. the logarithmic score). Section 2 gives a simple condition which ensures propriety in this construction and is restricted to neither the normal family nor the logarithmic score. The paper concludes with several examples which yield new proper scoring rules and recover existing ones.
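The relation between the DS score and the logarithmic score of the moment-matched normal distribution can be checked numerically. The sketch below does this for $d = 2$ with an explicit $2 \times 2$ inverse, so that only the standard library is needed. The constant $d \log (2\pi )$ follows from the normal density; the helper names and the test values are illustrative assumptions.

```python
import math

def quad_form_and_logdet(mu, cov, y):
    """Return ((y - mu)^T cov^{-1} (y - mu), log det cov) for a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    r0, r1 = y[0] - mu[0], y[1] - mu[1]
    # The inverse of [[a, b], [c, d]] is [[d, -b], [-c, a]] / det.
    quad = (d * r0 * r0 - (b + c) * r0 * r1 + a * r1 * r1) / det
    return quad, math.log(det)

def ds_score(mu, cov, y):
    """Dawid-Sebastiani score: log det(cov) + quadratic form."""
    quad, logdet = quad_form_and_logdet(mu, cov, y)
    return logdet + quad

def log_score_normal(mu, cov, y):
    """Negative log density of the bivariate normal N(mu, cov) at y."""
    quad, logdet = quad_form_and_logdet(mu, cov, y)
    dim = 2
    return 0.5 * dim * math.log(2 * math.pi) + 0.5 * logdet + 0.5 * quad

mu = (1.0, -0.5)
cov = ((2.0, 0.3), (0.3, 1.0))
y = (0.2, 0.4)

lhs = ds_score(mu, cov, y)
rhs = 2 * log_score_normal(mu, cov, y) - 2 * math.log(2 * math.pi)
print(lhs, rhs)
```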
2. Construction principle
Let $\mathcal{E}\coloneq \{ F_\theta \mid \theta \in \Theta \} \subseteq \mathcal{F}$ be a parametric family of distributions with parameter space $\Theta$. Let $\phi : \mathcal{F}\to \mathcal{E}$, $F \mapsto F_\theta$, be a mapping onto $\mathcal{E}$ and write $\theta _F$ for the parameter $\theta$ in $\phi (F) = F_\theta$.
Strict propriety in Theorem 2.1 is only possible for special choices of $\mathcal{E}$ and $\phi$, which render the mapping $\phi$ a bijection, since otherwise two different distributions can attain the same score.
Exponential families are natural and flexible candidates for distributional classes in statistics. We call a set of densities $\{ f(\cdot \mid \theta ) \mid \theta \in \Theta \}$ on $\mathsf{O}$ an exponential family if any member can be represented via
$$f(y \mid \theta ) = h(y) \exp \bigl( \eta (\theta )^\top t(y) - A(\theta ) \bigr)$$
for measurable functions $h : \mathsf{O}\to (0, \infty )$, $t : \mathsf{O}\to \mathbb{R}^m$, $\eta : \Theta \to \mathbb{R}^m$, and $A : \Theta \to \mathbb{R}$, where $m \in \mathbb{N}$. The mapping $A$ is often called the log-partition function and $t$ is a sufficient statistic for the parameter $\theta$; see [1] for details.
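To make the definition concrete, the following sketch writes the univariate normal family in exponential-family form. It assumes the standard choice $h(y) = 1$, $t(y) = (y, y^2)$, $\eta (\theta ) = (\mu /\sigma ^2, -1/(2\sigma ^2))$, and $A(\theta ) = \mu ^2/(2\sigma ^2) + \tfrac{1}{2}\log (2\pi \sigma ^2)$ for $\theta = (\mu , \sigma )$, and compares the result with the usual normal density. The function names are illustrative.

```python
import math

def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) in its usual form."""
    return math.exp(-(y - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def exp_family_pdf(y, mu, sigma):
    """The same density written as h(y) * exp(eta(theta)^T t(y) - A(theta))."""
    h = 1.0
    t = (y, y ** 2)                                   # sufficient statistic
    eta = (mu / sigma ** 2, -1.0 / (2 * sigma ** 2))  # natural parameter
    log_partition = mu ** 2 / (2 * sigma ** 2) + 0.5 * math.log(2 * math.pi * sigma ** 2)
    return h * math.exp(eta[0] * t[0] + eta[1] * t[1] - log_partition)

# Largest absolute difference between the two representations on a few test points.
max_gap = max(abs(normal_pdf(y, 1.5, 0.7) - exp_family_pdf(y, 1.5, 0.7))
              for y in [-2.0, 0.0, 1.5, 3.0])
print(max_gap)
```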
When the scoring rule $S$ in Theorem 2.1 is the logarithmic score, exponential families are convenient candidates for the class $\mathcal{E}$. In detail, let $\mathcal{E}$ be an exponential family on $\mathsf{O}$ and set $H(y) \coloneq \log h(y)$. Then Equation 2.1 holds if
$$\mathbb{E}_G\, t(Y) = \mathbb{E}_{\phi (G)}\, t(Y) \quad \text{for all } G \in \mathcal{F}, \tag{2.2}$$
i.e. if the expectations of $t$ agree. Since $\phi (G)$ is a member of the exponential family $\mathcal{E}$, the right-hand side of Equation 2.2 can be calculated and expressed in terms of $\theta \in \Theta$ via the partial derivatives of the log-partition function $A$. If a closed-form expression exists, this yields sufficient conditions on the mapping $\phi : \mathcal{F}\to \mathcal{E}$ for Equation 2.1 to hold; see Section 3 for concrete examples.
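For the univariate normal family with $t(y) = (y, y^2)$, requiring the expectations of $t$ to agree amounts to matching the first two moments. The sketch below takes $G$ to be a two-component normal mixture, maps it to the moment-matched normal distribution, and recovers the model expectations of $t$ from numerical partial derivatives of the log-partition function in natural parameters. The mixture, the natural parametrisation $\eta = (\mu /\sigma ^2, -1/(2\sigma ^2))$, and all function names are illustrative assumptions.

```python
import math

def mixture_moments(weights, means, sds):
    """E_G[Y] and E_G[Y^2] for a normal mixture G."""
    m1 = sum(w * m for w, m in zip(weights, means))
    m2 = sum(w * (s ** 2 + m ** 2) for w, m, s in zip(weights, means, sds))
    return m1, m2

def log_partition(eta1, eta2):
    """Log-partition function of the normal family in natural parameters
    eta1 = mu / sigma^2 and eta2 = -1 / (2 sigma^2), with t(y) = (y, y^2), h = 1."""
    return -eta1 ** 2 / (4 * eta2) + 0.5 * math.log(-math.pi / eta2)

def grad_log_partition(eta1, eta2, eps=1e-6):
    """Central finite differences; the gradient equals the model expectation of t."""
    d1 = (log_partition(eta1 + eps, eta2) - log_partition(eta1 - eps, eta2)) / (2 * eps)
    d2 = (log_partition(eta1, eta2 + eps) - log_partition(eta1, eta2 - eps)) / (2 * eps)
    return d1, d2

# Expectations of t under G.
m1, m2 = mixture_moments([0.3, 0.7], [-1.0, 2.0], [0.5, 1.5])

# Moment-matching map: phi(G) = N(mu, var) with the mean and variance of G.
mu, var = m1, m2 - m1 ** 2
eta = (mu / var, -1.0 / (2 * var))

# Expectations of t under phi(G), obtained from the log-partition function.
model_moments = grad_log_partition(*eta)
print((m1, m2), model_moments)
```

Both pairs of expectations agree up to finite-difference error, so this particular mapping satisfies the moment condition by construction.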
Another possible choice for $S$ in Theorem 2.1 which fits well with exponential families is the Hyvärinen score [9]. Let $\mathsf{O}= \mathbb{R}^d$ and let $\nabla$ denote the gradient and $\Delta$ the Laplace operator. Define $\mathcal{L}^*$ as the class of densities on $\mathsf{O}$ which are twice differentiable, positive almost everywhere, and such that $\nabla \log (f(y)) g(y) \to 0$ as $\Vert y \Vert \to \infty$ for all $f,g \in \mathcal{L}^*$. Then the Hyvärinen score is given by
$$\mathrm{HyvS}(f,y) \coloneq \Delta \log f(y) + \tfrac{1}{2} \Vert \nabla \log f(y) \Vert ^2$$
and it is a strictly proper scoring rule relative to $\mathcal{L}^*$ if its expectation is finite. The Hyvärinen score has the remarkable property that it is $0$-homogeneous, i.e. to compute $\mathrm{HyvS}(f,y)$ the predictive density $f$ needs to be specified up to the normalization constant only; see [9], [10], and [5] for details.
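The $0$-homogeneity can be illustrated numerically in one dimension. The sketch below assumes the common convention $\mathrm{HyvS}(f,y) = \Delta \log f(y) + \tfrac{1}{2} \Vert \nabla \log f(y) \Vert ^2$ (other references scale this by a constant) and approximates the derivatives by central finite differences. The function names, the step size, and the additive constant are illustrative assumptions.

```python
import math

def hyvarinen_score(log_density, y, eps=1e-3):
    """Univariate Hyvarinen score (log f)''(y) + 0.5 * ((log f)'(y))^2,
    with both derivatives approximated by central finite differences."""
    d1 = (log_density(y + eps) - log_density(y - eps)) / (2 * eps)
    d2 = (log_density(y + eps) - 2 * log_density(y) + log_density(y - eps)) / eps ** 2
    return d2 + 0.5 * d1 ** 2

mu, sigma = 1.0, 2.0

def log_density_normalized(y):
    """Log density of N(mu, sigma^2), including the normalization constant."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (y - mu) ** 2 / (2 * sigma ** 2)

def log_density_unnormalized(y):
    """Same log density up to an arbitrary additive constant."""
    return -(y - mu) ** 2 / (2 * sigma ** 2) + 7.0

y_obs = 2.5
score_normalized = hyvarinen_score(log_density_normalized, y_obs)
score_unnormalized = hyvarinen_score(log_density_unnormalized, y_obs)

# Closed form for the normal density under the same convention.
closed_form = -1 / sigma ** 2 + (y_obs - mu) ** 2 / (2 * sigma ** 4)
print(score_normalized, score_unnormalized, closed_form)
```

Because only derivatives of $\log f$ enter, the additive constant drops out and both calls return the same value up to rounding.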
To connect to Theorem 2.1 assume for simplicity that $\mathcal{E}$ is an exponential family of distributions on $\mathsf{O}= \mathbb{R}^d$ where the function $h$ is constant and all densities satisfy the regularity conditions of the class $\mathcal{L}^*$. If we define $W_\theta (y) \coloneq \eta (\theta )^\top t(y)$, then the Hyvärinen score on $\mathcal{E}$ is completely determined by
$$\nabla W_\theta (y) = \sum _{i=1}^m \eta _i(\theta ) \nabla t_i(y) \quad \text{and} \quad \Delta W_\theta (y) = \sum _{i=1}^m \eta _i(\theta ) \Delta t_i(y),$$
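where the index $i$ denotes the $i$-th component of a vector in $\mathbb{R}^m$. As a consequence, we can set $H=0$ and Equation 2.1 holds if the derivatives of $t$ satisfy
$$\mathbb{E}_G\, \Delta t_i(Y) = \mathbb{E}_{\phi (G)}\, \Delta t_i(Y) \quad \text{and} \quad \mathbb{E}_G\, \nabla t_i(Y)^\top \nabla t_j(Y) = \mathbb{E}_{\phi (G)}\, \nabla t_i(Y)^\top \nabla t_j(Y)$$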
for $i, j = 1, \dots , m$, giving $m + m(m+1)/2$ identities. Similarly to Equation 2.2, these equations provide sufficient conditions for Equation 2.1 to hold, which can be used to define a suitable mapping $\phi : \mathcal{F}\to \mathcal{E}$ in Theorem 2.1; see Example 3.4.
3. Examples
Acknowledgments
The author is grateful for funding by the Klaus Tschira Foundation and for infrastructural support provided by the University of Mannheim. I thank Tilmann Gneiting for fruitful comments and discussions and two anonymous referees for helpful suggestions.
Reference [1]
O. Barndorff-Nielsen, Information and exponential families in statistical theory, Wiley Series in Probability and Statistics, John Wiley & Sons, Ltd., Chichester, 2014. Reprint of the 1978 original [MR0489333], DOI 10.1002/9781118857281. MR3221776.
Reference [2]
A. P. Dawid, The geometry of proper scoring rules, Ann. Inst. Statist. Math. 59 (2007), no. 1, 77–93, DOI 10.1007/s10463-006-0099-8. MR2396033.
Reference [3]
Alexander Philip Dawid and Monica Musio, Theory and applications of proper scoring rules, Metron 72 (2014), no. 2, 169–183, DOI 10.1007/s40300-014-0039-y. MR3233147.
Reference [4]
A. Philip Dawid and Paola Sebastiani, Coherent dispersion criteria for optimal experimental design, Ann. Statist. 27 (1999), no. 1, 65–81, DOI 10.1214/aos/1018031101. MR1701101.
Reference [5]
Werner Ehm and Tilmann Gneiting, Local proper scoring rules of order two, Ann. Statist. 40 (2012), no. 1, 609–637, DOI 10.1214/12-AOS973. MR3014319.
Reference [6]
Tilmann Gneiting and Matthias Katzfuss, Probabilistic forecasting, Ann. Rev. Stat. Appl. 1 (2014), 125–151.
Reference [7]
Tilmann Gneiting and Adrian E. Raftery, Strictly proper scoring rules, prediction, and estimation, J. Amer. Statist. Assoc. 102 (2007), no. 477, 359–378, DOI 10.1198/016214506000001437. MR2345548.
Reference [8]
I. J. Good, Rational decisions, J. Roy. Statist. Soc. Ser. B 14 (1952), 107–114. MR77033.
Reference [9]
Aapo Hyvärinen, Estimation of non-normalized statistical models by score matching, J. Mach. Learn. Res. 6 (2005), 695–709. MR2249836.
Reference [10]
Matthew Parry, A. Philip Dawid, and Steffen Lauritzen, Proper local scoring rules, Ann. Statist. 40 (2012), no. 1, 561–592, DOI 10.1214/12-AOS971. MR3014317.
Article citation: Jonas Brehmer, A construction principle for proper scoring rules, Proc. Amer. Math. Soc. Ser. B 8 (2021), no. 24, 297–301, DOI 10.1090/bproc/98. MR4323520.