A construction principle for proper scoring rules

By Jonas R. Brehmer

Abstract

Proper scoring rules enable decision-theoretically principled comparisons of probabilistic forecasts. New scoring rules can be constructed by identifying the predictive distribution with an element of a parametric family and then applying a known scoring rule. We introduce a condition which ensures propriety in this construction and thereby obtain novel proper scoring rules.

1. Introduction

In order to account for the inherent uncertainty of future quantities or events, it is preferable to issue forecasts in the form of probability distributions [6]. One way to measure the predictive ability of such probabilistic forecasts is to assign a score, or loss, $S(P, y)$ to each pair of forecast distribution $P \in \mathcal{P}$ and observation $y \in \Omega$. In this setting $\Omega$ is a topological space with Borel $\sigma$-algebra $\mathcal{B}$ and $\mathcal{P}$ is a class of probability distributions on $(\Omega, \mathcal{B})$. A scoring rule is a function $S : \mathcal{P} \times \Omega \to \overline{\mathbb{R}}$ such that for all $P, Q \in \mathcal{P}$ the expectation

\[ S(P, Q) := \mathbb{E}_Q S(P, Y) = \int_\Omega S(P, y) \, \mathrm{d}Q(y) \]

is well defined. Here we let $\overline{\mathbb{R}} := [-\infty, \infty]$ be the extended real line. The scoring rule $S$ is proper relative to $\mathcal{P}$ if

\[ S(Q, Q) \le S(P, Q) \]

for all $P, Q \in \mathcal{P}$. It is strictly proper if equality holds if and only if $P = Q$. If a forecaster believes that the quantity $Y$ is drawn from the distribution $Q$ and receives a penalty $S(P, y)$ for reporting $P$, then propriety ensures that reporting her true belief $Q$ is an optimal strategy in expectation. For recent reviews of the theory and application of proper scoring rules we refer to [2], [7], and [3].

Various proper scoring rules have been proposed in the literature, in particular for the special situation where each member of $\mathcal{P}$ admits a density with respect to some $\sigma$-finite measure on $\Omega$. The logarithmic score [8] is defined via

\[ \mathrm{LogS}(P, y) := -\log p(y), \]

where $p$ denotes the probability density function of $P$. It is the most popular strictly proper scoring rule for densities since it connects to various fundamental statistical concepts, such as maximum-likelihood estimation, information criteria, and Bayes factors [7]. For $\Omega = \mathbb{R}^d$, a popular scoring rule which depends on the first two moments only is the Dawid-Sebastiani (DS) score [4]. If $\mathcal{P}$ is a class of distributions with finite second moments, then it is given by

\[ \mathrm{DS}(P, y) := \log \det \Sigma_P + (y - \mu_P)^\top \Sigma_P^{-1} (y - \mu_P), \]

where $\mu_P$ and $\Sigma_P$ denote the mean and the covariance matrix of the predictive distribution $P$. The DS score is proper, but not strictly proper, as distributions with the same first and second moments attain the same score.
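The last remark can be checked numerically. The following is a minimal sketch for $d = 1$, where the DS score reduces to $\log \sigma_P^2 + (y - \mu_P)^2 / \sigma_P^2$; the function names and the example moments are our own illustrative choices. It shows that a normal and a Laplace forecast with equal mean and variance receive the same score, and that reporting the true moments is not beaten on a small grid of alternatives.

```python
import math

def ds_score(mean, var, y):
    """Dawid-Sebastiani score for d = 1: log(var) + (y - mean)^2 / var."""
    return math.log(var) + (y - mean) ** 2 / var

def expected_ds(mean_p, var_p, mean_q, var_q):
    """Expected DS score of forecast P when Y ~ Q; Q enters only via its mean and variance."""
    return math.log(var_p) + (var_q + (mean_q - mean_p) ** 2) / var_p

# Forecast 1: normal distribution with mean 0 and variance 2.
# Forecast 2: centered Laplace distribution with scale 1 (mean 0, variance 2 * 1^2 = 2).
normal_moments = (0.0, 2.0)
laplace_moments = (0.0, 2.0 * 1.0 ** 2)
same_score = ds_score(*normal_moments, y=1.3) == ds_score(*laplace_moments, y=1.3)

# Propriety on a grid: the truthful report (mean_q, var_q) is never beaten.
mean_q, var_q = 0.5, 1.5
truthful = expected_ds(mean_q, var_q, mean_q, var_q)
grid = [(m / 4, v / 4) for m in range(-8, 9) for v in range(1, 17)]
proper_on_grid = all(truthful <= expected_ds(m, v, mean_q, var_q) + 1e-12 for m, v in grid)
```

Because the score depends on the forecast only through its first two moments, the two forecasts cannot be distinguished, which is exactly why strict propriety fails.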

This work is motivated by the fact that, up to unimportant constants, the DS score of $P$ equals the logarithmic score of a multivariate normal distribution with the same mean and covariance matrix as $P$. More precisely,

\[ \mathrm{DS}(P, y) = -2 \log \varphi_{\mu_P, \Sigma_P}(y) - d \log(2\pi), \]

where $\varphi_{\mu, \Sigma}$ denotes the density of the multivariate normal distribution with mean $\mu$ and covariance matrix $\Sigma$. This connection raises the question under which conditions we obtain a proper scoring rule by identifying the predictive distribution with an element of a parametric family (e.g. the normal distributions) and then applying another proper scoring rule (e.g. the logarithmic score). Section 2 gives a simple condition which ensures propriety in this construction and is restricted to neither the normal family nor the logarithmic score. The paper concludes with several examples which yield new proper scoring rules and recover existing ones.
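The connection between the DS score and the logarithmic score of a normal distribution can be verified directly. The sketch below does this for $d = 1$, using the normal density from Python's standard library; the helper names and the numerical values are illustrative assumptions.

```python
import math
from statistics import NormalDist

def ds_score(mean, var, y):
    """Dawid-Sebastiani score for d = 1."""
    return math.log(var) + (y - mean) ** 2 / var

def log_score_normal(mean, var, y):
    """Logarithmic score of N(mean, var) at y, i.e. minus the log density."""
    return -math.log(NormalDist(mean, math.sqrt(var)).pdf(y))

mean, var, y = 1.0, 2.5, -0.7
lhs = ds_score(mean, var, y)
# Twice the logarithmic score minus d * log(2 * pi), here with d = 1.
rhs = 2.0 * log_score_normal(mean, var, y) - math.log(2.0 * math.pi)
identity_holds = math.isclose(lhs, rhs, rel_tol=1e-9)
```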

2. Construction principle

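Let $\mathcal{F} := \{F_\theta : \theta \in \Theta\}$ be a parametric family of distributions with parameter space $\Theta$. Let $g : \mathcal{P} \to \mathcal{F}$, $P \mapsto g(P)$, be a mapping onto $\mathcal{F}$ and write $\theta_P$ for the parameter in $\Theta$ with $g(P) = F_{\theta_P}$.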

Theorem 2.1.

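Let $S : \mathcal{F} \times \Omega \to \overline{\mathbb{R}}$ be a proper scoring rule and $\mathcal{F} \subseteq \mathcal{P}$. If there is a function $h : \Omega \to \mathbb{R}$ which is integrable with respect to all $P \in \mathcal{P}$ and such that for all $P, Q \in \mathcal{P}$

\[ \mathbb{E}_Q \bigl[ S(g(P), Y) - h(Y) \bigr] = \mathbb{E}_{g(Q)} \bigl[ S(g(P), Y) - h(Y) \bigr], \tag{2.1} \]

then the scoring rule

\[ S_g(P, y) := S(g(P), y) - h(y) \]

is proper.

Proof.

For $P, Q \in \mathcal{P}$ invoke Equation 2.1 two times to obtain

\[ S_g(P, Q) = \mathbb{E}_{g(Q)} \bigl[ S(g(P), Y) - h(Y) \bigr] \ge \mathbb{E}_{g(Q)} \bigl[ S(g(Q), Y) - h(Y) \bigr] = S_g(Q, Q), \]

where the inequality stems from the propriety of $S$.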

Strict propriety in Theorem 2.1 is only possible for special choices of $\mathcal{P}$ and $\mathcal{F}$, which render the mapping $g$ a bijection, since otherwise two different distributions can attain the same score.
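The construction can be phrased as a higher-order function. The following sketch assumes that the constructed rule has the form $S_g(P, y) = S(g(P), y) - h(y)$; the dictionary representation of a forecast and all function names are hypothetical and serve illustration only. For $d = 1$, the normal family, the logarithmic score, and the constant $h = \frac{1}{2} \log(2\pi)$, it returns one half of the DS score.

```python
import math
from statistics import NormalDist

def construct_score(base_score, g, h):
    """Return the rule (P, y) -> base_score(g(P), y) - h(y)."""
    def score(forecast, y):
        return base_score(g(forecast), y) - h(y)
    return score

def to_normal(forecast):
    """g: map a forecast (hypothetically a dict of moments) to the matching normal."""
    return NormalDist(forecast["mean"], math.sqrt(forecast["var"]))

def log_score(dist, y):
    """Logarithmic score: minus the log density at y."""
    return -math.log(dist.pdf(y))

def offset(y):
    """h: the constant 0.5 * log(2 * pi), which does not depend on y."""
    return 0.5 * math.log(2.0 * math.pi)

half_ds = construct_score(log_score, to_normal, offset)

forecast = {"mean": 0.0, "var": 2.0}
value = half_ds(forecast, 1.0)
# One half of the DS score: 0.5 * (log(var) + (y - mean)^2 / var).
expected = 0.5 * (math.log(2.0) + 1.0 ** 2 / 2.0)
```

Keeping `g` and `h` as separate arguments mirrors the roles they play in the theorem: `g` selects the member of the parametric family, while `h` removes terms that do not depend on the forecast.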

Exponential families are natural and flexible candidates for distributional classes in statistics. We call a set $\mathcal{F} = \{f_\theta : \theta \in \Theta\}$ of densities on $\Omega$ an exponential family if any member can be represented via

\[ f_\theta(y) = c(y) \exp\bigl( \eta(\theta)^\top T(y) - A(\theta) \bigr) \]

for measurable functions $c : \Omega \to [0, \infty)$, $T : \Omega \to \mathbb{R}^k$, $\eta : \Theta \to \mathbb{R}^k$, and $A : \Theta \to \mathbb{R}$, where $k \in \mathbb{N}$. The mapping $A$ is often called log-partition function and $T$ is a sufficient statistic for the parameter $\theta$; see [1] for details.

When the scoring rule $S$ in Theorem 2.1 is the logarithmic score, exponential families are convenient candidates for the class $\mathcal{F}$. In detail, let $\mathcal{F}$ be an exponential family on $\Omega$ and set $h := -\log c$, then Equation 2.1 holds if

\[ \mathbb{E}_Q T(Y) = \mathbb{E}_{g(Q)} T(Y) \quad \text{for all } Q \in \mathcal{P}, \tag{2.2} \]

i.e. if the expectations of $T$ agree. Since $g(Q)$ is a member of the exponential family $\mathcal{F}$, the right-hand side of Equation 2.2 can be calculated and expressed in terms of $\theta_Q$ via the partial derivatives of the log-partition function $A$. If a closed-form expression exists, this yields sufficient conditions on the mapping $g$ for Equation 2.1 to hold; see Section 3 for concrete examples.
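A short derivation may clarify why matching the expectations of the sufficient statistic suffices. It assumes that the density is written as $f_\theta(y) = c(y) \exp(\eta(\theta)^\top T(y) - A(\theta))$, where the symbol $c$ for the factor not depending on $\theta$ is our notational assumption, and that the function subtracted from the score is $h = -\log c$.

```latex
\begin{align*}
\mathrm{LogS}(g(P), y) - h(y)
  &= -\log f_{\theta_P}(y) + \log c(y) \\
  &= A(\theta_P) - \eta(\theta_P)^\top T(y), \\
\mathbb{E}_Q \bigl[ \mathrm{LogS}(g(P), Y) - h(Y) \bigr]
  &= A(\theta_P) - \eta(\theta_P)^\top \, \mathbb{E}_Q T(Y).
\end{align*}
```

The last expression depends on $Q$ only through $\mathbb{E}_Q T(Y)$, so replacing $Q$ by $g(Q)$ leaves it unchanged whenever both distributions assign the same expectation to $T$.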

Another possible choice for $S$ in Theorem 2.1 which fits well with exponential families is the Hyvärinen score [9]. Let $\Omega = \mathbb{R}^d$ and let $\nabla$ and $\Delta$ denote the gradient and the Laplace operator. Define $\mathcal{D}$ as the class of densities on $\mathbb{R}^d$ which are twice differentiable, positive almost everywhere, and such that $p(y) \nabla \log q(y) \to 0$ as $\|y\| \to \infty$ for all $p, q \in \mathcal{D}$. Then the Hyvärinen score is given by

\[ \mathrm{HyvS}(P, y) := \Delta \log p(y) + \frac{1}{2} \| \nabla \log p(y) \|^2 \]

and it is a strictly proper scoring rule relative to $\mathcal{D}$ if its expectation is finite. The Hyvärinen score has the remarkable property that it is $0$-homogeneous, i.e. to compute it the predictive density $p$ needs to be specified up to the normalization constant only; see [9], [10], and [5] for details.

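To connect to Theorem 2.1 assume for simplicity that $\mathcal{F}$ is an exponential family of distributions on $\mathbb{R}^d$ where the function $c$, i.e. the factor of the density which does not depend on $\theta$, is constant and all densities satisfy the regularity conditions of the class $\mathcal{D}$. Then the Hyvärinen score on $\mathcal{F}$ is completely determined by

\[ \mathrm{HyvS}(F_\theta, y) = \sum_{i=1}^{k} \eta_i(\theta) \, \Delta T_i(y) + \frac{1}{2} \sum_{i,j=1}^{k} \eta_i(\theta) \eta_j(\theta) \, \nabla T_i(y)^\top \nabla T_j(y), \]

where the index $i$ denotes the $i$-th component of a vector in $\mathbb{R}^k$. As a consequence, we can set $h \equiv 0$ and Equation 2.1 holds if the derivatives of $T$ satisfy

\[ \mathbb{E}_Q \, \Delta T_i(Y) = \mathbb{E}_{g(Q)} \, \Delta T_i(Y), \tag{2.3} \]

\[ \mathbb{E}_Q \bigl[ \nabla T_i(Y)^\top \nabla T_j(Y) \bigr] = \mathbb{E}_{g(Q)} \bigl[ \nabla T_i(Y)^\top \nabla T_j(Y) \bigr] \tag{2.4} \]

for $i, j = 1, \ldots, k$, giving $k + k(k+1)/2$ distinct identities. Similar to Equation 2.2 these equations provide sufficient conditions for Equation 2.1 to hold, which can be used to define a suitable mapping $g$ in Theorem 2.1; see Example 3.4.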

3. Examples

Example 3.1 (Normal family).

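Let $\mathcal{F}$ consist of the multivariate normal distributions with parameter $\theta = (\mu, \Sigma)$, where $\mu$ is the mean and $\Sigma$ the covariance matrix. The exponential family representation of $\mathcal{F}$ implies $T(y) = (y, y y^\top)$. If $S$ is the logarithmic score, then a mapping $g$ can be determined via Equation 2.2. This yields

\[ \mu_P = \mathbb{E}_P Y, \qquad \Sigma_P = \mathbb{E}_P \bigl[ (Y - \mu_P)(Y - \mu_P)^\top \bigr], \]

such that $\theta_P = (\mu_P, \Sigma_P)$ has to be computed from a predictive distribution $P$. The resulting scoring rule

\[ S_g(P, y) = \frac{1}{2} \log \det \Sigma_P + \frac{1}{2} (y - \mu_P)^\top \Sigma_P^{-1} (y - \mu_P) \]

is proper by Theorem 2.1 and an affine transformation of the DS score, as discussed in Section 1.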

Example 3.2 (Laplace family).

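Let $\mathcal{F}$ be the class of centered Laplace distributions with scale parameter $\sigma > 0$. Its members have densities $f_\sigma(y) = \frac{1}{2\sigma} \exp(-|y| / \sigma)$, thus it forms an exponential family with $T(y) = |y|$. In this situation, Equation 2.2 becomes

\[ \mathbb{E}_Q |Y| = \mathbb{E}_{g(Q)} |Y| = \sigma_Q, \]

such that $\sigma_P = \mathbb{E}_P |Y|$ is computed from the predictive distribution. Theorem 2.1 implies that the scoring rule

\[ S_g(P, y) = \log \sigma_P + \frac{|y|}{\sigma_P}, \]

where $\sigma_P = \mathbb{E}_P |Y|$, is proper. A natural question is whether it is possible to transfer these arguments to the general class of Laplace distributions with parameters $(\mu, \sigma)$, i.e. to the situation of a non-constant location parameter $\mu$. In this case, Equation 2.1 is equivalent to

\[ \mathbb{E}_Q |Y - \mu_P| = \mathbb{E}_{g(Q)} |Y - \mu_P| \]

with $\theta_P = (\mu_P, \sigma_P)$. Since the random variable $Y$ and the parameter $\mu_P$ cannot be separated, it is not clear how to obtain a mapping $g$ which satisfies this identity for all $P, Q \in \mathcal{P}$ if $\mathcal{P}$ is sufficiently large. Consequently, it is not obvious whether Theorem 2.1 can be applied to the logarithmic score in concert with the general Laplace family.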
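For the centered case, propriety can be illustrated numerically. The sketch below assumes the rule takes the form $\log \sigma_P + |y| / \sigma_P$ with $\sigma_P = \mathbb{E}_P |Y|$, which differs from the logarithmic score of the Laplace density only by the additive constant $\log 2$. The true distribution is taken to be standard normal, so it lies outside the Laplace family; the grid and all names are illustrative choices.

```python
import math

def laplace_type_score(sigma_p, y):
    """log(sigma_P) + |y| / sigma_P, where the forecaster supplies sigma_P = E_P|Y|."""
    return math.log(sigma_p) + abs(y) / sigma_p

def expected_score(sigma_p, abs_moment_q):
    """Expected score under Q; Q enters only through E_Q|Y|."""
    return math.log(sigma_p) + abs_moment_q / sigma_p

# Q is standard normal (not a Laplace distribution): E_Q|Y| = sqrt(2 / pi).
abs_moment_q = math.sqrt(2.0 / math.pi)

# Candidate reports sigma_P on a grid; the truthful report is abs_moment_q itself.
grid = [0.05 * k for k in range(1, 101)]
best_on_grid = min(grid, key=lambda s: expected_score(s, abs_moment_q))
truthful = expected_score(abs_moment_q, abs_moment_q)
never_beaten = all(truthful <= expected_score(s, abs_moment_q) + 1e-12 for s in grid)
```

The grid minimizer lies next to $\mathbb{E}_Q |Y|$, in line with the claim that a forecaster minimizes the expected score by reporting a distribution whose absolute first moment matches that of $Q$.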

Example 3.3 (Poisson family).

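Let $\Omega = \mathbb{N}_0$ and $\mathcal{F}$ be the class of Poisson distributions with parameter $\lambda > 0$. The exponential family representation implies $T(y) = y$ and Equation 2.2 becomes

\[ \mathbb{E}_Q Y = \mathbb{E}_{g(Q)} Y = \lambda_Q, \]

hence $\lambda_P := \mathbb{E}_P Y$ gives a suitable mapping $g$. By Theorem 2.1 the resulting scoring rule

\[ S_g(P, y) = \lambda_P - y \log \lambda_P, \]

where $\lambda_P$ is the expectation of $P$, is proper.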
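A small numerical check of this example, under the assumption that the rule has the form $\lambda_P - y \log \lambda_P$ (the logarithmic score of the Poisson distribution without the term $\log y!$, which does not depend on the forecast). The probability mass functions below are hypothetical and deliberately not Poisson.

```python
import math

def poisson_type_score(forecast_pmf, y):
    """lambda_P - y * log(lambda_P), where lambda_P is the mean of the forecast pmf."""
    lam = sum(k * p for k, p in forecast_pmf.items())
    return lam - y * math.log(lam)

def expected_score(forecast_pmf, truth_pmf):
    """Expected score of the forecast when Y is drawn from truth_pmf."""
    return sum(q * poisson_type_score(forecast_pmf, y) for y, q in truth_pmf.items())

# Hypothetical pmfs on the non-negative integers, stored as {value: probability}.
truth = {0: 0.2, 1: 0.5, 4: 0.3}   # mean 1.7
same_mean = {1: 0.3, 2: 0.7}       # a different distribution with mean 1.7
other = {0: 0.5, 3: 0.5}           # mean 1.5

s_truth = expected_score(truth, truth)
s_same = expected_score(same_mean, truth)
s_other = expected_score(other, truth)
```

The forecast with a different mean receives a strictly larger expected score, while the forecast with the same mean ties with the truth, so the rule is proper but cannot be strictly proper on a class containing both.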

Example 3.4 (Normal family, continued).

Let $\mathcal{F}$ be as in Example 3.1. If $S$ is the Hyvärinen score, the conditions in Equation 2.3 and Equation 2.4 simplify to equations which contain the moments $\mathbb{E} Y_i$ and mixed moments $\mathbb{E} Y_i Y_j$ for $i, j = 1, \ldots, d$ only. Hence, the mapping $g$ of Example 3.1, which is given by the parameter choice $\theta_P = (\mu_P, \Sigma_P)$, satisfies these conditions. As a result we obtain a Dawid-Sebastiani type scoring rule given by

\[ S_g(P, y) = \frac{1}{2} (y - \mu_P)^\top \Sigma_P^{-2} (y - \mu_P) - \operatorname{tr}\bigl( \Sigma_P^{-1} \bigr), \]

which is proper by Theorem 2.1. It already appears in [9, Section 3.1] in the context of score matching. However, our derivation establishes propriety in wide generality, not only relative to the normal family.
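For $d = 1$ the propriety of this rule can be checked on a grid. The sketch assumes the normalization $\Delta \log p + \frac{1}{2} \|\nabla \log p\|^2$ of the Hyvärinen score, under which the rule reads $\frac{1}{2} (y - \mu_P)^2 / \sigma_P^4 - 1 / \sigma_P^2$; other normalizations rescale the values but not the minimizer. Function names and moments are illustrative.

```python
def hyvarinen_ds_score(mean, var, y):
    """0.5 * (y - mean)^2 / var^2 - 1 / var for d = 1 (assumed normalization)."""
    return 0.5 * (y - mean) ** 2 / var ** 2 - 1.0 / var

def expected_score(mean_p, var_p, mean_q, var_q):
    """Expected score of the report (mean_p, var_p) when Y has mean mean_q and variance var_q."""
    second = var_q + (mean_q - mean_p) ** 2   # E_Q (Y - mean_p)^2
    return 0.5 * second / var_p ** 2 - 1.0 / var_p

# Only the first two moments of Q matter, so Q need not be normal.
mean_q, var_q = 0.5, 1.5
truthful = expected_score(mean_q, var_q, mean_q, var_q)
grid = [(m / 4, v / 4) for m in range(-8, 9) for v in range(1, 17)]
never_beaten = all(truthful <= expected_score(m, v, mean_q, var_q) + 1e-12 for m, v in grid)
```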

Acknowledgments

The author is grateful for funding by the Klaus Tschira Foundation and for infrastructural support provided by the University of Mannheim. I thank Tilmann Gneiting for fruitful comments and discussions and two anonymous referees for helpful suggestions.

References

[1] O. Barndorff-Nielsen, Information and exponential families in statistical theory, Wiley Series in Probability and Statistics, John Wiley & Sons, Ltd., Chichester, 2014. Reprint of the 1978 original [MR0489333]. DOI 10.1002/9781118857281. MR3221776.
[2] A. P. Dawid, The geometry of proper scoring rules, Ann. Inst. Statist. Math. 59 (2007), no. 1, 77–93. DOI 10.1007/s10463-006-0099-8. MR2396033.
[3] Alexander Philip Dawid and Monica Musio, Theory and applications of proper scoring rules, Metron 72 (2014), no. 2, 169–183. DOI 10.1007/s40300-014-0039-y. MR3233147.
[4] A. Philip Dawid and Paola Sebastiani, Coherent dispersion criteria for optimal experimental design, Ann. Statist. 27 (1999), no. 1, 65–81. DOI 10.1214/aos/1018031101. MR1701101.
[5] Werner Ehm and Tilmann Gneiting, Local proper scoring rules of order two, Ann. Statist. 40 (2012), no. 1, 609–637. DOI 10.1214/12-AOS973. MR3014319.
[6] Tilmann Gneiting and Matthias Katzfuss, Probabilistic forecasting, Ann. Rev. Stat. Appl. 1 (2014), 125–151.
[7] Tilmann Gneiting and Adrian E. Raftery, Strictly proper scoring rules, prediction, and estimation, J. Amer. Statist. Assoc. 102 (2007), no. 477, 359–378. DOI 10.1198/016214506000001437. MR2345548.
[8] I. J. Good, Rational decisions, J. Roy. Statist. Soc. Ser. B 14 (1952), 107–114. MR77033.
[9] Aapo Hyvärinen, Estimation of non-normalized statistical models by score matching, J. Mach. Learn. Res. 6 (2005), 695–709. MR2249836.
[10] Matthew Parry, A. Philip Dawid, and Steffen Lauritzen, Proper local scoring rules, Ann. Statist. 40 (2012), no. 1, 561–592. DOI 10.1214/12-AOS971. MR3014317.

Article Information

MSC 2020
Primary: 62C99 (None of the above, but in this section)

Author Information
Jonas R. Brehmer
Computational Statistics Group, Heidelberg Institute for Theoretical Studies, Schloss-Wolfsbrunnenweg 35, 69118 Heidelberg, Germany
Jonas.Brehmer@h-its.org

Communicated by
Qi-Man Shao

Journal Information
Proceedings of the American Mathematical Society, Series B, Volume 8, Issue 24, pages 297–301, ISSN 2330-1511, published by the American Mathematical Society, Providence, Rhode Island.

Publication History
This article was received on , revised on , , and published on .

Copyright Information
Copyright 2021 by the author under Creative Commons Attribution-Noncommercial 3.0 License (CC BY NC 3.0)

Article References
  • DOI 10.1090/bproc/98
  • MathSciNet Review: 4323520
