Notices of the American Mathematical Society



Stochastic Separation Theorems: How Geometry May Help to Correct AI Errors

Alexander Gorban
Bogdan Grechuk
Ivan Tyukin

Communicated by Notices Associate Editor Emilie Purvine


1. Introduction

Recent years have seen explosive progress in data-driven artificial intelligence (AI) systems. Many decades of development of the mathematics underpinning statistical learning theory, coupled with advances in approximation theory, numerical analysis, technology, and computing, gave rise to a new generation of AI that is transforming our lives. These systems show great promise in cancer diagnostics MSG20; they are part of autonomous cars 22, automated face recognition and biometrics KE21, image segmentation SBKV20, and language processing and translation tools DZS22, and as such they have become our new reality. The availability of unprecedented volumes of data, together with citizens' expectations and participation, is further driving this change.

New reality, however, brings new challenges. Uncertainties and biases are inherent within any empirical data. They enter production pipelines of data-driven AI and ripple through them causing errors. AI instabilities and adversarial examples—errors due to minor changes in data or structure—have recently been found in many advanced data-driven AI models. Moreover, mounting evidence suggests that these errors are in fact expected in such systems THG20 and may not always be cured by larger volumes of data or better training algorithms BHV21 as long as the AI architecture remains fixed.

This leads to the following question: if errors are inevitable in data-driven AI then how do we deal with them once they occur?

One way to address this imminent challenge is to equip an AI with an "error filter" or "error corrector" GT18. The function of the AI corrector is to learn from errors "on-the-job," supplementing the AI's initial training. Dynamic addition of AI correctors continuously extends the AI architecture, adapts to data uncertainty GGG18, and enables AI to escape the stability barrier revealed in BHV21. When new data arrive at the AI's input, the AI error corrector decides whether they are likely to cause an error, and if so, reports them. To do this, the filter uses some set of attributes, such as, for example, internal latent representations of the input in the AI decision space. To each attribute $a_i$, the system assigns some weight $w_i$. For each new input, the system computes the numerical values of all attributes $a_1,\dots,a_n$, and compares the weighted sum $w_1a_1+\dots+w_na_n$ with some threshold $c$ to decide whether to report the input as an error.

However, how does the filter determine the weights of all attributes? To do this, the filter is provided with a training set of example inputs marked as correct and as errors. Then the system tries to find weights such that, ideally, all data in the training set are classified correctly. Moreover, the system tries to ensure that all (or a large proportion of) future "unseen" inputs would be processed correctly too. In other words, the system seeks to learn the weights from some training data, and the error filter itself is therefore an example of a machine learning (ML) system.

Geometrically, any input is described by the values of the $n$ attributes, and can therefore be represented as a point in the $n$-dimensional Euclidean space $\mathbb{R}^n$, where $n$ is the number of attributes. Then the criterion for an input being an error defines a half-space, whose boundary is the hyperplane defined by the equation $w_1a_1+\dots+w_na_n=c$. If we mark points corresponding to errors and correct AI behavior as red and blue, respectively, the machine learning task of error identification reduces to finding a hyperplane that separates the red points from the blue ones; see Figure 1.
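To make the preceding description concrete, here is a minimal sketch of how such a filter could be fitted from labeled attribute vectors. The synthetic data, the Fisher-discriminant choice of the weights, and the threshold rule are all illustrative assumptions; the text above only requires some separating hyperplane, not this particular construction.

```python
# A minimal sketch (illustrative, not the article's method) of fitting a linear
# error filter: given attribute vectors of inputs labeled "correct" (blue) and
# "error" (red), choose weights w and a threshold c so that w.a >= c flags an
# input as a suspected error.  Here w is a Fisher linear discriminant direction.
import numpy as np

rng = np.random.default_rng(0)
n = 20                                        # number of attributes (assumed)
blue = rng.normal(0.0, 1.0, size=(500, n))    # synthetic "correct" inputs
red = rng.normal(1.0, 1.0, size=(30, n))      # synthetic "error" inputs

# Fisher discriminant direction: w = (pooled covariance)^(-1) (mean_red - mean_blue).
pooled = np.cov(blue, rowvar=False) + np.cov(red, rowvar=False)
w = np.linalg.solve(pooled + 1e-6 * np.eye(n), red.mean(axis=0) - blue.mean(axis=0))
c = 0.5 * ((blue @ w).max() + (red @ w).min())   # threshold between the two classes

def flag_as_error(a):
    """Report the input with attribute vector a as a suspected AI error."""
    return float(np.dot(w, a)) >= c

print("errors flagged:        ", sum(flag_as_error(a) for a in red), "of", len(red))
print("correct inputs flagged:", sum(flag_as_error(a) for a in blue), "of", len(blue))
```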

Assume that such a hyperplane exists and we have started to use the filter with the corresponding weights $w_i$. Imagine, however, that a new input has arrived which the filter classified as correct but the user marked as an error. In other words, the filter itself made an error. Of course, we would like the system to be able to learn from such errors and improve its performance in the future. An obvious way to do this is to add the new point to the training set and recompute the weights. This constitutes "retraining the system." Geometrically, this means that a new red point appears on the "wrong" side of the hyperplane, so that we try to find a different hyperplane that separates all points correctly; see Figure 2. Obviously, it is not always possible to find such a hyperplane; see Figure 4. Moreover, even if it is possible, it may require substantial time to recompute all weights every time the filter makes an error.

Figure 1. Separation of red and blue points by a hyperplane.

Figure 2. Retraining the system by recomputing a hyperplane.

Figure 3. Separation of a new red point by a different hyperplane.

Figure 4. A red point not separable by a hyperplane.

Alternatively, one may use the following error-correction method, suggested in GMT19: separate the new red point from the existing blue points by another hyperplane, given by an equation $w'_1a_1+\dots+w'_na_n=c'$; see Figure 3. After this, classify any new input as an error if either $w_1a_1+\dots+w_na_n\ge c$ or $w'_1a_1+\dots+w'_na_n\ge c'$.
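The following sketch illustrates this cascade idea in code: the original filter is kept, and each newly reported error contributes one extra hyperplane. The particular rule used here for choosing the new hyperplane (orthogonal to the direction from the centroid of the known correct points to the new error, passing through the midpoint) is an illustrative assumption, not the rule from GMT19.

```python
# Cascade corrector sketch: keep the original filter (w, c) and, for every newly
# reported error x_new, append one extra hyperplane (w', c') separating x_new
# from the known correct points.  An input is flagged if *any* hyperplane fires.
import numpy as np

class CascadeCorrector:
    def __init__(self, w, c):
        self.hyperplanes = [(np.asarray(w, dtype=float), float(c))]  # original filter

    def add_corrector(self, x_new, correct_points):
        centroid = np.mean(correct_points, axis=0)
        w_new = np.asarray(x_new, dtype=float) - centroid    # normal towards the new error
        c_new = w_new @ (0.5 * (w_new + 2 * centroid))       # hyperplane through the midpoint of [centroid, x_new]
        self.hyperplanes.append((w_new, c_new))

    def is_error(self, x):
        return any(w @ np.asarray(x, dtype=float) >= c for (w, c) in self.hyperplanes)
```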

A careful reader may have already noticed a limitation of this approach that appears to be fundamental: why did we assume that a point can be separated from all other points by a hyperplane? Obviously, if that point belongs to the convex hull (see Footnote 1) of the other points, then a separating hyperplane does not exist and the method does not work; see Figure 4. For example, even if we have just three points $x_1,x_2,x_3$, the point $x_2$ may lie in the interior of the line interval $[x_1,x_3]$, and in this case it cannot be separated from $\{x_1,x_3\}$.

Footnote 1. Recall that the convex hull of a set $S$ is the intersection of all convex sets containing $S$.

However, intuitively, the described case is in some sense “degenerate” and should not happen too often with real data. The best way to formalise this intuition is to use the language of probability theory, and ask what is the probability that the method would work for random data. This leads to a very nice problem that lies on the borderline of probability theory and geometry.

Problem 1.

Given a set $X$ of $M$ random points in $\mathbb{R}^n$, what is the probability that each point can be separated from all the other points by a hyperplane? Equivalently, what is the probability that the points of $X$ are in convex position (in the sense that each point is a vertex of the convex hull of $X$)?

2. Sylvester’s Problem

Problem 1 has a long history and goes back at least to the question asked by Sylvester in 1864: given four random points on the plane, what is the probability that they form a convex quadrilateral?

To address this question, it is convenient to introduce the following random variables. Let $I_1$ be the random variable equal to $1$ if point $x_1$ is inside the triangle $x_2x_3x_4$ and $0$ otherwise. Let random variables $I_2$, $I_3$, and $I_4$ be defined similarly. Then the random variable
$$N = I_1+I_2+I_3+I_4$$
counts the number of points that are inside the triangle formed by the other points. Hence, $N=0$ precisely if $x_1,x_2,x_3,x_4$ form a convex quadrilateral, and this happens with the probability $p$ we are looking for. With probability $1-p$, we have $N=1$. Thus, the expected value $E[N]=0\cdot p+1\cdot(1-p)=1-p$, and $p=1-E[N]$.

This implies that to find $p$ it suffices to find $E[N]$. From the linearity of expectation, and assuming that $x_1,x_2,x_3,x_4$ are drawn independently from the same distribution,
$$E[N]=E[I_1]+E[I_2]+E[I_3]+E[I_4]=4\,E[I_4].$$
Next, $I_4$ is a random variable that takes values $0$ or $1$, and
$$E[I_4]=P(I_4=1)=q,$$
where $q$ is the probability that $x_4$ lies inside the triangle $x_1x_2x_3$.

If the points are selected independently and uniformly at random from a disk $D$, then by the law of total expectation,
$$q = E\big[P(x_4\in\triangle x_1x_2x_3 \mid x_1,x_2,x_3)\big] = \frac{E[S(\triangle x_1x_2x_3)]}{S(D)},$$
where $S$ denotes the area. Hence, the problem reduces to determining the expected area of the triangle $x_1x_2x_3$. In 1867, Woolhouse determined that
$$E[S(\triangle x_1x_2x_3)]=\frac{35}{48\pi^2}\,S(D),$$
hence
$$p = 1-4q = 1-\frac{35}{12\pi^2}\approx 0.7045.$$
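This value is easy to check numerically. The following Monte Carlo sketch (the sampling scheme and the number of trials are arbitrary choices) estimates the probability that four uniform points in a disk are in convex position:

```python
# Monte Carlo check of Sylvester's four-point problem for a disk: estimate the
# probability that four uniform points are in convex position and compare it
# with 1 - 35/(12*pi^2).
import numpy as np

rng = np.random.default_rng(1)

def uniform_in_disk(k):
    """k points uniform in the unit disk (rejection sampling from the square)."""
    pts = []
    while len(pts) < k:
        p = rng.uniform(-1, 1, size=2)
        if p @ p <= 1:
            pts.append(p)
    return np.array(pts)

def in_triangle(p, a, b, c):
    """Barycentric test: is p inside the triangle abc?"""
    lam = np.linalg.solve(np.column_stack((b - a, c - a)), p - a)
    return lam[0] >= 0 and lam[1] >= 0 and lam.sum() <= 1

trials, convex = 20_000, 0
for _ in range(trials):
    x = uniform_in_disk(4)
    # Four points are in convex position iff no point lies inside the triangle of the others.
    if not any(in_triangle(x[i], *np.delete(x, i, axis=0)) for i in range(4)):
        convex += 1

print("Monte Carlo estimate:", convex / trials)
print("1 - 35/(12*pi^2):    ", 1 - 35 / (12 * np.pi**2))
```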

Of course, random points can be selected inside regions different from a disk. Sylvester also asked the same question in the following modified form. Let $K$ be a convex body in the plane (that is, a compact convex set with non-empty interior) and choose four points from $K$ independently and uniformly at random. What is the probability $p(K)$ that these points are the vertices of a convex quadrilateral? Further, for which $K$ is this probability the smallest and the largest? The second question was solved by Blaschke, who proved in 1917 that for all convex bodies $K$,
$$p(\triangle)\le p(K)\le p(D),$$
where $\triangle$ and $D$ denote a triangle and a disk in the plane, respectively.

Sylvester's question can also be asked for $n$ points: if they are selected uniformly at random in a convex body $K$ in the plane, what is the probability $p_n(K)$ that they form a convex $n$-gon?

In 1995, Valtr solved this problem exactly for a parallelogram $P$, and proved that
$$p_n(P)=\left(\frac{1}{n!}\binom{2n-2}{n-1}\right)^{2}.$$
In 1996, Valtr also solved this problem for a triangle $\triangle$, and showed that
$$p_n(\triangle)=\frac{2^n\,(3n-3)!}{((n-1)!)^3\,(2n)!}.$$
Using Stirling's approximation for the factorial, it is straightforward to prove that both of these probabilities decay like $(c/n^2)^n$ for suitable constants $c$. Because any convex body $K$ in the plane can be sandwiched between two triangles, this implies the existence of universal constants $0<c_1<c_2$ such that
$$\left(\frac{c_1}{n^2}\right)^{n}\le p_n(K)\le\left(\frac{c_2}{n^2}\right)^{n}$$
for all $n$ and all convex bodies $K$. In fact, Bárány Bár99 proved in 1999 that
$$\lim_{n\to\infty} n^2\sqrt[n]{p_n(K)}=c(K)$$
for some constant $c(K)$ that depends on $K$; explicit values of $c(K)$ are known, for example, for the parallelogram, the triangle, and the disk. In particular, $p_n(K)$ approaches $0$ as $n\to\infty$ with super-exponential speed.

Can we have exact (non-asymptotic) formulas for $p_n(K)$ beyond these cases? In 1971, Miles derived the exact formula for $p_5(D)$. Finally, Marckert in 2017 derived exact (but somewhat complicated) formulas for $p_n(D)$ for an arbitrary $n$.
Numerical values of $p_n(K)$ for small $n$ confirm that, as expected, these probabilities decrease fast even for small values of $n$. This is bad news for our machine learning application, because it shows that new points will most likely be in the convex hull of other points. However, all these results are in the plane, which corresponds to a (toy) machine learning system with just two attributes. Any real ML system has significantly more attributes, hence we should study Problem 1 in higher-dimensional spaces. In the next section we show that the separability properties of random points in higher dimensions are dramatically different from those computed for our low-dimensional example.
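Such numerical values are easy to reproduce by simulation. The sketch below (the unit square, the sample sizes, and the range of $n$ are arbitrary choices) estimates the probability that $n$ uniform points in a square are in convex position and compares it with Valtr's formula for the parallelogram quoted above:

```python
# Estimate the probability that n uniform points in the unit square are all
# vertices of their convex hull, and compare with Valtr's formula for the
# parallelogram quoted above.
import numpy as np
from math import comb, factorial
from scipy.spatial import ConvexHull

rng = np.random.default_rng(2)

def prob_convex_position(n, trials=5000):
    hits = 0
    for _ in range(trials):
        pts = rng.random((n, 2))                  # n uniform points in the unit square
        if len(ConvexHull(pts).vertices) == n:    # all n points are hull vertices
            hits += 1
    return hits / trials

for n in range(4, 11):
    valtr = (comb(2 * n - 2, n - 1) / factorial(n)) ** 2
    print(f"n = {n}: simulated {prob_convex_position(n):.4f},  Valtr {valtr:.4f}")
```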

3. The Effect of Higher Dimension

We start our analysis of Problem 1 in higher dimensions with a simple special case. Let $B_n$ be the closed unit ball in $\mathbb{R}^n$. We first consider the case when points $x_1,\dots,x_M\in B_n$ are fixed, and a point $y$ is selected uniformly at random in $B_n$. In 1986, Elekes Ele86 proved that for any $M$ points $x_1,\dots,x_M\in B_n$, we have
$$V_n\big(\mathrm{conv}(x_1,\dots,x_M)\big)\le\frac{M}{2^n}\,V_n(B_n),$$
where $\mathrm{conv}$ is the convex hull, and $V_n$ denotes the $n$-dimensional volume. This implies that $y$ can be separated from $x_1,\dots,x_M$ by a hyperplane with probability at least $1-M/2^n$. This probability is greater than $1-\delta$ provided that $M/2^n<\delta$, or
$$M<\delta\,2^n. \qquad (2)$$

Now assume that we select the points $x_1,\dots,x_M$ independently and uniformly at random in $B_n$. Let $A_i$ be the event that the point $x_i$ is inside the convex hull of the remaining points. Then Elekes's theorem implies that the probability of $A_i$ is at most $(M-1)/2^n$, and the probability of the event $A_1\cup\dots\cup A_M$ is at most $M(M-1)/2^n$. Hence, with probability greater than $1-M(M-1)/2^n$, every point is separable by a hyperplane from the remaining points. This probability is greater than $1-\delta$ if $M(M-1)<\delta\,2^n$, which holds whenever
$$M\le\sqrt{\delta}\,2^{n/2}. \qquad (3)$$
The upper bound (3) was originally proved by Bárány and Füredi in 1988. Complementing this result, Bárány and Füredi also proved that, for every constant $c>\sqrt{2}$, the probability that $c^n$ independent uniformly distributed points in $B_n$ are all vertices of their convex hull tends to $0$ as $n\to\infty$. Hence, the bound (3) is quite tight. In particular, the result is no longer true if $2^{n/2}$ in (3) is replaced by $c^n$ for any $c>\sqrt{2}$.

The following table shows, in various dimensions $n$, the upper bounds for $M$ in (2) and (3) with a small fixed $\delta$, ensuring separability with probability at least $1-\delta$.

$n$ | Upper bound in (2) | Upper bound in (3)

We see that already in dimension around $30$, a random point is separable from millions of other points with high probability, and thousands of random points are all simultaneously separable. In dimension around $50$, millions of points all become separable. In other words, if we select a million uniformly random points in the ball $B_n$, then with high probability they are all vertices of their convex hull. This observation is in sharp contrast with our low-dimensional intuition.
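The two bounds are trivial to evaluate; the short sketch below does so for a few dimensions, using (2) and (3) as stated above with the illustrative choice $\delta=0.01$:

```python
# Evaluate the bounds (2) and (3): the largest number of points M for which
# separability is guaranteed with probability at least 1 - delta.  The value
# delta = 0.01 and the list of dimensions are illustrative choices.
from math import isqrt

delta = 0.01
print(" n        bound (2)        bound (3)")
for n in (10, 20, 30, 40, 50):
    bound2 = int(delta * 2**n)          # one random point vs. M fixed points
    bound3 = isqrt(int(delta * 2**n))   # M random points, all in convex position
    print(f"{n:3d}  {bound2:15,d}  {bound3:15,d}")
```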

This effect is not limited to the uniform distribution in the unit ball $B_n$. In fact, when we say "uniform distribution in the unit ball," we actually mean a family of distributions, one for each dimension: the uniform distribution on the interval $[-1,1]$ in $\mathbb{R}^1$, the uniform distribution in the disk in $\mathbb{R}^2$, and so on. In the theorems below, the dimension $n$ will not be fixed but will be a variable, and in this case we need to consider a family
$$\{P_n\}_{n=1}^{\infty}$$
of probability measures, where $P_n$ denotes the probability measure on $\mathbb{R}^n$.

Definition 1.

GGG18 The family of joint distributions of points $x_1,\dots,x_M$ in $\mathbb{R}^n$ has the SmAC property if there exist constants $A>0$, $B\in(0,1]$, and $C>0$, such that for every positive integer $n$, any convex set $S\subset\mathbb{R}^n$ such that
$$\frac{V_n(S)}{V_n(B_n)}\le A^n,$$
any index $i\in\{1,\dots,M\}$, and any points $y_1,\dots,y_{i-1},y_{i+1},\dots,y_M$ in $\mathbb{R}^n$, we have
$$P\big(x_i\in B_n\setminus S \mid x_j=y_j \text{ for all } j\ne i\big)\ge 1-CB^n. \qquad (4)$$
Condition (4) says that, with probability exponentially close to $1$, a random point lies inside the unit ball, but outside of any convex set of exponentially small volume. In other words, the SmAC property holds for distributions without (i) heavy tails and (ii) sharp peaks in sets with exponentially small volume. Indeed, any bounded or light-tailed distribution can, after an appropriate shift and rescaling, be located essentially inside $B_n$, while for heavy-tailed distributions there is a significant probability that $x_i\notin B_n$, hence (4) fails. The name SmAC is an abbreviation of "SMeared Absolute Continuity" and comes from an analogy with absolute continuity: absolute continuity means that sets of zero measure have zero probability, and the SmAC condition requires that convex sets with exponentially small volume should not have high probability.

The theorem below states that if a family of distributions has the SmAC property, then exponentially many points are in convex position with high probability.

Theorem 1.

GGG18 Let $X=\{x_1,\dots,x_M\}$ be a set of $M$ random points in $\mathbb{R}^n$ from a distribution satisfying the SmAC property. Let $\delta\in(0,1)$ be fixed. Then there exist constants $b>1$ and $c>0$ such that if $M\le c\,b^n$, then the points are in convex position with probability greater than $1-\delta$.

The SmAC condition is very general and holds for a large variety of distributions. As an illustration, consider the special case of i.i.d. data. If the probability measures in the family $\{P_n\}$ have support in the unit ball $B_n$ and density $\rho_n$, then the SmAC condition holds provided that
$$\rho_n(x)\le C\,\lambda^n\,r_n(x)\quad\text{for all }x\in B_n, \qquad (5)$$
where $C$ and $\lambda$ are some constants independent of the dimension, and $r_n$ is the density of the uniform distribution in $B_n$. In other words, the density is allowed to differ from the uniform density by an exponentially large factor, and the exponent $\lambda$ must be a constant independent of $n$ but can be arbitrarily large.

For example, let $Q$ be a bounded measurable set in $\mathbb{R}^n$ with positive volume. Then it is not difficult to see that (5) is true for the uniform distribution in a (possibly scaled and shifted) copy $Q'\subseteq B_n$ of $Q$ provided that
$$V_n(B_n)\le\lambda^n\,V_n(Q') \qquad (6)$$
for some constant $\lambda$. In particular, if $Q$ is the unit cube in $\mathbb{R}^n$, then the copy of $Q$ scaled by $1/\sqrt{n}$ fits inside $B_n$, its volume is $n^{-n/2}$, and (6) holds with $\lambda=\sqrt{2\pi e}$. Hence Theorem 1 implies that exponentially many points selected uniformly at random from the unit cube are in convex position with high probability.

4. Computing Separating Hyperplanes

Under the SmAC condition, exponentially many random points in $\mathbb{R}^n$ are linearly separable with high probability: for each $i$, there exists a hyperplane $H_i$ passing through $x_i$ such that all other points are on the same side of $H_i$. If the coordinates of the points $x_j$, $j=1,\dots,M$, are known, then we can explicitly find such a hyperplane by solving the quadratic program
$$\min_{w\in\mathbb{R}^n}\ \|w\|^2\quad\text{subject to}\quad\langle w,\,x_j-x_i\rangle\ge 1\ \text{ for all } j\ne i. \qquad (7)$$
If $w$ is the solution to (7), then one can take
$$H_i=\{x\in\mathbb{R}^n:\ \langle w,\,x-x_i\rangle=0\}.$$
The above program is a version of the well-known maximal-margin classifier, or support vector machine. This quadratic program has $M-1$ constraints and $n$ variables. The worst-case computational complexity of solving such a problem scales polynomially with $M$ Cha07. When $M$ is potentially exponentially large in $n$, the worst-case complexity grows exponentially with $n$.
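As an illustration of what a program of this kind looks like in practice, here is a deliberately naive sketch that feeds the constraints of the max-margin formulation written above to a generic solver. The exact formulation of (7) in the article may differ slightly, a dedicated SVM solver would normally be used, and all parameters below are arbitrary:

```python
# Naive sketch: separate one point x_i from all the others by a maximum-margin
# hyperplane through x_i, solving a quadratic program of the form (7) with a
# generic solver.  For illustration only; with M - 1 constraints this approach
# quickly becomes expensive, which is the point made in the text.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n, M = 30, 200
g = rng.normal(size=(M, n))
r = rng.random((M, 1)) ** (1.0 / n)
X = g / np.linalg.norm(g, axis=1, keepdims=True) * r     # M uniform points in the unit ball

i = 0
diffs = X[1:] - X[i]                                       # the vectors x_j - x_i, j != i
cons = {"type": "ineq", "fun": lambda w: diffs @ w - 1.0}  # <w, x_j - x_i> >= 1 for all j

res = minimize(lambda w: w @ w, x0=X[i] - X.mean(axis=0),
               jac=lambda w: 2 * w, constraints=[cons], method="SLSQP")
w = res.x
print("solver converged:", res.success)
print("x_i separated from all other points:", bool(np.all(diffs @ w > 0)))
```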

The other issue with finding separating hyperplanes by solving (7) is that this approach requires full knowledge of all the points $x_j$, $j\ne i$. Whilst such knowledge might be available in some tasks, it is hardly practical in the task of correcting AI errors. In this context, $x_i$ represents an AI "error" that has already been detected and is to be removed, and the points $x_j$, $j\ne i$, stand for "correct or expected" past and possibly future AI behavior. The fact that some or all of the $x_j$ are unknown makes solving (7) hardly possible. The question, therefore, is:

Problem 2.

How to construct a hyperplane separating $x_i$ from the remaining points without knowing their positions?

In the next sections we show that, for appropriately high dimension and under some mild assumptions, there are simple closed-form expressions defining hyperplanes that separate $x_i$ from the remaining points with probability close to $1$.

4.1. One-shot separability: Fisher separability

In order to develop intuition for Problem 2, let us return to the simplest example, when the points are selected uniformly at random from the $n$-dimensional unit ball $B_n$. Any hyperplane $H$ through a point $x_j$ divides $B_n$ into two pieces, and to maximize the chance that $H$ separates $x_j$ from all other points, we aim to select $H$ such that the volume of the piece lying on the far side from the centre is the minimal possible. The optimal choice of $H$ is the hyperplane through $x_j$ orthogonal to $x_j-c$, where $c$ is the centre of $B_n$; see Figure 5. If $E_{ij}$ is the event that the point $x_i$ belongs to this smaller piece, then a straightforward calculation shows that the probability $q=P(E_{ij})$ decreases exponentially fast with $n$. The calculation combines the law of total expectation, the observation that, conditionally on $x_j$, the smaller piece is contained in a ball of radius $\sqrt{1-\|x_j\|^2}$, and the fact that $\|x_j\|$ is a random variable with cdf $F(r)=r^n$, $0\le r\le 1$.
Figure 5. One-shot separability in a sphere.

Now, if we have $M$ i.i.d. points from the uniform distribution in $B_n$, there are $M(M-1)$ ordered pairs of points. Hence, the probability that we can find some pair $(i,j)$ such that the corresponding event $E_{ij}$ happens is at most $M(M-1)q$. This probability is less than $\delta$ provided that
$$M(M-1)\,q<\delta. \qquad (8)$$
Remarkably, this bound is even less restrictive than (3), while the conclusion is stronger: not only are this many points in convex position with probability greater than $1-\delta$, but in fact each point $x_j$ can be separated from the other ones by the specific hyperplane tangent at $x_j$ to the sphere of radius $\|x_j\|$, which is independent from the other points and can be constructed exponentially faster than solving the program (7).

It turns out that this simple idea, to choose the hyperplane through $x_j$ tangent to the sphere of radius $\|x_j\|$ centered at the origin, solves Problem 2 for surprisingly many families of distributions, and is known as Fisher separability GGG18.

Definition 2.

A point $x$ is Fisher-separable from a point $y$ with threshold $\alpha\in(0,1)$ if
$$\langle x,y\rangle\le\alpha\,\langle x,x\rangle. \qquad (9)$$
We say that $x$ is Fisher-separable from a finite set $Y$ with threshold $\alpha$ if (9) holds for all $y\in Y$.
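Fisher separability is cheap to test empirically. The sketch below (sample sizes, dimensions, and the threshold are illustrative choices) uses Definition 2 as stated above to count how many points in a uniform sample from the unit ball fail to be Fisher-separable from all the others:

```python
# Empirical check of Fisher separability (Definition 2): sample M points
# uniformly from the unit ball B_n and count the points x that violate
# <x, y> <= alpha <x, x> for some other sample point y.
import numpy as np

rng = np.random.default_rng(4)

def uniform_ball(M, n):
    g = rng.normal(size=(M, n))
    r = rng.random((M, 1)) ** (1.0 / n)
    return g / np.linalg.norm(g, axis=1, keepdims=True) * r

alpha, M = 0.5, 2000
for n in (5, 10, 20, 40):
    X = uniform_ball(M, n)
    gram = X @ X.T                                    # all inner products <x_i, x_j>
    viol = gram > alpha * np.diag(gram)[:, None]      # <x_i, x_j> > alpha <x_i, x_i>
    np.fill_diagonal(viol, False)                     # ignore the trivial pair (x, x)
    frac = viol.any(axis=1).mean()                    # x_i not separable from some x_j
    print(f"n = {n:2d}: fraction of points not Fisher-separable = {frac:.4f}")
```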

Figure 6. One-shot separability: we select a hyperplane that divides the probability measure into two maximally unequal parts.

The question is: how do we know that a given point is Fisher-separable from the points $x_j$, $j=1,\dots,M$? An answer to this question follows from the next statement.

Proposition 1 (GGG18).

Let $\alpha\in(0,1)$ and $\delta\in(0,1)$, let $y$ be drawn from a distribution supported on $B_n$ whose probability density satisfies (5) with some $C$ and $\lambda$, and let $Y$ be a finite set in $B_n$ whose cardinality is bounded by an explicit quantity that grows exponentially with $n$ (and depends on $\alpha$, $\delta$, $C$, and $\lambda$).

Then the point $y$ is Fisher-separable from the set $Y$ with probability at least $1-\delta$.

Several interesting observations stem immediately from Proposition 1. It appears that the construction of separating hyperplanes does not always require complete knowledge of the sets that are being separated. Some rough information, such as the position of the point $y$ itself, the fact that all the other points are in a unit ball centered at the origin, and that $y$ is drawn from a SmAC distribution, suffices. The resulting hyperplane separates $y$ from the other points with probability at least $1-\delta$, and with some guaranteed margin. Note, however, that this margin is not necessarily maximal, as requested by the program (7).

It turns out that Fisher separability for exponentially many points holds for many important families of distributions, including rotation invariant log-concave distributions and product distributions whose components have bounded support or very fast-decaying tails GGT21. At the same time, there are examples of product distributions with identical log-concave components for which this is no longer true GGT21. It is hence natural to ask if and how similar simple solutions could be derived for such distributions with “heavier” tails.

4.2. One-shot separability: general case

Now we formulate the same idea in general. Let $P$ be an arbitrary probability measure in $\mathbb{R}^n$, and let $x$ be an arbitrary point. Problem 2 asks us to construct a hyperplane separating $x$ from other points selected at random from $P$ without knowing their positions. Every hyperplane $H$ divides $\mathbb{R}^n$ into two halfspaces, say $H^+$ and $H^-$, whose probability measures are $P(H^+)$ and $P(H^-)$, respectively. We would like $x$ and the remaining points to belong to different halfspaces, say $x\in H^+$, and the other $M-1$ points to belong to $H^-$. The probability of the latter event is $(P(H^-))^{M-1}=(1-P(H^+))^{M-1}$. This probability is maximized if $P(H^+)$ is minimized. Hence, the idea is to select the halfspace containing $x$ whose probability measure is minimal; see Figure 6. Formally, let $\mathcal{H}(x)$ be the set of halfspaces of $\mathbb{R}^n$ containing $x$, let
$$d(x)=\min_{H^+\in\mathcal{H}(x)}P(H^+) \qquad (10)$$
be the minimal measure of a halfspace containing $x$, and let $H^+(x)$ be the minimizer (see Footnote 2) in (10). The function $d(x)$ is known as Tukey's halfspace depth.

2

Each halfspace can be identified with its unit normal vector; the set of all such vectors is a compact set, hence there must be a halfspace that achieves the minimum.

The probability that $H^+(x)$ separates $x$ from the remaining $M-1$ points is $(1-d(x))^{M-1}$. We would like this probability to be greater than a given constant even if $M$ grows exponentially fast with $n$. To ensure this, $d(x)$ should decrease exponentially fast with $n$. This may not be the case for all $x$: for example, if $P$ is the uniform distribution in the ball, and $x$ is the center of the ball, then $d(x)=1/2$. However, there is hope that $d(x)$ decreases fast on average, for a random point $x$. In other words, we need the expected value
$$E_{x\sim P_n}\,[d(x)]$$
to decrease exponentially fast with $n$.
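For the uniform distribution in $B_n$ this expected value can be estimated by simulation. In the sketch below (sample sizes and dimensions are arbitrary choices), the depth of each sample point $x$ is approximated by the empirical mass of the cap cut off by the hyperplane through $x$ orthogonal to $x$, which is the minimizing halfspace for this rotation-invariant distribution; for larger $n$ the depth quickly falls below Monte Carlo resolution:

```python
# Monte Carlo estimate of the expected halfspace depth E[d(x)] for the uniform
# distribution in the unit ball B_n.  For this distribution the minimizing
# halfspace at x is bounded by the hyperplane through x orthogonal to x, so d(x)
# is approximated by the empirical probability of the cap {y : <y, x> >= <x, x>}.
import numpy as np

rng = np.random.default_rng(5)

def uniform_ball(M, n):
    g = rng.normal(size=(M, n))
    r = rng.random((M, 1)) ** (1.0 / n)
    return g / np.linalg.norm(g, axis=1, keepdims=True) * r

for n in (2, 5, 10, 15):
    X = uniform_ball(2000, n)        # points at which the depth is evaluated
    Y = uniform_ball(20000, n)       # sample used to estimate cap probabilities
    depths = [(Y @ x >= x @ x).mean() for x in X]
    print(f"n = {n:2d}: estimated E[d(x)] = {np.mean(depths):.5f}")
```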

Definition 3.

Let $\{P_n\}_{n=1}^{\infty}$ be a family of probability measures, where $P_n$ is the probability measure on $\mathbb{R}^n$. We say that $\{P_n\}$ has exponential one-shot separability if
$$E_{x\sim P_n}\,[d(x)]\le c\,\theta^n$$
for some constants $c>0$ and $\theta\in(0,1)$.

In this section, we overview our recent results that establish exponential one-shot separability for a large class of product distributions, and discuss a conjecture that this property holds for all log-concave distributions.

Let us now be a bit more formal. We say that the density $\rho$ of a random vector $X$ (and the corresponding probability measure $P$) is log-concave if the set
$$K=\{x\in\mathbb{R}^n:\ \rho(x)>0\}$$
is convex and $-\log\rho$ is a convex function on $K$. For example, the uniform distribution in an arbitrary convex body is log-concave. Let $\Sigma$ be the variance-covariance matrix of $X$, that is, the matrix with components $\Sigma_{kl}=\mathrm{Cov}(X_k,X_l)$. Because the log-concavity of $\rho$ and the halfspace depth $d(x)$ are invariant under invertible linear transformations, we may assume that $E[X]=0$ and $\Sigma$ is the identity matrix. Such distributions are called isotropic. The quantity
$$L_P=\Big(\sup_{x\in\mathbb{R}^n}\rho(x)\Big)^{1/n}$$
is called the isotropic constant of $P$. Very recently, Brazitikos, Giannopoulos, and Pafis BGP22 proved an upper bound on $E[d(x)]$ for log-concave probability measures that decreases exponentially fast with $n$ whenever the isotropic constant $L_P$ is bounded by an absolute constant. A famous conjecture in convex geometry predicts that
$$L_P\le C$$
for some constant $C$ independent from the dimension. This conjecture was made in 1986 by Jean Bourgain Bou86 in the form that "There exists a universal constant $c>0$ (independent from $n$) such that for any convex set $K$ of unit volume in $\mathbb{R}^n$, there exists a hyperplane $H$ such that the $(n-1)$-dimensional volume of the section $K\cap H$ is bounded below by $c$," and it has since been known as the hyperplane conjecture. It turns out that this conjecture is equivalent to the uniform boundedness of the isotropic constant, and in fact has many other equivalent formulations. Recently, Chen Che21 made a breakthrough and proved that
$$L_P\le n^{f(n)}$$
for some function $f(n)$ tending to $0$ as $n\to\infty$. Even more recently, Klartag and Lehec KL22 improved this to $L_P\le C(\log n)^4$ for some absolute constant $C$. In combination with the Brazitikos–Giannopoulos–Pafis bound, a full proof of the hyperplane conjecture would imply that any family of log-concave probability measures has exponential one-shot separability.

Our next example is a family of product distributions. Specifically, for each $n$, let $P_n$ be the product measure of one-dimensional probability measures $P_{n,1},\dots,P_{n,n}$. For any distribution $P$ on $\mathbb{R}$, one can define a quantity $s(P)$ in terms of two independent random variables $X$ and $Y$ with distribution $P$. Then we have proved GGT that $\{P_n\}$ has exponential one-shot separability provided that a suitable uniform bound on $s(P_{n,k})$ holds for each component distribution $P_{n,k}$. This property holds for a large variety of distributions. For example, we have the following sufficient condition GGT.

Proposition 2.

Let $X$ be a random variable with distribution $P$. Assume that $X$ is non-constant and has a suitable finite moment. Then $s(P)$ satisfies the bound required for exponential one-shot separability.

When our data are non-negative, Proposition 2 implies the following corollary.

Corollary 1.

Let $X$ be a non-constant non-negative random variable with distribution $P$. Then $s(P)$ satisfies the bound required for exponential one-shot separability.

For log-concave distributions, we have the following explicit and uniform upper bound GGT.

Proposition 3.

For any log-concave probability distribution $P$ on $\mathbb{R}$, the quantity $s(P)$ is bounded above by an explicit universal constant.

Proposition 3 implies the following result.

Theorem 2.

Let $\{P_n\}$ be a family of product distributions such that all component distributions are log-concave. Then $\{P_n\}$ has exponential one-shot separability (see Definition 3) with explicit parameters.

We did not try to optimize the upper bound for $s(P)$ in Proposition 3. Instead, we pose the problem of finding the optimal upper bound as an open question. Specifically, if $\mathcal{L}$ is the class of all log-concave distributions on $\mathbb{R}$, then what is the value of
$$\sup_{P\in\mathcal{L}}s(P)\,?$$
Proposition 3 provides an explicit upper bound for this supremum. On the other hand, the example of the Laplace distribution provides a lower bound. While the upper bound is clearly non-optimal, it may be that the supremum is equal to the lower bound given by the Laplace distribution.

5. Conclusions

A phenomenon known as the curse of dimensionality states that many methods and techniques that are efficient in low dimension become infeasible in high dimension. Stochastic separation theorems are examples of the opposite phenomenon, the blessing of dimensionality, which states that some aspects become easier in higher dimensions. The theorems state that if we have random points in $\mathbb{R}^n$, then, with high probability, every point can be separated from all the others by a hyperplane. This is true even if the number of points grows exponentially fast with the dimension.

While being interesting from a purely mathematical perspective, stochastic separation theorems could be a stepping stone for the development of much-needed error-correcting mechanisms GGG18, for algorithms capable of learning from just a few examples GGM21, for approaching the challenge of continuous learning without catastrophic forgetting in machine learning and AI, and for producing new notions of data dimension GMT19. The theorems imply that if the number of attributes is moderately high, AI errors may be corrected by adding simple linear correctors that are fast, easy to compute and implement, and do not destroy the existing functionality of the system. The simplest corrector is based on the Fisher separability discussed in Section 4.1. The deeper one-shot separation theorems discussed in Section 4.2 make the method applicable even for distributions for which Fisher separability fails.

References

[22] Safe driving cars, Nat. Mach. Intell. 4 (2022), 95–96.
[Bár99] Imre Bárány, Sylvester's question: the probability that n points are in convex position, Ann. Probab. 27 (1999), no. 4, 2020–2034. MR1742899
[BHV21] A. Bastounis, A. C. Hansen, and V. Vlačić, The extended Smale's 9th problem–on computational barriers and paradoxes in estimation, regularisation, computer-assisted proofs and learning, Preprint, arXiv:2110.15734, 2021.
[Bou86] J. Bourgain, On high-dimensional maximal functions associated to convex bodies, Amer. J. Math. 108 (1986), no. 6, 1467–1476. MR868898
[BGP22] S. Brazitikos, A. Giannopoulos, and M. Pafis, Half-space depth of log-concave probability measures, Preprint, arXiv:2201.11992, 2022.
[Cha07] Olivier Chapelle, Training a support vector machine in the primal, Neural Comput. 19 (2007), no. 5, 1155–1178. MR2309267
[Che21] Yuansi Chen, An almost constant lower bound of the isoperimetric coefficient in the KLS conjecture, Geom. Funct. Anal. 31 (2021), no. 1, 34–61. MR4244847
[DZS22] I. Drori, S. Zhang, R. Shuttleworth, L. Tang, A. Lu, et al., A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level, Proc. Natl. Acad. Sci. 119 (2022), no. 32, e2123433119.
[Ele86] G. Elekes, A geometric inequality and the complexity of computing volume, Discrete Comput. Geom. 1 (1986), no. 4, 289–292. MR866364
[GGG18] A. N. Gorban, A. Golubkov, B. Grechuk, E. M. Mirkes, and I. Y. Tyukin, Correction of AI systems by linear discriminants: probabilistic foundations, Inform. Sci. 466 (2018), 303–322. MR3847955
[GGM21] A. N. Gorban, B. Grechuk, E. M. Mirkes, S. V. Stasenko, and I. Y. Tyukin, High-dimensional separability for one- and few-shot learning, Entropy 23 (2021), no. 8, 1090.
[GGT] A. N. Gorban, B. Grechuk, and I. Y. Tyukin, One-shot separation theorems, in preparation.
[GMT19] A. N. Gorban, V. A. Makarov, and I. Y. Tyukin, The unreasonable effectiveness of small neural ensembles in high-dimensional brain, Physics of Life Reviews 29 (2019), 55–88.
[GT18] A. N. Gorban and I. Y. Tyukin, Blessing of dimensionality: mathematical foundations of the statistical physics of data, Philos. Trans. R. Soc. A 376 (2018), no. 2118, 20170237.
[GGT21] B. Grechuk, A. N. Gorban, and I. Y. Tyukin, General stochastic separation theorems with optimal bounds, Neural Networks 138 (2021), 33–56.
[KE21] N. Khan and M. Efthymiou, The use of biometric technology at airports: The case of customs and border protection (CBP), International Journal of Information Management Data Insights 1 (2021), no. 2, 100049.
[KL22] Bo'az Klartag and Joseph Lehec, Bourgain's slicing problem and KLS isoperimetry up to polylog, Preprint, arXiv:2203.15551, 2022.
[MSG20] S. M. McKinney, M. Sieniek, V. Godbole, et al., International evaluation of an AI system for breast cancer screening, Nature 577 (2020), no. 7788, 89–94.
[SBKV20] H. Seo, M. Badiei Khuzani, V. Vasudevan, C. Huang, H. Ren, et al., Machine learning techniques for biomedical image segmentation: An overview of technical aspects and introduction to state-of-art applications, Medical Physics 47 (2020), no. 5, e148–e167.
[THG20] I. Y. Tyukin, D. J. Higham, and A. N. Gorban, On adversarial examples and stealth attacks in artificial intelligence systems, 2020 International Joint Conference on Neural Networks (IJCNN), 2020, pp. 1–6.

Credits

Opening image is courtesy of metamorworks via Getty.

Figures 1–6 are courtesy of the authors.

Photo of Alexander Gorban is courtesy of Alexander Gorban.

Photo of Bogdan Grechuk is courtesy of Bogdan Grechuk.

Photo of Ivan Tyukin is courtesy of Ivan Tyukin.