Order-distance and other metric-like functions on jointly distributed random variables

We construct a class of real-valued nonnegative binary functions on a set of jointly distributed random variables, which satisfy the triangle inequality and vanish at identical arguments (pseudo-quasi-metrics). These functions are useful in dealing with the problem of selective probabilistic causality encountered in behavioral sciences and in quantum physics. The problem reduces to that of ascertaining the existence of a joint distribution for a set of variables with known distributions of certain subsets of this set. Any violation of the triangle inequality or its consequences by one of our functions when applied to such a set rules out the existence of this joint distribution. We focus on an especially versatile and widely applicable pseudo-quasi-metric called an order-distance and its special case called a classification distance.


Order p.q.-metrics
Random variables in this paper are understood in the broadest sense, as measurable functions X : V_s → V, no restrictions being imposed on the sample spaces (V_s, Σ_s, µ_s) and the induced probability spaces (V, Σ, µ), with the usual meaning of the terms (sets of values V_s, V, sigma-algebras Σ_s, Σ, and probability measures µ_s, µ). In particular, any set X of jointly distributed random variables (functions on the same sample space) is a random variable, and its induced probability space (or, simply, distribution) X̄ = (V, Σ, µ) is referred to as the joint distribution of its elements.
Given a class 𝒳 of random variables, not necessarily jointly distributed, let 𝒳* be the class of distributions X̄ for all X ∈ 𝒳. For any class function f* : 𝒳* → R (reals), the function f : 𝒳 → R defined by f(X) = f*(X̄) is called observable (as it does not depend on sample spaces, typically unobservable). We will conveniently confuse f and f* for observable functions, so that if f is defined on 𝒳, then f(Y), identified with f*(Ȳ), is also defined for any Y ∉ 𝒳 with Ȳ ∈ 𝒳*. (This convention is used in Section 2, when we apply a function defined on a set of random variables H to different but identically distributed sets of A-variables.)
For an arbitrary nonempty set Ω, let H = {H_ω : ω ∈ Ω} be an indexed set of jointly distributed random variables H_ω with distributions H̄_ω = (V_ω, Σ_ω, µ_ω). For any α, β ∈ Ω, the ordered pair (H_α, H_β) is a random variable with distribution (V_α × V_β, Σ_α × Σ_β, µ_α,β), and H × H is a set of jointly distributed random variables (hence also a random variable).
We call an observable function d : H × H → R a pseudo-quasi-metric (p.q.-metric) on H if, for all α, β, γ ∈ Ω, (i) d(H_α, H_β) ≥ 0; (ii) d(H_α, H_α) = 0; (iii) d(H_α, H_γ) ≤ d(H_α, H_β) + d(H_β, H_γ).
For terminological clarity, the conventional pseudometrics (also called semimetrics) are obtained by adding the property d(H_α, H_β) = d(H_β, H_α); the conventional quasimetrics are obtained by adding the property α ≠ β ⇒ d(H_α, H_β) > 0. A conventional metric is both a pseudometric and a quasimetric. (See, e.g., Zolotarev, 1976, for discussion of a variety of metrics and pseudometrics on random variables.) By an obvious argument we can generalize the triangle inequality, (iii): for any H_α1, ..., H_αl ∈ H (l ≥ 3), $d(H_{\alpha_1}, H_{\alpha_l}) \le \sum_{i=2}^{l} d(H_{\alpha_{i-1}}, H_{\alpha_i})$. We refer to this inequality (which plays a central role in this paper) as the chain inequality.
Let R ⊂ ∪_{(α,β)∈Ω×Ω} V_α × V_β, and we write a ⪯ b to designate (a, b) ∈ R. Let R be a total order, that is, transitive, reflexive, and connected in the sense that for any (a, b) ∈ ∪_{(α,β)∈Ω×Ω} V_α × V_β, at least one of the relations a ⪯ b and b ⪯ a holds. We define the equivalence a ∼ b and strict order a ≺ b induced by ⪯ in the usual way. Finally, we assume that for any (α, β) ∈ Ω × Ω, the sets {(a, b) : a ∈ V_α, b ∈ V_β, a ⪯ b} are µ_α,β-measurable. This implies the µ_α,β-measurability of the sets {(a, b) : a ∈ V_α, b ∈ V_β, a ≺ b} and {(a, b) : a ∈ V_α, b ∈ V_β, a ∼ b}.
Thus, if all V_ω are intervals of reals, ⪯ can be chosen to coincide with ≤, and (assuming the usual Borel sigma-algebra) all the properties above are satisfied. Another example: for arbitrary V_ω, provided each Σ_ω contains at least n > 1 disjoint nonempty sets, one can partition V_ω as ∪_{k=1}^{n} V_ω^(k) and put a ⪯ b if and only if a ∈ V_α^(k), b ∈ V_β^(l), and k ≤ l. Again, all properties above are clearly satisfied.
With the notation and assumptions above, the function $D(H_\alpha, H_\beta) = \Pr[H_\alpha \prec H_\beta]$ is called an order p.q.-metric, or order-distance, on H.
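The construction above can be tried on a small finite example. The following is a minimal sketch, assuming that the order-distance is the probability of the strict-order event, D(A, B) = Pr[A ≺ B] (which is how the bivariate normal example of Section 2 uses it). The joint distribution `joint`, its weights, and the helper names are hypothetical illustration choices, not part of the paper.

```python
from itertools import product

# Hypothetical joint distribution of three jointly distributed variables
# (A, B, X), each with values in {0, 1, 2}; the weights are arbitrary.
values = (0, 1, 2)
weights = {abx: 1 + (3 * abx[0] + 5 * abx[1] + 7 * abx[2]) % 4
           for abx in product(values, repeat=3)}
total = sum(weights.values())
joint = {abx: w / total for abx, w in weights.items()}


def less_than(a, b):
    """Strict order induced by the usual order on numbers."""
    return a < b


def two_class_precedes(a, b):
    """Strict order induced by the partition {0} | {1, 2} (class index 0 < 1)."""
    return (a > 0) < (b > 0)


def order_distance(joint, i, j, strictly_precedes):
    """Pr[H_i precedes H_j]: mass of outcomes whose i-th value strictly precedes the j-th."""
    return sum(p for outcome, p in joint.items()
               if strictly_precedes(outcome[i], outcome[j]))


def is_pq_metric(joint, n, strictly_precedes, tol=1e-12):
    """Check nonnegativity, D(A, A) = 0, and the triangle inequality on all triples."""
    d = [[order_distance(joint, i, j, strictly_precedes) for j in range(n)]
         for i in range(n)]
    nonneg = all(d[i][j] >= 0 for i in range(n) for j in range(n))
    zero_diag = all(d[i][i] == 0 for i in range(n))
    triangle = all(d[i][k] <= d[i][j] + d[j][k] + tol
                   for i in range(n) for j in range(n) for k in range(n))
    return nonneg and zero_diag and triangle


holds_for_less_than = is_pq_metric(joint, 3, less_than)
holds_for_two_classes = is_pq_metric(joint, 3, two_class_precedes)
```

Both checks should succeed for any joint distribution, since a violation would contradict the p.q.-metric property; the function is in general not symmetric, D(A, B) ≠ D(B, A).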
That the definition is well-constructed follows from Since in the last expression all events are pairwise exclusive, we have This may seem an attractive addition to the triangle inequality. The inequality is redundant, however, as it is subsumed by the triangle inequalities holding on {A, B, X}. Rewriting the expression above as

Selective probabilistic causality
Consider an indexed set W = {W_λ : λ ∈ Λ}, with each W_λ being a set referred to as a (deterministic) input, with the elements of {λ} × W_λ called input points. Input points therefore are pairs of the form x = (λ, w) and should not be confused with input values w. A nonempty set Φ ⊂ ∏_{λ∈Λ} W_λ is called a set of (allowable) treatments; a treatment therefore is also a set of pairs of the form (λ, w). Let there be a collection of sets of random variables, referred to as (random) outputs, A_φ = {A^λ_φ : λ ∈ Λ}, φ ∈ Φ, such that the distribution of A_φ (i.e., the joint distribution of all A^λ_φ in A_φ) is known for every treatment φ. We define A^λ = {A^λ_φ : φ ∈ Φ}, λ ∈ Λ, with the understanding that A^λ is not a random variable (i.e., A^λ_φ for different φ are not jointly distributed).
The following problem is encountered in a wide variety of contexts (see Dzhafarov, 2003; Dzhafarov & Gluhovsky, 2006; Kujala & Dzhafarov, 2008). We say that the dependence of random outputs A^λ_φ on the deterministic inputs W_λ is (canonically) selective if, for every λ ∈ Λ and every φ ∈ Φ, the output A^λ_φ is "influenced" by none of the input points in φ except, possibly, for the one belonging to {λ} × W_λ. The question is how one should define this selectivity of "influences" rigorously, and how one can determine whether this selectivity holds. This problem was introduced to behavioral sciences in Sternberg (1969) and Townsend (1984). In quantum physics, using different terminology, it was introduced in Bell (1964) and elaborated in Fine (1982a-b). The definition can be given in several equivalent forms, of which we present the one focal for the present context.
Definition 2.1. The dependence of {A^λ : λ ∈ Λ} on {W_λ : λ ∈ Λ} (or the "influence" of the latter on the former) is (canonically) selective if there is a set of jointly distributed random variables H = {H^λ_w : w ∈ W_λ, λ ∈ Λ} (one random variable for every value of every input), such that, for every φ ∈ Φ, H_φ and A_φ are identically distributed, where H_φ = {H^λ_w : (λ, w) ∈ φ} and A_φ = {A^λ_φ : λ ∈ Λ} (the corresponding elements of H_φ and A_φ being those sharing the same λ).
This definition is known as the Joint Distribution Criterion (JDC) for selectivity of influences, and the set H satisfying this definition is referred to as a (hypothetical) JDC-set. Specialized forms of this criterion in quantum physics can be found in Suppes & Zanotti (1981) and Fine (1982a-b); in the behavioral context and in complete generality this criterion is given (derived from an equivalent definition) in .
Remark 2.2. The adjective "canonical" in the definition refers to the one-to-one correspondence between W λ and A λ sharing the same λ. A seemingly more general scheme, in which different A λ are selectively influenced by different (possibly overlapping) subsets of W λ : λ ∈ Λ is always reducible to the canonical form by considering, for every A λ , the Cartesian product of the inputs influencing it a single input, and redefining correspondingly the sets of input points and the set of allowable treatments.
The simplest consequence of JDC is that the selectivity of influences implies marginal selectivity (Dzhafarov, 2003;Townsend & Schweickert, 1989), defined as follows. For any Λ ′ ⊂ Λ we can uniquely present any 3. In the following we always assume that marginal selectivity is satisfied.
The relevance of the order-distance and other p.q.-metrics on the sets of jointly distributed random variables to the problem of selectivity lies in the general test (necessary condition) for selectivity of influences, formulated after the following definition.
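Definition 2.4. We call a sequence of input points x_1 = (α_1, w_1), ..., x_l = (α_l, w_l) treatment-realizable if there are treatments φ^(1), ..., φ^(l) ∈ Φ (not necessarily pairwise distinct) such that {x_1, x_l} ⊂ φ^(1) and {x_{i−1}, x_i} ⊂ φ^(i) for i = 2, ..., l.
If a JDC-set H exists, then for any p.q.-metric d on H we should have $d(H_{x_1}, H_{x_l}) \le \sum_{i=2}^{l} d(H_{x_{i-1}}, H_{x_i})$, that is, $$d\left(A^{\alpha_1}_{\varphi^{(1)}}, A^{\alpha_l}_{\varphi^{(1)}}\right) \le \sum_{i=2}^{l} d\left(A^{\alpha_{i-1}}_{\varphi^{(i)}}, A^{\alpha_i}_{\varphi^{(i)}}\right). \qquad (2)$$ This chain inequality, written entirely in terms of observable probabilities, is referred to as a p.q.-metric test for selectivity of influences. If this inequality is violated for at least one treatment-realizable sequence of input points, no JDC-set H exists, and the selectivity is ruled out. Note: if the sequence φ^(1), ..., φ^(l) ∈ Φ for a given x_1, ..., x_l can be chosen in more than one way, the observable quantities on the two sides of (2) remain invariant due to the (tacitly assumed) marginal selectivity.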
As an example, let Λ = {1, 2}, and let (A^1_φ, A^2_φ) for any treatment φ = (v, w) have a bivariate normal distribution with zero means, unit variances, and correlation ρ = min(1, v + w). Marginal selectivity is trivially satisfied. Do W_1, W_2 influence A^1, A^2 selectively? For any bivariate normally distributed (A, B), let us define A ≺ B iff A < 0, B ≥ 0. Then the corresponding order-distance on the hypothetical JDC-set H is $D(H^1_v, H^2_w) = D(H^2_w, H^1_v) = \frac{\arccos \min(1, v + w)}{2\pi}$. The sequence of input points (1, 0), (2, 1), (1, 1), (2, 0) is treatment-realizable, so if H exists, we should have $D(H^1_0, H^2_0) \le D(H^1_0, H^2_1) + D(H^2_1, H^1_1) + D(H^1_1, H^2_0)$. The numerical substitutions yield, however, 1/4 ≤ 0 + 0 + 0, and as this is false, the hypothesis that W_1, W_2 influence A^1, A^2 selectively is rejected.
The theorem below and its corollary show that one only needs to check the chain inequality for a special subset of all possible treatment-realizable sequences x_1, ..., x_l. [...]
Proof. We prove this theorem by showing that if (2) is violated for some reducible sequence x_1, ..., x_l, then it is violated for some proper subsequence thereof. Clearly, x_1 ≠ x_l because otherwise (2) is not violated. For l = 3, x_1, x_2, x_3 is reducible only if it is contained in a treatment: but then (2) would be satisfied. So l > 3, and the reducibility of x_1, ..., x_l means that there is a pair {x_p, x_q} belonging to a treatment, with (p, q) ≠ (1, l) and q > p + 1. But then (2) must be violated for either x_p, ..., x_q or x_1, ..., x_p, x_q, ..., x_l (allowing for p = 1 or q = l but not both).
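The bivariate normal example of this section can be checked numerically. The sketch below assumes the order-distance D = Pr[A < 0, B ≥ 0] and evaluates it through the standard orthant probability of a standard bivariate normal pair, Pr[A < 0, B < 0] = 1/4 + arcsin(ρ)/(2π). The function names are illustrative assumptions.

```python
import math


def rho(v, w):
    """Correlation of the two outputs under treatment (v, w), as in the example."""
    return min(1.0, v + w)


def order_distance(r):
    """Pr[A < 0, B >= 0] for a standard bivariate normal pair with correlation r.

    Pr[A < 0] = 1/2 and Pr[A < 0, B < 0] = 1/4 + asin(r) / (2 * pi),
    so the difference is 1/4 - asin(r) / (2 * pi) = acos(r) / (2 * pi).
    """
    return 0.25 - math.asin(r) / (2 * math.pi)


# Treatment-realizable sequence of input points (input index, input value).
sequence = [(1, 0), (2, 1), (1, 1), (2, 0)]


def pair_distance(x, y):
    """Distance between points of different inputs; by symmetry of the
    distribution the value does not depend on which point comes first."""
    (ix, vx), (iy, vy) = x, y
    v, w = (vx, vy) if ix == 1 else (vy, vx)
    return order_distance(rho(v, w))


# Chain inequality: distance between the end points vs. sum along the chain.
lhs = pair_distance(sequence[0], sequence[-1])
rhs = sum(pair_distance(a, b) for a, b in zip(sequence, sequence[1:]))
chain_holds = lhs <= rhs + 1e-12
print(lhs, rhs, chain_holds)
```

The left-hand side is 1/4 (correlation 0) and every term on the right is 0 up to rounding (correlation 1), so the chain inequality fails, in agreement with the rejection of selectivity in the text.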
If Φ = λ∈Λ W λ (all logically possible treatments are allowable), then any subsequence x i1 , . . . , x i k of input points with pairwise distinct α i1 , . . . , α i k belongs to some treatment. Therefore an irreducible sequence cannot contain points of more than two inputs, and it is easy to see that then it must be a sequence of pairwise distinct . It is also easy to see that if m > 2, each of the subsets {x 1 , x 4 } and {x 2 , x 5 } will belong to a treatment. Hence m = 2 is the only possibility for an irreducible sequence.
Remark 2.8. This formulation is given in , although there it is unnecessarily confined to metrics of a special kind.

An application
The four tables below represent results of an experiment with a 2 × 2 factorial design, {x, x ′ } × {y, y ′ }, and two binary responses, A and B. In relation to our general notation, we have here Λ = {1, 2}, W 1 = {x, x ′ }, W 2 = {y, y ′ }, and four treatments (x, y) , . . . , (x ′ , y ′ ); for every treatment φ, the random outputs A 1 φ and A 2 φ are represented by, respectively, A φ and B φ , each having two possible values, arbitrarily labeled. This design is arguably the simplest possible, and it is ubiquitous in science. In a psychological double-detection experiment (see, e.g., Townsend & Nozawa, 1995), the input values may represent presence (x and y) or absence (x ′ and y ′ ) of a designated signal in two stimuli labeled 1 and 2, presented side-by-side. The participant in such an experiment is asked to indicate whether the signal was present or absent in stimulus 1 and in stimulus 2. The output values A = • and B = ⊓ may indicate either that the response was "signal present" or that the response was correct; and analogously for A = • and B = ⊔ (either "signal absent" or an incorrect response). The entries p ij , q ij , etc. represent joint probabilities of the corresponding outcomes, a i· , a ′ i· , etc. represent marginal probabilities. The question to be answered is: does the response to a given stimulus (A to 1 and B to 2) selectively depend on that stimulus alone (despite A and B being stochastically dependent for every treatment), or is A or B influenced by both 1 and 2?
Another important situation in which we encounter formally the same problem is the Einstein-Podolsky-Rosen (EPR) paradigm. Two particles are emitted from a common source in such a way that they remain entangled (have highly correlated properties, such as momenta or spins) as they run away from each other (Aspect, 1999;Mermin, 1985). An experiment may consist, e.g., in measuring the spin of electron 1 along one of two axes, x or x ′ , and (in another location but simultaneously in some inertial frame of reference) measuring the spin of electron 2 along one of two axes, y or y ′ . The outcome A of a measurement on electron 1 is a random variable with two possible values, "up" or "down," and the same holds for B, the outcome of a measurement on electron 2. The question here is: do the measurements on electrons 1 and 2 selectively affect, respectively, A and B (even though generally A and B are not independent at any of the four combinations of spin axes)? If the answer is negative, then the measurement of one electron affects the outcome of the measurement of another electron even though no signal can be exchanged between two distant events that are simultaneous in some frame of reference. What makes this situation formally identical to the double-detection example described above is that the measurements performed along different axes on the same particle, x and x ′ or y and y ′ , are non-commuting, i.e., they cannot be performed simultaneously. This makes it possible to consider such measurements as mutually exclusive values of an input.
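Under marginal selectivity, a JDC-set for this design exists if and only if the following eight inequalities are satisfied: [...] We refer to (3) as Bell-CHSH-Fine inequalities, where CHSH abbreviates Clauser, Horne, Shimony, & Holt (1969): in this work Bell's (1964) approach was developed into a special version of (3).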
Remark 3.2. The proof given in Fine (1982a-b) that (3) is both necessary and sufficient (under marginal selectivity) for the existence of a JDC-set can be conceptually simplified: the Bell-CHSH-Fine inequalities can be algebraically shown to be the criterion for the existence of a vector Q with 16 probabilities that sum to one and whose appropriately chosen partial sums yield the 8 observable probabilities (other probabilities being determined due to marginal selectivity). This is a simple linear programming task, and the Bell-CHSH-Fine inequalities can be derived "mechanically" by a facet enumeration algorithm (see Wolf, 2001a-b, and Basoalto & Percival, 2003).
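To make the "partial sums" in Remark 3.2 concrete, here is a minimal sketch of the forward direction only: given a hypothetical vector Q over the 16 joint values of (H_x, H_x', H_y, H_y'), it computes the four treatment tables as partial sums. The weights, the value labels 1 and 2, and all names are assumptions for illustration; deciding whether such a Q exists for given tables is the linear programming task mentioned in the remark and is not attempted here.

```python
from itertools import product

# Hypothetical Q: one probability for each joint value of (H_x, H_x', H_y, H_y'),
# every variable taking the values 1 or 2. The weights are arbitrary.
outcomes = list(product((1, 2), repeat=4))
raw = [(3 * i) % 7 + 1 for i in range(16)]
Q = {o: r / sum(raw) for o, r in zip(outcomes, raw)}

X, X_PRIME, Y, Y_PRIME = 0, 1, 2, 3  # positions of the variables in an outcome


def treatment_table(Q, a_pos, b_pos):
    """Joint distribution of (A, B) under one treatment, as partial sums of Q."""
    table = {(a, b): 0.0 for a in (1, 2) for b in (1, 2)}
    for outcome, p in Q.items():
        table[(outcome[a_pos], outcome[b_pos])] += p
    return table


tables = {
    ("x", "y"): treatment_table(Q, X, Y),
    ("x", "y'"): treatment_table(Q, X, Y_PRIME),
    ("x'", "y"): treatment_table(Q, X_PRIME, Y),
    ("x'", "y'"): treatment_table(Q, X_PRIME, Y_PRIME),
}


def marginal_a(table, a):
    """Marginal probability Pr[A = a] read off a treatment table."""
    return table[(a, 1)] + table[(a, 2)]
```

Tables obtained this way automatically satisfy marginal selectivity: for instance, `marginal_a(tables[("x", "y")], 1)` and `marginal_a(tables[("x", "y'")], 1)` are both the marginal of H_x.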
The point of interest in the present context is that the Bell-CHSH-Fine inequalities, whose rather obscure structure does not seem to fit their fundamental importance, turn out to be interpretable as the triangle inequalities for appropriately chosen order-distances.
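The following sketch applies a chain inequality to the 2 × 2 design. It assumes outputs labeled 1 and 2 with ⪯ identified with ≤, so that the order-distance between an A-variable and a B-variable is Pr[A = 1, B = 2] (and Pr[B = 1, A = 2] in the opposite direction). The four tables are hypothetical and were chosen only so that marginal selectivity holds while the chain inequality fails.

```python
def perfectly_correlated():
    """Joint table {(a, b): probability} with A = B and uniform marginals."""
    return {(1, 1): 0.5, (1, 2): 0.0, (2, 1): 0.0, (2, 2): 0.5}


def perfectly_anticorrelated():
    """Joint table with A != B and uniform marginals."""
    return {(1, 1): 0.0, (1, 2): 0.5, (2, 1): 0.5, (2, 2): 0.0}


# Hypothetical results for the four treatments of the 2 x 2 design.
tables = {
    ("x", "y"): perfectly_correlated(),
    ("x'", "y"): perfectly_correlated(),
    ("x'", "y'"): perfectly_correlated(),
    ("x", "y'"): perfectly_anticorrelated(),
}


def d_ab(table):
    """Order-distance from the A-variable to the B-variable: Pr[A = 1, B = 2]."""
    return table[(1, 2)]


def d_ba(table):
    """Order-distance from the B-variable to the A-variable: Pr[B = 1, A = 2]."""
    return table[(2, 1)]


# Treatment-realizable chain x, y, x', y':
# D(x, y') <= D(x, y) + D(y, x') + D(x', y').
lhs = d_ab(tables[("x", "y'")])
rhs = (d_ab(tables[("x", "y")])
       + d_ba(tables[("x'", "y")])
       + d_ab(tables[("x'", "y'")]))
chain_holds = lhs <= rhs
```

Here `lhs` is 0.5 and `rhs` is 0.0, so no JDC-set can exist for these tables even though every marginal probability equals 1/2 under all treatments.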
Consider the chain inequalities for the order-distance D_1 obtained by putting • = ⊔ = 1, • = ⊓ = 2, and identifying ⪯ with ≤: [...] Consider also the inequalities for the order-distance D_2 obtained by putting • = ⊓ = 1, • = ⊔ = 2, and identifying ⪯ with ≤: [...]
Proof. We show the proof for the first of the Bell-CHSH-Fine double-inequalities. The equivalence of [...] obtains by using the identities [...] The equivalence of [...] follows from the identity [...]

Concluding remarks
The order-distances are versatile and have a broad sphere of applicability because order relations on the domains of any given set of random variables can always be defined in many different ways. If no other structure is available, this can always be done by the partitioning of the domains mentioned in Section 1 and used in the example with bivariate normal distributions in Section 2 as well as for the binary variables of the previous section: a ⪯ b if and only if a ∈ V_α^(k), b ∈ V_β^(l), and k ≤ l. Due to its universality and convenience of use, the resulting order-distance deserves a special name, classification distance.
Under additional constraints one can suggest many other p.q.-metrics on sets of jointly distributed random variables. Thus, if the variables in H are real-valued with the conventional Borel sigma algebras, one can define, for any [...] where [...] This means that the test can be performed on a potential infinity of sets of random variables [...]
If the jointly distributed random variables constituting the set H are discrete, one can use information-based p.q.-metrics. Perhaps the simplest of them is $h(A|B) = -\sum_{a,b} p_{ab} \log \frac{p_{ab}}{p_b}$, where p_ab = Pr[A = a, B = b] and p_b = Pr[B = b], with the conventions 0 log (0/0) = 0 log 0 = 0. This function is called conditional entropy. The identity h(A|A) = 0 is obvious, and the triangle inequality, h(A|B) ≤ h(A|X) + h(X|B), follows from the standard information theory (in)equalities, h(A|B) ≤ h(A, X|B) = h(A|X, B) + h(X|B) and h(A|X, B) ≤ h(A|X). Note that, unlike with the distance d [...]
Below we present an incomplete list of transformations which, given a p.q.-metric (quasimetric, pseudometric, conventional metric) d on a space H of jointly distributed random variables, produce a new p.q.-metric (respectively, quasimetric, pseudometric, or conventional metric) on the same space. The proofs are trivial or well-known. The arrows ⟹ should be read "can be transformed into."
1. d ⟹ d^q (q < 1). In this way, for example, we can obtain metrics [...]
To illustrate the latter way of constructing p.q.-metrics, consider a classification distance with binary partitions: the domain V_ω of every H_ω in H is partitioned into two (measurable) subsets, W^(1)_ω,υ and W^(2)_ω,υ. Making these partitions random, i.e., allowing the index υ to randomly vary in any way whatever, we get a new p.q.-metric. In the special case when all random variables in H take their values in the set of real numbers, and W^(1)_ω,υ is defined by z ≤ υ (z ∈ V_ω ⊂ R, υ ∈ R), the randomization of the partitions reduces to that of the separation point υ. The p.q.-metric then becomes $d_S(A, B) = \Pr[A \le U < B]$, where U is some random variable.
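The conditional entropy discussed earlier in this section can be computed directly from a discrete joint distribution. The sketch below assumes the usual formula h(A|B) = −Σ p(a, b) log(p(a, b)/p(b)) with the convention 0 log 0 = 0. The joint law `joint` of (A, B, X) and the helper names are hypothetical.

```python
import math
from itertools import product

# Hypothetical joint law of three binary variables (A, B, X); weights are arbitrary.
values = (0, 1)
raw = {abx: 1 + (abx[0] + 2 * abx[1] + 3 * abx[2]) % 5
       for abx in product(values, repeat=3)}
total = sum(raw.values())
joint = {abx: w / total for abx, w in raw.items()}


def pair_law(joint, i, j):
    """Joint distribution of the i-th and j-th variables."""
    law = {}
    for outcome, p in joint.items():
        key = (outcome[i], outcome[j])
        law[key] = law.get(key, 0.0) + p
    return law


def conditional_entropy(joint, i, j):
    """h(H_i | H_j) = -sum p(a, b) * log(p(a, b) / p(b)); zero-mass terms are skipped."""
    pab = pair_law(joint, i, j)
    pb = {}
    for (a, b), p in pab.items():
        pb[b] = pb.get(b, 0.0) + p
    return -sum(p * math.log(p / pb[b]) for (a, b), p in pab.items() if p > 0)


# Triangle inequality h(A|B) <= h(A|X) + h(X|B) over all index triples.
triangle_holds = all(
    conditional_entropy(joint, i, k)
    <= conditional_entropy(joint, i, j) + conditional_entropy(joint, j, k) + 1e-12
    for i in range(3) for j in range(3) for k in range(3)
)
```

Like the order-distance, this function is not symmetric in general: `conditional_entropy(joint, 0, 1)` and `conditional_entropy(joint, 1, 0)` need not coincide.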
An additively symmetrized (i.e., pseudometric) version of this p.q.-metric, d_S(A, B) + d_S(B, A), was introduced in Taylor (1984, 1985) under the name "separation (pseudo)metric," and shown to be a conventional metric if U is chosen stochastically independent of all random variables in H.

Abstract. We construct a class of real-valued nonnegative binary functions on a set of jointly distributed random variables, which satisfy the triangle inequality and vanish at identical arguments (pseudo-quasi-metrics). We apply these functions to the problem of selective probabilistic causality encountered in behavioral sciences and in quantum physics. The problem reduces to that of ascertaining the existence of a joint distribution for a set of variables with known distributions of certain subsets of this set. Any violation of the triangle inequality by one of our functions when applied to such a set rules out the existence of the joint distribution. We focus on an especially versatile and widely applicable class of pseudo-quasi-metrics called order-distances. We show, in particular, that the Bell-CHSH-Fine inequalities of quantum physics follow from the triangle inequalities for appropriately defined order-distances.

Introduction
We show how certain metric-like functions on jointly distributed random variables (pseudo-quasimetrics introduced in Section 1) can be used in dealing with the problem of selective probabilistic causality (introduced in Section 2), illustrating this on examples taken from behavioral sciences and quantum physics (Section 3). Although most of Section 2 applies to arbitrary pseudo-quasimetrics on jointly distributed random variables, we single out one, termed order-distance, which is especially useful due to its versatility. We discuss examples of other pseudo-quasi-metrics and rules for their construction in Section 4.

Order p.q.-metrics
Random variables in this paper are understood in the broadest sense, as measurable functions X : V_s → V, no restrictions being imposed on the sample spaces (V_s, Σ_s, µ_s) and the induced probability spaces (V, Σ, µ), with the usual meaning of the terms (sets of values V_s, V, sigma-algebras Σ_s, Σ, and probability measures µ_s, µ). In particular, any set X of jointly distributed random variables (functions on the same sample space) is a random variable, and its induced probability space (or, simply, distribution) X̄ = (V, Σ, µ) is referred to as the joint distribution of its elements.
Given a class 𝒳 of random variables, not necessarily jointly distributed, let 𝒳* be the class of distributions X̄ for all X ∈ 𝒳. For any class function f* : 𝒳* → R (reals), the function f : 𝒳 → R defined by f(X) = f*(X̄) is called observable (as it does not depend on sample spaces, typically unobservable). We will conveniently confuse f and f* for observable functions, so that if f is defined on 𝒳, then f(Y), identified with f*(Ȳ), is also defined for any Y ∉ 𝒳 with Ȳ ∈ 𝒳*. (This convention is used in Section 2, when we apply a function defined on a set of random variables H to different but identically distributed sets of A-variables.)
For an arbitrary nonempty set Ω, let H = {H_ω : ω ∈ Ω} be an indexed set of jointly distributed random variables H_ω with distributions H̄_ω = (V_ω, Σ_ω, µ_ω). For any α, β ∈ Ω, the ordered pair (H_α, H_β) is a random variable with distribution (V_α × V_β, Σ_α × Σ_β, µ_α,β), and H × H is a set of jointly distributed random variables (hence also a random variable).
Definition 1.1. We call an observable function d : H × H → R a pseudo-quasi-metric (p.q.-metric) on H if, for all α, β, γ ∈ Ω, (i) d(H_α, H_β) ≥ 0; (ii) d(H_α, H_α) = 0; (iii) d(H_α, H_γ) ≤ d(H_α, H_β) + d(H_β, H_γ).
For terminological clarity, the conventional pseudometrics (also called semimetrics) are obtained by adding the property d(H_α, H_β) = d(H_β, H_α); the conventional quasimetrics are obtained by adding the property α ≠ β ⇒ d(H_α, H_β) > 0. A conventional metric is both a pseudometric and a quasimetric. (See, e.g., [27] for discussion of a variety of metrics and pseudometrics on random variables.) By an obvious argument we can generalize the triangle inequality, (iii): for any H_α1, ..., H_αl ∈ H (l ≥ 3), $d(H_{\alpha_1}, H_{\alpha_l}) \le \sum_{i=2}^{l} d(H_{\alpha_{i-1}}, H_{\alpha_i})$. We refer to this inequality (which plays a central role in this paper) as the chain inequality.
Let R ⊂ ∪_{(α,β)∈Ω×Ω} V_α × V_β, and we write a ⪯ b to designate (a, b) ∈ R. Let R be a total order, that is, transitive, reflexive, and connected in the sense that for any (a, b) ∈ ∪_{(α,β)∈Ω×Ω} V_α × V_β, at least one of the relations a ⪯ b and b ⪯ a holds. We define the equivalence a ∼ b and strict order a ≺ b induced by ⪯ in the usual way. Finally, we assume that for any (α, β) ∈ Ω × Ω, the sets {(a, b) : a ∈ V_α, b ∈ V_β, a ⪯ b} are µ_α,β-measurable. This implies the µ_α,β-measurability of the sets {(a, b) : a ∈ V_α, b ∈ V_β, a ≺ b} and {(a, b) : a ∈ V_α, b ∈ V_β, a ∼ b}.
Thus, if all V_ω are intervals of reals, ⪯ can be chosen to coincide with ≤, and (assuming the usual Borel sigma-algebra) all the properties above are satisfied. Another example: for arbitrary V_ω, provided each Σ_ω contains at least n > 1 disjoint nonempty sets, one can partition V_ω as ∪_{k=1}^{n} V_ω^(k) and put a ⪯ b if and only if a ∈ V_α^(k), b ∈ V_β^(l), and k ≤ l. Again, all properties above are clearly satisfied.
With the notation and assumptions above, the function $D(H_\alpha, H_\beta) = \Pr[H_\alpha \prec H_\beta]$ is called an order p.q.-metric, or order-distance, on H.
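That the definition is well-constructed follows from the fact that the order-distance D is a p.q.-metric on H.
Proof. Let α, β, γ ∈ Ω, and H_α = A, H_β = B, and H_γ = X. That D(A, B) is determined by the distribution of (A, B) is obvious from the definition. The properties D(A, B) ≥ 0 and D(A, A) = 0 are obvious too. To prove the triangle inequality, D(A, B) ≤ D(A, X) + D(X, B), note that A ≺ B implies A ≺ X or X ≺ B: otherwise X ⪯ A and B ⪯ X, whence B ⪯ A by transitivity, contradicting A ≺ B. Hence Pr[A ≺ B] ≤ Pr[A ≺ X] + Pr[X ≺ B].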

Selective probabilistic causality
Consider an indexed set W = {W_λ : λ ∈ Λ}, with each W_λ being a set referred to as a (deterministic) input, with the elements of {λ} × W_λ called input points. Input points therefore are pairs of the form x = (λ, w), with w ∈ W_λ, and should not be confused with input values w. A nonempty set Φ ⊂ ∏_{λ∈Λ} W_λ is called a set of (allowable) treatments. A treatment therefore is a function φ : Λ → ∪_{λ∈Λ} W_λ such that φ(λ) ∈ W_λ for any λ ∈ Λ. Note that the symbol φ not followed by an argument always refers to the entire function, the set {(λ, φ(λ)) : λ ∈ Λ}.
In the following we use two kinds of random variables: those indexed as A λ φ (each corresponding to a fixed index λ ∈ Λ and a fixed function φ) and those indexed as H λ w (with w ∈ W λ ), corresponding to input points (λ, w).
Let there be a collection of sets of random variables, referred to as (random) outputs, A_φ = {A^λ_φ : λ ∈ Λ}, φ ∈ Φ, such that the distribution of A_φ (i.e., the joint distribution of all A^λ_φ in A_φ) is known for every treatment φ. We define A^λ = {A^λ_φ : φ ∈ Φ}, λ ∈ Λ, with the understanding that A^λ is not a random variable (i.e., A^λ_φ for different φ are not jointly distributed). To illustrate the notation, let Λ = {1, 2, ...} and W_λ be the set of reals for all λ ∈ Λ. A treatment φ then is a real-valued function (sequence) {(1, φ(1)), (2, φ(2)), ...} = (φ(1), φ(2), ...), where φ(1) ∈ W_1, φ(2) ∈ W_2, etc. Let Φ be a nonempty set of such sequences. Fixing one of them, φ = (w_1, w_2, ...), we get A_φ = {A^1_(w1,w2,...), A^2_(w1,w2,...), ...}; fixing, say, λ = 2 and allowing (w_1, w_2, ...) to range over Φ, we get A^2 = {A^2_(w1,w2,...) : (w_1, w_2, ...) ∈ Φ}.
The following problem is encountered in a wide variety of contexts [6,7,15]. We say that the dependence of random outputs A^λ_φ on the deterministic inputs W_λ is (canonically) selective if, for any distinct λ, λ′ ∈ Λ and any φ ∈ Φ, the output A^λ_φ is "not influenced" by φ(λ′). The question is how one should define this selectivity of "influences" rigorously, and how one can determine whether this selectivity holds. This problem was introduced to behavioral sciences by Sternberg [18] and Townsend [22]. In quantum physics, using different terminology, it was introduced by Bell [3] and elaborated by Fine [10,11]. The definition can be given in several equivalent forms, of which we present the one focal for the present context.
Definition 2.1. The dependence of outputs {A^λ : λ ∈ Λ} on inputs {W_λ : λ ∈ Λ} (or the "influence" of the latter on the former) is (canonically) selective if there is a set of jointly distributed random variables H = {H^λ_w : w ∈ W_λ, λ ∈ Λ} (one random variable for every value of every input), such that, for any treatment φ ∈ Φ, H_φ and A_φ are identically distributed, where H_φ = {H^λ_φ(λ) : λ ∈ Λ} and A_φ = {A^λ_φ : λ ∈ Λ} (the corresponding elements of H_φ and A_φ being those sharing the same λ).
This definition is known as the Joint Distribution Criterion (JDC) for selectivity of influences, and the set H satisfying this definition is referred to as a (hypothetical) JDC-set. Specialized forms of this criterion in quantum physics can be found in [19] and [10,11]; in the behavioral context and in complete generality this criterion is given (derived from an equivalent definition) in [8].
Remark 2.2. The adjective "canonical" in the definition refers to the one-to-one correspondence between W λ and A λ sharing the same λ. A seemingly more general scheme, in which different A λ are selectively influenced by different (possibly overlapping) subsets of W λ : λ ∈ Λ is always reducible to the canonical form by considering, for every A λ , the Cartesian product of the inputs influencing it a single input, and redefining correspondingly the sets of input points and the set of allowable treatments.
The relevance of the order-distance and other p.q.-metrics on the sets of jointly distributed random variables to the problem of selectivity lies in the general test (necessary condition) for selectivity of influences, formulated after the following definition.
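Definition 2.4. We call a sequence of input points x_1 = (α_1, w_1), ..., x_l = (α_l, w_l) treatment-realizable if there are treatments φ^(1), ..., φ^(l) ∈ Φ (not necessarily pairwise distinct) such that {x_1, x_l} ⊂ φ^(1) and {x_{i−1}, x_i} ⊂ φ^(i) for i = 2, ..., l.
If a JDC-set H exists, then for any p.q.-metric d on H we should have $d(H_{x_1}, H_{x_l}) \le \sum_{i=2}^{l} d(H_{x_{i-1}}, H_{x_i})$, that is, $$d\left(A^{\alpha_1}_{\varphi^{(1)}}, A^{\alpha_l}_{\varphi^{(1)}}\right) \le \sum_{i=2}^{l} d\left(A^{\alpha_{i-1}}_{\varphi^{(i)}}, A^{\alpha_i}_{\varphi^{(i)}}\right). \qquad (2.1)$$ This chain inequality, written entirely in terms of observable probabilities, is referred to as a p.q.-metric test for selectivity of influences. If this inequality is violated for at least one treatment-realizable sequence of input points, no JDC-set H exists, and the selectivity is ruled out. Note: if the sequence φ^(1), ..., φ^(l) ∈ Φ for a given x_1, ..., x_l can be chosen in more than one way, the observable quantities $d(A^{\alpha_1}_{\varphi^{(1)}}, A^{\alpha_l}_{\varphi^{(1)}})$ and $d(A^{\alpha_{i-1}}_{\varphi^{(i)}}, A^{\alpha_i}_{\varphi^{(i)}})$ remain invariant due to the (tacitly assumed) marginal selectivity.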
As an example, let Λ = {1, 2}, and let (A^1_φ, A^2_φ) for any treatment φ = (v, w) have a bivariate normal distribution with zero means, unit variances, and correlation ρ = min(1, v + w). Marginal selectivity is trivially satisfied. Do W_1, W_2 influence A^1, A^2 selectively? For any bivariate normally distributed (A, B), let us define A ≺ B iff A < 0, B ≥ 0. Then the corresponding order-distance on the hypothetical JDC-set H is $D(H^1_v, H^2_w) = D(H^2_w, H^1_v) = \frac{\arccos \min(1, v + w)}{2\pi}$. The sequence of input points (1, 0), (2, 1), (1, 1), (2, 0) is treatment-realizable, so if H exists, we should have $D(H^1_0, H^2_0) \le D(H^1_0, H^2_1) + D(H^2_1, H^1_1) + D(H^1_1, H^2_0)$. The numerical substitutions yield, however, 1/4 ≤ 0 + 0 + 0, and as this is false, the hypothesis that W_1, W_2 influence A^1, A^2 selectively is rejected.
The theorem below and its corollary show that one only needs to check the chain inequality for a special subset of all possible treatment-realizable sequences x_1, ..., x_l. [...]
Proof. We prove this theorem by showing that if (2.1) is violated for some reducible sequence x_1, ..., x_l, then it is violated for some proper subsequence thereof. Clearly, x_1 ≠ x_l because otherwise (2.1) is not violated. For l = 3, x_1, x_2, x_3 is reducible only if it is contained in a treatment: but then (2.1) would be satisfied. So l > 3, and the reducibility of x_1, ..., x_l means that there is a pair {x_p, x_q} belonging to a treatment, with (p, q) ≠ (1, l) and q > p + 1. But then (2.1) must be violated for either x_p, ..., x_q or x_1, ..., x_p, x_q, ..., x_l (allowing for p = 1 or q = l but not both).
If Φ = λ∈Λ W λ (all logically possible treatments are allowable), then any subsequence x i1 , . . . , x i k of input points with pairwise distinct α i1 , . . . , α i k belongs to some treatment. Therefore an irreducible sequence cannot contain points of more than two inputs, and it is easy to see that then it must be a sequence of pairwise distinct . It is also easy to see that if m > 2, each of the subsets {x 1 , x 4 } and {x 2 , x 5 } will belong to a treatment. Hence m = 2 is the only possibility for an irreducible sequence.
Remark 2.8. This formulation is given in [8], although there it is unnecessarily confined to metrics of a special kind.

An application
The four tables below represent results of an experiment with a 2 × 2 factorial design, {x, x ′ } × {y, y ′ }, and two binary responses, A and B. In relation to our general notation, we have here Λ = {1, 2}, W 1 = {x, x ′ }, W 2 = {y, y ′ }, and four treatments (x, y) , . . . , (x ′ , y ′ ); for every treatment φ, the random outputs A 1 φ and A 2 φ are represented by, respectively, A φ and B φ , each having two possible values, arbitrarily labeled. This design is arguably the simplest possible, and it is ubiquitous in science. In a psychological double-detection experiment (see, e.g., [23]), the input values may represent presence (x and y) or absence (x ′ and y ′ ) of a designated signal in two stimuli labeled 1 and 2, presented side-by-side. The participant in such an experiment is asked to indicate whether the signal was present or absent in stimulus 1 and in stimulus 2. The output values A = • and B = ⊓ may indicate either that the response was "signal present" or that the response was correct; and analogously for A = • and B = ⊔ (either "signal absent" or an incorrect response). The entries p ij , q ij , etc. represent joint probabilities of the corresponding outcomes, a i· , a ′ i· , etc. represent marginal probabilities. The question to be answered is: does the response to a given stimulus (A to 1 and B to 2) selectively depend on that stimulus alone (despite A and B being stochastically dependent for every treatment), or is A or B influenced by both 1 and 2?
Another important situation in which we encounter formally the same problem is the Einstein-Podolsky-Rosen (EPR) paradigm. Two particles are emitted from a common source in such a way that they remain entangled (have highly correlated properties, such as momenta or spins) as they run away from each other [1,16]. An experiment may consist, e.g., in measuring the spin of electron 1 along one of two axes, x or x ′ , and (in another location but simultaneously in some inertial frame of reference) measuring the spin of electron 2 along one of two axes, y or y ′ . The outcome A of a measurement on electron 1 is a random variable with two possible values, "up" or "down," and the same holds for B, the outcome of a measurement on electron 2. The question here is: do the measurements on electrons 1 and 2 selectively affect, respectively, A and B (even though generally A and B are not independent at any of the four combinations of spin axes)? If the answer is negative, then the measurement of one electron affects the outcome of the measurement of another electron even though no signal can be exchanged between two distant events that are simultaneous in some frame of reference. What makes this situation formally identical to the double-detection example described above is that the measurements performed along different axes on the same particle, x and x ′ or y and y ′ , are non-commuting, i.e., they cannot be performed simultaneously. This makes it possible to consider such measurements as mutually exclusive values of an input.
Under marginal selectivity, a JDC-set for this design exists if and only if the following eight inequalities are satisfied: [...] We refer to (3.1) as Bell-CHSH-Fine inequalities, where CHSH abbreviates Clauser, Horne, Shimony, & Holt [4]: in this work Bell's [3] approach was developed into a special version of (3.1).
Remark 3.2. The proof given in [10,11] that (3.1) is both necessary and sufficient (under marginal selectivity) for the existence of a JDC-set can be conceptually simplified: the Bell-CHSH-Fine inequalities can be algebraically shown to be the criterion for the existence of a vector Q with 16 probabilities that sum to one and whose appropriately chosen partial sums yield the 8 observable probabilities (other probabilities being determined due to marginal selectivity). This is a simple linear programming task, and the Bell-CHSH-Fine inequalities can be derived "mechanically" by a facet enumeration algorithm (see [25,26] and [2]). For extensions of the Bell-CHSH-Fine inequalities to multiple particles, multiple spin axes, and multiple random outputs, see [9] and [17]. For modern accounts of mathematical and interpretational aspects of the entanglement problem in quantum physics, see [12,13,14].
The point of interest in the present context is that the Bell-CHSH-Fine inequalities, whose rather obscure structure does not seem to fit their fundamental importance, turn out to be interpretable as the triangle inequalities for appropriately chosen order-distances.

Concluding remarks
The order-distances are versatile and have a broad sphere of applicability because order relations on the domains of any given set of random variables can always be defined in many different ways. If no other structure is available, this can always be done by the partitioning of the domains mentioned in Section 1 and used in the example with bivariate normal distributions in Section 2 as well as for the binary variables of the previous section: [...] is a p.q.-metric.
As a special case, consider a classification distance with binary partitions: the domain V_ω of every H_ω in H is partitioned into two (measurable) subsets, W^(1)_ω,υ and W^(2)_ω,υ. Making these partitions random, i.e., allowing the index υ to randomly vary in any way whatever, we get a new p.q.-metric. In the special case when all random variables in H take their values in the set of real numbers, and W^(1)_ω,υ is defined by z ≤ υ (z ∈ V_ω ⊂ R, υ ∈ R), the randomization of the partitions reduces to that of the separation point υ. The p.q.-metric then becomes $d_S(A, B) = \Pr[A \le U < B]$, where U is some random variable. An additively symmetrized (i.e., pseudometric) version of this p.q.-metric, d_S(A, B) + d_S(B, A), was introduced in [20,21] under the name "separation (pseudo)metric," and shown to be a conventional metric if U is chosen stochastically independent of all random variables in H.
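The randomized binary classification distance can be illustrated on a finite example. The sketch below assumes the form d_S(A, B) = Pr[A ≤ U < B], which is what the binary partition by z ≤ υ yields, and, for simplicity, a separation point U that is independent of the variables in H. The joint law, the law of U, and the function names are hypothetical.

```python
from itertools import product

# Hypothetical joint law of (A, B, X) on {0, 1, 2}^3; the weights are arbitrary.
values = (0, 1, 2)
raw = {abx: 1 + (2 * abx[0] + abx[1] + 4 * abx[2]) % 3
       for abx in product(values, repeat=3)}
total = sum(raw.values())
joint = {abx: w / total for abx, w in raw.items()}

# Hypothetical law of the separation point U, assumed independent of (A, B, X).
u_law = {0.5: 0.6, 1.5: 0.4}


def separation_distance(joint, u_law, i, j):
    """d_S(H_i, H_j) = Pr[H_i <= U < H_j] for U independent of the H's."""
    return sum(p * q
               for outcome, p in joint.items()
               for u, q in u_law.items()
               if outcome[i] <= u < outcome[j])


def symmetrized(joint, u_law, i, j):
    """Additively symmetrized version d_S(H_i, H_j) + d_S(H_j, H_i)."""
    return (separation_distance(joint, u_law, i, j)
            + separation_distance(joint, u_law, j, i))


# Triangle inequality for d_S over all index triples.
triangle_holds = all(
    separation_distance(joint, u_law, i, k)
    <= separation_distance(joint, u_law, i, j)
    + separation_distance(joint, u_law, j, k) + 1e-12
    for i in range(3) for j in range(3) for k in range(3)
)
```

The symmetrized value is the probability that U separates the two variables in either direction, so `symmetrized(joint, u_law, 0, 1)` equals `symmetrized(joint, u_law, 1, 0)` by construction.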