# Bell's theorem

**Bell's theorem** asserts that if certain predictions of quantum theory are correct then our world is non-local. "Non-local" here means that there exist interactions between events that are too far apart in space and too close together in time for the events to be connected even by signals moving at the speed of light. This theorem was proved in 1964 by John Stewart Bell and has in recent decades been the subject of extensive analysis, discussion, and development by both physicists and philosophers of science. The relevant predictions of quantum theory were first convincingly confirmed by the experiment of Aspect *et al.* in 1982; they have been even more convincingly reconfirmed many times since. In light of Bell's theorem, the experiments thus establish that our world is non-local. This conclusion is very surprising, since non-locality is normally taken to be prohibited by the theory of relativity.

## Historical background

John Bell's interest in non-locality was triggered by his analysis of the problem of hidden variables in quantum theory and in particular by his learning about the de Broglie–Bohm^{1} "pilot-wave" theory (aka "Bohmian mechanics"^{2}). Bell wrote that David "Bohm's 1952 papers on quantum mechanics were for me a revelation. The elimination of indeterminism was very striking. But more important, it seemed to me, was the elimination of any need for a vague division of the world into 'system' on the one hand, and 'apparatus' or 'observer' on the other."^{3}

In particular, learning about Bohm's "hidden variables"^{4} theory helped Bell recognize the invalidity of the various "no hidden variables" theorems (by John von Neumann and others) which had been taken almost universally by physicists as conclusively establishing something like Niels Bohr's Copenhagen interpretation of quantum theory. Bohm's pilot-wave theory was a clean counterexample, i.e., a proof-by-example that the theorems somehow didn't rule out what they had been taken to rule out.

This led Bell to carefully scrutinize those theorems. The result of this work was his paper "On the problem of hidden variables in quantum mechanics"^{5}. This paper was written prior to the 1964 paper^{6} in which Bell's theorem was first presented, but (due to an editorial accident) remained unpublished until 1966. The 1966 paper shows that the "no hidden variables" theorems of von Neumann and others all made unwarranted — and in some cases unacknowledged — assumptions. (All these theorems involved an assumption^{7} which today is usually called *non-contextuality*.) In examining how Bohm's theory managed to violate these assumptions, Bell noticed that it did have one "curious feature": the theory was manifestly non-local. As Bell explained, "in this theory an explicit causal mechanism exists whereby the disposition of one piece of apparatus affects the results obtained with a distant piece."^{8} This naturally raised the question of whether the non-locality was eliminable, or somehow essential:

... to the present writer's knowledge, there is no *proof* that *any* hidden variable account of quantum mechanics *must* have this extraordinary character. It would therefore be interesting, perhaps, to pursue some further 'impossibility proofs,' replacing the arbitrary axioms objected to above by some condition of locality, or of separability of distant systems.^{9}

Because of the editorial accident mentioned above, Bell had answered his own question before the paper in which it appeared was even published. The answer is contained in what we will here call "Bell's inequality theorem", which states precisely that "*any* hidden variable account of quantum mechanics *must* have this extraordinary character", i.e., must violate a locality constraint that is motivated by relativity.

But the more general result we here call "Bell's theorem" is much more than this: combined with the Einstein–Podolsky–Rosen (EPR) argument "*from locality to* deterministic hidden variables"^{10}, the inequality theorem establishes a contradiction between locality as such (and not merely some special class of local theories) and the (now experimentally confirmed) predictions of quantum theory.

## The EPR argument for pre-existing values

It is a general principle of orthodox formulations of quantum theory that measurements of physical quantities do not simply reveal pre-existing or pre-determined values, the way they do in classical theories. Instead, the particular outcome of the measurement somehow "emerges" from the dynamical interaction of the system being measured with the measuring device, so that even someone who was omniscient about the states of the system and device prior to the interaction couldn't have predicted in advance which outcome would be realized.

In a celebrated 1935 paper^{11}, however, Albert Einstein, Boris Podolsky, and Nathan Rosen pointed out that, in situations involving specially-prepared pairs of particles, this orthodox principle conflicted with locality. Unfortunately, the role of locality in the discussion is often misunderstood — or missed entirely. One thus often hears that the EPR paper is essentially just an expression of (in particular) Einstein's philosophical discontent with quantum theory. This is quite wrong: what the paper actually contains is an *argument* showing that, if non-local influences are forbidden, and if certain quantum theoretical predictions are correct, then the measurements (whose outcomes are correlated) must be revealing pre-existing values. It is on this basis — in particular, on the assumption of locality — that EPR claimed to have established the "incompleteness" of orthodox quantum theory (which denies the existence of any such pre-existing values).

In the 1935 EPR paper, the argument was formulated in terms of position and momentum (which are observables having continuous spectra). The argument was later reformulated (by Bohm^{12}) in terms of spin. This "EPRB" version is conceptually simpler and also more closely related to the recent experiments designed to test Bell's inequality.

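The EPRB argument is as follows: assume that one has prepared a pair of spin-1/2 particles in the entangled spin *singlet state*

\[\psi=\frac{1}{\sqrt2}\Big(\left\vert\uparrow\right\rangle\otimes\left\vert\downarrow\right\rangle-\left\vert\downarrow\right\rangle\otimes\left\vert\uparrow\right\rangle\Big),\]
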
with \(\left\vert\uparrow\right\rangle\ ,\) \(\left\vert\downarrow\right\rangle\) an orthonormal basis of the spin state space. A measurement of the spin of one of the particles along a given axis yields either the result "up" (i.e., "spin up") or the result "down" (i.e., "spin down"). Moreover, if one measures the spin of both particles along some given axis (say, the \(z\)-axis), then quantum theory predicts that the results obtained will be perfectly anti-correlated, i.e., they will be opposite ("up" for one particle and "down" for the other). If such measurements are carried out simultaneously on two spatially-separated particles (technically, if the measurements are performed at space-like separation) then locality requires that any disturbance triggered by the measurement on one side cannot influence the result of the measurement on the other side. But without any such interaction, the only way to ensure the perfect anti-correlation between the results on the two sides is to have each particle carry a pre-existing determinate value (appropriately anti-correlated with the value carried by the other particle) for spin along the \(z\)-axis. Any element of locally-confined indeterminism would at least sometimes spoil the predicted perfect anti-correlation between the outcomes.

Now, obviously there is nothing special here about the \(z\)-axis, so what was just established for the \(z\)-axis applies to any axis. Thus it applies to all axes at once^{13}. That is, assuming (a) locality and (b) that the perfect anti-correlations predicted by quantum theory actually obtain, it follows that each particle must carry a pre-existing value for spin along all possible axes, with the values for the two particles in a given pair — which, of course, needn't be the same from one particle pair to another — perfectly anti-correlated, axis by axis. (A mathematical formulation of this argument is presented at the end of Section 5.)

## Bell's inequality theorem

Pre-existing values are thus the only local way to account for perfect anti-correlations in the outcomes of spin measurements along identical axes. But a simple argument shows that pre-existing values are incompatible with the predictions of quantum theory (for a pair of particles prepared in the singlet state) when we allow also for the possibility of spin measurements along different axes.

According to quantum theory, when spin measurements along different axes are performed on the pair of particles in the singlet state, the probability that the two results will be opposite (one "up" and one "down") is equal to \((1+\cos\,\theta)/2\ ,\) where \(\theta\in[0,\pi]\) is the angle between the chosen (oriented) axes. It follows from the simple mathematical result below, Bell's inequality theorem, that this is not compatible with the pre-existing values we have been discussing.
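
As a numerical cross-check of the formula \((1+\cos\theta)/2\ ,\) the following Python sketch (an illustration, not part of the original argument) builds the singlet state explicitly and evaluates the probability of opposite results using the spin projectors \((I\pm\mathbf n\cdot\boldsymbol\sigma)/2\ .\) For simplicity it assumes that both measurement axes lie in the \(x\)-\(z\) plane, so that all matrices are real; the helper names are hypothetical.

```python
import math

def kron(a, b):
    """Kronecker (tensor) product of two square matrices given as nested lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def projector(theta, sign):
    """Projector onto spin `sign` (+1 = up, -1 = down) along the axis
    n = (sin theta, 0, cos theta), i.e. (I + sign * n.sigma) / 2."""
    c, s = math.cos(theta), math.sin(theta)
    return [[(1 + sign * c) / 2, sign * s / 2],
            [sign * s / 2, (1 - sign * c) / 2]]

# Singlet (|up,down> - |down,up>)/sqrt(2) in the basis
# |up,up>, |up,down>, |down,up>, |down,down>.
SINGLET = [0.0, 1 / math.sqrt(2), -1 / math.sqrt(2), 0.0]

def expectation(op, psi):
    """<psi| op |psi> for a real state vector and a real matrix."""
    return sum(psi[i] * op[i][j] * psi[j]
               for i in range(len(psi)) for j in range(len(psi)))

def prob_opposite(theta1, theta2):
    """Quantum probability that the results on the two sides are opposite."""
    return sum(expectation(kron(projector(theta1, s), projector(theta2, -s)), SINGLET)
               for s in (+1, -1))
```

For instance, `prob_opposite(0.0, 2 * math.pi / 3)` agrees with \(1/4\) up to rounding, which is the value used below.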

To see this, suppose that the spin measurements for both particles do simply reveal pre-existing values. Denote by \(Z^i_\alpha\ ,\) \(i=1,2\ ,\) the pre-determined outcome of the spin measurement for particle number \(i\) along axis \(\alpha\ .\) These values will evidently vary from one run of the experiment (i.e., one particle pair) to the next, and can thus be treated mathematically as random variables (each one assuming only two possible values, say 1 for "up" and -1 for "down").

Now consider three particular axes \(\mathbf a\ ,\) \(\mathbf b\ ,\) and \(\mathbf c\) that lie in a single plane and are such that the angle between any two of them is equal to \(2\pi/3\ .\) Then, since \(\big(1+\cos(2\pi/3)\big)/2=1/4\ ,\) agreement with quantum theory will require that \(P(Z^1_\alpha\ne Z^2_\beta)=1/4\) if \(\alpha\ne\beta\) are among \(\mathbf a\ ,\) \(\mathbf b\ ,\) \(\mathbf c\) (where \(P\) stands for probability). Agreement with quantum theory *also* requires opposite outcomes for identical measurement axes, i.e., \(Z^1_\alpha=-Z^2_\alpha\ ,\) for all \(\alpha\ .\) But it turns out that it is impossible to satisfy both requirements:

**Bell's inequality theorem.** Consider random variables \(Z^i_\alpha\ ,\) \(i=1,2\ ,\) \(\alpha=\mathbf a, \mathbf b, \mathbf c\ ,\) taking only the values \(\pm1\ .\) If these random variables are perfectly anti-correlated, i.e., if \(Z^1_\alpha=-Z^2_\alpha\ ,\) for all \(\alpha\ ,\) then:
\[(1)\quad P(Z^1_{\mathbf a}\ne Z^2_{\mathbf b})+P(Z^1_{\mathbf b}\ne Z^2_{\mathbf c})+P(Z^1_{\mathbf c}\ne Z^2_{\mathbf a})\ge1.\]

**Proof.** Since (at any given point of the sample space) the three \(\pm1\)-valued random variables \(Z^1_\alpha\) can't all disagree, the union of the events \(\{Z^1_{\mathbf a}=Z^1_{\mathbf b}\}\ ,\) \(\{Z^1_{\mathbf b}=Z^1_{\mathbf c}\}\ ,\) \(\{Z^1_{\mathbf c}=Z^1_{\mathbf a}\}\) is equal to the entire sample space. Therefore the sum of their probabilities must be greater than or equal to 1:

\[P(Z^1_{\mathbf a}=Z^1_{\mathbf b})+P(Z^1_{\mathbf b}=Z^1_{\mathbf c})+P(Z^1_{\mathbf c}=Z^1_{\mathbf a})\ge1.\]

But since \(Z^1_\beta = -Z^2_\beta\ ,\) we have that \(P(Z^1_\alpha=Z^1_\beta)=P(Z^1_\alpha\ne Z^2_\beta)\ .\) Substituting this into the inequality above yields (1).

Each of the three terms on the left hand side of (1) must equal \(1/4\) in order to reproduce the quantum predictions. But, since \(1/4+1/4+1/4=3/4<1\ ,\) the full set of quantum predictions cannot be matched. This establishes the incompatibility between the quantum predictions and the existence of pre-existing values.
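
Since each term of (1) is the probability of an event, the left hand side is an average, over the sample space, of a sum of three indicator functions. The Python sketch below (helper names are hypothetical) enumerates the eight possible \(\pm1\) assignments to \(Z^1_{\mathbf a}\ ,\) \(Z^1_{\mathbf b}\ ,\) \(Z^1_{\mathbf c}\ ,\) sets \(Z^2_\alpha=-Z^1_\alpha\ ,\) and confirms that this sum is never smaller than 1, whereas the quantum value is \(3/4\ .\)

```python
import math
from itertools import product

AXES = ("a", "b", "c")
PAIRS = (("a", "b"), ("b", "c"), ("c", "a"))

def disagreement_sum(z1):
    """Left-hand side of (1) at a single sample point: z1[alpha] is Z^1_alpha,
    Z^2_alpha = -Z^1_alpha, and each term is the indicator of Z^1_alpha != Z^2_beta."""
    z2 = {alpha: -v for alpha, v in z1.items()}
    return sum(1 for alpha, beta in PAIRS if z1[alpha] != z2[beta])

# All 2**3 = 8 possible pre-existing values for particle 1.
assignments = [dict(zip(AXES, values)) for values in product((1, -1), repeat=3)]

# Any probability distribution is a mixture of these points, so the
# left-hand side of (1) is at least the smallest per-point sum.
min_lhs = min(disagreement_sum(z) for z in assignments)

# Quantum prediction: each term equals (1 + cos(2*pi/3)) / 2 = 1/4.
quantum_lhs = 3 * (1 + math.cos(2 * math.pi / 3)) / 2
```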

We note that Bell's original paper^{6} considered for this purpose, instead of the disagreement probability \(P(Z^1_\alpha\ne Z^2_\beta)\ ,\) the correlation \(C(\alpha,\beta)\ ,\) defined as the expected value of the product \(Z^1_\alpha Z^2_\beta\ :\)

\[C(\alpha,\beta)=E(Z^1_\alpha Z^2_\beta)=P(Z^1_\alpha Z^2_\beta=1)\,-\,P(Z^1_\alpha Z^2_\beta=-1)=P(Z^1_\alpha=Z^2_\beta)\,-\,P(Z^1_\alpha\ne Z^2_\beta)=1\,-\,2P(Z^1_\alpha\ne Z^2_\beta).\]

Bell's original inequality (under the same assumptions as for Bell's inequality theorem above) is:

\[\vert C(\mathbf a,\mathbf b)-C(\mathbf a,\mathbf c)\vert\le 1+C(\mathbf b,\mathbf c).\]

Let us see how this inequality is related to inequality (1). Rewriting inequality (1) in terms of the correlations \(C(\alpha,\beta)\ ,\) we obtain:

\[\quad C(\mathbf a,\mathbf b)+C(\mathbf b,\mathbf c)+C(\mathbf c,\mathbf a)\le1.\]

Since (because of the perfect anti-correlations) \(C(\alpha,\beta)=C(\beta,\alpha)\ ,\) this yields that \[(2)\quad C(\mathbf a,\mathbf b)+C(\mathbf a,\mathbf c)+C(\mathbf b,\mathbf c)\le1.\]

Bell's original inequality is equivalent to the conjunction of two inequalities without absolute value: one of them is obtained from (2) by changing the signs of \(C(\mathbf a,\mathbf c)\) and \(C(\mathbf b,\mathbf c)\ .\) (This inequality follows, as (2) does, from Bell's inequality theorem above if we replace \(Z^i_{\mathbf c}\) with \(-Z^i_{\mathbf c}\ .\)) The other inequality is obtained from (2) by changing the signs of \(C(\mathbf a,\mathbf b)\) and \(C(\mathbf b,\mathbf c)\ .\) (This inequality follows from Bell's inequality theorem above by replacing \(Z^i_{\mathbf b}\) with \(-Z^i_{\mathbf b}\ .\))
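
A similar check can be made for Bell's original inequality. The Python sketch below assumes, as above, that \(Z^2_\alpha=-Z^1_\alpha\ ,\) so that at each sample point the product \(Z^1_\alpha Z^2_\beta\) (whose expected value is \(C(\alpha,\beta)\)) equals \(-Z^1_\alpha Z^1_\beta\ .\) It verifies the inequality for all eight deterministic assignments (mixtures then satisfy it too, since the left hand side is convex and the right hand side linear in the distribution) and shows that the quantum correlations \(C=1-2P(Z^1_\alpha\ne Z^2_\beta)=-\cos\theta\) violate it for coplanar axes at 0°, 60° and 120°, an illustrative choice not taken from the text.

```python
import math
from itertools import product

def bell_original_holds(c_ab, c_ac, c_bc, tol=1e-12):
    """Check Bell's original inequality |C(a,b) - C(a,c)| <= 1 + C(b,c)."""
    return abs(c_ab - c_ac) <= 1 + c_bc + tol

# Deterministic, perfectly anti-correlated values: C(alpha, beta) = -Z^1_alpha * Z^1_beta.
deterministic_ok = all(
    bell_original_holds(-za * zb, -za * zc, -zb * zc)
    for za, zb, zc in product((1, -1), repeat=3)
)

def quantum_c(theta):
    """Singlet correlation 1 - 2 * (1 + cos theta) / 2 = -cos theta."""
    return -math.cos(theta)

# Coplanar axes a, b, c at 0, 60 and 120 degrees (illustrative choice).
a, b, c = 0.0, math.pi / 3, 2 * math.pi / 3
quantum_ok = bell_original_holds(quantum_c(b - a), quantum_c(c - a), quantum_c(c - b))
```

Here the quantum left hand side equals 1, while the right hand side is only \(1/2\ .\)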

## Bell's theorem

*Bell's theorem* states that the predictions of quantum theory (for measurements of spin on particles prepared in the singlet state) cannot be accounted for by any local theory. The proof of Bell's theorem is obtained by combining the EPR argument (from locality and certain quantum predictions to pre-existing values) and Bell's inequality theorem (from pre-existing values to an inequality incompatible with other quantum predictions).

Here is how Bell himself recapitulated the two-part argument:

Let us summarize once again the logic that leads to the impasse. The EPRB correlations are such that the result of the experiment on one side immediately foretells that on the other, whenever the analyzers happen to be parallel. If we do not accept the intervention on one side as a causal influence on the other, we seem obliged to admit that the results on both sides are determined in advance anyway, independently of the intervention on the other side, by signals from the source and by the local magnet setting. But this has implications for non-parallel settings which conflict with those of quantum mechanics. So we *cannot* dismiss intervention on one side as a causal influence on the other.^{14}

Already at the time Bell wrote this, there was a tendency for critics to miss the crucial role of the EPR argument here. The conclusion is not just that some special class of local theories (namely, those which explain the measurement outcomes in terms of pre-existing values) are incompatible with the predictions of quantum theory (which is what follows from Bell's inequality theorem *alone*), but that *local theories as such* (whether deterministic or not, whether positing hidden variables or not, etc.) are incompatible with the predictions of quantum theory. This confusion has persisted in more recent decades, so perhaps it is worth emphasizing the point by (again) quoting from Bell's pointed footnote from the same 1980 paper quoted just above: "My own first paper on this subject ... starts with a summary of the EPR argument *from locality to* deterministic hidden variables. But the commentators have almost universally reported that it begins with deterministic hidden variables."^{10}

## The CHSH–Bell inequality: Bell's theorem without perfect correlations

Perhaps motivated by this widespread and persistent misunderstanding concerning his 1964 paper^{6}, Bell wrote many subsequent papers in which he explained and elaborated upon his very interesting result from a variety of angles. After 1975^{15} Bell sometimes presented his result using a new strategy that does not rely on perfect (anti-)correlations and on the EPR argument. The new strategy has some advantages: *perfect* correlations cannot be demonstrated empirically, and one could also imagine the possibility that quantum theory might be replaced with a new theory that predicts some small deviation from the perfect correlations. So it is desirable to have a version of Bell's theorem that "depends continuously" on the correlations. The new strategy also sheds some light on the meaning of locality.

The idea is to write down a mathematically precise formulation of a consequence of locality in the context of an experiment in which measurements are performed on two systems which have previously interacted — say, systems that have been produced by a common source — but which are now spatially separated. (The EPR scenario considered above is of course an example of such an experiment.) Which of the several possible measurements are actually performed on each system will be determined by (control) parameters — \(\alpha_1\) and \(\alpha_2\) — which should be thought of as being randomly and freely chosen by the experimenters, just before the measurements. The measurements (and the choices of the control parameters) are assumed to be space-like separated. Once \(\alpha_1\) and \(\alpha_2\) are chosen, the experiment is performed, yielding (say, real-valued) outcomes \(A_1\) and \(A_2\) for the measurements on the two systems. While the values of \(A_1\) and \(A_2\) may vary from one run of the experiment to another even for the same choice of parameters, we assume that, for a fixed preparation procedure on the two systems, these outcomes exhibit statistical regularities. More precisely, we assume these are governed by probability distributions \(P_{\alpha_1,\alpha_2}(A_1,A_2)\) depending of course on the experiments performed, and in particular on \(\alpha_1\) and \(\alpha_2\ .\)

Notice that no assumption of pre-determined outcomes is being invoked here: part (or all) of the randomness of \(A_1\ ,\) \(A_2\) can arise during the process of measurement. By contrast, recall that in the above proof of Bell's inequality theorem using the random variables \(Z^i_\alpha\ ,\) the randomness was entirely located at the source, or at least occurred prior to the measurements. Moreover, in that context it was meaningful to talk about the joint probability distribution of \((Z^i_\alpha,Z^i_\beta)\) with \(\alpha\ne\beta\) (i.e., the joint probability distribution for outcomes of different measurements on the *same* system), while here a joint probability distribution of that type is not meaningful.

Let us now see how a mathematically precise necessary condition for locality can be formulated. First of all, one should realize that locality *does not* imply the independence \(P_{\alpha_1,\alpha_2}(A_1,A_2)=P_{\alpha_1,\alpha_2}(A_1)P_{\alpha_1,\alpha_2}(A_2)\) of the outcomes \(A_1\ ,\) \(A_2\ .\) Indeed, it is perfectly natural to expect that the previous interaction between the systems 1 and 2 could produce dependence relations between the outcomes. However, if locality is assumed, then it must be the case that any *additional* randomness that might affect system 1 *after* it separates from system 2 must be independent of any additional randomness that might affect system 2 after it separates from system 1. More precisely, locality requires that some set of data \(\lambda\) — made available to both systems, say, by a common source^{16} — must *fully* account for the dependence between \(A_1\) and \(A_2\ ;\) in other words, the randomness that generates \(A_1\) out of the parameter \(\alpha_1\) and the data codified by \(\lambda\) must be independent of the randomness that generates \(A_2\) out of the parameter \(\alpha_2\) and \(\lambda\ .\) Since \(\lambda\) can vary from one run of the experiment to the other, it should be modeled as a random variable.

Let us restate these ideas mathematically: \(\lambda\) is a random variable such that conditioning on it yields a decomposition

\[(3)\quad P_{\alpha_1,\alpha_2}(A_1,A_2)=\int_\Lambda P_{\alpha_1,\alpha_2}(A_1,A_2|\lambda)\,\mathrm dP(\lambda),\]

into conditional probabilities obeying a factorizability condition of the form:

\[(4)\quad P_{\alpha_1,\alpha_2}(A_1,A_2|\lambda)=P_{\alpha_1}(A_1|\lambda)P_{\alpha_2}(A_2|\lambda).\]
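
To illustrate that the factorizability condition (4) is compatible with dependence between \(A_1\) and \(A_2\ ,\) here is a hypothetical toy model in Python (not taken from the text): \(\lambda=\pm1\) with equal probability, side 1 outputs \(\lambda\) and side 2 outputs \(-\lambda\ ,\) each with probability 9/10 and using its own independent local noise. The settings are ignored for simplicity. The joint distribution is computed from (3) with the factorized integrand (4), using exact fractions.

```python
from fractions import Fraction

# Hypothetical toy model: lambda = +1 or -1 with equal probability.
P_LAMBDA = {+1: Fraction(1, 2), -1: Fraction(1, 2)}
FOLLOW = Fraction(9, 10)  # probability that each side "follows" lambda

def p1(a1, lam):
    """P(A_1 = a1 | lambda): side 1 outputs lambda with probability FOLLOW."""
    return FOLLOW if a1 == lam else 1 - FOLLOW

def p2(a2, lam):
    """P(A_2 = a2 | lambda): side 2 outputs -lambda with probability FOLLOW."""
    return FOLLOW if a2 == -lam else 1 - FOLLOW

def joint(a1, a2):
    """Formula (3) with the factorized integrand (4): a sum over lambda."""
    return sum(w * p1(a1, lam) * p2(a2, lam) for lam, w in P_LAMBDA.items())

def marginal1(a1):
    return sum(joint(a1, a2) for a2 in (1, -1))

def marginal2(a2):
    return sum(joint(a1, a2) for a1 in (1, -1))
```

Here `joint(1, -1)` comes out as 41/100, while `marginal1(1) * marginal2(-1)` is 1/4: the outcomes are correlated, even though they are independent once \(\lambda\) is given.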

The probability distribution \(P\) of \(\lambda\) should not be allowed to depend on \((\alpha_1,\alpha_2)\ ;\) this is the mathematical meaning of the assumption, noted above, that the control parameters \(\alpha_1\ ,\) \(\alpha_2\) are "randomly and freely chosen by the experimenters". One might imagine here that the experimenter on each side makes a free-will choice (just before the measurement) about how to set his apparatus, that is independent of the data codified by \(\lambda\) (which existed before the choices were made). One needn't worry, however, about whether experimenters have "genuine free will" or about what that exactly means. In a real experiment, the parameters \(\alpha_1\) and \(\alpha_2\) would typically be chosen by some random or pseudo-random number generator (say, a computer) that is independent of any other physical processes that might be relevant for the outcomes, and hence independent of \(\lambda\) — unless, that is, there exists some incredible conspiracy of nature (the kind of conspiracy that would make any kind of scientific inquiry impossible). We will thus call the assumption that the probability distribution of \(\lambda\) is independent of \((\alpha_1,\alpha_2)\) the *"no conspiracy" condition*.

Note that the "no conspiracy" condition *doesn't* follow from locality: even if we assume that the choices of \(\alpha_1\) and \(\alpha_2\) are made at space-like separation from the physical processes creating the value of \(\lambda\ ,\) it is still possible in principle that the supposedly random process determining \(\alpha_1\) and \(\alpha_2\) is in fact dependent, via some local influences from the more distant past, on whatever is going on in the process that creates \(\lambda\ .\) The "no conspiracy" assumption, then, is strictly speaking just that — an additional assumption (beyond locality) on which the derivation of Bell-type inequalities rests. That said, we stress that this assumption is necessarily always made whenever one does any empirical science; in practice, one assesses the applicability of the assumption to a given experiment by examining the care with which the experimental design precludes any non-conspiratorial dependencies between the preparation of the systems and the settings of instruments^{17}.

The precise mathematical setup for formulas (3) and (4) is the following: one considers a probability space \((\Lambda,P)\) and, with each \(\lambda\in\Lambda\) and each choice of the parameters \(\alpha_1\ ,\) \(\alpha_2\ ,\) one associates a probability measure \(P_{\alpha_1,\alpha_2}(\cdot|\lambda)\) on the set of possible values for the pair \((A_1,A_2)\ .\) Formula (4) says that, for each \(\lambda\in\Lambda\ ,\) the probability measure \(P_{\alpha_1,\alpha_2}(\cdot|\lambda)\) factorizes as the product of a probability measure \(P_{\alpha_1}(\cdot|\lambda)\) (the marginal of \(A_1\) given \(\lambda\)) that depends only on \(\alpha_1\) and a probability measure \(P_{\alpha_2}(\cdot|\lambda)\) (the marginal of \(A_2\) given \(\lambda\)) that depends only on \(\alpha_2\ .\) The probability distribution (3) of \((A_1,A_2)\) that is observed in the experiment (and for which quantum theory makes predictions) is obtained from \(P_{\alpha_1,\alpha_2}(\cdot|\lambda)\) by averaging (i.e., integrating) over \(\lambda\) with respect to the probability measure of the space \((\Lambda,P)\ .\) As in Section 3, we define the correlation \(C(\alpha_1,\alpha_2)\) as the expected value of the product \(A_1A_2\) for a given choice of \(\alpha_1\ ,\) \(\alpha_2\ :\)

\[C(\alpha_1,\alpha_2)=E_{\alpha_1,\alpha_2}(A_1A_2)=\int_\Lambda E_{\alpha_1,\alpha_2}(A_1A_2|\lambda)\,\mathrm dP(\lambda),\]

where \(E_{\alpha_1,\alpha_2}(A_1A_2|\lambda)\) is the expected value of the product \(A_1A_2\) with respect to the probability measure \(P_{\alpha_1,\alpha_2}(\cdot|\lambda)\ .\)

Now it is easy to prove the *CHSH inequality*^{18} (after John F. Clauser, Michael A. Horne, Abner Shimony, and Richard A. Holt). This inequality is also known in the literature as the *CHSH–Bell inequality* or simply "Bell's inequality". In this article we will call it the "CHSH–Bell inequality" in order to distinguish it from the inequalities of Section 3 which are used in the versions of Bell's theorem that require the assumption of certain perfect (anti-)correlations.

**Theorem.** Suppose that the possible values for \(A_1\) and \(A_2\) are \(\pm1\ .\) Under the mathematical setup described above, assuming the factorizability condition (4), the following inequality holds:

\[|C(\mathbf a,\mathbf b)-C(\mathbf a,\mathbf c)|+|C(\mathbf a',\mathbf b)+C(\mathbf a',\mathbf c)|\le2,\]

for any choice of parameters \(\mathbf a\ ,\) \(\mathbf b\ ,\) \(\mathbf c\ ,\) \(\mathbf a'\ .\)

**Proof.** It follows from (4) that \(E_{\alpha_1,\alpha_2}(A_1A_2|\lambda)=E_{\alpha_1}(A_1|\lambda)E_{\alpha_2}(A_2|\lambda)\ ,\) for all \(\lambda\ ,\) \(\alpha_1\ ,\) \(\alpha_2\ .\) Thus:

\[\begin{aligned}
&|C(\mathbf a,\mathbf b)-C(\mathbf a,\mathbf c)|+|C(\mathbf a',\mathbf b)+C(\mathbf a',\mathbf c)|\\
&\quad\le\int_\Lambda\Big[\big|E_{\mathbf a}(A_1|\lambda)\big|\,\big|E_{\mathbf b}(A_2|\lambda)-E_{\mathbf c}(A_2|\lambda)\big|+\big|E_{\mathbf a'}(A_1|\lambda)\big|\,\big|E_{\mathbf b}(A_2|\lambda)+E_{\mathbf c}(A_2|\lambda)\big|\Big]\,\mathrm dP(\lambda)\\
&\quad\le\int_\Lambda\Big[\big|E_{\mathbf b}(A_2|\lambda)-E_{\mathbf c}(A_2|\lambda)\big|+\big|E_{\mathbf b}(A_2|\lambda)+E_{\mathbf c}(A_2|\lambda)\big|\Big]\,\mathrm dP(\lambda),
\end{aligned}\]

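where the second inequality follows from the observation that \(|E_\alpha(A_1|\lambda)|\le1\) (since \(A_1=\pm1\)). The conclusion now follows directly from the following elementary lemma, applied for each \(\lambda\) with \(x=E_{\mathbf b}(A_2|\lambda)\) and \(y=E_{\mathbf c}(A_2|\lambda)\ :\) the integrand is then at most 2, and \(P\) is a probability measure.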

**Lemma.** For real numbers \(x,y\in[-1,1]\ ,\) we have that \(|x-y|+|x+y|\le2\ .\)

**Proof.** Squaring \(|x-y|+|x+y|\) we obtain \(2x^2+2y^2+2|x^2-y^2|\ ,\) which is either equal to \(4x^2\) or to \(4y^2\ ;\) in either case, it is less than or equal to 4. Taking square roots gives the claim.
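
The theorem can also be probed numerically. The Python sketch below assumes, for illustration only, that \(\Lambda\) is a finite set; it draws random probability weights for \(\lambda\) and random conditional expectations \(E_\alpha(A_1|\lambda)\ ,\) \(E_\beta(A_2|\lambda)\in[-1,1]\ ,\) forms the correlations through the factorized product \(E_{\alpha_1}(A_1|\lambda)E_{\alpha_2}(A_2|\lambda)\ ,\) and records the largest left hand side encountered. Random sampling proves nothing, of course; it merely illustrates the bound.

```python
import random

def chsh_lhs(weights, e1, e2):
    """Left-hand side of the CHSH-Bell inequality for a finite local model.

    weights[k]: probability of the k-th value of lambda
    e1[s][k]:   E_s(A_1 | lambda_k) for settings s in {"a", "a'"}
    e2[s][k]:   E_s(A_2 | lambda_k) for settings s in {"b", "c"}
    Factorizability (4) gives E(A_1 A_2 | lambda) = e1 * e2.
    """
    def corr(s1, s2):
        return sum(w * e1[s1][k] * e2[s2][k] for k, w in enumerate(weights))
    return abs(corr("a", "b") - corr("a", "c")) + abs(corr("a'", "b") + corr("a'", "c"))

def random_local_model(n_lambda, rng):
    """A hypothetical random model with n_lambda possible values of lambda."""
    raw = [rng.random() for _ in range(n_lambda)]
    total = sum(raw)
    weights = [x / total for x in raw]
    e1 = {s: [rng.uniform(-1, 1) for _ in range(n_lambda)] for s in ("a", "a'")}
    e2 = {s: [rng.uniform(-1, 1) for _ in range(n_lambda)] for s in ("b", "c")}
    return weights, e1, e2

rng = random.Random(0)
worst = max(chsh_lhs(*random_local_model(5, rng)) for _ in range(10_000))
```

The bound 2 is attained, for example, by a deterministic model with a single value of \(\lambda\) and all four conditional expectations equal to 1.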

For the experiment considered in Section 2 (spin measurements on a pair of particles in the singlet state), quantum theory predicts \(C(\alpha,\beta)=-\alpha\cdot\beta\) (where the dot denotes the Euclidean inner product and the oriented axes \(\alpha\ ,\) \(\beta\) are identified with their corresponding unit vectors). For this experiment, the CHSH–Bell inequality is maximally violated by the quantum predictions if \(\mathbf b\) and \(\mathbf c\) are mutually orthogonal, \(\mathbf a'\) bisects \(\mathbf b\) and \(\mathbf c\ ,\) and \(\mathbf a\) bisects \(\mathbf b\) and the opposite axis \(-\mathbf c\ .\) In that case, the left hand side is equal to \(2\sqrt2\ .\) We remark also that Bell's original inequality is obtained from the CHSH–Bell inequality by setting \(\mathbf a'=\mathbf b\) and using \(C(\mathbf b,\mathbf b)=-1\ .\)
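
The value \(2\sqrt2\) can be checked directly from \(C(\alpha,\beta)=-\alpha\cdot\beta\ .\) In the Python sketch below, the four axes are represented (as a convenient assumption) by their angles in a common plane, so that \(\alpha\cdot\beta\) is the cosine of the difference of the angles.

```python
import math

def quantum_corr(angle1, angle2):
    """Singlet prediction C(alpha, beta) = -alpha . beta for coplanar unit vectors."""
    return -math.cos(angle1 - angle2)

# Axes as angles in one plane (the particular numbers are an illustrative choice).
b = 0.0
c = math.pi / 2          # orthogonal to b
a_prime = math.pi / 4    # bisects b and c
a = -math.pi / 4         # bisects b and -c (which points at -pi/2)

lhs = (abs(quantum_corr(a, b) - quantum_corr(a, c))
       + abs(quantum_corr(a_prime, b) + quantum_corr(a_prime, c)))
```

Each of the two absolute values equals \(\sqrt2\ ,\) so `lhs` is \(2\sqrt2\approx2.83\ ,\) exceeding the local bound 2.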

We have thus established again the incompatibility between locality and certain predictions of quantum theory: we have proven that the CHSH–Bell inequality, which is violated by the quantum predictions, follows from the assumption of locality (and the "no conspiracy" condition).

Let us now take advantage of the mathematical formulation of (a consequence of) locality presented above — the factorizability condition (4) — in order to formulate mathematically the version of Bell's theorem presented in Section 4. Since Bell's inequality theorem has already been formulated mathematically, it remains for us to do so for the EPR argument as well. The mathematical statement (which we will prove in a moment) corresponding to the EPR argument is the following: assuming (4) and the perfect anti-correlations \(P_{\alpha,\alpha}(A_1\ne A_2)=1\ ,\) there exist random variables \(Z^i_\alpha\) on the probability space \((\Lambda,P)\) such that:

\[(5)\quad P_{\alpha_1,\alpha_2}\big(A_i=Z^i_{\alpha_i}(\lambda)|\lambda\big)\;\stackrel{(4)}=\;P_{\alpha_i}\big(A_i=Z^i_{\alpha_i}(\lambda)|\lambda\big)=1,\]

for \(i=1,2\) and all \(\lambda\ ,\) \(\alpha_1\ ,\) and \(\alpha_2\ .\)

Notice that (using integration over \(\lambda\)) equality (5) implies that, for all \(\alpha_1\ ,\) \(\alpha_2\ ,\) the probability distribution of the pair of random variables \((Z^1_{\alpha_1},Z^2_{\alpha_2})\) is equal to the (unconditional) probability distribution (3) of the pair of outcomes \((A_1,A_2)\) (the probability distribution observed in the experiment, for which quantum theory makes predictions). In particular, we have \(P_{\alpha_1,\alpha_2}(A_1\ne A_2)=P(Z^1_{\alpha_1}\ne Z^2_{\alpha_2})\ .\) The random variables \(Z^i_\alpha\) are precisely the ingredients necessary for the proof of Bell's inequality theorem and hence we obtain, as just announced, a mathematical formulation of the version of Bell's theorem presented in Section 4.

Here is the proof of the mathematical statement corresponding to the EPR argument: assume (4) and the perfect anti-correlations. It follows from \(P_{\alpha,\alpha}(A_1\ne A_2)=1\) that \(P_{\alpha,\alpha}(A_1\ne A_2|\lambda)=1\) holds for all^{19} \(\lambda\in\Lambda\ .\) When \(\alpha_1=\alpha_2=\alpha\ ,\) for each \(\lambda\in\Lambda\ ,\) the outcomes \(A_1\) and \(A_2\) given \(\lambda\) (whose joint probability distribution is \(P_{\alpha,\alpha}(\cdot|\lambda)\)) are at the same time independent (by (4)) and perfectly anti-correlated. An elementary lemma from probability theory shows that this can happen only if they are not really random, i.e., if they are constant. The constant may depend upon \(\alpha\) and \(\lambda\ ,\) and thus there are functions \(f_i\) such that \(P_{\alpha,\alpha}\big(A_i=f_i(\alpha,\lambda)|\lambda\big)=1\ .\) Define the random variables \(Z^i_\alpha\) by setting \(Z^i_\alpha(\lambda)=f_i(\alpha,\lambda)\ .\) In order to conclude the proof, observe that condition (4) implies:

\[P_{\alpha_1,\alpha_2}\big(A_i=Z^i_{\alpha_i}(\lambda)|\lambda\big)=P_{\alpha_i}\big(A_i=Z^i_{\alpha_i}(\lambda)|\lambda\big)=P_{\alpha_i,\alpha_i}\big(A_i=Z^i_{\alpha_i}(\lambda)|\lambda\big)=1.\]
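
The "elementary lemma" invoked above can be seen as follows for \(\pm1\)-valued outcomes: if \(A_1\ ,\) \(A_2\) are independent with \(P(A_1=1)=p\) and \(P(A_2=1)=q\ ,\) then \(P(A_1=A_2)=pq+(1-p)(1-q)\ ,\) which vanishes only if both terms vanish, i.e., only if \((p,q)=(1,0)\) or \((0,1)\ .\) The short Python sketch below (hypothetical helper names) merely illustrates this on a grid of rational values, using exact arithmetic.

```python
from fractions import Fraction

def prob_disagree(p, q):
    """P(A_1 != A_2) for independent +-1 variables with
    P(A_1 = +1) = p and P(A_2 = +1) = q (a product distribution, as in (4))."""
    return p * (1 - q) + (1 - p) * q

# Scan a grid of rational probabilities; keep those giving perfect anti-correlation.
grid = [Fraction(k, 20) for k in range(21)]
solutions = [(p, q) for p in grid for q in grid if prob_disagree(p, q) == 1]
```

Only the two deterministic cases survive, in agreement with the algebraic argument.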

## Bell's definition of locality

As we have stressed above, the crucial assumption from which one can derive various empirically-testable Bell-type inequalities is locality. (Bell sometimes also used the term *local causality* instead of locality.) Bell explained the "principle of local causality" as follows:

The direct causes (and effects) of events are near by, and even the indirect causes (and effects) are no further away than permitted by the velocity of light.^{20}

In relativistic terms, locality is the requirement that goings-on in one region of spacetime should not affect — should not influence — happenings in space-like separated regions.

Although we have not presented any kind of careful mathematical definition of locality, we were able to prove in the previous sections that certain quantum predictions are incompatible with locality. This was achieved by means of the formulation of a mathematically precise *necessary* condition for locality (in the context of a particular type of experiment): namely, the factorizability condition (4). It is possible, however, to formulate locality itself in a rigorous way, at least for a certain class of physical theories. Bell actually proposed two (subtly different) such formulations, one in his 1975 paper "The theory of local beables"^{15} and the other, which we will explain here, in his 1990 paper "La nouvelle cuisine"^{21}.

"Beable" is Bell's term for those elements of a theory which are "to be taken seriously, as corresponding to something real"^{22}. As an example, Bell cites the electric and magnetic fields of classical electromagnetism:

In Maxwell's electromagnetic theory, for example, the fields \(\mathbf E\) and \(\mathbf H\) are 'physical' (beables, we will say) but the potentials \(\mathbf A\) and \(\phi\) are 'non-physical'. Because of gauge invariance the same physical situation can be described by very different potentials.^{23}

As Bell points out, it is therefore no violation of locality "that in Coulomb gauge the scalar potential propagates with infinite velocity. It is not really supposed to *be* there."^{24}

The beables of a theory have values that (according to the theory) are supposed to exist independently of any observation or experiment. In this regard Bell contrasts the notion of beable with the notion of "observable" which features prominently in orthodox quantum theory:

The concept of 'observable' lends itself to very precise mathematics when identified with 'self-adjoint operator'. But physically, it is a rather woolly concept. It is not easy to identify precisely which physical processes are to be given the status of 'observations' and which are to be relegated to the limbo between one observation and another. So it could be hoped that some increase in precision might be possible by concentration on the beables, which can be described in 'classical terms', because they are there.^{25}

This woolliness suggests that the notion of "observation" should not appear in the *formulation* of (candidate) fundamental physical theories. Indeed, every aspect of a physical process (including those processes we humans classify as "observations") should be completely reducible to the actions and interactions of some physically real objects — some beables. In an "observation", both the "observed system" and the relevant experimental apparatus, for example, must be made of beables, and anything like a measurement outcome which (say) emerges anew from the system-apparatus interaction must be contained in the final disposition of those beables.

Locality is the idea that physical influences cannot propagate faster than light. It thus presupposes a clear identification, for a given candidate theory, of which elements are supposed to correspond to something that is physically real. Here is how Bell makes this point: "No one is obliged to consider the question 'What cannot go faster than light?'. But if you decide to do so, then the above remarks suggest the following: you must identify in your theory 'local *be*ables'"^{26}. (We will discuss this again later.)

*Local* beables are those elements of a theory which should correspond to elements of physical reality *living within spacetime*. These should include the representations of the ordinary objects of our experience, such as tables, chairs and experimental equipment. As Bell puts it:

These are the mathematical counterparts in the theory to real events at definite places and times in the real world (as distinct from the many purely mathematical constructions that occur in the working out of physical theories, as distinct from things which may be real but not localized, ...).^{27}

All the beables familiar from so-called "classical theories" are of this type: for example, the already mentioned fields in classical electromagnetism, or the positions of particles in classical mechanics. The possibility of *non-local* beables (corresponding to elements of physical reality which are not in spacetime) arises especially with respect to the several candidate versions of quantum theory, all of which involve a wave function (or quantum state) which, as a function on an abstract configuration space, will be a non-local beable if it is granted beable status at all. In the words of Bell:

... the wavefunction as a whole lives in a much bigger space, of \(3N\)-dimensions. It makes no sense to ask for the amplitude or phase or whatever of the wavefunction at a point in ordinary space. It has neither amplitude nor phase nor anything else until a multitude of points in ordinary three-space are specified.^{28}

Thus, one can meaningfully talk about "the local beables living within a region \(R\) of spacetime" or, more simply, "the local beables in region \(R\)"^{29}. These represent, according to the theory, what is supposed to be really happening in \(R\ .\) On the other hand, there is no such thing as a non-local beable "living inside" a given region of spacetime.

Not surprisingly, it is less straightforward to assess the locality of theories positing non-local beables. Let us then turn to Bell's formulation of locality (which applies straightforwardly to theories of exclusively local beables) and then return to the question of non-local beables and how Bell's formulation can be extended to apply, for example, to theories positing quantum wave functions as non-local beables.

The thought motivating Bell's formulation is that a complete specification of the physical state of (i.e., the beables in) a spacetime region which closes off the past light cone of some event should include everything needed to make predictions about that event. More precisely, such a specification should render further information — about goings-on at space-like separation from the event in question — irrelevant and/or redundant for making predictions about that event. Referring to the spacetime diagram reproduced at right, Bell formulated this as follows:

A theory will be said to be locally causal if the probabilities attached to values of local beables in a space-time region 1 are unaltered by specification of values of local beables in a space-like separated region 2, when what happens in the backward light cone of 1 is already sufficiently specified, for example by a full specification of local beables in a space-time region 3.^{31}

More precisely, the following equality of conditional probabilities must hold in a local theory: \[P(x_1|x_2,X_3)=P(x_1|X_3),\] where \(x_1\) (resp., \(x_2\)) denotes the value of a local beable in region 1 (resp., in region 2) and \(X_3\) denotes a full specification of the local beables in region 3.

As Bell goes on to explain, it is crucial that region 3 shields^{32} region 1 from the overlapping past light cones of 1 and 2, and also that the specification of events in region 3 be complete; otherwise information about events in region 2 could well alter the probabilities assigned to events in 1 without this implying any violation of locality. For example, in a local non-deterministic theory, an event might occur subsequent to region 3 which was not predictable on the basis of even a complete specification of the local beables in region 3; such an event could then influence events in its own future light cone, giving rise to correlations — not predictable on the basis of information about region 3 — between space-like separated events. Such a mechanism could make information about events at space-like separation from 1 highly relevant for making predictions about 1, even when those predictions are conditionalized on complete information about region 3. The requirement that region 3 shields 1 from the overlapping past light cones of 1 and 2, however, precludes this possibility: the only way for information about such a region 2 to be relevant for predictions about 1 (once complete information about 3 has been taken into account) is if something somewhere is influencing events outside its future light cone, i.e., violating locality. It is likewise clear that information about goings-on in region 2 may very well usefully supplement predictions about events in 1 made on the basis of an *incomplete* specification of the values of the local beables in region 3, without any violation of locality being implied.
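The role of completeness can be illustrated with a small exact computation. The Python sketch below uses a hypothetical toy model that is not taken from the text: region 3 contains a single fair bit `lam`, and in each wing the outcome is `lam` flipped by independent local noise (flip probabilities `EPS1`, `EPS2`, chosen arbitrarily), assumed to be generated inside the backward light cone of that wing only. Conditioning on the complete specification of region 3 (here just `lam`) makes `x2` irrelevant for predicting `x1`, as Bell's condition requires; conditioning on nothing, an *incomplete* specification, leaves `x2` highly informative about `x1` even though nothing non-local happens in the model.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy model (illustration only): lam lives in region 3,
# n1 and n2 are local noise bits generated on the way to regions 1 and 2.
P_LAM = {0: Fraction(1, 2), 1: Fraction(1, 2)}
EPS1, EPS2 = Fraction(1, 10), Fraction(1, 5)   # flip probabilities (arbitrary)


def joint():
    """Exact joint distribution over (lam, x1, x2)."""
    dist = {}
    for lam, n1, n2 in product((0, 1), repeat=3):
        p = P_LAM[lam]
        p *= EPS1 if n1 else 1 - EPS1
        p *= EPS2 if n2 else 1 - EPS2
        dist[(lam, lam ^ n1, lam ^ n2)] = p   # outcome = lam XOR local noise
    return dist


def prob(dist, event):
    """Probability of the set of (lam, x1, x2) satisfying event."""
    return sum(p for key, p in dist.items() if event(*key))


def cond(dist, event, given):
    """Conditional probability P(event | given)."""
    both = prob(dist, lambda *k: event(*k) and given(*k))
    return both / prob(dist, given)


def local_causality_holds(dist):
    """Check P(x1 | x2, lam) == P(x1 | lam) for all values (Bell's condition)."""
    for lam, x1, x2 in product((0, 1), repeat=3):
        lhs = cond(dist, lambda l3, a, b: a == x1,
                   lambda l3, a, b: l3 == lam and b == x2)
        rhs = cond(dist, lambda l3, a, b: a == x1,
                   lambda l3, a, b: l3 == lam)
        if lhs != rhs:
            return False
    return True


if __name__ == "__main__":
    dist = joint()
    print(local_causality_holds(dist))                                   # True
    print(prob(dist, lambda l3, a, b: a == 1))                           # 1/2
    print(cond(dist, lambda l3, a, b: a == 1, lambda l3, a, b: b == 1))  # 37/50
```

Exact fractions are used so that the equality in Bell's condition is tested exactly rather than up to rounding.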

It is important to appreciate that Bell's proposed definition of locality applies primarily to candidate *theories*. There is then no particular mystery (at least for clearly-formulated theories) about, for example, which elements have beable status, or what a complete specification of local beables in some spacetime region might involve.

As suggested earlier in this section, Bell's definition of locality does not apply to arbitrary theories; also, it is not clear how one should rigorously define locality for arbitrary theories. Nevertheless, Bell's formulation can be extended in order to provide *necessary conditions* for the locality of the theories to which it does not apply as a definition (and, of course, such necessary conditions can be used to establish non-locality).

To begin with, Bell's definition of locality does not apply to theories positing non-local beables. Indeed, one should certainly expect that not only the local beables in region 3, but also the non-local beables, should be relevant for making predictions about region 1. And of course one cannot talk about "the non-local beables in region 3", since non-local beables do not live inside regions of spacetime. However, for the only seriously suggested example of a non-local beable, the wave function or quantum state, one can talk about its value on a Cauchy surface, and it is natural (for the purpose of assessing the locality of the theory) to take "the complete description of the physical state of region 3" to mean the values of all local beables in region 3 *and* the values of the wave function on a given family of Cauchy surfaces that cover region 3.

Problems with Bell's definition also arise for *non-Markovian* theories, i.e., theories in which influences might "jump" over space-like surfaces. In that case, the region 3 displayed in the figure might not work properly as a shield and local^{33} non-Markovian theories could be incorrectly diagnosed by Bell's definition of locality as being non-local. For the non-Markovian case, Bell's definition should then be modified so that the equality \(P(x_1|x_2,X_3)=P(x_1|X_3)\) is required to hold only when region 3 is "sufficiently thick", in some precise sense that would have to be specified, depending on how non-Markovian the theory is. In the worst case scenario, the equality \(P(x_1|x_2,X_3)=P(x_1|X_3)\) would be required to hold only when region 3 includes the entire interior of the past light cone of region 1 from some point down. We observe, however, that this modified form of Bell's definition might incorrectly diagnose some non-local theories as being local^{34} so that it works only as a *necessary* condition for locality.

Let us now apply Bell's proposed definition of locality to the kind of experiment considered in the previous sections. (For the sake of simplicity, in what follows we will consider only theories for which Bell's definition applies directly, though it should be obvious how to adapt the exposition to more general theories for which — as discussed above — only a necessary condition for locality is available.) Recall that in Section 5 we took as a consequence of locality the factorizability condition (4); this condition involves a random variable \(\lambda\) that, by a "no conspiracy" assumption, is independent of \((\alpha_1,\alpha_2)\ .\)

Consider the spacetime diagram at right. Regions 1 and 2 contain the experiments performed on the two systems and the star in the intersection of the interior of their past light cones indicates the source. (The "particle worldlines" in the diagram are merely an illustration and play no role in the argument.) Thus, the parameter \(\alpha_1\) and the outcome \(A_1\) are (functions of) local beables in region 1 and, similarly, \(\alpha_2\) and \(A_2\) are (functions of) local beables in region 2. Note that the indicated region 3 shields off both regions 1 and 2 from their overlapping past light cones, so Bell's locality condition will require that facts about region 1 (in particular, \(\alpha_1\) and \(A_1\)) must be irrelevant for predictions about region 2, once a complete specification of the local beables in region 3 is given (and vice versa, exchanging the role of 1 and 2).

Denoting a complete specification of the local beables in region 3 by \(X\ ,\) we start with the identity^{35}:
\[P_{\alpha_1,\alpha_2}(A_1,A_2|X)=P_{\alpha_1,\alpha_2}(A_1|A_2,X)P_{\alpha_1,\alpha_2}(A_2|X)\]
and then we use locality to obtain \(P_{\alpha_1,\alpha_2}(A_1|A_2,X)=P_{\alpha_1}(A_1|X)\) and \(P_{\alpha_1,\alpha_2}(A_2|X)=P_{\alpha_2}(A_2|X)\ .\) It follows that:
\[P_{\alpha_1,\alpha_2}(A_1,A_2|X)=P_{\alpha_1}(A_1|X)P_{\alpha_2}(A_2|X).\]
The equality above looks like the factorizability condition (4), but there is a difference: the variable \(X\) includes much more data than the \(\lambda\) that we considered in Section 5. While it is reasonable to assume (as a "no conspiracy" condition) that \(\lambda\) is independent of \((\alpha_1,\alpha_2)\ ,\) it is not reasonable to assume that \(X\) is independent of \((\alpha_1,\alpha_2)\ .\) Indeed, since \(X\) is the *complete* specification of the local beables in region 3, and thus includes, for instance, the state of whatever devices will later choose the settings, it is not only possible but *likely* that \(X\) will fail to be independent of \((\alpha_1,\alpha_2)\ .\)

Of course, assuming Bell's definition of locality alone, we cannot prove the existence of a subset \(\lambda\) of the data codified by \(X\) that is independent of \((\alpha_1,\alpha_2)\) and for which (4) holds. Indeed, the existence of this \(\lambda\) is not a consequence of locality alone, as it depends also on the assumption of a "no conspiracy" condition. Unlike locality, the "no conspiracy" condition involves anthropocentric elements, such as the distinction between the parameters \(\alpha_1\ ,\) \(\alpha_2\) (instrument settings, controllable by human experimenters) and the various other beables that are relevant for the experiment. For this reason, it does not seem possible to write down a clean mathematical definition of a "non-conspiratorial" theory (as Bell did for a local theory) in terms of conditional probabilities for values of beables posited by the theory^{36}. (As usual, anthropocentric conditions are vague.) In particular, it is not possible to give a mathematical proof that for a "non-conspiratorial" local theory, there exists a \(\lambda\) independent of \((\alpha_1,\alpha_2)\) for which condition (4) holds. (Obviously, a mathematical proof cannot relate a mathematically formulated condition to a condition that is not formulated mathematically^{37}.)

Nevertheless, we can argue (without any pretension to mathematical formalization) that for a "non-conspiratorial" local theory, a subset \(\lambda\) of the data codified by \(X\) satisfying these properties does exist. We do that by analyzing the meaning of various subsets of the local beables living in region 3. To begin with, notice that (it is likely that) the vast majority of those beables are irrelevant for the experiment and can be ignored. Let us then focus on the beables that are relevant for the experiment. Some of these beables (call them \(\mathfrak a_1\)) will determine or influence the setting \(\alpha_1\ .\) Similarly, some of these beables (call them \(\mathfrak a_2\)) will determine or influence the setting \(\alpha_2\ .\) One can think about \(\mathfrak a_i\) as the beables describing a computer getting ready to choose the parameter \(\alpha_i\ .\) (In a deterministic theory, the parameter \(\alpha_i\) should be a *function* of \(\mathfrak a_i\ ,\) but for a stochastic theory there could be additional randomness in the process that generates \(\alpha_i\) from \(\mathfrak a_i\ .\)) We take \(\lambda\) to denote the remaining local beables in region 3 that are relevant for the experiment.

For a "non-conspiratorial" theory, one must be able to define the sets of local beables \(\mathfrak a_1\ ,\) \(\mathfrak a_2\ ,\) and \(\lambda\) in such a way that \(\lambda\) is independent of \((\alpha_1,\alpha_2)\ .\) Let us now argue that, if the theory is local, condition (4) must hold for this \(\lambda\ .\) Since, among the local beables in region 3, only \(\lambda\) and \(\mathfrak a_1\) are relevant for the outcome \(A_1\) and since \(\mathfrak a_1\) is relevant to \(A_1\) only through \(\alpha_1\ ,\) the same thoughts motivating Bell's definition of locality lead to the conclusion that, upon conditioning on \(\lambda\) and \(\alpha_1\ ,\) the outcome \(A_1\) should be independent of \((A_2,\alpha_2)\ ,\) i.e., \(P_{\alpha_1,\alpha_2}(A_1|A_2,\lambda)=P_{\alpha_1}(A_1|\lambda)\ .\) For similar reasons, we have \(P_{\alpha_1,\alpha_2}(A_2|\lambda)=P_{\alpha_2}(A_2|\lambda)\) and hence \(P_{\alpha_1,\alpha_2}(A_1,A_2|\lambda)=P_{\alpha_1,\alpha_2}(A_1|A_2,\lambda)P_{\alpha_1,\alpha_2}(A_2|\lambda)=P_{\alpha_1}(A_1|\lambda)P_{\alpha_2}(A_2|\lambda)\ ,\) i.e., (4) holds.
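The practical payoff of (4) can be checked numerically. The Python sketch below is an illustration, not a proof: it draws random models in which `lambda` takes finitely many values with settings-independent weights (the "no conspiracy" condition) and each outcome depends only on the local setting and `lambda` (condition (4)), and evaluates the standard CHSH combination \(S=E(0,0)+E(0,1)+E(1,0)-E(1,1)\) for \(\pm1\) outcomes. Here 0 and 1 label the two settings on each side; this labeling is ours and may differ from the form of the CHSH–Bell inequality used earlier. No sampled model exceeds \(|S|=2\).

```python
import random


def chsh_value(weights, p1, p2):
    """CHSH combination S for a model obeying factorizability (4).

    weights[k] : probability of the k-th value of lambda (same for all settings)
    p1[a][k]   : P(A_1 = +1 | alpha_1 = a, lambda = k), a in {0, 1}
    p2[b][k]   : P(A_2 = +1 | alpha_2 = b, lambda = k), b in {0, 1}
    """
    def corr(a, b):
        # Factorizability: E(A_1 A_2 | lambda) = E(A_1 | lambda) E(A_2 | lambda),
        # with E(A_i | lambda) = 2 P(A_i = +1 | lambda) - 1 for outcomes +/-1.
        return sum(w * (2 * p1[a][k] - 1) * (2 * p2[b][k] - 1)
                   for k, w in enumerate(weights))

    return corr(0, 0) + corr(0, 1) + corr(1, 0) - corr(1, 1)


def random_local_model(rng, n_lambda):
    """Random settings-independent weights and random local response probabilities."""
    raw = [rng.random() + 1e-9 for _ in range(n_lambda)]
    total = sum(raw)
    weights = [r / total for r in raw]
    p1 = [[rng.random() for _ in range(n_lambda)] for _ in range(2)]
    p2 = [[rng.random() for _ in range(n_lambda)] for _ in range(2)]
    return weights, p1, p2


def max_abs_chsh(trials=10_000, n_lambda=5, seed=0):
    """Largest |S| found over randomly drawn factorizable models."""
    rng = random.Random(seed)
    return max(abs(chsh_value(*random_local_model(rng, n_lambda)))
               for _ in range(trials))


if __name__ == "__main__":
    # A deterministic local model (single lambda, all outcomes +1) attains S = 2.
    print(chsh_value([1.0], [[1.0], [1.0]], [[1.0], [1.0]]))  # 2.0
    print(max_abs_chsh() <= 2 + 1e-12)                         # True
```

The bound is attained only by deterministic response functions, which is why random interior models stay strictly below 2.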

## Experiments

Bell's theorem brings out the existence of a contradiction between the empirical predictions of quantum theory and the assumption of locality. Since locality has been widely taken to be an implication of relativity theory, one thus has some grounds for wondering if the relevant predictions of quantum theory are correct. This question can only be addressed through experiment.

The first really convincing experimental tests of the relevant quantum predictions were produced in 1981–1982 by Aspect *et al.*^{38}. These experiments involved measuring the polarizations of pairs of photons emitted (in a state of total angular momentum zero analogous to the singlet state mentioned previously) during the decay from an excited state of calcium. Correlations between the outcomes of the two polarization measurements were monitored as the axes along which the polarizations were being measured were changed. Results consistent with the quantum predictions were observed and a Bell-type inequality was violated with high statistical significance. A subsequent experiment^{39} demonstrated that the quantum predictions continued to hold even when the apparatus settings (i.e., the axes along which the incoming photons' polarizations were measured) were not fixed until the last possible moment, after the photons had already been emitted by the source. (Rather than physically rotating a piece of measurement apparatus, which would be a practical impossibility on the ten-nanosecond timescale of a photon's traversal of the several meters between the calcium source and a detector, Aspect *et al.* used an ingenious device that shunted each incoming photon, effectively at random for the purpose at hand, to one of two polarization measurement devices of fixed orientation.)

The innovation of Aspect *et al.* represented an important first step toward closing the so-called *locality loophole*^{40}. Recall that the locality assumption used, for example, in the derivation of the CHSH–Bell inequality requires that the (conditional) probability distribution for possible outcomes of one of the measurements be independent of the choice of apparatus setting for the other measurement. But this is a consequence of the *relativistic* notion of locality only if each apparatus setting is made too late for it to affect (via influences propagating at the speed of light) the distant measurement. Fixing the final apparatus settings only after the photons (moving at the speed of light) have been emitted ensured this. However, the 1982 experiment of Aspect *et al.* involved, on each side of the apparatus, a *periodic* switching between the two possible settings (albeit with incommensurate frequencies on the two sides); one could thus conceivably still worry that the photon source and/or the nearby measurement were somehow "anticipating" the final distant apparatus setting, thus violating the formal locality assumption without violating relativity's supposed prohibition on superluminal influences.

The locality loophole was closed much more convincingly in a more recent experiment in Innsbruck by Weihs *et al.*^{41} in 1998. The basic experimental procedure was analogous to that of Aspect *et al.*, but the Innsbruck group used entangled pairs of photons created in parametric down-conversion (instead of the decay of calcium atoms, as in Aspect *et al.*) and high-speed electro-optic modulators to switch between two polarization measurement settings on each side. Importantly, the modulators could be controlled on a nanosecond timescale, allowing the choice between the two possible apparatus settings on each side to be made (by independent, spatially separated quantum random number generators) only well after the window for possible light-speed influence on the distant measurement had passed. Leaving aside the possibility of a cosmic conspiracy, this setup thus guarantees that the formal locality assumption can be violated only if some data from the measurement on one side is somehow being broadcast, faster than light, to the photon and/or measuring device on the opposite side and influencing the results there. In light of Bell's theorem the experiment thus quite conclusively establishes the relativistic non-locality of the actual world.

Other experiments (Tittel *et al.*^{42}) have shown that the quantum predictions remain accurate even when the particles are allowed to separate by several kilometers before their polarizations are measured. Also, in experiments designed to close the so-called *detection loophole*^{43} (Rowe *et al.*^{44} and Matsukevich *et al.*^{45}), Bell-type inequalities were violated even when a much higher fraction of all emitted pairs was successfully detected.

Another interesting recent experiment (Salart *et al.*^{46}) relates experimental violations of a Bell-type inequality to the motion of the earth in order to put lower limits on the speed (relative to some hypothetical preferred frame) of any involved superluminal influences.

## Bell's theorem and non-contextual hidden variables

The most naive reading of standard presentations of quantum theory might lead one to the following view: the quantum observables, normally mathematically represented by self-adjoint operators on a Hilbert space, are the dynamical variables of the theory and represent elements of physical reality (beables). According to this view, when one talks about "measuring the observable \(A\)" one simply means that \(A\) has a value which is unknown to the experimenter and that the "measurement" makes the experimenter aware of this value (just as, say, measuring the cholesterol in my blood informs me of the pre-existing amount of cholesterol in my blood). A theory that assigns well-defined values to all quantum observables at all times (and for which "measurement" of an observable simply reveals that pre-existing assigned value) is usually known as a *non-contextual hidden variables theory*. There are several theorems implying that non-contextual hidden variables theories are incompatible with certain quantum predictions. (The various forms of) Bell's inequality can also be used to establish this incompatibility.

Let us explain the appropriate mathematical formulation of non-contextual hidden variables theories. Given a complex Hilbert space \(\mathcal H\ ,\) whose rays correspond to (pure) states of a quantum system, a non-contextual hidden variables theory associates with each quantum state \(\psi\) and each self-adjoint operator \(A\) on \(\mathcal H\) a random variable \(Z_A\) on some probability space \((\Lambda,P)\) (that might depend on \(\psi\)). The value of \(Z_A\) at a point \(\lambda\) of \(\Lambda\) represents the value of the observable \(A\) for a system that, according to the theory, is described both by the quantum state \(\psi\) and by the extra variable \(\lambda\ .\) (Successive preparations of the quantum state \(\psi\) might generate different values of \(\lambda\ .\) The probability measure \(P\) on \(\Lambda\) describes their statistics.)

The compatibility condition between the non-contextual hidden variables theory and the empirical predictions of the given quantum theory is the following: if \(A_1, \ldots, A_n\) are mutually commuting self-adjoint operators on \(\mathcal H\ ,\) then the spectral measure^{47} on \(\mathbb R^n\) defined from the operators \((A_1,\ldots,A_n)\) and the state \(\psi\) should coincide with the distribution of the random vector \((Z_{A_1},\ldots,Z_{A_n})\ .\) Notice that both inequality (1) and Bell's original inequality follow from the assumption of the existence of a non-contextual hidden variables theory (that covers the relevant experiments); also the CHSH–Bell inequality can be derived from this assumption. The violation of such inequalities by the quantum predictions therefore shows that non-contextual hidden variables theories are incompatible with the quantum predictions for a state space \(\mathcal H\) having at least four dimensions. (Four is, of course, the number of dimensions of the Hilbert space associated to the spin degrees of freedom of two spin-1/2 particles.) Some authors call this result "Bell's theorem" and this has given rise to a few misunderstandings.
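For two spin-1/2 particles the violation can be checked directly. The Python sketch below (standard library only; helper names are ours) first enumerates every non-contextual assignment of \(\pm1\) values to the four observables entering the CHSH combination \(S=E(a,b)+E(a,b')+E(a',b)-E(a',b')\), then computes the same combination from the quantum correlations of the singlet state, for spin components in the \(x\)–\(z\) plane at angles \(0\), \(\pi/2\) (first particle) and \(\pi/4\), \(-\pi/4\) (second particle). These angles are a standard choice assumed here, not taken from the text.

```python
import math
from itertools import product


def kron(a, b):
    """Kronecker (tensor) product of two square matrices given as nested lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]


def spin(theta):
    """cos(theta) sigma_z + sin(theta) sigma_x: spin along an axis in the x-z plane."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [s, -c]]


# Singlet state (|01> - |10>)/sqrt(2) in the basis |00>, |01>, |10>, |11>.
SINGLET = [0.0, 1 / math.sqrt(2), -1 / math.sqrt(2), 0.0]


def expectation(op, psi):
    """<psi|op|psi> for a real matrix op and a real state vector psi."""
    dim = len(psi)
    return sum(psi[i] * op[i][j] * psi[j] for i in range(dim) for j in range(dim))


def quantum_chsh(a, a2, b, b2):
    """CHSH combination of singlet correlations for the given measurement angles."""
    def corr(x, y):
        return expectation(kron(spin(x), spin(y)), SINGLET)
    return corr(a, b) + corr(a, b2) + corr(a2, b) - corr(a2, b2)


def noncontextual_max():
    """Max |S| over all +/-1 value assignments to A(a), A(a'), B(b), B(b')."""
    return max(abs(za * zb + za * zb2 + za2 * zb - za2 * zb2)
               for za, za2, zb, zb2 in product((1, -1), repeat=4))


if __name__ == "__main__":
    print(noncontextual_max())  # 2
    s = quantum_chsh(0.0, math.pi / 2, math.pi / 4, -math.pi / 4)
    print(round(abs(s), 6), round(2 * math.sqrt(2), 6))  # 2.828427 2.828427
```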

The term "non-contextual" is motivated by the following: consider observables \(A\ ,\) \(B\) and \(C\) with \([A,B]=0\ ,\) \([A,C]=0\) but \([B,C]\ne0\ ,\) so that while \(A\) and \(B\) are jointly measurable and \(A\) and \(C\) are jointly measurable, \(B\) and \(C\) are not. (Here \([A,B]=AB-BA\) denotes the commutator of \(A\) and \(B\ .\)) Then one can perform an experiment which counts as a "measurement" of both \(A\) and \(B\) and one can also perform an experiment which counts as a "measurement" of both \(A\) and \(C\ ,\) but these experiments must be different. If one assumes that such experiments reveal pre-determined values (i.e., if one assumes that nothing truly random in the outcome is being generated by the interaction of the apparatus with the system) then, since the two experiments under consideration are different, there is no justification for assuming that the pre-determined outcomes for the "measurement" of the observable \(A\) must be the same for both experiments. More precisely: within a theory that describes the quantum system using — besides the quantum state \(\psi\) — an extra variable \(\lambda\) that *determines* the outcomes of experiments, one could have *different* functions of \(\lambda\) (for a given \(\psi\)) associated to different strategies for "measuring" an observable \(A\ .\) A non-contextual hidden variables theory is therefore one that ignores the possibility that the value assigned to \(A\) might depend on the *experimental context*.
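A concrete instance of this situation (our choice, not the text's) uses two spin-1/2 particles with \(A=\sigma_z\otimes\sigma_z\), \(B=\sigma_z\otimes 1\) and \(C=\sigma_x\otimes\sigma_x\). The short Python sketch below verifies the commutation pattern with exact integer matrices: \(A\) commutes with both \(B\) and \(C\), while \(B\) and \(C\) do not commute, so \(A\) can be "measured" jointly with \(B\) or jointly with \(C\), but the two experiments cannot be combined into one.

```python
def kron(a, b):
    """Kronecker (tensor) product of two square matrices given as nested lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def commutator(a, b):
    """[a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(ab, ba)]


def is_zero(m):
    return all(x == 0 for row in m for x in row)


I2 = [[1, 0], [0, 1]]
SX = [[0, 1], [1, 0]]
SZ = [[1, 0], [0, -1]]

A = kron(SZ, SZ)  # sigma_z (x) sigma_z
B = kron(SZ, I2)  # sigma_z on particle 1 only
C = kron(SX, SX)  # sigma_x (x) sigma_x

if __name__ == "__main__":
    print(is_zero(commutator(A, B)))  # True
    print(is_zero(commutator(A, C)))  # True
    print(is_zero(commutator(B, C)))  # False
```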

In simple terms, the assumption of "non-contextuality" is the assumption that the outcome of an experiment for "measuring" an observable \(A\) does not depend on the experiment — just on the given observable. But what distinct experiments that count as "measurements" of a given observable \(A\) must have in common is only *the probability distribution* on the set of all possible outcomes, for every possible preparation procedure for the system on which the "measurement" is going to be performed. In other words: two different experimental arrangements \(\mathcal E\ ,\) \(\mathcal E'\) designed for "measuring" the observable \(A\) should (within a theory in which the outcomes are pre-determined) be associated to (possibly) *different* random variables \(Z_A^{\mathcal E}\ ,\) \(Z_A^{\mathcal E'}\) on the probability space \((\Lambda,P)\) in which \(\lambda\) takes values. Of course, agreement with the quantum predictions requires that these different random variables have the same probability distribution (for every \(\psi\)).
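A toy illustration of this point, in Python: on a hypothetical four-point probability space, two outcome functions, standing for two different arrangements \(\mathcal E\) and \(\mathcal E'\) that both "measure" the same \(\pm1\)-valued observable, have identical distributions yet disagree on half of the points \(\lambda\). Agreement with the quantum predictions constrains only the distributions, so nothing forces the two functions to coincide.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical finite probability space (Lambda, P): four equally likely points.
LAMBDA = range(4)
P = {lam: Fraction(1, 4) for lam in LAMBDA}


def z_e(lam):
    """Hypothetical outcome function for arrangement E."""
    return 1 if lam < 2 else -1


def z_e_prime(lam):
    """Hypothetical outcome function for a different arrangement E'."""
    return 1 if lam % 2 == 0 else -1


def distribution(z):
    """Distribution of the random variable z on (Lambda, P)."""
    dist = Counter()
    for lam in LAMBDA:
        dist[z(lam)] += P[lam]
    return dict(dist)


if __name__ == "__main__":
    print(distribution(z_e) == distribution(z_e_prime))           # True
    print([lam for lam in LAMBDA if z_e(lam) != z_e_prime(lam)])  # [1, 2]
```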

Since everyone knows that different random variables can have the same probability distribution, it is somewhat surprising that so many are surprised by the incompatibility between "non-contextuality" and the quantum predictions. A possible explanation for this surprise might be the fact that many quantum observables usually carry nicknames (such as "momentum" and "energy") which are motivated by their association with certain quantities that are physically real according to some classical theory from which the given quantum theory was obtained by "quantization". Of course, words such as "momentum" and "energy" are quite powerful and suggest that one is talking about some physically real quantity. However, the statement that every quantum observable corresponds to a physically real quantity that is revealed by a measurement of that observable is logically incompatible with quantum theory.

Another way to prove that non-contextual hidden variables theories are not compatible with the quantum predictions is to prove the impossibility of a *value map*, i.e., a map \(v\) associating with each self-adjoint operator \(A\) on \(\mathcal H\) an element \(v(A)\) of the spectrum of \(A\) in such a way that \(v(A+B)=v(A)+v(B)\) and \(v(AB)=v(A)v(B)\ ,\) whenever \(A\) and \(B\) are *commuting* self-adjoint operators^{48}. It is easy to see that the existence of a non-contextual hidden variables theory compatible with the quantum predictions implies the existence of a value map; one must simply fix (a quantum state and) an element \(\lambda\) of the probability space \((\Lambda,P)\) where the random variables \(Z_A\) are defined and set \(v(A)=Z_A(\lambda)\)^{49}. Thus, the impossibility of a value map implies the incompatibility of non-contextual hidden variables theories with the quantum predictions.

The impossibility of a value map when \(\mathrm{dim}(\mathcal H)\ge3\) follows from *Gleason's theorem*^{50} (after Andrew M. Gleason) and also from the *Kochen–Specker theorem*^{51} (after Simon B. Kochen and Ernst P. Specker). Another proof of the impossibility of a value map when \(\mathrm{dim}(\mathcal H)\ge3\) was given by Bell himself^{5}, after Gleason and before Kochen–Specker. (See also Section IV of Mermin^{52} and references therein for other proofs.) When \(\mathrm{dim}(\mathcal H)=2\ ,\) the corresponding operator algebra is somewhat trivial and it turns out that a non-contextual hidden variables theory compatible with the quantum predictions *is* possible; a concrete example was constructed by Bell^{5}. When \(\mathrm{dim}(\mathcal H)\ge4\ ,\) a much simpler proof of the impossibility of a value map can be obtained from *Mermin's theorem*^{53} (after David Mermin).
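For \(\mathrm{dim}(\mathcal H)=4\) the impossibility of a value map can be checked by brute force. We assume here that the argument referred to as Mermin's theorem is the well-known \(3\times3\) "magic square" of two-qubit Pauli observables (often attributed to Peres and Mermin); the Python sketch below is our reconstruction of that construction, not a quotation of the cited proof. Observables in each row and in each column commute, so a value map must respect their products; the rows multiply to the identity and the columns to \(1\), \(1\), \(-1\) times the identity. Multiplying the row constraints forces the product of all nine values to be \(+1\), while the column constraints force \(-1\), so the search finds no valid assignment.

```python
from itertools import combinations, product

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]


def kron(a, b):
    """Kronecker (tensor) product of two square matrices given as nested lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def scalar(c, n=4):
    """c times the n x n identity."""
    return [[c if i == j else 0 for j in range(n)] for i in range(n)]


def product_of(mats):
    out = mats[0]
    for m in mats[1:]:
        out = matmul(out, m)
    return out


# Peres-Mermin square: every entry squares to the identity (spectrum {+1, -1}).
SQUARE = [
    [kron(X, I2), kron(I2, X), kron(X, X)],
    [kron(I2, Y), kron(Y, I2), kron(Y, Y)],
    [kron(X, Y), kron(Y, X), kron(Z, Z)],
]
ROW_SIGNS = [1, 1, 1]    # each row multiplies to +identity
COL_SIGNS = [1, 1, -1]   # the last column multiplies to -identity


def lines():
    """Yield (three commuting observables, sign of their product) for rows and columns."""
    for r in range(3):
        yield SQUARE[r], ROW_SIGNS[r]
    for c in range(3):
        yield [SQUARE[r][c] for r in range(3)], COL_SIGNS[c]


def square_is_consistent():
    """Check commutativity within each line and the product signs claimed above."""
    for mats, sign in lines():
        if any(matmul(a, b) != matmul(b, a) for a, b in combinations(mats, 2)):
            return False
        if product_of(mats) != scalar(sign):
            return False
    return True


def value_map_exists():
    """Search all 2**9 assignments of +/-1 for one whose products match every line's sign."""
    for vals in product((1, -1), repeat=9):
        v = [vals[0:3], vals[3:6], vals[6:9]]
        rows_ok = all(v[r][0] * v[r][1] * v[r][2] == ROW_SIGNS[r] for r in range(3))
        cols_ok = all(v[0][c] * v[1][c] * v[2][c] == COL_SIGNS[c] for c in range(3))
        if rows_ok and cols_ok:
            return True
    return False


if __name__ == "__main__":
    print(square_is_consistent())  # True
    print(value_map_exists())      # False
```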

## Bell's theorem without inequalities

There are approaches to establishing the incompatibility between locality and the quantum predictions that do not use probabilistic inequalities, but instead rely only on perfect correlations. In this section, we sketch three such approaches. The first is based on a generalization of the EPR argument given by Schrödinger; it has appeared in the general form presented here in Hemmick^{54}, but particular cases of it have appeared before^{55}. The second approach is based on a *GHZ state*^{56} (after Daniel M. Greenberger, Michael A. Horne, and Anton Zeilinger) and the third approach is based on *Hardy states*^{57} (after Lucien Hardy).

We start by presenting Hemmick's approach. It depends on the notion of a maximally entangled state. Given finite-dimensional Hilbert spaces \(\mathcal H_1\ ,\) \(\mathcal H_2\) having the same dimension \(n\) and orthonormal bases \((e_1,\ldots,e_n)\ ,\) \((e'_1,\ldots,e'_n)\) of \(\mathcal H_1\) and \(\mathcal H_2\ ,\) respectively, one defines the *maximally entangled state* \(\psi\) associated to these bases by^{58}:
\[\psi=\frac{1}{\sqrt n}\sum_{i=1}^n e_i\otimes e'_i\in\mathcal H_1\otimes\mathcal H_2.\]

If a composite system is in a maximally entangled state then to each observable \(A\) on \(\mathcal H_1\) there can be associated another observable \(\overline A\) on \(\mathcal H_2\) in such a way that a measurement of \(A\) on the system corresponding to \(\mathcal H_1\) and a measurement of \(\overline A\) on the system corresponding to \(\mathcal H_2\) must always give the same outcome^{59}. We have thus a situation analogous to the one considered in the EPR argument, namely, perfect correlations between outcomes of measurements of \(A\) on the first system and outcomes of measurements of \(\overline A\) on the second system. Assuming locality and that the measurements are performed at space-like separation, we conclude that a measurement of \(A\) on the first system must actually be revealing a pre-existing value \(v(A)\ ,\) which must depend only on \(A\) and not on the experimental arrangement used to measure \(A\ .\) This map \(v\) is then a value map and any proof of the impossibility of a value map for the Hilbert space \(\mathcal H_1\) leads then to a proof of non-locality. As discussed above, such proofs can be given for \(\mathrm{dim}(\mathcal H_1)\ge3\ .\)
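The perfect correlation can be checked numerically under one common convention, which we assume here rather than take from the text: the matrix of \(\overline A\) in the basis \((e'_i)\) is the complex conjugate of the matrix of \(A\) in the basis \((e_i)\). For a Hermitian \(A\) this gives \((A\otimes 1)\psi=(1\otimes\overline A)\psi\); since the spectral projectors of \(A\) are real polynomials in \(A\), the same identity holds for them, which forces the two outcomes to agree. The Python sketch below verifies the identity for a hypothetical \(3\times3\) Hermitian matrix and for its square.

```python
import math

# Hypothetical Hermitian observable A on H_1 (n = 3), written in the basis (e_i).
A = [[2, 1 - 1j, 0.5j],
     [1 + 1j, -1, 3],
     [-0.5j, 3, 0]]


def kron(a, b):
    """Kronecker (tensor) product of two square matrices given as nested lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def conj(m):
    """Entrywise complex conjugate: the assumed matrix of A-bar in the basis (e'_i)."""
    return [[complex(x).conjugate() for x in row] for row in m]


def max_entangled(n):
    """psi = (1/sqrt n) sum_i e_i (x) e'_i as a vector of length n*n."""
    psi = [0.0] * (n * n)
    for i in range(n):
        psi[i * n + i] = 1 / math.sqrt(n)
    return psi


def close(u, v, tol=1e-12):
    return all(abs(x - y) < tol for x, y in zip(u, v))


def acts_alike(op):
    """Check (op (x) 1) psi == (1 (x) conj(op)) psi for the maximally entangled psi."""
    n = len(op)
    psi = max_entangled(n)
    left = matvec(kron(op, identity(n)), psi)
    right = matvec(kron(identity(n), conj(op)), psi)
    return close(left, right)


if __name__ == "__main__":
    print(acts_alike(A))             # True
    print(acts_alike(matmul(A, A)))  # True (likewise for any real polynomial in A)
```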

Let us now turn to the second approach, based on a GHZ state for three spin-1/2 particles. What we present here is a modification of a proof of the impossibility of a value map for eight-dimensional Hilbert spaces given in Mermin^{60}.
We consider a setup with space-like separated measurements of spin components being performed on three spin-1/2 particles. For the \(i\)-th particle, \(i=1,2,3\ ,\) the experimenter can choose between measuring spin either along the \(x\)-axis (the observable \(\sigma^i_x\)) or along the \(y\)-axis (the observable \(\sigma^i_y\)). As usual, the possible outcomes (for each particle) are either \(1\) or \(-1\).

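Consider the following four \(\pm1\)-valued observables:

\[U_1=\sigma^1_x\sigma^2_x\sigma^3_x,\quad U_2=\sigma^1_x\sigma^2_y\sigma^3_y,\quad U_3=\sigma^1_y\sigma^2_x\sigma^3_y,\quad U_4=\sigma^1_y\sigma^2_y\sigma^3_x.\]
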
A straightforward computation shows that these four observables are mutually commuting and that their product \(U_1U_2U_3U_4\) equals minus the identity^{61}. Therefore, there exists a state \(\psi\) which is an eigenstate for all of them and, moreover, the corresponding eigenvalues \(u_1\ ,\) \(u_2\ ,\) \(u_3\ ,\) and \(u_4\) satisfy \(u_1u_2u_3u_4=-1\ .\) Assume that the state prepared by the source is this common eigenstate \(\psi\ .\)
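The "straightforward computation" can be sketched numerically with plain Python complex arithmetic. We assume \(U_1=\sigma^1_x\sigma^2_x\sigma^3_x\) (as the subsequent argument requires) and \(U_2=\sigma^1_x\sigma^2_y\sigma^3_y\), \(U_3=\sigma^1_y\sigma^2_x\sigma^3_y\), \(U_4=\sigma^1_y\sigma^2_y\sigma^3_x\); the labeling of the last three is our assumption. We also assume that the standard GHZ vector \((|000\rangle+|111\rangle)/\sqrt2\) in the \(\sigma_z\) basis can serve as the common eigenstate \(\psi\).

```python
import math

# Pauli matrices as nested lists (complex entries)
SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]

def kron(a, b):
    """Kronecker (tensor) product of square matrices a (n x n) and b (m x m)."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def apply(a, v):
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]

def near(x, y, tol=1e-12):
    """Entrywise comparison of equally shaped (nested) lists of numbers."""
    if isinstance(x, list):
        return all(near(p, q, tol) for p, q in zip(x, y))
    return abs(x - y) < tol

def three(a, b, c):
    """Operator a on particle 1, b on particle 2, c on particle 3."""
    return kron(kron(a, b), c)

# Assumed U_1..U_4 (U_2..U_4 labeling is hypothetical)
U = [three(SX, SX, SX), three(SX, SY, SY), three(SY, SX, SY), three(SY, SY, SX)]

all_commute = all(near(matmul(A, B), matmul(B, A)) for A in U for B in U)
u_product = matmul(matmul(matmul(U[0], U[1]), U[2]), U[3])
minus_identity = [[-1 if i == j else 0 for j in range(8)] for i in range(8)]
product_is_minus_identity = near(u_product, minus_identity)

# GHZ vector (|000> + |111>)/sqrt(2); basis index = 4*b1 + 2*b2 + b3
ghz = [0.0] * 8
ghz[0] = ghz[7] = 1 / math.sqrt(2)
eigenvalues = []
for Uk in U:
    w = apply(Uk, ghz)
    u = w[0] / ghz[0]
    assert near(w, [u * c for c in ghz]), "GHZ vector is not an eigenvector"
    eigenvalues.append(round(u.real))
```

Under these assumptions the eigenvalues come out as \((u_1,u_2,u_3,u_4)=(1,-1,-1,-1)\), whose product is \(-1\), consistent with \(U_1U_2U_3U_4=-I\).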

Since \(\psi\) is an eigenstate of \(U_1\) with eigenvalue \(u_1\ ,\) if the measured observables on the three particles are chosen to be \(\sigma^1_x\ ,\) \(\sigma^2_x\ ,\) and \(\sigma^3_x\) then the product of the three outcomes obtained must be equal to \(u_1\ .\) We now use locality and a three-sided analogue of the EPR argument^{62} to infer that the measurement of the observables \(\sigma^i_x\) must be revealing pre-existing values \(v(\sigma^i_x)\) satisfying \(v(\sigma^1_x)v(\sigma^2_x)v(\sigma^3_x)=u_1\ .\) Analogous arguments based on the fact that \(\psi\) is an eigenstate of the other observables \(U_2\ ,\) \(U_3\ ,\) and \(U_4\) can be used to show that the measurement of *any one* of the six observables \(\sigma^i_\alpha\) must be revealing a pre-existing value \(v(\sigma^i_\alpha)\ ,\) with the corresponding three-factor products of these pre-existing values being equal to \(u_2\ ,\) \(u_3\ ,\) and \(u_4\ .\) Since each of the six observables \(\sigma^i_\alpha\) appears in exactly two of the four products, the product \(u_1u_2u_3u_4\) is the square of the product of the six values \(v(\sigma^i_\alpha)\) and is therefore equal to 1. This contradicts the fact that \(u_1u_2u_3u_4=-1\ .\)
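The counting step can be confirmed by brute force over all \(2^6=64\) candidate value maps. This sketch assumes the three-factor products \(U_1=\sigma^1_x\sigma^2_x\sigma^3_x\), \(U_2=\sigma^1_x\sigma^2_y\sigma^3_y\), \(U_3=\sigma^1_y\sigma^2_x\sigma^3_y\), \(U_4=\sigma^1_y\sigma^2_y\sigma^3_x\); the string labels such as `"1x"` for \(v(\sigma^1_x)\) are our own.

```python
from itertools import product

# Assumed three-factor products: U_1 = s1x s2x s3x, U_2 = s1x s2y s3y,
# U_3 = s1y s2x s3y, U_4 = s1y s2y s3x.  "1x" stands for v(sigma^1_x), etc.
TRIPLES = [("1x", "2x", "3x"), ("1x", "2y", "3y"),
           ("1y", "2x", "3y"), ("1y", "2y", "3x")]
NAMES = ["1x", "1y", "2x", "2y", "3x", "3y"]

def predicted_u(values):
    """Given a hypothetical value map (name -> +1/-1), return [u_1, u_2, u_3, u_4]."""
    us = []
    for triple in TRIPLES:
        p = 1
        for name in triple:
            p *= values[name]
        us.append(p)
    return us

def count_value_maps(target):
    """Number of the 64 value maps whose product u_1 u_2 u_3 u_4 equals target."""
    count = 0
    for signs in product((1, -1), repeat=len(NAMES)):
        u1, u2, u3, u4 = predicted_u(dict(zip(NAMES, signs)))
        if u1 * u2 * u3 * u4 == target:
            count += 1
    return count
```

Every one of the 64 maps gives \(u_1u_2u_3u_4=1\), and none gives \(-1\), because each value enters exactly twice.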

Finally, let us turn to the third approach, based on Hardy states. Hardy states constitute a large class of entangled states for two spin-1/2 particles: namely, every entangled state that is not maximally entangled is a Hardy state. We follow the notation of Goldstein^{63}, where the reader can find the detailed description of the relevant states and observables. The experimental setup consists of two spin-1/2 particles. For the \(i\)-th particle, \(i=1,2\ ,\) the experimenter can choose between measuring either the observable \(U_i\) or the observable \(W_i\ .\) The possible outcomes (for each particle) are taken to be either 0 or 1. As usual, measurements are performed at space-like separation. In what follows, for simplicity, we use the same notation for a quantum observable and for the outcome of its measurement (which, of course, is not assumed *a priori* to be pre-determined). For a given Hardy state, the observables \(U_i\) and \(W_i\) can be constructed so that the following four facts hold: (i) \(U_1U_2=0\ ;\) (ii) if \(U_1=0\) then \(W_2=1\ ;\) (iii) if \(U_2=0\) then \(W_1=1\ ;\) (iv) with positive probability, \(W_1=W_2=0\ .\)
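For concreteness, here is one Hardy state together with observables satisfying (i)–(iv). This choice is ours and is not necessarily the parametrization used by Goldstein: \(\psi=(|00\rangle+|01\rangle+|10\rangle)/\sqrt3\) in a fixed basis \(|0\rangle,|1\rangle\) for each particle, with \(U_i\) the projector onto \(|1\rangle\) and \(W_i\) the projector onto \((|0\rangle+|1\rangle)/\sqrt2\). The sketch computes the four probabilities the facts constrain.

```python
import math

S3 = 1 / math.sqrt(3)
# Hypothetical Hardy state (|00> + |01> + |10>)/sqrt(3), amplitudes indexed by (bit1, bit2)
PSI = {(0, 0): S3, (0, 1): S3, (1, 0): S3, (1, 1): 0.0}

R2 = 1 / math.sqrt(2)
# Eigenvectors in the (|0>, |1>) basis, keyed by outcome.
# U_i = 1 <-> |1>, U_i = 0 <-> |0>;  W_i = 1 <-> |+>, W_i = 0 <-> |->.
U_VEC = {0: (1.0, 0.0), 1: (0.0, 1.0)}
W_VEC = {1: (R2, R2), 0: (R2, -R2)}

def joint_prob(vec1, vec2):
    """Born-rule probability |<vec1 (x) vec2, psi>|^2 (all amplitudes are real here)."""
    amp = sum(vec1[a] * vec2[b] * PSI[(a, b)] for a in (0, 1) for b in (0, 1))
    return amp ** 2

p_i = joint_prob(U_VEC[1], U_VEC[1])    # P(U1=1, U2=1): fact (i) needs 0
p_ii = joint_prob(U_VEC[0], W_VEC[0])   # P(U1=0, W2=0): fact (ii) needs 0
p_iii = joint_prob(W_VEC[0], U_VEC[0])  # P(W1=0, U2=0): fact (iii) needs 0
p_iv = joint_prob(W_VEC[0], W_VEC[0])   # P(W1=0, W2=0): fact (iv) needs > 0
```

The first three probabilities vanish, and \(P(W_1=W_2=0)=1/12\). This state is entangled but not maximally entangled, in line with the characterization of Hardy states above.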

It is easy to obtain a contradiction between locality and these four facts. Namely, assume locality^{64}. By (i), in a given run of the experiment, either \(U_1\) or \(U_2\) must carry a pre-existing value of 0 (but it *does not* follow — as it does in the EPR argument — that both of them carry pre-existing values). In a given run of the experiment in which \(U_1\) carries the pre-existing value 0, it follows from (ii) that \(W_2\) must carry the pre-existing value 1. Similarly, in a given run of the experiment in which \(U_2\) carries the pre-existing value 0, it follows from (iii) that \(W_1\) must carry the pre-existing value 1. Hence, in each run of the experiment, either \(W_1\) or \(W_2\) carries the pre-existing value 1 and this contradicts (iv).

## Controversy and common misunderstandings

There are many misunderstandings and controversies surrounding Bell's theorem. To begin with, we should note that while "Bell's theorem" as we have presented it here conforms with Bell's own understanding of his theorem, many other authors have presented very different arguments, with very different conclusions, under the name "Bell's theorem", and many of those authors are not even aware that what they are presenting differs so radically from Bell's own views. In this section we will try to shed some light on this messy state of affairs.

### Missing the role of the EPR argument entirely

Section II of Bell's original paper^{6}, which contains the celebrated theorem, starts (after a short introduction, contained in the first section) with a one-paragraph recapitulation of the EPR argument (reformulated in terms of spin), i.e., it starts with the assumption of locality and it deduces from this assumption the existence of a "more complete specification of the state" (the kind of more complete specification of the state that Einstein thought would suffice to restore locality to quantum theory). Bell then claims that this more complete specification (the pre-existing values for the outcomes of spin measurements) leads to an incompatibility with the quantum predictions. The mathematical details of the proof of this incompatibility (i.e., the derivation of Bell's inequality) appear later, in Section IV.

It seems likely that many readers didn't pay sufficient attention to the first paragraph of Section II (the beginning of Bell's argument, i.e., the EPR argument) and jumped too quickly to the mathematical considerations of Section IV (the proof of the inequality). Indeed, Bell himself comments in a footnote of a later paper that "the commentators have almost universally reported that it [his original paper] begins with deterministic hidden variables"^{65}. One should also take into account the fact that, by the time Bell's theorem came along, the EPR argument was about 30 years old and it had been forgotten by many (or considered to have been somehow refuted by Bohr^{66}). Whatever the historical explanation for the misunderstanding might be, it turns out that the general understanding within the physics community regarding Bell's theorem was that it established the impossibility of "hidden variables" (or, for those a little better informed, of "local hidden variables") and the role of the EPR argument (i.e., the fact that the non-locality problem arises anyway if we regard quantum theory as complete) was missed entirely. Moreover, many authors took Bell's theorem to be a proof that, with regard to the EPR argument, Einstein was wrong and Bohr was right. While it is indeed true that Bell's theorem shows that Einstein was wrong, in that the *assumption* of the EPR argument (locality) turned out to be incorrect, it is not at all true that Bell's theorem shows that the EPR *argument* itself is not valid. In fact the EPR argument is correct and plays a crucial role in establishing that its main assumption is wrong. (That is, of course, a standard situation whenever a *reductio ad absurdum* is performed.)

Of course, not all commentators on Bell's theorem who disagree (knowingly or not) with Bell's conclusion have missed the EPR argument entirely. There are controversies and misunderstandings surrounding the EPR argument itself (or some poorly formulated version of it) and we shall discuss those in Subsection 10.3. But it should be recalled that not all presentations of Bell's theorem even require the EPR argument: for instance, the CHSH–Bell inequality can be proven directly from locality, as we have shown in Section 5. Naturally, this alternative presentation of Bell's theorem generates controversies and misunderstandings of its own. One could, for instance, disagree with the claim that the mathematical formulation (4) of (a consequence of) the locality condition is adequate. (We will discuss some of those controversies and misunderstandings regarding the locality condition in Subsection 10.5.) In fact, though, since (as we have shown) the EPR argument becomes a simple mathematical theorem once (4) is accepted as a consequence of locality, it would be incoherent to accept that (4) is a consequence of locality but reject the EPR argument.

### Bell's theorem proves the impossibility of "local realism"

One currently popular account of Bell's theorem has it showing that "local realism" is incompatible with the quantum predictions, so that one has to choose between abandoning locality or abandoning realism. Those who talk about "local realism" rarely explain what they mean by "realism". (Is "realism" related to "hidden variables" of some sort? What exactly is meant by "hidden variables"? Is "realism" related to determinism?) And when they do, it often becomes clear that the "realism" under consideration *isn't* among the actual assumptions of Bell's theorem, so that abandoning that kind of realism isn't a viable strategy for saving locality^{67} ^{68}. In what follows we discuss a type of "realism" that is actually relevant for Bell's theorem, but, as we will see, abandoning that kind of realism won't turn out to be a viable strategy for saving locality either.

Before we go any further, it should be pointed out that the advent of quantum theory has made many physicists quite suspicious of any analysis of what might be happening in nature when "no one is watching". The double-slit experiment, the so-called "delayed-choice" experiments, Bohr's principle of complementarity (and the EPR–Bell argument itself) are sometimes seen as evidence that certain aspects of the microscopic world transcend human understanding or, alternatively, that any discussion concerning elements of physical reality is meaningless or beyond the scope of science. (The use of the words "quantum mechanical system", Bell once noted, can have "an unfortunate effect on the discussion"^{69}.) One should then allegedly settle for doing computations with operators and predicting the statistics of experimental outcomes. But, as discussed in Section 6, the very concept of locality involved in Bell's theorem cannot even be formulated without reference to elements of physical reality, i.e., to beables (and *local* beables)! Unfortunately, orthodox formulations of quantum theory are notoriously vague about which (if any) variables are to be taken seriously, as beables^{70}. This unfortunate situation muddles discussions regarding the locality of orthodox quantum theory.

The fact that "locality" cannot be seriously discussed without reference to local beables can be illustrated, for instance, by the following simple example: if a married man dies then his wife instantly becomes a widow. Of course, no one takes that to be an instance of non-locality. On the other hand, if the death of the husband were to cause, say, an instantaneous increase in the body temperature of his wife then this would indeed be considered a violation of locality. The difference between the two cases is that, while the state of being a widow isn't associated with any element of physical reality localized around the wife, her body temperature is: it is a function of the local beables in the region of spacetime containing the wife^{71}! Bell makes a similar point in his paper "La nouvelle cuisine":

When the Queen dies in London (may it long be delayed) the Prince of Wales, lecturing on modern architecture in Australia, becomes *instantaneously* King. (Greenwich Mean Time rules here.)^{72}

Bell goes on to present an example directly related to physics, namely, the example of the infinite velocity of propagation of the scalar potential in Coulomb gauge that we mentioned above. Bell then concludes (before he begins discussing the concept of local beable):

Conventions can propagate as fast as may be convenient. But then we must distinguish in our theory between what is convention and what is not.^{73}

While the EPR argument and Bell's theorem make no assumptions about what the elements of physical reality might be like, they cannot avoid talking about them. If one's criteria for accepting a sentence as being meaningful lead to the conclusion that any sentences that talk about "elements of physical reality" are meaningless^{74} then, according to such criteria, the relevant notion of locality for Bell's theorem (and thus Bell's theorem itself) becomes meaningless. Those who hold that position will avoid concluding that the quantum predictions imply non-locality, but they will also avoid the conclusion that the quantum predictions are compatible with locality! So refusing to talk about elements of reality is not a strategy by which one can defend the locality of quantum theory.

Hence, one possible notion of "realism" that is actually relevant for Bell's theorem is the willingness to accept statements about elements of physical reality as in principle meaningful. This "realism" isn't, however, an independent assumption that has to be taken together with "locality" for proving Bell's theorem; it is rather a *precondition* for the very meaningfulness of "locality". Thus, abandoning that sort of realism does not allow one to save locality; it merely prevents one from discussing it. (Another type of "realism" that is relevant for Bell's theorem will be discussed in Subsection 10.7.)

### Some controversy regarding the EPR argument

Analyses of the EPR argument normally focus on the presentation that appears in the original 1935 paper^{11} by Einstein, Podolsky, and Rosen. The EPR paper was written to present an argument establishing the incompleteness of quantum theory, i.e., establishing that there are some elements of physical reality that are omitted by the standard quantum description (in the sense that they are not determined by the quantum state^{75}).

With this goal in mind, the paper is careful about presenting a sufficient criterion for something to be an element of physical reality. The criterion presented is this: "If, without in any way disturbing a system, we can predict with certainty (i.e., with probability equal to unity) the value of a physical quantity, then there exists an element of physical reality corresponding to this physical quantity"^{76}. This criterion simply reflects the fact that if the outcome of some experiment isn't pre-determined by some element of physical reality (i.e., if it is not a function of something that was an element of physical reality before the experiment) then its outcome involves some randomness and hence cannot be predicted with certainty. Some commentators, however, have taken Einstein's criterion to be an assumption of some sort or have objected to Einstein's use of the notion of "element of physical reality". (As discussed above, the use of such a notion could indeed conflict with someone's philosophical position regarding what sentences are to be considered meaningful.)

A somewhat different kind of criticism against the EPR argument involves the claim that it (allegedly) depends on some suspicious reasoning involving counterfactuals^{77}. Here is an unfortunate formulation of the EPR argument that raises this kind of concern (under the setup with a pair of particles in the singlet state considered earlier): if the experimenter on one side chooses to measure spin along the \(z\)-axis then this experimenter can predict with certainty the outcome of the same measurement on the other side and therefore conclude that the outcome of this measurement corresponds to an element of physical reality there. The experimenter could, *instead*, choose to measure spin along the \(x\)-axis and, along the same lines, then conclude that the outcome of the same measurement on the other side corresponds to an element of physical reality. But the experimenter can only measure *either* the spin along the \(z\)-axis *or* the spin along the \(x\)-axis and thus (so the alleged rebuttal of the EPR argument goes) can't conclude that *both* the measurement outcomes (along the \(z\)-axis *and* along the \(x\)-axis) correspond to elements of physical reality on the other side, but rather only that one *or* the other (whichever one is in fact measured) does.

Considering this alleged rebuttal of the EPR argument, two observations are in order. First, for Einstein's original goal of establishing the incompleteness of quantum theory (assuming locality, of course), a simpler "single axis" version of the EPR argument is sufficient. The thesis of this "single axis" version of the EPR argument is merely that, when both experimenters choose to measure spin along the \(z\)-axis, then the outcomes of the measurements of spin along the \(z\)-axis are pre-determined. This "single axis" version of the EPR argument is (trivially) immune to the alleged rebuttal just discussed.

The second observation is that the more general "several axes" version of the EPR argument, which establishes the existence of pre-determined outcomes for measurements of spin along several axes *at once*, can also be formulated without any counterfactuals and is therefore also immune to the alleged rebuttal discussed above. (Of course, it is this "several axes" version of the EPR argument which is needed for Bell's theorem.)

Here is the formulation of the "several axes" version of the EPR argument that does not involve counterfactuals: in order to explain (without violation of locality) the fact that the outcomes will be perfectly anti-correlated if the experimenters both measure spin along the \(z\)-axis, one has to assume that these outcomes are pre-determined. The same goes for measurements of spin along the \(x\)-axis. Even though, in each run of the experiment, the measurements are performed along *either* the \(z\)-axis *or* the \(x\)-axis, the elements of physical reality that exist before the measurements *cannot depend on choices that will be made later by the experimenters*! This, indeed, doesn't follow from the assumption of locality itself, but it does follow from the so-called "no conspiracy" assumption which states, roughly speaking, that the pair of particles prepared by the source does not "know" in advance what experiments are going to be performed on them later^{78}.

### Classical versus quantum probability (and logic)

Some authors regard the experiments yielding a violation of Bell-type inequalities as proving that classical probability theory is wrong and that it should be replaced by *quantum* probability theory. The term "quantum probability" is sometimes used simply to refer to the probabilities predicted by quantum theory for outcomes of experiments; those are, of course, distinct from the probabilities predicted by, say, classical mechanics. The term is, however, more often used to refer to the theory of quantum probability spaces^{79}. A *quantum probability space* can be defined as a pair \((\mathcal H,\psi)\) where \(\mathcal H\) is a complex Hilbert space and \(\psi\) is a unit vector in \(\mathcal H\)^{80}. One then uses the term (quantum) *event* to refer to a closed subspace \(\mathcal S\) of \(\mathcal H\ ;\) to each such subspace one can assign a probability which is the number \(\langle \psi,P_{\mathcal S}\,\psi\rangle\in[0,1]\ ,\) where \(P_{\mathcal S}\) denotes the orthogonal projection onto \(\mathcal S\ .\) Both the set of events of a classical probability space (i.e., the \(\sigma\)-algebra of measurable subsets of the sample space) and the set of (quantum) events of a quantum probability space carry the mathematical structure of a *lattice*, i.e., both are partially ordered sets (in both cases the partial order is inclusion) and any pair of elements admits a least upper bound (the "or" operation) and a greatest lower bound (the "and" operation). In both cases, the greatest lower bound is the intersection while, for the classical case, the least upper bound is the union and for the quantum case it is the closure of the sum^{81}.
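As a small worked example of the probability assigned to a (quantum) event, the sketch below computes \(\langle\psi,P_{\mathcal S}\,\psi\rangle\) from an orthonormal basis \(s_1,\ldots,s_k\) of \(\mathcal S\), using \(P_{\mathcal S}\psi=\sum_j\langle s_j,\psi\rangle s_j\) and hence \(\langle\psi,P_{\mathcal S}\psi\rangle=\sum_j|\langle s_j,\psi\rangle|^2\). The vector \(\psi\) and the subspaces in \(\mathbb C^3\) are hypothetical choices for illustration.

```python
import math

def inner(u, v):
    """<u, v>, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def event_probability(onb, psi):
    """<psi, P_S psi> for the subspace S spanned by the orthonormal list onb."""
    return sum(abs(inner(s, psi)) ** 2 for s in onb)

# Hypothetical example in H = C^3 with the unit vector psi = (1, i, 1)/sqrt(3)
r = 1 / math.sqrt(3)
psi = [r, 1j * r, r]
e1, e2, e3 = [1, 0, 0], [0, 1, 0], [0, 0, 1]
p_S = event_probability([e1, e2], psi)          # S = span{e1, e2}
p_S_perp = event_probability([e3], psi)         # orthogonal complement of S
p_whole = event_probability([e1, e2, e3], psi)  # the whole space
```

Here \(\mathcal S\) gets probability \(2/3\), its orthogonal complement \(1/3\), and the whole space \(1\).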

Some formulas involving probabilities and the lattice operations of events that are true in the classical case are not true in the quantum case. This fact should not, however, be blamed on "quantum queerness" but on the fact that when one uses the words "and" and "or" to refer to the lattice operations of a quantum probability space one is *using these words with non-standard meanings*! Of course, one can always change the truth value of a sentence by changing the meaning of its words and this is not evidence that physical systems are strange and counterintuitive. The motivation for calling a closed subspace \(\mathcal S\) of a Hilbert space a (quantum) event is that a "quantum measurement" of the observable \(P_{\mathcal S}\) is a \(\{0,1\}\)-valued experiment; it yields the result 1 with probability \(\langle \psi,P_{\mathcal S}\,\psi\rangle\in[0,1]\ .\) However, given two arbitrary closed subspaces \(\mathcal S_1\ ,\) \(\mathcal S_2\) of \(\mathcal H\ ,\) the \(\{0,1\}\)-valued experiments associated with \(\mathcal S_1\) and \(\mathcal S_2\) are in general mutually incompatible. Therefore a statement of the form "both the measurement of \(P_{\mathcal S_1}\) and the measurement of \(P_{\mathcal S_2}\) yield the value 1" does not correspond to any experiment and in particular is not in any way related to the experiment that is associated with the subspace \(\mathcal S_1\cap\mathcal S_2\) (except for the case in which \(P_{\mathcal S_1}\) and \(P_{\mathcal S_2}\) commute, of course).
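A standard two-dimensional example (the choice of subspaces is ours, not the original text's) shows one such formula, the distributive law. In \(\mathcal H=\mathbb C^2\) with orthonormal basis \(e_0,e_1\), let \(\mathcal S_0=\mathrm{span}\{e_0\}\), \(\mathcal S_1=\mathrm{span}\{e_1\}\) and \(\mathcal S_+=\mathrm{span}\{(e_0+e_1)/\sqrt2\}\). Then

\[\begin{aligned}
\mathcal S_+\wedge(\mathcal S_0\vee\mathcal S_1)&=\mathcal S_+\cap\mathbb C^2=\mathcal S_+,\\
(\mathcal S_+\wedge\mathcal S_0)\vee(\mathcal S_+\wedge\mathcal S_1)&=\{0\}\vee\{0\}=\{0\}.
\end{aligned}\]

For \(\psi=(e_0+e_1)/\sqrt2\) the first event has probability 1 and the second probability 0, whereas for subsets of a sample space \(A\cap(B\cup C)=(A\cap B)\cup(A\cap C)\) always holds.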

The alleged need to abandon classical probability theory is sometimes also argued for on the basis of an incorrect analysis of the double-slit experiment. However, as long as the usual meanings of words are kept, there is no need to get rid of classical probability theory (or classical logic).
One should not confuse the use of the adjective "classical" as in "classical mechanics" with the use of the adjective "classical" as in "classical probability theory" or "classical logic". While classical mechanics is a *physical* theory which has been shown not to be *empirically* viable, classical probability theory and classical logic are methods of *reasoning* and cannot be tested empirically: such reasoning tools are what we use in order to draw conclusions from experiments so that we can decide which physical theories are or are not compatible with the results of those experiments.

Quantum probability theory is sometimes also seen as a new type of probability theory that allows for the possibility of *non-commuting random variables* which cannot be identified with (classical) random variables on a common probability space^{82}. Of course, there is nothing "non-classical" or particularly strange about the fact that random variables on a common probability space are not always the right way to model outcomes of experiments; in fact, there is no reason why one should expect that random variables on a common probability space could be used to model the outcomes of incompatible experiments (unless one works under the *assumption* that the outcomes of those experiments reveal functions of elements of reality that exist independently of whether or not the experiments are performed). We will return to this point later (in Subsection 10.6 and again in Subsection 10.8) when we discuss further misunderstandings related to the role of non-commutativity.

### Controversies and misunderstandings regarding the locality condition

The concept of locality that is relevant for Bell's theorem is sometimes mistakenly conflated with other concepts in physics that some authors also call "locality". For instance, when one studies (classical or quantum) field theories, one learns that the Lagrangian of the theory should not contain terms, such as \(\phi(x)\phi(y)\ ,\) that involve the values of the field \(\phi\) at two or more different points of spacetime; Lagrangians not containing such terms are often referred to as being *local*. When one studies quantum field theories, one learns that space-like separated observables should commute, a requirement normally referred to as the *locality condition* or the *local commutativity condition*. Local commutativity is used to show that *superluminal signalling* is not possible within quantum field theory, i.e., the correlations predicted by the theory for outcomes of measurements performed at space-like separation cannot be used for communication between the experimenters. (In the notation of Section 5, this means that the unconditional marginal distribution of the outcome \(A_1\) does not depend on the parameter \(\alpha_2\) and, similarly, the unconditional marginal of \(A_2\) does not depend on \(\alpha_1\ .\))
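The no-signalling property can be seen in a familiar example. The sketch assumes a singlet pair with spins measured along coplanar axes at angles \(\alpha_1,\alpha_2\), for which the standard quantum prediction is \(P_{\alpha_1,\alpha_2}(A_1=a_1,A_2=a_2)=\big(1-a_1a_2\cos(\alpha_1-\alpha_2)\big)/4\) with \(a_i=\pm1\). Summing over \(A_2\) shows that the marginal of \(A_1\) ignores \(\alpha_2\), even though the correlation depends on both settings.

```python
import math

def singlet_joint(a1, a2, alpha1, alpha2):
    """Assumed quantum prediction P_{alpha1,alpha2}(A1=a1, A2=a2) for a singlet pair,
    spins measured along coplanar axes at angles alpha1, alpha2 (outcomes +1/-1)."""
    return (1 - a1 * a2 * math.cos(alpha1 - alpha2)) / 4

def marginal_A1(a1, alpha1, alpha2):
    """Unconditional marginal of A1: sum the joint distribution over A2."""
    return sum(singlet_joint(a1, a2, alpha1, alpha2) for a2 in (1, -1))

def correlation(alpha1, alpha2):
    """Expected value of A1*A2, which depends on both settings."""
    return sum(a1 * a2 * singlet_joint(a1, a2, alpha1, alpha2)
               for a1 in (1, -1) for a2 in (1, -1))

for alpha2 in (0.0, math.pi / 3, math.pi / 2):
    print(f"alpha2={alpha2:.3f}  P(A1=+1)={marginal_A1(1, 0.0, alpha2):.3f}  "
          f"E={correlation(0.0, alpha2):+.3f}")
```

The marginal stays at \(1/2\) for every \(\alpha_2\), so the experimenter at side 1 learns nothing about the choice made at side 2, while \(E=-\cos(\alpha_1-\alpha_2)\) varies with both settings.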

The locality condition for the Lagrangian, local commutativity, and the impossibility of superluminal signalling are all, of course, conditions that are *related* to the concept of locality that is relevant for Bell's theorem. But they are not *equivalent* to it. In fact, the very pair correlations between observables at space-like separation on the basis of which Bell concluded that quantum mechanics is non-local are well-defined (in a frame-independent way) in quantum field theory precisely because, as a consequence of local commutativity, the observables do commute.

The fact that non-locality does not imply the possibility of superluminal signalling might appear particularly surprising; this fact will seem less surprising, however, if one keeps in mind that the concept of superluminal signalling involves anthropocentric notions such as *controllability* and *observability* that play no role in the concept of locality. In simpler words, the possibility of superluminal signalling is not just non-locality, it is a form of *controllable* non-locality. (Notice that, for instance, while the parameters \(\alpha_i\) are controllable by the experimenters, the outcomes \(A_i\) are not.)

Other misunderstandings are reflected by certain types of objections toward the adoption of the factorizability condition (4) as a consequence of locality. For instance, one might think that the \(\lambda\) appearing in (4) is a "hidden variable" or something suspicious of that sort. Nevertheless, the \(\lambda\) could, for instance, be nothing but the quantum state (which is taken to be fixed from one run of the experiment to the other, so that the probability space \((\Lambda,P)\) in which \(\lambda\) takes values is trivial in that case). Of course, if \(\lambda\) is nothing but the quantum state then condition (4) is not satisfied by the quantum predictions, as in that case there is nothing to explain the correlation between the outcomes. (This is precisely the point raised by the EPR paper^{11}.)

Some authors (notably, Jon Jarrett^{83}) have claimed that the locality condition proposed by Bell is too strong, i.e., it is more than just "locality". In order to understand the objection, one should notice first that condition (4) is equivalent to the conjunction of the following two sub-conditions:

\[(\text{OI})\quad P_{\alpha_1,\alpha_2}(A_1,A_2|\lambda)=P_{\alpha_1,\alpha_2}(A_1|\lambda)P_{\alpha_1,\alpha_2}(A_2|\lambda);\]

\[(\text{PI})\quad P_{\alpha_1,\alpha_2}(A_1|\lambda)=P_{\alpha_1}(A_1|\lambda),\quad P_{\alpha_1,\alpha_2}(A_2|\lambda)=P_{\alpha_2}(A_2|\lambda).\]

Condition (OI) says that, given \(\lambda\) and the parameters \(\alpha_1\ ,\) \(\alpha_2\ ,\) then the outcomes \(A_1\ ,\) \(A_2\) are independent. Condition (PI) says that, given \(\lambda\ ,\) then the marginal of the outcome \(A_1\) does not depend on the parameter \(\alpha_2\) and, similarly, the marginal of the outcome \(A_2\) does not depend on the parameter \(\alpha_1\ .\)

Condition (OI) is known in the literature as *outcome independence* and condition (PI) as *parameter independence*. Since the conjunction of (OI) and (PI) implies the CHSH–Bell inequality, it follows that any theory that makes the same predictions as quantum theory (and thus predicts the violation of the CHSH–Bell inequality) must violate either^{84} condition (OI) or condition (PI) (or both). Some authors claim that only condition (PI), rather than (4), is a consequence of locality. Since condition (PI) doesn't have to be violated by a theory that matches the quantum predictions, these authors conclude that there is no incompatibility between the quantum predictions and locality. The misunderstanding is likely to have originated from a misinterpretation of the meaning of \(\lambda\)^{85}. Namely, recall that in Section 6 we have defined \(\lambda\) to be the *complete* specification of the local beables (relevant for the experiment, but not for the process that chooses \(\alpha_1\) and \(\alpha_2\)) in a region of spacetime that shields the measurements from the intersection of the interior of their past lightcones^{86}. It is under this particular definition for \(\lambda\) that (4) is a consequence of locality. If one takes something else as a definition of \(\lambda\) then, indeed, a violation of condition (OI) might not imply a violation of locality.
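Conditions (OI) and (PI) can be checked mechanically on a finite list of settings. In this sketch, `joint(a1, a2, al1, al2, lam)` is a hypothetical function returning \(P_{\alpha_1,\alpha_2}(A_1=a_1,A_2=a_2|\lambda)\). Two models are tested. In the first, \(\lambda\) is nothing but the singlet state (spins along coplanar axes, standard prediction \((1-a_1a_2\cos(\alpha_1-\alpha_2))/4\)). The second is a toy deterministic model of our own invention in which each outcome depends only on its own setting and \(\lambda\).

```python
import math
from itertools import product

OUTCOMES = (1, -1)

def marginals(joint, al1, al2, lam):
    """Marginal distributions of A1 and A2 given (alpha1, alpha2, lambda)."""
    m1 = {a1: sum(joint(a1, a2, al1, al2, lam) for a2 in OUTCOMES) for a1 in OUTCOMES}
    m2 = {a2: sum(joint(a1, a2, al1, al2, lam) for a1 in OUTCOMES) for a2 in OUTCOMES}
    return m1, m2

def satisfies_oi(joint, settings, lams, tol=1e-12):
    """(OI): the joint distribution equals the product of its marginals."""
    for al1, al2, lam in product(settings, settings, lams):
        m1, m2 = marginals(joint, al1, al2, lam)
        for a1, a2 in product(OUTCOMES, OUTCOMES):
            if abs(joint(a1, a2, al1, al2, lam) - m1[a1] * m2[a2]) > tol:
                return False
    return True

def satisfies_pi(joint, settings, lams, tol=1e-12):
    """(PI): A1's marginal ignores alpha2, and A2's marginal ignores alpha1."""
    for lam, fixed, s, t in product(lams, settings, settings, settings):
        a_s, a_t = marginals(joint, fixed, s, lam)[0], marginals(joint, fixed, t, lam)[0]
        b_s, b_t = marginals(joint, s, fixed, lam)[1], marginals(joint, t, fixed, lam)[1]
        if any(abs(a_s[o] - a_t[o]) > tol or abs(b_s[o] - b_t[o]) > tol for o in OUTCOMES):
            return False
    return True

def singlet(a1, a2, al1, al2, lam):
    """lambda = the (fixed) singlet state only, so lam is ignored."""
    return (1 - a1 * a2 * math.cos(al1 - al2)) / 4

def toy_local(a1, a2, al1, al2, lam):
    """Hypothetical deterministic model: each outcome depends on its own setting and lam."""
    z1 = 1 if math.cos(al1 - lam) >= 0 else -1
    z2 = -1 if math.cos(al2 - lam) >= 0 else 1
    return 1.0 if (a1, a2) == (z1, z2) else 0.0

SETTINGS = (0.0, math.pi / 4, math.pi / 2)
LAMS = (0.1, 1.0, 2.5)
```

With \(\lambda\) equal to the quantum state alone, the singlet model satisfies (PI) but violates (OI). This is the sense in which, with that choice of \(\lambda\), nothing accounts for the correlation. The toy deterministic model satisfies both conditions.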

A further note concerning the conditions (OI) and (PI): the distinction between \(A_i\) and \(\alpha_i\) (which allows one to separate (4) into (OI) and (PI)) is highly anthropocentric. Namely, the parameter \(\alpha_i\) is controllable by the human experimenter and the outcome \(A_i\) isn't. But such a distinction cannot play a role in the formulation of a fundamental concept such as locality. Nevertheless, there is a difference between violation of (OI) and violation of (PI) that is worth mentioning, since it might also have caused some authors to mistakenly regard only (PI) and not (OI) as a consequence of locality: if a theory satisfies (PI) and violates (OI), then it might be the case that the kind of non-local interaction between the two sides of the experiment can be thought of as a *symmetrical* interaction in which there is no objective fact about which side should be regarded as the "cause" and which side should be regarded as the "effect". In fact, because of this symmetry, one might argue that the cause/effect language should not be used in this case and that one should talk only about *interactions*. However, if (PI) is violated, the symmetry disappears, as it is reasonable to regard a parameter \(\alpha_i\) as one of the *causes* of the outcome on the other side of the experiment, but it is unreasonable to regard \(\alpha_i\) as a *consequence* of what happened on the other side of the experiment.

We have seen in Section 5 that the CHSH–Bell inequality can be proven from the assumption of locality. It is easy to see that the CHSH–Bell inequality can *also* be proven from the assumption of the existence of a non-contextual hidden variables theory (one that covers the relevant experiments, of course). Namely, the assumption of non-contextual hidden variables is mathematically formulated in terms of the existence of random variables \(Z^i_{\alpha_i}\) defined over a probability space \((\Lambda,P)\) such that the outcome \(A_i\) of the experiment with parameter choice \(\alpha_i\) is equal to the value of \(Z^i_{\alpha_i}\ .\) In other words, by conditioning on a given \(\lambda\in\Lambda\ ,\) we obtain a degenerate joint probability distribution for \((A_1,A_2)\) supported by the single outcome \(\big(Z^1_{\alpha_1}(\lambda),Z^2_{\alpha_2}(\lambda)\big)\ :\)
\[P_{\alpha_1,\alpha_2}\big(A_1=Z^1_{\alpha_1}(\lambda),\ A_2=Z^2_{\alpha_2}(\lambda)|\lambda\big)=1.\]
This degenerate joint probability distribution of \((A_1,A_2)\) given \(\lambda\) obviously satisfies the factorizability condition (4). The purely mathematical part of the argument showing that the CHSH–Bell inequality is a consequence of locality is nothing but a proof of the CHSH–Bell inequality from condition (4). Thus, this same mathematical reasoning (restricted to the particular deterministic case in which \(A_i\) is a function of \(\alpha_i\) and \(\lambda\)) is also a proof of the CHSH–Bell inequality from the assumption of non-contextual hidden variables.
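In the deterministic case the bound can be verified by enumerating all 16 assignments \(\big(Z^1_{x},Z^1_{y},Z^2_{x},Z^2_{y}\big)\in\{\pm1\}^4\) for two settings per side. The sketch uses the sign pattern of \(S=X_1X_2-X_1Y_2+Y_1X_2+Y_1Y_2\) from the notation introduced below. Averaging over \(\lambda\) with any weights gives a convex combination of these values. For contrast, it also evaluates the singlet prediction \(E=-\cos(\alpha_1-\alpha_2)\) at one standard choice of angles (our assumption).

```python
import math
from itertools import product

def chsh(e_xx, e_xy, e_yx, e_yy):
    """CHSH combination with the sign pattern of X1X2 - X1Y2 + Y1X2 + Y1Y2."""
    return e_xx - e_xy + e_yx + e_yy

ASSIGNMENTS = list(product((1, -1), repeat=4))  # (z1x, z1y, z2x, z2y)

def deterministic_value(z1x, z1y, z2x, z2y):
    """Value of the CHSH combination when lambda fixes every outcome."""
    return chsh(z1x * z2x, z1x * z2y, z1y * z2x, z1y * z2y)

deterministic_values = {deterministic_value(*z) for z in ASSIGNMENTS}

def mixture_value(weights):
    """Expected CHSH value when lambda picks ASSIGNMENTS[k] with probability
    proportional to weights[k]."""
    total = sum(weights)
    return sum(w * deterministic_value(*z) for w, z in zip(weights, ASSIGNMENTS)) / total

def singlet_E(alpha1, alpha2):
    """Assumed singlet correlation for coplanar spin measurements."""
    return -math.cos(alpha1 - alpha2)

x1, y1, x2, y2 = 0.0, math.pi / 2, math.pi / 4, 3 * math.pi / 4
quantum_value = chsh(singlet_E(x1, x2), singlet_E(x1, y2),
                     singlet_E(y1, x2), singlet_E(y1, y2))
```

Each deterministic assignment gives \(\pm2\), so every mixture lies in \([-2,2]\), while the singlet prediction at these angles reaches \(-2\sqrt2\).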

This fact has caused certain misunderstandings. To begin with, the fact that the CHSH–Bell inequality can be proven from an assumption distinct from locality has led some authors to believe that the violation of the CHSH–Bell inequality does not imply non-locality. Of course, the correct thing to say is that we have *two* implications:

- (i) non-contextual hidden variables \(\Rightarrow\) CHSH–Bell inequality,
- (ii) locality \(\Rightarrow\) CHSH–Bell inequality,

and the logical conclusion is that the violation of the CHSH–Bell inequality by the quantum predictions gives (by (i)) yet another proof of the incompatibility of non-contextual hidden variables with the quantum predictions *and* (by (ii)) a proof of the incompatibility between locality and the quantum predictions.

Unfortunately, misunderstandings go beyond that. The presentation that maximizes the misunderstanding requires a different notation from what we have been using, so let us make some adaptations. Assume that the experimenter on one side can choose between measuring either the \(\pm1\)-valued quantum observable \(X_1\) or the \(\pm1\)-valued quantum observable \(Y_1\ .\) (In our old notation, the choice between \(X_1\) and \(Y_1\) corresponds to two distinct values for the parameter \(\alpha_1\ .\)) Similarly, assume that on the other side the experimenter chooses between measuring the \(\pm1\)-valued quantum observable \(X_2\) or the \(\pm1\)-valued quantum observable \(Y_2\ .\) The quantum prediction for the left hand side of the CHSH–Bell inequality — with the absolute values removed — is given by the expected value \(\langle S\rangle\) of the quantum observable^{87}:
\[S=X_1X_2-X_1Y_2+Y_1X_2+Y_1Y_2=X_1(X_2-Y_2)+Y_1(X_2+Y_2),\]
so that the existence of a quantum state for which the CHSH–Bell inequality^{88} \(\vert\langle S\rangle\vert\le2\) is violated is equivalent to the condition that the operator norm \(\Vert S\Vert\) be greater than 2. Taking into account that the observables \(X_i\ ,\) \(Y_j\) are \(\pm1\)-valued (so that their squares are equal to the identity) and that the observables carrying the index 1 commute with the observables carrying the index 2, a straightforward computation shows that:
\[S^2=4+(X_1Y_1-Y_1X_1)(X_2Y_2-Y_2X_2)=4+[X_1,Y_1][X_2,Y_2].\]
Since (because \(S\) is self-adjoint) \(\Vert S^2\Vert=\Vert S\Vert^2\ ,\) it follows that the existence of a quantum state for which the CHSH–Bell inequality \(\vert\langle S\rangle\vert\le2\) is violated is equivalent to the condition that \(\Vert S^2\Vert\) be greater than 4 and that condition is equivalent to the requirement that the commutators \([X_1,Y_1]\ ,\) \([X_2,Y_2]\) both be non-vanishing^{89}.
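The operator identity and the norm condition can be checked numerically. This sketch assumes a standard choice of \(\pm1\)-valued observables for two spin-1/2 particles: \(X_1=\sigma_z\) and \(Y_1=\sigma_x\) on side 1, and \(X_2=(\sigma_z+\sigma_x)/\sqrt2\) and \(Y_2=(\sigma_z-\sigma_x)/\sqrt2\) on side 2. It verifies \(S^2=4+[X_1,Y_1][X_2,Y_2]\) and then obtains \(\Vert S\Vert=\sqrt{\Vert S^2\Vert}\) by power iteration. Plain Python lists are used instead of a linear-algebra library.

```python
import math

# 2x2 identity and Pauli matrices (all real for this choice of observables).
I2 = [[1, 0], [0, 1]]
SX = [[0, 1], [1, 0]]
SZ = [[1, 0], [0, -1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def lin(*terms):
    """Linear combination sum(c * M) of equally sized matrices, given as (c, M) pairs."""
    n, m = len(terms[0][1]), len(terms[0][1][0])
    return [[sum(c * M[i][j] for c, M in terms) for j in range(m)] for i in range(n)]

def kron(a, b):
    """Tensor (Kronecker) product: first factor acts on side 1, second on side 2."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def comm(a, b):
    return lin((1, matmul(a, b)), (-1, matmul(b, a)))

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def top_eigenvalue_psd(m, iters=200):
    """Largest eigenvalue of a positive semidefinite matrix, by power iteration."""
    v = [1 / (k + 1) for k in range(len(m))]
    for _ in range(iters):
        w = [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]
        norm = math.sqrt(sum(abs(x) ** 2 for x in w))
        v = [x / norm for x in w]
    w = [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]
    return sum(complex(x).conjugate() * y for x, y in zip(v, w)).real

r = 1 / math.sqrt(2)
I4 = kron(I2, I2)
X1, Y1 = kron(SZ, I2), kron(SX, I2)
X2 = kron(I2, lin((r, SZ), (r, SX)))
Y2 = kron(I2, lin((r, SZ), (-r, SX)))

S = lin((1, matmul(X1, X2)), (-1, matmul(X1, Y2)), (1, matmul(Y1, X2)), (1, matmul(Y1, Y2)))
S2 = matmul(S, S)
identity_holds = close(S2, lin((4, I4), (1, matmul(comm(X1, Y1), comm(X2, Y2)))))
norm_S = math.sqrt(top_eigenvalue_psd(S2))  # ||S|| = sqrt(||S^2||) since S is self-adjoint
print(identity_holds)  # True
print(norm_S)          # about 2.828, i.e. 2*sqrt(2) > 2
```

For this choice \(\Vert S\Vert=2\sqrt2\). This is the largest value quantum theory allows (Tsirelson's bound), so some state gives \(\vert\langle S\rangle\vert>2\).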

Now, let us assume that we have a non-contextual hidden variables theory and, with some (here, deliberate) abuse of notation, let us use the same symbol to denote a given quantum observable and to denote the corresponding random variable given by the non-contextual hidden variables theory. Since the product of random variables is obviously commutative, the same considerations as above show that \(S^2=4\ ,\) so that \(S\) takes only the values \(\pm2\) and therefore \(\vert\langle S\rangle\vert\le2\ .\) We have thus proven again that non-contextual hidden variables imply the CHSH–Bell inequality.
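In the commutative case, the computation reduces to checking sixteen cases. In the sketch below, the four numbers stand for the values that a non-contextual hidden-variables theory assigns to \(X_1,Y_1,X_2,Y_2\) for one \(\lambda\). The random weights are an arbitrary stand-in for a probability distribution over those assignments.

```python
import random
from itertools import product

ASSIGNMENTS = list(product((+1, -1), repeat=4))  # (x1, y1, x2, y2)

def chsh(x1, y1, x2, y2):
    """S = X1 X2 - X1 Y2 + Y1 X2 + Y1 Y2 evaluated on commuting numbers."""
    return x1 * x2 - x1 * y2 + y1 * x2 + y1 * y2

def expected_chsh(weights):
    """<S> for a probability distribution given by (unnormalized) weights."""
    total = sum(weights)
    return sum(w * chsh(*a) for w, a in zip(weights, ASSIGNMENTS)) / total

# The commutator terms vanish for numbers, so S^2 = 4 in every case.
assert all(chsh(*a) ** 2 == 4 for a in ASSIGNMENTS)
values = sorted({chsh(*a) for a in ASSIGNMENTS})

rng = random.Random(0)
worst = max(abs(expected_chsh([rng.random() for _ in ASSIGNMENTS])) for _ in range(1000))
print(values)       # [-2, 2]
print(worst <= 2)   # True
```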

The considerations above present us with the following situation: (a) there is a proof of the CHSH–Bell inequality that does not use locality (it uses non-contextual hidden variables instead) and it uses the fact that the product of random variables is commutative; (b) there is a proof that quantum theory can violate the CHSH–Bell inequality, in which non-commutativity of observables plays a prominent role; (c) it is widely believed nowadays that the great novelty of a quantum theory over a classical one is the possibility of non-commuting observables. Combining (a), (b) and (c), we can easily get the false impression that violation of the CHSH–Bell inequality has nothing to do with non-locality, but rather with a certain "non-classical" character of nature that requires "non-commuting observables". Let us put aside the fact that it is not at all clear what such a "non-classical" character of nature requiring "non-commuting observables" really means. Because of (a), (b) and (c) (combined, possibly, with other misunderstandings discussed in this article), many physicists have missed the point that one can prove the CHSH–Bell inequality from the assumption of locality alone and, therefore, no matter what one believes about the role of non-commuting observables, it follows that the violation of the CHSH–Bell inequality implies non-locality.

### Many-worlds and relational interpretations of quantum theory

Strictly speaking, there is yet another assumption, besides locality and the "no conspiracy" condition, that is necessary for the proof of Bell's theorem: one has to assume that, *after* the experiment on one given side is performed, its \(\pm1\)-valued outcome is a well-defined element of physical reality. (Recall that in Section 6, in order to apply Bell's definition of locality to the type of experiment considered in Section 5, we assumed that the outcomes \(A_1\) and \(A_2\) were functions of the local beables in regions 1 and 2, respectively.) Now one might wonder how anyone could deny that assumption. After all, the outcome of the experiment is recorded by the configuration of a macroscopic object (say, a pointer position, ink on a piece of paper, etc.) that can be directly inspected by a human experimenter. However, there exists one fairly popular interpretation of quantum theory that does deny that one has (after the experiments are concluded) a well-defined physically real \(\pm1\)-valued outcome on each side: the *many-worlds interpretation*^{90}. More precisely, according to the many-worlds interpretation, *both* outcomes are equally real on each side, so that it doesn't make sense to talk about "the one \(\pm1\)-valued outcome that actually occurs". Certain "relational" interpretations of quantum theory^{91} also deny that a completed experiment has a well-defined physically real outcome. It is possible that this type of strategy could succeed in evading the consequences of Bell's theorem, allowing for the possibility of a universe governed by a local theory such that conscious observers living in that universe attest to the validity of the quantum predictions. However, it is not clear how to actually do the trick. There are many difficulties and the subject is rather subtle. To begin with, there are controversies around the problem of finding an appropriate formulation of a many-worlds (or relational) interpretation.
Moreover, it is not clear whether such an appropriate formulation can be made local, given that the wave function — which seems to be all there is in standard formulations of many-worlds theories — is not a localized object; in the terminology of Bell, it is not a local beable. (Indeed, if a theory has *no* local beables, it is certainly not meaningful to ask whether it is local or not in the relevant sense.) A formulation of a version of the many-worlds interpretation which includes, in addition to the wave function, some local beables, was presented in a recent paper^{92}, but it was found by the authors to be non-local. The question of whether a many-worlds (or relational) approach can be taken advantage of to create a local (and empirically viable) theory thus remains open — as does the question of how seriously one should take a theory of this type, should it be successfully constructed.

### Consistent histories

Proponents of the (decoherent or) consistent histories approach^{93} (CH) to quantum theory claim that this approach can avoid non-locality^{94}. A detailed exposition of CH is beyond the scope of this article, so let us just briefly review a few facts about it. First of all, CH is about histories; a *history* can be defined as a set \(\{E_1,\ldots,E_k\}\) of (orthogonal) projection operators \(E_i\ .\) A projection operator is a quantum \(\{0,1\}\)-valued observable which can be thought of as representing a proposition, such as a proposition about the "value" of a certain quantum observable (at a certain time). For example, if \(A\) is a quantum observable (i.e., a self-adjoint operator) and \(a\) is an eigenvalue of \(A\) then the proposition "\(A=a\)" is represented by the projection operator onto the \(a\)-eigenspace of \(A\)^{95}.

The history \(\{E_1,\ldots,E_k\}\) is to be thought of as the *conjunction* of the propositions \(E_1,\ldots,E_k\ .\) (It is convenient to work in the Heisenberg picture, i.e., a quantum state is considered to be fixed and time-dependence is on the observables, so that each \(E_i\) is regarded as associated with a certain instant of time.) The theory treats certain families of histories, usually known as *decoherent families*, as special: those are sets of histories satisfying a certain *decoherence condition* which is formally similar to a condition stating that certain interference terms vanish. The decoherence condition allows one to assign probabilities to the histories belonging to a decoherent family in a way that is consistent with standard rules of probability theory. The probability^{96} that CH assigns to a history is simply the probability that orthodox quantum theory would assign to the observation of this history had one performed (ideal) quantum measurements of the observables \(E_i\ .\) Certain histories belong to no decoherent families at all and to such histories CH does not assign a probability. Histories consisting of mutually commuting projection operators *always* belong to at least one decoherent family and therefore CH defines a probability for those. (And there are also histories — belonging to some decoherent family — consisting of operators that do not commute.)
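The section does not spell out the decoherence condition, so the following sketch rests on an assumption. It uses the common "medium decoherence" form, with decoherence functional \(D(\alpha,\beta)=\langle\psi|C_\beta^\dagger C_\alpha|\psi\rangle\). Here \(C_\alpha=E_k\cdots E_1\) is the time-ordered product of the projections in history \(\alpha\), and the condition requires \(D(\alpha,\beta)=0\) for \(\alpha\neq\beta\). The probability of \(\alpha\) is then \(D(\alpha,\alpha)\), the orthodox probability for ideal sequential measurements. The toy system, a single spin-1/2 with trivial dynamics (so the Heisenberg-picture projections do not change in time), and all names are hypothetical choices for illustration.

```python
import math
from itertools import product

R = 1 / math.sqrt(2)

# Projectors onto the +1/-1 eigenspaces of sigma_z and sigma_x.
PZ = {+1: [[1, 0], [0, 0]], -1: [[0, 0], [0, 1]]}
PX = {+1: [[0.5, 0.5], [0.5, 0.5]], -1: [[0.5, -0.5], [-0.5, 0.5]]}

def apply(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def inner(u, v):
    """<u|v>, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def chain(history, psi):
    """C_alpha |psi> = E_k ... E_1 |psi>, applying the earliest projection first."""
    for proj in history:
        psi = apply(proj, psi)
    return psi

def decoherence_functional(family, psi):
    """D(alpha, beta) = <C_beta psi | C_alpha psi> for every pair of histories."""
    states = {a: chain(h, psi) for a, h in family.items()}
    return {(a, b): inner(states[b], states[a]) for a in states for b in states}

def is_decoherent(family, psi, tol=1e-12):
    d = decoherence_functional(family, psi)
    return all(abs(val) < tol for (a, b), val in d.items() if a != b)

def probabilities(family, psi):
    d = decoherence_functional(family, psi)
    return {a: d[(a, a)].real for a in family}

def two_time_family(first, second):
    """Histories (a1, a2): projection first[a1] at t1, then second[a2] at t2."""
    return {(a1, a2): [first[a1], second[a2]] for a1, a2 in product((+1, -1), repeat=2)}

plus_x, plus_z = [R, R], [1.0, 0.0]
zz = two_time_family(PZ, PZ)
zx = two_time_family(PZ, PX)
print(is_decoherent(zz, plus_x))  # True: commuting projections
print(is_decoherent(zx, plus_x))  # False: an interference term survives
print(is_decoherent(zx, plus_z))  # True, although PZ and PX do not commute
print(probabilities(zx, plus_z))
```

The first family consists of commuting projections and decoheres for any state. The other two use the non-commuting \(\sigma_z\) and \(\sigma_x\) projections, and whether such a family decoheres depends on the state.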

What is the main difference between CH and orthodox quantum theory? The difference is that while orthodox quantum theory is usually presented as some sort of algorithm for computing probabilities of outcomes of experiments, CH is supposed to be a theory about an objective reality in which observers making measurements do not play a privileged role. (It is supposed to be a *quantum theory without observers*^{97}.) More precisely, a history in CH is an event that may or may not happen (even if there are no observers of any sort around) and it is to that event that the theory assigns a probability. On the other hand, orthodox quantum theory merely talks about the probability of someone *observing* the given history, in case the measurements of the corresponding observables \(E_i\) are actually performed. So, for instance, while orthodox quantum theory talks about the probability of someone *finding* a particle in a given box (using a suitable detector), CH talks about the probability of this particle *being* in that box (with no detector being necessary).

Thus, in CH, a "quantum measurement" is really supposed to be a measurement, simply revealing the pre-existing value of the measured observable; it is not the interaction with the apparatus that creates the observed value. That sounds a lot like a non-contextual hidden variables theory, which, as we now know, must lead to inconsistencies with the quantum predictions. Indeed, while the probabilistic statements made by CH about the histories belonging to *one* given decoherent family do not lead to any inconsistencies, it is very easy to show^{97} (and this is uncontroversial) that probabilistic statements made by CH about histories belonging to *different* decoherent families sometimes *do*. In fact, any of the standard proofs of impossibility of non-contextual hidden variables compatible with the quantum predictions can be used to obtain such an inconsistency.

The proponents of CH have addressed this problem as follows: they have imposed a *rule*^{98} which says essentially that arguments involving probabilities for several histories, not all of which belong to the same decoherent family, are forbidden.

To illustrate this, let us look at the EPR setup from the point of view of CH. According to CH, the spin measurements on both sides merely reveal pre-existing values and therefore there is no difficulty in locally explaining the perfect anti-correlation when the oriented axes chosen on the two sides are the same. We thus have, as the conclusion of the EPR argument asserts, random variables \(Z^i_\alpha\) as in our discussion of Bell's inequality theorem. One can now, of course, proceed to the proof of inequality (1). CH agrees that the three terms on the left hand side of (1) are equal to 1/4 and therefore we obtain the contradiction \(3/4\ge1\ .\) As explained above, this is just one of the many ways to obtain a contradiction from CH if one is allowed to use probabilities for histories belonging to different decoherent families — as do the three probabilities that appear on the left hand side of (1). However, by forbidding the reasoning used to prove inequality (1), the aforementioned rule of CH prevents us from arriving at the contradiction.
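The contradiction can be reproduced in a few lines under two assumptions that are not restated here. The first is that inequality (1) is the three-setting form \(P(Z^1_a=Z^1_b)+P(Z^1_b=Z^1_c)+P(Z^1_c=Z^1_a)\ge1\) for \(\pm1\)-valued variables on one probability space. The second is that the three axes are 120° apart. Since \(Z^2_\alpha=-Z^1_\alpha\), each term then equals the singlet prediction \(P_{\alpha,\beta}(A_1\ne A_2)=\cos^2(60^\circ)=1/4\). The first function is the pigeonhole step (among three \(\pm1\) values at least two agree); the second is the quantum prediction.

```python
import math
from itertools import product

def pigeonhole_bound():
    """Minimum, over all +/-1 assignments (z_a, z_b, z_c), of the number of
    pairs among (a, b), (b, c), (c, a) on which the values agree."""
    return min(
        (za == zb) + (zb == zc) + (zc == za)
        for za, zb, zc in product((+1, -1), repeat=3)
    )

def singlet_prob_outcomes_differ(angle):
    """Quantum P(A1 != A2) for the singlet with analyzer axes `angle` radians apart."""
    return math.cos(angle / 2) ** 2

# Each run contributes at least 1, so the three probabilities must sum to >= 1 ...
bound = pigeonhole_bound()
# ... while the quantum prediction for axes 120 degrees apart gives 3 * 1/4.
lhs = 3 * singlet_prob_outcomes_differ(math.radians(120))
print(bound)           # 1
print(round(lhs, 12))  # 0.75
```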

But a physical theory is not simply a game for which one can impose arbitrary rules about what reasoning is permitted for the propositions of the theory; if a physical theory implies both \(P\) and \(Q\) then the logical consequences of both \(P\) and \(Q\) will hold in a world governed by that theory and there is nothing that the proponents of the theory can do to prevent that. One might try to find an actual *objection* against the reasoning leading to inequality (1), but one cannot simply state as a "rule" that the reasoning is forbidden. We suspect that the proponents of CH would object to the proof of inequality (1) (within CH) by claiming that one cannot assume that all the random variables \(Z^i_\alpha\) are defined over the same probability space because on each run of the experiment the value of *only one* among the \(Z^1_\alpha\) and the value of *only one* among the \(Z^2_\alpha\) is going to be observed. But if the experiments merely reveal pre-existing values then, on each run of the experiment, *all* the variables \(Z^i_\alpha\) have a well-defined value (which may or may not turn out to be observed). By considering the frequencies of these (unobserved, yet existing) values, one obtains a joint probability distribution for all the variables \(Z^i_\alpha\ .\) Thus they can be modeled as random variables on the same probability space. The objection against the possibility of modeling the \(Z^i_\alpha\) as random variables on the same probability space is effective only when one takes their values to be created by the experiments: in that case, the joint probability distribution for all the \(Z^i_\alpha\) would indeed be meaningless, as their values would then correspond to the outcomes of incompatible experiments. But reinterpreted in terms of values being created by experiment, CH would be *pointless* — it would just be orthodox quantum theory.

## Non-locality and relativity

In the previous sections, we have explained in detail the theoretical analysis according to which one can conclude from certain experimental results that non-local interactions, of the sort often thought to be precluded by relativity theory, really exist in nature. This raises the obvious question: do these experimental results, then, show that relativity is *wrong*? Here we will not attempt to give an unambiguous answer, but will instead merely try to indicate very briefly some of the issues that a full answer would need to address.

To begin with, it is crucial to make a distinction between two different senses in which a theory might be said to be "relativistic". First, a theory might be *empirically relativistic*. This means that what it predicts for the outcomes of experiments will exhibit the usual relativistic properties — for example, it should predict the familiar relativistic behavior of clocks and meter sticks in relative motion. More generally, it should agree with classical relativistic mechanics about the behavior of macroscopic objects and it should predict (ignoring for the moment gravitation and general relativity) that an experimenter cannot tell whether an appropriately isolated laboratory has been set into uniform motion^{99}.

Despite the central role it is given in certain philosophies of science, however, observation is not everything. We thus need to recognize (at least) a second sense in which a theory might be said to be "relativistic" — namely, that it is compatible with relativity *through and through*, and not just at the (relatively superficial) level of empirical predictions. Such a theory will be said to be *fundamentally relativistic*. To make the distinction clear, it is helpful to contrast two different versions of classical electromagnetism. Let us call these the Lorentzian and the Einsteinian theories.

According to the *Lorentzian* theory, there is a physically meaningful notion of absolute rest (defined by the so-called "ether" rest frame) and a physically meaningful notion of absolute time. These correspond to the existence of a preferred family of coordinate systems over spacetime^{100} and the dynamics of the theory is defined, with respect to these coordinate systems, by the usual equations of electromagnetism. As a *consequence* of this dynamics, a clock at absolute rest measures absolute elapsed time, but a clock moving with absolute speed \(v\) ticks slower and it measures not absolute elapsed time, but absolute elapsed time multiplied by the usual relativistic factor \(\sqrt{1-(v/c)^2}\ .\) However, experimenters living in a world governed by this theory cannot distinguish absolute rest from absolute motion; the theory is *empirically relativistic*. (Bell's paper "How to teach special relativity"^{101} provides a detailed discussion.)
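As a small illustration of the dilation factor, this sketch works in units with \(c=1\) and one spatial dimension (assumptions made for brevity). It computes a moving clock's reading from the "ether" frame alone. It then checks that reading against the standard Lorentz transformation to the clock's own rest frame, which shows in what sense a single frame can account for what moving observers see. It is plain special-relativistic kinematics, not a model of Lorentz's electron theory.

```python
import math

C = 1.0  # units with c = 1 (an assumption for the sketch)

def clock_reading(absolute_time, v):
    """Time shown by a clock moving at absolute speed v, computed in the rest frame."""
    return absolute_time * math.sqrt(1 - (v / C) ** 2)

def boost(t, x, v):
    """Standard Lorentz transformation to a frame moving with velocity v along x."""
    gamma = 1 / math.sqrt(1 - (v / C) ** 2)
    return gamma * (t - v * x / C**2), gamma * (x - v * t)

v, T = 0.6, 10.0
# Two events on the moving clock's worldline, described in the "ether" frame.
start, end = (0.0, 0.0), (T, v * T)
t0, _ = boost(*start, v)
t1, x1 = boost(*end, v)
print(clock_reading(T, v))  # about 8.0: computed in the "ether" frame
print(t1 - t0, x1)          # about 8.0 and 0.0: the same, seen in the clock's frame
```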

By contrast, the *Einsteinian* version of classical electromagnetism is of course relativistic, not just empirically, but *fundamentally*. The notion of a really-existing but unobservable "ether" rest frame is dispensed with and all uniform states of motion are regarded as equivalent^{102}.

Sometimes it is thought (and taught) that certain experiments from the late 19th or early 20th century refuted the Lorentzian theory in favor of the Einsteinian one. But this is not correct. With regard to their empirical predictions, there is no difference between the Lorentzian and Einsteinian theories. Nonetheless, they are different, as Bell explains in his paper "How to teach special relativity":

Since it is experimentally impossible to say which of two uniformly moving systems is *really* at rest, Einstein declares the notions 'really resting' and 'really moving' as meaningless. For him only the *relative* motion of two or more uniformly moving objects is real. Lorentz, on the other hand, preferred the view that there is indeed a state of *real* rest, defined by the 'aether', even though the laws of physics conspire to prevent us identifying it experimentally. The facts of physics do not oblige us to accept one philosophy rather than the other.^{103}

Bell suggests that a "Lorentzian pedagogy" might usefully supplement the usual approach to teaching special relativity. About this Lorentzian approach Bell writes:

Its special merit is to drive home the lesson that the laws of physics in any *one* reference frame account for all physical phenomena, including the observations of moving observers. And it is often simpler to work in a single frame, rather than to hurry after each moving object in turn.^{104}

For our purposes, there are three important lessons here. The first is that the empirical violation of Bell-type inequalities does not require theories that fail to be *empirically* relativistic. But this is hardly sufficient to assuage the worry: if empirical relativity is the only kind of relativity that can be saved, it's not clear that relativity, in any substantial sense, is being saved. The second lesson is thus that, if we want to insist on preserving compatibility with relativity, it is *fundamental* relativity (not mere *empirical* relativity) that we must insist on.

But the third lesson is that perhaps abandoning fundamental relativity should be on the table as a serious option. Doing so would *not* necessitate empirical predictions at odds with the experimental results that are normally taken to support relativity. And it is clear that the use of a dynamically preferred but unobservable "ether" frame would make it very easy for theories to incorporate the non-local interactions that Bell's theorem (and the associated experiments) require. Indeed, Bell himself took this possibility quite seriously:

It may well be that a relativistic version of [quantum] theory, while Lorentz invariant and local at the observational level, may be necessarily non-local and with a preferred frame (or aether) at the fundamental level.^{105}

Many readers may be puzzled by the claim that there might be any problem in making quantum theory compatible with relativity. After all, relativistic quantum field theories have been known for a long time^{106}. But these theories are normally presented merely as algorithms for predicting outcomes of experiments (i.e., they are all about what *observers* will see when measurements are performed) and, as such, they have nothing to say about the fundamental level. Therefore, it is simply meaningless to even discuss whether or not these theories are fundamentally relativistic. The issue of fundamental relativity is meaningful only for so-called *quantum theories without observers*^{97}, i.e., formulations of quantum theory describing a universe in which observers play no special role in the formulation of the theory, but are nonetheless predicted, as a consequence of the theory, to attest to the validity of the quantum predictions.

Examples of empirically (but not fundamentally) relativistic quantum theories without observers are provided by certain empirically relativistic versions of *Bohmian mechanics*^{107} which are formulated — like the Lorentzian version of electromagnetism — in terms of a preferred family of coordinate systems corresponding to notions of absolute rest and absolute time. The fact that these theories are indeed empirically relativistic is established as follows: first, one proves that these theories make the same predictions as some empirically relativistic version of quantum theory with respect to the preferred coordinate systems in which the theory is formulated. Then, one simply appeals to the fact stated in the Bell quote mentioned above: the predictions of a given theory for *one* given coordinate system account for everything that happens in spacetime, including the observations of moving observers.

So, is non-locality incompatible with *fundamental* relativity? The main difficulty for answering this question seems to be to decide what it means for a theory to be fundamentally relativistic. Most readers might be surprised to learn that this is a non-trivial matter, since they probably can make straightforward judgments such as "(Einsteinian) Maxwell's electromagnetism is fundamentally relativistic, Newtonian mechanics is not". Of course, for many examples of physical theories it is indeed straightforward to say whether or not the given theory is fundamentally relativistic. However, it turns out not to be easy to formulate the notion of "fundamentally relativistic theory" precisely^{108} and in the context of candidate theories of quantum phenomena our straightforward intuitive judgments about what a fundamentally relativistic theory should look like begin to fail us.

Let us illustrate how intuitive judgments about fundamental relativity might not be so readily available. *Foliations* of spacetime are normally taken to be an "anti-relativistic" structure. For instance, in the empirically (but not fundamentally) relativistic electromagnetic theory of Lorentz, a foliation — the simultaneity hyperplanes defined by the absolute time — is part of the structure of spacetime. Various empirically relativistic versions of Bohmian mechanics that are formulated in terms of a preferred family of coordinate systems corresponding to notions of absolute rest and absolute time can be reformulated by using a foliation of spacetime instead of the preferred family of coordinate systems. Incidentally, nothing forces us to interpret this foliation as being related to absolute time. What if this foliation of spacetime — instead of being put in "by hand" — emerged as the solution to a Lorentz invariant law^{109}? Should the theory then be considered fundamentally relativistic? Or what if the foliation is extracted from objects — for example, the usual quantum mechanical wave function — which are already present in the theory^{110}? Or maybe the mere presence of a quantum mechanical wave function already constitutes a violation of fundamental relativity?

It is also possible that the non-local interactions required by Bell's theorem (and the associated experiments) could be incorporated without the aid of any foliation at all. Bell himself proposed a strategy^{111} and in fact this strategy has led to the construction of a relativistic^{112} spontaneous collapse theory of GRW type (after GianCarlo Ghirardi, Alberto Rimini and Tullio Weber^{113}) which involves no preferred foliation. Ghirardi has suggested a different strategy, based on the use of past light cones instead of a foliation^{114}.

In summary, it remains unclear what exactly "fundamental relativity" means or requires. Whether Bell's theorem and the associated experiments can be reconciled with fundamental relativity thus remains very much an open question.

## Notes and references

^{1} D. Bohm, A suggested interpretation of the quantum theory in terms of "hidden" variables. I, *Phys. Rev.* **85** n. 2 (1952), p. 166–179. D. Bohm, A suggested interpretation of the quantum theory in terms of "hidden" variables. II, *Phys. Rev.* **85** n. 2 (1952), p. 180–193.

^{2} As an introduction to the theory (presented in modern form), we suggest the following references: S. Goldstein, Bohmian mechanics, *The Stanford Encyclopedia of Philosophy* (2001, revised in 2006), available online. R. Tumulka, Understanding Bohmian mechanics: a dialogue, *Am. J. Phys.* **72** n. 9 (2004), p. 1220–1226, arXiv:quant-ph/0408113v1. S. Goldstein, Bohmian mechanics and quantum information, *Found. Phys.* **40** n. 4 (2010), p. 335–355, arXiv:0907.2427v1. For a detailed analysis of the theory, see: D. Dürr, S. Goldstein, and N. Zanghì, Quantum equilibrium and the origin of absolute uncertainty, *J. Stat. Phys.* **67** n. 5–6 (1992), p. 843–907, arXiv:quant-ph/0308039v1.

^{3} J. S. Bell, Beables for quantum field theory, 1984, available online, reprinted in J. S. Bell, Speakable and unspeakable in quantum mechanics, Cambridge, 2004, p. 173 (p. 1 in the online version).

^{4} Traditionally, the phrase "hidden variables" is used to characterize any elements supplementing the wave function of orthodox quantum theory. This terminology is, however, particularly unfortunate in the case of the de Broglie–Bohm theory, where it is in the supplementary variables (definite particle positions) that one finds an image of the manifest world of ordinary experience. As Bell explained: "Although [in the de Broglie–Bohm theory] \(\Psi\) is a real field it does not show up immediately in the result of a single 'measurement,' but only in the statistics of many such results. It is the de Broglie–Bohm variable \(X\) [the definite particle positions] that shows up immediately each time. That \(X\) rather than \(\Psi\) is historically called a 'hidden' variable is a piece of historical silliness." (J. S. Bell, On the impossible pilot wave, 1982, reprinted in J. S. Bell, Speakable and unspeakable in quantum mechanics, Cambridge, 2004, p. 162–163.)

^{5} J. S. Bell, On the problem of hidden variables in quantum mechanics, 1966, reprinted in J. S. Bell, Speakable and unspeakable in quantum mechanics, Cambridge, 2004, p. 1–13.

^{6} J. S. Bell, On the Einstein–Podolsky–Rosen paradox, 1964, reprinted in J. S. Bell, Speakable and unspeakable in quantum mechanics, Cambridge, 2004, p. 14–21.

^{7} Von Neumann's impossibility proof makes an assumption that is stronger than non-contextuality. Bell also discusses an impossibility proof given by Jauch and Piron (which contains an assumption that is stronger than non-contextuality) and an impossibility proof based on a theorem of Gleason (which assumes just non-contextuality).

^{8} J. S. Bell, On the problem of hidden variables in quantum mechanics, 1966, reprinted in J. S. Bell, Speakable and unspeakable in quantum mechanics, Cambridge, 2004, p. 11.

^{9} *Ibid.*, emphasis in the original.

^{10} J. S. Bell, Bertlmann's socks and the nature of reality, 1980, available online, reprinted in J. S. Bell, Speakable and unspeakable in quantum mechanics, Cambridge, 2004, p. 157 (p. 19 in the online version), emphasis in the original.

^{11} A. Einstein, B. Podolsky, and N. Rosen, Can quantum-mechanical description of physical reality be considered complete?, *Phys. Rev.* **47** n. 10 (1935), p. 777–780, available online.

^{12} D. Bohm, Quantum Theory, Prentice-Hall, 1951.

^{13} This argument, however, also involves a hidden "no conspiracy" assumption that we will discuss in detail in Section 5, after we have presented a mathematical formulation of (a necessary condition for) locality. We will also discuss the importance of the "no conspiracy" assumption in the context of the EPR argument in Subsection 10.3.

^{14} J. S. Bell, Bertlmann's socks and the nature of reality, 1980, available online, reprinted in J. S. Bell, Speakable and unspeakable in quantum mechanics, Cambridge, 2004, p. 149–150 (p. 10 in the online version), emphasis in the original.

^{15} J. S. Bell, The theory of local beables, 1975, available online, reprinted in J. S. Bell, Speakable and unspeakable in quantum mechanics, Cambridge, 2004, p. 52–62.

^{16} The set of data \(\lambda\) that explains the dependence relations between the outcomes doesn't have to originate exclusively from a common source for the two systems. For example, dependence relations between the outcomes could also arise from relations between the experimental apparatuses on both sides (typically, those apparatuses will have been produced by a common source). The set of data \(\lambda\) should then include *everything* (aside from the control parameters \(\alpha_1\) and \(\alpha_2\)) that existed before the measurements and that can be relevant for the outcomes \(A_1\) and \(A_2\ .\) We will provide further clarification on the meaning of \(\lambda\) at the end of Section 6.

^{17} For example, if you are performing a drug versus placebo clinical trial, then you have to select some group of patients to get the drug and some group of patients to get the placebo. The conclusions drawn from the study will necessarily depend on the assumption that the method of selection is independent of whatever characteristics those patients might have that might influence how they react to the drug.

^{18} J. F. Clauser, M. A. Horne, A. Shimony, and R. A. Holt, Proposed experiment to test local hidden-variable theories, *Phys. Rev. Lett.* **23** n. 15 (1969), p. 880–884.

^{19} More precisely, what follows from \(P_{\alpha,\alpha}(A_1\ne A_2)=1\) is that \(P_{\alpha,\alpha}(A_1\ne A_2|\lambda)=1\) holds for all \(\lambda\) in a subset \(G_\alpha\) of \(\Lambda\) having probability equal to 1. Then, strictly speaking, it cannot be proven that (5) holds for all \(\lambda\) in \(\Lambda\ ,\) but merely that (5) holds for all \(\lambda\) in the set \(G_{\alpha_i}\) (a set of probability 1). Nevertheless, since sets of probability zero are irrelevant for integration, it does follow from this that the probability distribution of the pair of random variables \((Z^1_{\alpha_1},Z^2_{\alpha_2})\) is equal to the (unconditional) probability distribution (3) of the pair of outcomes \((A_1,A_2)\ .\) Note, however, that for the latter argument the "no conspiracy" assumption is crucial: namely, in a "conspiratorial" model, the probability measure \(P\) on \(\Lambda\) depends on \(\alpha_1\) and \(\alpha_2\ .\) Denoting this probability measure by \(P_{\alpha_1,\alpha_2}\ ,\) we have \(P_{\alpha,\alpha}(G_\alpha)=1\ .\) However, the integral in (3) should be taken with respect to the probability measure \(P_{\alpha_1,\alpha_2}\) and it may not be true that \(P_{\alpha_1,\alpha_2}(G_{\alpha_1}\cap G_{\alpha_2})=1\ .\) In other words, the "bad" set (namely, the complement of \(G_{\alpha_1}\cap G_{\alpha_2}\)), consisting of those \(\lambda\) for which the pair \((Z^1_{\alpha_1},Z^2_{\alpha_2})\) is not even well-defined, is not an irrelevant set of probability zero with respect to the relevant probability measure \(P_{\alpha_1,\alpha_2}\ .\) We note also that Bell's inequality theorem requires that the random variables \(Z^i_\alpha\) all be defined on the same probability space (endowed, of course, with *one* probability measure). So, without the "no conspiracy" assumption, we wouldn't have the right ingredients for Bell's inequality theorem.
The importance of the "no conspiracy" assumption for the EPR argument will be discussed again in Subsection 10.3.

^{20 ^}J. S. Bell, La nouvelle cuisine, 1990, reprinted in J. S. Bell, Speakable and unspeakable in quantum mechanics, Cambridge, 2004, p. 239.

^{21 ^}*Ibid.*, p. 232–248.

^{22 ^}*Ibid.*, p. 234.

^{23 ^}J. S. Bell, The theory of local beables, 1975, available online, reprinted in J. S. Bell, Speakable and unspeakable in quantum mechanics, Cambridge, 2004, p. 52 (p. 1 in the online version).

^{24 ^}*Ibid.*, p. 52–53 (p. 1 in the online version), emphasis in the original.

^{25 ^}*Ibid.*, p. 52 (p. 1 in the online version), emphasis in the original.

^{26 ^}J. S. Bell, La nouvelle cuisine, 1990, reprinted in J. S. Bell, Speakable and unspeakable in quantum mechanics, Cambridge, 2004, p. 234, emphasis in the original.

^{27 ^}J. S. Bell, Are there quantum jumps?, 1987, reprinted in J. S. Bell, Speakable and unspeakable in quantum mechanics, Cambridge, 2004, p. 205.

^{28 ^}*Ibid.*, p. 204–205.

^{29 ^}In more mathematical terms, local beables can be seen as defining a map that associates to each (say, open) region \(R\) of spacetime the value at \(R\) (or the restriction to \(R\)) of the given local beable. For example, for an electromagnetic field (or, more generally, for any field over spacetime), one can associate to each region \(R\) of spacetime the restriction of the field to \(R\ .\) Similarly, if a theory posits particle worldlines, then one can associate to each region \(R\) of spacetime the intersection with \(R\) of the particle worldlines. For a value of a local beable at a given region of spacetime one can naturally define a notion of *restriction* to smaller subregions; moreover, the following *covering property* should be satisfied: if a region \(R\) is covered by a family of subregions \(R_i\) and if two specifications \(\mathcal B_1\ ,\) \(\mathcal B_2\) of the value of a local beable at \(R\) have the same restriction to \(R_i\) for all \(i\) then \(\mathcal B_1=\mathcal B_2\ .\) In other words, the value of a local beable at a given region \(R\) can be seen as being *determined* by its restrictions to smaller subregions \(R_i\ .\) A beable for which a notion of restriction to regions of spacetime satisfying this covering property is not available cannot be regarded as a local beable. While for quantum states in quantum field theory a reasonably natural notion of restriction to regions of spacetime is available (namely, the notion of *reduced density matrix* of the quantum state to the given region), the aforementioned covering property does not hold for this notion of restriction: for example, an entangled state and a suitable unentangled state can have the same reduced density matrices on each of the subregions covering \(R\ .\) In particular, quantum states cannot be local beables. The distinction between a local and a non-local beable is related to the distinction between a *sheaf* and a *presheaf*.
More precisely, a beable for which a notion of restriction to regions of spacetime is available defines a presheaf over spacetime and the "covering property" is one of the conditions that must be satisfied in order for this presheaf to be a sheaf.

^{30 ^}J. S. Bell, La nouvelle cuisine, 1990, reprinted in J. S. Bell, Speakable and unspeakable in quantum mechanics, Cambridge, 2004, p. 240.

^{31 ^}*Ibid.*, p. 239–240.

^{32 ^}It is easy to formulate this shielding condition precisely, even though Bell didn't bother to do so: any causal curve connecting an event inside the past light cone of region 2 to an event inside region 1 should cross region 3. Moreover, one should add the requirement (not explicitly mentioned by Bell) that region 3 not intersect the interior of the future light cones of regions 1 and 2.

^{33 ^}Though non-Markovian theories are in some sense "non-local in time", this is not the type of non-locality of interest here.

^{34 ^}The problem is that if region 3 is "too big" then, for some theories, the equality \(P(x_1|x_2,X_3)=P(x_1|X_3)\) might be trivially satisfied, even in the presence of non-local influences. For example, if a theory were deterministic (in some appropriate sense) then for a "very big" region 3 it might happen that \(x_1\) and \(x_2\) are *determined* by (i.e., are functions of) \(X_3\ ,\) in which case \(P(x_1|x_2,X_3)=P(x_1|X_3)\) would trivially hold.

^{35 ^}The expression \(P_{\alpha_1,\alpha_2}(A_1,A_2|X)\) means, of course, \(P(A_1,A_2|X,\alpha_1,\alpha_2)\ .\)

^{36 ^}In Section 5, the "no conspiracy" condition was identified with the mathematical condition that \(\lambda\) is independent of \((\alpha_1,\alpha_2)\ .\) Note, however, that this mathematical condition is not a mathematical definition of "non-conspiratorial" theory, but merely a mathematical formulation of a "no conspiracy" condition in the context of a particular type of experiment. A mathematical definition of "non-conspiratorial" theory would have to be formulated in terms of the things appearing in the formulation of the theory, namely, the beables and their dynamics.

^{37 ^}For example, one cannot give a mathematical proof that the rigorous \(\varepsilon\)-\(\delta\) definition of continuous function is a consequence of the intuitive notion of continuity.

^{38 ^}A. Aspect, P. Grangier, and G. Roger, Experimental tests of realistic local theories via Bell's theorem, *Phys. Rev. Lett.* **47** n. 7 (1981), p. 460–463. A. Aspect, P. Grangier, and G. Roger, Experimental realization of Einstein–Podolsky–Rosen–Bohm *Gedankenexperiment*: a new violation of Bell's inequalities, *Phys. Rev. Lett.* **49** n. 2 (1982), p. 91–94.

^{39 ^}A. Aspect, J. Dalibard, and G. Roger, Experimental test of Bell's inequalities using time-varying analyzers, *Phys. Rev. Lett.* **49** n. 25 (1982), p. 1804–1807.

^{40 ^}Bell discusses this point in his paper: Atomic-cascade photons and quantum-mechanical nonlocality, 1980, reprinted in J. S. Bell, Speakable and unspeakable in quantum mechanics, Cambridge, 2004, p. 105–110.

^{41 ^}G. Weihs, T. Jennewein, C. Simon, H. Weinfurter, and A. Zeilinger, Violation of Bell's inequality under strict Einstein locality conditions, *Phys. Rev. Lett.* **81** n. 23 (1998), p. 5039–5043.

^{42 ^}W. Tittel, J. Brendel, H. Zbinden, and N. Gisin, Violation of Bell inequalities by photons more than 10 km apart, *Phys. Rev. Lett.* **81** n. 17 (1998), p. 3563–3566.

^{43 ^}This loophole is related to the inefficiency of photon detectors. The problem is that, for inefficient detectors, there are *local* toy models that can violate Bell-type inequalities. In these toy models the detected photons are not a random sample of all photons; instead, the photons carry (from the source) specific instructions about whether or not they should allow themselves to be detected. For references presenting these toy models (and also for deeper discussion of experimental loopholes in tests of Bell-type inequalities) see: A. Shimony, Bell's Theorem, *The Stanford Encyclopedia of Philosophy* (2004, revised in 2009), available online.

^{44 ^}M. A. Rowe, D. Kielpinski, V. Meyer, C. A. Sackett, W. M. Itano, C. Monroe, and D. J. Wineland, Experimental violation of a Bell's inequality with efficient detection, *Nature* **409** (2001), p. 791–794, available online.

^{45 ^}D. N. Matsukevich, P. Maunz, D. L. Moehring, S. Olmschenk, and C. Monroe, Bell Inequality Violation with Two Remote Atomic Qubits, *Phys. Rev. Lett.* **100** n. 15 (2008), 150404 (4 pages).

^{46 ^}D. Salart, A. Baas, C. Branciard, N. Gisin, and H. Zbinden, Testing the speed of 'spooky action at a distance', *Nature* **454** (2008), p. 861–864.

^{47 ^}The probability measure on \(\mathbb R^n\) that associates to each Borel subset \(B\) of \(\mathbb R^n\) the probability \(\langle\psi,\chi_B(A_1,\ldots,A_n)\psi\rangle/\langle\psi,\psi\rangle\ ,\) where \(\chi_B\) denotes the characteristic function of \(B\ .\)
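For \(\pm1\)-valued observables (those with \(A^2=\mathbf 1\)) the spectral projections are simply \((\mathbf 1\pm A)/2\), so this probability measure can be computed directly. The Python sketch below, which assumes that restriction to \(\pm1\)-valued observables, builds the joint distribution of commuting observables as \(\langle\psi,\chi_B(A_1,\ldots,A_n)\psi\rangle/\langle\psi,\psi\rangle\) for singleton sets \(B\); the probability of a general set \(B\) is the sum over its points. The helper names (`spectral_projection`, `joint_distribution`, and so on) are our own illustrative choices, not part of any library.

```python
import itertools

def matmul(A, B):
    # Product of two matrices stored as lists of rows.
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def kron(A, B):
    # Tensor (Kronecker) product, first factor outermost.
    return [[A[i][k] * B[j][l] for k in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for j in range(len(B))]

def apply(A, v):
    # Matrix-vector product.
    return [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]

def inner(u, v):
    # Inner product <u, v>, antilinear in the first argument.
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def spectral_projection(A, a):
    # For A with A @ A = identity, the projection onto the eigenspace
    # with eigenvalue a (a = +1 or -1) is (identity + a * A) / 2.
    n = len(A)
    return [[((1 if i == j else 0) + a * A[i][j]) / 2 for j in range(n)] for i in range(n)]

def joint_distribution(observables, psi):
    # P(A_1 = a_1, ..., A_n = a_n) = <psi, chi_B(A_1, ..., A_n) psi> / <psi, psi>
    # for B = {(a_1, ..., a_n)}; chi_B(...) is the product of the (mutually
    # commuting) spectral projections.
    norm2 = inner(psi, psi).real
    dist = {}
    for values in itertools.product((1, -1), repeat=len(observables)):
        E = spectral_projection(observables[0], values[0])
        for A, a in zip(observables[1:], values[1:]):
            E = matmul(E, spectral_projection(A, a))
        dist[values] = inner(psi, apply(E, psi)).real / norm2
    return dist

I2 = [[1, 0], [0, 1]]
sz = [[1, 0], [0, -1]]
A1 = kron(sz, I2)       # spin of particle 1 along z
A2 = kron(I2, sz)       # spin of particle 2 along z (commutes with A1)
psi = [0, 1, -1, 0]     # unnormalized singlet in the basis |00>, |01>, |10>, |11>
dist = joint_distribution([A1, A2], psi)
print(dist[(1, -1)], dist[(-1, 1)])  # perfectly anticorrelated outcomes
```

Note that the division by \(\langle\psi,\psi\rangle\) makes the result independent of the normalization of \(\psi\), which is why the unnormalized singlet vector can be used.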

^{48 ^}One famous "impossibility proof" given by John von Neumann (J. von Neumann, Mathematical foundations of quantum mechanics, Princeton University Press, 1955, translated from the original in German: Mathematische Grundlagen der Quantenmechanik, Springer-Verlag, 1932) amounts simply to a proof of the impossibility of a value map \(v\) that satisfies \(v(A+B)=v(A)+v(B)\) for *any* pair of (not necessarily commuting) self-adjoint operators \(A\ ,\) \(B\ .\) Von Neumann's assumption about \(v\) is, however, not justified by the quantum predictions, as quantum theory makes no predictions for joint values of \(A\ ,\) \(B\) and \(A+B\) when \(A\) and \(B\) do not commute.
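A simple two-dimensional illustration (our own choice of example, not taken from von Neumann's text) shows why additivity for non-commuting operators cannot be demanded of a value map taking values in the spectrum. The Pauli matrices \(\sigma_x\) and \(\sigma_z\) each have spectrum \(\{-1,1\}\), so additivity would force \(v(\sigma_x+\sigma_z)\in\{-2,0,2\}\), whereas \(\sigma_x+\sigma_z\) has spectrum \(\{-\sqrt2,\sqrt2\}\). The Python sketch below merely checks this arithmetic.

```python
import itertools
import math

sx = [[0, 1], [1, 0]]
sz = [[1, 0], [0, -1]]
A_plus_B = [[sx[i][j] + sz[i][j] for j in range(2)] for i in range(2)]

# Eigenvalues of a real symmetric 2x2 matrix from its trace and determinant.
tr = A_plus_B[0][0] + A_plus_B[1][1]
det = A_plus_B[0][0] * A_plus_B[1][1] - A_plus_B[0][1] * A_plus_B[1][0]
disc = math.sqrt(tr * tr - 4 * det)
spectrum_sum = {(tr + disc) / 2, (tr - disc) / 2}    # {sqrt(2), -sqrt(2)}

# Values of v(sx) + v(sz) when v(sx) and v(sz) each lie in the spectrum {+1, -1}.
additive_values = {a + b for a, b in itertools.product((1, -1), repeat=2)}

print(sorted(additive_values))          # [-2, 0, 2]
print(spectrum_sum & additive_values)   # set(): additivity cannot hold here
```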

^{49 ^}Actually there is a minor technical difficulty here: compatibility with quantum theory implies that the equalities \(Z_{A+B}=Z_A+Z_B\) and \(Z_{AB}=Z_AZ_B\) (when \(A\) and \(B\) commute) must hold *almost surely* (i.e., with probability 1), but not necessarily at *every* point \(\lambda\) of the probability space \(\Lambda\ .\) (Also the condition that \(Z_A\) takes values in the spectrum of \(A\) should hold merely almost surely.) However, for any countable set of operators one can find \(\lambda\in\Lambda\) such that the corresponding map \(v(A)=Z_A(\lambda)\) satisfies the required properties of a value map for that countable set of operators. Moreover, only a countable (actually, only a finite) number of operators are required for most standard proofs of the impossibility of value maps. Also, when \(\mathcal H\) is finite-dimensional, a compactness argument can be used (for instance, Rado's selection lemma) to establish the existence of a globally defined value map \(v\) from the existence of value maps defined for finite sets of operators.

^{50 ^}A. M. Gleason, Measures on the closed subspaces of a Hilbert space, *J. Math. Mech.* **6** n. 6 (1957), p. 885–893, available online.

^{51 ^}S. Kochen and E. P. Specker, The problem of hidden variables in quantum mechanics, *J. Math. Mech.* **17** n. 1 (1967), p. 59–87.

^{52 ^}N. D. Mermin, Hidden variables and the two theorems of John Bell, *Rev. Mod. Phys.* **65** n. 3 (1993), p. 803–815.

^{53 ^}*Ibid.*, Section V.

^{54 ^}D. L. Hemmick, Hidden variables and nonlocality in quantum mechanics, PhD thesis, Department of Mathematics, Rutgers University, 1996, available online.

^{55 ^}A. Stairs, Quantum logic, realism and value-definiteness, *Philos. Sci.* **50** n. 4 (1983), p. 578–602. P. Heywood and M. L. G. Redhead, Nonlocality and the Kochen–Specker paradox, *Found. Phys.* **13** n. 5 (1983), p. 481–499. H. R. Brown and G. Svetlichny, Nonlocality and Gleason's lemma. Part I. Deterministic Theories, *Found. Phys.* **20** n. 11 (1990), p. 1379–1387. A. Elby, Nonlocality and Gleason's lemma. Part 2. Stochastic theories, *Found. Phys.* **20** n. 11 (1990), p. 1389–1397. The argument can also be found in: J. H. Conway and S. Kochen, The strong free will theorem, *Not. Am. Math. Soc.* **56** n. 2 (2009), p. 226–232, available online. This paper has recently received a good deal of attention.

^{56 ^}D. M. Greenberger, M. A. Horne, and A. Zeilinger, Going beyond Bell's theorem, in Bell's theorem, quantum theory and conceptions of the universe, M. Kafatos (editor), Kluwer, 1989, p. 69–72, arXiv:0712.0921v1.

^{57 ^}L. Hardy, Nonlocality for two particles without inequalities for almost all entangled states, *Phys. Rev. Lett.* **71** n. 11 (1993), p. 1665–1668.

^{58 ^}The state \(\psi\) thus defined is *not* independent of the chosen orthonormal bases. When \(\mathcal H=\mathcal H_1=\mathcal H_2\) is two-dimensional and the basis \((e'_1,e'_2)\) is given by \(e'_1=e_2\ ,\) \(e'_2=-e_1\ ,\) the corresponding maximally entangled state is the *singlet state*, which is independent of the choice of the basis \((e_1,e_2)\) (up to a physically irrelevant phase factor).
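To see the last claim concretely: if \((f_1,f_2)\) is another orthonormal basis, with \(f_k=Ue_k\) for a unitary \(U\), then \(f_1\otimes f_2-f_2\otimes f_1=\det(U)\,(e_1\otimes e_2-e_2\otimes e_1)\), so the two singlet vectors differ only by the phase \(\det U\). The Python sketch below checks this numerically for one arbitrarily chosen \(U\); the parameters `t` and `phi` are illustrative and carry no significance.

```python
import cmath
import math

def tensor(u, v):
    # Coordinates of u (x) v in the product basis, first factor outermost.
    return [a * b for a in u for b in v]

def singlet_from_basis(f1, f2):
    # (1/sqrt 2) sum_i f_i (x) f'_i with f'_1 = f2 and f'_2 = -f1,
    # i.e. (f1 (x) f2 - f2 (x) f1) / sqrt(2).
    return [(a - b) / math.sqrt(2) for a, b in zip(tensor(f1, f2), tensor(f2, f1))]

psi = singlet_from_basis([1, 0], [0, 1])   # built from the standard basis (e1, e2)

# Columns f1, f2 of an arbitrary 2x2 unitary U (illustrative parameters).
t, phi = 0.7, 1.3
f1 = [math.cos(t), math.sin(t)]
f2 = [-cmath.exp(1j * phi) * math.sin(t), cmath.exp(1j * phi) * math.cos(t)]
det_U = f1[0] * f2[1] - f2[0] * f1[1]      # equals exp(1j * phi)

psi_f = singlet_from_basis(f1, f2)
# The two vectors agree up to the global phase det(U).
print(all(abs(x - det_U * y) < 1e-12 for x, y in zip(psi_f, psi)))  # True
```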

^{59 ^}Here are the details: let \(\overline A\) be the operator on \(\mathcal H_2\) whose matrix with respect to the basis \((e'_1,\ldots,e'_n)\) is the complex conjugate of the matrix of \(A\) with respect to the basis \((e_1,\ldots,e_n)\) (equivalently, since \(A\) is self-adjoint, its transpose). It is readily checked that \(\psi\) is an eigenstate of the operator \(A\otimes\mathbf1-\mathbf1\otimes\overline A\) with eigenvalue zero.
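A quick numerical check of this eigenvalue claim, as a sketch: both bases are taken to be the standard basis of \(\mathbb C^3\) (so the matrix of \(\overline A\) is simply the entrywise conjugate of the matrix of \(A\)), and the matrix `A` below is an arbitrary self-adjoint example of ours.

```python
import math

def kron(A, B):
    # Tensor (Kronecker) product, first factor outermost.
    return [[A[i][k] * B[j][l] for k in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for j in range(len(B))]

def apply(M, v):
    # Matrix-vector product.
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

n = 3
# An arbitrary self-adjoint matrix (illustrative values): A[j][i] is the conjugate of A[i][j].
A = [[2, 1 - 1j, 0.5j],
     [1 + 1j, -1, 3],
     [-0.5j, 3, 0.25]]
A_bar = [[complex(x).conjugate() for x in row] for row in A]   # entrywise conjugate
assert A_bar == [list(col) for col in zip(*A)]                 # equals the transpose
I = [[1 if i == j else 0 for j in range(n)] for i in range(n)]

# psi = (1/sqrt n) sum_i e_i (x) e'_i, both bases being the standard basis.
psi = [1 / math.sqrt(n) if i == j else 0 for i in range(n) for j in range(n)]

M = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(kron(A, I), kron(I, A_bar))]
residual = apply(M, psi)
print(max(abs(x) for x in residual) < 1e-12)   # True: M psi = 0
```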

^{60 ^}N. D. Mermin,*op. cit.*, Section VI.

^{61 ^}Here are the algebraic properties of the observables \(\sigma^i_\alpha\) which are relevant for this computation: (a) the square of \(\sigma^i_\alpha\) (\(i=1,2,3\ ,\) \(\alpha=x,y\)) is the identity; (b) \(\sigma^i_\alpha\) commutes with \(\sigma^j_\beta\) for \(i\ne j\ ;\) (c) \(\sigma^i_x\) anti-commutes with \(\sigma^i_y\) (because the axes \(x\) and \(y\) are orthogonal).
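These three properties are easily checked with explicit Pauli matrices. The Python sketch below does so and then uses them in the GHZ-type combination commonly used in this argument (we assume, rather than confirm, that it matches the computation in the main text): the products \(\sigma^1_x\sigma^2_y\sigma^3_y\ ,\) \(\sigma^1_y\sigma^2_x\sigma^3_y\) and \(\sigma^1_y\sigma^2_y\sigma^3_x\) multiply to \(-\sigma^1_x\sigma^2_x\sigma^3_x\ .\) The helper `sigma(i, axis)` is our own notation for \(\sigma^i_{\rm axis}\) acting on three qubits.

```python
import itertools

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def kron(A, B):
    return [[A[i][k] * B[j][l] for k in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for j in range(len(B))]

def close(A, B, tol=1e-12):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def neg(A):
    return [[-a for a in row] for row in A]

I2 = [[1, 0], [0, 1]]
PAULI = {"x": [[0, 1], [1, 0]], "y": [[0, -1j], [1j, 0]]}
ID8 = [[1 if i == j else 0 for j in range(8)] for i in range(8)]

def sigma(i, axis):
    # Pauli matrix for `axis` acting on qubit i (1, 2 or 3), identity elsewhere.
    factors = [I2, I2, I2]
    factors[i - 1] = PAULI[axis]
    return kron(kron(factors[0], factors[1]), factors[2])

ops = {(i, a): sigma(i, a) for i in (1, 2, 3) for a in "xy"}
for s in ops.values():
    assert close(matmul(s, s), ID8)                                   # property (a)
for (i, a), (j, b) in itertools.product(ops, repeat=2):
    if i != j:                                                        # property (b)
        assert close(matmul(ops[i, a], ops[j, b]), matmul(ops[j, b], ops[i, a]))
for i in (1, 2, 3):                                                   # property (c)
    assert close(matmul(ops[i, "x"], ops[i, "y"]), neg(matmul(ops[i, "y"], ops[i, "x"])))

def prod3(a, b, c):
    # sigma^1_a sigma^2_b sigma^3_c
    return matmul(matmul(ops[1, a], ops[2, b]), ops[3, c])

product = matmul(matmul(prod3("x", "y", "y"), prod3("y", "x", "y")), prod3("y", "y", "x"))
print(close(product, neg(prod3("x", "x", "x"))))   # True
```

Only properties (a)–(c) enter: on qubit 2 the factors give \(\sigma_y\sigma_x\sigma_y=-\sigma_x\) by anticommutation, while qubits 1 and 3 give \(\sigma_x\) because \(\sigma_y^2=\mathbf 1\).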

^{62 ^}This three-sided analogue of the EPR argument can be formulated mathematically — just as the ordinary EPR argument can — using (the suitable three-sided version of) the factorizability condition (4) as a necessary condition for locality. The following elementary lemma from probability theory must be used for this purpose: if the product of independent \(\pm1\)-valued random variables is constant then each of these random variables must be constant.
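The lemma has a short proof via expectations, sketched here for completeness (the notation \(\mathbb E\) for expectation is ours):

```latex
Let \(X_1,\ldots,X_n\) be independent random variables with values in \(\{-1,1\}\),
and suppose that \(X_1X_2\cdots X_n=c\) almost surely, for some constant \(c\in\{-1,1\}\).
By independence,
\[
  c=\mathbb E[X_1X_2\cdots X_n]=\mathbb E[X_1]\,\mathbb E[X_2]\cdots\mathbb E[X_n],
  \qquad\text{hence}\qquad
  \prod_{k=1}^{n}\bigl|\mathbb E[X_k]\bigr|=1 .
\]
Since \(|\mathbb E[X_k]|\le1\) for every \(k\), each factor satisfies \(|\mathbb E[X_k]|=1\).
But \(\mathbb E[X_k]=P(X_k=1)-P(X_k=-1)=2P(X_k=1)-1\), so \(P(X_k=1)\in\{0,1\}\),
i.e., each \(X_k\) is almost surely constant.
```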

^{63 ^}S. Goldstein, Nonlocality without inequalities for almost all entangled states for two particles, *Phys. Rev. Lett.* **72** n. 13 (1994), p. 1951, available online.

^{64 ^}The arguments that follow can be easily formulated mathematically using the factorizability condition (4) as a consequence of locality.

^{65 ^}J. S. Bell, Bertlmann's socks and the nature of reality, 1980, available online, reprinted in J. S. Bell, Speakable and unspeakable in quantum mechanics, Cambridge, 2004, p. 157 (p. 19 in the online version).

^{66 ^}Bell discusses the position of Bohr regarding the EPR argument in the first appendix of his paper: Bertlmann's socks and the nature of reality, 1980, available online, reprinted in J. S. Bell, Speakable and unspeakable in quantum mechanics, Cambridge, 2004, p. 139–168.

^{67 ^}For further discussion, see also: T. Norsen, Against 'realism', *Found. Phys.* **37** n. 3 (2007), p. 311–340, arXiv:quant-ph/0607057v2.

^{68 ^}Complex experiments (whose results were published in prestigious journals) have been motivated by this particular misinterpretation of Bell's theorem. See, for instance: S. Gröblacher, T. Paterek, R. Kaltenbaek, Č. Brukner, M. Żukowski, M. Aspelmeyer, and A. Zeilinger, An experimental test of non-local realism, *Nature* **446** (2007), p. 871–875, arXiv:0704.2529v2. For criticism of this paper, see: F. Laudisa, Non-local realistic theories and the scope of the Bell theorem, *Found. Phys.* **38** n. 12 (2008), p. 1110–1132, arXiv:0811.2862v1.

^{69 ^}J. S. Bell, Bertlmann's socks and the nature of reality, 1980, available online, reprinted in J. S. Bell, Speakable and unspeakable in quantum mechanics, Cambridge, 2004, p. 150 (p. 11 in the online version).

^{70 ^}For instance, it is not clear whether or not the quantum state is supposed to be a beable (and it definitely is not a local beable). As explained above, quantum observables are not beables. Moreover, it is not possible to simply give beable status to all quantum observables, as that would amount to having a non-contextual hidden variables theory and those are incompatible with the quantum predictions. However, if one at least concedes that measurement outcomes recorded in the configuration of macroscopic instruments are elements of physical reality (which seems, for example, to have been Bohr's view) then these will evidently be among the local beables, though these local beables are not explicit in standard presentations of quantum theory.

^{71 ^}One might be tempted here to say, instead, that the relevant difference between the two cases is the fact that the temperature is a local *observable* localized around the wife. This thought would motivate formulations of "locality" in terms of "observables", instead of beables, such as this: "Locality means that goings-on in one region of spacetime should not influence the value of an *observable* localized outside the future light cone of that region". However, there are at least two serious problems with any such attempts at formulating locality in terms of "observables" instead of beables. The first problem is that locality is supposed to be a fundamental notion, while any such attempts will inevitably lead to anthropocentric notions. (An attempt to formulate locality exclusively in terms of observers would likely end up producing a formulation of a condition known as no superluminal signalling instead of a formulation of locality.) The second problem is that, in general, it might not be meaningful to talk about "influencing the value of an observable" since "observables" don't necessarily have well-defined values that could be influenced. In quantum theory, the "observables" are, of course, always associated with probability distributions (for a given state) and one could try to think about locality in terms of the way that these probability distributions change, depending on what happens at a space-like separated region. The difficulty is that, without clarification of what the elements of reality are, it won't be possible to distinguish changes of probability distributions that reflect changes in physical reality from changes of probability distributions that reflect merely changes in our information about a system.

^{72 ^}J. S. Bell, La nouvelle cuisine, 1990, reprinted in J. S. Bell, Speakable and unspeakable in quantum mechanics, Cambridge, 2004, p. 234, emphasis in the original.

^{73 ^}*Ibid.*

^{74 ^}This type of criterion for the meaningfulness of sentences (which usually goes by the name of *verificationism*) is one of the basic tenets of a school of philosophy known as *logical positivism*. A detailed criticism of the various forms of logical positivism and verificationism is beyond the scope of this article. We should mention, however, that most scientific arguments do not go through if statements about physical reality are taken to be meaningless; it is not just Bell's argument that requires discussion about physical reality. In fact, physicists (and scientists in general) rarely adopt logical positivist positions when the subject of discussion is not the foundations of quantum theory.

^{75 ^}Even this point is liable to generate controversy. Since standard formulations of quantum theory are not explicit about the theory's elements of physical reality, it is not clear that these elements should be identified with the objects that are determined by (i.e., are functions of) the quantum state. Even though the orthodox interpretation is supposed to claim that the quantum state is the full description of the system, some authors (who would identify themselves as proponents of the orthodox formulation) would claim that the quantum state represents merely "our knowledge" or is merely a calculational device. Happily, if the EPR argument is used as the first part of Bell's theorem, rather than as an argument against some particular view of quantum theory, then it becomes irrelevant to reach agreement on what exactly quantum theory says.

^{76 ^}A. Einstein, B. Podolsky, and N. Rosen,*op. cit.*, p. 777.

^{77 ^}The use of counterfactuals (if it were indeed necessary) should raise suspicion: not only are there controversies regarding the appropriate semantics for counterfactuals, but the problem of formulating such a semantics becomes particularly difficult in the case of stochastic theories (and one most definitely shouldn't take determinism as an innocuous assumption in an analysis of quantum theory).

^{78 ^}Let us analyze this matter in terms of the mathematical formulation of the EPR argument. Recall that in a "conspiratorial" model the probability measure \(P\) on \(\Lambda\) depends on the axes \(\alpha_1\ ,\) \(\alpha_2\ ,\) so we denote it by \(P_{\alpha_1,\alpha_2}\ .\) The mathematical formulation of the thesis of the "single axis" version of the EPR argument (for an axis \(\alpha\)) is the following: there exists a subset \(G_\alpha\) of \(\Lambda\) with \(P_{\alpha,\alpha}(G_\alpha)=1\) such that, for all \(\lambda\in G_\alpha\ ,\) the probability distribution of the outcome \(A_i\) given \(\lambda\) is supported by a single outcome \(Z^i_\alpha(\lambda)\ ,\) i.e., \(P_{\alpha,\alpha}\big(A_i=Z^i_\alpha(\lambda)|\lambda\big)=1\ ,\) for \(i=1,2\ .\) The set \(G_\alpha\) can be thought of as containing the values of \(\lambda\) which codify a set of data that includes the (pre-determined) outcomes of spin measurements along the axis \(\alpha\ .\) The proof of this "single axis" version of the EPR argument does not require the "no conspiracy" assumption. However, the "several axes" version of the EPR argument does require the "no conspiracy" assumption. Indeed, observe that given two distinct axes (say, the \(z\)-axis and the \(x\)-axis) then, for a "conspiratorial" model, the "good" sets \(G_z\) and \(G_x\) have probability 1 with respect to the (possibly) *distinct* probability measures \(P_{z,z}\) and \(P_{x,x}\ ,\) respectively. In particular, nothing prevents the sets \(G_z\) and \(G_x\) from being *disjoint*, in which case it would *never* happen that the set of data codified by \(\lambda\) included the outcomes of spin measurements along both the \(z\)-axis and the \(x\)-axis. Of course, there is no difficulty in establishing (as we already have) the thesis of the "several axes" version of the EPR argument under the "no conspiracy" assumption.

^{79 ^}See, for instance: P.-A. Meyer, Quantum probability for probabilists, Springer-Verlag (Lecture notes in mathematics 1538), 1995.

^{80 ^}Of course, unit vectors differing only by a phase should be identified, as usual. Also, a more general definition of quantum probability space is obtained by replacing the unit vector \(\psi\) (a pure state) with a mixed state, i.e., a positive operator in \(\mathcal H\) of trace equal to 1. An even more general definition can be given, in terms of von Neumann algebras. However, the restricted definition of quantum probability space that we have adopted is sufficient for the purposes of this section.

^{81 ^}In the classical case the lattice is a *Boolean algebra* while, in the quantum case, the lattice is not even distributive. A Boolean algebra is the appropriate mathematical object for the semantics of the *classical propositional calculus* (or classical logic). The non-distributive lattice of closed subspaces of a Hilbert space is also the object of study of what is normally called *quantum logic*. See, for instance: G. Birkhoff and J. von Neumann, The logic of quantum mechanics, *Ann. Math. (2)* **37** n. 4 (1936), p. 823–843. V. S. Varadarajan, Geometry of quantum theory, Springer-Verlag, 1985.

^{82 ^}The *quantum random variables* of a quantum probability space \((\mathcal H,\psi)\) are the self-adjoint operators on the Hilbert space \(\mathcal H\ .\) Those constitute a (subspace of a) non-commutative algebra, while classical random variables (i.e., real-valued measurable functions over a classical probability space) form a commutative algebra. Because of the spectral theorem, families of *commuting* quantum random variables can be identified, in a suitable sense, with families of classical random variables on a common classical probability space.

^{83 ^}J. Jarrett, On the physical significance of the locality condition in the Bell arguments, *Noûs* **18** (1984), p. 569–589.

^{84 ^}For example, in Bohmian mechanics (as in any deterministic theory) the outcomes \(A_1\ ,\) \(A_2\) are functions of \(\alpha_1\ ,\) \(\alpha_2\) and \(\lambda\ ,\) and therefore condition (OI) is trivially satisfied; of course, (PI) is violated. In theories in which \(\lambda\) is nothing but the quantum state (for instance, spontaneous collapse theories of GRW type), (PI) is satisfied and (OI) is violated.

^{85 ^}For a detailed discussion, see: T. Norsen, Local causality and completeness: Bell vs. Jarrett, *Found. Phys.* **39** n. 3 (2009), p. 273–294, arXiv:0808.2178v1.

^{86 ^}Of course, the relevant non-local beables should also be included in \(\lambda\) in case the theory under consideration posits non-local beables.

^{87 ^}Notice that the experiments relevant to the CHSH–Bell inequality *are not* quantum measurements of the observable \(S\ ,\) but rather quantum measurements of the observables \(X_iY_j\ .\) The conclusion that the left hand side of the CHSH–Bell inequality (with absolute values removed) equals the expected value of \(S\) follows from the fact that the operation of taking the expected value of quantum observables is linear.
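The linearity argument can be checked numerically. The Python sketch below assumes the convention \(S=X_1X_2+X_1Y_2+Y_1X_2-Y_1Y_2\) (this sign placement is our assumption), takes the singlet state, and uses spin observables along directions in the \(x\)–\(z\) plane at illustrative angles. It computes the four correlations that separate experimental runs actually estimate and checks that their signed sum equals \(\langle S\rangle\).

```python
import math

def kron(A, B):
    return [[A[i][k] * B[j][l] for k in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for j in range(len(B))]

def apply(A, v):
    return [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]

def expect(M, psi):
    # <psi, M psi> for a normalized real vector psi and a real symmetric matrix M.
    return sum(a * b for a, b in zip(psi, apply(M, psi)))

def combine(*terms):
    # Linear combination sum_k c_k M_k, given (c_k, M_k) pairs of equal size.
    n = len(terms[0][1])
    return [[sum(c * M[i][j] for c, M in terms) for j in range(n)] for i in range(n)]

def spin(theta):
    # cos(theta) sigma_z + sin(theta) sigma_x, a +-1-valued observable.
    return [[math.cos(theta), math.sin(theta)], [math.sin(theta), -math.cos(theta)]]

r = 1 / math.sqrt(2)
psi = [0, r, -r, 0]                               # singlet, basis |00>, |01>, |10>, |11>
X1, Y1 = spin(0), spin(math.pi / 2)               # illustrative settings, particle 1
X2, Y2 = spin(math.pi / 4), spin(-math.pi / 4)    # illustrative settings, particle 2

# The four correlations, each estimated in its own experimental run.
E = {name: expect(kron(A, B), psi)
     for name, A, B in [("X1X2", X1, X2), ("X1Y2", X1, Y2),
                        ("Y1X2", Y1, X2), ("Y1Y2", Y1, Y2)]}
lhs = E["X1X2"] + E["X1Y2"] + E["Y1X2"] - E["Y1Y2"]

# The same number as the expected value of the single observable S.
S = combine((1, kron(X1, X2)), (1, kron(X1, Y2)), (1, kron(Y1, X2)), (-1, kron(Y1, Y2)))
print(abs(expect(S, psi) - lhs) < 1e-12)   # True (linearity of the expected value)
print(abs(lhs))                            # approximately 2.8284, i.e. 2*sqrt(2)
```

No measurement of \(S\) itself is simulated here; the agreement is purely the linearity of \(M\mapsto\langle\psi,M\psi\rangle\).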

^{88 ^}More precisely, the inequality \(\vert\langle S\rangle\vert\le2\) is *not* the same inequality that we call "the CHSH–Bell inequality" in the rest of the article; indeed, the two inequalities differ with respect to the placement of the absolute values. Nevertheless, in order to simplify the exposition, we will refer to \(\vert\langle S\rangle\vert\le2\) as "the CHSH–Bell inequality" in the discussion that follows. We observe, however, that the arguments that follow can be easily adapted to the CHSH–Bell inequality considered in the rest of the article (and these adaptations don't affect the established conclusions). For this purpose, note that the left hand side of the CHSH–Bell inequality considered in the rest of the article is equal to the maximum of \(\vert\langle S\rangle\vert\) and \(\vert\langle S'\rangle\vert\ ,\) where \(S'\) is defined by the same expression that defines \(S\ ,\) with \(Y_1\) replaced by \(-Y_1\ .\)
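The identity behind the last sentence is elementary. Assuming (as our own convention here) \(S=X_1X_2+X_1Y_2+Y_1X_2-Y_1Y_2\ ,\) one has \(\langle S\rangle=a+b\) and \(\langle S'\rangle=a-b\) with \(a=\langle X_1X_2\rangle+\langle X_1Y_2\rangle\) and \(b=\langle Y_1X_2\rangle-\langle Y_1Y_2\rangle\ ,\) and then:

```latex
\[
  \max\bigl(|a+b|,\,|a-b|\bigr)=|a|+|b| \qquad\text{for all real } a,\,b .
\]
Indeed, if \(ab\ge0\) then \(|a+b|=|a|+|b|\ge|a-b|\), and if \(ab<0\) then
\(|a-b|=|a|+|b|>|a+b|\). Hence
\[
  \max\bigl(|\langle S\rangle|,\,|\langle S'\rangle|\bigr)
  =\bigl|\langle X_1X_2\rangle+\langle X_1Y_2\rangle\bigr|
  +\bigl|\langle Y_1X_2\rangle-\langle Y_1Y_2\rangle\bigr| .
\]
```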

^{89 ^}An extra argument is necessary to establish the latter equivalence: notice that the self-adjoint operator \([X_1,Y_1][X_2,Y_2]\) is conjugated by the unitary operator \(X_1\) to the self-adjoint operator \(-[X_1,Y_1][X_2,Y_2]\ .\) This implies that the spectrum of \([X_1,Y_1][X_2,Y_2]\) is symmetric with respect to the origin; therefore, if \([X_1,Y_1][X_2,Y_2]\) is non-zero, its spectrum contains both a positive and a negative real number, and hence the spectrum of \(S^2\) contains some real number greater than 4. Moreover, keep in mind that \([X_1,Y_1]\) and \([X_2,Y_2]\) act on distinct factors of a tensor product and hence their product vanishes if and only if one of them vanishes.
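Assuming the convention \(S=X_1X_2+X_1Y_2+Y_1X_2-Y_1Y_2\) (our assumption) and \(X_i^2=Y_i^2=\mathbf1\ ,\) a short computation gives \(S^2=4\cdot\mathbf1-[X_1,Y_1][X_2,Y_2]\ .\) The Python sketch below checks this identity, together with the conjugation property \(X_1\,[X_1,Y_1][X_2,Y_2]\,X_1^{-1}=-[X_1,Y_1][X_2,Y_2]\ ,\) for arbitrarily chosen \(\pm1\)-valued spin observables (the angles are illustrative).

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def kron(A, B):
    return [[A[i][k] * B[j][l] for k in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for j in range(len(B))]

def combine(*terms):
    # Linear combination sum_k c_k M_k, given (c_k, M_k) pairs of equal size.
    n = len(terms[0][1])
    return [[sum(c * M[i][j] for c, M in terms) for j in range(n)] for i in range(n)]

def close(A, B, tol=1e-12):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def comm(A, B):
    return combine((1, matmul(A, B)), (-1, matmul(B, A)))

def spin(theta, phi):
    # n . sigma for the unit vector with polar angle theta and azimuth phi (squares to 1).
    st, ct = math.sin(theta), math.cos(theta)
    return [[ct, st * complex(math.cos(phi), -math.sin(phi))],
            [st * complex(math.cos(phi), math.sin(phi)), -ct]]

I2 = [[1, 0], [0, 1]]
ID4 = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
X1, Y1 = spin(0.3, 1.1), spin(1.9, -0.4)    # illustrative settings
X2, Y2 = spin(2.2, 0.7), spin(0.8, 2.5)

S = combine((1, kron(X1, X2)), (1, kron(X1, Y2)), (1, kron(Y1, X2)), (-1, kron(Y1, Y2)))
C = kron(comm(X1, Y1), comm(X2, Y2))        # [X1,Y1][X2,Y2] on H1 (x) H2

print(close(matmul(S, S), combine((4, ID4), (-1, C))))     # True: S^2 = 4 - C
U = kron(X1, I2)                                           # X1 (x) 1, its own inverse
print(close(matmul(matmul(U, C), U), combine((-1, C),)))   # True: U C U^{-1} = -C
```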

^{90 ^}H. Everett III, Theory of the universal wave function, PhD thesis, Princeton University, 1956, available online. H. Everett III, Relative state formulation of quantum mechanics, *Rev. Mod. Phys.* **29** n. 3 (1957), p. 454–462, available online. B. S. DeWitt and R. N. Graham (editors), The many-worlds interpretation of quantum mechanics, Princeton University Press, 1973.

^{91 ^}C. Rovelli, Relational Quantum Mechanics, *Int. J. Theor. Phys.* **35** n. 8 (1996), p. 1637–1678, arXiv:quant-ph/9609002v2.

^{92 ^}V. Allori, S. Goldstein, R. Tumulka, and N. Zanghì, Many-worlds and Schrödinger's First Quantum Theory, *Brit. J. Philos. Sci.*, to appear, arXiv:0903.2211v2.

^{93 ^}M. Gell-Mann and J. B. Hartle, Quantum mechanics in the light of quantum cosmology, in Complexity, entropy, and the physics of information, W. Zurek (editor), Addison-Wesley, 1990, p. 425–458. R. Omnès, Logical reformulation of quantum mechanics. I. Foundations, *J. Stat. Phys.* **53** n. 3–4 (1988), p. 893–932. R. B. Griffiths, Consistent histories and the interpretation of quantum mechanics, *J. Stat. Phys.* **36** n. 1–2 (1984), p. 219–272.

^{94 ^}See, for instance: R. B. Griffiths, Quantum Locality, *Found. Phys.*, to appear, arXiv:0908.2914v2.

^{95 ^}More generally, if \(A_1,\ldots,A_n\) are mutually commuting self-adjoint operators, \(B\) is a Borel subset of \(\mathbb R^n\) and \(\chi_B\) denotes its characteristic function then the projection operator \(\chi_B(A_1,\ldots,A_n)\) represents the proposition stating that the "value" of \((A_1,\ldots,A_n)\) is in \(B\ .\)

^{96 ^}The probability is \(\langle E\,\psi,E\,\psi\rangle\ ,\) where \(E\) is the appropriately time-ordered product of the operators \(E_i\) and \(\psi\) denotes the normalized (fixed) quantum state.
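A minimal illustration, assuming trivial dynamics (so that the Heisenberg-picture projections do not change with time) and a single qubit: take \(\psi=|0\rangle\ ,\) an earlier event \(E_1=|{+}\rangle\langle{+}|\) and a later event \(E_2=|0\rangle\langle0|\ .\) With later times on the left, \(E=E_2E_1\) gives probability \(1/4\), while the reversed product \(E_1E_2\) would give \(1/2\), which is why the time ordering matters. The function name `history_probability` is ours.

```python
import math

def apply(A, v):
    return [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]

def projector(v):
    # |v><v| for a real unit vector v.
    return [[a * b for b in v] for a in v]

def history_probability(projections, psi):
    # `projections` are listed chronologically (E_1, ..., E_n); applying them to psi
    # in that order computes E psi with E = E_n ... E_1 (later times on the left).
    v = psi
    for P in projections:
        v = apply(P, v)
    return sum(abs(x) ** 2 for x in v)     # <E psi, E psi>

r = 1 / math.sqrt(2)
ket0, ket_plus = [1.0, 0.0], [r, r]
E1, E2 = projector(ket_plus), projector(ket0)
print(history_probability([E1, E2], ket0))   # approximately 0.25
print(history_probability([E2, E1], ket0))   # approximately 0.5 (order reversed)
```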

^{97 ^}S. Goldstein, Quantum theory without observers—part one, *Phys. Today* **51** n. 3 (1998), p. 42–46. S. Goldstein, Quantum theory without observers—part two, *Phys. Today* **51** n. 4 (1998), p. 38–42. (Both parts are available online.)

^{98 ^}See, for instance: R. Omnès, The interpretation of quantum mechanics, Princeton University Press, 1994, p. 163.

^{99 ^}One should have an action of the Lorentz (more precisely, the Poincaré) group on experimental procedures that leaves probability distributions of outcomes invariant.

^{100 ^}By a coordinate system over spacetime is meant, of course, an assignment of coordinates \((t,\vec x)\) to spacetime points (or events). In terms of the mathematical formalism of the theory, a family of coordinate systems over spacetime, related to each other by compositions of spacetime translations with spatial rotations, is taken as primitive. Through these, the notions of absolute rest and time are then defined: velocities, defined (in the usual way) in terms of these coordinate systems, are to be taken seriously as *absolute* velocities and, moreover, elapsed times computed using the time-coordinate of these coordinate systems are to be taken seriously as *absolute* elapsed times.

^{101 ^}J. S. Bell, How to teach special relativity, 1976, reprinted in J. S. Bell, Speakable and unspeakable in quantum mechanics, Cambridge, 2004, p. 67–80.

^{102 ^}In the Einsteinian theory there is no preferred family of coordinate systems over spacetime related to each other by compositions of spacetime translations with spatial rotations; rather, there is a preferred family of coordinate systems over spacetime related to each other by elements of the Poincaré group (i.e., compositions of spacetime translations with Lorentz transformations). Moreover, in the Einsteinian (but not in the Lorentzian) theory, the speed of light is incorporated into the structure of spacetime. In the Lorentzian (but not in the Einsteinian) theory there is a well-defined splitting of the electromagnetic field into an electric and a magnetic field so that, for instance, it is meaningful to say that "the magnetic field vanishes in a given region of spacetime".

^{103 ^}*Ibid.*, p. 77, emphasis in the original.

^{104 ^}*Ibid.*, emphasis in the original.

^{105 ^}J. S. Bell, Quantum mechanics for cosmologists, 1981, reprinted in J. S. Bell, Speakable and unspeakable in quantum mechanics, Cambridge, 2004, p. 133.

^{106 ^}It is true that there are many highly non-trivial open problems concerning the issue of the *mathematically rigorous construction* of physically interesting quantum field theories. But the difficulties involved in solving these problems are not related to non-locality.

^{107 ^}J. S. Bell, Beables for quantum field theory, 1984, available online, reprinted in J. S. Bell, Speakable and unspeakable in quantum mechanics, Cambridge, 2004, p. 173–180. D. Bohm and B. J. Hiley, The undivided universe: an ontological interpretation of quantum theory, Routledge, 1993. D. Dürr, S. Goldstein, R. Tumulka, and N. Zanghì, Bohmian mechanics and quantum field theory, *Phys. Rev. Lett.* **93** n. 9 (2004), 090402 (4 pages), arXiv:quant-ph/0303156v2. D. Dürr, S. Goldstein, R. Tumulka, and N. Zanghì, Bell-type quantum field theories, *J. Phys. A: Math. Gen.* **38** n. 4 (2005), 43 pages, arXiv:quant-ph/0407116v1. W. Struyve, Pilot-wave theory and quantum fields, *Rep. Prog. Phys.* **73** n. 10 (2010), 106001 (30 pages), arXiv:0707.3685v4.

^{108 ^}One might be tempted to say that "fundamentally relativistic" means "Lorentz invariant" (if we ignore general relativity). However, it turns out that it is not easy to give a precise and uncontroversial formulation of what it means to say that a theory is invariant under a certain group. And it is not even clear that being "fundamentally relativistic" should be defined in terms of invariance groups. (And, of course, it may simply be that *there is no such thing* as the one "correct meaning" for the phrase "fundamentally relativistic theory" and, instead, several reasonable precise notions can be formulated in such a way that they all agree with our intuitive judgments when those are available.) For discussion concerning the notions of invariance/covariance groups and the philosophy of spacetime theories and relativity we suggest, for instance, the following books: J. L. Anderson, Principles of relativity physics, Academic Press, 1967. M. Friedman, Foundations of space-time theories: relativistic physics and philosophy of science, Princeton University Press, 1983. We suggest also the following review paper: J. D. Norton, General covariance and the foundations of general relativity: eight decades of dispute, *Rep. Prog. Phys.* **56** (1993), p. 791–858, available online. These references do not discuss theories of quantum phenomena.

^{109} Such a proposal was made in: D. Dürr, S. Goldstein, K. Münch-Berndl, and N. Zanghì, Hypersurface Bohm–Dirac models, *Phys. Rev. A* **60** n. 4 (1999), p. 2729–2736, arXiv:quant-ph/9801070v2, along with suggestions for the law in Section IV. For a more serious suggestion, see Subsection 3.3.1 of: R. Tumulka, The "unromantic pictures" of quantum theory, *J. Phys. A-Math. Theor.* **40** n. 12 (2007), p. 3245–3273, arXiv:quant-ph/0607124v1.

^{110} S. Goldstein and N. Zanghì, Reality and the role of the wavefunction in quantum theory, to appear, arXiv:1101.4575v1.

^{111} J. S. Bell, Are there quantum jumps?, 1987, reprinted in J. S. Bell, Speakable and unspeakable in quantum mechanics, Cambridge, 2004, p. 201–212.

^{112} R. Tumulka, A relativistic version of the Ghirardi–Rimini–Weber model, *J. Stat. Phys.* **125** n. 4 (2006), p. 821–840, arXiv:quant-ph/0406094v2. Of course, whether or not this theory is fundamentally relativistic depends on how the meaning of "fundamentally relativistic" is clarified.

^{113} Spontaneous collapse theories of GRW type provide another family of examples of quantum theories without observers. The original paper in which the theory appeared is: G. C. Ghirardi, A. Rimini, and T. Weber, Unified dynamics for microscopic and macroscopic systems, *Phys. Rev. D* **34** n. 2 (1986), p. 470–491. We also suggest these other references on the subject: J. S. Bell, *op. cit.* G. C. Ghirardi, Collapse theories, *The Stanford Encyclopedia of Philosophy* (2002, revised in 2007), available online. R. Tumulka, The point processes of the GRW theory of wave function collapse, *Rev. Math. Phys.* **21** n. 2 (2009), p. 155–227, arXiv:0711.0035v1.

^{114} G. C. Ghirardi, R. Grassi, and P. Pearle, Relativistic dynamical reduction models: general framework and examples, *Found. Phys.* **20** n. 11 (1990), p. 1271–1316. G. C. Ghirardi, Local measurements of nonlocal observables and the relativistic reduction process, *Found. Phys.* **30** n. 9 (2000), p. 1337–1385.