# Luce's choice axiom

Curator: R. Duncan Luce

When a person chooses among alternatives, very often their responses appear to be governed by probabilities that are conditioned on the choice set. But ordinary probability theory with its standard definition of conditional probability does not seem to be quite what is needed. An example illustrates the difficulty. When deciding how to travel from home to another city, your choice may be by airplane $$(a),$$ bus $$(b),$$ or car $$(c) .$$ Let $$A,B,C$$ denote the uncertain states of nature associated with each form of travel. Note that if one elects $$c$$ all of the uncertainties of $$A$$ and $$B$$ remain because planes fly and busses run whether or not you are on them. However, if you elect either $$a$$ or $$b\ ,$$ then your car remains in the garage and the set $$C$$ is radically altered from when the car is driven. So there really is no universal event underlying the sources of uncertainty.

The choice axiom of Chapter 1 of Luce (1959/2005) was introduced as a first attempt to construct a probability-like theory of choice that bypassed the assumption of a fixed, universal sample space.

## The axiom as originally formulated

Suppose that $$P_{S}(R)$$ denotes the probability (or more generally, subjective weight) that the choice of an individual person (and, in later work in economics and marketing, across a group of people), from a finite set $$S$$ of alternatives, falls within the subset $$R\subset S\ .$$ For example, if $$S$$ denotes the six faces of a die and $$R=\{1,6\}\ ,$$ then $$P_{S}(R)$$ is the probability that a person thinks either a $$1$$ or a $$6$$ will arise with a roll of the die. For $$S=\{x,y\}$$ the abbreviation $$P(x,y):=P_{S}(x)=P_{\{x,y\}}(x)$$ is used; it simply denotes the probability of selecting $$x$$ over $$y\ .$$ Then Luce (1959/2005) (abbreviated ICB) defined the choice axiom by:

Let $$T$$ be a finite subset [...] such that, for every $$S\subset T\ ,$$ $$P_{S}$$ is defined.

1. If $$P(x,y)\neq 0,1$$ for all $$x,y\in T\ ,$$ then for $$R\subset S\subset T$$
$\tag{1} P_{T}(R)=P_{S}(R)P_{T}(S);$
2. If $$P(x,y)=0$$ for some $$x,y\in T\ ,$$ then for every $$S\subset T$$
$\tag{2} P_{T}(S)=P_{T-\{x\}}(S-\{x\}).$

In modern notation $$S-\{x\}=S\backslash\{x\}\ .$$

Subsequently, the need to avoid confusing this axiom with the very important mathematical axiom of choice may have led some authors to start calling at least Part (1), i.e., Eq. (1), Luce's choice axiom (LCA); Part (2) is often ignored.

In recent work on the utility of gambling (Luce, Ng, Marley, & Aczél, 2008, chapter 3.6), (1) was derived from other axioms as a property holding for non-additive subjective weights. Because it was not postulated, they called it the "choice property".

If the size of the subsequent literature means a work can be described as seminal, then a check via, e.g., Google makes clear that this 150-page monograph warrants that adjective.

## Formulations related to Part (1) of LCA

Summaries of some of the issues are found in Marley (1997, especially the chapters by Estes, Nosofsky, Suppes, and Yellott). Some of the detailed issues are:

### Resemblance to conditional probability

ICB (pp. 10-11) noted this resemblance because one can rewrite (1) as $\tag{3} P_{S}(R)=\frac{P_{T}(R)}{P_{T}(S)}.$

But this is not formally conditional probability because LCA does not assume $$T$$ is a universal sample space. For example, suppose that $$T=$$ the possible chance outcomes of rolling a die and $$T'=$$ the possible chance outcomes of spinning a roulette wheel. Each may be a local sample space without $$T\cup T'$$ necessarily also being one. Relevant to this issue are experiments of Batsell & Lodish (1981) as well as theoretical work of Suppes (1997) and Narens (2003, 2008).

### Ratio scale representation

Theorem 3, (p. 23) of ICB noted that Part (1) was equivalent to the existence of a positive ratio scale (i.e., unique up to multiplication by a positive constant) $$v$$ on $$T$$ such that $\tag{4} P_{S}(x)=\frac{v(x)}{\sum\limits_{y\in S}v(y)}.$

When finite additivity is no longer assumed, (4) may hold but with $$\sum\limits_{y\in S}v(y)$$ replaced by $$v(S)\ .$$ There is a sense in which the function $$v$$ replaces the role of a universal probability measure over a universal sample space. ICB noted that Bradley and Terry (1952) and earlier papers had studied the binary case of (4), and for a time some psychologists, at least, referred to LCA as the Bradley-Terry-Luce (BTL) theory. The form of (4) has repeatedly reappeared in various models. In one such model, the $$v$$ measure is modified to represent both a response bias and a similarity measure (e.g., Luce, 1963; Nosofsky, 1984, 1997; Townsend & Ashby, 1982). Nosofsky (1997) summarized his extensive research program on applying these ideas to his generalized context model of categorization and to extensions in the study of response times.
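As a small illustration of how representation (4) yields Part (1) of the axiom, the following sketch computes $$P_{S}(R)$$ from a scale $$v$$ and checks Eq. (1) on one example. The scale values, the alternative names, and the helper `choice_prob` are hypothetical, chosen only for illustration; exact rational arithmetic is used so that the equality can be tested without rounding.

```python
from fractions import Fraction

def choice_prob(v, S, R):
    """P_S(R) under representation (4): v summed over R, divided by v summed over S."""
    S, R = set(S), set(R)
    if not R <= S:
        raise ValueError("R must be a subset of S")
    return sum(v[x] for x in R) / sum(v[x] for x in S)

# Hypothetical ratio-scale values for three alternatives.
v = {"a": Fraction(3), "b": Fraction(2), "c": Fraction(1)}
T = {"a", "b", "c"}
S = {"a", "b"}
R = {"a"}

lhs = choice_prob(v, T, R)                          # P_T(R) = 3/6
rhs = choice_prob(v, S, R) * choice_prob(v, T, S)   # P_S(R) * P_T(S) = (3/5) * (5/6)
print(lhs, rhs)

# Multiplying v by a positive constant leaves every probability unchanged,
# which is the sense in which v is a ratio scale.
v_scaled = {x: 10 * val for x, val in v.items()}
print(choice_prob(v_scaled, T, R))
```

Both sides of Eq. (1) come out equal here, and rescaling $$v$$ does not change the result.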

### Product rule

Suppose that $$T=\{x,y,z\}\ ,$$ then (4) implies (Theorem 2, p. 16 of ICB) $\tag{5} \frac{P(x,y)}{P(y,x)}\frac{P(y,z)}{P(z,y)}\frac{P(z,x)}{P(x,z)}=1,$

which is called the product rule. It can be rewritten as $\tag{6} \frac{P(x,z)}{P(z,x)}=\frac{P(x,y)}{P(y,x)}\frac{P(y,z)}{P(z,y)},$

which makes clear that the role of $$y$$ simply "cancels" when evaluating $$\frac{P(x,z)}{P(z,x)}\ .$$ This is closely related to the next inference.
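The product rule can be checked numerically for any scale $$v$$ in (4). The sketch below uses hypothetical scale values and the illustrative helpers `pair_prob` and `odds`, neither of which comes from ICB.

```python
from fractions import Fraction

def pair_prob(v, x, y):
    """Binary choice probability P(x, y) under representation (4)."""
    return v[x] / (v[x] + v[y])

def odds(v, x, y):
    """The ratio P(x, y) / P(y, x)."""
    return pair_prob(v, x, y) / pair_prob(v, y, x)

# Hypothetical scale values.
v = {"x": Fraction(5), "y": Fraction(2), "z": Fraction(7)}

# Left side of (5): the product of the odds around the cycle x -> y -> z -> x.
cycle = odds(v, "x", "y") * odds(v, "y", "z") * odds(v, "z", "x")

# Form (6): the odds of x over z equal the product of the odds through y.
direct = odds(v, "x", "z")
via_y = odds(v, "x", "y") * odds(v, "y", "z")
print(cycle, direct, via_y)
```

Under (4) each odds ratio reduces to $$v(x)/v(y)\ ,$$ so the intermediate value $$v(y)$$ cancels in the product.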

### Independence from irrelevant alternatives (IIA)

One feature of (1) (noted on p. 9) that follows immediately from (4), is: $\tag{7} \frac{P(x,y)}{P(y,x)}=\frac{P_{S}(x)}{P_{S}(y)}.$

This means the elements in $$S\backslash\{x,y\}$$ are irrelevant to the right-hand ratio, a form of IIA. Debreu's (1960) review of ICB included an example that made clear that, in some contexts, (7) is an empirically unsatisfactory feature of the axiom. His example involved both parts of the axiom. A widely cited alternative that rests just on Part (1) plus a concept of similarity is: Let $$x=$$ a bicycle, $$y=$$ a red bus, $$z=$$ a blue bus. Suppose a boy is indifferent pairwise between rides on a bicycle, a red bus, or a blue bus. When confronted with choosing among all three, his preference remains 50:50 between a bus ride and a bicycle ride because he is indifferent to the bus color, i.e., the choice probabilities are $$\frac{1}{2},\frac{1}{4},\frac{1}{4}\ ,$$ whereas Part (1) of the choice axiom predicts $$\frac{1}{3},\frac{1}{3},\frac{1}{3}\ .$$ Of course, such examples led to various attempts to generalize the axiom to be rid of IIA, with, perhaps, the most important being Tversky's elimination by aspects (EBA) model (Tversky, 1972; Tversky & Russo, 1969). In EBA it is assumed that each alternative is a vector of aspects and that choices are achieved by comparing aspects and dropping partially dominated alternatives.
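The bicycle and bus example can be worked through in a few lines. The sketch below assumes equal scale values, which is what pairwise indifference implies under (4), and compares the odds of bicycle over red bus in three cases. The names `luce_probs`, `predicted`, and `intuitive` are illustrative only.

```python
from fractions import Fraction

def luce_probs(v, S):
    """Choice probabilities over S under representation (4)."""
    total = sum(v[a] for a in S)
    return {a: v[a] / total for a in S}

# Pairwise indifference forces equal scale values under (4).
v = {"bicycle": Fraction(1), "red_bus": Fraction(1), "blue_bus": Fraction(1)}

pair = luce_probs(v, ["bicycle", "red_bus"])
predicted = luce_probs(v, ["bicycle", "red_bus", "blue_bus"])
# The probabilities the example suggests the boy would actually show.
intuitive = {"bicycle": Fraction(1, 2),
             "red_bus": Fraction(1, 4),
             "blue_bus": Fraction(1, 4)}

binary_odds = pair["bicycle"] / pair["red_bus"]
predicted_odds = predicted["bicycle"] / predicted["red_bus"]
intuitive_odds = intuitive["bicycle"] / intuitive["red_bus"]
print(binary_odds, predicted_odds, intuitive_odds)
```

The binary odds and the odds predicted for the three-element set agree, as (7) requires, whereas the intuitive probabilities give odds of 2 and so violate (7).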

### Random utility representation

Subsequent work, especially by E. Holman & A. A. J. Marley (see footnote 7 of Luce & Suppes, 1965), McFadden (e.g., 2003), and Yellott (1977, 1997), led to the idea of a random variable representation that was parallel to Thurstone's discriminal processes but with the double exponential, sometimes called logit, (cumulative) distribution $$\exp\left( -e^{-\alpha t+\beta}\right)$$ replacing Thurstone's Gaussian distributions. Some authors, e.g., McFadden (2003), identify the model as the logit or multinomial logit. Note that the Gaussian arises as the limit distribution of the sum of independent random variables whereas the double exponential arises as the limit distribution of their maximum value. The latter density is quite asymmetric whereas the Gaussian is symmetric.
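The link between the double exponential distribution and representation (4) can be illustrated by simulation. The sketch below is a minimal example under stated assumptions: each alternative $$x$$ receives an independent random utility with cumulative distribution $$\exp\left( -e^{-t+\beta_{x}}\right)$$ (that is, $$\alpha=1$$) with $$\beta_{x}=\log v(x)\ ,$$ and the alternative with the largest utility is chosen. The scale values, the number of trials, and the function names are hypothetical.

```python
import math
import random

def gumbel_draw(rng, beta):
    """Draw from the CDF exp(-exp(-t + beta)) by inverting it."""
    u = rng.random()
    while u == 0.0:              # random() lies in [0, 1); avoid log(0)
        u = rng.random()
    return beta - math.log(-math.log(u))

def simulate_choices(v, trials, seed=0):
    """Relative frequency with which each alternative has the maximum utility."""
    rng = random.Random(seed)
    counts = {x: 0 for x in v}
    for _ in range(trials):
        utilities = {x: gumbel_draw(rng, math.log(v[x])) for x in v}
        counts[max(utilities, key=utilities.get)] += 1
    return {x: counts[x] / trials for x in v}

# Hypothetical scale values.
v = {"a": 3.0, "b": 2.0, "c": 1.0}
freq = simulate_choices(v, trials=100_000)
luce = {x: v[x] / sum(v.values()) for x in v}   # representation (4)
for x in v:
    print(x, round(freq[x], 3), round(luce[x], 3))
```

With enough trials the simulated frequencies should lie close to the values given by (4), up to sampling error.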

### Rank orderings

Section F of Chapter 2 raised the question of how rank orderings relate to binary choice probabilities. To that end, let $$\sigma$$ denote a ranking of $$T\ .$$ Let $$R_{T}(\sigma)$$ denote the probability of a person making the ranking $$\sigma\ ,$$ and let $$x\succ y$$ denote the set of rankings of $$T$$ in which $$x$$ precedes $$y\ .$$ Suppose that $$\sigma$$ ranks alternative $$x\in T$$ first and let $$\sigma_{-x}$$ denote the ranking that $$\sigma$$ induces on the remaining subset $$T\backslash\{x\}\ .$$ Then the following ranking postulate was partially investigated:

$\tag{8} R_{\{x,y\}}(x\succ y) =P(x,y)$

$R_{T}\left( \sigma\right) =P_{T}(x)R_{T\backslash\{x\}}(\sigma_{-x})$
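The recursion in (8) can be sketched as follows, under the additional assumption that the choice probabilities $$P_{S}(x)$$ are given by representation (4). The scale values and the function names are hypothetical. The example builds ranking probabilities from best to worst, checks that they sum to one over all rankings, and computes the total probability of the rankings in which one alternative precedes another.

```python
from fractions import Fraction
from itertools import permutations

def choice_prob(v, S, x):
    """P_S(x) under representation (4)."""
    return v[x] / sum(v[y] for y in S)

def ranking_prob(v, sigma):
    """R_T(sigma) via (8): choose the first-ranked element, then rank the rest."""
    sigma = list(sigma)
    if len(sigma) <= 1:
        return Fraction(1)
    first, rest = sigma[0], sigma[1:]
    return choice_prob(v, sigma, first) * ranking_prob(v, rest)

# Hypothetical scale values.
v = {"a": Fraction(3), "b": Fraction(2), "c": Fraction(1)}
rankings = list(permutations(v))

total = sum(ranking_prob(v, s) for s in rankings)

# Probability of the set of rankings of T in which "a" precedes "b".
a_before_b = sum(ranking_prob(v, s) for s in rankings
                 if s.index("a") < s.index("b"))
print(total, a_before_b, choice_prob(v, ["a", "b"], "a"))
```

In this example the probability that `a` precedes `b` in a ranking of all three alternatives coincides with the binary probability $$P(a,b)\ .$$ The sketch only covers the best-to-worst construction; building rankings from worst to best need not give the same probabilities.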

Some results were derived which, as subsequent work made clear, were very partial. One issue concerned when the induction from best to worst was, or was not, the same as that from worst to best. Marley (1968) and Yellott (1977, 1980, 1997) made further progress on the issue, but that work was quite incomplete and not fully satisfactory. Nonetheless it raised issues that others, especially Block & Marschak (1960), Georgescu-Roegen (1958, 1969), Fishburn (1994), Marschak (1960), and Marley (1968), found tantalizing and clarified somewhat, as outlined (to that date) by Luce (1977).

The topic lay fairly dormant until Saari (2005, 2008) took it up again. He had worked extensively on understanding social rankings, as in the Arrow and Sen impossibility theorems, by recasting the problem into geometric concepts, and he applied that perspective in a very detailed fashion to clarify the relations between the choice axiom and rankings. He showed that IIA essentially nullifies the power of other assumed properties, such as transitivity of individuals. He concludes by saying "Indeed, while the geometric approach introduced here leads to a richer selection of alternative computational approaches, where the subject uses more information, and a significant relaxation on the choice of ranking probabilities, it is only an indication of what is possible."

## Extensions and applications

A great many applications of LCA, including some that seem somewhat questionable, have appeared. Luce (1977) presented a quite detailed summary of work, but limited mainly to psychology, in the ensuing 20 years. (This dates the choice axiom to 1957, which is when it was circulated as a technical report among a number of mathematical psychologists. That report, which had red covers, led to some confusion about its relation to probability theory, and some people referred to it as the "red menace", whence the color of the 1959 edition.) Most notable are rejections of IIA and attempts to weaken the axiom, e.g., the EBA proposal of Tversky (1972) (see the section on independence from irrelevant alternatives above). The representation (4) and the logit representations are both frequently invoked without reference.

Some of the areas in which applications have appeared include, of course, psychology where, as one might expect, a good deal of attention has been paid to issues of violations of IIA for individuals. At the same time, from early on a number of attempts have been made by others, including Tversky (1972), to modify the representation (4) to explain various phenomena. For summaries, see Marley (1997).

A second major area of applications has been in economics. There the logit representation form has been the main tool. For example, McFadden (1974, 1976) invoked it, with credit, in his extensive and penetrating theoretical developments that led to his being awarded the Nobel prize in 2000. His Nobel address (2003) is generous in crediting Luce and others.

A third area has been marketing research, where there have been fairly extensive applications and generalizations, even though ICB had emphasized the point that even if each person satisfies the axiom, their average data will not in general. Apparently, Huff (1962) was the first to apply it there; moreover, his was the first multi-attribute version. Perhaps Richard R. Batsell is the person most continuously active, beginning in the 1980s, working in marketing on uses of LCA, generalizations of it such as EBA, testing them on extensive individual and group data sets, and developing greatly improved parameter estimation schemes (Batsell & Lodish, 1981; 1982; Batsell, Polking, Cramer, & Miller, 2003). Other relevant references are in these papers. During the same period, some developments were tied in with (additive) conjoint measurement, which has been widely developed and applied to real problems. See for example Currim (1982) and Louviere, Hensher, & Swait (2003) and the many references there.

## Where did the choice axiom come from?

Here we shift to personal, motivational remarks.

Shortly before my 1950 M.I.T. Ph.D. in applied (to physical processes) mathematics, I was introduced by a roommate, who was taking a course from Leon Festinger, to a problem in what, today, is called social networks. That led to my first publication with an electrical engineering graduate student, Albert Perry. At that point I started to shift attention to non-physical science and non-military applications, and specifically focused a bit on economics and psychology. The department had no one who could supervise such a dissertation, and I was "led" to work on semigroups. Although that then seemed a diversion, later semigroups, especially ordered ones, played a very significant role in my work on the foundations of measurement. Following my degree, Alex Bavelas hired me into his Group Network Laboratory and I undertook a disorganized, on the job, self-study of some psychology, some probability, and (in self defense) some statistics. To a considerable degree I was informally and selectively mentored by a number of, even then, quite prominent people including: Bavelas, Festinger, J. C. R. Licklider, W. J. McGill, G. A. Miller, and W. A. Rosenblith, all at M.I.T. In 1953 that laboratory folded and Paul F. Lazarsfeld, a sociologist at Columbia, brought me into the newly created Behavioral Models Project. He was an imposing influence in the social sciences, and he arranged for me to spend the inaugural year at the Center for Advanced Study where, among other people, A. H. Hastorf of Stanford influenced my understanding of psychophysics. The statistician F. W. Mosteller at Harvard next arranged for me to become a senior lecturer there, and he and S. S. Stevens had a pronounced influence on my work. Many of these people are no longer with us. I also collaborated with younger, but ultimately prominent, people including: R. R. Bush, W. Edwards, E. Galanter, D. H. Krantz, H. Raiffa, P. Suppes, and A. Tversky.

During that period I came to realize how thinking in terms of the choice axiom yielded some new understandings about several realms including psychophysics, utility theory, and learning (described in Chapters 2-4 of ICB). The latter led to my studies on the class of commutative learning operators. These applications seemed to warrant the effort.

When I was working on Luce and Raiffa (1957) and various other research issues in psychology including semiorders (Luce, 1956) and Fechner's faulty derivation of his law (he attempted to solve a difference equation by trying to reduce it, incorrectly, to a differential equation; Luce & Edwards, 1958), three things became increasingly clear to me. One was that Savage's (1954) axiomatization of subjective expected utility, like probability itself, rested on an assumption that one is dealing with a universal sample space of uncertain states. When one began to consider examples, it soon seemed evident to me, although then not widely acknowledged, that such a supposition is preposterous. Decision making under chance or under uncertainty is, in reality, dealt with quite locally. Moreover, trying to work with an assumed universal sample space generates chaos when either a new source of chance is added or a source of chance vanishes, each thereby significantly altering the sample space. Another realization was that even a very simple set of uncertain gambles led to an enormous sample space, surely beyond the comprehension of ordinary people. As noted above, the choice axiom had the advantage of looking a lot like a conditional probability, but without formally being the same because it was entirely local. Frankly, I have no clear recollection of when exactly I first wrote it down formally beyond it being sometime during my fellowship at the inaugural year of the Center for Advanced Study in the Behavioral Sciences, 1954-55.

Another factor, which Howard Raiffa brought to my attention, was the importance of the idea of IIA which, after all, was fundamental to Arrow's (1951/1964) famous impossibility theorem and which, in a different form, was an inherent property of LCA.

### Acknowledgements

Useful suggestions and references have been provided by R. R. Batsell, A. A. J. Marley, D. G. Saari, and R. Steingrimsson, all of whom I thank.