# Similarity measures

F. Gregory Ashby and Daniel M. Ennis (2007), Scholarpedia, 2(12):4116. doi:10.4249/scholarpedia.4116 revision #142770

Curator: Daniel M. Ennis

The concept of similarity is fundamentally important in almost every scientific field. For example, in mathematics, geometric methods for assessing similarity are used in studies of congruence and homothety as well as in allied fields such as trigonometry. Topological methods are applied in fields such as semantics. Graph theory is widely used for assessing cladistic similarities in taxonomy. Fuzzy set theory has also developed its own measures of similarity, which find application in areas such as management, medicine and meteorology. An important problem in molecular biology is to measure the sequence similarity of pairs of proteins.

A review, or even a listing, of all the uses of similarity is impossible. Instead, this article focuses on perceived similarity. The degree to which people perceive two things as similar fundamentally affects their rational thought and behavior. Negotiations between politicians or corporate executives may be viewed as a process of data collection and assessment of the similarity of hypothesized and real motivators. The appreciation of a fine fragrance can be understood in the same way. Similarity is a core element in achieving an understanding of variables that motivate behavior and mediate affect.

Not surprisingly, similarity has also played a fundamentally important role in psychological experiments and theories. For example, in many experiments people are asked to make direct or indirect judgments about the similarity of pairs of objects. A variety of experimental techniques are used in these studies, but the most common are to ask subjects whether the objects are the same or different, or to ask them to produce a number, between say 1 and 7, that matches their feelings about how similar the objects appear (e.g., with 1 meaning very dissimilar and 7 meaning very similar). The concept of similarity also plays a crucial but less direct role in the modeling of many other psychological tasks. This is especially true in theories of the recognition, identification, and categorization of objects, where a common assumption is that the greater the similarity between a pair of objects, the more likely one will be confused with the other. Similarity also plays a key role in the modeling of preference and liking for products or brands, as well as motivations for product consumption. A common assumption here is that when evaluating a product, people imagine an ideal and then judge the similarity of the offered product to this ideal (Coombs, 1964).

## Distance-Based Similarity Measures

One of the oldest and most influential theoretical assumptions is that perceived similarity is inversely related to psychological distance. The idea here is that a percept is a mental representation of an object, a concept, an ideal, a position on an issue, or any other mental entity that can be quantified. Typically, percepts are assumed to vary on a variety of features or psychological dimensions. The numerical values of a particular percept on each of these dimensions can be interpreted as the coordinates of this percept in a psychological feature space. A fundamental assumption of many psychological theories is that percepts that are close together will be perceived as similar and percepts that are far apart will be perceived as dissimilar.

Multidimensional scaling (MDS) is a technique that uses similarity judgments (or some other proximity measure) to produce a psychological space in which similarity is inversely related to distance (e.g., Young & Hamer, 1994). The output of an MDS computer program is a set of coordinates for each stimulus in some hypothetical psychological space. Within this space, stimuli that were judged by subjects to be similar are close together.

The two most popular distance measures in MDS are Euclidean distance (i.e., as-the-crow-flies distance) and city-block distance. If the coordinates of some stimulus A in an n-dimensional psychological space are $$(x_{A1}, x_{A2}, \ldots, x_{An})$$, then the Euclidean distance from stimulus A to some other stimulus B is

$$d(A,B) = \sqrt{\sum_{i=1}^n (x_{Ai} - x_{Bi})^2}$$

When n = 2, this equation reduces to the familiar formula from the Pythagorean theorem. The city-block distance between these two stimuli is defined as

$$d(A,B) = \sum_{i=1}^n |x_{Ai} - x_{Bi}|$$

City-block distance is so named because it is the distance in blocks between any two points in a city (e.g., down 3 blocks and over 1 for a total of 4 blocks). An influential hypothesis has been that Euclidean distance is valid when stimulus dimensions are perceptually integral, whereas city-block distance is appropriate when stimulus dimensions are perceptually separable (Shepard, 1964). Integral dimensions, such as the brightness and saturation of a color, fuse together in the mind, whereas separable dimensions, such as the color and shape of an object, can be analyzed separately (Garner, 1974).
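The two distance formulas can be compared directly on the city example. The following is a minimal sketch in Python; the function names and the coordinates are chosen here for illustration only.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def city_block(a, b):
    """City-block distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical coordinates: B lies 3 blocks down and 1 block over from A.
stim_a = (0.0, 0.0)
stim_b = (3.0, 1.0)

print("Euclidean: ", euclidean(stim_a, stim_b))   # square root of 10
print("City-block:", city_block(stim_a, stim_b))  # 3 + 1 blocks
```

For any pair of points the city-block distance is at least as large as the Euclidean distance, and the two agree when the points differ on only one dimension.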

The earliest form of MDS, now called metric MDS, required data in the form of dissimilarity judgments that were measured on an interval or ratio scale (Torgerson, 1952). These algorithms were later generalized to non-metric MDS, which only requires ordinal scale data (Kruskal, 1964; Shepard, 1962). A later development accounted for individual differences (INDSCAL) in a version of non-metric MDS that assumes people produce different similarity judgments because they differentially weight the various stimulus dimensions (Carroll & Chang, 1970; Takane, Young, & de Leeuw, 1977). More recently, a variety of machine learning algorithms have been proposed that learn similarity metrics of this type (e.g., Guo, Jain, Ma, & Zhang, 2002; Xing, Ng, Jordan, & Russell, 2002).
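As an illustration of the metric case, classical scaling can be sketched in a few lines: square the dissimilarities, double-center them, and take the leading eigenvectors as coordinates. The code below assumes NumPy is available and that the dissimilarities behave like Euclidean distances; the function name and the three-stimulus matrix are invented for the example, and the sketch is not the algorithm of any particular MDS program.

```python
import numpy as np

def classical_mds(dissim, n_dims=2):
    """Recover coordinates from a symmetric matrix of dissimilarities."""
    d = np.asarray(dissim, dtype=float)
    n = d.shape[0]
    # Double-center the squared dissimilarities to obtain scalar products.
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the largest n_dims.
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_dims]
    top_vals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(top_vals)

# Hypothetical dissimilarities among three stimuli (a 3-4-5 right triangle).
dissim = [[0.0, 3.0, 4.0],
          [3.0, 0.0, 5.0],
          [4.0, 5.0, 0.0]]

coords = classical_mds(dissim)
# Distances between the recovered points should reproduce the input matrix.
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
print(np.round(recovered, 6))
```

The recovered configuration is determined only up to rotation, reflection, and translation, which is why the check compares interpoint distances rather than the coordinates themselves.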

Shepard (1987) proposed as a universal law that distance and perceived similarity are related via an exponential function

$$s(A,B) = e^{-d(A,B)}$$

He further proposed that this exponential function describes the probability that two stimuli fall in a region of stimulus space associated with the same response, which he called a consequential region. Shepard noted some failures of the exponential function in the case of confusable objects, although it was later shown that these problems can be resolved by treating percepts as probabilistic and applying Shepard’s similarity function at the moment of decision-making (Ennis, 1988).
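A short sketch shows how quickly similarity falls off under this function. The function name and the sample distances are illustrative choices, not values from the literature.

```python
import math

def shepard_similarity(distance):
    """Shepard's exponential mapping from psychological distance to similarity."""
    return math.exp(-distance)

# Similarity is 1 at zero distance and decays toward 0 as distance grows.
for d in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(f"d = {d:3.1f}  s = {shepard_similarity(d):.3f}")
```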

All types of distance obey certain properties, called the distance axioms. For example, the distance from point A to point B must be equal to the distance from point B to point A. If similarity is inversely related to psychological distance, then perceived similarity must also obey these axioms. Considerable empirical effort has therefore been spent testing the validity of the distance axioms. The four distance axioms are:

1. Equal self-similarity. d(A, A) = d(B, B) for all points A and B. Therefore, s(A, A) = s(B, B) for all stimuli A and B.

2. Minimality. d(A, B) > d(A, A) for all points A $$\ne$$ B. Therefore, s(A, B) < s(A, A) for all stimuli A $$\ne$$ B.

3. Symmetry. d(A, B) = d(B, A) for all points A and B. Therefore, s(A, B) = s(B, A) for all stimuli A and B.

4. Triangle Inequality. d(A, B) + d(B, C) $$\ge$$ d(A, C) for all points A, B, and C. Therefore, the dissimilarities among any set of three stimuli should satisfy this same condition. The triangle inequality also implies that if stimuli A and B are similar and stimuli B and C are similar, then stimuli A and C must also be similar.

Much evidence has now been collected that raises questions about the validity of each of these axioms. For example, William James (1890) described an apparent counterexample to the triangle inequality more than a century ago. A flame is similar to the moon because they are both luminous, and the moon is similar to a ball because they are both round, but in contradiction to the triangle inequality, a flame is not similar to a ball. As one example of evidence against symmetry, Tversky (1977) reported that most people judge the similarity of North Korea to China to be greater than the similarity of China to North Korea. In response to examples such as these, many theories were proposed that could account for violations of some or all of the distance axioms. Included in this list were models that retained the assumption that similarity and psychological distance are inversely related, and assumed that violations of the distance axioms occurred because of response bias (Nosofsky, 1991), shifts in selective attention (Nosofsky, 1986), variations in the spatial density of stimulus representations in the psychological space (Krumhansl, 1978), or because percepts are probabilistic rather than deterministic (Ennis & Johnson, 1993). Other theories accounted for distance axiom violations by rejecting the assumption that similarity and distance are closely related and assumed instead that similarity is a function of the saliency of stimulus features (Tversky, 1977) or of the overlap between distributions of probabilistic percepts (Ashby & Perrin, 1988).
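The four axioms can be checked mechanically on a matrix of dissimilarity judgments. The following is a minimal sketch: the function name is hypothetical, and the numbers are invented to mimic the flame, moon, and ball example rather than taken from any data set.

```python
from itertools import permutations

def check_distance_axioms(d, tol=1e-9):
    """Report which distance axioms hold for a square dissimilarity matrix d."""
    idx = range(len(d))
    return {
        "equal_self_similarity": all(abs(d[i][i] - d[0][0]) <= tol for i in idx),
        "minimality": all(d[i][j] > d[i][i] for i in idx for j in idx if i != j),
        "symmetry": all(abs(d[i][j] - d[j][i]) <= tol for i in idx for j in idx),
        "triangle_inequality": all(
            d[i][j] + d[j][k] >= d[i][k] - tol
            for i, j, k in permutations(idx, 3)
        ),
    }

# Made-up dissimilarities; rows and columns are flame, moon, ball.
# Flame-moon and moon-ball are judged close, flame-ball is judged far apart.
judged = [[0.0, 1.0, 5.0],
          [1.0, 0.0, 1.0],
          [5.0, 1.0, 0.0]]

for axiom, holds in check_distance_axioms(judged).items():
    print(f"{axiom}: {holds}")
```

With these numbers only the triangle inequality fails, because 1 + 1 is less than 5. An asymmetric matrix would fail the symmetry check in the same way.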

## Feature-Based Similarity Measures

Partly in response to empirical evidence against the distance axioms, Tversky (1977) proposed that perceived similarity is the result of a feature-matching process that differentially weights common and distinct stimulus features. Let $$g(A \cap B)$$ denote the salience of the features that are common to stimuli A and B and let $$g(A - B)$$ denote the salience of the features that are unique to stimulus A. Then Tversky’s (1977) feature contrast model proposes that the similarity of stimulus A to stimulus B is equal to

$$s(A,B) = \alpha\, g(A \cap B) - \beta\, g(A - B) - \gamma\, g(B - A),$$

where α, β, and γ are constants that might vary across individuals, context, and instructions. According to this model, features in common increase similarity, whereas features that are unique to one stimulus decrease similarity. One advantage of the feature contrast model is that it can account for violations of any of the distance axioms.
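A small sketch makes the source of asymmetry concrete. It assumes, purely for illustration, that the salience function g simply counts features; the feature labels, the parameter values, and the function name are all hypothetical.

```python
def contrast_similarity(a, b, alpha=1.0, beta=0.8, gamma=0.2, salience=len):
    """Feature contrast similarity of a to b.

    `salience` stands in for g; counting features is an assumption made
    here for simplicity, not part of the model itself.
    """
    a, b = set(a), set(b)
    return (alpha * salience(a & b)
            - beta * salience(a - b)
            - gamma * salience(b - a))

# Hypothetical feature sets: the sparse stimulus shares all of its
# features with the rich one, which has two further features of its own.
sparse = {"f1", "f2"}
rich = {"f1", "f2", "f3", "f4"}

print("s(sparse, rich):", round(contrast_similarity(sparse, rich), 3))
print("s(rich, sparse):", round(contrast_similarity(rich, sparse), 3))
print("s(sparse, sparse):", contrast_similarity(sparse, sparse))
print("s(rich, rich):", contrast_similarity(rich, rich))
```

Because β exceeds γ in this example, features unique to the first stimulus are penalized more heavily, so the sparse stimulus is more similar to the rich one than the reverse. Self-similarity also differs between the two stimuli, since it grows with the number of features.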

## Probabilistic Similarity Measures

All similarity measures considered so far assume that repeated presentation of the same stimulus always elicits the exact same percept – that is, they assume that the percept is deterministic. But many theorists have argued that the information that forms a percept varies over time, and thus that percepts are probabilistic. This is consistent with personal experience concerning the taste of products, views on political issues, or opinions about people. Biological processes involved in generating percepts, the chemical and physical variation associated with stimuli, and limitations on our ability to know the information state absolutely all favor models that assume probabilistic percepts.

Many probabilistic models have been proposed (for a review see Ashby, 1992). In general these models all make two assumptions that were inspired by L. L. Thurstone (1927) and by signal detection theory (Tanner & Swets, 1954): a) the percept elicited by a stimulus varies probabilistically over repeated exposures to that stimulus, and b) there is a well-defined decision rule that describes how a response is selected for any momentary value of the percept. Some probabilistic models retain the assumption that similarity is inversely related to psychological distance (Ennis & Johnson, 1993). Other models rely instead on the signal-detection notion of a decision bound (Ashby & Perrin, 1988). Assuming the percept is probabilistic fundamentally changes the predictions of these models. For example, all of the models that assume a probabilistic percept can account for violations of at least some of the distance axioms.

## A Classification of Similarity Models

As mentioned above, similarity models differ according to whether they assume the percept is deterministic or probabilistic. They also differ according to whether they assume that the process through which a response is selected is deterministic or probabilistic. The decision process is deterministic if the same response is always made given the same information and it is probabilistic if, for each information state, a response is selected randomly by sampling from some probability distribution. Thus, with a probabilistic decision process, the decision may be different on two separate occasions when the percept is identical. Thurstonian models assume probabilistic percepts, but the decision process may be deterministic or probabilistic.

Table 1 provides a 2 × 2 classification based on whether the percept and the decision process are deterministic and/or probabilistic. To illustrate the ideas in Table 1, consider the ‘same-different’ task, which is widely used in the study of perceived similarity. In this task, subjects are presented with pairs of objects and instructed to report if the stimuli are the same or different. If the stimuli are different, but highly similar, subjects will sometimes respond that they are the same, and thus, the proportion of same responses that a pair of stimuli receives can be interpreted as a measure of their similarity.

Many models assume that the subject’s decision about whether two stimuli are the same or different depends on whether the distance between the two percepts is less than or greater than some criterion. Type 0 models in Table 1 assume the percept and decision process are both deterministic. The best-known example in this class is MDS. Type 0 models describe a situation of invariant decision-making in an unchanging universe. They can often provide valuable summary descriptions of aggregate behavior, but they are poor models of individual moment-by-moment behavior, because, for example, they cannot account for any variability in the performance of subjects.

Type I models assume that the percept is probabilistic and the decision process is deterministic. They account for variability in behavior because the probabilistic nature of the percept means that the distance between percepts associated with the same two stimuli can sometimes be smaller than the criterion and sometimes greater. The Type I class includes signal detection theory (Tanner & Swets, 1954), general recognition theory (Ashby & Townsend, 1986), probabilistic preferential choice unfolding models (De Soete, Carroll & DeSarbo, 1986; Ennis & Johnson, 1994; MacKay, Easley & Zinnes, 1995), and classical Thurstonian psychophysical models (Thurstone, 1927).
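A Type I account of the same-different task can be sketched with a simulation. The sketch assumes one psychological dimension, independent normally distributed percepts with equal standard deviation, and a fixed distance criterion; these choices, the parameter values, and the function names are illustrative assumptions rather than any published model.

```python
import random
from statistics import NormalDist

def p_same_simulated(mean_a, mean_b, sd, criterion, n_trials=20000, seed=0):
    """Monte Carlo estimate of the probability of a 'same' response."""
    rng = random.Random(seed)
    same = 0
    for _ in range(n_trials):
        percept_a = rng.gauss(mean_a, sd)  # the percept varies from trial to trial
        percept_b = rng.gauss(mean_b, sd)
        # Deterministic rule: respond 'same' if the percepts are close enough.
        if abs(percept_a - percept_b) < criterion:
            same += 1
    return same / n_trials

def p_same_analytic(mean_a, mean_b, sd, criterion):
    """Same probability from the normal distribution of the percept difference."""
    diff = NormalDist(mean_a - mean_b, sd * 2 ** 0.5)
    return diff.cdf(criterion) - diff.cdf(-criterion)

# Hypothetical stimuli one unit apart, unit perceptual noise, criterion of 1.
simulated = p_same_simulated(0.0, 1.0, sd=1.0, criterion=1.0)
analytic = p_same_analytic(0.0, 1.0, sd=1.0, criterion=1.0)
print(f"simulated: {simulated:.3f}  analytic: {analytic:.3f}")
```

The decision rule never changes, yet the predicted proportion of 'same' responses lies strictly between 0 and 1 because the percepts themselves vary.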

Type II models assume a deterministic percept and a probabilistic decision rule. Examples include the logistic model used in marketing, public health, and economics (Hosmer & Lemeshow, 2000), and the MDS-choice model (Luce, 1963; Nosofsky, 1986; Shepard, 1957), which is popular in the object recognition and categorization literatures. Type II models assume an invariant perceptual world, and account for variable behavior by assuming that people are inconsistent in their use of perceptual information.
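The flavor of a Type II model can be conveyed with a sketch in the spirit of the MDS-choice model: percepts are fixed points, distances are converted to similarities, and a response is then chosen at random with probability proportional to bias-weighted similarity. The exponential distance-to-similarity mapping, the ratio rule as written, and all coordinates and names are assumptions made for this illustration.

```python
import math

def identification_probs(coords, biases=None):
    """Row i gives the probability of each response when stimulus i is shown."""
    n = len(coords)
    biases = biases if biases is not None else [1.0] * n
    probs = []
    for i in range(n):
        # Similarity of stimulus i to each stimulus j, weighted by response bias.
        weights = [
            biases[j] * math.exp(-math.dist(coords[i], coords[j]))
            for j in range(n)
        ]
        total = sum(weights)
        probs.append([w / total for w in weights])
    return probs

# Hypothetical fixed percepts in a two-dimensional psychological space.
coords = [(0.0, 0.0), (1.0, 0.0), (3.0, 2.0)]

for row in identification_probs(coords):
    print([round(p, 3) for p in row])
```

Each row sums to 1, and nearby stimuli are confused with one another more often than distant ones even though the percepts never vary.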

Type III models assume the percept and decision process are both probabilistic. When applied to the same-different task, some Type III models predict that a ‘same’ response is based on the expected value of the similarity function, whereas others are a special case of moment generating functions (Ennis & Johnson, 1993). Both Type I and Type III models are Thurstonian.
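The expected-value idea can be illustrated numerically. The sketch below assumes one dimension, independent normal percepts with equal standard deviation, and a Gaussian similarity function exp(-d²) in place of the exponential function; the Gaussian form is chosen here only because its expectation has a simple closed form, obtained from the moment generating function of a squared normal variable. All names and parameter values are hypothetical.

```python
import math
import random

def expected_similarity_mc(mean_a, mean_b, sd, n_trials=20000, seed=0):
    """Monte Carlo estimate of E[exp(-d^2)] over probabilistic percepts."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        d = rng.gauss(mean_a, sd) - rng.gauss(mean_b, sd)
        total += math.exp(-d * d)
    return total / n_trials

def expected_similarity_closed_form(mean_a, mean_b, sd):
    """Closed form: the percept difference is normal with variance 2 * sd**2."""
    delta = mean_a - mean_b
    scale = 1.0 + 4.0 * sd * sd
    return math.exp(-delta * delta / scale) / math.sqrt(scale)

# Hypothetical stimuli one unit apart with perceptual noise sd = 0.5.
mc = expected_similarity_mc(0.0, 1.0, sd=0.5)
exact = expected_similarity_closed_form(0.0, 1.0, sd=0.5)
print(f"Monte Carlo: {mc:.3f}  closed form: {exact:.3f}")

# With noisy percepts, even a stimulus paired with itself falls short of 1.
print(f"self-similarity: {expected_similarity_closed_form(0.0, 0.0, sd=0.5):.3f}")
```

When the standard deviation is zero the closed form reduces to the deterministic similarity exp(-d²), so the deterministic model is recovered as a limiting case.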

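Table 1. Classification of similarity models according to whether the percept and the decision process are deterministic or probabilistic.

| Percept | Deterministic decision process | Probabilistic decision process |
|---|---|---|
| Deterministic | Type 0: MDS | Type II: logistic model; MDS-choice model |
| Probabilistic | Type I: signal detection theory; general recognition theory; probabilistic preferential choice unfolding models; classical Thurstonian psychophysics | Type III: probabilistic extensions of Type II models; special cases of moment generating functions |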

## References

Ashby, F. G. (Ed.). (1992). Multidimensional models of perception and cognition. Hillsdale, NJ: Erlbaum.

Ashby, F. G., & Perrin, N. A. (1988). Toward a unified theory of similarity and recognition. Psychological Review, 95, 124-150.

Ashby, F. G., & Townsend, J. T. (1986). Varieties of perceptual independence. Psychological Review, 93, 154-179.

Carroll, J. D. & Chang, J. J. (1970). Analysis of individual differences in multidimensional scaling via an N-way generalization of “Eckart-Young” decomposition. Psychometrika, 35, 283-319.

Coombs, C. H. (1964). A theory of data. New York: Wiley.

De Soete, G., Carroll, J. D., & DeSarbo, W. S. (1986). The wandering ideal point model: A probabilistic multidimensional unfolding model for paired comparisons data. Journal of Mathematical Psychology, 30, 28-41.

Ennis, D. M. (1988). Confusable and discriminable stimuli: Comments on Nosofsky (1986) and Shepard (1986). Journal of Experimental Psychology: General, 117, 408-411.

Ennis, D. M. & Johnson, N. L. (1993). Thurstone-Shepard similarity models as special cases of moment generating functions. Journal of Mathematical Psychology, 37, 104-110.

Ennis, D. M. & Johnson, N. L. (1994). A general model for preferential and triadic choice in terms of central F distribution functions. Psychometrika, 59, 91-96.

Garner, W. R. (1974). The processing of information and structure. New York: Wiley.

Guo, G.-D., Jain, A. K., Ma, W.-Y., & Zhang, H.-J. (2002). Learning similarity measure for natural image retrieval with relevance feedback. IEEE Transactions on Neural Networks, 13, 811-820.

Hosmer, D. W. & Lemeshow, S. (2000). Applied Logistic Regression. 2nd ed. New York, NY: Wiley.

James, W. (1890). Principles of psychology. New York: Holt.

Krumhansl, C. L. (1978). Concerning the applicability of geometric models to similarity data: The interrelationship between similarity and spatial density. Psychological Review, 85, 445-463.

Kruskal, J. B. (1964). Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29, 1-27.

Luce, R. D. (1963). Detection and recognition. In R. D. Luce, R. R. Bush, & E. Galanter (Eds.), Handbook of mathematical psychology (Vol. 1, pp. 103-189). New York: Wiley.

MacKay, D. B., Easley, R. F., & Zinnes, J. L. (1995). A single ideal point model for market structure analysis. Journal of Marketing Research, 32, 433-443.

Nosofsky, R. M. (1986). Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology: General, 115, 39-57.

Nosofsky, R. M. (1991). Stimulus bias, asymmetric similarity, and classification. Cognitive Psychology, 23, 94-140.

Shepard, R. N. (1957). Stimulus and response generalization: A stochastic model relating generalization to distance in psychological space. Psychometrika, 22, 325-345.

Shepard, R. N. (1962). The analysis of proximities: Multidimensional scaling with an unknown distance function I. Psychometrika, 27, 125-140.

Shepard, R. N. (1964). Attention and the metric structure of the stimulus space. Journal of Mathematical Psychology, 1, 54-87.

Shepard, R. N. (1987). Toward a universal law of generalization for psychological science. Science, 237, 1317-1323.

Takane, Y., Young, F. W., & de Leeuw, J. (1977). Nonmetric individual differences multidimensional scaling: An alternating least squares method with optimal scaling features. Psychometrika, 42, 7-67.

Tanner, W. P. & Swets, J. A. (1954). A decision-making theory of visual detection. Psychological Review, 61, 401-409.

Thurstone, L. L. (1927). A law of comparative judgment. Psychological Review, 34, 273-286.

Torgerson, W. S. (1952). Multidimensional scaling: I. Theory and method. Psychometrika, 17, 401-419.

Tversky, A. (1977). Features of similarity. Psychological Review, 84, 327-352.

Xing, E., Ng, A., Jordan, M., & Russell, S. (2002). Distance metric learning with application to clustering with side-information. Advances in Neural Information Processing Systems. MIT Press.

Young, F. W. & Hamer, R. M. (1994). Theory and applications of multidimensional scaling. Hillsdale, NJ: Erlbaum.
