# Support vector clustering


Curator: Asa Ben-Hur

The objective of clustering is to partition a data set into groups according to some criterion, in an attempt to organize the data into a more meaningful form. There are many ways of achieving this goal: clustering may proceed according to a parametric model, or by grouping points according to a distance or similarity measure, as in hierarchical clustering. A natural place to put cluster boundaries is in regions of data space where there is little data, i.e. in "valleys" of the probability distribution of the data. This is the path taken in support vector clustering (SVC), which is based on the support vector approach.

In SVC, data points are mapped from data space to a high-dimensional feature space using a kernel function. In feature space we look for the smallest sphere that encloses the image of the data, using the Support Vector Domain Description algorithm. This sphere, when mapped back to data space, forms a set of contours which enclose the data points. We interpret these contours as cluster boundaries, and points enclosed by the same contour are assigned by SVC to the same cluster.

## The algorithm

SVC uses the Support Vector Domain Description (SVDD) to delineate the region in data space where the input examples are concentrated. SVDD belongs to the general category of kernel-based learning. In its "linear" version SVDD looks for the smallest sphere that encloses the data. When used in conjunction with a kernel function, it looks for the smallest enclosing sphere in the feature space defined by the kernel function. While in feature space the data is described by a sphere, when mapped back to data space the sphere is transformed into a set of non-linear contours that enclose the data (see Figure 2). SVDD provides a decision function that tells whether a given input is inside the feature-space sphere or not, indicating whether a given point belongs to the support of the distribution. More specifically, it is the radius-squared of the feature-space sphere minus the squared distance of the image of a data point $$\mathbf{x}$$ from the center of the sphere. This function, denoted $$f(\mathbf{x})\ ,$$ returns a positive value if $$\mathbf{x}$$ is inside the feature-space sphere and a negative value otherwise. For more details on SVDD the reader is referred to the SVDD article.
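To make this concrete, here is a minimal sketch of SVDD and its decision function $$f\ .$$ The use of SciPy's generic SLSQP solver for the dual problem, the choice of the Gaussian kernel, and the numerical tolerances are illustrative assumptions, not the implementation used in the original paper.

```python
import numpy as np
from scipy.optimize import minimize

def rbf_kernel(X, Y, gamma):
    # Pairwise Gaussian kernel k(x, x') = exp(-gamma * ||x - x'||^2)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_svdd(X, gamma=1.0, C=1.0):
    """Solve the SVDD dual
        max_a  sum_i a_i K_ii - sum_ij a_i a_j K_ij
        s.t.   sum_i a_i = 1,  0 <= a_i <= C
    and return the decision function f(x) = R^2 - ||phi(x) - center||^2,
    which is positive inside the feature-space sphere."""
    n = len(X)
    K = rbf_kernel(X, X, gamma)
    diagK = np.diag(K)

    def neg_dual(a):
        return -(a @ diagK - a @ K @ a)

    res = minimize(neg_dual, np.full(n, 1.0 / n), method='SLSQP',
                   bounds=[(0.0, C)] * n,
                   constraints=({'type': 'eq',
                                 'fun': lambda a: a.sum() - 1.0},))
    a = res.x

    def dist2(Z):
        # Squared feature-space distance to the sphere center, via the
        # kernel trick; k(z, z) = 1 for the Gaussian kernel.
        return np.ones(len(Z)) - 2.0 * rbf_kernel(Z, X, gamma) @ a + a @ K @ a

    # R^2: distance of the unbounded support vectors (0 < a_i < C) to the center
    sv = (a > 1e-5) & (a < C - 1e-5)
    R2 = dist2(X[sv]).mean()
    return lambda Z: R2 - dist2(Z)
```

For a single compact blob of data, the resulting $$f$$ is positive near the bulk of the points and negative far away from them.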

We now interpret the contours where $$f(\mathbf{x})=0$$ as cluster boundaries. An example of such contours is shown in Figure 2. However, these boundaries define the clusters only implicitly, and an additional step is required to "tease" the cluster membership out of the SVDD.

Figure 1: The line segment that connects points in different clusters has to go through a low-density region in data space where the SVDD returns a negative value.

The key geometrical observation that enables inferring clusters from the SVDD is that, given a pair of data points that belong to different components (clusters), the line segment that connects them must pass through a region of data space that is part of a "valley" in the probability density of the data, i.e. does not belong to the support of the distribution. Such a line must therefore exit the feature-space sphere, and hence contains points that return a negative value when tested with the SVDD decision function (see Figure 1). This observation leads to the definition of an adjacency matrix $$A$$ between pairs of points in the dataset. For a given pair of points $$\mathbf{x_i}$$ and $$\mathbf{x_j}$$ the $$i,j$$ element of $$A$$ is given by $A_{ij} = \begin{cases} 1, & \mbox{if } f(\mathbf{x}) > 0 \mbox{ for all}~ \mathbf{x} ~\mbox{on the line segment connecting}~ \mathbf{x_i} ~\mbox{and}~ \mathbf{x_j} \\ 0, & \mbox{otherwise}. \end{cases}$ Clusters are now defined as the connected components of the graph induced by $$A\ .$$ Checking the line segment is implemented by sampling a number of points along it (20 points were used in the numerical experiments).
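The labeling step described above can be sketched as follows. The code works with any decision function `f`, uses the 20-point segment sampling mentioned above, and extracts connected components with a simple depth-first search; it is a plain-NumPy illustration, not the original implementation.

```python
import numpy as np

def svc_labels(X, f, n_samples=20):
    """Cluster labels from a decision function f: points i and j are
    adjacent iff f stays positive at n_samples points along the segment
    between them; clusters are the connected components of that graph."""
    n = len(X)
    A = np.zeros((n, n), dtype=bool)
    ts = np.linspace(0.0, 1.0, n_samples)
    for i in range(n):
        A[i, i] = True
        for j in range(i + 1, n):
            # Sample the line segment from x_i to x_j
            seg = X[i] + ts[:, None] * (X[j] - X[i])
            A[i, j] = A[j, i] = np.all(f(seg) > 0)
    # Connected components by depth-first search over the adjacency graph
    labels = np.full(n, -1)
    c = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        stack, labels[i] = [i], c
        while stack:
            u = stack.pop()
            for v in np.flatnonzero(A[u]):
                if labels[v] == -1:
                    labels[v] = c
                    stack.append(v)
        c += 1
    return labels
```

With a decision function whose support is two well-separated regions, points in the same region share a label and points in different regions do not.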

## Examples

Figure 2: Contours generated by SVDD as $$\gamma$$ is increased.

SVC uses SVDD to generate non-linear cluster boundaries in conjunction with the Gaussian kernel: $k(\mathbf{x}, \mathbf{x}') = \exp (-\gamma || \mathbf{x} - \mathbf{x}'||^2).$ The boundaries generated by SVDD become increasingly non-linear as the parameter $$\gamma\ ,$$ which governs the width of the Gaussian, is increased, as shown in Figure 2. As the value of $$\gamma$$ is increased the SVDD boundary fits the data more tightly, and at several values of $$\gamma$$ the enclosing boundary splits, forming an increasing number of components (clusters). The so-called support vectors are the data points that lie on the boundary. As $$\gamma$$ is increased their number increases, increasing the complexity of the boundary. The choice of the Gaussian kernel is motivated by Roberts (1997), where a related method produces a sequence of cluster splittings when used in conjunction with the Gaussian kernel.
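The role of $$\gamma$$ can be seen directly from the kernel itself: a larger $$\gamma$$ makes the kernel more local, so similarity decays faster with distance, which is what lets the boundary tighten around the data. A small NumPy illustration:

```python
import numpy as np

def gaussian_kernel(x, y, gamma):
    # k(x, x') = exp(-gamma * ||x - x'||^2)
    return np.exp(-gamma * np.sum((x - y) ** 2))

x = np.array([0.0, 0.0])
y = np.array([1.0, 0.0])
# Larger gamma -> narrower Gaussian -> similarity decays faster with distance
for gamma in (0.1, 1.0, 10.0):
    print(f"gamma={gamma}: k(x, y) = {gaussian_kernel(x, y, gamma):.4f}")
```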

Figure 3: Allowing for outliers allows SVC to separate noisy data.

In real data, clusters are usually not as well separated as in Figure 2. Thus, in order to observe splitting of contours, we must allow some data points to fall outside the support of the distribution, i.e. outside the feature-space sphere defined by SVDD. The fraction of such points is controlled by the parameter $$C$$ of the SVDD. The effect of allowing outliers is demonstrated in Figure 3: without allowing outliers, contour splitting does not occur for the two outer rings for any value of $$\gamma\ ,$$ whereas allowing for outliers, the clusters are separated easily (right panel of Figure 3). Note that the outliers are left unclassified by SVC, since they lie outside the enclosing sphere. One may decide either to leave them unclassified, or to assign them to the closest cluster.
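If one chooses the second option, a nearest-neighbour assignment suffices. The helper below is a hypothetical post-processing step (the name `assign_outliers` and the Euclidean nearest-inlier rule are illustrative choices, not prescribed by the original paper):

```python
import numpy as np

def assign_outliers(X, labels, outlier):
    """Give each outlier (boolean mask) the label of its nearest
    non-outlier point, measured by Euclidean distance in data space."""
    labels = labels.copy()
    inliers = np.flatnonzero(~outlier)
    for i in np.flatnonzero(outlier):
        d = np.linalg.norm(X[inliers] - X[i], axis=1)
        labels[i] = labels[inliers[d.argmin()]]
    return labels
```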

The original SVC paper recommends exploring the parameter space of $$(\gamma, C)$$ by starting with a low value of $$\gamma$$ and a high value of $$C\ ,$$ where a single cluster with no outliers occurs. $$\gamma$$ is then iteratively increased to look for cluster bifurcations; whenever the number of support vectors becomes excessive, indicating a non-smooth boundary, $$C$$ is decreased, allowing for more outliers. Note that whenever a clustering algorithm is applied, the question arises whether the clustering captures structure that is inherent in the data. There are many approaches to cluster validation, one of which is the stability of the clustering under sampling or other perturbations. Stability of the clustering with respect to variation of the width of the Gaussian kernel could serve as such an indicator, but further research is required to show that.
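The recommended schedule can be sketched as a simple loop. Here `explore_parameters` and its `cluster` callback are hypothetical names for illustration, and the support-vector-fraction threshold is an assumed heuristic for "excessive"; the callback is assumed to return cluster labels together with the number of support vectors.

```python
def explore_parameters(X, cluster, gammas, C_schedule, max_sv_frac=0.5):
    """Sweep gamma upward from a low value; whenever the support-vector
    fraction becomes excessive (non-smooth boundary), step C down to the
    next value in C_schedule, allowing more outliers.

    `cluster(X, gamma, C)` is assumed to return (labels, n_support_vectors).
    """
    results = []
    c_idx = 0
    for gamma in gammas:
        labels, n_sv = cluster(X, gamma, C_schedule[c_idx])
        if n_sv / len(X) > max_sv_frac and c_idx + 1 < len(C_schedule):
            # Boundary too complex: allow more outliers and re-cluster
            c_idx += 1
            labels, n_sv = cluster(X, gamma, C_schedule[c_idx])
        results.append((gamma, C_schedule[c_idx], labels))
    return results
```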

## Summary

SVC is a nonparametric clustering algorithm that makes no assumption on the number or shape of the clusters in the data. In our experience it works best for low-dimensional data, so if your data is high-dimensional, a preprocessing step, e.g. using principal component analysis, is usually required. Several enhancements of the original algorithm have been proposed that provide specialized algorithms for computing the clusters by evaluating only a subset of the edges in the adjacency matrix.
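As an example of such preprocessing, projection onto the top principal components takes only a few lines via a centered SVD; `pca_reduce` is an illustrative helper, not part of SVC itself.

```python
import numpy as np

def pca_reduce(X, k):
    # Project the data onto its top-k principal components
    Xc = X - X.mean(0)                               # center the data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                             # k-dimensional scores
```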

## References

• A. Ben-Hur, D. Horn, H.T. Siegelmann and V. Vapnik. Support vector clustering. Journal of Machine Learning Research 2:125-137, 2001.
• S.J. Roberts. Parametric and non-parametric unsupervised cluster analysis. Pattern Recognition, 30(2): 261-272, 1997.
