# Kohonen network

The **Self-Organizing Map (SOM)**, commonly also known as the **Kohonen network** (Kohonen 1982, Kohonen 2001), is a computational method for the visualization and analysis of high-dimensional data, especially experimentally acquired information.

## Introduction

The Self-Organizing Map defines an ordered mapping, a kind of projection
from a set of given data items onto a regular, usually two-dimensional grid. A *model*
\(m_i\) is associated with each grid node (Figure 1).
These models are computed by the SOM algorithm. A data item is
mapped into the node whose model is most similar to the data item,
i.e., whose model has the smallest distance from the data item in some metric.
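This winner lookup can be sketched in a few lines of NumPy; the function name and the toy models below are illustrative, not part of any standard implementation:

```python
import numpy as np

def best_matching_node(x, models):
    """Return the index of the model with the smallest Euclidean
    distance from the data item x."""
    return int(np.argmin(np.linalg.norm(models - x, axis=1)))

# Hypothetical map with three 2-D models.
models = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
best_matching_node(np.array([0.9, 0.1]), models)   # → 1
```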

Like a codebook vector in vector quantization, the model is usually a certain weighted local average of the given data items in the data space. In addition, when the models are computed by the SOM algorithm, they are more similar at nearby nodes than at nodes located farther away from each other on the grid. In this way the set of models can be regarded as constituting a similarity graph, a structured 'skeleton' of the distribution of the given data items.

The SOM was originally developed for the visualization of distributions of metric vectors, such as ordered sets of measurement values or statistical attributes, but it can be shown that a SOM-type mapping can be defined for any data items whose mutual pairwise distances can be defined. Examples of non-vectorial data amenable to this method are strings of symbols and sequences of segments in organic molecules (Kohonen and Somervuo 2002).

## History

The SOM algorithm grew out of early neural network models, especially
models of associative memory and adaptive learning (cf. Kohonen 1984). A
new incentive was to explain the spatial organization of the brain's
functions, as observed especially in the cerebral cortex. Nonetheless,
the SOM was not the first step in that direction: one has to mention
at least the spatially ordered line detectors of von der Malsburg
(1973) and the neural field model of Amari (1980).
However, the self-organizing power of these early models was rather
weak. The crucial invention of Kohonen was to introduce a system model
that is composed of at least two interacting subsystems of different
nature. One of these subsystems is a competitive neural network that
implements the *winner-take-all function*, but there is also
another subsystem that is controlled by the neural network and which
modifies the local synaptic plasticity of the neurons in learning. The
learning is restricted spatially to the local neighborhood of the most
active neurons. The plasticity-control subsystem could be based on
nonspecific neural interactions, but more probably it is a chemical
control effect. Only by means of the separation of the neural signal
transfer and the plasticity control has it become possible to
implement an effective and robust self-organizing system.

Nonetheless, the SOM principle can also be expressed mathematically in a purely abstract form, without reference to any underlying neural or other components. The first application area of the SOM was speech recognition (see Figure 2). In its abstract form, the SOM has come into widespread use in data analysis and data exploration (Kaski et al. 1998, Oja et al. 2003, Pöllä et al. 2007).

## Mathematical definition of the SOM

Consider first data items that are \(n\)-dimensional Euclidean vectors \[ x(t) = [\xi_1 (t), \xi_2 (t), \ldots , \xi_n (t)] \ .\]

Here \(t\) is the index of the data item in a given sequence. Let the \(i\)th model be \(m_i (t) = [\mu_{i1} (t), \mu_{i2} (t), \ldots , \mu_{in} (t)]\ ,\) where now \(t\) denotes the index in the sequence in which the models are generated. This sequence is defined as a smoothing-type process in which the new value \(m_i (t+1)\) is computed iteratively from the old value \(m_i (t)\) and the new data item \(x (t)\) as \[ m_i(t+1) = m_i(t) + \alpha(t) h_{ci}(t) [ x(t)-m_i(t) ]. \]

Here \(\alpha (t)\) is a scalar factor that defines the size of the correction; its value decreases with the step index \(t\). The index \(i\) refers to the model under processing, and \(c\) is the index of the model that has the smallest distance from \(x(t)\) in the Euclidean signal space. The factor \(h_{ci} (t)\) is a kind of smoothing kernel, also called the neighborhood function. It is equal to 1 when \(i = c\), and its value decreases as the distance between the models \(m_i\) and \(m_c\) on the grid increases. Moreover, the spatial width of the kernel on the grid shall decrease with the step index \(t\). These functions of the step index, which determine the convergence, must be chosen very delicately: cf., e.g., Kohonen (2001). The initialization of the \(m_i(1)\) is also a problem discussed in Kohonen (2001).
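As a concrete illustration, the iterative rule above can be sketched in NumPy. This is a minimal sketch with assumed, not prescribed, choices: a Gaussian neighborhood function, linearly decreasing \(\alpha(t)\) and kernel width, and a random toy data set:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: a 10x10 grid of 2-D models, 500 uniform data items.
rows, cols, dim = 10, 10, 2
X = rng.random((500, dim))
models = rng.random((rows * cols, dim))        # random initialization m_i(1)

# Grid coordinates of the nodes; the neighborhood is measured between
# nodes on the grid, not in the signal space.
grid = np.array([(r, c) for r in range(rows) for c in range(cols)], float)

n_steps = 2000
for t in range(n_steps):
    x = X[t % len(X)]                          # cyclic presentation of the data
    # c: index of the model closest to x in the Euclidean signal space
    c = int(np.argmin(np.linalg.norm(models - x, axis=1)))
    alpha = 0.5 * (1.0 - t / n_steps)          # decreasing correction factor
    sigma = 1.0 + 4.0 * (1.0 - t / n_steps)    # shrinking kernel width
    d2 = np.sum((grid - grid[c]) ** 2, axis=1) # squared grid distance to winner
    h = np.exp(-d2 / (2.0 * sigma ** 2))       # Gaussian neighborhood h_ci(t)
    models += alpha * h[:, None] * (x - models)
```

Since \(\alpha(t)\,h_{ci}(t) < 1\) throughout, each update moves \(m_i\) toward \(x(t)\) by a convex combination, so the models stay within the convex hull of the data and their initial values.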

Although the above iterative algorithm has been used with success in
numerous applications, it has turned out that the scheme termed the
*Batch Map* produces essentially similar results but an order of
magnitude faster. The basic idea is that for every node \(j\) in the
grid, the average \(\overline{x}_j\) of all those input items \(x(t)\) is
formed that have \(m_j\) as the closest model. After that, new models
are computed as
\[
m_i = \frac{\sum_j n_j h_{ji} \overline{x}_j}{\sum_j n_j h_{ji}}
\]

where \(n_j\) is the number of input items mapped into node \(j\), and the index \(j\) runs over the nodes in the neighborhood of node \(i\). With the updated \(m_i\), this scheme is iterated a few times, always using the same batch of input data items to determine the updated \(\overline{x}_j\).
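The Batch Map can be sketched as follows; this is a minimal illustration assuming a Gaussian neighborhood of fixed width (in practice the width would be decreased between iterations) and toy data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy setup: an 8x8 grid trained on 400 uniform 2-D data items.
rows, cols, dim = 8, 8, 2
X = rng.random((400, dim))
models = rng.random((rows * cols, dim))
grid = np.array([(r, c) for r in range(rows) for c in range(cols)], float)

# Gaussian neighborhood h_ji between grid nodes (fixed width for brevity).
d2 = ((grid[:, None, :] - grid[None, :, :]) ** 2).sum(axis=2)
h = np.exp(-d2 / (2.0 * 2.0 ** 2))

for _ in range(10):
    # Map every data item to its closest model.
    winners = np.argmin(((X[:, None, :] - models[None, :, :]) ** 2).sum(axis=2),
                        axis=1)
    # n_j and the node-wise means xbar_j of the items mapped into node j.
    n = np.bincount(winners, minlength=len(models)).astype(float)
    xbar = np.zeros_like(models)
    np.add.at(xbar, winners, X)
    xbar[n > 0] /= n[n > 0][:, None]
    # m_i = sum_j n_j h_ji xbar_j / sum_j n_j h_ji   (h is symmetric here)
    w = h * n[None, :]
    models = (w @ xbar) / w.sum(axis=1, keepdims=True)
```

Note that the whole batch is reassigned to the models at the start of every iteration, which is why the same data can be reused unchanged.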

The mathematical theory of the SOM is very complicated and only the one-dimensional case has been analyzed completely (Fort 2006). Only for a modified adaptation rule, in which the matching is averaged over \(h_{ci}\), can the SOM be derived from a specific potential function (Heskes 2001). Apparently, the SOM belongs to the "ill-posed" problems in mathematics.

## Useful instructions when applying the SOM

It is advisable to pay attention at least to the following recommendations to ensure that the resulting mappings are stable, well oriented, and topologically correct.

**Form of the array**: A hexagonal grid of nodes is to be preferred for
visual inspection. The array ought to be rectangular rather than
square, because the "elastic network" formed of the model vectors \(m_i\) must be oriented along with the distribution
of the input \(x\) in the signal space.
On the other hand, since the \(m_i\) have to approximate the distribution
of the \(x\), it is further desirable that
the width and the height of the array at least roughly
correspond to the major dimensions (e.g., the principal
components) of the distribution of \(x\).
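One way to pick the array's aspect ratio, sketched here under the assumption that the square roots of the two largest principal-component variances should set the width-to-height ratio (the variable names and toy data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical anisotropic data: three times more spread along one axis.
X = rng.normal(size=(1000, 2)) * np.array([3.0, 1.0])

# Variances along the principal components are the eigenvalues of the
# covariance matrix (sorted here into descending order).
evals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
ratio = float(np.sqrt(evals[0] / evals[1]))    # desired width/height ratio

n_nodes = 96                                   # total map size fixed in advance
cols = int(round(np.sqrt(n_nodes * ratio)))    # width of the array
rows = max(1, int(round(n_nodes / cols)))      # height of the array
```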

**Scaling of the vector components** is a very subtle problem. One
may easily realize that the orientation, or ordered "regression" of
the model vectors in the input space depends on the scaling of
the components (or dimensions) of the input data vectors.
Especially if the data elements represent variables of
different kinds and are therefore given in different scales,
there does not exist any simple rule to determine what kind of
rescaling is best before entering the training data to the
learning algorithm. One may try heuristically justifiable
rescalings and check the quality of the resulting maps, e.g., by means of
Sammon's mapping (Sammon 1969) or the average quantization error.
Often the normalization of all input variables
such that, e.g., their variances become equal, is a useful strategy.
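The equal-variance normalization mentioned above can be sketched as follows; the two example variables and their scales are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical mixed-scale training data: one variable around 170 (e.g. cm),
# another around 70000 (e.g. g); without rescaling, the second variable
# would dominate the Euclidean distances used in matching.
X = np.column_stack([rng.normal(170.0, 10.0, 500),
                     rng.normal(70000.0, 15000.0, 500)])

# Rescale every input variable to zero mean and unit variance.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
```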

**Learning with a small number of available training samples:**
In some problems, e.g., when making maps for items defined by
a finite set of attributes, one does not have a large set of training samples.
However, for a good statistical accuracy, the learning process may require
a rather large number of training steps, and if the number of available samples is
much smaller, the samples can be used
repeatedly in training. The
samples may be applied cyclically or in a randomly permuted order, or
picked at random from the basic set (called bootstrap learning).
It has turned out in practice that ordered cyclic application of the available samples is not
noticeably worse than the other, statistically stringent sampling
methods.
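The three sampling orders mentioned above can be sketched as index sequences over a small training set (toy sizes assumed):

```python
import numpy as np

rng = np.random.default_rng(4)
samples = np.arange(50)        # indices of a small training set
n_steps = 200                  # training steps needed for statistical accuracy

# Cyclic application: repeat the samples in a fixed order.
cyclic = np.array([samples[t % len(samples)] for t in range(n_steps)])
# Randomly permuted order: reshuffle the whole set on every pass.
permuted = np.concatenate([rng.permutation(samples)
                           for _ in range(n_steps // len(samples))])
# Bootstrap learning: pick samples at random with replacement.
bootstrap = rng.choice(samples, size=n_steps, replace=True)
```

The first two schemes use every sample equally often; the bootstrap scheme does not, but in practice the difference in map quality is small, as noted above.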

**Quality of learning**: Different learning processes will be
obtained when starting with different initial values \(m_i(1)\ ,\) and applying
different sequences of the training vectors \({x(t)}\) and different
learning parameters.
Of course, it would be desirable for the resulting map
always to be the least ambiguous one;
it is obvious that some optimal map for the given
input data must exist.
When comparing maps with the same array structure and definition
of \(h_{ci}\ ,\) the mean of \(|| x - m_c ||\) (quantization error)
over the training data is a
useful performance index. Therefore, an appreciable number (say,
several dozen) of random initializations of the \(m_i(1)\) may be
tried, and the map with the least quantization error selected.
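The selection procedure can be sketched as follows; for brevity the candidate maps here are just random initializations \(m_i(1)\), whereas in practice each candidate would first be trained with the SOM algorithm before its quantization error is measured:

```python
import numpy as np

def quantization_error(X, models):
    """Mean distance || x - m_c || from each data item to its closest model."""
    d = np.linalg.norm(X[:, None, :] - models[None, :, :], axis=2)
    return float(d.min(axis=1).mean())

rng = np.random.default_rng(5)
X = rng.random((200, 2))

# Several dozen random initializations (each would normally be trained first).
candidates = [rng.random((25, 2)) for _ in range(40)]
errors = [quantization_error(X, m) for m in candidates]
best = candidates[int(np.argmin(errors))]   # map with the least error
```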

Notice, however, that there would be no sense in comparing quantization errors for maps with different structures or \(h_{ci}\), because, e.g., it is a trivial fact that the error is minimized when \(h_{ci} = \delta_{ci}\) (Kronecker delta). When using this singular kernel, however, there is no self-organizing power left, because the algorithm reduces to classical vector quantization.

## Software packages

Data analysis, clustering and visualization by the SOM can be done using either public domain, commercial, or self-coded software. The use of self-coded software is not encouraged as there are many subtle aspects that need to be taken into account and which affect the convergence and accuracy of the algorithm.

The SOM_PAK is a collection of tools for the correct application of the SOM. This original program package was created by the SOM Programming Team of the Helsinki University of Technology especially for very large problems and it is available for download at http://www.cis.hut.fi/research/som_lvq_pak.shtml.

The SOM Toolbox is a more flexible, general-purpose software library implementing the SOM algorithm in MATLAB, and it uses MATLAB graphics. The SOM Toolbox software package is available for download at http://www.cis.hut.fi/projects/somtoolbox/.

Although there exist many commercial implementations of the SOM for specific applications, it must be cautioned that many of them contain fatal flaws, because not all of the above precautions have been taken into account.

## Extensions of SOM

Numerous variants of the basic SOM have been suggested (cf. Kaski et al. 1998, Oja et al. 2003, and Pöllä et al. 2007); there are too many to be reviewed here. The main alternatives for the definition of the SOM models are the Generative Topographic Map (Bishop et al. 1998) and the information-based computation of the SOM models (Van Hulle 2000).

The structure of the SOM array can be made adaptive ("growing maps", Fritzke 1991), or alternatively the neighborhood relations can be defined in the input space (Martinetz et al. 1993).

The SOM principle can be modified to analyze subspace relations (Kohonen et al. 1997) or dynamic patterns (Hammer et al. 2004).

In its basic form, the SOM described in this article, as contrasted with its many variants, is in a special position because it is a continuation of classical methods such as principal component analysis (Hotelling 1933) and multidimensional scaling (Kruskal and Wish 1978), and it directly visualizes the clustering results as an easily comprehensible geometric display.

SOM variants are often introduced in the WSOM (Workshop on Self-Organizing Maps) conference series, which is dedicated to reporting research related to the Self-Organizing Map. The latest proceedings include Laaksonen and Honkela (2011) and Príncipe and Miikkulainen (2009).

## References

Amari, S. (1980).
Topographic organization of nerve fields.
*Bulletin of Mathematical Biology*, 42:339-364.

Bishop, C.M., Svensén, M. and Williams, C.K.I. (1998).
GTM: The Generative Topographic Mapping. *Neural Computation* 10(1): 215-234.

Fort, J.C. (2006).
SOM’s mathematics. *Neural Networks*, 19: 812–816.

Fritzke, B. (1991). Let it grow - self-organizing feature maps with
problem dependent cell structure. *Proceedings of ICANN 1991*,
North-Holland, Amsterdam, pp. 403-408.

Hammer, B., Micheli, A., Sperduti, A., Strickert, M. (2004).
Recursive self-organizing network models.
*Neural Networks*, 17(8-9):1061-1086.

Heskes, T. (2001).
Self-organizing maps, vector quantization, and mixture modeling.
*IEEE Transactions on Neural Networks*, 12:1299-1305.

Hotelling, H. (1933).
Analysis of a complex of statistical variables into principal components.
*Journal of Educational Psychology*,
24: 417-441, 498-520.

Kaski, S., Kangas, J., and Kohonen, T. (1998).
Bibliography of self-organizing map (SOM) papers: 1981-1997.
*Neural Computing Surveys*, 1: 102-350.

Kohonen, T. (1982).
Self-organized formation of topologically correct feature maps.
*Biological Cybernetics*, 43:59-69.

Kohonen T. (1984).
*Self-Organization and Associative Memory*.
Springer, Berlin.

Kohonen, T. (2001).
*Self-Organizing Maps*. Third, extended edition.
Springer.

Kohonen, T., Kaski, S. and Lappalainen, H. (1997).
Self-organized formation of various invariant-feature filters in
the adaptive-subspace SOM. *Neural Computation*, 9: 1321-1344.

Kohonen, T. and Somervuo, P. (2002).
How to make large self-organizing maps for nonvectorial data.
*Neural Networks* 15(8-9): 945-952.

Kruskal, J.B. and Wish, M. (1978).
*Multidimensional Scaling*. Sage Publications, Beverly Hills, CA.

Laaksonen, J. and Honkela, T. (eds.) (2011). "Advances in Self-Organizing Maps, WSOM 2011", Springer, Berlin.

von der Malsburg, C. (1973). Self-organization of orientation sensitive
cells in the striate cortex. *Kybernetik*, 14:85-100.

Martinetz, T.M., Berkovich, S.G. and Schulten, K. (1993).
"Neural gas" for vector quantization and its application to time-series prediction.
*IEEE Transactions on Neural Networks*, 4: 558-569.

Oja, M., Kaski, S. and Kohonen, T. (2003).
Bibliography of Self-Organizing Map (SOM) Papers: 1998-2001 Addendum.
*Neural Computing Surveys*, 3: 1-156.

Pöllä, M., Honkela, T. and Kohonen, T. (2007).
Bibliography of Self-Organizing Map (SOM) Papers: 2002-2005 Addendum.
*Neural Computing Surveys*, forthcoming.

Príncipe, J.C. and Miikkulainen, R. (eds.) (2009) "Advances in Self-Organizing Maps, WSOM 2009", Springer, Berlin.

Sammon, J. (1969).
A nonlinear mapping for data structure analysis.
*IEEE Trans. Comp.*, 18(5):401-409.

Van Hulle, M.M. (2000).
*Faithful Representations and Topographic Maps: From Distortion-
to Information-Based Self-Organization*. John Wiley & Sons, New York.

**Internal references**

- Valentino Braitenberg (2007) Brain. Scholarpedia, 2(11):2918.
- Rob Schreiber (2007) MATLAB. Scholarpedia, 2(7):2929.

## See Also

Adaptive Resonance Theory, Associative Memory, Bayesian Learning, Neural Networks, Self Organization