Kohonen network
From Scholarpedia
| Teuvo Kohonen and Timo Honkela (2007), Scholarpedia, 2(1):1568. | doi:10.4249/scholarpedia.1568 | revision #73488 [link to/cite this article] | |||||||||||||||||||
Curator: Dr. Teuvo Kohonen, Academy of Finland
Curator: Dr. Timo Honkela, Helsinki University of Technology
The Self-Organizing Map (SOM), commonly also known as Kohonen network (Kohonen 1982, Kohonen 2001) is a computational method for the visualization and analysis of high-dimensional data, especially experimentally acquired information.
Contents |
Introduction
The Self-Organizing Map defines an ordered mapping, a kind of projection
from a set of given data items onto a regular, usually two-dimensional grid. A model
is associated with each grid node (Fig. 1).
These models are computed by the SOM algorithm. A data item will be
mapped into the node whose model which is most similar to the data item,
e.g., has the smallest distance from the data item in some metric.
Like a codebook vector in vector quantization, the model is then usually a certain weighted local average of the given data items in the data space. But in addition to that, when the models are computed by the SOM algorithm, they are more similar at the nearby nodes than between nodes located farther away from each other on the grid. In this way the set of the models can be regarded to constitute a similarity graph, and structured 'skeleton' of the distribution of the given data items.
The SOM was originally developed for the visualization of distributions of metric vectors, such as ordered sets of measurement values or statistical attributes, but it can be shown that a SOM-type mapping can be defined for any data items, the mutual pairwise distances of which can be defined. Examples of non-vectorial data that are feasible for this method are strings of symbols and sequences of segments in organic molecules (Kohonen and Somervuo 2002).
History
The SOM algorithm grew out of early neural network models, especially models of associative memory and adaptive learning (cf. Kohonen 1984). A new incentive was to explain the spatial organization of the brain's functions, as observed especially in the cerebral cortex. Nonetheless the SOM was not the first step in that direction: at least one has to mention the spatially ordered line detectors of von der Malsburg (1973) and the neural field model of Amari (1980). However, the self-organizing power of these early models was rather weak. The crucial invention of Kohonen was to introduce a system model that is composed of at least two interacting subsystems of different nature. One of these subsystems is a competitive neural network that implements the winner-take-all function, but there is also another subsystem that is controlled by the neural network and which modifies the local synaptic plasticity of the neurons in learning. The learning is restricted spatially to the local neighborhood of the most active neurons. The plasticity-control subsystem could be based on nonspecific neural interactions, but more probably it is a chemical control effect. Only by means of the separation of the neural signal transfer and the plasticity control has it become possible to implement an effective and robust self-organizing system.
Nonetheless, the SOM principle can also be expressed mathematically in a pure abstract form, without reference to any underlying neural or other components. The first application area of the SOM was speech recognition (see Fig. 2). In its abstract form, the SOM has come into widespread use in data analysis and data exploration (Kaski et al. 1998, Oja et al. 2003, Pöllä et al. 2007).
Mathematical definition of the SOM
Consider first data items that are
-dimensional Euclidean vectors
.
Here
is the
index of the data item in a given sequence. Let the
th model be
, where
now
denotes the index in the sequence in which the models are
generated. This sequence is defined as a smoothing-type process in
which the new value
is computed iteratively from the old
value
and the new data item
as
Here
is a scalar factor that defines the size of the
correction; its value decreases with the step index
The index
refers to the model under processing, and
is the index of the model that has the smallest distance
from
in the Euclidean signal space. The factor
is a kind of a smoothing kernel, also called the neighborhood function.
It is equal to 1 when
and
its value decreases when the distance between the models
and
on
the grid increases. Moreover, the spatial width of the kernel on
the grid shall decrease with the step index
. These functions
of the step index, which determine the convergence,
must be chosen very delicately: cf. , e.g., (Kohonen 2001).
Also the initialization of the
is a problem
discussed in (Kohonen 2001).
Although the above iterative algorithm has been used with success in
numerous applications, it has turned out that the scheme termed the
Batch Map produces essentially similar results but an order of
magnitude faster. The basic idea is that for every node
in the
grid, the average
of all those input items
is
formed that have
as the closest model. After that, new models
are computed as
where
is the number of the input items mapped
into the node
, and the index
runs over the nodes in the
neighborhood of node
. For the updated
, this scheme is
iterated for a few times, always using the same batch of input data
items to determine the updated
.
The mathematical theory of the SOM is very complicated and only the one-dimensional
case has been analyzed completely (Fort 2006). For a modified adaptation rule, in which
the matching is averaged over
, can the SOM be derived from a specific
potential function (Heskes 2001). Apparently, the SOM belongs to the "ill posed"
problems in mathematics.
Useful instructions when applying the SOM
It is advisable to pay attention at least to the following recommendations in order that the resulting mappings are stable, well oriented, and topologically correct.
Form of the array: A hexagonal grid of nodes is to be preferred for
visual inspection. The edges of the array ought to be rather
rectangular than square, because the "elastic network" formed of the model vectors
must be oriented along with the distribution
of the input
in the signal space.
On the other hand, since the
have to approximate to the distribution
of the
, it is further desirable that
the width and the height of the array at least roughly
correspond to the major dimensions (e.g. the [[principal
components]]) of the distribution
of
.
Scaling of the vector components is a very subtle problem. One may easily realize that the orientation, or ordered "regression" of the model vectors in the input space depends on the scaling of the components (or dimensions) of the input data vectors. Especially if the data elements represent variables of different kinds and are therefore given in different scales, there does not exist any simple rule to determine what kind of rescaling is best before entering the training data to the learning algorithm. One may try heuristically justifiable rescalings and check the quality of the resulting maps, e.g,, by means of Sammon's mapping (Sammon 1969) or the average quantization error. Often the normalization of all input variables such that, e.g., their variances become equal, is a useful strategy.
Learning with a small number of available training samples: In some problems, e.g., when making maps for items defined by a finite set of attributes, one does not have a large set of training samples. However, for a good statistical accuracy, the learning process may require a rather large number of training steps, and if the number of available samples is much smaller, the samples can be used reiteratively in training. The samples may be applied cyclically or in a randomly permuted order, or picked up at random from the basic set (called the bootstrap learning). It has turned out in practice that ordered cyclic application of the available samples is not noticeably worse than the other, statistically stringent sampling methods.
Quality of learning: Different learning processes will be
obtained when starting with different initial values
, and applying
different sequences of the training vectors
and different
learning parameters.
Of course it would be desirable for the resulting map
to always be the least ambiguous one.
It is obvious that some optimal map for the same
input data must exist.
When comparing maps with the same array structure and definition
of
, the mean of
(quantization error)
over the training data is a
useful performance index. Therefore, an appreciable number (say,
several dozen) of random initializations of the
may be
tried, and the map with the least quantization error selected.
Notice, however, that there would be no sense in comparing quantization
errors for maps with different structures or
, because, e.g., it is a trivial fact
that the error is minimum if
(Kronecker delta).
When using this singular kernel, however, there is no self-organizing power left, because the algorithm will be reduced to the classical vector quantization.
Software packages
Data analysis, clustering and visualization by the SOM can be done using either public domain, commercial, or self-coded software. The use of self-coded software is not encouraged as there are many subtle aspects that need to be taken into account and which affect the convergence and accuracy of the algorithm.
The SOM_PAK is a collection of tools for the correct application of the SOM. This original program package was created by the SOM Programming Team of the Helsinki University of Technology especially for very large problems and it is available for download at http://www.cis.hut.fi/research/som_lvq_pak.shtml.
The SOM Toolbox is a more flexible, general-purpose software library for MATLAB implementation of the SOM algorithm, and it uses the MATLAB graphics. The SOM Toolbox software package is available for download at http://www.cis.hut.fi/projects/somtoolbox/.
Although there exist many commercial implementations of the SOM for specific applications, it must be cautioned that in many of them there are fatal flaws, because all of the above precautions have not been taken into account.
Extensions of SOM
Numerous variants of the basic SOM have been suggested (cf. Kaski et al. 1998, Oja et al. 2003, and Pöllä et al. 2007). They are too many to be reviewed here. The main alternatives for the definition of the SOM models are the Generative Topographic Map (Bishop et al. 1998) and the information-based computation of the SOM models (Van Hulle 2000).
The structure of the SOM array can be made adaptive ("growing maps", Fritzke 1991), or alternatively the neighborhood relations can be defined in the input space (Martinetz et al. 1993).
The SOM principle can be modified to analyze subspace relations (Kohonen et al. 1997) or dynamic patterns (Hammer et al. 2004).
In its basic form, the SOM described in this article, as contrasted with its many variants, is in a special position, because it continues the classical methods such as principal component analysis (Hotelling 1933), multidimensional scaling (Kruskal and Wish 1978), and directly visualizes the clustering results as an easily comprehensible geometric display.
References
Amari, S. (1980). Topographic organization of nerve fields. Bulletin of Mathematical Biology, 42:339-364.
Bishop, C.M., Svensén, M. and Williams, C.K.I. (1998). GTM: The Generative Topographic Mapping. Neural Computation 10(1): 215-234.
Fort, J.C. (2006). SOM’s mathematics. Neural Networks, 19: 812–816.
Fritzke, B. (1991). Let it grow - self-organizing feature maps with problem dependent cell structure.Proceedings of ICANN 1991, North-Holland, Amsterdam, pp. 403-308.
Hammer, B., Micheli, A., Sperduti, A., Strickert, M. (2004). Recursive self-organizing network models. Neural Networks, 17(8-9):1061-1086.
Heskes, T. (2001). Self-organizing maps, vector quantization, and mixture modeling. IEEE Transactions on Neural Networks, 12:1299-1305.
Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24: 417-441, 498-520.
Kaski, S., Kangas, J., and Kohonen, T. (1998). Bibliography of self-organizing map (SOM) papers: 1981-1997. Neural Computing Surveys, 1: 102-350.
Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43:59-69.
Kohonen T. (1984). Self-Organization and Associative Memory. Springer, Berlin.
Kohonen, T. (2001). Self-Organizing Maps. Third, extended edition. Springer.
Kohonen, T., Kaski, S. and Lappalainen, H. (1997). Self-organized formation of various invariant-feature filters in the adaptive-subspace SOM. Neural Computation, 9: 1321-1344.
Kohonen, T. and Somervuo, P. (2002). How to make large self-organizing maps for nonvectorial data. Neural Networks 15(8-9): 945-952.
Kruskal, J.B. and Wish, M. (1978). Multidimensional Scaling. Sage Publications, Beverly Hills, CA.
von der Malsburg, C. (1973). Self-organization of orientation sensitive cells in the striate cortex. Kybernetik, 14:85-100.
Martinetz, T.M., Berkovich, S.G. and Schulten, K. (1993). "Neural gas" for vector quantization and its application to time-series prediction. IEEE Transactions on Neural Networks, 4: 558-569.
Oja, M., Kaski, S. and Kohonen, T. (2003). Bibliography of Self-Organizing Map (SOM) Papers: 1998-2001 Addendum. Neural Computing Surveys, 3: 1-156.
Pöllä, M., Honkela, T. and Kohonen, T. (2007). Bibliography of Self-Organizing Map (SOM) Papers: 2002-2005 Addendum. Neural Computing Surveys, forthcoming.
Sammon, J. (1969). A nonlinear mapping for data structure analysis. IEEE Trans. Comp., 18(5):401-409.
Van Hulle, M.M. (2000). Faithful Representations and Topographic Maps: From Distortion- to Information-Based Self-Organization. John Wiley & Sons, New York.
Internal references
- Valentino Braitenberg (2007) Brain. Scholarpedia, 2(11):2918.
- Rob Schreiber (2007) MATLAB. Scholarpedia, 2(7):2929.
External links
See Also
Adaptive Resonance Theory, Associative Memory, Bayesian Learning, Neural Networks, Self Organization
| Teuvo Kohonen, Timo Honkela (2007) Kohonen network. Scholarpedia, 2(1):1568, (go to the first approved version) Created: 18 May 2006, reviewed: 18 January 2007, accepted: 18 January 2007 |
| Reviewer A: | Dr. Jeff McKinstry, The Neurosciences Institute, San Diego, CA |




