Rival penalized competitive learning
From Scholarpedia
| Lei Xu (2007), Scholarpedia, 2(8):1810. | revision #37114 [link to/cite this article] | |||||||||||||||||||
Curator: Dr. Lei Xu, Dept. Computer Science & Engineering, Chinese University of Hong Kong
Rival penalized competitive learning (RPCL) is a development of competitive learning in help of an appropriate balance between participating and leaving mechanisms, such that an appropriate number of agents or learners will be allocated to learn multiple structures underlying observations. Moreover, RPCL has been further extended into a general divide-conquer learning framework for multi-agents to divide and conquer a complicated task.
Classical competitive learning: from WTA to FSCL
Proposed decades ago, classical competitive learning (CCL) is usually used for making data clustering by a number of units or agents. Each agent is featured by a parametric point
that moves among the observation samples and seeks to be located at the center of a cluster of samples, such that several points collectively perform clustering analysis. As a sample
comes, each
evaluates a degree of its suitability on representing
by an individual value or criterion
, and competes to be allocated to represent
. The competition is guided globally by the winner-take-all (WTA) policy
- (1)
Then, each
is updated to further reduce
by
- (2)
When
are initialized as in Fig.1(a), as a sample
comes,
becomes the winner and then moves a little bit towards
. Next, as another sample
comes in Fig.1(b),
becomes the winner and then moves similarly. Finally,
and
each converge to one clusters' center as in Fig.1(c).
However, if
are initialized as in the top part of Fig.1(d),
always becomes the winner and finally moves the middle point between two clusters, while
never wins and becomes dead. This is known as the under-utilized or dead unit problem, featured by a WTA based competitive learning (Rumelhart et al, 1985; Grossberg, 1978; Hecht-Nielsen, 1987), which makes the task of clustering analysis badly performed. This problem comes from unequal chances for the dead agents to participate competition due to their unfavorable initial locations. Thus, it is desired that every agent should be initialized to have an equal chance to join competition, which is actually difficult to implement without a priori knowledge on the entire clustering structure, while such a knowledge is exactly the goal of this competitive learning.
One way to tackle this difficulty is a so called frequency sensitive competitive learning (FSCL) (Ahalt, et al, 1990), in help of modifying
into
- (3)
where
is the frequency that the
-th agent won in past. Illustrated in Fig.2 is an example of four clusters, the CCL resulted in three dead units in Fig.3, while the FSCL makes all the four units placed to the cluster's centers instead of becoming dead, as shown in Fig.4.
Such a sprit of reducing the winning rates of frequent winners can also be implemented in alternative ways, e.g., conscience (Desieno, 1988), leaky learning (Rumelhart et al, 1985; Grossberg, 1978), convex bridge (Hecht-Nielsen, 1987), and neighbor sharing in Kohonen network (Kohonen,1982, 1995).
Rival penalized competitive learning
The way of reducing the winning rates works when the number
of clusters is known. Unfortunately, it is usually unknown. Not only it is no doubt that both CCL and FSCL will result in bad performances when
is given a smaller number, but also FSCL fails when
is pre-assigned to a value larger than an appropriate one. As illustrated in Fig.5, the frequency sensitive mechanism will bring the extra agents into data to disturb the correct locations of other agents. Thus, how to determine
is a critical difficulty too.
One solution is provided under the name of Rival Penalized Competitive Learning (RPCL), proposed in the early 1990's for clustering analysis and detecting lines in an image (Xu, Krzyzak, & Oja, 1992 & 93). As illustrated in Fig.6, not only the winner is learned but also its rival (i.e., the second winner) is repelled a little bit from the sample to reduce a duplicated information allocation. As a sample
comes,
is the winner and
is its rival in Fig.6(a),
is moved towards
by a small step size, while
is repelled away from
by a much smaller step size. Similarly in Fig.6(b),
is the winner and moved towards
, while
is its rival and repelled away from
.
More precisely, the RPCL differs from FSCL in that eq.(2) is implemented with eq.(1) replaced by
- (4)
where
approximately takes a number between
and
for controlling the penalizing strength. This is an
empirical heuristics, while there still lacks a theoretical guide on how to
determine an appropriate
, which will be further discussed
at the end of this paper.
As illustrated in Fig.6(c),
and
finally converge to clusters' centers, while
is driven far away from samples. Shown in Fig.7 is an experimental result that improves its FSCL counterpart in Fig.5, with a disturbing agent driven away from samples. Adding this rival penalized mechanism makes extra agents driven far away from data as long as the number of agents is initially given to be larger than the number of clusters to be detected. In other words, RPCL can allocate an appropriate number of learning agents to detect a correct number of clusters or objects automatically during learning. This automatic allocation is a favorable feature that both CCL and FSCL as well as the conventional clustering algorithms (e.g., k-means) do not have. In the past decade, a number of applications of RPCL have been made (Acciani, et al, 2003; Nair, 2003; Nakkrasae & Sophatsathit, 2004).
A general framework for divide-conquer learning
CCL, FSCL, and RPCL can be investigated systemically from the perspective of a general framework for learning based problem solving. That is, multi-agents coordinately divide a complicated task into a number of subtasks and each subtask is mainly conquered by an individual agent such that the entire task can be well solved collectively by these multi-agents (Xu, 2007). In the following, we introduce the four basic ingredients of this general framework.
- Agent's structure and initialization Each agent needs a hardware
that is able to handle typical problem solving tasks. This
is usually featured by a pre-designed structure with a set
of parameters to be determined via learning. The simplest example is that
is given by a point
. Another example is its probabilistic extension that
is given by a Gaussian distribution
with a priori probability
(Xu, 1998). Even further,
can be extended to any parametric distribution
. At the beginning, it is desired that every
is initialized such that every agent has an equal chance to enter learning. However, it is usually difficult to do so without an enough piece of a priori knowledge.
- Individual criterion As an extension of the previous
, each agent has an individual value
to evaluate its performance per sample
observed. Typically, this performance is measured by considering an error
- (5)
where
is a reconstruction or a representation of
by
. One typical instance of such a measure is simply the least square error
. For a simple case
, we have
and
. Another type of instances is given in the probabilistic forms (Xu, 1998)
- (6)
where
includes
or equivalently
as a special case. Type II can be regarded as a further extension of eq.(3) since it returns equivalently to eq.(3) when
. Given an equal priori
, both Type I and Type II degenerate to be equivalent to a negative likelihood
. Other criteria are referred to (Xu, 2007).
- Global principle Extending eq.(1), agents are coordinated via a global principle that combines individual criteria
. The global principle can be observed in two forms. One is a criterion for evaluating the overall performance jointly by all the agents, e.g., a simple one is the following weighted sum
- (7)
The other form is an allocation scheme on a per sample basis
- (8)
according to which
is assigned to the j-th agent with a weight
.
- Updating rule As a further extension of eq.(2), the activated agents are modified to adapt
according to the allocation by
, e.g.,
- (9)
where
is a learning step size, and
is the domain of
, within which
may need to satisfy certain constraints. Moreover,
denotes an operator that acts on
and
to yield a direction along which
reduces within a neighborhood region of
. When
is differentiable,
can be given by
or
on
, where
is a positive definite matrix. Different types of
and different ways of controlling
will jointly lead to updating rules with difference convergence performances.
In this framework, a conquering action is performed via a updating rule by eq.(9), while a dividing action is performed via an allocation scheme by eq.(8). In addition to the problem of parameter learning, i.e., estimating unknown parameters in all the agents, it also needs to select an appropriate
of agents to be allocated, such that not only each agent performs best according to its individual criterion but also a best overall performance is achieved according to its global principle. In the literatures of statistics and machine learning, the task of determining the number
belongs to what is usually called model selection.
In past decades, several typical model selection theories have been proposed for model selection, such as AIC and extensions (Akaike, 1974; Bozdogan & Ramirez, 1988; Cavanaugh, 1997), Bayesian approach related criteria under the names of BIC (Schwarz, 1978), MML (Wallace, 1999), and MDL (Rissanen, 1986), the cross validation based criteria (Stone, 1978; Rivals & Personnaz, 1999). In implementation, model selection has to be made in a two-phase procedure. At the first phase, a number of candidate models are enumerated, and the unknown parameters in each candidate are estimated by the maximum likelihood (ML) principle. At the second phase, a criterion is obtained from one of the above theories and used on every candidate with estimated parameters to get the best candidate. Unfortunately, a two-phase implementation is very expensive to compute. Moreover, these theories can only provide a rough estimation or even a wrong estimate especially on a small size of samples.
Automatic model selection via coordinated mechanisms
As illustrated in Fig.8(a), in addition to allocating agents and conquering each subtask individually via determining all unknown parameters in every agent, an appropriate combination of a global principle and individual criteria actually results in automatic model selection during parameter learning, due to a coordination of two opposite mechanisms, namely one for each agent to participate and the other for each agent to leave.
The WTA allocation by eq.(1) is obtained via minimizing the global criterion by eq.(7) with respect to
under the constraints
. It leads to the previous CCL with a local criterion
. As mentioned early, it has the dead unit problem because agents do not have an equal chance to participate. This problem can be solved via a participating mechanism that lets each agent to have an equal initial chance to participate, through either or both of its local criterion and the global criterion.
For the FSCL that bases still on the WTA allocation by eq.(1), its local criterion by eq.(3) provides a participating mechanism that penalizes those frequent winners and compensates those constant losers. Alternatively, we can still use
but minimize the global criterion by eq.(7) under the following constraint
- (10)
which leads to the following soft-allocation:
- (11)
where every agent in
gets a chance
to participate per sample coming.
However, all the above cases are lack of a mechanism for an incapable or extra agent to leave. As a result, they will still fail when the number of agents in consideration is larger than an appropriate one. As previously introduced, RPCL improve FSCL via penalizing a rival by eq.(4), which provides a mechanism to drive an incapable or extra agent away. An appropriate balance between this leaving mechanism and the participating mechanism, as
illustrated in Fig.8(a), leads to a favorable RPCL nature of automatic model selection, with the balance controlled by the de-learning rate
in eq.( 4).
Instead of providing a participating mechanism via a local criterion by eq.(3), a balance between the mechanisms for participating and leaving can also be obtained by the following generalized RPCL allocating scheme:
- (12)
where
provides a sharing allocation similar to that by eq.(11), while
provides a penalizing scheme that distributes a penalizing strength
over agents. One extreme case is eq.(4) with
by eq.(1) and
at
and
for
. One another extreme case
.
Alternatively, we can get a leaving mechanism via an appropriate local criterion too, e.g., by eq.(6) instead of using a global allocating policy such as either of eq.(4) and eq.(12). Without losing generality, still using the WTA by eq.(1), we consider Type I in eq.(6) with a special Gaussian
. It follows from that
varies with
as shown in Fig.8(b), where
is the dimension of
. If the
-th agent wins a lot of samples,
will become larger than a threshold, not only
will increase but also the gap between different agents reduces. In other words, the competing ability of agents that won too much in past is self-penalized and the competing ability of weaker agents is compensated for a relative boosting, which thus provides a participating mechanism similar to
in eq.(3). This similarity can also be observed from the term
that
takes a role similar to
in eq.(3).
Also, when the
-th agent won too fewer samples with its
dropping below certain threshold, not only
will increase but also the corresponding gap between different agents increases. As a result, an incapable/extra agent is self-penalized to fade out, which also provides a leaving mechanism that is different from the RPCL one by eq.(4). Moreover, as the competing ability of the
-th agent decreases,
decreases and the other
will relatively increase. That is, the term
enhances this self-penalizing featured leaving mechanism. As a result, the local criterion intentionally seeks an appropriate balance between its participating mechanism and leaving mechanism, with an automatic model selection nature.
For Type II in eq.(6), we have
that varies with
in a way similar to Fig.8(b). Thus, the above discussion applies similarly. The difference is that
takes a role similar to that in eq.(3), which enhances a participating mechanism instead of a leaving mechanism.
Categorization on Gaussian mixture
Since the coordinating nature between a participating mechanism and a leaving mechanism is actually featured via a combination of a specific individual criterion
and a specific instance of global allocating scheme by eq.(8), here we categorize different combinations to get an overlook on algorithms that fall in the proposed divide-conquer learning framework. Several typical examples on Gaussian mixture are listed in Fig.9.
It follows from the first row with
by eq.(1) that varying
leads us to CCL, FSCL, and an extension from CCL to a special case of Bayesian Ying Yang Learning or shortly BYY-CL. Then, we go to the third row with
by eq.(11) at
. The block (3,1) is a typical example of Type I in eq.(6), with
by eq.( 11) becoming the Bayes posterior allocation:
- (13)
From
, we observe that the updating by eq.(9) is equivalent to making an ascent updating of the likelihood on
. That is, the block (3,3) leads to an adaptive version of the well known EM algorithm for the maximum likelihood (ML) learning on Gaussian mixture with
(Redner & Walker, 1984; Xu & Jordan, 1996). Solving the previously mentioned dead unit problem due to
by eq.(1),
by eq.(13) enhances the participating mechanism of each agent, which is along a direction similar to that of FSCL. Still,
needs to be pre-specified since it is well known that the ML learning is not good at model selection, especially on a small size of samples. Actually, the participating mechanism of each agent may be over-enhanced such that the leaving mechanism discussed in Fig.8(b) fails to balance. Moreover, the block (3,1) is equivalent to its degenerated case at
and
, where the leaving mechanism in Fig.8(b) is switched off completely. Alternatively, the block (3,2) is equivalent to a degenerated case of Type II in eq.(6) at
, which modifies FSCL with a further enhancement on each agent's participating mechanism, and thus may result in a further imbalance over leaving mechanism.
Lessening the chance of introducing an excess enhancement on each agent's participating mechanism, the second row enhances the participating mechanism by eq.(11) on a subset of agents per sample comes. Being
maximum posteriors, the sharing allocation by eq.(11) with
is thus referred as
-map sharing shortly. As a result, the second row behaves between those of the first row and the third row. For the last two blocks or particularly the block (2,3), with an appropriate
value the leaving mechanism in Fig.8(b) and the participating mechanism via
by eq.(11) may reach a good balance with an automatic selection on
.
On the other hand, the 4th row also introduces a leaving mechanism via penalizing the rival via
by eq.(4). The block (4,2) is exactly the previously discussed RPCL, while the block (4, 1) switches off the frequent sensitive mechanism and the block (4, 3) is added with the combined mechanisms in Fig.8(b). Moreover, the 4th row is further extended into the 5th row via the generalized RPCL scheme by eq.(12).
Finally, it is noted that not only the block (1,3) and block (2,3) but also the RPCL one by eq.(4) and its extension by eq.(12) (i.e., the block (3,3) and block (4,3)) can be obtained from the Bayesian Ying Yang Learning. Moreover, the criterion by eq.(7) can also be derived from one of its special cases (Xu, 2006).
Mixture of experts, RBF networks, and object detection
RPCL is applicable to tasks that are featured by learning multiple structures underlying samples or detecting multiple objects among observations. Typical examples include not only a finite mixture (particularly Gaussian mixture) based density estimation and clustering analysis but also those mixture-of-experts or RBF nets based supervised learning, as well as learning based detection of multiple objects on an image, which are briefly introduced as follows.
- Mixture-of-experts (ME). It targets at learning an input-output map
or its probabilistic version
from a set of pairs
, which is usually in a form
that represents a regression function
under a Gaussian noise with a variance
. A gating net
will coordinate experts to perform function approximation type tasks (e.g., prediction, control, ..., etc) via
- (14)
which has been widely studied and applied. The original ME model (Jacobs, et al, 1991; Jordan & Xu, 1995) is featured by the maximum likelihood (ML) learning on
, but its gating model
makes the ML learning difficult to be implemented by the EM algorithm. An alternative ME model (Xu, Jordan, & Hinton, 1995) has been proposed with the ML learning easily implemented by an EM algorithm, in help of the following gating model:
- (15)
Also, RPCL can be applied to this alternative ME further with the number of experts determined automatically during learning, simply in help of implementing eq.(9) with the following measure:
- (16)
- RBF nets. In particular, when
, the alternative ME actually becomes an extension of a typical family of RBF nets, which is simplified into an extended normalized RBF net at the following special case
- (17)
which further degenerates into a usual normalized RBF net (Moody, & Darken, 1989), simply by setting
. In other words, an adaptive EM algorithm for the ML learning can be directly applied to replace the conventional but suboptimal two stage approach for RBF net learning. Moreover, with the constraint
removed, we can implement RPCL by eq.(9) and eq.(14), further with the number of basis functions determined automatically during learning.
- Object detection. This is a classic task in the field of pattern recognition and computer vision. Shown in Fig.6 is a comparative experiment on detecting eleven elliptic blood cells on a real image. After edge detection, the classical Hough transform (HT) (Hough, 1962), the Randomized Hough transform (RHT) (Xu, Oja, & Kultanen, 1991; Xu & Oja, 1993), and RPCL based object detection are used on the resulted edge image to detect ellipses. It can be seen that the RPCL approach outperforms RHT significantly though the newly developed RHT has significantly outperformed HT already (Liu, Qiao, & Xu, 2006). The approaches have also been extended to detect objects in a shape without a parametric expression but via a template in a pre-designed specific shape. The details for implementing RPCL based object detection are referred to (Xu, 2007).
Open problems
The RPCL learning was heuristically proposed on a bottom level (i.e., a level of learning dynamics or updating mechanisms), while Bayesian Ying-Yang learning is proposed and developed on a global level of guiding parameter learning and model selection by a learning principle or theory in a top-down manner. Both approaches are featured by automatic model selection ability during parameter learning. Recently, certain links between the two approaches have been built up (Xu, 2001 &02; Ma, Wang, & Xu, 2004), with convergence behavior of RPCL is interpreted qualitatively in a top-down way by the BYY harmony learning. However, challenges still remain at least in two aspects.
One is on theoretical analysis of RPCL itself. A classic competing process is already a highly nonlinear process. How to balance the participating mechanism and leaving mechanism such that an appropriate
can be selected is a key challenge, with two open problems as follows:
- Though many empirical studies have been made on RPCL, little progress has been achieved on its mathematical convergence analysis. Letting the learning step size
in eq.(9) to gradually reduce to zero in some way, e.g., according to (Robbins & Monro, 1951), the learning process will eventually freeze. However, there lacks theoretical analysis on its correct selection ability (i.e., whether a correct number of agents can be allocated) and the convergence (i.e., whether each agent will converge to a good representation of an individual substructure or object), as well as the convergence rate, ..., etc.
- Actually, the above problems relate to closely the problem of how to control the penalizing strength
in eq.(4). It has no penalizing effect if
is too small, while the learning will oscillate or diverge if
is too larger. Currently,
is selected heuristically by experiments. It may be related to the dimension of data space, the number of substructures, the distributions of data samples, etc, but little theoretical results are available yet.
The other is on choices listed in Fig.9. Two interesting topics are as follows:
- Further comparative studies on features of eac item, especially RPCL, BYY-RPCL, BYY-CL, as well as those in the second row.
- How the
affect the performances ? how to select an appropriate value for
?
Moreover, it also deserves to make a further comparative study on Type I and Type I learning in eqn.(6).
External Links
Author's website (http://www.cse.cuhk.edu.hk/~lxu/)
References
- Acciani, G., et al (2003), "A feature extraction unsupervised neural network for an environmental data set", Neural Networks 16: 427-436.
- Ahalt, SC., et al (1990), "Competitive learning algorithms for vector quantization", Neural Networks 277-291.
- Akaike, H (1974), "A new look at the statistical model identification", "IEEE Tr. Automatic Control" 19: 714-723.
- Bozdogan, H & Ramirez, DE (1988), "FACAIC: Model selection algorithm for the orthogonal factor model using AIC and FACAIC", "Psychometrika" { 53} (3): 407-415.
- Cavanaugh, JE (1997), "Unifying the derivations for the Akaike and corrected Akaike information criteria", "Statistics & Probability Letters ", {33}: 201-208.
- Desieno, D (1988), "Adding a conscience to competitive learning", Proc. of IEEE int. conf. on Neural Networks, Vol.I, pp.117-124.
- Grossberg, S (1987), "Competitive learning: from iterative activation to adaptive resonance", Cognitive Science, 11: 23-63.
- Hecht-Nielsen, R (1987), "Counterpropagation networks", Applied Optics 26: 4979-4984.
- Hough, PVC "Method and means for recognizing complex patterns", U.S. Patent 3069654, Dec.18, 1962.
- Jacobs, RA., et al (1991), "Adaptive mixtures of local experts", Neural Computation 3: 79-87.
- Jordan, MI & Xu, L (1995), "Convergence results for the EM approach to mixtures of experts", Neural Networks 8: 1409-1431.
- Kohonen, T (1982), "Self-organized formation of topologically correct feature maps", Biological Cybernetics 43: 59-69.
- Liu, ZY, Qiao, H, & Xu, L (2006), "Multisets Mixture learning based Ellipse Detection", Pattern Recognition 39: 731-735.
- Ma, J, Wang, T, & Xu, L (2004), "A gradient BYY harmony learning rule on Gaussian mixture with automated model selection", Neurocomputing 56: 481-487.
- Moody, J & Darken, J (1989) "Fast learning in networks of locally-tuned processing units", Neural Computation 1: 281-294.
- Nair, TM., et al, (2003) "Rival penalized competitive learning (RPCL): a topology-determining algorithm for analyzing gene expression data", Computational Biology and Chemistry 27: 565-574.
- Nakkrasae, S, & Sophatsathit, P (2004), "An rpcl-based indexing approach for software component classification" International J. of Software Engineering and Knowledge Engineering 14: 497-518.
- Redner, RA & Walker, HF (1984), "Mixture densities, maximum likelihood, and the EM algorithm", SIAM Review 26: 195-239.
- Rissanen, J (1986), "Stochastic complexity and modeling"," Annals of Statistics", 14 (3): 1080-1100.
- Rivals, I & Personnaz, L (1999) "On Cross Validation for Model Selection", " Neural Computation" 11: 863-870.
- Robbins, H., & Monro, S., (1951), "A stochastic approximation method", Ann. Math. Statist., 22, 400-407.
- Rumelhart, DE & Zipser, D (1985), "Feature discovery by competitive learning", Cognitive Science 9: 75-112.
- Schwarz, G (1978), "Estimating the dimension of a model", "Annals of Statistics" 6: 461-464.
- Stone, M (1978), "Cross-validation: A review", "Math. Operat. Statist. " 9: 127-140.
- Xu, L (2007), "A Unified Perspective and New Results on RHT Computing, Mixture Based Learning, and Multi-learner Based Problem Solving", (in press), " Pattern Recognition", 40: 2129-2153.
- Xu, L (2007), "A Trend on Regularization and Model Selection in Statistical Learning: A Perspective from Bayesian Ying Yang Learning", an invited chapter in "Studies in Computational Intelligence 63: Challenges to Computational Intelligence", Duch, W. and Mandziuk, J., 365-405.
- Xu, L (2002), "BYY Harmony Learning, Structural RPCL, and Topological Self-Organizing on Unsupervised and Supervised Mixture Models", Neural Networks 15: 1125-1151.
- Xu, L (2001), "Best Harmony, Unified RPCL and Automated Model Selection for Unsupervised and Supervised Learning on Gaussian Mixtures, ME-RBF Models and Three-Layer Nets", International J. of Neural Systems 11: 3-69.
- Xu, L (1998), "Rival Penalized Competitive Learning, Finite Mixture, and Multisets Clustering", Proc. of IJCNN98, Anchorage, Vol.II, pp.2525-2530.
- Xu, L & Jordan, MI (1996), "On convergence properties of the EM algorithm for Gaussian mixtures", Neural Computation 8(1): 129-151.
- Xu, L, Jordan, MI, & Hinton, GE (1995), "An Alternative Model for Mixtures of Experts", in Cowan, Tesauro, and Alspector, eds., Advances in Neural Information Processing Systems 7, MIT Press, 633-640.
- Xu, L, Krzyzak, A, & Oja, E (1993), "Rival Penalized Competitive Learning for Clustering Analysis, RBF net and Curve Detection", IEEE Tr. on Neural Networks 4: 636-649.
- Xu, L, & Oja, E (1993), "Randomized Hough Transform (RHT): Basic Mechanisms, Algorithms and Complexities", Computer Vision, Graphics, and Image Processing: Image Understanding 57: 131-154.
- Xu, L, Krzyzak, A & Oja, E (1992), "Unsupervised and Supervised Classifications by Rival Penalized Competitive Learning", Proc. of 11th Intl. Conf. on Pattern Recognition, Hauge, Netherlands, Aug.30-Sept.3, 1992, Vol.I, pp.672-675.
- Xu, L, Oja, E & Kultanen, P (1990), "A New Curve Detection Method: Randomized Hough transform (RHT)", Pattern Recognition Letters 11: 331-338.
Internal references
- Lei Xu (2007) Bayesian Ying Yang learning. Scholarpedia, 2(3):1809.
- Teuvo Kohonen and Timo Honkela (2007) Kohonen network. Scholarpedia, 2(1):1568.
- Jeff Moehlis, Kresimir Josic, Eric T. Shea-Brown (2006) Periodic orbit. Scholarpedia, 1(7):1358.
See also
- Bayesian Ying Yang Learning
- Unsupervised Learning
- Data Clustering
- Competitive Learning
- Winner-Take-All
- Kohonen Network
- Mixtures of Experts
- Gaussian mixture
- Radial Basis Function Networks
| Lei Xu (2007) Rival penalized competitive learning. Scholarpedia, 2(8):1810, (go to the first approved version) Created: 31 July 2006, reviewed: 17 August 2007, accepted: 31 August 2007 |
| Action editor: | Dr. Eugene M. Izhikevich, Editor-in-Chief of Scholarpedia, the free peer reviewed encyclopedia |




