Rival penalized competitive learning/APPENDIX
(a) Subspace based functions
In many practical settings, there is only a finite set of training samples, distributed in low-dimensional subspaces rather than scattered over all the dimensions of the observation space. Such subspace structures cannot be well captured by a basis \( \exp[-0.5(x-m_j)^T\Sigma_j^{-1} (x-m_j)]\) supported on the entire space of \(x\ .\) There are too many free parameters in \(\Sigma_j\ ,\) which usually leads to poor performance. Instead, we consider a basis on a subspace as shown in (a), where observed samples are regarded as generated from a subspace with independent factors distributed along each coordinate of an \(m_{\ell}\)-dimensional inner representation \(y\ .\)
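A minimal sketch of this subspace generative model, assuming (as a hypothetical concrete choice) a diagonal \(\Lambda\) of factor variances and a small isotropic noise \(\Sigma\): a sample \(x\) is produced from the inner representation \(y\) via the subspace basis \(A\).

```python
# Sketch (hypothetical parameter values): sampling one observation x from
# an m_ell-dimensional inner representation y with independent factors,
# i.e. x = A y + m + e with y ~ G(y|0, Lam) and e ~ G(e|0, Sig).
import numpy as np

rng = np.random.default_rng(0)
d, m_ell = 5, 2                      # observation and subspace dimensions

A = rng.standard_normal((d, m_ell))  # columns of A span the subspace
m = rng.standard_normal(d)           # mean of the basis
Lam = np.diag([2.0, 0.5])            # independent factor variances (Lambda)
Sig = 0.1 * np.eye(d)                # small isotropic observation noise (Sigma)

y = rng.multivariate_normal(np.zeros(m_ell), Lam)   # inner representation
e = rng.multivariate_normal(np.zeros(d), Sig)       # noise off the subspace
x = A @ y + m + e                                   # observed sample

print(x.shape)   # (5,)
```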
As shown in , we may replace \( G(x|m_j,\Sigma_j)\) in eq() and eq() by \( G(x|m_j,A_j\Lambda_jA_j^T+\Sigma_j)\ ,\) which considers \(x\) as generated from a lower dimensional subspace spanned by the columns of \(A_j\ ,\) while the mapping to \(z\) described by \(q(z|x,y,\ell)\) is also based on this subspace. Specifically, there are two typical choices:
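The structured covariance can be sketched numerically as follows (hypothetical values, with \(\Sigma_j\) taken isotropic for illustration): evaluating \(\log G(x|m_j, A_j\Lambda_jA_j^T+\Sigma_j)\) needs only \(d\,m_\ell + m_\ell + 1\) covariance parameters instead of the \(d(d+1)/2\) of a free \(\Sigma_j\).

```python
# Sketch: log-density of a Gaussian with the subspace-structured covariance
# C = A Lam A^T + Sig, which has far fewer free parameters than a full Sigma.
import numpy as np

def gauss_logpdf(x, m, C):
    """log G(x | m, C) for a full-rank covariance C."""
    d = len(m)
    sign, logdet = np.linalg.slogdet(C)
    r = x - m
    return -0.5 * (d * np.log(2 * np.pi) + logdet + r @ np.linalg.solve(C, r))

rng = np.random.default_rng(1)
d, m_ell = 6, 2
A = rng.standard_normal((d, m_ell))  # subspace basis A_j
Lam = np.diag([1.5, 0.7])            # factor variances Lambda_j
Sig = 0.2 * np.eye(d)                # isotropic noise Sigma_j (an assumption)
m = np.zeros(d)
x = rng.standard_normal(d)

C = A @ Lam @ A.T + Sig              # structured covariance
print(gauss_logpdf(x, m, C))
```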
- Type A is indicated by \(i_Z=0\ .\) It corresponds to the previous ME by eq() and RBF networks by eq() with \(f_j(x,\phi_j)= W_jx+c_j \) for \(x \to z\) directly, while the gating net in eq() and the basis function in eq() are supported on the subspace of \(y\) instead of the original space of \(x\ .\)
- Type B is indicated by \(i_Z=1\ .\) It performs a mapping \(y \to z\) from the lower dimensional subspace, so we seek a mapping \(x \to y\) to obtain a cascade mapping \(x \to y \to z\ .\) From the two Gaussians \(G(y|0, \Lambda_j)\) and \(G(x|A_jy+m_j, \Sigma_j)\ ,\) a choice for \(x \to y \) is their posterior inverse in a Bayesian sense, i.e., the Gaussian \(G(y|U(x-m_j), \Pi_j^{-1})\) proportional to \(G(x|A_jy+m_j, \Sigma_j)G(y|0, \Lambda_j)\ ,\) from which we get \(x \to z\) by a Gaussian \(G(z|f_j(x,\phi_j), \Gamma_j)\) in eq() with \(f_j(x,\phi_j)= W_jU(x-m_j)+c_j \ .\) Putting them into eq(), learning is made by those algorithms in (b) again.
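A Type A forward pass might be sketched as below (hypothetical parameter values and helper names): each expert maps \(x \to z\) directly by \(f_j(x,\phi_j)=W_jx+c_j\), while the gating weights are computed from the subspace-based Gaussians \(G(x|m_j, A_j\Lambda_jA_j^T+\Sigma_j)\) rather than full-covariance Gaussians on \(x\).

```python
# Sketch (Type A, i_Z = 0): experts map x -> z directly, gate uses
# subspace-structured Gaussians. Parameter values are illustrative only.
import numpy as np

def gate(x, params):
    """Normalized gating weights from subspace-based Gaussians."""
    logs = []
    for (m, A, Lam, Sig, W, c) in params:
        C = A @ Lam @ A.T + Sig
        sign, logdet = np.linalg.slogdet(C)
        r = x - m
        logs.append(-0.5 * (logdet + r @ np.linalg.solve(C, r)))
    logs = np.array(logs)
    w = np.exp(logs - logs.max())      # subtract max for numerical stability
    return w / w.sum()

def forward(x, params):
    """Type A output: z = sum_j p_j(x) (W_j x + c_j)."""
    p = gate(x, params)
    return sum(pj * (W @ x + c)
               for pj, (m, A, Lam, Sig, W, c) in zip(p, params))

rng = np.random.default_rng(2)
d, m_ell, dz, k = 4, 2, 3, 2           # input, subspace, output dims; k experts
params = [(rng.standard_normal(d),                     # m_j
           rng.standard_normal((d, m_ell)),            # A_j
           np.diag(rng.uniform(0.5, 2.0, m_ell)),      # Lambda_j
           0.1 * np.eye(d),                            # Sigma_j (isotropic)
           rng.standard_normal((dz, d)),               # W_j
           rng.standard_normal(dz))                    # c_j
          for _ in range(k)]
x = rng.standard_normal(d)
print(forward(x, params).shape)   # (3,)
```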
Correspondingly, we get two types of subspace based gating networks and subspace based functions (SBF). Type B further improves on Type A in that the mapping \(x \to y \) acts as feature extraction, discarding redundant parts of \(x\ .\)
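The Type B cascade can be sketched as follows, using the standard Bayesian posterior of the two Gaussians above (parameter values are hypothetical): the posterior precision is \(\Pi_j = A_j^T\Sigma_j^{-1}A_j + \Lambda_j^{-1}\) and the projection is \(U = \Pi_j^{-1}A_j^T\Sigma_j^{-1}\), so \(y = U(x-m_j)\) extracts the subspace features and \(z = W_jy + c_j\) maps from the lower dimension.

```python
# Sketch (Type B, i_Z = 1): cascade x -> y -> z via the posterior inverse
# G(y | U (x - m), Pi^{-1}) of G(x|A y + m, Sig) and G(y|0, Lam).
import numpy as np

rng = np.random.default_rng(3)
d, m_ell, dz = 5, 2, 3
A = rng.standard_normal((d, m_ell))  # subspace basis A_j
m = rng.standard_normal(d)           # mean m_j
Lam = np.diag([1.2, 0.6])            # factor variances Lambda_j
Sig = 0.2 * np.eye(d)                # noise Sigma_j (isotropic, an assumption)
W = rng.standard_normal((dz, m_ell)) # W_j maps y -> z
c = rng.standard_normal(dz)          # c_j

Pi = A.T @ np.linalg.solve(Sig, A) + np.linalg.inv(Lam)  # posterior precision
U = np.linalg.solve(Pi, A.T @ np.linalg.inv(Sig))        # x -> y projection

x = rng.standard_normal(d)
y = U @ (x - m)            # feature extraction onto the subspace
z = W @ y + c              # f(x) = W U (x - m) + c
print(y.shape, z.shape)    # (2,) (3,)
```

Redundant directions of \(x\) orthogonal to the subspace are suppressed by \(U\), which is the sense in which the cascade acts as feature extraction.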