Rival penalized competitive learning/APPENDIX

    (a) Subspace based functions

    In practice, one often has only a finite number of training samples distributed in low-dimensional subspaces, rather than scattered over all the dimensions of the observation space. These subspace structures cannot be well captured by a basis \(\exp[-0.5(x-m_j)^T\Sigma_j^{-1}(x-m_j)]\) supported on the entire space of \(x\ .\) There are too many free parameters in \(\Sigma_j\ ,\) which usually leads to poor performance. Instead, we consider a basis on a subspace as shown in Figure 1, where observed samples are regarded as generated from a subspace with independent factors distributed along each coordinate of an \(m_{\ell}\)-dimensional inner representation \(y\ .\)
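    As a minimal illustrative sketch (the function name and the toy matrices below are our own, not from the source), such a subspace-supported basis evaluates a Gaussian whose covariance \(A_j\Lambda_jA_j^T+\Sigma_j\) concentrates its probability mass near the \(m_{\ell}\)-dimensional subspace spanned by the columns of \(A_j\):

        import numpy as np
        from scipy.stats import multivariate_normal

        def subspace_gaussian(x, m_j, A_j, Lambda_j, Sigma_j):
            """Evaluate G(x | m_j, A_j Lambda_j A_j^T + Sigma_j).

            A_j      : (d, m_ell) matrix whose columns span the subspace
            Lambda_j : (m_ell, m_ell) diagonal covariance of the inner factors y
            Sigma_j  : (d, d) residual noise covariance (typically diagonal)
            """
            cov = A_j @ Lambda_j @ A_j.T + Sigma_j
            return multivariate_normal.pdf(x, mean=m_j, cov=cov)

        # Toy example: d = 5 observed dimensions, m_ell = 2 inner factors
        rng = np.random.default_rng(0)
        d, m_ell = 5, 2
        A = rng.standard_normal((d, m_ell))
        Lam = np.diag([2.0, 1.0])
        Sig = 0.1 * np.eye(d)
        m = np.zeros(d)
        x = m + A @ rng.standard_normal(m_ell)  # a point generated from the subspace
        print(subspace_gaussian(x, m, A, Lam, Sig))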

    Figure 1: Subspace based function


    As shown in Figure 1, we may replace \( G(x|m_j,\Sigma_j)\) in eq() and eq() by \( G(x|m_j,A_j\Lambda_jA_j^T+\Sigma_j)\ ,\) which regards \(x\) as generated from a lower dimensional subspace spanned by the columns of \(A_j\ ,\) while the mapping to \(z\) described by \(q(z|x,y,\ell)\) is also based on this subspace. Specifically, there are two typical choices:

    • Type A is indicated by \(i_Z=0\ ,\) and corresponds to the previous ME in eq() and RBF networks in eq() with \(f_j(x,\phi_j)= W_jx+c_j \) mapping \(x \to z\) directly, while the gating net in eq() and the basis function in eq() are supported on the subspace of \(y\) instead of the original space of \(x\ .\)
    • Type B is indicated by \(i_Z=1\ .\) It performs a mapping \(y \to z\) from the lower dimensional subspace, so we seek a mapping \(x \to y\) to obtain a cascade mapping \(x \to y \to z\ .\) From the two Gaussians \(G(y|0, \Lambda_j)\) and \(G(x|A_jy+m_j, \Sigma_j)\ ,\) a choice for \(x \to y \) is their posterior inverse in a Bayesian sense, i.e., the Gaussian \(G(y|U(x-m_j), \Pi_j^{y\,-1})\) proportional to \(G(x|A_jy+m_j, \Sigma_j)G(y|0, \Lambda_j)\ ,\) from which we get \(x \to z\) by the Gaussian \( G(z|f_j(x,\phi_j), \Gamma_j)\) in eq() with \(f_j(x,\phi_j)= W_jU(x-m_j)+c_j \) (see the sketch after this list). Putting them into eq(), learning is made by those algorithms in (b) again.
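    The following minimal sketch contrasts the two mappings; it assumes, as one standard reading of the posterior above, that \(U = \Lambda_j A_j^T (A_j\Lambda_jA_j^T+\Sigma_j)^{-1}\ ,\) and all function names are illustrative rather than from the source:

        import numpy as np

        def type_a_map(x, W_j, c_j):
            """Type A (i_Z = 0): map x -> z directly, z = W_j x + c_j."""
            return W_j @ x + c_j

        def type_b_map(x, m_j, A_j, Lambda_j, Sigma_j, W_j, c_j):
            """Type B (i_Z = 1): cascade x -> y -> z through the subspace.

            y = U (x - m_j) is the posterior mean of the inner factors,
            assuming U = Lambda_j A_j^T (A_j Lambda_j A_j^T + Sigma_j)^{-1};
            then z = W_j y + c_j, i.e. f_j(x) = W_j U (x - m_j) + c_j.
            """
            U = Lambda_j @ A_j.T @ np.linalg.inv(A_j @ Lambda_j @ A_j.T + Sigma_j)
            y = U @ (x - m_j)
            return W_j @ y + c_j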

    Correspondingly, we get two types of subspace based gating networks and subspace based functions (SBF). Type B further improves on Type A in that the mapping \(x \to y \) acts as feature extraction, so that redundant components are discarded.
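    As a usage note continuing the illustrative functions above (toy data redrawn for self-containment), one visible effect of this feature extraction is that the weight matrix \(W_j\) shrinks from \(m_z \times d\) in Type A to \(m_z \times m_{\ell}\) in Type B:

        rng = np.random.default_rng(0)
        d, m_ell, m_z = 5, 2, 3
        A = rng.standard_normal((d, m_ell))
        Lam = np.diag([2.0, 1.0])
        Sig = 0.1 * np.eye(d)
        m = np.zeros(d)
        x = m + A @ rng.standard_normal(m_ell)

        c = np.zeros(m_z)
        W_a = rng.standard_normal((m_z, d))      # Type A weights: m_z * d parameters
        W_b = rng.standard_normal((m_z, m_ell))  # Type B weights: only m_z * m_ell
        print(type_a_map(x, W_a, c))
        print(type_b_map(x, m, A, Lam, Sig, W_b, c))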
