Matching pursuit

From Scholarpedia

Piotr J. Durka (2007), Scholarpedia, 2(11):2288. revision #20910 [link to/cite this article]

Revision as of 16:26, 17 September 2007; view current revision
←Older revision | Newer revision→

Curator: Piotr J. Durka, University of Warsaw, Warszawa, Poland

Matching pursuit (MP) algorithm finds a sub-optimal solution to the problem of an adaptive approximation of a signal in a redundant set (dictionary) of functions. Commonly used with dictionaries of Gabor functions, it offers several advantages in time-frequency analysis of signals, in particular EEG/MEG.

Contents

Dictionaries and linear expansions

Many signal analysis methods look for a linear expansion of the unknown signal x in terms of functions g_n

(1)
x = \sum_{n=1}^{N} a_n g_n

We may say that in such a way the unknown signal x is explained using words (functions g_n) from the dictionary D, used for decomposition. If the dictionary D is an orthonormal basis (like orthogonal wavelets or Fourier bases), then the coefficients a_n are given simply by the inner products of the dictionary's functions g_n with the signal \langle x, g_n \rangle.

We would like to use a dictionary D=\lbrace g_n\rbrace_{n=1..N} that would properly reveal the intrinsic properties of an unknown signal, or, almost equivalently, would give low entropy of the \lbrace a_n\rbrace and possibilities of good lossy compression. Unfortunately, there is no universal recipe for such a prior choice.


Adaptive approximations

We may relax the requirement of exact signal representation (1), and try to automatically choose the functions g_{\gamma_n}, optimal for the representation of a given signal x, from a redundant dictionary D. The expansion becomes an approximation, and uses only the functions g_{\gamma_n} chosen from the redundant dictionary D. In practice, the dictionary contains orders of magnitude more candidate functions g_\gamma than the number M of functions chosen for the representation.

(2)
x\approx \sum_{n=0}^{M-1} a_n g_{\gamma_n}

A criterion of optimality of a given solution (for a fixed dictionary D, signal x, and number of used functions M) can be formulated as minimization of the error of representation

(3)
\epsilon=\left\| x-\sum_{n=0}^{M-1} a_n g_{\gamma_n} \right\| = \min

Finding the minimum requires checking all the possible combinations (subsets) of M functions from the dictionary, which leads to a combinatorial explosion. Therefore, the problem is intractable even for moderate dictionary sizes. Matching pursuit algorithm, proposed in (Mallat and Zhang 1993), finds a sub-optimal solution by means of an iterative procedure.


Matching pursuit algorithm

In the first step, the waveform g_{\gamma_0} which gives the largest product with the signal x is chosen from the dictionary D, composed of normalized functions (||g_{\gamma_n}||=1). In each of the consecutive steps, the waveform g_{\gamma_n} is matched to the signal R^n x which is the residual left after subtracting results of previous iterations:

(4)
\begin{cases} R^0 x = x \\  R^n x = \langle R^n x,g_{\gamma_n}\rangle g_{\gamma_n}+R^{n+1}x\\   g_{\gamma_n} = \arg \max_{g_{\gamma_i} \in D } |\langle R^n x, g_{\gamma_i}\rangle | \end{cases}

Orthogonality of R^{n+1} x and g_{\gamma_n} in each step implies energy conservation

||x||^2 =\sum_{n=0}^{M-1} {|\langle R^n x, \;g_{\gamma_n}\rangle |^2} + ||R^M x||^2

For a complete dictionary the procedure converges to x with M\rightarrow\infty (Mallat and Zhang 1993). In practice we use finite expansions

(5)
x\approx\sum_{n=0}^{M-1} {\langle R^n x,\; g_{\gamma_n}\rangle g_{\gamma_n} }

Multichannel matching pursuit

Multichannel extensions of the matching pursuit algorithm can be achieved in many significantly different ways, possibly corresponding to the aims of analysis and/or to the model of generation of the multichannel signal. For example, the most straightforward approach, fixing across the analyzed channels all the parameters except for the amplitudes, corresponds to the model assuming instantaneous propagation of brains electrical activity to the surface of the skull and can be used as preprocessing for the EEG inverse solutions (c.f. Durka 2007).

Dictionary of time-frequency atoms

Matching pursuit
Enlarge
Figure 1: Example Gabor functions, generated from equation (6).

Dictionaries (D), commonly used for time-frequency analysis, are composed of Gabor functions, that is Gaussian envelopes modulated by sinusoidal oscillations. Real-valued Gabor function can be expressed as

(6)
g_\gamma (t) = K(\gamma)e^{-\pi{ \left( \frac{t-u}{s} \right) }^2} \cos\left(\omega (t-u)+\phi\right)

where \gamma=\{u, s, \omega, \phi\}, and K(\gamma) is a normalization factor ensuring ||g_{\gamma}||=1. These functions provide a general and compact model for transient oscillations; the above equation can describe parametrically a wide variety of shapes (figure 1). Apart from that, Gabor functions exhibit compact time-frequency localization. Gabor dictionaries used for matching pursuit decomposition usually contain also complete Dirac and Fourier bases. Other functions can be also used with the matching pursuit algorithm.

In practical implementations of matching pursuit with Gabor dictionaries we must restrict the search to a discrete and finite subset of the continuous 3-dimensional space of parameters \{u, s, \omega\}; phase (\phi) is optimized separately in numerical implementations.


Time-frequency energy density

Wigner-Ville transform

Matching pursuit
Enlarge
Figure 2: Cross-terms in time-frequency representations of signal's energy: (a + b)^2 = a^2 + b^2 + 2ab. Wigner-Ville transform of a signal composed of two sine waves with compact and separated support (below). The middle structure (labelled "2ab"), appearing in the time coordinates where the signal is flat, is a cross-term.

It can be simply demonstrated that the squared modulus of the Fourier transform of the signal x, that is the power spectrum of x, can be computed as the Fourier transform of the autocorrelation function of x (Wiener-Khinchine theorem). That is, for a real-valued signal x

(7)
|\mathcal{F}(x)|^2 = \mathcal{F} \left(\int_{-\infty}^{\infty} x(t+\frac{\tau}{2}) x(t-\frac{\tau}{2}) d\tau \right)

where the Fourier transform of the signal x is given by

(8)
\mathcal F(x)=\int_{-\infty}^{\infty} e^{-i\omega t} x(t) dt

Substituting explicitly (8) into (7) we get the following formula for the power spectral density of x:

(9)
|\mathcal{F}(x)|^2 = \int_{-\infty}^{\infty} e^{-i\omega \tau}  \left(\int_{-\infty}^{\infty} x(t+\frac{\tau}{2}) x(t-\frac{\tau}{2}) dt \right) d \tau

If we remove the middle integral, corresponding to the integration over time, we get a time-dependent spectral density as a 2-dimensional function

(11)
\mathcal{W}(x) = \int_{-\infty}^{\infty} e^{-i\omega \tau}  x(t+\frac{\tau}{2}) x(t-\frac{\tau}{2}) d \tau

which is the Wigner-Ville transform of x. This transform exhibits several elegant and desirable mathematical properties, hence it is sometimes considered a 'fundamental' time-frequency representation. Unfortunately, it has also one major drawback which is the presence of cross-terms (aka cross-components) in the time-frequency plane, as illustrated in figure 2. Minimization of the presence of cross-terms in time-frequency estimates can be achieved by applying a smoothing kernel, designed for a particular signal as in the upper panel of figure 6. Unfortunately, there is no general recipe for such smoothing which would give satisfactory results for an unknown signal.

Similarly to the Wigner-Ville transform of x, we can define a cross-Wigner transform of signals x and y:

(11)
\mathcal{W}(x, y) = \int_{-\infty}^{\infty} e^{-i\omega \tau}  x(t+\frac{\tau}{2}) y(t-\frac{\tau}{2}) d \tau

MP-based estimate of energy density

Matching pursuit
Enlarge
Figure 3: Middle plot: estimate of the time-frequency energy density (5), computed from the MP decomposition (13) of a 500-points simulated signal (plotted in red) composed of four Gabor functions, sine wave, one-point discontinuity, and a chirp (sine with linearly increasing frequency). Changing frequency of the chirp is represented as a series of Gabor funtions. Upper plot--the same as middle, presented in 3-D. Square root of energy proportional to the height of the surface or "temperature" on 2-D plots.

Wigner distribution computed directly from (2) gives

(12)
\mathcal{W}(x)  \approx \mathcal{W}\left(  \sum_{n=0}^{M-1} a_n g_{\gamma_n} \right) =\sum_{n=0}^{M-1} a_n^2\, \mathcal{W}(g_{\gamma_n})  + \, \sum_{n=0}^{M-1} \sum_{k=1, k\not=n}^{M-1} a_n a_k \mathcal{W}\left( g_{\gamma_n}, g_{\gamma_k} \right)

where a_n=\langle R^n x,\; g_{\gamma_n}\rangle. The double sum in (12) gathers cross-terms. Using linear expansion (5), we can omit them explicitly and construct the time-frequency representation of signal's energy density from the first sum, containing only auto-terms:

(13)
\mathcal{E} x  = \sum_{n=0}^{M-1} |\langle R^n x,\; g_{\gamma_n}\rangle|^2  \mathcal{W}g_{\gamma_n}

Technical issues

Matching pursuit provides a simple and intuitive decomposition of a signal, yet "under the hood" the procedure is highly nonlinear and complicated. First problem relates to the already mentioned need to select a discrete subset of the possible dictionary's functions for practical implementations. In general, it is expected, the larger (denser) the dictionary, the better the resulting matching pursuit decomposition at a higher computational cost. However, fixed schemes of subsampling the potentially continuous space of dictionary's parameters introduce a statistical bias, which becomes relevant when pooling together results of many matching pursuit decompositions (Durka et al. 2001). As for the number of waveforms in expansion (5), equal to the number of iterations of (4), there are several possible stopping criteria, but if we stop the iterations "too late" we may simply neglect the waveforms fitted in the latter iterations in further analysis. Reasonable settings for the dictionary size (density) and the number of iterations are important, because the computational cost of the matching pursuit, approximately proportional to these two factors, is still relatively high (Durka 2007).


Matching pursuit
Enlarge
Figure 4: A failure in feature extraction: matching pursuit decomposition of a signal consisting of two Gabor functions, both present in the dictionary used for decomposition. R0 – analyzed signal, g0 – function fitted in the first iteration by the matching pursuit algorithm (4).

Not all the signal's structures can be efficiently modeled by Gabor functions. Some waveforms, as for example the chirp in figure 3, may be expressed by several functions from the dictionary D, in which case the advantage of explicit parametrization is impaired. Nevertheless, we still get a robust and universal estimate of the time-frequency energy density, which is relatively independent of arbitrary factors. Influence of the prior choices, biasing other estimates of signals time-frequency energy density, is partly illustrated in figure 6. Matching pursuit algorithm can be also used with different types of dictionaries, but up to now these attempts were mostly theoretical.

Finally, there is a price to pay for using the sub-optimal solution of (3). Greedy matching pursuit algorithm can fail also in cases when the signal is constructed entirely from the structures present in the dictionary. An example is given in figure 4: a perfect solution to (3) would contain two Gabor functions, but, according to the greedy strategy of explaining maximum energy in a single step, in the first iteration matching pursuit chooses an intermediate waveform embracing both the structures. However, such an error can occur only in case when both the structures are perfectly in phase; in some cases one could argue that explaining them in term of one oscillation may be not so erratic.


Advantages in EEG/MEG analysis

Major advantages of the matching pursuit in analysis of biomedical signals stem from the explicit parametrization of transients, robust estimate of the time-frequency energy density, and combinations of these two. Extensive summary of the matching pursuit-based frameworks successfully applied to the EEG analysis is given in (Durka 2007).

Explicit parametrization of transients

Matching pursuit
Enlarge
Figure 5: Time-frequency energy density (13) of matching pursuit decomposition (5) of 20 seconds of sleep EEG. Marked blobs correspond to sleep spindles, structures C-D and E-F were marked by an encephalographer in the raw signal as single spindles (during a visual analysis or the raw EEG trace).

Gabor functions (6) fitted to the signal in the matching pursuit procedure (4) are defined by their amplitude, occurrence in time u, frequency \omega, time span s and phase \phi. If we assume that Gabor functions model appropriately the relevant components of the signal, then we can treat the fitted Gabors as representing the investigated phenomena. For example, EEG sleep spindles known from the visual analysis are usually defined as waveforms of frequencies in the 11-15 Hz range, with more or less generally assumed restrictions also on the amplitude and time span (figure 5). These restrictions can be implemented directly in the space of parameters of functions from the signal's expansion (5).


A robust and universal time-frequency estimate

Matching pursuit
Enlarge
Figure 6: Different estimates of time-frequency energy density of a simulated 500-points signal plotted at the bottom (the same as in figure 3). Two lower panels present spectrograms (windowed or short-time Fourier transforms) calculated for long (125 points in b) and short (21 points in c) windows. (d)--scalogram (continuous wavelet transform), (e)--smoothed pseudo Wigner-Ville distribution, (f)--the same as (e) in 3 dimensions.

The major problem in estimating the time-frequency energy density of signals is the trade-off between the time and frequency resolutions, known as the uncertainty principle in signal analysis. Different estimates offer different solutions to this issue, based upon different prior settings regulating this trade-off (figure 6). Spectrogram (windowed Fourier transform) gives uniform time-frequency resolution based upon the prior choice of the length of the analyzing window, scalogram (wavelet transform) gives higher time resolution for higher frequencies, while a variety of Wigner-derived distributions from the Cohen's class allow for almost any setting of these constrains. Nevertheless, these settings have to be decided a priori, before the analysis. Their improper choice may severely bias the results.

On the contrary, time-frequency estimate (13) derived from the matching pursuit decomposition (5) is based upon a representation, which adopts the time-frequency trade-off to the local signal structures in each iteration of the algorithm, yielding an estimate which is basically free from arbitrary settings--apart from choosing the Gabor functions.

References

  • S. Mallat and Z. Zhang (1993) Matching pursuit with time-frequency dictionaries. IEEE Transactions on Signal Processing, 41:3397-3415.
  • P.J. Durka, D. Ircha, and K.J. Blinowska (2001) Stochastic Time-Frequency Dictionaries for Matching Pursuit, IEEE Transactions on Signal Processing, vol. 49, no. 3, pp. 507–510.
  • P.J. Durka (2007) Matching Pursuit and Unification in EEG Analysis, Artech House, ISBN 978-1-58053-304-1

External links

MPTK: The Matching Pursuit ToolKit project

Software for unbiased matching pursuit decomposition of signals


Piotr J. Durka (2007) Matching pursuit. Scholarpedia, 2(11):2288, (go to the first approved version)
Created: 25 October 2006, reviewed: 4 October 2007, accepted: 11 November 2007
Action editor: Dr. Ke CHEN, School of Computer Science, The University of Manchester, U.K.
Reviewer A: Dr. Sebastian Ferrando , Department of Mathematics, Physics and Computer Science , Ryerson University, Canada
For authors