# State space model

Curator and Contributors

1.00 - Zhe Chen

State space model (SSM) refers to a class of probabilistic graphical model (Koller and Friedman, 2009) that describes the probabilistic dependence between the latent state variable and the observed measurement. The state or the measurement can be either continuous or discrete. The term “state space” originated in 1960s in the area of control engineering (Kalman, 1960). SSM provides a general framework for analyzing deterministic and stochastic dynamical systems that are measured or observed through a stochastic process. The SSM framework has been successfully applied in engineering, statistics, computer science and economics to solve a broad range of dynamical systems problems. Other terms used to describe SSMs are hidden Markov models (HMMs) (Rabiner, 1989) and latent process models. The most well studied SSM is the Kalman filter, which defines an optimal algorithm for inferring linear Gaussian systems.

An important objective of computational neuroscience is to develop statistical techniques to characterize the dynamic features inherent in neural and behavioral responses of experimental subjects collected during neurophysiological experiments. In neuroscience experiments, measurements of neural or behavioral data are often dynamic, noisy and have rich temporal structures. Examples of such include intracellular or extracellular recordings, neuronal spike trains, local field potentials, EEG, MEG, fMRI, calcium imaging, and behavioral measures (such as the reaction time and decision choice). Questions of interest may include how to analyze spike trains from ensembles of hippocampal place cells to infer the rodent’s position in the environment or how to identify the sources of dipole using multi-channel MEG recordings. Regardless of their specific modality and applications, SSM provides a unified and powerful paradigm to model and analyze these signals in a dynamic fashion in both time and space.

## Formalism and theory

The objective of state space modeling is to compute the optimal estimate of the hidden state given the observed data, which can be derived as a recursive form of Bayes’s rule (Brown et al., 1998; Chen, Barbieri and Brown, 2010). In a general state space formulation, let x(t) denote the state and y(0:t) denote the cumulative observations up to time t, the filtering posterior probability distribution of the state conditional on the observations y(0:t) is

$$p({\bf x}(t)|{\bf y}(0:t)) ={p({\bf x}(t), {\bf y}(0:t)) \over p({\bf y}(0:t)) } = {p({\bf x}(t)|{\bf y}(0:t-1)) p({\bf y}(0:t)|{\bf x}(t),{\bf y}(0:t-1))\over p({\bf y}(t)|{\bf y}(0:t-1)) } ={p({\bf x}(t)|{\bf y}(0:t-1)) p({\bf y}(t)|{\bf x}(t),{\bf y}(0:t-1))\over p({\bf y}(t)|{\bf y}(0:t-1))}\tag{1}$$

where the last equality of Equation (1) follows the conditional independence assumption between the observations. The one-step state prediction, known as the Chapman-Kolmogorov equation, is

$$p({\bf x}(t)|{\bf y}(0:t-1)) = \int p({\bf x}(t)|{\bf x}(t-1)) p({\bf x}(t-1)|{\bf y}(0:t-1)) d{\bf x}(t-1)\tag{2}$$

where the probability distribution (or density) p(x(t)|x(t-1)) describes a state transition equation, and the probability distribution (or density) p(y(t)|x(t),y(0:t-1)) is the observation equation. Equations (1) and (2) provide the fundamental relations to develop state space models and analyses.

For an illustration purpose, consider a discrete-time multivariate linear Gaussian system, the SSM is characterized by two linear equations.

State equation. The n-dimensional hidden state process x(t+1) follows a first-order Markovian dynamics, as it only depends on the previous state at time t and is corrupted by a (correlated or uncorrelated) state noise process n(t)

$$\textbf{x}(t+1) = \textbf{Ax}(t) + \textbf{n}(t)\tag{3}$$

where A is an n × n state-transition matrix. The state equation describes the state space evolution of a stochastic dynamical system.

Observation equation. The m-dimensional measurement y(t) is subject to a linear transformation of the hidden state x(t) and is further corrupted by a measurement noise process $\textbf{v}(t)$

$$\textbf{y}(t) = \textbf{Bx}(t) + \textbf{v}(t)\tag{4}$$

When the noise processes n(t) and v(t) are both Gaussian with zero mean and respective covariance matrices Q and R, y(t) will be a Gaussian process. Equation (3) defines a first-order autoregressive (AR) process. Although the first-order Markovian transition is assumed in equation (3) , a higher-order AR structure can be always reformulated and transformed into a first-order AR formulation by concatenating several state vectors into a new state vector (for example, ${\bf x}_{new}$(t) = [x(t), x(t-1)]). Given a series of observations Y={y(1), …, y($T_0$)} during a time interval [0,$T_0$], in light of equations (3) and (4), the joint (complete data) likelihood function for the linear Gaussian SSM is

$$p(X,Y|\theta) = {1\over (2\pi)^{n/2}|{\bf Q}|^{1/2}} \exp\Big\{ \sum\limits_{t=1}^{T_0-1} (-0.5({\bf x}(t+1)-{\bf Ax}(t))^T {\bf Q}^{-1}({\bf x}(t+1)-{\bf Ax}(t))\Big\} + {1\over (2\pi)^{m/2}|{\bf R}|^{1/2}} \exp\Big\{ \sum\limits_{t=1}^{T_0} (-0.5({\bf y}(t)-{\bf Bx}(t))^T {\bf R}^{-1}({\bf y}(t)-{\bf Bx}(t))\Big\} \tag{5}$$

where the superscript T denotes the vector or matrix transpose operator, and the augmented variable θ={A, B, Q, R, x(0)} includes the initial state condition and parameters that fully characterize the linear Gaussian SSM.

## Variants

The linear Gaussian SSM can be extended to a broad class of dynamical Bayesian networks (Ghahramani, 1998) by changing one or more following conditions about the state or measurement variables (Chen, Barbieri and Brown, 2010): (i) from continuous state to discrete or mixed-value state variable in equation (3); (ii) from continuous observation to discrete or mixed observation in equation (3); (iii) from Gaussian to non-Gaussian noise processes in equation (3) or (4); and (iv) inclusion of nonlinearity in equation (3) or (4). For instance, changing condition (i) may result in a discrete-time HMM or switching SSM; changing condition (ii) or (iii) may result in a generalized SSM with a generalized linear model (GLM) (McCullagh and Nelder, 1989) in place of equation (4); and changing condition (iv) may result in a nonlinear neural filter. In addition, a control variable can be incorporated into the state equation (3), which will result in a standard linear quadratic Gaussian (LQG) control system for which the optimal solution can be derived analytically (Bertsekas, 2005).

In modeling discrete neural signals, such as neuronal spike trains (point processes) or spike count (Poisson process), new variants of SSM may emerge. For simplicity, consider a single neural point process whose conditional intensity function λ(t) is modulated by a constant baseline rate parameter μ, a latent continuous state variable x(t) and an observed covariate u(t). When Δ is sufficiently small, the product λ(t)Δ is approximately equal to the probability of observing a spike within the interval ((t-1)Δ, tΔ] (in which there is at most one spike). For illustration simplicity, assume that the state equation is characterized by a first-order AR process, and the observation equation is characterized by a point process likelihood function (Smith and Brown, 2003)

$$x(t+1)=\rho x(t) + n(t) \tag{6}$$

$$\log\lambda(t)=\mu+\alpha x(t) + \beta u(t) \tag{7}$$

$$p(Y|X,\theta)=\exp\Big\{\int_0^{T_0} \log\lambda(\tau) dy(\tau) -\int_0^{T_0}\lambda(\tau)d\tau\Big \} \tag{8}$$

where θ={σ, ρ, μ, α, β, $x_0$}, n(t) denotes a zero-mean Gaussian variable with variance $\sigma^2$, |ρ|<1 denotes the AR coefficient, and the indicator variable $dy(\tau)$ is equal to 1 when there is a spike within the interval ((t-1)Δ, tΔ] and 0 otherwise. From equations (6) and (8), the complete data likelihood function is derived as

$$p(X,Y|\theta)=\Big[{(1-\rho^2) \over 2\pi \sigma^2}\Big]^{1/2} \exp\Big\{-{1\over 2\sigma^2}\Big[(1-\rho^2) x_0^2 +\sum\limits_{t=1}^{T_0-1}(x(t+1)-\rho x(t))^2\Big]\Big\} + \exp\Big\{\int_0^{T_0} \log\lambda(\tau) dy(\tau) -\int_0^{T_0}\lambda(\tau)d\tau\Big \} \tag{9}$$

Provided that the parameter θ is known, direct optimization of p(X,Y|θ) or logp(X,Y|θ) will yield the a global optimal state estimate of {x(t)}.

In the presence of multivariate point process observations $Y=\{y_c(1:t)\}_{c=1}^C$, provided that the observations are driven by a common state process $X=\{x(t)\}_{t=1}^{T_0}$ and the multivariate point process observations are conditionally independent at any time $t$ given a parameter θ, the generic complete data likelihood is derived as

$$p(X,Y|\theta)= \prod_{t=1}^{T_0}p(x(t)|x(t-1),\theta) \prod_{c=1}^C p(y_c(t)|x(t),\theta) \tag{10}$$

Such a probabilistic model is often used to characterize population neuronal spike trains (Brown et al., 1998; Brown et al., 2003; Chen, Barbieri and Brown, 2010).

## Statistical inference and learning

A common objective of statistical inference for SSM is to infer the state (including its uncertainty) based on the time series observations. In light of equation (1), the goal of state space analysis is to estimate the posterior probability distribution (or density) p(x(t)|Y). In the special case of linear Gaussian SSM, the predictive posterior distribution is fully characterized by the conditional mean and conditional covariance of a Gaussian distribution. When the state and observation equations of the linear Gaussian SSM are known, the optimal inference algorithm is described by recursive Kalman filtering (where Y={y(1), …, y(t)} is used for an online operation) or fixed-interval Kalman smoothing (where Y={y(1), …, y($T_0$)} is used for an offline operation). When the state is discrete (as in the HMM) and the state and observation equations are known, the optimal solution is given by the Viterbi algorithm. In contrast, in the presence of point process observations, a discrete analog of Kalman filtering operation is described by a point process filtering operation (Brown et al., 1998; Smith and Brown, 2003). For the above simple example (equations (6)-(8)), the following point process filtering equations are used to infer the state $\{x(t)\}$

$$x(t+1|t) = \rho x(t|t) \;\;\; (\textrm{one-step mean prediction}) \tag{11}$$

$$\sigma^2_x(t+1|t) =\rho^2 \sigma^2_x(t|t) +\sigma^2 \;\;\; (\textrm{one-step variance prediction}) \tag{12}$$

$$x(t+1|t+1) = x(t+1|t) + \sigma^2_x(t+1|t) \alpha \Big[dy(t+1)-\exp\Big(\mu+\alpha x(t+1|t+1) +\beta u(t+1)\Big)\Delta \Big] \;\;\; (\textrm{posterior mode})\tag{13}$$

$$σ^2_x(t+1|t+1) = \Big[ \Big(\sigma^2_x(t+1|t) \Big)^{-1} +\alpha^2 \exp\Big(\mu+\alpha x(t+1|t+1)+\beta u(t+1)\Big)\Delta \Big]^{-1} \;\;\; (\textrm{posterior variance})\tag{14}$$

where x(t+1|t+1) and $σ^2_x$(t+1|t+1) denote the posterior state mode and posterior state variance, respectively. Equations (11)-(14) are reminiscent of Kalman filtering. Equations (11) and (12) for one-step mean and variance predictions are the same as Kalman filtering, but equations (13) and (14) are different from Kalman filtering due to the presence of non-Gaussian observations and nonlinear operation in (13). In equation (13), [d y(t+1) − exp(μ + αx(t+1|t+1) + βu(t+1))Δ] is viewed as the innovations term, and $σ^2_x$(t+1|t) may be interpreted as a “Kalman gain”. The quantity of the Kalman gain determines the “step size” of error correction. In equation (14), the posterior state variance is derived by inverting the second derivative of the log-posterior probability density logp(X|Y,θ) based on a Gaussian approximation of the posterior distribution around the posterior mode (Brown et al., 1998; Smith and Brown, 2003; Eden et al., 2004).

When the state and observation equations are completely or partially unknown, the parameter θ and the state {x(t)} need to be jointly estimated. In statistics, likelihood inference is a well-established and asymptotically efficient approach for parameter estimation. Specifically, the expectation-maximization (EM) algorithm (Dempster, Laird and Rubin, 1977) provides a general framework to maximize or increase the likelihood by iteratively updating the latent state and parameter variables. Reconsider the example used in equations (6)-(8), the corresponding EM algorithm consists of the following two steps.

    E-step: At the k-th step, compute the expected complete data log likelihood (Q-function) based on the estimate ${\it θ}^{(k)}$


$$Q(\theta|\theta^{(k)}) = E[\log p(X,Y|\theta) || \theta^{(k)}] \tag{15}$$

and estimate the expected statistics of the latent process (E[x(t)||${\it θ}^{(k)}$], E[$x^2$(t)||${\it θ}^{(k)}$] and E[x(t)x(t+1)||${\it θ}^{(k)}$]) using point process filtering (equations (11) through (14)) and fixed-interval Kalman smoothing.

   M-step: Update ${\it θ}^{(k)}$ to ${\it θ}^{(k+1)}$ such that ${\it θ}^{(k+1)}=\arg\max_{θ} Q({\it θ}|{\it θ}^{(k)}$).


This can be achieved by setting the partial derivative of the Q-function to zero (i.e., $\frac{\partial Q}{\partial \theta}=0$), which may be solved via either numerical optimization (in the case of μ, α and β) or closed-form solutions (in the case of σ, ρ and $x_0$).

The E and M steps are executed iteratively until the likelihood reaches a local maximum. Upon convergence, the EM algorithm yields a point estimate of θ, the confidence intervals of θ can be assessed from the likelihood principle (Pawitan, 2001; Brown et al., 2003).

Alternatively, Bayesian inference (Gelman et al., 2005) provides another approach that aims to estimate the full posterior of the state and parameters based on Bayesian statistics. When the state variable is non-Gaussian, particle filtering or smoothing may be used to numerically approximate the posterior distribution; when the parameter variable is non-Gaussian, Gaussian approximation, Gibbs sampling or Markov chain Monte Carlo (MCMC) approaches may be employed. The common goal of these inference methods is to estimate the joint posterior of the state and the parameters using Bayes's rule

$$p(X,\theta|Y)\approx p(X|Y)p(\theta|Y)= { p(Y|X,\theta)p(X)p(\theta)\over p(Y)}={p(Y|X,\theta)p(X)p(\theta)\over \int p(Y|X,\theta)p(X)p(\theta)dX d\theta }\tag{16}$$

where the first equation assumes a factorial form of the posterior for the state and the parameters, and p(X) and p(θ) denote the prior distributions for the state and the parameters, respectively. The denominator of equation (16) is a normalizing constant known as the partition function. In general, Monte Carlo-based Bayesian inference or learning is powerful yet computationally expensive (Doucet, de Frietas, and Gordon, 2001; Gilks, Richardson and Spiegelhalter, 1996). A trade-off between tractable computational complexity and good performance is to exploit various approximate Bayesian inference methods, such as expectation propagation (Minka, 1999), mean-field approximation (Opper and Saad, 2001) and variational approximation (Jordan et al., 1999; Ghahramani, 1998; Beal and Ghahramani, 2006). These approximation techniques can also be integrated or combined to produce new methods, such as Monte Carlo EM or variational MCMC algorithms (McLachlan and Krishnan, 2008; Andrieu et al., 2003).

## Applications in neuroscience

Numerous applications of SSM to dynamic analyses of neuroscience data can be found in the literature (Paninski et al., 2009; Chen, Barbieri and Brown, 2010). Several important and representative applications are highlighted here.

• Population neuronal decoding: Examples include decoding the movement kinematics from primate motor cortex (M1) neurons in neural prosthetics (Brockwell, Rojas and Kass, 2004; Wu et al., 2006, 2009; Kulkarni and Paninski, 2008; Srinivasan et al., 2006, 2007) or goal-directed movement control (Srinivasan and Brown, 2007; Shanechi et al., 2012, 2013), or decoding rat’s spatial location from hippocampus ensemble spike trains (Brown et al., 1998; Barbieri et al., 2004). Truccolo et al. (2008) applied the first point-process state-space analysis to decode M1 neuronal spike trains recorded in patients with tetraplegia.

• Analysis of single neuronal plasticity or dynamics: Examples include tracking the receptive field of rat hippocampal neurons in navigation (Brown et al., 2001) and analyzing between-trial monkey hippocampal neuronal dynamics during associative learning experiments (Czanner et al., 2008).

• Identification of the state of neuronal ensembles: Examples include detecting stimulus-driven cortical state during behavior (Jones et al., 2007; Kemere et al., 2008) or detecting intrinsic cortical up/down states during slow wave sleep (Chen et al., 2009).

• Assessment of learning behavior of experimental subjects: Examples include characterizing dynamic behavioral responses in neuroscience experiments (Smith et al., 2004, 2005, 2007; Prerau et al., 2009).

• Inverse problems: Examples include solving EEG or MEG inverse problems (Galka et al., 2004; Lamus et al., 2012), deconvolving fMRI time series (Penny et al., 2005) and deconvolving spike trains from calcium imaging (Vogelstein et al., 2009, 2010).

• Other neuroscience applications such as spike sorting (Herbst et al., 2008; Calabrese and Paninski, 2011), assessment of higher-order neuronal synchrony (Shimazaki et al, 2012), prediction of spike timing (Kobayashi and Shinomoto, 2007), unfolding of population neuronal representation (Chen et al., 2012), and causality analysis (Havlicek et al., 2010).

## Research topics

An important topic in state-space modeling is model selection, or specifically to select the (discrete or continuous-valued) state dimensionality. Classical likelihood-based approaches rely on Akaike’s information criterion (AIC) or Bayesian information criterion (BIC), but these measures are often practically inefficient especially in the presence of sparse data samples. Following the Bayesian principle of “letting data speak for themselves”, the model selection problem has recently been tackled using nonparametric Bayesian inference, for instance in the cases of infinite HMM (Beal et al., 2002; Teh et al., 2006) and the switching SSM (Fox et al., 2010, 2011). Moreover, inference of a large-scale SSM for neuroscience data remains another important research topic. Exploiting the structure of the system, such as the sparsity, smoothness and convexity, may allow for employing efficient state-of-the-art optimization routines and imposing domain-dependent priors for regularization (Paninski et al., 2009; Paninski, 2010). Finally, developing consistent goodness-of-fit assessment for neuroscience data would help to validate and compare different statistical models (Brown et al., 2003).

## References

Andrieu C, de Freitas N, Doucet A, Jordan MI. (2003) An introduction to MCMC for machine learning. Machine Learning, 50(1): 5-43.

Barbieri R, Frank LM, Nguyen DP, Quirk MC, Solo V, Wilson MA, Brown EN. (2004) Dynamic analyses of information encoding by neural ensembles. Neural Computation, 16: 277-307.

Beal M, Ghahramani Z, Rasmussen CE. (2002) The infinite hidden Markov model. In Advances in Neural Information Processing Systems, 14: 577-584. Cambridge, MA: MIT Press.

Beal M, Ghahramani Z. (2006) Variational Bayesian learning of directed graphical models. Bayesian Analysis, 1(4): 793-832.

Bertsekas D. (2005) Dynamic Programming and Optimal Control. Boston, MA: Athena Scientific.

Brown EN, Frank LM, Tang D, Quirk MC, Wilson MA. (1998) A statistical paradigm for neural spike train decoding applied to position prediction from ensemble firing patterns of rat hippocampal place cells. Journal of Neuroscience, 18:7411-7425.

Brown EN, Nguyen DP, Frank LM, Wilson MA, Solo V. (2001) An analysis of neural receptive field plasticity by point process adaptive filtering. Proceedings of the National Academy of Sciences, 98: 12261-12266.

Brown EN, Barbieri R, Eden UT, Frank LM. (2003) Likelihood methods for neural data analysis. In: Feng J. (Ed.) Computational Neuroscience: A Comprehensive Approach (pp. 253-286), London: CRC Press.

Brockwell A, Rojas A, Kass R. (2004) Recursive Bayesian decoding of motor cortical signals by particle filtering. Journal of Neurophysiology, 91:1899-1907.

Calabrese A, Paninski L. (2011) Kalman filter mixture model for spike sorting of non-stationary data. Journal of Neuroscience Methods, 196(1): 159-169.

Chen Z, Vijayan S, Barbieri R, Wilson MA, Brown EN. (2009) Discrete- and continuous-time probabilistic models and algorithms for inferring neuronal UP and DOWN states. Neural Computation, 21(7): 1797-1862.

Chen Z, Barbieri R, Brown EN. (2010) State-space modeling of neural spike train and behavioral data. In Oweiss K (Ed.) Statistical Signal Processing for Neuroscience and Neurotechnology, Chap. 6 (pp. 161-200). Academic Press.

Chen Z, Kloosterman F, Brown EN, Wilson MA. (2012) Uncovering hidden spatial topology represented by hippocampal population neuronal codes. Journal of Computational Neuroscience, 33: 227-255.

Czanner G, Eden UT, Wirth, S, Yanike M, Suzuki WA, Brown EN. (2008) Analysis of between-trial and within-trial neural spiking dynamics. Journal of Neurophysiology, 99, 2672-2693.

Dempster A, Laird N, Rubin DB. (1977) Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, B39: 1-38.

Doucet A, de Freitas N, Gordon N. (2001) Sequential Monte Carlo Methods in Practice. Springer.

Eden UT, Frank LM, Barbieri R, Solo V, Brown EN. (2004) Dynamic analyses of neural encoding by point process adaptive filtering. Neural Computation, 16: 971-998.

Fox EL, Sudderth EB, Jordan MI, Willsky AS. (2010) Bayesian nonparametric learning of Markov switching processes. IEEE Signal Processing Magazine, 28(11): 43-54.

Fox EL, Sudderth EB, Jordan MI, Willsky AS. (2011) Bayesian nonparametric inference of switching dynamic linear models. IEEE Transactions on Signal Processing, 59(4): 1569-1585.

Galka A, Yamashita O, Ozaki T, Biscay R, Valdés-Sosa P. (2004) A solution to the dynamical inverse problem of EEG generation using spatiotemporal Kalman filtering. NeuroImage, 23(2): 435-453.

Gelman A, Carlin JB, Stern HS, Rubin DB. (2004) Bayesian Data Analysis (2nd ed), Chapman & Hall/CRC.

Ghahramani Z. (1998) Learning dynamic Bayesian networks. In Giles CL and Gori M (Eds.), Adaptive Processing of Sequences and Data Structures (pp. 168-197). Berlin: Springer-Verlag.

Gilks WR, Richardson S, Spiegelhalter DJ. (1996). Markov Chain Monte Carlo in Practice. Chappman & Hall/CRC Press.

Havlicek M, Jan J, Brazdil M, Calhoun VD. (2010) Dynamic Granger causality based on Kalman filter for evaluation of functional network connectivity in fMRI data. NeuroImage, 53(1): 65-77.

Herbst JA, Gammeter S, Ferrero D, Hahnloser RH. (2008) Spike sorting with hidden Markov models. Journal of Neuroscience Methods, 174(1):126-134.

Jones LM, Fontanini A, Sadacca BF, Katz DB. (2007) Natural stimuli evoke analysis dynamic sequences of states in sensory cortical ensembles. Proceedings of National Academy of Sciences USA 104: 18772-18777.

Jordan MI, Ghahramani Z, Jaakkola TS, Saul LK. (1999) An introduction to variational methods for graphical models. Machine Learning, 37:183-233.

Kalman RE. (1960) A new approach to linear filtering and prediction problems. Transactions of the ASME--Journal of Basic Engineering, 82:35-45.

Kemere C, Santhanam G, Yu BM, Afshar A, Ryu SI, Meng TH, Shenoy KV. (2008) Detecting neural-state transitions using hidden Markov models for motor cortical prostheses. Journal of Neurophysiology, 100: 2441-2452.

Kobayashi R, Shinomoto S. (2007) State space method for predicting the spike times of a neuron. Physical Review E75, 011925.

Koller D, Friedman N. (2009) Probabilistic Graphical Models. Cambridge, MA: MIT Press.

Kulkarni J, Paninski L. (2008) State-space decoding of goal-directed movements. IEEE Signal Processing Magazine, 25:78-86.

Lamus C, Hamalainen MS, Temereanca S, Long CJ, Brown EN, Purdon PL. (2012) A spatiotemporal dynamic distributed solution to the MEG inverse problem. NeuroImage, 63(2): 894-909.

McCullagh P, Nelder JA. (1989) Generalized Linear Models (2nd ed.). Chapman & Hall/CRC Press.

McLachlan GJ, Krishnan T. (2008) The EM Algorithm and Extensions (2nd ed.). New York: Wiley.

Minka TP. (2001) A family of algorithms for approximate Bayesian inference. PhD thesis, Dept. EECS, Massachusetts Institute of Technology, Cambridge, MA.

Opper M, Saad D. (2001) Advanced Mean Field Methods: Theory and Practice. Cambridge, CA: MIT Press.

Paninski L, Ahmadian Y, Ferreira DG, Koyama S, Rad KR, Vidne M, Vogelstein JT, Wu W. (2009) A new look at state-space models for neural data. Journal of Computational Neuroscience, 29(1-2): 107-126.

Paninski L. (2010) Fast Kalman filtering on quasilinear dendritic trees. Journal of Computational Neuroscience, 28: 211-228.

Pawitan Y. (2001) In All Likelihood: Statistical Modelling and Inference Using Likelihood. Oxford: Clarendon Press.

Penny W, Ghahramani Z, Friston K. (2005). Bilinear dynamical systems. Philosophical Transactions of Royal Society of London B, 360: 983-993.

Prerau MJ, Smith AC, Eden UT, Kubota Y, Yanike M, Suzuki W, Graybiel AM, Brown EN. (2009) Characterizing learning by simultaneous analysis of continuous and binary measures of performance. Journal of Neurophysiology, 102:3060-3072.

Rabiner LR. (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2): 257-286.

Shanechi MM, Wornell GW, Williams ZM, Brown EN. (2013) Feedback-controlled parallel point process filter for estimation of goal-directed movements from neural signals. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 21: 129-140.

Shanechi MM, Brown EN, Williams ZM. (2012) Neural population partitioning and a concurrent brain-machine interface for sequential control motor function. Nature Neuroscience, 12: 1715-1722,

Shimazaki H, Amari S, Brown EN, Gruen S. (2012) State-space analysis of time-varying higher-order spike correlation for multiple neural spike train data. PLoS Computational Biology, 8(3): e1002385.

Smith AC, Brown EN. (2003) Estimating a state-space model from point process observations. Neural Computation, 15: 965-991.

Smith AC, Frank LM, Wirth S, Yanike M, Hu D, Kubota Y, Graybiel AM, Suzuki WA, Brown EN. (2004) Dynamical analysis of learning in behavior experiments. Journal of Neuroscience, 24: 447-461.

Smith AC, Stefani MR, Moghaddam B, Brown EN. (2005) Analysis and design of behavioral experiments to characterize population learning. Journal of Neurophysiology, 93: 776-792.

Smith AC, Wirth S, Suzuki W, Brown EN. (2007) Bayesian analysis of interleaved learning and response bias in behavioral experiments. Journal of Neurophysiology, 97: 2516-2524.

Srinivasan L, Eden UT, Willsky AS, Brown EN. (2006) A state-space analysis for reconstruction of goal-directed movements using neural signals, Neural Computation, 18: 2465-2494.

Srinivasan L, Eden UT, Mitter SK, Brown EN. (2007) General-purpose filter design for neural prosthetic devices. Journal of Neurophysiology, 98: 2456-2475.

Srinivasan L, Brown EN. (2007) A state-space framework for movement control to dynamic goals through brain-driven interfaces. IEEE Transactions on Biomedical Engineering, 54(3): 526-535.

Teh YW, Jordan MI, Beal MJ, Blei DM. (2006) Hierarchical Dirichlet processes. Journal of American Statistical Association, 101: 1566-1581.

Truccolo W, Friehs GM, Donoghue JP, Hochberg LR. (2008) Primary motor cortex tuning to intended movement kinematics in humans with tetraplegia. Journal of Neuroscience, 28(5): 1163-1178.

Vogelstein J, Watson B, Packer A, Yuste R, Jedynak B, Paninski L. (2009) Spike inference from calcium imaging using sequential Monte Carlo methods. Biophysical Journal, 97(2): 636-655.

Vogelstein J, Packer A, Machado TA, Sippy T, Babadi B, Yuste R, Paninski L. (2010) Fast nonnegative deconvolution for spike train inference from population calcium imaging. Journal of Neurophysiology, 104: 3691-3704.

Wu W, Gao Y, Bienenstock E, Donoghue JP, Black MJ. (2006) Bayesian population decoding of motor cortical activity using a Kalman filter. Neural Computation, 18: 80-118.

Wu W, Kulkarni JE, Hatsopoulos NG, Paninski L. (2009) Neural decoding of hand motion using a linear state-space model with hidden states. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 17: 370-378.

Andre Longtin (2010) Stochastic dynamical systems. Scholarpedia, 5(4):1619.

Jose Pedro Segundo (2010) Spike train and point processes. Scholarpedia, 5(7):5729.

David Spiegelhalter and Kenneth Rice (2009) Bayesian statistics. Scholarpedia, 4(8):5230.

David H. Terman and Eugene M. Izhikevich (2008) State space. Scholarpedia, 3(3):1924.

Andrew J. Viterbi (2008) Viterbi algorithm. Scholarpedia, 4(1):6246.

Barber D, Cemgil AT, Chiappa S. (2011) Bayesian Time Series Models. Cambridge University Press.

Cappé O, Moulines E and Rydén T. (2005) Inference in Hidden Markov Models. Berlin: Springer.

Durbin J and Koopman SJ. (2001) Time Series Analysis by State Space Methods. Oxford University Press.

Kim C-J and Nelson CR. (1999) State-Space Models with Regime Switching: Classical and Gibbs-Sampling Approaches with Applications. Cambridge, MA: MIT Press.

Kitagawa G and Gersh W. (1996) Smoothness Priors Analysis of Time Series. New York: Springer.

Ozaki T. (2012) Time Series Modeling of Neuroscience Data. Chapman & Hall/CRC Press.