Optic flow

Curator: Florian Raudies

Definitions of optic flow

Optic flow is defined as the change of structured light in the image, e.g. on the retina or the camera’s sensor, due to a relative motion between the eyeball or camera and the scene. Further definitions from the literature highlight different properties of optic flow.

Helmholtz’s: “My belief too is that it is mainly by variations of the retinal image due to bodily movements that one-eyed persons are able to form correct apperceptions of the material shapes of their surroundings.” (Helmholtz, 1925, p. 297)

Gibson’s: “Analytically, this total transformation of the array appears to mean that the elements of this texture are displaced, the elements being considered as spots. Introspectively, the field is everywhere alive with motion when the observer moves.” (Gibson, 1966, p. 196f.)

Horn’s: “The apparent motion of brightness patterns observed when a camera is moving relative to the objects being imaged is called optical flow.” (Horn, 1986, p. 278)

Helmholtz is mainly concerned with depth perception: he describes optic flow as the “variations of the retinal image” that are due to body motion and depend on the structure, namely the distance as well as the rigidity, of the environment. Gibson describes the displacement of structure in the optic array as a transformation that becomes “alive with motion when the observer moves”. A more recent definition of optic flow is given by Horn, who attributes the “motion of brightness patterns” in the image to relative motion between observer and objects in the environment.

In a bio-inspired context, retinal flow is the change of structured patterns of light on the retina that leads to an impression of movement of the visual imagery projected onto the retina. Figure 1a depicts the production of retinal optic flow for the displacements of two exemplary visual features. In technical terms and in the context of computer vision, changes of the environment in the image are represented by a series of image frames. Figure 1b shows three frames of an image sequence, which can be obtained by a spatial and temporal sampling of the incoming light. Optic flow captures the change in these images through a vector field. Research emphasizes the accurate, pixel-wise estimation of optic flow, which is a computationally demanding task. Nowadays, optic flow can be estimated in close to real time for reasonable image resolutions.

Figure 1: shows the production and detection of optic flow. a) Optic flow is generated on the retina by changes in the patterns of light. The example shows the shift of two visual features (star and hexagon) on a plane and their angular displacements on the surface of the eyeball. b) If the structured light is sampled spatially and temporally this results in an image sequence. The example shows three frames, which show the movement of the silhouette of a head. The optic flow is depicted as the correspondence of contour pixels between frame 1 and 2 as well as frame 2 and 3. For methods estimating flow, the challenge is to find the point correspondence for each pixel in the image, not only the contour pixels.

The phenomenology of optic flow and its dependence on observer motion and environment has been qualitatively known for decades. Popular algorithms for the estimation of optic flow from image sequences were only proposed in the early 80s (Lucas & Kanade, 1981; Horn & Schunck, 1981), whereas an analytic model for optic flow was formulated as early as the mid-60s (Gordon, 1965; Koenderink & Van Doorn, 1976; Longuet-Higgins & Prazdny, 1980). Horn (1986, p. 278) calls this analytic model of optic flow the “motion field”, which is a geometrical concept. This motion field is determined by the geometry of the camera, namely the projection function, the movement of the camera, and the geometry of the scene. In the literature, the motion field is often conflated with the optic flow, which is defined by the temporal change of the registered patterns of light. The motion field as an analytic model was formulated for a pinhole camera in locomotion, where locomotion is approximated by three translational and three rotational instantaneous velocities.

Summaries of state-of-the-art algorithms for the estimation of optic flow are given frequently (Nagel, 1987; Barron et al., 1994; Sun et al., 2010). Nagel (1987) unifies three methods (Nagel, 1983; Tretiak & Pastor, 1984; Haralick & Lee, 1983) by identifying the rigorous constraint as the common formalism. Nagel also provides a discussion of smoothness constraints for the solution of optic flow (Horn & Schunck, 1981; Hildreth, 1984; Nagel & Enkelmann, 1986). Barron et al. (1994) provide a categorization into differential techniques (Horn & Schunck, 1981; Lucas & Kanade, 1981; Nagel, 1983, 1987; Tretiak & Pastor 1984; Uras et al. 1988), region-based matching (Anandan, 1989; Little et al., 1988), energy-based methods (Adelson & Bergen, 1986; Heeger, 1988), and phase-based techniques (Fleet & Jepson, 1990). Sun et al. (2010) suggest that little has changed in the typical formulation given by Horn & Schunck (1981); mainly advanced numerical optimization routines and robustness functions lead to performance gains in terms of quality and speed. In this review, we show the diversity of approaches and techniques from the last 30 years of optic flow estimation in computational vision and biologically inspired modeling. Most of the ideas included in the constraints and regularizers, which impose smoothness on the optic flow solution, have found their way into state-of-the-art estimation methods. To render a more complete picture, we also include studies on optic flow from neuroscience and psychophysics.

We provide Matlab implementations for several methods that estimate optic flow given an image sequence. Table 1 summarizes these implementations and contains links to the code package.

Table 1: lists Matlab implementations of methods that estimate optic flow from an image sequence. (*) Most of these implementations are not a faithful replication of the original algorithm due to the lack of detail in the published literature. In some cases, the implemented methods contain only special cases or are simplified to communicate the idea behind the method. The Appendices A1-A14 provide a detailed description.
Author(s) of method (*) Brief description Appendix
Adelson & Bergen (1985) Motion energy detector A14
Farnebäck (2000) Orientation tensor A3
Fleet & Jepson (1990) Detection with local image phase A6
Heeger (1988) Optic flow constraint equation in Fourier space A4
Horn & Schunck (1981) Variational approach with regularizer imposing smoothness A7
Lucas & Kanade (1981) Linear least squares A1
Mota et al. (2001) Constraints for $$n$$ motions in the regular domain A2
Nagel (1987) Rigorous condition and higher order derivatives A11
Otte & Nagel (1994) Sampling condition and higher order derivatives A10
Shizawa & Mase (1991) Constraints for $$n$$ motions in Fourier space A5
Uras et al. (1988) 2nd order derivatives assuming constant motion A9

Optic flow in our daily life

Optic flow is continually processed in our visual system and can be used to help solve various tasks. Such tasks involve the estimation of self-motion (Bruss & Horn, 1983) and the segmentation of the scene into independently moving objects and rigid parts, or foreground and background (Weiss, 1997; Cremers & Schnörr, 2003). Furthermore, optic flow contains information about the time-to-contact with locations in the environment depicted in the image (Lee, 1976; Horn et al., 2007; Alenya et al., 2009). Optic flow also allows for an estimation of the relative depths of all visible and rigid objects. Information extracted from optic flow can be used in driver assistance systems to detect other cars and their movement, pedestrians, and the movement of one's own car (Wedel et al., 2011). Another application is the detection of other airplanes to avoid collisions in airspace (Zufferey & Floreano, 2006; Soccol et al., 2007). Optic flow is used in video codecs to interpolate the image between key frames (Chahine & Konrad, 1995). Fast, high-resolution displays use optic flow to synthesize additional image frames as an interpolation between existing ones (Csillag & Boroczky, 1999); higher frame rates are often not possible due to the bandwidth-limited connection to the display, and, therefore, this interpolation technique is used. Optic flow can be detected using camera sensors that are cheaper than light radar (lidar) and passive in their sensing (Karhoff et al., 2006).

Active and ecological optics

Optic flow has typically been studied in the context of passive vision. This means that the sensor which registers optic flow is rigidly mounted to a robot, and flow is not actively generated by the robot to increase the information that can be extracted from it. Instead, the detected flow is used within a control circuit, e.g. to avoid obstacles. In contrast, for active vision the sensor is controlled by the flow and image stream, e.g. to maintain fixation, and thus allows for a simplification of flow estimation and processing (Fermüller & Aloimonos, 1993). Recently, Rothkopf & Ballard (2009) presented evidence for such active control in humans through eye or head rotations during visual navigation.

Gibson (1966) introduced the concept of “ecological optics”. Environment, task, and the specifics of the observer (e.g. body size) all impose constraints on how to use the agent's degrees of freedom and capabilities efficiently and adaptively to solve the task. Effken & Shaw (1992) discuss these as the “duality of affordances and affectivities” (p. 249), where affordances are imposed by the environment and affectivities by one's own constraints, and both interact with the task to be solved. Only a few approaches in robotics have followed this idea; they are known under the term “ecological robotics”. Largely, these approaches rest upon control laws that couple sensory input directly to the actuator output. This has been successfully applied to the sensing of optic flow and obstacle avoidance during visual navigation (Duchon et al., 1998). Control laws have been inspired by human (Duchon & Warren, 2002) as well as bee visual navigation strategies (Srinivasan, 2011). The combination of an active vision approach with recent techniques for optic flow processing is a promising avenue for future research.

Optic flow in computational vision

We provide a taxonomy of constraints used in methods that estimate optic flow, with the goal of introducing various basic concepts and constraints. In addition, we give an abstract description of the most recent methods, which often combine several of the previously mentioned concepts.

We start with a few definitions and a formal statement of the problem of estimating flow. Assume the gray-value function $$f(x,y,t)$$ describes the image sequence of a video, where $$x$$ and $$y$$ denote the spatial Cartesian coordinates and $$t$$ the time or temporal dimension; then we are interested in the local, dense vector field $(u(x,y,t),v(x,y,t))$ to be estimated from that video. For simplicity, we assume that all videos are gray-value or reduced to one channel. More recent methods have been developed that take color images as input (e.g. Zimmer et al., 2009). Box 1 summarizes this problem definition.

Box 1: Given a scalar function $f(x,y,t):\Re ^{3} \to \Re$ (the image sequence), estimate the unknown vector field $(u(x,y,t),v(x,y,t)):\Re ^{3} \to \Re ^{2}$ (the optic flow).
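In a discrete implementation this problem definition maps onto arrays; a minimal NumPy sketch (the axis ordering is our assumption, not prescribed by the article):

```python
import numpy as np

# Discretely sampled scalar function f(x,y,t): R^3 -> R.
# Assumption: axis 0 indexes time t, axis 1 rows y, axis 2 columns x.
T, H, W = 10, 64, 64
f = np.random.rand(T, H, W)

# Unknown dense vector field (u(x,y,t), v(x,y,t)): R^3 -> R^2,
# i.e. one horizontal and one vertical velocity per pixel and frame.
u = np.zeros((T, H, W))
v = np.zeros((T, H, W))
```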

Constraints for the estimation of optic flow

Most constraints for methods that estimate optic flow can be derived from the continuity equation. Figure 2 shows the assumptions and tools used to derive constraints and their direct or indirect link to the continuity equation. The continuity equation referred to by (i) in Figure 2,

$\tag{1} \partial _{t} h+\nabla_{xy} \cdot (\vec{v}\, h)=0$,

states the conservation of a quantity $h$ under flow $\vec{v}$.

Figure 2: shows a taxonomy of constraints for the estimation of optic flow $\vec{v}=(u,v)$ from 2D spatial image sequences $f(x,y,t)$. The nodes of the above tree show the constraints and the edges are labeled with assumptions or tools, which are used to derive a constraint. In total there are ten constraints that can be linked back to the continuity equation. The constraint (xii), the orientation tensor, has no obvious link to the continuity equation and, thus, appears separately without connection to constraints (i)-(xi).

Here, we assume the quantity $h$ to be the gray-value $f$ and the flow $\vec{v}$ to be the optic flow $\vec{v}=(u,v)$, for which we assume: $\nabla _{xy} \vec{v}=\vec{0}$. Note that $\nabla _{xy} =\left(\partial _{x} ,\partial _{y} \right)^{t}$ denotes the nabla operator applied to the two spatial dimensions x and y, and $\nabla _{xyt} =\left(\partial _{x} ,\partial _{y}, \partial _{t} \right)^{t}$ denotes the nabla operator applied to the three dimensions x, y, and t; applied to a scalar it yields the gradient, and applied to a vector field via the dot product it yields the divergence. The superscript $t$ denotes the transpose. The constraint $\nabla _{xy} \vec{v}=\vec{0}$ means that in regions of constant gray-values there is no change in flow -- or gray-values do not "accumulate" like, e.g., mass does in a fluid. Then, the continuity equation (i) translates into the optic flow constraint equation (OFCE) (ii)

$\tag{2} \nabla_{xyt} f^{t} (\dot{x},\dot{y},1)=0,$

which is used by many methods, e.g. Lucas & Kanade's (1981), see also App. A1. If the same continuity constraint is applied for $n$ motions this leads to (iii) the generalized structure tensor

$\tag{3} J_{n} =\int \vec{l}(\vec{x})^{t} \vec{l}(\vec{x})\cdot a(\vec{x})\; d\vec{x}.$

The $n$ motion components are computed through the minors of $J_{n}$ (Mota et al., 2001). This tensor is computed from the ${n}^{th}$-order partial, spatio-temporal derivatives of the image signal $\vec{l}(\vec{x})=f(\vec{x})*\partial ^{\alpha } g(\vec{x})$ with $\left|\alpha \right|=n$, where $\alpha$ denotes a multi-index. More details are given in App. A2.
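For the single-motion case, the OFCE of Eq. (2) is the basis of Lucas & Kanade's method (App. A1): assuming constant flow within a local window, every pixel contributes one linear equation $f_x u + f_y v + f_t = 0$, and the window is solved in the least-squares sense. A minimal sketch (not the authors' original implementation; the derivative filters are simple central differences):

```python
import numpy as np

def lucas_kanade(f1, f2, y, x, r=7):
    """Least-squares optic flow at pixel (y, x) from a (2r+1)^2 window.

    Minimal sketch: each window pixel contributes fx*u + fy*v + ft = 0.
    """
    fx = np.gradient(f1, axis=1)      # spatial derivative along x (columns)
    fy = np.gradient(f1, axis=0)      # spatial derivative along y (rows)
    ft = f2 - f1                      # temporal derivative from two frames
    win = (slice(y - r, y + r + 1), slice(x - r, x + r + 1))
    A = np.stack([fx[win].ravel(), fy[win].ravel()], axis=1)
    b = -ft[win].ravel()
    uv, *_ = np.linalg.lstsq(A, b, rcond=None)
    return uv                         # estimated (u, v)
```

For a pattern translating by one pixel per frame the estimate should be close to (1, 0); like all gradient-based constraints, this only holds for small displacements.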

In general, the OFCE provides only one constraint equation for two unknowns, which is an underdetermined system and an ill-posed problem (Poggio et al., 1985). An interpretation of this is the aperture problem (see Figure 7). For a limited aperture size and an edge structure, motion can only be estimated normal to that edge, see also Optic flow in psychophysics. Taking this ambiguity imposed by the aperture problem into consideration, we can only estimate motion along the normal of an edge, denoted by $\nabla_{xy} f/\left\| \nabla_{xy} f\right\|$. The normal flow is defined by: $\vec{v}/\left\| \; \vec{v}\; \right\| =\nabla _{xy} f/\left\| \nabla _{xy} f\right\|$. Using the OFCE, the speed of this normal flow is:

$\tag{4} s=-f_{t} /\left\| \nabla _{xy} f\right\|.$

This constraint is also called normal flow and is referred to by (iv) in Figure 2. The following constraints aim to provide alternatives to the OFCE for the estimation of optic flow. Most of them also aim to overcome the ambiguity of the OFCE by making additional assumptions. The Bayesian constraint uses a motion prior to resolve the ambiguity. Sampling conditions assume a constant optic flow within the neighborhood they sample from; thus, these conditions provide additional constraint equations to solve for one velocity vector. The rigorous condition assumes that all terms in the Taylor series expansion of $f$ and $\vec{v}$ evaluate to zero independently; therefore, it is called “rigorous”. A Fourier transform of the entire image sequence, or of local cubes thereof, concentrates the energy in a plane whose orientation is determined by the motion. If the entire image consists of a moving edge, the energy is concentrated around a single line, which again results in an ambiguous 2D motion vector. In natural stimuli such a configuration is unlikely, and most times energy is arranged in a plane. If the image sequence contains multiple moving objects, e.g. transparently overlaid, then energy exists in multiple planes. Another constraint computes the spatial partial derivatives of the OFCE, which yields two equations due to the two spatial dimensions; this resolves the ambiguity as well. Because this involves the Hessian matrix, the 2${}^{nd}$-order partial derivatives of a multi-dimensional function, we call this approach “Hessian”. The constraint using local image phase is ambiguous in the same way as the OFCE: here the ambiguity is introduced by a constant image phase, as produced, e.g., by a regular texture of constant orientation.
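Eq. (4) translates directly into a per-pixel computation; a sketch (derivatives again via central differences, with a small constant guarding against division by zero):

```python
import numpy as np

def normal_flow(f1, f2, eps=1e-6):
    """Normal flow of Eq. (4): motion component along the gray-value gradient."""
    fx = np.gradient(f1, axis=1)
    fy = np.gradient(f1, axis=0)
    ft = f2 - f1
    mag = np.sqrt(fx**2 + fy**2) + eps
    s = -ft / mag                      # normal speed, Eq. (4)
    return s * fx / mag, s * fy / mag  # normal flow vector per pixel
```

At locations of weak gradient the estimate is unreliable; at edges only this component of the motion is recoverable (the aperture problem).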

Bayesian constraint. Existing methods integrate the OFCE into a probabilistic framework (Simoncelli et al., 1991; Nestares et al. 2000; Weiss & Fleet, 2002). These methods assume that the deviations from the OFCE follow a normal distribution, thus

$\tag{5} P(f(\vec{z})\; \left|\vec{v}\right. (\vec{z}))=\frac{1}{\sqrt{2\pi } \sigma } \exp \left(-\frac{1}{2\sigma ^{2} } \int _{\Omega _{\vec{z}} }\left(\left(\nabla_{xyt} f(\vec{z}')\right)^{t} \left(\begin{array}{c} {\vec{v}} \\ {1} \end{array}\right)\right) ^{2} \; d\vec{z}'\right).$

Such a likelihood term is linked to the posterior through Bayes' rule, using the prior $P(\vec{v}(\vec{z}))$ for motions and $P(f(\vec{z}))$, the probability for a certain gray-value being present, (v):

$\tag{6} P(\vec{v}(\vec{z})\; \left|\; f(\vec{z})\right. )=\frac{P(f(\vec{z})\; \left|\vec{v}\right. (\vec{z}))\cdot P(\vec{v}(\vec{z}))}{P(f(\vec{z}))}.$

Recent research efforts focus on priors $P(\vec{v}(\vec{z}))$ for motions (Nestares et al. 2000; Stocker & Simoncelli, 2006) and their interaction with the likelihood term (Wei & Stocker, 2013). Some of these approaches try to improve the estimation of optic flow using this probabilistic formulation. Such estimation typically optimizes the unknown motion velocity $\vec{v}$. For such an optimization the term $P(f(\vec{z}))$ in the denominator can be dropped because it is independent of $\vec{v}$, see also App. A13. The ambiguity for moving gray-value edges is present in this method, too: gray-value edges constrain the velocity to a line of high likelihood values. However, by incorporating a prior, e.g. for smaller speeds, this ambiguity can be resolved and a single motion receives maximum posterior probability.
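The effect of the prior can be made concrete with a brute-force sketch: evaluate the negative log-posterior of Eqs. (5) and (6) on a velocity grid, with a zero-mean Gaussian prior favoring slow speeds (grid range, patch shape, and parameter values are illustrative assumptions):

```python
import numpy as np

def map_velocity(fx, fy, ft, sigma=1.0, sigma_p=1.0, vmax=3.0, n=61):
    """MAP velocity for one patch: Gaussian OFCE likelihood x slow-speed prior.

    fx, fy, ft are arrays of derivatives over the patch; P(f) is dropped
    because it does not depend on the velocity.
    """
    vs = np.linspace(-vmax, vmax, n)
    U, V = np.meshgrid(vs, vs)
    # squared OFCE residual, summed over the patch (negative log-likelihood)
    res = (fx.ravel()[:, None, None]*U
           + fy.ravel()[:, None, None]*V
           + ft.ravel()[:, None, None])
    nll = (res**2).sum(axis=0) / (2*sigma**2)
    nll += (U**2 + V**2) / (2*sigma_p**2)   # negative log-prior (slow speeds)
    i, j = np.unravel_index(np.argmin(nll), nll.shape)
    return U[i, j], V[i, j]
```

For a pure gray-value edge ($f_y = 0$ everywhere) the likelihood alone leaves $v$ undetermined; the prior selects the slowest consistent motion.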

Sampling and rigorous condition. Another strategy to address the aperture problem is to integrate over a larger region within the image, hoping that it contains a gray-value corner besides gray-value edges. Otte & Nagel (1994) propose a Taylor series expansion of the gray-value $f$ of order $m$ and of the motion vector $\vec{w}=(u,v,\; 1)$ of order $n$, both in the argument $\vec{z}=(x,y,t)^{t}$. Plugging these expansions into the OFCE, we get the polynomial

$\tag{7} \sum _{0\le \left|\alpha \right|\le n}\sum _{0\le \left|\beta \right|\le m}\frac{1}{\alpha !\beta !} \partial ^{1+\alpha } f(\vec{z})\cdot \partial ^{\beta } \vec{w}(\vec{z})\cdot (\delta \vec{z})^{\alpha +\beta } = 0$

in the unknowns $\partial ^{\beta } \vec{w}$ with $\left|\beta \right|=n$ and the sampling locations $\delta \vec{z}$ with reference $\vec{z}$. In Eq. (7), $\alpha$ and $\beta$ denote multi-indices. App. A10 gives further explanations on the Taylor series expansions and multi-indices. The sampling condition (vi) samples locations $\delta \vec{z}$ on a cubic grid, each sample providing its own condition. For instance, $\delta \vec{z}\in \left\{(-k,-k,-k),...,(k,k,k)\right\}$ provides $(2k+1)^{3}$ constraints for $1/3(m+1)(m+2)(m+3)$ unknowns. Note that some of these sample constraints are linearly dependent, and thus the total number of linearly independent constraint equations is smaller than $(2k+1)^{3}$. More details for this method are given in App. A10. The rigorous condition, (vii)

$\tag{8} \sum _{\left|\alpha \right|+\left|\beta \right|=r}\frac{1}{\alpha !\beta !} \partial ^{1+\alpha } f(\vec{z})\cdot \partial ^{\beta } \vec{w}(\vec{z}) = 0$

assumes that each coefficient of the polynomial in Eq. (7) vanishes simultaneously, see also App. A11. This rigorous condition provides $\binom{n+m+3}{3}$ constraints for $1/3(m+1)(m+2)(m+3)$ unknowns.

Planar constraint in the Fourier domain. Yet another way to use the OFCE is by transforming it into the Fourier domain, which gives the (viii) planar constraint:

$\tag{9} \omega =\vec{k}^{t} \vec{v}.$

The motion $\vec{v}$ observed in the Fourier domain constrains the spectral power to the plane given in Eq. (9), where $\vec{k}=(k_{x} ,k_{y} )^{t}$ denotes the angular frequencies, which correspond to the spatial coordinates $x$ and $y$, and $\omega$ denotes the angular frequency associated with time $t$ (Gafni & Zeevi, 1979; Watson & Ahumada, 1983). Heeger (1988) proposes to sample the power spectrum to find the plane given in Eq. (9), see also App. A4. The general form for $n$ motions $\vec{u}_{i} =(u_{i} ,v_{i} ,w_{i} )$ is (ix)

$\tag{10} \tilde{a}(\vec{u}_{1} )\tilde{a}(\vec{u}_{2} )...\tilde{a}(\vec{u}_{n} )f(\vec{\omega }) = 0.$

This assumes the operator $\tilde{a}(\vec{u})=2\pi i\cdot \vec{u}^{t} \vec{\omega }$ with the spatio-temporal frequencies $\vec{\omega }=(\xi ,\eta ,\tau )^{t}$ to be applied to the transformed image signal (Shizawa & Mase, 1991), see also App. A5.
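The planar constraint can be checked numerically: for a texture translating at one pixel per frame along x, all spectral power of the space-time volume must lie on the plane of Eq. (9). A small self-contained check (with NumPy's DFT sign convention the plane appears as $\omega \equiv -k_x u$ in DFT indices):

```python
import numpy as np

# Random texture translating with u = 1, v = 0 (one pixel per frame).
N = 32
rng = np.random.default_rng(0)
f0 = rng.standard_normal((N, N))
seq = np.stack([np.roll(f0, t, axis=1) for t in range(N)])  # axes (t, y, x)

power = np.abs(np.fft.fftn(seq))**2
w  = np.arange(N)[:, None, None]          # temporal DFT index (omega)
ky = np.arange(N)[None, :, None]          # vertical spatial frequency index
kx = np.arange(N)[None, None, :]          # horizontal spatial frequency index

# Plane omega = k^t v for u = 1, v = 0; the DFT kernel flips the sign.
on_plane = ((w + kx + 0*ky) % N == 0)
ratio = power[on_plane].sum() / power.sum()   # fraction of power on the plane
```

Here `ratio` is numerically 1; a second, transparently overlaid motion would add power on a second plane.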

Hessian. Yet another strategy to address the underdetermined system of the OFCE is to use another input for the continuity equation. Uras et al. (1988) assume that the quantity $h$ in the continuity equation is given by the spatial derivatives of the gray-value function, thus $h=\nabla _{xy} f$. This should be understood as the continuity equation applied to each of the partial derivatives separately. In addition, assuming $\nabla _{xy} \vec{v}=0$ as before, we derive the "Hessian" constraint (x),

$\tag{11} \nabla_{xy} (\partial f/\partial t)+H_{f} \vec{v} = 0.$

We call this the Hessian because $H_{f}$ is the Hessian matrix of the gray-value function $f$ with respect to spatial components $x$ and $y$, see also App. A9.
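Since Eq. (11) provides two equations for the two unknowns, the flow follows from a 2x2 linear solve wherever the Hessian is non-singular; a per-pixel sketch (central-difference derivatives are our choice):

```python
import numpy as np

def hessian_flow(f1, f2, y, x):
    """Solve Eq. (11) at pixel (y, x): H_f (u,v)^t = -grad(df/dt) (sketch)."""
    ft = f2 - f1                           # temporal derivative
    fx  = np.gradient(f1, axis=1)
    fy  = np.gradient(f1, axis=0)
    fxx = np.gradient(fx, axis=1)          # second spatial derivatives
    fxy = np.gradient(fx, axis=0)
    fyy = np.gradient(fy, axis=0)
    H = np.array([[fxx[y, x], fxy[y, x]],
                  [fxy[y, x], fyy[y, x]]])
    g = np.array([np.gradient(ft, axis=1)[y, x],
                  np.gradient(ft, axis=0)[y, x]])
    return np.linalg.solve(H, -g)          # (u, v); needs H non-singular
```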

Constant image phase. Another way of using pre-processed input for the continuity equation is the local image phase $\phi$, which can be computed, e.g. with Gabor filters. Thus the quantity $h=\phi$. Plugging this into the continuity equation (again $\nabla _{xy} \vec{v}=0$) we get the local phase constraint (xi):

$\tag{12} \nabla_{xyt} \phi ^{t} \vec{w}=0.$

This assumes that the local phase $\phi$ is constant over time, whereas local image phase represents the local image structure (Fleet & Jepson, 1990), see also App. A6.
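A one-dimensional illustration of the phase constraint: for a translating signal Eq. (12) reduces to $u = -\phi_t/\phi_x$. Here the local phase is taken from the analytic signal (Hilbert transform) as a simple stand-in for the Gabor filters of Fleet & Jepson:

```python
import numpy as np
from scipy.signal import hilbert

# 1D illustration of Eq. (12): u = -phi_t / phi_x, local phase from the
# analytic signal (an assumption standing in for Gabor filtering).
x = np.arange(256)
speed = 2.0
f0 = np.sin(2*np.pi*x/32)
f1 = np.sin(2*np.pi*(x - speed)/32)       # translated by `speed` pixels

phi0 = np.unwrap(np.angle(hilbert(f0)))
phi1 = np.unwrap(np.angle(hilbert(f1)))
phi_x = np.gradient(phi0)                 # spatial phase derivative
phi_t = phi1 - phi0                       # temporal phase derivative
u = -phi_t / phi_x
estimate = np.median(u[32:-32])           # interior, robust to edge effects
```

Because phase varies linearly for this sinusoid, `estimate` recovers the true speed; for a texture of constant orientation the phase is constant along the texture and the same ambiguity as in the OFCE appears.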

Orientation tensor. Another constraint equation, which has no obvious connection to the previously discussed constraints (i)-(xi), is the orientation tensor. The idea is that a segment of a video is a stack of images in which gray-value structures have certain orientations. The orientation in the xy-subspace is an indicator for the orientation of structure in space. In contrast, the orientation of structure in the xt-subspace or yt-subspace relates to the image velocities u and v, respectively. Thus, estimating the structure orientation in these two subspaces, or a combination thereof, allows for the estimation of optic flow. Farnebäck (2000) defines the orientation tensor $T$ using the minimum residuals between the input signal, here the image stream $f$, and defined basis vectors, computed with a least-squares approach. This tensor describes the movement of image structure using the chosen basis set and is a 3 $\times$ 3 matrix. The tensor constraint (xii)

$\tag{13} \vec{w}^{t} T\vec{w}=0$

can be interpreted as a measure of support for the motion $\vec{w}$. Replacing the motion vector $\vec{w}=(u,v,\; 1)^{t}$ by a parametrized model $\vec{w}=S\vec{p}$ we get the formulation: $\vec{p}^{t} S^{t} TS\vec{p}=0$, where the vector $\vec{p}$ contains the model's parameters. Further details on the method are given in App. A3. Note that tensor-based approaches have difficulties estimating large displacements; using a coarse-to-fine multiscale approach can overcome these limitations (Lauze et al., 2004).

Regularization approaches for the estimation of optic flow

Another methodology to solve the ill-posed problem of optic flow estimation using the OFCE is to add regularity constraints. This can be done by minimizing a functional of the form:

$\tag{14} \min _{u,v} \int _{\Omega }\psi _{1} \left(\partial _{x} f\cdot u+\partial _{y} f\cdot v+\partial _{t} f\right)+\psi _{2} \left(f,u,v\right) \; d\vec{x}$

where the solution is constrained through a data fidelity term (the OFCE) and a regularization term, which penalizes variations of the optic flow. The functions $\psi _{1}$ and $\psi _{2}$ are convex robust estimators for the data term and the regularization term, respectively, and $\Omega$ denotes the 2D integration area, usually the region defined by the image plane. To solve the variational formulation Eq. (14), which is written in a continuous setting, the general approach is to compute the Euler-Lagrange equation and then to discretize the partial differential equation (PDE) in order to find a numerical solution (see Aubert & Kornprobst (2006) for a review on variational approaches and PDEs in image processing, in particular Chapter 5.3.2 for the estimation of optic flow). The functions $\psi _{i}$ characterize the smoothness of the solutions, and many possible functions have been investigated: regularization can be image-driven or flow-driven, homogeneous or inhomogeneous, and isotropic or anisotropic (Weickert & Schnörr, 2001; Bruhn et al., 2006) -- note that this nomenclature is also used in the context of image diffusion processing (Weickert, 1996). To illustrate this variety, this section presents several examples of variational approaches to the problem of estimating optic flow. For this reason, we kept the argument of $\psi_2$ general, passing information about the gray-value function $f$ and the optic flow $\vec{v}=\left(u,v\right)$.

Homogeneous regularizer. This regularizer is defined by

$\tag{15} \min _{u,v} \int _{\Omega }\left\| \nabla u\right\| _{2}^{2} +\left\| \nabla v\right\| _{2}^{2} \; d\vec{x},$

which was introduced by Horn & Schunck (1981). The partial derivatives of each flow component are enforced to be small. Bruhn et al. (2006) call this constraint homogeneous because smoothness is imposed globally and equally at all spatial locations. Combined with the OFCE, the energy functional in Eq. (14) with $\psi _{1}$ and $\psi _{2}$ as identity functions can be solved using the calculus of variations. Further details are given in App. A7.
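The Euler-Lagrange equations of this functional lead to the classic iterative scheme of Horn & Schunck, in which each flow component is repeatedly pulled toward its neighborhood average and corrected by the OFCE residual. A minimal sketch (the neighborhood average here is a 3x3 box filter including the center pixel, a slight simplification of the original weighted average):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def horn_schunck(f1, f2, alpha=1.0, n_iter=200):
    """Horn & Schunck (1981) flow; minimal sketch of the classic iteration."""
    fx = np.gradient(f1, axis=1)
    fy = np.gradient(f1, axis=0)
    ft = f2 - f1
    u = np.zeros_like(f1)
    v = np.zeros_like(f1)
    for _ in range(n_iter):
        u_bar = uniform_filter(u, size=3)   # local neighborhood average
        v_bar = uniform_filter(v, size=3)
        num = fx*u_bar + fy*v_bar + ft      # OFCE residual at the average
        den = alpha**2 + fx**2 + fy**2
        u = u_bar - fx*num/den
        v = v_bar - fy*num/den
    return u, v
```

For an exactly translating periodic pattern the iteration converges toward the true constant flow; `alpha` trades off data fidelity against smoothness.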

Image-driven isotropic regularizer. This regularizer uses the squared magnitude of the image gradient to control the smoothness (Alvarez et al., 1999):

$\tag{16} \min _{u,v} \int _{\Omega }\psi \left(\left\| \nabla _{xy} f\right\| _{2}^{2} \right)\cdot \left(\left\| \nabla u\right\| _{2}^{2} +\left\| \nabla v\right\| _{2}^{2} \right)\; d\vec{x} \text{ with } \psi (s^{2} )=\frac{1}{\sqrt{1+\frac{s^{2} }{\eta ^{2} } } },$

where $\eta$ is a parameter that steers the smoothness. For a large image gradient, as occurs at edges, the enforcement of smoothness is reduced. Thus, Eq. (16) prevents smoothing over edges; for homogeneous surfaces, where the image gradient is small, smoothness is strongly enforced.

Image-driven anisotropic regularizer. This regularizer integrates direction and magnitude information from the gray-value function. Because it integrates direction information, this constraint is called anisotropic (Bruhn et al., 2006). Smoothing appears only along an edge and not across it. In addition, smoothness is attenuated in the presence of an edge and strengthened in homogeneous regions. Combining these desired properties, Nagel & Enkelmann (1986) proposed

$\tag{17} \min _{u,v} \int _{\Omega }\left(\nabla u\right)S_{NE} \left(\nabla u\right)^{t} +\left(\nabla v\right)S_{NE} \left(\nabla v\right)^{t} \; d\vec{x}$ with $\tag{18} S_{NE} =\frac{1}{\left\| \nabla _{xy} f\right\| _{2}^{2} +\kappa ^{2} } \left(\begin{array}{cc} {\left(\partial _{y} f\right)^{2} +\kappa ^{2} } & {-\left(\partial _{x} f\right)\left(\partial _{y} f\right)} \\ {-\left(\partial _{x} f\right)\left(\partial _{y} f\right)} & {\left(\partial _{x} f\right)^{2} +\kappa ^{2} } \end{array}\right),$

where $\kappa$ is a parameter that controls the smoothness.
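The matrix $S_{NE}$ of Eq. (18) is straightforward to evaluate per pixel; a sketch illustrating its edge behavior (derivative filters and the array layout are our choices):

```python
import numpy as np

def nagel_enkelmann_tensor(f, kappa=0.1):
    """S_NE of Eq. (18) for every pixel; returns an array of 2x2 matrices."""
    fx = np.gradient(f, axis=1)
    fy = np.gradient(f, axis=0)
    norm = fx**2 + fy**2 + kappa**2
    S = np.empty(f.shape + (2, 2))
    S[..., 0, 0] = fy**2 + kappa**2        # weight on variation across x
    S[..., 0, 1] = -fx * fy
    S[..., 1, 0] = -fx * fy
    S[..., 1, 1] = fx**2 + kappa**2        # weight on variation across y
    return S / norm[..., None, None]
```

For a pure horizontal gray-value ramp ($f_x = 1$, $f_y = 0$) the tensor is approximately diag($\kappa^2/(1+\kappa^2)$, 1): flow variation across the vertical edge is barely penalized, while smoothing along the edge remains fully enforced.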

Flow-driven isotropic regularizer. This approach applies a convex, robust estimator to the squared magnitudes of the derivatives of each flow component:

$\tag{19} \min _{u,v} \int _{\Omega }\psi _{S} \left(\left\| \nabla u\right\| _{2}^{2} +\left\| \nabla v\right\| _{2}^{2} \right)\; d\vec{x} \text{ with } \psi _{S} (s^{2} )=\sqrt{s^{2} +\lambda ^{2} },$

where $\lambda$ is a parameter that controls the smoothness. The function $\psi _{S}$ is an example used by Bruhn et al. (2006); however, various other robust functions have been proposed. Cohen (1993), Deriche et al. (1995), Kumar et al. (1996), Aubert et al. (1999), and Zach et al. (2007) suggest using an L1 norm instead of the squared Euclidean norm, i.e. $\psi _{S} (s^{2} )=\sqrt{s^{2} }$ (total variation).

Flow-driven anisotropic regularizer. In contrast to the isotropic case, which includes information about the magnitude of the flow only, this regularizer includes information about the direction of the flow as well. Weickert & Schnörr (2001) proposed the following constraint

$\tag{20} \min _{u,v} \int _{\Omega }tr\left\{\psi _{S} \left(\left(\nabla u\right)^{t} \left(\nabla u\right)+\left(\nabla v\right)^{t} \left(\nabla v\right)\right)\right\}\; d\vec{x} \text{ with } \psi _{S} (s^{2} )=\sqrt{s^{2} +\lambda ^{2} }.$

Again, $\lambda$ is a parameter that controls the smoothness. Note that $\left(\nabla u\right)^{t} \left(\nabla u\right)$ and $\left(\nabla v\right)^{t} \left(\nabla v\right)$ are outer products that define matrices, which are added. The robustness function $\psi _{S}$ is applied to each entry of the matrix. The operator $tr\left\{\; \cdot \; \right\}$ computes the trace. The matrix $\left(\nabla u\right)^{t} \left(\nabla u\right)+\left(\nabla v\right)^{t} \left(\nabla v\right)$ has two eigenvectors which define the contrast of the vector-valued function $\vec{v}=\left(u,v\right)$.

Several authors propose to use the divergence and curl of the vector field $\vec{v}=\left(u,v\right)$ as a regularizer

$\tag{21} \min _{u,v} \int _{\Omega }\psi _{DC} \left(div(\vec{v}),curl(\vec{v})\right)\; d\vec{x}$

with $\psi _{DC}$ a generic robustness function. Suter (1994) and Gupta & Prince (1996) use $\alpha \cdot \left\| div(\vec{v})\right\| _{2}^{2} +\beta \cdot \left\| curl(\vec{v})\right\| _{2}^{2}$ as argument with $\psi _{DC}$ as identity function and parameters $\alpha$ and $\beta$.
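Divergence and curl of a discrete flow field are easily computed; a sketch, verified on a pure expansion and a pure rotation field:

```python
import numpy as np

def div_curl(u, v):
    """Divergence and curl of the flow field (u, v) via central differences."""
    ux = np.gradient(u, axis=1); uy = np.gradient(u, axis=0)
    vx = np.gradient(v, axis=1); vy = np.gradient(v, axis=0)
    return ux + vy, vx - uy
```

A regularizer in the spirit of Suter (1994) would then penalize $\alpha \left\| div(\vec{v})\right\|_2^2 + \beta \left\| curl(\vec{v})\right\|_2^2$ summed over the image.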

Most contours are formed by rigid objects in the scene, thus, locally they move in a coherent manner. This observation can be used to define the contour-based smoothness

$\tag{22} \min \int _{C}\left\| \frac{\partial \vec{v}_{n} (s)}{\partial s} \right\| \; ds$

with $\vec{v}_{n} (s)$ as the normal flow along contour $C$ (Hildreth, 1984), where $s$ is the parameter of the curve. This integral measures the variation of the normal flow along the curve, which is minimized in combination with a data term that estimates the normal flow, e.g. constraint (iv). More details can be found in App. A8.

Recent techniques

We discuss various recent techniques, selected for their diversity. The presentation follows a chronological order as much as possible. Note that several of the cited works, in particular the most recent ones, build upon earlier techniques.

Multi-grid with backward warping for faster convergence. In the computer vision community, numerical multi-grid solvers are known as pyramidal approaches (e.g. Black, 1992). For the estimation of flow, typically a backward warping strategy is used: the later frame is warped toward the earlier frame using the current estimate of optic flow. This warping happens at each level, going from a coarser to a finer grid. The final optic flow is the sum of all estimated flows for each level, scaled by the ratio between levels.
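The warping step itself can be sketched in a few lines (bilinear sampling via SciPy's `map_coordinates` is an implementation choice, not prescribed by the methods discussed):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def backward_warp(frame2, u, v):
    """Warp the later frame toward the earlier one using the current flow.

    Samples frame2 at (x + u, y + v) with bilinear interpolation; a sketch
    of the warping step applied at each pyramid level.
    """
    h, w = frame2.shape
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    return map_coordinates(frame2, [yy + v, xx + u], order=1, mode='nearest')
```

If the flow is correct, the warped later frame matches the earlier frame, and only a small residual flow remains to be estimated at the finer level.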

Robustness through m-functions. Bruhn et al. (2005) suggest a combination of a local data constraint and a global regularization constraint, both encapsulated in m-functions, together with a multi-grid numerical optimization method to estimate optic flow robustly.

Constancy of gray-value derivatives or matching of descriptors estimates large displacements. Brox et al. (2004) use an image gradient constancy assumption in addition to the OFCE. They also employ previously discussed techniques: m-functions for the data and regularization terms, backward warping of image frames, and multi-grids. A refinement of this idea uses descriptors instead of the constancy of gray-value derivatives. These descriptors, extracted from the image, help to match large image displacements (Brox & Malik, 2011).

Robustness through the L1 norm. Instead of m-functions, the L1 norm is used for the data and regularization terms to achieve a robust estimate of optic flow (Deriche et al., 1995; Zach et al., 2007). In an L1 formulation, both the data term and the regularization term are not continuously differentiable. To sidestep this problem, Zach et al. (2007) use auxiliary variables that are linked to the optic flow through a squared Euclidean distance, which is differentiable. Two optimization problems are solved, one for the optic flow variables and one for the auxiliaries; each contains only one term that is not continuously differentiable. This enhances robustness and decreases computation time.
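The subproblem in the auxiliary variables has a closed-form, pixel-wise solution by case distinction. A hedged numpy sketch (notation ours: `u` is the current flow, `rho_u` the linearized data residual at `u`, `grad` the spatial image gradient, and `lam_theta` the product of the data weight and the coupling parameter):

```python
import numpy as np

def data_step(u, rho_u, grad, lam_theta):
    """Minimize (1/(2*theta))*|v - u|^2 + lam*|rho(v)| pointwise, where
    rho(v) = rho_u + grad . (v - u) is the linearized data residual."""
    g2 = np.sum(grad * grad, axis=-1)
    step = np.where(rho_u < -lam_theta * g2, lam_theta,        # move along +grad
           np.where(rho_u > lam_theta * g2, -lam_theta,        # move along -grad
                    -rho_u / np.maximum(g2, 1e-12)))           # set rho(v) = 0
    return u + step[..., None] * grad

# Small residual: the update cancels the linearized residual exactly.
v1 = data_step(np.zeros(2), 0.5, np.array([1.0, 0.0]), 1.0)
# Large residual: the update is clipped to a fixed step (the L1 case).
v2 = data_step(np.zeros(2), 2.0, np.array([1.0, 0.0]), 1.0)
```

The other subproblem, in the flow variables, is a total-variation denoising of the auxiliaries and is solved by its own iteration in Zach et al.'s scheme.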

Learning from failures of the gray-value constancy assumption. More general approaches learn violations of the brightness constancy assumption from given intensity functions and their associated optic flow, in combination with the spatial characteristics of optic flow (Sun et al., 2008).

Voting and median filtering handle flow discontinuities. Tensor voting improves the handling of flow discontinuities compared to variational optic flow estimation: Anisotropic stick tensor voting is applied to the data term and combined with an anisotropic regularization term, which by definition contains directional information (Rashawan et al., 2013). Another approach handles flow discontinuities with a median filtering interspersed into the backward warping step of the flow solution between levels of an image pyramid (Sun et al., 2010).
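The median-filtering step itself is straightforward to sketch with scipy (filter size and border handling are our choices, not necessarily those of Sun et al.):

```python
import numpy as np
from scipy import ndimage

def denoise_flow(u, v, size=3):
    """Median-filter each flow component separately; this removes outlier
    vectors while preserving motion discontinuities better than averaging."""
    return (ndimage.median_filter(u, size=size),
            ndimage.median_filter(v, size=size))

# A constant flow field with a single outlier vector at the center.
u = np.ones((9, 9))
v = -np.ones((9, 9))
u[4, 4], v[4, 4] = 10.0, 10.0
u_f, v_f = denoise_flow(u, v)
```

Interspersed between pyramid levels, this cheap step prevents single bad vectors from being propagated and amplified by the warping.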

Filling-in through an anisotropic regularization term. Filling-in at a spatio-temporal location happens only if the data term does not provide information about the solution of the flow vector at that location. This is achieved by automatically determining the weight of the regularization term (Zimmer et al., 2011).

Benchmarks

A measure for the accuracy of the estimated flow $(u_{est} ,v_{est} )^{t}$ compared to the ground-truth flow $(u_{gt} ,v_{gt} )^{t}$ is the angular error measure (Fleet & Jepson, 1990; Barron et al., 1994): $\tag{23} \psi _{E} =\arccos \left(\frac{(u_{gt} ,\; v_{gt} ,\; 1)^{t} (u_{est} ,\; v_{est} ,\; 1)}{\sqrt{u_{gt}^{2} +v_{gt}^{2} +1} \cdot \sqrt{u_{est}^{2} +v_{est}^{2} +1} } \right)$ Note that this error measure is also defined for vectors of zero length.
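Eq. (23) translates directly into code; clipping the argument of arccos against round-off is our addition:

```python
import numpy as np

def angular_error_deg(u_gt, v_gt, u_est, v_est):
    """Angular error of Eq. (23) in degrees; also defined for zero-length flow."""
    num = u_gt * u_est + v_gt * v_est + 1.0
    den = np.sqrt(u_gt**2 + v_gt**2 + 1.0) * np.sqrt(u_est**2 + v_est**2 + 1.0)
    return np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))
```

The appended constant 1 embeds the flows as 3D vectors, which is what keeps the measure defined when one or both flow vectors vanish.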

Barron et al. (1994). For the evaluation of different algorithms, Barron et al. used the following eight image sequences: “Sinusoidal Inputs”, “Translating Squares”, “3D Camera Motion and Planar Surface”, “Yosemite Sequence”, “SRI Sequence”, “NASA Sequence”, “Rotating Rubik Cube”, and the “Hamburg Taxi Sequence”. Implementations of the algorithms and the test sequences can be retrieved from ftp.csd.uwo.ca in the directory pub/vision/ (accessed Dec 2012). Some of these sequences are also available at Michael Black's webpage (accessed Dec 2012). McCane et al. (2001) evaluate largely the same algorithms as Barron et al. (1994); however, they use a different error metric and different image sequences.

Baker et al. (2011) present an evaluation of methods estimating optic flow for the following 12 sequences of the Middlebury database: “Army”, “Mequon”, “Schefflera”, “Wooden”, “Grove”, “Urban”, “Yosemite”, “Teddy”, “Backyard”, “Basketball”, “Dumptruck”, and “Evergreen”. Anyone can test their method by computing optic flow for these sequences and submitting the results to the Middlebury benchmark (accessed Nov 2012). The database includes the following challenges for optic flow estimation: fast motions, in particular of fine structures, many occlusions, motion discontinuities, independently moving objects, realistic textures, and illumination changes. In comparison to Barron et al.'s benchmark, Baker et al.'s includes sequences with fast image speeds or large displacements and with independently moving objects.

Run-time performance

The calculation of optic flow in (close to) real-time has been claimed over the last two decades. Bülthoff and colleagues state: “…to compute the optical flow in near-real time for natural images (256 × 256 pixels) on the Connection Machine system.” (Bülthoff et al., 1989, p. 549). Bruhn and colleagues gave a more quantitative statement about real-time performance: Their algorithm computed ~12 fps at a resolution of 160 × 120 pixels on a 3.06 GHz Pentium 4 CPU using FAS, a bidirectional full multi-grid method for nonlinear flow-driven isotropic regularization (Bruhn et al., 2006, p. 271). Zach and colleagues achieved ~17 fps for 256 × 256 pixels with their duality based approach for real-time TV-L1 optical flow estimation running with 100 iterations, using a GeForce Go 7900 GTX operating Linux with OpenGL graphics drivers and the Cg 1.5 toolkit (Zach et al., 2007, Table 1 on p. 221). Although none of these algorithms achieves real-time processing at a common resolution of, e.g., 640 × 480 pixels and a frame rate of 30 Hz, improvements in processing speed and in the quality of the estimated optic flow have been made over the last two decades. An up-to-date performance evaluation for a benchmark set of sequences is provided by researchers at Middlebury College.

Optic flow for guidance of locomotion and scene parsing

Optic flow provides limited information about the currently captured scene and about locomotion, limited because of the 3D-to-2D projection. But the movement of the sensor regains some of the 3D information that is inaccessible from a single 2D image. Marr calls this additional information the 2½D sketch (Marr, 1982, p. 37).

Figure 3: shows a model for optic flow called the motion field. The observer moves above a ground-plane with a translational velocity $\vec{v}$ and a rotational velocity $\vec{\omega }$. This leads to the movement $\dot{\vec{P}}(t)$ of a sample location $\vec{P}(t)$. Under perspective projection, this movement is expressed in the image plane by $\dot{\vec{p}}(t)$.

Optic flow allows for the estimation of depth relative to the speed of the linear velocity, of the direction of the linear velocity, and of the rotational velocity around each of the axes in 3D. A common model for optic flow is the motion field proposed by Longuet-Higgins & Prazdny (1980), which depends on the depth $Z$ of the scene as registered at the pixels $\vec{x}=(x,y)^{t}$ of the camera's image plane, the camera's linear velocity $\vec{v}=(v_{x} ,v_{y} ,v_{z} )^{t}$, and its rotational velocity $\vec{\omega }=(\omega _{x} ,\omega _{y} ,\omega _{z} )^{t}$. Figure 3 illustrates the definition of the various identifiers in the motion field model. Note that $\vec{v}$ denotes the translational velocity, not the image velocity. For a pinhole camera with focal length $f$ that projects 3D points $P=(X,Y,Z)^{t}$ onto the 2D image plane as $\vec{x}=(x,y)^{t} =f/Z\cdot (X,Y)^{t}$, the model equation for instantaneous or differential motion reads:

$\tag{24} \left(\begin{array}{c} {\dot{x}} \\ {\dot{y}} \end{array}\right)=\underbrace{\frac{1}{Z} \left(\begin{array}{ccc} {-f} & {0} & {x} \\ {0} & {-f} & {y} \end{array}\right)\left(\begin{array}{c} {v_{x} } \\ {v_{y} } \\ {v_{z} } \end{array}\right)}_{{\rm translational\; flow}} +\underbrace{\frac{1}{f} \left(\begin{array}{ccc} {x\cdot y} & {-(f^{2} +x^{2} )} & {f\cdot y} \\ {(f^{2} +y^{2} )} & {-x\cdot y} & {-f\cdot x} \end{array}\right)\left(\begin{array}{c} {\omega _{x} } \\ {\omega _{y} } \\ {\omega _{z} }\end{array}\right)}_{{\rm rotational\; flow}}$

In Eq. (24), $f$ denotes the focal length, not the gray-value function as, e.g., in Eq. (2). The left-hand side of Eq. (24) corresponds to the variables used, e.g., in Box 1, where $\dot{x}\equiv u$ and $\dot{y}\equiv v$. The model in Eq. (24) shows three important characteristics. First, translational flow, the 1${}^{st}$ term on the right-hand side of Eq. (24), and rotational flow, the 2${}^{nd}$ term, superimpose linearly. Second, the depth $Z$ influences only the translational flow; the rotational flow is independent of the depth $Z$. Third, the symmetry axis for translational flow is $y=x$ and that of rotational flow is $y=-x$. Figure 4 illustrates these characteristics and other properties. Various methods (Bruss & Horn, 1983; Rieger & Lawton, 1985; Jepson & Heeger, 1992; Kanatani 1993; Perrone, 1992; Perrone & Stone, 1994; Lappe & Rauschecker, 1993; Ma et al., 1998; Chiuso et al., 2000; Royden, 2002; Pauwels & Van Hulle, 2006; Raudies & Neumann, 2009) estimate relative depth, the direction of translation, or rotational velocities from optic flow. Raudies & Neumann (2012) give an overview and a performance evaluation of methods that estimate velocity parameters from optic flow. In addition, direct methods have been developed that estimate velocities from video without estimating optic flow (Negahdaripour & Horn, 1987; Horn & Weldon, 1988).
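The motion field of Eq. (24) is easy to evaluate numerically. The sketch below follows the equation term by term and reproduces, e.g., the depth-independence of the rotational flow:

```python
import numpy as np

def motion_field(x, y, Z, v, w, f=1.0):
    """Image velocities (xdot, ydot) of Eq. (24).

    x, y: image coordinates (scalars or arrays); Z: depth per location;
    v = (vx, vy, vz): linear velocity; w = (wx, wy, wz): rotational velocity;
    f: focal length.
    """
    vx, vy, vz = v
    wx, wy, wz = w
    # Translational flow: scaled by the inverse depth 1/Z.
    xdot_t = (-f * vx + x * vz) / Z
    ydot_t = (-f * vy + y * vz) / Z
    # Rotational flow: independent of the depth Z.
    xdot_r = (x * y * wx - (f**2 + x**2) * wy + f * y * wz) / f
    ydot_r = ((f**2 + y**2) * wx - x * y * wy - f * x * wz) / f
    return xdot_t + xdot_r, ydot_t + ydot_r
```

Evaluating it on a pixel grid with the depth maps from Figure 4 generates the flow patterns shown there, e.g. a radial expansion around the FOE for pure forward translation.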

Figure 4: shows example flows generated with the motion field model. All plots use a pinhole camera with the image plane $x,y\in [-1cm,\; 1cm]$ and focal length $f=1cm$. The back-plane appears at $d=10 m$ distance, i.e. $Z(x,y)=d$. The ground-plane is $d=-10 m$ away, measured along the direction $\alpha$=-30${}^\circ$ down from the horizontal. For this slanted view of a ground-plane the distance is computed as $Z(y)=df/(y\cos (\alpha )+f\sin (\alpha ))$. Velocity components that are not reported are set to zero. a) Flow for the forward translation $v_{z} =1m/\sec$ and b) for the translation $v_{x} =0.45m/\sec$ and $v_{z} =0.89m/\sec$. In both cases the focus of expansion (FOE) appears on the horizontal and marks the point from which all vectors radially expand. c) Uses the same translation as in b), but with a ground-plane instead of a back-plane. The slanted ground-plane introduces an additional speed gradient along the vertical direction, compare with Eq. (17). d) Flow for the pitch rotation $\omega _{x} =3{}^\circ /\sec$. e) Flow for the roll rotation $\omega _{z} =9{}^\circ /\sec$, which shows a center of rotation (COR) around which all flow vectors orbit. f) Flow for curvilinear motion defined by $v_{x} =0.71m/\sec$, $v_{z} =0.71m/\sec$, and $\omega _{y} =3{}^\circ /\sec$ shows a center of motion (COM), a shifted version of the FOE.

An active observer who can rotate eyes and head independently of body movement might use these additional degrees of freedom to generate a stream of visual input that simplifies the task at hand. For instance, only translational flow depends on the linear velocity; to determine the linear velocity from flow, an observer might avoid rotational motions. This strategy could be used to guide locomotion by optic flow cues. A strategy for scene parsing is complementary in terms of the involved degrees of freedom. To segment objects that appear at different depths, the observer might use a combination of translation and rotation. If both occur, far objects produce mainly rotational flow and close objects produce mainly translational flow. Often, this leads to strong differences in the speed and direction of optic flow vectors at the border between these objects. This co-occurrence of translation and rotation makes a segmentation of objects from background easier than using flow generated by translation or rotation alone. Such strategies can be inferred from data in a virtual environment (Rothkopf & Ballard, 2009), taken together with studies on the trajectories of a person approaching a target or avoiding an obstacle (Fajen & Warren, 2003); see Raudies et al. (2012) for further details.

For immediate action, fast, crude, and qualitative estimates are better than erroneous estimates that take too long. Although many modern methods for the estimation of optic flow and self-motion velocities from optic flow include robust estimators and multiple constraints, their success under several natural input conditions is not guaranteed. For optic flow estimation Baker et al. (2011) conclude from their analysis of most common methods: “… no single method is yet able to achieve strong performance across a wide variety of datatypes.” (p. 28). It might be that no single method will be able to compute dense optic flow accurately for each position or pixel in the visual field. We want to suggest that such performance is also not required for most common and natural tasks.

A set of models links the estimation of optic flow and self-motion velocities more closely to visual guidance behavior. Grossberg et al. (1999) suggest the estimation of heading, the direction of the linear velocity, from a basis set of flow patterns that cover a large region of the visual field. This estimation happens in a log-monopole mapped coordinate space, which models the transform from visual space to cortical space, see also App. A15. Elder et al. (2009) extended the model of Grossberg et al. (1999) to explain data on visual steering behavior for obstacle avoidance and target approach from Fajen & Warren (2003). Browning et al. (2009) extended these models by adding a stage that estimates optic flow and demonstrated the model on real-world video. Layton et al. (2012) proposed a model for visually guided navigation in the presence of independently moving objects. In conclusion, some models use optic flow from image sequences directly for visual guidance.

Optic flow in neuroscience

Neuroscience studies the perception of optic flow mainly with techniques of single cell physiology and functional magnetic resonance imaging (fMRI).

Single cell physiology

Various brain areas contain cells responsive to visual motion. Hubel & Wiesel (1968) describe cells in the monkey's striate cortex that are selective for bar movements. Cells are also selective for moving lines of variable length, a property called end-stopping, or for the line ends of such moving lines (Pack et al., 2003; Yazdanbakhsh & Livingstone, 2006). Motion detection based on such line ends is not affected by the ambiguity of motion for an edge, which is imposed by an aperture of limited size. A model network of two end-stopped cells can establish selectivity for curvature and for the sign of curvature (Dobbins et al., 1987). Aside from direction selectivity, neurons in V1 are speed tuned: Their firing rate is invariant with respect to the spatial frequency of a presented sinusoidal grating, but varies with the contrast of the grating (Priebe et al., 2006).

Signals of these motion selective cells are spatially integrated in the brain's middle temporal (MT) area, which succeeds area V1 in the hierarchy of brain areas. This spatial integration is expressed by varying receptive field sizes: Receptive fields in area MT are about ten times as large as those in area V1 (Angelucci et al., 2002). Area MT cells span a wider range of speed selectivity (Priebe et al., 2006; their Figure 8) and have a broader direction tuning (mean 95°) compared to V1 cells (mean 68°) (Albright, 1984). Some MT cells have a center-surround interaction in which the surround plays an antagonistic role (Xiao et al., 1997; Orban, 2008). The shape of the surround is either circularly symmetric (20%), one-sided (50%), or bilaterally arranged (25%). Such a configuration with an antagonistic surround makes these cells suitable for computing partial derivatives of motion signals. Another function is the detection of pop-out motion, e.g. a small leftward moving patch in front of a rightward moving background (Tanaka et al., 1986). Born & Bradley (2005) give an overview of the anatomical structure and selectivity of area MT. In sum, the visual area MT integrates and analyzes visual motion signals, e.g. for kinetic contours introduced by depth discontinuities, independently moving objects (IMOs), or deformations.

The medial superior temporal area (MST) can be divided by function and anatomical location into a lateral and a dorsal part. Cells in the lateral part respond to motion contrasts, stimuli with a small center patch that moves opposite to the background (Eifuku & Wurtz, 1998). This could also be termed the detection of pop-out motion, e.g. of IMOs. In contrast, the dorsal part of area MST contains cells selective for large-field motion stimuli such as optic flow. In particular, cells are selective for the position of the FOE within the visual field, e.g. an FOE that appears in the lower-left corner of the visual field (Duffy & Wurtz, 1995). Other cells are selective for rotation around the optical axis (roll), for movement along the optical axis, or for a combination of these two movements (Graziano et al., 1994). Such movements lead to clockwise / counterclockwise rotational flow, expansion / contraction flow, or spiral flow, respectively. These cell responses qualitatively describe the type of self-motion rather than precisely encoding the six degrees of freedom of a translating and rotating observer.

Functional magnetic resonance imaging (fMRI)

Motion sensitivity in humans has been studied with fMRI. Area MT (or V5 according to Brodmann's labeling) shows sensitivity for visual motion when contrasted with stationary stimuli (Tootell et al., 1995). Area V1 shows motion sensitivity, too, more specifically first-order motion sensitivity. Probing with second-order motion stimuli, which contain a second motion component, e.g. noise, shows high fMRI activation in visual area three (V3), the ventral posterior area (VP), and V5 (Smith et al., 1998). Kinetic contours and static contours activate the kinetic occipital area (Zeki et al., 2003) in humans. Recordings with real-life video input show that visual motion activates V5, V3A, the medial posterior parietal cortex (mPPC), and the lateral occipital cortex (LOC), with the distinction that LOC responded more to differential flow or kinetic contours and mPPC more to global flow (Bartels et al., 2008). Area V6 is selective for coherent motion that appears in a region of the visual field and, thus, might be specialized in detecting IMOs (Pitzalis et al., 2010). In sum, fMRI studies have largely mapped out the same motion-processing areas, or their human homologues, as found in monkeys.

Biologically inspired models for optic flow processing

Another approach to the estimation of optic flow from image sequences takes inspiration from biology. The development of such models is guided by anatomical, physiological, and psychophysical data, as well as by computational goals. This section is therefore closely related to the one on Optic flow in neuroscience; however, it strictly focuses on bio-inspired models of optic flow processing rather than on the presentation of recording data.

Figure 5: shows a de-correlation and a correlation mechanism for the detection of optic flow. a) Schematic of a de-correlation mechanism for one speed and eight directions of motion. All possible shifts $x-\Delta x_{i}$, except for the one to be detected, are de-correlated with the current location $x$. This de-correlation could be realized by inhibitory connections between spatially offset neurons, suppressing the correlations of all non-preferred motions. b) Schematic of the correlation mechanism for motion detection. The registered intensity signal at a spatially offset location $x-\Delta x$ is temporally delayed by $\Delta$ and then correlated with the intensity signal at location $x$. This correlation could be realized through an excitatory synaptic connection between two spatially offset neurons. The spatial offset is indicated in the figure by the distance between the nodes.
Figure 6: shows dot motions as a line in 2D or 3D. a) Assume a dot is moving with the velocity of two pixels per frame. This dot leaves the trace as shown in the 2D space-time plane (dark squares). b) A dot moving in the 2D image plane with the horizontal velocity of one pixel per frame and the vertical velocity of half a pixel per frame generates the depicted trace (dark cubes).

Initial optic flow detection and processing

Various species are skilled in sensing and interpreting optic flow for visual guidance or collision avoidance. For instance, bees balance the flow speeds perceived in the left and right hemifield (Srinivasan et al., 1996). The rabbit's retina detects visual motion employing an inhibition scheme (Barlow & Levick, 1965). Flies use a correlation mechanism to detect motion (Reichardt, 1987). In the following, we present these two well-known mechanisms for the detection of optic flow: inhibition of null-directions and correlation.

The rabbit's retina uses a mechanism of inhibition. All non-preferred motions or all null-directions are inhibited. As a result, excitation remains for the presented motion. Inhibition could be generated by using de-correlation, see Figure 5a.

In contrast, the correlation mechanism used by flies correlates preferred motion. Local image structure is detected at each location in the image. Detected structures from two spatially distant locations are correlated while signals from one location are temporally delayed, see Figure 5b. For instance, if the two locations are 2° of visual angle apart in the visual field and the delay amounts to 200 msec, then a motion speed of 10°/sec is detected in the direction determined by the arrangement of the two spatially distant locations in the visual field.
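A minimal numpy simulation of such an opponent correlation detector (the stimulus, the spatial offset of two pixels, and the delay of two frames are our illustrative choices):

```python
import numpy as np

def reichardt_response(stim, dx=2, dt=2):
    """Mean opponent Reichardt output for a space-time stimulus stim[t, x].

    Each location is correlated with the delayed signal from a spatially
    offset location; the mirror-symmetric subunit is subtracted. Positive
    output signals rightward motion at about dx/dt pixels per frame,
    negative output leftward motion.
    """
    pref = stim[dt:, dx:] * stim[:-dt, :-dx]   # s(t, x) * s(t-dt, x-dx)
    null = stim[dt:, :-dx] * stim[:-dt, dx:]   # s(t, x) * s(t-dt, x+dx)
    return float(np.mean(pref - null))

# Drifting 1D random patterns: content moves 1 pixel per frame.
rng = np.random.default_rng(1)
pattern = rng.standard_normal(200)
rightward = np.stack([np.roll(pattern, t) for t in range(50)])
leftward = np.stack([np.roll(pattern, -t) for t in range(50)])
```

For a pattern drifting in the preferred direction at the matched speed, the delayed and offset signals coincide and the preferred subunit correlates strongly, while the mirror subunit averages to zero.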

An extension of this correlation detector uses filters which detect spatio-temporally varying signals (Adelson & Bergen, 1985; van Santen & Sperling, 1985). Figure 6 shows the generation of such a signal for a 1D and a 2D space. Visualize the motion of a dark dot being recorded over time: The dot creates a line in the 3D cube of stacked images, see Figure 6b. The spatio-temporal filters are designed to pick up these oriented lines within the 3D cube. More details on the method are given in App. A14.
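A simplified numpy sketch of this idea uses quadrature pairs of space-time Gabor filters tuned to opposite directions, whose energies are subtracted. This is a stand-in for the separable filter constructions of Adelson & Bergen (1985); the filter frequencies and bandwidth are our choices:

```python
import numpy as np
from scipy import ndimage

def opponent_energy(stim, k=0.5, w=0.5, sigma=4.0):
    """Opponent spatio-temporal energy for a stimulus stim[t, x].

    Builds quadrature (cosine/sine) space-time Gabors oriented for
    rightward and leftward drift at speed w/k pixels per frame and
    returns mean(E_right - E_left)."""
    t, x = np.mgrid[-10:11, -10:11].astype(float)
    env = np.exp(-(x**2 + t**2) / (2.0 * sigma**2))
    def energy(sign):
        even = ndimage.convolve(stim, env * np.cos(k * x + sign * w * t))
        odd = ndimage.convolve(stim, env * np.sin(k * x + sign * w * t))
        return even**2 + odd**2            # phase-invariant motion energy
    return float(np.mean(energy(-1.0) - energy(+1.0)))

# Drifting gratings: phase k*x - w*t moves rightward, k*x + w*t leftward.
tt, xx = np.mgrid[0:40, 0:80].astype(float)
grating_right = np.cos(0.5 * xx - 0.5 * tt)
grating_left = np.cos(0.5 * xx + 0.5 * tt)
```

The oriented filters respond to the tilted lines that moving structure traces in the space-time volume; squaring and summing the quadrature pair makes the energy independent of stimulus phase.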

Cortex models of spatial and temporal motion integration

Thus far, all model circuits addressed the initial detection of optic flow, e.g. in the retina; however, they do not model the processing of optic flow within the hierarchy of visual areas found in primates. The remainder of this section presents models of such cortical processing that employ either only feedforward connections or feedforward and feedback connections. Simoncelli & Heeger (1998) propose a feedforward processing model of optic flow information for visual area V1 and the middle temporal area (MT). Area V1 detects initial motions by sampling the spatio-temporal frequency space with Gabor filters (Heeger, 1988). Filtering results are squared, half-wave rectified, and then normalized to model the responses of V1 simple cells. Responses of V1 complex cells are computed as the weighted sum of spatially neighboring simple cells with the same frequency and orientation. Model area MT cells sum V1 complex cell responses over all orientations and all frequencies that fall within the plane corresponding to a velocity vector (compare with Eq. (9)). A half-wave rectification, squaring, and normalization over neighboring spatial locations are computed for these model MT cells as well. This feedforward model explains MT cell responses to grating and plaid stimuli (Movshon et al., 1986), the speed tuning curves of MT cells (Rodman & Albright, 1987), and the response activation as a function of dot number and of similar versus different mask direction relative to the background while varying the direction of motion (Snowden, 1991). This model was one of the first to explain data on plaids and gratings.

Nowlan & Sejnowski (1995) propose a cortical network architecture that can process motion transparency, which is the co-existence of multiple velocities in the same visual region. Their retinal stage contains a Difference-of-Gaussians (DoG) filtering, which suppresses noise and sharpens edges and corners in the input stimuli. Motion energy is detected using the model of Adelson & Bergen (1985) for four directions combined with nine spatial and temporal frequencies. The next stage combines these responses by sampling all frequencies that fall into a plane compatible with a single motion (compare with Eq. (9)) and uses eight directions combined with four speeds plus a zero velocity; in total, this gives 33 motion velocities (direction and speed). Filtering results for these velocities are passed through a soft-max operation (see their Eq. (1)), which allows for the representation of motion transparency. The next stage selectively chooses from these soft-max processed filtering results to compute a single velocity estimate as output, combining estimates from all spatial locations. This selection mechanism is trained to minimize the error between estimated and ground-truth velocity using 500 “videos” that show object motion and include cases of motion transparency, which is critical for the model to form a representation of motion transparency. The model explains a variety of data, e.g. the spatio-temporal response properties for sinusoidal gratings of varying spatial and temporal frequency, the responses to transparent random dot stimuli of varying motion direction, and the neural responses to two overlapping gratings while varying the gray-value intensity of the intersecting surfaces (Stoner et al., 1990). A unique feature of this model is the training of a selection mechanism on videos showing object motion.

Neumann's group suggested cortex models for visual motion processing using feedforward and feedback connectivity (Bayerl & Neumann, 2004; Bayerl & Neumann, 2007; Weidenbacher & Neumann, 2009; Beck & Neumann, 2011; Raudies et al., 2011; Bouecke et al., 2011). Common to all models is a general three-step processing cascade defined by (i) feature selection, (ii) feedback coupling, and (iii) normalization. The feature selection stage can be an initial motion detection, e.g. using the motion energy model from Adelson & Bergen (1985), the detection of orientations in the image, or the detection of horizontal disparity for a given image pair. The feedback coupling mechanism is unique to these models: It allows feature activations to be enhanced by feedback only if they are present in the forward processing stream; otherwise, the feedback signal remains without effect. In the third stage, normalization occurs over spatial locations and/or feature dimensions. For the processing of motion transparency, a center-surround kernel is employed for normalization, which allows for the co-existence of multiple motions in the same spatial region and, thus, is similar to the soft-max operation of Nowlan & Sejnowski's (1995) model. Models from Neumann's group have been linked to data from neurophysiology and, in some cases, psychophysics. Some model implementations show technical relevance for the processing of optic flow (Bayerl & Neumann, 2007).

A detailed overview of the mechanisms of motion detection and of integration versus selection is provided by Pack & Born (2008). The review emphasizes these two mechanisms: integration versus selection of motion. A typical selection model is Nowlan & Sejnowski's (1995); a typical integration model is Simoncelli & Heeger's (1998). Both mechanisms show inconsistencies with data. Models that incorporate both mechanisms, so-called hybrid models, seem promising.

Optic flow in psychophysics – Perception and phenomenology

Early work on motion perception focused on the influence of the aperture on the perceived direction of motion. Wallach (1935) describes this as the aperture problem: In most cases, motion seen through a small aperture is perceived as normal to the edge of a line or local surface contour, see Figure 7a. In addition, the perceived motion is influenced by the angle and aspect ratio, e.g. of a rectangular aperture, see Figure 7b. One illusion is the barber's pole illusion: A diagonally striped pattern on a rotating cylinder is seen as moving downward or upward when viewed from a fronto-parallel perspective, see Figure 7c. Illusions like the barber's pole illusion are good stimulus probes to improve our understanding of motion integration within the visual field.

Figure 7: shows examples of the aperture problem, the ambiguity of motion for a line seen through an aperture of limited size. a) A point on a line can have various motions since the point correspondence is ambiguous. Typically, humans perceive a motion normal to the edge. b) However, the perceived motion varies with the geometry of the aperture. In the first example the line appears to move diagonally, according to the normal direction of the line. In the second example the line appears to move diagonally, except for the center part, where it is perceived as moving horizontally. c) In the barber's pole illusion a cylinder with printed bands winding upward is rotating. Although the pole rotates counterclockwise, an upward motion is perceived.

Acknowledgements

We want to express our gratitude to Tim Barnes, Andrew Browning, Oliver Layton, Ennio Mingolla, Heiko Neumann, and two reviewers, who provided helpful comments to improve this article. This work was supported in part by CELEST, a National Science Foundation (NSF) Science of Learning Center (NSF SMA-0835976), by NSF BCS-1147440, and by the Office of Naval Research (ONR) grants N00014-11-1-0535 and MURI N00014-10-1-0936.

References

• Adelson, E.H. and Bergen, J.R. (1985). Spatiotemporal energy models for the perception of motion. Journal of the Optical Society of America 2(2), 284-299.
• Albright, T. (1984). Direction and orientation selectivity of neurons in visual area MT of the macaque. Journal of Neurophysiology 52, 1106-1130.
• Alenya, G., Negre, A., and Crowley, J.L. (2009). A comparison of three methods for measure of time to contact. In Proc. Intelligent Robots and Systems, USA, 4565-4570.
• Aloimonos, Y. and Duric, Z. (1994). Estimating the heading direction using normal flow. International Journal of Computer Vision 13(1), 33-56.
• Alvarez, L., Esclarin, J., Lefebure, M., and Sanchez, J. (1999). A PDE model for computing the optical flow. In Proc. XVI congreso de ecuaciones diferenciales y aplicaciones, Las Palmas de Gran Canaria, Spain, 1349-1356.
• Ahrns, I. and Neumann, H. (1998) Improving phase based disparity estimation by means of filter tuning techniques. In: Proceedings of the DAGM'98, 357-364.
• Angelucci, A., Levitt, J., Walton, E., Hupe, J.-M., Bullier, J., and Lund, J. (2002). Circuits for local and global signal integration in primary visual cortex. Journal of Neuroscience 22, 8633–8646.
• Aubert, G., Deriche, R., and Kornprobst, P. (1999). Computing optic flow via variational techniques. SIAM Journal of Applied Mathematics 60(1), 156-182.
• Aubert, G. and Kornprobst, P. (2006). Mathematical problems in image processing: partial differential equations and the calculus of variations. 2nd edition. Springer, Applied Mathematical Sciences 147.
• Bachrach, A., He, R., and Roy, N. (2009). Autonomous flight in unknown indoor environments. International Journal of Micro Air Vehicles 1(4), 217-228.
• Baker, S., Scharstein, D., Lewis, J.P., Roth, S., Black, M.J., and Szeliski, R. (2011). A database and evaluation methodology for optical flow. International Journal of Computer Vision 92, 1-31.
• Barlow, H.B. and Levick, W.R. (1965). The mechanism of directionally selective units in rabbit's retina. Journal of Physiology 178, 477-504.
• Barron, J., Fleet, D., and Beauchemin, S. (1994). Performance of optical flow techniques. International Journal of Computer Vision 12, 43-77.
• Bartels, A., Zeki, S., and Logothetis, N.K. (2008). Natural vision reveals regional specialization to local motion and to contrast-invariant, global flow in the human brain. Cerebral Cortex 18, 705-717.
• Bayerl, P. and Neumann, H. (2004). Disambiguating visual motion through contextual feedback modulation. Neural Computation 16, 2041-2066.
• Bayerl, P. and Neumann, H. (2007). A fast biologically inspired algorithm for recurrent motion estimation. In IEEE Transactions on Pattern Analysis and Machine Intelligence 29(2): 246-260.
• Beck, C. and Neumann, H. (2011). Combining feature selection and integration - a neural model for MT motion selectivity. PLoS ONE 6(7): e21254, doi:10.1371/journal.pone.0021254.
• Black, M. (1992). Robust incremental optical flow. PhD Thesis. Yale University. YALEU/CSD/RR #923.
• Bouecke, J.D., Tlapale, E., Kornprobst, P., and Neumann, H. (2011). Neural mechanisms of motion detection, integration, and segregation: from biology to artificial image processing systems. In EURASIP Journal on Advances in Signal Processing, doi:10.1155/2011/781561.
• Browning, N.A., Grossberg, S., and Mingolla, E. (2009). Cortical dynamics of navigation and steering in natural scenes: motion-based object segmentation, heading, and obstacle avoidance. Neural Networks 22, 1383-1398.
• Brox, T., Bruhn, A., Papenberg, N., and Weickert, J. (2004). High accuracy optical flow estimation based on a theory for warping. In Proceedings of the 8th European Conference on Computer Vision, LNCS 3042, T. Pajdla and J. Matas (Eds.), vol. 4, 25-36.
• Brox, T. and Malik, J. (2011). Large displacement optical flow: descriptor matching in variational motion estimation. In IEEE Transactions on Pattern Analysis and Machine Intelligence 33(3), 500-513.
• Bruhn, A., Weickert, J., and Schnörr, C. (2005). Lucas/Kanade meets Horn/Schunck: combining local and global optic flow methods. International Journal of Computer Vision 61(3), 211-231.
• Bruhn, A., Weickert, J., Kohlberger, T., and Schnörr, C. (2006). A multigrid platform for real-time motion computation with discontinuity preserving variational methods. International Journal of Computer Vision 70(3), 257-277.
• Bruss, A. and Horn, B. (1983). Passive navigation. Computer Vision, Graphics, and Image Processing 21, 3-20.
• Bülthoff, H., Little, J., and Poggio, T. (1989). A parallel algorithm for real-time computation of optical flow. Nature 337, 549-553.
• Chahine, M. and Konrad, J. (1995). Estimation and compensation of accelerated motion for temporal sequence interpolation. Signal Processing: Image Communication 7, 503-527.
• Chiuso, A., Brockett, R., and Soatto, S. (2000). Optimal structure from motion: local ambiguities and global estimates. International Journal of Computer Vision 39(3), 195-228.
• Cremers, D. and Schnörr, C. (2003). Statistical shape knowledge in variational motion segmentation. Image and Vision Computing 21, 77-86.
• Crowell, J.A. and Banks, M.S. (1993). Perceiving heading with different retinal regions and types of optic flow. Perception & Psychophysics 53(3), 325-337.
• Csillag, P. and Boroczky, L. (1999). Enhancement for displaying temporally subsampled image sequences using different motion models in MC interpolation. In Proc. SPIE 3642, High-Speed Imaging and Sequence Analysis, 32 doi:10.1117/12.348426.
• Cohen, I. (1993). Nonlinear variational method for optical flow computation. In Proc. of the 8th Scandinavian Conference on Image Analysis, vol. 1, World Scientific, River Edge, NJ, 523-530.
• Deriche, R., Kornprobst, P., and Aubert, G. (1995). Optical-flow estimation while preserving its discontinuities: a variational approach. In Proc. 2nd Asian Conference on Computer Vision 2, 290-295.
• Duchon, A.P., Warren, W.H., and Kaelbling, L.P. (1998). Ecological robotics. Adaptive Behavior 6(3/4), 473-507.
• Duchon, A.P. and Warren, W.H. (2002). A visual equalization strategy for locomotion control: Of honeybees, robots, and humans. Psychological Science 13(3), 272-278.
• Duffy, C.D. and Wurtz, R.H. (1995). Response of monkey MST neurons to optic flow stimuli with shifted centers of motion. Journal of Neuroscience 15(7), 5192-5208.
• Dobbins, A., Zucker, S.W., and Cynader, M.S. (1987). Endstopped neurons in the visual cortex as a substrate for calculating curvature. Nature 329, 438-441.
• Effken, J.A. and Shaw, R.E. (1992). Ecological perspectives on the new artificial intelligence. Ecological Psychology 4(4), 247-270.
• Eifuku, S. and Wurtz, R.H. (1998). Response to motion in extrastriate area MSTl: Center-surround interactions. Journal of Neurophysiology 80(1), 282-296.
• Elder, D.M., Grossberg, S., and Mingolla, E. (2009). A neural model of visually guided steering, obstacle avoidance, and route selection. Journal of Experimental Psychology: Human Perception and Performance 35(5), 1501-1531.
• Fajen, B.R. and Kim, N.-G. (2002). Perceiving curvilinear heading in the presence of moving objects. Journal of Experimental Psychology: Human Perception and Performance 28(5), 1100-1119.
• Fajen, B.R. and Warren, W.H. (2003). Behavioral dynamics of steering, obstacle avoidance, and route selection. Journal of Experimental Psychology: Human Perception and Performance 29, 343-362.
• Farnebäck G. (2000). Fast and accurate motion estimation using orientation tensors and parametric motion models. In Proceedings of the 15th International Conference on Pattern Recognition, vol. 1, 135-139.
• Fermüller, C. and Aloimonos, Y. (1993). The role of fixation in visual motion analysis. International Journal of Computer Vision 11:2, 165-186.
• Fleet, D. and Jepson, A.D. (1990). Computation of component image velocity from local phase information. International Journal of Computer Vision 5(1), 77-104.
• Fleet, D. and Weiss, Y. (2005). Optical flow estimation. In Mathematical Models in Computer Vision: The Handbook, Paragios, N., Chen, Y., and Faugeras, O. (Eds), Chapter 15, Springer, pp. 239-258.
• Freeman, W.T. and Adelson, E.H. (1991). The design and use of steerable filters. In IEEE Transactions on Pattern Analysis and Machine Intelligence 13(9), 891-906.
• Gabor, D. (1946). Theory of communication. Part 1: The analysis of information. Journal of the Institution of Electrical Engineers - Part III: Radio and Communication Engineering 93(26), 429-441.
• Gafni, H. and Zeevi, Y. (1979). A model for processing of movement in the visual system. Biological Cybernetics 32, 165-173.
• Gibson, J.J. (1966). The senses considered as perceptual systems. Houghton Mifflin, Boston.
• Gordon, D.A. (1965). Static and dynamic visual fields in human space perception. Journal of the Optical Society of America 55(10), 1296-1303.
• Granlund, G. H. and Knutsson, H. (1995). Signal processing for computer vision. Kluwer.
• Graziano, M.S.A., Andersen, R.A., and Snowden, R.J. (1994). Tuning of MST neurons to spiral motions. Journal of Neuroscience 14(1), 54-67.
• Grossberg, S., Mingolla, E., and Pack, C. (1999). A neural model of motion processing and visual navigation by cortical area MST. Cerebral Cortex 9, 878-895.
• Gupta, S.N. and Prince, J.L. (1996). On div-curl regularization for motion estimation in 3-d volumetric imaging. In Proc. of the International Conference on Image Processing (ICIP), 929-932.
• Haralick, R.M. and Lee, J.S. (1983). The facet approach to optic flow. In: L.S. Baumann (Ed.), Proceedings Image Understanding Workshop (Science Applications, Arlington, VA) 84-93.
• Heeger, D.J. (1988). Optical flow using spatiotemporal filters. International Journal of Computer Vision 1(4), 279-302.
• Helmholtz, von H. (1925). Treatise on physiological optics. Translated from the 3rd German edition. Southall, J.P.C. (Ed.) Dover Publications.
• Hildreth, E.C. (1984). Computations underlying the measurement of visual motion. AI Memo No. 761, Massachusetts Institute of Technology, Artificial Intelligence Laboratory.
• Horn, B.K.P. and Schunck, B.G. (1981). Determining optical flow. Artificial Intelligence 17, 185-203.
• Horn, B.K.P. and Weldon, E.J. (1988). Direct methods for recovering motion. International Journal of Computer Vision 2, 51–76.
• Horn, B.K.P. (1986). Robot vision. MIT Press, ISBN 0-262-08159-8.
• Horn, B.K.P., Fang, Y., and Masaki, I. (2007). Time to contact relative to a planar surface. In Proc. IEEE Intelligent Vehicles Symposium, Turkey. 68-74.
• Hubel, D.H. and Wiesel, T.N. (1968). Receptive fields and functional architecture of monkey striate cortex. Journal of Physiology 195, 215-243.
• Jähne, B. (1993). Spatio-temporal image processing. LNCS 751, Springer. ISBN 0-387-57418.
• Jepson, A. and Heeger, D. (1992). Linear subspace methods for recovering translational direction. Technical Report RBCV-TR-92-40, University of Toronto, Department of Computer Science, 6 Kings College Road, Toronto, Ontario.
• Johnston, A., McOwan, P.W., and Buxton, H. (1992). A computational model of the analysis of some first-order and second-order motion patterns by simple and complex cells. Proceedings of the Royal Society, London B 250, 297-306.
• Karhoff, B.C., Limb, J.I, Oravsky, S.W., and Shephard, A.D. (2006) Eyes in the domestic sky: an assessment of sense and avoid technology for the army's "Warrior" unmanned aerial vehicle. In Proc. IEEE Systems and Information Engineering Design Symposium, 36–42.
• Kanatani, K. (1993). 3D interpretation of optical-flow by renormalization. International Journal of Computer Vision 11(3), 267-282.
• Knutsson, H., Westin, C.-F., and Andersson, M. (2011). Representing local structure using tensors II. In Heyden, A. and Kahl, F. (Eds.): SCIA'2011, LNCS 6688, 545–556.
• Koenderink, J.J. and van Doorn, A.J. (1976). Local structure of movement parallax of the plane. Journal of the Optical Society of America 66(7), 717-723.
• Kumar, A. Tannenbaum, A.R., and Balas, G.J. (1996). Optical flow: a curve evolution approach. IEEE Transactions on Image Processing 5(4), 598-610.
• Lauze, F., Kornprobst, P., and Memin, E. (2004). A coarse to fine multiscale approach for linear least squares optical flow estimation. In Proc. British Machine Vision Conference (BMVC) 2, 767-776.
• Lappe, M. and Rauschecker, J. (1993). A neural network for the processing of optic flow from ego-motion in man and higher mammals. Neural Computation 5, 374-391.
• Layton, O.W., Mingolla, E., and Browning, N.A. (2012). A motion pooling model of visually guided navigation explains human behavior in the presence of independently moving objects. Journal of Vision 12(1), doi: 10.1167/12.1.20.
• Lee, D.N. (1976). A theory of visual control of braking based on information about time-to-collision. Perception 5, 437-459.
• Li, L. and Warren, W.H. (2000). Perception of heading during rotation: sufficiency of dense motion parallax and reference objects. Vision Research 40, 3873-3894.
• Little, J.J., Bulthoff, H.H., and Poggio, T.A. (1988). Parallel optical flow using local voting. In Proceedings of the 2nd International Conference on Computer Vision, Tampa, 454-459.
• Longuet-Higgins, H.C. and Prazdny, K. (1980). The interpretation of a moving retinal image. Proceedings of the Royal Society of London B 208, 385-397.
• Lucas, B.D. and Kanade, T. (1981). An iterative image registration technique with an application to stereo vision. In Proceedings of the Image Understanding Workshop, 121-130.
• Ma, Y., Kosecka, J., and Sastry, S. (1998). Motion recovery from image sequences: discrete viewpoint vs. differential viewpoint. In Burkhardt, H. and Neumann, B. (Eds.), ECCV'98 LNCS 1407, 337-353.
• Marr, D. (1982). Vision. W.H. Freeman and Company, San Francisco.
• McCane, B., Novins, K., Crannitch, D. and Galvin, B. (2001). On benchmarking optical flow. Computer Vision and Image Understanding 84, 126-143.
• Min, C. and Medioni, G. (2008). Inferring segmented dense motion layers using 5D tensor voting. IEEE Transactions on Pattern Analysis and Machine Intelligence 30(9), 1589-1602.
• Mota, C., Stuke, I. and Barth, E. (2001). Analytical solutions for multiple motions. In Proceedings of IEEE International Conference on Image Processing (ICIP'01), Thessaloniki, Greece.
• Movshon, J. A., Adelson, E. H., Gizzi, M. S. & Newsome, W. T. (1986). The analysis of moving visual patterns. In Chagas, C., Gattass R. & Gross, C. (Eds.) Experimental brain research supplementum II: Pattern recognition mechanisms, New York: Springer, 148-164.
• Nagel, H.-H. (1983). Displacement vectors derived from second order intensity variations in image sequences. Computer Vision, Graphics, and Image Processing 21, 85-117.
• Nagel, H.-H. and Enkelmann, W. (1986). An investigation of smoothness constraints for the estimation of displacement vector fields from image sequences. IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-8(5), 565-593.
• Nagel, H.-H. (1987). On the estimation of optical flow: relations between different approaches and some new results. Artificial Intelligence 33, 299-324.
• Negahdaripour, S. and Horn, B.K.P. (1987). Direct passive navigation. IEEE Pattern Analysis and Machine Intelligence PAMI-9(1), 168-176.
• Nowlan, S.J. and Sejnowski, T.J. (1995). A selection model for motion processing in area MT of primates. Journal of Neuroscience 15(2), 1195-1214.
• Nyquist, H. (2002). Certain topics in telegraph transmission theory. Proceedings of the IEEE 90(2), 280-305. Reprinted from Transactions of the A.I.E.E., pp. 617-644, Feb. 1928.
• Orban, G.A. (2008). Higher order visual processing in macaque extrastriate cortex. Physiological Reviews 88(1), 59-89.
• Otte, M. and Nagel, H.-H. (1994). Optical flow estimation: advances and comparisons. Jan-Olof Eklundh (Ed.) ECCV'94, LNCS 800, 51-60.
• Pack, C.C., Livingstone, M.S., Duffy, K.R., and Born, R.T. (2003). End-stopping and the aperture problem: two-dimensional motion signals in macaque V1. Neuron 39, 671-680.
• Pack, C.C. and Born, R.T. (2008). Cortical mechanisms for the integration of visual motion. In Masland, R.H. and Albright, T. (Eds.) The Senses: A Comprehensive Reference 2, 189-218.
• Pauwels, K. and Van Hulle, M. (2006). Optimal instantaneous rigid motion estimation insensitive to local minima. Computer Vision and Image Understanding 104(1), 77-86.
• Perrone, J.A. and Stone, L.S. (1994). A model of self-motion estimation within primate extrastriate visual cortex. Vision Research 34(21), 2917–2938.
• Perrone, J.A. (1992). Model for the computation of self-motion in biological systems. Journal of Optical Society of America A, 9(2), 177–192.
• Pitzalis, S., Sereno, M.I., Committeri, G., Fattori, P., Galati, G., Patria, F., and Galletti, C. (2010). Human V6: the medial motion area. Cerebral Cortex 20, 411-424.
• Poggio, T., Torre, V., and Koch, C. (1985). Computational vision and regularization theory. Nature 317, 314-319.
• Rashwan, H.A., Garcia, M.A., and Puig, D. (2013). Variational optical flow estimation based on stick tensor voting. IEEE Transactions on Image Processing 22(7), 2589-2599.
• Raudies, F. and Neumann, H. (2009). An efficient, linear method for the estimation of ego-motion from optical flow. In Denzler, J., Notni, G., and Suesse, H. (Eds.) DAGM'09 LNCS 5748, 11-20.
• Raudies, F., Mingolla, E., and Neumann, H. (2011). A model of motion transparency processing with local center-surround interactions and feedback. Neural Computation 23(11), 2868–2914.
• Raudies, F. and Neumann, H. (2012). A review and evaluation of methods estimating ego-motion. Computer Vision and Image Understanding 116, 606-633.
• Ramirez-Manzanares, A., Rivera, M., Kornprobst, P., and Lauze, F. (2011). Variational multi-valued velocity field estimation for transparent sequences. Journal of Mathematical Imaging and Vision 40(3), 285-304.
• Reichardt, W. (1987). Evaluation of optical motion information by movement detectors. Journal of Comparative Physiology A 161, 533-547.
• Rieger, J. and Lawton, D.T. (1985). Processing differential image motion. Journal of the Optical Society of America 2(2), 354-359.
• Robson, J.G. (1966). Spatial and temporal contrast-sensitivity functions of the visual system. Journal of the Optical Society of America 56(8), 1141-1142.
• Rodman, H.R. and Albright, T.D. (1987). Coding of visual stimulus velocity in area MT of the macaque. Vision Research 27, 2035-2048.
• Rojer, A.S. and Schwartz, E.L. (1990). Design considerations for a space-variant visual sensor with complex-logarithmic geometry. Proceedings of the International Conference on Pattern Recognition ICPR'90, 278-285.
• Rothkopf, C.A. and Ballard, D.H. (2009). Image statistics at the point of gaze during human navigation. Visual Neuroscience 26, 81-92.
• Royden, C.S., Banks, M.S., and Crowell, J.A. (1992). The perception of heading during eye movements. Nature 360, 583-585.
• Royden, C.S., Crowell, J.A., and Banks, M.S. (1994). Estimating heading during eye movements. Vision Research 34(23), 3197-3214.
• Royden, C.S. and Hildreth, E.C. (1996). Human heading judgments in the presence of moving objects. Perception & Psychophysics 58(6), 836-856.
• Royden, C.S. (2002). Computing heading in the presence of moving objects: a model that uses motion-opponent operators. Vision Research 42, 3043-3058.
• van Santen, J.P.H. and Sperling, G. (1985). Elaborated Reichardt detector. Journal of the Optical Society of America, A 2(2), 300-321.
• Soccol, D., Thurrowgood, S., and Srinivasan, M.V. (2007). A vision system for optic-flow-based guidance of UAVs. In Proc. 9th Australasian Conference on Robotics and Automation. Brisbane, 10-12 December.
• Srinivasan, M.V., Zhang, S.W., Lehrer,M., and Collett, T.S. (1996). Honeybee navigation en route to the goal: visual flight control and odometry. Journal of Experimental Biology 199, 237-244.
• Shizawa, M. and Mase, K. (1991). A unified computational theory for motion transparency and motion boundaries based on eigenenergy analysis. In Proceedings of IEEE Computer Vision and Pattern Recognition Conference CVPR'91, 289-295.
• Simoncelli, E.P., Adelson, E.H., and Heeger, D.J. (1991). Probability distributions of optical flow. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR'91, 310-315.
• Simoncelli, E.P. and Heeger, D.J. (1998). A model of neuronal responses in visual area MT. Vision Research 38(5), 743-761.
• Smith, A.T., Greenlee, M.W., Singh, K.D., Kraemer, F.M., and Hennig, J. (1998). The processing of first- and second-order motion in human visual cortex assessed by functional magnetic resonance imaging (fMRI). Journal of Neuroscience 18(10), 3816-3830.
• Snowden, R. J., Treue, S., Erikson, R. G. & Andersen, R. A. (1991). The response of area MT and V1 neurons to transparent motion. Journal of Neuroscience 11, 2768-2785.
• Stocker, A.A. and Simoncelli, E.P. (2006). Noise characteristics and prior expectations in human visual speed perception. Nature Neuroscience 9, 578–585.
• Stoner, G.R., Albright, T.D., and Ramachandran, V.S. (1990). Transparency and coherence in human motion perception. Nature 344, 153-155.
• Sun, D., Lewis, J.P., and Black, M.J. (2008). Learning optical flow. In Proc. of the European Conference on Computer Vision (ECCV), 83-97.
• Sun, D., Roth, S., and Black, M.J. (2010). Secrets of optical flow estimation and their principles. In Proc. of Computer Vision and Pattern Recognition (CVPR), San Francisco, 2432-2439.
• Suter, D. (1994). Motion estimation and vector splines. In Proc. of the International Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, 939–942.
• Tanaka, K., Hikosaka, K., Saito, H., Yukie, M., Fukada, Y., and Iwai, E. (1986). Analysis of local and wide-field movements in the superior temporal visual areas of the macaque monkeys. Journal of Neuroscience 6(1), 134-144.
• Tootell, R.B.H., Reppas, J.B., Kwong, K.K., Malach, R., Born, R.T., Brady, T.J., Rosen, B.R., and Belliveau, J.W. (1995). Functional analysis of human MT and related visual cortical areas using magnetic resonance imaging. Journal of Neuroscience 15(4), 3215-3230.
• Tretiak, O. and Pastor, L. (1984). Velocity estimation from image sequences with second order differential operators. In Proceedings of the International Conference on Pattern Recognition, Montreal, Que. 16-19.
• Uras, S., Girosi, F., Verri, A. and Torre, V. (1988). A computational approach to motion perception. Biological Cybernetics 60, 79-87.
• Wallach, H. (1935). Über visuell wahrgenommene Bewegungsrichtung. Psychologische Forschung. Band 20.
• Wang, J.Y.A. and Adelson, E.H. (1993). Layered representation for motion analysis. In Proceedings of IEEE Computer Vision and Pattern Recognition Conference CVPR'93, 361-366.
• Warren, W.H. and Kurtz, K.J. (1992). The role of central and peripheral vision in perceiving the direction of self-motion. Perception & Psychophysics 51(5), 443-454.
• Warren, W.H. and Saunders, J.A. (1995). Perceiving heading in the presence of moving objects. Perception 24, 315-331.
• Watson, A.B. and Ahumada, A.J. (1983). A look at motion in the frequency domain. National Aeronautic and Space Administration (NASA), Technical Memorandum 84352.
• Wedel, A., Brox, T., Vaudrey, T., Rabe, C., Franke, U. and Cremers, D. (2011). Stereoscopic scene flow computation for 3d motion understanding. International Journal of Computer Vision 95, 29-51.
• Wei, X.-X. and Stocker, A.A. (2013). Efficient coding provides a direct link between prior and likelihood in perceptual Bayesian inference. In Proc. Advances in Neural Information Processing Systems (NIPS) 25, 1313-1321.
• Weidenbacher, U. and Neumann, H. (2009). Extraction of surface-related features in a recurrent model of V1-V2 interactions. PLoS ONE 4(6): e5909, doi:10.1371/journal.pone.0005909.
• Weickert, J. (1996). Anisotropic diffusion in image processing. PhD Thesis Universität Kaiserslautern.
• Weickert, J. and Schnörr, C. (2001). A theoretical framework for convex regularizers in PDE-based computation of image motion. International Journal Computer Vision 45(3), 245-264.
• Weiss, Y. (1997). Smoothness in layers: motion segmentation using nonparametric mixture estimation. In Proceedings of IEEE Computer Vision and Pattern Recognition Conference CVPR'97, 520-527.
• Weiss, Y. and Fleet, D.J. (2002). Velocity likelihoods in biological and machine vision. In Rao, R.P.N., Olshausen, B.A., and Lewicki, M.S. (Eds.) Probabilistic models of the brain: Perception and neural function. MIT Press, Cambridge, Massachusetts.
• Weiss, Y. and Adelson, E.H. (1998). Slow and smooth: a Bayesian theory for the combination of local motion signals in human vision. Technical Report 1624, MIT AI lab.
• Xiao D. K., Raiguel S., Marcar V., and Orban G.A. (1997). The spatial distribution of the antagonistic surround of MT/V5 neurons. Cerebral Cortex 7(7), 662–77.
• Xu, C., Prince, J.L. (1998). Snakes, shapes, and gradient vector flow. IEEE Transactions on Image Processing 7(3), 359-369.
• Young, D.M. (1950). Iterative methods for solving partial difference equations of elliptical type. PhD thesis, Harvard University.
• Zach, C., Pock, T., and Bischof, H. (2007). A duality based approach for realtime TV-L1 optical flow. In Hamprecht, F.A., Schnörr, C., and Jähne, B. (Eds.): DAGM 2007, LNCS 4713, 214-223.
• Zeki, S., Perry, R.J., and Bartels, A. (2003). The processing of kinetic contours in the brain. Cerebral Cortex 13, 189-202.
• Zimmer, H., Bruhn, A., Weickert, J., Valgaerts, L., Salgado, A., Rosenhahn, B., and Seidel, H.-P. (2009). Complementary optic flow. In Proceedings of Energy Minimization in Computer Vision and Pattern Recognition LNCS 5861, 207-220.
• Zimmer, H., Bruhn, A. and Weickert, J. (2011). Optic flow in harmony. International Journal of Computer Vision 93, 368-388.
• Zufferey, J.-C. and Floreano, D. (2006). Fly-inspired visual steering of an ultralight indoor aircraft. IEEE Transactions on Robotics 22(1), 137-146.

Appendix

A1 Method of Lucas & Kanade (1981)

Lucas & Kanade (1981) assume that all gray-value changes between two frames are caused by the displacement of pixel values by pixel or sub-pixel shifts. This is a simplification of the general case, in which gray-values may also change because of an overall dimming or because of abrupt changes, e.g. scene cuts. Assuming that such changes are rare, the gray-value constancy assumption is formally defined as:

$f(x+\Delta x,y+\Delta y,t+\Delta t)=f(x,y,t)$. (A1)

This constraint is expanded in a Taylor series, and the limit $\Delta t\to 0$ is computed, which introduces the velocities $\dot{x}$ and $\dot{y}$. Assuming that terms of higher than linear order in this expansion are negligible, we write the optic flow constraint equation (OFCE) as:

$(\nabla f)^{t} (\dot{x},\dot{y},1)=0$, (A2)
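The step from Eq. (A1) to Eq. (A2) can be made explicit. A first-order Taylor expansion of the left-hand side of Eq. (A1) gives:

$f(x+\Delta x,y+\Delta y,t+\Delta t)\approx f(x,y,t)+(\partial _{x} f)\Delta x+(\partial _{y} f)\Delta y+(\partial _{t} f)\Delta t$.

Subtracting $f(x,y,t)$ on both sides of Eq. (A1), dividing by $\Delta t$, and taking the limit $\Delta t\to 0$ turns $\Delta x/\Delta t$ and $\Delta y/\Delta t$ into the velocities $\dot{x}$ and $\dot{y}$, yielding $(\partial _{x} f)\dot{x}+(\partial _{y} f)\dot{y}+(\partial _{t} f)=0$, which is Eq. (A2) written with the spatio-temporal gradient $\nabla f=(\partial _{x} f,\partial _{y} f,\partial _{t} f)^{t}$.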

where ${}^{t}$ denotes the vector transpose. This single equation contains two unknowns. Thus, additional constraints are required to determine the optic flow. A modern variant of Lucas & Kanade's (1981) method assumes that the OFCE holds in a local neighborhood of a sample point, where each constraint can additionally be weighted by a function $w$ that depends on the distance between the sampling point and the center of the neighborhood. Introducing the index $i$ for the sampling of $x$, $j$ for the sampling of $y$, and $k$ for the sampling of $t$, we rewrite Eq. (A2) as follows:

$w_{i-i',j-j'} \left((\partial _{x} f_{i'j'k} )\dot{x}_{ijk} +(\partial _{y} f_{i'j'k} )\dot{y}_{ijk} +(\partial _{t} f_{i'j'k} )\right)=0$ for $\left|i-i'\right|\le n$ and $\left|j-j'\right|\le n$, (A3)

with $\partial _{x}$, $\partial _{y}$, $\partial _{t}$ denoting partial derivatives for the dimensions of space and time. Eq. (A3) assumes sampling in a square, spatial neighborhood of size $(2n+1) \times (2n+1)$ around the center $(i,j)$. This over-determined system of equations in Eq. (A3) is solved in the least-squares sense by:

$\left(\begin{array}{c} {\dot{x}_{ijk} } \\ {\dot{y}_{ijk} } \end{array}\right)=\frac{-1}{AC-B^{2} } \left(\begin{array}{c} {+CE-BF} \\ {-BE+AF} \end{array}\right)$ with $E=w*\left((\partial _{x} f_{ijk} )(\partial _{t} f_{ijk} )\right)$ , $F=w*\left((\partial _{y} f_{ijk} )(\partial _{t} f_{ijk} )\right)$

$A=w*(\partial _{x} f_{ijk} )^{2}$, $B=w*\left((\partial _{x} f_{ijk} )(\partial _{y} f_{ijk} )\right)$, and $C=w*(\partial _{y} f_{ijk} )^{2}$. (A4)

We use the spatial convolution $*$ that operates on the $(2n+1)\times (2n+1)$ neighborhood. A Matlab implementation of this method is available for download.
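The computation in Eqs. (A3)-(A4) can be sketched in a few lines of NumPy (a minimal illustration, not the authors' implementation): uniform weights $w$, central differences in space, a two-frame difference in time, and flow estimated only at the center pixel; the function name and test pattern are ours.

```python
import numpy as np

def lucas_kanade_center(f0, f1, n=2):
    # Spatial derivatives by central differences, temporal derivative
    # by a two-frame difference.
    fx = (np.roll(f0, -1, axis=1) - np.roll(f0, 1, axis=1)) / 2.0
    fy = (np.roll(f0, -1, axis=0) - np.roll(f0, 1, axis=0)) / 2.0
    ft = f1 - f0
    ci, cj = f0.shape[0] // 2, f0.shape[1] // 2
    win = (slice(ci - n, ci + n + 1), slice(cj - n, cj + n + 1))
    # Normal equations of the (2n+1) x (2n+1) constraint system,
    # Eq. (A3), with uniform weights w = 1.
    M = np.array([[np.sum(fx[win] ** 2),      np.sum(fx[win] * fy[win])],
                  [np.sum(fx[win] * fy[win]), np.sum(fy[win] ** 2)]])
    b = np.array([np.sum(fx[win] * ft[win]), np.sum(fy[win] * ft[win])])
    return -np.linalg.solve(M, b)  # (x_dot, y_dot) at the center pixel

# Quadratic test pattern translated right by one pixel per frame;
# the estimate comes out close to (1, 0).
y, x = np.mgrid[0:21, 0:21].astype(float)
f0 = 0.5 * x ** 2 + 0.3 * y ** 2
f1 = 0.5 * (x - 1.0) ** 2 + 0.3 * y ** 2
vx, vy = lucas_kanade_center(f0, f1)
```

The small deviation from (1, 0) stems from the linearization in Eq. (A2), which is exact only for infinitesimal displacements.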

A2 Method of Mota et al. (2001)

Multiple motions can appear in a single region of the visual field, e.g. when partly transparent materials move, when shadows move over a background that is itself moving, or when observing snowfall while driving. Mota et al. (2001) propose a method for the detection of multiple motions. First, they define the generalized structure tensor for $n$ motions as:

$J_{n} =\int \vec{l}(\vec{x})^{t} \vec{l}(\vec{x})\cdot a(\vec{x})\; d\vec{x}$ with $\vec{l}(\vec{x})=f(\vec{x})*\partial ^{\alpha } g(\vec{x})$ and $\left|\alpha \right|=n$. (A5)

where $g$ is a kernel function, $\alpha$ denotes a multi-index, and $a(\vec{x})$ is a weight. For an explanation of multi-indices, see App. A10. Note that the partial derivative is applied to the filtering kernel, e.g. a Gaussian, which is then convolved ($*$) with the input signal, rather than employing two filtering steps, one for the kernel and one for the partial derivative. After computing the tensor components, these are element-wise smoothed with the weighting function $a(\vec{x})$. For two flow components, $n$ = 2, the tensor is computed by:

$J_{2} =\left(\begin{array}{cccccc} {J_{1111} } & {J_{1122} } & {J_{1133} } & {J_{1112} } & {J_{1113} } & {J_{1123} } \\ {J_{2211} } & {J_{2222} } & {J_{2233} } & {J_{2212} } & {J_{2213} } & {J_{2223} } \\ {J_{3311} } & {J_{3322} } & {J_{3333} } & {J_{3312} } & {J_{3313} } & {J_{3323} } \\ {J_{1211} } & {J_{1222} } & {J_{1233} } & {J_{1212} } & {J_{1213} } & {J_{1223} } \\ {J_{1311} } & {J_{1322} } & {J_{1333} } & {J_{1312} } & {J_{1313} } & {J_{1323} } \\ {J_{2311} } & {J_{2322} } & {J_{2333} } & {J_{2312} } & {J_{2313} } & {J_{2323} } \end{array}\right)$ with entries (A6a)

$J_{ijkl} =a*\left(\left(f*\frac{\partial ^{2} g}{\partial x_{i} \partial x_{j} } \right)\left(f*\frac{\partial ^{2} g}{\partial x_{k} \partial x_{l} } \right)\right)$. (A6b)

The mixed motion parameters can be computed from the minors of the generalized structure tensor. The individual motion components are then extracted from these mixed motion parameters by representing the motion vectors as complex numbers and computing the roots of a complex-valued polynomial of degree $n$ whose coefficients are defined by the mixed motion parameters. For the case $n$ = 2, a Matlab implementation is available for download.
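The root-extraction step can be illustrated in isolation. For $n=2$, writing the two motion vectors as complex numbers $u=v_{x}+i\,v_{y}$, the polynomial $z^{2}-(u_{1}+u_{2})z+u_{1}u_{2}$ has exactly these motions as its roots; in the full method its coefficients are obtained from the mixed motion parameters, while the numeric values below are hypothetical.

```python
import numpy as np

# Two motion components encoded as complex numbers v_x + i * v_y
# (hypothetical values for illustration).
u1 = 1.0 + 0.5j   # motion (1.0, 0.5)
u2 = -0.5 + 2.0j  # motion (-0.5, 2.0)

# Degree-2 polynomial with roots u1 and u2; in Mota et al. (2001)
# these coefficients come from the mixed motion parameters.
coeffs = [1.0, -(u1 + u2), u1 * u2]
roots = np.roots(coeffs)

# Decode the recovered motions, sorted for a stable order.
motions = sorted((r.real, r.imag) for r in roots)
```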

A3 Method of Farnebäck (2000)

Another technique describes local image structure with tensors (Granlund & Knutsson, 1995; Knutsson et al., 2011). For the estimation of optic flow, Farnebäck (2000) proposed a method that uses the second-order polynomial $f(\vec{z})\sim \vec{z}^{t} A\vec{z}+\vec{b}^{t} \vec{z}+c$ with $\vec{z}=(x,y,t)^{t}$ to describe the image structure. This polynomial defines a set of ten basis vectors $\left\{\; 1,x,y,t,xy,xt,yt,x^{2} ,y^{2} ,t^{2} \right\}_{x,y,t=-k...+k}$. To show an example of such a basis, we use the simpler case with six basis vectors $\left\{\; 1,x,y,xy,x^{2} ,y^{2} \right\}_{x,y=-k...+k}$ and $k=1$. The basis is defined by the matrix: $B=\left(\vec{b}_{1} ,\vec{b}_{2} ,\vec{b}_{3} ,\vec{b}_{4} ,\vec{b}_{5} ,\vec{b}_{6} \right)=\left(\begin{array}{cccccc} {1} & {-1} & {-1} & {+1} & {+1} & {+1} \\ {1} & {0} & {-1} & {0} & {0} & {+1} \\ {1} & {+1} & {-1} & {-1} & {+1} & {+1} \\ {1} & {-1} & {0} & {0} & {+1} & {0} \\ {1} & {0} & {0} & {0} & {0} & {0} \\ {1} & {+1} & {0} & {0} & {+1} & {0} \\ {1} & {-1} & {+1} & {-1} & {+1} & {+1} \\ {1} & {0} & {+1} & {0} & {0} & {+1} \\ {1} & {+1} & {+1} & {+1} & {+1} & {+1} \end{array}\right).$
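The matrix above can be generated programmatically, which also makes the enumeration order explicit (a NumPy sketch; loop order chosen to match the listing, with $x$ varying fastest):

```python
import numpy as np

# Basis {1, x, y, xy, x^2, y^2} sampled on a 3x3 neighborhood (k = 1).
k = 1
B = np.array([[1, x, y, x * y, x ** 2, y ** 2]
              for y in range(-k, k + 1)
              for x in range(-k, k + 1)])
# B has one row per neighborhood position and one column per basis
# function; its 6 columns are linearly independent on the 3x3 grid.
```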

The function $f$ is projected onto the basis system $B$ in the least-squares sense, accounting for the weights $E$ and confidence values $C$; the tensor is later built from the resulting coefficients. Formally, the coefficients are the solution to the optimization problem:

$\vec{r}^{*} =\arg \min _{\vec{r}} \left\| B\vec{r}-f\right\| _{E,C}$, which is $\vec{r}^{*} =(ECB)^{+} (ECf)$, (A7)

with $\vec{r}^{*}$ denoting the coefficient vector that minimizes the weighted residual. Note that the weights $E$ and confidence values $C$ are point-wise multiplied onto the components of the basis vectors. The solution of the optimization problem is given by computing the pseudo-inverse ${(ECB)}^{+}$. In more detail, the computation of $\vec{r}^{*}$ is:

$\vec{r}^{*} =\left(\begin{array}{ccc} {{\rm <}\vec{e}.\vec{b}_{1} ,\; \vec{b}_{1} .\vec{c}{\rm >}} & {\cdots } & {{\rm <}\vec{e}.\vec{b}_{1} ,\; \vec{b}_{m} .\vec{c}{\rm >}} \\ {\vdots } & {\ddots } & {\vdots } \\ {{\rm <}\vec{e}.\vec{b}_{m} ,\; \vec{b}_{1} .\vec{c}{\rm >}} & {\cdots } & {{\rm <}\vec{e}.\vec{b}_{m} ,\; \vec{b}_{m} .\vec{c}{\rm >}} \end{array}\right)^{-1} \left(\begin{array}{c} {{\rm <}\vec{e}.\vec{c},\; \vec{b}_{1} .\vec{f}{\rm >}} \\ {\vdots } \\ {{\rm <}\vec{e}.\vec{c},\; \vec{b}_{m} .\vec{f}{\rm >}} \end{array}\right)$ (A8)

with ${\rm <}\cdot ,\; \cdot {\rm >}$ denoting the inner product, $\vec{b}_{1} \dots \vec{b}_{m}$ the basis vectors, $\vec{e}$ and $\vec{c}$ vectors of weights and confidence values, respectively, and $\vec{f}$ a vector that collects the signal values of all positions in the neighborhood of the current position, e.g. in row-major order. This neighborhood is defined by the extent of the basis vectors; for instance, the basis with six vectors defines a $3 \times 3$ neighborhood.
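A minimal NumPy sketch of the projection in Eqs. (A7)-(A8), under the simplifying assumption of uniform weights and confidences (Farnebäck uses, e.g., a Gaussian applicability function); the signal is generated from known coefficients, which the weighted least-squares solve recovers:

```python
import numpy as np

# Basis {1, x, y, xy, x^2, y^2} over a 3x3 neighborhood (k = 1),
# neighbors enumerated in row-major order.
coords = [(x, y) for y in (-1, 0, 1) for x in (-1, 0, 1)]
B = np.array([[1, x, y, x * y, x ** 2, y ** 2] for x, y in coords])

# Signal values sampled from a polynomial with known coefficients.
r_true = np.array([2.0, -1.0, 0.5, 0.3, 1.5, -0.7])
f = B @ r_true

# Uniform weights e and confidences c (simplifying assumption).
W = np.diag(np.ones(len(coords)))

# Weighted least squares, r* = argmin || B r - f ||_{E,C}.
r_star, *_ = np.linalg.lstsq(W @ B, W @ f, rcond=None)
```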

According to the above definition of the basis system $\left\{\; 1,x,y,t,xy,xt,yt,x^{2} ,y^{2} ,t^{2} \right\}$ the coefficients of the polynomial $f(\vec{z})\sim \vec{z}^{t} A\vec{z}+\vec{b}^{t} \vec{z}+c$ are given by:

$A=\left(\begin{array}{ccc} {r_{8}^{*} } & {r_{5}^{*} } & {r_{6}^{*} } \\ {r_{5}^{*} } & {r_{9}^{*} } & {r_{7}^{*} } \\ {r_{6}^{*} } & {r_{7}^{*} } & {r_{10}^{*} } \end{array}\right)$, $\vec{b}=\left(\begin{array}{c} {r_{2}^{*} } \\ {r_{3}^{*} } \\ {r_{4}^{*} } \end{array}\right)$, and $c=r_{1}^{*}$. (A9)

Following Farnebäck (2000), the orientation tensor is defined as:

$T=AA^{t} +\gamma \cdot \vec{b}\vec{b}^{t}$ (A10)

with the parameter $\gamma=1/255$ assuming that values of the function $f$ range between zero and 255. From this tensor the isotropic part, corresponding to the smallest eigenvalue, is removed. This isotropic part represents constant gray-value structure. In an ideal situation no change in gray-value structure along the velocity vector $\vec{w}=(v_{x} ,\; v_{y} ,\; 1)^{t}$ exists; however, due to noise and non-translational local motion a residual might exist. Thus, we seek the $\vec{w}$ that minimizes:

$\vec{w}^{*} =\arg \min _{\vec{w}} \vec{w}^{t} T\vec{w}$. (A11)

In addition, Farnebäck (2000) uses a linear, affine motion model to define the velocities, which is given by:

$\vec{w}=\underbrace{\left(\begin{array}{ccccccc} {1} & {x} & {y} & {0} & {0} & {0} & {0} \\ {0} & {0} & {0} & {1} & {x} & {y} & {0} \\ {0} & {0} & {0} & {0} & {0} & {0} & {1} \end{array}\right)}_{=:S} \underbrace{\left(\begin{array}{ccccccc} {p_{1} } & {p_{2} } & {p_{3} } & {p_{4} } & {p_{5} } & {p_{6} } & {p_{7} } \end{array}\right)^{t} }_{=:\vec{p}} =S\vec{p}$ (A12)

Introducing the motion model into the optimization problem of Eq. (A11) gives

$\vec{p}^{*} =\arg \min _{\vec{p}} \vec{p}^{t} S^{t} TS\vec{p}=\arg \min _{\vec{p}} \vec{p}^{t} Q\vec{p}$ with $Q=S^{t} TS$. (A13)

Due to the structure of $S$, in which $p_{7}$ only influences the last component of $\vec{w}$, this problem can be re-written as:

$\vec{\tilde{p}}^{*} =\arg \min _{\vec{\tilde{p}}} \vec{\tilde{p}}^{t} \tilde{Q}\vec{\tilde{p}}+\vec{\tilde{p}}^{t} \vec{q}+\vec{q}^{t} \vec{\tilde{p}}+s$ with $Q=\left(\begin{array}{cc} {\tilde{Q}} & {\vec{q}} \\ {\vec{q}^{t} } & {s} \end{array}\right)$ and $\vec{p}=(\vec{\tilde{p}}^{t} ,\; 1)^{t}$. (A14)

The estimated parameters are $\vec{\tilde{p}}^{*} =-\tilde{Q}^{-1} \vec{q}$ which yields the velocity estimate $\vec{w}^{*} =S\vec{\tilde{p}}^{*}$. We provide a Matlab implementation of the method here: Download.
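As an illustration of Eq. (A7), the following Python sketch builds the six-vector basis over a $3 \times 3$ neighborhood and solves the weighted least-squares problem via the normal equations; the uniform weights and the test polynomial are our own choices:

```python
import numpy as np

# Coordinates of the 3x3 neighborhood in row-major order (y outer, x inner),
# matching the row order of the matrix B above.
ys, xs = np.mgrid[-1:2, -1:2]
x, y = xs.ravel().astype(float), ys.ravel().astype(float)

# Basis {1, x, y, xy, x^2, y^2}; column i is the basis vector b_i.
B = np.stack([np.ones(9), x, y, x * y, x**2, y**2], axis=1)

def fit_polynomial(f, e=None, c=None):
    """Weighted least-squares projection of a 3x3 patch f (9-vector) onto
    the basis; e are applicability weights, c confidence values."""
    e = np.ones(9) if e is None else e
    c = np.ones(9) if c is None else c
    W = np.diag(e * c)
    # Normal equations of Eq. (A7): r* = (B^t W B)^{-1} B^t W f
    return np.linalg.solve(B.T @ W @ B, B.T @ W @ f)

# Sample an exact 2nd-degree polynomial and recover its coefficients.
coeff_true = np.array([2.0, 3.0, -1.0, 0.5, 1.0, -2.0])
f = B @ coeff_true
coeff_est = fit_polynomial(f)
```

Since the patch is an exact 2${}^{nd}$-degree polynomial, the recovered coefficients match the generating ones up to numerical precision.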

A4 Method of Heeger (1988)

Several authors proposed the detection of motion in the Fourier domain (Gafni & Zeevi, 1979; Watson & Ahumada, 1983; Heeger, 1988). Introducing the velocity vector $\vec{v}=(u,v)^{t}$, the OFCE from Eq. (2) can be written as $f(\vec{x},t)=f(\vec{x}-\vec{v}t)$. This can be shown by computing the partial derivative with respect to $t$. Following the description of Jähne (1998, p. 92f), the OFCE is described in the Fourier domain by:

$\begin{array}{l} {\hat{f}(\vec{k},\omega )=\frac{1}{\sqrt{2\pi } ^{3} } \iiint \nolimits _{}f(\vec{x},t)\cdot \exp (-i(\vec{k}^{t} \vec{x}-\omega t))d\vec{x}dt} \\ {\quad \quad \quad =\frac{1}{\sqrt{2\pi } ^{3} } \iiint \nolimits _{}f(\vec{x}-\vec{v}t)\cdot \exp (-i(\vec{k}^{t} \vec{x}-\omega t))d\vec{x}dt} \\ {\quad \quad \quad =\frac{1}{\sqrt{2\pi } ^{2} } \iint \nolimits _{}f(\vec{y})\cdot \exp (-i\vec{k}^{t} \vec{y})d\vec{y}\; \frac{1}{\sqrt{2\pi } } \int _{}1\cdot \exp (-i(\vec{k}^{t} \vec{v}-\omega )t)dt } \\ {\quad \quad \quad =\hat{f}(\vec{k})\cdot \delta (\vec{k}^{t} \vec{v}-\omega )} \end{array}$. (A15)

In Eq. (A15), the vector $\vec{k}=(k_{x} ,k_{y} )^{t}$ denotes the angular frequencies for the spatial coordinates $x$ and $y$ and $\omega$ denotes the angular frequency associated with time $t$. The derivation in Eq. (A15) uses the coordinate transform $\vec{y}=\vec{x}-\vec{v}t$ and the shift theorem. The Fourier transform of the gray-value constancy assumption states that the transform of the spatio-temporal image sequence $\hat{f}(\vec{k},\omega )$ is non-zero only where $\vec{k}^{t} \vec{v}-\omega =0$ holds, where it equals the spatial amplitude spectrum $\hat{f}(\vec{k})$. Thus, the constraint of motion in the Fourier domain is a plane. We write this plane in normal form as:

$\left(\begin{array}{c} {\vec{k}} \\ {\omega } \end{array}\right)^{t} \left(\begin{array}{c} {+\vec{v}} \\ {-1} \end{array}\right)=0$. (A16)

Planes that correspond to the parameters $\vec{v}=(u,v)^{t}$ all pass through the origin. Heeger (1988) proposed to sample the amplitude spectrum $\hat{f}(\vec{k})$ by a family of Gabor filters (Gabor, 1946). For a single velocity all Gabor filters are placed in the plane defined by Eq. (A16). Here is a Matlab implementation of the method proposed by Heeger (1988): Download.
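The plane constraint can be illustrated numerically: the discrete Fourier transform of a drifting pattern concentrates its energy on a plane through the origin. The following Python sketch uses our own arbitrary grid size, frequency, and speed; note that numpy's transform convention fixes the sign of the temporal frequency:

```python
import numpy as np

N, k0, u = 32, 3, 2        # grid size, spatial frequency index, integer speed
t, x = np.mgrid[0:N, 0:N]
f = np.cos(2 * np.pi * k0 * (x - u * t) / N)   # pattern drifting with speed u

F = np.fft.fft2(f)               # axis 0: time, axis 1: space
wt = np.fft.fftfreq(N)           # signed temporal frequencies
kx = np.fft.fftfreq(N)           # signed spatial frequencies

# Locate the spectral peak; all energy lies on the motion plane.
a, b = np.unravel_index(np.argmax(np.abs(F)), F.shape)
# With numpy's transform convention the plane reads kx*u + wt = 0, which is
# the plane of Eq. (A16) up to the sign convention of the temporal frequency.
residual = kx[b] * u + wt[a]
```

Both conjugate peaks of the cosine satisfy the plane equation, so the residual vanishes regardless of which one the argmax returns.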

A5 Method of Shizawa & Mase (1991)

Shizawa & Mase (1991) suggest a general scheme for the estimation of multiple motions in the Fourier domain. They start with a space-time formulation of the OFCE:

$\frac{\partial f}{\partial x} u+\frac{\partial f}{\partial y} v+\frac{\partial f}{\partial t} w=0$ (A17a)

or

$\vec{u}^{t} \nabla f(\vec{x})=0$ with $\vec{x}=(x,y,t)^{t}$ and $\vec{u}=(u,v,w)^{t}$. (A17b)

The difference to Eq. (9) is the symmetric treatment of space and time, which introduces the variable $w$. The velocity vector $\vec{u}$ is assumed to be normalized: $\left\| \vec{u}\right\| =1$. Motivated by the state of multiple elementary particles in quantum mechanics, Shizawa & Mase (1991) assume that each of the flow vectors is analogous to an elementary particle and its state. Such a state is described as an element of a symmetrical tensor product space. For a mixture of $n$ motions the constraint from Eq. (A17b) is generalized and gives:

$a(\vec{u}_{1} )a(\vec{u}_{2} )...a(\vec{u}_{n} )f(\vec{x})=0$ with $a(\vec{u})=\vec{u}^{t} \nabla _{xyt}$ and $\nabla _{xyt} =(\partial _{x} ,\partial _{y} ,\partial _{t} )^{t}$. (A18)

and in the Fourier domain this is expressed by:

$\tilde{a}(\vec{u}_{1} )\tilde{a}(\vec{u}_{2} )...\tilde{a}(\vec{u}_{n} )F(\vec{\omega })=0$ with $\vec{\omega }=(\xi ,\eta ,\tau )^{t}$, $\tilde{a}(\vec{u})=2\pi i\cdot \vec{u}^{t} \vec{\omega }$, and $F$ the Fourier transform of $f$. (A19)

The correspondence between Eq. (A18) and Eq. (A19) can be understood by looking at the derivation in Eq. (A15) but applying the coordinate transform now to all three components $x$, $y$, and $t$, which are treated equally. Another difference is the definition of frequencies. Eq. (A15) uses angular frequencies, whereas Eq. (A19) uses plain frequencies. This difference explains the additional scaling of $2\pi$ in Eq. (A19) compared to Eq. (A15). For the following derivations we stay in the Fourier domain and assume that the transform into that domain is locally achieved, e.g. by Gabor filters (Gabor, 1946). In addition, for purposes of illustration we will restrict the derivation to the case with two flow components. See Shizawa & Mase (1991) for a general formulation with $n$ motions. Using the constraint in Eq. (A19) we minimize:

$E(\vec{u}_{1} ,\vec{u}_{2} )=\frac{\int (\vec{u}_{1} ^{t} \vec{\omega }\cdot \vec{u}_{2} ^{t} \vec{\omega })^{2} \left|F(\vec{\omega })\right|^{2} d\vec{\omega } }{\int \left|F(\vec{\omega })\right|^{2} d\vec{\omega } }$ with respect to $\vec{u}_{1}$ and $\vec{u}_{2}$. (A20)

This can be formulated as the eigenvalue problem:

$A\vec{u}_{m} =\lambda \vec{u}_{m}$ with $A=\left(\int \left|F(\vec{\omega })\right|^{2} d\vec{\omega } \right)^{-1} \cdot \int \left|F(\vec{\omega })\right|^{2} (\vec{\omega }\otimes \vec{\omega }\otimes \vec{\omega }\otimes \vec{\omega })\; d\vec{\omega }$

where the symbol $\otimes$ denotes the tensor product and $\vec{u}_{m}$ a vector, which contains the corresponding components from the flow vectors $\vec{u}_{1}$ and $\vec{u}_{2}$. Note that the matrix $A$ is a $9 \times 9$ matrix and has linear dependencies. These dependencies can be eliminated by formulating a symmetrical tensor product space by:

$V_{s} =\frac{1}{2} \left(\vec{u}_{1} \otimes \vec{u}_{2} +\vec{u}_{2} \otimes \vec{u}_{1} \right)=\left(\begin{array}{ccc} {u_{s,11} } & {u_{s,12} } & {u_{s,13} } \\ {u_{s,21} } & {u_{s,22} } & {u_{s,23} } \\ {u_{s,31} } & {u_{s,32} } & {u_{s,33} } \end{array}\right)$ , (A21)

which, being symmetric, has six independent entries. According to Eq. (A21), the corresponding 6D system is given by:

$M\vec{u}_{s} =\lambda \vec{u}_{s}$ with $\vec{u}_{s} =(u_{s,11} ,\; u_{s,22} ,\; u_{s,33} ,\; u_{s,12} ,\; u_{s,23} ,\; u_{s,31} )^{t}$ (A22a)

and

$M=\left(\begin{array}{cccccc} {M_{1111} } & {M_{1122} } & {M_{1133} } & {2M_{1112} } & {2M_{1123} } & {2M_{1131} } \\ {M_{2211} } & {M_{2222} } & {M_{2233} } & {2M_{2212} } & {2M_{2223} } & {2M_{2231} } \\ {M_{3311} } & {M_{3322} } & {M_{3333} } & {2M_{3312} } & {2M_{3323} } & {2M_{3331} } \\ {2M_{1211} } & {2M_{1222} } & {2M_{1233} } & {4M_{1212} } & {4M_{1223} } & {4M_{1231} } \\ {2M_{2311} } & {2M_{2322} } & {2M_{2333} } & {4M_{2312} } & {4M_{2323} } & {4M_{2331} } \\ {2M_{3111} } & {2M_{3122} } & {2M_{3133} } & {4M_{3112} } & {4M_{3123} } & {4M_{3131} } \end{array}\right)$ (A22b)

with

$M_{ijkl} =\left(\int \left|F(\vec{\omega })\right|^{2} d\vec{\omega } \right)^{-1} \cdot \int \left|F(\vec{\omega })\right|^{2} \vec{\omega }_{i} \cdot \vec{\omega }_{j} \cdot \vec{\omega }_{k} \cdot \vec{\omega }_{l} \; d\vec{\omega }$ and $i,j,k,l\in \left\{1,\; 2,\; 3\right\}$. (A22c)

The eigenvector that corresponds to the smallest eigenvalue is the solution of this minimization problem. This solution contains a mixture of both flow vectors $\vec{u}_{1}$ and $\vec{u}_{2}$, which can be decomposed analytically by solving a second eigenvalue problem for the matrix $V_{s}$ from Eq. (A21):

$V_{s} \vec{e}=\lambda \vec{e}$. (A23)

If rank($V_{s}$) = 1, then $\vec{u}_{1}$ and $\vec{u}_{2}$ are parallel and if rank($V_s$) = 2, then $\vec{u}_{1}$ and $\vec{u}_{2}$ are not parallel. In an ideal situation the middle eigenvalue of $V_s$ is required to be zero and its deviation from zero can be used as a confidence measure. Assume the non-ideal case, that is rank($V_s$) = 3, and let the eigenvalues be $\lambda_1$, $\lambda_2$, and $\lambda_3$ ordered from smallest to largest with their corresponding eigenvectors $\vec{e}_{1}$, $\vec{e}_{2}$, and $\vec{e}_{3}$. Then, the two velocity components are computed by:

$\vec{u}_{1} =+\sqrt{\frac{\lambda _{2} -\lambda _{1} }{\lambda _{3} -\lambda _{1} } } \vec{e}_{1} +\sqrt{\frac{\lambda _{3} -\lambda _{2} }{\lambda _{3} -\lambda _{1} } } \vec{e}_{3}$ and $\vec{u}_{2} =-\sqrt{\frac{\lambda _{2} -\lambda _{1} }{\lambda _{3} -\lambda _{1} } } \vec{e}_{1} +\sqrt{\frac{\lambda _{3} -\lambda _{2} }{\lambda _{3} -\lambda _{1} } } \vec{e}_{3}$. (A24)

For the computation of the local Fourier transform Shizawa & Mase (1991) use the Gabor filters:

$\begin{array}{l} {g(\xi _{0} ,\eta _{0} ,\tau _{0} ,x_{0} ,y_{0} ,t_{0} ,x,y,t)} \\ {\quad =\frac{1}{\left(\sqrt{2\pi } \right)^{3} \sigma _{x} \sigma _{y} \sigma _{t} } \exp \left(-\left(\frac{(x-x_{0} )^{2} }{2\sigma _{x}^{2} } +\frac{(y-y_{0} )^{2} }{2\sigma _{y}^{2} } +\frac{(t-t_{0} )^{2} }{2\sigma _{t}^{2} } \right)\right)} \\ {\quad \; \; \cdot \exp \left(2\pi i\left((x-x_{0} )\xi _{0} +(y-y_{0} )\eta _{0} +(t-t_{0} )\tau _{0} \right)\right)} \end{array}$. (A25)

The window of these Gabor filters is 23 $\times$ 23 $\times$ 23 pixels with the standard deviations $\sigma_x = \sigma_y = \sigma_t = 4$ pixels; thus, $x_0 = y_0 = t_0 = 12$ pixels. The pixel frequencies are sampled on a regular lattice within the interval 0 to 0.4 (pixels)${}^{-1}$ at a distance of 0.07 (pixels)${}^{-1}$. This cubic lattice defines the range of the integrals in Eq. (A22c). For a discretized version these integrals are replaced by sums. A Matlab implementation of this method, which assumes maximally two motions, is provided here: Download.
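The decomposition of Eq. (A24) can be illustrated directly, skipping the filtering stage: given two unit space-time flow vectors (example values of our own), we form the symmetric product of Eq. (A21), eigendecompose it, and recover the pair. Signs and order of the recovered vectors may be exchanged, so the check below reconstructs $V_s$ instead of comparing vectors one by one:

```python
import numpy as np

def decompose_two_motions(Vs):
    """Recover the two unit flow vectors from the symmetric product
    Vs = (u1 (x) u2 + u2 (x) u1)/2, following Eq. (A24)."""
    lam, E = np.linalg.eigh(Vs)      # eigenvalues in ascending order
    e1, e3 = E[:, 0], E[:, 2]
    c1 = np.sqrt((lam[1] - lam[0]) / (lam[2] - lam[0]))
    c3 = np.sqrt((lam[2] - lam[1]) / (lam[2] - lam[0]))
    return c1 * e1 + c3 * e3, -c1 * e1 + c3 * e3

# Two non-parallel unit space-time velocity vectors (example values).
u1 = np.array([1.0, 2.0, 2.0]) / 3.0
u2 = np.array([0.0, 0.6, 0.8])
Vs = 0.5 * (np.outer(u1, u2) + np.outer(u2, u1))

v1, v2 = decompose_two_motions(Vs)
Vs_rec = 0.5 * (np.outer(v1, v2) + np.outer(v2, v1))
```

In this ideal, noise-free case the middle eigenvalue of $V_s$ is numerically zero, in line with its use as a confidence measure.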

A6 Method of Fleet & Jepson (1990)

Another idea is to use the local image phase instead of the squared amplitude signal, as in Eq. (A20), to detect image velocities (Fleet & Jepson, 1990). The Gabor filters (Gabor, 1946) define a filter pair with an odd and an even part. This allows for the definition of a local phase $\phi (\vec{x},t)$ and local amplitude $\rho (\vec{x},t)$ by computing the angle and the magnitude from the odd and even responses. Then, the signal can be expressed by amplitude and phase:

$g(\vec{x},t)=\rho (\vec{x},t)\cdot \exp (-i\phi (\vec{x},t))$. (A26)

Fleet & Jepson (1990) assume that contours of constant phase are preserved under motion: $\phi (\vec{x},t)=c$. Computing the total temporal derivative (note that the chain rule applies) gives:

$\frac{\partial \phi }{\partial x} \frac{\partial x}{\partial t} +\frac{\partial \phi }{\partial y} \frac{\partial y}{\partial t} +\frac{\partial \phi }{\partial t} \cdot 1=0$ or $(\nabla \phi )^{t} \left(\begin{array}{c} {\vec{v}} \\ {1} \end{array}\right)=0$. (A27)

This constraint is similar to the OFCE, only $f$ is replaced by $\phi$, compare with Eq. (A2). The image phase $\phi$ has some properties that require particular care in evaluating the constraint in Eq. (A27).

First, the image phase is cyclic and ranges between 0 and $2\pi$. Thus, calculating the gradient numerically by a one-sided or central difference will fail. One solution is to observe that $\phi =\arctan(Im\left\{g\right\}/Re\left\{g\right\})$ where $Im\left\{g\right\}$ denotes the imaginary part and $Re\left\{g\right\}$ the real part. Then the gradient is computed by

$\nabla \phi =\frac{Im\left\{\bar{g}\cdot \nabla g\right\}}{Re\left\{g\right\}^{2} +Im\left\{g\right\}^{2} }$ (A28)

with the complex conjugate $\bar{g}$ of $g$.

Second, the local image phase signal has singularities when the amplitude response $\rho$ is small. This happens in cases where the Gabor filter is "incompatible" with the underlying image structure. One solution to this singularity problem is the detection of image phase at multiple scales and choosing the scale with a strong amplitude signal, which could serve as a measure of reliability. Another solution is the adaptive frequency tuning of the Gabor filters used at each image location. A gradient ascent approach can find the locally maximum amplitude response of $g$ as a function of frequency (Ahrns & Neumann, 1998).

Third, another limitation of working with image phases is the detectability of image velocities. Detectable shifts are coupled to the wavelength of the Gabor filter or other band-pass filters. For Gabor filters, shifts are limited to maximally half of the wavelength of the filter due to the sampling theorem (Nyquist, 2002). This can be taken into account by allowing for multiple scales of Gabor filters with different frequencies, see e.g. Fleet & Jepson (1990). A Matlab implementation of a local image phase-based approach for optic flow detection is available here: Download.
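A minimal Python sketch of the phase gradient in Eq. (A28), applied to a stand-in for a Gabor response: a drifting complex carrier under a smooth envelope, where carrier frequency, envelope, and speed are our own arbitrary choices. The speed is then recovered from the phase-constancy constraint of Eq. (A27):

```python
import numpy as np

def phase_gradient(g, axis, step):
    """Numerical phase derivative via Eq. (A28): Im{conj(g)*dg} / |g|^2;
    this avoids the 2*pi wrap-around of differentiating arctan directly."""
    dg = np.gradient(g, step, axis=axis)
    return np.imag(np.conj(g) * dg) / (np.abs(g) ** 2)

# Stand-in for a Gabor response: drifting complex carrier (true speed v0)
# under a smooth, strictly positive amplitude envelope.
v0, k = 0.7, 2.0
x = np.linspace(-3.0, 3.0, 121)
t = np.linspace(0.0, 1.0, 11)
T, X = np.meshgrid(t, x, indexing="ij")
g = np.exp(-0.1 * X**2) * np.exp(1j * k * (X - v0 * T))

phi_t = phase_gradient(g, axis=0, step=t[1] - t[0])
phi_x = phase_gradient(g, axis=1, step=x[1] - x[0])
v_est = -phi_t / phi_x       # phase-constancy constraint solved for speed
```

The estimate is accurate in the interior of the grid; the discretization bias of the central differences is well below one percent for this sampling.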

A7 Method of Horn & Schunck (1981)

Horn & Schunck (1981) proposed to combine the OFCE with the assumption that optic flow appears smooth in each local region of the image. This smoothness assumption and the OFCE are combined in the following functional:

$F(u,v)=\int _{N}\underbrace{\left((\nabla f(x,y,t))^{t} \vec{w}(x,y)\right)^{2} }_{data\; term} +\eta \underbrace{\left(\left\| \nabla u(x,y,t)\right\| _{2}^{2} +\left\| \nabla v(x,y,t)\right\| _{2}^{2} \right)}_{smoothness\; \; term} d\vec{x}$ (A29)

The identifier $\vec{w}=\left(u,v,1\right)^{t}$ denotes the flow vector in spatio-temporal form. The first term in Eq. (A29) is called the data term and equals the OFCE squared. The second term in Eq. (A29) is the smoothness constraint, which takes the partial derivatives in both spatial dimensions and optionally in the temporal dimension. These derivatives are squared and summed, which is expressed by calculating the Euclidean norm squared. Eq. (A29) is solved by the Euler-Lagrange equations (see e.g. Bruhn et al., 2006):

$0=J_{11} u+J_{12} v+J_{13} -\eta \cdot \Delta u$ (A30a)

$0=J_{12} u+J_{22} v+J_{23} -\eta \cdot \Delta v$ (A30b)

combined with homogeneous Neumann boundary conditions. The symbol $\Delta$ denotes the Laplace operator. Equations (A30a) and (A30b) are coupled through the unknown functions $u$ and $v$. A discretization of each equation yields a sparse equation system which can be efficiently solved, e.g., by using Successive Over-Relaxation (SOR) (Young, 1950). Various details of the method, especially the numerical treatment, are not described here, but are included in the following Matlab implementation: Download.

The assumption about local smoothness of optic flow holds if surfaces are more or less fronto-parallel to the camera; however, depth discontinuities, e.g. as introduced by corners, do not show a smooth transition of flow if the camera has some translational velocity, compare with the model of Longuet-Higgins & Prazdny (1980). This has been addressed by improved methods using non-linear convex penalty functions for the data and smoothness constraint (e.g. Brox et al., 2004).
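The coupled equations (A30a/b) can be sketched with a Jacobi-style iteration in Python; for brevity this toy version approximates the Laplacian by the difference between the four-neighbor average and the center value and uses periodic instead of Neumann boundaries:

```python
import numpy as np

def horn_schunck(fx, fy, ft, eta=1.0, n_iter=500):
    """Jacobi-style iteration for Eqs. (A30a/b). The Laplacian is replaced
    by (four-neighbor average - center); np.roll gives periodic borders."""
    u, v = np.zeros_like(fx), np.zeros_like(fx)
    avg = lambda a: 0.25 * (np.roll(a, 1, 0) + np.roll(a, -1, 0) +
                            np.roll(a, 1, 1) + np.roll(a, -1, 1))
    for _ in range(n_iter):
        ub, vb = avg(u), avg(v)
        # closed-form update derived from the Euler-Lagrange equations
        common = (fx * ub + fy * vb + ft) / (eta + fx**2 + fy**2)
        u, v = ub - fx * common, vb - fy * common
    return u, v

# Synthetic derivatives of a ramp f = x - u0*t translating with speed u0;
# the data term alone is ambiguous (aperture problem), the smoothness
# term leaves the unconstrained component at v = 0.
u0 = 1.5
fx = np.ones((16, 16))
fy = np.zeros((16, 16))
ft = -u0 * np.ones((16, 16))
u, v = horn_schunck(fx, fy, ft)
```

On this spatially uniform input the iteration converges geometrically to $u=u_0$, $v=0$.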

A8 Method of Hildreth (1984)

Normal flow estimates occur along a rigid contour and, thus, can be integrated along this contour. This implies smoothness, since the rigid contour has a coherent, common 3D velocity, which remains constant under orthographic projection (Hildreth, 1984). Assume that the contour can be described by the curve $C$. Then, the normal flow $\vec{v}_{n} (s)$ along the curve parametrized by $s$ is estimated by minimizing:

$\int _{C}\left\| \vec{v}_{n} (s)+\frac{\partial _{t} f(s)\cdot \nabla _{xy} f(s)}{\left\| \nabla _{xy} f(s)\right\| ^{2} } \right\| _{2}^{2} +\eta \cdot \left\| \frac{\partial \vec{v}_{n} (s)}{\partial s} \right\| _{2}^{2} \; ds$. (A31)

The difficulty with such a formulation is the definition of the curve $C$ using data of partial derivatives of the gray-value function $f$. Such curves can be found, e.g., by using active contours (Xu & Prince, 1998).

A9 Method of Uras et al. (1988)

This method uses higher-order spatio-temporal derivatives to provide more constraints, at the cost that these derivatives have to be computed with care and are more affected by noise than 1${}^{st}$-order derivatives. Assume a planar patch that moves with the time-constant translational velocity $\vec{v}=(u,v)^{t}$ (Uras et al., 1988). Then, besides Eq. (2), the following set of equations holds:

$\frac{\partial ^{2} f}{\partial x\partial t} =-u\frac{\partial ^{2} f}{\partial x^{2} } -v\frac{\partial ^{2} f}{\partial x\partial y}$ (A32a)

and

$\frac{\partial ^{2} f}{\partial y\partial t} =-u\frac{\partial ^{2} f}{\partial x\partial y} -v\frac{\partial ^{2} f}{\partial y^{2} }$, (A32b)

which are the partial derivatives in $x$ and $y$ of Eq. (2). These can be written as $\nabla_{xy} (\partial_t f)+H_{f} \vec{v}=0$ and if the Hessian matrix ${H}_{f}$ is "sufficiently" well-conditioned, then the velocity is computed by:

$\vec{v}^{*} =-H_{f} ^{-1} (\nabla_{xy} \frac{\partial f}{\partial t} )$. (A33)

If the velocity $\vec{v}=(u,v)^{t}$ varies with time, then the equivalent of Eq. (A32a) and (A32b) is derived by:

$\nabla_{xy} \partial_t f + H_{f} \vec{v}+(\nabla_{xy} \vec{v})^{t} (\nabla _{xy} f)=0$ with $\nabla_{xy} \vec{v}=\left(\begin{array}{cc} {\partial_{x} u} & {\partial_{y} u} \\ {\partial_{x} v} & {\partial_{y} v} \end{array}\right)$. (A34)

Note that Uras et al. (1988) use Eq. (A33) to detect the time-constant part of the velocity and Eq. (A34) to evaluate if their assumption of time-constancy holds, which was mostly the case for their tested image sequences. We provide a Matlab implementation of the method here: Download.
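A Python sketch of Eq. (A33) on a synthetic quadratic patch that translates with constant velocity; the patch and grid size are our own choices, and central differences are exact for this polynomial input:

```python
import numpy as np

def uras_velocity(f, dx=1.0, dt=1.0):
    """Velocity from 2nd-order derivatives, Eq. (A33): v = -H^{-1} grad(f_t),
    evaluated at the center pixel of a three-frame stack f[t, y, x]."""
    ft = (f[2] - f[0]) / (2 * dt)               # temporal derivative at t=0
    fxt = np.gradient(ft, dx, axis=1)
    fyt = np.gradient(ft, dx, axis=0)
    fc = f[1]
    fxx = np.gradient(np.gradient(fc, dx, axis=1), dx, axis=1)
    fyy = np.gradient(np.gradient(fc, dx, axis=0), dx, axis=0)
    fxy = np.gradient(np.gradient(fc, dx, axis=1), dx, axis=0)
    i, j = fc.shape[0] // 2, fc.shape[1] // 2
    H = np.array([[fxx[i, j], fxy[i, j]], [fxy[i, j], fyy[i, j]]])
    g = np.array([fxt[i, j], fyt[i, j]])
    return -np.linalg.solve(H, g)

# Quadratic gray-value patch translating with constant velocity (u0, v0).
u0, v0 = 0.5, -0.25
y, x = np.mgrid[0:21, 0:21].astype(float)
frames = np.stack([(x - u0*t)**2 + (x - u0*t)*(y - v0*t) + (y - v0*t)**2
                   for t in (-1.0, 0.0, 1.0)])
v_est = uras_velocity(frames)
```

The Hessian of this patch is $[[2,1],[1,2]]$ and well-conditioned, so the solve recovers $(u_0, v_0)$ exactly up to floating-point precision.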

A10 Method of Otte & Nagel (1994)

This method uses not only higher-order derivatives of the image sequence but also of the velocities (Otte & Nagel, 1994). The general assumption is again the OFCE from Eq. (2). Taylor series approximations for $\nabla f$ and the velocity vector $\vec{w}$ are

$\nabla f(\vec{z})=\sum _{0\le \left|\alpha \right|\le n}\frac{1}{\alpha !} \partial ^{1+\alpha } f(\vec{z})\cdot (\delta \vec{z})^{\alpha } +R_{n} (\nabla f;\; \vec{z})$ (A35a)

and

$\vec{w}(\vec{z})=\sum _{0\le \left|\beta \right|\le m}\frac{1}{\beta !} \partial ^{\beta } \vec{w}(\vec{z})\cdot (\delta \vec{z})^{\beta } +R_{m} (\vec{w};\; \vec{z})$. (A35b)

The identifiers $\alpha$ and $\beta$ denote multi-indices, e.g. all multi-indices that fulfill the constraint $\left|\alpha \right|=2$ in the above example are (2,0,0), (0,2,0), (0,0,2), (1,1,0), (0,1,1), and (1,0,1). The partial derivative for the multi-index (1,0,1) is $\partial ^{2} /\partial z_{1} \partial z_{3}$ and the factorial operator is applied component-wise, with the factorials of all components multiplied: $(1,0,1)!=1! \cdot 0! \cdot 1!=1$. The R-terms in Eq. (A35a) and (A35b) are the residual terms of the Taylor series approximation. The term $\delta \vec{z}$ is the distance between the sample location and the reference $\vec{z}$. In our application of image processing this is typically the pixel offset between the center value and an arbitrary sampling of pixel locations in the spatio-temporal neighborhood. Eq. (A35a) introduces various higher-order spatial and temporal derivatives of the gray-value function $f$. Eq. (A35b) introduces spatial and temporal derivatives of the velocity vector. Plugging Eq. (A35a) and (A35b) into Eq. (2) gives:

$\sum _{0\le \left|\alpha \right|\le n}\sum _{0\le \left|\beta \right|\le m}\frac{1}{\alpha !\beta !} \partial ^{1+\alpha } f(\vec{z})\cdot \partial ^{\beta } \vec{w}(\vec{z})\cdot (\delta \vec{z})^{\alpha +\beta } =0$. (A36)

This is a polynomial in the sampling distances $\delta \vec{z}$ with the maximum degree of $n+m$. In this polynomial $\vec{w}(\vec{z})$ and its derivatives are the unknowns and the partial derivatives $\partial ^{1+\alpha } f(\vec{z})$ are known. They are computed from the image sequence. Eq. (A36) has $1/3(m+1)(m+2)(m+3)$ unknowns. These are given by the components of $\vec{w}$ and its partial derivatives. To increase the number of equations additional assumptions are required. Otte & Nagel (1994) summarize three of these assumptions. One is the rigorous condition in which all the coefficients of the polynomial in Eq. (A36) vanish identically (Nagel, 1987). This constraint can be written as $\binom{n+m+3}{3}$ equations:

$\sum _{\alpha +\beta =\gamma }\frac{1}{\alpha !\beta !} \partial ^{1+\alpha } f(\vec{z})\cdot \partial ^{\beta } \vec{w}(\vec{z})=0$ for all multi-indices $\gamma$ with $0\le \left|\gamma \right|\le m+n$. (A37)

To illustrate this constraint we show the case of $n=2$ and $m=1$. In this case Eq. (A37) can be written as the following linear system of equations with 20 equations and eight unknowns using $\vec{w}=(u,\; v,\; 1)^{t}$. Table 2 lists the components of the matrix, unknowns, and right-hand side of the system of equations. A Matlab implementation of this method is found here: Download.

Table 2: shows 20 constraints for eight unknowns derived from Eq. (A37), assuming that each polynomial coefficient vanishes independently. The equation number is listed in the first column. Entries in the columns $\delta x$, $\delta y$, and $\delta t$ denote the exponents of the polynomial in these variables. The unknowns are $u$, $u_x$, $u_y$, $u_t$, $v$, $v_x$, $v_y$, and $v_t$. The constraints are formed by multiplying the entries from columns five through twelve with their respective variables and equating the resulting sum to the last column. For instance, for row 1 the constraint equation is: $f_{x} u+f_{y} v=-f_{t}$.
no $\delta x$ $\delta y$ $\delta t$ $u$ $u_x$ $u_y$ $u_t$ $v$ $v_x$ $v_y$ $v_t$ rhs
1 0 0 0 ${f}_{x}$ 0 0 0 ${f}_{y}$ 0 0 0 $-{f}_{t}$
2 1 0 0 ${f}_{xx}$ ${f}_{x}$ 0 0 ${f}_{xy}$ ${f}_{y}$ 0 0 $-{f}_{xt}$
3 0 1 0 ${f}_{xy}$ 0 ${f}_{x}$ 0 ${f}_{yy}$ 0 ${f}_{y}$ 0 $-{f}_{yt}$
4 0 0 1 ${f}_{xt}$ 0 0 ${f}_{x}$ ${f}_{yt}$ 0 0 ${f}_{y}$ $-{f}_{tt}$
5 2 0 0 $1/2{f}_{xxx}$ ${f}_{xx}$ 0 0 $1/2{f}_{xxy}$ ${f}_{xy}$ 0 0 $-1/2{f}_{xxt}$
6 1 1 0 ${f}_{xxy}$ ${f}_{xy}$ ${f}_{xx}$ 0 ${f}_{xyy}$ ${f}_{yy}$ ${f}_{xy}$ 0 $-{f}_{xyt}$
7 1 0 1 ${f}_{xxt}$ ${f}_{xt}$ 0 ${f}_{xx}$ ${f}_{xyt}$ ${f}_{yt}$ 0 ${f}_{xy}$ $-{f}_{xtt}$
8 0 2 0 $1/2{f}_{xyy}$ 0 ${f}_{xy}$ 0 $1/2{f}_{yyy}$ 0 ${f}_{yy}$ 0 $-1/2{f}_{yyt}$
9 0 1 1 ${f}_{xyt}$ 0 ${f}_{xt}$ ${f}_{xy}$ ${f}_{yyt}$ 0 ${f}_{yt}$ ${f}_{yy}$ $-{f}_{ytt}$
10 0 0 2 $1/2{f}_{xtt}$ 0 0 ${f}_{xt}$ $1/2{f}_{ytt}$ 0 0 ${f}_{yt}$ $-1/2{f}_{ttt}$
11 3 0 0 0 $1/2{f}_{xxx}$ 0 0 0 $1/2{f}_{xxy}$ 0 0 0
12 2 1 0 0 ${f}_{xxy}$ $1/2{f}_{xxx}$ 0 0 ${f}_{xyy}$ $1/2{f}_{xxy}$ 0 0
13 2 0 1 0 ${f}_{xxt}$ 0 $1/2{f}_{xxx}$ 0 ${f}_{xyt}$ 0 $1/2{f}_{xxy}$ 0
14 1 2 0 0 $1/2{f}_{xyy}$ ${f}_{xxy}$ 0 0 $1/2{f}_{yyy}$ ${f}_{xyy}$ 0 0
15 1 1 1 0 ${f}_{xyt}$ ${f}_{xxt}$ ${f}_{xxy}$ 0 ${f}_{yyt}$ ${f}_{xyt}$ ${f}_{xyy}$ 0
16 1 0 2 0 $1/2{f}_{xtt}$ 0 ${f}_{xxt}$ 0 $1/2{f}_{ytt}$ 0 ${f}_{xyt}$ 0
17 0 3 0 0 0 $1/2{f}_{xyy}$ 0 0 0 $1/2{f}_{yyy}$ 0 0
18 0 2 1 0 0 ${f}_{xyt}$ $1/2{f}_{xyy}$ 0 0 ${f}_{yyt}$ $1/2{f}_{yyy}$ 0
19 0 1 2 0 0 $1/2{f}_{xtt}$ ${f}_{xyt}$ 0 0 $1/2{f}_{ytt}$ ${f}_{yyt}$ 0
20 0 0 3 0 0 0 $1/2{f}_{xtt}$ 0 0 0 $1/2{f}_{ytt}$ 0
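The counting arguments above, the number of unknowns and the number of equations of the rigorous condition, can be checked with a few lines of Python (the helper names are ours):

```python
from itertools import product
from math import comb

def n_unknowns(m):
    """Unknowns of Eq. (A36): the two functions u and v, each with all
    partial derivatives described by a multi-index beta with |beta| <= m."""
    betas = [b for b in product(range(m + 1), repeat=3) if sum(b) <= m]
    return 2 * len(betas)

def n_equations(n, m):
    """Rigorous condition: one equation per monomial of total degree
    <= n+m in (dx, dy, dt), i.e. binomial(n+m+3, 3)."""
    return comb(n + m + 3, 3)
```

For $n=2$ and $m=1$ this reproduces the 20 equations and eight unknowns of Table 2, and `n_unknowns` agrees with the closed form $1/3(m+1)(m+2)(m+3)$.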

A11 Method of Nagel (1987)

Another constraint is the sampling condition. For this constraint the polynomial in Eq. (A36) is sampled in the neighborhood of $(\delta x,\; \delta y,\; \delta t)=(0,\; 0,\; 0)$ by $\delta x\in \left\{-1,\; 0,\; +1\right\}$, $\delta y\in \left\{-1,\; 0,\; +1\right\}$, and $\delta t\in \left\{-1,\; 0,\; +1\right\}$. This results in 27 constraint equations, which are summarized in Table 3 in terms of the row-indices from Table 2. Thus, a constraint equation is formed by plugging in the coefficients from Table 2, multiplied by $+1$ for a '+' entry, by $0$ for a '0' entry, and by $-1$ for a '$-$' entry in Table 3. These 27 equations can be reduced to a system of 24 linearly independent equations. We provide a Matlab implementation of this method, too: Download.

Table 3: shows the constraint equations obtained when sampling Eq. (A36). The first column shows an index. The second to fourth columns show the sampled values for $\delta x$, $\delta y$, and $\delta t$. The constraint Eqs. 1, 3, 4, and 9 are linearly dependent and one of these can be dropped. The same applies to constraint Eqs. 19, 21, 25, and 27 and to Eqs. 10, 12, 16, and 18. In total, three of the 27 equations can be dropped due to linear dependencies. This leaves 24 equations.
no $\delta x$ $\delta y$ $\delta t$ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1 -1 -1 -1 + - - - + + + + + + - - - - - - - - - -
2 0 -1 -1 + 0 - - 0 0 0 + + + 0 0 0 0 0 0 - - - -
3 +1 -1 -1 + + - - + - - + + + + - - + + + - - - -
4 -1 0 -1 + - 0 - + 0 + 0 0 + - 0 - 0 0 - 0 0 0 -
5 0 0 -1 + 0 0 - 0 0 0 0 0 + 0 0 0 0 0 0 0 0 0 -
6 +1 0 -1 + + 0 - + 0 - 0 0 + + 0 - 0 0 + 0 0 0 -
7 -1 +1 -1 + - + - + - + + - + - + - - + - + - + -
8 0 +1 -1 + 0 + - 0 0 0 + - + 0 0 0 0 0 0 + - + -
9 +1 +1 -1 + + + - + + - + - + + + - + - + + - + -
10 -1 -1 0 + - - 0 + + 0 + 0 0 - - 0 - 0 0 - 0 0 0
11 0 -1 0 + 0 - 0 0 0 0 + 0 0 0 0 0 0 0 0 - 0 0 0
12 +1 -1 0 + + - 0 + - 0 + 0 0 + - 0 + 0 0 - 0 0 0
13 -1 0 0 + - 0 0 + 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0
14 0 0 0 + 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
15 +1 0 0 + + 0 0 + 0 0 0 0 0 + 0 0 0 0 0 0 0 0 0
16 -1 +1 0 + - + 0 + - 0 + 0 0 - + 0 - 0 0 + 0 0 0
17 0 +1 0 + 0 + 0 0 0 0 + 0 0 0 0 0 0 0 0 + 0 0 0
18 +1 +1 0 + + + 0 + + 0 + 0 0 + + 0 + 0 0 + 0 0 0
19 -1 -1 +1 + - - + + + - + - + - - + - + - - + - +
20 0 -1 +1 + 0 - + 0 0 0 + - + 0 0 0 0 0 0 - + - +
21 +1 -1 +1 + + - + + - + + - + + - + + - + - + - +
22 -1 0 +1 + - 0 + + 0 - 0 0 + - 0 + 0 0 - 0 0 0 +
23 0 0 +1 + 0 0 + 0 0 0 0 0 + 0 0 0 0 0 0 0 0 0 +
24 +1 0 +1 + + 0 + + 0 + 0 0 + + 0 + 0 0 + 0 0 0 +
25 -1 +1 +1 + - + + + - - + + + - + + - - - + + + +
26 0 +1 +1 + 0 + + 0 0 0 + + + 0 0 0 0 0 0 + + + +
27 +1 +1 +1 + + + + + + + + + + + + + + + + + + + +

A12 Method of normal flow estimation

The OFCE $\nabla f^{t} (u,v,1)^{t} =0$ only allows for the detection of the velocity component along the local gray-value gradient $(f_{x} ,f_{y} )$, i.e. normal to the local edge, as calculated for a local region within the image. If such a region contains a corner feature, then the motion is unambiguously defined by the OFCE. Most methods use predominantly such corner features to estimate the local image motion. Alternative approaches estimate normal flow from local gray-value edges in the image (Johnston et al., 1992; Barron et al., 1994; Fermüller & Aloimonos, 1993; Aloimonos & Duric, 1994). The general technique is summarized in the following two equations that estimate speed and normal direction of motion, respectively:

$s=-\partial_t f/\left\| \nabla _{xy} f\right\|$ and $\vec{d}=\nabla _{xy} f/\left\| \nabla _{xy} f\right\|$. (A38)
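Eq. (A38) translates directly into code; a small Python sketch with hand-picked derivatives of a translating ramp (our own example) shows that the recovered speed is the projection of the true flow onto the gradient direction:

```python
import numpy as np

def normal_flow(fx, fy, ft, eps=1e-8):
    """Speed and unit direction of normal flow, Eq. (A38)."""
    norm = np.sqrt(fx**2 + fy**2) + eps       # guard against zero gradients
    s = -ft / norm                            # speed along the gradient
    d = np.stack([fx, fy]) / norm             # unit gradient direction
    return s, d

# Linear ramp f = 2x + y translating with flow (u, v) = (1, 1),
# hence ft = -(fx*u + fy*v) by the OFCE.
fx, fy = np.array([2.0]), np.array([1.0])
ft = -(fx * 1.0 + fy * 1.0)
s, d = normal_flow(fx, fy, ft)
# s equals the projection of the true flow (1, 1) onto d = (2, 1)/sqrt(5).
```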

A13 Methods which use a probabilistic framework

Some authors proposed the estimation of optic flow using a probabilistic framework (Simoncelli et al., 1991; Nestares et al., 2000). Weiss & Fleet (2002) provide an overview. The goal is to compute the posterior probability of a velocity vector $\vec{v}=(u, v)^{t}$ given the image data $f$ using Bayes' theorem:

$P(\vec{v}(\vec{z})\; \left|\; f(\vec{z})\right. )=\frac{P(f(\vec{z})\; \left|\vec{v}\right. (\vec{z}))\cdot P(\vec{v}(\vec{z}))}{P(f(\vec{z}))}$. (A39)

The probability of certain image data $P(f(\vec{z}))$ is independent of the velocity vector $\vec{v}$ and, thus, can be dropped for optimization, e.g. for maximum a-posteriori estimation. A commonly used constraint to express the probability of the image data $f$ given a velocity $\vec{v}$ is:

$P(f(\vec{z})\; \left|\vec{v}\right. (\vec{z}))=\frac{1}{\sqrt{2\pi } \sigma } \exp \left(-\frac{1}{2\sigma ^{2} } \int _{\Omega _{\vec{z}} }\left(\left(\nabla_{xyt} f(\vec{z}')\right)^{t} \left(\begin{array}{c} {\vec{v}} \\ {1} \end{array}\right)\right) ^{2} \; d\vec{z}'\right)$, (A40)

which incorporates the OFCE constraint integrated over the local neighborhood of the spatio-temporal location $\vec{z}=(x,y,t)^{t}$, which we denote by $\Omega _{\vec{z}}$. Weiss & Adelson (1989) assume the prior term

$P(\vec{v}(\vec{z})\; )=\frac{1}{\sqrt{2\pi } \sigma } \exp \left(-\frac{1}{2\sigma ^{2} } \left(\int _{\Omega _{\vec{z}} }\left\| \sum _{n=0}^{\infty }\frac{\lambda ^{2n} }{2^{n} n!} \sum _{\left|\alpha \right|=n}\partial ^{\alpha } \vec{v}(\vec{z}') \right\| _{2}^{2} \; d\vec{z}' \right)^{2} \right)$ (A41)

for velocities, which favors slow and smooth motions above fast and incoherent motions. The identifier $\alpha$ denotes a multi-index as before in e.g. Eq. (A36), $\lambda$ and $\sigma$ are parameters.
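For a single image location and a zero-mean Gaussian prior that merely favors slow motion (a strong simplification of Eq. (A41)), maximizing the posterior of Eq. (A39) reduces to ridge-regularized least squares. The following Python sketch uses synthetic derivatives consistent with one translation; parameter names and values are our own:

```python
import numpy as np

def map_velocity(fx, fy, ft, sigma=1.0, sigma_prior=10.0):
    """MAP estimate of (u, v) under the likelihood of Eq. (A40) and a
    zero-mean Gaussian prior that favors slow motion. Maximizing the
    posterior becomes ridge-regularized least squares."""
    lam = (sigma / sigma_prior) ** 2
    M = np.array([[np.sum(fx * fx), np.sum(fx * fy)],
                  [np.sum(fx * fy), np.sum(fy * fy)]]) + lam * np.eye(2)
    b = np.array([np.sum(fx * ft), np.sum(fy * ft)])
    return -np.linalg.solve(M, b)

# Synthetic derivatives in a neighborhood, consistent with one translation.
rng = np.random.default_rng(0)
u0, v0 = 0.8, -0.3
fx, fy = rng.normal(size=100), rng.normal(size=100)
ft = -(fx * u0 + fy * v0)
v_est = map_velocity(fx, fy, ft)
```

With a weak prior the estimate is only slightly biased towards zero and essentially recovers the generating velocity.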

Weiss (1997) uses the expectation-maximization (EM) framework in combination with a Gaussian smoothness prior; labels are drawn from a Markov random field and "tagged" to the velocity field using the OFCE as the argument of a Gaussian distribution function. The maximization step of the EM framework is simplified by using a Green's function approach combined with a restricted number of basis functions, which leads to a suboptimal solution. Examples illustrate that this suboptimal solution is close to the optimal one and that its computation provides savings in computation time.

Wang & Adelson (1993) suggest a method that uses an affine motion analysis and clustering technique to decompose the scene into layers of coherent motion. Rather than assuming piecewise smooth flow, like other methods, Wang & Adelson assume a constant parametric flow for each layer of the image that contains objects of roughly the same depth. Flow discontinuities are then attributed to depth discontinuities between boundaries of objects that appear at different distances from the camera. As parametric motion model, Wang & Adelson use a linear affine motion model with six parameters that is estimated using the method of Lucas & Kanade (1981). Similar parametric motions are clustered using k-means. An initial segmentation into motion layers is propagated to succeeding frames when processing a sequence of images. Another possibility for segmenting motion into layers is a voting strategy (Min & Medioni, 2008). Ramirez-Manzanares et al. (2011) combine the multi-layer flow estimation with the smoothness constraint using the calculus of variations, without assuming piecewise smoothness or estimating individual flow vectors for each pixel in the image.

A14 Motion Energy model

Adelson & Bergen (1985) suggest using spatio-temporally separable filters to detect motion, e.g. Gabor filters (Gabor, 1946) in the spatial domain and the following causal function for the temporal domain:

$h_{i} (t)=(kt)^{n_{i} } \exp (-kt)\left(1/n_{i} !-(kt)^{2} /(n_{i} +2)!\right)$ with $n_{1} =3$ or $n_{2} =5$. (A42)

This function is motivated by the temporal contrast sensitivity of the human visual system (Robson, 1966). Gabor filters for the spatial domain are defined by:

$\begin{array}{l} {g(\xi _{0} ,\eta _{0} ,x_{0} ,y_{0} ,x,y)} \\ {\quad =\frac{1}{2\pi \sigma _{x} \sigma _{y} } \exp \left(-\left(\frac{(x-x_{0} )^{2} }{2\sigma _{x}^{2} } +\frac{(y-y_{0} )^{2} }{2\sigma _{y}^{2} } \right)\right)\cdot \exp \left(2\pi i\cdot \left((x-x_{0} )\xi _{0} +(y-y_{0} )\eta _{0} \right)\right)} \end{array}$, (A43)

which can be efficiently implemented using steerable filters (Freeman & Adelson, 1991). Following Watson & Ahumada (1983) these filters can be combined in the following scheme to detect motion energy:

$E=4(AB'-A'B)$ where (A44a)

$A=Re(f*g)*h_{1}$, $A'=Re(f*g)*h_{2}$, (A44b)

$B=Im(f*g)*h_{1}$, $B'=Im(f*g)*h_{2}$, (A44c)

with the space-time varying gray-value input function $f$. The notation in Eq. (A44a-c) is abstract: it includes neither the parameters of the 2D Gabor filter nor the sampling windows for the spatial and temporal dimensions. The temporal convolution is causal. Thus, when using a common implementation of convolution, e.g. convn provided by Matlab, the right-hand side of the filter kernel needs to be filled with zeros to make it causal. We provide a Matlab implementation of the motion energy detection scheme: Download.
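A minimal one-dimensional sketch of this scheme may clarify the computation. This is our own illustration, not the Matlab code linked above; the filter parameters $k$, $\sigma$, and $\xi_0$ below are assumed values. For a sinusoid drifting in opposite directions, the opponent energy $E$ takes opposite signs:

```python
import numpy as np
from math import factorial

def temporal_filter(t, k, n):
    # h_i(t) = (kt)^n exp(-kt) (1/n! - (kt)^2/(n+2)!), Eq. (A42)
    kt = k * t
    return kt**n * np.exp(-kt) * (1.0 / factorial(n) - kt**2 / factorial(n + 2))

k = 1.0                        # assumed temporal rate parameter
t = np.arange(0.0, 30.0)       # causal support t >= 0
h1 = temporal_filter(t, k, 3)  # n1 = 3
h2 = temporal_filter(t, k, 5)  # n2 = 5

sigma, xi0 = 4.0, 0.1          # assumed spatial scale and tuning frequency
xs = np.arange(-16, 17)
g = np.exp(-xs**2 / (2 * sigma**2)) * np.exp(2j * np.pi * xi0 * xs)  # 1D Gabor

def causal_tconv(u, h):
    # Causal temporal convolution: output at frame t uses only frames <= t
    out = np.zeros_like(u)
    for ti in range(u.shape[0]):
        for tau in range(min(ti + 1, len(h))):
            out[ti] += h[tau] * u[ti - tau]
    return out

def opponent_energy(f):
    # f: (time, space) gray-value movie; spatial, then causal temporal filtering
    s = np.array([np.convolve(frame, g, mode='same') for frame in f])
    A, Ap = causal_tconv(s.real, h1), causal_tconv(s.real, h2)
    B, Bp = causal_tconv(s.imag, h1), causal_tconv(s.imag, h2)
    return 4 * (A * Bp - Ap * B)  # Eq. (A44a)

T, X = 80, 64
ts, xx = np.meshgrid(np.arange(T), np.arange(X), indexing='ij')
right = np.cos(2 * np.pi * (xi0 * xx - 0.05 * ts))  # drifts in one direction
left  = np.cos(2 * np.pi * (xi0 * xx + 0.05 * ts))  # drifts in the other
E_right = opponent_energy(right)[T // 2:].mean()    # discard onset transient
E_left  = opponent_energy(left)[T // 2:].mean()
```

For an ideal drifting sinusoid the phase terms cancel in $AB'-A'B$, so $E$ is approximately constant over space and time, with a sign determined by the drift direction relative to the detector's tuning.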

A15 Log-monopole mapping of optic flow

To map visual space coordinates into coordinates of area V1, Rojer & Schwartz (1990) suggest the monopole mapping function:

$w=\left\{\begin{array}{cc} {\log (z+a)-\log (a)} & {if\; \; Re(z)\ge 0} \\ {-\left(\log (-z+a)-\log (a)\right)} & {if\; \; Re(z){\rm <0}} \end{array}\right.$ for $z=x+iy\in C$, (A45)

where $a$ is a parameter that describes part of the anatomical structure of the primate visual system. In this context the image coordinates $\vec{x}=(x,y)^{t}$ are often expressed in polar coordinates $(\theta ,\rho )^{t}$ with $x=\rho \cos (\theta )$ and $y=\rho \sin (\theta )$. The optic flow in the mapped coordinates is calculated analytically using the Jacobian matrix of the mapping function from Eq. (A45)

$J_{w} =\left\{\begin{array}{cc} {\frac{1}{(x+a)^{2} +y^{2} } \left(\begin{array}{cc} {x+a} & {y} \\ {-y} & {x+a} \end{array}\right)} & {if\; \; Re(z)\ge 0} \\ {\frac{1}{(x-a)^{2} +y^{2} } \left(\begin{array}{cc} {-x+a} & {-y} \\ {y} & {-x+a} \end{array}\right)} & {if\; \; Re(z){\rm <0}} \end{array}\right.$ (A46)

and multiplying the result with the right-hand side of Eq. (24) (Elder et al., 2009).
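As a sanity check, the Jacobian in Eq. (A46) can be verified numerically against central finite differences of the mapping in Eq. (A45); a flow vector in image coordinates is then mapped by multiplying it with this Jacobian. This is our own sketch, not code from the cited works; the parameter value, step size, and test points are assumed:

```python
import numpy as np

def monopole_map(z, a):
    # Eq. (A45): log-monopole mapping of visual space to V1 coordinates
    if z.real >= 0:
        return np.log(z + a) - np.log(a)
    return -(np.log(-z + a) - np.log(a))

def monopole_jacobian(x, y, a):
    # Eq. (A46): 2x2 Jacobian of the mapping at image position (x, y)
    if x >= 0:
        d = (x + a)**2 + y**2
        return np.array([[x + a, y], [-y, x + a]]) / d
    d = (x - a)**2 + y**2
    return np.array([[-x + a, -y], [y, -x + a]]) / d

a, eps = 0.5, 1e-6  # assumed anatomical parameter and difference step
errs = []
for x, y in [(0.3, 0.2), (1.0, -0.7), (-0.4, 0.5)]:  # points off the meridian
    z = complex(x, y)
    # central differences of w = (Re w, Im w) with respect to (x, y)
    dwdx = (monopole_map(z + eps, a) - monopole_map(z - eps, a)) / (2 * eps)
    dwdy = (monopole_map(z + 1j * eps, a) - monopole_map(z - 1j * eps, a)) / (2 * eps)
    J_num = np.array([[dwdx.real, dwdy.real], [dwdx.imag, dwdy.imag]])
    errs.append(np.max(np.abs(monopole_jacobian(x, y, a) - J_num)))
max_err = max(errs)

# Example: map an image-plane flow vector (u, v) into the V1 coordinates
u_image = np.array([0.1, -0.05])
u_mapped = monopole_jacobian(0.3, 0.2, a) @ u_image
```

Note that the mapping is discontinuous across the vertical meridian $Re(z)=0$, reflecting the split of the two hemifields; the test points above therefore stay within one hemifield.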