Schematic framework for theories of perception
There is wide agreement that science benefits from tight interaction between theory and experiment. In the field of mammalian perception, however, a structured theoretical space is lacking. Contemporary theories of perception, many of them implicitly deduced from experimental designs, form a patchy theoretical landscape. This article attempts to describe families of implicit and explicit theories of perception, mostly for the visual and tactile modalities, in one structured plane. The two axes spanning this plane are (i) the brain-world (BW) axis, along which external information is acquired by a given brain, and (ii) the brain-brain (BB) axis, along which different brains interact.
How do we perceive our environment? Despite decades of intensive research, the scientific community does not seem to be converging on an agreed direction. To begin with, there is no agreement about the general scheme of perception: is perception ‘direct’ or ‘indirect’? Does it depend on active sensor movements? Is it based on the construction of internal representations? Moreover, the neurobiology of perception seems to progress for the most part independently of its theory, arguably in part because a structured theoretical landscape is lacking. This article proposes such a landscape, which, despite its simplicity (or in fact thanks to its simplicity), can form an initial step towards a productive theory-experiment dialogue.
The brain-world (BW) axis
Along this axis, information about the environment of a specific brain is acquired. We adopt here the Umwelt viewpoint of von Uexkull, according to which the world perceived by a given brain (B_i) is unique to that brain (W_i) (Uexkull, 1926). In line with von Uexkull, whenever we use the term “brain” here we in fact refer to the entire organism, and primarily to the entire perceptual system in that organism, including the sensory organs and their muscles (see Ahissar et al., 2015).
We categorize theories of perception (TOPs), whether implicit or explicit, in five schematic classes, not necessarily mutually exclusive (Figure 1). For this generic scheme we consider two fundamental states – ‘world state’ and ‘brain state’ – and assume that perceptual acquisition updates the brain state according to the world state such that the brain state forms an updated model of the world state (Friston, 2010; Tishby and Polani, 2011). In the following we describe briefly each class and provide several representative examples of implicit or explicit theories.
- Bottom-Up TOPs. Acquisition of information from the world is done in one direction, from W to B, and in a feedforward manner. Various aspects of such processing have been suggested over the years, including: integration of neuronal representations of individual features (Feature Integration Theory, FIT; Treisman and Gelade, 1980), classification by feedforward transformations (deep networks, Cadieu et al., 2014; Kriegeskorte, 2015; HMAX, Poggio and Serre, 2013), ignition of Local Activations (LA; Noy et al., 2015), activation of a Global Work Space (GWS; Dehaene et al., 1998; Baars, 2002) and others.
- Bottom-Up-Top-Down TOPs. Acquisition of information from the world includes processes running in two directions, from W towards B (bottom-up, BU) and from B towards W (top-down, TD) (Rao and Ballard, 1999). Different schemes suggest different types of interactions between the two processing streams. The Reverse Hierarchy Theory (RHT) suggests that the gist of the scene is acquired via a rapid propagation in the bottom-up direction and the details are acquired via top-down processes, whose depth and level of scrutiny depend on the context (Hochstein and Ahissar, 2002; Ahissar and Hochstein, 2004). The BU/TD Segmentation (BUTD) scheme proposes that BU and TD processing streams run in parallel and interact at different brain levels, matching stored knowledge with segmentation constraints (Borenstein and Ullman, 2008).
- Bottom-Up Reentrant TOPs. Acquisition of information from the world is done in the BU direction, from W towards B, and includes local closed-loop dynamics in one or more processing stations. Reentrant processing may facilitate data integration and categorization (Enns and Di Lollo, 2000; Edelman and Gally, 2013), possibly using algorithms for iterative processing such as recursive Bayesian estimation, Kalman filtering or particle filtering.
- Closed-loop TOPs. Acquisition of information from the world is done in loops connecting B and W (Uexkull, 1926; Ahissar and Vaadia, 1990; Friston, 2010; Tishby and Polani, 2011). The processing obeys global closed-loop dynamics which link world elements and brain elements. Loop dynamics may follow an internal control signal (Perceptual Control Theory, PCT (Powers, 1973)) or converge to perceptual attractors (Closed-Loop Perception, CLP (Ahissar and Assa, 2016)).
- Motor-sensory TOPs. Perception is hypothesized to emerge from motor-sensory interactions and to depend on sensorimotor contingencies (SMCs; O'Regan and Noe, 2001). The neuronal implementation is not stated explicitly, and thus this type of TOP may be integrated with the previous ones. Specifically, it quite naturally complements closed-loop TOPs (Buhrmann et al., 2013).
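The iterative algorithms mentioned above for reentrant processing (recursive Bayesian estimation, Kalman filtering) can be illustrated with a minimal scalar Kalman filter. This is only a sketch under stated assumptions, not a model proposed by any of the cited theories: the "world state" is a single number, the "brain state" is a Gaussian belief (mean and variance), and the function names and default noise values are hypothetical.

```python
def kalman_step(mean, var, measurement, process_var, meas_var):
    """One predict-update cycle of a scalar Kalman filter.

    Assumed (hypothetical) model: the world state persists from step to
    step, perturbed by process noise; each measurement is the world
    state plus sensor noise.
    """
    # Predict: the belief carries over, its uncertainty grows.
    pred_mean = mean
    pred_var = var + process_var
    # Update: weight the new measurement by its relative reliability.
    gain = pred_var / (pred_var + meas_var)
    new_mean = pred_mean + gain * (measurement - pred_mean)
    new_var = (1.0 - gain) * pred_var
    return new_mean, new_var


def estimate(measurements, prior_mean=0.0, prior_var=1.0,
             process_var=0.0, meas_var=1.0):
    """Feed measurements one by one, returning the belief after each."""
    mean, var = prior_mean, prior_var
    history = []
    for z in measurements:
        mean, var = kalman_step(mean, var, z, process_var, meas_var)
        history.append((mean, var))
    return history


# Three identical noisy readings of a world state near 1.0.
for step, (mean, var) in enumerate(estimate([1.0, 1.0, 1.0]), start=1):
    print(f"step {step}: mean={mean:.3f}, variance={var:.3f}")
```

Each pass through the loop reuses the output of the previous pass as its prior, which is the sense in which the estimate is recursive; with a constant world state and no process noise, the variance shrinks at every step.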
BW acquisition processes can in principle follow discrete or continuous dynamics. This distinction is related to, but probably cannot be reduced to, the distinction between discrete dynamical systems and continuous dynamical systems, which is formulated only in terms of their descriptive equations.
If perception follows discrete dynamics, it can be localized in space and time. That is, it has starting and ending spatiotemporal coordinates, and in principle it can be put “on hold”: paused and continued later. In contrast, if perception follows continuous dynamics, it does not have starting or ending spatiotemporal coordinates and it cannot be put “on hold”: if paused, it cannot be continued later. Accordingly, with discrete dynamics perception can be based on transformations between static representations, where x is termed ‘static’ if there exists a time window, short as it may be, in which x does not change (e.g., Marr, 1982). In contrast, with continuous dynamics no static events exist whatsoever (e.g., Van Gelder and Port, 1995; Kelso, 1997).
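The definition of ‘static’ given above can be turned into a simple operational check on a sampled signal. The sketch below is illustrative only: the function names, the sampling step and the tolerance are assumptions, and because any computer simulation samples time, the check can only flag static windows at the chosen resolution, not establish that a process is truly continuous.

```python
import math


def has_static_window(samples, min_len=2, tol=0.0):
    """Return True if some run of at least `min_len` consecutive samples
    stays within `tol` of the first sample of that run."""
    run_start = 0
    for i in range(1, len(samples) + 1):
        run_ended = (i == len(samples)
                     or abs(samples[i] - samples[run_start]) > tol)
        if run_ended:
            if i - run_start >= min_len:
                return True
            run_start = i
    return False


def sample_and_hold(samples, hold):
    """Freeze the signal for `hold` samples at a time (hypothetical
    stand-in for a representation that is held static between updates)."""
    return [samples[(k // hold) * hold] for k in range(len(samples))]


# A smoothly varying signal, and the same signal held in blocks of 5.
wave = [math.sin(0.1 * k) for k in range(50)]
held = sample_and_hold(wave, 5)

print(has_static_window(wave, min_len=2))
print(has_static_window(held, min_len=5))
```

The held signal contains windows in which nothing changes, so a process operating on it could in principle be paused inside such a window and resumed; the smoothly varying signal offers no such window at this sampling resolution.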
The brain-brain (BB) axis
Along this axis, information about the environment is exchanged between brains based on their brain states. BB communication is also based on channels of BW acquisition (naturally, since for each brain the other brain is part of its world). For example, auditory perception is used for speech, and visual perception for written symbols. The information carried in these channels, however, is symbolic, and it is transferred between the brain states of the two brains (e.g., B1 and B2; Figure 2). We term the fundamental items transferred in BB communication “ideas”, following Descartes’ terminology (Descartes and Cottingham, 2013). These ideas often represent “substances” perceived in the worlds of these two brains (W1 and W2); here we refer only to such substance-related ideas. An objective world (W) can be inferred from the collective behavior of the two (or more) brains.
The most typical human BB channels used for conveying ideas about external substances are those related to language:
- Speech. The physical channel is based on the auditory system. This communication is rhythmic, active on both sides (production and perception) and usually interactive (closing a production-perception loop).
- Sign language. The physical channel is typically based on the visual system. This communication is rhythmic, active on both sides (production and perception) and usually interactive (closing a production-perception loop).
- Script. The physical channel is typically based on the visual system. Both writing and reading are rhythmic and active, but are typically not interactive.
In other species, BB communication may be based on conspecific calls, songs and displays, as well as on scent communication. In any case, given the discrete nature of ideas, BB communications of ideas should follow discrete dynamics. The physical communication carrying these ideas, via BW channels, may follow either discrete or continuous dynamics, as discussed above.
Interactions between the two axes can occur at many levels. For example (Figure 2, large arrows): BU processes (including those containing reentrant loops) may add feedforward levels to convey the internal brain state (which would form an Internal Representation, IR, in this case) to the BB channel. BU-TD acquisition processes may interact bi-directionally with the BB channel, where sites of interaction may span a range between the top of the BU hierarchy (more likely for RHT) and earlier BU junctions (more likely for BU/TD segmentation). Closed-loop acquisition schemes may favor closed-loop interactions with the BB channel, with sites of interaction spanning those parts of the BW loops that are accessible for conscious report.
BW-BB interactions can be embodied, in each brain, via synaptic interactions between any projection from a BB-related station (e.g., a speech recipient station) and a BW-related station (e.g., sensory brain areas). Two major candidates, not necessarily mutually exclusive, come to mind here. One is the set of feedback (TD) connections projecting from high-level to low-level sensory stations. The other is the set of efference copies, that is, collaterals of motor-related projections that innervate sensory stations. The dense distribution of these junctions allows BB-BW interactions in virtually all processing stations. These interactions can be unidirectional, bi-directional, open-loop or closed-loop. One crucial transformation must be acknowledged here. Whereas BB communication is based on discrete signals, typically representing perceptual categories, BW communication may be continuous and non-categorical. Thus, BW-BB interactions are likely to include transformations between continuous and discrete representations.
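A transformation between continuous and discrete representations of the kind referred to above can be sketched, in its simplest form, as nearest-prototype categorization. This is an illustrative assumption rather than a mechanism claimed by the article: the category labels, the prototype values and the function names are all hypothetical.

```python
# Hypothetical perceptual categories for a continuous temperature-like
# variable; labels and prototype values are invented for illustration.
PROTOTYPES = {"cold": 5.0, "warm": 20.0, "hot": 35.0}


def categorize(value, prototypes):
    """Continuous to discrete: label of the nearest prototype."""
    return min(prototypes, key=lambda label: abs(prototypes[label] - value))


def decategorize(label, prototypes):
    """Discrete to continuous: the receiver can only recover the
    prototype, not the value the sender originally perceived."""
    return prototypes[label]


perceived = 18.3                               # continuous BW-side value
idea = categorize(perceived, PROTOTYPES)       # discrete item sent over BB
reconstructed = decategorize(idea, PROTOTYPES)

print(idea, reconstructed)
```

The round trip is lossy: the receiving side recovers the category prototype rather than the original continuous value, which is one way to see why the two kinds of representation cannot simply be identified with each other.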
There are two external anchors for empirical studies addressing theories of perception: the world state and the report. Psychophysical and behavioral approaches typically monitor and manipulate these two anchors rigorously. Unlike these two anchors, the internal brain state cannot be monitored, or manipulated, in a rigorous manner. In fact, we can currently sample only a negligible fraction of the relevant neuronal activity in any given condition. Thus, empirical discrimination between available theories should probably proceed in stages, starting with well-designed behavioral experiments and continuing with prediction-based neuronal experiments.
References
- Ahissar, E. and E. Assa, Perception as a closed-loop convergence process. eLife, 2016. 5: p. e12830.
- Ahissar, E., N. Shinde, and S. Haidarliu, Systems neuroscience of touch. Scholarpedia, 2015. 10: p. 32785.
- Ahissar, M. and S. Hochstein, The reverse hierarchy theory of visual perceptual learning. Trends Cogn Sci, 2004. 8(10): p. 457-64.
- Baars, B.J., The conscious access hypothesis: origins and recent evidence. Trends Cogn Sci, 2002. 6(1): p. 47-52.
- Borenstein, E. and S. Ullman, Combined top-down/bottom-up segmentation. IEEE Transactions on pattern analysis and machine intelligence, 2008. 30(12): p. 2109-2125.
- Buhrmann, T., E.A. Di Paolo, and X. Barandiaran, A dynamical systems account of sensorimotor contingencies. Frontiers in psychology, 2013. 4.
- Cadieu, C.F., et al., Deep neural networks rival the representation of primate IT cortex for core visual object recognition. PLoS computational biology, 2014. 10(12): p. e1003963.
- Dehaene, S., M. Kerszberg, and J.P. Changeux, A neuronal model of a global workspace in effortful cognitive tasks. Proc Natl Acad Sci U S A, 1998. 95(24): p. 14529-34.
- Descartes, R. and J. Cottingham, René Descartes: Meditations on First Philosophy: With Selections from the Objections and Replies. 2013: Cambridge University Press.
- Edelman, G.M. and J.A. Gally, Reentry: a key mechanism for integration of brain function. Frontiers in integrative neuroscience, 2013. 7.
- Enns, J.T. and V. Di Lollo, What's new in visual masking? Trends Cogn Sci, 2000. 4(9): p. 345-352.
- Friston, K., The free-energy principle: a unified brain theory? Nature Reviews Neuroscience, 2010. 11(2): p. 127-38.
- Hochstein, S. and M. Ahissar, View from the top: hierarchies and reverse hierarchies in the visual system. Neuron, 2002. 36(5): p. 791-804.
- Kelso, J.S., Dynamic patterns: The self-organization of brain and behavior. 1997: MIT press.
- Kriegeskorte, N., Deep neural networks: a new framework for modeling biological vision and brain information processing. Annual Review of Vision Science, 2015. 1: p. 417-446.
- Marr, D., Vision. 1982, San Francisco: W. H. Freeman.
- Noy, N., et al., Ignition’s glow: Ultra-fast spread of global cortical activity accompanying local “ignitions” in visual cortex during conscious visual perception. Consciousness and cognition, 2015. 35: p. 206-224.
- O'Regan, J.K. and A. Noe, A sensorimotor account of vision and visual consciousness. Behavioral and Brain Sciences, 2001. 24(5): p. 939-73; discussion 973-1031.
- Poggio, T. and T. Serre, Models of visual cortex. Scholarpedia, 2013. 8(4): p. 3516.
- Powers, W.T., Feedback: beyond behaviorism. Science, 1973. 179(71): p. 351-6.
- Rao, R.P. and D.H. Ballard, Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 1999. 2: p. 79.
- Tishby, N. and D. Polani, Information theory of decisions and actions, in Perception-Action Cycle. 2011, Springer. p. 601-636.
- Treisman, A.M. and G. Gelade, A feature-integration theory of attention. Cognit Psychol, 1980. 12(1): p. 97-136.
- Uexkull, J.v., Theoretical biology. 1926, London: K. Paul, Trench, Trubner & co. ltd.
- Van Gelder, T. and R.F. Port, It’s about time: An overview of the dynamical approach to cognition. Mind as motion: Explorations in the dynamics of cognition, 1995. 1: p. 43.