Adaptive resonance theory

Dr. Stephen Grossberg, Boston University, MA

Introduction

The Stability-Plasticity Dilemma and Rapid Learning Throughout Life

Adaptive Resonance Theory, or ART, is a cognitive and neural theory of how the brain autonomously learns to attend, categorize, recognize, and predict objects and events in a changing world. ART currently has the broadest explanatory and predictive range of available cognitive and neural theories. Central to ART's predictive power is its ability to autonomously carry out fast, incremental, unsupervised and supervised learning in response to a changing world, without erasing previously learned memories.

Humans are able to rapidly learn enormous amounts of new information, on their own, throughout life. How do humans integrate all this information into unified conscious experiences that cohere into a sense of self? One has only to see an exciting movie just once to marvel at this capacity, since we can then tell our friends many details about it later on, even though the individual scenes flashed by very quickly. More generally, we can quickly learn about new environments, even if no one tells us how the rules of each environment differ. To a remarkable degree, humans can rapidly learn new facts without being forced to just as rapidly forget what they already know. As a result, we can confidently go out into the world without fearing that, in learning to recognize a new friend's face, we will suddenly forget the faces of our family and friends. Such a sudden, unselective loss of previously learned memories is often called catastrophic forgetting.

Grossberg (1980) has called the problem whereby the brain learns quickly and stably without catastrophically forgetting its past knowledge the stability-plasticity dilemma. The stability-plasticity dilemma must be solved by every brain system that needs to rapidly and adaptively respond to the flood of signals that subserves even the most ordinary experiences. To solve the stability-plasticity dilemma, ART specifies mechanistic links between processes of Consciousness, Learning, Expectation, Attention, Resonance, and Synchrony (the CLEARS processes). Grossberg (1978a, 1980, 2007) predicted that all brain representations that solve the stability-plasticity dilemma use variations of CLEARS mechanisms. Synchronous resonances are, in particular, expected to occur between multiple cortical and subcortical areas, and recent neurophysiological data support this prediction; e.g., see Buschman and Miller (2007), Engel et al. (2001), Grossberg (2009b), and Pollen (1999).

Through these CLEARS connections, ART clarifies why many animals are intentional beings who pay attention to salient objects, why "all conscious states are resonant states", and how brains can learn both many-to-one maps (representations whereby many object views, positions, and sizes all activate the same invariant object category) and one-to-many maps (representations that enable us to expertly know many things about individual objects and events).

ART accomplishes these properties by proposing how top-down expectations focus attention on salient combinations of cues, and characterizes how attention may operate via a form of self-normalizing "biased competition" (Desimone, 1998). ART explains how such top-down attentive matching may help to solve the stability-plasticity dilemma. In particular, when a good enough match occurs, a synchronous resonant state emerges that embodies an attentional focus and is capable of driving fast learning of bottom-up recognition categories and top-down expectations; hence the name adaptive resonance.

All of the main ART predictions have received increasing support from psychological and neurobiological data since ART was introduced in Grossberg (1976a, 1976b). Since then, ART has undergone continual development to explain and predict increasingly large behavioral and neurobiological data bases, ranging from normal and abnormal aspects of human and animal perception and cognition, to the spiking and oscillatory dynamics of hierarchically-organized laminar thalamocortical networks in multiple modalities. Indeed, some ART models explain and predict behavioral, anatomical, neurophysiological, biophysical, and even biochemical data. ART currently provides (see Grossberg, 2012; http://cns.bu.edu/~steve/ART.pdf) functional and mechanistic explanations of such diverse topics as laminar cortical circuitry; invariant object and scenic gist learning and recognition; prototype, surface, and boundary attention; gamma and beta oscillations; learning of entorhinal grid cells and hippocampal place cells; computation of homologous spatial and temporal mechanisms in the entorhinal-hippocampal system; vigilance breakdowns during autism and medial temporal amnesia; cognitive-emotional interactions that focus attention on valued objects in an adaptively timed way; item-order-rank working memories and learned list chunks for the planning and control of sequences of linguistic, spatial, and motor information; conscious speech percepts that are influenced by future context; auditory streaming in noise during source segregation; and speaker normalization. Brain regions whose functions are clarified by ART include visual and auditory neocortex; specific and nonspecific thalamic nuclei; inferotemporal, parietal, prefrontal, entorhinal, hippocampal, parahippocampal, perirhinal, and motor cortices; frontal eye fields; supplementary eye fields; amygdala; basal ganglia; cerebellum; and superior colliculus.

Equations, Modules, Modal Architectures, and Complementary Processing Streams

How does ART sit within the corpus of all neural models? In particular, is the brain just a bag of tricks, as some authors have proposed (e.g., Ramachandran, 1990)? A contrary view derives from the fact that many mind and brain phenomena have been explained and predicted using a small number of equations (e.g., equations for short-term memory, or STM; medium-term memory, or MTM; and long-term memory, or LTM) and a somewhat larger number of modules or microcircuits (e.g., shunting on-center off-surround networks, gated dipole opponent processing networks, associative learning networks, spectral timing networks, and the like), which have been specialized and assembled into modal architectures. The term “modal” stands for modality (e.g., architectures for vision, audition, cognition, cognitive-emotional interactions, sensory-motor control, and the like). Modal architectures are less general than a Turing or von Neumann architecture for general computing, but far more general than a traditional AI algorithm. They are designed to be capable of general-purpose self-organizing processing of a particular modality of biological intelligence, and their particular specializations of the basic equations and modules have been selected over the millennia by evolutionary pressures.

Figure 1: Complementary What and Where cortical processing streams for spatially-invariant object recognition and spatially-variant spatial representation and action, respectively. Perceptual and recognition learning use top-down excitatory matching and match-based learning that achieves fast learning without catastrophic forgetting. Spatial and motor learning use inhibitory matching and mismatch-based learning that enable rapid adaptation to changing bodily parameters. IT = inferotemporal cortex, PPC = posterior parietal cortex. [Reprinted with permission from Grossberg (2009b).]

ART networks form part of modal architectures. Modal architectures, in turn, embody new paradigms for brain computing that are called Complementary Computing (Grossberg, 2000b) and Laminar Computing (Grossberg, 1999). Complementary Computing describes how the global brain is organized into complementary parallel processing streams whose interactions generate biologically intelligent behaviors. Due to the computationally complementary organization of the brain, ART does not describe many spatial and motor behaviors whose matching and learning laws differ from those of ART (Figure 1).

Organization of the brain into complementary processes is predicted to be a general principle of brain design that is not just found in ART (Grossberg, 2000b). A complementary process can individually compute some properties well, but cannot, by itself, process other complementary properties. In thinking intuitively about complementary properties, one can imagine puzzle pieces fitting together. Both pieces are needed to finish the puzzle. Complementary brain processes are, however, more dynamic than any such analogy: Pairs of complementary processes interact to form emergent properties which overcome their complementary deficiencies to compute complete information with which to represent or control some aspect of intelligent behavior.

Laminar Computing describes how the cerebral cortex is organized into layered circuits whose specializations can support all forms of higher-order biological intelligence. Indeed, the laminar circuits of cerebral cortex seem to realize a revolutionary computational synthesis of the best properties of feedforward and feedback processing, digital and analog processing, and data-driven bottom-up processing and hypothesis-driven top-down processing (Grossberg, 2007). ART mechanisms have been naturally embodied in laminar cortical models of 3D vision and figure-ground separation (3D LAMINART model: Grossberg and Raizada, 2000; Grossberg and Swaminathan, 2004; Grossberg and Yazdanbakhsh, 2005; Raizada and Grossberg, 2001); audition, speech, and language (cARTWORD model: Grossberg and Kazerounian, 2011); and cognitive information processing (LIST PARSE model: Grossberg and Pearson, 2008).

History

Equations for short-term memory, medium-term memory, and long-term memory

ART was built on a foundation of discoveries about learning and information processing by neural networks that spanned two decades. The first foundation was the introduction by Grossberg in 1957-58 of the paradigm of using nonlinear systems of differential equations to show how brain mechanisms can give rise to behavioral functions. In 1957-1958, Grossberg introduced equations for:

Short-term memory (STM), or neuronal activation (often called the Additive and Shunting models, or the Hopfield model after the Hopfield (1984) application of the Additive model equation and the Cohen and Grossberg (1983) Liapunov function for it). Shunting equations in networks of neurons that interact via on-center off-surround connections have self-normalizing, or contrast gain control, properties (Grossberg, 1973, 1980) which enable ART circuits to process distributed feature patterns without being degraded by noise or saturation.

Medium-term memory (MTM), or activity-dependent habituation (often called habituative transmitter gates, or depressing synapses after Abbott et al. (1997) popularized that term). Habituative gates help to prevent resonances from persisting indefinitely. They also enable a resonance that reads out a predictive mismatch to be reset, thereby triggering a memory search, or hypothesis testing, to discover a recognition category capable of better representing an attended object or event. Habituative transmitter gates have been used to help explain a wide range of data about processes other than ART category learning, including the dynamics of visual perception, cognitive-emotional interactions, and sensory-motor control (Francis and Grossberg, 1996; Francis et al., 1994; Gaudiano and Grossberg, 1991, 1992; Grossberg, 1972, 1980, 1984a, 1984b). For a derivation of this law, see Grossberg (1998b; http://cns.bu.edu/~steve/Gro1998TR001.pdf).

Long-term memory (LTM), or neuronal learning, often called gated steepest descent learning. For a historical discussion of this learning law, see Grossberg (1998a; http://cns-web.bu.edu/Profiles/Grossberg/Learning.html). One variant of these learning equations, called Instar Learning, was used in Grossberg (1976a) for the learning of bottom-up adaptive filters in Self-Organizing Map (SOM) models. Kohonen (1984) also used Instar Learning in his SOM applications (see Kohonen Network). For a historical discussion of SOM models, see Grossberg (1994; http://www.cns.bu.edu/Profiles/Grossberg/Gro1994KohonenLetter.pdf). Another variant, called Outstar Learning, was introduced in Grossberg (1968b) for spatial pattern learning. Outstar and Instar learning were combined in Grossberg (1976a) within a three-layer instar-outstar network for the learning of multi-dimensional maps from any m-dimensional input space to any n-dimensional output space, a network that was called counterpropagation in Hecht-Nielsen (1987). In ART models, Grossberg (1976b) used instars to define the learning in bottom-up adaptive filters, and outstars to define the learning in top-down expectations. The learning instabilities of competitive learning and self-organizing maps that were described in Grossberg (1976a) led to the introduction of ART in Grossberg (1976b) to show how top-down matching of bottom-up feature patterns by learned expectations could help to dynamically stabilize the memories learned in SOM models.

These STM, MTM, and LTM equations are used in many other neural models as well, where they are specialized to cope with evolutionary pressures on different brain systems. One variant is:

STM: Neuronal Activation in Additive and Shunting On-Center Off-Surround Networks

\[ \tag{1} \frac{d x_i}{dt} = - A x_i + (B - C x_i) \left[ I_i + \sum_{k=1}^n f_k (x_k) y_k D_{ki} z_{ki} \right] - (E + F x_i) \left[ J_i + \sum_{k=1}^n g_k (x_k) Y_k G_{ki} Z_{ki} \right] \]

This equation describes the activities, $ x_i $, of the $ i $th cell (population) in a network of $n$ interacting neurons. It includes both the Additive and Shunting models (Grossberg, 1968a, 1969). In the shunting model, the parameters $ C \neq 0$ and $ F \neq 0$. The parameter $ E = 0$ when there is “silent” shunting inhibition, whereas $ E \neq 0$ describes the case of hyperpolarizing shunting inhibition. In the Additive model, parameters $ C =F =0$. The excitatory interaction term $ \left[ I_i + \sum_{k=1}^n f_k (x_k) y_k D_{ki} z_{ki} \right] $ describes an external input $ I_i $ plus the total excitatory feedback signal $ \left[ \sum_{k=1}^n f_k (x_k) y_k D_{ki} z_{ki} \right] $ that is a sum of signals from other populations via their output signals $ f_k (x_k) $. The term $ D_{ki} $ is a constant connection strength between cell populations $ k $ and $ i $, whereas terms $ y_k $ and $ z_{ki} $ describe MTM and LTM variables, respectively. The inhibitory interaction term $ \left[ J_i + \sum_{k=1}^n g_k (x_k) Y_k G_{ki} Z_{ki} \right] $ has a similar interpretation. Equation (1) assumes “fast inhibition”; that is, inhibitory interneurons respond instantaneously to their inputs. Slower finite-rate inhibition with activities $ X_i $ uses an equation like Eq. (1) to describe the temporal evolution of the inhibitory activities. The output signals from these inhibitory interneurons provide the inhibitory feedback signals to the excitatory activities. With slow inhibition, the inhibitory feedback signals would be $ g_k(X_k) $ instead of $ g_k(x_k) $.
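The following is a minimal numerical sketch of Eq. (1) under fast inhibition, intended only to illustrate how the Shunting and Additive cases differ. Several choices are assumptions made for the illustration and do not come from the text: forward-Euler integration, threshold-linear signal functions $f_k$ and $g_k$, MTM gates and LTM weights held fixed at 1, and the function names `stm_rhs` and `integrate`.

```python
def stm_rhs(x, I, J, A=1.0, B=1.0, C=1.0, E=0.0, F=1.0, D=None, G=None,
            f=lambda v: max(v, 0.0), g=lambda v: max(v, 0.0)):
    """Right-hand side of Eq. (1) with fast inhibition.

    Assumptions of this sketch: y_k = Y_k = z_ki = Z_ki = 1, and the
    signal functions f, g are threshold-linear. D[k][i] and G[k][i] are
    the constant connection strengths; None means no feedback.
    """
    n = len(x)
    dx = []
    for i in range(n):
        excite = I[i]    # I_i + sum_k f(x_k) D_ki
        inhibit = J[i]   # J_i + sum_k g(x_k) G_ki
        if D is not None:
            excite += sum(f(x[k]) * D[k][i] for k in range(n))
        if G is not None:
            inhibit += sum(g(x[k]) * G[k][i] for k in range(n))
        dx.append(-A * x[i] + (B - C * x[i]) * excite
                  - (E + F * x[i]) * inhibit)
    return dx


def integrate(x0, I, J, dt=0.001, steps=20000, **params):
    """Forward-Euler integration of Eq. (1) from initial activities x0."""
    x = list(x0)
    for _ in range(steps):
        dx = stm_rhs(x, I, J, **params)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x


# Feedforward on-center off-surround inputs: each cell is excited by its
# own input and inhibited by the sum of all the other inputs.
I = [100.0, 200.0, 400.0]
J = [sum(I) - Ii for Ii in I]

shunting = integrate([0.0] * 3, I, J)                        # C = F = 1, E = 0
additive = integrate([0.0] * 3, I, J, C=0.0, F=0.0, E=1.0)   # C = F = 0
```

With these parameters the shunting activities stay below the upper bound $B/C = 1$ however large the inputs are, whereas the additive activities scale with the raw input differences.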

MTM: Habituative Transmitter Gates and Depressing Synapses

\[ \tag{2} \frac{d y_i}{dt} = H (K - y_i) - L f_i (x_i) y_i \]

Equation (2) describes how the strength $ y_i $ of the $ i $th habituative transmitter gate, or depressing synapse, in the excitatory feedback term of Eq.(1) accumulates at a fixed rate $H$ to its maximum value $K$ via term $ H (K - y_i) $ and is inactivated, habituated, or depressed via a mass action interaction between the feedback signal $ f_i (x_i) $ and the gate concentration $ y_i $. The mass action term may be more complex than this in some situations; e.g., Gaudiano and Grossberg (1991, 1992). The habituative transmitter gate $ Y_i $ in the inhibitory feedback term of Eq.(1) obeys a similar equation. By multiplying the signals $ f_i (x_i) $ in Eq.(1), transmitter gates can modulate their efficacy in an activity-dependent way. It should be noted that not all signals need to be habituative.
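As a rough illustration of Eq. (2), the sketch below integrates a single habituative gate driven by a constant signal $S = f_i(x_i)$. Setting the right-hand side of Eq. (2) to zero gives the equilibrium $y_i = HK/(H + LS)$, which the simulation should approach. The parameter values, the forward-Euler scheme, and the name `simulate_gate` are illustrative assumptions.

```python
def simulate_gate(signal, H=0.1, K=1.0, L=1.0, dt=0.01):
    """Forward-Euler integration of Eq. (2) for one transmitter gate.

    The gate starts fully accumulated (y = K). `signal` holds the values
    of f_i(x_i) at successive time steps.
    """
    y = K
    trace = []
    for s in signal:
        y += dt * (H * (K - y) - L * s * y)
        trace.append(y)
    return trace


S = 2.0                            # constant feedback signal f_i(x_i)
y = simulate_gate([S] * 10000)     # 100 time units
gated = [S * v for v in y]         # gated signal f_i(x_i) * y_i, as in Eq. (1)
y_eq = 0.1 * 1.0 / (0.1 + 1.0 * S) # H K / (H + L S)
```

The gated signal starts near $SK$ and then habituates toward $S \cdot HK/(H + LS)$, which is the activity-dependent reduction in efficacy that the text describes.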

LTM: Gated Steepest Descent Learning

\[ \tag{3} \frac{d z_{ij}}{dt} = M f_i (x_i) \left[ h_j (x_j) - z_{ij} \right] \]

and

\[ \tag{4} \frac{d z_{ij}}{dt} = M f_j (x_j) \left[ h_i (x_i) - z_{ij} \right] \]

Equation (3) describes the outstar learning equation, by which the $i$th source cell can sample and learn a distributed spatial pattern of activation across a network of sampled cells. When the gating signal $ f_i (x_i) $ is positive, the adaptive weights $ z_{ij} $ can sample the activity-dependent signals $ h_j (x_j) $ across the sampled network of cells. Equation (4) describes the instar learning equation, by which the $j$th target cell can sample and learn the distributed pattern of signals that activated it. There are many variations of these gated steepest descent equations (doubly-gated learning, spike-timing dependent learning, self-normalizing learning, etc.; e.g., Gorchetchnikov et al., 2005; Grossberg and Seitz, 2003). It should also be noted that not all connections need to be adaptive.
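The difference between Eqs. (3) and (4) is only which cell supplies the gate and which supplies the sampled signal. The sketch below makes that explicit on a small weight matrix `z[i][j]`; the forward-Euler step, the parameter values, and the function names are assumptions of the illustration. Here `f` and `h` stand for the already-computed signals $f(x)$ and $h(x)$.

```python
def outstar_update(z, f, h, M=1.0, dt=0.01):
    """One Euler step of Eq. (3): dz_ij/dt = M f_i (h_j - z_ij).

    The source (presynaptic) signal f[i] gates learning; the weights
    leaving source i track the pattern h[j] across the sampled cells.
    """
    return [[z[i][j] + dt * M * f[i] * (h[j] - z[i][j])
             for j in range(len(h))] for i in range(len(f))]


def instar_update(z, f, h, M=1.0, dt=0.01):
    """One Euler step of Eq. (4): dz_ij/dt = M f_j (h_i - z_ij).

    The target (postsynaptic) signal f[j] gates learning; the weights
    arriving at target j track the pattern h[i] that activated it.
    """
    return [[z[i][j] + dt * M * f[j] * (h[i] - z[i][j])
             for j in range(len(f))] for i in range(len(h))]


pattern = [0.2, 0.5, 0.3]

# Outstar: 2 source cells, 3 sampled cells; only source 0 is active.
z_out = [[0.0] * 3 for _ in range(2)]
for _ in range(2000):
    z_out = outstar_update(z_out, f=[1.0, 0.0], h=pattern)

# Instar: 3 input cells, 2 target cells; only target 1 is active.
z_in = [[0.0] * 2 for _ in range(3)]
for _ in range(2000):
    z_in = instar_update(z_in, f=[0.0, 1.0], h=pattern)
```

In both cases only the weights whose gate is open converge to the sampled pattern; the weights of the inactive cell are left untouched, which is what "gated" means here.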

Applications in Engineering and Technology

One part of the development of ART has been to identify algorithms that computationally embody specific combinations of useful ART design principles. These algorithms have contributed to the mathematical development of the cognitive and neural theory, and are widely used in large-scale engineering and technological applications, such as medical data base prediction, remote sensing, airplane design, and the control of autonomous adaptive robots.

A standard ART algorithm for applications is called Default ARTMAP (Amis and Carpenter, 2007; Carpenter, 2003). Early important ART algorithms for applications include ART 1, ART 2, ARTMAP, fuzzy ART, and fuzzy ARTMAP (Carpenter and Grossberg, 1987a, 1987b; Carpenter, Grossberg, and Reynolds, 1991; Carpenter, Grossberg, and Rosen, 1991; Carpenter et al., 1992). More recent algorithms from Gail Carpenter and her students include distributed ARTMAP, which combines distributed coding with fast, stable, incremental learning (Carpenter, 1997; Carpenter, Milenova, and Noeske, 1998); ARTMAP Information Fusion, which can incrementally learn a cognitive hierarchy of rules in response to probabilistic, incomplete, and even contradictory data that are collected by multiple observers (Carpenter, Martens, and Ogas, 2005; Carpenter and Ravindran, 2008); Self-supervised ART, which shows how some supervised learning "in school" can lead to effective knowledge acquisition later on by unsupervised learning "in the real world" (Amis and Carpenter, 2009); and Biased ART, which shows how attention can be selectively diverted from features that cause predictive errors (Carpenter and Gaddam, 2010). Computer code for running various ART algorithms and related neural models that were discovered and developed at Boston University can be found at http://techlab.bu.edu/resources/software/C51.

Many variants of ART have been developed and applied to large-scale engineering and technological applications by authors around the world (e.g., Akhbardeh et al., 2007; Anagnostopoulos and Georgiopoulos, 2000; Anton-Rodriguez et al., 2009; Brannon et al., 2009; Cai et al., 2011; Cano-Izquierdo et al., 2009; Caudell, 1992; Caudell et al., 1991; Chao et al., 2011; Cherng et al., 2009; Demetgul et al., 2009; Dunbar, 2012; He et al., 2000, 2012; Healy et al., 1993; Ho et al., 1994; Hsieh, 2008; Hsieh and Yang, 2008; Hsu and Chien, 2007; Kawamura et al., 2008; Kaylani et al., 2009; Keskin and Ozkan, 2007; Liu et al., 2008, 2009; Lopes et al., 2005; Marchiori et al., 2011; Martin-Guerrero et al., 2007; Massey, 2009; Mulder and Wunsch, 2003; Owega et al., 2006; Prasad and Gupta, 2008; Shieh et al., 2008; Sudhakara Panian and Mahapatra, 2009; Takahashi et al., 2007; Tan, 1997; Tan and Teo, 1998; Tan et al., 2008; Wienke and Buydens, 1995; Wunsch et al., 1993; Xu et al., 2009; Zhang and Kezunovic, 2007). A repository of some applications is found at http://techlab.bu.edu/resources/articles/C5.

ART Model Properties

Learning and Prediction by Complementary Cortical Streams for Recognition and Action

Biological learning includes both perceptual/cognitive and spatial/motor processes. Accumulating experimental and theoretical evidence shows that perceptual/cognitive and spatial/motor processes both need predictive mechanisms to control learning. There is thus an intimate connection between learning and predictive dynamics in the brain. However, neural models of these processes have proposed, and many experiments have supported, the hypothesis that perceptual/cognitive and spatial/motor processes use different types of predictive mechanisms to regulate the learning that they carry out.

Learning to be an expert in a changing world: Excitatory matching and match learning

The need for different predictive mechanisms is clarified by accumulating theoretical and empirical evidence that brain specialization is governed by computationally complementary cortical processing streams that embody different predictive and learning mechanisms (Figure 1; Grossberg, 2000b): Perceptual/cognitive processes in the What ventral cortical processing stream often use excitatory matching and match-based learning to create predictive representations of objects and events in the world. Match-based learning solves the stability-plasticity dilemma and is the kind of learning used in ART. This sort of learning can occur quickly without causing catastrophic forgetting, much as we quickly learn new faces without forcing rapid and unselective forgetting of familiar faces. However, match learning, and by extension ART, does not describe the only kind of learning that the brain needs to accomplish autonomous adaptation to a changing world. If only for this reason, ART is not a "theory of everything".

Controlling a changing body: Inhibitory matching and mismatch learning

There are just as essential, but complementary, spatial/motor processes in the Where dorsal cortical processing stream that often use inhibitory matching and mismatch-based learning to continually update spatial maps and sensory-motor gains as our bodily parameters change through time (Bullock et al., 1998; Bullock and Grossberg, 1988; Georgopoulos et al., 1982, 1986). Indeed, spatial and motor learning processes that solve the stability-plasticity dilemma would be maladaptive, since the spatial representations and motor gains that were suitable for controlling our infant bodies should not be remembered and used to control our adult bodies. In this sense, catastrophic forgetting is a good property during spatial and motor learning.

As an example of inhibitory spatial matching, consider how an arm movement is made. To make such a movement, a representation of where the arm is now (its present position vector) is subtracted from a representation of where we want the arm to move (its target position vector), thereby computing a difference vector that represents the direction and distance of movement needed to attain the target. After moving to the target, the target and present positions agree, so the difference vector is zero. In other words, this sort of matching is inhibitory (Bullock and Grossberg, 1988).
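The arm-movement example can be sketched in a few lines. This is a deliberately simplified illustration of the difference-vector computation only; the fixed `gain`, the discrete update loop, and the function names are assumptions and are not the full movement model of Bullock and Grossberg (1988).

```python
def difference_vector(target, present):
    """Target position vector minus present position vector."""
    return [t - p for t, p in zip(target, present)]


def move(target, present, gain=0.2, steps=50):
    """Repeatedly move the present position along the difference vector.

    `gain` is a hypothetical stand-in for whatever process converts the
    difference vector into movement.
    """
    position = list(present)
    for _ in range(steps):
        dv = difference_vector(target, position)
        position = [p + gain * d for p, d in zip(position, dv)]
    return position


target = [3.0, 4.0]
start = [1.0, 1.0]
initial_dv = difference_vector(target, start)   # direction and distance to go
final_dv = difference_vector(target, move(target, start))
```

A perfect match between target and present position yields a difference vector of zero, so a match silences the signal rather than amplifying it. This is the sense in which the matching is inhibitory, in contrast to the excitatory matching used in ART.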

Neither type of matching and learning is sufficient to design an adaptive autonomous agent, but each is necessary. By combining these two types of processes, brains can incrementally learn and stably remember perceptual and cognitive representations of a changing world, leading to a self-stabilizing front end that solves the stability-plasticity dilemma and enables us to become increasingly expert in understanding and predicting outcomes in the world. At the same time, brains can adaptively update representations of where objects are and how to act upon them using bodies whose parameters change continuously through time due to development, exercise, illness, and aging.

Why procedural memories are not conscious

Brains that use inhibitory matching and mismatch learning cannot generate excitatory resonances. Hence, if "all conscious states are resonant states", then spatial and motor representations are not conscious. This distinction provides a mechanistic reason why declarative memories (or "learning that"), which are the sort of memories learned by ART, may be conscious, whereas procedural memories (or "learning how"), which are the sort of memories that control spatial orienting and action, are not conscious (Cohen and Squire, 1980).

Spatially-invariant recognition vs. spatially localized action

There is another basic reason why these complementary What and Where processes need to work together. The What stream attempts to learn view-invariant and spatially-invariant object categories, so that a combinatorial explosion does not occur wherein every view of every object at every position and distance needs to be represented by a different category. Indeed, learning in the What cortical stream leads to recognition categories that tend to be increasingly independent of object size and position at higher cortical levels. The anterior inferotemporal cortex exhibits such invariance (Bar et al., 2001; Sigala and Logothetis, 2002; Tanaka et al., 1991; Zoccolan et al., 2007). Cao et al. (2011) and Grossberg et al. (2011) have used ART to simulate recent neurophysiological data about properties of invariant category learning and recognition in inferotemporal cortex.

In becoming spatially invariant, recognition categories lose information about how to direct action towards the locations in space where desired objects may be found. In contrast, the Where stream learns spatial maps that locate such desired objects, as well as the movement gains that enable us to make accurate movements with respect to them. On the other hand, Where stream spatial processing gives up information about which objects are in those spatial locations. Interactions between the What and Where streams ("What-Where fusion") overcome these complementary deficiencies to enable invariant object representations to control actions towards desired goals in space. Some ART-compatible models of how this happens are found in Brown et al. (2004), Fazl et al. (2009), Grossberg (2009b), and Grossberg and Vladusich (2010).

In summary, because of their different types of matching and learning, perceptual and cognitive learning provide a self-stabilizing front end to control the more labile spatial and motor learning that enables changing bodies to effectively act upon recognized objects in the world.

Learning, Expectation, Attention, and Intention

Humans are intentional beings who learn expectations about the world and make predictions about what is about to happen. Humans are also attentional beings who focus processing resources upon a restricted amount of incoming information at any time. The stability-plasticity dilemma and its solution using resonant states provide a unifying framework for understanding how intention and attention are conceptually and mechanistically related.

Top-down attentional priming

To clarify the role of sensory or cognitive expectations, and of how a resonant state is activated, suppose you were asked to “find the yellow ball as quickly as possible, and you will win a $100,000 prize”. Activating an expectation of a “yellow ball” enables its more rapid detection, and with a more energetic neural response. Sensory and cognitive top-down expectations hereby lead to excitatory matching with consistent bottom-up data. A mismatch between top-down expectations and bottom-up data can suppress the mismatched part of the bottom-up data, while attention is focused upon the matched, or expected, part of the bottom-up data.

Learning of attended critical feature patterns

Excitatory matching and attentional focusing on bottom-up data using top-down expectations generate resonant brain states: When there is a good enough match between bottom-up and top-down signal patterns between two or more levels of processing, their positive feedback signals amplify, synchronize, and prolong their mutual activation, leading to a resonant state that focuses attention on a combination of features (the "critical feature pattern") that is needed to correctly classify the input pattern at the next processing level and beyond. Amplification, synchronization, and prolongation of activity trigger learning in the more slowly varying adaptive weights that control the signal flow along pathways between the attended features and the recognition category with which they resonate. Resonance hereby provides a global context-sensitive indicator that the system is processing data worthy of learning, hence the name Adaptive Resonance Theory.

In summary, ART predicts a link between the mechanisms which enable us to learn quickly and stably about a changing world, and the mechanisms that enable us to learn expectations about such a world, test hypotheses about it, and focus attention upon information that may predict desired consequences. ART clarifies this link by asserting that, in order to solve the stability-plasticity dilemma, only resonant states can drive fast new learning.
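One algorithmic reading of "only a good enough match drives learning" is the fuzzy ART algorithm cited earlier (Carpenter, Grossberg, and Rosen, 1991). The sketch below follows its usual textbook form: complement coding, a choice function, a match test against a vigilance threshold, and learning only in the category that passes the test. The symbols `rho` (vigilance), `alpha` (choice parameter), and `beta` (learning rate) come from the fuzzy ART literature rather than from this article, and the class layout, the parameter values, and the fast commitment of new categories are assumptions of this illustration, not a reference implementation.

```python
def fuzzy_and(a, b):
    """Component-wise minimum (the fuzzy AND operator)."""
    return [min(x, y) for x, y in zip(a, b)]


def complement_code(a):
    """Concatenate a with its complement, for features a_i in [0, 1]."""
    return list(a) + [1.0 - x for x in a]


class FuzzyART:
    def __init__(self, rho=0.75, alpha=0.001, beta=1.0):
        self.rho, self.alpha, self.beta = rho, alpha, beta
        self.w = []   # one weight vector per committed category

    def _choice(self, I, w):
        return sum(fuzzy_and(I, w)) / (self.alpha + sum(w))

    def train(self, a):
        """Present one input; return the index of the resonating category."""
        I = complement_code(a)
        # Search committed categories in order of decreasing choice value.
        order = sorted(range(len(self.w)),
                       key=lambda j: self._choice(I, self.w[j]),
                       reverse=True)
        for j in order:
            matched = fuzzy_and(I, self.w[j])
            if sum(matched) / sum(I) >= self.rho:
                # Good enough match: resonance, so this category learns.
                self.w[j] = [self.beta * m + (1.0 - self.beta) * w
                             for m, w in zip(matched, self.w[j])]
                return j
            # Otherwise: mismatch reset; try the next category.
        # No committed category matches: commit a new one (fast commit).
        self.w.append(I)
        return len(self.w) - 1


net = FuzzyART(rho=0.8)
labels = [net.train(a) for a in ([0.1, 0.1], [0.9, 0.9], [0.12, 0.1])]
```

In this run the third input resonates with the first category and refines it, while the weights of the second category are not modified. New learning therefore does not overwrite unrelated memories.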

Linking Brain to Behavior: All Conscious States are Resonant States

ART also predicts that experiences which can attract our attention and guide our future lives after being learned are also among the ones that are conscious. Support for the predicted link between resonance and consciousness comes from many modeling studies wherein the parametric properties of ART brain resonances map onto parametric properties of conscious behavioral experiences in the simulated experiments. Indeed, without such a linking hypothesis between brain mechanisms and behavioral functions, no theory of consciousness can be fully tested.

Although it is predicted that "all conscious states are resonant states", it is not predicted that "all resonant states are conscious states". Indeed, some resonant states, such as the storage of a sequence of events in working memory before rehearsal occurs (e.g., Grossberg and Pearson, 2008), or the entorhinal-hippocampal resonances that may dynamically stabilize the learning of entorhinal grid cells and hippocampal place cells (e.g., Pilly and Grossberg, 2012), are not accessible to consciousness.

Varieties of Resonant Experience

As of this writing, many different behaviors have been linked to resonances in different parts of the brain (see Grossberg (2012) for a review). For example, surface-shroud resonances are predicted to subserve conscious percepts of visual qualia. Feature-category resonances are predicted to subserve recognition of familiar objects and scenes. Item-list resonances are predicted to subserve conscious percepts of speech and language. Spectral-pitch resonances are predicted to subserve conscious percepts of auditory streams. Cognitive-emotional resonances are predicted to subserve conscious percepts of feelings and core consciousness. All of these resonances have distinct anatomical substrates. Some functionally important resonances do not have readily available names from day-to-day language. For example, parietal-prefrontal resonances are predicted to trigger the selective opening of basal ganglia gates to enable the read out of context-appropriate actions. Entorhinal-hippocampal resonances are predicted to dynamically stabilize the learning of entorhinal grid cells and hippocampal place cells.

ART Matching Rule and Biased Competition: Top-down, Modulatory On-Center, Off-Surround Network

Attention obeys the ART Matching Rule

How are What stream top-down expectations computed? How do they focus attention on expected combinations of features? Carpenter and Grossberg (1987a) mathematically proved that the simplest attentional circuit that solves the stability-plasticity dilemma is a top-down, modulatory on-center, off-surround network, which provides excitatory priming of critical features in the on-center, and driving inhibition of irrelevant features in the off-surround. The modulatory on-center emerges from a balance between top-down excitation and inhibition. The neurons in the network obey the membrane equations of neurophysiology. The entire attentional circuit is said to satisfy the ART Matching Rule.

Noise-saturation dilemma: Shunting on-center off-surround networks

Grossberg (1973) proved that the shunting, or automatic gain control, properties of neurons whose activities obey membrane equations and interact via an on-center off-surround network enable them to self-normalize their activities, and thereby solve a design problem that is just as basic as the stability-plasticity dilemma. This design problem is called the noise-saturation dilemma: Without suitable interactions between neurons, their inputs can be lost in cellular noise if they are too small, or can saturate cell activities at their maximum values if they are too large. Moreover, input amplitudes can vary greatly through time. What sort of network interactions enable neurons to retain their sensitivities to the relative sizes of their inputs across the network, even while these inputs may vary in size through time over several orders of magnitude? The answer is: an on-center off-surround network whose cells exhibit shunting properties; see equation (1).
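The self-normalizing property can be illustrated with a minimal numerical sketch. It assumes the simplest feedforward shunting on-center off-surround form, $dx_i/dt = -Ax_i + (B - x_i)I_i - x_i \sum_{k \neq i} I_k$, whose equilibrium is $x_i = B I_i / (A + \sum_k I_k)$; this form, the parameter values, and the function names are illustrative assumptions and may differ in detail from equation (1).

```python
def shunting_steady_state(inputs, A=1.0, B=1.0):
    """Equilibrium of the assumed shunting equation
    dx_i/dt = -A*x_i + (B - x_i)*I_i - x_i * sum_{k != i} I_k,
    which is x_i = B * I_i / (A + sum_k I_k)."""
    total = sum(inputs)
    return [B * I_i / (A + total) for I_i in inputs]


def simulate(inputs, A=1.0, B=1.0, steps=2000):
    """Forward-Euler integration of the same equation, starting from x = 0."""
    total = sum(inputs)
    dt = 0.01 / (A + total)  # step scaled to the decay rate A + total for stability
    x = [0.0] * len(inputs)
    for _ in range(steps):
        x = [x_i + dt * (-A * x_i + (B - x_i) * I_i - x_i * (total - I_i))
             for x_i, I_i in zip(x, inputs)]
    return x


# The same relative pattern presented at two very different intensities.
weak = shunting_steady_state([1.0, 2.0, 3.0])
strong = shunting_steady_state([1000.0, 2000.0, 3000.0])
relative_weak = [x / sum(weak) for x in weak]
relative_strong = [x / sum(strong) for x in strong]
```

In this sketch the relative activities are the same for both intensities, and the total activity stays below $B$ however large the inputs become, so the pattern neither saturates nor loses its ratios.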

Modeling studies have clarified how a top-down, modulatory on-center, off-surround network can regulate attention across multiple modalities of intelligence (e.g., Dranias, Grossberg, and Bullock, 2008; Gove et al., 1995; Grossberg et al., 2004; Grossberg and Kazerounian, 2011). Models of how cerebral cortex embodies attention within its layered circuits have discovered that identified cell types and connections exist with the necessary properties to realize the ART Matching Rule; see section on Laminar ART Models.

Data support for the ART Matching Rule

Many anatomical and neurophysiological experiments have provided support for the ART prediction of how attention works, including data about modulatory on-center, off-surround interactions; excitatory priming of features in the on-center; suppression of features in the off-surround; and gain amplification of matched data (e.g., Bullier et al., 1996; Caputo and Guerra, 1998; Downing, 1988; Hupé et al., 1997; Mounts, 2000; Reynolds et al., 1999; Sillito et al., 1994; Somers et al., 1999; Steinman et al., 1995; Vanduffell et al., 2000). The ART Matching Rule is often called the “biased competition” model of attention by experimental neurophysiologists (Desimone, 1998; Kastner and Ungerleider, 2001). The property of the ART Matching Rule that bottom-up sensory activity may be enhanced when matched by top-down signals is in accord with an extensive neurophysiological literature showing the facilitatory effect of attentional feedback (Luck et al., 1997; Roelfsema et al., 1998; Sillito et al., 1994), but not with models, such as Bayesian "explaining away" models, in which matches with top-down feedback cause only suppression (Mumford, 1992; Rao and Ballard, 1999).

The ART Matching Rule helps to explain the existence of top-down modulatory connections at multiple stages of cortical processing. For example, Zeki and Shipp (1988, p. 316) wrote that “backward connections seem not to excite cells in lower areas, but instead influence the way they respond to stimuli”; that is, they are modulatory. Likewise, the data of Sillito et al. (1994, pp. 479-482) on attentional feedback from cortical area V1 to the Lateral Geniculate Nucleus (LGN) support an early prediction that the ART Matching Rule should exist in this pathway as well (Grossberg, 1976b). Sillito et al. (1994) concluded that “the cortico-thalamic input is only strong enough to exert an effect on those dLGN cells that are additionally polarized by their retinal input...the feedback circuit searches for correlations that support the ‘hypothesis’ represented by a particular pattern of cortical activity”. Their experiments demonstrated all of the properties of the ART Matching Rule, since they also found that “cortically induced correlation of relay cell activity produces coherent firing in those groups of relay cells with receptive-field alignments appropriate to signal the particular orientation of the moving contour to the cortex...this increases the gain of the input for feature-linked events detected by the cortex”. In other words, top-down priming, by itself, cannot fully activate LGN cells; it needs matched bottom-up retinal inputs to do so; and those LGN cells whose bottom-up signals support cortical activity get synchronized and amplified by this feedback. In addition, anatomical studies have shown that the V1-to-LGN pathway realizes a top-down on-center off-surround network (Dubin and Cleland, 1977; Sillito et al., 1994; Weber et al., 1989). Zhang et al. (1997) have shown that feedback from auditory cortex to the medial geniculate nucleus (MGN) and the inferior colliculus (IC) also has an on-center off-surround form, and Temereanca and Simons (2001) have produced evidence for a similar feedback architecture in the rodent barrel system.

Mathematical form of the ART Matching Rule

There is also convergence across models of how to mathematically instantiate the ART Matching Rule attentional circuit. For example, the “normalization model of attention” (Reynolds and Heeger, 2009) simulates several types of experiments on attention using the same equation for self-normalizing attention as the distributed ARTEXture (dARTEX) model (Bhatt et al., 2007, equation (A5)) used to simulate human psychophysical data about Orientation-Based Texture Segmentation (OBTS, Ben-Shahar and Zucker, 2004).

Imagining, Planning, and Hallucinations: Prediction without Action

A top-down expectation is not always modulatory. The excitatory/inhibitory balance in the modulatory on-center of a top-down expectation can be modified by volitional control from the basal ganglia. If, for example, volitional signals inhibit inhibitory interneurons in the on-center, then read-out of a top-down expectation from a recognition category can fire cells in the on-center prototype, not merely modulate them. Such volitional control has been predicted to control mental imagery and the ability to think and plan ahead without external action, a crucial type of predictive competence in humans and other mammals. If these volitional signals become tonically hyperactive, then top-down expectations can fire without overt intention, leading to properties like schizophrenic hallucinations (Grossberg, 2000a). In summary, the ability to learn quickly without catastrophic forgetting is embodied within circuits that can be volitionally modulated to enable imagination, internal thought, and planning. This modulation, which brings huge evolutionary advantages to those who have it, may also lead to hallucinations.

A similar modulatory circuit, again modulated by the basal ganglia, is predicted to control when sequences of events are stored in short-term working memory in the prefrontal cortex (Grossberg and Pearson, 2008; see Figure 5 below) and the span of spatial attention (“useful-field-of-view”) in the parietal and prefrontal cortex (Foley et al., 2012). ART predicts that all these properties share a circuit design which uses top-down expectations to dynamically stabilize fast learning throughout life.

Complementary Attention and Orienting Systems: Expected vs. Unexpected, Resonance vs. Reset

The cycle of resonance and reset

As noted above, learning within the sensory and cognitive domain that ART mechanizes is match learning: Match learning occurs only if a good enough match occurs between bottom-up information and a learned top-down expectation that is read out by an active recognition category, or code. When such an approximate match occurs, a resonance can be triggered, whereupon previous knowledge can be refined through learning. It has been mathematically proved that match learning within an ART model leads to stable memories of arbitrary events presented in any order (e.g., Carpenter and Grossberg, 1987a, 1991).

However, match learning also has a serious potential weakness: If fast learning can occur only when a good enough match occurs between bottom-up data and learned top-down expectations, then how is anything learned that is really novel? ART proposes that this problem is solved by the brain by using an interaction between complementary processes of resonance and reset that are predicted to control properties of attention and memory search, respectively. These complementary processes help brains to balance between processing the familiar and the unfamiliar, the expected and the unexpected.

How does a brain learn to balance between expected and unexpected events? Or to incorporate unexpected and unfamiliar events within the corpus of previously learned events, and do so without causing catastrophic forgetting? ART proposes that, when novel inputs cannot match a known recognition category, a memory search, or hypothesis testing, process is activated that enables a brain to discover and learn new recognition categories that best match novel objects or events.

The resonance process in the complementary pair of resonance and reset processes is predicted to take place in the What cortical stream, notably in the sensory, temporal, and prefrontal cortices. Here top-down expectations are matched against bottom-up inputs. When a top-down expectation achieves a good enough match with bottom-up data, this match process focuses attention upon those feature clusters in the bottom-up input that are expected. If the expectation is close enough to the input pattern, then a state of resonance develops as the attentional focus takes hold, which is often realized by oscillatory dynamics that synchronize the firing properties of the resonant neurons.

However, as noted above, a sufficiently bad mismatch between an active top-down expectation and a bottom-up input, say because the input represents an unfamiliar type of experience, can drive a memory search. Such a mismatch within the attentional system is proposed to activate a complementary orienting system, which is sensitive to unexpected and unfamiliar events. ART suggests that this orienting system includes the nonspecific thalamus and the hippocampal system. Carpenter and Grossberg (1993) and Grossberg and Versace (2008) summarize data supporting this prediction. Output signals from the orienting system rapidly reset the recognition category — that is, disconfirm the hypothesis — that has been reading out the poorly matching top-down expectation. The cause of the mismatch is hereby removed, thereby freeing the system to activate a different recognition category. In this way, a reset event triggers memory search, or hypothesis testing, which automatically leads to the selection of a recognition category that can better match the input.

If no such recognition category exists, say because the bottom-up input represents a truly novel experience, then the search process automatically activates an as yet uncommitted population of cells, with which to learn about the novel information. In order for a top-down expectation to match the features that activated a new recognition category, its top-down adaptive weights initially have large values, which are pruned by the learning of a particular expectation.
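The cycle of choice, matching, reset, and commitment of an uncommitted cell can be sketched algorithmically. The following is a simplified, binary, fast-learning sketch in the spirit of ART 1, not the network's differential equations: features are represented as sets of indices, the bottom-up choice function $|I \cap w_j| / (\beta + |w_j|)$ with $\beta = 0.5$ is an assumed illustrative choice, the vigilance test uses a parameter `rho`, and all names (`art_present`, `prototypes`) are hypothetical. Arousal dynamics and habituative gating are replaced by a simple set of reset categories.

```python
def art_present(I, prototypes, rho, beta=0.5):
    """Present one binary input I (a set of active feature indices).

    prototypes: list of sets, the learned top-down expectations (modified in place).
    rho: vigilance in [0, 1].
    Returns the index of the category that resonates with I.
    """
    I = set(I)
    reset = set()  # categories disconfirmed during this search
    while True:
        candidates = [j for j in range(len(prototypes)) if j not in reset]
        if not candidates:
            # No committed category fits: recruit an uncommitted cell. Its top-down
            # weights start large (all features expected), so matching leaves I
            # intact and learning prunes the expectation down to I.
            prototypes.append(set(I))
            return len(prototypes) - 1
        # Bottom-up adaptive filter followed by competition: pick the best candidate.
        j = max(candidates,
                key=lambda k: len(I & prototypes[k]) / (beta + len(prototypes[k])))
        matched = I & prototypes[j]  # features of I confirmed by the expectation
        if len(matched) >= rho * len(I):
            prototypes[j] = matched  # resonance: refine the prototype by pruning
            return j
        reset.add(j)  # mismatch: reset this category and continue the search


prototypes = []
first = art_present({0, 1, 2}, prototypes, rho=0.9)      # novel input, new category
second = art_present({0, 1, 2, 3}, prototypes, rho=0.9)  # mismatch, reset, new category
again = art_present({0, 1, 2}, prototypes, rho=0.9)      # familiar input, no reset needed
```

With the high vigilance used here, the second input is rejected by the first category and recruits its own; with a lower vigilance (for example 0.5) it would instead be absorbed into the first category.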

ART Reset, Search, and Hypothesis Testing Cycle

Figure 2: Search for a recognition code within an ART learning circuit: (a) Input pattern $I$ is instated across feature detectors at level $F_1$ as an activity pattern $X$, while it nonspecifically activates the orienting system $A$ with gain $\rho$, which is called the vigilance parameter. Output signals from activity pattern $X$ inhibit $A$ and generate output pattern $S$. $S$ is multiplied by learned adaptive weights to form the input pattern $T$. $T$ activates category cells $Y$ at level $F_2$. (b) $Y$ generates the top-down signals $U$ which are multiplied by adaptive weights and added at $F_1$ cells to form a prototype $V$ that encodes the learned expectation of active $F_2$ categories. If $V$ mismatches $I$ at $F_1$, then a new STM activity pattern $X^*$ (the hatched pattern) is selected at $F_1$. $X^*$ is active at $I$ features that are confirmed by $V$. Mismatched features (white area) are inhibited. When $X$ changes to $X^*$, total inhibition from $F_1$ to $A$ decreases. (c) If inhibition decreases sufficiently so that the total inhibition due to $X^*$ is less than the total excitation due to $I$ multiplied by the vigilance parameter $\rho$, then $A$ is activated and releases a nonspecific arousal burst to $F_2$; that is, “novel events are arousing”. Arousal resets $F_2$ by inhibiting $Y$. (d) After $Y$ is inhibited, $X$ is reinstated and $Y$ stays inhibited as $X$ activates a different activity pattern $Y^*$. Search for a new $F_2$ category continues until a better matching or novel category is selected. When search ends, an attentive resonance triggers learning of the attended data. [Adapted with permission from Carpenter and Grossberg (1993).]

Figure 2 illustrates these ART ideas in a two-level network. Here, a bottom-up input pattern, or vector, $I$ activates a pattern $X$ of activity across the feature detectors of the first level $F_1$. For example, a visual scene may be represented by the features comprising its boundary and surface representations (Grossberg, 1994). This feature pattern represents the relative importance of different features in the input pattern $I$. In Figure 2a, the pattern peaks represent more activated feature detector cells, the troughs less activated feature detectors. This feature pattern sends signals $S$ through an adaptive filter to the second level $F_2$ at which a compressed representation $Y$ (also called a recognition category, or a symbol) is activated in response to the distributed input $T$. Input $T$ is computed by multiplying the signal vector $S$ by a matrix of adaptive weights, or long-term memory traces, that can be altered through learning. The representation $Y$ is compressed by competitive interactions (in particular, shunting recurrent lateral inhibition) across $F_2$ that allow only a small subset of its most strongly activated cells to remain active in response to $T$. These active cells are the recognition category that represents the pattern of distributed features across level $F_1$. The pattern $Y$ in the figure indicates that a small number of category cells may be activated to different degrees.

These category cells, in turn, send top-down signals $U$ to $F_1$. The vector $U$ is converted into the top-down expectation $V$ by being multiplied by another matrix of adaptive weights. When $V$ is received by $F_1$, a matching process takes place between the input vector $I$ and $V$ which selects that subset $X^*$ of $F_1$ features that were “expected” by the active $F_2$ category $Y$. The set of these selected features is the emerging “attentional focus” that is gain amplified by the top-down match.

Synchronous binding of Feature Patterns and Categories during Conscious Resonances

If the top-down expectation is close enough to the bottom-up input pattern, then the pattern $X^*$ of attended features reactivates the category $Y$ which, in turn, reactivates $X^*$. The network hereby locks into a resonant state through a positive feedback loop that dynamically links, or binds, the attended features across $X^*$ with their category, or symbol, $Y$.

Resonant synthesis of complementary categories and distributed feature patterns

The resonance process itself embodies another type of complementary processing. This particular complementary relation occurs between distributed feature patterns and the compressed categories, or symbols, that selectively code them: Individual features at $F_1$ have no meaning on their own, just as the pixels in a picture are meaningless one-by-one. The category, or symbol, in $F_2$ is sensitive to the global patterning of these features, and can selectively fire in response to this pattern. But it cannot represent the “contents” of the experience, including their conscious qualia, because a category is a compressed, or “symbolic,” representation. Resonance between these two types of information converts the pattern of attended features into a coherent context-sensitive state that is linked to its category through feedback. Coherent binding of the attended features to the category gives them a meaning as a context-sensitive "event" rather than as just isolated pixels. Such coherent states between distributed features and symbolic categories are often expressed dynamically as synchronously oscillating activations across the bound cells, and can enter consciousness.

Order-preserving limit cycles and synchronous oscillations

Grossberg (1976b) predicted the existence of such synchronous oscillations, which were there called “order-preserving limit cycles”. The property of “order-preservation” means that the relative sizes, and thus importance, of the resonating feature activations should not reverse during the oscillation, which could occur, for example, during a traveling wave. Many neurophysiological experiments have confirmed the existence of synchronous oscillations since the confirmatory experiments of Eckhorn et al. (1988) and Gray and Singer (1989). Raizada and Grossberg (2003) and Grossberg and Versace (2008) review confirmed ART predictions, including predictions about synchronous oscillations.

Resonance Links Intentional and Attentional Information Processing to Learning

In ART, the resonant state, rather than bottom-up activation alone, is predicted to drive fast learning. The synchronous resonant state persists long enough, and at a high enough activity level, to activate the slower learning processes in the adaptive weights that guide the flow of signals between bottom-up adaptive filter and top-down expectation pathways between levels $F_1$ and $F_2$ in Figure 2. Adaptive weights that were changed through previous learning can hereby regulate the brain's present information processing, without necessarily learning about the signals that they are currently processing, unless the network as a whole can initiate a resonant state. Through resonance as a mediating event, one can understand from a deeper mechanistic view why humans are intentional beings who are continually predicting what may next occur, and why we tend to learn about the events to which we pay attention.

This match-based learning process stabilizes learned memories both in the bottom-up adaptive filters that activate recognition categories and in the top-down expectations that are matched against feature patterns. It embodies a fundamental form of prediction that can be activated either bottom-up by input data, or top-down by an expectation that predictively primes a class of events whose future occurrence is sought. Match-based learning allows memories to change only when input from the external world is close enough to internal expectations, or when something completely new occurs.

Mixing Unsupervised with Supervised Learning

The ART category learning process works well under both unsupervised and supervised conditions. Variants of the ARTMAP architecture can carry out both types of learning (e.g., Carpenter et al., 1992). Unsupervised learning means that the system can learn how to categorize novel input patterns without any external feedback. Supervised learning uses predictive errors to let the system know whether it has categorized the information correctly or not.

Supervision can force a search for new categories that may be culturally determined, and are not based on feature similarity alone. For example, separating the featurally similar letters E and F into separate recognition categories is culturally determined. Such error-based feedback enables variants of E and F to learn their own category and top-down expectation, or prototype. The complementary, but interacting, processes of attentive-learning and orienting-search together realize a type of error correction through hypothesis testing that can build an ever-growing, self-refining internal model of a changing world.

Mismatch-activated Nonspecific Arousal Regulates Reset and Search

Complementary attentional and orienting systems

The attentional and orienting systems in an ART network (Figure 2) also experience complementary informational deficiencies. At the moment when a predictive error occurs, the system does not know why the currently active category was insufficient to predict the correct outcome. In particular, when the orienting system gets activated by a mismatch in the attentional system, the orienting system has no way of knowing what went wrong in the attentional system. Thus, the attentional system has information about how inputs are categorized, but not whether the categorization is correct, whereas the orienting system has information about whether the categorization is correct, but not what is being categorized. How, then, does the orienting system cope with the daunting challenge of resetting and driving a memory search within the attentional system in a way that leads to a better outcome after the search ends?

Novelty-sensitive nonspecific arousal: Novel events are arousing

Because the orienting system does not know what cells in the attentional system caused the predictive error, its activation needs to influence all potential sources of the error equally. Thus, mismatch triggers a burst of nonspecific arousal that activates all cells in the attentional system equally. In other words, novel events are arousing! Said in a more philosophical way, a novelty-sensitive burst of nonspecific arousal implements the principle of sufficient reason. As illustrated in Figure 2, the current state of activation of the attentional system interacts with such an arousal burst to selectively reset cells that caused the mismatch, and to thereby drive a search leading to a better predictive outcome.

Medium-term memory: Habituative transmitter gates in nonstationary hypothesis testing

Due to habituative gating, recently active cells are more habituated than inactive cells. Activity-dependent habituation interacts with self-normalizing competition among the category cells to help suppress cells that are most active when the arousal burst is received. Once the maximally activated cells are suppressed by this combination of habituation and competition during the search cycle, the self-normalizing network activity is available to enable other cells, which got smaller inputs than the original winning cells, to become active in the next time interval. This cycle of mismatch-arousal-reset continues until resonance can again occur.

The self-normalizing total activity of the category cell network enables the activities of these categories to be interpreted as a kind of real-time probability distribution, and the ART search cycle to be interpreted as a kind of probabilistic hypothesis testing and decision making that works in response to non-stationary time series of input patterns.

Vigilance Regulates the Content of Conscious Experiences: Exemplars and Prototypes

Vigilance controls whether concrete or general categories are learned

What combinations of features or other information are bound together into conscious object or event representations? One popular view in cognitive psychology is that exemplars, or individual experiences, are learned, because humans can have very specific memories. For example, we can all recognize the faces of our friends. On the other hand, storing every remembered experience as an exemplar could lead to a combinatorial explosion of memory, as well as to unmanageable problems of memory retrieval. A possible way out is suggested by the fact that humans can learn prototypes which represent general properties of the environment (Posner and Keele, 1968). For example, we can recognize that everyone has a face. But then how do we learn specific episodic memories? ART provides an answer to this question that overcomes problems faced by earlier models.

ART prototypes are not merely averages of the exemplars that are classified by a category, as is often assumed in classical prototype models. Rather, they are the actively selected critical feature patterns upon which the top-down expectations of the category focus attention. The generality of the information that is coded by these critical feature patterns is controlled by a gain control process, called vigilance control, which can be influenced by environmental feedback or internal volition (Carpenter and Grossberg, 1987a). Low vigilance permits the learning of general categories with abstract prototypes. High vigilance forces a memory search to occur for a new category when even small mismatches exist between an exemplar and the category that it activates. As a result, in the limit of high vigilance, the category prototype may encode an individual exemplar.

Vigilance is computed in the orienting system

Vigilance is computed within the orienting system of an ART model (Figures 2b-d). It is here that bottom-up excitation from all the active features in an input pattern $I$ is compared with inhibition from all the active features in a distributed feature representation across $F_1$. If the ratio of the total activity across the active features in $F_1$ (that is, the “matched” features) to the total activity due to all the features in $I$ is less than a vigilance parameter $\rho$ (Figure 2b), then a nonspecific reset wave is activated (Figure 2c), which can drive the search for another category with which to classify the exemplar. This can be accomplished by letting $\rho$ multiply the bottom-up inputs $I$ to the orienting system; that is, $\rho$ is the gain of the inputs to the orienting system. The orienting system is then activated when the total excitatory input $\rho |I|$ is greater than the total inhibition $|X^*|$ from the features $X^*$ across $F_1$ that survive top-down matching; that is, when $\rho |I| - |X^*| > 0$, where $|\cdot|$ denotes the number of positive inputs or matched features. This inequality can be rewritten as $\rho > |X^*| |I|^{-1} > 0$ to show that the orienting system is activated whenever $\rho$ is chosen higher than the ratio of active $X^*$ matched features in $F_1$ to total features in $I$. In other words, the vigilance parameter controls how bad a match can be before search for a new category is initiated. If the vigilance parameter is low, then many exemplars can all influence the learning of a shared prototype, by chipping away at the features that are not shared with all the exemplars. If the vigilance parameter is high, then even a small difference between a new exemplar and a known prototype (e.g., $F$ vs. $E$) can drive the search for a new category with which to represent $F$.
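The reset criterion can be made concrete with a short sketch. It assumes binary features represented as Python sets of indices, so that the size of a pattern is the number of its active features; the function name and the example patterns are hypothetical.

```python
def orienting_system_fires(I, X_star, rho):
    """Return True when rho * |I| - |X*| > 0, i.e. when excitation from the
    input outweighs inhibition from the matched F1 features, triggering reset."""
    return rho * len(I) - len(X_star) > 0


I = {0, 1, 2, 3}        # four active input features
prototype = {0, 1, 2}   # the active category expects only three of them
X_star = I & prototype  # matched features that survive top-down matching

match_ratio = len(X_star) / len(I)                          # 3 of 4 features matched
tolerated = not orienting_system_fires(I, X_star, rho=0.7)  # low vigilance: resonance
rejected = orienting_system_fires(I, X_star, rho=0.8)       # high vigilance: reset
```

The same mismatch is tolerated at $\rho = 0.7$ and triggers a reset at $\rho = 0.8$, because the match ratio of 0.75 lies between the two vigilance values.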

Minimax learning via match tracking: Learning the most general predictive categories

One way to control vigilance is by a process of match tracking (Carpenter et al., 1991, 1992). Here, in response to a predictive error (e.g., $D$ is predicted in response to $F$), the vigilance parameter $\rho$ increases just enough to trigger reset and search for a better-matching category. Match tracking gives up the minimum amount of generalization in the learned categories to search for a better-matching category. In other words, vigilance “tracks” the degree of match between input exemplar and matched prototype. Because match tracking increases vigilance by the minimum amount to trigger a reset and search for a new category, it realizes a Minimax Learning Rule that conjointly maximizes category generality while it minimizes predictive error. Match tracking thus uses the least memory resources that can correct errors in classification.
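Match tracking can be sketched as a one-line update of vigilance. The sketch below assumes binary features stored as sets, and a small increment `epsilon` that lifts vigilance just above the current match ratio; `epsilon`, the function names, and the example values are illustrative assumptions rather than parameters taken from the cited models.

```python
def orienting_system_fires(I, X_star, rho):
    """Reset criterion: rho * |I| - |X*| > 0."""
    return rho * len(I) - len(X_star) > 0


def match_track(I, X_star, baseline_rho, epsilon=1e-3):
    """After a predictive error, raise vigilance just above the match ratio
    |X*| / |I|, the smallest increase that makes the active category reset."""
    match_ratio = len(X_star) / len(I)
    return max(baseline_rho, match_ratio + epsilon)


I = {0, 1, 2, 3}
X_star = {0, 1, 2}   # matched features of the category that predicted wrongly
baseline_rho = 0.5   # low baseline vigilance favors general categories

before = orienting_system_fires(I, X_star, baseline_rho)  # category is accepted
tracked_rho = match_track(I, X_star, baseline_rho)        # just above the 0.75 match ratio
after = orienting_system_fires(I, X_star, tracked_rho)    # category is now reset
```

Because vigilance rises only as far as needed to reject the erroneous category, categories that match the input better than this one can still be accepted during the ensuing search.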

Because the baseline level of vigilance is initially set at the lowest level that has led to predictive success in the past, ART models try to learn the most general category that is consistent with the data. This tendency can, for example, lead to the type of overgeneralization that is seen in young children until further learning leads to category refinement. However, because vigilance can vary during match tracking in a manner that reflects current predictive success, recognition categories capable of encoding widely differing degrees of generalization or abstraction can be learned by a single ART system. Low vigilance leads to broad generalization and abstract prototypes. High vigilance leads to narrow generalization and to prototypes that represent fewer input exemplars, even a single exemplar. Thus a single ART system may be used, say, to learn abstract prototypes with which to recognize abstract categories of faces and dogs, as well as “exemplar prototypes” with which to recognize individual views of faces and dogs, depending on task requirements.

Memory Consolidation and the Emergence of Rules: Direct Access to Globally Best Match

As sequences of inputs are practiced over learning trials, the search process eventually converges upon stable categories. It has been mathematically proved (e.g., Carpenter and Grossberg, 1987a) that familiar inputs directly access the category whose prototype provides the globally best match, without undergoing any search, while unfamiliar inputs engage the orienting subsystem to trigger memory searches for better categories until they become familiar. In other words, ART provides a solution of the local minimum problem that various other algorithms, such as back propagation (Baldi and Hornik, 1989; Gori and Tessi, 1992), do not solve. This process of search and category learning continues until the memory capacity, which can be chosen arbitrarily large, is fully utilized.

Memory consolidation and medial temporal amnesia

The process whereby search is automatically disengaged is a form of memory consolidation that emerges from network interactions. The first example of memory consolidation that was described by ART concerns cortico-hippocampal interactions, and proposed how a hippocampal ablation may cause symptoms of medial temporal amnesia (Carpenter and Grossberg, 1993). Emergent consolidation does not preclude structural consolidation at individual cells, since the amplified and prolonged activities that subserve a resonance may be a trigger for learning-dependent cellular processes, such as protein synthesis, synapse formation, and transmitter production.

Learning of fuzzy IF-THEN rules by a self-organizing production system

It has been proved that the adaptive weights which are learned by some ART models can, at any stage of learning, be translated into fuzzy IF-THEN rules (Carpenter et al., 1992). Thus the ART model is a self-organizing rule-discovery production system as well as a neural network. These examples show that the claims of some cognitive scientists and AI practitioners that neural network models cannot learn rule-based behaviors are as incorrect as the claims that neural models cannot learn symbols.
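One way such a translation can look is sketched below. It assumes a fuzzy ARTMAP-style category whose inputs are complement coded, so that a weight vector of length $2M$ has the form $(u, v^c)$ and defines the feature ranges $[u_i, 1 - v^c_i]$; complement coding, the function name, the feature names, and the weight values are assumptions introduced for illustration and are not stated in the text above.

```python
def weights_to_rule(w, label, names):
    """Translate a complement-coded weight vector (u, v^c) into an IF-THEN rule.

    w: list of 2*M weights in [0, 1]; the first M entries are the lower corner u,
       the last M entries are the complement of the upper corner v.
    label: the prediction associated with the category.
    names: M feature names.
    """
    m = len(names)
    if len(w) != 2 * m:
        raise ValueError("expected 2 * len(names) weights")
    lower = w[:m]
    upper = [1.0 - c for c in w[m:]]
    clauses = [f"{name} in [{lo:.2f}, {hi:.2f}]"
               for name, lo, hi in zip(names, lower, upper)]
    return "IF " + " AND ".join(clauses) + f" THEN {label}"


rule = weights_to_rule([0.2, 0.1, 0.5, 0.25], label="A", names=["x1", "x2"])
print(rule)  # IF x1 in [0.20, 0.50] AND x2 in [0.10, 0.75] THEN A
```

Under these assumptions, each learned category can be read out as a rule of this kind at any stage of learning, with wider intervals corresponding to more general categories.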

Laminar ART Models

Realization of ART Principles in Laminar Cortical Circuits

ART has gone through several stages of development to show how its predicted mechanisms may be embodied in the laminar circuits of neocortex. This research has led to the computational paradigm of Laminar Computing, which was introduced in Grossberg (1999), and which has begun to show how predicted ART mechanisms may be embodied within known laminar microcircuits of the cerebral cortex. Laminar Computing is not a mere relabelling of the previous ART theory. Rather, it has proposed a solution of a long-standing conceptual problem, and enabled the explanation and prediction of much more cognitive and brain data. In so doing, it unified two major streams of research activity. The two streams of research activity are:

ART as a theory of category learning and prediction. This stream emphasized bottom-up and top-down interactions within higher-level cortical circuits, such as cortical areas V4, inferotemporal cortex, and prefrontal cortex, during the learning of visual recognition categories;

FACADE (Form-And-Color-And-DEpth) as a theory of 3D vision and figure-ground perception (Cao and Grossberg, 2005, 2012; Fang and Grossberg, 2009; Grossberg, 1994, 1997; Grossberg and McLoughlin, 1997; Grossberg and Swaminathan, 2004; Grossberg and Yazdanbakhsh, 2005; Grossberg et al., 2008). This stream emphasized bottom-up and horizontal interactions for completion of boundaries during perceptual grouping, and for filling-in of surface brightness and color. These interactions were proposed to occur in lower cortical processing areas such as V1, V2, and V4.

Laminar Cortical Models: LAMINART, cARTWORD, LIST PARSE, SMART

The unification of these two research streams in LAMINART proposed how all cortical areas combine bottom-up, horizontal, and top-down interactions, thereby beginning to functionally clarify why all granular neocortex has a characteristic architecture with six main cell layers (Felleman and Van Essen, 1991), and how these laminar circuits may be specialized to carry out different types of biological intelligence. In particular, this unification suggested how variations of a shared laminar cortical design could be used to explain psychological and neurobiological data about vision, speech, and cognition:

Vision

Figure 3: The LAMINART model clarifies how bottom-up, horizontal, and top-down interactions within and across cortical layers in V1 and V2 interblob and pale stripe regions, respectively, carry out bottom-up adaptive filtering, horizontal grouping, and top-down attention. Similar interactions seem to occur in all six-layered cortices. See text for details. [Reprinted with permission from Raizada and Grossberg (2001).]

3D LAMINART integrates bottom-up and horizontal processes of 3D boundary formation and perceptual grouping, surface filling-in, and figure-ground separation with top-down attentional matching in cortical areas such as V1, V2, and V4 (Cao and Grossberg, 2005; Grossberg, 1999; Grossberg and Raizada, 2000; Grossberg and Swaminathan, 2004; Grossberg and Yazdanbakhsh, 2005; Raizada and Grossberg, 2001). The LAMINART model for 2D perceptual grouping, or boundary completion, is described in Figure 3. The SMART model (see below) extended LAMINART to model attentive thalamocortical learning and matching in the visual cortex using spiking neurons.

Speech

Figure 4: The cARTWORD model describes a hierarchy of levels to carry out some key processes involved in speech and language perception. Each level is organized into laminar cortical circuits, wherein deep layers (6 and 4) are responsible for processing and storing inputs, and superficial layers (2/3) are proposed to group distributed patterns across these deeper layers into unitized representations. The lowest level is responsible for processing acoustic features (cell activities $F_i$ and $E_i$) and items (cell activities $C^{(I)}_i$), whereas the higher level is responsible for storing sequences of acoustic items in working memory (activities $Y_i$ and $X_i$), and for representing these stored sequences as unitized, context-sensitive list chunks (activities $C^{(J)}_i$) in a network, called a masking field, that is capable of selectively representing lists of variable length. [Reprinted with permission from Grossberg and Kazerounian (2011).]

cARTWORD models how bottom-up, horizontal, and top-down interactions within a hierarchy of laminar cortical processing stages, modulated by the basal ganglia, can generate a conscious speech percept that is embodied by a resonant wave of activation that occurs between acoustic features, acoustic item chunks, and list chunks (Figure 4; Grossberg and Kazerounian, 2011). Chunk-mediated gating allows speech to be heard in the correct temporal order, even when what is consciously heard depends upon using future context to disambiguate noise-occluded sounds, as occurs during phonemic restoration.

Cognition

Figure 5: Circuit diagram of the LIST PARSE model. The Item and Order working memory is realized by a recurrent shunting on-center off-surround network in layers 4 and 6 of the Cognitive Working Memory, which is assumed to occur in ventrolateral prefrontal cortex. The list chunks are learned in layer 2/3. Outputs from the Cognitive Working Memory to the Motor Working Memory interact with a Vector Integration to Endpoint (VITE) trajectory generator (Bullock and Grossberg, 1988), modulated by the basal ganglia, to perform sequences of variable length at variable speeds. Green solid arrows indicate fixed excitatory connections, blue solid lines ending in hemidisk synapses indicate modifiable (i.e., learned) excitatory connections, and red dashed arrows indicate fixed inhibitory connections. Only 1-item chunks ($C_j$) and their feedback connections within a single Cognitive Working Memory channel are shown, whereas the model uses chunks of various sizes in layer 2/3, and feedback from layer 2/3 to layer 5/6 of the Cognitive Working Memory is broadly distributed. Also, only the excitatory projections from Cognitive Working Memory to the Motor Plan Field are shown. [Reprinted with permission from Grossberg and Pearson (2008).]

Figure 6: Schematic of an Item and Order working memory: A temporal sequence of inputs creates a spatial activation pattern among STM activations (Bradski et al., 1994; Grossberg, 1978a, 1978b), often a primacy gradient (height of hatched rectangles is proportional to cell activity). Relative activation levels among stored items code both which items are stored and the temporal order in which they are stored. A nonspecific rehearsal wave allows item activations to be rehearsed, with the largest activity being read out first. The output signal from this item also activates a self-inhibitory interneuron that inhibits the item, and thereby enables the next most active item to be performed. The process then repeats itself. Green solid arrows denote excitatory connections, and red dashed arrows denote inhibitory connections. [Reprinted with permission from Grossberg and Pearson (2008).]

LIST PARSE (Figure 5) models how bottom-up, horizontal, and top-down interactions within the laminar circuits of lateral prefrontal cortex may carry out working memory storage (Figure 6) of event sequences within layers 6 and 4, how unitization of these event sequences through learning into list chunks may occur within layer 2/3, and how these stored sequences can be recalled at variable rates that are under volitional control by the basal ganglia (Grossberg and Pearson, 2008). In particular, the model uses variations of the same circuitry to quantitatively simulate human cognitive data about immediate serial recall and free recall, and monkey neurophysiological data from the prefrontal cortex obtained during sequential sensory-motor imitation and planned performance.
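The Item and Order readout summarized in Figure 6 can be illustrated with a deliberately simplified sketch. It assumes that stored items are unique, that a primacy gradient can be approximated by a geometric decay (the `decay` parameter is hypothetical), and that self-inhibition simply removes the performed item. It omits the shunting dynamics, layers, and learning of the LIST PARSE model itself.

```python
from typing import Dict, List


def store_primacy_gradient(items: List[str], decay: float = 0.8) -> Dict[str, float]:
    """Assign larger activity to earlier items (a primacy gradient).

    Assumption: a geometric decay stands in for the stored activity pattern.
    """
    return {item: decay ** position for position, item in enumerate(items)}


def rehearse(activities: Dict[str, float]) -> List[str]:
    """Read out items in order of decreasing activity.

    Each pass picks the most active item (the rehearsal wave), then removes it,
    standing in for the self-inhibitory interneuron described in Figure 6.
    """
    remaining = dict(activities)
    recalled = []
    while remaining:
        winner = max(remaining, key=remaining.get)
        recalled.append(winner)
        del remaining[winner]  # self-inhibition lets the next item win
    return recalled


stored = store_primacy_gradient(["A", "B", "C"])
recall_order = rehearse(stored)
print(recall_order)
```

With a primacy gradient, recall reproduces the input order. Any other activity pattern yields a different order, which is how relative activity can code temporal order.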

This emerging unified theory of how variations of a shared laminar neocortical design can carry out multiple types of biological intelligence is also of interest in technology, where having a unified VLSI chip set for multiple types of biological intelligence would revolutionize computer science in general, and the design of autonomous adaptive mobile robots in particular. The DARPA SyNAPSE program is currently pursuing such a possibility (http://en.wikipedia.org/wiki/SyNAPSE).

Resonance vs. Reset Implies Gamma vs. Beta Oscillations

The Synchronous Matching ART (SMART) model (Grossberg and Versace, 2008) extends the LAMINART model by incorporating neurons that communicate via discrete spikes, together with hierarchical thalamocortical and corticocortical interactions (Figure 7). SMART thereby provides a unified functional explanation of single cell properties, such as spiking dynamics, spike-timing-dependent plasticity (STDP), and acetylcholine modulation; hierarchical laminar thalamic and cortical circuit designs and their interactions; aggregate cell recordings, such as current-source densities and local field potentials; and single cell and large-scale inter-areal oscillations in the gamma and beta frequency domains.

Figure 7: The SMART model clarifies how laminar neocortical circuits in multiple cortical areas interact with specific and nonspecific thalamic nuclei to regulate learning on multiple organizational levels, ranging from spikes to cognitive dynamics. The thalamus is subdivided into specific first-order and second-order nuclei, a nonspecific nucleus, and the thalamic reticular nucleus (TRN). The first-order thalamic matrix cells (shown as an open ring) provide nonspecific excitatory priming to layer 1 in response to bottom-up input, priming layer 5 cells and allowing them to respond to layer 2/3 input. This allows layer 5 to close the intracortical loop and activate the pulvinar (PULV). V1 layer 4 receives inputs from two parallel bottom-up thalamocortical pathways: a direct $\mathrm{LGN} \rightarrow \mathrm{4}$ excitatory input, and a $\mathrm{6^{I}} \rightarrow \mathrm{4}$ modulatory on-center, off-surround network that contrast-normalizes the pattern of layer 4 activation via the recurrent $\mathrm{4} \rightarrow \mathrm{2/3} \rightarrow \mathrm{5} \rightarrow \mathrm{6^{I}} \rightarrow \mathrm{4}$ loop. V1 activates the bottom-up $\mathrm{V1} \rightarrow \mathrm{V2}$ corticocortical pathways from V1 layer 2/3 to V2 layers $\mathrm{6^{I}}$ and 4, as well as the bottom-up corticothalamocortical pathway from V1 layer 5 to the PULV, which projects to V2 layers $\mathrm{6^{I}}$ and 4. In V2, as in V1, the layer $\mathrm{6^{I}} \rightarrow \mathrm{4}$ pathway provides divisive contrast normalization to V2 layer 4 cells. Corticocortical feedback from V2 layer $\mathrm{6^{II}}$ reaches V1 layer 1, where it activates apical dendrites of layer 5 cells. Layer 5 cells, in turn, activate the modulatory $\mathrm{6^{I}} \rightarrow \mathrm{4}$ pathway in V1, which projects a V1 top-down expectation to the LGN. TRN cells of the two thalamic sectors are linked via gap junctions, which synchronize activation across the two thalamocortical sectors when processing bottom-up stimuli. The nonspecific thalamic nucleus receives convergent bottom-up excitatory input from specific thalamic nuclei and inhibition from the TRN, and projects to layer 1 of the laminar cortical circuit, where it regulates mismatch-activated reset and hypothesis testing in the cortical circuit. Corticocortical feedback connections from layer $\mathrm{6^{II}}$ of the higher cortical area terminate in layer 1 of the lower cortical area, whereas corticothalamic feedback from layer $\mathrm{6^{II}}$ terminates in its specific thalamus and on the TRN. This corticothalamic feedback is matched against bottom-up input in the specific thalamus. [Reprinted with permission from Grossberg and Versace (2008).]
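The divisive contrast normalization attributed to the on-center, off-surround pathway in the caption above can be illustrated with the steady state of a feedforward shunting network. This is a rate-based sketch and not the spiking equations of SMART. It assumes the classical form in which activity $x_i$ obeys $dx_i/dt = -A x_i + (B - x_i) I_i - x_i \sum_{k \neq i} I_k$, so that at equilibrium $x_i = B I_i / (A + \sum_k I_k)$. The parameter values and the function name are hypothetical.

```python
from typing import List


def shunting_steady_state(inputs: List[float], A: float = 1.0, B: float = 1.0) -> List[float]:
    """Equilibrium activities of a feedforward shunting on-center off-surround network.

    Assumption: x_i = B * I_i / (A + sum of all inputs), with decay rate A
    and maximal activity B.
    """
    total = sum(inputs)
    return [B * value / (A + total) for value in inputs]


dim = shunting_steady_state([1.0, 2.0, 3.0])
bright = shunting_steady_state([10.0, 20.0, 30.0])

# Relative activities are preserved while total activity stays below B.
print(dim[1] / dim[0], bright[1] / bright[0])
print(sum(dim), sum(bright))
```

Scaling every input by the same factor leaves the ratios between activities unchanged and keeps the total bounded, which is the sense in which the pattern is contrast-normalized.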

SMART demonstrates how a top-down attentive match may lead to fast gamma oscillations that facilitate spike-timing-dependent plasticity (STDP), whereas mismatch and reset can lead to slower beta oscillations that help to prevent mismatched events from being learned. This match-mismatch gamma-beta relationship seems to occur in several brain systems, with examples of data supporting the SMART prediction having recently been reported in cortical area V1, hippocampus, and frontal eye fields (see Grossberg (2012) for a review).

Vigilance control by ACh under nucleus basalis control

As in all ART models, the generality of learned recognition codes in SMART is proposed to be controlled by a vigilance process. SMART predicts how vigilance may be altered by acetylcholine (ACh) when the nucleus basalis of Meynert is activated via the nonspecific thalamus (Kraus et al., 1994; van Der Werf et al., 2002) which, in turn, is activated by corticothalamic mismatches with one or more specific thalamic nuclei (Figure 7). The increase of ACh might thereby promote search for finer recognition categories in response to disconfirmatory environmental feedback, even when bottom-up and top-down signals have a reasonably good match in the nonspecific thalamus based on similarity alone.
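The role of vigilance can be made concrete with the match criterion used in algorithmic ART models such as fuzzy ART, where an input $x$ resonates with a learned expectation $w$ only if $|x \wedge w| / |x| \geq \rho$. The sketch below assumes that formulation (component-wise minimum and L1 norm). It does not model the ACh or thalamic dynamics of SMART, and the example vectors are hypothetical.

```python
from typing import Sequence


def match_ratio(x: Sequence[float], w: Sequence[float]) -> float:
    """Fraction of the input that the top-down expectation confirms.

    Uses the fuzzy AND (component-wise minimum) and the L1 norm.
    """
    overlap = sum(min(a, b) for a, b in zip(x, w))
    return overlap / sum(x)


def resonates(x: Sequence[float], w: Sequence[float], vigilance: float) -> bool:
    """True if the match is good enough to resonate, False if reset is triggered."""
    return match_ratio(x, w) >= vigilance


x = [1.0, 1.0, 0.0, 1.0]   # bottom-up input pattern
w = [1.0, 1.0, 0.0, 0.0]   # learned top-down expectation

print(match_ratio(x, w))
print(resonates(x, w, vigilance=0.5), resonates(x, w, vigilance=0.9))
```

The same input and expectation resonate at low vigilance but trigger reset at high vigilance. This illustrates how raising vigilance can drive search for a finer category even when the match is fairly good.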

High vigilance and hyperspecific category learning in autism

High vigilance has been predicted to cause symptoms of hyperspecific category learning and attentional deficits in some autistic individuals (Grossberg and Seidman, 2006). Psychophysical experiments have been done to test this prediction in high-functioning autistic individuals (Church et al., 2010; Vladusich et al., 2010). Abnormal cholinergic activity in the parietal and frontal cortices of autistic individuals that is correlated with abnormalities in the nucleus basalis (Perry et al., 2001) is consistent with the predicted role of the nucleus basalis and ACh in regulating vigilance.
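The link between high vigilance and hyperspecific categories can be illustrated with a minimal one-pass fuzzy ART clustering sketch. It assumes complement coding, fast learning, and a small choice parameter `alpha`. The data points and function name are hypothetical, and the sketch says nothing about the clinical data cited above.

```python
from typing import List, Sequence


def fuzzy_art_count(samples: Sequence[Sequence[float]], vigilance: float,
                    alpha: float = 0.001) -> int:
    """Return how many categories a minimal fuzzy ART learns in one pass."""
    weights: List[List[float]] = []
    for x in samples:
        coded = list(x) + [1.0 - v for v in x]  # complement coding
        norm_x = sum(coded)

        def choice(w: List[float]) -> float:
            overlap = sum(min(a, b) for a, b in zip(coded, w))
            return overlap / (alpha + sum(w))

        # Search categories in order of decreasing choice value.
        order = sorted(range(len(weights)),
                       key=lambda j: choice(weights[j]), reverse=True)
        for j in order:
            overlap = sum(min(a, b) for a, b in zip(coded, weights[j]))
            if overlap / norm_x >= vigilance:  # vigilance test passed: resonance
                weights[j] = [min(a, b) for a, b in zip(coded, weights[j])]
                break
        else:
            # Every existing category was reset, so a new one is recruited.
            weights.append(coded)
    return len(weights)


data = [[0.10], [0.15], [0.80], [0.85]]
low = fuzzy_art_count(data, vigilance=0.5)
high = fuzzy_art_count(data, vigilance=0.97)
print(low, high)
```

At low vigilance the two nearby pairs are grouped into two general categories, whereas at high vigilance each sample recruits its own category.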

References

Abbott, L.F., Varela, J.A., Sen, K., and Nelson, S.B. (1997). Synaptic depression and cortical gain control. Science, 275, 220-223.
Akhbardeh, A., Junnila, S., Koivistoinen, T., and Varri, A. (2007). An intelligent ballistocardiographic chair using a novel SF-ART neural network and biorthogonal wavelets. Journal of Medical Systems, 31, 69-77.
Amis, G., and Carpenter, G. (2007). Default ARTMAP 2. Proceedings of the International Joint Conference on Neural Networks (IJCNN'07), 777-782. Orlando, Florida, IEEE press.
Amis, G., and Carpenter, G. (2009). Self-supervised ARTMAP. Neural Networks, 23, 265-282.
Anagnostopoulos, G.C., and Georgiopoulos, M. (2000). Hypersphere ART and ARTMAP for unsupervised and supervised incremental learning. Neural Networks, Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks, 6, 59-64.
Anton-Rodriguez, M., Diaz-Pernas, F.J., Diez-Higuera, J.F., Martinez-Zarzuela, M., Gonzalez-Ortega, D., and Boto-Giralda, D. (2009). Recognition of coloured and textured images through a multi-scale neural architecture with orientational filtering and chromatic diffusion. Neurocomputing, 72, 3713-3725.
Baldi, P., and Hornik, K. (1989). Neural networks and principal component analysis: Learning from examples and local minima. Neural Networks, 2, 53-58.
Bar, M., Tootell, R.B.H., Schacter, D.L., Greve, D.N., Fischl, B., Mendola, J.D., Rosen, B.R. and Dale, A.M. (2001). Cortical mechanisms specific to explicit object recognition. Neuron, 29, 529-535.
Ben-Shahar, O., and Zucker, S. (2004). Sensitivity to curvatures in orientation-based texture segmentation. Vision Research, 44, 257-277.
Bhatt, R., Carpenter, G., and Grossberg, S. (2007). Texture segregation by visual cortex: Perceptual grouping, attention, and learning. Vision Research, 47, 3173-3211.
Bradski, G., Carpenter, G.A., and Grossberg, S. (1994). STORE working memory networks for storage and recall of arbitrary temporal sequences. Biological Cybernetics, 71, 469-480.
Brannon, N.G., Seiffertt, J.E., Draelos, T.J., and Wunsch, D.C.II. (2009). Coordinated machine learning and decision support for situation awareness. Neural Networks, 22, 316-325.
Brown, J., Bullock, D., and Grossberg, S. (1999). How the basal ganglia use parallel excitatory and inhibitory learning pathways to selectively respond to unexpected rewarding cues. Journal of Neuroscience, 19, 10502-10511.
Brown, J.W., Bullock, D., and Grossberg, S. (2004). How laminar frontal cortex and basal ganglia circuits interact to control planned and reactive saccades. Neural Networks, 17, 471-510.
Bullier, J., Hupé, J. M., James, A., and Girard, P. (1996). Functional interactions between areas V1 and V2 in the monkey. Journal of Physiology (Paris), 90, 217-220.
Bullock, D., Cisek, P. and Grossberg, S. (1998). Cortical networks for control of voluntary arm movements under variable force conditions. Cerebral Cortex, 8, 48-62.
Bullock, D. and Grossberg, S. (1988). Neural dynamics of planned arm movements: Emergent invariants and speed-accuracy properties during trajectory formation. Psychological Review, 95, 49-90.
Buschman, T. J., and Miller, E. K. (2007). Top-down versus bottom-up control of attention in the prefrontal and posterior parietal cortices. Science, 315, 1860-1862.
Cai, Y., Wang, J.-Z., Tang, Y., and Yang, Y.-C. (2011). An efficient approach for electric load forecasting using distributed ART (adaptive resonance theory) & HS-ARTMAP (Hyper-spherical ARTMAP network) neural network. Energy, 36, 1340-1350.
Cano-Izquierdo, J.-M., Almonacid, M., Pinzolas, M., and Ibarrola, J. (2009). dFasArt: Dynamic neural processing in FasArt model. Neural Networks, 22, 479-487.
Cao, Y. and Grossberg, S. (2005). A laminar cortical model of stereopsis and 3D surface perception: Closure and da Vinci stereopsis. Spatial Vision, 18, 515-578.
Cao, Y., and Grossberg, S. (2012). Stereopsis and 3D surface perception by spiking neurons in laminar cortical circuits: A method of converting neural rate models into spiking models. Neural Networks, 26, 75-98.
Cao, Y., Grossberg, S., and Markowitz, J. (2011). How does the brain rapidly learn and reorganize view- and positionally-invariant object representations in inferior temporal cortex? Neural Networks, 24, 1050-1061.
Caputo, G., and Guerra, S. (1998). Attentional selection by distractor suppression. Vision Research, 38, 669-689.
Carpenter, G.A. (1997). Distributed learning, recognition, and prediction by ART and ARTMAP neural networks. Neural Networks, 10, 1473-1494.
Carpenter, G.A. (2003). Default ARTMAP. Proceedings of the International Joint Conference on Neural Networks (IJCNN'03), 1396-1401.
Carpenter, G.A. and Gaddam, S.C. (2010). Biased ART: A neural architecture that shifts attention toward previously disregarded features following an incorrect prediction. Neural Networks, 23, 435-451.
Carpenter, G.A., and Grossberg, S. (1987a). A massively parallel architecture for a self-organizing neural pattern recognition machine. Computer Vision, Graphics, and Image Processing, 37, 54-115.
Carpenter, G.A. and Grossberg, S. (1987b). ART 2: Stable self-organization of pattern recognition codes for analog input patterns. Applied Optics, 26, 4919-4930.
Carpenter, G.A. and Grossberg, S. (1990). ART 3: Hierarchical search using chemical transmitters in self-organizing pattern recognition architectures. Neural Networks, 3, 129-152.
Carpenter, G.A., and Grossberg, S. (1991). Pattern Recognition by Self-Organizing Neural Networks. Cambridge, MA: MIT Press.
Carpenter, G.A., and Grossberg, S. (1993). Normal and amnesic learning, recognition, and memory by a neural model of cortico-hippocampal interactions. Trends in Neurosciences, 16, 131-137.
Carpenter, G.A., Grossberg, S., Markuzon, N., Reynolds, J.H., and Rosen, D.B. (1992). Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps. IEEE Transactions on Neural Networks, 3, 698-713.
Carpenter, G.A., Grossberg, S., and Reynolds, J.H. (1991). ARTMAP: Supervised real-time learning and classification of nonstationary data by a self-organizing neural network. Neural Networks, 4, 565-588.
Carpenter, G.A., Grossberg, S., and Rosen, D.B. (1991). Fuzzy ART: Fast stable learning and categorization of analog patterns by an adaptive resonance system. Neural Networks, 4, 759-771.
Carpenter, G.A., Martens, S., and Ogas, O.J. (2005). Self-organizing information fusion and hierarchical knowledge discovery: a new framework using ARTMAP neural networks. Neural Networks, 18, 287-295.
Carpenter, G.A., Milenova, B.L., and Noeske, B.W. (1998). Distributed ARTMAP: a neural network for fast distributed supervised learning. Neural Networks, 11, 793-813.
Carpenter, G.A., and Ravindran, A. (2008). Unifying multiple knowledge domains using the ARTMAP information fusion system. Proceedings of the 11th International Conference on Information Fusion, Cologne, Germany, June 30 – July 3, 2008.
Caudell, T.P. (1992). Hybrid optoelectronic adaptive resonance theory neural processor, ART 1. Applied Optics, 31, 6220-6229.
Caudell, T.P., Smith, S.D.G., Johnson, G.C., Wunsch, D.C., II, and Escobedo, R. (1991). An industrial application of neural networks to reusable design. Neural Networks, International Joint Conference on Neural Networks, Vol. 2, p. 919.
Chao, H.-C., Hsiao, C.-M., Su, W.-S., Hsu, C.-C., and Wu, C.-Y. (2011). Modified adaptive resonance theory for alarm correlation based on distance hierarchy in mobile networks. Network Operations and Management Symposium, 2011 13th Asia-Pacific, 1-4.
Cherng, S., Fang, C.-Y., Chen, C.-P., and Chen, S.-W. (2009). Critical motion detection of nearby moving vehicles in a vision-based driver-assistance system. IEEE Transactions on Intelligent Transportation Systems, 10, 70-82.
Church, B.A., Krauss, M.S., Lopata, C., Toomey, J.A., Thomeer, M.L., Coutinho, M.V., Volker, M.A., and Mercado, E. (2010). Atypical categorization in children with high-functioning autism spectrum disorder. Psychonomic Bulletin & Review, 17, 862-868.
Cohen, M.A., and Grossberg, S. (1983). Absolute stability of global pattern formation and parallel memory storage by competitive neural networks. IEEE Transactions on Systems, Man, and Cybernetics, 13, 815-826.
Cohen, N.J., and Squire, L.R. (1980). Preserved learning and retention of a pattern-analyzing skill in amnesia: Dissociation of knowing how and knowing that. Science, 210, 207-210.
Demetgul, M., Tansel, I.N., and Taskin, S. (2009). Fault diagnosis of pneumatic systems with artificial neural network architectures. Expert Systems with Applications, 36, 10512-10519.
Desimone, R. (1998). Visual attention mediated by biased competition in extrastriate visual cortex. Philosophical Transactions of the Royal Society of London, 353, 1245-1255.
Downing, C.J. (1988). Expectancy and visual-spatial attention: Effects on perceptual quality. Journal of Experimental Psychology: Human Perception and Performance, 14, 188-202.
Dranias, M., Grossberg, S., and Bullock, D. (2008). Dopaminergic and non-dopaminergic value systems in conditioning and outcome-specific revaluation. Brain Research, 1238, 239-287.
Dubin, M. W. and Cleland, B. G. (1977). Organization of visual inputs to interneurons of lateral geniculate nucleus of the cat. Journal of Neurophysiology, 40, 410-427.
Dunbar, G. (2012). Adaptive Resonance Theory as a model of polysemy and vagueness in the cognitive lexicon. Cognitive Linguistics, 23, 507-537.
Eckhorn, R., Bauer, R., Jordan, W., Brosch, M., Kruse, W., Munk, M., and Reitbock, H.J. (1988). Coherent oscillations: A mechanism of feature linking in the visual cortex? Biological Cybernetics, 60, 121-130.
Engel, A.K., Fries, P., and Singer, W. (2001). Dynamic predictions: Oscillations and synchrony in top-down processing. Nature Reviews Neuroscience, 2, 704-716.
Fang, L. and Grossberg, S. (2009). From stereogram to surface: How the brain sees the world in depth. Spatial Vision, 22, 45-82.
Fazl, A., Grossberg, S., and Mingolla, E. (2009). View-invariant object category learning, recognition, and search: How spatial and object attention are coordinated using surface-based attentional shrouds. Cognitive Psychology, 58, 1-48.
Felleman, D.J., and Van Essen, D. (1991). Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex, 1, 1-47.
Fiala, J.C., Grossberg, S., and Bullock, D. (1996). Metabotropic glutamate receptor activation in cerebellar Purkinje cells as substrate for adaptive timing of the classically conditioned eye blink response. Journal of Neuroscience, 16, 3760-3774.
Foley, N.C., Grossberg, S. and Mingolla, E. (2012). Neural dynamics of object-based multifocal visual spatial attention and priming: Object cueing, useful-field-of-view, and crowding. Cognitive Psychology, 65, 77-117.
Francis, G. and Grossberg, S. (1996). Cortical dynamics of boundary segmentation and reset: Persistence, afterimages, and residual traces. Perception, 25, 543-567.
Francis, G., Grossberg, S., and Mingolla, E. (1994). Cortical dynamics of feature binding and reset: Control of visual persistence. Vision Research, 34, 1089-1104.
Gaudiano, P., and Grossberg, S. (1991). Vector associative maps: Unsupervised real-time error-based learning and control of movement trajectories. Neural Networks, 4, 147-183.
Gaudiano, P., and Grossberg, S. (1992). Adaptive vector integration to endpoint: Self-organizing neural circuits for control of planned movement trajectories. Human Movement Science, 11, 141-155.
Georgopoulos, A.P., Kalaska, J.F., Caminiti, R., and Massey, J.T. (1982). On the relations between the direction of two-dimensional arm movements and cell discharge in primate motor cortex. Journal of Neuroscience, 2, 1527-1537.
Georgopoulos, A. P., Schwartz, A. B., and Kettner, R. E. (1986). Neuronal population coding of movement direction. Science, 233, 1416-1419.
Gorchetchnikov, A., Versace, M., and Hasselmo, M.E. (2005). A model of STDP based on spatially and temporally local information: Derivation and combination with gated decay. Neural Networks, 18, 458-466.
Gori, M., and Tesi, A. (1992). On the problem of local minima in Backpropagation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14, 76-86.
Gove, A., Grossberg, S., and Mingolla, E. (1995). Brightness perception, illusory contours, and corticogeniculate feedback. Visual Neuroscience, 12, 1027-1052.
Gray, C.M., and Singer, W. (1989). Stimulus-specific neuronal oscillations in orientation columns of cat visual cortex. Proceedings of the National Academy of Sciences USA, 86, 1698-1702.
Grossberg, S. (1968a). A prediction theory for some nonlinear functional-differential equations, II: Learning of patterns. Journal of Mathematical Analysis and Applications, 22, 490-522.
Grossberg, S. (1968b). Some physiological and biochemical consequences of psychological postulates. Proceedings of the National Academy of Sciences, 60, 758-765.
Grossberg, S. (1972a). A neural theory of punishment and avoidance, I: Qualitative theory. Mathematical Biosciences, 15, 39-67.
Grossberg, S. (1972b). A neural theory of punishment and avoidance, II: Quantitative theory. Mathematical Biosciences, 15, 253-285.
Grossberg, S. (1973). Contour enhancement, short-term memory, and constancies in reverberating neural networks. Studies in Applied Mathematics, 52, 213-257.
Grossberg, S. (1976a). Adaptive pattern classification and universal recoding, I: Parallel development and coding of neural feature detectors. Biological Cybernetics, 23, 121-134.
Grossberg, S. (1976b). Adaptive pattern classification and universal recoding, II: Feedback, expectation, olfaction, and illusions. Biological Cybernetics, 23, 187-202.
Grossberg, S. (1978a). A theory of human memory: Self-organization and performance of sensory-motor codes, maps, and plans. In R. Rosen and F. Snell (Eds.), Progress in theoretical biology, Volume 5. New York: Academic Press, pp. 233-374.
Grossberg, S. (1978b). Behavioral contrast in short-term memory: Serial binary memory models or parallel continuous memory models? Journal of Mathematical Psychology, 17, 199-219.
Grossberg, S. (1980). How does a brain build a cognitive code? Psychological Review, 87, 1-51.
Grossberg, S. (1984a). Some normal and abnormal behavioral syndromes due to transmitter gating of opponent processes. Biological Psychiatry, 19, 1075-1118.
Grossberg, S. (1984b). Some psychophysiological and pharmacological correlates of a developmental, cognitive, and motivational theory. In R. Karrer, J. Cohen, and P. Tueting (Eds.), Brain and Information: Event Related Potentials. New York: New York Academy of Sciences, pp. 58-142.
Grossberg, S. (1988). Nonlinear neural networks: Principles, mechanisms, and architectures. Neural Networks, 1, 17-61.
Grossberg, S. (1994). 3-D vision and figure-ground separation by visual cortex. Perception and Psychophysics, 55, 48-120.
Grossberg, S. (1994). Letter to the editor: Physiological interpretation of the self-organizing map algorithm.
Grossberg, S. (1997). Cortical dynamics of three-dimensional figure-ground perception of two-dimensional figures. Psychological Review, 104, 618-658.
Grossberg, S. (1998a). Birth of a learning law. INNS/ENNS/JNNS Newsletter, 21, 1-4.
Grossberg, S. (1998b). Synaptic depression and cortical gain control. Technical Report CAS/CNS TR-98-001, Boston University.
Grossberg, S. (1999). How does the cerebral cortex work? Learning, attention and grouping by the laminar circuits of visual cortex. Spatial Vision, 12, 163-186.
Grossberg, S. (2000a). How hallucinations may arise from brain mechanisms of learning, attention, and volition. Journal of the International Neuropsychological Society, 6, 579-588.
Grossberg, S. (2000b). The complementary brain: Unifying brain dynamics and modularity. Trends in Cognitive Sciences, 4, 233-246.
Grossberg, S., Govindarajan, K.K., Wyse, L.L., and Cohen, M.A. (2004). ARTSTREAM: A neural network model of auditory scene analysis and source segregation. Neural Networks, 17, 511-536.
Grossberg, S. (2007). Consciousness CLEARS the mind. Neural Networks, 20, 1040-1053.
Grossberg, S. (2009b). Cortical and subcortical predictive dynamics and learning during perception, cognition, emotion and action. Philosophical Transactions of the Royal Society of London B Biological Sciences, 364, 1223-1234.
Grossberg, S. (2012). Adaptive Resonance Theory: How a brain learns to consciously attend, learn, and recognize a changing world. Neural Networks, 37, 1-47.
Grossberg, S., and Kazerounian, S. (2011). Laminar cortical dynamics of conscious speech perception: A neural model of phonemic restoration using subsequent context in noise. Journal of the Acoustical Society of America, 130, 440-460.
Grossberg, S., Markowitz, J., and Cao, Y. (2011). On the road to invariant recognition: Explaining tradeoff and morph properties of cells in inferotemporal cortex using multiple-scale task-sensitive attentive learning. Neural Networks, 24, 1036-1049.
Grossberg, S., and McLoughlin, N. (1997). Cortical dynamics of 3-D surface perception: Binocular and half-occluded scenic images. Neural Networks, 10, 1583-1605.
Grossberg, S. and Paine, R.W. (2000). A neural model of corticocerebellar interactions during attentive imitation and predictive learning of sequential handwriting movements. Neural Networks, 13, 999-1046.
Grossberg, S., and Pearson, L. (2008). Laminar cortical dynamics of cognitive and motor working memory, sequence learning and performance: Toward a unified theory of how the cerebral cortex works. Psychological Review, 115, 677-732.
Grossberg, S., and Seitz, A. (2003). Laminar development of receptive fields, maps, and columns in visual cortex: The coordinating role of the subplate. Cerebral Cortex, 13, 852-863.
Grossberg, S., and Seidman, D. (2006). Neural dynamics of autistic behaviors: Cognitive, emotional, and timing substrates. Psychological Review, 113, 483-525.
Grossberg, S., and Swaminathan, G. (2004). A laminar cortical model for 3D perception of slanted and curved surfaces and of 2D images: development, attention and bistability. Vision Research, 44, 1147-1187.
Grossberg, S., and Versace, M. (2008). Spikes, synchrony, and attentive learning by laminar thalamocortical circuits. Brain Research, 1218, 278-312.
Grossberg, S., and Vladusich, T. (2010). How do children learn to follow gaze, share joint attention, imitate their teachers, and use tools during social interactions? Neural Networks, 23, 940-965.
Grossberg, S., Yazdanbakhsh, A., Cao, Y., and Swaminathan, G. (2008). How does binocular rivalry emerge from cortical mechanisms of 3-D vision? Vision Research, 48, 2232-2250.
Guenther, F.H., Bullock, D., Greve, D., and Grossberg, S. (1994). Neural representations for sensory-motor control, III: Learning a body-centered representation of 3-D target position. Journal of Cognitive Neuroscience, 6, 341-358.
He, J., Tan, A.-H., and Tan, C.-L. (2000). A comparative study on Chinese text categorization methods. Proceedings of PRICAI’2000.
Healy, M.J., Caudell, T.P., and Smith, S.D.G. (1993). A neural architecture for pattern sequence verification through inferencing. IEEE Transactions on Neural Networks, 4, 9-20.
Hecht-Nielsen, R. (1987). Counterpropagation networks. Applied Optics, 26, 4979-4983.
Ho, C.S., Liou, J.J., Georgiopoulos, M., Heileman, G.L., and Christodoulou, C. (1994). Analogue circuit design and implementation of an adaptive resonance theory (ART) network architecture. International Journal of Electronics, 76, 271-291.
Hopfield, J.J. (1984). Neurons with graded response have collective computational properties like those of two-state neurons. Proceedings of the National Academy of Sciences, 81, 3088-3092.
Hsieh, K.-L. (2008). The application of clustering analysis for the critical areas on TFT-LCD panel. Expert Systems with Applications, 34, 952-957.
Hsieh, K.-L., and Yang, I.-Ch. (2008). Incorporating PCA and fuzzy-ART techniques into achieve organism classification based on codon usage consideration. Computers in Biology and Medicine, 38, 886-893.
Hsu, S.-C., and Chien, C.-F. (2007). Hybrid data mining approach for pattern extraction from wafer bin map to improve yield in semiconductor manufacturing. International Journal of Production Economics, 107, 88-103.
Hupé, J. M., James, A. C., Girard, D. C., and Bullier, J. (1997). Feedback connections from V2 modulate intrinsic connectivity within V1. Society for Neuroscience Abstracts, 406.15, 1031.
Kawamura, T., Takahashi, H., and Honda, H. (2008). Proposal of new gene filtering method, BagPART, for gene expression analysis with small sample. Journal of Bioscience and Bioengineering, 105, 81-84.
Kaylani, A., Georgiopoulos, M., Mollaghasemi, M., and Anagnostopoulos, G.C. (2009). AG-ART: An adaptive approach to evolving ART architectures. Neurocomputing, 72, 2079-2092.
Keskin, G.A., and Ozkan, C. (2009). An alternative evaluation of FMEA: Fuzzy ART algorithm. Quality and Reliability Engineering International, 25, 647-661.
Kohonen, T. (1984). Self-organization and associative memory. New York: Springer.
Kraus, N., McGee, T., Littman, T., Nicol, T., and King, C. (1994). Nonprimary auditory thalamic representation of acoustic change. Journal of Neurophysiology, 72, 1270–1277.
Liu, D., Pang, Z., and Lloyd, S.R. (2008). A neural network method for detection of obstructive sleep apnea and narcolepsy based on pupil size and EEG. IEEE Transactions on Neural Networks, 19, 308-318.
Liu, L., Huang, L., Lai, M., and Ma, C. (2009). Projective ART with buffers for the high dimensional space clustering and an application to discover stock associations. Neurocomputing, 72, 1283-1295.
Lopes, M.L.M., Minussi, C.R., and Lotufo, A.D.P. (2005). Electric load forecasting using a fuzzy ART & ARTMAP neural network. Applied Soft Computing, 5, 235-244.
Luck, S. J., Chelazzi, L., Hillyard, S. A., and Desimone, R. (1997). Neural mechanisms of spatial selective attention in areas V1, V2, and V4 of macaque visual cortex. Journal of Neurophysiology, 77, 24–42.
Marchiori, S.C., da Silveira, M. do C., Lotufo, A.D.P., Minussi, C.R., and Lopes, M.L.M. (2011). Neural network based on adaptive resonance theory with continuous training for multi-configuration transient stability analysis of electric power systems. Applied Soft Computing, 11, 706-715.
Martin-Guerrero, J.D., Lisboa, P.J.G., Soria-Olivas, E., Palomares, A., and Balaguer, E. (2007). An approach based on the Adaptive Resonance Theory for analyzing the viability of recommender systems in a citizen Web portal. Expert Systems with Applications, 33, 743-753.
Massey, L. (2009). Discovery of hierarchical thematic structure in text collections with adaptive resonance theory. Neural Computing & Applications, 18, 261-273.
Mounts, J. R. W. (2000). Evidence for suppressive mechanisms in attentional selection: Feature singletons produce inhibitory surrounds. Perception and Psychophysics, 62, 969–983.
Mulder, S.A., and Wunsch, D.C. (2003). Million city traveling salesman problem solution by divide and conquer clustering with adaptive resonance neural networks. Neural Networks, 16, 827-832.
Mumford, D. (1992). On the computational architecture of the neocortex. II. The role of corticocortical loops. Biological Cybernetics, 66, 241–251.
Owega, S., Khan, B.-U.-Z., Evans, G.J., Jervis, R.E., and Fila, M. (2006). Identification of long-range aerosol transport patterns to Toronto via classification of back trajectories by cluster analysis and neural network techniques. Chemometrics and Intelligent Laboratory Systems, 83, 26-33.
Perry, E.K., Lee, M.L.W., Martin-Ruiz, C.M., Court, J.A., Volsen, S.G., Merrit, J., Folly, E., Iversen, P.E., Bauman, M.L., Perry, R.H., and Wenk, G.L. (2001). Cholinergic activity in autism: Abnormalities in the cerebral cortex and basal forebrain. The American Journal of Psychiatry, 158, 1058-1066.
Pilly, P.K., and Grossberg, S. (2012). How do spatial learning and memory occur in the brain? Coordinated learning of entorhinal grid cells and hippocampal place cells. Journal of Cognitive Neuroscience, in press.
Pollen, D.A. (1999). On the neural correlates of visual perception. Cerebral Cortex, 9, 4–19.
Posner, M.I., and Keele, S.W. (1968). On the genesis of abstract ideas. Journal of Experimental Psychology, 77, 353-363.
Prasad, V.S.S., and Gupta, S.D. (2008). Photometric clustering of regenerated plants of gladiolus by neural networks and its biological validation. Computers and Electronics in Agriculture, 60, 8-17.
Raizada, R., and Grossberg, S. (2001). Context-sensitive bindings by the laminar circuits of V1 and V2: A unified model of perceptual grouping, attention, and orientation contrast. Visual Cognition, 8, 431-466.
Raizada, R., and Grossberg, S. (2003). Towards a theory of the laminar architecture of cerebral cortex: Computational clues from the visual system. Cerebral Cortex, 13, 100-113.
Ramachandran, V.S. (1990). Interactions between motion, depth, color and form: the utilitarian theory of perception. In Vision: Coding and Efficiency, C. Blakemore, Ed. Cambridge, England: Cambridge University Press.
Rao, R. P. N., and Ballard, D. H. (1999). Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive field effects. Nature Neuroscience, 2, 79–87.
Reynolds, J., Chelazzi, L., and Desimone, R. (1999). Competitive mechanisms subserve attention in macaque areas V2 and V4. The Journal of Neuroscience, 19, 1736–1753.
Reynolds, J. H., and Heeger, D. J. (2009). The normalization model of attention. Neuron, 61, 168-185.
Roelfsema, P. R., Lamme, V. A. F., and Spekreijse, H. (1998). Object-based attention in the primary visual cortex of the macaque monkey. Nature, 395, 376–381.
Shieh, M.-D., Yan, W., and Chen, C.-H. (2008). Soliciting customer requirements for product redesign based on picture sorts and ART2 neural network. Expert Systems with Applications, 34, 194-204.
Sigala, N., and Logothetis, N.K. (2002). Visual categorization shapes feature selectivity in the primate temporal cortex. Nature, 415, 318-320.
Sillito, A. M., Jones, H. E., Gerstein, G. L., and West, D. C. (1994). Feature-linked synchronization of thalamic relay cell firing induced by feedback from the visual cortex. Nature, 369, 479-482.
Somers, D. C., Dale, A. M., Seiffert, A. E., and Tootell, R. B. (1999). Functional MRI reveals spatially specific attentional modulation in human primary visual cortex. Proceedings of the National Academy of Sciences USA, 96, 1663–1668.
Steinman, B.A., Steinman, S.B., and Lehmkuhle, S. (1995). Visual attention mechanisms show a center-surround organization. Vision Research, 35, 1859-1869.
Sudhakara Pandian, R., and Mahapatra, S.S. (2009). Manufacturing cell formation with production data using neural networks. Computers & Industrial Engineering, 56, 1340-1347.
Takahashi, H., Murase, Y., Kobayashi, T., and Honda, H. (2007). New cancer diagnosis modeling using boosting and projective adaptive resonance theory with improved reliable index. Biochemical Engineering Journal, 33, 100-109.
Tan, A.-H. (1997). Cascade ARTMAP: Integrating neural computation and symbolic knowledge processing. IEEE Transactions on Neural Networks, 8, 237-250.
Tan, A.-H., and Teo, C. (1998). Learning user profiles for personalized information dissemination. IEEE World Congress on Computational Intelligence, 1, 183-188.
Tan, T.Z., Quek, C., Ng, G.S., and Razvi, K. (2008). Ovarian cancer diagnosis with complementary learning fuzzy neural network. Artificial Intelligence in Medicine, 43, 207-222.
Tanaka, K., Saito, H., Fukada, Y., and Moriya, M. (1991). Coding visual images of objects in the inferotemporal cortex of the macaque monkey. Journal of Neurophysiology, 66, 170-189.
Temereanca, S., and Simons, D. J. (2001). Topographic specificity in the functional effects of corticofugal feedback in the whisker/barrel system. Society for Neuroscience Abstracts, 393.6.
Van der Werf, Y.D., Witter, M.P., and Groenewegen, H.J. (2002). The intralaminar and midline nuclei of the thalamus. Anatomical and functional evidence for participation in processes of arousal and awareness. Brain Research, 39, 107-140.
Vanduffel, W., Tootell, R.B., and Orban, G.A. (2000). Attention-dependent suppression of metabolic activity in the early stages of the macaque visual system. Cerebral Cortex, 10, 109–126.
Vladusich, T., Lafe, F., Kim, D.-S., Tager-Flusberg, H., and Grossberg, S. (2010). Prototypical category learning in high-functioning autism. Autism Research, 3, 226-236.
Weber, A. J., Kalil, R. E., and Behan, M. (1989). Synaptic connections between corticogeniculate axons and interneurons in the dorsal lateral geniculate nucleus of the cat. Journal of Comparative Neurology, 289, 156-164.
Wienke, D., and Buydens, L. (1995). Adaptive resonance theory based neural networks—the “ART” of real-time pattern recognition in chemical process monitoring. Trends in Analytical Chemistry, 14, 398-406.
Wunsch, D.C., Caudell, T.P., Capps, C.D., Marks, R.J., II., and Falk, R.A. (1993). An optoelectronic implementation of the adaptive resonance neural network. IEEE Transactions on Neural Networks, 4, 673-684.
Xu, Z., Shi, X., Wang, L., Luo, J., Zhong, C.-J., and Lu, S. (2009). Pattern recognition for sensor array signals using Fuzzy ARTMAP. Sensors and Actuators B: Chemical, 141, 458-464.
Zeki, S., and Shipp, S. (1988). The functional logic of cortical connections. Nature, 335, 311-317.
Zhang, N., and Kezunovic, M. (2007). A real time fault analysis tool for monitoring operation of transmission line protective relay. Electric Power Systems Research, 77, 361-370.
Zhang, Y., Suga, N., and Yan, J. (1997). Corticofugal modulation of frequency processing in bat auditory system. Nature, 387, 900-903.
Zoccolan, D., Kouh, M., Poggio, T., and DiCarlo, J. J. (2007). Trade-off between object selectivity and tolerance in monkey inferotemporal cortex. Journal of Neuroscience, 27, 12292-12307.