Computational models of classical conditioning
Nestor A. Schmajuk (2008), Scholarpedia, 3(3):1664. doi:10.4249/scholarpedia.1664. Revision #91150.
During classical (or Pavlovian) conditioning, human and animal subjects change their behavior as a result of the different relationships between the conditioned stimulus (CS) and the unconditioned stimulus (US). Although conditioning appears simple, more than one mechanism is needed to account for the results of the many contingencies that have been explored. This article describes several of the proposed mechanisms, shows how they address some important experimental results, and shows how they can be combined to describe most of the known properties of conditioning.
Competition between CSs to gain association with a US
Rescorla and Wagner (1972) introduced a rule that assumes that CSs compete to gain association with the US. The Rescorla-Wagner Model can describe:
- Acquisition. After a number of CS-US pairings, the CS elicits a conditioned response (CR) that increases in magnitude and frequency.
- Partial Reinforcement. The US follows the CS only on some trials.
- Generalization. A CS2 elicits a CR when it shares some characteristics with a CS1 that has been paired with the US.
- Extinction. When CS-US pairings are followed by presentations of the CS alone or by unpaired CS and US presentations, the CR decreases.
- US–Preexposure effect. Presentation of the US in a training context prior to CS-US pairings retards production of the CR.
- Forward Blocking. Conditioning to CS1-CS2 following conditioning to CS1 results in a weaker conditioning to CS2 than that attained with CS2-US pairings.
- Unblocking. Increasing the US increases responding to the blocked CS2.
- Overshadowing. Conditioning to CS1-CS2 results in a weaker conditioning to CS2 than that attained with CS2-US pairings.
- Conditioned Inhibition. Stimulus CS2 acquires inhibitory conditioning with CS1 reinforced trials interspersed with CS1-CS2 nonreinforced trials.
- Super-normal conditioning. Reinforced CS1-CS2 presentations, following inhibitory conditioning of CS1, increase CS2 excitatory strength compared with the case when it is trained in the absence of CS1.
- Overexpectation. Reinforced CS1-CS2 presentations that follow independent reinforced CS1 and CS2 presentations result in a decrement in their initial associative strength.
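As an illustration of the competitive rule, the following minimal sketch applies the Rescorla-Wagner update, \(\Delta V = \alpha \beta (\lambda - \Sigma V)\), trial by trial. The function name, the trial encoding, and the parameter values are assumptions chosen for the example; they are not taken from Rescorla and Wagner (1972).

```python
def rescorla_wagner(trials, alpha=0.2, beta=1.0):
    """Apply the Rescorla-Wagner rule over a list of trials.

    Each trial is (present_cues, lam): the set of CSs present and the
    asymptote supported by the US (lam = 0.0 on nonreinforced trials).
    alpha and beta are illustrative values, not fitted parameters.
    """
    V = {}
    for cues, lam in trials:
        total = sum(V.get(c, 0.0) for c in cues)   # aggregate prediction of the US
        error = lam - total                        # error shared by all present CSs
        for c in cues:                             # only present CSs are updated
            V[c] = V.get(c, 0.0) + alpha * beta * error
    return V


# Forward blocking: CS1-US pairings, then CS1-CS2-US pairings.
blocked = rescorla_wagner([({"CS1"}, 1.0)] * 50 + [({"CS1", "CS2"}, 1.0)] * 50)
# Control: CS2-US pairings only.
control = rescorla_wagner([({"CS2"}, 1.0)] * 50)
# Overshadowing: CS1-CS2-US pairings from the start.
overshadowed = rescorla_wagner([({"CS1", "CS2"}, 1.0)] * 50)

print(blocked["CS2"], control["CS2"], overshadowed["CS2"])
```

Because CS1 already predicts the US when CS2 is introduced, the shared error is close to zero and CS2 gains little strength (blocking). When both CSs are trained together from the start, they split the available strength (overshadowing).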
Van Hamme and Wasserman (1994) described a modified version of the Rescorla and Wagner (1972) model. They proposed that the association of a CS with the US also changes when the CS is absent, with a negative learning rate (\(\alpha < 0\)), instead of staying constant as in the original model (\(\alpha = 0\)). In addition to the paradigms listed above, the modified model can explain the following phenomena:
- Recovery from overshadowing. Extinction of the CS1 results in increased responding to the overshadowed CS2.
- Recovery from forward blocking. Extinction of the blocker CS1 results in increased responding to the blocked CS2.
- Backward blocking. Conditioning to CS1 following conditioning to CS1-CS2 results in a weaker conditioning to CS2 than that attained with CS2-US pairings.
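The effect of a negative learning rate for absent CSs can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes that every CS seen earlier in training is treated as an expected-but-absent cue, and the rates 0.2 and -0.1 are arbitrary.

```python
def van_hamme_wasserman(trials, alpha_present=0.2, alpha_absent=-0.1):
    """Rescorla-Wagner updates with a separate rate for absent CSs.

    Assumption: any CS encountered earlier counts as an absent cue on
    trials where it is not presented. alpha_absent = 0.0 recovers the
    original rule, in which absent CSs do not change.
    """
    V = {}
    for cues, lam in trials:
        for c in cues:
            V.setdefault(c, 0.0)                 # a CS becomes known once seen
        error = lam - sum(V[c] for c in cues)    # error computed from present CSs
        for c in V:
            rate = alpha_present if c in cues else alpha_absent
            V[c] += rate * error
    return V


compound = [({"CS1", "CS2"}, 1.0)] * 20
cs1_reinforced = [({"CS1"}, 1.0)] * 20
cs1_extinction = [({"CS1"}, 0.0)] * 20

backward_blocking = van_hamme_wasserman(compound + cs1_reinforced)
original_rule = van_hamme_wasserman(compound + cs1_reinforced, alpha_absent=0.0)
recovery = van_hamme_wasserman(compound + cs1_extinction)

print(backward_blocking["CS2"], original_rule["CS2"], recovery["CS2"])
```

In this sketch, reinforcing CS1 alone lowers the strength of the absent CS2 (backward blocking), whereas extinguishing CS1 alone raises it (recovery from overshadowing). With the absent-cue rate set to zero, CS2 keeps the value it had at the end of compound training.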
Attention, CS-CS and CS-US Associations
Mackintosh (1975) suggested that attention to a given CS increases when that CS is the best predictor of the US, and decreases otherwise. Mackintosh’s (1975) rule can be expressed as \(\Delta \alpha_A > 0\) if \(|\lambda-V_A| < |\lambda -V_X|\) and \(\Delta \alpha_A < 0\) otherwise, where \(\alpha_A\) is the CSA-specific learning rate, \(\lambda\) is the asymptotic association with the US, \(V_A\) is the association of CSA with the US, and \(V_X\) the association with the US of all CSs other than CSA. In addition to acquisition, partial reinforcement, extinction, forward blocking, and overshadowing, the rule can be applied to:
- Latent inhibition. Preexposure to a CS followed by CS-US pairings retards the generation of the CR.
Grossberg (1975; Schmajuk and DiCarlo, 1991) offered a neural network that provides the necessary mechanisms to implement Mackintosh’s (1975) rule.
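A minimal sketch of Mackintosh’s attentional rule is given below. It assumes that each CS learns from its own error, \(\Delta V_A = \alpha_A \theta (\lambda - V_A)\), and that \(\alpha_A\) moves by a fixed step between a floor and 1. The step size, the floor, \(\theta\), and the function name are assumptions made for the example, since the rule itself only fixes the direction of the change in \(\alpha_A\).

```python
def mackintosh(trials, alpha0=0.5, theta=0.2, step=0.05, floor=0.05):
    """Sketch of Mackintosh-style attention with CS-specific learning rates.

    alpha rises when a CS predicts the US better than the other CSs
    present on the trial, and falls otherwise (ties count as "otherwise").
    All numeric values are illustrative assumptions.
    """
    V, alpha = {}, {}
    for cues, lam in trials:
        for c in cues:
            V.setdefault(c, 0.0)
            alpha.setdefault(c, alpha0)
        old = dict(V)                                    # values at trial onset
        for c in cues:
            v_x = sum(old[o] for o in cues if o != c)    # all CSs other than c
            if abs(lam - old[c]) < abs(lam - v_x):
                alpha[c] = min(1.0, alpha[c] + step)     # c is the best predictor
            else:
                alpha[c] = max(floor, alpha[c] - step)
            V[c] = old[c] + alpha[c] * theta * (lam - old[c])
    return V, alpha


# Latent inhibition: CS preexposure followed by CS-US pairings.
V_pre, _ = mackintosh([({"CS"}, 0.0)] * 10 + [({"CS"}, 1.0)] * 5)
V_ctl, _ = mackintosh([({"CS"}, 1.0)] * 5)

# CS A is paired with the US only in compound with X; X alone is nonreinforced.
_, attention = mackintosh([({"A", "X"}, 1.0), ({"X"}, 0.0)] * 20)

print(V_pre["CS"], V_ctl["CS"], attention)
```

During preexposure the CS predicts nothing better than the background does, so its learning rate falls and later acquisition is slower than in the control. In the second design, attention to A grows while attention to X declines.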
In contrast to Mackintosh’s (1975) approach, Pearce and Hall (1980) proposed that attention to a given CS decreases when the US is accurately predicted. This idea can be expressed by \(\Delta V = \alpha |\lambda - \Sigma V| \lambda\ ,\) where \(\alpha\) is proportional to the intensity of the CS, \(\Sigma V\) represents the prediction of the US by all CSs, and \(\lambda\) is the intensity of the US. In addition to most of the results explained by the Rescorla-Wagner model, the model can explain:
- Latent inhibition. See above.
- Unblocking by decreasing the US. Decreasing the US in the second phase of forward blocking can increase responding to CS2.
- Simultaneous excitatory and inhibitory associations. A CS can simultaneously act as excitor and inhibitor of the CR.
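The Pearce-Hall idea can be sketched for a single CS as follows. The sketch makes several assumptions that are not stated above: the surprise term \(|\lambda - \Sigma V|\) is carried over from preceding trials as a running average with weight `gamma` (with `gamma = 1.0` only the immediately preceding trial counts), a novel CS starts with associability 1, and inhibitory learning is omitted. The parameter values are chosen so that \(V\) approaches \(\lambda\) without overshooting.

```python
def pearce_hall(lams, salience=0.1, gamma=0.5, assoc=1.0):
    """Sketch of Pearce-Hall learning for one CS.

    lams: US intensity on each trial (0.0 when the CS is presented alone).
    assoc: associability carried into the next trial; it tracks how
    surprising the US was on recent trials. Values are illustrative.
    """
    V = 0.0
    curve = []
    for lam in lams:
        surprise = abs(lam - V)                  # computed before V is updated
        V += salience * assoc * lam              # excitatory learning only
        assoc = gamma * surprise + (1.0 - gamma) * assoc
        curve.append(V)
    return curve


control = pearce_hall([1.0] * 5)
preexposed = pearce_hall([0.0] * 20 + [1.0] * 5)

print(control[-1], preexposed[-1])
```

During preexposure nothing surprising follows the CS, so its associability decays toward zero and the first CS-US pairings produce less learning than in the control condition, which is the latent inhibition result.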
Wagner (1981) offered a Sometimes Opponent Process (SOP) theory. This approach assumes that a stimulus representation can be in one of three states, A1 (high activation), A2 (low activation), or I (inactive). An excitatory association between a CS and a US increases when their representations are both in the A1 state. After training, presentation of the CS activates a representation of the US in the A2 state. An inhibitory association between a CS and a US increases when the CS representation is in the A1 state and the US representation is in the A2 state, that is, the US is not present but evoked by another CS. A stimulus cannot activate the A1 state while in the A2 state. The SOP theory can explain most of the results addressed by the Rescorla-Wagner model, latent inhibition, and also:
- Backward conditioning. Excitatory conditioning is obtained when the US precedes the CS by a short interval and inhibitory conditioning when the interval is long.
- Interstimulus Interval (ISI) effects. Conditioning is maximal at an optimal ISI and gradually decreases with increasing ISIs.
- Intertrial Interval (ITI) effects. Conditioning to the CS increases with longer ITIs.
- Conditioned diminution or facilitation of the unconditioned response (UR). A reduction in the amplitude of the UR that immediately follows a previously reinforced CS.
- Delay conditioning with different CS durations. Conditioning first increases and then decreases with increasing CS durations when the US is presented at the end of the CS.
- Pretrial CS. Presentation of a CS before CS-US pairings decreases conditioning for short CS-CS intervals and increases conditioning for long CS-CS intervals.
- Pretrial US. Presentation of a US before CS-US pairings decreases conditioning.
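The three-state dynamics of a single SOP stimulus representation can be sketched as below. The sketch tracks the proportion of elements in the I, A1, and A2 states and assumes fixed transition probabilities per time step (I to A1 while the stimulus is present, A1 to A2, and A2 to I). The probabilities and function names are illustrative assumptions, and the associative route by which a CS places US elements directly in A2 is left out.

```python
def sop_node(stimulus, p1=0.6, pd1=0.2, pd2=0.05):
    """Proportions of a node's elements in states I, A1 and A2 over time.

    stimulus: sequence of booleans, True while the stimulus is present.
    p1:  I -> A1 probability while the stimulus is present (assumed value)
    pd1: A1 -> A2 decay probability (assumed value)
    pd2: A2 -> I decay probability (assumed value)
    """
    inactive, a1, a2 = 1.0, 0.0, 0.0
    trace = []
    for present in stimulus:
        to_a1 = (p1 if present else 0.0) * inactive   # only I elements can enter A1
        to_a2 = pd1 * a1
        to_i = pd2 * a2
        inactive += to_i - to_a1
        a1 += to_a1 - to_a2
        a2 += to_a2 - to_i
        trace.append((inactive, a1, a2))
    return trace


def second_peak(gap, duration=5):
    """Peak A1 activation evoked by the second of two presentations."""
    stimulus = [True] * duration + [False] * gap + [True] * duration
    trace = sop_node(stimulus)
    return max(a1 for _, a1, _ in trace[duration + gap:])


short_gap_peak = second_peak(gap=5)
long_gap_peak = second_peak(gap=200)
print(short_gap_peak, long_gap_peak)
```

Because elements in A2 must return to I before they can be activated to A1 again, a stimulus repeated after a short gap evokes less A1 activity than one repeated after a long gap. In SOP this is the kind of mechanism that lets a recent presentation of a stimulus reduce its later processing.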
Dickinson and Burke (1996) proposed a revised version of Wagner’s (1981) SOP theory. Whereas Wagner (1981) suggested that excitatory associations are formed only if the representations of two stimuli are in the A1 state, Dickinson and Burke (1996) postulated that excitatory associations are also formed when they are in the A2 state. This association is weaker, however, than that formed when both stimuli are in the A1 state. In addition, whereas Wagner (1981) suggested that if the CS is represented in the A2 state and the US in the A1 state no learning occurs, Dickinson and Burke (1996) postulated that in this situation an inhibitory association is formed between the CS and the US. The modified SOP model can describe recovery from overshadowing, recovery from blocking, backward blocking, and also:
- Recovery from latent inhibition (LI). Presentation of the US in the context of preexposure and conditioning results in renewed responding to the preexposed CS.
Schmajuk, Lam, and Gray (SLG) (1996; see also Schmajuk and Larrauri, 2006) incorporated an elaborated version of the Rescorla-Wagner rule into a model that also included:
- 1. Temporal representations of the CS, the US, the interstimulus interval (ISI) and intertrial interval (ITI) (see also Sutton and Barto, 1981),
- 2. CS-CS associations (see Schmajuk and Moore, 1988),
- 3. An attentional mechanism that increases attention to the CSs that are present when novelty (defined as the sum of the absolute values of the differences between expected and perceived CSs and USs) is detected in the environment (a real-time, multiple-CS extension of Pearce and Hall, 1980), and
- 4. A feedback loop that combines perceived CSs and CSs predicted by other CSs (Schmajuk and Moore, 1988).
The SLG model describes a large number of experimental results, including acquisition, ISI effects, ITI effects, delay conditioning with different CS durations, partial reinforcement, generalization, super-normal conditioning, overexpectation, extinction of excitatory conditioning, US–Preexposure effect, conditioned inhibition, forward blocking, recovery from forward blocking, overshadowing, recovery from overshadowing, backward blocking, latent inhibition, and recovery from latent inhibition. In addition, the model describes:
- Backward conditioning. Inhibitory conditioning is obtained when the US precedes the CS.
- External disinhibition. Presenting a novel stimulus immediately before a previously extinguished CS might produce renewed responding.
- Spontaneous recovery. Presentation of the CS some time after the subject has stopped responding might yield renewed responding.
- Renewal. Presentation of the CS in a novel context might yield renewed responding.
- Reinstatement. Presentation of the US in the context of extinction and testing might yield renewed responding.
- Rapid or Slow Reacquisition. Depending on the length of the extinction phase, CS-US presentations following extinction might result in faster or slower reacquisition.
- Extinction of conditioned inhibition. Inhibitory conditioning is extinguished by CS2-US presentations, but not by presentations of CS2 alone.
- Second order conditioning. When CS1-US pairings are followed by CS1-CS2 pairings, presentation of CS2 generates a CR.
- Sensory preconditioning. When CS1-CS2 pairings are followed by CS1-US pairings, presentation of CS2 generates a CR.
- Learned irrelevance. Random exposure to the CS and the US retards conditioning even more than combined latent inhibition and US preexposure.
- Unblocking by increasing or decreasing the US. Increasing or decreasing the US in the second phase of forward blocking can increase responding to CS2.
- Recovery from backward blocking. Extinction of the blocker CS1 results in increased responding to the blocked CS2.
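The novelty signal that drives attention in the SLG model, as defined above, can be sketched in a few lines. The `novelty` function follows the verbal definition (sum of absolute differences between perceived and predicted events). The `update_attention` function is a hypothetical stand-in that only captures the direction of the effect and is not the attentional equation of the SLG model; the real model also works in real time with averaged values, which this sketch ignores.

```python
def novelty(perceived, predicted):
    """Sum of absolute differences between perceived and predicted events.

    perceived, predicted: dicts mapping event names (CSs and the US)
    to intensities; a missing event counts as 0.0.
    """
    events = set(perceived) | set(predicted)
    return sum(abs(perceived.get(e, 0.0) - predicted.get(e, 0.0)) for e in events)


def update_attention(z, cs_present, nov, rate=0.1):
    """Hypothetical update: attention to a present CS tracks novelty.

    Attention moves toward the (capped) novelty level while the CS is
    present and is left unchanged while it is absent.
    """
    if not cs_present:
        return z
    return z + rate * (min(nov, 1.0) - z)


# Early in training the US is poorly predicted, so novelty is high.
early = novelty({"CS": 1.0, "US": 1.0}, {"US": 0.2})
# After preexposure the CS is fully expected and no US occurs: novelty is zero.
familiar = novelty({"CS": 1.0}, {"CS": 1.0})

z_early = update_attention(0.2, True, early)
z_familiar = update_attention(0.2, True, familiar)
print(early, familiar, z_early, z_familiar)
```

In this sketch attention to the CS rises when events are unexpected and falls when the environment is fully predicted, which is the pattern the model uses to account for latent inhibition and its recovery.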
Simple and configural associations
Kehoe (1988) offered a layered network model of associative learning in which the CS inputs, using a competitive rule, learn to activate configural hidden units when the US is presented. In turn the hidden units can become associated with the US. In addition to most of the results explained by the Rescorla-Wagner model, the model is able to address rapid reacquisition, as well as:
- Learning to learn. Learning a CS1-US association facilitates the subsequent learning of a CS2–US association.
- Compound conditioning. Reinforced CS1-CS2 presentations result in stronger responding to the compound than to the components.
- Positive Patterning. Reinforced CS1-CS2 presentations intermixed with nonreinforced CS1 and CS2 presentations result in stronger responding to CS1-CS2 than to the sum of the individual responses to CS1 and CS2.
- Negative Patterning. Nonreinforced CS1-CS2 presentations intermixed with reinforced CS1 and CS2 presentations result in weaker responding to CS1-CS2 than to the sum of the individual responses to CS1 and CS2.
- Simultaneous Feature-positive Discrimination. Reinforced simultaneous CS1-CS2 presentations, alternated with nonreinforced presentations of CS2, result in stronger responding to CS1-CS2 than to CS2 alone. In this case, CS1 gains a strong excitatory association with the US.
- Simultaneous Feature-negative Discrimination. Non-reinforced simultaneous CS1-CS2 presentations, alternated with reinforced presentations of CS2, result in weaker responding to CS1-CS2 than to CS2 alone. In this case, CS1 gains a strong inhibitory association with the US.
Schmajuk and DiCarlo (SD) (1992; Schmajuk, Lamoureux, and Holland, 1998) incorporated a “generalized” version of the Rescorla-Wagner (1972) rule into a model that also included:
- temporal representations of the CS, the US, the ISI and ITI,
- direct CS-US associations, and
- indirect CS-US associations through configural stimuli.
Configural stimuli are created by combining the internal representations of simple CSs. Configural stimuli are maximally active when some specific CSs are present and others are absent. Configural stimuli are needed to solve patterning and feature discriminations. In addition to the results explained by the Rescorla-Wagner model, the SD model also describes:
- ISI and ITI effects. See above.
- CR is determined by both the US and the CS. The nature of the CR is determined not only by the US but also by the CS.
- Serial Feature-positive Discrimination. Reinforced successive CS1-CS2 presentations, alternated with nonreinforced presentations of CS2, result in stronger responding to CS1-CS2 than to CS2 alone. In this case, CS1 acts as an occasion setter.
- Serial Feature-negative Discrimination. Non-reinforced successive CS1-CS2 presentations, alternated with reinforced presentations of CS2, result in weaker responding to CS1-CS2 than to CS2 alone. In this case, CS1 acts as an occasion setter.
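Why configural stimuli are needed can be illustrated with negative patterning. The sketch below trains a summed-error rule on CS1+, CS2+, and CS1-CS2- trials, once with the simple CSs only and once with an added configural unit that is active only when both CSs are present. The configural unit is a hand-coded stand-in for the hidden units of the layered models; the names and learning rate are assumptions.

```python
def active_units(cues, configural):
    """Simple CSs, plus one hypothetical configural unit for compounds."""
    units = set(cues)
    if configural and len(cues) > 1:
        units.add("+".join(sorted(cues)))     # e.g. "CS1+CS2"
    return units


def train(trials, alpha=0.1, configural=False):
    """Summed-error (Rescorla-Wagner style) learning over the active units."""
    V = {}
    for cues, lam in trials:
        units = active_units(cues, configural)
        error = lam - sum(V.get(u, 0.0) for u in units)
        for u in units:
            V[u] = V.get(u, 0.0) + alpha * error
    return V


def response(V, cues, configural=False):
    return sum(V.get(u, 0.0) for u in active_units(cues, configural))


negative_patterning = [({"CS1"}, 1.0), ({"CS2"}, 1.0), ({"CS1", "CS2"}, 0.0)] * 500

elemental = train(negative_patterning)
with_config = train(negative_patterning, configural=True)

print(response(elemental, {"CS1"}), response(elemental, {"CS1", "CS2"}))
print(response(with_config, {"CS1"}, True), response(with_config, {"CS1", "CS2"}, True))
```

With simple CSs alone, the compound always evokes more responding than either component, which is the opposite of the trained discrimination. With the configural unit, the components become excitatory and the configural unit becomes inhibitory, so responding to the compound falls toward zero.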
Gluck and Myers (1993) also introduced models able to explain some of the above results by incorporating configural stimuli.
Multiple representations of the CS
Grossberg and Schmajuk (GS) (1989) presented a model that assumes that a CS generates multiple temporal representations. The model can describe properties not described by any of the above models:
- Timing of the peak CR. The CR peaks at the time of the US presentation during training (equivalent to responding at the ISI).
- Training with multiple USs. A CS trained with a US presented at different ISIs will show peaks centered at those ISIs.
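The idea that multiple temporal representations of a CS allow the CR to be timed can be sketched as follows. The sketch replaces the temporal representations of the GS model with a bank of Gaussian bumps that peak at different times after CS onset, which is a simplifying assumption and not the model's equations. Only the representations active at the time of the US gain associative strength, and the CR is their weighted sum.

```python
import math


def representations(t, centers, width=3.0):
    """Activity at time t of each temporal representation of the CS.

    Gaussian bumps are an assumed stand-in for the spectrum of temporal
    representations; centers and width are illustrative.
    """
    return [math.exp(-((t - c) ** 2) / (2.0 * width ** 2)) for c in centers]


def train_timing(isis, centers, trials=20, rate=0.05):
    """Delta-rule learning applied at the time(s) the US is presented."""
    w = [0.0] * len(centers)
    for _ in range(trials):
        for isi in isis:
            x = representations(isi, centers)
            error = 1.0 - sum(wi * xi for wi, xi in zip(w, x))
            w = [wi + rate * error * xi for wi, xi in zip(w, x)]
    return w


def cr(t, w, centers):
    """CR amplitude at time t after CS onset."""
    return sum(wi * xi for wi, xi in zip(w, representations(t, centers)))


centers = list(range(0, 41))

# One US at ISI = 20 time steps.
w_single = train_timing([20], centers)
peak_time = max(range(0, 41), key=lambda t: cr(t, w_single, centers))

# US presented at two different ISIs.
w_double = train_timing([10, 30], centers)
print(peak_time, cr(10, w_double, centers), cr(20, w_double, centers), cr(30, w_double, centers))
```

In this sketch the CR trained with a single ISI is largest at the time of the US, and training with two ISIs yields two peaks separated by a trough.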
Buhusi and Schmajuk (1996) combined the mechanisms of the SLG and the SD models into a model that explains all the results previously addressed by each model. Also, Buhusi and Schmajuk (1999) combined the SD and the GS models to explain:
- Timing of the peak CR. See above.
- Temporal specificity of the competition between CSs in blocking. Blocking is observed when the blocked CS is paired in the same temporal relationship with the US as the blocking CS.
- Temporal specificity in serial FP discriminations. A serial feature-positive discrimination is best when the feature-target interval during testing matches the training interval.
Figure 1 shows the block diagram of a model that can describe many of the properties of classical conditioning and that incorporates the following mechanisms:
- Competition between CSs to form associations with the US
- Competition between CSs to form associations with other CSs
- A novelty-controlled attentional mechanism
- A feedback loop that combines externally perceived and internally generated images of CSs
- A mechanism to generate stimulus configurations
- Multiple representations of a CS.
The multiple mechanisms that participate in classical conditioning are possibly found in different regions of the brain. Clues to the location of these mechanisms are offered by data showing that, for instance, association cortex participates in sensory preconditioning (Thompson and Kramer, 1965), a midbrain/brain-stem circuit in conditioned inhibition (Mis, 1977), the nucleus accumbens in latent inhibition (Solomon and Staton, 1982), cerebellar areas in eyeblink conditioning (Lincoln, McCormick, and Thompson, 1982; Desmond and Moore, 1982), the amygdala in fear conditioning (Hitchcock and Davis, 1986), the hippocampus (Solomon et al., 1986) and medial prefrontal cortex (Kronforst-Collins and Disterhoft, 1998) in trace conditioning, the hippocampus in configural discriminations (Rudy and Sutherland, 1989), and the parabrachial nucleus in conditioned taste aversion (Reilly, Grigson, and Norgren, 1993).
References
- Buhusi, C.V., and Schmajuk, N.A. (1996). Attention, configuration, and hippocampal function. Hippocampus, 6, 621-642.
- Buhusi, C.V., and Schmajuk, N.A. (1999). Timing in simple conditioning and occasion setting: A neural network approach. Behavioral Processes, 45, 33-57.
- Desmond, J.E., and Moore, J.W. (1982). A brain stem region essential for the classically conditioned but not unconditioned nictitating membrane response. Physiology & Behavior, 28, 1029-1033.
- Dickinson, A. and Burke, J. (1996). Within-compound associations mediate the retrospective revaluation of causality judgments. Quarterly Journal of Experimental Psychology, 49B, 60-80.
- Gluck, M.A., and Myers, C.E. (1993). Hippocampal mediation of stimulus representation: A computational theory. Hippocampus, 3, 491-516.
- Grossberg, S. (1975). A neural model of attention, reinforcement, and discrimination learning. International Review of Neurobiology, 18, 263-327.
- Grossberg, S., and Schmajuk, N.A. (1989). Neural dynamics of adaptive timing and temporal discrimination during associative learning. Neural Networks, 2, 79-102.
- Hitchcock, J., and Davis, M. (1986). Lesions of the amygdala, but not of the cerebellum or red nucleus, block conditioned fear as measured with the potentiated startle paradigm. Behavioral Neuroscience, 100, 11-22.
- Kehoe, E.J. (1988). A layered network model of associative learning: Learning to learn and configuration. Psychological Review, 95, 411-433.
- Kronforst-Collins, M.A., and Disterhoft, J.F. (1998). Lesions of the caudal area of rabbit medial prefrontal cortex impair trace eyeblink conditioning. Neurobiology of Learning and Memory, 69, 147-162.
- Lincoln, J.S., McCormick, D.A., and Thompson, R.F. (1982). Ipsilateral cerebellar lesions prevent learning of the classically conditioned nictitating membrane/eyelid response. Brain Research, 242, 190-193.
- Mackintosh, N.J. (1975). A theory of attention: Variations in the associability of stimuli with reinforcement. Psychological Review, 82, 276-298.
- Mis, F.W. (1977). A midbrain-brain stem circuit for conditioned inhibition of the nictitating membrane response in the rabbit (Oryctolagus cuniculus). Journal of Comparative and Physiological Psychology, 91, 975-988.
- Pearce, J.M., and Hall, G. (1980). A model for Pavlovian learning: Variations in the effectiveness of conditioned but not of unconditioned stimuli. Psychological Review, 87, 532-552.
- Reilly, S., Grigson, P.S., and Norgren, R. (1993). Parabrachial nucleus lesions and conditioned taste aversion: Evidence supporting an associative deficit. Behavioral Neuroscience, 107, 1005-1017.
- Rescorla, R.A., and Wagner, A.R. (1972). A theory of Pavlovian conditioning: Variation in the effectiveness of reinforcement and nonreinforcement. In A.H. Black and W.F. Prokasy (Eds.), Classical Conditioning II: Theory and Research. New York: Appleton-Century-Crofts.
- Rudy, J.W., and Sutherland, R.J. (1989). The hippocampal formation is necessary for rats to learn and remember configural discriminations. Behavioural Brain Research, 34, 97-109.
- Schmajuk, N.A. and DiCarlo, J.J. (1991). A neural network approach to hippocampal function in classical conditioning. Behavioral Neuroscience, 105, 82-110.
- Schmajuk, N.A. and DiCarlo, J.J. (1992). Stimulus configuration, classical conditioning, and the hippocampus. Psychological Review, 99, 268-305.
- Schmajuk, N.A., Lam, Y.W., and Gray, J.A. (1996). Latent inhibition: A neural network approach. Journal of Experimental Psychology: Animal Behavior Processes, 22, 321-349.
- Schmajuk, N.A., and Larrauri, J.A. (2006). Experimental Challenges to Theories of Classical Conditioning: Application of an Attentional Model of Storage and Retrieval. Journal of Experimental Psychology: Animal Behavior Processes, 32, 1–20.
- Schmajuk, N.A., Lamoureux, J., and Holland, P.C. (1998). Occasion setting and stimulus configuration: A neural network approach. Psychological Review, 105, 3-32.
- Schmajuk, N.A., and Moore, J. (1988). The hippocampus and the classically conditioned nictitating membrane response: A real-time attentional-associative model. Psychobiology, 16, 20-35.
- Solomon, P.R., and Staton, D.M. (1982). Differential effects of microinjections of d-amphetamine into the nucleus accumbens or the caudate putamen on the rat's ability to ignore an irrelevant stimulus. Biological Psychiatry, 17, 743-756.
- Solomon, P.R., Vander Schaaf, E.R., Thompson, R.F., and Weisz, D.J. (1986). Hippocampus and trace conditioning of the rabbit's classically conditioned nictitating membrane response. Behavioral Neuroscience, 100, 729-744.
- Sutton, R.S., and Barto, A.G. (1981). Toward a modern theory of adaptive networks. Psychological Review, 88, 135-170.
- Wagner, A.R. (1978). Expectancies and the priming of STM. In S.H. Hulse, H. Fowler, and W.K. Honig (Eds.), Cognitive Processes in Animal Behavior (pp. 177-209). Hillsdale, NJ: Lawrence Erlbaum.
- Thompson, R.F., and Kramer, R.F. (1965). Role of association cortex in sensory preconditioning. Journal of Comparative and Physiological Psychology, 60, 186-191.
- Van Hamme, L. and Wasserman, E. (1994). Cue competition in causality judgments: The role of nonpresentation of compound stimulus elements. Learning and Motivation, 25, 127-151.
- Wagner, A. (1981). SOP: A model of automatic memory processing in animal behavior. In N.E. Spear and R.R. Miller (Eds.), Information processing in animals: Memory mechanisms (pp. 5-47). Hillsdale, NJ: Erlbaum.