Reward signals

From Scholarpedia
Wolfram Schultz (2007), Scholarpedia, 2(6):2184. doi:10.4249/scholarpedia.2184, revision #145291

Curator: Wolfram Schultz

Reward information is processed by specific neurons in specific brain structures. Reward neurons produce internal reward signals and use them for influencing brain activity that controls our actions, decisions and choices.

A prime goal in the investigation of neural processes of reward is to identify an explicit neuronal reward signal, just as retinal responses to visual stimuli constitute starting points for investigating the neuronal processes underlying visual perception. The search for a "retina of the reward system" has located brain signals related purely to reward value irrespective of sensory and motor attributes in midbrain dopamine neurons and in select neurons of orbitofrontal cortex, dorsal and ventral striatum, and possibly amygdala. Reward signals influence neural processes in cortical and subcortical structures underlying behavioral actions and thereby contribute to economic choices.

Figure 1: Differential response of single dopamine neuron to reward-predicting and other stimuli (from Tobler et al. 2005).

Pure Reward Signals in Dopamine Neurons

Midbrain dopamine neurons show phasic excitatory responses (activations) following primary food and liquid rewards and following visual, auditory and somatosensory reward-predicting stimuli. As in sensory systems, the reward-related activation can be preceded by a brief detection component that occurs before the stimulus has been identified and properly valued. The reward-related activations occur in 65-80% of dopamine neurons in cell groups A9 (pars compacta of substantia nigra), A10 (ventral tegmental area, VTA) and A8 (dorsolateral substantia nigra). The activations have latencies of <100 ms and durations of <200 ms. The same neurons are briefly depressed in their activity by reward omission and by stimuli predicting the absence of reward; they are not affected by known neutral stimuli unless these have substantial intensity (Figure 1). These characteristics of the phasic dopamine responses are compatible with the notion of a teaching signal in reinforcement learning theories, as described further below. Dopamine neurons in groups A8-A10 project their axons to the dorsal and ventral striatum, dorsolateral and orbital prefrontal cortex and some other cortical and subcortical structures. The subsecond dopamine reward response may be responsible for the reward-induced dopamine release seen with voltammetry (Roitman et al. 2004) but would not easily explain the 300-9,000 times slower dopamine fluctuations with rewards and punishers seen in microdialysis (Datla et al. 2002, Young 2004).

Figure 2: Reward prediction error response of single dopamine neuron (from Schultz et al. 1997).

Reward prediction error

The dopamine reward response appears to code the discrepancy between the reward and its prediction (‘prediction error’): an unpredicted reward elicits an activation (positive prediction error), a fully predicted reward elicits no response, and the omission of a predicted reward induces a depression (negative error; Figure 2).

The hypothesis that dopamine neurons report reward prediction errors can be tested formally with paradigms from animal learning theory based on the Rescorla-Wagner learning rule. In the blocking paradigm (Figure 3a), a stimulus is not learned when it is paired with an already fully predicted reward, indicating the importance of prediction errors for learning: despite the pairings, the blocked stimulus does not come to predict reward. Accordingly, the absence of reward following the blocked stimulus elicits no prediction error and no response in dopamine neurons, whereas the delivery of reward produces a positive prediction error and an error response (Figure 3a left). By contrast, after a well trained reward-predicting stimulus, reward omission produces a depressant response, and reward delivery produces no response in the same dopamine neuron (Figure 3a right).
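The arithmetic of blocking follows directly from error-driven learning. The sketch below (illustrative Python with an assumed learning rate and reward magnitude, not code from the article) applies the Rescorla-Wagner rule to a compound stimulus: because the error is computed on the summed prediction, a stimulus added to an already fully predicted reward acquires no associative value.

```python
# Minimal Rescorla-Wagner sketch of blocking; alpha and lam are assumed,
# illustrative parameters.
alpha, lam = 0.3, 1.0          # learning rate, reward magnitude
V = {"A": 0.0, "B": 0.0}       # associative strengths of stimuli A and B

for _ in range(50):            # phase 1: A alone -> reward
    pe = lam - V["A"]          # prediction error
    V["A"] += alpha * pe

for _ in range(50):            # phase 2: compound AB -> same reward
    pe = lam - (V["A"] + V["B"])   # error on the summed prediction
    V["A"] += alpha * pe
    V["B"] += alpha * pe           # B is blocked: pe is already ~0

print(f"V(A) = {V['A']:.2f}, V(B) = {V['B']:.2f}")   # ~1.00 and ~0.00
```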

Figure 3: Coding of reward prediction errors by dopamine neurons in formal tests developed by animal learning theory. (a) Blocking paradigm. Left two panels: the blocked stimulus does not become a reward predictor, due to the absence of a prediction error during reward pairings. Consequently the absence of reward after the blocked stimulus produces no negative prediction error and, accordingly, no response in the dopamine neuron (histograms and rasters; top). By contrast, delivery of reward elicits a positive reward prediction error and a dopamine activation (bottom). Right two panels: control with the same neuron. A different novel stimulus is shown together with a known neutral stimulus without reward prediction, but now a reward follows, a prediction error occurs, and the novel stimulus becomes a valid reward predictor. Absence of reward following this stimulus produces a negative prediction error and, accordingly, a depressant dopamine response (top), whereas reward delivery produces neither a prediction error nor a dopamine response (bottom). From Waelti et al. (2001). (b) Conditioned inhibition paradigm. Lack of response to the absence of reward following the test stimulus predicting no reward (top), even when the stimulus is paired with an otherwise reward-predicting stimulus (R; middle, summation test), but strong activation to reward following the test stimulus predicting no reward (bottom). These responses contrast with those to the neutral control stimulus (right). From Tobler et al. (2003).

In the conditioned inhibition paradigm (Figure 3b), a test stimulus is presented simultaneously with an established reward-predicting stimulus but no reward follows the compound, making the test stimulus a conditioned inhibitor that predicts the absence of reward. Reward omission after a conditioned inhibitor does not produce a prediction error response in dopamine neurons, even when the established reward-predicting stimulus is added (Figure 3b left). By contrast, the occurrence of reward after the inhibitor produces an enhanced prediction error response, as the prediction error represents the difference between the actual reward and the negative prediction of the inhibitor (Figure 3b left bottom). Following a neutral control stimulus, there is no depression when no reward occurs, the usual depression occurs with reward omission when another, otherwise reward-predicting stimulus is added, and the usual activation occurs with surprising reward (Figure 3b right). Taken together, the data from these paradigms suggest that dopamine neurons show bidirectional coding of reward prediction errors, following the equation

\[\textrm{Dopamine response} = \textrm{Reward occurred} - \textrm{Reward predicted}.\]
Figure 4: Graded response of dopamine neurons to reward prediction errors. (a) Response to unpredicted liquid reward, graded according to reward magnitude (ml of liquid). The unpredicted occurrence of reward constitutes a positive prediction error and leads to activation of dopamine neurons (median normalized response of 55 neurons). (b) Response to reward occurring in a probabilistic schedule. Five different stimuli predict reward at 5 different probabilities (p=0-0.25-0.5-0.75-1), and the occurrence of reward in rewarded trials constitutes a positive prediction error which decreases as probability increases. (c) Response to reward omission in the same test as in (b). The negative prediction error with reward omission increases with increasing reward probability. Adequate quantification requires a post-reward time window that is common to all probabilities to take the pause duration into account. From Fiorillo et al. (2003).

Thus the dopamine response seems to convey the crucial learning term \((\lambda-V)\) of the Rescorla-Wagner learning rule and complies with the principal characteristics of teaching signals of efficient reinforcement models (Sutton & Barto 1998).
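To make the equation concrete, here is a minimal sketch (assumed learning rate and trial schedule, not the authors' code) of a single prediction V updated by the Rescorla-Wagner rule, with the dopamine-like response modeled as the prediction error \((\lambda-V)\):

```python
# Rescorla-Wagner acquisition and omission; parameters are illustrative.
alpha, lam = 0.2, 1.0   # learning rate, reward magnitude
V = 0.0                 # current reward prediction

for trial in range(1, 31):
    reward = lam if trial <= 20 else 0.0      # reward omitted after trial 20
    pe = reward - V                           # dopamine-like response
    V += alpha * pe
    if trial in (1, 10, 20, 21, 30):
        print(f"trial {trial:2d}: PE = {pe:+.3f}, V = {V:.3f}")

# Early trials: PE > 0 (activation to unpredicted reward).
# Late rewarded trials: PE ~ 0 (no response to a fully predicted reward).
# Omission trials: PE < 0 (depression at the habitual reward time).
```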

The response to unpredicted primary reward varies in a monotonic positive fashion with reward magnitude (Figure 4a). The positive and negative reward prediction error responses are also graded, such that a partial prediction error induces a smaller error response. Prediction errors covary with reward probability (Figure 4b, c) and reflect the discrepancy between the experienced and predicted reward or, more precisely, the difference between the mean of the probability distribution of received reward magnitudes and the expected value of the predicted distribution (Fiorillo et al. 2003, Satoh et al. 2003, Morris et al. 2004, Nakahara et al. 2004, Bayer & Glimcher 2005, Pan et al. 2005).

Figure 5: Adaptation of the reward prediction error response to the currently used input distribution. (a) Activity of a single dopamine neuron showing nearly identical responses to three liquid volumes spanning a 10-fold range (right). Each of three pseudorandomly alternating visual stimuli (shown at left) is followed by one of two liquid volumes at p=0.5 (top, 0.0 or 0.05 ml; middle, 0.0 or 0.15 ml; bottom, 0.0 or 0.5 ml). Thus the occurrence of a reward produces a positive prediction error, whereas the absence of reward constitutes a negative error relative to the intermediate prediction specific for each stimulus. (Left: responses to visual stimuli increase with their associated expected reward values.) Only rewarded trials are shown. (b) Indistinguishable population responses to different positive reward prediction errors from the experiment in (a) (57 neurons). From Tobler et al. (2005).

The reward prediction error response appears to normalize to the standard deviation of the prediction error, provided that appropriate advance information is available. When three visual stimuli predict different binary distributions of equiprobable reward magnitudes, the larger magnitude always elicits the same positive prediction error-related activation, even with a 10-fold difference in prediction error (Figure 5a, b), although the same neurons are sensitive to unpredicted magnitudes (Figure 4a). As a result of this gain adaptation, the neural response discriminates between the two likely outcomes equally well, regardless of their absolute magnitude difference.
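A back-of-the-envelope version of this gain adaptation (our illustration, using the liquid volumes of Figure 5 and assuming division of the error by the SD of the predicted distribution) shows why a 10-fold range of absolute errors can yield identical responses:

```python
import numpy as np

# Each stimulus predicts 0.0 ml or the stated volume at p = 0.5 (Figure 5).
for high in (0.05, 0.15, 0.5):
    outcomes = np.array([0.0, high])
    expected = outcomes.mean()          # expected value of the prediction
    sd = outcomes.std()                 # SD of the binary distribution
    raw_pe = high - expected            # error when the larger reward occurs
    print(f"{high:4.2f} ml: raw PE = {raw_pe:.3f}, PE/SD = {raw_pe/sd:.1f}")

# Raw errors differ 10-fold across stimuli, but PE/SD equals 1.0 in every
# condition, matching the indistinguishable responses in Figure 5b.
```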

Figure 6: Temporal sensitivity of the prediction error response of a dopamine neuron. From top to bottom: delaying the reward by 0.5 s leads to considerable depression at the habitual time of the reward and activation at the new time. An earlier reward leads to activation at the new time but not to major depression at the habitual time. The habitual time of reward is 1.0 s after touch of an operant key and simultaneous offset of the conditioned stimulus (CS). From Hollerman & Schultz (1998).

The prediction error response is sensitive to both the occurrence and the time of the reward: a delayed reward induces a depression at the original time of the reward and an activation at its new time (Figure 6).
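The same behavior falls out of a temporal difference (TD) model in which each moment of the trial is a state. The toy sketch below (assumed parameters, not a fit to the data) trains reward delivery at step 10 and then probes a delayed reward at step 12; the TD error reproduces the depression at the habitual time and the activation at the new time:

```python
import numpy as np

T, alpha, gamma = 15, 0.2, 1.0
V = np.zeros(T + 1)          # state values over trial time

def run_trial(reward_step, learn=True):
    deltas = np.zeros(T)
    for t in range(T):
        r = 1.0 if t == reward_step else 0.0
        deltas[t] = r + gamma * V[t + 1] - V[t]   # TD error at time t
        if learn:
            V[t] += alpha * deltas[t]
    return deltas

for _ in range(200):                             # train: reward at step 10
    run_trial(reward_step=10)
probe = run_trial(reward_step=12, learn=False)   # probe: reward delayed

print(f"TD error at habitual time (t=10): {probe[10]:+.2f}")  # ~ -1 (depression)
print(f"TD error at new time (t=12):      {probe[12]:+.2f}")  # ~ +1 (activation)
```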

Figure 7: Learning viewed as change of action following change of prediction. The error signal would be transmitted by dopamine neurons and influence synaptic processes in dopamine target structures such as striatum and orbitofrontal cortex.

Neuronal computations using prediction errors may contribute to the self-organization of behavior (Figure 7). Brain mechanisms establish predictions, compare current inputs with predictions derived from previous experience, and emit a prediction error signal when a mismatch is detected. The error signal may act as an impulse for synaptic modifications that lead to subsequent changes in predictions and behavioral reactions. The process is reiterated until behavioral outcomes match the predictions and the prediction error becomes nil. In the absence of a prediction error, there is no signal for modifying synapses, and synaptic transmission remains unchanged and stable.

Reward-predicting stimuli

Figure 8: Response of dopamine neurons to reward-predicting stimuli, reflecting the expected reward value. Different conditioned stimuli shown at top predict reward at different probability-magnitude combinations. Numbers above histograms indicate expected reward liquid volume. Histograms and inset show averaged population activity from 57 (animal A for histogram and inset) and 53 (animal B for inset) dopamine neurons. From Tobler et al. (2005).

Dopamine neurons acquire responses to reward-predicting visual and auditory conditioned stimuli (CS). The responses covary with the expected value of reward, irrespective of spatial position, sensory stimulus attributes and arm, mouth and eye movements (Figure 8). The responses are modulated by the motivation of the animal, the time course of predictions and the animal’s choice among rewards (Satoh et al. 2003, Nakahara et al. 2004, Morris et al. 2006). Although they discriminate between reward-predicting CSs and neutral stimuli, dopamine activations show a non-negligible propensity for generalization (Waelti et al. 2001).

During the course of learning, the dopamine response to the reward decreases gradually, and a response to the immediately preceding CS develops in parallel. These gradual, opposite changes in US and CS responses do not involve backpropagating waves of prediction error (Pan et al. 2005), as assumed in earlier reinforcement models (Montague et al. 1996, Schultz et al. 1997), and are modelled in a biologically plausible manner as teaching signals for behavioral tasks, including Pavlovian conditioning, spatial delayed responding and sequential movements (Suri & Schultz 1999; Izhikevich 2007). The changes are compatible with Pavlovian response transfer and basic principles of temporal difference (TD) learning and suggest eligibility traces as an essential feature of reward learning.

Activations do not occur when the CS itself is predicted within a few seconds by another, well trained stimulus. This observation conforms to basic assumptions of TD models. As it is often difficult to determine whether rewards are 'primary' or conditioned (Wise 2002), TD models do not make this distinction and assume that CSs can act as reinforcers and elicit prediction errors just as rewards do (Sutton & Barto 1998). Accordingly, a dopamine CS response would reflect an error in the prediction of this CS (Suri & Schultz 1999).
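How eligibility traces produce this transfer without a backpropagating wave can be sketched with tabular TD(\(\lambda\)) (a toy setup with assumed parameters, not a model from the article). States span the interval from CS onset (t=0) to reward; because the CS itself arrives unpredictably, the model's CS response is taken as the learned value at CS onset:

```python
import numpy as np

T, alpha, gamma, lam = 10, 0.1, 1.0, 0.9
V = np.zeros(T + 1)                  # values from CS onset (t=0) to reward

for trial in range(1, 101):
    e = np.zeros(T + 1)              # eligibility traces
    deltas = np.zeros(T)
    for t in range(T):
        r = 1.0 if t == T - 1 else 0.0        # reward at the last step
        deltas[t] = r + gamma * V[t + 1] - V[t]
        e[t] = 1.0                            # mark the visited state
        V += alpha * deltas[t] * e            # update all eligible states
        e *= gamma * lam                      # decay the traces
    if trial in (1, 5, 20, 100):
        print(f"trial {trial:3d}: CS response V[0] = {V[0]:.2f}, "
              f"mid-trial PE = {deltas[T // 2]:+.2f}, "
              f"reward PE = {deltas[T - 1]:+.2f}")

# With lam near 1, value accrues at the CS within a few trials while the
# mid-trial error stays near zero; with lam = 0 the error instead crawls
# backward one time step per trial (the wave not seen in the recordings).
```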

Physically intense, unrewarded stimuli induce a short, initial activation in dopamine neurons (Fiorillo et al. 2013b) that is enhanced by stimulus novelty (Schultz & Romo 1987, Horvitz et al. 1997, Ljungberg et al. 1992), by generalisation to physically similar rewarded stimuli (Mirenowicz & Schultz 1996, Waelti et al. 2001, Tobler et al. 2003) and by reward context (Kobayashi & Schultz 2014). This initial component reflects the detection of the stimulus before identification of its properties and reward value (Nomoto et al. 2010, Fiorillo et al. 2013b). Its intensity is graded across the ventral midbrain without clear category boundaries (Fiorillo et al. 2013a). The initial component sometimes occurs also with aversive stimuli, such as air puffs, aversive liquids or footshocks (Mirenowicz & Schultz 1996, Brischoux et al. 2009, Matsumoto & Hikosaka 2009), but careful controls reveal relationships to physical rather than aversive stimulus properties (Fiorillo et al. 2013b). Thus the activations of some dopamine neurons by noxious stimuli seem to reflect physical rather than aversive stimulus properties; the more common dopamine response to aversive stimuli is a depression of activity. Overall, the dopamine activation consists of an early component reflecting stimulus detection and a subsequent component coding reward prediction error.

Risk Signal in Dopamine Neurons

Figure 9: Separate coding of reward value and risk by dopamine neurons. Five different conditioned stimuli predict all-or-none reward at different probabilities. Center: averaged neuronal population responses; the initial, phasic response to the conditioned stimulus (CS) increases monotonically with the probability of the reward predicted by the CS (increasing from top to bottom), whereas the more sustained response between CS and reward encodes risk, peaking at p=0.5. Left: the nearly monotonic increase in the population responses for several stimulus sets ('Data') may encode expected value or utility (below). Right: sustained population response (top) covarying with entropy and variance (and standard deviation) (bottom; entropy scale in bits, variance scale normalized to maximum). Same experiment as in Figure 4. From Fiorillo et al. (2003).

In most natural situations, rewards occur with some degree of uncertainty. Reward uncertainty can be tested as risk by using different, well-trained probabilities for the all-or-none delivery of reward; this design separates expected reward value (increasing linearly from p=0 to p=1) from risk expressed as the variance, standard deviation (SD) or entropy of the probability distribution of magnitudes (an inverted-U function peaking at p=0.5). More than one third of dopamine neurons show a relatively slow, sustained and moderate activation between the reward-predicting stimulus and the reward which covaries with the degree of risk (Figure 9). This activation occurs in individual trials and does not propagate from the reward back to the conditioned stimulus during learning, as assumed by some implementations of temporal difference reinforcement models (Schultz et al. 1997). The risk-related, more sustained activation (Figure 9 right) contrasts with the more phasic response to reward-predicting stimuli covarying with expected value (left), and the two responses are uncorrelated in strength in individual neurons. When the variance (and SD) of the magnitudes of two equiprobable rewards is varied while entropy is kept constant at 1 bit, the sustained activation increases monotonically with variance (or SD). Thus variance (or SD) is an effective measure of risk for dopamine neurons.
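The dissociation between value and risk in this design is easy to verify numerically. A short sketch (unit reward magnitude assumed) computes the quantities named above for the five probabilities used in the experiment; expected value rises linearly with p while variance, SD and entropy all peak at p=0.5:

```python
import math

def entropy_bits(p):
    # Shannon entropy of a Bernoulli(p) outcome, with 0*log(0) := 0
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    ev = p                       # expected value with unit reward
    var = p * (1 - p)            # variance of the all-or-none outcome
    print(f"p={p:4.2f}: EV={ev:.2f}  var={var:.3f}  "
          f"SD={math.sqrt(var):.3f}  entropy={entropy_bits(p):.2f} bits")

# A mean-variance utility (see below) would combine the two dissociated
# quantities, e.g. U = EV - b * var for some risk attitude parameter b.
```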

The distinct neural coding of reward value and uncertainty is consistent with the separation of expected utility into these two components suggested by the mean-variance approach of financial decision theory (Huang & Litzenberger 1988) and found in human brain imaging (Preuschoff et al. 2006; Tobler et al. 2007). These activations do not rule out that other brain structures may code utility as a single (scalar) variable, as proposed by classic Expected Utility Theory (von Neumann & Morgenstern 1944).

Pure Reward Signals in other brain areas

Figure 10: Pure reward prediction signals in orbitofrontal cortex. (a) Despite spatial variations in the position of the movement target, the response of this orbitofrontal neuron discriminates only on the basis of the different liquid rewards. (b) Despite variations of the visual properties of the reward-predicting stimuli, the response of this orbitofrontal neuron discriminates only on the basis of the different liquid rewards. Inset shows anatomic position of orbitofrontal cortex. From Tremblay & Schultz (1999).

Orbitofrontal cortex

Neuronal activity in orbitofrontal cortex is substantially influenced by rewards. The neurons show activations following reward-predicting stimuli, during the expectation of reward and after reward reception.

Orbitofrontal responses to rewards and reward-predicting stimuli are related to the motivational value rather than to the sensory properties of reward objects, as satiation with specific rewards reduces the neuronal responses (Critchley & Rolls 1996). They constitute pure reward signals in that they reflect only the reward and not spatial or visual object features (Figure 10). Orbitofrontal reward signals distinguish between reward and punishment (Thorpe et al. 1983), change with the reversal of stimulus-reward associations (Rolls et al. 1996), discriminate between different volumes of liquid reward, and encode the economic value of rewards for decision-making irrespective of the actual reward objects (Padoa-Schioppa & Assad 2006). Different neurons in this structure show more sustained activations preceding the expected delivery of liquid or food reward (Schoenbaum et al. 1998, Tremblay & Schultz 1999, Hikosaka & Watanabe 2000). Besides these pure reward-related responses, a few other orbitofrontal neurons respond to visual object properties or are activated in relation to movements.

Figure 11: Adaptive coding of an orbitofrontal neuron depending on available rewards. This neuron shows higher expectation activity for the more preferred of two rewards, tested in imperative trials with two randomly alternating rewards predicted by specific pictures. Only one of the two rewards is available in a given trial. Top: higher activity for a more preferred piece of apple compared to cereal. Bottom: removing cereal and adding raisin to the two available rewards in the current trial block shifts the larger response from apple to raisin, although the apple pieces and their predictive stimuli are unchanged. From Tremblay & Schultz (1999).

Orbitofrontal neurons do not appear to be specialized for particular reward objects but seem to discriminate between different rewards depending on their current availability (Figure 11). A reward that is effective in activating an orbitofrontal neuron (apple in Figure 11, top) may lose its efficacy when the reward distribution changes and the initially effective reward loses its highest preference (bottom). By encoding economic value rather than specific reward objects, these responses appear to adapt to the current probability distribution of reward values; a change in this distribution changes the neuronal responses. This apparent dependence of responsiveness on a set point corresponds to a basic tenet of Prospect Theory, which holds that outcomes are valued relative to movable reference points rather than by absolute physical characteristics (Kahneman & Tversky 1984).
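One way to picture such reference-dependent coding (our illustration, with assumed preference ranks rather than measured ones) is a response that tracks a reward's relative rank within the currently available set, not its identity:

```python
# Hypothetical preference ranks for the rewards in Figure 11 (assumed).
rank = {"raisin": 3, "apple": 2, "cereal": 1}

def ofc_like_response(reward, available):
    # Response scales with the reward's relative position in the current set.
    ranks = sorted(rank[r] for r in available)
    return ranks.index(rank[reward]) / (len(ranks) - 1)

print(ofc_like_response("apple", ["apple", "cereal"]))  # 1.0: best on offer
print(ofc_like_response("apple", ["apple", "raisin"]))  # 0.0: same reward, now worst
```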


Figure 12: Monotonic relationships to reward magnitude in separate populations of slowly firing striatal neurons. Included are neurons showing increased impulse activity following reward and reward-predicting stimuli, during reward expectation and in relation to the preparation and execution of movements. From Cromwell & Schultz (2003).

Striatum and nucleus accumbens

The slowly firing medium spiny neurons in striatum and nucleus accumbens and the tonically active striatal neurons (TANs) respond to the reception of food and liquid rewards (Apicella et al. 1991a, b). Other striatal and accumbal neurons show phasic activations following visual reward-predicting stimuli and more sustained activations during the expectation of rewards (Hikosaka et al. 1989a, b, Schultz et al. 1992). Changes of existing reward expectation during learning lead to adaptations of reward expectation-related activity to the currently valid expectation in parallel with the animal’s behavioral differentiation. The TANs discriminate between rewards and air puff punishers (Ravel et al. 2003), and many slowly firing striatal neurons distinguish reward from no reward and discriminate between different rewards and reward magnitudes irrespective of visual object properties, spatial information and movements ( Figure 12; Bowman et al. 1996). Neurons in the ventral striatum show a higher incidence of reward responses and reward expectation activities, as compared to caudate and putamen neurons with their larger spectrum of task-related activity. Thus subpopulations of striatal neurons appear to process pure reward signals.

Reward Influences on Action-Related Activity

Figure 13: Influence of predicted food rewards on spatially discriminating delay activity in a neuron of the dorsolateral prefrontal cortex of a monkey performing a delayed response task. The specific visual cues indicate both the spatial position of the target for an arm reaching movement and the particular reward obtained for correct performance. From Watanabe (1996).

Dorsolateral prefrontal cortex

In addition to generating specific signals, rewards also influence ongoing action-related activity. The prediction of different food or liquid rewards modifies the typical, spatially discriminating delay activity of neurons in dorsolateral prefrontal cortex (Figure 13; Watanabe 1996, Kobayashi et al. 2002) and influences movement-specific cue responses in medial prefrontal cortex (Matsumoto et al. 2003). These prefrontal neurons carry signals related to the preparation of movement and at the same time encode the expected reward. Only a small population of prefrontal neurons is activated by aversive stimuli (Kobayashi et al. 2006).

Other cortical areas

Predicted rewards influence arm and eye movement-related activity also in other cortical areas, including parietal cortex (Platt & Glimcher 1999, Musallam et al. 2004) and anterior and posterior cingulate cortex (Shidara & Richmond 2002, McCoy et al. 2003). Similar reward effects in premotor cortex may reflect the motivating functions of rewards on movements coded in this part of the motor system (Roesch & Olson 2003).

Figure 14: Predicted reward influences movement preparatory activity in the caudate nucleus. The animal performs in a delay task involving rewarded arm movements (top), rewarded nonmovement reactions (nogo; middle) and unrewarded movements (bottom). Both the behavioral action and the outcome are predicted by the initial, differential cues shown at left. From Hollerman et al. (1998).

Striatum

Similar to prefrontal neurons, the action-related activity of a population of neurons in the striatum (caudate and putamen) is influenced by predicted rewards. These neurons are activated during the preparation and execution of specific arm and eye movements towards different spatial targets and discriminate between movement and non-movement reactions. At the same time, these action-specific activities are differentially influenced by the predicted presence vs. absence of reward (Figure 14; Kawagoe et al. 1998) and by different predicted reward types, magnitudes and probabilities (Hassani et al. 2001, Cromwell & Schultz 2003). This activity can predict the animal’s choice of a rewarding outcome (Samejima et al. 2005). Similar combined action-reward activities exist in the substantia nigra pars reticulata (Sato & Hikosaka 2002).

The activations in the striatum and cortex mentioned above do not simply represent outcome expectations, as they differentiate in addition between different behavioral reactions for the same outcome (Figure 14, movement vs. nonmovement); nor do they only reflect different behavioral reactions, as they also differentiate between the expected outcomes (Figure 14, top vs. bottom). The reward-differentiating nature of the activations develops and adapts during learning, while differential reward expectations are being acquired (Figure 15).

Figure 15: Change of reward expectation activity during the learning of new reward-predicting cues. (a) In the delayed response task, the animal touches a target lever after a trigger stimulus has occurred following the delay. The reward is delivered after a further delay of 2 s, during which the animal’s hand stays on the target lever. In predictably unrewarded trials with a different cue, the animal does not remain on the lever but immediately goes back to the touch key. (b) The return time of the movement from the target lever back to the resting key varies between rewarded and unrewarded trials and serves to assess the animal’s reward expectation. (c) During the learning of novel reward-predicting cues, the animal’s behavior (return time) reveals an initial default reward expectation which differentiates after an average of 3 trials according to reward vs. nonreward outcomes. (d) Activity in a caudate neuron during familiar trials encodes both the behavioral reaction (movement vs. nonmovement; nonmovement trials not shown) and the expectation of reward vs. nonreward (left vs. right). (e) During learning, the same caudate neuron shows reward expectation activity during all initial movement trials (left and top right), which disappears during later nonmovement trials (right, shift from 2000 ms to no wait time; trial chronology from top down), in parallel with the animal’s change of expectation-related movement parameters to those typical for unrewarded movements. These data suggest adaptation of neuronal reward expectation during learning in parallel with adaptation of the animal’s internal expectation. From Tremblay et al. (1998).

The combined action and reward coding by striatal neurons complies with theoretical notions of associating specific behavioral actions with rewarding outcomes through operant learning (Sutton & Barto 1998). These activities may constitute neuronal correlates of goal-directed behavior, as they appear to reflect neuronal representations of the reward for the specific action while this action is being prepared and executed (Dickinson & Balleine 1994).

The combined coding of action and reward contrasts with the pure reward signals described earlier in dopamine neurons and in some neurons of orbitofrontal cortex and striatum, which reflect the predicted or received reward irrespective of other stimulus or behavioral components. In demonstrating the influence of predicted reward on action-related activity (Figure 16), action-outcome coding may represent the next processing stage downstream of pure reward signals on the way to overt choices. Action-outcome coding may be a component of the mechanisms by which reward signals are translated into behavioral choices for obtaining reward through action. Information about the value of each possible action in choice situations would constitute an important input to decision-making mechanisms.
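A minimal sketch of this idea (a textbook action-value scheme with assumed parameters, not a circuit model from the article) lets a dopamine-like prediction error sculpt action-specific values that a softmax rule then converts into choices:

```python
import numpy as np

rng = np.random.default_rng(0)
p_reward = {"left": 0.8, "right": 0.2}   # assumed reward probabilities
Q = {"left": 0.0, "right": 0.0}          # action-specific reward predictions
alpha, beta = 0.1, 5.0                   # learning rate, choice temperature

for _ in range(500):
    q = np.array([Q["left"], Q["right"]])
    p_left = np.exp(beta * q[0]) / np.exp(beta * q).sum()   # softmax choice
    action = "left" if rng.random() < p_left else "right"
    reward = float(rng.random() < p_reward[action])
    Q[action] += alpha * (reward - Q[action])   # prediction-error update

print(Q)   # each Q moves toward the reward probability of its action
```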

Figure 16: Schematic diagram of differential influence of predicted reward on magnitude of specific behavior-related activity. Through association with reward, action-related neuronal activity comes to reflect the values of individual actions.

Reward-Related Activity in Amygdala

Reward-predicting stimuli and unpredicted liquid, food, cocaine and intracranial electrical stimulation elicit responses in central and basolateral amygdala, and the responses are differentiated from those to aversive and neutral outcomes (Paton et al. 2006). Responses discriminate between reward magnitudes and change with outcome reversal. Responses correlate with orbitofrontal responses during early discrimination learning and decrease after orbitofrontal lesions (Pratt & Mizumori 1998, Schoenbaum et al. 1998, 2000, Toyomizu et al. 2002, Carelli et al. 2003, Saddoris et al. 2005). Amygdala neurons show satiety-sensitive gustatory responses, and responses to liquid- or food-predicting visual stimuli that are differentiated from responses to air puffs and decrease with increasing behavioral requirements (Nishijo et al. 1988, Yan & Scott 1996, Wilson & Rolls 2005, Paton et al. 2006, Sugase-Miyamoto & Richmond 2005).

Overview of neuronal reward signals


Selected pure reward signals in monkeys (irrespective of sensory and motor aspects)

Brain structure | Specific characteristics | References
Dopamine neurons | reward prediction | Romo & Schultz 1990; Kawagoe et al. 2004; Morris et al. 2006
Dopamine neurons | prediction error | Schultz et al. 1997; Morris et al. 2004; Bayer & Glimcher 2005
Dopamine neurons | temporal prediction error | Hollerman & Schultz 1998; Nakahara et al. 2004
Dopamine neurons | adaptive value coding | Tobler et al. 2005
Dopamine neurons | motivation | Satoh et al. 2003
Orbitofrontal cortex | satiation sensitivity | Critchley & Rolls 1996
Orbitofrontal cortex | reward expectation | Tremblay & Schultz 1999
Orbitofrontal cortex | adaptive coding | Tremblay & Schultz 1999
Orbitofrontal cortex | economic value | Padoa-Schioppa & Assad 2006
Orbitofrontal cortex | reversal learning | Rolls et al. 1996
Orbitofrontal cortex | novel learning | Tremblay & Schultz 2000
Anterior cingulate cortex | reward expectation | Shidara & Richmond 2002
Striatum | reward expectation | Hikosaka et al. 1989b; Hollerman et al. 1998
Striatum | reward type | Shidara et al. 1998; Hassani et al. 2001
Amygdala | reward prediction | Sugase-Miyamoto & Richmond 2005; Paton et al. 2006

Selected action-related value signals in monkeys (conjoint reward and motor aspects)

Brain structure | Specific characteristics | References
Prefrontal cortex | spatial-reward | Watanabe 1996; Kobayashi et al. 2002
Prefrontal cortex | go-nogo-reward | Matsumoto et al. 2003
Premotor cortex | spatial-reward with motivation | Roesch & Olson 2003
Posterior cingulate cortex | spatial-reward | McCoy et al. 2003
Parietal cortex | spatial-reward value | Platt & Glimcher 1999; Musallam et al. 2004
Striatum | go-nogo-reward | Hollerman et al. 1998
Striatum | spatial-reward type | Hassani et al. 2001
Striatum | spatial-reward magnitude | Kawagoe et al. 1998; Cromwell & Schultz 2003
Striatum | spatial-reward probability | Samejima et al. 2005
Striatum | spatial-reward adaptive coding | Cromwell et al. 2005
Striatum | spatial reversal learning | Pasupathy & Miller 2005
Striatum | go-nogo novel learning | Tremblay et al. 1998
Globus pallidus | spatial-reward | Arkadir et al. 2004
Substantia nigra pars reticulata | spatial-reward | Sato & Hikosaka 2002

Acknowledgements

The author acknowledges support by the Wellcome Trust, Swiss National Science Foundation, Human Frontiers Science Program and several other grant and fellowship agencies.

References

  • Apicella P, Scarnati E, Schultz W. Tonically discharging neurons of monkey striatum respond to preparatory and rewarding stimuli. Exp. Brain Res. 84: 672-675, 1991a
  • Apicella P, Ljungberg T, Scarnati E, Schultz W. Responses to reward in monkey dorsal and ventral striatum. Exp. Brain Res. 85: 491-500, 1991b
  • Arkadir D, Morris G, Vaadia E, Bergman H. Independent coding of movement direction and reward prediction by single pallidal neurons. J. Neurosci. 24: 10047-10056, 2004
  • Bayer HM, Glimcher PW: Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron 47: 129-141, 2005
  • Bowman EM, Aigner TG, Richmond BJ. Neural signals in the monkey ventral striatum related to motivation for juice and cocaine rewards. J. Neurophysiol. 75: 1061-1073, 1996
  • Brischoux F, Chakraborty S, Brierley DI, Ungless MA. Phasic excitation of dopamine neurons in ventral VTA by noxious stimuli. Proc Natl Acad Sci USA 106: 4894–4899, 2009
  • Carelli RM, Williams JG, Hollander JA: Basolateral amygdala neurons encode cocaine self-administration and cocaine-associated cues. J Neurosci 23: 8204-8211, 2003
  • Critchley HD, Rolls ET. Hunger and satiety modify the responses of olfactory and visual neurons in the primate orbitofrontal cortex. J. Neurophysiol. 75: 1673-1686, 1996
  • Cromwell HC, Hassani OK, Schultz W. Relative reward processing in primate striatum. Exp Brain Res. 162: 520-525, 2005
  • Cromwell HC, Schultz W. Effects of expectations for different reward magnitudes on neuronal activity in primate striatum. J. Neurophysiol. 89 2823-2838, 2003
  • Datla KP, Ahier RG, Young AMJ, Gray JA, Joseph MH. Conditioned appetitive stimulus increases extracellular dopamine in the nucleus accumbens of the rat. Eur J Neurosci 16: 1987-1993, 2002
  • Dickinson A, Balleine B. Motivational control of goal-directed action. Anim. Learn. Behav. 22: 1-18, 1994
  • Fiorillo CD, Song MR, Yun SR. Diversity and homogeneity in responses of midbrain dopamine neurons. J Neurosci 33: 4693-4709, 2013a
  • Fiorillo CD, Song MR, Yun SR. Multiphasic temporal dynamics in responses of midbrain dopamine neurons to appetitive and aversive stimuli. J Neurosci 33: 4710-4725, 2013b
  • Fiorillo CD, Tobler PN, Schultz W. Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299: 1898-1902, 2003
  • Hassani OK, Cromwell HC, Schultz W. Influence of expectation of different rewards on behavior-related neuronal activity in the striatum. J. Neurophysiol. 85: 2477-2489, 2001
  • Hikosaka O, Sakamoto M, Usui S. Functional properties of monkey caudate neurons. II. Visual and auditory responses. J. Neurophysiol. 61: 799-813, 1989a
  • Hikosaka O, Sakamoto M, Usui S. Functional properties of monkey caudate neurons. III. Activities related to expectation of target and reward. J. Neurophysiol. 61: 814-832, 1989b
  • Hikosaka K, Watanabe M. Delay activity of orbital and lateral prefrontal neurons of the monkey varying with different rewards. Cerebral Cortex 10: 263-271, 2000
  • Hollerman JR, Schultz W. Dopamine neurons report an error in the temporal prediction of reward during learning. Nature Neurosci. 1: 304-309, 1998
  • Hollerman JR, Tremblay L, Schultz W. Influence of reward expectation on behavior-related neuronal activity in primate striatum. J. Neurophysiol. 80: 947-963, 1998
  • Horvitz JC, Stewart T, Jacobs BL. Burst activity of ventral tegmental dopamine neurons is elicited by sensory stimuli in the awake cat. Brain Res. 759: 251-258, 1997
  • Huang C-F and Litzenberger RH: Foundations for Financial Economics. Prentice-Hall, Upper Saddle River, NJ 1988
  • Izhikevich EM. Solving the distal reward problem through linkage of STDP and dopamine signaling. Cerebral Cortex 17: 2443-2452, 2007
  • Kahneman D, Tversky A. Choices, values, and frames. American Psychologist 39: 341-350, 1984
  • Kawagoe R, Takikawa Y, Hikosaka O. Expectation of reward modulates cognitive signals in the basal ganglia. Nature Neurosci. 1: 411-416, 1998
  • Kawagoe R, Takikawa Y, Hikosaka O. Reward-predicting activity of dopamine and caudate neurons - a possible mechanism of motivational control of saccadic eye movement. J. Neurophysiol. 91: 1013-1024, 2004
  • Kobayashi S, Lauwereyns J, Koizumi M, Sakagami M, Hikosaka O. Influence of reward expectation on visuospatial processing in macaque lateral prefrontal cortex. J. Neurophysiol. 87: 1488-1498, 2002
  • Kobayashi S, Nomoto K, Watanabe M, Hikosaka O, Schultz W, Sakagami M: Influences of rewarding and aversive outcomes on activity in macaque lateral prefrontal cortex. Neuron 51: 861-870, 2006
  • Kobayashi S, Schultz W. Reward contexts extend dopamine signals to unrewarded stimuli. Curr Biol 24: 56-62, 2014
  • Ljungberg T, Apicella P, Schultz W. Responses of monkey dopamine neurons during learning of behavioral reactions. J. Neurophysiol. 67: 145-163, 1992
  • Matsumoto M, Hikosaka O. Two types of dopamine neuron distinctively convey positive and negative motivational signals. Nature 459: 837-841, 2009
  • Matsumoto K, Suzuki W, Tanaka K. Neuronal correlates of goal-based motor selection in the prefrontal cortex. Science 301: 229-232, 2003
  • McCoy AN, Crowley JC, Haghighian G, Dean HL, Platt ML. Saccade reward signals in posterior cingulate cortex. Neuron 40: 1031-1040, 2003
  • Mirenowicz J, Schultz W. Preferential activation of midbrain dopamine neurons by appetitive rather than aversive stimuli. Nature 379: 449-451, 1996
  • Montague PR, Dayan P, Sejnowski TJ. A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J. Neurosci. 16: 1936-1947, 1996
  • Morris G, Arkadir D, Nevet A, Vaadia E, Bergman H. Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons. Neuron 43: 133-143, 2004
  • Morris G, Nevet A, Arkadir D, Vaadia E, Bergman H. Midbrain dopamine neurons encode decisions for future action. Nature Neurosci. 9: 1057-1063, 2006
  • Musallam S, Corneil BD, Greger B, Scherberger H, Andersen RA. Cognitive control signals for neural prosthetics. Science 305: 258-262, 2004
  • Nakahara H, Itoh H, Kawagoe R, Takikawa Y, Hikosaka O. Dopamine neurons can represent context-dependent prediction error. Neuron 41: 269-280, 2004
  • Nishijo H, Ono T, Nishino H: Single neuron responses in amygdala of alert monkey during complex sensory stimulation with affective significance. J. Neurosci. 8: 3570-3583, 1988
  • Nomoto K, Schultz W, Watanabe T, Sakagami M. Temporally extended dopamine response to perceptually demanding reward-predictive stimuli. J Neurosci 30: 10692-10702, 2010
  • Padoa-Schioppa C, Assad JA. Neurons in the orbitofrontal cortex encode economic value. Nature 441: 223-226, 2006
  • Pan WX, Schmidt R, Wickens J, Hyland BI. Dopamine cells respond to predicted events during classical conditioning: evidence for eligibility traces in the reward-learning network. J. Neurosci. 25: 6235-6242, 2005
  • Pasupathy A, Miller EK. Different time courses of learning-related activity in the prefrontal cortex and striatum. Nature 433: 873-876, 2005
  • Paton JJ, Belova MA, Morrison SE, Salzman CD: The primate amygdala represents the positive and negative value of visual stimuli during learning. Nature 439: 865-870, 2006
  • Platt ML, Glimcher PW. Neural correlates of decision variables in parietal cortex. Nature 400: 233-238, 1999
  • Pratt WE, Mizumori SJY: Characteristics of basolateral amygdala neuronal firing on a spatial memory task involving differential reward. Behav. Neurosci. 112: 554-570, 1998
  • Preuschoff K, Bossaerts P, Quartz SR. Neural differentiation of expected reward and risk in human subcortical structures. Neuron 51: 381-390, 2006
  • Ravel S, Legallet E, Apicella P. Responses of tonically active neurons in the monkey striatum discriminate between motivationally opposing stimuli. J. Neurosci. 23: 8489-8497, 2003
  • Roesch MR, Olson CR. Impact of expected reward on neuronal activity in prefrontal cortex, frontal and supplementary eye fields and premotor cortex. J. Neurophysiol. 90: 1766-1789, 2003
  • Roitman MF, Stuber GD, Phillips PEM, Wightman RM, Carelli RM. Dopamine operates as a subsecond modulator of food seeking. J. Neurosci. 24: 1265-1271, 2004
  • Rolls ET, Critchley HD, Mason R, Wakeman EA. Orbitofrontal cortex neurons: role in olfactory and visual association learning. J. Neurophysiol. 75: 1970-1981, 1996
  • Romo R, Schultz W. Dopamine neurons of the monkey midbrain: Contingencies of responses to active touch during self-initiated arm movements. J. Neurophysiol. 63: 592-606, 1990
  • Saddoris MP, Gallagher M, Schoenbaum G: Rapid associative encoding in basolateral amygdala depends on connections with orbitofrontal cortex. Neuron 46: 321-331, 2005
  • Samejima K, Ueda Y, Doya K, Kimura M. Representation of action-specific reward values in the striatum. Science 310, 1337-1340, 2005
  • Sato M, Hikosaka O. Role of primate substantia nigra pars reticulata in reward-oriented saccadic eye movement. J. Neurosci. 22: 2363-2373, 2002
  • Satoh T, Nakai S, Sato T, Kimura M. Correlated coding of motivation and outcome of decision by dopamine neurons. J. Neurosci. 23: 9913-9923, 2003
  • Schoenbaum G, Chiba AA, Gallagher M: Orbitofrontal cortex and basolateral amygdala encode expected outcomes during learning. Nature Neurosci. 1: 155-159, 1998
  • Schoenbaum G, Chiba AA, Gallagher M: Changes in functional connectivity in orbitofrontal cortex and basolateral amygdala during learning and reversal training. J Neurosci 20: 5179-5189, 2000
  • Schultz W, Apicella P, Scarnati E, Ljungberg T. Neuronal activity in monkey ventral striatum related to the expectation of reward. J. Neurosci. 12: 4595-4610, 1992
  • Schultz W, Dayan P, Montague PR. A neural substrate of prediction and reward. Science 275: 1593-1599, 1997
  • Schultz W, Romo R. Responses of nigrostriatal dopamine neurons to high intensity somatosensory stimulation in the anesthetized monkey. J. Neurophysiol. 57: 201-217, 1987
  • Shidara M, Aigner TG, Richmond BJ. Neuronal signals in the monkey ventral striatum related to progress through a predictable series of trials. J. Neurosci. 18: 2613-2625, 1998
  • Shidara M, Richmond BJ. Anterior cingulate: single neuron signals related to degree of reward expectancy. Science 296: 1709-1711, 2002
  • Sugase-Miyamoto Y, Richmond BJ: Neuronal signals in the monkey basolateral amygdala during reward schedules. J Neurosci 25: 11071-11083, 2005
  • Suri R, Schultz W. A neural network with dopamine-like reinforcement signal that learns a spatial delayed response task. Neuroscience 91: 871-890, 1999
  • Sutton RS, Barto AG, Reinforcement Learning, MIT Press, Cambridge, MA 1998
  • Thorpe SJ, Rolls ET, Maddison S. The orbitofrontal cortex: neuronal activity in the behaving monkey. Exp. Brain Res. 49: 93-115, 1983
  • Tobler PN, Dickinson A, Schultz W. Coding of predicted reward omission by dopamine neurons in a conditioned inhibition paradigm. J. Neurosci 23:10402-10410, 2003
  • Tobler PN, Fiorillo CD, Schultz W. Adaptive coding of reward value by dopamine neurons. Science 307: 1642-1645, 2005
  • Tobler PN, O’Doherty JP, Dolan R, Schultz W. Reward value coding distinct from risk attitude-related uncertainty coding in human reward systems. J. Neurophysiol. 97: 1621-1632, 2007
  • Toyomizu Y, Nishijo H, Uwano T, Kuratsu J, Ono T. Neuronal responses of the rat amygdala during extinction and reassociation learning in elementary and configural associative tasks. Eur. J. Neurosci. 15: 753-768, 2002
  • Tremblay L, Hollerman JR, Schultz W. Modifications of reward expectation-related neuronal activity during learning in primate striatum. J. Neurophysiol. 80: 964-977, 1998
  • Tremblay L, Schultz W. Relative reward preference in primate orbitofrontal cortex. Nature 398: 704-708, 1999
  • Tremblay L, Schultz W. Modifications of reward expectation-related neuronal activity during learning in primate orbitofrontal cortex. J. Neurophysiol. 83: 1877-1885, 2000
  • Ungless MA, Magill PJ, Bolam JP. Uniform inhibition of dopamine neurons in the ventral tegmental area by aversive stimuli. Science 303: 2040-2042, 2004
  • von Neumann J, Morgenstern O. The Theory of Games and Economic Behavior. Princeton University Press, Princeton, 1944
  • Waelti P, Dickinson A, Schultz W. Dopamine responses comply with basic assumptions of formal learning theory. Nature 412: 43-48, 2001
  • Watanabe M. Reward expectancy in primate prefrontal neurons. Nature 382: 629-632, 1996
  • Wilson FAW, Rolls ET. The primate amygdala and reinforcement: a dissociation between rule-based and associatively-mediated memory revealed in neuronal activity. Neuroscience 133: 1061-1072, 2005
  • Wise RA. Brain reward circuitry: insights from unsensed incentives. Neuron 36: 229-240, 2002
  • Yan J, Scott TR: The effect of satiety on responses of gustatory neurons in the amygdala of alert cynomolgus macaques. Brain Res 740: 193-200, 1996
  • Young AMJ. Increased extracellular dopamine in nucleus accumbens in response to unconditioned and conditioned aversive stimuli: studies using 1 min microdialysis in rats. J Neurosci Meth 138: 57–63, 2004

External Links

Author's webpage

See Also

Actor-Critic Method, Attention, Basal Ganglia, Conditioning, Neuroeconomics, Q-Learning, Reinforcement Learning, Reward, Temporal Difference Learning
