# Reward


Curator: Wolfram Schultz

Figure 1: Rewards.

Rewards make us come back for more. We need them for survival, use them for behavioral choices that maximize them and feel good about them.

Public perception associates rewards primarily with happiness and special gratification, but behavioral research suggests wider functions. There are various definitions of reward, and this article suggests that reward has a wide spectrum of functions compatible with psychological animal learning theory and economic decision making. Rewards are objects, events, situations or activities that acquire positive motivational properties from internal brain processes. They have the potential to (1) increase the probability and intensity of behavioral actions leading to such objects (learning, also called positive reinforcement), (2) generate approach and consummatory behavior and constitute outcomes of economic decision-making, and (3) induce subjective feelings of pleasure and hedonia. Rewards are of crucial importance for individual and gene survival and support such elementary processes as drinking, eating and reproduction. Largely similar behavioral processes are engaged for higher-order rewards such as money, novelty, and cognitive and social rewards. The basic reward objects are polysensory and do not engage specialized reward receptors; the brain extracts reward information from visual, auditory, somatosensory, olfactory and other sensory inputs. The identification of higher-order rewards depends on additional cognitive processes. Thus rewards are defined not by the physics and chemistry of their inputs but by the behavioral reactions they induce. This article describes the key behavioral functions of rewards.

## Behavioral Functions of Reward

### A call for behavioral theory

The laws of mechanics, optics, acoustics and biochemistry define the key functions of primary sensory systems in the brain. The dedicated physical and chemical receptors of these systems translate environmental energy and information into neural language. By contrast, there are no dedicated receptors for reward, and reward information enters the brain through the touch, taste, visual and auditory receptors of the primary sensory systems. The functions of rewards therefore cannot be derived entirely from the physics and chemistry of input events and are based primarily on behavioral effects. Thus the investigation of reward functions requires behavioral theories that can conceptualize the different effects of rewards on behavior. Animal learning theory and economic decision theories provide coherent frameworks for the investigation of neural reward mechanisms. The central tenets of these theories are based on observable behavior and thus superficially resemble the behaviorist approach, although they include mental states of representation and prediction.

Figure 2: Basic learning conditions: Pairing of conditioned stimulus (CS) and reward (US).

### Learning

Rewards induce changes in observable behavior. They serve as positive reinforcers by increasing the frequency of the behavior that results in reward.

• In Pavlovian, or classical, conditioning, the outcome follows the conditioned stimulus irrespective of any behavioral reaction. Repeated pairing of stimuli with outcomes leads to a representation of the outcome that is evoked by the stimulus and elicits the behavioral reaction (Figure 2). Thus Pavlovian conditioning produces outcome predictions and establishes conditioned incentives.
• By comparison, instrumental, or operant, conditioning requires the subject to execute a behavioral response; without such a response no reward occurs. Instrumental conditioning increases the frequency of behaviors that are followed by reward by reinforcing stimulus-response links. Through instrumental conditioning, rewards come to serve as goals of behavior.

Instrumental conditioning allows subjects to influence their environment and determine their rate of reward. However, there is general agreement that the stimuli used in instrumental conditioning have become reward predictors through Pavlovian learning.

Figure 3: Learning curve. Learning is proportional to the prediction error (received minus predicted reward), $\Delta V = \alpha\beta(\lambda - V)$, and reaches an asymptote as the prediction error approaches zero. $V$ = prediction, $\alpha$ and $\beta$ are learning constants, $\lambda$ = reward.

Associative learning depends crucially on the discrepancy between the occurrence of a reward and its prediction. The importance of such prediction errors derives from Kamin’s blocking effect (1969), which demonstrates that a fully predicted reward does not contribute to the learning of a stimulus or action, even when it has been repeatedly paired with that stimulus or action. This is conceptualized in the associative Rescorla-Wagner learning rule (Rescorla & Wagner 1972), according to which learning advances only to the extent that a reinforcer is unpredicted and slows progressively as the reinforcer becomes more predicted (Figure 3). The omission of a predicted reinforcer reduces the associative strength of the conditioned stimulus and produces extinction of behavior. So-called attentional learning rules additionally relate the capacity to learn (associability) in certain situations to the degree of attention evoked by the conditioned stimulus or reward (Mackintosh 1975, Pearce & Hall 1980).
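The Rescorla-Wagner rule and the blocking effect can be illustrated with a short simulation. What follows is a minimal sketch rather than the full published model. It assumes a single learning-rate product $\alpha\beta$ shared by all stimuli, with $\lambda = 1$ on rewarded trials and $\lambda = 0$ on omission trials. The stimulus names `A` and `B`, the trial counts and the parameter values are hypothetical. On each trial, every stimulus present is updated by $\Delta V = \alpha\beta(\lambda - \sum V)$, where the sum runs over all stimuli present on that trial.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Trial = Tuple[FrozenSet[str], bool]  # (stimuli present on the trial, reward delivered?)

def rescorla_wagner(trials: List[Trial],
                    V: Optional[Dict[str, float]] = None,
                    alpha: float = 0.3,  # stimulus salience (assumed equal for all stimuli)
                    beta: float = 1.0,   # reinforcer-dependent learning rate (assumed)
                    lam: float = 1.0) -> Dict[str, float]:
    """Return associative strengths after applying the Rescorla-Wagner rule to `trials`."""
    V = dict(V) if V else {}  # copy, so strengths from an earlier phase are not modified
    for stimuli, rewarded in trials:
        prediction = sum(V.get(s, 0.0) for s in stimuli)  # summed prediction of all present stimuli
        error = (lam if rewarded else 0.0) - prediction    # prediction error: lambda - sum(V)
        for s in stimuli:
            V[s] = V.get(s, 0.0) + alpha * beta * error
    return V

A, AB = frozenset({"A"}), frozenset({"A", "B"})
pretrained = rescorla_wagner([(A, True)] * 50)                   # phase 1: A -> reward
blocked = rescorla_wagner([(AB, True)] * 50, V=pretrained)       # phase 2: AB -> reward
control = rescorla_wagner([(AB, True)] * 50)                     # AB -> reward, no pretraining
extinguished = rescorla_wagner([(A, False)] * 50, V=pretrained)  # A presented, reward omitted
print(round(blocked["B"], 3), round(control["B"], 3), round(extinguished["A"], 3))
# prints: 0.0 0.5 0.0
```

Because `A` already predicts the reward fully, the compound trials leave almost no prediction error, so `B` acquires almost no associative strength (blocking). In the control condition, `A` and `B` share the available strength equally. Unrewarded trials produce a negative prediction error, which extinguishes `A`.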

### Approach behavior and decision-making

Rewards elicit approach and consummatory behavior and serve as incentives by attracting goal pursuit. This is because objects acquire appetitive value either through innate mechanisms (primary rewards) or, in most cases, through classical conditioning, after which these objects constitute, strictly speaking, conditioned reinforcers (for learning) or incentives (for action) (Wise 2002). Nutritional rewards additionally derive their value from states of hunger and thirst (drive). Satiation of the animal reduces the value of these objects and, consequently, the behavioral reactions to them.

Conditioned, reward-predicting stimuli induce approach behaviors towards the reward that are less specific to the reward object than consummatory behavior. In Pavlovian, or classical, conditioning, subjects often show non-consummatory reactions in response to the reward-predicting stimulus that would otherwise occur only after the primary reward. These reactions are not required to obtain the reward but may increase the chance of consuming it. Thus Pavlovian conditioning involves the transfer of part of the behavioral response from the primary reward to the conditioned stimulus.

In instrumental conditioning, the action becomes associated with reward and thus acquires a value. Decision-making mechanisms can be based on the action values of the different options (Sutton & Barto 1998). Furthermore, a reward becomes a goal for instrumental action if, at the time of the action, there is a representation of the reward and of the contingency (dependency) of the reward on that action (Dickinson & Balleine 1994).

When more than one option is available, animals show preferences for specific options, expressed as the probability of choosing one option over all others. Choices depend crucially on predictions of outcomes. Without predictions, agents can only guess, because they do not know what they will get when making a choice. This places the Pavlovian learning of reward predictors in a key role for decision-making. The function of reward in decision-making is thus indirect: rewards drive predictive Pavlovian learning, and the acquired prediction directs the choice. Reward itself is the outcome of the overt choice, and its evaluation is used to update the prediction for the next decision.
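The cycle described above (predict, choose, observe the outcome, update the prediction) can be sketched as simple action-value learning in the spirit of Sutton & Barto (1998). The illustration is hypothetical. The softmax choice rule, the delta-rule update, the option names `left` and `right`, the fixed reward magnitudes and all parameter values are assumptions made for this example and do not model any specific experiment. Choice preference appears as the probability of choosing one option over the other.

```python
import math
import random
from typing import Dict, Tuple

def softmax(values: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Choice probabilities from action values (higher value -> more likely chosen)."""
    m = max(values.values())  # subtract the maximum for numerical stability
    exps = {a: math.exp((v - m) / temperature) for a, v in values.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}

def learn_action_values(reward_of: Dict[str, float], n_trials: int = 500,
                        alpha: float = 0.1, temperature: float = 0.2,
                        seed: int = 0) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Hypothetical agent: choose by softmax over predictions, then update the chosen prediction."""
    rng = random.Random(seed)
    Q = {a: 0.0 for a in reward_of}              # initial predictions: nothing expected
    for _ in range(n_trials):
        p = softmax(Q, temperature)              # predictions -> choice probabilities
        actions = list(p)
        choice = rng.choices(actions, weights=[p[a] for a in actions])[0]
        outcome = reward_of[choice]              # reward is the outcome of the overt choice
        Q[choice] += alpha * (outcome - Q[choice])  # prediction error updates the next prediction
    return Q, softmax(Q, temperature)

Q, p = learn_action_values({"left": 1.0, "right": 0.5})
print({a: round(v, 2) for a, v in Q.items()}, round(p["left"], 2))
```

Only the chosen option's prediction is updated on each trial. As a result, the agent comes to prefer the option whose predicted reward is larger, but it still occasionally samples the other option.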

The possible differences between reward and positive reinforcement become apparent in approach behavior and decision-making. Once established, these behaviors are driven by the attractive properties of rewards but are not, strictly speaking, identical to the strengthening of behavior associated with reinforcement. The strengthening of behavior that is likely to take place in these situations (or the prevention of extinction) is a separate phenomenon that is adequately described by positive reinforcement.

### Quantification of reward value and uncertainty

Decision mechanisms need to maximize the outcome of choices by comparing the values of all available options and choosing the option with the highest value. The values of rewards of the same kind can be compared on the basis of their probability distributions. The expected value (EV) of a probability distribution is the sum of all reward magnitudes, each weighted by its probability, and provides a single numeric value for the outcome. However, additional factors play important roles, and outcomes are measured by the utility they have for the individual decision maker. Utilities are assessed by preferences in overt choice behavior, which quantifies outcomes as a single scalar variable: $\mathrm{EU} = \sum_i u(x_i)\,p_i$, where EU is the expected utility, $u$ is utility, $x_i$ is the magnitude of outcome $i$ and $p_i$ is its probability (von Neumann & Morgenstern 1944).
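As a worked example, the sketch below computes EV and EU for the gamble used in Figure 4 (1 or 9 units with equal probability). It then converts EU back into a certainty equivalent (CE), the safe amount whose utility equals the gamble's expected utility. The two utility functions are hypothetical choices for illustration: $u(x) = \sqrt{x}$ (concave) and $u(x) = x^2$ (convex). They do not reproduce the exact CE values shown in Figure 4.

```python
import math
from typing import Callable, List, Tuple

Gamble = List[Tuple[float, float]]  # (magnitude x_i, probability p_i) pairs

def expected_value(g: Gamble) -> float:
    """EV: sum of magnitudes weighted by their probabilities."""
    return sum(x * p for x, p in g)

def expected_utility(g: Gamble, u: Callable[[float], float]) -> float:
    """EU: sum of utilities u(x_i) weighted by probabilities p_i."""
    return sum(u(x) * p for x, p in g)

def certainty_equivalent(g: Gamble, u: Callable[[float], float],
                         u_inv: Callable[[float], float]) -> float:
    """Safe amount whose utility equals the gamble's expected utility."""
    return u_inv(expected_utility(g, u))

gamble = [(1.0, 0.5), (9.0, 0.5)]  # equiprobable 1 or 9 units, as in Figure 4
ev = expected_value(gamble)
ce_averse = certainty_equivalent(gamble, math.sqrt, lambda y: y ** 2)   # concave u (hypothetical)
ce_seeking = certainty_equivalent(gamble, lambda x: x ** 2, math.sqrt)  # convex u (hypothetical)
print(ev, ce_averse, round(ce_seeking, 2))
# prints: 5.0 4.0 6.4
```

The concave function yields a CE below the EV (risk aversion), and the convex function yields a CE above it (risk seeking). This matches the qualitative pattern in Figure 4.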

Figure 4: Utility functions can serve to describe the influence of uncertainty on the utility of rewarding outcomes. (a) Concave utility function in a risk averse subject. In a choice between a safe reward and a gamble of two equiprobable low and high rewards (1 and 9 units, respectively), the modeled subject may show choice indifference for a safe reward of 3.3 units (certainty equivalent of gamble, CE). This value is below the expected value (EV) of 5 units of the gamble’s probability distribution. Thus the expected utility (EU) of the gamble is lower than the utility of EV, and the difference is due to the uncertainty. (b) Convex utility function in a risk-seeking subject. CE is higher than EV (6.5 vs. 5 units), and the EU of the gamble is higher than the utility of the EV. This subject places a higher utility on uncertain compared to safe rewards.

A major factor influencing utility is uncertainty. In risk-averse subjects, uncertainty reduces the utility of a reward (a reward has less value for me if I am afraid that I won’t get it). Risk seekers, by contrast, find higher utility in an uncertain reward (I like the uncertainty and thus prefer risky rewards over safe ones), all other reward parameters being equal (Figure 4). The influence of uncertainty on outcome utility is conceptualized by the Taylor series expansion of expected utility (Huang & Litzenberger 1988).
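The role of the Taylor expansion can be made explicit in a short derivation. This is a standard textbook sketch and is not taken from the original article. Expand $u(x)$ around the expected value $\mu = \mathrm{EV}$ and take expectations. The first-order term vanishes because $E[x - \mu] = 0$:

$$
\begin{aligned}
u(x) &= u(\mu) + u'(\mu)(x-\mu) + \tfrac{1}{2}u''(\mu)(x-\mu)^2 + \tfrac{1}{6}u'''(\mu)(x-\mu)^3 + \dots \\
\mathrm{EU} = E[u(x)] &\approx u(\mu) + \tfrac{1}{2}u''(\mu)\,\sigma^2 + \tfrac{1}{6}u'''(\mu)\,E\!\left[(x-\mu)^3\right], \qquad \sigma^2 = E\!\left[(x-\mu)^2\right]
\end{aligned}
$$

To second order, uncertainty (variance $\sigma^2$) lowers expected utility below $u(\mathrm{EV})$ when $u$ is concave ($u'' < 0$, risk aversion). It raises expected utility above $u(\mathrm{EV})$ when $u$ is convex ($u'' > 0$, risk seeking), as in Figure 4. The third-order term adds a contribution from the skewness of the distribution.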

Figure 5: Discounting of reward utility with increasing temporal delays, as measured in behavioral experiments.

Another major factor influencing the valuation of rewards is the temporal delay between a reward-predicting stimulus or an instrumental action and the reward. Later rewards lose utility in a hyperbolically or exponentially decaying fashion (temporal discounting; Loewenstein & Prelec 1992; Figure 5). Subjects usually prefer earlier over later rewards, and learning slows as reward is delayed. If rewards occur with variable delays, the temporal discounting of reward utility combines with the uncertainty and usually produces risk seeking. This happens because the hyperbolic or exponential discounting function flattens with increasing delay and is therefore convex.
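The effect of variable delays can be illustrated numerically. The sketch assumes two common discount functions, hyperbolic $1/(1+kt)$ and exponential $e^{-kt}$, with hypothetical rates $k$. It compares a reward delivered after a fixed delay of 5 time units with one delivered after 1 or 9 units with equal probability, so the mean delay is the same. Both functions are convex, so averaging over the variable delays yields a higher discounted value than the fixed delay. This corresponds to the risk seeking for delay described above.

```python
import math
from typing import Callable, Dict, List, Tuple

def hyperbolic(t: float, k: float = 1.0) -> float:
    """Hyperbolic discount factor 1 / (1 + k t); k is a hypothetical rate."""
    return 1.0 / (1.0 + k * t)

def exponential(t: float, k: float = 0.5) -> float:
    """Exponential discount factor exp(-k t); k is a hypothetical rate."""
    return math.exp(-k * t)

def discounted_value(magnitude: float, delays: List[float],
                     discount: Callable[[float], float]) -> float:
    """Mean discounted value when each delay in `delays` is equally likely."""
    return sum(magnitude * discount(t) for t in delays) / len(delays)

results: Dict[str, Tuple[float, float]] = {}
for name, d in (("hyperbolic", hyperbolic), ("exponential", exponential)):
    fixed = discounted_value(1.0, [5.0], d)          # reward always after 5 units
    variable = discounted_value(1.0, [1.0, 9.0], d)  # after 1 or 9 units, mean delay 5
    results[name] = (fixed, variable)
    print(f"{name}: fixed={fixed:.3f}, variable={variable:.3f}")
# prints:
# hyperbolic: fixed=0.167, variable=0.300
# exponential: fixed=0.082, variable=0.309
```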

### Pleasure

Subjective feelings of pleasure and the resulting positive emotion represent key functions of rewards. It is quite likely that the pleasure derived from an object, event, situation or activity is sufficient to produce a positive reinforcing effect on behavior (what I got makes me feel good, and therefore I will repeat the action that produced the pleasure). However, it is unclear to what extent pleasure is a necessary condition for objects to be reinforcing, and not all reward objects may induce noticeable pleasure. Indeed, recent theories propose a distinction between an unconscious 'wanting' irrespective of pleasure and a pleasurable 'liking' of rewards (Berridge & Robinson 2003). Alternatively, pleasure may simply be an epiphenomenon (my behavior gets reinforced and, in addition, I feel good because of the outcome). Animal experiments that assume subjective states related to reward run into obvious problems because of the lack of a common language. Except in specific investigations of hedonic mechanisms, the issue can often be set aside when studying the neural mechanisms of reward in controlled behavioral neurophysiological experiments on animals.

### Motivational valence

Punishers have the opposite valence to rewards. They induce withdrawal behavior and act as negative reinforcers by increasing the behavior that reduces the aversive outcome. Avoidance can be passive, when subjects increasingly refrain from doing something that is associated with a punisher, or active, when they increase an instrumental response that is likely to reduce the impact of a punisher. Punishers induce negative emotional states such as anger, fear and panic.

### Less specific behavioral reactions

Rewards share a number of stimulus components with other behaviorally relevant objects. Rewards come in different sensory submodalities and have specific sensory stimulus attributes, such as form, color and spatial position. Rewards induce general behavioral activation, alerting and stimulus-driven attentional reactions. These reactions are also produced by punishers and by physically salient stimuli such as novel, large or rapidly moving objects.

The behavioral reactions to sensory, activating, alerting or attentional stimulus attributes can be measured in several ways: by preferences in free choice behavior, by movement responses in reaction time tasks, and by vegetative responses such as skin conductance, heart rate, pupillary diameter and salivation. These reactions can be distinguished from the rewarding properties of stimuli by comparing stimuli with different sensory and rewarding properties, including non-rewarding stimuli, presented at different spatial positions. Such discriminations can be difficult when the objects also have rewarding components, as in the case of novel or physically salient objects. A good first approximation is to distinguish rewards from punishers, which share many of these attributes but have opposite motivational valence.

Figure 6: Primate dopamine systems.

## Overview of Reward Structures in the Brain

Information about rewards is processed in a number of brain structures. The dopamine neurons, named after the neurotransmitter they release with nerve impulses in their projection territories, are located in the midbrain structures substantia nigra (pars compacta) and the medially adjoining ventral tegmental area (VTA) (Figure 6). The axons of dopamine neurons project to the striatum (caudate nucleus, putamen and ventral striatum including nucleus accumbens), the dorsal and ventral prefrontal cortex, and a number of other structures.

Figure 7: Principal brain structures processing reward information.

Further reward signals are found in the neurons of the dopamine projection structures themselves, including the striatum, orbitofrontal cortex and amygdala (Figure 7, blue). Rewards also influence the action-related activity of neurons in the striatum and the prefrontal and parietal cortex (Figure 7, green). Other brain structures influenced by reward include the supplementary motor area in the frontal lobe, the rhinal cortex in the temporal lobe, the pallidum and subthalamic nucleus in the basal ganglia, and a few others.

## Overview of Human Reward Processes

The past decade has brought an enormous wealth of knowledge on human reward processing from functional brain imaging. Figure 8 gives a brief overview of the substantial and reproducible involvement of both the dorsal and ventral striatum in a variety of basic and higher-order reward processes. The other main human reward structures, not shown here, largely overlap with those found in neuronal studies in animals and mentioned above, namely the midbrain dopamine groups, different regions of frontal cortex and the amygdala.

Figure 8: Robust activation of human striatum by a large variety of rewards.

## Acknowledgements

The author acknowledges support by the Wellcome Trust, Swiss National Science Foundation, Human Frontiers Science Program and several other grant and fellowship agencies.

## References

• Aharon I, Etcoff N, Ariely D, Chabris CF, O’Connor E, Breiter HC. Beautiful faces have variable reward value: fMRI and behavioral evidence. Neuron 32: 537-551, 2001
• Aron A, Fisher H, Mashek DJ, Strong G, Li H, Brown LL. Reward, motivation, and emotion systems associated with early-stage intense romantic love. J Neurophysiol 94: 327-337, 2005
• Berridge KC, Robinson TE. Parsing reward. Trends Neurosci 26: 507-513, 2003
• Blood AJ, Zatorre RJ. Intensely pleasurable responses to music correlate with activity in brain regions implicated in reward and emotion. Proc Natl Acad Sci USA 98: 11818-11823, 2001
• De Quervain DJ, Fischbacher U, Treyer V, Schellhammer M, Schnyder U, Buck A, Fehr E. The neural basis of altruistic punishment. Science 305: 1254-1258, 2004
• Dickinson A, Balleine B. Motivational control of goal-directed action. Animal Learning & Behavior 22: 1-18, 1994
• Erk S, Spitzer M, Wunderlich AP, Galley L, Walter H. Cultural objects modulate reward circuitry. NeuroReport 13: 2499-2503, 2002
• Huang C-F, Litzenberger RH. Foundations for Financial Economics. Prentice-Hall, Upper Saddle River, NJ, 1988
• Kamin LJ. Selective association and conditioning. In: Mackintosh NJ, Honig WK (eds) Fundamental Issues in Associative Learning. Dalhousie University Press, pp 42-64, 1969
• Loewenstein G, Prelec D. Anomalies in intertemporal choice: evidence and an interpretation. Q J Econ 107: 573-597, 1992
• Mackintosh NJ. A theory of attention: variations in the associability of stimuli with reinforcement. Psychol Rev 82: 276-298, 1975
• Mobbs D, Greicius MD, Abdel-Azim E, Menon V, Reiss AL. Humor modulates the mesolimbic reward centers. Neuron 40: 1041-1048, 2003
• O’Doherty J, Dayan P, Friston K, Critchley H, Dolan RJ. Temporal difference models and reward-related learning in the human brain. Neuron 38: 329-337, 2003
• Pearce JM, Hall G. A model for Pavlovian conditioning: variations in the effectiveness of conditioned but not of unconditioned stimuli. Psychol Rev 87: 532-552, 1980
• Petrovic P, Dietrich T, Fransson P, Andersson J, Carlsson K, Ingvar M. Placebo in emotional processing-induced expectations of anxiety relief activate a generalized modulatory network. Neuron 46: 957-969, 2005
• Rescorla RA, Wagner AR. A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. In: Black AH, Prokasy WF (eds) Classical Conditioning II: Current Research and Theory. Appleton-Century-Crofts, New York, pp 64-99, 1972
• Sutton RS, Barto AG. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998
• Thut G, Schultz W, Roelcke U, Nienhusmeier M, Maguire RP, Leenders KL. Activation of the human brain by monetary reward. NeuroReport 8: 1225-1228, 1997
• von Neumann J, Morgenstern O. Theory of Games and Economic Behavior. Princeton University Press, Princeton, 1944
• Wise RA. Brain reward circuitry: insights from unsensed incentives. Neuron 36: 229-240, 2002
