Roy Wise (2009), Scholarpedia, 4(8):2450. doi:10.4249/scholarpedia.2450. Revision #91703.
Reinforcement is the term used by learning theorists to describe the underlying process of associative learning.
The term reinforcement was introduced by Pavlov in 1903 to describe the strengthening of the association between an unconditioned and a conditioned stimulus that results when the two are presented together. If the association is not periodically "reinforced" by such pairing, the effectiveness of the conditioned stimulus decays: the conditioned response undergoes extinction. For Pavlov, any unconditioned stimulus, such as food or a puff of air to the eye, was a potential reinforcer; the pairing of such a stimulus with a neutral stimulus constituted reinforcement. The term denoted for Pavlov the strengthening (and the establishment) of an association between a conditioned stimulus and its unconditioned parent stimulus (Pavlov, 1928).
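The strengthening and extinction described above can be illustrated with a toy simulation. This is only a sketch: the update rule (a fixed fractional step toward a ceiling on paired trials, a fixed fractional loss on unpaired trials) and the `gain` and `decay` values are illustrative assumptions, not something Pavlov or this article specifies.

```python
def update_strength(strength, paired, gain=0.3, decay=0.1):
    """Return the new CS-US association strength after one trial.

    paired=True  -> CS presented together with the US (a "reinforced" trial)
    paired=False -> CS presented alone (an extinction trial)
    gain and decay are made-up illustration parameters, not measured values.
    """
    if paired:
        # Reinforcement: move strength a fraction of the way toward 1.0.
        return strength + gain * (1.0 - strength)
    # No pairing: strength loses a fixed fraction of its current value.
    return strength * (1.0 - decay)


def run_trials(schedule, strength=0.0):
    """Apply update_strength over a sequence of booleans; return the history."""
    history = []
    for paired in schedule:
        strength = update_strength(strength, paired)
        history.append(strength)
    return history


# Ten paired (acquisition) trials followed by ten CS-alone (extinction) trials.
history = run_trials([True] * 10 + [False] * 10)
print(f"after acquisition: {history[9]:.2f}")
print(f"after extinction:  {history[-1]:.2f}")
```

Under these assumptions the strength rises across the paired trials and falls again once pairing stops, which is the qualitative pattern the term reinforcement was introduced to name.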
The term reinforcement is currently used more in relation to response learning than to stimulus learning. Thorndike first introduced the concept of response reinforcement with the suggestion that responses that are “closely followed by satisfaction” will be “more firmly connected with the situation, so that, when it recurs, they will be more likely to recur.” This is the essence of Thorndike’s Law of Effect (Thorndike, 1911), a more formal phrasing of the “stamping in” of stimulus-response connections that he had introduced more than a decade earlier (Thorndike, 1898). Although Thorndike had long discussed the essential notion of reinforcement, it was not until 1933 that he (Thorndike, 1933) and Skinner (Skinner, 1933) adopted Pavlov’s term reinforcement to denote the strengthening of stimulus-response associations.
Although Skinner originally treated instrumental behavior, and used the term reinforcement, within the framework of Pavlovian conditioning, he soon (Skinner, 1937) came to see stimulus (Pavlovian) and response (“operant” or “instrumental”) learning as involving distinct principles and requiring different frameworks. He came to deny that what we think of as “goal-directed” behavior is initially elicited by an external stimulus, and argued instead that the initial acts that are subsequently shaped into instrumental behavior are randomly emitted by the organism much as particles are randomly emitted by a radioactive molecule. He renamed what had heretofore been termed a “response,” calling it an “operant,” but the term now included a controlling stimulus in the so-called three-term contingency: reinforcement strengthens a response in the presence of a controlling or “discriminative” stimulus. His new formulation was “If the occurrence of an operant is followed by presentation of a reinforcing stimulus, the strength is increased” (Skinner, 1938, p. 21). One problem with Skinner’s formulation is that his formal statement does not specify what it is that is strengthened. Elsewhere, he indicates that it is the “operant” that is strengthened, by which he means that its frequency increases.
- For Pavlov, what was strengthened was the association between two stimuli (S-S learning).
- For Thorndike, what was strengthened was the association between a stimulus and a response (S-R learning).
- For Skinner there is no relationship to be strengthened; there is no stimulus to participate in an association. There is only the operant, tied only probabilistically, not causally, to any antecedent event with which it might be associated.
Within a few pages of defining the behavior of interest as an “operant,” however, Skinner reverts to the common term response:
- “In Chapter One it was pointed out that there are two types of conditioned reflex, defined according to whether the reinforcing stimulus is correlated with a response” (Skinner, 1938, p. 62).
Throughout his third and subsequent chapters, it is “responses” or “responses per hour” that appear on the ordinates of his graphs. Having just flatly stated that the operant is a behavioral emission, not a response to an eliciting stimulus, Skinner goes on to suggest that what the animal learns is the relationship between its behavior and its consequences, a form of learning designated “response-outcome” (R-O) learning by more recent workers. Thus in the Skinnerian framework, it is the association between a response and its outcome that is learned and “reinforced.”
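Skinner's notion of an emitted operant whose frequency rises when it is followed by a reinforcer can be sketched as a small simulation. Everything quantitative here is hypothetical: the function name `simulate_operant`, the starting emission probability `p_emit`, and the fixed `boost` per reinforcer are illustration choices. For simplicity the discriminative stimulus of the three-term contingency is not modeled.

```python
import random


def simulate_operant(steps, reinforce, p_emit=0.05, boost=0.05, seed=0):
    """Simulate spontaneous emission of one operant over discrete time steps.

    p_emit : starting probability that the operant is emitted on a step
             (emitted, not elicited: no antecedent stimulus is modeled).
    boost  : hypothetical increment in emission probability per reinforcer.
    Returns [emissions in first half, emissions in second half].
    """
    rng = random.Random(seed)       # seeded so the run is reproducible
    counts = [0, 0]
    for t in range(steps):
        if rng.random() < p_emit:   # the operant happens to be emitted
            counts[0 if t < steps // 2 else 1] += 1
            if reinforce:           # a reinforcing stimulus follows it
                p_emit = min(1.0, p_emit + boost)
    return counts


unreinforced = simulate_operant(1000, reinforce=False)
reinforced = simulate_operant(1000, reinforce=True)
print("first/second half, no reinforcer:", unreinforced)
print("first/second half, reinforced:   ", reinforced)
```

The only quantity that changes in this sketch is the frequency of the operant itself, which is what Skinner means by its "strength"; what is associated with what is left unspecified, as in his formal statement.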
The first great theory of reinforcement was that it stamped in memory by reducing physiological need or imbalance (Hull, 1943). The notion was attractive because it spoke to the obvious fact that learning was the mechanism by which higher animals could meet their needs despite environmental variations that defied the mechanism of instincts. It was myopic, however, in that it dealt only with response learning and not with the stimulus learning for which the term had first been invoked. Even in the case of response learning, it was soon clear that need-reduction was not a necessary condition for reinforcement. Most telling was the demonstration that rats would learn to work for sweeteners with no nutritional value (Sheffield and Roby, 1950) or for direct electrical stimulation of certain brain pathways (Olds and Milner, 1954). Thirsty rats will compulsively lick an airstream that evaporates saliva and further dehydrates the animal (Freed and Mendelson, 1974). Indeed, animal behavior is replete with examples of reinforcement that reduces no obvious physiological need (Harlow, 1953).
Theories of reinforcement that postulate physiological mechanisms fall into two categories: one that attempts to characterize the anatomical substrate (Glickman and Schiff, 1967; Gallistel et al., 1981; Wise, 2002: see Differentiation from reward below) and one that attempts to characterize the critical neurophysiological or neurochemical events. The focus on critical events centers around the notion of memory consolidation; the stamping-in or reinforcement of memory and the consolidation of memory are conceptually indistinguishable (Landauer, 1969).
Evidence that reinforcers enhance memory consolidation comes from studies in which the reinforcer is administered following an unrelated training episode. The prototypical demonstration involved a step-down avoidance task; animals that were given access to food after training trials showed greater retention of the avoidance training than did animals not given immediate food (Huston et al., 1974). Other post-trial treatments have similar effects:
- Post-trial footshock can reinforce memory consolidation (White and Legree, 1984).
- Post-trial ingestion of sucrose is also very effective (Messier and White, 1984).
- Post-trial saccharin is less effective, even when iso-hedonic concentrations are compared (Messier and White, 1984).
The cellular basis for memory consolidation is an area of active research and hypothesis. A number of lines of evidence confirm that dopamine is important for instrumental learning with food, brain stimulation, and drug reinforcement (Wise, 2004). Moreover, post-trial dopamine release can enhance memory consolidation (White, 1996). Finally, dopamine appears to play important roles in long-term potentiation and long-term depression in mammals, models of learning and memory at the cellular level (Wise, 2004), just as serotonin plays such a role in Aplysia (Kandel, 2001). Dopamine does not, however, play an absolutely essential role. While rats treated with dopamine antagonists behave as if food, brain stimulation, and addictive psychomotor stimulants are no longer reinforcing, knockout mice that lack dopamine from birth can learn flavor preferences (Cannon and Palmiter, 2003) and, if given caffeine, can learn food-rewarded T-maze response habits (Robinson et al., 2005). Other systems thus are capable of taking over these functions in mice that are born with dopamine deficiency.
Differentiation from “reward”
The distinction between notions of reinforcement and reward is difficult because of the commonsense assumptions often associated with the latter. Many scientists use the term reinforcement and eschew the term reward on the grounds of precision and objectivity, while other scientists, also in the name of precision, use the term reward preferentially (Wise, 1989).
The most widely accepted distinction is that rewards are positive reinforcers, objects or events that are approached and not withdrawn from, whereas reinforcers need not be. A related connotation is that rewards are often taken to be psychologically hedonic, whereas reinforcers need not be. The term reward, in this perspective, is synonymous with the phrase “positive reinforcer” (White, 1989). The class of reinforcers includes negative reinforcers, a phrase with its own definitional confusions. Negative reinforcement is reinforcement that results from the termination of an ongoing—usually aversive—condition. Do we then call the aversive condition a negative reinforcer or do we call it a punisher? There is no consensus on this question in popular usage, but in the specialist literature the presentation of a painful stimulus is designated punishment and not negative reinforcement.
One group of specialists that often uses the term reward rather than the term reinforcement comprises those who study animals trained to lever-press for direct electrical stimulation of the brain. In this case the stimulation has a memory-dependent reinforcing effect but also a memory-independent, momentary “priming” effect. The priming effect energizes the animal and briefly increases the probability that the response that earned it will be repeated. This is not an effect that is stored in memory. The effectiveness of priming decays within tens of seconds, whereas the reinforcing (post-trial) effects of stimulation are remembered for days (Gallistel et al., 1974). The reinforcing effect of post-trial stimulation finds its way into long-term memory, whereas the priming effect of pre-trial presentation of the same stimulation does not. Because the animal typically responds 50-100 times a minute for the stimulation, the priming effect of each stimulation is quite significant in these studies. For this reason, the stimulation is often termed “brain stimulation reward,” a phrase that does not differentiate the priming and reinforcing actions that jointly determine the animal’s response rate (Wise, 1989). The fact that physiological psychologists tend to prefer the term reward whereas behavioral pharmacologists tend to prefer the term reinforcement (despite the fact that self-administered drugs, like self-administered brain stimulation, have both priming and reinforcing actions: Pickens and Harris, 1968) adds to the uncertainty of non-specialists as to which term should be preferred.
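A short calculation shows why priming matters so much at these response rates. At 50-100 responses per minute, successive stimulations arrive only 0.6 to 1.2 seconds apart, far less than the tens of seconds over which priming decays. The sketch below assumes, purely for illustration, that priming decays exponentially with a 20-second time constant. Neither the exponential form nor the value of `PRIMING_TIME_CONSTANT_S` comes from the article.

```python
import math

PRIMING_TIME_CONSTANT_S = 20.0  # assumed; the text says only "tens of seconds"


def inter_response_interval(responses_per_minute):
    """Seconds between successive responses at a steady rate."""
    return 60.0 / responses_per_minute


def priming_remaining(elapsed_s, tau_s=PRIMING_TIME_CONSTANT_S):
    """Fraction of the priming effect left after elapsed_s seconds,
    under the assumed exponential decay."""
    return math.exp(-elapsed_s / tau_s)


for rate in (50, 100):
    gap = inter_response_interval(rate)
    print(f"{rate} responses/min: {gap:.1f} s apart, "
          f"{priming_remaining(gap):.0%} of priming remains")
```

Under these assumptions more than 90% of the priming effect of one stimulation is still present when the next response is made, so priming and reinforcement are confounded in the observed response rate.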
Whereas positive reinforcers are often associated with conscious pleasure, it is not clear that they are necessarily so. Nor is it clear that negative reinforcers need be associated with conscious pain or distress. It is not clear that reinforcement has necessary subjective correlates. Indeed, the subjective ratings of two sets of reinforcing stimuli do not necessarily predict which set the subject will lever-press to view (Aharon et al., 2001). Human subjects report that the subjective pleasure from heroin or cocaine injections decreases dramatically with repeated drug use, yet the injections still exert strong control over their drug-seeking behavior. It is possible that there is no conscious correlate of the fundamental process of reinforcement, and that pleasure (or pain) is reported merely as the subject’s best guess as to what influenced their behavior. For example, studies in humans of the subjective correlates of motivation and reinforcement and attempts to model subjective states in animals have led to the conclusion that wanting an incentive and liking an incentive are not necessarily conscious experiences (Berridge and Winkielman, 2003).
References
Aharon I, Etcoff N, Ariely D, Chabris CF, O'Connor E, Breiter HC (2001) Beautiful faces have variable reward value: fMRI and behavioral evidence. Neuron 32:537-551.
Berridge KC, Winkielman P (2003) What is an unconscious emotion? (The case for unconscious "liking"). Cognition and Emotion 17:181-211.
Berridge KC, Robinson TE (2003) Parsing reward. Trends in Neurosciences 26:507-513. Erratum in: Trends in Neurosciences 26:581.
Cannon CM, Palmiter RD (2003) Reward without dopamine. Journal of Neuroscience 23:10827-10831.
Freed WJ, Mendelson J (1974) Airlicking: Thirsty rats prefer a warm dry airstream to a warm humid airstream. Physiology & Behavior 12:557-561.
Gallistel CR, Stellar JR, Bubis E (1974) Parametric analysis of brain stimulation reward in the rat: I. The transient process and the memory-containing process. Journal of Comparative and Physiological Psychology 87:848-859.
Gallistel CR, Shizgal P, Yeomans J (1981) A portrait of the substrate for self-stimulation. Psychological Review 88:228-273.
Glickman SE, Schiff BB (1967) A biological theory of reinforcement. Psychological Review 74:81-109.
Harlow HF (1953) Mice, monkeys, men and motives. Psychological Review 60:23-32.
Hull CL (1943) Principles of Behavior. New York: Appleton-Century-Crofts.
Huston JP, Mondadori C, Waser PG (1974) Facilitation of learning by reward of post-trial memory processes. Experientia 30:1038-1040.
Kandel ER (2001) The molecular biology of memory storage: a dialogue between genes and synapses. Science 294:1030-1038.
Landauer TK (1969) Reinforcement as consolidation. Psychological Review 76:82-96.
Messier C, White NM (1984) Contingent and non-contingent actions of sucrose and saccharin reinforcers: Effects on taste preference and memory. Physiology & Behavior 32:195-203.
Olds J, Milner PM (1954) Positive reinforcement produced by electrical stimulation of septal area and other regions of rat brain. Journal of Comparative and Physiological Psychology 47:419-427.
Pavlov IP (1928) Lectures on conditioned reflexes. New York: International Publishers.
Pickens R, Harris WC (1968) Self-administration of d-amphetamine by rats. Psychopharmacologia 12:158-163.
Robinson S, Sandstrom SM, Denenberg VH, Palmiter RD (2005) Distinguishing whether dopamine regulates liking, wanting, and/or learning about rewards. Behavioral Neuroscience 119:5-15.
Sheffield FD, Roby TB (1950) Reward value of a non-nutritive sweet taste. Journal of Comparative and Physiological Psychology 43:471-481.
Skinner BF (1933) The rate of establishment of a discrimination. Journal of General Psychology 9:302-350.
Skinner BF (1937) Two types of conditioned reflex: A reply to Konorski and Miller. Journal of General Psychology 16:272-279.
Skinner BF (1938) The Behavior of Organisms. New York: Appleton-Century-Crofts.
Thorndike EL (1898) Animal intelligence: An experimental study of the associative processes in animals. Psychological Monographs 8:1-109.
Thorndike EL (1911) Animal intelligence. New York: Macmillan.
Thorndike EL (1933) A theory of the action of the after-effects of a connection upon it. Psychological Review 40:434-439.
White NM (1989) Reward or reinforcement: what's the difference? Neuroscience & Biobehavioral Reviews 13:181-186.
White NM (1996) Addictive drugs as reinforcers: multiple partial actions on memory systems. Addiction 91:921-949.
White NM, Legree P (1984) Effect of post-training exposure to an aversive stimulus on retention. Physiological Psychology 12:233-236.
Wise RA (1989) The brain and reward. In: The Neuropharmacological Basis of Reward (Liebman JM, Cooper SJ, eds), pp 377-424. Oxford: Oxford University Press.
Wise RA (2002) Brain reward circuitry: Insights from unsensed incentives. Neuron 36:229-240.
Wise RA (2004) Dopamine, learning and motivation. Nature Reviews Neuroscience 5:483-494.
- Valentino Braitenberg (2007) Brain. Scholarpedia, 2(11):2918.
- Nestor A. Schmajuk (2008) Classical conditioning. Scholarpedia, 3(3):2316.
- Howard Eichenbaum (2008) Memory. Scholarpedia, 3(3):1747.
- John E. R. Staddon and Yael Niv (2008) Operant conditioning. Scholarpedia, 3(9):2318.
- Florentin Woergoetter and Bernd Porr (2008) Reinforcement learning. Scholarpedia, 3(3):1448.
- Wolfram Schultz (2007) Reward. Scholarpedia, 2(3):1652.
- Wolfram Schultz (2007) Reward signals. Scholarpedia, 2(6):2184.
See also: Classical Conditioning, Operant Conditioning, Reinforcement Learning, Reward, Reward Signals