A stimulus that alters the probability or intensity of response

'Reinforcer' (from the Latin fortis, 'strong'), which is an agent or event, and 'reinforcement', which is the process that this agent is assumed to trigger and sustain, are among the most loaded terms in the behavioural literature. This is not because the concepts involved are necessarily more complex than certain other behavioural concepts, but because their discussion frequently connotes particular theoretical constructs, some of which have gravitated toward the status of a religious sect, with all the convictions and emotions involved ("paradigm, "zeitgeist). The reinforcer, via the postulated reinforcement, can shape behaviour and also maintain the response level once achieved; if the reinforcer is removed, the behaviour risks "extinction. The reinforcer itself is traditionally not considered to produce learning, only to augment response, which promotes "associations. Examples of reinforced learning are provided in many contexts in this book (e.g. Figures pp. 70, 76, 129, 131; "instrumental conditioning). Whether this dissociation between the reinforcing and 'teaching' actions of stimuli holds water at "reduced "levels of description is debatable (e.g. Shimizu et al. 2000; Berman and Dudai 2001).1 'Stimulus' in the above definition refers to stimuli presented intentionally to the "subject, to incidental stimuli, or to stimuli that result from the exogenous and endogenous effects of the "subject's actions; in the nervous system all events, including actions, translate into stimuli. Hence stimuli in general can have both eliciting and reinforcing functions, and the experienced experimenter makes good use of both.

On the history of the concept, two main approaches can be distinguished in pre-scientific thinking. One is the totalitarian, 'exogenous' approach, characteristic of certain religions and regimes. It states that conforming to the rules is bound to bring reward by a deity or king, whereas opposing the rules will result in inevitable punishment. The other, 'endogenous' attitude attempts to identify drives that shape human behaviour independently of external authority. The most popular example is 'hedonism', the philosophical doctrine holding that only what is pleasant is intrinsically good; it is epitomized in the words of Diogenes Laertius (third century AD): 'All living creatures from the moment of birth take delight in pleasure and resist pain from natural causes independent of reason' (Long 1986).

The scientific treatment of reinforcers and reinforcements drew in its early days from two conceptual sources. One was philosophical contemplation of the attributes that foster the association of mental events, such as similarity, contrast, and contiguity. This philosophy can be traced back to Aristotle, but the main influence in the early days of experimental psychology was that of British Associationism ("associative learning). The other influential conceptual source was Darwinian evolutionism (Wilcoxon 1969; Boakes 1984). Evolution, so goes the Darwinian view, is moulded by natural selection of adaptive traits among the pool of genetically generated variations. Spencer, Bain, and Baldwin were the first to adapt this selectionist view to the theory of learning (ibid.; Cason 1932).2 In a nutshell, the idea was that the adaptive processes of the ontogenesis of the individual's behaviour parallel the processes of phylogenesis, in that pleasurable states are selected among other states of the organism, whereas noxious states are selected against. This view culminated in Thorndike's 'law of effect', a "generalization concerning causality and feedback, which posits that behavioural responses that lead to gratification are reinforced by their effect and repeated, whereas those that result in discomfort recur less and less (Thorndike 1911; "instrumental conditioning).

The law of effect was remarkably influential in the theory of reinforcement, although within different conceptual frameworks. A prominent development was the consideration of reinforcement in stimulus terms in the elaborate Skinnerian system of operant conditioning (Skinner 1938; Ferster and Skinner 1957).

Skinner's approach supported the 'empirical law of effect', which is similar to the original law but devoid of the theoretical assumptions about internal states, which orthodox "behaviourists shy away from. Another influential theory considered reinforcement in terms of drive reduction. Drives are hypothetical endogenous processes that impel an individual to act on the world or react to it. The drive reductionists proposed that reinforcements satisfy because they reduce drives. In other words, reinforcers promote "homeostasis. The 'law of primary reinforcement', formulated by Hull (1943), epitomizes the idea: 'Whenever a reaction takes place in temporal contiguity with an afferent receptor impulse resulting from the impact upon a receptor of a stimulus energy, and this conjunction is followed closely by the diminution in a need (and the associated diminution in the drive and in the drive receptor discharge), there will result an increment in the tendency for that stimulus on subsequent occasions to evoke that reaction' (ibid.; Hull's symbolic notations were omitted from the quote for simplicity) (see also: Miller and Dollard 1941; Birney and Teevan 1961; Wilcoxon 1969).

The reader should not be misled, however, into thinking that all the theories of reinforcement were related to the concept of effect. Many prominent thinkers considered the strengthening of response in the absence of the action-effect assumptions. The best example is Pavlov (1927). He used 'reinforcement' to account for the action of the unconditioned stimulus in "classical conditioning, which has nothing to do with the idea of action-effect.3

Over the years, additional theoretical approaches to the problem of reinforcement have emerged. A notable example is the proposal that reinforcement is primarily a function of the value of the response, not of the stimulus, and this response value is greater if the opportunity to perform the behaviour is smaller (Premack 1965). For example, water is an effective reinforcer for a thirsty rat because it reinforces a highly valued behaviour, drinking. Responses more valued by the organism reinforce those that are less valued. This also implies that the reinforcement value is relative. The thirsty rat will increase activity in a running wheel if this is followed by delivery of water; but a water-satiated, running-deprived rat will increase its drinking if this gets it access to running (Premack 1962). Some modern approaches to reinforcement abandon the simple cause-and-effect feedback loop of the effect theories, and take into account complex system properties of brains and organisms, drawing from system theory, cognition, and ecology (Timberlake 1993).
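The logic of Premack's relative-value principle can be sketched in a few lines of code. This is only an illustration, not part of the original account: the function name and the baseline probabilities are invented for the example, standing in for the free-access probability of each behaviour.

```python
def premack_reinforcers(baseline_probs, target):
    """Under the Premack principle, a response can be reinforced by
    access to any response with a higher baseline (free-access)
    probability. Returns the candidate reinforcing responses."""
    return [r for r, p in baseline_probs.items()
            if p > baseline_probs[target]]

# Thirsty rat: drinking dominates, so access to water can
# reinforce wheel-running.
print(premack_reinforcers({"drinking": 0.7, "running": 0.2}, "running"))
# -> ['drinking']

# Water-satiated, running-deprived rat: the ordering reverses,
# and access to running can reinforce drinking.
print(premack_reinforcers({"drinking": 0.1, "running": 0.6}, "drinking"))
# -> ['running']
```

The same stimulus thus has no fixed reinforcement value; its value is relative to the current hierarchy of response probabilities.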

A major chapter in the analysis of reinforcers and reinforcement began with the first systematic attempt to identify brain circuits of reward and punishment. In a "classic series of experiments it was shown that rats could become engaged in intensive self-stimulation via chronically implanted brain electrodes, provided these electrodes are inserted into specific sites in the brain, such as the septal area and the medial forebrain bundle (Olds and Milner 1954; Olds 1969; "dopamine).4 This gave a boost not only to science fiction, but also to the cartography of the brain in terms of circuits that encode the "internal representation of reinforcers and compute reinforcements (Livingston 1967; Robbins and Everitt 1996; "limbic system).

Applied reinforcers and postulated reinforcements vary greatly with the type of experimental "system, "assay, and protocol. It is, however, methodologically useful to consider some "generalizations. The "dimensions listed below refer to selected attributes of the reinforcing event, of inferred reinforcement processes, or of experimental manipulations used to apply the reinforcer in order to exercise the reinforcement. In practice, all these factors are often intermixed in the design and execution of an experiment.

1. Valence. A reinforcer, the addition of which strengthens the response, is termed a 'positive reinforcer'. A reinforcer, the removal of which strengthens the response, is termed a 'negative reinforcer'. A candidate reinforcer that leaves the response unaltered is a 'neutral reinforcer'. Many authors use the term positive reinforcer synonymously with 'reward'. This is basically OK, although reward may be delivered without affecting behaviour, as any parent knows, whereas a positive reinforcer by definition affects behaviour. It is not OK, however, to use 'negative reinforcer' interchangeably with 'punishment'. Indeed, both refer to aversive stimuli. But whether an aversive stimulus is a negative reinforcer or punishment depends on the stimulus-response contingencies. Hence an electric shock is a negative reinforcer if its removal is contingent on the response, a removal that is certainly not punishment, but a punishment if its application is contingent on the response. This is the appropriate point to add that the valence of reinforcers is not always apparent at the time of the experiment. There are situations in which the subject appears to be reinforced in spite of the absence of an apparent reinforcer (e.g. "latent learning, "observational learning). In these cases it makes sense to talk about a 'latent reinforcer'. In the theory of learning there are views that learning is impossible if there is no reinforcer whatsoever; therefore, if there is no apparent reinforcer, there must be a latent one.
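The contingency logic above can be condensed into a small decision function. This is merely a restatement of the definitions in the text, not an established taxonomy; the function name and argument labels are invented for the example.

```python
def classify_contingency(operation, effect_on_response):
    """Name the term for a stimulus-response contingency.
    operation: what happens to the stimulus contingent on the
        response ('added' or 'removed').
    effect_on_response: 'strengthened', 'weakened', or 'unchanged'.
    """
    if effect_on_response == "unchanged":
        return "neutral reinforcer"
    if effect_on_response == "strengthened":
        return ("positive reinforcer" if operation == "added"
                else "negative reinforcer")
    # A stimulus whose response-contingent application weakens the
    # response is a punishment; other cases are not covered here.
    if operation == "added":
        return "punishment"
    return "not covered by the definitions above"

# The same electric shock, under two different contingencies:
print(classify_contingency("removed", "strengthened"))  # negative reinforcer
print(classify_contingency("added", "weakened"))        # punishment
```

Note that the classification turns entirely on the contingency and the behavioural outcome, not on the stimulus itself.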

2. Magnitude. This refers to quality, or quantity, or both. Reinforcers differ in their quality; some types of stimuli are more effective in a given situation than others, e.g. food vs. toys to a hungry subject (Jarvik 1953; Garcia et al. 1968). Reinforcers could also differ in quantity, in terms of intensity or schedule of delivery.

3. Hierarchy. A reinforcer that has "a priori reinforcing properties to an individual of the species, is a 'primary reinforcer'. A reinforcer whose reinforcing properties are due to association with a primary reinforcer is a 'secondary reinforcer', or, depending on the order of association, 'higher-order reinforcer' (compare with higher-order conditioning in "classical conditioning).

4. Schedule. The schedule in which the reinforcer/reinforcement is delivered is crucial to the behavioural outcome. Generally speaking, reinforcement could be delivered continuously, so that every response is reinforced, or intermittently, so that some responses are reinforced and some are not. Contrary to the intuition of newcomers to the discipline, intermittent reinforcement is often more effective than continuous reinforcement; we have already encountered this in "experimental extinction (the so-called PREE; see there). Four main types of schedules are common in intermittent reinforcement: (a) fixed ratio, in which a response is reinforced upon completion of a fixed number of responses; (b) variable ratio, in which the reinforcement is scheduled according to a random series of response/reinforcement ratios; (c) fixed interval, in which the first response occurring after a given interval of time measured from the preceding reinforcement is reinforced; and (d) variable interval, in which reinforcements are scheduled according to a random series of intervals (Ferster and Skinner 1957; on the special case of delayed reinforcement and its behavioural consequences, see also Renner 1964).
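The four schedules can be sketched as simple rules deciding whether a given response is reinforced. The class names and random-draw ranges below are assumptions made for illustration; only the decision rules themselves come from the definitions above.

```python
import random

class FixedRatio:
    """(a) Fixed ratio: reinforce upon completion of every n-th response."""
    def __init__(self, n):
        self.n, self.count = n, 0
    def respond(self):
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False

class VariableRatio(FixedRatio):
    """(b) Variable ratio: as (a), but the required count is redrawn
    at random (here uniformly, averaging n) after each reinforcement."""
    def __init__(self, n, rng=None):
        self.mean, self.rng = n, rng or random.Random(0)
        super().__init__(self.rng.randint(1, 2 * n - 1))
    def respond(self):
        reinforced = super().respond()
        if reinforced:
            self.n = self.rng.randint(1, 2 * self.mean - 1)
        return reinforced

class FixedInterval:
    """(c) Fixed interval: reinforce the first response occurring at
    least t time units after the preceding reinforcement."""
    def __init__(self, t):
        self.t, self.last = t, 0.0
    def respond(self, now):
        if now - self.last >= self.t:
            self.last = now
            return True
        return False

class VariableInterval(FixedInterval):
    """(d) Variable interval: as (c), but the interval is redrawn at
    random (averaging t) after each reinforcement."""
    def __init__(self, t, rng=None):
        self.mean, self.rng = t, rng or random.Random(0)
        super().__init__(self.rng.uniform(0, 2 * t))
    def respond(self, now):
        reinforced = super().respond(now)
        if reinforced:
            self.t = self.rng.uniform(0, 2 * self.mean)
        return reinforced

# A fixed-ratio-3 schedule reinforces every third response:
fr = FixedRatio(3)
print([fr.respond() for _ in range(6)])
# -> [False, False, True, False, False, True]
```

The ratio schedules count responses, whereas the interval schedules clock time since the last reinforcement; this difference alone produces markedly different response patterns in real subjects.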

5. Level. At the behavioural level, reinforcers are sensory input. At the circuit level, they are input from other circuits in the brain, or chemical messages such as hormones from other parts of the body. At the cellular and "synaptic level, reinforcers are encoded in "neurotransmitters, neuromodulators, ion currents, and other chemical or electrical messages.

It is currently possible to consider the neural encoding of reinforcers, and the computation of reinforcement, in terms of identified circuits, synapses, and molecules (e.g. Robbins and Everitt 1996; Schultz et al. 1997; Picciotto et al. 1998; Menzel et al. 1999; Corbit and Balleine 2000; Shimizu et al. 2000). Key brain structures involve "limbic-corticostriatal-pallidal circuitry (Robbins and Everitt 1996) and diffuse neuromodulatory systems (Schultz et al. 1997). This accumulated knowledge contributes to the theory of brain and behaviour, and to the understanding and treatment of pathological conditions in which abused reinforcements result in bad "habits (Picciotto et al. 1998; Robbins and Everitt 1999). We should also become attuned to the possibility that soon there will be a need to apply this knowledge to the effective and safe training of smart robots (Saksida et al. 1997).
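One influential formalization behind the work cited above (Schultz et al. 1997) treats the phasic dopamine signal as a reward-prediction error in temporal-difference learning. A minimal sketch of that computation, with state names and learning parameters chosen arbitrarily for the example, might look like this:

```python
def td_update(V, s, s_next, reward, alpha=0.1, gamma=0.9):
    """One temporal-difference update. The prediction error
    delta = r + gamma*V(s') - V(s) is the quantity that phasic
    dopamine activity is proposed to report."""
    delta = reward + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)
    V[s] = V.get(s, 0.0) + alpha * delta
    return delta

# A cue reliably followed by reward: over repeated trials the
# prediction error migrates from the reward to the cue, and the
# cue itself acquires predictive value.
V = {}
for _ in range(200):
    td_update(V, "cue", "reward", reward=0.0)
    last_delta = td_update(V, "reward", "end", reward=1.0)

print(round(V["reward"], 2))  # ~1.0: the reward is fully predicted
print(round(V["cue"], 2))     # ~0.9: value has propagated back to the cue
print(round(last_delta, 3))   # ~0.0: no surprise once learning is complete
```

The vanishing error at the time of the (now predicted) reward, and its transfer to the earlier cue, mirror the behaviour of midbrain dopamine neurons reported in that literature, and also give a computational reading of the secondary reinforcers discussed under Hierarchy above.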

Selected associations: Algorithm, Instrumental conditioning, Model, Neurotransmitter, Stimulus

1'Teaching' as used here can refer to instructive, adjustive, or selective actions, as explained under 'stimulus.

2As in other entries in this book, unless otherwise indicated, theory does not mean a formal physical theory, but rather a conceptual framework for further hypotheses and experiments (see in 'algorithm).

3Note that in this case the reinforcer does enter into the association. The use of the reinforcer/reinforcement terminology in the context of classical conditioning is considered by some authorities as obsolete.

4The original observation was fortuitous: James Olds noted that a rat keeps returning to the place on the table top where it had been when an electrical stimulus was applied to its brain via a chronically implanted electrode. This has led to experiments in which rats were trained to press a lever to self-deliver the stimulus. For more on the history of this important chapter in the neurobiology of reinforcement, drive, motivation, and learning, as well as on the 'real-life events beyond the 'culture of science, see Milner (1989).
