Types of training protocols in which reinforcement is made contingent upon performance of the proper behaviour.1

In "classical conditioning the subject learns relations among "stimuli; in instrumental conditioning (also

'instrumental learning') it learns the impact of its actions on the world. The scientific interest in this type of learning can be traced to Bain (1859), who, impressed by his experience in the Scottish Highlands, noted 'association of movement with the effects produced on outward things' in animals trying to leap over obstacles:'... spontaneous impulses of locomotion lead them to make attempts ... any attempt causing a hurt is desisted from; after a number of trials and failures, the proper adjustment is come to, and finally cemented'. Morgan (1894; see "Ockham's Razor) reported experiments on trial-and-error learning in birds and in Tony, his fox terrier. But it is to Thorndike (1911) that we owe the introduction of the systematic study of instrumental conditioning into the laboratory.

In a typical experiment, Thorndike placed a hungry cat into a puzzle box, a small crate hammered together from wooden slats (no fat grants in those days) (Figure 38). A piece of fish was placed visibly outside. The box had systems of pulleys, strings, and catches so arranged that pulling a loop, or pressing a lever, allowed a door to fall open, and the cat to jump out and devour the fish. The cat, restlessly exploring the box, would operate the mechanism by chance. As the door fell open immediately, the cat would apparently learn to associate its deeds with the outcome, resulting in a striking improvement in performance over time. The behaviour was not necessarily related to the mechanism of door opening; in some experiments Thorndike trained his cats to lick themselves in order to get out. Note, however, that, whereas in classical conditioning the experimenter controls (ideally) all the experimental parameters, in instrumental conditioning there is a more democratic division of labour: the subject decides which response to emit, and the experimenter decides how and when to reinforce it.

Fig. 38 Two of the puzzle boxes used by Thorndike in his studies of instrumental conditioning. For more on Thorndike's puzzle boxes, see Thorndike (1911); Chance (1999). (Courtesy of Yale University Library.)

Instrumental conditioning is so termed because the individual's behaviour is instrumental in the materialization of the reinforcement. This family of conditioning *paradigms is also known by other names: conditioned reflex type II (Miller and Konorski 1928; they modified Pavlov's paradigm and trained a dog to flex a leg in response to a buzzer in order to get food); Thorndikian conditioning; type R conditioning (R for response, Skinner 1938); trial-and-error conditioning; operant conditioning (operant because the spontaneous behavioural response operates on the environment, and is in turn affected by the environmental effects; ibid.);2 and Skinnerian conditioning, after Skinner, whose problem boxes, descendants of Thorndike's puzzle boxes, came to epitomize the social and educational philosophy that all behaviour is malleable by operant conditioning ('operant *behaviourism'; Skinner 1984).3 A *taxonomy of instrumental conditioning lists no fewer than 16 different subtypes, differing in the relationship of the behaviour (or its omission) to the outcome (or its prevention), and in the presence or absence of an antecedent signalling stimulus (Woods 1974). Among the most popular subtypes are: signalled reward conditioning, in which a signal signifies that the reward will follow if the behaviour is executed (a 'go' situation); signalled omission reward conditioning, which is similar to the above except that the behaviour has to be withheld (a 'no-go' situation); active avoidance conditioning, in which, following a signal, punishment is avoided provided the response is made; and passive avoidance conditioning, in which punishment is avoided if the response is withheld.
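The taxonomy just described can be read as a small combinatorial space. The sketch below enumerates such a space; note that the four binary factors are an assumption based on this entry's description (behaviour or its omission, outcome or its prevention, signal present or absent, appetitive or aversive outcome), not Woods's (1974) actual table, and the printed labels are illustrative only.

```python
from itertools import product

# A toy enumeration of a 16-subtype taxonomy of instrumental conditioning.
# ASSUMPTION: the four binary factors are an illustrative reading, not
# Woods's (1974) own classification.
signals   = ["signalled", "unsignalled"]   # antecedent stimulus present or not
outcomes  = ["reward", "punishment"]       # hedonic sign of the outcome
relations = ["produced", "prevented"]      # the response produces or prevents it
actions   = ["emitted", "withheld"]        # contingency on emission or omission

for i, (sig, out, rel, act) in enumerate(product(signals, outcomes, relations, actions), 1):
    print(f"{i:2d}. {sig}: {out} {rel} if response is {act}")

# e.g. 'signalled: reward produced if response is emitted' ~ signalled reward
# conditioning ('go'); 'signalled: punishment prevented if response is
# withheld' ~ passive avoidance.
```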

What is it that gets associated in instrumental conditioning? The theory of instrumental conditioning, like that of classical conditioning, has developed remarkably since the introduction of the paradigm. Thorndike himself extracted from his experimental findings a 'law' (*algorithm) that he called 'the law of effect'. This influential 'law' appears in several forms in Thorndike's writings. Here is the formulation at the behavioural and *system *levels: 'Of several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected with the situation, so that, when it recurs, they will be more likely to recur; those which are accompanied or closely followed by discomfort to the animal will, other things being equal, have their connections with that situation weakened, so that, when it recurs, they will be less likely to recur. The greater the satisfaction or discomfort, the greater the strengthening or weakening of the bond' (Thorndike 1911). And here is the formulation at the circuit, cellular, or *synaptic levels: 'Connections between neurones are strengthened every time they are used with indifferent or pleasurable results and weakened every time they are used with resulting discomfort. This law includes the action of two factors, frequency and pleasurable result. It might be stated in a compound form as follows: (1) The line of least resistance is, other things being equal, that resulting in the greatest satisfaction to the animal, and (2) the line of least resistance is, other things being equal, that oftenest traversed by the nerve impulse. We may call (1) the Law of Effect, and (2) the Law of Habit' (Thorndike 1907; italics in the original).4 Two points deserve particular attention. First, the law of effect is an adaptation of Darwinian selectionism (*a priori, *stimulus). Second, Thorndike's attitude was rather modern: he explicitly treated learning as a multilevel phenomenon, and well appreciated that whatever is observed at the behavioural level is manifested at the neuronal level as well, and vice versa.
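Read as an *algorithm, the law of effect invites a minimal simulation. The sketch below is a hypothetical illustration (the response names, parameters, and update rule are mine, not Thorndike's formalism): each response available in the situation carries a 'bond strength', responses are emitted with probability proportional to their strength, and the emitted bond is strengthened by satisfaction and weakened by discomfort.

```python
import random

# Minimal 'law of effect' sketch: one situation, three candidate responses.
# Hypothetical parameters; not a model fitted to Thorndike's data.
strengths = {"pull_loop": 1.0, "press_lever": 1.0, "scratch": 1.0}

def emit_response():
    """Select a response with probability proportional to its bond strength."""
    total = sum(strengths.values())
    r = random.uniform(0, total)
    for response, s in strengths.items():
        r -= s
        if r <= 0:
            return response
    return response  # numerical edge case: return the last response

def law_of_effect(response, satisfaction, rate=0.5):
    """Strengthen the situation-response bond after satisfaction (> 0),
    weaken it after discomfort (< 0); larger effects change it more."""
    strengths[response] = max(0.1, strengths[response] + rate * satisfaction)

for trial in range(200):
    resp = emit_response()
    # Only operating the latch ('pull_loop') opens the door and yields fish.
    law_of_effect(resp, satisfaction=1.0 if resp == "pull_loop" else -0.2)

print(strengths)  # 'pull_loop' comes to dominate, as escape latencies shorten
```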

The law of effect continues to drive research to this day in both the behavioural and the brain sciences (e.g. Ahissar et al. 1992). It does not, however, have the power to explain all the processes that occur in instrumental conditioning. Any attempt to understand these processes and their interaction must take into account three types of elements that play a part in every instrumental conditioning situation: the response whose probability or intensity is modified, the reinforcer that is contingent upon this response, and the stimulus in the presence of which this contingency takes place. Three types of associative configurations have dominated the theoretical discussion of the interaction of response, reinforcer, and stimulus in instrumental conditioning (Colwill and Rescorla 1986): (a) stimulus-response association ('S-R theories'; e.g. Guthrie 1952; *associative learning); (b) Pavlovian S-reinforcer association, occurring in parallel with the S-R association ('two-process theories'; Rescorla and Solomon 1967); and (c) response-reinforcer, or action-outcome association ('A-O theories'; Colwill and Rescorla 1986; Dickinson and Balleine 1994).

Experimental evidence could be provided for the involvement of each of the above associative structures in instrumental conditioning, but, at the same time, none of these postulated associations could serve as a sufficient, much less an exclusive, explanation (*criterion) for the behaviour. It is likely, therefore, that in instrumental conditioning *internal representations are formed that link the three elements, S, R, and O, with associative weights that depend on the task, *context, and a priori knowledge of the subject. As an example to illustrate that instrumental conditioning involves more knowledge than meets the naive eye, consider the following experiment (Colwill and Rescorla 1985): rats were trained on two different instrumental responses, lever pressing and chain pulling, each associated with a different reinforcer, sucrose solution or food pellets. Then each rat received pairing of one of the reinforcers with a malaise-inducing injection of LiCl (*conditioned taste aversion, CTA), to decrease the hedonic value of that reinforcer (this is called 'stimulus devaluation'; see *classical conditioning, *cue). When the rats were again given access to the instrumental responses in the absence of the reinforcers, each rat preferred to make the instrumental response whose reinforcer had not been devalued by CTA. This suggests that the rats encoded the reinforcer identity and the A-O contingency as part of their knowledge about the instrumental learning situation, and that this knowledge was susceptible to post-training experience. The picture that emerges is hence of instrumental conditioning leading to the *acquisition of specific knowledge bases, rather different from the picture depicted by the early minimalist S-R theories. This is similar to the current *zeitgeist in the study of classical conditioning. Further, from the above discussion it becomes evident that, in spite of the differences in the training protocols and in the particular types of knowledge, the processes that occur in instrumental conditioning overlap some of those that occur in classical conditioning (on this issue, see also Dickinson 1980; Mackintosh 1983).5
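The inferential logic of this devaluation experiment can be made explicit in a few lines of code. The sketch below uses hypothetical values and names and is not Colwill and Rescorla's analysis; it simply contrasts a learner that stores A-O knowledge with a pure S-R learner, showing that after reinforcer devaluation only the former shifts its choice in an extinction test, which is what the rats did.

```python
# Sketch of the inference behind Colwill and Rescorla (1985).
# All values are hypothetical; the point is the contrast between learners.

# An A-O learner encodes which outcome each action earns, plus outcome values.
action_outcome = {"lever_press": "sucrose", "chain_pull": "pellet"}
outcome_value  = {"sucrose": 1.0, "pellet": 1.0}

# A pure S-R learner encodes only situation-response strengths,
# with no representation of reinforcer identity.
sr_strength = {"lever_press": 1.0, "chain_pull": 1.0}

# Devaluation: sucrose is paired with LiCl-induced malaise (CTA).
outcome_value["sucrose"] = -1.0

# Extinction test: no reinforcers are delivered, so S-R strengths cannot
# change; only a learner that consults outcome value can shift its choice.
ao_value = {action: outcome_value[outcome]
            for action, outcome in action_outcome.items()}
print("A-O learner now prefers:", max(ao_value, key=ao_value.get))  # chain_pull
print("S-R strengths, unchanged by devaluation:", sr_strength)
```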

The incentive to understand the computational theories, algorithms, and biological hardware (*level) of instrumental conditioning is high. Much of what we learn in our lifetime is by trial and error. Furthermore, together with *observational learning, instrumental conditioning is contemplated as the method of choice for training the smart robots that will share this planet with us in the future (e.g. Saksida et al. 1997), and we had better find the way to teach them efficiently and, what is even more important, the right things only. The neurobiology of instrumental conditioning is, however, still fragmentary. At the system level, brain circuits have been identified that perform selected types of computations in instrumental conditioning. Special interest has been dedicated in recent years to those circuits that encode the reinforcement and the A-O associations. These include *cortico-*limbic-striatal-pallidal circuits (Robbins and Everitt 1997), with specific structures assumed to play distinct roles such as anticipation of reward, computing the deviation of the actual from the expected outcome, control of the response and its adaptive correction, and possibly also representation of A-O causality (Schultz et al. 1997; Tremblay and Schultz 1999; Balleine and Dickinson 2000; Baxter et al. 2000; Corbit and Balleine 2000; Corbit et al. 2001; see also *dopamine, *habit, *reinforcer). As to the cellular, synaptic, and molecular mechanisms of instrumental conditioning, their study could benefit from the use of *simple systems and, although the analysis of trial-and-error learning in simple systems is so far less developed than that of classical conditioning, some promising preparations are already available (Cook and Carew 1989; Chen and Wolpaw 1995; Nargeot et al. 1997). We might expect the molecular mechanisms of instrumental conditioning to be similar to those of classical conditioning and many other forms of learning; the characteristic instrumental contingencies are probably encoded at the circuit level.
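The computation highlighted above, the deviation of the actual from the expected outcome, is commonly formalized as a temporal-difference prediction error. The following is a minimal sketch of the generic TD(0) update; the states, parameters, and task structure are assumptions for illustration, not a model taken from the works cited.

```python
# Temporal-difference prediction error: a standard formalization of
# 'the deviation of the actual from the expected outcome' (cf. Schultz
# et al. 1997). A generic TD(0) sketch, not a model from the cited papers.

gamma = 0.9   # discount factor (assumed)
alpha = 0.1   # learning rate (assumed)

V = {"signal": 0.0, "reward_state": 0.0, "end": 0.0}  # state values

def td_update(state, next_state, reward):
    """delta = r + gamma*V(s') - V(s); the dopamine-like prediction error."""
    delta = reward + gamma * V[next_state] - V[state]
    V[state] += alpha * delta
    return delta

for trial in range(500):
    td_update("signal", "reward_state", reward=0.0)  # signal precedes reward
    delta_at_reward = td_update("reward_state", "end", reward=1.0)

# Early in training the reward is surprising (large error at reward); with
# repetition the error there shrinks and the expectation transfers back to
# the predictive signal, as V('signal') approaches gamma * V('reward_state').
print(V, "final error at reward:", round(delta_at_reward, 3))
```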

Selected associations: Associative learning, Classical conditioning, Maze, Reinforcer, Skill

1Excluding, of course, consequences that eliminate the opportunity to perform this behaviour again.

2Operant conditioning is sometimes distinguished from instrumental conditioning in that the latter involves distinct responses within a structured task, whereas the former refers in addition to repeated emission of spontaneous behaviour that results in obtaining the goal. This distinction, however, is not systematically honoured in the literature and will not be further elaborated here.

3Operant behaviourism aspired to explain all types of behaviour, including human language (Skinner 1957). This was belligerently confuted by linguists and cognitive psychologists alike (Chomsky 1959). By the way, Skinner's methods found their way even into top-secret war projects: during the Second World War he was engaged in an attempt to train pigeons to guide missiles by operating problem boxes in the warhead (Skinner 1960).

4On precedents of Thorndike's law, and on its place in the history of the behavioural sciences, see Cason (1932) and Wilcoxon (1969); see also *reinforcer. The empirical *generalization that the consequence of a response is an important determinant of whether this response will be learned is called 'the empirical law of effect', and can be used independently of Thorndike's theoretical assumptions (*reinforcer).

5The relevance of classical to instrumental conditioning has multiple facets, and all should be taken into account in the interpretation of the data. One is the postulated processes shared by these two types of learning. A very different facet is the possibility that conditioning that is considered instrumental is actually Pavlovian. Consider, for example, a pigeon trained to peck an illuminated disk in a Skinner box to obtain food. The conventional interpretation is that food delivery is contingent upon pecking, i.e. an operant conditioning situation. However, if the experimenter simply illuminated the disk before each food delivery, irrespective of the pigeons' behaviour, the pigeons still pecked at the light as if there were an instrumental contingency (Jenkins and Moore 1973). Pecking is a component of the pigeon's consummatory response, and the contingency was probably between a conditioned stimulus (the illuminated disk) and an unconditioned one (food). Hence, in this case, Skinner's famous pigeons were disguised Pavlovian dogs.
