Killeen, Peter R. (1994) Mathematical Principles of Reinforcement. Behavioral and Brain Sciences 17 (1) 105-172.

This is the unedited penultimate draft of a BBS target article that has been accepted for publication (Copyright 1994: Cambridge University Press) and is being circulated for Open Peer Commentary. This preprint is for inspection only, to help prospective commentators decide whether or not they wish to prepare a formal commentary. Please do not prepare a commentary unless you have received the hard copy, invitation, instructions and deadline information.

Mathematical Principles of Reinforcement: Based on the Correlation of Behaviour with Incentives in Short-Term Memory

Peter R. Killeen
Psychology Department
Arizona State University
Tempe AZ 85287-1104


Reinforcement, memory, coupling, contingency, contiguity, tuning curves, activation, schedules, trajectories, response rate


Effective conditioning requires a correlation between the experimenter's definition of a response and an organism's, but an animal's perception of its behavior differs from ours. Various definitions of the response are explored experimentally using the slopes of learning curves to infer which comes closest to the organism's definition. The resulting exponentially weighted moving average provides a model of memory which grounds a quantitative theory of reinforcement in which incentives excite behavior and focus the excitement on the responses present in memory at the same time. The correlation between the organism's memory and the behavior measured by the experimenter is given by coupling coefficients derived for various schedules of reinforcement. For simple schedules these coefficients can be concatenated to predict the effects of complex schedules and can be inserted into a generic model of arousal and temporal constraint to predict response rates under any scheduling arrangement. According to the theory, the decay of memory is response-indexed rather than time-indexed. Incentives displace memory for the responses that occur before them and may truncate the representation of the response that brings them about. This contiguity-weighted correlation model bridges opposing views of the reinforcement process and can be extended in a straightforward way to the classical conditioning of stimuli. Placing the short-term memory of behavior in so central a role provides a behavioral account of a key cognitive process. 

What does reinforcement strengthen? Responses, to be sure, but responses as we define them, or as the animal's nervous system defines them? Beginning with his earliest articles, Skinner (1935) cogently argued that stimuli, responses, and reinforcers (the constituents of the operant, and the key terms in an experimental analysis of behavior) must be defined functionally, and thus interdependently. In his experiments, however, he immersed subjects in arbitrary lights and sounds, levers and keys, snacks and sips, confident that the operant would select itself out of the stream of events in time. The modern enterprise of behavior analysis succeeds to the extent that our definitions and instrumentations of these key terms have themselves been selected to respect the animals' definitions, as best we can intuit them (Bolles 1983; Thompson & Zeiler, 1986; Timberlake & Lucas 1990). In the extra-laboratory setting we have less control of all three terms, and make correspondingly greater use of hedges such as "the stimulus as coded," "response class," and "motivational effects." These qualifications, even when apposite, are usually ad hoc. In this paper I derive a functional definition of the response. I then demonstrate the wide-ranging implications of the appropriate use of the organism's definition, for it directly gives shape to many of the laws of behavior.

Reinforcement controls aspects of the response along multiple dimensions, such as locus, force, tempo, and topography. Experimental contingencies, the conditions set by the experimenter as criteria for reinforcement, may shape these aspects either in concert with, or in opposition to the fundamental excitatory property of reinforcement (Zeiler 1979). The present research disentangles these forces with an experimental analysis of the response. With that characterized, it then derives models of how the response is coupled to reinforcement by experimental contingencies. These, in concert with a general model of arousal/elicitation under time constraints, yield a general theory of reinforcement.

1. The spread of effect

A reinforcer's effects are not limited to just the response that immediately preceded it. Response rates may be readily increased or decreased by requiring the pause before the last response to exceed some criterion. It follows that the reinforcement must reach back toward the penultimate response. In fact, the contingencies have been shown to extend even farther (Catania 1971; Killeen 1969), but with diminishing effectiveness. The decrease in effectiveness with temporal distance is a manifestation of the delay of reinforcement gradient, a concept that has played a key role in theoretical accounts of behavior over the years. The delay may be bridged by conditioned reinforcers, but without them the decay of strength with time is usually swift.

For operant responses, measurement of the gradient presents paradoxical technical problems (Catania & Keller 1981). Let us suppose we wish to study the effects of delaying reinforcement by two seconds. A response occurs, we start our delay timer, but then another response occurs. If we restart the timer, then we are setting up a new contingency that punishes responses. This will deter animals from making the target response, but the deterrence speaks less to the reduced strength of a 2 s delay than to the establishment of the target response as a predictor of nonreinforcement, and thus as something to be avoided. We have shaped non-responding. Alternatively, we may ignore the second response. Although that is often done (Sizemore & Lattal 1978; Williams 1976), we can no longer assume that we are measuring a reinforcer's effects on a response 2 s distant when another response from the same class has intervened closer to reinforcement. As a third possibility, we might simply prohibit the second response by removing the lever or turning off the key light. But then we have set up a conditioned reinforcer which will directly strengthen the target response (e.g., Richards, 1981).

Even if these problems were to be somehow solved, we must face the complementary, if less obvious, problem of what to do about responses that precede the target response. Unless the target response absorbs all of the strength of reinforcement (Williams, 1975, shows that it absorbs some), the presence of immediately prior responses that are also being strengthened by the only slightly more delayed reinforcer must be taken into account.

Consider an alternative approach: Assume that reinforcement increases the probability of those events that are in a subject's recent memory. Then, to the extent that reinforcement is contingent on just those events in memory, and only those, it should be maximally effective in strengthening behavior: Conditioning will proceed most rapidly when our definition of the response matches the organism's definition. If we wish to reinforce response patterns in an organism that can remember exactly four responses, making reinforcement contingent on only three correct responses would squander its potential effectiveness on the first, randomly providing reward for both correct and incorrect instances of it. Conversely, making reinforcement contingent on six correct responses would introduce intermittency of reward that had no bearing on the shaping of the response, as the first two responses were gone from memory, and their control of the delivery of reinforcement would not be reciprocated in the reinforcement's control of them.

This then is the strategy of the following experiments: We shall change the contingencies for reinforcement over a continuum, and where the greatest impact is had on responding, we shall infer that our definition of the (extended) response most closely matches the animal's memory of it (i.e., we choose the model that maximizes the posterior probability of the data given that model). In the process, we shall learn the extent of the delay of reinforcement gradient in a context with maximum ecological generality; that is, a context of naturally varying responding that permits various numbers of responses to come under the aegis of reinforcement on each of its occasions.

2. The shape of the gradient

2.1 Qualitative considerations: Some common-sense analyses will narrow the field of candidates for the form of the delay of reinforcement gradient. We can immediately rule out gradients that increase with delay: Although reminiscence may highlight remote events more clearly than recent ones, this occurs largely for rehearsed or otherwise singular or marked events. Our present concern is with a stream of relatively homogeneous responses, where the most recent tend to be the most salient. Similarly, we rule out an unweighted memory of the past, as this would grow insensitive with time, and apportion an ever decreasing share of our attention to recent events. A moving window on the past has some of the features required for short-term memory, and out of convenience is often used as a model of it (e.g., McDowell, Bass & Kessel, 1992; Wearden & Clark, 1989). But the sharp corners of such a window define a concentration of information that requires energy to generate and maintain. In particular, the window gives equal weight to everything in the epoch immediately preceding reinforcement and no weight to events just before that epoch. To achieve such a sharp discrimination would require remembering all of the events up to the edge, and then erasing the oldest as the newest is added. Not only is this unrealistically computation-intensive, but there is no justification for such absolute discrimination against the slightly older. Surely memory must decay, rather than persist inviolate out to a point of total collapse. Equation 1 gives one picture of decay, an exponential gradient:

1. $y = \lambda e^{-\lambda d}$.

Equation 1 is a weighting function, so that if a response occurred at, say, 2 s before reinforcement, we would evaluate the equation at d = 2 to find the weight of impact of reinforcement on that response. If lambda = 1/8, the event would receive 78% of the strengthening of an event at d = 0. To calculate the animal's (or experimenter's) current memory, one would multiply events at each instant before reinforcement by Equation 1 evaluated at that instant, and sum the products. Equation 1 corresponds to one of the simplest electronic memories, an RC circuit.
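As a quick numerical check of the weighting just described, the following sketch evaluates Equation 1 at d = 2 with lambda = 1/8 and expresses the result relative to the weight at d = 0. The function name `gradient_weight` is illustrative only and does not come from the article.

```python
import math

def gradient_weight(d, lam):
    """Equation 1: weight given to an event d seconds before reinforcement."""
    return lam * math.exp(-lam * d)

lam = 1 / 8
# Weight of a response 2 s before reinforcement, relative to one at d = 0.
relative = gradient_weight(2.0, lam) / gradient_weight(0.0, lam)
print(f"{relative:.2f}")  # prints 0.78
```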

A process that automatically accomplishes the same weighting and summing for a sequence of discrete events is the Exponentially-Weighted Moving-Average (EWMA):

2. $M(n) = \beta\,y(n) + (1-\beta)\,M(n-1), \quad 0 < \beta \le 1$

where M(n) is the current memory, y(n) the relevant attribute of the current response, M(n-1) is the previous memory, and beta is the currency parameter. When beta = 1, all the emphasis is on the most current event, and none on prior events. When beta is small, most of memory is occupied by prior events. Equation 2 may be iterated with each new event, or with each moment of time. In the limit where it is iterated with time and the moments between iterations become arbitrarily small, the process converges on Equation 1, with lambda = -ln(1-beta).
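The recursion can be sketched in a few lines. The example below iterates Equation 2 over a made-up sequence of response attributes and compares the result with an explicit sum in which each event is weighted by beta*(1-beta)^age. The starting memory of zero, the sample values, and the function names are assumptions for illustration.

```python
import math

def ewma(values, beta, m0=0.0):
    """Iterate Equation 2 over a sequence; return the final memory M(n)."""
    m = m0
    for y in values:
        m = beta * y + (1 - beta) * m
    return m

def explicit_weights(n, beta):
    """Weights of the last n events, oldest first: beta * (1 - beta) ** age."""
    return [beta * (1 - beta) ** age for age in range(n - 1, -1, -1)]

beta = 0.25
lam = -math.log(1 - beta)          # continuous rate constant matching beta
values = [2.0, 1.5, 3.0, 0.5, 1.0]  # made-up response attributes

recursive = ewma(values, beta)
weighted = sum(w * y for w, y in zip(explicit_weights(len(values), beta), values))

# The two computations agree, and the per-event decay (1 - beta)
# is the same number as exp(-lam).
print(round(recursive, 6), round(weighted, 6), round(math.exp(-lam), 6))
```

The recursive form needs only the previous memory and the current event, which is the economy the text attributes to the EWMA.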

This process, a linear average, has several advantages as a model of memory decay. It is very easy to calculate, makes minimal demands on memory (and is thus plausibly attributed to simple neural networks) and makes the most efficient use of information (Davis, Staddon, Machado & Palmer 1993; Killeen 1981,1991; McNamara & Houston 1987). Another shape that has been proposed for the gradient is the hyperbolic function (Figure 1e; Mazur 1984; Rachlin, Raineri & Cross 1991). The present experiments are not intended to distinguish between these similar forms, and their major conclusions should be robust over the ultimate decision between them. The computational convenience, correspondence with simple elements in linear systems theory (McDowell & Kessel 1979; McDowell, Bass & Kessel 1992), and intuitive clarity of the EWMA will make it our process of choice in the present analysis, and its continuous realization, the exponential decay, the inferred form of the gradient.

2.2 Event driven or time driven? Time is measured by counting events. In the case of Newtonian time, we devise precisely periodic events which continuously index counters; behavior's time is driven by those internal pacemakers and external stimuli that capture attention (Killeen 1991). Decay of short-term memory is very slow in an undisturbed environment (Brown 1958), and dismayingly fast when other events are interpolated (Peterson & Peterson 1959). Analogously, the conditioning of a stimulus or response to a reward is debased by interposing other stimuli or responses. Such distractors do not so much subvert attention while time elapses; rather, by entering memory they move time along, iterating Equation 2 and thereby downweighting prior stimuli and responses. The theory is developed as a discrete, event-driven process, but we shall later see (in section 5.3.1) that there is a continuous updating during the course of a response. For now, however, just think in terms of Equation 2 operating on quantal responses.

2.3 Quantitative considerations: It is easy enough to assert that conditioning should be best when our definition of behavior is the same as the organism's, but how do we know the organism's definition? What is the proper value for beta in Equation 2? The curve in Figure 1F shows how an organism operating according to Equation 2 with beta = 1/4 weights a sequence of interresponse times (IRTs), and the broken line gives the resulting average. If an experimenter makes reinforcement contingent on only the last IRT, requiring that its value be less than 1, the last response will be judged to have satisfied the criterion. But, because many of the recent responses were associated with long IRTs, the subject's memory of its behavior is that of slow responding. Reinforcement at this point will, perversely from the experimenter's viewpoint, move behavior in the wrong direction. We may use this failure to communicate as a technique for inferring the organism's value of beta. We may test various candidate values for it. If one stands out as most powerful in controlling behavior, we infer, ex hypothesi, that it is also the organism's rate constant. The approach is not unlike that used to determine the natural frequency of systems by applying various driving frequencies, and determining where resonance occurs. To see this better, it is useful to simulate the postulated results.
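The mismatch described above can be made concrete with a small sketch. The IRT values, the starting memory, and the criterion of 1 are invented for illustration. The experimenter looks only at the last IRT, while the organism, assumed here to follow Equation 2 with beta = 1/4, carries a memory dominated by the earlier long IRTs.

```python
def ewma(values, beta, m0):
    """Iterate Equation 2 over a sequence; return the final memory M(n)."""
    m = m0
    for y in values:
        m = beta * y + (1 - beta) * m
    return m

irts = [3.0, 3.0, 3.0, 3.0, 0.5]   # four long IRTs, then one short one (made up)
criterion = 1.0

# Experimenter attends only to the last IRT: the criterion is met.
last_irt_ok = irts[-1] < criterion

# Organism's memory with beta = 1/4, assumed to start at the long-IRT level.
memory = ewma(irts, beta=0.25, m0=3.0)

print(last_irt_ok, memory)  # prints True 2.375
```

Reinforcement delivered here would coincide with a memory of slow responding (2.375), even though the experimenter scored a fast response.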

3. Experiment 1: Simulation

Is reinforcement most effective when the criterion for its delivery weights the past in the same manner as the subject? In Equation 3:

3. $m(n) = \alpha\,y(n) + (1-\alpha)\,m(n-1), \quad 0 < \alpha \le 1$

m(n) signifies our current definition of the relevant response attribute, based on a linear average of the most recent response and the previous average. We may make reinforcement contingent on a value of m(n) that exceeds some criterion value. The only differences between Equation 2 and Equation 3 are the values of the currency parameters and the organisms (subject or experimenter) in which they reside. If our value for alpha is close to 1, then we are attending only to the last response, and the gradient is very steep. If at the same time the organism's value of beta is substantially less than 1, conditioning should be less than optimal, as the experimenter and organism are focusing on different epochs of behavior. The point of this simulation is to determine if it is true that conditioning proceeds fastest when our definition of the response coincides with the animal's, that is, when alpha = beta.

In these simulations groups of three stat-rats were constructed with the same memory parameter beta for each animal within a group but with values of 1.0, 0.50, 0.25, 0.12, 0.06, and 0.03 characterizing the different groups. The rats' interresponse times (IRTs) were selected from a normal distribution with a mean of M(n) and a standard deviation of 1: N(M(n),1). Within conditions, a value of alpha was assigned and held constant for 1000 responses. After each response Equation 3 was iterated and the value of m(n), the experimenter's average, was tested to see if it exceeded the value that had last been reinforced. If m(n) equalled or exceeded that criterion, the stat-rat was reinforced for its behavior. Reinforcement incremented the mean value of the population from which responses are selected to M(n), the value specified by Equation 2, the organism's picture of what was getting reinforced. At the same time, the criterion for reinforcement was increased to m(n), the value specified by Equation 3, the experimenter's conception of the last response. Thus for each new reinforcement, the response attribute had to exceed the value that received reinforcement on the last trial. But there are two separate definitions of the response, each of which involves a weighted memory of the past: Behavior changes according to the animal's definition, whereas the contingencies change according to the experimenter's definition. After 1000 training trials the final value of M(n) was recorded, and a new value for alpha was explored.
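The procedure can be sketched as follows. This is a reconstruction from the verbal description, not the original simulation code: the starting values of zero for the population mean, both memories, and the criterion are assumptions, as are the function name and the use of a seeded generator.

```python
import random

def run_stat_rat(alpha, beta, n_responses=1000, seed=0):
    """Simulate one stat-rat; return (final M(n), criterion after each response)."""
    rng = random.Random(seed)
    mean = 0.0        # mean of the population IRTs are drawn from (assumed start)
    M = 0.0           # organism's memory, Equation 2 (assumed start)
    m = 0.0           # experimenter's average, Equation 3 (assumed start)
    criterion = 0.0   # value of m(n) at the last reinforcement (assumed start)
    criteria = []
    for _ in range(n_responses):
        y = rng.gauss(mean, 1.0)           # IRT drawn from N(mean, 1)
        M = beta * y + (1 - beta) * M      # Equation 2
        m = alpha * y + (1 - alpha) * m    # Equation 3
        if m >= criterion:                 # reinforcement is delivered
            mean = M                       # behavior moves to the animal's memory
            criterion = m                  # criterion moves to the experimenter's
        criteria.append(criterion)
    return M, criteria

beta = 0.25
for alpha in (1.0, 0.50, 0.25, 0.12, 0.06, 0.03):
    # Three stat-rats per condition, distinguished here only by their seeds.
    finals = [run_stat_rat(alpha, beta, seed=s)[0] for s in range(3)]
    print(alpha, round(sum(finals) / len(finals), 2))
```

Because the criterion is replaced only when m(n) equals or exceeds it, the criterion can never fall during a run, which is the ratchet that drives the response attribute upward.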

Figure 2 shows the results. Learning does indeed proceed fastest when the experimenter's weighting of the past, epitomized by alpha, equals the animal's, epitomized by beta. In the next experiment, we will ascertain if this also holds true for real organisms. Note also that the overall level of acquisition is greatest under the conditions in which beta is large. This may seem strange, for shouldn't lengthened memory (corresponding to smaller values of beta) enhance learning? Not necessarily. Decreasing the value of beta decreases the variance of the process by the factor beta/(2-beta), providing fewer instances of extreme scores that the trainer could capitalize on to move responding quickly in the preferred direction. When the past weighs heavily in memory, behavior inevitably becomes conservative.
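The variance factor beta/(2-beta) follows from Equation 2 if one assumes, as a simplification made here for illustration, that successive response attributes y(n) are independent with a common variance sigma^2 and that the process has run long enough to reach its steady state:

```latex
\begin{aligned}
M(n) &= \beta \sum_{j=0}^{\infty} (1-\beta)^{j}\, y(n-j), \\
\operatorname{Var}[M(n)] &= \beta^{2}\sigma^{2} \sum_{j=0}^{\infty} (1-\beta)^{2j}
  = \frac{\beta^{2}\sigma^{2}}{1-(1-\beta)^{2}}
  = \sigma^{2}\,\frac{\beta}{2-\beta}.
\end{aligned}
```

With beta = 1 the factor is 1, so memory is as variable as the responses themselves; with beta = 1/4 it falls to 1/7.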

4. Experiment 2: Tuning curves for pigeons' STM

4.1 Subjects: Four pigeons (Columba livia), Subjects 14, 37, 40, and 41, with various prior histories of experimentation, served.

4.2 Apparatus: The experimental chamber was a standard enclosure, with the central Gerbrands response key transilluminated with white light. A houselight provided general illumination while white noise masked ambient sounds. The reinforcer was mixed grain, available from a standard feeder for approximately 2.5 seconds. The experiment was controlled by a computer, which generated criteria for reinforcement in a manner similar to that of the previous experiment.

4.3 Procedure: The subjects were trained to respond on a Variable Interval (VI) 60-s schedule, in which reinforcement followed the first response after an interval of time that was selected from an approximately constant probability distribution (Catania & Reynolds 1968). After several months of pilot experiments we arrived at the following percentile reinforcement schedule (Platt 1973; Galbicka 1988). The criterion for reinforcement was based on the recent history of responding and adjusted to maintain a constant upward (or downward) force on the length of the times between responses (IRTs). This was accomplished by setting the criterion for reinforcement above the 80th (or below the 20th) percentile of IRTs emitted since the last reinforcement. Reinforcement was delivered whenever the VI 60-s schedule had primed and the experimenter's measure of the animal's response, calculated from Equation 3, exceeded the appropriate percentile. The pigeons were randomly assigned to either an ascending (14 & 41) or descending (37 & 40) series of values for alpha from the set: 1.0, 0.50, 0.25, 0.12, 0.06, and 0.03. Once assigned, the value was maintained for 4 weeks (5 sessions per week, 54 reinforcers per session). During the first and third weeks of each condition the pigeons were reinforced whenever the VI schedule was primed and the value of the experimenter's weighted average of IRTs was less than 80% of the recent IRTs (i.e., below the 20th percentile of them). For the second and fourth weeks, it had to be greater than 80% of the recent IRTs (i.e., it had to be above their 80th percentile). This cycle was continued until the animals had received exposure to all values of alpha.
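The decision rule of this percentile schedule might be sketched as below. The sketch follows the verbal definition in the text (the weighted average must be greater than, or less than, 80% of the recent IRTs); the function names, the handling of an empty history, and the treatment of ties are assumptions, and the original control program may have differed.

```python
def meets_percentile_criterion(value, recent_irts, direction, fraction=0.8):
    """True if `value` is above (direction='up') or below (direction='down')
    at least `fraction` of the IRTs emitted since the last reinforcement."""
    if not recent_irts:
        return False   # assumption: no reinforcement until some IRTs are recorded
    if direction == "up":
        beaten = sum(1 for irt in recent_irts if value > irt)
    elif direction == "down":
        beaten = sum(1 for irt in recent_irts if value < irt)
    else:
        raise ValueError("direction must be 'up' or 'down'")
    return beaten / len(recent_irts) >= fraction

def reinforce(vi_primed, weighted_average, recent_irts, direction):
    """Reinforce only when the VI schedule has primed AND the criterion is met."""
    return vi_primed and meets_percentile_criterion(
        weighted_average, recent_irts, direction
    )

recent = [1.0, 2.0, 3.0, 4.0, 5.0]          # made-up IRTs in seconds
print(reinforce(True, 5.5, recent, "up"))    # prints True
print(reinforce(True, 3.5, recent, "up"))    # prints False
print(reinforce(False, 5.5, recent, "up"))   # prints False
```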

4.4 Results: Response rates changed as a function of the direction in which the percentile schedule forced them, and the force was more effective at some values of alpha than at others. Figure 3 shows the response rate as a function of sessions for each of the animals. Over all values of alpha the changes appear to be greatest at alpha = 0.5 for one subject, and alpha = 0.25 for the other subjects. In order to better characterize the efficacy of the reinforcement contingencies, I have fitted linear learning curves to these data. Linear regression lines have two parameters, slope and intercept, either or both of which may be affected by the changes in response rate. To reduce the number of parameters and simplify analysis, I forced the origin of the straight lines through the first datum, and let the slopes alone reflect the changes wrought by the contingencies.
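A one-parameter fit of this kind can be sketched as follows. The article does not state the fitting criterion, so least squares is assumed here; the function name and the session data are likewise invented for illustration.

```python
def slope_through_first_point(xs, ys):
    """Least-squares slope of a line constrained to pass through (xs[0], ys[0])."""
    x0, y0 = xs[0], ys[0]
    numerator = sum((x - x0) * (y - y0) for x, y in zip(xs, ys))
    denominator = sum((x - x0) ** 2 for x in xs)
    if denominator == 0:
        raise ValueError("need at least two distinct x values")
    return numerator / denominator

sessions = [1, 2, 3, 4, 5]
rates = [60.0, 64.0, 66.0, 71.0, 75.0]   # made-up responses per minute
print(round(slope_through_first_point(sessions, rates), 2))  # prints 3.63
```

Fixing the line at the first datum leaves the slope as the only free parameter, so ascending and descending weeks can be compared through the absolute values of their slopes.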

This analysis is summarized in Figure 4, where the absolute values of the slopes of the ascending and descending curves for each condition are averaged and plotted as a function of alpha. Subjects 14 and 41, with ascending values for alpha, showed better overall learning than 37 and 40. Subjects 14, 40, and 41 showed peak learning at alpha = 0.25, whereas Subject 37 peaked at alpha = 0.50.

We may characterize these results by saying that pigeons' response memory is consistent with a linear average having a currency parameter of about 1/4. In Figure 5 I have replotted the data from Figure 4 normalized to set the maximum slope at 1.0, and plotted a theoretical curve generated by the assumptions that motivated this experiment: that conditioning depends on the correlation between the experimenter's requirements for reinforcement and the organism's understanding of them (see Appendix A). The curve shows that correlation coefficient for different values of alpha when beta = 1/4.

4.5 Discussion: This experiment showed that we have much better control of organisms' behavior insofar as we construe a response to consist not of a punctate operation of a manipulandum, but rather of a fading memory of behavior in which recent events are weighted more heavily than earlier ones, and in which the mean age of the memory (1/beta) encompasses about four responses.

Other researchers have studied the role of responses preceding the one that produces the reinforcer. Catania (1971) executed a particularly neat set of experiments, in one of which he provided reinforcement to pigeons for sequences of responses between Keys A and B. Figure 6 shows the proportion of responses on Key B for each of the reinforced sequences. Obviously, the greater the number of B responses required, the greater the proportion. But the position of the B requirement in the sequence also had an effect on the proportion of B responses. If we assign a response attribute of 0 for Key A responses and 1 for Key B responses, by iterating Equation 2 we may predict the animal's memory for B responses (relative to A responses) upon the delivery of reinforcement under each of the conditions. We set beta = 1/4, and successively assign values of 1 or 0 to y(n) as appropriate for each of the sequences represented on the x-axis, to obtain the predictions graphed next to the obtained data (showing a perfect rank-order correlation with the obtained data).
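The calculation just described can be sketched by coding Key A as 0 and Key B as 1 and iterating Equation 2 across a sequence. The four sequences below are hypothetical illustrations of the position effect, not Catania's actual conditions, and the starting memory of zero is an assumption.

```python
def memory_for_b(sequence, beta=0.25, m0=0.0):
    """Iterate Equation 2 over a string of 'A'/'B' responses (A = 0, B = 1)."""
    m = m0
    for key in sequence:
        if key not in ("A", "B"):
            raise ValueError("sequence may contain only 'A' and 'B'")
        y = 1.0 if key == "B" else 0.0
        m = beta * y + (1 - beta) * m
    return m

# One B response placed at successively later positions before reinforcement.
for seq in ("BAAA", "ABAA", "AABA", "AAAB"):
    print(seq, memory_for_b(seq))
# prints:
# BAAA 0.10546875
# ABAA 0.140625
# AABA 0.1875
# AAAB 0.25
```

The same single B response occupies more of memory the closer it falls to reinforcement, which is the position effect the text attributes to the sequence data.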

For beta = 1/4, if we reinforce a single response in isolation we are only utilizing 25% of the potential effectiveness of the reinforcer. This is evident in Catania's data, where we may infer from the last column that reinforcing an A preceded by 3 Bs generated a probability of A equal to about 30%, whereas when preceded by 3 As the probability of responding A increased to almost 90% (first column). Extended response requirements increase the coupling between the response and reinforcement; they also strengthen the control by stimuli present during that responding, facilitating discriminations that are otherwise difficult or impossible to establish (Williams, 1972).

The importance of responses that precede the last response has been recognized both in the design of reinforcement contingencies, such as the change-over delay used in concurrent schedules to insulate responses on one key from reinforcement intended for those on another, and in the design of models of the resulting performance (Davison & Jenkins, 1985). The current characterization of the extended response should facilitate those efforts.

Why is response memory so brief? Whatever the total capacity of STM, it must be allocated as shown in the bottom panels of Figure 1. To the extent that a longer view of the past permits one to deal with remote associations, it also undermines one's ability to deal with immediate precursors. Short-term memory is a temporal lens, focused by evolutionary pressures upon the epoch that is most likely to have causal relevance to events of biological significance. Its responsiveness to the most immediate input (the derivative of Equation 2 with respect to y) is simply the currency parameter. As we take the longer view, we necessarily weight the present less heavily; and the longer view, comprising more events, has more inertia. The exponentially decreasing weights of Equations 1 and 2 respect the exponentially ramifying chains of causal relevance, and thus appropriately allocate credit for a reinforcer most strongly to those events most proximal to it. This exponentially decaying model of short-term memory will play a key role in the general theory of reinforcement that follows. The rest of this paper develops the implications of this view of memory.

5. A General Theory of Reinforcement

Two counterpoised forces affect the control of behavior under schedules of reinforcement: As we increase the rate of reinforcement, we activate more behavior, but at the same time we decrease the number of responses that each reinforcer can influence. This is because reinforcement isolates responses that precede it from other reinforcers that follow it, and thus truncates the reach of reinforcement (Catania, Sagvolden & Keller 1988; Killeen & Smith 1984; Williams 1978, and Appendix B). Because of this, continual reinforcement of one response after another can provide only a fraction (beta) of the strengthening that is available when many responses precede the reinforced response. This insight is developed by identifying three factors that affect the control of behavior by reinforcement, and which constitute the fundamental axioms of the theory:

1. Activation: An incentive activates a seconds of responding;

2. Temporal constraint: A response requires delta seconds for its completion;

3. Coupling: Reinforcement occurs when an incentive enters a memory that also contains a response. The proportion of memory occupied by the target response, and thus the effectiveness of reinforcement, is given by the coupling coefficient, zeta.

Simple models of these assumptions are developed and justified in Appendix B (activation & constraint) and Appendix C (coupling). The manner in which reinforcement moves the trajectories of behavior toward the states described by those asymptotic models is discussed in Appendix D (dynamics). These mathematical principles of reinforcement are summarized in the body of the paper where they are applied to representative data.

5.1 Activation & constraint: From Assumptions 1 & 2 it follows that incentives delivered at a rate of R per second can potentially instigate aR/delta responses per second. But each response uses up some of the time available for making other responses. Staddon (1977) has shown that a simple correction for such temporal constraints yields Equation 4, originally introduced by Herrnstein (1970):

4. b = kR/(R+c)

where b is the measured rate of responding, R is the rate of reinforcement, and k and c are free parameters: The former is the asymptotic rate of responding as R goes toward infinity (Herrnstein 1974); the latter is interpreted by Herrnstein as the rate of reinforcement available for other, unrecorded responses. Over a large range, response rates on interval schedules adhere closely to Equation 4 (McDowell 1980), which has been rederived from a variety of assumptions (see, e.g., Williams, 1988). In our derivation, essentially the same as Staddon's, k also designates the asymptotic response rate, but c = 1/a (see Appendix B). Thus far the development assumes that all of the activation is focussed into target responses. But we have seen from Experiment 2 that this is only the case if the contingencies are carefully matched to the animal's memory. This is rarely the case. The rate of responding will be less than that predicted by Equation 4 to the extent that the coupling between incentive and target response is less than perfect. I now apply the activation/constraint model to ratio schedules, where its failure will set the stage for introduction of the coupling coefficient.
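One way to see how Equation 4 can arise from Assumptions 1 and 2 is sketched here; the full argument is in Appendix B. The sketch assumes an unconstrained rate of b_0 = aR/delta responses per second, and it assumes that the time correction takes the form of each emitted response removing delta seconds from the time available for responding. Both are a reading of the text rather than a quotation from it.

```latex
\begin{aligned}
b &= b_{0}\,(1 - \delta b), \qquad b_{0} = \frac{aR}{\delta} \\
\Rightarrow\quad b &= \frac{b_{0}}{1 + \delta b_{0}}
   = \frac{aR/\delta}{1 + aR}
   = \frac{(1/\delta)\,R}{R + 1/a}
   = \frac{kR}{R + c}, \qquad k = \frac{1}{\delta},\quad c = \frac{1}{a}.
\end{aligned}
```

Under these assumptions the ceiling k is the reciprocal of the minimum inter-response time, and c = 1/a as stated in the text.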

5.2 Ratio schedules: To use Equation 4 we must know the rate of reinforcement, R. On Fixed Ratio (FR) schedules, reinforcement is delivered immediately after the response. Rate of reinforcement is therefore proportional to the rate of responding (b) and inversely proportional to the ratio requirement (N); the schedule feedback function for ratio schedules is:

5. R = b/N

Substituting into Equation 4 and rearranging gives the rate of responding predicted on the basis of Equation 4 (Pear, 1975):

6. b = k-cN
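The rearrangement is short enough to write out. Substituting the feedback function of Equation 5 into Equation 4, and assuming a nonzero response rate so that b can be cancelled:

```latex
\begin{aligned}
b &= \frac{k\,(b/N)}{(b/N) + c} = \frac{k\,b}{b + cN} \\
\Rightarrow\quad b + cN &= k \qquad (b \neq 0) \\
\Rightarrow\quad b &= k - cN .
\end{aligned}
```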

Equation 6 predicts that as N approaches zero, response rate will approach its maximum, k. It fits none of the FR data, some of which are shown in Figures 7 and 8. This is because Equation 4 takes a single operant response as the event that is strengthened, and does not address the issue of how reinforcement affects more or less of the animal's memory of previous responding. But we already know that when N = 1, reward is only beta times as effective as the case when N is very large. In general, the ceiling k will depend on the extent to which we match our definition of the reinforced response to the animal's definition. On interval schedules, for instance, any response that takes time moves the animal along toward satisfaction of the schedule, but many of those responses are not measured by the experimenter, thus lowering the apparent value of the ceiling. In the following sections I derive a general solution to this problem, by introducing a coefficient of coupling as an implicit factor of k. To aid memory, refer to Table 1 for a synopsis of parameters and their interpretation.

Table 1

Symbol    Interpretation
Beta      The weight in short-term memory assigned to the most recent response
M(n)      The contents of the subject's memory after the nth response
Alpha     The weight assigned by an experimenter to the most recent response
m(n)      The experimenter's characterization of the subject's memory after the nth response
B         The rate of responding predicted by the complete model
Delta     The minimum inter-response time; the reciprocal of the maximum response rate
k         Constant, approximately equal to the asymptotic response rate; reinterpreted as the product of the coupling constant and the maximum response rate
R         The rate of reinforcement
a         Specific activation. The number of seconds of responding that is elicited by a given incentive under the operative motivational conditions
N         The number of responses required to satisfy a ratio schedule
Lambda    The measured rate of decay of short-term memory
Zeta      The coupling coefficient. The degree to which memory is filled by target responses
Rho       The coupling constant, which gives the portion of target responses in the response trajectory. It depends on the experimental conditions and specific response, but may generally be assigned values of about 1 for ratio schedules and 1/3 for interval schedules
Lambint   The intrinsic rate of decay of short-term memory.

5.3 Coupling behavior to incentives: Equation 2 shows that the first response contributes a memory of strength beta, and iteration of that equation shows that the second adds another while the impact of the first has decayed to (1-beta)*beta, and so on. As the number of responses, j, increases to the number required for reinforcement, N, iteration of Equation 2 predicts the organism's response memory will fill according to the sum of a geometric series, which may be approximated by:

8. M(n) = 1-exp(-lambda*N)
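The relation between the geometric series and its exponential approximation can be sketched as follows. The code assumes that Equation 2 is the exponentially weighted moving average M(n) = beta*y(n) + (1-beta)*M(n-1), with y(n) = 1 for a target response; the value of beta is illustrative.

```python
import math


def fill_memory(beta, n_responses):
    """Iterate the assumed form of Equation 2 for a run of target responses,
    starting from a memory that contains none of them."""
    m = 0.0
    for _ in range(n_responses):
        m = beta * 1.0 + (1.0 - beta) * m
    return m


def memory_eq8(lam, n_responses):
    """Equation 8: M = 1 - exp(-lambda*N)."""
    return 1.0 - math.exp(-lam * n_responses)


beta = 0.1                      # illustrative weight, not a fitted value
lam = -math.log(1.0 - beta)     # decay rate that matches the series exactly
for n in (1, 4, 16, 64):
    # Columns: N, geometric sum, Equation 8 with matched lambda,
    # Equation 8 with the cruder choice lambda = beta.
    print(n, round(fill_memory(beta, n), 4), round(memory_eq8(lam, n), 4),
          round(memory_eq8(beta, n), 4))
```

With lambda = -ln(1-beta) the two forms agree at every integer N; setting lambda equal to beta is a close approximation only when beta is small.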

Equation 8 states that the contribution of the ratio requirement to the definition of the target response increases as a cumulative exponential function of N. When N is small, so must be the number of responses in memory; as N is increased, the response memory approaches its maximum (saturates), and response rate approaches its ceiling. On FR schedules, all N target responses must have occurred before reinforcement, so that the coupling between incentive and responding, Zeta, is proportional to the degree of saturation of memory by the target response up to that point, Zeta = rho*M(n). I rewrite Equation 6, making the coupling coefficient explicit and substituting 1/a for c to obtain the predicted response rate on ratio schedules:

9. B = Zeta/delta - N/a, where

Zeta = rho*(1-exp(-lambda*N))

As the response requirement increases, more and more of the effects of a reinforcer make contact with responses, and move rate toward its ceiling, as shown by the first term in Equation 9. At the same time, however, the arousal is decreasing linearly with N, as shown by the second term. Thus, our model of FR performance predicts an inverted-U change in response rate as a function of ratio requirement. The figures in the paper text (Figures 7 and 8) show that Equation 9 provides an accurate picture of changes in response rate for pigeons, rats, and mice. The coupling constant, rho, will be interpreted in Section 6; here, it is fixed at 1.

Notice that the linear descending segment, the pure arousal effect, governs the function after the accumulation of responses in memory has saturated, above FRs of about 25. If we were to project this asymptotic linear segment back to the left, it would intercept the y-axis at an ordinate of 1/delta. To the right, the asymptote intercepts the x-axis at a/delta (by which point Zeta has gone to 1), showing that when the number of responses demanded by the experimenter equals or exceeds the number elicited by the incentive (a/delta), response rate must fall to zero: The ratio is strained to the breaking point and performance extinguishes. Notice that the ratio a/delta converts the seconds of responding that may be sustained by the incentive (a) into the number of responses of duration delta that may be sustained: a/delta is the extinction ratio hypothesized by Skinner (1938).
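The shape of Equation 9 and its two intercepts can be illustrated with a short calculation. This is a sketch only: it uses the parameter values quoted later in this section for Powell's (1968) pigeons, takes lambda = lambint*delta with rho = 1, and evaluates the model at integer ratio values.

```python
import math


def fr_coupling(N, lam, rho=1.0):
    """Coupling coefficient for FR schedules: Zeta = rho*(1 - exp(-lambda*N))."""
    return rho * (1.0 - math.exp(-lam * N))


def fr_response_rate(N, lam, delta, a, rho=1.0):
    """Equation 9: B = Zeta/delta - N/a, in responses per second."""
    return fr_coupling(N, lam, rho) / delta - N / a


# Values quoted in the text for Powell (1968):
# decay of 0.38 per response-second, delta = 0.28 s, a = 123 s.
lambint, delta, a = 0.38, 0.28, 123.0
lam = lambint * delta           # lambda = lambint*delta/rho with rho = 1

rates = {N: fr_response_rate(N, lam, delta, a) for N in range(1, 500)}
peak_N = max(rates, key=rates.get)
print("ratio with the highest predicted rate:", peak_N)
print("y-intercept of the linear limb, 1/delta:", round(1 / delta, 2))
print("extinction ratio, a/delta:", round(a / delta, 1))
```

The predicted rate rises while memory is filling, peaks, and then follows the linear limb down to zero near a/delta, the extinction ratio.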

The dimensions of these parameters reflect the role they play in the theory: The reciprocal of lambda tells us the average number of responses that are coupled to a reinforcer in memory; delta tells us the number of seconds it takes to make a response, and thus both constitutes the lower limit of an IRT and the reciprocal of the maximal attainable response rate; a tells us the number of response-seconds that can be activated/elicited by an incentive, and thus provides a measure of incentive-motivation. Because these three key parameters, each identifying an orthogonal causal factor in the control of behavior, may be directly inferred from such graphs (see Appendix C), such mapping of response rates under different FR values provides a very useful diagnostic technique. I shall return to analysis of these figures and their parameters in the next section.

5.3.1. Uncovering hidden structure. It turns out that the rate of memory decay (lambda) is significantly related to the estimated response duration (delta): r = 0.81 for ratio schedules, r = 0.79 for interval schedules, over some two dozen experimental conditions. The longer or more complicated a response, the greater the proportion of memory it occupies. We incorporate this regularity into the structure of the theory by taking the rate of decay of memory to be proportional to delta. This assumption reduces the variance in the decay rate parameter across experiments and schedule types. It is realized by using the exponential form (e.g., Equation 8), and expanding the rate of memory decay as lambda = lambint*delta/rho, where lambint is the intrinsic rate of decay. (The constant rho is formally introduced here for consistency with the theory as developed in Section 6, but is fixed at 1.0.) This correction for decay of memory during the act of responding is enforced in all applications of the theory. The exponential approximation to the discrete geometric weighting was introduced as a convenience; it is an additional convenience now as it permits us to adjust the rate of memory decay as a continuous function of the response duration. Technically, this is a mixed, metered model (Clark, 1976; Mesterton-Gibbons, 1989).

Let us now review the FR data, and the parameters of the model that optimize the goodness of fit to them. The running rates (the response rates once responding has begun) in Powell's (1968) study are well fit under the assumption that memory decays at the rate of 0.38 per response-second, that the minimum duration of a key-peck is 0.28 s, and that a single reinforcer under his conditions can sustain 123 seconds of such responding (Figure 7). Barofsky and Hurwitz (1968) found that their rats formed two groups, a high response rate group that could sustain larger FR values, and a lower response rate group. Table 2 shows that the only difference between them was their minimum response durations, delta. Mazur varied both FR value and the force required to operate the lever. Figure 7 shows that his data also took the predicted shape; Table 2 shows us that delta increased with the force requirement, as did rate of decay of memory (even with the adjustment noted above), while the motivational parameter a remained unchanged. Kelsey and Allison (1976; Figure 8) compared control rats with rats having surgical lesions in the ventromedial hypothalamus (VMH), which caused them to overeat and become obese, but at the same time to be less ready to work for food. Greenwood and associates (Greenwood, Quartermain, Johnson, Cruce & Hirsch 1974) chemically lesioned the VMH of mice, and also studied a group of genetically obese mice. Figure 8 shows that response rates were depressed in the lesioned groups of both species. Table 2 tells us that in neither case was memory affected, nor was the specific activation (a) affected; all of the effects were due to motor impairment. Greenwood and associates concluded that "the naturally obese animals do not display the behavioral patterns associated with rodents made obese by hypothalamic damage" (p. 687); the current model is more specific: Genetically obese mice had superior motor facility and enhanced motivation. The model also suggests a more rapid decay of memory for them, although that is driven by their very high rate at N=16.

Note the important role of delta in Equation 9. Many of the effects of operations that have been traditionally interpreted as motivational may actually affect motivation only indirectly by modifying the minimum response duration. Where delta is increased, as it may be by lesions, organisms will be unable to sustain high ratio requirements even though they are as motivated as control animals.
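The role of delta can be made concrete with two hypothetical animals that share the same motivation (a) and intrinsic memory decay (lambint) and differ only in minimum response duration. All values below are invented for illustration; the model is Equation 9 with lambda = lambint*delta/rho and rho = 1.

```python
import math


def fr_rate(N, lambint, delta, a, rho=1.0):
    """Equation 9 with the decay rate expanded as lambda = lambint*delta/rho."""
    lam = lambint * delta / rho
    return rho * (1.0 - math.exp(-lam * N)) / delta - N / a


def largest_sustained_ratio(lambint, delta, a, n_max=5000):
    """Largest integer ratio for which the predicted rate is still positive."""
    sustained = [N for N in range(1, n_max) if fr_rate(N, lambint, delta, a) > 0]
    return max(sustained) if sustained else 0


# Hypothetical animals: same a and lambint, different delta.
a, lambint = 100.0, 0.4
for label, delta in (("control", 0.25), ("impaired", 0.50)):
    print(label, "delta =", delta, "s ->",
          largest_sustained_ratio(lambint, delta, a), "responses")
```

Doubling delta roughly halves the ratio that can be sustained, even though a is unchanged, which is the pattern that might otherwise be read as a loss of motivation.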

5.4 Variable Ratio schedules: Variable-Ratio (VR) schedules provide reinforcement after a variable number of responses averaging N. Their idealized implementations as random-ratio schedules provide a constant probability of reinforcement, 1/N, after every response. As I show in Appendix C, the coupling to these schedules increases as a hyperbolic function of their mean requirement, and the appropriate model is:

10. B = Zeta/delta - N/a, where

Zeta = rho*lambda*N/(1+lambda*N)

The exponential approach to saturation of memory found under FR schedules (Equation 8) is thus replaced with a similar but slower hyperbolic approach to saturation under VR schedules; otherwise the models are identical. Figure 9 shows response rates of rats and humans on VR schedules, and Table 2 the associated parameter values of Equation 10, which directs the curves through them. The coupling constant rho is fixed at 1.
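The two approaches to saturation can be compared directly. In this sketch lambda takes an illustrative value and rho is fixed at 1, as in the text.

```python
import math


def fr_coupling(N, lam, rho=1.0):
    """FR coupling: Zeta = rho*(1 - exp(-lambda*N))."""
    return rho * (1.0 - math.exp(-lam * N))


def vr_coupling(N, lam, rho=1.0):
    """VR coupling: Zeta = rho*lambda*N / (1 + lambda*N)."""
    return rho * lam * N / (1.0 + lam * N)


lam = 0.1   # illustrative decay rate per response
for N in (1, 5, 10, 25, 100):
    print(N, round(fr_coupling(N, lam), 3), round(vr_coupling(N, lam), 3))
```

Because exp(x) exceeds 1 + x for every positive x, the exponential form lies above the hyperbolic form at every ratio value, and the gap closes only as both approach saturation.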

Although the fits to the rat data in Figure 9 are good, the data from the humans seem to tell a new story. For them there is negligible downturn of responding at high rates of reinforcement (low N), which imputes large values to lambda. Why should humans' memory appear to saturate so quickly? In using Equation 7 we assume that at the start of each trial memory contains none of the target responses, because it is filled with consummatory responses involving the previous reinforcer. But the reinforcers in this experiment were points on a display, which interfere less with response memory than do primary reinforcers. The quick saturation is due not to a memory that is quickly filled, but rather to one that is not greatly disrupted by symbolic reinforcement, and which therefore includes responses preceding prior reinforcers. The response rates of Subjects H31 and H32 persisted even for very large ratio values, suggesting that, although the point displays were brief, they were vested with substantial incentive value (cf. Hayes and Hayes, 1992).

The contribution of a response sequence to coupling is a concave function of its length (Equation 8), so that the longer sequences of responses on a VR do not compensate for the occasions on which memory is truncated by the short sequences. We therefore predict that FR schedules will command higher running rates than equal-sized VR schedules (except for very large values of N, where saturation is virtually complete for all sequence lengths), and this is what Mazur (1983) found.
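The effect of concavity can be sketched by averaging the discrete fill, 1-(1-beta)^n, over the run lengths that a random-ratio schedule produces. This is an illustration under stated assumptions (memory empty at the start of each run, a constant reinforcement probability of 1/N after every response, an illustrative beta) and is not necessarily the derivation given in Appendix C.

```python
def fr_fill(beta, n):
    """Memory filled by n consecutive target responses: 1 - (1 - beta)**n."""
    return 1.0 - (1.0 - beta) ** n


def random_ratio_fill(beta, N, n_terms=20000):
    """Average of fr_fill over run lengths n = 1, 2, ... when each response
    is reinforced with constant probability p = 1/N (series truncated)."""
    p = 1.0 / N
    total, prob_not_yet_reinforced = 0.0, 1.0
    for n in range(1, n_terms + 1):
        total += prob_not_yet_reinforced * p * fr_fill(beta, n)
        prob_not_yet_reinforced *= 1.0 - p
    return total


beta = 0.1   # illustrative
for N in (2, 10, 50):
    averaged = random_ratio_fill(beta, N)
    closed_form = beta * N / (beta * N + 1.0 - beta)
    # Columns: N, fill on FR N, averaged fill on random ratio N, closed form.
    print(N, round(fr_fill(beta, N), 3), round(averaged, 3), round(closed_form, 3))
```

The averaged fill falls below the fill for a fixed run of the same mean length, and its closed form, beta*N/(beta*N + 1 - beta), is close to the hyperbola of Equation 10 when beta is small.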

Nowhere does the present theory postulate that ratio schedules reinforce short IRTs, nor that molar contingencies favor high response rates on ratio schedules, as do some alternate theories; yet it provides a good account of the data. Where contingencies of reinforcement attend only to responses and not to temporal attributes of them, our models of reinforcement also need only attend to responses. Under ratio schedules only the target responses move the animal toward reinforcement, and so they differentially fill memory at the time of reinforcement. Interval schedules maintain lower response rates because any response (including unmeasured interresponses) will move the animal toward reinforcement, and thus debase the coupling to the target response. When the interresponses are experimentally identified (e.g., Catania, 1971), we may define a coherent response unit (Platt, 1971; Platt & Day 1979). But often, as in the case of interval schedules, they are not measured; this does not make them go away, but rather necessitates their treatment as hypothetical constructs.

5.5 Interval schedules: Under interval schedules reinforcement is provided for the first response after a fixed (Fixed Interval; FI) or variable (Variable Interval; VI) time. Herrnstein's (1970) model, Equation 4, provides a relatively accurate picture of response rates under them, but the maximum rate, k, is much lower than expected. The median value of asymptotic response rate calculated by Herrnstein (1970) for the VI data of Catania and Reynolds (1968) was k = 1.2 responses per second, only 1/3 of the maximum rate that we know is attainable by pigeons under ratio schedules. Thus, when schedules make the time (or responses) between target responses a property upon which reinforcement bears, the models must be expanded to take those events into account (Platt 1979). There are various ways to do this. The present approach treats an interresponse time (IRT) as a sequence of responses ending with the target response. Each response, whether recorded or not, is susceptible to reinforcement, but only the final target response that terminates the IRT gets the full weight of reinforcement, beta. Memory is filled with a mixture of target responses and unmeasured responses, so that even when it is saturated after a long string of behavior, only the proportion rho will contribute to the strengthening of the target response. Interval contingencies debase coupling by strengthening any sequence of responses that precede the target response. They do not differentially reinforce long IRTs; they do not reinforce IRTs at all (as Reynolds & McLeod, 1970, also concluded). They reinforce the terminal response, giving it a weight of beta, and they reinforce whatever mixture of target responses and interresponses happen to precede it in memory, giving them a weight of 1 minus beta. Whether the mixture contains other target responses (and is thus comprised of short IRTs) or not (and is thus part of a long IRT) is irrelevant. Rather than punish premature target responses, these schedules indifferently strengthen them and everything else. It is this slackness that lets coupling drift to less than maximal levels, being stabilized only by the increment of beta guaranteed by contiguous reinforcement.
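The way a mixture of responses dilutes the memory for the target response can be sketched by applying the weighting to two hand-built response sequences. The code assumes that Equation 2 is the exponentially weighted moving average M(n) = beta*y(n) + (1-beta)*M(n-1); the sequences and the value of beta are hypothetical.

```python
def target_weight_in_memory(events, beta):
    """Apply the assumed form of Equation 2 to a sequence of events,
    where 1 marks a target response and 0 marks any other response."""
    m = 0.0
    for y in events:
        m = beta * y + (1.0 - beta) * m
    return m


beta = 0.1                        # illustrative weight
ratio_like = [1] * 60             # only target responses precede the reinforcer
interval_like = [0, 0, 1] * 20    # one target in every three responses
print("ratio-like sequence:   ", round(target_weight_in_memory(ratio_like, beta), 3))
print("interval-like sequence:", round(target_weight_in_memory(interval_like, beta), 3))
```

Both sequences end with a target response, which therefore receives the full weight beta; the remaining weight of 1-beta is shared with whatever preceded it, so the interval-like sequence couples at a little more than the one-third proportion of targets in the mixture.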

Wearden and Clark (1989) also saw the need to expand an otherwise inadequate account in terms of IRTs, either by permitting more than one type of response to enter memory or by considering a sequence of multiple IRTs, but wondered whether a molecular model of this type can be developed (p. 174). In Appendix C I show that the coupling coefficient for VI schedules is approximately:

11. Zeta = rho*lambda*B/(lambda*B+R)

To complete the model, this equation is inserted into the arousal/constraint model, Equation 4, and solved for B, yielding:

12. B = kR/(R+1/a) - R/lambda, where k = rho/delta.
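That Equation 12 is the solution of Equation 11 inserted into the arousal/constraint model can be checked numerically. The sketch assumes that Equation 4 takes the form B = (Zeta/delta)*R/(R + 1/a), which is consistent with the substitution of 1/a for c made for ratio schedules; parameter values are illustrative.

```python
def vi_rate_eq12(R, lam, delta, a, rho=1.0):
    """Equation 12: B = k*R/(R + 1/a) - R/lambda, with k = rho/delta."""
    k = rho / delta
    return k * R / (R + 1.0 / a) - R / lam


def vi_coupling_eq11(B, R, lam, rho=1.0):
    """Equation 11: Zeta = rho*lambda*B / (lambda*B + R)."""
    return rho * lam * B / (lam * B + R)


def arousal_constraint_rate(zeta, R, delta, a):
    """Assumed form of Equation 4 with k = Zeta/delta and c = 1/a."""
    return (zeta / delta) * R / (R + 1.0 / a)


# Illustrative values: R in reinforcers/s, delta and a in seconds.
lam, delta, a, rho = 0.1, 0.3, 100.0, 1.0
for R in (0.005, 0.02, 0.1):
    B = vi_rate_eq12(R, lam, delta, a, rho)
    zeta = vi_coupling_eq11(B, R, lam, rho)
    # The rate from Equation 12 should be a fixed point of the combined model.
    assert abs(arousal_constraint_rate(zeta, R, delta, a) - B) < 1e-9
    print(f"R = {R}: B = {B:.3f} responses/s, Zeta = {zeta:.3f}")
```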

Note that the first term on the rhs of Equation 12 is the familiar hyperbolic function of rate of reinforcement. Throughout most of their range, response rates inherit the hyperbolic form of Equation 4, generated by arousal and temporal constraints on responding. Equation 12 differs from the traditional version of Equation 4 in that it includes an inhibitory effect, due to truncation of memory, that is a linear function of reinforcement rate (second term in the rhs of Equation 12), and which can lead to an actual downturn in responding at very high rates of reinforcement. At low rates of reinforcement, or where lambda is large, the suppressive effect is minimal and the functions similar. Aside from the memorial effect, the slackness in the contingencies will also keep rates below their theoretical maximum. This slackness is realized in a smaller value of rho, which has emerged as a factor of k, and thus lowers ceiling response rates. Because the value of rho is generally underconstrained by the data reviewed in this paper, I routinely set it to 1, letting the impact of variation in it be carried by changes in the value of delta. Under interval schedules, its true value is on the order of 1/3.
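The downturn can be located by setting the derivative of Equation 12 with respect to R to zero, which gives (R + 1/a)^2 = k*lambda/a. The sketch below uses illustrative parameter values rather than the estimates in Table 3.

```python
import math


def vi_rate(R, lam, delta, a, rho=1.0):
    """Equation 12: B = k*R/(R + 1/a) - R/lambda, with k = rho/delta."""
    return (rho / delta) * R / (R + 1.0 / a) - R / lam


def rate_maximizing_R(lam, delta, a, rho=1.0):
    """Reinforcement rate at which Equation 12 peaks:
    R* = sqrt(k*lambda/a) - 1/a."""
    k = rho / delta
    return math.sqrt(k * lam / a) - 1.0 / a


# Illustrative values only.
lam, delta, a = 0.1, 0.3, 100.0
R_star = rate_maximizing_R(lam, delta, a)
for R in (0.25 * R_star, R_star, 4.0 * R_star):
    print(f"R = {R:.4f}/s -> B = {vi_rate(R, lam, delta, a):.3f} responses/s")
```

Larger values of lambda push the peak toward higher reinforcement rates, which is why the suppressive term matters little when lambda is large or reinforcement is sparse.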

Figure 10 shows the performance of rats under different conditions of deprivation, and Figure 11 shows the performance of pigeons and rats of whom different response topographies or forces were required. The parameters for Equation 12 are displayed in Table 3.

5.5.1. Motivational effects. Changes in deprivation, quality and quantity of reinforcer should affect the activation parameter a; we always find this to be the case. Heyman and Monaghan (1987) cite the data from nine experiments in which motivational variables affected the value of a, including some of their own shown in the left panel of Figure 10. They argue that motivational variables do not affect the value of k in Equation 4. But attempting to fit the data with just the hyperbolic part of Equation 12 can incorrectly cause one to infer that k is affected by motivation (when motivation is high, the rate curve will be able to rise higher before the descending asymptote carries it down). McDowell and Wood (1984) argued that k increased with magnitude of reinforcement, but over 99% of the variance in their data may be accounted for by Equation 12 with k invariant. The data of Hamilton, Stellar & Hart (1985) show a clear downturn under VI 5 schedules, one that was greatest under conditions that encouraged the highest response rates.

It is not out of the question that motivational variables may affect other parameters than a: A highly motivated animal may live more in the present, weighting the most recent response more heavily (Killeen 1984; 1985), leading to larger values for lambda. In fact, the data of Snyderman (1983; Figure 10) show increases in both a and lambda with deprivation level. Efficient response topographies may be more readily shaped in highly motivated subjects, resulting in smaller values of delta. Rho may vary with motivational level, as activities such as exploration become more or less easily activated. But none of these adjustments are necessary to account for any data I have analysed, with the exception of Snyderman's.

5.5.2. Response topography. Changes in the form of the response should obviously affect the maximum response rate, and Heyman and Monaghan (1987; also see Heyman, 1988) cite numerous studies which show this to be the case. Figure 11 shows the data from two of those studies, which are well fit by Equation 12.

5.5.3. Duration of reinforcement. Manipulating the duration of an incentive affects more than activation level. When the incentive is brief it does not completely reset memory (that is, does not completely displace the memory of the operant response with that of the consummatory response; see Figure B1). The strengthening effect of a reinforcer increases with its duration according to the same cumulative exponential function as the saturation of memory for any other response (Equation 8; Killeen, 1985); its ability to erase memory follows the same function (Killeen & Smith, 1984). As the incentive's duration is extended, therefore, it both increases a and more effectively erases memory for the prior target responses, and should thus have mixed effects: At short inter-reinforcement intervals where truncation of memory keeps coupling below its maximum, increasing duration should degrade whatever residual coupling is left to prior responses, and thus work against the increasing activation. At large inter-reinforcement intervals, however, there is adequate time to saturate memory with target responses, so that the arousal effects should dominate the inhibitory effects of the reinforcer, and response rates should show greater covariation with the magnitude of the incentive. The idiosyncratic properties of very brief incentives, such as electrical stimulation of the brain, may be due in part to the interaction of these two aspects of reinforcement: their brevity permits coupling to responses occurring before the prior incentives, yet, because the growth of arousal during the incentive is so minimal, the value of a is also small, and will support neither large work requirements nor extensive responding in extinction.

The present theory gives us a more precise way of interpreting motivational changes, yet it is not insensitive to the variety of ways in which motivational operations can affect behavior. Systematic differences in basic phenomena may be found under certain experimental paradigms such as the closed economy, which typically arranges very long duration reinforcers, low motivational levels, and extended sessions (e.g., Collier, Johnson, and Morgan, 1992; Timberlake and Peden, 1987; Zeiler, 1991). But in many cases these effects, as well as systematic within interval decreases in response rates (McSweeney, 1992), may be accommodated by simple models which let the value of a decrease within a session as hunger decreases.

5.5.4. Fixed Interval schedules. The highest rates of reinforcement, where we would expect to find the best evidence for a downturn in responding due to weakened coupling, are obtainable under Fixed Interval (FI) contingencies. Figure 12 shows the rates of lever-pressing for food pellets by rhesus monkeys under high rates of reinforcement (Allen & Kenshalo 1976). Jason's rate of pressing shows a definite downturn as the interval between reinforcers fell below 10 s; another monkey's responding did not show the downturn. In Appendix C I show that the approximate rate model for FI schedules is the same as that for VI schedules, and thus use Equation 12 to fit these data.

Under FI schedules animals pause for about half of the interval length before they begin to respond, whereas under VI schedules animals respond at a relatively uniform rate throughout the interval. The quantitative predictions are not undermined by this qualitative difference, but merely require adjustments in the quantitative parameter estimates (see Appendix D).

Periodic schedules elicit adjunctive behavior, such as schedule-induced polydipsia, which follows the same function rule as operant responses (Appendix C). Figure 12 shows the rate of concurrent drinking by Jason. However, the downturn in drinking at high reinforcement rates is almost surely due to competition from the terminal response. These data are displayed to remind us that the theory awaits proper development for lowered ceilings on response rates caused by the concurrent emission of other responses (see Appendices B and D).

Interval schedules do not differentially reinforce pausing; they nondifferentially reinforce any response that comes before a target response and occupies time (whether or not that happens to be another target response). This is a subtle point that bears repeating. It makes sense to say the opposite--that interval schedules reinforce long IRTs--only if the premature responses debase the reinforcement of the target response class because they require effort, or because they lower the probability of reinforcement for members of that class. Conversely, the present theory assumes that interval schedules command lower rates than ratio schedules because they give the memory of everything that comes before a reinforced target response (including, of course, other target responses, and other stimuli) the substantial weight of 1-beta. It is because of this promiscuity of reinforcement that we see lower rates of responding, and the drifting of those rates, on interval schedules. It is now time to examine the nature of the coupling constant, rho, which carries the burden of this distinction between ratio and interval schedules.

6. Preasymptotic behavior: Trajectories in response-space

To understand how behavior shifts from the initial acquisition phase to the established patterns of responding described by that model, it is necessary to explore the conditioning process in more detail. What follows is one of many possible model systems, introduced as grounding and as a plausibility proof for the asymptotic model, which is the primary focus of this article.

Figure 13 shows a slice through behavior-space in which successive instances of the target response are plotted on the y-axis as a function of all other responses on the x-axis. Upon each tick of an internal clock the subject emits some response, which moves the tip of the trajectory along either to the right or up. In this figure the probability of a target response is initially set at 0.1, and the line labelled 0 shows the resulting average trajectory before any conditioning has taken place.

Under FR schedules reinforcement occurs as soon as the trajectory crosses the ordinate corresponding to the ratio value N (the consequential line, set at N=16 in Figure 13). The learning model invoked here assumes that upon reinforcement the probabilities of all of the events in the trajectory are increased some proportion (wj) of the distance to their maximum. The equations instantiate Equation 2 for each element of the trajectory. It is one of the oldest and most basic mathematical models of the learning process. It is presented more explicitly in the paper text.

What happens when this model is turned on? Intuitions vary: Since reinforcement increases whatever sequence of responses the animal emitted, and since it was most likely to be doing something other than the target behavior, one might assume that this process would drive the target responding into extinction. But this is not what happens. Figure 13 shows the average trajectories after successive blocks of 100 reinforcements. The trajectories walk themselves into a near-vertical position, indicating that most of the behavior now consists of the target response. Why should this have happened? The paper text demonstrates how this is done, showing that the response space is a field with gradients at each point oriented toward the consequential line. It is a dynamic field, since the strengthening given to any element both depends upon and determines the probability of making a target response. Under FR contingencies the trajectory evolves toward the leftmost curve in Figure 13. If the probability of making a target response at every opportunity to do so were 1.0, then the final state would be a vertical line.

For purposes of analysis we have treated each element in the trajectory as a separate response. Our equipment typically counts all target responses as members of the same class wherever they occur in the trajectory, and the coupling associated with that class is their weighted sum, with the last element receiving a weight of beta, the penultimate one a weight of (1-beta)*beta, and so on. The sum of those weights is the sum of Equation 14 up to N, that is, Zeta. Thus the limiting model for ratio schedules (the vertical trajectory) is one toward which the reinforcement process naturally attracts behavior, and it is the one given by Equation 8 in the text (with rho = 1).

Of course it is unrealistic to think that each element in a trajectory constitutes a unique response having its own memory register. Therefore, in the present simulations after each reinforcement the probability at position i is assigned the average value for it and the ones immediately before and after it. This averaging provides a minimal kind of coarse graining which smooths the results of the finite-elements analysis. The lumping of behavior in real organisms is substantial, with bouts of responding that are self-sustaining (i.e., the probability of continuing in some class of responses is greater than that of leaving it). But such verisimilitude was sacrificed for simplicity; insofar as the present model can reflect the trapping effects of reinforcement and give a correct ordering of coupling constants for the various schedules and parts of them, then more realistic models involving additional parameters can do so a fortiori.
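The coarse graining described above amounts to a three-point moving average over positions in the trajectory. A minimal sketch follows; the handling of the two end positions, which have only one neighbour, is an assumption, since the text does not specify it.

```python
def smooth_three_point(probabilities):
    """Replace each value by the mean of itself and its immediate neighbours.
    End positions average over the neighbours that exist (an assumption)."""
    n = len(probabilities)
    smoothed = []
    for i in range(n):
        window = probabilities[max(0, i - 1):min(n, i + 2)]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Hypothetical target-response probabilities at five trajectory positions,
# with one position strengthened by a recent reinforcement.
before = [0.1, 0.1, 0.7, 0.1, 0.1]
print(smooth_three_point(before))
```

The strengthened position shares its gain with its neighbours, so that adjacent elements of the trajectory no longer behave as fully independent memory registers.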

When the FR is large, the consequential line is so high above the origin that reinforcement cannot reach all the way back to attract the earliest elements, but leaves them to drift around their baseline level. The smoothing causes food-oriented responses during the prior reinforcer to propagate forward in the sequence, causing some bowing of the trajectories below the original baseline probabilities (see Figure 13). The coupling on the simulated schedules is the proportion of responses in the average trajectory that are target responses. Figure 14 shows this proportion, both for the trajectory as a whole, and for the last half of it. The curves through the points are the locus of Equation 8, multiplied by the coupling constant, rho, which tells us how far the trajectory has rotated to the vertical. For the rate during the last half of the schedule, rho = 0.9, but for the trajectory as a whole, it takes the lower value of 0.6. Note that all changes in behavior due to learning will be reflected in sub-maximal values of rho, which is therefore the index of choice for degree of conditioning. However, predictions with rho fixed at 1 are just as good as those using the precise value, because rho appears in the basic model as a factor of delta; the only effect of freezing it at 1 is to inflate the recovered value of delta. Further discussion of the nature of rho, and simulations for the other basic schedules, are found in Appendix D.

In summary, rho is the asymptotic proportion of target responses in the trajectory as N approaches infinity; ratio schedules drive its value toward 1, interval schedules to lower values. But in general it may be fixed at 1, letting subunitary values due to schedule type or incomplete conditioning be accommodated by larger values of delta.

7. From Interval to Ratio: Concatenating Zetas

The reinforcement of interresponses may be tempered by adding ratio contingencies that require a certain number of target responses to be made after some interval has elapsed. As N increases, the contingencies increasingly strengthen target responses rather than interresponses, driving rate toward its maximum. Figure 15 shows the predictions for N ranging from 1 (i.e., a simple VI) to 12, added to a VT schedule. The data are from Catania (1971), and the curve from a concatenation of the coupling coefficients that constitute these contingencies:

16. Zeta (compound) = Zeta(FR1 to N) + Zeta(VI N to infinity)

Now for the first time we may no longer absorb rho into the maximum response rate; the data require that it take a value of .34 for the interval component, indicating that the measured key-peck response constitutes about 1/3 of the reinforced behaviors in this experiment. (The other parameters are all in line with the values assumed in other experiments). The point is not that a curve can be found to fit the data, which could be done more simply than this; the point is to demonstrate how a coherent theory of schedule control may account for transitions between different types of schedules in a principled fashion.

We may debase the contingency by going in the other direction, following a ratio requirement with a delay. As the delay is lengthened, concatenation of the appropriate coupling coefficients predicts a smooth decrease in response rates, as was observed by Catania & Keller (1981).

The success of Equation 16 is hopeful. Properly written coupling coefficients may be concatenated in ways that correspond to the contingencies of reinforcement: Averaged for mixed schedules, and sequenced as in Equation 16 for tandem (series) schedules. The maximum of two coefficients would be appropriate for conjoint (parallel) schedules, as the contingencies most tightly coupled to behavior would drive responding in the direction in which they would be even more effective. Estimating the coupling on multiple schedules requires consideration of the reinforcing strength of non-contingent component changes, and how they couple to both responses and interresponses, as well as other factors. The same issues arise for concurrent schedules (parallel contingencies for different responses), plus the need to generalize the model of temporal constraints. The development of techniques for combining coefficients may lead us to a unified system of models that is as useful in describing the consequences of reinforcement schedules as the Laplace transform is in describing the arrangements of linear systems. Conversely, the theory can guide us in designing contingencies that are resonant with the operating system of the organism, replacing the old designations of schedules with ones that are based on the design principles of the organism.

8. Curtailing the Response

The minimum IRT for both pigeons and rats is close to one-quarter of a second. Furthermore, pigeons often make double pecks, functionally increasing the duration of a response to about 0.6 s (Arbuckle & Lattal 1988). Modern computer technology makes it possible to reinforce animals within a few milliseconds of their first contact with the operandum, and thus to intercept their stream of behavior before a response has fully run its course, and before the nascent response has fully entered memory. If the experimenter's definition of the response is instantaneous, so that it is reinforced in the first instant of its occurrence, then unless the animal's definition is also instantaneous, the coupling must fall to zero. It should not then be surprising that schedules which delay reinforcement by 1/2 s from the onset of a response generate higher rates of responding (Arbuckle & Lattal 1988). If reinforcement is precipitate, it will not couple to behavior. In conditioning, as in humor, timing is of the essence. Rather than treat the weight given to a response as constant, we should take:

18. beta = 1 - exp(-lambda * delta / rho)

In the first two experiments we essentially employed Equation 18 with delta/rho set equal to 1; that is, treated all responses alike (i.e., as quanta). Subsequent analysis belied this quantal assumption, because the rate of decay of memory was highly correlated with delta and with schedule type, leading us to assert the numerator of Equation 18. Whereas delta tells us the minimal IRT, 1/rho tells us how many other interresponses come along for the ride, thus functionally multiplying the duration of the response. Equation 18 leads this inference back to the weight accorded a single response. I explored the notion that responses might be curtailed by reinforcement in Catania's (1971) experiment, and the data were consistent with a fractional weight for the last response, as shown in Figure 6.
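A brief numerical sketch may make the behavior of Equation 18 concrete. It assumes that the decay-rate symbol in the exponent is lambda; the function name and the parameter values are illustrative only and are not estimates from the experiments.

```python
import math

def currency_weight(lam: float, delta: float, rho: float = 1.0) -> float:
    """Weight beta accorded a response, following Equation 18.

    lam   -- rate of decay of memory (assumed to be the lambda of the text)
    delta -- time in seconds that the response has run before reinforcement
    rho   -- proportion of the behavior stream made up of the measured response
    """
    if rho <= 0:
        raise ValueError("rho must be positive")
    return 1.0 - math.exp(-lam * delta / rho)

# Illustrative values only.
full = currency_weight(lam=0.5, delta=0.25)            # response allowed to run its course
curtailed = currency_weight(lam=0.5, delta=0.01)       # reinforced almost at onset
diluted = currency_weight(lam=0.5, delta=0.25, rho=1/3)  # rho = 1/3 triples the exponent
print(full, curtailed, diluted)
```

As delta approaches zero the weight, and with it the coupling, falls to zero, which is the sense in which precipitate reinforcement fails to couple to behavior.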

Traditional contingencies of reinforcement thus not only truncate the memory of a response, they may curtail its manifestation, in both ways undermining their own control of behavior. Skinner (1938, p. 40) thought that our definition of the unit of behavior was correct when it yielded orderly dynamic changes in behavior. But this is not enough. All of the curves in Figure 3 are relatively orderly. We must look for a resonance between our definition of the response and an animal's, as indicated by optimal control and maximal rate of learning, as seen in the tuning curves in Figure 5. The current analysis suggests that reinforcement should take into account an exponentially weighted history of responding that epitomizes the animal's memory for its own behavior, and should follow the onset of the most recent response no sooner than delta s. Performance on the various schedules of reinforcement is a straightforward consequence of the extent to which our contingencies of reinforcement respect the animal's understanding and manifestation of its behavior.

9. General Discussion

9.1. Synopsis: The basic principles underlying this calculus of contingencies are represented in Table 4. It is useful to review them before a more general discussion.

Line 1 gives the linear average that governs the decay of memory from one response to the next. During a response, the weighting of the new and downweighting of the old is continuous (Equation 18). The currency parameter, beta, characterizes the resulting weight given to the modal target response. Other responses also occur in the proportion 1-rho, and also index this equation. When the organism is emitting no responses (including observing responses), memory does not decay.
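The response-indexed updating described for Line 1 can be sketched as follows. Table 4 is not reproduced in this section, so the sketch assumes the standard exponentially weighted moving average, M_n = beta * y_n + (1 - beta) * M_(n-1), with y_n = 1 for a target response and 0 for any other response; the function names are hypothetical.

```python
def update_memory(memory: float, is_target: bool, beta: float) -> float:
    """One response-indexed step: the new response receives weight beta and
    the previous contents of memory are downweighted by 1 - beta."""
    y = 1.0 if is_target else 0.0
    return beta * y + (1.0 - beta) * memory

def memory_after(stream, beta: float, memory: float = 0.0) -> float:
    """Fold a stream of events into memory.

    Each element is True (target response), False (other response), or
    None (no response emitted, so memory is not indexed and does not decay).
    """
    for event in stream:
        if event is None:
            continue  # the passage of time alone leaves memory unchanged
        memory = update_memory(memory, event, beta)
    return memory

# A target response followed by two other responses decays to beta * (1 - beta)**2;
# followed by two empty intervals it keeps its full weight beta.
print(memory_after([True, False, False], beta=0.25))  # 0.140625
print(memory_after([True, None, None], beta=0.25))    # 0.25
```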

Line 2 tells us the rate at which responses (both target and other responses) are elicited by incentives occurring at a rate of R. The instigating force for behavior is activation due to the cumulative memory of incentives; it is proportional to the historical rate of delivery of incentives in the experimental context (Appendix B; Killeen, Hanson & Osborne 1978). One incentive under the specified conditions of motivation will support a seconds of behavior, or a/delta responses when each requires delta seconds for its completion.

Line 3 formalizes the observation that responses take time, which detracts from the time available for the emission of other responses. A blocked-counter model (Bharucha-Reid, 1960) provides the basic compensation for this depression of observed rates below their theoretical level, while Line 2 provides that theoretical level. Appendix B gives the derivation, plus some additional data.

Line 4 derives from the combination of Lines 2 and 3, and adds the coupling coefficient zeta to complete the model of the contingent control of responding. This is the canonical equation for predicting behavior: To predict response rates one specifies how a particular schedule couples incentives to the organism's memory of its behavior, and inserts the appropriate expression for zeta into Equation 4. Prototypical expressions for zeta are given in Table 5.
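The way Lines 2 through 4 fit together can be sketched numerically. Because Table 4 is not reproduced here, three forms are assumed rather than quoted: the theoretical rate is a*R/delta (from the statement that one incentive supports a/delta responses), the blocked-counter correction is b / (1 + delta * b), and zeta enters as a multiplier. The function names and parameter values are illustrative only.

```python
def theoretical_rate(a: float, R: float, delta: float) -> float:
    """Line 2 (assumed form): responses per second elicited by incentives
    arriving at rate R, when each incentive supports a seconds of behavior."""
    return a * R / delta

def constrained_rate(b: float, delta: float) -> float:
    """Line 3 (assumed form): blocked-counter correction for the time that
    each response of duration delta takes away from other responses."""
    return b / (1.0 + delta * b)

def predicted_rate(zeta: float, a: float, R: float, delta: float) -> float:
    """Line 4 (assumed form): coupling coefficient times the constrained rate."""
    return zeta * constrained_rate(theoretical_rate(a, R, delta), delta)

# Illustrative values: one incentive per minute, a = 100 s, delta = 0.25 s.
print(predicted_rate(zeta=1.0, a=100.0, R=1 / 60, delta=0.25))  # about 2.5 responses/s
```

Under these assumptions the predicted rate rises with R toward a ceiling of zeta/delta, so the coupling coefficient scales the maximum attainable rate.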

Where rate of reinforcement varies substantially with rate of responding, this must be taken into account using the appropriate schedule feedback function. In this paper I have assumed that reinforcement rate is proportional to response rate under ratio schedules, and independent of it under interval schedules. Of course, it is always possible to measure the obtained rate of reinforcement and use Equation 4 directly, as is necessary when the schedule-feedback function is obscure. This is possible because I do not anywhere in this theory assume that the schedule-feedback function controls behavior directly (i.e., this is not a molar optimization theory), but only indirectly through the rate of reinforcement it sustains.

Line 5 gives the currency parameter a derivative position in the theory: It is the cumulative weight of reinforcement that is brought to bear on a response of duration delta, when the measured target response comprises the proportion rho of the total behavior stream, and the rate of decay of memory while the animal is responding is lambda.

Prototypical coupling coefficients are displayed in Table 5. These are inserted into the activation/constraint equation to predict behavior. Line 1 (and Figure 5) gives the coupling coefficient for the weighted reinforcement of IRTs in Experiment 2. The remaining lines give the coefficients for responding under some traditional reinforcement schedules. Because B appears on both sides of the equations for interval schedules, these equations must be iterated until the predictions converge. However, for an approximate solution we may use a truncated power series to represent the exponential term, yielding a general approximation for interval schedules, Line 8. This permits approximate solutions to the basic equations which are much simpler, and whose average predictions are usually within 1% of those afforded by the precise equations.
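The iteration mentioned above, in which B appears on both sides of the equation, amounts to a fixed-point search. The sketch below shows only that procedure. Table 5 is not reproduced in this section, so toy_interval_coupling is a hypothetical stand-in for the interval-schedule coefficient, not the expression from the table, and the rate ceiling assumes a blocked-counter form of the activation/constraint equation.

```python
import math

def solve_rate(coupling, a, R, delta, b0=1.0, tol=1e-9, max_iter=1000):
    """Iterate B = coupling(B) * ceiling until successive values agree.

    ceiling is the rate predicted when coupling equals 1, under the assumed
    blocked-counter form (a*R/delta) / (1 + a*R).
    """
    ceiling = (a * R / delta) / (1.0 + a * R)
    B = b0
    for _ in range(max_iter):
        new_B = coupling(B) * ceiling
        if abs(new_B - B) < tol:
            return new_B
        B = new_B
    raise RuntimeError("iteration did not converge")

def toy_interval_coupling(B, lam=0.1, R=1 / 60):
    """Hypothetical coupling that grows with responses per reinforcer (B/R).
    Used only to give the iteration something to converge on."""
    return 1.0 - math.exp(-lam * B / R)

B_hat = solve_rate(toy_interval_coupling, a=100.0, R=1 / 60, delta=0.25)
print(B_hat)
```

With these illustrative values the iteration settles within a few steps; a start value of zero should be avoided here, because zero is also a fixed point of this toy coupling.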

For most of the traditional types of scheduling arrangements, the response/interresponse proportion, rho, is too strongly conflated with the minimum response time, delta, to be independently estimated. Making a virtue of this necessity I set it to 1, letting delta absorb the residual variations in its value. This maneuver permits predictions for all basic schedules based on the same three parameters. The resulting general forms for the various schedules are presented in Table 6. The logic of coupling developed for responses may be extended to stimuli, yielding a model for conditioned reinforcers (Line 3).

Under a few scheduling arrangements we are permitted/forced to assign a real value to rho, as in the last study analysed in this paper. Then it takes a value of about 1/3, which is consistent with independent estimates of the efficacy of ratio schedules relative to interval schedules (Zuriff, 1970, estimates that each reinforcer on variable-ratio schedules will generate to 3 times the control of measured responding as on variable interval schedules).

The cumulative memory of a sequence of incentives generates a heightened state of arousal whose asymptote is given by Equation B1 (see Figure B1). This arousal can become associated with the target response, with other responses, with stimuli, and with the experimental context; the degree of these associations is given by the appropriate coupling coefficients, yielding summative models such as those shown in Table 6. The role of such activation (the memory of incentives past) in instigating behavior was introduced as a fundamental assumption of the present theory, embodied in Line 2 of Table 4. On now returning to it we see it as another manifestation of the operation of short-term memory, obeying the same rules of saturation and displacement, but operating on emotionally salient stimuli. The passage of incentives through short-term memory increments a longer-term state of activation that elicits diffuse behavior; their co-occurrence in memory with other stimuli and responses directs the association of that arousal, a process we call conditioning. Because the constituent processes involve the movement of organisms under the force of incentives, this ensemble of principles provides a dynamics of behavior.

9.2. Degrees of Freedom. There are three central assumptions of this theory: The activation of behavior by incentives, constraints on responding, and memory as the mediator of reinforcement. An important ancillary assumption is the continual indexing of memory by responses, including observing and consummatory responses, which will displace the memory of target responses by the memory of incentives (or their consummation). These assumptions may be captured by models in different ways; this paper displays but one of many possible instrumentations. I have left out certain considerations to spare the reader too heavy a burden of detail (e.g., the amount of behavior supported by an incentive (Line 2 of Table 4) should also take into account unmeasured responses by including as a divisor of a wherever it occurs. However, the only place this detail would have an important impact is on the reader's patience). Further application of the theory to data will reduce the field of candidate implementations, modify and expand the contents of Tables 4-6, and refine some of the implementations chosen here. There is a vast amount of empirical research available for this task. The present theory motivates well-defined alternative models of them, and their selection by the likelihood ratios of predictions given those data (Sakamoto, Ishiguro & Kitagawa 1986). The decision axis for this argument is not the truth of the theory, but the expected utility of the models it supports. Where that is not above threshold, it is hoped that modifications wrought by the field will bring it there.

Important ways in which the theory must be developed include: Specification of the coupling coefficients for concurrent and multiple schedules; predictions of the variance in behavior associated with the nonlinearities in the governing equations; integration with the research on short-term memory for sequences of stimuli; contact with the extensive literature on associative conditioning; extension to memory of sequences of heterogeneous stimuli and responses; and grounding the constructs in the pharmacological and biobehavioral literature.

9.3. Relation to other theories and phenomena

9.3.1. Conditioning at long delays. Conditioning is best with short delays between a response and reinforcer, yet an impressive amount of conditioning can occur at long delays. According to our event-indexed account of memory, the passage of time between a response and a reinforcer is irrelevant; it is the occurrence of other responses that blocks control by filling memory with unrecorded behavior. Lattal and Gleeson showed conditioning with delays of 10 to 30 seconds in rats and pigeons, and suggested that: "Even though the response and reinforcer are separated from one another by an absolute time interval of 10-30 s, in relative terms, this interval may be quite immediate" (Lattal & Gleeson 1990, p. 38; also see Wilkenfield et al., 1992). According to our account, it is relative: relative to the total amount of other behavior which fills memory. And that is driven by overall rate of reinforcement, causing memory to lengthen in unstimulating environments (when measured in temporal units; Killeen 1984; 1991; Williams, 1978). Ours is an ordinal contiguity account, in that it is the number of times that memory is indexed that discounts a reinforcer's control of a response, and that number depends on the richness of the environment and the activity of the organism.

Some types of conditioning such as taste aversions may occur over a span of hours, not fractions of a minute. In one contemporary review of this literature, Revusky and Garcia explained these effects in terms of an interference theory: "On a practical level, as the time between a stimulus and a consequence is increased, the probability of a learned association between them will be reduced. ... On a theoretical level, however, it is quite likely that it is not the increased time itself which interferes with the learned association but the fact that an increased duration of time is likely to contain an increased number of interfering events." (1970, p. 41). Events relevant to the consequence seem to interfere the most: Revusky and Garcia cite research showing hours-long delay gradients for operant responses when these are protected from interference by a change of context, such as removal from the experimental chamber. Similarly, responses that are marked by stimulus change may be more memorable. Indeed, the primary action of conditioned reinforcers may consist in their marking of responses and making them thereby more memorable (i.e., protected from interference) at the time of reinforcement. Conditioned reinforcers work well because they have a conditioned salience, but other highly salient stimuli also do the job (Lieberman, Davidson & Thomas 1985).

It is clear that there cannot be just one unidimensional response memory. Behavior carries the organism along the various dimensions of its psychological space. When an incentive such as reward or illness occurs, recent events along all dimensions compete for association. In this paper, we have studied only one of those dimensions, that of homogenous operant responses. Similar processes of coupling may occur simultaneously along all dimensions of the organism's psychological space. I have argued that the force of an incentive decreases exponentially along those dimensions (Killeen 1992). The present paper constitutes a working out of the implications of that theory for one exemplary continuum.

9.3.2. Response selection. The excitability of various actions (a) is a key factor in determining their rate of occurrence (Lines 2 and 4 in Table 4). Their relative excitability (rho, the coupling constant) governs their availability to memory, and thus is a key factor in the coupling of the incentive to behavior under schedules that indifferently reinforce the repertoire (Lines 4 through 8 of Table 5). Contingent reinforcement adds a guaranteed weight of beta to the measured behavior, but this may not be adequate to compete with a more prepared response having a sufficiently larger value of rho. The availability of responses with intrinsically larger coupling constants may lead to instinctive drift, with the topography of the measured behavior degenerating toward that of a more easily aroused form, even when that is less effective in obtaining reinforcement (Breland & Breland 1961). The gradients that attract behavior toward the target response are often quite attenuated, and provide weak competition for paths through instinctively prepared responses.

When reinforcement contingencies may be satisfied by a variety of discrete acts (Reed, Schachtman & Hall, 1991) or by topographical variants of a single act (Stokes & Balsam, 1991), the history and genetic predispositions of the organism will favor some instances; their presence in memory at the time of reinforcement will further strengthen them, and lead them to dominate the repertoire, even after the experimental constraints are relaxed (Davis & Platt, 1983). Revusky and Garcia have also speculated on this mix of behaviors: "If certain species specific behaviors happened to occur prior to the arbitrary response, they would become more strongly reinforced than the arbitrary response although the delay of reinforcement was longer" (1970, p. 49; also see Timberlake 1983). These ideas were formalized by Staddon and Zhang (1989), who generated a model of response selection based on "assumptions [of]: arousal and adaptation, the idea that reinforcement transiently energizes a range of activities . . . ; strength and competition, the idea that each activity has a certain tendency to occur and that the strongest will win; and variability, the notion of a repertoire of activities" (1989, p. 190). Their model involved an exponentially weighted moving-average of differences in short-term memory for different responses (with a parameter that functioned like beta), and differences in excitability of various responses (with a parameter that functioned like rho).

Skinner (1938), among others, noted that "When a reinforcement depends upon the completion of a number of similar acts, the whole group tends to acquire the status of a single response" (p. 300). The present theory is a latter-day response-unit hypothesis, with exponentially-decaying weights for the constituent acts. However, our model does not require that the acts be similar, only that they be in memory at the time of reinforcement. Contingencies which provision memory with a mixture of acts (some measured, and others merely filling time) will strengthen that mixture; but only a fraction rho of those acts will be measured and contribute to the coupling of the target response to reinforcement.

The presence of response rates and probabilities as variables in coupling coefficients indicates the important role of positive feedback in response selection. Candidate stimuli or responses that are favored by evolutionary or ontogenetic histories may be given an invincible lead in the competition for association. Conversely, stimuli or responses with histories of nonrelevance may be so disfavored that it may take many trials of co-occurrence in memory to recondition observation of the stimulus or emission of the response. Such manifestations of learned helplessness and latent inhibition constitute the trapping of a response trajectory in a local minimum by a sensitivity to consequences that is, by definition, an inherent aspect of adaptive behavior.

How does reinforcement actually select responses? Bindra (1972) held that the primary law of behavior is approach to signs of reinforcement, and I have generalized this argument to other dimensions (Killeen 1992). Figure 13 pictures a 2-D slice through this space. Those regions of the behavior-space that are close to the consequential region become signs of it and attract behavior, drawing the organism along all dimensions of its space, including the response dimension. The easiest way to approach the events in memory at the time of reinforcement is to replicate the reinforced behavior (in the right place, at the right time, etc). The pressure to do this increases as an exponential function of the proximity of the constituent responses to reinforcement. The present theory provides the mechanism for the posited exponential change in the force of reinforcement along the response dimension.

9.3.3. The correlation-based law of effect. Baum (1973) has argued for a correlation-based law of effect in which the units of analysis are (molar) reinforcement rates and response rates, whose correlation over stretches of time constitutes the law of effect; in his theory, correlation is more important than contiguity. Williams (1976) challenged that theory by finding substantial decreases in response rates in schedules that maintained a correlation over delays of 3 to 15 seconds while disrupting contiguity. (Of course, if the stretch of time over which animals compute correlations is only 3 seconds long, the observed decrease in rates could be attributed to a debased correlation, as both authors recognized. On the other hand, a correlational theory based on a sample of only the few seconds before a reinforcer is only those few seconds away from being a contiguity theory.)

The present theory constitutes a contiguity-weighted correlation-based law of effect, as it shares features of both contingency and contiguity accounts. Reinforcement rate is important because it determines the level to which arousal can cumulate (see Appendix B), and that determines the potential rate of behavior (Line 2 of Table 4). Zeta, a key factor in our theory, is a contiguity-weighted correlation coefficient (see Figure 1F). But the details differ. The present theory is response-indexed, whereas for Baum "time is a fundamental dimension of all interactions between behavior and environment" (1973, p. 139). Zeta is a correlation between the contents of memory on the occasion of reinforcement and the experimenter's requirements, not between rates of responding and rates of reinforcement. There is less emphasis here on schedule feedback functions, as their only role is to control rates of reinforcement and contents of memory; in particular it is not assumed that organisms respond faster on ratio schedules because that is correlated with higher rates of reinforcement.

Learning theorists often counterpose contiguity (closeness between two events) and contingency (relative frequencies of one event in the context of another, often treated as a correlation). The present theory cuts across this distinction, as zeta is a contiguity-weighted correlation coefficient. Williams noted that one of the reasons for contingency treatments is that "As of now, the reduction of free operant behavior to molecular principles has been notably unsuccessful" (1976, p. 442; also see Thomas, 1983; Papini and Bitterman, 1990). The present dynamical account is just such a reduction, albeit one in which correlation plays a key role.

9.3.4. Foraging. The exponentially-weighted moving average model of memory proposed here has also appeared in the behavioral-ecology literature on optimal foraging (for a review, see Kacelnik, Krebs & Ens, 1987). In some experiments, however, there is little or no evidence for the control of behavior by any of the responses preceding the most recent (e.g., Cuthill et al., 1990). But the responses in question often comprise extended episodes of travel, and thus by our present analysis the most recent may have washed the penultimate behavior out of short-term memory.

McNamara & Houston (1987) have shown that to forage optimally, the currency parameter for an animal's memory of reinforcement should vary with the rate of reinforcement in the environment as a whole, giving more weight to the most recent in a relatively rich environment. There is good evidence that this happens. Yet, we have not explicitly specified such a dependency of lambda on rate of reinforcement in this model. Does this imply that the present model is both non-optimal and inconsistent with the data? Neither. The present theory entails an implicit dependency of rate of memory decay on rate of reinforcement, because memory is indexed by responses, and their rate directly depends on the rate of reinforcement (see Table 4). Because this dependency is non-linear it has been easier to develop the theory in the response domain rather than the time domain. Furthermore, there are times when the animal is not responding, and we expect no changes in memory then. Nonetheless, it is clear that as rate of incitement increases, so also will rate of responding, causing the depth of memory measured in seconds to shorten--as it must to satisfy considerations of optimality in both foraging decisions and causality detection.
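The implicit dependency of memory depth on response rate can be illustrated with a short calculation. It assumes the quantal case of Equation 18 (delta/rho = 1), so that each response multiplies the weight of earlier events by 1 - beta = exp(-lambda), and it assumes a steady response rate; the function names and values are illustrative.

```python
import math

def weight_after_seconds(t: float, response_rate: float, lam: float) -> float:
    """Weight remaining for an event t seconds old when memory is indexed by
    responses emitted at a steady rate (responses per second)."""
    return math.exp(-lam * response_rate * t)

def memory_half_life_seconds(response_rate: float, lam: float) -> float:
    """Seconds needed for a memory to lose half its weight. With no
    responding, memory is not indexed and the half-life is unbounded."""
    if response_rate == 0:
        return math.inf
    return math.log(2) / (lam * response_rate)

# Doubling the response rate halves the depth of memory measured in seconds.
for rate in (0.5, 1.0, 2.0):
    print(rate, memory_half_life_seconds(rate, lam=0.5))
```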

9.3.5. Memory. No one has argued more strongly than Shimp for a reevaluation of our treatment of behavior in terms of its structure, an extension of our definition of the response to sequences of acts, and a recognition that this is tantamount to the introduction of memory into the experimental analysis of behavior (e.g., "it makes sense to ask what relations obtain between what a subject can remember of its own recent behavior in short-term memory experiments, . . . and what behavioral patterns can be established as behavioral units when those patterns systematically precede a reinforcer"; 1976a, pp. 125-126; 1976b; cf. Branch, 1977). The present theory suggests that the relation is one of isomorphism.

The present theory has much in common with the standard models of human short-term memory (Atkinson & Wickens 1971; Deutsch & Deutsch, 1975; Norman 1970). It is an interference theory, as many of those are. Equation 15 transfers the weighted contents of short-term memory to a long-term store. There is a close similarity between the mathematical models derived from both traditions. The present theory instantiates short-term memory as a low-pass filter between responses (and stimuli) as input, and behavior (and long-term memory) as output.

Organisms remember many different types of things, whose exponentially decreasing weight in memory is consistent with the demands of a temporal lens, focused on the most likely causal candidates preceding an important event. Potential candidates ramify exponentially as time passes, and their a priori weight in memory must decrease in like manner. Other cues to causal relevance than temporal contiguity, such as spatial proximity and salience, also weight the representations. The present models are simple because they take as their domain homogenous response sequences. Qualitatively different stimuli and responses will be weighted and averaged into their own long-term stores, perhaps in a manner similar to the temporally-tagged responses sketched in Equation 15. Furthermore, distinctive stimuli may differentiate otherwise similar responses: Animals may remember the last responses they made in a white alley without interference from similar responses in the home cage or in dark alleys, and may cumulate these tagged memories over substantial temporal interludes (Capaldi, 1992), giving rise to a discrete-trial version of the bitonic function (e.g., Figure 7) known as the partial reinforcement acquisition effect.

The present theory may be seen as a model of a neural field in which an input attracts a proportion beta of the available resources. If the activity on the field is normalized by lateral inhibition, then the attention allocated to the new input is found by decrementing all of the other representations by 1-beta. If the decrement is random, those with the largest representation will be taxed in proportion to the resources they command. This yields the necessary geometric/exponentially-weighted decay of memory. When an incentive enters memory, its contents are then averaged into their appropriate long-term stores--including the memory of the incentive, which engenders a cumulating increase in activation level. The activation may involve a different neural locus (e.g., the ventral striatum) than the short-term memory of other stimuli and responses (e.g., the dorsal striatum; Robbins & Everitt, 1992), and thus have different rate constants associated with it. Behavior dynamics may provide a useful module for the representation of short-term memory in more complete neural networks, such as the Sutton-Barto-Desmond model (e.g., Moore, 1991), Donahoe and Palmer's (1993; Donahoe et al., 1982), and others (e.g., Grossberg, 1975; Levine, 1992).

10. Conclusion

We reinforce the animal's representation of the response, not the response itself. This representation is a weighted memory of past behavior. The present theory places that representation into a dynamic system of equations for predicting the effect of reinforcement contingencies. It is a non-metaphorical theory, in that each of its key terms is well-defined in ordinary language. It is constructive, in that it permits explicit predictions, just as it permits elbow room for reconstruction of the particular instantiating models in the light of data. It posits no mental algebras, but only excitement, the diffuse activation of behavior, and the focusing of that incitement on specific responses by the correlation between experimental demands and the organism's memory at the time of those demands. It is a theory of how incentives fuel behavior, time constrains it, and contingencies direct it. It is not a cognitive theory, as we have come to know those, but rather it brings us a step closer to understanding cognitive processes such as short-term memory in behavioral terms.


Allen, J. D. & Kenshalo, D. R. J. (1976) Schedule-induced drinking as a function of interreinforcement interval in the rhesus monkey. Journal of the Experimental Analysis of Behavior 26:257-267.

Anger, D. (1956) The dependence of interresponse times upon the relative reinforcement of different interresponse times. Journal of Experimental Psychology 52:145-161.

Arbuckle, J. L. & Lattal, K. A. (1988) Changes in functional response units with briefly delayed reinforcement. Journal of the Experimental Analysis of Behavior 49:249-263.

Atkinson, R. C. & Wickens, T. D. (1971) Human memory and the concept of reinforcement. In: The nature of reinforcement, ed. R. Glaser. Academic Press.

Barofsky, I. & Hurwitz, D. (1968) Within ratio responding during fixed ratio performance. Psychonomic Science 11:263-264.

Baum, W. M. (1973) The correlation-based law of effect. Journal of the Experimental Analysis of Behavior 20:137-153.

Baum, W. M. (1992) In search of the feedback function for variable interval schedules. Journal of the Experimental Analysis of Behavior 57:365-375.

Bharucha-Reid, A. T. (1960) Elements of the theory of Markov processes and their applications. McGraw-Hill.

Bindra, D. (1972) A unified account of classical conditioning and operant training. In: Classical conditioning II: Current research and theory, ed. A. H. Black, & W. F. Prokasy. Appleton-Century-Crofts.

Bolles, R. C. (1983) The explanation of behavior. The Psychological Record 33:31-48.

Branch, M. N. (1977) On the role of memory in the analysis of behavior. Journal of the Experimental Analysis of Behavior 28:171-179.

Breland, K. & Breland, M. (1961) The misbehavior of organisms. American Psychologist 16:681-684.

Brown, J. A. (1958) Some tests of the decay theory of immediate memory. Quarterly Journal of Experimental Psychology 10:12-21.

Capaldi, E. J. (1992) Levels of organized behavior in rats. In: Cognitive aspects of stimulus control, ed. W. K. Honig, & J. G. Fetterman. Lawrence Erlbaum Associates.

Catania, A. C. (1971) Reinforcement schedules: The role of responses preceding the one that produces the reinforcer. Journal of the Experimental Analysis of Behavior 15:271-287.

Catania, A. C. & Keller, K. J. (1981) Contingency, contiguity, correlation, and the concept of causation. In: Advances in Analysis of Behaviour: Predictability, Correlation, and Contiguity, ed. P. Harzem & M. D. Zeiler. John Wiley.

Catania, A. C. & Reynolds, G. S. (1968) A quantitative analysis of the responding maintained by interval schedules of reinforcement. Journal of the Experimental Analysis of Behavior 11:327-383.

Catania, A. C., Sagvolden, T. & Keller, K. J. (1988) Reinforcement schedules: retroactive and proactive effects of reinforcers inserted into fixed interval performances. Journal of the Experimental Analysis of Behavior 49:49-73.

Clark, C. W. (1976) Mathematical bioeconomics: The optimal management of renewable resources. Wiley.

Collier, G., Johnson, D., & Morgan, C. (1992) The magnitude-of-reinforcement function in closed and open economies. Journal of the Experimental Analysis of Behavior 57:81-89.

Cuthill, I. C., Kacelnik, A., Krebs, J. R., Haccou, P., & Iwasa, Y. (1990) Starlings exploiting patches: The effect of recent experience on foraging decisions. Animal Behaviour 40:625-640.

Davis, D. G. S., Staddon, J. E. R., Machado, A. & Palmer, R. G. (1993) The process of recurrent choice. Unpublished manuscript.

Davis, E. R. & Platt, J. R. (1983) Contiguity and contingency in the acquisition and maintenance of an operant. Learning and Motivation 14:487-512.

Davison, M. & Jenkins, P. E. (1985) Stimulus discriminability, contingency discriminability, and schedule performance. Animal Learning & Behavior 13:77-84.

Dawson, G. R., & Dickinson, A. (1990) Performance on ratio and interval schedules with matched reinforcement rates. The Quarterly Journal of Experimental Psychology 42B:225-239.

Deutsch, D., & Deutsch, J. A. (1975) Short-term memory. New York: Academic Press.

Donahoe, J. W., Crowley, M. A., Millard, W. J., & Stickney, K. A. (1982) A unified principle of reinforcement. In: Quantitative Analysis of Behavior II: Matching and maximizing accounts, ed. M. L. Commons, R. J. Herrnstein, & H. Rachlin. Ballinger.

Donahoe, J. W. & Palmer, D. C. (1993) Learning and its implications for complex behavior. Boston: Allyn & Bacon.

Dow, S. M. & Lea, S. E. G. (1987) Foraging in a changing environment: Simulation in the operant laboratory. In: Quantitative Analysis of Behavior VI: Foraging, ed. M. L. Commons, A. Kacelnik & S. J. Shettleworth. Erlbaum.

Ettinger, R. H., Reid, A. K. & Staddon, J. E. R. (1987) Sensitivity to molar feedback functions: A test of molar optimality theory. Journal of Experimental Psychology: Animal Behavior Processes 13:366-375.

Galbicka, G. (1988) Differentiating the Behavior of Organisms. Journal of the Experimental Analysis of Behavior 50:343-354.

Galbicka, G. & Platt, J. R. (1986) Parametric manipulation of interresponse-time contingency independent of reinforcement rate. Journal of the Experimental Analysis of Behavior 12:371-380.

Gibbon, J. (1977) Scalar expectancy and Weber's law in animal timing. Psychological Review 84:279-325.

Gibbon, J. & Balsam, P. (1981) Spreading association in time. In: Autoshaping and conditioning theory, ed. J. Gibbon, C. M. Locurto, & H. S. Terrace. Academic Press.

Greenwood, M. R. C., Quartermain, D., Johnson, P. R., Cruce, J. A. F. & Hirsch, J. (1974) Food motivated behavior in genetically obese and hypothalamic hyperphagic rats and mice. Physiology & Behavior 13:687-692.

Grossberg, S. (1975) A neural model of attention, reinforcement, and discrimination learning. International Review of Neurobiology 18:263-327.

Hamilton, A. L., Stellar, J. R., & Hart, E. B. (1985) Reward, performance, and the response strength method in self-stimulating rats: Validation and neuroleptics. Physiology & Behavior 35:897-904.

Hayes, S. C., & Hayes, L. J. (1992) Verbal relations and the evolution of behavior analysis. American Psychologist 11:1383-1395.

Herrnstein, R. J. (1970) On the law of effect. Journal of the Experimental Analysis of Behavior 13:243-266.

Herrnstein, R. J. (1974) Formal properties of the matching law. Journal of the Experimental Analysis of Behavior 21:159-164.

Herrnstein, R. J. (1979) Derivatives of matching. Psychological Review 86:486-495.

Heyman, G. M. & Monaghan, M. M. (1987) Effects of changes in response requirement and deprivation on the parameters of the matching law equation: New data and review. Journal of Experimental Psychology: Animal Behavior Processes 13:384-394.

Kacelnik, A., Krebs, J. R., & Ens, B. (1987) Foraging in a changing environment: an experiment with starlings (Sturnus vulgaris). In: Quantitative Analyses of Behavior. Vol. 6: Foraging, ed. M. L. Commons, A. Kacelnik, & S. J. Shettleworth. Lawrence Erlbaum.

Kelsey, J. E. & Allison, J. (1976) Fixed-ratio lever pressing by VMH rats: Work vs accessibility of sucrose reward. Physiology & Behavior 17:749-754.

Killeen, P. R. (1981) Averaging theory. In: Recent developments in the quantification of steady-state operant behavior, ed. C. M. Bradshaw, E. Szabadi & C. F. Lowe. Elsevier.

Killeen, P. R. (1982a) Incentive Theory. In: Nebraska symposium on motivation, 1981: Response structure and organization, ed. D.J. Bernstein. Lincoln: University of Nebraska Press.

Killeen, P. R. (1982b) Incentive Theory II: Models for choice. Journal of the Experimental Analysis of Behavior 38:217-232.

Killeen, P. R. (1984) Incentive theory III: Adaptive Clocks. In: Timing and time perception, ed. J. Gibbon & L. Allen. New York Academy of Sciences.

Killeen, P. R. (1985) Incentive theory IV: Magnitude of reward. Journal of the Experimental Analysis of Behavior 43:407-417.

Killeen, P. R. (1991) Behavior's time. In: The psychology of learning and motivation, ed. G. H. Bower. Academic Press.

Killeen, P. R. (1992) Mechanics of the animate. Journal of the Experimental Analysis of Behavior 57:429-463.

Killeen, P. R., Hanson, S. J. & Osborne, S. R. (1978) Arousal: Its genesis and manifestation as response rate. Psychological Review 85:571-581.

Killeen, P. R. & Smith, J. P. (1984) Perception of contingency in conditioning: Scalar timing, response bias, and the erasure of memory by reinforcement. Journal of Experimental Psychology: Animal Behavior Processes 10:333-345.

Kintsch, W. (1965) Frequency distribution of interresponse times during VI and VR reinforcement. Journal of the Experimental Analysis of Behavior 8:347-352.

Lattal, K. A. & Gleeson, S. (1990) Response acquisition with delayed reinforcement. Journal of Experimental Psychology: Animal Behavior Processes 16:27-39.

Levine, D. (1991) Introduction to neural modelling. Lawrence Erlbaum Associates.

Lieberman, D. A., Davidson, F. H. & Thomas, G. V. (1985) Marking in pigeons: The role of memory in delayed reinforcement. Journal of Experimental Psychology: Animal Behavior Processes 11:611-624.

Mazur, J. E. (1983) Steady-state performance on fixed-, mixed-, and random-ratio schedules. Journal of the Experimental Analysis of Behavior 39:293-307.

Mazur, J. E. (1984) Tests of an equivalence rule for fixed and variable delays. Journal of Experimental Psychology: Animal Behavior Processes 10:426-436.

McDowell, J. J. (1980) An analytical comparison of Herrnstein's equations and a multivariate rate equation. Journal of the Experimental Analysis of Behavior 33:397-408.

McDowell, J. J. & Kessel, R. (1979) A multivariate rate equation for variable-interval performance. Journal of the Experimental Analysis of Behavior 31:267-283.

McDowell, J. J. & Wood, H. (1984) Confirmation of linear system theory prediction: Changes in Herrnstein's k as a function of changes in reinforcer magnitude. Journal of the Experimental Analysis of Behavior 41:183-192.

McDowell, J. J., Bass, R., & Kessel, R. (1992) Applying linear systems analysis to dynamic behavior. Journal of the Experimental Analysis of Behavior 57:377-391.

McNamara, J. M. & Houston, A. I. (1987) Memory and the efficient use of information. Journal of Theoretical Biology 125:385-395.

McSweeney, F. K. (1974) Variability of responding on a concurrent schedule as a function of body weight. Journal of the Experimental Analysis of Behavior 21:357-359.

McSweeney, F. K. (1978) Prediction of concurrent keypeck treadle-press responding from simple schedule performance. Animal Learning & Behavior 6:444-450.

McSweeney, F. K. (1992) Rate of reinforcement and session duration as determinants of within-session patterns of responding. Animal Learning & Behavior 20:160-169.

Mesterton-Gibbons, M. (1989) A concrete approach to mathematical modelling. Addison-Wesley.

Moore, J. W. (1991) Implementing connectionist algorithms for classical conditioning in the brain. In: Neural network models of conditioning and action: A volume in the quantitative analysis of behavior series, ed. M. L. Commons, S. Grossberg, & J. E. R. Staddon. Lawrence Erlbaum.

Neuringer, A. (1992) Choosing to vary and repeat. Psychological Science 3:246-250.

Norman, D. A. (1970) Models of human memory. Academic Press.

Palmer, D. C., & Donahoe, J. W. (1992) Essentialism and selectionism in cognitive science and behavior analysis. American Psychologist 11:1344-1358.

Palya, W. L. (1992) Dynamics in the fine structure of schedule-controlled behavior. Journal of the Experimental Analysis of Behavior 57:267-287.

Papini, M. R., & Bitterman, M. E. (1990) The role of contingency in classical conditioning. Psychological Review 97:396-403.

Pear, J. J. (1975) Implications of the matching law for ratio responding. Journal of the Experimental Analysis of Behavior 23:139-140.

Peterson, L. R. & Peterson, M. J. (1959) Short-term retention of individual items. Journal of Experimental Psychology 58:193-198.

Platt, J. R. (1971) Discrete trials and their relation to free-operant behavior. In: Essays in neobehaviorism: A memorial volume to Kenneth W. Spence, ed. H. H. Kendler, & J. T. Spence. Appleton-Century-Crofts.

Platt, J. R. (1973) Percentile reinforcement: Paradigms for experimental analysis of response shaping. In: The psychology of learning and motivation: Advances in research and theory, ed. G. H. Bower. Academic Press.

Platt, J. R. (1979) Interresponse-time shaping by variable-interval-like interresponse-time reinforcement. Journal of the Experimental Analysis of Behavior 31:3-14.

Platt, J. R. & Day, R. B. (1979) A hierarchical response-unit analysis of resistance to extinction following fixed-number and fixed-consecutive number reinforcement. Journal of Experimental Psychology: Animal Behavior Processes 5:307-320.

Powell, R. W. (1968) The effect of small sequential changes in fixed-ratio size upon the post-reinforcement pause. Journal of the Experimental Analysis of Behavior 11:589-593.

Preston, R. A., & Fantino, E. (1991) Conditioned reinforcement value and choice. Journal of the Experimental Analysis of Behavior 55:155-175.

Rachlin, H., Raineri, A. & Cross, D. (1991) Subjective probability and delay. Journal of the Experimental Analysis of Behavior 55:233-244.

Reed, P., Schachtman, T. R., & Hall, G. (1991) Effect of signaled reinforcement on the formation of behavioral units. Journal of Experimental Psychology: Animal Behavior Processes 17:475-485.

Revusky, S. & Garcia, J. (1970) Learned associations over long delays. In: The psychology of learning and motivation: Advances in research and theory, ed. G. H. Bower. Academic Press.

Reynolds, G. S., & McLeod, A. (1970) On the theory of interresponse-time reinforcement. In: The psychology of learning and motivation: Advances in research and theory, ed. G. H. Bower. Academic Press.

Robbins, T. W., & Everitt, B. J. (1992) Functions of dopamine in the dorsal and ventral striatum. Seminars in the Neurosciences 4:119-127.

Robbins, T. W., & Sahakian, B. J. (1983) Behavioral effects of psychomotor stimulant drugs: Clinical and neuropsychological implications. In: Stimulants: Neurochemical, Behavioral, and Clinical Perspectives, ed. I. Creese. Raven Press.

Sakamoto, Y., Ishiguro, M. & Kitagawa, G. (1986) Akaike information criterion statistics. Reidel.

Shimp, C. P. (1973) Synthetic variable-interval schedules of reinforcement. Journal of the Experimental Analysis of Behavior 19:311-330.

Shimp, C. P. (1976a) Organization in memory and behavior. Journal of the Experimental Analysis of Behavior 26:113-130.

Shimp, C. P. (1976b) Short-term memory in the pigeon: Relative recency. Journal of the Experimental Analysis of Behavior 25:55-61.

Shimp, C. P. (1976c) Short-term memory in the pigeon: The previously reinforced response. Journal of the Experimental Analysis of Behavior 26:487-493.

Sizemore, O. J. & Lattal, K. A. (1978) Unsignalled delay of reinforcement in variable-interval schedules. Journal of the Experimental Analysis of Behavior 30:169-175.

Skinner, B. F. (1935) The generic nature of the concepts of stimulus and response. The Journal of General Psychology 12:40-65.

Skinner, B. F. (1938) The behavior of organisms. Appleton-Century-Crofts.

Snyderman, M. (1983) Body weight and response strength. Behaviour Analysis Letters 3:255-265.

Staddon, J. E. R. (1977) On Herrnstein's equation and related forms. Journal of the Experimental Analysis of Behavior 28:163-170.

Staddon, J. E. R., Wynne, C. D. L. & Higa, J. J. (1991) The role of timing in reinforcement schedule performance. Learning and Motivation 22:200-225.

Staddon, J. E. R. & Zhang, Y. (1989) Response selection in operant learning. Behavioural Processes 20:189-197.

Stokes, P. D., & Balsam, P. D. (1991) Effects of reinforcing preselected approximations on the topography of the rat's bar press. Journal of the Experimental Analysis of Behavior 55:213-231.

Thomas, G. V. (1983) Contiguity and contingency in instrumental conditioning. Learning and Motivation 14:513-526.

Timberlake, W. (1983) Rats' responses to a moving object related to food or water: A behavior-systems analysis. Animal Learning & Behavior 11:309-320.

Timberlake, W. & Lucas, G. A. (1990) Behavior systems and learning: From misbehavior to general principles. In: Contemporary learning theories: Instrumental conditioning theory and the impact of constraints on learning, ed. S. B. Klein & R. R. Mowrer. Erlbaum.

Timberlake, W. & Peden, B. F. (1987) On the distinction between open and closed economies. Journal of the Experimental Analysis of Behavior 48:35-60.

Vaughan, W. (1985) Choice: A local analysis. Journal of the Experimental Analysis of Behavior 43:383-405.

Wearden, J. H. & Clark, R. B. (1989) Constraints on the process of interresponse-time reinforcement as the explanation of variable-interval performance. Behavioural Processes 20:151-175.

Wetherington, C. L. (1979) Schedule-induced drinking: Rate of food delivery and Herrnstein's equation. Journal of the Experimental Analysis of Behavior 32:323-333.

Wilkenfield, J., Nickel, M., Blakely, E., & Poling, A. (1992) Acquisition of lever-press responding in rats with delayed reinforcement: A comparison of three procedures. Journal of the Experimental Analysis of Behavior 58:431-443.

Williams, B. A. (1972) Probability learning as a function of momentary reinforcement probability. Journal of the Experimental Analysis of Behavior 17:363-368.

Williams, B. A. (1975) The blocking of reinforcement control. Journal of the Experimental Analysis of Behavior 24:215-226.

Williams, B. A. (1976) The effects of unsignalled delayed reinforcement. Journal of the Experimental Analysis of Behavior 26:441-449.

Williams, B. A. (1978) Information effects on the response-reinforcer association. Animal Learning & Behavior 6:371-379.

Williams, B. A. (1988) Reinforcement, choice, and response strength. In: Stevens' handbook of experimental psychology: Vol. 2. Learning and cognition (2nd ed), ed. R. C. Atkinson, R. J. Herrnstein, G. Lindzey, & R. D. Luce. Wiley.

Williams, D. C., & Johnston, J. M. (1992) Continuous versus discrete dimensions of reinforcement schedules: An integrative analysis. Journal of the Experimental Analysis of Behavior 58:205-228.

Zeiler, M. D. (1979) Output dynamics. In: Advances in analysis of behaviour: Vol 1. Reinforcement and the organization of behaviour, ed. M. D. Zeiler & P. Harzem. Wiley.

Zeiler, M. D. & Buchman, I. B. (1979) Response requirements as constraints on output. Journal of the Experimental Analysis of Behavior 32:29-50.

Zeiler, M. D. & Thompson, T. (1986) Analysis and integration of behavioral units. Hillsdale, NJ: Erlbaum.

Zeiler, M. D. (1991) Ecological influences on timing. Journal of Experimental Psychology: Animal Behavior Processes 17:13-25.

Zuriff, G. E. (1970) A comparison of variable-ratio and variable-interval schedules of reinforcement. Journal of the Experimental Analysis of Behavior 13:369-374.