Levitin, D. J. & Cook, Perry R. (1996) Memory for musical tempo: Additional evidence that auditory memory is absolute. Perception & Psychophysics, 58, pp. 927-935.
Memory for musical tempo:
Additional evidence that auditory memory is absolute
Daniel J. Levitin
University of Oregon, Eugene, Oregon
and Stanford University, Stanford, California
Perry R. Cook
Stanford University, Stanford, California
This is an electronic Web version of the paper originally appearing in Perception & Psychophysics, 1996, 58, 927-935. Copyright 1997 Daniel J. Levitin. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted with or without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. To copy otherwise, to republish, to post on services, or to redistribute to lists, requires specific permission and/or a fee.
We report evidence that long term memory retains absolute (accurate) features of perceptual events. Specifically, we show that memory for music seems to preserve the absolute tempo of the musical performance. In Experiment 1, 46 subjects sang popular songs from memory, and their tempos were compared to recorded versions of the songs. Seventy-two percent of the productions on two consecutive trials (using different songs) came within 8% of the actual tempo, demonstrating accuracy near the perceptual threshold (JND) for tempo. In Experiment 2, a control experiment, we found that folk songs lacking a tempo standard generally have a large variability in tempo; this counters arguments that memory for the tempo of remembered songs is driven by articulatory constraints. The relevance of the current findings to theories of perceptual memory and memory for music is discussed.
A fundamental problem facing memory theorists is how to account for two seemingly disparate properties of memory. On the one hand, a rich body of literature suggests that the role of memory is to preserve the gist of experiences; memory functions to formulate general rules and create abstract concepts on the basis of specific exemplars (Posner & Keele, 1970; Rosch, 1975). On the other hand is the extensive literature suggesting that memory accurately preserves absolute features of experiences (Brooks, 1978; Jacoby, 1983; Medin & Schaffer, 1978). (For a further discussion of these two perspectives, see McClelland and Rumelhart, 1986).
These perspectives on the function of human memory parallel an old debate in the animal-learning literature about whether animals' internal representations are relational or absolute (Hanson, 1959; Reese, 1968). As with many psychological debates, there may be some degree of truth on both sides. Currently, philosophers of mind (Dennett, 1991) and cognitive scientists (Kosslyn, 1980) have also wondered to what extent our mental representations and memories of the world are perfect or accurate copies of experience, and to what extent distortions (or generalizations) intrude.
The study of memory for music is potentially helpful in exploring these issues because the experimental evidence is that both relational and absolute features of music are encoded in memory. Researchers have shown that people have little trouble recognizing melodies transposed in pitch (Attneave & Olson, 1971; Dowling & Bartlett, 1981), so it is clear that memory for musical melody must encode the abstract information, that is, the relation of successive pitches, or pattern of tones, if not their actual location in pitch space.
Abstract encoding has also been demonstrated for temporal features: people easily recognize songs in which the relation between rhythmic elements (the rhythmic pattern) is held constant, but the overall timing or musical tempo has been changed (Monahan, 1993; Serafine, 1979). (We define tempo as the "pace" of a musical piece, that is, the amount of time a given note takes, or the average number of beats occurring in a given interval of time, usually expressed in beats per minute.)
With respect to this point about musical relations, Hulse and Page (1988) explain,
music emphasizes the constancy of relations among sounds...within temporal structure, music emphasizes the constancy of relations among tone durations and intertone intervals. Rhythmic structures remain perceptually equivalent over a broad range of tempos...and tempo changes involve ratio changes of duration and interval. (p. 431).
And, as Monahan (1993) argues, musical pitch and time are described most naturally in relative rather than absolute terms. Thus (within very broad limits), the identity and recognizability of a song is maintained through transposition of pitch and changes in tempo.
Evidence that memory for music also retains absolute pitch information alongside an abstract melody representation has been mounting for some time (Deutsch, 1991; Halpern, 1989; Levitin, 1994; Lockhead & Byrd, 1981; Terhardt & Seewann, 1983). For example, in a previous study one of us (Levitin, 1994) asked experimental subjects (most of them non-musicians) to sing their favorite rock and roll songs from memory, without any external reference. When compared to the actual keys of the songs, the subjects' productions were found to be at or very near the songs' actual pitches. This is somewhat surprising from the standpoint of music theory (and perhaps from "object perception" theories) because that which gives a melody its identity or "objectness" is the relation of successive intervals (both rhythmic and melodic). As most music theorists would agree, "music places little emphasis on the absolute properties of sounds, such as their exact temporal pitch or duration" (Hulse & Page, 1988, p. 432).
The animal learning literature reveals parallel evidence for both absolute and relational memory for auditory stimuli. Starlings (Hulse & Page, 1988) and white-throated sparrows (Hurly, Ratcliffe, & Weisman, 1990) were shown to use both absolute and relative pitch information in discriminating melodies, but other evidence is mixed. Barn owls seem to remember musical patterns based on absolute pitch (Konishi & Kenuk, 1975), and wolves use absolute pitch information to the exclusion of relative pitch information to differentiate between familiar and unfamiliar howls (Tooze, Harrington, & Fentress, 1990).
What reasons are there to expect that long term memory might encode tempo information with a high degree of accuracy? Scores of conditioning experiments with animals have shown that animals can learn to estimate interval durations with great accuracy (e.g., Hulse, Fowler, & Honig, 1978).
The question of human memory for tempo has been addressed in several key studies. People report that their auditory images seem to have a specific tempo associated with them (Halpern, 1992), and in one study, subjects who imagined songs tended to imagine them at about the same tempo on occasions separated by as much as five days (Halpern, 1988). Collier & Collier (1994) found that trained jazz musicians tended to vary tempo less than 5% within a song or across multiple performances of the same song on different days, suggesting a stable memory for tempo in these musicians.
Whereas the studies just mentioned argue for the stability of internal tempos, the question of the objective accuracy of these internal tempos interested us. That is, when we remember a song (or "auralize" it, to use Ward's terminology) do we do so at its original tempo? Just as some people can match pitches accurately (and we say they have "absolute pitch"), we wondered if there are those who can match tempos accurately, and have an ability of "absolute tempo." In particular, we wanted to study this ability in everyday people with little formal musical training. This question is relevant to researchers studying human rhythm perception (e.g. Desain, 1992; Jones & Boltz, 1989; Povel & Essens, 1985; Steedman, 1977), the nature and stability of internal clocks (Collier & Wright, 1995; Collyer, Broadbent, & Church, 1994; Helmuth & Ivry, 1995), mental imagery (Finke, 1985; Kosslyn, 1980) and general theories of time perception (Block, 1990; Fraisse, 1981; Michon & Jackson, 1985).
On a music-theoretic level, Narmour (1977) argues that musical listening involves using both schematic reduction and unreducible (absolute) idiostructural information. To the extent that the original listening experience is preserved in the brain, we might expect to find these two types of information represented in memory, not just in perception. One of the goals of the current study was to test this hypothesis.
Experiment 1 was designed to discover if people encode the absolute tempo of a familiar song in memory, and if so, with what degree of precision. In Levitin (1994), subjects were asked to sing contemporary popular and rock songs from memory, and their productions were analyzed for pitch accuracy. Using these same data, we analyzed their previous productions for tempo accuracy. Contemporary popular and rock songs form an ideal stimulus set for our study because they are typically encountered in only one version by a particular musical artist or group, and so the song is always heard - perhaps hundreds of times - in the same key, and at the same tempo. (In fact, we selected out songs that did not meet this criterion.)
METHOD. The raw data used in this study were originally collected for a study on pitch memory (Levitin, 1994).
SUBJECTS. The subjects were 46 Stanford University students who served without pay. The subjects did not know in advance they were participating in a study involving music, and the sample included subjects with and without some musical background. All subjects filled out a general questionnaire before the experimental session. The subjects ranged in age from 16 to 35 years (mean, 19.5; mode, 18; SD, 3.7).
By self-report, the subjects' musical background ranged from no instruction to more than 10 years of instruction; 37 subjects had some exposure to a musical instrument, 9 had none. In response to the question "how much structured musical training in either performance or theory have you had?" 17 subjects reported none; 17 subjects reported 1-3 years; 5 subjects reported 3-5 years; 3 subjects reported 5-7 years; 3 subjects reported 7-10 years; and 1 subject reported more than 10 years.
MATERIALS. Prior to data collection, a norming study was conducted to select stimuli with which this subject population would be familiar. 250 introductory psychology students completed a questionnaire which asked them to indicate songs that "they knew well and could hear playing in their heads." None of the subjects in the norming study were subsequently used in the main experiment.
The results of this norming study were used to select the best known songs. Songs on this list that had been performed by more than one group were excluded from the stimulus set because of the possibility that these versions might have different tempos. From this questionnaire, fifty-eight compact discs (CDs) were selected, representing over 600 songs from which the subjects could choose. Examples of songs included are "Hotel California" by The Eagles; "Get into the Groove" by Madonna; and "This & That" by Michael Penn. (A complete list of CDs constituting the stimulus set is available from the first author.)
PROCEDURE. Subjects were seated in a sound attenuation booth alongside the experimenter. The 58 CDs chosen from the norming study were displayed alphabetically on a shelf in front of the subjects. The experimenter followed a written protocol asking subjects to select from the shelf and to hold in their hands a CD that contained a song they knew very well. Holding the CD and looking at it may have provided a visual cue for subsequent auditory imaging. There was no CD player in the booth, and at no time were the CDs actually played for the subjects. All subjects reported that they had not actually heard their chosen song in the previous 72 hours, and many had not heard it in months.
The subjects were then asked to close their eyes and imagine that the song was actually playing. They were told that, when they were ready, we wanted them to try to reproduce the tones of the song by singing, humming or whistling, and they could start anywhere in the song they wanted to. The subjects were not explicitly told anything about rhythm or tempo, nor were they specifically asked to reproduce the tempo of the songs. Their productions were recorded on digital audio tape (DAT) so that pitch and speed would be accurately preserved. The subjects were not told how much of the song to sing, but they typically sang a four-bar phrase. Following the production of this first song, the subjects were asked to choose another song and repeat the procedure; this constituted the two experimental Trials. Three of the subjects discontinued their participation after Trial 1.
ANALYSIS. The subjects' productions were compared with the songs performed by the original artists on CD, in order to compare tempos. The subjects' productions and the corresponding sections of the CD were transferred digitally to a Macintosh computer, and the sample rate was converted from its original 44.1 kHz or 48 kHz to 22.050 kHz for storage economy.
The duration of each subject's production and the associated CD excerpt were measured using the program MacMix, and these measurements were accurate to within 0.1 msec. A typical subject production was 5 seconds long. The total selection was also divided into beats yielding two equivalent measures of production time: total duration and beats per minute. For the purposes of this report, data are presented as tempos in units of beats per minute.
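The two equivalent measures described above, and the percent deviation used in the results that follow, can be illustrated with a short Python sketch. The function names and the example numbers are hypothetical and chosen only for illustration; this is not the authors' MacMix procedure, and the sign convention (positive means faster than the recording) is an assumption.

```python
def tempo_bpm(n_beats: float, duration_s: float) -> float:
    """Tempo in beats per minute for n_beats lasting duration_s seconds."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return 60.0 * n_beats / duration_s


def percent_deviation(produced_bpm: float, actual_bpm: float) -> float:
    """Signed deviation of a produced tempo from the actual tempo, in percent.

    Assumed convention: positive values mean the production was faster
    than the recording.
    """
    return 100.0 * (produced_bpm - actual_bpm) / actual_bpm


# Hypothetical excerpt (not study data): the same 8 beats take 5.0 s when
# sung and 5.2 s on the CD.
sung = tempo_bpm(8, 5.0)
cd = tempo_bpm(8, 5.2)
deviation = percent_deviation(sung, cd)
```

Because the number of beats is the same in both excerpts, the deviation depends only on the ratio of the two durations, which is why total duration and beats per minute are interchangeable measures here.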
RESULTS. Figure 1 shows, as a bivariate scatterplot, the tempos produced by subjects compared to the actual tempos of the remembered pieces (Trials 1 and 2 are combined). The subjects came very close to their target tempos as indicated by the high correlation between subjects' tempos and actual tempos (r=.95), and the fact that most responses fall near the diagonal. Figure 2 shows the distribution of errors that the subjects made, expressed in a histogram as percent deviations from the actual tempo.1
Figure 1. Subjects' tempos versus Actual tempos, both trials combined.
On Trial 1, 33/46, or 72%, of the subjects performed within +/- 4% of the actual tempo for the songs, and 41/46, or 89%, of the subjects performed within +/- 8% of the actual tempo (M = 4.1%; SD = 7.7%). On Trial 2, 12/42, or 40%, of the subjects came within +/- 4%, and 25/42, or 60%, came within +/- 8% of the actual tempo (M = 7.7%; SD = 7.9%). As Figure 2 shows, for the two trials combined, 72% of the responses fell within 8%.
To put these results in context, one might ask what is the JND for tempo? Drake and Botte (1993, Experiment 3) found the JND for tempo discrimination to be 6.2% - 8.8% using a two-alternative forced choice listening ("which is faster?") test; Friberg & Sundberg (1993) found JND for tempo to be 4.5% using the psychophysical method of adjustment; Hibi (1983) found the JND to be ~6% for displacement of a single time marker in a sequence, and for lengthening/shortening of a single time marker. In tapping tasks, where subjects had to either tap along with a pulse at a certain tempo (a "synchronization task") or continue tapping to a tempo set up by the experimenter ("continuation task"), JNDs of 3-4% have been reported for synchronization (Collyer, Broadbent, & Church, 1994; Povel, 1981), and 7-11% for continuation (Allen, 1975). All of the above JNDs apply for tempos in the range our subjects sang. It appears, then, that a large percentage of the subjects in our study performed within one or two JNDs for tempo, based solely on their memory for the musical pieces.
A somewhat more ecologically valid confirmation of these JND figures comes from Perron (1994). In contemporary popular music, many recordings are made with drum machines or computer sequencers instead of live drummers. The anecdotal opinion of musicians and record producers has been that these machines are much more able to hold a steady tempo than human players. Perron measured the tempo deviations in a number of these devices and found the mean tempo deviation to be 3.5% (with a standard deviation of 4.5%). Because the deviations in these machines seem to go largely unnoticed by most people (including professional drummers), it seems fair to assume that this 3.5% is less than the JND for tempo variation. An interesting implication of Perron's finding is that our subjects may well have been trying to reproduce tempos for songs that contain variations of this magnitude.
One might ask whether the subjects in our study performed consistently across the two trials. To measure this, trials on which the subjects came within +/- 6% were considered "hits" and all others were considered "misses," in accordance with the more conservative of Drake & Botte's (1993) JND estimates. 25 subjects (or 60%) were found to be consistent in their performance across trials. Yule's Q was computed as a measure of strength of association for this 2x2 table, and was significant (Q = .50; p<.04).
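For readers unfamiliar with Yule's Q, the following Python sketch shows how it is computed from a 2x2 table of hits and misses on the two trials. The cell counts are hypothetical, since the paper does not report the table itself, and the function name is an assumption made for illustration.

```python
def yules_q(a: int, b: int, c: int, d: int) -> float:
    """Yule's Q for the 2x2 table [[a, b], [c, d]].

    Q = (ad - bc) / (ad + bc); it ranges from -1 to +1, with 0 meaning
    no association between the row and column classifications.
    """
    concordant = a * d
    discordant = b * c
    if concordant + discordant == 0:
        raise ValueError("Q is undefined when ad + bc == 0")
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical counts, NOT the study's data:
# rows = Trial 1 (hit, miss), columns = Trial 2 (hit, miss).
hit_hit, hit_miss = 10, 5
miss_hit, miss_miss = 5, 10

q = yules_q(hit_hit, hit_miss, miss_hit, miss_miss)

# "Consistent" subjects are those on the main diagonal of the table.
total = hit_hit + hit_miss + miss_hit + miss_miss
consistent_share = (hit_hit + miss_miss) / total
```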
Next, we wondered whether the subjects who had accurate tempo memory also had accurate pitch memory (as measured in Levitin, 1994). Combining Trials 1 and 2, and using 6% as a "hit" criterion for tempo and +/- 1 semitone (s.t.) as a criterion for pitch, there was not a significant association (Yule's Q = .31; n.s.).2
One of the implicit assumptions in these analyses is that the tempos of the songs our subjects sang are widely distributed. If all the songs fell into a narrow tempo band, one might argue that the subjects only have memory for a particular tempo (or narrow set of tempos). As Figure 1 shows, however, the range of tempos produced by subjects was very large, running from approximately 60 bpm to over 160 bpm.
Similarly, one might wonder if the good performance of subjects across trials was merely due to all of the subjects singing songs at the same tempo in both cases; individual subjects may have an idiosyncratic "preferred" tempo (or "internal tempo") that they know well and relied on for this task (Braun, 1927). We found the correlation between tempos sung on Trial 1 and Trial 2 was very low (r=.07), as was the correlation between "targets" on Trial 1 and Trial 2 (the tempos subjects were trying to reproduce; r=.04).
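The logic of this check can be sketched in Python: if each subject simply sang at a personal preferred tempo, the tempos on the two trials would be strongly correlated across subjects. The tempos below are hypothetical values constructed so that the correlation is zero; they are not the study's data, and the function is a plain textbook Pearson r rather than the authors' software.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical tempos in bpm (one pair per subject), NOT the study's data.
trial1 = [80.0, 100.0, 120.0, 140.0]
trial2 = [130.0, 90.0, 90.0, 130.0]

r_trials = pearson_r(trial1, trial2)
```

A value of r near zero, as in this constructed example, means that knowing a subject's Trial 1 tempo says nothing about that subject's Trial 2 tempo, which is the pattern the text reports (r=.07).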
Figure 2. Percent deviation from actual tempo, both trials combined.
Variations of +/- 5% during the subjects' learning of the material could have occurred if subjects had heard the songs repeatedly on a cassette player or record player that did not keep accurate speed; CD players are not subject to speed fluctuations. A questionnaire item asked the subjects about whether they had heard their chosen song on CD, radio, cassette, or record player, and there was no correlation found between the source of the learning and their performance (all the commercial radio stations in the area of the study were broadcasting CDs exclusively during the study period, so "radio" responses were considered to be "CD learning"). Similarly, no correlation was found between performance and other factors such as sex, age, handedness, or musical training.
DISCUSSION. The finding that 89% of the Trial 1 subjects and 60% of the Trial 2 subjects made errors of only +/- 8% is evidence that long term memory for tempo is very accurate, and is near the discrimination threshold (as measured by JNDs) for variability in tempo. These results may even underestimate the strength of tempo memory, because our subjects were only instructed to reproduce pitches accurately; to the extent that they also reproduced tempo, they did this on their own, and without being requested by the experimenter to do so.
The distribution of errors is also instructive. As Figures 1 and 2 show, there is a tight clustering near the actual tempo, and more subjects sang too fast than too slow. Boltz (1994) reviews evidence that various forms of induced stress increase the internal tempo of individuals. This would create internal durations that are shorter than the standard, and cause the subjects to sing fast. If we can assume that the experimental situation was somewhat stressful (many subjects seemed to be embarrassed or nervous), this could account for the asymmetric error distribution favoring faster reproductions. An additional explanation for this asymmetry comes from experimental findings that people are more likely to perform faster rather than slower (Kuhn, 1977), and are better able to detect tempo decreases than increases (Kuhn, 1974).
In spite of the certain awkwardness of being asked to sing for a psychological experiment, and the concomitant desire to be done with the task as quickly as possible, most subjects reproduced the tempo of their selected songs with remarkable accuracy. In listening to the subjects' productions, and the corresponding artists' renditions, we were struck by how close the subjects came not just in tempo, but in pitch, phrasing, and stylistic nuances while singing from memory. In many cases it seemed that the subjects couldn't have performed better if they were actually singing along with the CD - but of course, they were merely singing along with a representation of the CD in their heads.
One could argue that our findings are the result of an artifact rather than actual "memory for tempo." While recalling these musical pieces, the subjects sing or imagine lyrics, and perhaps the lyrics provide a constraint for the tempo. That is to say, the tempo of a piece might be constrained by the number of syllables that have to be fit into a particular melody. To counter this argument, one would need to find a piece of music with lyrics that has no well-defined tempo standard, and ask subjects to sing it. If the range of tempos produced for such a song is much wider than the range produced by our subjects, we could argue that articulatory constraints do not account for memory for tempo.
As a first look at this issue, we examined Halpern's (1988) data. In her Experiment 1, Halpern asked subjects to imagine popular songs and set a metronome to match the tempo they heard in their heads. Most of the songs had the property that no single reference (or canonical) version existed - for example, "Happy Birthday," "London Bridge is Falling Down," and "Twinkle, Twinkle, Little Star." In general, people's exposure to these songs is through informal live singing (such as in elementary school), and the variety of recorded versions of these songs virtually ensures that there is no uniformly "correct" tempo. Halpern provided us with the (previously unpublished) standard deviations across subjects for these three songs, and they were (respectively) 16%, 19%, and 22% (see Table 1).
In a replication (Experiment 2), Halpern asked a second group of subjects to perform the same task. The mean tempo (across subjects) for each song varied by a large amount from one experiment to the other: for the three songs (respectively) the difference in mean tempos was 19%, 12%, and 14%. Halpern also asked her Experiment 2 subjects to adjust a metronome incrementally to the point that represented the fastest and slowest they could imagine the song. As Table 1 indicates, the tempos selected by the subjects varied more than 250%. All of Halpern's tempo variations are larger than those we found in our subjects, supporting our claim that the tempo was not tightly constrained by the lyrics.
As a control condition, we designed Experiment 2 to replicate Halpern's (1988) earlier findings about tempo variability in a production context. We recruited eight subjects and asked them to sing three familiar folk songs (mentioning nothing to them about tempo), and afterwards to sing them as fast and slow as possible. A large standard deviation in this task would replicate Halpern's finding and indicate that lyrics do not significantly constrain tempo.
SUBJECTS. The subjects were 4 University of Oregon students, and 4 members of the community, recruited without regard to musical training; six subjects had no previous musical instruction, two subjects had less than two years of musical instruction. All the subjects served without pay.
MATERIALS. To investigate tempo variability in singing, the subjects were asked to sing "Happy Birthday," "We Wish You A Merry Christmas," and "Row, Row, Row Your Boat."
PROCEDURE. The subjects were asked to sing one of the three songs (song order was randomized). When they finished, they were next asked to sing it "as slow as you possibly can" and then "as fast as you possibly can." This was repeated for the other two songs. The subjects were recorded either directly to the hard disk of a NeXT computer using the program SoundEditor, or to a Sony DATMAN Digital Audio Tape Recorder which was then transferred digitally to NeXT computer sound files.
RESULTS. Table 2 shows the distribution of tempos for the three songs sung at their normal speeds. The standard deviations are all well above the ~8% standard deviation we found for our Experiment 1 subjects. An F test between the variance in Experiment 1 and the lowest of the Experiment 2 variances (for "Row, Row, Row Your Boat") confirms that the variances are significantly different [F(1,7) = 24.83; p<.01], using Levene's test for homogeneity of variances and Satterthwaite's correction for unequal n (Snedecor & Cochran, 1989).
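As a rough illustration of the kind of comparison involved, the Python sketch below computes the Levene statistic W for two groups, using absolute deviations from each group's mean. It is a minimal sketch under stated assumptions: the data are hypothetical, the function name is invented for illustration, no p-value is computed, and Satterthwaite's correction is not applied, so it does not reproduce the F value reported above.

```python
from statistics import mean


def levene_w(*groups):
    """Levene's W statistic (mean-centered) for two or more groups.

    Each observation is replaced by its absolute deviation from its own
    group mean; W is then the one-way ANOVA F ratio on those deviations.
    Large W suggests that the group variances differ.
    """
    k = len(groups)
    if k < 2:
        raise ValueError("need at least two groups")
    n_total = sum(len(g) for g in groups)
    # Absolute deviations from each group's mean.
    z = [[abs(y - mean(g)) for y in g] for g in groups]
    z_means = [mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    if within == 0:
        raise ValueError("W is undefined when all deviations within groups are equal")
    return (n_total - k) / (k - 1) * between / within


# Hypothetical percent errors (NOT the study's data): a tightly clustered
# group and a widely spread group.
narrow = [0.0, 2.0, 4.0]
wide = [0.0, 6.0, 12.0]

w = levene_w(narrow, wide)
```

Under the null hypothesis of equal variances, W is referred to an F distribution with k - 1 and N - k degrees of freedom.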
Figure 3 shows the variability of tempos expressed as percent deviation from each song's mean tempo. When compared with Figure 2, the greater variability in tempos is easy to see. This replicates Halpern's (1988) finding that production variability on these types of songs is large. Furthermore, the fast and slow performances of each of the three songs showed that there is indeed a large range over which people can produce familiar songs with lyrics. The song "Happy Birthday" exhibited maxima and minima of 421 and 48 bpm (with the mean across subjects being 284 and 76 bpm). "We Wish You A Merry Christmas" exhibited maxima and minima of 129 and 22 bpm (with means of 102 and 36 bpm). "Row, Row, Row Your Boat" exhibited maxima and minima of 280 and 41 bpm (with means of 226 and 72 bpm). This large deviation in "normal" speeds, coupled with the large range of possible speeds, confirms that a given song can be sung across a very broad range of tempos, and that lyric or articulatory constraints are probably not playing a role in our Experiment 1 subjects' accurate memory for tempo.
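Because these folk songs have no reference recording, variability is expressed relative to each song's own mean tempo across subjects. The Python sketch below shows that computation on hypothetical tempos. The use of the sample (n - 1) standard deviation is an assumption, since the paper does not say which form was used, and the function names are invented for illustration.

```python
from statistics import mean, stdev


def percent_deviations_from_mean(tempos):
    """Each tempo's signed deviation from the group mean, in percent of the mean."""
    m = mean(tempos)
    return [100.0 * (t - m) / m for t in tempos]


def sd_percent_of_mean(tempos):
    """Standard deviation expressed as a percentage of the mean tempo.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    return 100.0 * stdev(tempos) / mean(tempos)


# Hypothetical "normal speed" tempos in bpm for one song, NOT the study's data.
tempos = [90.0, 100.0, 110.0]

deviations = percent_deviations_from_mean(tempos)
sd_pct = sd_percent_of_mean(tempos)
```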
DISCUSSION. In Experiment 2, we replicated Halpern's (1988) finding that the variability in tempos for popular songs that lack a tempo standard is in the 10-20% range, well exceeding the variability of our Experiment 1 subjects. The accurate performance in Experiment 1 does not seem to have been due to constraints imposed by lyrics.
Songs contain both pitch and tempo information during their performance. What can we say about the mental representation of songs in the brain? Drake and Botte (1993) argued for the existence of a single brain mechanism that judges the tempo of sequences (not merely the durations of intervallic events). Judgment of tempos might be controlled by a central timing mechanism located in the cerebellum (Helmuth & Ivry, 1995), the operation of which is based on oscillatory processes (Ivry & Hazeltine, 1995). Such an "internal clock" may not keep perfect time, but be subject to 1/f noise (Gilden, Thornton, & Mallon, 1995). But pitch perception seems to occur in brain systems separate from time perception, beginning with frequency selective cells in the cochlea, all the way through to the auditory cortex (Moore & Glasberg, 1986). So it would seem that the perception of pitch and tempo is handled by different systems.
But memory for songs may somehow combine or link pitch and tempo representations. Our intuition - despite the finding that there was not a statistically significant correlation between tempo memory and pitch memory - is that the entire spectral-temporal profile of a song is encoded in memory in some fashion and that repeated listenings strengthen the trace. Pitt (1995) provided evidence for a central representation of instrumental timbre, which supports the notion that memory preserves a complex, spectral-temporal image.
Figure 3. Distribution of tempos in Experiment 2. (a) Happy Birthday, SD = 20%. (b) We Wish You A Merry Christmas, SD = 17%. (c) Row, Row, Row Your Boat, SD = 11%.
In any event, it seems increasingly clear that human memory encodes both the absolute and the relative information contained in musical pieces, and that people are able to access whichever is required based on the given task. This supports previous theoretical predictions that memory does encode absolute features of the original stimulus, along with abstract relations (Bower, 1967; Hintzman, 1986). This would also account for people's ability to easily recognize songs in transposition, and for our findings of being able to reproduce a particular absolute feature. Premack (1978) offers an account of the relation between abstract and absolute memory, suggesting that abstraction is only induced as a response to an overburdened memory; that is, only absolute cues are memorized until memory becomes taxed, and then the organism forms an abstract rule.
Some colleagues have wondered if our results are merely the effect of "overlearning" of the stimuli, and suggest that our findings are nothing more than a measure of how well the stimuli were learned. But we agree with Palmer (S. E. Palmer, Personal Communication, October, 1994) who argues that increased learning is just a matter of increasing the signal to noise ratio in memory retrieval. To toss aside the present findings as "merely overlearning" is to miss our point; to paraphrase Halpern (1992), we are interested in the nature of what is encoded in memory when memory is working well. Overlearned, or well-learned stimuli, provide the cleanest measure of this.
It is well established, at least anecdotally, that expert musicians are capable of producing tempos from memory with great accuracy. The present study found that even nonmusicians have an accurate representation of tempo that they are able to reproduce. Considering this together with the results of Levitin (1994) and other studies, we wonder (perhaps somewhat facetiously) if what people encode in memory is the first bar of what would be written music, including the key, time signature, and metronome marking! This would be parsimonious, allowing the brain to store only the temporal and melodic relations between tones, and imposing temporal and pitch anchors only when needed. And it would explain the ease with which people are able to identify changes in pitch and timbre. But our subjective impression, based on introspection, is that long term memory for music functions more like what Attneave and Olson (1971) described as short term music memory:
The circulating short-term memory of a tonal sequence just heard, which is experienced as an auditory image extended in real time (as a melody that 'runs through one's head'), typically preserves the key or specific pitch values of the original. (p. 164).
Attneave and Olson (1971) believed that in contrast, the long-term memory trace is encoded based only on relations, or intervals. They note that there is nothing about such a coding system that precludes the additional storage of pitch or tempo information, "but it is evident that normal individuals do not, in fact, preserve this kind of information with any high degree of precision" (pp. 164-165). We believe that the present study provides evidence against this point, and that ordinary individuals do possess representations that are more accurate than was previously believed. Furthermore, the present study provides evidence that the two types of operations present in musical listening - schematic reduction and unreducible idiostructural information (Narmour, 1977) - appear to also be present in long-term memory.
Allen, G. D. (1975). Speech rhythm: Its relation to performance universals and articulatory timing. Journal of Phonetics, 3, 75-86.
Attneave, F., & Olson, R. K. (1971). Pitch as a medium: A new approach to psychophysical scaling. American Journal of Psychology, 84, 147-166.
Block, R. A. (Ed.) (1990). Cognitive models of psychological time. Hillsdale, NJ: Erlbaum.
Boltz, M. G. (1994). Changes in internal tempo and effects on the learning and remembering of event durations. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 1154-1171.
Bower, G. H. (1967). A multicomponent theory of the memory trace. In K. W. Spence & J. T. Spence (Eds.), The psychology of learning and motivation: Advances in research and theory (Vol. 1, pp. 229-325). New York: Academic Press.
Braun, F. (1927). Untersuchungen über das persönliche Tempo. Archiv der gesamten Psychologie, 60, 317-360.
Brooks, L. R. (1978). Nonanalytic concept formation and memory for instances. In E. Rosch & B. B. Lloyd (Eds.), Cognition and categorization. Hillsdale, NJ: Erlbaum.
Collier, G. L. & Collier, J. L. (1994). An exploration of the use of tempo in jazz. Music Perception, 11, 219-242.
Collier, G. L. & Wright, C. E. (1995). Temporal rescaling of simple and complex ratios in rhythmic tapping. Journal of Experimental Psychology: Human Perception and Performance, 21, 602-627.
Collyer, C. E., Broadbent, H. A., & Church, R. M. (1994). Preferred rates of repetitive tapping and categorical time production. Perception & Psychophysics, 55, 443-453.
Crowder, R. G., Serafine, M. L., & Repp, B. (1990). Physical interaction and association by contiguity in memory for the words and melodies of songs. Memory & Cognition, 18, 469-476.
Dennett, D. C. (1991). Consciousness Explained. Boston: Little, Brown.
Desain, P. (1992). A (de)composable theory of rhythm perception. Music Perception, 9, 439-454.
Deutsch, D. (1991). The tritone paradox: An influence of language on music perception. Music Perception, 8, 335-347.
Dowling, W. J., & Bartlett, J. C. (1981). The importance of interval information in long-term memory for melodies. Psychomusicology, 1, 30-49.
Drake, C., & Botte, M.-C. (1993). Tempo sensitivity in auditory sequences: Evidence for a multiple-look model. Perception & Psychophysics, 54, 277-286.
Finke, R. A. (1985). Theories relating mental imagery to perception. Psychological Bulletin, 98, 236-259.
Fraisse, P. (1981). Rhythm and tempo. In D. Deutsch (Ed.), The psychology of music (pp. 149-180). New York: Academic Press.
Friberg, A., & Sundberg, J. (1993). Perception of just noticeable time displacement of a tone presented in a metrical sequence at different tempos. In Kungl. tekniska hoegskolan, Speech Transmission Laboratory [Eds.], Quarterly progress and status report. Stockholm: Royal Institute of Technology, Dept. of Speech Communication, Speech Transmission Laboratory.
Gilden, D. L., Thornton, T., & Mallon, M. W. (1995). 1/f noise in human cognition. Science, 272, March 24, 1995.
Halpern, A. R. (1988). Perceived and imagined tempos of familiar songs. Music Perception, 6, 193-202.
Halpern, A. R. (1989). Memory for the absolute pitch of familiar songs. Memory & Cognition, 17, 572-581.
Halpern, A. R. (1992). Musical aspects of auditory imagery. In D. Reisberg (Ed.), Auditory Imagery (pp. 1-28). Hillsdale, NJ: Erlbaum.
Hanson, H. M. (1959). Effects of discrimination training on stimulus generalization. Journal of Experimental Psychology, 58, 331-334.
Helmuth, L. L., & Ivry, R. B. (1995). When two hands are better than one: Reduced timing variability during bimanual movements. Journal of Experimental Psychology: Human Perception and Performance, 22, 278-293.
Hibi, S. (1983). Rhythm perception in repetitive sound sequence. Journal of the Acoustical Society of Japan, 4, 83-95.
Hintzman, D. L. (1986). "Schema abstraction" in a multiple-trace memory model. Psychological Review, 93, 411-428.
Hulse, S. H., Fowler, H., & Honig, W. K. (Eds.) (1978). Cognitive processes in animal behavior. Hillsdale, NJ: Erlbaum.
Hulse, S. H., & Page, S. C. (1988). Toward a comparative psychology of music perception. Music Perception, 5, 427-452.
Hurly, T. A., Ratcliffe, L., & Weisman, R. (1990). Relative pitch recognition in white-throated sparrows, Zonotrichia albicollis. Animal Behavior, 40, 176-181.
Ivry, R. B., & Hazeltine, R. E. (1995). The perception and production of temporal intervals across a range of durations: Evidence for a common timing mechanism. Journal of Experimental Psychology: Human Perception and Performance, 21, 3-18.
Jacoby, L. L. (1983). Remembering the data: Analyzing interaction processes in reading. Journal of Verbal Learning and Verbal Behavior, 22, 485-508.
Jones, M. R., & Boltz, M. (1989). Dynamic attending and responses to time. Psychological Review, 96, 459-491.
Konishi, M., & Kenuk, A. S. (1975). Discrimination of noise spectra by memory in the barn owl. Journal of Comparative Physiology, 97, 55-58.
Kosslyn, S. M. (1980). Image and mind. Cambridge, MA: Harvard.
Kuhn, T. L. (1974). Discrimination of modulated beat tempo by professional musicians. Journal of Research in Music Education, 22, 270-277.
Kuhn, T. L. (1977). Effects of dynamics, halves of exercise, and trial sequences on tempo accuracy. Journal of Research in Music Education, 25, 222-227.
Levitin, D. J. (1994). Absolute memory for musical pitch: Evidence from the production of learned melodies. Perception & Psychophysics, 56, 414-423.
Lockhead, G. R., & Byrd, R. (1981). Practically perfect pitch. Journal of the Acoustical Society of America, 70, 387-389.
McClelland, J. L., & Rumelhart, D. E. (1986). A distributed model of human learning and memory. In D. E. Rumelhart, J. L. McClelland, and the PDP Research Group (Eds.), Parallel Distributed Processing, Volume 2: Psychological and Biological Models (pp. 170-215). Cambridge, MA: MIT Press.
Medin, D. L., & Schaffer, M. M. (1978). Context theory of classification learning. Psychological Review, 85, 207-238.
Michon, J. A., & Jackson, J. L. (Eds.) (1985). Time, mind, and behavior. New York: Springer-Verlag.
Monahan, C. B. (1993). Parallels between pitch and time and how they go together. In T. J. Tighe & W. J. Dowling (Eds.), Psychology and music: The understanding of melody and rhythm. Hillsdale, NJ: Erlbaum.
Moore, B. C. J., & Glasberg, B. R. (1986). The relationship between frequency selectivity and frequency discrimination for subjects with unilateral and bilateral cochlear impairment. In B. C. J. Moore & R. D. Patterson (Eds.), Auditory Frequency Selectivity (pp. 407-414). New York: Plenum Press.
Narmour, E. (1977). Beyond Schenkerism: The need for alternatives in music analysis. Chicago: University of Chicago Press.
Perron, M. (1994). Checking tempo stability of MIDI sequencers. Paper presented at the 97th Convention of the Audio Engineering Society, November 10-13, 1994, San Francisco.
Pitt, M. A. (1995). Evidence for a central representation of instrumental timbre. Perception & Psychophysics, 57, 43-55.
Posner, M. I., & Keele, S. W. (1970). Retention of abstract ideas. Journal of Experimental Psychology, 83, 304-308.
Povel, D. J. (1981). Interval representation of simple temporal patterns. Journal of Experimental Psychology: Human Perception and Performance, 7, 3-18.
Povel, D. J., & Essens, P. (1985). Perception of temporal patterns. Music Perception, 2, 411-440.
Premack, D. (1978). On the abstractness of human concepts: Why it would be difficult to talk to a pigeon. In S. H. Hulse, H. Fowler, & W. K. Honig (Eds.), Cognitive processes in animal behavior (pp. 423-451). Hillsdale, NJ: Erlbaum.
Reese, H. W. (1968). The perception of stimulus relations. New York: Academic Press.
Rosch, E. (1975). Cognitive representations of semantic categories. Journal of Experimental Psychology: General, 104, 192-223.
Serafine, M. L. (1979). A measure of meter conservation in music, based on Piaget's theory. Genetic Psychology Monographs, 99, 185-229.
Snedecor, G.W., & Cochran, W.G. (1989). Statistical methods (8th ed.). Ames, IA: Iowa State University Press.
Steedman, M. J. (1977). The perception of musical rhythm and metre. Perception, 6, 555-570.
Terhardt, E., & Seewan, M. (1983). Aural key identification and its relationship to absolute pitch. Music Perception, 1, 63-83.
Tooze, Z. J., Harrington, F. H., & Fentress, J. C. (1990). Individually distinct vocalizations in timber wolves, Canis lupus. Animal Behavior, 40, 723-730.
This research was supported by a National Defense Science and Engineering Graduate Fellowship to the first author, and by NSF Research Grant BNS 85-11685 to R. N. Shepard. This report was prepared in part while the first author was a Visiting Research Fellow at the Center for Computer Research in Music and Acoustics, Stanford University, Winter 1994-95. We are grateful to the following for their generous contributions to this work: Jamshed Bharucha, Chris Chafe, Jay Dowling, Andrea Halpern, Stephen Handel, Douglas Hintzman, Jay Kadis, Carol Krumhansl, Max Mathews, Joanne Miller, Caroline Palmer, John R. Pierce, Michael Posner, Bruno Repp, Roger Shepard, Julius Smith, the researchers and staff of CCRMA, and especially to Malcolm Slaney and the participants in the weekly CCRMA Hearing Seminar. Any errors remaining in this work were probably pointed out to us by these people, but we were too stubborn to change them. Correspondence may be sent to D.J. Levitin, Behavioral Sciences Laboratory, Interval Research Corporation, 1801C Page Mill Road, Palo Alto, CA 94304. (650) 842-6236 (email: firstname.lastname@example.org). P.R. Cook is now at the Department of Computer Science, Princeton University.
1. Only 88 data points are shown in Figures 1 and 2. We started with 46 subjects in Trial 1; three discontinued participation after Trial 1, and we eliminated the data of one Trial 2 subject. This Trial 2 subject asked to sing a Mozart Piano Sonata on Trial 2, complaining that she didn't know any rock songs other than her Trial 1 choice ("Hey Jude"). The experimenter let her sing the Piano Sonata. Later, during the analysis phase of the study, we realized that multiple recorded versions of such a piece exist, and virtually all are in the same key, so this subject's production was not excluded from the original pitch analysis reported in Levitin (1994). However, the range of tempos over which such pieces are performed typically varies, so this subject was excluded in the tempo analysis, on the grounds that a single reference standard did not exist.
2. The low correlation between pitch memory and tempo memory could indicate that subjects have either independent storage or integrated storage of these two attributes, in the sense of the terms proposed by Crowder, Serafine, and Repp (1990). In independent storage, memory for one element is uninfluenced by the other; in integrated storage, the elements are related in memory in such a way that one component is better recognized in the presence of the other than in its absence. The present data do not provide direct evidence for choosing between these two possibilities, but it is our intuition that tempo and pitch are best characterized by an integrated storage account. Some subjects are able to recall both attributes with great accuracy (one attribute might indeed aid the accurate recall of the other) while other subjects showed no such benefit.