Picture Poems:
Some Cognitive and Aesthetic Principles

This paper was written in response to Willie van Peer's paper "Typographic Foregrounding". It explores some cognitive and aesthetic principles concerning picture poems. In doing so it will suggest that some of these principles, pace the apparent extravagance of these poems, are quite logical extensions of principles regularly granted to poetry of a more conservative kind. At the same time, it will try to do justice to their extravagance too. Van Peer's abstract reads as follows:

This article investigates the way in which devices of foregrounding play a role at the typographical level of a text's organisation. In poetry, such devices are very old and are regularly used in a bold way, thereby creating specific effects. However, a historical overview reveals that such bold typographic experiments are not distributed evenly over time. It also emerges that some of these texts survive in the literary canon, while others are forgotten. On the basis of an analysis of some test cases in literary history, hypotheses are proposed which may explain this uneven distribution.

                                                      (Hutchinson, 1978, p. 43)

The subtlety of van Peer's analysis can readily be seen in his discussion of George Herbert's famous picture poem, "Easter Wings":

[T]his is an unmistakably religious poem, one of the best-known texts from Herbert's The Temple, and one of the most authentic expressions of devotion in the Anglican church. [...] [T]he title explicitly refers to a subject matter corresponding to the typographical form. The wings symbolise man's elevation resulting from his belief in Revelation: note also the reference to the divine wings ('imp my wing on Thine') and to the lark, yet another explicit topicalisation of the motif of the wing ('With Thee/ O let me rise/ As larks'). More important still is the fact that each stanza displays a typographical form which closely mirrors the development of the theme. This can be seen quite clearly from the verbs and their distribution across verse lines. Each time the length of the line shrinks, verbs occur which refer to a process of diminution: 'lost' (line 2), 'decaying' (line 3), 'became poore' (lines 4-5), 'became ... thinne' (lines 14-15). When the width of the verse line increases, verbs belonging to a semantic field indicating increase and growth are used: 'rise' (line 7), 'further' (line 10), 'combine' (line 17), 'imp' (line 19), 'advance' (line 20). This pattern is reinforced by the change of tense occurring in each stanza: past tense in the first half of each stanza, when the lines start to grow shorter: 'createdst' (line 1), 'lost' (line 2), 'became' (line 4), 'did' (line 11), 'didst' (line 13), 'became' (line 14); via present tense when the lines begin to increase in length: 'let' (line 7), 'sing' (line 9), 'let' (line 17), 'feel' (line 18), 'imp' (line 19); to the future in the final line of each stanza: 'shall further' (line 10), 'shall advance' (line 20) (55-57).

I find Willie van Peer's paper illuminating and mind-expanding. But I also find some significant gaps in his argument, which I propose to fill in here; this, in turn, may eventually change the emerging picture. In the first place, I believe, the term "foregrounding" as a wholesale key-term is insufficient for his purpose. One should distinguish degrees of unnaturalness in foregrounding. In poetry, language is foregrounded relative to the non-poetic use of language; and in some poetic styles, it is more foregrounded than in others. In the second place, I find his explanation based on the arbitrariness of the graphemic sign unsatisfactory. One must realise that the string of phonological signifiants is no less arbitrary with reference to the semantic signifiés than the string of graphemic signifiants is with reference to the phonological signifiés. So we have a whole hierarchy of sign-relationships, characterised, throughout, by arbitrariness. It is just that the arbitrariness of the graphemic sign is somehow different from the arbitrariness of, e.g., the phonological sign. Consequently, one should be careful with the argument: "The founding principle of alphabetic writing is the arbitrary character of the signs used, as a result of which they are more or less void of mimetic meaning, unlike the partial or rudimentary mimesis of ideographic and logographic script" (53). My point is that in Western poetry we compare the effects of "typographic foregrounding" in alphabetic script not to its effects in "ideographic and logographic script", but rather to the effects of phonetic or syntactic or semantic foregrounding. Thus, the explanation becomes the explicandum: we must explain why "typographic foregrounding" displays a more arbitrary character of the signs used than phonetic or syntactic or semantic foregrounding.

In the third place, I am very sympathetic with van Peer's observation "that such 'concrete' poems become popular in periods of great social, political and ideological upheaval" (58). However, it should be noticed that such "typographic foregrounding" of poems is, typically, part of poetic styles usually called "manneristic" or "metaphysical", and it has been frequently suggested that "manneristic" or "metaphysical" styles tend to occur in periods of great social, political and ideological upheaval, when more than one scale of values prevails. This does not imply that van Peer's suggestion is wrong; only that wider issues are involved. My point is that in terms of relationships between signifiants and signifiés, a great variety of mannerist devices are perceived as more unnatural than their non-manneristic counterparts; and that in this respect, "typographic foregrounding" appears to be only one particular instance of a wider manneristic principle. As for the possible relationship between mannerism and "periods of great social, political and ideological upheaval, when more than one scale of values prevails", I have elsewhere suggested a cognitive explanation, to which I shall briefly refer later.

I suggest that my three above points may best be accounted for by a set of homogeneous principles derived from the assumption that man is a sign-using animal. Human culture consists of long series and hierarchies of signifiants and signifiés. There is some experimental evidence which, according to my interpretation, suggests that man is programmed to reach as fast as possible the last link of the chain of signifiants and signifiés. Such a programming has considerable survival value. If a certain noise is "a sign of" some predator, a knowledge of the predator has greater survival value than a knowledge of the noise. In a complex cultural situation of human society, however, in which, e.g., verbal magic may be the source of maladaptive behaviour, the signifiants and signifiés must properly be kept apart. It is here where poetry comes in.

"The function of poetry" wrote Jakobson in 1933, "is to point out that sign is not identical with its referent": why do we need this reminder? "Because", continued Jakobson, "along with the awareness of the identity of the sign and the referent (A is A1), we need the consciousness of the inadequacy of this identity (A is not A1)".

This antinomy is essential since without it the connection between the sign and the object becomes automatical and perception of reality withers away (Erlich, 1965: 181).

In ordinary, nonpoetic language, we typically "attend away" from the signifiants to the signifiés: sometimes we remember the information, but not the exact words in which it was conveyed; sometimes we can't even tell in what language we received some information, or whether it was in the verbal medium at all. Poetic language, by contrast, compels us "to attend back" to the signifiant or, rather, to ever higher signifiants in a great chain of signs: from the extra-linguistic referent to the verbal (semantic) signifiant; from the semantic unit to the string of phonological signifiers and eventually, perhaps, to the graphic signifier of the phonological unit. The phonetic patterning of poetry (rhyme, metre, alliteration) typically directs attention away from the semantic to the phonological component of language; whereas figurative language (and many other semantic devices) directs attention from the extra-linguistic referent to the verbal sign. In this perspective, it would be but natural to expect to find some patterning of the typographic signifier as well. Such patterning, however, is relatively rare in poetry.

This process can be most readily appreciated when it breaks down, as in riddles. Consider the following riddle, common among children: "Which cheese is made backward?" "Edam". Our linguistic competence requires us to run through the hierarchy of signifiers, from the string of graphemes, through the string of phonemes, the semantic units, right down to extra-linguistic reality, and look for some odd production processes in the dairy. The riddle is, precisely, a riddle, because the understander is forced to exit this chain of signifiers and at some theoretically unspecifiable point, at that. In the present instance, the exit occurs at the graphemic level: "edam" is the string of letters that constitute "made", in a reverse order. If we contrive an admittedly less elegant riddle, "What matronly woman is made backward?", the exit will be at the phonological level of the same chain, the solution being "dame". In poetic language no exit is forced on the understander: the whole chain of signifiers is realised, but the understander must linger at some of its earlier stages.
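
The two exits from the chain can be made concrete in a small sketch. It reverses the same word once as a string of graphemes and once as a string of phonemes. The function name and the rough phonemic transcriptions are assumptions introduced for illustration only; they are not part of the argument above.

```python
def reverse_units(units):
    """Return the units of a sign string in reverse order."""
    return list(reversed(units))


# Graphemic level: the units are the letters of the written word.
graphemes_reversed = "".join(reverse_units(list("made")))

# Phonological level: the units are phonemes. These transcriptions are
# rough, hand-written assumptions made for the sake of the example.
PHONEMES = {
    "made": ["m", "eɪ", "d"],
    "dame": ["d", "eɪ", "m"],
}
phonemes_reversed = reverse_units(PHONEMES["made"])

# Exit at the graphemic level yields the cheese.
print(graphemes_reversed)
# Exit at the phonological level yields the matronly woman.
print(phonemes_reversed == PHONEMES["dame"])
```

The same operation, applied at two different links of the chain, gives two different solutions; nothing in the wording of the riddle itself specifies which link is meant.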

So far there should be little disagreement between Willie van Peer and myself. But here a further step seems to be required. Granted that poetic language draws attention to itself more than nonpoetic language, one must make a stylistic distinction between poetic styles in which this is more, and those in which this is less conspicuous. In non-manneristic styles, classic or romantic for example, the transition from the signifiant to the signifié is relatively smooth, in spite of all. The phonetic patterns in these styles are perceived as some pleasant fusion of sounds in the back of one's mind. In manneristic styles, such as metaphysical or modernistic poetry, language tends to direct attention back to the signifiers more conspicuously. To put my argument briefly, in poetic language we are more aware of the separateness of the signs than in nonpoetic language; and within poetic language, we are more aware of their separateness in manneristic than in, e.g., romantic poetry.

In classic or romantic poetry, the incongruity of the sign vehicle and what it signifies is eventually resolved, and in a rather smooth manner. Not so in metaphysical or modernistic poetry. Sypher (1955: 122) speaks of "Donne's false and verbal (perhaps false? perhaps verbal?) resolutions - his incapacity to commit himself wholly to any one world or view". "The resolution is gained, if at all, only rhetorically, not [through] reason" (123). What we have in a metaphysical pun or conceit is one single sign function, in which both the tenor and the vehicle are so consistently developed that one is compelled to be aware of both their identity and incongruity. The same can be said, mutatis mutandis, of, e.g., the verse lines on the page which convey the string of phonemes that carry the verbal message, and the visual design of, say, a pair of wings, or an altar, or a wounded dove and a fountain, at one and the same time: both are so consistently developed that one is compelled to be aware of both their identity and incongruity. But such rival organisations of typography need not involve some visual design. Much manneristic poetry is distinguished by a device of alternative patterning that became a solid convention: the acrostic. When you read the "Envoi" section in some of Villon's ballades, for example, you may read the lines from left to right for the poetic message; but if you read the first letter of each line top-down, you receive the word "Villon". The word "Villon" exists only as part of the typographic design of the poem, but not of its words. 1


Vous portastes, digne Vierge, princesse,
Iesus regnant qui n'a ne fin ne cesse,
Le Tout Puissant, prenant nostre foiblesse,
Laissa les cieulx et nous vint secourir,
Offrit a mort sa tres chiere jeunesse;
Nostre Seigneur tel est, tel le confesse:
En ceste foy je vueil vivre et mourir.
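
The two rival readings of the envoi can be sketched as follows. The lines are given with their initial letters in place, on the assumption that the acrostic spells the poet's name as described above; the function names are hypothetical and serve the illustration only.

```python
ENVOI = [
    "Vous portastes, digne Vierge, princesse,",
    "Iesus regnant qui n'a ne fin ne cesse,",
    "Le Tout Puissant, prenant nostre foiblesse,",
    "Laissa les cieulx et nous vint secourir,",
    "Offrit a mort sa tres chiere jeunesse;",
    "Nostre Seigneur tel est, tel le confesse:",
    "En ceste foy je vueil vivre et mourir.",
]


def read_across(lines):
    """The ordinary reading: each line from left to right, line after line."""
    return " ".join(lines)


def read_down(lines):
    """The acrostic reading: the first letter of each non-empty line, top-down."""
    return "".join(line.lstrip()[0] for line in lines if line.strip())


acrostic = read_down(ENVOI)
print(acrostic)

# The name is found in the vertical design, not in the words of the message.
print("villon" in read_across(ENVOI).lower())
```

The second check is the point of the example: the string that the top-down reading yields is nowhere to be found in the left-to-right reading.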

Neoclassical poetic theory detested mannerism, but was well aware of this hierarchy of patterned signs, and derived a normative principle from it: the lower the item in the hierarchy, upon which the poetic patterns were founded, the more perfect it was; the higher, the more objectionable. Consequently, picture poems were worst of all. Consider the following passage from No. 62 of Joseph Addison's Spectator Papers:

As true Wit generally consists in this Resemblance and Congruity of Ideas, false Wit chiefly consists in the Resemblance and Congruity of single Letters, as in Anagrams, Chronograms, Lipograms, and Acrosticks: Sometimes of Syllable, as in Ecchos and Doggerel Rhymes: Sometimes of Words, as in Punns and Quibbles; and sometimes of whole Sentences or Poems, cast into Figures of Eggs, Axes or Altars ... As true Wit consists in the Resemblance of Ideas, and false Wit in the Resemblance of Words, according to the foregoing Instances; there is another kind of Wit which consists Partly in the resemblance of Ideas, and partly in the Resemblance of Words; which for Distinction Sake I shall call mixt Wit. This Kind of Wit abounds in Cowley, more than in any Author that ever wrote.

Mixt Wit is therefore a Composition of Punn and true Wit, and is more or less perfect as the Resemblance lies in the Ideas or in the Words.

Now I believe that bad classicism can make excellent mannerism, and that these evaluative terms can be translated into descriptive terms with great profit. Instead of "and is more or less perfect" we could read "and its focus is more or less integrated", or "conforms with neoclassical or mannerist taste as the Resemblance lies in the Ideas or in the Words", so as to make the above quote fit perfectly into our scheme. In this way, one can make illuminating generalisations on Mannerism at its best, based on the theoretical writings of the Classicists.

Coming back now to the issue of metaphysical puns and alliteration proper, how can we tell the difference between them? Roughly, by the relationship between the sound pattern and its referent(s). I shall illustrate this by two examples taken from my criticism of Keyser and Prince's discussion of some Wallace Stevens poems in terms of "folk etymology" (cf. Tsur, 1992a: 511-525). The way Keyser and Prince treat repeated sound clusters in Wallace Stevens' poems raises a fundamental question: are we to attribute additional referential meaning to the superimposed sound clusters, and thus turn them into puns, or are we to regard them as a euphonic sound texture in its own right, with no extra meaning assigned? In other words, are we to regard, e.g., the repetition of the sound sequence /yu/ in

(3) You are that white Eulalia of the name

as a musical effect or as a pun? Does the sound sequence /yu/ confer the referent "you" to "Eulalia"? In pun, each of the two sign functions is "striving" to reassert itself in the reader's perception, to preserve, as it were, the two functions' "warring identity"; the result being a perceptual quality of wit. In alliteration, the alternative sound patterns do not vie in rivalry for the reader's attention; they are peacefully arranged in a hierarchic order, the arbitrary referential sign being in the foreground, with the non-referential expressive sound clusters constituting a more or less "thick" musical background texture. Psychologically, then, the two conceptions of sound repetition are incompatible. This incompatibility is curiously similar to that obtaining between the elements which constitute the grotesque. That is to say, if the strict separation of the respective arrangements could somehow be interfered with, one might expect a resulting "high tension" effect, not unlike the strange sensation of confusion and disorientation associated with the grotesque.

Let us consider Keyser and Prince's quotation from Stevens' "An Ordinary Evening in New Haven":


When the mariners came to the land of the lemon trees,
At last, in the blond atmosphere, bronzed hard,
They said, "We are back once more in the land of the elm trees,

But folded over, turned around". It was the same,
Except for the adjectives, an alteration
Of words that was a change of nature, more

Than the difference that clouds make over a town.
The countrymen were changed and each constant thing.
Their dark-colored words had redescribed the citrons.

Keyser and Prince (1979: 76) comment: "The shift of letters in the first syllable of lemon, lem, produces elm, and, as Stevens says, this change of language, in itself, produces a change of nature". Here, the term "folk etymology" has picked out those elements to which it was tuned, and molded Stevens' poetic technique in its own image, with little regard for the possibility that the phoneme sequence /lem/ repeated as /elm/ may or may not be a case of plain alliteration. But notice that this repeated sound cluster is actually embedded here in a rich texture of alliteration, about which Keyser and Prince say nothing. Thus, the phonological sequence /blond/ is repeated in the same order (with the liquid replaced by another liquid and with the addition of a voiced sibilant) in /bronzd/. Both phoneme sequences, /l-n-d/ and /r-n-d/ are repeated, in the same order, in land (twice), and in turned around. These sound clusters have further ramifications in the quoted passage, on which I shall not dwell here.
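
The two kinds of sound relationship just described can be sketched roughly as follows: the "shift of letters" that turns lem into elm, and the ordered recurrence of a consonant cluster across several words. The sketch works on ordinary spelling as a crude stand-in for phonemes, which is an assumption of the example, and the function names are hypothetical.

```python
def same_letters(a, b):
    """True if `b` is a rearrangement of the letters of `a`."""
    return sorted(a) == sorted(b)


def is_subsequence(cluster, word):
    """True if the letters of `cluster` occur in `word` in the same order,
    though not necessarily next to one another."""
    remaining = iter(word)
    return all(letter in remaining for letter in cluster)


# The clusters and the words in which the discussion above locates them.
CLUSTERS = {
    "lnd": ["blond", "land"],
    "rnd": ["bronzed", "turned around"],
}

matches = {
    cluster: [word for word in words if is_subsequence(cluster, word)]
    for cluster, words in CLUSTERS.items()
}

print(same_letters("lem", "elm"))
print(matches)
```

Such a check registers only that the patterns are there; whether a given pattern is heard as texture or as pun is precisely what it cannot decide.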

It is interesting to observe the two consecutive phrases "in the blond atmosphere, bronzed hard". The two phrases are related on three levels: First, they constitute a syntagmatic sequence; second, their sound clusters are related by a paradigmatic pattern; and third, they are contrasted by such semantic components as [+/-dark] and [+/-hard]. These contrasting components form semantic patterns that are redundant from the syntagmatic point of view, and it would appear that they are absorbed in the background texture, together with the phonological clusters. Prima facie, the repeated cluster /lem/ /elm/ is just one more thread in this network of sound texture, which appears to be non-referential, superimposed upon the syntagmatic sequence of arbitrary linguistic signs.

In the second tercet of the quotation, however, the tables are turned on the reader. "It was the same, / Except for the adjectives, an alteration / Of words that was a change of nature". One wonders what could be substituted for "was", in the last clause, of more specific verbs, such as "produced" or "reflected". Keyser and Prince paraphrase "was" as "produced", without any reservation. From the context of the poem, I gather that "reflected" might be no less appropriate. What the poem says is quite sophisticated, and could be, I think, paraphrased as follows: "When they came to the land, they found there was a change of nature: elm trees were replaced by lemon trees; in the language of the poem that describes this change of nature, the shift is rather slight: only the word lemon has been replaced by elm (the letters of which are already contained in the replaced word). The reader who has access to this real-world state of affairs only through the language of the poem (description first, landscape last), may think that it is the slight change of language that produced the change of nature". That is why the copula was is so much more appropriate here than either of the content verbs "reflected" or "produced"; it fits into both orders of representation.

The reader is shocked out of his complacent indulgence in poetic language in three different ways at one and the same time. First, there is a shift of the chain of causation: the thing experienced first (language) produces, as it were, the thing experienced later (landscape), irrespective of the logical sequence of events. Second, there is a shift from language to metalanguage ("adjectives"). Third, if Keyser and Prince are right, a referential sign function is assigned to a non-referential pattern of potential sign vehicles (the "shift of letters" signifies a "change of nature"). Thus, the alliteration is turned into a pun after the event. The non-referential sound pattern becomes the sign vehicle of a sign function, emerges from the background texture, and "strives" to establish its "warring identity", to establish itself in the reader's perception. In other words, both the sign vehicle and the signified are so literally developed that they both assume independent existences of their own.

When we consider the instances of typographic foregrounding in picture poems, those considered by Willie van Peer and many more, it becomes clear that all these instances are in the focus of our attention, rather than constituting some harmonious fusion in the back of our mind, as in alliteration. We are not likely to discover an instance in which the graphic design of the typography and the thematic element of a poem tend to blend smoothly. Critics who are inclined to regard conditioned reflex as the basis of artistic taste would suggest that such picture poems are rare, and therefore appear strange to us. In what follows I shall argue that the reverse is the case: such picture poems are strange, and therefore rare.

My first argument will draw upon an analogy with verbal synaesthesia; my second argument upon the conception of Al Liberman and his colleagues at the Haskins Laboratories, of "Why speech is so special?".

The term synaesthesia suggests the joining of different sensory domains. One must distinguish between the joining of sense impressions derived from the various sensory domains (as, e.g., in "genuine coloured hearing"), and the joining of terms derived from the vocabularies of the various sensory domains. The former concerns synaesthesia as a psychological phenomenon; the latter is Verbal Synaesthesia. Literary Synaesthesia is the exploitation of verbal synaesthesia for specific literary effects. These specific literary effects do not presuppose the co-occurrence of sense impressions from two different sensory domains; they can be accounted for, rather, by semantic manipulations of meaning-components of terms derived from the vocabularies of the various sensory domains. For our purpose, synaesthesia is a kind of metaphor, in which the logical contradiction is stronger than usual. In my various writings I have claimed that a strong emotional effect is achieved when the conflicting terms are perceived as smoothly fused, and a strong witty effect is achieved when attention is attracted to the contradiction. 2 The latter kind of synaesthetic metaphor usually (but not exclusively) occurs in various kinds of mannerism. In Romantic poetry and in 19th century Symbolism, Literary Synaesthesia typically contributes to some undifferentiated emotional quality, some "vague, dreamy, or uncanny hallucinatory moods", or some strange, magical experience or heightened mystery. In some varieties of mannerist poetry, as in some modernist and 17th century Metaphysical poetry, this typically makes for a witty quality. I have attempted to follow the strategies by which attention is manipulated by the text. Some of my tools were derived from Ullmann's findings in synaesthesia.

Ullmann examined the intersense transfers in the poetry of twelve nineteenth-century poets in three languages, English, French and Hungarian. The direction of the transfers was checked. According to his findings, "transfers tend to mount from the lower to the higher reaches of the sensorium, from the less differentiated sensations to the more differentiated ones, and not vice versa" (Ullmann, 1957: 280). It is in strict conformity with the first tendency that the touch, the lowest level of the sensorium, should be the main purveyor of transfers. Though it is only one of six possible sources, it looms large in all twelve poets analysed (ibid., 282). The predominant destination, however, turned out, surprisingly, to be not the sense of sight, but the sense of sound, the second highest in the hierarchy.

I have translated these statistical findings into analytical tools. Speaking of the higher in terms of a lower sense may generate that intense emotional atmosphere, those "vague, dreamy, or uncanny hallucinatory moods". Transfer in the opposite direction would generate a witty quality. At some variance with Ullmann's explanation, I suggested that the predominant destination turned out to be not the sense of sight, but the sense of sound, because stable characteristic visual shapes tend to disrupt the smooth fusion of the senses. Indeed, Erzsébet Dombi (1974), who applied Ullmann's methods to Hungarian symbolist poetry, found that in this corpus the predominant destination was sight; but it was only colours, not shapes.3

Let us consider a pair of examples by Keats, upon which some of the foregoing distinctions and generalizations may be focused.

(5) And taste the music of that vision pale.
            (Keats, Isabella: XLIX)

(6) The same bright face I tasted in my sleep
            (Keats, Endymion, I: 895)

Some evasive mood, some uncanny atmosphere, is suggested in (5). It is generated with the help of the double intersense transfer, both in the expected direction, upward, that is, it speaks of vision in terms of music; and of music, in turn, in terms of taste. It should be noticed that though vision belongs to the visual vocabulary, it is a thing-free quality, detached from any stable, characteristic visual shape. Music connotes a pleasant fusion of sounds, expanding toward the perceiving self; the transfer from a lower sense, taste, enhances the indistinctness of the fused sensations. The powerful fusion of the discordant senses heightens the discharge of emotions, eliminating the contradictory sensuous ingredients, leaving the reader with the feel of a supersensuous, mysterious atmosphere.

As for (6), Ullmann finds that it is a strange phrase (1957: 287). My suggestion is that intersense transfer is more capable of splitting the focus of perception than ordinary metaphor. To elicit an emotional rather than witty response requires fusion of the sensory information into a 'soft focus'. Well-defined shapes tend to resist this fusion, whereas thing-free qualities promote it. In the foregoing two examples, "The same bright face I tasted" and "And taste the music of that vision pale" there is an "upward" transfer, from tasting to seeing, and as such, both ought to be perceived as "smooth" and "natural". The characteristic shape of face, however, appears to be an obstacle to fusion with tasting. That seems the source of the "strangeness" of the expression. As a downward transfer, I would mention here Donne's notorious "loud perfume" with its witty effect.
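
The notion of "upward" and "downward" transfer used in these examples can be sketched as a comparison of ranks. The text fixes only that touch is the lowest sense, sound the second highest and sight the highest; the placement of taste and smell between them, like the function name, is an assumption made for illustration.

```python
# Assumed ordering from the "lowest" to the "highest" sense. Only the
# positions of touch, sound and sight are stated in the text; the ranks
# of taste and smell are an illustrative assumption.
SENSE_RANK = {"touch": 0, "taste": 1, "smell": 2, "sound": 3, "sight": 4}


def transfer_direction(source, destination):
    """'upward' when a higher sense is spoken of in terms of a lower one,
    'downward' in the opposite case."""
    difference = SENSE_RANK[destination] - SENSE_RANK[source]
    if difference > 0:
        return "upward"
    if difference < 0:
        return "downward"
    return "none"


# Each phrase is paired with (source sense, destination sense).
EXAMPLES = {
    "taste the music": ("taste", "sound"),
    "music of that vision": ("sound", "sight"),
    "bright face I tasted": ("taste", "sight"),
    "loud perfume": ("sound", "smell"),
}

directions = {
    phrase: transfer_direction(source, destination)
    for phrase, (source, destination) in EXAMPLES.items()
}
print(directions)
```

Direction alone places (5) and (6) in the same class. The strangeness of (6) is not captured by this comparison; on the present account it comes from the stable characteristic shape of face, which the ranking does not represent.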

Let us return now to the patterning of signifiers in poetry. The phonetic signifier is derived from the auditory domain. Its patterning into alliteration and rhyme may achieve a fusion of an emotional or musical character, if there is nothing else to resist it. One such resisting factor we have already encountered: the assignment of a referent to the additional, reference-free, sound pattern. Another prominent factor that may affect this fusion is the organisation of the verse into stronger or weaker gestalts. Typographic patterning of the verse lines appeals to the visual sense. What is more, a pair of wings, or an altar, or a wounded dove and a fountain, involve stable characteristic visual shapes and, as such, they resist fusion, and generate a witty effect. That is why such patterning would typically occur in poetry of mannerist character.

Another obvious path would be to argue that while the system of phonological signifiers is inborn in human beings, the system of graphemic signifiers is man-made, and acquired at a relatively late age by human beings. Hence the difference of naturalness in their patterning. This, I believe, is very true; but in a very special sense. There is not only a difference between our responses to phonemes and graphemes, but also between our responses to phonemes on the one hand, and tones or natural noises on the other. Al Liberman and his colleagues at the Haskins Laboratories (see, e.g., Liberman et al., 1967) distinguish between a speech-mode and a nonspeech-mode in aural perception. In the latter, the shape of the perceived sound is similar to that of the sound wave; in the former, it is not. Consider the hand-painted spectrograms in figure 1. An electronic device called "pattern playback" can reproduce from them the syllables [di] and [du]. The perceived abstract category [d] is similar in the two syllables, but the shapes of the sound waves that carry it are not. We perceive only the abstract category, but not the different sound shapes. This characteristic of speech is called "categorical perception". A second characteristic of speech is called "parallel transmission". The phenomenon we have observed in figure 1 can also be described as follows: The sound waves that convey the consonant [d] in the two syllables are dissimilar, because they give, at the same time, information about the ensuing vowels as well. Speech researchers speak of various degrees of encodedness. In some phonemes the sound information is accessible to introspection, to some extent; these are the less encoded ones. In some phonemes it is inaccessible; these are the more encoded ones. Consider the sonograms of natural speech in figure 2. Through careful introspection one may tell that [s] is acoustically higher than [S], whereas with respect to figure 1 one cannot tell through introspection that the consonant [d] is conveyed in the two syllables by sound waves of different shapes and frequencies.

Repp (1984: 287) found that this ability to tell the relative pitch of the two fricatives lies in the cognitive strategy of isolating them from their vocalic context. And conversely, Rakerd found that "vowels in consonantal context are more linguistically perceived than are isolated vowels". In plain English this means that the perception of vowels in consonantal context is more categorial, whereas in isolated vowels more pre-categorial information can be perceived; alternatively, the underlying sensory information, by virtue of which a vowel is typically associated with certain perceptual qualities, varies from one consonantal context to another, owing to "parallel transmission", that is, owing to "the fact that a talker often co-articulates the neighboring segments of an utterance (that is, overlaps their respective productions) such that the acoustic signal is jointly influenced by those segments" (Rakerd, 1984: 123). In my work on speech sounds in poetry I have argued that there is a third, poetic mode of aural perception, in which some of the rich precategorial sensory information becomes accessible "behind" the phonetic categories and subtly interplays with certain semantic components (Tsur, 1992b; 1992a: chapter 7).

  Figure 2. Sonograms of S and s, indicating why s is somehow "higher"

In some of his recent articles, Al Liberman tells us dramatic details about how the great breakthrough in speech research occurred in the late forties. He and his colleagues were engaged in developing a reading machine for the blind. The simple idea was this: just as there is an alphabetic correspondence between speech sounds and letters, one might establish a similar correspondence between the letters of the alphabet and a series of musical tones, which the blind might acquire with some practice. Now speech sounds are produced and perceived at a rate of 40 bits per second. When the nonspeech sounds were played even at a much lower rate, they exceeded the resolving power of the human ear and were perceived as one fused tone. After long research it was discovered that speech sounds at that rate do not exceed the resolving power of the human ear owing to the fact that one piece of acoustic information gives information about several phonemes at one and the same time.

What conclusion can we draw from this excursus on speech perception with reference to the perceived difference between phonetic and typographic patterning in poetry? There is a substantial difference between the phonetic signifiers in the auditory mode, which are inborn, and the graphemic signifiers in the visual mode, acquired at a relatively late age. This difference is also affected by the peculiarities of visual shapes. But there is also an enormous difference within the auditory mode itself, between the speech mode and the nonspeech mode, both inborn, or acquired at a very early age. The former appears to be of a far higher psychic economy than the latter, handled by a specialised inborn mechanism. The nature of this psychic economy can be understood with reference to two of its characteristics: parallel transmission, and the modular nature of the phonetic information. This requires some elaboration. We listen to a stream of abstract phonetic categories, made amenable to the resolving power of the human ear by parallel transmission. At the same time, at a lower level, and subliminally, we may attend to the rich, precategorial acoustic information, which may affect the perceived quality of poetic language in a variety of ways. For our present business, one thing is important: there is a subtle interplay in the background, on a very minute scale, between this rich, precategorial acoustic information and the fine-grained semantic components. Obviously, owing to the differences propounded above, such interplay cannot take place between the visual design superimposed upon the line arrangement and the patterns of signs at the phonetic, semantic, syntactic and thematic levels.

Still, as van Peer notes, the manipulation of line length on the page, i.e., "typographic foregrounding", is ubiquitous in Western poetry, whether classic, romantic, or manneristic.

The existence of verse lines and stanzas illustrates this. Neither of these can be understood without an appeal to typographical deviations and parallelisms, which have themselves been turned into literary custom. This means that these forms must belong to readers' stock expectations concerning poetry (50).

The earlier auditory device thus becomes transformed into a visual game, in which the delineation of the white space on the page around verse lines and stanzas fulfils a signalling function ('This is poetry!') and gives cause to forms of semiotic play (51).

Though I consider this account to be adequate, such a conception has been a source of innumerable misunderstandings. Some contemporary critics hold, following Dr. Johnson, that blank verse and vers libre are "often only verse for the eye". This is a misconception. Just as the graphic arrangement on the page presents the lines as perceptual units to the eye, the intonation contours heard in the reading of poetry present the lines as perceptual units to the ear. Such contours are the result of the interaction of the intonation contours required by prose rhythm with those that articulate the line. It is assumed that the listener decodes these contours in terms of the intonation contours from whose interplay they arise. In fact, the main function of the graphic arrangement on the page is to give the reader instructions concerning the intonation contours appropriate to the lines. In this respect, the verse lines with the white space around become rather transparent graphic signs of phonological entities: just as the letters on the page signal phonemes, the verse lines surrounded by white space signal intonation contours (cf. Tsur, 1977: 119; 1992 a: 174-175). They only begin to compete for the reader's attention and reassert their warring identity, when they are foregrounded by some "mannerist" device: acrostic, or some mimetic arrangement.

Finally, we come up against the literary-historical and the socio-cultural perspectives in which the discontinuity of these manneristic devices must be considered. Willie van Peer aptly observes:

[I]t is hard to see how a theory along the lines of Tynjanov, i.e. a constant relief of new devices, would be able to provide a clarification of this discontinuity. Yet the observation that the distribution of typographic forms of foregrounding over different historical eras is not random requires an explanation (57).

I propose to offer a model alternative to Tynjanov's, drawn from Sypher (1955: 6), who speaks of four stages of Renaissance style, the first two of which are relevant to our present business: A provisional formulation (Renaissance), and a disintegration (Mannerism). We might suggest that in the first stage, that of the "provisional formulation", the art-consumer (reader, listener, spectator) tends to attend away from the individual devices to the architecture of the whole, whereas in the second stage he is forced to attend back to the isolated devices, with possible serious damage to the architectural structure of the whole. In Melchiori's phrasing (1966: 138), "the total effect is frequently lost sight of, or is reached through accumulation rather than through a harmonious disposition of structural parts", whereas "details are worked out with a goldsmith's care". A similar disintegration has been observed by Curtius (1973: 274) toward the end of the Latin Middle Ages: "A danger of the system lies in the fact that, in manneristic epochs, the ornatus is piled on indiscriminately and meaninglessly. In rhetoric itself, then, lies concealed one of the seeds of Mannerism. It produces a luxuriant growth in Latin Middle Ages". A similar story can be told, mutatis mutandis, about the disintegration of romanticism, after a provisional formulation, into the ensuing various types of mannerism.

The discontinuity of manneristic devices in general, and of picture poems in particular, throughout the history of literature, can be accounted for, then, by a model suggesting an internal dynamics of alternating periods of provisional formulation and of disintegration, in which "the centre cannot hold".

In order to avoid misunderstandings, one more word should be said about the meanings of the term "mannerism". In art history and criticism it has three different but related meanings. All three meanings refer to artistic and literary phenomena which compel the reader (or the audience) to focus attention on the individual figures rather than on the composition of the whole. Many critics use the term Mannerism in a pejorative sense, to refer to a style marked by an excess of ornaments and frequent repetition of a limited number of stylistic devices, whether functionally required or not. This is the meaning in which Willie van Peer uses the term when he says of a 17th century specimen of "wing poems" that "to a modern reader, the text is little more than a manneristic game". On the other hand, at least one important theoretician uses it very differently, referring in a positive sense to the cultural period in the 16th and 17th centuries, between the Renaissance and the Baroque:

Thus, mannerism has two modes, technical and psychological. Behind the technical ingenuities of mannerist style there usually is a personal unrest, a complex psychology that agitates the form and the phrase (Sypher, 1955: 116).

Sypher's usage links the term Mannerism with Metaphysical. The third meaning of the term refers to other styles or cultural periods which resemble in some important way mannerism in the second sense, including some trends of modernism (Melchiori (1966) uses the term in this third sense, when he calls James and Hopkins "Two Mannerists").

I still owe an explanation of why mannerism tends to occur during "periods of great social, political and ideological upheaval, when more than one scale of values prevail". I have elsewhere discussed at considerable length (e.g., Tsur, 1987: chapter 9; 1992 a: chapter 15) the cognitive functions fulfilled by such typical metaphysical devices as the metaphysical pun and conceit, and used these functions in an effort to explain the effects of those devices on the readers. The gist of my argument is that the metaphysical pun and the metaphysical conceit are adaptive devices turned to an aesthetic end. In a socio-cultural situation in which disintegration exceeds the degree that could be handled by cognitive integrating devices deployed by e.g. romanticism, one must cope with emotional disorientation by resorting to some more effective adaptive devices. As a first orientation device, one might check whether one's adaptation mechanisms are properly tuned. When one is shocked out of tune with one's environment by the clashing emotional tendencies of the grotesque, of the metaphysical pun, or of the metaphysical conceit, then, as suggested by Sypher, one tries to readjust oneself so as to regain aesthetic distance. In the course of this, our own mechanisms for coping with the environment, and especially the linguistic mechanisms involved in this process, become perceptible to ourselves. Some similar process of disorientation and readjustment may be at work when the reader attempts to integrate the visual design superimposed upon the verse lines with the rest of the poem, at least in the most extreme instances. Mellin de Sainct-Gelais' occasional wing poem (quoted by van Peer) "On the Recovery of our Lady, Mother of François the Ist",

had it not been composed in the form of two wings, would presumably lose little of its effect. In the case of George Herbert, however, typography and theme form a symbiotic whole, the aesthetic value of which would be affected if the wing-pattern were disrupted. In this sense, typographic forms of foregrounding may contribute in a specific way to the quality of poems (57).

Beardsley (1958) speaks of "multiple relationship", Wheelwright (1968) of "multivalence" in relation to such more traditional poetic devices as metaphor. Van Peer suggests precisely such a multiple relationship between the visual arrangement of the lines and the verbal structure of the verse. He points out that the wing-shape of the stanza is analogous to several thematic features of the poem, as well as to the semantic patterns of the verbs on the one hand, and their syntactic patterns on the other. In this respect, his analysis supports the assumption that Herbert's use of the picture poem is in a sense a rather logical extension of more conservative aesthetic principles. When there is no such multiple relationship between the wing-arrangement of lines and the verbal structure, the two can be contemplated in isolation, with no attempt to integrate them, and no emotional shock arises. The emotional shock would arise only when the multiple relationship serves as an incentive for the integration of the hard-to-integrate dimensions of the poem.

At a time when the Catholic Church tried to re-establish its hegemony through Counter Reformation; at a time when Donne wrote in his "An Anatomie of the World" "And new Philosophy calls all in doubt [...]/ The Sun is lost, and th'earth, and no man's wit/ Can well direct him where to looke for it. [...]/ 'Tis all in peeces, all cohaerence gone", and in his "Holy Sonnet 5" "You which beyond that heaven which was most high/ Have found new spheres, and of new lands can write"; at a time when Milton gave in one passage (in Paradise Lost VIII) a Ptolemaic and a Copernican account of the universe insisting that only the Great Architect knows the truth, people had to cope with disorientation in a world in which more than one scale of values prevailed. In such a universe, readers of poetry find pleasure not so much in the emotional disorientation that arises from the mannerist devices, but rather in the reassertion that their adaptive devices, when disrupted, function properly. This is one thing that cognitive poetics means by suggesting that in the response to poetry, adaptive devices are turned to an aesthetic end. And this is one reason for mannerist styles to recur in cultural and social periods in which more than one scale of values prevails.

Postscript, 1997

Before the publication of this paper I re-read some of Al Liberman's papers of the past five years that recapitulate his "finding that speech is special". He most vigorously reiterates that "the units of speech are defined as gestures, not as the sounds those gestures produce" (Liberman, 1992b: 123). "Language is neither auditory nor visual. [...] Optical stimuli will, under some conditions, evoke equally convincing phonetic percepts, provided [...] they specify the same articulatory movements [...] that the sounds of speech evoke. This so-called 'McGurk' effect works powerfully when the stimuli are the natural movements of the articulatory apparatus, but not when they are the arbitrary letters of the alphabet" (125). Speech is normally transmitted by a stream of inconstant, rapidly and continuously changing sounds, which specify the articulatory gestures that produced them, resulting in invariant and discrete speech categories. This process, says Liberman, is biological, "precognitive". He contrasts this conception with the "received view", according to which the perceived speech categories "are the end products of a cognitive translation that converts auditory percepts into a form appropriate to language. Getting from speech signal to the primary level of language is, therefore, a two-stage process: evocation of an auditory percept in the first stage, followed by conversion to a phonetic representation in the second" (Liberman, 1992a: 110). In this important respect, the rival view "implausibly makes perceiving speech no different in principle from perceiving Morse code or, for that matter, the letters of the alphabet" (110). "Unlike a Morse code operator or writer, a speaker is directly using motor representations that are inherently linguistic. There is no need to connect a nonlinguistic act (pressing a key or writing an alphabetic character) to some linguistic unit of a cognitive sort" (111).

Early attempts at creating reading machines for the blind based on an "alphabetic" principle of nonspeech sounds failed, because even when played at a much lower rate than normal speech, the sounds fused into unique chunks characteristic of each word. Furthermore, Liberman and his colleagues found little transfer of training across rates. Letters and words learned at one rate could not be recognised at other rates. "Words tended not only to become hard-to-analyze wholes, but the phenomenal nature of the whole changed quite drastically from one rate to another" (Liberman, 1992a: 111). The key to the difference between the speech mode and the nonspeech mode of hearing was coarticulation, parallel transmission and categorial perception.

"Coarticulation must walk a fine line, being constrained on either side by the special demands of phonological communication. Thus, coarticulation must produce enough overlap and merging to permit the high rates of phonetic segment production that do, in fact, occur, while yet preserving the details of phonetic structure" (Liberman, 1992b: 124).

There seems to be a general consensus as to the artificiality of the graphemic patterning of picture poems, as compared to the relative naturalness of the various kinds of phonetic patterning prevalent in all kinds of poetry. But very few people ask why graphemic patterning should be less natural than phonetic patterning; and even fewer give an answer to this question based on Liberman's findings concerning the unique nature of speech. It should be noticed that Liberman's conception is tailor-made for explaining this relative unnaturalness. At the same time, the general consensus as to the relative unnaturalness of graphemic patterning might serve as weighty evidence in favour of Liberman's conception of speech perception, as opposed to the received view.

Liberman suggests that "coarticulation produces a complex and singularly linguistic relation between acoustic signal and the phonetic message it conveys" (Liberman, 1992b: 124). Unfortunately, he makes no suggestions as to the complex and singularly linguistic relation between the phonetic message and units of meaning. This latter kind of linguistic relation seems to have been neglected so far by research, although this might be one of the most intriguing questions of the linguistic endeavour. We usually encounter only such general statements as that "a phonological representation is assigned to semantic representations", or the like. The poverty of such statements is thrown into relief by the rich and elaborate knowledge unearthed by Liberman and his colleagues concerning the "singularly linguistic relation between acoustic signal and the phonetic message it conveys". Had we some more detailed information about the ways the phonetic and semantic representations are combined in language and speech, we could, of course, adduce some more illuminating explanation of why graphemic patterning is less natural than phonetic patterning in poetry.

Still, I believe I can point out some crucial differences between phonetic and graphemic patterning that may account for the greater naturalness of the former and the relative artificiality of the latter. Language, speech and reading require very complex processing of linguistic stretches during a very short period of time. During this period, these stretches are available for processing in "immediate memory", which functions in the acoustic mode. Much research is devoted today in the United States to finding out what the cognitive deficiencies of poor readers are, as compared to good readers. Two such deficiencies have been isolated. First, poor readers make less efficient use of phonetic coding than proficient readers (their performance in certain verbal memory tasks is less influenced, for better or for worse, by rhyming words); and second, they have a poorer awareness of phonological units in the stream of speech (they have greater difficulty than proficient readers in tapping once, twice or three times in response to such words as "eye, pie, spy", or once or three times in response to such words as "dog" and "elephant"). All linguistic processing requires efficient phonetic coding, so as to make the stretch of speech available in immediate memory for processing. But it would appear that illiterate persons or pre-literate societies can do quite well without the second competence. Now consider this. Phonetic patterning enhances the memory traces of speech sounds; consequently, it improves the availability of stretches of speech for processing. Hence the relative naturalness of phonetic patterning. Indeed, poetry is said to have originated in pre-literate societies from the need to memorise and transmit verbatim texts of great cultural importance. I shall argue that graphemic patterning does exactly the opposite: it renders the linguistic units less available for the processing consciousness.

In a paper on linguistic awareness and orthographic form, Ignatius Mattingly (a close associate of Liberman) reviews all known major orthographic systems from ancient times to the present day in Europe, Western and Eastern Asia, North Africa and Mayan culture. In this universal context he points out that no purely logogrammatic system has ever been discovered, and that in all systems we find phonograms with or without logograms. Consequently, reading is possible only for people who have an awareness of smaller linguistic units, and can identify the correspondence of orthographic units to linguistic units. Curiously enough, no culture has ever developed a purely phonetic transcription system; such systems have been devised only by highly sophisticated professional phoneticians.

The reason must be that shapes of words in such a transcription are context-sensitive and thus difficult to recognise.

"It is suggested that this is a minimal constraint that all writing systems must meet, so that words can serve as units of transcription" (Mattingly, 1992: 134).

All orthographic systems seem to require, then, linguistic awareness at two levels at least, the word which serves as a "frame" for interpretation, and some lower level, the syllable or the phoneme. In Western orthographic systems words are visually isolated by blank space, facilitating the perception of the "frame" units. In Chinese, no such visual isolation occurs, and the characters (or pairs of characters) on the page signify syllables. Nonetheless, Mattingly quotes an experiment by Xu and himself which strongly suggests that psychologically, in Chinese too, the word is the transcription unit. In this experiment, respondents had to answer "yes" or "no", according to whether two genuine characters can or cannot occur together in one Chinese syllable. They had to make their judgments in the context of pseudowords and of genuine words. The judgments were considerably faster in the latter.

This short excursus on orthographic form may suggest the following. The difference between phonetic and graphemic processing of language does not consist only in that the former relies on an inborn mechanism whereas the latter reflects a man-made artifice. Language and speech are, indeed, primary in humans, whereas reading and writing are secondary. But reading and writing are possible only if the orthographic units have a good fit to the linguistic units, at two different levels at least. We have seen that phonetic patterning of speech and language enhances the acoustic traces of speech sounds in immediate memory and, consequently, increases the availability of stretches of speech and language to the processing consciousness. Hence its relative naturalness. By contrast, graphemic patterning directs attention away from the correspondence of orthographic units to linguistic units; consequently, it renders reading more difficult, less natural.


1. In Mediaeval Hebrew poetry, acrostic assumes a very specific function. It occurs in liturgical, but not in secular poetry. While secular poems were collected in Diwans of their authors, liturgical poems were collected with poems by other authors in prayer books, intended for the same liturgical occasion. The acrostic was regarded as a "signature", by which authorship of the poem could be recognised.

2. e.g., Tsur, 1987: chapter 12; 1992 b; 1992 a: chapters 7 and 9.

3. Dombi and I were doing our respective research at the same time, without knowing about each other's work.

