Thinking in language?: evolution and a modularist possibility


Peter Carruthers


Abstract: This chapter argues that our language faculty can both be a peripheral module of the mind and be crucially implicated in a variety of central cognitive functions, including conscious propositional thinking and reasoning. I also sketch arguments for the view that natural language representations (e.g. of Chomsky’s Logical Form, or LF) might serve as a lingua franca for interactions (both conscious and non-conscious) between a number of quasi-modular central systems. The ideas presented are compared and contrasted with the evolutionary proposals made by Derek Bickerton (1990, 1995), who has also argued for the involvement of language in thought. Finally, I propose that it was the evolution of a mechanism responsible for pretend play, circa 40,000 years ago, which led to the explosion of creative culture visible in the fossil record from that time onwards.


1          Introduction: the modular mind

This chapter will attempt to develop a modularist version of (a relatively weak form of) the cognitive conception of language (weak, because it only claims it to be naturally necessary that some of our thoughts should constitutively involve natural language – see Chapter 1, where we point out that one obstacle to taking the cognitive conception of language seriously, amongst cognitive scientists, is an unnecessary focus on conceptual and/or universal versions of it). In my 1996a I argued at length in support of the cognitive conception of language, and responded to a variety of objections to it. I also regarded it as important to claim that the cognitive conception could be consistent with a more or less modular conception of language and mind; but the idea was not really spelled out, and the consistency not demonstrated. The purpose of this chapter is to remedy that deficiency. I should stress, however, that the main point of the chapter is to defend a possibility – I want to show how it is possible for modularism and the cognitive conception of language both to be true together, so that others may begin to investigate the possibilities. I make no attempt to present anything resembling a defence of a worked-out model of cognition. But first I need to give some background.

            According to Jerry Fodor’s (1983) account, the human mind is cleanly divided into two distinct aspects or parts – into a set of peripheral input (and output) modules, on the one hand, and central cognition on the other. Fodor’s input modules include specialised aspects of vision, audition, taste, smell, touch, and language; output modules include a variety of systems controlling different types of motor activity, and language. Modules are held to have proprietary inputs (or outputs, for output modules) and shallow outputs; to be domain-specific; to be fast; to be hard-wired; to involve mandatory processing; and to have their processes both encapsulated from, and inaccessible to, central cognition. (The encapsulation of modules means that their processes are unaffected by changes in central cognition, such as alterations in background belief. The inaccessibility of modules means that central cognition has no awareness or knowledge of the processes which take place within them.)

            In contrast with the considerable success enjoyed by cognitive science in uncovering the structure of modular cognitive processes, central cognition is thought by Fodor to remain inherently mysterious; and, indeed, we are warned off ever seeking a serious science of the central mind at all. There might seem to be something of a paradox here. For Fodor also claims that we do have a science of central cognition, embodied in the generalisations and ceteris paribus laws of common-sense psychology – see his 1987, ch. 1. Thus we know that, if people see something, they generally believe it; that if they want something, and believe that there is something which they can do to get it, then they generally do that thing; that if they believe that P and Q, they generally also believe that P; and so on. Such generalisations are thought by Fodor to be both true, and to have the status of counter-factual supporting laws, equivalent in standing to the laws of any other special science. So how can it be true that central cognition is closed to science if our folk-psychological beliefs about it are a science? I take it that what Fodor means is that there is no hope of making scientific progress in discovering the causal mechanisms which instantiate folk-psychological laws; that is, that central cognition is closed to cognitive science.

            Now, the claim of hard-wiring of modules was probably always an exaggeration. For it appears that all biological systems admit of considerable degrees of neurological plasticity, especially in the early stages of development (Elman et al., 1996). But it remains plausible that the development of perceptual and motor sub-systems should be largely a matter of biological growth-in-a-normal-environment, rather than of learning; and that to the extent to which learning does take place, it may involve domain-specific, rather than general, learning principles. So we can still think of the development of modular systems as being largely innately determined, in interaction with normal environments. It is also the case that modules are probably less than fully encapsulated from central cognition, as we shall see in section 2 below when we discuss the processes underpinning visual imagery. Rather, it seems likely that all input and output systems are alike in having a rich network of back-projecting neural pathways, to help to direct perceptual search and recognition, and to help to monitor and fine-tune motor output. But these points aside, Fodor’s conception of the modular periphery of the mind can be embraced as broadly acceptable, I believe – that is, limited accessibility, domain-specificity, and mandatory and fast operation do characterise a range of specialised input/output systems.

            Matters are otherwise, however, when it comes to the claimed mysteriousness and intractability of central cognition, where Fodor has surely been unduly pessimistic. For as the work of Baddeley and colleagues has shown, it is possible to study the structure and functioning of the central-process working-memory system, for example (Baddeley and Hitch, 1974; Baddeley, 1986; Gathercole and Baddeley, 1993); and Shallice and others have made testable proposals concerning the structure of the central executive (Norman and Shallice, 1986; Shallice, 1988, 1994; Shallice and Burgess, 1993). Moreover, a number of investigators have begun to argue that central cognition itself may be quasi-modular in structure (Baron-Cohen, 1995; Smith and Tsimpli, 1995; Sperber, 1996). Quasi-modules would differ from full modules in having conceptual (rather than perceptual or motor) inputs and outputs. And they may differ markedly in the degree to which their processes, and principles of operation, are accessible to the rest of the system. But they would still be relatively fast, special-purpose processors, resulting from substantial genetic channelling in development, and operating on principles which are largely unique to them, and at least partly impervious to changes in background belief.

            A picture is thus emerging of a type of mind whose periphery is made up of a quite highly modular set of input and output modules (including language), and whose centre is structured out of a number of quasi-modular component sub-systems, in such a way that its operation is subserved by a variety of special-purpose conceptual processors. Thus it seems highly likely that there are systems designed for reasoning about the mental states of oneself and others (Baron-Cohen, 1995), and for detecting cheaters and social free-riders (Cosmides and Tooby, 1992); and there may well be systems designed to deal with causal reasoning and inferences to the best explanation (Giere, 1988), with mate-selection, with various forms of spatial reasoning, with beneficence and altruism, and with the identification, care of, and attachment to, off-spring (Barkow et al., 1992).

            It is likely to be important, then – if the cognitive conception of language is to be defended – to demonstrate that the latter can be rendered consistent with such a modularist picture. It needs to be shown how language can both be an input/output module of the mind, and be crucially implicated in various kinds of central cognition. This is the main task of the present chapter.


2          The use of peripheral modules in central processing

So, how is it possible for language both to be a peripheral input/output module of the mind, and to be employed in central-process cognition? To see how this can in principle be the case, compare what is known about visual imagination. Almost everyone now thinks that the visual system is a distinct input-module of the mind, containing a good deal of innate structure. But equally, most cognitive scientists now accept that visual imagination re-deploys the resources of the visual module for purposes of reasoning – for example, many of the same areas of the visual cortex are active when imagining as when seeing. (For a review of the evidence, see Kosslyn, 1994.) What is apparent is that central cognition can co-opt the resources of peripheral modules, activating some of their representations to subserve central cognitive functions of thinking and reasoning. The same is then possible in connection with language. It is quite consistent with language being an innately structured input and output module, that central cognition should access and deploy the resources of that module when engaging in certain kinds of reasoning and problem solving.

            According to Stephen Kosslyn (1994), visual imagination exploits the top-down neural pathways (which are deployed in normal vision to direct visual search and to enhance object recognition) in order to generate visual stimuli in the occipital cortex, which are then processed by the visual system in the normal way, just as if they were visual percepts. Normal visual analysis proceeds in a number of stages, on this account. First, information from the retina is mapped into a visual buffer in the occipital lobes. From here, two separate streams of analysis then take place – encoding of spatial properties (position, movement, and so on) in the parietal lobes, and encoding of object properties (such as shape, colour, and texture) in the temporal lobes. These two streams are then pooled in an associative memory system (in the posterior superior temporal lobes), which also contains conceptual information, where they are matched to stored data.

            At this stage object recognition may well take place. But if recognition is not immediately achieved, a search through stored data, guided by the partial object-information already available, then occurs. Object-representations are projected back down through the visual system to the occipital lobes, shifting visual attention, and asking relevant questions of the visual input. This last stage is subserved by a rich network of backward-projecting neural pathways from the ‘higher’, more abstract, visual areas of the brain to the occipital cortex. And it is this last stage which is exploited in visual imagination, on Kosslyn’s account. A conceptual or other non-visual representation (of the letter ‘A’, as it might be) is projected back through the visual system in such a way as to generate activity in the occipital cortex (just as if a letter ‘A’ were being perceived). This activity is then processed by the visual system in the normal way to yield a quasi-visual percept.

            Note that this account involves some weakening of the sense in which the visual system can be said to be modular. For it means that the processing of the system is not fully informationally encapsulated, since centrally stored information has an impact on the way in which visual data are processed; whereas a module, in Fodor’s (1983) sense, is always a fully encapsulated processor. But then it is very hard to see how else visual images could be generated, given that imagination and perception share mechanisms (as they do) – unless, that is, there were some way for central cognition to provide inputs to the visual system from outside (perhaps generating activity in the optic nerve in such a way as to simulate appropriate retinal stimulation), which appears most unlikely. And the visual system can of course remain modular in every other respect – being fast, genetically channelled in development, having proprietary inputs and a restricted domain, involving mandatory processing which is inaccessible to central cognition, and so on.

            Note, too, that hardly anyone is likely to maintain that visual imagery is a mere epiphenomenon of central cognitive reasoning processes, playing no real role in those processes in its own right. On the contrary, it seems likely that there are many tasks which cannot easily be solved by us without deploying a visual (or other) image. Thus, suppose you are asked (orally) to describe the shape which is enclosed within the capital letter ‘A’. It seems entirely plausible that success in this task should require the generation of a visual image of that letter, from which the answer (‘a triangle’) can then be read off. So it certainly appears that central cognition functions, in part, by co-opting the resources of the visual system to generate visual representations, which can be of use in solving a variety of spatial-reasoning tasks. And this then opens up the very real possibility that central cognition may also deploy the resources of the language system to generate representations of natural language sentences (in ‘inner speech’), which can similarly be of use in a variety of conceptual reasoning tasks.

            An obvious problem for this suggestion, however, is as follows. If the generation of inner speech, like the generation of visual imagery, begins with some central-process conceptual representation, which is projected back through the system to generate a quasi-perceptual representation, then it may seem puzzling why it should occur at all. It is easy to see the potential benefits, for cognition, of using a conceptual representation to generate a visual one. But what would be the point of using a conceptual representation to generate another representation of the same general sort, namely a natural language sentence?

            One kind of answer is available from the standpoint of the communicative conception of language – it is that inner verbalisation can help to extend the cognitive resources available to central (non-language-involving) cognition; perhaps serving, in particular, to enhance short-term memory. This is the supra-communicative account of language, defended by Rosemary Varley (this volume, ch. 6) and Andy Clark (this volume, ch. 8). On this account we use a conceptual representation to generate an item of inner speech in order to help us remember, and operate with, that very representation. One problem for this proposal is then that inner speech does not just occur when demands on cognition are particularly heavy. Rather, inner speech is just as likely to figure in idle day-dreaming as in complex problem-solving (Hurlburt, 1990). This is difficult to explain if inner speech is merely part of a strategy we adopt to augment our cognitive resources; but easy if it is an element of the normal functioning of the executive system, in such a way that it continues to occur even when the engine is idling, so to speak. So we already have some reason to prefer an account of inner speech which is consistent with the cognitive conception of language.


2.1       A language-using conscious executive

One part of an answer to the above problem (developed at length in my 1996a) which can be given by those inclined to endorse the cognitive conception of language, is that by generating items in inner speech we can gain access to our own conceptual representations, rendering them conscious, and available to critical reflection and improvement. On this account, sentences of inner speech will be constitutive of the thought-tokens they serve to express, provided that there is a special-purpose executive system which operates on just such sentences (see also Perner, this volume, ch. 13). So one possibility is that although non-conscious conceptual representations need not involve natural language (but rather some sort of Fodorian ‘Mentalese’, as it might be), our conscious conceptual (as opposed to visuo-spatial) thinking should necessarily implicate natural language sentences.

            Does this appeal to a central executive require me to postulate a little homunculus sitting at the centre of the mind, taking the decisions? Is the proposal unacceptable, indeed, on the grounds that I must attribute to the central executive too many of the powers and capacities of the person as a whole? I think not. Indeed, very little, if anything, would need to be added to central cognition beyond language, imagination, and theory of mind, in order to create the sort of executive which I envisage. For with these capacities in place, people would be able to generate sentences of inner speech in imagination, which would get interpreted by the language system in the normal way, and whose contents would be available to meta-representational thought. The creation of an additional executive level of cognition can then come about quite easily in two (compatible) ways. First, as Keith Frankish suggests (this volume, ch. 12), the sentences of inner speech can be objects of further, personally motivated, decision. People can decide to accept or reject those sentences, and can decide to adopt policies of premising – acting and reasoning just as if those sentences expressed their beliefs or desires. And second, people can learn to make transitions of various sorts between sentences of inner speech, acquiring habits of inference and action which are causally dependent upon those sentences being tokened in imagination (Dennett, 1991, and this volume, ch. 14). Neither of these additional functions requires us to postulate anything like an homunculus.

            But what serves to generate the items of inner speech? Am I required to postulate a Central Meaner, in anything like the sense which Daniel Dennett (1991) characterises as objectionable? I don’t believe so. One story I could tell is the standard one in the speech-production literature (e.g. Levelt, 1989), which begins with a conceptual representation of the message to be expressed, which is then used to generate an appropriate phonological representation. This conceptual representation might be formulated in Mentalese, or perhaps in LF (and if the latter, then I can tell a story about how these LF representations might be selected; see below). But equally, I could buy into a ‘pandemonium model’ of speech production (Dennett, 1991), in which a variety of ‘word-demons’ compete with one another to see who can shout the loudest. For what matters, on my account, is what happens to an imaged natural language sentence after it has been generated, not how it came to be formulated in the first place.


2.2       LF as the language of explicit conceptual thought

A further possible answer to the problem above, however (also developed, but at slightly less length, in my 1996a), is that central-process conceptual representations might already consist of (non-imagistic, non-conscious) natural language symbols. For example, Noam Chomsky has maintained that there is a level of linguistic representation which he calls ‘logical form’ (LF), which is where the language faculty interfaces with central cognitive systems (Chomsky, 1995). It might then be claimed that some (or all) conceptual, propositional, thinking consists in the formation and manipulation of these LF representations. In particular, it could be that tokening in an LF representation is what renders a given content explicit (in the sense of Karmiloff-Smith, 1992) – that is, which serves to make it generally inferentially available (or ‘promiscuous’) outside of its given cognitive domain, having the potential to interact with a wide range of central cognitive operations. On this account, it would not just be some (conscious) thought tokens which constitutively involve natural language representations; but certain explicit thoughts, as types, would involve such sentences.

            The hypothesis can thus be that central-process thinking often operates by accessing and manipulating the representations of the language faculty. Where these representations are only in LF, the thoughts in question will be non-conscious ones. But where the LF representation is used to generate a full-blown phonological representation (a sentence in auditory imagination, or an episode of inner speech), the thought will be a conscious one. But what, now, is the basic difference between the hypothesis that (many forms of) central-process thinking and reasoning operate, in part, by deploying sentences of LF, and the hypothesis that they are conducted entirely in Mentalese? The important point, here, is that sentences of LF are not sentences of Mentalese – they are not pure central-process representations, but rather depend upon resources provided by the language faculty; and they are not universal to all thinkers, but are always drawn from one or another natural language.

            (Philosophers and logicians should note that Chomsky’s LF is very different from what they are apt to mean by ‘logical form’. In particular, sentences of LF do not just contain logical constants and quantifiers, variables, and dummy names. Rather, they consist of lexical items drawn from the natural language in question, syntactically structured, but regimented in such a way that all scope-ambiguities and the like are resolved, and with pronouns cross-indexed to their referents and so on. And the lexical items will be semantically interpreted, linked to whatever structures in the knowledge-base secure their meanings.)

            Moreover, the proposal is not that LF is the language of all central processing (as Mentalese is supposed to be). For, first, much of central cognition may in any case employ visual or other images, or cognitive models and maps (Johnson-Laird, 1983). And second, and more importantly, my proposal is that LF only serves as the intermediary between a number of quasi-modular central systems, whose internal processes will, at least partly, take place in some other medium of representation (patterns of activation in a connectionist network, perhaps). This idea will be further elaborated in section 4 below. But basically, the thought is that the various central systems may be so set up as to take natural language representations (of LF) as input, and to generate such representations as output. This makes it possible for the output of one quasi-module (theory of mind, say) to be taken as input by another (the cheater-detection system, for example), hence enabling a variety of quasi-modular systems to co-operate in the solution of a problem, and to interact in such a way as to generate trains of thought.

            But how can such an hypothesis be even so much as possible? How can a quasi-modular central system interpret and generate natural language representations, except by first transforming an LF input into a distinct conceptual representation (of Mentalese, as it might be), then using that to generate a further conceptual representation as output, which can then be fed to the language system to build yet another LF sentence? But if that is the story, then the quasi-module in question does not, itself, utilise the resources of the language system. And it also becomes hard to see why quasi-modules could not communicate with one another by exchanging the sentences of Mentalese which they generate as outputs and take as immediate inputs. I shall shortly sketch one way in which (some) quasi-modules might utilise the resources of the language system for their internal operations. But first, let me briefly outline an evolutionary answer to the question how LF, rather than Mentalese, could have come to be the medium of intra-cranial communication between quasi-modules. (More on this in section 4 below.)

            Suppose that the picture painted by Steve Mithen (1996) of the mind of Homo erectus and the Neanderthals is broadly correct. Suppose, that is, that their minds contained a set of more-or-less isolated central modules for dealing with their different domains of activity – a theory of mind (TOM) module for social relationships and behavioural explanation and prediction; a natural history module for processing information about the lifestyles of plants and animals; and a physics module, crucially implicated in the manufacture of stone tools. When a language module was added to this set it would, very naturally, have evolved to take as input the outputs of the other central modules, so that hominids could talk about social relationships, the biological world, and the world of physical objects and artefacts. (It is unlikely, I think, that language would have evolved only for talking about social relationships, as Mithen 1996 suggests, following Dunbar 1996. For given that TOM would already have had access to non-social contents – as it would have to if it was to predict and explain non-social behaviour – there would then have been a powerful motive to communicate such contents. See Gomez, this volume, ch. 4.) It also seems plausible that each of those modules might have altered in such a way as to take linguistic as well as perceptual inputs, so that merely being told about some event would be sufficient to invoke the appropriate specialist processing system.

            With central modules then taking linguistic inputs and generating linguistic outputs, the stage was set for language to become the intra-cranial medium of communication between modular systems, hence breaking down the barriers between specialist areas of cognition in the way Mithen characterises as distinctive of the modern human mind. All that was required, was for humans to begin exercising their imaginations on a regular basis, generating sentences internally, in ‘inner speech’, which could then be taken as input by the various quasi-modular systems. This process might then have become semi-automatic (either through over-learning, or through the evolution of further neural connections), so that even without conscious thought, sentences of LF were constantly generated to serve as the intermediary between central cognitive systems. (I return to this possibility in section 4 below.)

            Let me turn, now, to sketch how the resources of the language system might have come to be implicated in the internal operations of the theory of mind (TOM) quasi-module, in particular. (Note that it is very unlikely that the internal structure of quasi-modules should be the same across all cases – the quasi-module dealing with causal-inference would surely not operate on principles at all similar to those employed by the cheater-detection quasi-module, or the mate-selection quasi-module, for example.)

            Suppose, as I believe, that ‘theory-theory’ accounts of the structure of our TOM abilities are broadly correct (Lewis, 1966; Churchland, 1981; Fodor, 1987; Wellman, 1990; Carruthers, 1996b). Suppose, that is, that the mature TOM system embodies an implicit, partly non-conscious, theory of the structure and functioning of the mind. One possibility is that the system would contain a set of articulated generalisations, of the sort, ‘Anyone who wants something, and believes that there is an action they can perform which will achieve it, will, other things being equal, execute that act’. Another possibility is that the system would consist of a set of inferential dispositions, which might then be said to represent the appropriate generalisation (so a disposition to infer from ‘has seen that P’ to ‘believes that P’ might be said to embody belief in the generalisation, ‘Anyone who sees that P believes that P’). Either way, the various nodes in this theoretical structure might be occupied by lexical items of natural language. It may be that the normal development of a mature TOM system builds on a simpler, pre-linguistic, desire-perception quasi-module, say, depending crucially on the acquisition of appropriate natural-language mentalistic vocabulary. At any rate, such an hypothesis seems consistent with everything which we know about the development of TOM (Wellman, 1990; Perner, 1991; Baron-Cohen, 1995). And the effect would be a quasi-modular system which crucially depends, for its more sophisticated operations, upon the resources of the (more fully-modular) natural language faculty.

            (Note that this proposal is distinct from – albeit consistent with – the one made by Gabriel Segal (this volume, ch. 7), according to which TOM operates by accessing the embedded-clause structure of natural language propositional-attitude reports. If only one, but not both, of these hypotheses is correct, then we can predict that TOM abilities will be differentially affected by a-grammatical and lexical aphasias.)


3          Comparison with the Bickerton model

The proposals being sketched here should be distinguished from the version of the cognitive conception of language put forward by Derek Bickerton (1990, 1995), which is nativist but not (in one sense, at least) modularist. Bickerton’s view is that the evolution of the brain-structures underpinning properly-grammatical natural language constituted the evolution of human central cognition as we know it – conscious, flexible, and indefinitely creative. My view, in contrast, is that the evolution of the language system vastly extended the powers of central cognition, by providing the latter with a representational resource which could be used to increase the range and sophistication of human thinking and reasoning, as well as to render some of our thoughts conscious. I now propose to spend some time elaborating on this difference. But I also tell his story in order to use some of it, and to extract from it an additional argument in support of the cognitive conception of language.


3.1       The Bickerton model

Bickerton’s account proceeds in two stages. First, there was a long period of hominid evolution during which what he calls proto-language was developed and deployed – perhaps two or more million years, through the evolution of Homo habilis, Homo erectus, and the Neanderthals. Proto-language is held to consist of essentially just a lexicon, perhaps containing broad categories of noun and verb but otherwise without any significant grammar. It is thought to be essentially similar to the language of one-year-old children, pidgin languages, and the sorts of unstructured language which can be taught to chimpanzees. Then second, there was the very swift evolution of properly-grammatical natural language (in Bickerton’s 1990, perhaps through a single genetic mutation; somewhat more plausibly in his 1995, following a period of rapid evolutionary change) culminating in the emergence of Homo sapiens sapiens some 100,000 years ago.

            Now, Bickerton’s claim about the swift evolution of grammatical language seems to me very unlikely, and nothing that I shall say requires commitment to it. For, first, there is nothing in the fossil record to motivate it, since there seems in any case to have been a time-lag of some 50-60,000 years between the emergence of Homo sapiens sapiens and the first appearance of creative culture some 40,000 years ago (Mithen, 1996). I shall return to this point in section 4 below. And second, as a highly complex and sophisticated organ, the language faculty would almost certainly have had to evolve in stages over a considerable period of time, under a consistent selectional pressure (Pinker and Bloom, 1990). It seems to me much more likely that grammatical language might have been evolving through the 400,000-odd years between the first appearance of Archaic Homo sapiens some half a million years ago and the much more recent appearance of Homo sapiens sapiens.

            A two-stage account has a number of significant advantages, Bickerton maintains. One is that it can simultaneously explain the slow emergence of the physiological adaptations (particularly of the mouth, throat, and larynx) necessary for smooth production of speech – these would have taken place during the long period when proto-language was developing – while also explaining the very late appearance in the fossil record of evidence of genuine culture and creative intelligence, which is held to coincide with the arrival of natural language proper. (Actually, given the time-lag mentioned above, some story needs to be told about what happened to trigger the onset of culture, independently in the different groups of humans dispersed around the globe at about the same time, some 50,000 years after the first emergence of Homo sapiens sapiens. I shall return to this point.) Bickerton can also explain some of the pressures which may have led to the dramatic increases in brain-size in early forms of Homo, which may have been due to the advantages, and demands, of increasingly large vocabularies, together with the processing requirements imposed by a-grammatical languages.

            This latter point is worth elaborating further. For here we have a possible solution to an otherwise puzzling fact about human beings, namely that they appear to have a good deal of excess brain capacity. For brain tissue is, relatively, extremely costly in terms of its demands on energy consumption – ten times that of other bodily tissue, in fact (Dunbar, 1993; Dunbar himself explains the growth of hominid brains in terms of the demands of social living, as group-sizes increased; but this cannot explain the excess brain capacity of contemporary humans relative to that required by our hunter-gatherer ancestors; see below in the text). Moreover, head size is the main cause of child-birth mortality, for both mothers and infants (in the case of mothers, even now running at about one in thirteen births in developing countries), also necessitating a uniquely long period of infant-maternal dependency. So the pressures for increases in brain size must have been considerable.

            Yet in the modern world the human brain is required to retain and process amounts of information which would surely have been unthinkable in hunter-gatherer communities. (Consider that many modern children are required to learn reading, writing, and arithmetic; as well as a wealth of information about history, geography, literature, and science, up to and including sub-atomic physics.) How is this possible? The explanation may be that the brain developed as it did at a time when communication was by means of proto-language only, which would have placed very great demands on the on-line processing, interpretation, and storage of utterances. For proto-language utterances, in lacking significant structure, would have been multiply ambiguous; hence requiring a good deal of inferencing about the context and the mental states and assumptions of the speaker. The evolution of the grammar faculty would then have made interpretation semi-automatic, thus freeing cognitive space for other purposes.


3.2       Bickerton against the thesis that big brains equal intelligence

Bickerton himself sees his main opposition to be those who equate brain size with intelligence, and who think that language itself adds nothing to intelligence. His challenge to them is then to explain the evidence of the fossil record. For we know that erectus brains overlapped very substantially in size with the normal sapiens range (indeed, that Neanderthal brains were, on average, slightly larger than ours – with due adjustments made for body weight, of course). Yet there is no evidence of sophisticated erectus culture, or of systematic impact on, and exploitation of, the environment. Bickerton cites the limestone caverns of Zhoukoudian in northern China, for example, which were continuously inhabited by Homo erectus between roughly 500,000 and 200,000 years ago – that is, for 300,000 years, or roughly 60 times the length of recorded human history. Yet during that time they made no structural improvements of any kind to the caves, and the tiny handful of artefacts they produced displayed no significant change or improvement (Bickerton, 1995, p. 46; see also Mithen, 1996, for a good deal of data of this sort). How is this possible, if erectus, through having big brains (albeit lacking language, perhaps), was so intelligent?

            In fact the proponents of the brain-size-equals-intelligence theory can make some headway in replying to this argument. For they can, of course, allow that language can be a necessary condition for humans to entertain many kinds of thought, without actually being implicated in those thoughts; since it is through the medium of language that children acquire most of their beliefs and many of their concepts. Thus it is obvious that no child would ever come to believe that the Earth moves round the Sun, or could ever acquire the concept of an electron or of electricity, in the absence of language. So a story can be told according to which the arrival of language, while not fundamentally altering human cognitive powers, made it possible for humans to begin to accumulate information about their environment, as well as to construct, for the first time, a transmissible culture. Such language-mediated accumulations, it may be claimed, are what underlie the dramatic success of Homo sapiens sapiens over earlier hominid species.

            A more severe difficulty for the brain-size-equals-intelligence account might appear to be the excess capacity of the sapiens brain. For if all that changed when humans first acquired the capacity for language is that they began to acquire more and more information, then we might expect that they would have had substantially less spare capacity, and that there would have been considerable pressures for yet further increases in brain size. However, the data can be accommodated, consistent with the communicative conception of language, simply by accepting that the language-faculty is a late-evolving module of the mind. The story can then go like this: brain size increased steadily during the period when proto-language was being developed, due to pressures of interpreting and storing the significance of proto-language utterances, and resulting from increasing sophistication in a variety of central quasi-modules (such as for theory of mind, naive biology, naive physics, and so on). Then with the arrival of the language-module, interpretation became semi-automatic, fortuitously freeing-up cognitive space at the same time as increased powers of communication made possible a vastly expanded store of accumulated information.

            In fact, however, it is not so easy for defenders of the communicative conception of language to evade Bickerton’s argument. We can set them a dilemma, indeed. Suppose, on the one hand, that it is claimed that both language and human central cognitive powers evolved slowly together, appearing in modern form by about 100,000 years ago. There are three distinct problems with this account. First, it is more complex than the alternative, since it postulates the evolution of two complex structures (language and the mechanisms responsible for intelligent thought) rather than just one. Second, it is hard to reconcile it with the evidence of the very late appearance of creative culture in the fossil record circa 40,000 years ago. And third, we have to accept that two such massive evolutionary changes took place in the sapiens brain without any significant increase in brain size – so much for big-brains-equal-intelligence!

            The second horn of the dilemma arises if it is then claimed that human intelligence co-evolved with brain size prior to the arrival of language, perhaps responding to the demands of social living (Byrne and Whiten, 1988). For in that case, what would have prevented erectus from using its basically-human cognitive powers to make changes in its environment? In order to make a structural improvement to a cave, for example, you don’t need an awful lot of accumulated beliefs (and certainly you don’t need culture). You just need to be able to entertain, and reason from, subjunctive or counterfactual thoughts such as, ‘If there were to be a pile of rocks just there, then water wouldn’t get in during the rainy season’, or, ‘If there hadn’t been a second entrance to the cave over there, then the wind wouldn’t have swept through during the storm’. Surely the most plausible story is that, prior to the evolution of natural language, proto-language using members of Homo erectus were not capable of such thoughts; and that it was the grammatical structures provided by our language faculty which first made it possible for us to entertain such thoughts, by being partly constitutive of them.

            I conclude, then, that we can find in Bickerton’s work a powerful new argument in support of the cognitive conception of language, grounded in the fossil record. For if natural language and creative human intelligence are supposed to result from distinct cognitive systems, then it becomes implausible to maintain that they should each have evolved at the same time, after the brain had already reached its modern size. And yet if the core of human creative intelligence was supposed to have been in place prior to the evolution of language, then the problem is to explain the distinct lack of creative intelligence on the part of our immediate ancestors, Homo erectus and the Neanderthals.


3.3       Problems for Bickerton

Now on Bickerton’s account language was, from the start, a system of representation (that is, a vehicle of thought), as well as a system of communication. What a language-vocabulary (an explicit lexicon) provided from the outset, from the very beginnings of proto-language, was a level of representation more abstract than the perceptually-grounded concepts available to our ape ancestors, which could draw together under a single heading a number of different properties available through a number of different sense-modalities. (In support of this he can then cite all the evidence speaking in favour of the weak form of Whorfian conceptual relativism – see Lucy, 1992a, 1992b; Goldstone, 1994; Andrews et al., submitted.) The evolution of our innate capacity for properly-grammatical language then involved a major cortical re-organisation, making possible complex, indefinitely-sophisticated, and conscious thinking and reasoning. So on this account, the language faculty, although innately specified, is not in any sense a module of the mind. Rather, it more or less coincides with central cognition itself. And with language in place, all other features of human (central) cognition are held to be products of social invention and learning, down to, and including, such matters as the incest-taboo (Bickerton, 1995).

            One reason why Bickerton’s model should strike us as implausible, however, is that it is quite obscure how the evolution of a grammar-faculty could, by itself, confer capacities for non-demonstrative social, causal, or explanatory reasoning. It is, perhaps, not wholly implausible that a grammar-faculty might involve some capacity for generating semantic entailments amongst sentences, since this might be thought to be part-and-parcel of the capacity to interpret those sentences. And language might also have conferred – crucially, in the light of the examples used above – a capacity to use and to understand conditionals, subjunctives and counterfactuals. (Or language might have given us the capacity to entertain such thoughts explicitly, at least, in an inferentially promiscuous way – presumably even apes entertain at least implicit, quasi-modular, conditional thoughts when they engage in deceptive strategies, say.) But why should it also involve a capacity to reason, non-demonstratively, about the mental states of other people, generating predictions and explanations of their behaviour; or a capacity to reason about the likely causes and effects of the phenomena we observe in nature? Yet it seems unquestionable that these abilities, at least in their developed forms, are distinctively human; and it seems plausible that they should have a substantial innate component (Baron-Cohen, 1995; Atran, 1990; Spelke et al., 1995; Giere, 1988). Moreover, there is plenty of evidence that earlier forms of Homo must have been pretty good at just these forms of reasoning (Mithen, 1996).

            Bickerton’s account as it stands is implausible, in fact; both because it is hard to see how natural language could in any sense be sufficient for human reasoning, and also because it ignores the substantial evidence emerging from the evolutionary-psychology literature of the existence of a variety of innately determined – or, at any rate, innately channelled – aspects of central cognition (Barkow et al., 1992). It is with this kind of weakly modularist picture of the mind (of the sort outlined in section 1 above) that Bickerton’s version of the cognitive conception of language is surely inconsistent. So we have reason to prefer a form of cognitive conception which is more modularist than this (perhaps of the sort sketched in section 2).

            Indeed, aside from his commitment to an innately determined language-faculty, Bickerton’s account is otherwise highly empiricist in character. Virtually every aspect of central cognition besides language itself is put down to gradual social invention, and social transmission through teaching and learning. But the evidence for such gradualism is just not there in the fossil record, in fact (Mithen, 1996). The use of beads and necklaces as ornaments; the burying of the dead with ceremonies; the working of bone and antler into complex weapons; and the production of carved statuettes and paintings – all appear together in highly sophisticated form circa 40,000 years ago, in different parts of the world. So we not only need a version of the cognitive conception of language which is consistent with modularism, we also need to provide for some relatively simple evolutionary change to have taken place independently in different groups of Homo sapiens sapiens at about the same time, irrespective of the variations in the environmental challenges they faced. (As already noted, the evolution of the language faculty itself, as a highly complex mental organ, is likely to have required a consistent evolutionary pressure operating for a considerable period; and so it is unlikely to have occurred independently in different parts of the globe at the same time. See Pinker and Bloom, 1990.)


4          The co-evolution of language with central cognition

According to the proposals being made here, in contrast, the language faculty would have co-evolved with changes in a variety of quasi-modular central reasoning systems, each of which then came to operate, in part, by accessing and manipulating natural language representations (of LF, as it might be). What follows is a sketch of how the story might go. I should emphasise that it is just a sketch. There is much that needs greater elaboration than I can provide here, and the account is also presented without the benefit of any further supporting argument. But then my aim is only to convince investigators that such a modularist story is possible, so that they may begin to explore and develop it further, and subject it to various forms of empirical testing. (More on the latter in section 5 below.)


4.1       The evolution of language

Prior to the evolution of any form of language, I presume that our hominid ancestors would have had essentially the same sorts of perceptual and motor modules as we now enjoy (which is not to say that the visual module, for example, has undergone no further modification since the advent of language; but it is reasonable to assume that such modifications will have been relatively shallow). More important for present purposes, our ancestors might have had a number of special-purpose, quasi-modular, central computational systems for dealing with the behaviour of conspecifics (Byrne, 1995), for simple forms of cheater-detection (Cosmides and Tooby, 1992), for classifying natural kinds (Atran, 1995), and for reasoning about causes (Spelke et al., 1995), say. These systems could have operated by effecting computations on the sentences of some sort of Mentalese, or they may have been associative networks of one sort or another. But the inputs to these systems would have been the outputs of the various perceptual (input) modules. And it seems reasonable to suppose that, in so far as the systems themselves provided our ancestors with knowledge of the relevant domains, this knowledge would have remained largely implicit, as opposed to explicit, in the sense of Karmiloff-Smith (1992). That is, such systems might have provided our ancestors with sensitivity to, and hence a capacity to respond differentially to, a number of un-obvious features of their social and natural environments. But these sensitivities would have been embedded in particular procedures and contexts, and would not have been available to thought outside of those contexts. (See Mithen, 1996, for extended argument in support of the mutual isolation of the specialist cognitive systems within our remote as well as our more immediate ancestors, grounded in the archaeological record.)

            Now, suppose that the story which Bickerton tells us about the next stage of hominid evolution is broadly correct. That is, suppose that our ancestors were evolving the capacity to employ ‘proto-languages’ – simple systems of communication and representation, with increasingly large vocabularies, but with little recognisable grammar (see also Gomez, this volume, ch. 4). This would have been connected with an extension in the range and abstractness of hominid thought, with proto-language lexical items forming the most abstract level of conceptual representation. Even such a simple system of communication and conceptual representation would have enabled hominids to vastly extend the store of information which each individual could accumulate during a lifetime, as well as placing very considerable demands on on-line processing. So the arrival of proto-language may very well have coincided with the step-change in brain-size which occurs with the evolution of various different species of Homo erectus some 1.6 to 2 million years ago.

            At this stage the special-purpose reasoning systems would probably have altered, too, in such a way as to operate independently of perceptual input, now taking as input proto-language representations and generating proto-language sentences as output. This latter point is especially easy to see in connection with the theory-of-mind quasi-module. For with the appearance of proto-language, hominid speech-behaviour would then have been one of the prime objects of intentionalist (TOM) explanation and prediction. But the same is very likely true of other quasi-modules too. Thus merely being told about some recent event might have been sufficient to evoke the causal-reasoning quasi-module into activity, attempting to construct the best causal explanation of the event described.

            By this stage the main building-blocks necessary for a meta-representational, reflexively-conscious, central executive, with a structure something like that depicted in section 2.1 above, may well all have been in place (see also my 1996a, ch. 8). We can assume that by now hominids would at least have been capable of imagination of various sorts, re-deploying the resources of perceptual modules in the manner outlined earlier (indeed, there is evidence from the fossil record that Homo erectus was capable of generating and transforming visual images, at least; see Wynn, 1993) – although I shall suggest in a moment that this capacity would rarely have been used, prior to the evolution of grammatical language. So with the arrival of proto-language, hominids may also have been capable of generating proto-language sentences in auditory or motor imagination, thus creating inner speech. Items of inner speech would then have been made available to the same set of conceptual resources as could be brought to bear on heard (overt) speech, including the resources of the theory-of-mind faculty. Whether these items could then have become objects of further, meta-representational, thought (Level 2 explicit, in the sense of Karmiloff-Smith, 1992) depends upon the question whether full TOM abilities are, or are not, dependent upon grammatical language (see Gomez, this volume ch. 4; Varley, this volume ch. 6; Segal, this volume ch. 7).

            Finally, the fully-grammatical natural-language faculty began to evolve, culminating with the first appearance of Homo sapiens sapiens in southern Africa some 100,000 years ago. This would have extended the range and sophistication of hominid thought still further – introducing explicit (as opposed to implicit/domain-specific) conditionals, subjunctives and counterfactuals, for example. And natural language representations would still have been exploited and deployed by the special-purpose reasoning systems, which would have come to operate, now, on natural-language (properly grammatical) sentences. And at this time, too, we can be confident that the materials necessary for a reflexively-conscious central executive would all have been in place. The stage was set for the explosion in creative thinking which occurred with the arrival of sophisticated culture some 50,000 years later.

            But why would the evolution of grammatical language have involved anything like a natural-language module? After all, proto-language was (and is) not modular in any sense. So why should its successor have been modular either? The most likely answer to this question is as follows. The main problem confronting our hominid proto-language-using ancestors was, we may presume, one of speech interpretation. Proto-language utterances are multiply ambiguous, requiring heavy reliance upon contextual knowledge and assumptions about speaker intentions in order to be understood. There would then have been very great advantage in the evolution of a special-purpose faculty for imposing greater structure on utterances, and for deriving interpretations on the basis of such structures. With the arrival of a grammar-faculty, interpretation could become semi-automatic, in fact. (Not that it had to be – the same resources and cognitive effort which had previously been devoted to basic interpretation could now be applied to pragmatics, and to such matters as word-play, metaphor, and irony. For theories of the sorts of processing involved, see Sperber and Wilson, 1986/1995.)

            Other answers are also possible, however, if less likely. For example, some have argued that the grammar faculty may have been the result of run-away sexual selection, like the peacock’s tail (Miller, 1996). But the important point is that the main pressures leading to the evolution of the grammar faculty were those of input and output. They had to do with the need to produce, and to decode, easily interpretable utterances (or perhaps with mate-preferences for structurally elaborate speech). So, like other input and output systems, the grammar faculty would have taken on a fairly strongly modular form. But with central-process quasi-modules already geared up to operate on proto-language sentences, it would have required but small modifications for them to operate on natural-language (properly grammatical) sentences instead, paving the way for the creation of the sort of virtual executive described in section 2.1 above.

            There is a problem with this story, however. For how could the pressures leading to the evolution of the grammar faculty have been basically communicative ones, if grammar was also implicated in thought, and served to extend the range of thoughts available to our ancestors? What could have led to the emergence of subjunctive and counterfactual forms of conditional, for example, if early types of sapiens had not already been capable of subjunctive and counterfactual thinking? The answer, I take it, is a kind of boot-strapping account, drawing on what I assume to be a quite general phenomenon – namely, that we can always think more than we can say, and that whenever people have a system of signs available to them, there are always more things that they can do with it, in thought, than they have explicit markers for in the public system. (See Sperber and Wilson, this volume ch. 9, for one sort of example of this phenomenon, where more-specific variants of public concepts can be introduced for use in the speaker’s own thoughts.) So with each new innovation in the grammar-system, there would be, not just a new range of utterances to be decoded, but also a new set of communicative uses of those utterances which would need to be interpreted, laboriously, by inference, hence creating further selectional pressure for yet further grammatical innovations; and so on.


4.2       The evolution of creative thinking

We may presume, then, that the mind of early Homo sapiens sapiens was basically quite similar to that of some of the later species of Homo erectus and the Neanderthals, except that it contained a specialised grammar module (perhaps also containing a more highly developed TOM quasi-module). This would have given our species a crucial advantage, sufficient to explain its rapid colonisation of the globe (with Australia being reached by boat for the first time some 60,000 years ago), together with the extinction of competing species of Homo. Grammatical language would have conferred on Homo sapiens sapiens the capacity for more sophisticated planning of hunts, and also the ability to accumulate a much richer set of beliefs and knowledge about the world, as well as to acquire and transmit complex skills through instruction. (And indeed, the evidence is that Homo sapiens sapiens was more efficient at hunting than its predecessors, and soon began to carve harpoons out of bone, beginning fishing for the first time; see Mithen, 1996, pp. 178-183.) Moreover, grammatical language would have made possible whole new orders of social complexity and co-ordination, sufficient to explain the extinction of competitors (by warfare if necessary).

            The evidence from the fossil record is that Homo sapiens sapiens of 90,000 years ago was of basically modern intelligence, accumulating knowledge about its environment and making a number of important technological innovations; but that it was crucially lacking in imagination (Mithen, 1996). Although the working of wooden artefacts may have undergone some change, and bone tools were introduced for the first time, essentially the same range of stone tools as had been employed by erectus continued to be used for tens of thousands of years. And there was no sign of the use of body-ornaments, or of the production of art (and little evidence of religion) until all these exploded onto the scene world-wide some 40,000 years ago. To explain this, we either have to suppose that the knowledge accumulated by Homo sapiens sapiens and transmitted via language reached some critical mass circa 40,000 years ago, to trigger an explosion of creative thinking; or we have to postulate some simple, multiply-evolved, cognitive adaptation to underpin the change. It is hard to see how an explanation of the former sort could work, since it is difficult to understand why imaginative thinking should presuppose a large body of accumulated knowledge. So my proposal is of the latter kind, with the evolution of an innate predisposition to engage in pretend play marking the crucial divide. But this will require some setting up.

            Why do I think that the hominid capacity for visual and other forms of imagination would not have been much used, prior to the evolution of grammatical language or for some time thereafter? Because of the evidence of the fossil record, as described by Bickerton and Mithen – if Homo erectus and early Homo sapiens had used their imaginations on a regular basis, then one would have expected them to have had a bigger impact on their environment, and to have developed a wider variety of tools and artefacts. However, I am assuming that grammatical language provided for the first time a capacity to entertain explicitly, and to interpret, various forms of conditional. It seems plausible that this would have been linked to a capacity to suppose – to entertain a supposition, and to reason hypothetically from there. This capacity lies at the very heart of Homo sapiens sapiens’ creativity and adaptability, I believe. And it is essentially the same capacity as that which is exploited in pretend play (Jarrold et al., 1994), which is best understood as practice for our adult employment of supposition in creative hypothetical reasoning (Carruthers, 1996c).

            The young of all species of mammal engage in play of various distinctive sorts, the function of which seems to be to prepare them for adult activities (Smith, 1982). What is special about the play of human children, in all cultures from quite a young age, is that they engage, not just in play (e.g. rough-and-tumble fighting) but in pretence. They use one object to symbolise another (a pencil as an aeroplane; a banana as a telephone), they pretend to adopt social roles (cowboy; fireman; nurse), and they engage in games of pure imagination (carrying on a conversation with an imaginary friend; pretending that the woods are full of dinosaurs, or that they are on an island surrounded by alligators). So it seems reasonable to think that human children must be wired up in such a way as to detect, and to find intrinsically rewarding, their own mental state of pretending, or supposing. (And it may then be the failure of this mechanism, through failure of mental-state detection – i.e. delayed or damaged TOM – which explains the marked absence of pretend play in autism; see Carruthers, 1996c.) At any rate, we can readily envisage quite a simple mechanism which would provide children with intrinsic rewards for entering into and maintaining the mental state of pretending, perhaps by boosting the level of interest which the child antecedently has in the content of the play supposition.

            The evolutionary story may then go something like this. A grammar module made its first appearance with the evolution of Homo sapiens sapiens. This provided the representational resources to engage freely and explicitly in hypothetical and counter-factual thinking, which immediately had a considerable impact on the life of the species – in terms of planning, social co-ordination, and making possible a number of technical innovations. At this stage the use of suppositional thinking would have been entirely practical, and tied to particular domains of activity (hunting, social relations, artefacts). But it would have been important enough for there to be pressure for the evolution of a simple pretend-play mechanism, of the sort described above, designed to give children a head-start in this species-distinctive form of activity. But once that happened (circa 40-50,000 years ago) a snowball of creative thinking was set in motion which has not stopped moving yet, giving rise to the unique flexibility and recursive improvability of the distinctively modern mind.

            What I am suggesting is that our disposition to make regular use of visual and other sensory forms of imagination (including inner speech) piggy-backs upon our ability to deploy and reason from language-involving suppositions, and on the fact that children are pre-disposed to find the exercise of supposition intrinsically rewarding, in pretend play. There are then two complementary ways in which a highest-level executive could thereby have been created, as we saw briefly in section 2. One picks up on the idea developed by Frankish (this volume, ch. 12), that sentences of inner speech would be available to become objects of rationally-motivated decision – the subject might decide to accept the sentence in question as a premise in further practical or theoretical reasoning, for example, thus creating a new kind of virtual belief. The other is that subjects might be taught, or might teach themselves, to accept, and to make, certain sorts of transitions between natural-language sentences, thus engaging in new forms of inference. But since these transitions would be available to awareness and to critical reflection, they would not be fixed, but would be subject to further refinement and improvement (once again, of a rationally motivated sort).

            Recall, too, that the various central quasi-modules would already have become set up so as to operate upon natural language inputs, and to produce natural language outputs. The creation of an inner-speech-using central executive would then have enabled these specialised systems to communicate with one another on a regular basis, co-operating in the solution of problems, or generating ideas which cross quasi-modular barriers, such as animals which can speak or persons who can exist without a physical body. And the regular operation of imagination, independently of any particular practical context or problem, would have paid rich dividends in the production of genuinely novel ideas and inventions.

            The basic picture I want to present, then, is of an innately structured natural-language module whose resources are routinely accessed by the various reasoning and executive systems of central cognition, in such a way that the latter function, in part, by deploying and transforming natural-language representations. We can therefore predict that if this form of cognitive conception of language is correct, then an individual who had never acquired any natural language would lack any capacity for (certain forms of) explicit-conceptual (as opposed to implicit, or visuo-spatial) thought. And we can also predict that someone whose natural language system was completely destroyed would completely lose the capacity for just those kinds of thinking. In the next section I shall develop these implications further.


5          Empirical commitments

The account sketched above is consistent, at least, with the available empirical data; and some of that data speaks tentatively in its support – see my 1996a, chs. 2 and 8 for discussion. But how might it specifically be tested? What particular empirical commitments does the theory have, which might differentiate it from other approaches, and how might they be investigated? When the theory is compared with Bickerton’s account, the differences relate mostly to evidence for, or against, quasi-modularity and the existence of a central executive. Thus my account should predict that there may be cases where language is spared but various other competencies – in theory of mind, in cheater-detection, or in causal-explanatory reasoning, for example – are damaged or lost altogether; whereas Bickerton must deny this. And the autism data, at any rate, speaks strongly in favour of just such a quasi-modularist picture of central cognition (Baron-Cohen, 1995). When the theory is compared with any form of communicative conception of language, in contrast, there are roughly three sets of distinct empirical commitments which could be used to test between the two approaches. I shall discuss each of them briefly in turn.

            One prediction turns specifically on the modularism of the proposal. It is that humans lacking any language would be incapable of explicit conceptual thought about a variety of domains, including perhaps theory of mind, social fairness, and unobservable causes. (I say unobservable causes to exclude Humean learning-by-association, which would not really require reasoning, and would almost certainly not require language. So I find the data presented by Varley – this volume, ch. 6 – of an a-grammatical aphasic who could pass a causal-reasoning task unsurprising in this regard; for the test was essentially one of causal association.) For the proposal is that explicit thought about such domains depends either upon the specialised quasi-modules in question being fed with language or proto-language inputs, or at least on their being capable of producing such linguistic representations as outputs, for consumption by other central systems. If reliable non-linguistic tests can then be devised for the capacity to entertain explicit thoughts about such matters, then this prediction might be tested on people with global aphasia, or on pre-conventional-language deaf children who lack any vocabulary items drawn from the test domain. (In similar spirit, such tests might be run in connection with more limited forms of aphasia, provided that the people in question lack any relevant vocabulary items; but with the added complication that the problems such people experience may simply be those of input and output, while retaining a centrally stored representation of the lexical items in question.) However, there will, of course, be the usual problems of avoiding covert training, and the Clever Hans phenomenon.

            Another prediction relates to conscious (reflexively available, meta-represented) thinking. For the proposal is that the central executive runs (at least in that part of its activities which is properly conceptual) on imaged natural language representations (the other part running on visual and other images, which might underpin explicit spatial reasoning). So a test of this proposal would have to test the benefits of flexibility and revisability which attach to conscious thinking. It is at least possible to imagine what such tests might look like, to be conducted on patients with global aphasia, for example. But unfortunately there are two distinct confounds, each arising from the involvement of theory of mind in conscious thinking. The first is that if it should turn out to be wrong that sophisticated TOM abilities require language, then people without language might succeed in tests of conscious thinking, but might do so, not because language is not normally implicated in such thinking, but rather because they can mimic some of the advantages of conscious thinking – thus they might be able to use their intact TOM abilities to attribute thoughts to themselves by means of a process of self-interpretation, for example. (I assume, here, that conscious thinking requires non-inferential access to our own thoughts; so thoughts self-ascribed on the basis of self-interpretation would not count as conscious ones; see my 1996a, ch. 7.) The second confound is the converse difficulty, that if it turns out to be correct that sophisticated TOM requires language, then those who fail tests of conscious thinking may do so, not because conscious thinking itself involves language, but rather because the TOM capacities which are necessary for occurrent thoughts to be meta-representationally available require language. It is not easy for me to see, at this stage, how tests might be devised which would control for these confounds.

            The third prediction picks up on the idea that there may be some forms of explicit thinking and reasoning – such as subjunctive and counterfactual reasoning – which are specifically dependent upon the availability of the appropriate grammatical forms. This prediction might then be tested in two distinct ways, provided that reliable non-linguistic tests of explicit counterfactual reasoning can be devised. First, we could see whether people with global aphasia or a-grammatical aphasia can nevertheless succeed in such tasks. And second, we could see whether the performance of pre-conventional-language deaf children on such tasks improves dramatically when they are finally introduced into a Signing community. (This would, in effect, be a re-run of an aspect of the Luria and Yudovich 1956 twin-experiment, in which the pretend-play and reasoning abilities of two barely-linguistic twins improved dramatically over the three-month period in which they acquired significant amounts of language.) For it would surely be implausible that any significant improvement could be due to new beliefs which the children had acquired through communication from adults, since so little time would have elapsed, and since most early uses of language involve comment upon items which are perceptually salient to both speaker and hearer. The most plausible explanation of any improvement would be that the provision of grammatical language had made it possible for the children to entertain new kinds of thought, by providing a vehicle for those thoughts.

            Finally, it may be worth noting that the account of the evolution of creative thinking, sketched above, carries a commitment to a two-component theory of the mechanism responsible for pretend-play. This mechanism consists of a pretence-detector, which we can assume to draw on the resources of (early-developing forms of) the theory of mind quasi-module (TOM); and a motivator, which gives the child some intrinsic reward whenever a mental state of pretending is detected. The prediction must then be that there could be two dissociable ways for the mechanism to break down – either the pretence-detector might fail (e.g. through delayed/damaged TOM), or the motivator might. Pleasingly, the available data support just such a dissociation. In a large screening study undertaken for childhood autism, Simon Baron-Cohen and colleagues found that in all cases where children lacked joint-attention and proto-declarative behaviours (thought to be early stages in the development of TOM), pretend play was also absent. But they also found children in whom pretend play was absent while joint-attention and proto-declarative behaviours were present (Baron-Cohen et al., 1996). More research is needed to see if the latter group of children retain the capacity for pretence but lack the motivation for it (as my account predicts); and also to see if they display any marked lack of creativity in their thinking in later life.


6          Conclusion

In this chapter I have offered a limited defence of (a weak version of) the cognitive conception of language, both by offering an additional argument in its support (following Bickerton), and by showing how it can be rendered consistent with a broadly modularist conception of language and mind. I have also indicated directions in which the proposals might be tested. But, as a philosopher, I must leave the actual devising of such tests to others.



I am grateful to all those who participated in the Hang Seng Centre conferences over the period 1994-7, for providing me with much of the stimulus for my ideas; and to Derek Bickerton, Paul Bloom, George Botterill, Jill Boucher, Keith Frankish, Susan Granger, Steven Mithen, and Neil Smith for their comments on earlier drafts.



Andrews, J., Livingston, K., and Harnad, S. (submitted). Categorical perception effects induced by category learning.

Atran, S. (1990). Cognitive Foundations of Natural History. Cambridge University Press.

Baddeley, A. (1986). Working Memory. Oxford University Press.

Baddeley, A. (1988). Human Memory. Lawrence Erlbaum.

Baddeley, A., and Hitch, G. (1974). Working memory. In G. Bower (ed.), The Psychology of Learning and Motivation, vol. 8 (47-90). Academic Press.

Barkow, J., Cosmides, L., and Tooby, J. (eds.) (1992). The Adapted Mind. Oxford University Press.

Baron-Cohen, S. (1995). Mindblindness. MIT Press.

Baron-Cohen, S., Cox, A., Baird, G., Swettenham, J., Drew, A., Nightingale, N., Morgan, K., and Charman, T. (1996). Psychological markers of autism at 18 months of age in a large population. British Journal of Psychiatry, vol. 168, 158-163.

Bickerton, D. (1990). Language and Species. University of Chicago Press.

Bickerton, D. (1995). Language and Human Behaviour. University of Washington Press. (UCL Press, 1996.)

Byrne, R. (1995). The Thinking Ape. Oxford University Press.

Byrne, R., and Whiten, A. (eds.) (1988). Machiavellian Intelligence. Oxford University Press.

Carruthers, P. (1996a). Language, Thought and Consciousness: an essay in philosophical psychology. Cambridge University Press.

Carruthers, P. (1996b). Simulation and self-knowledge: a defence of theory-theory. In P. Carruthers and P.K. Smith (eds.), Theories of Theories of Mind (22-38). Cambridge University Press.

Carruthers, P. (1996c). Autism as mind-blindness: an elaboration and partial defence. In P. Carruthers and P.K. Smith (eds.), Theories of Theories of Mind (257-273). Cambridge University Press.

Chomsky, N. (1995). Language and nature. Mind, vol. 104, 1-61.

Churchland, P. (1981). Eliminative materialism and propositional attitudes. Journal of Philosophy, vol. 78, 67-90.

Cosmides, L. and Tooby, J. (1992). Cognitive adaptations for social exchange. In J. Barkow, L. Cosmides and J. Tooby (eds.), The Adapted Mind (163-228). Oxford University Press.

Dennett, D. (1991). Consciousness Explained. Allen Lane.

Dunbar, R. (1993). Coevolution of neocortical size, group size and language in humans. Behavioural and Brain Sciences, vol. 16, 681-694.

Dunbar, R. (1996). Grooming, Gossip and the Evolution of Language. Faber and Faber.

Elman, J., Bates, E., Johnson, M., Karmiloff-Smith, A., Parisi, D., and Plunkett, K. (1996). Rethinking Innateness: a connectionist perspective on development. MIT Press.

Fodor, J. (1983). The Modularity of Mind. MIT Press.

Fodor, J. (1987). Psychosemantics. MIT Press.

Gathercole, S., and Baddeley, A. (1993). Working Memory and Language. Lawrence Erlbaum.

Giere, R. (1988). Explaining Science: a cognitive approach. University of Chicago Press.

Goldstone, R. (1994). Influences of categorisation on perceptual discrimination. Journal of Experimental Psychology: General, vol. 123, 178-200.

Gomez, J-C. (1996). Some issues concerning the development of theory of mind in evolution. In P. Carruthers and P.K. Smith (eds.), Theories of Theories of Mind (330-343). Cambridge University Press.

Hurlburt, R. (1990). Sampling Normal and Schizophrenic Inner Experience. Plenum Press.

Jarrold, C., Carruthers, P., Smith, P.K., and Boucher, J. (1994). Pretend play: is it meta-representational? Mind and Language, vol. 9, 445-468.

Johnson-Laird, P. (1983). Mental Models. Cambridge University Press.

Karmiloff-Smith, A. (1992). Beyond Modularity: a developmental perspective on cognitive science. MIT Press.

Kosslyn, S. (1994). Image and Brain. MIT Press.

Levelt, W. (1989). Speaking: from intention to articulation. MIT Press.

Lewis, D. (1966). An argument for the identity theory. Journal of Philosophy, vol. 63, 17-25.

Lucy, J. (1992a). Grammatical Categories and Cognition. Cambridge University Press.

Lucy, J. (1992b). Language Diversity and Thought. Cambridge University Press.

Luria, A. and Yudovich, F. (1956). Speech and the Development of Mental Processes in the Child. Trans. Kovasc and Simon, Penguin Books, 1959.

Miller, G. (1996). Sexual selection in human evolution. In C. Crawford and D. Krebs (eds.), Evolution and Human Behaviour. Lawrence Erlbaum.

Mithen, S. (1996). The Prehistory of the Mind. Thames and Hudson.

Norman, D. and Shallice, T. (1986). Attention to action. In R. Davidson, G. Schwartz and D. Shapiro (eds.), Consciousness and Self-Regulation 4. Plenum Press.

Perner, J. (1991). Understanding the Representational Mind. MIT Press.

Pinker, S. and Bloom, P. (1990). Natural language and natural selection. Behavioural and Brain Sciences, vol. 13, 707-727.

Shallice, T. (1988). From Neuropsychology to Mental Structure. Cambridge University Press.

Shallice, T. (1994). Multiple levels of control processes. In C. Umilta and M. Moscovitch (eds.), Attention and Performance XV. MIT Press.

Shallice, T. and Burgess, P. (1993). Supervisory control of action and thought selection. In A. Baddeley and L. Weiskrantz (eds.), Attention, Selection, Awareness and Control. Oxford University Press.

Smith, N. and Tsimpli, I-M. (1995). The Mind of a Savant: language-learning and modularity. Blackwell.

Smith, P.K. (1982). Does play matter? Functional and evolutionary aspects of animal and human play. Behavioural and Brain Sciences, vol. 5, 139-155.

Spelke, E., Phillips, A., and Woodward, A. (1995). Infants’ knowledge of object motion and human action. In D. Sperber, D. Premack and A. Premack (eds.), Causal Cognition (44-78). Oxford University Press.

Sperber, D. (1996). Explaining Culture: a naturalistic approach. Blackwell.

Sperber, D. and Wilson, D. (1986). Relevance: communication and cognition. Blackwell. (2nd Edition 1995.)

Wellman, H. (1990). The Child's Theory of Mind. MIT Press.

Wynn, T. (1993). Two developments in the mind of early Homo. Journal of Anthropological Archaeology, vol. 12, 299-322.