An alternative neural network representation for conceptual knowledge

Paul Jorion
jorion@aris.ss.uci.edu

Official reference: http://aris.ss.uci.edu/~jorion/Texts/AI/GRAPH.html

Paper presented at the British Telecom CONNEX Conference, Martlesham Heath, January 1990



  1. Introduction
    There has been extensive discussion lately about the possible usage of (formal) neural networks in the representation of conceptual knowledge. Some authors have defended the feasibility of the concept (McClelland & Kawamoto 1986 ; Shastri 1988 ; Cottrell 1989), others have stated that « classical » neural networks are insufficiently structured to provide the appropriate basis for conceptual knowledge representation (Minsky & Papert 1988 ; Feldman 1989 ; Perez 1988).

    In this paper we present an alternative neural network model - here called memory network - which differs from the classical multilayer perceptron-type in being highly structured and in not relying on any statistical cancelling out through non-linear filtering. This new connectionist model displays however the typical qualities of neural networks such as fault-tolerance and resistance to « neuron-death » through redundancy and distributed storage ; it complements these qualities with those of automatic structuring (insulation, Minsky & Papert 1988: 270-271), self-organisation leading to emergent properties (1), and graceful scale-up.

    Although this new model also shares some features with classical (« Quillian-type ») semantic networks, it differs from these in being directly mappable on a neural network. This provides it with a high neurobiological plausibility - a property semantic networks lack altogether. The proposed neural network has been successfully implemented in the interactive system ANELLA (Associative Network with Emergent Logical and Learning Abilities) (Jorion 1988 (2)). In order for it to be developed, a new mathematical object called the P-Graph has had to be defined. It is claimed here that P-Graphs offer a plausible and practical mathematical tool for modelling the cerebral cortex.



  2. Networks used in the representation of conceptual knowledge: their neural interpretation
    Both (classical) semantic networks and (formal) neural networks rely on specific representation conventions which have become so familiar that it is hardly ever considered anymore that they could have been otherwise than they are, and could therefore be challenged. For instance, the classical representation of a semantic network implies that terms are attached to nodes and relations to arcs. Although intuitively appealing because human beings spontaneously conceive of objects as in a way « concrete » (hence a node representation) and relations as necessarily abstract (hence an arc representation), such a design-mode is nothing but a convention whose familiarity has made it « obvious » (the benefits of the converse convention will be shown here).

    Neurobiological plausibility for knowledge representation models has gained credence in recent years through the persuasive advocacy of a number of perceptive researchers (Stich 1983 ; Churchland 1986 ; Rumelhart & McClelland 1986b ; Norman 1986). In Churchland and Sejnowski's words: « The point is, evolution has already done it, so why not learn how that stupendous machine, our brain, actually works ? » (1989: 43). The considerable computational costs involved in most intelligent systems in use have lent momentum to the opinion that the cerebral cortex provides a quasi-optimal implementation for the representation of conceptual knowledge.

    Both neural networks and semantic networks rest on the same underlying structure of a directed graph. The possible interpretation of such graphs in terms of a biological neural network differs however widely in the two cases.

    Briefly, in a formal neural network, the neuron body (nucleus or soma) equates with a counter associated with a discrete non-linear filter. In the counter, values obtained from other cells’ synapses through dendrites build up. Whenever a threshold value is attained (through continuous or synchronised input), the cell body activates the ramifications of the axon by sending along them a « unary » message. Then it relaxes back to a zero value. Synapses act as valves of a particular section: they modulate the out-flowing « unary » message and allow a particular proportion (synaptic weight) of it to be transmitted to the one neuron they are connected with, wherein operates again a counter associated with a discrete non-linear filter.

    This means that the interpretation of the graph of a neural network is the following: arcs are labelled with synaptic weights, with e.g. continuous values between 0 and 1. To neuron nuclei are dynamically attached at any particular moment two values: the counter value, continuous over a 0 to threshold (often 1) range, and the corresponding digital choice of 0 or 1 according to whether the counter value is below or above the threshold. Input, hidden, and output neurons are only relevant for symbolic interpretation in their capacity of being on (1) or off (0) within a digital vector representing a set of (neuron) cells.
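
    The following sketch (in Python, a language and names chosen here for illustration only, not part of the original apparatus) renders this counter-plus-filter interpretation of the formal neuron:

        # A minimal sketch of the formal neuron described above: a counter
        # coupled with a discrete non-linear filter. Names and values are
        # hypothetical.

        class FormalNeuron:
            def __init__(self, threshold: float = 1.0):
                self.threshold = threshold
                self.counter = 0.0   # continuous, over the 0-to-threshold range

            def receive(self, message: float, synaptic_weight: float) -> int:
                """Accumulate the weighted message; return the digital state."""
                self.counter += message * synaptic_weight
                if self.counter >= self.threshold:
                    self.counter = 0.0   # relax back to a zero value
                    return 1             # the « unary » message down the axon
                return 0

        neuron = FormalNeuron()
        print(neuron.receive(1.0, 0.6))   # 0 : still below threshold
        print(neuron.receive(1.0, 0.6))   # 1 : threshold attained, firing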

    Whether biological neural networks do indeed work in the manner suggested by formal neural networks is beyond the scope of this paper. What is here of concern is whether or not a formal neural network is mappable on a biological one, and to this particular question, the answer is an unqualified « Yes ».

    If one turns now to traditional semantic networks, the answer is quite different. Briefly said, in a semantic network, concepts are attached to nodes and relations attached to arcs (in Sowa's variant of semantic networks, conceptual graphs, both concepts and relations are attached to nodes while arcs symbolise pure « void » linkages [Sowa 1984]). The dynamics of the semantic network consists of a serial (sequential) « reading » of a particular path walked through the directed and labelled graph. Thus if a node labelled « boy » is connected to a node labelled « girl » through a directed arc labelled « meets », a path walked along them leads to the immediate symbolic reading « boy meets girl ».

    To be fair to their proponents, no suggestion was ever made that semantic networks might be amenable to a direct neurobiological interpretation. But let us develop the examination in the perspective opened up by two reasonable hypotheses,

    1) the « evolution has already done it » principle that in the case of representation of conceptual knowledge the neurobiological option is quasi-optimal,

    2) the reduction principle according to which modes of representation of conceptual knowledge will sooner or later need to be translatable into neurobiological terms.

    In the case of a semantic network, a straightforward neurobiological reading would be, as for a formal neural network, that the nodes of the graph represent the neuron body. Activation through firing would mean that the concept tagged to a node is activated (read out) while an activating signal is transmitted down the ramifications of the axon to the synapses. Some competition principle would then decide which ramification is « winning » over the others. The activated synapse corresponding to the winning ramification then means that a particular relation is being read out. Finally it is the label attached to the neuron’s body of the cell connected with the « winning » synapse that is read out. Thus « boy » as the interpretation of the body of cell 1, « meets » as the interpretation of the inter-cell 1-cell 2 synapse, and « girl » as the interpretation of the body of cell 2.

    Mapping a classical semantic network in such a manner on a biological neural network would require an incredible amount of pruning and cutting to size which no theoretical argument seems to justify. Furthermore some features of the inner design and functioning of semantic networks do discourage such a painstaking reshaping. Among the major ones are:

    1) the functional distinction between « actual » and « token » nodes.

    Semantic networks force one to distinguish between nodes which actuate a particular concept within their own definition sub-network and ones playing only a representational role in some other words’ definitional sub-networks.

    Thus in a Quillian-type semantic network:

    « ... a token node is used simply as part of a type node definition... if (...) links existed between the only representation of each word in a system it would be impossible to follow any single definition since they would all involve links through the same words. Consequently, the definition of the type ‘plant’ is a collection of nodes (e.g. ‘live’) with links between them, where the nodes are copies of the node when it itself is defined. ... These copies of defined nodes are termed ‘tokens’. In general for each type node there will be many token nodes scattered throughout the model » (MacRandal 1988: 50).

    2) the damaging lack of redundancy in concept instantiation. As reviewers have noted:

    « ... to represent the sentence ‘John picked up a book and Mary threw a book at John’, there must be a node representing each of the books » (Conway & Wilson 1988: 132).

    3) the representational confusion that ensues in semantic networks whenever coming to terms with redundancy can no longer be avoided:

    « For example, the system has to be able to deal with sentences such as ‘Every boy loves his dog’ where ‘dog’ is a definite variable entity whose instantiation exists but is different for each instantiation of ‘boy’, and such as ‘Every boy needs a dog’, where the node ‘dog’ is an indefinite variable entity » (MacRandal 1988: 65).

    In fact, the graph representation of classical semantic networks was never intended to favour mapping on a biological neural network ; it was required to represent an entirely distinct type of object - something like an ideal type of dictionary - the question of neurobiological plausibility remaining untackled. Any attempt like the one above to map the concept of a semantic network on that of a neural network is bound therefore to remain unsuccessful. The purpose of the attempt made here was to test the neurobiological plausibility of classical semantic networks. The answer we suggest is that this plausibility is nil.



  3. Arguments in favour of a distributed representation of conceptual knowledge
    The points just made on the a priori difficulty of mapping a classical semantic network on a neural network have stressed the crucial importance of some element of redundancy in the architecture of whatever system is used. The ad hoc nature of such « tricks » as resorting to token nodes to ensure redundancy however automatically precludes the representation from having any neurobiological plausibility.

    A common observation from neurobiology is that despite « natural » neuron death within the human nervous system, knowledge of concepts is not impaired. Even in the case of dramatic cerebral lesions indeed, it is only speed in recovery that seems to suffer, not conceptual knowledge as such. This fact forcefully suggests the presence of some amount of redundancy in storage.

    But further arguments can be gathered in favour of redundancy. For instance, non-pathological human speech is prone to mishaps, « slips », of a variety of sorts. Slips result from some kind of « derailing » in the retrieval of stored knowledge, but even such derailing follows its own rules. Here is for instance a type of derailing that does not occur even in the most dramatic pathological cases:

    * « When in Egypt I visited the cones, ... hum I mean the cylinders, no the pyramids »,
    or
    * « Then Eve handed Adam an IBM, ... no, a Compaq, I mean an apple ».
    (See for further developments on this subject: Jorion 1990, chapter 10).

    What such fictitious examples seem to suggest is the relative insulation in conceptual knowledge storage of what a dictionary would distinguish as alternative usages of a particular word. Thus the necessity for redundancy in inscription along different linguistic usages of the same concept.

    Delocalization is an issue distinct from redundancy. One may want, as Sigmund Freud did, to see memory traces delocalized:

    « ... ideas, thoughts and psychical structures in general must never be regarded as localized in organic elements of the nervous system but rather, as one might say, between them, where resistances and facilitations (Bahnungen) provide the corresponding correlates. Everything that can be an object of our internal perception is virtual, like the image produced in a telescope by the passage of light-rays » (Freud [1900] 1954: 611).

    But there does not seem to be any advantage attached to delocalization as such. One of its inconveniences is that it forbids in particular the quite practical mode of representation for symbolic material constituted of labels attached to either nodes or arcs, requiring instead that these be painfully reconstructed from morpho-semantic elements through what Arbib aptly calls « co-operative computation » (Arbib 1987: 95).

    Holographic distribution, where each concept is encoded in the full set of the system's cells - such as displayed by multilayer perceptrons - constitutes a straightforward way of ensuring both redundancy and delocalization of representation. But it may very well amount to an overshoot as far as redundancy is concerned, especially when it comes to storing conceptual knowledge. The point has been stressed by Feldman:

    « Suppose that instead of one unit per concept, the system dedicated three or five, distributed somewhat physically. All the theories and experiments would look the same as in the one-unit case, but the redundancy would solve the problem of neuron death. Although the number of neurons dying is large, the fraction is quite small (± 10⁻⁶), so the probability of losing two of the representatives of the concept in a lifetime is quite low (± 10⁻⁷) (...) Whereas the purely punctate view is insupportable, there is no numerical problem with a theory that has each concept represented by the activity of a few units » (Feldman 1989: 75, 77).



  4. An alternative neural network: the P-Graph
    In fact, there exists somewhere mid-way within the range defined by the (classical) semantic network at one end and the (formal) neural network at the other, a structure which allows redundancy in the representation of concepts, displays perfect mappability on a biological neural network and presents, when functioning, interesting emergence phenomena.

    In an alternative to a classical semantic network, concepts would be attached to arcs and relations to nodes. Thus instead of having a semantic network as in figure 1,

    <Figure 1>

    one would have as in figure 2:

    <Figure 2>

    Such transposing may seem straightforward but it is not, as there is more than one way of transposing the nodes of a graph into arcs and arcs into nodes, i.e. of obtaining the dual of a graph.

    But let us assume for the time being that this operation of obtaining the dual of a graph is unproblematic. What possible translation is there of such a dual semantic network in terms of a biological neural network ? In this particular instance, « boy » would be attached to a ramification of the axon or to its end-synapse, « meets » to the cell body of the connected neuron and « girl » to a ramification of its axon.

    At first glance the dual semantic network does not seem to offer any impressive advantage over the traditional semantic network representation. It does however in terms of neurobiological plausibility, and in more than one way. Let us consider why by developing a few illustrations.

    A P-Graph is thus the P-Dual of a particular type of graph, and the alternative neural network here proposed under the name memory network can be briefly characterised as the P-Dual of a classical semantic network. On the figures illustrating the examples, neuron mappability will be emphasised through a slightly modified representation of a directed graph: instead of using as its « building blocks » either nodes or arcs, « neurons » will be used - a « neuron » being composed in this instance of a node and a set of outward-branching arcs. (This convention is of course precisely that holding in the visualisation of « formal » neural networks). To emphasise a biological neuron interpretation of the figures, no arrow is drawn on the arcs, and diverging arcs from the same node depart somewhere down a common stem suggesting the ramifications of the axon, each ending with a « synapse ». Figure 3 illustrates this clearly.

    <Figure 3>

    Figures 1 and 2 display the straightforward construction of the P-Dual of a simple semantic network containing only two concepts: « Rex » and « dog ». Let us now add to the picture the new concept « pet ». Figures 4 a and b reveal that there is still no special difficulty in transposing from template to P-Graph.

    <Figure 4>

    But with more sophisticated cases a specific transposing method becomes indispensable. This is provided by the auxiliary use of an adjacency matrix of the template graph. The method is simple: a double-entry table is constructed where the nodes of the template graph are located according to their position between arcs. The matrix serves in a second step as a guide for drawing the P-Graph.

    Let us specify now that in addition to being a pet, a dog is also a mammal. Figure 5 shows how this is represented in a classical semantic network.

    <Figure 5>

    Here is the adjacency matrix corresponding to figure 5:

        a   b   c   d   e   f
    a       R
    b           d       d
    c               p
    d
    e                       m
    f

    (R = Rex, d = dog, p = pet, m = mammal)

    One should note that if the rule for building the adjacency matrix is indeed that of locating a node between arcs, some auxiliary arcs (a, d and f) are required lest « Rex », « pet » and « mammal » be absent from the matrix (what this constraint of no isolated node in the template graph expresses is in fact the condition of « neuro-mappability »).

    One is now in a position to construct the P-Graph by assigning to nodes the names of the former arcs, and assigning to the new arcs the labels of the former nodes (the number of nodes in this particular type of dual is the same as the number of arcs in the template). One proceeds in the following manner: having posited the nodes a, b, c, d, e and f, the arcs existing between them are drawn as instructed by the adjacency matrix. For instance, there is now an arc « dog » between b and c and another arc « dog » between b and e, etc. Figure 6 shows the resulting graph.

    <Figure 6>
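
    For readers who prefer an algorithmic statement of the construction, here is a minimal sketch in Python (the encoding of the template graph and all names are hypothetical): for every pair of arcs such that the head of the one is the tail of the other, the node lying between them yields a labelled arc of the P-Graph.

        # Sketch of the P-Dual construction: template arcs are given as
        # (tail_node, head_node) pairs; auxiliary arcs must already ensure
        # that no node is isolated.

        def p_dual(template_arcs):
            """Return the P-Graph arcs as (from_arc, to_arc, label) triples."""
            p_arcs = []
            for i, (_, head_i) in template_arcs.items():
                for j, (tail_j, _) in template_arcs.items():
                    if head_i == tail_j:   # a node sits between arcs i and j
                        p_arcs.append((i, j, head_i))
            return p_arcs

        # The template of figure 5, with auxiliary arcs a, d and f:
        template = {
            'a': ('start', 'Rex'),    # auxiliary arc leading to « Rex »
            'b': ('Rex', 'dog'),
            'c': ('dog', 'pet'),
            'd': ('pet', 'end1'),     # auxiliary arc leaving « pet »
            'e': ('dog', 'mammal'),
            'f': ('mammal', 'end2'),  # auxiliary arc leaving « mammal »
        }
        print(p_dual(template))
        # [('a', 'b', 'Rex'), ('b', 'c', 'dog'), ('b', 'e', 'dog'),
        #  ('c', 'd', 'pet'), ('e', 'f', 'mammal')]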

    If one wants to examine what has happened to the P-Graph with the addition of « mammal », one can compare figure 4 b) and figure 6. A new neuron has shown up to represent « mammal », and « dog » has branched: a ramification has been shot towards « mammal ».

    Now let us introduce another dog, « Lassie », in the picture. Figure 7 shows first the classical semantic network representation.

    <Figure 7>

    And here is the adjacency matrix:

        a   b   c   d   e   f   g   h
    a       R
    b           d       d
    c               p
    d
    e                       m
    f
    g                               L
    h           d       d

    (R = Rex, d = dog, p = pet, m = mammal, L = Lassie)

    Let us build the P-Graph accordingly, i.e. as shown in figure 8.

    <Figure 8>

    A « Lassie » neuron has now shown up, and from it has sprung a new « dog » neuron which has itself shot two ramifications towards the connections held by the original « dog » neuron. So that there are now altogether four « dog » synapses belonging to two distinct « dog » neurons.

    Now, a final illustration. Let us forget about Lassie and return to where things stood at the earlier stage, when our only elements were « Rex », « dog », « pet » and « mammal ». And let us add « Master », whereby we are introducing a new relation « has_a ». Now a pet has a Master, but a Master also has a pet. Hence the classical semantic network representation as in figure 9.

    <Figure 9>

    And the adjacency matrix that ensues:

        a   b   c   d   e   f   g   h
    a       R
    b           d       d       d
    c               p
    d
    e                       m
    f
    g                               M
    h           d       d       d

    (R = Rex, d = dog, p = pet, m = mammal, M = Master)

    Some locations are now much trickier. Notice for instance, « dog » between b and g and between h and g, etc. Figure 10 b) shows the resulting configuration.

    <Figure 10>

    Compare with figure 10 a) (corresponding to figure 6) to see what the irruption of « Master » has meant for the P-Dual. Firstly, a new « Master » neuron has shown up. Secondly, the original « dog » neuron has shot a third ramification towards this new neuron. Thirdly, an entirely new « dog » neuron has shown up, duplicating the first one - but not perfectly: only as far as synapses are concerned. Fourthly, the new « dog » neuron has established an odd type of symmetrical connection with the « Master » neuron ; a cycle has indeed appeared in the network between a « dog » neuron and a « Master » neuron: one of the synapses of « dog » connects with the « Master » cell body while one of the synapses of « Master » connects with the « dog » cell body.

    One could continue with illustrations of this kind, but the reader is now sufficiently familiar with the method to try further examples personally.



  5. The semantic interpretation of a P-Graph
    When describing the construction rules for a P-Graph we imposed a condition on the template graph: that no node of it should be isolated, this to ensure that it appears on the adjacency matrix as (at least) one instance and that it transposes as (at least) one arc of the P-Graph. This condition implied of course the auxiliary construction of « token » arcs. It is essential that the condition should be met for there to be no loss of information in the transposition from the template graph to the P-Graph. But the same condition is also essential for « neural network mappability »: as nodes in the template transpose into axon ramifications and synapses in the P-Graph, the presence of an arc leading to each of them ensures the presence of a « cell body » for every « neuron » in the P-Graph.

    a) synapses

    As far as the synapses of a particular neuron are concerned, a semantic interpretation of a P-Graph is unproblematic: each synapse is a representative of the concept embodied by the neuron. But no neuron is necessarily the single representative of a particular concept, as other neurons may represent it as well separately. This is in fact the case as soon as new instances of a same type occur in the network ; we saw for example that the introduction of a second dog, « Lassie », led to a new « dog » neuron being instantiated.

    Redundancy of conceptual representation is thus doubly guaranteed in a P-Graph: redundancy of the concept is firstly ensured by the various synapses of the same neuron, then secondly by the n-plicity of neurons representing it for different instantiations of a same type.

    Two classical difficulties of semantic networks in this respect are thus automatically resolved. Firstly, a solution to the neuron-death problem mentioned by Feldman is in-built in the P-Graph: very few concepts will be represented by a single neuron. Secondly, there is no necessity for any of the token-nodes of Quillian-type semantic networks: neurons ramify as much as required and quite automatically. We saw for example that a new instance of « dog » connects automatically with each strategically relevant piece of information about dogs already present in the network ; the introduction of « Lassie » for instance generated a new « dog » neuron which automatically connected with both « pets » and « mammals ».

    b) neuron nucleus

    Let us turn now to the neuron body. Until we added « Master » with its reciprocal relations « Master has dog » and « dog has Master », neuron bodies were open to only a single interpretation: the classical « is_a » relationship of semantic networks. Should it be the case that there is only one interpretation for the neuron body, there would be no necessity whatever for attaching any labels to the nodes of a P-Graph: each node would be read out as « is_a » and that would be it. Things changed when we introduced « Master » in the network: from then on, a node had to be interpreted as meaning either « is_a » or « has_a », necessitating therefore appropriate labelling of nodes.

    But is it really the case that appropriate labelling is required from then on ? Remember what happened with the « has_a » relation: a cycle intervened between the related concepts - which does not exist with the « is_a » relationship - so that labelling the nodes could easily be replaced by a decoding of the local configuration. One could issue a rule of the type « if there is an immediate cycle between two neurons, read the node as meaning 'has_a', else read it as 'is_a' ».
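
    Such a rule is easily stated algorithmically ; the following Python fragment is a hypothetical sketch, not a description of ANELLA's actual implementation:

        # Reading a node out of the local configuration rather than out of
        # a label: an immediate cycle is decoded as « has_a », else « is_a ».

        def read_node(connections, tail, head):
            """connections holds the (from, to) pairs of the P-Graph."""
            if (head, tail) in connections:   # reciprocal connection: a cycle
                return "has_a"
            return "is_a"

        connections = {('g', 'h'), ('h', 'g'), ('b', 'c')}
        print(read_node(connections, 'g', 'h'))   # has_a (cycle present)
        print(read_node(connections, 'b', 'c'))   # is_a  (no reciprocal arc)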

    Would the « is_a » / « has_a » distinction be sufficient to organise a full network ? Hardly: only for elementary relations. Aristotle claimed that all relations could be expressed under the form « X is Y », but this only goes some distance. In the ANELLA system a module has been tested which takes any « dictionary-type » definition and analyses it into a concatenation of « is_a » and « has_a » relations. For example,

    « Pyramid is_a monument »

    « Pyramid is_a big »

    « Egypt has_a pharaoh » « pharaoh has_a tomb » « tomb is_a pyramid »,

    etc.

    This works satisfactorily for dictionary definitions but would be too primitive a tool for immediate descriptive usage such as in « Look ! a plane crashed on the building across the street ! ». But it has the considerable merit of suggesting an entirely new approach to network interpretation, i.e. reading out a node, not as a label appended to it, but as the expression of the local topology of the network.



  6. Topological information in a neural network
    Let us elaborate on this idea. Formal neural networks rest on the supposition that a biological neural network is an essentially unstructured net: some arbitrarily restrictive constraints are imposed on a network where each cell could in principle be connected with every other, and the absence of a particular link only appears under the form of a synaptic weight with value zero (3). Insofar as such a network supports any conceptual knowledge, it does so digitally: as the outcome in the output layer of the net of some processing through the hidden units.

    In a formal neural network, the information available at synaptic level is a weight determining what flow will enter the connected cell. The information available at cell-body level - whether one is dealing with input, output or hidden units - is a mere 0 / 1 digital value. The only thing for the world to see are ON or OFF cells, which need for interpretation to be envisaged as vectors to be decoded (the « meaning » of hidden cells is allegedly irrelevant, making them equivalent to interphenomena in physics: events taking place within the model but which are not « meaningful »).

    All links being potentially live in a neural network, there is no notion that any relevant information could lie in the topology itself of the net. A P-Graph interpretation of a neural network for conceptual knowledge representation would suggest on the contrary that some important information could be encoded in the topology.

    Furthermore, a number of semantic issues suggest the importance of topological structuring of information storage in the brain. These are,

    1) analogy

    2) synonymy

    3) and figured speech.

    Analogical reasoning supposes the notion of a topological configuration of a language's semantics. As shown by Holland, Holyoak, Nisbett & Thagard (1986), analogy establishes morphisms between the semantic connections existing between words. Therefore the development of analogical thought in an intelligent system implies that knowledge is stored under a form such that the notion of a morphism is meaningful.

    The use of figures of speech implies the notion of a shift from literal to figured meaning, i.e. from one word to another. This can be easily achieved under network representation. It is possible to develop a theory of various figures of speech in terms of increasing distance of displacement on a network from literal to figured meaning (distance increases from synecdoche to metonymy, and from metonymy to metaphor).

    And as far as synonymy is concerned, synonyms are easily located as a strict isomorphism existing in the network for specific usages of two different words (two words are never synonyms for every one of their possible usages).



  7. The dynamics of a P-Graph neural network
    In order for a P-Graph neural network to work, a dynamics operating on it needs to be defined. What a plausible dynamics would look like will be discussed now.

    Contrary to classical neural networks, which encode information at the subsymbolic level, the memory network encodes complete « signifiers »: words envisaged as an acoustic or graphic imprint, i.e. information represented at the symbolic level. The subsymbolic step is here bypassed: the network's arcs are labelled with meaningful units that only need to be concatenated, i.e. read out serially in sentence form (syntagm), for semantic meaning to be generated. A methodological choice needs therefore to be made: either the activated material is read out « as it comes », in the process of being serially produced, or it is provisionally stacked in parallel, and is read out serially later on in a ranking which may be different from that of its initial elicitation.

    Is it possible to suppose parallel processing on a memory network structured as a P-Graph ? In order to answer this question let us imagine the following dynamics for a memory network. A syntagm is submitted to the system, triggering a spreading activation process. After a few steps, a considerable number of signifiers in the network have been activated. Only a stringent competition principle could then decide which signifiers should be retained to contribute to the output. But what competitive principle to choose ? Would a number of the first ranked words be retained ? But if so, would they be the first two or ... the first ten ? And in what order would they be drawn for a syntagm to be constructed ? It is not clear either what the competition would exactly be about.

    So let us consider an alternative type of processing: spreading activation but with competition taking place at every step. Here is what this means. As before, a syntagm is submitted to the network and activates a number of neurons corresponding to the signifiers composing it. Let us suppose for the sake of exposition that for each signifier, only one neuron complies, and let us focus on the descent of one particular signifier. Activation of a neuron means that a signal will travel down this neuron's axon, spread through its ramifications and reach the end-synapses. This time a single-winner competition ensues and only one of the connected neurons is activated. The latter transmits in turn activation down its own axon and the process is repeated until a stable state is reached within the system (more on this below).

    The process here is of a hybrid nature: every time a neuron is activated, some parallel activation is taking place within its own axial ramifications, but single-winner competition implies that each step of the firing is distinctly located, and the overall process is clearly serial in terms of neuron activation.

    As it is signifiers which are attached to a neuron's axial ramifications, it is possible to envisage that the labels are read out each time one is encountered ; a concatenated series of signifiers being thus produced on the spot (the concept of the memory network earlier described ensuring that such a concatenation amounts to a proper syntagm). An alternative within the same approach could be that each signifier encountered is recorded but not read out immediately: being provisionally stacked until a stable state is reached in the system. Only then would this parallel type of material be serially processed with the help of some particular reading method (4).

    Let us summarise the competitive principle thus attained. We are dealing in the case of a memory network with a network whose arcs are labelled with signifiers and where the topological information embedded in the net encodes the relationship existing between the connected signifiers - rather, between the concepts represented by these signifiers. When such a network is walked, a number of bifurcations are encountered. A single-winner competition principle intervenes at bifurcations as a locally applicable decision theory. Which means that information storage is taking place at three different levels:

    1) weights, which provide a decision-principle in the competition taking place at each layer being passed through,

    2) topology, providing syntactic and logical information,

    3) labels, holding semantic units.

    A dynamics of the type just described has been implemented in ANELLA in a manner which will be described now.

    As we saw above, although there may be more than one neuron to represent a particular signifier, a single neuron never represents more than one signifier, i.e. to each of its axial ramifications is attached an identical label. Let us suppose now that to each synapse that terminates an axial ramification is attached a value which will be called affect value. The name affect value has been chosen following the double hypothesis that a memory trace will only be recorded if the limbic system is currently active (Gloor & al. 1982), i.e. if the person is emotionally aroused, and, symmetrically, that a speaker utters the words that he « feels like saying », i.e. which he is emotionally driven to utter. Of a nature comparable to that of a Hebbian weight, an affect value has, for our purposes here, an arbitrary range.

    Let us express now the single-winner competition principle operating in an activated neuron in terms of affect values. The activation spreading through the axial ramifications of a particular neuron is transmitted to a single other neuron: that connected to the synapse holding the lowest affect value.

    With this principle operating, the dynamics of the system amount to a gradient descent on the net. One can now represent the neural network as deposited on a landscape where synaptic affect values define relative altitude between connected neuro ns. Syntagm generation then equates with a down-slope canalised activation (the path of a rolling ball) from an excited signifier down towards a local minimum (potential well) on the landscape.
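
    A minimal sketch of this descent (in Python ; the data structure and values are hypothetical):

        # Down-slope canalised activation: at each neuron, the synapse
        # holding the lowest affect value wins, until a potential well
        # (a neuron with no outgoing synapses) is reached.

        def generate_syntagm(network, start, max_steps=20):
            """network maps a neuron to a list of (affect, label, next)."""
            path, current = [], start
            for _ in range(max_steps):
                synapses = network.get(current, [])
                if not synapses:                       # stable state reached
                    break
                affect, label, nxt = min(synapses)     # single-winner competition
                path.append(label)
                current = nxt
            return " ".join(path)

        net = {
            'n1': [(0.2, 'dog', 'n2'), (0.7, 'dog', 'n3')],
            'n2': [(0.5, 'pet', 'n4')],
        }
        print(generate_syntagm(net, 'n1'))   # -> dog pet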

    If affect values remain constant, the system operates in a deterministic manner: each time the same words are presented to the system, the « ball » rolls down the same « valley » of the landscape and the same syntagm is generated.

    In ANELLA however the landscape has been made to evolve according to a double principle which will now be described.

    When the system is logged on, the affect value attached to a particular synapse splits into a relevance value and a cognitive value. From then on and until the system is logged off, every time a synapse is activated, the relevance value is increased for short-term relevance purposes and the cognitive value is decreased for long-term cognitive purposes.

    The increase of the affect value acting as relevance value means the following in terms of our dynamical landscape: during a particular session - a « conversation » -, as soon as a reply has been generated, the entire profile of the « valley » being travelled gets elevated by the same increment in altitude. If bifurcations in the network exist for the signifiers concerned, this elevation will prevent the same « valley » from being followed a second time round. The effective implication is that, should the same syntagm be offered to the system once again, it will not repeat the answer given the first time, but will generate a new reply.

    Or to better characterise the mechanism at work: the current relevance of the answer first offered has decreased because it has been uttered. The system manages to maximise the relevance of its second reply by offering a different sentence - avoiding in this way the dumb behaviour that would consist of repeating itself. Affect values are clearly acting here as relevance values.

    Conversely, the decrease of the affect values acting as cognitive values means that the more often syntagms are uttered, the more cognitively salient they become. All along a conversation, synapses passed through have seen their cognitive value decreased for long-term purposes (the process being of no current import). Once the conversation is finished, all affect values are updated with these new cognitive values. And when a new session begins these act as the new initial affect values. A practical interpretation is that the associations made in previous conversations have « facilitated » the paths being used, deepening those valleys which have been travelled. This process, first described by Alexander Bain (5), corresponds strictly to what is nowadays called « Hebbian reinforcement » ; it holds exactly the same cognitive significance (6).
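
    The double principle may be summarised in the following sketch (names and increments hypothetical):

        # The affect value splits at log-on into a short-term relevance value
        # and a long-term cognitive value, updated in opposite directions.

        class Synapse:
            def __init__(self, affect):
                self.affect = affect

            def log_on(self):
                self.relevance = self.affect   # short-term component
                self.cognitive = self.affect   # long-term component

            def fire(self, increment=0.1):
                self.relevance += increment    # elevate the valley: no repetition
                self.cognitive -= increment    # deepen the valley: facilitation

            def log_off(self):
                # Hebbian-style consolidation: the cognitive value becomes
                # the initial affect value of the next session.
                self.affect = self.cognitive

        s = Synapse(affect=0.5)
        s.log_on()
        s.fire()       # relevance 0.6, cognitive 0.4
        s.log_off()    # next session starts with affect = 0.4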

    Thus in terms of overall relevance and as far as the evolutive functioning of the system is concerned, subjects evoked during a session progressively lose relevance within the session in question, but gain relevance in terms of the long-term functioning of the system.



  8. Learning in a P-Graph neural network
    From what has been described earlier, the obvious interpretation of the learning process in a memory network type of neural network is that each time a new signifier is added to the network, a number of arcs (determined by the P-Graph algorithm) are created, representing a number of distinct neurons. As such however, the growth process of a P-Graph cannot reflect the actual learning process taking place in the cerebral cortex.

    Let us assume that the initial state of the cerebral cortex (at the birth of the individual) is that supposed by Hebb (see footnote 3), i.e. of being mappable on a quasi-complete graph, each neuron being connected with (nearly) every other neuron. What our hypothesis suggests is that the structuring of the network for memory storage purposes implies the dramatic destruction of a neuron's existing connections, the remaining ones becoming significant precisely because of their drastically reduced number.

    A memory network will therefore be constituted of two parts: a « virgin », unemployed part composed of quasi-completely connected neurons, and another part, active for memory storage, composed of sparsely connected neurons. In terms of a memory network, learning a new signifier would then mean including a number of such « virgin » quasi-complete neurons in the active memory network, attaching the new signifier's label to their axial ramifications, and making them significant by having all their connections removed but those which have become meaningful through labelling.

    An existing neuron would therefore intervene actively in memory storage as soon as it has become structured, i.e. as soon as all its connections have been severed apart from those which encode from then on a specific type of information. Viewed in this way, learning would not consist of the creation of new neurons but of the colonisation of existing but « virgin » neurons belonging to an unemployed part of the cerebral cortex (7).

    The only major constraint on such « pruning » for learning purposes would be that the network remains a connected graph (that there remains at least one path connecting each vertex to every other vertex), that is, that there are no disjoint sub-graphs. The smaller the number of arcs, the more significant is the information contained in the network, as the reduced topology becomes concomitantly more significant. The issue is parallel to that of percolation, but so to speak in a reversed manner, i.e. pruning should proceed as far as it can but not beyond the percolation threshold (8).
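
    As an illustration of the constraint, here is a sketch in Python (using the networkx library for the connectivity test ; the function and its use are hypothetical): an arc is severed only if the graph remains connected afterwards.

        # Pruning under the connectedness constraint.
        import networkx as nx

        def prune(graph, u, v):
            """Remove the edge (u, v) unless this would disconnect the graph."""
            graph.remove_edge(u, v)
            if nx.is_connected(graph):
                return True
            graph.add_edge(u, v)   # beyond the percolation threshold: restore
            return False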



  9. Emergent logic in a memory network
    First Order Logic is - under a variety of guises - a high-level description of compatibility-maintaining principles active in the production of text. It is unlikely to be revealing about the low-level (word and sentence generation) operating mechanisms (9).

    The reasons why are the following:

    1) logic is a formalisation resulting from successive steps of abstraction. At each step some aspects of the empirical material (spontaneous human discourse) being modelled have been lost (10).

    2) logic is only interested in the role played by one type of words, those that Ryle calls « topic-neutral » (11) ; it has nothing to say about compatibility as far as the « topic-committed » words composing the content of any piece of speech or text are concerned.

    3) formal logic results from a further « forcing » of First Order Logic as a description of spontaneous usage onto a slightly inappropriate mathematical structure: a Boolean algebra.

    Smolensky observes on this subject that

    « It is interesting to note that advocates of logic in AI have, for some time, been trying to evade the brittleness of hard constraints by developing logics, such as nonmonotonic logics, where all of the rules are essentially used together to make inferences, and not separately.

    In the symbolic paradigm, constraints are typically hard, inference is logical, and processing can therefore be serial (...). In the subsymbolic paradigm, constraints are soft, inference is statistical, and therefore it is most natural to use parallel implementations of inference. » (Smolensky 1989: 58).

    This suggests that logic belongs - along with syntax - to the category of rule-like behaviour that will emerge from self-organisation in a connectionist system even if it operates at the symbolic level.

    Experimentation with ANELLA has led the author to believe that FOL is emergent in a memory network. Indeed, whenever a memory network is walked through, a type of logical calculus is automatically operated.

    Let us remind the reader that ANELLA associates concepts through either the asymmetrical « is_a » or the symmetrical « has_a » relationships.

    Every symmetrical connection calls up the « has_a » pseudo-copula. Thus,

    « master has_a dog », and

    « dog has_a master ».

    Conversely, each asymmetrical connection calls up the copula « is_a ». But the asymmetry of the relationship implies that a different quantifier applies depending on the down- or upstream direction of the walk. Thus, between « lion » and « mammal », obtains either,

    « (Every) lion is_a mammal », or

    « (Some) mammal is_a lion ».

    ANELLA has been equipped with a « motor » for distinguishing various figure types so as to deliver a valid type of inference from the serial chaining of premises. This « motor », as will be obvious, has nothing much to do with First Order Logic. The four elementary figures are the following: is_a / is_a, is_a / has_a, has_a / is_a, has_a / has_a. Here are a few actual examples from ANELLA.

    1)

    « Polly is a parrot »

    « a parrot is a bird »

    therefore

    « Polly is a bird ».

    2)

    « a parrot is a bird »

    « a bird has wings »

    therefore

    « a parrot has wings ».

    3)

    « Rex has fleas »

    « fleas is an insect »

    therefore

    « Rex's insect is fleas ».

    4)

    « Oscar has a cat »

    « a cat has whiskers »

    therefore

    « I can't say more ».

    The first two figures are classical: the first one acknowledges the transitivity of inclusion - which may indeed proceed for ever ; the second figure enacts the inheritance of properties. The two principles can combine whenever ANELLA is in a « database mode » (actual example):

    User: « Who has wings ? »

    ANELLA:

    « a bird has wings »

    « a parrot is a bird »

    « Polly is a parrot »

    therefore

    « Polly has wings ».

    The third figure displays a type of inference which does not belong to FOL ; by common assent (12), however, it has been decided to allow the system to make it. A less weird example of the type is:

    « a poppy is red »

    « red is a colour »

    therefore

    « a poppy's colour is red ».

    It has been objected to this figure that « it does not tell anything not known beforehand ». In the light of a similar comment made in the past by John Stuart Mill as his overall reproach against syllogistic inference, the figure has been kept nonetheless.

    The fourth figure is simply impossible, as the « has_a » relationship is intransitive.

    More figures obtain through « up-stream » walking through the memory network. For instance:

    « some mammal is a lion »

    « a lion has a mane »

    therefore

    « some mammal has a mane. »

    And so on.
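
    A drastically simplified sketch of such a « motor » in Python (it reproduces only the chaining of the four elementary figures, ignoring quantifiers, and is not a description of ANELLA's actual machinery):

        # The four elementary figures: is_a / is_a, is_a / has_a,
        # has_a / is_a, has_a / has_a.

        def infer(premise1, premise2):
            """Premises are (subject, copula, attribute) triples that chain."""
            s1, c1, a1 = premise1
            s2, c2, a2 = premise2
            assert a1 == s2, "premises must chain"
            if c1 == "is_a" and c2 == "is_a":
                return f"{s1} is_a {a2}"          # transitivity of inclusion
            if c1 == "is_a" and c2 == "has_a":
                return f"{s1} has_a {a2}"         # inheritance of properties
            if c1 == "has_a" and c2 == "is_a":
                return f"{s1}'s {a2} is_a {a1}"   # the third, non-FOL figure
            return "I can't say more"             # has_a / has_a: intransitive

        print(infer(("Polly", "is_a", "parrot"), ("parrot", "is_a", "bird")))
        # -> Polly is_a bird
        print(infer(("Rex", "has_a", "fleas"), ("fleas", "is_a", "insect")))
        # -> Rex's insect is_a fleas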

    Of course such emergent logic is far from exhausting the range covered by First Order Logic, but Johnson-Laird's work on the spontaneous use of logic by human beings (1983) strongly suggests that people who have not been specifically trained in the use of FOL hardly fare better than ANELLA in this respect.



  10. Conclusion
    Hypotheses like the one being offered here will not be compelling until experimental evidence comes forth and supports them. The alternative neural network described here under the name memory network has however specific qualities such as simplicity, aesthetic value and economy of means. By suggesting the encoding of a signifier within a collection of neurons, it presents the property of being sufficient for its purpose. Whatever views are developed nowadays about neuron columns or « neuronal groups », in the model being proposed here a single neuron suffices for the job of information storage and retrieval. The learning process forces an until then « virgin », unspecialised neuron to specialise, while at the same time providing it with a sufficient redundancy (or rather degeneracy in Edelman's terms (13)) to counter any « neural death » which might disrupt the system's functioning, and to allow distinct associative chainings to be discriminated by the different affect values they hold. The P-Graph neural network automatically provides an optimal degree of redundancy of the information stored.





1) « ... what Gerstein characterizes as « emergent » properties of the neuronal population in that they signal characteristics not found explicitly in the individual constituent neurons. » (MacGregor 1987: 140). And in Feldman's words: « ... the nature of any emergent system properties depends heavily on which concepts are explicitly represented and on the detailed structure of the representation. » (Feldman 1989: 98).

2) ANELLA was developed under an Academic Fellowship from British TELECOM in 1988.

3) This was the view underlying Hebb's approach: « Hebb (...) imagined that the intrinsic original connectivity structure of the network was of secondary importance and might for theoretical purposes be considered as random or homogeneous and that the original network might be considered akin to the tabula rasa or blank tablet of Locke. » (MacGregor 1987: 150).

4) The latter amounts to a well-known knowledge representation mode in linguistics: the structural syntactics of Lucien Tesnière (1959). His notion of the stemma is precisely that of stacked symbolic material that gets serially processed whenever a syntagm is generated. In Tesnière's view, stemmas are universal (which would fit with the universal nature of a memory network) but the serial processing method is proper to each particular natural language. This point has been developed at the theoretical level in Jorion 1996 (especially 268-270). An algorithmic treatment of this issue of a serial resolution of stacked symbolic material will be presented separately (Jorion forthcoming).

5) « Alexander Bain (1868) who represented associations of ideas by the strengths of connections between neurons that represented those ideas… » (Arbib 1987: 3).

6) « In 1949, Hebb […] argued that learning was to be explained by the formation of cell assemblies in the brain » (Arbib 1987: 5).

7) It is possible in this case to suppose that such structuring is not strictly deterministic but results from some type of Darwinian competition such as described by Edelman and co-workers (Edelman 1987 ; Finkel, Reeke & Edelman 1989). If Edelman's analysis is correct, then it may even be possible to imagine that the newly colonized neurons are actually distracted from some other function they were performing until then.

8) The ultimate means for diminishing the number of arcs in the graph is by letting the graph degenerate into a tree. This would not mean that a single completely ordered hierarchy obtains, as hierarchies defined by distinct principles can intertwine. Such a principle should not however be sought for, as the « has_a » relationship has a valuable role to play in the net. The definition of an « even number » is for instance decomposed in the following manner by a module of ANELLA: « even is_a number has_a divisor is_a two ». It is clear that the « has_a » is here highly significant and could not possibly be replaced by an « is_a » relation. The « is_a » relationship introduces however a very effective structuring principle in a net, as is revealed in contrast by « primitive mentality » where the « has_a » relationship is predominant - if not the only existing one (see on this Jorion 1989a).

9) This view is expressed by Aleksander and Burnett in Reinventing Man: « It looks rather as if decision trees, predicate calculus, etc. may only be rationalisations of information processing feats which are, in reality, achieved by quite different means - a sort of gloss which the brain produces in order to explain its own behaviour » (Aleksander & Burnett 1983: 171).

10) Ryle makes this especially clear: « The logician's 'and', 'not', 'all', 'some' and the rest are not our familiar civilian terms; they are conscript terms, in uniform and under military discipline, with memories, indeed, of their previous more free and easy civilian lives, though they are not living those lives now. Two instances are enough. If you hear on good authority that she took arsenic and fell ill you will reject the rumour that she fell ill and took arsenic. The familiar use of 'and' carries with it the temporal notion expressed by 'and subsequently' and even the causal notion expressed by 'and in consequence'. The logician's conscript 'and' does only its appointed duty - a duty in which 'she took arsenic and fell ill' is an absolute paraphrase of 'she fell ill and took arsenic' » (Ryle 1954: 117-118).

11) Thus again in Ryle's words: « We may call English expressions 'topic-neutral' if a foreigner who understood them, but only them, could get no clue at all from an English paragraph containing them what that paragraph was about » (Ryle 1954: 116). See too Jorion 1997.

12) Within the Connex Project at British Telecom.

13) « ... degeneracy is a population property (i.e. it requires variance) and that it must be distinguished from redundancy, which is used (...) to refer to the existence of repeated units or groups having identical structure and response characteristics. » (Edelman 1987: 50).





References

Aleksander, I. & Burnett, P.
1983 Reinventing Man. The Robot Becomes Reality, London : Kogan Page

Arbib, M.A.
1987 Brains, Machines, and Mathematics (2d ed.), New York - Berlin : Springer-Verlag

Churchland, P.S.
1986 Neurophilosophy, Toward a Unified Science of the Mind/Brain, Cambridge (Mass.) : MIT Press

Churchland, P.S. & Sejnowski, T.J.
1989 « Neural Representation and Neural Computation » in Nadel, Cooper, Culicover & Harnish 1989, 15-48

Conway, T. & Wilson, M.
1988 « Psychological studies of knowledge representation », in Ringland & Duce 1988, 117-160

Cottrell, G.W.
1989 A Connectionist Approach to Word Sense Disambiguation, London : Pitman / Los Altos (Cal.) : Morgan Kaufmann

Edelman, G.M.
1987 Neural Darwinism: The Theory of Neuronal Group Selection, New York : Basic Books

Feldman, J.A.
1989 « Neural Representations of Conceptual Knowledge », in Nadel, Cooper, Culicover, & Harnish 1989, 69-103

Finkel, L.H., Reeke, G.N. Jr. & Edelman, G.
1989 « A Population Approach to the Neural Basis of Perceptual Categorization », in Nadel, Cooper, Culicover & Harnish 1989, 146-179

Freud, S.
1954 (1900) The Interpretation of Dreams, transl. James Strachey, London : George Allen & Unwin

Gloor, P., Olivier, A., Quesnay, L.F., Andermann, F. & Horowitz, S.
1982 « The role of the limbic system in experimental phenomena of temporal lobe epilepsy », Annals of Neurology, 12

Holland, J.H., Holyoak, K.J., Nisbett, R.E. & Thagard, P.R.
1986 Induction. Processes of Inference, Learning, and Discovery, Cambridge (Mass.) : MIT Press

Johnson-Laird, P.N.
1983 Mental Models, Cambridge : Cambridge University Press

Jorion, P.
1988 « ANELLA: Associative Network with Emergent Logical and Learning abilities », British TELECOM, Academic Fellowship, Final Report, 37 pp.
1989 « Intelligence artificielle et mentalité primitive, Actualité de quelques concepts lévy-bruhliens », Revue philosophique, 4, 515-541
1990 Principes des systèmes intelligents, Paris : Masson
1996 « La linguistique d'Aristote », in V. Rialle & D. Fisette (eds.), Penser l’esprit: Des sciences de la cognition à une philosophie cognitive, Grenoble: Presses Universitaires de Grenoble: 261-287
1997 « Jean Pouillon et le mystère de la chambre chinoise », L’Homme, 143, 91-99
forthcoming « Syntax as critical points in a multi-dimensional semantic space »

McClelland, J.L. & Kawamoto, A.H.
1986 « Mechanisms of Sentence Processing : Assigning Roles to Constituents », in McClelland & Rumelhart 1986, 272-325

McClelland, J.L. & Rumelhart, D.E.
1986 Parallel Distributed Processing, Explorations in the Microstructure of Cognition, Volume 2 : Psychological and Biological Models, Cambridge (Mass.) : MIT Press

MacGregor, R.J.
1987 Neural and Brain Modeling, New York : Academic Press

MacRandal, D.
1988 « Semantic Networks », in Ringland & Duce 1988, 45-79

Minsky, M.L. & Papert, S.A.
1988 (1969) Perceptrons. An Introduction to Computational Geometry. Expanded Edition, Cambridge (Mass.) : MIT Press

Nadel, L., Cooper, L.A., Culicover, P. & Harnish, R.M. (eds.)
1989 Neural Connections, Mental Computation, Cambridge (Mass.) : MIT Press

Norman, D.A.
1986 « Reflections on Cognition and Parallel Distributed Processing », in McClelland & Rumelhart 1986, 531-546

Perez, J.-C.
1988 De nouvelles voies vers l'Intelligence Artificielle, Paris : Masson

Ringland, G.A. & Duce, D.A. (eds.)
1988 Approaches to Knowledge Representation, Letchworth (Herts.) : Research Studies Press

Rumelhart, D.E. & McClelland, J.L.
1986b « On Learning the Past Tenses of English Verbs », in McClelland & Rumelhart 1986, 216-271

Ryle, G.
1954 Dilemmas. The Tarner Lectures 1953, Cambridge : Cambridge University Press

Shastri, L.
1988 Semantic Networks : An Evidential Formalization and its Connectionist Realization, London : Pitman / Los Altos (Cal.) : Morgan Kaufmann

Sowa, J.F.
1984 Conceptual Structures: Information Processing in Minds and Machines, Cambridge (Mass.) : MIT Press

Stich, S.P.
1983 From Folk Psychology to Cognitive Science, Cambridge (Mass.) : MIT Press

Tesnière, L.
1966 (1959) Eléments de syntaxe structurale, Paris : Klincksieck