Latimer, C. and Gazzard, S. (1998, in press). Modelling Attentional Biases in the Perception of Geometric Forms. In T. Downs and M. Gallagher (Eds.), Proceedings of the Ninth Australian Conference on Neural Networks (ACNN'98).

Modelling Attentional Biases in the Perception of Geometric Forms.

Cyril Latimer and Scott Gazzard
Department of Psychology
University of Sydney

Attentional biases are observed in all modalities of human perception. For example, experimental participants respond more quickly to particular dimensions of stimuli and to stimuli located in particular regions of the visual and auditory fields. This paper reports a robust top-right attentional bias in the perception of simple geometric forms and explains it in terms of our long experience of reading English text from left to right, coupled with the need to shift attention upwards as we locomote through visual space. We also report attempts to reproduce the bias in a simple neural network trained to recognise features scrolling across its visual field from right to left and from top to bottom.

1. Attentional Biases
Attentional biases occur in all modalities. In audition, humans not only prefer to process higher rather than lower frequencies but also tend to process higher frequencies in the right ear [1]. In the visual domain, it is easy to ignore an irrelevant attribute if it is a colour, but not if it is a word: the Stroop effect [2]. When large letters are composed of smaller, different letters, recognition of the large letters interferes with recognition of the small letters, but not vice versa [3]. Biases for particular regions of the visual field and for particular regions within visual patterns have also been demonstrated. Eye fixations are often directed more to the top and left of a visual array, and differences in processing items above, to the right of and to the left of fixation have been observed in reading and visual form recognition. Finally, there is abundant evidence for lateral asymmetries in perception arising from the asymmetric representation of function in the cerebral hemispheres [4].
Latimer and Stevens observed superior performance in the perception of simple geometric forms when differences between a standard form and a set of comparison forms were located reliably at the top-right rather than at the top-left, bottom-right or bottom-left [5]. Samples of 14 to 20 experimental participants were shown the standard form on a computer screen, followed either by the standard again or by a comparison form, and were asked simply to judge whether the second form was the same as or different from the standard. Participants responded "Same" or "Different", and their response times were recorded to millisecond accuracy by a voice key. Forms measured 4 cm (3.06°) horizontally by 5.5 cm (4.2°) vertically. Figure 1 shows the stimulus sequence with the standard form and a comparison form differing in the top-left region. Figure 2 depicts examples of sets of comparison forms differing reliably in the regions top-right, bottom-right, top-left and bottom-left.
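The form sizes are given both in centimetres and in degrees of visual angle, but the viewing distance is not stated. As a quick consistency check, the short sketch below inverts the standard visual-angle formula, θ = 2·atan(s / 2d), to recover the distance implied by each pair of figures. The function names are illustrative and not part of the original study.

```python
import math

def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Full visual angle (degrees) subtended by an object of size_cm at distance_cm."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

def implied_distance_cm(size_cm: float, angle_deg: float) -> float:
    """Viewing distance at which size_cm subtends angle_deg (inverse of the above)."""
    return size_cm / (2 * math.tan(math.radians(angle_deg) / 2))

# Form dimensions and visual angles as reported in the text.
for size, angle in [(4.0, 3.06), (5.5, 4.2)]:
    print(f"{size} cm at {angle} deg -> {implied_distance_cm(size, angle):.1f} cm")
```

Both dimensions imply a viewing distance of roughly 75 cm (about 74.9 cm and 75.0 cm), so the two reported angles are consistent with each other.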
Figure 1. The stimulus sequence with the standard form and a comparison form differing in the top-left region.

The top-right bias is robust and is in evidence even when experimental participants are informed that the reliable region of difference is in the bottom-left and are instructed to attend to the bottom-left [5]. For comparison, Figure 2 shows participants' mean judgment times and standard errors for sets of comparison forms differing reliably from the standard form in each of the four regions. Placing the reliable region of difference in the top-right of the comparison forms produces significantly faster response times. Furthermore, responses are faster when differences are located on the right rather than the left of the comparison forms.

Figure 2. Standard form and sets of comparison forms with mean judgment times and standard errors (in brackets) for each set of comparisons.
2. Explanations of the Bias
One possible explanation of the top-right superiority is the combined influence of our considerable experience of reading English from left to right and of continually adjusting our gaze upwards as we locomote through space. When reading, we are constantly set to gather information to the right of fixation, and as a result our effective attentional field may extend further to the right than to the left. Honda and Findlay note that eye movements to targets in the upper visual field are faster than those to the lower field, and suggest that visual ecology may play a part here [6]. In normal forward locomotion, the eyes often remain fixed on an object as it moves lower in the visual field. Downward tracking movements of the eyes will therefore predominate, punctuated by frequent upward refixation movements. If the latency of attentional shifts is affected by experience, the observed top-right attentional superiority may be engendered by a combination of predominantly rightward attentional shifts in reading and upward attentional shifts in forward locomotion.
Accordingly, we attempted to provide a mechanism for this explanation in the form of a simple artificial neural network trained to recognise features scrolling onto its "retina" from various directions. If such a network can be trained to develop biases like those observed in human participants, it would stand as an existence proof that human attentional biases may be determined by biased visual experience.

3. Network Configuration

The neural network was designed to demonstrate its operating principles in as simple a manner as possible, while maintaining general architectural plausibility with regard to the connection patterns found in the mammalian visual system. Every unit in the network is a linear threshold unit with continuous positive activation and binary output. Activation levels decay by ten percent over each network cycle.
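The unit dynamics described above can be sketched as follows. This is a minimal sketch under assumptions the text does not settle: the threshold value (0.5) is hypothetical, and each cycle's activation is assumed to be the decayed previous activation plus the current linear net input, floored at zero so that activation stays positive.

```python
from dataclasses import dataclass

DECAY = 0.10      # from the text: activation decays by ten percent per cycle
THRESHOLD = 0.5   # hypothetical: the text does not report a threshold value

@dataclass
class LinearThresholdUnit:
    activation: float = 0.0

    def step(self, net_input: float) -> int:
        """Advance one network cycle and return the unit's binary output."""
        # Assumed update: decayed previous activation plus this cycle's net input,
        # kept non-negative because activation is described as continuous and positive.
        self.activation = max(0.0, (1.0 - DECAY) * self.activation + net_input)
        return 1 if self.activation >= THRESHOLD else 0

unit = LinearThresholdUnit()
print([unit.step(0.2) for _ in range(4)])
```

Under these assumptions a constant input of 0.2 raises the activation to 0.2, 0.38, 0.542 and 0.6878 over four cycles, so the printed outputs are `[0, 0, 1, 1]`. Slow decay is what allows activation to accumulate in this way.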

Figure 3: Network architecture with examples of connection patterns for each type of input feature. Each output unit connects to every hidden layer unit.
The architecture is based on a perceptron, with a 9 x 9 array of units in the input layer, a 14 x 14 array of units in the hidden layer, and four output units. Connections from the input layer to the hidden layer are fixed and hard-wired so that the hidden units behave as feature detectors. Figure 3 depicts the network architecture and shows examples of the four types of feature available for detection: vertical, horizontal, left-diagonal and right-diagonal lines within a 3 x 3 grid. For each type of feature at every possible position on the input layer, there is a hidden unit dedicated to its detection. A 3 x 3 feature can occupy 7 x 7 = 49 positions on the 9 x 9 input layer, so the four feature types require 4 x 49 = 196 = 14 x 14 hidden units.
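The hard-wired hidden layer can be sketched in plain Python. Which diagonal counts as "left" and the firing rule (a detector fires when all three cells of its line are on) are assumptions made for illustration; the text states only that the fixed connections make the hidden units act as feature detectors.

```python
# Four 3 x 3 line features as (row, col) cells; the diagonal naming is assumed.
FEATURES = {
    "vertical":       [(0, 1), (1, 1), (2, 1)],
    "horizontal":     [(1, 0), (1, 1), (1, 2)],
    "left_diagonal":  [(0, 0), (1, 1), (2, 2)],
    "right_diagonal": [(0, 2), (1, 1), (2, 0)],
}
INPUT_SIZE = 9
POSITIONS = INPUT_SIZE - 3 + 1  # 7 placements per axis for a 3 x 3 feature

def hidden_layer(image):
    """Return {(feature, row, col): 0 or 1} for every hard-wired detector.

    (row, col) is the top-left corner of the detector's 3 x 3 patch. A detector
    fires when all three cells of its line are on (an assumed firing rule).
    """
    return {
        (name, r, c): int(all(image[r + dr][c + dc] for dr, dc in cells))
        for name, cells in FEATURES.items()
        for r in range(POSITIONS)
        for c in range(POSITIONS)
    }

def draw(name, row, col):
    """Blank 9 x 9 input with one feature whose 3 x 3 box starts at (row, col)."""
    image = [[0] * INPUT_SIZE for _ in range(INPUT_SIZE)]
    for dr, dc in FEATURES[name]:
        image[row + dr][col + dc] = 1
    return image

h = hidden_layer(draw("vertical", 2, 5))
print(len(h))                          # number of hidden detectors
print([k for k, v in h.items() if v])  # detectors that fire for this input
```

Running it prints 196 (the 14 x 14 hidden array) and a single firing detector, `('vertical', 2, 5)`: each hidden unit is tied to one feature type at one location, which is what makes hidden-layer detection location-specific.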
The output layer contains one unit for each type of feature. Each output unit is connected to every unit in the hidden layer, and these weights are initialised to random values between 0 and 0.1. Weights on connections to the output layer are modifiable and are updated after each cycle according to the version of the delta rule shown in Equation [a]. Because only these hidden-to-output weights are modifiable, backpropagation of error is not needed to train this network.
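$$\Delta w_{ij} = k\,\alpha_i\,(t_i - o_i)\,i_j \qquad \text{[a]}$$
In Equation [a], $w_{ij}$ is the weight on the connection from hidden unit $j$ to output unit $i$, $k$ is a learning rate constant, $\alpha_i$ is a weight limitation parameter, $t_i$ is the target activation level of output unit $i$, $o_i$ is its observed activation level, and $i_j$ is the output of hidden unit $j$. The only term added to the traditional delta rule is $\alpha_i$, which keeps weights within the range 0 to 1. Its value is:
$$\alpha_i = \begin{cases} 1 - w_{ij} & \text{if } t_i > o_i \\ w_{ij} & \text{if } t_i < o_i \\ 0 & \text{if } t_i = o_i \end{cases}$$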
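Equation [a] and its weight-limitation term translate directly into code. In this sketch the learning rate k = 0.1 is a hypothetical value, since the text does not report one, and the function names are illustrative.

```python
def alpha(t: float, o: float, w: float) -> float:
    """Weight-limitation term: scales increases by (1 - w) and decreases by w."""
    if t > o:
        return 1.0 - w
    if t < o:
        return w
    return 0.0

def delta_w(w: float, t: float, o: float, i_j: float, k: float = 0.1) -> float:
    """Equation [a]: change in the weight from hidden unit j to output unit i.

    k = 0.1 is a hypothetical learning rate, not a value from the text.
    """
    return k * alpha(t, o, w) * (t - o) * i_j

# Repeatedly rewarding one connection drives its weight toward, but not past, 1.
w = 0.05
for _ in range(100):
    w += delta_w(w, t=1.0, o=0.0, i_j=1.0)
print(round(w, 4))
```

Writing $c = k\,(t_i - o_i)\,i_j$ shows why $\alpha_i$ bounds the weights: when $t_i > o_i$ the new weight is $w_{ij}(1 - c) + c$, a mixture of $w_{ij}$ and 1, and when $t_i < o_i$ it is $w_{ij}(1 - |c|)$. Both stay within 0 to 1 provided $|c| \le 1$.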
A familiar but noteworthy property of the delta rule in Equation [a] is that the change applied to a particular connection is (among other things) proportional to the difference between an output unit's target and observed levels of activation. This point is essential for understanding how the network modifies its connections during exposure to dynamic input patterns.

4. Training procedure

The goal of network training was to associate each output unit with the presentation of one type of feature at any location on the input layer. Whereas feature detection by hidden units is location-specific, successfully trained output units are location-independent feature detectors. Training followed the standard technique of presenting input patterns and applying the delta rule on each presentation. The only departure from traditional procedure was that the input patterns were structured to simulate stimuli scrolling across the network's input layer (an analogue of a retina). Five training schedules were used to train the network on separate occasions, reflecting feature scrolling from right to left, top to bottom, left to right, bottom to top, and a combination of right-to-left and top-to-bottom scrolling. Whichever schedule was employed, each feature was presented at every possible location on the input layer a total of 10 times during training. Figure 4 shows an excerpt from the training stimulus used for the right-to-left and left-to-right schedules. Training proceeded by presenting the 9 x 9 portion of the stimulus beginning at position 1 on cycle 1, position 2 on cycle 2, position 3 on cycle 3, and so on.
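The sliding presentation can be sketched as a 9 x 9 window advancing along a long stimulus strip. The strip layout and function name are hypothetical, since the text shows only an excerpt of the stimulus. In this construction, advancing the window one column per cycle makes content enter at the right edge of the input layer and move left; the other schedules could be produced by reversing the frame order, transposing the strip, or both.

```python
def scroll_frames(strip, window=9):
    """Yield successive window x window input patterns for right-to-left scrolling.

    `strip` is a list of `window` rows, each at least `window` columns wide. On
    cycle n the input layer sees strip columns n .. n + window - 1, so content
    enters at the right edge and moves one column left per cycle.
    """
    width = len(strip[0])
    for start in range(width - window + 1):
        yield [row[start:start + window] for row in strip]

# Hypothetical strip: one active cell at column 10 of a 20-column strip.
strip = [[0] * 20 for _ in range(9)]
strip[4][10] = 1
frames = list(scroll_frames(strip))
print([f[4].index(1) for f in frames if 1 in f[4]])
```

The printed column indices run from 8 down to 0: the cell appears at the right edge of the input layer and steps one column left on each cycle.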

Figure 4: Excerpt from scrolling stimulus set used for right-to-left schedule.


The main question in training and testing the network was whether the top and right visual-hemisphere response biases recorded in human participants would emerge from training designed to simulate 1) reading text from left to right (right-to-left feature scrolling), and 2) forward locomotion through an environment rich in ground features (top-to-bottom feature scrolling). The network was tested by presenting individual features at each location on the input layer and measuring the resulting activation of that feature's target output unit. Five training runs were conducted for each training schedule, and the results below are averages across runs.
The first training schedule employed right-to-left feature scrolling, and Figure 5 presents its results. The graph shows the activation levels of target output units for features presented at each location on the input layer. Although the input layer is a 9 x 9 array, each feature occupies a 3 x 3 grid, so there are 9 - 3 + 1 = 7 possible horizontal and vertical positions for any feature. Figure 5 shows a clear trend toward higher output activations for features presented on the right side of the input layer, regardless of feature type. A t-test was used to determine whether target output responses to features presented on the left side of the input layer (columns 1-3) were significantly lower than responses to features presented on the right side (columns 5-7).

Figure 5: Results for network trained on right-to-left feature scrolling.
As Table 1 shows, the difference between these values was significant. Results were obtained and tested in the same way for training schedules 2, 3 and 4. In each case the pattern of response strengths replicated that in Figure 5, with responses highest where features entered the input layer and falling along the direction of scrolling. The results and significance tests for these schedules are also reported in Table 1.
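The text does not say which form of t-test was used. The sketch below assumes an independent-samples Student's t with pooled variance, applied to a 7 x 7 grid of mean target responses indexed from 0, so columns 1-3 and 5-7 in the text are indices 0-2 and 4-6. The grid here is synthetic and the function names are illustrative. If SciPy is available, `scipy.stats.ttest_ind` with its default `equal_var=True` computes the same statistic together with a p-value.

```python
import math
from statistics import mean, variance

def pooled_t(a, b):
    """Independent-samples Student's t statistic with pooled variance (assumed test)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))

def left_right_t(response):
    """Compare columns 1-3 (indices 0-2) with columns 5-7 (indices 4-6) of a 7 x 7 grid.

    A negative value means left-side responses are lower than right-side responses.
    """
    left = [response[r][c] for r in range(7) for c in range(0, 3)]
    right = [response[r][c] for r in range(7) for c in range(4, 7)]
    return pooled_t(left, right)

# Synthetic illustration only (not the paper's data): responses rise by 2 per column.
grid = [[40 + 2 * c for c in range(7)] for _ in range(7)]
print(round(left_right_t(grid), 2))
```

For this synthetic grid the script prints -15.49. The statistic is negative, as in the right-to-left row of Table 1, because left-side responses are the lower ones.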
The fifth training schedule combined right-to-left with top-to-bottom feature scrolling in equal proportions. Results are graphed in Figure 6. As expected, the results of this schedule are a superposition of trends observed in the individual right-to-left and top-to-bottom training schedules. The response bias is to the right and the top simultaneously.
Table 1: Mean responses to features in left, right, top and bottom hemispheres for training schedules 1-4.

Scroll           Mean target response by hemisphere      Statistics
direction        Left     Right    Top      Bottom       t (obs)   p value
right -> left    41.74    54.80    -        -            -17.11    < 0.001
left -> right    54.75    41.80    -        -             16.28    < 0.001
top -> bottom    -        -        54.69    41.58         17.40    < 0.001
bottom -> top    -        -        41.80    54.82        -16.46    < 0.001

Figure 6: Results for network trained on right-to-left and top-to-bottom feature scrolling.
To summarise, the results of training schedules 1 and 2 show that the network can form separate visual-hemisphere response biases similar to those found in human participants, as a result of training on scrolling stimuli designed to approximate reading and forward locomotion. The results of schedules 3 and 4, in which reversing the scroll direction reversed the bias, demonstrate that the biases are due to training alone and not to network architecture or initial conditions. Finally, the results of schedule 5 show that top and right biases can be formed simultaneously, giving the top-right bias observed in humans.

5. Discussion

The ability of the network to form response biases from scrolling input features arises from the interplay of three components: 1) the delta rule for modifying connection strengths; 2) units with linear activation functions and relatively slow decay rates; and 3) a hierarchy of feature-detecting units from location-specific to location-independent detectors. The delta rule is important because, as noted above, the change in connection strength depends on the difference between the target and observed levels of activation. Because unit activations decay slowly, activation in an output unit builds up over time if the feature with which it is associated is continuously present. A feature that scrolls across the input layer is continuously present as far as the location-independent output units are concerned, so activation tends to increase in the target output unit as the feature moves across the input layer. As that activation increases, the amount of learning (in terms of weight change) that can occur decreases. The rate of weight change on connections associating a feature at the beginning of its scroll across the input array is therefore higher than on connections associating it at the end of its scroll path. While connection strengths are increasing, the net result is that stronger associations are formed between the location-independent (output layer) feature detectors and the location-specific (hidden layer) feature detectors located where features first appear as they scroll across the input array.
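The argument above can be illustrated with a deliberately reduced simulation: one output unit, one feature scrolled once across seven positions, and a single active hidden detector per position. The learning rate, the target value and the single-sweep setup are hypothetical simplifications, not the reported network's parameters.

```python
DECAY = 0.10   # from the text: ten percent decay per cycle
K = 0.5        # hypothetical learning rate
TARGET = 1.0   # hypothetical target activation for the feature's output unit

def one_sweep(weights):
    """Scroll one feature across len(weights) positions, updating weights in place.

    Position j is seen on cycle j and activates only hidden detector j (output 1).
    Returns the weight change applied at each position, in scroll order.
    """
    activation = 0.0
    changes = []
    for j in range(len(weights)):
        # Output unit: decayed previous activation plus input from the active detector.
        activation = (1.0 - DECAY) * activation + weights[j]
        error = TARGET - activation
        # Weight-limitation term of Equation [a]; when error == 0 the change is zero anyway.
        alpha = (1.0 - weights[j]) if error > 0 else weights[j]
        dw = K * alpha * error * 1.0  # i_j = 1 for the single active detector
        weights[j] += dw
        changes.append(dw)
    return changes

weights = [0.05] * 7  # initial weights within the 0 to 0.1 range given in the text
changes = one_sweep(weights)
print([round(d, 3) for d in changes])
```

Because the output activation builds up along the scroll path while staying below the target, each successive weight change is smaller than the last. Under right-to-left scrolling, position 0 is the right edge of the input layer, so the strongest connections end up on the right-side detectors.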
In simple terms, stronger associations and higher activation of units in the top-right visual field would translate into faster response times to geometric forms whose reliable regions of difference from a standard form lie in the top-right corner [7]. As a result, the network stands as an existence proof of a mechanism that can acquire through experience a sufficient basis for responding relatively more quickly to differences falling within its top-right visual field. Further research is needed to determine whether or not similar principles determine the robust, visual attentional biases observed in human experimental participants.
This research was supported by an ARC Institutional Grant to the first author.
6. References
[1] D. Deutsch, "Grouping mechanisms in music," in D. Deutsch (Ed.), The Psychology of Music. New York: Academic Press, 1982.
[2] J. R. Stroop, "Studies of interference in serial verbal reactions," Journal of Experimental Psychology, vol. 18, pp. 643-662, 1935.
[3] D. Navon, "Forest before trees: The precedence of global features in visual perception," Cognitive Psychology, vol. 9, pp. 353-383, 1977.
[4] M. Corballis, The Lopsided Ape: Evolution of the Generative Mind. New York: Oxford University Press, 1991.
[5] C. R. Latimer, C. J. Stevens, L. Webber and S. Gazzard, "Attentional biases in geometric form perception," manuscript under review.
[6] H. Honda and J. M. Findlay, "Saccades to targets in three-dimensional space: Dependence of saccade latency on target location," Perception & Psychophysics, vol. 52, pp. 167-174, 1992.
[7] C. R. Latimer, W. Joung and C. J. Stevens, "Modelling symmetry detection with back-propagation networks," Spatial Vision, vol. 8, pp. 1-17, 1994.