Sensorimotor accounts of joint attention
Alexander Maye et al. (2017), Scholarpedia, 12(2):42361. doi:10.4249/scholarpedia.42361, revision #171037
Joint attention is a social-cognitive phenomenon in which two or more agents direct their attention together towards the same object. Definitions range from this rather broad conception to more specific definitions which require that, in addition, attention be directed to the same aspect of that object and that agents need to be mutually aware of their jointly attending. Joint attention is an important coordination mechanism in joint action. The capacity for engaging in joint attention, in particular in the sense of this narrower definition, is frequently taken to indicate the presence of theory of mind in the participating agents, and is also implicated in the development of theory of mind. However, the prominence of sensorimotor components in establishing and sustaining episodes of joint attention — e.g., eye and head movements, pointing and vocalizations — suggests that sensorimotor approaches to social interaction may be well-placed to account for important parts of joint attention without invoking theory-of-mind abilities, and may therefore provide a valuable alternative to approaches emphasizing theory of mind. In particular, sensorimotor accounts may explain aspects of joint attention through the dynamics and regularities in the sensorimotor processes that coordinate the activities of the agents.
Although the topic of joint attention has been studied intensively since at least the 1990s, there is still no consensus about how to define it. The term is generally taken to refer to episodes in which two individuals focus together on the same perceptual object (usually by looking together at the same object) and are mutually aware of this. Most adult humans are thoroughly familiar with this phenomenon (sometimes also called shared attention, e.g., Emery, 2000), and have an intuitive understanding of its meaning. But it has in fact proven surprisingly difficult to move from this rough working definition to a more precise and rigorous one. The range extends from rather general characterizations like “looking where someone else is looking” (Butterworth, 1991, p. 223) to descriptions of the complex mental phenomena in joint attention implying “... an understanding of the other participant not as an object or capturer of attention ..., but as a person who intentionally perceives a certain aspect of the environment that is the same as one’s own ...” (Tomasello, 1995, p. 107).
No matter which definition one prefers, a variety of questions remain open. What does joint attention consist of? How is it achieved? By what criteria is it judged? What, after all, does it mean to focus on the same object? And what does it mean to be mutually aware of this? These questions touch upon deep philosophical issues. Thus, as Seemann (2011a) observes: “Your view of joint attention depends, among other things, on your more general philosophical or psychological outlook… There really is no argument that would independently settle just what joint attention is: any answer to this question will always be informed by the conceptual framework from within which the question is addressed” (p. 183).
Nevertheless, the working definitions sketched above have been sufficiently precise to structure a rich and fruitful program of conceptual and experimental research into the components, functions and levels of joint attention, into the phylogeny and ontogeny of the phenomenon, and also into developmental disturbances of joint attention and their consequences for cognitive development. Whereas in the typical scenario for considering joint attention the individuals interact in a direct manner, modern communication technologies open up a new scenario: virtual communication channels make interactions across space and/or time possible and enable joint attention towards a (potentially virtual) object. This is an interesting test case for gauging the scope of definition of joint attention.
Sensorimotor approaches have in common that they give pride of place to action and embodiment: rather than considering action as a result of, or an indicator of, cognitive capabilities, they conceive of it as a creator and constituent of such capabilities. From this perspective, joint attention need not be taken to presuppose a sophisticated understanding of other agents’ perspectives or mental states generally (Moll & Meltzoff, 2011). Insofar as certain functions are commonly associated with joint attention, such as the facilitation of joint action and verbal communication, attempts to explain joint attention (or aspects thereof) may be assessed in light of how well they account for these functions.
The discussion about different definitions of joint attention is compounded by the possibility of distinguishing different levels. If the concept of joint attention is understood as referring to episodes in which two individuals turn their attention to the same aspect of their environment and are mutually aware of this, it is apparent that it may be a graded concept. Levels of joint attention, then, can be distinguished according to the degree to which participating individuals are focusing their attention and whether they are mutually aware of this or not.
For example, a relatively low level of joint attention may be said to occur when two individuals are looking in the same direction without any specific focus, e.g., when looking out of a window. A somewhat higher level of joint attention would occur when they were looking at the same specific object but not necessarily attending to the same aspect of this object. Attention would become fully joint if both partners were to contemplate the same aspect of the object in a reciprocal manner.
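The gradation described above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the level names, the four boolean observations and the function `classify_level` are hypothetical and do not come from the literature cited here, and real episodes will rarely decompose this cleanly.

```python
from enum import IntEnum


class JointAttentionLevel(IntEnum):
    """Hypothetical labels for the graded levels described in the text."""
    NONE = 0
    SAME_DIRECTION = 1  # looking the same way without a specific focus
    SAME_OBJECT = 2     # same object, not necessarily the same aspect
    FULLY_JOINT = 3     # same aspect, contemplated in a reciprocal manner


def classify_level(same_direction: bool, same_object: bool,
                   same_aspect: bool, reciprocal: bool) -> JointAttentionLevel:
    """Map four (assumed) observations about a dyad onto a level."""
    if same_object and same_aspect and reciprocal:
        return JointAttentionLevel.FULLY_JOINT
    if same_object:
        return JointAttentionLevel.SAME_OBJECT
    if same_direction:
        return JointAttentionLevel.SAME_DIRECTION
    return JointAttentionLevel.NONE
```

Two people looking out of a window would correspond to `classify_level(True, False, False, False)`, which yields the lowest non-empty level in this scheme.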
A further distinction can be made by considering the causes for an episode of joint attention. Extrinsic causes may involuntarily draw the attention towards them, like a car engine backfiring. Subjects may also intrinsically engage in episodes of joint attention, for example, in a joint search or when watching a movie together. Joint attention is called bottom-up attention in the former case and top-down in the latter (Carpenter & Liebal, 2011).
Rich and lean accounts
Discussions of joint attention have been structured by the contrast between so-called rich and lean accounts (for a comprehensive overview, see Racine & Carpendale, 2007; Racine, 2011). The distinction between both approaches rests on how much psychological awareness or complexity should be attributed to an agent who engages in joint attention activities (Racine, 2011, p. 23).
On the one hand, lean accounts (Butterworth, 1998) claim that co-orientation of two or more organisms to the same focus is the criterion to identify joint attention. Two people simply looking from different windows at the same event would be an example that complies with this definition (Racine & Carpendale, 2007). However, as this example does not entail any connection or coordination between the participants, attention is simultaneous or parallel rather than joint. A lean account of joint attention, then, implies that agents are causally connected to each other in some manner, via low-level mechanisms of some sort. In contrast, advocates of rich accounts of joint attention (Hobson, 2005; Tomasello, 1995) also require the agents to be mutually aware that their attention is directed to the same thing. Consequently, they appeal to the underlying cognitive operations.
In order to account for the mutual awareness that is partially constitutive of joint attention, proponents of rich accounts argue that it is necessary to postulate at least second-order representational competencies (Seemann, 2011), i.e., to represent the other agent's intentional states. Indeed, it can also be argued that higher-order representational capacities are required in order to represent that the other agent represents one's own intentional state and to represent that the other represents one's representation of their intentional state and so on and so forth.
Given the obvious danger of an infinite regress, proponents of leaner accounts argue that joint attention is not a state in which two individuals each enjoy a complex kind of awareness, an aspect of which is directed at the other person or a common focus, but has a strong active component of bodily interactions. These interactions endow the agent with a non-reductive capacity for a "practical, action-guiding understanding of the perceived scene’s causal properties" (Seemann 2011), i.e., affordances. Likewise, foci of attention are not just shared between the individuals but actively constructed through their interaction and extended over time by embedding them in task structures and conventionalizing them in terms of canonical forms in the culture (Bruner, 1995, p. 6). By subscribing to this enacted, embedded and encultured quality of joint attention, proponents of sensorimotor accounts move the focus for explaining properties of the phenomenon away from higher-order mental states and representational and metarepresentational capabilities towards action and the physical and cultural properties of the environment. This approach may be generalized to situations in which the individuals interact across space and time as discussed below using a few test cases. It is natural, therefore, to expect that sensorimotor approaches to joint attention will contribute to the project of articulating a lean account.
Functions of joint attention are mainly considered with respect to ontogenetic development and social coordination. Together with aligning mechanisms at the perceptual-motor level and the psycholinguistic level (Tollefsen & Dale, 2012), it is a low-level mechanism for mental alignment of partners in joint actions, which may be a preparatory stage for the development of the capability to implicitly take another’s perspective in cooperative situations and later to explicitly understand the other’s perspective as such (Fuchs, 2013). The alignment is achieved by exchanging information that is relevant with respect to the jointly attended object (e.g., its location). The coordination of visual attention among social partners is also central to spoken communication: it appears especially important to early language learning and may be the key limiting factor in both language learning and social development for children with autism (Yu & Smith, 2013). Beyond its function for negotiating a given situation, joint attention serves to form culture, allows individuals to participate in this common culture and to maintain coherence (Bruner, 1995). Joint attention might be one of the sensorimotor mechanisms which may contribute to what Tomasello and colleagues (2005; Carpenter & Call, 2007) consider the uniquely human motivation to share psychological states with others.
Campbell (2011) points out the epistemic function of joint attention in the three-place relation between the co-attenders and the object. By coordinating attention with that of a conspecific, a person can differentiate the sensory experience that is available only to her from the object that is independent of her experience. Standing in this three-place relation is a basis on which one could in principle derive a great deal of propositional knowledge. Hence joint attention explains our relating to an object not as a mere artifact of our sensory experience but as an independent thing. As Campbell (2011, p. 415) asserts: joint attention "helps us to address a problem about how it can be that sensory experience is the source of our knowledge of a mind-independent world" and can explain our knowledge of our surroundings, i.e., objects as well as other minds.
One source of interest in the development of joint attention has been its apparent links to the development of theory of mind. Some of the main questions in this context concern the developmental origins of joint attentional behaviors (e.g., gaze following), the infant’s early understanding of attention and of other mental states, and the functional consequences of joint attention (and especially the consequences arising from individual differences in children’s engagement in joint attention).
The attentional capabilities of infants develop roughly along the following timeline. At 2 months, infants begin feeling others’ attention directed to the self in a broad and global sense, in particular towards their own face. They smile more when adults make eye contact with them, and less when adults look away (e.g., at their ears rather than at their eyes). The awareness of attention that is directed to one’s own face later expands to include attention to body parts and to actions performed by the self. By about 4 months of age, infants not only respond to attention directed towards them but also make active attempts to direct others’ attention to the self with ‘calling’ vocalizations when attention is absent (Reddy, 2003). After the middle of the first year, infants try to get attention when it is absent or to retain it by seeking to repeat games, exhibiting acts of clowning (repeating odd actions to re-elicit laughter), showing off (the performance of exaggerated or unusual actions) and teasing. At around nine months of age, joint attention begins to emerge through the caregiver’s shift of gaze from the infant’s face to a third object, which is followed by a shift of the infant’s gaze to that object (Seemann, 2011). By the end of the first year, infants perceive others’ attention to distal objects and, during the middle of the second year, they expand this awareness to non-present objects and events in time. The basic ability to identify what the other person is attending to (Moll & Meltzoff, 2011), so that one can attend to it oneself, is foundational for the subsequent perspective-taking capacities. Joint attention is most evident in children between the ages of 12 and 18 months (Tollefsen, 2005).
In summary, the infant is emotionally aware of the attention of others from very early in life; what appears to be developing is an awareness of the objects to which others’ attention can be directed: the first of these is the self, followed by what the self does, then what the self perceives, and then what the self remembers (Reddy, 2011). So, “it is the existing experience of attention directed to the self as an object which allows the discovery of newer objects of others' attention” (Reddy, 2008, p. 100). The awareness of attending, then, is an expanding process that is intimately tied to the objects of attending.
The developmentally early capacity of humans to mutually focus on the other’s face enables infants to learn to share feelings with their caregiver. Interestingly, parents spontaneously encapsulate episodes of joint attention with positive emotion (Leavens et al., 2014). In particular, during parent-initiated bids for joint attention, parents display the highest level of smiling at the moment they signal an object to babies. This pattern has been documented in parent-infant dyads with children from 6 to 18 months of age, highlighting “the environmentally situated placement of key affective information about the nature of these joint attention interactions” (Leavens et al., 2014, p. 5).
A major condition for disturbances of joint attention is autism. Children with autism are less likely to engage in episodes of joint attention, and this may be one contributing factor in the difficulties they experience with social cognition (Kasari et al. 1990). This may result from impaired capacities and processes which contribute to joint attention: they are less likely to point to and show objects (Landry & Loveland, 1988; Mundy et al., 1986), they are limited in their responsiveness to others in settings that would typically elicit social checking in relation to objects and events, and they show a reduced tendency to use eye contact and deictic gestures (e.g., pointing or showing) to coordinate attention and share experiences with social partners vis-à-vis objects or events in the world (for a review, see Hobson & Hobson, 2011). Yet they are able to shift attention and follow a head turn (Leekam et al., 1998) and detect what is at the focus of someone's gaze (Leekam et al., 1997; Hobson & Hobson, 2011). It is impaired gaze processing which affects recognition of eye contact, gaze following and joint attention (Emery, 2000). Together, this allows autistic individuals to respond to joint attention bids of others, but it leaves them impaired in regard to its initiation (Mundy, 2003). Indeed, lack of joint attention and other forms of referential communication is a primary indication of autism. As these studies suggest, impaired capabilities of sensorimotor interaction may be the underlying cause.
It has also been suggested that children with autism have limited intersubjective experience in virtue of their relative lack of the propensity to identify with the attitudes of other people (Hobson & Hobson, 2011). Of course, the claim is not that individuals with autism are completely lacking the propensity or ability to identify with others. Rather, they appear not to have the powerful pull toward, nor fully organized experience of, relations that have the other-person-centered qualities that identification affords. This may also impinge upon their willingness and ability to experience the mutual awareness that is partially constitutive of joint attention.
The capacity for establishing and maintaining reciprocal eye contact is also particularly difficult for individuals with Down syndrome to acquire. Specifically, they sometimes engage in the skill for a longer period of time than typically developing children. Although children with Down syndrome struggle with acquiring joint attention, they are able to initiate and respond to joint attention bids as often as their typically developing peers (Abbeduto et al., 2007).
Despite the dominance of visual coordination in establishing joint attention, information from any modality can be used to achieve it. Blind children, for example, can attain joint attention via touch or hearing: the child could feel another's hand on an object. As blind children have to rely on subtle and indirect cues for locating objects and the attentional focus of others, it is assumed that they acquire joint attention abilities later than their sighted peers; the available empirical evidence supports this assumption (Bigelow, 2003) but is scarce.
Deaf infants are able to engage in joint attention similarly to hearing infants; however, the time spent engaged in joint attention is often reduced in deaf infants born to hearing parents. Evidence shows that hearing mothers and deaf children struggle to achieve successful contingent responses in interactions initiated by mothers, and that deaf children exhibit lower levels of adaptive social behavior, which are related to maternally initiated and child-initiated success rates in the establishment of joint attention. In contrast, deaf infants with deaf parents do not show reduced time spent in joint attention (Nowakowski et al., 2009).
Problems in the acquisition of joint attention are aggravated when two sensory modalities are impaired at the same time. Parents of children with dual sensory impairment report having difficulty introducing new objects or people into the interaction with their children. Nevertheless, about one third of a representative population of deafblind children has been shown to engage in spontaneous episodes of joint attention, whereas the remaining part showed communication behaviors that characterize preliminary stages thereof (Núñez, 2014).
Establishing and maintaining attention
An individual may enter joint attention (i) in response to an event in the environment, (ii) in response to another individual’s attention bid, or (iii) by trying to initiate joint attention with another individual. Cases (i) and (ii) rely on triggering the bottom-up attention system by noticeable events from the environment (i) or by nonverbal referential signaling from a conspecific (ii), i.e. deictic and iconic manual gestures, visual orienting behavior, or attention-directing tactile or auditory signals. Joint attention with another person can be actively initiated (iii) by employing the same mechanisms.
Exchanging looks or alternating gaze direction between the partner and the object of interest is a prime mechanism for establishing mutual awareness of being jointly engaged in a perceptual episode. This requires the capacity for responding to someone else’s attention bid by detecting and following the focus of attention of the partner. However, gaze direction alone may not ensure that the focus of attention is one and the same for the partners. In addition to gaze perception, head or body orientation, other communication channels like pointing or vocalizations, or the context in which the episode takes place may be used to detect the target of attention.
Actions of pointing and social referencing exist before infants understand that they share them with other agents (Racine & Carpendale, 2007). But humans are not the only species exhibiting nonverbal referential signaling. Intentional communicative pointing has also been observed in chimpanzees (Leavens et al., 2004), other great apes and some monkeys (Leavens & Hopkins, 1999). Some other vertebrate species (reptiles, birds, non-human mammals) also appear to perceive eye gaze and other attentional cues in other individuals and to use gaze information in their social interactions (Emery, 2000).
Sensorimotor accounts are skeptical with respect to the necessity of explicitly detecting and representing the state of somebody else’s attention. Rather, they highlight the efficacy of the co-attender in modulating the interaction between both individuals and between them and the attended object, which transforms the problem of detecting a state into one of enabling a coupling. It has been suggested that, early in ontogeny, infants’ awareness of others’ attending to the self is a kind of “social interaction (that) furnishes (them) with a know-how about others as bearers of intentions, a step towards understanding others’ perspectives” (Reddy, 2003; De Jaegher et al., 2010).
Joint attention requires the perceivers to be engaged, a property that is also recognized as ‘attunement’ or ‘joint involvement’. Engagement is the experience of being coupled to other agents in the flow of an interaction (Fantasia et al., 2014). Jointly attending agents are somehow driven by this coupling. Through such interactions with others, enactivist approaches propose, agents are able to understand the world; in this regard, joint attention forms the basis of participatory sense-making, an ability described by Fuchs and De Jaegher (2009, p. 466) as “the process of generating and transforming meaning in the interplay between interacting individuals and the interaction process itself”. Meanings and intentions are emergent products of interaction, and in many situations, they can be viewed as distributed phenomena rather than as individual, private mental acts or properties. In this shared space, some authors argue, intentions manifest themselves in actions (Fantasia et al., 2014). Engagement in joint-attentional situations with others, therefore, allows one to better understand their intentions and their actions.
Being ‘in tune’ entails that participants share their perceptual experiences to a certain degree. But visual or other sensory properties of the perceptual episode are different for each of the co-attenders; therefore, the engaged subjects cannot share them (Seemann, 2011a; Campbell, 2011). This is why participants in an interaction retain their autonomy during the interaction (Fantasia et al., 2014). What is shared instead is the general affective dimension of the perceptual experience, which Seemann (2011a) calls ‘simple feelings’. They are defined as “experiences that occur, directly, on the grounds of a particular feature of or event in the environment, whether the feature or event is a person, other living creature, or inanimate object or state of affairs” (Seemann, 2011a, p. 193). They are part and parcel of perceptual experience, and they play a causal role in determining a focus of attention.
The most prominent feature of simple feelings, and the most relevant from a sensorimotor account, is that they are enacted and embodied, because they occur as parts of a complex event that has mental and bodily characteristics. As the embodied perception of simple feelings goes both ways, it allows partners to share these feelings and, as a consequence, to engage. A similar procedure is described in the social sharing of emotion, which strengthens social bonds, links the interactants, and ends in enhanced social integration (Rimé, 2009).
According to this perspective, joint attention might be understood in terms of a process of engagement and not only a perceptual state. Simple feelings, as defined above, allow creatures both to be engaged with other creatures and to be perceptually aware of their surroundings, because they can be tied to changes in the body state of another, perceptually present and mentally attuned individual (Seemann, 2011a).
Common knowledge is considered to be another prerequisite for joint attention, but accounts differ with respect to the level and extent of common knowledge that is required. Clearly, knowledge about which object is at the focus is necessary for the attention to be joint. According to Campbell (2011), this is the ground on which different perspectives from which a thing may be experienced emerge.
At this point, sensorimotor accounts emphasize the importance of pre-reflective, implicit common knowledge. At the most basic level, this is the presupposition of a common world altogether, which is the ground that creates the possibility for two perspectives coming together and constituting the phenomenon of joint attention (Campbell, 2011). This implicit common knowledge consists of affordances, that is, those features in the environment that are relevant for guiding the agent’s actions. Agents are likely to share a substantial part of their perceived affordances of an object in the environment in order to engage in joint attention towards this object. These shared affordances may also underlie Seemann’s (2011a) suggestion (mentioned above) to conceive of joint attention as joint thing-awareness, that involves the “awareness of being jointly engaged in a perceptual episode which yields a practical grasp of the attentional object’s causal properties” on top of Dretske’s notion of thing-awareness (Dretske, 1993).
A higher level of joint attention requires that agents be mutually aware of their directing attention together to the same object or scene; accounts differ in explaining what this awareness is and how it emerges. On the one hand, both Campbell and Peacocke present high-level accounts of common awareness. Campbell (2011) suggests that introspection is the process by which an agent knows that he or she is in a three-place relation with the co-attender and the object. Peacocke (2005) considers common awareness as an iterative process in which one agent is aware that the conspecific is aware that he is aware of an object. Neither account specifies what constitutes this awareness, and both seem to suggest that common awareness is individual awareness directed towards another agent.
On the other hand, these high-level accounts can be contrasted with lower, perceptual level approaches. In Seemann’s (2011a) view, common awareness (‘joint thing-awareness’) is constituted by an embodied perceptual relation obtained between subjects and their object of attention. In addition to the sensitivity towards a perceived object, the perceptual experience of an agent is constituted by the sensitivity to the behavior of conspecifics, the awareness of their focus of attention and their sensitivity to the perceived object. In short, the individuation of one’s own perceptual experience constitutively involves the body state of the conspecific and vice versa. This non-reductive account underlines the fact that joint attention is not attending to someone else’s attention, but that it is a different capability altogether.
In contrast to rich accounts of joint attention, sensorimotor approaches aim to explain as much as possible without appealing to sophisticated cognitive processes, and in particular they aim to avoid appealing to metarepresentational abilities. Instead of attempting to infer the intentional state of conspecifics from their activity, they suggest that the mutual monitoring of activity among the co-attenders is sufficient for the emergence of joint attention. For example, following an enactivist approach, Gallagher (2011) emphasizes external scaffolds (physical places or architectures, games, rules or customs) in attempting to account for how humans engage in and perform jointly attending activities. He gives the example of a football game, where the attention and intentions of the players are specified by the layout of the field, the rules of the game, and their movements and position with respect to the various sectors of the field and the position of the goals. Rather than inferring propositional attitudes of the other players or running simulations of their perspective, in structured social games like this, joint attention is encultured in the context and the actions of the members of the social situation. Joint attention, then, is directly perceived through the set of sensorimotor abilities that allows one to understand the meaning of the interaction with another person, for which Trevarthen (1979) has coined the term ‘secondary intersubjectivity’. By intersubjectivity Trevarthen refers to the idea that subjects can coordinate their ‘subjectivities’ with other creatures’ subjectivities without having recourse to metarepresentations or any other sort of explicit representation of mental states as internal properties of subjects (Gómez, 1998).
He distinguished different levels of intersubjectivity: primary intersubjectivity, which refers to the first signs of reciprocation in the face-to-face exchanges that characterize babies’ early interactions with their caregivers; and secondary intersubjectivity, which refers to triadic intentional communication with others about objects, i.e., joint attention (Trevarthen & Hubley, 1978).
Working outside of the enactivist tradition, Campbell (2005, 2011) also characterizes the sensory experience available in joint attention in terms of a non-propositional, personal-level relation. On his view, it does require prior knowledge of which thing is in question (the focus to which the co-attenders are jointly attending), but it does not involve propositional attitudes.
Communication is considered a crucial mechanism underlying joint attention. It “...turns a mutually experienced event into an interaction, into something joint” (Carpenter & Liebal, 2011, p. 168). Primary means of communication are looks (e.g., sharing looks, checking looks), declarative pointing and language. Joint attention may be instigated by one of the interaction partners through initiation looks towards the partner to get the other’s attention. This ‘invitation to interact’ signals communicative intention and can be followed by reference looks towards the object, which in turn signal referential intention. The three-way relation between the partners and the attended object is then established by bidirectional sharing looks, acknowledging and confirming the shared attention. Joint attention may also be caused by external events that attract the partners’ attention at the same time; in this case, only sharing looks may be involved.
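The sequence of looks just described can be sketched as a small state machine. This is a deliberately simplified illustration, not a model proposed by Carpenter and Liebal: the event labels, the function `joint_attention_established` and the reading of "bidirectional sharing looks" as "each partner has produced at least one sharing look" are all assumptions made for the example.

```python
def joint_attention_established(events):
    """Return True if a sequence of (actor, kind) events completes the
    three-way relation sketched in the text.

    Assumed event kinds (hypothetical labels):
      "initiation_look" - a look at the partner inviting interaction
      "reference_look"  - a look at the object after an invitation
      "external_event"  - an event attracting both partners at once
      "sharing_look"    - a look acknowledging the shared focus
    """
    stage = "idle"
    sharers = set()  # actors who have produced a sharing look
    for actor, kind in events:
        if kind == "external_event":
            # An external event supplies the common target directly.
            stage = "referenced"
            sharers.clear()
        elif kind == "initiation_look":
            stage = "invited"
            sharers.clear()
        elif kind == "reference_look" and stage == "invited":
            stage = "referenced"
        elif kind == "sharing_look" and stage == "referenced":
            sharers.add(actor)
            if len(sharers) >= 2:  # both partners have acknowledged
                return True
    return False
```

In this sketch, a partner-initiated episode needs an initiation look, a reference look and sharing looks from both partners, whereas an episode triggered by an external event needs only the sharing looks.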
Although eye-gaze following can be considered a primary mechanism through which visual attention is socially coordinated, one-year-olds and their parents use an alternate pathway through the coordination of hands and eyes in goal-directed action (Yu & Smith, 2013). Figure 1 illustrates both the eye-eye pathway (explained in the previous paragraph) and the eye-hand pathway. Hands that act on objects provide an alternate and spatially precise route to coordinated visual attention. As a result, “hand actions of an actor have a direct effect on the partner’s looking, leading to coordinated visual attention without direct gaze following” (Yu & Smith, 2013, p. 5). This hand-eye pathway is characterized by rapid socially coordinated adjustments of looking behavior.
Other components of the interaction process, such as bodily resonance, affect attunement, and the coordination of gestures and of facial and vocal expression, also play a role in initiating and sustaining joint attention episodes. Together with looks, pointing and language, they constitute the core elements in sensorimotor explanations of joint attention. These types of interaction constitute a primary mechanism that enables the understanding of others’ intentions and the world’s properties (Fuchs & De Jaegher, 2009; Gallagher, 2011).
The question of how the conscious experience of joint attention is generated may be answered by extrapolating the idea of sensorimotor contingency theory (O’Regan & Noë, 2001; O’Regan, 2011) to a social context. On this account, sensory awareness is constituted by the agent knowing the structure of the regularities between its actions and the resulting changes in sensory stimulation. This knowledge is the essence of ‘raw feels’, and the differences in the structure of the regularities for different sensory modalities are what distinguish ‘seeing’ from ‘hearing’ etc. Conscious experience of these feels results from the agent having cognitive access to the fact that it is exercising this knowledge of sensorimotor contingencies. Although sensorimotor contingency theory does not consider the role of other agents in shaping the perceptual process (it is ‘philosophically autistic’, as Gallagher, 2009, calls it), it could be argued from this perspective that the ‘feel’ of joint attention may be an instance of a ‘social’ sensorimotor contingency in which the regularities involved in one agent’s gazing at an object influence the co-attender and vice versa.
Such regularities have also been considered as ‘implicit relational knowing’ (Lyons-Ruth et al., 1998) that is the result of a history of coordination between two agents, or as pre-reflective knowledge of how to deal with others (Fuchs & De Jaegher, 2009) when, for example, eliciting or guiding their attention.
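As a rough illustration of what ‘knowing the structure of the regularities’ between one’s own actions and a partner’s responses could amount to, the following sketch tabulates how often an action is followed by a given outcome. It is a toy model under strong assumptions: simple frequency counting stands in for whatever knowledge the theory has in mind, and the class name `ContingencyModel` as well as the action and outcome labels are hypothetical.

```python
from collections import Counter, defaultdict


class ContingencyModel:
    """Toy tabulation of action-outcome regularities (illustrative only)."""

    def __init__(self):
        # Maps each action to a count of the outcomes that followed it.
        self._counts = defaultdict(Counter)

    def observe(self, action, outcome):
        """Record that `outcome` followed `action` once."""
        self._counts[action][outcome] += 1

    def probability(self, action, outcome):
        """Relative frequency of `outcome` after `action` (0.0 if unseen)."""
        total = sum(self._counts[action].values())
        return self._counts[action][outcome] / total if total else 0.0

    def predict(self, action):
        """Most frequently observed outcome of `action`, or None if unseen."""
        outcomes = self._counts.get(action)
        return outcomes.most_common(1)[0][0] if outcomes else None


# Hypothetical history of a 'social' contingency: my gaze shift towards an
# object is usually followed by the partner looking at that object too.
model = ContingencyModel()
for _ in range(3):
    model.observe("look_at_object", "partner_looks_at_object")
model.observe("look_at_object", "partner_looks_away")

p_follow = model.probability("look_at_object", "partner_looks_at_object")
expected = model.predict("look_at_object")
```

In this toy setting, a history of coordination between two agents is simply a longer list of observations; nothing here is meant to capture the conscious ‘feel’ that the theory sets out to explain.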
Functional magnetic resonance imaging (fMRI) has shown that the main effect of joint attention is the recruitment of a neural network comprising the dorsal and ventral portions of medial prefrontal cortex (MPFC) in addition to other areas such as medial orbito-frontal and subgenual cingulate cortices extending into the ventral striatum, posterior cingulate cortex, the calcarine gyrus, the right hippocampus, and anterior temporal cortex (Schilbach et al., 2010). In particular, dorsal MPFC has been related to the processing of communicative intent, whereas the ventral portion of MPFC might be related to the monitoring of one’s own emotions and the interaction’s outcome.
A focus of studies on the neural correlates of joint attention is the distinction between other-initiated joint attention (other-JA) and self-initiated joint attention (self-JA). The differences in the involvement of brain areas between the two conditions have been studied in an interactive paradigm in which eye-tracking was used to show gaze-contingent stimuli to subjects in an fMRI scanner (Schilbach et al., 2010). Results showed that other-JA resulted in greater activation of MPFC, particularly anterior and ventral MPFC (see Fig. 2). These areas are known to be involved in both the meeting of minds and the supramodal coordination of perceptual and cognitive processes. Self-JA, in contrast, bilaterally activated the ventral striatum, which underlies the processing of the hedonic aspects of sharing attention. The increased activation in this reward-related brain area is congruent with the results of the questionnaires, in which self-JA was rated as significantly more pleasant than other-JA. Self-JA might be more pleasant because in this case the subject is able to elicit a congruent response from another participant (Schilbach et al., 2010). Therefore, it might be the prospect of establishing a successful interaction that encourages humans to initiate joint attention.
Emery (2000) suggests a link between gaze processing, the amygdala and autism: amygdala activation may be important for some (but not all) aspects of the attribution of mental states to eye stimuli, and a lack of amygdala activation has been found in individuals with autism in a task in which a mental state had to be inferred from images of the eyes.
The concepts discussed so far will now be considered in the context of three example scenarios. These scenarios were selected to showcase the application of the concepts in a constructive manner in order to demonstrate that sensorimotor accounts may aptly capture notions of joint attention that go beyond the standard example of looking together at an object.
Collaborative writing is an example of a joint action directed at the creation of a text document. In order to coordinate the activity of the individual participants and to build shared information, joint attention is required. If the participants physically work together, sitting at the same table, for example, they rely on the joint attention mechanisms discussed so far. Modern internet technology, however, also enables joint writing when the participants do not share the same physical space and hence cannot physically interact. This off-line mode of collaboration transcends the direct physical interaction in a given moment and extends it over longer periods and to non-physical, possibly asynchronous interactions. In the following, this mode will be considered in order to discuss the sensorimotor perspective on joint attention. It serves as an example for considering joint attention in terms of a process of engagement rather than a perceptual state of two individuals at the same moment in time (as suggested by Seemann, 2011).
Depending on the type and the contents of the document, writing can last from a couple of days to several years, and, during this interval, collaborators pay attention to the development of this document. In a similarly general sense they might sometimes even simultaneously attend to the same aspect of the document, e.g., when they are online at the same time, when they are discussing the structure, when they draft text or during a revision. This is not to say that participants always attend to the same sentence or paragraph, but that they are jointly aware that the short-term objective of their current work on the document is a plan for the chapters and sections, a first complete version or a publishable edition of the document. Although the physiological attentional processes in the individual participants in this setting are separated in space and time, they may experience joint attention towards the document as a whole, which raises the question of how this form of joint attention may be explained from a sensorimotor perspective.
An answer to this question may be approached by first noticing that when agents are separated across space and time, they do not lose their knowledge of structures and regularities in sensorimotor interaction. Provided that the agents have the capability of warping individual sensorimotor patterns to timescales that are different from the one on which they were originally observed, they would be able to deploy their knowledge also when there is no direct, physical coupling between them. From this perspective, there would be many important similarities between a face-to-face discussion and an exchange via email, for example. Certainly the sensorimotor patterns that mediate a discussion would be different (talking and listening to each other are important features of a face-to-face discussion, as is monitoring each other’s visual and auditory attention), but many of the regularities would be the same, e.g., that a question calls for an answer, that explanations transfer information, or that the lead in the discussion can be exchanged between the partners.
In this sense, collaborative writing would be an instance of joint attention in which the usual mediating mechanisms were replaced by means that enable the separation of the participants in space and time. This may explain why they feel that they are jointly dedicating attention to the document that they are working on. But because the mediating processes are different, it feels different from other forms of joint attention in which the participants physically work together in a shared space.
Collaborative writing showcases that joint attention ought not to be understood in the narrow terms of a perceptual state but as a broader phenomenon in terms of a relation, causal and otherwise, between the involved organisms’ activities and the environment in which these activities occur (Seemann, 2011).
Joint search is a common strategy in which two individuals try to recover a lost item, e.g., searching the apartment for the car keys or a grassy field for a lost ball. Both situations are cases of joint action in which joint attention might be an important coordination mechanism. Here, two or more people direct their attention to the same object (and to each other). It differs from typical cases of joint attention to the extent that the object is not present or visible to the subjects: they both know what they are looking for, but they do not perceive it yet. It could be said that individuals are directing their attention to cues in the environment that might lead them to find the object. As the partners cannot know to which aspect of the searched object each of them is attending, joint search seems more likely to lend itself to explication by lean than by rich accounts of joint attention.
To be successful, the partners need to act, and they need to do so in a coordinated way. This coordination comprises, first, an agreement that they are referring to the same object. The agreement could be obtained because the individuals were previously interacting with this object, which, for some reason, has then disappeared. The agreement could also be reached through previous discussion of which object they are going to look for. Additionally, this coordination encompasses other activities such as the parcellation of the territory, the adjustment of the level of attention or negotiating when to terminate the search. The coordination during the joint search activity is primarily mediated by mutual gaze among the partners, checking looks and pointing. Therefore joint attention in this type of scenario may be sustained through looks, gestures, pointing, etc. This view seems to receive support from behavioral studies in humans showing that providing the partners with information about each other’s gaze can significantly improve search performance (Wahn et al., 2015).
Joint attention across sensory modalities
Although the majority of studies consider joint visual attention, visual-only interaction is an exception rather than the standard case in natural settings. Most objects of joint attention will manifest themselves also by auditory, tactile or olfactory signals in addition to their visual appearance. Moreover, the coordination between the co-attenders frequently relies on vocalizations or verbal communication. For example, a child sitting in the mother’s lap while they both handle a toy may not be facing the mother, so there is no mutual gaze; but the child would likely sense from the mother’s posture and touch that she is attending to the toy. Hence joint attention usually involves a number of different sensory modalities.
This multi-sensory aspect is generally compatible with the definition of joint attention in the narrower sense in which it has to be directed to one and the same aspect of the attended object. However, it raises the interesting question whether this requirement implies that co-attenders must employ the same sensory modalities. Relatedly, it may be interesting to investigate more closely how people with impaired perception in a sensory modality, like blind or deaf people, engage in joint attention.
One can approach this question by noticing that humans with impaired perception in one modality usually learn to partially compensate for this lack through extended skills in the other modalities. For example, some blind people learn to use acoustic information for navigation and even to actively generate such information by emitting sounds; deaf people may have improved skills for lip-reading. These compensatory skills in a secondary modality may allow these people to focus on a sensory aspect of an object for which their primary modality is impaired.
Questions for future research
As previously mentioned, most research on joint attention has focused on the visual modality. A main issue in this field will be to change or expand the focus from only studying gaze-following and visual abilities to investigating how other modalities inform joint attention. Other cues, like voice direction, body posture, touch, etc., provide information about others’ focus of attention. One particularly challenging issue comes from the study of children who are blind and deaf (Núñez, 2014) as well as individuals lacking one sensory modality. The limited research in this field has revealed that joint attention is a multisensory experience (Núñez, 2014; Núñez et al., in prep.). This matter has been neglected in the literature on joint attention so far, from the definitions of the phenomenon to the kinds of abilities proposed to constitute joint attention. More studies in atypically and typically developing individuals are needed to clarify how other modalities are employed in joint attention.
An additional change of perspective is needed to understand the mechanisms of joint attention in non-Western cultures. Studies from other cultures (as well as from atypically developing individuals) point to the possibility that gaze is not ubiquitous in mediating joint attention, and that other senses may sometimes play the role of gaze. Consequently, more studies are needed in those cultures where mothers spend less time looking at their infants or where gazing at each other is less common in adults. For example, in cultures in which infants spend a lot of time carried on their caregivers’ backs, like the Kaluli clan in Papua New Guinea, or in groups where people normally engage in more touching and holding and less eye contact, like the Gusii people in Kenya, vocal cues (e.g., voice direction, prosody) and touch may be particularly important (Akhtar & Gernsbacher, 2008).
The capability to have some form of joint attention may be helpful for robots that interact with humans. The approaches for developing artificial agents that have this capability reflect the range of different contexts in which joint attention is studied in humans and the different accounts used to explain this phenomenon. Correspondingly, the spectrum of robot models for joint attention extends from simply imitating core behavioral elements (e.g., Andry et al., 2001) of joint attention, like getting and detecting the attention of a human and establishing mutual gaze (Imai et al.’s ‘Robovie’, 2003), to more cognitive models in which the robot and the human are supposed to attend to the same aspect of the environment and which require, in addition to attention detection and manipulation, skills for establishing and maintaining a coordinated collaborative coupling between the agents and for mutual intentional understanding. To date, no system has achieved joint attention in this demanding sense (Kaplan & Hafner, 2006).
Models frequently build on functions for tracking human faces, recognizing and generating facial expressions, object recognition, pointing to and reaching for objects, and alternating gaze between faces and objects (Kozima & Yano, 2001; Hafner & Kaplan, 2005). Hence another division can be made between models in which these functions are hand-crafted and models of how they develop through interaction with the environment. Research in developmental robotics is closely linked with that in human infant development. Most of the models focus on a single developmental step (e.g., showing the emergence of gaze following when an adequate reward system is present). By studying the development of each prerequisite separately, these models may not capture synergistic dynamics linking their parallel development. Instead of designing different models to independently study attention detection, attention manipulation, social coordination or intentional understanding, one strategy could be to build architectures with generic developmental principles and to study which embodiment and environmental conditions lead to the simultaneous development of these skills. Current results obtained with a generic architecture for autonomous mental development, however, may be considered too preliminary to explain the emergence of the capacity for joint attention.
Further progress will require solutions to a number of problems. For example, the interaction during episodes of joint attention involves turn-taking (studied by Ikegami & Iizuka, 2003) and other forms of social coordination. Work on the dynamics of and computational models for coordination is at an incipient stage. Whereas models for recognizing activities, gestures or body postures are making good progress, reliably understanding the underlying intentions of the human partner requires the robot controller to consider the larger context in which the interaction takes place as well as general knowledge of structures in the interaction with humans. Trying to discover and utilize structure in the sensorimotor experience of the robot, rather than striving for the extraction of abstract, declarative knowledge from it, may considerably simplify the search for solutions to these open problems.
Related Scholarpedia articles
- Sensorimotor theory of consciousness (http://www.scholarpedia.org/article/Sensorimotor_theory_of_consciousness)
- Neural basis of emotions (http://www.scholarpedia.org/article/Neural_basis_of_emotions)
The authors would like to thank Mattia Gallotti for reading a previous version of the manuscript and making constructive comments. This work was supported by the European Union through the H2020 FET Proactive project socSMCs (GA no 641321). Pamela Barone was supported by a Ph.D. fellowship from the Spanish Ministry of Economy and Competitiveness (BES-2014-067640) and Carme Isern-Mas was supported by a Ph.D. fellowship from the Spanish Ministry of Education, Culture and Sports (FPU14/01186).
Abbeduto, L., Warren, S.F., & Conners, F.A. (2007). Language development in Down syndrome: From the prelinguistic period to the acquisition of literacy. Mental retardation and developmental disabilities research reviews, 13(3), 247-261.
Akhtar, N., & Gernsbacher, M.A. (2008). On privileging the role of gaze in infant social cognition. Child development perspectives, 2(2), 59-65.
Andry, P., Gaussier, P., Moga, S., Banquet, J., & Nadel, J. (2001). Learning and communication in imitation: an autonomous robot perspective. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 31(5), 431–444.
Bigelow, A.E. (2003). The development of joint attention in blind infants. Development and psychopathology, 15(02), 259-275.
Bruner, J. (1995). From joint attention to the meeting of minds: An introduction. In: Moore, C., & Dunham, P.J. (Eds.), Joint attention. Its origins and role in development. Hove, UK: Lawrence Erlbaum Associates Publishers.
Butterworth, G. (1991). The ontogeny and phylogeny of joint visual attention. In: Whiten, A. (Ed.) Natural theories of mind: Evolution, development, and simulation of everyday mindreading (pp. 223-232). Oxford, England: Blackwell.
Butterworth, G. (1998). What is special about pointing in babies? In Simion, F., & Butterworth, G. (Eds.), The development of sensory, motor and cognitive capacities in early infancy: From perception to cognition (pp. 29–40). Hove: Psychology Press.
Campbell, J. (2005). Joint attention and common knowledge. In Eilan, N., Hoerl, C., McCormack, T., & Roessler, J. (Eds.), Joint Attention: Communication and Other Minds. Issues in Philosophy and Psychology (pp. 287-297). New York: Oxford University Press.
Campbell, J. (2011). An Object-Dependent Perspective on Joint Attention. In Seemann A. (Ed.), Joint Attention: New Developments in Psychology, Philosophy of Mind, and Social Neuroscience (pp. 415-430), Cambridge, MA: MIT Press.
Carpenter, M., & Call, J. (2007). Comparing the imitative skills of children and nonhuman apes. Primatologie, 7, 1-25.
Carpenter, M., & Liebal, K. (2011). Joint attention, communication, and knowing together in infancy. In Seemann, A. (Ed.), Joint Attention: New Developments in Psychology, Philosophy of Mind, and Social Neuroscience (pp. 159-182). Cambridge, MA: MIT Press.
De Jaegher, H., Di Paolo, E., & Gallagher, S. (2010). Can Social Interaction Constitute Social Cognition? Trends in Cognitive Sciences, 14(10), 441-447.
Dretske, F. (1993). Conscious experience. Mind, 102, 263–283.
Emery, N.J. (2000). The eyes have it: the neuroethology, function and evolution of social gaze. Neuroscience and Biobehavioral Reviews, 24, 581-604.
Fantasia, V., De Jaegher, H., & Fasulo, A. (2014). We can work it out: an enactive look at cooperation. Frontiers in Psychology, 5:874.
Fuchs, T. (2013). The phenomenology and development of social perspectives. Phenomenology and the cognitive sciences, 12(4), 655-683.
Fuchs, T., & De Jaegher, H. (2009). Enactive intersubjectivity: participatory sense-making and mutual incorporation. Phenomenology and the Cognitive Sciences, 8(4), 465-486.
Gallagher, S. (2009). Two problems of intersubjectivity. Journal of Consciousness Studies, 16(6-8), 289-308.
Gallagher, S. (2011). Interactive Coordination in Joint Attention. In Seemann, A. (Ed.), Joint Attention: New Developments in Psychology, Philosophy of Mind, and Social Neuroscience (pp. 293-306). Cambridge, MA: MIT Press.
Gómez, J.C. (1998). Are Apes Persons? The Case for Primate Intersubjectivity. Etica & Animali, 9, 51-63.
Hafner, V., & Kaplan, F. (2005). Interpersonal maps and the body correspondence problem. In Demiris, Y., Dautenhahn, K. & Nehaniv, C. (Eds.) Proceedings of the Third International Symposium on Imitation in Animals and Artifacts (pp. 48–53), Hertfordshire, UK.
Hobson, R. P. (2005). What Puts the Jointness into Joint Attention?. In Eilan, N., Hoerl, C., McCormack T., & Roessler, J. (Eds.), Joint Attention: Communication and Other Minds. Issues in Philosophy and Psychology (pp. 185-204). New York: Oxford University Press.
Hobson, R.P., & Hobson, J.A. (2011). Joint attention or joint engagement? Insights from autism. In Seemann, A. (Ed.), Joint Attention: New Developments in Psychology, Philosophy of Mind, and Social Neuroscience (pp. 115-135). Cambridge, MA: MIT Press.
Ikegami, T., & Iizuka, H. (2003). Joint attention and dynamics repertoire in coupled dynamical recognizers. In Dautenhahn, K., & Nehaniv, C. (Eds.), Proceedings of the Second International Symposium on Imitation in Animals and Artifacts (pp. 125–130), Aberystwyth, UK.
Imai, M., Ono, T., & Ishiguro, H. (2003). Physical relation and expression: Joint attention for human-robot interaction. IEEE Transactions on Industrial Electronics, 50(4), 636–643.
Kaplan, F., & Hafner, V. (2006). The challenges of joint attention. Interaction Studies, 7(2), 135-169.
Kasari, C., Sigman, M., Mundy, P., & Yirmiya, N. (1990). Affective sharing in the context of joint attention interactions of normal, autistic, and mentally retarded children. Journal of autism and developmental disorders, 20(1), 87-100.
Kozima, H., & Yano, H. (2001). A robot that learns to communicate with human caregivers. In Balkenius, C., Zlatev, J., Kozima, H., Dautenhahn, K., & Breazeal, C. (Eds.), Proceedings of the First International Workshop on Epigenetic Robotics: Modeling Cognitive Development in Robotic Systems (pp. 47–52), Lund University Cognitive Studies 85.
Landry, S.H., & Loveland, K.A. (1988). Communication behaviors in autism and developmental language delay. Journal of Child Psychology and Psychiatry, 29(5), 621-634.
Leavens, D.A., & Hopkins, W.D. (1999). The whole hand point: The structure and function of pointing from a comparative perspective. Journal of Comparative Psychology, 113, 417–425.
Leavens, D.A., Hopkins, W.D., & Thomas, R.K. (2004). Referential communications of chimpanzees (Pan troglodytes). Journal of Comparative Psychology, 118, 48–57.
Leavens, D.A., Sansone, J., Burfield, A., Lightfoot, S., O'Hara, S., & Todd, B.K. (2014). Putting the 'joy' in joint attention: affective-gestural synchrony by parents who point for their babies. Frontiers in Psychology, 5(879), 1-14.
Leekam, S., Baron‐Cohen, S., Perrett, D., Milders, M., & Brown, S. (1997). Eye‐direction detection: A dissociation between geometric and joint attention skills in autism. British journal of developmental psychology, 15(1), 77-95.
Leekam, S.R., Hunnisett, E., & Moore, C. (1998). Targets and Cues: Gaze‐following in Children with Autism. Journal of Child Psychology and Psychiatry, 39(7), 951-962.
Lyons-Ruth, K., Bruschweiler-Stern, N., Harrison, A.M., Morgan, A.C., Nahum, J.P., Sander, L., et al. (1998). Implicit relational knowing: Its role in development and psychoanalytic treatment. Infant Mental Health Journal, 19, 282–289.
Moll, H., & Meltzoff, A.N. (2011). Joint attention as the fundamental basis of perspectives. In Seemann, A. (Ed.), Joint Attention: New Developments in Psychology, Philosophy of Mind, and Social Neuroscience. Cambridge, MA: MIT Press.
Mundy, P. (2003). The neural basis of social impairments in autism: The role of the dorsal medial-frontal and anterior cingulate system. Journal of Child Psychology & Psychiatry, 44, 793–809.
Mundy, P., Sigman, M., Ungerer, J., & Sherman, T. (1986). Defining the social deficits of autism: The contribution of non‐verbal communication measures. Journal of Child Psychology & Psychiatry, 27(5), 657-669.
Nowakowski, M.E., Tasker, S.L., & Schmidt, L.A. (2009). Establishment of joint attention in dyads involving hearing mothers of deaf and hearing children, and its relation to adaptive social behavior. American Annals of the Deaf, 154(1), 15-29.
Núñez, M. (2014). Joint attention in deafblind children: A multisensory path towards a shared sense of the world. London: Sense.
Núñez, M., Reddy, V., Franco, F., & Leekam, S. (2016). Rethinking Joint Attention: Multisensory strategies in deafblind children. Manuscript under review.
O'Regan, J.K., & Noë, A. (2001). A sensorimotor account of vision and visual consciousness. Behavioral and brain sciences, 24(05), 939-973.
O'Regan, J.K. (2011). Why red doesn't sound like a bell: Understanding the feel of consciousness. Oxford University Press.
Peacocke, C. (2005). Joint Attention: its nature, reflexivity and relation to common knowledge. In Eilan, N., Hoerl, C., McCormack T., & Roessler, J. (Eds.), Joint Attention: Communication and Other Minds. Issues in Philosophy and Psychology (pp. 298-324). New York: Oxford University Press.
Racine, T.P. (2011). Getting beyond rich and lean views of joint attention. In Seemann, A. (Ed.), Joint attention: New Developments in Psychology, Philosophy of Mind, and Social Neuroscience (pp. 21-42). Cambridge, MA: MIT Press.
Racine, T.P., & Carpendale, J.I.M. (2007). The role of shared practice in joint attention. British Journal of Developmental Psychology, 25, 3–25.
Reddy, V. (2003). On being the object of attention: implications for self–other consciousness. Trends in Cognitive Sciences, 7(9), 397-402.
Reddy, V. (2008). How infants know minds. Harvard University Press.
Reddy, V. (2011). A gaze at grips with me. In Seemann, A. (Ed.), Joint Attention: New Developments in Psychology, Philosophy of Mind, and Social Neuroscience (pp. 137-158). Cambridge, MA: MIT Press.
Rimé, B. (2009). Emotion elicits the social sharing of emotion: Theory and empirical review. Emotion Review, 1(1), 60-85.
Schilbach, L., Wilms, M., Eickhoff, S.B., Romanzetti, S., Tepest, R., Bente, G., Shah, N.J., Fink, G.R., & Vogeley, K. (2010). Minds made for sharing: initiating joint attention recruits reward-related neurocircuitry. Journal of Cognitive Neuroscience, 22(12), 2702-2715.
Seemann, A. (2011). Joint Attention: New Developments in Psychology, Philosophy of Mind, and Social Neuroscience. Cambridge, MA: MIT Press.
Seemann, A. (2011a). Joint Attention: Toward a Relational Account. In Seemann, A. (Ed.), Joint attention: New Developments in Psychology, Philosophy of Mind, and Social Neuroscience (pp. 183-202). Cambridge, MA: MIT Press.
Tollefsen, D. (2005). Let’s pretend! Joint action and young children. Philosophy of the Social Sciences, 35(1), 75–97.
Tollefsen, D., & Dale, R. (2012). Naturalizing joint action: A process-based approach. Philosophical Psychology, 25(3), 385–407.
Tomasello, M. (1995). Joint attention as social cognition. In: Moore, C., & Dunham, P.J. (Eds.) Joint attention. Its origins and role in development. Hove, UK: Lawrence Erlbaum Associates Publishers.
Tomasello, M., Carpenter, M., Call, J., Behne, T., & Moll, H. (2005). Understanding and sharing intentions: The origins of cultural cognition. Behavioral and Brain Sciences, 28(05), 675-691.
Trevarthen, C.B. (1979). Communication and cooperation in early infancy: A description of primary intersubjectivity. In Bullowa, M. (Ed.), Before Speech. Cambridge: Cambridge University Press.
Trevarthen, C., & Hubley, P. (1978). Secondary intersubjectivity: Confidence, confiding and acts of meaning in the first year. In Lock, A. (Ed.), Action, Gesture and Symbol: The Emergence of Language (pp. 183-229), London: Academic Press.
Wahn, B., Schwandt, J., Krüger, M., Crafa, D., Nunnendorf, V., & König, P. (2015). Multisensory teamwork: using a tactile or an auditory display to exchange gaze information improves performance in joint visual search. Ergonomics, 59(6).
Yu, C., & Smith, L.B. (2013). Joint Attention without Gaze Following: Human Infants and Their Parents Coordinate Visual Attention to Objects through Eye-Hand Coordination. PLoS ONE, 8(11), e79659.