Border-ownership coding

Jonathan R. Williford and Rüdiger von der Heydt (2013), Scholarpedia, 8(10):30040. doi:10.4249/scholarpedia.30040 revision #147356

Curator: Rüdiger von der Heydt

Figure 1: Three points on borders caused by visual occlusion are marked on a visual scene. At point A, the koala's head occludes a tree trunk, so this part of the border is owned by the koala. More complex occlusions also occur in this image, such as the koala occluding itself (e.g., point B) and a tree that is both occluded by and occluding the koala (e.g., point C).
Figure 2: Bregman's ink-blotted Bs. The blue regions are the same in (A) and (B); however, the black ink blot owns some of the borders of the blue regions in (B), which makes it easier to perceive the letters "B". Modified from (Bregman, 1981).
Figure 3: Rubin face-vase illusion. The border-ownership is ambiguous such that you can perceive either two black face profiles or a single white vase.
Figure 4: Peristimulus time histograms of a hypothetical orientation-selective neuron, plotting the neuron's firing rate over time after the presentation of a static stimulus. The displays are shown below the histograms; the color of each enclosing box matches the corresponding response curve. (A) A response showing orientation selectivity. The difference in firing rate is apparent as soon as the neuron starts responding to the stimulus, which is around 40 ms after stimulus onset in rhesus macaques. (B) A response showing border-ownership selectivity. The border-ownership signal, which is the difference between the firing rates when the figure is on one side versus the other, begins to appear around 50 ms. (C) A response showing a related phenomenon called figure-ground modulation (Lamme, 1995). The figure-ground modulation signal appears later, around 80 ms near the borders and 100 ms in the center of the figure.

Understanding visual scenes requires the visual system to infer the three-dimensional structure of the world from the two-dimensional projections on our retinae. This task is complicated by the fact that objects closer to the viewer often block, or occlude, the view of objects farther away, as depicted in Figure 1. This produces visual borders that are owned by the closer, occluding objects. For example, the border at point A in Figure 1 is owned by the koala. Gestalt psychologists were the first to notice the importance of border-ownership in perception. Since the occluding borders define the shape of the foreground, the perception of form depends on the correct assignment of these borders (see Figure 2). Edgar Rubin created an intriguing figure in which borders can be perceptually assigned either way, with the effect that different shapes are perceived depending on how the borders are assigned (Figure 3; Rubin, 1915). Rubin noticed that perception becomes unstable with such figures, slowly flipping back and forth between the alternative interpretations. These compulsive alternations of perception and their specific timing made it obvious for the first time that mechanisms exist in the brain that strive for an interpretation of the image, and that there is a neural substrate that represents the interpretations.

When looking at natural scenes, perceiving which borders belong to which objects seems like a trivial task. However, it is unknown how the visual system is able to accomplish this feat, and no computational solution is currently known that approaches the reliability of the primate visual system (see Hoiem, Efros, & Hebert, 2011 and Borenstein & Ullman, 2008 for recent approaches).

The process of distinguishing foreground and background in drawings or other two-dimensional displays is generally referred to as "figure-ground organization", which seems to imply that the brain labels regions as figure (the closer, occluding region) and ground (the more distant, occluded region). In a more complex example involving multiple occlusion levels, the different regions might be labeled with a relative depth order. Labeling regions this way and assigning border-ownership might seem like equivalent ways of representing the occlusion structure of a scene. However, cases in which either an object occludes itself or multiple objects mutually occlude each other (Figure 1) cannot be represented by region labeling. In contrast, one can code any complexity of occlusion structure by assigning border-ownership of the occluding contours.

Studies of the visual cortex have provided evidence for both kinds of coding. Region labeling was discovered by Victor Lamme (1995) who found that certain neurons fired at a higher rate in response to stimulus elements in a figure region compared to elements in the ground region (Figure 4C). Border-ownership coding was discovered by Zhou, Friedman, and von der Heydt (2000) who noticed that certain neurons responded to borders with different firing rates depending on whether the border was owned by a figure on one side or the other (Figure 4B). To date, it is not clear how these two coding schemes in the visual cortex are related.

This article reviews border-ownership coding in the early visual cortex. Studies since Zhou et al. (2000) have not only revealed how the brain represents figure-ground organization but have also provided insight into the mechanisms of object representation and selective attention. For example, Qiu, Sugihara, and von der Heydt (2007) found that the border-ownership representation emerges independently of attention but provides a structure for object-based attention. O'Herron and von der Heydt found that border-ownership signals persist on the order of a second (2009, 2011) and can be "remapped" across eye movements and with object movements (2013). On average, the border-ownership signal appears to be consistent across multiple types of stimuli, such as various geometric shapes, transparent overlaying figures (Qiu & von der Heydt, 2007), stereoscopic displays (Qiu & von der Heydt, 2005), and dynamic occlusion displays (von der Heydt, Qiu, & He, 2003). Border-ownership also affects shape processing in inferotemporal cortex (IT), as the perceptual demonstrations of Figures 2 and 3 suggest it should. Baylis and Driver, who had previously studied the effect of border-ownership on perception, demonstrated that the responses of contour-shape-selective IT neurons depend on border-ownership (Baylis & Driver, 2001).

While these studies are based on recordings of neurons in monkey visual cortex, the existence of border-ownership-selective neurons has also been demonstrated in human visual cortex. In a psychophysical study, von der Heydt, Macuda, and Qiu (2005) demonstrated a border-ownership-selective tilt aftereffect. Fang, Boyaci, and Kersten (2009) used an adaptation paradigm and fMRI to demonstrate a border-ownership-selective BOLD signal.


Configural cues

Figure 5: Illustration of the stimuli for a neuron that prefers borders oriented at 120°. The mean receptive field size of the V2 neurons recorded by Zhou et al. (2000) (0.7° × 0.4°) is denoted by the red rectangle (not shown to the subject). Within each sub-figure, (A) and (B), all stimuli are identical in the region surrounding the CRF, as shown in the zoomed-in display on the left of each sub-figure. The single-square stimuli were used as the standard test of border-ownership, while the C-shaped and overlapping-square stimuli were shown to some of the neurons for comparison.

Zhou et al. recorded the activity of isolated single neurons in awake, behaving macaques. In the initial experiment, called the standard test, the border of a uniformly colored square was aligned with the receptive field of the neuron and rotated to match the neuron's preferred orientation. The stimulus within the classical receptive field (CRF) was kept constant while the side of the figure was changed by simultaneously reversing the colors of figure and ground, as shown in Figure 5 (compare top and bottom displays). The border-ownership signal of a neuron was defined as the difference between its responses to locally identical stimuli with the figure on the preferred side and on the non-preferred side. For example, for the neuron shown in Figure 4B, the border-ownership signal is the difference between the red and blue lines. Zhou et al. studied visual areas V1, V2, and V4 and found border-ownership-selective neurons in each area (Figure 6). In V2, these made up more than 50% of the orientation-selective neurons that responded to contrast edges (which constitute about 80% of all V2 neurons). In V1, fewer than 20% were border-ownership selective. In V4, the fraction was around 50%, but this is the percentage of neurons that could be activated with figure edges, which were only about half of the cells encountered.
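A minimal sketch of how the four conditions of the standard test could be split into a border-ownership signal and a contrast-polarity effect. It assumes one mean firing rate per combination of figure side (preferred or non-preferred) and local contrast polarity inside the CRF; the function name and the example rates are hypothetical, and the original analysis also assessed statistical significance, which this sketch omits.

```python
from statistics import mean

SIDES = ("pref", "nonpref")
POLARITIES = ("A", "B")


def standard_test_effects(rates):
    """Return (border_ownership_signal, polarity_effect) in spikes/s.

    rates maps (side, polarity) -> mean firing rate. For a fixed polarity,
    the "pref" and "nonpref" displays are identical inside the CRF and differ
    only in which side the figure occupies.
    """
    # Border-ownership signal: preferred minus non-preferred side, averaged over polarities.
    bo_signal = mean(rates[("pref", p)] - rates[("nonpref", p)] for p in POLARITIES)
    # Contrast-polarity effect: polarity A minus polarity B, averaged over sides.
    polarity_effect = mean(rates[(s, "A")] - rates[(s, "B")] for s in SIDES)
    return bo_signal, polarity_effect


# Hypothetical example rates (spikes/s) for one neuron.
example = {
    ("pref", "A"): 30.0, ("nonpref", "A"): 18.0,
    ("pref", "B"): 26.0, ("nonpref", "B"): 14.0,
}
bo, pol = standard_test_effects(example)
print(bo, pol)  # 12.0 4.0
```

Averaging over polarities keeps a polarity-selective neuron (Figure 6) from being mistaken for a border-ownership-selective one, because each difference compares locally identical displays.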

Figure 6: Summary of the results from (Zhou et al., 2000) using the standard (single square) test. Proportions are out of the orientation selective neurons that responded to contrast edges. Neurons that are selective for contrast polarity will fire at different rates when the colors within their CRF alternate (for example, a neuron might prefer one side of the border to be darker than the other).

Because of the large proportion of border-ownership-selective neurons and the extensive receptive field overlap in the early visual areas, many neurons will have receptive fields covering any given piece of a figure's border. Roughly half of these neurons will prefer the figure to be on one side of the border, while the others will prefer the opposite side. The actual side of the figure is then encoded at each location by the ratio of the firing rates of the two pools with opposing side preferences.
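The pooled read-out described above can be sketched as follows, assuming that each pool's rates are simply summed and that the log of the ratio of the two sums serves as a signed estimate of figure side. The summation, the log transform, and the function name are illustrative assumptions, not a decoder proposed in the cited studies.

```python
import math


def decode_border_side(rates_prefer_left, rates_prefer_right):
    """Decode figure side at one border location from two opposing pools.

    Each argument is a list of firing rates (spikes/s) of neurons whose
    receptive fields cover the same piece of border. Returns (side, log_ratio);
    log_ratio > 0 favours a figure on the right.
    """
    total_left = sum(rates_prefer_left)
    total_right = sum(rates_prefer_right)
    if total_left <= 0 or total_right <= 0:
        raise ValueError("both pools need a positive summed rate")
    log_ratio = math.log(total_right / total_left)
    if log_ratio > 0:
        side = "right"
    elif log_ratio < 0:
        side = "left"
    else:
        side = "ambiguous"
    return side, log_ratio


print(decode_border_side([10, 12, 8], [20, 25, 15]))  # ('right', 0.6931471805599453)
```

Using a ratio rather than a difference makes the read-out insensitive to factors that scale both pools equally, such as overall edge contrast.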

In addition to the standard test, some cells were also presented with C-shaped figures and overlapping rectangles, like those shown in Figure 5. Both of these displays elicited significant border-ownership signals in a smaller proportion of cells than the single square did. However, when both the single square and one of the other figures elicited significant border-ownership signals, the signals were nearly always consistent (see Figure 27 of Zhou et al., 2000). For two example neurons, it was also shown that the border-ownership signal was invariant to position in the direction orthogonal to the border.

Figure 7: Stimuli used in (Qiu and von der Heydt, 2007). Border-ownership was determined by the difference in response between the large displays and the inset displays. The border-ownership signal flips in the middle display (with the inset display eliciting a larger response), in agreement with a transparent overlay interpretation.

Extending the neurophysiological approach to a different perceptual situation, Qiu and von der Heydt (2007) showed that the same neurons also code border-ownership according to the perception of transparent overlay. When four squares are arranged as in Figure 7B, one semi-transparent bar appears to overlay another. Under this interpretation, the border in the receptive field (indicated by the red oval) is owned by the region on the left. If the corners are rounded, as in Figure 7C, the transparency percept breaks down, four separate squares are seen, and the border is owned by the region on the right. In agreement with the perceptual interpretation, the average border-ownership signal switched sides.

Zhang and von der Heydt (2010) explored the contribution of individual edges to the border-ownership assignment by using contour-defined squares (akin to the Cornsweet illusion figure) and decomposing the contour into fragments. Fragments on the preferred side-of-figure produced facilitation, while fragments on the opposite side produced suppression. The timing of the contributions of the fragments was similar regardless of their proximity to the CRF.

Motion cues

Motion around visual borders can indicate the side of the occluding figure. When the region on one side of the border, but not the other, moves together with the border, the side with consistent motion is seen as owning the border, while textures or other features of the background disappear or appear at the border. Von der Heydt, Qiu, and He (2003) tested neurons with displays in which moving random-dot patterns defined border-ownership. There was a significant correlation between the border-ownership signals elicited by the standard test and those elicited by the moving dots.

Stereoscopic cues

Random-dot stereograms are paired images in which the forms of objects are visible only when the images are viewed stereoscopically. Von der Heydt, Zhou, and Friedman (2000) used such stereograms to study form processing in the supragranular layers of V1 and in area V2. Both visual areas contain neurons that respond preferentially to surfaces at a specific depth. However, in area V2, but not in V1, neurons were found that responded to the borders of figures defined by stereoscopic depth and were tuned to the orientation of those borders, just as they were tuned to contrast-defined edges. Furthermore, most of these neurons also fired at a higher rate when a specific side of the border (the preferred side) was closer than the other. In other words, border-ownership signals can be elicited by the stereoscopic depth order at an edge.

Another study explored the relationship between the border-ownership signal elicited by a solid colored square and the border-ownership signal elicited by stereoscopic depth (Qiu and von der Heydt, 2005). In area V2, 22% of the neurons (37/174) were selective for border-ownership with the contrast-defined figure (without depth) as well as for border-ownership defined by depth in random-dot stereograms (which are devoid of contrast edges). Of this subset, 81% (30/37) had the same preferred side for both stimuli. This correlation shows that the neurons combine different figure-ground cues in a meaningful way. One cue is stereoscopic depth order, the other cue is the global configuration of edges. At contours of occluding objects in the real world, stereoscopic depth is 'near' on the object side relative to the background side. Thus, the observed correlation shows that the visual system treats a figure on a computer display like a real object occluding a background.

Framework for attention

Figure 8: Attentional modulation and border-ownership selectivity. Population results (n=216) from the second experiment of (Qiu, Sugihara, & von der Heydt, 2007). Red (c, d) indicates that the occluding square is on the neuron's preferred side of border-ownership (depicted as being on the right), while blue (a, b) indicates that the occluding square is on the opposite side. Solid filled bars (b, d) indicate that attention is directed to the preferred side, while patterned bars (a, c) indicate that attention is directed to the opposite side. By definition, \((c+d) \geq (a+b)\). A surprising finding is that the population fires at a significantly higher rate when attention is on the same side as the neuron's preferred side of border-ownership, i.e., \(b > a\) and \(d > c\).

The relationship between selective attention and border-ownership coding was explored by Qiu, Sugihara, and von der Heydt (2007). Their findings suggest that the mechanisms responsible for border-ownership coding provide a structure for object-based attention. They used a shape discrimination task to manipulate selective attention and independently varied border-ownership. While fixating, the monkeys were presented with three figures and were rewarded for correctly discriminating the cued figure as either a rectangle or a trapezoid. One of the figures was cued at the beginning of a block of trials, another figure in the next block, and so on.

The authors discovered that border-ownership coding at any of the figures still occurred when attention was directed elsewhere on the screen. The strength of the border-ownership signals at one figure decreased only slightly when the attention was directed away compared to when attention was at that figure.

Qiu, Sugihara, and von der Heydt (2007) also found an asymmetry in the attention effect. When two overlapping figures are presented, a neuron responding to the occluding border shows response enhancement when the figure on one side is attended compared to when the figure on the other side is attended, irrespective of border-ownership (given by the direction of overlap). Each neuron has its preferred side of attention. Interestingly, the preferred side of attention and the preferred side of border-ownership are correlated. As a result, the responses to the four combinations of overlap and side of attention, averaged over neurons, vary as depicted in Figure 8. The correlation of the attention effect with the border-ownership preference suggests that the same mechanism that gives rise to border-ownership also mediates attentional modulation.
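The two effects summarized in Figure 8 can be expressed as contrasts between sums of the four bars a, b, c, d defined in its caption. A minimal sketch; the function name and the example values are hypothetical and are not the published population means.

```python
def figure8_effects(a, b, c, d):
    """Border-ownership and attention effects from the four conditions of Figure 8.

    a: figure on opposite side, attention on opposite side
    b: figure on opposite side, attention on preferred side
    c: figure on preferred side, attention on opposite side
    d: figure on preferred side, attention on preferred side
    """
    border_ownership_effect = (c + d) - (a + b)  # >= 0 by definition of the preferred side
    attention_effect = (b + d) - (a + c)         # > 0 if attention favours the preferred side
    return border_ownership_effect, attention_effect


# Hypothetical rates (spikes/s), chosen only to illustrate b > a and d > c.
print(figure8_effects(a=10, b=12, c=15, d=18))  # (11, 5)
```

A positive attention effect in this contrast is exactly the asymmetry described above: attending the figure on the neuron's preferred border-ownership side raises the response whichever figure is in front.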

Persistence and remapping of border-ownership signals

Figure 9: Example displays and approximate time course of the population border-ownership signal in the experiments of O'Herron and von der Heydt (2009, 2013). In the figure phase of (A, B), a square is shown with one border aligned with a CRF, such that the square is on the preferred (A) or non-preferred (B) side-of-figure. After 500 ms, the display changes to an ambiguous edge. The border-ownership signal \(\left(A-B\right)\) persists, as seen in the peristimulus time histograms. In (C, D), a square is shown in the figure phase, but its edges are outside the CRF. After 500 ms, the square changes to a single edge with ambiguous ownership, and subsequently the fixation point is moved such that the CRF moves onto the edge. Surprisingly, the border-ownership signal \(\left(C-D\right)\) appears even in this case, where the recorded neuron is never presented with an unambiguous border of the square.

When looking at visual scenes, humans and many other animals do not maintain a fixed eye position but continuously make saccades, several times per second. Even though a large part of our visual system is retinotopically organized, our visual perception remains stable. O'Herron and von der Heydt (2009, 2011) discovered that the border-ownership signals of V2 neurons often persist for over a second after the figure-ground assignment becomes ambiguous (see Figure 9). More interestingly, this persisting border-ownership signal can be remapped across saccades and moves with the ambiguous display if it jumps to a new location (O'Herron & von der Heydt, 2013). These findings show that border-ownership selectivity reflects a mechanism that helps to maintain a stable visual percept.

O'Herron and von der Heydt (2009) aligned an edge of a square with the CRF at the preferred orientation, as shown in Figure 9. A stereoscopic display was used to make the circle appear as a window, with the surrounding region appearing a few centimeters in front of the stimuli within the circular window. The square was presented for 500 ms, and then the display switched to an ambiguous one (Figure 9A) in which the border could be owned by either side. The authors analyzed the border-ownership modulation in the persistence phase (from 200 ms to 1000 ms after the onset of the ambiguous display). Spike counts during this interval showed that persistence varied widely between cells, from no persistence to nearly complete persistence. However, the time course of the population signal showed a slow, steady decay with a time constant of 400 ms.
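To see what a 400 ms time constant implies, the sketch below assumes the population signal decays as a single exponential from the moment the ambiguous display appears. The single-exponential shape is an assumption made for illustration; the text reports only the time constant.

```python
import math


def persisting_fraction(t_ms, tau_ms=400.0):
    """Fraction of the initial population border-ownership signal remaining
    t_ms after the ambiguous display appears, assuming single-exponential decay."""
    return math.exp(-t_ms / tau_ms)


for t in (200, 400, 1000):
    print(t, round(persisting_fraction(t), 2))
# 200 0.61
# 400 0.37
# 1000 0.08
```

Under this assumption, roughly 61% of the signal would remain at the start of the analysis window and about 8% at its end, consistent with a signal that is still measurable for most of a second.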

One possible explanation for this persistence might be that an afterimage is responsible for this effect. This was ruled out by continuously inverting the colors during the figure display at a fast rate before switching to the static phase with the ambiguous display. While the responses would sometimes oscillate in the figure phase, due to selectivity for edge contrast polarity, the border-ownership signal during the persistence phase was virtually identical to that after steady figure presentation. Afterimages, on the other hand, would be significantly reduced by the periodic color inversion.

The border-ownership signal did not depend much on the duration of the figure phase (50, 250, or 500 ms), suggesting that the signal does not accumulate over time. Also, when two figure-ambiguous sequences were presented in succession within a fixation period, the persisting signal in the last phase depended only on the immediately preceding figure display.

A border-ownership signal could also be produced by presenting the ambiguous edge with a few dots with 'far' disparity added on one side. This side is then perceived as background, and the border as owned by the other side. The population border-ownership signal pointed to that side accordingly, and persisted after the dots were removed.

O'Herron and von der Heydt (2009) also showed that the persistence is not the result of attention being attracted by the figure. It would be conceivable that attention would then linger on that side after the square is replaced by an ambiguous edge. This possibility was rejected by adding a second square to the display and having one of the squares, chosen randomly, appear 300 ms before the other. After another 300 ms during which both squares were present, a surface with two circular windows appeared in front that left only one edge of each square visible (one of which was in the CRF of the neuron being recorded). If an automatic shift of attention were responsible for border-ownership persistence, the border-ownership from the first figure should be interrupted by the display of the second figure. However, similar persistence was seen regardless of whether the figure at the CRF was shown first or second.

Neural models and constraints

Figure 10: Simplified diagrams of an example border-ownership coding model for each of the three general classes. The stimulus display (light gray region, bottom) contains a single white square on the right, with another potential square location outlined on the left. The receptive fields of V1 simple cells are depicted on top of the display with red oval outlines. The vividness of the colors indicates the level of spiking activity. Pointed arrowheads between cells indicate excitatory connections, while round arrowheads indicate inhibitory synapses. Arrows within cells indicate the preferred side-of-figure of border-ownership cells. Colors indicate the preferred figure location on the display (blue: left square, red: right square, gray: other position). (A) A feedforward model of border-ownership with modulatory surrounds. The "S" cells provide surrounds for the border-ownership cells and provide either facilitation or suppression depending on the side preference. (B) A lateral propagation model of border-ownership. (C) A grouping-cell feedback model of border-ownership. The "G" cells are grouping cells that are first excited by the borders that match their preferred figure position and size and then modulate the border signals, creating border-ownership selectivity.

As shown in the studies reviewed above, the primate brain has a remarkable ability to calculate border-ownership quickly, even when doing so requires contextual integration over large areas of the visual field. It is a challenge to model how border-ownership coding can be calculated so quickly, considering that the context information is spread out widely in cortex, and neural conduction velocity is limited. Based on the possible neural mechanisms of propagating context signals across the retinotopic cortical representation one can distinguish three general classes of models: feedforward, horizontal, and feedback.

Feedforward models

Many neurons have regions outside their CRF that modulate their responses. These modulatory surrounds can be either suppressive or facilitatory. Walker, Ohzawa, and Freeman (1999) found that the surround regions, measured with grating stimuli, are generally suppressive and often asymmetric about the CRF. Motivated by these findings, Sakai and Nishimura (2006) showed that a model with asymmetric surround regions (a facilitatory region on one side and a suppressive region on the other, Figure 10A), chosen stochastically for each neuron, can account in a statistical sense for the data of Zhou et al. (2000). Supèr, Romeo, and Keil (2010) proposed a feedforward model that uses two stages of concentric center-surround mechanisms to compute first figure-ground modulation and subsequently border-ownership. However, the feedforward models are physiologically implausible. First, the anatomically defined feedforward connections are precisely what defines the CRF, whereas the non-classical surround is mediated by horizontal connections and feedback from higher areas (Angelucci, Levitt, & Lund, 2001). Second, the cited studies of surround modulation cannot explain the large range of the context influence in border-ownership modulation (10 times the extent of the CRF or more). Third, neither of the two model studies addresses the problem of limited conduction velocity and the short latency of border-ownership signals. The virtue of these models is their simplicity, but it is unclear whether they can explain critical findings such as the strong border-ownership signals for displays of transparent overlay (Qiu & von der Heydt, 2007).
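The core idea of an asymmetric-surround model can be sketched for a single cell. This is a toy, not the Sakai and Nishimura (2006) model: it fixes one cell's surround instead of drawing it stochastically, counts sampled edge points of a square as surround input, and uses hypothetical radii and weights throughout.

```python
import math


def square_edge_points(cx, cy, side, n_per_edge=200):
    """Sample points along the four edges of an axis-aligned square centred at (cx, cy)."""
    h = side / 2
    points = []
    for i in range(n_per_edge):
        t = -h + side * i / (n_per_edge - 1)
        points += [(cx - h, cy + t), (cx + h, cy + t),   # vertical edges
                   (cx + t, cy - h), (cx + t, cy + h)]   # horizontal edges
    return points


def asymmetric_surround_response(points, r_crf=0.5, r_surround=3.0,
                                 w_fac=0.002, w_sup=0.002):
    """Toy cell with a vertical edge in its CRF at the origin.

    Edge points in the surround annulus with x > 0 facilitate the response and
    those with x < 0 suppress it, so this cell 'prefers' figures on the right.
    Weights are per sampled edge point (hypothetical values).
    """
    def in_surround(x, y):
        return r_crf < math.hypot(x, y) <= r_surround

    n_fac = sum(1 for x, y in points if in_surround(x, y) and x > 0)
    n_sup = sum(1 for x, y in points if in_surround(x, y) and x < 0)
    crf_drive = 1.0  # the local edge inside the CRF is the same for both displays
    return max(0.0, crf_drive * (1.0 + w_fac * n_fac - w_sup * n_sup))


side = 4.0
right = asymmetric_surround_response(square_edge_points(side / 2, 0.0, side))
left = asymmetric_surround_response(square_edge_points(-side / 2, 0.0, side))
print(right > left)  # True
```

The shared edge lies exactly at x = 0 and therefore drives neither side of the surround; only the top and bottom edges of the square, which extend into the surround on the figure side, produce the difference. This also illustrates the range problem noted above: context farther away than `r_surround` has no effect at all.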

Lateral propagation models

Zhaoping (2005) proposed a model in which border-ownership is calculated within V2, relying on lateral connections (Figure 10B). Local borders are represented by two sets of cells, one for each direction of border assignment. The activity of these cells spreads through lateral connections, providing either enhancement or suppression, depending on the shape of the activating contour. By propagating activity along the representation of the contour, the network assigns border-ownership to the predominantly concave side of the contour. It does this correctly even for stretches of contour where the figure is on the convex side as in the case of a C-shaped figure. Sugihara, Qiu, and von der Heydt (2011) argue that such a model cannot explain the short latency of border-ownership signals because of the large distances the signals would have to travel along the contour representation in cortex and the low conduction velocity of horizontal fibers. They measured the latencies of the border-ownership signals for different sizes of squares and calculated the cortical distance to the nearest point of context information. They found that the recorded latencies did not increase as much as predicted by the model.
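The latency argument against lateral propagation rests on simple arithmetic: if context has to travel a cortical distance at the conduction velocity of horizontal fibers, the delay grows linearly with that distance. The sketch below assumes a 40 ms response onset (the value given for Figure 4A) and an illustrative horizontal conduction velocity of 0.2 m/s; both numbers, and the function name, are assumptions rather than the values used by Sugihara et al. (2011).

```python
def lateral_latency_ms(cortical_distance_mm, onset_ms=40.0, velocity_m_per_s=0.2):
    """Border-ownership latency predicted if context must travel
    cortical_distance_mm along horizontal fibres.

    Note that mm / (m/s) = ms, so at 0.2 m/s each millimetre adds 5 ms.
    """
    return onset_ms + cortical_distance_mm / velocity_m_per_s


# Doubling the cortical distance doubles the propagation delay.
for d in (1.0, 2.0, 4.0, 8.0):
    print(d, round(lateral_latency_ms(d), 1))
# 1.0 45.0
# 2.0 50.0
# 4.0 60.0
# 8.0 80.0
```

Because larger squares place the nearest context farther away in the cortical map, a purely lateral model predicts a steep latency increase with figure size, which is the prediction the recorded latencies did not follow.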

Feedback models

Feedback models of border-ownership coding (Craft, Schütze, Niebur, & von der Heydt, 2007; Jehee, Lamme, & Roelfsema, 2007) rely on higher level areas that have larger receptive fields and modulate the activity in the lower level areas via back projections. Craft et al. proposed a “grouping cell model” in which signals from edge selective cells in V2 are integrated by grouping cells (G) at a higher level. The G cells project back to the same cells they receive input from, facilitating their responses. The G cells have annular integration fields, which makes them most sensitive to compact shapes. For example, when a square figure excites the red set of V2 cells, the corresponding G cell is strongly activated and, by feedback, enhances the responses in the red set, whereas the G cell on the other side receives input only from one edge and is therefore only weakly activated. The feedback makes the V2 cells border-ownership selective.
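The grouping-cell scheme of Figure 10C can be reduced to a few lines. In this sketch, each G cell's activation is the fraction of edges in its annular integration field that are active, and each border-ownership cell at the shared border is multiplicatively enhanced by the G cell on its preferred side. The edge labels, the multiplicative gain, and the function names are hypothetical simplifications, not the Craft et al. (2007) implementation.

```python
# Hypothetical edge labels for the two candidate square positions in Figure 10C.
LEFT_SQUARE = frozenset({"shared", "left_top", "left_bottom", "left_far"})
RIGHT_SQUARE = frozenset({"shared", "right_top", "right_bottom", "right_far"})


def grouping_activation(active_edges, integration_field):
    """Fraction of a G cell's annular integration field driven by active edges."""
    return len(active_edges & integration_field) / len(integration_field)


def shared_border_responses(active_edges, feedback_gain=1.0):
    """Responses of the two border-ownership cells at the shared border."""
    if "shared" not in active_edges:
        return {"prefers_left": 0.0, "prefers_right": 0.0}  # no feedforward drive
    g_left = grouping_activation(active_edges, LEFT_SQUARE)
    g_right = grouping_activation(active_edges, RIGHT_SQUARE)
    feedforward = 1.0  # identical local edge for both cells
    return {
        "prefers_left": feedforward * (1.0 + feedback_gain * g_left),    # fed back by left G cell
        "prefers_right": feedforward * (1.0 + feedback_gain * g_right),  # fed back by right G cell
    }


# Only the right square is displayed, as in Figure 10C.
print(shared_border_responses(set(RIGHT_SQUARE)))
# {'prefers_left': 1.25, 'prefers_right': 2.0}
```

The left G cell sees only the shared edge (one of four), so its feedback is weak, while the right G cell sees the whole square; the resulting response difference is the border-ownership signal.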

This model can explain the large context integration and the short latency of the border-ownership signals, because the grouping cells can be in another cortical area so that the feedback signals would travel through white matter fibers, which conduct about ten times faster than cortical horizontal fibers (Girard, Hupé, & Bullier, 2001). Also, the length of the connections does not increase in direct proportion to the size of the figure representation in V2 cortex, as would the required length of horizontal fiber connections. This explains the relative invariance of the latency with variation of the size of the squares.
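The speed argument can be made concrete with a toy comparison. The sketch assumes an illustrative horizontal velocity of 0.2 m/s, the roughly tenfold faster white-matter conduction stated above, and a feedback loop whose length is held fixed across figure sizes; the fixed loop length (20 mm here) is a hypothetical simplification of "does not increase in direct proportion".

```python
def horizontal_delay_ms(cortical_distance_mm, v_horizontal_m_per_s=0.2):
    """Delay for context to travel along horizontal fibres within V2 (mm / (m/s) = ms)."""
    return cortical_distance_mm / v_horizontal_m_per_s


def feedback_delay_ms(loop_length_mm, v_horizontal_m_per_s=0.2, speedup=10.0):
    """Delay for a V2 -> grouping cell -> V2 loop through white matter,
    assumed to conduct `speedup` times faster than horizontal fibres."""
    return loop_length_mm / (v_horizontal_m_per_s * speedup)


LOOP_MM = 20.0  # hypothetical round-trip length, held fixed across figure sizes
for d in (2.0, 4.0, 8.0):
    print(d, round(horizontal_delay_ms(d), 1), round(feedback_delay_ms(LOOP_MM), 1))
# 2.0 10.0 10.0
# 4.0 20.0 10.0
# 8.0 40.0 10.0
```

Even though the assumed feedback path is ten times longer than the shortest horizontal route, its delay stays constant while the horizontal delay scales with figure size.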

Note that different stimuli may use different types of processing in order to calculate border-ownership. For example, displays in which objects are defined by the configuration of contours may require feedback projections to provide the context information to neurons in V1 and V2, while border-ownership in random-dot stereoscopic displays might be calculated in a feedforward manner.

Another important argument for the grouping cell model is that it can easily be extended to explain selective attention, which is known to spread within objects (Egly, Driver, & Rafal, 1994). Because a single grouping cell can facilitate all the feature neurons connected to it, a top-down attention signal only needs to excite a small cluster of grouping cells to enhance the entire contour of an object (Mihalas, Dong, von der Heydt, & Niebur, 2011). A simple consequence of the connection scheme of Figure 10C is the asymmetry of the attention effect observed by Qiu, Sugihara, and von der Heydt (2007): A given border-ownership cell, for example the red cell in the center of Figure 10C, is facilitated when a corresponding grouping cell is activated, that is, only when a figure on its preferred side of border-ownership is attended. Attention to a figure on the other side does not facilitate the cell, because the grouping cells on the other side project back to the opposing (blue) border-ownership cell.
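The attention asymmetry follows from the wiring alone, as this toy sketch shows: a top-down signal is added to the grouping cell on the attended side, and each border-ownership cell receives feedback only from the grouping cell on its preferred side. The grouping-cell drives, the additive attention input, and the gain are hypothetical choices; this is not the Mihalas et al. (2011) model.

```python
def attended_bo_responses(g_left, g_right, attend=None, attention_gain=0.5):
    """Border-ownership cells at a shared border, given grouping-cell activity.

    attend is None, "left" or "right"; a top-down signal adds attention_gain
    to the grouping cell on that side. Each border-ownership cell receives
    feedback only from the grouping cell on its preferred side.
    """
    if attend not in (None, "left", "right"):
        raise ValueError("attend must be None, 'left' or 'right'")
    g_left += attention_gain if attend == "left" else 0.0
    g_right += attention_gain if attend == "right" else 0.0
    feedforward = 1.0
    return {
        "prefers_left": feedforward * (1.0 + g_left),
        "prefers_right": feedforward * (1.0 + g_right),
    }


# Hypothetical grouping-cell drive with overlapping figures on either side.
baseline = attended_bo_responses(g_left=0.5, g_right=1.0)
attend_right = attended_bo_responses(g_left=0.5, g_right=1.0, attend="right")
print(baseline)      # {'prefers_left': 1.5, 'prefers_right': 2.0}
print(attend_right)  # {'prefers_left': 1.5, 'prefers_right': 2.5}
```

Attending the right figure enhances only the cell that prefers the right side; the opposing cell is unaffected, which reproduces the \(b > a\) and \(d > c\) pattern of Figure 8 in this simplified setting.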

O'Herron & von der Heydt (2013) showed how this model could be extended to explain the remapping of border-ownership signals.


Recommended reading

  • Nakayama, Ken; Shimojo, Shinsuke and Silverman, Gerald H. (1989). Stereoscopic depth: its relation to image segmentation, grouping, and the recognition of occluded objects. Perception 18 (1): 55-68. doi:10.1068/p180055. 
  • Nakayama, Ken; He, Zijiang J. and Shimojo, Shinsuke (1995). Visual surface representation: A critical link between lower-level and higher-level vision. Visual cognition: An invitation to cognitive science 2: 1-70. doi:10.1163/156856893X00135. 
  • Driver, Jon and Baylis, Gordon C. (1996). Edge-Assignment and Figure-Ground Segmentation in Short-Term Visual Matching. Cognitive Psychology 31 (3): 248-306. doi:10.1006/cogp.1996.0018. ISSN 0010-0285. 

See also

Figure-ground perception, Gestalt principles
