Talk:Figure-ground perception
From Scholarpedia
Referee A
General Comments
This is a useful contribution to Scholarpedia. I liked it very much as far as it goes, and appreciate the fact that Mary Peterson is a leading expert in this area who has contributed many interesting ideas and facts to it. However, as someone who has also worked on figure-ground perception for a long time, I was left feeling that much more is known about figure-ground processes than the authors indicate, and that quite a few references to other authors could profitably be included. Where to draw the line is a matter of personal judgment. Although I do not feel that any of the following comments MUST be incorporated in a further revision, I would be very disappointed if they were not, since their inclusion would lead readers to explanations of phenomena that are not sufficiently indicated in the current version of the article.
I organize my comments with respect to specific sentences in the article. I indicate current sentences in the text in BLACK, proposed new text and references in BLUE, and proposed deletions in RED.
1. On p. 3, just above the section on Non-classical geometric configural properties:
Thus, it is unclear whether responses to these configural properties per se are innate, or whether a sophisticated learning mechanism has evolved that allows humans to extract the statistical properties of the environment in which they live (and configural cues are among those properties). It has been shown that development of grouping networks in response to simple statistical properties of the environment may be sufficient to generate figure-ground percepts, indeed bistable 3D percepts such as the Necker cube (Grossberg & Swaminathan, 2004).
NEW REFERENCE:
Grossberg, S., and Swaminathan, G. (2004). A laminar cortical model for 3D perception of slanted and curved surfaces and of 2D images: development, attention and bistability. Vision Research, 44, 1147-1187.
2. On p. 4, paragraph above the section on Depth cues:
The configural cues are shape cues; they determine where the shape lies with respect to an edge. But recall that the region complementary to the figure is often perceived to complete behind it. The perceptual completion of the ground has not received much attention in the study of figure-ground perception per se. It is possible that at least some of the configural properties may convey depth information as well as shape information (Burge et al. 2005; Grossberg, 1994; Kanizsa, 1985; Nakayama, Shimojo, & Silverman, 1989), and that perceptual completion may be most compelling when those cues are present (see, for instance, Peterson & Salvagio 2008).
NEW REFERENCES:
Kanizsa, G. (1985). Organization in vision: Essays in Gestalt perception. New York: Praeger.
Nakayama, K., Shimojo, S., & Silverman, G.H. (1989). Stereoscopic depth: Its relation to image segmentation, grouping, and the recognition of occluded objects. Perception, 18, 55-68.
3. First paragraph after the header Depth cues:
The region that appears shaped also tends to appear closer (although this relationship does not always hold, e.g., Palmer 1999; Peterson 2003). Depth cues determine which of two contiguous regions is closer to the viewer even in the absence of the classic configural cues. Closer regions tend to be shaped by the edges they share with contiguous regions in the visual input, and the latter typically appear to continue behind as backgrounds. There are ample empirical investigations of the depth cues: for instance, research investigates the ranges over which different depth cues are most effective (e.g., Cutting & Vishton 1995) and the rules by which depth cues combine (e.g., Landy et al. 1995). More research is needed to investigate Very little research investigates how configural cues and depth cues interact (but see Bertamini, Martinovic, & Wuerger, 2008; Burge et al. 2005; Dresp et al. 2002; Egusa, 1983; Peterson & Gibson 1993). Such research is needed for a full understanding of figure-ground perception.
NEW REFERENCES:
Dresp, B., Durand, S., and Grossberg, S. (2002). Depth perception from pairs of overlapping cues in pictorial displays. Spatial Vision, 15, 255-276.
Egusa, H. (1983). Effects of brightness, hue, and saturation on perceived depth between adjacent regions in the visual field. Perception, 12, 167-175.
4. Section on Subjective factors:
Subjective factors can also influence figure assignment. For instance, the viewer’s intention to perceive one of two contiguous regions as figure affects figure-ground perception (e.g., Peterson et al. 1991). And regions at which the viewer is looking (fixated regions) are more likely to be seen as figures than adjacent un-fixated regions (Peterson & Gibson 1994). Similarly, an attended region is more likely to be seen as figure than the complementary unattended region, even without fixation (Baylis & Driver 1995; Vecera et al. 2002). Neural models have proposed how attention can influence which bistable percept is seen during percepts of the Necker cube (Grossberg & Swaminathan, 2004) and of bistable transparency (Grossberg & Yazdanbakhsh, 2005). These models predict how attention makes a figural surface look closer, and why there may be a change of brightness when there is a change of figure and its perceived depth, as noted by Tse 2005. Subjective factors can alter the likelihood of seeing the figure on one side of an edge, but typically they tend not to overpower configural cues.
NEW REFERENCES:
Grossberg, S. and Yazdanbakhsh, A. (2005). Laminar cortical dynamics of 3D surface perception: Stratification, transparency, and neon color spreading. Vision Research, 45, 1725-1743.
Tse, P. (2005). Voluntary attention modulates the brightness of overlapping transparent surfaces. Vision Research, 45, 1095-1098.
5. Section on Spatial frequency:
A region filled with a high spatial frequency pattern is more likely to be seen as the shaped figure than a contiguous region filled with a low spatial frequency pattern (see Figure 5; Klymenko & Weisstein 1986). Neural models propose how such a percept occurs and how it influences bistability, as when the Rubin’s vase-faces stimulus of Figure 6 is composed of two regions with different spatial frequencies, or when horizontal bands of high and low spatial frequency sinusoids alternate one above the other (Brown & Weisstein 1988; Grossberg 1994; Klymenko & Weisstein, 1986).
NEW REFERENCES:
Brown, J.M. & Weisstein, N. (1988). A spatial frequency effect on perceived depth. Perception & Psychophysics, 44, 157-166.
Klymenko, V. & Weisstein, N. (1986). Spatial frequency differences can determine figure-ground organization. Journal of Experimental Psychology: Human Perception & Performance, 12, 324-330.
6. Section on Extremal edges:
An extremal edge (EE) is a self-occluding edge. When shading and texture gradients are used to depict an extremal edge along one side of a border but not the other, observers show a strong bias to report seeing the EE side as nearer than the non-EE side (Palmer & Ghose 2008). A sample is shown in Figure 5b, where the extremal edge lies on the left side of the central border. One possible way to think about this percept is by using a A non-shape based likelihood principle may underlie this bias, in that from many viewpoints the input array is consistent with the interpretation that the EE side is closer, whereas it is consistent with the interpretation that the non-EE side is closer from only one viewpoint. On the other hand, Grossberg & Mingolla (1987) have explained such a percept of 3D shape using a property of perceptual grouping that they call a boundary web. A boundary web is a form-sensitive plexus of amodal emergent boundaries that selectively captures the filling-in of surface brightnesses at multiple depths. This concept has been used to quantitatively explain data about such percepts as 3D shape-from-texture (Grossberg et al., 2007) and 3D shape from the watercolor illusion (Pinna & Grossberg, 2005), which we now discuss.
NEW REFERENCES:
Grossberg, S. and Mingolla, E. (1987). Neural dynamics of surface perception: Boundary webs, illuminants, and shape-from-shading. Computer Vision, Graphics, and Image Processing, 37, 116-165.
Grossberg, S., Kuhlmann, L., and Mingolla, E. (2007). A neural model of 3D shape-from-texture: Multiple-scale filtering, boundary grouping, and surface filling-in. Vision Research, 47, 634-672.
Pinna, B. and Grossberg, S. (2005). The watercolor illusion and neon color spreading: A unified analysis of new cases and neural mechanisms. Journal of the Optical Society of America A, 22, 2207-2221.
7. Section on Watercolor Effect illusion:
Consider a region bounded by two thin colored lines that are parallel to and touching each other. One of the colored lines contrasts less with the background than the other. Pinna, Brelstaff & Spillmann (2001) and Pinna, Werner & Spillmann (2003) showed that under these conditions the low contrast color spreads orthogonally from the line and fills the bounded region; they called this phenomenon the “Watercolor Illusion.” They showed that the region through which color spreads is more likely to be seen as the figure than it would be without the color. Key properties of the watercolor illusion have been explained using mechanisms of figure-ground perception that have also been used to explain other figure-ground percepts, such as 3D neon color spreading (Pinna & Grossberg, 2005). Not much is known about the Watercolor Effect as a figural cue; unlike other figural cues, it has not been examined in isolation, it has always interacted with one or more of the other figural cues.
NEW REFERENCE:
Pinna, B., Brelstaff, G. & Spillmann, L. (2001). Surface color from boundaries: a new “watercolor” illusion. Vision Research, 41, 2669-2676.
8. Section on How does figure-ground perception occur?:
Second paragraph:
Experimental work investigating the competition is relatively new (but see Dresp et al. 2002). More studies directed to uncovering the nature of the competition are necessary. For instance, experiments investigating how depth cues alter the between-shape competition are needed to elucidate the mechanisms of figure-ground perception.
9. Section on Open questions:
Second paragraph:
2. Edges separate surfaces in touch as well as in vision, and figure-ground perception occurs (Kennedy 1993). There are analogous percepts in hearing, taste, and smell as well. Do the same mechanisms produce figure-ground perception across the senses? As illustrated by auditory streaming and speech categorization, different grouping mechanisms seem to have evolved to deal with audition and vision in general (e.g., Bregman, 1990; Grossberg, 2003), although perhaps the greatest similarities occur between visual apparent motion processing and auditory streaming (Bregman, 1990; Gjerdingen, 1994).
NEW REFERENCES:
Bregman, A.S. (1990). Auditory scene analysis: The perceptual organization of sound. Cambridge, MA: MIT Press.
Gjerdingen, R.O. (1994). Apparent motion in music? Music Perception, 11, 335-370.
Grossberg, S. (2003). Resonant neural dynamics of speech perception. Journal of Phonetics, 31, 423-445.
Third paragraph:
3. There are many other types of ambiguous figures, including binocular rivalry stimuli and reversible stimuli like the duck/rabbit or Necker cube. Interactions between cooperative and competitive mechanisms have been proposed to play a key role in explaining many Competitive mechanisms have been proposed to account for all of these reversals. For example, Grossberg et al. 2008 have explained a large body of binocular rivalry data using the type of perceptual grouping mechanisms that play a key role in figure-ground perception. These mechanisms include long-range cooperation whereby groupings start to form, competitive mechanisms that prune and decide among possible groupings, and habituative mechanisms that enable active groupings to weaken in an activity-dependent way and thereby lead to a perceptual switch. A major goal of ongoing and future research is to clarify how such combinations of mechanisms may explain a wide range of data about percepts of Can comparisons of the different types of ambiguous stimuli. shed light on where and how competitive mechanisms operate in the brain?
NEW REFERENCE:
Grossberg, S., Yazdanbakhsh, A., Cao, Y., and Swaminathan, G. (2008). How does binocular rivalry emerge from cortical mechanisms of 3-D vision? Vision Research, 48, 2232-2250.
Reviewer B
Second review
This is my review of the revision.
1. I still think Figure 1 looks strange. I am not sure what the authors are trying to do with this figure. Why can't it be more compact and regular?
2. Under Contents the organisation seems wrong. Surely 1 should be a heading "Configural cues" not "Factors that affect figure assignment" which is the superordinate heading of the whole section. Then the subject headings under 1 make sense; 1.1 Classical configural cues and 1.2 Non-classical configural cues.
3. On page 3 the statement is made that "the Gestalt Psychologists held that these cues were innate". It is essential to give a reference here. I do not believe that Rubin thought this. There may be such a statement in Kohler or Koffka but it should be specifically cited in the light of the specific warnings by Kanizsa in "Organisation in Vision" (1979) and by Mitchell Ash in "Gestalt Psychology in German Culture" (1995) against the assumption that the gestaltists supported an innate position.
4. Figure 3. Given the point on page 2 about the importance of having the same contrast of the two regions relative to the backdrop, the authors should check on the contrasts in this figure. The seahorse does not look as if it has equal contrasts for black and white regions. This may be an illusion of course.
5. On page 3 the authors have not made clear the point I suggested in my previous review should be dropped. What does it mean to say that "past experience is not always instantiated as geometric relationships"? Do they mean suggestion? Again what do they refer to when they say that, "it may not be the case that all forms of past experience influence figure assignment but only those that are embodied geometrically"? Do they refer to verbal experience? This whole notion is still quite unclear.
6. Page 5. Under extremal edges the authors say that from many viewpoints the input array is consistent with the interpretation that the EE is closer whereas it is consistent with the interpretation that the non-EE side is closer from only one viewpoint. Surely this is not true. The non-EE edge could be nearer and overlapping with the region with the EE edge and this will be the case for many viewpoints.
7. Also on page 5 under "Ambiguous figure-ground perception" the authors say that familiarity favors perceiving the black regions as figures. I do not see why the sideways faces are more familiar than the vase.
Old-fashioned review as requested. (Original)
I think this is a very suitable topic for Scholarpedia and on the whole very well done. I have the following suggestions.
P 1.
Paragr 1. The first sentence seems a bit awkward. The word "adjacent", which means "near", does not seem correct. "Abutting" is the correct word, but "contiguous" or "adjoining" would also be better. I suggest, "For two abutting regions of the visual field, the usual perceptual outcome is that the common edge appears to be the boundary for only one region - the figure - and this region appears to have a definite shape."
I think the third sentence should say "nearer" rather than "closer to the viewer". I suggest "Thus in addition to being shaped, the figure appears nearer than the ground, part of which appears occluded by the figure." This adds a little more and slightly different emphasis to the property of ground that it goes behind figure.
I suggest omitting the sentence "Figures constitute the two-dimensional shapes we perceive and the three-dimensional objects with which we interact." Apart from implying that we do not perceive three-dimensional shapes or interact with two-dimensional ones, it raises questions that the review does not address about the way the typical 2-D figure-ground stimuli map onto real world situations. For example, surely most three-dimensional objects, e.g. a ball, could never be ground. Better to omit the sentence. It does not in my opinion add anything useful.
The figure chosen for Figure 1 does not seem optimal - the handles make it look a bit strange and distract from the main figure.
Paragr 2. I suggest leaving out "as does grouping". The article is not about grouping and does not explain what the term refers to.
P2 Paragr 1. I am not sure that it is correct to identify the Rubin figure-ground determinants with the "gestalt principles". Some of them are but not all. Rubin's work was certainly taken up by the Gestalt psychologists so a mention is fine (line 1) but it would probably be better to omit the reference identifying the "classical cues" with Gestalt Principles (line 8).
Paragr 2. The first sentence would read better as "regions that are convex ... are more likely to be seen as figures than contiguous regions that are ..."
P3. Under the heading "Non-classical geometric configural properties"
Paragr 2. I did not understand what the last sentence was getting at: "These effects necessarily depend on ... yet they can be characterised geometrically in the spatial relationship between the parts on one side of an edge; the configuration of parts is familiar in a particular orientation". The sentence seems to imply that familiarity can be characterised geometrically in a general sense. This cannot be what is meant, but if it means that a particular familiar configuration can be specified geometrically, that is surely too obvious to spell out. On the other hand, the sentence may mean something else altogether. If so, it needs to be clearer. I would omit it.
Under the heading "Depth Cues", line 7. After Burge et al. (2005) and Peterson and Gibson (1993) are quoted, you may consider adding a recent relevant paper by Gillam et al. (2009): Gillam, B.J., Anderson, B.L., & Rizwi, F. (2009). Failure of facial configural cues to alter metric stereoscopic depth. Journal of Vision, 9(1):3, 1-5.
p5. In the example of the Rubin figure given it is very difficult to see either alternative as shapeless. The open questions were good.
