Conceptual short term memory


Mary C. Potter, Department of Brain and Cognitive Sciences, MIT, Boston, MA

Conceptual short term memory (CSTM) is a mental buffer in which current stimuli and their associated concepts from long term memory (LTM) are represented briefly, allowing meaningful patterns or structures to be identified (Potter, 1993, 1999). CSTM is different from and complementary to other proposed forms of working memory: it is engaged extremely rapidly, is largely unconscious, and is the basis for the unreflective understanding that is characteristic of everyday experience. The key idea behind CSTM is that most cognitive processing occurs without review or rehearsal of material in standard working memory and with little or no conscious reasoning. When one perceives a meaningful stimulus such as a word, picture, or object, it is rapidly identified and in turn activates associated information from long term memory. New links among concurrently active concepts are formed in CSTM, shaped by parsing mechanisms of language or grouping principles in scene perception and by higher-level knowledge and current goals. The resulting structure represents the gist of a picture or the meaning of a sentence, and it is this structure that may be consolidated into long term memory. Momentarily activated information that is not incorporated into such structures either never becomes conscious or is rapidly forgotten.

This whole cycle--identification of stimuli, memory recruitment, structuring, consolidation in long term memory, and forgetting of nonstructured material--may occur in less than 1 second when viewing a pictured scene or reading a sentence (see Demo 1).

CSTM in relation to other memory systems. CSTM is a processing and memory system that differs from other forms of short term memory. The other forms include early visual (iconic) memory, which maintains a detailed visual representation for up to about 300 ms; the visuospatial sketchpad or visual short term memory, which holds a limited amount of visual information (about 4 items' worth) as long as the information is attended to; and the phonological loop, a sound-based short term memory, which holds a limited amount of recently heard or internally generated auditory information for about 2 s or as long as the items are rehearsed. CSTM differs from these other memory systems in one or more ways: in CSTM, new stimuli are rapidly categorized at a meaningful level, associated material in long term memory is quickly activated, this information is rapidly structured, and information that is not structured or otherwise consolidated is quickly forgotten (or never reaches awareness).

CSTM is required to explain the human ability to understand and act rapidly and seemingly effortlessly, drawing on appropriate knowledge from long term memory. Not all cognitive processing is effortless, however: reasoning, recollection, and planning are slower and more effortful. They draw on other forms of working memory together with CSTM.

Relation to other models. Many models of cognition include some form of processing that relies on persistent activation or memory buffers other than standard working memory, tailored to the particular task being modeled. CSTM may be regarded as a generalized capacity for rapid abstraction, pattern recognition, and inference that is embodied in a more specific form in models such as ACT-R (e.g., Budiu & Anderson, 2004), the construction-integration model of discourse comprehension (Kintsch, 1988), the theory of long-term working memory (Ericsson & Kintsch, 1995), and models of reading comprehension (e.g., Just & Carpenter, 1992; van den Broek, Rapp, & Kendeou, 2005).

Evidence for CSTM

Three interrelated phenomena give evidence for CSTM:

(1) There is rapid access to conceptual (semantic) information about a stimulus and its associations. Conceptual information about a word or a picture is available within 100-300 ms, as shown by experiments using semantic priming (including masked priming and so-called fast priming); eye tracking when reading or looking at pictures; measurement of event-related potentials during reading; and target detection in rapid serial visual presentation (RSVP; Forster, 1970). To detect a target such as an animal name in a stream of words, the target must first be identified (e.g., as the word tiger) and then matched to the target category, an animal name. (Demo 1 shows a similar task with pictures.) Targets can be detected in a stream of non-targets presented at rates of 8-10 items/s or higher, showing that categorical information about a stimulus is activated and then selected extremely rapidly. These and other experimental procedures show that semantic or conceptual characteristics of a stimulus have an effect on performance as early as 100 ms after its onset.

(2) New structures can be discovered or built out of the activated conceptual information, influenced by the observer's task or goal. Viewers can read, understand, and recall an RSVP sentence presented as fast as 12 or more words a second (Forster, 1970; Potter, 1984). In contrast, when viewers read short lists of unrelated words at that rate, they can recall only two or three words. Readers recover not just the syntactic structure of a sentence seen in RSVP but also its meaning and plausibility, which requires retrieval of general knowledge. Because almost all the sentences one normally encounters include new combinations of ideas, perceiving the structure and meaning of a new sentence is not simply a matter of finding a previously encountered pattern in long-term memory. Instead, the reader or listener has to construct a new relationship among existing concepts. The same is true when viewing a new pictured scene: not only must critical objects and the setting be identified, but also the relations among them, that is, the gist of the picture. Organization or structuring of new stimuli enhances memory by providing conceptual chunks that can function as single items in capacity-limited working memory and ultimately in long term memory.

(3) There is rapid forgetting of information that is not structured or that is not selected for further processing. Conceptual information is activated rapidly, but the initial activation is highly unstable and will be deactivated and forgotten within a few hundred ms if it is not incorporated into a structure. As a structure is built--for example, as a sentence is being parsed and interpreted--the resulting interpretation can be held in memory and ultimately stabilized or consolidated in working or long term memory as a unit, whereas only a small part of an unstructured sequence such as a string of unrelated words can be consolidated in the same time period.

Experiments on CSTM

The working of CSTM is most readily revealed when two or more stimuli are presented together or in a rapid sequence, as in RSVP, or when a rich stimulus such as a picture of a scene or a movie clip is presented briefly, without time for deliberation. Such methods were used in the following studies.

Understanding pictures and scenes

Figure 1: Proportions of correct detections of target pictures specified by a title compared with recognition memory for pictures (corrected for guessing) when no target was specified. Presentation time is on a log scale. Adapted from Potter, 1976.

Figure 2: Probability (corrected for chance) of recognizing a picture as a function of relative serial position in the test, separately for a group given pictures in the recognition test and one given titles. Pictures were presented at 6/s and tested with a yes-no test of pictures or just of titles. Adapted from Potter et al., 2004, Figure 4.

In studies in which unrelated photographs are presented in RSVP, viewers can readily detect a picture when given a brief descriptive title such as picnic or two men talking, at rates of presentation up to about 10 pictures/s, even though they have never seen that picture before and an infinite number of different pictures could fit the description (Intraub, 1981; Potter, 1976; see Figure 1). Evidently viewers can extract the conceptual gist of a picture rapidly, retrieving relevant conceptual information about objects and their background from long term memory (e.g., Davenport & Potter, 2004). Having spotted the target picture, viewers can continue to attend to it and consolidate it into working memory--after the sequence they can describe the picnic scene, for example. Yet when viewers are not looking for a particular target, they forget most pictures presented at that rate almost immediately, as shown in Figure 1 and Demo 1. The rate must be slowed to about 2 pictures/s for viewers to recognize as many as half the pictures as familiar shortly after the sequence. However, even at a rate of presentation of 6 pictures/s viewers are usually able to remember most of the pictures if tested for recognition within a second of the end of the sequence (Potter, Staub, Rado, & O'Connor, 2002). That is, they will usually remember the first picture tested, if testing begins immediately; performance drops off rapidly over the first few seconds (Figure 2). Importantly, one sees a similar fall-off in performance when the test is in the form of picture titles, showing that the gist of most pictures was initially represented but then forgotten (Potter, Staub, & O'Connor, 2004). Thus, gist can be extracted rapidly, but may be quickly forgotten without further processing.
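The recognition scores in Figures 1 and 2 are described as corrected for guessing (or chance), but the exact correction is not spelled out here. As a minimal sketch, assuming the common high-threshold correction (hit rate minus false-alarm rate, rescaled by one minus the false-alarm rate), the computation might look like the following. The function name and the example rates are illustrative assumptions, not values from the studies.

```python
def corrected_recognition(hit_rate: float, false_alarm_rate: float) -> float:
    """Correct a yes-no recognition score for guessing.

    Assumes the high-threshold correction (H - FA) / (1 - FA); the studies
    cited in the text may have used a different formula.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    if not 0.0 <= false_alarm_rate < 1.0:
        raise ValueError("false_alarm_rate must be in [0, 1)")
    return (hit_rate - false_alarm_rate) / (1.0 - false_alarm_rate)


# Hypothetical rates: 60% "yes" to old pictures, 20% "yes" to new pictures.
score = corrected_recognition(0.6, 0.2)
print(round(score, 2))  # 0.5
```

Under this assumption, a viewer who says "yes" equally often to old and new pictures receives a corrected score of zero, which is the point of the correction.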

Understanding RSVP sentences

Figure 3: Immediate recall of RSVP lists of 2, 3, 4, 5, or 6 nouns presented at rates between 1 and 12 words/s (adapted from Fig. 2, Potter, 1993).

Differences between lists and sentences. Although 90% of the words of a 12-word RSVP sentence presented at 12 words/s are recalled, that drops to 61% when the same words are presented in a scrambled order (Potter, Kroll, & Harris, 1980). This finding supports the CSTM assumption that each word can be identified and understood even when it is part of a rapid stream of words (see Demo 2). The results also support the CSTM hypothesis that representations of the words remain activated long enough to allow them to be bound into whatever syntactic and conceptual structures can be built as the words appear. When the order of the words is scrambled it is more difficult to discover the structure, showing that word order is used in syntactic parsing and semantic interpretation even at this high rate of presentation. Performance is still worse with lists of unrelated words. With a list of only five words presented in RSVP for 1 s each, about 90% were recalled correctly, whereas when the same words were presented at 12/s, only 52% were recalled (Figure 3; Potter, 1993).

Reading RSVP paragraphs: More evidence for immediate use of structure. Although a single RSVP sentence presented at 12 words/s can be comprehended and recalled, if one continues reading a paragraph at that rate the gist will be remembered, but not individual sentences (Potter, Kroll, & Harris, 1980; see Demo 3). This result, together with those for unordered word strings and single sentences, shows that structuring can occur rapidly, and more structure results in better memory. Nonetheless, rapid conceptual processing is not sufficient for accurate retention if there is no additional time for consolidation: the gist may survive, but details will be lost. Immediate memory for an RSVP paragraph is like long term memory for a more slowly-read paragraph or article when tested hours or days later: the gist is retained despite the loss of details.

Mechanisms of structuring in RSVP sentence processing. In the case of sentences, it is evident that parsing and conceptual interpretation must occur virtually word by word, because any substantial delay would outrun the persistence of unstructured material in CSTM (as happens with lists of unrelated words). A crucial component of sentence understanding is selection of the appropriate meaning of the words, as most words have multiple meanings or senses. The surrounding context, particularly that preceding a given word, is used both to perceive the word correctly and to select the right meaning. Can that be accomplished in the brief time available when reading an RSVP sentence? In one study (Potter, Stiefbold, & Moryadas, 1998) readers were given a task that was equivalent to disambiguating an ambiguous word: they had to select the appropriate word when two words were presented simultaneously in an RSVP sentence. The task was to report the whole sentence, including the correct word of the pair; an example follows.

Figure 4:

The words of the sentence were presented sequentially for 133 ms each; the word pair was presented for 83 ms. Readers had little difficulty picking the right word (showing that both words had been seen). Although they were asked to report the other word after they reported the sentence, they were rarely able to do so. Evidently the selection could be made during presentation, but the word that did not match the semantic structure of the sentence was quickly forgotten.
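The timing in this study can be made concrete with a little arithmetic. The sketch below converts a presentation rate into a per-item duration and lays out onset times for a short sequence. Only the 133 ms and 83 ms durations come from the text; the four-item sequence and the position of the word pair are hypothetical, chosen purely for illustration.

```python
def item_duration_ms(rate_per_s: float) -> float:
    """Duration of each item, in ms, for a given RSVP rate in items per second."""
    if rate_per_s <= 0:
        raise ValueError("rate_per_s must be positive")
    return 1000.0 / rate_per_s


def rsvp_onsets_ms(durations_ms):
    """Return (onset times, total duration) for items shown back to back."""
    onsets = []
    elapsed = 0.0
    for duration in durations_ms:
        onsets.append(elapsed)
        elapsed += duration
    return onsets, elapsed


# 12 words/s corresponds to roughly 83 ms per word.
print(round(item_duration_ms(12), 1))  # 83.3

# Hypothetical sequence: two words, then the word pair (83 ms), then one word.
onsets, total = rsvp_onsets_ms([133, 133, 83, 133])
print(onsets)  # [0.0, 133.0, 266.0, 349.0]
print(total)   # 482.0
```

On these numbers, the 133 ms word duration corresponds to about 7.5 words/s, and the 83 ms word pair is on screen for about as long as a single word at 12 words/s.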

Selective search and the attentional blink

As already noted, the ability to detect a categorically specified target in a rapidly presented sequence supports the CSTM assumption that a stimulus can be identified rapidly. Curiously, however, a second target is often missed if it appears within 500 ms of the first target, an effect termed an attentional blink. Given that a continuous stream of items such as the words of a sentence may be easy to see and remember, the attentional blink is surprising. When there is an uninterrupted sequence of several targets, as happens when a sentence is presented and recalled as a whole, there is no attentional blink, whereas if the task is to report the two words that are marked by color or by case, there is an attentional blink (Potter, Nieuwenstein, & Strohminger, 2008). There is both behavioral and event-related potential evidence that stimuli that are not reported because of an attentional blink are nonetheless momentarily comprehended, because they activate an ERP mismatch marker when they are inconsistent with prior context (Luck, Vogel, & Shapiro, 1996). Similarly, word targets that are related in meaning are more accurately detected even when the second word occurs within the time period that produces an attentional blink (e.g., Potter, Dell'Acqua, et al., 2005).

Further questions about CSTM

How does structuring occur in CSTM?

Structuring in CSTM is not different in principle from individual steps in the slower processes of comprehension that happen as one gradually understands a difficult text or an initially confusing picture, or solves a chess problem over a period of seconds and minutes. But CSTM structuring occurs with a relative absence of awareness that alternatives have been weighed and that several possibilities have been considered and rejected, at least implicitly. As in slower and more conscious problem solving, a viewer's task set or goal makes a major difference in what happens in CSTM, because one's intentions activate processing routines such as sentence-parsing, target specifications in search tasks, and the like. Thus the goal partially determines what enters CSTM and how structuring takes place. Working memory as it is generally understood comes into play when a first pass in CSTM does not meet one's goal. Then, more conscious thought is required, drawing on working memory together with continued CSTM processing.

Compound cuing and latent semantic analysis (LSA). The presence of many activated items in CSTM at any moment allows for compound cuing--the convergence of two or more weak associations on an item. The power of converging cues, familiar to any crossword puzzle fan, is likely to be central to structure-building in CSTM. A radical proposal for the acquisition and representation of knowledge, latent semantic analysis (Landauer & Dumais, 1997; see Scholarpedia entry), provides a suggestive model for how structure may be extracted from loosely related material. However, there is no syntactic parser in latent semantic analysis, and it is clear from RSVP research that we do parse rapidly presented sentences as we read (e.g., Potter, Stiefbold, & Moryadas, 1998); thus, the latent semantic analysis approach is at best a partial model of processing in CSTM.
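The idea of converging weak cues can be illustrated with a toy computation. This is only a sketch under loud assumptions: the association table, its strengths, and the simple additive combination rule are all hypothetical, and nothing here is claimed to be the mechanism of CSTM or of any published compound-cue model.

```python
from collections import defaultdict

# Hypothetical cue -> candidate association strengths (illustrative values only).
ASSOCIATIONS = {
    "stripes": {"zebra": 0.3, "tiger": 0.3, "flag": 0.3},
    "jungle": {"tiger": 0.3, "monkey": 0.3, "vine": 0.3},
}


def converge(cues, associations):
    """Sum the association strengths that each active cue lends to each candidate.

    Additive combination is an assumption made for simplicity.
    """
    totals = defaultdict(float)
    for cue in cues:
        for candidate, strength in associations.get(cue, {}).items():
            totals[candidate] += strength
    return dict(totals)


def best_candidate(cues, associations):
    """Return the candidate with the highest summed activation."""
    totals = converge(cues, associations)
    return max(totals, key=totals.get)


# Neither cue alone favors any candidate, but together they converge on one.
print(best_candidate(["stripes", "jungle"], ASSOCIATIONS))  # tiger
```

Each cue on its own is equally weakly associated with three candidates; only the item on which both cues converge stands out, which is the crossword-style effect described above.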

Is CSTM conscious?

The question is difficult to answer, because we have no clear independent criterion for consciousness other than availability for report. And, by hypothesis, report requires some form of consolidation; therefore, only what persists in a structured form will be reportable. Thus, while the evidence we have reviewed demonstrates that there is conceptual processing of material that is subsequently forgotten, it does not tell us whether we were briefly conscious of that material, or whether the activation and selection occurred unconsciously.

It seems unlikely, however, that multiple competing concepts (such as the multiple meanings of a word) that become active simultaneously could all be conscious in the ordinary sense, although preliminary structures or interpretations that are quickly discarded might be conscious. For example, people do sometimes become aware of having momentarily considered an interpretation of a spoken word that turns out to be mistaken. In viewing rapid pictures, people have a sense of recognizing all the pictures but forgetting most of them. But such experiences seem to be the exception, rather than the rule. Thus, much of CSTM activation, selection, and structuring happens before one becomes aware. Typically, it is the structured result that one is aware of, which is why perception and cognition seem so effortless and accurate.

Summary: Rapid conceptual processing followed by rapid forgetting

In each of the experimental domains discussed--comprehension and retention of RSVP word lists, sentences, and paragraphs; studies of word perception and selection; experiments on picture perception and memory; and the attentional blink--there is evidence for comprehension of the meaning or meanings of a stimulus early in processing (possibly before conscious awareness), followed by rapid forgetting unless conditions are favorable for retention. The two kinds of favorable conditions examined in these studies were selection for attention (e.g., the first target in the attentional blink procedure, and selection of a target picture from among rapidly presented pictures) and the availability of associations or meaningful relations between momentarily active items (as in sentence and paragraph comprehension and in word perception, selection, or disambiguation as a sentence is processed). The power of these two factors--selective attention that is defined by conceptual properties of the target, and the presence of potential conceptual structure--is felt early in processing, before conventional STM or working memory for the stimuli has been established, justifying the claim that CSTM is separate from STM and the usual definition of working memory.

References

Budiu, R., & Anderson, J. R. (2004). Interpretation-based processing: A unified theory of semantic sentence processing. Cognitive Science, 28, 1-44.
Ericsson, K. A., & Kintsch, W. (1995). Long-term working memory. Psychological Review, 102, 211-245.
Intraub, H. (1981). Rapid conceptual identification of sequentially presented pictures. Journal of Experimental Psychology: Human Perception and Performance, 7, 604-610.
Just, M. A., & Carpenter, P. A. (1992). A capacity theory of comprehension: Individual differences in working memory. Psychological Review, 99, 122-149.
Kintsch, W. (1988). The role of knowledge in discourse comprehension: A construction-integration model. Psychological Review, 95, 163-183.
Luck, S. J., Vogel, E. K., & Shapiro, K. L. (1996). Word meanings can be accessed but not reported during the attentional blink. Nature, 383, 616-618.
Potter, M. C. (1976). Short-term conceptual memory for pictures. Journal of Experimental Psychology: Human Learning and Memory, 2, 509-522.
Potter, M. C. (1993). Very short-term conceptual memory. Memory & Cognition, 21, 156-161.
Potter, M. C. (1999). Understanding sentences and scenes: The role of Conceptual Short Term Memory. In V. Coltheart (Ed.), Fleeting memories: Cognition of brief visual stimuli (pp. 13-46). Cambridge, MA: MIT Press.
Potter, M. C., Dell'Acqua, R., Pesciarelli, F., Job, R., Peressotti, F., & O'Connor, D. H. (2005). Bidirectional semantic priming in the attentional blink. Psychonomic Bulletin & Review, 12, 460-465.
Potter, M. C., Kroll, J. F., & Harris, C. (1980). Comprehension and memory in rapid sequential reading. In R. Nickerson (Ed.), Attention and Performance VIII (pp. 395-418). Hillsdale, NJ: Erlbaum.
Potter, M. C., Nieuwenstein, M. R., & Strohminger, N. (2008). Whole report versus partial report in RSVP sentences. Journal of Memory and Language, 58, 907-915.
Potter, M. C., Staub, A., & O'Connor, D. H. (2004). Pictorial and conceptual representation of glimpsed pictures. Journal of Experimental Psychology: Human Perception and Performance, 30, 478-489.
Potter, M. C., Stiefbold, D., & Moryadas, A. (1998). Word selection in reading sentences: Preceding versus following contexts. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24, 68-100.
