2 Perceptual Disfluency and Recognition Memory: A Response Time Distributional Analysis
We live in a world that, even as adults, is “blooming and buzzing with confusion” (James, 1890, p. 488). Despite this, we possess the remarkable ability to accomplish perceptual tasks, such as deciphering unclear and unsegmented handwritten cursive words or actively participating in conversations amid the chaos of a noisy bar. At the interface between education and cognitive psychology, decades of work have demonstrated a relationship between encoding difficulty and long-term memory. While making something harder rather than easier runs counter to most people’s beliefs, a host of memory and learning effects show that making encoding more effortful (and thus more errorful) can, under certain circumstances, be beneficial for memory. This has popularly become known as the “desirable difficulties” principle (Bjork & Bjork, 2011). Well-documented examples of desirable difficulties include spacing encoding across multiple sessions rather than massing study in a single session (see Carpenter et al., 2022), studying concepts in an interleaved rather than a blocked fashion (Rohrer & Taylor, 2007), and generating or retrieving information rather than simply re-reading or studying it again (Roediger & Karpicke, 2006; Slamecka & Graf, 1978).
Another desirable difficulty involves a very simple manipulation: changing the perceptual characteristics of to-be-learned material to make it more difficult to process. A growing literature has shown that manipulating the perceptual characteristics of to-be-learned material at encoding can improve memory (e.g., Geller et al., 2018; Geller & Peterson, 2021; Halamish, 2018; Rosner et al., 2015). The resulting memory benefit has been called the perceptual disfluency effect (Geller et al., 2018).
2.1 The perceptual disfluency effect
The relationship between perceptual disfluency and long-term memory has a long and storied history. While it is not entirely clear where the term perceptual disfluency effect originated, the idea behind it goes back to the late 1980s with the work of Nairne (1988). Under the term perceptual-interference effect, Nairne used backward masking with hash marks (e.g., ####) and a brief presentation time to make word encoding noisier during study. Because the word is presented and masked so quickly, considerable effort is needed to identify it. Since then, different types of perceptual disfluency manipulations have been shown to elicit a similar effect, such as high-level blurring (Rosner et al., 2015), word inversion (Sungkhasettee et al., 2011), small text size (Halamish, 2018), handwritten cursive (Geller et al., 2018), and other unusual or difficult-to-read typefaces (Geller & Peterson, 2021; Weissgerber & Reinhard, 2017; Weltman & Eakin, 2014).
Given the simplicity and ease with which perceptual disfluency can be implemented, it is not surprising that researchers began touting the educational implications of such a manipulation. Perceptual disfluency as a possible educational intervention started to garner more support with the publication of Diemand-Yauman et al. (2011). Across two experiments, Diemand-Yauman et al. (2011) showed that placing learning materials in disfluent typefaces (e.g., Comic Sans, Bodoni MT, Haettenschweiler, Monotype Corsiva) improved memory both in the lab (when learning about space aliens) and in a high school classroom where students learned about a variety of content areas (i.e., AP English, Honors English, Honors Physics, Regular Physics, Honors U.S. History, and Honors Chemistry) from PowerPoint slides presented in difficult-to-read typefaces.
Unfortunately, the replicability of effects related to perceptual disfluency has come under scrutiny. A recent case in point is the typeface Sans Forgetica, developed through a collaboration among marketers, psychologists, and graphic designers. It was originally launched with the strong claim that it enhances memory retention because its backward-slanting letters and within-letter gaps require individuals to ‘generate’ the missing parts of each word. However, several subsequent studies have failed to replicate these claims, finding Sans Forgetica to be forgettable (Cushing & Bodner, 2022; Geller et al., 2020; Huff et al., 2022; Roberts et al., 2023; Taylor et al., 2020; Wetzler et al., 2021). Similar null results have been reported for various other perceptual disfluency manipulations, such as small font sizes (Rhodes & Castel, 2008), difficult-to-hear stimuli (Rhodes & Castel, 2009), minor blurring (Yue et al., 2013), and alternative typefaces (Rummer et al., 2015).
Due to these mixed findings, researchers began exploring boundary conditions of the disfluency effect. Testing the effects of disfluency in the presence of other variables is key to its usefulness as an educational intervention. Geller et al. (2018), for instance, found that the level of disfluency (more vs. less disfluent) mattered. Using easy-to-read and hard-to-read handwritten cursive words, Geller et al. (2018) showed that there is a Goldilocks zone for perceptual disfluency effects. That is, stimuli cannot be too easy to read (e.g., printed words) or too hard to read (hard-to-read cursive). Only when the stimulus was moderately difficult (easy-to-read cursive), or just right, did memory improve. In another paper, Geller & Peterson (2021) demonstrated that memory benefits for disfluent stimuli are more robust when test expectancy is low. That is, a disfluency effect is only seen when participants are not told about an upcoming memory test. The authors reasoned that knowing about a memory test engages a strategy whereby all stimuli are processed to a deeper level, regardless of how perceptually disfluent they are, which countervails any benefit disfluency has on memory. Additionally, a few studies have noted the importance of individual differences. Eskenazi & Nix (2021), for example, showed that better spellers remembered more words and meanings than poor spellers when the words were presented in a disfluent font.
Though perceptual disfluency effects can arise under certain conditions, their usefulness in educational settings, where students anticipate tests, could be limited. However, Geller and Peterson (2021) contended that perceptual disfluency still has practical implications, particularly given our reliance on incidental memory in everyday decision-making. For disfluency to be leveraged in this way, however, being able to predict when and where the effect will occur is crucial.
2.2 Theoretical accounts of the disfluency effect
To achieve this aim, we require a better understanding of the mechanisms involved in eliciting the disfluency effect. Several theories have been proposed to explain the phenomenon, and Geller et al. (2018) provided a review of two of them. The metacognition account of disfluency (Alter, 2013) posits that disfluency acts as a signal to engage in more System 2 thinking (Kahneman, 2011), or deeper levels of processing (Craik & Lockhart, 1972). Within this account, disfluency arises after the stimulus has been identified. As a result, the type of disfluency experienced does not matter.
The compensatory processing account (Mulligan, 1996) suggests that the disfluency effect results from increased top-down processing from lexical and semantic levels of representation. This framework is largely based on the popular interactive activation (IA) model of word recognition (McClelland & Rumelhart, 1981). In the IA model, visual input activates representations at three levels: the feature, letter, and word levels. Activation in the model flows both forward (from features to letters to words) and backward (from words to letters to features). Thus, when there is perceptual noise (such as that introduced by a mask), there is increased top-down processing from higher, lexical or semantic, levels to aid word identification. It is this increased top-down processing that produces better memory.
More recently, Ptok et al. (2019) put forth a limited-capacity, stage-specific model to explain conflict-encoding effects like the perceptual disfluency effect. Within their model, memory effects arising from encoding conflict depend on (1) the stage or level of processing tapped by the task and (2) metacognitive processes that include monitoring and control. Across six experiments, Ptok et al. demonstrated better recognition memory for target words shown with incongruent versus congruent semantic distractor words (i.e., category labels of size, animacy, or gender; e.g., “Chair - Alive” vs. “Chair - Inanimate”), but no memory benefit for incongruent versus congruent response distractor words (e.g., Lisa - “left” vs. Lisa - “right”). While both tasks produced conflict, as evinced by longer RTs to targets preceded by incongruent primes, a memory benefit arose only when the encoding task focused attention on target meaning (i.e., semantic categorization) rather than response selection. In a follow-up set of experiments, Ptok et al. (2020) replicated this pattern and, in addition, provided physiological evidence from cognitive pupillometry (i.e., the study of pupil size and how it relates to cognitive processing). They observed larger pupil size (which has been taken as an index of cognitive control; see Wel & Steenbergen, 2018, for a review) for both semantically incongruent and response-incongruent primes, but observed a memory benefit only for the semantically incongruent condition. Interestingly, they also showed that these memory benefits can be mitigated by manipulating endogenous attention: Ptok et al. (2020, Experiment 3) were able to eradicate the conflict-encoding benefit by having participants sit in a chinrest and focus on the task. This is similar to what has been found in the perceptual disfluency literature. For example, having participants study words in anticipation of a test can eradicate the benefit of perceptual disfluency (Geller & Peterson, 2021). In addition, requiring participants to make judgments of learning (a metamemory judgment on a scale of 0-100 indicating how likely it is that they will recall the word on a later memory test) after each studied word also eradicates the disfluency effect (Besken & Mulligan, 2013; Rosner et al., 2015). Taken together, these findings highlight the critical role of both the kind of processing performed on the to-be-remembered stimuli and control processes in eliciting a disfluency effect.
All three of these theories propose potential loci for the perceptual disfluency effect. In the metacognitive account, the disfluency effect occurs at a later postlexical stage, after word recognition has taken place. The compensatory processing account (Mulligan, 1996) links the perceptual disfluency effect directly to the word recognition process. That is, disfluent words receive more top-down processing from lexical or semantic levels during encoding. Lastly, the stage-specific model proposed by Ptok et al. (2019) associates perceptual disfluency effects with a specific stage of processing, namely the semantic level, but it also considers general attentional and cognitive control processes that are not solely tied to the word recognition process itself.
2.3 Moving beyond the mean: modeling RT distributions
2.3.1 Ex-Gaussian distribution
To test the different stages or loci involved in the perceptual disfluency effect, it is necessary to use methods that allow for a more fine-grained analysis of processes during encoding. In the perceptual disfluency literature (and learning and memory more broadly), it is common to use measures such as mean accuracy and RTs to assess differences between disfluent and fluent conditions (Geller et al., 2018; Geller & Peterson, 2021; Rosner et al., 2015). While this approach is often deemed acceptable practice, there has been a call to go beyond traditional RT methods when making inferences (see Balota & Yap, 2011).
There are several reasons for shifting from traditional RT analyses to analyses that utilize the whole RT distribution. One reason is that traditional approaches fail to capture the nuances inherent in RT distributions, which are typically unimodal and positively skewed. A standard analysis based on means can conceal effects that change only the shape of the tail of the distribution, only its location, or both its location and shape. A clear example comes from the Stroop literature. The classic Stroop finding is that words presented in an incongruent color font (the word “red” printed in blue font) produce longer RTs than words in a baseline condition (e.g., XXXXX presented in a color font). This interference effect is extremely robust. A more inconsistent finding is facilitation (shortened RTs) when the word and color are congruent (e.g., “green” printed in green font) compared to a baseline condition. Using ex-Gaussian analyses, Heathcote et al. (1991) provided an answer to this conundrum. In the ex-Gaussian parameters, there was both facilitation (from congruent trials) and interference (from incongruent trials) on \(\mu\). For \(\sigma\), the analysis showed interference but no facilitation. For \(\tau\), there was interference for both congruent and incongruent conditions. The mean RT analysis, by contrast, showed the standard Stroop interference effect but no facilitation. Given that the algebraic mean of the ex-Gaussian is \(\mu + \tau\), the failure to observe a facilitation effect in the standard mean analysis likely arose from facilitation on \(\mu\) and interference on \(\tau\) canceling each other out. A finding such as this would be impossible to detect looking solely at mean RTs.
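To make this cancellation concrete, consider a purely hypothetical set of parameter values (illustrative numbers of our own choosing, not the estimates reported by Heathcote et al., 1991). If the congruent condition speeds \(\mu\) by 20 ms relative to baseline but inflates \(\tau\) by 20 ms, the two effects offset exactly in the mean:

\[
\overline{RT}_{\text{baseline}} = \mu + \tau = 600 + 150 = 750 \text{ ms}, \qquad
\overline{RT}_{\text{congruent}} = \mu + \tau = 580 + 170 = 750 \text{ ms}.
\]

A mean-based analysis would therefore report no facilitation, even though the congruent distribution has both shifted left and grown a longer tail.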
Another reason to transition away from traditional analyses is that RTs provide only a coarse measure of processing during encoding. RTs capture the total sum of various task-related factors, ranging from non-decisional components like stimulus encoding and motor responses to decisional components.
Lastly, from a statistical standpoint, RTs present significant challenges. Specifically, they often violate two crucial assumptions: they are not normally distributed, and their variance is frequently heterogeneous. Such violations can lead to biased results when making statistical inferences, as pointed out by Wilcox (1998).
One alternative to standard RT analyses is to examine RT distributions using mathematical models that capture the nuances of the RT distribution and characterize its statistical properties, such as its location (\(\mu\)), spread (\(\sigma\)), and skewness (\(\tau\)). One popular choice is the ex-Gaussian distribution (Balota & Yap, 2011; Ratcliff, 1979). As the name suggests, the ex-Gaussian distribution decomposes RTs into three parameters: \(\mu\), the mean of the Gaussian component; \(\sigma\), the standard deviation of that Gaussian component; and \(\tau\), the mean (and standard deviation) of an exponential component capturing the tail of the distribution. The algebraic mean of the ex-Gaussian is \(\mu + \tau\). Together, the three parameters capture different aspects of the distribution’s location and shape.
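As a minimal illustration of how these parameters can be recovered from data, the sketch below simulates ex-Gaussian RTs in Python and refits them with SciPy. The parameter values and variable names are illustrative choices of our own (not values from the experiments reported here); note that scipy.stats.exponnorm parameterizes the distribution as \(K = \tau/\sigma\), with loc \(= \mu\) and scale \(= \sigma\).

```python
import numpy as np
from scipy.stats import exponnorm

rng = np.random.default_rng(2024)

# Illustrative "true" ex-Gaussian parameters, in milliseconds
mu, sigma, tau = 600.0, 60.0, 150.0

# An ex-Gaussian RT is the sum of a Gaussian component and an exponential tail
rts = rng.normal(mu, sigma, size=2000) + rng.exponential(tau, size=2000)

# Recover the parameters: exponnorm.fit returns (K, loc, scale)
K_hat, loc_hat, scale_hat = exponnorm.fit(rts)
mu_hat, sigma_hat, tau_hat = loc_hat, scale_hat, K_hat * scale_hat

print(f"mu = {mu_hat:.1f}, sigma = {sigma_hat:.1f}, tau = {tau_hat:.1f}")
print(f"mean RT = {rts.mean():.1f}  vs.  mu + tau = {mu_hat + tau_hat:.1f}")
```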
Exploring effects from a distributional perspective has provided a richer understanding of how different experimental manipulations affect word recognition. Experimental manipulations can produce several distinct patterns. One pattern is a shift of the entire RT distribution to the right, without any increase in the tail or skew. Such a pattern suggests a general effect and manifests as an effect on \(\mu\) but not \(\tau\). For example, semantic priming effects, in which responses are faster to targets preceded by a semantically related prime than by an unrelated prime, can be nicely explained by a simple shift in the RT distribution (Balota et al., 2008). Alternatively, a manipulation could skew or stretch the RT distribution in the slower condition. This suggests that the manipulation impacts only a subset of trials and is visible as an increase in \(\tau\). An example of an effect confined to \(\tau\) is the transposed-letter (TL) effect in visual word recognition (Johnson et al., 2012), which involves misidentification of orthographically similar stimuli with transposed internal letters, such as mistaking “JUGDE” for “JUDGE” (Perea & Lupker, 2003). Finally, a manipulation could change both \(\mu\) and \(\tau\), shifting and stretching the RT distribution. Recognizing low-frequency words, for instance, has been shown not only to shift the RT distribution but also to stretch it (Andrews & Heathcote, 2001; Balota & Spieler, 1999; Staub, 2010).
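To see how these patterns separate in the full distribution while remaining ambiguous in the mean, the following sketch (with illustrative parameter values of our own choosing) simulates a baseline condition, a pure 50 ms shift in \(\mu\), and a pure 50 ms increase in \(\tau\). Both slower conditions raise mean RT by roughly the same amount, but only the \(\tau\) manipulation increases the skew of the distribution.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000

def ex_gaussian(mu, sigma, tau, size):
    """Simulate RTs as a Gaussian component plus an exponential tail."""
    return rng.normal(mu, sigma, size) + rng.exponential(tau, size)

conditions = {
    "baseline":         ex_gaussian(600, 60, 150, n),
    "shift (mu+50)":    ex_gaussian(650, 60, 150, n),  # whole distribution moves right
    "stretch (tau+50)": ex_gaussian(600, 60, 200, n),  # only the tail grows
}

for label, rts in conditions.items():
    skew_proxy = rts.mean() - np.median(rts)  # larger gap = more right skew
    print(f"{label:18s} mean = {rts.mean():6.1f}  median = {np.median(rts):6.1f}  "
          f"mean - median = {skew_proxy:5.1f}")
```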
While mostly descriptive in nature, the ex-Gaussian model has been used as a theoretical tool to map model parameters onto cognitive processes, although such mappings remain highly controversial (Heathcote et al., 1991; Matzke & Wagenmakers, 2009). For example, the \(\mu\) and \(\sigma\) parameters have been tied to early, non-analytic processing. In the area of semantic priming, the selective effect on \(\mu\) has been taken as evidence for an automatic spreading activation process (or head start), according to which the activation of a node in a semantic network spreads automatically to interconnected nodes, preactivating a semantically related word (Balota et al., 2008; Wit & Kinoshita, 2015). The exponential component (\(\tau\)) has been tied to later, more analytic processing (Balota & Spieler, 1999). Specifically, increases in \(\tau\) have been attributed to working memory and attentional processes (Fitousi, 2020; Kane & Engle, 2003). For instance, Johnson et al. (2012) tied \(\tau\) differences in the TL effect to a post-lexical checking mechanism arising from a failure to identify the stimulus on a subset of trials, rather than a broader lexical effect occurring on every trial. Taken together, the parameters of the ex-Gaussian distribution seem to reflect a range of cognitive processes that can be broadly grouped into early versus late stages of processing.
2.3.2 Drift-diffusion model (DDM)
In contrast to the ex-Gaussian distribution discussed above, the DDM (see Ratcliff et al., 2016, for a comprehensive introduction) is a process model whose parameters can be linked to latent cognitive constructs (Gomez et al., 2013). The DDM is a popular computational model commonly used for binary speeded decision tasks such as the lexical decision task (LDT). The model assumes that a decision is a cumulative process that begins at stimulus onset and ends once a noisy accumulation of evidence reaches a decision threshold. The DDM has led to important insights into cognition in a wide range of choice tasks, including perceptual-, memory-, and value-based decisions (Myers et al., 2022).
In the DDM, RTs are decomposed into several parameters that represent distinct cognitive processes. The full DDM comprises several parameters, and the exact set varies depending on the flavor of the model one is using. The most relevant for our purposes are the drift rate (\(v\)) and non-decision time (ndt; \(T_{er}\)) parameters. Drift rate (\(v\)) represents the rate at which evidence is accumulated toward a decision boundary; in essence, it is a measure of how quickly information is processed to make a decision. A higher \(v\) means evidence is accumulated more quickly, leading to faster decisions. Drift rate is linked to the decision process itself and is seen as an index of the global processing demands placed on the cognitive system by task difficulty, memory load, or other concurrent processing demands, to the extent that concurrent processes compete for the same cognitive resources (Boag et al., 2019a). Drift rates have also been implicated as a locus of reactive inhibitory control (Braver, 2012), in which critical events (e.g., the need to update working memory or switch tasks) trigger inhibition of prepotent response drift rates (Boag et al., 2019a, 2019b).
The \(T_{er}\) parameter represents the time taken for processes other than the decision-making itself. This includes early sensory processing (like visual or auditory processing of the stimulus) and late motor processes (like executing the response).
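To make the division of labor between \(v\) and \(T_{er}\) concrete, the sketch below simulates a bare-bones two-boundary diffusion process. The function name and parameter values are illustrative assumptions of our own, not the fitting procedure used in the experiments reported here: a larger drift rate yields faster and more accurate decisions, whereas \(T_{er}\) simply adds a constant offset to every RT.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_ddm_trial(v, a, ter, dt=0.001, noise_sd=1.0):
    """Simulate one two-boundary diffusion trial.

    v: drift rate, a: boundary separation, ter: non-decision time (seconds).
    Returns (rt_in_seconds, choice), where choice = 1 means the upper boundary was hit.
    """
    x, t = a / 2.0, 0.0            # unbiased starting point, decision clock at zero
    while 0.0 < x < a:             # accumulate noisy evidence until a boundary is crossed
        x += v * dt + noise_sd * np.sqrt(dt) * rng.standard_normal()
        t += dt
    return ter + t, int(x >= a)    # non-decision time is added on top of decision time

# Higher drift rate -> faster, more accurate decisions; Ter shifts all RTs by a constant
for v in (1.0, 3.0):
    trials = [simulate_ddm_trial(v, a=1.0, ter=0.3) for _ in range(500)]
    rts = np.array([rt for rt, _ in trials])
    upper = np.mean([choice for _, choice in trials])
    print(f"v = {v:.1f}: mean RT = {rts.mean():.3f} s, p(upper boundary) = {upper:.2f}")
```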
The DDM has proven to be a valuable tool for studying the effects of experimental manipulations on cognitive processes in visual word recognition. For example, Gomez & Perea (2014) demonstrated that certain manipulations can differentially affect specific parameters of the model. Manipulating the orientation of words (rotating them by 0, 90, or 180 degrees) affected the \(T_{er}\) component but not \(v\). In contrast, word frequency (high-frequency vs. low-frequency words) influenced both the drift rate and the non-decision time. These findings highlight the sensitivity of the DDM in identifying and differentiating the impact of various stimulus manipulations on the different cognitive processes involved in decision-making.
2.4 The Current Experiments
In the present experiments, we pursued two aims related to perceptual disfluency. The first aim was to examine the replicability of the perceptual disfluency effect. To optimize our chances of observing this effect, we used a disfluency manipulation known to enhance memory in the literature, namely blurring (Rosner et al., 2015), and we employed a surprise recognition test (Geller & Peterson, 2021). The second, more pivotal, aim was to enrich the toolkit of researchers exploring conflict-encoding effects like perceptual disfluency. Through the application of distributional techniques, such as ex-Gaussian analysis and the DDM, our goal was to showcase how these methods can grant deeper insight into the influence of encoding difficulty on memory. Overall, these endeavors will help ascertain the conditions under which perceptual disfluency is beneficial for memory and when it is not.