The present study examined cognitive abilities as predictors of people’s tendency to correct false impressions during problem solving. This was accomplished via logistic regression analyses, conducted within a machine learning framework, in which several indicators of a person’s working memory capacity (the ability to mentally maintain and manipulate information) were used to predict whether that person would correctly answer each of three questions from the Cognitive Reflection Test (which is designed around misleading questions). Some models added previous-item-performance as a separate predictor of current-item-performance, in hopes that it would further clarify results by reflecting non-cognitive contributions to correct responses. While it was found that aspects of working memory capacity (in particular, memory updating ability) can be used to predict performance on the Cognitive Reflection Test, previous-item-performance did not add predictive utility to the models. Furthermore, working memory capacity’s predictive utility may be highest on early items, when the confusing aspects of the test are most novel.
A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?
The above question, which was drawn from the Cognitive Reflection Test, represents a specific situation in which a query leads people to an intuitive, yet incorrect, assumption. Although the correct answer is 5 cents, the wording creates the impression that the solution is 10 cents. Who are the people who can overcome this gut reaction and think more deeply about the problem, and who are the people who give in to their initial impulse?
The present analysis explores this question from the perspective of individual differences in working memory capacity. Working memory is the cognitive system that allows people to hold information in mind on an as-needed basis. Working memory capacity is a measure of how effectively the working memory system is functioning, on a person-by-person basis (Engle, 2002).
Individual differences in working memory capacity provide a likely candidate for predicting which people are capable of reconsidering their impressions of a misleading question. This is because working memory capacity is strongly related to attention control (Engle, 2002), distractibility (Fukuda & Vogel, 2011), and complex reasoning (Oberauer et al., 2005). Moreover, working memory capacity predicts real-world cognition in areas such as reading comprehension (McVay & Kane, 2011), emotion regulation (Kleider et al., 2009), and even the rate at which people learn the syntax of programming languages (Shute, 1991). That is, people who score high on tests of working memory capacity tend to perform well in these domains, while people who score low tend to struggle.
Shipstead, Harrison, and Engle (2016) propose that working memory capacity can be subdivided into three general cognitive functions. The first is executive attention control, which reflects the general efficacy with which a person’s attention is applied to tasks. The second is working memory maintenance, which represents the stability with which information is held in mind. The third is updating (or “disengagement”), which represents a person’s ability to keep memory clean by forgetting information when it becomes outdated. These three sub-mechanisms function in concert to allow for complex thought across a variety of contexts.
Intuitively speaking, people with high working memory capacity should be well-equipped to override the pull toward first impressions. These people are good at dealing with verbal information, excel at keeping attention focused on the task at hand, and can forget outdated ideas. However, there are also reasons to believe that a person’s cognitive abilities can only go so far when reasoning through problems, such as the one presented at the outset.
Stanovich and West (2000) describe human thought as occurring in two stages, or systems. Processing begins in System 1, which is fast acting, automatic, and not guided by attention. In other words, this is where general impressions (e.g., “the ball costs 10 cents”) are formed.
This processing bleeds into System 2, which is slower, but attention-guided and thus more flexible. This stage allows for correction of thought patterns (e.g., “oh hold on, it’s 5 cents!”). In other words, System 2 is where a person’s working memory becomes a critical component of thinking.
Given this context, one might expect individual differences in working memory capacity to be a strong predictor of the ability to successfully override the false impressions of System 1. This makes sense, since people with high working memory capacity excel at the type of thinking that occurs in System 2. However, one legitimate concern is that people with high working memory capacity may never realize that they should attempt to make a correction to their thought patterns.
As Kahneman (2003) points out, System 2 is typically lax in its monitoring for System 1 errors. In essence, most people tend toward cognitive miserliness, and only think deeply about issues once they see the personal relevance of doing so. High cognitive capacities and general need for deep engagement are not the same thing. As such, even people with high working memory capacity may fail to monitor their initial impressions, especially if they have no indication that their reasoning was faulty.
The data used in the present study were preexisting, and only contained measures of cognitive ability. The shortcoming is that, while deep and critical thinking are certainly facilitated by high cognitive ability, the desire to engage in deep thought is better thought of as a personality characteristic. In the absence of direct measurement of such a personality characteristic, an attempt was made to construct a measurement via the available data.
In this case, a person’s ability to correctly answer a Cognitive Reflection question does, at some point, reflect a tendency to slow down and consider that an initial impression was mistaken. As such, performance on Question 1 may improve the ability to accurately predict performance on Question 2, above-and-beyond traditional cognitive variables: People with certain personality traits would be more prone to reconsidering their responses than would other people.
Whether previous-item-performance adds to the model is, however, an open question. On one hand, Question 1 performance may reflect personality characteristics, above-and-beyond cognitive ability. In this case, adding previous-item-performance to the model would lead to more accurate predictions. On the other hand, the importance of high cognitive ability to correctly answering Question 1 may be too strong. In this case, adding previous-item-performance to a model will simply be redundant, and the accuracy of predictions would not improve.
This study examined our ability to use different aspects of working memory capacity to differentiate people who are likely to adjust their false impressions from people who are unlikely to do so. Additionally, performance on previous cognitive reflection test items was added to models, as it was expected that people who have correctly responded to one item would approach subsequent items in a different manner. Thus, previous item performance was examined as a potential clarifier of outcomes.
Modelling and analysis were carried out via logistic regression, which examines the associations between multiple predictors (a person’s attention, maintenance, and updating abilities) and a specific outcome (e.g., correct answer on a test question). Through this, logistic regression can provide data needed to make a prediction: Given a person’s cognitive abilities, is that person likely to answer a given question correctly or incorrectly?
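For concreteness, the model being estimated takes the standard logistic form; the predictor names below mirror the composites defined later in this report, and the β weights are what training estimates:

$$P(\text{correct}) = \frac{1}{1 + e^{-(\beta_0 + \beta_1\,\text{Attention} + \beta_2\,\text{Maintenance} + \beta_3\,\text{Updating})}}$$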
These logistic regressions were carried out within a machine learning paradigm in which models of performance were created using a subset of the data (training set). After creation, the model parameters were then used to predict people’s performance using the subset of data that was not included in training (validation data).
The ability to override first impressions was defined as performance on the Cognitive Reflection Test (CRT). The CRT is a three-question test, from which the above reasoning problem was drawn (all three questions are available in Appendix A). Questions were always asked in the same order, due to a need to avoid subject-by-treatment interactions.
Separate models were created for performance on each of the three questions, as it was assumed that CRT performance is an evolving process.
Consistent with the perspective of Shipstead et al. (2016), working memory capacity was defined through three varieties of cognitive test: attention control, maintenance, and updating. The present data set included several measures of each of these capabilities. Examples and details of each variety of test can be found in Appendix B. Relevant descriptive statistics can be found in Appendix C.
In brief, tests of attention control require test takers to overcome a reflexive response (e.g., move your eyes toward a peripheral flash), and instead perform a test-relevant action (e.g., move your eyes away from a peripheral flash). Tests of maintenance require test takers to remember a list of items, despite constant distraction (e.g., remember a list of letters, while solving math problems). Updating tests present more information than a test taker will need to remember. What is important is the ability to forget information as it becomes outdated, and focus on the relevant subset of information. This represents the ability to prevent mental clutter (e.g., not returning to an initial, incorrect, assumption regarding the correct answer).
As stated, this data set included a variety of tests of each of these three cognitive abilities. In order to reduce complexity and increase interpretability of the models, the individual tests were combined into three z-score composites that represented each of the three factors (these groupings are supported by factor analysis; see Martin et al. [2020] and Shipstead et al. [2015]). Z-scores transform different test scores to be on the same scale, thus allowing them to be averaged into one score (the z-score composite).
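As a rough sketch of this step (the data frame `d` and its column names are assumptions that follow the test labels in Appendix C, not the original analysis script):

```r
# A rough sketch of composite construction. The data frame `d` and its column
# names are assumptions (they follow the test labels in Appendix C), not the
# original analysis script.
z <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

d$attention   <- rowMeans(sapply(d[c("AntiSac1", "AntiSac2")], z))
d$maintenance <- rowMeans(sapply(d[c("Ospan", "SymSpan", "RotSpan")], z))
d$updating    <- rowMeans(sapply(d[c("RunSpan1", "RunSpan2", "RunSpan3",
                                     "N_back", "KeepTrack")], z))
```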
Table 1. Correlations between the working memory capacity composites and measures of intelligence.

| | Fluid Intelligence | Verbal Reasoning |
|---|---|---|
| Attention | 0.56 | 0.51 |
| Maintenance | 0.70 | 0.66 |
| Updating | 0.76 | 0.74 |
The validity of these tests as measures of complex cognition is demonstrated in Table 1. As can be seen, each of the three working memory composite scores correlated fairly strongly with fluid intelligence (reasoning with novel/abstract information) as well as with verbal reasoning (e.g., word analogies). Note that the intelligence tests were excluded from the examination of CRT performance, as this would have amounted to explaining reasoning with reasoning.
Portions of the present data were reported in Martin et al. (2020), and were collected as part of a comprehensive screening of various cognitive abilities. A total of 573 people completed 49 cognitive tests across four 2-hour sessions. Missing data points were imputed using expectation-maximization.
Data were split, with 75% (n = 402) used for model training and the remaining 25% (n = 132) held out as the test set for model validation. Although it is typical to split data sets such that the dependent variable is equated across them, in this case each of the three CRT responses served as a unique dependent variable, which prevented a clean split from being made.
The data set, however, did include Raven’s Advanced Progressive Matrices, which is considered to be the best single predictor of human reasoning ability (Jensen, 1998). Since this test did not figure into the primary analyses, data were split along this factor, ensuring that the training and validation groups were cognitively similar.
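A minimal sketch of such a stratified split, assuming a Raven’s score column named `ravens` (a hypothetical name); caret’s partitioning utility bins a numeric stratification variable into percentile groups and samples within each:

```r
# Sketch of a split stratified on Raven's scores; the column name `ravens`
# is hypothetical. createDataPartition bins a numeric variable into
# percentile groups and samples within each, keeping the two sets similar.
library(caret)

set.seed(1)  # make the split reproducible
train_idx <- createDataPartition(d$ravens, p = 0.75, list = FALSE)
train_df  <- d[train_idx, ]   # ~75%: model training
valid_df  <- d[-train_idx, ]  # ~25%: held-out validation
```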
Table 2. Proportion of correct responses to each CRT question in the training and validation data sets.

| | Question 1 | Question 2 | Question 3 |
|---|---|---|---|
| Proportion Correct (Training Data Set) | 0.18 | 0.19 | 0.34 |
| Proportion Correct (Validation Data Set) | 0.16 | 0.18 | 0.29 |
Table 2 presents the proportion of test takers who correctly responded to each CRT question in both the training and validation data sets. As can be seen, performance was quite low on the first two questions: Fewer than 20% of people correctly answered either one.
Performance improved substantially on the third item: 33% of all people answered it correctly (across training and validation sets). This may indicate that people were starting to understand the nature of the CRT, and were thus attempting to correct their impressions later in the test. However, since the questions were always presented in the same order, it cannot be ruled out that Question 3 was simply easier.
Table 3. Correlations among the CRT composite score and the working memory capacity composites.

| | CRT | Attention | Maintenance | Updating |
|---|---|---|---|---|
| CRT | - | | | |
| Attention | 0.39 | - | | |
| Maintenance | 0.48 | 0.52 | - | |
| Updating | 0.55 | 0.57 | 0.70 | - |
Table 3 presents the correlations between the working memory capacity predictor variables and a composite score of all three CRT questions. The correlations with CRT performance are reasonably strong, suggesting that these cognitive tests are plausible predictors of performance. This is particularly true of the correlation between CRT and updating.
Machine learning was conducted using R’s caret package. Within this procedure, logistic regressions were performed with R’s base generalized linear model. Cross-validation was performed such that the training data were split into 5 folds (divisions); 4 were used to build a model, and the 5th was used for validation. Folds were rotated such that each was used in validation. This process was repeated 10 times to yield a mean model that is less data-specific, and therefore more generalizable to novel data.
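A minimal sketch of this procedure for the Question 1 model, under the same naming assumptions as above (caret’s classification mode requires the outcome to be a factor):

```r
# Minimal sketch of the training procedure for the Question 1 model, under
# the same naming assumptions as above. The outcome must be a factor
# (e.g., levels "incorrect"/"correct").
library(caret)

set.seed(42)
ctrl <- trainControl(method = "repeatedcv",  # k-fold cross-validation, repeated
                     number  = 5,            # 5 folds
                     repeats = 10)           # 10 repetitions

fit_q1 <- train(q1_correct ~ attention + maintenance + updating,
                data      = train_df,
                method    = "glm",      # R's base generalized linear model
                family    = binomial,   # logistic regression
                trControl = ctrl)

summary(fit_q1)  # coefficients on the log-odds scale, as reported in Table 4
```

Repeated cross-validation trades extra computation for coefficient estimates that are less tied to any single partition of the training data.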
Table 4. Coefficients for the logistic regression models trained to predict performance on each CRT question.

| Model | Intercept | Attention | Maintenance | Updating | Question 1 | Question 2 | AIC |
|---|---|---|---|---|---|---|---|
| Question 1 | -2.41*** | 0.42 | 0.99** | 1.14*** | - | - | 286.00 |
| Question 2a | -1.94*** | 0.16 | 0.82** | 0.83** | - | - | 330.06 |
| Question 2b | -2.23*** | 0.09 | 0.65* | 0.58 | 1.58*** | - | 308.47 |
| Question 3a | -1.15*** | 0.43* | 0.46 | 1.60*** | - | - | 368.77 |
| Question 3b | -1.57*** | 0.39 | 0.18 | 1.39*** | 1.43*** | 1.41*** | 331.91 |

Note. * p < .05; ** p < .01; *** p < .001.
Table 4 presents the coefficients of each model that was trained to predict performance on one of the three CRT questions. These coefficients represent the change in the log odds of a correct (vs. incorrect) response per unit change in the given predictor. As the predictor coefficients are positive and all cognitive predictors (attention, maintenance, updating) were on the same z-score scale, interpretation is straightforward: Bigger is better.
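To make that scale concrete: because the cognitive predictors are standardized, the updating coefficient of 1.14 in the Question 1 model implies that a one-standard-deviation increase in updating ability multiplies the odds of a correct response by $e^{1.14} \approx 3.1$.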
Models with an “a” were run with only cognitive variables as predictors. Models with a “b” were run with both cognitive variables and previous-question performance as predictors. Stars indicate degree of statistical significance (see table note).
First, focusing on the models that are strictly cognitive (1, 2a, 3a), memory updating is consistently important. Given that the three CRT questions were designed to require test takers to alter an initial impression, this relationship to updating is a coherent outcome: Memory updating ability is an important component of thought-correction.
It is somewhat surprising that maintenance capacity decreased in importance and that attention control was never a strong predictor of performance. One might expect that a person’s ability to stably hold a representation of the CRT problem in mind would be of particular importance. However, given that memory updating likely requires maintaining information and using attention to suppress outdated information, this result may simply indicate that memory updating tests provide the most complete assessment of a person’s overall working memory capacity (Martin et al., 2020).
Second, the case for including previous-item-performance as a predictor of current-item-performance is equivocal at this point. Examining Models 2b and 3b, previous-item-performance was a statistically significant predictor of current-item-success (at least during model training). Moreover, the meaningfulness of these predictors was reinforced in the AIC column of Table 4. AIC is an indicator of the trade-off between increasing model complexity and improved prediction; lower numbers indicate that the loss of parsimony is offset by increased explanation. AIC decreased in both a-versus-b comparisons.
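For reference, $\text{AIC} = 2k - 2\ln(\hat{L})$, where $k$ is the number of model parameters and $\hat{L}$ is the maximized likelihood; a drop such as the one from Model 2a (330.06) to Model 2b (308.47) therefore indicates that the improved fit more than pays for the added parameter.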
Figure 1. Median training accuracy for models with (b) and without (a) previous-item-performance as a predictor.
Conversely, note that the magnitude of the cognitive variables’ coefficients decreased when previous-item-performance was added to the model. This does not mean that cognition was less important. Instead, it simply indicates that individual differences in the cognitive variables were redundantly represented in previous-item-performance. This makes sense if cognitive ability contributed to making a correct response on the previous item.
Unfortunately, that interpretation implies that the predictive power that is being carried over by previous-item-performance is simply a reiteration of people’s cognitive abilities. Whereas the intent was to capture something unique, like a personality tendency toward deep thought, previous-item-performance may be mostly redundant variance that is already expressed in people’s cognitive abilities.
This concern is reinforced in Figure 1, which displays the median accuracy of prediction on the training data across all 50 runs of cross-validation. If the addition of previous-item-performance were contributing to better outcomes, one would expect reliably higher accuracy for the (b)-versions of the models. There does seem to be a trend toward higher accuracy when previous-item-performance was included; however, this is tempered by the fact that the (a)- and (b)-versions of the models showed overlap in their interquartile ranges. The utility of previous-item-performance is questionable at this point.
Figure 2. The general relation between working memory capacity predictors and CRT Question 1 performance.
Deeper analyses of the training data revealed an interesting trend that seems predictable in hindsight. Figure 2 presents the relationship between (a) attention, (b) maintenance, and (c) updating ability and performance on CRT Question 1. Two aspects of this figure require elaboration.
First, each dot along the x-axis represents one person’s standardized working memory component score, with zero being average. Dots at the top of the y-axis represent a correct response, dots at the bottom represent an incorrect response.
As can be seen, correct responses tended to be restricted to the higher end of the maintenance and updating spectra. This was expected. The issue is that incorrect responses never quite disappeared from the higher end: Many people with high working memory ability failed to make the correct response. In other words, high working memory capacity is a prerequisite for correcting one’s false impression of the CRT problem, but it is not sufficient to predict that someone will do so.
Figure 3. The general relation between working memory capacity predictors and performance on CRT Questions 2 (A-C) and 3 (D-F).
Having stated this, it appears as though this relationship is most apparent when the task is novel. This can be seen in Figure 3, which displays responses to Questions 2 and 3. As can be seen, the distribution of the cognitive abilities of correct-responders widens, relative to Question 1. This may account for the large jump in correct responses on Question 3 (Table 2): A wider range of people are capable of responding correctly. It also implies that the line between high- and low-ability individuals blurs across items. In turn, it may become progressively more difficult to predict correct responders (i.e., from Question 1 to Question 3) when the models are applied to the validation data set.
The second concern that stems from this pre-analysis relates to the trend lines in Figures 2 and 3. These lines represent the probability that a person will make a correct response (numerically represented on the y-axis), on the basis of a given cognitive ability. In most categorization situations, it is conventional to predict that anyone whose probability is above 50% will make a correct response, while anyone whose probability is below 50% will make an incorrect response.
Approaching the training data in this manner does lead to reasonable accuracy (above 80% correct categorization), but it comes at an analytic cost. Due to the muddied distinction between correct and incorrect responders, the probability of a correct response rarely even approaches 75% on the high end.
Table 5. Confusion matrix, accuracy, sensitivity, and specificity for the Question 1 model applied to the training data.

| | Actually Incorrect | Actually Correct | Accuracy | Sensitivity | Specificity |
|---|---|---|---|---|---|
| Predicted to be Incorrect | 316 | 54 | 0.83 | 0.24 | 0.95 |
| Predicted to be Correct | 15 | 17 | | | |
The concern that this raises is made concrete in Table 5, which presents the confusion matrix, accuracy, sensitivity, and specificity when the Question 1 model is applied to the data on which it was trained (and therefore represents the optimistic perspective). Although accuracy was high (83% correct identification), it was accompanied by low sensitivity (i.e., poor detection of correct responders). People were typically categorized correctly, but this is mostly due to a combination of relatively few correct responses and a decision process biased toward predicting “incorrect” (reflected in the high specificity, or correct identification of incorrect-responders).
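The corresponding computation, sketched with caret’s confusion-matrix utility (object names carry over from the earlier sketches):

```r
# Class predictions on the training data at the default .5 threshold;
# object names carry over from the earlier sketches.
train_preds <- predict(fit_q1, newdata = train_df)

# positive = "correct" makes sensitivity the detection rate of correct
# responders, matching Table 5.
confusionMatrix(data = train_preds,
                reference = train_df$q1_correct,
                positive = "correct")
```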
Since the goal is to predict the rare correct-responders, this is an unhelpful path to follow. Thus, in order to meet the present research objectives, the threshold for predicting a correct response needed to be adjusted. That is to say, if correct responders are to be identified, the bar for predicting “correct response” needs to be lowered below 50%.
One traditional method for making such an adjustment is to search for the optimal point of informativeness via examination of the receiver operating characteristic curve. In essence, this process requires a search for the point of optimal tradeoff between true positives and false positives. However, with imbalanced data such as these (i.e., rare correct responses), the Matthews correlation coefficient (MCC) can be a more reliable estimator of the optimal threshold. MCC takes into account true positives, false positives, true negatives, and false negatives. As a general property, it increases as these four indicators come into balance.
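In terms of confusion-matrix cells (true/false positives and negatives), MCC is defined as:

$$\text{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$$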
Prior to making predictions on the validation data, the trained models were evaluated at a series of candidate thresholds, and MCC was calculated at each. The optimal threshold was later applied to the probabilities that each model generated for the validation data.
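A sketch of that search (all object names are illustrative): `probs` holds the model-generated probabilities of a correct response on the training data, and `actual` marks the true correct responses.

```r
# Sketch of the threshold search; all object names are illustrative.
probs  <- predict(fit_q1, newdata = train_df, type = "prob")[, "correct"]
actual <- train_df$q1_correct == "correct"

# Matthews correlation coefficient from confusion-matrix counts
mcc <- function(tp, fp, tn, fn) {
  tp <- as.numeric(tp); fp <- as.numeric(fp)  # avoid integer overflow
  tn <- as.numeric(tn); fn <- as.numeric(fn)
  den <- sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  if (den == 0) return(0)
  (tp * tn - fp * fn) / den
}

thresholds <- seq(0.05, 0.95, by = 0.01)
mcc_values <- sapply(thresholds, function(t) {
  pred <- probs >= t  # predict "correct" when the probability clears t
  mcc(tp = sum(pred & actual),   fp = sum(pred & !actual),
      tn = sum(!pred & !actual), fn = sum(!pred & actual))
})

thresholds[which.max(mcc_values)]  # optimal threshold for this model
```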
Table 6. Optimal decision threshold for each model, with MCC at the optimal threshold and at the common threshold of .31.

| | Optimal Threshold | MCC at Optimal Threshold | MCC at .31 |
|---|---|---|---|
| Question 1 | 0.26 | 0.45 | 0.42 |
| Question 2a | 0.33 | 0.40 | 0.40 |
| Question 2b | 0.26 | 0.51 | 0.45 |
| Question 3a | 0.29 | 0.54 | 0.53 |
| Question 3b | 0.39 | 0.60 | 0.55 |
Table 6 presents the optimal training-data thresholds that could be used to generate predictions on the validation data set. The full range of Threshold-MCC values generated by the search is available in Appendix D.
The optimal threshold ranged from .26 to .39 across models. This variation would be fine if each model were examined in isolation; however, the models were examined in light of one another, so a consistent threshold needed to be maintained.
The compromise was to set the threshold to the average of all five values: .31. As can be seen in the final column of Table 6, the MCC for the cognitive-only models (1, 2a, 3a) was minimally affected, as desired. There was somewhat greater deviation for the cognitive-plus-previous-response models (2b and 3b), but this was acceptable: these models were, by design, intended to deviate from Models 2a and 3a, and should do so within the context of the original models and procedures.
Table 7. Confusion matrices for the working memory-only models (1, 2a, and 3a) applied to the validation data set.
The first models applied to the validation data set were those involving only the working memory capacity predictors (Models 1, 2a, and 3a from Table 4). For the sake of precision, the results of these predictions are presented in the confusion matrices in Table 7; for the sake of clarity, analysis will focus on Figure 4.
Figure 4. Accuracy, sensitivity, and specificity for working memory-only models.
Figure 4 presents accuracy, sensitivity, and specificity, grouped by model. Examining accuracy first, Models 1 and 2a did equally well at predicting responses; however, there was a relative decline in accuracy for Model 3a, which represented performance on the third CRT question. This occurred despite correct responders being identified at the numerically highest rate on this question (apparent in the sensitivity).
The accuracy decline seems to have stemmed from relative difficulty in identifying incorrect responders on Question 3, as is apparent in the relatively low specificity. This was already hinted at in the examination of the training data, where people with a wider range of cognitive abilities were capable of responding correctly to Question 3. Whereas incorrect responders were easy to identify on the first two questions (i.e., whoever has low working memory ability), this line had blurred by the third.
Clearly a person’s cognitive abilities can be used to predict their performance on the CRT, however, the predictive utility declines across items. As such, the next relevant question is whether inclusion of previous-item-responses can provide clarification and thus aid prediction of both correct and incorrect responders.
Table 8. Confusion matrices for Models 2b and 3b applied to the validation data set.
The confusion matrices for Models 2b and 3b, which included both working memory capacity variables and previous-item-responses, are presented in Table 8. However, as before, discussion will primarily focus on the data presented in Figure 5, which provides a direct comparison of Models 2 and 3 with previous-item-responses excluded (a) versus included (b).
Figure 5. Accuracy, sensitivity, and specificity comparing cognitive-only and cognitive-plus-previous-performance models.
It has already been noted that the inclusion of previous-item-responses may not improve the predictive utility of the models. Examining performance on Question 2 (Figure 5a), this seems to be the case. Inclusion of a person’s performance on Question 1 (blue bars) did not improve the accuracy of predictions, relative to the original cognitive-only model (red bars). If anything, it appears to have negatively affected sensitivity.
One concern regarding this conclusion is that the MCC analysis from Table 6 indicated that the threshold of .31 may have been too high for Model 2b, since Model 2b had the largest numerical decrease in MCC relative to its optimal threshold. An alternate analysis was thus run in which Model 2b was given its optimal threshold. This analysis did not change the results in any substantial way. These results are available in Appendix E.
In apparent contrast to Question 2 performance, the inclusion of previous-item-performance did, at least numerically, improve prediction of Question 3. As can be seen in Figure 5b, all three metrics increased when previous-item-performance was included as a predictor. However, consistent with the analysis of training results, this trend was not reliable: The accuracies of Model 3a (.66, 95% CI: .57-.74) and Model 3b (.74, 95% CI: .66-.81) were not distinct enough to confidently project these results to the population. The numerical differences were simply too noisy to draw a firm conclusion.
It remains possible that with a larger sample (i.e., more power), previous-item-responses could result in improved fidelity of predictions. However, given these data, it also remains tenable that, while previous-item-responses improved the fit of the trained models, they do not provide enough unique information to improve predictions that are generated by the models.
This study examined the utility of measures of working memory capacity as predictors of performance on CRT questions. Correct responses on the CRT are rare, due to the misleading nature of the test. Even so, memory updating and maintenance abilities proved to be promising predictors.
Unfortunately, working memory capacity seemed to be more of a prerequisite for making correct responses than a sufficient predictor: Although people with low working memory capacity rarely answered the questions correctly, many people with high working memory capacity nevertheless provided incorrect responses. It is hypothesized that a personality variable reflecting a person’s proneness to deep thought may help differentiate the people with high working memory capacity who are likely to recognize that the CRT questions are misleading from those who are not.
An additional concern that could not be rectified within this data set was that the working memory variables did not perform as well when predicting performance on the third item. This may be a reflection of people catching on to the intentionally misleading nature of the CRT, and thus being more careful with their responses. As a byproduct, working memory capacity may become less important to performance. However, since CRT items were always presented in the same order, it cannot be ruled out that the third question is simply easier to answer, which would also make working memory capacity less important to predictions.
Setting aside the need for further research, the present data are quite clear: Individual differences in working memory capacity are predictive of CRT performance. By extension, this is interpreted to mean that cognitive ability does indeed predict a person’s tendency to slow down and consider that their reasoning is faulty.
Engle, R. W. (2002). Working memory capacity as executive attention. Current Directions in Psychological Science, 11, 19-23.
Fukuda, K., & Vogel, E. K. (2011). Individual differences in recovery time from attentional capture. Psychological Science, 22, 361-368.
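Jensen, A. R. (1998). The g factor: The science of mental ability. Praeger.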
Kahneman, D. (2003). A perspective on judgment and choice: Mapping bounded rationality. American Psychologist, 58(9), 697–720.
Kleider, H. M., Parrott, D. J., & King, T. Z. (2009). Shooting behaviour: How working memory and negative emotionality influence police shoot decisions. Applied Cognitive Psychology, 23, 1-11.
Martin, J. D., Shipstead, Z., Harrison, T. L., Redick, T. S., Bunting, M., & Engle, R. W. (2020). The role of maintenance and disengagement in predicting reading comprehension and vocabulary learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 46, 140-154.
McVay, J. C., & Kane, M. J. (2011). Why does working memory capacity predict variation in reading comprehension? On the influence of mind wandering and executive attention. Journal of Experimental Psychology: General, 141, 302-320.
Oberauer, K., Schulze, R., Wilhelm, O., & Süß, H. M. (2005). Working memory and intelligence - their correlation and their relation: A comment on Ackerman, Beier, and Boyle (2005). Psychological Bulletin, 131, 61-65.
Shipstead, Z., Harrison, T. L., & Engle, R. W. (2016). Working memory capacity and fluid intelligence: Maintenance and disengagement. Perspectives on Psychological Science, 11, 771-799.
Shute, V. (1991). Who is likely to acquire programming skills? Journal of Educational Computing Research, 7, 1-24.
Stanovich, K. E., & West, R. F. (2000). Individual differences in reasoning: Implications for the rationality debate? Behavioral and Brain Sciences, 23(5), 645–665.
Question 1: A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?
Impression: 10 cents
Correct Answer: 5 cents
Question 2: If it takes 5 machines 5 minutes to make 5 widgets, how long would it take 100 machines to make 100 widgets?
Impression: 100 minutes
Correct Answer: 5 minutes
Question 3: In a lake, there is a patch of lily pads. Every day, the patch doubles in size. If it takes 48 days for the patch to cover the entire lake, how long would it take for the patch to cover half of the lake?
Impression: 24 days
Correct Answer: 47 days
Figure B1. Attention control example: The antisaccade test
Figure B1 displays an attention test known as the antisaccade. The essence of this type of test is to create a situation in which a reflex comes into conflict with a task requirement. In this case, a star flashes on one side of a computer screen, leading to the impulse to look to that side. However, the test taker must override this reflex and look to the opposite side of the screen where a letter is briefly presented and then masked.
The dependent measure is accuracy across several trials. The attention control composite score was created using two variants of the antisaccade test.
Figure B2. Working memory maintenance example: The operation span test
Figure B2 displays a prototypical working memory maintenance test, known as the operation span test (sometimes called Ospan). Test takers are required to remember a list of three to seven letters per trial. Maintenance of this information is continually disrupted by the requirement that test takers solve mathematical equations between presentations. This secondary task is disruptive to short-term memory, and thus the test taker must engage attention and memory retrieval in order to keep the primary information accessible.
The dependent measure is the number of items recalled in their correct serial position. The working memory maintenance composite score was based on the operation span and two similar tests, one of which required memory for locations on a spatial grid, and the other memory for the direction of arrows.
Figure B3. Working memory updating example: The running memory span test
Figure B3 displays the running memory span test. In this version of the test, a long string of digits is presented on a computer screen, one at a time. When the presentation ends, the test taker is cued to recall the last three to seven digits from the list. The challenge presented by this test is that the early items generate interference in memory, making it difficult to remember the relevant items. Test takers who can put early items out of mind (keep memory clean) will excel at this test.
The memory updating composite included (1) a composite of three varieties of verbal and spatial running memory span, (2) a composite of three varieties of n-back, and (3) a keeping track test. N-back presents a list of information, and the test taker must indicate when the current item matches the item presented three items earlier in the list. Keeping track presents a list of items, and the test taker must remember the most recently presented items from specific categories (e.g., the most recently presented metal, animal, and color).
| | AntiSac1 | AntiSac2 | Ospan | SymSpan | RotSpan | RunSpan1 | RunSpan2 | RunSpan3 | N_back | KeepTrack |
|---|---|---|---|---|---|---|---|---|---|---|
| mean | 0.80 | 0.56 | 54.99 | 27.07 | 25.13 | 56.78 | 58.25 | 25.24 | 1.58 | 33.52 |
| sd | 0.16 | 0.16 | 15.01 | 8.99 | 9.57 | 19.93 | 18.33 | 10.97 | 1.27 | 10.78 |
| min | 0.31 | 0.20 | 3.00 | 3.00 | 0.00 | 0.00 | 0.00 | 0.00 | -1.38 | 4.00 |
| max | 1.00 | 0.98 | 75.00 | 42.00 | 42.00 | 108.00 | 107.00 | 49.00 | 6.38 | 54.00 |
| skew | -0.76 | 0.06 | -0.89 | -0.48 | -0.47 | -0.45 | -0.48 | -0.36 | 0.69 | -0.51 |
| kurtosis | -0.40 | -0.77 | 0.18 | -0.41 | -0.54 | 0.13 | 0.55 | -0.70 | 0.76 | -0.59 |
Attention = Tests 1-2, Maintenance = Tests 3-5, Updating = Tests 6-10
Accuracy, sensitivity, and specificity comparing models of Question 2 performance: Optimal threshold added.