Replication of Thinking about a limited future enhances the positivity of younger and older adults’ recall: Support for socioemotional selectivity theory by Barber, Opitz, Martins, Sakaki & Mather (2016, Memory & Cognition)

Author

Jane Stephenson (janestep@stanford.edu)

Published

October 25, 2025

Introduction

I would like to attempt to replicate Experiment 2 in the article, “Thinking about a limited future enhances the positivity of younger and older adults’ recall: Support for socioemotional selectivity theory.” In this experiment, the authors found that people who were assigned to write and reflect on a limited future subsequently recalled more positive images than people who were assigned to write and reflect on either an expansive future or a time-neutral prompt, and that this difference was independent of mood. I chose this experiment because I am interested in age differences in emotional experience and in whether we can learn from the emotional strengths of older adulthood to design interventions that improve the emotional well-being of younger adults, who suffer disproportionately from mental distress.

To conduct this replication, 150 participants from an online sampling platform will be required, in alignment with the original study. To minimize context sensitivity, the procedures of the replication study will resemble those of the original study as much as possible.

After recruitment, participants will complete the Positive and Negative Affect Schedule (PANAS). Then, participants will be randomly assigned to one of three conditions. In the Limited Time Horizon condition, participants will respond to four questions about how they would act if they had 6 months left to live. In the Expansive Time Horizon condition, participants will write responses to four questions about how they would behave if they knew they would live to 120 years of age. Finally, in the control condition, participants will write about what they do on a normal day. The exact wording of these materials is available in the appendices of the original manuscript. After the writing activity, participants will report their mood on a sliding scale from 0 to 100.

Next, participants will complete an emotional picture memory task. The original study used images from the International Affective Picture System (IAPS) and noted the picture numbers in the manuscript. The IAPS images are available in my lab, so I will use the same images. One issue is that the manuscript indicates that seven negative and seven positive images were used, but the notes in the manuscript identify only six image numbers for the negative stimuli. To have equal numbers of positive and negative images, I will either have to omit a positive image or add another negative image of equal arousal. The original article also indicates that participants viewed four neutral images but does not specify which ones, so I will have to choose four neutral images that may differ from those in the original study.

After viewing the images, which are displayed for 5 seconds each in a slideshow, participants will be prompted to type short descriptions of as many of the pictures as they can recall. After the recall task, participants will provide demographic information and the study will conclude. Two raters will score the responses to the recall task to code whether each image was recalled.

The repository for this project can be found here.

The original paper can be found here.

Methods

Power Analysis

The effect size for the ANOVA in the original paper that found a significant difference in the positivity of recall for three groups had a partial eta-squared of 0.06. This translates to an effect size “f” of 0.25. To detect the same effect size, I would need…

For 95% power: 246 participants
For 90% power: 204 participants
For 80% power: 156 participants
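
The sample sizes above can be approximated in base R. The sketch below is not necessarily the calculation used for this report (the software used is not stated here). It converts partial eta-squared to Cohen's f and passes the implied variance of the group means to `power.anova.test()`. The helper name `n_per_group_anova` is hypothetical, and the alpha level of .05 is an assumption.

```r
# Convert partial eta-squared to Cohen's f: f = sqrt(eta2 / (1 - eta2))
eta2 <- 0.06
f <- sqrt(eta2 / (1 - eta2))  # roughly 0.25

# Hypothetical helper: per-group n for a one-way ANOVA with k groups.
# power.anova.test() expects the variance of the group means (n - 1 denominator),
# which equals f^2 * k / (k - 1) when the within-group variance is set to 1.
n_per_group_anova <- function(f, k, power, alpha = 0.05) {
  res <- power.anova.test(groups = k,
                          between.var = f^2 * k / (k - 1),
                          within.var = 1,
                          sig.level = alpha,   # assumed alpha
                          power = power)
  ceiling(res$n)  # round up to whole participants
}

targets <- c(0.95, 0.90, 0.80)
n_group <- sapply(targets, n_per_group_anova, f = f, k = 3)
anova_power <- data.frame(power = targets,
                          n_per_group = n_group,
                          n_total = 3 * n_group)
anova_power
```

Rounding up within each group keeps the three conditions balanced, so the total can differ by a participant or two from tools that round the total sample instead.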

In a follow-up t-test, the authors also found that the recall of participants in the limited time horizon condition was significantly more positive than that of participants in the control condition, with an effect size of d = .60. To detect an effect of this size in a two-group comparison, I would need:

For 95% power: 122 participants (61 people in each group)
For 90% power: 98 participants (49 people per group)
For 80% power: 72 participants (36 people in each group)
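
A comparable base R sketch for the two-group comparison uses `power.t.test()`. Whether the figures above assume a one-sided or a two-sided test is not stated; they appear closer to what a one-sided test at alpha = .05 requires, so the sketch reports both for comparison. The helper name `n_per_group_t` and the alpha level are assumptions.

```r
d <- 0.60
targets <- c(0.95, 0.90, 0.80)

# Hypothetical helper: per-group n for an independent-samples t-test.
# With sd = 1, delta is expressed directly in units of Cohen's d.
n_per_group_t <- function(d, power, alternative, alpha = 0.05) {
  res <- power.t.test(delta = d, sd = 1, sig.level = alpha, power = power,
                      type = "two.sample", alternative = alternative)
  ceiling(res$n)  # round up to whole participants
}

t_power <- data.frame(
  power = targets,
  n_one_sided = sapply(targets, n_per_group_t, d = d, alternative = "one.sided"),
  n_two_sided = sapply(targets, n_per_group_t, d = d, alternative = "two.sided")
)
t_power
```

A two-sided test always needs more participants per group than a one-sided test at the same alpha and power, so the choice of directionality matters for the planned sample.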

Planned Sample

Based on the power analyses above, I plan to collect a sample of 156 participants, with 52 people in each condition. This would give me 80% power to detect an omnibus ANOVA effect of the size reported in the original paper.

Procedure

The procedure for this project follows that of the original paper as closely as possible. The original procedures are quoted below:

“As a baseline measure of mood, participants first completed the Positive and Negative Affective Scale (PANAS; Watson, Clark, & Tellegen, 1988). This 20-item questionnaire lists 10 positive and 10 negative emotional adjectives and participants rate the extent to which they are currently feeling each adjective.

Participants were then randomly assigned to either the limited time horizon, expansive time horizon, or control condition. These conditions differed only in the writing activity that was next completed…

Immediately after the writing activity, participants in all three conditions indicated their mood using a sliding scale. Responses could range from 0 (very negative mood) to 100 (very positive mood).

Participants next completed an emotional picture memory task. The picture stimuli used in this task were seven positively-valenced and seven negatively-valenced pictures drawn from the IAPS (Lang et al., 1999). They were all low in arousal, and arousal level did not differ between the positive and negative pictures… During the incidental encoding task pictures were shown in a single random order. The picture slideshow progressed automatically, and participants could not go back and review pictures after they had disappeared. Each picture was shown with either a red or yellow border, and participants were asked to indicate the border’s color. Since this was an online study, this ensured that participants attended to all of the pictures during the encoding period. Each picture was shown for five seconds… Across participants each picture appeared equally often with a red border as it did with a yellow border. To buffer against primacy and recency effects, we also included four non-critical neutral pictures, two of which appeared at the beginning of the slideshow and two at the end. Immediately after viewing the pictures, participants completed a surprise, self-paced, free recall test. Here, they typed short descriptions of as many of the pictures as they could recall.

Finally, at the end of the study participants provided demographics information and also indicated whether they had encountered any technical problems or had seen the emotional picture stimuli in a previous experiment.”

Piloting of this procedure indicated that the survey takes approximately 12 minutes to complete.

The survey can be found here.

Analysis Plan

Prior to analysis, another rater and I will independently code whether each of the 14 critical and 4 non-critical pictures appears in each participant’s recall report. I will exclude participants who report having seen the IAPS emotional picture stimuli in previous studies, as well as any participants who report computer errors. To prepare the data, I will create positive and negative affect scores by summing the ratings of the positive and negative emotion items on the PANAS, respectively. I will also create a recall positivity score: the number of positive images recalled minus the number of negative images recalled, divided by the total number of critical images recalled.
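
As a minimal illustration of the recall positivity score, the sketch below computes it from counts of recalled positive and negative pictures. The function name `recall_positivity` and the choice to return NA when no critical picture is recalled are assumptions for illustration, not rules taken from the original paper.

```r
# Hypothetical helper: relative positivity of recall.
# Neutral buffer pictures are non-critical and do not enter the score.
recall_positivity <- function(n_pos, n_neg) {
  n_critical <- n_pos + n_neg
  # The score is undefined when no critical pictures were recalled;
  # return NA rather than the NaN that 0 / 0 would produce (an assumption)
  ifelse(n_critical == 0, NA_real_, (n_pos - n_neg) / n_critical)
}

# Four example participants: mostly positive, balanced, all negative, nothing recalled
scores <- recall_positivity(n_pos = c(4, 2, 0, 0), n_neg = c(2, 2, 3, 0))
scores
# 0.3333333  0.0000000 -1.0000000         NA
```

The score ranges from -1 (only negative pictures recalled) to 1 (only positive pictures recalled), with 0 indicating equal numbers of each.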

The key analysis of interest tests whether the positivity of recall differs between the three conditions. To test this, I will conduct a single-factor between-groups ANOVA on the relative positivity of participants’ recall. I will then conduct follow-up independent-samples t-tests to compare recall positivity between pairs of conditions.
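
The planned omnibus test and follow-up comparisons could be run as sketched below. The data are simulated placeholders, not study data, and the group means and standard deviation are arbitrary. Using Welch tests with unadjusted p-values for the follow-ups is an assumption; the plan above does not specify either choice.

```r
set.seed(1)

# Simulated placeholder data, NOT study data: 52 participants per condition
sim <- data.frame(
  condition = factor(rep(c("control", "expansive", "limited"), each = 52)),
  recall_positivity = c(rnorm(52, mean = 0.10, sd = 0.4),
                        rnorm(52, mean = 0.10, sd = 0.4),
                        rnorm(52, mean = 0.30, sd = 0.4))
)

# Omnibus test: single-factor between-groups ANOVA
omnibus <- aov(recall_positivity ~ condition, data = sim)
summary(omnibus)

# Follow-up independent-samples t-tests for every pair of conditions.
# pool.sd = FALSE runs separate Welch tests; p.adjust.method = "none"
# leaves p-values unadjusted (both are assumptions for this sketch).
followup <- pairwise.t.test(sim$recall_positivity, sim$condition,
                            p.adjust.method = "none", pool.sd = FALSE)
followup$p.value
```

`pairwise.t.test()` returns a lower-triangular matrix of p-values, one per pair of conditions, which makes it straightforward to report all three comparisons together.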

The second aim of the study is to test the role of mood in modulating the results. First, I will confirm that mood does not differ across conditions before the time horizon manipulation by performing ANOVAs on the positive and negative PANAS scores. Then I will use an ANOVA to test whether post-manipulation mood differs between the three groups, and I will test the association between post-manipulation mood and positivity of recall with a correlation test.

Finally, I will test whether mood mediates the association between time horizon condition and positivity of recall. To do this, I will run two regression models predicting recall positivity from time horizon condition (coded 1 for the limited time horizon condition and 0 for either the expansive time horizon or control condition), one without and one with mood as a covariate. I will then run a Sobel test to see whether the association is significantly strengthened after accounting for mood.
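
The Sobel test can be computed by hand from two regressions, as in the sketch below. The data are simulated placeholders, not study data, and all coefficients used to generate them are arbitrary. The statistic shown is the standard first-order Sobel z; a packaged implementation may differ in detail.

```r
set.seed(2)

# Simulated placeholder data, NOT study data
n <- 156
condition_bin <- rep(c(1, 0, 0), length.out = n)  # 1 = limited, 0 = expansive or control
mood <- 70 - 8 * condition_bin + rnorm(n, sd = 15)
recall_positivity <- 0.1 + 0.2 * condition_bin + 0.004 * mood + rnorm(n, sd = 0.4)

# Path a: condition -> mood. Path b: mood -> recall, controlling for condition.
fit_a <- lm(mood ~ condition_bin)
fit_b <- lm(recall_positivity ~ condition_bin + mood)

a  <- coef(summary(fit_a))["condition_bin", "Estimate"]
sa <- coef(summary(fit_a))["condition_bin", "Std. Error"]
b  <- coef(summary(fit_b))["mood", "Estimate"]
sb <- coef(summary(fit_b))["mood", "Std. Error"]

# First-order Sobel z statistic for the indirect effect a * b
sobel_z <- (a * b) / sqrt(b^2 * sa^2 + a^2 * sb^2)
sobel_p <- 2 * pnorm(abs(sobel_z), lower.tail = FALSE)

# Total effect (without mood) versus direct effect (with mood);
# in linear models fit to the same cases, total - direct equals a * b
c_total  <- unname(coef(lm(recall_positivity ~ condition_bin))["condition_bin"])
c_direct <- unname(coef(fit_b)["condition_bin"])

c(indirect = a * b, sobel_z = sobel_z, sobel_p = sobel_p,
  c_total = c_total, c_direct = c_direct)
```

When the indirect effect and the direct effect have opposite signs, the direct effect is larger in magnitude than the total effect, which is the pattern described above as the association being strengthened after accounting for mood.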

Differences from Original Study

This replication differs from the original in sample and setting: data are collected on Prolific rather than MTurk, 7-8 years after the original data were collected. Additionally, because I was not able to obtain the exact materials used for the original survey, the instructions given to participants and the structure of the survey may vary slightly. However, because the stimuli remain the same, I do not anticipate that these differences will affect the outcome.

Methods Addendum (Post Data Collection)

You can comment this section out prior to final report with data collection.

Actual Sample

Sample size, demographics, data exclusions based on rules spelled out in analysis plan

Differences from pre-data collection methods plan

Any differences from what was described as the original plan, or “none”.

Results

Data preparation

Data preparation following the analysis plan (I made this visible for the progress check, will not include it in final report).

### Data Preparation

#### Load Relevant Libraries and Functions
library(qualtRics)
library(readxl)
library(bda)

Loading required package: boot

bda - 19.0.0

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   4.0.0     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

#### Import data
pilotA_raw = read_survey("C:/Users/JaneS/OneDrive - Stanford/Classes/barber2016/data/pilotA_data_raw.csv") # raw data


── Column specification ────────────────────────────────────────────────────────
cols(
  .default = col_character(),
  Status = col_double(),
  Progress = col_double(),
  `Duration (in seconds)` = col_double(),
  Finished = col_double(),
  RecipientLastName = col_logical(),
  RecipientFirstName = col_logical(),
  RecipientEmail = col_logical(),
  ExternalReference = col_logical(),
  LocationLatitude = col_double(),
  LocationLongitude = col_double(),
  Q0_CONSENT = col_double(),
  PANAS_pos_interested = col_double(),
  PANAS_neg_distressed = col_double(),
  PANAS_pos_excited = col_double(),
  PANAS_neg_upset = col_double(),
  PANAS_pos_strong = col_double(),
  PANAS_neg_guilty = col_double(),
  PANAS_neg_scared = col_double(),
  PANAS_neg_hostile = col_double(),
  PANAS_pos_enthusiast = col_double()
  # ... with 17 more columns
)
ℹ Use `spec()` for the full column specifications.

pilotA_constructed_vars = read_excel("C:/Users/JaneS/OneDrive - Stanford/Classes/barber2016/data/pilotA_data_edit.xlsx") # data with ratings by coders

#### Data exclusion / filtering
pilotA_raw = pilotA_raw %>% 
  filter(previous_exposure == 2) %>% 
  filter(Q0_CONSENT == 1)

pilotA_constructed_vars = pilotA_constructed_vars %>% 
  slice(-1) %>% 
  filter(tech_problems_const != 1)

#### Prepare data for analysis - create columns etc.

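pilotA = pilotA_constructed_vars %>% 
  # NOTE: left_join() keeps every coded row, including any participant already
  # filtered out of pilotA_raw above; inner_join() would apply both sets of exclusions
  left_join(pilotA_raw, join_by(ResponseId)) %>% 
  mutate(
    # PANAS subscale totals (NA if any item is missing)
    PANAS_pos_sum = rowSums(pick(contains("PANAS_pos"))),
    PANAS_neg_sum = rowSums(pick(contains("PANAS_neg"))),
    # Coder ratings are read from Excel as text; convert to numeric
    across(ends_with("_pos"), as.numeric),
    across(ends_with("_neg"), as.numeric),
    across(ends_with("_neut"), as.numeric),
    # Number of recalled pictures of each valence
    pos_images_sum = rowSums(pick(ends_with("_pos")), na.rm = TRUE),
    neg_images_sum = rowSums(pick(ends_with("_neg")), na.rm = TRUE),
    neut_images_sum = rowSums(pick(ends_with("_neut")), na.rm = TRUE), 
    total_images_sum = rowSums(pick(pos_images_sum, neg_images_sum, neut_images_sum), na.rm = TRUE),
    # Critical pictures exclude the neutral buffer pictures
    critical_images = total_images_sum - neut_images_sum,
    # NaN when no critical pictures were recalled (0 / 0)
    recall_positivity = (pos_images_sum - neg_images_sum)/critical_images,
    # 1 = limited time horizon, 0 = expansive or control
    condition_bin = if_else(condition == "limited", 1, 0, missing = NA)
  ) %>% 
  select(ResponseId, PANAS_pos_sum, PANAS_neg_sum, condition, condition_bin, mood = mood_1, total_images_sum, critical_images, recall_positivity, Age, Gender, Education)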

Confirmatory analysis

The analyses as specified in the analysis plan.

# ANOVA for difference in recall positivity between 3 groups
aov_recall_positivity = pilotA %>% aov(recall_positivity ~ condition, data = .) 

summary(aov_recall_positivity)

            Df Sum Sq Mean Sq F value Pr(>F)
condition    2 0.3252  0.1626   0.801  0.526
Residuals    3 0.6088  0.2029

pilotA %>% 
  ggplot(aes(x = condition, y = recall_positivity)) + 
  stat_summary(fun = "mean", geom = "bar") +
  stat_summary(fun.data = "mean_se", geom = "errorbar", width = 0.3)

# Multiple Comparisons
TukeyHSD(aov_recall_positivity) # not going to look good because I don't have enough data yet

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = recall_positivity ~ condition, data = .)

$condition
                        diff       lwr      upr     p adj
expansive-control -0.1948718 -2.077281 1.687537 0.9050885
limited-control   -0.5615385 -2.443947 1.320870 0.5087202
limited-expansive -0.3666667 -2.249075 1.515742 0.7215362

# ANOVA for difference in positive and negative affect between three groups
## Positive
aov_PANAS_pos = pilotA %>% aov(PANAS_pos_sum ~ condition, data = .) 

summary(aov_PANAS_pos)

            Df Sum Sq Mean Sq F value Pr(>F)
condition    2  110.2   55.13   0.381  0.753
Residuals    1  144.5  144.50               
2 observations deleted due to missingness

## Negative
aov_PANAS_neg = pilotA %>% aov(PANAS_neg_sum ~ condition, data = .) 

summary(aov_PANAS_neg)

            Df Sum Sq Mean Sq F value Pr(>F)
condition    2  96.75   48.37   24.19  0.142
Residuals    1   2.00    2.00               
2 observations deleted due to missingness

# ANOVA for difference in mood between three groups
aov_mood = pilotA %>% aov(mood ~ condition, data = .) 

summary(aov_mood)

            Df Sum Sq Mean Sq F value Pr(>F)
condition    2    548     274   0.342   0.77
Residuals    1    800     800               
2 observations deleted due to missingness

TukeyHSD(aov_mood) # not going to look good because I don't have enough data yet

Warning in qtukey(conf.level, length(means), x$df.residual): NaNs produced

Warning in ptukey(abs(est), length(means), x$df.residual, lower.tail = FALSE):
NaNs produced

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = mood ~ condition, data = .)

$condition
                  diff lwr upr p adj
expansive-control  -32 NaN NaN   NaN
limited-control    -22 NaN NaN   NaN
limited-expansive   10 NaN NaN   NaN

# Mood and Recall

cor.test(pilotA$mood, pilotA$recall_positivity)


    Pearson's product-moment correlation

data:  pilotA$mood and pilotA$recall_positivity
t = 0.9965, df = 2, p-value = 0.424
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.8626235  0.9893800
sample estimates:
      cor 
0.5760001

# Mediation
pilotA %>% lm(recall_positivity ~ condition_bin, data = .) %>% summary()


Call:
lm(formula = recall_positivity ~ condition_bin, data = .)

Residuals:
       1        2        3        4        5        6 
-0.16410 -0.44103 -0.10000  0.63590 -0.03077  0.10000 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)     0.3641     0.2011   1.811    0.144
condition_bin  -0.4641     0.3482  -1.333    0.253

Residual standard error: 0.4021 on 4 degrees of freedom
Multiple R-squared:  0.3075,    Adjusted R-squared:  0.1344 
F-statistic: 1.776 on 1 and 4 DF,  p-value: 0.2535

pilotA %>% lm(mood ~ condition_bin, data = .) %>% summary()


Call:
lm(formula = mood ~ condition_bin, data = .)

Residuals:
  3   4   5   6 
-20  16 -16  20 
attr(,"label")
                                        mood_1 
"How would you rate your current mood? - Mood" 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)  
(Intercept)      76.00      18.11   4.196   0.0524 .
condition_bin    -6.00      25.61  -0.234   0.8366  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 25.61 on 2 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared:  0.02671,   Adjusted R-squared:  -0.4599 
F-statistic: 0.05488 on 1 and 2 DF,  p-value: 0.8366

pilotA %>% lm(recall_positivity ~ condition_bin + mood, data = .) %>% summary()


Call:
lm(formula = recall_positivity ~ condition_bin + mood, data = .)

Residuals:
      3       4       5       6 
 0.1236  0.1545 -0.1545 -0.1236 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)   -0.182927   0.619429  -0.295    0.817
condition_bin -0.699593   0.283572  -2.467    0.245
mood           0.011179   0.007724   1.447    0.385

Residual standard error: 0.2798 on 1 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared:  0.9057,    Adjusted R-squared:  0.7171 
F-statistic: 4.802 on 2 and 1 DF,  p-value: 0.3071

#mediation.test(pilotA$mood, pilotA$condition_bin, pilotA$recall_positivity)

Side-by-side graph with original graph is ideal here

Exploratory analyses

Any follow-up analyses desired (not required).

Discussion

Summary of Replication Attempt

Open the discussion section with a paragraph summarizing the primary result from the confirmatory analysis and the assessment of whether it replicated, partially replicated, or failed to replicate the original result.

Commentary

Add open-ended commentary (if any) reflecting (a) insights from follow-up exploratory analysis, (b) assessment of the meaning of the replication (or not) - e.g., for a failure to replicate, are the differences between original and present study ones that definitely, plausibly, or are unlikely to have been moderators of the result, and (c) discussion of any objections or challenges raised by the current and original authors about the replication attempt. None of these need to be long.