Replication of Sleep Preferentially Enhances Memory for Emotional Components of Scenes by Payne, Stickgold, Swanberg and Kensinger (2008, Psychological Science)

Author

Daniel Ogunbamowo (dogun@stanford.edu)

Published

October 22, 2023

Introduction

[No abstract is needed.] Each replication project will have a straightforward, no frills report of the study and results. These reports will be publicly available as supplementary material for the aggregate report(s) of the project as a whole. Also, to maximize project integrity, the intro and methods will be written and critiqued in advance of data collection. Introductions can be just 1-2 paragraphs clarifying the main idea of the original study, the target finding for replication, and any other essential information. It will NOT have a literature review – that is in the original publication. You can write both the introduction and the methods in past tense.

A short justification for your choice of experiment in terms of your research interests or research program (1 paragraph).

I am working with James Gross in the affective area, and the study of sleep and affect is a domain I am interested in. The lab just recently completed a large study on sleep bruxism and emotion regulation, and replicating this study will be a good way to prepare myself for exploring the lab’s data, and engaging with the relevant literature.

A description of the stimuli and procedures that will be required to conduct this experiment, and what the challenges will be (1-2 paragraphs).

Stimuli for this study include images containing a combination of a negative emotionally arousing object (a car crash), with a neutral background scene (a street), or a neutral object (a car) with a neutral background. Participants will be put into two groups. One group will view the set of images in the morning and the other group will view the images in the evening. Both groups will wait 12 hours (the morning group are not allowed to nap during the day), and will then be tested on how much they remember from the scenes they viewed earlier. During the test, they will be shown images that are either identical to the images they viewed earlier, images with similar object and background emotion valence to one they viewed earlier, or a new object. Participants will say whether they thought the image was identical, similar or new, for each image they are presented with during the test. The idea behind this study is to see whether emotionally arousing stimuli are better remembered than neutral stimuli after a night of sleep.

I think it might be difficult to get enough participants for this study. This was something Jinxiao, the student who ran the original replication, struggled with. I imagine I would have to start collecting data relatively early in the quarter to get a larger sample.

Secondly, this is not a study that a participant can complete in one 30-minute sitting. It will require them to participate at two separate times, before and after a 12-hour delay that for one group includes a night of sleep. Because participants won’t be in a lab, we won’t be able to monitor whether they actually follow the study guidelines, which may have a large impact on the resulting effect size.

A link to the repository and to the original paper (as hosted in your repo). The goal is that we should be able to click on these links and review that your repo is formatted right and look at the original paper.

Rescue Repository: https://github.com/dogunbamowo/payne2008_rescue
Original Replication Repository: https://github.com/psych251/payne2008
Original Paper (as hosted in my repo): https://github.com/dogunbamowo/payne2008_rescue/blob/main/original_paper/payne-et-al-2008-sleep-preferentially-enhances-memory-for-emotional-components-of-scenes.pdf

Summary of prior replication attempt

Based on the prior write-up, describe any differences between the original and 1st replication in terms of methods, sample, sample size, and analysis. Note any potential problems such as exclusion rates, noisy data, or issues with analysis.

The first replication project differed from the original in that it recruited online crowd workers (MTurk), whereas the participants from the original study were college students at Harvard and Boston College. The replication project experiments were also run online, whereas the original experiment was done in person, in a lab.

In the original experiment, participants were randomly assigned to one of two possible conditions (one in which they would view a set of stimuli after sleeping and the other after 12 hours awake). In the replication, participants were allowed to self-select which condition they would be in. This presents a strong chance of selection bias, and also reduces our opportunity to make causal inferences from the results.

The replication attempted to recruit 48 participants, but could include only 23 in the final analysis because many of them dropped out. The original study recruited 88 participants.

Methods

Power Analysis

Original effect size, power analysis for samples to achieve 80%, 90%, 95% power to detect that effect size. Considerations of feasibility for selecting planned sample size.

How much power does your planned sample have for original effect? For an attenuated effect that is half the size of the original?

(If power analysis is not possible or precise, discuss more fully how you determined a sample size that would be sufficient for rescue.)
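
As a starting point for the questions above, here is a rough sketch that treats the sleep-versus-wake contrast as a simple two-sample t-test using base R's `power.t.test`. The effect size `d_assumed` is a placeholder and is not taken from the original paper, and the even split of 88 participants into two groups of 44 is also an assumption. The actual target effect is an interaction in a mixed ANOVA, so a tailored calculation would likely be needed.

```r
# Placeholder standardized effect size (Cohen's d); NOT taken from Payne et al. (2008).
d_assumed <- 0.8

# Per-group n needed for a two-sample t-test at 80%, 90%, and 95% power.
targets <- c(0.80, 0.90, 0.95)
n_per_group <- sapply(targets, function(p) {
  ceiling(power.t.test(delta = d_assumed, sd = 1, sig.level = 0.05, power = p)$n)
})
names(n_per_group) <- paste0(targets * 100, "%")
print(n_per_group)

# Power of an assumed 44 per group (88 total) for the placeholder effect
# and for an attenuated effect half that size.
planned_n  <- 44
power_full <- power.t.test(n = planned_n, delta = d_assumed, sd = 1)$power
power_half <- power.t.test(n = planned_n, delta = d_assumed / 2, sd = 1)$power
print(c(full = power_full, half = power_half))
```

Replacing `d_assumed` with the effect size reported in the original article would give figures that can be compared with the planned sample.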

Planned Sample

Planned sample size and/or termination rule, sampling frame, known demographics if any, preselection rules if any.

I plan to recruit at least 88 participants using Prolific. This figure may change after power analyses are completed. However, since this is a rescue attempt of a replication that may have failed largely because of its small sample size, I think it makes sense to plan for a sufficiently large sample. Participants who complete neither or only one of the two required experiment sessions will be removed from all analyses.

In the replication project, 91.3% of the sample had finished or attended college, so I will attempt to collect a sample that is more diverse in SES.

Materials

All materials - can quote directly from original article - just put the text in quotations and note that this was followed precisely. Or, quote directly and just point out exceptions to what was described in the original article.

“The scenes portrayed negative arousing or neutral objects placed on plausible neutral backgrounds. For each of 64 scenes (e.g., a car on a street), we created eight different versions by placing each of two similar neutral objects (e.g., two images of a car) and each of two related negative objects (e.g., two images of a car accident) on each of two plausible neutral backgrounds (e.g., two images of a street). An additional 32 scenes served as lures on a recognition memory test (Fig. 1). Participants in a previous study had rated the objects and backgrounds for valence and arousal, using 7-point scales (Kensinger, Garoff-Eaton, & Schacter, 2006). All negative objects had received arousal ratings of 5 to 7 (with high scores signifying an exciting or arousing image) and valence ratings lower than 3 (with low scores signifying a negative image). All neutral items (objects and backgrounds) had been rated as nonarousing (arousal values lower than 4) and neutral (valence ratings between 3 and 5).”

This will be followed precisely.

Fig. 1. “Examples of the scenes presented to subjects. Eight versions of each scene were created by combining each of four similar objects (two neutral objects, two negative and arousing emotional objects) with each of two plausible neutral backgrounds. In this example, the two neutral central objects are cars, and the two negative central objects are cars damaged in an accident; the neutral backgrounds are street scenes. Two of the eight versions of the completed scene are shown.” http://journals.sagepub.com/na101/home/literatum/publisher/sage/journals/content/pssa/2008/pssa_19_8/j.1467-9280.2008.02157.x/20160829/images/medium/10.1111_j.1467-9280.2008.02157.x-fig1.gif

Procedure

Can quote directly from original article - just put the text in quotations and note that this was followed precisely. Or, quote directly and just point out exceptions to what was described in the original article.

“Participants studied a set of 64 scenes (32 with a neutral object and 32 with a negative object, all on neutral backgrounds) for 5 s each, and then indicated on a 7-point scale whether they would approach or move away from the scene if they encountered it in real life. This task was used to maximize encoding.

After the delay period, participants performed an unexpected, self-paced recognition task. During this task, objects and backgrounds were presented separately and one at a time. Some of these objects and backgrounds were identical to the scene components that had been studied (e.g., the same car accident), others were the alternate version of the object or background and therefore shared the same verbal label but differed in specific visual details (e.g., a similar car accident), and others were objects or backgrounds that had not been studied (new). Participants never saw both the same and the similar version of an item at test. Each object or background was presented with a question (e.g., “Did you see a monkey?”). If the answer to the question was “yes,” participants pressed one button to indicate that the object or background was an exact match to a studied component (“same”) or a second button to indicate that it was not an exact match (“similar”). If the answer to the question was “no,” they pressed a third button.”

The recognition task includes 32 same objects (16 negative, 16 neutral), 32 similar objects (16 negative, 16 neutral), 32 new objects (16 negative, 16 neutral), 32 same backgrounds (16 previously shown with a negative object, 16 previously shown with a neutral object), 32 similar backgrounds (16 previously shown with a negative object, 16 previously shown with a neutral object), and 32 new backgrounds.

This will be followed precisely. For the replication project, the experiment sessions before and after the delay period will be referred to as Session 1 and Session 2 respectively. I will also do this for my rescue attempt.

Controls

What attention checks, positive or negative controls, or other quality control measures are you adding so that a (positive or negative) result will be more interpretable?

Neither the original study nor the replication controlled for sleep quality or number of hours slept, but I think it would be useful to control for these in the rescue attempt.

I also think it might be useful to periodically check with participants whether they actually find some of the stimuli to be negatively or neutrally arousing, which was the intention of the creators of the stimuli. After looking through the images that make up the stimulus set, whilst some of them may induce negative affect, some seemed too unrealistic to induce negative affect, or just seemed like they would stand out as being strange rather than negative (potentially due to appearing poorly photoshopped). This might need to be done with a separate set of participants in a different experiment.

Analysis Plan

Can also quote directly, though it is less often spelled out effectively for an analysis strategy section. The key is to report an analysis strategy that is as close to the original - data cleaning rules, data exclusion rules, covariates, etc. - as possible.

Clarify key analysis of interest here. You can also pre-specify additional analyses you plan to do.

“We scored a response as specific recognition of visual details when a subject correctly responded “same” to a same item, but as general recognition without specific details when a subject responded “similar” to a same item. Because “similar” responses were constrained by the number of “same” responses (i.e., subjects responded “similar” only when they did not remember the visual details), we computed the general recognition score as the proportion of “similar” responses after exclusion of “same” responses (similar/[1 - same]).” “Specific and general recognition scores were computed separately for central objects (negative or neutral) and for the peripheral neutral backgrounds (studied with either a negative or a neutral object).”
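
The scoring rule quoted above can be sketched in a few lines of base R. The response vector and the function name `score_recognition` are hypothetical and only illustrate the arithmetic for one participant's responses to "same" test items.

```r
# Toy responses to "same" test items for one participant (hypothetical data).
# Each element is the button pressed: "same", "similar", or "new".
responses <- c("same", "same", "similar", "new", "similar", "same", "new", "similar")

score_recognition <- function(resp) {
  p_same    <- mean(resp == "same")     # specific recognition
  p_similar <- mean(resp == "similar")  # raw proportion of "similar" responses
  # General recognition: "similar" responses after excluding "same" responses.
  # Undefined (NA) if every item was called "same".
  general <- if (p_same < 1) p_similar / (1 - p_same) else NA_real_
  c(specific = p_same, general = general)
}

scores <- score_recognition(responses)
print(scores)
```

With three "same" and three "similar" responses out of eight items, specific recognition is 3/8 = 0.375 and general recognition is 0.375 / (1 - 0.375) = 0.6. In the real analysis, this would be applied separately to objects and backgrounds within each valence category.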

I will follow the replication, which planned to perform a 2 (condition: sleep, wake) x 2 (valence: negative, neutral) x 2 (scene component: object, background) mixed ANOVA, as well as a follow-up 2 (condition: sleep, wake) x 2 (valence: negative, neutral) mixed ANOVA applied to the recognition of objects and backgrounds separately.

Time and funds permitting, I also plan to do additional analyses double-checking the valence ratings of the stimuli, as well as whether sleep quality (low, medium, high) has any potential interaction effect.
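
A minimal sketch of how these ANOVAs could be specified with base R's `aov` follows. It assumes condition is a between-subjects factor and that valence and scene component are within-subjects factors. The data are simulated random numbers with hypothetical column names, so the output says nothing about the real effect.

```r
set.seed(1)

# Simulated long-format data: 20 hypothetical participants, each contributing
# one general-recognition score per valence x component cell.
n_per_group <- 10
d <- expand.grid(
  id        = factor(seq_len(2 * n_per_group)),
  valence   = factor(c("negative", "neutral")),
  component = factor(c("object", "background"))
)
# Condition is between-subjects: first half sleep, second half wake.
d$condition <- factor(ifelse(as.integer(d$id) <= n_per_group, "sleep", "wake"))
d$score <- runif(nrow(d))  # random placeholder scores, no real effect built in

# 2 x 2 x 2 mixed ANOVA: condition between, valence and component within.
fit_full <- aov(score ~ condition * valence * component +
                  Error(id / (valence * component)), data = d)
print(summary(fit_full))

# Follow-up 2 x 2 mixed ANOVA on objects only.
fit_obj <- aov(score ~ condition * valence + Error(id / valence),
               data = subset(d, component == "object"))
print(summary(fit_obj))
```

The same follow-up call with `component == "background"` would cover the backgrounds. Packages such as `afex` or `ez` offer more convenient interfaces, but `aov` keeps the sketch dependency-free.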

Differences from Original Study and 1st replication

Explicitly describe known differences in sample, setting, procedure, and analysis plan from original study. The goal, of course, is to minimize those differences, but differences will inevitably occur. Also, note whether such differences are anticipated to make a difference based on claims in the original article or subsequent published research on the conditions for obtaining the effect.

Key Differences

- I aim to recruit a sample greater than that of the original study (n = 88) and the first replication (n = 23)
- I intend to recruit a sample diverse in SES
- Participants will be randomly assigned to one of the two conditions (sleep or wake), rather than assigning themselves to a condition
- The original study was done in person, and the replication was done online using MTurk workers. I will do my rescue online using Prolific workers
- I plan to run a smaller parallel study confirming the valence ratings of the stimuli by the original authors
- I plan to ask participants in all conditions about their sleep quality the previous night when they complete the second of the two experimental sessions
- I plan to include sleep quality in an additional mixed ANOVA. This may result in a slight change to the original claims of the article, which were that sleep preferentially enhances memory for emotional components of scenes. A novel result for this control would result in a new claim: Depending on sleep quality, sleep preferentially enhances memory for emotional components of scenes
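
The random assignment mentioned in the list above could be done along the following lines. This is a sketch only: the participant IDs are made up, and the even 44/44 split of 88 participants is an assumption.

```r
set.seed(2023)

# Hypothetical participant IDs; in practice these would come from Prolific.
participant_id <- sprintf("P%03d", 1:88)

# Balanced random assignment: shuffle an equal number of each label.
condition  <- sample(rep(c("sleep", "wake"), length.out = length(participant_id)))
assignment <- data.frame(participant_id, condition)

print(table(assignment$condition))
```

Shuffling a fixed set of labels, rather than drawing each label independently, guarantees equal group sizes. Each participant's assigned condition would then determine whether they are invited to the evening or the morning first session.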

Methods Addendum (Post Data Collection)

You can comment this section out prior to final report with data collection.

Actual Sample

Sample size, demographics, data exclusions based on rules spelled out in analysis plan

Differences from pre-data collection methods plan

Any differences from what was described as the original plan, or “none”.

Results

Data preparation

Data preparation following the analysis plan.

### Data Preparation

#### Load Relevant Libraries and Functions
library(qualtRics)
library(tidyverse)
library(lubridate)
library(ggplot2)


#### Import data

# Read in stimuli rating data
# Read in session 1 data
# Read in session 2 data


#### Data exclusion / filtering

# Filter out participants who did not complete session 2
# Filter out participants who completed session 2 outside of study window
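
One possible shape for these two filters is sketched below in base R. The column names, the timestamps, and the tolerance of plus or minus one hour around the 12-hour delay are all assumptions for illustration. The real Qualtrics export and the final study window may differ.

```r
# Hypothetical session timestamps; column names are illustrative only.
sessions <- data.frame(
  id       = c("P001", "P002", "P003", "P004"),
  s1_end   = as.POSIXct(c("2023-11-01 09:00", "2023-11-01 09:30",
                          "2023-11-01 21:00", "2023-11-01 21:15"), tz = "UTC"),
  s2_start = as.POSIXct(c("2023-11-01 21:05", NA,
                          "2023-11-02 14:00", "2023-11-02 09:20"), tz = "UTC")
)

# Delay between sessions in hours; NA when session 2 was never started.
sessions$delay_h <- as.numeric(difftime(sessions$s2_start, sessions$s1_end,
                                        units = "hours"))

# Assumed study window: 12 hours plus or minus 1 hour (placeholder tolerance).
window_h  <- c(11, 13)
completed <- !is.na(sessions$delay_h)
in_window <- completed &
  sessions$delay_h >= window_h[1] & sessions$delay_h <= window_h[2]

kept <- sessions[in_window, ]
print(kept$id)
```

In this toy data, P002 never started session 2 and P003 returned after 17 hours, so only P001 and P004 are retained.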


#### Prepare data for analysis - create columns etc.

# Convert csv’s into long format 
# For stimuli csv, names as “Stimulus”, “Rating”
# For session 2 csv, names as “Object Valence”, “Memory Response”, “Correct Response”, “Condition”
# Calculate specific and general recognition rate
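
The wide-to-long step could look roughly like this with base R's `reshape`. The data frame, its column names, and the `item_` prefix are hypothetical stand-ins for the real session 2 export.

```r
# Hypothetical wide session 2 export: one column per test item.
wide <- data.frame(
  id        = c("P001", "P002"),
  condition = c("sleep", "wake"),
  item_1    = c("same", "similar"),
  item_2    = c("new", "same"),
  stringsAsFactors = FALSE
)

item_cols <- grep("^item_", names(wide), value = TRUE)

# One row per participant x item.
long <- reshape(
  wide,
  direction = "long",
  varying   = item_cols,
  v.names   = "memory_response",
  timevar   = "item",
  times     = item_cols,
  idvar     = "id"
)
rownames(long) <- NULL
print(long)
```

Since the tidyverse is already loaded in this report, `tidyr::pivot_longer` would be an equally reasonable choice. The object valence and correct response for each item could then be merged in from a stimulus key.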

Results of control measures

How did people perform on any quality control checks or positive and negative controls?

Using stimulus csv, calculate mean valence ratings of negatively valenced stimuli and neutrally rated stimuli
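
A small sketch of this check is given below, assuming a long-format ratings table with hypothetical column names and made-up ratings on the 7-point valence scale. The thresholds are those quoted in the Materials section: negative items below 3, neutral items between 3 and 5.

```r
# Hypothetical long-format ratings on the 7-point valence scale.
ratings <- data.frame(
  stimulus = c("car_crash", "car_crash", "car", "car", "snake", "monkey"),
  category = c("negative", "negative", "neutral", "neutral", "negative", "neutral"),
  rating   = c(2, 1, 4, 4, 2, 5)
)

# Mean rating per intended valence category.
mean_by_category <- aggregate(rating ~ category, data = ratings, FUN = mean)
print(mean_by_category)

# Compare the means against the original selection criteria.
neg_mean <- mean_by_category$rating[mean_by_category$category == "negative"]
neu_mean <- mean_by_category$rating[mean_by_category$category == "neutral"]
print(c(negative_ok = neg_mean < 3, neutral_ok = neu_mean >= 3 && neu_mean <= 5))
```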

Confirmatory analysis

The analyses as specified in the analysis plan.

Three-panel graph with original, 1st replication, and your replication is ideal here

SINGLE statistical test that justifies the main inference of the paper

2 (condition: sleep, wake) x 2 (valence: negative, neutral) x 2 (scene component: object, background) mixed ANOVA on general recognition

Further Analyses

2 (condition: sleep, wake) x 2 (valence: negative, neutral) mixed ANOVA on the general recognition of objects

2 (condition: sleep, wake) x 2 (valence: negative, neutral) mixed ANOVA on the recognition of backgrounds

Exploratory analyses

Any follow-up analyses desired (not required).

The results of general recognition of the two groups (wake and sleep) in plot

The results of specific recognition in plot

Exploratory Analysis

2 (condition: sleep, wake) x 2 (valence: negative, neutral) x 2 (scene component: object, background) x 2 (sleep quality: high, low) mixed ANOVA on general recognition

Discussion

Mini meta analysis

Combining across the original paper, 1st replication, and 2nd replication, what is the aggregate effect size?
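
One simple way to aggregate the three studies is a fixed-effect, inverse-variance weighted average, sketched here in base R. Every number in the `studies` table is a placeholder chosen for illustration. None of them are results from the original paper, the first replication, or this rescue.

```r
# Placeholder effect sizes (d) and sampling variances; NOT real results.
studies <- data.frame(
  study = c("original", "replication_1", "rescue"),
  d     = c(0.50, 0.10, 0.30),
  var_d = c(0.05, 0.18, 0.05)
)

# Fixed-effect (inverse-variance) pooled estimate with a 95% interval.
w        <- 1 / studies$var_d
d_pooled <- sum(w * studies$d) / sum(w)
se       <- sqrt(1 / sum(w))
ci       <- d_pooled + c(-1, 1) * qnorm(0.975) * se

print(round(c(d_pooled = d_pooled, se = se, ci_low = ci[1], ci_high = ci[2]), 3))
```

Studies with smaller sampling variance (typically larger samples) receive more weight. A random-effects model, for example via the `metafor` package, may be more appropriate if the studies are expected to differ systematically.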

Summary of Replication Attempt

Open the discussion section with a paragraph summarizing the primary result from the confirmatory analysis and the assessment of whether it replicated, partially replicated, or failed to replicate the original result.

Commentary

Add open-ended commentary (if any) reflecting (a) insights from follow-up exploratory analysis, (b) assessment of the meaning of the replication (or not) - e.g., for a failure to replicate, are the differences between original and present study ones that definitely, plausibly, or are unlikely to have been moderators of the result, and (c) discussion of any objections or challenges raised by the current and original authors about the replication attempt. None of these need to be long.