Replication of Economic Insecurity Increases Physical Pain by Chou et al. (2016, Psychological Science)

Author

Jordan Troutman (troutman@cs.stanford.edu)

Published

December 7, 2023

Introduction

The domain of this project is a bit off-center from my current research. My research is in the space of Human-Computer Interaction, specifically focusing on how hegemonic digital constructs implicitly influence our perceptions of the world. As with most topics in this course, experience running randomized controlled trials will help me test causal theories. Nevertheless, I picked this project because I am interested in learning more about how our mental state affects our physical state.

This specific paper from Chou et al. (2016) aims to understand if economic insecurity has an effect on physical pain. The original authors claim to demonstrate this through six experiments, and one meta-analysis with five of the experiments. Here, we shift our focus to replicating the experiment not used in the meta-analysis: Lack of Control Produces Physical Pain (Study 5).

Participants are randomly assigned to Group A or Group B. In Group A (control), participants are asked to recount a time when they were in complete control of a situation. In Group B (intervention), participants are asked to do the opposite, namely recount a time they felt “a complete lack of control.” Both groups then spend a few minutes reflecting on these incidents and are then asked to indicate the level of physical pain they are experiencing. The main challenge of this study is assessing the severity of each participant’s recalled memory; some individuals in Group B may recall very painful memories, while others may not. It is also a challenge to determine whether the pain they report is directly attributable to the intervention, or whether circumstances like environment and initial pain levels affect the final measurement. There is also an ethical consideration: it will need to be determined whether it is acceptable to potentially trigger participants’ harmful experiences for the sake of the study.

Documentation relating to this project can be found here

Summary of prior replication attempt

Based on the prior write-up, describe any differences between the original and 1st replication in terms of methods, sample, sample size, and analysis. Note any potential problems such as exclusion rates, noisy data, or issues with analysis.

The first replication aimed to stay as close to the initial study as possible. At a high level, the replication recruited participants through Amazon Mechanical Turk (MTurk) and assigned them to either the high- or low-control condition. It then measured the average reported pain level within each group, conducted a t-test, and determined the effect size using Cohen’s d.

The replication study mainly deviates from the initial study in its procedure and, consequently, in its analysis. At the end of the study, the replication asks five additional questions which “captures respondents’ psychological sense of lacking control.” The average of those questions is used as an additional control within the analysis. Additionally, the replication asks participants whether they have used painkillers and, if so, the type and frequency.

Within the analysis, the replication observes the group averages of the psychological sense of lacking control to ensure the treatment condition actually intervenes on a sense of control, rather than intervening on the specific instances participants are thinking of, which generally could be more painful independent of the amount of control they have.

Another factor to note is the replication increases the sample size, since the initial study was under-powered (post-hoc).

Neither the replication nor the initial study seems to take painkiller use into account when comparing the randomly assigned groups. If painkillers can alter a participant’s response to the pain questions, it is important to know how many people in each group are taking them. An imbalance would introduce painkiller use as something that could affect the captured measure differently depending on group assignment.
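A balance check of this kind could look like the sketch below. It is only an illustration: the data are simulated, and the column names `condition` and `Painkiller` are assumed to match the survey export rather than taken from either prior study.

```r
# Simulated, hypothetical data: these values are NOT from either study.
set.seed(1)
n <- 200
toy <- data.frame(
  condition  = sample(c("High", "Low"), n, replace = TRUE),
  Painkiller = sample(c("Yes", "No"), n, replace = TRUE, prob = c(0.3, 0.7))
)

# Cross-tabulate painkiller use by condition
balance <- table(toy$condition, toy$Painkiller)
print(balance)

# Share of painkiller users within each condition (each row sums to 1)
print(prop.table(balance, margin = 1))

# Fisher's exact test of independence between condition and painkiller use
balance_test <- fisher.test(balance)
print(balance_test$p.value)
```

A large imbalance in the within-condition shares would be a reason to include painkiller use as a covariate in the regression models.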

Methods

Power Analysis

Original effect size, power analysis for samples to achieve 80%, 90%, 95% power to detect that effect size. Considerations of feasibility for selecting planned sample size.

How much power does your planned sample have for original effect? For an attenuated effect that is half the size of the original?

(If power analysis is not possible or precise, discuss more fully how you determined a sample size that would be sufficient for rescue.)

The results from the initial paper compare the average pain levels between the conditions (control vs. lack of control). A two-tailed two-sample t-test is used to determine whether the sample means of the two groups are statistically different from each other.

The initial paper reports \(t=2.94\) and a Cohen’s d effect size of 0.41. The first replication reports \(t=0.55\) and an effect size of \(d=0.191\) with 156 participants in total (78 per group).
Post hoc, the initial study was indeed underpowered, with a power of 0.53. The replication has a power of 0.93.

Running a power analysis on the initially reported effect, the following sample sizes would have been required to detect the reported effect size (for groups of equal size):

Power = 80%: n = 95 per group
Power = 90%: n = 126 per group
Power = 95%: n = 156 per group

Though the replication was reported as well powered, its sample size is lower than the size needed for 80% power, which is on the small side for a replication. As a result, this study aims to have \(n=125\) participants per group, which is 2.5x the original per-group sample size. According to the a priori power analysis, this sample is well powered and substantially larger than the initial one, which makes it a safe way to test the reproducibility of the experiment.
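The power calculations above can be sketched with base R's `power.t.test()`. This assumes a two-sided two-sample t-test at \(\alpha = 0.05\) and takes the reported \(d = 0.41\) as the standardized effect (so `sd = 1`); the variable names are illustrative.

```r
# Reported Cohen's d from the original study, used as a standardized effect
d_original <- 0.41
targets <- c(0.80, 0.90, 0.95)

# Per-group n needed to reach each target power (rounded up)
n_required <- sapply(targets, function(p) {
  ceiling(power.t.test(delta = d_original, sd = 1, sig.level = 0.05,
                       power = p, type = "two.sample",
                       alternative = "two.sided")$n)
})
names(n_required) <- paste0(targets * 100, "%")
print(n_required)

# Power of the planned sample (125 per group) for the original effect
# and for an attenuated effect half that size
n_planned <- 125
power_full <- power.t.test(n = n_planned, delta = d_original,
                           sd = 1, sig.level = 0.05)$power
power_half <- power.t.test(n = n_planned, delta = d_original / 2,
                           sd = 1, sig.level = 0.05)$power
print(c(full_effect = power_full, half_effect = power_half))
```

The per-group sizes should come out close to those listed above. The second calculation addresses the question of how much power the planned sample retains if the true effect is half the original.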

Planned Sample

The plan is to use Prolific to recruit participants.

Materials

All materials - can quote directly from original article - just put the text in quotations and note that this was followed precisely. Or, quote directly and just point out exceptions to what was described in the original article.

The materials used in the original article and the replication study will be used.

“Participants were asked to write about an autobiographical experience (Whitson & Galinsky, 2008). Half recalled a time when they lacked control. The other half recalled a time when they had complete control.

After the recall task, participants were asked to “choose the overall pain level that best describes how much physical pain you are experiencing RIGHT NOW.” They responded using a visual slider scale (Portenoy & Kanner, 1996), on which 0 indicated no pain and 100 indicated the worst pain ever experienced.

Participants also reported their age, gender, and current employment status, and whether they were using painkillers.”

“Subjects then completed a short, five-item measure of control, that was an additional scale added to the author’s original procedure. See Differences from Original Study section for more details.” (Zion)

Additionally, participants will first be asked to answer a set of pre-survey questions about overall pain levels before completing the control portion. These questions help participants acclimate to the pain scale and provide relative measures and baselines for the intervention. The questions are as follows:

On a scale of 0 - 100, where do you place getting hit by a pillow?
On a scale of 0 - 100, where do you place stepping on a nail?
On a scale of 0 - 100, where do you place being stung by a bee?
On a scale of 0 - 100, where do you place scratching your elbow?
On a scale of 0 - 100, where do you place burning your hand?

The survey is linked here

The test data is located in the GitHub repo, in the data folder, under test_data.csv

Procedure

The procedure will generally follow that of the first replication.

Participants will begin by “reading the IRB header and were asked to agree / disagree to continue”
Participants will then be randomized into either high or low control conditions
Regardless of condition, participants will be asked to complete the pre-survey questions stated above.
Participants then will be asked to state their level of pain on a scale of 0 - 100 RIGHT NOW.
Based on the condition, participants will be “asked to write about an autobiographical experience (Whitson & Galinsky, 2008). Half recalled a time when they lacked control. The other half recalled a time when they had complete control.”
Participants will then be prompted again to “choose the overall pain level that best describes how much physical pain you are experiencing RIGHT NOW.”
Participants will be asked the “five-item measure of control” (Zion) from the replication study.
Lastly, participants will fill out a survey reporting “age, gender, employment status, and use of painkillers” (Zion).

Controls

Within the pre-survey questions, one attention check will be added which states “On a scale of 0 - 100, where would you put getting hit by a ball? Please answer with 50.” Overall, the pre-survey section is a method of understanding how a participant internalizes the pain scale. Asking each participant to then state their current level of pain pre-intervention provides a stronger baseline to the effect of the intervention.

Analysis Plan

Summary statistics for overall and per-condition will be reported, as in the original study. The original study and replication study make no note of any data cleaning or exclusion rules. If participants do not pass the attention check, their input will not be included.

Average pain levels (post-intervention) will be reported, as well as regression models adding controls for age, gender, employment status, and painkiller use. Significance and effect size will be assessed with a t-test and Cohen’s d, respectively.

Can also quote directly, though it is less often spelled out effectively for an analysis strategy section. The key is to report an analysis strategy that is as close to the original - data cleaning rules, data exclusion rules, covariates, etc. - as possible.

The key analysis of interest is the two-sample t-test between the conditions (sense of control vs. lack of control). Additionally, the average difference in pain levels pre- and post-intervention will be reported. This will give a more defined perspective on how each participant’s pain level changes as a “reaction” to the prompt. A t-test on these pre-to-post differences can then be performed between the two groups.
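As a rough sketch of the change-score comparison described above, the following uses simulated data. The column names `pain_pre`, `pain_post`, and `pain_change` are hypothetical, and the simulated shift in the low-control group is arbitrary, not an expected result.

```r
# Simulated, hypothetical data for illustration only
set.seed(42)
n_per_group <- 125
toy <- data.frame(
  condition = rep(c("High", "Low"), each = n_per_group),
  pain_pre  = round(runif(2 * n_per_group, min = 0, max = 40))
)

# Post-intervention pain: baseline plus noise, with an arbitrary shift for "Low",
# clamped to the 0-100 slider range
shift <- ifelse(toy$condition == "Low", 3, 0)
noise <- round(rnorm(2 * n_per_group, mean = shift, sd = 8))
toy$pain_post <- pmin(100, pmax(0, toy$pain_pre + noise))

# Change score: positive values mean pain increased after the recall task
toy$pain_change <- toy$pain_post - toy$pain_pre

# Welch two-sample t-test on the change scores between conditions
change_test <- t.test(pain_change ~ condition, data = toy)
print(change_test)

# Mean change per condition
print(aggregate(pain_change ~ condition, data = toy, FUN = mean))
```

Comparing change scores rather than raw post-intervention scores removes each participant's baseline pain from the outcome, which is the motivation for collecting the pre-intervention measure.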

Differences from Original Study and 1st replication

The primary difference from the original study and the first replication is that the data in this study are collected from a representative sample of the US. The initial claims of the study are about US populations, but it is not clear how representative the original sample is beyond gender and employment status. This study will intentionally recruit a sample that is representative in sex, age, and ethnicity. Furthermore, this study records a baseline for how a participant is feeling before the intervention. This would allow for a clearer understanding of the intervention effect.

Methods Addendum (Post Data Collection)

You can comment this section out prior to final report with data collection.

Actual Sample

Sample size, demographics, data exclusions based on rules spelled out in analysis plan

Differences from pre-data collection methods plan

Any differences from what was described as the original plan, or “none”.

Results

Data preparation

Read in data from Qualtrics
Remove location data with .py file.
Drop participants who do not pass the attention check (answer getting hit by a ball == 50)
Create a column for the “control-pain” level defined by Zion.
For [group 1, group 2, overall group] report the following summary statistics for age, employment status, painkiller use, sex, ethnicity. Also plot the distributions of the pain-level (post-intervention) to check normality condition.
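The exclusion and scoring steps above might be sketched as follows. The column names `attention_check` and `control_1` to `control_5`, the toy values, and the 1-7 item scale are all assumptions for illustration; the real Qualtrics export will use different names.

```r
# Hypothetical raw export: names and values are placeholders, not real data
raw <- data.frame(
  condition       = c("High", "Low", "High", "Low", "High", "Low"),
  attention_check = c(50, 50, 12, 50, 50, 50),
  control_1 = c(2, 6, 4, 5, 1, 7),
  control_2 = c(3, 5, 4, 6, 2, 6),
  control_3 = c(2, 6, 3, 5, 1, 7),
  control_4 = c(4, 7, 4, 6, 2, 5),
  control_5 = c(3, 6, 5, 5, 1, 6)
)

# Keep only participants who answered the attention check with exactly 50
passed <- !is.na(raw$attention_check) & raw$attention_check == 50
clean <- raw[passed, ]

# Average the five control items into a single per-participant score
control_items <- paste0("control_", 1:5)
clean$control_pain <- rowMeans(clean[, control_items], na.rm = TRUE)

# Number excluded, and mean control score per condition
n_excluded <- sum(!passed)
print(n_excluded)
print(aggregate(control_pain ~ condition, data = clean, FUN = mean))
```

Reporting `n_excluded` alongside the summary statistics keeps the exclusion rule from the analysis plan visible in the results.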

  df |> count(Gender)

# A tibble: 2 × 2
  Gender     n
  <chr>  <int>
1 Female     7
2 Male       3

  df |> count(Employment)

# A tibble: 5 × 2
  Employment                         n
  <chr>                          <int>
1 Not Working (looking for work)     2
2 Not working (disabled)             1
3 Prefer not to answer               1
4 Working (paid employee)            5
5 Working (self-employed)            1

  df |> count(Painkiller)

# A tibble: 2 × 2
  Painkiller     n
  <chr>      <int>
1 No             5
2 Yes            5

Determine means of pain level (post-intervention) and plot

  df |> group_by(condition) |> summarize(avg = mean(`Pain Level_1`))

# A tibble: 2 × 2
  condition   avg
  <chr>     <dbl>
1 High          8
2 Low          21

Report t-test for pain levels across groups, and Cohen’s d

library(lsr)
library(effsize)  # provides cohen.d(), which is called below
boxplot(`Pain Level_1` ~ condition, data = df)

results <- df |> group_by(condition) |> summarize(
  Mean=mean(`Pain Level_1`, na.rm = TRUE),
  Sd = sd(`Pain Level_1`, na.rm = TRUE),
  N = n()
)

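# Note on naming: `control` holds the low-control ("lack of control") condition,
# i.e. Group B in the introduction, not the experimental control group.
control <- df |> filter(condition == "Low") |> select(`Pain Level_1`)
control_results <- results[which(results$condition == "Low"),]

# `intervention` holds the high-control ("complete control") condition, i.e. Group A.
intervention <- df |> filter(condition == "High") |> select(`Pain Level_1`)
intervention_results <- results[which(results$condition == "High"),]

# Welch two-sample t-test: low-control vs. high-control pain levels
t.test(control, intervention)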


    Welch Two Sample t-test

data:  control and intervention
t = 0.519, df = 6.1895, p-value = 0.6218
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -47.83813  73.83813
sample estimates:
mean of x mean of y 
       21         8

cohen.d(control$`Pain Level_1`, intervention$`Pain Level_1`)


Cohen's d

d estimate: 0.2776029 (small)
95 percent confidence interval:
    lower     upper 
-1.217783  1.772989

Report the regression for pain ~ condition

model = lm(`Pain Level_1` ~ condition, df)
summary(model)


Call:
lm(formula = `Pain Level_1` ~ condition, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-107.00   -7.75    4.50   16.50   65.00 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)      8.00      19.12   0.418    0.687
conditionLow    13.00      30.23   0.430    0.679

Residual standard error: 46.83 on 8 degrees of freedom
Multiple R-squared:  0.0226,    Adjusted R-squared:  -0.09958 
F-statistic: 0.185 on 1 and 8 DF,  p-value: 0.6785

Report the regression for pain ~ condition + age + gender + employment + painkiller

model = lm(`Pain Level_1` ~ condition + Gender, df) #+ Age + Employment + Painkiller, df)
summary(model)


Call:
lm(formula = `Pain Level_1` ~ condition + Gender, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-103.48  -12.03    8.02   19.14   68.52 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)      4.48      23.33   0.192    0.853
conditionLow    13.88      32.23   0.431    0.680
GenderMale      10.56      34.45   0.306    0.768

Residual standard error: 49.73 on 7 degrees of freedom
Multiple R-squared:  0.03554,   Adjusted R-squared:  -0.24 
F-statistic: 0.129 on 2 and 7 DF,  p-value: 0.881

Report and plot regression for pain ~ five item control (per user)

#model = lm(`Pain Level_1` ~ control_pain, df)
#summary(model)

Results of control measures

Display table of participants who did and did not pass the attention check. (Can be reported)

Display a plot of the distributions for each of the pre-survey questions (Take a second to think about what this will look like)

Confirmatory analysis

The analyses as specified in the analysis plan.

Three-panel graph with original, 1st replication, and your replication is ideal here

The t-test seems to be a sufficient measure within this study.

Panel 1: Plot of the original t test (bar plot) for group 1 and group 2
Panel 2: Plot of the second t test (bar plot) for group 1 and group 2
Panel 3: Plot of the third t test (bar plot) for group 1 and group 2 (my version)

comparison_df<-data.frame(
  Mean=c(13.75,6.57,23.1,18.51, control_results$Mean, intervention_results$Mean), 
  sd=c(21.49,11.62,23.88,23.88,control_results$Sd, intervention_results$Sd), 
  Study=as.factor(c("Original","Original","Replication","Replication","Rescue", "Rescue")),
  Condition=c("Low Control","High Control","Low Control","High Control","Low Control", "High Control"),
  Insert= c(0.0, 0.1, 0.3, 0.5, 0.7, 1.0),
  N=c(50,50,72,86,control_results$N, intervention_results$N)
  ) 
  
  
ggplot(comparison_df, aes(x=Study, y=Mean, fill=Condition)) + 
  geom_bar(position=position_dodge(), stat="identity", 
           colour='black') +
  geom_errorbar(aes(ymin=Mean-sd/sqrt(N), ymax=Mean+sd/sqrt(N)), position=position_dodge(.9), width=.2)

  comparison_df

   Mean       sd       Study    Condition Insert  N
1 13.75 21.49000    Original  Low Control    0.0 50
2  6.57 11.62000    Original High Control    0.1 50
3 23.10 23.88000 Replication  Low Control    0.3 72
4 18.51 23.88000 Replication High Control    0.5 86
5 21.00 16.85230      Rescue  Low Control    0.7  4
6  8.00 57.77889      Rescue High Control    1.0  6

Exploratory analyses

Any follow-up analyses desired (not required).

Discussion

Mini meta analysis

Combining across the original paper, 1st replication, and 2nd replication, what is the aggregate effect size?
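One way to compute an aggregate effect size is a fixed-effect, inverse-variance pooling of Cohen's d. This is a minimal sketch, not necessarily the method the final report will use. It takes the means, SDs, and Ns from the comparison table in the Results section; the "Rescue" row there comes from test data, so the pooled value is illustrative only. The variance of d uses the common large-sample approximation.

```r
# Summary statistics copied from the comparison table in the Results section.
# The third row is based on TEST DATA, so the pooled estimate is illustrative.
studies <- data.frame(
  study   = c("Original", "Replication", "Rescue (test data)"),
  m_low   = c(13.75, 23.10, 21.00),
  sd_low  = c(21.49, 23.88, 16.85230),
  n_low   = c(50, 72, 4),
  m_high  = c(6.57, 18.51, 8.00),
  sd_high = c(11.62, 23.88, 57.77889),
  n_high  = c(50, 86, 6)
)

# Cohen's d per study (low control minus high control, pooled SD)
pooled_sd <- with(studies, sqrt(((n_low - 1) * sd_low^2 + (n_high - 1) * sd_high^2) /
                                  (n_low + n_high - 2)))
studies$d <- with(studies, (m_low - m_high) / pooled_sd)

# Approximate sampling variance of d
studies$var_d <- with(studies, (n_low + n_high) / (n_low * n_high) +
                        d^2 / (2 * (n_low + n_high)))

# Fixed-effect pooling: weight each study by the inverse of its variance
w <- 1 / studies$var_d
d_pooled  <- sum(w * studies$d) / sum(w)
se_pooled <- sqrt(1 / sum(w))
ci_pooled <- d_pooled + c(-1, 1) * qnorm(0.975) * se_pooled

print(studies[, c("study", "d", "var_d")])
print(c(d_pooled = d_pooled, lower = ci_pooled[1], upper = ci_pooled[2]))
```

A random-effects model may be more appropriate if the three studies are expected to differ in their true effects, for example because of the different recruitment platforms.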

Summary of Replication Attempt

Open the discussion section with a paragraph summarizing the primary result from the confirmatory analysis and the assessment of whether it replicated, partially replicated, or failed to replicate the original result.

Commentary

Add open-ended commentary (if any) reflecting (a) insights from follow-up exploratory analysis, (b) assessment of the meaning of the replication (or not) - e.g., for a failure to replicate, are the differences between original and present study ones that definitely, plausibly, or are unlikely to have been moderators of the result, and (c) discussion of any objections or challenges raised by the current and original authors about the replication attempt. None of these need to be long.