Replication of Hasan et al. (2025, PNAS Nexus)
Introduction
My research focuses on developing computational methods to study affective well-being and mental health, particularly their relationship with affective dynamics and emotion regulation processes. While I have explored these affective constructs independently, my understanding of how cognitive mechanisms like distortion, dissonance, and rumination interact with affective processes to influence mental health remains limited. Hasan et al. (2025) recently observed that individuals with higher depressive symptoms show increased engagement with cognitively distorted content online (r(416) = .34 for likes and r(416) = .24 for retweets, both p < .001) and demonstrated an intervention that teaches them to recognize such content and reduce this engagement. I intend to replicate their key finding to improve my understanding of cognitive-affective interactions, particularly in digital environments, and to learn experimental methodologies such as vignette studies.
Methods
Power Analysis
Hasan et al. (2025) report, from a generalized mixed effects regression, that there was a significant decrease in the liking and retweeting rates of distorted content when the interactions followed training (like: \(e^{\beta}\) = 0.553, 95% CI = [0.475–0.645]; retweet: \(e^{\beta}\) = 0.424, 95% CI = [0.343–0.524]). Based on a power analysis (estimating Cohen’s d from the odds ratios), we estimate that we need a sample size of 149 for likes and 72 for retweets at \(\alpha = .05\).
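The conversion from odds ratio to Cohen’s d can be sketched as follows. This is a minimal illustration, not the exact procedure used above: it assumes the common logit conversion \(d = \ln(\mathrm{OR}) \cdot \sqrt{3}/\pi\), a two-sided test comparing two independent groups, and 80% power, none of which is stated in the text. The function names are hypothetical. It uses a normal approximation, so the result can differ by about one participant from a t-based calculation.

```python
import math
from statistics import NormalDist


def odds_ratio_to_d(odds_ratio):
    """Convert an odds ratio to Cohen's d (logit method: d = ln(OR) * sqrt(3) / pi)."""
    return math.log(odds_ratio) * math.sqrt(3) / math.pi


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample comparison.

    ASSUMPTION: power = 0.80 and a two-group design are not given in the
    report; they are illustrative defaults. Uses the normal approximation
    n = 2 * ((z_{1 - alpha/2} + z_{power}) / d)^2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


# Odds ratios reported by Hasan et al. (2025) for the training effect.
for label, odds_ratio in [("like", 0.553), ("retweet", 0.424)]:
    d = odds_ratio_to_d(odds_ratio)
    print(label, round(d, 3), n_per_group(d))
```

Under these assumptions the approximation lands within roughly one participant of the sample sizes given above, which suggests (but does not confirm) that those figures are per-group numbers at 80% power.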
Planned Sample
Planned sample size and/or termination rule, sampling frame, known demographics if any, preselection rules if any.
Materials
We reuse the materials provided in Hasan et al. (2025).
Hasan et al. (2025) prompted a large language model (LLM) through OpenAI’s ChatGPT interface to generate two sets of sentiment-matched tweets: one set of 60 tweets with cognitively distorted content and a second set of 60 tweets without distorted content. A sample prompt was “Now generate 10 more that contain cognitive distortions with similar sentence construction and sentiment.” A licensed clinical psychologist then evaluated these tweets, and some were modified so that they fit their intended category. A random subset of 30 tweets from each category was selected for the experiment. The authors then annotated the sentiment of each tweet on a -1 to 1 scale using the sentiment analysis tool VADER (Hutto & Gilbert, 2014). The tweets were further modified so that the composite sentiment scores for both sets had similar means (\(M_{\text{distorted}} = -.30\) and \(M_{\text{non-distorted}} = -.15\), t(58) = 1.45, p = .15) and similar distributions (Kolmogorov-Smirnov test, D(0) = .23, p = .39). This modification was again validated by a licensed clinical psychologist, and the resulting tweets were used for the experiment.
Procedure
We follow precisely the same experimental procedure as Hasan et al. (2025).
Participants provided their informed consent before participating in the experiment. Their Twitter handle and demographic information were collected before the main task. Participants were assigned to one of two counterbalanced conditions, Interaction Before Training or Interaction After Training, to study the impact of training on interaction. The assigned condition determined the order in which they saw the interaction and training-identification blocks.
During the training-identification block, participants first learned about cognitive distortions using the previously described training method. Following this, in the identification block, they were presented with 30 randomly chosen tweets, counterbalanced across the two categories, and asked to evaluate the probability (on a scale of 0–100) that each tweet contained a distortion. Participants provided their probability judgment by typing an integer between 0 and 100 into a text box. No feedback was given on their judgments. After providing their judgment, participants could interact with the tweet using the “like” or “retweet” buttons as described for the interaction block, using an interface similar to Twitter. In the interaction block, participants were presented with the remaining 30 randomly sampled stimuli, also counterbalanced across the two categories.
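The per-participant split of stimuli between the two blocks could be implemented along these lines. This is a sketch only: it assumes that “counterbalanced across the two categories” means 15 distorted and 15 non-distorted tweets in each block, which the text does not state explicitly, and the stimulus IDs and the function name `split_stimuli` are hypothetical.

```python
import random


def split_stimuli(distorted, non_distorted, per_category=15, seed=None):
    """Randomly split stimuli into an identification set and an interaction set.

    ASSUMPTION: each block receives `per_category` tweets from each category
    (15 + 15 = 30); the remaining tweets go to the other block.
    """
    rng = random.Random(seed)
    # rng.sample with the full length returns a shuffled copy of each list.
    d = rng.sample(distorted, len(distorted))
    n = rng.sample(non_distorted, len(non_distorted))
    identification = d[:per_category] + n[:per_category]
    interaction = d[per_category:] + n[per_category:]
    # Shuffle presentation order within each block.
    rng.shuffle(identification)
    rng.shuffle(interaction)
    return identification, interaction


# Hypothetical stimulus IDs: 30 distorted (D0..D29), 30 non-distorted (N0..N29).
distorted_ids = [f"D{i}" for i in range(30)]
non_distorted_ids = [f"N{i}" for i in range(30)]
identification_set, interaction_set = split_stimuli(
    distorted_ids, non_distorted_ids, seed=1
)
print(len(identification_set), len(interaction_set))
```

Passing a participant-specific seed would make each participant’s split reproducible at analysis time.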
After the main task, we collected participant data on their mental health and social media use. Full details of the questions, including question text, are in the Supplementary Methods. At the end of the experiment, participants were provided with resources for mental health support.
Analysis Plan
We also follow the same analysis plan as Hasan et al. (2025).
We used a generalized mixed effects regression for our analysis. We tested for differences in (i) accuracy, (ii) liking, and (iii) retweeting. For the accuracy regression, we used behavioral data from the training-identification block. For the liking and retweeting regressions, we used the behavioral data from the interaction block. We treated subject ID as a random effect. Since every participant saw a different randomly sampled set of stimuli, we controlled for this by treating every stimulus as a random effect. The Depressive Symptoms score and the Twitter Use Score (TUS) were centered and standardized before being used in the model. The models were fit with the Bound Optimization by Quadratic Approximation (BOBYQA) optimizer, using the glmer function in the lme4 R package.
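Centering and standardizing a predictor amounts to subtracting its mean and dividing by its standard deviation. A minimal sketch, assuming the sample standard deviation (n - 1 denominator) is used; the scores below are made-up placeholders, not study data, and `standardize` is a hypothetical helper name.

```python
from statistics import mean, stdev


def standardize(values):
    """Center to mean 0 and scale to unit sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample SD (n - 1 denominator); an assumption here
    return [(v - m) / s for v in values]


# Hypothetical raw questionnaire scores, for illustration only.
raw_scores = [4, 9, 12, 7, 15, 3]
z_scores = standardize(raw_scores)
print([round(z, 2) for z in z_scores])
```

After this transformation, a regression coefficient for the predictor describes the change associated with a one standard deviation increase.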
To determine the best model of each dependent variable, we conducted a nested model comparison using five models:
- base: y ∼ block order * is distorted
- depression severity (alone): y ∼ block order * is distorted + Depressive Symptoms
- twitter use score (alone): y ∼ block order * is distorted + Twitter Use Score
- independent effects: y ∼ block order * is distorted * Depressive Symptoms + block order * is distorted * Twitter Use Score
- full model: y ∼ block order * is distorted * Depressive Symptoms * Twitter Use Score
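To make the nesting among the five models explicit, the sketch below expands each right-hand side into its fixed-effect terms, following the R formula convention that `a * b` stands for both main effects plus their interaction. It only illustrates the structure of the formulas and does not fit any model; the underscored variable names and helper functions are hypothetical stand-ins for the predictors listed above.

```python
from itertools import combinations


def expand_terms(rhs):
    """Expand 'a * b + c' into its set of main-effect and interaction terms."""
    terms = set()
    for summand in rhs.split("+"):
        factors = [f.strip() for f in summand.split("*")]
        # a * b * c contributes every non-empty combination of its factors.
        for k in range(1, len(factors) + 1):
            for combo in combinations(factors, k):
                terms.add(frozenset(combo))
    return terms


MODELS = {
    "base": "block_order * is_distorted",
    "depression": "block_order * is_distorted + depressive_symptoms",
    "twitter_use": "block_order * is_distorted + twitter_use_score",
    "independent": (
        "block_order * is_distorted * depressive_symptoms"
        " + block_order * is_distorted * twitter_use_score"
    ),
    "full": (
        "block_order * is_distorted * depressive_symptoms * twitter_use_score"
    ),
}
TERMS = {name: expand_terms(rhs) for name, rhs in MODELS.items()}


def is_nested(smaller, larger):
    """True if every fixed-effect term of `smaller` also appears in `larger`."""
    return TERMS[smaller] <= TERMS[larger]


for name, terms in TERMS.items():
    print(name, len(terms))
print(is_nested("depression", "independent"), is_nested("depression", "twitter_use"))
```

One consequence worth noting: the two single-covariate models are each nested in the independent effects model but not in each other, so a direct comparison between them would need a criterion that does not rely on nesting.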
Clarify key analysis of interest here. You can also pre-specify additional analyses you plan to do.
Differences from Original Study
Explicitly describe known differences in sample, setting, procedure, and analysis plan from original study. The goal, of course, is to minimize those differences, but differences will inevitably occur. Also, note whether such differences are anticipated to make a difference based on claims in the original article or subsequent published research on the conditions for obtaining the effect.
Methods Addendum (Post Data Collection)
You can comment this section out prior to final report with data collection.
Actual Sample
Sample size, demographics, data exclusions based on rules spelled out in analysis plan
Differences from pre-data collection methods plan
Any differences from what was described as the original plan, or “none”.
Results
Data preparation
Data preparation following the analysis plan.
Confirmatory analysis
The analyses as specified in the analysis plan.
Side-by-side graph with original graph is ideal here
Exploratory analyses
Any follow-up analyses desired (not required).
Discussion
Summary of Replication Attempt
Open the discussion section with a paragraph summarizing the primary result from the confirmatory analysis and the assessment of whether it replicated, partially replicated, or failed to replicate the original result.
Commentary
Add open-ended commentary (if any) reflecting (a) insights from follow-up exploratory analysis, (b) assessment of the meaning of the replication (or not) - e.g., for a failure to replicate, are the differences between original and present study ones that definitely, plausibly, or are unlikely to have been moderators of the result, and (c) discussion of any objections or challenges raised by the current and original authors about the replication attempt. None of these need to be long.