Replication of Hasan et al. (2025, PNAS Nexus)

Author

Shashanka Subrahmanya (ssbrahma@stanford.edu)

Published

November 28, 2025

Introduction

My research focuses on developing computational methods to study affective well-being and mental health, particularly their relationship with affective dynamics and emotion regulation processes. While I have explored these affective constructs independently, my understanding of how cognitive mechanisms like distortion, dissonance, and rumination interact with affective processes to influence mental health remains limited. Hasan et al. (2025) recently observed that individuals with higher depressive symptoms show increased engagement with cognitively distorted content online (r(416) = .34 for likes and r(416) = .24 for retweets, both p < .001) and demonstrated an intervention that teaches them to recognize distorted content and reduce their engagement with it. I intend to replicate their key finding to improve my understanding of cognitive-affective interactions, particularly in digital environments, and to learn experimental methodologies such as vignette studies.

Methods

Power Analysis

Hasan et al. (2025) report, from a generalized mixed effects regression, that there was a significant decrease in the liking and retweeting rates of distorted content when the interactions followed training (like: \(e^{\beta}\) = 0.553, 95% CI = [0.475–0.645]; retweet: \(e^{\beta}\) = 0.424, 95% CI = [0.343–0.524]). Based on a power analysis that estimates Cohen’s d from these odds ratios, we need sample sizes of 149 and 72 at \(\alpha = .05\) to detect the effects on likes and retweets, respectively.
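The conversion from an odds ratio to Cohen’s d is not spelled out above. The following is a minimal sketch, assuming the common logit conversion \(d = \ln(\mathrm{OR}) \cdot \sqrt{3}/\pi\) and a normal-approximation sample size for a two-sided comparison of two independent groups. The 80% power target, the per-group reading of the sample size, and the function names are assumptions, not details taken from the text.

```python
from math import ceil, log, pi, sqrt
from statistics import NormalDist


def odds_ratio_to_d(odds_ratio: float) -> float:
    """Convert an odds ratio to Cohen's d via the logit method (assumed conversion)."""
    return log(odds_ratio) * sqrt(3) / pi


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Normal-approximation sample size per group for a two-sided, two-sample test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


# Odds ratios reported by Hasan et al. (2025); power = .80 is an assumption.
for label, odds_ratio in [("like", 0.553), ("retweet", 0.424)]:
    d = odds_ratio_to_d(odds_ratio)
    print(f"{label}: d = {d:.3f}, n per group = {n_per_group(d)}")
```

Under these assumed settings the approximation lands within a participant or two of the 149 and 72 reported above; a t-based calculation usually asks for slightly more participants than the normal approximation.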

Planned Sample

Planned sample size and/or termination rule, sampling frame, known demographics if any, preselection rules if any.

Materials

We reuse the materials provided in Hasan et al. (2025).

Hasan et al. (2025) prompted a large language model (LLM) through OpenAI’s ChatGPT interface to generate two sets of sentiment-matched tweets: one set of 60 tweets with cognitively distorted content and another set of 60 tweets without distorted content. A sample prompt was “Now generate 10 more that contain cognitive distortions with similar sentence construction and sentiment.” A licensed clinical psychologist then evaluated these tweets, and some were modified so that they fell in the correct category. Next, a random subset of 30 tweets from each category was selected for the experiment. The authors then annotated the sentiment of each tweet on a -1 to 1 scale using a sentiment analysis tool called VADER (Hutto & Gilbert, 2014). The tweets were further modified so that the composite sentiment scores for both sets had similar means (\(M_{\text{distorted}} = -.30\) and \(M_{\text{non-distorted}} = -.15\), t(58) = 1.45, p = .15) and similar distributions (Kolmogorov-Smirnov test, D(0) = .23, p = .39). This modification was again validated by a licensed clinical psychologist, and the resulting tweets were used for the experiment.
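The two matching checks described above can be sketched as follows. This is an illustration only: it assumes a pooled-variance Student’s t-test (consistent with 58 degrees of freedom for two sets of 30) and computes the two-sample Kolmogorov-Smirnov statistic directly from the empirical CDFs. The scores are hypothetical placeholders, not the study’s data, and the function names are made up for this example.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov D: largest gap between the empirical CDFs."""
    def ecdf(sample, x):
        return sum(v <= x for v in sample) / len(sample)
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in set(a) | set(b))


# Hypothetical compound sentiment scores, NOT the study's data.
distorted = [-0.62, -0.41, -0.30, -0.12, 0.05]
non_distorted = [-0.50, -0.28, -0.15, 0.02, 0.20]
t, df = pooled_t(distorted, non_distorted)
print(f"t({df}) = {t:.2f}, D = {ks_statistic(distorted, non_distorted):.2f}")
```

A small t and a small D together indicate that the two sets differ little in either mean sentiment or in the overall shape of their sentiment distributions.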

Procedure

We follow precisely the same experimental procedure as Hasan et al. (2025).

Participants provided their informed consent before participating in the experiment. Their Twitter handle and demographic information were collected before the main task. Participants were assigned to one of two counterbalanced conditions, Interaction Before Training or Interaction After Training, to study the impact of training on interaction. The assigned condition determined the order in which they saw the interaction and training-identification blocks.

**Fig. 1.** *Experimental procedure.* Participants assigned to the “Interaction After Training” condition completed the Training-Identification block before the Interaction block, and participants in the “Interaction Before Training” condition completed the blocks in the opposite order.

During the training-identification block, participants first learned about cognitive distortions using the training method described in Hasan et al. (2025). Following this, in the identification block, they were presented with 30 randomly chosen tweets counterbalanced across the two categories and asked to evaluate the probability (on a scale of 0–100) that each tweet contained a distortion. Participants provided their probability judgment by typing an integer between 0 and 100 into a text box. No feedback was given on their judgments. After providing their judgment, participants could interact with the tweet using the “like” or “retweet” buttons as described for the interaction block, using an interface similar to Twitter. In the interaction block, participants were presented with the remaining 30 randomly sampled tweets, also counterbalanced across the two categories.
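The per-participant stimulus split can be sketched as follows. This is only an illustration: it assumes that “counterbalanced across the two categories” means 15 distorted and 15 non-distorted tweets per block and that conditions alternate by enrollment order. The stimulus IDs and function name are hypothetical.

```python
import random

CONDITIONS = ("Interaction Before Training", "Interaction After Training")


def assign_participant(index, distorted, non_distorted, rng):
    """Alternate the condition and split the 30 + 30 tweets evenly across blocks."""
    d = rng.sample(distorted, len(distorted))          # shuffled copy
    n = rng.sample(non_distorted, len(non_distorted))  # shuffled copy
    half_d, half_n = len(d) // 2, len(n) // 2
    return {
        "condition": CONDITIONS[index % 2],            # assumed alternation scheme
        "identification": d[:half_d] + n[:half_n],
        "interaction": d[half_d:] + n[half_n:],
    }


# Hypothetical stimulus IDs standing in for the 30 tweets per category.
distorted = [f"D{i:02d}" for i in range(30)]
non_distorted = [f"N{i:02d}" for i in range(30)]
plan = assign_participant(0, distorted, non_distorted, random.Random(2025))
print(plan["condition"], len(plan["identification"]), len(plan["interaction"]))
```

Because each participant receives a different random split, a given tweet appears in the identification block for some participants and in the interaction block for others.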

After the main task, we collected participant data on their mental health and social media use. The full details about the questions including question text are in the Supplementary methods. At the end of the experiment, participants were provided with resources for mental health support.

Analysis Plan

We also follow the same analysis plan as Hasan et al. (2025).

We used a generalized mixed effects regression for our analysis. We tested for differences in (i) accuracy, (ii) liking, and (iii) retweeting. For the accuracy regression, we used behavioral data from the training-identification block. For the liking and retweeting regressions, we used the behavioral data from the interaction block. We treated subject ID as a random effect. Since every participant saw a different randomly sampled set of stimuli, we controlled for this by treating every stimulus as a random effect. The Depressive Symptoms and Twitter Use Score (TUS) variables were centered and standardized before being entered into the model. The models were fit using the Bound Optimization by Quadratic Approximation (BOBYQA) optimizer in the glmer function of the lme4 R package.
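Centering and standardizing a predictor amounts to a z-score transform. This is a minimal sketch, assuming the sample standard deviation (n − 1 denominator, the same default as R’s scale()); the values are hypothetical placeholders, not the study’s data.

```python
from statistics import mean, stdev


def standardize(values):
    """Center on the mean and scale by the sample SD (n - 1 denominator)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical questionnaire totals, NOT the study's data.
depressive_symptoms = [3, 7, 12, 5, 18, 9]
z = standardize(depressive_symptoms)
print([round(v, 2) for v in z])
```

After this transform, each regression coefficient for the predictor describes the change in log-odds per one standard deviation, which makes the Depressive Symptoms and TUS coefficients directly comparable.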

To determine the best model of each dependent variable, we conducted a nested model comparison using five models:

base: y ∼ block order * is distorted

depression severity (alone): y ∼ block order * is distorted + Depressive Symptoms

twitter use score (alone): y ∼ block order * is distorted + Twitter Use Score

independent effects: y ∼ block order * is distorted * Depressive Symptoms + block order * is distorted * Twitter Use Score

full model: y ∼ block order * is distorted * Depressive Symptoms * Twitter Use Score
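To see which of the five models are nested within which, it can help to expand each formula into its fixed-effect terms. The sketch below does this for the formulas above, assuming the usual convention that `a * b` expands to `a + b + a:b`. The column names are hypothetical, and random effects for subject and stimulus are left out because they are the same in every model.

```python
from itertools import combinations

# Hypothetical column names for the predictors named above.
MODELS = {
    "base": "block_order * is_distorted",
    "depression_alone": "block_order * is_distorted + depressive_symptoms",
    "twitter_alone": "block_order * is_distorted + twitter_use_score",
    "independent": ("block_order * is_distorted * depressive_symptoms"
                    " + block_order * is_distorted * twitter_use_score"),
    "full": "block_order * is_distorted * depressive_symptoms * twitter_use_score",
}


def expand_terms(rhs):
    """Expand `a * b` into main effects and interactions (a, b, a:b)."""
    terms = set()
    for product in rhs.split("+"):
        factors = sorted(f.strip() for f in product.split("*"))
        for k in range(1, len(factors) + 1):
            terms.update(":".join(c) for c in combinations(factors, k))
    return terms


def is_nested(smaller, larger):
    """True when every fixed-effect term of `smaller` also appears in `larger`."""
    return expand_terms(MODELS[smaller]) <= expand_terms(MODELS[larger])


for name, rhs in MODELS.items():
    print(f"{name}: {len(expand_terms(rhs))} fixed-effect terms (excluding intercept)")
print("depression_alone nested in twitter_alone:",
      is_nested("depression_alone", "twitter_alone"))
```

The expansion shows that the two “alone” models are not nested in each other, so a nested comparison runs along two chains (base, then one of the “alone” models, then independent effects, then full), not a single sequence of five.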

Clarify key analysis of interest here You can also pre-specify additional analyses you plan to do.

Differences from Original Study

Explicitly describe known differences in sample, setting, procedure, and analysis plan from original study. The goal, of course, is to minimize those differences, but differences will inevitably occur. Also, note whether such differences are anticipated to make a difference based on claims in the original article or subsequent published research on the conditions for obtaining the effect.

Methods Addendum (Post Data Collection)

You can comment this section out prior to final report with data collection.

Actual Sample

Sample size, demographics, data exclusions based on rules spelled out in analysis plan

Differences from pre-data collection methods plan

Any differences from what was described as the original plan, or “none”.

Results

Data preparation

Data preparation following the analysis plan.

Confirmatory analysis

The analyses as specified in the analysis plan.

**Fig. 3.** The impact of cognitive distortion psychoeducation on liking and retweeting distorted and nondistorted content. The thick dark lines depict the mean tendency. The light lines depict separate statements in the experiment. We observe a consistent reduction in the liking and retweeting of tweets with distorted content after the intervention compared to before the intervention.

Side-by-side graph with original graph is ideal here

Exploratory analyses

Any follow-up analyses desired (not required).

Discussion

Summary of Replication Attempt

Open the discussion section with a paragraph summarizing the primary result from the confirmatory analysis and the assessment of whether it replicated, partially replicated, or failed to replicate the original result.

Commentary

Add open-ended commentary (if any) reflecting (a) insights from follow-up exploratory analysis, (b) assessment of the meaning of the replication (or not) - e.g., for a failure to replicate, are the differences between original and present study ones that definitely, plausibly, or are unlikely to have been moderators of the result, and (c) discussion of any objections or challenges raised by the current and original authors about the replication attempt. None of these need to be long.

References

Hasan, E., Epping, G., Lorenzo-Luaces, L., Bollen, J., & Trueblood, J. S. (2025). One-shot intervention reduces online engagement with distorted content. PNAS Nexus, 4(3), pgaf068. https://doi.org/10.1093/pnasnexus/pgaf068

Hutto, C., & Gilbert, E. (2014). VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text. Proceedings of the International AAAI Conference on Web and Social Media, 8(1), 216–225. https://doi.org/10.1609/icwsm.v8i1.14550