Replication of Hasan et al. (2025, PNAS Nexus)

Author

Shashanka Subrahmanya (ssbrahma@stanford.edu)

Published

November 1, 2025

Introduction

My research focuses on developing computational methods to study affective well-being and mental health, particularly their relationship with affective dynamics and emotion regulation processes. While I have explored these affective constructs independently, my understanding of how cognitive mechanisms like distortion, dissonance, and rumination interact with affective processes to influence mental health remains limited. Hasan et al. (2025) recently observed that individuals with higher depressive symptoms show increased engagement with cognitively distorted content online (r(416) = .34 and r(416) = .24, both p < .001, for likes and retweets respectively) and demonstrated a brief intervention that taught participants to recognize such content and reduced their engagement with it. I intend to replicate their key finding to improve my understanding of cognitive-affective interactions, particularly in digital environments, and to learn experimental methodologies such as vignette studies.

Generation of stimuli

Hasan et al. (2025) prompted a large language model (LLM) through OpenAI’s ChatGPT interface to generate two sets of sentiment-matched tweets: 60 tweets containing cognitively distorted content and 60 without. A sample prompt was “Now generate 10 more that contain cognitive distortions with similar sentence construction and sentiment.” A licensed clinical psychologist then evaluated the tweets, and some were modified so that they fell into the correct category. A random subset of 30 tweets from each category was selected for the experiment. Next, the authors annotated the sentiment of each tweet on a −1 to 1 scale using the VADER sentiment analysis tool (Hutto & Gilbert, 2014) and further modified the tweets so that the composite sentiment scores of the two sets had similar means (M_distorted = −.30, M_non-distorted = −.15; t(58) = 1.45, p = .15) and similar distributions (Kolmogorov-Smirnov test, D(0) = .23, p = .39). The modified tweets were again validated by a licensed clinical psychologist and then used in the experiment.
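To make the sentiment-matching checks concrete, below is a minimal sketch in Python, assuming the two tweet sets are stored one per line in plain-text files (the file names are my own placeholders, not from the paper):

```python
# Sketch of the sentiment-matching checks; "distorted.txt" and
# "nondistorted.txt" are hypothetical file names, one tweet per line.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from scipy import stats

analyzer = SentimentIntensityAnalyzer()

def compound_scores(path):
    """Return the VADER compound score (-1 to 1) for each tweet in a file."""
    with open(path) as f:
        return [analyzer.polarity_scores(line.strip())["compound"]
                for line in f if line.strip()]

distorted = compound_scores("distorted.txt")
nondistorted = compound_scores("nondistorted.txt")

# Matched sets should show no reliable mean difference (independent t-test)
# and similar score distributions (two-sample Kolmogorov-Smirnov test).
t_stat, t_p = stats.ttest_ind(distorted, nondistorted)
d_stat, ks_p = stats.ks_2samp(distorted, nondistorted)
print(f"t = {t_stat:.2f} (p = {t_p:.2f}); D = {d_stat:.2f} (p = {ks_p:.2f})")
```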

Experimental procedure

Participants (N = 838) were recruited from MTurk and randomly assigned to one of two experimental conditions, interaction before training or interaction after training, to study the impact of training on interaction. The assigned condition determined the order in which they completed the interaction and training-identification blocks. In the training-identification block, participants read a psycho-educational document (under 250 words) that taught them to recognize three classes of cognitive distortions (jumping to conclusions, exaggerating, and being rigid) without mentioning their harmful effects. They then evaluated 30 randomly selected tweets, providing a probability judgment (0-100) about whether each contained a cognitive distortion before deciding whether to like or retweet it. In the interaction block, participants were presented with the remaining 30 tweets, also counterbalanced across the two categories. At the end of the main task, participants completed questionnaires on depression (PHQ-9) and Twitter engagement, the latter summarized as a composite Twitter Use Score (TUS).

Possible challenges

The authors designed a user interface similar to Twitter to enhance ecological validity, with buttons that illuminated when participants “liked” or “re-tweeted” a post, mimicking the original platform. While Hasan et al. (2025) have shared the analysis code, the interface implementation code is unavailable, which could pose challenges for my replication. This limitation should be manageable: the code could likely be obtained by contacting the authors directly or reimplemented using contemporary LLM-assisted programming tools, particularly given my background in computer science and software engineering.

Another challenge involves obtaining validation for generated tweets from a licensed clinical psychologist since access to such expertise may prove difficult or expensive. To mitigate this issue, I plan to approximate the validation process by recruiting domain experts online or utilizing LLMs to assess tweet validity.

Methods

Power Analysis

Original effect size, power analysis for samples to achieve 80%, 90%, 95% power to detect that effect size. Considerations of feasibility for selecting planned sample size.
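Below is a sketch of that calculation for the original correlations reported in the Introduction (r = .34 for likes, r = .24 for retweets), using the standard Fisher-z approximation for a Pearson correlation; the resulting Ns should be cross-checked with a dedicated tool such as G*Power.

```python
# Power analysis sketch for detecting a Pearson correlation r at
# two-tailed alpha = .05, via the Fisher z approximation: the required
# N is ((z_alpha + z_power) / atanh(r))^2 + 3, rounded up.
import math
from scipy.stats import norm

def n_for_correlation(r, power, alpha=0.05):
    z_alpha = norm.ppf(1 - alpha / 2)
    z_power = norm.ppf(power)
    return math.ceil(((z_alpha + z_power) / math.atanh(r)) ** 2 + 3)

for r in (0.34, 0.24):
    for power in (0.80, 0.90, 0.95):
        print(f"r = {r}: N = {n_for_correlation(r, power)} at {power:.0%} power")
```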

Planned Sample

Planned sample size and/or termination rule, sampling frame, known demographics if any, preselection rules if any.

Materials

All materials - can quote directly from original article - just put the text in quotations and note that this was followed precisely. Or, quote directly and just point out exceptions to what was described in the original article.

Procedure

We follow precisely the same experimental procedure as Hasan et al. (2025).

Participants provided informed consent before beginning the experiment. Their Twitter handle and demographic information were collected before the main task. Participants were then assigned to one of two counterbalanced conditions, Interaction Before Training or Interaction After Training, to study the impact of training on interaction. The assigned condition determined the order in which they completed the interaction and training-identification blocks.
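As a sketch of how this assignment and counterbalancing could be implemented (the even 15/15 category split per block is my reading of “counterbalanced across the two categories” and would need to be confirmed against the authors’ materials):

```python
# Minimal sketch of condition assignment and stimulus counterbalancing,
# assuming 30 distorted and 30 non-distorted tweet IDs. The even 15/15
# split per block is an assumption, not confirmed by the paper.
import random

CONDITIONS = ("interaction_before_training", "interaction_after_training")

def assign_participant(distorted_ids, nondistorted_ids, rng=random):
    """Randomly assign a condition and split each tweet set across blocks."""
    condition = rng.choice(CONDITIONS)
    d = rng.sample(distorted_ids, len(distorted_ids))   # shuffled copies
    n = rng.sample(nondistorted_ids, len(nondistorted_ids))
    identification = d[:15] + n[:15]   # 30 tweets, 15 per category
    interaction = d[15:] + n[15:]      # the remaining 30 tweets
    rng.shuffle(identification)
    rng.shuffle(interaction)
    return condition, identification, interaction
```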

Fig. 1. Experimental procedure. Participants assigned to the “Interaction After Training” condition completed the Training-Identification block before the Interaction block; participants in the “Interaction Before Training” condition completed the blocks in the opposite order.

During the training-identification block, participants first learned about cognitive distortions using the previously described training method. In the identification block that followed, they were presented with the 30 randomly chosen tweets, counterbalanced across the two categories, and asked to evaluate the probability (on a scale of 0-100) that each tweet contained a distortion. Participants provided their probability judgment by typing an integer between 0 and 100 into a text box. No feedback was given on their judgments.
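Because the interface code is unavailable, the input handling below is my own guess at the described behavior (an integer judgment typed into a text box), not the authors’ implementation:

```python
def parse_probability_judgment(raw: str) -> int:
    """Parse a typed judgment, enforcing an integer in [0, 100]."""
    value = int(raw.strip())  # raises ValueError on non-integer input
    if not 0 <= value <= 100:
        raise ValueError("Judgment must be an integer between 0 and 100.")
    return value
```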

After providing their judgment, participants could interact with the tweet using the “like” or “retweet” buttons described for the interaction block, via an interface similar to Twitter. In the interaction block, participants were presented with the remaining 30 randomly sampled stimuli, also counterbalanced across the two categories. After the main task, we collected participant data on mental health and social media use; full details of the questions, including question text, are in the Supplementary Methods. At the end of the experiment, participants were provided with resources for mental health support.
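Since PHQ-9 scores enter the key analysis, a scoring sketch follows; the scale itself is standard (nine items, each rated 0-3, summed to a 0-27 total):

```python
def phq9_total(item_scores):
    """Sum the nine PHQ-9 items (each rated 0-3) into a 0-27 total."""
    if len(item_scores) != 9 or any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("PHQ-9 requires nine item scores, each in {0, 1, 2, 3}.")
    return sum(item_scores)
```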

Analysis Plan

Can also quote directly, though it is less often spelled out effectively for an analysis strategy section. The key is to report an analysis strategy that is as close to the original - data cleaning rules, data exclusion rules, covariates, etc. - as possible.

Clarify key analysis of interest here. You can also pre-specify additional analyses you plan to do.
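As a placeholder until the analysis plan is finalized, here is a hedged sketch of the two tests I currently take to be central: the depression-engagement correlation and the between-condition effect of training. The file and column names are hypothetical; the final models should follow the authors’ shared analysis code.

```python
# Hedged sketch of the key tests. "participants.csv" and all column names
# ("phq9", "likes_distorted", "condition") are hypothetical placeholders.
import pandas as pd
from scipy import stats

df = pd.read_csv("participants.csv")

# (1) Depression-engagement correlation (original: r(416) = .34 for likes).
r_likes, p_likes = stats.pearsonr(df["phq9"], df["likes_distorted"])

# (2) Training effect: engagement with distorted content during the
# interaction block, compared between participants who interacted before
# training (untrained) and those who interacted after training (trained).
untrained = df.loc[df["condition"] == "interaction_before_training", "likes_distorted"]
trained = df.loc[df["condition"] == "interaction_after_training", "likes_distorted"]
t_stat, p_train = stats.ttest_ind(untrained, trained)

print(f"r = {r_likes:.2f} (p = {p_likes:.3f}); t = {t_stat:.2f} (p = {p_train:.3f})")
```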

Differences from Original Study

Explicitly describe known differences in sample, setting, procedure, and analysis plan from original study. The goal, of course, is to minimize those differences, but differences will inevitably occur. Also, note whether such differences are anticipated to make a difference based on claims in the original article or subsequent published research on the conditions for obtaining the effect.

Methods Addendum (Post Data Collection)

You can comment this section out prior to final report with data collection.

Actual Sample

Sample size, demographics, data exclusions based on rules spelled out in analysis plan

Differences from pre-data collection methods plan

Any differences from what was described as the original plan, or “none”.

Results

Data preparation

Data preparation following the analysis plan.

Confirmatory analysis

The analyses as specified in the analysis plan.

Fig. 3. The impact of cognitive distortion psychoeducation on liking and retweeting distorted and non-distorted content. The thick dark lines depict the mean tendency; the light lines depict individual statements in the experiment. We observe a consistent reduction in the liking and retweeting of tweets with distorted content after the intervention compared to before the intervention.

Side-by-side graph with original graph is ideal here
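One way to build the replication half of that figure, assuming per-tweet engagement rates have already been computed for each phase (the numbers below are placeholders, not data):

```python
import matplotlib.pyplot as plt

def engagement_panel(ax, rates_before, rates_after, title):
    """One panel: per-tweet engagement rates before vs. after training."""
    for b, a in zip(rates_before, rates_after):
        ax.plot([0, 1], [b, a], color="lightgray", linewidth=0.5)  # statements
    means = [sum(rates_before) / len(rates_before),
             sum(rates_after) / len(rates_after)]
    ax.plot([0, 1], means, color="black", linewidth=2.5)  # mean tendency
    ax.set_xticks([0, 1])
    ax.set_xticklabels(["Before", "After"])
    ax.set_title(title)

# Placeholder per-tweet like and retweet rates; NOT real data.
likes_before, likes_after = [0.40, 0.35, 0.55], [0.30, 0.28, 0.41]
rts_before, rts_after = [0.22, 0.18, 0.31], [0.15, 0.12, 0.24]

fig, axes = plt.subplots(1, 2, figsize=(8, 4), sharey=True)
engagement_panel(axes[0], likes_before, likes_after, "Likes (replication)")
engagement_panel(axes[1], rts_before, rts_after, "Retweets (replication)")
fig.savefig("fig3_replication.png", dpi=300)
```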

Exploratory analyses

Any follow-up analyses desired (not required).

Discussion

Summary of Replication Attempt

Open the discussion section with a paragraph summarizing the primary result from the confirmatory analysis and the assessment of whether it replicated, partially replicated, or failed to replicate the original result.

Commentary

Add open-ended commentary (if any) reflecting (a) insights from follow-up exploratory analysis, (b) assessment of the meaning of the replication (or not) - e.g., for a failure to replicate, are the differences between original and present study ones that definitely, plausibly, or are unlikely to have been moderators of the result, and (c) discussion of any objections or challenges raised by the current and original authors about the replication attempt. None of these need to be long.

References

Hasan, E., Epping, G., Lorenzo-Luaces, L., Bollen, J., & Trueblood, J. S. (2025). One-shot intervention reduces online engagement with distorted content. PNAS Nexus, 4(3), pgaf068. https://doi.org/10.1093/pnasnexus/pgaf068
Hutto, C., & Gilbert, E. (2014). VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text. Proceedings of the International AAAI Conference on Web and Social Media, 8(1), 216–225. https://doi.org/10.1609/icwsm.v8i1.14550