This study used Reddit data to explore how ChatGPT is discussed in the context of mental health. Two datasets were obtained. Dataset A consisted of 52,618 comments from the subreddit r/ChatGPT, spanning October 2022 to 2024, drawn from the Kaggle dataset chatgpt-reddit-comments; it focused on general user experiences and perceptions of ChatGPT. Dataset B included 250 mental-health-specific comments that I collected from Reddit myself, emphasizing discussions directly related to emotional wellbeing and psychological struggles. I combined the two datasets before starting the analysis.
The research questions guiding this project are: (1) What is the overall sentiment of mental-health-related discussions of ChatGPT on Reddit? (2) What themes dominate those discussions?
After combining both datasets into a unified corpus (data200), preprocessing steps were applied to clean and standardize the text. Unnecessary columns were dropped, keeping only the comment body and Text columns, which were renamed to comments. URLs, HTML entities, punctuation, and non-alphanumeric characters, all common in the raw data, were stripped using regular expressions. Text was lowercased, tokenized, and whitespace was standardized. The cleaned dataset (data300) was then filtered using a lexicon of mental health–related terms (anxiety, stress, therapy, hope) to extract a mental-health-relevant subset. Approximately 14% of the comments mentioned at least one term from the mental health lexicon.
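A minimal sketch of this cleaning-and-filtering step, assuming a pandas DataFrame with a comments column; the sample rows and the four-term lexicon are illustrative stand-ins, not the project's exact code:

```python
import re
import pandas as pd

# Illustrative lexicon; the real list is longer (see Limitations).
MH_LEXICON = {"anxiety", "stress", "therapy", "hope"}

def clean_text(text: str) -> str:
    text = re.sub(r"http\S+", " ", text)              # strip URLs
    text = re.sub(r"&\w+;", " ", text)                # strip HTML entities (&amp; etc.)
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())  # drop punctuation / non-alphanumerics
    return re.sub(r"\s+", " ", text).strip()          # standardize whitespace

# Hypothetical stand-in for the combined corpus (data200).
data200 = pd.DataFrame({"comments": [
    "I use ChatGPT for my anxiety &amp; it helps! https://example.com",
    "Just asked it to write a poem.",
]})

data300 = data200.assign(comments=data200["comments"].map(clean_text))

# Keep only comments mentioning at least one lexicon term (the ~14% subset).
pattern = r"\b(?:" + "|".join(MH_LEXICON) + r")\b"
mh_subset = data300[data300["comments"].str.contains(pattern, regex=True)]
print(mh_subset)
```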
To assess the emotional tone of comments, the VADER sentiment analysis tool was employed, which returns a compound sentiment score ranging from -1 (very negative) to +1 (very positive). Based on this score, each comment was labeled as Positive (≥ 0.05), Negative (≤ -0.05), or Neutral (otherwise). The distribution of sentiment revealed that approximately 68% of mental-health-related comments were positive, while 26% were negative and 6% were neutral, suggesting a generally favorable framing of ChatGPT in mental health discussions. Additionally, sentiment was broken down by keyword to examine whether terms like anxiety or help tended to co-occur with positive or negative sentiment. Words such as hope, support, and help showed consistently positive associations, while anxiety and panic displayed mild negativity.
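The labeling rule can be sketched as follows, using the vaderSentiment package; the thresholds match those above, while the example sentences are illustrative:

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

def label_sentiment(text: str) -> str:
    v = analyzer.polarity_scores(text)["compound"]  # compound score in [-1, 1]
    if v >= 0.05:
        return "Positive"
    if v <= -0.05:
        return "Negative"
    return "Neutral"

print(label_sentiment("ChatGPT gave me hope and real support"))  # -> Positive
print(label_sentiment("The panic and anxiety were unbearable"))  # -> Negative
```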
To identify underlying themes, Latent Dirichlet Allocation (LDA) was performed on the subset of comments containing mental-health-related words. The tokenized and lemmatized comments were converted into a document-term matrix, and LDA was run with k = 5 topics.
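A compact sketch of this topic-modeling step using scikit-learn (the study's exact toolchain isn't specified; gensim would work equally well), with hypothetical documents standing in for the lemmatized subset:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical stand-ins for the lemmatized mental-health comments.
docs = [
    "chatgpt helped me manage anxiety and stress at work",
    "therapy plus chatgpt journaling gives me hope",
    "panic attacks are rough but venting to the bot helps",
    "support from this community and my therapist matters",
    "stress about exams chatgpt explains things calmly",
]

vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(docs)  # document-term matrix

lda = LatentDirichletAllocation(n_components=5, random_state=42)  # k = 5 topics
lda.fit(dtm)

# Show the top words for each topic.
terms = vectorizer.get_feature_names_out()
for i, weights in enumerate(lda.components_):
    top = [terms[j] for j in weights.argsort()[-5:][::-1]]
    print(f"Topic {i}: {', '.join(top)}")
```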
Overall Sentiment: Out of 7,221 mental-health-related comments (≈14% of the full corpus), 68% were classified Positive (v ≥ 0.05), 26% Negative (v ≤ −0.05), and 6% Neutral. In short, Redditors talking about ChatGPT in a mental-health context lean clearly optimistic.
Dominant Topics: The most frequent mental-health keywords were anxiety, help, stress, support, therapy, and hope. Together they account for 58% of all mental-health tokens, suggesting the discourse is anchored around coping rather than clinical diagnosis.
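For illustration, this kind of token share can be computed as below; the token stream here is a made-up stand-in for the actual corpus:

```python
from collections import Counter

TOP_KEYWORDS = {"anxiety", "help", "stress", "support", "therapy", "hope"}

# Hypothetical token stream from the mental-health subset (illustrative only).
tokens = ("anxiety help therapy hope stress support chatgpt "
          "panic lonely anxiety help hope").split()

counts = Counter(t for t in tokens if t in TOP_KEYWORDS)
share = sum(counts.values()) / len(tokens)
print(counts.most_common())
print(f"Top-keyword share of tokens: {share:.0%}")
```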
#Recommendations / Implications
For Mental-Health Practitioners: Consider experimenting with AI chatbots like ChatGPT as adjunct support tools (e.g., guided journaling prompts, psychoeducation), but maintain robust triage to live professionals for crisis scenarios.
For Platform & Model Developers: Prioritize fine-tuning on empathetic response data and embed real-time links to professional hotlines when users mention high-risk phrases (“suicidal”, “self-harm”).
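A toy sketch of such a keyword trigger; the phrase list and hotline text are placeholders, and a real deployment would need a vetted, clinically informed detection system rather than simple keyword matching:

```python
import re

# Placeholder phrase list; keyword matching alone is not a safe crisis detector.
HIGH_RISK = re.compile(r"\b(?:suicidal|self[- ]harm)\b", re.IGNORECASE)

def maybe_add_crisis_resources(reply: str, user_message: str) -> str:
    """Append hotline information when a high-risk phrase is detected."""
    if HIGH_RISK.search(user_message):
        reply += ("\n\nIf you are in crisis, please contact a professional "
                  "hotline (e.g., 988 in the US).")
    return reply

print(maybe_add_crisis_resources("I'm here to listen.", "I feel suicidal lately"))
```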
For Researchers & Educators: Explore how disclosure norms differ between human-moderated forums and AI-mediated chats, and investigate using ChatGPT to mitigate academic anxiety and support students' mental health.
#Limitations & Ethical Considerations
Sampling Bias – Reddit skews young, male, and tech-savvy; findings may not generalize to broader or clinical populations.
Language & Sarcasm – VADER works well on English but struggles with irony; sentiment scores may under-detect sarcasm or dark humor common to Reddit.
Lexicon Filtering – Our mental-health term list is necessarily incomplete; we likely missed colloquialisms (“brain fog”, “doomscrolling”) and included polysemous words (e.g., “help” used sarcastically).
Ethics – Data were publicly available and were collected in compliance with Reddit’s API Terms of Service. The data were not linked to the identities of individual users.