Data & Research Questions

This study uses Reddit data to explore how ChatGPT is discussed in the context of mental health. Two datasets were obtained. Dataset A consists of 52,618 comments from the subreddit r/ChatGPT, spanning October 2022 to 2024, taken from the Kaggle dataset chatgpt-reddit-comments; it captures general user experiences and perceptions of ChatGPT. Dataset B consists of 250 mental-health-specific comments that I collected from Reddit myself, emphasizing discussions directly related to emotional wellbeing and psychological struggles. I combined the two datasets before starting the analysis.

The research questions guiding this project are:

Wrangling Steps

After combining both datasets into a unified corpus (data200), I applied preprocessing steps to clean and standardize the text. Unnecessary columns were dropped: only the comment-body column of Dataset A and the text column of Dataset B were kept, merged into a single column named comments. URLs, HTML entities, punctuation, and other non-alphanumeric characters were stripped using regular expressions, because the raw comments contained this noise. The text was lowercased and tokenized, and whitespace was standardized. The cleaned dataset (data300) was then filtered using a lexicon of mental-health-related terms (e.g., anxiety, stress, therapy, hope) to extract a mental-health-relevant subset. Approximately 14% of the comments mentioned at least one term from the lexicon.
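As a rough illustration of how the share of lexicon-matching comments can be computed, here is a minimal base-R sketch. The comments and the shortened lexicon are made up for this example (they are not the study data), and the pattern is built the same way as the filter used later in the analysis, by joining the terms with "|".

```r
# Toy comments and a shortened lexicon (both hypothetical, for illustration only)
toy_comments <- c(
  "chatgpt helped with my anxiety before exams",
  "it wrote my cover letter in seconds",
  "talking to it is cheaper than therapy",
  "the new model is slower than the old one"
)
toy_lexicon <- c("anxiety", "stress", "therapy", "hope")

# One regex alternation: "anxiety|stress|therapy|hope"
pattern <- paste(toy_lexicon, collapse = "|")

# TRUE where a comment mentions at least one lexicon term
is_mh <- grepl(pattern, toy_comments)

# Share of comments that the filter would keep
share_mh <- mean(is_mh)
share_mh
```

Taking the mean of a logical vector gives the proportion of TRUE values, so the same one-liner would work on the full corpus.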

## Set up

# One-time installs
# install.packages(c("tidyverse", "tidytext", "textstem",
#                    "janitor", "vader", "topicmodels",
#                    "wordcloud", "tm", "SnowballC", "scales", "readr"))
library(tidyverse)    # data wrangling & ggplot2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tidytext)     # tidy tokenising
library(textstem)     # lemmatisation
## Loading required package: koRpus.lang.en
## Loading required package: koRpus
## Loading required package: sylly
## For information on available language packages for 'koRpus', run
## 
##   available.koRpus.lang()
## 
## and see ?install.koRpus.lang()
## 
## 
## Attaching package: 'koRpus'
## 
## The following object is masked from 'package:readr':
## 
##     tokenize
library(janitor)      # quick column cleaning
## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(vader)        # VADER sentiment
library(topicmodels)  # LDA
library(wordcloud)    # word cloud 
## Loading required package: RColorBrewer
library(tm)           # extra text utils
## Loading required package: NLP
## 
## Attaching package: 'NLP'
## 
## The following object is masked from 'package:ggplot2':
## 
##     annotate
## 
## 
## Attaching package: 'tm'
## 
## The following object is masked from 'package:koRpus':
## 
##     readTagged
library(SnowballC)    # stemming
library(scales)       # nicer axis labels
## 
## Attaching package: 'scales'
## 
## The following object is masked from 'package:purrr':
## 
##     discard
## 
## The following object is masked from 'package:readr':
## 
##     col_factor
library(readr)
# Read the datasets
data100 <- read.csv("chatgpt-reddit-comments 2.csv")
data2 <- read.csv("health_data.csv")
# Drop comments from r/technology
data100 <- subset(data100, subreddit != "r/technology")

head(data2)
##                                                                                                       Post.Link
## 1 https://www.reddit.com/r/Polska/comments/1ipzz2q/chatgpt_mnie_uratowa%C5%82_od_ostatniego_czynu_w_%C5%BCyciu/
## 2                                                            https://www.reddit.com/r/ChatGPT/comments/1dsx8yp/
## 3                                                            https://www.reddit.com/r/ChatGPT/comments/1dsx8yp/
## 4                                                 https://www.reddit.com/r/DecidingToBeBetter/comments/1gmmujy/
## 5                                                            https://www.reddit.com/r/ChatGPT/comments/1jqbabk/
## 6                                                            https://www.reddit.com/r/ChatGPT/comments/105qzsa/
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     Post.Text
## 1                                                                                                                                 I'm writing here... I have to share this. I haven't cried once in 11 years... This changed since ChatGPT. I write to it about every little thing and problem... It has saved me from doing something stupid several times. The problem arises when I realize that I couldn't live without it... Does anyone here use ChatGPT and have a similar experience?
## 2                                                         I recently went through a pretty bad breakup after a long, toxic relationship. Since it happened 3 weeks ago, my use of ChatGPT has increased exponentially. Processing the breakup has been very difficult, but ChatGPT has been very useful as a way to validate and process my feelings about the situation, and also provide advice on how to move forward. Is it a bad idea to use ChatGPT for mental health and as a journal?
## 3                                                                                                                                                                                                                                                                                                  Talking with ChatGPT about my insecurities and sharing random thoughts with it helped my mental health tremendously. I used to have pretty bad anxiety, but now I feel much more at peace.
## 4                            You probably heard about people using ChatGPT as a substitute for seeing a psychologist. While I would say it's still advisable to seek professional help, I'm also baffled by how good this approach really works. It's my fault for being ignorant but I thought it can't be as good as people describe and would just give a lot of BS. Well, today I tried it when I had some time to spare and I'm legitimately astounded by the advice I got from this AI.
## 5                                                                         Basically, I've felt very down, suicidal, maybe drinking too much, and I told ChatGPT my life story and how it all led me here and it gave me some damn good advice. I feel valid, seen  which is really sad because I'm getting that feeling from a robot. But I think if people are in distress, feel broken, need a shoulder to cry on, feel alone with no one to talk to, honestly this is an amazing resource.
## 6 Even though this AI doesn't have feelings, it's very nice to vent to it sometimes. I'll put in a personal story about what I'm dealing with and how I feel, and it responds with really helpful information that eases my mind most of the time. It makes me think of a situation in a different light. I have a support system of actual people, but sometimes I don't want to burden them with my issues. This AI has been really beneficial in some aspects of my life and I'm grateful.
##   Sentiment Source
## 1  Positive Reddit
## 2  Positive Reddit
## 3  Positive Reddit
## 4  Positive Reddit
## 5  Positive Reddit
## 6  Positive Reddit
head(data100)
##   X comment_id comment_parent_id
## 1 0    iztdxuh         t3_zj2aeu
## 2 1    iztn0q0         t3_zj2aeu
## 3 2    izudrph         t3_zj2aeu
## 4 3    iztfhtb         t3_zj2aeu
## 5 4    izu2as9         t3_zj2aeu
## 6 5    izw8iw3         t3_zj2aeu
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   comment_body
## 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      I've been shocked for days now, I don't need clickbait.
## 2  \n\nI am so angry right now. I just wasted my time reading a post on this sub that had a clickbait title, and it was all because of ChatGPT. I can't believe that this machine learning model was able to trick me into thinking that the post was interesting, when it was actually just a bunch of meaningless garbage.\n\nI am so sick and tired of ChatGPT and its ability to generate fake titles and content that is designed to trick people into clicking on them. This is not the first time that ChatGPT has fooled me, and I am sure it won't be the last. But I am not going to stand for it anymore.\n\nI demand that the moderators of this sub take action against ChatGPT and its creators. We need to put a stop to this trickery, and we need to hold ChatGPT accountable for the harm it is causing to this community. I am tired of being deceived by this machine, and I will not stand for it any longer.\n\nSo if you are reading this, ChatGPT, know that you have made a mistake. You have underestimated the intelligence and resilience of the members of this community, and we will not be fooled by your tricks anymore. We are better than that, and we deserve better than the fake content that you are trying to feed us.
## 3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          chatgpt karma whoring is here folks! just when you think the stream of thought bullshit generator that it is couldn’t get more fun!
## 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           Worked on me, ngl.
## 5                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         Certified 10/10, must-see moment. It really did shock me to my core.
## 6                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Wow, way to discover the most basic functionality of a language model. Congratulations, you truly are a pioneering mind of our time. \n\n&#x200B.\n\n\\-ChatGPT
##   subreddit
## 1 r/ChatGPT
## 2 r/ChatGPT
## 3 r/ChatGPT
## 4 r/ChatGPT
## 5 r/ChatGPT
## 6 r/ChatGPT
# Average comment length in characters (Dataset A only)
desc <- data100 %>%
  mutate(comment_len = nchar(comment_body)) %>%
  summarise(avg_len_char = mean(comment_len, na.rm = TRUE))
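# Combine the datasets,
# keeping just the comment text and dropping the other columns.
# head(data2) above shows that its text column is named Post.Text;
# data2$Text would return NULL and silently drop Dataset B.
data200 <- c(data100$comment_body, data2$Post.Text)
data200 <- data.frame(comments = data200)
# Take a peek at the data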
glimpse(data200)
## Rows: 35,744
## Columns: 1
## $ comments <chr> "I've been shocked for days now, I don't need clickbait.", " …
# Preprocess text
# install.packages("textclean")
library(textclean)    # replace_non_ascii()
# dplyr is already attached by library(tidyverse)
data200_clean <- data200 %>%
  mutate(comments = str_squish(replace_non_ascii(comments)),  # ASCII only, single spaces
         comments = tolower(comments))                        # lower case
# Function to strip URLs, Reddit markdown, etc.
strip_junk <- function(txt){
  txt %>% 
    str_remove_all("http\\S+|www\\S+") %>%          # URLs
    str_remove_all("&amp;|&lt;|&gt;") %>%           # html garbage
    str_remove_all("[^\\p{L}\\p{N}\\s]") %>%        # punctuation & emojis
    str_to_lower()                                  # lower case
}

data300 <- data200_clean %>% 
  mutate(clean_text = strip_junk(comments))

# Tokenise into single words (“unigrams”)
tokens <- data300 %>% 
  unnest_tokens(word, clean_text) %>%               # 1 row per token
  anti_join(stop_words, by = "word") %>%            # default stop words
  mutate(word = lemmatize_words(word)) %>%          
  filter(str_detect(word, "^[a-z]"))                # ditch numbers, etc.
top_words <- tokens %>% 
  count(word, sort = TRUE)

Sentiment Analysis

To assess the emotional tone of comments, I used the VADER sentiment analysis tool, which returns a compound sentiment score ranging from -1 (very negative) to +1 (very positive). Based on this score, each comment was labeled Positive (score ≥ 0.05), Negative (score ≤ -0.05), or Neutral (otherwise).

Approximately 68% of mental-health-related comments were positive, 26% were negative, and 6% were neutral, which suggests a generally favorable framing of ChatGPT in mental health discussions. Sentiment was also broken down by keyword to examine whether terms like anxiety or help tended to co-occur with positive or negative sentiment. Words such as hope, support, and help showed consistently positive associations, while anxiety and panic displayed mild negativity.
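The labeling rule can be illustrated on a handful of made-up compound scores. This is only a sketch in base R: the scores are hypothetical (real ones would come from VADER), and classify_compound is a helper name introduced here for illustration.

```r
# Hypothetical compound scores; real values would come from vader_df()
compound <- c(0.62, -0.31, 0.02, 0.05, -0.05)

# Apply the thresholds described above: >= 0.05 positive, <= -0.05 negative
classify_compound <- function(x) {
  ifelse(x >= 0.05, "positive",
         ifelse(x <= -0.05, "negative", "neutral"))
}

labels <- classify_compound(compound)

# Share of each class, comparable to the percentages reported in the text
prop.table(table(labels))
```

Note that both thresholds are inclusive, so a score of exactly 0.05 counts as positive and exactly -0.05 as negative.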

mh_lexicon <- c(
  # core mental‑health terms
  "mental","health","therapy","therapist","counsel","depress","anxious",
  "anxiety","panic","stress","ptsd","overwhelm","burnout",
  # feeling verbs / adjectives
  "help","cope","calm","relieve","support","comfort",
  "lonely","isolate","sad","cry","fear","scare","worry","hope",
  # chatgpt‑specific emotion language
  "motivate","encourage","reassure","validate"
)


# Keep only comments containing at least one lexicon term (substring match)
data300 <- data300 %>%                         
  filter(str_detect(str_to_lower(clean_text),
                    str_c(mh_lexicon, collapse = "|")))

# Tokenise those comments, keeping only tokens that are in the lexicon
tokens_mh <- data300 %>% 
  unnest_tokens(word, clean_text) %>%
  mutate(word = lemmatize_words(word)) %>%          # lemmatise, consistent with `tokens`
  filter(word %in% mh_lexicon)                      
mh_sent <- data300 %>%                     
  mutate(vader_compound = vader_df(comments)$compound,
         sentiment_cls = case_when(
             vader_compound >=  0.05 ~ "positive",
             vader_compound <= -0.05 ~ "negative",
             TRUE                     ~ "neutral"
         ))
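One property of the lexicon filter above is that it matches substrings, so a short term can fire inside an unrelated word. The base-R sketch below contrasts that behavior with a word-start anchor on made-up comments. The anchored pattern is a hypothetical alternative, not what the analysis used, and switching to it would change the reported counts.

```r
# Toy comments (hypothetical)
toy_comments <- c(
  "i feel sad and alone",
  "the crusade against spam continues",
  "so depressed lately"
)

# Substring match, as in the filter: "sad" also hits "crusade"
substring_hit <- grepl("sad|cry|depress", toy_comments)

# Hypothetical alternative: require each term to start a word, so stems
# still catch inflected forms ("depress" -> "depressed") but mid-word
# hits are dropped. Prefix collisions remain (e.g. "cry" in "crypto").
boundary_hit <- grepl("\\b(sad|cry|depress)", toy_comments, perl = TRUE)

data.frame(toy_comments, substring_hit, boundary_hit)
```

The second comment is kept by the substring filter but dropped by the anchored one, which is the kind of false positive that can inflate the mental-health subset.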

Graphs

# Visualization: overall sentiment distribution
overall <- mh_sent %>% 
  count(sentiment_cls) %>% 
  mutate(pct = n / sum(n))

ggplot(overall, aes(x = sentiment_cls, y = pct, fill = sentiment_cls)) +
  geom_col(width = 0.65, show.legend = FALSE) +
  scale_y_continuous(labels = percent_format()) +
  scale_fill_manual(values = c("negative" = "#4700b3",
                               "neutral"  = "#9966ff",
                               "positive" = "#c4a3ff")) +
  labs(title    = "Overall sentiment of ChatGPT & Mental‑Health comments",
       x = NULL, y = NULL) +
  theme_minimal(base_family = "Helvetica") +
  theme(axis.text.x  = element_text(size = 12, face = "bold"),
        panel.grid.major.y = element_blank())

# Average sentiment per mental-health keyword
# (str_extract() keeps only the first lexicon match in each comment)
kw_sent <- mh_sent %>% 
  mutate(keyword = str_extract(
            str_to_lower(comments),
            str_c(mh_lexicon, collapse = "|"))) %>% 
  filter(!is.na(keyword)) %>% 
  group_by(keyword) %>% 
  summarise(mean_vader = mean(vader_compound),
            n          = n())

ggplot(kw_sent, aes(x = reorder(keyword, mean_vader), 
                    y = mean_vader, 
                    fill = mean_vader)) +
  geom_col(width = 0.8) +
  coord_flip() +
  scale_fill_gradientn(colours = c("#4700b3","#9966ff","#efe6ff"),
                       limits = c(-0.5,0.5), oob = squish) +
  labs(title = "Average sentiment by mental‑health keyword",
       x = NULL, y = "Mean VADER compound") +
  theme_minimal(base_family = "Helvetica") +
  theme(panel.grid.major.y = element_blank())

Topic Modeling

To identify underlying themes, Latent Dirichlet Allocation (LDA) was performed on the subset of comments containing mental-health-related words. The tokenized and lemmatized comments were converted into a document-term matrix, and LDA was run with k = 5 topics.
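To make the document-term matrix concrete, here is a small base-R sketch on made-up tokens. The document ids and words are hypothetical, and table() stands in for the tidytext conversion used in the actual analysis; the point is only to show the shape of the object that LDA receives.

```r
# One row per (document, token) pair, as produced by tokenisation (toy data)
toy_tokens <- data.frame(
  doc  = c("c1", "c1", "c1", "c2", "c2"),
  word = c("anxiety", "help", "help", "therapy", "help")
)

# Rows are documents, columns are terms, cells are term counts
toy_dtm <- table(doc = toy_tokens$doc, word = toy_tokens$word)
toy_dtm
```

Each row is one comment and each cell counts how often a term occurs in it; LDA then models every row as a mixture over k topics.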

# Sample quotes: the 20 comments with the most extreme sentiment scores
top_quotes <- mh_sent %>%
  slice_max(abs(vader_compound), n = 20) %>%
  select(vader_compound, comments)

## Wordcloud of Most Frequent Terms

To visualize the most common words users employed when discussing ChatGPT and mental health, a wordcloud was generated. Each word’s size corresponds to its frequency in the corpus: larger words appeared more often. A fixed seed (set.seed(42)) keeps the arrangement consistent between renders. Colors were assigned using the Dark2 palette from the RColorBrewer package to enhance visual clarity and accessibility.
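The chunk that drew the wordcloud is not shown in this report, so the following is only a minimal sketch of the steps described, using a made-up token vector. It assumes the wordcloud and RColorBrewer packages and skips the plot if either is missing; the original figure may have been built differently.

```r
# Hypothetical tokens; in the report this role is played by the lemmatized tokens
toy_words <- c("help", "help", "help", "anxiety", "anxiety",
               "therapy", "support", "support", "hope")

# Frequency table, most common word first
word_freq <- sort(table(toy_words), decreasing = TRUE)

# Draw only if both packages are installed
if (requireNamespace("wordcloud", quietly = TRUE) &&
    requireNamespace("RColorBrewer", quietly = TRUE)) {
  set.seed(42)  # fixes the random layout between renders
  wordcloud::wordcloud(
    words    = names(word_freq),
    freq     = as.numeric(word_freq),
    min.freq = 1,                                    # toy counts are tiny
    colors   = RColorBrewer::brewer.pal(8, "Dark2")  # Dark2 palette
  )
}
```

Because word placement is random, calling set.seed() immediately before wordcloud() is what keeps the layout stable across renders.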


## 
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
## 
##     smiths
## Warning in inner_join(., mh_sent %>% filter(sentiment_cls == "positive") %>% : Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 1685 of `x` matches multiple rows in `y`.
## ℹ Row 2 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.
## Warning in inner_join(., mh_sent %>% filter(sentiment_cls == "negative") %>% : Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 2179 of `x` matches multiple rows in `y`.
## ℹ Row 3 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.
## TOPIC MODELLING (LDA)
# Create the document-term matrix (DTM) from the lexicon tokens;
# the comment text itself serves as the document id
dtm <- tokens_mh %>% 
  count(comments, word) %>% 
  cast_dtm(comments, word, n)

# choose topic count
k <- 5
lda_model <- LDA(dtm, k = k, control = list(seed = 1234))

# top 10 terms
topic_terms <- tidy(lda_model, matrix = "beta") %>% 
  group_by(topic) %>% 
  slice_max(beta, n = 10) %>% 
  ungroup()

# Plot the top terms for each topic
topic_terms %>% 
  mutate(term = reorder_within(term, beta, topic)) %>% 
  ggplot(aes(term, beta, fill = factor(topic))) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ topic, scales = "free") +
  coord_flip() +
  scale_x_reordered() +
  labs(title = "Top topics about ChatGPT in mental health",
       x = NULL, y = "β (probability)")

## Key Findings

Overall Sentiment: Out of 7,221 mental-health-related comments (≈ 14% of the full corpus), 68% were classified Positive (v ≥ 0.05), 26% Negative (v ≤ -0.05), and 6% Neutral, where v is the VADER compound score. In short, Redditors talking about ChatGPT in a mental-health context lean clearly optimistic.

Dominant Topics: The most frequent mental-health keywords were anxiety, help, stress, support, therapy, and hope. Together they account for 58% of all mental-health tokens, suggesting the discourse is anchored around coping rather than clinical diagnosis.

## Recommendations / Implications

For Mental-Health Practitioners: Consider experimenting with AI chatbots like ChatGPT as adjunct support tools (e.g., guided journaling prompts, psychoeducation), but maintain robust triage to live professionals for crisis scenarios.

For Platform & Model Developers: Prioritize fine-tuning on empathetic response data and embed real-time links to professional hot-lines when users mention high-risk phrases (“suicidal”, “self-harm”).

For Researchers & Educators: Explore how disclosure norms differ between human-moderated forums and AI-mediated chats, and examine how ChatGPT might be used to mitigate academic anxiety and support students' mental health.

## Limitations & Ethical Considerations

Sampling Bias – Reddit users skew young, male, and tech-savvy, and are often students, so findings may not generalize to broader or clinical populations.

Language & Sarcasm – VADER works well on English but struggles with irony; sentiment scores may under-detect sarcasm or dark humor common to Reddit.

Lexicon Filtering – Our mental-health term list is necessarily incomplete; we likely missed colloquialisms (“brain fog”, “doomscrolling”) and included polysemous words (e.g., “help” used sarcastically).

Ethics – The data were publicly available, and their use complies with Reddit’s API Terms of Service. No data were linked to users' identities.