This study used Reddit data to explore how ChatGPT is discussed in the context of mental health. Two datasets were obtained. Dataset A consisted of 52,618 comments from the subreddit r/ChatGPT, spanning October 2022 to 2024, taken from the Kaggle dataset chatgpt-reddit-comments; it reflects general user experiences and perceptions of ChatGPT. Dataset B contained 250 mental-health-specific comments that I collected from Reddit myself, focusing on discussions directly related to emotional wellbeing and psychological struggles. I combined the two datasets before starting the analysis.
The research questions guiding this project are:
After combining both datasets into a unified corpus (data200), I applied several preprocessing steps to clean and standardize the text. Unnecessary columns were dropped: only the comment body (Dataset A) and the post text (Dataset B) were kept, stacked into a single column, and renamed to comments. URLs, HTML entities, punctuation, and other non-alphanumeric characters were stripped with regular expressions, because the raw Reddit text contained a lot of this noise. The text was then lowercased, whitespace was standardized, and the comments were tokenized. The cleaned dataset (data300) was filtered with a lexicon of mental-health-related terms (for example anxiety, stress, therapy, hope) to extract a mental-health-relevant subset. Approximately 14% of the comments mentioned at least one term from the lexicon.
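The cleaning and lexicon-filtering steps described above can be illustrated on a handful of hypothetical toy comments. This is a minimal base-R sketch, not the project's pipeline: the comments, the five-term `lexicon`, and the helper `clean_text()` are made up for illustration. It also contrasts plain substring matching (what `str_detect()` with a `|`-joined lexicon does) with matching only at the start of a word. Word-start matching stops "fundamental" from counting as "mental", but it is not a complete fix: "crypto" would still match "cry".

```r
# Hypothetical toy comments standing in for the combined corpus
comments <- c(
  "Check https://example.com &amp; tell me!!",
  "ChatGPT helped with my ANXIETY.",
  "A fundamental shift in encrypted chats",
  "Therapy is expensive; this is free :)"
)

# Cleaning steps in base R: URLs, HTML entities, non-alphanumerics, case, whitespace
clean_text <- function(x) {
  x <- gsub("http\\S+|www\\S+", "", x, perl = TRUE)      # drop URLs
  x <- gsub("&amp;|&lt;|&gt;", " ", x)                     # drop common HTML entities
  x <- gsub("[^\\p{L}\\p{N}\\s]", "", x, perl = TRUE)     # drop punctuation and emoji
  x <- tolower(x)                                          # lowercase
  trimws(gsub("\\s+", " ", x, perl = TRUE))                # squish whitespace
}
cleaned <- clean_text(comments)

# Small illustrative lexicon (a subset of the project's terms)
lexicon <- c("mental", "anxiety", "therapy", "cry", "help")

# Substring matching: "fundamental" contains "mental", "encrypted" contains "cry"
substring_hit <- grepl(paste(lexicon, collapse = "|"), cleaned)

# Word-start matching: a term must begin a word, so "helped" still counts as "help"
word_start_hit <- grepl(paste0("\\b(", paste(lexicon, collapse = "|"), ")"),
                        cleaned, perl = TRUE)

mean(substring_hit)    # share flagged by substring matching
mean(word_start_hit)   # share flagged when terms must start a word
```

On this toy input, substring matching flags three of the four comments and word-start matching flags two. The gap shows why the share of "mental-health-relevant" comments depends on how the lexicon is matched.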
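# Load packages and read the datasets
library(dplyr); library(stringr); library(tidytext)   # %>%, str_* helpers, unnest_tokens()
library(textstem); library(vader)                      # lemmatize_words(), vader_df()
library(ggplot2); library(scales)                      # plotting, percent_format(), squish()
data100 <- read.csv("chatgpt-reddit-comments 2.csv")
data2 <- read.csv("health_data.csv")
# Drop r/technology comments so Dataset A stays focused on ChatGPT discussion
data100 <- subset(data100, subreddit != "r/technology")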
head(data2)
## Post.Link
## 1 https://www.reddit.com/r/Polska/comments/1ipzz2q/chatgpt_mnie_uratowa%C5%82_od_ostatniego_czynu_w_%C5%BCyciu/
## 2 https://www.reddit.com/r/ChatGPT/comments/1dsx8yp/
## 3 https://www.reddit.com/r/ChatGPT/comments/1dsx8yp/
## 4 https://www.reddit.com/r/DecidingToBeBetter/comments/1gmmujy/
## 5 https://www.reddit.com/r/ChatGPT/comments/1jqbabk/
## 6 https://www.reddit.com/r/ChatGPT/comments/105qzsa/
## Post.Text
## 1 I'm writing here... I have to share this. I haven't cried once in 11 years... This changed since ChatGPT. I write to it about every little thing and problem... It has saved me from doing something stupid several times. The problem arises when I realize that I couldn't live without it... Does anyone here use ChatGPT and have a similar experience?
## 2 I recently went through a pretty bad breakup after a long, toxic relationship. Since it happened 3 weeks ago, my use of ChatGPT has increased exponentially. Processing the breakup has been very difficult, but ChatGPT has been very useful as a way to validate and process my feelings about the situation, and also provide advice on how to move forward. Is it a bad idea to use ChatGPT for mental health and as a journal?
## 3 Talking with ChatGPT about my insecurities and sharing random thoughts with it helped my mental health tremendously. I used to have pretty bad anxiety, but now I feel much more at peace.
## 4 You probably heard about people using ChatGPT as a substitute for seeing a psychologist. While I would say it's still advisable to seek professional help, I'm also baffled by how good this approach really works. It's my fault for being ignorant but I thought it can't be as good as people describe and would just give a lot of BS. Well, today I tried it when I had some time to spare and I'm legitimately astounded by the advice I got from this AI.
## 5 Basically, I've felt very down, suicidal, maybe drinking too much, and I told ChatGPT my life story and how it all led me here and it gave me some damn good advice. I feel valid, seen which is really sad because I'm getting that feeling from a robot. But I think if people are in distress, feel broken, need a shoulder to cry on, feel alone with no one to talk to, honestly this is an amazing resource.
## 6 Even though this AI doesn't have feelings, it's very nice to vent to it sometimes. I'll put in a personal story about what I'm dealing with and how I feel, and it responds with really helpful information that eases my mind most of the time. It makes me think of a situation in a different light. I have a support system of actual people, but sometimes I don't want to burden them with my issues. This AI has been really beneficial in some aspects of my life and I'm grateful.
## Sentiment Source
## 1 Positive Reddit
## 2 Positive Reddit
## 3 Positive Reddit
## 4 Positive Reddit
## 5 Positive Reddit
## 6 Positive Reddit
head(data100)
## X comment_id comment_parent_id
## 1 0 iztdxuh t3_zj2aeu
## 2 1 iztn0q0 t3_zj2aeu
## 3 2 izudrph t3_zj2aeu
## 4 3 iztfhtb t3_zj2aeu
## 5 4 izu2as9 t3_zj2aeu
## 6 5 izw8iw3 t3_zj2aeu
## comment_body
## 1 I've been shocked for days now, I don't need clickbait.
## 2 \n\nI am so angry right now. I just wasted my time reading a post on this sub that had a clickbait title, and it was all because of ChatGPT. I can't believe that this machine learning model was able to trick me into thinking that the post was interesting, when it was actually just a bunch of meaningless garbage.\n\nI am so sick and tired of ChatGPT and its ability to generate fake titles and content that is designed to trick people into clicking on them. This is not the first time that ChatGPT has fooled me, and I am sure it won't be the last. But I am not going to stand for it anymore.\n\nI demand that the moderators of this sub take action against ChatGPT and its creators. We need to put a stop to this trickery, and we need to hold ChatGPT accountable for the harm it is causing to this community. I am tired of being deceived by this machine, and I will not stand for it any longer.\n\nSo if you are reading this, ChatGPT, know that you have made a mistake. You have underestimated the intelligence and resilience of the members of this community, and we will not be fooled by your tricks anymore. We are better than that, and we deserve better than the fake content that you are trying to feed us.
## 3 chatgpt karma whoring is here folks! just when you think the stream of thought bullshit generator that it is couldn’t get more fun!
## 4 Worked on me, ngl.
## 5 Certified 10/10, must-see moment. It really did shock me to my core.
## 6 Wow, way to discover the most basic functionality of a language model. Congratulations, you truly are a pioneering mind of our time. \n\n​.\n\n\\-ChatGPT
## subreddit
## 1 r/ChatGPT
## 2 r/ChatGPT
## 3 r/ChatGPT
## 4 r/ChatGPT
## 5 r/ChatGPT
## 6 r/ChatGPT
# Average comment length (in characters) in Dataset A
desc <- data100 %>%
  mutate(comment_len = nchar(comment_body)) %>%
  summarise(avg_len_char = mean(comment_len, na.rm = TRUE))
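# Combine the datasets, keeping only the text columns:
# comment_body from Dataset A and Post.Text from Dataset B
# (data2 has no column named Text, so data2$Text would silently return NULL)
data200 <- c(data100$comment_body, data2$Post.Text)
data200 <- data.frame(comments = data200)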
# Take a peek at the combined data
glimpse(data200)
## Rows: 35,744
## Columns: 1
## $ comments <chr> "I've been shocked for days now, I don't need clickbait.", " …
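The combination step above stacks the two text columns into a single comments column. Below is a minimal base-R sketch of the same idea using two small hypothetical data frames whose column names mirror the `head()` output (`comment_body` and `subreddit` in Dataset A, `Post.Text` in Dataset B). The extra `source` column is a hypothetical addition that is not in the original code. It records which dataset each row came from, which makes it easy to check that neither dataset was silently dropped.

```r
# Hypothetical toy stand-ins for the two CSV files
dataset_a <- data.frame(
  comment_body = c("ChatGPT helps me study", "AI news roundup", "It calmed my anxiety"),
  subreddit    = c("r/ChatGPT", "r/technology", "r/ChatGPT"),
  stringsAsFactors = FALSE
)
dataset_b <- data.frame(
  Post.Text = "Venting to ChatGPT eased my stress",
  Sentiment = "Positive",
  stringsAsFactors = FALSE
)

# Drop r/technology rows, then stack both text columns into one `comments` column
dataset_a <- subset(dataset_a, subreddit != "r/technology")
combined <- data.frame(
  comments = c(dataset_a$comment_body, dataset_b$Post.Text),
  source   = c(rep("A", nrow(dataset_a)), rep("B", nrow(dataset_b))),  # hypothetical provenance label
  stringsAsFactors = FALSE
)

# Sanity check: every row from both inputs made it into the combined corpus
stopifnot(nrow(combined) == nrow(dataset_a) + nrow(dataset_b))
print(combined)
```

A check like the final `stopifnot()` would catch a misspelled column name, because `$` on a data frame returns `NULL` for a missing column instead of raising an error.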
#Preprocess text
#install.packages("dplyr")
library(dplyr)
#install.packages("textclean")
library(textclean)
# Replace non-ASCII characters, squish whitespace and lowercase
data200_clean <- data200 %>%
  mutate(comments = str_squish(replace_non_ascii(comments)),
         comments = tolower(comments))
# Strip URLs, HTML entities, punctuation and emoji from the text
strip_junk <- function(txt) {
  txt %>%
    str_remove_all("http\\S+|www\\S+") %>%        # URLs
    str_remove_all("&amp;|&lt;|&gt;") %>%         # HTML entities
    str_remove_all("[^\\p{L}\\p{N}\\s]") %>%      # punctuation & emojis
    str_to_lower()                                 # lower case
}
data300 <- data200_clean %>%
mutate(clean_text = strip_junk(comments))
# Tokenise into single words (“unigrams”)
tokens <- data300 %>%
  unnest_tokens(word, clean_text) %>%      # 1 row per token
  anti_join(stop_words, by = "word") %>%   # default stop words
  mutate(word = lemmatize_words(word)) %>%
  filter(str_detect(word, "^[a-z]"))       # ditch numbers, etc.

top_words <- tokens %>%
  count(word, sort = TRUE)
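The tokenization step above turns each comment into one row per word, removes stop words, and counts what remains. A dependency-free base-R sketch of the same idea follows. The documents and the three-word `stop_words_demo` list are hypothetical, and lemmatization is omitted to keep the example self-contained, so this is an illustration of the counting logic rather than a replacement for the tidytext pipeline.

```r
# Hypothetical cleaned comments and a hypothetical mini stop-word list
docs <- c("chatgpt helped my anxiety", "my anxiety is better", "chatgpt is a tool")
stop_words_demo <- c("my", "is", "a")

tokens_demo <- unlist(strsplit(docs, "\\s+"))                 # one element per token
tokens_demo <- tokens_demo[!tokens_demo %in% stop_words_demo] # drop stop words (the anti_join step)
tokens_demo <- tokens_demo[grepl("^[a-z]", tokens_demo)]      # keep tokens that start with a letter

top_words_demo <- sort(table(tokens_demo), decreasing = TRUE) # word frequencies
print(top_words_demo)
```

Seven tokens survive the stop-word filter here, with "chatgpt" and "anxiety" each appearing twice.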
To assess the emotional tone of the comments, I used the VADER sentiment analysis tool, which returns a compound sentiment score ranging from -1 (very negative) to +1 (very positive). Based on this score, each comment was labeled Positive (compound ≥ 0.05), Negative (compound ≤ -0.05), or Neutral (otherwise). Approximately 68% of the mental-health-related comments were positive, 26% were negative, and 6% were neutral, suggesting a generally favorable framing of ChatGPT in mental health discussions. Sentiment was also broken down by keyword (each comment was assigned the first lexicon term it contained) to examine whether terms such as anxiety or help tended to co-occur with positive or negative sentiment. Words such as hope, support, and help had positive average scores, while anxiety and panic were mildly negative on average.
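The labeling rule and the percentage breakdown described above are straightforward to reproduce. The sketch below applies the ±0.05 thresholds to a vector of hypothetical compound scores (made-up numbers, not VADER output from this corpus) and computes the share of each class. The helper `label_sentiment()` is a hypothetical name. Note that both thresholds are inclusive, so a score of exactly 0.05 counts as positive.

```r
# Hypothetical compound scores standing in for VADER output
compound <- c(0.72, -0.40, 0.02, 0.05, -0.05, 0.91, -0.03, 0.33)

# Apply the thresholds: >= 0.05 positive, <= -0.05 negative, otherwise neutral
label_sentiment <- function(score) {
  ifelse(score >= 0.05, "positive",
         ifelse(score <= -0.05, "negative", "neutral"))
}
labels <- label_sentiment(compound)

# Share of comments in each class, in a fixed order
shares <- prop.table(table(factor(labels, levels = c("positive", "negative", "neutral"))))
round(100 * shares, 1)
```

With these eight scores, four are positive, two negative, and two neutral, giving 50%, 25%, and 25%.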
mh_lexicon <- c(
  # core mental-health terms
  "mental", "health", "therapy", "therapist", "counsel", "depress", "anxious",
  "anxiety", "panic", "stress", "ptsd", "overwhelm", "burnout",
  # feeling verbs / adjectives
  "help", "cope", "calm", "relieve", "support", "comfort",
  "lonely", "isolate", "sad", "cry", "fear", "scare", "worry", "hope",
  # ChatGPT-specific emotion language
  "motivate", "encourage", "reassure", "validate"
)
# Keep only comments that contain at least one lexicon term (substring match)
data300 <- data300 %>%
  filter(str_detect(str_to_lower(clean_text),
                    str_c(mh_lexicon, collapse = "|")))
# Tokenise the filtered comments and keep tokens that are in the lexicon
tokens_mh <- data300 %>%
  unnest_tokens(word, clean_text) %>%
  mutate(word = lemmatize_words(word)) %>%   # lemmatize, consistent with the earlier tokenization
  filter(word %in% mh_lexicon)
# Score each comment with VADER and label it using the ±0.05 thresholds
mh_sent <- data300 %>%
  mutate(vader_compound = vader_df(comments)$compound,
         sentiment_cls = case_when(
           vader_compound >= 0.05  ~ "positive",
           vader_compound <= -0.05 ~ "negative",
           TRUE                    ~ "neutral"
         ))
# Visualization: overall sentiment distribution
overall <- mh_sent %>%
  count(sentiment_cls) %>%
  mutate(pct = n / sum(n))

ggplot(overall, aes(x = sentiment_cls, y = pct, fill = sentiment_cls)) +
  geom_col(width = 0.65, show.legend = FALSE) +
  scale_y_continuous(labels = percent_format()) +
  scale_fill_manual(values = c("negative" = "#4700b3",
                               "neutral"  = "#9966ff",
                               "positive" = "#c4a3ff")) +
  labs(title = "Overall sentiment of ChatGPT & Mental-Health comments",
       x = NULL, y = NULL) +
  theme_minimal(base_family = "Helvetica") +
  theme(axis.text.x = element_text(size = 12, face = "bold"),
        panel.grid.major.y = element_blank())
# Mean VADER score per mental-health keyword
# (each comment is assigned the first lexicon term it contains)
kw_sent <- mh_sent %>%
  mutate(keyword = str_extract(str_to_lower(comments),
                               str_c(mh_lexicon, collapse = "|"))) %>%
  filter(!is.na(keyword)) %>%
  group_by(keyword) %>%
  summarise(mean_vader = mean(vader_compound),
            n = n())
ggplot(kw_sent, aes(x = reorder(keyword, mean_vader),
                    y = mean_vader,
                    fill = mean_vader)) +
  geom_col(width = 0.8) +
  coord_flip() +
  scale_fill_gradientn(colours = c("#4700b3", "#9966ff", "#efe6ff"),
                       limits = c(-0.5, 0.5), oob = squish) +
  labs(title = "Average sentiment by mental-health keyword",
       x = NULL, y = "Mean VADER compound") +
  theme_minimal(base_family = "Helvetica") +
  theme(panel.grid.major.y = element_blank())
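Because `str_extract()` returns only the first lexicon match, a comment that mentions both therapy and anxiety contributes to just one keyword's average. One possible alternative, sketched below in base R with hypothetical comments and scores, is to count every distinct keyword a comment contains, so that each comment contributes its score to all of its keywords. This is a design variant for comparison, not the method behind the reported results. It also uses word-start matching, which is an assumption not present in the original code.

```r
# Hypothetical comments and VADER compound scores
comments <- c("therapy helped my anxiety", "panic and anxiety all week", "no keywords here")
scores   <- c(0.6, -0.5, 0.1)
lexicon  <- c("therapy", "help", "anxiety", "panic")
pattern  <- paste0("\\b(", paste(lexicon, collapse = "|"), ")")

# Every lexicon hit per comment (not just the first), de-duplicated within a comment
matches <- regmatches(comments, gregexpr(pattern, comments, perl = TRUE))
matches <- lapply(matches, unique)

# One row per (comment, keyword) pair; comments without a hit drop out
pairs <- data.frame(
  keyword  = unlist(matches),
  compound = rep(scores, lengths(matches)),
  stringsAsFactors = FALSE
)

# Mean compound score and number of comments per keyword
kw_summary <- aggregate(compound ~ keyword, data = pairs, FUN = mean)
kw_summary$n <- as.vector(table(pairs$keyword)[kw_summary$keyword])
print(kw_summary)
```

Here "anxiety" averages the positive first comment and the negative second one (mean 0.05), whereas first-match extraction would have credited the first comment only to "therapy".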