1. Main Question
How do the language patterns of users with depressive symptoms differ from those of the general population, and what do these patterns reveal about their emotional states, thought processes, and coping behaviors?
This study investigates how individuals with depressive symptoms express themselves on social media, using Twitter data to uncover linguistic patterns that may reflect their psychological state. As mental health concerns grow globally, particularly among youth and social media users, understanding how depression manifests in everyday digital communication has become an urgent and valuable pursuit. Language is not only a tool for self-expression but also a window into emotional well-being.
The project began with the goal of identifying distinct language features in tweets posted by individuals who exhibit signs of depression. To uncover meaningful differences in language use between the Control and Depressed groups, this study employed three key methods: NRC sentiment analysis to assess emotional tone, log odds ratios to identify statistically distinctive words used by each group, and trigram network visualization to explore the contextual and semantic patterns of word co-occurrences.
According to the NRC sentiment analysis, the Depressed group used more emotion-related words than the Control group across most sentiment categories, with particularly high usage of negative emotions like sadness and fear. Interestingly, the Depressed group also used more words associated with “joy” and “positive” sentiment, which may reflect a longing for recovery or self-reassurance rather than genuine positive emotions.
The log odds ratio analysis revealed that the Depressed group frequently used emotionally charged words such as depression, treatments, sos, and overcome, reflecting psychological distress and a desire for healing. In contrast, the Control group predominantly used neutral and everyday terms. This suggests that individuals with depression tend to use language more focused on psychological pain, self-reflection, and help-seeking.
The trigram network analysis showed that the Control group’s language structure was horizontal and socially driven, covering diverse topics like politics, music, and popular culture. On the other hand, the Depressed group formed a dense, emotionally anchored network centered around key terms like depression, treatments, and day, reflecting themes of emotional expression, therapeutic effort, and altered perceptions of time. Notably, many trigrams took on metaphorical or poetic forms, suggesting a tendency to express psychological pain indirectly.
In conclusion, the Depressed group tends to use emotionally sensitive language, to center its discourse around psychological pain and healing, and to express emotion through metaphorical and introspective means.
The ability to detect subtle linguistic cues associated with depression through social media analysis holds significant promise for public health applications. With proper ethical oversight, such models could inform early detection systems, digital mental health screenings, or targeted outreach efforts. Ultimately, this research reinforces the idea that language is not only a form of communication, but a potential diagnostic tool—offering a new avenue for identifying and supporting individuals at risk in our increasingly digital world.
The dataset used in this study was collected using the Twitter API as part of a research initiative aimed at exploring mental health through social media. It was originally compiled by researchers affiliated with the University of Maryland and other collaborators in the context of the CLPsych 2015 Shared Task (Computational Linguistics and Clinical Psychology). The project’s goal was to investigate whether psychological states, particularly depression, can be detected through linguistic patterns in users’ tweets. The dataset is in raw, uncleaned text format, and has been filtered to include only English-language tweets. Each tweet is labeled individually at the tweet level, with a binary classification label:
1: indicating that the tweet was written by a user identified as experiencing depression
0: representing a tweet from a user in the non-depressed (control) group.
The structure typically includes columns for the tweet text and its corresponding label. This dataset enables fine-grained analysis of emotional expression, word usage, and mental health indicators in social media posts. It serves as a valuable resource for sentiment analysis, natural language processing (NLP), and mental health classification studies, offering insights into how language reflects psychological states.
The dataset is first loaded from a CSV file. Since the text follows the raw tweet format, it must be cleaned before tokenization. A cleaning function was defined to standardize the content by converting all text to lowercase and removing URLs, emojis, punctuation, numbers, and extra whitespace. Only the relevant columns, 'user_id', 'post_text', and 'label', were selected for analysis. The cleaning function was then applied to the tweet text, and the cleaned text was tokenized into individual words using the unnest_tokens() function. Common English stop words were removed using a customized stop word list, which combined the built-in stop_words dataset with additional irrelevant tokens to improve word extraction accuracy. Finally, the label column was recoded from numeric form (1 for depressed, 0 for control) into the descriptive categories "Depressed" and "Control". The result is a tidy dataset in which each row represents a cleaned, meaningful word from a tweet, ready for further analysis.
library(tidytext)
library(stringr)
library(textclean)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
tweets <- read.csv("Mental-Health-Twitter.csv")
head(tweets)
## X post_id post_created
## 1 0 6.378947e+17 Sun Aug 30 07:48:37 +0000 2015
## 2 1 6.378904e+17 Sun Aug 30 07:31:33 +0000 2015
## 3 2 6.377493e+17 Sat Aug 29 22:11:07 +0000 2015
## 4 3 6.376964e+17 Sat Aug 29 18:40:49 +0000 2015
## 5 4 6.376963e+17 Sat Aug 29 18:40:26 +0000 2015
## 6 5 6.376928e+17 Sat Aug 29 18:26:24 +0000 2015
## post_text
## 1 It's just over 2 years since I was diagnosed with #anxiety and #depression. Today I'm taking a moment to reflect on how far I've come since.
## 2 It's Sunday, I need a break, so I'm planning to spend as little time as possible on the #A14...
## 3 Awake but tired. I need to sleep but my brain has other ideas...
## 4 RT @SewHQ: #Retro bears make perfect gifts and are great for beginners too! Get stitching with October's Sew on sale NOW! #yay http://t.co/…
## 5 It’s hard to say whether packing lists are making life easier or just reinforcing how much still needs doing... #movinghouse #anxiety
## 6 Making packing lists is my new hobby... #movinghouse
## user_id followers friends favourites statuses retweets label
## 1 1013187241 84 211 251 837 0 1
## 2 1013187241 84 211 251 837 1 1
## 3 1013187241 84 211 251 837 0 1
## 4 1013187241 84 211 251 837 2 1
## 5 1013187241 84 211 251 837 1 1
## 6 1013187241 84 211 251 837 1 1
clean_tweets <- function(text) {
  text %>%
    str_to_lower() %>%                        # lowercase
    str_replace_all("http\\S+\\s*", "") %>%   # remove URLs
    str_replace_all("[^\x01-\x7F]", "") %>%   # remove emojis / non-ASCII characters
    str_replace_all("[[:punct:]]", " ") %>%   # replace punctuation with spaces
    str_replace_all("[0-9]+", "") %>%         # remove numbers
    str_squish()                              # collapse extra whitespace
}
# Keep only the columns needed for the analysis
data_selected <- tweets %>%
  select(user_id, post_text, label)

# Built-in stop words plus a few corpus-specific noise tokens
custom_stop_words <- bind_rows(
  stop_words,
  tibble(word = c("rt", "don", "amp", "ll", "ve", "ii", "hey", "yong"),
         lexicon = "custom")
)

# One row per cleaned, meaningful word, with descriptive group labels
tidy_tweets <- data_selected %>%
  mutate(clean_text = clean_tweets(post_text)) %>%
  unnest_tokens(word, clean_text) %>%
  anti_join(custom_stop_words, by = "word") %>%
  mutate(label = ifelse(label == 1, "Depressed", "Control"))
head(tidy_tweets)
## user_id
## 1 1013187241
## 2 1013187241
## 3 1013187241
## 4 1013187241
## 5 1013187241
## 6 1013187241
## post_text
## 1 It's just over 2 years since I was diagnosed with #anxiety and #depression. Today I'm taking a moment to reflect on how far I've come since.
## 2 It's just over 2 years since I was diagnosed with #anxiety and #depression. Today I'm taking a moment to reflect on how far I've come since.
## 3 It's just over 2 years since I was diagnosed with #anxiety and #depression. Today I'm taking a moment to reflect on how far I've come since.
## 4 It's just over 2 years since I was diagnosed with #anxiety and #depression. Today I'm taking a moment to reflect on how far I've come since.
## 5 It's just over 2 years since I was diagnosed with #anxiety and #depression. Today I'm taking a moment to reflect on how far I've come since.
## 6 It's just over 2 years since I was diagnosed with #anxiety and #depression. Today I'm taking a moment to reflect on how far I've come since.
## label word
## 1 Depressed diagnosed
## 2 Depressed anxiety
## 3 Depressed depression
## 4 Depressed taking
## 5 Depressed moment
## 6 Depressed reflect
To explore emotional differences in language use between depressed and non-depressed Twitter users, I utilized the NRC sentiment lexicon, which categorizes words into eight emotion categories (anger, anticipation, disgust, fear, joy, sadness, surprise, and trust) along with positive and negative sentiment, for a total of ten categories.
Importantly, the NRC lexicon was selected over alternatives such as Bing and AFINN because of its multi-dimensional emotional framework. While Bing offers a binary positive/negative classification and AFINN provides a numerical sentiment score, NRC captures a richer and more nuanced emotional spectrum. This aligns more closely with the study’s objective—to analyze emotional complexity and variation in language patterns between depressed and non-depressed users, rather than simply identifying polarity or intensity. The NRC approach enables a deeper understanding of specific emotional themes, such as fear or trust, that are particularly relevant to mental health discourse.
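To make the contrast concrete, the following is a minimal sketch (not part of the analysis pipeline) showing how the three lexicons differ in structure; it assumes the textdata package is available, since get_sentiments() downloads the AFINN and NRC lexicons on first use.
library(tidytext)
library(dplyr)
# Bing: a binary positive/negative classification
get_sentiments("bing") %>% count(sentiment)
# AFINN: integer sentiment scores from -5 to +5
get_sentiments("afinn") %>% summarise(min = min(value), max = max(value))
# NRC: eight emotions plus positive and negative sentiment
get_sentiments("nrc") %>% count(sentiment)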
First, the cleaned tweet data was joined with the NRC lexicon using inner_join(get_sentiments("nrc")) to tag each tokenized word with its associated sentiment categories. After labeling each tweet by group (Depressed or Control), I calculated the frequency of each sentiment category within each group using group_by() and summarise().
nrc_tweets <- tidy_tweets %>%
  inner_join(get_sentiments("nrc"))
## Joining with `by = join_by(word)`
## Warning in inner_join(., get_sentiments("nrc")): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 2 of `x` matches multiple rows in `y`.
## ℹ Row 607 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
## "many-to-many"` to silence this warning.
top_nrc <- nrc_tweets %>%
  group_by(label, sentiment) %>%
  summarise(count = n()) %>%
  arrange(label, desc(count)) %>%
  ungroup()
## `summarise()` has grouped output by 'label'. You can override using the
## `.groups` argument.
top_nrc
## # A tibble: 20 × 3
## label sentiment count
## <chr> <chr> <int>
## 1 Control positive 5035
## 2 Control negative 3839
## 3 Control trust 3050
## 4 Control joy 2495
## 5 Control anticipation 2487
## 6 Control fear 2167
## 7 Control anger 1950
## 8 Control sadness 1720
## 9 Control surprise 1517
## 10 Control disgust 1444
## 11 Depressed positive 5642
## 12 Depressed negative 5636
## 13 Depressed sadness 3470
## 14 Depressed trust 3228
## 15 Depressed anticipation 3047
## 16 Depressed joy 2918
## 17 Depressed fear 2793
## 18 Depressed anger 2362
## 19 Depressed disgust 1661
## 20 Depressed surprise 1145
To facilitate comparison, the resulting sentiment counts were visualized using a faceted bar graph, where each facet represents a different NRC category. This visualization makes it easy to identify which emotions are more prevalent in each user group. The faceted bar plot was chosen because it separates and compares the sentiment categories across the two groups side by side, making differences in emotional tone visually intuitive and easy to interpret.
library(ggplot2)
ggplot(top_nrc, aes(x = label, y = count, fill = label)) +
  geom_col(show.legend = TRUE, width = 0.6) +
  facet_wrap(~ sentiment, scales = "free_y") +
  labs(
    title = "NRC Sentiment Categories by Group",
    x = "Group",
    y = "Word Frequency"
  ) +
  theme_minimal() +
  theme(strip.text = element_text(size = 11, face = "bold"),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank())
Analysis of the NRC sentiment graph revealed that the Depressed group consistently used more emotional words than the Control group across 9 of the 10 sentiment categories. Particularly striking were the large differences in words associated with negative emotions such as sadness, fear, and negative sentiment overall. This pattern aligns with clinical understandings of depression, where individuals tend to express more distress-related emotions in daily language use.
However, a notable result was that the Depressed group also used more words associated with 'joy' and 'positive' sentiment than the Control group. This suggests that the positive words used by depressed users may not reflect genuinely positive emotional experiences but rather psychological longing, nostalgia, or a self-presentation strategy. In some cases, these expressions may serve as a form of self-reassurance or reflect a desire to regain emotional balance. It also illustrates how sentiment analysis based purely on word frequency can overlook contextual meaning, reinforcing the need for complementary methods that capture semantic nuance.
The second figure presents a log odds ratio analysis to identify the most characteristic words used by each group, Depressed and Control, in their tweets.
The log odds ratio was chosen for this analysis because it is particularly effective in highlighting words that are statistically characteristic of one group over another, regardless of overall word frequency. Compared to simple frequency or TF-IDF, which can sometimes favor high-frequency generic words, the log odds ratio adjusts for both overall token counts and group imbalance. It helps surface distinctive lexical features that differentiate one population from another, which is crucial in a study focused on group-based linguistic and psychological differences.
To create this figure, word frequencies were first counted by group and the ten most common words per group were extracted. The log odds ratio was then calculated with add-one smoothing, as the logarithm of the ratio between each word's relative frequency in the Control group and its relative frequency in the Depressed group.
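Conceptually, the calculation implemented below can be summarized by a small helper function; this is an illustrative sketch only, with hypothetical argument names, and is not part of the analysis code.
# Illustrative sketch of the add-one-smoothed log odds ratio computed below;
# n_control and n_depressed are hypothetical vectors of per-word counts.
log_odds_ratio <- function(n_control, n_depressed) {
  p_control   <- (n_control + 1) / sum(n_control + 1)      # smoothed relative frequency (Control)
  p_depressed <- (n_depressed + 1) / sum(n_depressed + 1)  # smoothed relative frequency (Depressed)
  log(p_control / p_depressed)                             # > 0 leans Control, < 0 leans Depressed
}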
library(tidyr)
tweets_freq <- tidy_tweets %>%
  group_by(label) %>%
  count(label, word)

lor_freq <- tweets_freq %>%
  group_by(label) %>%
  slice_max(n, n = 10) %>%
  pivot_wider(names_from = label, values_from = n, values_fill = 0) %>%
  mutate(ratio_Control = (Control + 1) / sum(Control + 1),
         ratio_Depressed = (Depressed + 1) / sum(Depressed + 1)) %>%
  mutate(log_odds_ratio = log(ratio_Control / ratio_Depressed))
lor_freq
## # A tibble: 17 × 6
## word Control Depressed ratio_Control ratio_Depressed log_odds_ratio
## <chr> <int> <int> <dbl> <dbl> <dbl>
## 1 user 509 0 0.189 0.000305 6.43
## 2 trump 428 0 0.159 0.000305 6.25
## 3 love 282 318 0.105 0.0973 0.0734
## 4 twitter 269 0 0.0999 0.000305 5.79
## 5 people 268 308 0.0995 0.0942 0.0545
## 6 realdonaldtru… 259 0 0.0962 0.000305 5.75
## 7 time 172 233 0.0640 0.0714 -0.109
## 8 joe 169 0 0.0629 0.000305 5.33
## 9 cameronhoodkin 166 0 0.0618 0.000305 5.31
## 10 putin 164 0 0.0610 0.000305 5.30
## 11 depression 0 886 0.000370 0.271 -6.59
## 12 misslusyd 0 332 0.000370 0.102 -5.61
## 13 treatments 0 268 0.000370 0.0820 -5.40
## 14 sos 0 255 0.000370 0.0781 -5.35
## 15 day 0 223 0.000370 0.0683 -5.22
## 16 genevieveverso 0 220 0.000370 0.0674 -5.20
## 17 overcome 0 219 0.000370 0.0671 -5.20
top_lor_tweets <- lor_freq %>%
  group_by(label = ifelse(log_odds_ratio > 0, "Control", "Depressed")) %>%
  slice_max(abs(log_odds_ratio), n = 10, with_ties = FALSE)
top_lor_tweets
## # A tibble: 17 × 7
## # Groups: label [2]
## word Control Depressed ratio_Control ratio_Depressed log_odds_ratio label
## <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 user 509 0 0.189 0.000305 6.43 Cont…
## 2 trump 428 0 0.159 0.000305 6.25 Cont…
## 3 twitter 269 0 0.0999 0.000305 5.79 Cont…
## 4 realdon… 259 0 0.0962 0.000305 5.75 Cont…
## 5 joe 169 0 0.0629 0.000305 5.33 Cont…
## 6 cameron… 166 0 0.0618 0.000305 5.31 Cont…
## 7 putin 164 0 0.0610 0.000305 5.30 Cont…
## 8 love 282 318 0.105 0.0973 0.0734 Cont…
## 9 people 268 308 0.0995 0.0942 0.0545 Cont…
## 10 depress… 0 886 0.000370 0.271 -6.59 Depr…
## 11 misslus… 0 332 0.000370 0.102 -5.61 Depr…
## 12 treatme… 0 268 0.000370 0.0820 -5.40 Depr…
## 13 sos 0 255 0.000370 0.0781 -5.35 Depr…
## 14 day 0 223 0.000370 0.0683 -5.22 Depr…
## 15 genevie… 0 220 0.000370 0.0674 -5.20 Depr…
## 16 overcome 0 219 0.000370 0.0671 -5.20 Depr…
## 17 time 172 233 0.0640 0.0714 -0.109 Depr…
library(ggplot2)
ggplot(top_lor_tweets, aes(x = reorder(word, log_odds_ratio),
                           y = log_odds_ratio,
                           fill = label)) +
  geom_col(show.legend = TRUE) +
  coord_flip() +
  labs(title = "Top 10 Log Odds Ratio Words in Tweets by Depressed and Control Groups",
       x = NULL)
This visualization displays, for each group, the words with the highest absolute log odds ratio values, meaning those most disproportionately associated with either the Depressed or Control group. A horizontal bar chart (geom_col() with coord_flip()) was used to clearly differentiate the most group-distinctive words, with positive values indicating association with the Control group and negative values indicating association with the Depressed group. Through this analysis, I aimed to answer: which specific words are most indicative of the linguistic differences between depressed and non-depressed Twitter users? Notable patterns emerged. Words most associated with the Depressed group include depression, treatments, sos, and overcome, all terms directly related to emotional struggles, mental health, and seeking help or healing. In contrast, the Control group predominantly used neutral terms, including political and social media references. This divergence suggests that depressed users' language is more emotionally loaded and psychologically revealing, often oriented around coping, distress, or personal reflection, while control users' language reflects more general or situational content.
In this study, I visualized the semantic patterns in the Control and Depressed groups using trigram network graphs. These visualizations were designed to uncover frequent three-word co-occurrences and examine how users in each group express thoughts, emotions, and experiences through language.
The choice of trigram networks was motivated by the desire to go beyond surface-level word frequencies. While unigrams offer limited context and bigrams may oversimplify relationships, trigrams strike a balance: three-word expressions such as "overcome depressive disorders" carry syntactic and semantic depth and reflect more nuanced emotional states and actions. N-gram network analysis thus enables the study of how individuals construct their thoughts, not just what vocabulary they use.
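As a concrete illustration of what trigram tokenization produces, here is a toy example on a made-up sentence (not drawn from the dataset):
library(tidytext)
library(tibble)
toy <- tibble(text = "i want to overcome depression treatments today")
toy %>%
  unnest_tokens(trigram, text, token = "ngrams", n = 3)
# returns overlapping three-word windows such as
# "i want to", "want to overcome", "to overcome depression", ...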
Initially, I intended to use phi coefficient-based visualizations to examine word associations. However, the computation of phi coefficients for key terms like “depression” resulted in repeated NaN (Not a Number) errors, likely due to sparse co-occurrence or imbalance in contingency table distributions. These technical limitations led to the decision to adopt n-gram network visualization as a more flexible and interpretable alternative. This pivot allowed the study to retain a focus on co-occurrence and semantic structure, but in a format that better accommodates data sparsity and visual storytelling.
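For reference, a plausible reconstruction of the attempted approach is sketched below using widyr::pairwise_cor(); the exact original code is not shown above, the tweet text is used here as a makeshift document identifier, and the frequency threshold is illustrative.
library(widyr)
# Phi correlations between words co-occurring in the same tweet. When a word
# appears in every document in the comparison, the contingency table has an
# empty row, the phi denominator is zero, and R returns NaN (0/0), which is
# the failure described above.
word_cors <- tidy_tweets %>%
  group_by(word) %>%
  filter(n() >= 20) %>%   # illustrative threshold to drop very rare words
  pairwise_cor(word, post_text, sort = TRUE)
word_cors %>% filter(item1 == "depression")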
The graph was built using a force-directed layout (layout = "fr"), allowing natural clustering of related trigrams. The layout enhances interpretability by grouping closely associated word triplets into color-coded communities, which were identified using group_infomap(). The color palette was generated automatically from group membership, making each thematic cluster visually distinct and helping to identify latent linguistic topics without manual labeling. Node size represents degree centrality, visually emphasizing the most structurally influential words. Text labels were kept readable using repel = TRUE and max.overlaps = Inf, ensuring that important terms remain legible even in dense graphs. Edges were drawn with light transparency (alpha = 0.4) to avoid clutter and focus attention on structure rather than volume alone.
This approach supports truthful representation of the data: it reveals what phrases are frequently used, which terms connect semantically, and which trigrams are central to the discourse within each group.
library(tidytext)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.5
## ✔ lubridate 1.9.4 ✔ tibble 3.2.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
tidy_control <- tidy_tweets %>%
  filter(label == "Control")

con_trigram <- tidy_control %>%
  unnest_tokens(input = post_text,
                output = word,
                token = "ngrams",
                n = 3)

con_separated <- con_trigram %>%
  separate(word, c("word1", "word2", "word3"), sep = " ") %>%
  filter(!word1 %in% c("br", custom_stop_words$word, "http", "https", "t.co"),
         !word2 %in% c("br", custom_stop_words$word, "http", "https", "t.co"),
         !word3 %in% c("br", custom_stop_words$word, "http", "https", "t.co"))

con_pairs <- con_separated %>%
  count(word1, word2, word3, sort = TRUE) %>%
  na.omit()
head(con_pairs)
## word1 word2 word3 n
## 1 gt gt gt 377
## 2 pillowtalk bestmusicvideo iheartawards 339
## 4 cartoon fake yhvh 143
## 5 fake yhvh fuck 143
## 6 stupid sun glasses 143
## 7 video pillowtalk bestmusicvideo 130
library(tidygraph)
##
## Attaching package: 'tidygraph'
## The following object is masked from 'package:stats':
##
## filter
con_tri_graph <- con_pairs %>%
  filter(n >= 60) %>%
  as_tbl_graph(directed = FALSE) %>%
  mutate(centrality = centrality_degree(),
         group = as.factor(group_infomap()))
library(ggraph)
set.seed(1234)
ggraph(con_tri_graph, layout = "fr") +
  geom_edge_link(color = "gray50",
                 alpha = 0.4) +
  geom_node_point(aes(size = centrality,
                      color = group),
                  show.legend = FALSE) +
  scale_size(range = c(4, 10)) +
  geom_node_text(aes(label = name),
                 repel = TRUE,
                 max.overlaps = Inf,
                 size = 4) +
  labs(title = "Trigram Network of Control Tweets") +
  theme_graph()
In the Depressed group’s trigram network, I purposefully narrowed the focus to trigrams that include one of six keywords: “depression”, “treatments”, “overcome”, “sos”, “day”, “time”. These were selected based on prior log odds ratio analysis, which identified them as being significantly more frequent in the Depressed group compared to the Control group.
These terms were seen as semantically central to understanding the unique language patterns of individuals expressing depressive symptoms. By analyzing their surrounding trigrams, I sought to better understand the linguistic context in which these terms appear—whether they signal help-seeking behavior, expressions of struggle, or descriptions of lived experience.
library(tidytext)
library(tidyverse)
tidy_depressed <- tidy_tweets %>%
  filter(label == "Depressed")

dep_trigram <- tidy_depressed %>%
  unnest_tokens(input = post_text,
                output = word,
                token = "ngrams",
                n = 3)

dep_separated <- dep_trigram %>%
  separate(word, c("word1", "word2", "word3"), sep = " ") %>%
  filter(!word1 %in% c("br", custom_stop_words$word, "http", "https", "t.co"),
         !word2 %in% c("br", custom_stop_words$word, "http", "https", "t.co"),
         !word3 %in% c("br", custom_stop_words$word, "http", "https", "t.co"))

# Keep only trigrams whose first word is one of the six Depressed-distinctive keywords
target <- c("depression", "treatments", "overcome", "sos", "day", "time")

dep_pairs <- dep_separated %>%
  count(word1, word2, word3, sort = TRUE) %>%
  na.omit() %>%
  filter(word1 %in% target)
head(dep_pairs)
## word1 word2 word3 n
## 1 overcome depressive disorders 117
## 2 depression depression treatments 85
## 3 depression article teller 74
## 4 overcome depression mental 23
## 5 overcome depression sleep 20
## 6 depression florida times 18
library(tidygraph)
dep_tri_graph <- dep_pairs %>%
  filter(n >= 11) %>%
  as_tbl_graph(directed = FALSE) %>%
  mutate(centrality = centrality_degree(),
         group = as.factor(group_infomap()))
library(ggraph)
set.seed(1234)
ggraph(dep_tri_graph, layout = "fr") +
  geom_edge_link(color = "gray50",
                 alpha = 0.4) +
  geom_node_point(aes(size = centrality,
                      color = group),
                  show.legend = FALSE) +
  scale_size(range = c(4, 10)) +
  geom_node_text(aes(label = name),
                 repel = TRUE,
                 max.overlaps = Inf,
                 size = 3) +
  labs(title = "Trigram Network of Depressed Tweets") +
  theme_graph()
The trigram network of the Control group encompassed a wide range of topics, including politics, music, and popular culture. For instance, network structures such as “democratic play game” and “hashtags trump quotes” suggest that general users often express political opinions or engage with casual, entertainment-oriented themes such as music requests or social media memes. The frequent presence of usernames and user tags further indicates that language use in this group tends to be interaction-driven and socially embedded, reflecting a communicative style focused on external engagement and everyday discourse.
In contrast, the trigram network of the Depressed group was constructed with a more focused analytical intent. Specifically, I extracted and visualized only those trigrams that contained one of six key terms: “depression”, “treatments”, “overcome”, “sos”, “day”, and “time”. These keywords were identified through a prior log odds ratio analysis as statistically overrepresented in the Depressed group compared to the Control group. However, their selection was not based solely on frequency. Rather, these terms were chosen to examine the recurring linguistic contexts in which emotional and psychological experiences—especially those related to depression—are framed.
The resulting network showed that these six keywords often appeared as central nodes, connecting to a variety of surrounding words and forming semantically dense clusters. This indicates that individuals expressing depressive symptoms tend to use language structures that reflect a mixture of emotional expression, therapeutic efforts, and altered perceptions of time and daily life. Terms like “anxiety”, “depression”, and “treatment” frequently co-occurred, often in sequential order, illustrating how users may narrate their struggles in cohesive linguistic units.
Some trigrams, such as “wont casually day lightning” or “prompts day lightning”, may initially appear ungrammatical or fragmented. However, they can be interpreted as poetic or metaphorical attempts to describe emotional states—for example, “a day that passes like lightning without meaning”. Such language use suggests an indirect or symbolic mode of emotional expression. It appears that individuals in the Depressed group may prefer metaphor and figurative language over explicit descriptions, potentially as a coping mechanism or due to the stigmatized nature of mental health discourse.
In summary, the Control group’s network is characterized by a horizontal structure centered on everyday information and social interaction. Meanwhile, the Depressed group’s network forms a more centrally concentrated structure, where emotion-laden keywords such as symptoms, treatments, and psychological states serve as anchors. This contrast highlights how differing psychological states influence not only word choice but also the semantic architecture of digital communication.