This project analyzes the “Stress Level Detection in Social Media” dataset obtained from Kaggle. The dataset contains social media posts, each labeled as stressed or not stressed, along with metadata such as user type. For this analysis, we use only the first 100 observations to explore which factors help predict whether a post is marked as “stressed” or “not stressed.”
# Load required packages
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Load the dataset
stress_data <- read.csv("dreaddit-train.csv")
# Use only the first 100 observations
stress_sample <- stress_data[1:100, ]
# Check structure
str(stress_sample)
## 'data.frame': 100 obs. of 116 variables:
## $ subreddit : chr "ptsd" "assistance" "ptsd" "relationships" ...
## $ post_id : chr "8601tu" "8lbrx9" "9ch1zh" "7rorpp" ...
## $ sentence_range : chr "(15, 20)" "(0, 5)" "(15, 20)" "[5, 10]" ...
## $ text : chr "He said he had not felt that way before, suggeted I go rest and so ..TRIGGER AHEAD IF YOUI'RE A HYPOCONDRIAC LI"| __truncated__ "Hey there r/assistance, Not sure if this is the right place to post this.. but here goes =) I'm currently a stu"| __truncated__ "My mom then hit me with the newspaper and it shocked me that she would do this, she knows I don't like play hit"| __truncated__ "until i met my new boyfriend, he is amazing, he is kind, he is sweet, he is a good student, he likes the same t"| __truncated__ ...
## $ id : int 33181 2606 38816 239 1421 17554 165 33053 7581 1517 ...
## $ label : int 1 0 1 1 1 1 0 1 1 1 ...
## $ confidence : num 0.8 1 0.8 0.6 0.8 1 0.8 0.8 0.6 1 ...
## $ social_timestamp : int 1521614353 1527009817 1535935605 1516429555 1539809005 1517274027 1512854409 1483582174 1514843984 1490428087 ...
## $ social_karma : int 5 4 2 0 24 2 6 1 134 20 ...
## $ syntax_ari : num 1.81 9.43 7.77 2.67 7.55 ...
## $ lex_liwc_WC : int 116 109 167 273 89 105 119 112 81 76 ...
## $ lex_liwc_Analytic : num 72.64 79.08 33.8 2.98 32.22 ...
## $ lex_liwc_Clout : num 15 76.8 76.4 15.2 28.7 ...
## $ lex_liwc_Authentic : num 89.3 56.8 86.2 95.4 84 ...
## $ lex_liwc_Tone : num 1 98.2 25.8 79.3 1 ...
## $ lex_liwc_WPS : num 29 27.2 33.4 54.6 17.8 ...
## $ lex_liwc_Sixltr : num 12.93 21.1 17.37 8.06 31.46 ...
## $ lex_liwc_Dic : num 87.1 87.2 91 98.9 88.8 ...
## $ lex_liwc_function : num 56 48.6 61.7 65.6 52.8 ...
## $ lex_liwc_pronoun : num 16.4 11.9 25.1 30.4 15.7 ...
## $ lex_liwc_ppron : num 12.07 7.34 16.17 23.44 11.24 ...
## $ lex_liwc_i : num 9.48 1.83 8.98 16.12 7.87 ...
## $ lex_liwc_we : num 0 2.75 1.8 0.37 0 3.81 2.52 0 3.7 2.63 ...
## $ lex_liwc_you : num 0.86 2.75 1.8 0.37 0 0 0 0 0 0 ...
## $ lex_liwc_shehe : num 1.72 0 2.99 6.59 3.37 ...
## $ lex_liwc_they : num 0 0 0.6 0 0 0 0 0 0 0 ...
## $ lex_liwc_ipron : num 4.31 4.59 8.98 6.96 4.49 5.71 2.52 8.93 1.23 3.95 ...
## $ lex_liwc_article : num 3.45 8.26 5.39 3.3 4.49 1.9 5.04 2.68 6.17 5.26 ...
## $ lex_liwc_prep : num 19.83 13.76 12.57 9.16 8.99 ...
## $ lex_liwc_auxverb : num 7.76 6.42 10.18 8.79 13.48 ...
## $ lex_liwc_adverb : num 5.17 3.67 1.8 6.59 4.49 ...
## $ lex_liwc_conj : num 4.31 8.26 5.99 9.89 4.49 ...
## $ lex_liwc_negate : num 1.72 0.92 1.2 3.66 2.25 4.76 3.36 0 1.23 1.32 ...
## $ lex_liwc_verb : num 16.4 15.6 21 20.9 13.5 ...
## $ lex_liwc_adj : num 6.03 2.75 1.2 3.66 4.49 3.81 5.04 4.46 6.17 6.58 ...
## $ lex_liwc_compare : num 3.45 0.92 0.6 1.83 2.25 1.9 2.52 3.57 3.7 2.63 ...
## $ lex_liwc_interrog : num 0.86 0.92 0.6 1.1 1.12 2.86 1.68 2.68 2.47 1.32 ...
## $ lex_liwc_number : num 1.72 2.75 1.2 0 1.12 2.86 0.84 0 2.47 1.32 ...
## $ lex_liwc_quant : num 1.72 0.92 1.8 1.1 1.12 2.86 2.52 0.89 1.23 3.95 ...
## $ lex_liwc_affect : num 8.62 5.5 2.4 8.79 7.87 5.71 5.04 8.04 7.41 7.89 ...
## $ lex_liwc_posemo : num 1.72 5.5 1.2 5.86 0 0.95 4.2 1.79 0 1.32 ...
## $ lex_liwc_negemo : num 6.9 0 1.2 2.93 7.87 4.76 0.84 6.25 7.41 6.58 ...
## $ lex_liwc_anx : num 0.86 0 0 0 1.12 0.95 0 2.68 1.23 0 ...
## $ lex_liwc_anger : num 2.59 0 0 0.37 4.49 0.95 0 0 0 5.26 ...
## $ lex_liwc_sad : num 3.45 0 0 0.73 0 1.9 0.84 0.89 3.7 0 ...
## $ lex_liwc_social : num 3.45 11.01 15.57 13.55 8.99 ...
## $ lex_liwc_family : num 0 0 0.6 0.37 0 0 1.68 0 3.7 0 ...
## $ lex_liwc_friend : num 0 0 3.59 1.1 0 0.95 0 0 0 1.32 ...
## $ lex_liwc_female : num 0 0 1.8 0.37 0 0 2.52 0 0 0 ...
## $ lex_liwc_male : num 1.72 0 2.4 8.06 4.49 ...
## $ lex_liwc_cogproc : num 11.2 11.9 10.2 16.9 11.2 ...
## $ lex_liwc_insight : num 3.45 1.83 4.19 7.69 3.37 3.81 0 5.36 3.7 3.95 ...
## $ lex_liwc_cause : num 0.86 0 1.2 0.73 2.25 0.95 1.68 3.57 0 1.32 ...
## $ lex_liwc_discrep : num 2.59 3.67 0.6 1.83 0 3.81 0.84 0.89 0 0 ...
## $ lex_liwc_tentat : num 5.17 5.5 2.99 1.83 0 0.95 1.68 0.89 2.47 0 ...
## $ lex_liwc_certain : num 0 1.83 0 1.47 1.12 0.95 0.84 0.89 1.23 1.32 ...
## $ lex_liwc_differ : num 2.59 6.42 1.8 4.76 4.49 4.76 5.04 3.57 2.47 2.63 ...
## $ lex_liwc_percept : num 6.03 0.92 0 7.33 2.25 0 4.2 0 3.7 3.95 ...
## $ lex_liwc_see : num 1.72 0.92 0 1.1 0 0 2.52 0 0 0 ...
## $ lex_liwc_hear : num 1.72 0 0 0 0 0 1.68 0 0 0 ...
## $ lex_liwc_feel : num 1.72 0 0 5.49 2.25 0 0 0 3.7 3.95 ...
## $ lex_liwc_bio : num 2.59 0 0.6 2.2 2.25 0.95 0.84 4.46 1.23 9.21 ...
## $ lex_liwc_body : num 0.86 0 0.6 0 0 0 0 3.57 0 3.95 ...
## $ lex_liwc_health : num 1.72 0 0 0 1.12 0.95 0.84 0.89 1.23 2.63 ...
## $ lex_liwc_sexual : num 0 0 0 0.37 1.12 0 0.84 0 0 1.32 ...
## $ lex_liwc_ingest : num 0 0 0 0.37 0 0 0 0 0 2.63 ...
## $ lex_liwc_drives : num 8.62 15.6 8.98 6.59 7.87 ...
## $ lex_liwc_affiliation : num 0 5.5 5.39 4.03 0 7.62 3.36 0 9.88 5.26 ...
## $ lex_liwc_achieve : num 1.72 3.67 0.6 0 2.25 2.86 0.84 2.68 0 0 ...
## $ lex_liwc_power : num 4.31 7.34 1.2 0.73 4.49 3.81 2.52 2.68 1.23 3.95 ...
## $ lex_liwc_reward : num 0.86 2.75 2.4 1.1 0 0.95 0 2.68 0 0 ...
## $ lex_liwc_risk : num 2.59 0 0 0.73 1.12 0.95 0 2.68 2.47 1.32 ...
## $ lex_liwc_focuspast : num 4.31 0.92 3.59 7.69 5.62 ...
## $ lex_liwc_focuspresent : num 11.21 13.76 14.37 13.19 6.74 ...
## $ lex_liwc_focusfuture : num 0.86 0.92 1.8 1.1 0 0.95 0 2.68 0 0 ...
## $ lex_liwc_relativ : num 17.2 15.6 16.2 11 14.6 ...
## $ lex_liwc_motion : num 0.86 2.75 4.79 1.83 2.25 1.9 0.84 0.89 4.94 0 ...
## $ lex_liwc_space : num 10.34 10.09 5.99 3.3 2.25 ...
## $ lex_liwc_time : num 6.03 1.83 5.39 6.23 10.11 ...
## $ lex_liwc_work : num 0.86 11.01 0 0.73 1.12 ...
## $ lex_liwc_leisure : num 0 0 1.2 0.37 0 0 0 0 0 3.95 ...
## $ lex_liwc_home : num 0 0 0.6 0.37 2.25 0 0 0 1.23 0 ...
## $ lex_liwc_money : num 0 0.92 0 0 1.12 0 0 0 1.23 0 ...
## $ lex_liwc_relig : num 2.59 0 0 0 0 0 0 0 0 0 ...
## $ lex_liwc_death : num 0 0 0 0 0 0 0 0 0 1.32 ...
## $ lex_liwc_informal : num 0.86 1.83 0 2.56 1.12 0.95 0 0.89 0 5.26 ...
## $ lex_liwc_swear : num 0.86 0 0 0 0 0 0 0 0 3.95 ...
## $ lex_liwc_netspeak : num 0 0.92 0 0.73 0 0.95 0 0 0 0 ...
## $ lex_liwc_assent : num 0 0 0 0.73 0 0 0 0.89 0 1.32 ...
## $ lex_liwc_nonflu : num 0 0 0 0 0 0 0 0 0 0 ...
## $ lex_liwc_filler : num 0 0 0 0 0 0 0 0 0 0 ...
## $ lex_liwc_AllPunc : num 21.6 14.7 10.8 12.1 16.9 ...
## $ lex_liwc_Period : num 9.48 4.59 2.4 2.56 5.62 4.76 4.2 4.46 6.17 7.89 ...
## $ lex_liwc_Comma : num 3.45 2.75 3.59 7.33 6.74 5.71 7.56 0 1.23 2.63 ...
## $ lex_liwc_Colon : num 0.86 0 0 0 1.12 0.95 0 0 1.23 0 ...
## $ lex_liwc_SemiC : num 0.86 0 0 0 0 0.95 0 0 0 0 ...
## $ lex_liwc_QMark : num 0 0 0.6 0 0 2.86 0 0 0 1.32 ...
## $ lex_liwc_Exclam : num 0 0 0 0 0 0 0 0 0 0 ...
## $ lex_liwc_Dash : num 0 0 0 0 0 0.95 0 0 0 0 ...
## [list output truncated]
The logistic regression analysis explored how user type and post length relate to the likelihood of a social media post being labeled as “stressed” or “not stressed.” Results showed that user type had a statistically significant effect, suggesting that some groups were more likely to express stress online than others. Post length was also positively associated with the stress label: longer posts were more likely to be marked as “stressed,” possibly because they contain more expressive or emotional content.
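The model code is not shown in this report, so the following is only a minimal sketch of how such a logistic regression could be fit in R. It rests on two assumptions: `subreddit` stands in for “user type,” and `lex_liwc_WC` (the LIWC word count) stands in for post length. Neither choice is confirmed by the original analysis. If `stress_sample` is not already loaded, the block builds synthetic placeholder rows so that it runs on its own; those rows are not Dreaddit data and their output should not be read as a result.

```r
# Sketch only: `subreddit` (proxy for user type) and `lex_liwc_WC` (proxy for
# post length) are assumed predictors, not necessarily the original model.

# Fall back to synthetic placeholder rows when the real sample is not loaded.
# These values are NOT the Dreaddit data; they only make the block runnable.
if (!exists("stress_sample")) {
  set.seed(1)
  stress_sample <- data.frame(
    label       = rbinom(100, 1, 0.5),
    subreddit   = sample(c("ptsd", "assistance", "relationships"),
                         100, replace = TRUE),
    lex_liwc_WC = round(runif(100, 50, 300))
  )
}

# Logistic regression: binary stress label on group and post length
fit <- glm(label ~ factor(subreddit) + lex_liwc_WC,
           data = stress_sample, family = binomial)

summary(fit)                 # coefficients on the log-odds scale
exp(coef(fit))               # the same coefficients as odds ratios
anova(fit, test = "Chisq")   # sequential likelihood-ratio test per term
```

With only 100 rows spread over several groups, some groups may contain very few posts, so the group coefficients can be unstable; checking `table(stress_sample$subreddit, stress_sample$label)` before fitting is a reasonable precaution.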
These findings suggest that user characteristics and message length can be useful indicators for identifying emotional tone in digital content. For marketers, especially those managing social media engagement or mental health campaigns, this insight could inform strategies for tailoring content, monitoring sentiment, or identifying at-risk audiences. Understanding how different users express stress helps humanize data and supports more empathetic, data-informed marketing decisions aligned with customer behavior trends.