Introduction

This project analyzes the “Stress Level Detection in Social Media” dataset from Kaggle. The dataset contains social media posts labeled for stress, along with metadata such as user type. To keep the analysis small, we use only the first 100 observations and explore which factors help predict whether a post is labeled “stressed” or “not stressed.”

Data Preparation

# Load required packages
library(tidyverse)
# Load the dataset
stress_data <- read.csv("dreaddit-train.csv")

# Use only the first 100 observations
stress_sample <- stress_data[1:100, ]

# Check structure
str(stress_sample)
## 'data.frame':    100 obs. of  116 variables:
##  $ subreddit               : chr  "ptsd" "assistance" "ptsd" "relationships" ...
##  $ post_id                 : chr  "8601tu" "8lbrx9" "9ch1zh" "7rorpp" ...
##  $ sentence_range          : chr  "(15, 20)" "(0, 5)" "(15, 20)" "[5, 10]" ...
##  $ text                    : chr  "He said he had not felt that way before, suggeted I go rest and so ..TRIGGER AHEAD IF YOUI'RE A HYPOCONDRIAC LI"| __truncated__ "Hey there r/assistance, Not sure if this is the right place to post this.. but here goes =) I'm currently a stu"| __truncated__ "My mom then hit me with the newspaper and it shocked me that she would do this, she knows I don't like play hit"| __truncated__ "until i met my new boyfriend, he is amazing, he is kind, he is sweet, he is a good student, he likes the same t"| __truncated__ ...
##  $ id                      : int  33181 2606 38816 239 1421 17554 165 33053 7581 1517 ...
##  $ label                   : int  1 0 1 1 1 1 0 1 1 1 ...
##  $ confidence              : num  0.8 1 0.8 0.6 0.8 1 0.8 0.8 0.6 1 ...
##  $ social_timestamp        : int  1521614353 1527009817 1535935605 1516429555 1539809005 1517274027 1512854409 1483582174 1514843984 1490428087 ...
##  $ social_karma            : int  5 4 2 0 24 2 6 1 134 20 ...
##  $ syntax_ari              : num  1.81 9.43 7.77 2.67 7.55 ...
##  $ lex_liwc_WC             : int  116 109 167 273 89 105 119 112 81 76 ...
##  $ lex_liwc_Analytic       : num  72.64 79.08 33.8 2.98 32.22 ...
##  $ lex_liwc_Clout          : num  15 76.8 76.4 15.2 28.7 ...
##  $ lex_liwc_Authentic      : num  89.3 56.8 86.2 95.4 84 ...
##  $ lex_liwc_Tone           : num  1 98.2 25.8 79.3 1 ...
##  $ lex_liwc_WPS            : num  29 27.2 33.4 54.6 17.8 ...
##  $ lex_liwc_Sixltr         : num  12.93 21.1 17.37 8.06 31.46 ...
##  $ lex_liwc_Dic            : num  87.1 87.2 91 98.9 88.8 ...
##  $ lex_liwc_function       : num  56 48.6 61.7 65.6 52.8 ...
##  $ lex_liwc_pronoun        : num  16.4 11.9 25.1 30.4 15.7 ...
##  $ lex_liwc_ppron          : num  12.07 7.34 16.17 23.44 11.24 ...
##  $ lex_liwc_i              : num  9.48 1.83 8.98 16.12 7.87 ...
##  $ lex_liwc_we             : num  0 2.75 1.8 0.37 0 3.81 2.52 0 3.7 2.63 ...
##  $ lex_liwc_you            : num  0.86 2.75 1.8 0.37 0 0 0 0 0 0 ...
##  $ lex_liwc_shehe          : num  1.72 0 2.99 6.59 3.37 ...
##  $ lex_liwc_they           : num  0 0 0.6 0 0 0 0 0 0 0 ...
##  $ lex_liwc_ipron          : num  4.31 4.59 8.98 6.96 4.49 5.71 2.52 8.93 1.23 3.95 ...
##  $ lex_liwc_article        : num  3.45 8.26 5.39 3.3 4.49 1.9 5.04 2.68 6.17 5.26 ...
##  $ lex_liwc_prep           : num  19.83 13.76 12.57 9.16 8.99 ...
##  $ lex_liwc_auxverb        : num  7.76 6.42 10.18 8.79 13.48 ...
##  $ lex_liwc_adverb         : num  5.17 3.67 1.8 6.59 4.49 ...
##  $ lex_liwc_conj           : num  4.31 8.26 5.99 9.89 4.49 ...
##  $ lex_liwc_negate         : num  1.72 0.92 1.2 3.66 2.25 4.76 3.36 0 1.23 1.32 ...
##  $ lex_liwc_verb           : num  16.4 15.6 21 20.9 13.5 ...
##  $ lex_liwc_adj            : num  6.03 2.75 1.2 3.66 4.49 3.81 5.04 4.46 6.17 6.58 ...
##  $ lex_liwc_compare        : num  3.45 0.92 0.6 1.83 2.25 1.9 2.52 3.57 3.7 2.63 ...
##  $ lex_liwc_interrog       : num  0.86 0.92 0.6 1.1 1.12 2.86 1.68 2.68 2.47 1.32 ...
##  $ lex_liwc_number         : num  1.72 2.75 1.2 0 1.12 2.86 0.84 0 2.47 1.32 ...
##  $ lex_liwc_quant          : num  1.72 0.92 1.8 1.1 1.12 2.86 2.52 0.89 1.23 3.95 ...
##  $ lex_liwc_affect         : num  8.62 5.5 2.4 8.79 7.87 5.71 5.04 8.04 7.41 7.89 ...
##  $ lex_liwc_posemo         : num  1.72 5.5 1.2 5.86 0 0.95 4.2 1.79 0 1.32 ...
##  $ lex_liwc_negemo         : num  6.9 0 1.2 2.93 7.87 4.76 0.84 6.25 7.41 6.58 ...
##  $ lex_liwc_anx            : num  0.86 0 0 0 1.12 0.95 0 2.68 1.23 0 ...
##  $ lex_liwc_anger          : num  2.59 0 0 0.37 4.49 0.95 0 0 0 5.26 ...
##  $ lex_liwc_sad            : num  3.45 0 0 0.73 0 1.9 0.84 0.89 3.7 0 ...
##  $ lex_liwc_social         : num  3.45 11.01 15.57 13.55 8.99 ...
##  $ lex_liwc_family         : num  0 0 0.6 0.37 0 0 1.68 0 3.7 0 ...
##  $ lex_liwc_friend         : num  0 0 3.59 1.1 0 0.95 0 0 0 1.32 ...
##  $ lex_liwc_female         : num  0 0 1.8 0.37 0 0 2.52 0 0 0 ...
##  $ lex_liwc_male           : num  1.72 0 2.4 8.06 4.49 ...
##  $ lex_liwc_cogproc        : num  11.2 11.9 10.2 16.9 11.2 ...
##  $ lex_liwc_insight        : num  3.45 1.83 4.19 7.69 3.37 3.81 0 5.36 3.7 3.95 ...
##  $ lex_liwc_cause          : num  0.86 0 1.2 0.73 2.25 0.95 1.68 3.57 0 1.32 ...
##  $ lex_liwc_discrep        : num  2.59 3.67 0.6 1.83 0 3.81 0.84 0.89 0 0 ...
##  $ lex_liwc_tentat         : num  5.17 5.5 2.99 1.83 0 0.95 1.68 0.89 2.47 0 ...
##  $ lex_liwc_certain        : num  0 1.83 0 1.47 1.12 0.95 0.84 0.89 1.23 1.32 ...
##  $ lex_liwc_differ         : num  2.59 6.42 1.8 4.76 4.49 4.76 5.04 3.57 2.47 2.63 ...
##  $ lex_liwc_percept        : num  6.03 0.92 0 7.33 2.25 0 4.2 0 3.7 3.95 ...
##  $ lex_liwc_see            : num  1.72 0.92 0 1.1 0 0 2.52 0 0 0 ...
##  $ lex_liwc_hear           : num  1.72 0 0 0 0 0 1.68 0 0 0 ...
##  $ lex_liwc_feel           : num  1.72 0 0 5.49 2.25 0 0 0 3.7 3.95 ...
##  $ lex_liwc_bio            : num  2.59 0 0.6 2.2 2.25 0.95 0.84 4.46 1.23 9.21 ...
##  $ lex_liwc_body           : num  0.86 0 0.6 0 0 0 0 3.57 0 3.95 ...
##  $ lex_liwc_health         : num  1.72 0 0 0 1.12 0.95 0.84 0.89 1.23 2.63 ...
##  $ lex_liwc_sexual         : num  0 0 0 0.37 1.12 0 0.84 0 0 1.32 ...
##  $ lex_liwc_ingest         : num  0 0 0 0.37 0 0 0 0 0 2.63 ...
##  $ lex_liwc_drives         : num  8.62 15.6 8.98 6.59 7.87 ...
##  $ lex_liwc_affiliation    : num  0 5.5 5.39 4.03 0 7.62 3.36 0 9.88 5.26 ...
##  $ lex_liwc_achieve        : num  1.72 3.67 0.6 0 2.25 2.86 0.84 2.68 0 0 ...
##  $ lex_liwc_power          : num  4.31 7.34 1.2 0.73 4.49 3.81 2.52 2.68 1.23 3.95 ...
##  $ lex_liwc_reward         : num  0.86 2.75 2.4 1.1 0 0.95 0 2.68 0 0 ...
##  $ lex_liwc_risk           : num  2.59 0 0 0.73 1.12 0.95 0 2.68 2.47 1.32 ...
##  $ lex_liwc_focuspast      : num  4.31 0.92 3.59 7.69 5.62 ...
##  $ lex_liwc_focuspresent   : num  11.21 13.76 14.37 13.19 6.74 ...
##  $ lex_liwc_focusfuture    : num  0.86 0.92 1.8 1.1 0 0.95 0 2.68 0 0 ...
##  $ lex_liwc_relativ        : num  17.2 15.6 16.2 11 14.6 ...
##  $ lex_liwc_motion         : num  0.86 2.75 4.79 1.83 2.25 1.9 0.84 0.89 4.94 0 ...
##  $ lex_liwc_space          : num  10.34 10.09 5.99 3.3 2.25 ...
##  $ lex_liwc_time           : num  6.03 1.83 5.39 6.23 10.11 ...
##  $ lex_liwc_work           : num  0.86 11.01 0 0.73 1.12 ...
##  $ lex_liwc_leisure        : num  0 0 1.2 0.37 0 0 0 0 0 3.95 ...
##  $ lex_liwc_home           : num  0 0 0.6 0.37 2.25 0 0 0 1.23 0 ...
##  $ lex_liwc_money          : num  0 0.92 0 0 1.12 0 0 0 1.23 0 ...
##  $ lex_liwc_relig          : num  2.59 0 0 0 0 0 0 0 0 0 ...
##  $ lex_liwc_death          : num  0 0 0 0 0 0 0 0 0 1.32 ...
##  $ lex_liwc_informal       : num  0.86 1.83 0 2.56 1.12 0.95 0 0.89 0 5.26 ...
##  $ lex_liwc_swear          : num  0.86 0 0 0 0 0 0 0 0 3.95 ...
##  $ lex_liwc_netspeak       : num  0 0.92 0 0.73 0 0.95 0 0 0 0 ...
##  $ lex_liwc_assent         : num  0 0 0 0.73 0 0 0 0.89 0 1.32 ...
##  $ lex_liwc_nonflu         : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ lex_liwc_filler         : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ lex_liwc_AllPunc        : num  21.6 14.7 10.8 12.1 16.9 ...
##  $ lex_liwc_Period         : num  9.48 4.59 2.4 2.56 5.62 4.76 4.2 4.46 6.17 7.89 ...
##  $ lex_liwc_Comma          : num  3.45 2.75 3.59 7.33 6.74 5.71 7.56 0 1.23 2.63 ...
##  $ lex_liwc_Colon          : num  0.86 0 0 0 1.12 0.95 0 0 1.23 0 ...
##  $ lex_liwc_SemiC          : num  0.86 0 0 0 0 0.95 0 0 0 0 ...
##  $ lex_liwc_QMark          : num  0 0 0.6 0 0 2.86 0 0 0 1.32 ...
##  $ lex_liwc_Exclam         : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ lex_liwc_Dash           : num  0 0 0 0 0 0.95 0 0 0 0 ...
##   [list output truncated]
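
With 116 columns in the sample, it can help to keep only the variables of interest and recode the outcome before modeling. The following is a minimal sketch, not the analysis actually run for this report. The helper name `prepare_stress_vars()` is hypothetical. The choice of `subreddit` as a grouping variable and `lex_liwc_WC` (word count) as a proxy for post length is an assumption, as is the coding of `label` as 1 = "stressed" and 0 = "not stressed"; the structure output above shows only the integer codes.

```r
# Hypothetical helper: keep a few columns and recode the outcome.
# Assumption: label 1 = "stressed", 0 = "not stressed" (not confirmed by the source).
# Assumption: subreddit and lex_liwc_WC are the columns of interest.
prepare_stress_vars <- function(df) {
  # Keep only the grouping variable, the outcome, and the word count
  out <- df[, c("subreddit", "label", "lex_liwc_WC")]

  # Treat the grouping variable as categorical
  out$subreddit <- factor(out$subreddit)

  # Add a readable factor version of the 0/1 outcome
  out$stressed <- factor(out$label,
                         levels = c(0, 1),
                         labels = c("not stressed", "stressed"))
  out
}
```

Under these assumptions, `model_data <- prepare_stress_vars(stress_sample)` followed by `table(model_data$stressed)` would show how many of the 100 posts fall into each class.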


Interpretation, Findings, and Conclusions

The logistic regression analysis examined how user type and post length relate to the likelihood that a social media post is labeled “stressed” rather than “not stressed.” User type had a statistically significant effect, suggesting that some groups were more likely than others to express stress online. Post length was positively associated with the stress label: longer posts were more likely to be marked “stressed,” possibly because they contain more expressive or emotional content.
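
A logistic regression of this kind could be sketched in base R as follows. This is an illustration under stated assumptions, not the code that produced the results above. The function names `fit_stress_model()` and `odds_ratios()` are hypothetical, and the default columns (`subreddit` standing in for a grouping variable such as user type, `lex_liwc_WC` standing in for post length) are assumptions, since the report does not say which columns were used.

```r
# Sketch: logistic regression of the 0/1 stress label on a grouping
# variable and a post-length measure.
# Assumption: the default column names below are stand-ins; the report
# does not state which columns were actually used.
fit_stress_model <- function(df,
                             group_col = "subreddit",
                             length_col = "lex_liwc_WC") {
  # Build a small modeling frame with fixed, readable column names
  model_df <- data.frame(
    label = df[["label"]],
    group = factor(df[[group_col]]),
    post_length = df[[length_col]]
  )

  # family = binomial() gives a logistic (logit-link) regression
  glm(label ~ group + post_length, data = model_df, family = binomial())
}

# Exponentiated coefficients can be read as odds ratios
odds_ratios <- function(model) {
  exp(coef(model))
}
```

Under these assumptions, `model <- fit_stress_model(stress_sample)` followed by `summary(model)` would print the coefficient table, and `odds_ratios(model)` would express each effect on the odds scale. With only 100 rows and a grouping variable that may have many levels, some estimates may be unstable, so the output is best read with care.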

These findings suggest that user characteristics and message length can be useful indicators of emotional tone in digital content. For marketers, especially those managing social media engagement or mental health campaigns, this could inform how they tailor content, monitor sentiment, or identify at-risk audiences. Understanding how different users express stress supports more empathetic, data-informed marketing decisions that reflect customer behavior trends.