This vignette demonstrates how to use dplyr functions from the tidyverse to analyze student mental health and burnout data. The goal is to explore patterns in burnout among students using a real dataset.
## Load libraries
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.2.0 ✔ readr 2.1.6
✔ forcats 1.0.1 ✔ stringr 1.6.0
✔ ggplot2 4.0.2 ✔ tibble 3.3.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.2
✔ purrr 1.2.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Rows: 150000 Columns: 20
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): gender, course, year, stress_level, sleep_quality, internet_qualit...
dbl (13): student_id, age, daily_study_hours, daily_sleep_hours, screen_time...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
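The chunk that builds `data_small` does not appear in the rendered output, so the following is only a minimal sketch of what such a preparation step might look like, run on a toy tibble rather than the real file. It assumes that `burnout_numeric` encodes `burnout_level` as Low = 1, Medium = 2, High = 3 (consistent with the 1-to-3 axis label used later in the vignette). `pressure_group` is taken as given here, and the names `toy_students` and `data_small_sketch` are hypothetical.

```r
library(dplyr)

# Toy stand-in for the real data (the actual file has 150,000 rows and 20 columns).
toy_students <- tibble(
  student_id     = 1:6,
  pressure_group = c("Low", "Low", "Medium", "Medium", "High", "High"),
  burnout_level  = c("Low", "Medium", "Medium", "High", "High", "Low")
)

# Assumption: burnout_numeric encodes Low/Medium/High as 1/2/3.
data_small_sketch <- toy_students %>%
  mutate(
    burnout_numeric = case_when(
      burnout_level == "Low"    ~ 1,
      burnout_level == "Medium" ~ 2,
      burnout_level == "High"   ~ 3,
      TRUE                      ~ NA_real_  # anything unexpected becomes NA
    )
  ) %>%
  # Keep only the columns the group summary needs.
  select(student_id, pressure_group, burnout_level, burnout_numeric)

data_small_sketch
```

Recoding to a number is what allows `mean()` to be applied per group. Whether averaging a three-level ordinal scale is appropriate is a modeling choice rather than a given.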
# Now that we have selected the columns we need, summarize burnout by pressure group
burnout_group_analysis <- data_small %>%
  group_by(pressure_group) %>%
  summarise(
    avg_burnout = mean(burnout_numeric, na.rm = TRUE),
    student_count = n()
  ) %>%
  arrange(desc(avg_burnout))

burnout_group_analysis
# A tibble: 3 × 3
pressure_group avg_burnout student_count
<chr> <dbl> <int>
1 High 2.00 45155
2 Medium 2.00 59925
3 Low 1.99 44920
The analysis shows that average burnout is nearly identical across levels of academic pressure: the group means differ by only about 0.01 on the 1-to-3 burnout scale. While higher academic pressure might be expected to lead to noticeably higher burnout, this dataset shows no meaningful difference in average burnout between groups.
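One way to put a number on "no meaningful difference" is to report the spread and standard error next to each group mean. The sketch below does this on simulated data in which burnout is drawn independently of pressure group. It is not the real dataset, and `sim` and `spread_summary` are hypothetical names.

```r
library(dplyr)

set.seed(42)  # simulated data only; not the real dataset

sim <- tibble(
  pressure_group  = sample(c("Low", "Medium", "High"), 3000, replace = TRUE),
  burnout_numeric = sample(1:3, 3000, replace = TRUE)
)

spread_summary <- sim %>%
  group_by(pressure_group) %>%
  summarise(
    avg_burnout   = mean(burnout_numeric),
    sd_burnout    = sd(burnout_numeric),
    student_count = n(),
    # Standard error of the mean: how precisely each average is estimated.
    se_burnout    = sd_burnout / sqrt(student_count)
  )

spread_summary

# Gap between the highest and lowest group mean.
diff(range(spread_summary$avg_burnout))
```

If the gap between group means is small relative to the within-group standard deviation, the groups overlap almost completely, whatever the sample size.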
This shows that an expected relationship between two variables does not always appear in the data, and highlights the importance of validating assumptions with data. The tidyverse tools used in this example, such as dplyr for grouping and summarizing, make it easy to explore and evaluate these relationships.
## Extension by Guibril Ramde
In this extension, I broaden the analysis by comparing burnout with sleep hours and study hours. This helps explore whether lifestyle habits, in addition to academic pressure, may be related to student burnout.
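The plot that follows uses a `sleep_burnout_analysis` table whose chunk is not shown in the rendered output. As a rough sketch of how such a summary could be built, the code below bins `daily_sleep_hours` into groups and averages burnout within each. The cut points (under 6, 6 to 8, over 8 hours), the group labels, and the names `toy_extension` and `sleep_burnout_sketch` are all assumptions made for illustration, not the author's actual choices.

```r
library(dplyr)

# Toy data; column names follow the dataset, values are made up.
toy_extension <- tibble(
  daily_sleep_hours = c(4.5, 5.8, 6.0, 7.2, 8.0, 8.5, 9.1),
  burnout_numeric   = c(3, 2, 2, 1, 2, 1, 3)
)

sleep_burnout_sketch <- toy_extension %>%
  mutate(
    # Hypothetical cut points; the original grouping rule is not shown.
    sleep_group = case_when(
      is.na(daily_sleep_hours) ~ NA_character_,
      daily_sleep_hours < 6    ~ "Short",
      daily_sleep_hours <= 8   ~ "Moderate",
      TRUE                     ~ "Long"
    )
  ) %>%
  group_by(sleep_group) %>%
  summarise(
    avg_burnout   = mean(burnout_numeric, na.rm = TRUE),
    student_count = n()
  )

sleep_burnout_sketch
```

Handling `NA` explicitly as the first `case_when()` condition keeps students with missing sleep data from silently landing in the catch-all group.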
# Visualization: burnout by sleep group
ggplot(sleep_burnout_analysis, aes(x = sleep_group, y = avg_burnout)) +
  geom_col() +
  labs(
    title = "Average Burnout by Sleep Group",
    x = "Sleep Group",
    y = "Average Burnout (1 = Low, 3 = High)"
  )
# Relationship between study hours and burnout
study_burnout_analysis <- data_extension %>%
  group_by(burnout_level) %>%
  summarise(
    avg_study_hours = mean(daily_study_hours, na.rm = TRUE),
    avg_sleep_hours = mean(daily_sleep_hours, na.rm = TRUE),
    student_count = n()
  )

study_burnout_analysis
# A tibble: 3 × 4
burnout_level avg_study_hours avg_sleep_hours student_count
<chr> <dbl> <dbl> <int>
1 High 5.51 6.49 49766
2 Low 5.51 6.50 50265
3 Medium 5.51 6.50 49969
# Visualization: study hours by burnout level
ggplot(study_burnout_analysis, aes(x = burnout_level, y = avg_study_hours)) +
  geom_col() +
  labs(
    title = "Average Study Hours by Burnout Level",
    x = "Burnout Level",
    y = "Average Daily Study Hours"
  )
This extension adds a second layer to the original analysis by examining whether sleep and study habits are connected to burnout. Instead of only comparing academic pressure groups, this analysis looks at average burnout by sleep category and average study hours by burnout level. These added summaries and visualizations provide a broader view of possible factors related to student burnout.
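The study-hours summary above lists burnout levels alphabetically (High, Low, Medium) because `burnout_level` is a character column. A small sketch of one way to get the natural Low, Medium, High order in both tables and plot axes is to convert the column to a factor with explicit levels. The toy values are copied from the summary table above, and `study_summary_sketch` and `ordered_summary` are hypothetical names.

```r
library(dplyr)

# Values copied from the study-hours summary table.
study_summary_sketch <- tibble(
  burnout_level   = c("High", "Low", "Medium"),
  avg_study_hours = c(5.51, 5.51, 5.51)
)

ordered_summary <- study_summary_sketch %>%
  # Explicit levels replace the default alphabetical ordering.
  mutate(burnout_level = factor(burnout_level, levels = c("Low", "Medium", "High"))) %>%
  arrange(burnout_level)

ordered_summary
levels(ordered_summary$burnout_level)
```

Because ggplot2 orders a discrete axis by factor levels, passing a table like `ordered_summary` to `ggplot()` would place the bars in Low, Medium, High order without further changes.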