Introduction
- This project focuses on analyzing student mental health and burnout levels using a large dataset.
- The study explores how factors like stress, sleep, and study hours influence burnout.
Objectives
- To analyze burnout levels among students
- To study the relationship between stress, sleep, and study hours
- To identify key factors affecting student mental health
Scope
- This project helps in understanding patterns of student stress and provides insights for improving mental well-being.
Load Necessary Libraries
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(corrplot)
## corrplot 0.95 loaded
Load Dataset
data <- read.csv("C:/Users/ASUS/OneDrive/Documents/Desktop/R Script/student_mental_health_burnout_1M.csv")
- The dataset is loaded for analysis.
LEVEL 1: Understanding the Data (Basic Exploration)
Question 1.1: What is the structure of the dataset (number of rows, columns, and data types)?
• Types of variables (numeric, categorical)
• Whether data is usable for stats/ML
str(data)
## 'data.frame': 1000000 obs. of 20 variables:
## $ age : int 23 20 29 27 24 29 21 23 26 19 ...
## $ gender : chr "Male" "Male" "Male" "Male" ...
## $ academic_year : int 2 3 2 4 4 3 3 2 4 3 ...
## $ study_hours_per_day : num 5.6 5.6 2.58 4.61 2.19 ...
## $ exam_pressure : num 6.49 5.63 6.02 6.68 4.01 ...
## $ academic_performance: num 68.4 67.7 58.4 68.9 69.1 ...
## $ stress_level : num 4.117 0.349 3.476 6.779 1.855 ...
## $ anxiety_score : num 2.28 0 2.43 4.51 1.1 ...
## $ depression_score : num 1.987 0 0.852 4.286 0 ...
## $ sleep_hours : num 6.88 7.46 8.95 4.57 5.99 ...
## $ physical_activity : num 2.73 3.69 3.3 2.07 4.03 ...
## $ social_support : num 6.47 0 6.9 2.35 4.51 ...
## $ screen_time : num 4.99 3.86 5.43 6.3 4.9 ...
## $ internet_usage : num 4.98 5.14 3.06 6.93 5.13 ...
## $ financial_stress : num 3.45 2.81 4.92 6.92 4.38 ...
## $ family_expectation : num 3.59 5.48 6.07 6.56 5.93 ...
## $ burnout_score : num 2.04 0 0 7.23 0 ...
## $ mental_health_index : num 7.07 9.86 7.63 4.65 8.93 ...
## $ risk_level : chr "Low" "Low" "Low" "High" ...
## $ dropout_risk : num 1.747 0 0.697 5.381 0 ...
dim(data)
## [1] 1000000 20
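Interpretation The dataset has 1,000,000 rows and 20 columns. Two columns are character (gender, risk_level), two are integer (age, academic_year), and the remaining 16 are numeric (for example burnout_score, stress_level, sleep_hours). academic_year is stored as an integer but takes only the values 1 to 4, so it can be treated as a categorical variable. This mix of categorical and numerical variables makes the dataset suitable for statistical and predictive analysis.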
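The column types can also be tallied programmatically instead of being read off the str() output. The following is a minimal sketch, assuming a small made-up data frame `toy` (hypothetical values, not the real dataset) that has the same kinds of columns:

```r
# Hypothetical stand-in for the real dataset (values are made up)
toy <- data.frame(
  age = c(23L, 20L, 29L),
  gender = c("Male", "Female", "Other"),
  stress_level = c(4.1, 0.3, 3.5),
  risk_level = c("Low", "Low", "High"),
  stringsAsFactors = FALSE
)

# Class of every column, then a count per class
col_types <- sapply(toy, class)
type_counts <- table(col_types)
print(type_counts)

# Names of the character (categorical) columns
categorical_cols <- names(toy)[col_types == "character"]
print(categorical_cols)
```

Applied to the real data frame, the same two calls should list gender and risk_level as the character columns.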
Question 1.2: What do the summary statistics reveal about the average burnout, stress levels, and the range (minimum to maximum) of values in the dataset?
summary(data)
## age gender academic_year study_hours_per_day
## Min. :17 Length:1000000 Min. :1.000 Min. : 0.000
## 1st Qu.:20 Class :character 1st Qu.:2.000 1st Qu.: 3.651
## Median :23 Mode :character Median :3.000 Median : 4.998
## Mean :23 Mean :2.501 Mean : 5.002
## 3rd Qu.:26 3rd Qu.:4.000 3rd Qu.: 6.346
## Max. :29 Max. :4.000 Max. :14.000
## exam_pressure academic_performance stress_level anxiety_score
## Min. : 1.000 Min. :42.37 Min. : 0.000 Min. : 0.000
## 1st Qu.: 4.945 1st Qu.:67.18 1st Qu.: 3.103 1st Qu.: 1.924
## Median : 5.999 Median :71.00 Median : 4.244 Median : 2.970
## Mean : 5.999 Mean :71.00 Mean : 4.246 Mean : 2.986
## 3rd Qu.: 7.052 3rd Qu.:74.82 3rd Qu.: 5.385 3rd Qu.: 4.015
## Max. :10.000 Max. :97.25 Max. :10.000 Max. :10.000
## depression_score sleep_hours physical_activity social_support
## Min. :0.000000 Min. : 3.000 Min. :0.000 Min. : 0.000
## 1st Qu.:0.005198 1st Qu.: 5.491 1st Qu.:1.991 1st Qu.: 3.650
## Median :1.047839 Median : 6.502 Median :3.001 Median : 4.999
## Mean :1.274728 Mean : 6.502 Mean :3.011 Mean : 5.000
## 3rd Qu.:2.086397 3rd Qu.: 7.515 3rd Qu.:4.011 3rd Qu.: 6.350
## Max. :8.530800 Max. :10.000 Max. :7.000 Max. :10.000
## screen_time internet_usage financial_stress family_expectation
## Min. : 1.000 Min. : 1.000 Min. : 0.000 Min. : 0.000
## 1st Qu.: 3.651 1st Qu.: 3.491 1st Qu.: 3.657 1st Qu.: 4.647
## Median : 5.004 Median : 5.002 Median : 5.001 Median : 6.000
## Mean : 5.019 Mean : 5.038 Mean : 5.003 Mean : 5.983
## 3rd Qu.: 6.351 3rd Qu.: 6.507 3rd Qu.: 6.355 3rd Qu.: 7.352
## Max. :12.000 Max. :14.000 Max. :10.000 Max. :10.000
## burnout_score mental_health_index risk_level dropout_risk
## Min. : 0.0000 Min. : 1.310 Length:1000000 Min. :0.000
## 1st Qu.: 0.1248 1st Qu.: 6.142 Class :character 1st Qu.:0.000
## Median : 1.4965 Median : 7.074 Mode :character Median :1.010
## Mean : 1.7841 Mean : 7.023 Mean :1.325
## 3rd Qu.: 2.8895 3rd Qu.: 7.962 3rd Qu.:2.174
## Max. :10.0000 Max. :10.000 Max. :9.326
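Interpretation Students are aged 17–29, with a mean age of 23. Burnout is low on average: the mean burnout_score is 1.78 and the median is 1.50, although values span the full 0–10 range. Stress averages 4.25 (median 4.24), also on a 0–10 range. Study hours, screen time, internet usage, social support, and financial stress all average about 5. Academic performance averages 71.0 (range 42.4–97.3), and sleep averages about 6.5 hours (range 3–10). Depression and burnout scores have first quartiles near 0, so a large share of students report almost none of either, while a minority reach high values.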
Question 1.3: Are there any missing values in the dataset? • Data quality issues
colSums(is.na(data))
## age gender academic_year
## 0 0 0
## study_hours_per_day exam_pressure academic_performance
## 0 0 0
## stress_level anxiety_score depression_score
## 0 0 0
## sleep_hours physical_activity social_support
## 0 0 0
## screen_time internet_usage financial_stress
## 0 0 0
## family_expectation burnout_score mental_health_index
## 0 0 0
## risk_level dropout_risk
## 0 0
Interpretation This checks for missing values in each column. The dataset contains no missing values.
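One caveat is that is.na() only counts true NA values; an empty string in a character column is not counted. The sketch below, on a made-up data frame `toy` (hypothetical values), shows how the two kinds of gap could be counted separately:

```r
# Hypothetical toy data: one true NA and one empty string
toy <- data.frame(
  gender = c("Male", "", "Female"),
  sleep_hours = c(6.9, NA, 7.5),
  stringsAsFactors = FALSE
)

# Counts only true NA values per column
na_per_column <- colSums(is.na(toy))

# Counts empty strings per column (NA comparisons are skipped)
blank_per_column <- colSums(toy == "", na.rm = TRUE)

# Number of rows with at least one NA
incomplete_rows <- sum(!complete.cases(toy))

print(na_per_column)
print(blank_per_column)
print(incomplete_rows)
```

This is why the gender column is later checked for empty strings before its frequency table is built.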
Question 1.4: What is the average burnout level of students across all years? • Overall mental health state
# Clean column names
names(data) <- tolower(trimws(names(data)))
names(data) <- gsub(" ", "_", names(data))
# Check column names first
print(names(data))
## [1] "age" "gender" "academic_year"
## [4] "study_hours_per_day" "exam_pressure" "academic_performance"
## [7] "stress_level" "anxiety_score" "depression_score"
## [10] "sleep_hours" "physical_activity" "social_support"
## [13] "screen_time" "internet_usage" "financial_stress"
## [16] "family_expectation" "burnout_score" "mental_health_index"
## [19] "risk_level" "dropout_risk"
# Convert key columns to numeric (no change if they are already numeric)
cols <- c("burnout_score", "stress_level", "study_hours_per_day", "sleep_hours")
for (col in cols) {
  if (col %in% names(data)) {
    data[[col]] <- as.numeric(as.character(data[[col]]))
  } else {
    # Report any requested column that is not present in the data
    cat("Column not found:", col, "\n")
  }
}
# Check structure
str(data)
## 'data.frame': 1000000 obs. of 20 variables:
## $ age : int 23 20 29 27 24 29 21 23 26 19 ...
## $ gender : chr "Male" "Male" "Male" "Male" ...
## $ academic_year : int 2 3 2 4 4 3 3 2 4 3 ...
## $ study_hours_per_day : num 5.6 5.6 2.58 4.61 2.19 ...
## $ exam_pressure : num 6.49 5.63 6.02 6.68 4.01 ...
## $ academic_performance: num 68.4 67.7 58.4 68.9 69.1 ...
## $ stress_level : num 4.117 0.349 3.476 6.779 1.855 ...
## $ anxiety_score : num 2.28 0 2.43 4.51 1.1 ...
## $ depression_score : num 1.987 0 0.852 4.286 0 ...
## $ sleep_hours : num 6.88 7.46 8.95 4.57 5.99 ...
## $ physical_activity : num 2.73 3.69 3.3 2.07 4.03 ...
## $ social_support : num 6.47 0 6.9 2.35 4.51 ...
## $ screen_time : num 4.99 3.86 5.43 6.3 4.9 ...
## $ internet_usage : num 4.98 5.14 3.06 6.93 5.13 ...
## $ financial_stress : num 3.45 2.81 4.92 6.92 4.38 ...
## $ family_expectation : num 3.59 5.48 6.07 6.56 5.93 ...
## $ burnout_score : num 2.04 0 0 7.23 0 ...
## $ mental_health_index : num 7.07 9.86 7.63 4.65 8.93 ...
## $ risk_level : chr "Low" "Low" "Low" "High" ...
## $ dropout_risk : num 1.747 0 0.697 5.381 0 ...
Interpretation The column names were standardized (lower case, trimmed, spaces replaced by underscores) and the key variables were passed through as.numeric(). The str() output is identical before and after this step, which shows that these columns were already numeric and that the conversion introduced no missing values. The preprocessing therefore acts as a safeguard rather than a correction. For the question itself, the overall mean burnout_score reported by summary(data) is 1.78 (median 1.50) on a 0–10 scale, which points to a low average burnout level across all years.
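Whether a numeric conversion silently creates missing values can be checked by comparing NA counts before and after. A minimal sketch with two made-up vectors (hypothetical values), one already numeric and one stored as text with an invalid entry:

```r
# Hypothetical vectors illustrating the conversion check
already_numeric <- c(2.04, 0, 7.23)
stored_as_text <- c("2.04", "0", "n/a")

# Same conversion pattern as in the loop above
converted_ok <- as.numeric(as.character(already_numeric))
converted_bad <- suppressWarnings(as.numeric(stored_as_text))

# New NA values created by the conversion
new_nas_ok <- sum(is.na(converted_ok)) - sum(is.na(already_numeric))
new_nas_bad <- sum(is.na(converted_bad)) - sum(is.na(stored_as_text))

# Overall mean of the clean vector
overall_mean <- mean(converted_ok, na.rm = TRUE)

print(c(new_nas_ok = new_nas_ok, new_nas_bad = new_nas_bad))
print(overall_mean)
```

A count of zero new NA values, as for the first vector, is what the unchanged str() output implies for the real columns.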
Question 1.5: How is the gender distribution represented in the dataset, and does it indicate a balanced dataset? Use the frequency table of the gender variable to support your answer.
data$gender <- trimws(as.character(data$gender))
data$gender[data$gender == ""] <- NA
table(data$gender, useNA = "ifany")
##
## Female Male Other
## 480070 479643 40287
Interpretation The dataset is nearly balanced between female and male students: 480,070 female (48.0%) and 479,643 male (48.0%), a difference of only 427. A much smaller group of 40,287 students (4.0%) is recorded as "Other". Comparisons between female and male students therefore rest on groups of almost equal size, while the "Other" group is about twelve times smaller.
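The shares can be derived directly from the counts in the frequency table. A short sketch, with the three counts copied from the table output above:

```r
# Counts copied from the frequency table above
gender_counts <- c(Female = 480070, Male = 479643, Other = 40287)

# Percentage share of each group, rounded to one decimal place
gender_share <- round(100 * prop.table(gender_counts), 1)
print(gender_share)

# Difference between the two largest groups
female_male_gap <- gender_counts[["Female"]] - gender_counts[["Male"]]
print(female_male_gap)
```

On the real data the same result would come from prop.table(table(data$gender)).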
Question 1.6: What is the average number of study hours per day among students in the dataset, and how does handling missing values (NA) affect the calculation?
mean(data$study_hours_per_day, na.rm = TRUE)
## [1] 5.001727
Interpretation On average, students study about 5.0 hours per day (mean 5.002). Half of all students fall between roughly 3.7 and 6.3 hours (the first and third quartiles from summary(data)), which indicates a moderate level of academic engagement. Because study_hours_per_day has no missing values, na.rm = TRUE does not change the result here; it only guards against NA values should the data contain any.
LEVEL 2: Data Manipulation (DPLYR ANALYSIS)
Question 2.1: What are the average burnout levels and average study hours for each gender group in the dataset?
library(dplyr)
# Check names
names(data)
## [1] "age" "gender" "academic_year"
## [4] "study_hours_per_day" "exam_pressure" "academic_performance"
## [7] "stress_level" "anxiety_score" "depression_score"
## [10] "sleep_hours" "physical_activity" "social_support"
## [13] "screen_time" "internet_usage" "financial_stress"
## [16] "family_expectation" "burnout_score" "mental_health_index"
## [19] "risk_level" "dropout_risk"
# Refer to columns by name so each mean is computed within its gender group
data %>%
  group_by(gender) %>%
  summarise(
    avg_burnout = mean(burnout_score, na.rm = TRUE),
    avg_study = mean(study_hours_per_day, na.rm = TRUE)
  )
Interpretation This summary returns one row per gender with the mean burnout_score and the mean study_hours_per_day, which shows whether burnout and study time differ between female, male, and other students. The columns must be referenced by name: position 6 in this dataset is academic_performance, not burnout_score, and an expression such as mean(data[[6]]) inside summarise() ignores the grouping and returns the overall mean for every group. A run written that way reported identical values for all three genders (71.0 and 5.00), which are the overall means of academic_performance and study hours, not group-level burnout. Identical group means in a grouped summary are usually a sign of this mistake rather than a finding.
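The difference between referring to a column by name and indexing the whole data frame inside summarise() can be seen on a small example. This is a sketch on a made-up data frame `toy` (hypothetical values), assuming dplyr 1.0 or later for the `.groups` argument:

```r
library(dplyr)

# Hypothetical toy data (values are made up for illustration)
toy <- data.frame(
  gender = c("Female", "Female", "Male", "Male"),
  burnout_score = c(1, 3, 5, 7)
)

# Column referenced by name: the mean is computed within each group
by_name <- toy %>%
  group_by(gender) %>%
  summarise(avg_burnout = mean(burnout_score), .groups = "drop")

# Whole data frame indexed by position: the grouping is ignored,
# so every group receives the overall mean
by_position <- toy %>%
  group_by(gender) %>%
  summarise(avg_burnout = mean(toy[[2]]), .groups = "drop")

print(by_name)
print(by_position)
```

In the first result the two groups differ; in the second both rows carry the same overall mean.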
Question 2.2: How are students distributed across different stress categories (Low, Medium, High) based on their stress levels?
data <- data %>%
  mutate(stress_category = case_when(
    stress_level <= 3 ~ "Low",
    stress_level <= 7 ~ "Medium",
    TRUE ~ "High"
  ))
# Show result
table(data$stress_category)
##
## High Low Medium
## 51136 231184 717680
Interpretation Most students fall into the Medium category (717,680, about 71.8%), meaning a stress_level above 3 and at most 7. The Low category (stress_level of 3 or less) holds 231,184 students (23.1%), and the High category (stress_level above 7) holds 51,136 students (5.1%). Most students therefore report moderate stress, while roughly one in twenty reports high stress and may need attention.
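The same three bands could also be built with base R's cut(), which returns a factor whose levels keep the Low, Medium, High order in tables and plots. A sketch on a made-up vector `stress` (hypothetical values), using the same cut points of 3 and 7:

```r
# Hypothetical stress values, including the two boundary values 3 and 7
stress <- c(0.3, 3, 3.5, 7, 7.2, 10)

stress_category <- cut(
  stress,
  breaks = c(-Inf, 3, 7, Inf),
  labels = c("Low", "Medium", "High"),
  right = TRUE   # intervals are (-Inf, 3], (3, 7], (7, Inf)
)

# Levels appear in the order given in `labels`, not alphabetically
category_counts <- table(stress_category)
print(category_counts)
```

One design difference: cut() leaves a missing stress value as NA, whereas the `TRUE ~ "High"` fallback in case_when() would label it High. The dataset has no missing values, so the two approaches agree here.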
Question 2.3: Which age group (Teen, Young Adult, Adult) experiences the highest average burnout level?
library(dplyr)
data %>%
  mutate(age_group = case_when(
    age < 20 ~ "Teen",
    age <= 25 ~ "Young Adult",
    TRUE ~ "Adult"
  )) %>%
  group_by(age_group) %>%
  summarise(avg_burnout = mean(burnout_score, na.rm = TRUE))
## # A tibble: 3 × 2
## age_group avg_burnout
## <chr> <dbl>
## 1 Adult 1.78
## 2 Teen 1.79
## 3 Young Adult 1.78
Interpretation Average burnout is nearly identical across age groups. Teens (under 20) are marginally highest at 1.788, followed by Young Adults (20–25) at 1.783 and Adults (26 and over) at 1.782. The gap between the highest and lowest group is about 0.006 on a 0–10 scale, so age group has no practical effect on burnout in this dataset.
Question 2.4: Who are the top 5 students with the highest burnout scores in the dataset, and why might they require immediate attention or intervention?
data %>%
  arrange(desc(burnout_score)) %>%
  head(5)
## age gender academic_year study_hours_per_day exam_pressure
## 1 20 Female 2 7.434886 8.468656
## 2 18 Female 2 9.681102 10.000000
## 3 25 Male 3 8.121920 8.118053
## 4 25 Female 1 7.120579 9.410431
## 5 28 Female 2 8.337083 9.669194
## academic_performance stress_level anxiety_score depression_score sleep_hours
## 1 67.36289 7.049948 6.296077 3.728402 4.178584
## 2 80.92194 10.000000 7.828319 6.151836 4.000456
## 3 68.15252 9.758479 7.578385 4.264993 3.000000
## 4 68.14285 8.746559 6.363234 5.986526 3.075012
## 5 69.79236 10.000000 8.432348 5.245182 6.880856
## physical_activity social_support screen_time internet_usage financial_stress
## 1 3.936872 3.902406 5.996620 6.973373 4.547513
## 2 1.759483 1.753121 4.214899 3.982958 8.674813
## 3 2.142500 5.845011 5.894929 6.432793 6.603594
## 4 3.188768 5.843709 5.991998 6.539810 9.787591
## 5 2.421136 7.555843 3.328305 4.280713 9.463255
## family_expectation burnout_score mental_health_index risk_level dropout_risk
## 1 7.388527 10 4.172677 High 4.357292
## 2 7.403697 10 1.805954 High 6.712860
## 3 5.513622 10 2.543595 High 6.029545
## 4 8.037100 10 2.796448 High 7.141450
## 5 5.007192 10 1.896741 High 5.782103
## stress_category
## 1 High
## 2 High
## 3 High
## 4 High
## 5 High
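Interpretation The five students shown all have the maximum burnout score of 10 and are classified as High risk. Their stress levels range from about 7.0 to 10, and their mental health index values range from about 1.8 to 4.2, well below the dataset mean of 7.0. Because the burnout score is capped at 10, other students may share this value; head(5) shows only the first five rows after sorting. These are extreme cases compared with the overall population (mean burnout 1.78), and the combination of maximum burnout, high stress, and low mental health index suggests they may need immediate attention and intervention.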
Question 3.1: What is the relationship between sleep hours and burnout score among students, and how does sleep affect burnout levels?
cor(
as.numeric(data$sleep_hours),
as.numeric(data$burnout_score),
use = "complete.obs"
)
## [1] -0.3713859
Interpretation The correlation between sleep hours and burnout score is -0.37, a moderate negative relationship: students who sleep less tend to report higher burnout. A correlation alone does not show that short sleep causes burnout, but the pattern is consistent with adequate sleep playing a role in reducing burnout among students.
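One way to gauge the strength of this relationship is the squared correlation, which gives the share of variance in burnout that is linearly associated with sleep. A short worked sketch using the coefficient reported above:

```r
# Correlation between sleep_hours and burnout_score reported above
r <- -0.3713859

# Share of variance in burnout linearly associated with sleep hours
r_squared <- r^2
print(round(r_squared, 3))
```

The result is a little under 0.14, so sleep accounts for roughly 14% of the variation in burnout in a simple linear sense, leaving most of the variation to other factors.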
Question 3.2: How do average study hours vary across different academic years, and what does this indicate about academic pressure?
data %>%
group_by(academic_year) %>%
summarise(avg_study = mean(study_hours_per_day, na.rm = TRUE))
## # A tibble: 4 × 2
## academic_year avg_study
## <int> <dbl>
## 1 1 5.00
## 2 2 5.00
## 3 3 5.00
## 4 4 5.00
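Interpretation Average study hours are the same across all four academic years at the displayed precision (5.00 hours per day in each year). Any difference between years is smaller than 0.01 hours, so these data give no indication that study load, and by extension academic pressure from study time, rises or falls with academic year.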
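A tibble shows about three significant digits by default, so small differences between groups can be hidden in the printed table. The sketch below uses four made-up group means (hypothetical values, not the real results) to show how the actual spread could be inspected:

```r
# Hypothetical group means that all display as 5.00 at two decimals
avg_study <- c(5.0012, 4.9987, 5.0031, 5.0004)

# At two decimals every group looks identical
all_equal_at_2dp <- length(unique(round(avg_study, 2))) == 1

# Spread between the largest and smallest group mean
spread <- diff(range(avg_study))

print(all_equal_at_2dp)
print(round(avg_study, 4))
print(spread)
```

On the real summary, rounding avg_study to four decimals inside summarise(), or converting the result with as.data.frame() before printing, would reveal whether any trend across years exists.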
Question 3.3: Which students have high burnout scores but low study hours, and what might this indicate about their condition?
data %>%
filter(burnout_score > 7 & study_hours_per_day < 3)
## age gender academic_year study_hours_per_day exam_pressure
## 1 22 Male 3 2.7664645 4.589180
## 2 18 Male 2 1.5099779 3.658734
## 3 28 Female 1 2.2865946 5.883640
## 4 24 Female 2 2.6439506 5.047968
## 5 23 Other 4 2.4727515 6.122360
## 6 19 Male 2 2.2320142 5.666139
## 7 24 Male 1 2.9242063 6.054595
## 8 29 Female 2 2.2137185 6.074293
## 9 28 Female 2 2.7478305 4.964551
## 10 23 Female 3 2.8147511 6.864406
## 11 29 Female 3 2.5518931 5.103456
## 12 17 Male 4 2.1264212 6.398525
## 13 23 Female 4 1.4118082 6.592856
## 14 26 Male 2 2.7656726 5.872417
## 15 18 Male 2 1.9291570 3.761876
## 16 28 Male 1 2.7460615 6.137397
## 17 25 Female 1 1.5989425 3.878800
## 18 29 Male 1 2.4197032 5.128629
## 19 23 Male 2 2.2110535 7.018914
## 20 25 Female 3 1.6642838 5.536171
## 21 25 Male 1 2.6455247 6.215714
## 22 17 Female 1 2.0272267 3.364869
## 23 21 Male 2 2.3853429 6.533005
## 24 25 Female 1 1.1234241 3.880244
## 25 18 Male 2 2.8468938 6.103351
## 26 28 Male 2 2.0010709 5.885879
## 27 27 Female 4 0.0000000 3.414866
## 28 24 Male 2 2.1998777 5.717227
## 29 18 Male 4 2.5044336 4.291511
## 30 27 Female 1 1.0519486 5.065720
## 31 17 Female 3 2.3701006 5.225254
## 32 21 Female 3 2.3450568 6.988845
## 33 28 Male 3 2.8687208 5.137485
## 34 25 Female 3 2.9889178 5.432657
## 35 24 Female 4 1.7190737 5.408952
## 36 26 Female 4 2.8680856 5.136847
## 37 23 Female 4 2.0863010 7.029295
## 38 19 Female 3 2.5751931 7.056468
## 39 21 Female 2 2.0748494 6.722612
## 40 29 Female 1 2.8921087 4.518733
## 41 28 Male 2 2.7795971 5.409114
## 42 18 Female 3 2.7143694 6.177466
## 43 20 Female 4 0.7158414 4.664664
## 44 20 Female 4 2.7629412 4.951928
## 45 24 Female 2 1.8747637 4.463754
## 46 21 Female 4 1.8282868 7.544150
## 47 17 Female 4 2.4963502 4.926036
## 48 27 Female 2 2.9513445 7.071886
## 49 25 Male 4 1.7559504 6.008640
## 50 25 Male 1 2.6755065 2.835343
## 51 21 Male 3 2.8764745 4.841230
## 52 18 Male 4 2.7957252 4.746295
## 53 28 Male 2 2.2167873 4.804203
## 54 28 Male 3 2.2616998 6.373028
## 55 18 Male 1 2.4482749 4.039813
## 56 29 Other 1 2.5114051 4.305329
## 57 29 Male 3 2.5927587 5.470537
## 58 18 Female 2 2.4118345 5.396160
## 59 27 Female 1 1.4775475 5.841774
## 60 25 Male 2 2.5713823 5.531697
## 61 28 Female 1 2.1247682 6.359655
## 62 22 Male 1 2.7490652 6.756891
## 63 23 Female 3 2.7611667 5.113206
## 64 19 Female 1 2.3680369 5.020049
## 65 19 Other 3 2.9141171 5.295781
## 66 23 Female 1 2.9555685 9.038181
## academic_performance stress_level anxiety_score depression_score sleep_hours
## 1 71.57590 6.922059 7.406937 4.2759735 4.593005
## 2 69.22643 6.304914 5.677457 3.6388208 4.423357
## 3 65.03467 7.394936 4.813425 2.4239676 4.721513
## 4 72.11181 7.061245 5.285346 2.8491080 4.532336
## 5 67.14357 6.279518 4.631244 2.0216005 3.682621
## 6 70.03174 7.505642 5.108300 3.3564608 5.228673
## 7 70.51512 7.949085 7.517577 3.6691516 5.733907
## 8 61.69167 7.571588 7.032356 5.6869317 4.599037
## 9 70.75603 7.574985 6.347010 2.1660904 4.916221
## 10 60.40191 7.422617 6.023354 3.9896969 6.224178
## 11 65.28942 7.038465 5.241751 3.4138526 4.079692
## 12 66.88587 7.936794 5.965406 5.7927066 4.950533
## 13 64.74485 7.698999 7.080956 4.4947866 4.232068
## 14 78.91947 6.797810 6.327349 0.9438719 3.654157
## 15 69.14069 8.836854 5.236125 5.3699508 3.408973
## 16 55.43211 8.024838 5.993706 2.9623898 4.498520
## 17 64.26265 6.359579 6.053508 4.2275751 4.748724
## 18 63.50625 7.959417 6.146259 3.2848985 3.040584
## 19 67.96039 9.208591 7.001844 6.6374980 3.000000
## 20 66.31495 7.642988 5.550248 3.6833407 3.000000
## 21 68.30919 7.625498 6.738303 3.2340446 5.720183
## 22 67.95530 8.159750 4.922105 3.9728196 3.000000
## 23 62.31588 7.490747 6.591655 4.8985077 4.991149
## 24 66.06619 5.632942 4.116459 4.5777974 4.583127
## 25 66.17318 9.520307 5.485226 5.3531674 5.555849
## 26 69.19192 8.013388 6.041670 4.2237194 3.000000
## 27 67.89344 6.763818 7.276278 2.1717787 4.983623
## 28 68.78551 6.950256 5.921711 3.4205109 5.270893
## 29 58.17739 7.494285 7.194584 4.6919835 4.318293
## 30 68.00649 7.657069 3.643033 2.9354313 4.337110
## 31 66.20823 7.513840 5.922412 3.3117961 3.985555
## 32 64.11797 8.352533 3.745261 5.2134686 5.669982
## 33 74.51645 6.828379 2.958630 3.8191222 3.000000
## 34 63.63387 5.446310 3.927282 4.7968634 3.000000
## 35 71.14018 8.732641 5.368194 3.9762405 3.196513
## 36 64.58110 8.020590 5.209913 3.7525271 5.082960
## 37 62.37717 6.836717 7.671908 4.7859558 4.034876
## 38 59.20799 9.217629 6.382577 4.0274326 5.685083
## 39 62.32604 7.869867 6.987635 3.1079156 7.402199
## 40 68.72612 7.223438 5.410664 3.6085355 3.838692
## 41 61.42479 8.530603 7.685144 5.7036960 3.000000
## 42 67.28202 8.179641 4.791468 5.1109147 4.065765
## 43 68.31523 6.840585 6.821359 2.0468791 4.832804
## 44 64.75226 6.989046 5.213940 3.6890339 5.209715
## 45 67.23902 7.948310 6.692596 3.1396253 4.372022
## 46 60.69527 8.428624 6.709301 4.6462531 4.588784
## 47 72.87932 8.020934 5.232859 3.7129066 4.263608
## 48 64.16690 8.386509 6.206810 5.3674092 4.219032
## 49 72.00274 6.727341 6.068890 5.8391824 3.779148
## 50 56.57284 8.750017 8.404884 2.8628004 4.529202
## 51 67.98732 7.233725 5.624963 6.6123312 4.788086
## 52 67.12514 7.098860 6.595889 5.1114656 4.076269
## 53 63.56801 8.938628 4.755021 5.6645791 3.149180
## 54 69.72228 6.553573 4.843606 3.3465111 3.429240
## 55 61.27774 9.255991 5.464613 3.4307751 5.268390
## 56 76.99408 7.442230 4.457039 3.4386856 4.430949
## 57 65.71962 8.918114 6.490950 5.2753261 3.000000
## 58 60.60082 7.600435 3.430083 3.2553267 4.344867
## 59 57.35242 7.390835 7.039568 1.7249606 3.285594
## 60 71.75165 9.175739 6.998808 3.7248155 4.742595
## 61 72.14155 5.936867 3.228154 4.4232439 4.520384
## 62 66.33420 6.660016 6.683599 4.1540865 4.341100
## 63 59.85959 8.868856 7.917835 3.9585520 3.000000
## 64 71.23603 7.016851 6.142979 4.1054875 3.000000
## 65 67.28902 7.976703 7.613253 5.4928838 4.414605
## 66 71.40459 7.636825 6.686762 2.1233046 6.174883
## physical_activity social_support screen_time internet_usage financial_stress
## 1 3.4343936 3.24916935 4.804843 2.301280 4.879540
## 2 2.6098057 3.71844261 4.155154 3.994222 6.612315
## 3 1.9645597 6.49519016 5.694452 6.878628 8.442311
## 4 2.5832086 1.48495422 5.501220 4.613716 7.825519
## 5 2.9523048 3.56364847 4.155639 5.642483 6.571124
## 6 1.3089149 1.55499931 4.266509 3.000125 8.222645
## 7 2.4861469 4.18214747 6.577716 6.878524 9.738804
## 8 1.8881351 3.14119703 5.233493 6.471028 6.451674
## 9 2.9696729 7.21500108 8.813883 8.285656 6.105695
## 10 4.3913636 3.13747669 3.889238 6.324845 5.410617
## 11 4.2775682 3.84988953 8.251074 6.821011 8.891268
## 12 2.9566299 4.64597169 2.897736 4.141079 9.922092
## 13 3.2415985 2.72468644 6.074900 5.523304 4.185518
## 14 2.8206283 7.49821528 6.558042 4.413671 7.049005
## 15 4.4644720 0.76572761 8.144644 8.684905 9.603963
## 16 1.5329623 1.95446110 3.891875 2.602208 10.000000
## 17 3.4011023 0.03826296 5.110468 5.317680 9.791090
## 18 3.7610110 6.10436677 1.297869 1.263988 10.000000
## 19 3.0754660 3.43494225 3.729528 4.500015 8.118391
## 20 3.4213346 3.83679112 4.461302 4.059364 5.510529
## 21 1.9360298 1.75086806 7.920460 8.824320 5.697992
## 22 1.4789400 1.54287694 9.154598 9.958066 4.801118
## 23 4.4111376 2.04715309 7.916656 7.815029 9.272656
## 24 3.3975009 1.13174920 6.196575 5.288491 7.841112
## 25 2.5368213 1.67646422 3.110961 3.014175 6.484735
## 26 1.7008934 1.98815224 5.139895 5.867809 7.173008
## 27 4.5649267 3.34828401 5.834293 6.209893 9.381991
## 28 3.3288893 2.22148703 4.766632 5.595867 8.353537
## 29 1.5083700 4.54132262 5.992936 6.339860 8.447883
## 30 2.5642054 6.76280211 3.284578 2.379128 7.226893
## 31 0.7611050 0.18992677 1.000000 3.169534 5.895410
## 32 2.2338278 3.89047944 4.103220 3.656163 9.007804
## 33 0.0000000 5.48227881 6.423818 4.357607 5.971805
## 34 3.7503060 0.00000000 8.448276 8.774205 5.030046
## 35 1.3001598 4.83014136 3.856916 2.897423 6.785889
## 36 4.5310641 5.82088262 4.336431 3.595190 10.000000
## 37 0.2324391 2.94098839 5.711533 5.509197 4.177808
## 38 1.9362846 5.73952743 3.835354 5.226144 9.305125
## 39 2.1495439 0.62433359 4.528427 4.753104 6.567349
## 40 0.0000000 3.42493270 5.322838 5.702844 7.544919
## 41 1.6286432 2.07200017 4.345357 3.364072 7.302465
## 42 1.6029521 3.46563955 1.000000 3.253127 6.194253
## 43 1.7074346 2.23381472 7.264806 6.886336 6.897715
## 44 3.7403438 4.49907329 1.218415 1.000000 7.205770
## 45 1.3921702 5.77737619 9.427271 8.909941 8.024570
## 46 4.8364664 3.90177391 3.017938 3.388576 8.140058
## 47 2.2155832 2.43266927 7.645265 7.902305 8.383564
## 48 3.3194116 2.37410627 6.602973 7.291509 9.615357
## 49 1.8366570 0.00000000 7.926914 7.542428 8.096157
## 50 2.7598109 4.98648155 4.454584 3.213918 9.641538
## 51 4.3664039 0.00000000 5.022328 5.456036 7.478615
## 52 3.3067997 0.00000000 6.001435 7.597422 9.717045
## 53 1.1859158 5.32549780 3.652873 4.673192 8.506458
## 54 4.2948539 2.76861644 1.000000 1.000000 4.411466
## 55 4.3821131 4.41014686 4.174971 3.694146 8.921134
## 56 0.9657283 5.24838869 6.074591 5.797221 7.313866
## 57 0.4118288 3.84577579 3.693823 3.199683 5.611038
## 58 4.0468884 2.18848235 6.921208 6.159106 8.129003
## 59 3.4513113 2.45367621 6.776569 7.219208 4.685655
## 60 0.0000000 3.80218905 8.761642 8.444514 9.941159
## 61 2.8505870 3.02570486 2.462998 3.024297 4.410332
## 62 1.1168242 4.26053406 5.931941 4.278811 6.999807
## 63 2.9520053 3.66902632 3.859136 2.283165 8.535073
## 64 5.7260661 3.61170974 3.558765 2.548766 2.921604
## 65 0.4440304 2.35967905 5.070539 4.839424 6.264243
## 66 0.9625280 4.17320902 1.000000 1.422415 10.000000
## family_expectation burnout_score mental_health_index risk_level dropout_risk
## 1 9.567179 7.666109 3.726303 High 5.761763
## 2 7.863416 7.008120 4.683151 High 4.844499
## 3 6.354101 7.392882 4.870808 High 3.256869
## 4 9.249084 7.058138 4.735166 High 5.176573
## 5 7.420730 7.441946 5.492339 High 3.372702
## 6 7.927364 7.461183 4.458315 High 7.117629
## 7 8.681005 7.506648 3.464347 High 6.487214
## 8 7.126718 7.099915 3.155579 High 6.194144
## 9 4.452725 7.068842 4.416076 High 2.700096
## 10 8.540742 7.013876 4.027038 High 3.648274
## 11 8.282033 7.154120 4.587933 High 3.053865
## 12 7.289254 7.740943 3.297849 High 6.006228
## 13 7.259658 7.318910 3.447678 High 4.393201
## 14 8.411513 7.084349 5.099510 High 2.111860
## 15 9.657895 7.491382 3.283436 High 8.052678
## 16 7.590593 7.267681 4.103236 High 4.875692
## 17 5.076589 7.138798 4.371843 High 8.124340
## 18 5.330413 7.126607 3.986886 High 5.007213
## 19 6.397704 7.344058 2.224761 High 6.579348
## 20 8.478760 7.297735 4.172728 High 4.367250
## 21 9.556074 7.070798 3.958097 High 4.979348
## 22 6.514850 7.208484 4.067623 High 4.989010
## 23 6.108870 7.265342 3.556652 High 6.419573
## 24 7.079694 7.025325 5.138546 High 4.818197
## 25 9.676154 7.351566 2.940359 High 5.034966
## 26 4.030313 8.319738 3.715028 High 4.668724
## 27 6.675785 7.412974 4.460056 High 3.660456
## 28 7.773715 7.007319 4.417231 High 4.227592
## 29 5.096599 7.394965 3.436316 High 6.336516
## 30 5.905136 7.197382 4.963633 High 3.336655
## 31 7.766006 7.573999 4.224202 High 5.964985
## 32 5.293567 7.708743 3.971368 High 5.285188
## 33 6.735988 7.065686 5.235323 High 4.642341
## 34 7.059510 7.020820 5.204233 High 4.510307
## 35 5.460258 7.286844 3.703613 High 4.804202
## 36 7.619545 7.340471 4.103032 High 7.157225
## 37 4.039251 7.961196 3.527954 High 5.841169
## 38 7.452516 7.246279 3.189946 High 4.828537
## 39 7.643617 7.499554 3.823388 High 4.958537
## 40 7.845402 7.405841 4.404865 High 6.244578
## 41 7.803323 8.274104 2.571107 High 6.756365
## 42 6.461382 7.172892 3.757429 High 6.265660
## 43 3.816223 7.213799 4.603294 High 4.954871
## 44 9.426228 7.076345 4.533489 High 3.599777
## 45 8.181630 7.351463 3.871009 High 3.941323
## 46 9.563540 7.415768 3.221884 High 4.656346
## 47 5.555241 7.386210 4.107897 High 3.444455
## 48 7.326211 8.404778 3.173131 High 6.852744
## 49 8.994406 7.593673 3.736642 High 6.969048
## 50 9.513563 8.312329 3.119688 High 3.329640
## 51 7.976977 7.160737 3.435322 High 5.500402
## 52 6.898720 7.542005 3.648249 High 7.354499
## 53 9.043106 9.424525 3.298669 High 5.832988
## 54 8.112579 8.051759 4.921536 High 4.775516
## 55 8.992554 7.067088 3.628987 High 2.986024
## 56 6.625903 7.244487 4.654391 High 5.757050
## 57 8.987222 7.516295 2.902871 High 5.584604
## 58 8.723108 7.128011 4.954203 High 5.345439
## 59 10.000000 8.003765 4.414307 High 2.397678
## 60 9.075040 7.848760 3.112618 High 6.350650
## 61 6.530127 7.030023 5.329834 High 3.835598
## 62 3.793718 7.297613 4.084688 High 4.803495
## 63 8.355916 7.627317 2.889542 High 6.411738
## 64 9.778943 7.411784 4.118719 High 4.765548
## 65 10.000000 7.414707 2.877478 High 4.257310
## 66 6.256730 7.118447 4.302250 High 3.387377
## stress_category
## 1 Medium
## 2 Medium
## 3 High
## 4 High
## 5 Medium
## 6 High
## 7 High
## 8 High
## 9 High
## 10 High
## 11 High
## 12 High
## 13 High
## 14 Medium
## 15 High
## 16 High
## 17 Medium
## 18 High
## 19 High
## 20 High
## 21 High
## 22 High
## 23 High
## 24 Medium
## 25 High
## 26 High
## 27 Medium
## 28 Medium
## 29 High
## 30 High
## 31 High
## 32 High
## 33 Medium
## 34 Medium
## 35 High
## 36 High
## 37 Medium
## 38 High
## 39 High
## 40 High
## 41 High
## 42 High
## 43 Medium
## 44 Medium
## 45 High
## 46 High
## 47 High
## 48 High
## 49 Medium
## 50 High
## 51 High
## 52 High
## 53 High
## 54 Medium
## 55 High
## 56 High
## 57 High
## 58 High
## 59 High
## 60 High
## 61 Medium
## 62 Medium
## 63 High
## 64 High
## 65 High
## 66 High
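Interpretation The filter returns 66 students with a burnout score above 7 who study fewer than 3 hours per day, and all 66 are classified as High risk. Their burnout is unlikely to be explained by study load alone. Many of these rows show short sleep (several at the 3-hour minimum) and financial stress well above the dataset mean of 5, which suggests emotional, psychological, or financial pressure. This highlights the importance of non-academic mental health factors.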
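Rather than printing every matching row, the subset can be summarized and compared with the full data. A minimal sketch on a made-up data frame `toy` (hypothetical values), using the same filter condition:

```r
# Hypothetical toy data (values are made up for illustration)
toy <- data.frame(
  burnout_score = c(7.6, 2.1, 8.3, 7.1, 0.0),
  study_hours_per_day = c(2.7, 5.6, 1.5, 4.2, 2.0),
  sleep_hours = c(4.5, 7.4, 3.0, 5.0, 8.1),
  financial_stress = c(8.4, 3.1, 9.7, 6.0, 2.5)
)

# Same condition as above: high burnout with low study hours
flagged <- toy[toy$burnout_score > 7 & toy$study_hours_per_day < 3, ]
n_flagged <- nrow(flagged)

# Compare the flagged students with everyone on two candidate factors
factors <- c("sleep_hours", "financial_stress")
flagged_means <- colMeans(flagged[, factors])
overall_means <- colMeans(toy[, factors])

print(n_flagged)
print(rbind(flagged = flagged_means, overall = overall_means))
```

A comparison table of this kind makes it easier to see which non-academic factors set the flagged group apart.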
LEVEL 4: Visualization of Dataset
Question 4.1: What does the histogram reveal about the distribution and shape of burnout scores among students?
library(ggplot2)
# after_stat(count) replaces the `..count..` notation deprecated in ggplot2 3.4.0
ggplot(data, aes(x = burnout_score, fill = after_stat(count))) +
  geom_histogram(bins = 30, color = "black") +
  labs(
    title = "Distribution of Burnout Scores",
    x = "Burnout Score",
    y = "Frequency"
  ) +
  theme_minimal()
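Interpretation The shape of the histogram determines the reading. A right-skewed histogram means most students have low to moderate burnout, with a few very high values. A left-skewed histogram means most students have high burnout, with only a few low values. A symmetric, bell-shaped histogram means most students cluster around a moderate value, and a flat (uniform) histogram means scores are spread evenly across all levels.
The summary statistics from Question 1.2 point to the right-skewed case: the median (1.50) is below the mean (1.78), the first quartile is 0.12, and the maximum is 10. Most students therefore sit near the low end of the scale, with a long tail of high-burnout cases.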
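The direction of skew can also be checked numerically. The sketch below computes a moment-based skewness coefficient on a made-up vector `x` (hypothetical scores with a long right tail); a positive value indicates right skew, a negative value left skew:

```r
# Hypothetical scores with a long right tail
x <- c(0, 0, 0, 1, 1, 2, 3, 10)

m <- mean(x)
s <- sqrt(mean((x - m)^2))          # population standard deviation

# Moment-based skewness: third central moment divided by s cubed
skewness <- mean((x - m)^3) / s^3

# In right-skewed data the mean usually lies above the median
mean_above_median <- m > median(x)

print(skewness)
print(mean_above_median)
```

Applying the same formula to data$burnout_score would give a single number to accompany the histogram.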
Question 4.2: How do burnout levels compare across different genders, and what does the boxplot reveal about variability and outliers?
library(ggplot2)
ggplot(data, aes(x = gender, y = burnout_score, fill = gender)) +
geom_boxplot() +
scale_fill_manual(values = c("#FFC0CB", "#BEBEBE", "#800000")) +
labs(
title = "Burnout Level by Gender",
x = "Gender",
y = "Burnout Score",
fill = "Gender"
) +
theme_minimal()
Interpretation The medians are similar, so burnout levels are almost the same across all genders. The boxes overlap substantially, so there is no major difference in the burnout distribution between genders. The spread is also similar, so variability in burnout is consistent across genders.
Overall, gender does not have a notable impact on burnout levels in this dataset: every gender shows a similar median burnout and similar variability.
Question 4.3: What relationship does the scatter plot show between study hours per day and burnout score among students?
# Ensure the library is loaded
library(ggplot2)
# Scatter plot of study hours against burnout score with a linear trend line
ggplot(data, aes(x = `study_hours_per_day`, y = `burnout_score`)) +
geom_point(color = "pink", alpha = 0.6) +
geom_smooth(method = "lm", se = TRUE, color = "black") +
labs(
title = "Study Hours vs Burnout Score",
x = "Study Hours per Day",
y = "Burnout Score"
) +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
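Interpretation The fitted line slopes upward: students who study more hours per day tend to report higher burnout. The relationship is moderate (r ≈ 0.34 in the correlation matrix of Question 5.1), and it describes an association rather than proof that study hours cause burnout.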
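The trend line drawn by geom_smooth(method = "lm") is an ordinary least-squares fit, and its slope is tied to the correlation by slope = r × sd(y) / sd(x). A sketch on two made-up vectors (hypothetical values) that checks this relationship:

```r
# Hypothetical paired observations
study <- c(2, 3, 4, 5, 6, 7, 8)
burnout <- c(0.5, 1.0, 0.8, 2.0, 1.5, 3.0, 2.6)

# Least-squares fit, the same model geom_smooth(method = "lm") draws
fit <- lm(burnout ~ study)
slope <- unname(coef(fit)["study"])

# The slope recovered from the correlation and the two standard deviations
r <- cor(study, burnout)
slope_from_r <- r * sd(burnout) / sd(study)

print(c(slope = slope, slope_from_r = slope_from_r))
```

The slope states how many burnout points the fitted line rises per additional study hour, which is often easier to interpret than the correlation itself.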
Question 4.4: How does the average stress level vary across different academic years, as shown in the bar chart?
data %>%
group_by(academic_year) %>%
summarise(avg_stress = mean(stress_level, na.rm = TRUE)) %>%
ggplot(aes(x = factor(academic_year), y = avg_stress, fill = factor(academic_year))) +
geom_bar(stat = "identity") +
geom_text(aes(label = round(avg_stress, 2)), vjust = -0.5) +
scale_fill_manual(values = c("pink","grey","maroon","beige")) +
labs(
title = "Average Stress Level by Academic Year",
x = "Academic Year",
y = "Average Stress Level",
fill = "Year"
) +
coord_cartesian(ylim = c(0, 10)) +
theme_minimal()
Interpretation The average stress level remains almost the same across
all academic years. The bar chart shows no visible variation in stress
levels from year 1 to year 4. This suggests that academic pressure is
consistent throughout all years. Overall, academic year does not appear
to have a strong impact on stress levels in this dataset.
Question 4.5: How does the relationship between sleep hours and burnout score differ across genders in the facet plot?
ggplot(data, aes(x = sleep_hours, y = burnout_score, color = gender)) +
geom_point() +
facet_wrap(~gender) +
scale_color_manual(values = c("Male" = "maroon", "Female" = "pink", "Other" ="grey"))
Interpretation The facet plot shows a similar pattern between sleep
hours and burnout across all genders. Data points are evenly spread, and
no gender shows a distinct trend or deviation. This indicates that the
relationship between sleep and burnout is consistent for all
groups. Overall, gender does not significantly alter how sleep impacts
burnout in this dataset.
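The claim that the sleep and burnout relationship is the same in every panel can be checked by computing the correlation separately for each gender. The sketch below uses simulated data; the `toy` data frame and the slope used to generate it are hypothetical.

```r
# Hypothetical data: the same negative sleep-burnout relationship for everyone.
set.seed(4)
n <- 3000
toy <- data.frame(
  gender = sample(c("Female", "Male", "Other"), n, replace = TRUE),
  sleep_hours = runif(n, 4, 9)
)
toy$burnout_score <- 5 - 0.4 * toy$sleep_hours + rnorm(n)

# Pearson correlation between sleep and burnout within each gender group.
cor_by_gender <- sapply(
  split(toy, toy$gender),
  function(d) cor(d$sleep_hours, d$burnout_score, use = "complete.obs")
)
print(round(cor_by_gender, 2))
```

If the three values are close to each other on the real data, that supports reading the facets as showing one shared relationship.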
LEVEL 5: Advanced Analysis
Question 5.1: What does the correlation matrix reveal about the relationships between stress level, sleep hours, study hours, and burnout score?
num_data <- data %>%
select(stress_level, sleep_hours, study_hours_per_day, burnout_score)
cor_matrix <- cor(num_data, use = "complete.obs", method = "pearson")
library(corrplot)
corrplot(cor_matrix,
method = "circle",
type = "upper",
diag = FALSE,
tl.col = "black",
addCoef.col = "black")
Interpretation Stress level has the strongest positive correlation with burnout, making it the strongest single predictor among these variables.
1. Strong positive correlation: Stress vs. Burnout (0.75)
2. Moderate positive correlation: Study Hours vs. Burnout (0.34)
3. Moderate negative correlation: Sleep vs. Burnout (-0.37)
4. No correlation: Sleep vs. Study Hours (0.00)
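A Pearson correlation can be squared to give the share of variance in burnout that a single variable accounts for in a simple linear fit. The short calculation below applies this to the coefficients reported above; the vector names are illustrative labels, not objects from the original script.

```r
# Correlations with burnout_score as reported in the correlation plot.
r <- c(
  stress_vs_burnout = 0.75,
  study_hours_vs_burnout = 0.34,
  sleep_vs_burnout = -0.37
)

# Squared correlation = proportion of variance shared in a
# one-predictor linear regression.
shared_variance <- r^2
print(round(100 * shared_variance, 1))
```

Stress shares a bit over half of the variance in burnout, while study hours and sleep each share a little more than a tenth.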
Question 5.2: Which students are identified as outliers in burnout scores, and what do these extreme values indicate?
# Load required libraries
library(dplyr)
library(knitr)
# Calculate IQR
Q1 <- quantile(data$burnout_score, 0.25, na.rm = TRUE)
Q3 <- quantile(data$burnout_score, 0.75, na.rm = TRUE)
IQR_val <- Q3 - Q1
# Identify outliers
outliers <- data %>%
filter(burnout_score < (Q1 - 1.5 * IQR_val) |
burnout_score > (Q3 + 1.5 * IQR_val))
# Show number of outliers
cat("Total Outliers:", nrow(outliers), "\n")
## Total Outliers: 3735
# Display a clean sample (10 rows, selected columns)
outliers %>%
select(age, gender, academic_year, burnout_score) %>%
sample_n(10) %>%
kable()
| age | gender | academic_year | burnout_score |
|---|---|---|---|
| 19 | Female | 2 | 7.761073 |
| 26 | Male | 1 | 7.276458 |
| 19 | Male | 2 | 7.183207 |
| 19 | Female | 4 | 7.103009 |
| 27 | Other | 4 | 7.256140 |
| 21 | Male | 1 | 7.676246 |
| 18 | Female | 4 | 7.812587 |
| 29 | Male | 2 | 7.853736 |
| 22 | Female | 1 | 7.748621 |
| 20 | Female | 3 | 7.553783 |
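Interpretation The IQR rule flags 3,735 students as burnout outliers, about 0.37% of the 1,000,000 records. Every row in the sample shown has a burnout score above 7, so the sampled outliers are students with unusually high burnout rather than unusually low burnout. More broadly, the analysis shows that stress level has the strongest association with burnout among students. Sleep also plays an important role, as less sleep is associated with higher burnout. Academic year and gender do not show significant differences in burnout levels. Overall, managing stress and maintaining proper sleep may help reduce burnout.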
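The IQR filter above keeps both low and high outliers in one data frame. To see which side the extreme values fall on, each value can be labelled relative to the two fences. This is a minimal sketch: `iqr_outlier_side()` is a hypothetical helper and `x` is simulated data, not the real burnout scores.

```r
# Label each value as "low", "high", or "inside" using the 1.5 * IQR rule.
iqr_outlier_side <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  fence <- 1.5 * (q[2] - q[1])
  side <- rep("inside", length(x))
  side[which(x < q[1] - fence)] <- "low"   # which() skips NA comparisons
  side[which(x > q[2] + fence)] <- "high"
  side[is.na(x)] <- NA
  side
}

# Hypothetical right-skewed scores standing in for data$burnout_score.
set.seed(5)
x <- rgamma(2000, shape = 2, rate = 1)

side <- iqr_outlier_side(x)
print(table(side, useNA = "ifany"))
```

For a right-skewed variable bounded at zero, the lower fence is often below zero, in which case every flagged value is a high outlier.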
Question 5.3: Is there a significant difference in burnout scores between genders? Use one-way ANOVA and interpret the results.
anova_gender <- aov(burnout_score ~ gender, data=data)
summary(anova_gender)
## Df Sum Sq Mean Sq F value Pr(>F)
## gender 2e+00 0 0.0292 0.011 0.99
## Residuals 1e+06 2769010 2.7690
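Interpretation
-The p-value = 0.99, which is much greater than 0.05
-The F-value = 0.011, which is very small (0.0292 is the mean square for gender, not the F-value)
-H₀ (Null Hypothesis): No difference in mean burnout scores between genders
-H₁ (Alternative Hypothesis): At least one gender has a different mean burnout score
-Since p-value > 0.05, we fail to reject H₀: there is no evidence that mean burnout differs between genders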
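The F value in an ANOVA table is the between-group mean square divided by the residual mean square, and the p-value is the upper tail of the F distribution at that value. The sketch below recomputes both from the numbers printed in the gender table. The residual degrees of freedom are an assumption (1,000,000 complete rows and 3 gender levels), since the table only prints them as `1e+06`.

```r
# Values read from the ANOVA table for gender.
ms_gender <- 0.0292     # Mean Sq for gender
ms_resid  <- 2.7690     # Mean Sq for residuals
df_gender <- 2

# Assumption: 1,000,000 complete rows and 3 gender levels.
df_resid <- 1e6 - 3

# F = between-group mean square / residual mean square.
f_value <- ms_gender / ms_resid

# p-value = probability of an F at least this large under H0.
p_value <- pf(f_value, df_gender, df_resid, lower.tail = FALSE)

cat("F value:", round(f_value, 3), "\n")
cat("p-value:", round(p_value, 2), "\n")
```

Both results round to the `F value` and `Pr(>F)` columns of the printed table.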
ANOVA (Does Burnout Differ by Academic Year?)
Question 5.4: Do burnout scores differ significantly across academic years? Perform a one-way ANOVA using academic_year as a factor and interpret the results.
anova_model <- aov(burnout_score ~ factor(academic_year), data=data)
summary(anova_model)
## Df Sum Sq Mean Sq F value Pr(>F)
## factor(academic_year) 3e+00 2 0.7688 0.278 0.842
## Residuals 1e+06 2769007 2.7690
Interpretation
-The p-value = 0.842, which is greater than 0.05
-The F-value = 0.278, which is small (0.7688 is the mean square for academic year, not the F-value)
Hypothesis Testing
-H₀ (Null Hypothesis): Mean burnout scores are the same across all academic years
-H₁ (Alternative Hypothesis): At least one academic year has a different mean burnout score
-Since p-value > 0.05, we fail to reject H₀: there is no evidence that mean burnout differs between academic years
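With a million rows, even tiny differences can become statistically significant, so an effect size is a useful companion to the p-value. Eta squared, the share of total variation explained by the grouping factor, is not part of the original analysis; the sketch below estimates it from the printed academic-year table. The factor's sum of squares is reconstructed as Df × Mean Sq because the table rounds it to 2.

```r
# Values read from the ANOVA table for academic year.
df_year  <- 3
ms_year  <- 0.7688
ss_resid <- 2769007

# The table prints Sum Sq rounded to 2; reconstruct it from Df * Mean Sq.
ss_year <- df_year * ms_year

# Eta squared: proportion of total variation attributable to the factor.
eta_sq <- ss_year / (ss_year + ss_resid)

cat("Eta squared:", signif(eta_sq, 3), "\n")
```

A value this close to zero means academic year accounts for essentially none of the variation in burnout, which matches the non-significant F test.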
Polynomial Regression (Study Hours)
Question 5.5: Fit a polynomial regression model to examine the relationship between study hours per day and burnout score. Use a quadratic model and interpret the results.
model_poly <- lm(burnout_score ~ poly(study_hours_per_day, 2), data=data)
summary(model_poly)
##
## Call:
## lm(formula = burnout_score ~ poly(study_hours_per_day, 2), data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.4846 -1.2352 -0.2856 1.0240 8.3844
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.784e+00 1.567e-03 1138.51 <2e-16 ***
## poly(study_hours_per_day, 2)1 5.577e+02 1.567e+00 355.88 <2e-16 ***
## poly(study_hours_per_day, 2)2 4.960e+01 1.567e+00 31.65 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.567 on 999997 degrees of freedom
## Multiple R-squared: 0.1132, Adjusted R-squared: 0.1132
## F-statistic: 6.383e+04 on 2 and 999997 DF, p-value: < 2.2e-16
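By default, `poly()` builds orthogonal polynomial terms, so the estimates above (for example 5.577e+02) are not changes in burnout per extra study hour, and the intercept equals the mean burnout score. Passing `raw = TRUE` gives coefficients on the original hours scale while producing exactly the same fitted curve. The sketch below illustrates this on simulated data; the `toy` data frame and its generating coefficients are hypothetical.

```r
# Hypothetical data with a mildly curved relationship.
set.seed(6)
toy <- data.frame(study_hours_per_day = runif(2000, 0, 10))
toy$burnout_score <- 0.5 + 0.2 * toy$study_hours_per_day +
  0.02 * toy$study_hours_per_day^2 + rnorm(2000)

# Orthogonal polynomial basis (the default used in the report).
fit_orth <- lm(burnout_score ~ poly(study_hours_per_day, 2), data = toy)

# Raw basis: coefficients apply to hours and hours^2 directly.
fit_raw <- lm(burnout_score ~ poly(study_hours_per_day, 2, raw = TRUE), data = toy)

print(coef(fit_orth))
print(coef(fit_raw))

# The two parameterisations describe the same fitted curve.
same_fit <- all.equal(unname(fitted(fit_orth)), unname(fitted(fit_raw)))
print(same_fit)
```

In the report's model, both polynomial terms are significant, but the R-squared of 0.1132 shows that study hours alone explain only about 11% of the variance in burnout.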
Question 5.6: Fit a multiple regression model to examine the effect of study hours per day on burnout score, including a quadratic (polynomial) term. Also include other relevant variables such as stress level and anxiety score. Interpret the results.
model_multi_poly <- lm(burnout_score ~ poly(study_hours_per_day, 2) + stress_level + anxiety_score, data=data)
summary(model_multi_poly)
##
## Call:
## lm(formula = burnout_score ~ poly(study_hours_per_day, 2) + stress_level +
## anxiety_score, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.3190 -0.7372 -0.0522 0.6847 5.6773
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.288276 0.003053 -421.95 <2e-16 ***
## poly(study_hours_per_day, 2)1 131.436103 1.131711 116.14 <2e-16 ***
## poly(study_hours_per_day, 2)2 48.787707 1.058854 46.08 <2e-16 ***
## stress_level 0.548116 0.001001 547.83 <2e-16 ***
## anxiety_score 0.249395 0.001081 230.74 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.059 on 999995 degrees of freedom
## Multiple R-squared: 0.5951, Adjusted R-squared: 0.5951
## F-statistic: 3.674e+05 on 4 and 999995 DF, p-value: < 2.2e-16
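Interpretation This is a multiple regression model because it includes several predictors. It is also a polynomial (quadratic) model because of the poly(study_hours_per_day, 2) term.
This means burnout is modeled as a function of:
- Study hours (linear + curved effect)
- Stress level
- Anxiety score
All terms are statistically significant (p < 2e-16). The model explains about 59.5% of the variance in burnout (R-squared = 0.5951), compared with about 11.3% for study hours alone in Question 5.5.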
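Whether adding stress and anxiety improves on the study-hours-only model can be tested directly, because the two models are nested. The sketch below compares them with `anova()` on simulated data; the `toy` data frame and its generating coefficients are hypothetical, and only the column names follow the dataset.

```r
# Hypothetical data in which stress and anxiety both drive burnout.
set.seed(7)
n <- 2000
toy <- data.frame(
  study_hours_per_day = runif(n, 0, 10),
  stress_level = runif(n, 0, 10)
)
toy$anxiety_score <- pmax(0.5 * toy$stress_level + rnorm(n), 0)
toy$burnout_score <- 0.1 * toy$study_hours_per_day +
  0.5 * toy$stress_level + 0.25 * toy$anxiety_score + rnorm(n)

# Smaller model: quadratic in study hours only.
fit_small <- lm(burnout_score ~ poly(study_hours_per_day, 2), data = toy)

# Larger model: same terms plus stress and anxiety.
fit_full <- update(fit_small, . ~ . + stress_level + anxiety_score)

# Compare explained variance and run the nested-model F test.
r2 <- c(
  small = summary(fit_small)$r.squared,
  full = summary(fit_full)$r.squared
)
print(round(r2, 3))

comparison <- anova(fit_small, fit_full)
print(comparison)
```

A small p-value in the second row of the `anova()` table indicates that the added predictors explain variation that study hours alone do not.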
Conclusion The analysis shows that burnout is mainly associated with stress levels and sleep patterns. Neither gender nor academic year significantly affects burnout (ANOVA p-values of 0.99 and 0.842). Burnout among students is therefore driven primarily by psychological and lifestyle factors such as stress, anxiety, study patterns, and sleep, rather than by demographic variables. The relationship between study hours and burnout is non-linear, and burnout is also linked to academic outcomes such as dropout risk. Overall, burnout is a multi-factor phenomenon requiring holistic management, and managing stress and improving sleep may help reduce it.