Introduction

- This project analyzes student mental health and burnout levels using a large dataset.
- The study explores how factors such as stress, sleep, and study hours influence burnout.

Objectives

- To analyze burnout levels among students
- To study the relationship between stress, sleep, and study hours
- To identify key factors affecting student mental health

Scope

- This project helps in understanding patterns of student stress and provides insights for improving mental well-being.

Load Necessary Libraries

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(corrplot)
## corrplot 0.95 loaded

Load Dataset

data <- read.csv("C:/Users/ASUS/OneDrive/Documents/Desktop/R Script/student_mental_health_burnout_1M.csv")

- The dataset is loaded for analysis.
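
The absolute path above only resolves on the author's machine. A minimal sketch of a more portable loading pattern follows; the folder `data_dir`, the file name `toy_students.csv`, and the toy rows are all hypothetical stand-ins for the real dataset.

```r
# Hypothetical setup: write a tiny stand-in CSV to a temporary folder.
data_dir <- tempdir()
csv_path <- file.path(data_dir, "toy_students.csv")
toy <- data.frame(age = c(19, 23),
                  gender = c("Female", "Male"),
                  burnout_score = c(1.2, 3.4))
write.csv(toy, csv_path, row.names = FALSE)

# Fail early with a clear error if the file is not where we expect it.
stopifnot(file.exists(csv_path))
loaded <- read.csv(csv_path, stringsAsFactors = FALSE)

# Confirm that the columns the analysis relies on are present.
required <- c("age", "gender", "burnout_score")
missing_cols <- setdiff(required, names(loaded))
print(missing_cols)
```

Building the path with file.path() keeps the script independent of one user's folder layout, and checking the column names up front surfaces naming mismatches before any analysis runs.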

LEVEL 1: Understanding the Data (Basic Exploration)

Question 1.1: What is the structure of the dataset (number of rows, columns, and data types)?

1. Structure of Dataset

• Types of variables (numeric, categorical)
• Whether data is usable for stats/ML

str(data)
## 'data.frame':    1000000 obs. of  20 variables:
##  $ age                 : int  23 20 29 27 24 29 21 23 26 19 ...
##  $ gender              : chr  "Male" "Male" "Male" "Male" ...
##  $ academic_year       : int  2 3 2 4 4 3 3 2 4 3 ...
##  $ study_hours_per_day : num  5.6 5.6 2.58 4.61 2.19 ...
##  $ exam_pressure       : num  6.49 5.63 6.02 6.68 4.01 ...
##  $ academic_performance: num  68.4 67.7 58.4 68.9 69.1 ...
##  $ stress_level        : num  4.117 0.349 3.476 6.779 1.855 ...
##  $ anxiety_score       : num  2.28 0 2.43 4.51 1.1 ...
##  $ depression_score    : num  1.987 0 0.852 4.286 0 ...
##  $ sleep_hours         : num  6.88 7.46 8.95 4.57 5.99 ...
##  $ physical_activity   : num  2.73 3.69 3.3 2.07 4.03 ...
##  $ social_support      : num  6.47 0 6.9 2.35 4.51 ...
##  $ screen_time         : num  4.99 3.86 5.43 6.3 4.9 ...
##  $ internet_usage      : num  4.98 5.14 3.06 6.93 5.13 ...
##  $ financial_stress    : num  3.45 2.81 4.92 6.92 4.38 ...
##  $ family_expectation  : num  3.59 5.48 6.07 6.56 5.93 ...
##  $ burnout_score       : num  2.04 0 0 7.23 0 ...
##  $ mental_health_index : num  7.07 9.86 7.63 4.65 8.93 ...
##  $ risk_level          : chr  "Low" "Low" "Low" "High" ...
##  $ dropout_risk        : num  1.747 0 0.697 5.381 0 ...
dim(data)
## [1] 1000000      20

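Interpretation

The dataset has 1,000,000 rows and 20 columns. Two columns are stored as character (gender, risk_level), two as integer (age, academic_year), and the remaining 16 as numeric (for example burnout_score, stress_level, sleep_hours). academic_year is an integer code for the year of study and can be treated as categorical. This mix of categorical and numerical variables makes the dataset suitable for statistical and predictive analysis.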
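
Categorical columns that are read as character or integer are often converted to factors before modelling or plotting. The sketch below shows one way to do that on a toy data frame; the rows are illustrative, and the "Medium" level and the Low < Medium < High ordering of risk_level are assumptions, since the output above only shows "Low" and "High".

```r
# Toy rows mirroring three columns of the dataset (values are illustrative).
toy <- data.frame(
  gender = c("Male", "Female", "Other", "Female"),
  academic_year = c(2L, 3L, 4L, 1L),
  risk_level = c("Low", "High", "Medium", "Low"),
  stringsAsFactors = FALSE
)

# Unordered factor for gender, fixed levels 1 to 4 for academic year.
toy$gender <- factor(toy$gender)
toy$academic_year <- factor(toy$academic_year, levels = 1:4)

# Assumed level names and ordering for risk_level.
toy$risk_level <- factor(toy$risk_level,
                         levels = c("Low", "Medium", "High"),
                         ordered = TRUE)

str(toy)
```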

Question 1.2: What do the summary statistics reveal about the average burnout, stress levels, and the range (minimum to maximum) of values in the dataset?

summary(data)
##       age        gender          academic_year   study_hours_per_day
##  Min.   :17   Length:1000000     Min.   :1.000   Min.   : 0.000     
##  1st Qu.:20   Class :character   1st Qu.:2.000   1st Qu.: 3.651     
##  Median :23   Mode  :character   Median :3.000   Median : 4.998     
##  Mean   :23                      Mean   :2.501   Mean   : 5.002     
##  3rd Qu.:26                      3rd Qu.:4.000   3rd Qu.: 6.346     
##  Max.   :29                      Max.   :4.000   Max.   :14.000     
##  exam_pressure    academic_performance  stress_level    anxiety_score   
##  Min.   : 1.000   Min.   :42.37        Min.   : 0.000   Min.   : 0.000  
##  1st Qu.: 4.945   1st Qu.:67.18        1st Qu.: 3.103   1st Qu.: 1.924  
##  Median : 5.999   Median :71.00        Median : 4.244   Median : 2.970  
##  Mean   : 5.999   Mean   :71.00        Mean   : 4.246   Mean   : 2.986  
##  3rd Qu.: 7.052   3rd Qu.:74.82        3rd Qu.: 5.385   3rd Qu.: 4.015  
##  Max.   :10.000   Max.   :97.25        Max.   :10.000   Max.   :10.000  
##  depression_score    sleep_hours     physical_activity social_support  
##  Min.   :0.000000   Min.   : 3.000   Min.   :0.000     Min.   : 0.000  
##  1st Qu.:0.005198   1st Qu.: 5.491   1st Qu.:1.991     1st Qu.: 3.650  
##  Median :1.047839   Median : 6.502   Median :3.001     Median : 4.999  
##  Mean   :1.274728   Mean   : 6.502   Mean   :3.011     Mean   : 5.000  
##  3rd Qu.:2.086397   3rd Qu.: 7.515   3rd Qu.:4.011     3rd Qu.: 6.350  
##  Max.   :8.530800   Max.   :10.000   Max.   :7.000     Max.   :10.000  
##   screen_time     internet_usage   financial_stress family_expectation
##  Min.   : 1.000   Min.   : 1.000   Min.   : 0.000   Min.   : 0.000    
##  1st Qu.: 3.651   1st Qu.: 3.491   1st Qu.: 3.657   1st Qu.: 4.647    
##  Median : 5.004   Median : 5.002   Median : 5.001   Median : 6.000    
##  Mean   : 5.019   Mean   : 5.038   Mean   : 5.003   Mean   : 5.983    
##  3rd Qu.: 6.351   3rd Qu.: 6.507   3rd Qu.: 6.355   3rd Qu.: 7.352    
##  Max.   :12.000   Max.   :14.000   Max.   :10.000   Max.   :10.000    
##  burnout_score     mental_health_index  risk_level         dropout_risk  
##  Min.   : 0.0000   Min.   : 1.310      Length:1000000     Min.   :0.000  
##  1st Qu.: 0.1248   1st Qu.: 6.142      Class :character   1st Qu.:0.000  
##  Median : 1.4965   Median : 7.074      Mode  :character   Median :1.010  
##  Mean   : 1.7841   Mean   : 7.023                         Mean   :1.325  
##  3rd Qu.: 2.8895   3rd Qu.: 7.962                         3rd Qu.:2.174  
##  Max.   :10.0000   Max.   :10.000                         Max.   :9.326

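Interpretation

Students are aged 17 to 29, with a mean age of 23. Burnout is low on average: burnout_score has a mean of 1.78 and a median of 1.50 on a scale that runs from 0 to 10. Stress is moderate, with a mean stress_level of 4.25 (range 0 to 10). Study hours, screen time, internet usage, social support, and financial stress all average close to 5. Academic performance averages about 71, with half of the students between 67.2 and 74.8, and sleep averages about 6.5 hours (range 3 to 10). The depression and burnout scores are right-skewed: their first quartiles are close to 0 while their maxima reach 8.5 and 10.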
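
When only a few columns matter, the relevant statistics can be pulled out directly instead of reading them off the full summary() table. A minimal sketch on a toy data frame (the values are illustrative, chosen so that burnout is right-skewed):

```r
# Toy values; a mean above the median is a quick sign of right skew.
toy <- data.frame(
  burnout_score = c(0, 0, 1.5, 2.9, 10),
  stress_level = c(0.3, 3.1, 4.2, 5.4, 10)
)

# One column of statistics per variable.
stats <- sapply(toy, function(x) {
  c(mean = mean(x), median = median(x), min = min(x), max = max(x))
})
print(round(stats, 2))
```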

Question 1.3: Are there any missing values in the dataset?

• Data quality issues

colSums(is.na(data))
##                  age               gender        academic_year 
##                    0                    0                    0 
##  study_hours_per_day        exam_pressure academic_performance 
##                    0                    0                    0 
##         stress_level        anxiety_score     depression_score 
##                    0                    0                    0 
##          sleep_hours    physical_activity       social_support 
##                    0                    0                    0 
##          screen_time       internet_usage     financial_stress 
##                    0                    0                    0 
##   family_expectation        burnout_score  mental_health_index 
##                    0                    0                    0 
##           risk_level         dropout_risk 
##                    0                    0

Interpretation

colSums(is.na(data)) counts the missing values in each column. Every count is 0, so the dataset contains no missing values.
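
is.na() only detects values that R already treats as missing; empty strings in text columns are not counted. The sketch below, on a toy data frame with deliberately injected gaps, shows both checks side by side.

```r
# Toy data with one NA in each numeric column and one blank string.
toy <- data.frame(
  sleep_hours = c(6.9, NA, 7.5),
  burnout_score = c(2.0, 0.0, NA),
  gender = c("Male", "", "Female"),
  stringsAsFactors = FALSE
)

# Missing values per column, as in the report.
na_counts <- colSums(is.na(toy))
print(na_counts)

# Blank strings per column; these are not NA and need a separate count.
blank_counts <- sapply(toy, function(x) sum(!is.na(x) & as.character(x) == ""))
print(blank_counts)
```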

Question 1.4: What is the average burnout level of students across all years?

• Overall mental health state

# Clean column names
names(data) <- tolower(trimws(names(data)))
names(data) <- gsub(" ", "_", names(data))

# Check column names first
print(names(data))
##  [1] "age"                  "gender"               "academic_year"       
##  [4] "study_hours_per_day"  "exam_pressure"        "academic_performance"
##  [7] "stress_level"         "anxiety_score"        "depression_score"    
## [10] "sleep_hours"          "physical_activity"    "social_support"      
## [13] "screen_time"          "internet_usage"       "financial_stress"    
## [16] "family_expectation"   "burnout_score"        "mental_health_index" 
## [19] "risk_level"           "dropout_risk"
# Convert columns safely to numeric
cols <- c("burnout_score", "stress_level", "study_hours_per_day", "sleep_hours")

for (col in cols) {
  if (col %in% names(data)) {
    data[[col]] <- as.numeric(as.character(data[[col]]))
  } else {
    cat("Column not found:", col, "\n")
  }
}
# Check structure
str(data)
## 'data.frame':    1000000 obs. of  20 variables:
##  $ age                 : int  23 20 29 27 24 29 21 23 26 19 ...
##  $ gender              : chr  "Male" "Male" "Male" "Male" ...
##  $ academic_year       : int  2 3 2 4 4 3 3 2 4 3 ...
##  $ study_hours_per_day : num  5.6 5.6 2.58 4.61 2.19 ...
##  $ exam_pressure       : num  6.49 5.63 6.02 6.68 4.01 ...
##  $ academic_performance: num  68.4 67.7 58.4 68.9 69.1 ...
##  $ stress_level        : num  4.117 0.349 3.476 6.779 1.855 ...
##  $ anxiety_score       : num  2.28 0 2.43 4.51 1.1 ...
##  $ depression_score    : num  1.987 0 0.852 4.286 0 ...
##  $ sleep_hours         : num  6.88 7.46 8.95 4.57 5.99 ...
##  $ physical_activity   : num  2.73 3.69 3.3 2.07 4.03 ...
##  $ social_support      : num  6.47 0 6.9 2.35 4.51 ...
##  $ screen_time         : num  4.99 3.86 5.43 6.3 4.9 ...
##  $ internet_usage      : num  4.98 5.14 3.06 6.93 5.13 ...
##  $ financial_stress    : num  3.45 2.81 4.92 6.92 4.38 ...
##  $ family_expectation  : num  3.59 5.48 6.07 6.56 5.93 ...
##  $ burnout_score       : num  2.04 0 0 7.23 0 ...
##  $ mental_health_index : num  7.07 9.86 7.63 4.65 8.93 ...
##  $ risk_level          : chr  "Low" "Low" "Low" "High" ...
##  $ dropout_risk        : num  1.747 0 0.697 5.381 0 ...

Interpretation

The column names were already lower-case with underscores, so the cleaning step leaves them unchanged. The columns selected for conversion were already numeric in the first str() output, so as.numeric() does not alter them and introduces no missing values (NA); the second str() output is identical to the first. The step remains a useful safeguard for cases where a column is read as text. The average burnout asked for in Question 1.4 is the mean burnout_score of 1.78 (on a 0 to 10 scale) reported by summary() above.
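
Question 1.4 asks for the average burnout level, which summary() reports as a mean burnout_score of 1.7841. A minimal sketch of computing it directly, overall and per academic year, on a toy data frame (the values are illustrative; on the real data the first call would be mean(data$burnout_score, na.rm = TRUE)):

```r
# Toy data: six students across four academic years.
toy <- data.frame(
  academic_year = c(1L, 1L, 2L, 2L, 3L, 4L),
  burnout_score = c(2.0, 0.0, 1.0, 3.0, 7.0, 5.0)
)

# Overall average burnout.
overall <- mean(toy$burnout_score, na.rm = TRUE)
print(overall)

# Average burnout within each academic year.
by_year <- tapply(toy$burnout_score, toy$academic_year, mean, na.rm = TRUE)
print(by_year)
```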

Question 1.5: How is the gender distribution represented in the dataset, and does it indicate a balanced dataset? Use the frequency table of the gender variable to support your answer.

data$gender <- trimws(as.character(data$gender))
data$gender[data$gender == ""] <- NA

table(data$gender, useNA = "ifany")
## 
## Female   Male  Other 
## 480070 479643  40287

Interpretation

Female (480,070; 48.0%) and male (479,643; 48.0%) students are almost equally represented, with female students ahead by only 427 records. A much smaller group identifies as "Other" (40,287; 4.0%). The dataset is therefore balanced between female and male students, while comparisons involving the "Other" group rest on a far smaller, though still large, sample.
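
The shares can be derived from the frequency table itself. This sketch re-enters the three counts printed above as a named vector (rather than recomputing them from the data) and converts them to percentages with prop.table().

```r
# Counts copied from the table(data$gender) output above.
counts <- c(Female = 480070, Male = 479643, Other = 40287)

# Percentage share of each group, rounded to two decimals.
shares <- round(100 * prop.table(counts), 2)
print(shares)
```

On the real data the same result would come from prop.table(table(data$gender)).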

Question 1.6: What is the average number of study hours per day among students in the dataset, and how does handling missing values (NA) affect the calculation?

mean(data$study_hours_per_day, na.rm = TRUE)
## [1] 5.001727

Interpretation

On average, students study about 5.0 hours per day (mean 5.001727), which suggests a moderate level of academic engagement. na.rm = TRUE tells mean() to drop missing values before averaging; without it, a single NA would make the result NA. Because study_hours_per_day has no missing values (see Question 1.3), the argument does not change the result here.
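
The effect of na.rm is easiest to see on a small vector. The values below are illustrative and not taken from the dataset.

```r
# Three study-hour values, one of them missing.
hours <- c(4, 6, NA)

# Without na.rm the missing value propagates to the result.
with_na <- mean(hours)
print(with_na)

# With na.rm = TRUE the mean is taken over the two observed values.
without_na <- mean(hours, na.rm = TRUE)
print(without_na)
```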

LEVEL 2: Data Manipulation (DPLYR ANALYSIS)

Question 2.1: What are the average burnout levels and average study hours for each gender group in the dataset?

library(dplyr)

# Check names
names(data)
##  [1] "age"                  "gender"               "academic_year"       
##  [4] "study_hours_per_day"  "exam_pressure"        "academic_performance"
##  [7] "stress_level"         "anxiety_score"        "depression_score"    
## [10] "sleep_hours"          "physical_activity"    "social_support"      
## [13] "screen_time"          "internet_usage"       "financial_stress"    
## [16] "family_expectation"   "burnout_score"        "mental_health_index" 
## [19] "risk_level"           "dropout_risk"
# Refer to columns by name so the summaries respect the grouping
data %>%
  group_by(gender) %>%
  summarise(
    avg_burnout = mean(burnout_score, na.rm = TRUE),
    avg_study = mean(study_hours_per_day, na.rm = TRUE)
  )

Interpretation

The code groups students by gender and computes the mean burnout_score and mean study_hours_per_day within each group. Columns are referenced by name rather than by position: column 6 is academic_performance, not burnout_score, and an expression such as mean(data[[6]]) inside summarise() uses the whole data frame and ignores the grouping. Positional indexing therefore returns the same overall means for Female, Male, and Other (71.0 for academic performance and 5.00 for study hours) instead of per-gender burnout. With the grouped summary, the per-gender means can be compared with the overall means (1.78 for burnout, 5.00 for study hours) to judge whether burnout or study time differs by gender.
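
The same per-group means can be cross-checked without dplyr. The sketch below uses base R's aggregate() on a toy data frame; the rows are illustrative and only the column names follow the dataset.

```r
# Toy data with clearly different groups so the grouping is visible.
toy <- data.frame(
  gender = c("Female", "Female", "Male", "Male", "Other"),
  study_hours_per_day = c(4, 6, 5, 7, 3),
  burnout_score = c(1, 3, 2, 6, 0)
)

# Mean of both measures within each gender, referenced by column name.
by_gender <- aggregate(
  cbind(burnout_score, study_hours_per_day) ~ gender,
  data = toy, FUN = mean
)
print(by_gender)
```

If every group shows the same value in a summary like this on real data, it is worth checking whether the summary expression actually uses the grouped columns.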

Question 2.2: How are students distributed across different stress categories (Low, Medium, High) based on their stress levels?

data <- data %>%
  mutate(stress_category = case_when(
    stress_level <= 3 ~ "Low",
    stress_level <= 7 ~ "Medium",
    TRUE ~ "High"
  ))

# Show result
table(data$stress_category)
## 
##   High    Low Medium 
##  51136 231184 717680

Interpretation

Most students fall in the medium stress category (717,680; 71.8%), so moderate stress is the most common state. About a quarter are in the low category (231,184; 23.1%). The high stress category is the smallest (51,136; 5.1%) but still represents more than fifty thousand students who may need attention.
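
table() lists the categories alphabetically (High, Low, Medium) because stress_category is a character column. As an alternative sketch, base R's cut() applies the same thresholds and returns a factor whose levels stay in Low, Medium, High order; the stress values here are illustrative.

```r
# Illustrative stress levels, including the boundary values 3 and 7.
stress_level <- c(0.5, 3, 3.1, 7, 7.2, 10)

# Intervals are right-closed by default: (-Inf, 3], (3, 7], (7, Inf).
stress_category <- cut(stress_level,
                       breaks = c(-Inf, 3, 7, Inf),
                       labels = c("Low", "Medium", "High"))

print(table(stress_category))
```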

Question 2.3: Which age group (Teen, Young Adult, Adult) experiences the highest average burnout level?

library(dplyr)

data %>%
  mutate(age_group = case_when(
    age < 20 ~ "Teen",
    age <= 25 ~ "Young Adult",
    TRUE ~ "Adult"
  )) %>%
  group_by(age_group) %>%
  summarise(avg_burnout = mean(burnout_score, na.rm = TRUE))
## # A tibble: 3 × 2
##   age_group   avg_burnout
##   <chr>             <dbl>
## 1 Adult              1.78
## 2 Teen               1.79
## 3 Young Adult        1.78

Interpretation

Average burnout is nearly identical across age groups. Teens (ages 17 to 19) are marginally highest at 1.788, followed by Young Adults (20 to 25) at 1.783 and Adults (26 to 29) at 1.782. A spread of about 0.006 on a 0 to 10 scale is negligible, so age group does not appear to be meaningfully related to burnout in this dataset.

Question 2.4: Who are the top 5 students with the highest burnout scores in the dataset, and why might they require immediate attention or intervention?

data %>%
  arrange(desc(burnout_score)) %>%
  head(5)
##   age gender academic_year study_hours_per_day exam_pressure
## 1  20 Female             2            7.434886      8.468656
## 2  18 Female             2            9.681102     10.000000
## 3  25   Male             3            8.121920      8.118053
## 4  25 Female             1            7.120579      9.410431
## 5  28 Female             2            8.337083      9.669194
##   academic_performance stress_level anxiety_score depression_score sleep_hours
## 1             67.36289     7.049948      6.296077         3.728402    4.178584
## 2             80.92194    10.000000      7.828319         6.151836    4.000456
## 3             68.15252     9.758479      7.578385         4.264993    3.000000
## 4             68.14285     8.746559      6.363234         5.986526    3.075012
## 5             69.79236    10.000000      8.432348         5.245182    6.880856
##   physical_activity social_support screen_time internet_usage financial_stress
## 1          3.936872       3.902406    5.996620       6.973373         4.547513
## 2          1.759483       1.753121    4.214899       3.982958         8.674813
## 3          2.142500       5.845011    5.894929       6.432793         6.603594
## 4          3.188768       5.843709    5.991998       6.539810         9.787591
## 5          2.421136       7.555843    3.328305       4.280713         9.463255
##   family_expectation burnout_score mental_health_index risk_level dropout_risk
## 1           7.388527            10            4.172677       High     4.357292
## 2           7.403697            10            1.805954       High     6.712860
## 3           5.513622            10            2.543595       High     6.029545
## 4           8.037100            10            2.796448       High     7.141450
## 5           5.007192            10            1.896741       High     5.782103
##   stress_category
## 1            High
## 2            High
## 3            High
## 4            High
## 5            High

Interpretation

The five rows returned all have a burnout_score of 10, the maximum of the scale, so they are tied and the order among them is arbitrary; other students with a score of 10 may exist beyond these five. These students combine high stress (7.0 to 10), low mental_health_index values (1.8 to 4.2), and a "High" risk_level, and four of the five sleep less than 4.2 hours. This profile points to serious mental and academic pressure, so such students may require immediate attention and intervention.
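
Because the score is capped at 10, head(5) can cut through a group of tied students. A minimal sketch of counting everyone at the maximum instead, on a toy data frame (the `id` column and all values are hypothetical):

```r
# Toy data: five of the eight students sit at the cap of 10.
toy <- data.frame(
  id = 1:8,
  burnout_score = c(10, 3.2, 10, 7.5, 10, 10, 0, 10)
)

# Select every row that shares the highest score, not just the first five.
max_score <- max(toy$burnout_score, na.rm = TRUE)
at_max <- toy[toy$burnout_score == max_score, ]

n_at_max <- nrow(at_max)
print(n_at_max)
```

Reporting the number of students at the maximum gives a more stable picture than listing an arbitrary five of them.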

LEVEL 3: Correlation & Relationship Analysis

Question 3.1: What is the relationship between sleep hours and burnout score among students, and how does sleep affect burnout levels?

cor(
  as.numeric(data$sleep_hours),
  as.numeric(data$burnout_score),
  use = "complete.obs"
)
## [1] -0.3713859

Interpretation

The correlation between sleep hours and burnout score is -0.37, a moderate negative linear relationship: students who sleep less tend to report higher burnout. Correlation alone does not show that short sleep causes burnout, but the result is consistent with adequate sleep being associated with lower burnout among students.
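
Squaring the coefficient gives the share of variance in one variable that a linear relationship with the other accounts for. The sketch below applies this to the value reported above, and then shows cor.test() on two short illustrative vectors (not taken from the dataset) as one way to obtain a confidence interval alongside the estimate.

```r
# Correlation reported above for sleep_hours and burnout_score.
r <- -0.3713859
r_squared <- r^2
print(round(r_squared, 4))

# Illustrative vectors only; they are not rows from the dataset.
sleep <- c(4, 5, 6, 7, 8, 9)
burnout <- c(7, 6.5, 4, 4.5, 2, 1)

# Pearson correlation with a test and 95% confidence interval.
test <- cor.test(sleep, burnout)
print(test$estimate)
print(test$conf.int)
```

An r of -0.37 corresponds to roughly 14% of shared variance, so most of the variation in burnout is associated with factors other than sleep.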

Question 3.2: How do average study hours vary across different academic years, and what does this indicate about academic pressure?

data %>%
  group_by(academic_year) %>%
  summarise(avg_study = mean(study_hours_per_day, na.rm = TRUE))
## # A tibble: 4 × 2
##   academic_year avg_study
##           <int>     <dbl>
## 1             1      5.00
## 2             2      5.00
## 3             3      5.00
## 4             4      5.00

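Interpretation

Average study hours are the same in all four academic years at the displayed precision (5.00 hours per day). Any differences between years are smaller than the rounding of the output, so these results give no indication that study load, as a proxy for academic pressure, changes from year 1 to year 4.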
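
Tibbles print about three significant digits by default, which can hide small differences between groups. The sketch below uses toy values (illustrative only) to show two group means that look identical at two decimals and differ at four.

```r
# Toy data: the two years differ only in the third decimal place.
toy <- data.frame(
  academic_year = c(1L, 1L, 2L, 2L),
  study_hours_per_day = c(5.0010, 4.9990, 5.0030, 5.0010)
)

by_year <- tapply(toy$study_hours_per_day, toy$academic_year, mean)

# Rounded to two decimals the groups look the same.
print(round(by_year, 2))

# Four decimals reveal the small difference.
print(round(by_year, 4))
```

With dplyr output, converting the result with as.data.frame() or raising options(pillar.sigfig = ...) shows more digits.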

Question 3.3: Which students have high burnout scores but low study hours, and what might this indicate about their condition?

data %>%
  filter(burnout_score > 7 & study_hours_per_day < 3)
##    age gender academic_year study_hours_per_day exam_pressure
## 1   22   Male             3           2.7664645      4.589180
## 2   18   Male             2           1.5099779      3.658734
## 3   28 Female             1           2.2865946      5.883640
## 4   24 Female             2           2.6439506      5.047968
## 5   23  Other             4           2.4727515      6.122360
## 6   19   Male             2           2.2320142      5.666139
## 7   24   Male             1           2.9242063      6.054595
## 8   29 Female             2           2.2137185      6.074293
## 9   28 Female             2           2.7478305      4.964551
## 10  23 Female             3           2.8147511      6.864406
## 11  29 Female             3           2.5518931      5.103456
## 12  17   Male             4           2.1264212      6.398525
## 13  23 Female             4           1.4118082      6.592856
## 14  26   Male             2           2.7656726      5.872417
## 15  18   Male             2           1.9291570      3.761876
## 16  28   Male             1           2.7460615      6.137397
## 17  25 Female             1           1.5989425      3.878800
## 18  29   Male             1           2.4197032      5.128629
## 19  23   Male             2           2.2110535      7.018914
## 20  25 Female             3           1.6642838      5.536171
## 21  25   Male             1           2.6455247      6.215714
## 22  17 Female             1           2.0272267      3.364869
## 23  21   Male             2           2.3853429      6.533005
## 24  25 Female             1           1.1234241      3.880244
## 25  18   Male             2           2.8468938      6.103351
## 26  28   Male             2           2.0010709      5.885879
## 27  27 Female             4           0.0000000      3.414866
## 28  24   Male             2           2.1998777      5.717227
## 29  18   Male             4           2.5044336      4.291511
## 30  27 Female             1           1.0519486      5.065720
## 31  17 Female             3           2.3701006      5.225254
## 32  21 Female             3           2.3450568      6.988845
## 33  28   Male             3           2.8687208      5.137485
## 34  25 Female             3           2.9889178      5.432657
## 35  24 Female             4           1.7190737      5.408952
## 36  26 Female             4           2.8680856      5.136847
## 37  23 Female             4           2.0863010      7.029295
## 38  19 Female             3           2.5751931      7.056468
## 39  21 Female             2           2.0748494      6.722612
## 40  29 Female             1           2.8921087      4.518733
## 41  28   Male             2           2.7795971      5.409114
## 42  18 Female             3           2.7143694      6.177466
## 43  20 Female             4           0.7158414      4.664664
## 44  20 Female             4           2.7629412      4.951928
## 45  24 Female             2           1.8747637      4.463754
## 46  21 Female             4           1.8282868      7.544150
## 47  17 Female             4           2.4963502      4.926036
## 48  27 Female             2           2.9513445      7.071886
## 49  25   Male             4           1.7559504      6.008640
## 50  25   Male             1           2.6755065      2.835343
## 51  21   Male             3           2.8764745      4.841230
## 52  18   Male             4           2.7957252      4.746295
## 53  28   Male             2           2.2167873      4.804203
## 54  28   Male             3           2.2616998      6.373028
## 55  18   Male             1           2.4482749      4.039813
## 56  29  Other             1           2.5114051      4.305329
## 57  29   Male             3           2.5927587      5.470537
## 58  18 Female             2           2.4118345      5.396160
## 59  27 Female             1           1.4775475      5.841774
## 60  25   Male             2           2.5713823      5.531697
## 61  28 Female             1           2.1247682      6.359655
## 62  22   Male             1           2.7490652      6.756891
## 63  23 Female             3           2.7611667      5.113206
## 64  19 Female             1           2.3680369      5.020049
## 65  19  Other             3           2.9141171      5.295781
## 66  23 Female             1           2.9555685      9.038181
##    academic_performance stress_level anxiety_score depression_score sleep_hours
## 1              71.57590     6.922059      7.406937        4.2759735    4.593005
## 2              69.22643     6.304914      5.677457        3.6388208    4.423357
## 3              65.03467     7.394936      4.813425        2.4239676    4.721513
## 4              72.11181     7.061245      5.285346        2.8491080    4.532336
## 5              67.14357     6.279518      4.631244        2.0216005    3.682621
## 6              70.03174     7.505642      5.108300        3.3564608    5.228673
## 7              70.51512     7.949085      7.517577        3.6691516    5.733907
## 8              61.69167     7.571588      7.032356        5.6869317    4.599037
## 9              70.75603     7.574985      6.347010        2.1660904    4.916221
## 10             60.40191     7.422617      6.023354        3.9896969    6.224178
## 11             65.28942     7.038465      5.241751        3.4138526    4.079692
## 12             66.88587     7.936794      5.965406        5.7927066    4.950533
## 13             64.74485     7.698999      7.080956        4.4947866    4.232068
## 14             78.91947     6.797810      6.327349        0.9438719    3.654157
## 15             69.14069     8.836854      5.236125        5.3699508    3.408973
## 16             55.43211     8.024838      5.993706        2.9623898    4.498520
## 17             64.26265     6.359579      6.053508        4.2275751    4.748724
## 18             63.50625     7.959417      6.146259        3.2848985    3.040584
## 19             67.96039     9.208591      7.001844        6.6374980    3.000000
## 20             66.31495     7.642988      5.550248        3.6833407    3.000000
## 21             68.30919     7.625498      6.738303        3.2340446    5.720183
## 22             67.95530     8.159750      4.922105        3.9728196    3.000000
## 23             62.31588     7.490747      6.591655        4.8985077    4.991149
## 24             66.06619     5.632942      4.116459        4.5777974    4.583127
## 25             66.17318     9.520307      5.485226        5.3531674    5.555849
## 26             69.19192     8.013388      6.041670        4.2237194    3.000000
## 27             67.89344     6.763818      7.276278        2.1717787    4.983623
## 28             68.78551     6.950256      5.921711        3.4205109    5.270893
## 29             58.17739     7.494285      7.194584        4.6919835    4.318293
## 30             68.00649     7.657069      3.643033        2.9354313    4.337110
## 31             66.20823     7.513840      5.922412        3.3117961    3.985555
## 32             64.11797     8.352533      3.745261        5.2134686    5.669982
## 33             74.51645     6.828379      2.958630        3.8191222    3.000000
## 34             63.63387     5.446310      3.927282        4.7968634    3.000000
## 35             71.14018     8.732641      5.368194        3.9762405    3.196513
## 36             64.58110     8.020590      5.209913        3.7525271    5.082960
## 37             62.37717     6.836717      7.671908        4.7859558    4.034876
## 38             59.20799     9.217629      6.382577        4.0274326    5.685083
## 39             62.32604     7.869867      6.987635        3.1079156    7.402199
## 40             68.72612     7.223438      5.410664        3.6085355    3.838692
## 41             61.42479     8.530603      7.685144        5.7036960    3.000000
## 42             67.28202     8.179641      4.791468        5.1109147    4.065765
## 43             68.31523     6.840585      6.821359        2.0468791    4.832804
## 44             64.75226     6.989046      5.213940        3.6890339    5.209715
## 45             67.23902     7.948310      6.692596        3.1396253    4.372022
## 46             60.69527     8.428624      6.709301        4.6462531    4.588784
## 47             72.87932     8.020934      5.232859        3.7129066    4.263608
## 48             64.16690     8.386509      6.206810        5.3674092    4.219032
## 49             72.00274     6.727341      6.068890        5.8391824    3.779148
## 50             56.57284     8.750017      8.404884        2.8628004    4.529202
## 51             67.98732     7.233725      5.624963        6.6123312    4.788086
## 52             67.12514     7.098860      6.595889        5.1114656    4.076269
## 53             63.56801     8.938628      4.755021        5.6645791    3.149180
## 54             69.72228     6.553573      4.843606        3.3465111    3.429240
## 55             61.27774     9.255991      5.464613        3.4307751    5.268390
## 56             76.99408     7.442230      4.457039        3.4386856    4.430949
## 57             65.71962     8.918114      6.490950        5.2753261    3.000000
## 58             60.60082     7.600435      3.430083        3.2553267    4.344867
## 59             57.35242     7.390835      7.039568        1.7249606    3.285594
## 60             71.75165     9.175739      6.998808        3.7248155    4.742595
## 61             72.14155     5.936867      3.228154        4.4232439    4.520384
## 62             66.33420     6.660016      6.683599        4.1540865    4.341100
## 63             59.85959     8.868856      7.917835        3.9585520    3.000000
## 64             71.23603     7.016851      6.142979        4.1054875    3.000000
## 65             67.28902     7.976703      7.613253        5.4928838    4.414605
## 66             71.40459     7.636825      6.686762        2.1233046    6.174883
##    physical_activity social_support screen_time internet_usage financial_stress
## 1          3.4343936     3.24916935    4.804843       2.301280         4.879540
## 2          2.6098057     3.71844261    4.155154       3.994222         6.612315
## 3          1.9645597     6.49519016    5.694452       6.878628         8.442311
## 4          2.5832086     1.48495422    5.501220       4.613716         7.825519
## 5          2.9523048     3.56364847    4.155639       5.642483         6.571124
## 6          1.3089149     1.55499931    4.266509       3.000125         8.222645
## 7          2.4861469     4.18214747    6.577716       6.878524         9.738804
## 8          1.8881351     3.14119703    5.233493       6.471028         6.451674
## 9          2.9696729     7.21500108    8.813883       8.285656         6.105695
## 10         4.3913636     3.13747669    3.889238       6.324845         5.410617
## 11         4.2775682     3.84988953    8.251074       6.821011         8.891268
## 12         2.9566299     4.64597169    2.897736       4.141079         9.922092
## 13         3.2415985     2.72468644    6.074900       5.523304         4.185518
## 14         2.8206283     7.49821528    6.558042       4.413671         7.049005
## 15         4.4644720     0.76572761    8.144644       8.684905         9.603963
## 16         1.5329623     1.95446110    3.891875       2.602208        10.000000
## 17         3.4011023     0.03826296    5.110468       5.317680         9.791090
## 18         3.7610110     6.10436677    1.297869       1.263988        10.000000
## 19         3.0754660     3.43494225    3.729528       4.500015         8.118391
## 20         3.4213346     3.83679112    4.461302       4.059364         5.510529
## 21         1.9360298     1.75086806    7.920460       8.824320         5.697992
## 22         1.4789400     1.54287694    9.154598       9.958066         4.801118
## 23         4.4111376     2.04715309    7.916656       7.815029         9.272656
## 24         3.3975009     1.13174920    6.196575       5.288491         7.841112
## 25         2.5368213     1.67646422    3.110961       3.014175         6.484735
## 26         1.7008934     1.98815224    5.139895       5.867809         7.173008
## 27         4.5649267     3.34828401    5.834293       6.209893         9.381991
## 28         3.3288893     2.22148703    4.766632       5.595867         8.353537
## 29         1.5083700     4.54132262    5.992936       6.339860         8.447883
## 30         2.5642054     6.76280211    3.284578       2.379128         7.226893
## 31         0.7611050     0.18992677    1.000000       3.169534         5.895410
## 32         2.2338278     3.89047944    4.103220       3.656163         9.007804
## 33         0.0000000     5.48227881    6.423818       4.357607         5.971805
## 34         3.7503060     0.00000000    8.448276       8.774205         5.030046
## 35         1.3001598     4.83014136    3.856916       2.897423         6.785889
## 36         4.5310641     5.82088262    4.336431       3.595190        10.000000
## 37         0.2324391     2.94098839    5.711533       5.509197         4.177808
## 38         1.9362846     5.73952743    3.835354       5.226144         9.305125
## 39         2.1495439     0.62433359    4.528427       4.753104         6.567349
## 40         0.0000000     3.42493270    5.322838       5.702844         7.544919
## 41         1.6286432     2.07200017    4.345357       3.364072         7.302465
## 42         1.6029521     3.46563955    1.000000       3.253127         6.194253
## 43         1.7074346     2.23381472    7.264806       6.886336         6.897715
## 44         3.7403438     4.49907329    1.218415       1.000000         7.205770
## 45         1.3921702     5.77737619    9.427271       8.909941         8.024570
## 46         4.8364664     3.90177391    3.017938       3.388576         8.140058
## 47         2.2155832     2.43266927    7.645265       7.902305         8.383564
## 48         3.3194116     2.37410627    6.602973       7.291509         9.615357
## 49         1.8366570     0.00000000    7.926914       7.542428         8.096157
## 50         2.7598109     4.98648155    4.454584       3.213918         9.641538
## 51         4.3664039     0.00000000    5.022328       5.456036         7.478615
## 52         3.3067997     0.00000000    6.001435       7.597422         9.717045
## 53         1.1859158     5.32549780    3.652873       4.673192         8.506458
## 54         4.2948539     2.76861644    1.000000       1.000000         4.411466
## 55         4.3821131     4.41014686    4.174971       3.694146         8.921134
## 56         0.9657283     5.24838869    6.074591       5.797221         7.313866
## 57         0.4118288     3.84577579    3.693823       3.199683         5.611038
## 58         4.0468884     2.18848235    6.921208       6.159106         8.129003
## 59         3.4513113     2.45367621    6.776569       7.219208         4.685655
## 60         0.0000000     3.80218905    8.761642       8.444514         9.941159
## 61         2.8505870     3.02570486    2.462998       3.024297         4.410332
## 62         1.1168242     4.26053406    5.931941       4.278811         6.999807
## 63         2.9520053     3.66902632    3.859136       2.283165         8.535073
## 64         5.7260661     3.61170974    3.558765       2.548766         2.921604
## 65         0.4440304     2.35967905    5.070539       4.839424         6.264243
## 66         0.9625280     4.17320902    1.000000       1.422415        10.000000
##    family_expectation burnout_score mental_health_index risk_level dropout_risk
## 1            9.567179      7.666109            3.726303       High     5.761763
## 2            7.863416      7.008120            4.683151       High     4.844499
## 3            6.354101      7.392882            4.870808       High     3.256869
## 4            9.249084      7.058138            4.735166       High     5.176573
## 5            7.420730      7.441946            5.492339       High     3.372702
## 6            7.927364      7.461183            4.458315       High     7.117629
## 7            8.681005      7.506648            3.464347       High     6.487214
## 8            7.126718      7.099915            3.155579       High     6.194144
## 9            4.452725      7.068842            4.416076       High     2.700096
## 10           8.540742      7.013876            4.027038       High     3.648274
## 11           8.282033      7.154120            4.587933       High     3.053865
## 12           7.289254      7.740943            3.297849       High     6.006228
## 13           7.259658      7.318910            3.447678       High     4.393201
## 14           8.411513      7.084349            5.099510       High     2.111860
## 15           9.657895      7.491382            3.283436       High     8.052678
## 16           7.590593      7.267681            4.103236       High     4.875692
## 17           5.076589      7.138798            4.371843       High     8.124340
## 18           5.330413      7.126607            3.986886       High     5.007213
## 19           6.397704      7.344058            2.224761       High     6.579348
## 20           8.478760      7.297735            4.172728       High     4.367250
## 21           9.556074      7.070798            3.958097       High     4.979348
## 22           6.514850      7.208484            4.067623       High     4.989010
## 23           6.108870      7.265342            3.556652       High     6.419573
## 24           7.079694      7.025325            5.138546       High     4.818197
## 25           9.676154      7.351566            2.940359       High     5.034966
## 26           4.030313      8.319738            3.715028       High     4.668724
## 27           6.675785      7.412974            4.460056       High     3.660456
## 28           7.773715      7.007319            4.417231       High     4.227592
## 29           5.096599      7.394965            3.436316       High     6.336516
## 30           5.905136      7.197382            4.963633       High     3.336655
## 31           7.766006      7.573999            4.224202       High     5.964985
## 32           5.293567      7.708743            3.971368       High     5.285188
## 33           6.735988      7.065686            5.235323       High     4.642341
## 34           7.059510      7.020820            5.204233       High     4.510307
## 35           5.460258      7.286844            3.703613       High     4.804202
## 36           7.619545      7.340471            4.103032       High     7.157225
## 37           4.039251      7.961196            3.527954       High     5.841169
## 38           7.452516      7.246279            3.189946       High     4.828537
## 39           7.643617      7.499554            3.823388       High     4.958537
## 40           7.845402      7.405841            4.404865       High     6.244578
## 41           7.803323      8.274104            2.571107       High     6.756365
## 42           6.461382      7.172892            3.757429       High     6.265660
## 43           3.816223      7.213799            4.603294       High     4.954871
## 44           9.426228      7.076345            4.533489       High     3.599777
## 45           8.181630      7.351463            3.871009       High     3.941323
## 46           9.563540      7.415768            3.221884       High     4.656346
## 47           5.555241      7.386210            4.107897       High     3.444455
## 48           7.326211      8.404778            3.173131       High     6.852744
## 49           8.994406      7.593673            3.736642       High     6.969048
## 50           9.513563      8.312329            3.119688       High     3.329640
## 51           7.976977      7.160737            3.435322       High     5.500402
## 52           6.898720      7.542005            3.648249       High     7.354499
## 53           9.043106      9.424525            3.298669       High     5.832988
## 54           8.112579      8.051759            4.921536       High     4.775516
## 55           8.992554      7.067088            3.628987       High     2.986024
## 56           6.625903      7.244487            4.654391       High     5.757050
## 57           8.987222      7.516295            2.902871       High     5.584604
## 58           8.723108      7.128011            4.954203       High     5.345439
## 59          10.000000      8.003765            4.414307       High     2.397678
## 60           9.075040      7.848760            3.112618       High     6.350650
## 61           6.530127      7.030023            5.329834       High     3.835598
## 62           3.793718      7.297613            4.084688       High     4.803495
## 63           8.355916      7.627317            2.889542       High     6.411738
## 64           9.778943      7.411784            4.118719       High     4.765548
## 65          10.000000      7.414707            2.877478       High     4.257310
## 66           6.256730      7.118447            4.302250       High     3.387377
##    stress_category
## 1           Medium
## 2           Medium
## 3             High
## 4             High
## 5           Medium
## 6             High
## 7             High
## 8             High
## 9             High
## 10            High
## 11            High
## 12            High
## 13            High
## 14          Medium
## 15            High
## 16            High
## 17          Medium
## 18            High
## 19            High
## 20            High
## 21            High
## 22            High
## 23            High
## 24          Medium
## 25            High
## 26            High
## 27          Medium
## 28          Medium
## 29            High
## 30            High
## 31            High
## 32            High
## 33          Medium
## 34          Medium
## 35            High
## 36            High
## 37          Medium
## 38            High
## 39            High
## 40            High
## 41            High
## 42            High
## 43          Medium
## 44          Medium
## 45            High
## 46            High
## 47            High
## 48            High
## 49          Medium
## 50            High
## 51            High
## 52            High
## 53            High
## 54          Medium
## 55            High
## 56            High
## 57            High
## 58            High
## 59            High
## 60            High
## 61          Medium
## 62          Medium
## 63            High
## 64            High
## 65            High
## 66            High

Interpretation The filter identifies students with high burnout despite low study hours. Their burnout may not be due to academic workload alone; they could be facing emotional or psychological stress. This highlights the importance of mental health factors beyond study load.

LEVEL 4: Visualization of Dataset

Question 4.1: What does the histogram reveal about the distribution and shape of burnout scores among students?

library(ggplot2)

ggplot(data, aes(x = burnout_score, fill = after_stat(count))) +
  geom_histogram(bins = 30, color = "black") +
  labs(
    title = "Distribution of Burnout Scores",
    x = "Burnout Score",
    y = "Frequency"
  ) +
  theme_minimal()

Interpretation The shape of the histogram determines the reading:

- Right-skewed → Most students have low to moderate burnout, with a few having very high burnout levels.
- Left-skewed → Most students have high burnout, with only a few experiencing low burnout.
- Approximately normal → Scores are symmetric around the mean, with most students having moderate burnout and fewer at either extreme.
- Uniform → Burnout scores are evenly distributed across all levels.
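
To decide which of these cases applies without relying on the eye alone, the sample skewness can be computed. The sketch below uses toy vectors in place of data$burnout_score, and sample_skewness is an illustrative helper, not a function from the original analysis. A clearly positive value points to a right-skewed shape, a clearly negative value to a left-skewed one, and a value near zero to a symmetric one.

```r
# Sample skewness: third central moment divided by the cubed standard deviation.
# `sample_skewness` is an illustrative helper, not part of the original script.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  centered <- x - mean(x)
  mean(centered^3) / (mean(centered^2)^1.5)
}

# Toy vectors standing in for data$burnout_score (invented values)
right_tailed <- c(1, 1, 2, 2, 3, 3, 4, 9)
symmetric    <- c(1, 2, 3, 4, 5)

skew_right <- sample_skewness(right_tailed)
skew_sym   <- sample_skewness(symmetric)
print(c(right = skew_right, symmetric = skew_sym))
```

On the real data the call would be sample_skewness(data$burnout_score).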

Question 4.2: How do burnout levels compare across different genders, and what does the boxplot reveal about variability and outliers?

library(ggplot2)

ggplot(data, aes(x = gender, y = burnout_score, fill = gender)) +
  geom_boxplot() +
  scale_fill_manual(values = c("#FFC0CB", "#BEBEBE", "#800000")) +
  labs(
    title = "Burnout Level by Gender",
    x = "Gender",
    y = "Burnout Score",
    fill = "Gender"
  ) +
  theme_minimal()

Interpretation
- Medians are similar → Burnout levels are almost the same across all genders.
- Boxes overlap substantially → There is no major difference in burnout distribution between genders.
- Spread is similar → Variability in burnout is consistent across genders.

Overall → Gender does not have a meaningful relationship with burnout in this dataset: every gender shows a similar median and a similar spread of scores.
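
The visual comparison of medians and box heights can be backed by numbers. The following is a minimal sketch on an invented toy data frame (toy is a stand-in for data) that computes the median and interquartile range of burnout for each gender using base R.

```r
# Toy data frame standing in for `data`; the values are invented for illustration.
toy <- data.frame(
  gender = c("Female", "Female", "Female", "Male", "Male", "Male",
             "Other", "Other", "Other"),
  burnout_score = c(1.0, 2.0, 3.0, 1.5, 2.0, 2.5, 0.5, 2.0, 6.0)
)

# Split the scores by gender, then summarise each group
by_gender <- split(toy$burnout_score, toy$gender)
gender_summary <- data.frame(
  median = sapply(by_gender, median),  # centre line of each box
  iqr    = sapply(by_gender, IQR)      # height of each box
)
print(gender_summary)
```

With the real dataset, the same summary could be written with the dplyr verbs already used in this report (group_by(gender) followed by summarise()).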

Question 4.3: What relationship does the scatter plot show between study hours per day and burnout score among students?

# Load ggplot2 for plotting
library(ggplot2)

# Scatter plot of study hours vs. burnout with a fitted linear trend
ggplot(data, aes(x = study_hours_per_day, y = burnout_score)) +
  geom_point(color = "pink", alpha = 0.6) +
  geom_smooth(method = "lm", se = TRUE, color = "black") +
  labs(
    title = "Study Hours vs Burnout Score",
    x = "Study Hours per Day",
    y = "Burnout Score"
  ) +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

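Interpretation The fitted line slopes upward: students who study more hours per day tend to report higher burnout. This is an association, not evidence that study hours alone cause burnout.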
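
The direction and strength of the trend line can be quantified. The sketch below uses simulated values (study_hours and burnout are invented stand-ins, and the seed is arbitrary) to show that the slope of a simple linear fit and the Pearson correlation always share the same sign, and that the slope equals r scaled by the ratio of the standard deviations.

```r
set.seed(42)  # arbitrary seed so the simulated example is reproducible

# Simulated stand-ins for study_hours_per_day and burnout_score (not the real data)
study_hours <- runif(500, min = 0, max = 10)
burnout     <- 1 + 0.3 * study_hours + rnorm(500, sd = 1.5)

# Slope of the least-squares line, as drawn by geom_smooth(method = "lm")
trend <- lm(burnout ~ study_hours)
slope <- unname(coef(trend)["study_hours"])

# Pearson correlation between the two variables
r <- cor(study_hours, burnout)

cat("Slope:", round(slope, 3), " Pearson r:", round(r, 3), "\n")
```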

Question 4.4: How does the average stress level vary across different academic years, as shown in the bar chart?

data %>%
  group_by(academic_year) %>%
  summarise(avg_stress = mean(stress_level, na.rm = TRUE)) %>%
  ggplot(aes(x = factor(academic_year), y = avg_stress, fill = factor(academic_year))) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = round(avg_stress, 2)), vjust = -0.5) +
  scale_fill_manual(values = c("pink","grey","maroon","beige")) +
  labs(
    title = "Average Stress Level by Academic Year",
    x = "Academic Year",
    y = "Average Stress Level",
    fill = "Year"
  ) +
  coord_cartesian(ylim = c(0, 10)) +
  theme_minimal()

Interpretation The average stress level remains almost the same across all academic years, with no notable variation from year 1 to year 4. This suggests that academic pressure is consistent throughout the degree. Overall, academic year does not have a strong relationship with stress levels in this dataset.

Question 4.5: How does the relationship between sleep hours and burnout score differ across genders in the facet plot?

ggplot(data, aes(x = sleep_hours, y = burnout_score, color = gender)) +
  geom_point() +
  facet_wrap(~gender) +
  scale_color_manual(values = c("Male" = "maroon", "Female" = "pink", "Other" ="grey"))

Interpretation The facet plot shows a similar pattern between sleep hours and burnout across all genders. Data points are spread similarly in each panel, and no gender shows a distinct trend or deviation. This indicates that the relationship between sleep and burnout is consistent for all groups. Overall, gender does not noticeably alter how sleep relates to burnout in this dataset.
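
One way to check that the panels really show the same relationship is to compute the sleep-burnout correlation separately for each gender. This is a sketch on simulated data (toy and its coefficients are assumptions made for illustration, not values from the dataset).

```r
set.seed(1)  # arbitrary seed for a reproducible illustration

# Simulated stand-in for the real data: same negative sleep-burnout slope in every group
toy <- data.frame(
  gender      = rep(c("Female", "Male", "Other"), each = 200),
  sleep_hours = runif(600, min = 3, max = 10)
)
toy$burnout_score <- 8 - 0.5 * toy$sleep_hours + rnorm(600, sd = 1)

# Pearson correlation between sleep and burnout within each gender
cor_by_gender <- sapply(
  split(toy, toy$gender),
  function(group) cor(group$sleep_hours, group$burnout_score)
)
print(round(cor_by_gender, 2))
```

If the per-gender correlations on the real data are close to one another, that supports the reading that gender does not change how sleep relates to burnout.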

LEVEL 5: Advanced Analysis

Question 5.1: What does the correlation matrix reveal about the relationships between stress level, sleep hours, study hours, and burnout score?

num_data <- data %>%
  select(stress_level, sleep_hours, study_hours_per_day, burnout_score)


cor_matrix <- cor(num_data, use = "complete.obs", method = "pearson")


library(corrplot)
corrplot(cor_matrix, 
         method = "circle", 
         type = "upper", 
         diag = FALSE,
         tl.col = "black", 
         addCoef.col = "black")

Interpretation Stress level has the strongest positive correlation with burnout, making it a key predictor.

1. Strong Positive Correlation: Stress vs. Burnout (0.75)
2. Moderate Positive Correlation: Study Hours vs. Burnout (0.34)
3. Negative Correlation: Sleep vs. Burnout (-0.37)
4. No Correlation: Sleep vs. Study Hours (0.00)
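
Squaring each correlation gives the share of burnout variance that the variable accounts for in a one-predictor linear fit. The short calculation below uses the rounded coefficients reported above, so the results are approximate: roughly 56% for stress, 12% for study hours, and 14% for sleep.

```r
# Pearson correlations with burnout_score, copied from the interpretation above
r_with_burnout <- c(
  stress_level        = 0.75,
  study_hours_per_day = 0.34,
  sleep_hours         = -0.37
)

# Squaring r gives the proportion of burnout variance shared with each variable
shared_variance <- r_with_burnout^2
print(round(100 * shared_variance, 1))
```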

Question 5.2: Which students are identified as outliers in burnout scores, and what do these extreme values indicate?

# Load required libraries
library(dplyr)
library(knitr)

# Calculate IQR
Q1 <- quantile(data$burnout_score, 0.25, na.rm = TRUE)
Q3 <- quantile(data$burnout_score, 0.75, na.rm = TRUE)
IQR_val <- Q3 - Q1

# Identify outliers
outliers <- data %>%
  filter(burnout_score < (Q1 - 1.5 * IQR_val) |
         burnout_score > (Q3 + 1.5 * IQR_val))

# Show number of outliers
cat("Total Outliers:", nrow(outliers), "\n")
## Total Outliers: 3735
# Display a clean sample (10 rows, selected columns)
outliers %>%
  select(age, gender, academic_year, burnout_score) %>%
  sample_n(10) %>%
  kable()
age  gender  academic_year  burnout_score
19   Female  2              7.761073
26   Male    1              7.276458
19   Male    2              7.183207
19   Female  4              7.103009
27   Other   4              7.256140
21   Male    1              7.676246
18   Female  4              7.812587
29   Male    2              7.853736
22   Female  1              7.748621
20   Female  3              7.553783

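Interpretation The IQR rule flags 3,735 of the 1,000,000 students (about 0.37%) as burnout outliers. All ten sampled rows have burnout scores between roughly 7.1 and 7.9, so the sampled extremes sit at the high end of the scale and represent students with unusually severe burnout. More broadly, stress level has the strongest association with burnout, and lower sleep is associated with higher burnout. Academic year and gender do not show significant differences in burnout levels. Overall, managing stress and maintaining proper sleep can help reduce burnout.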
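
To see on which side of the distribution the outliers fall, the two fences of the 1.5 * IQR rule can be computed explicitly and the flagged values counted per side. The sketch below uses toy scores, and iqr_fences is an illustrative helper rather than part of the original script.

```r
# `iqr_fences` is an illustrative helper (not part of the original script) that
# returns the lower and upper fences used by the 1.5 * IQR rule.
iqr_fences <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  spread <- q[2] - q[1]
  c(lower = q[1] - k * spread, upper = q[2] + k * spread)
}

# Toy scores standing in for data$burnout_score (invented values)
scores <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 12)
fences <- iqr_fences(scores)

# Count outliers below the lower fence and above the upper fence
n_low  <- sum(scores < fences["lower"])
n_high <- sum(scores > fences["upper"])
cat("Fences:", fences, " low:", n_low, " high:", n_high, "\n")
```

When a score cannot go below zero and the lower fence is negative, every outlier is necessarily a high-burnout case.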

Question 5.3: Is there a significant difference in burnout scores between genders? Use one-way ANOVA and interpret the results.

anova_gender <- aov(burnout_score ~ gender, data=data)
summary(anova_gender)
##                Df  Sum Sq Mean Sq F value Pr(>F)
## gender      2e+00       0  0.0292   0.011   0.99
## Residuals   1e+06 2769010  2.7690

Interpretation
- The p-value = 0.99, which is much greater than 0.05
- The F-value = 0.011, which is very small

- H₀ (Null Hypothesis): No difference in mean burnout scores between genders
- H₁ (Alternative Hypothesis): At least one gender has a different mean burnout score
- Since p-value > 0.05, we fail to reject H₀: there is no evidence that mean burnout differs by gender
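
The F statistic in the ANOVA table is the ratio of the two mean squares, and the p-value is the upper-tail probability of the F distribution at that ratio. The arithmetic below reproduces both from the values printed in the table. The residual degrees of freedom are taken as 1e6 because the table shows them rounded to 1e+06.

```r
# Values copied from the ANOVA table for gender above
ms_gender   <- 0.0292   # Mean Sq for gender
ms_residual <- 2.7690   # Mean Sq for residuals
df_gender   <- 2
df_residual <- 1e6      # shown as 1e+06 in the table (rounded for display)

# F statistic: between-group mean square divided by residual mean square
f_value <- ms_gender / ms_residual

# p-value: upper-tail probability of the F distribution
p_value <- pf(f_value, df_gender, df_residual, lower.tail = FALSE)

cat("F =", round(f_value, 3), " p =", round(p_value, 2), "\n")
```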

ANOVA (Does Burnout Differ by Academic Year?)

Question 5.4: Do burnout scores differ significantly across academic years? Perform a one-way ANOVA using academic_year as a factor and interpret the results.

anova_model <- aov(burnout_score ~ factor(academic_year), data=data)
summary(anova_model)
##                          Df  Sum Sq Mean Sq F value Pr(>F)
## factor(academic_year) 3e+00       2  0.7688   0.278  0.842
## Residuals             1e+06 2769007  2.7690

Interpretation
- The p-value = 0.842, which is greater than 0.05
- The F-value = 0.278, which is small

Hypothesis Testing
- H₀ (Null Hypothesis): Mean burnout scores are the same across all academic years
- H₁ (Alternative Hypothesis): At least one academic year has a different mean burnout score
- Since p-value > 0.05, we fail to reject H₀
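
Besides the p-value, an effect size shows how little of the variation academic year accounts for. The sketch below computes eta squared from the figures printed in the ANOVA table. The between-group sum of squares is recovered as Df × Mean Sq because the table rounds it to 2, so the result is approximate. It comes out below one millionth, meaning academic year explains less than 0.0001% of the variance in burnout.

```r
# Values copied from the ANOVA table for academic year above
df_year     <- 3
ms_year     <- 0.7688    # Mean Sq for factor(academic_year)
ss_residual <- 2769007   # Sum Sq for residuals

# Recover the between-group sum of squares (the table rounds it to 2)
ss_year <- df_year * ms_year

# Eta squared: share of total variance in burnout explained by academic year
eta_squared <- ss_year / (ss_year + ss_residual)
cat("Eta squared:", signif(eta_squared, 3), "\n")
```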

Polynomial Regression (Study Hours)

Question 5.5: Fit a polynomial regression model to examine the relationship between study hours per day and burnout score. Use a quadratic model and interpret the results.

model_poly <- lm(burnout_score ~ poly(study_hours_per_day, 2), data=data)
summary(model_poly)
## 
## Call:
## lm(formula = burnout_score ~ poly(study_hours_per_day, 2), data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.4846 -1.2352 -0.2856  1.0240  8.3844 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   1.784e+00  1.567e-03 1138.51   <2e-16 ***
## poly(study_hours_per_day, 2)1 5.577e+02  1.567e+00  355.88   <2e-16 ***
## poly(study_hours_per_day, 2)2 4.960e+01  1.567e+00   31.65   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.567 on 999997 degrees of freedom
## Multiple R-squared:  0.1132, Adjusted R-squared:  0.1132 
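## F-statistic: 6.383e+04 on 2 and 999997 DF,  p-value: < 2.2e-16

Interpretation Both the linear and the quadratic term are highly significant (p < 2e-16), and the positive quadratic coefficient indicates that burnout rises faster at higher study hours. However, R-squared = 0.1132, so study hours alone explain only about 11% of the variance in burnout. With one million observations, even small effects reach statistical significance, so the effect size matters more than the p-value.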
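
The large coefficient estimates (for example 5.577e+02) come from the basis that poly() uses by default: orthogonal polynomial columns scaled to unit length, whose individual values are tiny when there are a million rows. They are not changes in burnout per extra study hour. The sketch below, on simulated data with invented coefficients and an arbitrary seed, shows that poly(x, 2) and poly(x, 2, raw = TRUE) describe the same fitted curve, and that only the raw version reports coefficients on the scale of hours and hours squared.

```r
set.seed(123)  # arbitrary seed for a reproducible illustration

# Simulated stand-ins for study_hours_per_day and burnout_score (not the real data)
hours   <- runif(1000, min = 0, max = 10)
burnout <- 1 + 0.1 * hours + 0.02 * hours^2 + rnorm(1000, sd = 1)

# Orthogonal polynomial basis (the default, as in the model above)
fit_orth <- lm(burnout ~ poly(hours, 2))

# Raw basis: coefficients are on the scale of hours and hours^2
fit_raw <- lm(burnout ~ poly(hours, 2, raw = TRUE))

# Both parameterisations give the same fitted values
same_fit <- isTRUE(all.equal(unname(fitted(fit_orth)), unname(fitted(fit_raw))))
print(same_fit)
print(coef(fit_raw))
```

The orthogonal basis is usually preferred for testing whether the quadratic term adds anything, while the raw basis is easier to read when the coefficients themselves need interpreting.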

Question 5.6: Fit a multiple regression model to examine the effect of study hours per day on burnout score, including a quadratic (polynomial) term. Also include other relevant variables such as stress level and anxiety score. Interpret the results.

model_multi_poly <- lm(burnout_score ~ poly(study_hours_per_day, 2) + stress_level + anxiety_score, data=data)
summary(model_multi_poly)
## 
## Call:
## lm(formula = burnout_score ~ poly(study_hours_per_day, 2) + stress_level + 
##     anxiety_score, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.3190 -0.7372 -0.0522  0.6847  5.6773 
## 
## Coefficients:
##                                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                    -1.288276   0.003053 -421.95   <2e-16 ***
## poly(study_hours_per_day, 2)1 131.436103   1.131711  116.14   <2e-16 ***
## poly(study_hours_per_day, 2)2  48.787707   1.058854   46.08   <2e-16 ***
## stress_level                    0.548116   0.001001  547.83   <2e-16 ***
## anxiety_score                   0.249395   0.001081  230.74   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.059 on 999995 degrees of freedom
## Multiple R-squared:  0.5951, Adjusted R-squared:  0.5951 
## F-statistic: 3.674e+05 on 4 and 999995 DF,  p-value: < 2.2e-16

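Interpretation This is a multiple regression model because it includes several predictors, and it is also a polynomial (quadratic) model because of poly(study_hours_per_day, 2).

This means burnout is modeled as a function of:

- Study hours (linear + curved effect)
- Stress level
- Anxiety score

All terms are highly significant (p < 2e-16). Holding the other predictors fixed, a one-unit increase in stress level is associated with a 0.548-point increase in burnout, and a one-unit increase in anxiety score with a 0.249-point increase. R-squared rises from 0.1132 in the study-hours-only model to 0.5951, so this model explains about 60% of the variance in burnout.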
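
Whether stress and anxiety add explanatory power beyond study hours can be tested formally by comparing the two nested models with anova(). The following is a sketch on simulated data (the data frame sim, its column names, and its coefficients are assumptions made for illustration, not values from the dataset).

```r
set.seed(7)  # arbitrary seed for a reproducible illustration

# Simulated stand-in for the real data; the coefficients are invented
n <- 2000
sim <- data.frame(
  study_hours   = runif(n, 0, 10),
  stress_level  = runif(n, 0, 10),
  anxiety_score = runif(n, 0, 10)
)
sim$burnout <- 0.1 * sim$study_hours + 0.5 * sim$stress_level +
  0.25 * sim$anxiety_score + rnorm(n)

# Smaller model: study hours only; larger model adds stress and anxiety
small <- lm(burnout ~ poly(study_hours, 2), data = sim)
large <- lm(burnout ~ poly(study_hours, 2) + stress_level + anxiety_score,
            data = sim)

# Partial F-test for the two added predictors, plus the gain in R-squared
comparison <- anova(small, large)
r2_gain <- summary(large)$r.squared - summary(small)$r.squared
print(comparison)
cat("R-squared gain:", round(r2_gain, 3), "\n")
```

On the real data the equivalent call would be anova(model_poly, model_multi_poly).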


Conclusion The analysis shows that burnout is mainly associated with stress levels and sleep patterns. Neither gender nor academic year significantly affects burnout (ANOVA p = 0.99 and p = 0.842). Burnout among students is primarily influenced by psychological and lifestyle factors such as stress, anxiety, study patterns, and sleep, rather than demographic variables. The relationship between study hours and burnout is non-linear, and burnout is linked to academic outcomes such as dropout risk. Managing stress and improving sleep can help reduce burnout levels. Overall, burnout is a multi-factor phenomenon requiring holistic management.