Understanding the Relationship Between Student Lifestyle Factors and Depression Levels
Introduction
Depression is a growing concern among students due to academic pressure, social dynamics, and lifestyle changes. This project explores factors that may be related to student depression and academic performance by analyzing a dataset of lifestyle, social, and academic variables. By applying Chi-square tests, t-tests, and Multiple Linear Regression, we aim to uncover patterns and relationships that provide insight into student mental health trends.
Dataset
Name: Student Depression Dataset
Source: Kaggle - Student Depression Dataset
Description: This dataset includes self-reported information from students on variables such as age, gender, sleep duration, academic performance, social support, and depression indicators.
Data:
Age: Age of the student.
Gender: Gender identity (Male/Female/Other).
Sleep Duration: Self-reported sleep duration category (e.g., Less than 5 hours, 7-8 hours, More than 8 hours).
CGPA: Cumulative Grade Point Average.
Depression: Depression status (0 = No, 1 = Yes).
Social Support: Level of support from friends or family.
Exercise Frequency: How often the student exercises.
Screen Time: Daily screen time in hours.
Depression Score: A numerical score indicating severity of depressive symptoms (if present).
Types of Used Columns
Numerical (float/int): Age, Academic Pressure, Work Pressure, CGPA, Study Satisfaction, Job Satisfaction, Work/Study Hours, Financial Stress.
Categorical (object): Gender, Sleep Duration, Dietary Habits, Degree, Family History of Mental Illness, Suicidal Thoughts.
Target Variable: Depression (int, 0 or 1)
Number of Records
Total records: 27,901
Most columns have no missing values, except:
Financial Stress: 3 missing values (removed)
data <- read.csv("Student Depression Dataset.csv")
# read.csv() converts the header "Financial Stress" to Financial.Stress
clean_data <- data[!is.na(data$Financial.Stress), ]
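By default, read.csv() passes column headers through make.names() (check.names = TRUE), so headers with spaces or slashes become dotted names such as Financial.Stress and Work.Study.Hours. The sketch below shows that renaming and a per-column missing-value count. The small data frame is made up for illustration and is not taken from the dataset.

```r
# Header names as they appear in the raw CSV (assumed for illustration)
raw_names <- c("Financial Stress", "Work/Study Hours", "CGPA")

# read.csv() applies make.names() to headers by default (check.names = TRUE),
# replacing spaces and "/" with "."
r_names <- make.names(raw_names)
print(r_names)

# Made-up data frame with two missing Financial.Stress values
toy <- data.frame(
  Financial.Stress = c(3, NA, 5, 1, NA),
  Work.Study.Hours = c(4, 8, 10, 2, 6),
  CGPA             = c(7.2, 8.1, 6.9, 9.0, 7.7)
)

# Count missing values in every column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Keep only rows where Financial.Stress is present
toy_clean <- toy[!is.na(toy$Financial.Stress), ]
print(nrow(toy_clean))
```

Checking colSums(is.na(data)) on the real data frame is a quick way to confirm which columns need cleaning before any test is run.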
Other Interesting Notes
Respondents come from various cities and educational backgrounds.
Age varies widely (students in both undergrad and postgrad).
Resource 1: Chi-square Test
To determine whether there is a statistically significant association between Gender and Depression status among students.
Variables Used
Gender: Categorical variable with levels Male and Female.
Depression: Binary variable (0 = No, 1 = Yes).
Method: A Chi-square test of independence was performed using a contingency table of the counts of depressed and non-depressed individuals grouped by gender.
data <- read.csv("Student Depression Dataset.csv")
gender_depression_table <- table(data$Gender, data$Depression)
print(gender_depression_table)
##
## 0 1
## Female 5133 7221
## Male 6432 9115
chisq_test <- chisq.test(gender_depression_table)
print(chisq_test)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: gender_depression_table
## X-squared = 0.082658, df = 1, p-value = 0.7737
Interpretation
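The p-value of 0.774 is far above the 0.05 significance level, so we fail to reject the null hypothesis of independence: there is no statistically significant association between gender and depression status in this dataset. The depression rate is almost identical for female students (7,221 of 12,354, about 58.5%) and male students (9,115 of 15,547, about 58.6%).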
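The chi-square statistic can be reproduced by hand from the published counts, which makes the result concrete: the expected counts under independence are almost identical to the observed ones. This sketch assumes chisq.test()'s default Yates correction for 2x2 tables (correct = TRUE) and uses only the counts shown in the table above.

```r
# Observed counts copied from the contingency table above
observed <- matrix(
  c(5133, 7221,
    6432, 9115),
  nrow = 2, byrow = TRUE,
  dimnames = list(Gender = c("Female", "Male"), Depression = c("0", "1"))
)

# Depression rate within each gender (row proportions)
rates <- prop.table(observed, margin = 1)[, "1"]
print(round(rates, 3))

# Expected counts under independence: row total * column total / grand total
n <- sum(observed)
expected <- outer(rowSums(observed), colSums(observed)) / n

# Yates' continuity correction as chisq.test() applies it to 2x2 tables:
# subtract min(0.5, |O - E|) from each absolute deviation before squaring
yates <- min(0.5, abs(observed - expected))
x2_manual <- sum((abs(observed - expected) - yates)^2 / expected)

# Compare with R's built-in test
x2_builtin <- unname(chisq.test(observed)$statistic)
print(c(manual = x2_manual, builtin = x2_builtin))
```

In a 2x2 table every |O - E| is the same, so each cell contributes a deviation of only about 12 counts against expected counts in the thousands, which is why the statistic is so small.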
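Resource 2: T-test
To determine whether mean CGPA differs between depressed and non-depressed students.
data <- read.csv("Student Depression Dataset.csv")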
data_clean <- na.omit(data[, c("CGPA", "Depression")])
t_test <- t.test(CGPA ~ Depression, data = data_clean)
print(t_test)
##
## Welch Two Sample t-test
##
## data: CGPA by Depression
## t = -3.6946, df = 24506, p-value = 0.0002207
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -0.10148191 -0.03112933
## sample estimates:
## mean in group 0 mean in group 1
## 7.617282 7.683588
Interpretation
A Welch two-sample t-test was used to examine whether CGPA differs between depressed and non-depressed students. Depressed students (group 1, M = 7.68) had a slightly higher mean CGPA than non-depressed students (group 0, M = 7.62), a difference of about 0.07 points (95% CI: 0.03 to 0.10). The difference is statistically significant (t = -3.69, p < 0.001), meaning a gap this large would be unlikely if the true means were equal. However, with nearly 28,000 students even a very small difference reaches significance, so the practical difference is minimal.
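Welch's test does not assume equal variances in the two groups, which is why the reported degrees of freedom (24506) are not simply the sample size minus two. The sketch below computes the Welch statistic and the Welch-Satterthwaite degrees of freedom by hand and checks them against t.test(). The GPA values are simulated and illustrative only, not the study data.

```r
# Simulated GPA samples for two groups (illustrative only)
set.seed(42)
gpa_no_dep <- rnorm(200, mean = 7.6, sd = 1.5)
gpa_dep    <- rnorm(300, mean = 7.7, sd = 1.4)

# Welch's t statistic: difference in means over the unpooled standard error
m1 <- mean(gpa_no_dep)
m2 <- mean(gpa_dep)
v1 <- var(gpa_no_dep) / length(gpa_no_dep)
v2 <- var(gpa_dep) / length(gpa_dep)
t_manual <- (m1 - m2) / sqrt(v1 + v2)

# Welch-Satterthwaite degrees of freedom (generally not a whole number)
df_manual <- (v1 + v2)^2 /
  (v1^2 / (length(gpa_no_dep) - 1) + v2^2 / (length(gpa_dep) - 1))

# t.test() uses Welch's version by default (var.equal = FALSE)
welch <- t.test(gpa_no_dep, gpa_dep)
print(c(t_manual = t_manual, t_builtin = unname(welch$statistic),
        df_manual = df_manual, df_builtin = unname(welch$parameter)))
```

With the formula interface CGPA ~ Depression, group 0 comes first, so the difference is computed as group 0 minus group 1. That is why the reported t value and confidence interval are negative even though depressed students had the higher mean.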
Goal of Multiple Linear Regression:
To predict CGPA using multiple predictor variables:
- Sleep Duration
- Financial Stress
- Depression
- Work/Study Hours
data <- read.csv("Student Depression Dataset.csv")
mlr_data <- na.omit(data[, c("CGPA", "Sleep.Duration", "Financial.Stress", "Depression", "Work.Study.Hours")])
mlr_model <- lm(CGPA ~ Sleep.Duration + Financial.Stress + Depression + Work.Study.Hours, data = mlr_data)
summary(mlr_model)
##
## Call:
## lm(formula = CGPA ~ Sleep.Duration + Financial.Stress + Depression +
## Work.Study.Hours, data = mlr_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.7103 -1.3484 0.1186 1.2540 2.4335
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.665838 0.031094 246.537 < 2e-16 ***
## Sleep.Duration7-8 hours -0.004768 0.025381 -0.188 0.850981
## Sleep.DurationLess than 5 hours -0.054211 0.024739 -2.191 0.028438 *
## Sleep.DurationMore than 8 hours -0.077651 0.026633 -2.916 0.003553 **
## Sleep.DurationOthers -0.090571 0.347040 -0.261 0.794108
## Financial.Stress -0.002565 0.006577 -0.390 0.696522
## Depression 0.069699 0.019652 3.547 0.000391 ***
## Work.Study.Hours -0.001139 0.002429 -0.469 0.639094
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.47 on 27890 degrees of freedom
## Multiple R-squared: 0.0009754, Adjusted R-squared: 0.0007247
## F-statistic: 3.89 on 7 and 27890 DF, p-value: 0.0003042
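Sleep.Duration is categorical, so lm() converts it into 0/1 indicator columns and reports one coefficient per non-baseline level. The level missing from the coefficient list is the baseline that every sleep coefficient is compared against. The sketch below uses model.matrix() on a few made-up observations to show the naming. The level set is an assumption based on the labels in the output above, with "5-6 hours" assumed to be the omitted baseline; the real data also include an "Others" level.

```r
# Made-up observations using sleep categories like those in the output above
sleep <- factor(
  c("5-6 hours", "7-8 hours", "Less than 5 hours", "More than 8 hours", "7-8 hours"),
  levels = c("5-6 hours", "7-8 hours", "Less than 5 hours", "More than 8 hours")
)
sleep_df <- data.frame(Sleep.Duration = sleep)

# With R's default treatment contrasts, the first level is the baseline
# and every other level gets its own 0/1 indicator column
mm <- model.matrix(~ Sleep.Duration, data = sleep_df)
print(colnames(mm))

# relevel() changes the baseline, e.g. to compare everything against 7-8 hours
sleep_df$Sleep.Duration <- relevel(sleep_df$Sleep.Duration, ref = "7-8 hours")
mm2 <- model.matrix(~ Sleep.Duration, data = sleep_df)
print(colnames(mm2))
```

Each sleep coefficient is therefore a difference in expected CGPA relative to the baseline category, holding the other predictors fixed, not an absolute effect of that amount of sleep.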
library(ggplot2)
library(broom)
library(dplyr)
data <- read.csv("Student Depression Dataset.csv")
mlr_data <- na.omit(data[, c("CGPA", "Sleep.Duration", "Financial.Stress", "Depression", "Work.Study.Hours")])
mlr_model <- lm(CGPA ~ Sleep.Duration + Financial.Stress + Depression + Work.Study.Hours, data = mlr_data)
tidy_model <- broom::tidy(mlr_model, conf.int = TRUE)
tidy_model <- tidy_model %>% filter(term != "(Intercept)")
ggplot(tidy_model, aes(x = estimate, y = reorder(term, estimate))) +
  geom_point(color = "blue", size = 5) +
  geom_errorbarh(aes(xmin = conf.low, xmax = conf.high), height = 0.4, color = "gray") +
  geom_vline(xintercept = 0, linetype = "dashed", color = "red") +
  labs(
    title = "Linear Regression Coefficients",
    x = "Coefficient Estimate (with 95% CI)",
    y = "Predictor Variables"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.title.y = element_blank()
  )
Interpretation
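This model explains only about 0.1% of the variation in CGPA (R² = 0.00098), so it has almost no predictive value.
Compared with the reference sleep category (5-6 hours, the level not listed in the output), students sleeping less than 5 hours or more than 8 hours had slightly lower CGPAs, by about 0.05 and 0.08 points (p = 0.028 and p = 0.004).
Holding the other variables constant, depression was associated with a CGPA about 0.07 points higher (p < 0.001), which matches the t-test result.
Financial stress and work/study hours had small negative coefficients that were not statistically significant (p = 0.70 and p = 0.64), so this model provides no evidence that either is related to CGPA.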
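The headline numbers in the regression summary are linked by standard formulas, so they can be checked by hand. This minimal sketch uses only values printed in the output above, plus the 3 records with missing Financial Stress that were dropped, which leaves 27,898 observations and 7 non-intercept coefficients.

```r
# Values reported by summary(mlr_model)
r2 <- 0.0009754     # Multiple R-squared
n  <- 27901 - 3     # records left after dropping 3 missing Financial Stress values
p  <- 7             # number of non-intercept coefficients

# Residual degrees of freedom
df_resid <- n - p - 1

# Adjusted R-squared penalises each added predictor
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid

# Overall F statistic: explained variance per predictor vs. residual variance
f_stat <- (r2 / p) / ((1 - r2) / df_resid)

print(c(pct_explained = 100 * r2, adj_r2 = adj_r2, F = f_stat))
```

The F test is significant (p = 0.0003) even though R² is tiny. With nearly 28,000 observations, a model can be statistically distinguishable from "no relationship at all" while still being useless for predicting an individual student's GPA.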
Test Type | Result Summary
Chi-square | No significant link between gender and depression
T-test | Depressed students had slightly higher GPAs (significant)
Regression | Weak model, but sleep and depression had significant effects
Final Conclusions
In this project, we looked into how different factors might be related to student depression and academic performance (GPA), using a large dataset with over 27,000 students. We used three main statistical methods: Chi-square, T-test, and Multiple Linear Regression.
Chi-square test: We checked whether gender and depression were related. It turns out there's no significant connection: female and male students reported depression at nearly identical rates (about 58.5% and 58.6%, respectively).
T-test: We compared GPA between students with and without depression. The results showed a small but statistically significant difference: students with depression actually had a slightly higher GPA. That's surprising, but the gap is only about 0.07 points.
Multiple Linear Regression: We tried to predict GPA from sleep duration, financial stress, depression, and study/work hours. The model explained very little of the variation in GPA, but a few factors stood out. Students who sleep less than 5 hours or more than 8 hours tend to have slightly lower GPAs. Depression again showed a small positive link to GPA, just like in the t-test.
Final Thoughts:
Gender doesn’t seem to be related to depression rates in this group.
Depression doesn’t appear to be linked to lower academic performance, at least as measured by GPA.
Sleep seems to matter a bit: too little or too much sleep is associated with a slightly lower GPA.
Overall, these factors don’t predict GPA super well, but some interesting trends popped up.