Midterm Project

Dataset Dictionary and Source

Student Lifestyle Dataset

This study analyzes lifestyle data for 2,000 students collected from 2023 to 2024.

Data Source: Kaggle

Core Variables:

  • Study_Hours_Per_Day: Number of hours a student spends studying per day
  • Extracurricular_Hours_Per_Day: Number of hours spent on extracurricular activities per day
  • Sleep_Hours_Per_Day: Average number of hours a student sleeps per day
  • Social_Hours_Per_Day: Number of hours spent on social activities per day
  • Physical_Activity_Hours_Per_Day: Number of hours spent on physical activities per day


  • Stress_Level: Stress level category (Low, Moderate, High)
  • Gender: Student gender
  • Grades: Academic performance score on a 0-10 scale

Data Loading

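# Load required packages
library(readr)    # read_csv()
library(plotly)   # interactive 3D and scatter plots
library(ggplot2)  # static plots
library(dplyr)    # data manipulation
library(janitor)  # data-cleaning helpers

# Load the dataset
sl_dataset <- read_csv("~/Downloads/student_lifestyle_dataset..csv")

# Draw a random subset of 500 rows for less cluttered 2D scatterplots;
# set.seed() makes the sample reproducible (the seed value is arbitrary)
set.seed(123)
sl_dataset2 <- slice_sample(sl_dataset, n = 500)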

3D Plotly: Study Hours, Sleep Hours & Grades

3D Plot Analysis

Key Observations:

  • Study Hours: Students who study longer generally receive higher grades and more often fall into the higher stress categories, suggesting a positive relationship between study hours and grades.

  • Sleep Trend: High grades appear to cluster around 7-9 hours of sleep per day. Students who sleep less show more variable grades, suggesting that sleep deprivation may make high grades harder to achieve consistently.

  • Grade Pattern: Viewed from the grades axis, the highest-scoring students are concentrated at the upper end of the study-hours range.

  • Stress Levels: Stress levels tend to rise as grades increase, suggesting a positive association between grades and stress.

  • Joint Effect: The 3D plot suggests that students who combine longer study hours with adequate sleep are more likely to receive high grades.

ggplot Boxplot: Grades vs Stress Levels

Plotly Scatter: Grades vs Study Hours by Stress Level

ggplot Scatter: Grades vs Study Hours by Stress & Gender

ggplot Histogram: Distribution of Student Grades

ggplot Boxplot: Comparison of Activity Hours

Plotly Scatterplot: Grades vs Physical Activity by Gender

ggplot Bar: Student Count by Gender

Statistical Analysis: Five-Number Summary

# Five-number summary of grades for each stress level
sl_dataset %>%
  group_by(Stress_Level) %>%
  summarise(
    Count = n(),
    Min_Grades = min(Grades),
    Q1_Grades = quantile(Grades, 0.25),
    Median_Grades = median(Grades),
    Q3_Grades = quantile(Grades, 0.75),
    Max_Grades = max(Grades)
  )
## # A tibble: 3 × 7
##   Stress_Level Count Min_Grades Q1_Grades Median_Grades Q3_Grades Max_Grades
##   <chr>        <int>      <dbl>     <dbl>         <dbl>     <dbl>      <dbl>
## 1 High          1029       5.78      7.72          8.18      8.65      10   
## 2 Low            297       5.6       6.7           7.05      7.38       8.95
## 3 Moderate       674       6.1       7.18          7.55      7.95       9.38
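The tibble above lists the stress groups alphabetically (High, Low, Moderate) because `Stress_Level` is read in as a character column. A minimal sketch, assuming you want the natural Low, Moderate, High order in tables and plots, converts the column to a factor with explicit levels. The labels below are hypothetical stand-ins, not rows from the dataset.

```r
# Hypothetical stress labels, used only to illustrate level ordering
stress <- c("High", "Low", "Moderate", "High", "Low")

# Explicit levels replace the default alphabetical ordering
stress_f <- factor(stress, levels = c("Low", "Moderate", "High"))

print(levels(stress_f))
print(table(stress_f))

# Applied to the real data (requires dplyr, as loaded earlier):
# sl_dataset <- sl_dataset %>%
#   mutate(Stress_Level = factor(Stress_Level, levels = c("Low", "Moderate", "High")))
```

After this conversion, `group_by(Stress_Level)` summaries and ggplot discrete axes follow the factor level order instead of alphabetical order.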

Statistical Analysis: Mean & Standard Deviation

# Mean study hours, sleep hours, and grades (plus grade SD) for each stress level
sl_dataset %>%
  group_by(Stress_Level) %>%
  summarise(
    Count = n(),
    Mean_Study = round(mean(Study_Hours_Per_Day), 2),
    Mean_Sleep = round(mean(Sleep_Hours_Per_Day), 2),
    Mean_Grades = round(mean(Grades), 2),
    SD_Grades = round(sd(Grades), 2)
  )
## # A tibble: 3 × 6
##   Stress_Level Count Mean_Study Mean_Sleep Mean_Grades SD_Grades
##   <chr>        <int>      <dbl>      <dbl>       <dbl>     <dbl>
## 1 High          1029       8.39       7.05        8.15      0.69
## 2 Low            297       5.47       8.06        7.04      0.54
## 3 Moderate       674       6.97       7.95        7.56      0.55

Summary Statistics: Interpretation

Detailed Findings:

  • Grades and Stress Relationship: Students with high stress have the highest median (8.18) and mean (8.15) grades, indicating that high grades are often associated with high stress. The interquartile ranges support this: the high-stress IQR (7.72 to 8.65) lies entirely above the low-stress IQR (6.70 to 7.38), and its lower quartile exceeds the moderate-stress median (7.55). Additionally, each group's mean and median are close, which suggests the grade distributions are not strongly skewed.

  • Sleep Pattern: Mean sleep falls as stress rises (8.06 hours for low, 7.95 for moderate, and 7.05 for high stress), suggesting that students who sacrifice sleep tend to experience higher stress.

  • Study Hours Trend: Mean study hours rise with stress level (5.47 for low, 6.97 for moderate, and 8.39 for high stress), indicating that students who study longer tend to experience more stress than those who study less.


  • Grades vs Lifestyle: Many of the students who study more and achieve high grades also sleep less and fall into the high-stress category, which suggests trade-offs between academic performance and students' welfare.
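The IQR and skewness claims can be checked directly from the reported quartiles. This sketch copies the values from the two summary tables above and adds Bowley's quartile skewness, (Q3 + Q1 - 2 × median) / (Q3 - Q1), a measure chosen here for illustration and not part of the original analysis; values near 0 are consistent with roughly symmetric middle halves.

```r
# Quartiles, medians, and means copied from the two summary tables above
grade_stats <- data.frame(
  Stress_Level = c("High", "Low", "Moderate"),
  Q1     = c(7.72, 6.70, 7.18),
  Median = c(8.18, 7.05, 7.55),
  Q3     = c(8.65, 7.38, 7.95),
  Mean   = c(8.15, 7.04, 7.56)
)

# Interquartile range: spread of the middle 50% of grades
grade_stats$IQR <- grade_stats$Q3 - grade_stats$Q1

# Gap between mean and median; values near 0 are consistent with little skew
grade_stats$Mean_minus_Median <- grade_stats$Mean - grade_stats$Median

# Bowley (quartile) skewness, bounded between -1 and 1
grade_stats$Bowley_Skew <-
  (grade_stats$Q3 + grade_stats$Q1 - 2 * grade_stats$Median) / grade_stats$IQR

print(grade_stats)

# Does the high-stress IQR sit above each other group's IQR?
high     <- grade_stats[grade_stats$Stress_Level == "High", ]
low      <- grade_stats[grade_stats$Stress_Level == "Low", ]
moderate <- grade_stats[grade_stats$Stress_Level == "Moderate", ]
cat("High Q1 above Low Q3:", high$Q1 > low$Q3, "\n")
cat("High Q1 above Moderate Q3:", high$Q1 > moderate$Q3, "\n")
```

The high-stress IQR clears the low-stress IQR but still overlaps the moderate-stress IQR, so "higher" is most clear-cut in the low versus high comparison.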

Statistical Analysis: T-Test

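# Split grades by stress group
low_stress <- sl_dataset$Grades[sl_dataset$Stress_Level == "Low"]
high_stress <- sl_dataset$Grades[sl_dataset$Stress_Level == "High"]

# Welch two-sample t-test (R's default; does not assume equal variances)
t_test_result <- t.test(low_stress, high_stress)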
t_test_result
## 
##  Welch Two Sample t-test
## 
## data:  low_stress and high_stress
## t = -29.359, df = 601.48, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.187186 -1.038318
## sample estimates:
## mean of x mean of y 
##  7.042088  8.154840
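To show what the Welch test computes, this sketch rebuilds the t statistic and the Welch-Satterthwaite degrees of freedom from the group sizes, means, and standard deviations reported above. Because the SDs in the summary table are rounded to two decimals, the results only approximately match the exact output (t = -29.359, df = 601.48).

```r
# Group sizes and SDs from the mean/SD table; means from the t-test output
n_low   <- 297
mean_low <- 7.042088
sd_low  <- 0.54
n_high  <- 1029
mean_high <- 8.154840
sd_high <- 0.69

# Per-group variance of the mean; Welch keeps them separate (unequal variances)
v_low  <- sd_low^2 / n_low
v_high <- sd_high^2 / n_high
se <- sqrt(v_low + v_high)

# t statistic for the difference (low - high)
t_stat <- (mean_low - mean_high) / se

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v_low + v_high)^2 /
  (v_low^2 / (n_low - 1) + v_high^2 / (n_high - 1))

cat("t =", round(t_stat, 2), " df =", round(df_welch, 1), "\n")
```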

T-Test: Interpretation

Comprehensive Analysis:

  • Statistical Significance: The p-value (reported as < 2.2e-16, the smallest value R prints by default) is far below conventional significance levels, so we reject the null hypothesis that low- and high-stress students have equal mean grades. A difference this large would be extremely unlikely if the true means were equal.

  • Mean Difference: The average grade for low-stress students is 7.04, whereas the average for high-stress students is 8.15, a difference of about 1.11 points in favor of high-stress students.

  • Confidence Interval: We are 95% confident that the true difference in population mean grades (low minus high) lies between -1.19 and -1.04. Because the entire interval is below zero, it supports the conclusion that high-stress students tend to achieve higher grades.

  • Practical Implications: While longer study hours are associated with higher scores, other aspects of student life, such as sleep duration and stress levels, may be traded off in order to achieve outstanding academic performance.
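The confidence interval follows from the same output: the standard error is the mean difference divided by t, and the interval is the difference plus or minus the 97.5th percentile of the t distribution (with the Welch df) times that standard error. This sketch uses only numbers printed in the t-test output above.

```r
# Values reported in the Welch t-test output above
mean_low  <- 7.042088
mean_high <- 8.154840
t_stat    <- -29.359
df_welch  <- 601.48

mean_diff <- mean_low - mean_high  # estimated difference (low - high)
se <- mean_diff / t_stat           # standard error implied by t = diff / se

# 95% CI: difference +/- critical t value * standard error
t_crit <- qt(0.975, df = df_welch)
ci <- mean_diff + c(-1, 1) * t_crit * se
print(round(ci, 4))
```

The reconstructed bounds agree with the printed interval to within rounding, and both are negative, which is why the interval excludes a zero difference.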

Statistical Analysis: ANOVA

# One-way ANOVA of grades across the three stress levels
anova_result <- aov(Grades ~ Stress_Level, data = sl_dataset)
summary(anova_result)
##                Df Sum Sq Mean Sq F value Pr(>F)    
## Stress_Level    2  338.1  169.06   434.7 <2e-16 ***
## Residuals    1997  776.7    0.39                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
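Each column of the ANOVA table follows from the sums of squares and degrees of freedom: a mean square is a sum of squares divided by its df, F is the ratio of the between-group to the within-group mean square, and the p-value is the upper-tail probability of the F distribution. This sketch recomputes them from the printed table.

```r
# Sums of squares and degrees of freedom from the ANOVA table above
ss_between <- 338.1
df_between <- 2
ss_within  <- 776.7
df_within  <- 1997

# Mean squares: variation per degree of freedom
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within

# F compares between-group to within-group variation
f_stat <- ms_between / ms_within

# Upper-tail probability of the F distribution gives the p-value
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

cat("MS between =", round(ms_between, 2), " MS within =", round(ms_within, 3), "\n")
cat("F =", round(f_stat, 1), " p =", signif(p_value, 3), "\n")
```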

ANOVA: Interpretation

Extensive Analysis:

  • Statistical Significance: The p-value (< 2e-16) provides strong evidence that mean grades differ across the three stress levels. Differences this large would be extremely unlikely if all group means were equal.

  • F-Statistic: The F-value of 434.7 indicates that the variation between stress-level groups (mean square 169.06) is far greater than the variation within groups (mean square 0.39).

  • Confidence in Results: Given the very small p-value and large F-statistic from the ANOVA, there is strong statistical evidence that stress level is associated with differences in student grades. This pattern is consistent with the earlier boxplot comparing grades across stress levels.
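To gauge how strong the association is, and not only whether it is significant, one common option (not part of the original analysis) is eta-squared: the between-group sum of squares divided by the total sum of squares. This sketch also rebuilds the between-group sum of squares from the group sizes and means in the mean/SD table; because the moderate-group mean is only available rounded to two decimals, the result lands close to, but not exactly on, the table's 338.1.

```r
# Group sizes and means (High and Low from the t-test output; Moderate rounded)
n          <- c(High = 1029, Low = 297, Moderate = 674)
group_mean <- c(High = 8.154840, Low = 7.042088, Moderate = 7.56)

# Grand mean is the size-weighted average of the group means
grand_mean <- sum(n * group_mean) / sum(n)

# Between-group sum of squares: distance of each group mean from the grand mean
ss_between <- sum(n * (group_mean - grand_mean)^2)

# Eta-squared: share of total variation in grades associated with stress level
ss_within <- 776.7  # residual sum of squares from the ANOVA table
eta_sq <- ss_between / (ss_between + ss_within)

cat("SS between =", round(ss_between, 1), " eta^2 =", round(eta_sq, 2), "\n")
```

An eta-squared of roughly 0.3 would mean stress level accounts for about 30% of the variation in grades, leaving most of it to other factors.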

Conclusions

Major Findings:

  • Stress level and grades are strongly associated, as supported by both the t-test and the ANOVA, whose p-values fall far below conventional significance levels.

  • Multiple factors, such as study hours, sleep duration, and stress levels, are associated with students' grades and welfare.

  • There may be trade-offs between achieving high grades and maintaining mental health.

Practical Implications and Future Directions

Practical Implications: The findings encourage students to manage their stress levels while studying and aiming for high scores. Although higher stress levels are strongly associated with higher grades, excessive stress may harm overall welfare.

Study Limitations: This study does not account for potential confounding or lurking variables, such as each student's socioeconomic background, the courses students take, or motivation levels, which may influence both stress levels and grades.

Future Research Directions:

  • Incorporate objective measures, such as tracked study time and sleep duration, to improve data objectivity.
  • Examine causal relationships through randomized experiments to extend the analysis.

Resource Links

Thank You
