Introduction

This report explores the sleep behaviors and academic and psychological outcomes of college students using the SleepStudy dataset (https://www.lock5stat.com/datasets3e/SleepStudy.csv). With 253 observations and 27 variables, the dataset provides a detailed snapshot of students’ routines, habits, and well-being.

The aim is to analyze relationships between sleep quality, academic success (e.g., GPA), and mental health measures such as stress and depression. To do this, I use a mix of descriptive statistics, hypothesis tests, and visualizations. I propose the following 10 questions based on my own understanding of the data.

  1. Is there a significant difference in the average GPA between male and female college students?

  2. Is there a significant difference in the average number of early classes between the first two class years and other class years?

  3. Do students who identify as “larks” have significantly better cognitive skills (cognition z-score) compared to “owls”?

  4. Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class (EarlyClass=1) and those who didn’t (EarlyClass=0)?

  5. Is there a significant difference in the average happiness level between students with at least moderate depression and normal depression status?

  6. Is there a significant difference in average sleep quality scores between students who reported having at least one all-nighter (AllNighter=1) and those who didn’t (AllNighter=0)?

  7. Do students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use?

  8. Is there a significant difference in the average number of drinks per week between students of different genders?

  9. Is there a significant difference in the average weekday bedtime between students with high and normal stress (Stress=high vs. Stress=normal)?

  10. Is there a significant difference in the average hours of sleep on weekends between students in the first two class years and other students?

Analysis

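Each question below compares two groups, so each is addressed with a Welch two-sample t-test (the default for t.test in R) and a boxplot. A result is called significant when its p-value is below the conventional 0.05 level.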
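For reference, the statistic behind every "Welch Two Sample t-test" output in this section can be written as follows. The symbols are notation introduced here for illustration, not taken from the dataset: the sample means are \bar{x}_1 and \bar{x}_2, the sample variances are s_1^2 and s_2^2, and the group sizes are n_1 and n_2. The second formula is the Welch-Satterthwaite approximation, which is why the reported df values are fractional.

```latex
\[
t = \frac{\bar{x}_1 - \bar{x}_2}
         {\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu \approx
\frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^{2}}
     {\dfrac{\left(s_1^2/n_1\right)^{2}}{n_1 - 1}
      + \dfrac{\left(s_2^2/n_2\right)^{2}}{n_2 - 1}}
\]
```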

# Load the SleepStudy data directly from the Lock5 site
sleep <- read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")
head(sleep)
##   Gender ClassYear LarkOwl NumEarlyClass EarlyClass  GPA ClassesMissed
## 1      0         4 Neither             0          0 3.60             0
## 2      0         4 Neither             2          1 3.24             0
## 3      0         4     Owl             0          0 2.97            12
## 4      0         1    Lark             5          1 3.76             0
## 5      0         4     Owl             0          0 3.20             4
## 6      1         4 Neither             0          0 3.50             0
##   CognitionZscore PoorSleepQuality DepressionScore AnxietyScore StressScore
## 1           -0.26                4               4            3           8
## 2            1.39                6               1            0           3
## 3            0.38               18              18           18           9
## 4            1.39                9               1            4           6
## 5            1.22                9               7           25          14
## 6           -0.04                6              14            8          28
##   DepressionStatus AnxietyStatus Stress DASScore Happiness AlcoholUse Drinks
## 1           normal        normal normal       15        28   Moderate     10
## 2           normal        normal normal        4        25   Moderate      6
## 3         moderate        severe normal       45        17      Light      3
## 4           normal        normal normal       11        32      Light      2
## 5           normal        severe normal       46        15   Moderate      4
## 6         moderate      moderate   high       50        22    Abstain      0
##   WeekdayBed WeekdayRise WeekdaySleep WeekendBed WeekendRise WeekendSleep
## 1      25.75        8.70         7.70      25.75        9.50         5.88
## 2      25.70        8.20         6.80      26.00       10.00         7.25
## 3      27.44        6.55         3.00      28.00       12.59        10.09
## 4      23.50        7.17         6.77      27.00        8.00         7.25
## 5      25.90        8.67         6.09      23.75        9.50         7.00
## 6      23.80        8.95         9.05      26.00       10.75         9.00
##   AverageSleep AllNighter
## 1         7.18          0
## 2         6.93          0
## 3         5.02          0
## 4         6.90          0
## 5         6.35          0
## 6         9.04          0

Q1: Is there a significant difference in the average GPA between male and female college students?

t.test(GPA ~ Gender, data = sleep)
## 
##  Welch Two Sample t-test
## 
## data:  GPA by Gender
## t = 3.9139, df = 200.9, p-value = 0.0001243
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  0.09982254 0.30252780
## sample estimates:
## mean in group 0 mean in group 1 
##        3.324901        3.123725
boxplot(GPA ~ Gender, data = sleep, names = c("Female", "Male"),
        main = "GPA by Gender", ylab = "GPA")

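There is a significant difference between the average GPAs of male and female students (t = 3.91, p = 0.0001). Female students (Gender = 0) average 3.32 compared with 3.12 for male students, a difference of about 0.20 grade points (95% CI: 0.10 to 0.30).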
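Reading the verdict off the raw t.test printout is easy to get wrong, so it can help to pull the key numbers into one row with an explicit significance flag. The sketch below is one possible way to do that. The helper name summarize_ttest, the alpha argument, and the toy data are all hypothetical and are not part of the original analysis; the toy data is there only so the block runs without downloading anything.

```r
# Hypothetical helper (not part of the original analysis): run a Welch
# t-test and return the key numbers plus an explicit significance flag.
summarize_ttest <- function(formula, data, alpha = 0.05) {
  res <- t.test(formula, data = data)  # Welch test is the default in R
  data.frame(
    diff        = unname(res$estimate[1] - res$estimate[2]),  # group 1 minus group 2
    ci_lower    = res$conf.int[1],
    ci_upper    = res$conf.int[2],
    p_value     = res$p.value,
    significant = res$p.value < alpha
  )
}

# Toy data for illustration only. With the report's data the call would be
#   summarize_ttest(GPA ~ Gender, data = sleep)
set.seed(1)
toy <- data.frame(
  y = c(rnorm(30, mean = 0), rnorm(30, mean = 2)),
  g = rep(c("a", "b"), each = 30)
)
toy_summary <- summarize_ttest(y ~ g, data = toy)
print(toy_summary)
```

The same call could be repeated for each of the ten questions, which keeps the written interpretation tied to the p-value and the confidence interval rather than to a visual reading of the boxplot.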

Q2: Is there a significant difference in the average number of early classes between the first two class years and other class years?

sleep$ClassGroup <- ifelse(sleep$ClassYear <= 2, "Lower", "Upper")
t.test(NumEarlyClass ~ ClassGroup, data = sleep)
## 
##  Welch Two Sample t-test
## 
## data:  NumEarlyClass by ClassGroup
## t = 4.1813, df = 250.69, p-value = 4.009e-05
## alternative hypothesis: true difference in means between group Lower and group Upper is not equal to 0
## 95 percent confidence interval:
##  0.4042016 1.1240309
## sample estimates:
## mean in group Lower mean in group Upper 
##            2.070423            1.306306
boxplot(NumEarlyClass ~ ClassGroup, data = sleep,
        main = "Early Classes by Year", ylab = "Number of Early Classes")

There is a significant difference (t = 4.18, p < 0.001). Students in the first two class years average 2.07 early classes compared with 1.31 for students in later years, a difference of about 0.76 classes (95% CI: 0.40 to 1.12).

Q3: Do students who identify as “larks” have significantly better cognitive skills (cognition z-score) compared to “owls”?

subset_data <- subset(sleep, LarkOwl %in% c("Lark", "Owl"))
t.test(CognitionZscore ~ LarkOwl, data = subset_data)
## 
##  Welch Two Sample t-test
## 
## data:  CognitionZscore by LarkOwl
## t = 0.80571, df = 75.331, p-value = 0.4229
## alternative hypothesis: true difference in means between group Lark and group Owl is not equal to 0
## 95 percent confidence interval:
##  -0.1893561  0.4465786
## sample estimates:
## mean in group Lark  mean in group Owl 
##         0.09024390        -0.03836735
boxplot(CognitionZscore ~ LarkOwl, data = subset_data,
        main = "Cognition Score: Larks vs Owls")

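The difference is not significant (t = 0.81, p = 0.42). Larks have a slightly higher mean cognition z-score than owls in this sample (0.09 vs. -0.04), but the 95% confidence interval for the difference (-0.19 to 0.45) includes zero, so the data do not show that larks have better cognitive skills.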
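Q3 asks whether larks score better than owls, which is a one-sided question, while the t.test call above uses the default two-sided alternative. As a rough sketch, the one-sided p-value can be recovered from the t statistic and degrees of freedom printed above. The variable names are introduced here for illustration, and the inputs are the rounded values from the output, so the result is approximate.

```r
# Values copied from the Q3 output above (rounded as printed)
t_stat <- 0.80571
df     <- 75.331

# Upper-tail probability: evidence for "Lark mean > Owl mean"
p_one_sided <- pt(t_stat, df = df, lower.tail = FALSE)

# Doubling it recovers the two-sided p-value reported by t.test
p_two_sided <- 2 * p_one_sided

print(round(c(one_sided = p_one_sided, two_sided = p_two_sided), 4))

# The equivalent direct call on the data would be:
#   t.test(CognitionZscore ~ LarkOwl, data = subset_data,
#          alternative = "greater")
```

Because the observed t statistic is positive, the one-sided p-value is half the two-sided one (about 0.21), which still does not indicate a significant advantage for larks.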

Q4: Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class (EarlyClass=1) and those who didn’t (EarlyClass=0)?

t.test(ClassesMissed ~ EarlyClass, data = sleep)
## 
##  Welch Two Sample t-test
## 
## data:  ClassesMissed by EarlyClass
## t = 1.4755, df = 152.78, p-value = 0.1421
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.2233558  1.5412830
## sample estimates:
## mean in group 0 mean in group 1 
##        2.647059        1.988095
boxplot(ClassesMissed ~ EarlyClass, data = sleep,
        names = c("No Early", "Has Early"), main = "Classes Missed by Early Class")

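The difference is not significant (t = 1.48, p = 0.14). Students without an early class missed slightly more classes on average than students with one (2.65 vs. 1.99), but the 95% confidence interval for the difference (-0.22 to 1.54) includes zero, so the data do not show that having an early class is associated with the number of classes missed.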

Q5: Is there a significant difference in the average happiness level between students with at least moderate depression and normal depression status?

subset_data <- subset(sleep, DepressionStatus %in% c("normal", "moderate"))
subset_data$DepressionStatus <- factor(subset_data$DepressionStatus)
t.test(Happiness ~ DepressionStatus, data = subset_data)
## 
##  Welch Two Sample t-test
## 
## data:  Happiness by DepressionStatus
## t = -4.3253, df = 43.992, p-value = 8.616e-05
## alternative hypothesis: true difference in means between group moderate and group normal is not equal to 0
## 95 percent confidence interval:
##  -5.818614 -2.119748
## sample estimates:
## mean in group moderate   mean in group normal 
##               23.08824               27.05742
boxplot(Happiness ~ DepressionStatus, data = subset_data,
        main = "Happiness by Depression Status")

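Students with moderate depression reported significantly lower happiness than students with normal depression status (23.09 vs. 27.06; t = -4.33, p < 0.001; 95% CI for the difference: -5.82 to -2.12). Note that the subset above keeps only the “normal” and “moderate” groups, so any students in a more severe category are excluded rather than counted as “at least moderate.”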
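Q5 is worded as "at least moderate" versus normal, whereas the subset above compares only the moderate and normal groups. One way to match the wording would be to collapse every non-normal status into a single group before testing. The sketch below shows that recoding on a small made-up vector. It assumes DepressionStatus also contains a level such as "severe", which the head() output does not show for this column, and the names status and dep_group are hypothetical.

```r
# Made-up status values for illustration; in the report this would be
# sleep$DepressionStatus (assumed to include a "severe" level)
status <- c("normal", "moderate", "severe", "normal", "moderate")

# Collapse everything that is not "normal" into one group
dep_group <- factor(
  ifelse(status == "normal", "normal", "at_least_moderate"),
  levels = c("normal", "at_least_moderate")
)
print(table(dep_group))

# With the report's data, the comparison could then be run as:
#   sleep$DepGroup <- factor(
#     ifelse(sleep$DepressionStatus == "normal", "normal", "at_least_moderate"),
#     levels = c("normal", "at_least_moderate"))
#   t.test(Happiness ~ DepGroup, data = sleep)
```

This version would use all 253 students instead of dropping some, so its output would differ from the test shown above.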

Q6: Is there a significant difference in average sleep quality scores between students who reported having at least one all-nighter (AllNighter=1) and those who didn’t (AllNighter=0)?

t.test(PoorSleepQuality ~ AllNighter, data = sleep)
## 
##  Welch Two Sample t-test
## 
## data:  PoorSleepQuality by AllNighter
## t = -1.7068, df = 44.708, p-value = 0.09479
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -1.9456958  0.1608449
## sample estimates:
## mean in group 0 mean in group 1 
##        6.136986        7.029412
boxplot(PoorSleepQuality ~ AllNighter, data = sleep,
        names = c("No All-Nighter", "All-Nighter"),
        main = "Sleep Quality by All-Nighter Status")

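The difference is not significant at the 0.05 level (t = -1.71, p = 0.095). Students who reported at least one all-nighter had a higher mean PoorSleepQuality score, where higher values indicate poorer sleep (7.03 vs. 6.14), but the 95% confidence interval for the difference (-1.95 to 0.16) includes zero.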

Q7: Do students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use?

subset_data <- subset(sleep, AlcoholUse %in% c("Abstain", "Heavy"))
t.test(StressScore ~ AlcoholUse, data = subset_data)
## 
##  Welch Two Sample t-test
## 
## data:  StressScore by AlcoholUse
## t = -0.62604, df = 28.733, p-value = 0.5362
## alternative hypothesis: true difference in means between group Abstain and group Heavy is not equal to 0
## 95 percent confidence interval:
##  -6.261170  3.327346
## sample estimates:
## mean in group Abstain   mean in group Heavy 
##              8.970588             10.437500
boxplot(StressScore ~ AlcoholUse, data = subset_data,
        main = "Stress by Alcohol Use")

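The difference is not significant (t = -0.63, p = 0.54). Students who abstain had a lower mean stress score than heavy drinkers (8.97 vs. 10.44), but the 95% confidence interval for the difference (-6.26 to 3.33) is wide and includes zero, so the data do not show that abstainers have better stress scores.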

Q8: Is there a significant difference in the average number of drinks per week between students of different genders?

t.test(Drinks ~ Gender, data = sleep)
## 
##  Welch Two Sample t-test
## 
## data:  Drinks by Gender
## t = -6.1601, df = 142.75, p-value = 7.002e-09
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -4.360009 -2.241601
## sample estimates:
## mean in group 0 mean in group 1 
##        4.238411        7.539216
boxplot(Drinks ~ Gender, data = sleep,
        names = c("Female", "Male"), main = "Alcohol Consumption by Gender")

Male students reported significantly more drinks per week than female students (7.54 vs. 4.24; t = -6.16, p < 0.001), a difference of about 3.3 drinks (95% CI: 2.24 to 4.36).

Q9: Is there a significant difference in the average weekday bedtime between students with high and normal stress (Stress=high vs. Stress=normal)?

t.test(WeekdayBed ~ Stress, data = sleep)
## 
##  Welch Two Sample t-test
## 
## data:  WeekdayBed by Stress
## t = -1.0746, df = 87.048, p-value = 0.2855
## alternative hypothesis: true difference in means between group high and group normal is not equal to 0
## 95 percent confidence interval:
##  -0.4856597  0.1447968
## sample estimates:
##   mean in group high mean in group normal 
##             24.71500             24.88543
boxplot(WeekdayBed ~ Stress, data = sleep,
        main = "Weekday Bedtime by Stress Level")

The difference is not significant (t = -1.07, p = 0.29). Students with normal stress go to bed slightly later on average than students with high stress (24.89 vs. 24.72), but the 95% confidence interval for the difference (-0.49 to 0.14) includes zero.
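The bedtime means are easier to interpret as clock times. The sketch below assumes WeekdayBed is recorded in decimal hours, with values above 24 meaning after midnight; that reading is suggested by values such as 25.75 in the head() output but is not stated in this report. The helper name decimal_to_clock is hypothetical.

```r
# Hypothetical helper: convert decimal hours (24 = midnight, assumed coding)
# to "HH:MM" clock strings, rounding to the nearest minute
decimal_to_clock <- function(h) {
  total_min <- round((h %% 24) * 60)
  sprintf("%02d:%02d", (total_min %/% 60) %% 24, total_min %% 60)
}

# Group means copied from the Q9 output above
bed_means <- c(high = 24.71500, normal = 24.88543)
clock_times <- decimal_to_clock(bed_means)
print(clock_times)  # "00:43" "00:53"

# Gap between the two means, in minutes
gap_minutes <- unname(diff(bed_means)) * 60
print(round(gap_minutes, 1))  # 10.2
```

Under that assumption, the two groups go to bed about 10 minutes apart on average, at roughly 12:43 a.m. and 12:53 a.m.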

Q10: Is there a significant difference in the average hours of sleep on weekends between students in the first two class years and other students?

t.test(WeekendSleep ~ ClassGroup, data = sleep)
## 
##  Welch Two Sample t-test
## 
## data:  WeekendSleep by ClassGroup
## t = -0.047888, df = 237.36, p-value = 0.9618
## alternative hypothesis: true difference in means between group Lower and group Upper is not equal to 0
## 95 percent confidence interval:
##  -0.3497614  0.3331607
## sample estimates:
## mean in group Lower mean in group Upper 
##            8.213592            8.221892
boxplot(WeekendSleep ~ ClassGroup, data = sleep,
        main = "Weekend Sleep by Class Year Group")

There was no statistically significant difference in the average hours of weekend sleep between lower-year and upper-year college students. Both groups appear to get a similar amount of sleep on weekends.

Summary

This report used the SleepStudy data to explore relationships between the sleep habits, academic outcomes, and psychological well-being of college students. Using Welch t-tests and boxplots, I addressed the ten research questions proposed earlier. The major findings are summarized below.

Q1. GPA and Gender: Female students had a significantly higher average GPA than male students (3.32 vs. 3.12).

Q2. Early Classes and Class Year: Students in the first two class years had significantly more early classes on average than students in later years (2.07 vs. 1.31).

Q3. Cognitive Skills: Larks vs Owls: Larks had a slightly higher mean cognition z-score than owls, but the difference was not significant.

Q4. Classes Missed and Early Class Attendance: The number of classes missed did not differ significantly.

Q5. Happiness and Depression Status: Students with moderate depression reported significantly lower happiness levels than those with normal depression.

Q6. Sleep Quality and All-Nighters: Students who pulled at least one all-nighter had a somewhat higher mean PoorSleepQuality score, but the difference was not significant at the 0.05 level (p = 0.095).

Q7. Stress and Alcohol Use: Students who abstained from alcohol had a lower mean stress score than heavy drinkers, but the difference was not significant (p = 0.54).

Q8. Drinking and Gender: Males consumed significantly more alcoholic drinks per week than females.

Q9. Weekday Bedtime and Stress Level: No significant difference was found.

Q10. Weekend Sleep and Class Year: Students across class years slept similar hours on weekends.

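In this sample, the clearest group differences involve gender (GPA and drinks per week), class year (number of early classes), and depression status (happiness). The comparisons involving larks and owls, early classes and attendance, all-nighters and sleep quality, alcohol use and stress, stress level and bedtime, and class year and weekend sleep did not reach significance, so the data do not support firm conclusions about them. Because these are observational comparisons, the significant results describe associations rather than causes.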
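Because ten tests were run on the same dataset, one optional check is whether the conclusions survive a multiple-comparison adjustment. The sketch below applies the Holm method through R's p.adjust to the ten p-values printed in the Analysis section. The choice of Holm and the 0.05 cutoff are assumptions made here for illustration and were not part of the original analysis.

```r
# p-values copied from the t.test outputs for Q1 to Q10
p_values <- c(Q1 = 0.0001243, Q2 = 4.009e-05, Q3 = 0.4229, Q4 = 0.1421,
              Q5 = 8.616e-05, Q6 = 0.09479,   Q7 = 0.5362, Q8 = 7.002e-09,
              Q9 = 0.2855,    Q10 = 0.9618)

# Holm adjustment controls the family-wise error rate across the ten tests
p_holm <- p.adjust(p_values, method = "holm")
print(round(p_holm, 4))

# Questions that remain significant after adjustment
significant <- names(p_holm)[p_holm < 0.05]
print(significant)
```

Under these assumptions, the same four questions (Q1, Q2, Q5, and Q8) remain significant after adjustment.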