This report explores the sleep behaviors and academic/psychological outcomes of college students using the data from: https://www.lock5stat.com/datasets3e/SleepStudy.csv. With 253 observations and 27 variables, the dataset provides a detailed snapshot of students’ routines, habits, and well-being.
The aim is to analyze relationships between sleep quality, academic success (e.g. GPA), and mental health measures such as stress and depression. To do this, we use descriptive statistics, hypothesis tests, and visualizations. I propose the following 10 questions based on my own understanding of the data.
Is there a significant difference in the average GPA between male and female college students?
Is there a significant difference in the average number of early classes between the first two class years and other class years?
Do students who identify as “larks” have significantly better cognitive skills (cognition z-score) compared to “owls”?
Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class (EarlyClass=1) and those who didn’t (EarlyClass=0)?
Is there a significant difference in the average happiness level between students with at least moderate depression and students with normal depression status?
Is there a significant difference in average sleep quality scores between students who reported having at least one all-nighter (AllNighter=1) and those who didn’t (AllNighter=0)?
Do students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use?
Is there a significant difference in the average number of drinks per week between students of different genders?
Is there a significant difference in the average weekday bedtime between students with high and normal stress (Stress=high vs. Stress=normal)?
Is there a significant difference in the average hours of sleep on weekends between students in the first two class years and other students?
We will explore the questions in detail.
# Load the SleepStudy data directly from the Lock5 site
sleep <- read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")
head(sleep)
## Gender ClassYear LarkOwl NumEarlyClass EarlyClass GPA ClassesMissed
## 1 0 4 Neither 0 0 3.60 0
## 2 0 4 Neither 2 1 3.24 0
## 3 0 4 Owl 0 0 2.97 12
## 4 0 1 Lark 5 1 3.76 0
## 5 0 4 Owl 0 0 3.20 4
## 6 1 4 Neither 0 0 3.50 0
## CognitionZscore PoorSleepQuality DepressionScore AnxietyScore StressScore
## 1 -0.26 4 4 3 8
## 2 1.39 6 1 0 3
## 3 0.38 18 18 18 9
## 4 1.39 9 1 4 6
## 5 1.22 9 7 25 14
## 6 -0.04 6 14 8 28
## DepressionStatus AnxietyStatus Stress DASScore Happiness AlcoholUse Drinks
## 1 normal normal normal 15 28 Moderate 10
## 2 normal normal normal 4 25 Moderate 6
## 3 moderate severe normal 45 17 Light 3
## 4 normal normal normal 11 32 Light 2
## 5 normal severe normal 46 15 Moderate 4
## 6 moderate moderate high 50 22 Abstain 0
## WeekdayBed WeekdayRise WeekdaySleep WeekendBed WeekendRise WeekendSleep
## 1 25.75 8.70 7.70 25.75 9.50 5.88
## 2 25.70 8.20 6.80 26.00 10.00 7.25
## 3 27.44 6.55 3.00 28.00 12.59 10.09
## 4 23.50 7.17 6.77 27.00 8.00 7.25
## 5 25.90 8.67 6.09 23.75 9.50 7.00
## 6 23.80 8.95 9.05 26.00 10.75 9.00
## AverageSleep AllNighter
## 1 7.18 0
## 2 6.93 0
## 3 5.02 0
## 4 6.90 0
## 5 6.35 0
## 6 9.04 0
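Gender is stored as 0/1, and the boxplots in this report label 0 as Female and 1 as Male. A minimal sketch of recoding such a column into a labeled factor, so that test output and plots show group names directly, might look like the following. The data frame `toy` and the column `GenderLabel` are hypothetical names used only for illustration, the values are made up, and the 0 = Female mapping is taken from the plot labels used in this report.

```r
# Toy stand-in for the Gender and GPA columns (hypothetical values, not the real data)
toy <- data.frame(Gender = c(0, 1, 0, 1, 1),
                  GPA    = c(3.6, 3.1, 3.4, 2.9, 3.3))

# Assumed coding, following this report's plot labels: 0 = Female, 1 = Male
toy$GenderLabel <- factor(toy$Gender, levels = c(0, 1),
                          labels = c("Female", "Male"))

# Group counts now carry readable names instead of 0/1
table(toy$GenderLabel)
```

Applied to the real data, the same `factor()` call on `sleep$Gender` would let the tests report "Female" and "Male" rather than "group 0" and "group 1".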
t.test(GPA ~ Gender, data = sleep)
##
## Welch Two Sample t-test
##
## data: GPA by Gender
## t = 3.9139, df = 200.9, p-value = 0.0001243
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## 0.09982254 0.30252780
## sample estimates:
## mean in group 0 mean in group 1
## 3.324901 3.123725
boxplot(GPA ~ Gender, data = sleep, names = c("Female", "Male"),
main = "GPA by Gender", ylab = "GPA")
There is a statistically significant difference between the mean GPAs of
male and female students (p = 0.0001). Women have a higher GPA than men
on average (3.32 vs. 3.12), a difference of about 0.2 grade points.
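As a check on how the pieces of the t-test output fit together, the 95 percent confidence interval can be rebuilt from the reported group means, t statistic, and degrees of freedom. The sketch below uses only numbers copied from the output above; the variable names are illustrative, and small discrepancies from rounding in the printed output are expected.

```r
# Values copied from the Welch t-test output for GPA by Gender
mean_g0 <- 3.324901   # mean GPA in group 0
mean_g1 <- 3.123725   # mean GPA in group 1
t_stat  <- 3.9139
dof     <- 200.9      # Welch degrees of freedom

diff_means <- mean_g0 - mean_g1        # estimated difference in means
se         <- diff_means / t_stat      # standard error implied by t = diff / se
crit       <- qt(0.975, dof)           # two-sided 95% critical value

# Interval = estimate plus/minus critical value times standard error
ci <- diff_means + c(-1, 1) * crit * se
round(ci, 3)
```

The result should agree with the reported interval of roughly 0.10 to 0.30. Because the interval excludes 0, it leads to the same conclusion as the p-value.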
sleep$ClassGroup <- ifelse(sleep$ClassYear <= 2, "Lower", "Upper")
t.test(NumEarlyClass ~ ClassGroup, data = sleep)
##
## Welch Two Sample t-test
##
## data: NumEarlyClass by ClassGroup
## t = 4.1813, df = 250.69, p-value = 4.009e-05
## alternative hypothesis: true difference in means between group Lower and group Upper is not equal to 0
## 95 percent confidence interval:
## 0.4042016 1.1240309
## sample estimates:
## mean in group Lower mean in group Upper
## 2.070423 1.306306
boxplot(NumEarlyClass ~ ClassGroup, data = sleep,
main = "Early Classes by Year", ylab = "Number of Early Classes")
There is a statistically significant difference (p < 0.001): students in
the first two class years take about 0.76 more early classes on average
than students in later years (2.07 vs. 1.31).
subset_data <- subset(sleep, LarkOwl %in% c("Lark", "Owl"))
t.test(CognitionZscore ~ LarkOwl, data = subset_data)
##
## Welch Two Sample t-test
##
## data: CognitionZscore by LarkOwl
## t = 0.80571, df = 75.331, p-value = 0.4229
## alternative hypothesis: true difference in means between group Lark and group Owl is not equal to 0
## 95 percent confidence interval:
## -0.1893561 0.4465786
## sample estimates:
## mean in group Lark mean in group Owl
## 0.09024390 -0.03836735
boxplot(CognitionZscore ~ LarkOwl, data = subset_data,
main = "Cognition Score: Larks vs Owls")
Larks have a slightly higher mean cognition z-score than owls (0.09 vs.
-0.04), but the difference is not statistically significant (p = 0.423).
t.test(ClassesMissed ~ EarlyClass, data = sleep)
##
## Welch Two Sample t-test
##
## data: ClassesMissed by EarlyClass
## t = 1.4755, df = 152.78, p-value = 0.1421
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -0.2233558 1.5412830
## sample estimates:
## mean in group 0 mean in group 1
## 2.647059 1.988095
boxplot(ClassesMissed ~ EarlyClass, data = sleep,
names = c("No Early", "Has Early"), main = "Classes Missed by Early Class")
Unexpectedly, students without an early class missed slightly more
classes on average (2.65 vs. 1.99), but the difference is not
statistically significant (p = 0.142).
subset_data <- subset(sleep, DepressionStatus %in% c("normal", "moderate"))
subset_data$DepressionStatus <- factor(subset_data$DepressionStatus)
t.test(Happiness ~ DepressionStatus, data = subset_data)
##
## Welch Two Sample t-test
##
## data: Happiness by DepressionStatus
## t = -4.3253, df = 43.992, p-value = 8.616e-05
## alternative hypothesis: true difference in means between group moderate and group normal is not equal to 0
## 95 percent confidence interval:
## -5.818614 -2.119748
## sample estimates:
## mean in group moderate mean in group normal
## 23.08824 27.05742
boxplot(Happiness ~ DepressionStatus, data = subset_data,
main = "Happiness by Depression Status")
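Students with moderate depression reported significantly lower happiness
levels than those classified as having normal depression status (23.09
vs. 27.06, p < 0.001). Note that the subset keeps only the moderate and
normal groups, so any students classified as severe are excluded.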
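The question asks about students with at least moderate depression, while the subset above keeps only the normal and moderate levels. One way to match the question more closely would be to collapse every non-normal level into a single group before testing. The sketch below illustrates that recoding on a small made-up data frame; `toy` and `DepressionGroup` are hypothetical names, the values are not from the study, and the results in this report were not produced this way.

```r
# Toy stand-in (hypothetical values, not the real data)
toy <- data.frame(
  DepressionStatus = c("normal", "moderate", "severe",
                       "normal", "moderate", "normal"),
  Happiness        = c(28, 22, 18, 30, 24, 27)
)

# Collapse every non-normal level into one "at least moderate" group
toy$DepressionGroup <- ifelse(toy$DepressionStatus == "normal",
                              "normal", "at least moderate")

# Check the resulting group sizes
table(toy$DepressionGroup)

# Same Welch test as above, now on the two-level grouping
t.test(Happiness ~ DepressionGroup, data = toy)
```

Running the same two steps on `sleep` would include the severe group in the comparison instead of dropping it.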
t.test(PoorSleepQuality ~ AllNighter, data = sleep)
##
## Welch Two Sample t-test
##
## data: PoorSleepQuality by AllNighter
## t = -1.7068, df = 44.708, p-value = 0.09479
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -1.9456958 0.1608449
## sample estimates:
## mean in group 0 mean in group 1
## 6.136986 7.029412
boxplot(PoorSleepQuality ~ AllNighter, data = sleep,
names = c("No All-Nighter", "All-Nighter"),
main = "Sleep Quality by All-Nighter Status")
Students who reported at least one all-nighter during the semester had a
higher mean PoorSleepQuality score (7.03 vs. 6.14, where higher means
worse sleep), but the difference is not statistically significant at the
0.05 level (p = 0.095).
subset_data <- subset(sleep, AlcoholUse %in% c("Abstain", "Heavy"))
t.test(StressScore ~ AlcoholUse, data = subset_data)
##
## Welch Two Sample t-test
##
## data: StressScore by AlcoholUse
## t = -0.62604, df = 28.733, p-value = 0.5362
## alternative hypothesis: true difference in means between group Abstain and group Heavy is not equal to 0
## 95 percent confidence interval:
## -6.261170 3.327346
## sample estimates:
## mean in group Abstain mean in group Heavy
## 8.970588 10.437500
boxplot(StressScore ~ AlcoholUse, data = subset_data,
main = "Stress by Alcohol Use")
Students who abstain from alcohol have a lower mean stress score than
heavy alcohol users (8.97 vs. 10.44), but the difference is not
statistically significant (p = 0.536).
t.test(Drinks ~ Gender, data = sleep)
##
## Welch Two Sample t-test
##
## data: Drinks by Gender
## t = -6.1601, df = 142.75, p-value = 7.002e-09
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -4.360009 -2.241601
## sample estimates:
## mean in group 0 mean in group 1
## 4.238411 7.539216
boxplot(Drinks ~ Gender, data = sleep,
names = c("Female", "Male"), main = "Alcohol Consumption by Gender")
Male students reported significantly higher average alcohol consumption
per week compared to female students.
t.test(WeekdayBed ~ Stress, data = sleep)
##
## Welch Two Sample t-test
##
## data: WeekdayBed by Stress
## t = -1.0746, df = 87.048, p-value = 0.2855
## alternative hypothesis: true difference in means between group high and group normal is not equal to 0
## 95 percent confidence interval:
## -0.4856597 0.1447968
## sample estimates:
## mean in group high mean in group normal
## 24.71500 24.88543
boxplot(WeekdayBed ~ Stress, data = sleep,
main = "Weekday Bedtime by Stress Level")
Although students with normal stress levels go to bed slightly later on
weekdays than those with high stress (24.89 vs. 24.72, about 10
minutes), the difference is not statistically significant (p = 0.286).
t.test(WeekendSleep ~ ClassGroup, data = sleep)
##
## Welch Two Sample t-test
##
## data: WeekendSleep by ClassGroup
## t = -0.047888, df = 237.36, p-value = 0.9618
## alternative hypothesis: true difference in means between group Lower and group Upper is not equal to 0
## 95 percent confidence interval:
## -0.3497614 0.3331607
## sample estimates:
## mean in group Lower mean in group Upper
## 8.213592 8.221892
boxplot(WeekendSleep ~ ClassGroup, data = sleep,
main = "Weekend Sleep by Class Year Group")
There was no statistically significant difference in the average hours
of weekend sleep between lower-year and upper-year college students.
Both groups appear to get a similar amount of sleep on weekends.
This report examined relationships between the sleep habits, academic performance, and psychological well-being of college students. Using Welch two-sample t-tests and boxplots, we addressed the ten research questions proposed earlier. The major findings are summarized below.
Q1. GPA and Gender: Women had a significantly higher average GPA than men (3.32 vs. 3.12).
Q2. Early Classes and Class Year: Students in the first two class years took significantly more early classes than students in later years (about 0.76 more on average).
Q3. Cognitive Skills: Larks vs Owls: Larks had slightly higher cognition scores, but the difference was not significant.
Q4. Classes Missed and Early Class Attendance: The number of classes missed did not differ significantly.
Q5. Happiness and Depression Status: Students with moderate depression reported significantly lower happiness levels than those with normal depression.
Q6. Sleep Quality and All-Nighters: Students who pulled at least one all-nighter reported somewhat poorer sleep quality, but the difference was not significant at the 0.05 level (p = 0.095).
Q7. Stress and Alcohol Use: Students who abstained from alcohol had lower average stress scores than heavy drinkers, but the difference was not significant (p = 0.536).
Q8. Drinking and Gender: Males consumed significantly more alcoholic drinks per week than females.
Q9. Weekday Bedtime and Stress Level: No meaningful difference was found.
Q10. Weekend Sleep and Class Year: Students across class years slept similar hours on weekends.
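The significance calls in this summary can be cross-checked from the test outputs alone. The sketch below recomputes each two-sided p-value from the reported t statistic and Welch degrees of freedom and flags the tests that fall below a significance level of 0.05. That threshold is an assumption, since the report does not state one explicitly, and `results` is an illustrative name.

```r
# t statistics and Welch degrees of freedom copied from the ten outputs above
results <- data.frame(
  question = paste0("Q", 1:10),
  t  = c(3.9139, 4.1813, 0.80571, 1.4755, -4.3253,
         -1.7068, -0.62604, -6.1601, -1.0746, -0.047888),
  df = c(200.9, 250.69, 75.331, 152.78, 43.992,
         44.708, 28.733, 142.75, 87.048, 237.36)
)

alpha <- 0.05  # assumed significance level

# Two-sided p-value from the t distribution
results$p <- 2 * pt(-abs(results$t), results$df)

# Flag the tests that reject the null hypothesis at the assumed level
results$significant <- results$p < alpha

print(results, digits = 3)
```

Under that threshold, only Q1, Q2, Q5, and Q8 show a statistically significant difference.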
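These findings point to the role of psychological health and behavioral habits in students’ well-being. Depression status was clearly tied to happiness, gender was associated with both GPA and weekly alcohol consumption, and students in the first two class years took more early classes. By contrast, the comparisons involving chronotype, early-class attendance, all-nighters, alcohol use and stress, stress level and bedtime, and class year and weekend sleep showed no statistically significant differences, so they do not support firm conclusions.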