Introduction

This report presents an analysis of sleep patterns among college students using the SleepStudy dataset. The dataset contains 253 observations on 27 variables, including sleep habits, academic performance, mental health, and lifestyle factors. The goal of this project is to explore how sleep and related behaviors relate to GPA, mood, stress, and daily routines. By comparing groups of students (e.g., by gender, depression status, or alcohol use), we can see whether certain factors are associated with better or worse outcomes.

In this report, I will focus on the following research questions:

1. Is there a significant difference in the average GPA between male and female college students?
2. Is there a significant difference in the average number of early classes between the first two class years and other class years?
3. Do students who identify as “larks” have significantly better cognitive skills (CognitionZscore) compared to “owls”?
4. Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class (EarlyClass = 1) and those who didn’t (EarlyClass = 0)?
5. Is there a significant difference in the average happiness level between students with at least moderate depression and those with normal depression status?
6. Is there a significant difference in average sleep quality scores between students who reported having at least one all-nighter (AllNighter = 1) and those who didn’t (AllNighter = 0)?
7. Do students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use?
8. Is there a significant difference in the average number of drinks per week between students of different genders?
9. Is there a significant difference in the average weekday bedtime between students with high stress (Stress = High) and those with normal stress (Stress = Normal)?
10. Is there a significant difference in the average hours of sleep on weekends between first two year students and other students?

Analysis

# Load the SleepStudy data from the Lock5 site and preview the first rows
SleepStudy <- read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")
head(SleepStudy)
##   Gender ClassYear LarkOwl NumEarlyClass EarlyClass  GPA ClassesMissed
## 1      0         4 Neither             0          0 3.60             0
## 2      0         4 Neither             2          1 3.24             0
## 3      0         4     Owl             0          0 2.97            12
## 4      0         1    Lark             5          1 3.76             0
## 5      0         4     Owl             0          0 3.20             4
## 6      1         4 Neither             0          0 3.50             0
##   CognitionZscore PoorSleepQuality DepressionScore AnxietyScore StressScore
## 1           -0.26                4               4            3           8
## 2            1.39                6               1            0           3
## 3            0.38               18              18           18           9
## 4            1.39                9               1            4           6
## 5            1.22                9               7           25          14
## 6           -0.04                6              14            8          28
##   DepressionStatus AnxietyStatus Stress DASScore Happiness AlcoholUse Drinks
## 1           normal        normal normal       15        28   Moderate     10
## 2           normal        normal normal        4        25   Moderate      6
## 3         moderate        severe normal       45        17      Light      3
## 4           normal        normal normal       11        32      Light      2
## 5           normal        severe normal       46        15   Moderate      4
## 6         moderate      moderate   high       50        22    Abstain      0
##   WeekdayBed WeekdayRise WeekdaySleep WeekendBed WeekendRise WeekendSleep
## 1      25.75        8.70         7.70      25.75        9.50         5.88
## 2      25.70        8.20         6.80      26.00       10.00         7.25
## 3      27.44        6.55         3.00      28.00       12.59        10.09
## 4      23.50        7.17         6.77      27.00        8.00         7.25
## 5      25.90        8.67         6.09      23.75        9.50         7.00
## 6      23.80        8.95         9.05      26.00       10.75         9.00
##   AverageSleep AllNighter
## 1         7.18          0
## 2         6.93          0
## 3         5.02          0
## 4         6.90          0
## 5         6.35          0
## 6         9.04          0

Q1: Is there a significant difference in the average GPA between male and female college students?

t.test(GPA ~ Gender, data = SleepStudy, alternative = "two.sided")
## 
##  Welch Two Sample t-test
## 
## data:  GPA by Gender
## t = 3.9139, df = 200.9, p-value = 0.0001243
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  0.09982254 0.30252780
## sample estimates:
## mean in group 0 mean in group 1 
##        3.324901        3.123725
boxplot(GPA ~ Gender,
        data = SleepStudy,
        xlab = "Gender (0 = Female, 1 = Male)",
        ylab = "GPA",
        main = "GPA by Gender",
        col = c("pink", "lightblue"))

For this question, I ran a two-sample t-test to compare the average GPA between female students (Gender = 0) and male students (Gender = 1). The results show a t-value of 3.91 with about 200.9 degrees of freedom and a p-value of 0.0001243. Since this p-value is well below 0.05, the difference in mean GPA between the two groups is statistically significant. The 95% confidence interval for the difference in means ranges from approximately 0.0998 to 0.3025, which does not include zero. Based on the sample means (3.3249 for female students and 3.1237 for male students), female students had slightly higher GPAs on average.
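To make the reported t-value and the fractional degrees of freedom less of a black box, here is a minimal sketch of how a Welch test arrives at them. It uses simulated toy data (the `toy` data frame and its values are hypothetical, not SleepStudy values) and checks the hand computation against `t.test()`.

```r
# Hypothetical toy data, NOT taken from SleepStudy
set.seed(1)
toy <- data.frame(
  Gender = rep(c(0, 1), times = c(30, 20)),
  GPA    = c(rnorm(30, mean = 3.3, sd = 0.4),
             rnorm(20, mean = 3.1, sd = 0.4))
)

x <- toy$GPA[toy$Gender == 0]
y <- toy$GPA[toy$Gender == 1]

# Squared standard error of each group mean (variances are not pooled)
se2_x <- var(x) / length(x)
se2_y <- var(y) / length(y)

# Welch t statistic: difference in means over the combined standard error
t_manual <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)

# Welch-Satterthwaite approximation, which is why df is not a whole number
df_manual <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

res <- t.test(GPA ~ Gender, data = toy)

# Both comparisons should report TRUE
print(all.equal(unname(res$statistic), t_manual))
print(all.equal(unname(res$parameter), df_manual))
```

The same arithmetic applies to every test in this report, since `t.test()` uses the Welch version by default.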

Q2: Is there a significant difference in the average number of early classes between the first two class years and other class years?

# Group ClassYear 1-2 as "Lower" and every other class year as "Upper"
SleepStudy$YearGroup <- ifelse(SleepStudy$ClassYear %in% c(1, 2), "Lower", "Upper")
t.test(NumEarlyClass ~ YearGroup, data = SleepStudy)
## 
##  Welch Two Sample t-test
## 
## data:  NumEarlyClass by YearGroup
## t = 4.1813, df = 250.69, p-value = 4.009e-05
## alternative hypothesis: true difference in means between group Lower and group Upper is not equal to 0
## 95 percent confidence interval:
##  0.4042016 1.1240309
## sample estimates:
## mean in group Lower mean in group Upper 
##            2.070423            1.306306

For this question, I compared the average number of early classes (before 9 AM) between lower-level students (ClassYear 1–2) and upper-level students (ClassYear 3–4). I created a new variable called YearGroup to separate the two groups and ran a two-sample t-test. The results showed a t-value of 4.18 with 250.69 degrees of freedom and a p-value of 0.00004009. Since the p-value is far below 0.05, the difference is statistically significant. The 95% confidence interval for the difference in means ranges from about 0.404 to 1.124, which means lower-level students had noticeably more early classes on average. The sample means show this clearly: lower-level students averaged 2.07 early classes per week, while upper-level students averaged about 1.31.
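One detail of the recoding step is worth checking: `%in%` returns FALSE for missing values, so any student with a missing ClassYear would be placed in the "Upper" group. The sketch below shows this on a hypothetical vector (`class_year` is made up for illustration, and whether SleepStudy actually contains missing class years is not something this report checked) along with one way to keep missing values as NA.

```r
# Hypothetical class years, including one missing value
class_year <- c(1, 2, 3, 4, NA)

# Same recoding logic as in the analysis: NA %in% c(1, 2) is FALSE,
# so the missing value ends up labelled "Upper"
year_group <- ifelse(class_year %in% c(1, 2), "Lower", "Upper")
print(year_group)

# Variant that keeps missing class years as NA
year_group_safe <- ifelse(
  is.na(class_year),
  NA_character_,
  ifelse(class_year %in% c(1, 2), "Lower", "Upper")
)

# Cross-tabulate the original and recoded values to confirm the mapping
print(table(class_year, year_group_safe, useNA = "ifany"))
```

Running `table(SleepStudy$ClassYear, SleepStudy$YearGroup, useNA = "ifany")` on the real data would give the same kind of check.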

Q3: Do students who identify as “larks” have significantly better cognitive skills (CognitionZscore) compared to “owls”?

# Keep only larks and owls; students classified as "Neither" are dropped
Sleep_LarkOwl <- subset(SleepStudy, LarkOwl %in% c("Lark", "Owl"))
t.test(CognitionZscore ~ LarkOwl, data = Sleep_LarkOwl)
## 
##  Welch Two Sample t-test
## 
## data:  CognitionZscore by LarkOwl
## t = 0.80571, df = 75.331, p-value = 0.4229
## alternative hypothesis: true difference in means between group Lark and group Owl is not equal to 0
## 95 percent confidence interval:
##  -0.1893561  0.4465786
## sample estimates:
## mean in group Lark  mean in group Owl 
##         0.09024390        -0.03836735

For this question, I compared the cognitive performance of students who identify as “larks” versus “owls” using a two-sample t-test, after restricting the data to those two groups. Although the question is phrased directionally, the test shown is two-sided. The results showed a t-value of 0.8057 with about 75.33 degrees of freedom and a p-value of 0.4229. Since this p-value is greater than 0.05, there is no statistically significant difference in cognition scores between the two groups. The 95% confidence interval (–0.1894 to 0.4466) includes zero, so the data are consistent with no difference between the groups. The sample means show that larks averaged a cognition z-score of 0.0902 while owls averaged –0.0384, but this difference is not large enough to be considered statistically significant.
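Since the question asks whether larks score better than owls, a one-sided test is a natural alternative to the two-sided one above. The sketch below shows how that could be set up with `alternative = "greater"`; it uses simulated toy data (hypothetical values, not SleepStudy), and relies on the fact that R orders the groups alphabetically, so the difference tested is Lark minus Owl.

```r
# Hypothetical toy data, NOT taken from SleepStudy
set.seed(42)
toy <- data.frame(
  LarkOwl = rep(c("Lark", "Owl"), times = c(25, 30)),
  CognitionZscore = c(rnorm(25, mean = 0.1, sd = 0.7),
                      rnorm(30, mean = 0.0, sd = 0.7))
)

# Two-sided test: is there any difference between the groups?
two_sided <- t.test(CognitionZscore ~ LarkOwl, data = toy)

# One-sided test: is the Lark mean greater than the Owl mean?
one_sided <- t.test(CognitionZscore ~ LarkOwl, data = toy,
                    alternative = "greater")

# The one-sided p-value is half the two-sided one when t points in the
# hypothesized direction, and one minus that half otherwise
t_stat   <- unname(two_sided$statistic)
expected <- if (t_stat > 0) two_sided$p.value / 2 else 1 - two_sided$p.value / 2

print(c(two_sided = two_sided$p.value, one_sided = one_sided$p.value))
print(all.equal(one_sided$p.value, expected))
```

Applied to the reported result (t = 0.8057, two-sided p = 0.4229), the same relationship would give a one-sided p-value of about 0.21, so the conclusion would not change.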

Q4: Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class (EarlyClass = 1) and those who didn’t (EarlyClass = 0)?

t.test(ClassesMissed ~ EarlyClass,
       data = SleepStudy,
       alternative = "two.sided")
## 
##  Welch Two Sample t-test
## 
## data:  ClassesMissed by EarlyClass
## t = 1.4755, df = 152.78, p-value = 0.1421
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.2233558  1.5412830
## sample estimates:
## mean in group 0 mean in group 1 
##        2.647059        1.988095
boxplot(ClassesMissed ~ EarlyClass,
        data = SleepStudy,
        xlab = "Early Class (0 = No, 1 = Yes)",
        ylab = "Classes Missed",
        main = "Classes Missed by Early Class Status",
        col = c("lightpink", "lightblue"))

For this question, I compared the average number of classes missed between students who had at least one early class (EarlyClass = 1) and those who did not (EarlyClass = 0). I ran a two-sample t-test, which produced a t-value of 1.4755 with about 152.78 degrees of freedom and a p-value of 0.1421. Since this p-value is greater than 0.05, the difference between the two groups is not statistically significant. The 95% confidence interval ranged from –0.2234 to 1.5413, which includes zero, so a true difference of zero cannot be ruled out. Even though the sample means suggest that students without early classes missed slightly more classes on average (2.65 vs. 1.99), this difference isn’t strong enough to be statistically significant. I also added a box plot for this question to give a better visual representation of the data.
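The numbers quoted in these write-ups (t-value, degrees of freedom, p-value, confidence interval) can be pulled directly from the object that `t.test()` returns instead of being copied by hand. The following is a small sketch on simulated toy data (hypothetical counts, not SleepStudy values); the `report` string format is just one possible layout.

```r
# Hypothetical toy data, NOT taken from SleepStudy
set.seed(7)
toy <- data.frame(
  EarlyClass    = rep(c(0, 1), times = c(20, 40)),
  ClassesMissed = c(rpois(20, lambda = 2.6), rpois(40, lambda = 2.0))
)

res <- t.test(ClassesMissed ~ EarlyClass, data = toy)

# Components of the returned "htest" object:
#   statistic = t-value, parameter = degrees of freedom,
#   p.value, conf.int = confidence interval, estimate = group means
report <- sprintf(
  "t = %.2f, df = %.1f, p = %.4f, 95%% CI [%.2f, %.2f]",
  res$statistic, res$parameter, res$p.value,
  res$conf.int[1], res$conf.int[2]
)
cat(report, "\n")
print(res$estimate)
```

Building the sentence from the object avoids transcription slips when the analysis is rerun.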

Q5: Is there a significant difference in the average happiness level between students with at least moderate depression and those with normal depression status?

# Collapse DepressionStatus into two groups: "normal" vs. everything else
SleepStudy$DepGroup <- ifelse(SleepStudy$DepressionStatus == "normal",
                              "Normal",
                              "Moderate/Severe")

SleepStudy$DepGroup <- factor(SleepStudy$DepGroup)
table(SleepStudy$DepGroup)
## 
## Moderate/Severe          Normal 
##              44             209
t.test(Happiness ~ DepGroup,
       data = SleepStudy,
       alternative = "two.sided")
## 
##  Welch Two Sample t-test
## 
## data:  Happiness by DepGroup
## t = -5.6339, df = 55.594, p-value = 6.057e-07
## alternative hypothesis: true difference in means between group Moderate/Severe and group Normal is not equal to 0
## 95 percent confidence interval:
##  -7.379724 -3.507836
## sample estimates:
## mean in group Moderate/Severe          mean in group Normal 
##                      21.61364                      27.05742
boxplot(Happiness ~ DepGroup,
        data = SleepStudy,
        xlab = "Depression Group",
        ylab = "Happiness Score",
        main = "Happiness by Depression Status",
        col = c("lightyellow", "orange"))

For this question, I compared the average happiness scores between students with normal depression status and those who were classified as moderate or severe. After creating a combined “Moderate/Severe” group, I ran a two-sample t-test. The results showed a t-value of –5.6339 with about 55.59 degrees of freedom and a p-value of 0.0000006057. Since this p-value is extremely small and far below 0.05, the difference is statistically significant. The 95% confidence interval ranged from –7.3797 to –3.5078, which does not include zero. The sample means show that students with moderate or severe depression had a much lower happiness score on average (21.61) compared to students with normal depression status (27.06), a gap of about 5.4 points. The box plot makes this gap between the two groups easy to see.
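The negative t-value and negative confidence interval here come from the order of subtraction: factor levels are sorted alphabetically by default, so the test computes Moderate/Severe minus Normal. The sketch below, on simulated toy data (hypothetical values, not SleepStudy), shows that reordering the levels flips the signs without changing the conclusion. The column names `DepDefault` and `DepReordered` are made up for the example.

```r
# Hypothetical toy data, NOT taken from SleepStudy
set.seed(3)
toy <- data.frame(
  DepGroup  = rep(c("Normal", "Moderate/Severe"), times = c(40, 15)),
  Happiness = c(rnorm(40, mean = 27, sd = 5), rnorm(15, mean = 22, sd = 6))
)

# Default factor: alphabetical levels, so "Moderate/Severe" comes first
toy$DepDefault <- factor(toy$DepGroup)

# Explicit level order: "Normal" first, so the difference is Normal minus
# Moderate/Severe
toy$DepReordered <- factor(toy$DepGroup,
                           levels = c("Normal", "Moderate/Severe"))

res_default   <- t.test(Happiness ~ DepDefault, data = toy)
res_reordered <- t.test(Happiness ~ DepReordered, data = toy)

# Same p-value; t and the confidence interval only change sign
print(c(default = unname(res_default$statistic),
        reordered = unname(res_reordered$statistic)))
print(as.numeric(res_default$conf.int))
print(as.numeric(res_reordered$conf.int))
```

The same reading applies to the other negative t-values in this report: the sign only says which group mean was subtracted from which.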

Q6: Is there a significant difference in average sleep quality scores between students who reported having at least one all-nighter (AllNighter = 1) and those who didn’t (AllNighter = 0)?

t.test(PoorSleepQuality ~ AllNighter,
       data = SleepStudy,
       alternative = "two.sided")
## 
##  Welch Two Sample t-test
## 
## data:  PoorSleepQuality by AllNighter
## t = -1.7068, df = 44.708, p-value = 0.09479
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -1.9456958  0.1608449
## sample estimates:
## mean in group 0 mean in group 1 
##        6.136986        7.029412

For this question, I compared the average sleep quality scores between students who had at least one all-nighter and those who did not. I ran a two-sample t-test using PoorSleepQuality as the outcome, where higher scores indicate worse sleep. The results showed a t-value of –1.7068 with about 44.71 degrees of freedom and a p-value of 0.09479. Since this p-value is greater than 0.05, the difference in sleep quality between the two groups is not statistically significant. The 95% confidence interval ranged from –1.9457 to 0.1608, which includes zero, meaning the observed difference might just be due to random variation. The sample means show that students who pulled an all-nighter had slightly worse sleep quality on average (7.03 vs. 6.14), but this difference isn’t strong enough to be considered statistically significant.

Q7: Do students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use?

# Keep only the two groups being compared: abstainers and heavy drinkers
Sleep_Alcohol <- subset(SleepStudy, AlcoholUse %in% c("Abstain", "Heavy"))
Sleep_Alcohol$AlcoholUse <- factor(Sleep_Alcohol$AlcoholUse)
table(Sleep_Alcohol$AlcoholUse)
## 
## Abstain   Heavy 
##      34      16
t.test(StressScore ~ AlcoholUse,
       data = Sleep_Alcohol,
       alternative = "two.sided")
## 
##  Welch Two Sample t-test
## 
## data:  StressScore by AlcoholUse
## t = -0.62604, df = 28.733, p-value = 0.5362
## alternative hypothesis: true difference in means between group Abstain and group Heavy is not equal to 0
## 95 percent confidence interval:
##  -6.261170  3.327346
## sample estimates:
## mean in group Abstain   mean in group Heavy 
##              8.970588             10.437500

For this question, I compared stress levels between students who abstain from alcohol and students who report heavy alcohol use. After filtering the dataset to those two groups (34 abstainers and 16 heavy drinkers), I ran a two-sample t-test using StressScore as the outcome. The results showed a t-value of –0.6260 with about 28.73 degrees of freedom and a p-value of 0.5362. Since this p-value is much greater than 0.05, the difference in average stress levels between the two groups is not statistically significant. The 95% confidence interval ranged from –6.2612 to 3.3273, which includes zero; the interval is wide, which is expected with groups this small. Even though the sample means show that heavy drinkers had slightly higher stress scores on average (10.44) compared to those who abstain (8.97), this difference isn’t large enough to be significant.

Q8: Is there a significant difference in the average number of drinks per week between students of different genders?

t.test(Drinks ~ Gender,
       data = SleepStudy,
       alternative = "two.sided")
## 
##  Welch Two Sample t-test
## 
## data:  Drinks by Gender
## t = -6.1601, df = 142.75, p-value = 7.002e-09
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -4.360009 -2.241601
## sample estimates:
## mean in group 0 mean in group 1 
##        4.238411        7.539216
boxplot(Drinks ~ Gender,
        data = SleepStudy,
        xlab = "Gender (0 = Female, 1 = Male)",
        ylab = "Drinks Per Week",
        main = "Alcohol Consumption by Gender",
        col = c("pink", "lightblue"))

For this question, I compared the average number of alcoholic drinks consumed per week between female students (Gender = 0) and male students (Gender = 1). I ran a two-sample t-test using Drinks as the outcome variable. The results showed a t-value of –6.1601 with about 142.75 degrees of freedom and a p-value of 0.000000007002. Since this p-value is far below 0.05, the difference between the two groups is statistically significant. The 95% confidence interval for the difference in means ranged from –4.3600 to –2.2416, and it does not include zero. The sample means show that males drank more on average (7.54 drinks per week) compared to females (4.24 drinks per week). I also included a boxplot for this question to better visualize the results.

Q9: Is there a significant difference in the average weekday bedtime between students with high stress (Stress = High) and those with normal stress (Stress = Normal)?

t.test(WeekdayBed ~ Stress,
       data = SleepStudy,
       alternative = "two.sided")
## 
##  Welch Two Sample t-test
## 
## data:  WeekdayBed by Stress
## t = -1.0746, df = 87.048, p-value = 0.2855
## alternative hypothesis: true difference in means between group high and group normal is not equal to 0
## 95 percent confidence interval:
##  -0.4856597  0.1447968
## sample estimates:
##   mean in group high mean in group normal 
##             24.71500             24.88543

For this question, I compared the average weekday bedtime between students with high stress and those with normal stress levels. I ran a two-sample t-test using WeekdayBed as the outcome. The results showed a t-value of –1.0746 with about 87.05 degrees of freedom and a p-value of 0.2855. Since this p-value is greater than 0.05, the difference between the two groups is not statistically significant. The 95% confidence interval ranged from –0.4857 to 0.1448, which includes zero, so any true difference in average bedtime is likely less than about half an hour in either direction. The sample means were also very close: 24.72 for the high-stress group and 24.89 for the normal-stress group (roughly 12:43 AM and 12:53 AM, reading WeekdayBed as decimal hours with 24 = midnight).
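Bedtime means such as 24.72 are easier to interpret as clock times. The helper below is a small sketch for that conversion; `decimal_to_clock()` is a hypothetical function written for this illustration, and it assumes WeekdayBed is stored in decimal hours with 24 = midnight, which is consistent with values such as 25.75 and 27.44 in the data preview.

```r
# Convert decimal hours (assumed scale: 24 = midnight, 25.5 = 1:30 AM)
# into 24-hour "HH:MM" clock strings
decimal_to_clock <- function(h) {
  # Wrap past-midnight values back into 0-24, then round to whole minutes
  total_min <- round((h %% 24) * 60)
  hours   <- as.integer((total_min %/% 60) %% 24)
  minutes <- as.integer(total_min %% 60)
  sprintf("%02d:%02d", hours, minutes)
}

# Group means from the Q9 output, plus one pre-midnight value
clock <- decimal_to_clock(c(24.715, 24.88543, 23.5))
print(clock)
```

Under that assumption, the two group means fall about ten minutes apart, shortly before 1 AM.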

Q10: Is there a significant difference in the average hours of sleep on weekends between first two year students and other students?

# Reuses the YearGroup variable created for Q2
t.test(WeekendSleep ~ YearGroup,
       data = SleepStudy,
       alternative = "two.sided")
## 
##  Welch Two Sample t-test
## 
## data:  WeekendSleep by YearGroup
## t = -0.047888, df = 237.36, p-value = 0.9618
## alternative hypothesis: true difference in means between group Lower and group Upper is not equal to 0
## 95 percent confidence interval:
##  -0.3497614  0.3331607
## sample estimates:
## mean in group Lower mean in group Upper 
##            8.213592            8.221892

For this question, I compared the average hours of sleep on weekends between lower-level students (ClassYear 1–2) and upper-level students (ClassYear 3–4). I ran a two-sample t-test using WeekendSleep as the outcome. The results showed a t-value of –0.0479 with about 237.36 degrees of freedom and a p-value of 0.9618. Since the p-value is much greater than 0.05, there is no statistically significant difference in weekend sleep between the two groups. The 95% confidence interval ranged from –0.3498 to 0.3332, which includes zero and suggests that any true difference is smaller than about 0.35 hours (roughly 20 minutes). The sample means were basically the same: 8.21 hours for lower-level students and 8.22 hours for upper-level students.

Summary

After analyzing the SleepStudy dataset, I noticed a few clear patterns in college students’ sleep and lifestyle habits. Four of the ten comparisons were statistically significant at the 0.05 level. Female students had slightly higher GPAs than male students (about 0.2 grade points on average), and first- and second-year students had more early classes than upper-level students. Students with moderate or severe depression reported much lower happiness scores, one of the largest differences in the analysis, and male students drank more per week on average than female students. The other six comparisons were not statistically significant: larks and owls had similar cognition scores, having an early class was not linked to the number of classes missed, pulling an all-nighter was not associated with a significant change in sleep quality, abstainers and heavy drinkers had similar stress scores, weekday bedtimes were similar for high-stress and normal-stress students, and weekend sleep was nearly identical across class years. Overall, depression status, gender, and class year were associated with student behaviors or outcomes, while the other factors showed small or non-significant differences.
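Because this report runs ten separate tests at the 0.05 level, one optional check is whether the significant results hold up after adjusting for multiple comparisons. The sketch below applies Holm's method through `p.adjust()` to the ten p-values printed in the outputs above. The choice of Holm's method is an assumption made for this illustration and was not part of the original analysis.

```r
# p-values copied from the t-test outputs for Q1 through Q10
p_raw <- c(
  Q1 = 0.0001243, Q2 = 4.009e-05, Q3 = 0.4229, Q4 = 0.1421,
  Q5 = 6.057e-07, Q6 = 0.09479,   Q7 = 0.5362, Q8 = 7.002e-09,
  Q9 = 0.2855,    Q10 = 0.9618
)

# Holm adjustment controls the chance of any false positive across all tests
p_holm <- p.adjust(p_raw, method = "holm")

# Questions that remain significant at 0.05 after adjustment
significant <- names(p_holm)[p_holm < 0.05]

print(round(p_holm, 4))
print(significant)
```

With these inputs, Q1, Q2, Q5, and Q8 remain below 0.05 after adjustment, so the four significant findings summarized above do not depend on ignoring the number of tests.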