Introduction

This report is a comprehensive analysis of the sleep patterns of college students. The data come from the “SleepStudy” dataset, available at https://www.lock5stat.com/datasets3e/SleepStudy.csv. The dataset records students' sleep habits and related measures of health and academic performance in 27 variables across 253 observations.

In this report, we address the following questions to better understand the data set and the information we can derive from it:

  1. Is there a significant difference in the average GPA between male and female college students?
  2. Is there a significant difference in the average number of early classes between the first two class years and other class years?
  3. Do students who identify as “larks” have significantly better cognitive skills (cognition z-score) compared to “owls”?
  4. Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class (EarlyClass=1) and those who didn’t (EarlyClass=0)?
  5. Is there a significant difference in the average happiness level between students with at least moderate depression and normal depression status?
  6. Is there a significant difference in average sleep quality scores between students who reported having at least one all-nighter (AllNighter=1) and those who didn’t (AllNighter=0)?
  7. Do students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use?
  8. Is there a significant difference in the average number of drinks per week between students of different genders?
  9. Is there a significant difference in the average weekday bedtime between students with high and low stress (Stress=High vs. Stress=Normal)?
  10. Is there a significant difference in the average hours of sleep on weekends between first two year students and other students?

Our objective is to use these questions to identify associations between sleep and factors such as gender, class year, stress, mental health, and class times. Because the data are observational, these comparisons can reveal associations but cannot by themselves establish causation. Such information could still help guide efforts to improve students' sleep and health habits and, in turn, their academic performance.

Data

The data used in this analysis consist of 253 observations and 27 variables, obtained from the Lock5 data page (https://www.lock5stat.com/datapage3e.html). We use the SleepStudy dataset: each of the 253 observations is a student, and the 27 variables are the measurements recorded for that student. The compiled file we analyze is https://www.lock5stat.com/datasets3e/SleepStudy.csv.
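The Appendix code works with a data frame named `sleep` but does not show how it was loaded. Below is a minimal R sketch. It assumes the CSV is read straight from the URL above with base R's `read.csv`. The helper name `load_checked` is hypothetical. It stops with an error if the file does not have the expected 253 rows and 27 columns.

```r
# Read a CSV and stop unless it has the expected number of rows and columns.
# (load_checked is a hypothetical helper, not part of the original analysis.)
load_checked <- function(path, n_rows = 253, n_cols = 27) {
  df <- read.csv(path)
  if (nrow(df) != n_rows || ncol(df) != n_cols) {
    stop(sprintf("expected %d x %d, got %d x %d",
                 n_rows, n_cols, nrow(df), ncol(df)))
  }
  df
}

# Offline check with a small temporary file (hypothetical toy data)
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(a = 1:3, b = c(2.5, 3.0, 3.5)), tmp, row.names = FALSE)
toy <- load_checked(tmp, n_rows = 3, n_cols = 2)

# With internet access, the report's data could be loaded as:
# sleep <- load_checked("https://www.lock5stat.com/datasets3e/SleepStudy.csv")
# str(sleep)   # shows each column's type, e.g. whether Gender is stored as 0/1
```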

Analysis

In this report we use R and logical deduction to answer the 10 questions. We rely mainly on the t.test function (Welch's two-sample t-test, R's default) to test whether the mean of a variable differs between two groups, and we report the size of each difference alongside its p-value. For each question we aim to give a logical explanation of the data and to support that explanation with empirical evidence.
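This sketch shows what `t.test` computes by default. It derives the Welch two-sample statistic, its Welch-Satterthwaite degrees of freedom, and the two-sided p-value from the group means, variances, and sizes. The function name `welch_from_summary` and the toy vectors are illustrative assumptions, not part of the original analysis. The final lines compare the hand computation with `t.test` on the same toy data.

```r
# Welch two-sample t-test computed from summary statistics (illustrative sketch).
welch_from_summary <- function(m1, v1, n1, m2, v2, n2) {
  se2 <- v1 / n1 + v2 / n2                  # squared standard error of the difference
  t_stat <- (m1 - m2) / sqrt(se2)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- se2^2 / ((v1 / n1)^2 / (n1 - 1) + (v2 / n2)^2 / (n2 - 1))
  p_value <- 2 * pt(-abs(t_stat), df)       # two-sided, matching "not equal to 0"
  c(t = t_stat, df = df, p = p_value)
}

# Hypothetical toy data, used only to check the formula against t.test()
x <- c(3.1, 3.5, 3.8, 2.9, 3.6, 3.3)
y <- c(2.8, 3.0, 3.4, 2.7, 3.1)
hand <- welch_from_summary(mean(x), var(x), length(x), mean(y), var(y), length(y))
builtin <- t.test(x, y)   # Welch is the default (var.equal = FALSE)
print(hand)
```

The fractional degrees of freedom in the report's outputs (for example df = 200.9 in Q1) come from this approximation.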

Q1: Is there a significant difference in the average GPA between male and female college students?

## 
##  Welch Two Sample t-test
## 
## data:  GPA by Gender
## t = 3.9139, df = 200.9, p-value = 0.0001243
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  0.09982254 0.30252780
## sample estimates:
## mean in group 0 mean in group 1 
##        3.324901        3.123725

The output above compares the average GPA of the two gender groups. The dataset codes Gender as 1 = male and 0 = female, so the average GPA is 3.32 for female students (group 0) and 3.12 for male students (group 1), a difference of about 0.20 grade points. The t-test gives p = 0.0001243, well below 0.05, so the difference is statistically significant and unlikely to be due to chance alone. However, the 95% confidence interval (0.10 to 0.30) shows that the gap is modest. Gender is associated with GPA in this sample, but the practical size of the difference is small. The test alone also cannot show that gender itself is the cause, and other factors may account for part of the difference.
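A small p-value says a difference is unlikely to be chance, but not how large it is. A standardized effect size can complement the t-test here. Below is a minimal R sketch of Cohen's d with a pooled standard deviation, a technique not used in the original analysis. The function name and the toy GPA values are hypothetical.

```r
# Cohen's d: difference in means divided by the pooled standard deviation.
cohens_d <- function(x, y) {
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  nx <- length(x)
  ny <- length(y)
  pooled_sd <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / pooled_sd
}

# Hypothetical toy GPAs, not the SleepStudy values
group0 <- c(3.4, 3.6, 3.2, 3.5, 3.1)
group1 <- c(3.0, 3.3, 3.1, 3.2, 2.9)
d <- cohens_d(group0, group1)
print(round(d, 2))
```

On the report's data the call would be `cohens_d(sleep$GPA[sleep$Gender == 0], sleep$GPA[sleep$Gender == 1])`. By Cohen's common rule of thumb, values near 0.2, 0.5, and 0.8 are read as small, medium, and large effects.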

Q2: Is there a significant difference in the average number of early classes between the first two class years and other class years?

## FirstTwo    Other 
## 2.070423 1.306306
## 
##  Welch Two Sample t-test
## 
## data:  sleep$NumEarlyClass by group
## t = 4.1813, df = 250.69, p-value = 4.009e-05
## alternative hypothesis: true difference in means between group FirstTwo and group Other is not equal to 0
## 95 percent confidence interval:
##  0.4042016 1.1240309
## sample estimates:
## mean in group FirstTwo    mean in group Other 
##               2.070423               1.306306

On average, students in their first two years have 2.07 early classes, compared with 1.31 for students in other years. Averages can be fractional even though individual counts are whole numbers. The p-value (4.0e-05) shows that this difference is statistically significant and unlikely to be due to chance alone. However, the test cannot tell us why the difference exists. Possible explanations include newer students having lower registration priority, or older students having learned that early classes do not suit them.
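Group means are easier to interpret when reported unrounded alongside the group sizes. This R sketch applies the report's grouping rule (years 1 and 2 vs. all others) to a small data frame. The data frame and its values are hypothetical stand-ins for the SleepStudy columns `ClassYear` and `NumEarlyClass`.

```r
# Hypothetical toy stand-in for the SleepStudy columns ClassYear and NumEarlyClass
toy <- data.frame(
  ClassYear     = c(1, 1, 2, 2, 3, 3, 4, 4, 2, 1),
  NumEarlyClass = c(3, 2, 2, 1, 1, 0, 2, 1, 3, 2)
)
# Same grouping rule as the report: years 1-2 vs. all other years
toy$group <- ifelse(toy$ClassYear %in% c(1, 2), "FirstTwo", "Other")

# Group sizes, group means, and the difference in means (left unrounded)
sizes <- table(toy$group)
means <- tapply(toy$NumEarlyClass, toy$group, mean)
diff_means <- as.numeric(means["FirstTwo"] - means["Other"])
print(sizes)
print(means)
```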

Q3: Do students who identify as “larks” have significantly better cognitive skills (cognition z-score) compared to “owls”?

Group means of CognitionZscore:
##        Lark         Owl 
##  0.09024390 -0.03836735
Group variances of CognitionZscore:
##      Lark       Owl 
## 0.6881824 0.4260723
## 
##  Welch Two Sample t-test
## 
## data:  CognitionZscore by LarkOwl
## t = 0.80571, df = 75.331, p-value = 0.4229
## alternative hypothesis: true difference in means between group Lark and group Owl is not equal to 0
## 95 percent confidence interval:
##  -0.1893561  0.4465786
## sample estimates:
## mean in group Lark  mean in group Owl 
##         0.09024390        -0.03836735

Based on the t-test (p = 0.4229), students who identify as Larks do not show significantly higher cognition z-scores than students who identify as Owls. The 95% confidence interval for the difference (-0.19 to 0.45) includes 0. Because the p-value is greater than 0.05, the data are consistent with no true difference between the groups. This does not mean the observed difference is probably coincidental, only that the sample does not provide convincing evidence of a difference. The boxplot produced by the Q3 code in the Appendix shows that the distributions for Larks and Owls look very similar.
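A p-value and its confidence interval can be read together. When the confidence level is 1 - alpha, the null hypothesis is rejected exactly when the interval excludes 0. The R helper below pulls those pieces from a `t.test` result. The name `summarize_ttest` and the simulated z-scores are hypothetical.

```r
# Summarize a two-sample t.test() result: p-value, CI for the difference in
# means, and whether the null hypothesis is rejected at level alpha.
summarize_ttest <- function(tt, alpha = 0.05) {
  ci <- as.numeric(tt$conf.int)
  list(
    p_value          = tt$p.value,
    ci               = ci,
    reject_null      = tt$p.value < alpha,
    ci_excludes_zero = ci[1] > 0 || ci[2] < 0
  )
}

# Hypothetical simulated z-scores for two groups
set.seed(1)
lark <- rnorm(40, mean = 0.1, sd = 0.8)
owl  <- rnorm(50, mean = 0.0, sd = 0.65)
# conf.level = 1 - alpha, so the two decisions always agree
res <- summarize_ttest(t.test(lark, owl, conf.level = 0.95), alpha = 0.05)
str(res)
```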

Q4: Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class (EarlyClass=1) and those who didn’t (EarlyClass=0)?

Group means of ClassesMissed by EarlyClass:
##        0        1 
## 2.647059 1.988095
Group variances of ClassesMissed by EarlyClass:
##         0         1 
## 12.088235  9.616624
## 
##  Welch Two Sample t-test
## 
## data:  ClassesMissed by EarlyClass
## t = 1.4755, df = 152.78, p-value = 0.1421
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.2233558  1.5412830
## sample estimates:
## mean in group 0 mean in group 1 
##        2.647059        1.988095

The means show that students with at least one early class missed slightly fewer classes (1.99) than students without one (2.65), which is not what one might expect. However, the difference is not statistically significant (p = 0.1421), and the 95% confidence interval (-0.22 to 1.54) includes 0. The data therefore provide no evidence that having an early class is associated with the number of classes a student misses.
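Counts such as classes missed are often right-skewed, and their variances here are large relative to the means. A rank-based test is therefore a reasonable robustness check. This R sketch runs the Wilcoxon rank-sum test next to the Welch t-test. The Wilcoxon test is a swapped-in technique not used in the original report, and the toy counts are hypothetical.

```r
# Hypothetical toy counts of classes missed (right-skewed, with many ties)
missed_no_early <- c(0, 0, 1, 2, 3, 5, 8, 0, 1, 12)
missed_early    <- c(0, 1, 0, 2, 1, 0, 3, 4, 0, 1)

welch <- t.test(missed_no_early, missed_early)
# Rank-based check; exact = FALSE because tied counts rule out an exact p-value
rank_sum <- wilcox.test(missed_no_early, missed_early, exact = FALSE)
c(welch = welch$p.value, wilcoxon = rank_sum$p.value)
```

On the report's data the corresponding call would be `wilcox.test(ClassesMissed ~ EarlyClass, data = sleep, exact = FALSE)`. If both tests agree, the conclusion does not hinge on the t-test's normality assumption.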

Q5: Is there a significant difference in the average happiness level between students with at least moderate depression and normal depression status?

## [1] -0.5422648

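The correlation between DepressionScore and Happiness is -0.54, and the scatterplot produced by the Q5 code in the Appendix shows happiness decreasing as depression scores increase. This is a moderate negative association: students with more severe depression tend to report lower happiness. This analysis uses a correlation rather than the two-group comparison posed in the question. It therefore describes the overall trend rather than directly testing the difference between the “moderate or worse” and “normal” groups.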
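Q5 as worded asks for a two-group comparison. That comparison could be run by collapsing `DepressionStatus` into two groups and applying the same Welch t-test used elsewhere in the report. This R sketch assumes the levels are the lowercase strings "normal", "moderate", and "severe"; check this with `unique(sleep$DepressionStatus)` before relying on it. The toy rows are hypothetical.

```r
# Hypothetical toy rows mimicking the DepressionStatus and Happiness columns
toy <- data.frame(
  DepressionStatus = c("normal", "normal", "moderate", "severe", "normal",
                       "moderate", "normal", "severe", "normal", "moderate"),
  Happiness        = c(28, 30, 22, 15, 26, 20, 29, 17, 27, 21)
)
# Collapse the three assumed levels into the two groups named in Q5
toy$DepGroup <- ifelse(toy$DepressionStatus == "normal", "Normal", "ModerateOrWorse")

group_means <- tapply(toy$Happiness, toy$DepGroup, mean)
print(group_means)
q5_test <- t.test(Happiness ~ DepGroup, data = toy)   # Welch two-sample test
q5_test
```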

Q6: Is there a significant difference in average sleep quality scores between students who reported having at least one all-nighter (AllNighter=1) and those who didn’t (AllNighter=0)?

##        0        1 
## 6.136986 7.029412
## 
##  Welch Two Sample t-test
## 
## data:  PoorSleepQuality by AllNighter
## t = -1.7068, df = 44.708, p-value = 0.09479
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -1.9456958  0.1608449
## sample estimates:
## mean in group 0 mean in group 1 
##        6.136986        7.029412

The p-value (0.09479) is above 0.05, so the difference in sleep quality between students with and without an all-nighter is not statistically significant at the 5% level. The result is close to that threshold, however. Both the means (7.03 vs. 6.14) and the boxplot suggest that students who have pulled at least one all-nighter have higher PoorSleepQuality scores, and therefore poorer sleep quality, than those who have not. Confirming this trend would require a larger sample.

Q7: Do students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use?

Group means of StressScore:
##   Abstain     Heavy 
##  8.970588 10.437500
Group variances of StressScore:
##  Abstain    Heavy 
## 57.48396 60.79583
## 
##  Welch Two Sample t-test
## 
## data:  StressScore by AlcoholUse
## t = -0.62604, df = 28.733, p-value = 0.5362
## alternative hypothesis: true difference in means between group Abstain and group Heavy is not equal to 0
## 95 percent confidence interval:
##  -6.261170  3.327346
## sample estimates:
## mean in group Abstain   mean in group Heavy 
##              8.970588             10.437500

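The p-value (0.5362) indicates no statistically significant difference in average stress score between students who abstain from alcohol (8.97) and those who report heavy use (10.44). The 95% confidence interval (-6.26 to 3.33) includes 0, and the barplot tells a similar story. Within this sample, the data provide no evidence that stress levels differ between abstainers and heavy drinkers.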

Q8: Is there a significant difference in the average number of drinks per week between students of different genders?

## 
##  Welch Two Sample t-test
## 
## data:  Drinks by Gender
## t = -6.1601, df = 142.75, p-value = 7.002e-09
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -4.360009 -2.241601
## sample estimates:
## mean in group 0 mean in group 1 
##        4.238411        7.539216

The t-test examines whether students of different genders differ in the number of alcoholic drinks consumed per week. The difference is statistically significant (p = 7.0e-09), so there is strong evidence that average alcohol consumption differs between the gender groups in this sample. With Gender coded 1 = male and 0 = female, male students (group 1) report about 7.54 drinks per week on average, compared with 4.24 for female students (group 0). The barplot produced by the Q8 code in the Appendix gives a visual representation of this difference.
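The Q8 code in the Appendix subsets on "Male"/"Female", which returns `logical(0)` because Gender is stored as 0/1 (the t-test output reports groups 0 and 1). This R sketch attaches readable labels with `factor()` so that filtering and `tapply()` work as intended. It assumes the coding 0 = female and 1 = male, and the toy rows are hypothetical.

```r
# Hypothetical toy rows: Gender stored as 0/1, as in the t.test output above
toy <- data.frame(
  Gender = c(0, 1, 0, 1, 1, 0),
  Drinks = c(2, 8, 4, 10, 6, 3)
)
# Attach readable labels (coding assumed: 0 = female, 1 = male)
toy$GenderLabel <- factor(toy$Gender, levels = c(0, 1), labels = c("Female", "Male"))

# Filtering on the labels now keeps rows, and tapply() returns labeled means
n_kept <- nrow(subset(toy, GenderLabel %in% c("Male", "Female")))
means <- tapply(toy$Drinks, toy$GenderLabel, mean)
print(means)
```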

Q9: Is there a significant difference in the average weekday bedtime between students with high and low stress (Stress=High vs. Stress=Normal)?

## 
##  Welch Two Sample t-test
## 
## data:  WeekdayBed by Stress
## t = -1.0746, df = 87.048, p-value = 0.2855
## alternative hypothesis: true difference in means between group high and group normal is not equal to 0
## 95 percent confidence interval:
##  -0.4856597  0.1447968
## sample estimates:
##   mean in group high mean in group normal 
##             24.71500             24.88543

According to the t-test, the difference in mean weekday bedtime between the high-stress and normal-stress groups is not statistically significant (t = -1.0746, p = 0.2855). The two group means (24.72 and 24.89, on a scale where 24.0 corresponds to midnight) differ by only about 10 minutes. Within this sample, stress level does not appear to be associated with typical weekday bedtime.
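Bedtimes stored as decimal hours are easier to read as clock times. This R sketch converts them to "HH:MM" strings. It assumes the WeekdayBed scale runs past 24 for times after midnight (24.0 = midnight), which is consistent with the group means of about 24.7 and 24.9. The function name `to_clock` is hypothetical.

```r
# Convert decimal bedtimes (24.0 = midnight, assumed) to "HH:MM" clock strings.
to_clock <- function(hours) {
  total_min <- round(hours * 60) %% (24 * 60)   # wrap times after midnight to 00:00
  sprintf("%02d:%02d", total_min %/% 60, total_min %% 60)
}

to_clock(c(24.71500, 24.88543))   # "00:43" "00:53"
```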

Q10: Is there a significant difference in the average hours of sleep on weekends between first two year students and other students?

Group means of WeekendSleep (hours):
## FirstTwo    Other 
## 8.213592 8.221892
Group variances of WeekendSleep:
## FirstTwo    Other 
## 1.889152 1.858063

The boxplot produced by the Q10 code in the Appendix suggests that, regardless of class year, the average student gets around 8 hours of sleep on weekends. The group means (8.21 and 8.22 hours) differ by only about 0.01 hours, and the variances are nearly equal. No t-test was run for this question. Even so, a difference this small suggests that class year has little association with weekend sleep in this sample.
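Q10 reports means and variances but no test, so a Welch t-test would complete the comparison. Its confidence interval bounds how large the true difference could plausibly be. This R sketch uses simulated, hypothetical weekend-sleep hours in place of the SleepStudy values.

```r
# Hypothetical simulated weekend sleep hours (not the SleepStudy values)
set.seed(42)
first_two <- rnorm(60, mean = 8.2, sd = 1.4)
other     <- rnorm(50, mean = 8.2, sd = 1.4)

q10 <- t.test(first_two, other)   # Welch two-sample test, as elsewhere in the report
# The 95% CI for the difference in means bounds the plausible size of the gap
ci <- as.numeric(q10$conf.int)
round(ci, 2)
```

With the report's data, the equivalent call would be `t.test(sleep$WeekendSleep ~ group)`, using the `group` vector from the Q10 code. A narrow interval centered near 0 would support the claim more directly than the means alone.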

Summary

This project analyzes the sleep habits, health indicators, and academic behaviors of college students using the SleepStudy dataset (253 students, 27 variables). The goal is to explore how sleep patterns relate to factors such as gender, class year, stress, mental health, and lifestyle behaviors. Ten research questions were addressed using descriptive statistics, t-tests, correlations, and graphical displays.

Key Findings:

  1. GPA and Gender: Female students have a slightly higher average GPA than male students (3.32 vs. 3.12). The difference is statistically significant (p ≈ 0.0001) but small, about 0.2 grade points.

  2. Early Classes by Class Year: First- and second-year students take more early classes on average than upper-class students (2.07 vs. 1.31). The difference is statistically significant and may relate to scheduling priority or student preferences.

  3. Larks vs. Owls and Cognition: Students identifying as “Larks” do not have significantly higher cognition scores than “Owls.” The boxplot and t-test results show similar performance in the two groups.

  4. Early Classes and Classes Missed: Surprisingly, students with early classes miss slightly fewer classes, but the difference is not statistically significant.

  5. Happiness and Depression: Happiness and depression scores are moderately negatively correlated (r ≈ -0.54). Higher depression scores tend to correspond to lower happiness levels, as shown in the scatterplot.

  6. All-Nighters and Sleep Quality: Students who have had at least one all-nighter show worse sleep quality on average, but the difference is not statistically significant at the 5% level (p ≈ 0.09). The trend is suggestive rather than conclusive.

  7. Alcohol Use and Stress: Stress levels do not differ significantly between students who abstain from alcohol and heavy drinkers. Both the barplot and the t-test show no statistically significant difference.

  8. Gender and Weekly Drinks: The number of drinks consumed per week differs significantly by gender. In this sample, male students report higher average alcohol consumption (7.54 vs. 4.24 drinks per week), and the t-test shows a highly significant difference.

  9. Stress and Weekday Bedtime: Students with high stress and normal stress go to bed at similar times. No statistically significant difference was found.

  10. Class Year and Weekend Sleep: First- and second-year students sleep about the same amount on weekends as upper-class students. Both the numbers and the boxplot show almost identical averages (around 8 hours).

Overall Conclusion

Across all ten research questions, the data suggest that many commonly assumed relationships, such as stress affecting bedtime or early classes increasing absences, do not have strong statistical support. However, some patterns emerge: depression is associated with lower happiness, early-year students take more early classes, and alcohol consumption differs by gender. All-nighters are associated with poorer sleep quality, though that difference falls short of statistical significance.

The results highlight that while certain habits and health measures are clearly related (e.g., depression and happiness), many others show minimal or no measurable impact within this dataset.

References

D2L Assignment (Instruction set) – Dr. Zhang’s Project Video (Set up reference)

https://www.lock5stat.com/datapage3e.html (All data page); https://www.lock5stat.com/datasets3e/SleepStudy.csv (Sleep data)

ChatGPT (Code troubleshooting)

Appendix

Full Analysis Code


``` r
# Q1
t_res <- t.test(GPA ~ Gender, data = sleep, var.equal = FALSE)
t_res
## 
##  Welch Two Sample t-test
## 
## data:  GPA by Gender
## t = 3.9139, df = 200.9, p-value = 0.0001243
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  0.09982254 0.30252780
## sample estimates:
## mean in group 0 mean in group 1 
##        3.324901        3.123725
# Q2
group <- ifelse(sleep$ClassYear %in% c(1,2), "FirstTwo", "Other")
tapply(sleep$NumEarlyClass, group, mean, na.rm = TRUE)
## FirstTwo    Other 
## 2.070423 1.306306
t.test(sleep$NumEarlyClass ~ group)
## 
##  Welch Two Sample t-test
## 
## data:  sleep$NumEarlyClass by group
## t = 4.1813, df = 250.69, p-value = 4.009e-05
## alternative hypothesis: true difference in means between group FirstTwo and group Other is not equal to 0
## 95 percent confidence interval:
##  0.4042016 1.1240309
## sample estimates:
## mean in group FirstTwo    mean in group Other 
##               2.070423               1.306306
# Q3
sub <- subset(sleep, LarkOwl %in% c("Lark","Owl"))

tapply(sub$CognitionZscore, sub$LarkOwl, mean, na.rm = TRUE)
##        Lark         Owl 
##  0.09024390 -0.03836735
tapply(sub$CognitionZscore, sub$LarkOwl, var, na.rm = TRUE)
##      Lark       Owl 
## 0.6881824 0.4260723
t.test(CognitionZscore ~ LarkOwl, data = sub)
## 
##  Welch Two Sample t-test
## 
## data:  CognitionZscore by LarkOwl
## t = 0.80571, df = 75.331, p-value = 0.4229
## alternative hypothesis: true difference in means between group Lark and group Owl is not equal to 0
## 95 percent confidence interval:
##  -0.1893561  0.4465786
## sample estimates:
## mean in group Lark  mean in group Owl 
##         0.09024390        -0.03836735
boxplot(CognitionZscore ~ LarkOwl, data=sub,
        main="Cognition Z-Score: Larks vs Owls",
        col=c("orange","lightblue"))

# Q4
tapply(sleep$ClassesMissed, sleep$EarlyClass, mean, na.rm = TRUE)
##        0        1 
## 2.647059 1.988095
tapply(sleep$ClassesMissed, sleep$EarlyClass, var, na.rm = TRUE)
##         0         1 
## 12.088235  9.616624
t.test(ClassesMissed ~ EarlyClass, data = sleep)
## 
##  Welch Two Sample t-test
## 
## data:  ClassesMissed by EarlyClass
## t = 1.4755, df = 152.78, p-value = 0.1421
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.2233558  1.5412830
## sample estimates:
## mean in group 0 mean in group 1 
##        2.647059        1.988095
boxplot(ClassesMissed ~ EarlyClass, data=sleep,
        main="Classes Missed by Early Class Status",
        col=c("green","purple"))

# Q5
cor(sleep$Happiness, sleep$DepressionScore, use = "complete.obs")
## [1] -0.5422648
plot(sleep$DepressionScore, sleep$Happiness,
     main = "Correlation Between Happiness and Depression",
     xlab = "Depression Score",
     ylab = "Happiness Score",
     pch = 19, col = "blue")


abline(lm(Happiness ~ DepressionScore, data = sleep), col = "red", lwd = 2)

# Q6
boxplot(PoorSleepQuality ~ AllNighter, data=sleep,
        names=c("No All-Nighter","At Least One"),
        main="Sleep Quality by All-Nighter Status",
        xlab="All-Nighter",
        ylab="Poor Sleep Quality Score",
        col=c("lightgreen","orange"))

tapply(sleep$PoorSleepQuality, sleep$AllNighter, mean, na.rm = TRUE)
##        0        1 
## 6.136986 7.029412
t.test(PoorSleepQuality ~ AllNighter, data = sleep)
## 
##  Welch Two Sample t-test
## 
## data:  PoorSleepQuality by AllNighter
## t = -1.7068, df = 44.708, p-value = 0.09479
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -1.9456958  0.1608449
## sample estimates:
## mean in group 0 mean in group 1 
##        6.136986        7.029412
# Q7
sub2 <- subset(sleep, AlcoholUse %in% c("Abstain","Heavy"))

tapply(sub2$StressScore, sub2$AlcoholUse, mean, na.rm = TRUE)
##   Abstain     Heavy 
##  8.970588 10.437500
tapply(sub2$StressScore, sub2$AlcoholUse, var, na.rm = TRUE)
##  Abstain    Heavy 
## 57.48396 60.79583
t.test(StressScore ~ AlcoholUse, data=sub2)
## 
##  Welch Two Sample t-test
## 
## data:  StressScore by AlcoholUse
## t = -0.62604, df = 28.733, p-value = 0.5362
## alternative hypothesis: true difference in means between group Abstain and group Heavy is not equal to 0
## 95 percent confidence interval:
##  -6.261170  3.327346
## sample estimates:
## mean in group Abstain   mean in group Heavy 
##              8.970588             10.437500
barplot(tapply(sub2$StressScore, sub2$AlcoholUse, mean, na.rm=TRUE),
        main="Average Stress Score: Abstainers vs Heavy Drinkers",
        col=c("lightgray","darkgray"))

# Q8
# Gender is stored as 0/1 (1 = male, 0 = female), so subsetting on "Male"/"Female"
# matched no rows; attach readable labels instead
sleep$GenderLabel <- factor(sleep$Gender, levels = c(0, 1), labels = c("Female", "Male"))

tapply(sleep$Drinks, sleep$GenderLabel, mean, na.rm = TRUE)
tapply(sleep$Drinks, sleep$GenderLabel, var, na.rm = TRUE)
t.test(Drinks ~ Gender, data = sleep)
## 
##  Welch Two Sample t-test
## 
## data:  Drinks by Gender
## t = -6.1601, df = 142.75, p-value = 7.002e-09
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -4.360009 -2.241601
## sample estimates:
## mean in group 0 mean in group 1 
##        4.238411        7.539216
# Bars are labeled Female/Male from the factor levels
barplot(tapply(sleep$Drinks, sleep$GenderLabel, mean, na.rm = TRUE),
        main = "Average Drinks per Week by Gender",
        xlab = "Gender",
        ylab = "Mean Drinks per Week",
        col = c("pink", "skyblue"))

# Q9
# Stress levels are stored in lowercase ("high", "normal"), as the t.test output shows
sub_stress <- subset(sleep, Stress %in% c("high", "normal"))

tapply(sub_stress$WeekdayBed, sub_stress$Stress, mean, na.rm = TRUE)
t.test(WeekdayBed ~ Stress, data = sleep)
## 
##  Welch Two Sample t-test
## 
## data:  WeekdayBed by Stress
## t = -1.0746, df = 87.048, p-value = 0.2855
## alternative hypothesis: true difference in means between group high and group normal is not equal to 0
## 95 percent confidence interval:
##  -0.4856597  0.1447968
## sample estimates:
##   mean in group high mean in group normal 
##             24.71500             24.88543
# Q10
group <- ifelse(sleep$ClassYear %in% c(1,2), "FirstTwo", "Other")

tapply(sleep$WeekendSleep, group, mean, na.rm = TRUE)
## FirstTwo    Other 
## 8.213592 8.221892
tapply(sleep$WeekendSleep, group, var, na.rm = TRUE)
## FirstTwo    Other 
## 1.889152 1.858063
boxplot(sleep$WeekendSleep ~ group,
        main="Weekend Sleep Hours by Class Year Group",
        col=c("lightgreen","lightpink"))