This report is a comprehensive analysis of the sleep patterns of college students. The data come from the “SleepStudy” data set, located at “https://www.lock5stat.com/datasets3e/SleepStudy.csv.” Students’ sleep habits, along with other information about their health as it relates to those habits, are recorded in 27 variables across 253 observations.
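In this report, we address the following questions to better understand the data set and the information we can derive from it:
1. Do male and female students differ in average GPA?
2. Do first- and second-year students take more early classes than students in later years?
3. Do students who identify as Larks have higher cognition z-scores than students who identify as Owls?
4. Do students with early classes miss more classes than students without them?
5. How are happiness and depression related?
6. Do students who have pulled an all-nighter have poorer sleep quality?
7. Do stress scores differ between students who abstain from alcohol and heavy drinkers?
8. Does the number of alcoholic drinks per week differ by gender?
9. Do students with high stress go to bed at different weekday times than students with normal stress?
10. Do first- and second-year students get a different amount of weekend sleep than students in later years?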
Our objective is to use these questions to examine the relationships among factors such as gender, class year, stress, mental health, and class times. Because the data are observational, these comparisons can reveal associations but cannot by themselves establish causation. Such information could still help inform efforts to improve students’ sleep and health habits and their academic performance.
The data used in this analysis consist of 253 observations and 27 variables, obtained from the Lock5 data page (“https://www.lock5stat.com/datapage3e.html”). We use the SleepStudy data set: each of the 253 observations is a student, and the 27 variables are the characteristics recorded for each student. The compiled data set is available at “https://www.lock5stat.com/datasets3e/SleepStudy.csv.”
In this report, we use R and logical reasoning to answer the 10 questions. We rely mainly on the t.test function (Welch’s two-sample t-test) to determine whether the mean of a variable differs between two groups and whether that difference is statistically significant, and we supplement it with a correlation coefficient, group means and variances, and plots. For each question, we aim to give a logical explanation of the results and to support it with empirical evidence.
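As a sketch of what `t.test` computes by default (its output is labeled “Welch Two Sample t-test” because `var.equal = FALSE` unless stated otherwise), the helper below rebuilds the Welch t statistic, the Welch-Satterthwaite degrees of freedom, and the two-sided p-value from two numeric vectors. `welch_t` is a hypothetical name used only for this illustration, and the sample vectors are made up; the check compares it against R’s built-in function.

``` r
# Hypothetical helper: recompute Welch's two-sample t-test by hand
welch_t <- function(x, y) {
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  n1 <- length(x); n2 <- length(y)
  se1 <- var(x) / n1            # squared standard error of mean(x)
  se2 <- var(y) / n2            # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(se1 + se2)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (se1 + se2)^2 / (se1^2 / (n1 - 1) + se2^2 / (n2 - 1))
  p <- 2 * pt(-abs(t_stat), df)  # two-sided p-value
  c(t = t_stat, df = df, p = p)
}

# Made-up numbers, used only to compare against the built-in test
x <- c(3.1, 3.5, 2.9, 3.8, 3.3, 3.6)
y <- c(2.8, 3.0, 3.2, 2.7, 3.1)
welch_t(x, y)
t.test(x, y)  # Welch is the default (var.equal = FALSE)
```

Because the standard error uses each group’s own variance, the test stays valid when the two groups have different spreads, which is why the report can apply it without first checking for equal variances.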
##
## Welch Two Sample t-test
##
## data: GPA by Gender
## t = 3.9139, df = 200.9, p-value = 0.0001243
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## 0.09982254 0.30252780
## sample estimates:
## mean in group 0 mean in group 1
## 3.324901 3.123725
The output above compares the GPAs of female and male college students. Gender is stored as 0/1 in the data set; following the Lock5 codebook (1 = male, 0 = female), the average GPA is 3.32 for female students (group 0) and 3.12 for male students (group 1), a difference of about 0.20 grade points. The t-test gives a p-value of 0.0001243, far below 0.05, so the difference is statistically significant: a gap this large would be very unlikely if the two groups had the same mean GPA. The 95% confidence interval places the true difference between about 0.10 and 0.30 grade points. Statistical significance does not make the difference large, however, and because the data are observational, the gap may reflect other factors associated with gender rather than gender itself.
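Statistical significance and practical size are separate questions. One common way to express the size of a difference between two means is Cohen’s d: the difference divided by a pooled standard deviation. The report does not compute an effect size; the sketch below is an illustration only, `cohens_d` is a hypothetical helper, and the usual benchmarks (about 0.2 small, 0.5 medium, 0.8 large) are rough rules of thumb.

``` r
# Hypothetical helper: Cohen's d using a pooled standard deviation
cohens_d <- function(x, y) {
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  n1 <- length(x); n2 <- length(y)
  pooled_var <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)
  (mean(x) - mean(y)) / sqrt(pooled_var)
}

# Toy check: means 2 and 3, both standard deviations equal to 1, so d = -1
cohens_d(c(1, 2, 3), c(2, 3, 4))

# Applied to Q1 (assumes `sleep` is loaded as in the Full Analysis Code):
# cohens_d(sleep$GPA[sleep$Gender == 0], sleep$GPA[sleep$Gender == 1])
```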
## FirstTwo Other
## 2.070423 1.306306
##
## Welch Two Sample t-test
##
## data: sleep$NumEarlyClass by group
## t = 4.1813, df = 250.69, p-value = 4.009e-05
## alternative hypothesis: true difference in means between group FirstTwo and group Other is not equal to 0
## 95 percent confidence interval:
## 0.4042016 1.1240309
## sample estimates:
## mean in group FirstTwo mean in group Other
## 2.070423 1.306306
First- and second-year students take an average of about 2.07 early classes, compared with about 1.31 for students in later years. The p-value of 4.0e-05 indicates that this difference is statistically significant and very unlikely to be due to chance alone, and the 95% confidence interval places the difference between about 0.40 and 1.12 classes. This establishes an association between class year and early classes, not causation. Plausible explanations include newer students having lower registration priority, or older students having learned that early classes are worse for them overall.
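A quick arithmetic check ties the Q2 numbers together: a Welch confidence interval is centered on the observed difference in means, and because this interval lies entirely above zero, the two-sided test rejects “no difference” at the 5% level. The values below are copied from the Q2 output.

``` r
# Values copied from the Q2 output
mean_first_two <- 2.070423
mean_other     <- 1.306306
ci <- c(0.4042016, 1.1240309)

diff_means <- mean_first_two - mean_other  # observed difference in means
ci_center  <- mean(ci)                     # midpoint of the 95% CI

round(diff_means, 4)  # 0.7641
round(ci_center, 4)   # 0.7641
all(ci > 0)           # TRUE: 0 lies outside the interval, matching p < 0.05
```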
## Lark Owl
## 0.09024390 -0.03836735
## Lark Owl
## 0.6881824 0.4260723
##
## Welch Two Sample t-test
##
## data: CognitionZscore by LarkOwl
## t = 0.80571, df = 75.331, p-value = 0.4229
## alternative hypothesis: true difference in means between group Lark and group Owl is not equal to 0
## 95 percent confidence interval:
## -0.1893561 0.4465786
## sample estimates:
## mean in group Lark mean in group Owl
## 0.09024390 -0.03836735
Based on the t-test, students who identify as Larks do not show significantly higher cognition z-scores than students who identify as Owls (means of 0.090 and -0.038). Since the p-value (0.42) is greater than 0.05, the data are consistent with no difference between the groups: a gap this small could easily arise by chance, and the 95% confidence interval for the difference (-0.19 to 0.45) includes 0. The boxplot provides a visual representation of the data, and the plots for Larks and Owls look very similar.
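A p-value is the probability, assuming no true difference between the groups, of seeing a difference in means at least as large as the one observed. A permutation test makes that idea concrete by repeatedly shuffling the group labels. This is a swapped-in technique for illustration, not the method the report uses (Welch’s t-test); `perm_test` and the default of 999 shuffles are hypothetical choices.

``` r
# Hypothetical helper: two-sided permutation test for a difference in means
perm_test <- function(x, y, n_perm = 999) {
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  observed <- mean(x) - mean(y)
  pooled <- c(x, y)
  n_x <- length(x)
  perm_diffs <- replicate(n_perm, {
    shuffled <- sample(pooled)  # randomly reassign group labels
    mean(shuffled[1:n_x]) - mean(shuffled[-(1:n_x)])
  })
  # Share of shuffles at least as extreme as the observed difference;
  # the +1 counts the observed arrangement itself
  (sum(abs(perm_diffs) >= abs(observed)) + 1) / (n_perm + 1)
}

set.seed(1)
perm_test(c(1, 2, 3, 4), c(1, 2, 3, 4))  # identical groups: p-value of 1
perm_test(1:10, 101:110)                 # fully separated groups: very small p-value

# Applied to Q3 (assumes `sub` from the Full Analysis Code):
# perm_test(sub$CognitionZscore[sub$LarkOwl == "Lark"],
#           sub$CognitionZscore[sub$LarkOwl == "Owl"])
```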
## 0 1
## 2.647059 1.988095
## 0 1
## 12.088235 9.616624
##
## Welch Two Sample t-test
##
## data: ClassesMissed by EarlyClass
## t = 1.4755, df = 152.78, p-value = 0.1421
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -0.2233558 1.5412830
## sample estimates:
## mean in group 0 mean in group 1
## 2.647059 1.988095
The values computed show that students with an early class actually missed fewer classes on average than those without one (1.99 vs. 2.65), which is not intuitive. However, the p-value (0.14) is above 0.05, so the difference is not statistically significant, and the 95% confidence interval (-0.22 to 1.54) includes 0. In other words, the data do not provide evidence that having an early class is associated with how many classes a student misses.
## [1] -0.5422648
The scatterplot shows that students with higher depression scores tend to report lower happiness. The correlation coefficient of about -0.54 confirms a moderate negative linear relationship between the two measures.
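The correlation can also be tested for significance and translated into shared variance. The sketch below starts from the reported r and assumes all 253 students have both scores (the code used `use = "complete.obs"`, so the actual n could be smaller if any values were missing). With the raw data, `cor.test()` runs the same test directly.

``` r
# Reported correlation between Happiness and DepressionScore
r <- -0.5422648
n <- 253  # assumption: no missing values in either variable

r_squared <- r^2                          # share of variance explained by a linear fit
t_stat <- r * sqrt((n - 2) / (1 - r^2))  # t statistic for H0: correlation = 0
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

round(r_squared, 3)  # 0.294
t_stat
p_value

# Equivalent test on the raw data (assumes `sleep` is loaded):
# cor.test(sleep$Happiness, sleep$DepressionScore)
```

Under these assumptions, depression scores account for roughly 29% of the variation in happiness scores in a straight-line fit, and the p-value is far below 0.05.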
## 0 1
## 6.136986 7.029412
##
## Welch Two Sample t-test
##
## data: PoorSleepQuality by AllNighter
## t = -1.7068, df = 44.708, p-value = 0.09479
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -1.9456958 0.1608449
## sample estimates:
## mean in group 0 mean in group 1
## 6.136986 7.029412
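The p-value (0.095) is above 0.05, so the difference in sleep quality between students with and without an all-nighter is not statistically significant at the 5% level. The result is borderline, however, and the boxplot suggests that students who have had at least one all-nighter tend to have higher PoorSleepQuality scores (7.03 vs. 6.14 on average). Because higher scores indicate poorer sleep, this hints that their sleep quality is worse than that of students without all-nighters, although the evidence is weak.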
## Abstain Heavy
## 8.970588 10.437500
## Abstain Heavy
## 57.48396 60.79583
##
## Welch Two Sample t-test
##
## data: StressScore by AlcoholUse
## t = -0.62604, df = 28.733, p-value = 0.5362
## alternative hypothesis: true difference in means between group Abstain and group Heavy is not equal to 0
## 95 percent confidence interval:
## -6.261170 3.327346
## sample estimates:
## mean in group Abstain mean in group Heavy
## 8.970588 10.437500
Heavy drinkers have a slightly higher mean stress score than students who abstain (10.44 vs. 8.97), but the p-value (0.54) is well above 0.05, so the difference is not statistically significant. The barplot tells a similar story. Note that this comparison includes only abstainers and heavy drinkers; students in the other alcohol-use categories are excluded.
##
## Welch Two Sample t-test
##
## data: Drinks by Gender
## t = -6.1601, df = 142.75, p-value = 7.002e-09
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -4.360009 -2.241601
## sample estimates:
## mean in group 0 mean in group 1
## 4.238411 7.539216
Based on the t-test, we can examine whether students of different genders differ in the number of alcoholic drinks consumed per week. The difference is statistically significant (p = 7.0e-09), so there is strong evidence that average alcohol consumption differs by gender in this sample. Following the Lock5 coding (1 = male, 0 = female), male students report more drinks per week than female students on average (about 7.5 vs. 4.2). The barplot provides a visual representation of this data.
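Because Gender is stored as 0/1, R reports the groups only as “group 0” and “group 1”, which makes it easy to attach the wrong names to the results. Converting the codes to a labeled factor once, with labels taken from the data set’s codebook, puts the names directly into the summaries and the t-test output. The toy data frame below is made up; the labels assume the Lock5 codebook coding (0 = female, 1 = male).

``` r
# Toy data standing in for `sleep` (made-up values)
toy <- data.frame(Gender = c(0, 1, 0, 1, 0, 1),
                  Drinks = c(2, 8, 4, 6, 3, 9))

# Labels must match the codebook; assumed here: 0 = female, 1 = male
toy$GenderLabel <- factor(toy$Gender, levels = c(0, 1),
                          labels = c("Female", "Male"))

tapply(toy$Drinks, toy$GenderLabel, mean)  # means are now labeled Female / Male
t.test(Drinks ~ GenderLabel, data = toy)   # output reads "mean in group Female", "mean in group Male"
```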
##
## Welch Two Sample t-test
##
## data: WeekdayBed by Stress
## t = -1.0746, df = 87.048, p-value = 0.2855
## alternative hypothesis: true difference in means between group high and group normal is not equal to 0
## 95 percent confidence interval:
## -0.4856597 0.1447968
## sample estimates:
## mean in group high mean in group normal
## 24.71500 24.88543
According to the t-test, the difference in mean weekday bedtime between students with high and normal stress is not statistically significant (t = -1.0746, p = 0.2855). Both groups go to bed at about the same time on average (24.72 vs. 24.89, where 24 corresponds to midnight), so the data provide no evidence that stress level is associated with typical weekday bedtime.
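Bedtimes in this data set are stored as decimal hours on a scale that continues past 24 (we assume 24.0 is midnight, so 24.5 is 12:30 a.m.). Converting the two group means to clock times makes the size of the difference easier to judge. `to_clock` is a hypothetical helper written for this illustration.

``` r
# Hypothetical helper: convert decimal hours (assumed 24.0 = midnight) to "HH:MM"
to_clock <- function(hours) {
  total_min <- round((hours %% 24) * 60)  # minutes after midnight
  hh <- (total_min %/% 60) %% 24         # wrap 24:00 back to 00:00
  mm <- total_min %% 60
  sprintf("%02d:%02d", as.integer(hh), as.integer(mm))
}

# Group means copied from the Q9 output
high_stress   <- 24.71500
normal_stress <- 24.88543

to_clock(c(high_stress, normal_stress))  # "00:43" "00:53"
(normal_stress - high_stress) * 60       # gap in minutes, about 10
```

A gap of roughly ten minutes in average bedtime is small in practical terms, which agrees with the non-significant t-test.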
## FirstTwo Other
## 8.213592 8.221892
## FirstTwo Other
## 1.889152 1.858063
The boxplot suggests that, regardless of class year, students average about 8.2 hours of sleep on weekends. The group means differ by less than 0.01 hours (8.214 vs. 8.222, under a minute), so class year shows no meaningful association with weekend sleep in this sample. No t-test was run for this question, but a difference this small is negligible in practice.
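For Q10 the analysis compares means and variances without a formal test. A Welch t-test, the same test used for the other questions, would complete the comparison. The sketch below wraps it in a hypothetical helper, `test_weekend_sleep`, and demonstrates it on made-up numbers; with the real data it would be called as `test_weekend_sleep(sleep)`, and its result is not part of this report.

``` r
# Hypothetical helper: Welch t-test of weekend sleep, first two years vs. later years
test_weekend_sleep <- function(dat) {
  year_group <- ifelse(dat$ClassYear %in% c(1, 2), "FirstTwo", "Other")
  t.test(dat$WeekendSleep ~ year_group)  # Welch by default
}

# Toy data (made up), used only to show the call
toy <- data.frame(ClassYear    = c(1, 2, 3, 4, 1, 2, 3, 4),
                  WeekendSleep = c(8.0, 8.5, 7.5, 9.0, 7.0, 8.25, 8.75, 7.75))
test_weekend_sleep(toy)
```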
This project analyzes the sleep habits, health indicators, and academic behaviors of college students using the SleepStudy dataset (253 students, 27 variables). The goal is to explore how sleep patterns relate to factors such as gender, class year, stress, mental health, and lifestyle behaviors. Ten research questions were addressed using descriptive statistics, t-tests, correlations, and graphical displays.
GPA and Gender: Female students have a higher average GPA than male students (3.32 vs. 3.12, using the Lock5 coding of 1 = male, 0 = female). The difference is statistically significant (p ≈ 0.0001), although at about 0.2 grade points it is modest in practical terms.
Early Classes by Class Year: First- and second-year students take more early classes on average than upper-class students (2.07 vs. 1.31; p < 0.001). The difference is statistically significant and may relate to scheduling priority or student preferences.
Larks vs. Owls and Cognition: Students identifying as “Larks” do not have significantly higher cognition scores than “Owls” (p ≈ 0.42). Boxplots and t-test results show similar performance in the two groups.
Early Classes and Classes Missed: Surprisingly, students with early classes miss slightly fewer classes, but the difference is not statistically significant.
Happiness and Depression: Happiness and depression are moderately negatively correlated (r ≈ -0.54). Students with higher depression scores tend to report lower happiness, as shown in the scatterplot.
All-Nighters and Sleep Quality: Students who have had at least one all-nighter show somewhat worse sleep quality on average, but the difference is not statistically significant at the 5% level (p ≈ 0.095). The trend is suggestive rather than conclusive.
Alcohol Use and Stress: Stress levels do not differ significantly between students who abstain from alcohol and heavy drinkers. Both the barplot and t-test show no meaningful difference.
Gender and Weekly Drinks: The number of drinks consumed per week differs significantly by gender. In this sample, male students report higher average alcohol consumption than female students (about 7.5 vs. 4.2 drinks per week; p = 7.0e-09).
Stress and Weekday Bedtime: Students with high stress and normal stress go to bed at similar times. No statistically significant difference was found.
Class Year and Weekend Sleep: First- and second-year students sleep about the same amount on weekends as upper-class students. Both the numbers and the boxplot show almost identical averages (around 8 hours).
Across all ten research questions, the data suggest that many commonly assumed relationships, such as stress affecting bedtime or early classes increasing absences, do not have strong statistical support. However, some patterns emerge: higher depression is associated with lower happiness, first- and second-year students take more early classes, and both GPA and alcohol consumption differ by gender. Students who have had an all-nighter show somewhat poorer sleep quality, but that difference is only borderline.
The results highlight that while certain habits and health measures are clearly related (e.g., depression and happiness), many others show little or no statistically detectable association within this dataset.
D2L Assignment (instruction set)
Dr. Zhang’s Project Video (setup reference)
Lock5 data page (all data sets): https://www.lock5stat.com/datapage3e.html
SleepStudy data: https://www.lock5stat.com/datasets3e/SleepStudy.csv
ChatGPT (code troubleshooting)
Full Analysis Code
``` r
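# Load the SleepStudy data (253 students, 27 variables)
sleep <- read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")

# Q1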
t_res <- t.test(GPA ~ Gender, data = sleep, var.equal = FALSE)
t_res
##
## Welch Two Sample t-test
##
## data: GPA by Gender
## t = 3.9139, df = 200.9, p-value = 0.0001243
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## 0.09982254 0.30252780
## sample estimates:
## mean in group 0 mean in group 1
## 3.324901 3.123725
# Q2
group <- ifelse(sleep$ClassYear %in% c(1,2), "FirstTwo", "Other")
tapply(sleep$NumEarlyClass, group, mean, na.rm = TRUE)
## FirstTwo Other
## 2.070423 1.306306
t.test(sleep$NumEarlyClass ~ group)
##
## Welch Two Sample t-test
##
## data: sleep$NumEarlyClass by group
## t = 4.1813, df = 250.69, p-value = 4.009e-05
## alternative hypothesis: true difference in means between group FirstTwo and group Other is not equal to 0
## 95 percent confidence interval:
## 0.4042016 1.1240309
## sample estimates:
## mean in group FirstTwo mean in group Other
## 2.070423 1.306306
# Q3
sub <- subset(sleep, LarkOwl %in% c("Lark","Owl"))
tapply(sub$CognitionZscore, sub$LarkOwl, mean, na.rm = TRUE)
## Lark Owl
## 0.09024390 -0.03836735
tapply(sub$CognitionZscore, sub$LarkOwl, var, na.rm = TRUE)
## Lark Owl
## 0.6881824 0.4260723
t.test(CognitionZscore ~ LarkOwl, data = sub)
##
## Welch Two Sample t-test
##
## data: CognitionZscore by LarkOwl
## t = 0.80571, df = 75.331, p-value = 0.4229
## alternative hypothesis: true difference in means between group Lark and group Owl is not equal to 0
## 95 percent confidence interval:
## -0.1893561 0.4465786
## sample estimates:
## mean in group Lark mean in group Owl
## 0.09024390 -0.03836735
boxplot(CognitionZscore ~ LarkOwl, data = sub,
        main = "Cognition Z-Score: Larks vs Owls",
        col = c("orange", "lightblue"))
# Q4
tapply(sleep$ClassesMissed, sleep$EarlyClass, mean, na.rm = TRUE)
## 0 1
## 2.647059 1.988095
tapply(sleep$ClassesMissed, sleep$EarlyClass, var, na.rm = TRUE)
## 0 1
## 12.088235 9.616624
t.test(ClassesMissed ~ EarlyClass, data = sleep)
##
## Welch Two Sample t-test
##
## data: ClassesMissed by EarlyClass
## t = 1.4755, df = 152.78, p-value = 0.1421
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -0.2233558 1.5412830
## sample estimates:
## mean in group 0 mean in group 1
## 2.647059 1.988095
boxplot(ClassesMissed ~ EarlyClass, data = sleep,
        main = "Classes Missed by Early Class Status",
        col = c("green", "purple"))
# Q5
cor(sleep$Happiness, sleep$DepressionScore, use = "complete.obs")
## [1] -0.5422648
plot(sleep$DepressionScore, sleep$Happiness,
     main = "Correlation Between Happiness and Depression",
     xlab = "Depression Score",
     ylab = "Happiness Score",
     pch = 19, col = "blue")
abline(lm(Happiness ~ DepressionScore, data = sleep), col = "red", lwd = 2)
# Q6
boxplot(PoorSleepQuality ~ AllNighter, data = sleep,
        names = c("No All-Nighter", "At Least One"),
        main = "Sleep Quality by All-Nighter Status",
        xlab = "All-Nighter",
        ylab = "Poor Sleep Quality Score",
        col = c("lightgreen", "orange"))
tapply(sleep$PoorSleepQuality, sleep$AllNighter, mean, na.rm = TRUE)
## 0 1
## 6.136986 7.029412
t.test(PoorSleepQuality ~ AllNighter, data = sleep)
##
## Welch Two Sample t-test
##
## data: PoorSleepQuality by AllNighter
## t = -1.7068, df = 44.708, p-value = 0.09479
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -1.9456958 0.1608449
## sample estimates:
## mean in group 0 mean in group 1
## 6.136986 7.029412
# Q7
sub2 <- subset(sleep, AlcoholUse %in% c("Abstain","Heavy"))
tapply(sub2$StressScore, sub2$AlcoholUse, mean, na.rm = TRUE)
## Abstain Heavy
## 8.970588 10.437500
tapply(sub2$StressScore, sub2$AlcoholUse, var, na.rm = TRUE)
## Abstain Heavy
## 57.48396 60.79583
t.test(StressScore ~ AlcoholUse, data=sub2)
##
## Welch Two Sample t-test
##
## data: StressScore by AlcoholUse
## t = -0.62604, df = 28.733, p-value = 0.5362
## alternative hypothesis: true difference in means between group Abstain and group Heavy is not equal to 0
## 95 percent confidence interval:
## -6.261170 3.327346
## sample estimates:
## mean in group Abstain mean in group Heavy
## 8.970588 10.437500
barplot(tapply(sub2$StressScore, sub2$AlcoholUse, mean, na.rm = TRUE),
        main = "Average Stress Score: Abstainers vs Heavy Drinkers",
        col = c("lightgray", "darkgray"))
# Q8
# Gender is stored as 0/1 (Lock5 codebook: 1 = male, 0 = female), so group on the
# numeric codes; subsetting on "Male"/"Female" strings returned no rows
tapply(sleep$Drinks, sleep$Gender, mean, na.rm = TRUE)
tapply(sleep$Drinks, sleep$Gender, var, na.rm = TRUE)
t.test(Drinks ~ Gender, data = sleep)
##
## Welch Two Sample t-test
##
## data: Drinks by Gender
## t = -6.1601, df = 142.75, p-value = 7.002e-09
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -4.360009 -2.241601
## sample estimates:
## mean in group 0 mean in group 1
## 4.238411 7.539216
barplot(tapply(sleep$Drinks, sleep$Gender, mean, na.rm = TRUE),
        names.arg = c("Female", "Male"),  # bars follow codes 0, 1 (Lock5: 0 = female, 1 = male)
        main = "Average Drinks per Week by Gender",
        xlab = "Gender",
        ylab = "Mean Drinks per Week",
        col = c("pink", "skyblue"))
# Q9
# Stress is recorded as lowercase "high"/"normal", so match those exact values
sub_stress <- subset(sleep, Stress %in% c("high", "normal"))
tapply(sub_stress$WeekdayBed, sub_stress$Stress, mean, na.rm = TRUE)
t.test(WeekdayBed ~ Stress, data = sleep)
##
## Welch Two Sample t-test
##
## data: WeekdayBed by Stress
## t = -1.0746, df = 87.048, p-value = 0.2855
## alternative hypothesis: true difference in means between group high and group normal is not equal to 0
## 95 percent confidence interval:
## -0.4856597 0.1447968
## sample estimates:
## mean in group high mean in group normal
## 24.71500 24.88543
# Q10
group <- ifelse(sleep$ClassYear %in% c(1,2), "FirstTwo", "Other")
tapply(sleep$WeekendSleep, group, mean, na.rm = TRUE)
## FirstTwo Other
## 8.213592 8.221892
tapply(sleep$WeekendSleep, group, var, na.rm = TRUE)
## FirstTwo Other
## 1.889152 1.858063
boxplot(sleep$WeekendSleep ~ group,
        main = "Weekend Sleep Hours by Class Year Group",
        col = c("lightgreen", "lightpink"))
```