This report analyzes students' sleep patterns and how they relate to academic performance.
The primary objective of this analysis is to address several research
questions related to:
- Gender differences in academic performance (GPA)
- The influence of early class schedules on attendance and sleep
habits
- The impact of chronotype (i.e., “larks” vs. “owls”) on cognitive
performance
- Relationships between depression, happiness, alcohol use, and
stress
- Differences in sleep patterns across student class years
library(readxl)  # provides read_excel()
SleepStudy <- read_excel("~/Downloads/SleepStudy.xlsx")
str(SleepStudy)
## tibble [253 × 27] (S3: tbl_df/tbl/data.frame)
## $ Gender : num [1:253] 0 0 0 0 0 1 1 0 0 0 ...
## $ ClassYear : num [1:253] 4 4 4 1 4 4 2 2 1 4 ...
## $ LarkOwl : chr [1:253] "Neither" "Neither" "Owl" "Lark" ...
## $ NumEarlyClass : num [1:253] 0 2 0 5 0 0 2 0 2 2 ...
## $ EarlyClass : num [1:253] 0 1 0 1 0 0 1 0 1 1 ...
## $ GPA : num [1:253] 3.6 3.24 2.97 3.76 3.2 3.5 3.35 3 4 2.9 ...
## $ ClassesMissed : num [1:253] 0 0 12 0 4 0 2 0 0 0 ...
## $ CognitionZscore : num [1:253] -0.26 1.39 0.38 1.39 1.22 -0.04 0.41 -0.59 1.03 0.72 ...
## $ PoorSleepQuality: num [1:253] 4 6 18 9 9 6 2 10 5 2 ...
## $ DepressionScore : num [1:253] 4 1 18 1 7 14 1 2 12 6 ...
## $ AnxietyScore : num [1:253] 3 0 18 4 25 8 0 2 16 11 ...
## $ StressScore : num [1:253] 8 3 9 6 14 28 1 3 20 31 ...
## $ DepressionStatus: chr [1:253] "normal" "normal" "moderate" "normal" ...
## $ AnxietyStatus : chr [1:253] "normal" "normal" "severe" "normal" ...
## $ Stress : chr [1:253] "normal" "normal" "normal" "normal" ...
## $ DASScore : num [1:253] 15 4 45 11 46 50 2 7 48 48 ...
## $ Happiness : num [1:253] 28 25 17 32 15 22 25 29 29 30 ...
## $ AlcoholUse : chr [1:253] "Moderate" "Moderate" "Light" "Light" ...
## $ Drinks : num [1:253] 10 6 3 2 4 0 6 3 3 6 ...
## $ WeekdayBed : num [1:253] 25.8 25.7 27.4 23.5 25.9 ...
## $ WeekdayRise : num [1:253] 8.7 8.2 6.55 7.17 8.67 8.95 8.48 9.07 8.75 8 ...
## $ WeekdaySleep : num [1:253] 7.7 6.8 3 6.77 6.09 9.05 7.73 9.02 8.25 6.6 ...
## $ WeekendBed : num [1:253] 25.8 26 28 27 23.8 ...
## $ WeekendRise : num [1:253] 9.5 10 12.6 8 9.5 ...
## $ WeekendSleep : num [1:253] 5.88 7.25 10.09 7.25 7 ...
## $ AverageSleep : num [1:253] 7.18 6.93 5.02 6.9 6.35 9.04 7.52 9.01 8.54 6.68 ...
## $ AllNighter : num [1:253] 0 0 0 0 0 0 1 0 0 0 ...
print(names(SleepStudy))
## [1] "Gender" "ClassYear" "LarkOwl" "NumEarlyClass"
## [5] "EarlyClass" "GPA" "ClassesMissed" "CognitionZscore"
## [9] "PoorSleepQuality" "DepressionScore" "AnxietyScore" "StressScore"
## [13] "DepressionStatus" "AnxietyStatus" "Stress" "DASScore"
## [17] "Happiness" "AlcoholUse" "Drinks" "WeekdayBed"
## [21] "WeekdayRise" "WeekdaySleep" "WeekendBed" "WeekendRise"
## [25] "WeekendSleep" "AverageSleep" "AllNighter"
print(table(SleepStudy$ClassYear, useNA = "ifany"))
##
## 1 2 3 4
## 47 95 54 57
print(table(SleepStudy$LarkOwl, useNA = "ifany"))
##
## Lark Neither Owl
## 41 163 49
print(table(SleepStudy$DepressionStatus, useNA = "ifany"))
##
## moderate normal severe
## 34 209 10
if(!is.factor(SleepStudy$ClassYear)) {
SleepStudy$ClassYear <- factor(SleepStudy$ClassYear)
}
if(!is.factor(SleepStudy$Gender)) {
SleepStudy$Gender <- factor(SleepStudy$Gender)
}
if(is.factor(SleepStudy$EarlyClass)) {
SleepStudy$EarlyClass <- as.numeric(as.character(SleepStudy$EarlyClass))
}
if(!("StressScore" %in% names(SleepStudy)) && "Stress" %in% names(SleepStudy)) {
if(is.factor(SleepStudy$Stress)) {
SleepStudy$StressScore <- as.numeric(as.character(SleepStudy$Stress))
} else {
SleepStudy$StressScore <- SleepStudy$Stress
}
}
if(is.numeric(SleepStudy$Stress)) {
SleepStudy$StressCat <- ifelse(SleepStudy$Stress >= median(SleepStudy$Stress, na.rm = TRUE),
"High", "Normal")
SleepStudy$StressCat <- factor(SleepStudy$StressCat, levels = c("Normal", "High"))
}
if(any(tolower(as.character(SleepStudy$ClassYear)) %in% c("freshman", "sophomore"))) {
SleepStudy$Group <- ifelse(tolower(as.character(SleepStudy$ClassYear)) %in% c("freshman", "sophomore"),
"Underclassmen", "Upperclassmen")
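} else if(!anyNA(suppressWarnings(as.numeric(as.character(SleepStudy$ClassYear))))) {
# as.numeric() always returns a numeric vector, so check that every value converted cleanly.
# If ClassYear is numeric (e.g., 1, 2, 3, 4), consider values <=2 as underclassmen.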
class_year_numeric <- as.numeric(as.character(SleepStudy$ClassYear))
SleepStudy$Group <- ifelse(class_year_numeric <= 2, "Underclassmen", "Upperclassmen")
} else {
levs <- sort(unique(SleepStudy$ClassYear))
cutoff_index <- ceiling(length(levs) / 2)
lower_levels <- levs[1:cutoff_index]
SleepStudy$Group <- ifelse(SleepStudy$ClassYear %in% lower_levels, "Underclassmen", "Upperclassmen")
}
SleepStudy$Group <- factor(SleepStudy$Group, levels = c("Underclassmen", "Upperclassmen"))
print(table(SleepStudy$ClassYear))
##
## 1 2 3 4
## 47 95 54 57
print(table(SleepStudy$Group))
##
## Underclassmen Upperclassmen
## 142 111
For each research question, the code first checks whether the grouping variable has the two levels that a two-group comparison requires. If it does not, the code prints a message and outputs descriptive statistics for the outcome variable rather than skipping the analysis entirely.
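The check-then-fall-back pattern described above recurs in every analysis that follows, so it could be wrapped in one function. The sketch below is only an illustration: `compare_two_groups` is a hypothetical helper that does not appear in the report, and the `toy` data frame is made up.

```r
# Hypothetical helper (not part of the original report): runs a Welch t-test
# when `group` has exactly two observed levels, otherwise falls back to a
# descriptive summary of the outcome.
compare_two_groups <- function(data, outcome, group) {
  g <- droplevels(factor(data[[group]]))
  y <- data[[outcome]]
  if (nlevels(g) != 2) {
    message(group, " has ", nlevels(g), " level(s). Displaying summary for ", outcome, ":")
    return(summary(y))
  }
  t.test(y ~ g)
}

# Toy data for illustration only; these values are invented.
toy <- data.frame(
  score = c(3.1, 3.4, 2.9, 3.8, 3.0, 3.6),
  two   = c("a", "b", "a", "b", "a", "b"),
  three = c("x", "y", "z", "x", "y", "z")
)

res_two   <- compare_two_groups(toy, "score", "two")    # two levels: Welch t-test
res_three <- compare_two_groups(toy, "score", "three")  # three levels: summary only
print(res_two)
print(res_three)
```

Returning the result instead of only printing it leaves the choice of plotting or reporting to the caller.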
boxplot(GPA ~ Gender, data = SleepStudy,
main = "GPA Distribution by Gender",
xlab = "Gender", ylab = "GPA")
t_test_gender <- t.test(GPA ~ Gender, data = SleepStudy)
print(t_test_gender)
##
## Welch Two Sample t-test
##
## data: GPA by Gender
## t = 3.9139, df = 200.9, p-value = 0.0001243
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## 0.09982254 0.30252780
## sample estimates:
## mean in group 0 mean in group 1
## 3.324901 3.123725
SleepStudy_dropped <- droplevels(SleepStudy)
group_counts <- table(SleepStudy_dropped$Group)
print(group_counts)
##
## Underclassmen Upperclassmen
## 142 111
if(length(group_counts[group_counts > 0]) < 2) {
message("Only one group present for ClassYear (", names(group_counts)[group_counts > 0],
"). Displaying summary for NumEarlyClass:")
print(summary(SleepStudy_dropped$NumEarlyClass))
} else {
boxplot(NumEarlyClass ~ Group, data = SleepStudy_dropped,
main = "Number of Early Classes by Class Year Group",
xlab = "Class Year Group", ylab = "Number of Early Classes")
anova_early <- aov(NumEarlyClass ~ Group, data = SleepStudy_dropped)
print(summary(anova_early))
}
## Df Sum Sq Mean Sq F value Pr(>F)
## Group 1 36.4 36.38 16.34 7.06e-05 ***
## Residuals 251 558.9 2.23
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
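This comparison uses aov() on a two-level factor, while the other two-group comparisons in this report use the Welch t-test. With exactly two groups, a one-way ANOVA is equivalent to the equal-variance (pooled) t-test rather than to the Welch version: the F value equals the squared pooled t statistic. The sketch below checks this on simulated data; the `toy` data frame and its values are assumptions for illustration and are not drawn from SleepStudy.

```r
set.seed(1)

# Simulated two-group data, for illustration only.
toy <- data.frame(
  y = c(rnorm(20, mean = 2), rnorm(15, mean = 1)),
  g = factor(rep(c("A", "B"), times = c(20, 15)))
)

# F statistic for the grouping term from the one-way ANOVA table.
f_value <- summary(aov(y ~ g, data = toy))[[1]][["F value"]][1]

# t statistic from the pooled-variance (var.equal = TRUE) two-sample t-test.
t_pooled <- unname(t.test(y ~ g, data = toy, var.equal = TRUE)$statistic)

# With two groups, F equals t squared (up to floating-point error).
same_statistic <- isTRUE(all.equal(t_pooled^2, f_value))
print(same_statistic)
```

If unequal variances are a concern, t.test() with its default settings, or oneway.test(), gives the Welch-type comparison instead.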
SleepStudy$LarkOwl <- factor(SleepStudy$LarkOwl)
if(length(levels(SleepStudy$LarkOwl)) != 2) {
message("LarkOwl has ", length(levels(SleepStudy$LarkOwl)), " level(s). Displaying summary for CognitionZscore:")
print(summary(SleepStudy$CognitionZscore))
} else {
boxplot(CognitionZscore ~ LarkOwl, data = SleepStudy,
main = "Cognition (Z-Score) by Chronotype",
xlab = "Chronotype (Lark/Owl)", ylab = "Cognition Z-Score")
t_test_chrono <- t.test(CognitionZscore ~ LarkOwl, data = SleepStudy)
print(t_test_chrono)
}
## LarkOwl has 3 level(s). Displaying summary for CognitionZscore:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1.62e+00 -4.80e-01 -1.00e-02 -3.95e-05 4.40e-01 1.96e+00
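Because LarkOwl also contains a "Neither" level, the two-level check fails and no Lark-versus-Owl comparison is run. One possible way to address the chronotype question, sketched here on a made-up `toy` data frame rather than the real data, is to restrict the data to the two chronotypes of interest before testing. The object name `lark_owl` is introduced for illustration only.

```r
# Toy stand-in for SleepStudy; values are invented for illustration.
toy <- data.frame(
  LarkOwl = c("Lark", "Neither", "Owl", "Lark", "Owl", "Neither", "Lark", "Owl"),
  CognitionZscore = c(0.4, -0.1, 0.2, 0.9, -0.3, 0.0, 0.1, 0.5)
)

# Keep only Larks and Owls, then drop the now-unused "Neither" level.
lark_owl <- subset(toy, LarkOwl %in% c("Lark", "Owl"))
lark_owl$LarkOwl <- droplevels(factor(lark_owl$LarkOwl))

# Welch two-sample t-test on the remaining two groups.
t_test_chrono <- t.test(CognitionZscore ~ LarkOwl, data = lark_owl)
print(t_test_chrono)
```

Dropping the "Neither" group discards most of the sample (163 of 253 students in the table above), so this addresses only the narrower Lark-versus-Owl question.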
if(length(unique(SleepStudy$EarlyClass)) < 2) {
message("EarlyClass has only one value. Displaying summary for ClassesMissed:")
print(summary(SleepStudy$ClassesMissed))
} else {
boxplot(ClassesMissed ~ EarlyClass, data = SleepStudy,
main = "Classes Missed by Early Class Attendance",
xlab = "Early Class (0 = None, 1 = At Least One)", ylab = "Classes Missed")
t_test_attendance <- t.test(ClassesMissed ~ EarlyClass, data = SleepStudy)
print(t_test_attendance)
}
##
## Welch Two Sample t-test
##
## data: ClassesMissed by EarlyClass
## t = 1.4755, df = 152.78, p-value = 0.1421
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -0.2233558 1.5412830
## sample estimates:
## mean in group 0 mean in group 1
## 2.647059 1.988095
SleepStudy$DepressionStatus <- factor(SleepStudy$DepressionStatus)
if(length(levels(SleepStudy$DepressionStatus)) != 2) {
message("DepressionStatus has ", length(levels(SleepStudy$DepressionStatus)), " level(s). Displaying summary for Happiness:")
print(summary(SleepStudy$Happiness))
} else {
boxplot(Happiness ~ DepressionStatus, data = SleepStudy,
main = "Happiness by Depression Status",
xlab = "Depression Status", ylab = "Happiness")
t_test_happiness <- t.test(Happiness ~ DepressionStatus, data = SleepStudy)
print(t_test_happiness)
}
## DepressionStatus has 3 level(s). Displaying summary for Happiness:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 24.00 28.00 26.11 30.00 35.00
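DepressionStatus has three levels (normal, moderate, severe), so the two-group t-test is skipped. A one-way ANOVA, which this report already uses elsewhere, can compare more than two groups. The sketch below is an illustration on invented `toy` data, not a result from SleepStudy; the TukeyHSD() follow-up is an added suggestion and is not part of the original analysis.

```r
# Toy stand-in for SleepStudy; values are invented for illustration.
toy <- data.frame(
  DepressionStatus = factor(rep(c("normal", "moderate", "severe"), each = 4),
                            levels = c("normal", "moderate", "severe")),
  Happiness = c(30, 28, 27, 31, 24, 22, 25, 23, 15, 18, 14, 17)
)

# One-way ANOVA: does mean Happiness differ across the three statuses?
anova_happiness <- aov(Happiness ~ DepressionStatus, data = toy)
print(summary(anova_happiness))

# Pairwise comparisons with Tukey's honest significant difference.
print(TukeyHSD(anova_happiness))
```

With group sizes as unbalanced as those in the table above (209, 34, and 10), a rank-based alternative such as kruskal.test() may also be worth considering.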
if(length(unique(SleepStudy$AllNighter)) < 2) {
message("AllNighter has only one level. Displaying summary for PoorSleepQuality:")
print(summary(SleepStudy$PoorSleepQuality))
} else {
boxplot(PoorSleepQuality ~ AllNighter, data = SleepStudy,
main = "Sleep Quality by All-Nighter Experience",
xlab = "All-Nighter (0 = None, 1 = At Least One)", ylab = "Poor Sleep Quality")
t_test_sleepquality <- t.test(PoorSleepQuality ~ AllNighter, data = SleepStudy)
print(t_test_sleepquality)
}
##
## Welch Two Sample t-test
##
## data: PoorSleepQuality by AllNighter
## t = -1.7068, df = 44.708, p-value = 0.09479
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -1.9456958 0.1608449
## sample estimates:
## mean in group 0 mean in group 1
## 6.136986 7.029412
dat <- subset(SleepStudy, !is.na(StressScore) & !is.na(AlcoholUse) & is.finite(StressScore))
if(nrow(dat) == 0) {
message("No valid observations for StressScore and AlcoholUse. Cannot perform analysis.")
} else {
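# Note: AlcoholUse is stored as character labels (e.g., "Moderate", "Light"), so
# levels = c(0, 1) matches no value and every entry becomes NA after this step.
dat$AlcoholUse <- factor(dat$AlcoholUse, levels = c(0, 1), labels = c("Abstinent", "Heavy Use"))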
if(length(unique(dat$AlcoholUse)) < 2) {
message("Only one group in AlcoholUse present. Displaying summary for StressScore:")
print(summary(dat$StressScore))
} else {
stress_range <- range(dat$StressScore[is.finite(dat$StressScore)], na.rm = TRUE)
boxplot(StressScore ~ AlcoholUse, data = dat,
main = "Stress Scores by Alcohol Use",
xlab = "Alcohol Use", ylab = "Stress Score", ylim = stress_range)
t_test_stress <- t.test(StressScore ~ AlcoholUse, data = dat)
print(t_test_stress)
}
}
## Only one group in AlcoholUse present. Displaying summary for StressScore:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 3.000 8.000 9.466 14.000 37.000
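The "only one group" message here comes from the recoding step rather than from the data. AlcoholUse holds text labels, and factor() turns any value that is not listed in `levels` into NA. The sketch below reproduces the effect on a short invented vector that uses only the two labels visible in the str() output. The full set of labels in the real column is not shown in this report and should be inspected with table() before choosing which groups to compare.

```r
# Short invented vector using labels seen in the str() output above.
alcohol <- c("Moderate", "Light", "Light", "Moderate")

# Levels 0 and 1 match none of the text labels, so every value becomes NA.
mismatched <- factor(alcohol, levels = c(0, 1), labels = c("Abstinent", "Heavy Use"))
print(mismatched)
print(length(unique(mismatched)))  # a single unique value (NA)

# Inspect the labels that are actually present before recoding.
print(table(alcohol, useNA = "ifany"))

# Levels that match the stored labels preserve the values.
matched <- factor(alcohol, levels = c("Light", "Moderate"))
print(table(matched))
```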
if(length(unique(SleepStudy$Gender)) < 2) {
message("Only one gender level present. Displaying summary for Drinks:")
print(summary(SleepStudy$Drinks))
} else {
boxplot(Drinks ~ Gender, data = SleepStudy,
main = "Weekly Alcohol Consumption by Gender",
xlab = "Gender", ylab = "Number of Drinks")
t_test_alcohol <- t.test(Drinks ~ Gender, data = SleepStudy)
print(t_test_alcohol)
}
##
## Welch Two Sample t-test
##
## data: Drinks by Gender
## t = -6.1601, df = 142.75, p-value = 7.002e-09
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -4.360009 -2.241601
## sample estimates:
## mean in group 0 mean in group 1
## 4.238411 7.539216
if(length(unique(SleepStudy$Stress)) < 2) {
message("Only one stress level present (", paste(unique(SleepStudy$Stress), collapse=", "),
"). Displaying summary for WeekdayBed:")
print(summary(SleepStudy$WeekdayBed))
} else {
boxplot(WeekdayBed ~ Stress, data = SleepStudy,
main = "Weekday Bedtime by Stress Level",
xlab = "Stress Level (Normal vs. High)", ylab = "Weekday Bedtime")
t_test_bedtime <- t.test(WeekdayBed ~ Stress, data = SleepStudy)
print(t_test_bedtime)
}
##
## Welch Two Sample t-test
##
## data: WeekdayBed by Stress
## t = -1.0746, df = 87.048, p-value = 0.2855
## alternative hypothesis: true difference in means between group high and group normal is not equal to 0
## 95 percent confidence interval:
## -0.4856597 0.1447968
## sample estimates:
## mean in group high mean in group normal
## 24.71500 24.88543
SleepStudy$Group <- droplevels(SleepStudy$Group)
if(nlevels(SleepStudy$Group) < 2) {
message("Only one level present in Group (", paste(levels(SleepStudy$Group), collapse=", "),
"). Displaying summary for WeekendSleep:")
print(summary(SleepStudy$WeekendSleep))
} else {
boxplot(WeekendSleep ~ Group, data = SleepStudy,
main = "Weekend Sleep Hours by Class Year Group",
xlab = "Class Year Group", ylab = "Weekend Sleep Hours")
anova_weekend_sleep <- aov(WeekendSleep ~ Group, data = SleepStudy)
print(summary(anova_weekend_sleep))
}
## Df Sum Sq Mean Sq F value Pr(>F)
## Group 1 0.0 0.0043 0.002 0.962
## Residuals 251 470.8 1.8755
The analysis of the SleepStudy dataset provided insights into several aspects of college students’ sleep and related behaviors:
- Gender Differences in GPA: A t-test examines whether male and female students differ significantly in GPA.
- Early Classes Across Class Years: Using NumEarlyClass and the grouping variable (Underclassmen vs. Upperclassmen), we compare early class attendance. If only one group is present, descriptive summaries are provided.
- Chronotype and Cognitive Performance: Cognitive performance (CognitionZscore) is compared between Larks and Owls; otherwise, a summary is shown.
- Impact of Early Classes on Attendance: The relationship between early class attendance and ClassesMissed is evaluated, with descriptive stats if only one value exists.
- Depression Status and Happiness: Happiness scores are compared across depression statuses (or summarized if only one level exists).
- All-Nighter Experience and Sleep Quality: The impact of pulling an all-nighter on poor sleep quality is analyzed.
- Alcohol Use and Stress: Stress scores are compared between abstinent and heavy alcohol users; if only one group exists, summaries are provided.
- Gender and Alcohol Consumption: Weekly alcohol consumption (Drinks) is compared by gender.
- Stress Levels and Weekday Bedtime: Weekday bedtimes are analyzed based on stress levels (Stress).
- Class Year and Weekend Sleep Hours: Weekend sleep hours are compared between underclassmen and upperclassmen; if only one group is present, descriptive statistics are displayed.