Research Question: Do high school students in different grade levels (9th, 10th, 11th, and 12th grade) have significantly different mean hours of sleep on school nights?
This study uses data from the Youth Risk Behavior Surveillance System (YRBSS) to test whether sleep duration differs significantly between high school grade levels. The dataset contains 13,583 observations of high school students surveyed by the Centers for Disease Control and Prevention (CDC) to study health patterns among adolescents. Two variables are used in this analysis: grade (the student's grade level) and school_night_hours_sleep (hours of sleep on school nights).
The data were obtained through the OpenIntro project and come from a national survey of youth health behaviors. I chose this topic because sleep deprivation among high school students is a growing national health concern, and understanding how sleep patterns change across grades can guide interventions and policy on school start times and the distribution of academic workload.
Data preparation involved several steps to make the dataset clean and ready for statistical analysis. First, I loaded the required R packages (dplyr to manipulate the data and ggplot2 to visualize it) and imported the YRBSS data from the openintro package, which originally contained 13,583 observations of 13 variables. Second, I used the select function to keep only the two variables of interest: grade and school_night_hours_sleep. Third, I used the filter function to remove all observations with missing values in either variable, because ANOVA requires complete cases. This cleaning step left 12,262 of the 13,583 observations, a retention rate of about 90%. Fourth, I used the mutate function to convert grade to a factor and school_night_hours_sleep to a numeric variable, as required for the ANOVA and the visualizations. Finally, I applied the group_by and summarise functions to compute descriptive statistics for each grade level. The summary showed that 9th graders averaged 7.11 hours of sleep (n = 3,204), 10th graders 6.92 hours (n = 2,840), 11th graders 6.78 hours (n = 2,917), and 12th graders 6.68 hours (n = 3,278). In addition, 23 students in the "other" grade category reported an average of 7.33 hours. These group sizes count every retained row; the ANOVA output reports 1,268 observations deleted due to missingness, so the means and the model rest on 10,994 observations. The descriptive statistics suggest that average sleep declines as students advance through high school, with a difference of about 0.43 hours between 9th and 12th grade.
# Load required packages
library(dplyr)
library(ggplot2)
library(openintro)
data("yrbss")
# Check the dimensions of the dataset (rows, columns)
dim(yrbss)
## [1] 13583 13
# Select relevant variables
sleep_data <- yrbss %>%
  select(grade, school_night_hours_sleep) %>%
  filter(!is.na(grade), !is.na(school_night_hours_sleep))
dim(sleep_data)
## [1] 12262 2
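# Convert grade to a factor and sleep hours to numeric for ANOVA
# Note: as.numeric() returns NA for any response that is not a plain number
sleep_data <- sleep_data %>%
  mutate(
    grade = as.factor(grade),
    school_night_hours_sleep = as.numeric(school_night_hours_sleep)
  )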
# Summary statistics by grade
sleep_data %>%
  group_by(grade) %>%
  summarise(
    n = n(),
    mean_sleep = mean(school_night_hours_sleep, na.rm = TRUE),
    sd_sleep = sd(school_night_hours_sleep, na.rm = TRUE),
    median_sleep = median(school_night_hours_sleep, na.rm = TRUE)
  )
## # A tibble: 5 × 5
##   grade     n mean_sleep sd_sleep median_sleep
##   <fct> <int>      <dbl>    <dbl>        <dbl>
## 1 10     2840       6.92     1.11            7
## 2 11     2917       6.78     1.11            7
## 3 12     3278       6.68     1.09            7
## 4 9      3204       7.11     1.17            7
## 5 other    23       7.33     1.23            7
# Create a summary table showing distribution of observations
table(sleep_data$grade)
##
##    10    11    12     9 other
##  2840  2917  3278  3204    23
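Two details of the preparation step are easy to miss, and the sketch below illustrates both on small made-up vectors rather than the YRBSS data. First, as.numeric() returns NA for any string that is not a plain number, which would explain why the ANOVA output reports 1,268 observations deleted due to missingness even after the filter step; the response labels "<5" and "10+" used here are assumptions for illustration. Second, as.factor() sorts levels alphabetically, which is why grade 9 appears after grade 12 in the summary table; passing explicit levels to factor() is one way to get the natural order.

```r
# Made-up responses; the "<5" and "10+" labels are assumptions for illustration
responses <- c("7", "8", "<5", "6", "10+", "7")

# as.numeric() turns non-numeric strings into NA (with a warning, silenced here)
hours <- suppressWarnings(as.numeric(responses))
n_rows <- length(hours)        # what n() would count: every row
n_used <- sum(!is.na(hours))   # what mean(..., na.rm = TRUE) actually uses
cat("rows:", n_rows, " used in mean:", n_used, "\n")

# Made-up grade labels, one per response above
grades <- c("10", "9", "other", "12", "11", "9")

# Explicit levels give the natural order instead of the alphabetical default
grade_ordered <- factor(grades, levels = c("9", "10", "11", "12", "other"))
print(levels(grade_ordered))
```

If the same factor() call were used in place of as.factor(grade) inside mutate(), the tables would list the grades in natural order, and the rows of the outputs shown in this report would be ordered differently.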
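To answer the research question, I used a one-way ANOVA to compare mean sleep hours across grade levels. This test is appropriate because there is one categorical independent variable (grade) and one continuous dependent variable (sleep hours). Because the 23 students in the "other" category were kept in the data, the fitted model compares five groups, which is why the grade term has 4 degrees of freedom; the hypotheses below are stated for the four grade levels of interest.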
Hypotheses:
Null Hypothesis (H₀): μ₉ = μ₁₀ = μ₁₁ = μ₁₂ (all grade means are equal)
Alternative Hypothesis (Hₐ): At least one grade level has a different mean
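The group means show a declining trend in sleep hours as grade level increases: 9th graders have the highest mean among the four grades (7.11 hours) and 12th graders the lowest (6.68 hours). The median is 7 hours in every grade, so the decline is visible in the means rather than in the median lines of the boxplot.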
The ANOVA revealed statistically significant differences across grade levels, F(4, 10989) = 58.17, p < .001. I therefore rejected the null hypothesis and concluded that mean sleep duration differs by grade level.
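Tukey's HSD post-hoc test identified which pairs of grades differ. Ninth graders reported more sleep than every other high school grade: 0.42 hours more than 12th graders (p < .001), 0.33 hours more than 11th graders (p < .001), and 0.18 hours more than 10th graders (p < .001). Twelfth graders also slept significantly less than 10th graders (0.24 hours, p < .001) and 11th graders (0.10 hours, p = .013), and 11th graders slept 0.14 hours less than 10th graders (p < .001). None of the comparisons involving the "other" category were significant, which is unsurprising given how few students it contains.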
The effect size (η² = 0.021) shows that grade level accounts for about 2 percent of the variation in sleep, which is a small practical effect even though it is statistically significant.
# Create boxplot for distribution of sleep across grades
ggplot(sleep_data, aes(x = grade, y = school_night_hours_sleep, fill = grade)) +
  geom_boxplot() +
  labs(
    title = "Distribution of School Night Sleep Hours by Grade Level",
    x = "Grade Level",
    y = "Hours of Sleep on School Nights"
  ) +
  theme_minimal() +
  theme(legend.position = "none")
# Conduct One-Way ANOVA
anova_model <- aov(school_night_hours_sleep ~ grade, data = sleep_data)
# Display ANOVA results
summary(anova_model)
##                Df Sum Sq Mean Sq F value Pr(>F)
## grade           4    291   72.85   58.17 <2e-16 ***
## Residuals   10989  13762    1.25
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 1268 observations deleted due to missingness
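As a check on the table above, the degrees of freedom and the F value can be reproduced by hand. The sketch below uses the rounded values printed in the table, so the last digit can differ slightly from R's unrounded arithmetic; the symbols k (number of groups, including "other") and N (observations used by the model) are notation introduced here.

```latex
\begin{align*}
N &= 12262 - 1268 = 10994, \qquad k = 5 \\
df_{\text{grade}} &= k - 1 = 4, \qquad
df_{\text{resid}} = N - k = 10994 - 5 = 10989 \\
MS_{\text{resid}} &= \frac{SS_{\text{resid}}}{df_{\text{resid}}}
  \approx \frac{13762}{10989} \approx 1.2523 \\
F &= \frac{MS_{\text{grade}}}{MS_{\text{resid}}}
  \approx \frac{72.85}{1.2523} \approx 58.17
\end{align*}
```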
# Conduct Tukey's HSD post-hoc test
tukey_results <- TukeyHSD(anova_model)
print(tukey_results)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = school_night_hours_sleep ~ grade, data = sleep_data)
##
## $grade
##                 diff         lwr         upr     p adj
## 11-10    -0.14389452 -0.22873945 -0.05904959 0.0000369
## 12-10    -0.23951899 -0.32199477 -0.15704321 0.0000000
## 9-10      0.18189489  0.09881685  0.26497293 0.0000000
## other-10  0.40983607 -0.47356820  1.29324033 0.7123443
## 12-11    -0.09562447 -0.17762710 -0.01362185 0.0127796
## 9-11      0.32578941  0.24318107  0.40839774 0.0000000
## other-11  0.55373058 -0.32962963  1.43709080 0.4276161
## 9-12      0.42141388  0.34124076  0.50158700 0.0000000
## other-12  0.64935506 -0.23378076  1.53249087 0.2630389
## other-9   0.22794118 -0.65525108  1.11113344 0.9556219
# Visualize Tukey HSD results
plot(tukey_results, las = 1, col = "blue")
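The Tukey table can also be filtered programmatically rather than read by eye. The sketch below runs on simulated data with three hypothetical groups (a, b, c), not on the YRBSS sample, and keeps only the pairs whose adjusted p-value falls below 0.05; all object names are illustrative.

```r
# Simulated data: three hypothetical groups, not the YRBSS sample
set.seed(1)
toy <- data.frame(
  group = factor(rep(c("a", "b", "c"), each = 50)),
  value = c(rnorm(50, mean = 7), rnorm(50, mean = 7), rnorm(50, mean = 6))
)

# Fit the one-way ANOVA and run Tukey's HSD on it
toy_model <- aov(value ~ group, data = toy)
toy_tukey <- TukeyHSD(toy_model)$group  # matrix: diff, lwr, upr, p adj

# Keep only the pairs whose adjusted p-value is below 0.05
significant_pairs <- toy_tukey[toy_tukey[, "p adj"] < 0.05, , drop = FALSE]
print(round(significant_pairs, 3))
```

Applied to the analysis in this report, the same indexing on tukey_results$grade would be expected to drop the four rows that involve the "other" category, since their adjusted p-values are all above 0.05.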
# Calculate effect size (eta-squared)
anova_summary <- summary(anova_model)
ss_grade <- anova_summary[[1]]$`Sum Sq`[1]
ss_total <- sum(anova_summary[[1]]$`Sum Sq`)
eta_squared <- ss_grade / ss_total
cat("Eta-squared (effect size):", round(eta_squared, 4))
## Eta-squared (effect size): 0.0207
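The same effect size can be checked by hand from the sums of squares in the ANOVA table. This uses the rounded values printed there, so it is an approximation of the computation in the code above rather than an exact reproduction.

```latex
\[
\eta^2 = \frac{SS_{\text{grade}}}{SS_{\text{grade}} + SS_{\text{resid}}}
\approx \frac{291}{291 + 13762} = \frac{291}{14053} \approx 0.0207
\]
```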
This analysis found statistically significant differences in sleep duration by high school grade level, F(4, 10989) = 58.17, p < .001, so the null hypothesis is rejected. The results indicate that sleep hours decline with grade level: 9th graders averaged 7.11 hours of sleep compared with 6.68 hours for 12th graders, a difference of about 25 minutes per night. The small effect size (η² = 0.021) indicates that grade level accounts for only about 2 percent of the variation in sleep, meaning that other factors contribute far more. From a population health perspective, however, the steady decline is concerning because every grade averages below the 8 to 10 hours of sleep that the American Academy of Sleep Medicine recommends for teenagers.
These findings have implications for educators, parents, and policy makers. The gradual loss of sleep may be attributable to heavier academic workloads, extracurricular activities, part-time jobs, and later bedtimes that come with greater autonomy. That 12th graders are the most sleep-deprived raises concerns about their readiness for college or the workforce. Future studies should use longitudinal designs that follow students across all four years and include additional variables such as homework time, screen time, extracurricular activities, and indicators of mental health. Important limitations include the self-reported data (subject to recall bias), the cross-sectional design (which does not support causal inference), and the lack of control for confounding factors such as socioeconomic status or family commitments. These limitations notwithstanding, the analysis offers evidence that upperclassmen in particular could benefit from sleep education, and it contributes to the discussion of school policies that could support healthier sleep patterns.
Centers for Disease Control and Prevention (CDC). Youth Risk Behavior Surveillance System (YRBSS). Available from: https://www.cdc.gov/healthyyouth/data/yrbs/index.htm
OpenIntro Project. (2024). YRBSS Dataset. OpenIntro Data. Available from: https://www.openintro.org/data/index.php?data=yrbss