Introduction

As a college student juggling school, work, and relationships, I’ve learned firsthand how important sleep is to function well and stay on top of everything. Without enough rest, it’s harder to stay focused, manage stress, and perform academically. Unfortunately, many college students face similar challenges and struggle to maintain healthy sleep patterns.

This project explores sleep behaviors among college students using data from a comprehensive sleep study. The focus is on comparing first- and second-year students with upperclassmen (third-year and beyond) to see if there are meaningful differences in sleep duration, quality, and related factors such as GPA and mental health scores.

By understanding these patterns, we can better identify the challenges students face in their early college years and explore how sleep may impact overall well-being and academic success.

The questions include:

# Q1: Is there a significant difference in the average GPA between male and female college students?
# Q2: Is there a significant difference in the average number of early classes between the first two class years and other class years?
# Q3: Do students who identify as "larks" have significantly better cognitive skills (cognition z-score) compared to "owls"?
# Q4: Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class (EarlyClass=1) and those who didn't (EarlyClass=0)?
# Q5: Is there a significant difference in the average happiness level between students with at least moderate depression and those with normal depression status?
# Q6: Is there a significant difference in average sleep quality scores between students who reported having at least one all-nighter (AllNighter=1) and those who didn't (AllNighter=0)?
# Q7: Do students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use?
# Q8: Is there a significant difference in the average number of drinks per week between students of different genders?
# Q9: Is there a significant difference in the average weekday bedtime between students with high and normal stress (Stress=High vs. Stress=Normal)?
# Q10: Is there a significant difference in the average hours of sleep on weekends between first- and second-year students and other students?

Data

college = read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")
head(college)

The data set includes information on 253 college students. It contains variables related to sleep patterns, mental health, academic performance, and lifestyle. Below is a summary of the data set structure and the variables included.

# Show structure of the data set
str(college)

The data set contains 27 variables about college students' sleep habits, academic performance, mental health, and lifestyle factors. Below is a list of all variables included:

# List all variable names
names(college)

The data was collected through a survey of college students, who self-reported their sleep patterns, academic habits, and mental health status. This observational data set is commonly used in educational statistics courses and was provided as part of the Lock5Stat textbook resources. While self-reported data may include some bias or inaccuracies, it provides meaningful insights into student well-being and daily routines.

Analysis

Is there a significant difference in the average GPA between male and female college students?

We compared GPA scores between male and female students to see if gender is associated with academic performance. The box plot compares GPA across genders. In this data set, Gender is coded as 0 and 1, and the data file itself does not label which code corresponds to which gender. Based on the plot, students in group 0 appear to have a slightly higher median GPA than students in group 1. The spread of GPA is also slightly wider for group 1, with a few low outliers.

To determine whether the difference in average GPA is statistically significant, a two-sample t-test is used.

t.test(GPA ~ Gender, data = college)

A Welch two-sample t-test was conducted to compare the average GPA between gender groups. Students in group 0 had a higher average GPA (3.32) compared to group 1 (3.12). The test resulted in a p-value of 0.0001, indicating that the difference in GPA is statistically significant at the 0.05 level. We can conclude that gender is associated with a significant difference in GPA in this sample.
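To make the Welch test less of a black box, here is a minimal sketch of the calculation behind t.test(). The two GPA vectors are hypothetical toy values, not taken from the SleepStudy data, and the variable names are illustrative only.

```r
# Hypothetical toy GPA values for two groups (NOT from SleepStudy)
gpa_a <- c(3.4, 3.1, 3.6, 3.3, 3.5, 3.2)
gpa_b <- c(3.0, 3.2, 2.9, 3.3, 2.8, 3.1, 3.4)

# Variance of each sample mean (sample variance divided by group size)
v_a <- var(gpa_a) / length(gpa_a)
v_b <- var(gpa_b) / length(gpa_b)

# Welch t statistic: difference in means over its standard error
t_manual <- (mean(gpa_a) - mean(gpa_b)) / sqrt(v_a + v_b)

# Welch-Satterthwaite approximation for the degrees of freedom
df_manual <- (v_a + v_b)^2 /
  (v_a^2 / (length(gpa_a) - 1) + v_b^2 / (length(gpa_b) - 1))

# Two-sided p-value from the t distribution
p_manual <- 2 * pt(-abs(t_manual), df = df_manual)

# Built-in Welch test for comparison (var.equal = FALSE is the default)
welch <- t.test(gpa_a, gpa_b)
print(c(t = t_manual, df = df_manual, p = p_manual))
```

The manual values should agree with welch$statistic, welch$parameter, and welch$p.value. Because the test does not assume equal variances, the degrees of freedom are usually not a whole number.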

Is there a significant difference in the average number of early classes between the first two class years and other class years?

To investigate whether early-year college students have a different number of early classes compared to upper-year students, we grouped the data into two categories: “Lower” (first and second year) and “Upper” (third year and above). We then compared the number of early classes between these groups using visualizations and a two-sample t-test.

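# Check which class years appear, then collapse them into two groups
unique(college$ClassYear)
## [1] 4 1 2 3
college$YearGroup <- ifelse(college$ClassYear %in% c(1, 2), "Lower", "Upper")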
table(college$YearGroup)
## 
## Lower Upper 
##   142   111
hist(college$EarlyClass[college$YearGroup == "Lower"],
     main = "Lower-Year Students: Early Classes",
     xlab = "Number of Early Classes",
     col = "lightblue",
     breaks = 5,
     xlim = c(0, max(college$EarlyClass, na.rm = TRUE)))

hist(college$EarlyClass[college$YearGroup == "Upper"],
     main = "Upper-Year Students: Early Classes",
     xlab = "Number of Early Classes",
     col = "lightgreen",
     breaks = 5,
     xlim = c(0, max(college$EarlyClass, na.rm = TRUE)))

t.test(EarlyClass ~ YearGroup, data = college)

Because EarlyClass is coded 0/1 (1 = at least one early class), each group mean is the proportion of students with an early class: approximately 0.73 for lower-year students, compared to 0.59 for upper-year students. A Welch two-sample t-test yielded a p-value of 0.021, indicating that the difference is statistically significant at the 0.05 level. Therefore, we conclude that lower-year students are significantly more likely to have an early class than upper-year students in this sample.
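Q4 treats EarlyClass as a 0/1 indicator (EarlyClass=1 vs. EarlyClass=0), so a short sketch of how the mean of such a variable reads as a proportion may be useful here. The vector below is hypothetical toy data, not values from the SleepStudy file.

```r
# Hypothetical 0/1 indicator: 1 = student has at least one early class
early <- c(1, 0, 1, 1, 0, 1, 1, 0)

# The mean of a 0/1 variable is the share of 1s
mean_early <- mean(early)

# Same quantity computed directly as count of 1s over group size
prop_early <- sum(early == 1) / length(early)

print(c(mean = mean_early, proportion = prop_early))
```

Both numbers are the same, so a group mean of 0.73 for such an indicator means that about 73% of that group has an early class.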

Do students who identify as “larks” have significantly better cognitive skills (cognition z-score) compared to “owls”?

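# Keep only students classified as "Lark" or "Owl" so the test compares two groups
chronotype_data <- college[college$LarkOwl %in% c("Lark", "Owl"), ]
t.test(CognitionZscore ~ LarkOwl, data = chronotype_data)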

We compared the cognitive scores of students who identify as “larks” and “owls.” Larks had a slightly higher average score (0.09) than owls (-0.04), but a t-test gave a p-value of 0.423. This means the difference is not statistically significant. So, we can’t say that one group performs better than the other based on this data.

Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class (EarlyClass=1) and those who didn’t (EarlyClass=0)?

t.test(ClassesMissed ~ EarlyClass, data = college)

Students who did not have early classes missed an average of 2.65 classes, while those with early classes missed about 2.00 classes. A t-test returned a p-value of 0.142, meaning the difference is not statistically significant at the 0.05 level. So, we can’t say that having an early class affects the number of classes a student misses based on this data.

Is there a significant difference in the average happiness level between students with at least moderate depression and those with normal depression status?

To explore how depression relates to happiness among college students, we grouped students based on their depression status. Those with “Moderate” or “Severe” depression were combined into one group and compared to students with a “Normal” depression status. We then compared their average happiness levels using a bar plot and a t-test.

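# Combine "moderate" and "severe" into one group and count each group
college$DepressionGroup <- ifelse(college$DepressionStatus == "normal", "Normal", "Moderate+")
table(college$DepressionGroup)
## 
## Moderate+    Normal 
##        44       209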

t.test(Happiness ~ DepressionGroup, data = college)

Students with normal depression status had an average happiness score of 27.06, while those with moderate or severe depression averaged 21.61. A Welch two-sample t-test showed this difference to be statistically significant (p < 0.000001), with a 95% confidence interval for the difference in means (Moderate+ minus Normal) of approximately -7.38 to -3.51. Because the whole interval lies below zero, this provides strong evidence that students experiencing moderate or severe depression report significantly lower happiness levels than those with normal depression status.
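The group means, confidence interval, and p-value reported above can all be read from the object that t.test() returns. The sketch below uses a hypothetical toy data frame (the scores are made up, not from SleepStudy) to show where each piece lives and which direction the difference is taken in.

```r
# Hypothetical toy happiness scores (NOT from SleepStudy)
toy <- data.frame(
  Happiness = c(20, 22, 19, 24, 21, 27, 29, 26, 28, 25, 30),
  DepressionGroup = c(rep("Moderate+", 5), rep("Normal", 6))
)

res <- t.test(Happiness ~ DepressionGroup, data = toy)

# Group means, ordered alphabetically by group label
print(res$estimate)

# 95% CI for (first group mean - second group mean)
print(res$conf.int)

# p-value of the two-sided test
print(res$p.value)

# The observed difference sits at the center of the interval
diff_means <- unname(res$estimate[1] - res$estimate[2])
print(diff_means)
```

With the labels used here, "Moderate+" sorts before "Normal", so the interval describes Moderate+ minus Normal. A negative interval therefore means lower happiness in the Moderate+ group.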

Is there a significant difference in average sleep quality scores between students who reported having at least one all-nighter (AllNighter=1) and those who didn’t (AllNighter=0)?

To examine the relationship between pulling all-nighters and sleep quality, students were grouped based on whether they reported having at least one all-nighter (AllNighter = 1) or none (AllNighter = 0). We compared their average sleep quality scores, where higher scores indicate worse sleep quality, to see if staying up all night is linked to poorer rest.

t.test(PoorSleepQuality ~ AllNighter, data = college)

Students who did not report pulling an all-nighter had an average poor sleep quality score of 6.14, while those who did had a higher average of 7.03. Although the group that pulled all-nighters reported worse sleep, a t-test gave a p-value of 0.095, which is above the 0.05 threshold. The difference is therefore not statistically significant, and we cannot conclude that pulling all-nighters is associated with worse sleep quality in this sample.

Do students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use?

To see if alcohol use is related to stress levels, we compared students who abstain from alcohol with those who report heavy drinking. We focused on their average stress scores to test for a significant difference between these two groups.

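# Keep only the "Abstain" and "Heavy" alcohol-use groups for a two-group comparison
stress_data <- college[college$AlcoholUse %in% c("Abstain", "Heavy"), ]
t.test(StressScore ~ AlcoholUse, data = stress_data)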

Students who reported heavy alcohol use had a slightly higher average stress score than those who abstained. But the difference was not statistically significant (p = 0.536), so we can’t say alcohol use is related to stress levels based on this data.

Is there a significant difference in the average number of drinks per week between students of different genders?

To explore if alcohol consumption differs by gender, we compared the average number of drinks per week between male and female students.

t.test(Drinks ~ Gender, data = college)

Students in group 1 (likely males) drank more per week on average (7.54 drinks) than students in group 0 (likely females, 4.24 drinks). The p-value was very small (p < 0.000001), meaning this difference is statistically significant. So, gender appears to be related to how much students drink.

Is there a significant difference in the average weekday bedtime between students with high and normal stress (Stress=High vs. Stress=Normal)?

We explored whether students with high stress go to bed at different times compared to those with normal stress levels. We used reported weekday bedtime data for this comparison.

t.test(WeekdayBed ~ Stress, data = college)

Students with high stress had an average weekday bedtime of 24.72 (about 12:43 AM), while those with normal stress went to bed at 24.89 (about 12:53 AM). The difference was not statistically significant (p = 0.286), so we can’t say stress level affects weekday bedtime based on this data.
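The clock times quoted above come from converting decimal hours to hours and minutes. The helper below is a rough sketch of that conversion. It assumes WeekdayBed is recorded as hours since midnight, with values above 24 meaning after midnight, as the reported averages suggest. The function name to_clock is hypothetical.

```r
# Convert decimal hours (possibly above 24) to a 12-hour clock string.
# Assumption: values count hours from midnight, so 24.72 is 0.72 h past midnight.
to_clock <- function(x) {
  total_min <- round((x %% 24) * 60)      # whole minutes after midnight
  h <- (total_min %/% 60) %% 24           # hour of day, 0-23
  m <- total_min %% 60                    # minutes past the hour
  suffix <- ifelse(h < 12, "AM", "PM")
  h12 <- ifelse(h %% 12 == 0, 12, h %% 12)
  sprintf("%d:%02d %s", as.integer(h12), as.integer(m), suffix)
}

bedtimes <- to_clock(c(24.72, 24.89))
print(bedtimes)
```

For example, 0.72 of an hour is 0.72 × 60 ≈ 43 minutes, which is where "12:43 AM" comes from.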

Is there a significant difference in the average hours of sleep on weekends between first- and second-year students and other students?

We compared weekend sleep hours between underclassmen (1st and 2nd year students) and upperclassmen (3rd year and beyond) to see if weekend sleep habits vary by academic year.

t.test(WeekendSleep ~ YearGroup, data = college)

Lower-year students got an average of 8.21 hours of weekend sleep, while upper-year students averaged 8.22. A t-test gave a p-value of 0.962, meaning there is no significant difference in weekend sleep between the two groups.

Project Conclusion

This project analyzed a wide range of factors related to college students’ sleep habits, mental health, academic performance, and lifestyle choices. We examined 10 specific research questions, using visualizations and two-sample t-tests to compare averages between two groups in each case.

Some results supported clear associations. For example, we found that students with moderate or severe depression reported significantly lower happiness scores, and male students consumed significantly more alcohol per week than female students. Similarly, lower-year students had more early classes on average, and gender showed a statistically significant difference in GPA.

However, several comparisons revealed no significant differences. These included bedtime based on stress levels, number of missed classes between early class and non-early class students, and stress differences between heavy drinkers and abstainers. These non-significant findings remind us that not all assumptions about student behavior hold up under statistical analysis.

Overall, this analysis provides insight into how academic and personal habits intersect with health and well-being in college students. While some patterns were expected, others were surprising, and all results contribute to a deeper understanding of student life.

Appendix of all code chunks

# Q1: GPA by Gender
boxplot(GPA ~ Gender, data = college)

t.test(GPA ~ Gender, data = college)

# Q2: Early Classes by Year Group
# Define the year groups first, then compute each group's mean
college$YearGroup <- ifelse(college$ClassYear %in% c(1, 2), "Lower", "Upper")
lower_mean <- mean(college$EarlyClass[college$YearGroup == "Lower"], na.rm = TRUE)
upper_mean <- mean(college$EarlyClass[college$YearGroup == "Upper"], na.rm = TRUE)
barplot(c(lower_mean, upper_mean),
        names.arg = c("Lower", "Upper"),
        col = "lightblue")

t.test(EarlyClass ~ YearGroup, data = college)

# Q3: Cognition Z-score by Chronotype
chronotype_data <- college[college$LarkOwl %in% c("Lark", "Owl"), ]
stripchart(CognitionZscore ~ LarkOwl, data = chronotype_data)

t.test(CognitionZscore ~ LarkOwl, data = chronotype_data)

# Q4: Classes Missed by EarlyClass
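# Mean classes missed for students without (0) and with (1) an early class
missed_0 <- mean(college$ClassesMissed[college$EarlyClass == 0], na.rm = TRUE)
missed_1 <- mean(college$ClassesMissed[college$EarlyClass == 1], na.rm = TRUE)
barplot(c(missed_0, missed_1),
        names.arg = c("No Early Class", "Had Early Class"),
        col = "pink")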

t.test(ClassesMissed ~ EarlyClass, data = college)

# Q5: Happiness by Depression Status
college$DepressionGroup <- ifelse(college$DepressionStatus == "normal", "Normal", "Moderate+")
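# Mean happiness for each depression group
happy_normal <- mean(college$Happiness[college$DepressionGroup == "Normal"], na.rm = TRUE)
happy_moderate <- mean(college$Happiness[college$DepressionGroup == "Moderate+"], na.rm = TRUE)
barplot(c(happy_normal, happy_moderate),
        names.arg = c("Normal", "Moderate+"),
        col = "green")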

t.test(Happiness ~ DepressionGroup, data = college)

# Q6: Sleep Quality by All-Nighter Status
stripchart(PoorSleepQuality ~ AllNighter, data = college)

t.test(PoorSleepQuality ~ AllNighter, data = college)

# Q7: Stress Score by Alcohol Use
stress_data <- college[college$AlcoholUse %in% c("Abstain", "Heavy"), ]
stripchart(StressScore ~ AlcoholUse, data = stress_data)

t.test(StressScore ~ AlcoholUse, data = stress_data)

# Q8: Weekly Drinks by Gender
boxplot(Drinks ~ Gender, data = college)

t.test(Drinks ~ Gender, data = college)

# Q9: Weekday Bedtime by Stress Level
stripchart(WeekdayBed ~ Stress, data = college)

t.test(WeekdayBed ~ Stress, data = college)

# Q10: Weekend Sleep by Year Group
college$YearGroup <- ifelse(college$ClassYear %in% c(1, 2), "Lower", "Upper")
par(mfrow = c(1, 2))
hist(college$WeekendSleep[college$YearGroup == "Lower"],
     main = "Lower-Year Students", xlab = "Weekend Sleep (hrs)", col = "orange")
hist(college$WeekendSleep[college$YearGroup == "Upper"],
     main = "Upper-Year Students", xlab = "Weekend Sleep (hrs)", col = "skyblue")

par(mfrow = c(1, 1))
t.test(WeekendSleep ~ YearGroup, data = college)