Stat 353 Project 2: Exploring Sleep Patterns in College Students

1. Introduction

This report uses the SleepStudy dataset of 253 college students to examine how sleep habits, stress, depression, early classes, and lifestyle choices relate to outcomes such as GPA, cognition, happiness, and sleep quality. Ten comparison questions are analyzed using descriptive statistics, graphs, and, for several questions, two-sample tests.

Q1. Is there a significant difference in the average GPA between male and female college students?
Q2. Is there a significant difference in the average number of early classes between the first two class years and other class years?
Q3. Do students who identify as “larks” have significantly better cognitive skills (cognition z-score) compared to “owls”?
Q4. Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class (EarlyClass=1) and those who didn’t (EarlyClass=0)?
Q5. Is there a significant difference in the average happiness level between students with at least moderate depression and normal depression status?
Q6. Is there a significant difference in average sleep quality scores between students who reported having at least one all-nighter (AllNighter=1) and those who didn’t (AllNighter=0)?
Q7. Do students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use?
Q8. Is there a significant difference in the average number of drinks per week between students of different genders?
Q9. Is there a significant difference in the average weekday bedtime between students with high and normal stress (Stress=High vs. Stress=Normal)?
Q10. Is there a significant difference in the average hours of sleep on weekends between students in the first two class years and other students?

2. Data

The SleepStudy dataset includes 253 students and 27 variables describing sleep habits, academic performance, mental health, and lifestyle behaviors. It contains both categorical variables (e.g., gender, stress level, depression status) and numerical variables (e.g., GPA, sleep hours, drinks per week), making it suitable for comparing groups and analyzing differences in average outcomes.

3. Analysis

Each question is analyzed in turn, and the output is followed by a short interpretation of the results.

Q1. Is there a significant difference in the average GPA between male and female college students?

mean(SleepStudy$GPA[SleepStudy$Gender == 1], na.rm = TRUE) #Male

## [1] 3.123725

mean(SleepStudy$GPA[SleepStudy$Gender == 0], na.rm = TRUE) #Female

## [1] 3.324901

t.test(GPA ~ Gender, data = SleepStudy)

## 
##  Welch Two Sample t-test
## 
## data:  GPA by Gender
## t = 3.9139, df = 200.9, p-value = 0.0001243
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  0.09982254 0.30252780
## sample estimates:
## mean in group 0 mean in group 1 
##        3.324901        3.123725

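The average GPA is approximately 3.12 for male students and 3.32 for female students, a difference of about 0.20 grade points. The Welch two-sample t-test gives t = 3.91 on about 201 degrees of freedom with p = 0.0001, and the 95% confidence interval for the difference (female minus male) runs from about 0.10 to 0.30, which excludes zero. The difference is therefore statistically significant at the 5% level: female students in this dataset have a higher average GPA than male students, although the gap is modest in size.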
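The numbers needed for a written conclusion can be pulled directly out of a `t.test` result, as in the minimal sketch below. The data frame `toy` and its values are made up for illustration and are not taken from SleepStudy; only `t.test` and the fields of the object it returns are standard R.

```r
# Toy data (hypothetical values, NOT from SleepStudy): GPA for two groups coded 0/1
toy <- data.frame(
  GPA    = c(3.4, 3.1, 3.6, 3.3, 3.5, 2.9, 3.0, 3.2, 2.8, 3.1),
  Gender = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
)

# Welch two-sample t-test (R's default, var.equal = FALSE)
res <- t.test(GPA ~ Gender, data = toy)

# Pieces of the result that belong in the written interpretation
diff_means  <- unname(res$estimate[1] - res$estimate[2])  # group 0 minus group 1
ci          <- as.numeric(res$conf.int)                   # 95% CI for that difference
significant <- res$p.value < 0.05                         # decision at the 5% level

cat(sprintf("difference = %.2f, 95%% CI [%.2f, %.2f], p = %.4f\n",
            diff_means, ci[1], ci[2], res$p.value))
```

Reporting the difference, its confidence interval, and the p-value together makes it clear both whether the gap is significant and how large it is.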

Q2. Is there a significant difference in the average number of early classes between the first two class years and other class years?

boxplot(NumEarlyClass ~ ClassYear <= 2, data = SleepStudy,
        main = "Early Classes: Lower vs Upper Years",
        xlab = "Lower (≤2) vs Upper (≥3) Class Year",
        ylab = "Early Classes",
        names = c("Upper", "Lower"),
        col = c("lightpink", "lightblue"))

The boxplot compares the number of early classes taken by lower-year students (ClassYear 1–2) and upper-year students (ClassYear 3–4). The median number of early classes is slightly higher for upper-year students, and the upper-year group also shows a greater spread, indicating more variation in scheduling. Lower-year students tend to have fewer early classes overall, with most values clustered around the lower end. Because only a boxplot was produced, this comparison is descriptive; no formal test of significance was run for this question.
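A boxplot alone does not say whether the gap between the two year groups is statistically significant. The following is a minimal sketch of one way to test it, using a Welch t-test on an explicitly labeled two-level grouping. The data frame `toy`, its values, and the column `YearGroup` are hypothetical and only mimic the structure of SleepStudy.

```r
# Toy data (hypothetical values, NOT from SleepStudy)
toy <- data.frame(
  ClassYear     = c(1, 1, 2, 2, 2, 3, 3, 4, 4, 4),
  NumEarlyClass = c(1, 2, 0, 2, 1, 2, 3, 1, 2, 3)
)

# Label the two groups explicitly instead of relying on FALSE/TRUE ordering
toy$YearGroup <- factor(ifelse(toy$ClassYear <= 2, "Lower", "Upper"),
                        levels = c("Lower", "Upper"))

# Group means, then a Welch t-test of Lower vs Upper
group_means <- tapply(toy$NumEarlyClass, toy$YearGroup, mean)
res <- t.test(NumEarlyClass ~ YearGroup, data = toy)

print(group_means)
print(res$p.value)
```

Naming the factor levels avoids having to remember that `ClassYear <= 2` sorts `FALSE` (upper years) before `TRUE` (lower years) on the plot axis.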

Q3. Do students who identify as “larks” have significantly better cognitive skills (cognition z-score) compared to “owls”?

mean(SleepStudy$CognitionZscore[SleepStudy$LarkOwl == "Lark"], na.rm = TRUE)

## [1] 0.0902439

mean(SleepStudy$CognitionZscore[SleepStudy$LarkOwl == "Owl"],  na.rm = TRUE)

## [1] -0.03836735

subset_data <- SleepStudy[SleepStudy$LarkOwl %in% c("Lark", "Owl"),]
t.test(CognitionZscore ~ LarkOwl, data = subset_data)

## 
##  Welch Two Sample t-test
## 
## data:  CognitionZscore by LarkOwl
## t = 0.80571, df = 75.331, p-value = 0.4229
## alternative hypothesis: true difference in means between group Lark and group Owl is not equal to 0
## 95 percent confidence interval:
##  -0.1893561  0.4465786
## sample estimates:
## mean in group Lark  mean in group Owl 
##         0.09024390        -0.03836735

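The average cognition z-score for students who identify as “Larks” is approximately 0.09, while the average for “Owls” is about –0.04. Because z-scores are centered around zero, larks score slightly above the overall cognitive average and owls slightly below it. However, the Welch t-test gives t = 0.81 with p = 0.42, and the 95% confidence interval for the difference (about –0.19 to 0.45) includes zero, so the difference is not statistically significant. The data do not provide evidence that larks have better cognitive skills than owls.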

Q4. Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class (EarlyClass=1) and those who didn’t (EarlyClass=0)?

par(mfrow = c(1,2))   

hist(SleepStudy$ClassesMissed[SleepStudy$EarlyClass == 0],
     main = "Classes Missed (No Early Class)",
     xlab = "Classes Missed",
     col = "lightblue",
     breaks = 8)

hist(SleepStudy$ClassesMissed[SleepStudy$EarlyClass == 1],
     main = "Classes Missed (Early Class)",
     xlab = "Classes Missed",
     col = "lightgreen",
     breaks = 8)

par(mfrow = c(1,1))

The two histograms compare how many classes students miss depending on whether they had at least one early class. In both groups, the distribution is heavily right-skewed, meaning most students miss only a small number of classes, while a few miss many. The histograms suggest that students with early classes tend to miss slightly more classes, although both groups show a strong concentration near zero. Group means were not computed and no test was run, so this impression is visual only.
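Because both distributions are heavily right-skewed, a rank-based comparison is one reasonable alternative to a t-test for this question. The sketch below uses the Wilcoxon rank-sum test (`wilcox.test`), which is a swapped-in technique and not something the original analysis ran. The data frame `toy` and its values are hypothetical.

```r
# Toy data (hypothetical values, NOT from SleepStudy): right-skewed counts
toy <- data.frame(
  EarlyClass    = c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1),
  ClassesMissed = c(0, 0, 1, 2, 3, 10, 0, 1, 2, 2, 4, 12)
)

# Medians are less affected by the long right tail than means
group_medians <- tapply(toy$ClassesMissed, toy$EarlyClass, median)

# Wilcoxon rank-sum test; exact = FALSE uses the normal approximation,
# which is the appropriate choice when the data contain ties
res <- wilcox.test(ClassesMissed ~ EarlyClass, data = toy, exact = FALSE)

print(group_medians)
print(res$p.value)
```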

Q5. Is there a significant difference in the average happiness level between students with at least moderate depression and normal depression status?

boxplot(Happiness ~ DepressionStatus,
        data = SleepStudy,
        main = "Happiness by Depression Status",
        xlab = "Depression Status",
        ylab = "Happiness Score",
        col = c("lightblue", "lightgreen", "lightpink"))

The boxplot compares happiness scores across three depression status groups: normal, moderate, and severe. Students with normal depression status show the highest median happiness scores, with most values clustered in the upper 20s. The moderate group has a slightly lower median but still relatively high happiness scores. The severe depression group displays the lowest median happiness, with scores shifted noticeably downward. Note that the question combines the moderate and severe groups into a single "at least moderate" group, whereas the plot shows them separately, and the comparison here is graphical rather than a formal test.
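To match the wording of the question, the moderate and severe categories can be merged before comparing means. This is a minimal sketch under the assumption that `DepressionStatus` takes the values "normal", "moderate", and "severe"; the data frame `toy`, its values, and the column `DepGroup` are hypothetical.

```r
# Toy data (hypothetical values, NOT from SleepStudy)
toy <- data.frame(
  DepressionStatus = c("normal", "normal", "normal", "normal",
                       "moderate", "moderate", "severe", "severe"),
  Happiness        = c(30, 28, 27, 29, 24, 26, 18, 20)
)

# Collapse three categories into the two groups named in the question
toy$DepGroup <- factor(
  ifelse(toy$DepressionStatus == "normal", "Normal", "AtLeastModerate"),
  levels = c("Normal", "AtLeastModerate")
)

# Cross-tabulate old vs new labels to confirm the recoding did what was intended
recode_check <- table(toy$DepressionStatus, toy$DepGroup)

group_means <- tapply(toy$Happiness, toy$DepGroup, mean)
res <- t.test(Happiness ~ DepGroup, data = toy)

print(recode_check)
print(group_means)
```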

Q6. Is there a significant difference in average sleep quality scores between students who reported having at least one all-nighter (AllNighter=1) and those who didn’t (AllNighter=0)?

var(SleepStudy$PoorSleepQuality[SleepStudy$AllNighter == 0], na.rm = TRUE) #no

## [1] 8.540782

var(SleepStudy$PoorSleepQuality[SleepStudy$AllNighter == 1], na.rm = TRUE) #yes

## [1] 7.968806

Prop_test(data = SleepStudy, variable = PoorSleepQuality, success = "1", by = AllNighter)

## 
## <<< 2-sample test for equality of proportions without continuity correction 
## 
## variable: PoorSleepQuality 
## success: 1 
## by: AllNighter 
## 
## --- Description
## 
##                   0       1
## -----------  ------  ------
## n_1               2       0
## n_total         219      34
## proportion    0.009   0.000
## 
## --- Inference
## 
## Chi-square statistic: 0.313 
## Degrees of freedom: 1 
## Hypothesis test of equal population proportions: p-value = 0.576

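The variance in sleep quality scores is 8.54 for students who did not pull an all-nighter and 7.97 for those who did, so the spread of scores is nearly the same in both groups. The proportion test compares only the share of students whose PoorSleepQuality score is exactly 1 (2 of 219 without an all-nighter versus 0 of 34 with one, p = 0.576); it shows no significant difference in that share, but it does not compare average scores. Neither result directly answers whether mean sleep quality differs between the two groups; a two-sample comparison of means would be needed for that.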
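The distinction between comparing a proportion and comparing a mean can be made concrete with a small sketch. It assumes `PoorSleepQuality` is a numeric score and `AllNighter` is coded 0/1, as the code above suggests; the data frame `toy` and its values are hypothetical, and the t-test shown is a suggested alternative, not part of the original analysis.

```r
# Toy data (hypothetical values, NOT from SleepStudy)
toy <- data.frame(
  AllNighter       = c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1),
  PoorSleepQuality = c(4, 6, 5, 1, 7, 3, 8, 6, 9, 5)
)

# What a proportion test with success = "1" looks at:
# the share of scores that are exactly equal to 1 in each group
share_score_1 <- tapply(toy$PoorSleepQuality == 1, toy$AllNighter, mean)

# What the question asks about: the mean score in each group
mean_score <- tapply(toy$PoorSleepQuality, toy$AllNighter, mean)

# Welch t-test comparing those two means
res <- t.test(PoorSleepQuality ~ AllNighter, data = toy)

print(share_score_1)
print(mean_score)
print(res$p.value)
```

In the toy data the two groups barely differ in the share of scores equal to 1 but differ clearly in mean score, which is why the choice of test matters.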

Q7. Do students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use?

par(mfrow = c(1,2))

hist(SleepStudy$StressScore[SleepStudy$AlcoholUse == "Abstain"],
     main = "Stress Score: Abstain",
     xlab = "Stress Score",
     col = "lightblue",
     breaks = 8)

hist(SleepStudy$StressScore[SleepStudy$AlcoholUse == "Heavy"],
     main = "Stress Score: Heavy Use",
     xlab = "Stress Score",
     col = "lightgreen",
     breaks = 8)

par(mfrow = c(1,1))

The histograms show a difference in stress scores between students who abstain from alcohol and those who report heavy alcohol use.

- Abstainers tend to cluster at lower stress scores, with most values falling near the low end of the scale.
- Heavy alcohol users show a shift toward higher stress scores, with noticeably more observations in the mid-to-high stress range.

The distributions suggest that students who abstain from alcohol generally report lower stress, while heavy alcohol users tend to report higher stress. This points to a possible relationship between greater alcohol use and increased stress in the dataset, but no formal test was run, so the significance of the difference has not been established.
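Since the question is directional, a one-sided test on just the two groups of interest is one way to formalize it. The sketch below assumes that a lower `StressScore` means less stress and that `AlcoholUse` contains the labels "Abstain" and "Heavy" alongside other categories; the data frame `toy` and its values are hypothetical.

```r
# Toy data (hypothetical values, NOT from SleepStudy)
toy <- data.frame(
  AlcoholUse  = c("Abstain", "Abstain", "Abstain", "Abstain", "Light",
                  "Moderate", "Heavy", "Heavy", "Heavy", "Heavy"),
  StressScore = c(4, 6, 3, 7, 9, 10, 12, 9, 14, 11)
)

# Keep only the two groups named in the question
two_groups <- subset(toy, AlcoholUse %in% c("Abstain", "Heavy"))

# Fix the level order so the test direction is unambiguous
two_groups$AlcoholUse <- factor(two_groups$AlcoholUse,
                                levels = c("Abstain", "Heavy"))

# alternative = "less" tests whether mean(Abstain) - mean(Heavy) < 0,
# i.e. whether abstainers have lower stress scores on average
res <- t.test(StressScore ~ AlcoholUse, data = two_groups,
              alternative = "less")

print(res$estimate)
print(res$p.value)
```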

Q8. Is there a significant difference in the average number of drinks per week between students of different genders?

mean(SleepStudy$Drinks[SleepStudy$Gender == 0], na.rm = TRUE) #female

## [1] 4.238411

mean(SleepStudy$Drinks[SleepStudy$Gender == 1], na.rm = TRUE) #male

## [1] 7.539216

t.test(Drinks ~ Gender, data = SleepStudy, alternative = "two.sided")

## 
##  Welch Two Sample t-test
## 
## data:  Drinks by Gender
## t = -6.1601, df = 142.75, p-value = 0.000000007002
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -4.360009 -2.241601
## sample estimates:
## mean in group 0 mean in group 1 
##        4.238411        7.539216

The mean number of drinks per week differs clearly by gender:

- Female students average about 4.24 drinks per week.
- Male students average about 7.54 drinks per week.

Male students therefore consume about 3.3 more drinks per week on average than female students. The Welch t-test confirms that this difference is statistically significant (t = –6.16, p ≈ 0.000000007), and the 95% confidence interval for the difference (female minus male) runs from about –4.36 to –2.24 drinks per week, which excludes zero.

Q9. Is there a significant difference in the average weekday bedtime between students with high and normal stress (Stress=High vs. Stress=Normal)?

boxplot(SleepStudy$WeekdayBed[SleepStudy$Stress == "normal"],
        SleepStudy$WeekdayBed[SleepStudy$Stress == "high"],
        names = c("Normal", "High"),
        col = c("lightblue", "lightgreen"),
        main = "Weekday Bedtime by Stress Level",
        xlab = "Stress Level",
        ylab = "Weekday Bedtime")

The boxplot compares weekday bedtimes for students with normal stress versus high stress. The two groups show very similar bedtime patterns. Both have median bedtimes around 24–25 on the plotted scale, which corresponds to roughly midnight to 1 a.m. if bedtimes after midnight are recorded as values above 24. The spread of the data is also similar for both groups, though the normal stress group shows a few more extreme late-night outliers. Overall, there is no noticeable shift suggesting that one stress group consistently goes to bed earlier or later than the other, although no formal test was run.
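Bedtime values above 24 are easier to read as clock times. The helper below is a hypothetical illustration, not part of the original analysis, and it assumes `WeekdayBed` is recorded in hours on an extended scale where 24.0 denotes midnight, as the medians of 24–25 suggest.

```r
# Convert a bedtime on an extended 24-hour scale (e.g. 24.5 = half past
# midnight) into an "HH:MM" clock string. Hypothetical helper.
bedtime_to_clock <- function(x) {
  total_min <- round(x * 60)          # work in whole minutes
  hh <- (total_min %/% 60) %% 24      # wrap hours past 24 back to 0-23
  mm <- total_min %% 60
  sprintf("%02d:%02d", as.integer(hh), as.integer(mm))
}

clock_times <- bedtime_to_clock(c(23.75, 24, 24.5, 25.25))
print(clock_times)
# → "23:45" "00:00" "00:30" "01:15"
```

Under that assumption, a median near 24.5 would correspond to a typical weekday bedtime of about 12:30 a.m.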

Q10. Is there a significant difference in the average hours of sleep on weekends between students in the first two class years and other students?

# ClassYear 1 and 2
stem(SleepStudy$WeekendSleep[SleepStudy$ClassYear %in% c(1,2)])

## 
##   The decimal point is at the |
## 
##    4 | 0
##    4 | 8
##    5 | 03
##    5 | 88
##    6 | 0033334
##    6 | 5558888899
##    7 | 00000000111233444
##    7 | 55555555567888888899
##    8 | 00000011133333333333334
##    8 | 555556888899
##    9 | 00000013333
##    9 | 55555556778888899
##   10 | 00000000000334
##   10 | 5888
##   11 | 0

# ClassYear 3 and 4
stem(SleepStudy$WeekendSleep[SleepStudy$ClassYear %in% c(3,4)])

## 
##   The decimal point is at the |
## 
##    4 | 4
##    4 | 
##    5 | 4
##    5 | 89
##    6 | 0000013
##    6 | 555888889
##    7 | 000001333
##    7 | 55555688889
##    8 | 00000000033333334444
##    8 | 5555555558888889
##    9 | 000000013334
##    9 | 55555555566788
##   10 | 013333
##   10 | 
##   11 | 0
##   11 | 5
##   12 | 
##   12 | 8

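The two stem-and-leaf plots compare weekend sleep hours for:

- Class Years 1–2 (lower-level students)
- Class Years 3–4 (upper-level students)

There does not appear to be a major difference in weekend sleep between the two groups. Both distributions are centered a little above 8 hours, with the middle value of each group falling in the 8.0–8.4 row, and they largely overlap. The upper-level group has a few longer sleepers (11.5 and 12.8 hours), but the plots give no indication of a meaningful difference in the average; no formal test was run for this question.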
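Stem-and-leaf plots are hard to compare by eye, so a side-by-side table of summary statistics can help. This is a minimal sketch; the data frame `toy`, its values, and the group labels are hypothetical and do not come from SleepStudy.

```r
# Toy data (hypothetical values, NOT from SleepStudy)
toy <- data.frame(
  ClassYear    = c(1, 1, 2, 2, 3, 3, 4, 4),
  WeekendSleep = c(7.5, 8.0, 9.0, 8.5, 8.0, 8.3, 9.5, 7.0)
)

# Two-level grouping matching the question
group <- ifelse(toy$ClassYear <= 2, "Years 1-2", "Years 3-4")

# One column per group, one row per statistic
stats_by_group <- sapply(
  split(toy$WeekendSleep, group),
  function(x) c(n = length(x), mean = mean(x), median = median(x), sd = sd(x))
)

print(round(stats_by_group, 2))
```

The same grouping vector could then be passed to `t.test(toy$WeekendSleep ~ group)` to test the difference in means formally.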

4. Summary

This analysis examined ten questions using the SleepStudy dataset to compare groups based on gender, class year, sleep habits, stress, depression, and lifestyle behaviors.

Several patterns emerged. Female students had a significantly higher average GPA than males (p ≈ 0.0001), and male students consumed significantly more drinks per week than females (p < 0.001). Larks had a slightly higher average cognition z-score than owls, but the difference was not statistically significant (p = 0.42). Descriptively, students with early classes tended to miss more classes, happiness was lower in the groups with more severe depression, and abstainers generally reported lower stress than heavy drinkers; these three comparisons rest on graphs only and were not formally tested.

Other comparisons, such as weekday bedtime differences by stress level and weekend sleep across class years, showed only small or negligible differences.

Overall, the results suggest that sleep habits, mental health, and lifestyle choices are associated with academic and wellness outcomes in college students, although several of these comparisons are descriptive and none of them establishes cause and effect.

5. Appendix

# Q1 code: mean(SleepStudy$GPA[SleepStudy$Gender == 1], na.rm = TRUE) #Male
# Q1 code: mean(SleepStudy$GPA[SleepStudy$Gender == 0], na.rm = TRUE) #Female
# Q1 code: t.test(GPA ~ Gender, data = SleepStudy)

# Q2 code: boxplot(NumEarlyClass ~ ClassYear <= 2, data = SleepStudy, main = "Early Classes: Lower vs Upper Years", xlab = "Lower (≤2) vs Upper (≥3) Class Year", ylab = "Early Classes", names = c("Upper", "Lower"), col = c("lightpink", "lightblue"))

# Q3 code: mean(SleepStudy$CognitionZscore[SleepStudy$LarkOwl == "Lark"], na.rm = TRUE)
# Q3 code: mean(SleepStudy$CognitionZscore[SleepStudy$LarkOwl == "Owl"], na.rm = TRUE)
# Q3 code: subset_data <- SleepStudy[SleepStudy$LarkOwl %in% c("Lark", "Owl"),]
# Q3 code: t.test(CognitionZscore ~ LarkOwl, data = subset_data)

# Q4 code: par(mfrow = c(1,2))
# Q4 code: hist(SleepStudy$ClassesMissed[SleepStudy$EarlyClass == 0], main = "Classes Missed (No Early Class)", xlab = "Classes Missed", col = "lightblue", breaks = 8)
# Q4 code: hist(SleepStudy$ClassesMissed[SleepStudy$EarlyClass == 1], main = "Classes Missed (Early Class)", xlab = "Classes Missed", col = "lightgreen", breaks = 8)
# Q4 code: par(mfrow = c(1,1))

# Q5 code: boxplot(Happiness ~ DepressionStatus, data = SleepStudy, main = "Happiness by Depression Status", xlab = "Depression Status", ylab = "Happiness Score", col = c("lightblue", "lightgreen", "lightpink"))

# Q6 code: var(SleepStudy$PoorSleepQuality[SleepStudy$AllNighter == 0], na.rm = TRUE) #no
# Q6 code: var(SleepStudy$PoorSleepQuality[SleepStudy$AllNighter == 1], na.rm = TRUE) #yes
# Q6 code: Prop_test(data = SleepStudy, variable = PoorSleepQuality, success = "1", by = AllNighter)

# Q7 code: par(mfrow = c(1,2))
# Q7 code: hist(SleepStudy$StressScore[SleepStudy$AlcoholUse == "Abstain"], main = "Stress Score: Abstain", xlab = "Stress Score", col = "lightblue", breaks = 8)
# Q7 code: hist(SleepStudy$StressScore[SleepStudy$AlcoholUse == "Heavy"], main = "Stress Score: Heavy Use", xlab = "Stress Score", col = "lightgreen", breaks = 8)
# Q7 code: par(mfrow = c(1,1))

# Q8 code: mean(SleepStudy$Drinks[SleepStudy$Gender == 0], na.rm = TRUE) #female
# Q8 code: mean(SleepStudy$Drinks[SleepStudy$Gender == 1], na.rm = TRUE) #male
# Q8 code: t.test(Drinks ~ Gender, data = SleepStudy, alternative = "two.sided")

# Q9 code: boxplot(SleepStudy$WeekdayBed[SleepStudy$Stress == "normal"], SleepStudy$WeekdayBed[SleepStudy$Stress == "high"], names = c("Normal", "High"), col = c("lightblue", "lightgreen"), main = "Weekday Bedtime by Stress Level", xlab = "Stress Level", ylab = "Weekday Bedtime")

# Q10 code: stem(SleepStudy$WeekendSleep[SleepStudy$ClassYear %in% c(1,2)])
# Q10 code: stem(SleepStudy$WeekendSleep[SleepStudy$ClassYear %in% c(3,4)])