Introduction

The dataset contains 253 observations across 27 variables. These variables include numerical data (e.g., GPA, sleep hours), categorical data (e.g., Gender, LarkOwl), and coded status indicators (e.g., Stress, DepressionStatus).

The data were self-reported by college students and likely collected through surveys and cognitive assessments. Although self-reported data can contain biases, they remain valuable for exploratory analysis in social science and health research.

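The following research questions will be addressed in this report:

  1. Is there a significant difference in the average GPA between male and female college students?
  2. Is there a significant difference in the average number of early classes between the first two class years and other class years?
  3. Do students who identify as “larks” have significantly better cognitive skills (Cognition Z-score) compared to “owls”?
  4. Is there a significant difference in the average number of classes missed in a semester between students with at least one early class and those without?
  5. Is there a significant difference in the average happiness level between students with moderate/severe depression and those with normal depression status?
  6. Is there a significant difference in average sleep quality scores between students who reported at least one all-nighter and those who didn’t?
  7. Do students who abstain from alcohol have significantly better stress scores than those who report heavy alcohol use?
  8. Is there a significant difference in the average number of drinks per week between male and female students?
  9. Is there a significant difference in the average weekday bedtime between students with high and normal stress levels?
  10. Is there a significant difference in the average hours of sleep on weekends between first- and second-year students and upperclassmen?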

By addressing these questions, we aim to describe how sleep patterns relate to academic, behavioral, and mental health factors among college students.

Data

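The 27 variables span multiple dimensions of college life, such as academic performance, sleep behavior, emotional well-being, and lifestyle choices. Key variables used in the analysis include GPA, Gender (coded 1 = male, 0 = female), ClassYear, LarkOwl, NumEarlyClass, EarlyClass, ClassesMissed, CognitionZscore, PoorSleepQuality, DepressionStatus, Happiness, AlcoholUse, Drinks, Stress, StressScore, WeekdayBed, WeekendSleep, and AllNighter.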

Analysis

We will explore the questions in detail.

college = read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")
head(college)
##   Gender ClassYear LarkOwl NumEarlyClass EarlyClass  GPA ClassesMissed
## 1      0         4 Neither             0          0 3.60             0
## 2      0         4 Neither             2          1 3.24             0
## 3      0         4     Owl             0          0 2.97            12
## 4      0         1    Lark             5          1 3.76             0
## 5      0         4     Owl             0          0 3.20             4
## 6      1         4 Neither             0          0 3.50             0
##   CognitionZscore PoorSleepQuality DepressionScore AnxietyScore StressScore
## 1           -0.26                4               4            3           8
## 2            1.39                6               1            0           3
## 3            0.38               18              18           18           9
## 4            1.39                9               1            4           6
## 5            1.22                9               7           25          14
## 6           -0.04                6              14            8          28
##   DepressionStatus AnxietyStatus Stress DASScore Happiness AlcoholUse Drinks
## 1           normal        normal normal       15        28   Moderate     10
## 2           normal        normal normal        4        25   Moderate      6
## 3         moderate        severe normal       45        17      Light      3
## 4           normal        normal normal       11        32      Light      2
## 5           normal        severe normal       46        15   Moderate      4
## 6         moderate      moderate   high       50        22    Abstain      0
##   WeekdayBed WeekdayRise WeekdaySleep WeekendBed WeekendRise WeekendSleep
## 1      25.75        8.70         7.70      25.75        9.50         5.88
## 2      25.70        8.20         6.80      26.00       10.00         7.25
## 3      27.44        6.55         3.00      28.00       12.59        10.09
## 4      23.50        7.17         6.77      27.00        8.00         7.25
## 5      25.90        8.67         6.09      23.75        9.50         7.00
## 6      23.80        8.95         9.05      26.00       10.75         9.00
##   AverageSleep AllNighter
## 1         7.18          0
## 2         6.93          0
## 3         5.02          0
## 4         6.90          0
## 5         6.35          0
## 6         9.04          0

Q1: Is there a significant difference in the average GPA between male and female college students?

Statistical Method: We used a Welch Two Sample t-test, which is appropriate when comparing the means of two independent groups (male vs. female students) without assuming that the groups have equal variances.
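As a minimal sketch of what the Welch test computes, the hypothetical helper `welch_by_hand` below reproduces the t statistic and the Welch-Satterthwaite degrees of freedom, then compares them with R's built-in `t.test`. The two vectors are made-up values for illustration, not data from this study.

```r
# Hypothetical helper (not part of the original report): Welch t statistic,
# Welch-Satterthwaite degrees of freedom, and two-sided p-value.
welch_by_hand <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  vx <- var(x) / nx                 # variance of the mean of x
  vy <- var(y) / ny                 # variance of the mean of y
  se <- sqrt(vx + vy)               # standard error of the difference
  t_stat <- (mean(x) - mean(y)) / se
  dof <- (vx + vy)^2 / (vx^2 / (nx - 1) + vy^2 / (ny - 1))
  p_value <- 2 * pt(-abs(t_stat), dof)
  c(t = t_stat, df = dof, p = p_value)
}

# Made-up illustrative values, not taken from the SleepStudy data.
x <- c(3.6, 3.2, 3.0, 3.8, 3.4, 3.5)
y <- c(3.1, 2.9, 3.3, 2.7, 3.0)

by_hand  <- welch_by_hand(x, y)
built_in <- t.test(x, y)            # var.equal = FALSE is the default (Welch)

# Both comparisons should report TRUE.
all.equal(unname(by_hand["t"]),  unname(built_in$statistic))
all.equal(unname(by_hand["df"]), unname(built_in$parameter))
```

The degrees of freedom are generally not an integer, which is why the Welch output in R reports a fractional `df`.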

Hypothesis: Null hypothesis (H₀): There is no difference in mean GPA between male and female students.

Alternative hypothesis (H₁): There is a difference in mean GPA between male and female students.

The t-test shows a statistically significant difference in GPA between male and female students (t = 3.91, p < 0.001). The average GPA for female students is approximately 3.32, compared to 3.12 for male students. The 95% confidence interval for the difference in means is (0.10, 0.30), indicating that female students tend to have higher GPAs. The boxplot further illustrates this difference in central tendency.

Q2: Is there a significant difference in the average number of early classes between the first two class years and other class years?

Statistical Method: To answer this question, we used a Welch Two Sample t-test. This test compares the means of two independent groups: “Lower” class years (first and second years) and “Upper” class years (third and fourth years). The Welch test is appropriate here because it does not assume equal variances between the two groups.
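The grouping described above can be sketched as follows. The data frame `toy` and its values are made up for illustration; only the column names `ClassYear`, `NumEarlyClass`, and `YearGroup` mirror those used in this report.

```r
# Made-up rows for illustration only; not taken from the SleepStudy data.
toy <- data.frame(
  ClassYear     = c(1, 2, 2, 3, 4, 4),
  NumEarlyClass = c(3, 2, 4, 1, 0, 2)
)

# Years 1-2 become "Lower", years 3-4 become "Upper". Fixing the factor
# levels makes the group order explicit instead of relying on sorting.
toy$YearGroup <- factor(
  ifelse(toy$ClassYear <= 2, "Lower", "Upper"),
  levels = c("Lower", "Upper")
)

# Group sizes and group means of the outcome.
group_sizes <- table(toy$YearGroup)
group_means <- tapply(toy$NumEarlyClass, toy$YearGroup, mean)
group_sizes
group_means

# Welch test on the grouped data (var.equal = FALSE is the default).
fit <- t.test(NumEarlyClass ~ YearGroup, data = toy)
fit
```

With the formula interface, the reported difference is the mean of the first level minus the mean of the second, so here it is "Lower" minus "Upper".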

Hypothesis: Null hypothesis (H₀): There is no significant difference in the average number of early classes between “Lower” and “Upper” class years.

Alternative hypothesis (H₁): There is a significant difference in the average number of early classes between “Lower” and “Upper” class years.

The Welch Two Sample t-test indicates a significant difference in the number of early classes between “Lower” class years (first and second years) and “Upper” class years (third and fourth years), with t = 4.18 and a p-value of 0.00004009. The average number of early classes is 2.07 for “Lower” class years and 1.31 for “Upper” class years. The 95% confidence interval for the difference in means is (0.40, 1.12); because it lies entirely above zero, the data are consistent with students in “Lower” class years taking between about 0.4 and 1.1 more early classes on average.

The boxplot visually shows the difference in early class numbers, with “Lower” class years having a higher median value compared to “Upper” class years.

Q3: Do students who identify as “larks” have significantly better cognitive skills (Cognition Z-score) compared to “owls”?

Statistical Method: To answer this question, we used a Welch Two Sample t-test. This test compares the means of two independent groups—students who identify as “larks” and those who identify as “owls”—to determine if there is a significant difference in their cognitive skills (measured by the Cognition Z-score). The Welch test was chosen because it does not assume equal variances between the two groups.

Hypothesis: Null hypothesis (H₀): There is no significant difference in the average cognitive skills (Cognition Z-score) between “larks” and “owls.”

Alternative hypothesis (H₁): There is a significant difference in the average cognitive skills (Cognition Z-score) between “larks” and “owls.”

The Welch Two Sample t-test reveals that there is no significant difference in the average cognitive skills (Cognition Z-score) between “larks” and “owls” with a t value of 0.81 and a p-value of 0.4229. Since the p-value is greater than the typical significance level of 0.05, we fail to reject the null hypothesis. Therefore, we conclude that there is no statistically significant difference in the cognitive skills between “larks” and “owls.”
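The question is phrased directionally ("better"), while the hypotheses tested here are two-sided. A one-sided version could be sketched as below, using R's `alternative = "greater"` argument. The vectors `lark` and `owl` hold made-up scores for illustration only.

```r
# Made-up Cognition Z-scores for illustration only; not the study data.
lark <- c(0.4, -0.1, 0.9, 0.3, 0.6, -0.2)
owl  <- c(0.1, -0.3, 0.5, 0.0, 0.2, -0.4, 0.3)

# Two-sided test: H1 says the two means differ.
two_sided <- t.test(lark, owl)

# One-sided test: H1 says the lark mean is greater than the owl mean.
one_sided <- t.test(lark, owl, alternative = "greater")

two_sided$p.value
one_sided$p.value

# When the t statistic is positive, the one-sided p-value is half the
# two-sided p-value; this comparison should report TRUE.
all.equal(one_sided$p.value, two_sided$p.value / 2)
```

Assuming the reported t statistic of 0.81 is for larks minus owls, halving the two-sided p-value of 0.4229 gives a one-sided p-value of about 0.21, which leads to the same conclusion.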

The 95% confidence interval for the difference in means is (-0.19, 0.45), which includes zero, further supporting the conclusion that the true difference in cognitive skills between these groups could be zero.

The boxplot visually shows that the distributions of cognitive skills (Cognition Z-score) for both “larks” and “owls” are fairly similar, with no clear indication of a significant difference between the two groups.

Q4: Is there a significant difference in the average number of classes missed in a semester between students with at least one early class and those without?

Statistical Method: To answer this question, we used a Welch Two Sample t-test. This test was chosen because it compares the means of two independent groups: students who have at least one early class (EarlyClass = 1) and those who do not (EarlyClass = 0). Since the assumption of equal variances between the groups might not hold, the Welch test is used to account for unequal variances.

Hypothesis: Null hypothesis (H₀): There is no significant difference in the average number of classes missed between students with at least one early class and those without early classes.

Alternative hypothesis (H₁): There is a significant difference in the average number of classes missed between students with at least one early class and those without early classes.

The Welch Two Sample t-test yields a t value of 1.4755 and a p-value of 0.1421. Since the p-value is greater than the typical significance level of 0.05, we fail to reject the null hypothesis. This suggests that there is no statistically significant difference in the average number of classes missed between students with at least one early class and those without early classes.

The 95% confidence interval for the difference in means is (-0.22, 1.54), which includes zero, further supporting the conclusion that the true difference in the number of classes missed could be zero.
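The link between a confidence interval that includes zero and a p-value above 0.05 can be illustrated with a short sketch. The helper `welch_ci` is hypothetical, and the two vectors are made-up counts rather than values from the study.

```r
# Hypothetical helper: Welch confidence interval for mean(x) - mean(y).
welch_ci <- function(x, y, conf_level = 0.95) {
  vx <- var(x) / length(x)
  vy <- var(y) / length(y)
  se <- sqrt(vx + vy)
  dof <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  half_width <- qt(1 - (1 - conf_level) / 2, dof) * se
  (mean(x) - mean(y)) + c(-1, 1) * half_width
}

# Made-up numbers of classes missed, for illustration only.
no_early <- c(0, 4, 12, 2, 1, 3, 0, 5)
early    <- c(0, 0, 2, 1, 3, 0, 4, 1, 2)

ci  <- welch_ci(no_early, early)
fit <- t.test(no_early, early)

# The hand-computed interval should match t.test (reports TRUE).
all.equal(ci, as.numeric(fit$conf.int))

# The 95% interval contains zero exactly when the two-sided p-value
# exceeds 0.05, so the two criteria always agree.
contains_zero <- ci[1] < 0 && ci[2] > 0
contains_zero == (fit$p.value > 0.05)
```

Because the interval and the test use the same standard error and degrees of freedom, checking whether the interval covers zero restates the test result and is not independent evidence.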

The boxplot compares the two distributions. Although students with early classes (EarlyClass = 1) missed fewer classes on average (mean = 1.99), the difference between the groups is not large enough to be statistically significant.

Q5: Is there a significant difference in the average happiness level between students with moderate/severe depression and those with normal depression status?

Statistical Method: To test whether there is a significant difference in the average happiness level between students with moderate/severe depression and those with normal depression status, we used a Welch Two Sample t-test. The Welch test is appropriate here because it compares the means of two independent groups and accounts for unequal variances between them.

Hypothesis: Null hypothesis (H₀): There is no significant difference in the average happiness level between students with moderate/severe depression and those with normal depression status.

Alternative hypothesis (H₁): There is a significant difference in the average happiness level between students with moderate/severe depression and those with normal depression status.

The Welch Two Sample t-test results show a t-value of -5.6339 and a p-value of 6.057e-07. Since the p-value is much smaller than the significance level of 0.05, we reject the null hypothesis. This indicates that there is a statistically significant difference in the average happiness levels between students with moderate/severe depression and those with normal depression status.

The 95% confidence interval for the difference in means is (-7.38, -3.51). This interval does not contain zero, further supporting the conclusion that the mean happiness levels of the two groups differ.

The sample estimates show that the average happiness level for students with moderate/severe depression is 21.61, while for students with normal depression status, it is 27.06. This suggests that students with moderate/severe depression tend to report lower happiness levels on average.

The boxplot visually reinforces this finding, showing that students in the “Normal” depression group generally report higher happiness levels compared to those in the “Moderate/Severe” depression group.
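When groups are stored as text, R orders them alphabetically, which affects both the sign of the reported difference and the order of the boxes in a plot. The sketch below, on made-up rows, shows one way to read the group order from the data so that custom plot labels match. The data frame `toy` is hypothetical; the column names mirror those used in this report.

```r
# Made-up rows for illustration only; not taken from the SleepStudy data.
toy <- data.frame(
  DepressionStatus = c("normal", "moderate", "normal",
                       "severe", "normal", "moderate"),
  Happiness        = c(28, 17, 25, 15, 32, 22)
)
toy$DepGroup <- ifelse(toy$DepressionStatus == "normal",
                       "Normal", "Moderate/Severe")

# Character groups are converted to a factor with sorted levels, so
# "Moderate/Severe" comes before "Normal".
group_levels <- levels(factor(toy$DepGroup))
group_levels

# t.test reports the first level minus the second level.
fit <- t.test(Happiness ~ DepGroup, data = toy)
names(fit$estimate)

# boxplot() uses the same order; plot = FALSE returns the summary
# statistics without drawing, including the group names in plot order.
box_order <- boxplot(Happiness ~ DepGroup, data = toy, plot = FALSE)$names
box_order
```

Any vector passed to the `names` argument of `boxplot()` should follow this order, for example `names = group_levels`.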

Q6: Is there a significant difference in average sleep quality scores between students who reported at least one all-nighter and those who didn’t?

Statistical Method: To assess whether there is a significant difference in the average sleep quality scores between students who reported at least one all-nighter and those who did not, we used a Welch Two Sample t-test. The Welch test is appropriate because it compares the means of two independent groups and accounts for unequal variances between the groups.

Hypothesis: Null hypothesis (H₀): There is no significant difference in the average sleep quality scores between students who reported at least one all-nighter and those who did not.

Alternative hypothesis (H₁): There is a significant difference in the average sleep quality scores between students who reported at least one all-nighter and those who did not.

The Welch Two Sample t-test results show a t-value of -1.7068 and a p-value of 0.09479. Since the p-value is slightly greater than the significance level of 0.05, we fail to reject the null hypothesis. This suggests that there is no statistically significant difference in the average sleep quality scores between students who reported at least one all-nighter and those who did not, at the 5% significance level.

The 95% confidence interval for the difference in means is (-1.95, 0.16). This interval includes zero, which further supports the conclusion that there is no significant difference in sleep quality scores between the two groups.

The sample estimates show that the average poor sleep quality score for students who did not report any all-nighters is 6.14, while for those who reported at least one all-nighter, it is 7.03. Though the mean score for students who reported all-nighters is higher, the difference is not statistically significant.

The boxplot provides a visual representation of the distribution of sleep quality scores for both groups, showing that while the group with all-nighters tends to have higher scores (indicating poorer sleep quality), the difference is not substantial enough to reach statistical significance.

Q7: Do students who abstain from alcohol have significantly better stress scores than those who report heavy alcohol use?

Statistical Method: To investigate whether students who abstain from alcohol have significantly better stress scores than those who report heavy alcohol use, we performed a Welch Two Sample t-test. This test is used to compare the means of two independent groups, particularly when the variances of the groups may not be equal.

Hypothesis: Null hypothesis (H₀): There is no significant difference in the average stress scores between students who abstain from alcohol and those who report heavy alcohol use.

Alternative hypothesis (H₁): There is a significant difference in the average stress scores between students who abstain from alcohol and those who report heavy alcohol use.

The Welch Two Sample t-test results show a t-value of -0.62604 and a p-value of 0.5362. Since the p-value is greater than the significance level of 0.05, we fail to reject the null hypothesis. This indicates that there is no statistically significant difference in the average stress scores between students who abstain from alcohol and those who report heavy alcohol use.

The 95% confidence interval for the difference in means is (-6.26, 3.33), which includes zero. This further supports the conclusion that there is no significant difference between the two groups in terms of stress scores.

The sample estimates show that the mean stress score for students who abstain from alcohol is 8.97, while the mean stress score for students who report heavy alcohol use is 10.44. Although the group with heavy alcohol use has a slightly higher average stress score, this difference is not statistically significant.

The boxplot visually confirms the similarity in the distribution of stress scores between the two groups, with no major differences in the spread or central tendency of the data.

Q8: Is there a significant difference in the average number of drinks per week between male and female students?

Statistical Method: To determine if there is a significant difference in the average number of alcoholic drinks per week between male and female students, we used a Welch Two Sample t-test. This test compares the means of two independent groups (male and female students) while accounting for unequal variances between the groups.

Hypothesis: Null hypothesis (H₀): There is no significant difference in the average number of drinks per week between male and female students.

Alternative hypothesis (H₁): There is a significant difference in the average number of drinks per week between male and female students.

The Welch Two Sample t-test results show a t-value of -6.1601 and a p-value of 7.002e-09, which is much smaller than the typical significance level of 0.05. This means that we reject the null hypothesis and conclude that there is a statistically significant difference in the average number of alcoholic drinks consumed per week between male and female students.

The 95% confidence interval for the difference in means is (-4.36, -2.24), which does not include zero. This interval further supports the conclusion that the difference in the average number of drinks per week between male and female students is statistically significant.

The sample estimates indicate that the mean number of drinks per week for female students is approximately 4.24, while for male students, it is approximately 7.54. This suggests that, on average, male students report consuming more alcoholic drinks per week compared to female students.

The boxplot provides a clear visual representation of the distribution of drinks per week for both genders, showing that male students tend to have higher values than female students. This aligns with the statistical results that show a significant difference in means.

Q9: Is there a significant difference in the average weekday bedtime between students with high and normal stress levels?

Statistical Method: To assess if there is a significant difference in the average weekday bedtime between students with high stress levels and those with normal stress levels, we used a Welch Two Sample t-test. This test compares the means of two independent groups (high stress and normal stress) while adjusting for unequal variances.

Hypothesis: Null hypothesis (H₀): There is no significant difference in the average weekday bedtime between students with high stress and those with normal stress.

Alternative hypothesis (H₁): There is a significant difference in the average weekday bedtime between students with high stress and those with normal stress.

The Welch Two Sample t-test results show a t-value of -1.0746 and a p-value of 0.2855, which is greater than the common significance level of 0.05. This means that we fail to reject the null hypothesis and conclude that there is no statistically significant difference in the average weekday bedtime between students with high stress levels and those with normal stress levels.

The 95% confidence interval for the difference in means is (-0.49, 0.14), which includes zero. This suggests that the true difference in means could be zero, further supporting the conclusion that there is no significant difference between the two groups.

The sample estimates indicate that the average weekday bedtime for students with high stress is approximately 24.72, while for students with normal stress, it is approximately 24.89. The slight difference in means between the two groups is not statistically significant.
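The bedtime values are easier to read as clock times. The sketch below assumes that bedtimes are recorded as decimal hours on a scale that continues past midnight (24 = midnight, so 25.75 would be 1:45 AM); this reading is an assumption based on the values shown in the data preview. The helper `decimal_to_clock` is hypothetical.

```r
# Hypothetical helper: convert decimal hours (24 = midnight, an assumed
# coding) to "HH:MM" strings on a 24-hour clock.
decimal_to_clock <- function(hours) {
  total_minutes <- as.integer(round((hours %% 24) * 60))
  hh <- (total_minutes %/% 60L) %% 24L   # wrap 24:00 back to 00:00
  mm <- total_minutes %% 60L
  sprintf("%02d:%02d", hh, mm)
}

# Group means reported above for high and normal stress.
decimal_to_clock(c(24.72, 24.89))
# → "00:43" "00:53"
```

Under this assumption, the two group means differ by roughly ten minutes.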

The boxplot provides a visual representation of the distribution of weekday bedtime for both high stress and normal stress groups. The boxplots suggest that there is a slight variation in bedtime between the two groups, but the difference is not large enough to be statistically significant.

Q10: Is there a significant difference in the average hours of sleep on weekends between first- and second-year students and upperclassmen?

Statistical Method: To determine if there is a significant difference in the average hours of sleep on weekends between first-year and second-year students (Lower class) and upperclassmen (Upper class), we used a Welch Two Sample t-test. This test was chosen because it compares the means of two independent groups while adjusting for unequal variances.

Hypothesis: Null hypothesis (H₀): There is no significant difference in the average hours of sleep on weekends between Lower class students and Upper class students.

Alternative hypothesis (H₁): There is a significant difference in the average hours of sleep on weekends between Lower class students and Upper class students.

The Welch Two Sample t-test results show a t-value of -0.047888 and a p-value of 0.9618, which is much higher than the significance level of 0.05. This suggests that we fail to reject the null hypothesis, meaning there is no statistically significant difference in the average hours of sleep on weekends between first-year/second-year students (Lower class) and upperclassmen (Upper class).

The 95% confidence interval for the difference in means is (-0.35, 0.33), which includes zero, further supporting the conclusion that there is no significant difference in sleep hours between the two groups.

The sample estimates indicate that the average hours of sleep on weekends for Lower class students is approximately 8.21 hours, and for Upper class students, it is approximately 8.22 hours. The very small difference in means is not statistically significant.

The boxplot visually shows the distribution of weekend sleep hours for Lower and Upper class students. The boxplots are very similar, indicating that there is little variation between the two groups in terms of weekend sleep.

Summary

Conclusion: Of the ten comparisons, four showed statistically significant differences at the 5% level. Female students had a higher mean GPA than male students (3.32 vs. 3.12), and male students reported more drinks per week (7.54 vs. 4.24). First- and second-year students took more early classes on average than upperclassmen (2.07 vs. 1.31). Students with moderate or severe depression reported lower happiness than those with normal depression status (21.61 vs. 27.06). The remaining comparisons (chronotype and cognition, early classes and classes missed, all-nighters and sleep quality, alcohol use and stress, stress and weekday bedtime, class year and weekend sleep) showed no significant difference. Because the data are observational and self-reported, these results describe associations rather than causal effects. The link between depression and lower happiness underscores the importance of mental health support services for students.

References

  1. Lock5Stat. (n.d.). SleepStudy Dataset. Retrieved from https://www.lock5stat.com/datasets3e/SleepStudy.csv

Appendix: R Code

# Load data
# college = read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")
# head(college)

# Q1: Is there a significant difference in the average GPA between male and female college students?
# Gender: 1 = male, 0 = female
# t.test(GPA ~ Gender, data = college)
# boxplot(GPA ~ Gender, data = college, 
#        names = c("Female", "Male"),
#        main = "GPA by Gender",
#        ylab = "GPA",
#        col = c("lightpink", "lightblue"))

# Q2: Is there a significant difference in the average number of early classes between the first two class years and other class years?
# college$YearGroup <- ifelse(college$ClassYear <= 2, "Lower", "Upper")
# t.test(NumEarlyClass ~ YearGroup, data = college)
#  Boxplot for visualizing the difference in early classes
# boxplot(NumEarlyClass ~ YearGroup, data = college, 
#        names = c("Lower Class Years", "Upper Class Years"),
#        main = "Early Classes by Class Year Group",
#        ylab = "Number of Early Classes",
#        col = c("lightgreen", "lightcoral"))

# Q3: Do students who identify as "larks" have significantly better cognitive skills (Cognition Z-score) compared to "owls"?
# lark_owl_data <- subset(college, LarkOwl %in% c("Lark", "Owl"))
# t.test(CognitionZscore ~ LarkOwl, data = lark_owl_data)
# Boxplot for visualizing the cognitive skills (Cognition Z-score) for Larks and Owls
# boxplot(CognitionZscore ~ LarkOwl, data = lark_owl_data, 
#        names = c("Larks", "Owls"),
#        main = "Cognitive Skills (Cognition Z-score) by Lark and Owl",
#        ylab = "Cognition Z-score",
#        col = c("lightblue", "lightcoral"))

# Q4: Is there a significant difference in the average number of classes missed in a semester between students with at least one early class and those without?
# t.test(ClassesMissed ~ EarlyClass, data = college)
# Boxplot for visualizing the number of classes missed for students with and without early classes
# boxplot(ClassesMissed ~ EarlyClass, data = college, 
#        names = c("No Early Class", "At Least One Early Class"),
#        main = "Classes Missed by Early Class Status",
#        ylab = "Number of Classes Missed",
#        col = c("lightblue", "lightcoral"))

# Q5: Is there a significant difference in the average happiness level between students with moderate/severe depression and those with normal depression status?
# college$DepGroup <- ifelse(college$DepressionStatus == "normal", "Normal", "Moderate/Severe")
# t.test(Happiness ~ DepGroup, data = college)
# Boxplot for visualizing happiness levels by depression status
# boxplot(Happiness ~ DepGroup, data = college,
#        names = c("Moderate/Severe", "Normal"),  # groups are plotted in alphabetical order
#        main = "Happiness by Depression Status",
#        ylab = "Happiness Level",
#        col = c("lightblue", "lightcoral"))

# Q6: Is there a significant difference in average sleep quality scores between students who reported at least one all-nighter and those who didn’t?
# t.test(PoorSleepQuality ~ AllNighter, data = college)
# Boxplot for visualizing sleep quality by all-nighter status
# boxplot(PoorSleepQuality ~ AllNighter, data = college,
#        names = c("No All-Nighter", "All-Nighter"),
#        main = "Sleep Quality by All-Nighter Status",
#        ylab = "Poor Sleep Quality Score",
#        col = c("lightblue", "lightcoral"))

# Q7: Do students who abstain from alcohol have significantly better stress scores than those who report heavy alcohol use?
# alc_data <- subset(college, AlcoholUse %in% c("Abstain", "Heavy"))
# t.test(StressScore ~ AlcoholUse, data = alc_data)
# Boxplot for visualizing stress scores by alcohol use
# boxplot(StressScore ~ AlcoholUse, data = alc_data,
#        names = c("Abstain", "Heavy"),
#        main = "Stress Scores by Alcohol Use",
#        ylab = "Stress Score",
#        col = c("lightblue", "lightcoral"))

# Q8: Is there a significant difference in the average number of drinks per week between male and female students?
# t.test(Drinks ~ Gender, data = college)
# Boxplot for visualizing number of drinks by gender
# boxplot(Drinks ~ Gender, data = college,
#        names = c("Female", "Male"),
#        main = "Drinks per Week by Gender",
#        ylab = "Number of Drinks per Week",
#        col = c("lightblue", "lightcoral"))

# Q9: Is there a significant difference in the average weekday bedtime between students with high and normal stress levels?
# t.test(WeekdayBed ~ Stress, data = college)
# Boxplot for visualizing weekday bedtime by stress level
# boxplot(WeekdayBed ~ Stress, data = college,
#        names = c("High Stress", "Normal Stress"),  # "high" sorts before "normal"
#        main = "Weekday Bedtime by Stress Level",
#        ylab = "Weekday Bedtime",
#        col = c("lightgreen", "lightcoral"))

# Q10: Is there a significant difference in the average hours of sleep on weekends between first two year students and upperclassmen?
# t.test(WeekendSleep ~ YearGroup, data = college)
# Boxplot to compare weekend sleep hours between Lower and Upper class students
# boxplot(WeekendSleep ~ YearGroup, data = college,
#        names = c("Lower Class", "Upper Class"),
#        main = "Weekend Sleep Hours by Year Group",
#        ylab = "Weekend Sleep Hours",
#        col = c("lightblue", "lightpink"))