1. Introduction

This report examines the interplay between sleep patterns, mental health, and academic performance among college students, using the “SleepStudy” dataset sourced from https://www.lock5stat.com/datapage3e.html. The dataset includes 253 entries with 27 variables, covering sleep habits, stress levels, and lifestyle factors in this population.

The primary objective of this analysis is to identify trends and relationships that highlight the pivotal role of sleep in shaping both the academic achievements and personal well-being of students. By addressing essential research questions, this report examines how variations in sleep duration and quality affect stress regulation, cognitive functioning, and overall mental health. The findings aim to guide strategies that enhance student health and academic performance, contributing to a deeper understanding of how lifestyle factors influence educational outcomes.

This report explores the following questions:

Q1. Is there a significant difference in the average GPA between male and female college students?

Q2. Is there a significant difference in the average number of early classes between the first two class years and other class years?

Q3. Do students who identify as “larks” have significantly better cognitive skills (cognition z-score) compared to “owls”?

Q4. Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class (EarlyClass=1) and those who didn’t (EarlyClass=0)?

Q5. Is there a significant difference in the average happiness level between students with at least moderate depression and normal depression status?

Q6. Is there a significant difference in average sleep quality scores between students who reported having at least one all-nighter (AllNighter=1) and those who didn’t (AllNighter=0)?

Q7. Do students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use?

Q8. Is there a significant difference in the average number of drinks per week between students of different genders?

Q9. Is there a significant difference in the average weekday bedtime between students with high and normal stress (Stress=high vs. Stress=normal)?

Q10. Is there a significant difference in the average hours of sleep on weekends between students in their first two years and other students?

2. Data

Structure of the Dataset

Number of Observations: 253 students.

Variables: The dataset contains 27 variables, as listed below:

Gender: The gender of the student, coded 0 or 1 (1 = male and 0 = female in the dataset documentation).

ClassYear: The academic year of the student, coded 1 to 4 (1 = first year).

LarkOwl: Chronotype preference, indicating whether the student is a “morning person” (Lark), a “night person” (Owl), or Neither.

NumEarlyClass: Number of early morning classes attended.

EarlyClass: Indicator of early class attendance (1 = at least one early class, 0 = none).

GPA: Grade Point Average of the student.

ClassesMissed: Number of classes missed.

CognitionZscore: A cognitive ability score normalized as a Z-score.

PoorSleepQuality: Measure of poor sleep quality (higher values indicate poorer sleep).

DepressionScore: A numerical score indicating depression levels.

AnxietyScore: A numerical score indicating anxiety levels.

StressScore: A numerical score indicating stress levels.

DepressionStatus: Depression classification (normal, moderate, or severe).

AnxietyStatus: Anxiety classification (normal, moderate, or severe).

Stress: Stress classification (normal or high).

DASScore: Combined Depression, Anxiety, and Stress (DAS) score.

Happiness: A measure of happiness.

AlcoholUse: Self-reported alcohol use category (e.g., Abstain, Light, Moderate).

Drinks: Number of alcoholic drinks consumed per week.

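WeekdayBed: Average weekday bedtime in decimal hours (values above 24 are past midnight, e.g., 25.75 is 1:45 a.m.).

WeekdayRise: Average weekday wake-up time in decimal hours.

WeekdaySleep: Average hours of sleep on weekdays.

WeekendBed: Average weekend bedtime, on the same scale as WeekdayBed.

WeekendRise: Average weekend wake-up time in decimal hours.

WeekendSleep: Average hours of sleep on weekends.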

AverageSleep: Average sleep hours across weekdays and weekends.

AllNighter: Indicator of whether the student has pulled an all-nighter (1 = yes, 0 = no).

# Load the dataset
sleepStudy <- read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")

# Inspect the dataset to understand its structure
str(sleepStudy)
## 'data.frame':    253 obs. of  27 variables:
##  $ Gender          : int  0 0 0 0 0 1 1 0 0 0 ...
##  $ ClassYear       : int  4 4 4 1 4 4 2 2 1 4 ...
##  $ LarkOwl         : chr  "Neither" "Neither" "Owl" "Lark" ...
##  $ NumEarlyClass   : int  0 2 0 5 0 0 2 0 2 2 ...
##  $ EarlyClass      : int  0 1 0 1 0 0 1 0 1 1 ...
##  $ GPA             : num  3.6 3.24 2.97 3.76 3.2 3.5 3.35 3 4 2.9 ...
##  $ ClassesMissed   : int  0 0 12 0 4 0 2 0 0 0 ...
##  $ CognitionZscore : num  -0.26 1.39 0.38 1.39 1.22 -0.04 0.41 -0.59 1.03 0.72 ...
##  $ PoorSleepQuality: int  4 6 18 9 9 6 2 10 5 2 ...
##  $ DepressionScore : int  4 1 18 1 7 14 1 2 12 6 ...
##  $ AnxietyScore    : int  3 0 18 4 25 8 0 2 16 11 ...
##  $ StressScore     : int  8 3 9 6 14 28 1 3 20 31 ...
##  $ DepressionStatus: chr  "normal" "normal" "moderate" "normal" ...
##  $ AnxietyStatus   : chr  "normal" "normal" "severe" "normal" ...
##  $ Stress          : chr  "normal" "normal" "normal" "normal" ...
##  $ DASScore        : int  15 4 45 11 46 50 2 7 48 48 ...
##  $ Happiness       : int  28 25 17 32 15 22 25 29 29 30 ...
##  $ AlcoholUse      : chr  "Moderate" "Moderate" "Light" "Light" ...
##  $ Drinks          : int  10 6 3 2 4 0 6 3 3 6 ...
##  $ WeekdayBed      : num  25.8 25.7 27.4 23.5 25.9 ...
##  $ WeekdayRise     : num  8.7 8.2 6.55 7.17 8.67 8.95 8.48 9.07 8.75 8 ...
##  $ WeekdaySleep    : num  7.7 6.8 3 6.77 6.09 9.05 7.73 9.02 8.25 6.6 ...
##  $ WeekendBed      : num  25.8 26 28 27 23.8 ...
##  $ WeekendRise     : num  9.5 10 12.6 8 9.5 ...
##  $ WeekendSleep    : num  5.88 7.25 10.09 7.25 7 ...
##  $ AverageSleep    : num  7.18 6.93 5.02 6.9 6.35 9.04 7.52 9.01 8.54 6.68 ...
##  $ AllNighter      : int  0 0 0 0 0 0 1 0 0 0 ...
# Create a new variable "AlcoholUseGroup"
# Grouping students based on their alcohol use
sleepStudy$AlcoholUseGroup <- ifelse(sleepStudy$AlcoholUse %in% c("Abstain", "Light"), "Low Use", "High Use")

# Verify the new variable
table(sleepStudy$AlcoholUseGroup)
## 
## High Use  Low Use 
##      136      117
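
The dimensions reported by str() and the amount of missing data can be checked programmatically before any tests are run. The sketch below is illustrative only: `describe_data()` is a hypothetical helper and `toy` is a small made-up data frame, not the SleepStudy data. With the real data one would pass `sleepStudy` instead.

```r
# Hypothetical helper: report dimensions and per-column NA counts.
describe_data <- function(df) {
  list(
    n_obs     = nrow(df),
    n_vars    = ncol(df),
    n_missing = colSums(is.na(df))
  )
}

# Toy data for illustration only (not taken from SleepStudy).
toy <- data.frame(
  GPA    = c(3.2, NA, 3.8, 2.9),
  Drinks = c(4, 0, NA, NA)
)

info <- describe_data(toy)
print(info)
```

Checking `n_obs` and `n_vars` against the figures quoted in the text is a quick way to catch a mismatch between the prose and the loaded file.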

3. Analysis

Each of the 10 questions is explored in detail below.

college <- read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")
head(college)
##   Gender ClassYear LarkOwl NumEarlyClass EarlyClass  GPA ClassesMissed
## 1      0         4 Neither             0          0 3.60             0
## 2      0         4 Neither             2          1 3.24             0
## 3      0         4     Owl             0          0 2.97            12
## 4      0         1    Lark             5          1 3.76             0
## 5      0         4     Owl             0          0 3.20             4
## 6      1         4 Neither             0          0 3.50             0
##   CognitionZscore PoorSleepQuality DepressionScore AnxietyScore StressScore
## 1           -0.26                4               4            3           8
## 2            1.39                6               1            0           3
## 3            0.38               18              18           18           9
## 4            1.39                9               1            4           6
## 5            1.22                9               7           25          14
## 6           -0.04                6              14            8          28
##   DepressionStatus AnxietyStatus Stress DASScore Happiness AlcoholUse Drinks
## 1           normal        normal normal       15        28   Moderate     10
## 2           normal        normal normal        4        25   Moderate      6
## 3         moderate        severe normal       45        17      Light      3
## 4           normal        normal normal       11        32      Light      2
## 5           normal        severe normal       46        15   Moderate      4
## 6         moderate      moderate   high       50        22    Abstain      0
##   WeekdayBed WeekdayRise WeekdaySleep WeekendBed WeekendRise WeekendSleep
## 1      25.75        8.70         7.70      25.75        9.50         5.88
## 2      25.70        8.20         6.80      26.00       10.00         7.25
## 3      27.44        6.55         3.00      28.00       12.59        10.09
## 4      23.50        7.17         6.77      27.00        8.00         7.25
## 5      25.90        8.67         6.09      23.75        9.50         7.00
## 6      23.80        8.95         9.05      26.00       10.75         9.00
##   AverageSleep AllNighter
## 1         7.18          0
## 2         6.93          0
## 3         5.02          0
## 4         6.90          0
## 5         6.35          0
## 6         9.04          0

Q1. Is there a significant difference in the average GPA between male and female college students?

# Welch two-sample t-test for GPA by Gender (the formula interface drops missing values itself)
t_test_gpa_gender <- t.test(GPA ~ Gender, data = college)
t_test_gpa_gender
## 
##  Welch Two Sample t-test
## 
## data:  GPA by Gender
## t = 3.9139, df = 200.9, p-value = 0.0001243
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  0.09982254 0.30252780
## sample estimates:
## mean in group 0 mean in group 1 
##        3.324901        3.123725
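
The t statistic and the fractional degrees of freedom in the output come from the Welch version of the two-sample t-test, which does not assume equal variances. As a rough sketch of what t.test() computes, the hypothetical helper `welch_by_hand()` below reproduces the statistic, the Welch-Satterthwaite degrees of freedom, and the two-sided p-value on made-up numbers (not the SleepStudy GPA values).

```r
# Hypothetical helper: Welch t statistic and Welch-Satterthwaite df.
welch_by_hand <- function(x, y) {
  vx <- var(x) / length(x)   # squared standard error of mean(x)
  vy <- var(y) / length(y)   # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  p <- 2 * pt(-abs(t_stat), df)   # two-sided p-value
  c(t = t_stat, df = df, p.value = p)
}

# Toy values for illustration only (not SleepStudy data).
g0 <- c(3.6, 3.2, 3.0, 3.8, 3.4, 3.5)
g1 <- c(3.1, 2.9, 3.3, 3.0, 2.7)

manual  <- welch_by_hand(g0, g1)
builtin <- t.test(g0, g1)   # Welch is the default (var.equal = FALSE)
print(manual)
```

The manual values should agree with `builtin$statistic`, `builtin$parameter`, and `builtin$p.value`.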

Q2. Is there a significant difference in the average number of early classes between the first two class years and other class years?

# Create a new grouping variable for "FirstTwoYears" and "OtherYears"
sleepStudy$YearGroup <- ifelse(sleepStudy$ClassYear %in% c(1, 2), "FirstTwoYears", "OtherYears")

# Perform the t-test to compare the average number of early classes between the two groups
t_test_result <- t.test(NumEarlyClass ~ YearGroup, data = sleepStudy)

# Print the t-test results
print(t_test_result)
## 
##  Welch Two Sample t-test
## 
## data:  NumEarlyClass by YearGroup
## t = 4.1813, df = 250.69, p-value = 4.009e-05
## alternative hypothesis: true difference in means between group FirstTwoYears and group OtherYears is not equal to 0
## 95 percent confidence interval:
##  0.4042016 1.1240309
## sample estimates:
## mean in group FirstTwoYears    mean in group OtherYears 
##                    2.070423                    1.306306
# Optional: Calculate the mean number of early classes by year group
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
group_means <- sleepStudy %>%
  group_by(YearGroup) %>%
  summarise(Average_EarlyClasses = mean(NumEarlyClass, na.rm = TRUE))
print(group_means)
## # A tibble: 2 × 2
##   YearGroup     Average_EarlyClasses
##   <chr>                        <dbl>
## 1 FirstTwoYears                 2.07
## 2 OtherYears                    1.31
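
With a formula, t.test() reports the difference as the mean of the first group level minus the mean of the second, and character groups are ordered alphabetically by default. The sketch below, which uses a hypothetical helper `make_year_group()` and made-up data, fixes the level order explicitly so the sign of the confidence interval does not depend on how the labels happen to sort.

```r
# Hypothetical helper: label class years 1-2 vs. the rest with a fixed level order.
make_year_group <- function(class_year) {
  factor(
    ifelse(class_year %in% c(1, 2), "FirstTwoYears", "OtherYears"),
    levels = c("FirstTwoYears", "OtherYears")
  )
}

# Toy data for illustration only (not SleepStudy data).
toy <- data.frame(
  ClassYear     = c(1, 2, 1, 2, 3, 4, 3, 4),
  NumEarlyClass = c(3, 2, 4, 3, 1, 0, 2, 1)
)
toy$YearGroup <- make_year_group(toy$ClassYear)

res <- t.test(NumEarlyClass ~ YearGroup, data = toy)
# The confidence interval is for mean(FirstTwoYears) - mean(OtherYears).
print(levels(toy$YearGroup))
print(res$estimate)
```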

Q3. Do students who identify as “larks” have significantly better cognitive skills (cognition z-score) compared to “owls”?

# Load the dataset
sleepStudy <- read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")

# Check the unique values in the LarkOwl variable
unique(sleepStudy$LarkOwl)
## [1] "Neither" "Owl"     "Lark"
# Subset data to include only "Lark" and "Owl" categories
sleepStudy_subset <- subset(sleepStudy, LarkOwl %in% c("Lark", "Owl"))

# Perform a two-sample t-test for CognitionZscore by LarkOwl
t_test_cognition <- t.test(CognitionZscore ~ LarkOwl, data = sleepStudy_subset)

# Output the results
t_test_cognition
## 
##  Welch Two Sample t-test
## 
## data:  CognitionZscore by LarkOwl
## t = 0.80571, df = 75.331, p-value = 0.4229
## alternative hypothesis: true difference in means between group Lark and group Owl is not equal to 0
## 95 percent confidence interval:
##  -0.1893561  0.4465786
## sample estimates:
## mean in group Lark  mean in group Owl 
##         0.09024390        -0.03836735
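
The question is directional (“better”), while the test above is two-sided. A one-sided version can be requested with the `alternative` argument of t.test(). The sketch below uses made-up scores, not the SleepStudy values, and fixes the factor levels so that "greater" means Lark greater than Owl. When the t statistic is positive, the one-sided p-value is half the two-sided one, so for the output above (t = 0.80571, two-sided p = 0.4229) it would be roughly 0.21.

```r
# Toy scores for illustration only (not SleepStudy data).
toy <- data.frame(
  LarkOwl = rep(c("Lark", "Owl"), each = 6),
  CognitionZscore = c(0.4, 0.1, 0.6, -0.2, 0.3, 0.5,
                      0.0, -0.3, 0.2, -0.5, 0.1, -0.1)
)
# Fix the level order so "greater" tests mean(Lark) > mean(Owl).
toy$LarkOwl <- factor(toy$LarkOwl, levels = c("Lark", "Owl"))

two_sided <- t.test(CognitionZscore ~ LarkOwl, data = toy)
one_sided <- t.test(CognitionZscore ~ LarkOwl, data = toy,
                    alternative = "greater")

print(c(two_sided = two_sided$p.value, one_sided = one_sided$p.value))
```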

Q4. Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class (EarlyClass=1) and those who didn’t (EarlyClass=0)?

# Load the dataset
sleepStudy <- read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")

# Perform a two-sample t-test for ClassesMissed by EarlyClass
t_test_classes_missed <- t.test(ClassesMissed ~ EarlyClass, data = sleepStudy)

# Output the results
t_test_classes_missed
## 
##  Welch Two Sample t-test
## 
## data:  ClassesMissed by EarlyClass
## t = 1.4755, df = 152.78, p-value = 0.1421
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.2233558  1.5412830
## sample estimates:
## mean in group 0 mean in group 1 
##        2.647059        1.988095

Q5. Is there a significant difference in the average happiness level between students with at least moderate depression and normal depression status?

# Load the dataset
sleepStudy <- read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")

# Create a new variable "DepressionGroup" based on depression status.
# DepressionStatus holds lowercase text labels (e.g., "normal", "moderate"), so the labels
# are matched explicitly; "severe" is assumed to follow the same labels as AnxietyStatus.
# A comparison such as DepressionStatus >= "Moderate" only sorts the strings alphabetically.
# The output below was produced with that string comparison and should be regenerated.
sleepStudy$DepressionGroup <- ifelse(sleepStudy$DepressionStatus %in% c("moderate", "severe"), "At Least Moderate", "Normal")

# Perform a two-sample t-test for Happiness by DepressionGroup
t_test_happiness <- t.test(Happiness ~ DepressionGroup, data = sleepStudy)

# Output the results
t_test_happiness
## 
##  Welch Two Sample t-test
## 
## data:  Happiness by DepressionGroup
## t = 3.7601, df = 46.07, p-value = 0.0004777
## alternative hypothesis: true difference in means between group At Least Moderate and group Normal is not equal to 0
## 95 percent confidence interval:
##  1.622570 5.360777
## sample estimates:
## mean in group At Least Moderate            mean in group Normal 
##                        26.57991                        23.08824
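
Grouping on an ordered category is easy to get wrong when the labels are plain strings, because `>=` on character values compares them alphabetically rather than by severity. One possible approach, sketched below with a hypothetical helper `depression_group()` and made-up labels, is to convert the labels to an ordered factor first. The level names are an assumption based on the lowercase values shown by str().

```r
# Hypothetical helper: flag "moderate" or worse using an ordered factor.
# Assumes the lowercase labels normal < moderate < severe.
depression_group <- function(status) {
  ordered_status <- factor(status,
                           levels = c("normal", "moderate", "severe"),
                           ordered = TRUE)
  ifelse(ordered_status >= "moderate", "At Least Moderate", "Normal")
}

# Toy labels for illustration only.
status <- c("normal", "moderate", "severe", "normal")
groups <- depression_group(status)
print(groups)
```

Running `table(groups, status)` on the real data would confirm that each label lands in the intended group before the t-test is interpreted.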

Q6. Is there a significant difference in average sleep quality scores between students who reported having at least one all-nighter (AllNighter=1) and those who didn’t (AllNighter=0)?

# Ensure the AllNighter variable is a factor (0 = No, 1 = Yes)
sleepStudy$AllNighter <- as.factor(sleepStudy$AllNighter)

# Perform the t-test to compare average sleep quality scores between students who had at least one all-nighter and those who didn't
t_test_result <- t.test(PoorSleepQuality ~ AllNighter, data = sleepStudy)

# Print the t-test results
print(t_test_result)
## 
##  Welch Two Sample t-test
## 
## data:  PoorSleepQuality by AllNighter
## t = -1.7068, df = 44.708, p-value = 0.09479
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -1.9456958  0.1608449
## sample estimates:
## mean in group 0 mean in group 1 
##        6.136986        7.029412
# Optional: Calculate the mean sleep quality score by all-nighter status
library(dplyr)
group_means <- sleepStudy %>%
  group_by(AllNighter) %>%
  summarise(Average_SleepQuality = mean(PoorSleepQuality, na.rm = TRUE))
print(group_means)
## # A tibble: 2 × 2
##   AllNighter Average_SleepQuality
##   <fct>                     <dbl>
## 1 0                          6.14
## 2 1                          7.03

Q7. Do students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use?

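sleepStudy <- read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")

# Group alcohol use into two levels. Note that this compares Abstain + Light ("Low Use")
# with all remaining categories ("High Use"), which is broader than the
# abstain-versus-heavy comparison posed in the question.
sleepStudy$AlcoholUseGroup <- ifelse(sleepStudy$AlcoholUse %in% c("Abstain", "Light"), "Low Use", "High Use")

# Welch two-sample t-test for StressScore by AlcoholUseGroup
t_test_stress <- t.test(StressScore ~ AlcoholUseGroup, data = sleepStudy)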
print(t_test_stress)
## 
##  Welch Two Sample t-test
## 
## data:  StressScore by AlcoholUseGroup
## t = 0.24753, df = 248.92, p-value = 0.8047
## alternative hypothesis: true difference in means between group High Use and group Low Use is not equal to 0
## 95 percent confidence interval:
##  -1.722125  2.217223
## sample estimates:
## mean in group High Use  mean in group Low Use 
##               9.580882               9.333333
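
The question as posed contrasts abstainers with heavy drinkers only. A stricter comparison could subset to those two categories before testing, as in the sketch below. The data are made up, and the label "Heavy" is an assumption about how the dataset codes heavy alcohol use. It also assumes that lower stress scores are better, so the one-sided alternative "less" tests whether abstainers score lower.

```r
# Toy data for illustration only (not SleepStudy data).
toy <- data.frame(
  AlcoholUse  = c("Abstain", "Abstain", "Abstain", "Light", "Moderate",
                  "Heavy", "Heavy", "Heavy", "Light", "Moderate"),
  StressScore = c(5, 7, 6, 9, 10, 12, 15, 11, 8, 9)
)

# Keep only the two groups named in the question.
extremes <- subset(toy, AlcoholUse %in% c("Abstain", "Heavy"))
extremes$AlcoholUse <- factor(extremes$AlcoholUse,
                              levels = c("Abstain", "Heavy"))

# "less" tests mean(Abstain) < mean(Heavy), given the level order above.
res <- t.test(StressScore ~ AlcoholUse, data = extremes,
              alternative = "less")
print(table(extremes$AlcoholUse))
print(res$estimate)
```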

Q8. Is there a significant difference in the average number of drinks per week between students of different genders?

# Ensure the Gender variable is a factor
sleepStudy$Gender <- as.factor(sleepStudy$Gender)

# Perform t-test to compare the average number of drinks per week between genders
t_test_result <- t.test(Drinks ~ Gender, data = sleepStudy)

# Print the t-test results
print(t_test_result)
## 
##  Welch Two Sample t-test
## 
## data:  Drinks by Gender
## t = -6.1601, df = 142.75, p-value = 7.002e-09
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -4.360009 -2.241601
## sample estimates:
## mean in group 0 mean in group 1 
##        4.238411        7.539216
# Optional: Calculate group means for drinks by gender
library(dplyr)
group_means <- sleepStudy %>%
  group_by(Gender) %>%
  summarise(Average_Drinks = mean(Drinks, na.rm = TRUE))
print(group_means)
## # A tibble: 2 × 2
##   Gender Average_Drinks
##   <fct>           <dbl>
## 1 0                4.24
## 2 1                7.54
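
Group labels of 0 and 1 make the output hard to read. The sketch below attaches text labels before summarising, using a hypothetical helper `label_gender()` and made-up data. The mapping 0 = female, 1 = male is an assumption that should be checked against the dataset documentation.

```r
# Assumption: 0 = female, 1 = male (verify against the dataset documentation).
label_gender <- function(gender) {
  factor(gender, levels = c(0, 1), labels = c("Female", "Male"))
}

# Toy data for illustration only (not SleepStudy data).
toy <- data.frame(
  Gender = c(0, 1, 0, 1, 0, 1),
  Drinks = c(3, 8, 5, 7, 4, 9)
)
toy$GenderLabel <- label_gender(toy$Gender)

# Base R equivalent of the dplyr group_by()/summarise() step.
means <- tapply(toy$Drinks, toy$GenderLabel, mean)
print(means)
```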

Q9. Is there a significant difference in the average weekday bedtime between students with high and normal stress (Stress=high vs. Stress=normal)?

# Ensure the Stress variable is a factor
sleepStudy$Stress <- as.factor(sleepStudy$Stress)

# Check the levels of the Stress variable
levels(sleepStudy$Stress)
## [1] "high"   "normal"
# Perform the t-test to compare average weekday bedtime between high and normal stress students
t_test_result <- t.test(WeekdayBed ~ Stress, data = sleepStudy)

# Print the result of the t-test
print(t_test_result)
## 
##  Welch Two Sample t-test
## 
## data:  WeekdayBed by Stress
## t = -1.0746, df = 87.048, p-value = 0.2855
## alternative hypothesis: true difference in means between group high and group normal is not equal to 0
## 95 percent confidence interval:
##  -0.4856597  0.1447968
## sample estimates:
##   mean in group high mean in group normal 
##             24.71500             24.88543
# Optional: Calculate the mean weekday bedtime by stress level
library(dplyr)
group_means <- sleepStudy %>%
  group_by(Stress) %>%
  summarise(Average_WeekdayBed = mean(WeekdayBed, na.rm = TRUE))
print(group_means)
## # A tibble: 2 × 2
##   Stress Average_WeekdayBed
##   <fct>               <dbl>
## 1 high                 24.7
## 2 normal               24.9
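
The bedtime means above are in decimal hours, and values above 24 appear to run past midnight (24.0 taken as midnight, an assumption consistent with values such as 25.75 in the data). A small hypothetical helper, `to_clock()`, can convert them to clock times. Under that assumption the two group means of 24.715 and 24.88543 correspond to about 00:43 and 00:53.

```r
# Hypothetical helper: convert decimal hours (24.0 = midnight assumed)
# to "HH:MM" clock time, wrapping values past midnight.
to_clock <- function(hours) {
  total_min <- round(hours * 60) %% (24 * 60)   # minutes since 00:00
  sprintf("%02d:%02d",
          as.integer(total_min %/% 60),
          as.integer(total_min %% 60))
}

# Group means copied from the t-test output above.
print(to_clock(c(24.715, 24.88543)))
```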

Q10. Is there a significant difference in the average hours of sleep on weekends between students in their first two years and other students?

# Create a new grouping variable for "FirstTwoYears" and "OtherYears"
sleepStudy$YearGroup <- ifelse(sleepStudy$ClassYear %in% c(1, 2), "FirstTwoYears", "OtherYears")

# Check the distribution of the new grouping variable
table(sleepStudy$YearGroup)
## 
## FirstTwoYears    OtherYears 
##           142           111
# Perform the t-test to compare average weekend sleep between the two groups
t_test_result <- t.test(WeekendSleep ~ YearGroup, data = sleepStudy)

# Print the t-test results
print(t_test_result)
## 
##  Welch Two Sample t-test
## 
## data:  WeekendSleep by YearGroup
## t = -0.047888, df = 237.36, p-value = 0.9618
## alternative hypothesis: true difference in means between group FirstTwoYears and group OtherYears is not equal to 0
## 95 percent confidence interval:
##  -0.3497614  0.3331607
## sample estimates:
## mean in group FirstTwoYears    mean in group OtherYears 
##                    8.213592                    8.221892
# Optional: Calculate group means for weekend sleep by the new group
library(dplyr)
group_means <- sleepStudy %>%
  group_by(YearGroup) %>%
  summarise(Average_WeekendSleep = mean(WeekendSleep, na.rm = TRUE))
print(group_means)
## # A tibble: 2 × 2
##   YearGroup     Average_WeekendSleep
##   <chr>                        <dbl>
## 1 FirstTwoYears                 8.21
## 2 OtherYears                    8.22
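
Because all ten questions use the same kind of test, the results could be collected into one table rather than read off ten separate printouts. The sketch below uses a hypothetical helper `summarise_ttest()` with made-up vectors. The 0.05 threshold is an assumed significance level, since the report does not state one.

```r
# Hypothetical helper: one-row summary of an htest object from t.test().
summarise_ttest <- function(res, label, alpha = 0.05) {
  data.frame(
    question    = label,
    t           = unname(res$statistic),
    df          = unname(res$parameter),
    p_value     = res$p.value,
    significant = res$p.value < alpha   # alpha is an assumed threshold
  )
}

# Toy groups for illustration only (not SleepStudy data).
a      <- c(5.1, 4.8, 5.5, 5.0, 5.2)
b      <- c(6.9, 7.2, 7.0, 6.8, 7.1)
c_same <- c(5.0, 5.3, 4.9, 5.4, 5.0)

results <- rbind(
  summarise_ttest(t.test(a, b), "a vs b"),
  summarise_ttest(t.test(a, c_same), "a vs c_same")
)
print(results)
```

With the real analysis, each stored test object (for example `t_test_gpa_gender`) would be passed to the helper in the same way.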

4. Summary

This project explores the key factors shaping college students’ academic performance and personal well-being, emphasizing the roles of sleep quality, stress, and academic year. The dataset comprises 253 records, including variables such as gender, GPA, sleep habits, and mental health indicators. By analyzing these dimensions, the study aims to identify patterns and relationships that influence students’ overall success and lifestyle choices.

Key analyses:

Gender and GPA: Tested whether average GPA differs between male and female students using a two-sample t-test.

Early Classes and Class Year: Compared the average number of early classes for first- and second-year students versus other students.

Larks vs. Owls and Cognitive Skills: Assessed whether students identifying as “larks” have significantly better cognitive skills (cognition z-score) than those identifying as “owls.”

Classes Missed and Early Classes: Evaluated whether students with at least one early class missed more classes on average than those without early classes.

Happiness and Depression Status: Investigated the difference in average happiness levels between students with at least moderate depression and those with normal depression status.

Sleep Quality and All-Nighters: Examined whether students who pulled at least one all-nighter had significantly worse sleep quality scores than those who did not.

Alcohol Use and Stress: Compared stress scores between students with low alcohol use (abstainers and light drinkers) and those with high alcohol use.

Gender and Weekly Alcohol Consumption: Tested for differences in the average number of drinks consumed per week between male and female students.

Stress and Weekday Bedtime: Analyzed weekday bedtime differences between students with high stress levels and those with normal stress levels.

Weekend Sleep and Class Year: Investigated differences in average hours of sleep on weekends between first- and second-year students and students in later years.

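In the outputs shown above, the tests with p-values below 0.05 were GPA by gender (p = 0.0001), early classes by class year (p < 0.0001), happiness by depression group (p = 0.0005), and weekly drinks by gender (p < 0.0001). The tests for cognition by chronotype (p = 0.42), classes missed by early-class status (p = 0.14), sleep quality by all-nighter status (p = 0.09), stress by alcohol-use group (p = 0.80), weekday bedtime by stress level (p = 0.29), and weekend sleep by class year (p = 0.96) were above that threshold. Together, these analyses describe how demographic factors, academic behaviors, and mental health indicators relate to one another in this sample of college students.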

5. Reference

https://www.lock5stat.com/datasets3e/SleepStudy.csv