This report examines the relationship between sleep quality, stress levels, and academic outcomes among college students, using the “SleepStudy” dataset loaded from https://www.lock5stat.com/datasets3e/SleepStudy.csv. The dataset contains 253 observations of 27 variables covering sleep behaviors, mental health indicators, and lifestyle habits.
The analysis looks for differences between groups of students that shed light on the role of sleep in their academic and personal lives. Through ten research questions, each tested with a two-sample t-test, the report explores how sleep duration and quality relate to stress, cognitive performance, and overall well-being. The results are intended to inform strategies for improving student health and academic success.
These are the 10 questions I will be discussing in the project.
Q1. Is there a significant difference in the average GPA between male and female college students?
Q2. Is there a significant difference in the average number of early classes between the first two class years and other class years?
Q3. Do students who identify as “larks” have significantly better cognitive skills (cognition z-score) compared to “owls”?
Q4. Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class (EarlyClass=1) and those who didn’t (EarlyClass=0)?
Q5. Is there a significant difference in the average happiness level between students with at least moderate depression and normal depression status?
Q6. Is there a significant difference in average sleep quality scores between students who reported having at least one all-nighter (AllNighter=1) and those who didn’t (AllNighter=0)?
Q7. Do students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use?
Q8. Is there a significant difference in the average number of drinks per week between students of different genders?
Q9. Is there a significant difference in the average weekday bedtime between students with high and normal stress (Stress=High vs. Stress=Normal)?
Q10. Is there a significant difference in the average hours of sleep on weekends between first- and second-year students and other students?
Structure of the Dataset
Number of Observations: 253 students.
Variables: The dataset contains 27 variables, as listed below:
Gender: The gender of the student, coded as 0 or 1.
ClassYear: The academic year of the student, coded as an integer (1 and 2 denote the first two years).
LarkOwl: Chronotype preference, indicating whether the student is a “morning person” (lark) or a “night person” (owl).
NumEarlyClass: Number of early morning classes attended.
EarlyClass: Indicator of early class attendance (1 = at least one early class, 0 = none).
GPA: Grade Point Average of the student.
ClassesMissed: Number of classes missed.
CognitionZscore: A cognitive ability score normalized as a Z-score.
PoorSleepQuality: Measure of poor sleep quality.
DepressionScore: A numerical score indicating depression levels.
AnxietyScore: A numerical score indicating anxiety levels.
StressScore: A numerical score indicating stress levels.
DepressionStatus: Depression classification (e.g., normal, moderate).
AnxietyStatus: Anxiety classification (e.g., normal, moderate, severe).
Stress: Stress classification (normal or high).
DASScore: Combined Depression, Anxiety, and Stress (DAS) score.
Happiness: A measure of happiness.
AlcoholUse: Alcohol consumption frequency or behavior.
Drinks: Number of drinks consumed.
WeekdayBed: Weekday bedtime.
WeekdayRise: Weekday wake-up time.
WeekdaySleep: Total hours of sleep on weekdays.
WeekendBed: Weekend bedtime.
WeekendRise: Weekend wake-up time.
WeekendSleep: Total hours of sleep on weekends.
AverageSleep: Average sleep hours across weekdays and weekends.
AllNighter: Indicator of whether the student has pulled an all-nighter (1 = yes, 0 = no).
# Load the dataset
sleepStudy <- read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")
# Inspect the dataset to understand its structure
str(sleepStudy)
## 'data.frame': 253 obs. of 27 variables:
## $ Gender : int 0 0 0 0 0 1 1 0 0 0 ...
## $ ClassYear : int 4 4 4 1 4 4 2 2 1 4 ...
## $ LarkOwl : chr "Neither" "Neither" "Owl" "Lark" ...
## $ NumEarlyClass : int 0 2 0 5 0 0 2 0 2 2 ...
## $ EarlyClass : int 0 1 0 1 0 0 1 0 1 1 ...
## $ GPA : num 3.6 3.24 2.97 3.76 3.2 3.5 3.35 3 4 2.9 ...
## $ ClassesMissed : int 0 0 12 0 4 0 2 0 0 0 ...
## $ CognitionZscore : num -0.26 1.39 0.38 1.39 1.22 -0.04 0.41 -0.59 1.03 0.72 ...
## $ PoorSleepQuality: int 4 6 18 9 9 6 2 10 5 2 ...
## $ DepressionScore : int 4 1 18 1 7 14 1 2 12 6 ...
## $ AnxietyScore : int 3 0 18 4 25 8 0 2 16 11 ...
## $ StressScore : int 8 3 9 6 14 28 1 3 20 31 ...
## $ DepressionStatus: chr "normal" "normal" "moderate" "normal" ...
## $ AnxietyStatus : chr "normal" "normal" "severe" "normal" ...
## $ Stress : chr "normal" "normal" "normal" "normal" ...
## $ DASScore : int 15 4 45 11 46 50 2 7 48 48 ...
## $ Happiness : int 28 25 17 32 15 22 25 29 29 30 ...
## $ AlcoholUse : chr "Moderate" "Moderate" "Light" "Light" ...
## $ Drinks : int 10 6 3 2 4 0 6 3 3 6 ...
## $ WeekdayBed : num 25.8 25.7 27.4 23.5 25.9 ...
## $ WeekdayRise : num 8.7 8.2 6.55 7.17 8.67 8.95 8.48 9.07 8.75 8 ...
## $ WeekdaySleep : num 7.7 6.8 3 6.77 6.09 9.05 7.73 9.02 8.25 6.6 ...
## $ WeekendBed : num 25.8 26 28 27 23.8 ...
## $ WeekendRise : num 9.5 10 12.6 8 9.5 ...
## $ WeekendSleep : num 5.88 7.25 10.09 7.25 7 ...
## $ AverageSleep : num 7.18 6.93 5.02 6.9 6.35 9.04 7.52 9.01 8.54 6.68 ...
## $ AllNighter : int 0 0 0 0 0 0 1 0 0 0 ...
# Create a new variable "AlcoholUseGroup"
# Grouping students based on their alcohol use
sleepStudy$AlcoholUseGroup <- ifelse(sleepStudy$AlcoholUse %in% c("Abstain", "Light"), "Low Use", "High Use")
# Verify the new variable
table(sleepStudy$AlcoholUseGroup)
##
## High Use Low Use
## 136 117
We will explore all 10 questions in detail.
# Load a copy of the data under the name used for Q1 and preview the first rows
college <- read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")
head(college)
## Gender ClassYear LarkOwl NumEarlyClass EarlyClass GPA ClassesMissed
## 1 0 4 Neither 0 0 3.60 0
## 2 0 4 Neither 2 1 3.24 0
## 3 0 4 Owl 0 0 2.97 12
## 4 0 1 Lark 5 1 3.76 0
## 5 0 4 Owl 0 0 3.20 4
## 6 1 4 Neither 0 0 3.50 0
## CognitionZscore PoorSleepQuality DepressionScore AnxietyScore StressScore
## 1 -0.26 4 4 3 8
## 2 1.39 6 1 0 3
## 3 0.38 18 18 18 9
## 4 1.39 9 1 4 6
## 5 1.22 9 7 25 14
## 6 -0.04 6 14 8 28
## DepressionStatus AnxietyStatus Stress DASScore Happiness AlcoholUse Drinks
## 1 normal normal normal 15 28 Moderate 10
## 2 normal normal normal 4 25 Moderate 6
## 3 moderate severe normal 45 17 Light 3
## 4 normal normal normal 11 32 Light 2
## 5 normal severe normal 46 15 Moderate 4
## 6 moderate moderate high 50 22 Abstain 0
## WeekdayBed WeekdayRise WeekdaySleep WeekendBed WeekendRise WeekendSleep
## 1 25.75 8.70 7.70 25.75 9.50 5.88
## 2 25.70 8.20 6.80 26.00 10.00 7.25
## 3 27.44 6.55 3.00 28.00 12.59 10.09
## 4 23.50 7.17 6.77 27.00 8.00 7.25
## 5 25.90 8.67 6.09 23.75 9.50 7.00
## 6 23.80 8.95 9.05 26.00 10.75 9.00
## AverageSleep AllNighter
## 1 7.18 0
## 2 6.93 0
## 3 5.02 0
## 4 6.90 0
## 5 6.35 0
## 6 9.04 0
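# Q1: Welch two-sample t-test comparing mean GPA between the two gender groups
# (t.test has no na.rm argument; the formula interface omits missing values by default)
t_test_gpa_gender <- t.test(GPA ~ Gender, data = college)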
t_test_gpa_gender
##
## Welch Two Sample t-test
##
## data: GPA by Gender
## t = 3.9139, df = 200.9, p-value = 0.0001243
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## 0.09982254 0.30252780
## sample estimates:
## mean in group 0 mean in group 1
## 3.324901 3.123725
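The confidence interval in this output can be rebuilt from the other printed numbers, which may help when reading the remaining test outputs. The following is a minimal sketch that uses only values copied from the output above; the variable names are illustrative, and small discrepancies are expected because the printed t statistic and degrees of freedom are rounded.

```r
# Values copied from the printed Welch t-test output for GPA by Gender
mean_group0 <- 3.324901
mean_group1 <- 3.123725
t_stat <- 3.9139
df_welch <- 200.9

# Difference in means and the standard error implied by t = diff / SE
diff_means <- mean_group0 - mean_group1
std_error <- diff_means / t_stat

# 95% confidence interval: diff +/- (t critical value) * SE
t_crit <- qt(0.975, df = df_welch)
ci <- diff_means + c(-1, 1) * t_crit * std_error

print(round(diff_means, 4))
print(round(ci, 4))
```

Because the interval excludes 0, it agrees with the small p-value: the two group means differ at the 0.05 level.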
# Create a new grouping variable for "FirstTwoYears" and "OtherYears"
sleepStudy$YearGroup <- ifelse(sleepStudy$ClassYear %in% c(1, 2), "FirstTwoYears", "OtherYears")
# Perform the t-test to compare the average number of early classes between the two groups
t_test_result <- t.test(NumEarlyClass ~ YearGroup, data = sleepStudy)
# Print the t-test results
print(t_test_result)
##
## Welch Two Sample t-test
##
## data: NumEarlyClass by YearGroup
## t = 4.1813, df = 250.69, p-value = 4.009e-05
## alternative hypothesis: true difference in means between group FirstTwoYears and group OtherYears is not equal to 0
## 95 percent confidence interval:
## 0.4042016 1.1240309
## sample estimates:
## mean in group FirstTwoYears mean in group OtherYears
## 2.070423 1.306306
# Optional: Calculate the mean number of early classes by year group
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
group_means <- sleepStudy %>%
group_by(YearGroup) %>%
summarise(Average_EarlyClasses = mean(NumEarlyClass, na.rm = TRUE))
print(group_means)
## # A tibble: 2 × 2
## YearGroup Average_EarlyClasses
## <chr> <dbl>
## 1 FirstTwoYears 2.07
## 2 OtherYears 1.31
# Load the dataset
sleepStudy <- read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")
# Check the unique values in the LarkOwl variable
unique(sleepStudy$LarkOwl)
## [1] "Neither" "Owl" "Lark"
# Subset data to include only "Lark" and "Owl" categories
sleepStudy_subset <- subset(sleepStudy, LarkOwl %in% c("Lark", "Owl"))
# Perform a two-sample t-test for CognitionZscore by LarkOwl
# (t.test has no na.rm argument; missing values are omitted by default)
t_test_cognition <- t.test(CognitionZscore ~ LarkOwl, data = sleepStudy_subset)
# Output the results
t_test_cognition
##
## Welch Two Sample t-test
##
## data: CognitionZscore by LarkOwl
## t = 0.80571, df = 75.331, p-value = 0.4229
## alternative hypothesis: true difference in means between group Lark and group Owl is not equal to 0
## 95 percent confidence interval:
## -0.1893561 0.4465786
## sample estimates:
## mean in group Lark mean in group Owl
## 0.09024390 -0.03836735
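Q3 is worded as a one-directional question (do larks score better than owls), while the test above is two-sided. As a rough sketch, the one-sided p-value can be recovered from the printed t statistic and degrees of freedom; the variable names here are illustrative, and the numbers are copied from the output above. Running `t.test` with `alternative = "greater"` on the data itself would be the more direct route.

```r
# Values copied from the printed Welch t-test output for CognitionZscore by LarkOwl
t_stat <- 0.80571      # t statistic (Lark mean minus Owl mean)
df_welch <- 75.331     # Welch degrees of freedom

# One-sided p-value for the alternative "Lark mean is greater than Owl mean"
p_one_sided <- pt(t_stat, df = df_welch, lower.tail = FALSE)

print(round(p_one_sided, 4))
print(round(2 * p_one_sided, 4))  # should roughly reproduce the two-sided p-value
```

Since the t statistic is positive, the one-sided p-value is half the two-sided one, which still gives no evidence at the 0.05 level that larks score higher.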
# Load the dataset
sleepStudy <- read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")
# Perform a two-sample t-test for ClassesMissed by EarlyClass
# (t.test has no na.rm argument; missing values are omitted by default)
t_test_classes_missed <- t.test(ClassesMissed ~ EarlyClass, data = sleepStudy)
# Output the results
t_test_classes_missed
##
## Welch Two Sample t-test
##
## data: ClassesMissed by EarlyClass
## t = 1.4755, df = 152.78, p-value = 0.1421
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -0.2233558 1.5412830
## sample estimates:
## mean in group 0 mean in group 1
## 2.647059 1.988095
# Load the dataset
sleepStudy <- read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")
# Create a new variable "DepressionGroup" based on depression status
# DepressionStatus is stored as text (e.g., "normal", "moderate"), so groups are
# assigned by explicit membership; a string comparison such as >= "Moderate"
# orders labels alphabetically, not by severity, and can mislabel the groups.
# Re-run this chunk to refresh the output printed below.
sleepStudy$DepressionGroup <- ifelse(sleepStudy$DepressionStatus %in% c("moderate", "severe"), "At Least Moderate", "Normal")
# Perform a two-sample t-test for Happiness by DepressionGroup
t_test_happiness <- t.test(Happiness ~ DepressionGroup, data = sleepStudy)
# Output the results
t_test_happiness
##
## Welch Two Sample t-test
##
## data: Happiness by DepressionGroup
## t = 3.7601, df = 46.07, p-value = 0.0004777
## alternative hypothesis: true difference in means between group At Least Moderate and group Normal is not equal to 0
## 95 percent confidence interval:
## 1.622570 5.360777
## sample estimates:
## mean in group At Least Moderate mean in group Normal
## 26.57991 23.08824
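Because DepressionStatus is stored as text, it can help to cross-check which labels end up in each group before trusting the group means. The following is a minimal sketch on a small made-up vector (not rows from the dataset); it assumes the labels are the lower-case strings "normal", "moderate", and "severe", and the names `status` and `depression_group` are illustrative.

```r
# Made-up status labels for illustration only (not taken from the dataset)
status <- c("normal", "moderate", "severe", "normal", "moderate", "normal")

# Assign groups by explicit membership rather than by comparing strings
depression_group <- ifelse(status %in% c("moderate", "severe"),
                           "At Least Moderate", "Normal")

# Cross-tabulate original labels against the new groups to confirm the mapping
mapping <- table(status, depression_group)
print(mapping)
```

Applying the same `table()` check to `sleepStudy$DepressionStatus` and the derived group column would show at a glance whether every label landed where intended.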
# Ensure the AllNighter variable is a factor (0 = No, 1 = Yes)
sleepStudy$AllNighter <- as.factor(sleepStudy$AllNighter)
# Perform the t-test to compare average sleep quality scores between students who had at least one all-nighter and those who didn't
t_test_result <- t.test(PoorSleepQuality ~ AllNighter, data = sleepStudy)
# Print the t-test results
print(t_test_result)
##
## Welch Two Sample t-test
##
## data: PoorSleepQuality by AllNighter
## t = -1.7068, df = 44.708, p-value = 0.09479
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -1.9456958 0.1608449
## sample estimates:
## mean in group 0 mean in group 1
## 6.136986 7.029412
# Optional: Calculate the mean sleep quality score by all-nighter status
library(dplyr)
group_means <- sleepStudy %>%
group_by(AllNighter) %>%
summarise(Average_SleepQuality = mean(PoorSleepQuality, na.rm = TRUE))
print(group_means)
## # A tibble: 2 × 2
## AllNighter Average_SleepQuality
## <fct> <dbl>
## 1 0 6.14
## 2 1 7.03
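# Q7: compare mean stress scores by alcohol use
sleepStudy <- read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")
# Note: this grouping pools Abstain with Light ("Low Use") and all other levels ("High Use"),
# which is broader than the abstain-versus-heavy comparison worded in Q7
sleepStudy$AlcoholUseGroup <- ifelse(sleepStudy$AlcoholUse %in% c("Abstain", "Light"), "Low Use", "High Use")
# (t.test has no na.rm argument; missing values are omitted by default)
t_test_stress <- t.test(StressScore ~ AlcoholUseGroup, data = sleepStudy)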
print(t_test_stress)
##
## Welch Two Sample t-test
##
## data: StressScore by AlcoholUseGroup
## t = 0.24753, df = 248.92, p-value = 0.8047
## alternative hypothesis: true difference in means between group High Use and group Low Use is not equal to 0
## 95 percent confidence interval:
## -1.722125 2.217223
## sample estimates:
## mean in group High Use mean in group Low Use
## 9.580882 9.333333
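Q7 as worded compares only students who abstain with those who report heavy use. A minimal sketch of that narrower comparison is shown below on a small made-up data frame, so the printed numbers say nothing about the real data. It assumes the AlcoholUse labels are "Abstain", "Light", "Moderate", and "Heavy", and that a lower StressScore means less stress; the names `toy`, `abstain_heavy`, and `q7_test` are illustrative.

```r
# Made-up rows for illustration only; column names follow the dataset
toy <- data.frame(
  AlcoholUse  = c("Abstain", "Abstain", "Abstain", "Light",
                  "Moderate", "Heavy", "Heavy", "Heavy"),
  StressScore = c(5, 9, 7, 8, 10, 12, 6, 11)
)

# Keep only the two groups named in Q7
abstain_heavy <- subset(toy, AlcoholUse %in% c("Abstain", "Heavy"))
abstain_heavy$AlcoholUse <- factor(abstain_heavy$AlcoholUse,
                                   levels = c("Abstain", "Heavy"))

# One-sided Welch test: is the mean stress score lower for abstainers?
# (assumes a lower StressScore means less stress)
q7_test <- t.test(StressScore ~ AlcoholUse, data = abstain_heavy,
                  alternative = "less")
print(q7_test)
```

Replacing `toy` with `sleepStudy` would run the same comparison on the real data, provided the label spellings match.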
# Ensure the Gender variable is a factor
sleepStudy$Gender <- as.factor(sleepStudy$Gender)
# Perform t-test to compare the average number of drinks per week between genders
t_test_result <- t.test(Drinks ~ Gender, data = sleepStudy)
# Print the t-test results
print(t_test_result)
##
## Welch Two Sample t-test
##
## data: Drinks by Gender
## t = -6.1601, df = 142.75, p-value = 7.002e-09
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -4.360009 -2.241601
## sample estimates:
## mean in group 0 mean in group 1
## 4.238411 7.539216
# Optional: Calculate group means for drinks by gender
library(dplyr)
group_means <- sleepStudy %>%
group_by(Gender) %>%
summarise(Average_Drinks = mean(Drinks, na.rm = TRUE))
print(group_means)
## # A tibble: 2 × 2
## Gender Average_Drinks
## <fct> <dbl>
## 1 0 4.24
## 2 1 7.54
# Ensure the Stress variable is a factor
sleepStudy$Stress <- as.factor(sleepStudy$Stress)
# Check the levels of the Stress variable
levels(sleepStudy$Stress)
## [1] "high" "normal"
# Perform the t-test to compare average weekday bedtime between high and normal stress students
t_test_result <- t.test(WeekdayBed ~ Stress, data = sleepStudy)
# Print the result of the t-test
print(t_test_result)
##
## Welch Two Sample t-test
##
## data: WeekdayBed by Stress
## t = -1.0746, df = 87.048, p-value = 0.2855
## alternative hypothesis: true difference in means between group high and group normal is not equal to 0
## 95 percent confidence interval:
## -0.4856597 0.1447968
## sample estimates:
## mean in group high mean in group normal
## 24.71500 24.88543
# Optional: Calculate the mean weekday bedtime by stress level
library(dplyr)
group_means <- sleepStudy %>%
group_by(Stress) %>%
summarise(Average_WeekdayBed = mean(WeekdayBed, na.rm = TRUE))
print(group_means)
## # A tibble: 2 × 2
## Stress Average_WeekdayBed
## <fct> <dbl>
## 1 high 24.7
## 2 normal 24.9
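The bedtime means above are decimal hours, which are hard to read at a glance. The sketch below converts them to clock times under the assumption that 24.0 represents midnight, so values above 24 fall after midnight; this coding is inferred from values such as 25.75 in the data preview, and `to_clock` is a hypothetical helper, not part of any package.

```r
# Convert decimal hours (where 24.0 is assumed to mean midnight) to "HH:MM"
to_clock <- function(decimal_hours) {
  total_minutes <- round((decimal_hours %% 24) * 60)
  hours <- (total_minutes %/% 60) %% 24
  minutes <- total_minutes %% 60
  sprintf("%02d:%02d", as.integer(hours), as.integer(minutes))
}

# Group means copied from the printed t-test output above (high, normal)
print(to_clock(c(24.71500, 24.88543)))
```

Under that assumption both groups go to bed, on average, a little before 1 a.m., about ten minutes apart.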
# Create a new grouping variable for "FirstTwoYears" and "OtherYears"
sleepStudy$YearGroup <- ifelse(sleepStudy$ClassYear %in% c(1, 2), "FirstTwoYears", "OtherYears")
# Check the distribution of the new grouping variable
table(sleepStudy$YearGroup)
##
## FirstTwoYears OtherYears
## 142 111
# Perform the t-test to compare average weekend sleep between the two groups
t_test_result <- t.test(WeekendSleep ~ YearGroup, data = sleepStudy)
# Print the t-test results
print(t_test_result)
##
## Welch Two Sample t-test
##
## data: WeekendSleep by YearGroup
## t = -0.047888, df = 237.36, p-value = 0.9618
## alternative hypothesis: true difference in means between group FirstTwoYears and group OtherYears is not equal to 0
## 95 percent confidence interval:
## -0.3497614 0.3331607
## sample estimates:
## mean in group FirstTwoYears mean in group OtherYears
## 8.213592 8.221892
# Optional: Calculate group means for weekend sleep by the new group
library(dplyr)
group_means <- sleepStudy %>%
group_by(YearGroup) %>%
summarise(Average_WeekendSleep = mean(WeekendSleep, na.rm = TRUE))
print(group_means)
## # A tibble: 2 × 2
## YearGroup Average_WeekendSleep
## <chr> <dbl>
## 1 FirstTwoYears 8.21
## 2 OtherYears 8.22
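The p-values printed in the ten test outputs above can be gathered into one table to see which comparisons reach significance. This is a minimal sketch: the values are copied from those outputs, the 0.05 threshold is an assumed convention, and each p-value describes the comparison as it was coded, which may be broader than the question as worded.

```r
# p-values copied from the ten printed t-test outputs above
p_values <- c(Q1 = 0.0001243, Q2 = 4.009e-05, Q3 = 0.4229, Q4 = 0.1421,
              Q5 = 0.0004777, Q6 = 0.09479, Q7 = 0.8047, Q8 = 7.002e-09,
              Q9 = 0.2855, Q10 = 0.9618)

alpha <- 0.05  # conventional significance level (an assumption)

# One row per question, with a flag for p < alpha
summary_table <- data.frame(
  question = names(p_values),
  p_value = signif(unname(p_values), 3),
  significant = unname(p_values) < alpha
)
print(summary_table)
```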
This project analyzes factors affecting college students’ academic and personal behaviors, focusing on stress, sleep patterns, and class year. The dataset includes 253 students with variables such as gender, class year, GPA, sleep quality, and mental health indicators.
Key Analyses:
Gender and GPA: We tested whether there is a significant GPA difference between male and female students.
Stress and Bedtime: We compared weekday bedtimes between students with high and normal stress.
Class Year and Weekend Sleep: We examined whether first- and second-year students sleep differently on weekends compared to others.
All-Nighters and Sleep Quality: We assessed whether students with all-nighters had worse sleep quality than those without.
Early Classes and Class Year: We looked at whether first- and second-year students attend more early classes than others.
Results: Mean GPA differed significantly between the two gender groups (3.32 vs. 3.12, p < 0.001), and first- and second-year students had more early classes on average than other students (2.07 vs. 1.31, p < 0.001). Weekday bedtime did not differ significantly by stress level (p = 0.29), weekend sleep did not differ by class-year group (p = 0.96), and the difference in poor sleep quality scores by all-nighter status (6.14 vs. 7.03) was not significant at the 0.05 level (p = 0.095). These findings can inform strategies to improve student well-being and academic success.
References: Some of the R code was written with the help of GeeksforGeeks.