Project 2 Stat 353: Exploring sleep patterns in college students

1. Introduction

This report examines the relationship between sleep quality, stress levels, and academic outcomes among college students, using the “SleepStudy” dataset from the Lock5Stat website (lock5stat.com). The dataset contains 253 observations of 27 variables, covering sleep behaviors, mental health indicators, and lifestyle habits within this demographic.

The analysis seeks to uncover patterns and correlations that shed light on the critical role of sleep in the academic and personal lives of students. By addressing key research questions, this report aims to explore how sleep duration and quality influence stress management, cognitive performance, and overall well-being. The insights generated from this study are intended to inform strategies for enhancing student health and optimizing academic success, contributing to a broader understanding of the interplay between lifestyle factors and education outcomes.

These are the 10 questions I will discuss in this project.

Q1. Is there a significant difference in the average GPA between male and female college students?

Q2. Is there a significant difference in the average number of early classes between the first two class years and other class years?

Q3. Do students who identify as “larks” have significantly better cognitive skills (cognition z-score) compared to “owls”?

Q4. Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class (EarlyClass=1) and those who didn’t (EarlyClass=0)?

Q5. Is there a significant difference in the average happiness level between students with at least moderate depression and normal depression status?

Q6. Is there a significant difference in average sleep quality scores between students who reported having at least one all-nighter (AllNighter=1) and those who didn’t (AllNighter=0)?

Q7. Do students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use?

Q8. Is there a significant difference in the average number of drinks per week between students of different genders?

Q9. Is there a significant difference in the average weekday bedtime between students with high and normal stress (Stress=high vs. Stress=normal)?

Q10. Is there a significant difference in the average hours of sleep on weekends between first- and second-year students and other students?

2. Data

Structure of the Dataset

Number of Observations: 253 students.

Variables: The dataset contains 27 variables, as listed below:

Gender: The gender of the student, coded 0 or 1.

ClassYear: The academic year of the student, coded as an integer (e.g., 1 for first year, 2 for second year).

LarkOwl: Chronotype preference, indicating whether the student is a “morning person” (lark) or a “night person” (owl).

NumEarlyClass: Number of early morning classes attended.

EarlyClass: Indicator of early class attendance (1 = at least one early class, 0 = none).

GPA: Grade Point Average of the student.

ClassesMissed: Number of classes missed.

CognitionZscore: A cognitive ability score normalized as a Z-score.

PoorSleepQuality: Measure of poor sleep quality (higher scores indicate poorer sleep).

DepressionScore: A numerical score indicating depression levels.

AnxietyScore: A numerical score indicating anxiety levels.

StressScore: A numerical score indicating stress levels.

DepressionStatus: Depression classification (e.g., normal, moderate, severe).

AnxietyStatus: Anxiety classification (e.g., normal, moderate, severe).

Stress: Stress classification (normal or high).

DASScore: Combined Depression, Anxiety, and Stress (DAS) score.

Happiness: A measure of happiness.

AlcoholUse: Alcohol consumption frequency or behavior.

Drinks: Number of drinks consumed.

WeekdayBed: Weekday bedtime in hours on a 24-hour scale; values above 24 fall after midnight (e.g., 25.75 is 1:45 a.m.).

WeekdayRise: Weekday wake-up time.

WeekdaySleep: Total hours of sleep on weekdays.

WeekendBed: Weekend bedtime.

WeekendRise: Weekend wake-up time.

WeekendSleep: Total hours of sleep on weekends.

AverageSleep: Average sleep hours across weekdays and weekends.

AllNighter: Indicator of whether the student has pulled an all-nighter (1 = yes, 0 = no).

# Load the dataset
sleepStudy <- read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")

# Inspect the dataset to understand its structure
str(sleepStudy)

## 'data.frame':    253 obs. of  27 variables:
##  $ Gender          : int  0 0 0 0 0 1 1 0 0 0 ...
##  $ ClassYear       : int  4 4 4 1 4 4 2 2 1 4 ...
##  $ LarkOwl         : chr  "Neither" "Neither" "Owl" "Lark" ...
##  $ NumEarlyClass   : int  0 2 0 5 0 0 2 0 2 2 ...
##  $ EarlyClass      : int  0 1 0 1 0 0 1 0 1 1 ...
##  $ GPA             : num  3.6 3.24 2.97 3.76 3.2 3.5 3.35 3 4 2.9 ...
##  $ ClassesMissed   : int  0 0 12 0 4 0 2 0 0 0 ...
##  $ CognitionZscore : num  -0.26 1.39 0.38 1.39 1.22 -0.04 0.41 -0.59 1.03 0.72 ...
##  $ PoorSleepQuality: int  4 6 18 9 9 6 2 10 5 2 ...
##  $ DepressionScore : int  4 1 18 1 7 14 1 2 12 6 ...
##  $ AnxietyScore    : int  3 0 18 4 25 8 0 2 16 11 ...
##  $ StressScore     : int  8 3 9 6 14 28 1 3 20 31 ...
##  $ DepressionStatus: chr  "normal" "normal" "moderate" "normal" ...
##  $ AnxietyStatus   : chr  "normal" "normal" "severe" "normal" ...
##  $ Stress          : chr  "normal" "normal" "normal" "normal" ...
##  $ DASScore        : int  15 4 45 11 46 50 2 7 48 48 ...
##  $ Happiness       : int  28 25 17 32 15 22 25 29 29 30 ...
##  $ AlcoholUse      : chr  "Moderate" "Moderate" "Light" "Light" ...
##  $ Drinks          : int  10 6 3 2 4 0 6 3 3 6 ...
##  $ WeekdayBed      : num  25.8 25.7 27.4 23.5 25.9 ...
##  $ WeekdayRise     : num  8.7 8.2 6.55 7.17 8.67 8.95 8.48 9.07 8.75 8 ...
##  $ WeekdaySleep    : num  7.7 6.8 3 6.77 6.09 9.05 7.73 9.02 8.25 6.6 ...
##  $ WeekendBed      : num  25.8 26 28 27 23.8 ...
##  $ WeekendRise     : num  9.5 10 12.6 8 9.5 ...
##  $ WeekendSleep    : num  5.88 7.25 10.09 7.25 7 ...
##  $ AverageSleep    : num  7.18 6.93 5.02 6.9 6.35 9.04 7.52 9.01 8.54 6.68 ...
##  $ AllNighter      : int  0 0 0 0 0 0 1 0 0 0 ...

# Create a new variable "AlcoholUseGroup"
# Grouping students based on their alcohol use
sleepStudy$AlcoholUseGroup <- ifelse(sleepStudy$AlcoholUse %in% c("Abstain", "Light"), "Low Use", "High Use")

# Verify the new variable
table(sleepStudy$AlcoholUseGroup)

## 
## High Use  Low Use 
##      136      117

3. Analysis

We will explore all 10 questions in detail.

college = read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")
head(college)

##   Gender ClassYear LarkOwl NumEarlyClass EarlyClass  GPA ClassesMissed
## 1      0         4 Neither             0          0 3.60             0
## 2      0         4 Neither             2          1 3.24             0
## 3      0         4     Owl             0          0 2.97            12
## 4      0         1    Lark             5          1 3.76             0
## 5      0         4     Owl             0          0 3.20             4
## 6      1         4 Neither             0          0 3.50             0
##   CognitionZscore PoorSleepQuality DepressionScore AnxietyScore StressScore
## 1           -0.26                4               4            3           8
## 2            1.39                6               1            0           3
## 3            0.38               18              18           18           9
## 4            1.39                9               1            4           6
## 5            1.22                9               7           25          14
## 6           -0.04                6              14            8          28
##   DepressionStatus AnxietyStatus Stress DASScore Happiness AlcoholUse Drinks
## 1           normal        normal normal       15        28   Moderate     10
## 2           normal        normal normal        4        25   Moderate      6
## 3         moderate        severe normal       45        17      Light      3
## 4           normal        normal normal       11        32      Light      2
## 5           normal        severe normal       46        15   Moderate      4
## 6         moderate      moderate   high       50        22    Abstain      0
##   WeekdayBed WeekdayRise WeekdaySleep WeekendBed WeekendRise WeekendSleep
## 1      25.75        8.70         7.70      25.75        9.50         5.88
## 2      25.70        8.20         6.80      26.00       10.00         7.25
## 3      27.44        6.55         3.00      28.00       12.59        10.09
## 4      23.50        7.17         6.77      27.00        8.00         7.25
## 5      25.90        8.67         6.09      23.75        9.50         7.00
## 6      23.80        8.95         9.05      26.00       10.75         9.00
##   AverageSleep AllNighter
## 1         7.18          0
## 2         6.93          0
## 3         5.02          0
## 4         6.90          0
## 5         6.35          0
## 6         9.04          0

Q1. Is there a significant difference in the average GPA between male and female college students?

# Welch two-sample t-test of GPA by Gender (t.test() has no na.rm argument, so it is omitted)
t_test_gpa_gender <- t.test(GPA ~ Gender, data = college)
t_test_gpa_gender

## 
##  Welch Two Sample t-test
## 
## data:  GPA by Gender
## t = 3.9139, df = 200.9, p-value = 0.0001243
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  0.09982254 0.30252780
## sample estimates:
## mean in group 0 mean in group 1 
##        3.324901        3.123725
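
The printed output can also be read off the test object directly, which helps when quoting the p-value or the confidence interval in the text. Below is a minimal sketch using a small made-up data frame (`toy` is hypothetical and not taken from SleepStudy); the same fields exist on any object returned by `t.test()`.

```r
# Toy data for illustration only (not from the SleepStudy dataset)
toy <- data.frame(
  GPA    = c(3.6, 3.2, 3.8, 3.4, 3.0, 2.9, 3.3, 3.1),
  Gender = c(0, 0, 0, 0, 1, 1, 1, 1)
)

# Welch two-sample t-test (the default for t.test with two groups)
res <- t.test(GPA ~ Gender, data = toy)

res$p.value    # two-sided p-value
res$conf.int   # 95% confidence interval for mean(group 0) - mean(group 1)
res$estimate   # the two group means

# Difference in means, in the same direction as the confidence interval
diff_means <- unname(res$estimate[1] - res$estimate[2])
diff_means
```

The interval is for the first group's mean minus the second group's mean, so a positive interval (as in the Q1 output) means group 0 has the higher average.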

Q2. Is there a significant difference in the average number of early classes between the first two class years and other class years?

# Create a new grouping variable for "FirstTwoYears" and "OtherYears"
sleepStudy$YearGroup <- ifelse(sleepStudy$ClassYear %in% c(1, 2), "FirstTwoYears", "OtherYears")

# Perform the t-test to compare the average number of early classes between the two groups
t_test_result <- t.test(NumEarlyClass ~ YearGroup, data = sleepStudy)

# Print the t-test results
print(t_test_result)

## 
##  Welch Two Sample t-test
## 
## data:  NumEarlyClass by YearGroup
## t = 4.1813, df = 250.69, p-value = 4.009e-05
## alternative hypothesis: true difference in means between group FirstTwoYears and group OtherYears is not equal to 0
## 95 percent confidence interval:
##  0.4042016 1.1240309
## sample estimates:
## mean in group FirstTwoYears    mean in group OtherYears 
##                    2.070423                    1.306306

# Optional: Calculate the mean number of early classes by year group
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

group_means <- sleepStudy %>%
  group_by(YearGroup) %>%
  summarise(Average_EarlyClasses = mean(NumEarlyClass, na.rm = TRUE))
print(group_means)

## # A tibble: 2 × 2
##   YearGroup     Average_EarlyClasses
##   <chr>                        <dbl>
## 1 FirstTwoYears                 2.07
## 2 OtherYears                    1.31

Q3. Do students who identify as “larks” have significantly better cognitive skills (cognition z-score) compared to “owls”?

# Load the dataset
sleepStudy <- read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")

# Check the unique values in the LarkOwl variable
unique(sleepStudy$LarkOwl)

## [1] "Neither" "Owl"     "Lark"

# Subset data to include only "Lark" and "Owl" categories
sleepStudy_subset <- subset(sleepStudy, LarkOwl %in% c("Lark", "Owl"))

# Perform a two-sample t-test for CognitionZscore by LarkOwl
t_test_cognition <- t.test(CognitionZscore ~ LarkOwl, data = sleepStudy_subset)

# Output the results
t_test_cognition

## 
##  Welch Two Sample t-test
## 
## data:  CognitionZscore by LarkOwl
## t = 0.80571, df = 75.331, p-value = 0.4229
## alternative hypothesis: true difference in means between group Lark and group Owl is not equal to 0
## 95 percent confidence interval:
##  -0.1893561  0.4465786
## sample estimates:
## mean in group Lark  mean in group Owl 
##         0.09024390        -0.03836735

Q4. Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class (EarlyClass=1) and those who didn’t (EarlyClass=0)?

# Load the dataset
sleepStudy <- read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")

# Perform a two-sample t-test for ClassesMissed by EarlyClass
t_test_classes_missed <- t.test(ClassesMissed ~ EarlyClass, data = sleepStudy)

# Output the results
t_test_classes_missed

## 
##  Welch Two Sample t-test
## 
## data:  ClassesMissed by EarlyClass
## t = 1.4755, df = 152.78, p-value = 0.1421
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.2233558  1.5412830
## sample estimates:
## mean in group 0 mean in group 1 
##        2.647059        1.988095

Q5. Is there a significant difference in the average happiness level between students with at least moderate depression and normal depression status?

# Load the dataset
sleepStudy <- read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")

# Create a new variable "DepressionGroup" based on depression status
# DepressionStatus is a character variable with lowercase labels ("normal", "moderate", "severe").
# Match the labels explicitly: comparing strings with >= sorts them alphabetically, not by severity.
sleepStudy$DepressionGroup <- ifelse(sleepStudy$DepressionStatus %in% c("moderate", "severe"), "At Least Moderate", "Normal")

# Perform a two-sample t-test for Happiness by DepressionGroup
t_test_happiness <- t.test(Happiness ~ DepressionGroup, data = sleepStudy)

# Output the results
t_test_happiness

## 
##  Welch Two Sample t-test
## 
## data:  Happiness by DepressionGroup
## t = 3.7601, df = 46.07, p-value = 0.0004777
## alternative hypothesis: true difference in means between group At Least Moderate and group Normal is not equal to 0
## 95 percent confidence interval:
##  1.622570 5.360777
## sample estimates:
## mean in group At Least Moderate            mean in group Normal 
##                        26.57991                        23.08824

Q6. Is there a significant difference in average sleep quality scores between students who reported having at least one all-nighter (AllNighter=1) and those who didn’t (AllNighter=0)?

# Ensure the AllNighter variable is a factor (0 = No, 1 = Yes)
sleepStudy$AllNighter <- as.factor(sleepStudy$AllNighter)

# Perform the t-test to compare average sleep quality scores between students who had at least one all-nighter and those who didn't
t_test_result <- t.test(PoorSleepQuality ~ AllNighter, data = sleepStudy)

# Print the t-test results
print(t_test_result)

## 
##  Welch Two Sample t-test
## 
## data:  PoorSleepQuality by AllNighter
## t = -1.7068, df = 44.708, p-value = 0.09479
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -1.9456958  0.1608449
## sample estimates:
## mean in group 0 mean in group 1 
##        6.136986        7.029412

# Optional: Calculate the mean sleep quality score by all-nighter status
library(dplyr)
group_means <- sleepStudy %>%
  group_by(AllNighter) %>%
  summarise(Average_SleepQuality = mean(PoorSleepQuality, na.rm = TRUE))
print(group_means)

## # A tibble: 2 × 2
##   AllNighter Average_SleepQuality
##   <fct>                     <dbl>
## 1 0                          6.14
## 2 1                          7.03

Q7. Do students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use?

sleepStudy <- read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")

sleepStudy$AlcoholUseGroup <- ifelse(sleepStudy$AlcoholUse %in% c("Abstain", "Light"), "Low Use", "High Use")

t_test_stress <- t.test(StressScore ~ AlcoholUseGroup, data = sleepStudy)
print(t_test_stress)

## 
##  Welch Two Sample t-test
## 
## data:  StressScore by AlcoholUseGroup
## t = 0.24753, df = 248.92, p-value = 0.8047
## alternative hypothesis: true difference in means between group High Use and group Low Use is not equal to 0
## 95 percent confidence interval:
##  -1.722125  2.217223
## sample estimates:
## mean in group High Use  mean in group Low Use 
##               9.580882               9.333333
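
The grouping above pools Abstain with Light and compares them to everyone else, whereas Q7 asks only about students who abstain versus heavy drinkers. A minimal sketch of that narrower comparison is shown below on made-up data (`toy` is hypothetical). It assumes the AlcoholUse labels are "Abstain" and "Heavy", and that a "better" stress score means a lower StressScore.

```r
# Toy data for illustration only (not from the SleepStudy dataset)
toy <- data.frame(
  StressScore = c(5, 8, 12, 7, 9, 14, 10, 6, 11, 13, 4, 9),
  AlcoholUse  = rep(c("Abstain", "Light", "Moderate", "Heavy"), times = 3)
)

# Keep only the two groups named in the question
two_groups <- subset(toy, AlcoholUse %in% c("Abstain", "Heavy"))

# Fix the level order so the test compares Abstain minus Heavy
two_groups$AlcoholUse <- factor(two_groups$AlcoholUse,
                                levels = c("Abstain", "Heavy"))
table(two_groups$AlcoholUse)

# One-sided Welch test: is mean stress lower for Abstain than for Heavy?
# (assumes lower StressScore = better)
res <- t.test(StressScore ~ AlcoholUse, data = two_groups,
              alternative = "less")
res
```

Dropping the Light and Moderate groups leaves smaller samples, so this version has less power than the pooled comparison.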

Q8. Is there a significant difference in the average number of drinks per week between students of different genders?

# Ensure the Gender variable is a factor
sleepStudy$Gender <- as.factor(sleepStudy$Gender)

# Perform t-test to compare the average number of drinks per week between genders
t_test_result <- t.test(Drinks ~ Gender, data = sleepStudy)

# Print the t-test results
print(t_test_result)

## 
##  Welch Two Sample t-test
## 
## data:  Drinks by Gender
## t = -6.1601, df = 142.75, p-value = 7.002e-09
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -4.360009 -2.241601
## sample estimates:
## mean in group 0 mean in group 1 
##        4.238411        7.539216

# Optional: Calculate group means for drinks by gender
library(dplyr)
group_means <- sleepStudy %>%
  group_by(Gender) %>%
  summarise(Average_Drinks = mean(Drinks, na.rm = TRUE))
print(group_means)

## # A tibble: 2 × 2
##   Gender Average_Drinks
##   <fct>           <dbl>
## 1 0                4.24
## 2 1                7.54

Q9. Is there a significant difference in the average weekday bedtime between students with high and normal stress (Stress=high vs. Stress=normal)?

# Ensure the Stress variable is a factor
sleepStudy$Stress <- as.factor(sleepStudy$Stress)

# Check the levels of the Stress variable
levels(sleepStudy$Stress)

## [1] "high"   "normal"

# Perform the t-test to compare average weekday bedtime between high and normal stress students
t_test_result <- t.test(WeekdayBed ~ Stress, data = sleepStudy)

# Print the result of the t-test
print(t_test_result)

## 
##  Welch Two Sample t-test
## 
## data:  WeekdayBed by Stress
## t = -1.0746, df = 87.048, p-value = 0.2855
## alternative hypothesis: true difference in means between group high and group normal is not equal to 0
## 95 percent confidence interval:
##  -0.4856597  0.1447968
## sample estimates:
##   mean in group high mean in group normal 
##             24.71500             24.88543

# Optional: Calculate the mean weekday bedtime by stress level
library(dplyr)
group_means <- sleepStudy %>%
  group_by(Stress) %>%
  summarise(Average_WeekdayBed = mean(WeekdayBed, na.rm = TRUE))
print(group_means)

## # A tibble: 2 × 2
##   Stress Average_WeekdayBed
##   <fct>               <dbl>
## 1 high                 24.7
## 2 normal               24.9
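
The group means of about 24.7 and 24.9 are easier to read as clock times. The helper below is a small sketch for that conversion; `to_clock` is a hypothetical name, and it assumes bedtimes are recorded in hours where 24 is midnight, so values above 24 are after midnight.

```r
# Convert decimal hours (24 = midnight) to "HH:MM" clock times
to_clock <- function(hours) {
  total_min <- round((hours %% 24) * 60)   # minutes past midnight
  total_min <- total_min %% (24 * 60)      # guard if rounding reaches 24:00
  sprintf("%02d:%02d",
          as.integer(total_min %/% 60),
          as.integer(total_min %% 60))
}

# The two group means reported above, plus one bedtime before midnight
to_clock(c(24.715, 24.88543, 23.5))
```

Under that assumption, both stress groups go to bed a little before 1 a.m. on weekdays, roughly ten minutes apart.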

Q10. Is there a significant difference in the average hours of sleep on weekends between first- and second-year students and other students?

# Create a new grouping variable for "FirstTwoYears" and "OtherYears"
sleepStudy$YearGroup <- ifelse(sleepStudy$ClassYear %in% c(1, 2), "FirstTwoYears", "OtherYears")

# Check the distribution of the new grouping variable
table(sleepStudy$YearGroup)

## 
## FirstTwoYears    OtherYears 
##           142           111

# Perform the t-test to compare average weekend sleep between the two groups
t_test_result <- t.test(WeekendSleep ~ YearGroup, data = sleepStudy)

# Print the t-test results
print(t_test_result)

## 
##  Welch Two Sample t-test
## 
## data:  WeekendSleep by YearGroup
## t = -0.047888, df = 237.36, p-value = 0.9618
## alternative hypothesis: true difference in means between group FirstTwoYears and group OtherYears is not equal to 0
## 95 percent confidence interval:
##  -0.3497614  0.3331607
## sample estimates:
## mean in group FirstTwoYears    mean in group OtherYears 
##                    8.213592                    8.221892

# Optional: Calculate group means for weekend sleep by the new group
library(dplyr)
group_means <- sleepStudy %>%
  group_by(YearGroup) %>%
  summarise(Average_WeekendSleep = mean(WeekendSleep, na.rm = TRUE))
print(group_means)

## # A tibble: 2 × 2
##   YearGroup     Average_WeekendSleep
##   <chr>                        <dbl>
## 1 FirstTwoYears                 8.21
## 2 OtherYears                    8.22
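
All ten comparisons follow the same pattern: run a Welch t-test, then read off the group means, test statistic, and p-value. One way to collect these into a single table is sketched below; `summarise_t_test` is a hypothetical helper, and the data frame `toy` is made up for illustration.

```r
# Run a Welch t-test and return the key numbers as a one-row data frame
summarise_t_test <- function(formula, data, label) {
  res <- t.test(formula, data = data)
  data.frame(
    question = label,
    mean_1   = unname(res$estimate[1]),  # mean of the first group
    mean_2   = unname(res$estimate[2]),  # mean of the second group
    t        = unname(res$statistic),
    df       = unname(res$parameter),
    p_value  = res$p.value
  )
}

# Toy data for illustration only (not from the SleepStudy dataset)
toy <- data.frame(
  y1 = c(1, 2, 3, 4, 5, 6, 7, 8),
  y2 = c(2, 1, 4, 3, 6, 5, 8, 7),
  g  = rep(c("a", "b"), each = 4)
)

# Stack one row per comparison
results <- rbind(
  summarise_t_test(y1 ~ g, toy, "toy outcome 1"),
  summarise_t_test(y2 ~ g, toy, "toy outcome 2")
)
results
```

With the real data, each call would take one of the formulas used above, such as `GPA ~ Gender`, so the results for all questions can be compared side by side.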

4. Summary

This project analyzes factors affecting college students’ academic and personal behaviors, focusing on stress, sleep patterns, and class year. The dataset includes 253 students with variables like gender, class year, GPA, sleep quality, and mental health indicators.

Key Analyses:

Gender and GPA: Mean GPA differed significantly between the two gender groups (3.32 for Gender = 0 vs. 3.12 for Gender = 1; p = 0.0001).

Stress and Bedtime: Weekday bedtimes did not differ significantly between students with high and normal stress (p = 0.29).

Class Year and Weekend Sleep: First- and second-year students slept about as long on weekends as other students (8.21 vs. 8.22 hours; p = 0.96).

All-Nighters and Sleep Quality: Students who reported an all-nighter had a higher mean PoorSleepQuality score (7.03 vs. 6.14), but the difference was not significant at the 5% level (p = 0.095).

Early Classes and Class Year: First- and second-year students had significantly more early classes on average than other students (2.07 vs. 1.31; p < 0.001).

Results: Among these comparisons, GPA by gender and early classes by class year showed significant differences, while weekday bedtime by stress level, weekend sleep by class year, and sleep quality by all-nighter status did not. Findings can inform strategies to improve student well-being and academic success.

References: Some of the R code was written with help from GeeksforGeeks.