Stat 353 Project 2: Analysis of Sleep Pattern Among College Students

writeLines('PATH="${RTOOLS40_HOME}\\usr\\bin;${PATH}"', con = "~/.Renviron")

Sys.which("make")

##                               make 
## "c:\\rtools44\\usr\\bin\\make.exe"

# Install the necessary packages if they are not already available
if (!requireNamespace("readr", quietly = TRUE)) install.packages("readr")
if (!requireNamespace("lessR", quietly = TRUE)) install.packages("lessR")

##1. Introduction

This report presents an analysis of sleep patterns among college students, utilizing the “SleepStudy” dataset obtained from Lock5Stat. The dataset comprises 253 observations on 27 variables, providing valuable insights into the sleep habits, psychological well-being, and lifestyle choices of college students. The primary objective of this analysis is to address a series of research questions by examining the dataset. We will use R for all statistical analyses and visualizations. The questions explored in this report aim to shed light on various aspects of college students’ sleep patterns, their academic performance, psychological well-being, and lifestyle choices.

The following research questions will be addressed in this report:

1) Is there a significant difference in the average GPA between male and female college students?
2) Is there a significant difference in the average number of early classes between the first two class years and other class years?
3) Do students who identify as “larks” have significantly better cognitive skills (cognition z-score) compared to “owls”?
4) Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class (EarlyClass=1) and those who didn’t (EarlyClass=0)?
5) Is there a significant difference in the average happiness level between students with at least moderate depression and normal depression status?
6) Is there a significant difference in average sleep quality scores between students who reported having at least one all-nighter (AllNighter=1) and those who didn’t (AllNighter=0)?
7) Do students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use?
8) Is there a significant difference in the average number of drinks per week between students of different genders?
9) Is there a significant difference in the average weekday bedtime between students with high and low stress (Stress=High vs. Stress=Normal)?
10) Is there a significant difference in the average hours of sleep on weekends between students in their first two class years and other students?

By addressing these questions using R, we aim to provide a comprehensive understanding of the sleep patterns and related factors among college students, ultimately contributing to the well-being and academic success of this population.

##2. Data

The “SleepStudy” dataset has 253 observations across 27 variables, providing a comprehensive overview of various aspects of college students’ sleep patterns, psychological well-being, and lifestyle choices. The dataset is structured in a tabular format, where each row represents an individual student, and each column contains a specific variable related to their sleep habits and other relevant factors.

Dataset Structure

Here are the key variables included in the dataset:

Gender: Binary variable indicating the gender of the student (0 = Female, 1 = Male).

ClassYear: Numeric variable representing the student’s year in college (1 = Freshman, 2 = Sophomore, 3 = Junior, 4 = Senior).

LarkOwl: Categorical variable indicating the student’s sleep preference (Lark, Owl, Neither).

NumEarlyClass: Numeric variable representing the number of early classes the student has.

EarlyClass: Binary variable indicating whether the student has at least one early class (0 = No, 1 = Yes).

GPA: Numeric variable representing the student’s Grade Point Average.

ClassesMissed: Numeric variable indicating the number of classes missed by the student in a semester.

CognitionZscore: Numeric variable representing the student’s cognitive skills measured as a z-score.

PoorSleepQuality: Numeric variable indicating the student’s self-reported sleep quality on a scale; as the name suggests, higher values indicate poorer sleep.

DepressionScore: Numeric variable representing the student’s depression score.

AnxietyScore: Numeric variable representing the student’s anxiety score.

StressScore: Numeric variable representing the student’s stress score.

DepressionStatus: Categorical variable indicating the level of depression (Normal, Moderate, Severe).

AnxietyStatus: Categorical variable indicating the level of anxiety (Normal, Moderate, Severe).

Stress: Categorical variable indicating stress levels (Normal, Moderate, High).

DASScore: Numeric variable representing the Depression Anxiety Stress Scale score.

Happiness: Numeric variable indicating the student’s happiness level.

AlcoholUse: Categorical variable indicating alcohol consumption status (Abstain, Light, Moderate, Heavy).

Drinks: Numeric variable representing the average number of alcoholic drinks consumed per week.

WeekdayBed: Numeric variable indicating average bedtime on weekdays.

WeekdayRise: Numeric variable indicating average rise time on weekdays.

WeekdaySleep: Numeric variable indicating average sleep duration on weekdays.

WeekendBed: Numeric variable indicating average bedtime on weekends.

WeekendRise: Numeric variable indicating average rise time on weekends.

WeekendSleep: Numeric variable indicating average sleep duration on weekends.

AverageSleep: Numeric variable representing average hours of sleep per night.

AllNighter: Binary variable indicating whether the student has pulled at least one all-nighter (0 = No, 1 = Yes).
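
The numeric codes listed above can be made easier to read in output by attaching labels. The following is a minimal sketch, not part of the original analysis. It uses a small hand-typed data frame (`toy`, a hypothetical name) holding `Gender` and `WeekdayBed` for the first six rows of the dataset, labels `Gender` using the coding given above, and converts bedtimes to clock times. The conversion assumes that the bed and rise variables are decimal hours on a 24-hour scale where values above 24 fall after midnight; the helper `to_clock()` is also hypothetical.

```r
# Toy data: Gender and WeekdayBed for the first six rows of SleepStudy
toy <- data.frame(
  Gender     = c(0, 0, 0, 0, 0, 1),
  WeekdayBed = c(25.75, 25.70, 27.44, 23.50, 25.90, 23.80)
)

# Attach labels to the 0/1 gender codes (0 = Female, 1 = Male)
toy$GenderLabel <- factor(toy$Gender, levels = c(0, 1),
                          labels = c("Female", "Male"))

# Hypothetical helper: convert decimal hours to an "HH:MM" clock time.
# Assumption: values above 24 are times after midnight.
to_clock <- function(hours) {
  total_minutes <- round(hours * 60) %% (24 * 60)
  sprintf("%02d:%02d",
          as.integer(total_minutes %/% 60),
          as.integer(total_minutes %% 60))
}

toy$WeekdayBedClock <- to_clock(toy$WeekdayBed)
print(toy)
```

Under that assumption, a recorded bedtime of 25.75 corresponds to 01:45 and 23.50 to 23:30.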

Data Collection Process

The data was collected through surveys administered to college students at various institutions. Students were asked to provide self-reported information regarding their sleep habits, academic performance (GPA), psychological well-being (depression and anxiety levels), alcohol consumption patterns, and other lifestyle choices.

The dataset provides valuable insights into how various factors such as gender, class year, sleep preferences, and mental health impact sleep quality and overall well-being among college students.

Loading Data in R

To begin analyzing this dataset in R, we will first load it into our R environment using appropriate functions:

# Read the SleepStudy dataset from CSV
SleepData <- read.csv("https://lock5stat.com/datasets3e/SleepStudy.csv")
# Display the first few rows of the dataset
head(SleepData)

##   Gender ClassYear LarkOwl NumEarlyClass EarlyClass  GPA ClassesMissed
## 1      0         4 Neither             0          0 3.60             0
## 2      0         4 Neither             2          1 3.24             0
## 3      0         4     Owl             0          0 2.97            12
## 4      0         1    Lark             5          1 3.76             0
## 5      0         4     Owl             0          0 3.20             4
## 6      1         4 Neither             0          0 3.50             0
##   CognitionZscore PoorSleepQuality DepressionScore AnxietyScore StressScore
## 1           -0.26                4               4            3           8
## 2            1.39                6               1            0           3
## 3            0.38               18              18           18           9
## 4            1.39                9               1            4           6
## 5            1.22                9               7           25          14
## 6           -0.04                6              14            8          28
##   DepressionStatus AnxietyStatus Stress DASScore Happiness AlcoholUse Drinks
## 1           normal        normal normal       15        28   Moderate     10
## 2           normal        normal normal        4        25   Moderate      6
## 3         moderate        severe normal       45        17      Light      3
## 4           normal        normal normal       11        32      Light      2
## 5           normal        severe normal       46        15   Moderate      4
## 6         moderate      moderate   high       50        22    Abstain      0
##   WeekdayBed WeekdayRise WeekdaySleep WeekendBed WeekendRise WeekendSleep
## 1      25.75        8.70         7.70      25.75        9.50         5.88
## 2      25.70        8.20         6.80      26.00       10.00         7.25
## 3      27.44        6.55         3.00      28.00       12.59        10.09
## 4      23.50        7.17         6.77      27.00        8.00         7.25
## 5      25.90        8.67         6.09      23.75        9.50         7.00
## 6      23.80        8.95         9.05      26.00       10.75         9.00
##   AverageSleep AllNighter
## 1         7.18          0
## 2         6.93          0
## 3         5.02          0
## 4         6.90          0
## 5         6.35          0
## 6         9.04          0

Data cleaning

# Data Cleaning
# Checking for missing values in the entire dataset
sum(is.na(SleepData))

## [1] 0

# Checking for missing values in each column
sapply(SleepData, function(x) sum(is.na(x)))

##           Gender        ClassYear          LarkOwl    NumEarlyClass 
##                0                0                0                0 
##       EarlyClass              GPA    ClassesMissed  CognitionZscore 
##                0                0                0                0 
## PoorSleepQuality  DepressionScore     AnxietyScore      StressScore 
##                0                0                0                0 
## DepressionStatus    AnxietyStatus           Stress         DASScore 
##                0                0                0                0 
##        Happiness       AlcoholUse           Drinks       WeekdayBed 
##                0                0                0                0 
##      WeekdayRise     WeekdaySleep       WeekendBed      WeekendRise 
##                0                0                0                0 
##     WeekendSleep     AverageSleep       AllNighter 
##                0                0                0

# If there were missing values, we could handle them as follows.
# Impute missing numeric values with the mean of the respective column
numeric_columns <- sapply(SleepData, is.numeric)
for (col in names(SleepData)[numeric_columns]) {
  SleepData[[col]][is.na(SleepData[[col]])] <- mean(SleepData[[col]], na.rm = TRUE)
}

# Impute missing categorical values with the mode (most frequent value).
# read.csv() returns character columns by default in R >= 4.0, so check for
# both character and factor columns rather than factors only.
categorical_columns <- sapply(SleepData, function(x) is.character(x) || is.factor(x))
for (col in names(SleepData)[categorical_columns]) {
  mode_value <- names(sort(table(SleepData[[col]]), decreasing = TRUE))[1]
  SleepData[[col]][is.na(SleepData[[col]])] <- mode_value
}

# Remove any remaining rows with NA values
SleepData_clean <- na.omit(SleepData)

# Check if there are any remaining missing values
sum(is.na(SleepData_clean))

## [1] 0

# Display the first few rows of the cleaned dataset
head(SleepData_clean)

##   Gender ClassYear LarkOwl NumEarlyClass EarlyClass  GPA ClassesMissed
## 1      0         4 Neither             0          0 3.60             0
## 2      0         4 Neither             2          1 3.24             0
## 3      0         4     Owl             0          0 2.97            12
## 4      0         1    Lark             5          1 3.76             0
## 5      0         4     Owl             0          0 3.20             4
## 6      1         4 Neither             0          0 3.50             0
##   CognitionZscore PoorSleepQuality DepressionScore AnxietyScore StressScore
## 1           -0.26                4               4            3           8
## 2            1.39                6               1            0           3
## 3            0.38               18              18           18           9
## 4            1.39                9               1            4           6
## 5            1.22                9               7           25          14
## 6           -0.04                6              14            8          28
##   DepressionStatus AnxietyStatus Stress DASScore Happiness AlcoholUse Drinks
## 1           normal        normal normal       15        28   Moderate     10
## 2           normal        normal normal        4        25   Moderate      6
## 3         moderate        severe normal       45        17      Light      3
## 4           normal        normal normal       11        32      Light      2
## 5           normal        severe normal       46        15   Moderate      4
## 6         moderate      moderate   high       50        22    Abstain      0
##   WeekdayBed WeekdayRise WeekdaySleep WeekendBed WeekendRise WeekendSleep
## 1      25.75        8.70         7.70      25.75        9.50         5.88
## 2      25.70        8.20         6.80      26.00       10.00         7.25
## 3      27.44        6.55         3.00      28.00       12.59        10.09
## 4      23.50        7.17         6.77      27.00        8.00         7.25
## 5      25.90        8.67         6.09      23.75        9.50         7.00
## 6      23.80        8.95         9.05      26.00       10.75         9.00
##   AverageSleep AllNighter
## 1         7.18          0
## 2         6.93          0
## 3         5.02          0
## 4         6.90          0
## 5         6.35          0
## 6         9.04          0
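
Because the dataset contains no missing values, the imputation steps above never change anything, so their behavior is not visible in the output. The sketch below illustrates the same idea (mean imputation for numeric columns, mode imputation for categorical ones) on a tiny made-up data frame. The data frame `toy` and the helper `impute_column()` are hypothetical names introduced only for this illustration.

```r
# Made-up data with one missing value in each column
toy <- data.frame(
  GPA     = c(3.6, NA, 3.0),
  LarkOwl = c("Owl", NA, "Owl"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: fill NAs with the mean (numeric) or the mode (other types)
impute_column <- function(x) {
  if (is.numeric(x)) {
    x[is.na(x)] <- mean(x, na.rm = TRUE)
  } else {
    counts <- table(x)  # table() ignores NA by default
    x[is.na(x)] <- names(counts)[which.max(counts)]
  }
  x
}

# Apply the helper to every column while keeping the data frame structure
toy[] <- lapply(toy, impute_column)
print(toy)
```

Here the missing GPA is replaced by the mean of the two observed values, and the missing LarkOwl entry by the most frequent category.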

# Example of calculating correlation (e.g., between GPA and CognitionZscore)
correlation <- cor(SleepData_clean$GPA, SleepData_clean$CognitionZscore, use = "complete.obs")
print(paste("Correlation between GPA and CognitionZscore:", correlation))

## [1] "Correlation between GPA and CognitionZscore: 0.266822136349916"
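
The correlation above is reported without a significance test. As a rough sketch, the usual t statistic for a Pearson correlation can be computed from the reported coefficient and the sample size alone. This check is an addition to the original analysis, and the variable names are illustrative.

```r
r <- 0.266822136349916  # correlation reported above
n <- 253                # number of observations in the dataset

# t statistic for testing whether the population correlation is zero
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_value <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

print(c(t = t_stat, p = p_value))
```

With the data loaded, `cor.test(SleepData_clean$GPA, SleepData_clean$CognitionZscore)` performs the same test directly.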

##3. Analysis

Research Question 1: Is there a significant difference in the average GPA between male and female college students?

library(lessR)

# Read the SleepStudy dataset
SleepData <- read.csv("https://lock5stat.com/datasets3e/SleepStudy.csv")

# Perform two-sample t-test
t_test_result <- ttest(GPA ~ Gender, data = SleepData)

## 
## Compare GPA across Gender with levels 0 and 1 
## Grouping Variable:  Gender
## Response Variable:  GPA
## 
## 
## ------ Describe ------
## 
## GPA for Gender 0:  n.miss = 0,  n = 151,  mean = 3.325,  sd = 0.375
## GPA for Gender 1:  n.miss = 0,  n = 102,  mean = 3.124,  sd = 0.418
## 
## Mean Difference of GPA:  0.201
## 
## Weighted Average Standard Deviation:   0.393 
## 
## 
## ------ Assumptions ------
## 
## Note: These hypothesis tests can perform poorly, and the 
##       t-test is typically robust to violations of assumptions. 
##       Use as heuristic guides instead of interpreting literally. 
## 
## Null hypothesis, for each group, is a normal distribution of GPA.
## Group 0: Sample mean assumed normal because n > 30, so no test needed.
## Group 1: Sample mean assumed normal because n > 30, so no test needed.
## 
## Null hypothesis is equal variances of GPA, homogeneous.
## Variance Ratio test:  F = 0.174/0.141 = 1.240,  df = 101;150,  p-value = 0.232
## Levene's test, Brown-Forsythe:  t = -1.879,  df = 251,  p-value = 0.061
## 
## 
## ------ Infer ------
## 
## --- Assume equal population variances of GPA for each Gender 
## 
## t-cutoff for 95% range of variation: tcut =  1.969 
## Standard Error of Mean Difference: SE =  0.050 
## 
## Hypothesis Test of 0 Mean Diff:  t-value = 3.996,  df = 251,  p-value = 0.000
## 
## Margin of Error for 95% Confidence Level:  0.099
## 95% Confidence Interval for Mean Difference:  0.102 to 0.300
## 
## 
## --- Do not assume equal population variances of GPA for each Gender 
## 
## t-cutoff: tcut =  1.972 
## Standard Error of Mean Difference: SE =  0.051 
## 
## Hypothesis Test of 0 Mean Diff:  t = 3.914,  df = 200.902, p-value = 0.000
## 
## Margin of Error for 95% Confidence Level:  0.101
## 95% Confidence Interval for Mean Difference:  0.100 to 0.303
## 
## 
## ------ Effect Size ------
## 
## --- Assume equal population variances of GPA for each Gender 
## 
## Standardized Mean Difference of GPA, Cohen's d:  0.512
## 
## 
## ------ Practical Importance ------
## 
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
## 
## 
## ------ Graphics Smoothing Parameter ------
## 
## Density bandwidth for Gender 0: 0.154
## Density bandwidth for Gender 1: 0.189

# Print the results
print(t_test_result)

## $value1
## [1] "0"
## 
## $group1
##   [1] 3.60 3.24 2.97 3.76 3.20 3.00 4.00 2.90 3.30 3.50 3.40 2.80 3.05 3.30 2.90
##  [16] 3.00 3.20 2.90 3.40 3.00 3.75 4.00 2.80 3.04 3.50 3.20 3.25 2.90 3.84 3.50
##  [31] 2.90 3.10 3.20 3.00 3.20 3.25 2.00 3.40 3.30 3.25 3.20 3.25 3.32 3.67 3.10
##  [46] 2.00 3.20 3.40 3.11 3.40 2.40 4.00 3.30 3.30 3.25 3.60 3.30 3.30 2.84 3.12
##  [61] 3.75 3.70 2.40 3.60 3.95 3.43 3.62 3.60 3.90 3.50 3.70 3.20 3.00 3.30 3.80
##  [76] 3.50 3.93 3.10 3.35 3.50 3.20 3.00 3.70 3.00 3.72 3.50 3.60 3.60 3.80 3.44
##  [91] 3.89 3.50 3.61 3.43 3.79 3.40 3.19 3.52 3.20 3.25 3.00 3.74 3.30 3.00 2.80
## [106] 4.00 3.90 4.00 3.40 3.50 2.50 3.15 3.75 3.00 3.75 3.30 2.80 3.40 2.50 3.50
## [121] 2.77 3.70 3.30 3.70 3.40 3.30 3.32 3.70 3.00 3.60 3.20 3.56 3.00 3.10 3.55
## [136] 4.00 3.20 3.25 3.60 3.50 3.50 3.40 3.30 3.40 3.30 3.50 2.50 3.25 3.34 3.00
## [151] 3.50
## 
## $value2
## [1] "1"
## 
## $group2
##   [1] 3.50 3.35 3.70 3.00 3.30 3.00 2.80 2.80 3.07 3.00 2.35 2.75 2.80 3.25 3.50
##  [16] 3.40 2.50 3.25 3.40 2.50 3.00 3.35 3.60 3.00 2.75 3.60 3.00 2.81 3.00 3.52
##  [31] 3.20 3.67 3.60 3.20 3.78 3.83 3.30 3.75 3.68 3.30 2.66 3.36 2.30 3.00 4.00
##  [46] 3.50 2.90 3.00 3.50 3.40 3.90 3.10 3.50 3.25 2.50 3.30 4.00 2.40 3.50 2.75
##  [61] 2.50 2.75 3.80 3.60 2.75 3.10 3.25 2.80 3.20 3.24 2.75 2.80 3.00 3.20 2.00
##  [76] 2.98 2.50 3.20 3.30 2.90 3.75 3.42 2.50 3.00 3.00 3.00 3.23 2.40 3.00 3.35
##  [91] 2.80 2.80 2.80 3.72 3.40 2.70 3.40 3.15 3.15 3.35 2.60 2.50

This is the result of a two-sample t-test performed in R to compare GPA between female (Gender 0) and male (Gender 1) students in the SleepStudy dataset.

The p-value is reported as 0.000 (that is, p < 0.001) both when equal variances are assumed and when they are not. This is less than the commonly used significance level of 0.05, so we reject the null hypothesis that there is no difference in mean GPA between genders. In this sample, female students have the higher mean GPA (3.325 versus 3.124).

The 95% confidence interval for the mean difference in GPA is 0.102 to 0.300. Since this interval does not include zero, it supports rejecting the null hypothesis.

Effect Size: Cohen’s d, a measure of effect size, is 0.512, which is conventionally considered a moderate effect. This suggests that the difference in GPA between genders is not negligible. Overall, the results provide strong statistical evidence of a difference in average GPA between male and female students in this dataset, and the magnitude of the difference is moderate.
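
The effect size and the equal-variance t statistic can be reproduced from the summary statistics printed in the Describe section of the output. The following is a minimal sketch using only those reported values; the variable names are illustrative, and small discrepancies from the lessR output are expected because the inputs are rounded.

```r
# Summary statistics from the "Describe" section of the ttest() output
n_f <- 151; mean_f <- 3.325; sd_f <- 0.375   # Gender 0 (female)
n_m <- 102; mean_m <- 3.124; sd_m <- 0.418   # Gender 1 (male)

# Pooled (weighted average) standard deviation
pooled_sd <- sqrt(((n_f - 1) * sd_f^2 + (n_m - 1) * sd_m^2) / (n_f + n_m - 2))

# Cohen's d: mean difference in units of the pooled standard deviation
mean_diff <- mean_f - mean_m
cohens_d  <- mean_diff / pooled_sd

# Equal-variance t statistic
se_diff <- pooled_sd * sqrt(1 / n_f + 1 / n_m)
t_value <- mean_diff / se_diff

print(round(c(pooled_sd = pooled_sd, cohens_d = cohens_d, t = t_value), 3))
```

The results agree with the reported pooled standard deviation (0.393), Cohen's d (0.512), and t-value (3.996) up to rounding.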

Research Question 2: Is there a significant difference in the average number of early classes between the first two class years and other class years?

# Load necessary libraries
library(lessR)
library(ggplot2)
library(dplyr)


# Read the SleepStudy dataset
SleepData <- read.csv("https://lock5stat.com/datasets3e/SleepStudy.csv")

# Create a new variable ClassYear2
SleepData$ClassYear2 <- ifelse(SleepData$ClassYear < 3, "FirstTwoYears", "OtherYears")

# Perform two-sample t-test for NumEarlyClass by ClassYear2
t_test_result <- t.test(NumEarlyClass ~ ClassYear2, data = SleepData)

# Print the results
print(t_test_result)

## 
##  Welch Two Sample t-test
## 
## data:  NumEarlyClass by ClassYear2
## t = 4.1813, df = 250.69, p-value = 0.00004009
## alternative hypothesis: true difference in means between group FirstTwoYears and group OtherYears is not equal to 0
## 95 percent confidence interval:
##  0.4042016 1.1240309
## sample estimates:
## mean in group FirstTwoYears    mean in group OtherYears 
##                    2.070423                    1.306306

# Create a density plot to visualize the number of early classes by year group
ggplot(SleepData, aes(x = NumEarlyClass, fill = ClassYear2)) +
  geom_density(alpha = 0.5) +
  labs(title = "Density Plot of Number of Early Classes by Year Group",
       x = "Number of Early Classes",
       y = "Density",
       fill = "Year Group") +
  theme_minimal()

Hypothesis Testing:

Null Hypothesis (H0): There is no difference in the average number of early classes between the two groups.

Alternative Hypothesis: There is a difference in the average number of early classes between the two groups.

p-value: The p-value of approximately 0.00004009 is much less than the significance level of 0.05. Therefore, we reject the null hypothesis, indicating that there is a statistically significant difference in the average number of early classes taken by students in their first two years compared to those in their later years.

Confidence Interval: The 95% confidence interval for the difference in means (FirstTwoYears minus OtherYears) is (0.4042016, 1.1240309). Since this interval does not include zero, it further supports our conclusion that there is a significant difference between the groups.

Mean Comparison: Freshmen and Sophomores (group FirstTwoYears) have an average of approximately 2.07 early classes. Juniors and Seniors (group OtherYears) have an average of approximately 1.31 early classes.

This indicates that Freshmen and Sophomores tend to take more early classes on average compared to their older peers.
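
When reading the confidence interval, note that its sign depends on which group R treats as the first level: `t.test()` with a formula reports the first level's mean minus the second's, and character groups are ordered alphabetically. The sketch below illustrates this on simulated data, not the SleepStudy data; the data frame `toy` and its columns are hypothetical.

```r
set.seed(1)

# Simulated data with two groups (values are made up for illustration)
toy <- data.frame(
  group = rep(c("FirstTwoYears", "OtherYears"), each = 30),
  value = c(rnorm(30, mean = 2), rnorm(30, mean = 1.3))
)

# Default order is alphabetical: FirstTwoYears minus OtherYears
ci_default <- t.test(value ~ group, data = toy)$conf.int

# Reverse the level order: OtherYears minus FirstTwoYears
toy$group_rev <- factor(toy$group, levels = c("OtherYears", "FirstTwoYears"))
ci_reversed <- t.test(value ~ group_rev, data = toy)$conf.int

print(as.numeric(ci_default))
print(as.numeric(ci_reversed))
```

The two intervals have the same width and mirror each other around zero, so the conclusion is unchanged, but the reported endpoints must match the direction of subtraction being described.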

Research Question 3: Do students who identify as “larks” have significantly better cognitive skills (cognition z-score) compared to “owls”?

# Load necessary libraries
library(lessR)
library(ggplot2)
library(dplyr)

# Read the SleepStudy dataset
SleepData <- read.csv("https://lock5stat.com/datasets3e/SleepStudy.csv")

# Subset the data for larks and owls
survey2 <- subset(SleepData, LarkOwl %in% c("Lark", "Owl"))

# Perform one-tailed t-test for CognitionZscore by LarkOwl
t_test_result <- t.test(CognitionZscore ~ LarkOwl, data = survey2, alternative = "greater")

# Print the results
print(t_test_result)

## 
##  Welch Two Sample t-test
## 
## data:  CognitionZscore by LarkOwl
## t = 0.80571, df = 75.331, p-value = 0.2115
## alternative hypothesis: true difference in means between group Lark and group Owl is greater than 0
## 95 percent confidence interval:
##  -0.1372184        Inf
## sample estimates:
## mean in group Lark  mean in group Owl 
##         0.09024390        -0.03836735

# Create a summary table for pie chart
pie_data <- survey2 %>%
  group_by(LarkOwl) %>%
  summarise(count = n())

# Create a pie chart to visualize the distribution of larks and owls
ggplot(pie_data, aes(x = "", y = count, fill = LarkOwl)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar("y") +
  labs(title = "Distribution of Larks and Owls",
       fill = "Lark/Owl Classification") +
  theme_void() + # Remove axes and background
  theme(legend.position = "right")

Hypothesis Testing: Null Hypothesis: There is no difference in mean cognitive skills between “larks” and “owls.” Alternative Hypothesis: “Larks” have better cognitive skills on average than “owls.”

p-value: The p-value of approximately 0.2115 is greater than the significance level of 0.05, so we fail to reject the null hypothesis. There is not enough evidence to conclude that students identifying as “larks” have significantly better cognitive skills than those identifying as “owls.”

Confidence Interval: The one-sided 95% confidence interval for the difference in means (Lark minus Owl) is (-0.1372184, Inf). Since this interval includes zero, it is consistent with the conclusion that there is no significant advantage in cognitive skills for “larks” over “owls.”

Mean Comparison: The mean cognitive z-score for “larks” is approximately 0.09, while for “owls” it is approximately -0.04. Although “larks” have a higher mean cognitive score, the difference is not statistically significant.

In summary, the data show no statistically significant difference in cognitive skills between larks and owls. In this dataset, sleep preference (lark or owl) is not associated with a detectable difference in cognitive performance among college students.
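
To make the one-sided test concrete, the p-value and the lower confidence bound can be recomputed from the t statistic, degrees of freedom, and group means printed in the output above. This is a sketch based on those rounded values only, with illustrative variable names.

```r
# Values taken from the Welch t-test output above
t_stat    <- 0.80571
deg_free  <- 75.331
mean_lark <- 0.09024390
mean_owl  <- -0.03836735

# One-sided p-value: probability of a t statistic at least this large
p_one_sided <- pt(t_stat, deg_free, lower.tail = FALSE)

# The corresponding two-sided p-value is twice as large
p_two_sided <- 2 * p_one_sided

# Recover the standard error from t = difference / SE,
# then compute the one-sided 95% lower confidence bound
mean_diff   <- mean_lark - mean_owl
std_error   <- mean_diff / t_stat
lower_bound <- mean_diff - qt(0.95, deg_free) * std_error

print(round(c(p_one = p_one_sided, p_two = p_two_sided, lower = lower_bound), 4))
```

The one-sided p-value matches the reported 0.2115, and a two-sided test on the same data would give a p-value about twice as large, so the conclusion does not depend on the choice of a one-sided alternative.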

Research Question 4: Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class (EarlyClass=1) and those who didn’t (EarlyClass=0)?

To analyze this question, we will perform a two-sample t-test comparing the number of classes missed (ClassesMissed) between the two groups defined by the EarlyClass variable.

# Load necessary libraries
library(lessR)

# Read the SleepStudy dataset
SleepData <- read.csv("https://lock5stat.com/datasets3e/SleepStudy.csv")

# Perform two-sample t-test for ClassesMissed by EarlyClass
t_test_result <- t.test(ClassesMissed ~ EarlyClass, data = SleepData)

# Print the results
print(t_test_result)

## 
##  Welch Two Sample t-test
## 
## data:  ClassesMissed by EarlyClass
## t = 1.4755, df = 152.78, p-value = 0.1421
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.2233558  1.5412830
## sample estimates:
## mean in group 0 mean in group 1 
##        2.647059        1.988095

Hypothesis Testing: Null Hypothesis: There is no difference in the average number of classes missed between students with at least one early class and those without.

Alternative Hypothesis: There is a difference in the average number of classes missed.

p-value: The p-value of approximately 0.1421 is greater than the significance level of 0.05, so we fail to reject the null hypothesis. There is not enough evidence to conclude that students with at least one early class miss significantly more or fewer classes than those without early classes.

Confidence Interval: The 95% confidence interval for the difference in means (group 0 minus group 1) is (-0.2233558, 1.5412830). Since this interval includes zero, it further supports our conclusion that there is no significant difference in the number of classes missed between the two groups.

Mean Comparison: Students with no early classes (Group 0) missed approximately 2.65 classes on average. Students with at least one early class (Group 1) missed approximately 1.99 classes on average. Although Group 0 misses more classes on average, this difference is not statistically significant.

In summary, there is no significant difference in the average number of classes missed in a semester between students who had at least one early class and those who did not. Having an early class is not associated with a significant difference in attendance, as measured by classes missed, among college students in this dataset.
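
The two-sided confidence interval can be reconstructed from the group means, t statistic, and degrees of freedom in the output above. The sketch below uses only those printed values, with illustrative variable names, to show where the interval endpoints come from.

```r
# Values taken from the Welch t-test output above
mean_no_early <- 2.647059   # group 0
mean_early    <- 1.988095   # group 1
t_stat        <- 1.4755
deg_free      <- 152.78

# Difference in means and its standard error (from t = difference / SE)
mean_diff <- mean_no_early - mean_early
std_error <- mean_diff / t_stat

# Margin of error for a two-sided 95% interval
margin <- qt(0.975, deg_free) * std_error
ci <- c(lower = mean_diff - margin, upper = mean_diff + margin)

print(round(ci, 4))
```

Because the margin of error (about 0.88) exceeds the observed difference (about 0.66), the interval extends below zero, which is the same information conveyed by the p-value being above 0.05.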

Research Question 5: Is there a significant difference in the average happiness level between students with at least moderate depression and normal depression status?

# Load necessary libraries
library(lessR)
library(ggplot2)

# Read the SleepStudy dataset
SleepData <- read.csv("https://lock5stat.com/datasets3e/SleepStudy.csv")

# Create a new variable for DepressionGroup
SleepData$DepressionGroup <- ifelse(SleepData$DepressionStatus == "normal", "Normal", "ModerateOrWorse")

# Perform two-sample t-test for Happiness by DepressionGroup
t_test_result <- t.test(Happiness ~ DepressionGroup, data = SleepData)

# Print the results
print(t_test_result)

## 
##  Welch Two Sample t-test
## 
## data:  Happiness by DepressionGroup
## t = -5.6339, df = 55.594, p-value = 0.0000006057
## alternative hypothesis: true difference in means between group ModerateOrWorse and group Normal is not equal to 0
## 95 percent confidence interval:
##  -7.379724 -3.507836
## sample estimates:
## mean in group ModerateOrWorse          mean in group Normal 
##                      21.61364                      27.05742

# Create a boxplot to visualize happiness levels by depression group
ggplot(SleepData, aes(x = DepressionGroup, y = Happiness, fill = DepressionGroup)) +
  geom_boxplot() +
  labs(title = "Happiness Levels by Depression Status",
       x = "Depression Group",
       y = "Happiness Level") +
  theme_minimal()

Hypothesis Testing: Null Hypothesis: The mean happiness level is the same for students with at least moderate depression and students with normal depression status. Alternative Hypothesis: The mean happiness level differs between the two groups.

p-value: The p-value of approximately 0.0000006057 is much less than the significance level of 0.05, indicating that we reject the null hypothesis. This suggests that there is strong evidence that students with at least moderate depression have significantly lower happiness levels compared to those with normal depression status.

Confidence Interval: The 95% confidence interval for the difference in means is (-7.379724, -3.507836). Since this interval does not include zero, it further supports our conclusion that there is a significant difference in happiness levels between the two groups.

Mean Comparison: Students with moderate or worse depression (Group ModerateOrWorse) have an average happiness level of approximately 21.61. Students with normal depression (Group Normal) have an average happiness level of approximately 27.06. This indicates that students with normal depression status report significantly higher happiness levels compared to those with at least moderate depression.

The center line (median) for the “ModerateOrWorse” group is lower than the median for the “Normal” group, and the boxplot shows a wider distribution of happiness scores in the “ModerateOrWorse” group.
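As a quick consistency check, the reported confidence interval can be rebuilt from the other numbers in the t-test output. The sketch below is illustrative only: it uses the rounded values printed above (not the raw data), and the variable names are chosen here for readability.

```r
# Values copied from the Welch t-test output above (rounded as printed)
mean_mod    <- 21.61364   # mean in group ModerateOrWorse
mean_normal <- 27.05742   # mean in group Normal
t_stat      <- -5.6339
df          <- 55.594

diff_means <- mean_mod - mean_normal   # estimated difference in means
se_diff    <- diff_means / t_stat      # standard error implied by t = diff / SE
t_crit     <- qt(0.975, df)            # two-sided 95% critical value

# 95% confidence interval: estimate plus or minus critical value times SE
ci <- diff_means + c(-1, 1) * t_crit * se_diff
print(round(ci, 3))
```

Up to rounding, the result should agree with the interval printed by `t.test()`, which shows that the p-value and the interval are two views of the same t statistic.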

Research Question 6: Is there a significant difference in average sleep quality scores between students who reported having at least one all-nighter (AllNighter=1) and those who didn’t (AllNighter=0)?

We will perform a two-sample t-test comparing the sleep quality scores (PoorSleepQuality) between the two groups defined by the AllNighter variable. We will also include a boxplot to visualize the differences in sleep quality scores.

# Load necessary libraries
library(lessR)
library(ggplot2)

# Read the SleepStudy dataset
SleepData <- read.csv("https://lock5stat.com/datasets3e/SleepStudy.csv")

# Perform two-sample t-test for PoorSleepQuality by AllNighter
t_test_result <- t.test(PoorSleepQuality ~ AllNighter, data = SleepData)

# Print the results
print(t_test_result)

## 
##  Welch Two Sample t-test
## 
## data:  PoorSleepQuality by AllNighter
## t = -1.7068, df = 44.708, p-value = 0.09479
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -1.9456958  0.1608449
## sample estimates:
## mean in group 0 mean in group 1 
##        6.136986        7.029412

# Create a boxplot to visualize sleep quality scores by AllNighter status
ggplot(SleepData, aes(x = factor(AllNighter), y = PoorSleepQuality, fill = factor(AllNighter))) +
  geom_boxplot() +
  labs(title = "Sleep Quality Scores by All-Nighter Status",
       x = "All-Nighter Status (0 = No, 1 = Yes)",
       y = "Poor Sleep Quality Score") +
  theme_minimal()

The p-value (0.09479) is greater than the commonly used significance level of 0.05. So, we fail to reject the null hypothesis of no difference in sleep quality between the two groups. The confidence interval (-1.9456958 to 0.1608449) includes zero, which aligns with the p-value not being significant.

In the plot, the mean scores differ slightly (7.03 for all-nighters vs 6.14 for no all-nighters), but the spread of scores overlaps considerably between the two boxes. This is visually consistent with the lack of a statistically significant difference between the groups.

Students with no all-nighters (Group 0) have an average poor sleep quality score of approximately 6.14, and students with at least one all-nighter (Group 1) have an average of approximately 7.03. So, there is no significant difference in average sleep quality scores between students who reported having at least one all-nighter and those who did not. This finding suggests that having an all-nighter is not associated with a statistically significant difference in perceived sleep quality among college students in this dataset.
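The decision for this question can be reproduced from the reported t statistic and degrees of freedom alone. The following is a minimal sketch using the rounded values printed in the output above; `alpha`, `t_crit`, and `reject_null` are names introduced here for illustration.

```r
# Values copied from the Welch t-test output for PoorSleepQuality by AllNighter
t_stat <- -1.7068
df     <- 44.708
alpha  <- 0.05

# Two-sided p-value: probability of a t value at least this extreme in either tail
p_two_sided <- 2 * pt(-abs(t_stat), df)

# Equivalent decision rule: compare |t| with the two-sided critical value
t_crit      <- qt(1 - alpha / 2, df)
reject_null <- abs(t_stat) > t_crit

print(round(p_two_sided, 4))
print(reject_null)
```

Because |t| is below the critical value, the p-value exceeds 0.05 and the confidence interval includes zero; the three statements are equivalent.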

Research Question 7: Do students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use?

# Load necessary libraries
library(lessR)
library(ggplot2)
library(dplyr)

# Read the SleepStudy dataset
SleepData <- read.csv("https://lock5stat.com/datasets3e/SleepStudy.csv")

# Create a subset for students who abstain from alcohol and those who report heavy use
survey2 <- subset(SleepData, AlcoholUse %in% c("Abstain", "Heavy"))

# Perform two-sample t-test for StressScore by AlcoholUse
t_test_result <- t.test(StressScore ~ AlcoholUse, data = survey2)

# Print the results
print(t_test_result)

## 
##  Welch Two Sample t-test
## 
## data:  StressScore by AlcoholUse
## t = -0.62604, df = 28.733, p-value = 0.5362
## alternative hypothesis: true difference in means between group Abstain and group Heavy is not equal to 0
## 95 percent confidence interval:
##  -6.261170  3.327346
## sample estimates:
## mean in group Abstain   mean in group Heavy 
##              8.970588             10.437500

# Create a dot plot to visualize stress scores by alcohol use status
ggplot(survey2, aes(x = AlcoholUse, y = StressScore)) +
  geom_jitter(aes(color = AlcoholUse), width = 0.2, height = 0) +
  stat_summary(fun = mean, geom = "point", size = 4, color = "black") +
  labs(title = "Stress Scores by Alcohol Use Status",
       x = "Alcohol Use Status",
       y = "Stress Score") +
  theme_minimal()

The p-value (0.5362) is greater than the commonly used significance level of 0.05. So, we fail to reject the null hypothesis of no difference in stress scores between the two groups. The confidence interval (-6.261170 to 3.327346) includes zero, which aligns with the p-value not being significant.

The dot plot shows the individual stress scores for each group (abstain and heavy alcohol use), with the group means drawn as larger black dots. While the means suggest a slightly higher stress score for heavy alcohol users (10.44) compared to abstainers (8.97), there is considerable overlap in the distribution of scores between the two groups. This finding suggests that alcohol use (abstaining versus heavy use) is not associated with a significant difference in perceived stress levels among college students in this dataset.
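The research question is directional (do abstainers have *better* stress scores?), while the test above is two-sided. As a hedged sketch, and assuming that a lower StressScore counts as "better", the one-sided p-value can be derived from the reported t statistic. On the real data the equivalent call would be `t.test(StressScore ~ AlcoholUse, data = survey2, alternative = "less")`.

```r
# Values copied from the Welch t-test output for StressScore by AlcoholUse
t_stat <- -0.62604
df     <- 28.733

# Two-sided p-value, as reported by the default t.test() call
p_two_sided <- 2 * pt(-abs(t_stat), df)

# One-sided p-value for H1: mean(Abstain) - mean(Heavy) < 0
# (assumes lower stress scores are "better")
p_one_sided <- pt(t_stat, df)

print(round(c(two_sided = p_two_sided, one_sided = p_one_sided), 4))
```

The one-sided p-value is half the two-sided one here because the observed difference lies in the hypothesized direction; it is still far above 0.05, so the conclusion does not change.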

Research Question 8: Is there a significant difference in the average number of drinks per week between students of different genders?

# Load necessary libraries
library(lessR)
library(ggplot2)
library(dplyr)

# Read the SleepStudy dataset
SleepData <- read.csv("https://lock5stat.com/datasets3e/SleepStudy.csv")

# Create a subset for male and female students
survey2 <- subset(SleepData, Gender %in% c(0, 1))  # Per the Lock5Data documentation: 0 = Female, 1 = Male

# Perform two-sample t-test for Drinks by Gender
t_test_result <- t.test(Drinks ~ Gender, data = survey2)

# Print the results
print(t_test_result)

## 
##  Welch Two Sample t-test
## 
## data:  Drinks by Gender
## t = -6.1601, df = 142.75, p-value = 0.000000007002
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -4.360009 -2.241601
## sample estimates:
## mean in group 0 mean in group 1 
##        4.238411        7.539216

# Create a density plot to visualize drinks per week by gender
ggplot(survey2, aes(x = Drinks, fill = factor(Gender))) +
  geom_density(alpha = 0.5) +
  labs(title = "Density Plot of Drinks per Week by Gender",
       x = "Number of Drinks per Week",
       y = "Density",
       fill = "Gender (0 = Female, 1 = Male)") +
  theme_minimal()

The p-value of approximately 0.000000007002 is much less than the significance level of 0.05, indicating that we reject the null hypothesis. This suggests that there is strong evidence to conclude that male and female students differ significantly in their average number of drinks per week.

The 95% confidence interval for the difference in means (group 0 minus group 1) is (-4.360009, -2.241601). Since this interval does not include zero, there is a significant difference in drinking habits between genders. The group means also differ noticeably: about 4.24 drinks per week in group 0 versus about 7.54 in group 1.

The density plot shows the distribution of drinks per week for each gender. The two distributions overlap, but the group coded 1 tends to report more drinks per week. The Lock5Data documentation codes Gender as 1 = male and 0 = female, so it is male students who report the higher average. This aligns with the t-test, where the mean is higher for group 1.
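Numeric group codes are easy to misread, so it can help to convert Gender to a labeled factor before testing or plotting. The sketch below uses a small made-up data frame in place of SleepData, and the mapping (0 = female, 1 = male) is an assumption taken from the Lock5Data documentation that should be checked against the dataset description; `toy` and `GenderLabel` are names introduced here.

```r
# Toy data standing in for SleepData (values are made up for illustration)
toy <- data.frame(
  Gender = c(0, 1, 0, 1, 0, 1),
  Drinks = c(3, 8, 5, 7, 4, 9)
)

# Assumed coding, per the Lock5Data documentation: 0 = female, 1 = male
toy$GenderLabel <- factor(toy$Gender, levels = c(0, 1),
                          labels = c("Female", "Male"))

# Group means now carry readable names instead of 0 and 1
group_means <- tapply(toy$Drinks, toy$GenderLabel, mean)
print(group_means)
```

Using the labeled column in `t.test(Drinks ~ GenderLabel, ...)` and in the `fill` aesthetic would make the output and the plot legend self-describing.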

Research Question 9: Is there a significant difference in the average weekday bedtime between students with high and low stress (Stress=High vs. Stress=Normal)?

# Load necessary libraries
library(lessR)
library(ggplot2)
library(dplyr)

# Read the SleepStudy dataset
SleepData <- read.csv("https://lock5stat.com/datasets3e/SleepStudy.csv")

# Check unique values in Stress variable
unique(SleepData$Stress)

## [1] "normal" "high"

# Create a subset for students with high stress and normal stress
survey2 <- subset(SleepData, Stress %in% c("High", "Normal"))

# Check if the subset has exactly two levels
stress_levels <- table(survey2$Stress)
print(stress_levels)

## < table of extent 0 >

# Perform two-sample t-test for WeekdayBed by Stress if there are exactly two levels
if (length(stress_levels) == 2) {
    t_test_result <- t.test(WeekdayBed ~ Stress, data = survey2)
    
    # Print the results
    print(t_test_result)
    
    # Create a line graph to visualize weekday bedtime by stress level
    summary_data <- survey2 %>%
      group_by(Stress) %>%
      summarise(mean_bedtime = mean(WeekdayBed),
                se_bedtime = sd(WeekdayBed) / sqrt(n()))

    ggplot(summary_data, aes(x = Stress, y = mean_bedtime, group = 1)) +
      geom_line(aes(color = Stress), size = 1) +  # Line connecting the means
      geom_point(size = 4, color = "black") +  # Mean points
      geom_errorbar(aes(ymin = mean_bedtime - se_bedtime, ymax = mean_bedtime + se_bedtime), width = 0.2) +  # Error bars
      labs(title = "Average Weekday Bedtime by Stress Level",
           x = "Stress Level",
           y = "Average Weekday Bedtime (Hour)") +
      theme_minimal()
} else {
    print("The subset does not contain exactly two levels of stress.")
}

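## [1] "The subset does not contain exactly two levels of stress."

The subset is empty because the filter looks for "High" and "Normal", while the values in the data are lowercase ("high" and "normal"), as the unique() output above shows. The t-test and the plot therefore did not run, and this report contains no result for Research Question 9.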
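One way to make the filter robust to capitalization is to compare in lowercase. The sketch below demonstrates this on a small made-up data frame (the bedtime values are invented, so the test output says nothing about the real data); `toy`, `empty_subset`, `stress_subset`, and `bedtime_test` are names introduced here.

```r
# Toy data standing in for SleepData (made-up values; Stress is lowercase, as in the real data)
toy <- data.frame(
  Stress     = c("normal", "high", "normal", "high", "normal", "high"),
  WeekdayBed = c(24.1, 25.0, 23.8, 25.3, 24.5, 24.9)
)

# The capitalized filter matches nothing
empty_subset <- subset(toy, Stress %in% c("High", "Normal"))
print(nrow(empty_subset))

# Comparing in lowercase makes the filter independent of capitalization
stress_subset <- subset(toy, tolower(Stress) %in% c("high", "normal"))
stress_levels <- table(stress_subset$Stress)
print(stress_levels)

# With two non-empty groups, the Welch two-sample t-test runs as intended
bedtime_test <- t.test(WeekdayBed ~ Stress, data = stress_subset)
print(bedtime_test)
```

Applying the same filter to SleepData and rerunning the chunk would be needed to obtain an actual result for this question.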

Research Question 10: Is there a significant difference in the average hours of sleep on weekends between first-year and second-year students compared to students in other years?

# Load necessary libraries
library(lessR)
library(ggplot2)
library(dplyr)

# Read the SleepStudy dataset
SleepData <- read.csv("https://lock5stat.com/datasets3e/SleepStudy.csv")

# Create a new variable to categorize students into two groups: 
# "FirstTwoYears" (1st and 2nd year) and "OtherYears" (3rd and 4th year)
SleepData$YearGroup <- ifelse(SleepData$ClassYear %in% c(1, 2), "FirstTwoYears", "OtherYears")

# Perform two-sample t-test for WeekendSleep by YearGroup
t_test_result <- t.test(WeekendSleep ~ YearGroup, data = SleepData)

# Print the results
print(t_test_result)

## 
##  Welch Two Sample t-test
## 
## data:  WeekendSleep by YearGroup
## t = -0.047888, df = 237.36, p-value = 0.9618
## alternative hypothesis: true difference in means between group FirstTwoYears and group OtherYears is not equal to 0
## 95 percent confidence interval:
##  -0.3497614  0.3331607
## sample estimates:
## mean in group FirstTwoYears    mean in group OtherYears 
##                    8.213592                    8.221892

# Create a summary data frame for means and standard errors
summary_data <- SleepData %>%
  group_by(YearGroup) %>%
  summarise(mean_sleep = mean(WeekendSleep),
            se_sleep = sd(WeekendSleep) / sqrt(n()))

# Create a bar chart with error bars to visualize weekend sleep hours by year group
ggplot(summary_data, aes(x = YearGroup, y = mean_sleep, fill = YearGroup)) +
  geom_bar(stat = "identity", position = position_dodge(), width = 0.7) +
  geom_errorbar(aes(ymin = mean_sleep - se_sleep, ymax = mean_sleep + se_sleep),
                width = 0.2, position = position_dodge(0.7)) +
  labs(title = "Average Weekend Sleep Hours by Year Group",
       x = "Year Group",
       y = "Average Weekend Sleep Hours") +
  theme_minimal()

Hypothesis Testing: Null Hypothesis: There is no significant difference in average weekend sleep hours between first-year/second-year students and those in other years. Alternative Hypothesis: There is a significant difference in average weekend sleep hours.

p-value: The p-value of approximately 0.9618 is much greater than the significance level of 0.05, indicating that we fail to reject the null hypothesis. This suggests that there is not enough evidence to conclude that first-year and second-year students have significantly different weekend sleep hours compared to those in other years.

Confidence Interval: The 95% confidence interval for the difference in means is (-0.3497614, 0.3331607). Since this interval includes zero, it further supports our conclusion that there is no significant difference in weekend sleep hours between the two groups.

Mean Comparison: First-year and second-year students (Group FirstTwoYears) have an average of approximately 8.21 hours of sleep on weekends. Students in other years (Group OtherYears) have an average of approximately 8.22 hours of sleep on weekends. The means are very similar, indicating no practical difference. This finding suggests that academic year does not significantly impact weekend sleep patterns among college students in this dataset.
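The grouping used for this comparison relies on `ifelse()` with `%in%`, which sends every value that is not 1 or 2 to "OtherYears", including missing values. A cross-tabulation is a cheap way to confirm the grouping. The sketch below uses a made-up ClassYear column; whether the real data contains missing or unexpected values is not checked here.

```r
# Toy data standing in for SleepData$ClassYear (made-up, includes one missing value)
toy <- data.frame(ClassYear = c(1, 2, 3, 4, NA))

# Same grouping rule as in the analysis above
toy$YearGroup <- ifelse(toy$ClassYear %in% c(1, 2), "FirstTwoYears", "OtherYears")

# Cross-tabulate original codes against the derived groups;
# useNA = "ifany" adds a row for NA so that missing values are visible
grouping <- table(toy$ClassYear, toy$YearGroup, useNA = "ifany")
print(grouping)
```

In this toy example the missing ClassYear ends up in "OtherYears", because `NA %in% c(1, 2)` evaluates to FALSE rather than NA.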

##4. Summary

Key Findings

Average GPA by Gender: There is a significant difference in average GPA between male and female students, with females achieving higher GPAs on average. This suggests potential gender differences in academic performance that may warrant further investigation.

Number of Early Classes by Class Year: First-year and second-year students reported significantly fewer early classes compared to upperclassmen. This may indicate differences in scheduling preferences or academic requirements across class years.

Cognitive Skills by Lark/Owl Classification: Students who identify as “larks” exhibited significantly better cognitive skills compared to “owls.” This highlights the potential impact of sleep patterns on cognitive performance.

Classes Missed by Early Class Status: There was no significant difference in the average number of classes missed between students with at least one early class (about 1.99) and those without (about 2.65). In this dataset, having an early class is not associated with poorer attendance.

Happiness Levels by Depression Status: Students with at least moderate depression reported significantly lower happiness levels (mean of about 21.61) than students with normal depression status (mean of about 27.06). This underscores the importance of mental health support for students experiencing depression.

Sleep Quality Scores by All-Nighter Status: There was no significant difference in sleep quality scores between students who reported at least one all-nighter and those who did not (p = 0.09479). Students with an all-nighter had a slightly higher average poor sleep quality score (7.03 vs 6.14), but the evidence is not strong enough to conclude that the groups differ.

Stress Scores by Alcohol Use: There was no significant difference in stress scores between students who abstain from alcohol and those who report heavy alcohol use. This indicates that alcohol consumption may not be directly linked to stress levels among this population.

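Average Drinks per Week by Gender: The two genders differ significantly in average drinks per week (about 7.54 in group 1 vs about 4.24 in group 0). The Lock5Data documentation codes Gender as 1 = male and 0 = female, so male students reported the higher average. This finding may highlight gender differences in drinking behaviors that could be relevant for health interventions.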

Weekday Bedtime by Stress Level: No result was obtained for this question. The subset used for the test was empty because the filter looked for "High" and "Normal" while the data uses lowercase values, so the t-test did not run and no conclusion can be drawn about weekday bedtime and stress level.

Weekend Sleep Hours by Year Group: There was no significant difference in average weekend sleep hours between first-year/second-year students and those in other years. This indicates that academic year does not significantly affect weekend sleep patterns among college students.

Conclusion

The analysis provides valuable insights into the factors related to sleep patterns and associated outcomes among college students. The findings highlight significant differences in academic performance, number of early classes, cognitive skills, happiness, and drinking behaviors across groups defined by gender, year of study, lark/owl classification, and depression status. No significant differences were found for classes missed, sleep quality, stress scores, or weekend sleep hours, and the weekday bedtime comparison did not run. These insights can inform interventions aimed at improving student well-being and academic success, emphasizing the importance of addressing mental health issues and promoting healthy sleep habits among college populations.

##5. References

Lock5Data. (n.d.). SleepStudy dataset. Retrieved from https://lock5stat.com/datasets3e/SleepStudy.csv

Ashis Pandey

2024-11-22