Project Two, Exploring sleep patterns

Introduction

This report presents an analysis of sleep patterns among college students, utilizing the “SleepStudy” dataset obtained from https://www.lock5stat.com/datapage3e.html. The dataset comprises 253 observations on 27 variables, providing insights into sleep habits, psychological well-being, academic performance, and lifestyle choices of college students.

The purpose of this report is to answer 10 research questions about student health and behavior, using statistical hypothesis tests. These questions explore differences in GPA, depression, stress, alcohol use, sleep habits, and more.

Research Questions:

1. Is there a significant difference in the average GPA between male and female college students?
2. Is there a significant difference in the average number of early classes between the first two class years and other class years?
3. Do students who identify as “larks” have significantly better cognitive skills (cognition z-score) compared to “owls”?
4. Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class and those who didn’t?
5. Is there a significant difference in the average happiness level between students with at least moderate depression and those with normal depression status?
6. Is there a significant difference in average sleep quality scores between students who reported having at least one all-nighter and those who didn’t?
7. Do students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use?
8. Is there a significant difference in the average number of drinks per week between students of different genders?
9. Is there a significant difference in the average weekday bedtime between students with high and low stress?
10. Is there a significant difference in the average hours of sleep on weekends between first two year students and other students?

Data

The dataset includes 253 college students and 27 variables, covering a wide range of sleep- and wellness-related metrics such as GPA, stress, hours of sleep, bedtimes, alcohol use, and depression levels.

The data were collected through a comprehensive student survey. Variables include both categorical and numerical data, making the dataset suitable for hypothesis testing using t-tests and proportion tests.

# Load libraries
library(lessR)

# Load dataset
SleepStudy <- Read("https://www.lock5stat.com/datasets3e/SleepStudy.csv", quiet=TRUE)

# View structure
str(SleepStudy)

## 'data.frame':    253 obs. of  27 variables:
##  $ Gender          : int  0 0 0 0 0 1 1 0 0 0 ...
##  $ ClassYear       : int  4 4 4 1 4 4 2 2 1 4 ...
##  $ LarkOwl         : chr  "Neither" "Neither" "Owl" "Lark" ...
##  $ NumEarlyClass   : int  0 2 0 5 0 0 2 0 2 2 ...
##  $ EarlyClass      : int  0 1 0 1 0 0 1 0 1 1 ...
##  $ GPA             : num  3.6 3.24 2.97 3.76 3.2 3.5 3.35 3 4 2.9 ...
##  $ ClassesMissed   : int  0 0 12 0 4 0 2 0 0 0 ...
##  $ CognitionZscore : num  -0.26 1.39 0.38 1.39 1.22 -0.04 0.41 -0.59 1.03 0.72 ...
##  $ PoorSleepQuality: int  4 6 18 9 9 6 2 10 5 2 ...
##  $ DepressionScore : int  4 1 18 1 7 14 1 2 12 6 ...
##  $ AnxietyScore    : int  3 0 18 4 25 8 0 2 16 11 ...
##  $ StressScore     : int  8 3 9 6 14 28 1 3 20 31 ...
##  $ DepressionStatus: chr  "normal" "normal" "moderate" "normal" ...
##  $ AnxietyStatus   : chr  "normal" "normal" "severe" "normal" ...
##  $ Stress          : chr  "normal" "normal" "normal" "normal" ...
##  $ DASScore        : int  15 4 45 11 46 50 2 7 48 48 ...
##  $ Happiness       : int  28 25 17 32 15 22 25 29 29 30 ...
##  $ AlcoholUse      : chr  "Moderate" "Moderate" "Light" "Light" ...
##  $ Drinks          : int  10 6 3 2 4 0 6 3 3 6 ...
##  $ WeekdayBed      : num  25.8 25.7 27.4 23.5 25.9 ...
##  $ WeekdayRise     : num  8.7 8.2 6.55 7.17 8.67 8.95 8.48 9.07 8.75 8 ...
##  $ WeekdaySleep    : num  7.7 6.8 3 6.77 6.09 9.05 7.73 9.02 8.25 6.6 ...
##  $ WeekendBed      : num  25.8 26 28 27 23.8 ...
##  $ WeekendRise     : num  9.5 10 12.6 8 9.5 ...
##  $ WeekendSleep    : num  5.88 7.25 10.09 7.25 7 ...
##  $ AverageSleep    : num  7.18 6.93 5.02 6.9 6.35 9.04 7.52 9.01 8.54 6.68 ...
##  $ AllNighter      : int  0 0 0 0 0 0 1 0 0 0 ...

Analysis

Question 1: GPA vs. Gender

We want to test whether there is a statistically significant difference in the average GPA between male and female college students.

# Reload the data to work around earlier loading issues
library(lessR)
SleepStudy <- Read("https://www.lock5stat.com/datasets3e/SleepStudy.csv", quiet=TRUE)

# Convert Gender from 0/1 to "Male" and "Female"
SleepStudy$Gender <- factor(SleepStudy$Gender, levels = c(0, 1), labels = c("Male", "Female"))

# Run the two-sample t-test
ttest(GPA ~ Gender, data = SleepStudy)

## 
## Compare GPA across Gender with levels Male and Female 
## Grouping Variable:  Gender
## Response Variable:  GPA
## 
## 
## ------ Describe ------
## 
## GPA for Gender Male:  n.miss = 0,  n = 151,  mean = 3.325,  sd = 0.375
## GPA for Gender Female:  n.miss = 0,  n = 102,  mean = 3.124,  sd = 0.418
## 
## Mean Difference of GPA:  0.201
## 
## Weighted Average Standard Deviation:   0.393 
## 
## 
## ------ Assumptions ------
## 
## Note: These hypothesis tests can perform poorly, and the 
##       t-test is typically robust to violations of assumptions. 
##       Use as heuristic guides instead of interpreting literally. 
## 
## Null hypothesis, for each group, is a normal distribution of GPA.
## Group Male: Sample mean assumed normal because n > 30, so no test needed.
## Group Female: Sample mean assumed normal because n > 30, so no test needed.
## 
## Null hypothesis is equal variances of GPA, homogeneous.
## Variance Ratio test:  F = 0.174/0.141 = 1.240,  df = 101;150,  p-value = 0.232
## Levene's test, Brown-Forsythe:  t = -1.879,  df = 251,  p-value = 0.061
## 
## 
## ------ Infer ------
## 
## --- Assume equal population variances of GPA for each Gender 
## 
## t-cutoff for 95% range of variation: tcut =  1.969 
## Standard Error of Mean Difference: SE =  0.050 
## 
## Hypothesis Test of 0 Mean Diff:  t-value = 3.996,  df = 251,  p-value = 0.000
## 
## Margin of Error for 95% Confidence Level:  0.099
## 95% Confidence Interval for Mean Difference:  0.102 to 0.300
## 
## 
## --- Do not assume equal population variances of GPA for each Gender 
## 
## t-cutoff: tcut =  1.972 
## Standard Error of Mean Difference: SE =  0.051 
## 
## Hypothesis Test of 0 Mean Diff:  t = 3.914,  df = 200.902, p-value = 0.000
## 
## Margin of Error for 95% Confidence Level:  0.101
## 95% Confidence Interval for Mean Difference:  0.100 to 0.303
## 
## 
## ------ Effect Size ------
## 
## --- Assume equal population variances of GPA for each Gender 
## 
## Standardized Mean Difference of GPA, Cohen's d:  0.512
## 
## 
## ------ Practical Importance ------
## 
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
## 
## 
## ------ Graphics Smoothing Parameter ------
## 
## Density bandwidth for Gender Male: 0.154
## Density bandwidth for Gender Female: 0.189

# Confirm Gender got converted from 0 and 1 to male and female
table(SleepStudy$Gender)

## 
##   Male Female 
##    151    102

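A two-sample t-test was conducted to compare GPA between male and female college students. Under the labels assigned in the code above (0 = Male, 1 = Female), male students had a higher mean GPA (M = 3.325, SD = 0.375, n = 151) than female students (M = 3.124, SD = 0.418, n = 102), a mean difference of 0.201. The difference was statistically significant, t(251) = 3.996, p < 0.001, with a 95% confidence interval of [0.102, 0.300]. The effect size (Cohen’s d = 0.512) indicates a moderate difference between groups. One caution: the Lock5 data description codes Gender as 1 = male and 0 = female. If that coding applies here, the two labels are swapped, and it is the female students (n = 151) who have the higher mean GPA; the size of the difference, the test statistic, and the p-value are unaffected.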
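
As a rough check on the reported values, the pooled-variance t statistic and Cohen’s d can be recomputed from the summary statistics printed above. This is a minimal sketch in base R; the helper name pooled_t_from_summary is made up for illustration, and because the inputs are rounded to three decimals, the results should match the lessR output only approximately.

```r
# Hypothetical helper (not part of lessR): pooled two-sample t-test
# computed from group summary statistics.
pooled_t_from_summary <- function(m1, s1, n1, m2, s2, n2) {
  df <- n1 + n2 - 2
  # Pooled standard deviation, weighting each variance by its df
  sp <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / df)
  se <- sp * sqrt(1 / n1 + 1 / n2)
  t  <- (m1 - m2) / se
  p  <- 2 * pt(abs(t), df, lower.tail = FALSE)
  list(t = t, df = df, p = p, se = se, d = (m1 - m2) / sp)
}

# Summary statistics copied from the Describe section of the output
q1 <- pooled_t_from_summary(3.325, 0.375, 151, 3.124, 0.418, 102)
print(round(unlist(q1), 3))
```

The same helper can be reused for the other equal-variance tests in this report by substituting the group means, standard deviations, and sample sizes.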

Question 2: Early Classes vs. Class Year

We want to determine whether students in the first two years have a significantly different number of early classes compared to students in the later years.

We’ll use a two-sample t-test comparing the EarlyClass variable across class groups. EarlyClass is a 0/1 indicator of having at least one early class (the count of early classes is stored in NumEarlyClass), so each group mean below is the proportion of students with an early class.

# Recode ClassYear: years 1 and 2 = "First2Years", all other years = "UpperYears"
SleepStudy$ClassGroup <- ifelse(SleepStudy$ClassYear <= 2, "First2Years", "UpperYears")
SleepStudy$ClassGroup <- as.factor(SleepStudy$ClassGroup)

# Run the two-sample t-test on EarlyClass
ttest(EarlyClass ~ ClassGroup, data = SleepStudy)

## 
## Compare EarlyClass across ClassGroup with levels First2Years and UpperYears 
## Grouping Variable:  ClassGroup
## Response Variable:  EarlyClass
## 
## 
## ------ Describe ------
## 
## EarlyClass for ClassGroup First2Years:  n.miss = 0,  n = 142,  mean = 0.725,  sd = 0.448
## EarlyClass for ClassGroup UpperYears:  n.miss = 0,  n = 111,  mean = 0.586,  sd = 0.495
## 
## Mean Difference of EarlyClass:  0.140
## 
## Weighted Average Standard Deviation:   0.469 
## 
## 
## ------ Assumptions ------
## 
## Note: These hypothesis tests can perform poorly, and the 
##       t-test is typically robust to violations of assumptions. 
##       Use as heuristic guides instead of interpreting literally. 
## 
## Null hypothesis, for each group, is a normal distribution of EarlyClass.
## Group First2Years: Sample mean assumed normal because n > 30, so no test needed.
## Group UpperYears: Sample mean assumed normal because n > 30, so no test needed.
## 
## Null hypothesis is equal variances of EarlyClass, homogeneous.
## Variance Ratio test:  F = 0.245/0.201 = 1.221,  df = 110;141,  p-value = 0.264
## Levene's test, Brown-Forsythe:  t = -2.352,  df = 251,  p-value = 0.019
## 
## 
## ------ Infer ------
## 
## --- Assume equal population variances of EarlyClass for each ClassGroup 
## 
## t-cutoff for 95% range of variation: tcut =  1.969 
## Standard Error of Mean Difference: SE =  0.059 
## 
## Hypothesis Test of 0 Mean Diff:  t-value = 2.352,  df = 251,  p-value = 0.019
## 
## Margin of Error for 95% Confidence Level:  0.117
## 95% Confidence Interval for Mean Difference:  0.023 to 0.257
## 
## 
## --- Do not assume equal population variances of EarlyClass for each ClassGroup 
## 
## t-cutoff: tcut =  1.971 
## Standard Error of Mean Difference: SE =  0.060 
## 
## Hypothesis Test of 0 Mean Diff:  t = 2.323,  df = 224.255, p-value = 0.021
## 
## Margin of Error for 95% Confidence Level:  0.119
## 95% Confidence Interval for Mean Difference:  0.021 to 0.258
## 
## 
## ------ Effect Size ------
## 
## --- Assume equal population variances of EarlyClass for each ClassGroup 
## 
## Standardized Mean Difference of EarlyClass, Cohen's d:  0.298
## 
## 
## ------ Practical Importance ------
## 
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
## 
## 
## ------ Graphics Smoothing Parameter ------
## 
## Density bandwidth for ClassGroup First2Years: 0.189
## Density bandwidth for ClassGroup UpperYears: 0.220

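A two-sample t-test was used to compare early-class enrollment between students in their first two class years and upper-year students. Because EarlyClass is a 0/1 indicator, each mean is the proportion of students with at least one early class.

First 2 Years: M = 0.725, SD = 0.448
Upper Years: M = 0.586, SD = 0.495
Mean Difference = 0.140

The test showed a significant difference, t(251) = 2.352, p = 0.019, with a 95% confidence interval of [0.023, 0.257]. Levene’s test suggests unequal variances (p = 0.019), but the test that does not assume equal variances agrees: t(224.255) = 2.323, p = 0.021.

We conclude that students in the first two years of college are more likely than upper-year students to have at least one early class (about 73% versus 59%). The effect size (Cohen’s d = 0.298) indicates a small difference. Comparing the average number of early classes, as the research question is worded, would require repeating the test with NumEarlyClass.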
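
Since EarlyClass takes only the values 0 and 1 (see the str() output in the Data section), a two-proportion test is a natural cross-check on the t-test. The sketch below uses base R’s prop.test(). The counts 103 and 65 are not printed in the report; they are back-calculated from the group means (0.725 × 142 and 0.586 × 111), so treat them as an assumption.

```r
# Students with at least one early class, back-calculated from the
# reported group means (an assumption; not printed in the output)
early <- c(First2Years = 103, UpperYears = 65)
n     <- c(First2Years = 142, UpperYears = 111)

# Two-sample test of equal proportions, without continuity correction
# so that it parallels the pooled t-test
res <- prop.test(x = early, n = n, correct = FALSE)
print(res$estimate)  # sample proportion in each group
print(res$p.value)
```

With the default continuity correction (correct = TRUE) the p-value would be somewhat larger, so it is worth reporting which variant was used.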

Question 3: Cognition Scores – Larks vs. Owls

We want to determine whether students who identify as larks (morning people) have significantly better cognitive skills than those who identify as owls (night people). We’ll compare their Cognition z-scores using a two-sample t-test.

# Filter dataset to only "Lark" and "Owl" students
SleepStudy_LarkOwl <- subset(SleepStudy, LarkOwl %in% c("Lark", "Owl"))

ttest(CognitionZscore ~ LarkOwl, data = SleepStudy_LarkOwl)

## 
## Compare CognitionZscore across LarkOwl with levels Lark and Owl 
## Grouping Variable:  LarkOwl
## Response Variable:  CognitionZscore
## 
## 
## ------ Describe ------
## 
## CognitionZscore for LarkOwl Lark:  n.miss = 0,  n = 41,  mean = 0.090,  sd = 0.830
## CognitionZscore for LarkOwl Owl:  n.miss = 0,  n = 49,  mean = -0.038,  sd = 0.653
## 
## Mean Difference of CognitionZscore:  0.129
## 
## Weighted Average Standard Deviation:   0.738 
## 
## 
## ------ Assumptions ------
## 
## Note: These hypothesis tests can perform poorly, and the 
##       t-test is typically robust to violations of assumptions. 
##       Use as heuristic guides instead of interpreting literally. 
## 
## Null hypothesis, for each group, is a normal distribution of CognitionZscore.
## Group Lark: Sample mean assumed normal because n > 30, so no test needed.
## Group Owl: Sample mean assumed normal because n > 30, so no test needed.
## 
## Null hypothesis is equal variances of CognitionZscore, homogeneous.
## Variance Ratio test:  F = 0.688/0.426 = 1.615,  df = 40;48,  p-value = 0.112
## Levene's test, Brown-Forsythe:  t = 1.336,  df = 88,  p-value = 0.185
## 
## 
## ------ Infer ------
## 
## --- Assume equal population variances of CognitionZscore for each LarkOwl 
## 
## t-cutoff for 95% range of variation: tcut =  1.987 
## Standard Error of Mean Difference: SE =  0.156 
## 
## Hypothesis Test of 0 Mean Diff:  t-value = 0.823,  df = 88,  p-value = 0.413
## 
## Margin of Error for 95% Confidence Level:  0.311
## 95% Confidence Interval for Mean Difference:  -0.182 to 0.439
## 
## 
## --- Do not assume equal population variances of CognitionZscore for each LarkOwl 
## 
## t-cutoff: tcut =  1.992 
## Standard Error of Mean Difference: SE =  0.160 
## 
## Hypothesis Test of 0 Mean Diff:  t = 0.806,  df = 75.331, p-value = 0.423
## 
## Margin of Error for 95% Confidence Level:  0.318
## 95% Confidence Interval for Mean Difference:  -0.189 to 0.447
## 
## 
## ------ Effect Size ------
## 
## --- Assume equal population variances of CognitionZscore for each LarkOwl 
## 
## Standardized Mean Difference of CognitionZscore, Cohen's d:  0.174
## 
## 
## ------ Practical Importance ------
## 
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
## 
## 
## ------ Graphics Smoothing Parameter ------
## 
## Density bandwidth for LarkOwl Lark: 0.450
## Density bandwidth for LarkOwl Owl: 0.341

A two-sample t-test was conducted to compare cognition z-scores between students who identify as larks and those who identify as owls.

Larks: M = 0.090, SD = 0.830
Owls: M = -0.038, SD = 0.653
Mean Difference = 0.129

The result of the t-test was not statistically significant, t(88) = 0.823, p = 0.413, with a 95% confidence interval of [-0.182, 0.439].

We fail to reject the null hypothesis: the data provide no evidence that students who identify as larks differ in cognitive performance from those who identify as owls. The effect size (Cohen’s d = 0.174) indicates a small and likely negligible difference in practical terms.
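
The research question is directional (do larks score higher than owls?), while the p-value reported in the output above is two-sided. A minimal sketch of the corresponding one-sided p-value, computed in base R from the reported t statistic and degrees of freedom:

```r
# Values copied from the equal-variance section of the output
t_stat <- 0.823
df     <- 88

# Upper-tail probability: the alternative is mean(Lark) > mean(Owl)
p_one_sided <- pt(t_stat, df, lower.tail = FALSE)
# Two-sided p-value, for comparison with the reported 0.413
p_two_sided <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)

print(round(c(one_sided = p_one_sided, two_sided = p_two_sided), 3))
```

Either way the p-value is well above 0.05, so the conclusion does not change.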

Question 4: Classes Missed vs. Early Class

We want to determine whether students who had at least one early class (EarlyClass = 1) missed significantly more or fewer classes than those who had no early classes (EarlyClass = 0).

# Convert EarlyClass to a factor for grouping
SleepStudy$EarlyClass <- factor(SleepStudy$EarlyClass, levels = c(0, 1), labels = c("No Early Class", "Has Early Class"))
# Two-sample t-test on ClassesMissed by EarlyClass
ttest(ClassesMissed ~ EarlyClass, data = SleepStudy)

## 
## Compare ClassesMissed across EarlyClass with levels No Early Class and Has Early Class 
## Grouping Variable:  EarlyClass
## Response Variable:  ClassesMissed
## 
## 
## ------ Describe ------
## 
## ClassesMissed for EarlyClass No Early Class:  n.miss = 0,  n = 85,  mean = 2.647,  sd = 3.477
## ClassesMissed for EarlyClass Has Early Class:  n.miss = 0,  n = 168,  mean = 1.988,  sd = 3.101
## 
## Mean Difference of ClassesMissed:  0.659
## 
## Weighted Average Standard Deviation:   3.232 
## 
## 
## ------ Assumptions ------
## 
## Note: These hypothesis tests can perform poorly, and the 
##       t-test is typically robust to violations of assumptions. 
##       Use as heuristic guides instead of interpreting literally. 
## 
## Null hypothesis, for each group, is a normal distribution of ClassesMissed.
## Group No Early Class: Sample mean assumed normal because n > 30, so no test needed.
## Group Has Early Class: Sample mean assumed normal because n > 30, so no test needed.
## 
## Null hypothesis is equal variances of ClassesMissed, homogeneous.
## Variance Ratio test:  F = 12.088/9.617 = 1.257,  df = 84;167,  p-value = 0.214
## Levene's test, Brown-Forsythe:  t = 1.373,  df = 251,  p-value = 0.171
## 
## 
## ------ Infer ------
## 
## --- Assume equal population variances of ClassesMissed for each EarlyClass 
## 
## t-cutoff for 95% range of variation: tcut =  1.969 
## Standard Error of Mean Difference: SE =  0.430 
## 
## Hypothesis Test of 0 Mean Diff:  t-value = 1.532,  df = 251,  p-value = 0.127
## 
## Margin of Error for 95% Confidence Level:  0.847
## 95% Confidence Interval for Mean Difference:  -0.188 to 1.506
## 
## 
## --- Do not assume equal population variances of ClassesMissed for each EarlyClass 
## 
## t-cutoff: tcut =  1.976 
## Standard Error of Mean Difference: SE =  0.447 
## 
## Hypothesis Test of 0 Mean Diff:  t = 1.475,  df = 152.779, p-value = 0.142
## 
## Margin of Error for 95% Confidence Level:  0.882
## 95% Confidence Interval for Mean Difference:  -0.223 to 1.541
## 
## 
## ------ Effect Size ------
## 
## --- Assume equal population variances of ClassesMissed for each EarlyClass 
## 
## Standardized Mean Difference of ClassesMissed, Cohen's d:  0.204
## 
## 
## ------ Practical Importance ------
## 
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
## 
## 
## ------ Graphics Smoothing Parameter ------
## 
## Density bandwidth for EarlyClass No Early Class: 1.629
## Density bandwidth for EarlyClass Has Early Class: 1.044

# Visualize classes missed by early class status
Plot(x = EarlyClass, y = ClassesMissed, data = SleepStudy)

## 
## >>> Suggestions or enter: style(suggest=FALSE)
## Plot(EarlyClass, ClassesMissed, data=SleepStudy, means=FALSE)  # do not plot means
## Plot(EarlyClass, ClassesMissed, data=SleepStudy, stat="mean")  # only plot means
## ttest(ClassesMissed ~ EarlyClass)  # inferential analysis 
## 
## ClassesMissed 
##   - by levels of - 
## EarlyClass 
##  
##                    n   miss      mean        sd       min       mdn       max 
## No Early Class     85      0     2.647     3.477     0.000     2.000    20.000 
## Has Early Class   168      0     1.988     3.101     0.000     1.000    20.000 
##

A two-sample t-test was conducted to compare the number of classes missed between students who had at least one early class and those who didn’t.

No Early Class: M = 2.647, SD = 3.477
Has Early Class: M = 1.988, SD = 3.101
Mean difference = 0.659

The test was not statistically significant, t(251) = 1.532, p = 0.127, with a 95% confidence interval of [-0.188, 1.506]. The effect size (Cohen’s d = 0.204) suggests a small and likely negligible difference in practical terms.

We fail to reject the null hypothesis. The data do not show a significant difference in the number of classes missed between students with and without early classes, although students without an early class missed slightly more classes on average.
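
The 95% confidence interval reported above follows directly from the mean difference, its standard error, and the t cutoff. A short base R sketch using the rounded values from the output, so the endpoints should agree with it only to about three decimals:

```r
# Values copied from the equal-variance section of the output
mean_diff <- 0.659
se        <- 0.430
df        <- 251

t_cut  <- qt(0.975, df)   # two-sided 95% cutoff
margin <- t_cut * se      # margin of error
ci     <- c(lower = mean_diff - margin, upper = mean_diff + margin)

print(round(c(t_cut = t_cut, margin = margin, ci), 3))
```

Because the interval contains 0, it is consistent with the non-significant t-test.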

Question 5: Happiness vs. Depression Status

We want to determine whether there is a statistically significant difference in happiness levels between students with normal depression status and those with moderate or severe depression.

# Reload dataset
SleepStudy <- Read("https://www.lock5stat.com/datasets3e/SleepStudy.csv", quiet=TRUE)

# Recode: combine moderate + severe, keep normal
SleepStudy$DepGroup <- ifelse(SleepStudy$DepressionStatus == "normal", 
                              "Normal", 
                              "Mod/Sev")

# Convert DepGroup to a factor
SleepStudy$DepGroup <- factor(SleepStudy$DepGroup)

# Confirm new table
table(SleepStudy$DepGroup)

## 
## Mod/Sev  Normal 
##      44     209

ttest(Happiness ~ DepGroup, data = SleepStudy)

## 
## Compare Happiness across DepGroup with levels Normal and Mod/Sev 
## Grouping Variable:  DepGroup
## Response Variable:  Happiness
## 
## 
## ------ Describe ------
## 
## Happiness for DepGroup Normal:  n.miss = 0,  n = 209,  mean = 27.057,  sd = 4.885
## Happiness for DepGroup Mod/Sev:  n.miss = 0,  n = 44,  mean = 21.614,  sd = 6.005
## 
## Mean Difference of Happiness:  5.444
## 
## Weighted Average Standard Deviation:   5.094 
## 
## 
## ------ Assumptions ------
## 
## Note: These hypothesis tests can perform poorly, and the 
##       t-test is typically robust to violations of assumptions. 
##       Use as heuristic guides instead of interpreting literally. 
## 
## Null hypothesis, for each group, is a normal distribution of Happiness.
## Group Normal: Sample mean assumed normal because n > 30, so no test needed.
## Group Mod/Sev: Sample mean assumed normal because n > 30, so no test needed.
## 
## Null hypothesis is equal variances of Happiness, homogeneous.
## Variance Ratio test:  F = 36.057/23.862 = 1.511,  df = 43;208,  p-value = 0.062
## Levene's test, Brown-Forsythe:  t = -2.246,  df = 251,  p-value = 0.026
## 
## 
## ------ Infer ------
## 
## --- Assume equal population variances of Happiness for each DepGroup 
## 
## t-cutoff for 95% range of variation: tcut =  1.969 
## Standard Error of Mean Difference: SE =  0.845 
## 
## Hypothesis Test of 0 Mean Diff:  t-value = 6.443,  df = 251,  p-value = 0.000
## 
## Margin of Error for 95% Confidence Level:  1.664
## 95% Confidence Interval for Mean Difference:  3.780 to 7.108
## 
## 
## --- Do not assume equal population variances of Happiness for each DepGroup 
## 
## t-cutoff: tcut =  2.004 
## Standard Error of Mean Difference: SE =  0.966 
## 
## Hypothesis Test of 0 Mean Diff:  t = 5.634,  df = 55.594, p-value = 0.000
## 
## Margin of Error for 95% Confidence Level:  1.936
## 95% Confidence Interval for Mean Difference:  3.508 to 7.380
## 
## 
## ------ Effect Size ------
## 
## --- Assume equal population variances of Happiness for each DepGroup 
## 
## Standardized Mean Difference of Happiness, Cohen's d:  1.069
## 
## 
## ------ Practical Importance ------
## 
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
## 
## 
## ------ Graphics Smoothing Parameter ------
## 
## Density bandwidth for DepGroup Normal: 1.202
## Density bandwidth for DepGroup Mod/Sev: 3.211

A two-sample t-test was conducted to compare happiness levels between students with normal depression status and those with moderate or severe depression.

Normal: M = 27.057, SD = 4.885
Mod/Sev: M = 21.614, SD = 6.005
Mean difference = 5.444

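The test was highly statistically significant, t(251) = 6.443, p < 0.001, with a 95% confidence interval of [3.780, 7.108]. The effect size (Cohen’s d = 1.069) indicates a large and meaningful difference in happiness levels between groups. Because Levene’s test indicates unequal variances (p = 0.026) and the groups differ greatly in size (209 versus 44), the unequal-variance result is worth noting as well: t(55.594) = 5.634, p < 0.001, 95% confidence interval [3.508, 7.380]. It leads to the same conclusion.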

We reject the null hypothesis and conclude that students with moderate or severe depression report significantly lower happiness compared to those with normal depression status.
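
The fractional degrees of freedom in the unequal-variance section of the output come from the Welch-Satterthwaite approximation. The sketch below recomputes the standard error, t statistic, and degrees of freedom in base R from the rounded summary statistics, so small discrepancies from the printed values are expected. The helper name welch_from_summary is hypothetical.

```r
# Hypothetical helper (not part of lessR): Welch two-sample t-test
# computed from group summary statistics.
welch_from_summary <- function(m1, s1, n1, m2, s2, n2) {
  v1 <- s1^2 / n1   # squared standard error of each group mean
  v2 <- s2^2 / n2
  se <- sqrt(v1 + v2)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  t  <- (m1 - m2) / se
  list(t = t, df = df, se = se,
       p = 2 * pt(abs(t), df, lower.tail = FALSE))
}

# Normal group first, then Mod/Sev, as in the Describe section
q5 <- welch_from_summary(27.057, 4.885, 209, 21.614, 6.005, 44)
print(round(unlist(q5), 3))
```

The degrees of freedom fall well below the pooled value of 251 because the smaller group also has the larger variance.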

Question 6: Sleep Quality vs. All-Nighter

We want to determine whether students who pulled at least one all-nighter report significantly worse sleep quality compared to those who didn’t.

# Convert AllNighter to factor labels
SleepStudy$AllNighter <- factor(SleepStudy$AllNighter, levels = c(0, 1), labels = c("No", "Yes"))
# Run the two-sample t-test on sleep quality
ttest(PoorSleepQuality ~ AllNighter, data = SleepStudy)

## 
## Compare PoorSleepQuality across AllNighter with levels Yes and No 
## Grouping Variable:  AllNighter
## Response Variable:  PoorSleepQuality
## 
## 
## ------ Describe ------
## 
## PoorSleepQuality for AllNighter Yes:  n.miss = 0,  n = 34,  mean = 7.029,  sd = 2.823
## PoorSleepQuality for AllNighter No:  n.miss = 0,  n = 219,  mean = 6.137,  sd = 2.922
## 
## Mean Difference of PoorSleepQuality:  0.892
## 
## Weighted Average Standard Deviation:   2.910 
## 
## 
## ------ Assumptions ------
## 
## Note: These hypothesis tests can perform poorly, and the 
##       t-test is typically robust to violations of assumptions. 
##       Use as heuristic guides instead of interpreting literally. 
## 
## Null hypothesis, for each group, is a normal distribution of PoorSleepQuality.
## Group Yes: Sample mean assumed normal because n > 30, so no test needed.
## Group No: Sample mean assumed normal because n > 30, so no test needed.
## 
## Null hypothesis is equal variances of PoorSleepQuality, homogeneous.
## Variance Ratio test:  F = 8.541/7.969 = 1.072,  df = 218;33,  p-value = 0.846
## Levene's test, Brown-Forsythe:  t = 0.279,  df = 251,  p-value = 0.780
## 
## 
## ------ Infer ------
## 
## --- Assume equal population variances of PoorSleepQuality for each AllNighter 
## 
## t-cutoff for 95% range of variation: tcut =  1.969 
## Standard Error of Mean Difference: SE =  0.536 
## 
## Hypothesis Test of 0 Mean Diff:  t-value = 1.664,  df = 251,  p-value = 0.097
## 
## Margin of Error for 95% Confidence Level:  1.056
## 95% Confidence Interval for Mean Difference:  -0.164 to 1.949
## 
## 
## --- Do not assume equal population variances of PoorSleepQuality for each AllNighter 
## 
## t-cutoff: tcut =  2.014 
## Standard Error of Mean Difference: SE =  0.523 
## 
## Hypothesis Test of 0 Mean Diff:  t = 1.707,  df = 44.708, p-value = 0.095
## 
## Margin of Error for 95% Confidence Level:  1.053
## 95% Confidence Interval for Mean Difference:  -0.161 to 1.946
## 
## 
## ------ Effect Size ------
## 
## --- Assume equal population variances of PoorSleepQuality for each AllNighter 
## 
## Standardized Mean Difference of PoorSleepQuality, Cohen's d:  0.307
## 
## 
## ------ Practical Importance ------
## 
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
## 
## 
## ------ Graphics Smoothing Parameter ------
## 
## Density bandwidth for AllNighter Yes: 1.589
## Density bandwidth for AllNighter No: 0.936

# Visualize sleep quality by all-nighter status
Plot(x = AllNighter, y = PoorSleepQuality, data = SleepStudy)

## 
## >>> Suggestions or enter: style(suggest=FALSE)
## Plot(AllNighter, PoorSleepQuality, data=SleepStudy, means=FALSE)  # do not plot means
## Plot(AllNighter, PoorSleepQuality, data=SleepStudy, stat="mean")  # only plot means
## ttest(PoorSleepQuality ~ AllNighter)  # inferential analysis 
## 
## PoorSleepQuality 
##   - by levels of - 
## AllNighter 
##  
##         n   miss      mean        sd       min       mdn       max 
## No    219      0     6.137     2.922     1.000     6.000    18.000 
## Yes    34      0     7.029     2.823     2.000     7.000    12.000 
##

A two-sample t-test was conducted to compare sleep quality scores between students who had pulled at least one all-nighter and those who hadn’t.

No All-Nighter: M = 6.137, SD = 2.922
All-Nighter: M = 7.029, SD = 2.823
Mean difference = 0.892

The test was not statistically significant, t(251) = 1.664, p = 0.097, with a 95% confidence interval of [-0.164, 1.949]. The effect size (Cohen’s d = 0.307) suggests a small difference in sleep quality scores.

We fail to reject the null hypothesis. The data do not show a significant difference in reported sleep quality between students who pulled all-nighters and those who did not, although students who pulled all-nighters scored slightly higher on PoorSleepQuality, where higher scores indicate poorer sleep.

Question 7: Stress Score vs. Alcohol Use

We want to determine whether students who abstain from alcohol use report significantly better stress scores than students who report heavy alcohol use.

# Check unique values in AlcoholUse
table(SleepStudy$AlcoholUse)

## 
##  Abstain    Heavy    Light Moderate 
##       34       16       83      120

# Filter for only "Abstain" and "Heavy"
SleepStudy_Alcohol <- subset(SleepStudy, AlcoholUse %in% c("Abstain", "Heavy"))
SleepStudy_Alcohol$AlcoholUse <- factor(SleepStudy_Alcohol$AlcoholUse)

# Run the t-test on StressScore
ttest(StressScore ~ AlcoholUse, data = SleepStudy_Alcohol)

## 
## Compare StressScore across AlcoholUse with levels Heavy and Abstain 
## Grouping Variable:  AlcoholUse
## Response Variable:  StressScore
## 
## 
## ------ Describe ------
## 
## StressScore for AlcoholUse Heavy:  n.miss = 0,  n = 16,  mean = 10.438,  sd = 7.797
## StressScore for AlcoholUse Abstain:  n.miss = 0,  n = 34,  mean = 8.971,  sd = 7.582
## 
## Mean Difference of StressScore:  1.467
## 
## Weighted Average Standard Deviation:   7.650 
## 
## 
## ------ Assumptions ------
## 
## Note: These hypothesis tests can perform poorly, and the 
##       t-test is typically robust to violations of assumptions. 
##       Use as heuristic guides instead of interpreting literally. 
## 
## Null hypothesis, for each group, is a normal distribution of StressScore.
## Group Heavy  Shapiro-Wilk normality test:  W = 0.961,  p-value = 0.687
## Group Abstain: Sample mean assumed normal because n > 30, so no test needed.
## 
## Null hypothesis is equal variances of StressScore, homogeneous.
## Variance Ratio test:  F = 60.796/57.484 = 1.058,  df = 15;33,  p-value = 0.856
## Levene's test, Brown-Forsythe:  t = 0.347,  df = 48,  p-value = 0.730
## 
## 
## ------ Infer ------
## 
## --- Assume equal population variances of StressScore for each AlcoholUse 
## 
## t-cutoff for 95% range of variation: tcut =  2.011 
## Standard Error of Mean Difference: SE =  2.319 
## 
## Hypothesis Test of 0 Mean Diff:  t-value = 0.633,  df = 48,  p-value = 0.530
## 
## Margin of Error for 95% Confidence Level:  4.663
## 95% Confidence Interval for Mean Difference:  -3.196 to 6.130
## 
## 
## --- Do not assume equal population variances of StressScore for each AlcoholUse 
## 
## t-cutoff: tcut =  2.046 
## Standard Error of Mean Difference: SE =  2.343 
## 
## Hypothesis Test of 0 Mean Diff:  t = 0.626,  df = 28.733, p-value = 0.536
## 
## Margin of Error for 95% Confidence Level:  4.794
## 95% Confidence Interval for Mean Difference:  -3.327 to 6.261
## 
## 
## ------ Effect Size ------
## 
## --- Assume equal population variances of StressScore for each AlcoholUse 
## 
## Standardized Mean Difference of StressScore, Cohen's d:  0.192
## 
## 
## ------ Practical Importance ------
## 
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
## 
## 
## ------ Graphics Smoothing Parameter ------
## 
## Density bandwidth for AlcoholUse Heavy: 5.096
## Density bandwidth for AlcoholUse Abstain: 4.268

# Plot stress score by alcohol use group
Plot(x = AlcoholUse, y = StressScore, data = SleepStudy_Alcohol)

## 
## >>> Suggestions or enter: style(suggest=FALSE)
## Plot(AlcoholUse, StressScore, data=SleepStudy_Alcohol, means=FALSE)  # do not plot means
## Plot(AlcoholUse, StressScore, data=SleepStudy_Alcohol, stat="mean")  # only plot means
## ttest(StressScore ~ AlcoholUse)  # inferential analysis 
## 
## StressScore 
##   - by levels of - 
## AlcoholUse 
##  
##            n   miss       mean         sd        min        mdn        max 
## Abstain   34      0      8.971      7.582      0.000      7.000     28.000 
## Heavy     16      0     10.438      7.797      0.000     10.000     27.000 
##

A two-sample t-test was conducted to compare stress scores between students who abstain from alcohol and those who report heavy alcohol use.

Abstain: M = 8.971, SD = 7.582
Heavy Use: M = 10.438, SD = 7.797
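Mean difference (Heavy - Abstain) = 1.467

The test was not statistically significant, t(48) = 0.633, p = 0.530, with a 95% confidence interval of [-3.196, 6.130]. The pooled-variance result is reported because neither the variance ratio test (p = 0.856) nor Levene’s test (p = 0.730) indicated unequal variances. The effect size (Cohen’s d = 0.192) is just under the conventional threshold for a small effect.

We fail to reject the null hypothesis. The data provide no evidence that students who abstain from alcohol have lower stress scores than those who drink heavily: abstainers averaged 1.467 points lower, but a gap of this size is well within sampling variability, and the Heavy group is small (n = 16).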
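The reported test statistic and effect size can be reproduced from the group summaries alone. The following is a minimal base-R sketch, not the lessR implementation; the helper name pooled_t_summary is hypothetical, and the inputs are the rounded means, standard deviations, and sample sizes printed in the output, so the results should agree with the output only up to rounding.

```r
# Hypothetical helper: pooled (equal-variance) two-sample t-test
# computed from summary statistics rather than raw data.
pooled_t_summary <- function(m1, s1, n1, m2, s2, n2) {
  df   <- n1 + n2 - 2
  sp   <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / df)  # pooled SD
  se   <- sp * sqrt(1 / n1 + 1 / n2)                      # SE of the mean difference
  diff <- m1 - m2
  t    <- diff / se
  list(diff = diff, se = se, t = t, df = df,
       p = 2 * pt(-abs(t), df),  # two-sided p-value
       d = diff / sp)            # Cohen's d
}

# Heavy (group 1) vs. Abstain (group 2), values copied from the output
q7 <- pooled_t_summary(10.438, 7.797, 16, 8.971, 7.582, 34)
print(round(unlist(q7), 3))
```

Dividing the mean difference by the pooled standard deviation, rather than by the standard error, is what makes Cohen’s d independent of sample size.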

Question 8: Number of Drinks per Week vs. Gender

We want to determine whether there is a statistically significant difference in the average number of alcoholic drinks per week between male and female students.

# Convert Gender from 0/1 to labels (Lock5 codebook: 0 = female, 1 = male)
SleepStudy$Gender <- factor(SleepStudy$Gender, levels = c(0, 1), labels = c("Female", "Male"))
# Run the t-test comparing number of drinks per week
ttest(Drinks ~ Gender, data = SleepStudy)

## 
## Compare Drinks across Gender with levels Male and Female 
## Grouping Variable:  Gender
## Response Variable:  Drinks
## 
## 
## ------ Describe ------
## 
## Drinks for Gender Male:  n.miss = 0,  n = 102,  mean = 7.539,  sd = 4.929
## Drinks for Gender Female:  n.miss = 0,  n = 151,  mean = 4.238,  sd = 2.720
## 
## Mean Difference of Drinks:  3.301
## 
## Weighted Average Standard Deviation:   3.768 
## 
## 
## ------ Assumptions ------
## 
## Note: These hypothesis tests can perform poorly, and the 
##       t-test is typically robust to violations of assumptions. 
##       Use as heuristic guides instead of interpreting literally. 
## 
## Null hypothesis, for each group, is a normal distribution of Drinks.
## Group Male: Sample mean assumed normal because n > 30, so no test needed.
## Group Female: Sample mean assumed normal because n > 30, so no test needed.
## 
## Null hypothesis is equal variances of Drinks, homogeneous.
## Variance Ratio test:  F = 24.291/7.396 = 3.284,  df = 101;150,  p-value = 0.000
## Levene's test, Brown-Forsythe:  t = 5.471,  df = 251,  p-value = 0.000
## 
## 
## ------ Infer ------
## 
## --- Assume equal population variances of Drinks for each Gender 
## 
## t-cutoff for 95% range of variation: tcut =  1.969 
## Standard Error of Mean Difference: SE =  0.483 
## 
## Hypothesis Test of 0 Mean Diff:  t-value = 6.836,  df = 251,  p-value = 0.000
## 
## Margin of Error for 95% Confidence Level:  0.951
## 95% Confidence Interval for Mean Difference:  2.350 to 4.252
## 
## 
## --- Do not assume equal population variances of Drinks for each Gender 
## 
## t-cutoff: tcut =  1.977 
## Standard Error of Mean Difference: SE =  0.536 
## 
## Hypothesis Test of 0 Mean Diff:  t = 6.160,  df = 142.754, p-value = 0.000
## 
## Margin of Error for 95% Confidence Level:  1.059
## 95% Confidence Interval for Mean Difference:  2.242 to 4.360
## 
## 
## ------ Effect Size ------
## 
## --- Assume equal population variances of Drinks for each Gender 
## 
## Standardized Mean Difference of Drinks, Cohen's d:  0.876
## 
## 
## ------ Practical Importance ------
## 
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
## 
## 
## ------ Graphics Smoothing Parameter ------
## 
## Density bandwidth for Gender Male: 2.227
## Density bandwidth for Gender Female: 1.136

# Visualize number of drinks per week by gender
Plot(x = Gender, y = Drinks, data = SleepStudy)

## 
## >>> Suggestions or enter: style(suggest=FALSE)
## Plot(Gender, Drinks, data=SleepStudy, means=FALSE)  # do not plot means
## Plot(Gender, Drinks, data=SleepStudy, stat="mean")  # only plot means
## ttest(Drinks ~ Gender)  # inferential analysis 
## 
## Drinks 
##   - by levels of - 
## Gender 
##  
##            n   miss      mean        sd       min       mdn       max 
## Female   151      0     4.238     2.720     0.000     4.000    12.000 
## Male     102      0     7.539     4.929     0.000     8.000    24.000 
##

A two-sample t-test was conducted to compare the number of alcoholic drinks consumed per week between male and female students.

Male: M = 7.539, SD = 4.929
Female: M = 4.238, SD = 2.720
Mean difference (Male - Female) = 3.301

Both the variance ratio test and Levene’s test rejected equal variances (p < 0.001), so the unequal-variance (Welch) result is reported. The test was statistically significant, t(142.75) = 6.160, p < 0.001, with a 95% confidence interval of [2.242, 4.360]. The effect size (Cohen’s d = 0.876) indicates a large difference in weekly alcohol consumption between genders.

We reject the null hypothesis and conclude that male students reported significantly more alcoholic drinks per week than female students in this sample.
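The unequal-variance result quoted above, including its fractional degrees of freedom, can be checked from the summary statistics. This is a minimal base-R sketch of the Welch test with the Welch-Satterthwaite approximation, not the code lessR runs internally; the helper name welch_t_summary is hypothetical. Group 1 is the group with the larger mean (n = 102) and group 2 is the other group (n = 151), using the rounded values from the output.

```r
# Hypothetical helper: Welch (unequal-variance) t-test from summary statistics.
welch_t_summary <- function(m1, s1, n1, m2, s2, n2) {
  v1 <- s1^2 / n1                      # squared SE of group 1 mean
  v2 <- s2^2 / n2                      # squared SE of group 2 mean
  se <- sqrt(v1 + v2)                  # SE of the mean difference
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  t  <- (m1 - m2) / se
  list(t = t, df = df, se = se,
       p  = 2 * pt(-abs(t), df),                        # two-sided p-value
       ci = (m1 - m2) + c(-1, 1) * qt(0.975, df) * se)  # 95% CI
}

q8 <- welch_t_summary(7.539, 4.929, 102, 4.238, 2.720, 151)
print(round(c(t = q8$t, df = q8$df, se = q8$se), 3))
print(round(q8$ci, 3))
```

Because the group with fewer observations also has the larger variance, the Welch degrees of freedom fall well below the pooled value of 251, and the interval is slightly wider than the equal-variance one.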

Question 9: Weekday Bedtime vs. Stress Level

We want to determine whether students with high stress go to bed at a significantly different time on weekdays compared to those with normal stress.

# Check the levels of the Stress variable
table(SleepStudy$Stress)

## 
##   high normal 
##     56    197

# Filter for only "high" and "normal" stress categories
SleepStudy_Stress <- subset(SleepStudy, Stress %in% c("high", "normal"))
SleepStudy_Stress$Stress <- factor(SleepStudy_Stress$Stress)
# Run t-test comparing weekday bedtime
ttest(WeekdayBed ~ Stress, data = SleepStudy_Stress)

## 
## Compare WeekdayBed across Stress with levels normal and high 
## Grouping Variable:  Stress
## Response Variable:  WeekdayBed
## 
## 
## ------ Describe ------
## 
## WeekdayBed for Stress normal:  n.miss = 0,  n = 197,  mean = 24.885,  sd = 1.028
## WeekdayBed for Stress high:  n.miss = 0,  n = 56,  mean = 24.715,  sd = 1.053
## 
## Mean Difference of WeekdayBed:  0.170
## 
## Weighted Average Standard Deviation:   1.033 
## 
## 
## ------ Assumptions ------
## 
## Note: These hypothesis tests can perform poorly, and the 
##       t-test is typically robust to violations of assumptions. 
##       Use as heuristic guides instead of interpreting literally. 
## 
## Null hypothesis, for each group, is a normal distribution of WeekdayBed.
## Group normal: Sample mean assumed normal because n > 30, so no test needed.
## Group high: Sample mean assumed normal because n > 30, so no test needed.
## 
## Null hypothesis is equal variances of WeekdayBed, homogeneous.
## Variance Ratio test:  F = 1.108/1.056 = 1.049,  df = 55;196,  p-value = 0.792
## Levene's test, Brown-Forsythe:  t = -0.054,  df = 251,  p-value = 0.957
## 
## 
## ------ Infer ------
## 
## --- Assume equal population variances of WeekdayBed for each Stress 
## 
## t-cutoff for 95% range of variation: tcut =  1.969 
## Standard Error of Mean Difference: SE =  0.156 
## 
## Hypothesis Test of 0 Mean Diff:  t-value = 1.089,  df = 251,  p-value = 0.277
## 
## Margin of Error for 95% Confidence Level:  0.308
## 95% Confidence Interval for Mean Difference:  -0.138 to 0.479
## 
## 
## --- Do not assume equal population variances of WeekdayBed for each Stress 
## 
## t-cutoff: tcut =  1.988 
## Standard Error of Mean Difference: SE =  0.159 
## 
## Hypothesis Test of 0 Mean Diff:  t = 1.075,  df = 87.048, p-value = 0.286
## 
## Margin of Error for 95% Confidence Level:  0.315
## 95% Confidence Interval for Mean Difference:  -0.145 to 0.486
## 
## 
## ------ Effect Size ------
## 
## --- Assume equal population variances of WeekdayBed for each Stress 
## 
## Standardized Mean Difference of WeekdayBed, Cohen's d:  0.165
## 
## 
## ------ Practical Importance ------
## 
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
## 
## 
## ------ Graphics Smoothing Parameter ------
## 
## Density bandwidth for Stress normal: 0.407
## Density bandwidth for Stress high: 0.536

# Visualize bedtime by stress level
Plot(x = Stress, y = WeekdayBed, data = SleepStudy_Stress)

## 
## >>> Suggestions or enter: style(suggest=FALSE)
## Plot(Stress, WeekdayBed, data=SleepStudy_Stress, means=FALSE)  # do not plot means
## Plot(Stress, WeekdayBed, data=SleepStudy_Stress, stat="mean")  # only plot means
## ttest(WeekdayBed ~ Stress)  # inferential analysis 
## 
## WeekdayBed 
##   - by levels of - 
## Stress 
##  
##           n   miss       mean         sd        min        mdn        max 
## high      56      0     24.715      1.053     22.830     24.700     26.800 
## normal   197      0     24.885      1.028     21.800     24.900     29.100 
##

A two-sample t-test was conducted to compare weekday bedtimes between students with high and normal stress levels.

Normal Stress: M = 24.885, SD = 1.028
High Stress: M = 24.715, SD = 1.053
Mean difference (Normal - High) = 0.170 hours, or about 10 minutes

WeekdayBed appears to be recorded in decimal hours on a scale where 24 is midnight, so both group means fall shortly before 1 a.m. The test was not statistically significant, t(251) = 1.089, p = 0.277, with a 95% confidence interval of [-0.138, 0.479]. The effect size (Cohen’s d = 0.165) falls below the conventional threshold for a small effect, so any difference in bedtime is negligible.

We fail to reject the null hypothesis. The data provide no evidence of a difference in weekday bedtime between students with high and normal stress levels.
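Bedtime means near 24.8 are easier to read as clock times. The sketch below assumes that WeekdayBed is stored in decimal hours with 24 representing midnight (so 25 would be 1 a.m.); that coding is inferred from the range of values in the output rather than stated in this report, and the helper name decimal_to_clock is hypothetical.

```r
# Hypothetical helper: convert decimal hours (assumed 24 = midnight) to "HH:MM".
decimal_to_clock <- function(h) {
  # Minutes after midnight, wrapped so that values past 24 roll over
  total_min <- round((h %% 24) * 60) %% 1440
  sprintf("%02d:%02d",
          as.integer(total_min %/% 60),
          as.integer(total_min %% 60))
}

# Group mean bedtimes from the output (normal stress, high stress)
print(decimal_to_clock(c(24.885, 24.715)))  # "00:53" "00:43"

# Mean difference of 0.170 hours expressed in minutes
print(round(0.170 * 60))                    # 10
```

Working in total minutes before splitting into hours and minutes avoids producing an invalid time such as "00:60" when the minutes round up.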

Question 10: Weekend Sleep Hours vs. Class Year

We want to determine whether students in their first two years of college sleep more or less on weekends compared to upper-year students.

# Create class group variable
SleepStudy$ClassGroup <- ifelse(SleepStudy$ClassYear <= 2, "First2Years", "UpperYears")
SleepStudy$ClassGroup <- factor(SleepStudy$ClassGroup)
# Run t-test comparing weekend sleep hours
ttest(WeekendSleep ~ ClassGroup, data = SleepStudy)

## 
## Compare WeekendSleep across ClassGroup with levels UpperYears and First2Years 
## Grouping Variable:  ClassGroup
## Response Variable:  WeekendSleep
## 
## 
## ------ Describe ------
## 
## WeekendSleep for ClassGroup UpperYears:  n.miss = 0,  n = 111,  mean = 8.222,  sd = 1.363
## WeekendSleep for ClassGroup First2Years:  n.miss = 0,  n = 142,  mean = 8.214,  sd = 1.374
## 
## Mean Difference of WeekendSleep:  0.008
## 
## Weighted Average Standard Deviation:   1.369 
## 
## 
## ------ Assumptions ------
## 
## Note: These hypothesis tests can perform poorly, and the 
##       t-test is typically robust to violations of assumptions. 
##       Use as heuristic guides instead of interpreting literally. 
## 
## Null hypothesis, for each group, is a normal distribution of WeekendSleep.
## Group UpperYears: Sample mean assumed normal because n > 30, so no test needed.
## Group First2Years: Sample mean assumed normal because n > 30, so no test needed.
## 
## Null hypothesis is equal variances of WeekendSleep, homogeneous.
## Variance Ratio test:  F = 1.889/1.858 = 1.017,  df = 141;110,  p-value = 0.933
## Levene's test, Brown-Forsythe:  t = -0.497,  df = 251,  p-value = 0.619
## 
## 
## ------ Infer ------
## 
## --- Assume equal population variances of WeekendSleep for each ClassGroup 
## 
## t-cutoff for 95% range of variation: tcut =  1.969 
## Standard Error of Mean Difference: SE =  0.174 
## 
## Hypothesis Test of 0 Mean Diff:  t-value = 0.048,  df = 251,  p-value = 0.962
## 
## Margin of Error for 95% Confidence Level:  0.342
## 95% Confidence Interval for Mean Difference:  -0.333 to 0.350
## 
## 
## --- Do not assume equal population variances of WeekendSleep for each ClassGroup 
## 
## t-cutoff: tcut =  1.970 
## Standard Error of Mean Difference: SE =  0.173 
## 
## Hypothesis Test of 0 Mean Diff:  t = 0.048,  df = 237.363, p-value = 0.962
## 
## Margin of Error for 95% Confidence Level:  0.341
## 95% Confidence Interval for Mean Difference:  -0.333 to 0.350
## 
## 
## ------ Effect Size ------
## 
## --- Assume equal population variances of WeekendSleep for each ClassGroup 
## 
## Standardized Mean Difference of WeekendSleep, Cohen's d:  0.006
## 
## 
## ------ Practical Importance ------
## 
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
## 
## 
## ------ Graphics Smoothing Parameter ------
## 
## Density bandwidth for ClassGroup UpperYears: 0.606
## Density bandwidth for ClassGroup First2Years: 0.581

# Visualize weekend sleep by class group
Plot(x = ClassGroup, y = WeekendSleep, data = SleepStudy)

## 
## >>> Suggestions or enter: style(suggest=FALSE)
## Plot(ClassGroup, WeekendSleep, data=SleepStudy, means=FALSE)  # do not plot means
## Plot(ClassGroup, WeekendSleep, data=SleepStudy, stat="mean")  # only plot means
## ttest(WeekendSleep ~ ClassGroup)  # inferential analysis 
## 
## WeekendSleep 
##   - by levels of - 
## ClassGroup 
##  
##                 n   miss      mean        sd       min       mdn       max 
## First2Years   142      0     8.214     1.374     4.000     8.250    11.000 
## UpperYears    111      0     8.222     1.363     4.380     8.250    12.750 
##

A two-sample t-test was conducted to compare average weekend sleep hours between students in their first two years of college and upper-year students.

First2Years: M = 8.214, SD = 1.374
UpperYears: M = 8.222, SD = 1.363
Mean difference (UpperYears - First2Years) = 0.008 hours

The test was not statistically significant, t(251) = 0.048, p = 0.962, with a 95% confidence interval of [-0.333, 0.350]. The effect size (Cohen’s d = 0.006) indicates virtually no difference.

We fail to reject the null hypothesis. This suggests there is no meaningful difference in weekend sleep duration between early-year and upper-year students.
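A non-significant result is more informative when the width of the confidence interval is stated in practical units. The following sketch rebuilds the 95% interval from the mean difference, standard error, and degrees of freedom printed in the equal-variance section of the output, then converts it to minutes. The variable names are illustrative, and because the inputs are rounded, the bounds may differ from the output in the third decimal place.

```r
# Values copied from the equal-variance section of the output
mean_diff <- 0.008  # UpperYears minus First2Years, in hours
se_diff   <- 0.174  # standard error of the mean difference
df        <- 251    # n1 + n2 - 2 = 111 + 142 - 2

t_cut <- qt(0.975, df)                 # two-sided 95% cutoff
moe   <- t_cut * se_diff               # margin of error, in hours
ci_hours   <- mean_diff + c(-1, 1) * moe
ci_minutes <- ci_hours * 60            # same interval, in minutes

print(round(ci_hours, 3))
print(round(ci_minutes))
```

Under these inputs the interval spans roughly 20 minutes in either direction, so the data are consistent only with small differences in weekend sleep between the two class groups.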

Summary and Conclusion

This report analyzed ten research questions related to sleep habits, academic behavior, mental health, and lifestyle among college students using the SleepStudy dataset.

These results highlight a few meaningful patterns, such as the association between depression status and happiness and the gender difference in drinking behavior, while also cautioning against assuming large behavioral differences based on single lifestyle factors. Because the data are observational, these tests describe differences between groups and do not establish cause and effect. Overall, this analysis demonstrates the value of data-driven approaches to understanding student well-being.