This report presents an analysis of sleep patterns among college students, utilizing the “SleepStudy” dataset obtained from https://www.lock5stat.com/datapage3e.html. The dataset comprises 253 observations on 27 variables, providing insights into sleep habits, psychological well-being, academic performance, and lifestyle choices of college students.
The purpose of this report is to answer 10 research questions about student health and behavior, using statistical hypothesis tests. These questions explore differences in GPA, depression, stress, alcohol use, sleep habits, and more.
The variables cover a wide range of sleep- and wellness-related measures, such as GPA, stress, hours of sleep, bedtimes, alcohol use, and depression levels.
The data were collected through a student survey. The variables include both categorical and numerical measures, which makes the dataset suitable for hypothesis testing with t-tests and proportion tests.
# Load libraries
library(lessR)
# Load dataset
SleepStudy <- Read("https://www.lock5stat.com/datasets3e/SleepStudy.csv", quiet=TRUE)
# View structure
str(SleepStudy)
## 'data.frame': 253 obs. of 27 variables:
## $ Gender : int 0 0 0 0 0 1 1 0 0 0 ...
## $ ClassYear : int 4 4 4 1 4 4 2 2 1 4 ...
## $ LarkOwl : chr "Neither" "Neither" "Owl" "Lark" ...
## $ NumEarlyClass : int 0 2 0 5 0 0 2 0 2 2 ...
## $ EarlyClass : int 0 1 0 1 0 0 1 0 1 1 ...
## $ GPA : num 3.6 3.24 2.97 3.76 3.2 3.5 3.35 3 4 2.9 ...
## $ ClassesMissed : int 0 0 12 0 4 0 2 0 0 0 ...
## $ CognitionZscore : num -0.26 1.39 0.38 1.39 1.22 -0.04 0.41 -0.59 1.03 0.72 ...
## $ PoorSleepQuality: int 4 6 18 9 9 6 2 10 5 2 ...
## $ DepressionScore : int 4 1 18 1 7 14 1 2 12 6 ...
## $ AnxietyScore : int 3 0 18 4 25 8 0 2 16 11 ...
## $ StressScore : int 8 3 9 6 14 28 1 3 20 31 ...
## $ DepressionStatus: chr "normal" "normal" "moderate" "normal" ...
## $ AnxietyStatus : chr "normal" "normal" "severe" "normal" ...
## $ Stress : chr "normal" "normal" "normal" "normal" ...
## $ DASScore : int 15 4 45 11 46 50 2 7 48 48 ...
## $ Happiness : int 28 25 17 32 15 22 25 29 29 30 ...
## $ AlcoholUse : chr "Moderate" "Moderate" "Light" "Light" ...
## $ Drinks : int 10 6 3 2 4 0 6 3 3 6 ...
## $ WeekdayBed : num 25.8 25.7 27.4 23.5 25.9 ...
## $ WeekdayRise : num 8.7 8.2 6.55 7.17 8.67 8.95 8.48 9.07 8.75 8 ...
## $ WeekdaySleep : num 7.7 6.8 3 6.77 6.09 9.05 7.73 9.02 8.25 6.6 ...
## $ WeekendBed : num 25.8 26 28 27 23.8 ...
## $ WeekendRise : num 9.5 10 12.6 8 9.5 ...
## $ WeekendSleep : num 5.88 7.25 10.09 7.25 7 ...
## $ AverageSleep : num 7.18 6.93 5.02 6.9 6.35 9.04 7.52 9.01 8.54 6.68 ...
## $ AllNighter : int 0 0 0 0 0 0 1 0 0 0 ...
We want to test whether there is a statistically significant difference in the average GPA between male and female college students.
# Reload lessR and the data so this section starts from the original coding
library(lessR)
SleepStudy <- Read("https://www.lock5stat.com/datasets3e/SleepStudy.csv", quiet=TRUE)
# Convert Gender from 0/1 to "Male" and "Female" (assumes 0 = Male, 1 = Female; check this coding against the Lock5 codebook)
SleepStudy$Gender <- factor(SleepStudy$Gender, levels = c(0, 1), labels = c("Male", "Female"))
# Run the two-sample t-test
ttest(GPA ~ Gender, data = SleepStudy)
##
## Compare GPA across Gender with levels Male and Female
## Grouping Variable: Gender
## Response Variable: GPA
##
##
## ------ Describe ------
##
## GPA for Gender Male: n.miss = 0, n = 151, mean = 3.325, sd = 0.375
## GPA for Gender Female: n.miss = 0, n = 102, mean = 3.124, sd = 0.418
##
## Mean Difference of GPA: 0.201
##
## Weighted Average Standard Deviation: 0.393
##
##
## ------ Assumptions ------
##
## Note: These hypothesis tests can perform poorly, and the
## t-test is typically robust to violations of assumptions.
## Use as heuristic guides instead of interpreting literally.
##
## Null hypothesis, for each group, is a normal distribution of GPA.
## Group Male: Sample mean assumed normal because n > 30, so no test needed.
## Group Female: Sample mean assumed normal because n > 30, so no test needed.
##
## Null hypothesis is equal variances of GPA, homogeneous.
## Variance Ratio test: F = 0.174/0.141 = 1.240, df = 101;150, p-value = 0.232
## Levene's test, Brown-Forsythe: t = -1.879, df = 251, p-value = 0.061
##
##
## ------ Infer ------
##
## --- Assume equal population variances of GPA for each Gender
##
## t-cutoff for 95% range of variation: tcut = 1.969
## Standard Error of Mean Difference: SE = 0.050
##
## Hypothesis Test of 0 Mean Diff: t-value = 3.996, df = 251, p-value = 0.000
##
## Margin of Error for 95% Confidence Level: 0.099
## 95% Confidence Interval for Mean Difference: 0.102 to 0.300
##
##
## --- Do not assume equal population variances of GPA for each Gender
##
## t-cutoff: tcut = 1.972
## Standard Error of Mean Difference: SE = 0.051
##
## Hypothesis Test of 0 Mean Diff: t = 3.914, df = 200.902, p-value = 0.000
##
## Margin of Error for 95% Confidence Level: 0.101
## 95% Confidence Interval for Mean Difference: 0.100 to 0.303
##
##
## ------ Effect Size ------
##
## --- Assume equal population variances of GPA for each Gender
##
## Standardized Mean Difference of GPA, Cohen's d: 0.512
##
##
## ------ Practical Importance ------
##
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
##
##
## ------ Graphics Smoothing Parameter ------
##
## Density bandwidth for Gender Male: 0.154
## Density bandwidth for Gender Female: 0.189
# Confirm that Gender was converted from 0/1 to Male/Female labels
table(SleepStudy$Gender)
##
## Male Female
## 151 102
A two-sample t-test was conducted to compare GPA between male and female college students. Male students had a significantly higher mean GPA (M = 3.325, SD = 0.375) compared to female students (M = 3.124, SD = 0.418), with a mean difference of 0.201. The result was statistically significant, t(251) = 3.996, p < 0.001, with a 95% confidence interval of [0.102, 0.300]. The effect size (Cohen’s d = 0.512) suggests a moderate practical difference between groups.
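As a check on the output, the pooled (equal-variance) t statistic and Cohen’s d can be recomputed from the group summaries alone. This is a sketch in base R; the helper `pooled_t` is hypothetical, and because its inputs are the rounded means and SDs printed above, it matches the lessR output only to about two decimal places.

```r
# Pooled two-sample t statistic and Cohen's d from summary statistics
pooled_t <- function(n1, m1, s1, n2, m2, s2) {
  df <- n1 + n2 - 2
  sp <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / df)  # pooled (weighted average) SD
  se <- sp * sqrt(1 / n1 + 1 / n2)                       # SE of the mean difference
  mean_diff <- m1 - m2
  t_stat <- mean_diff / se
  list(mean_diff = mean_diff, sp = sp, se = se, t = t_stat, df = df,
       p = 2 * pt(abs(t_stat), df, lower.tail = FALSE),  # two-sided p-value
       d = mean_diff / sp)                                # Cohen's d
}

# Rounded GPA summaries for the Male (group 1) and Female (group 2) groups above
gpa <- pooled_t(n1 = 151, m1 = 3.325, s1 = 0.375,
                n2 = 102, m2 = 3.124, s2 = 0.418)
round(unlist(gpa), 3)
```

The pooled SD here is the "Weighted Average Standard Deviation" in the lessR output, and Cohen’s d is the mean difference divided by it.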
We want to determine whether students in their first two years differ from upper-year students in how often they have early classes. Because EarlyClass is a 0/1 indicator (1 = at least one early class), its group means are proportions, so the two-sample t-test on EarlyClass compares the share of students in each class group who have an early class.
# Group students: ClassYear 1-2 -> "First2Years", later years -> "UpperYears"
SleepStudy$ClassGroup <- ifelse(SleepStudy$ClassYear <= 2, "First2Years", "UpperYears")
SleepStudy$ClassGroup <- as.factor(SleepStudy$ClassGroup)
# Run the two-sample t-test on EarlyClass
ttest(EarlyClass ~ ClassGroup, data = SleepStudy)
##
## Compare EarlyClass across ClassGroup with levels First2Years and UpperYears
## Grouping Variable: ClassGroup
## Response Variable: EarlyClass
##
##
## ------ Describe ------
##
## EarlyClass for ClassGroup First2Years: n.miss = 0, n = 142, mean = 0.725, sd = 0.448
## EarlyClass for ClassGroup UpperYears: n.miss = 0, n = 111, mean = 0.586, sd = 0.495
##
## Mean Difference of EarlyClass: 0.140
##
## Weighted Average Standard Deviation: 0.469
##
##
## ------ Assumptions ------
##
## Note: These hypothesis tests can perform poorly, and the
## t-test is typically robust to violations of assumptions.
## Use as heuristic guides instead of interpreting literally.
##
## Null hypothesis, for each group, is a normal distribution of EarlyClass.
## Group First2Years: Sample mean assumed normal because n > 30, so no test needed.
## Group UpperYears: Sample mean assumed normal because n > 30, so no test needed.
##
## Null hypothesis is equal variances of EarlyClass, homogeneous.
## Variance Ratio test: F = 0.245/0.201 = 1.221, df = 110;141, p-value = 0.264
## Levene's test, Brown-Forsythe: t = -2.352, df = 251, p-value = 0.019
##
##
## ------ Infer ------
##
## --- Assume equal population variances of EarlyClass for each ClassGroup
##
## t-cutoff for 95% range of variation: tcut = 1.969
## Standard Error of Mean Difference: SE = 0.059
##
## Hypothesis Test of 0 Mean Diff: t-value = 2.352, df = 251, p-value = 0.019
##
## Margin of Error for 95% Confidence Level: 0.117
## 95% Confidence Interval for Mean Difference: 0.023 to 0.257
##
##
## --- Do not assume equal population variances of EarlyClass for each ClassGroup
##
## t-cutoff: tcut = 1.971
## Standard Error of Mean Difference: SE = 0.060
##
## Hypothesis Test of 0 Mean Diff: t = 2.323, df = 224.255, p-value = 0.021
##
## Margin of Error for 95% Confidence Level: 0.119
## 95% Confidence Interval for Mean Difference: 0.021 to 0.258
##
##
## ------ Effect Size ------
##
## --- Assume equal population variances of EarlyClass for each ClassGroup
##
## Standardized Mean Difference of EarlyClass, Cohen's d: 0.298
##
##
## ------ Practical Importance ------
##
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
##
##
## ------ Graphics Smoothing Parameter ------
##
## Density bandwidth for ClassGroup First2Years: 0.189
## Density bandwidth for ClassGroup UpperYears: 0.220
A two-sample t-test was used to compare the proportion of students with at least one early class between students in their first two class years and upper-year students.
The test showed a significant difference, t(251) = 2.352, p = 0.019, with a 95% confidence interval of [0.023, 0.257] for the difference in proportions.
We conclude that students in their first two years of college are more likely to have at least one early class than upper-year students (72.5% vs. 58.6%). The effect size (Cohen’s d = 0.298) suggests a small to moderate difference.
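Because EarlyClass is a 0/1 indicator, the same comparison can be framed as a two-proportion test, one of the methods named in the introduction. The counts below (103 of 142 and 65 of 111) are an assumption: they are reconstructed by multiplying the reported group means by the group sizes and rounding. They sum to 168, which matches the number of students with an early class reported later in this report.

```r
# Students with at least one early class, by class group (reconstructed counts)
has_early <- c(First2Years = 103, UpperYears = 65)
group_n   <- c(First2Years = 142, UpperYears = 111)

# Two-proportion z-test; prop.test reports X-squared = z^2.
# correct = FALSE turns off the continuity correction so z matches the textbook formula.
res <- prop.test(has_early, group_n, correct = FALSE)

res$estimate          # sample proportion in each group
sqrt(res$statistic)   # z statistic
res$p.value
res$conf.int          # 95% CI for the difference in proportions
```

The z-test gives essentially the same answer as the t-test (p of about 0.02), and its interval for the difference in proportions closely matches the t-based interval above.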
We want to determine whether students who identify as larks (morning people) have significantly better cognitive skills than those who identify as owls (night people). We’ll compare their cognition z-scores using a two-sample t-test.
# Filter dataset to only "Lark" and "Owl" students
SleepStudy_LarkOwl <- subset(SleepStudy, LarkOwl %in% c("Lark", "Owl"))
ttest(CognitionZscore ~ LarkOwl, data = SleepStudy_LarkOwl)
##
## Compare CognitionZscore across LarkOwl with levels Lark and Owl
## Grouping Variable: LarkOwl
## Response Variable: CognitionZscore
##
##
## ------ Describe ------
##
## CognitionZscore for LarkOwl Lark: n.miss = 0, n = 41, mean = 0.090, sd = 0.830
## CognitionZscore for LarkOwl Owl: n.miss = 0, n = 49, mean = -0.038, sd = 0.653
##
## Mean Difference of CognitionZscore: 0.129
##
## Weighted Average Standard Deviation: 0.738
##
##
## ------ Assumptions ------
##
## Note: These hypothesis tests can perform poorly, and the
## t-test is typically robust to violations of assumptions.
## Use as heuristic guides instead of interpreting literally.
##
## Null hypothesis, for each group, is a normal distribution of CognitionZscore.
## Group Lark: Sample mean assumed normal because n > 30, so no test needed.
## Group Owl: Sample mean assumed normal because n > 30, so no test needed.
##
## Null hypothesis is equal variances of CognitionZscore, homogeneous.
## Variance Ratio test: F = 0.688/0.426 = 1.615, df = 40;48, p-value = 0.112
## Levene's test, Brown-Forsythe: t = 1.336, df = 88, p-value = 0.185
##
##
## ------ Infer ------
##
## --- Assume equal population variances of CognitionZscore for each LarkOwl
##
## t-cutoff for 95% range of variation: tcut = 1.987
## Standard Error of Mean Difference: SE = 0.156
##
## Hypothesis Test of 0 Mean Diff: t-value = 0.823, df = 88, p-value = 0.413
##
## Margin of Error for 95% Confidence Level: 0.311
## 95% Confidence Interval for Mean Difference: -0.182 to 0.439
##
##
## --- Do not assume equal population variances of CognitionZscore for each LarkOwl
##
## t-cutoff: tcut = 1.992
## Standard Error of Mean Difference: SE = 0.160
##
## Hypothesis Test of 0 Mean Diff: t = 0.806, df = 75.331, p-value = 0.423
##
## Margin of Error for 95% Confidence Level: 0.318
## 95% Confidence Interval for Mean Difference: -0.189 to 0.447
##
##
## ------ Effect Size ------
##
## --- Assume equal population variances of CognitionZscore for each LarkOwl
##
## Standardized Mean Difference of CognitionZscore, Cohen's d: 0.174
##
##
## ------ Practical Importance ------
##
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
##
##
## ------ Graphics Smoothing Parameter ------
##
## Density bandwidth for LarkOwl Lark: 0.450
## Density bandwidth for LarkOwl Owl: 0.341
A two-sample t-test was conducted to compare cognition z-scores between students who identify as larks and those who identify as owls.
The result of the t-test was not statistically significant, t(88) = 0.823, p = 0.413, with a 95% confidence interval of [-0.182, 0.439].
We fail to reject the null hypothesis: the data provide no evidence of a difference in cognitive performance between students who identify as larks and those who identify as owls. The effect size (Cohen’s d = 0.174) is small, so any difference is likely negligible in practical terms.
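lessR also prints a Welch (unequal-variance) version of each test, with fractional degrees of freedom. The sketch below uses a hypothetical helper `welch` and the rounded Lark and Owl summaries from the output above to show where df = 75.331 comes from (the Welch-Satterthwaite approximation).

```r
# Welch standard error and Welch-Satterthwaite degrees of freedom
welch <- function(n1, m1, s1, n2, m2, s2) {
  v1 <- s1^2 / n1   # squared standard error of the group 1 mean
  v2 <- s2^2 / n2   # squared standard error of the group 2 mean
  se <- sqrt(v1 + v2)
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  t_stat <- (m1 - m2) / se
  list(se = se, df = df, t = t_stat,
       p = 2 * pt(abs(t_stat), df, lower.tail = FALSE))
}

# Rounded CognitionZscore summaries: Lark (group 1) and Owl (group 2)
lark_owl <- welch(n1 = 41, m1 = 0.090, s1 = 0.830,
                  n2 = 49, m2 = -0.038, s2 = 0.653)
round(unlist(lark_owl), 3)
```

Here the Welch and pooled results barely differ, which fits the variance-ratio test (F = 1.615, p = 0.112) showing no strong sign of unequal variances.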
We want to determine whether students who had at least one early class (EarlyClass = 1) missed significantly more or fewer classes than those who had no early classes (EarlyClass = 0).
# Convert EarlyClass to a factor for grouping
SleepStudy$EarlyClass <- factor(SleepStudy$EarlyClass, levels = c(0, 1), labels = c("No Early Class", "Has Early Class"))
# Two-sample t-test on ClassesMissed by EarlyClass
ttest(ClassesMissed ~ EarlyClass, data = SleepStudy)
##
## Compare ClassesMissed across EarlyClass with levels No Early Class and Has Early Class
## Grouping Variable: EarlyClass
## Response Variable: ClassesMissed
##
##
## ------ Describe ------
##
## ClassesMissed for EarlyClass No Early Class: n.miss = 0, n = 85, mean = 2.647, sd = 3.477
## ClassesMissed for EarlyClass Has Early Class: n.miss = 0, n = 168, mean = 1.988, sd = 3.101
##
## Mean Difference of ClassesMissed: 0.659
##
## Weighted Average Standard Deviation: 3.232
##
##
## ------ Assumptions ------
##
## Note: These hypothesis tests can perform poorly, and the
## t-test is typically robust to violations of assumptions.
## Use as heuristic guides instead of interpreting literally.
##
## Null hypothesis, for each group, is a normal distribution of ClassesMissed.
## Group No Early Class: Sample mean assumed normal because n > 30, so no test needed.
## Group Has Early Class: Sample mean assumed normal because n > 30, so no test needed.
##
## Null hypothesis is equal variances of ClassesMissed, homogeneous.
## Variance Ratio test: F = 12.088/9.617 = 1.257, df = 84;167, p-value = 0.214
## Levene's test, Brown-Forsythe: t = 1.373, df = 251, p-value = 0.171
##
##
## ------ Infer ------
##
## --- Assume equal population variances of ClassesMissed for each EarlyClass
##
## t-cutoff for 95% range of variation: tcut = 1.969
## Standard Error of Mean Difference: SE = 0.430
##
## Hypothesis Test of 0 Mean Diff: t-value = 1.532, df = 251, p-value = 0.127
##
## Margin of Error for 95% Confidence Level: 0.847
## 95% Confidence Interval for Mean Difference: -0.188 to 1.506
##
##
## --- Do not assume equal population variances of ClassesMissed for each EarlyClass
##
## t-cutoff: tcut = 1.976
## Standard Error of Mean Difference: SE = 0.447
##
## Hypothesis Test of 0 Mean Diff: t = 1.475, df = 152.779, p-value = 0.142
##
## Margin of Error for 95% Confidence Level: 0.882
## 95% Confidence Interval for Mean Difference: -0.223 to 1.541
##
##
## ------ Effect Size ------
##
## --- Assume equal population variances of ClassesMissed for each EarlyClass
##
## Standardized Mean Difference of ClassesMissed, Cohen's d: 0.204
##
##
## ------ Practical Importance ------
##
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
##
##
## ------ Graphics Smoothing Parameter ------
##
## Density bandwidth for EarlyClass No Early Class: 1.629
## Density bandwidth for EarlyClass Has Early Class: 1.044
# Visualize classes missed by early class status
Plot(x = EarlyClass, y = ClassesMissed, data = SleepStudy)
##
## >>> Suggestions or enter: style(suggest=FALSE)
## Plot(EarlyClass, ClassesMissed, data=SleepStudy, means=FALSE) # do not plot means
## Plot(EarlyClass, ClassesMissed, data=SleepStudy, stat="mean") # only plot means
## ttest(ClassesMissed ~ EarlyClass) # inferential analysis
##
## ClassesMissed
## - by levels of -
## EarlyClass
##
## n miss mean sd min mdn max
## No Early Class 85 0 2.647 3.477 0.000 2.000 20.000
## Has Early Class 168 0 1.988 3.101 0.000 1.000 20.000
##
A two-sample t-test was conducted to compare the number of classes missed between students who had at least one early class and those who didn’t.
The test was not statistically significant, t(251) = 1.532, p = 0.127, with a 95% confidence interval of [-0.188, 1.506]. The effect size (Cohen’s d = 0.204) suggests a small and likely negligible difference in practical terms.
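We fail to reject the null hypothesis: there is no evidence of a difference in the number of classes missed between students with and without early classes. Because ClassesMissed is right-skewed (medians of 2 and 1, with a maximum of 20 in both groups), the group means are sensitive to a few students who missed many classes.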
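The 95% confidence interval is the mean difference plus or minus the t cutoff times the standard error. Recomputing it from the printed (rounded) pooled-variance values shows why a two-sided test that is not significant at the 0.05 level goes with an interval that contains zero.

```r
# Values taken from the pooled-variance ClassesMissed output above
mean_diff <- 0.659   # No Early Class minus Has Early Class
se        <- 0.430   # standard error of the mean difference
df        <- 251

tcut <- qt(0.975, df)                 # two-sided 95% cutoff
moe  <- tcut * se                     # margin of error
ci   <- mean_diff + c(-1, 1) * moe    # lower and upper limits
round(c(tcut = tcut, moe = moe, lower = ci[1], upper = ci[2]), 3)

# Zero lies inside the interval, consistent with p = 0.127 > 0.05
ci[1] < 0 && ci[2] > 0
```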
We want to determine whether there is a statistically significant difference in happiness levels between students with normal depression status and those with moderate or severe depression.
# Reload the dataset to restore the original variable coding
SleepStudy <- Read("https://www.lock5stat.com/datasets3e/SleepStudy.csv", quiet=TRUE)
# Recode: combine moderate + severe, keep normal
SleepStudy$DepGroup <- ifelse(SleepStudy$DepressionStatus == "normal",
"Normal",
"Mod/Sev")
# Convert DepGroup to a factor
SleepStudy$DepGroup <- factor(SleepStudy$DepGroup)
# Confirm new table
table(SleepStudy$DepGroup)
##
## Mod/Sev Normal
## 44 209
ttest(Happiness ~ DepGroup, data = SleepStudy)
##
## Compare Happiness across DepGroup with levels Normal and Mod/Sev
## Grouping Variable: DepGroup
## Response Variable: Happiness
##
##
## ------ Describe ------
##
## Happiness for DepGroup Normal: n.miss = 0, n = 209, mean = 27.057, sd = 4.885
## Happiness for DepGroup Mod/Sev: n.miss = 0, n = 44, mean = 21.614, sd = 6.005
##
## Mean Difference of Happiness: 5.444
##
## Weighted Average Standard Deviation: 5.094
##
##
## ------ Assumptions ------
##
## Note: These hypothesis tests can perform poorly, and the
## t-test is typically robust to violations of assumptions.
## Use as heuristic guides instead of interpreting literally.
##
## Null hypothesis, for each group, is a normal distribution of Happiness.
## Group Normal: Sample mean assumed normal because n > 30, so no test needed.
## Group Mod/Sev: Sample mean assumed normal because n > 30, so no test needed.
##
## Null hypothesis is equal variances of Happiness, homogeneous.
## Variance Ratio test: F = 36.057/23.862 = 1.511, df = 43;208, p-value = 0.062
## Levene's test, Brown-Forsythe: t = -2.246, df = 251, p-value = 0.026
##
##
## ------ Infer ------
##
## --- Assume equal population variances of Happiness for each DepGroup
##
## t-cutoff for 95% range of variation: tcut = 1.969
## Standard Error of Mean Difference: SE = 0.845
##
## Hypothesis Test of 0 Mean Diff: t-value = 6.443, df = 251, p-value = 0.000
##
## Margin of Error for 95% Confidence Level: 1.664
## 95% Confidence Interval for Mean Difference: 3.780 to 7.108
##
##
## --- Do not assume equal population variances of Happiness for each DepGroup
##
## t-cutoff: tcut = 2.004
## Standard Error of Mean Difference: SE = 0.966
##
## Hypothesis Test of 0 Mean Diff: t = 5.634, df = 55.594, p-value = 0.000
##
## Margin of Error for 95% Confidence Level: 1.936
## 95% Confidence Interval for Mean Difference: 3.508 to 7.380
##
##
## ------ Effect Size ------
##
## --- Assume equal population variances of Happiness for each DepGroup
##
## Standardized Mean Difference of Happiness, Cohen's d: 1.069
##
##
## ------ Practical Importance ------
##
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
##
##
## ------ Graphics Smoothing Parameter ------
##
## Density bandwidth for DepGroup Normal: 1.202
## Density bandwidth for DepGroup Mod/Sev: 3.211
A two-sample t-test was conducted to compare happiness levels between students with normal depression status and those with moderate or severe depression.
The test was highly statistically significant, t(251) = 6.443, p < 0.001, with a 95% confidence interval of [3.780, 7.108]. Because Levene’s test suggested unequal variances (p = 0.026), the Welch result is also worth reporting; it leads to the same conclusion, t(55.59) = 5.634, p < 0.001, 95% CI [3.508, 7.380]. The effect size (Cohen’s d = 1.069) indicates a large and meaningful difference in happiness levels between groups.
We reject the null hypothesis and conclude that students with moderate or severe depression report significantly lower happiness compared to those with normal depression status.
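The pooled and Welch standard errors printed above differ (0.845 vs. 0.966) because the larger standard deviation belongs to the much smaller Mod/Sev group (n = 44 vs. 209); in that situation the pooled standard error tends to be too small. This illustrative sketch recomputes both from the rounded summaries.

```r
# Rounded Happiness summaries: group 1 = Normal, group 2 = Mod/Sev
n <- c(209, 44)
m <- c(27.057, 21.614)
s <- c(4.885, 6.005)

mean_diff <- m[1] - m[2]

# Pooled SE: one common SD, weighted by each group's degrees of freedom
sp        <- sqrt(sum((n - 1) * s^2) / (sum(n) - 2))
se_pooled <- sp * sqrt(sum(1 / n))

# Welch SE: each group keeps its own variance
se_welch <- sqrt(sum(s^2 / n))

round(c(se_pooled = se_pooled, t_pooled = mean_diff / se_pooled,
        se_welch = se_welch, t_welch = mean_diff / se_welch), 3)
```

Both versions give p < 0.001 here, so the conclusion does not depend on which one is used.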
We want to determine whether students who pulled at least one all-nighter report significantly worse sleep quality compared to those who didn’t.
# Convert AllNighter to factor labels
SleepStudy$AllNighter <- factor(SleepStudy$AllNighter, levels = c(0, 1), labels = c("No", "Yes"))
# Run the two-sample t-test on sleep quality
ttest(PoorSleepQuality ~ AllNighter, data = SleepStudy)
##
## Compare PoorSleepQuality across AllNighter with levels Yes and No
## Grouping Variable: AllNighter
## Response Variable: PoorSleepQuality
##
##
## ------ Describe ------
##
## PoorSleepQuality for AllNighter Yes: n.miss = 0, n = 34, mean = 7.029, sd = 2.823
## PoorSleepQuality for AllNighter No: n.miss = 0, n = 219, mean = 6.137, sd = 2.922
##
## Mean Difference of PoorSleepQuality: 0.892
##
## Weighted Average Standard Deviation: 2.910
##
##
## ------ Assumptions ------
##
## Note: These hypothesis tests can perform poorly, and the
## t-test is typically robust to violations of assumptions.
## Use as heuristic guides instead of interpreting literally.
##
## Null hypothesis, for each group, is a normal distribution of PoorSleepQuality.
## Group Yes: Sample mean assumed normal because n > 30, so no test needed.
## Group No: Sample mean assumed normal because n > 30, so no test needed.
##
## Null hypothesis is equal variances of PoorSleepQuality, homogeneous.
## Variance Ratio test: F = 8.541/7.969 = 1.072, df = 218;33, p-value = 0.846
## Levene's test, Brown-Forsythe: t = 0.279, df = 251, p-value = 0.780
##
##
## ------ Infer ------
##
## --- Assume equal population variances of PoorSleepQuality for each AllNighter
##
## t-cutoff for 95% range of variation: tcut = 1.969
## Standard Error of Mean Difference: SE = 0.536
##
## Hypothesis Test of 0 Mean Diff: t-value = 1.664, df = 251, p-value = 0.097
##
## Margin of Error for 95% Confidence Level: 1.056
## 95% Confidence Interval for Mean Difference: -0.164 to 1.949
##
##
## --- Do not assume equal population variances of PoorSleepQuality for each AllNighter
##
## t-cutoff: tcut = 2.014
## Standard Error of Mean Difference: SE = 0.523
##
## Hypothesis Test of 0 Mean Diff: t = 1.707, df = 44.708, p-value = 0.095
##
## Margin of Error for 95% Confidence Level: 1.053
## 95% Confidence Interval for Mean Difference: -0.161 to 1.946
##
##
## ------ Effect Size ------
##
## --- Assume equal population variances of PoorSleepQuality for each AllNighter
##
## Standardized Mean Difference of PoorSleepQuality, Cohen's d: 0.307
##
##
## ------ Practical Importance ------
##
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
##
##
## ------ Graphics Smoothing Parameter ------
##
## Density bandwidth for AllNighter Yes: 1.589
## Density bandwidth for AllNighter No: 0.936
# Visualize sleep quality by all-nighter status
Plot(x = AllNighter, y = PoorSleepQuality, data = SleepStudy)
##
## >>> Suggestions or enter: style(suggest=FALSE)
## Plot(AllNighter, PoorSleepQuality, data=SleepStudy, means=FALSE) # do not plot means
## Plot(AllNighter, PoorSleepQuality, data=SleepStudy, stat="mean") # only plot means
## ttest(PoorSleepQuality ~ AllNighter) # inferential analysis
##
## PoorSleepQuality
## - by levels of -
## AllNighter
##
## n miss mean sd min mdn max
## No 219 0 6.137 2.922 1.000 6.000 18.000
## Yes 34 0 7.029 2.823 2.000 7.000 12.000
##
A two-sample t-test was conducted to compare sleep quality scores between students who had pulled at least one all-nighter and those who hadn’t.
The test was not statistically significant, t(251) = 1.664, p = 0.097, with a 95% confidence interval of [-0.164, 1.949]. The effect size (Cohen’s d = 0.307) suggests a small difference in sleep quality scores.
We fail to reject the null hypothesis: there is not enough evidence of a difference in reported sleep quality between students who pulled all-nighters and those who did not, although the sample means point toward poorer sleep quality (higher PoorSleepQuality scores) among students who pulled all-nighters.
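The research question is directional (worse sleep quality after an all-nighter), while lessR reports a two-sided p-value. When the t statistic falls in the hypothesized direction, the one-sided p-value is half the two-sided one. The sketch below only shows that arithmetic; a one-sided test is legitimate only if the direction was fixed before looking at the data.

```r
t_stat <- 1.664   # pooled t for PoorSleepQuality (Yes minus No) from the output above
df     <- 251

p_two_sided <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)
# Upper-tail p-value for H1: mean(Yes) > mean(No), where a higher score means poorer sleep
p_one_sided <- pt(t_stat, df, lower.tail = FALSE)

round(c(two_sided = p_two_sided, one_sided = p_one_sided), 4)
```

Here the one-sided value lands just below 0.05, so the choice of alternative hypothesis matters for this comparison and should be stated explicitly.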
We want to determine whether students who abstain from alcohol report significantly lower stress scores than students who report heavy alcohol use.
# Check unique values in AlcoholUse
table(SleepStudy$AlcoholUse)
##
## Abstain Heavy Light Moderate
## 34 16 83 120
# Filter for only "Abstain" and "Heavy"
SleepStudy_Alcohol <- subset(SleepStudy, AlcoholUse %in% c("Abstain", "Heavy"))
SleepStudy_Alcohol$AlcoholUse <- factor(SleepStudy_Alcohol$AlcoholUse)
# Run the t-test on StressScore
ttest(StressScore ~ AlcoholUse, data = SleepStudy_Alcohol)
##
## Compare StressScore across AlcoholUse with levels Heavy and Abstain
## Grouping Variable: AlcoholUse
## Response Variable: StressScore
##
##
## ------ Describe ------
##
## StressScore for AlcoholUse Heavy: n.miss = 0, n = 16, mean = 10.438, sd = 7.797
## StressScore for AlcoholUse Abstain: n.miss = 0, n = 34, mean = 8.971, sd = 7.582
##
## Mean Difference of StressScore: 1.467
##
## Weighted Average Standard Deviation: 7.650
##
##
## ------ Assumptions ------
##
## Note: These hypothesis tests can perform poorly, and the
## t-test is typically robust to violations of assumptions.
## Use as heuristic guides instead of interpreting literally.
##
## Null hypothesis, for each group, is a normal distribution of StressScore.
## Group Heavy Shapiro-Wilk normality test: W = 0.961, p-value = 0.687
## Group Abstain: Sample mean assumed normal because n > 30, so no test needed.
##
## Null hypothesis is equal variances of StressScore, homogeneous.
## Variance Ratio test: F = 60.796/57.484 = 1.058, df = 15;33, p-value = 0.856
## Levene's test, Brown-Forsythe: t = 0.347, df = 48, p-value = 0.730
##
##
## ------ Infer ------
##
## --- Assume equal population variances of StressScore for each AlcoholUse
##
## t-cutoff for 95% range of variation: tcut = 2.011
## Standard Error of Mean Difference: SE = 2.319
##
## Hypothesis Test of 0 Mean Diff: t-value = 0.633, df = 48, p-value = 0.530
##
## Margin of Error for 95% Confidence Level: 4.663
## 95% Confidence Interval for Mean Difference: -3.196 to 6.130
##
##
## --- Do not assume equal population variances of StressScore for each AlcoholUse
##
## t-cutoff: tcut = 2.046
## Standard Error of Mean Difference: SE = 2.343
##
## Hypothesis Test of 0 Mean Diff: t = 0.626, df = 28.733, p-value = 0.536
##
## Margin of Error for 95% Confidence Level: 4.794
## 95% Confidence Interval for Mean Difference: -3.327 to 6.261
##
##
## ------ Effect Size ------
##
## --- Assume equal population variances of StressScore for each AlcoholUse
##
## Standardized Mean Difference of StressScore, Cohen's d: 0.192
##
##
## ------ Practical Importance ------
##
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
##
##
## ------ Graphics Smoothing Parameter ------
##
## Density bandwidth for AlcoholUse Heavy: 5.096
## Density bandwidth for AlcoholUse Abstain: 4.268
# Plot stress score by alcohol use group
Plot(x = AlcoholUse, y = StressScore, data = SleepStudy_Alcohol)
##
## >>> Suggestions or enter: style(suggest=FALSE)
## Plot(AlcoholUse, StressScore, data=SleepStudy_Alcohol, means=FALSE) # do not plot means
## Plot(AlcoholUse, StressScore, data=SleepStudy_Alcohol, stat="mean") # only plot means
## ttest(StressScore ~ AlcoholUse) # inferential analysis
##
## StressScore
## - by levels of -
## AlcoholUse
##
## n miss mean sd min mdn max
## Abstain 34 0 8.971 7.582 0.000 7.000 28.000
## Heavy 16 0 10.438 7.797 0.000 10.000 27.000
##
A two-sample t-test was conducted to compare stress scores between students who abstain from alcohol and those who report heavy alcohol use.
The test was not statistically significant, t(48) = 0.633, p = 0.530, with a 95% confidence interval of [-3.196, 6.130]. The effect size (Cohen’s d = 0.192) indicates a small and likely negligible difference.
We fail to reject the null hypothesis: there is no evidence of a difference in stress scores between students who abstain from alcohol and those who drink heavily, although abstainers have a slightly lower mean score (8.971 vs. 10.438). With only 16 heavy drinkers in the sample, the test has limited power to detect a small difference.
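To gauge how little power this comparison has, base R’s `power.t.test` can give the per-group sample size a balanced, two-sided pooled t-test would need to detect d = 0.192 with 80% power. Treating the observed d as the true effect is an assumption made only for illustration.

```r
# Per-group n needed to detect d = 0.192 (delta in SD units, so sd = 1)
plan <- power.t.test(delta = 0.192, sd = 1, sig.level = 0.05, power = 0.80,
                     type = "two.sample", alternative = "two.sided")
ceiling(plan$n)

# Approximate power with 16 students per group; the actual design (16 vs. 34)
# has somewhat more power than this balanced approximation, but still very little
pw <- power.t.test(n = 16, delta = 0.192, sd = 1, sig.level = 0.05)$power
round(pw, 3)
```

The gap between 16 heavy drinkers and the several hundred per group this sketch calls for suggests the non-significant result is weak evidence that the groups truly have equal stress levels.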
We want to determine whether there is a statistically significant difference in the average number of alcoholic drinks per week between male and female students.
# The reload above reset Gender to 0/1, so convert it to labels again (same 0 = Male, 1 = Female assumption as before)
SleepStudy$Gender <- factor(SleepStudy$Gender, levels = c(0, 1), labels = c("Male", "Female"))
# Run the t-test comparing number of drinks per week
ttest(Drinks ~ Gender, data = SleepStudy)
##
## Compare Drinks across Gender with levels Female and Male
## Grouping Variable: Gender
## Response Variable: Drinks
##
##
## ------ Describe ------
##
## Drinks for Gender Female: n.miss = 0, n = 102, mean = 7.539, sd = 4.929
## Drinks for Gender Male: n.miss = 0, n = 151, mean = 4.238, sd = 2.720
##
## Mean Difference of Drinks: 3.301
##
## Weighted Average Standard Deviation: 3.768
##
##
## ------ Assumptions ------
##
## Note: These hypothesis tests can perform poorly, and the
## t-test is typically robust to violations of assumptions.
## Use as heuristic guides instead of interpreting literally.
##
## Null hypothesis, for each group, is a normal distribution of Drinks.
## Group Female: Sample mean assumed normal because n > 30, so no test needed.
## Group Male: Sample mean assumed normal because n > 30, so no test needed.
##
## Null hypothesis is equal variances of Drinks, homogeneous.
## Variance Ratio test: F = 24.291/7.396 = 3.284, df = 101;150, p-value = 0.000
## Levene's test, Brown-Forsythe: t = 5.471, df = 251, p-value = 0.000
##
##
## ------ Infer ------
##
## --- Assume equal population variances of Drinks for each Gender
##
## t-cutoff for 95% range of variation: tcut = 1.969
## Standard Error of Mean Difference: SE = 0.483
##
## Hypothesis Test of 0 Mean Diff: t-value = 6.836, df = 251, p-value = 0.000
##
## Margin of Error for 95% Confidence Level: 0.951
## 95% Confidence Interval for Mean Difference: 2.350 to 4.252
##
##
## --- Do not assume equal population variances of Drinks for each Gender
##
## t-cutoff: tcut = 1.977
## Standard Error of Mean Difference: SE = 0.536
##
## Hypothesis Test of 0 Mean Diff: t = 6.160, df = 142.754, p-value = 0.000
##
## Margin of Error for 95% Confidence Level: 1.059
## 95% Confidence Interval for Mean Difference: 2.242 to 4.360
##
##
## ------ Effect Size ------
##
## --- Assume equal population variances of Drinks for each Gender
##
## Standardized Mean Difference of Drinks, Cohen's d: 0.876
##
##
## ------ Practical Importance ------
##
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
##
##
## ------ Graphics Smoothing Parameter ------
##
## Density bandwidth for Gender Female: 2.227
## Density bandwidth for Gender Male: 1.136
# Visualize number of drinks per week by gender
Plot(x = Gender, y = Drinks, data = SleepStudy)
##
## >>> Suggestions or enter: style(suggest=FALSE)
## Plot(Gender, Drinks, data=SleepStudy, means=FALSE) # do not plot means
## Plot(Gender, Drinks, data=SleepStudy, stat="mean") # only plot means
## ttest(Drinks ~ Gender) # inferential analysis
##
## Drinks
## - by levels of -
## Gender
##
## n miss mean sd min mdn max
## Male 151 0 4.238 2.720 0.000 4.000 12.000
## Female 102 0 7.539 4.929 0.000 8.000 24.000
##
A two-sample t-test was conducted to compare the number of alcoholic drinks consumed per week between male and female students.
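Both the variance-ratio test (F = 3.284) and Levene's test rejected equal variances (p < 0.001), so the Welch (unequal-variance) result is reported. The difference was statistically significant, t(142.75) = 6.160, p < 0.001, with a 95% confidence interval for the mean difference of [2.242, 4.360] drinks per week. The effect size (Cohen’s d = 0.876) indicates a large difference in weekly alcohol consumption between genders.
We reject the null hypothesis. In this sample, female students reported significantly more alcoholic drinks per week than male students (means of 7.539 vs. 4.238).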
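The Welch statistic, its fractional degrees of freedom, and Cohen's d can be reproduced by hand from the group summaries that `ttest()` printed. The sketch below uses a hypothetical helper, `welch_from_summary()`, which is not part of lessR. Its inputs are the rounded means and standard deviations, so the results agree with the lessR output only to about two or three decimals.

```r
# Hypothetical helper (not a lessR function): Welch two-sample t-test and
# Cohen's d computed from summary statistics only.
welch_from_summary <- function(m1, s1, n1, m2, s2, n2) {
  v1 <- s1^2 / n1                      # squared standard error, group 1
  v2 <- s2^2 / n2                      # squared standard error, group 2
  se <- sqrt(v1 + v2)                  # unpooled SE of the mean difference
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  t  <- (m1 - m2) / se
  # Cohen's d divides by the pooled (weighted average) standard deviation
  sp <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
  list(t = t, df = df, p = 2 * pt(-abs(t), df), d = (m1 - m2) / sp)
}

# Female vs. Male summaries from the ttest() output above
res <- welch_from_summary(7.539, 4.929, 102, 4.238, 2.720, 151)
round(unlist(res), 3)
```

The results should match the reported t = 6.160, df = 142.754, and d = 0.876 to within rounding. The Welch standard error (0.536) is larger than the pooled one (0.483) because the group with the larger variance is also the smaller group.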
We want to determine whether students with high stress go to bed at a different time on weekdays than students with normal stress.
# Check the levels of Stress and their counts
table(SleepStudy$Stress)
##
## high normal
## 56 197
# Keep the "high" and "normal" categories (the only two present, so all 253 rows are kept)
SleepStudy_Stress <- subset(SleepStudy, Stress %in% c("high", "normal"))
# Rebuild the factor so that no unused levels remain
SleepStudy_Stress$Stress <- factor(SleepStudy_Stress$Stress)
# Run t-test comparing weekday bedtime
ttest(WeekdayBed ~ Stress, data = SleepStudy_Stress)
##
## Compare WeekdayBed across Stress with levels normal and high
## Grouping Variable: Stress
## Response Variable: WeekdayBed
##
##
## ------ Describe ------
##
## WeekdayBed for Stress normal: n.miss = 0, n = 197, mean = 24.885, sd = 1.028
## WeekdayBed for Stress high: n.miss = 0, n = 56, mean = 24.715, sd = 1.053
##
## Mean Difference of WeekdayBed: 0.170
##
## Weighted Average Standard Deviation: 1.033
##
##
## ------ Assumptions ------
##
## Note: These hypothesis tests can perform poorly, and the
## t-test is typically robust to violations of assumptions.
## Use as heuristic guides instead of interpreting literally.
##
## Null hypothesis, for each group, is a normal distribution of WeekdayBed.
## Group normal: Sample mean assumed normal because n > 30, so no test needed.
## Group high: Sample mean assumed normal because n > 30, so no test needed.
##
## Null hypothesis is equal variances of WeekdayBed, homogeneous.
## Variance Ratio test: F = 1.108/1.056 = 1.049, df = 55;196, p-value = 0.792
## Levene's test, Brown-Forsythe: t = -0.054, df = 251, p-value = 0.957
##
##
## ------ Infer ------
##
## --- Assume equal population variances of WeekdayBed for each Stress
##
## t-cutoff for 95% range of variation: tcut = 1.969
## Standard Error of Mean Difference: SE = 0.156
##
## Hypothesis Test of 0 Mean Diff: t-value = 1.089, df = 251, p-value = 0.277
##
## Margin of Error for 95% Confidence Level: 0.308
## 95% Confidence Interval for Mean Difference: -0.138 to 0.479
##
##
## --- Do not assume equal population variances of WeekdayBed for each Stress
##
## t-cutoff: tcut = 1.988
## Standard Error of Mean Difference: SE = 0.159
##
## Hypothesis Test of 0 Mean Diff: t = 1.075, df = 87.048, p-value = 0.286
##
## Margin of Error for 95% Confidence Level: 0.315
## 95% Confidence Interval for Mean Difference: -0.145 to 0.486
##
##
## ------ Effect Size ------
##
## --- Assume equal population variances of WeekdayBed for each Stress
##
## Standardized Mean Difference of WeekdayBed, Cohen's d: 0.165
##
##
## ------ Practical Importance ------
##
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
##
##
## ------ Graphics Smoothing Parameter ------
##
## Density bandwidth for Stress normal: 0.407
## Density bandwidth for Stress high: 0.536
# Visualize bedtime by stress level
Plot(x = Stress, y = WeekdayBed, data = SleepStudy_Stress)
##
## >>> Suggestions or enter: style(suggest=FALSE)
## Plot(Stress, WeekdayBed, data=SleepStudy_Stress, means=FALSE) # do not plot means
## Plot(Stress, WeekdayBed, data=SleepStudy_Stress, stat="mean") # only plot means
## ttest(WeekdayBed ~ Stress) # inferential analysis
##
## WeekdayBed
## - by levels of -
## Stress
##
## n miss mean sd min mdn max
## high 56 0 24.715 1.053 22.830 24.700 26.800
## normal 197 0 24.885 1.028 21.800 24.900 29.100
##
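A two-sample t-test was conducted to compare weekday bedtimes between students with high and normal stress levels.
Neither variance test indicated unequal variances (p = 0.792 and p = 0.957), so the pooled-variance result is reported. The difference was not statistically significant, t(251) = 1.089, p = 0.277, with a 95% confidence interval for the mean difference of [-0.138, 0.479] hours. Students with normal stress went to bed about 10 minutes (0.170 hours) later on average. The effect size (Cohen’s d = 0.165) is negligible because it falls below the conventional 0.2 benchmark for a small effect.
We fail to reject the null hypothesis. The data provide no evidence that weekday bedtimes differ between students with high and normal stress levels.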
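Bedtimes such as 24.885 are easier to interpret as clock times. The sketch below assumes that `WeekdayBed` is recorded in decimal hours with 24.0 = midnight, so 25.5 would be 1:30 am. This coding is an assumption, not something shown in the output above. The helper `bed_to_clock()` is hypothetical.

```r
# Hypothetical helper: convert decimal-hour bedtimes to "HH:MM" clock times,
# ASSUMING the coding is hours with 24.0 = midnight (e.g., 25.5 = 01:30).
bed_to_clock <- function(x) {
  mins <- round((x %% 24) * 60) %% 1440   # minutes after midnight, 0-1439
  sprintf("%02d:%02d", mins %/% 60, mins %% 60)
}

# Group means from the ttest() output above (normal, then high stress)
bed_to_clock(c(24.885, 24.715))

# Mean difference and 95% CI bounds, converted from hours to minutes
round(c(diff = 0.170, lower = -0.138, upper = 0.479) * 60, 1)
```

Under this assumption, the two group means correspond to about 00:53 and 00:43. The confidence interval spans roughly 8 minutes earlier to 29 minutes later.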
We want to determine whether students in their first two years of college sleep more or less on weekends compared to upper-year students.
# Group class years 1-2 as "First2Years" and years 3-4 as "UpperYears"
SleepStudy$ClassGroup <- ifelse(SleepStudy$ClassYear <= 2, "First2Years", "UpperYears")
SleepStudy$ClassGroup <- factor(SleepStudy$ClassGroup)
# Run t-test comparing weekend sleep hours
ttest(WeekendSleep ~ ClassGroup, data = SleepStudy)
##
## Compare WeekendSleep across ClassGroup with levels UpperYears and First2Years
## Grouping Variable: ClassGroup
## Response Variable: WeekendSleep
##
##
## ------ Describe ------
##
## WeekendSleep for ClassGroup UpperYears: n.miss = 0, n = 111, mean = 8.222, sd = 1.363
## WeekendSleep for ClassGroup First2Years: n.miss = 0, n = 142, mean = 8.214, sd = 1.374
##
## Mean Difference of WeekendSleep: 0.008
##
## Weighted Average Standard Deviation: 1.369
##
##
## ------ Assumptions ------
##
## Note: These hypothesis tests can perform poorly, and the
## t-test is typically robust to violations of assumptions.
## Use as heuristic guides instead of interpreting literally.
##
## Null hypothesis, for each group, is a normal distribution of WeekendSleep.
## Group UpperYears: Sample mean assumed normal because n > 30, so no test needed.
## Group First2Years: Sample mean assumed normal because n > 30, so no test needed.
##
## Null hypothesis is equal variances of WeekendSleep, homogeneous.
## Variance Ratio test: F = 1.889/1.858 = 1.017, df = 141;110, p-value = 0.933
## Levene's test, Brown-Forsythe: t = -0.497, df = 251, p-value = 0.619
##
##
## ------ Infer ------
##
## --- Assume equal population variances of WeekendSleep for each ClassGroup
##
## t-cutoff for 95% range of variation: tcut = 1.969
## Standard Error of Mean Difference: SE = 0.174
##
## Hypothesis Test of 0 Mean Diff: t-value = 0.048, df = 251, p-value = 0.962
##
## Margin of Error for 95% Confidence Level: 0.342
## 95% Confidence Interval for Mean Difference: -0.333 to 0.350
##
##
## --- Do not assume equal population variances of WeekendSleep for each ClassGroup
##
## t-cutoff: tcut = 1.970
## Standard Error of Mean Difference: SE = 0.173
##
## Hypothesis Test of 0 Mean Diff: t = 0.048, df = 237.363, p-value = 0.962
##
## Margin of Error for 95% Confidence Level: 0.341
## 95% Confidence Interval for Mean Difference: -0.333 to 0.350
##
##
## ------ Effect Size ------
##
## --- Assume equal population variances of WeekendSleep for each ClassGroup
##
## Standardized Mean Difference of WeekendSleep, Cohen's d: 0.006
##
##
## ------ Practical Importance ------
##
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
##
##
## ------ Graphics Smoothing Parameter ------
##
## Density bandwidth for ClassGroup UpperYears: 0.606
## Density bandwidth for ClassGroup First2Years: 0.581
# Visualize weekend sleep by class group
Plot(x = ClassGroup, y = WeekendSleep, data = SleepStudy)
##
## >>> Suggestions or enter: style(suggest=FALSE)
## Plot(ClassGroup, WeekendSleep, data=SleepStudy, means=FALSE) # do not plot means
## Plot(ClassGroup, WeekendSleep, data=SleepStudy, stat="mean") # only plot means
## ttest(WeekendSleep ~ ClassGroup) # inferential analysis
##
## WeekendSleep
## - by levels of -
## ClassGroup
##
## n miss mean sd min mdn max
## First2Years 142 0 8.214 1.374 4.000 8.250 11.000
## UpperYears 111 0 8.222 1.363 4.380 8.250 12.750
##
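A two-sample t-test was conducted to compare average weekend sleep hours between students in their first two years of college and upper-year students.
The variance tests gave no indication of unequal variances (p = 0.933 and p = 0.619), and the pooled and Welch results agree. The difference was not statistically significant, t(251) = 0.048, p = 0.962, with a 95% confidence interval for the mean difference of [-0.333, 0.350] hours. The group means (8.222 vs. 8.214 hours) differ by less than a minute, and the effect size (Cohen’s d = 0.006) is essentially zero.
We fail to reject the null hypothesis. The data provide no evidence that weekend sleep duration differs between early-year and upper-year students. The confidence interval also suggests that any true difference is unlikely to exceed about 20 minutes in either direction.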
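Failing to reject the null hypothesis does not by itself show that two groups are equal. One way to support a claim of "no meaningful difference" is an equivalence test, such as two one-sided tests (TOST). The report did not run TOST; it is a swapped-in technique shown here as a sketch. The helper `tost_pooled()` is hypothetical. The ±0.5-hour (30-minute) equivalence margin is an illustrative choice, not a value from the study.

```r
# Hypothetical helper: TOST equivalence test from summary statistics,
# using the pooled-variance standard error.
tost_pooled <- function(m1, s1, n1, m2, s2, n2, margin, alpha = 0.05) {
  df <- n1 + n2 - 2
  sp <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / df)  # pooled SD
  se <- sp * sqrt(1 / n1 + 1 / n2)                       # SE of the difference
  diff <- m1 - m2
  # TOST at level alpha is equivalent to checking that the
  # (1 - 2 * alpha) confidence interval lies inside (-margin, margin)
  ci <- diff + c(-1, 1) * qt(1 - alpha, df) * se
  p_lower <- pt((diff + margin) / se, df, lower.tail = FALSE)  # H0: diff <= -margin
  p_upper <- pt((diff - margin) / se, df)                      # H0: diff >= +margin
  list(ci = ci, p = max(p_lower, p_upper),
       equivalent = ci[1] > -margin && ci[2] < margin)
}

# UpperYears vs. First2Years weekend sleep (hours), from the ttest() output above;
# the 0.5-hour margin is a hypothetical choice
res <- tost_pooled(8.222, 1.363, 111, 8.214, 1.374, 142, margin = 0.5)
res
```

With a 30-minute margin, the 90% interval of roughly [-0.28, 0.29] hours falls inside the margin, so the groups would be judged equivalent. With a much tighter margin such as 0.1 hours, they would not. The conclusion therefore depends on the margin, which should be chosen on substantive grounds before looking at the data.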
This report analyzed ten research questions related to sleep habits, academic behavior, mental health, and lifestyle among college students using the SleepStudy dataset.
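These results highlight a few meaningful patterns, such as the relationship between mental health and happiness and the gender difference in weekly drinking. Several other comparisons, including weekday bedtime by stress level and weekend sleep by class year, showed no detectable differences. Because the data come from an observational survey, these findings describe associations rather than causal effects. Overall, the analysis shows how hypothesis tests, reported together with effect sizes and confidence intervals, can separate substantial differences in student well-being from negligible ones.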