Introduction

This report presents an analysis of sleep patterns among college students, utilizing the “SleepStudy” dataset obtained from Lock5 Datasets, Third Edition. The dataset comprises 253 observations on 27 variables, providing data about the sleep habits, mental health, and lifestyle choices of college students.

The primary objective of this analysis is to answer a series of questions about the college student data using statistical hypothesis tests. The results offer insights into the factors associated with the health, happiness, and academic performance of college students.

The following research questions will be addressed in this report:

Is there a significant difference in the average GPA between male and female college students?
Is there a significant difference in the average number of early classes between the first two class years and other class years?
Do students who identify as “larks” have significantly better cognitive skills compared to “owls”?
Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class and those who didn’t?
Is there a significant difference in the average happiness level between students with at least moderate depression and those with normal depression status?
Is there a significant difference in average sleep quality scores between students who reported having at least one all-nighter and those who didn’t?
Do students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use?
Is there a significant difference in the average number of drinks per week between students of different genders?
Is there a significant difference in the average weekday bedtime between students with high and normal stress levels?
Is there a significant difference in the average hours of sleep on weekends between students in the first two class years and students in other class years?

By analyzing these questions, we will gain useful insights about college students, their sleep patterns, and other related factors.

Data

According to the Lock5 Data Guide, Third Edition, the data “were obtained from a sample of students who did skills tests to measure cognitive function, completed a survey that asked many questions about attitudes and habits, and kept a sleep diary to record time and quality of sleep over a two week period.” The data were sourced from Onyper, S., Thacher, P., Gilbert, J., Gradess, S., “Class Start Times, Sleep, and Academic Performance in College: A Path Analysis,” April 2012; 29(3): 318-335.

The “SleepStudy” dataset contains observations on the following 27 variables:

Gender: Gender of the student (1 = male, 0 = female).
ClassYear: Year in school, coded as 1 = first year, 2 = sophomore, 3 = junior, 4 = senior.
LarkOwl: Self-identification as a morning person (“Lark”), evening person (“Owl”), or Neither.
NumEarlyClass: Number of classes per week before 9 AM.
EarlyClass: Indicator of having at least one early class (1 = Yes, 0 = No).
GPA: Grade Point Average on a 0-4 scale.
ClassesMissed: Number of classes missed in a semester.
CognitionZscore: Z-score on a test of cognitive skills.
PoorSleepQuality: Measure of sleep quality, where higher values indicate poorer sleep.
DepressionScore: Measure of degree of depression.
AnxietyScore: Measure of amount of anxiety.
StressScore: Measure of amount of stress.
DepressionStatus: Coded depression score: normal, moderate, or severe.
AnxietyStatus: Coded anxiety score: normal, moderate, or severe.
Stress: Coded stress score: normal or high.
DASScore: Combined score for depression, anxiety, and stress.
Happiness: Measure of degree of happiness.
AlcoholUse: Self-reported level of alcohol use: Abstain, Light, Moderate, or Heavy.
Drinks: Number of alcoholic drinks consumed per week.
WeekdayBed: Average weekday bedtime (24.0 = midnight).
WeekdayRise: Average weekday rise time (8.0 = 8 AM).
WeekdaySleep: Average hours of sleep on weekdays.
WeekendBed: Average weekend bedtime (24.0 = midnight).
WeekendRise: Average weekend rise time (8.0 = 8 AM).
WeekendSleep: Average hours of sleep on weekends.
AverageSleep: Average hours of sleep across all days.
AllNighter: Indicator of having pulled an all-nighter this semester (1 = Yes, 0 = No).

Methodology

The analysis was performed and the report was compiled using R in Posit Cloud. The graphs were generated by the lessR library.

The ttest function from the lessR library was used for most of the analysis because it reports descriptive statistics, assumption checks, inference, and effect size for a two-group comparison in a single call.

The significance level was set at 0.05. A p-value at or below 0.05 was considered statistically significant, and any p-value above this threshold was considered not statistically significant.
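The workflow described above might be sketched as follows. This is a minimal sketch, not the report's actual code: it assumes the lessR and Lock5Data packages are installed and that the SleepStudy data frame shipped with Lock5Data matches the dataset described here. The helpers recode_gender and is_significant are hypothetical names introduced for illustration; the Female and Male labels are taken from the output shown under Question 1.

```r
# Hypothetical helper: turn the 0/1 Gender codes into a labeled factor.
recode_gender <- function(x) {
  factor(x, levels = c(0, 1), labels = c("Female", "Male"))
}

# Decision rule stated in the Methodology: significant when p <= 0.05.
is_significant <- function(p, alpha = 0.05) p <= alpha

# Run the test only when both packages are available (an assumption).
if (requireNamespace("lessR", quietly = TRUE) &&
    requireNamespace("Lock5Data", quietly = TRUE)) {
  library(lessR)
  data("SleepStudy", package = "Lock5Data")
  d <- SleepStudy
  d$Gender <- recode_gender(d$Gender)
  # Two-group comparison of GPA across Gender.
  ttest(GPA ~ Gender, data = d)
}
```

The same pattern would apply to the other questions by swapping the response and grouping variables.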

Analysis

Question 1: Difference in Average GPA Between Male and Female Students

Hypothesis: Is there a significant difference in the average GPA between male and female college students?

We will perform an independent samples t-test to compare the average GPA between male and female students to see if there is a statistically significant difference between them.

In the Gender column of the dataset, 0 represents female while 1 represents male.

## 
## Compare GPA across Gender with levels Female and Male 
## Grouping Variable:  Gender
## Response Variable:  GPA
## 
## 
## ------ Describe ------
## 
## GPA for Gender Female:  n.miss = 0,  n = 151,  mean = 3.325,  sd = 0.375
## GPA for Gender Male:  n.miss = 0,  n = 102,  mean = 3.124,  sd = 0.418
## 
## Mean Difference of GPA:  0.201
## 
## Weighted Average Standard Deviation:   0.393 
## 
## 
## ------ Assumptions ------
## 
## Note: These hypothesis tests can perform poorly, and the 
##       t-test is typically robust to violations of assumptions. 
##       Use as heuristic guides instead of interpreting literally. 
## 
## Null hypothesis, for each group, is a normal distribution of GPA.
## Group Female: Sample mean assumed normal because n > 30, so no test needed.
## Group Male: Sample mean assumed normal because n > 30, so no test needed.
## 
## Null hypothesis is equal variances of GPA, homogeneous.
## Variance Ratio test:  F = 0.174/0.141 = 1.240,  df = 101;150,  p-value = 0.232
## Levene's test, Brown-Forsythe:  t = -1.879,  df = 251,  p-value = 0.061
## 
## 
## ------ Infer ------
## 
## --- Assume equal population variances of GPA for each Gender 
## 
## t-cutoff for 95% range of variation: tcut =  1.969 
## Standard Error of Mean Difference: SE =  0.050 
## 
## Hypothesis Test of 0 Mean Diff:  t-value = 3.996,  df = 251,  p-value = 0.000
## 
## Margin of Error for 95% Confidence Level:  0.099
## 95% Confidence Interval for Mean Difference:  0.102 to 0.300
## 
## 
## --- Do not assume equal population variances of GPA for each Gender 
## 
## t-cutoff: tcut =  1.972 
## Standard Error of Mean Difference: SE =  0.051 
## 
## Hypothesis Test of 0 Mean Diff:  t = 3.914,  df = 200.902, p-value = 0.000
## 
## Margin of Error for 95% Confidence Level:  0.101
## 95% Confidence Interval for Mean Difference:  0.100 to 0.303
## 
## 
## ------ Effect Size ------
## 
## --- Assume equal population variances of GPA for each Gender 
## 
## Standardized Mean Difference of GPA, Cohen's d:  0.512
## 
## 
## ------ Practical Importance ------
## 
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
## 
## 
## ------ Graphics Smoothing Parameter ------
## 
## Density bandwidth for Gender Female: 0.154
## Density bandwidth for Gender Male: 0.189

The female group has a higher average GPA than the male group, 3.325 versus 3.124, a mean difference of 0.201. The standard deviations are similar, 0.375 for the female group and 0.418 for the male group, and neither the variance ratio test (p = 0.232) nor Levene’s test (p = 0.061) rejects equal variances. The p-value is reported as 0.000, that is, below 0.001, so the difference is statistically significant.

This test supports the hypothesis that there is a significant difference in the average GPA between male and female college students, with female students having a higher average GPA.
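As a check on the arithmetic, the equal-variance t statistic can be recomputed from the summary statistics in the Describe section. The sketch below uses base R only; the variable names are illustrative, and the inputs are the rounded values printed in the output, so the results match the reported figures only up to rounding.

```r
# Summary statistics copied from the Describe section of the output.
n_f <- 151; m_f <- 3.325; s_f <- 0.375   # Female
n_m <- 102; m_m <- 3.124; s_m <- 0.418   # Male

df <- n_f + n_m - 2

# Pooled standard deviation: df-weighted average of the two variances.
s_pooled <- sqrt(((n_f - 1) * s_f^2 + (n_m - 1) * s_m^2) / df)

# Standard error of the mean difference and the t statistic.
se_diff <- s_pooled * sqrt(1 / n_f + 1 / n_m)
t_value <- (m_f - m_m) / se_diff

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(abs(t_value), df, lower.tail = FALSE)

print(round(c(s_pooled = s_pooled, se = se_diff, t = t_value), 3))
print(format.pval(p_value, digits = 2))
```

The recomputed t statistic differs from the reported 3.996 in the third decimal place because the means and standard deviations were rounded before being entered here.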

Question 2: Difference in Number of Early Classes Between Class Years

Hypothesis: Is there a significant difference in the average number of early classes between the first two class years and other class years?

An early class is defined in the dataset as a class before 9 AM.

## 
## Compare NumEarlyClass across YearGroup with levels FirstTwoYears and OtherYears 
## Grouping Variable:  YearGroup
## Response Variable:  NumEarlyClass
## 
## 
## ------ Describe ------
## 
## NumEarlyClass for YearGroup FirstTwoYears:  n.miss = 0,  n = 142,  mean = 2.070,  sd = 1.657
## NumEarlyClass for YearGroup OtherYears:  n.miss = 0,  n = 111,  mean = 1.306,  sd = 1.249
## 
## Mean Difference of NumEarlyClass:  0.764
## 
## Weighted Average Standard Deviation:   1.492 
## 
## 
## ------ Assumptions ------
## 
## Note: These hypothesis tests can perform poorly, and the 
##       t-test is typically robust to violations of assumptions. 
##       Use as heuristic guides instead of interpreting literally. 
## 
## Null hypothesis, for each group, is a normal distribution of NumEarlyClass.
## Group FirstTwoYears: Sample mean assumed normal because n > 30, so no test needed.
## Group OtherYears: Sample mean assumed normal because n > 30, so no test needed.
## 
## Null hypothesis is equal variances of NumEarlyClass, homogeneous.
## Variance Ratio test:  F = 2.747/1.560 = 1.761,  df = 141;110,  p-value = 0.002
## Levene's test, Brown-Forsythe:  t = 2.424,  df = 251,  p-value = 0.016
## 
## 
## ------ Infer ------
## 
## --- Assume equal population variances of NumEarlyClass for each YearGroup 
## 
## t-cutoff for 95% range of variation: tcut =  1.969 
## Standard Error of Mean Difference: SE =  0.189 
## 
## Hypothesis Test of 0 Mean Diff:  t-value = 4.042,  df = 251,  p-value = 0.000
## 
## Margin of Error for 95% Confidence Level:  0.372
## 95% Confidence Interval for Mean Difference:  0.392 to 1.136
## 
## 
## --- Do not assume equal population variances of NumEarlyClass for each YearGroup 
## 
## t-cutoff: tcut =  1.969 
## Standard Error of Mean Difference: SE =  0.183 
## 
## Hypothesis Test of 0 Mean Diff:  t = 4.181,  df = 250.690, p-value = 0.000
## 
## Margin of Error for 95% Confidence Level:  0.360
## 95% Confidence Interval for Mean Difference:  0.404 to 1.124
## 
## 
## ------ Effect Size ------
## 
## --- Assume equal population variances of NumEarlyClass for each YearGroup 
## 
## Standardized Mean Difference of NumEarlyClass, Cohen's d:  0.512
## 
## 
## ------ Practical Importance ------
## 
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
## 
## 
## ------ Graphics Smoothing Parameter ------
## 
## Density bandwidth for YearGroup FirstTwoYears: 0.701
## Density bandwidth for YearGroup OtherYears: 0.555

Students in the first two years take an average of 2.070 early classes per week, while students in other years take an average of 1.306, a difference of 0.764 early classes. Both the variance ratio test (p = 0.002) and Levene’s test (p = 0.016) reject equal variances, so the unequal-variance result applies: t = 4.181, df = 250.690, with a p-value reported as 0.000, that is, below 0.001. The difference is statistically significant.

This test supports the hypothesis that there is a significant difference in the average number of early classes between the first two class years and other class years, with students in the first two years taking more early classes.
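Because the variance tests in the output reject equal variances for this question, the unequal-variance (Welch) section is the relevant one. The following base R sketch recomputes that section from the printed summary statistics; the variable names are illustrative and the inputs are rounded, so small discrepancies are expected.

```r
# Summary statistics copied from the Describe section of the output.
n1 <- 142; m1 <- 2.070; s1 <- 1.657   # FirstTwoYears
n2 <- 111; m2 <- 1.306; s2 <- 1.249   # OtherYears

# Per-group variance of the sample mean.
v1 <- s1^2 / n1
v2 <- s2^2 / n2

# Welch standard error and t statistic.
se_welch <- sqrt(v1 + v2)
t_welch <- (m1 - m2) / se_welch

# Welch-Satterthwaite approximation to the degrees of freedom.
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Two-sided p-value.
p_welch <- 2 * pt(abs(t_welch), df_welch, lower.tail = FALSE)

print(round(c(se = se_welch, t = t_welch, df = df_welch), 3))
```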

Question 3: Cognitive Skills Between “Larks” and “Owls”

Hypothesis: Do students who identify as “larks” have significantly better cognitive skills (cognition z-score) compared to “owls”?

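We will compare the cognition z-scores of students who identify as larks with those of students who identify as owls. Students who identify as neither are excluded from this comparison.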

## 
## Compare CognitionZscore across LarkOwl with levels Lark and Owl 
## Grouping Variable:  LarkOwl
## Response Variable:  CognitionZscore
## 
## 
## ------ Describe ------
## 
## CognitionZscore for LarkOwl Lark:  n.miss = 0,  n = 41,  mean = 0.090,  sd = 0.830
## CognitionZscore for LarkOwl Owl:  n.miss = 0,  n = 49,  mean = -0.038,  sd = 0.653
## 
## Mean Difference of CognitionZscore:  0.129
## 
## Weighted Average Standard Deviation:   0.738 
## 
## 
## ------ Assumptions ------
## 
## Note: These hypothesis tests can perform poorly, and the 
##       t-test is typically robust to violations of assumptions. 
##       Use as heuristic guides instead of interpreting literally. 
## 
## Null hypothesis, for each group, is a normal distribution of CognitionZscore.
## Group Lark: Sample mean assumed normal because n > 30, so no test needed.
## Group Owl: Sample mean assumed normal because n > 30, so no test needed.
## 
## Null hypothesis is equal variances of CognitionZscore, homogeneous.
## Variance Ratio test:  F = 0.688/0.426 = 1.615,  df = 40;48,  p-value = 0.112
## Levene's test, Brown-Forsythe:  t = 1.336,  df = 88,  p-value = 0.185
## 
## 
## ------ Infer ------
## 
## --- Assume equal population variances of CognitionZscore for each LarkOwl 
## 
## t-cutoff for 95% range of variation: tcut =  1.987 
## Standard Error of Mean Difference: SE =  0.156 
## 
## Hypothesis Test of 0 Mean Diff:  t-value = 0.823,  df = 88,  p-value = 0.413
## 
## Margin of Error for 95% Confidence Level:  0.311
## 95% Confidence Interval for Mean Difference:  -0.182 to 0.439
## 
## 
## --- Do not assume equal population variances of CognitionZscore for each LarkOwl 
## 
## t-cutoff: tcut =  1.992 
## Standard Error of Mean Difference: SE =  0.160 
## 
## Hypothesis Test of 0 Mean Diff:  t = 0.806,  df = 75.331, p-value = 0.423
## 
## Margin of Error for 95% Confidence Level:  0.318
## 95% Confidence Interval for Mean Difference:  -0.189 to 0.447
## 
## 
## ------ Effect Size ------
## 
## --- Assume equal population variances of CognitionZscore for each LarkOwl 
## 
## Standardized Mean Difference of CognitionZscore, Cohen's d:  0.174
## 
## 
## ------ Practical Importance ------
## 
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
## 
## 
## ------ Graphics Smoothing Parameter ------
## 
## Density bandwidth for LarkOwl Lark: 0.450
## Density bandwidth for LarkOwl Owl: 0.341

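The t-test compared the cognitive skills of “larks” and “owls.” The lark group had a slightly higher mean cognition z-score than the owl group, 0.090 versus -0.038, but the p-value of 0.413 is greater than the threshold of 0.05, so we fail to reject the null hypothesis of no difference in cognitive skills between the two groups. Failing to reject the null hypothesis does not show that the groups are equal; it only means the data do not provide sufficient evidence of a difference.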

This test does not support the hypothesis that students who identify as “larks” have significantly better cognitive skills (cognition z-score) compared to “owls.”
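The question is phrased directionally (larks better than owls), while the reported p-value comes from a two-sided test. As a hedged illustration only, the one-sided p-value can be derived from the reported t statistic and degrees of freedom in base R; the report itself relies on the two-sided result, and the conclusion is the same either way.

```r
# Reported equal-variance test statistic (Lark minus Owl) and its df.
t_value <- 0.823
df <- 88

# Two-sided p-value, as reported in the output.
p_two_sided <- 2 * pt(abs(t_value), df, lower.tail = FALSE)

# One-sided p-value for the alternative that larks score higher than owls.
p_one_sided <- pt(t_value, df, lower.tail = FALSE)

print(round(c(two_sided = p_two_sided, one_sided = p_one_sided), 3))
```

Even under the one-sided alternative, the p-value remains well above 0.05.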

Question 4: Classes Missed Between Students With and Without Early Classes

Hypothesis: Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class (EarlyClass=1) and those who didn’t (EarlyClass=0)?

We will compare the number of missed classes for students with and without early classes.

## 
## Compare ClassesMissed across EarlyClass with levels 0 and >= 1 
## Grouping Variable:  EarlyClass
## Response Variable:  ClassesMissed
## 
## 
## ------ Describe ------
## 
## ClassesMissed for EarlyClass 0:  n.miss = 0,  n = 85,  mean = 2.647,  sd = 3.477
## ClassesMissed for EarlyClass >= 1:  n.miss = 0,  n = 168,  mean = 1.988,  sd = 3.101
## 
## Mean Difference of ClassesMissed:  0.659
## 
## Weighted Average Standard Deviation:   3.232 
## 
## 
## ------ Assumptions ------
## 
## Note: These hypothesis tests can perform poorly, and the 
##       t-test is typically robust to violations of assumptions. 
##       Use as heuristic guides instead of interpreting literally. 
## 
## Null hypothesis, for each group, is a normal distribution of ClassesMissed.
## Group 0: Sample mean assumed normal because n > 30, so no test needed.
## Group >= 1: Sample mean assumed normal because n > 30, so no test needed.
## 
## Null hypothesis is equal variances of ClassesMissed, homogeneous.
## Variance Ratio test:  F = 12.088/9.617 = 1.257,  df = 84;167,  p-value = 0.214
## Levene's test, Brown-Forsythe:  t = 1.373,  df = 251,  p-value = 0.171
## 
## 
## ------ Infer ------
## 
## --- Assume equal population variances of ClassesMissed for each EarlyClass 
## 
## t-cutoff for 95% range of variation: tcut =  1.969 
## Standard Error of Mean Difference: SE =  0.430 
## 
## Hypothesis Test of 0 Mean Diff:  t-value = 1.532,  df = 251,  p-value = 0.127
## 
## Margin of Error for 95% Confidence Level:  0.847
## 95% Confidence Interval for Mean Difference:  -0.188 to 1.506
## 
## 
## --- Do not assume equal population variances of ClassesMissed for each EarlyClass 
## 
## t-cutoff: tcut =  1.976 
## Standard Error of Mean Difference: SE =  0.447 
## 
## Hypothesis Test of 0 Mean Diff:  t = 1.475,  df = 152.779, p-value = 0.142
## 
## Margin of Error for 95% Confidence Level:  0.882
## 95% Confidence Interval for Mean Difference:  -0.223 to 1.541
## 
## 
## ------ Effect Size ------
## 
## --- Assume equal population variances of ClassesMissed for each EarlyClass 
## 
## Standardized Mean Difference of ClassesMissed, Cohen's d:  0.204
## 
## 
## ------ Practical Importance ------
## 
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
## 
## 
## ------ Graphics Smoothing Parameter ------
## 
## Density bandwidth for EarlyClass 0: 1.629
## Density bandwidth for EarlyClass >= 1: 1.044

Students with at least one early class missed slightly fewer classes on average, 1.988, than students without early classes, 2.647. However, the p-value of 0.127 exceeds the threshold of 0.05, so we cannot reject the null hypothesis of no difference; a gap of this size could plausibly arise by chance. Therefore, the difference is not statistically significant.

This test does not support the hypothesis that there is a significant difference in the average number of classes missed in a semester between students who had at least one early class and those who didn’t.
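The confidence interval in the output tells the same story as the p-value. The base R sketch below rebuilds the 95% interval from the reported mean difference and standard error; the variable names are illustrative and the inputs are the rounded values printed in the equal-variance section.

```r
# Values copied from the equal-variance section of the output.
mean_diff <- 0.659   # no early class minus at least one early class
se_diff <- 0.430
df <- 251

# Critical t value for a two-sided 95% interval.
t_cut <- qt(0.975, df)

# Margin of error and confidence interval for the mean difference.
margin <- t_cut * se_diff
ci <- c(lower = mean_diff - margin, upper = mean_diff + margin)

# An interval containing zero corresponds to p > 0.05 in the two-sided test.
contains_zero <- ci[["lower"]] < 0 && ci[["upper"]] > 0

print(round(c(t_cut = t_cut, margin = margin, ci), 3))
print(contains_zero)
```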

Question 5: Happiness Levels Between Different Depression Statuses

Hypothesis: Is there a significant difference in the average happiness level between students with at least moderate depression and those with normal depression status?

We will compare the average happiness levels of students with at least moderate depression and those with normal depression status.

## 
## Compare Happiness across DepressionGroup with levels Normal and AtLeastModerate 
## Grouping Variable:  DepressionGroup
## Response Variable:  Happiness
## 
## 
## ------ Describe ------
## 
## Happiness for DepressionGroup Normal:  n.miss = 0,  n = 209,  mean = 27.057,  sd = 4.885
## Happiness for DepressionGroup AtLeastModerate:  n.miss = 0,  n = 44,  mean = 21.614,  sd = 6.005
## 
## Mean Difference of Happiness:  5.444
## 
## Weighted Average Standard Deviation:   5.094 
## 
## 
## ------ Assumptions ------
## 
## Note: These hypothesis tests can perform poorly, and the 
##       t-test is typically robust to violations of assumptions. 
##       Use as heuristic guides instead of interpreting literally. 
## 
## Null hypothesis, for each group, is a normal distribution of Happiness.
## Group Normal: Sample mean assumed normal because n > 30, so no test needed.
## Group AtLeastModerate: Sample mean assumed normal because n > 30, so no test needed.
## 
## Null hypothesis is equal variances of Happiness, homogeneous.
## Variance Ratio test:  F = 36.057/23.862 = 1.511,  df = 43;208,  p-value = 0.062
## Levene's test, Brown-Forsythe:  t = -2.246,  df = 251,  p-value = 0.026
## 
## 
## ------ Infer ------
## 
## --- Assume equal population variances of Happiness for each DepressionGroup 
## 
## t-cutoff for 95% range of variation: tcut =  1.969 
## Standard Error of Mean Difference: SE =  0.845 
## 
## Hypothesis Test of 0 Mean Diff:  t-value = 6.443,  df = 251,  p-value = 0.000
## 
## Margin of Error for 95% Confidence Level:  1.664
## 95% Confidence Interval for Mean Difference:  3.780 to 7.108
## 
## 
## --- Do not assume equal population variances of Happiness for each DepressionGroup 
## 
## t-cutoff: tcut =  2.004 
## Standard Error of Mean Difference: SE =  0.966 
## 
## Hypothesis Test of 0 Mean Diff:  t = 5.634,  df = 55.594, p-value = 0.000
## 
## Margin of Error for 95% Confidence Level:  1.936
## 95% Confidence Interval for Mean Difference:  3.508 to 7.380
## 
## 
## ------ Effect Size ------
## 
## --- Assume equal population variances of Happiness for each DepressionGroup 
## 
## Standardized Mean Difference of Happiness, Cohen's d:  1.069
## 
## 
## ------ Practical Importance ------
## 
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
## 
## 
## ------ Graphics Smoothing Parameter ------
## 
## Density bandwidth for DepressionGroup Normal: 1.202
## Density bandwidth for DepressionGroup AtLeastModerate: 3.211

The “normal” depression group had an average happiness score of 27.057, substantially higher than the 21.614 average of the “at least moderate” depression group. The mean difference of 5.444 is statistically significant, with a p-value reported as 0.000 (below 0.001) under both the equal-variance and unequal-variance tests.

This test supports the hypothesis that there is a significant difference in the average happiness level between students with at least moderate depression and normal depression status, which is an intuitive result, as students who are more depressed would be expected to be less happy.
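The effect size reported for this question can be reproduced from the summary statistics. The sketch below computes Cohen's d in base R and attaches a verbal label using Cohen's conventional benchmarks (0.2 small, 0.5 medium, 0.8 large). The label_effect helper is a hypothetical name, and the benchmarks are rough guides rather than thresholds used in this report.

```r
# Summary statistics copied from the Describe section of the output.
n1 <- 209; s1 <- 4.885   # Normal
n2 <- 44;  s2 <- 6.005   # AtLeastModerate
mean_diff <- 5.444

# Pooled standard deviation and Cohen's d.
s_pooled <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
cohens_d <- mean_diff / s_pooled

# Hypothetical helper: label |d| with Cohen's conventional benchmarks.
label_effect <- function(d) {
  as.character(cut(abs(d),
                   breaks = c(0, 0.2, 0.5, 0.8, Inf),
                   labels = c("negligible", "small", "medium", "large"),
                   right = FALSE))
}

print(round(cohens_d, 3))
print(label_effect(cohens_d))
```

Applied to the earlier questions, the same helper would label the reported d of 0.512 as medium and 0.174 as negligible.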

Question 6: Sleep Quality Scores and All-Nighters

Hypothesis: Is there a significant difference in average sleep quality scores between students who reported having at least one all-nighter (AllNighter=1) and those who didn’t (AllNighter=0)?

We will compare the average sleep quality scores between students who have pulled at least one all-nighter and those who haven’t.

## 
## Compare PoorSleepQuality across AllNighter with levels 1 and 0 
## Grouping Variable:  AllNighter
## Response Variable:  PoorSleepQuality
## 
## 
## ------ Describe ------
## 
## PoorSleepQuality for AllNighter 1:  n.miss = 0,  n = 34,  mean = 7.029,  sd = 2.823
## PoorSleepQuality for AllNighter 0:  n.miss = 0,  n = 219,  mean = 6.137,  sd = 2.922
## 
## Mean Difference of PoorSleepQuality:  0.892
## 
## Weighted Average Standard Deviation:   2.910 
## 
## 
## ------ Assumptions ------
## 
## Note: These hypothesis tests can perform poorly, and the 
##       t-test is typically robust to violations of assumptions. 
##       Use as heuristic guides instead of interpreting literally. 
## 
## Null hypothesis, for each group, is a normal distribution of PoorSleepQuality.
## Group 1: Sample mean assumed normal because n > 30, so no test needed.
## Group 0: Sample mean assumed normal because n > 30, so no test needed.
## 
## Null hypothesis is equal variances of PoorSleepQuality, homogeneous.
## Variance Ratio test:  F = 8.541/7.969 = 1.072,  df = 218;33,  p-value = 0.846
## Levene's test, Brown-Forsythe:  t = 0.279,  df = 251,  p-value = 0.780
## 
## 
## ------ Infer ------
## 
## --- Assume equal population variances of PoorSleepQuality for each AllNighter 
## 
## t-cutoff for 95% range of variation: tcut =  1.969 
## Standard Error of Mean Difference: SE =  0.536 
## 
## Hypothesis Test of 0 Mean Diff:  t-value = 1.664,  df = 251,  p-value = 0.097
## 
## Margin of Error for 95% Confidence Level:  1.056
## 95% Confidence Interval for Mean Difference:  -0.164 to 1.949
## 
## 
## --- Do not assume equal population variances of PoorSleepQuality for each AllNighter 
## 
## t-cutoff: tcut =  2.014 
## Standard Error of Mean Difference: SE =  0.523 
## 
## Hypothesis Test of 0 Mean Diff:  t = 1.707,  df = 44.708, p-value = 0.095
## 
## Margin of Error for 95% Confidence Level:  1.053
## 95% Confidence Interval for Mean Difference:  -0.161 to 1.946
## 
## 
## ------ Effect Size ------
## 
## --- Assume equal population variances of PoorSleepQuality for each AllNighter 
## 
## Standardized Mean Difference of PoorSleepQuality, Cohen's d:  0.307
## 
## 
## ------ Practical Importance ------
## 
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
## 
## 
## ------ Graphics Smoothing Parameter ------
## 
## Density bandwidth for AllNighter 1: 1.589
## Density bandwidth for AllNighter 0: 0.936

Students who pulled at least one all-nighter had a slightly higher mean PoorSleepQuality score, 7.029, than students who did not, 6.137. Because higher values indicate poorer sleep, this points to slightly worse sleep quality in the all-nighter group. However, the p-value of 0.097 is above the threshold of 0.05, so we cannot reject the null hypothesis of no difference between the two groups.

This test does not support the hypothesis that there is a significant difference in average sleep quality scores between students who reported having at least one all-nighter and those who didn’t.

Question 7: Stress Scores and Alcohol Use

Hypothesis: Do students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use?

We will compare the stress scores between the students who abstain from alcohol and those who report heavy alcohol usage.

## 
## Compare StressScore across AlcoholGroup with levels Heavy and Abstain 
## Grouping Variable:  AlcoholGroup
## Response Variable:  StressScore
## 
## 
## ------ Describe ------
## 
## StressScore for AlcoholGroup Heavy:  n.miss = 0,  n = 16,  mean = 10.438,  sd = 7.797
## StressScore for AlcoholGroup Abstain:  n.miss = 0,  n = 34,  mean = 8.971,  sd = 7.582
## 
## Mean Difference of StressScore:  1.467
## 
## Weighted Average Standard Deviation:   7.650 
## 
## 
## ------ Assumptions ------
## 
## Note: These hypothesis tests can perform poorly, and the 
##       t-test is typically robust to violations of assumptions. 
##       Use as heuristic guides instead of interpreting literally. 
## 
## Null hypothesis, for each group, is a normal distribution of StressScore.
## Group Heavy  Shapiro-Wilk normality test:  W = 0.961,  p-value = 0.687
## Group Abstain: Sample mean assumed normal because n > 30, so no test needed.
## 
## Null hypothesis is equal variances of StressScore, homogeneous.
## Variance Ratio test:  F = 60.796/57.484 = 1.058,  df = 15;33,  p-value = 0.856
## Levene's test, Brown-Forsythe:  t = 0.347,  df = 48,  p-value = 0.730
## 
## 
## ------ Infer ------
## 
## --- Assume equal population variances of StressScore for each AlcoholGroup 
## 
## t-cutoff for 95% range of variation: tcut =  2.011 
## Standard Error of Mean Difference: SE =  2.319 
## 
## Hypothesis Test of 0 Mean Diff:  t-value = 0.633,  df = 48,  p-value = 0.530
## 
## Margin of Error for 95% Confidence Level:  4.663
## 95% Confidence Interval for Mean Difference:  -3.196 to 6.130
## 
## 
## --- Do not assume equal population variances of StressScore for each AlcoholGroup 
## 
## t-cutoff: tcut =  2.046 
## Standard Error of Mean Difference: SE =  2.343 
## 
## Hypothesis Test of 0 Mean Diff:  t = 0.626,  df = 28.733, p-value = 0.536
## 
## Margin of Error for 95% Confidence Level:  4.794
## 95% Confidence Interval for Mean Difference:  -3.327 to 6.261
## 
## 
## ------ Effect Size ------
## 
## --- Assume equal population variances of StressScore for each AlcoholGroup 
## 
## Standardized Mean Difference of StressScore, Cohen's d:  0.192
## 
## 
## ------ Practical Importance ------
## 
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
## 
## 
## ------ Graphics Smoothing Parameter ------
## 
## Density bandwidth for AlcoholGroup Heavy: 5.096
## Density bandwidth for AlcoholGroup Abstain: 4.268

The heavy alcohol use group has a slightly higher average stress score than the abstaining group, 10.438 versus 8.971, but with a p-value of 0.530 the difference is not statistically significant. The heavy use group is also small (n = 16), so the 95% confidence interval for the difference is wide, -3.196 to 6.130.

This test does not support the hypothesis that students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use.

Question 8: Number of Drinks Per Week Between Genders

Hypothesis: Is there a significant difference in the average number of drinks per week between students of different genders?

We will compare the average number of drinks per week between male and female students.

## 
## Compare Drinks across Gender with levels Male and Female 
## Grouping Variable:  Gender
## Response Variable:  Drinks
## 
## 
## ------ Describe ------
## 
## Drinks for Gender Male:  n.miss = 0,  n = 102,  mean = 7.539,  sd = 4.929
## Drinks for Gender Female:  n.miss = 0,  n = 151,  mean = 4.238,  sd = 2.720
## 
## Mean Difference of Drinks:  3.301
## 
## Weighted Average Standard Deviation:   3.768 
## 
## 
## ------ Assumptions ------
## 
## Note: These hypothesis tests can perform poorly, and the 
##       t-test is typically robust to violations of assumptions. 
##       Use as heuristic guides instead of interpreting literally. 
## 
## Null hypothesis, for each group, is a normal distribution of Drinks.
## Group Male: Sample mean assumed normal because n > 30, so no test needed.
## Group Female: Sample mean assumed normal because n > 30, so no test needed.
## 
## Null hypothesis is equal variances of Drinks, homogeneous.
## Variance Ratio test:  F = 24.291/7.396 = 3.284,  df = 101;150,  p-value = 0.000
## Levene's test, Brown-Forsythe:  t = 5.471,  df = 251,  p-value = 0.000
## 
## 
## ------ Infer ------
## 
## --- Assume equal population variances of Drinks for each Gender 
## 
## t-cutoff for 95% range of variation: tcut =  1.969 
## Standard Error of Mean Difference: SE =  0.483 
## 
## Hypothesis Test of 0 Mean Diff:  t-value = 6.836,  df = 251,  p-value = 0.000
## 
## Margin of Error for 95% Confidence Level:  0.951
## 95% Confidence Interval for Mean Difference:  2.350 to 4.252
## 
## 
## --- Do not assume equal population variances of Drinks for each Gender 
## 
## t-cutoff: tcut =  1.977 
## Standard Error of Mean Difference: SE =  0.536 
## 
## Hypothesis Test of 0 Mean Diff:  t = 6.160,  df = 142.754, p-value = 0.000
## 
## Margin of Error for 95% Confidence Level:  1.059
## 95% Confidence Interval for Mean Difference:  2.242 to 4.360
## 
## 
## ------ Effect Size ------
## 
## --- Assume equal population variances of Drinks for each Gender 
## 
## Standardized Mean Difference of Drinks, Cohen's d:  0.876
## 
## 
## ------ Practical Importance ------
## 
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
## 
## 
## ------ Graphics Smoothing Parameter ------
## 
## Density bandwidth for Gender Male: 2.227
## Density bandwidth for Gender Female: 1.136

The average number of drinks per week for male students was 7.539, substantially higher than the female student average of 4.238.

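The standard deviation for male students is 4.929, much higher than the 2.720 for female students. This indicates much greater variability among male students, while the female students’ values are clustered more closely together. Both the variance ratio test and Levene’s test reject equal variances (p-values reported as 0.000), so the unequal-variance result is the appropriate one to read: t = 6.160, df = 142.754, with a p-value below 0.001.

The test supports the hypothesis that there is a significant difference in the average number of drinks per week between students of different genders, with male students drinking significantly more per week.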
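The variance ratio test in the output is what signals that the two groups differ in spread. The base R sketch below reproduces the F ratio from the reported variances. The doubling of the upper-tail probability for a two-sided p-value is an assumed convention and may not be exactly how lessR computes it; the variable names are illustrative.

```r
# Group variances and sample sizes copied from the output.
var_male <- 24.291;  n_male <- 102
var_female <- 7.396; n_female <- 151

# F ratio with the larger variance in the numerator.
f_ratio <- var_male / var_female

# Two-sided p-value: double the upper-tail probability (assumed convention).
p_value <- min(1, 2 * pf(f_ratio, n_male - 1, n_female - 1,
                         lower.tail = FALSE))

print(round(f_ratio, 3))
print(p_value < 0.001)
```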

Question 9: Weekday Bedtime Between Different Stress Levels

Hypothesis: Is there a significant difference in the average weekday bedtime between students with high and normal stress levels (Stress=High vs. Stress=Normal)?

We will compare the average weekday bedtime between students with high stress and those with normal stress levels. A bedtime of 24 represents midnight.
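Because bedtimes are stored as decimal hours that run past 24, they can be hard to read at a glance. The following base R sketch converts such values to clock times; to_clock is a hypothetical helper introduced for illustration and is not part of the original analysis.

```r
# Hypothetical helper: convert decimal hours (24.0 = midnight) to "HH:MM".
to_clock <- function(x) {
  total_min <- as.integer(round(x * 60))   # total minutes since 00:00
  hours <- (total_min %/% 60L) %% 24L      # wrap values of 24 or more
  minutes <- total_min %% 60L
  sprintf("%02d:%02d", hours, minutes)
}

print(to_clock(c(24.0, 25.5, 23.25)))
# "00:00" "01:30" "23:15"
```

A group mean of 24.885, for example, corresponds to roughly 12:53 AM.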

## 
## Compare WeekdayBed across Stress with levels normal and high 
## Grouping Variable:  Stress
## Response Variable:  WeekdayBed
## 
## 
## ------ Describe ------
## 
## WeekdayBed for Stress normal:  n.miss = 0,  n = 197,  mean = 24.885,  sd = 1.028
## WeekdayBed for Stress high:  n.miss = 0,  n = 56,  mean = 24.715,  sd = 1.053
## 
## Mean Difference of WeekdayBed:  0.170
## 
## Weighted Average Standard Deviation:   1.033 
## 
## 
## ------ Assumptions ------
## 
## Note: These hypothesis tests can perform poorly, and the 
##       t-test is typically robust to violations of assumptions. 
##       Use as heuristic guides instead of interpreting literally. 
## 
## Null hypothesis, for each group, is a normal distribution of WeekdayBed.
## Group normal: Sample mean assumed normal because n > 30, so no test needed.
## Group high: Sample mean assumed normal because n > 30, so no test needed.
## 
## Null hypothesis is equal variances of WeekdayBed, homogeneous.
## Variance Ratio test:  F = 1.108/1.056 = 1.049,  df = 55;196,  p-value = 0.792
## Levene's test, Brown-Forsythe:  t = -0.054,  df = 251,  p-value = 0.957
## 
## 
## ------ Infer ------
## 
## --- Assume equal population variances of WeekdayBed for each Stress 
## 
## t-cutoff for 95% range of variation: tcut =  1.969 
## Standard Error of Mean Difference: SE =  0.156 
## 
## Hypothesis Test of 0 Mean Diff:  t-value = 1.089,  df = 251,  p-value = 0.277
## 
## Margin of Error for 95% Confidence Level:  0.308
## 95% Confidence Interval for Mean Difference:  -0.138 to 0.479
## 
## 
## --- Do not assume equal population variances of WeekdayBed for each Stress 
## 
## t-cutoff: tcut =  1.988 
## Standard Error of Mean Difference: SE =  0.159 
## 
## Hypothesis Test of 0 Mean Diff:  t = 1.075,  df = 87.048, p-value = 0.286
## 
## Margin of Error for 95% Confidence Level:  0.315
## 95% Confidence Interval for Mean Difference:  -0.145 to 0.486
## 
## 
## ------ Effect Size ------
## 
## --- Assume equal population variances of WeekdayBed for each Stress 
## 
## Standardized Mean Difference of WeekdayBed, Cohen's d:  0.165
## 
## 
## ------ Practical Importance ------
## 
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
## 
## 
## ------ Graphics Smoothing Parameter ------
## 
## Density bandwidth for Stress normal: 0.407
## Density bandwidth for Stress high: 0.536

The average weekday bedtime was 24.715 (12:43 AM) for high-stress students and 24.885 (12:53 AM) for normal-stress students, a difference of about 10 minutes. Their standard deviations, 1.053 and 1.028 respectively, are also similar, indicating comparable variability in the two groups. The p-value of 0.277 is above the 0.05 threshold, so we fail to reject the null hypothesis; the difference is not statistically significant.

This test does not support the hypothesis that there is a significant difference in the average weekday bedtime between students with high and normal stress.
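The test statistics above can be approximately reproduced from the group summaries alone. The following Python sketch (the analysis itself was run in R) applies the standard pooled and Welch formulas to the rounded means, standard deviations, and sample sizes from the Describe section; the variable names are illustrative, and small differences in the third decimal place come from rounding of the inputs.

```python
import math

# Summary statistics copied from the Describe section of the output.
n1, mean1, sd1 = 197, 24.885, 1.028   # Stress = normal
n2, mean2, sd2 = 56, 24.715, 1.053    # Stress = high

diff = mean1 - mean2

# Equal-variance (pooled) t statistic.
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_pooled = diff / se_pooled

# Unequal-variance (Welch) t statistic with Welch-Satterthwaite df.
v1, v2 = sd1**2 / n1, sd2**2 / n2
se_welch = math.sqrt(v1 + v2)
t_welch = diff / se_welch
df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

print(f"pooled: SE = {se_pooled:.3f}, t = {t_pooled:.3f}, df = {n1 + n2 - 2}")
print(f"Welch:  SE = {se_welch:.3f}, t = {t_welch:.3f}, df = {df_welch:.3f}")

# Compare each |t| with the t-cutoff reported in the output.
print("reject H0 (pooled):", abs(t_pooled) > 1.969)
print("reject H0 (Welch): ", abs(t_welch) > 1.988)
```

Both statistics fall well below their cutoffs, which is the same decision the p-values of 0.277 and 0.286 lead to.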

Question 10: Weekend Sleep Hours Between Class Years

Hypothesis: Is there a significant difference in the average hours of sleep on weekends between students in their first two years and other students?

We will compare the average hours of weekend sleep of first-year and second-year students with that of students in other years.

## 
## Compare WeekendSleep across YearGroup with levels OtherYears and FirstTwoYears 
## Grouping Variable:  YearGroup
## Response Variable:  WeekendSleep
## 
## 
## ------ Describe ------
## 
## WeekendSleep for YearGroup OtherYears:  n.miss = 0,  n = 111,  mean = 8.222,  sd = 1.363
## WeekendSleep for YearGroup FirstTwoYears:  n.miss = 0,  n = 142,  mean = 8.214,  sd = 1.374
## 
## Mean Difference of WeekendSleep:  0.008
## 
## Weighted Average Standard Deviation:   1.369 
## 
## 
## ------ Assumptions ------
## 
## Note: These hypothesis tests can perform poorly, and the 
##       t-test is typically robust to violations of assumptions. 
##       Use as heuristic guides instead of interpreting literally. 
## 
## Null hypothesis, for each group, is a normal distribution of WeekendSleep.
## Group OtherYears: Sample mean assumed normal because n > 30, so no test needed.
## Group FirstTwoYears: Sample mean assumed normal because n > 30, so no test needed.
## 
## Null hypothesis is equal variances of WeekendSleep, homogeneous.
## Variance Ratio test:  F = 1.889/1.858 = 1.017,  df = 141;110,  p-value = 0.933
## Levene's test, Brown-Forsythe:  t = -0.497,  df = 251,  p-value = 0.619
## 
## 
## ------ Infer ------
## 
## --- Assume equal population variances of WeekendSleep for each YearGroup 
## 
## t-cutoff for 95% range of variation: tcut =  1.969 
## Standard Error of Mean Difference: SE =  0.174 
## 
## Hypothesis Test of 0 Mean Diff:  t-value = 0.048,  df = 251,  p-value = 0.962
## 
## Margin of Error for 95% Confidence Level:  0.342
## 95% Confidence Interval for Mean Difference:  -0.333 to 0.350
## 
## 
## --- Do not assume equal population variances of WeekendSleep for each YearGroup 
## 
## t-cutoff: tcut =  1.970 
## Standard Error of Mean Difference: SE =  0.173 
## 
## Hypothesis Test of 0 Mean Diff:  t = 0.048,  df = 237.363, p-value = 0.962
## 
## Margin of Error for 95% Confidence Level:  0.341
## 95% Confidence Interval for Mean Difference:  -0.333 to 0.350
## 
## 
## ------ Effect Size ------
## 
## --- Assume equal population variances of WeekendSleep for each YearGroup 
## 
## Standardized Mean Difference of WeekendSleep, Cohen's d:  0.006
## 
## 
## ------ Practical Importance ------
## 
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
## 
## 
## ------ Graphics Smoothing Parameter ------
## 
## Density bandwidth for YearGroup OtherYears: 0.606
## Density bandwidth for YearGroup FirstTwoYears: 0.581

The average weekend sleep was nearly identical for the two groups in this dataset, about 8.2 hours (8.214 for students in their first two years and 8.222 for other students). The p-value of 0.962 is far above the 0.05 threshold, indicating no statistically significant difference between the groups.

This test does not support the hypothesis that there is a significant difference in the average hours of sleep on weekends between students in their first two years and other students.
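The effect size reported in the output shows how small this difference is in practical terms. As an illustration, the Python sketch below (not the R code used for the analysis; variable names are illustrative) recomputes the pooled standard deviation and Cohen's d from the rounded group summaries.

```python
import math

# Summary statistics copied from the Describe section of the output.
n_other, mean_other, sd_other = 111, 8.222, 1.363   # YearGroup = OtherYears
n_first, mean_first, sd_first = 142, 8.214, 1.374   # YearGroup = FirstTwoYears

# Pooled (weighted average) standard deviation across the two groups.
pooled_sd = math.sqrt(
    ((n_other - 1) * sd_other**2 + (n_first - 1) * sd_first**2)
    / (n_other + n_first - 2)
)

# Cohen's d: the mean difference expressed in pooled-SD units.
cohens_d = (mean_other - mean_first) / pooled_sd

print(f"pooled SD = {pooled_sd:.3f}")
print(f"Cohen's d = {cohens_d:.3f}")
```

A standardized difference this close to zero is far below the 0.2 commonly used as a rule of thumb for a small effect.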

Summary

In this report, we conducted statistical analyses to address ten research questions concerning sleep patterns, academic performance, and well-being among college students. The key findings are as follows:

GPA and Gender: Female students had a slightly higher average GPA than male students, by 0.201 points. The difference was statistically significant.
Early Classes and Class Year: Students in the first two years had more early classes, averaging 2.070, compared to students in other years, averaging 1.306. The difference was statistically significant.
Cognitive Skills and Lark/Owl Preference: There was not a statistically significant difference in cognitive skills between “larks” and “owls.”
Classes Missed and Early Classes: Students with early classes missed slightly fewer classes on average, 1.988, compared to students without early classes, 2.647, but the difference was not statistically significant.
Happiness and Depression Status: Students with normal depression status reported higher average happiness levels, 27.06, compared to students with moderate to severe depression, 21.61. The difference was statistically significant.
Sleep Quality and All-Nighters: Students who pulled all-nighters had an average sleep score of 6.137, lower than those who didn’t, 7.029, but the difference was not statistically significant.
Stress Scores and Alcohol Use: Students who abstained from alcohol had an average stress score of 8.971, slightly lower than heavy alcohol users with an average stress score of 10.438, but the difference was not statistically significant.
Alcohol Consumption and Gender: Male students consumed an average of 7.539 drinks per week, while female students consumed an average of 4.238 drinks per week, meaning male students averaged 3.301 more per week. The standard deviations were 4.929 for males and 2.720 for females, showing greater variability among male students. The results were statistically significant.
Weekday Bedtime and Stress Levels: Students with high stress levels had an average weekday bedtime of 24.715 (12:43 AM), while those with normal stress levels had an average bedtime of 24.885 (12:53 AM). The difference was not statistically significant.
Weekend Sleep and Class Year: Students in the first two class years averaged 8.214 hours of weekend sleep, while those in other class years averaged 8.222 hours. The difference was not statistically significant.

The results of this analysis offer insights into the factors that affect the health, happiness, and academic performance of college students.

References

Lock5 Datasets, Third Edition. “SleepStudy Dataset.” Accessed Dec. 1, 2024. Available at: https://www.lock5stat.com/datapage3e.html

STAT 353 Project 2: Analyzing College Student Sleep Data

Kent Biernath

2024-12-01