1. Introduction

We use the data from ….

I propose the following 10 questions based on this data.

2. Analysis

We will explore the questions in detail.

library(lessR)
## 
## lessR 4.3.8                         feedback: gerbing@pdx.edu 
## --------------------------------------------------------------
## > d <- Read("")   Read text, Excel, SPSS, SAS, or R data file
##   d is default data frame, data= in analysis routines optional
## 
## Many examples of reading, writing, and manipulating data, 
## graphics, testing means and proportions, regression, factor analysis,
## customization, and descriptive statistics from pivot tables
##   Enter: browseVignettes("lessR")
## 
## View lessR updates, now including time series forecasting
##   Enter: news(package="lessR")
## 
## Interactive data analysis
##   Enter: interact()
## 
## Attaching package: 'lessR'
## The following object is masked from 'package:base':
## 
##     sort_by
sleep = read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")
head(sleep)
##   Gender ClassYear LarkOwl NumEarlyClass EarlyClass  GPA ClassesMissed
## 1      0         4 Neither             0          0 3.60             0
## 2      0         4 Neither             2          1 3.24             0
## 3      0         4     Owl             0          0 2.97            12
## 4      0         1    Lark             5          1 3.76             0
## 5      0         4     Owl             0          0 3.20             4
## 6      1         4 Neither             0          0 3.50             0
##   CognitionZscore PoorSleepQuality DepressionScore AnxietyScore StressScore
## 1           -0.26                4               4            3           8
## 2            1.39                6               1            0           3
## 3            0.38               18              18           18           9
## 4            1.39                9               1            4           6
## 5            1.22                9               7           25          14
## 6           -0.04                6              14            8          28
##   DepressionStatus AnxietyStatus Stress DASScore Happiness AlcoholUse Drinks
## 1           normal        normal normal       15        28   Moderate     10
## 2           normal        normal normal        4        25   Moderate      6
## 3         moderate        severe normal       45        17      Light      3
## 4           normal        normal normal       11        32      Light      2
## 5           normal        severe normal       46        15   Moderate      4
## 6         moderate      moderate   high       50        22    Abstain      0
##   WeekdayBed WeekdayRise WeekdaySleep WeekendBed WeekendRise WeekendSleep
## 1      25.75        8.70         7.70      25.75        9.50         5.88
## 2      25.70        8.20         6.80      26.00       10.00         7.25
## 3      27.44        6.55         3.00      28.00       12.59        10.09
## 4      23.50        7.17         6.77      27.00        8.00         7.25
## 5      25.90        8.67         6.09      23.75        9.50         7.00
## 6      23.80        8.95         9.05      26.00       10.75         9.00
##   AverageSleep AllNighter
## 1         7.18          0
## 2         6.93          0
## 3         5.02          0
## 4         6.90          0
## 5         6.35          0
## 6         9.04          0
sleep$AlcoholUseGroup <- ifelse(sleep$AlcoholUse == "Abstain" | sleep$AlcoholUse == "Light", "Low Use", "High Use")

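# Group class years: first two years (ClassYear 1 or 2) vs. the other years.
# Stored in a new column so the original ClassYear values are not overwritten.
sleep$ClassYearGroup <- ifelse(sleep$ClassYear <= 2, "First Two Years", "Other Years")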


survey = read.csv("https://www.lock5stat.com/datasets3e/NHANES.csv")
head(survey)
##   Case Organic    Health          HealthBinary Income
## 1    1      No      Good    Poor / Fair / Good 3324.5
## 2    2      No      Fair    Poor / Fair / Good 1024.0
## 3    3     Yes      Good    Poor / Fair / Good 2500.0
## 4    4      No Excellent Very good / Excellent 1450.0
## 5    5      No      Good    Poor / Fair / Good 1450.0
## 6    6      No      Good    Poor / Fair / Good 5824.0

Q1. Is there a significant difference in the average GPA between male and female college students?

# Perform the 2-sample t-test for comparing means
ttest(GPA~Gender, data=sleep, alternative="two_sided")
## 
## Compare GPA across Gender with levels 0 and 1 
## Grouping Variable:  Gender
## Response Variable:  GPA
## 
## 
## ------ Describe ------
## 
## GPA for Gender 0:  n.miss = 0,  n = 151,  mean = 3.325,  sd = 0.375
## GPA for Gender 1:  n.miss = 0,  n = 102,  mean = 3.124,  sd = 0.418
## 
## Mean Difference of GPA:  0.201
## 
## Weighted Average Standard Deviation:   0.393 
## 
## 
## ------ Assumptions ------
## 
## Note: These hypothesis tests can perform poorly, and the 
##       t-test is typically robust to violations of assumptions. 
##       Use as heuristic guides instead of interpreting literally. 
## 
## Null hypothesis, for each group, is a normal distribution of GPA.
## Group 0: Sample mean assumed normal because n > 30, so no test needed.
## Group 1: Sample mean assumed normal because n > 30, so no test needed.
## 
## Null hypothesis is equal variances of GPA, homogeneous.
## Variance Ratio test:  F = 0.174/0.141 = 1.240,  df = 101;150,  p-value = 0.232
## Levene's test, Brown-Forsythe:  t = -1.879,  df = 251,  p-value = 0.061
## 
## 
## ------ Infer ------
## 
## --- Assume equal population variances of GPA for each Gender 
## 
## t-cutoff for 95% range of variation: tcut =  1.969 
## Standard Error of Mean Difference: SE =  0.050 
## 
## Hypothesis Test of 0 Mean Diff:  t-value = 3.996,  df = 251,  p-value = 0.000
## 
## Margin of Error for 95% Confidence Level:  0.099
## 95% Confidence Interval for Mean Difference:  0.102 to 0.300
## 
## 
## --- Do not assume equal population variances of GPA for each Gender 
## 
## t-cutoff: tcut =  1.972 
## Standard Error of Mean Difference: SE =  0.051 
## 
## Hypothesis Test of 0 Mean Diff:  t = 3.914,  df = 200.902, p-value = 0.000
## 
## Margin of Error for 95% Confidence Level:  0.101
## 95% Confidence Interval for Mean Difference:  0.100 to 0.303
## 
## 
## ------ Effect Size ------
## 
## --- Assume equal population variances of GPA for each Gender 
## 
## Standardized Mean Difference of GPA, Cohen's d:  0.512
## 
## 
## ------ Practical Importance ------
## 
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
## 
## 
## ------ Graphics Smoothing Parameter ------
## 
## Density bandwidth for Gender 0: 0.154
## Density bandwidth for Gender 1: 0.189

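Yes. The t-test shows a significant difference in mean GPA between the two Gender groups (t = 3.996, df = 251, p-value < 0.001). Students coded Gender = 0 have a mean GPA of 3.325 versus 3.124 for students coded Gender = 1, and the 95% confidence interval for the mean difference is 0.102 to 0.300.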

Q2. Is there a significant difference in the average number of early classes between the first two class years and other class years?

hist(sleep$NumEarlyClass)

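The histogram shows only the overall distribution of NumEarlyClass. It does not split students by class year, so it cannot show whether the first two class years differ significantly from the other years. Answering this question requires a two-sample t-test of NumEarlyClass on a first-two-years versus other-years grouping.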
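A two-sample comparison for Q2 could be sketched as follows. This is only an illustration under stated assumptions: it uses base R's `t.test` (Welch's unequal-variance test by default) rather than lessR's `ttest`, and `YearGroup` is a hypothetical helper column that is not part of the dataset.

```r
# Load the same SleepStudy data used throughout this report
sleep <- read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")

# YearGroup is a hypothetical helper column: ClassYear 1-2 vs. ClassYear 3-4
sleep$YearGroup <- ifelse(sleep$ClassYear <= 2, "First two years", "Other years")

# Welch two-sample t-test of the number of early classes across the two groups
q2 <- t.test(NumEarlyClass ~ YearGroup, data = sleep)
print(q2)
```

The p-value in `q2$p.value` would then be compared against the chosen significance level (for example 0.05).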

Q3. Do students who identify as “larks” have significantly better cognitive skills (cognition z-score) compared to “owls”?

median(sleep$LarkOwl)
## [1] "Neither"

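The median of LarkOwl only reports that “Neither” is the middle category; it does not involve CognitionZscore, so it cannot answer this question. The comparison needs a t-test of CognitionZscore restricted to students classified as Lark or Owl.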
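One possible way to set up the Q3 comparison is sketched below. It assumes base R's `t.test` in place of lessR's `ttest`, drops the "Neither" students, and treats "better" as a one-sided alternative; `lark_owl` is a hypothetical name for the subset.

```r
# Load the same SleepStudy data used throughout this report
sleep <- read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")

# Keep only Larks and Owls; "Neither" is outside the scope of the question
lark_owl <- subset(sleep, LarkOwl %in% c("Lark", "Owl"))

# Groups are ordered alphabetically (Lark, then Owl), so alternative = "greater"
# tests whether mean cognition for Larks exceeds mean cognition for Owls
q3 <- t.test(CognitionZscore ~ LarkOwl, data = lark_owl, alternative = "greater")
print(q3)
```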

Q4. Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class (EarlyClass=1) and those who didn’t (EarlyClass=0)?

# Perform the 2-sample t-test for comparing means
ttest(ClassesMissed~EarlyClass, data=sleep, alternative="two_sided")
## 
## Compare ClassesMissed across EarlyClass with levels 0 and 1 
## Grouping Variable:  EarlyClass
## Response Variable:  ClassesMissed
## 
## 
## ------ Describe ------
## 
## ClassesMissed for EarlyClass 0:  n.miss = 0,  n = 85,  mean = 2.647,  sd = 3.477
## ClassesMissed for EarlyClass 1:  n.miss = 0,  n = 168,  mean = 1.988,  sd = 3.101
## 
## Mean Difference of ClassesMissed:  0.659
## 
## Weighted Average Standard Deviation:   3.232 
## 
## 
## ------ Assumptions ------
## 
## Note: These hypothesis tests can perform poorly, and the 
##       t-test is typically robust to violations of assumptions. 
##       Use as heuristic guides instead of interpreting literally. 
## 
## Null hypothesis, for each group, is a normal distribution of ClassesMissed.
## Group 0: Sample mean assumed normal because n > 30, so no test needed.
## Group 1: Sample mean assumed normal because n > 30, so no test needed.
## 
## Null hypothesis is equal variances of ClassesMissed, homogeneous.
## Variance Ratio test:  F = 12.088/9.617 = 1.257,  df = 84;167,  p-value = 0.214
## Levene's test, Brown-Forsythe:  t = 1.373,  df = 251,  p-value = 0.171
## 
## 
## ------ Infer ------
## 
## --- Assume equal population variances of ClassesMissed for each EarlyClass 
## 
## t-cutoff for 95% range of variation: tcut =  1.969 
## Standard Error of Mean Difference: SE =  0.430 
## 
## Hypothesis Test of 0 Mean Diff:  t-value = 1.532,  df = 251,  p-value = 0.127
## 
## Margin of Error for 95% Confidence Level:  0.847
## 95% Confidence Interval for Mean Difference:  -0.188 to 1.506
## 
## 
## --- Do not assume equal population variances of ClassesMissed for each EarlyClass 
## 
## t-cutoff: tcut =  1.976 
## Standard Error of Mean Difference: SE =  0.447 
## 
## Hypothesis Test of 0 Mean Diff:  t = 1.475,  df = 152.779, p-value = 0.142
## 
## Margin of Error for 95% Confidence Level:  0.882
## 95% Confidence Interval for Mean Difference:  -0.223 to 1.541
## 
## 
## ------ Effect Size ------
## 
## --- Assume equal population variances of ClassesMissed for each EarlyClass 
## 
## Standardized Mean Difference of ClassesMissed, Cohen's d:  0.204
## 
## 
## ------ Practical Importance ------
## 
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
## 
## 
## ------ Graphics Smoothing Parameter ------
## 
## Density bandwidth for EarlyClass 0: 1.629
## Density bandwidth for EarlyClass 1: 1.044

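No, there isn’t a significant difference in the average number of classes missed between students who had an early class and those who didn’t (t = 1.532, df = 251, p-value = 0.127). The 95% confidence interval for the mean difference, -0.188 to 1.506, includes 0.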

Q5. Is there a significant difference in the average happiness level between students with at least moderate depression and normal depression status?

hist(sleep$Happiness)

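The histogram shows the overall distribution of Happiness but does not separate students by depression status, so it cannot show whether the difference in average happiness is significant. Answering this question requires a two-sample t-test of Happiness between students with normal depression status and students with at least moderate depression.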
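The Q5 comparison might be set up along these lines. The sketch assumes base R's `t.test` instead of lessR's `ttest`, and it assumes that every DepressionStatus value other than "normal" counts as "at least moderate"; `DepGroup` is a hypothetical helper column.

```r
# Load the same SleepStudy data used throughout this report
sleep <- read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")

# DepGroup is a hypothetical helper column: "normal" vs. everything else
sleep$DepGroup <- ifelse(sleep$DepressionStatus == "normal",
                         "Normal", "At least moderate")

# Welch two-sample t-test of happiness across the two depression groups
q5 <- t.test(Happiness ~ DepGroup, data = sleep)
print(q5)
```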

Q6. Is there a significant difference in average sleep quality scores between students who reported having at least one all-nighter (AllNighter = 1) and those who didn’t (AllNighter = 0)?

ttest(AverageSleep~AllNighter, data=sleep, alternative="two_sided")
## 
## Compare AverageSleep across AllNighter with levels 0 and 1 
## Grouping Variable:  AllNighter
## Response Variable:  AverageSleep
## 
## 
## ------ Describe ------
## 
## AverageSleep for AllNighter 0:  n.miss = 0,  n = 219,  mean = 8.074,  sd = 0.916
## AverageSleep for AllNighter 1:  n.miss = 0,  n = 34,  mean = 7.271,  sd = 0.994
## 
## Mean Difference of AverageSleep:  0.803
## 
## Weighted Average Standard Deviation:   0.927 
## 
## 
## ------ Assumptions ------
## 
## Note: These hypothesis tests can perform poorly, and the 
##       t-test is typically robust to violations of assumptions. 
##       Use as heuristic guides instead of interpreting literally. 
## 
## Null hypothesis, for each group, is a normal distribution of AverageSleep.
## Group 0: Sample mean assumed normal because n > 30, so no test needed.
## Group 1: Sample mean assumed normal because n > 30, so no test needed.
## 
## Null hypothesis is equal variances of AverageSleep, homogeneous.
## Variance Ratio test:  F = 0.988/0.840 = 1.177,  df = 33;218,  p-value = 0.489
## Levene's test, Brown-Forsythe:  t = -0.815,  df = 251,  p-value = 0.416
## 
## 
## ------ Infer ------
## 
## --- Assume equal population variances of AverageSleep for each AllNighter 
## 
## t-cutoff for 95% range of variation: tcut =  1.969 
## Standard Error of Mean Difference: SE =  0.171 
## 
## Hypothesis Test of 0 Mean Diff:  t-value = 4.698,  df = 251,  p-value = 0.000
## 
## Margin of Error for 95% Confidence Level:  0.336
## 95% Confidence Interval for Mean Difference:  0.466 to 1.139
## 
## 
## --- Do not assume equal population variances of AverageSleep for each AllNighter 
## 
## t-cutoff: tcut =  2.018 
## Standard Error of Mean Difference: SE =  0.181 
## 
## Hypothesis Test of 0 Mean Diff:  t = 4.426,  df = 42.171, p-value = 0.000
## 
## Margin of Error for 95% Confidence Level:  0.366
## 95% Confidence Interval for Mean Difference:  0.437 to 1.169
## 
## 
## ------ Effect Size ------
## 
## --- Assume equal population variances of AverageSleep for each AllNighter 
## 
## Standardized Mean Difference of AverageSleep, Cohen's d:  0.866
## 
## 
## ------ Practical Importance ------
## 
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
## 
## 
## ------ Graphics Smoothing Parameter ------
## 
## Density bandwidth for AllNighter 0: 0.333
## Density bandwidth for AllNighter 1: 0.559

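There is a significant difference, but in average sleep rather than sleep quality, because the test above uses AverageSleep. Students who reported at least one all-nighter have a mean AverageSleep of 7.271 versus 8.074 for those who did not (t = 4.698, df = 251, p-value < 0.001). Sleep quality itself is recorded in PoorSleepQuality, which this test does not examine.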
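To address sleep quality directly, the same comparison could be run on PoorSleepQuality. This is a sketch only: it uses base R's `t.test` rather than lessR's `ttest`, and it rests on my understanding that higher PoorSleepQuality values indicate poorer sleep, which should be checked against the dataset description.

```r
# Load the same SleepStudy data used throughout this report
sleep <- read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")

# Welch two-sample t-test of the sleep quality score across AllNighter (0 vs. 1)
q6 <- t.test(PoorSleepQuality ~ AllNighter, data = sleep)
print(q6)
```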

Q7. Do students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use?

# Perform the 2-sample t-test for comparing means
ttest(StressScore~AlcoholUseGroup, data=sleep, alternative="two_sided")
## 
## Compare StressScore across AlcoholUseGroup with levels High Use and Low Use 
## Grouping Variable:  AlcoholUseGroup
## Response Variable:  StressScore
## 
## 
## ------ Describe ------
## 
## StressScore for AlcoholUseGroup High Use:  n.miss = 0,  n = 136,  mean = 9.581,  sd = 8.183
## StressScore for AlcoholUseGroup Low Use:  n.miss = 0,  n = 117,  mean = 9.333,  sd = 7.708
## 
## Mean Difference of StressScore:  0.248
## 
## Weighted Average Standard Deviation:   7.967 
## 
## 
## ------ Assumptions ------
## 
## Note: These hypothesis tests can perform poorly, and the 
##       t-test is typically robust to violations of assumptions. 
##       Use as heuristic guides instead of interpreting literally. 
## 
## Null hypothesis, for each group, is a normal distribution of StressScore.
## Group High Use: Sample mean assumed normal because n > 30, so no test needed.
## Group Low Use: Sample mean assumed normal because n > 30, so no test needed.
## 
## Null hypothesis is equal variances of StressScore, homogeneous.
## Variance Ratio test:  F = 66.956/59.414 = 1.127,  df = 135;116,  p-value = 0.509
## Levene's test, Brown-Forsythe:  t = 0.251,  df = 251,  p-value = 0.802
## 
## 
## ------ Infer ------
## 
## --- Assume equal population variances of StressScore for each AlcoholUseGroup 
## 
## t-cutoff for 95% range of variation: tcut =  1.969 
## Standard Error of Mean Difference: SE =  1.005 
## 
## Hypothesis Test of 0 Mean Diff:  t-value = 0.246,  df = 251,  p-value = 0.806
## 
## Margin of Error for 95% Confidence Level:  1.978
## 95% Confidence Interval for Mean Difference:  -1.731 to 2.226
## 
## 
## --- Do not assume equal population variances of StressScore for each AlcoholUseGroup 
## 
## t-cutoff: tcut =  1.970 
## Standard Error of Mean Difference: SE =  1.000 
## 
## Hypothesis Test of 0 Mean Diff:  t = 0.248,  df = 248.919, p-value = 0.805
## 
## Margin of Error for 95% Confidence Level:  1.970
## 95% Confidence Interval for Mean Difference:  -1.722 to 2.217
## 
## 
## ------ Effect Size ------
## 
## --- Assume equal population variances of StressScore for each AlcoholUseGroup 
## 
## Standardized Mean Difference of StressScore, Cohen's d:  0.031
## 
## 
## ------ Practical Importance ------
## 
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
## 
## 
## ------ Graphics Smoothing Parameter ------
## 
## Density bandwidth for AlcoholUseGroup High Use: 2.895
## Density bandwidth for AlcoholUseGroup Low Use: 3.070

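No. The test compares the Low Use group (Abstain or Light) with the High Use group (all other students), and their mean stress scores are nearly identical: 9.333 versus 9.581 (t = 0.246, df = 251, p-value = 0.806). Because the groups combine categories, the test answers the abstain-versus-heavy question only approximately.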
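A comparison limited to the two categories named in Q7 could look like the sketch below. It assumes the AlcoholUse column contains the values "Abstain" and "Heavy", uses base R's `t.test` instead of lessR's `ttest`, and reads "better stress scores" as lower StressScore; `extremes` is a hypothetical name for the subset.

```r
# Load the same SleepStudy data used throughout this report
sleep <- read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")

# Keep only students who abstain or report heavy use (assumed category labels)
extremes <- subset(sleep, AlcoholUse %in% c("Abstain", "Heavy"))

# Groups are ordered alphabetically (Abstain, then Heavy), so alternative = "less"
# tests whether mean stress for abstainers is lower than for heavy users
q7 <- t.test(StressScore ~ AlcoholUse, data = extremes, alternative = "less")
print(q7)
```

The subset is much smaller than the full sample, so the resulting interval is likely to be wide.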

Q8. Is there a significant difference in the average number of drinks per week between students of different genders?

# Perform the 2-sample t-test for comparing means
ttest(Drinks~Gender, data=sleep, alternative="two_sided")
## 
## Compare Drinks across Gender with levels 1 and 0 
## Grouping Variable:  Gender
## Response Variable:  Drinks
## 
## 
## ------ Describe ------
## 
## Drinks for Gender 1:  n.miss = 0,  n = 102,  mean = 7.539,  sd = 4.929
## Drinks for Gender 0:  n.miss = 0,  n = 151,  mean = 4.238,  sd = 2.720
## 
## Mean Difference of Drinks:  3.301
## 
## Weighted Average Standard Deviation:   3.768 
## 
## 
## ------ Assumptions ------
## 
## Note: These hypothesis tests can perform poorly, and the 
##       t-test is typically robust to violations of assumptions. 
##       Use as heuristic guides instead of interpreting literally. 
## 
## Null hypothesis, for each group, is a normal distribution of Drinks.
## Group 1: Sample mean assumed normal because n > 30, so no test needed.
## Group 0: Sample mean assumed normal because n > 30, so no test needed.
## 
## Null hypothesis is equal variances of Drinks, homogeneous.
## Variance Ratio test:  F = 24.291/7.396 = 3.284,  df = 101;150,  p-value = 0.000
## Levene's test, Brown-Forsythe:  t = 5.471,  df = 251,  p-value = 0.000
## 
## 
## ------ Infer ------
## 
## --- Assume equal population variances of Drinks for each Gender 
## 
## t-cutoff for 95% range of variation: tcut =  1.969 
## Standard Error of Mean Difference: SE =  0.483 
## 
## Hypothesis Test of 0 Mean Diff:  t-value = 6.836,  df = 251,  p-value = 0.000
## 
## Margin of Error for 95% Confidence Level:  0.951
## 95% Confidence Interval for Mean Difference:  2.350 to 4.252
## 
## 
## --- Do not assume equal population variances of Drinks for each Gender 
## 
## t-cutoff: tcut =  1.977 
## Standard Error of Mean Difference: SE =  0.536 
## 
## Hypothesis Test of 0 Mean Diff:  t = 6.160,  df = 142.754, p-value = 0.000
## 
## Margin of Error for 95% Confidence Level:  1.059
## 95% Confidence Interval for Mean Difference:  2.242 to 4.360
## 
## 
## ------ Effect Size ------
## 
## --- Assume equal population variances of Drinks for each Gender 
## 
## Standardized Mean Difference of Drinks, Cohen's d:  0.876
## 
## 
## ------ Practical Importance ------
## 
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
## 
## 
## ------ Graphics Smoothing Parameter ------
## 
## Density bandwidth for Gender 1: 2.227
## Density bandwidth for Gender 0: 1.136

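Yes, male students (Gender = 1) drink more than female students (Gender = 0) by a considerable margin: 7.539 versus 4.238 drinks per week on average. Both equal-variance tests reject (p-value = 0.000), so the unequal-variance result is the one to report: t = 6.160, df = 142.754, p-value < 0.001, with a 95% confidence interval for the mean difference of 2.242 to 4.360.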

Q9. Is there a significant difference in the average weekday bedtime between students with high and normal stress (Stress=high vs. Stress=normal)?

hist(sleep$StressScore)

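The histogram of StressScore shows how stress scores are distributed but does not involve WeekdayBed, so it cannot show whether average weekday bedtime differs between high-stress and normal-stress students. Answering this question requires a two-sample t-test of WeekdayBed across the two Stress groups.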
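A test of weekday bedtime across the Stress groups could be sketched as follows, assuming base R's `t.test` in place of lessR's `ttest` and assuming Stress takes exactly two values ("normal" and "high", as in the data preview).

```r
# Load the same SleepStudy data used throughout this report
sleep <- read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")

# Welch two-sample t-test of weekday bedtime across the two Stress groups
q9 <- t.test(WeekdayBed ~ Stress, data = sleep)
print(q9)
```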

Q10. Is there a significant difference in the average hours of sleep on weekends between students in their first two years and other students?

hist(sleep$AverageSleep)

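The histogram shows the overall distribution of AverageSleep, which covers all days rather than weekends and is not split by class year, so it cannot answer this question. The comparison requires a two-sample t-test of WeekendSleep between students in their first two years and students in other years.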
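For Q10, a weekend-sleep comparison by class-year group might look like the sketch below. It assumes base R's `t.test` rather than lessR's `ttest`, and `YearGroup` is a hypothetical helper column built from ClassYear.

```r
# Load the same SleepStudy data used throughout this report
sleep <- read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")

# YearGroup is a hypothetical helper column: ClassYear 1-2 vs. ClassYear 3-4
sleep$YearGroup <- ifelse(sleep$ClassYear <= 2, "First two years", "Other years")

# Welch two-sample t-test of weekend sleep across the two class-year groups
q10 <- t.test(WeekendSleep ~ YearGroup, data = sleep)
print(q10)
```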

3. Summary

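I wasn’t able to generate output for code that required more than two variables. For the t-tests that did run, the results were largely in line with my expectations: GPA (Q1), average sleep (Q6), and drinks per week (Q8) differ significantly between groups, while classes missed (Q4) and stress scores (Q7) do not. The remaining questions (Q2, Q3, Q5, Q9, and Q10) were examined only with a histogram or a median and still need a two-group test before any conclusion about significance can be drawn.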