Project2a

2. Analysis

We will explore the questions in detail.

library(lessR)


## 
## Attaching package: 'lessR'

## The following object is masked from 'package:base':
## 
##     sort_by

sleep = read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")
head(sleep)

##   Gender ClassYear LarkOwl NumEarlyClass EarlyClass  GPA ClassesMissed
## 1      0         4 Neither             0          0 3.60             0
## 2      0         4 Neither             2          1 3.24             0
## 3      0         4     Owl             0          0 2.97            12
## 4      0         1    Lark             5          1 3.76             0
## 5      0         4     Owl             0          0 3.20             4
## 6      1         4 Neither             0          0 3.50             0
##   CognitionZscore PoorSleepQuality DepressionScore AnxietyScore StressScore
## 1           -0.26                4               4            3           8
## 2            1.39                6               1            0           3
## 3            0.38               18              18           18           9
## 4            1.39                9               1            4           6
## 5            1.22                9               7           25          14
## 6           -0.04                6              14            8          28
##   DepressionStatus AnxietyStatus Stress DASScore Happiness AlcoholUse Drinks
## 1           normal        normal normal       15        28   Moderate     10
## 2           normal        normal normal        4        25   Moderate      6
## 3         moderate        severe normal       45        17      Light      3
## 4           normal        normal normal       11        32      Light      2
## 5           normal        severe normal       46        15   Moderate      4
## 6         moderate      moderate   high       50        22    Abstain      0
##   WeekdayBed WeekdayRise WeekdaySleep WeekendBed WeekendRise WeekendSleep
## 1      25.75        8.70         7.70      25.75        9.50         5.88
## 2      25.70        8.20         6.80      26.00       10.00         7.25
## 3      27.44        6.55         3.00      28.00       12.59        10.09
## 4      23.50        7.17         6.77      27.00        8.00         7.25
## 5      25.90        8.67         6.09      23.75        9.50         7.00
## 6      23.80        8.95         9.05      26.00       10.75         9.00
##   AverageSleep AllNighter
## 1         7.18          0
## 2         6.93          0
## 3         5.02          0
## 4         6.90          0
## 5         6.35          0
## 6         9.04          0

sleep$AlcoholUseGroup <- ifelse(sleep$AlcoholUse == "Abstain" | sleep$AlcoholUse == "Light", "Low Use", "High Use")

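# Group ClassYear (coded 1-4) into the first two years versus later years,
# storing the result in a new column so ClassYear itself is not overwritten
sleep$ClassYearGroup <- ifelse(sleep$ClassYear <= 2, "First Two Years", "Later Years")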


survey = read.csv("https://www.lock5stat.com/datasets3e/NHANES.csv")
head(survey)

##   Case Organic    Health          HealthBinary Income
## 1    1      No      Good    Poor / Fair / Good 3324.5
## 2    2      No      Fair    Poor / Fair / Good 1024.0
## 3    3     Yes      Good    Poor / Fair / Good 2500.0
## 4    4      No Excellent Very good / Excellent 1450.0
## 5    5      No      Good    Poor / Fair / Good 1450.0
## 6    6      No      Good    Poor / Fair / Good 5824.0

Q1: Is there a significant difference in the average GPA between male and female college students?

# Perform the 2-sample t-test for comparing means
ttest(GPA~Gender, data=sleep, alternative="two_sided")

## 
## Compare GPA across Gender with levels 0 and 1 
## Grouping Variable:  Gender
## Response Variable:  GPA
## 
## 
## ------ Describe ------
## 
## GPA for Gender 0:  n.miss = 0,  n = 151,  mean = 3.325,  sd = 0.375
## GPA for Gender 1:  n.miss = 0,  n = 102,  mean = 3.124,  sd = 0.418
## 
## Mean Difference of GPA:  0.201
## 
## Weighted Average Standard Deviation:   0.393 
## 
## 
## ------ Assumptions ------
## 
## Note: These hypothesis tests can perform poorly, and the 
##       t-test is typically robust to violations of assumptions. 
##       Use as heuristic guides instead of interpreting literally. 
## 
## Null hypothesis, for each group, is a normal distribution of GPA.
## Group 0: Sample mean assumed normal because n > 30, so no test needed.
## Group 1: Sample mean assumed normal because n > 30, so no test needed.
## 
## Null hypothesis is equal variances of GPA, homogeneous.
## Variance Ratio test:  F = 0.174/0.141 = 1.240,  df = 101;150,  p-value = 0.232
## Levene's test, Brown-Forsythe:  t = -1.879,  df = 251,  p-value = 0.061
## 
## 
## ------ Infer ------
## 
## --- Assume equal population variances of GPA for each Gender 
## 
## t-cutoff for 95% range of variation: tcut =  1.969 
## Standard Error of Mean Difference: SE =  0.050 
## 
## Hypothesis Test of 0 Mean Diff:  t-value = 3.996,  df = 251,  p-value = 0.000
## 
## Margin of Error for 95% Confidence Level:  0.099
## 95% Confidence Interval for Mean Difference:  0.102 to 0.300
## 
## 
## --- Do not assume equal population variances of GPA for each Gender 
## 
## t-cutoff: tcut =  1.972 
## Standard Error of Mean Difference: SE =  0.051 
## 
## Hypothesis Test of 0 Mean Diff:  t = 3.914,  df = 200.902, p-value = 0.000
## 
## Margin of Error for 95% Confidence Level:  0.101
## 95% Confidence Interval for Mean Difference:  0.100 to 0.303
## 
## 
## ------ Effect Size ------
## 
## --- Assume equal population variances of GPA for each Gender 
## 
## Standardized Mean Difference of GPA, Cohen's d:  0.512
## 
## 
## ------ Practical Importance ------
## 
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
## 
## 
## ------ Graphics Smoothing Parameter ------
## 
## Density bandwidth for Gender 0: 0.154
## Density bandwidth for Gender 1: 0.189

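Yes. The difference in mean GPA is statistically significant (t = 3.996, df = 251, p < 0.001). Female students (Gender = 0, mean 3.325) average about 0.20 GPA points higher than male students (Gender = 1, mean 3.124), with a 95% confidence interval for the difference of 0.102 to 0.300 and a moderate effect size (Cohen's d = 0.512).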

Q2: Is there a significant difference in the average number of early classes between the first two class years and other class years?

hist(sleep$NumEarlyClass)

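The histogram shows only the overall distribution of NumEarlyClass; it does not split students by class year and reports no test statistic or p-value, so by itself it cannot establish a significant difference. The claim that first- and second-year students have more early classes than students in later years needs a two-sample t-test of NumEarlyClass between the two class-year groups before it can be called significant.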
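A minimal sketch of how Q2 could be tested directly. The helper name `compare_early_classes` is hypothetical, and it uses base R's `t.test` (Welch's test by default) rather than lessR's `ttest`. It assumes `ClassYear` still holds the original 1-4 codes, i.e. it has not been overwritten by an earlier recode.

```r
# Hypothetical helper (not part of the original analysis): Welch two-sample
# t-test of NumEarlyClass for first/second-year students versus later years.
compare_early_classes <- function(d) {
  # Assumes ClassYear still uses the original 1-4 coding
  d$YearGroup <- ifelse(d$ClassYear <= 2, "First two years", "Later years")
  # Base R t.test defaults to Welch's test (var.equal = FALSE)
  t.test(NumEarlyClass ~ YearGroup, data = d)
}
```

Usage: after `sleep` is read in, `compare_early_classes(sleep)` prints the group means, t statistic, degrees of freedom, p-value, and a 95% confidence interval for the difference.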

Q3. Do students who identify as “larks” have significantly better cognitive skills (cognition z-score) than “owls”?

median(sleep$LarkOwl)

## [1] "Neither"

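The median of LarkOwl only returns the middle category of a categorical variable in alphabetical order ("Neither"); it says nothing about cognition scores, so this output does not answer the question. Answering it requires comparing the mean CognitionZscore of Larks with that of Owls, for example with a one-sided two-sample t-test.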
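A minimal sketch of a direct test for Q3, assuming the category labels are exactly "Lark" and "Owl" as in the `head(sleep)` output. The helper name is hypothetical, and base R's `t.test` is used in place of lessR's `ttest`. Because the question asks whether Larks score *better*, the sketch uses a one-sided alternative.

```r
# Hypothetical helper: one-sided Welch t-test of CognitionZscore, Lark vs Owl.
compare_lark_owl <- function(d) {
  # Keep only Larks and Owls; students coded "Neither" are dropped
  sub <- d[d$LarkOwl %in% c("Lark", "Owl"), ]
  # Fix the level order so the tested difference is mean(Lark) - mean(Owl)
  sub$LarkOwl <- factor(sub$LarkOwl, levels = c("Lark", "Owl"))
  # H1: mean(Lark) - mean(Owl) > 0
  t.test(CognitionZscore ~ LarkOwl, data = sub, alternative = "greater")
}
```

Usage: `compare_lark_owl(sleep)`. A two-sided version (`alternative = "two.sided"`) would instead ask whether the groups differ in either direction.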

Q4. Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class (EarlyClass=1) and those who didn’t (EarlyClass=0)?

# Perform the 2-sample t-test for comparing means
ttest(ClassesMissed~EarlyClass, data=sleep, alternative="two_sided")

## 
## Compare ClassesMissed across EarlyClass with levels 0 and 1 
## Grouping Variable:  EarlyClass
## Response Variable:  ClassesMissed
## 
## 
## ------ Describe ------
## 
## ClassesMissed for EarlyClass 0:  n.miss = 0,  n = 85,  mean = 2.647,  sd = 3.477
## ClassesMissed for EarlyClass 1:  n.miss = 0,  n = 168,  mean = 1.988,  sd = 3.101
## 
## Mean Difference of ClassesMissed:  0.659
## 
## Weighted Average Standard Deviation:   3.232 
## 
## 
## ------ Assumptions ------
## 
## Note: These hypothesis tests can perform poorly, and the 
##       t-test is typically robust to violations of assumptions. 
##       Use as heuristic guides instead of interpreting literally. 
## 
## Null hypothesis, for each group, is a normal distribution of ClassesMissed.
## Group 0: Sample mean assumed normal because n > 30, so no test needed.
## Group 1: Sample mean assumed normal because n > 30, so no test needed.
## 
## Null hypothesis is equal variances of ClassesMissed, homogeneous.
## Variance Ratio test:  F = 12.088/9.617 = 1.257,  df = 84;167,  p-value = 0.214
## Levene's test, Brown-Forsythe:  t = 1.373,  df = 251,  p-value = 0.171
## 
## 
## ------ Infer ------
## 
## --- Assume equal population variances of ClassesMissed for each EarlyClass 
## 
## t-cutoff for 95% range of variation: tcut =  1.969 
## Standard Error of Mean Difference: SE =  0.430 
## 
## Hypothesis Test of 0 Mean Diff:  t-value = 1.532,  df = 251,  p-value = 0.127
## 
## Margin of Error for 95% Confidence Level:  0.847
## 95% Confidence Interval for Mean Difference:  -0.188 to 1.506
## 
## 
## --- Do not assume equal population variances of ClassesMissed for each EarlyClass 
## 
## t-cutoff: tcut =  1.976 
## Standard Error of Mean Difference: SE =  0.447 
## 
## Hypothesis Test of 0 Mean Diff:  t = 1.475,  df = 152.779, p-value = 0.142
## 
## Margin of Error for 95% Confidence Level:  0.882
## 95% Confidence Interval for Mean Difference:  -0.223 to 1.541
## 
## 
## ------ Effect Size ------
## 
## --- Assume equal population variances of ClassesMissed for each EarlyClass 
## 
## Standardized Mean Difference of ClassesMissed, Cohen's d:  0.204
## 
## 
## ------ Practical Importance ------
## 
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
## 
## 
## ------ Graphics Smoothing Parameter ------
## 
## Density bandwidth for EarlyClass 0: 1.629
## Density bandwidth for EarlyClass 1: 1.044

No. There is no significant difference in the average number of classes missed between students who had at least one early class and those who did not (t = 1.532, df = 251, p = 0.127; 95% CI for the difference -0.188 to 1.506 classes).

Q5. Is there a significant difference in the average happiness level between students with at least moderate depression and students with normal depression status?

hist(sleep$Happiness)

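The histogram shows the overall distribution of Happiness, but it does not split students by depression status and reports no test statistic, so it cannot by itself establish a significant difference. The claim that students with normal depression status are a lot happier than those with at least moderate depression should be checked with a two-sample t-test of Happiness between the two groups.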
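A minimal sketch of a direct test for Q5. It assumes DepressionStatus uses lowercase labels such as "normal", "moderate", and "severe" (only "normal" and "moderate" are visible in `head(sleep)`), treats "moderate" and "severe" as "at least moderate", and drops any other category (for example "mild", if present). The helper name is hypothetical, and base R's `t.test` stands in for lessR's `ttest`.

```r
# Hypothetical helper: Welch t-test of Happiness, at least moderate vs normal.
compare_happiness_by_depression <- function(d) {
  d$DepGroup <- ifelse(d$DepressionStatus %in% c("moderate", "severe"),
                       "At least moderate",
                       ifelse(d$DepressionStatus == "normal", "Normal", NA))
  # Drop students in neither group (e.g. a "mild" category)
  sub <- d[!is.na(d$DepGroup), ]
  t.test(Happiness ~ DepGroup, data = sub)
}
```

Usage: `compare_happiness_by_depression(sleep)`.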

Q6. Is there a significant difference in average sleep quality scores between students who reported at least one all-nighter (AllNighter = 1) and those who didn’t (AllNighter = 0)?

ttest(AverageSleep~AllNighter, data=sleep, alternative="two_sided")

## 
## Compare AverageSleep across AllNighter with levels 0 and 1 
## Grouping Variable:  AllNighter
## Response Variable:  AverageSleep
## 
## 
## ------ Describe ------
## 
## AverageSleep for AllNighter 0:  n.miss = 0,  n = 219,  mean = 8.074,  sd = 0.916
## AverageSleep for AllNighter 1:  n.miss = 0,  n = 34,  mean = 7.271,  sd = 0.994
## 
## Mean Difference of AverageSleep:  0.803
## 
## Weighted Average Standard Deviation:   0.927 
## 
## 
## ------ Assumptions ------
## 
## Note: These hypothesis tests can perform poorly, and the 
##       t-test is typically robust to violations of assumptions. 
##       Use as heuristic guides instead of interpreting literally. 
## 
## Null hypothesis, for each group, is a normal distribution of AverageSleep.
## Group 0: Sample mean assumed normal because n > 30, so no test needed.
## Group 1: Sample mean assumed normal because n > 30, so no test needed.
## 
## Null hypothesis is equal variances of AverageSleep, homogeneous.
## Variance Ratio test:  F = 0.988/0.840 = 1.177,  df = 33;218,  p-value = 0.489
## Levene's test, Brown-Forsythe:  t = -0.815,  df = 251,  p-value = 0.416
## 
## 
## ------ Infer ------
## 
## --- Assume equal population variances of AverageSleep for each AllNighter 
## 
## t-cutoff for 95% range of variation: tcut =  1.969 
## Standard Error of Mean Difference: SE =  0.171 
## 
## Hypothesis Test of 0 Mean Diff:  t-value = 4.698,  df = 251,  p-value = 0.000
## 
## Margin of Error for 95% Confidence Level:  0.336
## 95% Confidence Interval for Mean Difference:  0.466 to 1.139
## 
## 
## --- Do not assume equal population variances of AverageSleep for each AllNighter 
## 
## t-cutoff: tcut =  2.018 
## Standard Error of Mean Difference: SE =  0.181 
## 
## Hypothesis Test of 0 Mean Diff:  t = 4.426,  df = 42.171, p-value = 0.000
## 
## Margin of Error for 95% Confidence Level:  0.366
## 95% Confidence Interval for Mean Difference:  0.437 to 1.169
## 
## 
## ------ Effect Size ------
## 
## --- Assume equal population variances of AverageSleep for each AllNighter 
## 
## Standardized Mean Difference of AverageSleep, Cohen's d:  0.866
## 
## 
## ------ Practical Importance ------
## 
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
## 
## 
## ------ Graphics Smoothing Parameter ------
## 
## Density bandwidth for AllNighter 0: 0.333
## Density bandwidth for AllNighter 1: 0.559

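Yes. Students who reported at least one all-nighter slept about 0.80 fewer hours on average (7.271 vs 8.074; t = 4.698, df = 251, p < 0.001; 95% CI for the difference 0.466 to 1.139 hours; Cohen's d = 0.866). Note, however, that this test compares AverageSleep (hours of sleep), not a sleep quality score such as PoorSleepQuality, so it shows that these students sleep less rather than that their sleep quality is worse.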
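Since the question asks about sleep quality, a minimal sketch that tests PoorSleepQuality instead of AverageSleep is shown below. It assumes, as the variable name suggests, that higher PoorSleepQuality values mean worse sleep. The helper name is hypothetical, and base R's `t.test` (Welch by default) is used rather than lessR's `ttest`.

```r
# Hypothetical helper: Welch t-test of PoorSleepQuality by AllNighter (0/1).
compare_sleep_quality <- function(d) {
  # AllNighter is 0/1, so the first group reported is 0 and the second is 1
  t.test(PoorSleepQuality ~ AllNighter, data = d)
}
```

Usage: `compare_sleep_quality(sleep)`; the equivalent lessR call would be `ttest(PoorSleepQuality ~ AllNighter, data = sleep)`.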

Q7. Do students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use?

# Perform the 2-sample t-test for comparing means
ttest(StressScore~AlcoholUseGroup, data=sleep, alternative="two_sided")

## 
## Compare StressScore across AlcoholUseGroup with levels High Use and Low Use 
## Grouping Variable:  AlcoholUseGroup
## Response Variable:  StressScore
## 
## 
## ------ Describe ------
## 
## StressScore for AlcoholUseGroup High Use:  n.miss = 0,  n = 136,  mean = 9.581,  sd = 8.183
## StressScore for AlcoholUseGroup Low Use:  n.miss = 0,  n = 117,  mean = 9.333,  sd = 7.708
## 
## Mean Difference of StressScore:  0.248
## 
## Weighted Average Standard Deviation:   7.967 
## 
## 
## ------ Assumptions ------
## 
## Note: These hypothesis tests can perform poorly, and the 
##       t-test is typically robust to violations of assumptions. 
##       Use as heuristic guides instead of interpreting literally. 
## 
## Null hypothesis, for each group, is a normal distribution of StressScore.
## Group High Use: Sample mean assumed normal because n > 30, so no test needed.
## Group Low Use: Sample mean assumed normal because n > 30, so no test needed.
## 
## Null hypothesis is equal variances of StressScore, homogeneous.
## Variance Ratio test:  F = 66.956/59.414 = 1.127,  df = 135;116,  p-value = 0.509
## Levene's test, Brown-Forsythe:  t = 0.251,  df = 251,  p-value = 0.802
## 
## 
## ------ Infer ------
## 
## --- Assume equal population variances of StressScore for each AlcoholUseGroup 
## 
## t-cutoff for 95% range of variation: tcut =  1.969 
## Standard Error of Mean Difference: SE =  1.005 
## 
## Hypothesis Test of 0 Mean Diff:  t-value = 0.246,  df = 251,  p-value = 0.806
## 
## Margin of Error for 95% Confidence Level:  1.978
## 95% Confidence Interval for Mean Difference:  -1.731 to 2.226
## 
## 
## --- Do not assume equal population variances of StressScore for each AlcoholUseGroup 
## 
## t-cutoff: tcut =  1.970 
## Standard Error of Mean Difference: SE =  1.000 
## 
## Hypothesis Test of 0 Mean Diff:  t = 0.248,  df = 248.919, p-value = 0.805
## 
## Margin of Error for 95% Confidence Level:  1.970
## 95% Confidence Interval for Mean Difference:  -1.722 to 2.217
## 
## 
## ------ Effect Size ------
## 
## --- Assume equal population variances of StressScore for each AlcoholUseGroup 
## 
## Standardized Mean Difference of StressScore, Cohen's d:  0.031
## 
## 
## ------ Practical Importance ------
## 
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
## 
## 
## ------ Graphics Smoothing Parameter ------
## 
## Density bandwidth for AlcoholUseGroup High Use: 2.895
## Density bandwidth for AlcoholUseGroup Low Use: 3.070

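No. Comparing the Low Use group (Abstain or Light) with the High Use group (all other AlcoholUse categories), mean stress scores are nearly identical (9.333 vs 9.581; t = 0.246, df = 251, p = 0.806). Because this grouping combines abstainers with light drinkers and heavy drinkers with moderate drinkers, it does not directly compare abstainers with heavy drinkers as the question asks.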
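A minimal sketch of the direct comparison Q7 asks for, restricted to abstainers and heavy drinkers. It assumes AlcoholUse has a level spelled exactly "Heavy" (only "Abstain", "Light", and "Moderate" appear in `head(sleep)`), and that a lower StressScore is "better". The helper name is hypothetical, and base R's `t.test` replaces lessR's `ttest`. It is worth checking `table(sleep$AlcoholUse)` first, since a small Heavy group would make the test imprecise.

```r
# Hypothetical helper: one-sided Welch t-test of StressScore, Abstain vs Heavy.
compare_stress_abstain_heavy <- function(d) {
  sub <- d[d$AlcoholUse %in% c("Abstain", "Heavy"), ]
  # Fix level order so the tested difference is mean(Abstain) - mean(Heavy)
  sub$AlcoholUse <- factor(sub$AlcoholUse, levels = c("Abstain", "Heavy"))
  # H1: abstainers have lower (better) stress scores
  t.test(StressScore ~ AlcoholUse, data = sub, alternative = "less")
}
```

Usage: `compare_stress_abstain_heavy(sleep)`.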

Q8. Is there a significant difference in the average number of drinks per week between students of different genders?

# Perform the 2-sample t-test for comparing means
ttest(Drinks~Gender, data=sleep, alternative="two_sided")

## 
## Compare Drinks across Gender with levels 1 and 0 
## Grouping Variable:  Gender
## Response Variable:  Drinks
## 
## 
## ------ Describe ------
## 
## Drinks for Gender 1:  n.miss = 0,  n = 102,  mean = 7.539,  sd = 4.929
## Drinks for Gender 0:  n.miss = 0,  n = 151,  mean = 4.238,  sd = 2.720
## 
## Mean Difference of Drinks:  3.301
## 
## Weighted Average Standard Deviation:   3.768 
## 
## 
## ------ Assumptions ------
## 
## Note: These hypothesis tests can perform poorly, and the 
##       t-test is typically robust to violations of assumptions. 
##       Use as heuristic guides instead of interpreting literally. 
## 
## Null hypothesis, for each group, is a normal distribution of Drinks.
## Group 1: Sample mean assumed normal because n > 30, so no test needed.
## Group 0: Sample mean assumed normal because n > 30, so no test needed.
## 
## Null hypothesis is equal variances of Drinks, homogeneous.
## Variance Ratio test:  F = 24.291/7.396 = 3.284,  df = 101;150,  p-value = 0.000
## Levene's test, Brown-Forsythe:  t = 5.471,  df = 251,  p-value = 0.000
## 
## 
## ------ Infer ------
## 
## --- Assume equal population variances of Drinks for each Gender 
## 
## t-cutoff for 95% range of variation: tcut =  1.969 
## Standard Error of Mean Difference: SE =  0.483 
## 
## Hypothesis Test of 0 Mean Diff:  t-value = 6.836,  df = 251,  p-value = 0.000
## 
## Margin of Error for 95% Confidence Level:  0.951
## 95% Confidence Interval for Mean Difference:  2.350 to 4.252
## 
## 
## --- Do not assume equal population variances of Drinks for each Gender 
## 
## t-cutoff: tcut =  1.977 
## Standard Error of Mean Difference: SE =  0.536 
## 
## Hypothesis Test of 0 Mean Diff:  t = 6.160,  df = 142.754, p-value = 0.000
## 
## Margin of Error for 95% Confidence Level:  1.059
## 95% Confidence Interval for Mean Difference:  2.242 to 4.360
## 
## 
## ------ Effect Size ------
## 
## --- Assume equal population variances of Drinks for each Gender 
## 
## Standardized Mean Difference of Drinks, Cohen's d:  0.876
## 
## 
## ------ Practical Importance ------
## 
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
## 
## 
## ------ Graphics Smoothing Parameter ------
## 
## Density bandwidth for Gender 1: 2.227
## Density bandwidth for Gender 0: 1.136

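Yes. Male students (Gender = 1) report about 3.3 more drinks per week than female students (Gender = 0) (7.539 vs 4.238). Because both variance tests reject equal variances (F = 3.284, p < 0.001), the unequal-variance (Welch) result is the more appropriate one: t = 6.160, df = 142.754, p < 0.001, with a 95% CI for the difference of 2.242 to 4.360 drinks and a large effect size (Cohen's d = 0.876).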

Q9. Is there a significant difference in the average weekday bedtime between students with high and normal stress (Stress=High vs Stress=Normal)?

hist(sleep$StressScore)

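The histogram shows the distribution of StressScore, not weekday bedtime by stress group, so it cannot support a claim about bedtimes. The statement that students with later weekday bedtimes are significantly less stressed than students with earlier bedtimes is untested here; the question calls for a two-sample t-test of WeekdayBed between the high and normal Stress groups.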
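A minimal sketch of a direct test for Q9. It assumes the Stress labels are lowercase "high" and "normal" as shown in `head(sleep)`, and that WeekdayBed is in hours with times after midnight coded above 24 (e.g. 25.75), which the sample values suggest, so a simple mean is meaningful. The helper name is hypothetical, and base R's `t.test` is used instead of lessR's `ttest`.

```r
# Hypothetical helper: Welch t-test of WeekdayBed by Stress ("high" vs "normal").
compare_bedtime_by_stress <- function(d) {
  # Keep only the two stress levels being compared
  sub <- d[d$Stress %in% c("high", "normal"), ]
  t.test(WeekdayBed ~ Stress, data = sub)
}
```

Usage: `compare_bedtime_by_stress(sleep)`; a positive estimated difference would mean high-stress students go to bed later on weekdays.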

Q10. Is there a significant difference in the average hours of sleep on weekends between first- and second-year students and other students?

hist(sleep$AverageSleep)

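The histogram shows AverageSleep (all days combined) for all students together, so it addresses neither weekend sleep nor class year. The claim that students in later years slept less on weekends than first- and second-year students needs a two-sample t-test of WeekendSleep between the two class-year groups.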
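A minimal sketch of a direct test for Q10, using WeekendSleep rather than AverageSleep since the question is about weekends. It assumes `ClassYear` still holds the original 1-4 codes (not overwritten by an earlier recode). The helper name is hypothetical, and base R's `t.test` (Welch by default) is used in place of lessR's `ttest`.

```r
# Hypothetical helper: Welch t-test of WeekendSleep, first two years vs later.
compare_weekend_sleep <- function(d) {
  # Assumes ClassYear still uses the original 1-4 coding
  d$YearGroup <- ifelse(d$ClassYear <= 2, "First two years", "Later years")
  t.test(WeekendSleep ~ YearGroup, data = d)
}
```

Usage: `compare_weekend_sleep(sleep)`; a positive estimated difference would mean first- and second-year students sleep more on weekends.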