We use the data from ….
I propose the following 10 questions based on these data.
We will explore the questions in detail.
library(lessR)
##
## lessR 4.3.8 feedback: gerbing@pdx.edu
## Attaching package: 'lessR'
## The following object is masked from 'package:base':
##
## sort_by
sleep = read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")
head(sleep)
## Gender ClassYear LarkOwl NumEarlyClass EarlyClass GPA ClassesMissed
## 1 0 4 Neither 0 0 3.60 0
## 2 0 4 Neither 2 1 3.24 0
## 3 0 4 Owl 0 0 2.97 12
## 4 0 1 Lark 5 1 3.76 0
## 5 0 4 Owl 0 0 3.20 4
## 6 1 4 Neither 0 0 3.50 0
## CognitionZscore PoorSleepQuality DepressionScore AnxietyScore StressScore
## 1 -0.26 4 4 3 8
## 2 1.39 6 1 0 3
## 3 0.38 18 18 18 9
## 4 1.39 9 1 4 6
## 5 1.22 9 7 25 14
## 6 -0.04 6 14 8 28
## DepressionStatus AnxietyStatus Stress DASScore Happiness AlcoholUse Drinks
## 1 normal normal normal 15 28 Moderate 10
## 2 normal normal normal 4 25 Moderate 6
## 3 moderate severe normal 45 17 Light 3
## 4 normal normal normal 11 32 Light 2
## 5 normal severe normal 46 15 Moderate 4
## 6 moderate moderate high 50 22 Abstain 0
## WeekdayBed WeekdayRise WeekdaySleep WeekendBed WeekendRise WeekendSleep
## 1 25.75 8.70 7.70 25.75 9.50 5.88
## 2 25.70 8.20 6.80 26.00 10.00 7.25
## 3 27.44 6.55 3.00 28.00 12.59 10.09
## 4 23.50 7.17 6.77 27.00 8.00 7.25
## 5 25.90 8.67 6.09 23.75 9.50 7.00
## 6 23.80 8.95 9.05 26.00 10.75 9.00
## AverageSleep AllNighter
## 1 7.18 0
## 2 6.93 0
## 3 5.02 0
## 4 6.90 0
## 5 6.35 0
## 6 9.04 0
sleep$AlcoholUseGroup <- ifelse(sleep$AlcoholUse == "Abstain" | sleep$AlcoholUse == "Light", "Low Use", "High Use")
# Group students into the first two class years vs. later years (ClassYear is coded 1 to 4)
sleep$ClassYearGroup <- ifelse(sleep$ClassYear %in% c(1, 2), "First Two Years", "Later Years")
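Before running the tests, it can help to confirm that the AlcoholUse recode grouped each category as intended. The following is a minimal sketch using base R's `table()`. The helper name `check_alcohol_recode` is hypothetical, and the sketch assumes the AlcoholUseGroup column has already been created.

```r
# Hypothetical helper: cross-tabulate the original AlcoholUse categories
# against the derived AlcoholUseGroup column, so each category can be
# seen landing in exactly one group.
check_alcohol_recode <- function(df) {
  table(AlcoholUse = df$AlcoholUse, AlcoholUseGroup = df$AlcoholUseGroup)
}

# Usage, assuming `sleep` is loaded and AlcoholUseGroup exists:
# check_alcohol_recode(sleep)
```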
survey = read.csv("https://www.lock5stat.com/datasets3e/NHANES.csv")
head(survey)
## Case Organic Health HealthBinary Income
## 1 1 No Good Poor / Fair / Good 3324.5
## 2 2 No Fair Poor / Fair / Good 1024.0
## 3 3 Yes Good Poor / Fair / Good 2500.0
## 4 4 No Excellent Very good / Excellent 1450.0
## 5 5 No Good Poor / Fair / Good 1450.0
## 6 6 No Good Poor / Fair / Good 5824.0
# Perform the 2-sample t-test for comparing means
ttest(GPA~Gender, data=sleep, alternative="two_sided")
##
## Compare GPA across Gender with levels 0 and 1
## Grouping Variable: Gender
## Response Variable: GPA
##
##
## ------ Describe ------
##
## GPA for Gender 0: n.miss = 0, n = 151, mean = 3.325, sd = 0.375
## GPA for Gender 1: n.miss = 0, n = 102, mean = 3.124, sd = 0.418
##
## Mean Difference of GPA: 0.201
##
## Weighted Average Standard Deviation: 0.393
##
##
## ------ Assumptions ------
##
## Note: These hypothesis tests can perform poorly, and the
## t-test is typically robust to violations of assumptions.
## Use as heuristic guides instead of interpreting literally.
##
## Null hypothesis, for each group, is a normal distribution of GPA.
## Group 0: Sample mean assumed normal because n > 30, so no test needed.
## Group 1: Sample mean assumed normal because n > 30, so no test needed.
##
## Null hypothesis is equal variances of GPA, homogeneous.
## Variance Ratio test: F = 0.174/0.141 = 1.240, df = 101;150, p-value = 0.232
## Levene's test, Brown-Forsythe: t = -1.879, df = 251, p-value = 0.061
##
##
## ------ Infer ------
##
## --- Assume equal population variances of GPA for each Gender
##
## t-cutoff for 95% range of variation: tcut = 1.969
## Standard Error of Mean Difference: SE = 0.050
##
## Hypothesis Test of 0 Mean Diff: t-value = 3.996, df = 251, p-value = 0.000
##
## Margin of Error for 95% Confidence Level: 0.099
## 95% Confidence Interval for Mean Difference: 0.102 to 0.300
##
##
## --- Do not assume equal population variances of GPA for each Gender
##
## t-cutoff: tcut = 1.972
## Standard Error of Mean Difference: SE = 0.051
##
## Hypothesis Test of 0 Mean Diff: t = 3.914, df = 200.902, p-value = 0.000
##
## Margin of Error for 95% Confidence Level: 0.101
## 95% Confidence Interval for Mean Difference: 0.100 to 0.303
##
##
## ------ Effect Size ------
##
## --- Assume equal population variances of GPA for each Gender
##
## Standardized Mean Difference of GPA, Cohen's d: 0.512
##
##
## ------ Practical Importance ------
##
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
##
##
## ------ Graphics Smoothing Parameter ------
##
## Density bandwidth for Gender 0: 0.154
## Density bandwidth for Gender 1: 0.189
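Yes. Mean GPA differs significantly between the two Gender groups (t = 3.996, df = 251, p < 0.001; 95% CI for the difference 0.102 to 0.300). Gender 0 (female) averaged 3.325 and Gender 1 (male) averaged 3.124, a moderate standardized difference (Cohen's d = 0.512).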
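The "Weighted Average Standard Deviation" and Cohen's d in the ttest() output can be reproduced from the group summaries. The following is a minimal sketch, assuming the usual pooled-SD definition of d. The function names are hypothetical, and because the inputs are the rounded printed values, the result only approximates lessR's reported d.

```r
# Pooled (weighted average) standard deviation of two groups:
# each group's variance is weighted by its degrees of freedom (n - 1).
pooled_sd <- function(sd1, n1, sd2, n2) {
  sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2))
}

# Cohen's d: difference in means divided by the pooled standard deviation.
cohens_d <- function(mean1, sd1, n1, mean2, sd2, n2) {
  (mean1 - mean2) / pooled_sd(sd1, n1, sd2, n2)
}

# Rounded GPA summaries from the ttest() output (Gender 0 vs. Gender 1)
d_gpa <- cohens_d(3.325, 0.375, 151, 3.124, 0.418, 102)
print(round(d_gpa, 3))
```

The same helper applies to the other comparisons, for example AverageSleep by AllNighter.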
hist(sleep$NumEarlyClass)
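The histogram shows only the overall distribution of NumEarlyClass, so it cannot by itself establish a significant difference between class years. Confirming the impression that students in earlier years take more early classes than students in later years requires a comparison across ClassYear.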
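ClassYear has four levels, so a two-sample t-test does not apply. The following is a minimal sketch of one option: group means plus a one-way ANOVA with base R's `aov()`. The helper name `compare_early_classes` is hypothetical, and the sketch assumes ClassYear still holds the original 1 to 4 codes from the CSV.

```r
# Hypothetical helper: mean NumEarlyClass per class year, plus the
# p-value of a one-way ANOVA testing whether those means differ.
compare_early_classes <- function(df) {
  df$ClassYear <- factor(df$ClassYear)
  means <- tapply(df$NumEarlyClass, df$ClassYear, mean)
  fit <- aov(NumEarlyClass ~ ClassYear, data = df)
  p_value <- summary(fit)[[1]][["Pr(>F)"]][1]  # row 1 is the ClassYear effect
  list(means = means, p_value = p_value)
}

# Usage: compare_early_classes(sleep)
```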
median(sleep$LarkOwl)
## [1] "Neither"
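The median category returned is "Neither". However, LarkOwl is a nominal variable (Lark, Neither, Owl), so its "median" reflects only alphabetical order. Counts or proportions of each category are a more meaningful summary.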
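For a categorical variable such as LarkOwl, a frequency table and the most common category (the mode) describe the data directly. The following is a minimal base R sketch. The helper name `summarise_categorical` is hypothetical.

```r
# Hypothetical helper: counts, proportions and modal category of a
# categorical vector.
summarise_categorical <- function(x) {
  counts <- table(x)
  list(counts = counts,
       proportions = prop.table(counts),
       mode = names(counts)[which.max(counts)])
}

# Usage: summarise_categorical(sleep$LarkOwl)
```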
# Perform the 2-sample t-test for comparing means
ttest(ClassesMissed~EarlyClass, data=sleep, alternative="two_sided")
##
## Compare ClassesMissed across EarlyClass with levels 0 and 1
## Grouping Variable: EarlyClass
## Response Variable: ClassesMissed
##
##
## ------ Describe ------
##
## ClassesMissed for EarlyClass 0: n.miss = 0, n = 85, mean = 2.647, sd = 3.477
## ClassesMissed for EarlyClass 1: n.miss = 0, n = 168, mean = 1.988, sd = 3.101
##
## Mean Difference of ClassesMissed: 0.659
##
## Weighted Average Standard Deviation: 3.232
##
##
## ------ Assumptions ------
##
## Note: These hypothesis tests can perform poorly, and the
## t-test is typically robust to violations of assumptions.
## Use as heuristic guides instead of interpreting literally.
##
## Null hypothesis, for each group, is a normal distribution of ClassesMissed.
## Group 0: Sample mean assumed normal because n > 30, so no test needed.
## Group 1: Sample mean assumed normal because n > 30, so no test needed.
##
## Null hypothesis is equal variances of ClassesMissed, homogeneous.
## Variance Ratio test: F = 12.088/9.617 = 1.257, df = 84;167, p-value = 0.214
## Levene's test, Brown-Forsythe: t = 1.373, df = 251, p-value = 0.171
##
##
## ------ Infer ------
##
## --- Assume equal population variances of ClassesMissed for each EarlyClass
##
## t-cutoff for 95% range of variation: tcut = 1.969
## Standard Error of Mean Difference: SE = 0.430
##
## Hypothesis Test of 0 Mean Diff: t-value = 1.532, df = 251, p-value = 0.127
##
## Margin of Error for 95% Confidence Level: 0.847
## 95% Confidence Interval for Mean Difference: -0.188 to 1.506
##
##
## --- Do not assume equal population variances of ClassesMissed for each EarlyClass
##
## t-cutoff: tcut = 1.976
## Standard Error of Mean Difference: SE = 0.447
##
## Hypothesis Test of 0 Mean Diff: t = 1.475, df = 152.779, p-value = 0.142
##
## Margin of Error for 95% Confidence Level: 0.882
## 95% Confidence Interval for Mean Difference: -0.223 to 1.541
##
##
## ------ Effect Size ------
##
## --- Assume equal population variances of ClassesMissed for each EarlyClass
##
## Standardized Mean Difference of ClassesMissed, Cohen's d: 0.204
##
##
## ------ Practical Importance ------
##
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
##
##
## ------ Graphics Smoothing Parameter ------
##
## Density bandwidth for EarlyClass 0: 1.629
## Density bandwidth for EarlyClass 1: 1.044
No. There is no significant difference in the mean number of classes missed between students who had an early class and those who did not (t = 1.532, df = 251, p = 0.127; 95% CI for the difference -0.188 to 1.506).
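ClassesMissed is a count whose standard deviation exceeds its mean in both groups, which suggests a skewed distribution. A rank-based test offers a robustness check on the t-test. The following sketch swaps in base R's Wilcoxon rank-sum test. This is not part of the original analysis, and the helper name is hypothetical.

```r
# Hypothetical helper: Wilcoxon rank-sum (Mann-Whitney) test of
# ClassesMissed between EarlyClass groups. exact = FALSE uses the normal
# approximation, which handles the many tied counts.
classes_missed_wilcox <- function(df) {
  wilcox.test(ClassesMissed ~ EarlyClass, data = df, exact = FALSE)
}

# Usage: classes_missed_wilcox(sleep)
```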
hist(sleep$Happiness)
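The histogram shows only the overall distribution of Happiness, so it cannot by itself establish a significant difference by depression status. Confirming the impression that students with normal depression status are happier than those with moderate depression requires a formal two-group comparison.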
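One way to test the depression-status claim is to keep only the "normal" and "moderate" groups and compare mean Happiness. The following is a minimal sketch using base R's `t.test()`, which by default runs the Welch version. The helper name is hypothetical, and the level labels are taken from the printed DepressionStatus values.

```r
# Hypothetical helper: Welch two-sample t-test of Happiness between the
# "normal" and "moderate" DepressionStatus groups.
happiness_by_depression <- function(df) {
  two_groups <- df[df$DepressionStatus %in% c("normal", "moderate"), ]
  # Re-factor so t.test sees exactly these two levels, in this order
  two_groups$DepressionStatus <- factor(two_groups$DepressionStatus,
                                        levels = c("normal", "moderate"))
  t.test(Happiness ~ DepressionStatus, data = two_groups)
}

# Usage: happiness_by_depression(sleep)
```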
ttest(AverageSleep~AllNighter, data=sleep, alternative="two_sided")
##
## Compare AverageSleep across AllNighter with levels 0 and 1
## Grouping Variable: AllNighter
## Response Variable: AverageSleep
##
##
## ------ Describe ------
##
## AverageSleep for AllNighter 0: n.miss = 0, n = 219, mean = 8.074, sd = 0.916
## AverageSleep for AllNighter 1: n.miss = 0, n = 34, mean = 7.271, sd = 0.994
##
## Mean Difference of AverageSleep: 0.803
##
## Weighted Average Standard Deviation: 0.927
##
##
## ------ Assumptions ------
##
## Note: These hypothesis tests can perform poorly, and the
## t-test is typically robust to violations of assumptions.
## Use as heuristic guides instead of interpreting literally.
##
## Null hypothesis, for each group, is a normal distribution of AverageSleep.
## Group 0: Sample mean assumed normal because n > 30, so no test needed.
## Group 1: Sample mean assumed normal because n > 30, so no test needed.
##
## Null hypothesis is equal variances of AverageSleep, homogeneous.
## Variance Ratio test: F = 0.988/0.840 = 1.177, df = 33;218, p-value = 0.489
## Levene's test, Brown-Forsythe: t = -0.815, df = 251, p-value = 0.416
##
##
## ------ Infer ------
##
## --- Assume equal population variances of AverageSleep for each AllNighter
##
## t-cutoff for 95% range of variation: tcut = 1.969
## Standard Error of Mean Difference: SE = 0.171
##
## Hypothesis Test of 0 Mean Diff: t-value = 4.698, df = 251, p-value = 0.000
##
## Margin of Error for 95% Confidence Level: 0.336
## 95% Confidence Interval for Mean Difference: 0.466 to 1.139
##
##
## --- Do not assume equal population variances of AverageSleep for each AllNighter
##
## t-cutoff: tcut = 2.018
## Standard Error of Mean Difference: SE = 0.181
##
## Hypothesis Test of 0 Mean Diff: t = 4.426, df = 42.171, p-value = 0.000
##
## Margin of Error for 95% Confidence Level: 0.366
## 95% Confidence Interval for Mean Difference: 0.437 to 1.169
##
##
## ------ Effect Size ------
##
## --- Assume equal population variances of AverageSleep for each AllNighter
##
## Standardized Mean Difference of AverageSleep, Cohen's d: 0.866
##
##
## ------ Practical Importance ------
##
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
##
##
## ------ Graphics Smoothing Parameter ------
##
## Density bandwidth for AllNighter 0: 0.333
## Density bandwidth for AllNighter 1: 0.559
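Yes. Students who reported at least one all-nighter averaged significantly less sleep than those who did not (7.271 vs. 8.074 hours; t = 4.698, df = 251, p < 0.001; 95% CI for the difference 0.466 to 1.139 hours). This is a large standardized difference (Cohen's d = 0.866).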
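The ttest() call above compares AverageSleep, the amount of sleep. If the question concerns sleep quality, PoorSleepQuality would be the response instead. Judging by its name, higher values of that variable appear to indicate worse sleep quality; this is an assumption to check against the dataset documentation. The following is a minimal sketch with a hypothetical helper.

```r
# Hypothetical helper: Welch two-sample t-test of PoorSleepQuality
# between students with (1) and without (0) an all-nighter.
sleep_quality_by_allnighter <- function(df) {
  t.test(PoorSleepQuality ~ AllNighter, data = df)
}

# Usage: sleep_quality_by_allnighter(sleep)
```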
# Perform the 2-sample t-test for comparing means
ttest(StressScore~AlcoholUseGroup, data=sleep, alternative="two_sided")
##
## Compare StressScore across AlcoholUseGroup with levels High Use and Low Use
## Grouping Variable: AlcoholUseGroup
## Response Variable: StressScore
##
##
## ------ Describe ------
##
## StressScore for AlcoholUseGroup High Use: n.miss = 0, n = 136, mean = 9.581, sd = 8.183
## StressScore for AlcoholUseGroup Low Use: n.miss = 0, n = 117, mean = 9.333, sd = 7.708
##
## Mean Difference of StressScore: 0.248
##
## Weighted Average Standard Deviation: 7.967
##
##
## ------ Assumptions ------
##
## Note: These hypothesis tests can perform poorly, and the
## t-test is typically robust to violations of assumptions.
## Use as heuristic guides instead of interpreting literally.
##
## Null hypothesis, for each group, is a normal distribution of StressScore.
## Group High Use: Sample mean assumed normal because n > 30, so no test needed.
## Group Low Use: Sample mean assumed normal because n > 30, so no test needed.
##
## Null hypothesis is equal variances of StressScore, homogeneous.
## Variance Ratio test: F = 66.956/59.414 = 1.127, df = 135;116, p-value = 0.509
## Levene's test, Brown-Forsythe: t = 0.251, df = 251, p-value = 0.802
##
##
## ------ Infer ------
##
## --- Assume equal population variances of StressScore for each AlcoholUseGroup
##
## t-cutoff for 95% range of variation: tcut = 1.969
## Standard Error of Mean Difference: SE = 1.005
##
## Hypothesis Test of 0 Mean Diff: t-value = 0.246, df = 251, p-value = 0.806
##
## Margin of Error for 95% Confidence Level: 1.978
## 95% Confidence Interval for Mean Difference: -1.731 to 2.226
##
##
## --- Do not assume equal population variances of StressScore for each AlcoholUseGroup
##
## t-cutoff: tcut = 1.970
## Standard Error of Mean Difference: SE = 1.000
##
## Hypothesis Test of 0 Mean Diff: t = 0.248, df = 248.919, p-value = 0.805
##
## Margin of Error for 95% Confidence Level: 1.970
## 95% Confidence Interval for Mean Difference: -1.722 to 2.217
##
##
## ------ Effect Size ------
##
## --- Assume equal population variances of StressScore for each AlcoholUseGroup
##
## Standardized Mean Difference of StressScore, Cohen's d: 0.031
##
##
## ------ Practical Importance ------
##
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
##
##
## ------ Graphics Smoothing Parameter ------
##
## Density bandwidth for AlcoholUseGroup High Use: 2.895
## Density bandwidth for AlcoholUseGroup Low Use: 3.070
No. Low alcohol users (Abstain or Light) do not have significantly different stress scores from high users (all other AlcoholUse levels): the mean StressScore is 9.333 vs. 9.581 (t = 0.246, df = 251, p = 0.806).
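The test above groups abstainers together with light drinkers. A question specifically about abstainers would compare Abstain against everyone else. The following is a minimal sketch with a hypothetical helper and hypothetical group labels.

```r
# Hypothetical helper: Welch t-test of StressScore for abstainers vs.
# all other AlcoholUse levels.
abstain_vs_others <- function(df) {
  df$Abstainer <- factor(ifelse(df$AlcoholUse == "Abstain", "Abstain", "Drinks"),
                         levels = c("Abstain", "Drinks"))
  t.test(StressScore ~ Abstainer, data = df)
}

# Usage: abstain_vs_others(sleep)
```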
# Perform the 2-sample t-test for comparing means
ttest(Drinks~Gender, data=sleep, alternative="two_sided")
##
## Compare Drinks across Gender with levels 1 and 0
## Grouping Variable: Gender
## Response Variable: Drinks
##
##
## ------ Describe ------
##
## Drinks for Gender 1: n.miss = 0, n = 102, mean = 7.539, sd = 4.929
## Drinks for Gender 0: n.miss = 0, n = 151, mean = 4.238, sd = 2.720
##
## Mean Difference of Drinks: 3.301
##
## Weighted Average Standard Deviation: 3.768
##
##
## ------ Assumptions ------
##
## Note: These hypothesis tests can perform poorly, and the
## t-test is typically robust to violations of assumptions.
## Use as heuristic guides instead of interpreting literally.
##
## Null hypothesis, for each group, is a normal distribution of Drinks.
## Group 1: Sample mean assumed normal because n > 30, so no test needed.
## Group 0: Sample mean assumed normal because n > 30, so no test needed.
##
## Null hypothesis is equal variances of Drinks, homogeneous.
## Variance Ratio test: F = 24.291/7.396 = 3.284, df = 101;150, p-value = 0.000
## Levene's test, Brown-Forsythe: t = 5.471, df = 251, p-value = 0.000
##
##
## ------ Infer ------
##
## --- Assume equal population variances of Drinks for each Gender
##
## t-cutoff for 95% range of variation: tcut = 1.969
## Standard Error of Mean Difference: SE = 0.483
##
## Hypothesis Test of 0 Mean Diff: t-value = 6.836, df = 251, p-value = 0.000
##
## Margin of Error for 95% Confidence Level: 0.951
## 95% Confidence Interval for Mean Difference: 2.350 to 4.252
##
##
## --- Do not assume equal population variances of Drinks for each Gender
##
## t-cutoff: tcut = 1.977
## Standard Error of Mean Difference: SE = 0.536
##
## Hypothesis Test of 0 Mean Diff: t = 6.160, df = 142.754, p-value = 0.000
##
## Margin of Error for 95% Confidence Level: 1.059
## 95% Confidence Interval for Mean Difference: 2.242 to 4.360
##
##
## ------ Effect Size ------
##
## --- Assume equal population variances of Drinks for each Gender
##
## Standardized Mean Difference of Drinks, Cohen's d: 0.876
##
##
## ------ Practical Importance ------
##
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
##
##
## ------ Graphics Smoothing Parameter ------
##
## Density bandwidth for Gender 1: 2.227
## Density bandwidth for Gender 0: 1.136
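Yes. Gender 1 (male) students report considerably more drinks than Gender 0 (female) students (7.539 vs. 4.238, a difference of 3.301). Because the group variances differ (variance ratio F = 3.284, p < 0.001), the unequal-variance result is the one to report: t = 6.160, df = 142.754, p < 0.001, 95% CI 2.242 to 4.360.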
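The "do not assume equal population variances" section uses the Welch standard error and Welch-Satterthwaite degrees of freedom. The following is a minimal sketch that recomputes them from the rounded group summaries, so the values match the printed output only approximately. The function name is hypothetical.

```r
# Welch (unequal-variance) standard error, t statistic and
# Welch-Satterthwaite degrees of freedom from group summaries.
welch_summary <- function(mean1, sd1, n1, mean2, sd2, n2) {
  v1 <- sd1^2 / n1  # squared standard error of the group 1 mean
  v2 <- sd2^2 / n2  # squared standard error of the group 2 mean
  se <- sqrt(v1 + v2)
  dof <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  c(se = se, t = (mean1 - mean2) / se, df = dof)
}

# Rounded Drinks summaries from the ttest() output (Gender 1 vs. Gender 0)
drinks <- welch_summary(7.539, 4.929, 102, 4.238, 2.720, 151)
print(round(drinks, 3))
```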
hist(sleep$StressScore)
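The histogram shows only the distribution of StressScore, so it cannot by itself show whether stress differs with weekday bedtime. The claim that students with later weekday bedtimes are significantly less stressed than students with earlier bedtimes still needs a test relating StressScore to WeekdayBed.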
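Both WeekdayBed and StressScore are numeric, so a correlation test is one direct way to check the bedtime claim. WeekdayBed appears to record clock time in hours, with values above 24 meaning after midnight (for example, 25.75 would be 1:45 a.m.). This reading is inferred from the printed values rather than confirmed. The following is a minimal sketch with a hypothetical helper. A negative correlation would mean later bedtimes go with lower stress.

```r
# Hypothetical helper: Pearson correlation test (the cor.test default)
# between weekday bedtime and stress score.
stress_vs_bedtime <- function(df) {
  cor.test(df$WeekdayBed, df$StressScore)
}

# Usage: stress_vs_bedtime(sleep)
```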
hist(sleep$AverageSleep)
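The histogram shows only the overall distribution of AverageSleep, which combines weekdays and weekends and does not separate class years. The claim that students in later years sleep less on weekends than first- and second-year students needs a comparison of WeekendSleep across class-year groups.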
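The weekend-sleep claim can be tested by splitting ClassYear into the first two years vs. later years and comparing WeekendSleep. The following is a minimal sketch that assumes ClassYear still holds the original 1 to 4 codes from the CSV. The helper name and group labels are hypothetical.

```r
# Hypothetical helper: Welch t-test of WeekendSleep for class years 1-2
# vs. class years 3-4.
weekend_sleep_by_year <- function(df) {
  df$YearGroup <- factor(ifelse(df$ClassYear %in% c(1, 2),
                                "First Two Years", "Later Years"),
                         levels = c("First Two Years", "Later Years"))
  t.test(WeekendSleep ~ YearGroup, data = df)
}

# Usage: weekend_sleep_by_year(sleep)
```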
I was not able to generate output for the questions that involved more than two variables. Among the questions I could test with ttest(), GPA by gender, average sleep by all-nighter status, and drinks by gender differ significantly. Classes missed by early-class status and stress score by alcohol use group do not. The claims based only on histograms are not yet supported by a formal test.