library(tidyverse) # dplyr (%>%, filter); also attaches ggplot2
library(ggplot2)
library(dslabs)
library(readxl) # read_excel()
library(car) # leveneTest()
Modality <- read_excel("Modality.xlsx")
boxplot(Modality$Final ~ Modality$Modality, xlab = "Teaching Modality", ylab = "Final Exam Score")
It looks like final exam scores for f2f are higher than for the other two modalities (hybrid and online)
f2f <- Modality %>% filter(Modality == "f2f")
hist(f2f$Final) #looks normal
shapiro.test(f2f$Final) #0.961 ok!
##
## Shapiro-Wilk normality test
##
## data: f2f$Final
## W = 0.982, p-value = 0.961
hyb <- Modality %>% filter(Modality == "hybrid")
shapiro.test(hyb$Final) #0.8953 ok!
##
## Shapiro-Wilk normality test
##
## data: hyb$Final
## W = 0.96966, p-value = 0.8953
hist(hyb$Final) #looks ok
onl <- Modality %>% filter(Modality == "online")
shapiro.test(onl$Final) #0.9443 ok!
##
## Shapiro-Wilk normality test
##
## data: onl$Final
## W = 0.97663, p-value = 0.9443
hist(onl$Final)
All three groups meet the assumption of normality (Shapiro-Wilk p > 0.05 for each)
leveneTest(Modality$Final ~ as.factor(Modality$Modality))
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 0.534 0.5948
## 19
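p-value = 0.5948 - ok! Assumption of homogeneity of variance is met!
The non-parametric tests below (Kruskal-Wallis and pairwise Wilcoxon) are shown for reference, BUT THEY ARE NOT RELEVANT IN THIS CASE because the ANOVA assumptions are met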
kruskal.test(Modality$Final ~ Modality$Modality)
##
## Kruskal-Wallis rank sum test
##
## data: Modality$Final by Modality$Modality
## Kruskal-Wallis chi-squared = 5.3961, df = 2, p-value = 0.06734
#post-hoc test (non-parametric)
pairwise.wilcox.test(Modality$Final, Modality$Modality, p.adjust.method = "holm", exact=F)
##
## Pairwise comparisons using Wilcoxon rank sum test with continuity correction
##
## data: Modality$Final and Modality$Modality
##
## f2f hybrid
## hybrid 0.12 -
## online 0.12 0.96
##
## P value adjustment method: holm
anovaMod <- aov(Modality$Final ~ Modality$Modality)
summary(anovaMod)
## Df Sum Sq Mean Sq F value Pr(>F)
## Modality$Modality 2 1527 763.6 4.097 0.0332 *
## Residuals 19 3541 186.4
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
p-value = 0.0332 - this is significant at alpha = 0.05!
Conclude that at least one group mean is significantly different, but which one?
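As a quick check on where the F value and p-value come from, the arithmetic can be redone by hand from the sums of squares in the table above. This is only a sketch: it uses the rounded values printed by summary(), so the last digits may differ slightly, and the variable names are made up for illustration.

```r
# Rounded values copied from the summary(anovaMod) table
ss_between <- 1527; df_between <- 2
ss_resid   <- 3541; df_resid   <- 19

# Mean squares are sums of squares divided by their degrees of freedom
ms_between <- ss_between / df_between
ms_resid   <- ss_resid / df_resid

# F is the ratio of the two mean squares; the p-value is the upper tail of F(2, 19)
f_value <- ms_between / ms_resid
p_value <- pf(f_value, df_between, df_resid, lower.tail = FALSE)
round(c(F = f_value, p = p_value), 4)
```

The result should agree with the table (F about 4.097, p about 0.0332) up to rounding.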
pairwise.t.test(Modality$Final, Modality$Modality, p.adj="holm")
##
## Pairwise comparisons using t tests with pooled SD
##
## data: Modality$Final and Modality$Modality
##
## f2f hybrid
## hybrid 0.06 -
## online 0.06 1.00
##
## P value adjustment method: holm
f2f is NOT significantly different from hybrid or online (adjusted p = 0.06 for both), but both comparisons are "marginally" or close to significant
hybrid & online are not significantly different from each other (adjusted p = 1.00)
There is a significant difference between mean final exam scores across modalities (F(2, 19) = 4.097, p = 0.0332). However, Holm-adjusted post-hoc t-tests do not identify a significant pairwise difference: students in the face-to-face modality score higher on the final exam than students in either the hybrid or online modality, but these differences are only marginally significant (p = 0.06 for both). There is no difference in final exam scores between the hybrid and online modalities (p = 1.00).
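To see how two comparisons can end up with the same Holm-adjusted p-value, here is a small sketch of the adjustment using p.adjust(). The raw p-values are hypothetical (the unadjusted values are not shown in the output above) and were picked only to illustrate the mechanics.

```r
# Hypothetical unadjusted p-values for the three pairwise comparisons
raw_p <- c(f2f_vs_hybrid = 0.02, f2f_vs_online = 0.03, hybrid_vs_online = 0.90)

# Holm: sort ascending, multiply by m, m - 1, ..., 1,
# then take the running maximum and cap at 1
holm_p <- p.adjust(raw_p, method = "holm")
holm_p
```

Here 0.02 * 3 and 0.03 * 2 both give 0.06, so the two smallest p-values are tied after adjustment, and the largest is multiplied by 1 and stays at 0.90.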
You are interested in the relationship between anger and heart disease
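cont_table <- matrix(c(53, 110, 27, 3057, 4621, 606), ncol = 2, # rows: anger level, columns: CHD yes / no
  dimnames = list(anger = c("low", "moderate", "high"), chd = c("yes", "no")))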
chisq.test(cont_table)
##
## Pearson's Chi-squared test
##
## data: cont_table
## X-squared = 16.077, df = 2, p-value = 0.0003228
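p-value is 0.0003
There is a significant difference in proportions across the anger groups
The assumption that all expected counts are > 5 is met. I know that because chisq.test() gives no warning in the output (R warns "Chi-squared approximation may be incorrect" when expected counts are too small). The expected counts can also be inspected directly with chisq.test(cont_table)$expected.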
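Rather than relying on the absence of a warning, the expected counts can be checked directly. A minimal sketch, assuming the rows are low / moderate / high anger and the columns are CHD yes / no (labels taken from the descriptive statistics in this section):

```r
# Same counts as cont_table, stored as a labelled matrix
cont_table <- matrix(c(53, 110, 27, 3057, 4621, 606), ncol = 2,
                     dimnames = list(anger = c("low", "moderate", "high"),
                                     chd = c("yes", "no")))

res <- chisq.test(cont_table)

# Expected counts under independence: row total * column total / grand total
expected <- res$expected
expected

# TRUE if every expected count exceeds 5
all(expected > 5)
```

The smallest expected count is the high anger / CHD cell (633 * 190 / 8474, about 14.2), comfortably above 5.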
What would you conclude? We need descriptive statistics.
53/(53+3057)
## [1] 0.0170418
low anger = 0.017 ~ 1.7%
110 / (110 +4621)
## [1] 0.0232509
moderate anger = 0.023 ~ 2.3%
27 / (27 + 606)
## [1] 0.04265403
high anger = 0.043 ~ 4.3%
53/3057 # odds of CHD, low anger
## [1] 0.01733726
27/606 # odds of CHD, high anger
## [1] 0.04455446
(27/606) / (53/3057) # odds ratio, high vs. low anger
## [1] 2.569867
The proportion with CHD is higher among those with high anger (4.3%) than among those with moderate (2.3%) or low (1.7%) anger, and the difference in proportions across the groups is significant (X-squared = 16.077, df = 2, p = 0.0003). The odds of having CHD are 2.57 times as high for a person who scores high on the easily angered scale as for a person who scores low on it.
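The proportions and the odds ratio above can also be computed from the table in one pass instead of typing each fraction. A sketch under the same labelling assumption (rows are low / moderate / high anger, columns are CHD yes / no); the names risk, odds and odds_ratio are illustrative only.

```r
cont_table <- matrix(c(53, 110, 27, 3057, 4621, 606), ncol = 2,
                     dimnames = list(anger = c("low", "moderate", "high"),
                                     chd = c("yes", "no")))

# Proportion with CHD in each anger group: CHD count / row total
risk <- cont_table[, "yes"] / rowSums(cont_table)

# Odds of CHD in each anger group: CHD count / non-CHD count
odds <- cont_table[, "yes"] / cont_table[, "no"]

# Odds ratio, high vs. low anger
odds_ratio <- unname(odds["high"] / odds["low"])

round(risk, 3)
round(odds_ratio, 2)
```

Note that the proportions divide by the row total while the odds divide by the non-CHD count, which is why 53/(53+3057) and 53/3057 differ slightly.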
Depression and Recreational drugs
df <- data.frame(drug=c("E", "E", "E", "E", "E", "E", "E", "E", "E", "E",
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A"),
depression = c(15, 35, 16, 18, 19, 17, 27, 16, 13, 20, 16, 15, 20, 15, 16, 13, 14, 19, 18, 18))
boxplot(df$depression ~ df$drug, xlab = "Type of drug", ylab="Depression score")
Ecstasy has a higher median depression score, with some high outliers
independent t-test
Conduct tests to determine if this data meets the assumptions
Independence - yes!
Data is interval - yes!
Data within groups is normal - check each group with Shapiro-Wilk
ecstasy <- df %>% filter(drug=="E")
shapiro.test(ecstasy$depression) #p = 0.0195 Significantly NOT normal!
##
## Shapiro-Wilk normality test
##
## data: ecstasy$depression
## W = 0.81064, p-value = 0.01952
hist(ecstasy$depression)
alcohol <- df %>% filter(drug == "A")
shapiro.test(alcohol$depression) #p=0.78 normal
##
## Shapiro-Wilk normality test
##
## data: alcohol$depression
## W = 0.95947, p-value = 0.7798
hist(alcohol$depression)
The normality assumption is not met. The Shapiro-Wilk normality test for
ecstasy$depression (W = 0.81064, p-value = 0.01952) is significant, so the
ecstasy scores depart significantly from a normal distribution.
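The two group-wise normality checks above can also be run in a single call. A minimal sketch using tapply() on the same data; shapiro_p is an illustrative name.

```r
# Same data as df above (10 ecstasy scores, then 10 alcohol scores)
df <- data.frame(drug = rep(c("E", "A"), each = 10),
                 depression = c(15, 35, 16, 18, 19, 17, 27, 16, 13, 20,
                                16, 15, 20, 15, 16, 13, 14, 19, 18, 18))

# Run shapiro.test() within each drug group and keep only the p-value
shapiro_p <- tapply(df$depression, df$drug,
                    function(x) shapiro.test(x)$p.value)

round(shapiro_p, 4)

# TRUE marks a group that departs significantly from normality at alpha = 0.05
shapiro_p < 0.05
```

Only the ecstasy group ("E") should be flagged, matching the separate tests above.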
leveneTest(df$depression ~ as.factor(df$drug))
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 1.8803 0.1872
## 18
p-value = 0.187 - ok!
Homogeneity of variance assumption met!
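For intuition about what Levene's test with center = median is doing, it can be reproduced in base R as a one-way ANOVA on the absolute deviations from each group's median. This is a sketch of the idea rather than a replacement for car::leveneTest(); abs_dev and levene_by_hand are illustrative names.

```r
df <- data.frame(drug = factor(rep(c("E", "A"), each = 10)),
                 depression = c(15, 35, 16, 18, 19, 17, 27, 16, 13, 20,
                                16, 15, 20, 15, 16, 13, 14, 19, 18, 18))

# Absolute deviation of each score from its own group's median
df$abs_dev <- abs(df$depression - ave(df$depression, df$drug, FUN = median))

# One-way ANOVA on those deviations: a large F means unequal spread
levene_by_hand <- anova(lm(abs_dev ~ drug, data = df))
levene_by_hand
```

The F value and p-value in the first row should match the leveneTest() output above (F = 1.8803, p = 0.1872).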
Since the normality assumption isn’t met, run the Wilcoxon rank-sum test
wilcox.test(df$depression ~ df$drug, exact=F)
##
## Wilcoxon rank sum test with continuity correction
##
## data: df$depression by df$drug
## W = 35.5, p-value = 0.2861
## alternative hypothesis: true location shift is not equal to 0
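# An alternative way to run it: pass the two groups as separate vectors (W differs because the group order is reversed; the p-value is the same)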
wilcox.test(ecstasy$depression, alcohol$depression, exact=F)
##
## Wilcoxon rank sum test with continuity correction
##
## data: ecstasy$depression and alcohol$depression
## W = 64.5, p-value = 0.2861
## alternative hypothesis: true location shift is not equal to 0
p-value from the Wilcoxon rank-sum test is 0.2861 - not significant. There is not a significant difference in depression following the use of ecstasy versus alcohol.
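The two calls above report different W statistics (35.5 and 64.5) but the same p-value. A short sketch of why: W counts the pairs in which the first group's score exceeds the second's (ties count one half), so swapping the group order gives the complement, and the two values add up to n1 * n2.

```r
ecstasy <- c(15, 35, 16, 18, 19, 17, 27, 16, 13, 20)
alcohol <- c(16, 15, 20, 15, 16, 13, 14, 19, 18, 18)

# Same test with the groups in either order
w_ea <- unname(wilcox.test(ecstasy, alcohol, exact = FALSE)$statistic)
w_ae <- unname(wilcox.test(alcohol, ecstasy, exact = FALSE)$statistic)

c(ecstasy_first = w_ea, alcohol_first = w_ae)

# The two statistics sum to the number of (ecstasy, alcohol) pairs
w_ea + w_ae == length(ecstasy) * length(alcohol)
```

The formula interface puts the groups in alphabetical order ("A" before "E"), which is why it reports the alcohol-first value.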
mean(alcohol$depression)
## [1] 16.4
sd(alcohol$depression)
## [1] 2.270585
median(alcohol$depression)
## [1] 16
mean(ecstasy$depression)
## [1] 19.6
sd(ecstasy$depression)
## [1] 6.60303
median(ecstasy$depression)
## [1] 17.5
The depression score for ecstasy (Mdn = 17.5) is not significantly different from the depression score for alcohol (Mdn = 16). Since the ecstasy group did not meet the assumption of normality, a Wilcoxon rank-sum test was conducted. Results show that the difference between groups was not significant, W = 64.5, p = 0.29.
Fostering kittens & happiness
Kittens <- read_excel("Kittens.xlsx")
One option: boxplot
boxplot(Kittens$Kitten, Kittens$No_kitten, ylab = "Happiness", names=c("fostering", "no fostering"))
Second option: look at the difference scores
diff <- Kittens$Kitten - Kittens$No_kitten
boxplot(diff, ylab = "Difference in happiness of fostering v. not")
hist(diff, xlab = "Happiness of fostering - happiness without fostering")
Dependent t-test
Assumptions
Differences are normally distributed
Data are dependent - yes!
Data are measured at least at the interval level - yes!
shapiro.test(diff)
##
## Shapiro-Wilk normality test
##
## data: diff
## W = 0.86632, p-value = 0.01013
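p-value = 0.01013 - the differences are not normally distributed, so the assumptions of the dependent t-test are not met. Run the Wilcoxon signed-rank test instead.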
wilcox.test(Kittens$Kitten, Kittens$No_kitten, paired=T, exact=F)
##
## Wilcoxon signed rank test with continuity correction
##
## data: Kittens$Kitten and Kittens$No_kitten
## V = 141.5, p-value = 0.06372
## alternative hypothesis: true location shift is not equal to 0
p-value = 0.06. This is not significant at alpha = 0.05 (but close).
median(diff)
## [1] 4.5
Testing whether fostering kittens increases happiness, we find that people experience a median increase of 4.5 on their happiness score. The difference scores were not normally distributed (Shapiro-Wilk p = 0.01), so we ran a Wilcoxon signed-rank test. The result (V = 141.5, p = 0.06) is close to our 0.05 cutoff but does not reach it. With alpha = 0.05, we do not find a statistically significant increase in happiness with fostering kittens.
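The signed-rank test works on the difference scores, which is why the normality check above was run on the differences rather than on each condition. A minimal sketch showing that the paired call and a one-sample test on the differences give the same result. The scores below are hypothetical (NOT the Kittens.xlsx data), and the variable names are made up for illustration.

```r
# Hypothetical happiness scores for the same 8 people under both conditions
with_kitten    <- c(72, 65, 80, 58, 77, 69, 83, 61)
without_kitten <- c(70, 60, 71, 59, 70, 66, 75, 62)

# Paired Wilcoxon signed-rank test on the two columns
paired_test <- wilcox.test(with_kitten, without_kitten,
                           paired = TRUE, exact = FALSE)

# One-sample signed-rank test on the difference scores
one_sample_test <- wilcox.test(with_kitten - without_kitten, exact = FALSE)

c(paired = paired_test$p.value, one_sample = one_sample_test$p.value)
```

Both calls should return the same V statistic and p-value, since the paired version computes the differences internally.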