1-Load the ToothGrowth data and perform some basic exploratory data analyses
data(ToothGrowth)
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
plot(ToothGrowth$len~ToothGrowth$supp)
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
plot(ToothGrowth$len~ToothGrowth$dose)
table(ToothGrowth$supp,ToothGrowth$dose)
##
##      0.5  1  2
##   OJ  10 10 10
##   VC  10 10 10
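The box plots and the count table above can be complemented with a numeric summary per supplement/dose cell. The following is a minimal sketch in base R; the names `tg` and `cell_stats` are illustrative and not part of the original analysis, and a local copy of the data is used so that the `ToothGrowth` object in the session is left unchanged.

```r
# Work on a local copy so the session's ToothGrowth (with dose as a factor) is untouched.
tg <- datasets::ToothGrowth

# Mean, standard deviation and count of tooth length for each supp/dose cell.
cell_stats <- aggregate(
  len ~ supp + dose,
  data = tg,
  FUN = function(x) c(mean = mean(x), sd = sd(x), n = length(x))
)
print(cell_stats)
```

Each of the six cells should report `n = 10`, matching the balanced design shown by `table()`.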
2-Provide a basic summary of the data.
summary(ToothGrowth)
##       len        supp     dose
##  Min.   : 4.20   OJ:30   0.5:20
##  1st Qu.:13.07   VC:30   1  :20
##  Median :19.25           2  :20
##  Mean   :18.81
##  3rd Qu.:25.27
##  Max.   :33.90
3-Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose. (Only use the techniques from class, even if there are other approaches worth considering)
-The populations are independent: yes, since the two samples are not related.
-The population variances are equal: yes (the ratio of the sample standard deviations falls between 0.5 and 2, and the F test does not reject equal variances), so a pooled t-test is used.
sd(ToothGrowth$len[ToothGrowth$supp=="OJ"])/sd(ToothGrowth$len[ToothGrowth$supp=="VC"])
## [1] 0.7991215
var.test(ToothGrowth$len~ToothGrowth$supp)
##
## F test to compare two variances
##
## data: ToothGrowth$len by ToothGrowth$supp
## F = 0.6386, num df = 29, denom df = 29, p-value = 0.2331
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.3039488 1.3416857
## sample estimates:
## ratio of variances
## 0.6385951
-Each population is either normal or the sample size is large: yes, n_1 = n_2 = 30 > 25.
ggpubr::ggqqplot(ToothGrowth$len[ToothGrowth$supp=="OJ"])
ggpubr::ggqqplot(ToothGrowth$len[ToothGrowth$supp=="VC"])
t.test(ToothGrowth$len[ToothGrowth$supp=="OJ"],ToothGrowth$len[ToothGrowth$supp=="VC"],paired = FALSE, var.equal = TRUE)
##
## Two Sample t-test
##
## data: ToothGrowth$len[ToothGrowth$supp == "OJ"] and ToothGrowth$len[ToothGrowth$supp == "VC"]
## t = 1.9153, df = 58, p-value = 0.06039
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1670064 7.5670064
## sample estimates:
## mean of x mean of y
## 20.66333 16.96333
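To make the pooled test above less of a black box, the statistic, p-value and confidence interval can be recomputed from the pooled-variance formulas. This is a sketch under the same equal-variance assumption; the variable names (`oj`, `vc`, `sp`, `se`, `t_stat`, `ci`) are illustrative.

```r
tg <- datasets::ToothGrowth
oj <- tg$len[tg$supp == "OJ"]
vc <- tg$len[tg$supp == "VC"]

n1 <- length(oj)
n2 <- length(vc)
df <- n1 + n2 - 2

# Pooled standard deviation: variances weighted by their degrees of freedom.
sp <- sqrt(((n1 - 1) * var(oj) + (n2 - 1) * var(vc)) / df)

# Standard error of the difference in means.
se <- sp * sqrt(1 / n1 + 1 / n2)

t_stat  <- (mean(oj) - mean(vc)) / se
p_value <- 2 * pt(-abs(t_stat), df)                      # two-sided p-value
ci      <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, df) * se

print(c(t = t_stat, df = df, p = p_value))
print(ci)
```

The values should agree with the `t.test(..., var.equal = TRUE)` output above.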
ANOVA assumptions
1-The responses for each factor level have a normal population distribution: yes (the Kolmogorov-Smirnov tests below do not reject normality at any dose level).
ks.test(ToothGrowth$len[ToothGrowth$dose == 0.5],"pnorm" , mean=mean(ToothGrowth$len[ToothGrowth$dose == 0.5]), sd=sd(ToothGrowth$len[ToothGrowth$dose == 0.5]))
## Warning in ks.test(ToothGrowth$len[ToothGrowth$dose == 0.5], "pnorm", mean =
## mean(ToothGrowth$len[ToothGrowth$dose == : ties should not be present for the
## Kolmogorov-Smirnov test
##
## One-sample Kolmogorov-Smirnov test
##
## data: ToothGrowth$len[ToothGrowth$dose == 0.5]
## D = 0.17117, p-value = 0.6011
## alternative hypothesis: two-sided
ks.test(ToothGrowth$len[ToothGrowth$dose == 1],"pnorm" , mean=mean(ToothGrowth$len[ToothGrowth$dose == 1]), sd=sd(ToothGrowth$len[ToothGrowth$dose == 1]))
## Warning in ks.test(ToothGrowth$len[ToothGrowth$dose == 1], "pnorm", mean =
## mean(ToothGrowth$len[ToothGrowth$dose == : ties should not be present for the
## Kolmogorov-Smirnov test
##
## One-sample Kolmogorov-Smirnov test
##
## data: ToothGrowth$len[ToothGrowth$dose == 1]
## D = 0.15935, p-value = 0.6901
## alternative hypothesis: two-sided
ks.test(ToothGrowth$len[ToothGrowth$dose == 2],"pnorm" , mean=mean(ToothGrowth$len[ToothGrowth$dose == 2]), sd=sd(ToothGrowth$len[ToothGrowth$dose == 2]))
## Warning in ks.test(ToothGrowth$len[ToothGrowth$dose == 2], "pnorm", mean =
## mean(ToothGrowth$len[ToothGrowth$dose == : ties should not be present for the
## Kolmogorov-Smirnov test
##
## One-sample Kolmogorov-Smirnov test
##
## data: ToothGrowth$len[ToothGrowth$dose == 2]
## D = 0.13684, p-value = 0.848
## alternative hypothesis: two-sided
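The three Kolmogorov-Smirnov calls above repeat the same pattern, so they can be written once and applied to each dose group. The sketch below is one possible way to do that; `len_by_dose` and `ks_p` are illustrative names, and `suppressWarnings()` is used only to hide the ties warning shown above. Because the mean and standard deviation are estimated from the same data, the p-values should be read as approximate.

```r
tg <- datasets::ToothGrowth

# One vector of tooth lengths per dose level (names "0.5", "1", "2").
len_by_dose <- split(tg$len, tg$dose)

# Same one-sample K-S test as above, applied to every dose group.
ks_p <- sapply(len_by_dose, function(x) {
  suppressWarnings(
    ks.test(x, "pnorm", mean = mean(x), sd = sd(x))$p.value
  )
})
print(round(ks_p, 4))
```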
2-These distributions have the same variance: yes (the ratio of the largest to the smallest sample standard deviation falls within 0.5 to 2).
sds <- tapply(ToothGrowth$len, ToothGrowth$dose, sd)  # standard deviation of len per dose level
max(sds)/min(sds)
## [1] 1.192259
bartlett.test(ToothGrowth$len~as.factor(ToothGrowth$dose))
##
## Bartlett test of homogeneity of variances
##
## data: ToothGrowth$len by as.factor(ToothGrowth$dose)
## Bartlett's K-squared = 0.66547, df = 2, p-value = 0.717
3-The data are independent: yes, each observation comes from a different animal.
anova(lm(ToothGrowth$len~ToothGrowth$dose))
## Analysis of Variance Table
##
## Response: ToothGrowth$len
##                  Df Sum Sq Mean Sq F value    Pr(>F)
## ToothGrowth$dose  2 2426.4  1213.2  67.416 9.533e-16 ***
## Residuals        57 1025.8    18.0
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#or summary(aov(ToothGrowth$len~ToothGrowth$dose))
TukeyHSD(aov(ToothGrowth$len~ToothGrowth$dose))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = ToothGrowth$len ~ ToothGrowth$dose)
##
## $`ToothGrowth$dose`
##         diff       lwr       upr    p adj
## 1-0.5  9.130  5.901805 12.358195 0.00e+00
## 2-0.5 15.495 12.266805 18.723195 0.00e+00
## 2-1    6.365  3.136805  9.593195 4.25e-05
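The F statistic in the ANOVA table above comes from splitting the total variation into a between-dose and a within-dose part. The following sketch recomputes those sums of squares directly; all variable names are illustrative, and dose is treated as a factor with three levels, as in the analysis above.

```r
tg <- datasets::ToothGrowth
dose <- factor(tg$dose)

grand_mean <- mean(tg$len)
group_mean <- tapply(tg$len, dose, mean)
group_n    <- tapply(tg$len, dose, length)

# Between-group sum of squares: group means around the grand mean.
ss_between <- sum(group_n * (group_mean - grand_mean)^2)

# Within-group sum of squares: observations around their own group mean.
ss_within <- sum((tg$len - group_mean[as.character(dose)])^2)

df_between <- nlevels(dose) - 1
df_within  <- length(tg$len) - nlevels(dose)

f_stat  <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

print(c(SSB = ss_between, SSW = ss_within, F = f_stat, p = p_value))
```

The sums of squares and F value should match the `Sum Sq` and `F value` columns of the table above.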
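4-State your conclusions and the assumptions needed for your conclusions.
-Supplement: the pooled t-test gives p = 0.06039 and a 95 percent confidence interval of (-0.167, 7.567) that contains 0, so at the 5% level there is not enough evidence that mean tooth length differs between OJ and VC.
-Dose: the one-way ANOVA rejects equal means (F = 67.416, p = 9.533e-16), and every Tukey pairwise interval excludes 0, so mean tooth length is higher at each larger dose.
-Assumptions: independent observations, approximately normal responses within each group, and equal variances across groups. The plots and test that follow check normality of all tooth lengths taken together.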
ggpubr::ggdensity(ToothGrowth$len,
main = "Density plot of tooth length",
xlab = "Tooth length")
ggpubr::ggqqplot(ToothGrowth$len)
ks.test(ToothGrowth$len,"pnorm", mean=mean(ToothGrowth$len),sd=sd(ToothGrowth$len))
## Warning in ks.test(ToothGrowth$len, "pnorm", mean = mean(ToothGrowth$len), :
## ties should not be present for the Kolmogorov-Smirnov test
##
## One-sample Kolmogorov-Smirnov test
##
## data: ToothGrowth$len
## D = 0.097092, p-value = 0.6237
## alternative hypothesis: two-sided