library(ggplot2)
library(datasets)
library(gridExtra)
data(ToothGrowth)
attach(ToothGrowth)
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
The dataset has 60 observations and 3 variables (len, supp and dose). The first six rows give a sense of what the data looks like:
head(ToothGrowth)
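# qplot() is deprecated in recent ggplot2 releases; the same layers built with ggplot()
ggplot(ToothGrowth, aes(x = supp, y = len)) +
  geom_point() +
  geom_boxplot(aes(fill = supp)) +
  facet_wrap(~dose) +
  labs(title = "Tooth Growth by Supplement and Dosage", x = "Supplement type", y = "Tooth Length")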
According to the chart, tooth length increases with dose for both supplements. OJ is also associated with longer teeth than VC at the 0.5 and 1.0 mg/day doses, but not at the highest dose (2.0 mg/day), where the two supplements look about equal.
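The visual impression from the chart can be backed with numbers. The snippet below is a minimal sketch that uses only base R; the name `group_means` is introduced here for illustration and is not part of the original analysis.

```r
# Mean tooth length for every supplement/dose combination
# (ToothGrowth ships with the datasets package, which R attaches by default).
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(group_means)
```

The six cell means it returns are the same sample estimates that the per-dose t-tests report later in the analysis.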
We assume that the observations are independent of each other and that tooth length is approximately normally distributed, and we set alpha to 5%. Each hypothesis is tested with the t.test function, which also reports a 95% confidence interval.
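The normality assumption can be given a rough check before any t-test is run. The sketch below applies the Shapiro-Wilk test (`shapiro.test` from the stats package) to each supplement/dose cell. This check is an addition for illustration, not part of the original analysis, and the names `cells` and `shapiro_p` are hypothetical. With only 10 observations per cell the test has little power, so a large p-value is weak reassurance at best.

```r
# Split len into the six supplement/dose cells (10 observations each)
cells <- split(ToothGrowth$len, list(ToothGrowth$supp, ToothGrowth$dose))

# Shapiro-Wilk p-value per cell; a small value would cast doubt on normality
shapiro_p <- sapply(cells, function(x) shapiro.test(x)$p.value)
round(shapiro_p, 3)
```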
State Hypothesis
Null hypothesis: lenOJ = lenVC (mean tooth length is the same for both supplements)
Alternative hypothesis: lenOJ > lenVC
Prepare the data set
OJ <- subset(ToothGrowth, supp == "OJ")
VC <- subset(ToothGrowth, supp == "VC")
# One-sided Welch test; data is passed explicitly instead of relying on attach()
t.test(len ~ supp, data = ToothGrowth, alternative = "greater", var.equal = FALSE, conf.level = 0.95)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 1.9153, df = 55.309, p-value = 0.03032
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 0.4682687 Inf
## sample estimates:
## mean in group OJ mean in group VC
## 20.66333 16.96333
The p-value (0.030) is less than 5% and the one-sided 95% CI (0.47, Inf) lies entirely above 0, so we reject the null hypothesis: with all doses pooled, mean tooth length is greater with OJ than with VC.
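To see where the reported t statistic, degrees of freedom, p-value and confidence bound come from, the Welch test can be reproduced by hand. This is a sketch of the standard Welch formulas, not of how `t.test` is written internally, and all variable names are chosen here for illustration.

```r
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Variance of each sample mean, and the standard error of their difference
v_oj <- var(oj) / length(oj)
v_vc <- var(vc) / length(vc)
se <- sqrt(v_oj + v_vc)

# Welch t statistic for mean(OJ) - mean(VC)
t_stat <- (mean(oj) - mean(vc)) / se

# Welch-Satterthwaite approximation to the degrees of freedom
welch_df <- (v_oj + v_vc)^2 /
  (v_oj^2 / (length(oj) - 1) + v_vc^2 / (length(vc) - 1))

# One-sided p-value for the alternative OJ > VC
p_one_sided <- pt(t_stat, welch_df, lower.tail = FALSE)

# Lower bound of the one-sided 95% confidence interval
ci_lower <- (mean(oj) - mean(vc)) - qt(0.95, welch_df) * se

c(t = t_stat, df = welch_df, p = p_one_sided, ci_lower = ci_lower)
```

The four values should agree with the `t.test` output above (t = 1.9153, df = 55.309, p-value = 0.03032, lower bound 0.4682687).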
State Hypothesis
Null hypothesis: mean tooth length is the same at both dose levels being compared
Alternative hypothesis: mean tooth length differs between the two dose levels
Prepare a subset for each pair of dose levels
dose05_10 <- subset(ToothGrowth, dose %in% c(0.5, 1.0))
dose05_20 <- subset(ToothGrowth, dose %in% c(0.5, 2.0))
dose10_20 <- subset(ToothGrowth, dose %in% c(1.0, 2.0))
t.test(len ~ dose, var.equal = FALSE, data = dose05_10)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.983781 -6.276219
## sample estimates:
## mean in group 0.5 mean in group 1
## 10.605 19.735
The p-value is less than 5% and the 95% CI (-11.98, -6.28) excludes 0, so we reject the null hypothesis: mean tooth length is lower at dose 0.5 than at dose 1.0.
t.test(len ~ dose, var.equal = FALSE, data = dose05_20)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -18.15617 -12.83383
## sample estimates:
## mean in group 0.5 mean in group 2
## 10.605 26.100
The p-value is less than 5% and the 95% CI (-18.16, -12.83) excludes 0, so we reject the null hypothesis: mean tooth length is lower at dose 0.5 than at dose 2.0.
t.test(len ~ dose, var.equal = FALSE, data = dose10_20)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2
## 19.735 26.100
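The p-value is less than 5% and the 95% CI (-9.00, -3.73) excludes 0, so we reject the null hypothesis: mean tooth length is lower at dose 1.0 than at dose 2.0.
In conclusion, mean tooth length differs significantly between every pair of dose levels, and it increases with the dose.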
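Because three pairwise tests are run on the same data, it is worth checking that the conclusion survives a multiple-comparison correction. The sketch below recomputes the three Welch p-values and applies a Bonferroni adjustment with `p.adjust`. The choice of Bonferroni is an assumption made here for illustration, and the names `dose_pairs`, `raw_p` and `adj_p` are hypothetical.

```r
# All pairs of dose levels: columns are (0.5, 1), (0.5, 2), (1, 2)
dose_pairs <- combn(sort(unique(ToothGrowth$dose)), 2)

# Welch two-sample t-test for each pair, keeping only the p-value
raw_p <- apply(dose_pairs, 2, function(pair) {
  low  <- ToothGrowth$len[ToothGrowth$dose == pair[1]]
  high <- ToothGrowth$len[ToothGrowth$dose == pair[2]]
  t.test(low, high, var.equal = FALSE)$p.value
})
names(raw_p) <- apply(dose_pairs, 2, paste, collapse = " vs ")

# Bonferroni multiplies each p-value by the number of tests (capped at 1)
adj_p <- p.adjust(raw_p, method = "bonferroni")
adj_p
```

The raw p-values are already far below 5%, so tripling them does not change any of the three decisions.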
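State Hypothesis
Null hypothesis: at a given dose, mean tooth length is the same for OJ and VC
Alternative hypothesis: at a given dose, mean tooth length differs between OJ and VC
Prepare a subset for each dose level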
dose05 <- subset(ToothGrowth, dose %in% c(0.5))
dose10 <- subset(ToothGrowth, dose %in% c(1.0))
dose20 <- subset(ToothGrowth, dose %in% c(2.0))
t.test(len ~ supp, var.equal = FALSE, data = dose05)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.719057 8.780943
## sample estimates:
## mean in group OJ mean in group VC
## 13.23 7.98
The p-value is less than 5% and the 95% CI (1.72, 8.78) excludes 0, so we reject the null hypothesis: at dose 0.5, mean tooth length is greater with OJ than with VC.
t.test(len ~ supp, var.equal = FALSE, data = dose10)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 2.802148 9.057852
## sample estimates:
## mean in group OJ mean in group VC
## 22.70 16.77
The p-value is less than 5% and the 95% CI (2.80, 9.06) excludes 0, so we reject the null hypothesis: at dose 1.0, mean tooth length is greater with OJ than with VC.
t.test(len ~ supp, var.equal = FALSE, data = dose20)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = -0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.79807 3.63807
## sample estimates:
## mean in group OJ mean in group VC
## 26.06 26.14
The p-value (0.9639) is greater than 5% and the 95% CI (-3.80, 3.64) contains 0, so we fail to reject the null hypothesis: at a dose of 2.0 mg/day there is no evidence that the supplement type affects tooth length.
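The three per-dose comparisons are easier to read side by side. The sketch below reruns the same Welch test at each dose and collects the estimates in one data frame. The names `by_dose` and `per_dose` are introduced here for illustration only.

```r
# Run the OJ-vs-VC Welch test separately at each dose and collect the results
by_dose <- lapply(split(ToothGrowth, ToothGrowth$dose), function(d) {
  res <- t.test(len ~ supp, data = d, var.equal = FALSE)
  data.frame(
    dose     = d$dose[1],
    diff     = unname(res$estimate[1] - res$estimate[2]),  # mean OJ - mean VC
    ci_lower = res$conf.int[1],
    ci_upper = res$conf.int[2],
    p_value  = res$p.value
  )
})
per_dose <- do.call(rbind, by_dose)
per_dose
```

Each row corresponds to one of the three tests above: the interval excludes 0 at doses 0.5 and 1.0 and contains 0 at dose 2.0.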
Conclusion