In this project, we analyze the ToothGrowth data on guinea pigs. The main goal of this analysis is to understand the effect of two delivery methods and three dose levels of vitamin C on tooth growth. Besides this primary focus, we also carry out some preliminary exploratory data analysis and give a basic summary of the data.
In this project, we have to satisfy the following four requirements:
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
##       len        supp         dose
##  Min.   : 4.20   OJ:30   Min.   :0.500
##  1st Qu.:13.07   VC:30   1st Qu.:0.500
##  Median :19.25           Median :1.000
##  Mean   :18.81           Mean   :1.167
##  3rd Qu.:25.27           3rd Qu.:2.000
##  Max.   :33.90           Max.   :2.000
From the above summary, we can see that there are 60 observations and 3 variables. The variables are len (tooth length), supp (delivery method, OJ or VC) and dose (dose level, 0.5, 1 or 2).
supp is a factor variable and the other two are numeric. The summary also gives the minimum, maximum, mean, median and quartiles of the numeric variables, and the number of observations for each delivery method.
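The summary reports counts per delivery method but not per dose. As a minimal sketch (the object name `design` is introduced here for illustration only), a cross-tabulation shows how the 60 observations are spread over the delivery method and dose combinations:

```r
library(datasets)

# Count observations for every (delivery method, dose) combination.
# Rows are the levels of supp, columns are the distinct dose values.
design <- table(ToothGrowth$supp, ToothGrowth$dose)
print(design)
```

Equal counts in every cell mean that the group comparisons later in the report are not driven by unequal sample sizes.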
Figure 1(a) shows that tooth length increases steadily with dose. The spread is smaller at the highest dose (2). The effect of delivery method is less clear from Figure 1(b). Tooth length tends to be higher with orange juice (OJ) than with VC, although the length varies considerably more with the VC delivery method.
Figure 2 shows tooth growth at each dose for both delivery methods. Growth is highest at the highest dose (2) regardless of the delivery method. At doses 0.5 and 1, delivery by orange juice shows higher tooth growth, while at dose 2 the two methods look similar. Delivery method VC at dose 0.5 gives the shortest tooth length. The red points in all the figures show the average tooth length.
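The red points in Figure 2 are the group means. As a rough sketch, the same six means can be tabulated directly with `aggregate()` so they can be read as numbers rather than estimated from the plot (the name `group_means` is introduced here for illustration):

```r
library(datasets)

# Mean tooth length for each combination of delivery method and dose.
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(group_means)

# The row with the smallest mean identifies the least effective combination.
lowest <- group_means[which.min(group_means$len), ]
print(lowest)
```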
The null hypothesis for each t-test is that the group means are equal. If the means are equal between delivery methods or between dose levels, that factor has no effect on tooth growth.
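To make the test concrete, the sketch below recomputes the delivery method comparison by hand, assuming the Welch (unequal variance) form that `t.test()` uses by default. The variable names (`oj`, `vc`, `se2_oj`, `df_welch` and so on) are illustrative and not part of the original analysis.

```r
library(datasets)

oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard error of each group mean
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)
se_diff <- sqrt(se2_oj + se2_vc)

# Welch t statistic for the difference in means (OJ minus VC)
t_stat <- (mean(oj) - mean(vc)) / se_diff

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval for the difference
p_value <- 2 * pt(-abs(t_stat), df_welch)
ci <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, df_welch) * se_diff

# Reference result from the built-in function, for comparison
reference <- t.test(len ~ supp, data = ToothGrowth)
print(c(t = t_stat, df = df_welch, p = p_value))
print(ci)
```

The hand-computed values should agree with `reference`, which shows that the reported interval is for the OJ mean minus the VC mean.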
## [1] "95% confidence interval of difference in mean based on delivery methods"
## [1] -0.1710156 7.5710156
## [1] "P-value : 0.0606345078809341"
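In the above t-test, the 95% confidence interval contains zero and the p-value (0.061) is above 0.05. Thus, we cannot reject the null hypothesis of equal means: the data do not show a significant overall effect of delivery method on tooth growth.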
Comparison between doses 0.5 and 1
## [1] "95% confidence interval of difference in mean between dose 0.5 and 1"
## [1] -11.983781 -6.276219
## [1] "P-value : 1.26830072017385e-07"
Comparison between doses 1 and 2
## [1] "95% confidence interval of difference in mean between dose 1 and 2"
## [1] -8.996481 -3.733519
## [1] "P-value : 1.9064295136718e-05"
Comparison between doses 0.5 and 2
## [1] "95% confidence interval of difference in mean between dose 0.5 and 2"
## [1] -18.15617 -12.83383
## [1] "P-value : 4.39752495936323e-14"
The above three t-tests compare tooth growth between dose levels. In all three cases, the 95% confidence interval lies entirely below zero and the p-value is < 0.05, so we reject the null hypothesis of equal means. Each interval is for the mean at the lower dose minus the mean at the higher dose, so the negative values indicate longer teeth at the higher dose.
We can also run the t-test on delivery method within a single dose level. Here, we look only at dose 2, because Figure 2 suggests that delivery method has no significant effect at that dose.
## [1] "95% confidence interval of difference in mean between delivery methods of dose 2"
## [1] -3.79807 3.63807
## [1] "P-value : 0.963851588723373"
Here, the 95% confidence interval runs from -3.79807 to 3.63807 and the p-value is 0.9639. The confidence interval contains zero and the p-value is > 0.05, so we cannot reject the null hypothesis that the means are equal: at dose 2 there is no evidence of an effect of delivery method.
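The report tests delivery method only at dose 2. As a hedged sketch, the same Welch t-test can be repeated at every dose level to see whether the pattern in Figure 2 at doses 0.5 and 1 holds up. The names `doses` and `p_by_dose` are introduced here for illustration and do not appear in the original analysis.

```r
library(datasets)

doses <- sort(unique(ToothGrowth$dose))

# Welch t-test of OJ versus VC within each dose level; keep only the p-value.
p_by_dose <- sapply(doses, function(d) {
  at_dose <- ToothGrowth[ToothGrowth$dose == d, ]
  t.test(len ~ supp, data = at_dose)$p.value
})
names(p_by_dose) <- doses
print(signif(p_by_dose, 3))
```

The entry for dose 2 reproduces the p-value reported above, and the entries for doses 0.5 and 1 give the evidence for or against a delivery method effect at those doses.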
Finally, we can say that tooth growth increases with dose. Figure 2 suggests that delivery method also has some effect at the low (0.5) and medium (1) doses, where orange juice seems to be more effective, while at dose 2 the two delivery methods do not differ significantly.
library(datasets)
tooth <- ToothGrowth
str(tooth)
summary(tooth)
# Figure 1: two boxplots side by side
par(mfrow = c(1, 2))
boxplot(len ~ dose, data = tooth,
        main = "Fig 1(a): Tooth Growth vs. Dose",
        xlab = "Amount of dose",
        ylab = "Tooth length")
# Overlay the mean of each dose group as a red diamond
means <- tapply(tooth$len, tooth$dose, mean)
points(means, col = "red", pch = 18, cex = 2)
boxplot(len ~ supp, data = tooth,
        main = "Fig 1(b): Tooth Growth vs. Delivery Method",
        xlab = "Delivery Methods",
        ylab = "Tooth length")
means <- tapply(tooth$len, tooth$supp, mean)
points(means, col = "red", pch = 18, cex = 2)
# Figure 2: back to a single panel, one box per (delivery method, dose) pair
par(mfrow = c(1, 1))
boxplot(len ~ supp + dose, data = tooth,
        main = "Fig 2: Tooth Growth at different Dose and Delivery Method",
        xlab = "Amount of (dose + delivery method)",
        ylab = "Tooth length")
# as.vector() flattens the supp-by-dose matrix column by column,
# which matches the left-to-right order of the boxes
means <- tapply(tooth$len, list(tooth$supp, tooth$dose), mean)
means <- as.vector(means)
points(means, col = "red", pch = 18, cex = 2)
# Welch two-sample t-tests (t.test() default, var.equal = FALSE)
test1 <- t.test(len ~ supp, data = tooth)
print("95% confidence interval of difference in mean based on delivery methods")
test1$conf.int[1:2]
print(paste0("P-value : ", test1$p.value))
low_mid <- subset(tooth, dose == 0.5 | dose == 1)
test2 <- t.test(len ~ dose, data = low_mid)
print("95% confidence interval of difference in mean between dose 0.5 and 1")
test2$conf.int[1:2]
print(paste0("P-value : ", test2$p.value))
mid_hi <- subset(tooth, dose == 1 | dose == 2)
test3 <- t.test(len ~ dose, data = mid_hi)
print("95% confidence interval of difference in mean between dose 1 and 2")
test3$conf.int[1:2]
print(paste0("P-value : ", test3$p.value))
low_hi <- subset(tooth, dose == 0.5 | dose == 2)
test4 <- t.test(len ~ dose, data = low_hi)
print("95% confidence interval of difference in mean between dose 0.5 and 2")
test4$conf.int[1:2]
print(paste0("P-value : ", test4$p.value))
dose_2 <- subset(tooth, dose == 2)
test5 <- t.test(len ~ supp, data = dose_2)
print("95% confidence interval of difference in mean between delivery methods of dose 2")
test5$conf.int[1:2]
print(paste0("P-value : ", test5$p.value))