This project focuses on analyzing the ToothGrowth data in the R datasets package. The following topics are incorporated in the report:
1. Load the ToothGrowth data and perform some basic exploratory data analyses.
2. Provide a basic summary of the data.
3. Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose (only use the techniques from class, even if there are other approaches worth considering).
4. State your conclusions and the assumptions needed for your conclusions.
Load the necessary packages
library(datasets)
library(dplyr)
library(ggplot2)
library(reshape)
Loading and summary of the dataset
data(ToothGrowth)
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
head(ToothGrowth)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
dim(ToothGrowth)
## [1] 60 3
Description
?ToothGrowth
The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice or ascorbic acid (a form of vitamin C and coded as VC).
The data frame has 60 observations and three variables: len (tooth length), supp (supplement type, VC for ascorbic acid or OJ for orange juice), and dose (dose in milligrams/day).
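Before summarizing, it can help to confirm that the design is balanced and that no values are missing, since the group means and t-tests that follow rely on both. The following is a minimal sketch in base R; the names `tg` and `cell_counts` are introduced here for illustration only.

```r
library(datasets)

# Work on a fresh copy so the check does not depend on any later
# conversion of dose to a factor.
tg <- datasets::ToothGrowth

# Number of animals in each supplement-by-dose cell.
cell_counts <- table(tg$supp, tg$dose)
cell_counts

# TRUE here would mean at least one missing value in the data frame.
anyNA(tg)
```

With 60 animals spread over two delivery methods and three doses, each cell should contain 10 animals.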
Exploratory Data Analysis
1. Mean grouped by dose and supp
mean_growth <- aggregate(ToothGrowth$len, list(ToothGrowth$supp, ToothGrowth$dose), mean)
# reshape::rename is called explicitly because dplyr also exports rename()
mean_growth <- reshape::rename(mean_growth, c(Group.1 = "Supplement", Group.2 = "Dosage", x = "Mean"))
mean_growth
## Supplement Dosage Mean
## 1 OJ 0.5 13.23
## 2 VC 0.5 7.98
## 3 OJ 1.0 22.70
## 4 VC 1.0 16.77
## 5 OJ 2.0 26.06
## 6 VC 2.0 26.14
2. Standard deviation grouped by dose and supp
sd_growth <- aggregate(ToothGrowth$len, list(ToothGrowth$supp, ToothGrowth$dose), sd)
# reshape::rename is called explicitly because dplyr also exports rename()
sd_growth <- reshape::rename(sd_growth, c(Group.1 = "Supplement", Group.2 = "Dosage", x = "sd"))
sd_growth
## Supplement Dosage sd
## 1 OJ 0.5 4.459709
## 2 VC 0.5 2.746634
## 3 OJ 1.0 3.910953
## 4 VC 1.0 2.515309
## 5 OJ 2.0 2.655058
## 6 VC 2.0 4.797731
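The two tables above can also be produced in a single step. The sketch below uses dplyr, which is already loaded, and assumes dplyr 1.0.0 or later for the `.groups` argument. The names `growth_summary`, `mean_len` and `sd_len` are illustrative, and the dplyr functions are namespace-qualified so that masking by other attached packages does not matter.

```r
library(datasets)
library(dplyr)

# One row per supplement/dose cell with group size, mean and standard deviation.
growth_summary <- datasets::ToothGrowth %>%
  dplyr::group_by(supp, dose) %>%
  dplyr::summarise(
    n        = dplyr::n(),
    mean_len = mean(len),
    sd_len   = sd(len),
    .groups  = "drop"
  )
growth_summary
```

Keeping the count, mean and standard deviation side by side makes it easier to see which groups combine a high mean with a large spread.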
3. Create some plots to explore the data
# Convert dose to a factor
ToothGrowth$dose<-as.factor(ToothGrowth$dose)
ggplot(ToothGrowth, aes(x = dose, y = len, fill = dose))+
geom_boxplot()+ facet_grid(.~supp)+
labs(title = "Supplements dosage effect on Tooth Growth", x = "Doses", y = "Tooth Length")
4. Analyze tooth length relative to supplement.
ggplot(aes(x=supp, y=len), data=ToothGrowth) +
geom_boxplot(aes(fill=supp)) + xlab("Supplements") +
ylab("Tooth Length") + facet_grid(~ dose) +
ggtitle("Tooth length by dosage for each supplement")
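The plots and group means above suggest that orange juice (OJ) is associated with longer teeth than ascorbic acid (VC) at doses of 0.5 and 1. At a dose of 2 the group means are nearly equal (26.06 for OJ and 26.14 for VC), and VC shows higher variability than OJ (standard deviation 4.80 versus 2.66).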
5. Hypothesis Testing
The two hypotheses:
\(H_0\): the supplement type has no effect on mean tooth length
\(H_a\): the supplement type has an effect on mean tooth length
t_test <- t.test(len ~ supp, data = ToothGrowth)
# Extracting the necessary values
round(t_test$p.value, 3)
## [1] 0.061
t_test$conf.int
## [1] -0.1710156 7.5710156
## attr(,"conf.level")
## [1] 0.95
The p-value (0.061) is greater than 0.05 and the 95% confidence interval contains zero. With all doses pooled together, we therefore cannot reject the null hypothesis that the supplement type has no effect on tooth length.
For dose 0.5
ojdose0.5 <- ToothGrowth %>% filter(supp=="OJ" & dose=="0.5")
vcdose0.5 <- ToothGrowth %>% filter(supp=="VC" & dose=="0.5")
t.test(ojdose0.5$len,vcdose0.5$len)
##
## Welch Two Sample t-test
##
## data: ojdose0.5$len and vcdose0.5$len
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.719057 8.780943
## sample estimates:
## mean of x mean of y
## 13.23 7.98
As the summary above shows, at a dose of 0.5 the 95 percent confidence interval for the difference in means (OJ minus VC) runs from 1.719057 to 8.780943 and does not contain zero. Hence, we reject the hypothesis that the difference in means is zero (mean of OJ = 13.23 and mean of VC = 7.98).
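To make the Welch output less of a black box, the statistic, degrees of freedom, p-value and interval for dose 0.5 can be reproduced by hand. The sketch below uses the standard unpooled standard error and the Welch-Satterthwaite degrees of freedom. All variable names are illustrative, and a fresh copy of the data is used so that the code does not depend on dose having been converted to a factor.

```r
library(datasets)

tg <- datasets::ToothGrowth
oj <- tg$len[tg$supp == "OJ" & tg$dose == 0.5]
vc <- tg$len[tg$supp == "VC" & tg$dose == 0.5]

# Squared standard errors of each group mean (variances are not pooled).
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)
se_diff <- sqrt(se2_oj + se2_vc)

# Welch t statistic and Welch-Satterthwaite degrees of freedom.
t_stat <- (mean(oj) - mean(vc)) / se_diff
df_welch <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval for mean(OJ) - mean(VC).
p_value <- 2 * pt(-abs(t_stat), df_welch)
ci <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, df_welch) * se_diff

c(t = t_stat, df = df_welch, p = p_value)
ci
```

These values should agree with the `t.test` output shown above for dose 0.5.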
For dose 1
ojdose1 <- ToothGrowth %>% filter(supp=="OJ" & dose=="1")
vcdose1 <- ToothGrowth %>% filter(supp=="VC" & dose=="1")
t.test(ojdose1$len,vcdose1$len)
##
## Welch Two Sample t-test
##
## data: ojdose1$len and vcdose1$len
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 2.802148 9.057852
## sample estimates:
## mean of x mean of y
## 22.70 16.77
As the summary above shows, at a dose of 1 the 95 percent confidence interval for the difference in means (OJ minus VC) runs from 2.802148 to 9.057852 and does not contain zero. Hence, we reject the hypothesis that the difference in means is zero (mean of OJ = 22.70 and mean of VC = 16.77).
For dose 2
ojdose2 <- ToothGrowth %>% filter(supp=="OJ" & dose=="2")
vcdose2 <- ToothGrowth %>% filter(supp=="VC" & dose=="2")
t.test(ojdose2$len,vcdose2$len)
##
## Welch Two Sample t-test
##
## data: ojdose2$len and vcdose2$len
## t = -0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.79807 3.63807
## sample estimates:
## mean of x mean of y
## 26.06 26.14
As the summary above shows, at a dose of 2 the 95 percent confidence interval for the difference in means (OJ minus VC) runs from -3.79807 to 3.63807 and contains zero. Hence, we cannot reject the hypothesis that the difference in means is zero (mean of OJ = 26.06 and mean of VC = 26.14).
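The three per-dose tests repeat the same steps, so they can be collected into one table for easier comparison. This is a sketch in base R rather than a replacement for the output above. The name `by_dose` and its column names are illustrative, and a fresh copy of the data is used so that dose is still numeric.

```r
library(datasets)

tg <- datasets::ToothGrowth

# Run the same Welch test (OJ vs. VC) within each dose level and
# collect the estimated difference, confidence limits and p-value.
by_dose <- do.call(rbind, lapply(sort(unique(tg$dose)), function(d) {
  res <- t.test(len ~ supp, data = tg[tg$dose == d, ])
  data.frame(
    dose     = d,
    diff     = unname(res$estimate[1] - res$estimate[2]),  # OJ minus VC
    ci_lower = res$conf.int[1],
    ci_upper = res$conf.int[2],
    p_value  = res$p.value
  )
}))
by_dose
```

Because `supp` has levels OJ and VC in that order, the difference and interval are for OJ minus VC, matching the direction used in the individual tests.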
For this analysis, it is assumed that the populations are independent, that the variances of the two groups may differ (the Welch test does not pool them), that a random sampling technique was used, that the samples are composed of similar guinea pigs, and that measurement error is accounted for by the reported significant digits. Furthermore, it is assumed that the researchers taking the measurements were unaware of which guinea pigs were given which dose level or delivery method.
Based on the t-tests above, we conclude at the 95% confidence level that orange juice (OJ) resulted in longer tooth length than ascorbic acid (VC) at doses of 0.5 and 1. However, at the highest dose of 2, there is no statistically significant difference between the effects of OJ and VC on tooth growth.