In this part of the project, we analyze the ToothGrowth data from the R datasets package.
library(datasets)
library(ggplot2)
library(plyr)
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
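# Tooth length (response) on the y-axis so that geom_smooth(method = "lm") regresses length on dose
g <- ggplot(ToothGrowth, aes(dose, len))
plot(g + geom_point() + facet_grid(. ~ supp) + geom_smooth(method = "lm") + ggtitle("Distribution of tooth lengths across supplement type"))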
p <- ggplot(ToothGrowth,aes(supp,len))
plot(p+geom_point()+facet_grid(.~dose)+ggtitle("Distribution of tooth lengths across dosage levels"))
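# Per-group summary statistics for each supplement/dose combination.
# Note: naming this object `summary` hides base::summary() as a variable,
# but calls such as summary(ToothGrowth) still resolve to the function.
summary <- ddply(ToothGrowth, .(supp, dose), summarize,
                 min_len = min(len), mean_len = mean(len), median_len = median(len),
                 max_len = max(len), sd_len = round(sd(len), 2), count = length(len))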
g1 <- ggplot(summary,aes(dose,mean_len,color=supp))+geom_point(size=4)+geom_line()+ggtitle("Comparison of mean tooth lengths with supplement type and dosage")
plot(g1)
p1 <- ggplot(summary,aes(supp,mean_len))+geom_boxplot(aes(fill=supp))+ggtitle("Mean tooth length based on supplement type")
plot(p1)
p2 <- ggplot(summary,aes(factor(dose),mean_len))+geom_boxplot(aes(fill=factor(dose)))+ggtitle("Mean tooth length in relation to dosage")
plot(p2)
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
# Display the per-group summary data frame created earlier with ddply()
summary
## supp dose min_len mean_len median_len max_len sd_len count
## 1 OJ 0.5 8.2 13.23 12.25 21.5 4.46 10
## 2 OJ 1.0 14.5 22.70 23.45 27.3 3.91 10
## 3 OJ 2.0 22.4 26.06 25.95 30.9 2.66 10
## 4 VC 0.5 4.2 7.98 7.15 11.5 2.75 10
## 5 VC 1.0 13.6 16.77 16.50 22.5 2.52 10
## 6 VC 2.0 18.5 26.14 25.95 33.9 4.80 10
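For readers who prefer not to depend on plyr, a roughly equivalent per-group summary can be sketched in base R with aggregate(). This is an alternative to the ddply() call above, not the method used in this report; the helper `group_stats` and the object `by_group` are illustrative names only.

```r
# Sketch: per-group summary of ToothGrowth using only base R (no plyr).
# ToothGrowth ships with the datasets package, which R attaches by default.

# Hypothetical helper returning several statistics for one group
group_stats <- function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x), n = length(x))
}

# aggregate() applies group_stats to len within each supp/dose combination
by_group <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = group_stats)

# aggregate() stores the statistics as a single matrix column; flatten it
# into ordinary columns named len.mean, len.median, len.sd, len.n
by_group <- do.call(data.frame, by_group)

by_group
```

The rows come out ordered by dose and then supplement rather than by supplement first, but the group means match the ddply() table.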
t.test(len~supp,paired=FALSE,var.equal=TRUE,data=ToothGrowth)
##
## Two Sample t-test
##
## data: len by supp
## t = 1.9153, df = 58, p-value = 0.06039
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1670064 7.5670064
## sample estimates:
## mean in group OJ mean in group VC
## 20.66333 16.96333
t.test(len~supp,paired=FALSE,var.equal=FALSE,data=ToothGrowth)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean in group OJ mean in group VC
## 20.66333 16.96333
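Confidence intervals and p-values are calculated with the t.test function, first assuming equal variances (var.equal = TRUE) and then allowing unequal variances (Welch's test, var.equal = FALSE). In both cases the p-value (0.0604 and 0.0606) is above the 5% significance level and the 95% confidence interval contains 0, so we fail to reject the null hypothesis that mean tooth length is the same for both supplements. When all doses are pooled, the data do not provide clear evidence that supplement type affects tooth length.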
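To make the two tests concrete, the sketch below recomputes both t statistics by hand from the group means and variances, using the standard pooled-variance formula and the Welch-Satterthwaite approximation for the degrees of freedom. Variable names are illustrative. Because both groups have 30 observations, the pooled and Welch statistics are identical; only the degrees of freedom, and therefore the p-value and interval, differ slightly.

```r
# Sketch: recompute the pooled and Welch two-sample t statistics by hand.
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]
n_oj <- length(oj)
n_vc <- length(vc)
diff_means <- mean(oj) - mean(vc)  # OJ minus VC, as in t.test output

# Pooled (equal-variance) test
sp2 <- ((n_oj - 1) * var(oj) + (n_vc - 1) * var(vc)) / (n_oj + n_vc - 2)
se_pooled <- sqrt(sp2 * (1 / n_oj + 1 / n_vc))
t_pooled <- diff_means / se_pooled
df_pooled <- n_oj + n_vc - 2
p_pooled <- 2 * pt(-abs(t_pooled), df_pooled)
ci_pooled <- diff_means + c(-1, 1) * qt(0.975, df_pooled) * se_pooled

# Welch (unequal-variance) test with Welch-Satterthwaite degrees of freedom
v_oj <- var(oj) / n_oj
v_vc <- var(vc) / n_vc
t_welch <- diff_means / sqrt(v_oj + v_vc)
df_welch <- (v_oj + v_vc)^2 / (v_oj^2 / (n_oj - 1) + v_vc^2 / (n_vc - 1))
p_welch <- 2 * pt(-abs(t_welch), df_welch)

c(t_pooled = t_pooled, df_pooled = df_pooled, p_pooled = p_pooled,
  t_welch = t_welch, df_welch = df_welch, p_welch = p_welch)
```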
diffdose1 <- subset(ToothGrowth,dose %in% c(0.5,1.0))
diffdose2 <- subset(ToothGrowth,dose %in% c(1.0,2.0))
diffdose3 <- subset(ToothGrowth,dose %in% c(0.5,2.0))
t.test(len~dose,paired=FALSE,var.equal=TRUE,data=diffdose1)
##
## Two Sample t-test
##
## data: len by dose
## t = -6.4766, df = 38, p-value = 1.266e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.983748 -6.276252
## sample estimates:
## mean in group 0.5 mean in group 1
## 10.605 19.735
t.test(len~dose,paired=FALSE,var.equal=TRUE,data=diffdose2)
##
## Two Sample t-test
##
## data: len by dose
## t = -4.9005, df = 38, p-value = 1.811e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -8.994387 -3.735613
## sample estimates:
## mean in group 1 mean in group 2
## 19.735 26.100
t.test(len~dose,paired=FALSE,var.equal=TRUE,data=diffdose3)
##
## Two Sample t-test
##
## data: len by dose
## t = -11.799, df = 38, p-value = 2.838e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -18.15352 -12.83648
## sample estimates:
## mean in group 0.5 mean in group 2
## 10.605 26.100
Comparing dosage of 0.5 to 1.0: the 95% confidence interval (-11.983748, -6.276252) for the difference in means (0.5 minus 1.0) is entirely below 0, and the p-value of 1.266e-07 is far below 5%. We therefore reject the null hypothesis: increasing the dose from 0.5 to 1.0 increases tooth length.
Comparing dosage of 1.0 to 2.0: the 95% confidence interval (-8.994387, -3.735613) for the difference in means (1.0 minus 2.0) is entirely below 0, and the p-value of 1.811e-05 is far below 5%. We therefore reject the null hypothesis: increasing the dose from 1.0 to 2.0 increases tooth length.
Comparing dosage of 0.5 to 2.0: the 95% confidence interval (-18.15352, -12.83648) for the difference in means (0.5 minus 2.0) is entirely below 0, and the p-value of 2.838e-14 is far below 5%. We therefore reject the null hypothesis: increasing the dose from 0.5 to 2.0 increases tooth length.
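The three dose comparisons above follow the same pattern, so they can be sketched as a single loop over all pairs of dose levels. This is a compact restatement of the same equal-variance t-tests, not a different analysis; `dose_pairs` and `dose_results` are illustrative names.

```r
# Sketch: run the equal-variance t-test for every pair of dose levels.
dose_pairs <- combn(c(0.5, 1, 2), 2, simplify = FALSE)  # (0.5,1), (0.5,2), (1,2)

dose_results <- do.call(rbind, lapply(dose_pairs, function(pair) {
  sub <- ToothGrowth[ToothGrowth$dose %in% pair, ]
  tt <- t.test(len ~ dose, paired = FALSE, var.equal = TRUE, data = sub)
  data.frame(
    comparison = paste(pair, collapse = " vs "),
    diff = unname(tt$estimate[1] - tt$estimate[2]),  # lower dose minus higher dose
    ci_low = tt$conf.int[1],
    ci_high = tt$conf.int[2],
    p_value = tt$p.value
  )
}))

dose_results
```

A negative `diff` with `ci_high` below 0 means the higher dose produced longer teeth, matching the interpretation in the text.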
Based on the summary table and the initial exploratory plots, it also seems worthwhile to examine the effect of supplement type at each dosage level. At the lower dosages (0.5 and 1.0), mean tooth length appears noticeably higher with OJ than with VC (13.23 vs 7.98 and 22.70 vs 16.77), while at 2.0 the means are nearly identical (26.06 vs 26.14).
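One way to check formally whether the supplement effect depends on dose is a two-way ANOVA with a supplement-by-dose interaction term. This is a swapped-in technique that the report itself does not use; it treats dose as a three-level factor and assumes roughly normal residuals with equal variance across groups. A small interaction p-value would support looking at supplement type separately within each dose, as done below.

```r
# Sketch: two-way ANOVA with interaction (not part of the original analysis).
tg <- transform(ToothGrowth, dose = factor(dose))  # dose as a categorical factor

fit <- aov(len ~ supp * dose, data = tg)  # main effects plus supp:dose interaction
anova_table <- summary(fit)[[1]]          # rows: supp, dose, supp:dose, Residuals

anova_table
```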
dose05 <- ToothGrowth[ToothGrowth$dose==0.5,]
dose10 <- ToothGrowth[ToothGrowth$dose==1.0,]
dose20 <- ToothGrowth[ToothGrowth$dose==2.0,]
t.test(len~supp,paired=FALSE,var.equal=TRUE,data=dose05)
##
## Two Sample t-test
##
## data: len by supp
## t = 3.1697, df = 18, p-value = 0.005304
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.770262 8.729738
## sample estimates:
## mean in group OJ mean in group VC
## 13.23 7.98
t.test(len~supp,paired=FALSE,var.equal=TRUE,data=dose10)
##
## Two Sample t-test
##
## data: len by supp
## t = 4.0328, df = 18, p-value = 0.0007807
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 2.840692 9.019308
## sample estimates:
## mean in group OJ mean in group VC
## 22.70 16.77
t.test(len~supp,paired=FALSE,var.equal=TRUE,data=dose20)
##
## Two Sample t-test
##
## data: len by supp
## t = -0.0461, df = 18, p-value = 0.9637
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.722999 3.562999
## sample estimates:
## mean in group OJ mean in group VC
## 26.06 26.14
At dose 0.5: the 95% confidence interval (1.770262, 8.729738) for the difference in means (OJ minus VC) is entirely above 0, and the p-value of 0.005304 is well below 5%. We therefore reject the null hypothesis: at a dose of 0.5, supplement type affects tooth growth, with OJ outperforming VC.
At dose 1.0: the 95% confidence interval (2.840692, 9.019308) for the difference in means (OJ minus VC) is entirely above 0, and the p-value of 0.0007807 is well below 5%. We therefore reject the null hypothesis: at a dose of 1.0, supplement type affects tooth growth, with OJ outperforming VC.
At dose 2.0: the 95% confidence interval (-3.722999, 3.562999) contains 0, and the p-value of 0.9637 is far above 5%. We therefore fail to reject the null hypothesis: at a dose of 2.0, the data show no evidence of a difference between the two supplements.
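The three per-dose tests can likewise be sketched as one pass over the data split by dose. This reproduces the same equal-variance t-tests as above in a compact form; `by_dose` and `supp_tests` are illustrative names.

```r
# Sketch: compare OJ and VC separately within each dose level.
by_dose <- split(ToothGrowth, ToothGrowth$dose)  # list named "0.5", "1", "2"

supp_tests <- t(sapply(by_dose, function(d) {
  tt <- t.test(len ~ supp, paired = FALSE, var.equal = TRUE, data = d)
  c(diff = unname(tt$estimate[1] - tt$estimate[2]),  # OJ minus VC
    ci_low = tt$conf.int[1],
    ci_high = tt$conf.int[2],
    p_value = tt$p.value)
}))

round(supp_tests, 4)  # one row per dose level
```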
We conclude that dosage level has a significant effect on tooth length: each increase in dose (0.5 to 1.0, 1.0 to 2.0, and 0.5 to 2.0) led to longer teeth. Supplement type did not have a clear effect on tooth length when all doses were pooled, but when examined within each dosage level, OJ produced longer teeth than VC at the lower doses of 0.5 and 1.0. At a dose of 2.0, there is no significant difference between the two supplements.
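Because these conclusions rest on seven separate t-tests, one optional robustness check, not part of the original analysis, is to adjust the reported p-values for multiple comparisons. The sketch below applies the Holm correction with base R's p.adjust() to the p-values printed in the outputs above (the labels are illustrative). Under this assumption the same tests remain significant at the 5% level.

```r
# Sketch: Holm adjustment of the p-values reported in the t-test outputs above.
p_raw <- c(
  supp_overall  = 0.06039,    # OJ vs VC, all doses pooled (equal variances)
  dose_0.5_vs_1 = 1.266e-07,
  dose_1_vs_2   = 1.811e-05,
  dose_0.5_vs_2 = 2.838e-14,
  supp_at_0.5   = 0.005304,
  supp_at_1     = 0.0007807,
  supp_at_2     = 0.9637
)

p_holm <- p.adjust(p_raw, method = "holm")

data.frame(raw = p_raw, holm = signif(p_holm, 3), reject_at_5pct = p_holm < 0.05)
```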