This project analyzes the ToothGrowth dataset from the R datasets package. The ToothGrowth dataset records the effect of vitamin C on tooth growth in guinea pigs. The response is the length of odontoblasts (the cells responsible for tooth growth) in 60 guinea pigs, 10 at each combination of three dose levels of vitamin C (0.5, 1, and 2 mg/day) and two delivery methods (orange juice, coded OJ, or ascorbic acid, coded VC). The main objectives of this project are to:
1) Load the ToothGrowth data and perform some basic exploratory data analysis.
2) Provide a basic summary of the data.
3) Use confidence intervals and/or hypothesis tests to compare tooth growth by Supplement and Dose.
…
We load the libraries required for conducting the analysis.
library(ggplot2)
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 3.4.4
## -- Attaching packages ----------------------------------------------------------------------------------------------------------------------------------------- tidyverse 1.2.1 --
## v tibble 1.4.2 v purrr 0.2.4
## v tidyr 0.8.0 v dplyr 0.7.4
## v readr 1.1.1 v stringr 1.3.0
## v tibble 1.4.2 v forcats 0.3.0
## Warning: package 'readr' was built under R version 3.4.4
## Warning: package 'forcats' was built under R version 3.4.4
## -- Conflicts -------------------------------------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(knitr)
To get a basic understanding of the data, we first view the first few observations and then look at a summary of the data.
head(ToothGrowth)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
We copy the ToothGrowth data into another data frame and convert dose from a numeric variable to a factor, since it takes only three distinct values and is used as a grouping variable.
The summary now shows the number of observations at each dose level.
d<-ToothGrowth
d$dose<-as.factor(d$dose)
summary(d)
## len supp dose
## Min. : 4.20 OJ:30 0.5:20
## 1st Qu.:13.07 VC:30 1 :20
## Median :19.25 2 :20
## Mean :18.81
## 3rd Qu.:25.27
## Max. :33.90
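The overall summary does not show how tooth length varies across the six supplement and dose combinations. A minimal sketch of a per-cell summary follows, using only base R (`tapply`); the object names `cell_means`, `cell_sds`, and `cell_counts` are illustrative and not part of the original analysis.

```r
# ToothGrowth ships with base R (datasets package), so nothing needs loading.
# Rows are supplements (OJ, VC); columns are dose levels (0.5, 1, 2).
cell_means  <- with(ToothGrowth, tapply(len, list(supp, dose), mean))
cell_sds    <- with(ToothGrowth, tapply(len, list(supp, dose), sd))
cell_counts <- with(ToothGrowth, tapply(len, list(supp, dose), length))

print(round(cell_means, 2))   # mean tooth length per cell
print(round(cell_sds, 2))     # standard deviation per cell
print(cell_counts)            # observations per cell
```

Each cell holds 10 observations, which is worth keeping in mind when reading the per-dose tests later on.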
To get a better understanding of the data, we create an exploratory boxplot.
ggplot(d, aes(x = dose, y = len, color = supp)) +
  geom_boxplot() +
  labs(title = "Comparison of Tooth Growth in Guinea Pigs for Different Doses of Supplements",
       x = "Dose", y = "Length of Odontoblasts", color = "Supplement type") +
  scale_color_manual(labels = c("Orange Juice", "Vitamin C"), values = c("blue", "red"))
From the graph, we can see that as the dose increases, the response variable (the length of odontoblasts) also seems to increase. We still need to verify this claim with an appropriate statistical test.
We also see that the two supplements seem to differ in effect depending on the dose. At the lower doses (0.5 or 1), Orange Juice appears to be a better supplement than Vitamin C (ascorbic acid), while at the highest dose (2) the two look much closer and Vitamin C may even be slightly ahead.
We verify this claim as well using an appropriate statistical test.
To verify the claims suggested by the exploratory boxplot, we conduct t-tests on our data.
The t-test assumes that the observations within each group are independent and identically distributed normal random variables.
To check this assumption, we draw a normal Q-Q plot.
qqnorm(d$len)
qqline(d$len)
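If the data are normally distributed, the points in the normal Q-Q plot lie close to a straight diagonal line. Since the deviations from the straight line are small, the normality assumption appears reasonable. Note that this plot pools all 60 observations, so it is only a rough check of the within-group assumption.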
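Because the t-tests compare groups, normality can also be examined within each supplement and dose cell rather than on the pooled data. The sketch below is a supplementary check that is not part of the original analysis: it applies the Shapiro-Wilk test (`shapiro.test` from the stats package) to each of the six cells. The names `groups` and `shapiro_p` are illustrative. With only 10 observations per cell the test has little power, so large p-values should be read as "no strong evidence against normality", not as proof of it.

```r
# Split tooth length into the six supplement-by-dose cells (10 values each).
groups <- split(ToothGrowth$len, list(ToothGrowth$supp, ToothGrowth$dose))

# Shapiro-Wilk p-value for each cell; small values would suggest non-normality.
shapiro_p <- sapply(groups, function(x) shapiro.test(x)$p.value)

print(round(shapiro_p, 3))
```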
The data are split into one subset per supplement and dose combination. These subsets are used in several t-tests to check our claim that the supplements have different effects on the response variable at different doses.
toothgrowth_VC_0.5 <- d %>%
  filter(supp == "VC", dose == 0.5)
toothgrowth_VC_1 <- d %>%
  filter(supp == "VC", dose == 1)
toothgrowth_VC_2 <- d %>%
  filter(supp == "VC", dose == 2)
toothgrowth_OJ_0.5 <- d %>%
  filter(supp == "OJ", dose == 0.5)
toothgrowth_OJ_1 <- d %>%
  filter(supp == "OJ", dose == 1)
toothgrowth_OJ_2 <- d %>%
  filter(supp == "OJ", dose == 2)
Three one-sided two-sample t-tests (Welch tests, the default in t.test) are conducted to verify our claim. The results are stored in variables and a summary table of results is created.
res1<-t.test(toothgrowth_OJ_0.5$len,toothgrowth_VC_0.5$len,mu=0,alternative = "greater")
res2<-t.test(toothgrowth_OJ_1$len,toothgrowth_VC_1$len,mu=0,alternative = "greater")
res3<-t.test(toothgrowth_OJ_2$len,toothgrowth_VC_2$len,mu=0,alternative = "greater")
Dose<-c(0.5,1,2)
Null_Hypothesis_1<-c("Mean Tooth Growth is same for OJ and VC at 0.5 dose",
"Mean Tooth Growth is same for OJ and VC at 1 dose",
"Mean Tooth Growth is same for OJ and VC at 2 dose")
Alternative_Hypothesis_1<-c("Mean Tooth Growth is Greater for OJ in comparison to VC at 0.5 dose",
"Mean Tooth Growth is Greater for OJ in comparison to VC at 1 dose",
"Mean Tooth Growth is Greater for OJ in comparison to VC at 2 dose")
P_Values_per_Dose<-rbind(res1$p.value,res2$p.value,res3$p.value)
Conf_Lower_per_Dose<-rbind(res1$conf.int[1],res2$conf.int[1],res3$conf.int[1])
Conf_Upper_per_Dose<-rbind(res1$conf.int[2],res2$conf.int[2],res3$conf.int[2])
Dose_Comparision_1<-data.frame(Dose,Null_Hypothesis_1,Alternative_Hypothesis_1,P_Values_per_Dose,Conf_Lower_per_Dose,Conf_Upper_per_Dose)
kable(Dose_Comparision_1,format = "pandoc",caption="Summary of Results of T-Tests for comparing Supplements at different levels of Doses")
| Dose | Null_Hypothesis_1 | Alternative_Hypothesis_1 | P_Values_per_Dose | Conf_Lower_per_Dose | Conf_Upper_per_Dose |
|---|---|---|---|---|---|
| 0.5 | Mean Tooth Growth is same for OJ and VC at 0.5 dose | Mean Tooth Growth is Greater for OJ in comparison to VC at 0.5 dose | 0.0031793 | 2.346040 | Inf |
| 1.0 | Mean Tooth Growth is same for OJ and VC at 1 dose | Mean Tooth Growth is Greater for OJ in comparison to VC at 1 dose | 0.0005192 | 3.356158 | Inf |
| 2.0 | Mean Tooth Growth is same for OJ and VC at 2 dose | Mean Tooth Growth is Greater for OJ in comparison to VC at 2 dose | 0.5180742 | -3.133500 | Inf |
From the summary table, we see that the null hypothesis is rejected at the 5% level of significance in the first two cases, and that we fail to reject it in the third case. This supports our claim that the difference between the two supplements depends on the dose: Orange Juice gives greater mean tooth growth at doses 0.5 and 1, while at dose 2 there is no evidence that it does.
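The same three per-dose comparisons can be written more compactly with the formula interface of `t.test`, which avoids creating six separate data frames. This is a sketch of an equivalent formulation, not the code used to produce the table; the names `doses` and `per_dose` are illustrative. Because `supp` has levels `OJ` then `VC`, `alternative = "greater"` tests whether the OJ mean exceeds the VC mean, matching the hypotheses stated in the table.

```r
# Dose is numeric in the original ToothGrowth data, so == comparisons are safe here.
doses <- c(0.5, 1, 2)

# One Welch one-sided test (OJ mean > VC mean) per dose level.
per_dose <- lapply(doses, function(dose_level) {
  t.test(len ~ supp,
         data = ToothGrowth[ToothGrowth$dose == dose_level, ],
         alternative = "greater")
})
names(per_dose) <- doses

# Collect the p-values and lower confidence bounds into named vectors.
p_values   <- sapply(per_dose, function(res) res$p.value)
conf_lower <- sapply(per_dose, function(res) res$conf.int[1])

print(p_values)
print(conf_lower)
```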
Next, to check for an overall difference between the supplements, we conduct another one-sided t-test on the data pooled across doses. Our null hypothesis is that both supplements have the same effect on the response variable. Our alternative hypothesis is that Orange Juice has a greater effect on the response variable than Vitamin C.
toothgrowth_OJ <- d %>%
  filter(supp == "OJ")
toothgrowth_VC <- d %>%
  filter(supp == "VC")
t.test(toothgrowth_OJ$len, toothgrowth_VC$len, mu = 0, alternative = "greater")
##
## Welch Two Sample t-test
##
## data: toothgrowth_OJ$len and toothgrowth_VC$len
## t = 1.9153, df = 55.309, p-value = 0.03032
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 0.4682687 Inf
## sample estimates:
## mean of x mean of y
## 20.66333 16.96333
Here too, the null hypothesis is rejected at the 5% level of significance (p = 0.03032). This suggests that, averaged over the three doses, Orange Juice is more effective than Vitamin C (ascorbic acid) for tooth growth in guinea pigs.
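To make the printed statistics less of a black box, the sketch below recomputes the Welch t statistic, the Welch-Satterthwaite degrees of freedom, and the one-sided p-value by hand in base R. The variable names are illustrative; the formulas are the standard ones for the unequal-variance two-sample t-test that `t.test` uses by default.

```r
# Tooth lengths for each supplement, pooled over doses.
x <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
y <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard errors of the two sample means.
se2_x <- var(x) / length(x)
se2_y <- var(y) / length(y)

# Welch t statistic for the difference in means (OJ minus VC).
t_stat <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)

# Welch-Satterthwaite approximation to the degrees of freedom.
df_welch <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

# One-sided p-value for the alternative "OJ mean is greater".
p_value <- pt(t_stat, df_welch, lower.tail = FALSE)

print(c(t = t_stat, df = df_welch, p = p_value))
```

The three values should agree with the `t`, `df`, and `p-value` entries in the output above.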
Now, to verify the other claim, that different dose levels have different effects on our response variable, we conduct two one-sided two-sample t-tests comparing adjacent dose levels.
toothgrowth_0.5 <- d %>%
  filter(dose == 0.5)
toothgrowth_1 <- d %>%
  filter(dose == 1)
toothgrowth_2 <- d %>%
  filter(dose == 2)
res4<-t.test(toothgrowth_1$len,toothgrowth_0.5$len,mu=0,alternative = "greater")
res5<-t.test(toothgrowth_2$len,toothgrowth_1$len,mu=0,alternative = "greater")
Null_Hypothesis_2<-c("Mean Tooth Growth is same at 0.5 unit dose and 1 unit dose",
" Mean Tooth Growth is same at 1 unit dose and 2 unit dose")
Alternative_Hypothesis_2<-c("Mean Growth is Greater at 1 Unit dose than at 0.5 Unit Dose",
"Mean Growth is Greater at 2 Unit dose than at 1 Unit Dose")
P_Values<-rbind(res4$p.value,res5$p.value)
Conf_Lower<-rbind(res4$conf.int[1],res5$conf.int[1])
Conf_Upper<-rbind(res4$conf.int[2],res5$conf.int[2])
Dose_Comparision_2<-data.frame(Null_Hypothesis_2,Alternative_Hypothesis_2,P_Values,Conf_Lower,Conf_Upper)
kable (Dose_Comparision_2,format = "pandoc",caption="Summary of Results of T-Tests for different levels of Doses")
| Null_Hypothesis_2 | Alternative_Hypothesis_2 | P_Values | Conf_Lower | Conf_Upper |
|---|---|---|---|---|
| Mean Tooth Growth is same at 0.5 unit dose and 1 unit dose | Mean Growth is Greater at 1 Unit dose than at 0.5 Unit Dose | 1.0e-07 | 6.753323 | Inf |
| Mean Tooth Growth is same at 1 unit dose and 2 unit dose | Mean Growth is Greater at 2 Unit dose than at 1 Unit Dose | 9.5e-06 | 4.173870 | Inf |
From the table, we see that the null hypothesis is rejected in both cases at the 5% level of significance. This indicates that mean tooth growth increases with each increase in dose.
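The summary tables in this report are assembled by pulling `p.value` and `conf.int` out of each test result one at a time. As a sketch of how that step could be factored out, the hypothetical helper `summarise_tests` below takes a named list of `t.test` results and returns one row per test. The helper and the names `len_by_dose` and `dose_tests` are illustrative and not part of the original analysis.

```r
# Hypothetical helper: one row per test with its p-value and confidence bounds.
summarise_tests <- function(tests) {
  data.frame(
    comparison = names(tests),
    p_value    = vapply(tests, function(res) res$p.value, numeric(1)),
    conf_lower = vapply(tests, function(res) res$conf.int[1], numeric(1)),
    conf_upper = vapply(tests, function(res) res$conf.int[2], numeric(1)),
    row.names  = NULL
  )
}

# Tooth lengths grouped by dose; list names are "0.5", "1" and "2".
len_by_dose <- split(ToothGrowth$len, ToothGrowth$dose)

# The two adjacent-dose comparisons, each testing "higher dose mean is greater".
dose_tests <- list(
  "1 vs 0.5" = t.test(len_by_dose[["1"]], len_by_dose[["0.5"]], alternative = "greater"),
  "2 vs 1"   = t.test(len_by_dose[["2"]], len_by_dose[["1"]], alternative = "greater")
)

dose_summary <- summarise_tests(dose_tests)
print(dose_summary)
```

The resulting data frame can be passed to `kable` in the same way as the hand-built tables.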
In this project we assumed that our response variable, the length of odontoblasts, is independent and identically normally distributed. A normal Q-Q plot of the data showed only small deviations from a straight line, so this assumption appears reasonable.
We conclude that at dose levels 0.5 and 1, Orange Juice is a more effective supplement than Vitamin C. At dose level 2, the test does not reject the null hypothesis (p = 0.518), so there is no evidence that Orange Juice is more effective at that dose; this result alone does not establish that Vitamin C is more effective. We also find that, pooled over doses, Orange Juice seems to be a more effective supplement than Vitamin C.
Finally, we find that tooth growth in guinea pigs increases significantly as the dose increases.