The purpose of this project is to analyse the ToothGrowth data in the R datasets package. The analysis aims to determine whether the supplement type or the vitamin C dose influences tooth length.
The data is loaded directly in R. Below is a short summary of the dataset. As explained in the overview, each observation records a tooth length, the supplement given, and the dose. The dose is converted to a factor so that it can be used as a grouping variable.
raw.data <- ToothGrowth
raw.data$dose <- as.factor(raw.data$dose)
head(raw.data)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
str(raw.data)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: Factor w/ 3 levels "0.5","1","2": 1 1 1 1 1 1 1 1 1 1 ...
summary(raw.data)
## len supp dose
## Min. : 4.20 OJ:30 0.5:20
## 1st Qu.:13.07 VC:30 1 :20
## Median :19.25 2 :20
## Mean :18.81
## 3rd Qu.:25.27
## Max. :33.90
To get a first indication of whether the supplement influences tooth length, the mean length is calculated for each supplement.
split.supp=split(raw.data$len,raw.data$supp)
sapply(split.supp, mean)
## OJ VC
## 20.66333 16.96333
As we can see, the mean length appears to differ between the two supplements, but the difference has yet to be tested for statistical significance.
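The group means alone say nothing about the spread or the size of each group. As a minimal sketch that is not part of the original analysis, the same split can be extended to report the standard deviation and the number of observations as well. The name `supp.summary` is an illustrative choice, and the data is reloaded so that the snippet runs on its own.

```r
# Illustrative sketch: mean, standard deviation and group size per supplement.
raw.data <- datasets::ToothGrowth

# Same split as above: one vector of lengths per supplement.
split.supp <- split(raw.data$len, raw.data$supp)

# sapply returns a matrix with one column per supplement
# and one row per statistic (mean, sd, n).
supp.summary <- sapply(split.supp, function(x) {
  c(mean = mean(x), sd = sd(x), n = length(x))
})

round(supp.summary, 2)
```

Looking at the standard deviations next to the means gives a rough idea of how much the two groups overlap before any formal test is run.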
To show how tooth length is distributed for each supplement, a boxplot was created.
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.2.4
ggplot(raw.data, aes(x=supp,y=len, fill=supp))+ geom_boxplot()+
labs(title="Tooth Length by Supplement", x="Supplement", y="Tooth length")+
scale_fill_manual(values = c("blue","green"))
The same information is calculated using the dose instead of the supplement.
split.dose=split(raw.data$len,raw.data$dose)
sapply(split.dose, mean)
## 0.5 1 2
## 10.605 19.735 26.100
Here, too, the mean length appears to differ depending on the dose. This difference also has yet to be tested.
The corresponding boxplots were also made in order to have a visual representation of the data.
ggplot(raw.data, aes(x=dose,y=len, fill=dose))+ geom_boxplot()+
labs(title="Tooth Length by Dose", x="Dose", y="Tooth length")+
scale_fill_manual(values = c("blue","green","orange"))
A two-sample t-test is now used to determine whether the supplement influences tooth length.
t.test(raw.data$len[raw.data$supp=="OJ"],raw.data$len[raw.data$supp=="VC"], paired=FALSE)
##
## Welch Two Sample t-test
##
## data: raw.data$len[raw.data$supp == "OJ"] and raw.data$len[raw.data$supp == "VC"]
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean of x mean of y
## 20.66333 16.96333
Since the p-value (0.061) is above 0.05 and the 95% confidence interval (-0.17 to 7.57) contains zero, there is not enough evidence at the 95% confidence level to conclude that the supplement influences tooth length.
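The decision rule used here can also be written out in code. The following is a minimal sketch, assuming a significance level of 0.05; it uses the formula interface of `t.test`, which runs the same Welch test, and the names `supp.test`, `alpha`, `reject.null` and `excludes.zero` are illustrative rather than part of the original analysis.

```r
# Illustrative sketch: read the p-value and confidence interval from the test object.
raw.data <- datasets::ToothGrowth

# Formula interface: compare len between the two levels of supp.
# The Welch (unequal variances) test is the default.
supp.test <- t.test(len ~ supp, data = raw.data)

alpha <- 0.05         # assumed significance level

supp.test$p.value     # p-value of the two-sided test
supp.test$conf.int    # 95% confidence interval for the difference in means

# The null hypothesis is rejected only if the p-value is below alpha,
# which is equivalent to the 95% interval excluding zero.
reject.null <- supp.test$p.value < alpha
excludes.zero <- supp.test$conf.int[1] > 0 || supp.test$conf.int[2] < 0

reject.null
excludes.zero
```

Both checks lead to the same decision, because a two-sided test at level 0.05 and a 95% confidence interval are built from the same statistic.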
Now the test will be applied for the different values of dose in order to determine if the dose influences the tooth length.
t.test(raw.data$len[raw.data$dose=="1"],raw.data$len[raw.data$dose=="0.5"], paired=FALSE)
##
## Welch Two Sample t-test
##
## data: raw.data$len[raw.data$dose == "1"] and raw.data$len[raw.data$dose == "0.5"]
## t = 6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 6.276219 11.983781
## sample estimates:
## mean of x mean of y
## 19.735 10.605
The p-value is far below 0.05 and the 95% confidence interval (6.28 to 11.98) lies entirely above zero, so there is sufficient evidence at the 95% confidence level that the mean tooth length is greater with dose=1 than with dose=0.5.
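To make clear where the t statistic and the fractional degrees of freedom in this output come from, the Welch test can be reproduced by hand. This is a sketch for illustration only: it uses the standard Welch statistic with the Welch-Satterthwaite approximation for the degrees of freedom, and the variable names (`vx`, `vy`, `t.manual`, `df.manual`, `p.manual`, `builtin`) are assumptions, not part of the original analysis.

```r
# Illustrative sketch: Welch two-sample t-test computed step by step.
raw.data <- datasets::ToothGrowth

# dose is numeric in the packaged dataset, so numeric comparisons are used here.
x <- raw.data$len[raw.data$dose == 1]
y <- raw.data$len[raw.data$dose == 0.5]

# Squared standard error of each group mean.
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Welch t statistic: difference in means over its standard error.
t.manual <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite approximation of the degrees of freedom.
df.manual <- (vx + vy)^2 /
  (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# Two-sided p-value from the t distribution.
p.manual <- 2 * pt(-abs(t.manual), df = df.manual)

# Compare with the built-in test.
builtin <- t.test(x, y)
all.equal(t.manual, unname(builtin$statistic))
all.equal(df.manual, unname(builtin$parameter))
all.equal(p.manual, builtin$p.value)
```

The degrees of freedom are not a whole number because the Welch test does not assume equal variances in the two groups.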
A second test compares dose=2 with dose=1.
t.test(raw.data$len[raw.data$dose=="2"],raw.data$len[raw.data$dose=="1"], paired=FALSE)
##
## Welch Two Sample t-test
##
## data: raw.data$len[raw.data$dose == "2"] and raw.data$len[raw.data$dose == "1"]
## t = 4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 3.733519 8.996481
## sample estimates:
## mean of x mean of y
## 26.100 19.735
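In this case the p-value is again far below 0.05 and the 95% confidence interval (3.73 to 9.00) lies entirely above zero, so there is sufficient evidence at the 95% confidence level that the mean tooth length is greater with dose=2 than with dose=1.
From this simple analysis, we cannot conclude that the supplement influences tooth length: the observed difference is not statistically significant, which is not the same as showing that there is no effect. On the other hand, there is sufficient evidence to conclude that the dose influences tooth length, with higher doses associated with longer teeth. Both conclusions are drawn at the 95% confidence level.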