The main aim of this project is to perform an exploratory data analysis of the ToothGrowth data in the R datasets package and to carry out hypothesis tests on the data to gain deeper insight into it.
library(ggplot2)
library(datasets)
attach(ToothGrowth)
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
From the above summary, the data has 60 observations of 3 variables: the tooth length (len), the method by which the supplement was given (supp), and the dosage level (dose).
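The summary shows 30 rows per supplement type but not how those rows are spread across the dose levels. A minimal sketch of that check follows; the name `cell_counts` is my own and not part of the original analysis.

```r
library(datasets)

# Cross-tabulate delivery method (rows) against dose (columns)
# to see how the 60 observations are split across the design
cell_counts <- table(ToothGrowth$supp, ToothGrowth$dose)
cell_counts
```

Equal counts in every cell would mean the design is balanced, so no group dominates the comparisons that follow.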
Dosage_Levels <- factor(ToothGrowth$dose)
g <- ggplot(data = ToothGrowth, aes(dose, len)) +
  geom_boxplot(aes(group = dose, fill = Dosage_Levels)) +
  ggtitle("Tooth Length for different Dosage Levels") +
  xlab("Dosage Levels") +
  ylab("Tooth Length")
g
From the above plot it can be seen that the median tooth length (the line inside each box) differs across dosage levels, and that tooth length increases as the dosage level increases.
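A boxplot marks medians rather than means, so it can help to put numbers next to the picture. The sketch below computes both per dose level; `mean_by_dose` and `median_by_dose` are names I introduce here for illustration.

```r
library(datasets)

# Mean and median tooth length within each dose level
mean_by_dose <- tapply(ToothGrowth$len, ToothGrowth$dose, mean)
median_by_dose <- tapply(ToothGrowth$len, ToothGrowth$dose, median)

# One row per statistic, one column per dose
rbind(mean = mean_by_dose, median = median_by_dose)
```

If both rows rise from left to right, the numbers support the increasing trend read off the plot.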
Supplement_Type <- factor(ToothGrowth$supp, labels = c("Orange Juice", "Ascorbic Acid"))
g <- ggplot(data = ToothGrowth, aes(Supplement_Type, len)) +
  geom_boxplot(aes(group = supp, fill = Supplement_Type)) +
  ggtitle("Tooth Length for different Supplement types") +
  xlab("Supplement Type") +
  ylab("Tooth Length")
g
From the above plot it can be seen that the median tooth length differs between the two supplement types, and that tooth length tends to be greater when the supplement is delivered in orange juice than as ascorbic acid (a form of vitamin C).
ToothGrowth$Dosage_Levels <- factor(ToothGrowth$dose)
ToothGrowth$Supplement_Type <- factor(ToothGrowth$supp, labels = c("Orange Juice", "Ascorbic Acid"))
g <- ggplot(data = ToothGrowth, aes(Supplement_Type, len)) +
  geom_boxplot(aes(fill = Supplement_Type)) +
  facet_grid(~ dose) +  # one panel per dosage level
  ggtitle("Tooth Length for different Supplement types across Dosage Levels") +
  xlab("Supplement Type") +  # the x axis shows the supplement type; dose is the facet
  ylab("Tooth Length")
g
I have added two new columns to the data set by converting the Supplement Types and the Dosage Levels to factors.
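When `factor()` is given `labels`, it assigns them in the order of the existing levels, so it is worth confirming that "OJ" and "VC" land on the intended names. A small sketch of that check, using standalone objects (`supplement_type`, `mapping`) that I made up for the purpose:

```r
library(datasets)

# Relabel the supplement codes; labels are matched to levels by position
supplement_type <- factor(ToothGrowth$supp,
                          labels = c("Orange Juice", "Ascorbic Acid"))

# Cross-tabulate the original codes against the new labels;
# all counts should sit on the diagonal if the mapping is as intended
mapping <- table(original = ToothGrowth$supp, relabelled = supplement_type)
mapping
```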
data1 <- subset(ToothGrowth, Dosage_Levels %in% c(0.5,1))
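# Welch two-sample t-test. `paired` is left out: unpaired is the default,
# and newer R versions reject `paired` in the formula interface.
t.test(len ~ Dosage_Levels, var.equal = FALSE, data = data1)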
##
## Welch Two Sample t-test
##
## data: len by Dosage_Levels
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.983781 -6.276219
## sample estimates:
## mean in group 0.5 mean in group 1
## 10.605 19.735
The mean tooth length differs between dosage levels 0.5 and 1 (10.605 versus 19.735) and is greater at dosage level 1. The p-value (1.268e-07) is far below 0.05 and the 95 percent confidence interval for the difference in means does not contain 0, so we reject the null hypothesis that the two means are equal.
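To make the reported numbers less of a black box, here is a minimal sketch that recomputes the Welch t statistic, the Welch-Satterthwaite degrees of freedom, the two-sided p-value and the 95 percent confidence interval by hand for doses 0.5 and 1. The variable names (`x`, `y`, `se` and so on) are my own, and the sketch filters on the numeric `dose` column rather than on `Dosage_Levels`.

```r
library(datasets)

# Tooth lengths for the two dose groups being compared
x <- ToothGrowth$len[ToothGrowth$dose == 0.5]
y <- ToothGrowth$len[ToothGrowth$dose == 1]

# Squared standard error of each group mean, and the combined standard error
vx <- var(x) / length(x)
vy <- var(y) / length(y)
se <- sqrt(vx + vy)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat <- (mean(x) - mean(y)) / se
df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# Two-sided p-value and 95% confidence interval for the difference in means
p_value <- 2 * pt(-abs(t_stat), df)
ci <- (mean(x) - mean(y)) + c(-1, 1) * qt(0.975, df) * se

c(t = t_stat, df = df, p = p_value, lower = ci[1], upper = ci[2])
```

The values should agree with the `t.test()` output above (t = -6.4766, df = 37.986).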
data2 <- subset(ToothGrowth, Dosage_Levels %in% c(1,2))
t.test(len ~ Dosage_Levels, var.equal = FALSE, data = data2)
##
## Welch Two Sample t-test
##
## data: len by Dosage_Levels
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2
## 19.735 26.100
The mean tooth length differs between dosage levels 1 and 2 (19.735 versus 26.100) and is greater at dosage level 2. The p-value (1.906e-05) is far below 0.05 and the 95 percent confidence interval for the difference in means does not contain 0, so we reject the null hypothesis that the two means are equal.
data3 <- subset(ToothGrowth, Dosage_Levels %in% c(0.5,2))
t.test(len ~ Dosage_Levels, var.equal = FALSE, data = data3)
##
## Welch Two Sample t-test
##
## data: len by Dosage_Levels
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -18.15617 -12.83383
## sample estimates:
## mean in group 0.5 mean in group 2
## 10.605 26.100
The mean tooth length differs between dosage levels 0.5 and 2 (10.605 versus 26.100) and is greater at dosage level 2. The p-value (4.398e-14) is far below 0.05 and the 95 percent confidence interval for the difference in means does not contain 0, so we reject the null hypothesis that the two means are equal.
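The three dose comparisons repeat the same subset-and-test pattern, so they can also be collected into one table. The following is only a sketch of one way to do that: `dose_pairs` and `pairwise` are hypothetical names, and the tests group on the numeric `dose` column, which `t.test()` treats as a two-level grouping variable within each subset.

```r
library(datasets)

# Every unordered pair of dose levels: (0.5, 1), (0.5, 2), (1, 2)
dose_levels <- sort(unique(ToothGrowth$dose))
dose_pairs <- combn(dose_levels, 2, simplify = FALSE)

# Run a Welch t-test for each pair and keep the key figures
pairwise <- do.call(rbind, lapply(dose_pairs, function(p) {
  d <- subset(ToothGrowth, dose %in% p)
  res <- t.test(len ~ dose, var.equal = FALSE, data = d)
  data.frame(low_dose = p[1],
             high_dose = p[2],
             t = unname(res$statistic),
             df = unname(res$parameter),
             p_value = res$p.value,
             ci_lower = res$conf.int[1],
             ci_upper = res$conf.int[2])
}))
pairwise
```

Each row should match the corresponding `t.test()` output above, which makes the three results easier to compare side by side.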
t.test(len ~ Supplement_Type, var.equal = FALSE, data = ToothGrowth)
##
## Welch Two Sample t-test
##
## data: len by Supplement_Type
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean in group Orange Juice mean in group Ascorbic Acid
## 20.66333 16.96333
The sample mean tooth length is higher when the supplement is delivered in orange juice (20.66) than as ascorbic acid (16.96). However, the p-value (0.06063) is above 0.05 and the 95 percent confidence interval contains 0, so at the 5% significance level we cannot reject the null hypothesis that the two supplement types have the same mean tooth length.
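The test above pools all three dose levels, whereas the faceted boxplot earlier split the supplement comparison by dose. As a sketch of how the same split could be applied to the test, the code below runs one Welch t-test per dose level. It is an optional extension rather than part of the original analysis; `by_dose` is a name I chose, and the raw `supp` codes (OJ, VC) are used instead of the relabelled column.

```r
library(datasets)

# Split the data by dose and compare the two supplement types within each dose
by_dose <- do.call(rbind, lapply(split(ToothGrowth, ToothGrowth$dose), function(d) {
  res <- t.test(len ~ supp, var.equal = FALSE, data = d)
  data.frame(dose = d$dose[1],
             mean_OJ = mean(d$len[d$supp == "OJ"]),
             mean_VC = mean(d$len[d$supp == "VC"]),
             p_value = res$p.value,
             ci_lower = res$conf.int[1],
             ci_upper = res$conf.int[2])
}))
by_dose
```

A row whose confidence interval excludes 0 would indicate a supplement effect at that particular dose, even though the pooled test is inconclusive.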