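This project is concerned with the ToothGrowth data set in R. I first perform some exploratory data analysis to get the gist of the data, and then use statistical tests to compare tooth growth across the supplement types and dose levels. The response is the length of odontoblasts (the cells responsible for tooth growth) in 60 guinea pigs, each of which received one of three doses of vitamin C (0.5, 1 or 2 mg/day) by one of two delivery methods: orange juice (OJ) or ascorbic acid (VC).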
dat <- datasets::ToothGrowth #loading the data set into a variable
str(dat) #checking the structure of the data set
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
We see that the length (len) and the dose (dose) are numeric, although dose only takes the three values 0.5, 1 and 2, whereas the supplement (supp) is a factor, so let’s check the levels present in supp.
table(dat$supp)
##
## OJ VC
## 30 30
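Because dose only takes three values, it can also help to cross-tabulate it against supp to confirm that every supplement/dose combination has the same number of animals. A minimal sketch (the variable name design is illustrative):

```r
dat <- datasets::ToothGrowth

# Cross-tabulate supplement (rows) against dose (columns);
# each cell counts the animals in that supplement/dose group
design <- table(dat$supp, dat$dose)
print(design)

# In a balanced design every cell holds the same count
all(design == design[1, 1])
```

For ToothGrowth each of the six cells should contain 10 animals, which is why each per-dose subset used later has 20 rows and each supplement/dose subset has 10.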
Now that we have explored the structure of the data set, let’s make a plot to visualize the data more clearly.
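library(ggplot2) #using the ggplot2 package
ggplot(dat, aes(x = as.factor(dose), y = len)) +
  geom_boxplot(aes(fill = as.factor(dose))) + # discrete fill, one colour per dose level
  labs(x = "Dose (mg/day)", y = "Tooth Length", fill = "Dose (mg/day)") +
  facet_grid(. ~ supp) # one panel per supplement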
The plot suggests that the dose plays a large role in tooth growth: within each supplement, tooth length increases with dose. The effect of the supplement is less clear, since OJ appears to give longer teeth at the lower doses while the two supplements look similar at 2 mg/day, so we use statistical tests for further analysis.
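To put numbers on what the boxplots show, the mean tooth length of each supplement/dose group can be computed with base R's aggregate(). A short sketch (group_means is an illustrative name):

```r
dat <- datasets::ToothGrowth

# Mean tooth length for every supplement/dose combination
group_means <- aggregate(len ~ supp + dose, data = dat, FUN = mean)
print(group_means)
```

These group means are the same values that appear as sample estimates in the supplement t-tests further down, for example 13.23 (OJ) vs 7.98 (VC) at 0.5 mg/day, and 26.06 (OJ) vs 26.14 (VC) at 2 mg/day.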
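For the statistical testing, I perform a series of one-sided hypothesis tests comparing the doses and the supplements. Each one is a Welch two-sample t-test (the default in R's t.test, which does not assume equal variances), run at a significance level of 0.05.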
dose0.5 <- subset(dat, dose == 0.5) #new data set containing only the 0.5 mg/day doses
dose1.0 <- subset(dat, dose == 1)   #new data set containing only the 1 mg/day doses
dose2.0 <- subset(dat, dose == 2)   #new data set containing only the 2 mg/day doses
# splitting the data by dose into separate data sets makes it easy to run tests between them
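As an alternative to three separate subset() calls, base R's split() produces all the per-dose data sets in one named list, keyed by dose value. A sketch (by_dose is an illustrative name):

```r
dat <- datasets::ToothGrowth

# split() returns a named list with one data frame per distinct dose value
by_dose <- split(dat, dat$dose)
names(by_dose)

# Each element holds the same rows as the matching subset() call,
# e.g. by_dose[["0.5"]] corresponds to subset(dat, dose == 0.5)
sapply(by_dose, nrow)
```

Keeping the subsets in one list makes it easier to loop over doses later, at the cost of slightly longer names such as by_dose[["0.5"]]$len.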
With a separate data set for each dose level, let’s test whether tooth length differs between doses.
t.test(dose1.0$len,dose0.5$len,alternative = "greater") #test b/w dose 1 mg and dose 0.5 mg
##
## Welch Two Sample t-test
##
## data: dose1.0$len and dose0.5$len
## t = 6.4766, df = 37.986, p-value = 6.342e-08
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 6.753323 Inf
## sample estimates:
## mean of x mean of y
## 19.735 10.605
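As a check on what t.test is doing, the Welch statistic for the 1 mg/day versus 0.5 mg/day comparison can be reproduced by hand. This is a sketch assuming R's default Welch test (var.equal = FALSE) with the standard Welch-Satterthwaite degrees of freedom; the variable names are illustrative.

```r
dat <- datasets::ToothGrowth
x <- dat$len[dat$dose == 1]    # dose 1 mg/day (both supplements pooled)
y <- dat$len[dat$dose == 0.5]  # dose 0.5 mg/day (both supplements pooled)

# Welch's t statistic: difference in means over the unpooled standard error
se2_x <- var(x) / length(x)
se2_y <- var(y) / length(y)
t_stat <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

# One-sided p-value for H1: mean(x) > mean(y), i.e. the upper tail
p_value <- pt(t_stat, df_welch, lower.tail = FALSE)

c(t = t_stat, df = df_welch, p = p_value)
```

The three numbers should agree with the t = 6.4766, df = 37.986 and p-value = 6.342e-08 printed above.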
t.test(dose2.0$len,dose1.0$len,alternative = "greater") #test b/w dose 2 mg and dose 1 mg
##
## Welch Two Sample t-test
##
## data: dose2.0$len and dose1.0$len
## t = 4.9005, df = 37.101, p-value = 9.532e-06
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 4.17387 Inf
## sample estimates:
## mean of x mean of y
## 26.100 19.735
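Both tests were run at a significance level of 0.05. In each case the p-value is far below 0.05 (in fact, below 0.01), so we reject the null hypothesis that the two mean lengths are equal in favor of the alternative hypothesis that the higher dose leads to longer teeth: 1 mg/day beats 0.5 mg/day, and 2 mg/day beats 1 mg/day, with both supplements pooled. Consistently, the lower bounds of the one-sided 95% confidence intervals (6.75 and 4.17) are above zero.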
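The dose comparisons can also be written with t.test's formula interface on a two-dose subset. One subtlety, worth double-checking in ?t.test: the formula method orders the groups by factor level, so for len ~ dose the tested difference is mean(0.5) minus mean(1), and the one-sided hypothesis that 1 mg/day gives longer teeth becomes alternative = "less". A sketch (low_mid and res_formula are illustrative names):

```r
dat <- datasets::ToothGrowth

# Keep only the two doses being compared
low_mid <- subset(dat, dose %in% c(0.5, 1))

# Groups are ordered by factor level ("0.5" before "1"), so the tested
# difference is mean(0.5) - mean(1); "less" means dose 1 has the larger mean
res_formula <- t.test(len ~ dose, data = low_mid, alternative = "less")
res_formula$statistic
res_formula$p.value
```

The t statistic comes out with the opposite sign to the one printed for the first test (about -6.48), but the p-value is the same.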
Next, to compare the supplements, we test OJ against VC separately at each dose (0.5, 1 and 2 mg/day).
doseOJ0.5 <- subset(dose0.5, supp == "OJ") #OJ & VC data sets for the 0.5 mg/day tests
doseVC0.5 <- subset(dose0.5, supp == "VC")
doseOJ1.0 <- subset(dose1.0, supp == "OJ") #OJ & VC data sets for the 1 mg/day tests
doseVC1.0 <- subset(dose1.0, supp == "VC")
doseOJ2.0 <- subset(dose2.0, supp == "OJ") #OJ & VC data sets for the 2 mg/day tests
doseVC2.0 <- subset(dose2.0, supp == "VC")
Now let’s test, at each dose, whether the mean tooth length with OJ is greater than with VC.
t.test(doseOJ0.5$len,doseVC0.5$len,alternative = "greater") #test OJ vs VC at dose 0.5 mg
##
## Welch Two Sample t-test
##
## data: doseOJ0.5$len and doseVC0.5$len
## t = 3.1697, df = 14.969, p-value = 0.003179
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 2.34604 Inf
## sample estimates:
## mean of x mean of y
## 13.23 7.98
t.test(doseOJ1.0$len,doseVC1.0$len,alternative = "greater") #test OJ vs VC at dose 1 mg
##
## Welch Two Sample t-test
##
## data: doseOJ1.0$len and doseVC1.0$len
## t = 4.0328, df = 15.358, p-value = 0.0005192
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 3.356158 Inf
## sample estimates:
## mean of x mean of y
## 22.70 16.77
t.test(doseOJ2.0$len,doseVC2.0$len,alternative = "greater") #test OJ vs VC at dose 2 mg
##
## Welch Two Sample t-test
##
## data: doseOJ2.0$len and doseVC2.0$len
## t = -0.046136, df = 14.04, p-value = 0.5181
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## -3.1335 Inf
## sample estimates:
## mean of x mean of y
## 26.06 26.14
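All three tests were run at a significance level of 0.05. At 0.5 and 1 mg/day the p-values (0.0032 and 0.0005) are below 0.05, so we reject the null hypothesis in favor of the alternative hypothesis that OJ leads to longer teeth than VC at these doses. At 2 mg/day, however, the p-value is 0.518 and the sample means are almost identical (26.06 for OJ vs 26.14 for VC), so there is no evidence that OJ outperforms VC at the highest dose.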
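The three supplement comparisons above can also be run in a single pass. This sketch uses t.test's formula interface on each dose subset; because the levels of supp are ordered OJ then VC, alternative = "greater" tests OJ > VC, matching the tests above. The name supp_p is illustrative.

```r
dat <- datasets::ToothGrowth

# For each dose, test H1: mean length with OJ > mean length with VC.
# In len ~ supp the factor levels are ordered OJ, VC, so "greater" means OJ > VC.
supp_p <- sapply(c(0.5, 1, 2), function(d) {
  t.test(len ~ supp, data = subset(dat, dose == d), alternative = "greater")$p.value
})
names(supp_p) <- c("0.5", "1", "2")

# Compare each p-value with the 0.05 significance level
supp_p
supp_p < 0.05
```

This should reproduce the three p-values printed above and flag only the 0.5 and 1 mg/day comparisons as significant.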
After exploring the data set and performing the statistical tests, we conclude that higher doses lead to more tooth growth over the range studied (0.5 to 2 mg/day). For the supplement, OJ promotes more tooth growth than VC at 0.5 and 1 mg/day, but at 2 mg/day the two supplements perform about the same.
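Because five tests were run, one cautious extra check (not part of the analysis above) is to adjust the p-values for multiple comparisons, here with a Bonferroni correction via stats::p.adjust. A sketch; len_at is a hypothetical helper introduced only for this example:

```r
dat <- datasets::ToothGrowth

# Hypothetical helper: tooth lengths at dose d for the given supplement(s)
len_at <- function(d, s = c("OJ", "VC")) dat$len[dat$dose == d & dat$supp %in% s]

# One-sided p-values of the five tests run in this report
p_raw <- c(
  dose_1_vs_0.5 = t.test(len_at(1), len_at(0.5), alternative = "greater")$p.value,
  dose_2_vs_1   = t.test(len_at(2), len_at(1), alternative = "greater")$p.value,
  supp_at_0.5   = t.test(len_at(0.5, "OJ"), len_at(0.5, "VC"), alternative = "greater")$p.value,
  supp_at_1     = t.test(len_at(1, "OJ"), len_at(1, "VC"), alternative = "greater")$p.value,
  supp_at_2     = t.test(len_at(2, "OJ"), len_at(2, "VC"), alternative = "greater")$p.value
)

# Bonferroni: multiply each p-value by the number of tests, capped at 1
p_bonf <- p.adjust(p_raw, method = "bonferroni")
round(p_bonf, 4)
p_bonf < 0.05
```

Under this conservative adjustment the same four comparisons stay below 0.05 (the largest, OJ vs VC at 0.5 mg/day, becomes about 0.016), while the 2 mg/day supplement comparison is capped at 1, so the conclusions above are unchanged.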
Thank you