This report studies the ToothGrowth data from the R datasets package. It begins with a basic exploratory analysis, then uses hypothesis tests to compare tooth growth across supplement types and doses, and closes with the conclusions drawn from those tests.
library(ggplot2)
tooth <- datasets::ToothGrowth
str(tooth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
# Summarizing the data
summary(tooth)
##       len        supp         dose
##  Min.   : 4.20   OJ:30   Min.   :0.500
##  1st Qu.:13.07   VC:30   1st Qu.:0.500
##  Median :19.25           Median :1.000
##  Mean   :18.81           Mean   :1.167
##  3rd Qu.:25.27           3rd Qu.:2.000
##  Max.   :33.90           Max.   :2.000
# Looking for unique values in application method and dosage amounts
unique(tooth$supp)
## [1] VC OJ
## Levels: OJ VC
unique(tooth$dose)
## [1] 0.5 1.0 2.0
len – Tooth length
supp – Supplement type, a factor with levels OJ and VC
dose – Dosage amount
Hence, we can conclude that there are only two supplement types: OJ (orange juice) and VC (vitamin C).
Also, the dosage (in milligrams) takes three values: 0.5, 1.0 and 2.0.
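A quick cross-tabulation can confirm how the observations are spread over these supplement types and doses. The sketch below works on a fresh copy of the data; the names `tooth_raw` and `cell_counts` are illustrative and not part of the original analysis.

```r
# Fresh copy of the dataset so this check does not depend on earlier objects
tooth_raw <- datasets::ToothGrowth

# Count observations for every supplement/dose combination
cell_counts <- table(supp = tooth_raw$supp, dose = tooth_raw$dose)
print(cell_counts)
```

Equal counts in every cell would indicate a balanced design, which makes the group comparisons later in the report easier to interpret.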
plot(tooth)
The pairs plot suggests that len varies with both supp and dose, so let's explore those relationships in detail.
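One way to put numbers on this pattern is to tabulate the mean tooth length for each supplement and dose combination. This is a minimal sketch using base R's `aggregate`; `tooth_raw` and `cell_means` are illustrative names introduced here.

```r
# Fresh copy of the dataset (dose is still numeric here)
tooth_raw <- datasets::ToothGrowth

# Mean tooth length for each supplement/dose combination
cell_means <- aggregate(len ~ supp + dose, data = tooth_raw, FUN = mean)
print(cell_means)
```

Reading the rows for a fixed dose compares the two supplements; reading the rows for a fixed supplement shows how the mean length changes with dose.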
# Plotting a line curve for mean length vs doses
doses <- c(0.5, 1, 2)
# Mean tooth length at each dose level
mns <- sapply(doses, function(d) mean(tooth$len[tooth$dose == d]))
plot(doses, mns, type = "l" , col="red", lwd = 3, xlab = "Dosage Amount", ylab="Tooth Growth", main="Effect of Dosage amount on tooth growth")
points(doses, mns, pch =19 )
# Plotting a boxplot
tooth$dose <- as.factor(tooth$dose)
g <- ggplot(tooth, aes(x = dose, y = len, color = supp)) +
  geom_boxplot() +
  xlab("Dosage Amount") +
  ylab("Tooth growth") +
  ggtitle("Effect of Dosage amount on tooth growth by supplements")
g
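From the above graphs, we observe the following trends:
Tooth growth increases as the dosage amount increases.
At the 0.5 mg and 1 mg doses, OJ appears to give more tooth growth than VC, while at 2 mg the two supplements look similar.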
We will perform hypothesis tests to check whether the observed trends are statistically significant or could simply be due to chance.
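To make clear what the `t.test` calls that follow compute, here is a sketch of the Welch two-sample t-test done by hand for the 0.5 mg and 1 mg groups. The variable names (`x`, `y`, `se2_x`, `t_stat`, and so on) are illustrative; the formulas are the standard Welch statistic and Welch-Satterthwaite degrees of freedom.

```r
# Fresh copy of the dataset; split tooth length by the two doses being compared
tooth_raw <- datasets::ToothGrowth
x <- tooth_raw$len[tooth_raw$dose == 0.5]
y <- tooth_raw$len[tooth_raw$dose == 1]

# Squared standard error of each group mean
se2_x <- var(x) / length(x)
se2_y <- var(y) / length(y)

# Welch t statistic: difference in means over its standard error
t_stat <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

print(c(t = t_stat, df = df_welch, p = p_value))
```

These values should agree with what `t.test(x, y)` reports, since Welch's test is the default when `var.equal` is left at `FALSE`.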
# testing 0.5 mg and 1 mg
t.test(len~dose, data = tooth[tooth$dose != 2, ])
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.983781 -6.276219
## sample estimates:
## mean in group 0.5 mean in group 1
## 10.605 19.735
# testing 0.5 mg and 2 mg
t.test(len~dose, data = tooth[tooth$dose != 1, ])
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -18.15617 -12.83383
## sample estimates:
## mean in group 0.5 mean in group 2
## 10.605 26.100
# testing 2 mg and 1 mg
t.test(len~dose, data = tooth[tooth$dose != 0.5, ])
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2
## 19.735 26.100
From the above tests, we can see that the p-values are all very small (well below 0.05) and that zero does not lie in any of the 95% confidence intervals. This implies that the dosage amount does affect tooth growth, and the sample means show that the higher the dose, the higher the mean growth, as observed in the plots.
Hence, we reject the null hypothesis of equal means in favor of the alternative hypothesis that a higher dosage amount results in higher tooth growth.
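Because three pairwise comparisons were run on the same data, one optional check is to adjust the p-values for multiple testing. The sketch below applies a Bonferroni correction with `p.adjust` to the three p-values reported above; the correction is an addition for illustration and was not part of the original analysis, and `p_raw` and `p_bonf` are illustrative names.

```r
# p-values copied from the three dose comparisons reported above
p_raw <- c("0.5 vs 1" = 1.268e-07,
           "0.5 vs 2" = 4.398e-14,
           "1 vs 2"   = 1.906e-05)

# Bonferroni correction: each p-value is multiplied by the number of tests (capped at 1)
p_bonf <- p.adjust(p_raw, method = "bonferroni")
print(p_bonf)

# Which comparisons remain significant at the 5% level after adjustment
print(p_bonf < 0.05)
```

If all adjusted p-values stay below 0.05, the conclusion about dose does not depend on having run several tests.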
# testing OJ and VC across all doses
t.test(len~supp, data = tooth)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean in group OJ mean in group VC
## 20.66333 16.96333
# testing OJ and VC at the 0.5 mg and 1 mg doses combined
t.test(len~supp, data = tooth[tooth$dose != 2,])
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 3.0503, df = 36.553, p-value = 0.004239
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.875234 9.304766
## sample estimates:
## mean in group OJ mean in group VC
## 17.965 12.375
# testing OJ and VC at the 2 mg dose
t.test(len~supp, data = tooth[tooth$dose == 2,])
##
## Welch Two Sample t-test
##
## data: len by supp
## t = -0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.79807 3.63807
## sample estimates:
## mean in group OJ mean in group VC
## 26.06 26.14
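From these tests, we can say that, except at a dose of 2 mg, orange juice (OJ) shows clearly greater tooth growth than vitamin C (VC). This is supported by the small p-value of the second test (0.004239), which pools the 0.5 mg and 1 mg doses, and by zero not lying in its confidence interval. At 2 mg the p-value is 0.9639 and the interval contains zero, and with all doses pooled the difference is not significant at the 5% level (p-value 0.06063).
Hence, for the 0.5 mg and 1 mg doses we reject the null hypothesis in favor of the alternative that OJ gives higher tooth growth than VC; at 2 mg we fail to reject it.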
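The second test above pools the 0.5 mg and 1 mg doses. As a complementary check, the supplement comparison can be run separately at each dose. This is a minimal sketch on a fresh copy of the data; `tooth_raw` and `p_by_dose` are illustrative names.

```r
# Fresh copy of the dataset (dose is numeric here)
tooth_raw <- datasets::ToothGrowth

# Run a Welch t-test of len by supp within each dose level and keep the p-value
p_by_dose <- sapply(split(tooth_raw, tooth_raw$dose), function(d) {
  t.test(len ~ supp, data = d)$p.value
})

# Named vector of p-values, one per dose ("0.5", "1", "2")
print(round(p_by_dose, 4))
```

Each dose group holds only 20 observations, so these tests have less power than the pooled one and are best read alongside it.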
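The following assumptions were made:
The samples in each group are independent (the tests are unpaired).
The group variances are not assumed to be equal (Welch's t-test, the t.test default).
Keeping these assumptions in mind, the following conclusions can be drawn:
A higher dosage amount results in higher mean tooth growth.
At the lower doses (0.5 mg and 1 mg, pooled), OJ results in higher tooth growth than VC; at 2 mg there is no significant difference between the two supplements.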