In this report we analyze the ToothGrowth data from the R datasets package. The analysis examines the effect of supplement type and dose on tooth growth.
Let us load and examine the data.
## Exploration of the dataset
library(datasets)
data(ToothGrowth)
head(ToothGrowth)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
The dose variable takes only three distinct values (0.5, 1 and 2), so we convert it to a factor.
## clean up: convert dose to a factor
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
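As a quick check on the conversion, the following sketch repeats it on a local copy and tabulates the design. It assumes only base R and the datasets package; the names `tg` and `cell_counts` are ours and are not part of the original analysis.

```r
# Work on a local copy so the data used elsewhere in the report is untouched.
# `tg` is a hypothetical name introduced for this illustration.
tg <- datasets::ToothGrowth
tg$dose <- as.factor(tg$dose)

# The factor should have one level per distinct dose.
levels(tg$dose)

# Cross-tabulate supplement by dose to see how many observations
# fall in each supplement/dose cell.
cell_counts <- table(tg$supp, tg$dose)
print(cell_counts)
```

If every cell shows the same count, the design is balanced across supplement and dose.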
Let us look at the summary and plot the individual data points in the dataset.
## summary
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 0.5:20
## 1st Qu.:13.07 VC:30 1 :20
## Median :19.25 2 :20
## Mean :18.81
## 3rd Qu.:25.27
## Max. :33.90
## individual data point plot
library(ggplot2)
g1 <- ggplot(ToothGrowth, aes(dose, len))
g1 <- g1 + geom_point(color = "red", size = 1, shape = 15) +
facet_grid(.~supp) +
labs(x = "Dose (milligrams/day)") +
labs(y = "Tooth length (mm)") +
labs(title = "Effect of Supplement Type and Dose on Tooth Growth (individual data points)")
print(g1)
Let us look at a box plot to understand the data better. We then test the null hypothesis that the difference in mean tooth length between OJ and VC is 0, against the two-sided alternative that it is not equal to 0.
## box plot
g2 <- ggplot(ToothGrowth, aes(dose, len, fill = dose))
g2 <- g2 + geom_boxplot(notch = FALSE) +
facet_grid(.~supp) +
labs(x = "Dose (milligrams/day)") +
labs(y = "Tooth length (mm)") +
labs(title = "Effect of Supplement Type and Dose on Tooth Growth (box plot)")
print(g2)
## Two-sided test: H0: mu_OJ - mu_VC = 0 vs. Ha: mu_OJ - mu_VC != 0
t.test(len ~ supp, data = ToothGrowth)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean in group OJ mean in group VC
## 20.66333 16.96333
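The quantities printed above can also be read directly from the object that `t.test` returns, which is convenient when they need to be quoted in the text. A minimal sketch, assuming a fresh copy of the data (the names `tg`, `res` and `ci_contains_zero` are ours):

```r
tg <- datasets::ToothGrowth

# Welch two-sample t-test of tooth length by supplement, as in the report.
res <- t.test(len ~ supp, data = tg)

res$p.value    # p-value of the two-sided test
res$conf.int   # 95% confidence interval for mean(OJ) - mean(VC)
res$estimate   # the two group means

# TRUE when the confidence interval contains 0, i.e. the test does not
# reject the null hypothesis at the 5% level.
ci_contains_zero <- res$conf.int[1] < 0 && res$conf.int[2] > 0
ci_contains_zero
```

Reading the values from `res` avoids copying digits by hand from the console output.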
By examining the box plot and the results of the hypothesis test, we can conclude the following:
Since the p-value is 0.06063 (higher than 0.05) and the 95% confidence interval (-0.171, 7.571) contains 0, we cannot reject the null hypothesis at the 5% level. With all doses pooled together, the difference in mean tooth length between Orange Juice (OJ) and Vitamin C (VC) is not statistically significant.
The box plot suggests that OJ is more effective than VC at doses of 0.5 and 1.0, while the two supplements look similar at a dose of 2.0. However, if we look at the entire sample, the pooled test does not support the same conclusion.
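To check the impression from the box plot, one option is to repeat the Welch t-test within each dose level. This is a sketch rather than part of the original analysis: the names `tg`, `doses` and `p_by_dose` are ours, and running three tests without a multiple-comparison adjustment is a simplifying assumption.

```r
tg <- datasets::ToothGrowth  # fresh copy; dose is numeric here

# For each dose, compare OJ and VC with a Welch two-sample t-test
# and keep the two-sided p-value.
doses <- sort(unique(tg$dose))
p_by_dose <- sapply(doses, function(d) {
  t.test(len ~ supp, data = subset(tg, dose == d))$p.value
})
names(p_by_dose) <- doses

print(round(p_by_dose, 4))
```

A small p-value at a given dose would support a difference between the supplements at that dose. If the three tests are interpreted together, a correction such as `p.adjust(p_by_dose, method = "holm")` could be applied.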