In this report, we’re going to analyze the ToothGrowth data in the R datasets package.
library(datasets)
data <- ToothGrowth
head(data)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
ToothGrowth has 60 observations of 3 variables. The breakdown of supplements is as follows:
summary(ToothGrowth$supp)
## OJ VC
## 30 30
There are 3 variables in this data set:
names(ToothGrowth)
## [1] "len" "supp" "dose"
Distinct values of supp (a factor) and dose (a numeric variable):
unique(ToothGrowth$supp)
## [1] VC OJ
## Levels: OJ VC
unique(ToothGrowth$dose)
## [1] 0.5 1.0 2.0
The 60 guinea pigs each received one of two kinds of supplement, at a dose ranging from 0.5 mg to 2 mg.
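As a quick check on the design, the two grouping variables can be cross-tabulated. This is a minimal sketch in base R; the variable name `design` is only illustrative and does not appear in the original analysis.

```r
library(datasets)

# Cross-tabulate delivery method against dose to count observations per cell.
design <- table(Supplement = ToothGrowth$supp, Dose = ToothGrowth$dose)
print(design)

# A balanced design has the same number of observations in every cell.
all(design == design[1, 1])
```

If the last line returns TRUE, every supplement and dose combination has the same number of observations, so group totals and group means can be compared directly.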
plot(ToothGrowth$supp, ToothGrowth$len, main="supplement vs. length of tooth")
From the above box plot, we may notice:
- ‘OJ’ has a higher median than ‘VC’.
- Lengths associated with ‘OJ’ show less variability than those associated with ‘VC’.
library(ggplot2)
g <- ggplot(data,aes(x=len))
g + geom_density(aes(x=len,colour=supp)) +
facet_grid(. ~ dose,scales="free") +
labs(title="Density plot for different doses")
g + geom_density(aes(x=len,colour=as.factor(dose))) +
facet_grid(. ~ supp,scales="free") +
labs(title="Density plot for different supplements")
### Plot Graph
ggplot(data=ToothGrowth, aes(x=as.factor(dose), y=len, fill=supp)) +
geom_bar(stat="identity") +
facet_grid(. ~ supp) +
xlab("Dose in milligrams") +
ylab("Tooth length") +
guides(fill=guide_legend(title="Supplement type"))
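In the above figure, each bar stacks the ten tooth lengths recorded for one dose and delivery method, so its height is the group total. Because every group has the same number of observations, the totals are proportional to the group means, and they rise with dose level for both delivery methods.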
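Because the bars are drawn with stat="identity", their heights are sums of individual lengths rather than averages. A minimal sketch for looking at the group means directly is given below; it uses only base R, and `group_means` is an illustrative name, not part of the original report.

```r
library(datasets)

# Mean tooth length for each delivery method and dose.
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(group_means)

# One line per delivery method, mean length on the y-axis.
with(ToothGrowth,
     interaction.plot(x.factor = dose, trace.factor = supp, response = len,
                      fun = mean, trace.label = "Supplement type",
                      xlab = "Dose in milligrams",
                      ylab = "Mean tooth length"))
```

Plotting means instead of stacked totals keeps the y-axis on the scale of a single tooth length, which can make the dose trend easier to read.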
aggregate(list(Median.Length = ToothGrowth$len), by = list(Dose = ToothGrowth$dose, Supplement = ToothGrowth$supp), FUN = median)
## Dose Supplement Median.Length
## 1 0.5 OJ 12.25
## 2 1.0 OJ 23.45
## 3 2.0 OJ 25.95
## 4 0.5 VC 7.15
## 5 1.0 VC 16.50
## 6 2.0 VC 25.95
Generally, increases in dose seem to correlate with increases in tooth length. This is true of both types of supplement. The median tooth length at the lower dosages (0.5 mg and 1.0 mg) was lower for VC than for OJ; however, the OJ medians rose less across the dosage increases (from 12.25 to 25.95) than the VC medians did (from 7.15 to 25.95), resulting in the same median tooth length of 25.95 at the 2.0 mg dosage for both delivery methods.
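The size of the change in median can be computed rather than read off the table. The sketch below is one way to do it with `tapply`; the names `med` and `rise` are illustrative.

```r
library(datasets)

# Median tooth length as a supplement-by-dose matrix.
med <- with(ToothGrowth, tapply(len, list(supp, dose), median))
print(med)

# Change in median from the lowest dose (0.5) to the highest dose (2),
# computed separately for each delivery method.
rise <- med[, "2"] - med[, "0.5"]
rise
```

Using the medians listed in the table above, the rise works out to 13.70 for OJ and 18.80 for VC.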
summary(data)
## len supp dose
## Min. : 4.2 OJ:30 Min. :0.50
## 1st Qu.:13.1 VC:30 1st Qu.:0.50
## Median :19.2 Median :1.00
## Mean :18.8 Mean :1.17
## 3rd Qu.:25.3 3rd Qu.:2.00
## Max. :33.9 Max. :2.00
This data set contains 3 columns:
- len: tooth length
- supp: supplement type used (VC: ascorbic acid, OJ: orange juice)
- dose: vitamin C dose in milligrams

The data set includes 60 observations, one per guinea pig. Each of the 60 guinea pigs received one of the three dose levels of Vitamin C by one of the two delivery methods.
h0: The difference in mean tooth length when given a Vitamin C dose of 2.0 mg vs 0.5 mg is 0.
h1: The difference in mean tooth length when given a Vitamin C dose of 2.0 mg vs 0.5 mg is different than 0.
TG1 <- subset(ToothGrowth, ToothGrowth$dose %in% c(0.5, 2.0))
t.test(len ~ dose, var.equal = FALSE, data = TG1)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -11.8, df = 36.88, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -18.16 -12.83
## sample estimates:
## mean in group 0.5 mean in group 2
## 10.61 26.10
Since the p-value (4.398e-14) is below .05 and the 95 percent confidence interval (-18.16 to -12.83) does not contain 0, the null hypothesis h0 can be rejected in favor of the alternative hypothesis h1.
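The decision rule can also be applied to the test object itself instead of reading the printed output. This is a minimal sketch; `low_high`, `res`, `alpha` and `reject_h0` are illustrative names introduced here.

```r
library(datasets)

# Same comparison as above: 0.5 mg versus 2.0 mg, Welch two-sample t-test.
low_high <- subset(ToothGrowth, dose %in% c(0.5, 2.0))
res <- t.test(len ~ dose, var.equal = FALSE, data = low_high)

alpha <- 0.05                      # significance level used in this report
res$p.value                        # p-value of the two-sided test
res$conf.int                       # 95 percent confidence interval
reject_h0 <- res$p.value < alpha   # TRUE when h0 is rejected at level alpha
reject_h0
```

The same pattern works for the other comparisons in this report by changing the subset or the grouping variable.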
h0: The difference in mean tooth length when given a Vitamin C dose of 1.0 mg vs 0.5 mg is 0.
h1: The difference in mean tooth length when given a Vitamin C dose of 1.0 mg vs 0.5 mg is different than 0.
TG2 <- subset(ToothGrowth, ToothGrowth$dose %in% c(0.5, 1.0))
t.test(len ~ dose, var.equal = FALSE, data = TG2)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -6.477, df = 37.99, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.984 -6.276
## sample estimates:
## mean in group 0.5 mean in group 1
## 10.61 19.73
Since the p-value (1.268e-07) is below .05 and the 95 percent confidence interval (-11.984 to -6.276) does not contain 0, the null hypothesis h0 can be rejected in favor of the alternative hypothesis h1.
h0: The difference in mean tooth length when given a Vitamin C dose of 2.0 mg vs 1.0 mg is 0.
h1: The difference in mean tooth length when given a Vitamin C dose of 2.0 mg vs 1.0 mg is different than 0.
TG3 <- subset(ToothGrowth, ToothGrowth$dose %in% c(2.0, 1.0))
t.test(len ~ dose, var.equal = FALSE, data = TG3)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -4.901, df = 37.1, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -8.996 -3.734
## sample estimates:
## mean in group 1 mean in group 2
## 19.73 26.10
Since the p-value (1.906e-05) is below .05 and the 95 percent confidence interval (-8.996 to -3.734) does not contain 0, the null hypothesis h0 can be rejected in favor of the alternative hypothesis h1.
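Running three tests on the same data raises the chance of at least one false positive. One common safeguard, which the analysis above does not apply, is to adjust the p-values for multiple comparisons. The sketch below uses `pairwise.t.test` from the stats package with a Bonferroni adjustment; the choice of Bonferroni is an assumption made here for illustration.

```r
library(datasets)

# Welch tests for every pair of dose levels, with Bonferroni-adjusted p-values.
# pool.sd = FALSE keeps the unequal-variance (Welch) form used above.
pw <- pairwise.t.test(ToothGrowth$len, ToothGrowth$dose,
                      p.adjust.method = "bonferroni", pool.sd = FALSE)
print(pw)

# TRUE if every pairwise comparison stays below 0.05 after adjustment.
all(pw$p.value < 0.05, na.rm = TRUE)
```

Since the largest unadjusted p-value above is 1.906e-05, multiplying by the three comparisons still leaves all of them well below .05.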
h0: The difference in mean tooth length when given a Vitamin C dose via Orange Juice (OJ) vs. Ascorbic Acid (VC) is 0.
h1: The difference in mean tooth length when given a Vitamin C dose via Orange Juice (OJ) vs. Ascorbic Acid (VC) is different than 0.
t.test(len ~ supp, var.equal = FALSE, data = ToothGrowth)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 1.915, df = 55.31, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.171 7.571
## sample estimates:
## mean in group OJ mean in group VC
## 20.66 16.96
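Since the p-value (0.06063) is above .05 and the 95 percent confidence interval (-0.171 to 7.571) contains 0, the null hypothesis h0 cannot be rejected at the 5% level. This does not show that h0 is true; it only means the data do not provide sufficient evidence for the alternative hypothesis h1.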
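To make the Welch output less of a black box, the statistic, degrees of freedom and p-value for this comparison can be reproduced by hand. The sketch below follows the standard Welch formulas; all variable names are illustrative.

```r
library(datasets)

oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard error of each group mean.
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

# Welch t statistic: difference in means over the combined standard error.
t_stat <- (mean(oj) - mean(vc)) / sqrt(se2_oj + se2_vc)

# Welch-Satterthwaite approximation to the degrees of freedom.
df_welch <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), df_welch)
c(t = t_stat, df = df_welch, p = p_value)
```

These values should agree with the `t`, `df` and `p-value` fields that `t.test` prints for the same comparison.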
Both the exploratory data analysis and the first three t-tests above indicate that tooth length increases distinctly as the dosage of Vitamin C increases. The data do not show a significant difference (at the 5% level) in tooth length between the two delivery methods (orange juice and ascorbic acid); however, they also do not establish that no such difference exists.
All of the t-tests performed:
- did not assume equal population variances in the two groups being compared, allowing for a more robust comparison.
- assumed that the data are approximately normally distributed within each group.
- assumed that the data distributions are not strongly skewed.
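The normality assumption can be probed rather than taken for granted. One option, not used in the analysis above, is a Shapiro-Wilk test within each of the groups that were compared; the sketch below is an assumption about how such a check might look, with illustrative variable names.

```r
library(datasets)

# Shapiro-Wilk p-value within each dose group (the groups used in the
# first three t-tests).
p_by_dose <- with(ToothGrowth,
                  tapply(len, dose, function(x) shapiro.test(x)$p.value))

# Shapiro-Wilk p-value within each delivery method (the groups used in
# the fourth t-test).
p_by_supp <- with(ToothGrowth,
                  tapply(len, supp, function(x) shapiro.test(x)$p.value))

round(p_by_dose, 3)
round(p_by_supp, 3)
```

A small p-value for a group would suggest that its lengths depart from normality, in which case the t-test results for that comparison should be read with more caution.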