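This project is a brief graphical and numerical exploration of R's built-in ToothGrowth dataset, which records tooth (odontoblast) length in 60 guinea pigs that each received one of three vitamin C doses (0.5, 1 or 2 mg/day) through one of two supplements, orange juice (OJ) or ascorbic acid (VC). The analysis uses confidence intervals and hypothesis testing.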
library(datasets)
require(ggplot2)
## Loading required package: ggplot2
require(RColorBrewer)
## Loading required package: RColorBrewer
require(grDevices)
data(ToothGrowth)
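# attach() makes len, supp and dose visible by name; aggregate() below relies on this.
# The attached copy is a snapshot, so its dose column stays numeric after the conversion below.
attach(ToothGrowth)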
# first look: 3 variables and 60 observations
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
# convert dose to a factor so that the three dose levels are treated as discrete groups
unique(ToothGrowth$dose)
## [1] 0.5 1.0 2.0
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
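A quick check can confirm what the conversion produced and whether every supplement/dose combination has the same number of animals. The sketch below reloads the data so it runs on its own; the object name `cell_counts` is illustrative only.

```r
# Sketch: confirm the dose conversion and check whether the design is balanced.
data(ToothGrowth)
ToothGrowth$dose <- as.factor(ToothGrowth$dose)

# as.factor() on the numeric doses yields one level per distinct value
print(levels(ToothGrowth$dose))

# cross-tabulate supplement type against dose; each cell counts animals
cell_counts <- table(ToothGrowth$supp, ToothGrowth$dose)
print(cell_counts)
```

Equal cell counts mean each of the six groups compared later has the same sample size.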
This section examines the relationship between dose size, supplement type, and tooth length.
require(ggplot2)
require(gridExtra)
## Loading required package: gridExtra
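# shared theme: light grey panel with white major and minor grid lines
# (linewidth replaces the size argument, deprecated for lines and borders in ggplot2 3.4.0)
theme <- theme(
  panel.background = element_rect(fill = "lightgrey", colour = "lightgrey", linewidth = 0.5, linetype = "solid"),
  panel.grid.major = element_line(linewidth = 0.5, linetype = "solid", colour = "white"),
  panel.grid.minor = element_line(linewidth = 0.25, linetype = "solid", colour = "white")
)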
# Figure 1: tooth length by dose, both supplements pooled
plot1 <- ggplot(aes(x = factor(dose), y = len), data = ToothGrowth) +
  geom_boxplot(aes(fill = factor(dose))) +
  theme + scale_fill_brewer(palette = "GnBu") +
  labs(title = "Figure 1")
# Figure 2: tooth length by supplement, all doses pooled
plot2 <- ggplot(aes(x = supp, y = len), data = ToothGrowth) +
  geom_boxplot(aes(fill = supp)) +
  theme + scale_fill_brewer(palette = "PuOr") +
  labs(title = "Figure 2")
grid.arrange(plot1, plot2, ncol=2)
Figure 1 shows that tooth length increases as the dosage increases.
Figure 2 suggests that orange juice (OJ) produces longer teeth than vitamin C given as ascorbic acid (VC) when all dosage levels are pooled.
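The visual impressions from Figures 1 and 2 can be backed by the group medians, which are the centre lines of the boxes. This is a minimal sketch that recomputes them with `tapply()`; the object names are illustrative.

```r
# Sketch: medians behind the boxplots in Figures 1 and 2.
data(ToothGrowth)
ToothGrowth$dose <- as.factor(ToothGrowth$dose)

# Figure 1: median tooth length for each dose, supplements pooled
median_by_dose <- tapply(ToothGrowth$len, ToothGrowth$dose, median)
# Figure 2: median tooth length for each supplement, doses pooled
median_by_supp <- tapply(ToothGrowth$len, ToothGrowth$supp, median)

print(median_by_dose)
print(median_by_supp)
```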
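# one p-value label per facet: one-sided Welch t-tests of OJ > VC at each dose (computed below)
facet_labels <- data.frame(dose = factor(c("0.5", "1", "2")), x = 1.5, y = 2,
                           label = c("p = .003", "p = .0005", "p = .5"))
plot3 <- ggplot(aes(x = supp, y = len), data = ToothGrowth) +
  geom_boxplot(aes(fill = supp)) +
  facet_wrap(~ dose) + theme +
  scale_fill_brewer(palette = "YlOrRd") +
  labs(title = "Figure 3: Tooth Growth due to Two Supplements by Incremental Dosages") +
  geom_text(data = facet_labels, aes(x = x, y = y, label = label), inherit.aes = FALSE)
plot3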
Figure 3 shows that orange juice outperforms vitamin C at the 0.5 and 1 mg/day doses, while at 2 mg/day the two supplements show no clear difference.
First, we split the data with the split() function into one group per dose/supplement combination, so that each group can be passed directly to t.test().
# split the data frame by dose and supplement type; the six groups are, in order,
# 0.5.OJ, 1.OJ, 2.OJ, 0.5.VC, 1.VC, 2.VC (dose varies fastest)
split_tooth <- split(ToothGrowth, f = list(ToothGrowth$dose, ToothGrowth$supp))
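The tests below select groups by position (`split_tooth[[1]]` and so on), so it helps to confirm which group sits at each position. This sketch prints the names that split() assigns and shows that the same groups can be selected by name instead; `oj_low` and `vc_low` are illustrative names.

```r
# Sketch: inspect the group order produced by split().
data(ToothGrowth)
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
split_tooth <- split(ToothGrowth, f = list(ToothGrowth$dose, ToothGrowth$supp))

# names have the form "<dose>.<supp>"; the first factor (dose) varies fastest
print(names(split_tooth))

# selecting by name states the intended group explicitly
oj_low <- split_tooth[["0.5.OJ"]]$len  # same group as split_tooth[[1]]
vc_low <- split_tooth[["0.5.VC"]]$len  # same group as split_tooth[[4]]
```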
Second, we compute the sample mean and standard deviation of tooth length for each supplement/dose group.
aggregate(len, list(supp, dose), mean)
## Group.1 Group.2 x
## 1 OJ 0.5 13.23
## 2 VC 0.5 7.98
## 3 OJ 1.0 22.70
## 4 VC 1.0 16.77
## 5 OJ 2.0 26.06
## 6 VC 2.0 26.14
aggregate(len, list(supp, dose), sd)
## Group.1 Group.2 x
## 1 OJ 0.5 4.459709
## 2 VC 0.5 2.746634
## 3 OJ 1.0 3.910953
## 4 VC 1.0 2.515309
## 5 OJ 2.0 2.655058
## 6 VC 2.0 4.797731
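The same table can be produced without attach() by using aggregate()'s formula interface with an explicit `data` argument. In this sketch the summary function returns a named vector, so the `len` column of the result holds a two-column matrix (mean and sd); `group_stats` is an illustrative name.

```r
# Sketch: group means and standard deviations without attach().
data(ToothGrowth)
ToothGrowth$dose <- as.factor(ToothGrowth$dose)

# rows come out with supp varying fastest within dose, as in the output above
group_stats <- aggregate(len ~ supp + dose, data = ToothGrowth,
                         FUN = function(x) c(mean = mean(x), sd = sd(x)))
print(group_stats)
```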
Third, we perform one-sided hypothesis tests at the 5% significance level. Each call is a Welch two-sample t-test (the t.test() default, which does not assume equal variances), and its p-value is printed directly below it.
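To make clear what t.test() computes, the sketch below reproduces the one-sided Welch test for the 0.5 mg/day comparison by hand: the t statistic divides the difference in means by the combined standard error, the degrees of freedom follow the Welch–Satterthwaite formula, and the p-value is the upper tail of the t distribution. The helper `welch_greater()` is hypothetical and written only for illustration.

```r
# Sketch: a hand-rolled one-sided Welch t-test (H1: mean(x) > mean(y)).
data(ToothGrowth)
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose == 0.5]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == 0.5]

welch_greater <- function(x, y) {
  vx <- var(x) / length(x)  # squared standard error of mean(x)
  vy <- var(y) / length(y)  # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  p_value <- pt(t_stat, df, lower.tail = FALSE)  # upper-tail probability
  c(t = t_stat, df = df, p = p_value)
}

manual  <- welch_greater(oj, vc)
builtin <- t.test(oj, vc, alternative = "greater")
print(manual)
```

The hand-computed p-value should agree with the one t.test() reports for the same comparison below.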
# H1: orange juice (all doses pooled) yields longer teeth than vitamin C (all doses pooled)
t.test(c(split_tooth[[1]]$len, split_tooth[[2]]$len, split_tooth[[3]]$len),
       c(split_tooth[[4]]$len, split_tooth[[5]]$len, split_tooth[[6]]$len),
       alternative = "greater")$p.value
## [1] 0.03031725
# H1: at 0.5 mg/day, orange juice yields longer teeth than vitamin C
t.test(split_tooth[[1]]$len, split_tooth[[4]]$len,
       alternative = "greater")$p.value
## [1] 0.003179303
# H1: at 1 mg/day, orange juice yields longer teeth than vitamin C
t.test(split_tooth[[2]]$len, split_tooth[[5]]$len,
       alternative = "greater")$p.value
## [1] 0.0005191879
# H1: at 2 mg/day, orange juice yields longer teeth than vitamin C
t.test(split_tooth[[3]]$len, split_tooth[[6]]$len,
       alternative = "greater")$p.value
## [1] 0.5180742
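The three per-dose comparisons above differ only in which dose they select, so they can be written as a single loop over the dose levels, selecting groups by value rather than by list position. This is a sketch of an alternative formulation, not the original code; `p_by_dose` is an illustrative name.

```r
# Sketch: the three one-sided OJ > VC tests as one loop over doses.
data(ToothGrowth)
ToothGrowth$dose <- as.factor(ToothGrowth$dose)

p_by_dose <- sapply(levels(ToothGrowth$dose), function(d) {
  oj <- ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose == d]
  vc <- ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == d]
  t.test(oj, vc, alternative = "greater")$p.value
})
print(p_by_dose)

# reject H0 at the 5% level?
print(p_by_dose < 0.05)
```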
# H1: 0.5 mg/day yields shorter teeth than 1 mg/day (both supplements pooled)
t.test(c(split_tooth[[1]]$len, split_tooth[[4]]$len),
       c(split_tooth[[2]]$len, split_tooth[[5]]$len),
       alternative = "less")$p.value
## [1] 6.341504e-08
# H1: 0.5 mg/day yields shorter teeth than 2 mg/day (both supplements pooled)
t.test(c(split_tooth[[1]]$len, split_tooth[[4]]$len),
       c(split_tooth[[3]]$len, split_tooth[[6]]$len),
       alternative = "less")$p.value
## [1] 2.198762e-14
# H1: 1 mg/day yields shorter teeth than 2 mg/day (both supplements pooled)
t.test(c(split_tooth[[2]]$len, split_tooth[[5]]$len),
       c(split_tooth[[3]]$len, split_tooth[[6]]$len),
       alternative = "less")$p.value
## [1] 9.532148e-06
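The same t.test() objects also carry confidence intervals in their `conf.int` component. As a sketch, the code below computes two-sided 95% intervals for the OJ minus VC difference in mean tooth length at each dose. An interval that excludes zero corresponds to a significant two-sided test at the 5% level; `ci_by_dose` is an illustrative name.

```r
# Sketch: 95% confidence intervals for mean(OJ) - mean(VC) at each dose.
data(ToothGrowth)
ToothGrowth$dose <- as.factor(ToothGrowth$dose)

ci_by_dose <- t(sapply(levels(ToothGrowth$dose), function(d) {
  oj <- ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose == d]
  vc <- ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == d]
  t.test(oj, vc, conf.level = 0.95)$conf.int  # two-sided Welch interval
}))
colnames(ci_by_dose) <- c("lower", "upper")
print(ci_by_dose)
```

Intervals lying entirely above zero at 0.5 and 1 mg/day, and an interval straddling zero at 2 mg/day, would match the pattern of the one-sided tests above.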