This report is a course project for the Statistical Inference course, part of the Data Science Specialization by Johns Hopkins University on Coursera.
The project consists of two parts:
We are going to analyze the ToothGrowth data in the R datasets package.
About the ToothGrowth Dataset
The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice (coded as OJ) or ascorbic acid (a form of vitamin C and coded as VC).
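A quick way to confirm the balanced design described above (60 animals, 2 delivery methods by 3 dose levels) is to cross-tabulate the two factors. This is a minimal sketch using base R's table(); it assumes only the ToothGrowth dataset that ships with R's datasets package, and the object name design is illustrative.

```r
# ToothGrowth ships with base R's datasets package
data(ToothGrowth)

# Cross-tabulate delivery method (supp) against dose level
design <- table(ToothGrowth$supp, ToothGrowth$dose)
print(design)

# 2 supplements x 3 doses = 6 groups; 60 animals / 6 groups = animals per group
nrow(ToothGrowth) / length(design)
```

Every cell should hold the same count, which is why the later two-sample comparisons always involve groups of equal size.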
Loading the Data
# Loading the data
data(ToothGrowth)
# Convert the variable dose from a numeric to a factor variable
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
# Display the data
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: Factor w/ 3 levels "0.5","1","2": 1 1 1 1 1 1 1 1 1 1 ...
Basic Exploratory Data Analysis
# Box plots (requires the ggplot2 and gridExtra packages)
library(ggplot2)
library(gridExtra)
plot1 <- ggplot(ToothGrowth, aes(x = dose, y = len)) +
  geom_boxplot(fill = "#ff5c82") +
  ggtitle("Tooth growth by dose (mg/day)") +
  theme(axis.text = element_text(face = "bold")) +
  scale_x_discrete(name = "Dose") +
  scale_y_continuous(name = "Tooth length",
                     breaks = seq(0, 35, 5))
# ggplot() version of the former qplot() call (qplot() is deprecated in recent ggplot2 releases)
plot2 <- ggplot(ToothGrowth, aes(x = supp, y = len)) +
  geom_boxplot(aes(fill = supp)) +
  facet_wrap(~dose) +
  ggtitle("Tooth growth by supplement and dose (mg/day)") +
  xlab("Supplement") +
  ylab("Tooth length") +
  theme(legend.position = "none") +
  theme(axis.text = element_text(face = "bold")) +
  scale_y_continuous(breaks = seq(0, 35, 5)) +
  scale_fill_manual(values = c("#fff68f", "#ffec06"))
# Show both plots side by side
grid.arrange(plot1, plot2, ncol = 2)
We can see that as the dose increases, tooth length also increases for both supplement types.
Regarding the supplements, OJ appears to be more effective at the 0.5 and 1 mg/day dose levels, although the difference is much less clear at 2 mg/day.
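The visual impression from the box plots can be checked numerically by computing the mean tooth length in each supplement and dose cell. This is a minimal sketch with base R's aggregate() (formula interface); the names cell_means, oj, vc and diffs are illustrative.

```r
data(ToothGrowth)

# Mean tooth length for every supplement x dose combination
cell_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(cell_means)

# Within each supplement the rows are ordered by increasing dose,
# so these two vectors line up dose by dose (0.5, 1, 2)
oj <- cell_means$len[cell_means$supp == "OJ"]
vc <- cell_means$len[cell_means$supp == "VC"]

# Difference in mean tooth length, OJ minus VC, at each dose
diffs <- setNames(oj - vc, unique(cell_means$dose))
print(diffs)
```

The OJ advantage is several units at 0.5 and 1 mg/day and essentially vanishes at 2 mg/day, which is the pattern the hypothesis tests below examine formally.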
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 0.5:20
## 1st Qu.:13.07 VC:30 1 :20
## Median :19.25 2 :20
## Mean :18.81
## 3rd Qu.:25.27
## Max. :33.90
For this question, we perform Welch's t-test, because we want to test the hypothesis that two populations have equal means. Welch's test is an adaptation of Student's t-test that is more reliable when the two samples have unequal variances and/or unequal sample sizes; in R it is the test t.test() runs with var.equal = FALSE (the default).
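For reference, the standard form of Welch's statistic and its Welch-Satterthwaite degrees of freedom (the source of the fractional df values, such as df = 14.969, in the t.test() outputs) is sketched below, where the sample mean, variance and size of group i are written with subscript i (group 1 = OJ, group 2 = VC in the supplement comparisons). For the one-sided alternative used here, the p-value is the upper-tail probability of a t distribution with that many degrees of freedom.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}},
\qquad
\nu \approx \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}
  {\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}},
\qquad
p = P\left(T_{\nu} \ge t\right)
```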
H0: M1 - M2 = 0. There is no difference in tooth growth between OJ and VC at a 0.5 mg/day dose.
H1: M1 - M2 > 0. Tooth growth is greater with OJ than with VC at a 0.5 mg/day dose.
Where:
M1 is the mean tooth length for the OJ supplement at a 0.5 mg/day dose.
M2 is the mean tooth length for the VC supplement at a 0.5 mg/day dose.
Welch’s t-Test
VC.length <- ToothGrowth$len[which(ToothGrowth$supp == "VC" & ToothGrowth$dose == 0.5)]
OJ.length <- ToothGrowth$len[which(ToothGrowth$supp == "OJ" & ToothGrowth$dose == 0.5)]
t.test(OJ.length, VC.length, alternative = "greater", paired = FALSE, var.equal = FALSE, conf.level = 0.95)
##
## Welch Two Sample t-test
##
## data: OJ.length and VC.length
## t = 3.1697, df = 14.969, p-value = 0.003179
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 2.34604 Inf
## sample estimates:
## mean of x mean of y
## 13.23 7.98
At the 5% significance level (alpha = 1 - conf.level = 0.05), the p-value (0.003179) is below alpha, so we reject the null hypothesis: at a 0.5 mg/day dose, tooth growth is greater with OJ than with VC.
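To make the mechanics of the test concrete, the statistic, degrees of freedom and one-sided p-value for the 0.5 mg/day comparison can be reproduced by hand and checked against t.test(), followed by the decision rule p-value < alpha. This is a minimal sketch; the helper welch_by_hand is hypothetical and not part of base R.

```r
data(ToothGrowth)

# Hypothetical helper: Welch's two-sample t-test for H1: mean(x) > mean(y)
welch_by_hand <- function(x, y) {
  vx <- var(x) / length(x)   # squared standard error of mean(x)
  vy <- var(y) / length(y)   # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  # Upper-tail probability, matching alternative = "greater"
  p <- pt(t_stat, df, lower.tail = FALSE)
  c(t = t_stat, df = df, p.value = p)
}

oj <- ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose == 0.5]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == 0.5]

manual  <- welch_by_hand(oj, vc)
builtin <- t.test(oj, vc, alternative = "greater", var.equal = FALSE)
print(manual)

# Reject H0 when the p-value falls below alpha = 1 - conf.level
alpha <- 1 - 0.95
manual[["p.value"]] < alpha
```

The hand-computed values should agree with the t.test() output above up to floating-point rounding.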
H0: M1 - M2 = 0. There is no difference in tooth growth between OJ and VC at a 1 mg/day dose.
H1: M1 - M2 > 0. Tooth growth is greater with OJ than with VC at a 1 mg/day dose.
Where:
M1 is the mean tooth length for the OJ supplement at a 1 mg/day dose.
M2 is the mean tooth length for the VC supplement at a 1 mg/day dose.
Welch’s t-Test
VC.length <- ToothGrowth$len[which(ToothGrowth$supp == "VC" & ToothGrowth$dose == 1.0)]
OJ.length <- ToothGrowth$len[which(ToothGrowth$supp == "OJ" & ToothGrowth$dose == 1.0)]
t.test(OJ.length, VC.length, alternative = "greater", paired = FALSE, var.equal = FALSE, conf.level = 0.95)
##
## Welch Two Sample t-test
##
## data: OJ.length and VC.length
## t = 4.0328, df = 15.358, p-value = 0.0005192
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 3.356158 Inf
## sample estimates:
## mean of x mean of y
## 22.70 16.77
At the 5% significance level (alpha = 1 - conf.level = 0.05), the p-value (0.0005192) is below alpha, so we reject the null hypothesis: at a 1 mg/day dose, tooth growth is greater with OJ than with VC.
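The one-sided confidence interval carries the same information as the p-value: with alternative = "greater", the null hypothesis is rejected at level alpha exactly when the lower bound of the (1 - alpha) interval lies above zero. A minimal sketch checking this for the 1 mg/day comparison; the object names are illustrative.

```r
data(ToothGrowth)

oj <- ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose == 1]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == 1]

res <- t.test(oj, vc, alternative = "greater", var.equal = FALSE, conf.level = 0.95)

# Lower bound of the one-sided 95% interval for mean(OJ) - mean(VC)
print(res$conf.int[1])

# The two decision rules agree: interval excludes 0 <=> p-value < 0.05
c(ci_excludes_zero = res$conf.int[1] > 0, p_below_alpha = res$p.value < 0.05)
```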
H0: M1 - M2 = 0. There is no difference in tooth growth between OJ and VC at a 2 mg/day dose.
H1: M1 - M2 > 0. Tooth growth is greater with OJ than with VC at a 2 mg/day dose.
Where:
M1 is the mean tooth length for the OJ supplement at a 2 mg/day dose.
M2 is the mean tooth length for the VC supplement at a 2 mg/day dose.
Welch’s t-Test
VC.length <- ToothGrowth$len[which(ToothGrowth$supp == "VC" & ToothGrowth$dose == 2.0)]
OJ.length <- ToothGrowth$len[which(ToothGrowth$supp == "OJ" & ToothGrowth$dose == 2.0)]
t.test(OJ.length, VC.length, alternative = "greater", paired = FALSE, var.equal = FALSE, conf.level = 0.95)
##
## Welch Two Sample t-test
##
## data: OJ.length and VC.length
## t = -0.046136, df = 14.04, p-value = 0.5181
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## -3.1335 Inf
## sample estimates:
## mean of x mean of y
## 26.06 26.14
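At the 5% significance level (alpha = 1 - conf.level = 0.05), the p-value (0.5181) is above alpha, so we fail to reject the null hypothesis: at a 2 mg/day dose, the data provide no evidence that tooth growth differs between OJ and VC. Failing to reject does not prove the two supplements are equivalent; the sample means (26.06 vs 26.14) are simply very close.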
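Because the three supplement comparisons differ only in the dose level, they can also be run in a single pass. This is a minimal sketch using base R's split() and sapply(); the object names by_dose and results are illustrative.

```r
data(ToothGrowth)

# One Welch test (H1: mean(OJ) > mean(VC)) per dose level
by_dose <- split(ToothGrowth, ToothGrowth$dose)
results <- sapply(by_dose, function(d) {
  res <- t.test(d$len[d$supp == "OJ"], d$len[d$supp == "VC"],
                alternative = "greater", var.equal = FALSE)
  c(t = unname(res$statistic), df = unname(res$parameter), p.value = res$p.value)
})

# Rows: t, df, p.value; columns: dose levels
print(round(results, 4))

# Dose levels at which H0 is rejected at the 5% level
colnames(results)[results["p.value", ] < 0.05]
```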
H0: Null hypothesis, there is no difference in tooth growth between 0.5 and 1 mg/day doses (M0 - M1 = 0).
H1: Alternative hypothesis, tooth growth is greater with a 1 mg/day dose than with 0.5 mg/day (M0 - M1 < 0).
Where:
M0 is the mean tooth length for a dose of 0.5 mg/day.
M1 is the mean tooth length for a dose of 1 mg/day.
Welch’s t-Test
Now we will perform the t-test using the two tooth length vectors dose_0.5 and dose_1.
dose_0.5 <- ToothGrowth$len[ToothGrowth$dose == "0.5"]
dose_1 <- ToothGrowth$len[ToothGrowth$dose == "1"]
t.test(dose_0.5, dose_1,
       alternative = "less", # H1: dose_0.5 has a smaller mean than dose_1 (as the box plots above suggest)
       paired = FALSE,       # the observations are independent, not paired
       var.equal = FALSE,    # do not assume equal variances (Welch's test)
       conf.level = 0.95)
##
## Welch Two Sample t-test
##
## data: dose_0.5 and dose_1
## t = -6.4766, df = 37.986, p-value = 6.342e-08
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf -6.753323
## sample estimates:
## mean of x mean of y
## 10.605 19.735
At the 5% significance level (alpha = 1 - conf.level = 0.05), the p-value (6.342e-08) is far below alpha, so we reject the null hypothesis: tooth growth is greater with a 1 mg/day dose than with a 0.5 mg/day dose.
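The direction of a one-sided test depends on the order of the arguments: t.test(dose_0.5, dose_1, alternative = "less") tests mean(dose_0.5) - mean(dose_1) < 0, which is the same hypothesis as t.test(dose_1, dose_0.5, alternative = "greater"). A minimal sketch confirming that both calls yield the same p-value and statistics of opposite sign; the object names less_test and greater_test are illustrative.

```r
data(ToothGrowth)

dose_0.5 <- ToothGrowth$len[ToothGrowth$dose == 0.5]
dose_1   <- ToothGrowth$len[ToothGrowth$dose == 1]

# H1: mean(dose_0.5) - mean(dose_1) < 0
less_test <- t.test(dose_0.5, dose_1, alternative = "less", var.equal = FALSE)
# Same hypothesis with the arguments swapped: mean(dose_1) - mean(dose_0.5) > 0
greater_test <- t.test(dose_1, dose_0.5, alternative = "greater", var.equal = FALSE)

c(less = less_test$p.value, greater = greater_test$p.value)
c(less = unname(less_test$statistic), greater = unname(greater_test$statistic))
```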
H0: Null hypothesis, there is no difference in tooth growth between 1 and 2 mg/day doses (M0 - M1 = 0).
H1: Alternative hypothesis, tooth growth is greater with a 2 mg/day dose than with 1 mg/day (M0 - M1 < 0).
Where:
M0 is the mean tooth length for a dose of 1 mg/day.
M1 is the mean tooth length for a dose of 2 mg/day.
Welch’s t-Test
Now we will perform the t-test using the two tooth length vectors dose_1 and dose_2.
dose_1 <- ToothGrowth$len[ToothGrowth$dose == "1"]
dose_2 <- ToothGrowth$len[ToothGrowth$dose == "2"]
t.test(dose_1, dose_2,
       alternative = "less", # H1: dose_1 has a smaller mean than dose_2 (again suggested by the box plots above)
       paired = FALSE,       # the observations are independent, not paired
       var.equal = FALSE,    # do not assume equal variances (Welch's test)
       conf.level = 0.95)
##
## Welch Two Sample t-test
##
## data: dose_1 and dose_2
## t = -4.9005, df = 37.101, p-value = 9.532e-06
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf -4.17387
## sample estimates:
## mean of x mean of y
## 19.735 26.100
At the 5% significance level (alpha = 1 - conf.level = 0.05), the p-value (9.532e-06) is far below alpha, so we reject the null hypothesis: tooth growth is greater with a 2 mg/day dose than with a 1 mg/day dose.
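The five tests in this report can be collected into a single table, which makes the overall pattern easier to read. This is a minimal sketch; the helper run_welch and the column names are hypothetical and simply wrap t.test() with the settings used in this report. The dose comparisons are written with the larger dose first and alternative = "greater", which is equivalent to the "less" form used above.

```r
data(ToothGrowth)
len  <- ToothGrowth$len
supp <- ToothGrowth$supp
dose <- ToothGrowth$dose

# Hypothetical helper: one-sided Welch test of H1: mean(x) > mean(y)
run_welch <- function(label, x, y) {
  res <- t.test(x, y, alternative = "greater", var.equal = FALSE, conf.level = 0.95)
  data.frame(comparison = label,
             t = unname(res$statistic),
             df = unname(res$parameter),
             p.value = res$p.value)
}

summary_table <- rbind(
  run_welch("OJ > VC at 0.5 mg/day", len[supp == "OJ" & dose == 0.5], len[supp == "VC" & dose == 0.5]),
  run_welch("OJ > VC at 1 mg/day",   len[supp == "OJ" & dose == 1],   len[supp == "VC" & dose == 1]),
  run_welch("OJ > VC at 2 mg/day",   len[supp == "OJ" & dose == 2],   len[supp == "VC" & dose == 2]),
  run_welch("1 > 0.5 mg/day",        len[dose == 1],                  len[dose == 0.5]),
  run_welch("2 > 1 mg/day",          len[dose == 2],                  len[dose == 1])
)

# Decision at the 5% significance level
summary_table$reject_H0 <- summary_table$p.value < 0.05
print(summary_table, digits = 4)
```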
After running the hypothesis tests on the ToothGrowth sample, we can conclude the following: