This is the second part of the final project for the course Statistical Inference by JHU in Coursera.
In this project, we will do some basic inferential data analysis on the ToothGrowth data which is part of the datasets package of R.
ToothGrowth is the data that shows the results from an experiment on 60 guinea pigs that took vitamin C in different methods (by orange juice or by ascorbic acid), and the length of odontoblast was measured. The sample was divided into two groups. One that takes orange juice, and another that takes ascorbic acid. Each animal from each group taken a specific dosage of the method assigned to them, and odontoblast was then measured.
My main purpose of this study is to test whether there are significant differences on the length of odontoblasts to guinea pigs if different dosages of ascorbic acid or orange juice were given to the subjects.
I’ve stored the data in tooth variable. Let’s look at the first few rows.
head(tooth)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
There are only three variables in this data. len is the measured length of the odontoblasts after the dosage. supp is the method how the vitamin C was taken, which is either via orange juice (OJ) or ascorbic acid (VC). And the dose variable gives us the dosage that were given to the guinea pigs, which is either 0.5, 1, or 2. For sanity check, let’s see if the sample size is equally distributed among the methods of intake and the number of dosages.
table(tooth[, 2:3])
## dose
## supp 0.5 1 2
## OJ 10 10 10
## VC 10 10 10
The 5-number summary of the len variable is worth looking. Let’s take a look at these summaries per the dosage that was taken.
dosage <- sort(unique(tooth$dose))
for (i in 1:3) print(summary(tooth$len[tooth$dose == dosage[i]]))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.200 7.225 9.850 10.605 12.250 21.500
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 13.60 16.25 19.25 19.73 23.38 27.30
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 18.50 23.52 25.95 26.10 27.82 33.90
Just by looking at the minimum and maximum values (or the range) per dosage, it’s pretty obvious that the length of odontoblasts is generally increasing as the dosage increases. These summaries do not provide distinctions between taking orange juice and ascorbic acid. So, it’s better to visualize it.
tooth %>% mutate(supp = if_else(supp == 'OJ', 'Orange Juice', 'Ascorbic Acid')) %>%
ggplot(aes(x = as.factor(dose), y = len, fill = supp)) +
geom_boxplot() +
stat_summary(fun = mean, geom = 'point', aes(group = supp),
position = position_dodge(.75), size = 2,
col = 'blue', alpha = 0.5) +
theme_minimal() +
labs(x = 'Dosage', y = 'Length',
title = 'Distribution of Odontoblasts per method of vitamin C intake per dosage',
fill = 'Method of Intake')
The above boxplots show us that taking orange juice with dosage of 0.5 and 1 give larger variance to odontoblasts as compared to taking ascorbic acid. With orange juice, we can also get a higher average length of odontoblasts. This story has changed when the dosage was raised to 2. This time, taking ascorbic acid gave more variance. The average length of odontoblasts has now become approximately the same between orange juice and ascorbic acid takers.
Let’s see what will happen if we disregarded the dosages, and just focus on the method of vitamin C intake.
tooth %>% mutate(supp = if_else(supp == 'OJ', 'Orange Juice', 'Ascorbic Acid')) %>%
ggplot(aes(x = supp, y = len, fill = supp)) +
geom_boxplot(show.legend = FALSE) +
stat_summary(fun = mean, geom = 'point', size = 4,
col = 'blue', alpha = 0.5, show.legend = FALSE) +
theme_minimal() +
labs(x = 'Supplementary', y = 'Odontoblast Length',
title = 'Boxplots of odontoblasts length per method of intake')
So, if we looked at these supplements without the dosage, we can say that taking ascorbic acid gives more variance of length of odontoblasts than taking orange juice. In addition, taking orange juice has a higher average length of odontoblasts.
It would be logical to study the effects of orange juice and ascorbic acid by dosage, and not as a whole to be more accurate in concluding this study. Hence, I’m going to test three hypotheses. Below are my null and alternative hypotheses across all tests for all dosages.
\(H_0: \mu_{OJ} - \mu_{VC} = 0\)
\(H_A: \mu_{OJ} - \mu_{VC} \ne 0\)
In this study, I will use paired t test to conduct difference in means tests, and see if the difference is 0. Let’s create the variables of subsets of our data.
oj1 <- tooth$len[tooth$supp == 'OJ' & tooth$dose == 0.5]
vc1 <- tooth$len[tooth$supp == 'VC' & tooth$dose == 0.5]
oj2 <- tooth$len[tooth$supp == 'OJ' & tooth$dose == 1]
vc2 <- tooth$len[tooth$supp == 'VC' & tooth$dose == 1]
oj3 <- tooth$len[tooth$supp == 'OJ' & tooth$dose == 2]
vc3 <- tooth$len[tooth$supp == 'VC' & tooth$dose == 2]
limit <- qt(0.975, 9)
cat('t statistic (limit): ', limit)
## t statistic (limit): 2.262157
So each pair of oj and vc variables are the inputs of our hypotheses tests. I find it logical to do the tests by dosage because I’m interested to see if the different variances that we’ve seen earlier will affect our hypothesis tests. I also reserved the variable called limit for us to see if the calculated t statistics of the hypothesis tests will be greater or lower than this “limit” that we set.
dose1 <- t.test(oj1, vc1, paired = TRUE)
dose2 <- t.test(oj2, vc2, paired = TRUE)
dose3 <- t.test(oj3, vc3, paired = TRUE)
The dose variables above contain the results of the hypothesis tests conducted to three pairs of the subsets of our data.
results <- data.frame(Doses = c(0.5, 1, 2),
T.Statistic = c(dose1$statistic, dose2$statistic, dose3$statistic),
Min = c(dose1$conf.int[1], dose2$conf.int[1], dose3$conf.int[1]),
Max = c(dose1$conf.int[2], dose2$conf.int[2], dose3$conf.int[2]),
P.Value = c(dose1$p.value, dose2$p.value, dose3$p.value))
results
## Doses T.Statistic Min Max P.Value
## 1 0.5 2.97910471 1.263458 9.236542 0.015472048
## 2 1.0 3.37211953 1.951911 9.908089 0.008229248
## 3 2.0 -0.04259204 -4.328976 4.168976 0.966956704
I compiled the needed details in a dataframe called results to easily see the results of the tests.
The first row which is the hypothesis test result for the 0.5 dosage of orange juice and ascorbic acid, our t statistic is 2.98 which is greater than the computed t statistic limit of 2.26. In addition, our confidence interval of 1.26 to 9.24 does not include 0, so we can rule out having equal means here. The p-value is 0.0156, which is below 0.05, which means that the results are significant. Hence, we reject the null hypothesis.
For dosage of 1, the t statistic of our hypothesis test is 3.37 which is greater than the limit that we set, 2.26. Also, the confidence interval for this test is 1.95 to 9.91. This doesn’t contain zero, so we can rule out having equal means. Lastly, the p-value is 0.008 which is lower than 0.05. Therefore, we reject the null hypothesis.
The last dosage tells a different story. The t statistic is -0.04 which is lower than 2.26. The confidence interval range is -4.32 and 4.16. This contains zero so can rule out having the same means. And, the p-value is 0.97 which is way above our alpha value of 0.05. So, we fail to reject the null hypothesis.
It’s also worth noting to see the power associated to each tests.
power.t.test(n = 10, delta = dose1$estimate, sd = sd(oj1 - vc1))$power
## [1] 0.5134363
power.t.test(n = 10, delta = dose2$estimate, sd = sd(oj2 - vc2))$power
## [1] 0.6164196
power.t.test(n = 10, delta = dose3$estimate, sd = sd(oj3 - vc3))$power
## [1] 0.02671658
The above powers measure the probability of rejecting null hypothesis when the null hypothesis is indeed false. Power is no longer relevant for the third test since we failed to reject the null hypothesis. I’m just presenting it for uniformity.
The power attached to the first hypothesis test is 51.34%, which is fairly high. But we could consider it being near to having an equal probability of having a Type II error, that is 1 - Power. If we’re interested in getting more power, say, 70%, we can see what other factors that we can change. Let’s see how many more samples would we need to get that power.
power.t.test(delta = dose1$estimate, sd = sd(oj1 - vc1), power = .7)$n
## [1] 14.9292
So, we need to have 15 samples to get that power provided that the mean and standard deviation are the same. Let’s also do it for the second hypothesis test. Though 61.64% of power is fairly high, let’s assume that we need a threshold of 70% power. So, let’s compute for the number of samples needed to get this.
power.t.test(delta = dose2$estimate, sd = sd(oj2 - vc2), power = .7)$n
## [1] 11.89229
So, we just need 12 samples to get this power.
As we’ve seen from the boxplots earlier, taking orange juice at dosage level of 0.5 and 1 gives larger mean and variance to the odontoblasts length as compared to the same level of dosages of ascorbic acid. This has been proven by our hypothesis test and having 0 difference in means, in other words, having equal means, has been rejected with significant rate of p-values and confidence intervals. But just like what we’ve seen from the boxplots, at dosage level 2, the two means become close to each other. Our hypothesis test on dosage level 2 says that we can’t rule out having the same means.
In analogy, perhaps having greater dosages of ascorbic acid can lead to higher average length of odontoblasts, which could exceed even exceed the average odontoblasts length that orange juice can produce.