This is a Statistical Inference Course Project report which attempts to do basic exploratory analysis of the ToothGrowth dataset. The report identifies some basic hypotheses to compare tooth growth between the candidate supplements at different doses. With the help of t-tests and permutation tests, the report proceeds to reject or accept the null hypotheses. At the outset, some basic conclusions are presented.
The following is a list of assumptions for basic statistical inference:
Independent and identically distributed samples
The samples are drawn from approximately normal distribution or the sample statistic approach normal distribution as a result of CLT. Using t-tests improves the robustness to the normality assumption.
The underlying distributions are roughly symmetrical and there is no pronounced skew in the pdf or pmf
The samples are randomly drawn from the concerned populations for robust inference.
Unequal variances are assumed for group testing.
Two limitations are noted with the ToothLength dataset:
No randomization: There were only 10 Guinea pigs getting each of the six treatments (3 doses x 2 supplements) instead of the expected sample of 60 guinea pigs with batches of 10 samples which could have been assigned randomly to the six treatments.
There is no baseline for tooth growth without administering any supplements. This makes it hard to assess the overall effect of the supplements compared to normal tooth growth without any supplements, which would be very useful.
Notation used in the report: OJ stands for Orange Juice, and VC stands for Ascorbic Acid (Vitamin C)
## Warning: package 'lattice' was built under R version 3.1.3
## Warning: package 'knitr' was built under R version 3.1.3
opts_chunk$set(echo=TRUE, cache=TRUE)
data(ToothGrowth)
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
Here is a basic summary of the dataset:
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 0.5:20
## 1st Qu.:13.07 VC:30 1 :20
## Median :19.25 2 :20
## Mean :18.81
## 3rd Qu.:25.27
## Max. :33.90
# Identify if there are any NA in the dataset
sum(any(is.na(ToothGrowth)))
## [1] 0
# There are no NA in the dataset
The following is a plot of the ToothGrowth dataset (courtesy: Help page for ToothGrowth dataset)
coplot(len ~ dose | supp, data=ToothGrowth,
panel=panel.smooth,
lwd=2,
col=c("darkblue"),
xlab = "dose",
ylab = "Tooth Length"
)
Figure 1: ToothGrowth data: length vs dose for each supplement
tg <- select(ToothGrowth,len,supp,dose)
meanStats <- tapply(tg$len,list(tg$supp,tg$dose),mean)
meanStats
## 0.5 1 2
## OJ 13.23 22.70 26.06
## VC 7.98 16.77 26.14
It appears OJ has mean lengths that are significantly higher than VC at low and medium doses.
varStats <- tapply(tg$len,list(tg$supp,tg$dose),var)
varStats
## 0.5 1 2
## OJ 19.889 15.295556 7.049333
## VC 7.544 6.326778 23.018222
It appears OJ has considerably higher variance in the lengths at low and medium doses. However, VC seems to higher variance than OJ at higher doses. Higher variance indicates higher variability in the underlying data. Based on the observed variances, it is not intuitive to deduce the overall efficacy of the supplements without systematic statistical inference.
One can ask several questions that need answers for this dataset. Here is a list of some possible questions for this project. A good reference for a discussion on framing the questions is Interpreting Linear Models.
Q1. H0: There is no difference in the mean effects between supplements vs Ha: Supplements have different effects
Q2. H0: Doses have the same mean effect vs Ha: Dose effects differ
Q3. H0: The effect of supplement (OJ vs VC) is the same at all doses vs Ha: Mean Supplement effects (OJ vs VC) differ by dosage
Q4. H0: The effect of supplement between OJ and VC is not greater than 0 at low and medium doses vs Ha: The effect of supplement between OJ and VC is actually greater than 0 at low and medium doses
The following analysis attempts to find answers to set of questios identified previously using only the hypothesis testing methods taught so far in this course on Statistical Inference (Ref: Statistical Inference for Data Science by Brian Caffo).
This proposition is also equivalent to testing whether supplement labels are irrelevant vs labels are relevant. Let us apply two-sided t-test using the differences between the groups: OJ and VC ignoring any dose effects.
vclen <- filter(tg,supp=="VC") %>% select(len)
ojlen <- filter(tg,supp=="OJ") %>% select(len)
# significance level
alpha = 0.05
t.test(ojlen, vclen, alternative="two.sided", conf.level=0.95, var.equal=FALSE)
##
## Welch Two Sample t-test
##
## data: ojlen and vclen
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean of x mean of y
## 20.66333 16.96333
# Compute the sample mean (m) for the specific case of interest (sample size = ns = 40)
The p-value is 0.06039 which is greater than alpha. Also the confdence interval (CI) is [-0.1710156, 7.5710156] which includes 0 value which is the value suggested by null hypothesis. The p-value is higher than alpha, but the p-value by itself is so low that it does not provide a ringing endorsement of the null hypothesis. It is a feeble but positive case for keeping the null hypothesis at 95% CI. Let us look at the attained significance level, which is defined as the smallest value of alpha that will result in rejecting the null hypothesis is 0.07. At 93% CI, alpha is 0.07, and p-value is lower than alpha. So we can lean on rejecting the null hypothesis at 93% CI.
To confirm this decision, let us apply a second method such as Permutation test for the same proposition (a null hypothesis that the labels are irrelevant vs labels are relevant). Let us choose 10000 permutations for sampling the tooth lengths with replacement. The supp labels are shuffled between OJ and VC based on a random selection of the supp labels in each simulation.
tlen <- tg$len
supp <- as.character(tg$supp)
testStat <- function(w, g) mean(w[g == "OJ"]) - mean(w[g == "VC"])
observedStat <- testStat(tlen, supp)
permutations <- sapply(1:10000, function(i) testStat(tlen, sample(supp)))
cat("The test statistic is: ", observedStat, "\n")
## The test statistic is: 3.7
Let us calculate the p-value, which is defined as the probability of observing data as or more extreme in favor of the alternative than was actually obtained, where the probability is calculated assuming that the null hypothesis is true.
p-value = The proportion of times we got a simulated statistic larger than our observed statistic is:
mean(permutations > observedStat)
## [1] 0.0305
Let us plot the histogram of the permuted tooth length dataset. It is evident that the distribution fits a normal distribution. The line (in purple color) in the figure denotes the value of the test statistic.
histogram(permutations,
nint=30,
type="density",
xlab="Tooth length",
main="Figure 2: Histogram of Permuted tooth lengths",
panel=function(x, ...) {
panel.abline(v=observedStat, col.line="purple",lwd=4)
panel.histogram(x, ...)
panel.mathdensity(
dmath=dnorm,
col="orange",
lwd=c(2),
args=list(mean=mean(x),sd=sd(x))
)
panel.grid(h=10, v=10)
}
)
cat("The p-value ",mean(permutations > observedStat), " is less than alpha=0.05 even if the difference is not significant.","\n")
## The p-value 0.0305 is less than alpha=0.05 even if the difference is not significant.
At the outset based on t-test and Permutation test, it appears reasonable to reject the null hypothesis.
We compare the sample sets for individual doses pooling together the supplements. Since we have three doses, we have three tests for comparison 1.0 vs 0.5, 2.0 vs 1.0, and 2.0 vs 0.5. In this analysis, doses of 0.5 mg, 1.0 mg, and 2.0 mg are designated as low, medium, and high. Let us apply t-tests using the differences between doses but ignoring the supplement labels. Since there are three possible combinations arising of three choosing two at a time, this will result in three different t-tests, as shown below:
lowDose <- filter(tg, dose==0.5) %>% select(len)
medDose <- filter(tg, dose==1.0) %>% select(len)
highDose <- filter(tg, dose==2.0) %>% select(len)
# p-values and confidence intervals
# (medium - low) dose effects
t.test(medDose, lowDose, alternative="two.sided", conf.level=0.95,var.equal=FALSE)$p.value
## [1] 1.268301e-07
t.test(medDose, lowDose, alternative="two.sided", conf.level=0.95,var.equal=FALSE)$conf.int[1:2]
## [1] 6.276219 11.983781
# (high - medium) dose effects
t.test(highDose, medDose, alternative="two.sided", conf.level=0.95,var.equal=FALSE)$p.value
## [1] 1.90643e-05
t.test(highDose, medDose, alternative="two.sided", conf.level=0.95,var.equal=FALSE)$conf.int[1:2]
## [1] 3.733519 8.996481
# (high - medium) dose effects
t.test(highDose, lowDose, alternative="two.sided", conf.level=0.95,var.equal=FALSE)$p.value
## [1] 4.397525e-14
t.test(highDose, lowDose, alternative="two.sided", conf.level=0.95,var.equal=FALSE)$conf.int[1:2]
## [1] 12.83383 18.15617
All the returned p-values are several orders of magnitude lower than alpha (=0.05) approaching zero, and all the confidence intervals exclude 0. So we strongly reject the null hypothesis. We accept the alternate hypothesis that the dose effects actually have different effects.
Now let us differentiate the dose related samples based on supplement type and individual dose
Let us compile lengths for each supplement and each dose into separate data frames to cover the possible combinations for dose and supplement.
# OJ dose datasets for low, medium, and high dose effects
lowDose_OJ <- filter(tg, dose==0.5 & supp=="OJ") %>% select(len)
medDose_OJ <- filter(tg, dose==1.0 & supp=="OJ") %>% select(len)
highDose_OJ <- filter(tg, dose==2.0 & supp=="OJ") %>% select(len)
# VC dose datasets for low, medium, and high dose effects
lowDose_VC <- filter(tg, dose==0.5 & supp=="VC") %>% select(len)
medDose_VC <- filter(tg, dose==1.0 & supp=="VC") %>% select(len)
highDose_VC <- filter(tg, dose==2.0 & supp=="VC") %>% select(len)
Let us evaluate two-sides t-tests for supp=OJ at various doses separately choosing two doses at a time for a possible three combinations.
# Compare medium and low dose effects for H0 vs Ha
t.test(medDose_OJ , lowDose_OJ , alternative="two.sided", conf.level=0.95,var.equal=FALSE)$p.value
## [1] 8.784919e-05
t.test(medDose_OJ , lowDose_OJ , alternative="two.sided", conf.level=0.95,var.equal=FALSE)$conf.int[1:2]
## [1] 5.524366 13.415634
Based on p-value and CI, we strongly reject null hypothesis H0 in favor of Ha
# Compare high and low dose effects for H0 vs Ha
t.test(highDose_OJ , lowDose_OJ , alternative="two.sided", conf.level=0.95,var.equal=FALSE)$p.value
## [1] 1.323784e-06
t.test(highDose_OJ , lowDose_OJ , alternative="two.sided", conf.level=0.95,var.equal=FALSE)$conf.int[1:2]
## [1] 9.324759 16.335241
Based on p-value and CI, we strongly reject null hypothesis H0 in favor of Ha
# Compare high and low dose effects for H0 vs Ha
t.test(highDose_OJ , medDose_OJ , alternative="two.sided", conf.level=0.95,var.equal=FALSE)$p.value
## [1] 0.03919514
t.test(highDose_OJ , medDose_OJ , alternative="two.sided", conf.level=0.95,var.equal=FALSE)$conf.int[1:2]
## [1] 0.1885575 6.5314425
The p-value (0.04) < alpha even if it is barely lower than alpha. The CI does not include 0 value. So we reject null hypothesis H0, even if it is not a strong endorsement of the alternate hypothesis Ha.
Let us evaluate two-sides t-tests for supp=VC at various doses separately choosing two doses at a time for a possible three combinations.
# Compare medium and low dose effects for H0 vs Ha
t.test(medDose_VC , lowDose_VC , alternative="two.sided", conf.level=0.95,var.equal=FALSE)$p.value
## [1] 6.811018e-07
t.test(medDose_VC , lowDose_VC , alternative="two.sided", conf.level=0.95,var.equal=FALSE)$conf.int[1:2]
## [1] 6.314288 11.265712
Based on p-value and CI, we strongly eject null hypothesis H0 in favor of Ha.
# Compare high and low dose effects for H0 vs Ha
t.test(highDose_VC , lowDose_VC , alternative="two.sided", conf.level=0.95,var.equal=FALSE)$p.value
## [1] 4.681577e-08
t.test(highDose_VC , lowDose_VC , alternative="two.sided", conf.level=0.95,var.equal=FALSE)$conf.int[1:2]
## [1] 14.41849 21.90151
Based on p-value and CI, we strongly reject null hypothesis H0
# Compare high and medium dose effects for H0 vs Ha
t.test(highDose_VC , medDose_VC , alternative="two.sided", conf.level=0.95,var.equal=FALSE)$p.value
## [1] 9.155603e-05
t.test(highDose_VC , medDose_VC , alternative="two.sided", conf.level=0.95,var.equal=FALSE)$conf.int[1:2]
## [1] 5.685733 13.054267
Based on p-value and CI, we definitively reject null hypothesis H0.
Conclusion: For VC, we reject the null hypothesis that the doses have the same mean effect in favor of the alternate hypothesis that the dose effects differ for VC
# dose = 0.5 (low). Compare OJ vs VC at low dose
t.test(lowDose_OJ , lowDose_VC , alternative="two.sided", conf.level=0.95,var.equal=FALSE)$p.value
## [1] 0.006358607
t.test(lowDose_OJ , lowDose_VC , alternative="two.sided", conf.level=0.95,var.equal=FALSE)$conf.int[1:2]
## [1] 1.719057 8.780943
Based on p-value and CI, we reject H0 in favor of Ha
# dose = 1.0 (medium). Compare OJ vs VC at medium dose
t.test(medDose_OJ , medDose_VC , alternative="two.sided", conf.level=0.95,var.equal=FALSE)$p.value
## [1] 0.001038376
t.test(medDose_OJ , medDose_VC , alternative="two.sided", conf.level=0.95,var.equal=FALSE)$conf.int[1:2]
## [1] 2.802148 9.057852
Based on p-value and CI, we reject H0 in favor of Ha
# dose = 2.0 (high)
t.test(highDose_OJ , highDose_VC , alternative="two.sided", conf.level=0.95,var.equal=FALSE)
##
## Welch Two Sample t-test
##
## data: highDose_OJ and highDose_VC
## t = -0.0461, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.79807 3.63807
## sample estimates:
## mean of x mean of y
## 26.06 26.14
Based on p-value (significantly higher than alpha), and CI includes 0, we have strong evidence for null hypothesis H0 at high dose.
Since the two-sided test indicated rejection of H0 in favor of Ha (for medium and high doses), we can further test the null hypothesis vs the alternate hypothesis that the mean supplement effect for OJ is actually greater than that for VC for low and medium doses.
We can do this by using the alternative of “greater” in t.test for group testing.
t.test(lowDose_OJ, lowDose_VC, alternative="greater", conf.level=0.95,var.equal=FALSE)
##
## Welch Two Sample t-test
##
## data: lowDose_OJ and lowDose_VC
## t = 3.1697, df = 14.969, p-value = 0.003179
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 2.34604 Inf
## sample estimates:
## mean of x mean of y
## 13.23 7.98
t.test(medDose_OJ, medDose_VC, alternative="greater", conf.level=0.95,var.equal=FALSE)
##
## Welch Two Sample t-test
##
## data: medDose_OJ and medDose_VC
## t = 4.0328, df = 15.358, p-value = 0.0005192
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 3.356158 Inf
## sample estimates:
## mean of x mean of y
## 22.70 16.77
Based on p-values and CI, we reject H0 in favor of Ha that the OJ has higher mean effects than VC for low and medium doses
There is sufficient evidence for the hypothesis that supplements have different mean effects. Higher sample size is needed to improve the significance of this result.
There is strong evidence for the alternate hypothesis that mean dose effects do have differences.
There is strong evidence for the alternate hypothesis that the mean supplement effects (OJ vs VC) do differ for low and medium doses.
There is strong evidence for the null hypothesis that the mean supplement effects (OJ vs VC) do not differ for high doses. Thus we reject the alternate hypothesis at high doses.
There is strong evidence for the hypothesis that OJ has actually higher mean effect than VC for low and medium doses