Exploratory Analysis of ToothGrowth Dataset and Hypothesis Testing

Overview:

This is a Statistical Inference Course Project report which attempts to do basic exploratory analysis of the ToothGrowth dataset. The report identifies some basic hypotheses to compare tooth growth between the candidate supplements at different doses. With the help of t-tests and permutation tests, the report proceeds to reject or accept the null hypotheses. At the outset, some basic conclusions are presented.

Assumptions:

The following is a list of assumptions for basic statistical inference:

Independent and identically distributed samples
The samples are drawn from approximately normal distribution or the sample statistic approach normal distribution as a result of CLT. Using t-tests improves the robustness to the normality assumption.
The underlying distributions are roughly symmetrical and there is no pronounced skew in the pdf or pmf
The samples are randomly drawn from the concerned populations for robust inference.
Unequal variances are assumed for group testing.

Limitations of the Dataset:

Two limitations are noted with the ToothLength dataset:

No randomization: There were only 10 Guinea pigs getting each of the six treatments (3 doses x 2 supplements) instead of the expected sample of 60 guinea pigs with batches of 10 samples which could have been assigned randomly to the six treatments.
There is no baseline for tooth growth without administering any supplements. This makes it hard to assess the overall effect of the supplements compared to normal tooth growth without any supplements, which would be very useful.

Basic Exploration of ToothGrowth Dataset:

Notation used in the report: OJ stands for Orange Juice, and VC stands for Ascorbic Acid (Vitamin C)

## Warning: package 'lattice' was built under R version 3.1.3

## Warning: package 'knitr' was built under R version 3.1.3

opts_chunk$set(echo=TRUE, cache=TRUE)

data(ToothGrowth)

ToothGrowth$dose <- as.factor(ToothGrowth$dose)

Here is a basic summary of the dataset:

summary(ToothGrowth)

##       len        supp     dose   
##  Min.   : 4.20   OJ:30   0.5:20  
##  1st Qu.:13.07   VC:30   1  :20  
##  Median :19.25           2  :20  
##  Mean   :18.81                   
##  3rd Qu.:25.27                   
##  Max.   :33.90

# Identify if there are any NA in the dataset
sum(any(is.na(ToothGrowth)))

## [1] 0

# There are no NA in the dataset

The following is a plot of the ToothGrowth dataset (courtesy: Help page for ToothGrowth dataset)

coplot(len ~ dose | supp, data=ToothGrowth, 
       panel=panel.smooth, 
       lwd=2, 
       col=c("darkblue"),
       xlab = "dose",
       ylab = "Tooth Length"
)

Figure 1: ToothGrowth data: length vs dose for each supplement

tg <- select(ToothGrowth,len,supp,dose)  

meanStats <- tapply(tg$len,list(tg$supp,tg$dose),mean)

meanStats

##      0.5     1     2
## OJ 13.23 22.70 26.06
## VC  7.98 16.77 26.14

It appears OJ has mean lengths that are significantly higher than VC at low and medium doses.

varStats <-  tapply(tg$len,list(tg$supp,tg$dose),var)

varStats

##       0.5         1         2
## OJ 19.889 15.295556  7.049333
## VC  7.544  6.326778 23.018222

It appears OJ has considerably higher variance in the lengths at low and medium doses. However, VC seems to higher variance than OJ at higher doses. Higher variance indicates higher variability in the underlying data. Based on the observed variances, it is not intuitive to deduce the overall efficacy of the supplements without systematic statistical inference.

Framing of test Hypotheses:

One can ask several questions that need answers for this dataset. Here is a list of some possible questions for this project. A good reference for a discussion on framing the questions is Interpreting Linear Models.

Q1. H0: There is no difference in the mean effects between supplements vs Ha: Supplements have different effects

Q2. H0: Doses have the same mean effect vs Ha: Dose effects differ

Q3. H0: The effect of supplement (OJ vs VC) is the same at all doses vs Ha: Mean Supplement effects (OJ vs VC) differ by dosage

Q4. H0: The effect of supplement between OJ and VC is not greater than 0 at low and medium doses vs Ha: The effect of supplement between OJ and VC is actually greater than 0 at low and medium doses

Analysis:

The following analysis attempts to find answers to set of questios identified previously using only the hypothesis testing methods taught so far in this course on Statistical Inference (Ref: Statistical Inference for Data Science by Brian Caffo).

Q1. H0: There is no difference in the mean effects between Supplements vs Ha: Supplements have different effects

This proposition is also equivalent to testing whether supplement labels are irrelevant vs labels are relevant. Let us apply two-sided t-test using the differences between the groups: OJ and VC ignoring any dose effects.

vclen <- filter(tg,supp=="VC") %>% select(len)
ojlen <- filter(tg,supp=="OJ") %>% select(len)

# significance level
alpha = 0.05

t.test(ojlen, vclen, alternative="two.sided", conf.level=0.95, var.equal=FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  ojlen and vclen
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean of x mean of y 
##  20.66333  16.96333

# Compute the sample mean (m) for the specific case of interest (sample size = ns = 40)

The p-value is 0.06039 which is greater than alpha. Also the confdence interval (CI) is [-0.1710156, 7.5710156] which includes 0 value which is the value suggested by null hypothesis. The p-value is higher than alpha, but the p-value by itself is so low that it does not provide a ringing endorsement of the null hypothesis. It is a feeble but positive case for keeping the null hypothesis at 95% CI. Let us look at the attained significance level, which is defined as the smallest value of alpha that will result in rejecting the null hypothesis is 0.07. At 93% CI, alpha is 0.07, and p-value is lower than alpha. So we can lean on rejecting the null hypothesis at 93% CI.

To confirm this decision, let us apply a second method such as Permutation test for the same proposition (a null hypothesis that the labels are irrelevant vs labels are relevant). Let us choose 10000 permutations for sampling the tooth lengths with replacement. The supp labels are shuffled between OJ and VC based on a random selection of the supp labels in each simulation.

tlen <- tg$len
supp <- as.character(tg$supp)
testStat <- function(w, g) mean(w[g == "OJ"]) - mean(w[g == "VC"])
observedStat <- testStat(tlen, supp)
permutations <- sapply(1:10000, function(i) testStat(tlen, sample(supp)))

cat("The test statistic is: ", observedStat, "\n")

## The test statistic is:  3.7

Let us calculate the p-value, which is defined as the probability of observing data as or more extreme in favor of the alternative than was actually obtained, where the probability is calculated assuming that the null hypothesis is true.

p-value = The proportion of times we got a simulated statistic larger than our observed statistic is:

mean(permutations > observedStat)

## [1] 0.0305

Let us plot the histogram of the permuted tooth length dataset. It is evident that the distribution fits a normal distribution. The line (in purple color) in the figure denotes the value of the test statistic.

histogram(permutations,
          nint=30,
          type="density",
          xlab="Tooth length",
          main="Figure 2: Histogram of Permuted tooth lengths",
          panel=function(x, ...) {
              panel.abline(v=observedStat, col.line="purple",lwd=4)
              panel.histogram(x, ...)
              panel.mathdensity(
                  dmath=dnorm, 
                  col="orange",
                  lwd=c(2),
                  args=list(mean=mean(x),sd=sd(x))
              )
              panel.grid(h=10, v=10)
          }     
)

cat("The p-value ",mean(permutations > observedStat), " is less than alpha=0.05 even if the difference is not significant.","\n")

## The p-value  0.0305  is less than alpha=0.05 even if the difference is not significant.

At the outset based on t-test and Permutation test, it appears reasonable to reject the null hypothesis.

Conclusion: We reject the null hypothesis (H0) that there is no difference between the mean upplement effects with CI between 93% and 95%, in favor of the alternate hypothesis (Ha) that there is difference between the mean supplement effects.

Q2. H0: Doses have the same mean effect vs Ha: Dose effects differ

We compare the sample sets for individual doses pooling together the supplements. Since we have three doses, we have three tests for comparison 1.0 vs 0.5, 2.0 vs 1.0, and 2.0 vs 0.5. In this analysis, doses of 0.5 mg, 1.0 mg, and 2.0 mg are designated as low, medium, and high. Let us apply t-tests using the differences between doses but ignoring the supplement labels. Since there are three possible combinations arising of three choosing two at a time, this will result in three different t-tests, as shown below:

lowDose <- filter(tg, dose==0.5) %>% select(len)
medDose <- filter(tg, dose==1.0) %>% select(len)
highDose <- filter(tg, dose==2.0) %>% select(len)

# p-values and confidence intervals
# (medium - low) dose effects
t.test(medDose, lowDose, alternative="two.sided", conf.level=0.95,var.equal=FALSE)$p.value

## [1] 1.268301e-07

t.test(medDose, lowDose, alternative="two.sided", conf.level=0.95,var.equal=FALSE)$conf.int[1:2]

## [1]  6.276219 11.983781

# (high - medium) dose effects
t.test(highDose, medDose, alternative="two.sided", conf.level=0.95,var.equal=FALSE)$p.value

## [1] 1.90643e-05

t.test(highDose, medDose, alternative="two.sided", conf.level=0.95,var.equal=FALSE)$conf.int[1:2]

## [1] 3.733519 8.996481

# (high - medium) dose effects
t.test(highDose, lowDose, alternative="two.sided", conf.level=0.95,var.equal=FALSE)$p.value

## [1] 4.397525e-14

t.test(highDose, lowDose, alternative="two.sided", conf.level=0.95,var.equal=FALSE)$conf.int[1:2]

## [1] 12.83383 18.15617

All the returned p-values are several orders of magnitude lower than alpha (=0.05) approaching zero, and all the confidence intervals exclude 0. So we strongly reject the null hypothesis. We accept the alternate hypothesis that the dose effects actually have different effects.

Now let us differentiate the dose related samples based on supplement type and individual dose

Let us compile lengths for each supplement and each dose into separate data frames to cover the possible combinations for dose and supplement.

# OJ dose datasets for low, medium, and high dose effects
lowDose_OJ <- filter(tg, dose==0.5 & supp=="OJ") %>% select(len)
medDose_OJ <- filter(tg, dose==1.0 & supp=="OJ") %>% select(len)
highDose_OJ <- filter(tg, dose==2.0 & supp=="OJ") %>% select(len)

# VC dose datasets for low, medium, and high dose effects
lowDose_VC <- filter(tg, dose==0.5 & supp=="VC") %>% select(len)
medDose_VC <- filter(tg, dose==1.0 & supp=="VC") %>% select(len)
highDose_VC <- filter(tg, dose==2.0 & supp=="VC") %>% select(len)

Let us evaluate two-sides t-tests for supp=OJ at various doses separately choosing two doses at a time for a possible three combinations.

# Compare medium and low dose effects for H0 vs Ha
t.test(medDose_OJ , lowDose_OJ , alternative="two.sided", conf.level=0.95,var.equal=FALSE)$p.value

## [1] 8.784919e-05

t.test(medDose_OJ , lowDose_OJ , alternative="two.sided", conf.level=0.95,var.equal=FALSE)$conf.int[1:2]

## [1]  5.524366 13.415634

Based on p-value and CI, we strongly reject null hypothesis H0 in favor of Ha

# Compare high and low dose effects for H0 vs Ha
t.test(highDose_OJ , lowDose_OJ , alternative="two.sided", conf.level=0.95,var.equal=FALSE)$p.value

## [1] 1.323784e-06

t.test(highDose_OJ , lowDose_OJ , alternative="two.sided", conf.level=0.95,var.equal=FALSE)$conf.int[1:2]

## [1]  9.324759 16.335241

Based on p-value and CI, we strongly reject null hypothesis H0 in favor of Ha

# Compare high and low dose effects for H0 vs Ha
t.test(highDose_OJ , medDose_OJ , alternative="two.sided", conf.level=0.95,var.equal=FALSE)$p.value

## [1] 0.03919514

t.test(highDose_OJ , medDose_OJ , alternative="two.sided", conf.level=0.95,var.equal=FALSE)$conf.int[1:2]

## [1] 0.1885575 6.5314425

The p-value (0.04) < alpha even if it is barely lower than alpha. The CI does not include 0 value. So we reject null hypothesis H0, even if it is not a strong endorsement of the alternate hypothesis Ha.

Conclusion: For OJ, we reject the null hypothesis that the doses have the same mean effect in favor of the alternate hypothesis that the dose effects do differ for OJ.

Let us evaluate two-sides t-tests for supp=VC at various doses separately choosing two doses at a time for a possible three combinations.

# Compare medium and low dose effects for H0 vs Ha
t.test(medDose_VC , lowDose_VC , alternative="two.sided", conf.level=0.95,var.equal=FALSE)$p.value

## [1] 6.811018e-07

t.test(medDose_VC , lowDose_VC , alternative="two.sided", conf.level=0.95,var.equal=FALSE)$conf.int[1:2]

## [1]  6.314288 11.265712

Based on p-value and CI, we strongly eject null hypothesis H0 in favor of Ha.

# Compare high and low dose effects for H0 vs Ha
t.test(highDose_VC , lowDose_VC , alternative="two.sided", conf.level=0.95,var.equal=FALSE)$p.value

## [1] 4.681577e-08

t.test(highDose_VC , lowDose_VC , alternative="two.sided", conf.level=0.95,var.equal=FALSE)$conf.int[1:2]

## [1] 14.41849 21.90151

Based on p-value and CI, we strongly reject null hypothesis H0

# Compare high and medium dose effects for H0 vs Ha
t.test(highDose_VC , medDose_VC , alternative="two.sided", conf.level=0.95,var.equal=FALSE)$p.value

## [1] 9.155603e-05

t.test(highDose_VC , medDose_VC , alternative="two.sided", conf.level=0.95,var.equal=FALSE)$conf.int[1:2]

## [1]  5.685733 13.054267

Based on p-value and CI, we definitively reject null hypothesis H0.

Conclusion: For VC, we reject the null hypothesis that the doses have the same mean effect in favor of the alternate hypothesis that the dose effects differ for VC

Overall conclusion: Reject the hypothesis that the doses have the same mean effect with 95% CI in favor of the alternate hypothesis that the dose effects differ

Q3. H0: The effect of supplement (OJ vs VC) is the same at all doses vs Ha: Mean Supplement effects (OJ vs VC) differ by dosage

Let us evaluate two-sides t-tests for dose=low between supp groups of OJ and VC.

# dose = 0.5 (low). Compare OJ vs VC at low dose
t.test(lowDose_OJ , lowDose_VC , alternative="two.sided", conf.level=0.95,var.equal=FALSE)$p.value

## [1] 0.006358607

t.test(lowDose_OJ , lowDose_VC , alternative="two.sided", conf.level=0.95,var.equal=FALSE)$conf.int[1:2]

## [1] 1.719057 8.780943

Based on p-value and CI, we reject H0 in favor of Ha

Let us evaluate two-sides t-tests for dose=medium between supp groups of OJ and VC.

# dose = 1.0 (medium). Compare OJ vs VC at medium dose
t.test(medDose_OJ , medDose_VC , alternative="two.sided", conf.level=0.95,var.equal=FALSE)$p.value

## [1] 0.001038376

t.test(medDose_OJ , medDose_VC , alternative="two.sided", conf.level=0.95,var.equal=FALSE)$conf.int[1:2]

## [1] 2.802148 9.057852

Based on p-value and CI, we reject H0 in favor of Ha

Let us evaluate two-sides t-tests for dose=high between supp groups of OJ and VC.

# dose = 2.0 (high)
t.test(highDose_OJ , highDose_VC , alternative="two.sided", conf.level=0.95,var.equal=FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  highDose_OJ and highDose_VC
## t = -0.0461, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.79807  3.63807
## sample estimates:
## mean of x mean of y 
##     26.06     26.14

Based on p-value (significantly higher than alpha), and CI includes 0, we have strong evidence for null hypothesis H0 at high dose.

Overall conclusion: There is sufficient evidence for the alternate hypothesis that the mean supplement effects (OJ vs VC) do differ for low and medium doses. But there is insufficient evidence for the alternate hypothesis that the mean supplement effects (OJ vs VC) differ for high dose.

Q4. H0: The effect of supplement between OJ and VC is not greater than 0 at low and medium doses vs Ha: the effect of supplement between OJ and VC is actually greater than 0 for low and medium doses.

Since the two-sided test indicated rejection of H0 in favor of Ha (for medium and high doses), we can further test the null hypothesis vs the alternate hypothesis that the mean supplement effect for OJ is actually greater than that for VC for low and medium doses.

We can do this by using the alternative of “greater” in t.test for group testing.

t.test(lowDose_OJ, lowDose_VC, alternative="greater", conf.level=0.95,var.equal=FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  lowDose_OJ and lowDose_VC
## t = 3.1697, df = 14.969, p-value = 0.003179
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  2.34604     Inf
## sample estimates:
## mean of x mean of y 
##     13.23      7.98

t.test(medDose_OJ, medDose_VC, alternative="greater", conf.level=0.95,var.equal=FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  medDose_OJ and medDose_VC
## t = 4.0328, df = 15.358, p-value = 0.0005192
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  3.356158      Inf
## sample estimates:
## mean of x mean of y 
##     22.70     16.77

Based on p-values and CI, we reject H0 in favor of Ha that the OJ has higher mean effects than VC for low and medium doses

Overall study conclusions:

There is sufficient evidence for the hypothesis that supplements have different mean effects. Higher sample size is needed to improve the significance of this result.
There is strong evidence for the alternate hypothesis that mean dose effects do have differences.
There is strong evidence for the alternate hypothesis that the mean supplement effects (OJ vs VC) do differ for low and medium doses.
There is strong evidence for the null hypothesis that the mean supplement effects (OJ vs VC) do not differ for high doses. Thus we reject the alternate hypothesis at high doses.
There is strong evidence for the hypothesis that OJ has actually higher mean effect than VC for low and medium doses