Statistical Inference Project : Part 2. Perform Basic Inferential Data Analysis on ToothGrowth Data in R datasets (By : Narendra Shukla)

What is this Dataset ?

The ToothGrowth data consists of 60 observations of 3 variables. The response is the length of odontoblasts (cells responsible for tooth growth), in microns, in 60 guinea pigs. Each animal received one of three dose levels of Vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods (orange juice, coded OJ, or ascorbic acid, coded VC). The variables are len, the length of odontoblasts (microns); supp, the supplement type (OJ or VC); and dose, the Vitamin C dosage (milligrams/day).

Loading Data, Providing Basic Summary, Performing Some Basic Exploratory Data Analysis

Refer to Appendix section A to review Loading of Data & Gathering Basic Summary of the data.

We now summarize the data by dose and supp. Please refer to Appendix section B.

Let’s see the variation in tooth growth by supplement type. Code is available in Appendix Section C :

Figure 1 :

Based on Figure 1, we draw the following conclusions :

The OJ data appears to be skewed to the left
The VC data appears to be skewed to the right
The median of OJ appears to be higher than the median of VC

We know that three doses were given, i.e. 0.5, 1, and 2 mg. Let’s see the tooth growth by supp and dose.

Figure 2 :

Based on Figure 2, we draw the following conclusions :

At the (0.5) and (1.0) milligram doses, tooth growth with OJ appears to be greater than tooth growth with VC
At the (2.0) milligram dose, tooth growth with VC appears to have larger variance than with OJ
Tooth growth appears to increase from the (0.5) to the (1.0) milligram dose, and again from the (1.0) to the (2.0) milligram dose
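The Appendix does not list code for Figure 2. The following is a minimal sketch of how such a plot could be drawn, assuming a box plot of len by supp, faceted by dose. The plot type, the object name g2 and the labels are assumptions and may differ from the original figure.

```r
library(ggplot2)

data(ToothGrowth)

# Box plot of tooth growth by supplement type, one panel per dose level
g2 <- ggplot(ToothGrowth, aes(x = supp, y = len, fill = supp)) +
     geom_boxplot() +
     facet_grid(. ~ dose) +
     labs(x = "Supplement Type") +
     labs(y = "Tooth Growth (microns)") +
     labs(title = "Tooth Growth By Supplement Type and Dose")
print(g2)
```

A box plot shows the median and spread of each of the six groups side by side, which is what the conclusions above compare.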

Running Hypothesis Tests to compare tooth growth by supp and dose

We shall conduct 3 different t-tests. For each of the t-tests, our assumptions are,

Observations are independent of each other
Observations come from a nearly normal distribution
Sample size is small, so we use the t distribution
Since each guinea pig is different from the others, we shall NOT do paired testing, i.e. we shall do independent group testing
We shall assume the variances of the 2 groups to be equal
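The normality and equal-variance assumptions can be examined informally before testing. The sketch below is not part of the original analysis. It assumes a Shapiro-Wilk test within each supplement group and an F test of the two group variances are acceptable checks, and the names normality and variance_check are illustrative.

```r
data(ToothGrowth)

# Shapiro-Wilk p-value for each supplement group;
# a small p-value would cast doubt on near-normality
normality <- tapply(ToothGrowth$len, ToothGrowth$supp,
                    function(x) shapiro.test(x)$p.value)
print(normality)

# F test comparing the variances of the OJ and VC groups;
# a small p-value would cast doubt on the equal-variance assumption
variance_check <- var.test(len ~ supp, data = ToothGrowth)
print(variance_check$p.value)
```

With only 30 observations per group these checks have limited power, so they are best read alongside the plots rather than as proof.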

Test 1 : Compare MEAN Tooth-Growth between Supplement Type

Ho -> Mu(OJ) - Mu(VC) = 0

H1 -> Mu(OJ) - Mu(VC) != 0

Refer to Appendix Section D for the confidence interval and its interpretation.

Now we execute the hypothesis test. The full output is available in Appendix Section D :

t.test(len ~ supp, var.equal = TRUE, data = ToothGrowth)

There are 2 ways to verify the results,

One, we reject the null hypothesis if our test statistic is larger than qt(.975, 58) or smaller than qt(.025, 58), where 58 is the degrees of freedom (DF). That is, we would reject the null hypothesis if our t-test statistic were greater than 2.001717 or lower than -2.001717. Our t-test statistic is 1.9153, which lies inside that range. Hence, we fail to reject the null hypothesis.

Two, check whether the p-value (0.06039) is less than alpha, the significance level (0.05). It is not, so again we fail to reject the null hypothesis.
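Both checks can be reproduced from the object that t.test returns. This is a minimal sketch. The variable names (res, t_stat, t_crit and so on) are illustrative, and alpha = 0.05 is the significance level used in this report.

```r
data(ToothGrowth)

alpha <- 0.05
res <- t.test(len ~ supp, var.equal = TRUE, data = ToothGrowth)

t_stat <- unname(res$statistic)   # observed t statistic
df     <- unname(res$parameter)   # degrees of freedom
t_crit <- qt(1 - alpha / 2, df)   # two-sided critical value

# Check 1: is the statistic outside [-t_crit, t_crit]?
reject_by_statistic <- abs(t_stat) > t_crit

# Check 2: is the p-value below alpha?
reject_by_p_value <- res$p.value < alpha

print(c(t_stat = t_stat, t_crit = t_crit, p_value = res$p.value))
print(c(reject_by_statistic, reject_by_p_value))
```

The two checks are equivalent for a given test, so they always agree.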

Conclusion : Therefore, we fail to reject the null hypothesis that the true population difference in means between OJ and VC is equal to 0.

Test 2 : Compare MEAN Tooth-Growth between (0.5) & (1.0) milligram Doses

Ho -> Mu(oneDose) - Mu(pointFiveDose) = 0

H1 -> Mu(oneDose) - Mu(pointFiveDose) > 0 (One-Tail Test)

Let’s split ToothGrowth Data, by dose. Code is available in Appendix Section E.

Let’s run the t-test. Test is available in Appendix Section F :

t.test(onelen, pointfivelen, alternative="greater", paired=FALSE, var.equal=TRUE)

Conclusion : Since the p-value (6.331e-08) is much less than alpha (0.05), we reject the null hypothesis. We conclude that the true population difference between the mean of the (1.0) milligram dose and the mean of the (0.5) milligram dose is greater than 0, i.e. the (1.0) milligram dose is more effective than the (0.5) milligram dose in terms of tooth growth in the population.

Test 3 : Compare MEAN Tooth-Growth between (2.0) mg Doses of VC & OJ

Here’s our test,

Ho -> Mu(twoVC) - Mu(twoOJ) = 0

H1 -> Mu(twoVC) - Mu(twoOJ) > 0 (One-Tail Test)

Here’s the command to execute this test. Please refer to Appendix Section G for details.

t.test(twoVC, twoOJ, alternative="greater", paired=FALSE, var.equal=TRUE)

Conclusion : Here the p-value (0.4819) is greater than alpha (0.05). Therefore, we fail to reject the null hypothesis that, at the (2.0) milligram dose, the true population difference in mean tooth growth between VC and OJ is equal to 0.

Appendix :

Section A : Here’s Data Loading & Gathering Basic Summary of the Data,

Load Data,

data(ToothGrowth)

summary(ToothGrowth)

##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

Section B : Summarize the data by dose and supp,

library(dplyr)

ToothGrowth %>% group_by(dose,supp) %>% 
    summarize(countlen = length(len), minlen = min(len),
              maxlen = max(len), avglen = mean(len))

## Source: local data frame [6 x 6]
## Groups: dose
## 
##   dose supp countlen minlen maxlen avglen
## 1  0.5   OJ       10    8.2   21.5  13.23
## 2  0.5   VC       10    4.2   11.5   7.98
## 3  1.0   OJ       10   14.5   27.3  22.70
## 4  1.0   VC       10   13.6   22.5  16.77
## 5  2.0   OJ       10   22.4   30.9  26.06
## 6  2.0   VC       10   18.5   33.9  26.14

Section C : Code for Figure 1,

library(ggplot2)
g11 <- ggplot(ToothGrowth, aes(x = len, fill=supp))  +
     geom_histogram(fill = "white", binwidth=1, aes(y=..density..), colour="black") +
     geom_density(size = 2, colour = "black",alpha=0.2, fill="blue") +
     facet_grid(. ~ supp) +
     labs(x="Tooth Growth (microns)") +
     labs(y="Density") +
     labs(title="Figure 1 : Histogram of Tooth Growth By Supplement Type")
print(g11)

Section D : Compare MEAN Tooth-Growth between Supplement Type,

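# Row 1 assumes equal variances, row 2 does not (Welch).
# Groups are independent, which is the t.test default, so 'paired' is not passed.
rbind(
 t.test(len ~ supp, var.equal = TRUE, data = ToothGrowth)$conf.int,
 t.test(len ~ supp, var.equal = FALSE, data = ToothGrowth)$conf.int)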

##            [,1]     [,2]
## [1,] -0.1670064 7.567006
## [2,] -0.1710156 7.571016

Consider the first row (equal variances), for example.

Based on this, we are 95% confident that the true population difference in means between OJ and VC is between -0.1670064 and 7.567006. Since the confidence interval contains 0, we do not expect to be able to reject the null hypothesis. Let’s confirm this with the test.
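For reference, the equal-variance interval can be rebuilt by hand from the pooled standard deviation. This is a sketch of the standard pooled two-sample formula. The variable names (oj, vc, sp, se, ci) are illustrative.

```r
data(ToothGrowth)

oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

n_oj <- length(oj)
n_vc <- length(vc)
df   <- n_oj + n_vc - 2

# Pooled standard deviation, weighting each group variance by its degrees of freedom
sp <- sqrt(((n_oj - 1) * var(oj) + (n_vc - 1) * var(vc)) / df)

# Standard error of the difference in means
se <- sp * sqrt(1 / n_oj + 1 / n_vc)

# 95% interval: difference in means plus or minus critical value times standard error
ci <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, df) * se
print(ci)
```

The result should match the first row of the table above.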

Now, we execute the t-test,

t.test(len ~ supp, var.equal = TRUE, data = ToothGrowth)

## 
##  Two Sample t-test
## 
## data:  len by supp
## t = 1.9153, df = 58, p-value = 0.06039
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1670064  7.5670064
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333

Section E : Code to split ToothGrowth Data, by dose,

pointfive <- filter(ToothGrowth, dose==0.5)
one <- filter(ToothGrowth, dose==1)
two <- filter(ToothGrowth, dose==2)
pointfivelen <- pointfive$len
onelen <- one$len
twolen <- two$len

Section F : Compare MEAN Tooth-Growth between (0.5) & (1.0) milligram Doses,

t.test(onelen, pointfivelen, alternative="greater", paired=FALSE, var.equal=TRUE)

## 
##  Two Sample t-test
## 
## data:  onelen and pointfivelen
## t = 6.4766, df = 38, p-value = 6.331e-08
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  6.753344      Inf
## sample estimates:
## mean of x mean of y 
##    19.735    10.605

A similar test can be run to compare MEAN Tooth-Growth between (1.0) & (2.0) milligram Doses.
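A minimal sketch of that comparison follows, assuming the same one-tail, equal-variance setup as the test above. It rebuilds onelen and twolen with base subsetting so that it runs on its own, and the name res_1_vs_2 is illustrative.

```r
data(ToothGrowth)

# Tooth growth at the (1.0) and (2.0) milligram doses
onelen <- ToothGrowth$len[ToothGrowth$dose == 1]
twolen <- ToothGrowth$len[ToothGrowth$dose == 2]

# Ho -> Mu(twoDose) - Mu(oneDose) = 0
# H1 -> Mu(twoDose) - Mu(oneDose) > 0 (One-Tail Test)
res_1_vs_2 <- t.test(twolen, onelen, alternative="greater", paired=FALSE, var.equal=TRUE)
print(res_1_vs_2)
```

As before, the null hypothesis is rejected if the reported p-value is below 0.05.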

Section G : Test 3 : Compare MEAN Tooth-Growth between (2.0) mg Doses of VC & OJ,

Let’s split ToothGrowth Data, by dose and supp.

pointfiveVC <- filter(pointfive, supp=="VC")$len
pointfiveOJ <- filter(pointfive, supp=="OJ")$len

oneVC <- filter(one, supp=="VC")$len
oneOJ <- filter(one, supp=="OJ")$len

twoVC <- filter(two, supp=="VC")$len
twoOJ <- filter(two, supp=="OJ")$len

Here’s our test,

Ho -> Mu(twoVC) - Mu(twoOJ) = 0

H1 -> Mu(twoVC) - Mu(twoOJ) > 0 (One-Tail Test)

Here’s a command to execute this test.

t.test(twoVC, twoOJ, alternative="greater", paired=FALSE, var.equal=TRUE)

## 
##  Two Sample t-test
## 
## data:  twoVC and twoOJ
## t = 0.0461, df = 18, p-value = 0.4819
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -2.926866       Inf
## sample estimates:
## mean of x mean of y 
##     26.14     26.06

Similar tests can be run to Compare,

MEAN Tooth-Growth between (0.5) milligram Doses of OJ & VC
MEAN Tooth-Growth between (1.0) milligram Doses of OJ & VC
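These two comparisons could be sketched as below. The report does not state which alternative hypothesis it would use here, so a two-sided, equal-variance test is assumed. The names doses, results and p_values are illustrative.

```r
data(ToothGrowth)

doses <- c(0.5, 1)

# For each dose, compare mean tooth growth between OJ and VC (independent groups)
results <- lapply(doses, function(d) {
    at_dose <- ToothGrowth[ToothGrowth$dose == d, ]
    t.test(len ~ supp, var.equal = TRUE, data = at_dose)
})
names(results) <- paste0("dose_", doses)

# Collect the p-values to compare against alpha = 0.05
p_values <- sapply(results, function(r) r$p.value)
print(p_values)
```

Each comparison uses 10 observations per group, so each test has 18 degrees of freedom.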