Statistical Inference Assignment

Part 2

Analysing the ToothGrowth dataset

summary(ToothGrowth)

##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

dim(ToothGrowth)

## [1] 60  3

head(ToothGrowth)

##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5

After inspecting the dataset, we see that each observation has three variables:

len : the length of the odontoblast cells (the measure of tooth growth)
supp : the supplement used to deliver the dose; VC means ascorbic acid and OJ means orange juice
dose : the dose of vitamin C given (0.5, 1 or 2 mg/day)
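
As a quick check on the structure described above, the following sketch counts the observations in each supplement and dose combination. It uses only base R and the built-in ToothGrowth data; the name `counts` is introduced here purely for illustration.

```r
# Column types: len is numeric, supp is a factor, dose is numeric
str(ToothGrowth)

# Cross-tabulate supplement against dose to see the group sizes
counts <- with(ToothGrowth, table(supp, dose))
counts
```

Each cell of the table holds the number of observations available for one of the per-dose comparisons made later.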

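library(tidyverse)  # attaches ggplot2 (plots) and dplyr (filter)
library(ggplot2)
# Scatter plot of length against dose, one panel per supplement, with a smoothed trend
g <- ggplot(data = ToothGrowth) + geom_point(mapping = aes(x = dose, y = len))
g <- g + facet_grid(. ~ supp)
g <- g + geom_smooth(mapping = aes(x = dose, y = len))
g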

# Box plots of length by dose, one panel per supplement
g2 <- ggplot(ToothGrowth, aes(dose, len))
g2 <- g2 + geom_boxplot(mapping = aes(group = dose), col = "black", fill = "red")
g2 <- g2 + facet_grid(. ~ supp)
g2

Now we want to compare the two vitamin C delivery methods (OJ and VC) in terms of their effect on the cell length. Assuming tooth growth is a good thing, we want to maximize the effect of delivering a particular dose. The question we want to ask first is: which is the more effective delivery method? Since there are three dosages, we effectively have three datasets to compare.
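
Before testing, it can help to look at the group means behind the three comparisons. This is a minimal sketch in base R; the names `means` and `diff_oj_vc` are illustrative and not part of the original analysis.

```r
# Mean length for every supplement (rows) and dose (columns)
means <- with(ToothGrowth, tapply(len, list(supp, dose), mean))
means

# Difference in mean length, OJ minus VC, at each dose
diff_oj_vc <- means["OJ", ] - means["VC", ]
diff_oj_vc
```

A positive difference means OJ produced the larger mean length at that dose.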

Let’s start with the lowest dosage: 0.5 mg/day. We will set up a hypothesis test under the following conditions:

Null: both delivery methods are equally efficient
Alternative: there is a measurable difference between the two methods

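Since the sample size is relatively small (10 observations per delivery method), we will apply a two-sided t-test. The observations come from different subjects, so an unpaired test is the appropriate choice. Note, however, that the call below passes the element-wise difference of the two samples as a single vector, so R runs a one-sample t-test on those differences (as the output header shows) and the paired argument changes nothing. This is numerically equivalent to a paired test, and the results should be read with that in mind.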

low_vc <- filter(ToothGrowth, supp == "VC", dose == 0.5)$len
low_oj <- filter(ToothGrowth, supp == "OJ", dose == 0.5)$len
t.test(low_oj-low_vc,alternative = "two.sided", paired=F,conf.level = 0.95)

## 
##  One Sample t-test
## 
## data:  low_oj - low_vc
## t = 2.9791, df = 9, p-value = 0.01547
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  1.263458 9.236542
## sample estimates:
## mean of x 
##      5.25

The results show us that the mean difference between the lengths is 5.25 units in favor of the OJ method. The 95% confidence interval ranges from 1.26 to 9.24 units and excludes zero (p = 0.015), allowing us to reject the null hypothesis at the 5% level.
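
Because the call above receives a single vector of differences, R treats it as a one-sample test. As a cross-check, an unpaired comparison passes the two samples as separate arguments. The sketch below assumes Welch's test (unequal variances), which is the default for t.test in R; the samples are rebuilt with base R so the snippet runs on its own, and `welch_low` is an illustrative name.

```r
# Rebuild the dose = 0.5 samples without relying on dplyr
low_oj <- ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose == 0.5]
low_vc <- ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == 0.5]

# Two separate samples: R runs a Welch two-sample (unpaired) t-test
welch_low <- t.test(low_oj, low_vc,
                    alternative = "two.sided",
                    paired = FALSE,
                    var.equal = FALSE,
                    conf.level = 0.95)

welch_low$method    # which test was actually run
welch_low$estimate  # the two group means
welch_low$conf.int  # 95% CI for the difference in means
```

The difference between the two group means is the same 5.25 as above, but the degrees of freedom and the interval differ because the test no longer pairs observation i of one group with observation i of the other.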

For the lowest dosage, OJ is more effective than VC. Now checking the middle dosage (1 mg/day).

mid_oj <- filter(ToothGrowth, supp == "OJ", dose == 1)$len
mid_vc <- filter(ToothGrowth, supp == "VC", dose == 1)$len
t.test(mid_oj-mid_vc,alternative = "two.sided", paired=F,conf.level = 0.95)

## 
##  One Sample t-test
## 
## data:  mid_oj - mid_vc
## t = 3.3721, df = 9, p-value = 0.008229
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  1.951911 9.908089
## sample estimates:
## mean of x 
##      5.93

The results show us that the mean difference between the lengths is 5.93 units in favor of the OJ method. The 95% confidence interval ranges from 1.95 to 9.91 units and excludes zero (p = 0.008), allowing us to reject the null hypothesis at the 5% level.
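
The interval reported above can be reproduced by hand as the mean difference plus or minus the t critical value times the standard error. The sketch below does this for the dose of 1 mg/day using base R only; `d`, `se`, `t_crit` and `ci_manual` are illustrative names.

```r
# Rebuild the dose = 1 samples with base R so the snippet runs on its own
mid_oj <- ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose == 1]
mid_vc <- ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == 1]

d <- mid_oj - mid_vc             # element-wise differences, as in the call above
n <- length(d)
se <- sd(d) / sqrt(n)            # standard error of the mean difference
t_crit <- qt(0.975, df = n - 1)  # two-sided 95% critical value, 9 degrees of freedom

# 95% confidence interval: mean(d) -/+ t_crit * se
ci_manual <- mean(d) + c(-1, 1) * t_crit * se
ci_manual
```

Comparing `ci_manual` with the interval printed by t.test is a simple way to confirm which test the call actually performed.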

For the middle dosage, OJ is again more effective than VC.

Checking the high dosage (2 mg/day)

hig_oj <- filter(ToothGrowth, supp == "OJ", dose == 2)$len
hig_vc <- filter(ToothGrowth, supp == "VC", dose == 2)$len
t.test(hig_oj - hig_vc, alternative = "two.sided", paired = FALSE, conf.level = 0.95)

## 
##  One Sample t-test
## 
## data:  hig_oj - hig_vc
## t = -0.042592, df = 9, p-value = 0.967
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -4.328976  4.168976
## sample estimates:
## mean of x 
##     -0.08

In this case, the mean difference (-0.08) is close to zero. The 95% CI (-4.33 to 4.17) includes zero, so we fail to reject the null hypothesis (p = 0.967). There is no clear winner at this dosage.

Another important observation is the variance at the high dose:

c(var(hig_oj), var(hig_vc))

## [1]  7.049333 23.018222

This shows that the VC method yields results with greater variance at the high dose (23.02 versus 7.05).
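
Whether this gap in variance is more than sampling noise could be checked with an F-test. The following is a minimal sketch, assuming both samples are approximately normal (an assumption the analysis above does not verify); var.test comes from base R's stats package and `vt` is an illustrative name.

```r
# Rebuild the dose = 2 samples with base R so the snippet runs on its own
hig_oj <- ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose == 2]
hig_vc <- ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == 2]

# F-test of the null hypothesis that the ratio of the two variances is 1
vt <- var.test(hig_oj, hig_vc)

vt$estimate  # sample variance ratio, var(hig_oj) / var(hig_vc)
vt$conf.int  # 95% CI for the true variance ratio
vt$p.value
```

With only 10 observations per group, the F-test has limited power and is sensitive to departures from normality, so its result is best treated as indicative.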

Conclusion