Statistical Inference part 2

# PART 2
library(ggplot2)  # plotting
library(dplyr)    # filter(), arrange(), slice(), select()
tooth <- ToothGrowth

This dataset contains information on:

The Effect of Vitamin C on Tooth Growth in Guinea Pigs

Vitamin C was administered as either orange juice (OJ) or ascorbic acid (VC).

g <- ggplot(tooth, aes(x = supp, y = len, fill = supp))
g + geom_boxplot() + labs(title = "Supplement effect on tooth length")

tooth$dose <- as.factor(tooth$dose)
h <- ggplot(tooth, aes(x = dose, y = len, fill = dose))
h + geom_boxplot() + labs(title = "Dose effect on tooth length")

The preliminary box plots suggest that VC has a weaker effect on tooth length than OJ. Also, unsurprisingly, tooth length increases with dose.
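The visual impression can be checked against the group means. The following is a small sketch using base R's aggregate(); the names tg and group_means are illustrative and not part of the original analysis.

```r
# Mean tooth length for every supplement/dose combination (base R only).
tg <- ToothGrowth
group_means <- aggregate(len ~ supp + dose, data = tg, FUN = mean)
group_means

# Overall mean per supplement, for comparison with the first box plot.
supp_means <- tapply(tg$len, tg$supp, mean)
supp_means
```

Six rows are expected, one per supplement and dose level.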

To confirm this statistically we do the following tests:

group <- as.character(tooth$supp)
testStat <- function(w, g) mean(w[g == "OJ"]) - mean(w[g == "VC"])
observedStat <- testStat(tooth$len, group)
observedStat

## [1] 3.7

The observed difference in mean length between the groups is 3.7, with OJ having a larger effect on length than VC.

permutations <- sapply(1:1000, function(i) testStat(tooth$len, sample(group)))
mean(permutations > observedStat)

## [1] 0.035

Of the 1000 permuted datasets, 3.5% produced a difference in means larger than the observed 3.7, which gives a one-sided p-value of 0.035.

So we reject the null hypothesis of no supplement effect at an alpha of 0.05, but not at an alpha of 0.01.
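The permutation p-value depends on the random draws and, as computed here, is one-sided. Below is a minimal sketch of a reproducible variant that also reports a two-sided p-value. The seed, the 10,000 permutations, and the names tg, grp, diff_means, obs and perm are illustrative assumptions, not part of the original analysis.

```r
set.seed(2019)  # arbitrary seed, chosen only so the run can be repeated

tg  <- ToothGrowth
grp <- as.character(tg$supp)

# Difference in mean length, OJ minus VC.
diff_means <- function(w, g) mean(w[g == "OJ"]) - mean(w[g == "VC"])
obs <- diff_means(tg$len, grp)

# Shuffle the supplement labels and recompute the statistic each time.
perm <- replicate(10000, diff_means(tg$len, sample(grp)))

# One-sided: share of permuted differences at least as large as observed.
p_one_sided <- mean(perm >= obs)
# Two-sided: share at least as extreme in either direction.
p_two_sided <- mean(abs(perm) >= abs(obs))
c(one_sided = p_one_sided, two_sided = p_two_sided)
```

The two-sided value can never be smaller than the one-sided one here, so the conclusion at an alpha of 0.05 may change depending on which alternative hypothesis is intended.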

ggplot() + aes(permutations) + geom_histogram(binwidth = 1, color = "lightblue") +
        # constant color belongs outside aes(); the line marks the observed difference
        geom_vline(xintercept = observedStat, color = "red") +
        labs(title = "Permutation Distribution")

The histogram shows that most permuted differences fall below the observed difference of 3.7.

So, at first glance, a true mean difference of 0 looks relatively unlikely.

Next let’s build the confidence interval and p-value for the supplement comparison.

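# Rows 1:30 are the VC group and rows 31:60 the OJ group
g1 <- tooth$len[1:30]; g2 <- tooth$len[31:60]
# Row-wise OJ minus VC differences
difference <- g2 - g1
# With a single vector this is a one-sample t-test; paired = FALSE has no effect
t.test(difference, paired = FALSE)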

## 
##  One Sample t-test
## 
## data:  difference
## t = 3.3026, df = 29, p-value = 0.00255
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  1.408659 5.991341
## sample estimates:
## mean of x 
##       3.7

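This gives a 95% confidence interval of 1.41 to 5.99 for the mean difference (OJ minus VC) and a very low p-value of 0.00255, which suggests that OJ has a more positive effect on length than VC. Note that this is a one-sample t-test on row-wise differences, so it treats the two groups as if they were paired.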
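Because each guinea pig received only one supplement, the two groups can also be compared as independent samples. The sketch below shows that alternative using Welch's two-sample t-test. It is not the test used in this report, the names tg and welch are illustrative, and its interval and p-value need not match the one-sample result.

```r
tg <- ToothGrowth

# Welch two-sample t-test of length by supplement.
# t.test() uses var.equal = FALSE by default, so variances are not pooled.
welch <- t.test(len ~ supp, data = tg)

welch$estimate   # group means (OJ first, then VC)
welch$conf.int   # 95% CI for the difference in means, OJ minus VC
welch$p.value
```

The point estimate of the difference is the same in both approaches; what changes is the standard error and the degrees of freedom, and therefore the width of the interval.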

Now we do the same for dose, for each supplement separately.

h1oj <- tooth %>% 
  filter(supp == "OJ") %>% 
  arrange(dose) %>% 
  slice(1:10) %>% 
  select(len)
h2oj <- tooth %>% 
  filter(supp == "OJ") %>% 
  arrange(dose) %>% 
  slice(11:20) %>%
  select(len)
h3oj <- tooth %>%
  filter(supp == "OJ") %>% 
  arrange(dose) %>% 
  slice(21:30) %>% 
  select(len)
difference2oj <- h3oj-h2oj
t.test(difference2oj, paired = FALSE)

## 
##  One Sample t-test
## 
## data:  difference2oj
## t = 1.9435, df = 9, p-value = 0.08384
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -0.5509376  7.2709376
## sample estimates:
## mean of x 
##      3.36

difference3oj <- h2oj-h1oj
t.test(difference3oj, paired = FALSE)

## 
##  One Sample t-test
## 
## data:  difference3oj
## t = 4.1635, df = 9, p-value = 0.002435
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##   4.324616 14.615384
## sample estimates:
## mean of x 
##      9.47

For OJ, the p-value is noticeably higher when comparing a dose of 2 with a dose of 1 (0.084). The confidence interval includes 0 (its lower bound is negative), so the difference is significant only at the 10% level and we cannot rule out no effect at the 5% level.

For doses of 1 versus 0.5, by contrast, the effect is clearly positive, with a very low p-value (0.0024) and a confidence interval well above 0.
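The link between the t statistic, the p-value and the confidence interval can be made explicit by recomputing them by hand. The following sketch uses only the numbers printed in the OJ dose 2 versus dose 1 output (mean 3.36, t = 1.9435, df = 9); the variable names are illustrative.

```r
# Values copied from the t.test() output for difference2oj.
mean_diff <- 3.36
t_stat    <- 1.9435
df        <- 9

# t = mean / standard error, so the standard error can be recovered.
se <- mean_diff / t_stat

# 95% CI: mean +/- critical t value * standard error.
half_width <- qt(0.975, df) * se
ci <- mean_diff + c(-1, 1) * half_width

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), df)

ci
p_value
```

The interval crosses 0 exactly when the absolute t statistic is smaller than the critical value qt(0.975, df), which is the same condition as a two-sided p-value above 0.05.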

h1vc <- tooth %>% 
  filter(supp == "VC") %>% 
  arrange(dose) %>% 
  slice(1:10) %>% 
  select(len)
h2vc <- tooth %>% 
  filter(supp == "VC") %>% 
  arrange(dose) %>% 
  slice(11:20) %>%
  select(len)
h3vc <- tooth %>% 
  filter(supp == "VC") %>% 
  arrange(dose) %>% 
  slice(21:30) %>% 
  select(len)
difference2vc <- h3vc-h2vc
t.test(difference2vc, paired = FALSE)

## 
##  One Sample t-test
## 
## data:  difference2vc
## t = 5.346, df = 9, p-value = 0.0004648
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##   5.405082 13.334918
## sample estimates:
## mean of x 
##      9.37

difference3vc <- h2vc-h1vc
t.test(difference3vc, paired = FALSE)

## 
##  One Sample t-test
## 
## data:  difference3vc
## t = 6.1364, df = 9, p-value = 0.0001715
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##   5.549601 12.030399
## sample estimates:
## mean of x 
##      8.79

For VC, increasing the dose has a highly statistically significant positive effect on length: both confidence intervals lie well above 0 and both p-values are extremely low. The null hypothesis (H0) that the mean difference is 0 is easily rejected.
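The four dose comparisons repeat the same steps, so they can be collected in one table. The sketch below is one possible way to do that in base R. It uses Welch two-sample tests rather than the one-sample tests on differences shown above, so its intervals and p-values need not match them; tg, dose_pairs and results are illustrative names.

```r
tg <- ToothGrowth  # dose is numeric here (0.5, 1, 2)

# Adjacent dose levels to compare: lower dose first, higher dose second.
dose_pairs <- list(c(0.5, 1), c(1, 2))

results <- do.call(rbind, lapply(levels(tg$supp), function(s) {
  do.call(rbind, lapply(dose_pairs, function(d) {
    lo <- tg$len[tg$supp == s & tg$dose == d[1]]
    hi <- tg$len[tg$supp == s & tg$dose == d[2]]
    tt <- t.test(hi, lo)  # Welch test, higher dose minus lower dose
    data.frame(supp      = s,
               from      = d[1],
               to        = d[2],
               mean_diff = mean(hi) - mean(lo),
               ci_low    = tt$conf.int[1],
               ci_high   = tt$conf.int[2],
               p_value   = tt$p.value)
  }))
}))
results
```

Each row reports the mean increase in length when moving from the lower to the higher dose within one supplement.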

JRP

12/9/2019