Overview

We are going to run some inferential statistics on the ToothGrowth dataset. What is this dataset about?

The Effect of Vitamin C on Tooth Growth in Guinea Pigs

The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (a form of vitamin C, coded as VC). You can read the full description of the dataset with this command: ?ToothGrowth


Inference by dose

Let's start with some exploratory analysis of the dataset.

tg <- datasets::ToothGrowth
summary(tg)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
dim(tg)
## [1] 60  3

So now we know that this study was conducted on 60 guinea pigs divided into 6 groups, as follows:

group  dose  type
1      0.5   OJ
2      0.5   VC
3      1.0   OJ
4      1.0   VC
5      2.0   OJ
6      2.0   VC
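
The six groups can be confirmed directly from the data. A minimal sketch in base R that cross-tabulates dose against delivery method (the name counts is only illustrative, and a fresh copy of the dataset is loaded so the block stands alone):

```r
tg <- datasets::ToothGrowth

# Cross-tabulate dose level against delivery method;
# each cell is the number of animals in that group
counts <- table(dose = tg$dose, supp = tg$supp)
counts
```

The table has three rows (doses) and two columns (delivery methods), and every cell holds 10 animals, so the design is balanced.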

We are going to check the effect of both factors on tooth length.

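tg$supp <- factor(tg$supp)
tg$dose <- factor(tg$dose)   # treat dose as a categorical variable
library(ggplot2)
g <- ggplot(tg, aes(x = dose, y = len)) + geom_point(aes(color = dose))
g <- g + labs(y = 'length')
# connect the group means with a line
# (fun requires ggplot2 >= 3.3.0, linewidth requires ggplot2 >= 3.4.0)
g <- g + stat_summary(aes(group = 1), fun = mean,
               geom = "line", color = 'black', linewidth = 1.2)
g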

We can see that the length increases as the dose of vitamin C increases. But can we support this statistically? For this kind of study we can run Student's t-test on our samples. The null hypothesis is that the mean length is the same in the two groups being compared, and we want to see whether the data let us reject it. Before running t.test, note that the data are not paired, since each animal received only one dose. Also, since the variability does not differ much between the dose groups, I assume equal variances. Finally, since we have three dose groups, we run the test three times, once for each pair of doses.
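
The equal-variance assumption can be checked informally before testing. The sketch below compares the spread of len across the three dose groups. It is only an illustration: the names sd_by_dose and var_ratio are made up here, and looking at the variance ratio is a rough heuristic, not a formal test.

```r
tg <- datasets::ToothGrowth

# Sample standard deviation of tooth length within each dose group
sd_by_dose <- tapply(tg$len, tg$dose, sd)
sd_by_dose

# Ratio of the largest to the smallest group variance;
# a ratio close to 1 supports pooling the variances
var_ratio <- max(sd_by_dose)^2 / min(sd_by_dose)^2
var_ratio
```

If the ratio were large, the unequal-variance (Welch) form of the test, var.equal = FALSE, would be the safer choice.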

First we compare doses 0.5 and 2.0, the pair with the largest difference:

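tg05 <- tg[tg$dose == 0.5, "len"]   # lengths at dose 0.5
tg1  <- tg[tg$dose == 1.0, "len"]   # lengths at dose 1.0
tg2  <- tg[tg$dose == 2.0, "len"]   # lengths at dose 2.0

# unpaired two-sample t-test with pooled (equal) variances
t205 <- t.test(tg2, tg05, paired = FALSE, var.equal = TRUE)
t205$p.value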
## [1] 2.837553e-14

The p-value for the comparison of doses 2.0 and 0.5 is so small that we can reject the null hypothesis at almost any significance level \(\alpha\). Let's see the rest of the t.test output:

## 
##  Two Sample t-test
## 
## data:  tg2 and tg05
## t = 11.799, df = 38, p-value = 2.838e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  12.83648 18.15352
## sample estimates:
## mean of x mean of y 
##    26.100    10.605
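
To make the numbers in this output less of a black box, here is a sketch that recomputes the pooled-variance t statistic by hand for the same two groups. The variable names (x, y, sp2, t_stat, dof, p_val) are illustrative, and the block loads its own copy of the data so it can be run on its own.

```r
tg <- datasets::ToothGrowth
x <- tg$len[tg$dose == 2.0]   # dose 2.0 group
y <- tg$len[tg$dose == 0.5]   # dose 0.5 group
nx <- length(x)
ny <- length(y)

# Pooled variance: weighted average of the two sample variances
sp2 <- ((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2)

# t statistic and its degrees of freedom
t_stat <- (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))
dof <- nx + ny - 2

# Two-sided p-value from the t distribution
p_val <- 2 * pt(-abs(t_stat), dof)
c(t = t_stat, df = dof, p = p_val)
```

The values should agree with the t, df and p-value that t.test reports for tg2 and tg05.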

Now we compare doses 0.5 and 1.0, first checking the p-value:

## [1] 1.266297e-07

Again the p-value is so small that we reject the null hypothesis at any conventional significance level.

The full t.test output gives the details:

## 
##  Two Sample t-test
## 
## data:  tg1 and tg05
## t = 6.4766, df = 38, p-value = 1.266e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   6.276252 11.983748
## sample estimates:
## mean of x mean of y 
##    19.735    10.605

The 95 percent confidence interval above lies entirely above zero, which supports the conclusion that the dose 1.0 group has a greater mean length than the dose 0.5 group.

Finally, we compare dose 2.0 with dose 1.0.

First the p-value:

## [1] 1.810829e-05

Again the p-value is small, on the order of \(10^{-5}\).

To see the group means and the confidence interval, here is the rest of the output:

## 
##  Two Sample t-test
## 
## data:  tg2 and tg1
## t = 4.9005, df = 38, p-value = 1.811e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  3.735613 8.994387
## sample estimates:
## mean of x mean of y 
##    26.100    19.735

So we have now rejected, three times, the null hypothesis that the difference in mean length between two dose groups is zero.
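
Running three tests on the same data raises the chance of at least one false rejection. One common remedy, which the analysis above does not apply, is a Bonferroni correction. The sketch below is an illustration under that assumption: it recomputes the three pooled-variance p-values and adjusts them with p.adjust from the stats package. The names pairs, p_raw and p_bonf are made up for this example.

```r
tg <- datasets::ToothGrowth

# All three pairs of dose levels: (0.5, 1), (0.5, 2), (1, 2)
pairs <- combn(sort(unique(tg$dose)), 2)

# Pooled-variance t-test p-value for each pair, as in the tests above
p_raw <- apply(pairs, 2, function(d) {
  t.test(tg$len[tg$dose == d[2]], tg$len[tg$dose == d[1]],
         paired = FALSE, var.equal = TRUE)$p.value
})
names(p_raw) <- paste(pairs[2, ], "vs", pairs[1, ])

# Bonferroni: multiply each p-value by the number of tests (capped at 1)
p_bonf <- p.adjust(p_raw, method = "bonferroni")
p_bonf
```

Because the raw p-values are so small, tripling them does not change any of the three conclusions.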

Inference by supplement type

First of all, let's see how the lengths differ by supp.

# boxplots with black outlines, filled by supplement type
ggplot(tg, aes(x = supp, y = len, fill = supp)) +
      geom_boxplot(color = 'black') + labs(x = 'supplement type')

Looking at the plot above, the lengths in the OJ group appear greater than in the VC group. We can check this with a hypothesis test. The null hypothesis is that the difference between the mean lengths of the two groups is zero, and we run a t-test to see whether we can reject it. Again, let's look at the p-value first. The groups are of course not paired. Since the variability looks different in the two groups, I do not assume equal variances.

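tgOJ <- tg[tg$supp == 'OJ', "len"]   # lengths for orange juice
tgVC <- tg[tg$supp == 'VC', "len"]   # lengths for ascorbic acid

# unpaired Welch t-test (variances not assumed equal)
tsupp <- t.test(tgOJ, tgVC, paired = FALSE, var.equal = FALSE)
tsupp$p.value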
## [1] 0.06063451

The p-value is almost 6 percent, so whether we reject H0 depends on our significance level \(\alpha\). With \(\alpha = 0.05\) (95 percent confidence) we fail to reject, while with \(\alpha = 0.07\) (93 percent confidence) we would reject. Let's see the t.test result at the 95 percent confidence level.

## 
##  Welch Two Sample t-test
## 
## data:  tgOJ and tgVC
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean of x mean of y 
##  20.66333  16.96333
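
The fractional degrees of freedom in this output come from the Welch-Satterthwaite approximation, which is used when the variances are not pooled. A sketch that recomputes the statistic and the degrees of freedom by hand follows; the variable names are illustrative and the block loads its own copy of the data.

```r
tg <- datasets::ToothGrowth
oj <- tg$len[tg$supp == "OJ"]
vc <- tg$len[tg$supp == "VC"]

# Squared standard error contributed by each group
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

# Welch t statistic: no pooling of the variances
t_stat <- (mean(oj) - mean(vc)) / sqrt(se2_oj + se2_vc)

# Welch-Satterthwaite approximation to the degrees of freedom
dof <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

c(t = t_stat, df = dof)
```

The result should agree with the t and df values in the Welch output for tgOJ and tgVC.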

The 95 percent confidence interval includes zero, so at \(\alpha = 0.05\) we fail to reject the null hypothesis.

Raising \(\alpha\) by 2 percentage points, to 0.07, would let us reject it, although the significance level should be fixed before looking at the data.
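
The link between the significance level and the confidence interval can be seen by asking t.test for intervals at two confidence levels. This is a sketch using the formula interface of t.test and its conf.level argument; the helper contains_zero is hypothetical and defined only for this example.

```r
tg <- datasets::ToothGrowth

# Welch test of len by supp at two confidence levels (1 - alpha)
ci95 <- t.test(len ~ supp, data = tg, conf.level = 0.95)$conf.int
ci93 <- t.test(len ~ supp, data = tg, conf.level = 0.93)$conf.int

# TRUE when the interval contains zero, i.e. H0 is not rejected
contains_zero <- function(ci) ci[1] < 0 && ci[2] > 0
c(alpha_0.05 = contains_zero(ci95), alpha_0.07 = contains_zero(ci93))
```

The 95 percent interval contains zero while the 93 percent interval does not, which matches a p-value that sits between 0.05 and 0.07.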