Synopsis

In this document, the ToothGrowth dataset from the datasets package in R is used to provide several examples of hypothesis testing.

The document has 4 main sections:

  1. Data extraction
  2. Data summary
  3. Data analysis
  4. Conclusions

Data extraction

The data is loaded from the datasets package. The ggplot2 package is also loaded for plotting.

library("datasets")
library("ggplot2")
str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

More information can be found in the help page, typing:

?ToothGrowth

From the help page: The Effect of Vitamin C on Tooth Growth in Guinea Pigs: The response is the length of odontoblasts (teeth) in each of 10 guinea pigs at each of three dose levels of Vitamin C (0.5, 1, and 2 mg) with each of two delivery methods (orange juice or ascorbic acid).
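The design described in the help page can be checked directly against the data. The following is a minimal sketch that cross-tabulates delivery method against dose; the variable name `design` is only illustrative.

```r
library(datasets)

# Count the observations for each combination of delivery method and dose
design <- table(supp = ToothGrowth$supp, dose = ToothGrowth$dose)
print(design)
```

If the data matches the description, each of the six cells should hold the same number of observations.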

Data summary

The data records tooth length as a function of two factors: the dose of vitamin C and the delivery method. The data is distributed as follows:

g <- ggplot(ToothGrowth, aes(factor(dose),len))
g <- g + geom_boxplot(aes(fill = factor(dose))) 
g <- g + labs(title="Comparison of tooth length by vitamin C dose", x="Amount of vitamin C (mg)", y="Tooth length")
g

g <- ggplot(ToothGrowth, aes(supp,len))
g <- g + geom_boxplot(aes(fill = supp)) 
g <- g + labs(title="Comparison of tooth length by delivery method", x="Delivery method", y="Tooth length")
g

As can be seen, guinea pig tooth length appears to increase with the amount of vitamin C consumed. The delivery method also seems to be related to tooth length, but the significance of that relationship is unclear from the plots alone.
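The visual impression from the boxplots can be backed by simple group means. This is a small sketch using `aggregate` from base R; the names `dose_means` and `supp_means` are illustrative and are not used elsewhere in the report.

```r
library(datasets)

# Mean tooth length for each vitamin C dose
dose_means <- aggregate(len ~ dose, data = ToothGrowth, FUN = mean)
print(dose_means)

# Mean tooth length for each delivery method
supp_means <- aggregate(len ~ supp, data = ToothGrowth, FUN = mean)
print(supp_means)
```

These means describe the sample only; whether the differences are statistically significant is what the tests in the next section address.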

Data analysis

The analysis is intended to provide statistical evidence of whether the tooth growth of the guinea pigs is affected by the amount of vitamin C and by the delivery method (orange juice or ascorbic acid). Each of these two factors defines the groups being compared.

Delivery method

We will perform a hypothesis test (a two-sample Student's t-test), where:

  • H0: The true difference between the means of the two groups is equal to 0
  • H1: The true difference between the means of the two groups is not equal to 0

Note that, although the classic Student's t-test assumes that both groups have equal variances, the t.test function does not make that assumption by default: it applies a correction for unequal variances, known as the “Welch Two Sample t-test”. The correction can be disabled by setting the var.equal parameter of t.test to TRUE.

VC <- subset(ToothGrowth,supp=="VC")$len
OJ <- subset(ToothGrowth,supp=="OJ")$len

t.test(VC,OJ)
## 
##  Welch Two Sample t-test
## 
## data:  VC and OJ
## t = -1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -7.5710156  0.1710156
## sample estimates:
## mean of x mean of y 
##  16.96333  20.66333
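To make the Welch correction concrete, the statistic, degrees of freedom and p-value can be reproduced by hand and compared with what t.test returns. This is an illustrative sketch using the standard Welch-Satterthwaite formulas; the variable names (`se2_vc`, `df_welch` and so on) are introduced here only for the example.

```r
library(datasets)

vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]

# Squared standard error of each group mean (variance divided by sample size)
se2_vc <- var(vc) / length(vc)
se2_oj <- var(oj) / length(oj)

# Welch t statistic: difference in means over the combined standard error
t_stat <- (mean(vc) - mean(oj)) / sqrt(se2_vc + se2_oj)

# Welch-Satterthwaite approximation of the degrees of freedom
df_welch <- (se2_vc + se2_oj)^2 /
  (se2_vc^2 / (length(vc) - 1) + se2_oj^2 / (length(oj) - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

# Compare the manual values with the built-in test
welch <- t.test(vc, oj)
print(c(manual = t_stat, t.test = unname(welch$statistic)))
print(c(manual = df_welch, t.test = unname(welch$parameter)))
print(c(manual = p_value, t.test = welch$p.value))
```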

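With a p-value of 0.06063, the null hypothesis cannot be rejected at the conventional 5% significance level, and the 95% confidence interval (-7.57, 0.17) contains 0. The null hypothesis would be rejected only at a less strict level such as 10%, which means tolerating up to a 10% probability of a Type I error (rejecting a true null hypothesis).

The mean tooth lengths for the OJ and VC delivery methods are 20.66 and 16.96 respectively. The difference is statistically significant at the 10% level but not at the 5% level.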
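The decision depends on the significance level chosen before looking at the data. The sketch below compares the p-value of the delivery method test against two common levels and also runs the equal-variance version of the test for comparison. The levels in `alpha_levels` are an assumption made for illustration, not part of the original analysis.

```r
library(datasets)

vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]

welch  <- t.test(vc, oj)                    # default: var.equal = FALSE (Welch)
pooled <- t.test(vc, oj, var.equal = TRUE)  # classic equal-variance t-test

# Illustrative significance levels (an assumption of this example)
alpha_levels <- c(0.10, 0.05)

# TRUE where the null hypothesis would be rejected at that level
reject_welch <- welch$p.value < alpha_levels
names(reject_welch) <- paste0("alpha = ", alpha_levels)
print(reject_welch)

# p-values of both variants side by side
print(c(welch = welch$p.value, pooled = pooled$p.value))
```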

Dose

In the same fashion as in the previous case, we will perform a hypothesis test (a two-sample Student's t-test), where:

  • H0: The true difference between the means of the two groups is equal to 0
  • H1: The true difference between the means of the two groups is not equal to 0

Again, note that the t.test function applies the Welch correction for unequal variances by default (the “Welch Two Sample t-test”). Setting var.equal to TRUE would run the classic equal-variance test instead.

As the doses in the sample form 3 different groups (0.5, 1 and 2 mg), the test will be performed pairwise, comparing 0.5 with 1 mg and 1 with 2 mg (a comparison across all three groups is possible, but it is outside the scope of the required analysis).

dose_05 <- subset(ToothGrowth,dose==0.5)$len
dose_1 <- subset(ToothGrowth,dose==1)$len
dose_2 <- subset(ToothGrowth,dose==2)$len

#Comparing 0.5 vs 1 mg
t.test(dose_05,dose_1)
## 
##  Welch Two Sample t-test
## 
## data:  dose_05 and dose_1
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.983781  -6.276219
## sample estimates:
## mean of x mean of y 
##    10.605    19.735
#Comparing 1 vs 2 mg
t.test(dose_1,dose_2)
## 
##  Welch Two Sample t-test
## 
## data:  dose_1 and dose_2
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.996481 -3.733519
## sample estimates:
## mean of x mean of y 
##    19.735    26.100

In both cases, the null hypothesis can be rejected with a very small p-value, below 0.0001. It would therefore be rejected even at a 0.01% significance level, that is, while tolerating only a 0.01% probability of a Type I error (rejecting a true null hypothesis).

The mean tooth lengths for 0.5, 1 and 2 mg are 10.61, 19.74 and 26.10 respectively. The differences between consecutive doses are statistically significant well beyond the 1% level.
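For completeness, all three dose pairs can be compared in a single call with `pairwise.t.test` from the stats package. This sketch goes slightly beyond the analysis above: it includes the 0.5 vs 2 mg pair and applies a Holm adjustment for multiple comparisons, which is a choice made for this example rather than part of the original report. Setting `pool.sd = FALSE` keeps each comparison as a separate Welch test.

```r
library(datasets)

# Pairwise Welch t-tests between all dose groups, with Holm-adjusted p-values
pw <- pairwise.t.test(ToothGrowth$len, factor(ToothGrowth$dose),
                      p.adjust.method = "holm", pool.sd = FALSE)

# Lower-triangular matrix of adjusted p-values (NA where no comparison applies)
print(pw$p.value)
```

The adjusted p-values are never smaller than the unadjusted ones from the individual t.test calls, so they give a more conservative reading of the same comparisons.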

Conclusions

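The results show that the dose of vitamin C is a relevant driver of tooth growth in this particular experiment with guinea pigs: higher doses are associated with longer teeth. The evidence for an effect of the delivery method is weaker, as that difference is significant only at the 10% level.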