In this document, the ToothGrowth dataset from the datasets library in R is used to provide various examples of hypothesis testing analysis.
The document has 4 main sections:
The data is loaded from the datasets library, and the additional libraries needed are attached:
library("datasets")
library("ggplot2")
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
More information can be found on the help page by typing:
?ToothGrowth
From the help page: The Effect of Vitamin C on Tooth Growth in Guinea Pigs: The response is the length of odontoblasts (teeth) in each of 10 guinea pigs at each of three dose levels of Vitamin C (0.5, 1, and 2 mg) with each of two delivery methods (orange juice or ascorbic acid).
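The design described in the help page (10 animals for each combination of dose and delivery method) can be checked with a quick cross-tabulation. This is only a minimal sketch; the variable name design is introduced here for illustration.

```r
library(datasets)

# Cross-tabulate delivery method (rows) against dose (columns);
# each cell counts the animals in that group.
design <- table(ToothGrowth$supp, ToothGrowth$dose)
print(design)
```

Every cell should contain the same count if the experiment is balanced, which is what makes the simple two-group comparisons used later straightforward.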
The data shows the length of the teeth as a function of two basic features: the amount of vitamin C provided and the delivery method. The data is distributed as follows:
g <- ggplot(ToothGrowth, aes(factor(dose),len))
g <- g + geom_boxplot(aes(fill = factor(dose)))
g <- g + labs(title="Comparison of teeth length by vitamin C dose", x="Amount of vitamin C (mg)", y="Teeth length")
g
g <- ggplot(ToothGrowth, aes(supp,len))
g <- g + geom_boxplot(aes(fill = supp))
g <- g + labs(title="Comparison of teeth length by delivery method", x="Delivery method", y="Teeth length")
g
As can be seen, the guinea pig tooth length is apparently affected by the amount of vitamin C consumed. The delivery method seems to have some relation to tooth length, but the significance of the relationship is unclear.
The analysis is intended to provide statistical evidence of whether or not the tooth growth of the guinea pigs is affected by the amount of vitamin C and by the delivery method (orange juice or direct ascorbic acid), which define the different groups.
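The visual impression from the boxplots can be complemented with group means. The following is a minimal sketch using aggregate from base R; the names by_dose and by_supp are introduced here for illustration only.

```r
library(datasets)

# Mean tooth length for each dose level (0.5, 1 and 2 mg).
by_dose <- aggregate(len ~ dose, data = ToothGrowth, FUN = mean)
print(by_dose)

# Mean tooth length for each delivery method (OJ and VC).
by_supp <- aggregate(len ~ supp, data = ToothGrowth, FUN = mean)
print(by_supp)
```

A table of means alone says nothing about significance, since it ignores the spread within each group; that is what the t-tests are for.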
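We will perform a hypothesis test (Student's t-test), where the null hypothesis is that the mean tooth length is the same for both delivery methods and the alternative hypothesis is that the two means differ.
It should be noted that, although the classical Student's t-test assumes that the variances of the two groups are equal, the t.test function does not make that assumption by default: it runs the "Welch Two Sample t-test", which allows unequal variances. The classical equal-variance test can be requested by setting var.equal = TRUE in the t.test call.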
VC <- subset(ToothGrowth,supp=="VC")$len
OJ <- subset(ToothGrowth,supp=="OJ")$len
t.test(VC,OJ)
##
## Welch Two Sample t-test
##
## data: VC and OJ
## t = -1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -7.5710156 0.1710156
## sample estimates:
## mean of x mean of y
## 16.96333 20.66333
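With a p-value of 0.06063, the null hypothesis cannot be rejected at the usual 5% significance level (the 95% confidence interval contains 0), but it can be rejected at the 10% level. The p-value is the probability of observing a difference at least this large if the null hypothesis were true; it is the chosen significance level that bounds the probability of a Type I error (rejecting a true null hypothesis).
The mean tooth lengths for the OJ and VC delivery methods are 20.66 and 16.96 respectively; the difference is statistically significant at the 90% confidence level but not at 95%.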
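The figures quoted above do not have to be copied from the printed output; they can be read from the object that t.test returns. The sketch below assumes a significance level alpha of 0.05, chosen only for illustration, and the names res, alpha and reject_null are not part of the original analysis.

```r
library(datasets)

VC <- subset(ToothGrowth, supp == "VC")$len
OJ <- subset(ToothGrowth, supp == "OJ")$len

res   <- t.test(VC, OJ)  # Welch two-sample t-test (the default)
alpha <- 0.05            # assumed significance level for this sketch

res$p.value              # p-value of the two-sided test
res$conf.int             # 95% confidence interval for mean(VC) - mean(OJ)
res$estimate             # the two sample means

# The null hypothesis is rejected only when the p-value is below alpha.
reject_null <- res$p.value < alpha
reject_null
```

Changing alpha to 0.10 shows how the decision depends on the significance level chosen before looking at the data.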
In the same fashion as in the previous case, we will perform a hypothesis test (Student's t-test), where the null hypothesis is that the mean tooth length is the same for the two dose levels being compared and the alternative hypothesis is that the two means differ.
Again, it should be noted that, although the classical Student's t-test assumes that the variances of the two groups are equal, the t.test function does not make that assumption by default: it runs the "Welch Two Sample t-test", which allows unequal variances. The classical equal-variance test can be requested by setting var.equal = TRUE in the t.test call.
As the dose amounts in the sample form 3 different groups (0.5, 1 and 2 mg), the test will be performed pairwise between 0.5-1 and 1-2 (all three groups could be compared at once, but that is outside the scope of the required analysis).
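To make the effect of the var.equal parameter concrete, the sketch below runs the delivery-method comparison both ways. The names welch and pooled are introduced here for illustration.

```r
library(datasets)

VC <- subset(ToothGrowth, supp == "VC")$len
OJ <- subset(ToothGrowth, supp == "OJ")$len

welch  <- t.test(VC, OJ)                    # default: var.equal = FALSE
pooled <- t.test(VC, OJ, var.equal = TRUE)  # classical Student's t-test

# Degrees of freedom: Welch uses an approximation that depends on the
# sample variances; the pooled test uses n1 + n2 - 2.
c(welch = unname(welch$parameter), pooled = unname(pooled$parameter))

# The resulting p-values.
c(welch = welch$p.value, pooled = pooled$p.value)
```

Because both groups have the same number of observations, the t statistic is identical in the two tests and only the degrees of freedom differ, so the choice matters little here. With unbalanced groups the two tests can disagree more noticeably.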
dose_05<- subset(ToothGrowth,dose==0.5)$len
dose_1 <- subset(ToothGrowth,dose==1)$len
dose_2 <- subset(ToothGrowth,dose==2)$len
#Comparing 0.5 vs 1 mg
t.test(dose_05,dose_1)
##
## Welch Two Sample t-test
##
## data: dose_05 and dose_1
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.983781 -6.276219
## sample estimates:
## mean of x mean of y
## 10.605 19.735
#Comparing 1 vs 2 mg
t.test(dose_1,dose_2)
##
## Welch Two Sample t-test
##
## data: dose_1 and dose_2
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -8.996481 -3.733519
## sample estimates:
## mean of x mean of y
## 19.735 26.100
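In both cases, the null hypothesis can be rejected with a very small p-value, under 0.0001 (that is, 0.01%). Differences this large would be very unlikely if the null hypothesis were true, so the null can be rejected at any conventional significance level, keeping the probability of a Type I error (rejecting a true null hypothesis) correspondingly low.
The mean tooth lengths for 0.5, 1 and 2 mg are 10.61, 19.74 and 26.10 respectively, and adjacent dose levels are statistically different at more than a 99% confidence level.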
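For completeness, all three pairwise comparisons between dose levels can be run in one call with pairwise.t.test from the stats package. This is a sketch rather than part of the required analysis: the Bonferroni adjustment for multiple comparisons is an assumption made here, and pool.sd = FALSE is used so that each pair is tested separately without pooling the variance across groups.

```r
library(datasets)

# All pairwise comparisons between the three dose levels.
# pool.sd = FALSE: a separate two-sample test per pair (no pooled variance).
# p.adjust.method = "bonferroni": assumed correction for three comparisons.
pw <- pairwise.t.test(ToothGrowth$len, ToothGrowth$dose,
                      pool.sd = FALSE, p.adjust.method = "bonferroni")

# Lower-triangular matrix of adjusted p-values (rows and columns are doses).
pw$p.value
```

The adjusted p-values are larger than the unadjusted ones from the individual t.test calls, because the Bonferroni method multiplies each p-value by the number of comparisons.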
The results show that vitamin C is a relevant driver of tooth growth in this particular experiment with guinea pigs.
The amount of vitamin C provided produces differences in tooth length with high statistical significance. The mean tooth lengths for the dose groups, 10.61, 19.74 and 26.10 for 0.5, 1 and 2 mg of vitamin C respectively, are different at a confidence level of more than 99%.
The delivery method shows a weaker effect: the mean tooth lengths, 20.66 for orange juice and 16.96 for ascorbic acid, differ at the 90% confidence level but not at 95%. This suggests that vitamin C may be better absorbed from orange juice than when provided directly.