This project analyses the ToothGrowth data in the R datasets package.
The data come from a study of the effect of vitamin C on tooth growth in a sample of 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (OJ) or ascorbic acid (VC), a form of vitamin C.
The results of the analysis show a positive association between dosage and tooth growth.
# setwd("~/Data Science/Module 6 Statistical Inference/") # optional: the analysis only uses built-in data
library(datasets) # load datasets
library(lattice) # use lattice plots to visually inspect the data
Load and review data structure and variables.
data("ToothGrowth")
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
head(ToothGrowth)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
Since the study uses only three dose levels (0.5, 1, and 2 mg/day), we can redefine the variable "dose" as a factor with three levels instead of treating it as numerical data.
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
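As a quick check on the conversion, the sketch below lists the factor levels and counts the animals in each combination of delivery method and dose. The names `tg`, `dose_levels` and `cell_counts` are illustrative and not part of the original analysis; the code works on a copy of the data so it does not depend on the earlier steps.

```r
library(datasets)

# Work on a copy so the check does not depend on earlier steps.
tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

# The factor should have exactly three levels, stored as character labels.
dose_levels <- levels(tg$dose)
print(dose_levels)

# Cross-tabulate delivery method against dose to see the group sizes.
cell_counts <- table(tg$supp, tg$dose)
print(cell_counts)
```

Equal counts in every cell would indicate a balanced design, which makes the later group comparisons easier to interpret.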
Visual examination of relationships between variables.
xyplot(len ~ supp | dose, data = ToothGrowth,
       layout = c(3, 1),
       main = list(label = "Relationship between Tooth Growth, Delivery Method and Dosage", cex = 0.75),
       xlab = list(label = "Type of Supplement and Dosage", cex = 0.75),
       ylab = list(label = "Length of Tooth Growth", cex = 0.75),
       par.settings = simpleTheme(col = "blue"))
A visual examination of the above plot suggests the following:
There does not appear to be a strong association between the delivery method of the supplement and the length of tooth growth.
There appears to be a positive association between the dosage of the supplement administered and the length of tooth growth.
Based on the above observations, we can formulate appropriate tests to determine whether there is a significant difference between (i) the mean length of tooth growth for each delivery method and (ii) the mean length of tooth growth for each level of dosage administered.
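The visual impressions above can be backed with group means before any formal test. The following is a minimal sketch using `aggregate()` from base R; the names `mean_by_supp` and `mean_by_dose` are illustrative and do not appear in the original analysis.

```r
library(datasets)

# Mean tooth length for each delivery method (OJ, VC).
mean_by_supp <- aggregate(len ~ supp, data = ToothGrowth, FUN = mean)
print(mean_by_supp)

# Mean tooth length for each dose level (0.5, 1, 2 mg/day).
mean_by_dose <- aggregate(len ~ dose, data = ToothGrowth, FUN = mean)
print(mean_by_dose)
```

These are the same group means that the t-tests below report under "sample estimates".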
Determine, at the 5% significance level, whether the mean length of tooth growth differs between the two delivery methods of the supplement.
Null Hypothesis (H0) : The mean length of tooth growth is the same for both delivery methods, i.e. the true difference in means between OJ and VC is zero.
Alternative Hypothesis (H1) : The mean length of tooth growth differs between the two delivery methods, i.e. the true difference in means between OJ and VC is not zero.
len <- ToothGrowth$len
dose <- ToothGrowth$dose
supp <- ToothGrowth$supp
t.test(len[supp == "OJ"], len[supp == "VC"], paired = FALSE, var.equal = FALSE)
##
## Welch Two Sample t-test
##
## data: len[supp == "OJ"] and len[supp == "VC"]
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean of x mean of y
## 20.66333 16.96333
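To make the reported numbers less of a black box, the sketch below recomputes the Welch t statistic, the Welch-Satterthwaite degrees of freedom and the two-sided p-value by hand. The variable names (`oj`, `vc`, `se2_oj`, `se2_vc`, `t_stat`, `df_welch`, `p_value`) are illustrative and not part of the original analysis.

```r
library(datasets)

oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard error contributed by each group: s^2 / n.
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

# Welch t statistic: difference in means over the combined standard error.
t_stat <- (mean(oj) - mean(vc)) / sqrt(se2_oj + se2_vc)

# Welch-Satterthwaite approximation to the degrees of freedom.
df_welch <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

print(c(t = t_stat, df = df_welch, p = p_value))
```

The values should agree with the `t`, `df` and `p-value` fields in the `t.test` output above.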
Conclusion
Given,
1. The p-value of the test is 0.06, which is greater than the 5% significance level.
2. The 95% confidence interval for the difference in means includes zero.
We fail to reject the null hypothesis and therefore conclude that there is no statistically significant difference in the mean length of tooth growth between the two delivery methods of the supplement.
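The decision rule used in this conclusion can also be applied programmatically to the object returned by `t.test`. The following is a minimal sketch using the formula interface; the names `alpha`, `res`, `p_below_alpha` and `ci_excludes_zero` are illustrative and not part of the original analysis.

```r
library(datasets)

alpha <- 0.05  # significance level used in this report

# Formula interface: compare len across the two levels of supp.
# t.test uses the Welch (unequal variance) test by default.
res <- t.test(len ~ supp, data = ToothGrowth)

# Criterion 1: is the p-value below the significance level?
p_below_alpha <- res$p.value < alpha

# Criterion 2: does the 95% confidence interval exclude zero?
ci_excludes_zero <- res$conf.int[1] > 0 || res$conf.int[2] < 0

cat("p-value:", signif(res$p.value, 4), "\n")
cat("95% CI:", round(res$conf.int, 3), "\n")
cat("Reject H0 at the 5% level:", p_below_alpha, "\n")
```

For a two-sided test at the 5% level the two criteria always agree, so checking either one is sufficient.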
Determine, at the 5% significance level, whether the mean length of tooth growth differs between the dosage levels of the supplement administered.
5.1 Test for a significant difference in mean tooth growth between dosages of 0.5mg/day and 1.0mg/day.
Null Hypothesis (H0) : The mean length of tooth growth is the same at 0.5mg/day and 1.0mg/day, i.e. the true difference in means is zero.
Alternative Hypothesis (H1) : The mean length of tooth growth differs between 0.5mg/day and 1.0mg/day, i.e. the true difference in means is not zero.
t.test(len[dose == "0.5"], len[dose == "1"], paired = FALSE, var.equal = FALSE)
##
## Welch Two Sample t-test
##
## data: len[dose == "0.5"] and len[dose == "1"]
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.983781 -6.276219
## sample estimates:
## mean of x mean of y
## 10.605 19.735
5.2 Test for a significant difference in mean tooth growth between dosages of 1.0mg/day and 2.0mg/day.
Null Hypothesis (H0) : The mean length of tooth growth is the same at 1.0mg/day and 2.0mg/day, i.e. the true difference in means is zero.
Alternative Hypothesis (H1) : The mean length of tooth growth differs between 1.0mg/day and 2.0mg/day, i.e. the true difference in means is not zero.
t.test(len[dose == "1"], len[dose == "2"], paired = FALSE, var.equal = FALSE)
##
## Welch Two Sample t-test
##
## data: len[dose == "1"] and len[dose == "2"]
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -8.996481 -3.733519
## sample estimates:
## mean of x mean of y
## 19.735 26.100
5.3 Test for a significant difference in mean tooth growth between dosages of 0.5mg/day and 2.0mg/day.
Null Hypothesis (H0) : The mean length of tooth growth is the same at 0.5mg/day and 2.0mg/day, i.e. the true difference in means is zero.
Alternative Hypothesis (H1) : The mean length of tooth growth differs between 0.5mg/day and 2.0mg/day, i.e. the true difference in means is not zero.
t.test(len[dose == "0.5"], len[dose == "2"], paired = FALSE, var.equal = FALSE)
##
## Welch Two Sample t-test
##
## data: len[dose == "0.5"] and len[dose == "2"]
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -18.15617 -12.83383
## sample estimates:
## mean of x mean of y
## 10.605 26.100
Conclusion:
From the 3 sets of test results above,
1. The p-value of every test (1.268e-07, 1.906e-05 and 4.398e-14) is far below the 5% significance level.
2. The comparison across the widest dosage gap (0.5mg/day vs. 2.0mg/day) has the smallest p-value.
3. The 95% confidence interval for the difference in means excludes zero in all 3 tests.
Hence, we reject all 3 null hypotheses and conclude that the mean length of tooth growth differs between every pair of dosage levels, with the higher dosage giving the greater mean length in each comparison.
Assuming that the distribution of the sample means adheres to the central limit theorem, we can conclude that the dosage of the supplement administered has a statistically significant association with the length of tooth growth in guinea pigs.
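The three dosage comparisons can be collected into one table so the criteria in this conclusion are checked side by side. The following is a minimal sketch, not the original workflow; the names `tg`, `dose_pairs` and `results` and the column names are illustrative.

```r
library(datasets)

# Work on a copy with dose stored as a factor.
tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

# Every unordered pair of dose levels: (0.5, 1), (0.5, 2), (1, 2).
dose_pairs <- combn(levels(tg$dose), 2, simplify = FALSE)

# Run one Welch two-sample t-test per pair and keep the key numbers.
results <- do.call(rbind, lapply(dose_pairs, function(pair) {
  low  <- tg$len[tg$dose == pair[1]]
  high <- tg$len[tg$dose == pair[2]]
  tt <- t.test(low, high, paired = FALSE, var.equal = FALSE)
  data.frame(
    comparison = paste(pair[1], "vs", pair[2]),
    mean_diff  = mean(low) - mean(high),  # lower dose minus higher dose
    ci_lower   = tt$conf.int[1],
    ci_upper   = tt$conf.int[2],
    p_value    = tt$p.value
  )
}))

print(results)
```

A negative `mean_diff` with a confidence interval entirely below zero means the higher dosage in that pair has the larger mean tooth length.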