This report uses the ToothGrowth dataset to investigate the effect of vitamin C on tooth growth in guinea pigs across two delivery methods (orange juice and ascorbic acid) and three doses of vitamin C (0.5, 1, and 2 mg). It uses t-tests and confidence intervals for the differences in mean tooth length to decide whether those differences are statistically significant.
The following are summaries of the dataset. The data include three variables: “len” is the tooth length recorded for each observation, “supp” is the delivery method (OJ: orange juice, VC: ascorbic acid), and “dose” is the dose of vitamin C (0.5, 1, or 2 mg). There are 60 observations in total: 30 for each delivery method and 20 for each dose. The mean tooth length is 18.81, the median is 19.25, and the values range from 4.2 to 33.9.
data(ToothGrowth)
summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000  
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
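The group sizes quoted above can be checked directly. The following is a minimal sketch that was not part of the original analysis; it uses only base R, and the names `cell_counts` and `cell_means` are illustrative.

```r
data(ToothGrowth)

# Cross-tabulate delivery method against dose to confirm the design is balanced
cell_counts <- table(ToothGrowth$supp, ToothGrowth$dose)
print(cell_counts)

# Mean tooth length for each supplement/dose combination
cell_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(cell_means)
```

Each of the six supplement/dose cells holds the same number of observations, which is why the marginal counts are 30 per delivery method and 20 per dose.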
The histogram below shows the distribution of tooth lengths, which appears to be approximately normal.
hist(ToothGrowth$len)
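A histogram is only an informal check of normality. As a possible supplement (not part of the original analysis), the sketch below draws a normal Q-Q plot and runs a Shapiro-Wilk test on the same variable; `normality_check` is an illustrative name.

```r
data(ToothGrowth)

# Q-Q plot: points close to the reference line suggest approximate normality
qqnorm(ToothGrowth$len, main = "Normal Q-Q Plot of Tooth Length")
qqline(ToothGrowth$len)

# Shapiro-Wilk test: the null hypothesis is that the sample comes from a normal distribution
normality_check <- shapiro.test(ToothGrowth$len)
print(normality_check)
```

A large p-value here would mean the test finds no evidence against normality; it would not prove that the data are normal.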
The boxplot below compares the medians of tooth length between the two delivery methods (OJ and VC), and the corresponding t-test compares their means. The boxplot shows that the median tooth length for OJ is slightly higher than for VC, and the t-test reports the same direction for the means (20.66 for OJ versus 16.96 for VC; in the output, group 1 is OJ and group 2 is VC because the factor was converted to numeric), but the two ranges largely overlap. The line “alternative hypothesis: true difference in means is not equal to 0” in the output only states the two-sided hypothesis being tested; it is not the result of the test. The p-value is 0.06, which is above the 0.05 significance level that corresponds to a 95% confidence interval. Consistent with this, the 95% confidence interval for the difference between the two means, -0.1710156 to 7.5710156, contains zero (just barely). My final conclusion is that we fail to reject the null hypothesis: at the 5% level, the data do not provide sufficient evidence that the mean tooth lengths differ between the two delivery methods.
boxplot(ToothGrowth$len ~ ToothGrowth$supp, main="Comparing Tooth Length Across Two Supplements", xlab="Supplement", ylab="Tooth Length")
t.test(len ~ as.numeric(supp), data=ToothGrowth)
##
## Welch Two Sample t-test
##
## data: len by as.numeric(supp)
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean in group 1 mean in group 2
## 20.66333 16.96333
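To make the numbers in this output less opaque, the sketch below recomputes the Welch t statistic, the Welch-Satterthwaite degrees of freedom, the two-sided p-value, and the 95% confidence interval by hand. It is an illustrative addition rather than part of the original analysis, and all variable names are hypothetical.

```r
data(ToothGrowth)

oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard error of each group mean
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
mean_diff <- mean(oj) - mean(vc)
t_stat <- mean_diff / sqrt(se2_oj + se2_vc)
df_welch <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval for the difference in means
p_value <- 2 * pt(-abs(t_stat), df_welch)
ci <- mean_diff + c(-1, 1) * qt(0.975, df_welch) * sqrt(se2_oj + se2_vc)

print(c(t = t_stat, df = df_welch, p = p_value))
print(ci)
```

The values should agree with the `t.test` output above, which shows that the p-value and the confidence interval are two views of the same calculation: the interval contains zero exactly when the two-sided p-value exceeds 0.05.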
The boxplot and t-tests below compare tooth lengths across the three doses of vitamin C (0.5, 1.0, and 2.0 mg); the boxplot shows medians and the t-tests compare means. The boxplot shows that the median tooth length increases with dose, and the group means follow the same pattern (10.6 for 0.5 mg, 19.74 for 1 mg, and 26.1 for 2 mg). The ranges overlap only slightly, with at most the outer quartiles overlapping. All three p-values are far below 0.05, so for each pair of doses we reject the null hypothesis that the true means are equal. The 95% confidence interval for the difference in means is -11.983781 to -6.276219 for 0.5 mg vs 1.0 mg, -8.996481 to -3.733519 for 1.0 mg vs 2.0 mg, and -18.15617 to -12.83383 for 0.5 mg vs 2.0 mg; none of these intervals contains zero. Because each difference is computed as the lower dose minus the higher dose, the negative intervals indicate that the higher dose has the larger mean. Therefore, we can conclude that mean tooth length is higher at higher doses in the population.
Note: because there are three doses and a t-test compares only two sample means, three t-tests were run, one for each pair of doses (dplyr was used to filter the doses).
boxplot(ToothGrowth$len ~ ToothGrowth$dose, main="Comparing Tooth Length Across Three Doses", xlab="Doses", ylab="Tooth Length")
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
data <- as_tibble(ToothGrowth)  # as_tibble() replaces tbl_df(), which is deprecated in current dplyr
data_dose.5v1 <- filter(data, dose==0.5 | dose==1.0)
t.test(data_dose.5v1$len ~ data_dose.5v1$dose)
##
## Welch Two Sample t-test
##
## data: data_dose.5v1$len by data_dose.5v1$dose
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.983781 -6.276219
## sample estimates:
## mean in group 0.5 mean in group 1
## 10.605 19.735
data_dose1v2 <- filter(data, dose==1.0 | dose==2.0)
t.test(data_dose1v2$len ~ data_dose1v2$dose)
##
## Welch Two Sample t-test
##
## data: data_dose1v2$len by data_dose1v2$dose
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2
## 19.735 26.100
data_dose.5v2 <- filter(data, dose==0.5 | dose==2.0)
t.test(data_dose.5v2$len ~ data_dose.5v2$dose)
##
## Welch Two Sample t-test
##
## data: data_dose.5v2$len by data_dose.5v2$dose
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -18.15617 -12.83383
## sample estimates:
## mean in group 0.5 mean in group 2
## 10.605 26.100
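Running three tests on the same data raises the chance of at least one false positive. As an additional check that the original analysis did not include, the sketch below repeats the three Welch tests in base R (no dplyr) and applies a Bonferroni correction with `p.adjust`. The names `dose_pairs`, `raw_p`, and `adjusted_p` are illustrative.

```r
data(ToothGrowth)

# All pairs of dose levels: (0.5, 1), (0.5, 2), (1, 2), one pair per column
dose_pairs <- combn(sort(unique(ToothGrowth$dose)), 2)

# Run one Welch t-test per pair and keep the raw p-value
raw_p <- apply(dose_pairs, 2, function(pair) {
  subset_pair <- ToothGrowth[ToothGrowth$dose %in% pair, ]
  t.test(len ~ dose, data = subset_pair)$p.value
})
names(raw_p) <- apply(dose_pairs, 2, paste, collapse = " vs ")

# Bonferroni correction: multiply each p-value by the number of tests (capped at 1)
adjusted_p <- p.adjust(raw_p, method = "bonferroni")
print(cbind(raw = raw_p, bonferroni = adjusted_p))
```

Given the p-values printed above, the largest of which is 1.906e-05, tripling each one still leaves all three well below 0.05, so the conclusion about dose does not depend on the correction.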
The conclusions above rest on the following assumptions: the observations in the compared groups are independent (i.e., not paired or matched), and tooth length is approximately normally distributed within each group. Because Welch's t-test was used, equal variances across groups are not assumed.
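One way to probe the normality assumption within each compared group, offered here as a hedged sketch rather than something the original analysis did, is to run a Shapiro-Wilk test per group and to look at the group variances. The names `p_by_supp`, `p_by_dose`, and `var_by_dose` are illustrative.

```r
data(ToothGrowth)

# Shapiro-Wilk p-value within each delivery method (30 observations per group)
p_by_supp <- tapply(ToothGrowth$len, ToothGrowth$supp,
                    function(x) shapiro.test(x)$p.value)

# Shapiro-Wilk p-value within each dose level (20 observations per group)
p_by_dose <- tapply(ToothGrowth$len, ToothGrowth$dose,
                    function(x) shapiro.test(x)$p.value)

# Group variances; Welch's test does not require these to be equal
var_by_dose <- tapply(ToothGrowth$len, ToothGrowth$dose, var)

print(p_by_supp)
print(p_by_dose)
print(var_by_dose)
```

With groups of 20 to 30 observations these tests have limited power, so they are best read alongside the plots rather than as a definitive verdict.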