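R’s built-in ToothGrowth dataset, "The Effect of Vitamin C on Tooth Growth in Guinea Pigs", is a data frame with 60 observations on 3 variables: len (tooth length), supp (supplement type: OJ for orange juice or VC for ascorbic acid) and dose (vitamin C dose in milligrams per day). Its summary is: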
> summary(ToothGrowth)
      len        supp         dose
 Min.   : 4.20   OJ:30   Min.   :0.500
 1st Qu.:13.07   VC:30   1st Qu.:0.500
 Median :19.25           Median :1.000
 Mean   :18.81           Mean   :1.167
 3rd Qu.:25.27           3rd Qu.:2.000
 Max.   :33.90           Max.   :2.000
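Before running any tests, it can help to look at the group summaries that the later questions rely on. The sketch below uses base R's aggregate() and table(). The object names means and sds are illustrative and are not part of the original analysis.

```r
# Mean and standard deviation of len for each supplement type
means <- aggregate(len ~ supp, data = ToothGrowth, FUN = mean)
sds   <- aggregate(len ~ supp, data = ToothGrowth, FUN = sd)
print(means)
print(sds)

# Number of guinea pigs in each supplement-by-dose combination
print(with(ToothGrowth, table(supp, dose)))
```

The OJ and VC means (about 20.66 and 16.96) reappear in the two-sample tests below. The standard deviations give a first, informal look at whether an equal-variance assumption is reasonable. The design is balanced, with 10 guinea pigs in each supplement-by-dose cell.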
Question: Test whether the true mean of len in ToothGrowth is equal to 15 or not.
Solution:
> t.test(x = ToothGrowth$len, alternative = "two.sided", mu = 15,
+ conf.level = 0.95)
One Sample t-test
data: ToothGrowth$len
t = 3.8615, df = 59, p-value = 0.0002823
alternative hypothesis: true mean is not equal to 15
95 percent confidence interval:
16.83731 20.78936
sample estimates:
mean of x
18.81333
Since the p-value (0.0002823) is less than 0.05, we reject the null hypothesis that the true mean is 15. Hence we conclude, at the 5% level of significance, that the true mean is not equal to 15. Consistently, the 95% confidence interval (16.84, 20.79) does not contain 15.
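The statistic in this output is t = (mean - mu0) / (sd / sqrt(n)), referred to a t distribution with n - 1 degrees of freedom. As a sketch, the calculation can be reproduced by hand in base R and compared with t.test(). The object names (t_stat, dof, ci and so on) are illustrative.

```r
# One-sample t-test of H0: mu = 15, computed by hand (base R only)
x   <- ToothGrowth$len
mu0 <- 15
n   <- length(x)
se  <- sd(x) / sqrt(n)                      # standard error of the mean
t_stat <- (mean(x) - mu0) / se              # t statistic
dof <- n - 1                                # degrees of freedom
p_two_sided <- 2 * pt(-abs(t_stat), dof)    # two-sided p-value
ci <- mean(x) + c(-1, 1) * qt(0.975, dof) * se  # 95% confidence interval

# The built-in test, for comparison
res <- t.test(x, alternative = "two.sided", mu = mu0, conf.level = 0.95)
print(c(t = t_stat, df = dof, p = p_two_sided))
```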
Question: Test whether the true mean of len in ToothGrowth is greater than 17 or not.
Solution:
> t.test(x = ToothGrowth$len, alternative = "greater", mu = 17,
+ conf.level = 0.95)
One Sample t-test
data: ToothGrowth$len
t = 1.8362, df = 59, p-value = 0.03568
alternative hypothesis: true mean is greater than 17
95 percent confidence interval:
17.16309 Inf
sample estimates:
mean of x
18.81333
Here the p-value (0.03568) is less than 0.05, so we reject the null hypothesis that the true mean is 17 at the 5% significance level and conclude that the true mean is greater than 17.
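For a one-sided "greater" alternative, only the upper tail of the t distribution counts as evidence against the null. The one-sided interval has a lower bound only, which is why the output shows Inf as the upper end. The sketch below reproduces both quantities by hand. The object names are illustrative.

```r
# One-sided test of H0: mu = 17 against H1: mu > 17, by hand
x   <- ToothGrowth$len
mu0 <- 17
n   <- length(x)
se  <- sd(x) / sqrt(n)
t_stat <- (mean(x) - mu0) / se
dof <- n - 1
# Upper-tail probability only
p_greater <- pt(t_stat, dof, lower.tail = FALSE)
# One-sided 95% interval: lower bound, upper end is Inf
lower_bound <- mean(x) - qt(0.95, dof) * se

res <- t.test(x, alternative = "greater", mu = mu0, conf.level = 0.95)
print(c(t = t_stat, p = p_greater, lower = lower_bound))
```

Because t is positive here, a two-sided test of the same null would give exactly twice this p-value, about 0.071.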
Question: Test whether mean tooth length differs between the two supplement types (OJ and VC) given to the guinea pigs, assuming equal variances in the two groups.
Solution:
> t.test(len ~ supp, data = ToothGrowth, var.equal = TRUE)
Two Sample t-test
data: len by supp
t = 1.9153, df = 58, p-value = 0.06039
alternative hypothesis: true difference in means between group OJ and group VC is not equal to 0
95 percent confidence interval:
-0.1670064 7.5670064
sample estimates:
mean in group OJ mean in group VC
20.66333 16.96333
Since the p-value (0.06039) is greater than 0.05, we fail to reject the null hypothesis that the two true means are equal at the 5% significance level. The 95% confidence interval for the difference, (-0.167, 7.567), also contains 0.
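With var.equal = TRUE, t.test() uses the pooled (Student) two-sample statistic. The two sample variances are combined into one pooled variance weighted by their degrees of freedom, and the test has n1 + n2 - 2 degrees of freedom. The sketch below reproduces this by hand. The vectors oj and vc are illustrative subsets of ToothGrowth$len.

```r
# Pooled-variance (Student) two-sample t statistic, by hand
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]
n1 <- length(oj)
n2 <- length(vc)
# Pooled variance: weighted average of the two sample variances
sp2 <- ((n1 - 1) * var(oj) + (n2 - 1) * var(vc)) / (n1 + n2 - 2)
se  <- sqrt(sp2 * (1 / n1 + 1 / n2))
t_stat <- (mean(oj) - mean(vc)) / se     # OJ minus VC, as in t.test()
dof <- n1 + n2 - 2
p_val <- 2 * pt(-abs(t_stat), dof)

res <- t.test(len ~ supp, data = ToothGrowth, var.equal = TRUE)
print(c(t = t_stat, df = dof, p = p_val))
```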
Question: Assume that the variances in the two groups are not equal, and test the equality of the means again.
Solution:
> t.test(len ~ supp, data = ToothGrowth, var.equal = FALSE)
Welch Two Sample t-test
data: len by supp
t = 1.9153, df = 55.309, p-value = 0.06063
alternative hypothesis: true difference in means between group OJ and group VC is not equal to 0
95 percent confidence interval:
-0.1710156 7.5710156
sample estimates:
mean in group OJ mean in group VC
20.66333 16.96333
Because the variances are assumed to be unequal, R performs Welch’s two-sample t-test, which uses an approximate, non-integer number of degrees of freedom.
The p-value (0.06063) is again greater than 0.05, so we do not reject the null hypothesis that the true means are equal at the 5% level of significance.
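Welch's test keeps the two variances separate and approximates the degrees of freedom with the Welch-Satterthwaite formula. The sketch below computes both by hand and compares them with t.test(). The object names are illustrative.

```r
# Welch t statistic and Welch-Satterthwaite degrees of freedom, by hand
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]
v1 <- var(oj) / length(oj)   # squared standard error of the OJ mean
v2 <- var(vc) / length(vc)   # squared standard error of the VC mean
t_welch  <- (mean(oj) - mean(vc)) / sqrt(v1 + v2)
df_welch <- (v1 + v2)^2 / (v1^2 / (length(oj) - 1) + v2^2 / (length(vc) - 1))
p_welch  <- 2 * pt(-abs(t_welch), df_welch)

res_welch  <- t.test(len ~ supp, data = ToothGrowth)  # var.equal = FALSE is the default
res_pooled <- t.test(len ~ supp, data = ToothGrowth, var.equal = TRUE)
print(c(t = t_welch, df = df_welch, p = p_welch))
```

Both groups contain 30 guinea pigs, so the pooled and Welch standard errors are algebraically identical. That is why t = 1.9153 in both outputs. Only the degrees of freedom (58 versus 55.309), and therefore the p-value, differ.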
If the observations come in dependent pairs, such as two measurements on the same subject, we may conduct a paired-sample t-test.
Suppose we have the scores of 15 students before a certain course and after the course is completed:
> before <- c(68, 63, 77, 74, 71, 68, 75, 64, 76, 73, 70, 75, 58, 61, 67)
> after <- c(77, 69, 72, 80, 82, 68, 83, 72, 74, 72, 58, 68, 67, 75, 79)
In this case a paired-sample t-test should be conducted with a one-sided alternative that tests whether the scores have improved. The test works on the differences after - before, so no equal-variance assumption about the two sets of scores is needed (var.equal is ignored when paired = TRUE):
> t.test(after, before, paired = TRUE,
+ alternative = "greater")
Paired t-test
data: after and before
t = 1.8701, df = 14, p-value = 0.04126
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
0.2171491 Inf
sample estimates:
mean of the differences
3.733333
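Since the p-value (0.04126) is less than 0.05, we reject the null hypothesis of no mean difference. We conclude, at the 5% level of significance, that the scores improved after the course, by 3.73 points on average.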
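A paired test works on the within-student differences, so it is equivalent to a one-sample t-test of those differences against 0. The sketch below checks this equivalence with base R's t.test(). The names d, res_paired and res_diff are illustrative.

```r
# A paired t-test is a one-sample t-test on the within-student differences
before <- c(68, 63, 77, 74, 71, 68, 75, 64, 76, 73, 70, 75, 58, 61, 67)
after  <- c(77, 69, 72, 80, 82, 68, 83, 72, 74, 72, 58, 68, 67, 75, 79)
d <- after - before                      # improvement for each student

res_paired <- t.test(after, before, paired = TRUE, alternative = "greater")
res_diff   <- t.test(d, mu = 0, alternative = "greater")

print(c(mean_diff = mean(d), t = unname(res_diff$statistic), p = res_diff$p.value))
```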
If we treat len as the response variable and supp as the explanatory variable in a linear regression, the significance of the supp coefficient is assessed with a t-test:
> summary(lm(len ~ as.factor(supp), data = ToothGrowth))
Call:
lm(formula = len ~ as.factor(supp), data = ToothGrowth)
Residuals:
     Min       1Q   Median       3Q      Max
-12.7633  -5.7633   0.4367   5.5867  16.9367
Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)         20.663      1.366  15.127   <2e-16 ***
as.factor(supp)VC   -3.700      1.932  -1.915   0.0604 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.482 on 58 degrees of freedom
Multiple R-squared: 0.05948, Adjusted R-squared: 0.04327
F-statistic: 3.668 on 1 and 58 DF, p-value: 0.06039
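In the model output, the coefficient for VC has t = -1.915 with a p-value of 0.0604. A separate two-sample t-test gives the same p-value. The t statistic has the same magnitude but the opposite sign, because the coefficient estimates the VC mean minus the OJ mean (16.963 - 20.663 = -3.700).
Ordinary least-squares regression assumes a constant error variance across groups (homoscedasticity), so the matching t-test is the pooled-variance version, and var.equal = TRUE is used: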
> t.test(len ~ supp, data = ToothGrowth, var.equal = TRUE)
Two Sample t-test
data: len by supp
t = 1.9153, df = 58, p-value = 0.06039
alternative hypothesis: true difference in means between group OJ and group VC is not equal to 0
95 percent confidence interval:
-0.1670064 7.5670064
sample estimates:
mean in group OJ mean in group VC
20.66333 16.96333
Here both the p-value (0.06039) and the magnitude of the t statistic (1.9153) are the same as in the regression summary output above.
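The equivalence can also be checked programmatically. This sketch fits the model with supp directly. Because supp is already a factor, as.factor() is not needed, and the coefficient is named suppVC rather than as.factor(supp)VC. The sketch reads the t value from the coefficient matrix returned by summary() and compares it with t.test(). It also checks that the overall F statistic equals the square of t, which holds for a single two-level predictor (1.9153^2 is about 3.668). The object names are illustrative.

```r
# Compare the regression t value for supp with the pooled two-sample t-test
fit <- lm(len ~ supp, data = ToothGrowth)   # supp is already a factor
coefs <- summary(fit)$coefficients
t_reg <- coefs["suppVC", "t value"]
p_reg <- coefs["suppVC", "Pr(>|t|)"]

tt <- t.test(len ~ supp, data = ToothGrowth, var.equal = TRUE)

# Same magnitude, opposite sign: the coefficient estimates VC - OJ,
# while t.test() reports OJ - VC
print(c(regression_t = t_reg, t_test_t = unname(tt$statistic)))

# With a single two-level predictor, the overall F statistic is t squared
f_stat <- unname(summary(fit)$fstatistic["value"])
print(c(F = f_stat, t_squared = t_reg^2))
```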