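This document uses the ToothGrowth dataset, which is included with R. The dataset records the effect of vitamin C on tooth growth in guinea pigs: each of 60 animals received one of three doses (0.5, 1 or 2 mg/day) by one of two delivery methods, orange juice (OJ) or ascorbic acid (VC). The response, len, is the length of odontoblasts (the cells responsible for tooth growth) and is referred to below as tooth length.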
Terms that are highlighted are defined at the bottom of the document.
library(pander)
library(ggplot2)
library(dplyr)
A t-test is an inferential statistical test used to determine whether there is a significant difference between the means of two groups, or between the mean of one group and a hypothesized value.
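A t-test is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic (the population standard deviation) were known. Because that term is estimated from the sample, the statistic follows a Student's t distribution instead, and the data are usually checked for approximate normality first. The data will not always be approximately normal, but we will get to non-normal distributions later.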
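To make the role of the scaling term concrete, here is a minimal sketch (an illustration, not a copy of the internals of t.test()) that computes the one-sample t statistic by hand as t = (mean(x) - mu) / (s / sqrt(n)), with the sample standard deviation s standing in for the unknown population value. The hypothesized mean mu = 18 is chosen only for illustration.

```r
# One-sample t statistic computed by hand.
# The unknown population standard deviation (the "scaling term") is replaced
# by the sample standard deviation, so the statistic follows a t distribution
# with n - 1 degrees of freedom rather than a normal distribution.
x  <- ToothGrowth$len
mu <- 18                          # hypothesized mean (illustrative choice)
n  <- length(x)
se <- sd(x) / sqrt(n)             # estimated standard error of the mean

t_manual <- (mean(x) - mu) / se
p_manual <- 2 * pt(-abs(t_manual), df = n - 1)   # two-sided p-value

# Compare with R's built-in one-sample t-test
builtin <- t.test(x, mu = mu)
c(manual_t = t_manual, builtin_t = unname(builtin$statistic))
c(manual_p = p_manual, builtin_p = builtin$p.value)
```

Both pairs of numbers should agree, which shows that the only difference from a z-test is the estimated standard error and the t reference distribution.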
hist(ToothGrowth$len, main = "Tooth Length: Histogram", xlab = "Length")
The histogram above looks roughly bell-shaped, suggesting an approximately normal distribution. To check this more formally, run the Shapiro-Wilk test of normality.
The Shapiro-Wilk test is a general normality test designed to detect departures from normality of any kind, and it generally has good power compared with other common normality tests. Its null hypothesis is that the data come from a normal distribution; at the 0.05 significance level, the test rejects normality when the p-value is less than or equal to 0.05.
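The decision rule can be wrapped in a small helper. Note that is_plausibly_normal() is a hypothetical name, not part of base R; the simulated exponential sample is only there as a clearly skewed contrast, and the seed is arbitrary.

```r
# Hypothetical helper (not part of base R): applies the 0.05 decision rule
# to the Shapiro-Wilk p-value. TRUE means "no evidence against normality",
# not proof that the data are normal.
is_plausibly_normal <- function(x, alpha = 0.05) {
  shapiro.test(x)$p.value > alpha
}

# Tooth lengths from ToothGrowth
is_plausibly_normal(ToothGrowth$len)

# Strongly right-skewed simulated data for contrast;
# with 200 exponential draws the test is expected to reject normality
set.seed(1)
is_plausibly_normal(rexp(200))
```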
shapiro <- shapiro.test(ToothGrowth$len)
pander(shapiro)
Test statistic | P value |
---|---|
0.9674 | 0.1091 |
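The Shapiro-Wilk p-value for len in the ToothGrowth dataset is 0.1091, which is greater than 0.05, so we fail to reject the hypothesis of normality. This does not prove the data are normal, but they are consistent with normality and therefore suitable for a t-test.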
ggplot(ToothGrowth, aes(x = supp, y = len, fill = supp)) +
geom_violin() +
labs(title = "Vitamin C Supplement Effects on Guinea Pig Tooth Growth", x = "Supplement", y = "Tooth Length",
fill = "Supplement") +
theme_bw()
A two-sample t-test compares a quantitative variable between two groups. Here, the question is whether the two vitamin C delivery methods, orange juice (OJ) and ascorbic acid (VC), produce significantly different mean tooth lengths.
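Before running the test, a quick per-group summary shows what is being compared. This sketch uses base R's tapply(); the data frame name grp_stats and its column names are illustrative choices.

```r
# Per-group sample size, mean and standard deviation of tooth length.
# tapply() returns results in the order of the factor levels of supp,
# so levels(ToothGrowth$supp) lines up with each column.
grp_stats <- data.frame(
  supp = levels(ToothGrowth$supp),
  n    = as.vector(tapply(ToothGrowth$len, ToothGrowth$supp, length)),
  mean = as.vector(tapply(ToothGrowth$len, ToothGrowth$supp, mean)),
  sd   = as.vector(tapply(ToothGrowth$len, ToothGrowth$supp, sd))
)
grp_stats
```

The two groups are the same size, and the difference between their means is what the two-sample test assesses relative to the within-group spread.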
header <- head(ToothGrowth)
pander(header)
len | supp | dose |
---|---|---|
4.2 | VC | 0.5 |
11.5 | VC | 0.5 |
7.3 | VC | 0.5 |
5.8 | VC | 0.5 |
6.4 | VC | 0.5 |
10 | VC | 0.5 |
A one-sample t-test checks whether the population mean equals a specified value. For reference, the sample mean of the tooth lengths in the ToothGrowth dataset is…
pander(mean(ToothGrowth$len))
18.81
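Now that we know the sample mean (18.81), we can run a one-sample t-test of the null hypothesis that the population mean tooth length is 18, a value close to the observed mean. By default, t.test() uses a two-sided alternative.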
t <- t.test(ToothGrowth$len, mu = 18)
pander(t)
Test statistic | df | P value | Alternative hypothesis | mean of x |
---|---|---|---|---|
0.8236 | 59 | 0.4135 | two.sided | 18.81 |
The p-value (0.4135) is greater than 0.05, so we fail to reject the null hypothesis that the mean tooth length is 18. This does not prove the mean is exactly 18; it only means the data give no significant evidence against it.
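The confidence interval that t.test() also returns tells the same story: a two-sided test at the 0.05 level fails to reject H0 exactly when the hypothesized mean lies inside the 95% confidence interval. A minimal sketch:

```r
# One-sample t-test against mu = 18, keeping the full result object
res <- t.test(ToothGrowth$len, mu = 18, conf.level = 0.95)

ci <- res$conf.int          # lower and upper bounds of the 95% CI
ci

# 18 lies inside the interval, consistent with p > 0.05
ci[1] <= 18 && 18 <= ci[2]
```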
As another example, let's use a one-sided test to check whether the mean tooth length is greater than 5.
pander(t.test(ToothGrowth$len, alternative = "greater", mu = 5))
Test statistic | df | P value | Alternative hypothesis | mean of x |
---|---|---|---|---|
13.99 | 59 | 1.079e-20 * * * | greater | 18.81 |
The p-value (1.079e-20) is far below 0.05, so we reject the null hypothesis and conclude that the mean tooth length is greater than 5.
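Usage: t.test(x, alternative = "greater", mu = 5), where x is the numeric vector being tested, alternative is one of "two.sided" (the default), "less" or "greater", and mu is the hypothesized mean (default 0).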
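To see how the alternative argument changes the result, the sketch below runs the same one-sample test against mu = 18 with each of the three options. Because the observed mean is above 18 (t > 0), the "greater" p-value should be half the two-sided one, and the "less" and "greater" p-values should sum to 1.

```r
# Run the same one-sample test with each alternative hypothesis
alts <- c("two.sided", "less", "greater")

# sapply() over a character vector names the results by those strings
p_vals <- sapply(alts, function(a) {
  t.test(ToothGrowth$len, alternative = a, mu = 18)$p.value
})
p_vals
```

Choose the alternative before looking at the data; picking the one-sided direction after seeing the sample mean effectively halves the p-value without justification.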
A two-sample t-test, unlike the one-sample test, looks for a difference between the means of two populations.
The null hypothesis is as follows:
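H0: There is no difference between the population means of the two groups. (Hint: a p-value below 0.05 would lead us to reject H0 and conclude that the means differ.) The call below uses Welch's version of the test (var.equal = FALSE), which does not assume that the two groups have equal variances.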
OJ <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
VC <- ToothGrowth$len[ToothGrowth$supp == "VC"]
tt <- t.test(OJ,VC, paired = FALSE, var.equal = FALSE, conf.level = 0.95)
pander(tt)
Test statistic | df | P value | Alternative hypothesis | mean of x | mean of y |
---|---|---|---|---|---|
1.915 | 55.31 | 0.06063 | two.sided | 20.66 | 16.96 |
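To see where the statistic and the fractional degrees of freedom come from, the sketch below recomputes Welch's t statistic and the Welch-Satterthwaite degrees of freedom by hand. It also shows R's formula interface, t.test(len ~ supp, data = ToothGrowth), which should give the same result because var.equal = FALSE is the default and OJ is the first level of supp.

```r
# Welch two-sample t-test computed by hand
OJ <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
VC <- ToothGrowth$len[ToothGrowth$supp == "VC"]

v1 <- var(OJ) / length(OJ)   # squared standard error of the OJ mean
v2 <- var(VC) / length(VC)   # squared standard error of the VC mean

t_welch  <- (mean(OJ) - mean(VC)) / sqrt(v1 + v2)
# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2 / (length(OJ) - 1) + v2^2 / (length(VC) - 1))
p_welch  <- 2 * pt(-abs(t_welch), df = df_welch)   # two-sided p-value

round(c(t = t_welch, df = df_welch, p = p_welch), 4)

# Same test via the formula interface (groups taken from supp)
tt_formula <- t.test(len ~ supp, data = ToothGrowth)
```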