T-Tests

This document uses the ToothGrowth dataset that ships with R. The dataset contains information on the effect of vitamin C on tooth growth in guinea pigs.

Terms that are highlighted are defined at the bottom of the document.

library(pander)
library(ggplot2)
library(dplyr)

What is a t-test?

A t-test is a type of inferential statistic used to determine if there is a significant difference between the means of two groups (which generally are related in certain features).

A Normal Distribution

A t-test is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known. This is not always the case but we will get to non-normal distributions later.

hist(ToothGrowth$len, main = "Tooth Length: Histogram", xlab = "Length")

The histogram above looks roughly bell-shaped, which suggests the data may be normally distributed. For a formal check, run the Shapiro-Wilk test of normality.

Shapiro-Wilk test of Normality

The Shapiro-Wilk test for normality is a general normality test designed to detect all departures from normality. It is comparable in power to the other two tests. The test rejects the hypothesis of normality when the p-value is less than or equal to 0.05.

shapiro <- shapiro.test(ToothGrowth$len)
pander(shapiro)

Shapiro-Wilk normality test: `ToothGrowth$len`
Test statistic	P value
0.9674	0.1091

The p-value for the Shapiro-Wilk test on the ToothGrowth tooth lengths is 0.1091, which is greater than 0.05. We therefore do not reject the hypothesis of normality, and the data are suitable for our t-test.
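Because the later comparison splits the data by supplement, it can also be useful to check each group on its own. The following is a minimal sketch using base R only; the name `group_p` is illustrative and not part of the original analysis.

```r
# Shapiro-Wilk p-value for each supplement group (base R only).
# `group_p` is an illustrative name.
group_p <- tapply(ToothGrowth$len, ToothGrowth$supp,
                  function(x) shapiro.test(x)$p.value)
print(group_p)

# TRUE where normality is not rejected at the 0.05 level
print(group_p > 0.05)
```

With only 30 observations per group, a formal test has limited power, so it is reasonable to read these p-values alongside a plot rather than as a final verdict.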

Visualization

ggplot(ToothGrowth, aes(x = supp, y = len, fill = supp)) +
  geom_violin() +
  labs(title = "Vitamin C Supplement Effects on Guinea Pig Tooth Growth", x = "Supplement", y = "Tooth Length",
       fill = "Toothgrowth") +
  theme_bw()

Running the tests

A t-test compares a quantitative variable between two groups. In this case, we want to know whether the two vitamin C supplements (coded OJ and VC in the supp column) produce significantly different mean tooth lengths.

header <- head(ToothGrowth)
pander(header)

len	supp	dose
4.2	VC	0.5
11.5	VC	0.5
7.3	VC	0.5
5.8	VC	0.5
6.4	VC	0.5
10	VC	0.5
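Before testing, it helps to look at the size, mean, and spread of each group. This is a small sketch using dplyr, which is already loaded above; the names `group_summary`, `mean_len`, and `sd_len` are illustrative.

```r
library(dplyr)

# Per-supplement sample size, mean, and standard deviation of tooth length.
# `group_summary` and its column names are illustrative.
group_summary <- ToothGrowth %>%
  group_by(supp) %>%
  summarise(n = n(), mean_len = mean(len), sd_len = sd(len))

print(group_summary)
```

The two group means reported here are the same quantities that the two-group t-test compares.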

One sample t-test

With the one sample t-test, we test whether the population mean is equal to a certain number. For example, the mean value of the tooth length from the ToothGrowth dataset is…

pander(mean(ToothGrowth$len))

18.81

Now that we know the sample mean is 18.81, we can run a one sample t-test to check whether the population mean could plausibly be a nearby value such as 18.

t <- t.test(ToothGrowth$len, mu = 18)
pander(t)

One Sample t-test: `ToothGrowth$len`
Test statistic	df	P value	Alternative hypothesis	mean of x
0.8236	59	0.4135	two.sided	18.81

The p-value of the t-test (0.4135) is greater than 0.05. This means we fail to reject the null hypothesis that the mean tooth length is 18.
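To make the output less of a black box, the test statistic and p-value can be reproduced by hand. This is a sketch under the usual one-sample formula, t = (sample mean - hypothesized mean) / (s / sqrt(n)); the variable names are illustrative.

```r
x   <- ToothGrowth$len
mu0 <- 18                     # hypothesized mean, as in the t.test() call
n   <- length(x)

# t = (sample mean - hypothesized mean) / standard error of the mean
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_two_sided <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)

print(c(t = t_stat, df = n - 1, p = p_two_sided))
```

These values should agree with the test statistic, df, and p-value in the table above.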

As an example of a one-sided alternative, let's test whether the mean is greater than 5.

pander(t.test(ToothGrowth$len, alternative = "greater", mu = 5))

One Sample t-test: `ToothGrowth$len`
Test statistic	df	P value	Alternative hypothesis	mean of x
13.99	59	1.079e-20 * * *	greater	18.81

The p-value is far less than 0.05, so we reject the null hypothesis; the data give strong evidence that the mean tooth length is greater than 5.

Setting up the function

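t.test(x, alternative = "two.sided", mu = 0)

Here x is a numeric vector of data, alternative is one of "two.sided" (the default), "less", or "greater", and mu is the hypothesized mean to test against (default 0).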
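To see how the alternative argument changes the result, the same data and the same mu can be run through all three options. This is an illustrative sketch; the variable names are not part of the original analysis.

```r
x <- ToothGrowth$len

# Same data and mu, three different alternative hypotheses
p_two     <- t.test(x, mu = 5, alternative = "two.sided")$p.value
p_greater <- t.test(x, mu = 5, alternative = "greater")$p.value
p_less    <- t.test(x, mu = 5, alternative = "less")$p.value

print(c(two.sided = p_two, greater = p_greater, less = p_less))
```

Because the sample mean lies above 5, the "greater" p-value is half of the two-sided one, and the "less" p-value is one minus the "greater" p-value.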

Two sample t-test

A two sample t-test, unlike the one sample test, looks for a difference in the means of two populations.

The null hypothesis is as follows:

H0: There is no difference in the population means of the two groups. (Hint: A p-value of less than 0.05 would lead us to reject H0 and conclude there is a difference.)

OJ <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
VC <- ToothGrowth$len[ToothGrowth$supp == "VC"]
tt <- t.test(OJ, VC, paired = FALSE, var.equal = FALSE, conf.level = 0.95)
pander(tt)

Welch Two Sample t-test: `OJ` and `VC`
Test statistic	df	P value	Alternative hypothesis	mean of x	mean of y
1.915	55.31	0.06063	two.sided	20.66	16.96
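With var.equal = FALSE, R runs Welch's version of the test, which does not assume the two groups share a variance. The sketch below reproduces the statistic, degrees of freedom, and p-value by hand using the standard Welch formulas; the variable names are illustrative.

```r
OJ <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
VC <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard error of each group mean: s^2 / n
se2_oj <- var(OJ) / length(OJ)
se2_vc <- var(VC) / length(VC)

# Welch t statistic: difference in means over the combined standard error
t_welch <- (mean(OJ) - mean(VC)) / sqrt(se2_oj + se2_vc)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(OJ) - 1) + se2_vc^2 / (length(VC) - 1))

# Two-sided p-value from the t distribution with df_welch degrees of freedom
p_welch <- 2 * pt(abs(t_welch), df = df_welch, lower.tail = FALSE)
print(c(t = t_welch, df = df_welch, p = p_welch))
```

The non-integer degrees of freedom in the table (55.31) come from this approximation rather than from a simple count of observations.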