Module 6: T-test

Objectives

Know what a t-test is
Know the difference between a one-sample t-test and a two-sample t-test
Know the difference between a one-sided t-test and a two-sided t-test
Know the difference between paired and unpaired data

Introduction

A t-test is any hypothesis test in which the test statistic follows the t-distribution if the null hypothesis is true.

Some common t-tests are:

one-sample t-test:
It is used to determine whether an observed sample mean differs significantly from a hypothesized population mean.
two-sample t-test:
It is used to determine whether the difference between sample means differs significantly from the hypothesized difference between population means.

In this module, our dataset ExampleData calls for a two-sample t-test.

To perform a t-test in R, we use the following general command:

First load the data ExampleData into R.
Type: t.test(ExampleData$variable, mu = the null hypothesis value, paired = TRUE/FALSE, alternative = "greater"/"less"/"two.sided")
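A minimal one-sample sketch of this command follows. The vector conc and the null value 0.5 are made up for illustration and are not part of ExampleData. The paired argument is left out because it only matters when two samples are passed.

```r
# Hypothetical measurements (illustration only, not from ExampleData)
conc <- c(0.52, 0.48, 0.55, 0.61, 0.47, 0.58)

# One-sample, two-sided test of H0: population mean = 0.5
res <- t.test(conc, mu = 0.5, alternative = "two.sided")

res$statistic  # t statistic
res$parameter  # degrees of freedom (n - 1)
res$p.value    # p-value
```

The result is a list, so individual pieces such as the p-value can be pulled out with $ instead of being read off the printed summary.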

We will now go through the various components of this command.

Paired & Unpaired Data

Paired data are also known as related or matched data. Paired samples or paired comparisons occur when a single individual (or population of individuals) is tested twice (e.g. before and after). Another possible use occurs when an individual is divided and the two parts are then subjected to different treatments.
Unpaired samples or comparisons occur when a single individual (or population of individuals) is measured or assayed only once. There will, therefore, be two completely separate groups of observations comprising the data set for the two samples.

In R, if we have paired data, type paired=TRUE;
If we have unpaired data, type paired=FALSE.
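The difference between the two settings can be sketched as follows. The before and after vectors are hypothetical measurements on the same five subjects, invented for illustration. The last call shows that a paired t-test gives the same result as a one-sample t-test on the within-subject differences.

```r
# Hypothetical before/after measurements on the same five subjects
before <- c(0.11, 0.13, 0.15, 0.21, 0.19)
after  <- c(0.16, 0.15, 0.21, 0.24, 0.26)

# Paired: each 'before' value is matched with the 'after' value in the same position
paired_res <- t.test(before, after, paired = TRUE)

# Unpaired: the two vectors are treated as independent groups
unpaired_res <- t.test(before, after, paired = FALSE)

# A paired t-test is equivalent to a one-sample t-test on the differences
diff_res <- t.test(before - after, mu = 0)

paired_res$p.value
unpaired_res$p.value
diff_res$p.value
```

Because pairing depends on position, the two vectors must have the same length and the same subject order when paired = TRUE.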

One-Sided & Two-Sided T-tests

Our alternative hypothesis determines whether our t-tests are one-sided or two-sided.
For example, suppose we are testing whether the treatment changes the concentration. We can let our null hypothesis be that the concentration of the treatment group (Ct) and the concentration of the control group (Cc) are equal (H0: Ct = Cc).

But what is our alternative hypothesis? It can take on two forms:

The alternative could be that the concentration of the treatment group is either larger or smaller than the concentration of the control group, that is, Ha: Ct > Cc, or Ha: Ct < Cc. If either of the two is the case, we will conduct a one-sided t-test. In R, with the treatment group given as the first sample, we would type: alternative = "greater" for Ha: Ct > Cc, or alternative = "less" for Ha: Ct < Cc.
The alternative could be that the concentration of the treatment group is not equal to the concentration of the control group (Ha: Ct ≠ Cc). If this is the case, then we will conduct a two-sided t-test. In R, we would type: alternative = "two.sided". (This is the default if you don't tell R otherwise.)
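The three choices of alternative can be compared side by side. The vectors ct and cc below are hypothetical concentrations, invented for illustration. The direction of "greater" and "less" refers to the first sample relative to the second.

```r
# Hypothetical concentrations (illustration only)
ct <- c(0.70, 0.66, 0.78, 0.71, 0.70)  # treatment group
cc <- c(0.61, 0.69, 0.64, 0.72, 0.60)  # control group

# Ha: Ct > Cc
greater_res <- t.test(ct, cc, alternative = "greater")

# Ha: Ct < Cc
less_res <- t.test(ct, cc, alternative = "less")

# Ha: Ct != Cc (the default)
two_sided_res <- t.test(ct, cc, alternative = "two.sided")

greater_res$p.value
less_res$p.value
two_sided_res$p.value
```

The two one-sided p-values add up to 1, and the two-sided p-value is twice the smaller of them. The alternative should therefore be chosen before looking at the data.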

Example

Our dataset in this case needs a two-sample t-test because it has two groups, the treatment group and the control group, and we want to compare their sample means. Also, our dataset is unpaired since we have two completely separate groups of observations.
The following is the command to conduct a t-test using our dataset:

treat0=c(0.11,0.13,0.15,0.21,0.19)
treat1=c(0.70,0.66,0.78,0.71,0.70)
t.test(treat0, treat1, paired = FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  treat0 and treat1
## t = -20.51, df = 7.98, p-value = 3.44e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.6141 -0.4899
## sample estimates:
## mean of x mean of y 
##     0.158     0.710

# If we have a lot of observations, we can use the function subset() to split the data into groups.
# subset() returns a data frame, so pass the measurement column of each group to t.test().
# treat0 = subset(ExampleData, treatment == "0")
# treat1 = subset(ExampleData, treatment == "1")

We obtained a p-value much smaller than 0.05, so we reject the null hypothesis and conclude that the average concentrations of the two groups are significantly different.
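To see where the numbers in the output come from, the test can be recomputed by hand. This is a sketch using the standard Welch formulas, which match the "Welch Two Sample t-test" header in the output. The variable names v0, v1, t_manual, df_manual and p_manual are introduced here for illustration.

```r
treat0 <- c(0.11, 0.13, 0.15, 0.21, 0.19)
treat1 <- c(0.70, 0.66, 0.78, 0.71, 0.70)

# Squared standard error contributed by each group: s^2 / n
v0 <- var(treat0) / length(treat0)
v1 <- var(treat1) / length(treat1)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_manual  <- (mean(treat0) - mean(treat1)) / sqrt(v0 + v1)
df_manual <- (v0 + v1)^2 /
  (v0^2 / (length(treat0) - 1) + v1^2 / (length(treat1) - 1))

# Two-sided p-value from the t-distribution
p_manual <- 2 * pt(-abs(t_manual), df = df_manual)

# Compare with t.test(); the two rows should agree
res <- t.test(treat0, treat1, paired = FALSE)
c(t = t_manual, df = df_manual, p = p_manual)
c(t = unname(res$statistic), df = unname(res$parameter), p = res$p.value)
```

The degrees of freedom are not a whole number because the Welch test does not assume that the two groups have equal variances.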

Questions to Submit

Q1. Describe your dataset (one-sample/two-sample, paired/unpaired).

Q2. Conduct an appropriate t-test using your dataset.

# Insert your code here

Q3. Interpret the result of your t-test.

Answers

Q1.
The dataset is unpaired, but it is neither a one-sample nor a two-sample dataset since it contains three groups. To compare sample means using t-tests, we need to conduct a t-test between every pair of groups because a t-test cannot compare more than 2 groups at one time. In this case, the question does not say whether we should use a one-sided or a two-sided test, so we choose the default, which is the two-sided t-test.

Q2.

# First load your data into R using the following command.
# (fetchGoogle() needs the RCurl package; install it first if R reports it as missing.)
Rdata3 = fetchGoogle("https://docs.google.com/spreadsheet/pub?key=0AnFamthOzwySdGNWZHRLc2hiTjRJZ251ZUlEVktjR2c&output=csv")

# Split the data into one group per inducer
water = subset(Rdata3, inducer == "water")
lactose = subset(Rdata3, inducer == "lactose")
raff = subset(Rdata3, inducer == "raffinose")

# Run an unpaired two-sample t-test for each pair of groups
t.test(water$IU, lactose$IU, paired = FALSE)
t.test(water$IU, raff$IU, paired = FALSE)
t.test(lactose$IU, raff$IU, paired = FALSE)

(T-tests are not the most appropriate method if we have more than 2 groups. We should use ANOVA instead, which we will talk about in the following modules.)

Q3.
Remember that the null hypothesis is that the population means of the two groups being compared are equal. If we reject the null, we conclude that the means are different.

The p-value in the t-test for water and lactose groups is smaller than 0.05, so we conclude there is a significant difference between the average IU values for water and lactose groups.
The p-value for water and raffinose groups is also smaller than 0.05, so we conclude that there is a significant difference between the average IU values for water and raffinose groups.
The p-value for lactose and raffinose groups is larger than 0.05, so we fail to reject the null hypothesis: there is no significant difference between the average IU values for lactose and raffinose groups. (This does not prove that the two means are equal.)