Two equivalent ways to set up a paired t-test in R

First, I’ll set up some random data. Let’s say this is the heart rate of 7 students before they take my test and after the test. This first chunk of R code just makes the data and can be ignored.

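# Make some fake data.
# No seed is set, so the values change on every run; call set.seed() first for reproducible numbers.
students <- c("Zetta", "Jude", "April", "Hendrick", "Cindy", "Modou", "Percy")

n <- length(students)

# Heart rate before the test: uniform between 45 and 65
before.test <- round(runif(n, 45, 65), 1)

# Heart rate after the test: 10 higher on average, plus normal noise (sd = 3)
after.test <- round(before.test + 10 + rnorm(n, 0, 3), 1)

# Two numeric columns: one per time point
data.2.num.columns <- data.frame(students, before.test, after.test)

# One numeric column: all measurements stacked, with the time point in "when"
data.1.num.column <- data.frame(students = rep(students, 2),
                                heart.rate = c(before.test, after.test),
                                when = c(rep("before", n), rep("after", n)))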

Data for a paired t-test can be organized in two different ways. First, each of the two measurements on a subject/object/etc. can be in a separate column. There are therefore two columns of numeric data: one column for the 1st time point (before) and the other for the 2nd time point (after).

data.2.num.columns
##   students before.test after.test
## 1    Zetta        59.1       68.3
## 2     Jude        50.6       61.4
## 3    April        48.7       61.2
## 4 Hendrick        47.6       58.3
## 5    Cindy        48.5       56.9
## 6    Modou        63.1       72.9
## 7    Percy        48.4       49.0

Alternatively, the data can be set up with all of the numeric data in a single column. That is, the “before” data are stacked in a single column on top of the “after” data. Then, which time an observation corresponds to is indicated in another column. Here, the 1st 7 rows are from the “before” time period and the 2nd 7 rows are from the “after” time period.

data.1.num.column
##    students heart.rate   when
## 1     Zetta       59.1 before
## 2      Jude       50.6 before
## 3     April       48.7 before
## 4  Hendrick       47.6 before
## 5     Cindy       48.5 before
## 6     Modou       63.1 before
## 7     Percy       48.4 before
## 8     Zetta       68.3  after
## 9      Jude       61.4  after
## 10    April       61.2  after
## 11 Hendrick       58.3  after
## 12    Cindy       56.9  after
## 13    Modou       72.9  after
## 14    Percy       49.0  after

Note that the names of the subjects occur in the exact same order within both groups. The first person listed is Zetta in row 1, which is a “before” measurement. The first “after” measurement (row 8) must therefore also be Zetta. The 2nd “before” row is for Jude, and therefore the 2nd “after” measurement must be for Jude (row 9). If the order is not perfect and you do a paired t-test on the data, the answer will be wrong, because R pairs the rows by position, not by name.
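The ordering requirement can be checked before running any test. The following is a minimal sketch, not part of the original analysis: it rebuilds the stacked data from the table above, scrambles a few rows on purpose, and uses a hypothetical helper called `pairs_aligned()` (a name made up for this illustration) to test whether the i-th “before” row and the i-th “after” row belong to the same student. It assumes each student appears exactly once per time point.

```r
# Stacked data copied from the table above
long <- data.frame(
  students   = rep(c("Zetta", "Jude", "April", "Hendrick",
                     "Cindy", "Modou", "Percy"), 2),
  heart.rate = c(59.1, 50.6, 48.7, 47.6, 48.5, 63.1, 48.4,
                 68.3, 61.4, 61.2, 58.3, 56.9, 72.9, 49.0),
  when       = rep(c("before", "after"), each = 7)
)

# Break the matching order on purpose: move April to the top of the "before" rows
shuffled <- long[c(3, 1, 2, 4:14), ]

# Hypothetical helper: TRUE when the i-th "before" row and the
# i-th "after" row refer to the same student
pairs_aligned <- function(d) {
  identical(as.character(d$students[d$when == "before"]),
            as.character(d$students[d$when == "after"]))
}

pairs_aligned(long)      # original order
pairs_aligned(shuffled)  # scrambled order

# Sorting by time point and then by student restores the alignment
fixed <- shuffled[order(shuffled$when, shuffled$students), ]
pairs_aligned(fixed)
```

The first and last calls should return `TRUE` and the middle one `FALSE`. Sorting changes the row order but not the test result, as long as both time points end up in the same student order.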

Paired t-test on two columns of data

For the first data set, we can calculate the difference between the two measurements for each student and then do a one-sample t-test on those before-minus-after differences.

First, do the math

data.2.num.columns$before.minus.after <- data.2.num.columns$before.test - data.2.num.columns$after.test

Then do the one-sample t-test

t.test(data.2.num.columns$before.minus.after,
        mu = 0)
## 
##  One Sample t-test
## 
## data:  data.2.num.columns$before.minus.after
## t = -6.0561, df = 6, p-value = 0.0009185
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -12.435813  -5.278473
## sample estimates:
## mean of x 
## -8.857143
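To see where these numbers come from, the one-sample t statistic is the mean difference divided by its standard error, t = mean(d) / (sd(d) / sqrt(n)), with n - 1 degrees of freedom. The sketch below recomputes it by hand from the values printed in the table above; the variable names (`before`, `after`, `d`, `t.stat`, `p.value`) are only for illustration.

```r
# Values copied from the two-column table above
before <- c(59.1, 50.6, 48.7, 47.6, 48.5, 63.1, 48.4)
after  <- c(68.3, 61.4, 61.2, 58.3, 56.9, 72.9, 49.0)

d <- before - after   # one difference per student
n <- length(d)

# t statistic: mean difference divided by its standard error
t.stat <- mean(d) / (sd(d) / sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p.value <- 2 * pt(-abs(t.stat), df = n - 1)

round(c(mean = mean(d), t = t.stat, df = n - 1, p = p.value), 4)
```

The rounded values should agree with the `t.test()` output above: a mean of about -8.857, t of about -6.056 on 6 degrees of freedom, and a p-value of about 0.0009.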

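Note that you get the exact same answer if you do after minus before. Only the signs flip: the t-value, the mean difference, and the confidence limits change sign, while the p-value and degrees of freedom stay the same.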

data.2.num.columns$after.minus.before <- data.2.num.columns$after.test - data.2.num.columns$before.test

t.test(data.2.num.columns$after.minus.before,
        mu = 0)
## 
##  One Sample t-test
## 
## data:  data.2.num.columns$after.minus.before
## t = 6.0561, df = 6, p-value = 0.0009185
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##   5.278473 12.435813
## sample estimates:
## mean of x 
##  8.857143
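The subtraction step can also be left to R. As a small sketch (again using the values printed in the table rather than freshly generated random data), passing the two columns to `t.test()` with `paired = TRUE` computes the first argument minus the second and should reproduce the one-sample result. The object names `wide`, `paired.result`, and `one.sample.result` are made up for this illustration.

```r
# Two-column data copied from the table above
wide <- data.frame(
  students    = c("Zetta", "Jude", "April", "Hendrick",
                  "Cindy", "Modou", "Percy"),
  before.test = c(59.1, 50.6, 48.7, 47.6, 48.5, 63.1, 48.4),
  after.test  = c(68.3, 61.4, 61.2, 58.3, 56.9, 72.9, 49.0)
)

# Paired test on the two columns: R computes after.test - before.test internally
paired.result <- t.test(wide$after.test, wide$before.test, paired = TRUE)

# One-sample test on the same differences computed by hand
one.sample.result <- t.test(wide$after.test - wide$before.test, mu = 0)

# The two t statistics should be identical
c(paired     = unname(paired.result$statistic),
  one.sample = unname(one.sample.result$statistic))
```

Swapping the two column arguments flips the sign of the result, just like switching from after minus before to before minus after.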

Paired t-test on one column of data

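We can have R do the math for us if the data is organized with all the numeric data in a single column. However, the data MUST be organized properly: the students must appear in the same order within the “before” rows and the “after” rows. Because “after” comes before “before” alphabetically, R reports the after-minus-before difference here.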

t.test(heart.rate ~ when, 
       data = data.1.num.column,
       paired = TRUE)
## 
##  Paired t-test
## 
## data:  heart.rate by when
## t = 6.0561, df = 6, p-value = 0.0009185
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   5.278473 12.435813
## sample estimates:
## mean of the differences 
##                8.857143
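A caveat worth checking in `?t.test`: recent R releases (4.4.0 and later, as far as I can tell) appear to reject `paired = TRUE` in the formula interface, so the call above may stop with an error on a newer installation. One workaround that does not depend on the formula method, and that also does not depend on row order, is to spread the stacked data into two columns with base R's `reshape()` and then run the paired test on those columns. This is a sketch under those assumptions, using the values from the table above in a deliberately scrambled row order; the names `long` and `wide` are only for illustration.

```r
# Stacked data copied from the table above
long <- data.frame(
  students   = rep(c("Zetta", "Jude", "April", "Hendrick",
                     "Cindy", "Modou", "Percy"), 2),
  heart.rate = c(59.1, 50.6, 48.7, 47.6, 48.5, 63.1, 48.4,
                 68.3, 61.4, 61.2, 58.3, 56.9, 72.9, 49.0),
  when       = rep(c("before", "after"), each = 7)
)

# Scramble the rows on purpose to show that row order no longer matters
long <- long[c(14, 2, 9, 5, 1, 12, 7, 3, 10, 6, 13, 4, 8, 11), ]

# Spread to one row per student; reshape() matches measurements by name
# and creates the columns heart.rate.before and heart.rate.after
wide <- reshape(long,
                idvar     = "students",
                timevar   = "when",
                direction = "wide")

# Paired test on the matched columns (after minus before)
t.test(wide$heart.rate.after, wide$heart.rate.before, paired = TRUE)
```

Because the pairing is done by student name rather than by row position, this should give the same t of about 6.056 on 6 degrees of freedom even though the rows were scrambled.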

Note that I MUST include paired = TRUE. If we forget this, we will run a standard two-sample t-test (by default R uses the Welch version) and get the wrong answer. Try it to see what happens: both the t value and the degrees of freedom will be wrong.
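A quick way to try it is sketched below, using the values printed in the tables above rather than newly generated random numbers. It runs the test without `paired = TRUE` and puts the result next to the paired one; the object names (`long`, `unpaired.result`, `paired.result`) are made up for this illustration.

```r
# Stacked data copied from the table above
long <- data.frame(
  students   = rep(c("Zetta", "Jude", "April", "Hendrick",
                     "Cindy", "Modou", "Percy"), 2),
  heart.rate = c(59.1, 50.6, 48.7, 47.6, 48.5, 63.1, 48.4,
                 68.3, 61.4, 61.2, 58.3, 56.9, 72.9, 49.0),
  when       = rep(c("before", "after"), each = 7)
)

# Forgetting paired = TRUE: R treats the two groups as independent samples
unpaired.result <- t.test(heart.rate ~ when, data = long)

# The correct paired analysis, for comparison
paired.result <- t.test(long$heart.rate[long$when == "after"],
                        long$heart.rate[long$when == "before"],
                        paired = TRUE)

# Side-by-side comparison of the t statistic and degrees of freedom
data.frame(
  test = c("unpaired", "paired"),
  t    = c(unname(unpaired.result$statistic), unname(paired.result$statistic)),
  df   = c(unname(unpaired.result$parameter), unname(paired.result$parameter))
)
```

The estimated difference in means is the same in both analyses, but the unpaired test ignores the fact that each student serves as their own control, so its t statistic is much smaller and its degrees of freedom are no longer 6.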