First, I’ll set up some random data. Let’s say this is the heart rate of 7 students before they take my test and after the test. This first chunk of R code just makes the data and can be ignored.
#Make some fake data
students <- c("Zetta","Jude","April","Hendrick","Cindy", "Modou","Percy")
n <- length(students)
before.test <- round(runif(n, 45,65),1)
after.test <- round(before.test + 10 +
rnorm(n, 0,3),1)
data.2.num.columns <- data.frame(students, before.test,after.test)
data.1.num.column <- data.frame(students = rep(students,2),
heart.rate = c(before.test, after.test),
when = c(rep("before",n), rep("after",n)))
Data for a paired t-test can be set up in two different ways. First, it can be in a format where each of the two measurements on a subject/object/etc. is in a separate column. There are therefore two columns of numeric data: one column for the 1st time point (before) and one column for the 2nd time point (after).
data.2.num.columns
## students before.test after.test
## 1 Zetta 59.1 68.3
## 2 Jude 50.6 61.4
## 3 April 48.7 61.2
## 4 Hendrick 47.6 58.3
## 5 Cindy 48.5 56.9
## 6 Modou 63.1 72.9
## 7 Percy 48.4 49.0
Alternatively, the data can be set up with all of the numeric data in a single column. That is, the “before” data are stacked in a single column on top of the “after” data. Then, which time an observation corresponds to is indicated in another column. Here, the 1st 7 rows are from the “before” time period and the 2nd 7 rows are from the “after” time period.
data.1.num.column
## students heart.rate when
## 1 Zetta 59.1 before
## 2 Jude 50.6 before
## 3 April 48.7 before
## 4 Hendrick 47.6 before
## 5 Cindy 48.5 before
## 6 Modou 63.1 before
## 7 Percy 48.4 before
## 8 Zetta 68.3 after
## 9 Jude 61.4 after
## 10 April 61.2 after
## 11 Hendrick 58.3 after
## 12 Cindy 56.9 after
## 13 Modou 72.9 after
## 14 Percy 49.0 after
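If your data starts out in the two-column format, you don’t have to retype it to get the stacked format. Here is a minimal sketch using base R’s reshape() function. The names wide and long are just ones I made up for this example, and the numbers are copied from the printouts above.

```r
# Two-column ("wide") data, copied from the printout above
wide <- data.frame(
  students    = c("Zetta", "Jude", "April", "Hendrick", "Cindy", "Modou", "Percy"),
  before.test = c(59.1, 50.6, 48.7, 47.6, 48.5, 63.1, 48.4),
  after.test  = c(68.3, 61.4, 61.2, 58.3, 56.9, 72.9, 49.0)
)

# Stack the two numeric columns into one column called heart.rate
# and record which time period each value came from in a column called when
long <- reshape(wide,
                direction = "long",
                varying   = c("before.test", "after.test"),
                v.names   = "heart.rate",
                timevar   = "when",
                times     = c("before", "after"),
                idvar     = "students")

long
```

reshape() stacks the “before” rows on top of the “after” rows and keeps the students in the same order within each block, which is the layout a paired test needs.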
Note that the names of the subjects occur in the exact same order within both groups. The first person listed is Zetta in row 1, which is a “before” measurement. The first “after” measurement (row 8) must therefore also be Zetta. The 2nd “before” row is for Jude, and therefore the 2nd “after” measurement must be for Jude (row 9). R matches the pairs by position, not by name, so if the order is not perfect and you do a paired t-test on the data, the answer will be wrong.
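A quick way to check the order before running the test is to compare the names in the two time periods directly. This is only a sketch: the object names (long, before.rows, after.rows, order.ok, long.sorted) are made up for the example, and the data are typed in from the printout above.

```r
# Stacked ("long") data, copied from the printout above
students <- c("Zetta", "Jude", "April", "Hendrick", "Cindy", "Modou", "Percy")
long <- data.frame(
  students   = rep(students, 2),
  heart.rate = c(59.1, 50.6, 48.7, 47.6, 48.5, 63.1, 48.4,
                 68.3, 61.4, 61.2, 58.3, 56.9, 72.9, 49.0),
  when       = rep(c("before", "after"), each = 7)
)

# Pull out each time period and compare the order of the names
before.rows <- long[long$when == "before", ]
after.rows  <- long[long$when == "after", ]
order.ok <- identical(before.rows$students, after.rows$students)
order.ok  # TRUE means the pairs line up row by row

# If the order does not match, sorting by time period and then by name
# puts the students in the same order within both groups
long.sorted <- long[order(long$when, long$students), ]
```

Sorting only works if every student appears exactly once in each time period, so it is worth checking for missing or duplicated names first.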
For the first data set, we can calculate the difference between the two measurements for each student and then do a one-sample t-test on those before-minus-after differences.
First, do the math
data.2.num.columns$before.minus.after <- data.2.num.columns$before.test - data.2.num.columns$after.test
Then do the one-sample t-test
t.test(data.2.num.columns$before.minus.after,
mu = 0)
##
## One Sample t-test
##
## data: data.2.num.columns$before.minus.after
## t = -6.0561, df = 6, p-value = 0.0009185
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -12.435813 -5.278473
## sample estimates:
## mean of x
## -8.857143
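To see where these numbers come from, here is a sketch that redoes the one-sample t-test by hand using the usual formula, t = mean of the differences divided by (standard deviation of the differences / square root of n). The data are copied from the printout above, and the object names (d, se.d, t.stat and so on) are ones I made up for the example.

```r
# Heart rates copied from the printout above
before.test <- c(59.1, 50.6, 48.7, 47.6, 48.5, 63.1, 48.4)
after.test  <- c(68.3, 61.4, 61.2, 58.3, 56.9, 72.9, 49.0)

# Differences, one per student
d <- before.test - after.test
n <- length(d)

# Mean difference and its standard error
mean.d <- mean(d)
se.d   <- sd(d) / sqrt(n)

# t statistic, degrees of freedom, and two-sided p-value
t.stat  <- mean.d / se.d
df      <- n - 1
p.value <- 2 * pt(-abs(t.stat), df)

# 95% confidence interval for the mean difference
ci <- mean.d + c(-1, 1) * qt(0.975, df) * se.d

t.stat
p.value
ci
```

These values should agree with the t, df, p-value and confidence interval reported by t.test() above.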
Note that you get the exact same p-value and degrees of freedom if you do after minus before. Only the signs flip: the t-value, the mean difference, and the confidence interval limits all change sign.
data.2.num.columns$after.minus.before <- data.2.num.columns$after.test - data.2.num.columns$before.test
t.test(data.2.num.columns$after.minus.before,
mu = 0)
##
## One Sample t-test
##
## data: data.2.num.columns$after.minus.before
## t = 6.0561, df = 6, p-value = 0.0009185
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 5.278473 12.435813
## sample estimates:
## mean of x
## 8.857143
We can have R do the math for us if the data are organized with all the numeric data in a single column. However, the data MUST be organized properly, with the subjects in the same order within each time period.
t.test(heart.rate ~ when,
data = data.1.num.column,
paired = TRUE)
##
## Paired t-test
##
## data: heart.rate by when
## t = 6.0561, df = 6, p-value = 0.0009185
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 5.278473 12.435813
## sample estimates:
## mean of the differences
## 8.857143
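Depending on your version of R, the formula call with paired = TRUE may not run. As far as I know, R 4.4.0 and later refuse the paired argument in the formula version of t.test(), so treat that as something to check against your own installation. A sketch of an alternative that should work in any version is to pull the two time periods out of the stacked data and hand them to t.test() as two vectors. The names long, before and after are made up for this example.

```r
# Stacked ("long") data, copied from the printout above
long <- data.frame(
  students   = rep(c("Zetta", "Jude", "April", "Hendrick", "Cindy", "Modou", "Percy"), 2),
  heart.rate = c(59.1, 50.6, 48.7, 47.6, 48.5, 63.1, 48.4,
                 68.3, 61.4, 61.2, 58.3, 56.9, 72.9, 49.0),
  when       = rep(c("before", "after"), each = 7)
)

# Split the single numeric column into its two time periods
before <- long$heart.rate[long$when == "before"]
after  <- long$heart.rate[long$when == "after"]

# Paired t-test on the two vectors; the differences are after minus before
res <- t.test(after, before, paired = TRUE)
res
```

Putting after first gives after-minus-before differences, which matches the sign of the formula output above, where R orders the groups alphabetically (“after” comes before “before”).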
Note that I MUST include paired = TRUE. If we forget this, R will run a standard two-sample t-test (Welch’s test by default) and we will get the wrong answer. Try it to see what happens. The t-value will be wrong, as well as the degrees of freedom and the p-value.
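Here is a sketch of that comparison, run on the same numbers as above. I use the two-vector form of t.test() so the only thing that changes between the two calls is paired = TRUE. The object names are made up for the example.

```r
# Heart rates copied from the printout above
before <- c(59.1, 50.6, 48.7, 47.6, 48.5, 63.1, 48.4)
after  <- c(68.3, 61.4, 61.2, 58.3, 56.9, 72.9, 49.0)

# Correct analysis: paired t-test
paired.result <- t.test(after, before, paired = TRUE)

# Mistake: paired = TRUE left out, so R runs a two-sample (Welch) t-test
unpaired.result <- t.test(after, before)

# Compare the t-values and degrees of freedom side by side
c(paired   = unname(paired.result$statistic),
  unpaired = unname(unpaired.result$statistic))
c(paired   = unname(paired.result$parameter),
  unpaired = unname(unpaired.result$parameter))
```

The difference between the two means is the same either way. What changes is the standard error: the two-sample test ignores that each student is measured twice, so the student-to-student variation in resting heart rate ends up in the noise and the t-value shrinks.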