Sleep data

In R, typing data(sleep) loads data originally analyzed in Gosset’s Biometrika paper, which show the increase in hours slept for 10 patients on two soporific drugs. R treats the data as two groups rather than paired, but here we’re going to treat the data as paired, since each patient appears once in each group.

# loads the data
data(sleep)
# prints the first few rows of the dataset
head(sleep)
##   extra group ID
## 1   0.7     1  1
## 2  -1.6     1  2
## 3  -0.2     1  3
## 4  -1.2     1  4
## 5  -0.1     1  5
## 6   3.4     1  6

Variable extra is the extra hours slept; group is a group ID; and ID is a subject ID (rows 1 through 10 are subjects 1 through 10, row 11 is subject 1 again, and so on).
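Because the later calculations subtract the two groups element by element, it helps to confirm that the rows line up by subject. A minimal sketch, assuming the built-in sleep data frame as shipped with R (the names ids_g1, ids_g2 and same_order are only for illustration):

```r
data(sleep)
# subject IDs in each group, in row order
ids_g1 <- as.character(sleep$ID[sleep$group == 1])
ids_g2 <- as.character(sleep$ID[sleep$group == 2])
# TRUE if row i of group 1 and row i of group 2 belong to the same subject
same_order <- identical(ids_g1, ids_g2)
same_order
# number of observations per group
table(sleep$group)
```

If same_order were FALSE, the data would need to be sorted or merged by ID before subtracting.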

Plotting the data

# ggplot2 provides ggplot(), aes() and the geoms used here
library(ggplot2)
ggplot(data=sleep, aes(x=group, y=extra, group=ID, colour=ID)) +
        geom_line() +
        geom_point( size=4, shape=21, fill="pink") +
        ggtitle("Extra Hours Slept by Subject")

Each subject’s two measurements are connected with a line, which makes clear the benefit of acknowledging that these are repeated measurements on the same subjects. If you ignore the pairing, you are comparing the group 1 and group 2 averages against the variation between subjects within each group. If you acknowledge it, you are comparing subject-specific differences across groups, and the variation in these differences is much lower because observations within a subject are strongly correlated.
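The variance argument can be checked numerically. The sketch below is only illustrative: it compares the standard deviation of the within-subject differences with the standard deviation the difference would have if the two groups were independent. The names r, sd_paired and sd_indep are introduced here and are not part of the original analysis.

```r
data(sleep)
g1 <- sleep$extra[sleep$group == 1]
g2 <- sleep$extra[sleep$group == 2]
# correlation between the two measurements taken on the same subjects
r <- cor(g1, g2)
# standard deviation of the within-subject differences (paired view)
sd_paired <- sd(g2 - g1)
# standard deviation the difference would have if the groups were independent
sd_indep <- sqrt(var(g1) + var(g2))
c(correlation = r, sd_paired = sd_paired, sd_indep = sd_indep)
# identity behind the reduction: Var(g2 - g1) = Var(g1) + Var(g2) - 2 Cov(g1, g2)
identity_holds <- isTRUE(all.equal(var(g2 - g1),
                                   var(g1) + var(g2) - 2 * cov(g1, g2)))
identity_holds
```

A positive correlation makes the covariance term positive, which is what shrinks the variance of the differences.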

Results

# Grab the first 10 measurements (group 1) and the last 10 (group 2)
g1 <- sleep$extra[1 : 10]; g2 <- sleep$extra[11 : 20]
# The difference is group 2 minus group 1. Element-wise subtraction is valid because both vectors are ordered by subject ID.
difference <- g2 - g1
# Mean, standard deviation, and number of differences
mn <- mean(difference); s <- sd(difference); n <- length(difference)
# The t confidence interval is the mean plus or minus the relevant t quantile, with n-1 degrees of freedom, times the standard error of the mean difference.
mn + c(-1, 1) * qt(.975, n-1) * s / sqrt(n)
## [1] 0.7001142 2.4598858
# Of course we don't want to do this by hand every time, so we can simply call t.test() on difference
t.test(difference)
## 
##  One Sample t-test
## 
## data:  difference
## t = 4.0621, df = 9, p-value = 0.002833
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.7001142 2.4598858
## sample estimates:
## mean of x 
##      1.58
# Or call t.test() with the two vectors and the argument paired = TRUE
t.test(g2, g1, paired = TRUE)
## 
##  Paired t-test
## 
## data:  g2 and g1
## t = 4.0621, df = 9, p-value = 0.002833
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.7001142 2.4598858
## sample estimates:
## mean of the differences 
##                    1.58
# Or use the formula interface: outcome extra as a function of group, with paired = TRUE, evaluated in the data frame sleep. relevel(group, 2) makes group 2 the reference level, so the difference is again group 2 minus group 1.
t.test(extra ~ I(relevel(group, 2)), paired = TRUE, data = sleep)
## 
##  Paired t-test
## 
## data:  extra by I(relevel(group, 2))
## t = 4.0621, df = 9, p-value = 0.002833
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.7001142 2.4598858
## sample estimates:
## mean of the differences 
##                    1.58
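Depending on the R version, the formula call with paired = TRUE may not run: to the best of my knowledge, R 4.4.0 and later reject the paired argument in the formula method. A sketch of one alternative, assuming R 4.0.0 or later (where Pair() is available) and using a hypothetical wide data frame sleep_wide with one row per subject:

```r
data(sleep)
# one row per subject, one column per drug (sleep_wide is an illustrative name)
sleep_wide <- data.frame(
    g1 = sleep$extra[sleep$group == 1],
    g2 = sleep$extra[sleep$group == 2]
)
# Pair(g2, g1) ~ 1 requests a paired test of g2 minus g1
res <- t.test(Pair(g2, g1) ~ 1, data = sleep_wide)
res$conf.int
```

This should give the same interval as the three calls above, since it tests the same within-subject differences.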

Results

# collects the four confidence intervals, one per row
test <- rbind(
    mn + c(-1, 1) * qt(.975, n-1) * s / sqrt(n),
    t.test(difference)$conf.int,
    t.test(g2, g1, paired = TRUE)$conf.int,
    t.test(extra ~ I(relevel(group, 2)), paired = TRUE, data = sleep)$conf.int
)

# displays it
test
##           [,1]     [,2]
## [1,] 0.7001142 2.459886
## [2,] 0.7001142 2.459886
## [3,] 0.7001142 2.459886
## [4,] 0.7001142 2.459886

You can see that all these commands give the same result: the mean difference between the groups is estimated to lie somewhere between 0.7 and 2.46 hours. Because this is a confidence interval, the interpretation is that if we were to repeatedly perform this procedure on independent samples, about \(95\%\) of the intervals obtained would contain the true mean difference that we’re estimating. This, of course, assumes that these subjects are a relevant sample from the population of subjects that we’re interested in.
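The repeated-sampling interpretation can be illustrated with a small simulation. This is only a sketch under stated assumptions: the differences are drawn from a normal distribution whose mean (true_mean) and standard deviation (true_sd) are hypothetical values chosen for illustration, not estimates endorsed by the analysis above.

```r
set.seed(1)
true_mean <- 1.5    # hypothetical true mean difference
true_sd   <- 1.2    # hypothetical standard deviation of the differences
n         <- 10     # same sample size as the sleep data
n_sim     <- 10000  # number of simulated studies

covered <- replicate(n_sim, {
    # simulate one study and build its 95% t interval
    d  <- rnorm(n, mean = true_mean, sd = true_sd)
    ci <- mean(d) + c(-1, 1) * qt(.975, n - 1) * sd(d) / sqrt(n)
    # does this interval contain the true mean?
    ci[1] <= true_mean && true_mean <= ci[2]
})
# proportion of intervals that contain the true mean
coverage <- mean(covered)
coverage
```

The proportion should land close to 0.95, which is the sense in which the interval is a 95% interval: the statement is about the procedure, not about any single computed interval.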