There are times when we want to compare a sample mean to a parametric value. Perhaps more commonly, we want to compare the means of two samples to see if they are different. We recognize two such tests: paired-sample tests and independent-sample tests.


In paired-sample tests we want to measure the change in each individual in a group before and after a treatment. Suppose we wanted to test the effects of a drug for lowering cholesterol. A group of people would have their levels measured before treatment and then again afterward. These two observations (usually framed as before and after) are not independent of one another because they’re made on the same person. So, first we pair up the data and then calculate the difference within each pair (d).


\(t\:=\:\frac{\overline{d}-\mu_{d0}}{SE_{\overline{d}}}\), where \(SE_{\overline{d}}\:=\frac{\:s_d}{\sqrt{n}}\)


The paired t test has n – 1 degrees of freedom, where n is the number of pairs of observations (or the number of differences).

The paired t test tests the hypothesis of no difference in a variable between the two observations, but in practice we usually use the one-tailed construction. We want to know whether a treatment increased (or decreased) an attribute, not just whether it made it different than before, and we usually frame the differences as after − before so that the signs make intuitive sense:


\(H_0\): There is no treatment effect or it decreased the desired outcome (\(A \le B\) or \(\overline{d}\:\le0\)).
\(H_a\): The treatment increased the desired outcome (\(A > B\) or \(\overline{d}>0\)).


(or vice versa if a decreased value is desirable)


This test has assumptions, and they are mostly like the one-sample t test:


Using the paired t test

Imagine that I am testing the effects of a Very Low Calorie Diet (VLCD) on a sample of young women in high school. My data are:

Before: 117.3  111.4  98.6  104.3  105.4  100.4  81.7  89.5  78.2
After:   83.3   85.9  75.8   82.9   82.3   77.7  62.7  69.0  63.9

Did the VLCD cause these subjects to lose weight (\(\alpha\) = 0.05)?


\(H_0\): The VLCD caused these subjects to gain weight or stay the same (\(A \ge B\) or \(\overline{d}\:\ge0\)).
\(H_a\): The VLCD caused these subjects to lose weight (\(A < B\) or \(\overline{d}<0\)).

There are nine pairs of observations, so there are 9 – 1 = 8 degrees of freedom. The critical value for rejection is \(t_{0.05\left(1\right),8}=-1.86\). Why negative? Because we set up our differences as after − before (A – B), so weight loss gives negative differences and the rejection region lies in the lower tail. This means that if our calculated value for t from the data is more extreme (more negative) than –1.86 we can reject the null hypothesis with P < 0.05.

# Make vectors of the observations
before <- c(117.3, 111.4, 98.6, 104.3, 105.4, 100.4, 81.7, 89.5, 78.2)
after <- c(83.3, 85.9, 75.8, 82.9, 82.3, 77.7, 62.7, 69.0, 63.9)

# Combine those vectors into a data frame
vlcd <- data.frame(before, after)

# Calculate the differences between each pair and insert a new column
vlcd$difference <- vlcd$after - vlcd$before   ## This makes lost weight a negative number

# Inspect the differences to see if they appear to be normally-distributed
hist(vlcd$difference, right = FALSE, col = "skyblue", main ="", xlab = "After - Before Difference")


Uh, yeah, I think that looks pretty darn normal, but we’ll see formal tests for assessing normality later.


We can also visualize the differences due to the treatment in a “bump chart” like this.

Reshape your data frame to do it:

# Reshape your data into a new "long" data frame
vlcd2 <- reshape(vlcd, varying = 1:2, direction = "long", 
     idvar = "vlcd", v.names = "weight", 
     times = factor(c("before","after"), levels = c("before","after")))

# Make a strip chart from that new data frame
stripchart(weight ~ time, data = vlcd2, vertical = TRUE, 
     xlab = "Time", ylab="Body mass (kg)",
     las = 1, pch = 16, col = "firebrick")


# Add the lines to track subjects
segments(1, vlcd$before, 2, vlcd$after)


Okay, so let’s do the test already:

# Either one of these will give you the same result:

t.test(vlcd$after, vlcd$before, paired = TRUE, alternative = "less")   ## if you didn't calculate a difference
## 
##  Paired t-test
## 
## data:  vlcd$after and vlcd$before
## t = -12.74, df = 8, p-value = 6.787e-07
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##       -Inf -19.29166
## sample estimates:
## mean of the differences 
##               -22.58889
t.test(vlcd$difference, alternative = "less")  ## if you did
## 
##  One Sample t-test
## 
## data:  vlcd$difference
## t = -12.74, df = 8, p-value = 6.787e-07
## alternative hypothesis: true mean is less than 0
## 95 percent confidence interval:
##       -Inf -19.29166
## sample estimates:
## mean of x 
## -22.58889


Note that if we left out the alternative argument (which asks for the lower tail, "less"), we’d get the two-tailed result.
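To see how the one-tailed and two-tailed P-values relate, here is a small sketch that recomputes both from the t statistic and degrees of freedom reported in the output above, using base R's pt(). The variable names are my own, and because t is rounded to two decimals the results will only approximately match the printed p-value.

```r
# t statistic and degrees of freedom copied from the t.test() output (t is rounded)
t_stat <- -12.74
df <- 8

# One-tailed P-value: area in the lower tail, matching alternative = "less"
p_one_tailed <- pt(t_stat, df)

# Two-tailed P-value: area in both tails, which is what t.test() reports by default
p_two_tailed <- 2 * pt(-abs(t_stat), df)

p_one_tailed
p_two_tailed
```

The two-tailed value is simply double the one-tailed value, because the t distribution is symmetric.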

So, we have a calculated t of -12.74, which throws us farther into the area of rejection than our \(t_{0.05\left(1\right),8}=-1.86\). We can conclude that the VLCD treatment caused these subjects to lose weight (P < 0.001).
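If you want to check that comparison by hand, the following sketch rebuilds the t statistic from the paired-test formula and looks up the one-tailed critical value with qt(). It uses the same observations as the example. The names d, se_d, t_stat and t_crit are my own choices, not anything R requires.

```r
# Same observations as in the VLCD example
before <- c(117.3, 111.4, 98.6, 104.3, 105.4, 100.4, 81.7, 89.5, 78.2)
after <- c(83.3, 85.9, 75.8, 82.9, 82.3, 77.7, 62.7, 69.0, 63.9)

d <- after - before              # after - before differences
n <- length(d)                   # number of pairs
se_d <- sd(d) / sqrt(n)          # standard error of the mean difference
t_stat <- (mean(d) - 0) / se_d   # the null-hypothesis mean difference is 0
t_crit <- qt(0.05, df = n - 1)   # one-tailed critical value in the lower tail

t_stat
t_crit
t_stat < t_crit   # TRUE means the statistic falls in the rejection region
```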


Two-sample t test (for independent samples)

The two-sample t test assesses the null hypothesis that two samples come from the same population, or


H0: \(\mu_1=\mu_2\) (or \(\overline{Y_1}=\overline{Y_2}\))
Ha: \(\mu_1\ne\mu_2\) (or \(\overline{Y_1}\ne\overline{Y_2}\))


The test statistic is \(t=\frac{\left(\overline{Y}_1-\overline{Y}_2\right)-\left(\mu_1-\mu_2\right)}{SE_{\overline{Y}_1-\overline{Y}_2}}\). We assume that the term \(\mu_1-\mu_2\) is equal to zero if we are testing the null hypothesis that they’re equal. In the rare case that you wanted to test whether the two populations had a difference of 8 or something, then you’d have an 8 in that part of the equation. The denominator is the standard error of the difference between the means, which is built from a value called the “pooled variance.” It is a drag to calculate by hand, but R will knock it out like that’s its job. Which it is.
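For the curious, here is a rough sketch of what that by-hand calculation looks like, checked against t.test(). The two vectors are hypothetical numbers invented purely for illustration (they are not the pelvis data), and the variable names are my own.

```r
# Hypothetical measurements, invented purely for illustration
group1 <- c(12.9, 13.4, 13.1, 13.8, 12.7, 13.5)
group2 <- c(9.2, 8.7, 9.5, 8.9, 9.1)

n1 <- length(group1)
n2 <- length(group2)

# Pooled variance: the two sample variances weighted by their degrees of freedom
pooled_var <- ((n1 - 1) * var(group1) + (n2 - 1) * var(group2)) / (n1 + n2 - 2)

# Standard error of the difference between the two means
se_diff <- sqrt(pooled_var * (1 / n1 + 1 / n2))

# t statistic under the null hypothesis that mu1 - mu2 = 0
t_by_hand <- (mean(group1) - mean(group2) - 0) / se_diff

# The same statistic from R's built-in pooled-variance test
t_from_r <- unname(t.test(group1, group2, var.equal = TRUE)$statistic)

t_by_hand
t_from_r
```

The two values should agree, which is a handy sanity check that the formula and the function are doing the same thing.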


This test’s assumptions are:


Using the two sample t test

I have some data comparing the length of the pelvis in 5 male macaques and 9 male gibbons. Do these species have the same pelvis length?

H0: Macaque = Gibbon (\(\mu_1=\mu_2\)).
Ha: Macaque ≠ Gibbon (\(\mu_1\ne\mu_2\)).

Download the CSV file.

# Read the file directly from the URL. To use a downloaded copy instead, put its file path inside the quotes.

pelvis <- read.csv(url("https://raw.githubusercontent.com/nmccurtin/CSVfilesbiostats/master/pelvislength%20(2).csv"))

# Use the package called "lattice" to do a stacked histogram. 
# Click it to activate it in the "Packages" pane or use this function.

library(lattice)

histogram( ~ pelvis | species, data = pelvis, layout = c(1,2), col = "orange", breaks = seq(7, 15, by = 1), xlab = "Pelvis length (mm)")


# Do the test

t.test(pelvis ~ species, data = pelvis, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  pelvis by species
## t = 8.0414, df = 12, p-value = 3.566e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  3.091495 5.389393
## sample estimates:
##         mean in group Hylobates lar mean in group Macaca fascicularis   
##                            13.25444                             9.01400

So, it’s clear that we can reject the null hypothesis that these samples come from the same statistical population (t = 8.04, df = 12, P < 0.001).


But go back to the histograms. Do the two species look like they have the same variance? Let’s use a test to find out. There are several choices to pick from in R. We’ll use the variance ratio test (F test) to test the null hypothesis that the ratio of the variances is equal to 1.

# Do the variance ratio (F) test

var.test(pelvis ~ species, data = pelvis)

So, here we conclude that the variances are not different from one another (or that the ratio between them is not different from 1), P = 0.88. Notice that the 95% CI includes 1.

Because we failed to reject the null hypothesis of equal variances, we were justified in including the argument var.equal = TRUE, which tells R to run the test using the “usual” pooled variance with n1 + n2 – 2 degrees of freedom. If you leave that argument out, t.test() runs Welch’s approximation by default. Frankly it’s always safer to use that. In this example the two methods lead to the same conclusion.
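Here is a minimal sketch of that whole workflow: the F test first, then the pooled and Welch versions of the t test side by side. The data are hypothetical values made up for illustration, not the pelvis measurements, and the object names pooled and welch are my own.

```r
# Hypothetical measurements, invented purely for illustration
group1 <- c(14.1, 12.6, 13.9, 13.0, 12.4, 13.7, 13.3)
group2 <- c(9.4, 8.1, 9.9, 8.6, 9.0)

# F test of the null hypothesis that the ratio of the variances is 1
var.test(group1, group2)

# Pooled-variance test: degrees of freedom are always n1 + n2 - 2
pooled <- t.test(group1, group2, var.equal = TRUE)

# Welch's test (the default): degrees of freedom are estimated from the data
# and never exceed n1 + n2 - 2
welch <- t.test(group1, group2)

pooled$parameter   # degrees of freedom, pooled
welch$parameter    # degrees of freedom, Welch
```

When the sample variances are similar, the two tests give nearly the same t and P. They drift apart as the variances (or the sample sizes) become more unequal.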


Conclusion

That’s all for now. In the next module I’ll introduce some nonparametric methods for working with data that fail to conform to a test’s assumptions.