z and t tests

one group z

Base R does not have a command that will calculate a one sample z test. While there are add on packages to do this, the simplest way to perform the test is to write a short function of our own. The following code will calculate z for the data set stored in a, given a population mean of 13 and a population variance of 16. (This is the example from the z and t test notes from D2L.)

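# sample data, population mean, and population variance
# (note: naming a variable "var" masks the base R function var() in this session)
a <- c(10, 8, 7, 12, 9, 6, 7, 8)
mu <- 13
var <- 16
# z = (sample mean - population mean) / standard error of the mean
z.test = function(a, mu, var){
   zeta = (mean(a) - mu) / (sqrt(var / length(a)))
   return(zeta)
}
z.test(a, mu, var)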
## [1] -3.270369

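We could then evaluate the obtained value of z against the critical value appropriate to our alpha level. For a two tailed test with alpha = 0.05 the critical value is ± 1.96, so the obtained z of -3.27 falls in the rejection region.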
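The comparison with the critical value can also be done in R rather than from a table. The following is a minimal sketch, assuming a two tailed test with alpha = 0.05; the names sigma2, z_crit, p_value and reject_null are made up for this illustration and are not part of the original notes.

```r
# Data and population parameters from the one group z example
a <- c(10, 8, 7, 12, 9, 6, 7, 8)
mu <- 13
sigma2 <- 16   # population variance (named sigma2 so base::var is not masked)

# Obtained z: (sample mean - population mean) / standard error
z <- (mean(a) - mu) / sqrt(sigma2 / length(a))

alpha <- 0.05
z_crit <- qnorm(1 - alpha / 2)   # two tailed critical value, about 1.96
p_value <- 2 * pnorm(-abs(z))    # two tailed p-value, about 0.001

# TRUE when the obtained z falls in the rejection region
reject_null <- abs(z) > z_crit
reject_null
```

qnorm returns the quantile of the standard normal distribution and pnorm returns the cumulative probability, so this reproduces the table lookup.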

one group t

The command t.test is used for all t tests in R. For the above example, if we did not know the population standard deviation, a one group t would be appropriate and could be performed using the following code:

t.test(a, mu = 13, alternative = "two.sided", conf.level = 0.95)
## 
##  One Sample t-test
## 
## data:  a
## t = -6.804, df = 7, p-value = 0.0002523
## alternative hypothesis: true mean is not equal to 13
## 95 percent confidence interval:
##  6.767658 9.982342
## sample estimates:
## mean of x 
##     8.375

As you can see, the t.test command provides substantially more information than the z.test function with much less code.
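To see where the numbers in the t.test output come from, the same statistic can be computed by hand and compared with the values stored in the object that t.test returns. This is only a sketch; the names mu0, t_by_hand, p_by_hand and result are chosen for this illustration.

```r
a <- c(10, 8, 7, 12, 9, 6, 7, 8)
mu0 <- 13

n <- length(a)
se <- sd(a) / sqrt(n)                      # estimated standard error of the mean
t_by_hand <- (mean(a) - mu0) / se          # obtained t
df <- n - 1                                # degrees of freedom
p_by_hand <- 2 * pt(-abs(t_by_hand), df)   # two sided p-value

# t.test returns a list (class "htest") whose pieces can be extracted by name
result <- t.test(a, mu = mu0)
result$statistic   # obtained t
result$parameter   # degrees of freedom
result$p.value     # p-value
result$conf.int    # confidence interval for the mean
```

Storing the result in a variable is useful when a single value, such as the p-value, is needed later in a script.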

repeated measures t

A repeated measures t is equally easy in R. We will perform a t test on the blood pressure data from the z and t test handout in D2L.

BPpre <- c(123, 132, 111, 127, 149, 130, 122, 117, 125)
BPpost <- c(118, 128, 104, 113, 126, 115, 125, 110, 120)

t.test(BPpre, BPpost, mu = 0, conf.level = 0.95, paired = T)
## 
##  Paired t-test
## 
## data:  BPpre and BPpost
## t = 3.3694, df = 8, p-value = 0.009794
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   2.700152 14.410959
## sample estimates:
## mean of the differences 
##                8.555556

Note that we supplied both the pre test and the post test measurements for each subject, along with the difference between the means expected under the null hypothesis (mu = 0), and that we must tell R that this is a paired t test (paired = T).
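A repeated measures t is the same thing as a one group t performed on the difference scores. The sketch below checks this with the blood pressure data; the names d, paired_result and diff_result are made up for the illustration.

```r
BPpre  <- c(123, 132, 111, 127, 149, 130, 122, 117, 125)
BPpost <- c(118, 128, 104, 113, 126, 115, 125, 110, 120)

# Difference score for each subject
d <- BPpre - BPpost

paired_result <- t.test(BPpre, BPpost, paired = TRUE)   # repeated measures t
diff_result   <- t.test(d, mu = 0)                      # one group t on the differences

# The two obtained t values are identical
c(paired = unname(paired_result$statistic),
  one_sample = unname(diff_result$statistic))
```

This is also why the pairing matters: the test depends on each subject's own difference score, not just on the two group means.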

two group t test

The two group t test is appropriate when there are two experimental conditions with different subjects in each condition (as opposed to the repeated measures t where the same subjects serve in both conditions). There are two different versions of the two group t available in R, the more commonly performed “Student’s t” and the “Welch two sample t test”. First, we will perform the Student’s t. Note that I have already imported and attached the “fusion” data file for these examples.

t.test(Time ~ condition, mu = 0, conf.level = 0.95, paired = F, var.eq = T)
## 
##  Two Sample t-test
## 
## data:  Time by condition
## t = 1.9395, df = 76, p-value = 0.05615
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.08093735  6.09901044
## sample estimates:
## mean in group 1 mean in group 2 
##        8.560465        5.551429

Note that the R code is very similar to the repeated measures t, with two changes to the parameters: paired = F instead of paired = T, and a new parameter, var.eq = T. (The data are also given as a formula, Time ~ condition, rather than as two separate vectors.) The change in the value of paired from T to F follows from the design: the two group test uses independent groups, not paired or repeated measures. The new parameter “var.eq = T” (an abbreviation of the full parameter name, var.equal) indicates that we are assuming the variances of the two groups are equal. This tells R to perform a Student’s t test, the test most commonly included in statistics textbooks. If we change the value of var.eq to F, R will compute a Welch two sample t test, which does not assume equal variances and adjusts the degrees of freedom (and therefore the critical value of t) to accommodate different variances in the two groups.

t.test(Time ~ condition, mu = 0, conf.level = 0.95, paired = F, var.eq = F)
## 
##  Welch Two Sample t-test
## 
## data:  Time by condition
## t = 2.0384, df = 70.039, p-value = 0.04529
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.06493219 5.95314090
## sample estimates:
## mean in group 1 mean in group 2 
##        8.560465        5.551429

Many statisticians consider the Welch test more accurate than Student’s t when the group variances may differ, and prefer to use it. Note that, in this case, our obtained probability level changed from just greater than 0.05 (p = 0.05615) to just less than 0.05 (p = 0.04529), allowing us to reject the null hypothesis at the 0.05 level with the Welch test but not with Student’s.
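The difference between the two tests lies in how the standard error and the degrees of freedom are computed. The sketch below works both versions out by hand and compares them with t.test. The fusion data are not reproduced here, so g1 and g2 are small hypothetical groups invented for this illustration, as are all of the other variable names.

```r
# Hypothetical scores for two independent groups (NOT the fusion data)
g1 <- c(12, 15, 9, 20, 17, 11, 14)
g2 <- c(8, 9, 7, 10, 9, 8)

n1 <- length(g1); n2 <- length(g2)
v1 <- var(g1);    v2 <- var(g2)
mean_diff <- mean(g1) - mean(g2)

# Student's t: pool the two variances, df = n1 + n2 - 2
pooled_var <- ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_student  <- mean_diff / sqrt(pooled_var * (1 / n1 + 1 / n2))
df_student <- n1 + n2 - 2

# Welch's t: keep the variances separate and use the
# Welch-Satterthwaite approximation for the degrees of freedom
t_welch  <- mean_diff / sqrt(v1 / n1 + v2 / n2)
df_welch <- (v1 / n1 + v2 / n2)^2 /
  ((v1 / n1)^2 / (n1 - 1) + (v2 / n2)^2 / (n2 - 1))

# The same two tests using t.test
student <- t.test(g1, g2, var.equal = TRUE)
welch   <- t.test(g1, g2, var.equal = FALSE)

c(by_hand = df_student, t.test = unname(student$parameter))
c(by_hand = df_welch,   t.test = unname(welch$parameter))
```

The Welch degrees of freedom are usually not a whole number and are never larger than n1 + n2 - 2, which is why the fusion output above reports df = 70.039 rather than 76.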

One last thing worth knowing: the following code will produce the same results as the last example we did.

t.test(Time ~ condition)
## 
##  Welch Two Sample t-test
## 
## data:  Time by condition
## t = 2.0384, df = 70.039, p-value = 0.04529
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.06493219 5.95314090
## sample estimates:
## mean in group 1 mean in group 2 
##        8.560465        5.551429

This is because the t.test command has default values for all of the parameters except the dependent and independent variables (Time and condition in our example). If not specified, the predicted mean difference will be assumed to be zero, the 95% confidence interval will be calculated, the unpaired test will be done, and the variances will not be assumed to be equal. Care should be taken to avoid the following mistake:

BPpre <- c(123, 132, 111, 127, 149, 130, 122, 117, 125)
BPpost <- c(118, 128, 104, 113, 126, 115, 125, 110, 120)
t.test(BPpre, BPpost)
## 
##  Welch Two Sample t-test
## 
## data:  BPpre and BPpost
## t = 1.9241, df = 14.81, p-value = 0.07378
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.9326035 18.0437146
## sample estimates:
## mean of x mean of y 
##  126.2222  117.6667

These are, of course, the data from our previous repeated measures example but R assumes independent groups (paired = F) unless told otherwise, and calculates the two group version of t.

R is perfectly happy to calculate a value for the Welch test but, because the data were from a repeated measures design and the test is for independent groups, the outcome of the test is meaningless.
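The size of the mistake is easy to see by running both versions on the same data and comparing the p-values. This is a small sketch; the names p_paired and p_unpaired are chosen for the illustration.

```r
BPpre  <- c(123, 132, 111, 127, 149, 130, 122, 117, 125)
BPpost <- c(118, 128, 104, 113, 126, 115, 125, 110, 120)

# Correct analysis for a repeated measures design
p_paired <- t.test(BPpre, BPpost, paired = TRUE)$p.value

# Default call: R silently treats the two vectors as independent groups
p_unpaired <- t.test(BPpre, BPpost)$p.value

c(paired = p_paired, unpaired = p_unpaired)
```

With these data the paired test is significant at the 0.05 level and the default (unpaired) test is not, so omitting paired = TRUE changes the conclusion.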