Statistical Hypothesis Testing with SAS and R

by Dirk Taeger and Sonja Kuhnt

(c) John Wiley & Sons, Ltd

Test 2.2.4: Paired z-test

Description:

Tests whether the difference of two population means \(\mu_d = \mu_1 - \mu_2\) differs from a specified value \(d_0\) when the observations are collected in pairs.

Assumptions:

Data are measured on an interval or ratio scale and randomly sampled in pairs \((X_1, X_2)\).
\(X_1\) follows a Gaussian distribution with mean \(\mu_1\) and variance \(\sigma_1^2\). \(X_2\) follows a Gaussian distribution with mean \(\mu_2\) and variance \(\sigma_2^2\). The covariance of \(X_1\) and \(X_2\) is \(\sigma_{12}\).
The standard deviation \(\sigma_d = \sqrt{\sigma_1^2 + \sigma_2^2 - 2\sigma_{12}}\) of the difference \(X_1 - X_2\) is known.
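The formula for \(\sigma_d\) can be evaluated directly once the component variances and the covariance are known. The values below (\(\sigma_1 = \sigma_2 = 15\), correlation 0.9) are hypothetical and chosen only to illustrate how a positive covariance between the paired measurements shrinks the standard deviation of the difference.

```r
# Hypothetical population parameters (not taken from the source)
sigma_1 <- 15    # standard deviation of X_1
sigma_2 <- 15    # standard deviation of X_2
rho     <- 0.9   # assumed correlation between X_1 and X_2

# Covariance sigma_12 = rho * sigma_1 * sigma_2
sigma_12 <- rho * sigma_1 * sigma_2

# Standard deviation of the difference X_1 - X_2
sigma_d <- sqrt(sigma_1^2 + sigma_2^2 - 2 * sigma_12)

# For comparison: the same quantity if X_1 and X_2 were uncorrelated
sigma_d_indep <- sqrt(sigma_1^2 + sigma_2^2)

c(sigma_d = sigma_d, sigma_d_indep = sigma_d_indep)
```

With these values \(\sigma_d = \sqrt{45} \approx 6.71\), compared with \(\sqrt{450} \approx 21.21\) for uncorrelated measurements, which is why pairing can make the test considerably more sensitive.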

Hypotheses:

(A) \(H_0\) : \(\mu_d = d_0\) vs \(H_1\) : \(\mu_d \neq d_0\).
(B) \(H_0\) : \(\mu_d \le d_0\) vs \(H_1\) : \(\mu_d \gt d_0\).
(C) \(H_0\) : \(\mu_d \ge d_0\) vs \(H_1\) : \(\mu_d \lt d_0\).

Test statistic:

\[Z = \frac{\bar{D} - d_0}{\sigma_d}\sqrt{n}\] with \[\bar{D} = \frac{1}{n}\sum_{i=1}^n (X_{1i} - X_{2i})\]

Test decision:

Reject \(H_0\) if, for the observed value \(z\) of \(Z\):

(A) \(z \lt z_{\alpha/2}\) or \(z \gt z_{1-\alpha/2}\)
(B) \(z \gt z_{1-\alpha}\)
(C) \(z \lt z_\alpha\)
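The three rejection regions translate directly into comparisons with standard normal quantiles, which R returns via qnorm(). The helper reject_h0() and its alternative labels are hypothetical names chosen for this sketch (the labels mirror the convention of R's own test functions such as t.test()); \(\alpha = 0.05\) is only an example level.

```r
# Decide whether to reject H0 for an observed z, following the rejection
# regions of hypotheses (A), (B) and (C).
# reject_h0() is a hypothetical helper, not a base R function.
reject_h0 <- function(z, alpha = 0.05,
                      alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  switch(alternative,
    # (A): z < z_{alpha/2} or z > z_{1 - alpha/2}
    two.sided = z < qnorm(alpha / 2) || z > qnorm(1 - alpha / 2),
    # (B): z > z_{1 - alpha}
    greater   = z > qnorm(1 - alpha),
    # (C): z < z_{alpha}
    less      = z < qnorm(alpha)
  )
}

# Critical values z_{0.025}, z_{0.975}, z_{0.95}, z_{0.05}
qnorm(c(0.025, 0.975, 0.95, 0.05))

reject_h0(1.8, alternative = "two.sided")  # FALSE: 1.8 < 1.96
reject_h0(1.8, alternative = "greater")    # TRUE: 1.8 > 1.645
```

The same observed value can lead to different decisions depending on whether the hypothesis is one- or two-sided, so the alternative has to be fixed before looking at the data.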

P-value:

(A) \(p = 2\Phi(-|z|)\)
(B) \(p = 1 - \Phi(z)\)
(C) \(p = \Phi(z)\)
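In R, \(\Phi\) is pnorm(), so the three p-values can be collected in one small function. The name p_value_z() and the alternative labels are hypothetical choices for this sketch.

```r
# p-values of the paired z-test for hypotheses (A), (B) and (C);
# pnorm() is the standard normal distribution function Phi.
# p_value_z() is a hypothetical helper, not a base R function.
p_value_z <- function(z, alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  switch(alternative,
    two.sided = 2 * pnorm(-abs(z)),           # (A): 2 * Phi(-|z|)
    greater   = pnorm(z, lower.tail = FALSE), # (B): 1 - Phi(z)
    less      = pnorm(z)                      # (C): Phi(z)
  )
}

z <- 1.8
sapply(c("two.sided", "greater", "less"), function(alt) p_value_z(z, alt))
```

pnorm(z, lower.tail = FALSE) gives the same value as 1 - pnorm(z) but avoids cancellation error when \(z\) is large.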

Annotation:

Under \(H_0\) (at \(\mu_d = d_0\)), the test statistic \(Z\) follows a standard normal distribution.
\(z_\alpha\) is the \(\alpha\)-quantile of the standard normal distribution, and \(\Phi\) denotes its cumulative distribution function.
The assumption of an underlying Gaussian distribution can be relaxed if the distribution of the differences is symmetric.
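A small Monte Carlo experiment can illustrate both annotations: under \(H_0\) the two-sided test should reject in roughly a fraction \(\alpha\) of simulated samples, and this should remain roughly true for a symmetric non-Gaussian distribution of the differences. The sample size, \(\sigma_d\), seed, number of replications and the choice of a uniform distribution are all assumptions made for this illustration; a simulation of this kind supports, but does not prove, the statement.

```r
# Monte Carlo check of the size of the two-sided paired z-test (hypothesis A).
# Assumptions (illustrative, not from the source): n = 20 pairs, d0 = 0,
# sigma_d = 1.4, alpha = 0.05 and 10,000 simulated data sets.
set.seed(1)
n_pairs <- 20
d0      <- 0
sigma_d <- 1.4
alpha   <- 0.05
n_sim   <- 10000

# Proportion of simulated samples in which H0 is rejected;
# draw_diffs(m) must return m differences whose true mean is d0.
reject_rate <- function(draw_diffs) {
  rejections <- replicate(n_sim, {
    d <- draw_diffs(n_pairs)
    z <- sqrt(n_pairs) * (mean(d) - d0) / sigma_d
    abs(z) > qnorm(1 - alpha / 2)
  })
  mean(rejections)
}

# Gaussian differences with mean d0 and standard deviation sigma_d
rate_gauss <- reject_rate(function(m) rnorm(m, mean = d0, sd = sigma_d))

# Symmetric, non-Gaussian differences: uniform on [d0 - a, d0 + a],
# with a = sqrt(3) * sigma_d so that the standard deviation equals sigma_d
a <- sqrt(3) * sigma_d
rate_unif <- reject_rate(function(m) runif(m, min = d0 - a, max = d0 + a))

c(gaussian = rate_gauss, uniform = rate_unif)
```

Both rejection rates should come out close to 0.05; the exact values depend on the seed and the number of replications.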

Example

The aim is to test whether the mean intelligence quotient increases by 10 points from before training (IQ1) to after training (IQ2). The standard deviation of the difference is known to be 1.40. Note: because we are interested in a negative difference of means \(IQ1 - IQ2\), we must test against \(d_0 = -10\).

# IQ dataset
no <- 1:20
IQ1 <- c(127, 98, 105, 83, 133, 90, 107, 98, 91, 100, 88, 96, 110, 87, 88, 88, 105, 95, 79, 106)
IQ2 <- c(137, 108, 115, 93, 143, 100, 117, 108, 101, 110, 98, 106, 120, 97, 98, 100, 115, 111, 89, 116)
iq <- data.frame(no, IQ1, IQ2)

# Set difference to test
d0 <- -10
# Set standard deviation of the difference
sigma_diff <- 1.40

# Calculate the mean of the difference
mean_diff <- mean(iq$IQ1 - iq$IQ2)

# Calculate the sample size (number of pairs)
n_total <- length(iq$IQ1)

# Calculate test statistic and two-sided p-value
z <- sqrt(n_total) * ((mean_diff - d0) / sigma_diff)
p_value <- 2 * pnorm(-abs(z))

# Output results
z

## [1] -1.277753

p_value

## [1] 0.2013365
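The printed values can be checked by hand. Of the 20 differences \(IQ1 - IQ2\), eighteen equal \(-10\), one equals \(-12\) (pair 16) and one equals \(-16\) (pair 18), so

\[\bar{d} = \frac{18 \cdot (-10) + (-12) + (-16)}{20} = \frac{-208}{20} = -10.4\]
\[z = \frac{-10.4 - (-10)}{1.40}\sqrt{20} \approx -0.2857 \cdot 4.4721 \approx -1.2778\]
\[p = 2\Phi(-1.2778) \approx 0.2013\]

Since \(p \approx 0.20\) exceeds, for example, \(\alpha = 0.05\) (equivalently \(|z| \lt z_{0.975} \approx 1.96\)), \(H_0\) : \(\mu_d = -10\) is not rejected at that level.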

Remarks:

There is no base R function that calculates the paired z-test directly.
The one-sided p-value for hypothesis (B) can be calculated with p_value_B <- 1 - pnorm(z) and the p-value for hypothesis (C) with p_value_C <- pnorm(z).
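Since base R offers no ready-made function, a small wrapper can bundle the statistic and all three p-values. This is a minimal sketch: paired_z_test() and its argument names are hypothetical, with the alternative labels borrowed from the convention of t.test(); it simply repeats the calculation of the example for hypotheses (A), (B) and (C).

```r
# A minimal paired z-test wrapper; paired_z_test() is a hypothetical helper,
# not part of base R. sigma_d is the known standard deviation of x1 - x2.
paired_z_test <- function(x1, x2, d0 = 0, sigma_d,
                          alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  stopifnot(length(x1) == length(x2), sigma_d > 0)
  d <- x1 - x2
  n <- length(d)
  z <- sqrt(n) * (mean(d) - d0) / sigma_d
  p <- switch(alternative,
    two.sided = 2 * pnorm(-abs(z)),           # hypothesis (A)
    greater   = pnorm(z, lower.tail = FALSE), # hypothesis (B)
    less      = pnorm(z)                      # hypothesis (C)
  )
  list(statistic = z, p.value = p, alternative = alternative)
}

# IQ data from the example
IQ1 <- c(127, 98, 105, 83, 133, 90, 107, 98, 91, 100, 88, 96, 110, 87, 88, 88, 105, 95, 79, 106)
IQ2 <- c(137, 108, 115, 93, 143, 100, 117, 108, 101, 110, 98, 106, 120, 97, 98, 100, 115, 111, 89, 116)

res_A <- paired_z_test(IQ1, IQ2, d0 = -10, sigma_d = 1.40)
res_B <- paired_z_test(IQ1, IQ2, d0 = -10, sigma_d = 1.40, alternative = "greater")
res_C <- paired_z_test(IQ1, IQ2, d0 = -10, sigma_d = 1.40, alternative = "less")

sapply(list(A = res_A, B = res_B, C = res_C), function(r) r$p.value)
```

The two-sided result reproduces the statistic and p-value printed above, and the one-sided p-values for (B) and (C) always sum to 1.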