Statistical Hypothesis Testing with SAS and R

by Dirk Taeger and Sonja Kuhnt

(c) John Wiley & Sons, Ltd

Test 2.2.4: Paired z-test

Description:

Tests whether the mean difference \(\mu_d = \mu_1 - \mu_2\) of two populations differs from a value \(d_0\) when the observations are collected in pairs.

Assumptions:

Hypotheses:

  1. \(H_0\) : \(\mu_d = d_0\) vs \(H_1\) : \(\mu_d \neq d_0\).
  2. \(H_0\) : \(\mu_d \le d_0\) vs \(H_1\) : \(\mu_d \gt d_0\).
  3. \(H_0\) : \(\mu_d \ge d_0\) vs \(H_1\) : \(\mu_d \lt d_0\).

Test statistic:

\[Z = \frac{\bar{D} - d_0}{\sigma_d}\sqrt{n}\] with \[\bar{D} = \frac{1}{n}\sum_{i=1}^n (X_{1i} - X_{2i})\]

Test decision:

Reject \(H_0\) if, for the observed value \(z\) of \(Z\):

  1. \(z \lt z_{\alpha/2}\) or \(z \gt z_{1-\alpha/2}\)

  2. \(z \gt z_{1-\alpha}\)

  3. \(z \lt z_\alpha\)
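
The three rejection rules can be checked with standard normal quantiles from `qnorm`. The following is a minimal sketch, not a definitive implementation: the significance level `alpha <- 0.05` is an assumption chosen for illustration, and `z` is set to the value reported in the example output of this note.

```r
# Illustrative inputs (assumptions): 5% significance level and the
# observed statistic reported in the IQ example of this note
alpha <- 0.05
z <- -1.277753

# Rule 1: two-sided, reject if z lies outside [z_{alpha/2}, z_{1-alpha/2}]
reject_1 <- z < qnorm(alpha / 2) || z > qnorm(1 - alpha / 2)

# Rule 2: one-sided, reject if z exceeds z_{1-alpha}
reject_2 <- z > qnorm(1 - alpha)

# Rule 3: one-sided, reject if z falls below z_alpha
reject_3 <- z < qnorm(alpha)

# Named logical vector with one decision per hypothesis
decisions <- c(hypothesis_1 = reject_1,
               hypothesis_2 = reject_2,
               hypothesis_3 = reject_3)
decisions
```

With these inputs none of the three null hypotheses is rejected, because the observed value lies between the critical values of about -1.96 and 1.96 (two-sided) and -1.645 and 1.645 (one-sided).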

P-value:

  1. \(p = 2\Phi(-|z|)\)

  2. \(p = 1 - \Phi(z)\)

  3. \(p = \Phi(z)\)

Here \(\Phi\) denotes the cumulative distribution function of the standard normal distribution.

Annotation:

Example

The example tests whether the mean intelligence quotient increases by 10 points between a measurement before training (IQ1) and a measurement after training (IQ2). The standard deviation of the differences is known to be 1.40. Note: because an increase corresponds to a negative mean difference \(IQ1 - IQ2\), we must test against \(d_0 = -10\).

# IQ dataset: one row per subject, measured before (IQ1) and after (IQ2) training
no <- 1:20
IQ1 <- c(127, 98, 105, 83, 133, 90, 107, 98, 91, 100, 88, 96, 110, 87, 88, 88, 105, 95, 79, 106)
IQ2 <- c(137, 108, 115, 93, 143, 100, 117, 108, 101, 110, 98, 106, 120, 97, 98, 100, 115, 111, 89, 116)
iq <- data.frame(no, IQ1, IQ2)

# Set difference to test
d0 <- -10

# Set standard deviation of the difference
sigma_diff <- 1.40

# Calculate the mean of the difference
mean_diff <- mean(iq$IQ1 - iq$IQ2)

# Calculate the sample size (number of pairs)
n_total <- length(iq$IQ1)

# Calculate test statistic and two-sided p-value
z <- sqrt(n_total) * ((mean_diff - d0) / sigma_diff)
p_value <- 2 * pnorm(-abs(z))

# Output results
z
## [1] -1.277753
p_value
## [1] 0.2013365
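
As a check on the output above, the statistic can be reproduced by hand. Eighteen of the twenty paired differences \(IQ1 - IQ2\) equal \(-10\), one equals \(-12\) and one equals \(-16\), so they sum to \(-208\):

\[\bar{d} = \frac{-208}{20} = -10.4, \qquad z = \sqrt{20}\,\frac{-10.4 - (-10)}{1.40} = \sqrt{20}\,\frac{-0.4}{1.40} \approx -1.2778\]

Assuming a significance level of \(\alpha = 0.05\) (a choice made here for illustration, not stated in the example), the two-sided p-value of about 0.2013 exceeds \(\alpha\), so \(H_0\) : \(\mu_d = -10\) is not rejected.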

Remarks:

  • There is no basic R function to calculate the paired z-test directly.
  • The one-sided p-value for hypothesis 2 can be calculated with p_value_2 <- 1 - pnorm(z) and the p-value for hypothesis 3 with p_value_3 <- pnorm(z).
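
Since base R offers no ready-made function, the steps of this note can be collected into a small helper. The sketch below is one possible arrangement under stated assumptions: the function name `paired_z_test`, its argument names, and the `alternative` labels are hypothetical choices for illustration and do not come from the book or from any R package.

```r
# Hypothetical helper (name and arguments are assumptions of this sketch)
# x1, x2:   paired observations of equal length
# d0:       hypothesized mean difference
# sigma_d:  known standard deviation of the differences
paired_z_test <- function(x1, x2, d0 = 0, sigma_d,
                          alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  stopifnot(length(x1) == length(x2), sigma_d > 0)

  d <- x1 - x2                               # paired differences
  n <- length(d)                             # number of pairs
  z <- sqrt(n) * (mean(d) - d0) / sigma_d    # test statistic

  p_value <- switch(alternative,
    two.sided = 2 * pnorm(-abs(z)),          # H1: mu_d != d0
    greater   = pnorm(z, lower.tail = FALSE), # H1: mu_d > d0, same as 1 - pnorm(z)
    less      = pnorm(z)                     # H1: mu_d < d0
  )

  list(statistic = z, p.value = p_value, alternative = alternative)
}

# Usage with the IQ data from the example
IQ1 <- c(127, 98, 105, 83, 133, 90, 107, 98, 91, 100,
         88, 96, 110, 87, 88, 88, 105, 95, 79, 106)
IQ2 <- c(137, 108, 115, 93, 143, 100, 117, 108, 101, 110,
         98, 106, 120, 97, 98, 100, 115, 111, 89, 116)

res_two_sided <- paired_z_test(IQ1, IQ2, d0 = -10, sigma_d = 1.40)
res_greater   <- paired_z_test(IQ1, IQ2, d0 = -10, sigma_d = 1.40,
                               alternative = "greater")
res_less      <- paired_z_test(IQ1, IQ2, d0 = -10, sigma_d = 1.40,
                               alternative = "less")
res_two_sided
```

The two-sided call should reproduce the statistic and p-value shown in the example. Returning a plain list keeps the sketch short; the two one-sided p-values always sum to one, which gives a quick sanity check.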

Noted: Sang Nguyen
Nashville,TN - NOV 2016