Statistical Hypothesis Testing with SAS and R

by Dirk Taeger and Sonja Kuhnt

(c) John Wiley & Sons, Ltd

Test 2.2.1: Two-sample z-test

Description:

Tests whether two population means \(\mu_1\) and \(\mu_2\) differ by a value \(d_0\), by less than \(d_0\), or by more than \(d_0\).

Assumptions:

Hypotheses:

  1. \(H_0\) : \(\mu_1 - \mu_2 = d_0\) vs \(H_1\) : \(\mu_1 - \mu_2 \neq d_0\).
  2. \(H_0\) : \(\mu_1 - \mu_2 \le d_0\) vs \(H_1\) : \(\mu_1 - \mu_2 \gt d_0\).
  3. \(H_0\) : \(\mu_1 - \mu_2 \ge d_0\) vs \(H_1\) : \(\mu_1 - \mu_2 \lt d_0\).

Test statistic:

\[Z = \frac{(\bar{X}_1 - \bar{X}_2) - d_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}\]

Test decision:

Reject \(H_0\) if, for the observed value \(z\) of \(Z\):

  1. \(z \lt z_{\alpha/2}\) or \(z \gt z_{1-\alpha/2}\)

  2. \(z \gt z_{1-\alpha}\)

  3. \(z \lt z_\alpha\)
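
The three rejection rules can be checked numerically with standard normal quantiles from `qnorm`. The following is a minimal sketch, not taken from the book: the significance level `alpha = 0.05` is an assumed value (the text does not fix one), and `z` is the observed statistic reported in the Example section of this page.

```r
# Significance level: an assumed value, the text leaves alpha unspecified
alpha <- 0.05
# Observed test statistic, copied from the Example section of this page
z <- -10.55572

# Standard normal quantiles used as critical values
z_alpha_half       <- qnorm(alpha / 2)      # z_{alpha/2}
z_one_minus_a_half <- qnorm(1 - alpha / 2)  # z_{1-alpha/2}
z_one_minus_alpha  <- qnorm(1 - alpha)      # z_{1-alpha}
z_alpha            <- qnorm(alpha)          # z_{alpha}

# Decision for each of the three hypothesis pairs
reject_1 <- (z < z_alpha_half) || (z > z_one_minus_a_half)  # two-sided
reject_2 <- z > z_one_minus_alpha                           # H1: difference > d0
reject_3 <- z < z_alpha                                     # H1: difference < d0

c(case_1 = reject_1, case_2 = reject_2, case_3 = reject_3)
```

With these inputs, \(H_0\) is rejected in cases 1 and 3 but not in case 2, because the observed \(z\) lies far in the lower tail.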

P-value:

  1. \(p = 2\Phi(-|z|)\)

  2. \(p = 1 - \Phi(z)\)

  3. \(p = \Phi(z)\)

Annotation:

Example

This example tests the hypothesis that the mean systolic blood pressures of healthy subjects (status=0) and subjects with hypertension (status=1) are equal (\(d_0 = 0\)), with known standard deviations \(\sigma_1 = 10\) and \(\sigma_2 = 12\). The dataset contains \(n_1 = 25\) subjects with status 0 and \(n_2 = 30\) with status 1.

# Blood pressure dataset
no <- seq_len(55)
status <- c(rep(0, 25), rep(1, 30))
mmhg <- c(120,115,94,118,111,102,102,131,104,107,115,139,115,113,114,105,
          115,134,109,109,93,118,109,106,125,150,142,119,127,141,149,144,
          142,149,161,143,140,148,149,141,146,159,152,135,134,161,130,125,
          141,148,153,145,137,147,169)
blood_pressure <- data.frame(no, status, mmhg)
# Set difference to be tested
d0 <- 0
# Set standard deviation of sample with status 0
sigma0 <- 10
# Set standard deviation of sample with status 1
sigma1 <- 12

# Calculate the two group means
mean_status0 <- mean(blood_pressure$mmhg[blood_pressure$status == 0])
mean_status1 <- mean(blood_pressure$mmhg[blood_pressure$status == 1])

# Calculate the two sample sizes
n_status0 <- length(blood_pressure$mmhg[blood_pressure$status == 0])
n_status1 <- length(blood_pressure$mmhg[blood_pressure$status == 1])

# Calculate test statistic and two-sided p-value
z <- ((mean_status0 - mean_status1) - d0) / sqrt(sigma0^2 / n_status0 + sigma1^2 / n_status1)
p_value <- 2 * pnorm(-abs(z))

# Output results
z
## [1] -10.55572
p_value
## [1] 4.779482e-26
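
As a check on the reported statistic, the same value can be reproduced by hand. The group means below are computed from the data listed in this example (112.92 for status 0 and approximately 144.2333 for status 1) and are rounded, so the final digits are approximate:

\[z = \frac{(112.92 - 144.2333) - 0}{\sqrt{\frac{10^2}{25} + \frac{12^2}{30}}} = \frac{-31.3133}{\sqrt{8.8}} \approx \frac{-31.3133}{2.9665} \approx -10.556\]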

Remarks:

  • Base R has no function that performs the two-sample z-test directly.
  • The one-sided p-value for hypothesis 2 can be calculated with p_value_B=1-pnorm(z) and the p-value for hypothesis 3 with p_value_C=pnorm(z).
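
Since the test has to be assembled by hand, the calculation can be wrapped in a small function. The sketch below is one possible way to do this and is not from the book: the function name `two_sample_z_test` and its arguments are hypothetical, and the data are the blood pressure values from the example. For the upper-tail p-value it uses `pnorm(z, lower.tail = FALSE)`, which is mathematically the same as `1 - pnorm(z)` but avoids the loss of precision from subtraction when `z` is large and positive.

```r
# Hypothetical helper (illustrative only): two-sample z-test with known
# standard deviations sigma_x and sigma_y, testing mean(x) - mean(y) against d0.
two_sample_z_test <- function(x, y, sigma_x, sigma_y, d0 = 0,
                              alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  # Standard error of the difference in means under known variances
  se <- sqrt(sigma_x^2 / length(x) + sigma_y^2 / length(y))
  z <- (mean(x) - mean(y) - d0) / se
  p_value <- switch(alternative,
    two.sided = 2 * pnorm(-abs(z)),             # hypothesis 1
    greater   = pnorm(z, lower.tail = FALSE),   # hypothesis 2
    less      = pnorm(z)                        # hypothesis 3
  )
  list(z = z, p_value = p_value, alternative = alternative)
}

# Blood pressure values from the example, split by status
healthy <- c(120,115,94,118,111,102,102,131,104,107,115,139,115,113,114,105,
             115,134,109,109,93,118,109,106,125)
hypertensive <- c(150,142,119,127,141,149,144,142,149,161,143,140,148,149,141,
                  146,159,152,135,134,161,130,125,141,148,153,145,137,147,169)

result <- two_sample_z_test(healthy, hypertensive, sigma_x = 10, sigma_y = 12)
result$z
result$p_value
```

Called with the default two-sided alternative, this should reproduce the `z` and `p_value` shown in the example output; passing `alternative = "greater"` or `alternative = "less"` gives the one-sided p-values.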

Noted: Sang Nguyen
Nashville,TN - NOV 2016