Tests whether two population means \(\mu_1\) and \(\mu_2\) differ by less than, by more than, or by exactly a value \(d_0\).
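\[T = \frac{(\bar{X}_1 - \bar{X}_2) - d_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}\] with \[s_j = \sqrt{\frac{1}{n_j-1}\sum_{i=1}^{n_j} (X_{ji} - \bar{X}_j)^2}\]
for \(j = 1, 2\), where \(X_{ji}\) is the \(i\)-th observation in sample \(j\) and \(n_j\) is the size of sample \(j\).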
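As a minimal sketch of the statistic above, the following R snippet computes \(T\) directly from two samples. The vectors `x1` and `x2` and the helper name `welch_t` are hypothetical and serve only as an illustration.

```r
# Hypothetical samples, chosen only for illustration
x1 <- c(5.1, 4.9, 6.2, 5.8, 6.0)
x2 <- c(6.8, 7.4, 6.9, 8.1, 7.7, 7.0)
d0 <- 0  # hypothesised difference mu_1 - mu_2

# Test statistic with unpooled variances; var() uses the n - 1 denominator
welch_t <- function(x1, x2, d0 = 0) {
  se <- sqrt(var(x1) / length(x1) + var(x2) / length(x2))
  (mean(x1) - mean(x2) - d0) / se
}

t_obs <- welch_t(x1, x2, d0)
print(t_obs)
```

The result should match the `t` value reported by `t.test(x1, x2, mu = d0, var.equal = FALSE)`.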
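Reject \(H_0\) if, for the observed value \(t\) of \(T\):
(A) \(H_1: \mu_1 - \mu_2 \ne d_0\): \(t \lt t_{\alpha/2, v}\) or \(t \gt t_{1-\alpha/2, v}\)
(B) \(H_1: \mu_1 - \mu_2 \gt d_0\): \(t \gt t_{1-\alpha, v}\)
(C) \(H_1: \mu_1 - \mu_2 \lt d_0\): \(t \lt t_{\alpha,v}\)
The corresponding p-values are:
(A) \(p = 2P(T \le (-|t|))\)
(B) \(p = 1 - P(T \le t)\)
(C) \(p = P(T \le t)\)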
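The three rejection rules and p-values above map onto R's quantile function `qt()` and distribution function `pt()`. The values of `t_obs`, `v` and `alpha` below are hypothetical placeholders, not results from this text.

```r
t_obs <- -2.1   # hypothetical observed value of T
v     <- 12.4   # hypothetical degrees of freedom (need not be an integer)
alpha <- 0.05   # significance level

# Rejection decisions based on critical values
reject_two_sided <- t_obs < qt(alpha / 2, v) || t_obs > qt(1 - alpha / 2, v)
reject_greater   <- t_obs > qt(1 - alpha, v)
reject_less      <- t_obs < qt(alpha, v)

# Matching p-values
p_two_sided <- 2 * pt(-abs(t_obs), v)
p_greater   <- 1 - pt(t_obs, v)
p_less      <- pt(t_obs, v)
```

In each case, rejecting \(H_0\) via the critical value is equivalent to the p-value falling below `alpha`.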
\[v = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\left(\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}\right)} \] gives the degrees of freedom according to the approximation of Bernard Welch (1947) and Franklin Satterthwaite (1946).
\(t_{\alpha,v}\) is the \(\alpha\)-quantile of the t-distribution with \(v\) degrees of freedom.
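The degrees-of-freedom formula can be sketched in a few lines of R. The function name `welch_df` and the sample vectors are hypothetical and used only for illustration.

```r
# Welch-Satterthwaite degrees of freedom from two samples
welch_df <- function(x1, x2) {
  a <- var(x1) / length(x1)  # s_1^2 / n_1
  b <- var(x2) / length(x2)  # s_2^2 / n_2
  (a + b)^2 / (a^2 / (length(x1) - 1) + b^2 / (length(x2) - 1))
}

# Hypothetical samples
x1 <- c(5.1, 4.9, 6.2, 5.8, 6.0)
x2 <- c(6.8, 7.4, 6.9, 8.1, 7.7, 7.0)

v <- welch_df(x1, x2)
print(v)
```

The value of \(v\) is generally not an integer and always lies between \(\min(n_1, n_2) - 1\) and \(n_1 + n_2 - 2\).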
William Cochran and Gertrude Cox (1950) proposed an alternative way to calculate critical values for the test statistic.
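The text does not spell out the Cochran and Cox procedure. A commonly cited form, assumed here, keeps the same test statistic but replaces the critical value by a weighted average of the t quantiles with \(n_1 - 1\) and \(n_2 - 1\) degrees of freedom, using the weights \(s_j^2/n_j\). The sketch below covers the two-sided case; the sample vectors are hypothetical.

```r
# Hypothetical samples
x1 <- c(5.1, 4.9, 6.2, 5.8, 6.0)
x2 <- c(6.8, 7.4, 6.9, 8.1, 7.7, 7.0)
alpha <- 0.05

w1 <- var(x1) / length(x1)  # weight s_1^2 / n_1
w2 <- var(x2) / length(x2)  # weight s_2^2 / n_2

# Per-sample two-sided critical values
t1 <- qt(1 - alpha / 2, df = length(x1) - 1)
t2 <- qt(1 - alpha / 2, df = length(x2) - 1)

# Assumed Cochran-Cox critical value: weighted average of t1 and t2
crit_cc <- (w1 * t1 + w2 * t2) / (w1 + w2)

# Same test statistic as before, here with d_0 = 0
t_obs <- (mean(x1) - mean(x2)) / sqrt(w1 + w2)
reject <- abs(t_obs) > crit_cc
```

Because `crit_cc` is a weighted average, it always lies between the two per-sample critical values.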
The assumption of an underlying Gaussian distribution can be relaxed if both samples are large. Usually, sample sizes \(n_1, n_2 \ge 25\) or \(30\) are considered large enough.
The following example tests the hypothesis that the mean systolic blood pressures of healthy subjects (status=0) and subjects with hypertension (status=1) are equal, hence \(d_0 = 0\). The dataset contains \(n_1 = 25\) subjects with status 0 and \(n_2 = 30\) with status 1.
# Blood pressure dataset: subject number, hypertension status, systolic pressure (mmHg)
no <- 1:55
status <- c(rep(0, 25), rep(1, 30))
mmhg <- c(120,115,94,118,111,102,102,131,104,107,115,139,115,113,114,105,
115,134,109,109,93,118,109,106,125,150,142,119,127,141,149,144,
142,149,161,143,140,148,149,141,146,159,152,135,134,161,130,125,
141,148,153,145,137,147,169)
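blood_pressure <- data.frame(no, status, mmhg)
# Split the measurements by hypertension status
status0 <- blood_pressure$mmhg[blood_pressure$status == 0]
status1 <- blood_pressure$mmhg[blood_pressure$status == 1]
# Two-sided Welch test of H0: mu_1 - mu_2 = 0 (variances not assumed equal)
t.test(status0, status1, mu = 0, alternative = "two.sided", var.equal = FALSE)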
##
## Welch Two Sample t-test
##
## data: status0 and status1
## t = -10.451, df = 50.886, p-value = 2.887e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -37.32904 -25.29763
## sample estimates:
## mean of x mean of y
## 112.9200 144.2333
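The confidence interval in the output above can be cross-checked by hand as \((\bar{X}_1 - \bar{X}_2) \pm t_{1-\alpha/2, v} \cdot \sqrt{s_1^2/n_1 + s_2^2/n_2}\). This is a sketch under the assumption that `t.test` builds its Welch interval this way; the variable names are illustrative.

```r
# Blood pressure measurements from the example, split by status
status0 <- c(120, 115, 94, 118, 111, 102, 102, 131, 104, 107, 115, 139, 115,
             113, 114, 105, 115, 134, 109, 109, 93, 118, 109, 106, 125)
status1 <- c(150, 142, 119, 127, 141, 149, 144, 142, 149, 161, 143, 140, 148,
             149, 141, 146, 159, 152, 135, 134, 161, 130, 125, 141, 148, 153,
             145, 137, 147, 169)

a <- var(status0) / length(status0)  # s_1^2 / n_1
b <- var(status1) / length(status1)  # s_2^2 / n_2
se <- sqrt(a + b)                    # standard error of the mean difference

# Welch-Satterthwaite degrees of freedom
v <- (a + b)^2 / (a^2 / (length(status0) - 1) + b^2 / (length(status1) - 1))

# 95 percent confidence interval for mu_1 - mu_2
mean_diff <- mean(status0) - mean(status1)
ci <- mean_diff + c(-1, 1) * qt(0.975, v) * se
print(ci)
```

The interval lies entirely below zero, which is consistent with the two-sided test rejecting \(H_0\) at the 5% level.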