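In general, we represent the confidence limits for the population mean by
\[\bar{X} - t_c\frac{s}{\sqrt{N-1}} \quad\text{and}\quad \bar{X} + t_c\frac{s}{\sqrt{N-1}},\] where s is the sample standard deviation computed with divisor N. With the usual divisor N - 1, written \(\hat{s}\), the same limits read \(\bar{X} \pm t_c\,\hat{s}/\sqrt{N}\). The values \(\pm t_c\), called critical values or confidence coefficients, depend on the level of confidence desired and on the sample size through the degrees of freedom, N - 1. This interval is often referred to as the “t-interval for the mean.”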
For example, if \(-t_{0.975}\) and \(t_{0.975}\) are the values of t for which 2.5% of the area lies in each tail of the t distribution, then the 95% confidence interval for \(\mu\) is \[\bar{X}-t_{0.975}\frac{s}{\sqrt{N-1}} < \mu < \bar{X} + t_{0.975}\frac{s}{\sqrt{N-1}}.\] Note that \(t_{0.975}\) is the 97.5th percentile value, while \(t_{0.025} = -t_{0.975}\) is the 2.5th percentile value.
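A minimal R sketch of the one-sample t-interval, assuming (as the \(\sqrt{N-1}\) denominator implies) that s is computed with divisor N. The helper `t_interval_mean` and the vector `x` are hypothetical and only for illustration; the block also writes the same limits with R's `sd()` (divisor N - 1) and \(\sqrt{N}\) to show that the two forms agree.

```r
# Hypothetical helper: t-interval for a population mean, written with the
# divisor-N standard deviation s and the sqrt(N - 1) denominator.
t_interval_mean <- function(x, conf = 0.95) {
  N <- length(x)
  t_c <- qt(1 - (1 - conf) / 2, df = N - 1)  # t_0.975 when conf = 0.95
  s_N <- sqrt(sum((x - mean(x))^2) / N)      # standard deviation, divisor N
  mean(x) + c(-1, 1) * t_c * s_N / sqrt(N - 1)
}

# Illustrative (made-up) sample.
x <- c(4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.6, 4.9)

ci_textbook <- t_interval_mean(x)
# The same limits using R's sd() (divisor N - 1) and sqrt(N).
ci_sd <- mean(x) + c(-1, 1) * qt(0.975, df = length(x) - 1) * sd(x) / sqrt(length(x))
print(ci_textbook)
print(ci_sd)
```

Both vectors should also match `t.test(x)$conf.int`, which uses the divisor N - 1 form.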
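For two independent groups with sample sizes \(n_x\) and \(n_y\) and sample standard deviations \(s_x\) and \(s_y\), and assuming the two populations have equal variances, the standard error (SE) of the difference in means is
\[SE=\sqrt{\frac{(n_x-1)s_x^2+(n_y-1)s_y^2}{n_x+n_y-2}}\cdot\sqrt{\frac{1}{n_x}+\frac{1}{n_y}}\]
Degrees of freedom: \(df=(n_x-1)+(n_y-1)=n_x+n_y-2\)
Difference in means: \(\bar{X}-\bar{Y}\), so the confidence interval is \(\bar{X}-\bar{Y} \pm t_{df}\,SE\), where for 95% confidence \(t_{df}\) is the 97.5th percentile of the t distribution with df degrees of freedom.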
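A minimal R sketch of the pooled (equal-variance) interval computed from summary statistics. The function name `pooled_t_interval` and the sample vectors are hypothetical; the last lines compare the result with `t.test(..., var.equal = TRUE)` on the same made-up data, taking \(t_{df}\) as `qt(0.975, df)` for 95% confidence.

```r
# Hypothetical helper: equal-variance (pooled) t-interval for mu_x - mu_y.
pooled_t_interval <- function(mean_x, mean_y, s_x, s_y, n_x, n_y, conf = 0.95) {
  df <- (n_x - 1) + (n_y - 1)
  s_p <- sqrt(((n_x - 1) * s_x^2 + (n_y - 1) * s_y^2) / df)  # pooled SD
  se <- s_p * sqrt(1 / n_x + 1 / n_y)
  t_c <- qt(1 - (1 - conf) / 2, df = df)
  list(se = se, df = df, ci = (mean_x - mean_y) + c(-1, 1) * t_c * se)
}

# Made-up samples, used only to check the helper against t.test().
x <- c(5.1, 6.2, 4.8, 5.9, 6.5)
y <- c(4.2, 5.0, 4.7, 3.9, 5.3)
res <- pooled_t_interval(mean(x), mean(y), sd(x), sd(y), length(x), length(y))
print(res$ci)
print(as.numeric(t.test(x, y, var.equal = TRUE)$conf.int))
```

Working from summary statistics (rather than raw vectors) mirrors the worked examples, where only means, SDs and sample sizes are given.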
Now let’s calculate confidence intervals for two groups with unequal variances. The interval has the same form as before,
\[\bar{X}-\bar{Y} \pm t_{df}\,SE,\]
where, as before, \(\bar{X}-\bar{Y}\) is the difference of the sample means.
However, the standard error SE and the degrees of freedom used for the quantile \(t_{df}\) are calculated differently. The standard error is
\[SE = \sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}}\]
and the degrees of freedom are given by the Welch–Satterthwaite approximation, which is generally not an integer:
\[df= \frac{\left(\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}\right)^2}{\left(\frac{s_x^2}{n_x}\right)^2\frac{1}{n_x-1} + \left(\frac{s_y^2}{n_y}\right)^2\frac{1}{n_y-1}}\]
The t interval is therefore
\[\bar{X}-\bar{Y} \pm t_{df}\sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}}\]
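A minimal R sketch of the unequal-variance (Welch) interval from summary statistics. The function name `welch_t_interval` and the sample vectors are hypothetical; `qt()` accepts a non-integer df, so the Welch–Satterthwaite value can be used without rounding.

```r
# Hypothetical helper: Welch (unequal-variance) t-interval for mu_x - mu_y,
# with Welch-Satterthwaite degrees of freedom.
welch_t_interval <- function(mean_x, mean_y, s_x, s_y, n_x, n_y, conf = 0.95) {
  v_x <- s_x^2 / n_x
  v_y <- s_y^2 / n_y
  se <- sqrt(v_x + v_y)
  df <- (v_x + v_y)^2 / (v_x^2 / (n_x - 1) + v_y^2 / (n_y - 1))
  t_c <- qt(1 - (1 - conf) / 2, df = df)  # qt() accepts a non-integer df
  list(se = se, df = df, ci = (mean_x - mean_y) + c(-1, 1) * t_c * se)
}

# Made-up samples with visibly different spreads.
x <- c(5.1, 6.2, 4.8, 5.9, 6.5, 7.4)
y <- c(4.2, 5.0, 4.7, 4.9)
res <- welch_t_interval(mean(x), mean(y), sd(x), sd(y), length(x), length(y))
print(res$df)
print(res$ci)

# Sanity check: equal SDs and equal sizes give df = 2 * (n - 1).
print(welch_t_interval(0, 0, 2, 2, 5, 5)$df)  # 8
```

On raw data, these numbers should agree with `t.test(x, y, var.equal = FALSE)`.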
Example 1
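x <- seq(-4, 4, 0.01)
y <- dt(x, df = 9)  # t density with 9 degrees of freedom
plot(x, y, type = "l", col = "black", lty = 3, main = "Student's t distribution with 9 degrees of freedom")
# Dashed lines at -t_0.975 and t_0.975 (2.5% of the area in each tail)
abline(v = qt(0.975, df = 9), lty = 2, col = "blue")
abline(v = -qt(0.975, df = 9), lty = 2, col = "blue")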
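The graph shows Student’s t distribution with nine degrees of freedom. The dashed vertical lines mark \(\pm t_{0.975}\) = ±qt(0.975, df = 9) ≈ ±2.262, which leave 2.5% of the area in each tail.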
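A short R check of the critical values drawn in the plot, using base R's `qt()` and `pt()`; `qnorm()` is included only for comparison with the standard normal critical value.

```r
# Critical value t_0.975 for 9 degrees of freedom.
t_c <- qt(0.975, df = 9)
print(t_c)                   # about 2.262

# Area in each tail beyond -t_c and t_c.
print(pt(-t_c, df = 9))      # 0.025
print(1 - pt(t_c, df = 9))   # 0.025

# The standard normal critical value is smaller (about 1.96), so a t-interval
# with few degrees of freedom is wider than the corresponding z-interval.
print(qnorm(0.975))
```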
Example 2. Suppose that we want to compare the mean blood pressure between two groups in a randomized trial. The first is a group of 8 oral contraceptive (oc) users and the second is a group of 21 controls (c). (Note that we cannot use the paired t test because the groups are independent and may have different sample sizes.)
\(n_{oc}=8\), \(\bar{X}_{oc}=132.86\), \(s_{oc}=15.34\)
\(n_{c}=21\), \(\bar{X}_{c}=127.44\), \(s_{c}=18.23\)
Solution:
Standard error: \(SE = \sqrt{\frac{(n_{oc}-1)s_{oc}^2+(n_c-1)s_c^2}{n_{oc}+n_c-2}}\cdot\sqrt{\frac{1}{n_{oc}} + \frac{1}{n_c}} = 7.281838\)
Degrees of freedom: \(df = (n_{oc}-1)+(n_c-1) = 27\)
So the 95% confidence interval, \((\bar{X}_{oc}-\bar{X}_c) \pm t_{0.975,\,27}\,SE\) with \(t_{0.975,\,27} \approx 2.052\), is (-9.521, 20.361).
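The same calculation in base R, using only the Example 2 summary statistics from the text. Because `t.test()` needs raw observations, which are not given here, the pooled formula is coded directly.

```r
# Example 2 summary statistics, taken from the text.
n_oc <- 8;  xbar_oc <- 132.86; s_oc <- 15.34
n_c  <- 21; xbar_c  <- 127.44; s_c  <- 18.23

df  <- n_oc + n_c - 2                                        # 27
s_p <- sqrt(((n_oc - 1) * s_oc^2 + (n_c - 1) * s_c^2) / df)  # pooled SD
se  <- s_p * sqrt(1 / n_oc + 1 / n_c)                        # about 7.2818
ci  <- (xbar_oc - xbar_c) + c(-1, 1) * qt(0.975, df) * se
print(round(ci, 3))
```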
Same example, assuming unequal variances
\(\bar{X}_{oc}=132.86\), \(n_{oc}=8\), \(s_{oc}=15.34\)
\(\bar{X}_{c}=127.44\), \(n_{c}=21\), \(s_{c}=18.23\)
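\(df= \frac{\left(\frac{s_{oc}^2}{n_{oc}} + \frac{s_c^2}{n_c}\right)^2}{\left(\frac{s_{oc}^2}{n_{oc}}\right)^2\frac{1}{n_{oc}-1} + \left(\frac{s_c^2}{n_c}\right)^2\frac{1}{n_c-1}} = 15.03518\)
\(SE = \sqrt{\frac{s_{oc}^2}{n_{oc}} + \frac{s_c^2}{n_c}} = 6.7260558\)
Difference in means: \(\bar{X}_{oc}-\bar{X}_c = 5.42\)
The 95% confidence interval is given by \((\bar{X}_{oc}-\bar{X}_c) \pm t_{0.975,\,df}\,SE\) = -8.916, 19.756. These limits correspond to df rounded down to 15, as when reading a printed t table; the unrounded df gives a very slightly narrower interval.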
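A base R sketch of the Welch calculation for Example 2, from the summary statistics in the text. It computes the interval both with the unrounded Welch–Satterthwaite df and with df rounded down to 15; treating the rounded-down version as "table style" is an assumption about how hand-calculated values are usually obtained.

```r
# Example 2 summary statistics, taken from the text.
n_oc <- 8;  xbar_oc <- 132.86; s_oc <- 15.34
n_c  <- 21; xbar_c  <- 127.44; s_c  <- 18.23

v_oc <- s_oc^2 / n_oc
v_c  <- s_c^2 / n_c
se <- sqrt(v_oc + v_c)                                            # about 6.726
df <- (v_oc + v_c)^2 / (v_oc^2 / (n_oc - 1) + v_c^2 / (n_c - 1))  # about 15.035

# Interval with the unrounded Welch df
ci_exact <- (xbar_oc - xbar_c) + c(-1, 1) * qt(0.975, df) * se
# Interval with df rounded down to 15, as when reading a printed t table
ci_table <- (xbar_oc - xbar_c) + c(-1, 1) * qt(0.975, floor(df)) * se
print(round(ci_exact, 3))
print(round(ci_table, 3))
```

Rounding df down gives a slightly larger critical value, so it is the more conservative (wider) choice.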
Example 3. Use the sleep data set that ships with R (in the datasets package) and get a t confidence interval with t.test.
Solution:
data(sleep)
sleep
## extra group ID
## 1 0.7 1 1
## 2 -1.6 1 2
## 3 -0.2 1 3
## 4 -1.2 1 4
## 5 -0.1 1 5
## 6 3.4 1 6
## 7 3.7 1 7
## 8 0.8 1 8
## 9 0.0 1 9
## 10 2.0 1 10
## 11 1.9 2 1
## 12 0.8 2 2
## 13 1.1 2 3
## 14 0.1 2 4
## 15 -0.1 2 5
## 16 4.4 2 6
## 17 5.5 2 7
## 18 1.6 2 8
## 19 4.6 2 9
## 20 3.4 2 10
s <- sd(sleep$extra)           # avoid naming this sd, which masks the sd() function
t.test(sleep$extra)$conf.int   # one-sample t interval for the mean of all 20 values
## [1] 0.5955845 2.4844155
## attr(,"conf.level")
## [1] 0.95
Let's do the same manually, using the formula \(\bar{X} \pm t_{0.975,\,n-1}\frac{s}{\sqrt{n}}\) with s = sd(sleep$extra), n = 20 and df = n - 1 = 19. This reproduces the t.test interval: 0.5955845, 2.4844155.
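A minimal R sketch of that manual calculation, checked against `t.test()`; `as.numeric()` is used only to drop the `conf.level` attribute before comparing.

```r
data(sleep)                     # datasets package, attached by default
x <- sleep$extra
n <- length(x)                  # 20
s <- sd(x)                      # sample SD (divisor n - 1)
manual_ci <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * s / sqrt(n)
print(manual_ci)

# Compare with t.test(); as.numeric() drops the "conf.level" attribute.
print(all.equal(manual_ci, as.numeric(t.test(x)$conf.int)))  # TRUE
```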
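Example 4. Compare the two groups of the sleep data (g1 holds group 1 and g2 holds group 2, each ordered by ID) using equal-variance, unequal-variance (Welch) and paired t intervals.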
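g1 <- c(0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0)  # sleep group 1, ordered by ID
g2 <- c(1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4)    # sleep group 2, ordered by ID
# Two independent groups, pooled (equal) variances
t.test(g1, g2, paired = FALSE, var.equal = TRUE)$conf.int
## [1] -3.363874 0.203874
## attr(,"conf.level")
## [1] 0.95
# Two independent groups, unequal variances (Welch)
t.test(g1, g2, paired = FALSE, var.equal = FALSE)$conf.int
## [1] -3.3654832 0.2054832
## attr(,"conf.level")
## [1] 0.95
# Paired: in the sleep data the same 10 patients (ID) appear in both groups
t.test(g1, g2, paired = TRUE)$conf.int
## [1] -2.4598858 -0.7001142
## attr(,"conf.level")
## [1] 0.95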
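The paired interval is a one-sample t-interval on the within-patient differences. A minimal sketch by hand, assuming g1[i] and g2[i] belong to the same patient, as the ID column of sleep indicates:

```r
g1 <- c(0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0)
g2 <- c(1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4)

d <- g1 - g2                    # within-patient differences
n <- length(d)                  # 10 pairs, so df = 9
paired_ci <- mean(d) + c(-1, 1) * qt(0.975, df = n - 1) * sd(d) / sqrt(n)
print(paired_ci)
```

Because the differences remove between-patient variation, the paired interval is much narrower than either independent-groups interval in the output above.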
Using the formula for the pooled (equal-variance) 95% confidence interval:
g1 <- c(0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0)
g2 <- c(1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4)
m_g1 <- mean(g1)
sd_g1 <- sd(g1)
m_g2 <- mean(g2)
sd_g2 <- sd(g2)
# Pooled SD times sqrt(1/n1 + 1/n2), with n1 = n2 = 10 and df = 10 + 10 - 2 = 18
se <- sqrt((9 * sd_g1^2 + 9 * sd_g2^2) / 18) * sqrt(1/10 + 1/10)
# 95% confidence interval
m_g1 - m_g2 + c(-1, 1) * qt(0.975, df = 18) * se
## [1] -3.363874 0.203874
Using the formula when the variances are unequal (Welch):
g1 <- c(0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0)
g2 <- c(1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4)
m_g1 <- mean(g1)
sd_g1 <- sd(g1)
m_g2 <- mean(g2)
sd_g2 <- sd(g2)
# Unpooled standard error (n1 = n2 = 10)
se <- sqrt(sd_g1^2/10 + sd_g2^2/10)
# Welch-Satterthwaite degrees of freedom (n1 - 1 = n2 - 1 = 9)
df <- (sd_g1^2/10 + sd_g2^2/10)^2 / (sd_g1^4/(10^2 * 9) + sd_g2^4/(10^2 * 9))
# 95% confidence interval
m_g1 - m_g2 + c(-1, 1) * qt(0.975, df) * se
## [1] -3.3654832 0.2054832
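As a cross-check, `t.test()` reports the Welch degrees of freedom in the `parameter` element of its result. A short sketch comparing it with the hand-computed value, written with `var()` and `length()` so it does not hard-code n = 10:

```r
g1 <- c(0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0)
g2 <- c(1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4)

v1 <- var(g1) / length(g1)      # squared standard error of mean(g1)
v2 <- var(g2) / length(g2)      # squared standard error of mean(g2)
df_manual <- (v1 + v2)^2 / (v1^2 / (length(g1) - 1) + v2^2 / (length(g2) - 1))

welch <- t.test(g1, g2, var.equal = FALSE)
print(welch$parameter)          # named "df"
print(df_manual)
```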