Student’s t Confidence Interval with Examples

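In general, the confidence limits for the population mean \(\mu\) are

\[\bar{X} - t_c\frac{s}{\sqrt{N-1}}\] and \[\bar{X} + t_c\frac{s}{\sqrt{N-1}}\] where \(\bar{X}\) is the sample mean, \(N\) is the sample size, and \(s\) is understood as the sample standard deviation computed with divisor \(N\), the convention under which the \(\sqrt{N-1}\) denominator is correct. If the standard deviation is instead computed with divisor \(N-1\), as R's sd() does, the standard error is \(s/\sqrt{N}\), which is the form used in the R examples below. The value \(t_c\), called the critical value or confidence coefficient, depends on the level of confidence desired and on the sample size through the degrees of freedom, \(N-1\). This interval is often referred to as the “t-interval for the mean.”

For example, if \(-t_{0.975}\) and \(t_{0.975}\) are the values of t for which 2.5% of the area lies in each tail of the t distribution, then a 95% confidence interval for \(\mu\) is \[\bar{X}-t_{0.975}\frac{s}{\sqrt{N-1}} < \mu < \bar{X} + t_{0.975}\frac{s}{\sqrt{N-1}}\] Note that \(t_{0.975}\) represents the 97.5th percentile value, while \(t_{0.025} = -t_{0.975}\) represents the 2.5th percentile value.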
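The two standard deviation conventions lead to the same standard error, which a short R sketch can confirm. The vector x is a made-up sample used purely for illustration, and s_N is a hypothetical name for the divisor-N standard deviation.

```r
# Hypothetical sample, for illustration only
x <- c(4.1, 5.3, 3.8, 6.0, 5.1, 4.7)
N <- length(x)

# Standard deviation with divisor N (the convention behind s / sqrt(N - 1))
s_N <- sqrt(mean((x - mean(x))^2))

# Standard error under each convention
se_divisor_N  <- s_N / sqrt(N - 1)
se_divisor_N1 <- sd(x) / sqrt(N)   # sd() uses divisor N - 1

# 95% t interval with N - 1 degrees of freedom
t_c <- qt(0.975, df = N - 1)
ci <- mean(x) + c(-1, 1) * t_c * se_divisor_N
print(c(se_divisor_N, se_divisor_N1))
print(ci)
```

Both standard errors agree up to floating-point error, so the interval does not depend on which convention is used as long as the matching denominator goes with it.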

Difference of Means

For two independent groups with sample sizes \(n_x\) and \(n_y\) and sample standard deviations \(S_x\) and \(S_y\), assumed to share a common variance, the standard error (SE) of the difference in means is
\[SE=\sqrt{\frac{(n_x-1)(S_x)^2+(n_y-1)(S_y)^2}{(n_x+n_y-2)}}* \sqrt{\frac{1}{n_x} + \frac{1}{n_y}}\]

Degrees of freedom = \(df =(n_x-1)+(n_y-1)\)

Difference in means = \(\bar{X}-\bar{Y}\)
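As a minimal sketch, the three quantities above can be combined into an interval from summary statistics alone. The function name pooled_t_interval and the input numbers are hypothetical, chosen only to illustrate the calculation.

```r
# Pooled-variance t interval from summary statistics (hypothetical helper)
pooled_t_interval <- function(mean_x, mean_y, s_x, s_y, n_x, n_y, conf = 0.95) {
  df <- (n_x - 1) + (n_y - 1)
  # Pooled standard deviation: variances weighted by their degrees of freedom
  s_pooled <- sqrt(((n_x - 1) * s_x^2 + (n_y - 1) * s_y^2) / df)
  se <- s_pooled * sqrt(1 / n_x + 1 / n_y)
  # Two-sided critical value, e.g. the 97.5th percentile for conf = 0.95
  t_c <- qt(1 - (1 - conf) / 2, df)
  list(se = se, df = df, interval = (mean_x - mean_y) + c(-1, 1) * t_c * se)
}

# Made-up summary statistics, for illustration only
res <- pooled_t_interval(mean_x = 10, mean_y = 8, s_x = 2, s_y = 3,
                         n_x = 12, n_y = 15)
print(res)
```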

Two groups with unequal variances

Now let’s talk about calculating confidence intervals for two groups that have unequal variances. In this case the formula for the interval is similar to what we saw before, \[\bar{X}-\bar{Y} + c(-1,1)*t_{df} * SE\] where, as before, \(\bar{X}-\bar{Y}\) represents the difference of the sample means, and c(-1,1) is R notation for subtracting and adding the margin of error.
However, the standard error SE and the quantile \(t_{df}\) are calculated differently from the equal-variance case. Here the standard error and degrees of freedom are given by

\[SE = \sqrt{\frac{(s_x)^2}{n_x} + \frac{(s_y)^2}{n_y}}\]

\[df= \frac{(\frac{{s_x}^2}{n_x} + \frac{{s_y}^2}{n_y})^2}{(\frac{(s_x)^2}{n_x})^2\frac{1}{n_x-1} + (\frac{(s_y)^2}{n_y})^2\frac{1}{n_y-1}}\]

and the t interval is
\[\bar{X}-\bar{Y} + c(-1,1) *t_{df} * \sqrt{\frac{(s_x)^2}{n_x} + \frac{(s_y)^2}{n_y}}\]
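The unequal-variance interval can be sketched in the same way from summary statistics. The function name welch_t_interval and the input numbers are hypothetical. Note that qt accepts the non-integer degrees of freedom this formula produces.

```r
# Unequal-variance (Welch) t interval from summary statistics (hypothetical helper)
welch_t_interval <- function(mean_x, mean_y, s_x, s_y, n_x, n_y, conf = 0.95) {
  # Per-group variance of the sample mean
  v_x <- s_x^2 / n_x
  v_y <- s_y^2 / n_y
  se <- sqrt(v_x + v_y)
  # Approximate degrees of freedom; usually not an integer
  df <- (v_x + v_y)^2 / (v_x^2 / (n_x - 1) + v_y^2 / (n_y - 1))
  t_c <- qt(1 - (1 - conf) / 2, df)
  list(se = se, df = df, interval = (mean_x - mean_y) + c(-1, 1) * t_c * se)
}

# Made-up summary statistics, for illustration only
res <- welch_t_interval(mean_x = 10, mean_y = 8, s_x = 2, s_y = 3,
                        n_x = 12, n_y = 15)
print(res)
```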

Example 1

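# Density of the t distribution with 9 degrees of freedom
x = seq(-4,4,0.01)
y = dt(x,df = 9)
plot(x , y, type="l", col = "black",lty=3, main="Student's t distribution with 9 degrees of freedom")
# Vertical lines at the 97.5th percentile (blue) and at its mirror image
abline(v=qt(0.975, df=9),lty= 2, col = 'blue')
abline(v= - qt(0.975, df=9),lty= 2)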

The graph of Student’s t distribution with nine degrees of freedom is shown here. Find the value of t for which

the area to the right of the blue line is 0.05: qt(1-0.05,9) = 1.83
the total area in the two tails (right and left of the lines) is 0.05: qt(1-0.05/2,9) = 2.26
the total area between the two lines is 0.99: qt(1-(1-0.99)/2,9) = 3.25
the area in the left tail is 0.01: qt(0.01,9) = -2.82
the area to the left of the blue line is 0.90: qt(0.9,9) = 1.38
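One way to check quantiles like these is to convert them back to areas with pt, the cumulative distribution function of the t distribution. The variable names below are illustrative.

```r
df <- 9

# Quantiles for each of the five questions
t_right_05  <- qt(1 - 0.05, df)              # area 0.05 to the right
t_two_tail  <- qt(1 - 0.05 / 2, df)          # area 0.05 split across both tails
t_middle_99 <- qt(1 - (1 - 0.99) / 2, df)    # area 0.99 between -t and t
t_left_01   <- qt(0.01, df)                  # area 0.01 to the left
t_left_90   <- qt(0.90, df)                  # area 0.90 to the left

quantiles <- c(t_right_05, t_two_tail, t_middle_99, t_left_01, t_left_90)

# Recover the areas from the quantiles
areas <- c(
  pt(t_right_05, df, lower.tail = FALSE),
  2 * pt(t_two_tail, df, lower.tail = FALSE),
  pt(t_middle_99, df) - pt(-t_middle_99, df),
  pt(t_left_01, df),
  pt(t_left_90, df)
)

print(round(quantiles, 2))
print(areas)
```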

Example 2. Suppose that we want to compare the mean blood pressure between two groups in a randomized trial. The first is a group of 8 oral contraceptive (oc) users and the second is a group of 21 controls (c). (Note that we cannot use the paired t test because the groups are independent and may have different sample sizes.)

\(\bar{X}_{oc}\)=132.86 and \(s_{oc}\)= 15.34
\(\bar{X}_{c}\)=127.44 and \(s_{c}\)= 18.23

Solution:
Standard Error (se) = \(\sqrt{\frac{(n_{oc}-1)(s_{oc})^2+(n_c-1)(s_c)^2}{(n_{oc}+n_c-2)}}* \sqrt{\frac{1}{n_{oc}} + \frac{1}{n_c}}\) = 7.281838
Degrees of freedom (df) = \((n_{oc}-1)+(n_c-1)\) = 27

So the 95% confidence interval is calculated using \(\bar{X}_{oc}-\bar{X}_{c} + c(-1,1)t_{df}se\), with \(t_{df}\) = qt(0.975, 27), which gives approximately (-9.521, 20.361).
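The arithmetic behind the standard error can be written out step by step from the group summaries given above. All values are rounded, and the symbol \(s_p^2\) for the pooled variance is introduced here only for illustration.

\[s_p^2 = \frac{7(15.34)^2 + 20(18.23)^2}{27} \approx \frac{1647.21 + 6646.66}{27} \approx 307.18\]

\[se = \sqrt{s_p^2}\,\sqrt{\frac{1}{8}+\frac{1}{21}} \approx 17.527 \times 0.4155 \approx 7.28\]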

Same example with unequal variance

\(\bar{X}_{oc}\)=132.86 ,\(n_{oc}\) = 8 , \(s_{oc}\)= 15.34
\(\bar{X}_{c}\)=127.44 ,\(n_{c}\) = 21 ,\(s_{c}\)= 18.23

\(df= \frac{(\frac{{s_{oc}}^2}{n_{oc}} + \frac{{s_c}^2}{n_c})^2}{(\frac{(s_{oc})^2}{n_{oc}})^2\frac{1}{n_{oc}-1} + (\frac{(s_c)^2}{n_c})^2\frac{1}{n_c-1}}\) = 15.03518
\(se = \sqrt{\frac{(s_{oc})^2}{n_{oc}} + \frac{(s_c)^2}{n_c}}\) = 6.7260558
\(\bar{X}_{oc}-\bar{X}_{c}\) = 5.42
The 95% confidence interval is given by \(\bar{X}_{oc}-\bar{X}_{c}+c(-1,1)t_{df}se\), with \(t_{df}\) = qt(0.975, 15.03518), which gives approximately (-8.913, 19.753).
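Written out with the numbers from this example (all values rounded), the standard error and the degrees of freedom come from the two per-group terms \(s^2/n\):

\[\frac{(s_{oc})^2}{n_{oc}} = \frac{(15.34)^2}{8} \approx 29.41, \qquad \frac{(s_c)^2}{n_c} = \frac{(18.23)^2}{21} \approx 15.83\]

\[se \approx \sqrt{29.41 + 15.83} \approx 6.73\]

\[df \approx \frac{(29.41+15.83)^2}{\frac{(29.41)^2}{7} + \frac{(15.83)^2}{20}} \approx \frac{2046.7}{123.6 + 12.5} \approx 15.04\]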

Example 3 with the sleep data in R. Load the sleep data set, which ships with R in the datasets package, and get the t confidence interval for the mean of extra using t.test.

Solution:

data(sleep)
sleep

##    extra group ID
## 1    0.7     1  1
## 2   -1.6     1  2
## 3   -0.2     1  3
## 4   -1.2     1  4
## 5   -0.1     1  5
## 6    3.4     1  6
## 7    3.7     1  7
## 8    0.8     1  8
## 9    0.0     1  9
## 10   2.0     1 10
## 11   1.9     2  1
## 12   0.8     2  2
## 13   1.1     2  3
## 14   0.1     2  4
## 15  -0.1     2  5
## 16   4.4     2  6
## 17   5.5     2  7
## 18   1.6     2  8
## 19   4.6     2  9
## 20   3.4     2 10

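# Sample standard deviation of all 20 observations (used in the manual calculation)
sd <- sd(sleep$extra)
# One-sample 95% t interval for the mean of extra
t.test(sleep$extra)$conf.int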

## [1] 0.5955845 2.4844155
## attr(,"conf.level")
## [1] 0.95

Let’s do the same manually using the formula, with \(n = 20\) and \(df = n - 1 = 19\). Keeping sd unrounded reproduces the t.test result:

\(\bar{X}+c(-1,1)qt(0.975,df)\frac{sd}{\sqrt{n}}\)= 0.5955845, 2.4844155
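A minimal sketch of that manual calculation in R, keeping the standard deviation unrounded so that the result can be compared directly with t.test. The variable names are illustrative.

```r
data(sleep)
x <- sleep$extra
n <- length(x)

# Standard error of the mean, using sd() (divisor n - 1)
se <- sd(x) / sqrt(n)

# Mean plus and minus the 97.5th percentile of t with n - 1 df times se
manual_ci <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * se

print(manual_ci)
print(t.test(x)$conf.int)
```

Rounding the standard deviation before plugging it in shifts the limits slightly, which is a common reason for a hand calculation to disagree with t.test in the third decimal place.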

Example 4

g1<- c(0.7 ,-1.6 ,-0.2,-1.2, -0.1 , 3.4 , 3.7 , 0.8 , 0.0 , 2.0)
g2 <- c(1.9 , 0.8,  1.1 , 0.1, -0.1 ,4.4 , 5.5 , 1.6 , 4.6  ,3.4)
t.test(g1,g2,paired =FALSE,var.equal= TRUE)$conf.int

## [1] -3.363874  0.203874
## attr(,"conf.level")
## [1] 0.95

t.test(g1,g2,paired =FALSE,var.equal= FALSE )$conf.int

## [1] -3.3654832  0.2054832
## attr(,"conf.level")
## [1] 0.95

t.test(g1,g2,paired =TRUE )$conf.int

## [1] -2.4598858 -0.7001142
## attr(,"conf.level")
## [1] 0.95

Using the formula for equal variances, the 95% confidence interval is

g1<- c(0.7 ,-1.6 ,-0.2,-1.2, -0.1 , 3.4 , 3.7 , 0.8 , 0.0 , 2.0)
g2 <- c(1.9 , 0.8,  1.1 , 0.1, -0.1 ,4.4 , 5.5 , 1.6 , 4.6  ,3.4)
m_g1 <- mean(g1)
sd_g1 <- sd(g1)
m_g2 <- mean(g2)
sd_g2 <- sd(g2)
# Pooled standard deviation (n = 10 per group, so 9 df each) times sqrt(1/10 + 1/10)
se <- ((9*sd_g1^2 + 9*sd_g2^2)/18)^0.5 * (2/10)^0.5

# 95% confidence interval
m_g1-m_g2 + c(-1,1)*qt(0.975,18)*se

## [1] -3.363874  0.203874

Using the formula when the variances are unequal

g1<- c(0.7 ,-1.6 ,-0.2,-1.2, -0.1 , 3.4 , 3.7 , 0.8 , 0.0 , 2.0)
g2 <- c(1.9 , 0.8,  1.1 , 0.1, -0.1 ,4.4 , 5.5 , 1.6 , 4.6  ,3.4)
m_g1 <- mean(g1)
sd_g1 <- sd(g1)
m_g2 <- mean(g2)
sd_g2 <- sd(g2)

# Unequal-variance standard error and approximate degrees of freedom (n = 10 per group)
se <- sqrt(sd_g1^2/10 + sd_g2^2/10)
df <- (sd_g1^2/10 + sd_g2^2/10)^2 /( sd_g1^4/(10^2 *9) + sd_g2^4/(10^2 *9))

# 95% confidence interval
m_g1-m_g2 + c(-1,1)*qt(0.975,df)*se

## [1] -3.3654832  0.2054832
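The paired interval reported by t.test(g1, g2, paired = TRUE) can be reproduced by hand as well. A paired interval is a one-sample t interval computed on the within-subject differences, so the sketch below applies the one-sample formula to g1 - g2. The names d and paired_ci are illustrative.

```r
g1 <- c(0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0)
g2 <- c(1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4)

# Within-subject differences
d <- g1 - g2
n <- length(d)

# One-sample 95% t interval on the differences, with n - 1 degrees of freedom
paired_ci <- mean(d) + c(-1, 1) * qt(0.975, df = n - 1) * sd(d) / sqrt(n)
print(paired_ci)
```

Because pairing removes the subject-to-subject variation, this interval is much narrower than the two independent-group intervals computed from the same numbers.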