Confidence Intervals for One Sample: Continuous Outcome

Confidence Intervals for μ

For n > 30 use the Z table for the standard normal distribution.

\[\bar{X} \pm Z \frac{S}{\sqrt{n}}\ \]

For n<30 use the t table with df=n-1

\[\bar{X} \pm t \frac{S}{\sqrt{n}}\ \]
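The two formulas differ only in the critical value, so they can be wrapped in a single function. The following is a minimal R sketch, not part of the original material: the name `mean_ci` and the example inputs are hypothetical, and treating n = 30 like the small-sample case is an assumption, since the rule above does not cover it.

```r
# Hypothetical helper: confidence interval for a mean from summary statistics.
mean_ci <- function(x_bar, s, n, level = 0.95) {
  p <- 1 - (1 - level) / 2          # upper-tail quantile, e.g. 0.975 for 95%
  se <- s / sqrt(n)                 # standard error of the sample mean
  # Z for n > 30, otherwise t with n - 1 degrees of freedom (assumption for n = 30)
  crit <- if (n > 30) qnorm(p) else qt(p, df = n - 1)
  x_bar + c(-1, 1) * crit * se
}

# Illustrative values only, not taken from the tables in this document.
ci_small <- mean_ci(x_bar = 100, s = 15, n = 25)   # uses t with df = 24
ci_large <- mean_ci(x_bar = 100, s = 15, n = 400)  # uses Z
print(ci_small)
print(ci_large)
```

For the same mean and standard deviation, the small-sample interval is wider both because the standard error is larger and because the t critical value exceeds the Z value.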

Question 1:

| Characteristic | n | Sample Mean | Standard Deviation (s) |
|---|---|---|---|
| Body Mass Index | 3326 | 28.15 | 5.32 |
| Systolic Blood Pressure | 3534 | 127.30 | 19.00 |

What is the 90% confidence interval for BMI? Since n > 30, we use Z score:

x_cap = 28.15   # sample mean BMI from the table above
n = 3326
S = 5.32
ci = .90
ci_90 = x_cap + c(-1,1) * qnorm(ci+(1-ci)/2) * S/sqrt(n)

90% confidence interval is (27.9982678, 28.3017322)

Question 2:

| Characteristic | n | Sample Mean | Standard Deviation (s) |
|---|---|---|---|
| Body Mass Index | 10 | 27.26 | 3.1 |
| Systolic Blood Pressure | 10 | 121.20 | 11.1 |

What is the 90% confidence interval for BMI? Since n < 30, we use t score with degrees of freedom (df)=n-1, df = 10 - 1 = 9:

x_cap=27.26
n = 10
df = n - 1
S = 3.1
ci = .90
ci_90 = x_cap + c(-1,1) * qt(ci+(1-ci)/2, df=df) * S/sqrt(n)

90% confidence interval is (25.4629883, 29.0570117)

Confidence Interval for the Population Proportion

The sample proportion is \[ \hat{p}\] (called “p-hat”), and it is computed by taking the ratio of the number of successes in the sample to the sample size, that is: \[ \hat{p}=x/n \]

If there are more than 5 successes and more than 5 failures, then the confidence interval can be computed with this formula: \[\hat{p} \pm Z \times SE(\hat{p})\ \] where \[SE(\hat{p})\] is the standard error of the point estimate, calculated as follows: \[SE(\hat{p}) = \sqrt{ \frac{ \hat{p} (1-\hat{p})} {n}} \] Therefore, the full formula for the confidence interval is: \[\hat{p} \pm Z \sqrt{\frac{ \hat{p} (1-\hat{p})} {n}} \]
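The formula above translates directly into a few lines of R. This is a minimal sketch under stated assumptions: the function name `prop_ci` and the example counts are hypothetical, and the warning simply encodes the "more than 5 successes and more than 5 failures" rule from the text.

```r
# Hypothetical helper: normal-approximation confidence interval for a proportion.
prop_ci <- function(x, n, level = 0.95) {
  # The approximation is only recommended with more than 5 successes and failures.
  if (x <= 5 || (n - x) <= 5) {
    warning("5 or fewer successes or failures: normal approximation may be poor")
  }
  p_hat <- x / n                            # point estimate
  se <- sqrt(p_hat * (1 - p_hat) / n)       # standard error of p_hat
  z <- qnorm(1 - (1 - level) / 2)           # Z quantile, about 1.96 for 95%
  p_hat + c(-1, 1) * z * se
}

# Illustrative counts only: 40 successes out of 200.
ci_p <- prop_ci(x = 40, n = 200)
print(ci_p)
```

Note that the critical value comes from `qnorm`, the quantile function; `pnorm` goes in the opposite direction and returns a probability.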

Question 3:

The table below shows the number of men and women found with or without cardiovascular disease (CVD). Estimate the prevalence of CVD in men using a 95% confidence interval.

| Characteristic | Free of CVD | Prevalent CVD | Total |
|---|---|---|---|
| Men | 1548 | 244 | 1792 |
| Women | 1872 | 135 | 2007 |
| Total | 3420 | 379 | 3799 |

x = 244
n = 1792
p_hat = x/n
SE = sqrt(p_hat*(1-p_hat)/n)
ci = .95
# qnorm() returns the Z quantile; pnorm() would return a probability instead
ci_95 = p_hat + c(-1,1) * qnorm(ci+(1-ci)/2) * SE

95% confidence interval is (0.1203, 0.1520)
With 95% confidence, the prevalence of cardiovascular disease in men is between 12.0% and 15.2%.

Confidence Interval for Two Independent Samples, Continuous Outcome

For example, we might be interested in comparing mean systolic blood pressure in men and women, or perhaps compare body mass index (BMI) in smokers and non-smokers. Both of these situations involve comparisons between two independent groups, meaning that there are different people in the groups being compared.
The use of Z or t again depends on whether the sample sizes are large (n1 > 30 and n2 > 30) or small.
The parameter of interest is the difference in population means, μ1 - μ2. The point estimate for the difference in population means is the difference in sample means: \[\bar{x}_{1} - \bar{x}_{2} \] Assuming that the variances in the two populations are similar, the standard error (SE) of the difference in sample means is built from the pooled estimate of the common standard deviation (Sp): \[SE(\bar{x}_{1} - \bar{x}_{2}) = S_{p}\sqrt{\frac{1}{n_{1}}+\frac{1}{n_{2}}} \]
The pooled estimate of the common standard deviation is the square root of a weighted average of the sample variances, with weights given by the degrees of freedom in each sample: \[S_{p}=\sqrt{\frac{(n_{1}-1)s_{1}^2+(n_{2}-1)s_{2}^2} {n_{1}+n_{2}-2}} \]

Computing the Confidence Interval for a Difference Between Two Means

If n1 > 30 and n2 > 30, we can use the z-table: \[(\bar{x_{1}}-\bar{x_{2}}) \pm zS_{p}\sqrt{\frac{1}{n_{1}}+\frac{1}{n_{2}}} \] If n1 < 30 or n2 < 30, use the t-table: \[(\bar{x_{1}}-\bar{x_{2}}) \pm tS_{p}\sqrt{\frac{1}{n_{1}}+\frac{1}{n_{2}}} \]
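The pooled standard deviation and the choice between z and t can be combined in one function. The sketch below is illustrative rather than part of the original material: `pooled_diff_ci` and its example inputs are hypothetical, and using t whenever either sample has 30 or fewer observations is an assumption that extends the rule above to n = 30.

```r
# Hypothetical helper: CI for a difference in means using the pooled SD.
pooled_diff_ci <- function(x1, s1, n1, x2, s2, n2, level = 0.95) {
  df <- n1 + n2 - 2
  # Pooled estimate of the common standard deviation
  s_p <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / df)
  se <- s_p * sqrt(1 / n1 + 1 / n2)         # standard error of the difference
  p <- 1 - (1 - level) / 2
  # z when both samples are large, otherwise t with n1 + n2 - 2 df
  crit <- if (n1 > 30 && n2 > 30) qnorm(p) else qt(p, df = df)
  list(s_p = s_p, ci = (x1 - x2) + c(-1, 1) * crit * se)
}

# Illustrative values only: two groups of 40 with equal standard deviations.
res <- pooled_diff_ci(x1 = 50, s1 = 10, n1 = 40, x2 = 45, s2 = 10, n2 = 40)
print(res$s_p)
print(res$ci)
```

When both sample standard deviations are equal, the pooled value equals that common value, which gives a quick sanity check on the formula.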

Question 4

| Characteristic | n (men) | Sample Mean (men) | s (men) | n (women) | Sample Mean (women) | s (women) |
|---|---|---|---|---|---|---|
| SBP | 6 | 117.5 | 9.7 | 4 | 126.8 | 12 |

Suppose we wish to construct a 95% confidence interval for the difference in mean systolic blood pressures between men and women using these data. We will again arbitrarily designate men group 1 and women group 2. Since the sample sizes are small (i.e., n1 < 30 and n2 < 30), the confidence interval formula with t is appropriate. The ratio of the sample variances is 9.7²/12.0² = 0.65, which falls between 0.5 and 2, suggesting that the assumption of equality of population variances is reasonable.

n_1 = 6
n_2 = 4
df = n_1 + n_2 - 2
S_1 = 9.7
S_2 = 12
x_1 = 117.5
x_2 = 126.8

S_p = sqrt( ((n_1 - 1) * S_1^2 + (n_2 - 1) * S_2^2) / df )
print(S_p)
## [1] 10.62103
ci = .95
ci_95 = (x_1-x_2) + c(-1,1) * qt(ci+(1-ci)/2, df=df) * S_p* sqrt(1/n_1+1/n_2)

95% confidence interval for the difference is (-25.1096058, 6.5096058)

Confidence Intervals for Matched Samples, Continuous Outcome

Consider the following scenarios:
- A single sample of participants and each participant is measured twice, once before and then after an intervention.
- A single sample of participants and each participant is measured twice under two different experimental conditions (e.g., in a crossover trial).

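Here the outcome is the within-participant difference, with sample mean \[\bar{X}_{d}\] and standard deviation \[S_{d}\] computed from the n differences.

Generic formula: \[\bar{X}_{d} \pm \text{critical value} \times \frac{S_{d}}{\sqrt{n}}\]

If n > 30, use the Z table for the standard normal distribution.
If n < 30, use the t table with df = n-1.

Full formula for n < 30:
\[\bar{X}_{d} \pm t \frac{S_{d}}{\sqrt{n}}\]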

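| Subject | Examination 6 | Examination 7 | Difference |
|---|---|---|---|
| 1 | 168 | 141 | -27 |
| 2 | 111 | 119 | 8 |
| 3 | 139 | 122 | -17 |
| 4 | 127 | 127 | 0 |
| 5 | 155 | 125 | -30 |
| 6 | 115 | 123 | 8 |
| 7 | 125 | 113 | -12 |
| 8 | 123 | 106 | -17 |
| 9 | 130 | 131 | 1 |
| 10 | 137 | 142 | 5 |
| 11 | 130 | 131 | 1 |
| 12 | 129 | 135 | 6 |
| 13 | 112 | 119 | 7 |
| 14 | 141 | 130 | -11 |
| 15 | 122 | 121 | -1 |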

In this sample, n = 15, the mean difference score is -5.27, and the standard deviation of the differences is 12.81.
The calculations are shown below.

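| Subject | Difference | Difference - Mean Difference | (Difference - Mean Difference)² |
|---|---|---|---|
| 1 | -27 | -21.7 | 470.89 |
| 2 | 8 | 13.3 | 176.89 |
| 3 | -17 | -11.7 | 136.89 |
| 4 | 0 | 5.3 | 28.09 |
| 5 | -30 | -24.7 | 610.09 |
| 6 | 8 | 13.3 | 176.89 |
| 7 | -12 | -6.7 | 44.89 |
| 8 | -17 | -11.7 | 136.89 |
| 9 | 1 | 6.3 | 39.69 |
| 10 | 5 | 10.3 | 106.09 |
| 11 | 1 | 6.3 | 39.69 |
| 12 | 6 | 11.3 | 127.69 |
| 13 | 7 | 12.3 | 151.29 |
| 14 | -11 | -5.7 | 32.49 |
| 15 | -1 | 4.3 | 18.49 |
| Sum | -79 | 0.0 | 2296.95 |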

\[\bar{X}_{d} = \frac{\sum Differences}{n} = \frac{-79.0}{15} = -5.3\] \[S_{d} = \sqrt{\frac{\sum(Differences-\bar{X}_{d})^2}{n-1}} = \sqrt{\frac{2296.95}{14}} = 12.8\]

We can now use these descriptive statistics to compute a 95% confidence interval for the mean difference in systolic blood pressures in the population. Because the sample size is small (n=15), we use the formula that employs the t-statistic. The degrees of freedom are df=n-1=14. From the table of t-scores, t = 2.145. \[\bar{X}_{d} \pm t \frac{S_{d}}{\sqrt{n}} = -5.3 \pm 2.145\frac{12.8}{\sqrt{15}} = -5.3 \pm 7.1\]
So, the 95% confidence interval for the difference is (-12.4, 1.8)
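The same interval can be reproduced in R from the raw measurements in the table. This is a minimal sketch: the vector names `exam6`, `exam7` and `d` are chosen here for illustration, and the one-sample `t.test` on the differences is used only as a cross-check of the manual formula.

```r
# Systolic blood pressures from the table of 15 subjects
exam6 <- c(168, 111, 139, 127, 155, 115, 125, 123, 130, 137, 130, 129, 112, 141, 122)
exam7 <- c(141, 119, 122, 127, 125, 123, 113, 106, 131, 142, 131, 135, 119, 130, 121)

d <- exam7 - exam6          # difference = Examination 7 - Examination 6
n <- length(d)
d_bar <- mean(d)            # mean difference, about -5.27
s_d <- sd(d)                # SD of the differences, about 12.81

# Manual interval with the t critical value for df = n - 1
ci_manual <- d_bar + c(-1, 1) * qt(0.975, df = n - 1) * s_d / sqrt(n)

# Cross-check: a one-sample t-test on the differences gives the same interval
ci_ttest <- as.numeric(t.test(d, conf.level = 0.95)$conf.int)

print(round(ci_manual, 1))
```

Working from unrounded values gives roughly (-12.4, 1.8), in line with the hand calculation that rounds the mean and standard deviation first.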

Confidence Interval for Two Independent Samples, Dichotomous Outcome

It is common to compare two independent groups with respect to the presence or absence of a dichotomous characteristic or attribute. When the outcome is dichotomous, the analysis involves comparing the proportions of successes between the two groups. There are several ways of comparing proportions in two independent groups.

One can compute a risk difference, which is computed by taking the difference in proportions between comparison groups and is similar to the estimate of the difference in means for a continuous outcome.
The risk ratio (or relative risk) is another useful measure to compare proportions between two independent populations and it is computed by taking the ratio of proportions.

Confidence Interval for a Risk Difference or Prevalence Difference

A risk difference (RD) or prevalence difference is a difference in proportions. The point estimate is the difference in sample proportions: \[\hat{RD} = \hat{p_{1}} - \hat{p_{2}} \]

The sample proportions are computed by taking the ratio of the number of “successes” (or health events, x) to the sample size (n) in each group: \[\hat{p_{1}} = \frac{x_{1}}{n_{1}} \text{ and } \hat{p_{2}} = \frac{x_{2}}{n_{2}}\]

Computing the Confidence Interval for a Difference in Proportions ( p1-p2 )

\[(\hat{p_{1}} - \hat{p_{2}}) \pm Z\sqrt{\frac{\hat{p_{1}}(1-\hat{p_{1}})}{n_{1}} + \frac{\hat{p_{2}}(1-\hat{p_{2}})}{n_{2}}}\]
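When the data arrive as counts rather than proportions, the formula can be applied in one step. The following R sketch is illustrative only: `risk_diff_ci` and the example counts are hypothetical and do not come from this document.

```r
# Hypothetical helper: CI for a risk difference from event counts.
risk_diff_ci <- function(x1, n1, x2, n2, level = 0.95) {
  p1 <- x1 / n1                              # sample proportion, group 1
  p2 <- x2 / n2                              # sample proportion, group 2
  se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  z <- qnorm(1 - (1 - level) / 2)            # about 1.96 for 95%
  (p1 - p2) + c(-1, 1) * z * se
}

# Illustrative counts only: 30/100 events versus 20/100 events.
rd_ci <- risk_diff_ci(x1 = 30, n1 = 100, x2 = 20, n2 = 100)
print(rd_ci)
```

With these made-up counts the interval spans 0, the null value for a difference, so such data would not show a statistically significant difference.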

Question 5

Compute the 95% confidence interval for the difference in proportions of patients reporting relief (in this case a risk difference, since it is a difference in cumulative incidence).

| Treatment group | n | Number with reduction (x) | Proportion with reduction |
|---|---|---|---|
| New pain reliever | 50 | 23 | 0.46 |
| Standard pain reliever | 50 | 11 | 0.22 |

\[(0.46 - 0.22) \pm 1.96\sqrt{\frac{0.46(1-0.46)}{50} + \frac{0.22(1-0.22)}{50}}\]

\[ 0.24 \pm 0.18 \]

p1 = 0.46
p2 = 0.22
ci = 0.95
ci_95 = (p1-p2) + c(-1,1)*qnorm(ci+(1-ci)/2)*sqrt(p1*(1-p1)/50 + p2*(1-p2)/50)

95% confidence interval is (0.0603663, 0.4196337)
Interpretation: Our best estimate is an absolute increase of 24 percentage points in the proportion of patients reporting pain relief with the new treatment, and with 95% confidence, the risk difference is between 6% and 42%. Since the 95% confidence interval does not contain the null value of 0, we can conclude that there is a statistically significant improvement with the new treatment.

Confidence Intervals for the Risk Ratio (Relative Risk)

| Group | With Outcome | Without Outcome | Total |
|---|---|---|---|
| Exposed group (1) | x1 | n1-x1 | n1 |
| Non-exposed group (2) | x2 | n2-x2 | n2 |

Computation of a Confidence Interval for a Risk Ratio

\[\hat{RR}=\hat{p}_{1}/\hat{p}_{2}\] First, compute the confidence interval for ln(RR): \[ln(\hat{RR}) \pm Z\sqrt{\frac{(n_{1} - x_{1})/x_{1}}{n_{1}} + \frac{(n_{2} - x_{2})/x_{2}}{n_{2}}}\] Then compute the confidence interval for RR by finding the antilog of each limit, i.e., exp(Lower Limit), exp(Upper Limit).

Note that the null value of the confidence interval for the relative risk is one. If a 95% CI for the relative risk includes the null value of 1, then there is insufficient evidence to conclude that the groups are statistically significantly different.

Question 6

Compute the point estimate for the relative risk for achieving pain relief, comparing those receiving the new drug to those receiving the standard pain reliever. Then compute the 95% confidence interval for the relative risk, and interpret your findings in words.

| Treatment group | n | Number with reduction (x) | Proportion with reduction |
|---|---|---|---|
| New pain reliever | 50 | 23 | 0.46 |
| Standard pain reliever | 50 | 11 | 0.22 |

\[\hat{RR} = \frac{0.46}{0.22} = 2.09\] \[ln(\hat{RR}) \pm Z\sqrt{\frac{(n_{1} - x_{1})/x_{1}}{n_{1}} + \frac{(n_{2} - x_{2})/x_{2}}{n_{2}}} = ln(2.09) \pm 1.96 \sqrt{\frac{(50 - 23)/23}{50} + \frac{(50 - 11)/11}{50}} = 0.737 \pm 0.602 = (0.135, 1.339) \] This is the confidence interval for ln(RR). To compute the lower and upper limits of the confidence interval for RR, we take the antilog of each limit with the exp function:
exp(0.135) = 1.14
exp(1.339) = 3.82
The point estimate for the risk ratio is RR = p1/p2 = 0.46/0.22 = 2.09, so patients receiving the new drug are 2.09 times as likely to report pain relief as patients receiving the standard drug.
We are 95% confident that patients receiving the new pain reliever are between 1.14 and 3.82 times as likely to report a meaningful reduction in pain as patients receiving the standard pain reliever. Because this interval does not include the null value of 1, the difference is statistically significant.
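The log-scale calculation for Question 6 can be checked in R. This is a minimal sketch using the counts from the table (23 of 50 and 11 of 50); the variable names are chosen here for illustration.

```r
# Counts from the pain reliever table
x1 <- 23; n1 <- 50    # new pain reliever
x2 <- 11; n2 <- 50    # standard pain reliever

rr <- (x1 / n1) / (x2 / n2)                          # point estimate, about 2.09

# Standard error of ln(RR), as in the formula above
se_log_rr <- sqrt(((n1 - x1) / x1) / n1 + ((n2 - x2) / x2) / n2)

z <- qnorm(0.975)                                    # about 1.96 for 95%
ci_log_rr <- log(rr) + c(-1, 1) * z * se_log_rr      # interval for ln(RR)
ci_rr <- exp(ci_log_rr)                              # back-transform to RR scale

print(round(rr, 2))
print(round(ci_rr, 2))
```

Because the interval is symmetric on the log scale, it is not symmetric around the point estimate once it is transformed back with `exp`.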

Confidence Intervals for the Odds Ratio

| Group | Diseased | Non-diseased |
|---|---|---|
| Exposed | a | b |
| Non-Exposed | c | d |

Odds ratio computed: OR = (a/b) / (c/d)
Odds ratios do not follow a normal distribution, so we use the log transformation to promote normality.

Computing the Confidence Interval for an Odds Ratio

\[ln(OR) \pm Z\sqrt{\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}}\]

The null, or no difference, value of the confidence interval for the odds ratio is one. If a 95% CI for the odds ratio does not include one, then the odds are said to be statistically significantly different.

Question 7

| Treatment group | n | Number with reduction (x) | Proportion with reduction |
|---|---|---|---|
| New pain reliever | 50 | 23 | 0.46 |
| Standard pain reliever | 50 | 11 | 0.22 |

It is easier to solve this problem if the information is organized in a contingency table in this way (50 - 23 = 27 etc):

| Treatment group | Pain relief 3+ | Less relief |
|---|---|---|
| New pain reliever | 23 | 27 |
| Standard pain reliever | 11 | 39 |

Odds of pain relief 3+ with new drug = 23/27 = 0.8519
Odds of pain relief 3+ with standard drug = 11/39 = 0.2821
Odds Ratio = 0.8519 / 0.2821 = 3.02
To compute the 95% confidence interval for the odds ratio:
\[ln(3.02) \pm 1.96\sqrt{\frac{1}{23}+\frac{1}{27}+\frac{1}{11}+\frac{1}{39}} = 1.105257 \pm 0.87008 = (0.235173, 1.975341)\]

exp(0.235173) = 1.265
exp(1.975341) = 7.209
The point estimate of the odds ratio is OR=3.02, and we are 95% confident that the true odds ratio lies between 1.27 and 7.21. This is statistically significant because the 95% confidence interval does not include the null value (OR=1.0).
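The odds ratio interval for Question 7 can be checked the same way in R. This is a minimal sketch using the four cell counts from the contingency table; the names `a`, `b`, `c_` and `d` follow the generic table (with `c_` used here to avoid masking R's built-in `c` function).

```r
# Cell counts from the contingency table
a <- 23    # new drug, pain relief 3+
b <- 27    # new drug, less relief
c_ <- 11   # standard drug, pain relief 3+
d <- 39    # standard drug, less relief

or <- (a / b) / (c_ / d)                          # point estimate, about 3.02
se_log_or <- sqrt(1 / a + 1 / b + 1 / c_ + 1 / d) # standard error of ln(OR)

z <- qnorm(0.975)                                 # about 1.96 for 95%
ci_or <- exp(log(or) + c(-1, 1) * z * se_log_or)  # back-transformed interval

print(round(or, 2))
print(round(ci_or, 2))
```

The result agrees with the hand calculation to two decimal places; small differences in later digits come from using the exact quantile instead of 1.96.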
