Confidence Intervals for μ
For n > 30 use the Z table for the standard normal distribution.
\[\bar{X} \pm Z \frac{S}{\sqrt{n}}\ \]
For n<30 use the t table with df=n-1
\[\bar{X} \pm t \frac{S}{\sqrt{n}}\ \]
| Characteristic | n | Sample Mean | Standard Deviation (s) |
|---|---|---|---|
| Body Mass Index | 3326 | 28.15 | 5.32 |
| Systolic Blood Pressure | 3534 | 127.30 | 19.00 |
What is the 90% confidence interval for BMI? Since n > 30, we use the Z score:
x_cap = 28.15
n = 3326
S = 5.32
ci = .90
ci_90 = x_cap + c(-1,1) * qnorm(ci+(1-ci)/2) * S/sqrt(n)
90% confidence interval is (27.9982678, 28.3017322)
| Characteristic | n | Sample Mean | Standard Deviation (s) |
|---|---|---|---|
| Body Mass Index | 10 | 27.26 | 3.1 |
| Systolic Blood Pressure | 10 | 121.20 | 11.1 |
What is the 90% confidence interval for BMI? Since n < 30, we use t score with degrees of freedom (df)=n-1, df = 10 - 1 = 9:
x_cap=27.26
n = 10
df = n - 1
S = 3.1
ci = .90
ci_90 = x_cap + c(-1,1) * qt(ci+(1-ci)/2, df=df) * S/sqrt(n)
90% confidence interval is (25.4629883, 29.0570117)
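The choice between Z and t can be folded into one small function. This is only a sketch: `mean_ci` and its argument names are hypothetical (they do not come from the notes), and treating n = 30 exactly as a small sample is an assumption, since the rule above only covers n > 30 and n < 30.

```r
# Hypothetical helper, not part of the original notes.
# Returns the confidence interval for a mean from summary statistics.
mean_ci <- function(x_bar, s, n, level = 0.90) {
  # Cumulative probability of the upper critical value, e.g. 0.95 for a 90% CI
  p <- 1 - (1 - level) / 2
  # Assumption: n = 30 exactly is treated as "small" and uses t
  crit <- if (n > 30) qnorm(p) else qt(p, df = n - 1)
  x_bar + c(-1, 1) * crit * s / sqrt(n)
}

# Small-sample BMI example from the table above (n = 10)
ci_small <- mean_ci(x_bar = 27.26, s = 3.1, n = 10)
print(round(ci_small, 2))
```

Passing `level = 0.95` instead would give the wider 95% interval from the same summary statistics.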
The sample proportion is \[ \hat{p} \] (called “p-hat”), computed as the ratio of the number of successes in the sample to the sample size: \[ \hat{p}=x/n \]
If the sample contains more than 5 successes and more than 5 failures, the confidence interval can be computed with this formula: \[\hat{p} \pm Z \times SE(\hat{p}) \] where \[SE(\hat{p})\] is the standard error of the point estimate, calculated as follows: \[SE(\hat{p}) = \sqrt{ \frac{ \hat{p} (1-\hat{p})} {n}} \] Therefore, the full formula for the confidence interval is: \[\hat{p} \pm Z \sqrt{\frac{ \hat{p} (1-\hat{p})} {n}} \]
The table below shows the number of men and women found with or without cardiovascular disease (CVD). Estimate the prevalence of CVD in men using a 95% confidence interval.
| Characteristic | Free of CVD | Prevalent CVD | Total |
|---|---|---|---|
| Men | 1548 | 244 | 1792 |
| Women | 1872 | 135 | 2007 |
| Total | 3420 | 379 | 3799 |
x=244
n = 1792
p_hat=x/n
SE = sqrt(p_hat*(1-p_hat)/n)
ci = .95
ci_95 = p_hat + c(-1,1) * qnorm(ci+(1-ci)/2) * SE
95% confidence interval is (0.1203, 0.1520)
With 95% confidence, the prevalence of cardiovascular disease in men is between 12.0% and 15.2%.
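A reusable version of this calculation might look like the sketch below. The function name `prop_ci` is hypothetical, and the `stopifnot` check simply encodes the "more than 5 successes and more than 5 failures" rule of thumb stated earlier. Note that the critical value comes from `qnorm` (the quantile function), not `pnorm` (the cumulative probability).

```r
# Hypothetical helper, not part of the original notes.
# Normal-approximation confidence interval for a single proportion.
prop_ci <- function(x, n, level = 0.95) {
  # Rule of thumb from the text: more than 5 successes and more than 5 failures
  stopifnot(x > 5, n - x > 5)
  p_hat <- x / n
  se <- sqrt(p_hat * (1 - p_hat) / n)
  z <- qnorm(1 - (1 - level) / 2)   # about 1.96 for a 95% interval
  p_hat + c(-1, 1) * z * se
}

# Prevalence of CVD in men: 244 cases among 1792 men
ci_men <- prop_ci(x = 244, n = 1792)
print(round(100 * ci_men, 1))   # limits expressed as percentages
```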
For example, we might be interested in comparing mean systolic blood pressure in men and women, or perhaps compare body mass index (BMI) in smokers and non-smokers. Both of these situations involve comparisons between two independent groups, meaning that there are different people in the groups being compared.
The use of Z or t again depends on whether the sample sizes are large (n1 > 30 and n2 > 30) or small.
The parameter of interest is the difference in population means, μ1 - μ2. The point estimate for the difference in population means is the difference in sample means: \[\bar{x_{1}} - \bar{x_{2}} \] Assuming that the variances in the two populations are similar, the standard error (SE) of the difference in sample means is based on the pooled estimate of the common standard deviation (Sp), which is computed from a weighted average of the sample variances: \[SE(\bar{x_{1}} - \bar{x_{2}}) = S_{p}\sqrt{\frac{1}{n_{1}}+\frac{1}{n_{2}}} \]
and the pooled estimate of the common standard deviation is: \[S_{p}=\sqrt{\frac{(n_{1}-1)s_{1}^2+(n_{2}-1)s_{2}^2} {n_{1}+n_{2}-2}} \]
If n1 > 30 and n2 > 30, we can use the z-table: \[(\bar{x_{1}}-\bar{x_{2}}) \pm zS_{p}\sqrt{\frac{1}{n_{1}}+\frac{1}{n_{2}}} \] If n1 < 30 or n2 < 30, use the t-table: \[(\bar{x_{1}}-\bar{x_{2}}) \pm tS_{p}\sqrt{\frac{1}{n_{1}}+\frac{1}{n_{2}}} \]
| Characteristic | n (men) | Sample Mean (men) | s (men) | n (women) | Sample Mean (women) | s (women) |
|---|---|---|---|---|---|---|
| SBP | 6 | 117.5 | 9.7 | 4 | 126.8 | 12 |
Suppose we wish to construct a 95% confidence interval for the difference in mean systolic blood pressures between men and women using these data. We will again arbitrarily designate men group 1 and women group 2. Since the sample sizes are small (i.e., n1 < 30 and n2 < 30), the confidence interval formula with t is appropriate. The ratio of the sample variances is 9.7²/12.0² = 0.65, which falls between 0.5 and 2, suggesting that the assumption of equality of population variances is reasonable.
n_1 = 6
n_2 = 4
df = n_1 + n_2 - 2
S_1 = 9.7
S_2 = 12
x_1 = 117.5
x_2 = 126.8
S_p = sqrt( ((n_1 - 1) * S_1^2 + (n_2 - 1) * S_2^2) / df )
print(S_p)
## [1] 10.62103
ci = .95
ci_95 = (x_1-x_2) + c(-1,1) * qt(ci+(1-ci)/2, df=df) * S_p* sqrt(1/n_1+1/n_2)
95% confidence interval for the difference is (-25.1096058, 6.5096058)
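The steps above (pooled standard deviation, choice of Z or t, variance-ratio check) can be collected into one function. This is a sketch under stated assumptions: `pooled_ci` is a hypothetical name, and issuing a warning when the variance ratio falls outside 0.5 to 2 is one possible way to apply the rule of thumb mentioned above, not something the notes prescribe.

```r
# Hypothetical helper, not part of the original notes.
# Pooled-variance confidence interval for a difference in means.
pooled_ci <- function(x1, s1, n1, x2, s2, n2, level = 0.95) {
  df <- n1 + n2 - 2
  s_p <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / df)
  # Assumption: warn if the variance ratio is outside the 0.5 to 2 range
  ratio <- s1^2 / s2^2
  if (ratio < 0.5 || ratio > 2) {
    warning("variance ratio outside 0.5-2; pooling may not be appropriate")
  }
  p <- 1 - (1 - level) / 2
  # z when both samples are large, otherwise t with n1 + n2 - 2 df
  crit <- if (n1 > 30 && n2 > 30) qnorm(p) else qt(p, df = df)
  (x1 - x2) + c(-1, 1) * crit * s_p * sqrt(1 / n1 + 1 / n2)
}

# SBP example: men are group 1, women are group 2
ci_sbp <- pooled_ci(x1 = 117.5, s1 = 9.7, n1 = 6, x2 = 126.8, s2 = 12, n2 = 4)
print(round(ci_sbp, 2))
```

Because the interval includes 0, these data do not show a statistically significant difference in mean systolic blood pressure between men and women.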
Consider the following scenarios:
- A single sample of participants and each participant is measured twice, once before and then after an intervention.
- A single sample of participants and each participant is measured twice under two different experimental conditions (e.g., in a crossover trial).
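Generic formula: \[\bar{X}_{d} \pm (\text{critical value}) \times \frac{S_{d}}{\sqrt{n}}\] where \[\bar{X}_{d}\] is the mean of the difference scores and \[S_{d}\] is their standard deviation.
If n > 30, use the Z table for the standard normal distribution.
If n < 30, use the t table with df = n-1.
Full formula for n < 30:
\[\bar{X}_{d} \pm t \frac{S_{d}}{\sqrt{n}}\]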
| Subject # | Examination 6 | Examination 7 | Difference |
|---|---|---|---|
| 1 | 168 | 141 | -27 |
| 2 | 111 | 119 | 8 |
| 3 | 139 | 122 | -17 |
| 4 | 127 | 127 | 0 |
| 5 | 155 | 125 | -30 |
| 6 | 115 | 123 | 8 |
| 7 | 125 | 113 | -12 |
| 8 | 123 | 106 | -17 |
| 9 | 130 | 131 | 1 |
| 10 | 137 | 142 | 5 |
| 11 | 130 | 131 | 1 |
| 12 | 129 | 135 | 6 |
| 13 | 112 | 119 | 7 |
| 14 | 141 | 130 | -11 |
| 15 | 122 | 121 | -1 |
In this sample, n = 15, the mean difference score is -5.27, and the standard deviation of the differences is 12.81.
The calculations are shown below:
| Subject # | Difference | Difference - Mean Difference | (Difference - Mean Difference)² |
|---|---|---|---|
| 1 | -27 | -21.7 | 470.89 |
| 2 | 8 | 13.3 | 176.89 |
| 3 | -17 | -11.7 | 136.89 |
| 4 | 0 | 5.3 | 28.09 |
| 5 | -30 | -24.7 | 610.09 |
| 6 | 8 | 13.3 | 176.89 |
| 7 | -12 | -6.7 | 44.89 |
| 8 | -17 | -11.7 | 136.89 |
| 9 | 1 | 6.3 | 39.69 |
| 10 | 5 | 10.3 | 106.09 |
| 11 | 1 | 6.3 | 39.69 |
| 12 | 6 | 11.3 | 127.69 |
| 13 | 7 | 12.3 | 151.29 |
| 14 | -11 | -5.7 | 32.49 |
| 15 | -1 | 4.3 | 18.49 |
| Sum | -79 | 0.0 | 2296.95 |
\[\bar{X}_{d} = \frac{\sum \text{Differences}}{n} = \frac{-79.0}{15} = -5.3\] \[S_{d} = \sqrt{\frac{\sum(\text{Differences}-\bar{X}_{d})^2}{n-1}} = \sqrt{\frac{2296.95}{14}} = 12.8\]
We can now use these descriptive statistics to compute a 95% confidence interval for the mean difference in systolic blood pressures in the population. Because the sample size is small (n=15), we use the formula that employs the t-statistic. The degrees of freedom are df=n-1=14. From the table of t-scores, t = 2.145. \[\bar{X}_{d} \pm t \frac{S_{d}}{\sqrt{n}} = -5.3 \pm 2.145\frac{12.8}{\sqrt{15}} = -5.3 \pm 7.1\]
So, the 95% confidence interval for the difference is (-12.4, 1.8)
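The hand calculation can be reproduced in R from the raw measurements. The sketch below uses the fifteen pairs from the table, with differences taken as Examination 7 minus Examination 6 to match the Difference column; the variable names are illustrative. Because R keeps full precision rather than rounding the mean to -5.3, the limits agree with the hand-computed interval only up to rounding.

```r
# Systolic blood pressure at two examinations (values from the table above)
exam6 <- c(168, 111, 139, 127, 155, 115, 125, 123, 130, 137, 130, 129, 112, 141, 122)
exam7 <- c(141, 119, 122, 127, 125, 123, 113, 106, 131, 142, 131, 135, 119, 130, 121)

d <- exam7 - exam6          # difference scores, one per subject
n <- length(d)
d_bar <- mean(d)            # mean difference
s_d <- sd(d)                # standard deviation of the differences (n - 1 denominator)

t_crit <- qt(0.975, df = n - 1)
ci_paired <- d_bar + c(-1, 1) * t_crit * s_d / sqrt(n)
print(round(c(mean = d_bar, sd = s_d), 2))
print(round(ci_paired, 1))

# Cross-check: a one-sample t interval on the differences gives the same limits
ci_check <- as.numeric(t.test(d, conf.level = 0.95)$conf.int)
```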
It is common to compare two independent groups with respect to the presence or absence of a dichotomous characteristic or attribute. When the outcome is dichotomous, the analysis involves comparing the proportions of successes between the two groups. There are several ways of comparing proportions in two independent groups.
One can compute a risk difference, which is computed by taking the difference in proportions between comparison groups and is similar to the estimate of the difference in means for a continuous outcome.
The risk ratio (or relative risk) is another useful measure to compare proportions between two independent populations and it is computed by taking the ratio of proportions.
A risk difference (RD) or prevalence difference is a difference in proportions. The point estimate is the difference in sample proportions: \[\hat{RD} = \hat{p_{1}} - \hat{p_{2}} \]
The sample proportions are computed by taking the ratio of the number of “successes” (or health events, x) to the sample size (n) in each group: \[\hat{p_{1}} = \frac{x_{1}}{n_{1}} \text{ and } \hat{p_{2}} = \frac{x_{2}}{n_{2}}\]
The confidence interval for the risk difference is: \[(\hat{p_{1}} - \hat{p_{2}}) \pm Z\sqrt{\frac{\hat{p_{1}}(1-\hat{p_{1}})}{n_{1}} + \frac{\hat{p_{2}}(1-\hat{p_{2}})}{n_{2}}}\]
Compute the 95% confidence interval for the difference in proportions of patients reporting relief (in this case a risk difference, since it is a difference in cumulative incidence).
| Treatment group | n | Number with reduction | Proportion with reduction |
|---|---|---|---|
| New pain reliever | 50 | 23 | 0.46 |
| Standard pain reliever | 50 | 11 | 0.22 |
\[(0.46 - 0.22) \pm 1.96\sqrt{\frac{0.46(1-0.46)}{50} + \frac{0.22(1-0.22)}{50}}\]
\[ 0.24 \pm 0.18 \]
p1 = 0.46
p2 = 0.22
ci = 0.95
ci_95 = (p1-p2) + c(-1,1)*qnorm(ci+(1-ci)/2)*sqrt(p1*(1-p1)/50 + p2*(1-p2)/50)
95% confidence interval is (0.0603663, 0.4196337)
Interpretation: Our best estimate is an increase of 24 percentage points in the proportion of patients reporting pain relief with the new treatment, and with 95% confidence, the risk difference is between 6 and 42 percentage points. Since the 95% confidence interval does not contain the null value of 0, we can conclude that there is a statistically significant improvement with the new treatment.
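Starting from the counts rather than the rounded proportions avoids hard-coding the sample sizes. The function below is a minimal sketch; `risk_diff_ci` and the names of the returned elements are hypothetical.

```r
# Hypothetical helper, not part of the original notes.
# Confidence interval for a risk difference between two independent groups.
risk_diff_ci <- function(x1, n1, x2, n2, level = 0.95) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  z <- qnorm(1 - (1 - level) / 2)
  c(estimate = p1 - p2,
    lower = (p1 - p2) - z * se,
    upper = (p1 - p2) + z * se)
}

# New pain reliever (23 of 50) versus standard pain reliever (11 of 50)
rd <- risk_diff_ci(x1 = 23, n1 = 50, x2 = 11, n2 = 50)
print(round(rd, 3))

# The difference is statistically significant if the interval excludes 0
excludes_zero <- rd[["lower"]] > 0 || rd[["upper"]] < 0
```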
| Group | With Outcome | Without Outcome | Total |
|---|---|---|---|
| Exposed group (1) | x1 | n1-x1 | n1 |
| Non-exposed group (2) | x2 | n2-x2 | n2 |
\[\hat{RR}=\hat{p_{1}}/\hat{p_{2}}\] First compute the confidence interval for ln(RR): \[ln(\hat{RR}) \pm Z\sqrt{\frac{(n_{1} - x_{1})/x_{1}}{n_{1}} + \frac{(n_{2} - x_{2})/x_{2}}{n_{2}}}\] Then compute the confidence interval for RR by taking the antilog of those limits, i.e., exp(Lower Limit), exp(Upper Limit).
Note that the null value of the confidence interval for the relative risk is one. If a 95% CI for the relative risk includes the null value of 1, then there is insufficient evidence to conclude that the groups are statistically significantly different.
Compute the point estimate for the relative risk for achieving pain relief, comparing those receiving the new drug to those receiving the standard pain reliever. Then compute the 95% confidence interval for the relative risk, and interpret your findings in words.
| Treatment group | n | Number with reduction | Proportion with reduction |
|---|---|---|---|
| New pain reliever | 50 | 23 | 0.46 |
| Standard pain reliever | 50 | 11 | 0.22 |
\[\hat{RR} = \frac{0.46}{0.22} = 2.09\] \[ln(\hat{RR}) \pm Z\sqrt{\frac{(n_{1} - x_{1})/x_{1}}{n_{1}} + \frac{(n_{2} - x_{2})/x_{2}}{n_{2}}} = ln(2.09) \pm 1.96 \sqrt{\frac{(50 - 23)/23}{50} + \frac{(50 - 11)/11}{50}} = 0.737 \pm 0.602 = (0.135, 1.339)\]
This is the confidence interval for ln(RR). To compute the lower and upper limits of the confidence interval for RR, we take the antilog using the exp function:
exp(0.135) = 1.14
exp(1.339) = 3.82
Therefore, we are 95% confident that patients receiving the new pain reliever are between 1.14 and 3.82 times as likely to report a meaningful reduction in pain compared to patients receiving the standard pain reliever.
The point estimate for the risk ratio is RR=p1/p2=0.46/0.22=2.09. That is, patients receiving the new drug are 2.09 times as likely to report pain relief as patients receiving the standard drug.
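The same two-step calculation (interval on the log scale, then antilog) can be written in R. This is a sketch with illustrative variable names. R carries full precision through every step, so the limits may differ from the hand-computed ones in the second decimal place, where intermediate values were rounded.

```r
# Counts from the pain reliever table: group 1 = new drug, group 2 = standard drug
x1 <- 23; n1 <- 50
x2 <- 11; n2 <- 50

rr <- (x1 / n1) / (x2 / n2)                       # point estimate of the relative risk
se_log_rr <- sqrt(((n1 - x1) / x1) / n1 + ((n2 - x2) / x2) / n2)
z <- qnorm(0.975)                                 # critical value for a 95% interval

ci_log_rr <- log(rr) + c(-1, 1) * z * se_log_rr   # interval for ln(RR)
ci_rr <- exp(ci_log_rr)                           # antilog gives the interval for RR

print(round(rr, 2))
print(round(ci_log_rr, 3))
print(ci_rr)
```

An interval for RR that excludes 1 corresponds to an interval for ln(RR) that excludes 0.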
| Treatment group | Diseased | Non-diseased |
|---|---|---|
| Exposed | a | b |
| Non-Exposed | c | d |
Odds ratio computed: OR = (a/b) / (c/d)
Odds ratios do not follow a normal distribution, so we use the log transformation to promote normality.
#### Computing the Confidence Interval for an Odds Ratio
\[ln(OR) \pm Z\sqrt{\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}}\]
The null, or no difference, value of the confidence interval for the odds ratio is one. If a 95% CI for the odds ratio does not include one, then the odds are said to be statistically significantly different.
| Treatment group | n | Number with reduction | Proportion with reduction |
|---|---|---|---|
| New pain reliever | 50 | 23 | 0.46 |
| Standard pain reliever | 50 | 11 | 0.22 |
It is easier to solve this problem if the information is organized in a contingency table like this (e.g., 50 - 23 = 27):
| Treatment group | Pain relief 3+ | Less relief |
|---|---|---|
| New pain reliever | 23 | 27 |
| Standard pain reliever | 11 | 39 |
Odds of pain relief 3+ with new drug = 23/27 = 0.8519
Odds of pain relief 3+ with standard drug = 11/39 = 0.2821
Odds Ratio = 0.8519 / 0.2821 = 3.02
To compute the 95% confidence interval for the odds ratio:
\[ln(3.02) \pm 1.96\sqrt{\frac{1}{23}+\frac{1}{27}+\frac{1}{11}+\frac{1}{39}} = 1.105257 \pm 0.87008 = (0.235173, 1.975341)\]
exp(0.235173) = 1.265
exp(1.975341) = 7.209
The point estimate of the odds ratio is OR=3.02, and we are 95% confident that the true odds ratio lies between 1.27 and 7.21. This is statistically significant because the 95% confidence interval does not include the null value (OR=1.0).
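The odds ratio interval can be computed directly from the 2×2 table. The sketch below stores the counts in a matrix named `tab` (an illustrative name); the square-root term is then just the sum of the reciprocals of the four cells.

```r
# 2x2 contingency table: rows are treatment groups, columns are outcomes
tab <- matrix(c(23, 27,
                11, 39),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("New", "Standard"),
                              c("Relief 3+", "Less relief")))

odds <- tab[, 1] / tab[, 2]               # odds of relief in each group
or_hat <- odds[["New"]] / odds[["Standard"]]

se_log_or <- sqrt(sum(1 / tab))           # sqrt(1/a + 1/b + 1/c + 1/d)
ci_or <- exp(log(or_hat) + c(-1, 1) * qnorm(0.975) * se_log_or)

print(round(or_hat, 2))
print(round(ci_or, 2))
```

With these counts the odds ratio (about 3.0) is noticeably larger than the relative risk (about 2.1), which is expected when the outcome is common in both groups.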