Jared Parrish
2020-10-07
We will be reviewing the online modules from BU
BU Online MPH Learning Modules
We will start with the Epidemiology modules and then move on to the Biostatistics modules.
Confidence intervals are highly informative: they quantify the degree of uncertainty in an estimate and facilitate comparisons.
When used to compare estimates between strata/groups as a “crude” homogeneity test the general rules are:
The visual “test of homogeneity” comes at a cost:
we're comparing the CIs of the estimates in each group, as opposed to testing the difference between the point estimates of interest.
Goldstein and Healy (1995) suggest using 83% Confidence intervals for each group to represent a 95% significant difference between two estimates.
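A rough way to see where a figure near 83% comes from is sketched below. It assumes two independent estimates with equal standard errors, which is a simplification made here for illustration; the variable names are not from the original notes.

```r
# Assumption: two independent estimates with equal standard errors S.
# The difference is significant at the 5% level when
#   |est1 - est2| > qnorm(0.975) * sqrt(2) * S.
# Two intervals est +/- z_star * S fail to overlap when
#   |est1 - est2| > 2 * z_star * S.
# Setting the two thresholds equal gives z_star = qnorm(0.975) / sqrt(2).
z_diff <- qnorm(0.975)
z_star <- z_diff / sqrt(2)

# Confidence level of a single interval built with that multiplier
level <- 2 * pnorm(z_star) - 1
round(level, 3)  # approximately 0.834
```

With unequal standard errors the matching level shifts, so 83% is best read as a rule of thumb.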
If we only have the point estimate and confidence interval for each group, we can use the CIs to recover the standard error for each group and conduct a homogeneity test of the form:
\[ \mathbf{Z}_{homog} = \frac{\left|\hat{\mu}_1 - \hat{\mu}_2\right|}{\sqrt{S^2_1 + S^2_2}} \]
These data are taken from a preterm birth report that WCFH is working on.
| Year | PretermBirths | NonPretermBirths | Births | proportion | lower | upper | conf.level |
|---|---|---|---|---|---|---|---|
| 2010 | 944 | 10560 | 11504 | 0.08206 | 0.07704 | 0.08707 | 0.95 |
| 2011 | 1010 | 10457 | 11467 | 0.08808 | 0.08289 | 0.09327 | 0.95 |
| 2012 | 853 | 10352 | 11205 | 0.07613 | 0.07122 | 0.08104 | 0.95 |
| 2013 | 973 | 10498 | 11471 | 0.08482 | 0.07972 | 0.08992 | 0.95 |
| 2014 | 966 | 10462 | 11428 | 0.08453 | 0.07943 | 0.08963 | 0.95 |
| 2015 | 1010 | 10315 | 11325 | 0.08918 | 0.08393 | 0.09443 | 0.95 |
| 2016 | 1007 | 10238 | 11245 | 0.08955 | 0.08427 | 0.09483 | 0.95 |
| 2017 | 946 | 9550 | 10496 | 0.09013 | 0.08465 | 0.09561 | 0.95 |
| 2018 | 937 | 9183 | 10120 | 0.09259 | 0.08694 | 0.09824 | 0.95 |
| 2019 | 956 | 8906 | 9862 | 0.09694 | 0.09110 | 0.10278 | 0.95 |
Credit: Rachel Gallegos - PHAP fellow assigned to WCFH/MCH-Epidemiology
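Using the 2018 and 2019 rows of the table above, the homogeneity test described earlier can be sketched on the untransformed scale as follows. The helper `z_homog()` is hypothetical (it is not part of the original analysis), and it assumes each interval is a symmetric Wald interval so that the standard error is the interval width divided by twice the critical value.

```r
# Hypothetical helper: homogeneity z test from two estimates and their CIs.
# Assumes symmetric (Wald) intervals at a common confidence level.
z_homog <- function(est1, lower1, upper1, est2, lower2, upper2, conf = 0.95) {
  z_crit <- qnorm(1 - (1 - conf) / 2)
  s1 <- (upper1 - lower1) / (2 * z_crit)  # SE recovered from CI width
  s2 <- (upper2 - lower2) / (2 * z_crit)
  z <- abs(est1 - est2) / sqrt(s1^2 + s2^2)
  c(z = z, p.value = 2 * pnorm(z, lower.tail = FALSE))
}

# 2018 vs 2019 preterm birth proportions, as printed in the table
res_2018_2019 <- z_homog(0.09259, 0.08694, 0.09824,
                         0.09694, 0.09110, 0.10278)
res_2018_2019
```

The two 95% intervals overlap substantially, and the resulting p-value is well above 0.05.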
$data
NonPretermBirths PretermBirths Total
2018 9183 937 10120
2019 8906 956 9862
Total 18089 1893 19982
$measure
NA
risk ratio with 95% C.I. estimate lower upper
2018 1.000000 NA NA
2019 1.046969 0.9609563 1.14068
$p.value
NA
two-sided midp.exact fisher.exact chi.square
2018 NA NA NA
2019 0.2940939 0.2989286 0.2939509
$correction
[1] FALSE
attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"
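The call that produced this output is not shown. As a cross-check, the risk ratio, its Wald interval, and the uncorrected chi-square p-value can be recomputed in base R from the 2018 and 2019 counts. This is a sketch using the standard formulas, not necessarily the code used for the report, and the object names are illustrative.

```r
# 2x2 table of counts: rows are years, columns are outcomes
tab <- matrix(c(9183, 8906, 937, 956), nrow = 2,
              dimnames = list(Year = c("2018", "2019"),
                              Outcome = c("NonPreterm", "Preterm")))

# Risk of preterm birth in each year and the 2019 vs 2018 risk ratio
risk <- tab[, "Preterm"] / rowSums(tab)
rr <- unname(risk["2019"] / risk["2018"])

# Wald interval on the log scale:
# SE(log RR) = sqrt(1/a1 - 1/n1 + 1/a0 - 1/n0)
se_log_rr <- sqrt(sum(1 / tab[, "Preterm"]) - sum(1 / rowSums(tab)))
rr_ci <- exp(log(rr) + c(-1, 1) * qnorm(0.975) * se_log_rr)

# Pearson chi-square test without continuity correction
chisq_p <- chisq.test(tab, correct = FALSE)$p.value

c(estimate = rr, lower = rr_ci[1], upper = rr_ci[2], chi.square = chisq_p)
```

The values should agree with the `$measure` and `chi.square` entries printed above to several decimal places.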
Remember that for a proportion, the CI is calculated as approximately:
\[ \hat{p} \pm z^{*} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]
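As an illustration, the formula can be applied to the 2019 counts from the table (956 preterm births out of 9862 births). This is a minimal sketch; the variable names are chosen here for illustration.

```r
# Wald (normal approximation) 95% CI for a single proportion
x <- 956    # preterm births in 2019
n <- 9862   # total births in 2019

p_hat  <- x / n
z_star <- qnorm(0.975)                  # approximately 1.96
se     <- sqrt(p_hat * (1 - p_hat) / n)
ci     <- p_hat + c(-1, 1) * z_star * se

round(c(proportion = p_hat, lower = ci[1], upper = ci[2]), 5)
```

The result should match the 2019 row of the table (0.09694, 0.09110, 0.10278).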
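Note: proportions are bounded between 0 and 1, and rates between 0 and infinity,
so we work on the log scale and adjust our z formula as:
\[ \mathbf{Z}_{homog} = \frac{\left|\log{\hat{\mu}_1} - \log{\hat{\mu}_2}\right|}{\sqrt{S^2_1 + S^2_2}} \]
where \(S_1\) and \(S_2\) (s0 and s1 in the code below) are the standard errors of the log estimates, recovered from each group's 95% CI limits:
\[ S_{1} = \frac{\log{(upperCI_1)} - \log{(lowerCI_1)}}{2 \times 1.96} \]
\[ S_{2} = \frac{\log{(upperCI_2)} - \log{(lowerCI_2)}}{2 \times 1.96} \]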
Year proportion lower upper
9 2018 0.09258893 0.08694165 0.09823621
10 2019 0.09693774 0.09109831 0.10277718
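# Standard errors on the log scale, recovered from the 95% CI limits (3.92 = 2 * 1.96)
s0 <- (log(0.09823621) - log(0.08694165))/3.92
s1 <- (log(0.1027772) - log(0.09109831))/3.92
# Homogeneity z statistic and its two-sided p-value
zh <- abs(log(0.09258893) - log(0.09693774))/(sqrt(s0^2 + s1^2))
2*pnorm(zh, lower.tail = FALSE)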
[1] 0.2945752
NA
two-sided midp.exact fisher.exact chi.square
2018 NA NA NA
2019 0.2940939 0.2989286 0.2939509
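Since the same calculation is repeated for each pair of years, it can be wrapped in a small function. The helper `z_homog_log()` below is hypothetical and is offered only as a sketch; it uses `qnorm(0.975)` rather than the rounded 1.96, so results may differ from the hand calculation in the later decimal places.

```r
# Hypothetical helper: log-scale homogeneity test from estimates and CI limits.
# Assumes both intervals share the same confidence level.
z_homog_log <- function(est1, lower1, upper1, est2, lower2, upper2, conf = 0.95) {
  z_crit <- qnorm(1 - (1 - conf) / 2)
  s1 <- (log(upper1) - log(lower1)) / (2 * z_crit)  # SE of log estimate 1
  s2 <- (log(upper2) - log(lower2)) / (2 * z_crit)  # SE of log estimate 2
  z <- abs(log(est1) - log(est2)) / sqrt(s1^2 + s2^2)
  c(z = z, p.value = 2 * pnorm(z, lower.tail = FALSE))
}

# 2018 vs 2019, using the values printed above
res_log <- z_homog_log(0.09258893, 0.08694165, 0.09823621,
                       0.09693774, 0.09109831, 0.10277718)
res_log
```

The p-value should be close to 0.295, in line with the hand calculation and the exact and chi-square tests.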
$data
NonPretermBirths PretermBirths Total
2015 10315 1010 11325
2019 8906 956 9862
Total 19221 1966 21187
$measure
NA
risk ratio with 95% C.I. estimate lower upper
2015 1.00000 NA NA
2019 1.08695 0.9991566 1.182458
$p.value
NA
two-sided midp.exact fisher.exact chi.square
2015 NA NA NA
2019 0.05251294 0.05451793 0.05232018
$correction
[1] FALSE
attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"
t2p <- dat1[c(6,10),c(1,5:7)]
t2p
Year proportion lower upper
6 2015 0.08918322 0.08393411 0.09443234
10 2019 0.09693774 0.09109831 0.10277718
s0 <- (log(0.09443234) - log(0.08393411))/3.92
s1 <- (log(0.10277718) - log(0.09109831))/3.92
zh <- abs(log(0.08918322) - log(0.09693774))/(sqrt(s0^2 + s1^2))
2*pnorm(zh, lower.tail = FALSE)
[1] 0.052615
NA
two-sided midp.exact fisher.exact chi.square
2015 NA NA NA
2019 0.05251294 0.05451793 0.05232018
$data
NonPretermBirths PretermBirths Total
2011 10457 1010 11467
2019 8906 956 9862
Total 19363 1966 21329
$measure
NA
risk ratio with 95% C.I. estimate lower upper
2011 1.000000 NA NA
2019 1.100579 1.011659 1.197315
$p.value
NA
two-sided midp.exact fisher.exact chi.square
2011 NA NA NA
2019 0.02590005 0.02725046 0.02575092
$correction
[1] FALSE
attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"
t3p <- dat1[c(2,10),c(1,5:7)]
t3p
Year proportion lower upper
2 2011 0.08807883 0.08289158 0.09326609
10 2019 0.09693774 0.09109831 0.10277718
s0 <- (log(0.09326609) - log(0.08289158))/3.92
s1 <- (log(0.10277718) - log(0.09109831))/3.92
zh <- abs(log(0.08807883) - log(0.09693774))/(sqrt(s0^2 + s1^2))
2*pnorm(zh, lower.tail = FALSE)
[1] 0.02594365
NA
two-sided midp.exact fisher.exact chi.square
2011 NA NA NA
2019 0.02590005 0.02725046 0.02575092
We can't always rely on confidence interval overlap when making statements about “significance”: the 95% CIs for 2011 and 2019 overlap, yet the homogeneity test gives p ≈ 0.026.
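To make the point concrete, the sketch below checks interval overlap for 2011 versus 2019 at the 95% level and again at the 83% level suggested by Goldstein and Healy. The 83% intervals are rebuilt from standard errors recovered from the 95% limits, assuming symmetric Wald intervals; the object names are illustrative.

```r
# Estimates and 95% CI limits for 2011 and 2019, as printed above
est   <- c(y2011 = 0.08807883, y2019 = 0.09693774)
lower <- c(y2011 = 0.08289158, y2019 = 0.09109831)
upper <- c(y2011 = 0.09326609, y2019 = 0.10277718)

# Do the 95% intervals overlap? (2019 is the larger estimate)
overlap95 <- lower[["y2019"]] <= upper[["y2011"]]

# Recover standard errors and rebuild approximate 83% intervals
se      <- (upper - lower) / (2 * qnorm(0.975))
z83     <- qnorm(1 - 0.17 / 2)
lower83 <- est - z83 * se
upper83 <- est + z83 * se
overlap83 <- lower83[["y2019"]] <= upper83[["y2011"]]

c(overlap95 = overlap95, overlap83 = overlap83)
```

The 95% intervals overlap while the 83% intervals do not, which agrees with the significant test result for this pair of years.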