Learning Objectives

Sources

Load Libraries

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(openxlsx)
library(epitools)
## Warning: package 'epitools' was built under R version 4.5.2

Load data

setwd("E:/Biostat and Study Design/204/Lectures/Data")
NHANES_df <- openxlsx::read.xlsx('NHEFS.xlsx')

Cohort studies and randomized clinical trials

A cohort study is a study in which a group of disease-free individuals is identified at one point in time and is followed over a period of time until some of them develop the disease. The development of disease over time is then related to other variables measured at baseline, generally called exposure variables. The study population in a prospective study is often called a cohort. The primary distinction between a cohort study and a randomized controlled trial lies in whether the researcher employs randomization to assign patients to either the treatment or control group.


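Relative risk (RR) is the ratio of the probability of an event (developing a disease) in exposed people to the probability of the event in nonexposed people. In a 2 × 2 table with exposure in the rows and disease in the columns, a and b are the exposed people with and without the disease, and c and d are the nonexposed people with and without the disease. The relative risk is calculated using the following formula: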

\[ Relative\:risk= \frac{Risk\:in\:exposed}{Risk\:in\:nonexposed}=\frac{Incidence\:in\:exposed}{Incidence\:in \:nonexposed}=\frac{\frac{a}{a+b}}{\frac{c}{c+d}}\]


The standard error (SE) of the natural log of the relative risk is calculated using the following formula:

\[SE=\sqrt{\frac{1}{a}+\frac{1}{c}-\frac{1}{a+b}-\frac{1}{c+d}}\]

To calculate the 95% CI, we use the following formula:

\[ 95\%\:CI=exp(ln(RR)-1.96\times SE ),\:exp(ln(RR)+1.96\times SE )\]

If the relative risk = 1, the risk in the exposed equals the risk in the nonexposed (no association); if the relative risk > 1, the risk in the exposed is greater than the risk in the nonexposed (positive association; possibly causal); if the relative risk < 1, the risk in the exposed is less than the risk in the nonexposed (negative association; possibly protective).

Example: You conducted a prospective cohort study of 3,000 smokers and 5,000 nonsmokers to investigate the relationship between smoking and the development of coronary heart disease (CHD) over 1 year. Calculate relative risk!

\({H_0}: RR=1\)

\({H_1}: RR\neq1\)

| | CHD | No CHD |
|---|---|---|
| Smokers | 84 | 2,916 |
| Non-Smokers | 87 | 4,913 |


\[ Relative\:risk= \frac{Risk\:in\:exposed}{Risk\:in\:nonexposed}=\frac{Incidence\:in\:exposed}{Incidence\:in \:nonexposed}=\frac{\frac{a}{a+b}}{\frac{c}{c+d}}=\frac{\frac{84}{84+2916}}{\frac{87}{87+4913}}=\frac{0.028}{0.0174}=1.61\]

\[SE=\sqrt{\frac{1}{a}+\frac{1}{c}-\frac{1}{a+b}-\frac{1}{c+d}}=\sqrt{\frac{1}{84}+\frac{1}{87}-\frac{1}{84+2916}-\frac{1}{87+4913}}=0.151\]

\[ 95\%\:CI=exp(ln(RR)-1.96\times SE ),\:exp(ln(RR)+1.96\times SE )=exp(ln(1.61)-1.96\times 0.151 ),\:exp(ln(1.61)+1.96\times 0.151 )=1.20, 2.16 \]

Interpretation: Smokers have 1.61 (95% CI 1.20, 2.16) times the risk of CHD compared to non-smokers. If the relative risk were 0.6, the interpretation would be that smokers have a 40% lower risk of CHD than non-smokers.
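The hand calculation above can be reproduced in base R. This is a minimal sketch: `rr_wald()` is a hypothetical helper written for illustration (it is not part of any package), and it assumes the a, b, c, d cell layout used in the formulas and a fixed multiplier of 1.96.

```r
# Hypothetical helper: risk ratio with a Wald 95% CI computed on the log scale.
# a, b = exposed with / without disease; c, d = nonexposed with / without disease.
rr_wald <- function(a, b, c, d, z = 1.96) {
  risk_exposed    <- a / (a + b)
  risk_nonexposed <- c / (c + d)
  rr <- risk_exposed / risk_nonexposed
  se <- sqrt(1 / a + 1 / c - 1 / (a + b) - 1 / (c + d))  # SE of ln(RR)
  list(
    rr    = rr,
    se    = se,
    lower = exp(log(rr) - z * se),
    upper = exp(log(rr) + z * se)
  )
}

# Smoking and CHD example: 84 / 2,916 smokers, 87 / 4,913 non-smokers
smoking_rr <- rr_wald(a = 84, b = 2916, c = 87, d = 4913)
round(unlist(smoking_rr), 3)
```

The rounded values should agree with the worked example (RR about 1.61, SE about 0.151, CI about 1.20 to 2.16). Small differences in later decimals relative to other software can arise when the exact normal quantile is used in place of 1.96.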

We can calculate the relative risk using the epitools package in R:


smokers_table <- matrix(c(84,87,2916,4913),ncol = 2,nrow = 2)
colnames(smokers_table) <- c('CHD','No CHD')
rownames(smokers_table) <- c('Smokers','Non-Smokers')
smokers_table
##             CHD No CHD
## Smokers      84   2916
## Non-Smokers  87   4913
epitools::riskratio(smokers_table,rev='both',method = 'wald')
## $data
##             No CHD CHD Total
## Non-Smokers   4913  87  5000
## Smokers       2916  84  3000
## Total         7829 171  8000
## 
## $measure
##                         NA
## risk ratio with 95% C.I. estimate    lower    upper
##              Non-Smokers 1.000000       NA       NA
##              Smokers     1.609195 1.196452 2.164325
## 
## $p.value
##              NA
## two-sided      midp.exact fisher.exact  chi.square
##   Non-Smokers          NA           NA          NA
##   Smokers     0.001799736  0.001800482 0.001505872
## 
## $correction
## [1] FALSE
## 
## attr(,"method")
## [1] "Unconditional MLE & normal approximation (Wald) CI"

Risk difference

Instead of comparing the risk in two groups by calculating their ratio, we can compare them in terms of their absolute difference. The risk difference is calculated by subtracting the cumulative risk in the unexposed group from the cumulative risk in the exposed group.

\[Risk\:difference =Risk_{exposed}-Risk_{unexposed}\]

Example: Using the example above, calculate the risk difference.

| | CHD | No CHD |
|---|---|---|
| Smokers | 84 | 2,916 |
| Non-Smokers | 87 | 4,913 |


\[ Risk\:difference= Risk\:in\:exposed-Risk\:in\:nonexposed={\frac{a}{a+b}}-{\frac{c}{c+d}}={0.028}-{0.0174}=0.0106\]

Interpretation: Over one year, smokers have about 11 additional cases of CHD per 1,000 people compared to non-smokers.
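As a quick check, the same risk difference can be computed in base R. The variable names below are illustrative only.

```r
# One-year risk of CHD in each group (smoking example)
risk_smokers     <- 84 / (84 + 2916)
risk_non_smokers <- 87 / (87 + 4913)

# Absolute difference in risk, and the same quantity per 1,000 people
risk_difference <- risk_smokers - risk_non_smokers
risk_difference
risk_difference * 1000
```

Expressing the difference per 1,000 people is what gives the "additional cases" wording in the interpretation.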

Attributable Risk Percent

Attributable risk percent is the proportion of disease in the exposed group that can be attributed to the exposure.

\[Attributable\:risk\:percent =\frac{Risk\:in\:exposed-Risk\:in\:nonexposed}{Risk\:in\:exposed}\times{100}\]

Example: Using the example above, calculate the attributable risk percent.

| | CHD Developed | CHD Did Not Develop |
|---|---|---|
| Smokers | 84 | 2,916 |
| Non-Smokers | 87 | 4,913 |


\[Attributable\:risk\:percent= \frac{Risk\:in\:exposed-Risk\:in\:nonexposed}{Risk\:in\:exposed}\times{100}=\frac{{0.028}-{0.0174}}{0.028}\times{100}=37.9\]

Interpretation: 37.9% of the total risk for CHD among smokers may be attributable to smoking.
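The attributable risk percent can be sketched in base R as follows. The second form, written in terms of the relative risk, is algebraically the same quantity and is included only as a cross-check; the variable names are illustrative.

```r
# Risks from the smoking example
risk_smokers     <- 84 / (84 + 2916)
risk_non_smokers <- 87 / (87 + 4913)

# Attributable risk percent among the exposed
arp <- (risk_smokers - risk_non_smokers) / risk_smokers * 100
round(arp, 1)

# Equivalent form using the relative risk: (RR - 1) / RR * 100
rr <- risk_smokers / risk_non_smokers
arp_from_rr <- (rr - 1) / rr * 100
all.equal(arp, arp_from_rr)
```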

NNT and NNH

When interpreting the findings of clinical trials, it is important to frame results in a way that clinicians can understand and integrate into their decision-making process. The number needed to treat (NNT) is the number of patients you need to treat to prevent one additional bad outcome. NNT is calculated using the following formula:

\[NNT =\frac{1}{Risk\:in\:the\:untreated\:group -Risk\:in\:the\:treated\:group}\]

Estimates of NNT are usually rounded up to the next whole number to avoid overestimating efficacy.

Example: 1,500 patients with CHF were randomized to receive a new treatment plan while 1,500 were randomized to standard of care. After one year’s time, 375 patients of the standard of care group expired, while 325 patients of the new treatment group expired. Calculate NNT.

\[NNT =\frac{1}{Risk\:in\:the\:untreated\:group -Risk\:in\:the\:treated\:group}=\frac{1}{375/1500-325/1500}=30\]

Interpretation: 30 patients need to be treated with the new treatment plan to prevent one death.
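A minimal base R sketch of the NNT calculation follows. The variable names are illustrative, and the `round()` call before `ceiling()` is an added safeguard (an assumption of this sketch, not part of the formula) so that floating-point noise in a value that is mathematically a whole number does not push the result up by one.

```r
# CHF trial: deaths after one year in each arm
risk_untreated <- 375 / 1500   # standard of care
risk_treated   <- 325 / 1500   # new treatment plan

absolute_risk_reduction <- risk_untreated - risk_treated
nnt_raw <- 1 / absolute_risk_reduction

# Round up to the next whole number; round() first to absorb floating-point noise
nnt <- ceiling(round(nnt_raw, 8))
nnt
```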

The same approach can also be used to look at the risk of side effects by calculating the number needed to harm (NNH) to cause one additional person to be harmed.

\[NNH =\frac{1}{Rate\:in\:the\:treated\:group -Rate\:in\:the\:untreated\:group}\]

Estimates of NNH are usually rounded down to the next lowest whole number to avoid understating the harms.

Example: 1,500 patients with colorectal cancer were randomized to receive a new therapeutic agent while 1,500 were randomized to receive placebo. After 6 months, 25 patients in the placebo group developed severe diarrhea, while 375 patients in the treatment group developed severe diarrhea. Calculate NNH!

\[NNH =\frac{1}{Rate\:in\:the\:treated\:group -Rate\:in\:the\:untreated\:group}=\frac{1}{375/1500-25/1500}=4.3\]

Interpretation: 4 patients need to be treated with the new therapeutic agent in order for one patient to develop severe diarrhea.
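The NNH calculation can be sketched the same way in base R, rounding down instead of up. The variable names are illustrative.

```r
# Colorectal cancer trial: severe diarrhea after 6 months in each arm
rate_treated   <- 375 / 1500   # new therapeutic agent
rate_untreated <- 25 / 1500    # placebo

nnh_raw <- 1 / (rate_treated - rate_untreated)
round(nnh_raw, 1)

# Round down so the harm is not understated
nnh <- floor(nnh_raw)
nnh
```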

Case–control studies

A case–control study is a study in which two groups of individuals are initially identified: (1) a group that has the disease under study (the cases) and (2) a group that does not have the disease under study (the controls). An attempt is then made to relate their prior health habits to their current disease status.


In a case-control study, we do not know the incidence in the exposed population or the incidence in the nonexposed population because we start with diseased people (cases) and nondiseased people (controls). Therefore, in a case-control study we cannot calculate the relative risk. The alternative is to calculate the odds ratio.

To better understand the concept of odds, consider the following example. If you toss a fair coin, the probability of getting heads is 50% (P). The probability of getting tails is 1-P. What are the odds of getting heads? To answer this question, we start by defining odds as the ratio of the number of ways the event can occur to the number of ways the event cannot occur.

\[Odds\:of\:heads=\frac{probability\:of\:heads}{probability\:of\:tails}=\frac{P}{1-P}=\frac{50\%}{50\%}=1\]

Keep in mind the distinction between probability and odds. In our example, the probability of getting heads when tossing a fair coin is 50%, and the odds of getting heads when tossing a fair coin are 1.
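To make the distinction concrete, the conversion between probability and odds can be sketched with two small functions. `prob_to_odds()` and `odds_to_prob()` are hypothetical helper names used only for illustration.

```r
# Hypothetical helpers for converting between probability and odds
prob_to_odds <- function(p) p / (1 - p)
odds_to_prob <- function(odds) odds / (1 + odds)

prob_to_odds(0.5)    # fair coin: odds of heads are 1
prob_to_odds(0.75)   # probability 0.75 corresponds to odds of 3
odds_to_prob(3)      # and odds of 3 convert back to probability 0.75
```

Probabilities are bounded between 0 and 1, while odds range from 0 to infinity, which is why the two measures diverge as an event becomes more common.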

Since we cannot calculate the relative risk in a case-control study, we use the odds ratio (OR) instead. The odds ratio is calculated using the following formula:

\[ Odds\:ratio= \frac{Odds\:that\:a\:case\:was\:exposed}{Odds\:that\:a\:control\:was\:exposed}=\frac{\frac{a}{c}}{\frac{b}{d}}=\frac{ad}{bc}\]


The standard error (SE) of the natural log of the odds ratio is calculated using the following formula:

\[SE=\sqrt{\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}}\]

To calculate the 95% CI, we use the following formula:

\[ 95\%\:CI=exp(ln(OR)-1.96\times SE ),\:exp(ln(OR)+1.96\times SE )\]

Odds ratio is interpreted similarly to relative risk. If the exposure is not related to the disease, the odds ratio will equal 1. If the exposure is positively related to the disease, the odds ratio will be greater than 1. If the exposure is negatively related to the disease, the odds ratio will be less than 1.

Example: An outbreak of cyclosporiasis was detected among residents of New Jersey. In a case-control study, investigators found that 21 of 30 case-patients and four of 60 controls had eaten raspberries.

\({H_0}: OR=1\)

\({H_1}: OR\neq1\)

| | Cyclosporiasis | No Cyclosporiasis |
|---|---|---|
| Ate Raspberries | 21 | 4 |
| Did not eat raspberries | 9 | 56 |


\[Odds\:ratio=\frac{\frac{a}{c}}{\frac{b}{d}}=\frac{\frac{21}{9}}{\frac{4}{56}}=32.67\]

\[SE=\sqrt{\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}}=\sqrt{\frac{1}{21}+\frac{1}{4}+\frac{1}{9}+\frac{1}{56}}=0.65\]

\[95\%\:CI=exp(ln(OR)-1.96\times SE ),\:exp(ln(OR)+1.96\times SE)=exp(ln(32.67)-1.96\times 0.65),\:exp(ln(32.67)+1.96\times 0.65)=9.08,117.5\]

Interpretation: The odds of cyclosporiasis were 32.7 (95% CI 9.08, 117.5) times as high in those who ate raspberries as in those who did not eat raspberries.
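The hand calculation of the odds ratio and its confidence interval can be reproduced in base R. This is a sketch only: `or_wald()` is a hypothetical helper (not a package function) that assumes the a, b, c, d cell layout used in the formulas and a fixed multiplier of 1.96.

```r
# Hypothetical helper: odds ratio with a Wald 95% CI computed on the log scale.
# a, b = exposed cases / controls; c, d = nonexposed cases / controls.
or_wald <- function(a, b, c, d, z = 1.96) {
  or <- (a * d) / (b * c)
  se <- sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
  list(
    or    = or,
    se    = se,
    lower = exp(log(or) - z * se),
    upper = exp(log(or) + z * se)
  )
}

# Cyclosporiasis example: 21 cases and 4 controls ate raspberries; 9 and 56 did not
raspberry_or <- or_wald(a = 21, b = 4, c = 9, d = 56)
round(unlist(raspberry_or), 2)
```

The interval is wide because two of the four cells are small, which inflates the 1/b and 1/c terms in the standard error.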


cyclosporiasis_table <- matrix(c(21,9,4,56),ncol = 2,nrow = 2)
colnames(cyclosporiasis_table) <- c('Cyclosporiasis','No cyclosporiasis')
rownames(cyclosporiasis_table) <- c('Ate Raspberries','Did not eat raspberries')
cyclosporiasis_table
##                         Cyclosporiasis No cyclosporiasis
## Ate Raspberries                     21                 4
## Did not eat raspberries              9                56
epitools::oddsratio(cyclosporiasis_table,rev='both',method = 'wald')
## $data
##                         No cyclosporiasis Cyclosporiasis Total
## Did not eat raspberries                56              9    65
## Ate Raspberries                         4             21    25
## Total                                  60             30    90
## 
## $measure
##                          NA
## odds ratio with 95% C.I.  estimate    lower    upper
##   Did not eat raspberries  1.00000       NA       NA
##   Ate Raspberries         32.66667 9.081425 117.5048
## 
## $p.value
##                          NA
## two-sided                   midp.exact fisher.exact   chi.square
##   Did not eat raspberries           NA           NA           NA
##   Ate Raspberries         6.358611e-10 6.183017e-10 2.555681e-10
## 
## $correction
## [1] FALSE
## 
## attr(,"method")
## [1] "Unconditional MLE & normal approximation (Wald) CI"

Example: Using data from the NHANES study, determine whether there is an association between asthma diagnosis and sex.

We start by creating a 2 × 2 table of asthma diagnosis and sex.

asthma_table <- table(Sex=NHANES_df$sex,Asthma=NHANES_df$asthma)
asthma_table
##         Asthma
## Sex        0   1
##   Female 781  49
##   Male   769  30

Next, let’s re-arrange the table results to fit our contingency table layout.

| | Asthma | No Asthma |
|---|---|---|
| Female | 49 | 781 |
| Male | 30 | 769 |


epitools::oddsratio(asthma_table,rev='rows',method = 'wald') #use males as a reference
## $data
##         Asthma
## Sex         0  1 Total
##   Male    769 30   799
##   Female  781 49   830
##   Total  1550 79  1629
## 
## $measure
##         odds ratio with 95% C.I.
## Sex      estimate    lower    upper
##   Male   1.000000       NA       NA
##   Female 1.608237 1.010044 2.560708
## 
## $p.value
##         two-sided
## Sex      midp.exact fisher.exact chi.square
##   Male           NA           NA         NA
##   Female 0.04412095   0.04961711 0.04354632
## 
## $correction
## [1] FALSE
## 
## attr(,"method")
## [1] "Unconditional MLE & normal approximation (Wald) CI"

Interpretation: The odds of asthma were 1.61 (95% CI 1.01, 2.56) times as high in females as in males. If the odds ratio were 0.6, the interpretation would be that the odds of asthma are 40% lower in females than in males.

Odds Ratio As an Estimate of the Relative Risk

Odds ratio can be used to estimate relative risk when the following conditions are satisfied:

The third condition, that the disease is rare (so that a is small relative to b and c is small relative to d), can be shown mathematically:

\[ Relative\:risk=\frac{\frac{a}{a+b}}{\frac{c}{c+d}} \cong \frac{\frac{a}{b}}{\frac{c}{d}} = \frac{ad}{bc} \]

\[Odds\:ratio=\frac{\frac{a}{c}}{\frac{b}{d}}= \frac{ad}{bc}\]

Example: For the table below, calculate the relative risk and the odds ratio:

| | Developed disease | Did not develop disease |
|---|---|---|
| Exposed | 200 | 9,800 |
| Not exposed | 100 | 9,900 |


\[ Relative\:risk= \frac{Risk\:in\:exposed}{Risk\:in\:nonexposed}=\frac{\frac{a}{a+b}}{\frac{c}{c+d}}=\frac{\frac{200}{200+9800}}{\frac{100}{100+9900}}=2\]

\[Odds\:ratio=\frac{\frac{a}{c}}{\frac{b}{d}}= \frac{\frac{200}{100}}{\frac{9800}{9900}}=2.02\]
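To see why the approximation depends on the disease being rare, the sketch below compares the table above with a second table in which the disease is common. The second table (4,000 of 10,000 exposed and 2,000 of 10,000 nonexposed developing disease) is hypothetical and chosen only so that the relative risk is again 2; `compare_rr_or()` is an illustrative helper name.

```r
# Hypothetical helper: relative risk and odds ratio from the four cell counts
compare_rr_or <- function(a, b, c, d) {
  data.frame(
    risk_exposed = a / (a + b),
    rr           = (a / (a + b)) / (c / (c + d)),
    or           = (a * d) / (b * c)
  )
}

rare   <- compare_rr_or(200, 9800, 100, 9900)     # table from the example above
common <- compare_rr_or(4000, 6000, 2000, 8000)   # hypothetical common-disease table

round(rbind(rare, common), 3)
```

With the rare disease the odds ratio (about 2.02) is close to the relative risk of 2, whereas in the hypothetical common-disease table the odds ratio (about 2.67) overstates the same relative risk.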