Introduction

The odds ratio (OR) is a statistical measure that quantifies the strength and direction of the association between two categorical variables, most often two binary ones such as an exposure and an outcome. It is widely used in epidemiology, medicine, and other fields where researchers compare how likely an event is in one group versus another.

Formula

The odds of an event are the probability that it occurs divided by the probability that it does not, \(p / (1 - p)\). The odds ratio is the ratio of the odds of the event in one group to the odds of the event in another group. Mathematically, it is expressed as:

\[ \text{Odds Ratio (OR)} = \frac{\text{Odds of event in Group 1}}{\text{Odds of event in Group 2}} \]

In a 2x2 contingency table that cross-classifies event status (rows) against exposure status (columns), the odds of the event are \(a/c\) in Group 1 and \(b/d\) in Group 2, so the odds ratio reduces to the cross-product ratio:

\[ \text{Odds Ratio (OR)} = \frac{a / c}{b / d} = \frac{a d}{b c} \]

Here, \(a\), \(b\), \(c\), and \(d\) represent the counts in each cell of the contingency table:

\[ \begin{array}{|c|c|c|} \hline & \text{Group 1 (Exposure)} & \text{Group 2 (No Exposure)} \\ \hline \text{Event} & a & b \\ \hline \text{No Event} & c & d \\ \hline \end{array} \]
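
As a quick illustration of the formula, the sketch below computes the odds ratio both as a ratio of odds and as the cross-product. The counts are hypothetical values chosen for the example, not data from any study, and the object names (`counts`, `or_from_odds`, `or_cross_prod`) are illustrative.

```r
# Hypothetical counts, arranged like the table above:
# rows = event / no event, columns = exposed / unexposed
counts <- matrix(
  c(20, 10,   # event:    a = 20, b = 10
    80, 90),  # no event: c = 80, d = 90
  nrow = 2, byrow = TRUE,
  dimnames = list(outcome = c("event", "no_event"),
                  group   = c("exposed", "unexposed"))
)

# Odds of the event within each group (column)
odds_exposed   <- counts["event", "exposed"]   / counts["no_event", "exposed"]    # a / c
odds_unexposed <- counts["event", "unexposed"] / counts["no_event", "unexposed"]  # b / d

# Odds ratio as a ratio of odds, and as the cross-product ad / (bc)
or_from_odds  <- odds_exposed / odds_unexposed
or_cross_prod <- (counts[1, 1] * counts[2, 2]) / (counts[1, 2] * counts[2, 1])

print(or_from_odds)
print(or_cross_prod)
```

With these counts both calculations give 2.25: the odds of the event are 0.25 among the exposed and about 0.11 among the unexposed.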

Interpretation

  1. If the odds ratio is equal to 1, it suggests that the odds of the event are the same in both groups, indicating no association.

  2. If the odds ratio is greater than 1, it suggests that the odds of the event are higher in Group 1 compared to Group 2.

  3. If the odds ratio is less than 1, it suggests that the odds of the event are lower in Group 1 compared to Group 2.

Logistic Regression and Odds Ratio Example

# Set a seed for reproducibility
set.seed(123)

# Create an example dataset
data <- data.frame(
  exposure = sample(c(0, 1), 100, replace = TRUE),  # 0: No exposure, 1: Exposure
  event    = sample(c(0, 1), 100, replace = TRUE)   # 0: No event, 1: Event
)

# Display the first few rows of the dataset
head(data)
##   exposure event
## 1        0     0
## 2        0     1
## 3        0     1
## 4        1     0
## 5        0     1
## 6        1     1

Logistic Regression Model

# Fit logistic regression model
model <- glm(event ~ exposure, data = data, family = "binomial")

# Display the summary of the model
summary(model)
## 
## Call:
## glm(formula = event ~ exposure, family = "binomial", data = data)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.253  -1.253   1.104   1.104   1.119  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  0.17589    0.26593   0.661    0.508
## exposure    -0.03613    0.40521  -0.089    0.929
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 137.99  on 99  degrees of freedom
## Residual deviance: 137.98  on 98  degrees of freedom
## AIC: 141.98
## 
## Number of Fisher Scoring iterations: 3

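In the output above, the Estimate for the exposure variable (-0.03613) is the log odds ratio comparing exposed with unexposed observations; exponentiating it gives the odds ratio. The large p-value (0.929) means the data give no evidence that the log odds ratio differs from 0, that is, that the odds ratio differs from 1.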

Extract and Interpret Odds Ratio

# Extract odds ratio from the model
odds_ratio <- exp(coef(model)["exposure"])

# Display the odds ratio
cat("Odds Ratio:", odds_ratio, "\n")
## Odds Ratio: 0.9645161

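The estimated odds ratio of about 0.96 is close to 1, which is expected here because exposure and event were sampled independently of each other. To apply this approach to your own data, replace event and exposure with the corresponding column names. The logistic regression model expresses the log-odds of the event as a linear combination of the predictors, so the exponentiated coefficient of the exposure variable is the odds ratio for exposed versus unexposed observations.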
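
With a single binary predictor, the odds ratio from the logistic regression should match the cross-product ratio of the corresponding 2x2 table. The sketch below checks this on the same simulated data; it rebuilds the data and model so it can run on its own, and the names `tab`, `or_table`, and `or_model` are illustrative. The exact numbers may vary with the R version's random number generator, so no specific values are assumed.

```r
# Recreate the simulated data and model from the example
set.seed(123)
data <- data.frame(
  exposure = sample(c(0, 1), 100, replace = TRUE),
  event    = sample(c(0, 1), 100, replace = TRUE)
)
model <- glm(event ~ exposure, data = data, family = "binomial")

# 2x2 table: rows = exposure (0, 1), columns = event (0, 1)
tab <- table(exposure = data$exposure, event = data$event)

# Odds of the event among the exposed divided by odds among the unexposed
or_table <- (tab["1", "1"] / tab["1", "0"]) / (tab["0", "1"] / tab["0", "0"])

# Odds ratio from the fitted model
or_model <- unname(exp(coef(model)["exposure"]))

print(c(table = or_table, model = or_model))

# The two agree up to the numerical tolerance of the model fit
all.equal(or_table, or_model, tolerance = 1e-6)
```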

Conclusion

Researchers typically report a confidence interval alongside the estimated odds ratio to convey its precision. A 95% confidence interval that includes 1 means the odds ratio is not statistically significantly different from 1 at the 5% level, while an interval that excludes 1 indicates a statistically significant association.
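
One common way to obtain such an interval is a Wald interval: take the coefficient plus or minus a normal quantile times its standard error on the log-odds scale, then exponentiate the endpoints. The sketch below does this by hand and with `confint.default()` from base R's stats package; it refits the example model so it can run on its own, and the names `ci_manual` and `ci_builtin` are illustrative. Other interval types, such as profile-likelihood intervals, can give slightly different endpoints.

```r
# Recreate the simulated data and model from the example
set.seed(123)
data <- data.frame(
  exposure = sample(c(0, 1), 100, replace = TRUE),
  event    = sample(c(0, 1), 100, replace = TRUE)
)
model <- glm(event ~ exposure, data = data, family = "binomial")

# Coefficient and standard error on the log-odds scale
est <- coef(summary(model))["exposure", "Estimate"]
se  <- coef(summary(model))["exposure", "Std. Error"]

# Wald 95% interval: estimate +/- z * SE, then exponentiate to the OR scale
z <- qnorm(0.975)
ci_manual <- exp(c(lower = est - z * se, upper = est + z * se))

# The same interval via confint.default(), which computes Wald intervals
ci_builtin <- exp(confint.default(model, parm = "exposure", level = 0.95))

print(ci_manual)
print(ci_builtin)
```

Using the estimate and standard error shown in the model summary earlier (-0.03613 and 0.40521), this interval works out to roughly 0.44 to 2.13, which includes 1.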

Odds ratios are widely used in case-control studies, logistic regression, and other analytical approaches to assess the strength of associations between variables.