The fundamental task of epidemiology is to quantify the occurrence of disease. We do this to describe the health of a population, identify groups at high risk, and evaluate the effectiveness of health interventions.
To measure disease, we don’t just count individuals; we relate those counts to the size of the population and the passage of time.
Before we measure specific diseases, we must understand the three mathematical building blocks:
Ratio: A fraction in which the numerator is not part of the denominator.

\[Ratio = \frac{x}{y}\]

* Example: The ratio of male to female births in a hospital (e.g., 1.05:1).

Proportion: A fraction in which the numerator is part of the denominator, usually expressed as a percentage.

\[Proportion = \frac{x}{x + y}\]

* Example: 15 students in a class of 100 have the flu (15%).

Rate: A measure of how quickly events occur in a population. Unlike a proportion, its denominator incorporates time.

* Formula: \(\frac{\text{Events}}{\text{Population at risk} \times \text{Time period}}\)
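As a minimal sketch, the three building blocks can be computed directly in R. The birth counts and the two-week observation window are hypothetical values chosen for illustration; only the 15-in-100 flu example comes from the text above.

```r
# Ratio: numerator is NOT part of the denominator
male_births   <- 105   # hypothetical count, chosen to give 1.05:1
female_births <- 100   # hypothetical count
ratio <- male_births / female_births

# Proportion: numerator IS part of the denominator
flu_cases  <- 15
class_size <- 100
proportion <- flu_cases / class_size

# Rate: events divided by (population at risk x time period)
weeks_observed <- 2    # hypothetical observation window
rate <- flu_cases / (class_size * weeks_observed)

ratio        # 1.05 males per female
proportion   # 0.15, i.e. 15%
rate         # 0.075 cases per person-week
```

Note that the rate carries a time unit (cases per person-week here), while the ratio and the proportion do not.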
Morbidity refers to the state of being diseased or unhealthy within a population. The two primary measures are Prevalence and Incidence.
Prevalence measures the burden of disease. It answers: What proportion of the population has the disease at a specific point in time?
\[P = \frac{\text{Number of existing cases at a point in time}}{\text{Total population at that point in time}}\]
Incidence measures the risk or the flow of new cases. It answers: How many people developed the disease during a specific period?
Cumulative Incidence (Risk): Used when the entire population is followed for the same amount of time.

\[CI = \frac{\text{Number of new cases during a period}}{\text{Total population at risk at the start of the period}}\]

Incidence Rate (Incidence Density): Used when people are followed for different lengths of time (using Person-Time).

\[IR = \frac{\text{Number of new cases}}{\text{Sum of person-time at risk}}\]
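The sketch below contrasts the two incidence measures. All numbers are hypothetical: a cohort of 1,000 disease-free people, 50 new cases in one year, and an assumed 950 person-years of follow-up once unequal observation time is taken into account.

```r
# Hypothetical cohort: 1,000 people at risk at the start, 50 new cases in one year
at_risk_start <- 1000
new_cases     <- 50

# Cumulative incidence: new cases / population at risk at the start
cumulative_incidence <- new_cases / at_risk_start   # a one-year risk

# Incidence rate: new cases / total person-time actually observed
person_years   <- 950                                # assumed total follow-up
incidence_rate <- new_cases / person_years
incidence_rate_per_1000 <- incidence_rate * 1000     # per 1,000 person-years

cumulative_incidence       # 0.05
incidence_rate_per_1000    # about 52.6
```

The cumulative incidence is a unitless risk over a stated period, while the incidence rate is expressed per unit of person-time.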
Think of a bathtub:

* Incidence is the water flowing in from the faucet (new cases).
* Prevalence is the level of water in the tub (total existing cases).
* Mortality/Recovery is the water leaving through the drain.
\[Prevalence \approx Incidence \times Duration\]

This approximation holds when the population is in a steady state and prevalence is low.
Real-Life Example: HIV/AIDS. When highly active antiretroviral therapy (HAART) was introduced in the mid-1990s, the Prevalence of HIV increased. This wasn't because more people were getting infected (Incidence was stable), but because fewer people were dying, so the average Duration of the disease increased.
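To make the Prevalence, Incidence, and Duration relationship concrete, here is a small sketch with hypothetical numbers (not real HIV data). The helper `approx_prevalence()` is an illustrative name, not a function from any package. Incidence is held constant while duration triples.

```r
# Approximate prevalence as incidence rate x mean duration of disease
approx_prevalence <- function(incidence_rate, mean_duration) {
  incidence_rate * mean_duration
}

incidence <- 0.01   # hypothetical: 10 new cases per 1,000 person-years

# Hypothetical mean durations (in years) before and after an effective therapy
prev_before <- approx_prevalence(incidence, mean_duration = 5)
prev_after  <- approx_prevalence(incidence, mean_duration = 15)

prev_before   # 0.05
prev_after    # 0.15
```

With the inflow unchanged, a longer duration alone is enough to raise prevalence, which is the pattern described in the example above.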
Let’s visualize the difference between Incidence and Prevalence using a simulated dataset of a flu outbreak in a small dorm.
# Creating dummy data
library(ggplot2)

days <- 1:15
new_cases <- c(0, 2, 5, 8, 12, 10, 5, 3, 1, 0, 0, 0, 0, 0, 0)
# Active cases = cumulative new cases minus cumulative recoveries
total_cases <- cumsum(new_cases) - c(0, 0, 0, 1, 2, 4, 6, 8, 10, 12, 14, 15, 15, 15, 15)
df <- data.frame(Day = days, New_Cases = new_cases, Active_Cases = total_cases)

# Plot daily new cases (incidence) against currently active cases (prevalence).
# `linewidth` replaces the deprecated `size` for lines (ggplot2 >= 3.4.0).
ggplot(df, aes(x = Day)) +
  geom_line(aes(y = New_Cases, color = "Incidence (New)"), linewidth = 1.2) +
  geom_line(aes(y = Active_Cases, color = "Prevalence (Current)"), linewidth = 1.2, linetype = "dashed") +
  labs(title = "Incidence vs. Prevalence during a Flu Outbreak",
       y = "Number of People",
       color = "Metric") +
  theme_minimal()

Mortality measures the occurrence of death in a population.
| Measure | Formula |
|---|---|
| Crude Death Rate | \(\frac{\text{Total deaths in a year}}{\text{Mid-year population}} \times 1{,}000\) |
| Case Fatality Rate (CFR) | \(\frac{\text{Deaths from a specific disease}}{\text{Number of people diagnosed with that disease}} \times 100\) |
| Infant Mortality Rate | \(\frac{\text{Deaths of infants under 1 year old in a year}}{\text{Number of live births in the same year}} \times 1{,}000\) |
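The crude death rate and infant mortality rate from the table can be sketched as follows. Every count below is hypothetical and chosen only to keep the arithmetic easy to follow.

```r
# Hypothetical annual figures for a single region
total_deaths   <- 8500
mid_year_pop   <- 1000000
infant_deaths  <- 30      # deaths of infants under 1 year old
live_births    <- 6000

# Crude death rate per 1,000 population
crude_death_rate <- total_deaths / mid_year_pop * 1000

# Infant mortality rate per 1,000 live births
infant_mortality_rate <- infant_deaths / live_births * 1000

crude_death_rate        # 8.5 deaths per 1,000 population
infant_mortality_rate   # 5 deaths per 1,000 live births
```

The two measures use different denominators (the whole mid-year population versus live births), so they are not directly comparable.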
In the early stages of a pandemic, the Case Fatality Rate (CFR) often appears higher because only the sickest individuals are tested, so the denominator of confirmed cases is small. As testing expands to mild and asymptomatic cases, the CFR usually drops.
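A short sketch of this denominator effect, using hypothetical counts: the number of deaths is held fixed while the number of confirmed cases grows tenfold as testing expands. The helper `case_fatality()` is an illustrative name, not a function from any package.

```r
# CFR as a percentage: deaths / confirmed cases x 100
case_fatality <- function(deaths, cases) {
  100 * deaths / cases
}

deaths <- 50   # hypothetical number of deaths, held constant

# Early on, only severe cases are confirmed (hypothetical: 500)
cfr_early <- case_fatality(deaths, cases = 500)

# Later, wider testing also finds mild cases (hypothetical: 5,000)
cfr_later <- case_fatality(deaths, cases = 5000)

cfr_early   # 10 (%)
cfr_later   # 1 (%)
```

Nothing about the disease changed between the two calculations; only the completeness of case ascertainment did.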
Scenario: 5 healthy men were followed for 5 years in a study on heart disease.

* Subject A: Followed 5 years, no disease.
* Subject B: Developed disease at year 2.
* Subject C: Lost to follow-up at year 3.
* Subject D: Followed 5 years, no disease.
* Subject E: Developed disease at year 4.

Calculation:

1. Total Person-Years = \(5 (A) + 2 (B) + 3 (C) + 5 (D) + 4 (E) = 19 \text{ person-years}\).
2. New Cases = 2 (B and E).
3. Incidence Rate = \(2 / 19 \approx 0.105\) cases per person-year (or 10.5 cases per 100 person-years).
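The same person-time calculation can be reproduced in R. The data frame below encodes the five subjects from the scenario; the column names are illustrative choices, not part of the original exercise.

```r
# One row per subject: time contributed while at risk, and whether disease developed
follow_up <- data.frame(
  subject           = c("A", "B", "C", "D", "E"),
  years_at_risk     = c(5, 2, 3, 5, 4),
  developed_disease = c(FALSE, TRUE, FALSE, FALSE, TRUE)
)

# Sum the person-time; Subject C still contributes the 3 years observed before loss
total_person_years <- sum(follow_up$years_at_risk)

# Count new cases (TRUE is counted as 1)
new_cases <- sum(follow_up$developed_disease)

# Incidence rate per person-year and per 100 person-years
incidence_rate  <- new_cases / total_person_years
rate_per_100_py <- round(incidence_rate * 100, 1)

total_person_years   # 19
new_cases            # 2
rate_per_100_py      # 10.5
```

Each subject stops contributing person-time at the moment of disease onset or loss to follow-up, which is why B, C, and E contribute fewer than 5 years.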