1. Introduction

In epidemiology, “measurement” is the process of quantifying the occurrence of disease or health states in a population. Without measurement, we cannot compare the health status of different populations, track disease trends over time, or evaluate the effectiveness of interventions.

Key Learning Objectives:

  1. Distinguish between ratios, proportions, and rates.
  2. Define and calculate Prevalence.
  3. Define and calculate Incidence (Cumulative and Density).
  4. Understand the relationship between Prevalence and Incidence.


2. Basic Mathematical Tools

Before measuring disease, we must define our mathematical tools.

2.1 Ratio

A division of one number by another where the numerator is not necessarily part of the denominator.

\[ \text{Ratio} = \frac{a}{b} \]

  • Example: The Male-to-Female ratio.

2.2 Proportion

A ratio where the numerator is included in the denominator. Usually expressed as a percentage.

\[ \text{Proportion} = \frac{a}{a+b} \times 100 \]

  • Example: The proportion of students who wear glasses.

2.3 Rate

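A ratio that includes time as an essential element in the denominator. It measures the speed at which events occur.

\[ \text{Rate} = \frac{\Delta a}{\Delta t} \]

Here \(\Delta a\) is the change in the number of events and \(\Delta t\) is the time interval over which that change is observed.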
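The three tools can be compared side by side in a minimal sketch. All counts and variable names below are made up for illustration and are not data from this module.

```r
# Hypothetical classroom counts (illustrative assumptions only)
males <- 12
females <- 18
with_glasses <- 9
total_students <- males + females

# Ratio: the numerator (males) is NOT part of the denominator (females)
male_female_ratio <- males / females

# Proportion: the numerator (glasses wearers) IS part of the denominator
prop_glasses_pct <- with_glasses / total_students * 100

# Rate: events per unit of time (hypothetical: 6 new events over 3 weeks)
new_events <- 6
weeks_observed <- 3
rate_per_week <- new_events / weeks_observed

round(male_female_ratio, 2)  # about 0.67, i.e. 2 males for every 3 females
prop_glasses_pct             # percent of students wearing glasses
rate_per_week                # events per week
```

Only the rate carries a time unit; the ratio and the proportion are plain numbers.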


3. Measures of Morbidity

Morbidity refers to the “state of being diseased.” We rely on two primary measures: Prevalence and Incidence.

3.1 Prevalence

Prevalence is the proportion of a population that has a specific characteristic (disease) at a specific time point. It is a “snapshot” of the population.

\[ \text{Prevalence} = \frac{\text{Number of existing cases at a specific time}}{\text{Total population at that time}} \]

Real-Life Example: Diabetes

According to the CDC, roughly 34 million people in the US had diabetes in 2020, out of a population of about 330 million.

cases <- 34000000
population <- 330000000
prev <- (cases / population) * 100
paste0("The prevalence of diabetes is approximately ", round(prev, 1), "%.")
## [1] "The prevalence of diabetes is approximately 10.3%."

Visualizing Prevalence

Let’s visualize the prevalence of a hypothetical disease across three different cities.

# Packages needed for %>%, mutate(), and ggplot()
library(dplyr)
library(ggplot2)

# Data creation
city_data <- data.frame(
  City = c("City A", "City B", "City C"),
  Population = c(10000, 15000, 8000),
  Cases = c(500, 1200, 200)
) %>%
  mutate(Prevalence_Pct = (Cases / Population) * 100)

# Plotting
ggplot(city_data, aes(x = City, y = Prevalence_Pct, fill = City)) +
  geom_col() +
  geom_text(aes(label = paste0(round(Prevalence_Pct, 1), "%")), vjust = -0.5) +
  labs(title = "Disease Prevalence Comparison", y = "Prevalence (%)") +
  theme_minimal()
Figure 1: Comparative Prevalence by City



3.2 Incidence

Incidence measures the occurrence of new cases of disease in a population at risk over a specified period of time. It measures the “risk” or the “flow” of disease.

There are two types of incidence:

A. Cumulative Incidence (Risk)

The proportion of an at-risk group that develops the disease over a specific time period.

\[ CI = \frac{\text{Number of NEW cases}}{\text{Population at risk at start of period}} \]

B. Incidence Density (Incidence Rate)

Used when people are observed for different lengths of time. The denominator is Person-Time.

\[ IR = \frac{\text{Number of NEW cases}}{\text{Total Person-Time of observation}} \]

Real-Life Example: COVID-19 Outbreak

Imagine a nursing home with 100 residents (all healthy at start). Over 1 month, 20 residents test positive.

  • Cumulative Incidence: 20 / 100 = 0.20 (or 20% risk over 1 month).
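The example above gives only the number of cases, not when each resident tested positive, so incidence density cannot be computed from it directly. The sketch below adds a labeled assumption: each of the 20 cases contributed 15 days at risk before testing positive, and the other 80 residents were followed for the full 30 days. The variable names are illustrative.

```r
residents <- 100
nh_cases <- 20
days_in_period <- 30

# Cumulative incidence: new cases / population at risk at the start
cum_incidence <- nh_cases / residents

# ASSUMPTION (not in the example): each case was at risk for 15 days
# before testing positive; non-cases were at risk for all 30 days.
person_days_cases <- nh_cases * 15
person_days_noncases <- (residents - nh_cases) * days_in_period
total_person_days <- person_days_cases + person_days_noncases

# Incidence density: new cases / total person-time at risk
ir_per_1000_person_days <- nh_cases / total_person_days * 1000

cum_incidence                      # 0.2, the 20% risk over 1 month
total_person_days                  # 2700 person-days
round(ir_per_1000_person_days, 2)  # about 7.41 per 1,000 person-days
```

Under this assumption the cases stop contributing person-time once they test positive, which is why the denominator is 2,700 person-days rather than 100 × 30 = 3,000.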

Simulation: The Epidemic Curve

Incidence is often visualized using an epidemic curve (Epi Curve).

set.seed(123)
# Simulating an outbreak over 30 days
days <- 1:30
new_cases <- round(dnorm(days, mean = 15, sd = 5) * 200)

epi_data <- data.frame(Day = days, New_Cases = new_cases)

ggplot(epi_data, aes(x = Day, y = New_Cases)) +
  geom_col(fill = "steelblue") +
  geom_smooth(method = "loess", se = FALSE, color = "red") +
  labs(title = "Daily Incidence of Disease X", 
       subtitle = "The red line is a LOESS-smoothed trend of daily new cases",
       y = "Number of New Cases", x = "Day of Outbreak") +
  theme_classic()
Figure 2: Incidence of New Cases Over Time (Epi Curve)



4. The Relationship: Prevalence vs. Incidence

The relationship between Prevalence (P) and Incidence (I) depends on the Duration (D) of the disease.

\[ \text{Prevalence} \approx \text{Incidence} \times \text{Duration} \]

(Assumption: The disease is rare and the population is stable.)

The Bathtub Analogy

  • Faucet (Incidence): New water entering the tub.
  • Tub Water (Prevalence): The total amount of water (cases) currently in the tub.
  • Drain (Recovery/Death): Cases leaving the population.

Implications:

  1. If a new drug prevents death but doesn’t cure the disease (increases Duration), Prevalence will go UP even if Incidence stays the same.
  2. If a disease is highly fatal and kills quickly (short Duration), Prevalence will be LOW even if Incidence is high (e.g., Ebola).
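The bathtub can be sketched as a simple year-by-year simulation. This is an illustrative toy model, not a fitted epidemic model: it assumes a constant inflow of new cases each year and that a fraction 1/Duration of current cases leaves (recovers or dies) each year. The function name `simulate_tub` and its parameters are hypothetical.

```r
# Toy bathtub model: cases next year = current cases + inflow - outflow
simulate_tub <- function(inflow_per_year, duration_years, n_years = 300) {
  cases <- numeric(n_years + 1)  # cases[1] is year 0; the tub starts empty
  for (t in seq_len(n_years)) {
    outflow <- cases[t] / duration_years  # drain: 1/Duration leave per year
    cases[t + 1] <- cases[t] + inflow_per_year - outflow
  }
  cases
}

# Same faucet (5 new cases per year), different drain speeds
baseline <- simulate_tub(inflow_per_year = 5, duration_years = 10)
longer   <- simulate_tub(inflow_per_year = 5, duration_years = 20)

round(tail(baseline, 1))  # water level settles near 5 * 10 = 50 cases
round(tail(longer, 1))    # doubling Duration settles near 5 * 20 = 100 cases
```

With the faucet unchanged, slowing the drain doubles the level at which the tub settles, which mirrors the first implication above.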

Calculation Example:

If the Incidence of a chronic condition is 5 per 1,000 person-years, and the average duration of the disease is 10 years:

\[ P \approx 5 \times 10 = 50 \text{ per 1,000 population} \]
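To see how much the rare-disease assumption matters here, the approximation can be compared with the steady-state relation \( P / (1 - P) = I \times D \), which links prevalence odds to incidence and duration without requiring the disease to be rare. A minimal sketch with illustrative variable names:

```r
incidence_rate <- 5 / 1000  # 5 per 1,000 person-years
duration_years <- 10

# Rare-disease approximation: P is roughly I * D
approx_prev <- incidence_rate * duration_years

# Steady-state relation: P / (1 - P) = I * D, solved for P
prev_odds <- incidence_rate * duration_years
exact_prev <- prev_odds / (1 + prev_odds)

approx_prev * 1000            # 50 per 1,000
round(exact_prev * 1000, 1)   # about 47.6 per 1,000
```

The two values are close because a prevalence around 5% is still fairly low; the gap widens as the disease becomes more common.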


5. Measures of Mortality

While morbidity measures illness, mortality measures death.

5.1 Case Fatality Rate (CFR)

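CFR measures the severity of a disease: the proportion of people with the disease who die from it. Despite its name, it is a proportion (Section 2.2), not a true rate, because time is not part of the denominator.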

\[ \text{CFR} = \frac{\text{Number of deaths from disease}}{\text{Number of confirmed cases of disease}} \times 100 \]

Real-Life Comparison: Ebola vs. Influenza

  • Ebola (Zaire strain): High CFR (~50% to 90%).
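  • Seasonal Flu: Low CFR (about 0.1% or lower).

# Packages needed for %>%, mutate(), kable(), and kable_styling()
library(dplyr)
library(knitr)
library(kableExtra)

disease_stats <- data.frame(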
  Disease = c("Ebola", "SARS-CoV-2 (Early)", "Seasonal Flu"),
  Cases = c(1000, 1000, 1000),
  Deaths = c(500, 20, 1)
) %>%
  mutate(CFR_Percent = (Deaths / Cases) * 100)

kable(disease_stats, caption = "Table 1: Comparison of Case Fatality Rates (Hypothetical Standardized Cohort)") %>%
  kable_styling(bootstrap_options = c("striped", "hover"))

Table 1: Comparison of Case Fatality Rates (Hypothetical Standardized Cohort)

Disease              Cases   Deaths   CFR_Percent
Ebola                1000    500      50.0
SARS-CoV-2 (Early)   1000    20       2.0
Seasonal Flu         1000    1        0.1

6. Summary Assignment

Using the dataset below, calculate the Prevalence of “Disease Y” at Year 5.

Dataset:

  • Total Population at Year 5: 5,000
  • New cases diagnosed in Year 5: 50
  • Pre-existing cases (still active) in Year 5: 150

Solution Code:

total_pop <- 5000
new_cases <- 50
existing_cases <- 150

# Prevalence includes ALL cases (new + old)
total_cases <- new_cases + existing_cases

prevalence_calc <- (total_cases / total_pop) * 100

print(paste("The prevalence is:", prevalence_calc, "%"))
## [1] "The prevalence is: 4 %"

End of Module 3