In epidemiology, “measurement” is the process of quantifying the occurrence of disease or health states in a population. Without measurement, we cannot compare the health status of different populations, track disease trends over time, or evaluate the effectiveness of interventions.
Key Learning Objectives:
1. Distinguish between ratios, proportions, and rates.
2. Define and calculate Prevalence.
3. Define and calculate Incidence (Cumulative and Density).
4. Understand the relationship between Prevalence and Incidence.
Before measuring disease, we must define our mathematical tools.
A ratio is the division of one number by another, where the numerator is not necessarily part of the denominator.
\[ \text{Ratio} = \frac{a}{b} \]
* Example: The male-to-female ratio (number of males divided by number of females).
A proportion is a ratio in which the numerator is included in the denominator. It is usually expressed as a percentage.
\[ \text{Proportion} = \frac{a}{a+b} \times 100 \]
* Example: The proportion of students who wear glasses.
A rate is a ratio that includes time as an essential element of the denominator. It measures how quickly events occur.
\[ \text{Rate} = \frac{\Delta a}{\Delta t} \]
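The three definitions above can be told apart with a few lines of R. This is a minimal sketch using made-up classroom counts; every number and variable name below is an assumption chosen for illustration.

```r
# Hypothetical classroom counts (illustrative values only)
males    <- 12
females  <- 18
glasses  <- 9                  # students who wear glasses
students <- males + females    # 30 students in total

# Ratio: the numerator (males) is NOT part of the denominator (females)
male_female_ratio <- males / females

# Proportion: the numerator (glasses wearers) IS part of the denominator
prop_glasses <- glasses / students * 100

# Rate: events per unit of time
# (hypothetical: 6 new glasses prescriptions observed over 3 years)
new_prescriptions <- 6
years_observed    <- 3
prescription_rate <- new_prescriptions / years_observed

round(male_female_ratio, 2)   # males per female
prop_glasses                  # percent of students
prescription_rate             # new prescriptions per year
```

Only the rate carries a time unit; the ratio and the proportion are unitless comparisons of counts.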
Morbidity refers to the “state of being diseased.” We rely on two primary measures: Prevalence and Incidence.
Prevalence is the proportion of a population that has a specific characteristic (disease) at a specific time point. It is a “snapshot” of the population.
\[ \text{Prevalence} = \frac{\text{Number of existing cases at a specific time}}{\text{Total population at that time}} \]
According to the CDC, roughly 34 million people in the US had diabetes in 2020, out of a population of about 330 million.
cases <- 34000000
population <- 330000000
prev <- (cases / population) * 100
paste0("The prevalence of diabetes is approximately ", round(prev, 1), "%.")
## [1] "The prevalence of diabetes is approximately 10.3%."
Let’s visualize the prevalence of a hypothetical disease across three different cities.
# Packages needed for the pipe, mutate(), and plotting
library(dplyr)
library(ggplot2)
# Data creation: hypothetical population and case counts for three cities
city_data <- data.frame(
  City = c("City A", "City B", "City C"),
  Population = c(10000, 15000, 8000),
  Cases = c(500, 1200, 200)
) %>%
  mutate(Prevalence_Pct = (Cases / Population) * 100)
# Plotting: one bar per city, labelled with its prevalence
ggplot(city_data, aes(x = City, y = Prevalence_Pct, fill = City)) +
  geom_col() +
  geom_text(aes(label = paste0(round(Prevalence_Pct, 1), "%")), vjust = -0.5) +
  labs(title = "Disease Prevalence Comparison", y = "Prevalence (%)") +
  theme_minimal()

Figure 1: Comparative Prevalence by City
Incidence measures the occurrence of new cases of disease in a population at risk over a specified period of time. It measures the “risk” or the “flow” of disease.
There are two types of incidence:
Cumulative incidence (CI) is the proportion of an at-risk group that develops the disease over a specific time period.
\[ CI = \frac{\text{Number of NEW cases}}{\text{Population at risk at start of period}} \]
Incidence rate (IR), also called incidence density, is used when people are observed for different lengths of time. Its denominator is person-time.
\[ IR = \frac{\text{Number of NEW cases}}{\text{Total Person-Time of observation}} \]
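Person-time is easiest to see with a small table of follow-up times. The sketch below uses a hypothetical five-person cohort (the data frame, its column names, and all values are assumptions for illustration) and sums each person's time at risk to form the denominator.

```r
# Hypothetical cohort: years each person was observed while at risk,
# and whether they became a new case during that time
cohort <- data.frame(
  id        = 1:5,
  years_obs = c(5, 3, 4, 1, 2),
  new_case  = c(FALSE, TRUE, FALSE, TRUE, FALSE)
)

person_years <- sum(cohort$years_obs)    # total person-time of observation
cases        <- sum(cohort$new_case)     # number of new cases
ir           <- cases / person_years     # cases per person-year

person_years
round(ir * 1000, 1)   # expressed per 1,000 person-years
```

Each person contributes only the time they were actually observed, which is why the incidence rate can handle unequal follow-up where cumulative incidence cannot.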
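Imagine a nursing home with 100 residents, all healthy at the start. Over 1 month, 20 residents test positive. Because everyone was at risk at the start and was followed for the same month, the cumulative incidence is 20/100 = 20% over one month.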
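The nursing home example can be worked through in R. The cumulative incidence uses only the numbers given in the example. The incidence rate needs one extra assumption that the example does not state: that residents who test positive do so, on average, halfway through the month, so each contributes 0.5 person-months at risk. Variable names are illustrative.

```r
nh_at_risk   <- 100   # residents healthy at the start (from the example)
nh_new_cases <- 20    # residents testing positive during the month

# Cumulative incidence: new cases / population at risk at start of period
ci_pct <- nh_new_cases / nh_at_risk * 100

# Incidence rate under the ASSUMED mid-month timing of cases:
# non-cases contribute a full month, cases contribute half a month
person_months <- (nh_at_risk - nh_new_cases) * 1 + nh_new_cases * 0.5
ir <- nh_new_cases / person_months   # cases per person-month

ci_pct
person_months
round(ir, 3)
```

The two measures differ because cumulative incidence divides by people, while the incidence rate divides by the time those people actually spent at risk.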
Incidence is often visualized using an epidemic curve (Epi Curve).
library(ggplot2)
# set.seed() has no effect here because dnorm() is deterministic;
# it only matters if random noise is added to the counts
set.seed(123)
# Build a bell-shaped outbreak curve over 30 days
days <- 1:30
new_cases <- round(dnorm(days, mean = 15, sd = 5) * 200)
epi_data <- data.frame(Day = days, New_Cases = new_cases)
ggplot(epi_data, aes(x = Day, y = New_Cases)) +
  geom_col(fill = "steelblue") +
  geom_smooth(method = "loess", se = FALSE, color = "red") +
  labs(title = "Daily Incidence of Disease X",
       subtitle = "The red line is a LOESS-smoothed trend of daily new cases",
       y = "Number of New Cases", x = "Day of Outbreak") +
  theme_classic()

Figure 2: Incidence of New Cases Over Time (Epi Curve)
The relationship between Prevalence (P) and Incidence (I) depends on the Duration (D) of the disease.
\[ \text{Prevalence} \approx \text{Incidence} \times \text{Duration} \]
(Assumption: the disease is rare and the population is stable, meaning incidence and duration are not changing over time.)
Implications:
1. If a new drug prevents death but doesn’t cure the disease (increases Duration), Prevalence will go UP even if Incidence stays the same.
2. If a disease is highly fatal and kills quickly (short Duration), Prevalence will be LOW even if Incidence is high (e.g., Ebola).
If the Incidence of a chronic condition is 5 per 1,000 person-years, and the average duration of the disease is 10 years:
\[ P \approx \frac{5}{\text{1,000 person-years}} \times 10 \text{ years} = 50 \text{ per 1,000 population} = 5\% \]
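The calculation above can be checked in R, along with how much the rare-disease shortcut matters. The sketch below compares the approximation with the standard steady-state relationship, in which the prevalence odds equal incidence times duration, \( P / (1 - P) = I \times D \). The alternative durations in the last step are hypothetical values added to illustrate the first implication.

```r
incidence <- 5 / 1000   # 5 new cases per 1,000 person-years (from the text)
duration  <- 10         # average duration in years (from the text)

# Rare-disease approximation: P is roughly I * D
p_approx <- incidence * duration

# Steady-state relationship without the shortcut:
# P / (1 - P) = I * D, so P = I * D / (1 + I * D)
p_exact <- (incidence * duration) / (1 + incidence * duration)

# Both expressed per 1,000 population
round(c(approx = p_approx, exact = p_exact) * 1000, 1)

# Hypothetical durations: longer survival raises prevalence
# even though incidence is unchanged
incidence * c(5, 10, 20) * 1000
```

At a prevalence of about 5% the two versions differ by only a few cases per 1,000; the gap widens as the disease becomes more common.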
While morbidity measures illness, mortality measures death.
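The case fatality rate (CFR) measures the severity of a disease: the share of people with the disease who die from it. Despite its name, it is a proportion rather than a true rate, because its denominator contains no time element.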
\[ \text{CFR} = \frac{\text{Number of deaths from disease}}{\text{Number of confirmed cases of disease}} \times 100 \]
# Packages needed for the pipe, mutate(), and table output
library(dplyr)
library(knitr)
library(kableExtra)
disease_stats <- data.frame(
  Disease = c("Ebola", "SARS-CoV-2 (Early)", "Seasonal Flu"),
  Cases = c(1000, 1000, 1000),
  Deaths = c(500, 20, 1)
) %>%
  mutate(CFR_Percent = (Deaths / Cases) * 100)
kable(disease_stats, caption = "Table 1: Comparison of Case Fatality Rates (Hypothetical Standardized Cohort)") %>%
  kable_styling(bootstrap_options = c("striped", "hover"))

Table 1: Comparison of Case Fatality Rates (Hypothetical Standardized Cohort)

| Disease | Cases | Deaths | CFR_Percent |
|---|---|---|---|
| Ebola | 1000 | 500 | 50.0 |
| SARS-CoV-2 (Early) | 1000 | 20 | 2.0 |
| Seasonal Flu | 1000 | 1 | 0.1 |
Using the dataset below, calculate the Prevalence of “Disease Y” at Year 5.
Dataset:
* Total Population at Year 5: 5,000
* New cases diagnosed in Year 5: 50
* Pre-existing cases (still active) in Year 5: 150
Solution Code:
total_pop <- 5000
new_cases <- 50
existing_cases <- 150
# Prevalence includes ALL cases (new + old)
total_cases <- new_cases + existing_cases
prevalence_calc <- (total_cases / total_pop) * 100
print(paste("The prevalence is:", prevalence_calc, "%"))
## [1] "The prevalence is: 4 %"
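For contrast, the same dataset can be used to sketch the cumulative incidence for Year 5, which counts only the new cases. This goes beyond what the exercise asks and rests on two assumptions not stated in the dataset: the 150 people with pre-existing disease were not at risk of becoming new cases, and the population of 5,000 was the same at the start of the year.

```r
total_pop      <- 5000
new_cases      <- 50
existing_cases <- 150

# ASSUMPTION: pre-existing cases are removed from the at-risk denominator
at_risk <- total_pop - existing_cases

# Cumulative incidence for Year 5, as a percentage
ci_year5 <- new_cases / at_risk * 100

at_risk
round(ci_year5, 2)
```

Prevalence (4%) is much higher than the one-year incidence because it also includes everyone who developed the disease earlier and still has it.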
End of Module 3