In epidemiology, “measurement” is the cornerstone of research. Before we can compare the health status of populations or identify causes of disease, we must be able to quantify the occurrence of disease.
Learning Objectives:
To measure disease frequency, we use three distinct mathematical quantities: ratios, proportions, and rates.
Ratio: A division of two numbers where the numerator is NOT necessarily included in the denominator.
\[ \text{Ratio} = \frac{x}{y} \]
Example: The sex ratio (Males / Females).
Proportion: A division where the numerator IS included in the denominator. It is often expressed as a percentage.
\[ \text{Proportion} = \frac{a}{a+b} \times 100 \]
Example: The proportion of students in this class who wear glasses.
Rate: A ratio where time is an intrinsic part of the denominator. It measures the speed of occurrence of an event.
\[ \text{Rate} = \frac{\text{Events}}{\text{Population at risk} \times \text{Time}} \]
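To make the three definitions concrete, here is a minimal sketch in R. All counts are made-up illustration values, not data from any study, and the variable names are hypothetical.

```r
# Hypothetical counts, chosen only to illustrate the three definitions
males <- 40
females <- 60

# Ratio: the numerator (males) is NOT part of the denominator (females)
sex_ratio <- males / females

# Proportion: the numerator (males) IS part of the denominator (everyone)
prop_male <- males / (males + females) * 100

# Rate: events divided by person-time at risk
events <- 12          # hypothetical number of new events
pop_at_risk <- 200    # hypothetical number of people at risk
years <- 3            # hypothetical follow-up time in years
event_rate <- events / (pop_at_risk * years)

print(paste("Sex ratio (M/F):", round(sex_ratio, 2)))
print(paste("Proportion male:", round(prop_male, 1), "%"))
print(paste("Rate:", round(event_rate, 3), "events per person-year"))
```

Note that only the rate carries a time unit; the ratio and the proportion are dimensionless.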
Morbidity refers to the state of being diseased or unhealthy within a population. We look at two main perspectives: Incidence (new cases) and Prevalence (existing cases).
Prevalence measures the burden of disease at a specific time.
\[ \text{Prevalence} = \frac{\text{Number of existing cases}}{\text{Total Population}} \]
Let’s simulate a survey of 1,000 people tested for Hypertension.
# Simulating data
set.seed(123)
population_size <- 1000
# 1 = Disease Present, 0 = Disease Absent
survey_data <- data.frame(
  id = 1:population_size,
  status = sample(c(0, 1), population_size, replace = TRUE, prob = c(0.85, 0.15))
)
# Calculate Point Prevalence
existing_cases <- sum(survey_data$status)
prevalence <- (existing_cases / population_size) * 100
print(paste("The Point Prevalence of Hypertension is:", prevalence, "%"))
## [1] "The Point Prevalence of Hypertension is: 14.7 %"
Incidence measures the risk or rate of developing a new disease. It considers only the “Population at Risk” (people who do not yet have the disease).
Cumulative Incidence (CI): The proportion of people at risk who develop the disease during a specified period.
\[ CI = \frac{\text{Number of NEW cases}}{\text{Population at risk at start}} \]
Incidence Density (ID): Used when follow-up times differ among participants. The denominator is person-time, the sum of the time each participant was observed while at risk.
\[ ID = \frac{\text{Number of NEW cases}}{\text{Total Person-Time of observation}} \]
Imagine a study of 5 patients followed for up to 10 years to see if they develop Heart Disease.
library(ggplot2)

# Create data for the plot
cohort_data <- data.frame(
  Patient = c("A", "B", "C", "D", "E"),
  Start = c(0, 0, 0, 0, 0),
  End = c(10, 5, 2, 10, 8),
  Outcome = c("Healthy", "Disease", "Lost", "Healthy", "Disease")
)
# Calculate person-years: each patient contributes time until disease, loss, or study end
total_person_years <- sum(cohort_data$End - cohort_data$Start)
new_cases <- sum(cohort_data$Outcome == "Disease")
inc_rate <- new_cases / total_person_years
# Plotting the timeline (Gantt chart style)
# Note: linewidth requires ggplot2 >= 3.4.0; use size = 2 on older versions
ggplot(cohort_data, aes(y = Patient, x = End)) +
  geom_segment(aes(x = Start, xend = End, y = Patient, yend = Patient),
               linewidth = 2, color = "steelblue") +
  geom_point(aes(color = Outcome), size = 4) +
  # Name the colors explicitly so they do not depend on factor level order
  scale_color_manual(values = c(Disease = "red", Healthy = "green", Lost = "grey")) +
  theme_minimal() +
  labs(title = "Follow-up of 5 Patients",
       subtitle = paste0("Total Person-Years: ", total_person_years,
                         " | New Cases: ", new_cases),
       x = "Years of Follow-up", y = "Patient ID")
Figure 1: Visualizing Person-Time for Incidence Rate
print(paste("Incidence Rate:", round(inc_rate, 3), "cases per person-year"))
## [1] "Incidence Rate: 0.057 cases per person-year"
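To see why person-time matters, the sketch below reuses the five follow-up times from the cohort above and compares the incidence density with a naive cumulative incidence that ignores unequal follow-up. Rescaling to 1,000 person-years is a common reporting convention added here for illustration, and the vector names are hypothetical.

```r
# Same follow-up times (in years) and outcomes as the five-patient cohort
follow_up <- c(A = 10, B = 5, C = 2, D = 10, E = 8)
outcome <- c(A = "Healthy", B = "Disease", C = "Lost", D = "Healthy", E = "Disease")

new_cases <- sum(outcome == "Disease")
person_years <- sum(follow_up)

# Incidence density: each patient contributes only the time actually observed
inc_density <- new_cases / person_years

# Naive cumulative incidence: treats all 5 patients as if fully followed,
# which is not true for the patient lost to follow-up at year 2
naive_ci <- new_cases / length(follow_up)

print(paste("Incidence density:", round(inc_density * 1000, 1), "per 1,000 person-years"))
print(paste("Naive cumulative incidence:", naive_ci))
```

The two numbers answer different questions: the first is a rate with a time unit, while the second is a proportion that is only interpretable when everyone is followed for the same period.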
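The relationship between prevalence and incidence is often described using the Bathtub Analogy: water flowing in from the tap represents incidence (new cases), the water level represents prevalence (existing cases), and the drain represents recovery or death.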
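The Formula: \[ \text{Prevalence} \approx \text{Incidence} \times \text{Duration of Disease} \]
This approximation holds when the disease is in a steady state and prevalence is low.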
Figure 2: The Relationship between P, I, and D
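As a rough numerical illustration of the formula, the sketch below applies it to two hypothetical diseases that share the same incidence but differ in duration. The incidence and duration values are assumptions chosen for illustration only.

```r
# Hypothetical inputs: same incidence, different average durations
incidence <- 0.01                          # new cases per person-year (assumed)
duration <- c(acute = 0.1, chronic = 5)    # mean duration in years (assumed)

# Steady-state approximation; reasonable only when prevalence is low
prevalence <- incidence * duration

print(round(prevalence, 3))
```

With identical incidence, the longer-lasting disease accumulates a far larger pool of existing cases, which is the bathtub filling faster than it drains.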
While morbidity measures illness, mortality measures death.
Crude Death Rate (CDR): The actual observed mortality in a population over a given period.
\[ \text{CDR} = \frac{\text{Total Deaths}}{\text{Total Mid-year Population}} \times 1000 \]
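A minimal sketch of the CDR calculation follows. The population and death counts are hypothetical, and averaging the start-of-year and end-of-year populations is one common way to approximate the mid-year population, assumed here for illustration.

```r
# Hypothetical figures for a single calendar year
pop_jan1 <- 49500
pop_dec31 <- 50500
total_deaths <- 400

# Approximate the mid-year population as the average of the two counts
mid_year_pop <- (pop_jan1 + pop_dec31) / 2

# Crude Death Rate per 1,000 population
cdr <- total_deaths / mid_year_pop * 1000
print(paste("Crude Death Rate:", round(cdr, 1), "per 1,000"))
```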
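Case Fatality Rate (CFR): Measures the severity (virulence) of a disease. It asks: Of those who got sick, how many died? Despite its name, the CFR is a proportion rather than a rate, because the deaths are a subset of the cases and time is not part of the denominator.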
\[ \text{CFR} = \frac{\text{Deaths from Disease X}}{\text{Cases of Disease X}} \times 100 \]
Let’s compare the severity of two diseases.
| Disease | Cases | Deaths |
|---|---|---|
| Disease A (Like Ebola) | 500 | 250 |
| Disease B (Like Flu) | 50,000 | 500 |
library(dplyr)  # provides %>% and mutate()
library(knitr)  # provides kable()

# Create Data
disease_data <- data.frame(
  Disease = c("Ebola-like", "Flu-like"),
  Cases = c(500, 50000),
  Deaths = c(250, 500)
)
# Calculate CFR
disease_data <- disease_data %>%
  mutate(CFR_Percentage = (Deaths / Cases) * 100)
kable(disease_data, caption = "Comparison of Case Fatality Rates")
| Disease | Cases | Deaths | CFR_Percentage |
|---|---|---|---|
| Ebola-like | 500 | 250 | 50 |
| Flu-like | 50000 | 500 | 1 |
Interpretation: Even though the “Flu-like” disease caused more total deaths (500 vs 250), the “Ebola-like” disease is much more virulent (50% CFR vs 1% CFR).
Scenario: In a village of 1,000 people, on January 1st, 50 people already have Malaria. Over the course of the year:
1. 100 NEW people develop Malaria.
2. None of the Malaria patients die, but 10 people die from car accidents.
Calculate:
1. Prevalence at start.
2. Cumulative Incidence (Risk) over the year.
3. Crude Death Rate.
pop_total <- 1000
existing_cases <- 50
new_cases <- 100
deaths <- 10
# 1. Prevalence at Jan 1
prev <- existing_cases / pop_total
print(paste("Prevalence:", prev))
## [1] "Prevalence: 0.05"
# 2. Cumulative Incidence
# Denominator must be population AT RISK (Total - Existing Cases)
pop_at_risk <- pop_total - existing_cases
ci <- new_cases / pop_at_risk
print(paste("Cumulative Incidence:", round(ci, 3)))
## [1] "Cumulative Incidence: 0.105"
# 3. Crude Death Rate (per 1,000)
# Assuming mid-year pop is approx same as start for simplicity
cdr <- (deaths / pop_total) * 1000
print(paste("Crude Death Rate:", cdr, "per 1,000"))
## [1] "Crude Death Rate: 10 per 1,000"
Always check your denominator! Are you dividing by the whole population, or just those at risk?