In Module II, we discussed how to define a “case.” In Module III, we move to the mathematical quantification of these cases. Epidemiology relies on comparing the frequency of disease in different populations (e.g., “Is malaria more common in Kismayo than in Baidoa?”). To answer this, we need standardized measurements.
This module covers:
1. Mathematical Tools: Ratios, Proportions, and Rates.
2. Measures of Morbidity: Prevalence and Incidence.
3. Visualization: Swimmer plots and dynamic calculation.
Before measuring disease, we must understand the three types of calculations used.
Ratio: A value obtained by dividing one quantity by another. The numerator does not have to be included in the denominator.
\[ Ratio = \frac{X}{Y} \]
* Real-Life Example: The number of hospital beds per doctor in a region.
* Calculation: 500 beds / 50 doctors = 10 beds per doctor.
Proportion: A specific type of ratio where the numerator IS included in the denominator. Usually expressed as a percentage (%).
\[ Proportion = \frac{A}{A + B} \times 100 \]
* Real-Life Example: The proportion of students in this class who are female.
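The contrast between a ratio and a proportion can be sketched in a few lines of R. The bed and doctor counts come from the example above; the class counts (24 female, 16 male) are hypothetical and used only for illustration.

```r
# Ratio: numerator and denominator are separate groups (figures from the text)
beds <- 500
doctors <- 50
beds_per_doctor <- beds / doctors

# Proportion: the numerator is part of the denominator
# (hypothetical class: 24 female and 16 male students)
female <- 24
male <- 16
prop_female <- female / (female + male) * 100

cat("Beds per doctor:", beds_per_doctor, "\n")
cat("Female students (%):", prop_female, "\n")
```

Note that the female students appear in both the numerator and the denominator, while doctors are never counted among the beds.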
Rate: A change in one quantity per unit change in another quantity (usually time). A rate must contain a time dimension.
\[ Rate = \frac{\Delta \text{Events}}{\Delta \text{Time}} \]
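As a minimal sketch of the rate formula, the snippet below divides a change in events by the time elapsed. All figures are hypothetical surveillance counts chosen for illustration, not data from this module.

```r
# Hypothetical cumulative case counts at the start and end of a reporting window
events_start <- 120
events_end <- 150
months_elapsed <- 6

# Rate = change in events / change in time
rate_per_month <- (events_end - events_start) / months_elapsed

cat("New cases per month:", rate_per_month, "\n")
```

The unit of the result ("per month") is what distinguishes a rate from a plain ratio or proportion.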
Definition: Prevalence is the proportion of a population who have a specific characteristic or disease at a specific point in time. It includes both new and existing cases.
\[ Prevalence = \frac{\text{Number of existing cases}}{\text{Total Population}} \]
Imagine a cross-sectional survey conducted in a district in Mogadishu. We test 1,000 people for Type 2 Diabetes on January 1st.
* Found: 120 people have Diabetes.
* Calculation: \(120 / 1000 = 0.12\) or \(12\%\).
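The survey calculation can be wrapped in a small helper so it is easy to reuse with other counts. The function name `point_prevalence` is an assumption made for this sketch, not part of any package.

```r
# Hypothetical helper: point prevalence as a proportion between 0 and 1
point_prevalence <- function(cases, population) {
  # Guard against impossible inputs before dividing
  stopifnot(population > 0, cases >= 0, cases <= population)
  cases / population
}

# Survey figures from the text: 120 cases among 1,000 people tested
prev <- point_prevalence(cases = 120, population = 1000)

cat("Prevalence (%):", prev * 100, "\n")
cat("Prevalence per 1,000 people:", prev * 1000, "\n")
```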
The following R code visualizes prevalence as parts of a whole using a bar chart.
# Load the packages used below
library(dplyr)
library(ggplot2)

# Create dummy data representing a population of 100 people
pop_data <- data.frame(
  Status = c(rep("Diseased", 12), rep("Healthy", 88))
)

# Calculate the proportion in each status group
counts <- pop_data %>% count(Status) %>% mutate(prop = n / sum(n))

# Plot a single stacked bar, flipped to horizontal
ggplot(counts, aes(x = "", y = prop, fill = Status)) +
  geom_col(width = 1) +
  coord_flip() +
  theme_minimal() +
  scale_fill_manual(values = c("Diseased" = "#e74c3c", "Healthy" = "#2ecc71")) +
  labs(title = "Point Prevalence Visualization",
       subtitle = "12% Prevalence (Red portion represents the burden of disease)",
       x = NULL, y = "Proportion") +
  theme(axis.text.y = element_blank())

Definition: Incidence measures the occurrence of NEW cases of disease that develop in a candidate population (people at risk of the disease) over a specified time period.
There are two distinct types of incidence:
Cumulative Incidence (CI), also called risk: The proportion of an at-risk group that develops the disease over a specific time.
\[ CI = \frac{\text{Number of NEW cases}}{\text{Population at risk at start}} \]
Incidence Rate (IR): Used when we follow people for different lengths of time. The denominator is Person-Time, the sum of the time each person was observed while at risk.
\[ IR = \frac{\text{Number of NEW cases}}{\text{Total Person-Time of Observation}} \]
This is the most difficult concept for students. Let’s visualize a cohort study following 5 individuals for 5 years to see if they develop a disease.
# Load the plotting package
library(ggplot2)

# Create the dataset: one row per person, with years of follow-up and final status
cohort_data <- data.frame(
  ID = c("Person A", "Person B", "Person C", "Person D", "Person E"),
  Time_Years = c(5, 2, 5, 3, 4),
  Status = c("Healthy", "Disease", "Healthy", "Disease", "Lost to Follow-up")
)

# Calculate totals for the lecture
total_person_years <- sum(cohort_data$Time_Years)
total_new_cases <- sum(cohort_data$Status == "Disease")
inc_rate <- total_new_cases / total_person_years

# Generate Swimmer Plot: one horizontal bar per person, length = years observed
ggplot(cohort_data, aes(x = ID, y = Time_Years, fill = Status)) +
  geom_col(width = 0.6) +
  coord_flip() +
  scale_fill_manual(values = c("Disease" = "#e74c3c",
                               "Healthy" = "#2ecc71",
                               "Lost to Follow-up" = "#95a5a6")) +
  geom_text(aes(label = paste(Time_Years, "yrs")), hjust = 1.2, color = "white", fontface = "bold") +
  labs(title = "Swimmer Plot: Visualizing Person-Time",
       subtitle = paste0("Total Person-Time = ", total_person_years, " years | New Cases = ", total_new_cases),
       y = "Years Observed", x = "Subject") +
  theme_bw()

Calculation from Figure:
1. Numerator: 2 New Cases (Person B, Person D).
2. Denominator: Sum of all bars (\(5 + 2 + 5 + 3 + 4 = 19\) Person-Years).
3. Incidence Rate: \(2 / 19 = 0.105\) cases per person-year (or 10.5 per 100 person-years).
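In practice, follow-up time is often not given directly but derived from entry and exit dates. The sketch below shows one way to do that in base R. The three participants, their dates, and the column names are all hypothetical, and dividing by 365.25 is a common convention for converting days to years rather than a rule from this module.

```r
# Hypothetical follow-up records: entry date, exit date, and whether a new case occurred
follow_up <- data.frame(
  id    = c("P1", "P2", "P3"),
  entry = as.Date(c("2020-01-01", "2020-01-01", "2020-07-01")),
  exit  = as.Date(c("2022-01-01", "2021-01-01", "2022-07-01")),
  case  = c(FALSE, TRUE, FALSE)
)

# Each person's contribution in years (days observed / 365.25)
days_observed <- as.numeric(difftime(follow_up$exit, follow_up$entry, units = "days"))
follow_up$years <- days_observed / 365.25

# Incidence rate = new cases / total person-time
person_years <- sum(follow_up$years)
new_cases <- sum(follow_up$case)
rate_per_100py <- new_cases / person_years * 100

cat("Total person-years:", round(person_years, 2), "\n")
cat("Incidence rate per 100 person-years:", round(rate_per_100py, 1), "\n")
```

A person who develops the disease or is lost to follow-up stops contributing person-time at their exit date, which is the same logic the bars in the swimmer plot encode.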
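The relationship between Prevalence (P) and Incidence (I) depends on the Duration (D) of the disease. When the disease is rare and both incidence and duration are stable over time, prevalence is approximately their product: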
\[ P \approx I \times D \]
Imagine a Bathtub:
1. Faucet (Incidence): New cases flowing in.
2. Water Level (Prevalence): The total amount of disease currently in the tub.
3. Drain (Recovery/Death): Cases leaving the prevalence pool.
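To make the approximation \(P \approx I \times D\) concrete, the sketch below compares a long-lasting and a short-lived disease with the same incidence. The incidence rate and both durations are hypothetical values chosen for illustration, and the approximation is only reasonable when prevalence is low and the population is roughly in a steady state.

```r
# Hypothetical incidence rate: 1 new case per 100 person-years
incidence_rate <- 0.01

# Hypothetical average durations of disease, in years
duration_chronic <- 5    # slow drain: cases stay in the tub for a long time
duration_acute <- 0.5    # fast drain: cases recover or die quickly

# P is approximately I x D
prev_chronic <- incidence_rate * duration_chronic
prev_acute <- incidence_rate * duration_acute

cat("Approximate prevalence, chronic disease (%):", prev_chronic * 100, "\n")
cat("Approximate prevalence, acute disease (%):", prev_acute * 100, "\n")
```

With the faucet unchanged, a slower drain leaves a higher water level: the chronic disease ends up with ten times the prevalence of the acute one.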
Scenario: You are monitoring a group of 500 children for Malaria over 1 year.
1. At the start (Jan 1), 50 children already have malaria.
2. We follow the remaining 450 healthy children.
3. Over the year, 45 of these healthy children develop malaria.
Question: Calculate the Prevalence at the start and the Cumulative Incidence over the year.
# Inputs
total_pop <- 500
existing_cases <- 50
new_cases <- 45

# 1. Prevalence (at start)
# Denominator is Total Population
prev_start <- existing_cases / total_pop

# 2. Cumulative Incidence
# Denominator is Population AT RISK (Healthy people at start)
pop_at_risk <- total_pop - existing_cases
cum_inc <- new_cases / pop_at_risk

# Display Results
print(paste("Prevalence at Jan 1:", prev_start * 100, "%"))
print(paste("Cumulative Incidence (Risk) over 1 year:", cum_inc * 100, "%"))

## [1] "Prevalence at Jan 1: 10 %"
## [1] "Cumulative Incidence (Risk) over 1 year: 10 %"
End of Module III