Introduction

In Module II, we discussed how to define a “case.” In Module III, we move to the mathematical quantification of these cases. Epidemiology relies on comparing the frequency of disease in different populations (e.g., “Is malaria more common in Kismayo than in Baidoa?”). To answer this, we need standardized measurements.

This module covers:

  1. Mathematical Tools: Ratios, Proportions, and Rates.
  2. Measures of Morbidity: Prevalence and Incidence.
  3. Visualization: Swimmer plots and dynamic calculation.


1. The Mathematical Tools

Before measuring disease, we must understand the three types of calculations used.

A. Ratio

A value obtained by dividing one quantity by another. The numerator is NOT included in the denominator.

\[ Ratio = \frac{X}{Y} \]

  • Real-Life Example: The number of hospital beds per doctor in a region.
  • Calculation: 500 beds / 50 doctors = 10 beds per doctor.

B. Proportion

A specific type of ratio where the numerator IS included in the denominator. Usually expressed as a percentage (%).

\[ Proportion = \frac{A}{A + B} \times 100 \]

  • Real-Life Example: The proportion of students in this class who are female.

C. Rate

A change in one quantity per unit change in another quantity (usually time). A rate must contain a time dimension.

\[ Rate = \frac{\Delta \text{Events}}{\Delta \text{Time}} \]
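The three tools can be compared side by side in a few lines of R. This is a minimal sketch: the beds-per-doctor figures come from the ratio example, while the class counts (18 female, 12 male) and the clinic counts (24 new cases over 12 months) are assumed numbers chosen only for illustration.

```r
# Ratio: numerator and denominator are separate quantities (figures from the text)
beds <- 500
doctors <- 50
beds_per_doctor <- beds / doctors

# Proportion: the numerator is part of the denominator
# Assumed class: 18 female and 12 male students (hypothetical numbers)
female <- 18
male <- 12
prop_female <- female / (female + male) * 100

# Rate: events per unit of time
# Assumed clinic: 24 new cases recorded over 12 months (hypothetical numbers)
events <- 24
months <- 12
cases_per_month <- events / months

cat("Ratio:", beds_per_doctor, "beds per doctor\n")
cat("Proportion:", prop_female, "% of students are female\n")
cat("Rate:", cases_per_month, "new cases per month\n")
```

Note that only the rate carries a unit of time in its denominator. The ratio and the proportion describe a single moment.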


2. Prevalence (The “Snapshot”)

Definition: Prevalence is the proportion of a population who have a specific characteristic or disease at a specific point in time. It includes both new and existing cases.

\[ Prevalence = \frac{\text{Number of existing cases}}{\text{Total Population}} \]

Real-Life Example: Diabetes Survey

Imagine a cross-sectional survey conducted in a district in Mogadishu. We test 1,000 people for Type 2 Diabetes on January 1st.

  • Found: 120 people have Diabetes.
  • Calculation: \(120 / 1000 = 0.12\) or \(12\%\).
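The survey arithmetic can be wrapped in a small function so students can try other numbers. This is a sketch only: `point_prevalence` is a hypothetical helper name, not a function from any package, and the inputs are the survey figures above.

```r
# Point prevalence: existing cases divided by the total population examined.
# `point_prevalence` is a hypothetical helper written for this example.
point_prevalence <- function(existing_cases, total_population) {
  # Guard against impossible inputs before dividing
  stopifnot(total_population > 0,
            existing_cases >= 0,
            existing_cases <= total_population)
  existing_cases / total_population
}

# Diabetes survey: 120 cases among 1,000 people tested
prev <- point_prevalence(existing_cases = 120, total_population = 1000)
cat(sprintf("Prevalence: %.2f (%.0f%%)\n", prev, prev * 100))
```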

R Visualization: Prevalence

The following R code visualizes prevalence as parts of a whole using a bar chart.

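# Load the packages used for data wrangling and plotting
library(dplyr)
library(ggplot2)

# Create dummy data for a population of 100 people
# (the 1,000-person survey scaled down; prevalence is still 12%)
pop_data <- data.frame(
  Status = c(rep("Diseased", 12), rep("Healthy", 88))
)

# Calculate the proportion in each status group
counts <- pop_data %>% count(Status) %>% mutate(prop = n / sum(n))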

# Plot
ggplot(counts, aes(x = "", y = prop, fill = Status)) +
  geom_bar(stat = "identity", width = 1) +
  coord_flip() +
  theme_minimal() +
  scale_fill_manual(values = c("Diseased" = "#e74c3c", "Healthy" = "#2ecc71")) +
  labs(title = "Point Prevalence Visualization",
       subtitle = "12% Prevalence (Red portion represents the burden of disease)",
       x = NULL, y = "Proportion") +
  theme(axis.text.y = element_blank())


3. Incidence (The “Movie”)

Definition: Incidence measures the occurrence of NEW cases of disease that develop in a candidate population over a specified time period.

There are two distinct types of incidence:

A. Cumulative Incidence (Risk)

The proportion of an at-risk group that develops the disease over a specific time.

\[ CI = \frac{\text{Number of NEW cases}}{\text{Population at risk at start}} \]

  • Real-Life Example (Outbreak): A wedding reception has 200 guests. They all eat contaminated salad. Within 24 hours, 40 guests develop cholera symptoms.
    • \(CI = 40 / 200 = 0.20\) or \(20\%\) (Also called the “Attack Rate”).
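The attack rate from the wedding example can be checked directly. A minimal sketch using only the numbers given above; the variable names are illustrative.

```r
# Cumulative incidence (attack rate) for the wedding outbreak
guests_at_risk <- 200   # everyone who ate the salad was at risk at the start
new_cases <- 40         # guests who developed symptoms within 24 hours

attack_rate <- new_cases / guests_at_risk
cat(sprintf("Attack rate: %.2f (%.0f%%)\n", attack_rate, attack_rate * 100))
```

Because every guest is observed for the same 24-hour window, a simple proportion is enough here and no person-time denominator is needed.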

B. Incidence Density (Incidence Rate)

This is used when we follow people for different lengths of time. The denominator is Person-Time: the sum of the time each person was observed while still at risk of the disease.

\[ IR = \frac{\text{Number of NEW cases}}{\text{Total Person-Time of Observation}} \]

Understanding Person-Time with a “Swimmer Plot”

This is the most difficult concept for students. Let’s visualize a cohort study following 5 individuals for 5 years to see if they develop a disease.

  • Subject A: Followed for 5 years, stays healthy.
  • Subject B: Followed for 2 years, gets disease (Event).
  • Subject C: Followed for 5 years, stays healthy.
  • Subject D: Followed for 3 years, gets disease (Event).
  • Subject E: Followed for 4 years, moves away (Lost to follow-up/Censored).

# Load ggplot2 for plotting
library(ggplot2)

# Create the dataset
cohort_data <- data.frame(
  ID = c("Person A", "Person B", "Person C", "Person D", "Person E"),
  Time_Years = c(5, 2, 5, 3, 4),
  Status = c("Healthy", "Disease", "Healthy", "Disease", "Lost to Follow-up")
)

# Calculate totals for the lecture
total_person_years <- sum(cohort_data$Time_Years)
total_new_cases <- sum(cohort_data$Status == "Disease")
inc_rate <- total_new_cases / total_person_years

# Generate Swimmer Plot
ggplot(cohort_data, aes(x = ID, y = Time_Years, fill = Status)) +
  geom_bar(stat = "identity", width = 0.6) +
  coord_flip() + 
  scale_fill_manual(values = c("Disease" = "#e74c3c", 
                               "Healthy" = "#2ecc71", 
                               "Lost to Follow-up" = "#95a5a6")) +
  geom_text(aes(label = paste(Time_Years, "yrs")), hjust = 1.2, color = "white", fontface = "bold") +
  labs(title = "Swimmer Plot: Visualizing Person-Time",
       subtitle = paste0("Total Person-Time = ", total_person_years, " years | New Cases = ", total_new_cases),
       y = "Years Observed", x = "Subject") +
  theme_bw()

Calculation from Figure:

  1. Numerator: 2 New Cases (Person B, Person D).
  2. Denominator: Sum of all bars (\(5 + 2 + 5 + 3 + 4 = 19\) Person-Years).
  3. Incidence Rate: \(2 / 19 \approx 0.105\) cases per person-year (or 10.5 per 100 person-years).
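The same calculation can be reproduced without the plot. This sketch uses only the five subjects listed above; the vector names are illustrative. Subject E still contributes 4 years to the denominator even though no outcome was observed.

```r
# Follow-up time (years) and outcome for the five subjects in the cohort
follow_up_years <- c(A = 5, B = 2, C = 5, D = 3, E = 4)
developed_disease <- c(A = FALSE, B = TRUE, C = FALSE, D = TRUE, E = FALSE)

# Person-time is the sum of all observed time, including the censored subject E
person_years <- sum(follow_up_years)
new_cases <- sum(developed_disease)

# Incidence rate per person-year, then rescaled to per 100 person-years
incidence_rate <- new_cases / person_years
rate_per_100 <- incidence_rate * 100

cat(sprintf("%d cases / %d person-years = %.3f per person-year (%.1f per 100 person-years)\n",
            new_cases, person_years, incidence_rate, rate_per_100))
```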


4. Prevalence vs. Incidence: The Bathtub Analogy

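The relationship between Prevalence (P) and Incidence (I) depends on the average Duration (D) of the disease. When incidence and duration are stable over time and prevalence is low, the relationship is approximately: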

\[ P \approx I \times D \]

Imagine a Bathtub:

  1. Faucet (Incidence): New cases flowing in.
  2. Water Level (Prevalence): The total amount of disease currently in the tub.
  3. Drain (Recovery/Death): Cases leaving the prevalence pool.

  • High Prevalence Example: HIV (Modern treatment extends life, so D is long. Even if Incidence is low, Prevalence accumulates).
  • Low Prevalence Example: Ebola (High mortality and rapid course. D is short. People leave the “tub” (die or recover) quickly, so Prevalence stays low even if Incidence spikes).
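The bathtub can be sketched as a simple yearly simulation. The inputs are assumed for illustration only (10 new cases per year, an average duration of 5 years), and the model is deliberately crude: a constant inflow and a drain that removes a fraction 1/D of the pool each year. Under those assumptions the pool should level off at inflow × duration.

```r
# Assumed, illustrative inputs (not taken from the text)
new_cases_per_year <- 10    # faucet: constant inflow of new cases
mean_duration_years <- 5    # average time a case stays in the tub

# Start with an empty tub and step forward one year at a time
prevalent_cases <- 0
for (year in 1:200) {
  leaving <- prevalent_cases / mean_duration_years   # drain: recovery or death
  prevalent_cases <- prevalent_cases + new_cases_per_year - leaving
}

# Steady-state level predicted by P = I x D (here in counts of cases)
expected_cases <- new_cases_per_year * mean_duration_years

cat("Simulated prevalent cases:", round(prevalent_cases, 2), "\n")
cat("Predicted by I x D:       ", expected_cases, "\n")
```

Dividing both the inflow and the pool by the population size turns these counts into an incidence rate and a prevalence. Changing `mean_duration_years` to a small value shows the "Ebola" pattern: the tub stays nearly empty even with the same faucet.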

5. Class Exercise

Scenario: You are monitoring a group of 500 children for Malaria over 1 year.

  1. At the start (Jan 1), 50 children already have malaria.
  2. We follow the remaining 450 healthy children.
  3. Over the year, 45 of these healthy children develop malaria.

Question: Calculate the Prevalence at the start and the Cumulative Incidence over the year.

# Inputs
total_pop <- 500
existing_cases <- 50
new_cases <- 45

# 1. Prevalence (at start)
# Denominator is Total Population
prev_start <- existing_cases / total_pop

# 2. Cumulative Incidence
# Denominator is Population AT RISK (Healthy people at start)
pop_at_risk <- total_pop - existing_cases
cum_inc <- new_cases / pop_at_risk

# Display Results
print(paste("Prevalence at Jan 1:", prev_start * 100, "%"))
## [1] "Prevalence at Jan 1: 10 %"
print(paste("Cumulative Incidence (Risk) over 1 year:", cum_inc * 100, "%"))
## [1] "Cumulative Incidence (Risk) over 1 year: 10 %"

End of Module III
