1. Introduction

In epidemiology, “measurement” is the cornerstone of research. Before we can compare the health status of populations or identify causes of disease, we must be able to quantify the occurrence of disease.

Learning Objectives:

  1. Distinguish between Ratios, Proportions, and Rates.
  2. Calculate and interpret Measures of Morbidity (Incidence and Prevalence).
  3. Calculate and interpret Measures of Mortality.
  4. Understand the relationship between Incidence and Prevalence (\(P \approx I \times D\)).

2. Mathematical Basics

To measure disease frequency, we use three distinct mathematical parameters.

2.1 Ratio

A division of two numbers where the numerator is NOT necessarily included in the denominator.

\[ \text{Ratio} = \frac{x}{y} \]

  • Example: The sex ratio (Males / Females).

2.2 Proportion

A division where the numerator IS included in the denominator. It is often expressed as a percentage.

\[ \text{Proportion} = \frac{a}{a+b} \times 100 \]

  • Example: The proportion of students in this class who wear glasses.

2.3 Rate

A ratio where time is an intrinsic part of the denominator. It measures the speed of occurrence of an event. \[ \text{Rate} = \frac{\text{Events}}{\text{Population at risk} \times \text{Time}} \]
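
The three definitions can be contrasted on one small dataset. The sketch below uses made-up counts (a class of 60 males and 40 females, 25 of whom wear glasses, plus 4 events over 200 person-years); every number and variable name is an assumption chosen for illustration, not data from a real study.

```r
# Hypothetical counts, chosen only for illustration
males <- 60
females <- 40
glasses_wearers <- 25    # a subset of the 100 students
events <- 4              # new events observed
person_years <- 200      # total time at risk contributed by the group

# Ratio: the numerator (males) is not part of the denominator (females)
sex_ratio <- males / females

# Proportion: the numerator (glasses wearers) is included in the denominator
prop_glasses <- glasses_wearers / (males + females) * 100

# Rate: time is part of the denominator
event_rate <- events / person_years

print(paste("Sex ratio (M/F):", sex_ratio))
print(paste("Proportion wearing glasses:", prop_glasses, "%"))
print(paste("Rate:", event_rate, "events per person-year"))
```

With these numbers the ratio is 1.5, the proportion is 25%, and the rate is 0.02 events per person-year. Only the rate changes if the same events are spread over a longer observation time.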


3. Measures of Morbidity

Morbidity refers to the state of being diseased or unhealthy within a population. We look at two main perspectives: Incidence (new cases) and Prevalence (existing cases).

3.1 Prevalence

Prevalence measures the burden of disease, counting all existing cases (old and new), at a specific point in time or over a specified period.

\[ \text{Prevalence} = \frac{\text{Number of existing cases}}{\text{Total Population}} \]

Types of Prevalence:

  1. Point Prevalence: Do you have the disease right now?
  2. Period Prevalence: Have you had the disease at any point during this month/year?
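
The difference between the two types can be made concrete with a small sketch. The illness episodes below are hypothetical (10 people, with onset and recovery given as day of the year), and the survey day of 180 is an arbitrary assumption.

```r
# Hypothetical illness episodes for 10 people (day of year); NA = never ill
episodes <- data.frame(
  id       = 1:10,
  onset    = c(NA, 20, NA, 150, 170, NA, 300, NA, NA, 175),
  recovery = c(NA, 40, NA, 200, 190, NA, 320, NA, NA, 178)
)

survey_day <- 180  # assumed date of a one-day survey

# Point prevalence: ill on the survey day itself
ill_on_day <- !is.na(episodes$onset) &
  episodes$onset <= survey_day &
  episodes$recovery >= survey_day
point_prev <- sum(ill_on_day) / nrow(episodes)

# Period prevalence: ill at any time during the year
ill_in_period <- !is.na(episodes$onset)
period_prev <- sum(ill_in_period) / nrow(episodes)

print(paste("Point prevalence on day", survey_day, ":", point_prev))
print(paste("Period prevalence over the year:", period_prev))
```

Here two people are ill on day 180 (point prevalence 0.2), while five were ill at some point in the year (period prevalence 0.5). Period prevalence can never be lower than point prevalence for a day inside the same period.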

R Example: Calculating Prevalence

Let’s simulate a survey of 1,000 people testing for Hypertension.

# Simulating data
set.seed(123)
population_size <- 1000
# 1 = Disease Present, 0 = Disease Absent
survey_data <- data.frame(
  id = 1:population_size,
  status = sample(c(0, 1), population_size, replace = TRUE, prob = c(0.85, 0.15))
)

# Calculate Point Prevalence
existing_cases <- sum(survey_data$status)
prevalence <- (existing_cases / population_size) * 100

print(paste("The Point Prevalence of Hypertension is:", prevalence, "%"))
## [1] "The Point Prevalence of Hypertension is: 14.7 %"

3.2 Incidence

Incidence measures the risk or rate of developing a new disease. It only looks at the “Population at Risk” (people who do not have the disease yet).

A. Cumulative Incidence (Risk)

The proportion of people who develop the disease during a specified period.

\[ CI = \frac{\text{Number of NEW cases}}{\text{Population at risk at start}} \]
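
A minimal sketch of this formula, assuming a hypothetical cohort of 200 people followed for 2 years (all counts are invented for illustration). It also shows what happens when prevalent cases are mistakenly left in the denominator.

```r
# Hypothetical cohort, numbers chosen only for illustration
cohort_size <- 200
diseased_at_baseline <- 20   # prevalent cases cannot become NEW cases
new_cases_2y <- 18           # new cases observed during 2 years of follow-up

# Correct denominator: only people free of disease at the start
at_risk <- cohort_size - diseased_at_baseline
ci_2y <- new_cases_2y / at_risk

# Common mistake: dividing by the whole cohort understates the risk
ci_wrong <- new_cases_2y / cohort_size

print(paste("2-year cumulative incidence:", round(ci_2y, 3)))
print(paste("Using the wrong denominator:", round(ci_wrong, 3)))
```

The correct 2-year risk is 18 / 180 = 0.1, while the whole-cohort denominator gives 0.09. A cumulative incidence should always be reported together with its time period, since a 2-year risk and a 10-year risk are not comparable.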

B. Incidence Density (Incidence Rate)

Used when follow-up times differ for participants. The denominator is Person-Time.

\[ ID = \frac{\text{Number of NEW cases}}{\text{Total Person-Time of observation}} \]

Real Life Visualization: Person-Time

Imagine a study of 5 patients followed for 10 years to see if they develop Heart Disease.

  • Patient A: Healthy for 10 years.
  • Patient B: Develops disease at Year 5.
  • Patient C: Moves away (Lost to follow-up) at Year 2.
  • Patient D: Healthy for 10 years.
  • Patient E: Develops disease at Year 8.

# Load the plotting library
library(ggplot2)

# Create data for the plot
cohort_data <- data.frame(
  Patient = c("A", "B", "C", "D", "E"),
  Start = c(0, 0, 0, 0, 0),
  End = c(10, 5, 2, 10, 8),
  Outcome = c("Healthy", "Disease", "Lost", "Healthy", "Disease")
)

# Calculate Person Years: each patient contributes time until disease,
# loss to follow-up, or the end of the study
total_person_years <- sum(cohort_data$End - cohort_data$Start)
new_cases <- sum(cohort_data$Outcome == "Disease")
inc_rate <- new_cases / total_person_years

# Plotting the timeline (Gantt Chart style)
# linewidth needs ggplot2 >= 3.4.0; use size = 2 on older versions
ggplot(cohort_data, aes(y = Patient, x = End)) +
  geom_segment(aes(x = Start, xend = End, y = Patient, yend = Patient), linewidth = 2, color = "steelblue") +
  geom_point(aes(color = Outcome), size = 4) +
  # Named values tie each colour to its outcome explicitly
  scale_color_manual(values = c(Disease = "red", Healthy = "green", Lost = "grey")) +
  theme_minimal() +
  labs(title = "Follow-up of 5 Patients",
       subtitle = paste0("Total Person-Years: ", total_person_years, 
                         " | New Cases: ", new_cases),
       x = "Years of Follow-up", y = "Patient ID")
Figure 1: Visualizing Person-Time for Incidence Rate


print(paste("Incidence Rate:", round(inc_rate, 3), "cases per person-year"))
## [1] "Incidence Rate: 0.057 cases per person-year"
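
Incidence rates are often easier to read when scaled to a larger block of person-time. The sketch below recomputes the rate from the five follow-up times listed above and expresses it per 1,000 person-years; the vector names are hypothetical and the multiplier of 1,000 is simply a common reporting convention.

```r
# Follow-up time (years) and outcome for the five patients described above
follow_up_years   <- c(A = 10, B = 5, C = 2, D = 10, E = 8)
developed_disease <- c(A = FALSE, B = TRUE, C = FALSE, D = FALSE, E = TRUE)

# Rate per single person-year: 2 cases / 35 person-years
rate_per_py <- sum(developed_disease) / sum(follow_up_years)

# Rescale to a more readable unit
rate_per_1000_py <- rate_per_py * 1000

print(paste("Incidence Rate:", round(rate_per_1000_py, 1),
            "cases per 1,000 person-years"))
```

This gives 57.1 cases per 1,000 person-years, which is the same quantity as 0.057 cases per person-year. Patient C still contributes 2 person-years to the denominator even though the outcome is unknown after loss to follow-up.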

4. The Relationship: Incidence vs. Prevalence

This is often described using the Bathtub Analogy.

  • Water flowing in: Incidence (New Cases).
  • Water in the tub: Prevalence (Burden).
  • Water draining out: Recovery or Death.

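The Formula (a good approximation when the disease is rare and both incidence and duration are stable over time): \[ Prevalence \approx Incidence \times \text{Duration of Disease} \]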

Real Life Examples:

  1. Short Duration, High Incidence (Common Cold):
    • Many people get it (High I), but recover quickly (Low D).
    • Result: Low Prevalence (The tub drains as fast as it fills).
  2. Long Duration, Low Incidence (Diabetes):
    • Few people get it daily (Low I), but they have it for life (High D).
    • Result: High Prevalence (The tub fills up and never drains).
Figure 2: The Relationship between P, I, and D
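
The two examples above can be put into numbers. The sketch below applies \(P \approx I \times D\) to invented inputs: the incidence rates and durations are assumptions chosen to mimic a cold-like and a diabetes-like condition, not published estimates, and `prevalence_approx` is a hypothetical helper name.

```r
# P ~ I x D, with incidence per person-year and duration in years
prevalence_approx <- function(incidence_per_py, duration_years) {
  incidence_per_py * duration_years
}

# Cold-like: many new cases, each lasting about one week (assumed values)
prev_cold <- prevalence_approx(incidence_per_py = 0.5,
                               duration_years = 7 / 365)

# Diabetes-like: few new cases, each lasting about 20 years (assumed values)
prev_chronic <- prevalence_approx(incidence_per_py = 0.005,
                                  duration_years = 20)

print(paste("Cold-like prevalence:", round(prev_cold, 4)))
print(paste("Diabetes-like prevalence:", round(prev_chronic, 4)))
```

Although the cold-like condition has an incidence 100 times higher, its approximate prevalence stays below 1%, while the chronic condition reaches about 10%. The approximation assumes a steady state, so it should be treated as a rough guide rather than an exact identity.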



5. Measures of Mortality

While morbidity measures illness, mortality measures death.

5.1 Crude Death Rate (CDR)

The actual observed mortality in a population over a given period.

\[ \text{CDR} = \frac{\text{Total Deaths}}{\text{Total Mid-year Population}} \times 1000 \]
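
A short sketch of the CDR calculation follows. The population and death counts are hypothetical, and the mid-year population is estimated here as the average of the start-of-year and end-of-year populations, which is one common approximation rather than the only possible method.

```r
# Hypothetical town, numbers chosen only for illustration
pop_start_of_year <- 48000
pop_end_of_year   <- 52000
total_deaths      <- 400

# Assumption: mid-year population approximated by the average of the two counts
mid_year_pop <- (pop_start_of_year + pop_end_of_year) / 2

# Crude Death Rate per 1,000 population
cdr_per_1000 <- total_deaths / mid_year_pop * 1000

print(paste("Crude Death Rate:", cdr_per_1000, "per 1,000"))
```

With these inputs the CDR is 8 per 1,000. The rate is called crude because it ignores the age and sex structure of the population, so two populations with different age profiles should not be compared on CDR alone.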

5.2 Case Fatality Rate (CFR)

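Measures the severity (virulence) of a disease. Despite its name, it is a proportion rather than a true rate, because time is not part of the denominator. It asks: Of those who got sick, how many died?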

\[ \text{CFR} = \frac{\text{Deaths from Disease X}}{\text{Cases of Disease X}} \times 100 \]

Real Life Example: Ebola-like vs. Flu-like Disease (Hypothetical Data Comparison)

Let’s compare the severity of two diseases.

| Disease | Cases | Deaths |
|---|---|---|
| Disease A (Like Ebola) | 500 | 250 |
| Disease B (Like Flu) | 50,000 | 500 |

# Load libraries: dplyr for %>% and mutate(), knitr for kable()
library(dplyr)
library(knitr)

# Create Data
disease_data <- data.frame(
  Disease = c("Ebola-like", "Flu-like"),
  Cases = c(500, 50000),
  Deaths = c(250, 500)
)

# Calculate CFR as a percentage of cases that died
disease_data <- disease_data %>%
  mutate(CFR_Percentage = (Deaths / Cases) * 100)

kable(disease_data, caption = "Comparison of Case Fatality Rates")

Comparison of Case Fatality Rates

| Disease | Cases | Deaths | CFR_Percentage |
|---|---|---|---|
| Ebola-like | 500 | 250 | 50 |
| Flu-like | 50000 | 500 | 1 |

Interpretation: Even though the “Flu-like” disease caused more total deaths (500 vs 250), the “Ebola-like” disease is much more virulent (50% CFR vs 1% CFR).


6. Summary Exercise

Scenario: In a village of 1,000 people, on January 1st, 50 people already have Malaria. Over the course of the year:

  1. 100 NEW people develop Malaria.
  2. None of the Malaria patients die, but 10 people die from car accidents.

Calculate:

  1. Prevalence at start.
  2. Cumulative Incidence (Risk) over the year.
  3. Crude Death Rate.

pop_total <- 1000
existing_cases <- 50
new_cases <- 100
deaths <- 10

# 1. Prevalence at Jan 1
prev <- existing_cases / pop_total
print(paste("Prevalence:", prev))
## [1] "Prevalence: 0.05"
# 2. Cumulative Incidence
# Denominator must be population AT RISK (Total - Existing Cases)
pop_at_risk <- pop_total - existing_cases
ci <- new_cases / pop_at_risk
print(paste("Cumulative Incidence:", round(ci, 3)))
## [1] "Cumulative Incidence: 0.105"
# 3. Crude Death Rate (per 1,000)
# Assuming mid-year pop is approx same as start for simplicity
cdr <- (deaths / pop_total) * 1000
print(paste("Crude Death Rate:", cdr, "per 1,000"))
## [1] "Crude Death Rate: 10 per 1,000"

7. Conclusion

  • Prevalence helps public health officials determine the need for facilities (hospital beds, medicine stocks).
  • Incidence helps researchers identify risk factors and effectiveness of prevention programs.
  • Mortality measures tell us how deadly a disease is (CFR) and how many deaths a population experiences overall (CDR).

Always check your denominator! Are you dividing by the whole population, or just those at risk?

```
