Learning Objectives

By the end of this module, students should be able to:

  1. Distinguish between counts, ratios, proportions, and rates.
  2. Define, calculate, and interpret measures of Morbidity (Prevalence and Incidence).
  3. Define, calculate, and interpret measures of Mortality (Crude, Specific, and Case Fatality).
  4. Understand the mathematical relationship between Prevalence and Incidence.

1. Introduction: The Quantitative Nature of Epidemiology

Epidemiology is defined as the study of the distribution and determinants of health-related states or events in specified populations. To study “distribution,” we must quantify disease: we cannot manage what we cannot measure.

The Mathematical Tools

Before we measure disease, we must define our tools:

A. Ratios

One quantity divided by another. The numerator is not necessarily included in the denominator.

\[ \text{Ratio} = \frac{x}{y} \]

  • Example: Sex ratio (males / females).

B. Proportions

A type of ratio where the numerator is included in the denominator. Usually expressed as a percentage.

\[ \text{Proportion} = \frac{a}{a+b} \times 100 \]

C. Rates

A ratio whose denominator explicitly includes a measure of time (for example, person-years of observation). It measures the speed of disease occurrence: how many new events happen per unit of time at risk.

# R Example: Basic Calculations
females <- 60
males <- 40
total <- females + males

# Ratio
sex_ratio <- males / females
print(paste("Sex Ratio (M:F):", round(sex_ratio, 2)))
## [1] "Sex Ratio (M:F): 0.67"

# Proportion
prop_female <- females / total
print(paste0("Proportion of Females: ", prop_female * 100, "%"))
## [1] "Proportion of Females: 60%"
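The rate definition is easier to see with numbers. A minimal sketch, assuming hypothetical figures of 15 new cases observed over 2,500 person-years of follow-up:

```r
# Hypothetical figures, for illustration only
new_cases    <- 15    # new cases observed during follow-up
person_years <- 2500  # total time at risk contributed by everyone observed

# A rate has time in its denominator: cases per person-year
rate_per_py <- new_cases / person_years

# Rates are usually rescaled to a convenient base, e.g. per 1,000 person-years
rate_per_1000 <- rate_per_py * 1000

print(paste("Incidence rate:", round(rate_per_1000, 2), "cases per 1,000 person-years"))
```

Unlike the proportion above, the result carries a unit, so 0.006 cases per person-year and 6 cases per 1,000 person-years describe the same rate.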

2. Measures of Morbidity

Morbidity refers to the state of being diseased or unhealthy within a population. We look at two main perspectives: Status (Prevalence) and Change (Incidence).

A. Prevalence (P)

Prevalence measures the proportion of individuals in a population who have the disease at a specific point in time (or period). It is a “snapshot.”

\[ \text{Prevalence} = \frac{\text{Number of existing cases}}{\text{Total population}} \times 100 \]

Real-Life Example:

Diabetes in the USA: suppose 34 million people have diabetes in a population of 330 million. \[ P = \frac{34{,}000{,}000}{330{,}000{,}000} \approx 10.3\% \]

Types of Prevalence:

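  1. Point Prevalence: the proportion with the disease at a single point in time (“Do you have the disease right now?”).
  2. Period Prevalence: the proportion who had the disease at any time during a specified period (“Did you have the disease at any time during the last year?”).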
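
# Visualizing Prevalence
library(ggplot2)

set.seed(123)
population_n <- 100
# Simulate 100 people, each with a 15% chance of being diseased
status <- sample(c("Healthy", "Diseased"), population_n, replace = TRUE, prob = c(0.85, 0.15))
df <- data.frame(status)

# Bar heights are counts; with n = 100, the "Diseased" count equals the prevalence in %
ggplot(df, aes(x = status, fill = status)) +
  geom_bar() +
  labs(title = "Point Prevalence Snapshot", y = "Count") +
  theme_minimal() +
  # Levels sort alphabetically: "Diseased" gets red, "Healthy" gets light blue
  scale_fill_manual(values = c("red", "lightblue"))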
Figure 1: Visualizing Prevalence in a Sample Population
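The difference between point and period prevalence comes down to which time window counts. A minimal sketch, assuming a hypothetical population of 10 people with invented illness dates (one episode per person) and a fixed denominator; in practice, period prevalence often uses the average or mid-period population instead:

```r
# Hypothetical illness episodes in a population of 10 people, one episode per
# person (dates invented for illustration; NA recovery = still ill)
pop_size <- 10
episodes <- data.frame(
  id       = 1:4,
  onset    = as.Date(c("2022-11-01", "2023-03-10", "2023-06-15", "2023-09-01")),
  recovery = as.Date(c("2023-02-01", "2023-04-20", NA, NA))
)

# Point prevalence: ill on one specific date
point_date  <- as.Date("2023-07-01")
ill_on_date <- episodes$onset <= point_date &
  (is.na(episodes$recovery) | episodes$recovery > point_date)
point_prev  <- sum(ill_on_date) / pop_size

# Period prevalence: ill at any time during calendar year 2023
period_start  <- as.Date("2023-01-01")
period_end    <- as.Date("2023-12-31")
ill_in_period <- episodes$onset <= period_end &
  (is.na(episodes$recovery) | episodes$recovery >= period_start)
period_prev   <- sum(ill_in_period) / pop_size

cat(sprintf("Point prevalence on %s: %.0f%%\n", format(point_date), point_prev * 100))
cat(sprintf("Period prevalence in 2023: %.0f%%\n", period_prev * 100))
```

Only one person is ill on 1 July 2023 (10%), but all four were ill at some time during 2023 (40%). Within the same window, period prevalence can never be smaller than point prevalence.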



B. Incidence (I)

Incidence measures the occurrence of new cases of disease in a population at risk over a specified period.

1. Cumulative Incidence (Risk)

The probability that an individual will develop the disease during a specific time period.

\[ \text{CI} = \frac{\text{Number of NEW cases}}{\text{Population at risk at start of period}} \]

  • Key Condition: The denominator must exclude people who already have the disease or are immune.
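To see why the key condition matters, here is a minimal sketch with a hypothetical closed cohort of 2,000 people followed for one year, in which 150 already have the disease and 50 are assumed immune at the start:

```r
# Hypothetical closed cohort followed for one year (numbers are illustrative)
total_people <- 2000
already_ill  <- 150   # prevalent cases at the start: they cannot become NEW cases
immune       <- 50    # assumed fully protected (e.g., prior infection)
new_cases    <- 90    # new diagnoses during the year

# Population at risk = everyone who could still develop the disease
at_risk <- total_people - already_ill - immune

cum_incidence <- new_cases / at_risk        # correct denominator
naive_ci      <- new_cases / total_people   # wrong: includes people not at risk

cat(sprintf("Cumulative incidence: %.1f%% (naive: %.1f%%)\n",
            cum_incidence * 100, naive_ci * 100))
```

Keeping the 200 people who could not become new cases in the denominator understates the risk (4.5% instead of 5.0%). This simple division also assumes everyone at risk is followed for the whole year.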

2. Incidence Density (Incidence Rate)

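Used when individuals are followed for different lengths of time (dynamic populations, or cohorts with dropouts). The denominator is person-time: the sum of each individual's time at risk of the disease.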

\[ \text{IR} = \frac{\text{Number of NEW cases}}{\text{Total Person-Time of observation}} \]

Real-Life Example: The Framingham Heart Study

Researchers followed a cohort for decades. Because participants contributed different lengths of follow-up (they entered at different times, died, or left the study), an incidence rate based on person-years is the appropriate way to express how often cardiovascular disease occurred.


R Example: Calculating Person-Time

Imagine a study of 5 patients followed for 5 years or until they get the disease.

# Create dummy data
patient_data <- data.frame(
  Patient_ID = 1:5,
  Years_Followed = c(5, 3, 5, 2, 5), # patients 2 and 4 stopped accruing time when diagnosed
  Developed_Disease = c(0, 1, 0, 1, 0) # 1 = Yes, 0 = No
)

total_person_years <- sum(patient_data$Years_Followed)
new_cases <- sum(patient_data$Developed_Disease)

incidence_rate <- new_cases / total_person_years

knitr::kable(patient_data, caption = "Patient Follow-up Data")
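Patient Follow-up Data

| Patient_ID | Years_Followed | Developed_Disease |
|-----------:|---------------:|------------------:|
| 1 | 5 | 0 |
| 2 | 3 | 1 |
| 3 | 5 | 0 |
| 4 | 2 | 1 |
| 5 | 5 | 0 |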
print(paste("Total Person-Years:", total_person_years))
## [1] "Total Person-Years: 20"
print(paste("New Cases:", new_cases))
## [1] "New Cases: 2"
print(paste("Incidence Rate:", round(incidence_rate, 3), "cases per person-year"))
## [1] "Incidence Rate: 0.1 cases per person-year"
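In real cohorts, follow-up is usually recorded as entry and exit dates rather than whole years. A minimal sketch, assuming hypothetical dates and a year length of 365.25 days, computes each person's time at risk and the resulting incidence rate:

```r
# Hypothetical dynamic cohort: people enter and leave at different dates
cohort <- data.frame(
  id    = 1:4,
  entry = as.Date(c("2015-01-01", "2016-07-01", "2015-01-01", "2018-01-01")),
  exit  = as.Date(c("2020-01-01", "2019-07-01", "2017-01-01", "2020-01-01")),
  event = c(0, 1, 0, 1)  # 1 = diagnosed at exit date; 0 = left without disease
)

# Each person's time at risk, converted from days to years
cohort$years <- as.numeric(difftime(cohort$exit, cohort$entry, units = "days")) / 365.25

person_years  <- sum(cohort$years)
new_cases     <- sum(cohort$event)
rate_per_1000 <- new_cases / person_years * 1000

cat(sprintf("Person-years: %.2f; incidence rate: %.1f per 1,000 person-years\n",
            person_years, rate_per_1000))
```

Dividing by 365.25 averages over leap years; any consistent time unit works as long as the rate is reported in that unit.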

3. The Relationship: Prevalence vs. Incidence

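In the “bathtub” analogy, the water in the tub is prevalence (existing cases), the tap is incidence (new cases flowing in), and the drain is recovery or death (cases flowing out). The water level rises when more flows in or when it drains more slowly, that is, when the disease lasts longer.

The Formula: \[ P \approx I \times D \] (Prevalence \(\approx\) Incidence rate \(\times\) average Duration of disease)

The approximation assumes a steady state (incidence and duration roughly constant over time) and a relatively rare disease. The exact steady-state relation is \( \frac{P}{1-P} = I \times D \), which reduces to \( P \approx I \times D \) when \(P\) is small. \(I\) and \(D\) must share a time unit (e.g., cases per person-year and years).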

# Simulation of Relationship
library(ggplot2)

duration <- seq(1, 10, by = 1)
incidence <- 5  # constant incidence: 5 new cases per year
prevalence <- incidence * duration  # steady-state count of existing cases

data_rel <- data.frame(Duration = duration, Prevalence = prevalence)

# linewidth replaces the deprecated size aesthetic for lines (ggplot2 >= 3.4.0)
ggplot(data_rel, aes(x = Duration, y = Prevalence)) +
  geom_line(color = "blue", linewidth = 1.5) +
  geom_point(size = 3) +
  labs(title = "Effect of Disease Duration on Prevalence",
       subtitle = "Assuming a constant incidence of 5 new cases per year",
       x = "Average Duration of Disease (Years)",
       y = "Prevalence (Cases)") +
  theme_bw()
Figure 2: The Bathtub Analogy Simulation
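The curve above treats prevalence as a fixed multiple of duration. A minimal sketch of the bathtub itself, assuming hypothetical values (a constant population of 100,000, an incidence rate of 0.01 per person-year among the disease-free, and an average duration of 5 years), fills an empty tub year by year until inflow balances outflow:

```r
# Hypothetical parameters (not taken from any real disease)
N         <- 100000  # population size, assumed constant
incidence <- 0.01    # new cases per person-year among people without the disease
duration  <- 5       # average duration of disease, in years

years <- 200
cases <- numeric(years + 1)
cases[1] <- 0        # start with an empty "bathtub"
for (t in 1:years) {
  inflow  <- incidence * (N - cases[t])  # tap: new cases from the at-risk group
  outflow <- cases[t] / duration         # drain: recovery or death
  cases[t + 1] <- cases[t] + inflow - outflow
}

final_prev  <- cases[years + 1] / N
exact_prev  <- incidence * duration / (1 + incidence * duration)  # from P/(1-P) = I*D
approx_prev <- incidence * duration                               # shortcut P ~ I*D

cat(sprintf("Simulated: %.4f  Exact: %.4f  Approx (I x D): %.4f\n",
            final_prev, exact_prev, approx_prev))
```

The simulated prevalence levels off at about 4.76%, matching \(I D/(1 + I D)\); the shortcut \(I \times D\) gives 5%, which is close only because the disease is rare here.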



4. Measures of Mortality

Mortality rates are essential for understanding the severity of disease and the health status of a population.

A. Crude Death Rate (CDR)

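The overall death rate from all causes in a population, usually over one year. It reflects the actual observed mortality, but because it is not adjusted for age, it can mislead when comparing populations with different age structures.

\[ \text{CDR} = \frac{\text{Total deaths during the year}}{\text{Mid-year population}} \times 1{,}000 \]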

B. Cause-Specific Death Rate

Mortality due to a specific cause over a given period, usually one year.

\[ \text{Cause-specific death rate} = \frac{\text{Deaths from cause X during the year}}{\text{Mid-year population}} \times 100{,}000 \]
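A minimal sketch, assuming hypothetical one-year figures for a single town and using the mid-year population as the denominator for both rates:

```r
# Hypothetical one-year figures for a single town (illustrative only)
mid_year_pop  <- 250000
total_deaths  <- 2100
cancer_deaths <- 480   # deaths with cancer as the underlying cause

cdr         <- total_deaths / mid_year_pop * 1000      # per 1,000 population
cancer_rate <- cancer_deaths / mid_year_pop * 100000   # per 100,000 population

cat(sprintf("Crude death rate: %.1f per 1,000\n", cdr))
cat(sprintf("Cancer death rate: %.0f per 100,000\n", cancer_rate))
```

The multipliers (1,000 versus 100,000) are conventions that keep the numbers readable: deaths from a single cause are much rarer than deaths from all causes.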

C. Case Fatality Rate (CFR)

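Measures the virulence, or killing power, of a disease. It answers: “If I get this disease, how likely am I to die from it?” Despite its name, the CFR is a proportion rather than a rate: deaths are a subset of cases, and no time unit appears in the denominator.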

\[ \text{CFR} = \frac{\text{Deaths from Disease X}}{\text{Total Cases of Disease X}} \times 100 \]

Real-Life Comparison: COVID-19 vs. Ebola

  • COVID-19 (early 2020): CFR was roughly 2% to 3%.
  • Ebola (Zaire strain): CFR can be up to 90%.
  • Note: A disease with a low CFR (like flu) can still kill more people in total than a high-CFR disease (like Ebola) if its incidence is much higher.

# Example Calculation: CFR = deaths / cases x 100
disease_stats <- data.frame(
  Disease = c("Disease A (Like Rabies)", "Disease B (Like Flu)"),
  Cases = c(100L, 100000L),  # integers, so kable prints them without scientific notation
  Deaths = c(99L, 100L)
)

disease_stats$CFR_Percentage <- (disease_stats$Deaths / disease_stats$Cases) * 100

knitr::kable(disease_stats, caption = "Comparing Virulence (CFR)")
Comparing Virulence (CFR)

| Disease | Cases | Deaths | CFR_Percentage |
|:------------------------|-------:|-------:|---------------:|
| Disease A (Like Rabies) | 100 | 99 | 99.0 |
| Disease B (Like Flu) | 100000 | 100 | 0.1 |

5. Class Exercise

Scenario: In a town of 10,000 people on Jan 1st:

  1. 500 people already have hypertension.
  2. Over the year, 200 more people are newly diagnosed with hypertension.
  3. No one died or was cured.

Calculate:

  1. Prevalence on Jan 1st.
  2. Cumulative incidence over the year.
  3. Prevalence on Dec 31st.

(Instructor Note: Discuss the denominator for cumulative incidence carefully. Did you subtract the prevalent cases?)
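For instructors, a minimal R answer check that uses only the scenario's numbers and assumes, as the scenario implies, that the town's population stays at 10,000 throughout the year:

```r
town_pop      <- 10000
existing_jan1 <- 500   # prevalent cases on Jan 1st
new_cases     <- 200   # diagnosed during the year

prev_jan1  <- existing_jan1 / town_pop
at_risk    <- town_pop - existing_jan1                # exclude prevalent cases
cum_inc    <- new_cases / at_risk
prev_dec31 <- (existing_jan1 + new_cases) / town_pop  # no deaths or cures: all cases remain

cat(sprintf("1. Prevalence on Jan 1st:  %.1f%%\n", prev_jan1 * 100))
cat(sprintf("2. Cumulative incidence:   %.2f%%\n", cum_inc * 100))
cat(sprintf("3. Prevalence on Dec 31st: %.1f%%\n", prev_dec31 * 100))
```

Using 10,000 instead of 9,500 in step 2 gives 2.0% rather than about 2.11%, which is the common mistake the note warns about.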


Summary

| Measure | Question Answered | Numerator | Denominator |
|---|---|---|---|
| Prevalence | How much disease is there now? | Existing cases | Total population |
| Cumulative Incidence (Risk) | What is the risk of getting it? | New cases | Population at risk |
| Incidence Rate | How fast are new cases occurring? | New cases | Person-time at risk |
| CFR | How deadly is it? | Deaths from X | Cases of X |
