Learning Objectives

By the end of this module, students should be able to:

  1. Distinguish between counts, ratios, proportions, and rates.
  2. Calculate and interpret measures of prevalence (point and period).
  3. Calculate and interpret measures of incidence (cumulative incidence and incidence density).
  4. Understand the mathematical relationship between prevalence, incidence, and duration of disease.
  5. Apply R to calculate basic epidemiological measures from raw data.

1. Introduction: Why Measure Disease?

Epidemiology is often defined as the study of the distribution and determinants of disease. To study distribution, we must quantify disease occurrence. Measurement supports several core activities:

  • Surveillance: Monitor the health status of a population (e.g., tracking daily COVID-19 cases).
  • Resource Allocation: Determine where medical infrastructure is needed most.
  • Etiology: Investigate the causes of disease by comparing frequencies between exposed and unexposed groups.

Historical Context: John Snow’s mapping of cholera cases in 1854 is a classic example of using counts and geographic distribution to identify a source of infection (the Broad Street pump).


2. Key Concepts in Quantification

Before calculating specific morbidity measures, we must define the mathematical building blocks.

A. Counts

The absolute number of events (e.g., 50 cases of influenza). While useful for resource planning (e.g., number of hospital beds needed), counts do not allow for comparison between populations of different sizes.

B. Ratios

A value obtained by dividing one quantity by another. The numerator and denominator may be completely distinct.

\[ \text{Ratio} = \frac{a}{b} \]

Example: The ratio of male cases to female cases.

C. Proportions

A type of ratio where the numerator is included in the denominator. It is dimensionless and ranges from 0 to 1 (or 0% to 100%).

\[ \text{Proportion} = \frac{a}{a + b} \]

D. Rates

A ratio whose denominator explicitly includes a measure of time (e.g., person-years). Rates measure the speed at which disease occurs.
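
The four building blocks can be compared side by side in a short R sketch. All numbers below are hypothetical and chosen only for illustration (the follow-up time in particular is an assumption, not data from this module).

```r
# Hypothetical counts for illustration only
male_cases   <- 30
female_cases <- 20
person_years <- 2500  # assumed total follow-up time in the population

# Count: absolute number of events
total_cases <- male_cases + female_cases

# Ratio: numerator is NOT part of the denominator
sex_ratio <- male_cases / female_cases

# Proportion: numerator IS part of the denominator
prop_male <- male_cases / total_cases

# Rate: denominator includes time (expressed per 1,000 person-years)
case_rate <- (total_cases / person_years) * 1000

print(paste("Count of cases:", total_cases))
print(paste("Male-to-female ratio:", round(sex_ratio, 2)))
print(paste("Proportion of cases that are male:", round(prop_male, 2)))
print(paste("Rate:", round(case_rate, 2), "cases per 1,000 person-years"))
```

With these inputs the count is 50, the ratio is 1.5, the proportion is 0.6, and the rate is 20 cases per 1,000 person-years. Only the rate changes if the same 50 cases are observed over a longer or shorter follow-up time.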


3. Measures of Morbidity

Morbidity refers to the state of being diseased or unhealthy within a population.

A. Prevalence

Prevalence measures the burden of disease at a specific time. It includes both new and existing cases. It is a proportion, not a rate.

Point Prevalence

The proportion of the population that is diseased at a single point in time.

\[ \text{Point Prevalence} = \frac{\text{Number of existing cases at time } t}{\text{Total population at time } t} \]

Period Prevalence

The proportion of the population that is diseased at any time during a specified interval (e.g., one calendar year).

\[ \text{Period Prevalence} = \frac{\text{Existing cases at start of period} + \text{New cases during period}}{\text{Average population during period}} \]

Figure 1: Visual comparison of Point vs. Period Prevalence. Point prevalence is a ‘snapshot’, while period prevalence includes anyone who had the disease at any time during the interval.

R Example: Calculating Prevalence

Let’s simulate a dataset representing a survey of 20 individuals tested for Diabetes.

# Create sample data
# 1 = Diabetic, 0 = Non-Diabetic
set.seed(123)
study_pop <- data.frame(
  id = 1:20,
  diabetes_status = sample(c(0, 1), 20, replace = TRUE, prob = c(0.7, 0.3))
)

# Calculate Point Prevalence
num_cases <- sum(study_pop$diabetes_status)
total_pop <- nrow(study_pop)

prevalence <- (num_cases / total_pop) * 100

print(paste("Number of Cases:", num_cases))
## [1] "Number of Cases: 7"
print(paste("Total Population:", total_pop))
## [1] "Total Population: 20"
print(paste("Point Prevalence of Diabetes:", prevalence, "%"))
## [1] "Point Prevalence of Diabetes: 35 %"
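
Period prevalence can be sketched in the same way. The figures below are hypothetical, and the sketch assumes the population stays at roughly 1,000 people throughout the year, so the same value serves as the point-in-time and the average population.

```r
# Hypothetical one-year survey figures (assumptions for illustration)
existing_cases_start <- 40    # people already diseased on day 1
new_cases_period     <- 25    # people who developed the disease during the year
avg_population       <- 1000  # assumed stable population size

# Point prevalence at the start of the year (snapshot)
point_prev_start <- (existing_cases_start / avg_population) * 100

# Period prevalence over the year (anyone diseased at any time in the interval)
period_prev <- ((existing_cases_start + new_cases_period) / avg_population) * 100

print(paste("Point Prevalence at start:", round(point_prev_start, 1), "%"))
print(paste("Period Prevalence over the year:", round(period_prev, 1), "%"))
```

Under these assumptions the point prevalence is 4% and the period prevalence is 6.5%. Period prevalence is always at least as large as point prevalence at the start of the interval, because it adds the new cases that arise during the period.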

B. Incidence

Incidence measures the occurrence of new cases of disease in a population at risk over a period of time. It measures the transition from health to disease (risk).

1. Cumulative Incidence (Risk)

The proportion of an at-risk population that develops the disease over a specified period.

\[ CI = \frac{\text{Number of new cases during period}}{\text{Total population at risk at start of period}} \]

Note: The denominator must exclude individuals who already have the disease or are immune.
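
A minimal R sketch of cumulative incidence, showing how the at-risk denominator is built. The cohort figures are hypothetical and assume complete follow-up of everyone at risk for one year.

```r
# Hypothetical cohort at baseline (assumptions for illustration)
total_cohort      <- 1000
prevalent_at_base <- 50   # already diseased at baseline: not at risk
immune_at_base    <- 20   # immune at baseline: not at risk
new_cases         <- 93   # new cases observed during one year of follow-up

# Population at risk excludes prevalent and immune individuals
pop_at_risk <- total_cohort - prevalent_at_base - immune_at_base

# Cumulative incidence (risk) over the one-year period
cum_incidence <- new_cases / pop_at_risk

print(paste("Population at risk:", pop_at_risk))
print(paste("One-year Cumulative Incidence:", round(cum_incidence * 100, 1), "%"))
```

Here 930 people are at risk and the one-year risk is 10%. Dividing by the full cohort of 1,000 instead would understate the risk (9.3%), which is why the exclusion matters.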

2. Incidence Rate (Incidence Density)

Used when individuals are followed for different lengths of time. The denominator is Person-Time.

\[ IR = \frac{\text{Number of new cases}}{\text{Total Person-Time of observation}} \]

Person-Time is the sum of time each individual remains at risk and under observation.

R Example: Calculating Incidence Rate

Consider a cohort study tracking COVID-19 infection. Participants contribute different “person-days” until they get infected or the study ends.

# Create sample cohort data
# Status: 1 = Infected, 0 = Censored/Healthy
# Time: Days followed
cohort_data <- data.frame(
  id = 1:5,
  status = c(1, 0, 1, 0, 0),
  days_followed = c(15, 30, 5, 30, 20) 
)

# Calculate Total Person-Time (in days)
total_person_time <- sum(cohort_data$days_followed)

# Calculate New Cases
new_cases <- sum(cohort_data$status)

# Calculate Incidence Rate per 1,000 person-days
inc_rate <- (new_cases / total_person_time) * 1000

print(paste("Total Person-Days:", total_person_time))
## [1] "Total Person-Days: 100"
print(paste("New Cases:", new_cases))
## [1] "New Cases: 2"
print(paste("Incidence Rate:", round(inc_rate, 2), "cases per 1,000 person-days"))
## [1] "Incidence Rate: 20 cases per 1,000 person-days"

C. The Relationship between Prevalence and Incidence

In a steady-state population (where in-migration equals out-migration and incidence and duration are constant over time), and when prevalence is low (roughly below 10%), the relationship is approximately:

\[ P \approx I \times D \]

Where:

  • \(P\) = Prevalence
  • \(I\) = Incidence Rate
  • \(D\) = Average Duration of disease

Interpretation:

  • High Incidence, Short Duration (e.g., Common Cold): Prevalence remains relatively low.
  • Low Incidence, Long Duration (e.g., Diabetes): Prevalence can be very high because cases accumulate.
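
The two scenarios above can be made concrete with a short R sketch. The incidence rates and durations are hypothetical round numbers, and the helper `prevalence_from_incidence()` is an illustrative name, not a function from any package. Alongside the approximation \(P \approx I \times D\), the sketch also reports the steady-state form \(P / (1 - P) = I \times D\), which does not require prevalence to be low. Note that \(I\) and \(D\) must use the same time unit (years here).

```r
# Illustrative helper (hypothetical name): steady-state prevalence from
# an incidence rate (per person-year) and an average duration (in years)
prevalence_from_incidence <- function(incidence_rate, duration_years) {
  id <- incidence_rate * duration_years
  c(approx = id,            # P ~ I x D (valid when prevalence is low)
    exact  = id / (1 + id)) # solves P / (1 - P) = I x D for P
}

# Hypothetical inputs
# Common cold: 2 episodes per person-year, each lasting about 7 days
cold <- prevalence_from_incidence(incidence_rate = 2, duration_years = 7 / 365)

# Diabetes: 5 new cases per 1,000 person-years, lasting about 20 years
diabetes <- prevalence_from_incidence(incidence_rate = 0.005, duration_years = 20)

print(round(cold, 3))
print(round(diabetes, 3))
```

Under these assumptions the cold has a prevalence of about 3.8% despite an incidence rate 400 times higher, while diabetes reaches about 10% by the approximation and about 9.1% by the steady-state form. The gap between the two versions widens as prevalence rises.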

Figure 2: The “Bathtub” Analogy. Water flowing in represents Incidence. The water level in the tub represents Prevalence. Water draining out represents recovery or death; the slower the drain (i.e., the longer the duration of disease), the higher the water level.

4. Measures of Mortality

While morbidity measures illness, mortality measures death.

Case Fatality Rate (CFR)

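The proportion of people with a specific disease who die from it. Despite its name, CFR is a proportion, not a true rate, because its denominator contains no measure of time. It measures the severity or virulence of the disease.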

\[ \text{CFR} (\%) = \frac{\text{Number of deaths from disease } X}{\text{Number of cases of disease } X} \times 100 \]

Example: In the early stages of the COVID-19 pandemic, CFR was a critical metric to estimate the virus’s deadliness compared to seasonal flu.
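
A minimal R sketch of the CFR calculation, using hypothetical outbreak figures rather than real surveillance data:

```r
# Hypothetical outbreak figures (assumptions for illustration)
confirmed_cases <- 200
deaths_among_cases <- 8

# Case fatality as a percentage of confirmed cases
cfr <- (deaths_among_cases / confirmed_cases) * 100

print(paste("Case Fatality Rate:", round(cfr, 1), "%"))
```

With these numbers the CFR is 4%. The result depends heavily on how cases are counted: if mild cases go undetected, the denominator shrinks and the CFR is overstated.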

Cause-Specific Mortality Rate

The rate of death from a specific cause in the total population over a specified period (usually one year), conventionally expressed per 100,000 population.

\[ \text{CSMR} = \frac{\text{Deaths from cause } X}{\text{Total mid-year population}} \times 100{,}000 \]
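
The same calculation as a short R sketch. The death count and population size are hypothetical and assumed to refer to a single calendar year.

```r
# Hypothetical annual figures (assumptions for illustration)
deaths_from_cause   <- 150
mid_year_population <- 500000

# Cause-specific mortality rate per 100,000 population
csmr <- (deaths_from_cause / mid_year_population) * 100000

print(paste("Cause-Specific Mortality Rate:", round(csmr, 1), "per 100,000 population"))
```

This gives 30 deaths per 100,000 population. Unlike the CFR, the denominator is the whole population rather than only the people who have the disease, so a disease can have a high CFR but a low CSMR if it is rare.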


Summary Table

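| Measure              | Numerator      | Denominator        | Interpretation              |
|----------------------|----------------|--------------------|-----------------------------|
| Prevalence           | Existing cases | Total population   | Burden of disease           |
| Cumulative Incidence | New cases      | Population at risk | Individual risk/probability |
| Incidence Rate       | New cases      | Person-time        | Speed of outbreak           |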