By the end of this module, students should be able to:
Epidemiology is often defined as the study of the distribution and determinants of disease. To study distribution, we must quantify disease occurrence. Measurement allows epidemiologists to:
Historical Context: John Snow’s mapping of cholera cases in 1854 is a classic example of using counts and geographic distribution to identify a source of infection (the Broad Street pump).
Before calculating specific morbidity measures, we must define the mathematical building blocks.
A count is the absolute number of events (e.g., 50 cases of influenza). While useful for resource planning (e.g., number of hospital beds needed), counts do not allow for comparison between populations of different sizes.
A ratio is a value obtained by dividing one quantity by another. The numerator and denominator may be completely distinct. \[ \text{Ratio} = \frac{a}{b} \] Example: The ratio of male cases to female cases.
A proportion is a type of ratio in which the numerator is included in the denominator. It is dimensionless and ranges from 0 to 1 (or 0% to 100%). \[ \text{Proportion} = \frac{a}{a + b} \]
A rate is a ratio whose denominator explicitly includes a measure of time. Rates measure the speed at which disease occurs.
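To make the four building blocks concrete, here is a minimal sketch in R. All numbers (the case counts and the 1,000 person-years of follow-up) are made-up assumptions for illustration, not data from this module.

```r
# Hypothetical figures (illustrative assumptions only)
male_cases   <- 30
female_cases <- 20
person_years <- 1000   # assumed total follow-up time

# Count: absolute number of events
total_cases <- male_cases + female_cases

# Ratio: numerator and denominator are distinct groups
sex_ratio <- male_cases / female_cases

# Proportion: the numerator is part of the denominator
prop_male <- male_cases / (male_cases + female_cases)

# Rate: the denominator includes time (expressed per 100 person-years)
case_rate <- round(total_cases / person_years * 100, 2)

print(paste("Count:", total_cases))
## [1] "Count: 50"
print(paste("Ratio (male:female):", sex_ratio))
## [1] "Ratio (male:female): 1.5"
print(paste("Proportion male:", prop_male))
## [1] "Proportion male: 0.6"
print(paste("Rate:", case_rate, "cases per 100 person-years"))
## [1] "Rate: 5 cases per 100 person-years"
```

Note that only the proportion is bounded between 0 and 1; the ratio and the rate can exceed 1.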
Morbidity refers to the state of being diseased or unhealthy within a population.
Prevalence measures the burden of disease at a specific time. It includes both new and existing cases. It is a proportion, not a rate.
Point prevalence is the proportion of the population that is diseased at a single point in time.
\[ \text{Point Prevalence} = \frac{\text{Number of existing cases at time } t}{\text{Total population at time } t} \]
Period prevalence is the proportion of the population that is diseased at any time during a specified period.
\[ \text{Period Prevalence} = \frac{\text{Existing cases at start of period} + \text{New cases during period}}{\text{Average population during period}} \]
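As a rough illustration of how point and period prevalence differ, the sketch below uses hypothetical one-year figures. The variable names and values are assumptions, and the average population is approximated here as the mean of the start and end counts.

```r
# Hypothetical one-year figures (illustrative assumptions only)
existing_cases_start <- 40     # cases already present at the start of the year
new_cases_in_year    <- 10     # cases that developed during the year
pop_start            <- 980    # population at the start of the year
pop_end              <- 1020   # population at the end of the year

# Approximate the average population as the mean of start and end counts
avg_pop <- (pop_start + pop_end) / 2

# Point prevalence at the start of the year (%)
point_prev_start <- round(existing_cases_start / pop_start * 100, 2)

# Period prevalence over the year (%): existing plus new cases
period_prev <- round((existing_cases_start + new_cases_in_year) / avg_pop * 100, 2)

print(paste("Point prevalence at start:", point_prev_start, "%"))
## [1] "Point prevalence at start: 4.08 %"
print(paste("Period prevalence over the year:", period_prev, "%"))
## [1] "Period prevalence over the year: 5 %"
```

Period prevalence is the larger of the two here because it also counts the cases that arose during the year.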
Let’s simulate a dataset representing a survey of 20 individuals tested for diabetes.
# Create sample data
# 1 = Diabetic, 0 = Non-Diabetic
set.seed(123)
study_pop <- data.frame(
id = 1:20,
diabetes_status = sample(c(0, 1), 20, replace = TRUE, prob = c(0.7, 0.3))
)
# Calculate Point Prevalence
num_cases <- sum(study_pop$diabetes_status)
total_pop <- nrow(study_pop)
prevalence <- (num_cases / total_pop) * 100
print(paste("Number of Cases:", num_cases))
## [1] "Number of Cases: 7"
print(paste("Total Population:", total_pop))
## [1] "Total Population: 20"
print(paste("Point Prevalence of Diabetes:", prevalence, "%"))
## [1] "Point Prevalence of Diabetes: 35 %"
Incidence measures the occurrence of new cases of disease in a population at risk over a period of time. It measures the transition from health to disease (risk).
Cumulative incidence (CI) is the proportion of an at-risk population that develops the disease over a specified period.
\[ CI = \frac{\text{Number of new cases during period}}{\text{Total population at risk at start of period}} \]
Note: The denominator must exclude individuals who already have the disease or are immune.
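The effect of the denominator rule can be sketched with a hypothetical cohort. All counts below are assumptions chosen for illustration; the point is only to show how including people who are not at risk understates the risk.

```r
# Hypothetical cohort followed for one year (illustrative assumptions only)
total_enrolled  <- 1000   # everyone enrolled at baseline
prevalent_cases <- 50     # already diseased at baseline, so not at risk
immune          <- 150    # immune at baseline, so not at risk
new_cases       <- 40     # new cases observed during follow-up

# Population at risk at the start of the period
at_risk <- total_enrolled - prevalent_cases - immune

# Cumulative incidence (%) using the correct denominator
ci_percent <- round(new_cases / at_risk * 100, 2)

# For comparison: the value obtained if nobody were excluded
naive_percent <- round(new_cases / total_enrolled * 100, 2)

print(paste("Population at risk:", at_risk))
## [1] "Population at risk: 800"
print(paste("Cumulative incidence:", ci_percent, "%"))
## [1] "Cumulative incidence: 5 %"
print(paste("Using total enrolled instead:", naive_percent, "%"))
## [1] "Using total enrolled instead: 4 %"
```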
The incidence rate (IR), also called incidence density, is used when individuals are followed for different lengths of time. The denominator is Person-Time.
\[ IR = \frac{\text{Number of new cases}}{\text{Total Person-Time of observation}} \]
Person-Time is the sum of time each individual remains at risk and under observation.
Consider a cohort study tracking COVID-19 infection. Participants contribute different “person-days” until they get infected or the study ends.
# Create sample cohort data
# Status: 1 = Infected, 0 = Censored/Healthy
# Time: Days followed
cohort_data <- data.frame(
id = 1:5,
status = c(1, 0, 1, 0, 0),
days_followed = c(15, 30, 5, 30, 20)
)
# Calculate Total Person-Time (in days)
total_person_time <- sum(cohort_data$days_followed)
# Calculate New Cases
new_cases <- sum(cohort_data$status)
# Calculate Incidence Rate per 1,000 person-days
inc_rate <- (new_cases / total_person_time) * 1000
print(paste("Total Person-Days:", total_person_time))
## [1] "Total Person-Days: 100"
print(paste("New Cases:", new_cases))
## [1] "New Cases: 2"
print(paste("Incidence Rate:", round(inc_rate, 2), "cases per 1,000 person-days"))
## [1] "Incidence Rate: 20 cases per 1,000 person-days"
In a steady-state population (where in-migration equals out-migration and prevalence is constant) with low prevalence, prevalence, incidence, and disease duration are approximately related by:
\[ P \approx I \times D \]
Where:

* \(P\) = Prevalence
* \(I\) = Incidence Rate
* \(D\) = Average Duration of disease

Interpretation:

* High Incidence, Short Duration (e.g., Common Cold): Prevalence remains relatively low.
* Low Incidence, Long Duration (e.g., Diabetes): Prevalence can be very high because cases accumulate.
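The two scenarios can be compared numerically. The sketch below applies \(P \approx I \times D\) to two hypothetical diseases; the incidence rates and durations are invented for illustration and are not estimates for any real condition. Incidence and duration must use the same time unit (years here).

```r
# Hypothetical steady-state inputs (illustrative assumptions only)

# Disease A: high incidence, short duration
inc_a <- 0.5        # new cases per person-year
dur_a <- 7 / 365    # average duration in years (about one week)

# Disease B: low incidence, long duration
inc_b <- 0.004      # new cases per person-year
dur_b <- 20         # average duration in years

# Approximate prevalence (%) from P = I x D
prev_a <- round(inc_a * dur_a * 100, 2)
prev_b <- round(inc_b * dur_b * 100, 2)

print(paste("Disease A prevalence:", prev_a, "%"))
## [1] "Disease A prevalence: 0.96 %"
print(paste("Disease B prevalence:", prev_b, "%"))
## [1] "Disease B prevalence: 8 %"
```

Although disease A occurs 125 times more often in this example, disease B ends up with the higher prevalence because each case persists far longer.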
While morbidity measures illness, mortality measures death.
The case fatality rate (CFR) is the proportion of people with a specific disease who die from it. Despite its name, it is a proportion rather than a true rate. It measures the severity or virulence of the disease.
\[ \text{CFR} (\%) = \frac{\text{Number of deaths from disease } X}{\text{Number of cases of disease } X} \times 100 \]
Example: In the early stages of the COVID-19 pandemic, CFR was a critical metric to estimate the virus’s deadliness compared to seasonal flu.
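A minimal sketch of the CFR calculation, using hypothetical outbreak counts (the numbers are assumptions for illustration, not COVID-19 or influenza data):

```r
# Hypothetical outbreak counts (illustrative assumptions only)
confirmed_cases <- 200   # people diagnosed with disease X
deaths_from_x   <- 6     # deaths from disease X among those cases

# Case fatality rate (%): deaths among cases, not among the whole population
cfr <- round(deaths_from_x / confirmed_cases * 100, 2)

print(paste("CFR:", cfr, "%"))
## [1] "CFR: 3 %"
```

Because the denominator is the number of identified cases, the result depends on how completely cases are detected.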
The cause-specific mortality rate (CSMR) is the rate of death from a specific cause in the total population.
\[ \text{CSMR} = \frac{\text{Deaths from cause } X}{\text{Total mid-year population}} \times 100,000 \]
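As a worked example of the formula, the sketch below uses a hypothetical town; both the death count and the mid-year population are assumed values.

```r
# Hypothetical annual figures (illustrative assumptions only)
deaths_cause_x <- 45       # deaths from cause X during the year
midyear_pop    <- 300000   # total mid-year population

# Cause-specific mortality rate per 100,000 population
csmr <- round(deaths_cause_x / midyear_pop * 100000, 2)

print(paste("CSMR:", csmr, "deaths per 100,000 population"))
## [1] "CSMR: 15 deaths per 100,000 population"
```

Unlike the case fatality calculation, the denominator here is the whole population, including people who never had the disease.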
| Measure | Numerator | Denominator | Interpretation |
|---|---|---|---|
| Prevalence | Existing cases | Total population | Burden of disease |
| Cumulative Incidence | New cases | Population at risk | Individual risk/probability |
| Incidence Rate | New cases | Person-time | Speed at which new cases occur |