By the end of this module, students should be able to:
Epidemiology is defined as the study of the distribution and determinants of health-related states or events in specified populations. To study “distribution,” we must quantify disease. We cannot manage what we cannot measure.
Before we measure disease, we must define our tools:
A ratio is one quantity divided by another. The numerator is not necessarily included in the denominator.
\[ \text{Ratio} = \frac{x}{y} \]
* Example: Sex ratio (Males / Females).
A proportion is a type of ratio in which the numerator is included in the denominator. It is usually multiplied by 100 and expressed as a percentage:
\[ \text{Proportion} = \frac{a}{a+b} \times 100 \]
A rate is a ratio whose denominator explicitly includes time (for example, person-years of observation). It therefore measures the speed at which disease occurs.
```r
# R Example: Basic Calculations
females <- 60
males <- 40
total <- females + males

# Ratio: males per female (numerator is not part of the denominator)
sex_ratio <- males / females
print(paste("Sex Ratio (M:F):", round(sex_ratio, 2)))
## [1] "Sex Ratio (M:F): 0.67"

# Proportion: females as a share of the total (numerator is part of the denominator)
prop_female <- females / total
print(paste0("Proportion of Females: ", prop_female * 100, "%"))
## [1] "Proportion of Females: 60%"
```
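The basic calculations above cover ratios and proportions but not rates. A minimal sketch of a rate follows, using hypothetical numbers (6 new cases over 300 person-years of follow-up, not taken from any study) to show how time enters the denominator:

```r
# Hypothetical data for illustration only
new_cases_example <- 6      # new cases observed during follow-up
person_years_example <- 300 # total time at risk, in person-years

# Rate: events per unit of person-time
rate_example <- new_cases_example / person_years_example

# Rates are usually rescaled to a convenient base, here per 100 person-years
rate_per_100py <- rate_example * 100
print(paste("Rate:", rate_per_100py, "cases per 100 person-years"))
```

Rescaling does not change the rate; it only makes small per-person-year values easier to read and compare.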
Morbidity refers to the state of being diseased or unhealthy within a population. We look at two main perspectives: Status (Prevalence) and Change (Incidence).
Prevalence measures the proportion of individuals in a population who have the disease at a specific point in time (or period). It is a “snapshot.”
\[ \text{Prevalence} = \frac{\text{Number of existing cases}}{\text{Total population}} \times 100 \]
Example (diabetes in the USA): if 34 million people have diabetes in a population of 330 million, the prevalence is
\[ P = \frac{34{,}000{,}000}{330{,}000{,}000} \times 100 \approx 10.3\% \]
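The same arithmetic in R, using the figures from the diabetes example:

```r
# Diabetes example: existing cases and total population
existing_cases <- 34e6
total_population <- 330e6

# Prevalence as a percentage: existing cases / total population x 100
prevalence_pct <- existing_cases / total_population * 100
print(round(prevalence_pct, 1))
```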
```r
# Visualizing Prevalence
library(ggplot2)

set.seed(123)
population_n <- 100
# Simulate disease status with an underlying prevalence of 15%
status <- sample(c("Healthy", "Diseased"), population_n, replace = TRUE, prob = c(0.85, 0.15))
df <- data.frame(status)

ggplot(df, aes(x = status, fill = status)) +
  geom_bar() +
  labs(title = "Point Prevalence Snapshot", y = "Count") +
  theme_minimal() +
  scale_fill_manual(values = c(Diseased = "red", Healthy = "lightblue"))
```

Figure 1: Visualizing Prevalence in a Sample Population
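The bar chart shows counts, but point prevalence is a proportion. A minimal sketch that regenerates the same simulated sample (same seed and `sample()` call as the plotting example) and computes that proportion directly:

```r
# Re-create the simulated sample from the plotting example
set.seed(123)
population_n <- 100
status <- sample(c("Healthy", "Diseased"), population_n,
                 replace = TRUE, prob = c(0.85, 0.15))

# Point prevalence = existing cases / total sampled
sample_prevalence <- mean(status == "Diseased")
print(paste0("Sample prevalence: ", sample_prevalence * 100, "%"))

# Proportion in each category
print(prop.table(table(status)))
```

Because the sample is random, the estimated prevalence will generally differ somewhat from the underlying 15% used to generate it.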
Incidence measures the occurrence of new cases of disease in a population at risk over a specified period.
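Cumulative incidence (also called incidence risk) is the probability that an individual will develop the disease during a specified time period. It is best suited to closed populations in which everyone at risk is followed for the whole period.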
\[ \text{CI} = \frac{\text{Number of NEW cases}}{\text{Population at risk at start of period}} \]
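A minimal sketch of cumulative incidence using a hypothetical helper, `cumulative_incidence()` (written here for illustration, not from any package). The key step is the denominator: people who already have the disease at the start cannot become new cases, so they are removed from the population at risk. The numbers are made up.

```r
# Hypothetical helper: cumulative incidence (risk) over a fixed period
cumulative_incidence <- function(new_cases, population, existing_cases = 0) {
  # Only people free of disease at the start of the period are at risk
  at_risk <- population - existing_cases
  new_cases / at_risk
}

# Hypothetical closed cohort: 1,000 people, 50 already diseased at baseline,
# 38 new cases during one year of follow-up
ci <- cumulative_incidence(new_cases = 38, population = 1000, existing_cases = 50)
print(round(ci, 3))
```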
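The incidence rate (also called incidence density) is used when follow-up time differs between individuals, as in dynamic populations. Its denominator is person-time: the sum of each individual's time at risk.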
\[ \text{IR} = \frac{\text{Number of NEW cases}}{\text{Total Person-Time of observation}} \]
Case example: researchers followed a cohort for decades to study cardiovascular disease. Because people entered and left the study at different times, the incidence rate (using person-years) was the appropriate measure of how quickly cardiovascular disease occurred.
Imagine a study of 5 patients followed for 5 years or until they get the disease.
```r
# Create dummy data: 5 patients followed for up to 5 years
patient_data <- data.frame(
  Patient_ID = 1:5,
  Years_Followed = c(5, 3, 5, 2, 5),   # patients 2 and 4 developed disease after 3 and 2 years
  Developed_Disease = c(0, 1, 0, 1, 0) # 1 = Yes, 0 = No
)

# Person-time denominator and new-case numerator
total_person_years <- sum(patient_data$Years_Followed)
new_cases <- sum(patient_data$Developed_Disease)
incidence_rate <- new_cases / total_person_years

knitr::kable(patient_data, caption = "Patient Follow-up Data")
```

Table: Patient Follow-up Data

| Patient_ID | Years_Followed | Developed_Disease |
|---|---|---|
| 1 | 5 | 0 |
| 2 | 3 | 1 |
| 3 | 5 | 0 |
| 4 | 2 | 1 |
| 5 | 5 | 0 |

```r
print(paste("Total Person-Years:", total_person_years))
print(paste("New Cases:", new_cases))
print(paste("Incidence Rate:", incidence_rate, "cases per person-year"))
## [1] "Total Person-Years: 20"
## [1] "New Cases: 2"
## [1] "Incidence Rate: 0.1 cases per person-year"
```
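Incidence rates are usually reported per 100 or per 1,000 person-years rather than per person-year. A minimal sketch that rescales the rate from the five-patient example (the follow-up data are re-entered so the block runs on its own):

```r
# Follow-up data from the five-patient example
years_followed <- c(5, 3, 5, 2, 5)
developed_disease <- c(0, 1, 0, 1, 0)

# Incidence rate per person-year
ir <- sum(developed_disease) / sum(years_followed)

# Rescale to conventional bases
ir_per_100py <- ir * 100
ir_per_1000py <- ir * 1000
print(paste("Incidence Rate:", ir_per_100py, "cases per 100 person-years"))
print(paste("Incidence Rate:", ir_per_1000py, "cases per 1,000 person-years"))
```

All three figures describe the same rate; only the base changes.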
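The “bathtub” analogy is central to understanding how prevalence and incidence are related. Incidence is the water flowing into the tub from the tap, prevalence is the water in the tub, and recovery and death are the drain. The longer cases stay in the tub (the longer the disease lasts), the higher the prevalence.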
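The formula, which holds approximately when incidence and duration are stable (a steady state) and prevalence is low:
\[ P \approx I \times D \]
(Prevalence \(\approx\) Incidence \(\times\) Average duration of disease)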
```r
# Simulation of Relationship
library(ggplot2)

duration <- seq(1, 10, by = 1)
incidence <- 5 # constant incidence: 5 new cases per year
prevalence <- incidence * duration # P ~ I x D
data_rel <- data.frame(Duration = duration, Prevalence = prevalence)

# `linewidth` replaces `size` for lines in ggplot2 >= 3.4.0
ggplot(data_rel, aes(x = Duration, y = Prevalence)) +
  geom_line(color = "blue", linewidth = 1.5) +
  geom_point(size = 3) +
  labs(title = "Effect of Disease Duration on Prevalence",
       subtitle = "Assuming constant incidence of 5 new cases per year",
       x = "Average Duration of Disease (Years)",
       y = "Prevalence (Cases)") +
  theme_bw()
```

Figure 2: The Bathtub Analogy Simulation
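The relation \(P \approx I \times D\) is an approximation that works best when prevalence is low. In a steady-state population, with \(I\) taken as the incidence rate among people at risk and \(P\) as a proportion, the corresponding exact relation is \(P/(1-P) = I \times D\), so \(P = ID/(1 + ID)\). A sketch comparing the two for a hypothetical incidence rate of 0.01 per person-year (an assumed value, not from the text):

```r
# Hypothetical incidence rate among those at risk (cases per person-year)
incidence_rate <- 0.01
duration <- c(1, 5, 10, 20, 50) # average disease duration in years

# Approximation: P ~ I x D
p_approx <- incidence_rate * duration
# Steady-state relation: prevalence odds = I x D, so P = ID / (1 + ID)
p_steady <- (incidence_rate * duration) / (1 + incidence_rate * duration)

comparison <- data.frame(Duration = duration,
                         P_approx = p_approx,
                         P_steady_state = round(p_steady, 4))
print(comparison)
```

The two columns agree closely for short durations (low prevalence) and diverge as \(I \times D\) grows.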
Mortality rates are essential for understanding the severity of disease and the health status of a population.
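The crude death rate (CDR) is the actual observed mortality in a population over a year, with no adjustment for age or other characteristics:
\[ \text{CDR} = \frac{\text{Total Deaths}}{\text{Mid-year Population}} \times 1{,}000 \]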
The cause-specific mortality rate measures mortality due to a specific cause:
\[ \text{Cause-specific Rate} = \frac{\text{Deaths from Cause X}}{\text{Mid-year Population}} \times 100{,}000 \]
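Both mortality measures share the same shape: deaths divided by a population denominator (usually the mid-year population), scaled to a convenient base. A minimal sketch with a hypothetical helper `mortality_rate()` and made-up figures for a town of 50,000:

```r
# Hypothetical helper: deaths per `per` people
mortality_rate <- function(deaths, population, per) {
  deaths / population * per
}

# Made-up figures for illustration only
mid_year_population <- 50000
total_deaths <- 400
deaths_cause_x <- 60

cdr <- mortality_rate(total_deaths, mid_year_population, per = 1000)
cause_specific <- mortality_rate(deaths_cause_x, mid_year_population, per = 100000)

print(paste("Crude death rate:", cdr, "per 1,000"))
print(paste("Cause-specific mortality rate:", cause_specific, "per 100,000"))
```

The different bases (1,000 versus 100,000) simply keep the numbers readable: deaths from a single cause are much rarer than deaths from all causes.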
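The case fatality rate (CFR) measures the virulence, or killing power, of a disease. It answers the question: “If I get this disease, how likely am I to die of it?” Despite its name, the CFR is a proportion rather than a rate, because its denominator contains no time component.
\[ \text{CFR} = \frac{\text{Deaths from Disease X}}{\text{Total Cases of Disease X}} \times 100 \]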
```r
# Example Calculation
disease_stats <- data.frame(
  Disease = c("Disease A (Like Rabies)", "Disease B (Like Flu)"),
  Cases = c(100, 100000),
  Deaths = c(99, 100)
)
# CFR (%) = deaths / cases x 100
disease_stats$CFR_Percentage <- (disease_stats$Deaths / disease_stats$Cases) * 100

# format.args prevents scientific notation (e.g. 1e+05) in the Cases column
knitr::kable(disease_stats, caption = "Comparing Virulence (CFR)",
             format.args = list(scientific = FALSE, big.mark = ","))
```

Table: Comparing Virulence (CFR)

| Disease | Cases | Deaths | CFR_Percentage |
|---|---|---|---|
| Disease A (Like Rabies) | 100 | 99 | 99.0 |
| Disease B (Like Flu) | 100,000 | 100 | 0.1 |
Scenario: In a town of 10,000 people on Jan 1st:
1. 500 people already have hypertension.
2. Over the year, 200 more people are newly diagnosed with hypertension.
3. No one died or was cured.

Calculate:
1. Prevalence on Jan 1st.
2. Cumulative Incidence over the year.
3. Prevalence on Dec 31st.

(Instructor Note: Discuss the denominator for Cumulative Incidence closely: did you subtract the prevalent cases?)
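For instructors, a minimal sketch that checks the three answers in R. It assumes a closed population (no births, deaths, migration, or cures), as the scenario states, and removes the 500 prevalent cases from the cumulative incidence denominator:

```r
population <- 10000
existing_jan1 <- 500
new_cases_year <- 200

# 1. Prevalence on Jan 1st (%)
prev_jan1 <- existing_jan1 / population * 100

# 2. Cumulative incidence (%): only the 9,500 people free of hypertension were at risk
ci_year <- new_cases_year / (population - existing_jan1) * 100

# 3. Prevalence on Dec 31st (%): no one died or was cured, so all cases remain
prev_dec31 <- (existing_jan1 + new_cases_year) / population * 100

print(round(c(prev_jan1 = prev_jan1, ci_year = ci_year, prev_dec31 = prev_dec31), 2))
```

Using the full 10,000 as the denominator would give 2% instead of about 2.11%, which is the error the instructor note targets.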
| Measure | Question Answered | Numerator | Denominator |
|---|---|---|---|
| Prevalence | How much disease is there now? | Existing Cases | Total Population |
| Incidence Risk | What is the risk of getting it? | New Cases | Population at Risk |
| Incidence Rate | How fast are new cases occurring? | New Cases | Person-Time at Risk |
| CFR | How deadly is it? | Deaths from X | Cases of X |