What is Epidemiology?

Epidemiology is the study of the distribution, incidence, and possible control of diseases. Statistics is essential to epidemiology because it provides the tools, such as the Poisson and binomial distributions, needed to make sense of health data. These models help us describe disease patterns through rates that summarize how many people are sick, and they let us model outbreaks to show how fast a disease is spreading and to predict future cases.

Poisson Distribution

In epidemiology, the Poisson distribution is useful because it models the probability of a given number of events, such as infections or deaths, occurring in a fixed period of time or region of space. Detecting outbreaks and predicting case counts can be crucial for keeping as many people as possible safe and healthy.

\[ P(x) = \frac{\lambda^x e^{-\lambda}}{x!} \]

Here x is the number of events observed, for example the number of new infections in a week, while \(\lambda\) is the expected number of events in the same time period or region, such as the average number of flu cases per week in a specific city.
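As a minimal sketch of how the formula is used, the R snippet below evaluates it with the built-in `dpois()` and `ppois()` functions. The value of `lambda` (4 flu cases per week) and the case counts 6 and 10 are hypothetical numbers chosen only for illustration; they do not come from any data set in this presentation.

```r
# Hypothetical value for illustration: an average of 4 new flu cases per week.
lambda <- 4

# P(X = 6): probability of exactly 6 cases in one week.
p_exactly_6 <- dpois(6, lambda)

# The same probability computed directly from the Poisson formula.
p_formula <- lambda^6 * exp(-lambda) / factorial(6)

# P(X >= 10): probability of 10 or more cases in one week.
# ppois(9, ..., lower.tail = FALSE) returns P(X > 9), which equals P(X >= 10).
p_10_or_more <- ppois(9, lambda, lower.tail = FALSE)

print(c(exactly_6 = p_exactly_6, formula = p_formula, ten_or_more = p_10_or_more))
```

A very small value of `p_10_or_more` would suggest that a week with 10 or more cases is unusual under the assumed average, which is one simple way a count could be flagged as a possible outbreak.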

Binomial Distribution

The binomial distribution models a fixed number of trials that each have only two possible outcomes, usually labeled “success” and “failure”. In epidemiology it is helpful when modeling whether someone is infected or not, especially in situations where each person or trial has the same fixed probability of being infected (for example, after an exposure) and each test result is independent of the others. It lets you calculate the probability that a certain number of people are infected.

\[ P(x) = \binom{n}{x}p^x(1-p)^{n-x} \]

Binomial Distribution Example

Let’s say, for example, you test 100 people for a virus, each person has a 10% chance of being infected, and you want the probability that exactly 5 of them are infected. In the equation, n is the number of trials, in this case the number of people tested (100); x is the number of occurrences (5 infected); and p is the probability of “success”, here the probability that any one person is infected (10%, or 0.1).

\[ P(5) = \binom{100}{5}(0.1)^5(1-0.1)^{100-5} \]

So the probability of this specific scenario is 0.03387, or 3.387%.
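The calculation above can be checked in R. This is a short sketch using the built-in `dbinom()` and `choose()` functions with the same values as the example; the variable names are chosen here for illustration.

```r
n <- 100   # number of people tested
x <- 5     # number of infected people
p <- 0.10  # probability that any one person is infected

# Probability of exactly x infections using the built-in binomial density.
p_dbinom <- dbinom(x, size = n, prob = p)

# The same probability computed term by term from the formula.
p_formula <- choose(n, x) * p^x * (1 - p)^(n - x)

print(round(c(dbinom = p_dbinom, formula = p_formula), 5))
```

Both approaches should agree with the 0.03387 reported in the example.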

Total Cases Over Time

This chart shows the total cases of Amebiasis in California reported each year from 2001 to 2014.

Infection Rates by Sex

This bar chart compares infection rates of Amebiasis between males and females in California for each year.

R Code for Infection Rate by Sex Plot

Here is the code used for the previous plot.

library(dplyr)
library(ggplot2)

# Keep the Amebiasis rows where County is "California", split by sex
CA_Disease_Sex <- infections_disease %>%
  filter(County == "California", Disease == "Amebiasis",
         Sex %in% c("Male", "Female"))

# Side-by-side bars of the yearly rate for each sex
ggplot(CA_Disease_Sex, aes(x = Year, y = Rate, fill = Sex)) +
  geom_col(position = "dodge") +
  labs(title = "Amebiasis Infection Rates by Sex in California (2001 - 2014)",
       x = "Year",
       y = "Rate per 100,000")

Amebiasis Cases vs Population Over Time in California

Conclusion

Statistics helps transform raw health data into insights that guide public health decisions. The Poisson and binomial distributions help us model disease counts and infection risks, which makes it easier to detect outbreaks and predict trends. These tools, along with many others, strengthen epidemiology’s ability to protect communities.