Normal Distribution: A bell-shaped, symmetric, continuous distribution
defined by two parameters, the mean (μ) and the standard deviation (σ).
The mean (μ) locates the center of the peak, while the standard deviation (σ)
determines the spread (i.e., how narrow or wide the curve is).
It is ubiquitous because many real-world measurements are approximately normal, and because, by the Central Limit Theorem, averages of many independent observations tend toward a normal distribution.
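As a quick check on how σ controls the spread, the sketch below uses `pnorm()` to compute the probability of landing within 1, 2 and 3 standard deviations of the mean. The values μ = 100 and σ = 15 are arbitrary illustrative choices; the three probabilities come out the same for any normal distribution.

```r
# Illustrative parameters (any mean and sd give the same result)
mu <- 100
sigma <- 15
k <- 1:3
# P(mu - k*sigma <= X <= mu + k*sigma) = CDF at upper end minus CDF at lower end
within_k_sd <- pnorm(mu + k * sigma, mean = mu, sd = sigma) -
  pnorm(mu - k * sigma, mean = mu, sd = sigma)
round(within_k_sd, 4)
# [1] 0.6827 0.9545 0.9973
```

These three numbers are the familiar 68-95-99.7 rule.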
Binomial Distribution: Used to model the number
of successes in a fixed number of trials. To use the model, 4
requirements must be met: there is a fixed number of trials (n),
each outcome is binary (success or failure), the probability of success (p)
stays the same on every trial, and the trials are independent. If all
4 conditions hold, the number of successes follows a Binomial Distribution.
The formula for the probability of exactly k successes is:
\[
P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \dots, n
\]
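The formula can be typed into R almost verbatim with `choose()` and compared against the built-in `dbinom()`. This is a minimal sketch; `binom_prob` is a hypothetical helper name, and n = 10, p = 0.3, k = 4 are arbitrary example values.

```r
# Hypothetical helper: binomial probability of exactly k successes, straight from the formula
binom_prob <- function(k, n, p) {
  choose(n, k) * p^k * (1 - p)^(n - k)
}
# Illustrative values: 10 trials, success probability 0.3, exactly 4 successes
by_hand <- binom_prob(4, n = 10, p = 0.3)
built_in <- dbinom(4, size = 10, prob = 0.3)
all.equal(by_hand, built_in)
# The probabilities over all possible k = 0, ..., n add up to 1
all.equal(sum(binom_prob(0:10, n = 10, p = 0.3)), 1)
```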
Poisson Distribution: Models the number of events occurring in a fixed interval of time or space. The events must occur independently of one another at a constant average rate, λ, which is also the expected number of events in the interval.
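For reference, the Poisson probability of exactly k events is \(P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}\). The sketch below types that formula into R and compares it with `dpois()`; `pois_prob` is a hypothetical helper name, and λ = 3 is an arbitrary example rate. The last line checks numerically that λ is the mean number of events (the sum is truncated at k = 100, where the remaining terms are negligible).

```r
# Hypothetical helper: Poisson probability of exactly k events, straight from the formula
pois_prob <- function(k, lambda) {
  lambda^k * exp(-lambda) / factorial(k)
}
# Illustrative rate: on average 3 events per interval; probability of exactly 2
by_hand <- pois_prob(2, lambda = 3)
built_in <- dpois(2, lambda = 3)
all.equal(by_hand, built_in)
# Expected number of events: sum of k * P(X = k), truncated at k = 100
expected_events <- sum((0:100) * dpois(0:100, lambda = 3))
expected_events
```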
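PDF (Probability Density Function): For continuous distributions, the PDF describes the relative likelihood of the variable taking values near a given point; the probability of any single exact value is 0. The area under the PDF curve over an interval gives the probability of the variable falling in that interval. (For discrete distributions such as the Binomial and Poisson, the analogous function is the PMF, or probability mass function, which gives P(X = k) directly.)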
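To make "area under the curve" concrete, the sketch below integrates the standard normal PDF numerically over [-1, 1] with base R's `integrate()` and compares the result with the CDF difference. The last line shows why a density value is not itself a probability: a narrow normal (sd = 0.1, an arbitrary choice) has a density above 1 at its peak.

```r
# Area under the standard normal PDF between -1 and 1
area <- integrate(dnorm, lower = -1, upper = 1)$value
area                   # about 0.6827
pnorm(1) - pnorm(-1)   # the same probability, from the CDF
# A density can exceed 1, so it is not a probability on its own
peak_density <- dnorm(0, mean = 0, sd = 0.1)
peak_density           # about 3.99
```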
CDF (Cumulative Distribution Function): Gives the probability that the random variable is less than or equal to a specific value. It accumulates probability from the left tail up to that point, so it never decreases and rises from 0 toward 1.
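For a discrete distribution, the CDF is simply the running total of the PMF. The sketch below checks this with `cumsum(dbinom(...))` against `pbinom()` (n = 10, p = 0.3 are arbitrary example values). It also shows how to get an upper-tail probability such as P(X ≥ 4): either 1 minus the CDF at 3, or `pbinom(..., lower.tail = FALSE)`, which returns the upper tail directly and is more accurate when that tail is tiny.

```r
n <- 10
p <- 0.3
k <- 0:n
# The CDF of a discrete distribution is the cumulative sum of its PMF
cdf_by_hand <- cumsum(dbinom(k, size = n, prob = p))
all.equal(cdf_by_hand, pbinom(k, size = n, prob = p))
# Upper tail P(X >= 4) = 1 - P(X <= 3)
upper_tail <- 1 - pbinom(3, size = n, prob = p)
# Same quantity without subtracting from 1
all.equal(upper_tail, pbinom(3, size = n, prob = p, lower.tail = FALSE))
```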
(Note: I tried this with the Poisson Distribution formula and my head exploded. Not sure why.) In a Binomial Distribution, the formula makes sense: \(\binom{n}{k}\) counts the ways that k successes can be arranged over n trials. Then, \(p^k\) is the probability that k particular trials are all successes, and \((1-p)^{n-k}\) is the probability that the remaining \(n-k\) trials are all failures. Because the trials are independent, these probabilities are multiplied together, so \(p^k (1-p)^{n-k}\) is the probability of any one particular arrangement, and multiplying by \(\binom{n}{k}\) adds up all of the arrangements.
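This reasoning can be checked by brute force for a small case. The sketch below lists every success/failure sequence for n = 4 trials with p = 0.3 (arbitrary illustrative values), counts how many sequences have each number of successes, multiplies the per-trial probabilities within each sequence, and then adds up the sequences for each k.

```r
n <- 4
p <- 0.3
# Every possible sequence of n trials (1 = success, 0 = failure): 2^n rows
outcomes <- expand.grid(rep(list(0:1), n))
successes <- rowSums(outcomes)
# Number of sequences with exactly k successes matches choose(n, k)
table(successes)
choose(n, 0:n)
# Probability of each sequence: product of per-trial probabilities (independence)
seq_prob <- apply(outcomes, 1, function(row) prod(ifelse(row == 1, p, 1 - p)))
# Adding up the sequences with k successes reproduces dbinom()
by_sequence <- tapply(seq_prob, successes, sum)
all.equal(as.vector(by_sequence), dbinom(0:n, size = n, prob = p))
```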
# Normal PDF
# Parameters: mean = 0, sd = 1 (standard normal)
mu <- 0
sigma <- 1
x_norm <- seq(-3, 3, length.out = 100)
# Calculate PDF values
pdf_norm <- dnorm(x_norm, mean = mu, sd = sigma)
# Plot
plot(x_norm, pdf_norm, type = "l", lwd = 2, col = "green",
main = "Normal PDF \n(μ=0, σ=1)",
xlab = "x", ylab = "Density")
# Normal CDF
# Parameters: mean = 0, sd = 1 (standard normal)
mu <- 0
sigma <- 1
x_norm <- seq(-4, 4, length.out = 100)
# Calculate CDF values
cdf_norm <- pnorm(x_norm, mean = mu, sd = sigma)
# Plot
plot(x_norm, cdf_norm, type = "l", lwd = 2, col = "orange",
main = "Normal CDF \n(μ=0, σ=1)",
xlab = "x", ylab = "Cumulative Probability")
# Binomial PMF (probability mass function: dbinom gives P(X = k))
# Parameters: n = 25 free throws, p = 0.6 probability of making each shot
n <- 25
p <- 0.6
x_binom <- 0:n
# Calculate PMF values
pmf_binom <- dbinom(x_binom, size = n, prob = p)
# Plot
plot(x_binom, pmf_binom, type = "h", lwd = 2, col = "blue",
main = "Binomial PMF (n=25, p=0.6)",
xlab = "Number of Successes", ylab = "Probability",
ylim = c(0, max(pmf_binom) * 1.1))
points(x_binom, pmf_binom, pch = 20, col = "blue")
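The free-throw distribution peaks near 15 because a binomial variable has mean \(np\) and variance \(np(1-p)\). The sketch below computes both directly from the PMF values and compares them with those formulas for n = 25, p = 0.6.

```r
n <- 25
p <- 0.6
k <- 0:n
pmf <- dbinom(k, size = n, prob = p)
# Mean and variance computed from the PMF
mean_from_pmf <- sum(k * pmf)
var_from_pmf <- sum((k - mean_from_pmf)^2 * pmf)
c(mean_from_pmf, n * p)            # both about 15
c(var_from_pmf, n * p * (1 - p))   # both about 6
```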
Binomial Approach
# My selected parameters
N <- 60      # Number of trials (patients)
x <- 12      # Observed brain bleeds followed by death within 30 days
pi0 <- 0.01  # National proportion (1% mortality); named pi0 so R's built-in constant pi is not overwritten
# Binomial Model
# Task: compute P(X >= 12) when the national proportion is 0.01
# pbinom(x - 1, ...) on its own is the lower tail P(X <= 11), so subtract it from 1 to get P(X >= 12)
prob_binomial <- 1 - pbinom(x - 1, size = N, prob = pi0)
cat("Binomial Analysis:\n")
## Binomial Analysis:
cat("Expected deaths:", N * pi0, "\n")
## Expected deaths: 0.6
cat("Observed deaths:", x, "\n")
## Observed deaths: 12
cat("P(X >= 12 | pi = 0.01):", signif(prob_binomial, 3), "\n")
## P(X >= 12 | pi = 0.01): 8.97e-13
# Compare to an alpha (significance level) of 0.05. This is a concept I only just read about while writing this code: alpha is the threshold for deciding whether a result is statistically significant. P(X >= 12) is computed assuming the national proportion holds, so it is a one-sided p-value; if it falls below alpha, report significance, otherwise report no significance.
if (prob_binomial < 0.05) {
  cat("There is evidence to suggest hospital proportion is higher than national (p < 0.05)\n")
} else {
  cat("There is no strong evidence hospital differs from national proportion\n")
}
## There is evidence to suggest hospital proportion is higher than national (p < 0.05)
Poisson Approach
# My selected parameters (from above)
N <- 60      # Number of trials (patients)
x <- 12      # Observed brain bleeds followed by death within 30 days
pi0 <- 0.01  # National proportion (1% mortality)
# Poisson Model: lambda = expected number of deaths = N * pi0 = 60 * 0.01 = 0.6, i.e. about 1 death per 100 patients
lambda <- N * pi0
# Probability P(X >= 12) when lambda = 0.6
prob_poisson <- 1 - ppois(x - 1, lambda = lambda)
cat("\nPoisson Analysis:\n")
##
## Poisson Analysis:
cat("Lambda (expected deaths):", lambda, "\n")
## Lambda (expected deaths): 0.6
cat("Observed deaths:", x, "\n")
## Observed deaths: 12
cat("P(X >= 12 | lambda = 0.6):", prob_poisson, "\n")
## P(X >= 12 | lambda = 0.6): 2.614131e-12
if (prob_poisson < 0.05) {
  cat("Evidence suggests hospital proportion is higher than national (p < 0.05)\n")
} else {
  cat("No strong evidence hospital differs from national proportion\n")
}
## Evidence suggests hospital proportion is higher than national (p < 0.05)
Conclusion: My first run showed a very big difference between the two models (a binomial result of 1 versus a Poisson result of about 2.6e-12), but that difference came from a bug, not from the models: pbinom(x - 1, ...) returns the lower tail P(X <= 11) rather than P(X >= 12). With the upper tail, the binomial probability is about 8.97e-13 and the Poisson probability is about 2.61e-12. Both are far below 0.05, so both models lead to the same conclusion. Along the way I kept toying with the parameters, beginning with high values of N (> 1000) and then landing on a lower N of 60. The Poisson model does have limitations as an approximation to the binomial: it is most accurate when N is large and pi is small, and far out in the tail its probabilities can still differ noticeably from the exact binomial ones (here by a factor of about 3), even when the decision at alpha = 0.05 is the same.
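To see the two tails side by side, the sketch below recomputes them for the same parameters (N = 60, x = 12, and a national proportion of 0.01, stored here as `pi0` so R's built-in `pi` is left alone). It uses `lower.tail = FALSE`, which returns P(X ≥ x) directly and avoids the rounding error of subtracting a number very close to 1 from 1. It also cross-checks the exact binomial tail against base R's `binom.test()` with `alternative = "greater"`, whose p-value is that same upper-tail probability.

```r
N <- 60
x <- 12
pi0 <- 0.01
# Exact binomial upper tail P(X >= 12), computed without subtracting from 1
p_binom <- pbinom(x - 1, size = N, prob = pi0, lower.tail = FALSE)
# Poisson approximation with lambda = N * pi0 = 0.6
p_pois <- ppois(x - 1, lambda = N * pi0, lower.tail = FALSE)
# One-sided exact binomial test: its p-value is the same upper tail
p_test <- binom.test(x, N, p = pi0, alternative = "greater")$p.value
signif(c(binomial = p_binom, poisson = p_pois, binom_test = p_test), 3)
# How far the Poisson tail is from the exact binomial tail (roughly 2.9 here)
p_pois / p_binom
```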