Normal Distribution: A continuous probability distribution defined by two parameters, the mean and the standard deviation. It has a symmetric, bell-shaped curve in which the mean is the central value and the standard deviation measures the spread of the data.
Binomial: A discrete probability distribution in which each trial has two possible outcomes (success or failure). It models the number of successes in a fixed number of independent trials, and it is defined by the number of trials and the probability of success on each trial. The two possible outcomes are mutually exclusive: each trial results in exactly one of them.
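Poisson: A discrete probability distribution most commonly used to model the number of rare or random events occurring in a fixed interval of time or space. It is defined by a single parameter, λ (lambda), the average number of events expected in that interval; the random variable x is the number of events actually observed.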
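A probability density function (PDF) describes how the probability of a continuous random variable is spread across its possible values: the probability that the variable falls in an interval is the area under the PDF over that interval, and the probability of any single exact value is zero. (For discrete distributions the analogue is the probability mass function, or PMF, which gives P(X = x) directly.) A cumulative distribution function (CDF), on the other hand, applies to both discrete and continuous distributions and gives the cumulative probability that the variable is less than or equal to a given value, F(x) = P(X ≤ x).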
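To make the PDF/CDF distinction concrete, here is a small R sketch. The evaluation points (6 for the normal, 8 for the binomial) are arbitrary choices for illustration. It checks that the area under the normal PDF up to a point reproduces the CDF, and that a discrete CDF is a running sum of point probabilities.

```r
# Normal(mean = 5, sd = 1): the CDF at 6 equals the area under the PDF up to 6
area <- integrate(dnorm, lower = -Inf, upper = 6, mean = 5, sd = 1)$value
cdf_n <- pnorm(6, mean = 5, sd = 1)
print(c(area = area, cdf = cdf_n))

# The PDF value is a density, not a probability: P(X = 5) is 0,
# yet the density at the mean is positive
print(dnorm(5, mean = 5, sd = 1))

# Binomial(size = 15, prob = 0.5): the CDF at 8 is P(X = 0) + ... + P(X = 8)
pmf_sum <- sum(dbinom(0:8, size = 15, prob = 0.5))
cdf_b <- pbinom(8, size = 15, prob = 0.5)
print(c(pmf_sum = pmf_sum, cdf = cdf_b))
```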
Normal Distribution: The PDF formula below makes intuitive sense. The mean μ locates the center of the distribution and the standard deviation σ measures the dispersion of the data, and both relate directly to the symmetric bell curve. Because each value x enters the formula only through its distance from the center, (x − μ), the formula shows how the probability density falls off as x moves further away from the mean.
f(x) = (1 / (σ√(2π))) · e^(−(x − μ)² / (2σ²))
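As a check on the normal PDF formula, a short R sketch writes the density out by hand and compares it with R's built-in dnorm(). The helper normal_pdf() is hypothetical, introduced only for this illustration, and the evaluation points are arbitrary.

```r
# Hand-written normal density (normal_pdf is a hypothetical helper for illustration):
# f(x) = exp(-(x - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi))
normal_pdf <- function(x, mu, sigma) {
  exp(-(x - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi))
}

x_vals <- c(2, 4, 5, 6, 8)   # points placed symmetrically around the mean 5
manual <- normal_pdf(x_vals, mu = 5, sigma = 1)
builtin <- dnorm(x_vals, mean = 5, sd = 1)
print(rbind(manual, builtin))
```

Because x only appears through (x − μ)², points equally far from the mean (2 and 8, or 4 and 6) get the same density, which is the symmetry of the bell curve.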
Normal Distribution: It is defined by two key parameters, the mean (μ) and the standard deviation (σ).
Binomial: It is defined by the number of trials and the probability of success on each trial.
Poisson: It is defined by one parameter, λ (lambda), the average number of events occurring in a fixed interval of time or space.
These parameters must be supplied to the corresponding R functions, along with the value(s) of the random variable.
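One way to see what each parameter controls is to simulate from each distribution and compare sample moments with the theoretical ones: mean μ and variance σ² for the normal, mean n·p and variance n·p·(1 − p) for the binomial, and mean and variance both equal to λ for the Poisson. A sketch, using the parameter values from the R code later in this section; the seed and sample size are arbitrary.

```r
set.seed(1)        # arbitrary seed so the simulation is reproducible
n_sim <- 100000    # number of simulated draws (arbitrary)

# Each r* function takes the same parameters as the matching d* and p* functions
sim_n <- rnorm(n_sim, mean = 5, sd = 1)
sim_b <- rbinom(n_sim, size = 15, prob = 0.5)
sim_p <- rpois(n_sim, lambda = 6)

comparison <- data.frame(
  distribution = c("Normal", "Binomial", "Poisson"),
  theoretical_mean = c(5, 15 * 0.5, 6),
  sample_mean = c(mean(sim_n), mean(sim_b), mean(sim_p)),
  theoretical_var = c(1^2, 15 * 0.5 * (1 - 0.5), 6),
  sample_var = c(var(sim_n), var(sim_b), var(sim_p))
)
print(comparison)
```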
Normal Distribution: This is commonly used to model the height or weight of adults in a given country or region. It can even extend to something such as job satisfaction levels across a company. Overall, the normal distribution is one of the most commonly used distributions.
Binomial: As mentioned previously, the binomial distribution is best suited to situations where the number of trials is fixed, each trial has only a success or failure outcome, and the objective is to find the probability of a specific number of successes. For this reason, flipping a coin a fixed number of times and counting how many times it lands heads is a good example. Another example is modeling the number of successes in a clinical trial of a new drug where each outcome is recorded only as success or failure.
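For the coin-flip example, a short R sketch: with 15 fair flips (the same size = 15 and prob = 0.5 used in the plotting code below), dbinom() gives the probability of an exact number of heads and pbinom() the probability of at most (or, with lower.tail = FALSE, more than) a given number. The choice of 8 heads is arbitrary.

```r
flips <- 15     # fixed number of independent trials
p_heads <- 0.5  # probability of success (heads) on each flip

# P(exactly 8 heads)
p_exact <- dbinom(8, size = flips, prob = p_heads)

# Same value from the binomial formula: choose(n, k) * p^k * (1 - p)^(n - k)
p_formula <- choose(flips, 8) * p_heads^8 * (1 - p_heads)^(flips - 8)

# P(at most 8 heads), and P(at least 8 heads) = P(X > 7)
p_at_most <- pbinom(8, size = flips, prob = p_heads)
p_at_least <- pbinom(7, size = flips, prob = p_heads, lower.tail = FALSE)

print(c(exact = p_exact, formula = p_formula,
        at_most = p_at_most, at_least = p_at_least))
```

With a fair coin and an odd number of flips, "8 or more heads" and "7 or fewer heads" are mirror images, so P(at least 8 heads) is exactly 0.5.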
Poisson: Situations that can be modeled with a Poisson distribution include predicting the number of car crashes at a given intersection in a given time period (a month or a year). It can also be used to predict the number of customers arriving at a store in a given time interval (an hour or a day).
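A sketch for the store example, assuming (hypothetically) an average of 6 customer arrivals per hour, which matches the lambda = 6 used in the plotting code below. The thresholds of 4 and 10 customers are arbitrary.

```r
lambda_store <- 6   # hypothetical average number of arrivals per hour

# P(exactly 4 customers in an hour)
p_four <- dpois(4, lambda = lambda_store)

# Same value from the Poisson formula: exp(-lambda) * lambda^k / k!
p_four_formula <- exp(-lambda_store) * lambda_store^4 / factorial(4)

# P(more than 10 customers in an hour), e.g. for staffing decisions
p_busy <- ppois(10, lambda = lambda_store, lower.tail = FALSE)

print(c(exact_4 = p_four, formula = p_four_formula, more_than_10 = p_busy))
```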
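# Normal distribution parameters
mean_n <- 5
sd_n <- 1
# Binomial distribution parameters
size_b <- 15
prob_b <- 0.5
# Poisson distribution parameter
lambda_p <- 6
# values of x for the normal curve: mean +/- 3 standard deviations
x <- seq(from = mean_n - 3 * sd_n,
         to = mean_n + 3 * sd_n,
         length.out = 200)  # enough points for a smooth curve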
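# PDF values for the normal; PMF values (point probabilities) for the discrete distributions
pdf_n <- dnorm(x,
               mean = mean_n,
               sd = sd_n)
pdf_b <- dbinom(0:size_b,    # a binomial count can only range from 0 to size_b
                size = size_b,
                prob = prob_b)
pdf_p <- dpois(0:15,         # no upper limit, but 0:15 covers almost all mass for lambda = 6
               lambda = lambda_p)
# plot normal
plot(x,
     pdf_n,
     type = "l",
     col = "blue",
     main = "Normal Distribution",
     xlab = "x",
     ylab = "PDF")
abline(v = mean_n, col = "darkgreen", lty = 3)  # dotted line at the mean
# plot binomial
plot(0:size_b,
     pdf_b,
     type = "h",
     col = "blue",
     main = "Binomial Distribution",
     xlab = "Number of successes",
     ylab = "PMF")
# plot poisson
plot(0:15,
     pdf_p,
     type = "h",
     col = "blue",
     main = "Poisson Distribution",
     xlab = "Number of events",
     ylab = "PMF")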
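The plots above show the normal PDF and the binomial and Poisson point probabilities. A companion sketch draws the matching CDFs with pnorm(), pbinom(), and ppois(), using the same parameter values; the discrete CDFs are drawn as step functions (type = "s") because they jump only at whole numbers.

```r
# Same parameters as the PDF/PMF code
mean_n <- 5; sd_n <- 1
size_b <- 15; prob_b <- 0.5
lambda_p <- 6

x_n <- seq(mean_n - 3 * sd_n, mean_n + 3 * sd_n, length.out = 200)
cdf_n <- pnorm(x_n, mean = mean_n, sd = sd_n)
cdf_b <- pbinom(0:size_b, size = size_b, prob = prob_b)
cdf_p <- ppois(0:15, lambda = lambda_p)

op <- par(mfrow = c(1, 3))   # three panels side by side
plot(x_n, cdf_n, type = "l", col = "blue",
     main = "Normal CDF", xlab = "x", ylab = "P(X <= x)")
plot(0:size_b, cdf_b, type = "s", col = "blue",
     main = "Binomial CDF", xlab = "Number of successes", ylab = "P(X <= x)")
plot(0:15, cdf_p, type = "s", col = "blue",
     main = "Poisson CDF", xlab = "Number of events", ylab = "P(X <= x)")
par(op)                      # restore the previous plotting layout
```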
Let’s assume that a hospital’s neurosurgical team performed N procedures for in-brain bleeding last year, and x of these procedures resulted in death within 30 days. If the national proportion of deaths in these cases is π, is there evidence to suggest that your hospital’s proportion of deaths is more extreme than the national proportion?
In other words, pick your own values of N, x, and π.
x is necessarily less than or equal to N, and π is a fixed probability of success.
The probability to compute is that of observing x or more deaths, P(X ≥ x).
Then model this as both a binomial and a Poisson, and provide your R code solutions. Do you get similar answers under the two different distributional assumptions, and can you guess why?
N = 250 procedures performed
x = 60 of these procedures resulted in death within 30 days
π = 0.20 (20%), the national proportion of deaths in such cases
Binomial
n <- 250   # number of procedures performed
x <- 60    # observed deaths within 30 days
p <- 0.20  # national probability of death
# P(X >= x) = P(X > x - 1): upper tail of the binomial distribution
bin_prob <- pbinom(x - 1,
                   size = n,
                   prob = p,
                   lower.tail = FALSE)  # FALSE gives P(X > x - 1) instead of P(X <= x - 1)
print(bin_prob)
## [1] 0.06885367
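R's built-in exact binomial test, binom.test(), computes the same one-sided tail probability. It can also report a two-sided p-value, which arguably matches the question's wording ("more extreme") more closely. A sketch with the same n, x, and p:

```r
n <- 250
x <- 60
p <- 0.20

# One-sided: the alternative is that the hospital's death proportion is greater than p
one_sided <- binom.test(x, n, p = p, alternative = "greater")
# Two-sided: the alternative is that the proportion differs from p in either direction
two_sided <- binom.test(x, n, p = p, alternative = "two.sided")

print(c(one_sided = one_sided$p.value, two_sided = two_sided$p.value))
print(one_sided$estimate)   # observed proportion x / n
```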
Poisson
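# Poisson model with the same expected number of deaths, lambda = n * p = 50
lambda <- n * p
pois_prob <- ppois(x - 1,
                   lambda = lambda,
                   lower.tail = FALSE)  # P(X >= x) under the Poisson model
print(pois_prob)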
## [1] 0.09226505
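Similarly, poisson.test() from base R's stats package gives the exact one-sided Poisson tail probability when the expected count λ = n·p = 50 is supplied as the rate r over a single observation period (T = 1). A sketch:

```r
n <- 250
x <- 60
p <- 0.20
lambda <- n * p   # expected deaths under the national rate

# One-sided exact Poisson test: is the observed count larger than expected?
pois_test <- poisson.test(x, T = 1, r = lambda, alternative = "greater")
print(pois_test$p.value)
```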
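The Poisson distribution is a limiting case of the binomial: as the number of trials n grows and the probability of success p shrinks, with λ = np held fixed, the binomial approaches the Poisson. In my example the two answers are similar but not the same (0.069 vs 0.092). The gap comes from π = 0.20 not being small: the binomial variance np(1 − p) = 40 is smaller than the Poisson variance λ = np = 50, so the Poisson model puts more probability in the upper tail. Increasing the number of procedures while keeping π = 0.20 would not make the two distributions converge, because the variance ratio stays at 1 − p = 0.8; the answers would only become closer if π were much smaller. In either case, both one-sided tail probabilities are above 0.05, so neither model gives strong evidence at the 5% level that this hospital's death proportion exceeds the national proportion.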
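To see the limiting relationship, the sketch below holds the expected count at λ = n·p = 50 and the threshold at 60 deaths, then shrinks p while growing n. Only the (250, 0.20) pair comes from the example; the other (n, p) pairs are hypothetical. The binomial tail probability should move toward the fixed Poisson value as p gets small, because the binomial variance n·p·(1 − p) approaches λ.

```r
lambda <- 50       # expected count n * p, held fixed
threshold <- 60    # observed number of deaths

# (n, p) pairs with n * p = 50; only the first comes from the example above
n_vals <- c(250, 1000, 5000, 50000)
p_vals <- lambda / n_vals

# pbinom() is vectorized over size and prob, giving one tail probability per pair
binom_tail <- pbinom(threshold - 1, size = n_vals, prob = p_vals, lower.tail = FALSE)
pois_tail <- ppois(threshold - 1, lambda = lambda, lower.tail = FALSE)

print(data.frame(
  n = n_vals,
  p = p_vals,
  binom_variance = n_vals * p_vals * (1 - p_vals),
  binom_tail = binom_tail,
  pois_tail = pois_tail
))
```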