Discussion

Part 1

A) Please explain each of the 3 distributions in less than 4 sentences.

Normal Distribution: A continuous distribution defined by two parameters, the mean and the standard deviation. It has a symmetric, bell-shaped curve in which the mean marks the center and the standard deviation measures the spread of the data.
Binomial: A discrete probability distribution that models the number of successes in a fixed number of independent trials, each with two mutually exclusive outcomes (success or failure). It is defined by the number of trials and the probability of success, which is the same on every trial.
Poisson: A discrete probability distribution most commonly used to model the number of rare or random events in a fixed interval. It is defined by a single rate parameter, lambda (λ), the average number of events per interval; x denotes the observed count.

B) Explain what the pdf and cdf of a distribution measures. Pick any of the three distributions (or a distribution from the list above that we have not covered in class), and provide some intuition as to if the pdf formula makes sense or not.

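The PDF of a continuous random variable gives the relative likelihood (density) of values near a given point. Probabilities come from the area under the curve over an interval, since the probability of any single exact value is zero. For a discrete distribution the analogous function is the probability mass function (PMF), which gives the probability of each specific value. The CDF applies to both discrete and continuous distributions and measures the cumulative probability that the variable is less than or equal to a given value.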
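A short R sketch of the PDF/CDF distinction. The parameter values here (a standard normal and a binomial with 10 trials and success probability 0.5) are illustrative assumptions, not part of the assignment.

```r
# Illustrative values only (assumptions, not from the assignment).

# Continuous case: the density is a curve height, probabilities come from the CDF.
dens_at_0  <- dnorm(0)                 # height of the density at x = 0, not a probability
p_interval <- pnorm(1) - pnorm(-1)     # P(-1 <= X <= 1) from the CDF
area <- integrate(dnorm, lower = -1, upper = 1)$value  # same probability as area under the PDF

# Discrete case: the PMF gives P(X = k), and the CDF is its running sum.
k   <- 0:10
pmf <- dbinom(k, size = 10, prob = 0.5)
cdf <- pbinom(k, size = 10, prob = 0.5)

print(c(from_cdf = p_interval, from_area = area))
print(all.equal(cumsum(pmf), cdf))
```

The two printed probabilities should agree, and the running sum of the PMF should reproduce the CDF.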

Normal Distribution: The PDF formula, f(x) = 1 / (σ√(2π)) · exp(−(x − μ)² / (2σ²)), makes sense. The mean μ locates the center of the distribution, and the standard deviation σ controls the dispersion; both are intuitive given the symmetric bell shape. Because x enters only through its squared distance from the mean, scaled by σ, the density is the same on either side of μ and falls off as x moves further from the mean.
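One way to check this intuition is to code the standard normal density formula by hand and compare it with R's dnorm(). This is a minimal sketch: normal_pdf is a hypothetical helper name, and the mean of 5 and standard deviation of 1 match the values used in the plotting code for part E.

```r
# Hand-coded normal density, written directly from the textbook formula.
# normal_pdf is a hypothetical helper used only for this check.
normal_pdf <- function(x, mu, sigma) {
  1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
}

mu    <- 5
sigma <- 1
x_check <- c(2, 4, 5, 6, 8)   # points placed symmetrically around the mean

by_hand  <- normal_pdf(x_check, mu, sigma)
built_in <- dnorm(x_check, mean = mu, sd = sigma)

print(all.equal(by_hand, built_in))        # formula matches dnorm()
print(all.equal(by_hand[2], by_hand[4]))   # same height one sd below and above the mean
print(which.max(by_hand) == 3)             # the peak sits at the mean
```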

C) What are the key parameters that define the 3 distributions above (or a distribution from the list above)? Does R require these key parameters to be declared ? Type the “?distribution” command in R to find out.

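Normal Distribution: It is defined by two key parameters, the mean and the standard deviation.
Binomial: It is defined by the number of trials and the probability of success on each trial.
Poisson: It is defined by one parameter, lambda (λ), the average number of events in a fixed interval of time or space.

R does not require every parameter to be declared. For the normal distribution, dnorm(), pnorm(), qnorm(), and rnorm() default to mean = 0 and sd = 1, so they run without them. For the binomial (size and prob) and the Poisson (lambda) there are no defaults, so those parameters must be supplied along with the value(s) of the random variable.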
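A quick way to see which parameters R insists on is to inspect the function arguments and try calling each density function with only the random variable. This is a sketch; the input values 0.5 and 3 are arbitrary.

```r
# Argument names of each density function, as reported by R itself.
print(names(formals(dnorm)))
print(names(formals(dbinom)))
print(names(formals(dpois)))

# dnorm() falls back on mean = 0 and sd = 1 when they are omitted.
normal_default <- isTRUE(all.equal(dnorm(0.5), dnorm(0.5, mean = 0, sd = 1)))

# dbinom() and dpois() have no defaults for their parameters,
# so leaving them out raises an error, which tryCatch() turns into FALSE.
binom_runs <- tryCatch({ dbinom(3); TRUE }, error = function(e) FALSE)
pois_runs  <- tryCatch({ dpois(3);  TRUE }, error = function(e) FALSE)

print(c(normal_default = normal_default,
        binom_runs = binom_runs,
        pois_runs = pois_runs))
```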

D) Give a few examples of situations that can be modeled with each of the 3 distributions above. You can try to read Chapter 1.3 Parametric Families of Distribution in Introduction to Statistical Thought by Michael Lavine (recommended textbook).

Normal Distribution: This is commonly used to model the height or weight distribution of adults in a given country or region. It can even extend to something such as job satisfaction levels across a company. Overall, the normal distribution tends to be one of the most commonly used distributions.
Binomial: As mentioned previously, the binomial distribution is best used when the number of trials is fixed, each trial has only a success or failure outcome, and the objective is to find the probability of a specific number of successes. For this reason, flipping a coin a fixed number of times and counting how many times it lands heads is a good example. Another situation is modeling the number of successes in a clinical trial of a new drug where each patient's outcome is only success or failure.
Poisson: Situations that can be modeled with a Poisson distribution include the number of car crashes occurring at a given intersection in a given time period (month or year). It can even extend to the number of customers arriving at a store in a given time interval (hour or day).
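The three kinds of situations above can be mimicked with R's random generators. This is only a sketch: every parameter value below (mean height of 170 cm, 20 coin flips, 3 crashes per month) is made up for illustration.

```r
set.seed(1)  # arbitrary seed so the draws are reproducible

# All parameter values are illustrative assumptions.
heights <- rnorm(1000, mean = 170, sd = 10)     # adult heights in cm
heads   <- rbinom(1000, size = 20, prob = 0.5)  # heads in 20 coin flips, repeated 1000 times
crashes <- rpois(1000, lambda = 3)              # crashes per month at one intersection

print(summary(heights))                              # roughly symmetric around 170
print(range(heads))                                  # always between 0 and 20
print(c(mean = mean(crashes), var = var(crashes)))   # both should be near lambda
```

The normal draws are continuous, while the binomial and Poisson draws are whole-number counts; the binomial counts are capped by the number of trials and the Poisson counts are not.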

E) Plot the distribution in part B (3 if you stick close to class notes, or 1 if you venture out). You can begin by reading up on the plot() function, and seeing the coded lecture examples

#Normal distribution
mean_n <- 5
sd_n <- 1

#Binomial distribution
size_b <- 15
prob_b <- 0.5

#Poisson distribution
lambda_p <- 6

#values for X
x <- seq(from = mean_n - 3*sd_n,
         to = mean_n + 3*sd_n,
         length.out = 200) # enough points for a smooth curve
x_b <- 0:size_b # a binomial count cannot exceed the number of trials
x_p <- 0:15     # covers nearly all of the Poisson mass for lambda = 6

# PDF (normal) and PMF (binomial, Poisson) values for each distribution
pdf_n <- dnorm(x,
               mean = mean_n,
               sd = sd_n)
pdf_b <- dbinom(x_b,
                size = size_b,
                prob = prob_b)
pdf_p <- dpois(x_p,
               lambda = lambda_p)


#plot normal
plot(x,
     pdf_n,
     type = "l",
     col = "blue",
     main = "Normal Distribution",
     xlab = "x",
     ylab = "PDF")
abline(v = mean_n, col = "darkgreen", lty = 3) # dotted line at the mean

#plot binomial
plot(x_b,
     pdf_b,
     type = "h",
     col = "blue",
     main = "Binomial Distribution",
     xlab = "Number of successes",
     ylab = "PMF")

#plot poisson
plot(x_p,
     pdf_p,
     type = "h",
     col = "blue",
     main = "Poisson Distribution",
     xlab = "Number of events",
     ylab = "PMF")

Part 2

Prompt

Let’s assume that a hospital’s neurosurgical team performed N procedures for in-brain bleeding last year. x of these procedures resulted in death within 30 days. If the national proportion for death in these cases is π, then is there evidence to suggest that your hospital’s proportion of deaths is more extreme than the national proportion?

In other words, pick your own values of N, x, and π.

x is necessarily less than or equal to N, and π is a fixed probability of success.

The probability of interest is that of observing x or more deaths, i.e. P(X ≥ x).

Then model both as a binomial and a Poisson, and provide your R code solutions. Do you get similar answers or not under the two different distributional assumptions, and can you guess why ?

Work

N = 250 procedures performed

x = 60 of these procedures resulted in death within 30 days

π = 0.20 (20%) as the national proportion for death in such cases

Binomial

n <- 250
x <- 60
p <- 0.20

# Calculate P(X >= x) under the binomial distribution
bin_prob <- pbinom(x - 1, # P(X > x - 1) is the same as P(X >= x)
                   size = n, # procedures performed
                   prob = p, # probability of death
                   lower.tail = FALSE) # FALSE returns the upper tail, P(X > q)

print(bin_prob)

## [1] 0.06885367
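The x - 1 in the call above matters because lower.tail = FALSE returns a strict "greater than" probability. The following sketch, using the same N, x, and π as the work above, checks the upper tail three ways and shows what happens if x is passed instead of x - 1.

```r
n <- 250; x <- 60; p <- 0.20   # values from the work above

upper_tail <- pbinom(x - 1, size = n, prob = p, lower.tail = FALSE)  # P(X > 59) = P(X >= 60)
complement <- 1 - pbinom(x - 1, size = n, prob = p)                  # 1 - P(X <= 59)
direct_sum <- sum(dbinom(x:n, size = n, prob = p))                   # P(X = 60) + ... + P(X = 250)
off_by_one <- pbinom(x, size = n, prob = p, lower.tail = FALSE)      # P(X > 60), leaves out X = 60

print(c(upper_tail = upper_tail,
        complement = complement,
        direct_sum = direct_sum,
        off_by_one = off_by_one))
```

The first three values should agree, while the last is smaller by exactly P(X = 60).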

Poisson

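lambda <- n * p # expected number of deaths under the national proportion

# Calculate P(X >= x) under the Poisson distribution
pois_prob <- ppois(x - 1, # P(X > x - 1) is the same as P(X >= x)
                   lambda,
                   lower.tail = FALSE) # FALSE returns the upper tail, P(X > q)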

print(pois_prob)

## [1] 0.09226505

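The Poisson distribution is a limiting case of the binomial: as the number of trials grows and the probability of success shrinks, with their product n·p held fixed, the two distributions come to resemble each other more closely. In my example the answers are similar but not the same, because π = 0.20 is not small. Both distributions have mean 50, but the binomial variance is n·p·(1 − p) = 40 while the Poisson variance is λ = 50, so the Poisson puts more probability in the upper tail. Increasing the number of procedures alone would not close this gap if π stayed at 0.20; the two answers would move closer only with a larger N paired with a smaller π.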
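The limiting behavior can be sketched numerically. The code below keeps the expected count at 50 (the n * p from the work above) and lets n grow so that p shrinks; the particular n values are arbitrary choices for illustration.

```r
lambda <- 50   # expected deaths, n * p from the work above
x <- 60

# Hold lambda fixed and let n grow, so that p = lambda / n shrinks.
n_values <- c(250, 1000, 10000, 100000)   # arbitrary illustrative sizes
bin_tail <- sapply(n_values, function(n) {
  pbinom(x - 1, size = n, prob = lambda / n, lower.tail = FALSE)
})
pois_tail <- ppois(x - 1, lambda, lower.tail = FALSE)

comparison <- data.frame(n        = n_values,
                         p        = lambda / n_values,
                         binomial = bin_tail,
                         poisson  = pois_tail,
                         gap      = abs(bin_tail - pois_tail))
print(comparison)
```

The gap column should shrink toward zero as p gets smaller, which is the sense in which the Poisson approximates the binomial.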

Discussion_6

Bryan Calderon