Part 1

A) Please explain each of the 3 distributions in less than 4 sentences.

  • Normal Distribution: A continuous distribution defined by two parameters, the mean and the standard deviation. It has a symmetric, bell-shaped curve in which the mean is the central value and the standard deviation measures the spread of the data.

  • Binomial: A discrete probability distribution that models the number of successes in a fixed number of independent trials, where each trial has exactly two possible outcomes (success or failure). It is defined by the number of trials and the probability of success, which is the same for every trial. The two outcomes are mutually exclusive, so each trial results in exactly one of them.

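  • Poisson: A discrete probability distribution most commonly used to model the number of rare or random events occurring in a fixed interval of time or space. It is defined by a single parameter, λ, the average number of events expected in the interval; the random variable x is the number of events that actually occur.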
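A quick simulation can make the parameters above concrete. This is a minimal sketch; the parameter values (mean 50 and sd 6 for the normal, 250 trials with success probability 0.20 for the binomial, λ = 50 for the Poisson) are arbitrary illustrative choices, not part of the assignment. It shows that the binomial mean is np and its variance np(1 − p), while the Poisson mean and variance are both λ.

```r
set.seed(1)          # reproducible random draws
draws <- 1e5         # number of simulated values per distribution

# Normal: parameters are the mean and the standard deviation
norm_draws <- rnorm(draws, mean = 50, sd = 6)

# Binomial: parameters are the number of trials (size) and P(success) (prob)
binom_draws <- rbinom(draws, size = 250, prob = 0.2)

# Poisson: single parameter lambda, the expected count per interval
pois_draws <- rpois(draws, lambda = 50)

# Theoretical values: normal mean 50 / var 36,
# binomial np = 50 / np(1 - p) = 40, Poisson mean = var = 50
summary_tab <- rbind(
  normal   = c(mean = mean(norm_draws),  var = var(norm_draws)),
  binomial = c(mean = mean(binom_draws), var = var(binom_draws)),
  poisson  = c(mean = mean(pois_draws),  var = var(pois_draws))
)
print(round(summary_tab, 2))
```

The simulated means and variances should land close to the theoretical values listed in the comment.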

B) Explain what the pdf and cdf of a distribution measure. Pick any of the three distributions (or a distribution from the list above that we have not covered in class), and provide some intuition as to whether the pdf formula makes sense or not.

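A PDF (probability density function) describes how probability is spread over the values of a continuous random variable. The density at a single point is not itself a probability (for a continuous variable, P(X = x) = 0); instead, the area under the PDF over an interval gives the probability that the variable falls in that interval. For discrete distributions such as the binomial and Poisson, the analogous function is the PMF (probability mass function), which does give P(X = x) directly. A CDF applies to both discrete and continuous distributions and measures the cumulative probability that the variable is less than or equal to a given value, F(x) = P(X ≤ x).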
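The relationship between the PDF/PMF and the CDF can be checked numerically in R. This is a minimal sketch; the cutoff 1.96, the sd of 0.1, and the Binomial(10, 0.5) example are arbitrary illustrative choices. It integrates `dnorm()` to recover `pnorm()`, shows that a density value can exceed 1, and sums `dbinom()` to recover `pbinom()`.

```r
# Continuous case: the area under the pdf up to 1.96 equals the CDF at 1.96
area <- integrate(dnorm, lower = -Inf, upper = 1.96)$value
cdf  <- pnorm(1.96)
print(c(area = area, cdf = cdf))   # both about 0.975

# A density value is not a probability: a narrow normal can exceed 1
print(dnorm(0, mean = 0, sd = 0.1))   # about 3.99

# Discrete case: the CDF is the running sum of the PMF,
# here P(X <= 3) for X ~ Binomial(size = 10, prob = 0.5)
pmf_sum   <- sum(dbinom(0:3, size = 10, prob = 0.5))
cdf_binom <- pbinom(3, size = 10, prob = 0.5)
print(c(pmf_sum = pmf_sum, cdf = cdf_binom))
```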

Normal Distribution: The normal PDF, f(x) = 1 / (σ√(2π)) · exp(−(x − μ)² / (2σ²)), makes intuitive sense. The mean μ locates the center of the distribution and the standard deviation σ measures the dispersion of the data, which matches the symmetric bell-curve shape. Because the exponent depends on x only through the squared distance (x − μ)², the density is highest at the mean and falls off equally on both sides as x moves further away; a larger σ slows this fall-off, giving a wider, flatter curve. The constant 1 / (σ√(2π)) scales the curve so that the total area under it equals 1.
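The intuition about the normal PDF, f(x) = 1 / (σ√(2π)) · exp(−(x − μ)² / (2σ²)), can be tested by coding the formula by hand. This is a minimal sketch; `normal_pdf` is a hypothetical helper written for illustration, and μ = 50, σ = 6 are arbitrary values. It checks the formula against R's built-in `dnorm()`, confirms the symmetry around the mean, shows that a larger σ lowers the peak, and verifies that the total area is 1.

```r
# Hypothetical helper: hand-written normal pdf (mu = mean, sigma = sd)
normal_pdf <- function(x, mu, sigma) {
  # 1 / (sigma * sqrt(2 * pi)) scales the curve so its total area is 1;
  # the exponent depends on x only through the squared distance (x - mu)^2
  1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
}

mu <- 50      # illustrative mean (arbitrary choice)
sigma <- 6    # illustrative standard deviation (arbitrary choice)
xs <- c(38, 44, 50, 56, 62)

# The hand-written formula agrees with R's built-in dnorm()
print(cbind(x = xs,
            by_hand = normal_pdf(xs, mu, sigma),
            dnorm = dnorm(xs, mean = mu, sd = sigma)))

# Symmetry: points equally far below and above the mean get the same density
print(all.equal(normal_pdf(mu - 6, mu, sigma), normal_pdf(mu + 6, mu, sigma)))

# A larger sigma lowers the peak (and widens the curve)
print(c(peak_sd6 = normal_pdf(mu, mu, 6), peak_sd12 = normal_pdf(mu, mu, 12)))

# The area under the curve is 1 (integrating over mu +/- 10 sigma)
total_area <- integrate(normal_pdf, mu - 10 * sigma, mu + 10 * sigma,
                        mu = mu, sigma = sigma)$value
print(total_area)
```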

C) What are the key parameters that define the 3 distributions above (or a distribution from the list above)? Does R require these key parameters to be declared? Type the “?distribution” command in R to find out.

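  • Normal Distribution: It is defined by two key parameters, the mean and the standard deviation (mean and sd in R).

  • Binomial: It is defined by the number of trials and the probability of success for those trials (size and prob in R).

  • Poisson: It is defined by the rate λ, the average number of events occurring in a fixed interval of time/space (lambda in R).

In R, the binomial and Poisson functions require these parameters to be declared along with the value of the random variable, e.g. dbinom(x, size, prob) and dpois(x, lambda). The normal functions are an exception: dnorm(x, mean = 0, sd = 1) has default values, so omitting mean and sd gives the standard normal distribution.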
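Which parameters R actually requires can be confirmed from the function signatures themselves, the same information shown on the `?dnorm`, `?dbinom`, and `?dpois` help pages. This is a minimal sketch; the evaluation point 3 is arbitrary. It prints each function's arguments and shows that `dnorm()` falls back to its defaults while `dbinom()` and `dpois()` stop with an error when their parameters are missing.

```r
# Inspect each density function's arguments and default values
print(formals(dnorm))   # mean = 0 and sd = 1 have defaults
print(args(dbinom))     # size and prob have no defaults
print(args(dpois))      # lambda has no default

# dnorm() uses the standard normal when mean and sd are omitted
print(dnorm(1) == dnorm(1, mean = 0, sd = 1))

# dbinom() and dpois() raise an error when their parameters are missing;
# tryCatch() captures the error message instead of stopping the script
binom_missing <- tryCatch(dbinom(3), error = function(e) conditionMessage(e))
pois_missing  <- tryCatch(dpois(3),  error = function(e) conditionMessage(e))
print(binom_missing)
print(pois_missing)
```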

Part 2

Prompt

Let’s assume that a hospital’s neurosurgical team performed N procedures for in-brain bleeding last year. x of these procedures resulted in death within 30 days. If the national proportion for death in these cases is π, then is there evidence to suggest that your hospital’s proportion of deaths is more extreme than the national proportion?

In other words, pick your own values of N, x, and π.

x is necessarily less than or equal to N, and π is a fixed probability of success.

The probability of interest is P(X ≥ x), the chance of observing x or more deaths.

Then model both as a binomial and a Poisson, and provide your R code solutions. Do you get similar answers or not under the two different distributional assumptions, and can you guess why?

Work

N = 250 procedures performed

x = 60 of these procedures resulted in death within 30 days

π = 0.20 (20%) as the national proportion for death in such cases

Binomial

n <- 250
x <- 60
p <- 0.20

# Upper-tail probability P(X >= x) under the binomial model
bin_prob <- pbinom(x - 1,            # P(X > x - 1) is the same as P(X >= x)
                   size = n,         # number of procedures performed
                   prob = p,         # national probability of death
                   lower.tail = FALSE)  # FALSE returns the upper tail, P(X > q)

print(bin_prob)
## [1] 0.06885367
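As a cross-check, R's exact binomial test computes the same one-sided tail probability. This is a minimal sketch that reuses the values chosen above (N = 250, x = 60, π = 0.20). It also shows the equivalent sum of PMF terms and, since "more extreme" can be read in both directions, one possible two-sided p-value from `binom.test()`.

```r
n <- 250; x <- 60; p <- 0.20

# Exact one-sided binomial test of H0: p = 0.20 vs H1: p > 0.20
bt <- binom.test(x, n, p = p, alternative = "greater")
print(bt$p.value)

# Direct upper-tail calculation, P(X >= 60) = P(X > 59)
print(pbinom(x - 1, size = n, prob = p, lower.tail = FALSE))

# Equivalently, summing the PMF over 60, 61, ..., 250
print(sum(dbinom(x:n, size = n, prob = p)))

# Two-sided reading of "more extreme" (deviation in either direction)
print(binom.test(x, n, p = p, alternative = "two.sided")$p.value)
```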

Poisson

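lambda <- n * p   # expected number of deaths under the national rate (50)

# Upper-tail probability P(X >= x) under the Poisson model
pois_prob <- ppois(x - 1,
                   lambda = lambda,
                   lower.tail = FALSE)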

print(pois_prob)
## [1] 0.09226505
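The Poisson result can be cross-checked the same way with `poisson.test()`, treating the N procedures as the "time base" and π as the hypothesized death rate per procedure. This is a minimal sketch reusing N = 250, x = 60, π = 0.20; with these arguments the test's hypothesized mean is r × T = 50, the same λ used above.

```r
n <- 250; x <- 60; p <- 0.20
lambda <- n * p   # expected deaths under the national rate

# Exact one-sided Poisson test: x deaths over T = n procedures,
# hypothesized rate r = p deaths per procedure (H1: rate > p)
pt <- poisson.test(x, T = n, r = p, alternative = "greater")
print(pt$p.value)

# Same value as the direct tail calculation P(X >= 60), X ~ Poisson(50)
print(ppois(x - 1, lambda = lambda, lower.tail = FALSE))
```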

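The Poisson distribution is a limiting case of the binomial: as the number of trials n grows and the probability of success p shrinks with np held fixed, the binomial approaches a Poisson with λ = np. In my example the two answers are similar but not identical (0.069 for the binomial vs. 0.092 for the Poisson). The main reason is that π = 0.20 is not small: the Poisson variance is λ = np = 50, while the binomial variance is np(1 − p) = 40, so the Poisson has a heavier upper tail and gives a larger probability. Increasing the number of procedures while keeping π = 0.20 would not by itself make the Poisson a good approximation, because the binomial variance stays at 80% of the Poisson variance; the two answers become closer when the probability of death is small relative to the number of procedures.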
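One way to see when the Poisson approximation works well is to hold the expected count np = 50 fixed and let p shrink while n grows. This is a minimal sketch; the (n, p) pairs below are hypothetical settings chosen for illustration, and x = 60 matches the example. The binomial variance np(1 − p) approaches the Poisson variance of 50 and the gap between the two tail probabilities narrows.

```r
x <- 60
lambda <- 50   # expected number of deaths held fixed at np = 50

# Hypothetical (n, p) pairs with the same np but shrinking p
settings <- data.frame(n = c(250, 1000, 5000), p = c(0.20, 0.05, 0.01))

# P(X >= 60) under each binomial, and under the single Poisson(50)
settings$binomial  <- pbinom(x - 1, size = settings$n, prob = settings$p,
                             lower.tail = FALSE)
settings$poisson   <- ppois(x - 1, lambda = lambda, lower.tail = FALSE)
settings$binom_var <- settings$n * settings$p * (1 - settings$p)  # np(1 - p)
settings$gap       <- abs(settings$binomial - settings$poisson)

print(settings)
```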