Week 3 Discussion

Part I

A) Please explain each of the 3 distributions in less than 4 sentences.

A Normal distribution is a continuous distribution that is symmetric about its mean and is represented by the standard bell-shaped curve. Any Normal distribution can be standardized by converting its values to z-scores, which lets you compare values from distributions that have different means and standard deviations.
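As a minimal sketch of standardization, the snippet below converts two values measured on different scales into z-scores. The height parameters reuse the mean of 68 and standard deviation of 2 from the height example in part E; the exam mean of 500 and standard deviation of 100 are hypothetical values picked only for illustration, and `z_score` is a helper name I made up.

```r
# z-score: how many standard deviations a value lies from its mean
z_score <- function(value, mu, sigma) (value - mu) / sigma

# a height of 70 inches, with mean 68 and sd 2 (from the height example)
z_height <- z_score(70, 68, 2)

# an exam score of 650, with a hypothetical mean of 500 and sd of 100
z_exam <- z_score(650, 500, 100)

z_height
z_exam
```

Once both values are on the z scale, the larger z-score is the one that sits further above its own mean, even though the raw units differ.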

A Binomial distribution is a discrete probability distribution for the number of successes in a fixed number of trials, where each trial has only two outcomes. The trials are independent of each other and share the same probability of success.

A Poisson distribution is a discrete probability distribution used to model how many times an event happens during a specified interval. These intervals can be time, distance, area, or volume. The Poisson distribution is typically used when there is a large number of opportunities for the event, each with a very small probability.
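The link between many opportunities and small probabilities can be checked numerically. This is a rough sketch with hypothetical values (n = 1000 trials, p = 0.003) that compares Binomial probabilities with Poisson probabilities using lambda = n * p.

```r
# hypothetical setup: many trials, each with a small success probability
n <- 1000
p <- 0.003
k <- 0:10

binom_probs <- dbinom(k, n, p)
pois_probs <- dpois(k, n * p)  # lambda = n * p = 3

# largest absolute gap between the two sets of probabilities
max_diff <- max(abs(binom_probs - pois_probs))
max_diff
```

A small `max_diff` suggests the two models are nearly interchangeable in this regime.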

B) Explain what the pdf and cdf of a distribution means. Pick any of the three distributions, and provide some intuition as to if the pdf formula makes sense or not

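In a continuous distribution, the probability of any single exact value is zero, so probabilities are only meaningful over ranges. That’s where the PDF and CDF come into play. The PDF (probability density function) gives the relative likelihood of values, and the area under it over a range is the probability of landing in that range. The CDF (cumulative distribution function) is the accumulated area under the PDF, so it gives the probability of everything to the left of a point, P(X <= x). The PDF formula makes sense for a Normal distribution: the density is highest at the mean and falls off symmetrically as the squared distance from the mean grows, which produces the bell shape.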
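To check that intuition, here is a small sketch that writes the Normal PDF out from its textbook formula and compares it with R's `dnorm`, then confirms that the area under the PDF up to a point matches `pnorm`. The function name `normal_pdf` and the test point 1.5 are my own choices for illustration.

```r
# Normal PDF written out from its formula
normal_pdf <- function(x, mu = 0, sigma = 1) {
  exp(-(x - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi))
}

# the hand-written formula should agree with dnorm
pdf_gap <- abs(normal_pdf(1.5) - dnorm(1.5))

# the CDF at 1.5 is the area under the PDF from -Inf to 1.5
area <- integrate(dnorm, lower = -Inf, upper = 1.5)$value
cdf_gap <- abs(area - pnorm(1.5))

pdf_gap
cdf_gap
```

Both gaps should be close to zero, with the second limited by the accuracy of the numerical integration.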

C) What are the key parameters that define the 3 distributions above? Does R require these key parameters to be declared?

The key parameters for a Normal distribution are the mean and the standard deviation. In R, the mean and standard deviation aren’t required to be declared; if they are omitted, they default to 0 and 1 respectively.

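The key parameters for a Binomial distribution are the number of trials (n) and the probability of success (p). In R, both size (n) and prob (p) are required and have no defaults, so leaving one out produces an error. If an invalid value is supplied (for example, a probability above 1), you will receive a NaN with a warning.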

The key parameter for a Poisson distribution is the rate, also known as lambda. This value is required in R, and if lambda is invalid (for example, negative) you will receive a NaN value.
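The behaviors above can be probed directly. This is a quick sketch, assuming base R's `dnorm`, `dbinom`, and `dpois`; the variable names are my own.

```r
# Normal: mean and sd fall back to 0 and 1 when omitted
same_default <- identical(dnorm(0.5), dnorm(0.5, mean = 0, sd = 1))

# Binomial: size and prob have no defaults, so omitting them is an error
missing_size <- tryCatch(
  {
    dbinom(2)
    "no error"
  },
  error = function(e) "error"
)

# Poisson: an invalid (negative) lambda returns NaN with a warning
bad_lambda <- suppressWarnings(dpois(1, lambda = -1))

same_default
missing_size
is.nan(bad_lambda)
```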

D) Give a few examples of situations that can be modeled with each of the 3 distributions above.

Normal Distribution examples: IQ of people, SAT Scores, and the height of people

Binomial Distribution examples: whether a cure worked or not, how many times heads shows when flipping a coin 10 times, or whether people responded to an email or not.

Poisson Distribution examples: support tickets per day, cars arriving at an intersection, and the number of times an elevator fails in a year.

E) Plot the distribution in part B

Normal distribution graph:

Height among males, given the mean is 68 inches with a standard deviation of 2 inches.

# creating a sequence of x values
x <- seq(60, 76, by = 0.1)

# Using PDF to find the Y axis values
y <- dnorm(x, 68, 2)

plot(
  x,
  y,
  type = 'l',
  xlab = 'Height in inches',
  ylab = 'Density',
  main = 'Normal Distribution of Height Among Males'
)

Binomial distribution graph:

What is the probability of rolling a 4 exactly two times if you roll a die 6 times? n = 6, p = 1/6 (about 0.1667), x = 2

# number of trials
n <- 6

# probability of success
p <- 1 / 6

# x values (number of successes)
x <- 0:n

# calculating the probabilities for each value of x
prob <- dbinom(
  x,
  n,
  p
)

# plot the Binomial distribution
barplot(
  height = prob,
  names.arg = x,
  main = 'Binomial Distribution',
  xlab = 'Number of Successes',
  ylab = 'Probability'
)
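The plot shows the whole distribution, but the question asks about one bar. As a short sketch, the probability of exactly two 4s can be read off with `dbinom` and cross-checked against the Binomial formula; `p_two_fours` and `by_formula` are names I introduced here.

```r
# P(X = 2) with n = 6 rolls and p = 1/6
p_two_fours <- dbinom(2, size = 6, prob = 1 / 6)

# same value from the formula: choose(n, x) * p^x * (1 - p)^(n - x)
by_formula <- choose(6, 2) * (1 / 6)^2 * (5 / 6)^4

p_two_fours
by_formula
```

The result is roughly 0.20, so two 4s in six rolls happens about one time in five.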

Poisson distribution graph:

An intersection sees on average 7 cars a day. What is the probability that 10 cars appeared at the intersection in one day? Lambda = 7 per day

lambda <- 7

# x values (number of events)
x <- 0:20

# calculate the PMF using dpois
pmf <- dpois(x, lambda)

# plot the pmf
plot(
  x,
  pmf,
  type = 'h',
  xlab = 'Number of events',
  ylab = 'Probability',
  main = 'Poisson Distribution'
)
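To answer the question itself rather than only plot it, a minimal sketch evaluates the PMF at 10 and checks it against the Poisson formula; the variable names are my own.

```r
# P(X = 10) when lambda = 7 cars per day
p_ten_cars <- dpois(10, lambda = 7)

# same value from the formula: exp(-lambda) * lambda^x / x!
by_formula <- exp(-7) * 7^10 / factorial(10)

p_ten_cars
by_formula
```

The probability comes out at roughly 0.07, which matches the height of the spike at 10 in the plot.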

Part II

Let’s assume that a hospital’s neurosurgical team performed 25 procedures for in-brain bleeding last year. 3 of these procedures resulted in death within 30 days. If the national proportion for death in these cases is 0.20, then is there evidence to suggest that your hospital’s proportion of deaths is more extreme than the national proportion?

N = 25, x = 3, \(\pi\) = 0.20

A) Model both as a binomial and a Poisson, and provide your R code solutions.

Binomial:

# setting variables (prop holds the national proportion, so R's built-in pi is not overwritten)
N <- 25
x <- 3
prop <- 0.20
# probability statement P(X >= 3 | N = 25, pi = 0.20)
binomial <- sum(dbinom(x:N, N, prop))
binomial

## [1] 0.9017748

Poisson:

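# setting variables (prop holds the national proportion, so R's built-in pi is not overwritten)
N <- 25
x <- 3
prop <- 0.20
# expected number of deaths
lambda <- N * prop

# probability statement P(Y >= 3 | lambda = 5)
poisson <- 1 - ppois(x - 1, lambda)
poisson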

## [1] 0.875348
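Both calculations above use the upper tail, P(X >= 3). Since the observed 3 deaths fall below the expected 25 * 0.20 = 5, "more extreme" could also be read as 3 or fewer deaths. The following is only a sketch of that alternative reading, not a replacement for the answers above; the variable names are my own.

```r
# alternative reading (an assumption): probability of 3 or fewer deaths
lower_binom <- pbinom(3, size = 25, prob = 0.20)
lower_pois <- ppois(3, lambda = 25 * 0.20)

lower_binom
lower_pois
```

Under this reading both tail probabilities are still well above 0.05, so neither version points to strong evidence that the hospital differs from the national proportion.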

B) Do you get similar answers or not under the two different distributional assumptions, and can you guess why?

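Yes, the answers are similar: the Binomial gave me 0.9018 and the Poisson gave me 0.8753. I assume the results are close because the Poisson approximates the Binomial when the number of trials is large and the probability of the event is small, with lambda = N * pi. Here N = 25 and pi = 0.20 fit that condition only loosely, which may explain why the two answers differ by about 0.03.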