1. Foundational Distributions:

Please explain each of the 3 distributions in less than 4 sentences. A Normal Distribution is a continuous, bell-shaped distribution that is symmetric around its center, so the mean, median, and mode are all the same value. A Binomial Distribution describes the count of successes in a fixed number of independent trials, where each trial has 2 mutually exclusive outcomes and the same probability of success. A Poisson Distribution describes the number of events that occur in a fixed unit of time when events happen independently at a constant average rate; it also approximates the Binomial Distribution when the number of trials is large and the probability of success is small.

A. Explain what the pdf and cdf of a distribution measures. The probability density function (pdf), sometimes referred to as the probability distribution function, is the equation that defines the ‘curve’ representing the relative likelihood of each value of a random variable; the total area under that curve equals 1. The cumulative distribution function (cdf) gives the ‘cumulative’ area under that curve up to a given value, that is, the probability that the random variable is less than or equal to that value.
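As a rough numerical check of the two statements above, the sketch below integrates the standard normal pdf in R. It assumes the standard normal (mean 0, sd 1) purely for illustration; the names `total_area`, `area_to_q`, and `cdf_at_q` are made up for this example.

```r
# Area under the standard normal pdf over the whole real line (should be about 1)
total_area <- integrate(dnorm, lower = -Inf, upper = Inf)$value

# Area under the pdf up to q, compared with the cdf evaluated at q
q <- 1
area_to_q <- integrate(dnorm, lower = -Inf, upper = q)$value
cdf_at_q <- pnorm(q)

print(total_area)
print(c(area_to_q, cdf_at_q))
```

The two printed values in the last line should agree to several decimal places, since the cdf is the accumulated area under the pdf.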

B. Pick any of the three distributions (or a distribution from the list above that we have not covered in class), and provide some intuition as to if the pdf formula makes sense or not. The pdf formula for a Binomial is:

\(P(x) = {n \choose x}p^x(1 - p)^{n-x}\)

which makes sense because \(p^x(1 - p)^{n-x}\) calculates the probability of one specific sequence containing the needed number of successes and failures, but doesn’t take into account the different orders in which those successes and failures can occur, so we need to multiply it by \({n \choose x}\), the number of such orderings, to get the probability across every ordering.
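One way to convince yourself of this is to list every possible sequence for a small case and count. The sketch below uses illustrative values (n = 4, x = 2, p = 0.3) that are not from the assignment, and the variable names are hypothetical.

```r
n <- 4
x <- 2
p <- 0.3

# Every possible sequence of n trials (1 = success, 0 = failure): 2^n rows
seqs <- expand.grid(rep(list(0:1), n))

# Keep only the sequences with exactly x successes
hits <- seqs[rowSums(seqs) == x, ]
n_orderings <- nrow(hits)            # compare with choose(n, x)

# Probability of any one such sequence, then of all of them together
one_ordering <- p^x * (1 - p)^(n - x)
by_enumeration <- n_orderings * one_ordering

print(c(n_orderings, choose(n, x)))
print(c(by_enumeration, dbinom(x, size = n, prob = p)))
```

Each pair of printed values should match: the count of orderings agrees with `choose(n, x)`, and the enumerated probability agrees with `dbinom()`.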

C. What are the key parameters that define the 3 distributions above (or a distribution from the list above)? Does R require these key parameters to be declared? Type the “?distribution” command in R to find out.

Normal Distribution - R only requires one argument, a value or vector of values of the random variable. The mean and standard deviation default to 0 and 1, so they only need to be specified if it is not a Standard Normal Distribution.

Binomial Distribution - R requires a single value or vector of success counts, the number of trials (size), and the probability of success on each trial (prob).

Poisson Distribution - R requires the number of events observed in one unit of time and the rate (lambda), which is the expected number of events per unit of time.

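note: all 3 of these also accept logical (TRUE/FALSE) parameters. In the density functions (dnorm, dbinom, dpois), log = TRUE returns the natural log of the density. In the cumulative functions (pnorm, pbinom, ppois), lower.tail = TRUE returns P(X ≤ x) and lower.tail = FALSE returns P(X > x).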
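A small sketch of how the defaults and the logical arguments behave, using arbitrary illustrative values (none of them come from the assignment):

```r
# dnorm() falls back to mean = 0 and sd = 1 when they are omitted
same_default <- isTRUE(all.equal(dnorm(0.5), dnorm(0.5, mean = 0, sd = 1)))

# lower.tail = TRUE (the default) gives P(X <= q); FALSE gives P(X > q)
lower <- pbinom(3, size = 10, prob = 0.5)
upper <- pbinom(3, size = 10, prob = 0.5, lower.tail = FALSE)

# log = TRUE returns the natural log of the density
log_dens <- dpois(4, lambda = 2, log = TRUE)

print(same_default)
print(lower + upper)                       # the two tails together cover everything
print(c(log_dens, log(dpois(4, lambda = 2))))
```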

4. Give a few examples of situations that can be modeled with each of the 3 distributions above. You can try to read Chapter 1.3 Parametric Families of Distribution in Introduction to Statistical Thought by Michael Lavine, the recommended textbook.

Normal Distribution Examples - Test scores, birth rates, heights, and weights, all of which tend to be approximately normally distributed.

Binomial Distribution Examples - Surveys with Yes/No questions, flipping a coin, quality assurance (good vs. bad)

Poisson Distribution Examples - Daily Visitors to a National Park, number of requests on a web server, number of times awakened by noise each night.

5. Plot the distribution in part B (3 if you stick close to class notes, or 1 if you venture out). You can begin by reading up on the plot() function, and seeing the coded lecture examples - https://rpubs.com/sharmaar2/Distributions.

Normal Distribution Plot

The average number of goals scored by the winning team in a 2022-23 regular season hockey game is 3.09, with a standard deviation of 1.74. What is the probability that a randomly selected winning team scored fewer than 2 goals?

mu <- 3.09
sigma <- 1.74
st3 <- 3*sigma

# Generate a range of values around the mean

x <- seq(from = mu - st3, to = mu + st3, length.out =  2000)

# Calculate the probability density function

myPdf <- dnorm(x,mu,sigma)

# Plot the normal distribution

plot(x,myPdf,type="l",col="red",lwd=2,xlab="Goals",ylab="Density",main = "Normal Distribution with Mean 3.09 and SD 1.74")
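The plot shows the density, but the question itself asks for a probability. A minimal sketch of that calculation, assuming the normal model is an acceptable approximation for goal counts (which are really whole numbers) and reading "fewer than 2" as the area to the left of 2; `p_below_2` is a name introduced here:

```r
mu <- 3.09
sigma <- 1.74

# Area under the normal curve to the left of 2 goals
p_below_2 <- pnorm(2, mean = mu, sd = sigma)
print(p_below_2)

# Same answer via the z-score and the standard normal cdf
z <- (2 - mu) / sigma
print(pnorm(z))
```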

Binomial Distribution Plot

Based on polling, the probability that Donald Duck is favored in a survey of Mickey Mouse is .63. If 100 people vote in the survey, what is the probability that Donald Duck gets fewer than half the votes?

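n <- 100
px <- .63
x <- 0:49   # fewer than half of the 100 votes

# P(X <= 49): add up the probabilities of 0 through 49 votes
psums <- sum(dbinom(x,n,px))
psums

# Plot the distribution

bdist <- 0:100
probs <- dbinom(bdist, n, px)
# names.arg labels the bars; a second positional argument would be read as bar widths
barplot(probs, names.arg = bdist, col = "red", main = "Binomial Distribution", 
xlab = "Number of Votes for Donald Duck", 
ylab = "Probability")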
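As a cross-check on the summed probabilities, the same quantity can be read directly off the binomial cdf. This is only a sketch of an alternative route to the same number; `p_sum` and `p_cdf` are names introduced for the comparison.

```r
n <- 100
px <- 0.63

# P(X <= 49) by adding individual probabilities
p_sum <- sum(dbinom(0:49, size = n, prob = px))

# P(X <= 49) straight from the cumulative distribution function
p_cdf <- pbinom(49, size = n, prob = px)

print(c(p_sum, p_cdf))
```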

Poisson Distribution Plot

The expected number of cars entering the North Gate of Yosemite National Park is 33 per hour. What is the probability that 20 to 30 cars enter in an hour?

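# P(20 <= X <= 30) = P(X <= 30) - P(X <= 19)
ppois(q = 30, lambda = 33, lower.tail = TRUE) - 
    ppois(q = 19, lambda = 33, lower.tail = TRUE)
## [1] 0.334441

# Plot the distribution
success <- 0:60
lambda <- 33
plot(success, dpois(success, lambda = lambda), type = "h",
xlab = "Cars per Hour", ylab = "Probability", main = "Poisson Distribution with Rate 33")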

PART II. Convergence of Distributions

BACKGROUND: Often, we can model processes using several different probability distributions. For example, we might use the Poisson instead of the binomial (if n>20 and np<10, i.e. large n and small p) as we did in class, the binomial instead of the geometric (both are repetitions of independent Bernoulli trials), or the normal approximation instead of the binomial (if np>10 and nq>10, i.e. n is large). If the assumptions are understood, then the probability results will be nearly identical.
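To see how close these approximations get, here is a small sketch using made-up values chosen to satisfy the rules of thumb above (they are not from the assignment). The normal comparison adds a continuity correction of 0.5, which is a common convention rather than something stated in the background text.

```r
# Poisson in place of binomial: large n, small p (np = 5 < 10)
n1 <- 1000
p1 <- 0.005
k1 <- 0:20
pois_gap <- max(abs(dbinom(k1, size = n1, prob = p1) - dpois(k1, lambda = n1 * p1)))

# Normal in place of binomial: np = 50 and nq = 50, both above 10
n2 <- 100
p2 <- 0.5
exact_cdf <- pbinom(45, size = n2, prob = p2)
normal_cdf <- pnorm(45.5, mean = n2 * p2, sd = sqrt(n2 * p2 * (1 - p2)))
norm_gap <- abs(exact_cdf - normal_cdf)

print(pois_gap)   # largest pointwise difference between the two pmfs
print(norm_gap)   # difference between exact and approximate P(X <= 45)
```

Both gaps should come out small, which is what "nearly identical" means in practice when the conditions hold.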

TASK:

Let’s assume that a hospital’s neurosurgical team performed N procedures for in-brain bleeding last year. x of these procedures resulted in death within 30 days. If the national proportion for death in these cases is \(\pi\), then is there evidence to suggest that your hospital’s proportion of deaths is more extreme than the national proportion?

Pick your own values of N, x, and \(\pi\). x is necessarily less than or equal to N, and \(\pi\) is a fixed probability of success. The probability to report should be that of observing a count greater than or equal to x.

Then model both as a binomial and a Poisson, and provide your R code solutions. Do you get similar answers or not under the two different distributional assumptions, and can you guess why? Hint: Build your code from Week 3 Inclass.R (attached below for convenience) and skim over Key Distributions.html to brush up on your basic concepts.

# As binomial
N <- 1000 # procedures per year
x <- 12 # deaths within 30 days
pi <- .0075 # national proportion (note: this masks R's built-in constant pi)
choose(n = N, k = x) * pi^x * (1-pi)^(N-x) # binomial formula written out by hand
## [1] 0.03642281
dbinom(x = x, size = N, prob = pi) # should give same result
## [1] 0.03642281
# As Poisson, with rate lambda = N * pi = 7.5 expected deaths
dpois(x = x, lambda = N * pi) # P(X = 12), comparable to the binomial values above
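Since the task asks about a count of x or more, one possible way to compare the two models is on the upper tail P(X ≥ x) rather than on a single value. The sketch below is an illustration under that reading, not the assignment's official solution; it reuses the chosen N and x, stores the proportion in `p` to avoid masking R's built-in `pi`, and takes the Poisson rate to be N times p.

```r
N <- 1000      # procedures
x <- 12        # observed deaths
p <- 0.0075    # national proportion

# P(X >= x) = P(X > x - 1), using the upper tail of each cdf
p_binom <- pbinom(x - 1, size = N, prob = p, lower.tail = FALSE)
p_pois <- ppois(x - 1, lambda = N * p, lower.tail = FALSE)

print(c(binomial = p_binom, poisson = p_pois))
```

With N large and p small, the two tail probabilities should be close, which is the situation where the Poisson stands in well for the binomial.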