Module 3 Discussion

Foundational Distributions:

A. Please explain each of the 3 distributions in less than 4 sentences.

Normal Distribution: A normal distribution is a continuous distribution, meaning the variable can take any real value rather than only whole numbers, so there are infinitely many possible values. It is the most widely known distribution and has a symmetric bell shape. Because the bell shape is symmetric with a single peak, the mean, median, and mode are all equal.
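As a quick check of the symmetry claim, the sketch below uses illustrative values (mean 98.6, standard deviation 1, both assumed rather than estimated from data) and asks R for the median, the location of the peak, and the density one unit on either side of the mean.

```r
# Illustrative parameters (assumed, not estimated from data)
mu    <- 98.6
sigma <- 1

# Median: the 50% quantile of the distribution
med <- qnorm(p = 0.5, mean = mu, sd = sigma)

# Mode: numerically locate the peak of the density
mode_est <- optimize(function(x) dnorm(x, mean = mu, sd = sigma),
                     interval = c(mu - 5, mu + 5),
                     maximum  = TRUE)$maximum

# Symmetry: same density one unit below and one unit above the mean
sym_ok <- isTRUE(all.equal(dnorm(mu - 1, mean = mu, sd = sigma),
                           dnorm(mu + 1, mean = mu, sd = sigma)))

print(c(mean = mu, median = med, mode = mode_est))
print(sym_ok)
```

The mode comes from a numerical search, so it matches the mean only up to the optimizer's tolerance.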

Binomial Distribution: A binomial distribution is a discrete distribution that counts the number of successes in a fixed number of identical, independent trials, so the possible outcomes are the finite set 0, 1, ..., n. Each trial has two possible outcomes: “success” or “failure”. The probability of success is fixed and remains the same for each individual trial.
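To connect this description to the formula, here is a small sketch that computes the binomial probabilities by hand, as choose(n, k) * p^k * (1 - p)^(n - k), and compares them with dbinom. The values n = 10 and p = 0.3 are arbitrary choices for illustration.

```r
n <- 10    # number of trials (illustrative)
p <- 0.3   # probability of success on each trial (illustrative)
k <- 0:n   # every possible number of successes

# Number of ways to place k successes among n trials, times the
# probability of any one such sequence
manual  <- choose(n, k) * p^k * (1 - p)^(n - k)
builtin <- dbinom(x = k, size = n, prob = p)

print(all.equal(manual, builtin))
print(sum(builtin))  # probabilities over all outcomes add up to 1
```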

Poisson Distribution: Like the binomial distribution, the Poisson distribution is discrete. It describes the number of times an event occurs during a specific interval, where the interval may be time, distance, area, or volume. Its only parameter is lambda, the mean number of events in the interval.
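The same kind of check works for the Poisson distribution. The sketch below, with an arbitrary lambda of 4, compares the formula exp(-lambda) * lambda^k / k! with dpois and confirms that the mean (and the variance) come out equal to lambda.

```r
lambda <- 4   # mean number of events per interval (illustrative)
k <- 0:100    # counts; the probability beyond 100 is negligible for lambda = 4

manual  <- exp(-lambda) * lambda^k / factorial(k)
builtin <- dpois(x = k, lambda = lambda)
print(all.equal(manual, builtin))

# Mean and variance computed directly from the probabilities
mean_est <- sum(k * builtin)
var_est  <- sum((k - mean_est)^2 * builtin)
print(c(mean = mean_est, variance = var_est))
```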

B. Explain what the pdf and cdf of a distribution measures. Pick any of the three distributions (or a distribution from the list above that we have not covered in class), and provide some intuition as to if the pdf formula makes sense or not.

The probability density function (“pdf”) applies to continuous distributions, where the variable can take infinitely many values. The pdf gives the relative likelihood (density) at each value, and the area under it over a range gives the probability of that range of outcomes, since the probability of any single exact outcome is zero. The cumulative distribution function (“cdf”) applies to both continuous and discrete (countable) distributions. The cdf gives the probability that the variable is less than or equal to a given value, so it is never negative, never decreases, and always lies between 0 and 1. For a discrete distribution, it is the sum of the probabilities of all values less than or equal to that value.
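A short sketch of how the two functions relate, using a standard normal and a small binomial chosen only for illustration: the area under the pdf between two points equals the difference of the cdf at those points, and for a discrete distribution the cdf is the running sum of the individual probabilities.

```r
# Continuous case: area under the pdf between -1 and 1
area     <- integrate(dnorm, lower = -1, upper = 1, mean = 0, sd = 1)$value
cdf_diff <- pnorm(1, mean = 0, sd = 1) - pnorm(-1, mean = 0, sd = 1)
print(c(area = area, cdf_diff = cdf_diff))

# Discrete case: the cdf is the cumulative sum of the probabilities
pmf <- dbinom(x = 0:5, size = 5, prob = 0.5)
cdf <- pbinom(q = 0:5, size = 5, prob = 0.5)
print(all.equal(cumsum(pmf), cdf))
```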

The pdf formula makes sense for a normal distribution because it is continuous. The formula depends on x only through the squared distance from the mean, scaled by the standard deviation, so the density is highest at the mean, symmetric around it, and shrinks quickly as x moves away. When you want the probability of a range of values from a normal distribution, you work with the pdf because there are infinitely many possible values and each exact value has probability zero. For example, let’s say I wanted to calculate the probability of Boston receiving between 2 and 3 inches of snow tomorrow. Because snowfall can be any value on a continuum (i.e. Boston could receive 2.1 inches, 2.0000001 inches, or anything in between), that probability is the area under the pdf between 2 and 3.
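To make the snow example concrete, here is a sketch under made-up assumptions: snowfall is treated as normal with a mean of 2.5 inches and a standard deviation of 1 inch. Neither number comes from real data, and a normal model technically allows negative snowfall, so this is only an illustration of the calculation.

```r
# Hypothetical parameters, chosen only for illustration
snow_mean <- 2.5
snow_sd   <- 1

# P(2 <= snowfall <= 3): area under the pdf between 2 and 3
p_2_to_3 <- pnorm(3, mean = snow_mean, sd = snow_sd) -
            pnorm(2, mean = snow_mean, sd = snow_sd)
print(p_2_to_3)

# A density value is not itself a probability: with a small enough
# standard deviation the height of the curve can exceed 1
tall_density <- dnorm(x = 2.5, mean = 2.5, sd = 0.1)
print(tall_density)
```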

C. What are the key parameters that define the 3 distributions above (or a distribution from the list above)? Does R require these key parameters to be declared? Type the “?distribution” command in R to find out.

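Binomial Distribution (dbinom) key parameters: size = number of trials, prob = probability of success on each trial. R requires both; neither has a default.

Poisson Distribution (dpois) key parameters: lambda = vector of (non-negative) means. R requires lambda; it has no default.

Normal distribution (dnorm) key parameters: mean = vector of means, sd = vector of standard deviations. R does not require these; they default to mean = 0 and sd = 1 (the standard normal).

In all three functions, x (the vector of quantiles; non-negative integers for dbinom and dpois) is the value at which the density is evaluated rather than a parameter of the distribution.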
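One way to see which arguments R insists on, besides reading the help pages, is to print each function's signature and then try calling it with only x. This is a small sketch; the messages are captured with tryCatch so the script keeps running.

```r
# Signatures: arguments shown with "= value" have defaults
print(args(dnorm))
print(args(dbinom))
print(args(dpois))

# dnorm works with x alone because mean and sd have defaults
d_default <- dnorm(0)
print(d_default)

# dbinom and dpois stop with an error when their parameters are missing
binom_msg <- tryCatch(dbinom(x = 2), error = function(e) conditionMessage(e))
pois_msg  <- tryCatch(dpois(x = 2),  error = function(e) conditionMessage(e))
print(binom_msg)
print(pois_msg)
```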

D. Give a few examples of situations that can be modeled with each of the 3 distributions above. You can try to read Chapter 1.3 Parametric Families of Distribution in Introduction to Statistical Thought by Michael Lavine recommended textbook.

Normal Distribution Examples:

The amount of rainfall in inches for a given month
The weights of newborn children on 2/1/2024
The amount of time it takes to finish a novel

Binomial Distributions Examples:

The number of free throws made out of a fixed number of attempts
The number of planted tomatoes, out of a fixed number planted, that are ready to be picked
The number of penalty kicks scored out of a fixed number of attempts

Poisson Distribution Examples:

The number of people who take the train at Quincy Center in a given hour
The number of accidents that occur at an intersection in a given year
The number of delayed flights at an airport on a given day
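As a rough illustration of what data from each kind of situation might look like, the sketch below simulates twelve values from each distribution. Every parameter value here is a made-up assumption, not something taken from real rainfall, basketball, or accident data.

```r
set.seed(1)  # arbitrary seed so the draws are reproducible

# All parameter values below are hypothetical
rain_inches <- rnorm(n = 12, mean = 3.5, sd = 0.8)     # rainfall in each of 12 months
free_throws <- rbinom(n = 12, size = 10, prob = 0.75)  # makes out of 10 attempts, 12 games
accidents   <- rpois(n = 12, lambda = 2)               # accidents in each of 12 months

print(round(rain_inches, 2))  # continuous values
print(free_throws)            # whole numbers between 0 and 10
print(accidents)              # whole numbers with no fixed upper limit
```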

E. Plot the distribution in part B (3 if you stick close to class notes, or 1 if you venture out).

This example uses human body temperature, for which the normal value is known to be 98.6 degrees Fahrenheit. The code below creates a normal distribution centered at a mean of 98.6.

The density function for the normal distribution (dnorm) is used to plot the density curve and to shade the region for temperatures between 99.5 and 100.5.

# Set the mean and standard deviation
mu    <- 98.6
sigma <- 1

# Generate a range of values around the mean (see ?seq for the arguments)
x <- seq(from       =  mu - 3*sigma,
         to         =  mu + 3*sigma, 
         length.out =  1000
         )

# Calculate the probability density function
pdf <- dnorm(x    = x, 
             mean = mu, 
             sd   = sigma
             )

# Plot the normal distribution

plot(x    = x, 
     y    = pdf,
     type = 'l', 
     col  = 'lightblue', 
     lwd  = 3, 
     xlab = 'Temperature', 
     ylab = 'Density',
     main = 'Normal Distribution with Temperature Mean 98.6 Degrees'
     )

abline(v = 98.6, col = 'darkblue', lty = 2)

# Shade the area under the curve between 99.5 and 100.5
x_shade <- seq(from       = 99.5, 
               to         = 100.5, 
               length.out = 100
               )

pdf_shade <- dnorm(x    = x_shade, 
                   mean = mu, 
                   sd   = sigma
                   )

polygon(x      =  c(x_shade, rev(x_shade)),
        y      =  c(pdf_shade, rep(x      = 0, 
                                   times  = length(pdf_shade)
                                  )
                   ), 
        col    = 'darkred', 
        border = NA
        )
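The shaded region has a numerical value as well: its area is the probability of a temperature between 99.5 and 100.5 under this model. The sketch below computes it from the cdf and cross-checks it with a trapezoid sum over the same kind of grid used for the shading. It redefines mu and sigma so that it runs on its own.

```r
mu    <- 98.6
sigma <- 1

# Area of the shaded region from the cdf
shaded_area <- pnorm(100.5, mean = mu, sd = sigma) -
               pnorm(99.5,  mean = mu, sd = sigma)

# Cross-check: trapezoid rule on a grid of 100 points
x_grid   <- seq(from = 99.5, to = 100.5, length.out = 100)
pdf_grid <- dnorm(x_grid, mean = mu, sd = sigma)
trapezoid <- sum(diff(x_grid) * (head(pdf_grid, -1) + tail(pdf_grid, -1)) / 2)

print(c(cdf = shaded_area, trapezoid = trapezoid))
```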

Using the same example as before, you can plot a CDF using pnorm.

curve(pnorm(x, mean = mu, sd = sigma), 
      from = mu - 3*sigma,
      to = mu + 3*sigma,
      main = "CDF for Temperature Mean 98.6 Degrees",
      ylab = "F(x)")

II. Convergence of Distributions:

TASK:

Let’s assume that a hospital’s neurosurgical team performed 150 procedures for in-brain bleeding last year. 20 of these procedures resulted in death within 30 days. If the national proportion for death in these cases is 12% then is there evidence to suggest that your hospital’s proportion of deaths is more extreme than the national proportion?

Binomial Example:

## BINOMIAL EXAMPLE ##

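# Number of trials (procedures)
n <- 150

# Probability of "success" (here, death within 30 days) at the national rate
# Named p rather than pi so R's built-in constant pi is not masked
p <- 0.12

# Possible values of x (number of deaths)
x_values <- 0:n

# Probability of each value of x
probs <- dbinom(x    = x_values,
                size = n,
                prob = p)

# Upper-tail probability P(X >= 20): 20 or more deaths out of 150
binom_tail <- sum(dbinom(x = 20:n, size = n, prob = p))
print(binom_tail)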

## [1] 0.3431109
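The same tail probability can be reached in two other ways, sketched below: pbinom with lower.tail = FALSE, and the exact binomial test in binom.test. This assumes a one-sided reading of "more extreme" (20 or more deaths); a two-sided test would give a different p-value.

```r
n      <- 150   # procedures
deaths <- 20    # observed deaths within 30 days
p0     <- 0.12  # national proportion

# Sum of individual probabilities, as in the code above
tail_sum <- sum(dbinom(x = deaths:n, size = n, prob = p0))

# P(X > 19) is the same event as P(X >= 20)
tail_cdf <- pbinom(q = deaths - 1, size = n, prob = p0, lower.tail = FALSE)

# Exact one-sided binomial test
test <- binom.test(x = deaths, n = n, p = p0, alternative = "greater")

print(c(sum = tail_sum, pbinom = tail_cdf, binom.test = test$p.value))
```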

## PLOTTING BINOMIAL ##
barplot(height = probs, 
        names.arg = x_values, 
        col = "blue", 
        main = "Binomial Distribution", 
        xlab = "Number of Successes", 
        ylab = "Probability"
        )

Poisson Example:

## POISSON EXAMPLE ##
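n        <- 150    # number of procedures
p        <- 0.12   # national death rate (p, not pi, so R's constant pi is not masked)
lambda   <- n * p  # Poisson mean: expected number of deaths
x_values <- 0:150  # counts to plot

# Upper-tail probability P(X >= 20); summing to 150 matches the binomial range
# (the Poisson probability beyond 150 is negligible)
pois <- sum(dpois(x = 20:150, lambda = lambda))
print(pois)

## [1] 0.3490839

## PLOTTING POISSON ##

values <- dpois(x = x_values, lambda = lambda)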

# Plot
plot(x = 0:150, 
     y = values, 
     type = "h", 
     lwd = 2, 
     col = "blue",
     xlab = "Number of Events", 
     ylab = "Probability",
     main = "Poisson Distribution")

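The results I get from the binomial model (0.3431109) and the Poisson model (0.3490839) are similar but not identical. The reason is that the Poisson model here is an approximation to the binomial with lambda = n × p = 18: the binomial uses both the number of trials and the probability of success, while the Poisson uses only their product and so has a slightly larger variance (18 versus 150 × 0.12 × 0.88 = 15.84), which puts a little more probability in the upper tail. The approximation gets closer as the number of trials grows and the probability of success shrinks. Either way, a tail probability of about 0.34 is large, so there is no evidence that the hospital’s proportion of deaths is more extreme than the national proportion.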
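To see the convergence directly, the sketch below keeps the expected number of deaths fixed at 18 and lets the number of trials grow while the probability of success shrinks. The larger values of n are hypothetical and only serve to show the trend; the first row matches the hospital example.

```r
lambda   <- 18                         # expected deaths, held fixed
n_values <- c(150, 1500, 15000, 150000)
p_values <- lambda / n_values          # probability shrinks as n grows

# P(X >= 20) under each binomial and under the Poisson
binom_tail <- pbinom(q = 19, size = n_values, prob = p_values, lower.tail = FALSE)
pois_tail  <- ppois(q = 19, lambda = lambda, lower.tail = FALSE)

# Gap between the two models for each n
gap <- pois_tail - binom_tail
print(data.frame(n = n_values, p = p_values, binom_tail, pois_tail, gap))
```

The gap column should shrink toward zero as n increases, which is the sense in which the binomial converges to the Poisson.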

Chris Toomey

2024-02-02