1. Definition of the Three Distributions

#Normal distribution: The normal distribution is a continuous probability distribution that is symmetric about its mean and has the familiar bell-curve shape. It is defined by two parameters: the mean (μ), which sets the center of the distribution, and the standard deviation (σ), which measures the spread. It follows the empirical rule: about 68% of the data falls within one standard deviation of the mean, about 95% within two, and about 99.7% within three.
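
#A minimal sketch of how the empirical rule can be checked with pnorm. The values of mu and sigma are hypothetical and chosen only for illustration; the coverage does not depend on them.

```r
mu    <- 100   # hypothetical mean
sigma <- 15    # hypothetical standard deviation
k     <- 1:3   # number of standard deviations on each side of the mean

# P(mu - k*sigma <= X <= mu + k*sigma) for k = 1, 2, 3
coverage <- pnorm(mu + k * sigma, mean = mu, sd = sigma) -
            pnorm(mu - k * sigma, mean = mu, sd = sigma)

round(coverage, 4)
# 0.6827 0.9545 0.9973
```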

#Binomial distribution: The binomial distribution gives the probability of getting a certain number of successes in a fixed number of independent trials, where each trial has only two possible outcomes (success or failure) and the probability of success is the same on every trial. It is defined by two parameters: the number of trials (n) and the probability of success on each trial (p). It answers questions like “what is the chance of getting exactly k successes out of n trials?”
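
#A short sketch of the "exactly k successes out of n" calculation, comparing the textbook formula choose(n, k) * p^k * (1 - p)^(n - k) with R's built-in dbinom. The values of n, p and k are hypothetical.

```r
n <- 20    # hypothetical number of trials
p <- 0.2   # hypothetical probability of success on each trial
k <- 3     # number of successes of interest

# Probability of exactly k successes, written out by hand
pmf_manual  <- choose(n, k) * p^k * (1 - p)^(n - k)

# The same probability from the built-in mass function
pmf_builtin <- dbinom(k, size = n, prob = p)

all.equal(pmf_manual, pmf_builtin)

# The probabilities over all possible outcomes 0..n add up to 1
total <- sum(dbinom(0:n, size = n, prob = p))
```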

#Poisson distribution: The Poisson distribution gives the probability of a given number of events occurring in a fixed interval of time or space. It assumes that events happen independently and at a constant average rate. It has a single parameter, the rate (λ), which is the average number of events per interval. It is handy for estimating the chance of a certain number of events in a set period, such as the number of emails received in an hour or the number of customers arriving at a store.
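
#A small sketch comparing the Poisson formula exp(-λ) * λ^k / k! with R's dpois, and checking that the mean (and, as a known property, the variance) equals λ. The value of lambda is hypothetical.

```r
lambda <- 4     # hypothetical average number of events per interval
k      <- 0:7   # event counts to evaluate

# Probability of exactly k events, written out by hand
pmf_manual  <- exp(-lambda) * lambda^k / factorial(k)

# The same probabilities from the built-in mass function
pmf_builtin <- dpois(k, lambda = lambda)

all.equal(pmf_manual, pmf_builtin)

# Mean and variance, summed over 0..100 (the mass beyond 100 is negligible here)
support   <- 0:100
pois_mean <- sum(support * dpois(support, lambda))
pois_var  <- sum((support - pois_mean)^2 * dpois(support, lambda))
```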

2. PDF and CDF

#The probability density function (PDF) describes how probability is spread over the values of a continuous variable: the density at a point is not itself a probability, but the area under the curve over an interval is. The cumulative distribution function (CDF) gives the probability that a variable takes a value less than or equal to a given point, accumulating probability up to that point. For the normal distribution, the bell-shaped PDF matches the tendency of many natural measurements to cluster around a mean, which makes it intuitive. The binomial and Poisson distributions are discrete, so they have a probability mass function (PMF) in place of a PDF. The binomial PMF gives the probability of each possible number of successes in a fixed number of trials with binary outcomes. The Poisson PMF gives the probability of each possible count of events occurring at a constant average rate, and it is often used for rare events within fixed intervals.
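
#A brief sketch of how the CDF accumulates probability: for a discrete distribution it is the running sum of the mass function, and for a continuous one it is the area under the density. The parameter values are hypothetical.

```r
# Discrete case: the binomial CDF is the cumulative sum of the PMF
k            <- 0:20
cdf_from_pmf <- cumsum(dbinom(k, size = 20, prob = 0.2))
cdf_builtin  <- pbinom(k, size = 20, prob = 0.2)
all.equal(cdf_from_pmf, cdf_builtin)

# Continuous case: the normal CDF at 1 is the area under the PDF up to 1
area_under_pdf <- integrate(dnorm, lower = -Inf, upper = 1)$value
all.equal(area_under_pdf, pnorm(1), tolerance = 1e-4)
```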

3. Key Parameters

# Normal Distribution: Parameters include mean (μ) and standard deviation (σ).
# pdf
dnorm

## function (x, mean = 0, sd = 1, log = FALSE) 
## .Call(C_dnorm, x, mean, sd, log)
## <bytecode: 0x000001bd0402ed28>
## <environment: namespace:stats>

#cdf
pnorm

## function (q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE) 
## .Call(C_pnorm, q, mean, sd, lower.tail, log.p)
## <bytecode: 0x000001bd041757c8>
## <environment: namespace:stats>

# Binomial Distribution: Parameters include the number of trials (n) and the probability of success (p)
# pmf (the binomial is discrete, so dbinom returns a probability mass)
dbinom

## function (x, size, prob, log = FALSE) 
## .Call(C_dbinom, x, size, prob, log)
## <bytecode: 0x000001bd043467f0>
## <environment: namespace:stats>

#cdf
pbinom

## function (q, size, prob, lower.tail = TRUE, log.p = FALSE) 
## .Call(C_pbinom, q, size, prob, lower.tail, log.p)
## <bytecode: 0x000001bd04435ea0>
## <environment: namespace:stats>

# Poisson Distribution: Parameters include the average rate (λ) of occurrence.
# pmf (the Poisson is discrete, so dpois returns a probability mass)
dpois

## function (x, lambda, log = FALSE) 
## .Call(C_dpois, x, lambda, log)
## <bytecode: 0x000001bd04602918>
## <environment: namespace:stats>

#cdf
ppois

## function (q, lambda, lower.tail = TRUE, log.p = FALSE) 
## .Call(C_ppois, q, lambda, lower.tail, log.p)
## <bytecode: 0x000001bd0484c1c8>
## <environment: namespace:stats>
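
#The d and p functions printed above belong to a family of four in R's stats package: d (density or mass), p (cumulative probability), q (quantile) and r (random draws). A minimal sketch for the normal case, with hypothetical argument values:

```r
# d*: density at a point
density_at_zero <- dnorm(0, mean = 0, sd = 1)

# p*: cumulative probability P(X <= 1.5)
prob <- pnorm(1.5, mean = 0, sd = 1)

# q*: quantile function, the inverse of p* (recovers 1.5 up to rounding)
quantile_back <- qnorm(prob, mean = 0, sd = 1)

# r*: random draws (seed fixed so the draws are reproducible)
set.seed(42)
draws <- rnorm(5, mean = 0, sd = 1)
```

#The binomial and Poisson families follow the same naming pattern (qbinom, rbinom, qpois, rpois).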

4. Applications

#Binomial Distribution Examples: Quality control: The number of defective (or non-defective) items in a batch of products, assuming each item’s defectiveness is independent of the others. Tossing coins: The number of heads (successes) in a fixed number of coin tosses.

#Poisson Distribution Examples: Call center calls: The number of incoming calls to a call center within an hour. Website hits: The number of hits on a website within a minute.

#Normal Distribution Examples: Measurement errors: Small random errors in scientific measurements often follow a normal distribution, centered around the true measurement. Stock market returns: The returns on a stock over a long period can sometimes be modeled by a normal distribution, assuming a large number of small, independent factors influence the return.
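
#As a rough illustration of the quality-control example, the sketch below simulates batches of products and compares the simulated share of batches with at most 2 defective items against the exact binomial probability. The batch size, defect rate and number of simulated batches are all hypothetical.

```r
set.seed(1)            # fixed seed so the simulation is reproducible

batch_size  <- 20      # hypothetical number of items per batch
defect_rate <- 0.05    # hypothetical probability that an item is defective
n_batches   <- 10000   # number of simulated batches

# Number of defective items in each simulated batch
defects <- rbinom(n_batches, size = batch_size, prob = defect_rate)

# Share of simulated batches with at most 2 defective items
simulated_prob <- mean(defects <= 2)

# Exact probability from the binomial CDF
exact_prob <- pbinom(2, size = batch_size, prob = defect_rate)

c(simulated = simulated_prob, exact = exact_prob)
```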

Q5

#Binomial distribution:
#Parameters
n <- 20   # total number of trials (products inspected)
p <- 0.2  # probability that a single product is defective

#Generate x values
x <- 0:n  # possible numbers of defective products

#Calculate the probability mass function (PMF) for each x value
pmf_binomial <- dbinom(x, size = n, prob = p)

# Plot the Binomial distribution
plot(x, pmf_binomial, type = "h", lwd = 10, col = "yellow",
     xlab = "Number of defective products", ylab = "Probability",
     main = "Probability of the Number of Defective Products")

#Normal distribution:

# Set the mean and standard deviation
mu    <- 100
sigma <- 1

#Generate a range of values around the mean
x <- seq(from       =  mu - 3*sigma,
         to         =  mu + 3*sigma, 
         length.out =  1000
         )

#Calculate the probability density function
#(stored as pdf_normal so the name does not mask R's pdf() graphics device)
pdf_normal <- dnorm(x    = x, 
                    mean = mu, 
                    sd   = sigma
                    )

#Plot the normal distribution

plot(x    = x, 
     y    = pdf_normal,
     type = 'l', 
     col  = 'red', 
     lwd  = 2, 
     xlab = 'Stock market return', 
     ylab = 'Density',
     main = 'Normal Distribution with Mean 100 and SD 1'
     )

#Poisson distribution
#Parameters
lambda_t <- 4  # A website's average number of hits over one week

#Generate x values
x <- 0:7  # Number of hits in a week

#PMF for each number of hits
pmf_pois <- dpois(x, lambda_t)

#Visualize the PMF
plot(x, pmf_pois, type = "h", lwd = 10, col = "blue",
     xlab = "Number of hits", ylab = "Probability",
     main = "Probability of the Number of Weekly Website Hits")

# Part 2
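# Parameters
n       <- 50    # num of procedures in the last year
X       <- 8     # num deaths in past 30 days
p_death <- 0.15  # national proportion of death

# P(X >= 8): sum the PMF over 8, 9, ..., 50
sum(dbinom(X:n, size = n, prob = p_death))

## [1] 0.4812479

# P(X > 8): complement of the CDF at 8
1 - pbinom(X, n, p_death)

## [1] 0.3318993

# P(X > 8) again, requesting the upper tail directly
pbinom(X, n, p_death, lower.tail = FALSE)

## [1] 0.3318993

#The probability is 0.3319 for more than 8 deaths, P(X > 8), and 0.4812 for 8 or more deaths, P(X >= 8)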
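
#The sum over 8:50 and the upper tail of pbinom at 8 give different numbers because the first includes exactly 8 deaths and the second does not. A short sketch of that relationship, using the same n and national proportion as above (the variable names are only illustrative):

```r
n       <- 50    # number of procedures
p_death <- 0.15  # national proportion of death

# P(X >= 8): eight or more deaths, computed two equivalent ways
p_at_least_8     <- sum(dbinom(8:n, size = n, prob = p_death))
p_at_least_8_alt <- pbinom(7, size = n, prob = p_death, lower.tail = FALSE)

# P(X > 8): strictly more than eight deaths
p_more_than_8 <- pbinom(8, size = n, prob = p_death, lower.tail = FALSE)

# The gap between the two is exactly P(X = 8)
p_exactly_8 <- dbinom(8, size = n, prob = p_death)
all.equal(p_at_least_8 - p_more_than_8, p_exactly_8)
```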

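n        <- 50    # num of procedures in the last year
X        <- 8     # num deaths in past 30 days
p_death  <- 0.15  # national proportion of death
lambda_t <- n * p_death  # Poisson mean set equal to the binomial mean

# P(X >= 8): sum the PMF over 8, 9, ..., 50
sum(dpois(x = X:n, lambda = lambda_t))

## [1] 0.4753615

# P(X > 8): complement of the CDF at 8
1 - ppois(q = X, lambda = lambda_t, lower.tail = TRUE)

## [1] 0.3380329

# P(X > 8) again, requesting the upper tail directly
ppois(q = X, lambda = lambda_t, lower.tail = FALSE)

## [1] 0.3380329

1 - ppois(q          = X,
          lambda     = lambda_t,
          lower.tail = TRUE
          )

## [1] 0.3380329

#The probability of more than 8 deaths, P(X > 8), is 0.3380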

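#I believe we’re getting very similar probabilities using both methods (0.3319 with the binomial and 0.3380 with the Poisson). I think it’s because a Poisson distribution with λ = n × p = 7.5 approximates a binomial distribution when the number of trials is large and the probability of success is small, so it matters little whether we model a finite number of trials (binomial) or an unbounded count (Poisson).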
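
#A rough sketch of why the two answers are close: holding the mean fixed at 7.5 and increasing the number of trials (so each trial's probability shrinks), the binomial tail probability moves toward the Poisson one. The larger values of n are hypothetical and used only to show the trend.

```r
lambda   <- 7.5                # mean, as in 50 * 0.15
n_values <- c(50, 500, 5000)   # 50 is from the problem; 500 and 5000 are hypothetical
p_values <- lambda / n_values  # per-trial probability that keeps the mean at 7.5

# P(X > 8) under each binomial and under the Poisson
binom_tail <- pbinom(8, size = n_values, prob = p_values, lower.tail = FALSE)
pois_tail  <- ppois(8, lambda = lambda, lower.tail = FALSE)

# Absolute gap between the binomial and Poisson answers for each n
abs_error <- abs(binom_tail - pois_tail)

data.frame(n = n_values, p = p_values, binomial = binom_tail,
           poisson = pois_tail, abs_error = abs_error)
```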

#Binomial distribution

# Parameters
N       <- 50
p_death <- 0.15

# Generate x values (number of events)
x_values <- 0:N

# Calculate the PMF using dbinom
pmf_values <- dbinom(x = x_values,
                     size = N, 
                     prob = p_death
                     )

# Plot the PMF as vertical lines
plot(x = x_values, 
     y = pmf_values, 
     type = "h", 
     lwd = 2, 
     col = "red",
     xlab = "Number of deaths", 
     ylab = "Probability",
     main = "Binomial Distribution PMF"
     )

#poisson distribution
 
lambda <- 50*0.15  # Mean of the Poisson distribution

# Generate x values (number of events)
x_values <- 0:50

# Calculate the PMF using dpois
pmf_values <- dpois(x = x_values, 
                    lambda = lambda
                    )

# Plot the PMF
plot(x = x_values, 
     y = pmf_values, 
     type = "h", 
     lwd = 2, 
     col = "blue",
     xlab = "Number of deaths", 
     ylab = "Probability",
     main = "Poisson Distribution PMF")

DISCUSSION_3

ANDI XU

2024-04-06
