Normal Distribution: The normal distribution is a continuous probability distribution that is symmetric around its mean. Normal distributions are bell-shaped because values near the mean are observed more frequently than values far from the mean. The mean defines the center of the distribution, and the standard deviation measures the spread, i.e. how far observations typically fall from the mean. Normal distributions follow the empirical rule, which states that roughly 68.3% of the observations fall within one standard deviation of the mean, 95.4% within two standard deviations, and 99.7% within three standard deviations.
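For reference, the description above corresponds to the standard normal density formula below, where μ is the mean and σ the standard deviation. Because x enters only through (x - μ)², the curve is symmetric about μ, and the empirical-rule percentages are the areas under this curve within k standard deviations of the mean:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
\qquad
P(\mu - k\sigma \le X \le \mu + k\sigma) \approx
\begin{cases}
0.683, & k = 1\\
0.954, & k = 2\\
0.997, & k = 3
\end{cases}
```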
Binomial Distribution: The binomial distribution is built from Bernoulli trials, each of which has exactly two possible outcomes (success or failure) with the same probability of success p. Whereas the Bernoulli distribution describes a single trial, the binomial distribution models the number of successes in a fixed number (n) of independent Bernoulli trials.
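As a small sketch of that definition, the binomial probability of exactly k successes is choose(n, k) p^k (1 - p)^(n - k). The snippet below, using illustrative values n = 10 and p = 0.3 (assumptions, not from the assignment), checks that formula against R's `dbinom()` and shows that a single Bernoulli trial is the special case n = 1:

```r
# Binomial PMF: P(X = k) = choose(n, k) * p^k * (1 - p)^(n - k)
n <- 10      # illustrative number of trials (assumed value)
p <- 0.3     # illustrative probability of success (assumed value)
k <- 0:n
by_formula <- choose(n, k) * p^k * (1 - p)^(n - k)
by_dbinom  <- dbinom(k, size = n, prob = p)
all.equal(by_formula, by_dbinom)   # TRUE
sum(by_dbinom)                     # the probabilities over 0..n add up to 1

# A Bernoulli trial is the binomial special case with size = 1
dbinom(1, size = 1, prob = p)      # equals p
```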
Poisson Distribution: The Poisson distribution models how many times an event occurs in a fixed interval of time or space, assuming events occur independently at a constant average rate (lambda). It is a discrete distribution, meaning the variable can only take non-negative whole numbers (0, 1, 2, ...), with no fractions or decimals. One example is modeling the number of aircraft accidents in a single year, such as 2023.
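A minimal sketch of the Poisson probability P(X = k) = lambda^k e^(-lambda) / k!, using an illustrative rate lambda = 2 (an assumed value). It checks the formula against `dpois()` and confirms numerically that the mean and the variance both equal lambda; the infinite support is truncated at 30, where the remaining tail is negligible for this lambda:

```r
# Poisson PMF: P(X = k) = lambda^k * exp(-lambda) / k!
lambda <- 2          # illustrative average count per interval (assumed value)
k <- 0:30            # truncated support; P(X > 30) is negligible for lambda = 2
by_formula <- lambda^k * exp(-lambda) / factorial(k)
by_dpois   <- dpois(k, lambda = lambda)
all.equal(by_formula, by_dpois)     # TRUE

# The mean and the variance of a Poisson variable both equal lambda
sum(k * by_dpois)                   # approximately 2
sum((k - lambda)^2 * by_dpois)      # approximately 2
```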
Explain what the PDF and CDF of a distribution measure.
The Probability Density Function (PDF) describes the distribution of a continuous random variable, i.e. a variable that can take any value within an interval (infinitely many possible values). The PDF itself is a density, not a probability: the probability that the variable falls between two values a and b is the area under the PDF between a and b. The Cumulative Distribution Function (CDF), by contrast, gives the probability that a random observation taken from the population is less than or equal to x, F(x) = P(X ≤ x), so its values always lie between 0 and 1. For discrete distributions such as the binomial and Poisson, the counterpart of the PDF is the probability mass function (PMF), which gives the probability of each exact value; in R, dnorm(), dbinom() and dpois() compute the PDF or PMF, while pnorm(), pbinom() and ppois() compute the CDF. For the normal distribution, the PDF is used to calculate the probability of a range of values, not of a single value, because the probability of any exact value is 0. For the binomial distribution, the PMF gives the probability of exactly k successes in n trials. For the Poisson distribution, the PMF gives the probability that an event occurs exactly k times within a given time or space interval.
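The relationships described above can be checked directly in R. This sketch uses the standard normal and illustrative binomial and Poisson parameters (assumed values): the area under `dnorm()` between -1 and 1 matches the difference of two `pnorm()` values, a density can exceed 1 (so it is not a probability), and for discrete distributions the CDF is the running total of the PMF:

```r
# Continuous case: probability = area under the PDF = difference of CDF values
a <- -1
b <- 1
area <- integrate(dnorm, lower = a, upper = b)$value
cdf_diff <- pnorm(b) - pnorm(a)
all.equal(area, cdf_diff)          # TRUE; both are about 0.6827

# A density value is not a probability: it can exceed 1 for a narrow curve
dnorm(0, mean = 0, sd = 0.1)       # about 3.99

# Discrete case: the CDF is the running total of the PMF (illustrative parameters)
k <- 0:10
all.equal(cumsum(dbinom(k, size = 10, prob = 0.5)), pbinom(k, size = 10, prob = 0.5))  # TRUE
all.equal(cumsum(dpois(k, lambda = 3)), ppois(k, lambda = 3))                          # TRUE

# CDF values always lie between 0 and 1
range(ppois(k, lambda = 3))
```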
What are the key parameters that define the 3 distributions above?
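Normal Distribution: The two parameters that define a normal distribution are the mean and the standard deviation. The other arguments of R's normal functions are inputs rather than parameters: x is a vector of values (dnorm), q a vector of quantiles (pnorm), p a vector of probabilities (qnorm), and n the number of observations to generate (rnorm). Because R's mean and sd arguments default to 0 and 1, only the first argument is strictly required; the mean and standard deviation must be declared for any normal distribution other than the standard normal.
Binomial Distribution: The two parameters that define a binomial distribution are the number of trials (size, zero or more) and the probability of success on each trial (prob). The other arguments of R's binomial functions are inputs and options: the vector of quantiles (x, q), the vector of probabilities (p), the number of observations to generate (n), as well as log, log.p and lower.tail. Because size and prob have no default values, R requires both to be declared, along with the first argument of the function being called.
Poisson Distribution: The only parameter that defines a Poisson distribution is lambda, the (non-negative) mean number of events per interval, which is also its variance. The other arguments of R's Poisson functions are inputs and options: the vector of (non-negative integer) quantiles (x), the vector of quantiles (q), the vector of probabilities (p), the number of random values to return (n), as well as log, log.p and lower.tail. Because lambda has no default value, it must always be declared, along with the first argument of the function being called.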
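Which arguments are required can be inspected from the function signatures themselves. A short sketch using base R's `formals()`: `rnorm()` has defaults for mean and sd, while `dbinom()` lists size and prob without defaults, so calling it without them stops with an error (caught here with `tryCatch()` so the script keeps running):

```r
# rnorm's mean and sd default to 0 and 1, so only n is required
formals(rnorm)$mean      # 0
formals(rnorm)$sd        # 1
dnorm(0)                 # standard normal density at 0, about 0.3989

# dbinom() and dpois() list their parameters without defaults
names(formals(dbinom))   # "x" "size" "prob" "log"
dbinom(3, size = 10, prob = 0.5)
dpois(3, lambda = 2)

# Leaving out a parameter with no default stops with an error
msg <- tryCatch(dbinom(3), error = function(e) conditionMessage(e))
msg                      # reports that "size" is missing
```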
Normal Distribution:
Binomial Distribution:
Poisson Distribution:
the number of bankruptcies that are filed in a month
the number of airport arrivals in one hour
the number of customers served at a diner in 45 minutes
Plot the distributions in part B: (Note: I created these examples for fun)
Normal Distribution Plot:
##Average Major League Baseball Retirement Age of 29.5 years old
##Standard Deviation of 1 year
# Set the mean and standard deviation
mu <- 29.5
sigma <- 1
# Generate a range of values around the mean
x <- seq(from = mu - 3*sigma,
to = mu + 3*sigma,
length.out = 1000
)
# Calculate the probability density function
pdf <- dnorm(x = x,
mean = mu,
sd = sigma
)
# Plot the normal distribution
plot(x = x,
y = pdf,
type = 'l',
col = 'red',
lwd = 2,
xlab = 'Age',
ylab = 'Density',
main = 'Normal Distribution with Mean 29.5 and SD 1'
)
# Apply the empirical rule: check the area within 1, 2 and 3 SDs of the mean
pnorm(q = 29.5 + 1, mean = 29.5, sd = 1 ) - pnorm(q = 29.5 - 1, mean = 29.5, sd = 1 )
## [1] 0.6826895
pnorm(q = 29.5 + 2, mean = 29.5, sd = 1 ) - pnorm(q = 29.5 - 2, mean = 29.5, sd = 1 )
## [1] 0.9544997
pnorm(q = 29.5 + 3, mean = 29.5, sd = 1 ) - pnorm(q = 29.5 - 3, mean = 29.5, sd = 1 )
## [1] 0.9973002
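The three `pnorm()` calls above can be collapsed into one vectorized call that reuses `mu` and `sigma`. This sketch also confirms that the empirical-rule areas do not depend on the mean or standard deviation, since they match the standard normal:

```r
mu <- 29.5
sigma <- 1
k <- 1:3                                   # number of standard deviations from the mean
within_k <- pnorm(mu + k * sigma, mean = mu, sd = sigma) -
  pnorm(mu - k * sigma, mean = mu, sd = sigma)
round(within_k, 4)                         # 0.6827 0.9545 0.9973

# Same areas for the standard normal, so mu and sigma do not matter
all.equal(within_k, pnorm(k) - pnorm(-k))  # TRUE
```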
Binomial Distribution Plot: Estimating the chances of success for a free-throw shooter in basketball
In 2023, the average NBA free-throw percentage was 78.8%, and player X made 6 of 10 free-throw attempts in his last game. Is there sufficient evidence to suggest that the player needs more free-throw practice? In other words, if he were shooting at the NBA average of 78.8%, what is the probability of a result this bad or worse, i.e., 6 or fewer makes (4 or more misses) in 10 attempts?
Sample size = 10 free throws
2023 NBA free throw average = 0.788
Player X Successful Free Throws = 6
# Number of trials
n <- 10
# Probability of success at the 2023 NBA average (the benchmark being tested)
p <- 0.788
# Generate values for x (number of successes)
x <- 0:n
# Calculate the probabilities for each value of x (can be calculated for individual values as well instead of the entire x vector)
dbinom(x = 0:10,
size = 10,
prob = 0.788
)
## [1] 1.833828e-07 6.816304e-06 1.140123e-04 1.130085e-03 7.350880e-03
## [6] 3.278770e-02 1.015594e-01 2.157110e-01 3.006727e-01 2.483544e-01
## [11] 9.231285e-02
# store them
probabilities <- dbinom(x = x,
size = n,
prob = p)
# Plot the binomial distribution
barplot(height = probabilities,
names.arg = x,
col = "skyblue",
main = "Binomial Distribution",
xlab = "Number of Successes",
ylab = "Probability"
)
# P(X >= 6): probability of making 6 or more of 10 free throws at the NBA average
sum(dbinom(x = 6:10,
size = 10,
prob = 0.788)
)
## [1] 0.9586103
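The sum above is P(X ≥ 6), the chance of making 6 or more. The question, however, asks how likely a result this bad or worse is, which is the lower tail P(X ≤ 6). A minimal sketch computing it with `pbinom()` and cross-checking it as a sum of PMF values:

```r
# Lower tail: P(X <= 6) when X ~ Binomial(size = 10, prob = 0.788)
p_low <- pbinom(q = 6, size = 10, prob = 0.788)
p_low                                            # roughly 0.143
# The same quantity as a sum of PMF values
sum(dbinom(x = 0:6, size = 10, prob = 0.788))
```

At roughly 14%, this tail probability is well above the conventional 5% threshold, so on this single game alone it would not be strong evidence that the player shoots below the league average.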
Poisson Distribution Plot:
A typical fast-food drive-through serves 4 customers every 2 minutes. I observed my local fast-food restaurant and counted 5 customers served in the span of 4 minutes. What is the probability that this location is within the franchise standard? In other words, what is the probability of having 5 or more customers served in 4 minutes?
# Let Y be the number of customers served in the 4-minute observation window
# We observed 5 customers served in 4 minutes.
t <- 2 # 4 minute sample / 2 minute avg.
lambda_t <- 4 * t # lambda = 4 (4 customers served in 2 minutes); the sample covers 4 minutes, so scale lambda by time
lambda_t
## [1] 8
# P(Y >= 5 | lambda * t = 8)
### P(Y >= 5 customers | lambda = 4, t = 2)
ppois( q = 4, # discrete variable adjustment: lower.tail = FALSE gives P[Y > q], so P[Y >= 5] = P[Y > 4]
lambda = lambda_t,
lower.tail = FALSE
)
## [1] 0.9003676
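Because Y only takes whole numbers, P(Y ≥ 5) = P(Y > 4) = 1 - P(Y ≤ 4). This sketch computes that tail three equivalent ways to make the discrete adjustment explicit:

```r
lambda_t <- 8   # 4 customers per 2 minutes, scaled to the 4-minute window
tail_upper      <- ppois(q = 4, lambda = lambda_t, lower.tail = FALSE)  # P(Y > 4)
tail_complement <- 1 - ppois(q = 4, lambda = lambda_t)                  # 1 - P(Y <= 4)
tail_sum        <- 1 - sum(dpois(x = 0:4, lambda = lambda_t))           # 1 - [P(Y = 0) + ... + P(Y = 4)]
c(tail_upper, tail_complement, tail_sum)   # all about 0.9004
```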
# Poisson distribution histogram: rpois() draws a random sample to approximate the distribution
hist(rpois(n = 10000,
lambda = 4),
col = "blue",
xlab = "Customers Served",
ylab = "Frequency",
main = "Poisson Distribution Plot")
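A histogram of simulated draws only approximates the distribution. One way to see how close it gets is to plot the exact PMF from `dpois()` as bars and overlay the simulated proportions as points. In this sketch the seed is an arbitrary assumption (for reproducibility) and the support is cut at 15, beyond which the probabilities are negligible for lambda = 4:

```r
set.seed(2023)                            # arbitrary seed (assumption) for a reproducible sample
lambda <- 4                               # 4 customers per 2-minute interval
sims <- rpois(n = 10000, lambda = lambda)
support <- 0:15                           # P(Y > 15) is negligible when lambda = 4
empirical <- as.numeric(table(factor(sims, levels = support))) / length(sims)
exact <- dpois(support, lambda = lambda)

# Bars: exact PMF; points: simulated proportions
mids <- barplot(height = exact,
                names.arg = support,
                col = "lightblue",
                ylim = c(0, 1.1 * max(exact, empirical)),
                xlab = "Customers Served",
                ylab = "Probability",
                main = "Poisson(4): exact PMF vs simulated proportions")
points(x = mids, y = empirical, pch = 19, col = "blue")

max(abs(empirical - exact))               # small sampling error
```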
N = 600 procedures for in-brain bleeding last year
x = 3 of these procedures resulted in death within 30 days
π = 2 deaths in every 400 patients, the national proportion of deaths in these cases
Binomial Distribution Model:
# number of trials
n <- 600
# probability
p <- 0.005
# Generate values for x
x <- 0:n
# Calculate the probabilities for each value of x (not printed: from x = 20 on they are all below 1e-10)
probabilities <- dbinom(x = x,
size = n,
prob = p)
## Model Binomial
sum(dbinom(3:600,size = 600,prob = .005))
## [1] 0.5773715
1 - pbinom(2, 600, 0.005) # discrete variable: P(X >= 3) = 1 - P(X <= 2)
## [1] 0.5773715
pbinom(2, 600, 0.005, lower.tail = FALSE) # same quantity: lower.tail = FALSE gives P(X > 2)
## [1] 0.5773715
## Plot Binomial
# Parameters
N <- 600
p_death <- 2/400 # rate of death is 2 in 400 surgeries (named p_death to avoid overwriting R's built-in constant pi)
# Generate x values (number of events)
x_values <- 0:50
# Calculate the probability mass function (PMF) using dbinom
pmf_values <- dbinom(x = x_values,
size = N,
prob = p_death)
# Plot the PMF as vertical lines (type = "h")
plot(x = x_values,
y = pmf_values,
type = "h",
lwd = 2,
col = "red",
xlab = "Number of Events",
ylab = "Probability",
main = "Binomial Distribution PMF"
)
Poisson Distribution Model:
# Determine lambda_t
t <- 1.5 # (600 sample / 400)
# scale lambda by time
lambda_t <- 2 * t # lambda = 2 (2 deaths in every 400 cases)
lambda_t
## [1] 3
# P(Y >= 3 | lambda * t = 3)
### P(Y >= 3 deaths | lambda = 2, t = 1.5)
ppois( q = 2, # discrete variable adjustment: lower.tail = FALSE gives P[Y > q], so P[Y >= 3] = P[Y > 2]
lambda = lambda_t,
lower.tail = FALSE
)
## [1] 0.5768099
## Plot Poisson
# Parameters
lambda <- 3 # Mean of the Poisson distribution
# Generate x values (number of events)
x_values <- 0:50
# Calculate the PMF using dpois
pmf_values <- dpois(x = x_values,
lambda = lambda
)
# Plot the PMF
plot(x = x_values,
y = pmf_values,
type = "h",
lwd = 2,
col = "blue",
xlab = "Number of Events",
ylab = "Probability",
main = "Poisson Distribution PMF")
Do you get similar answers or not under the two different distributional assumptions, and can you guess why?
Answer:
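Yes, I got similar answers from the Binomial and Poisson models. The Binomial model counts events (deaths) out of a fixed number of trials (n = 600 procedures), while the Poisson model counts events occurring at an average rate, with no fixed upper limit. When n is large and the probability p is small, the Binomial(n, p) distribution is closely approximated by a Poisson distribution with lambda = np. Here n × p = 600 × 0.005 = 3, the same value as lambda × t = 2 × 1.5 = 3 used in the Poisson model, and the Binomial variance np(1 - p) = 2.985 is very close to the Poisson variance of 3. This is why the two calculations agree closely and the two distributions look almost identical.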
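The approximation argument can be checked numerically. This sketch computes P(X ≥ 3) under both models with the parameters from this problem and measures the largest gap between the two PMFs over 0 to 20 events:

```r
n <- 600
p <- 2 / 400                  # 0.005, the national death proportion
lambda <- n * p               # 3, the same value as lambda * t = 2 * 1.5
binom_tail <- pbinom(q = 2, size = n, prob = p, lower.tail = FALSE)  # P(X >= 3)
pois_tail  <- ppois(q = 2, lambda = lambda, lower.tail = FALSE)      # P(Y >= 3)
c(binomial = binom_tail, poisson = pois_tail)   # both about 0.577

# Largest absolute difference between the two PMFs over 0..20 events
max(abs(dbinom(0:20, size = n, prob = p) - dpois(0:20, lambda = lambda)))
```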