Normal Distribution: Normal distributions are symmetric and centered on the average value, and the mean, mode and median are all equal. The normal distribution describes where most values of a variable fall: values cluster near the mean and become rarer farther from it, with roughly 68% of values within one standard deviation of the mean and about 95% within two. For example, adult height is approximately normally distributed. The mean tells us where the center of the curve is, whereas the standard deviation tells us how wide the curve is.
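A quick check in R, assuming the height parameters used later in this document (mean 170.18 cm, SD 5.08 cm). The median from `qnorm()` equals the mean, and `pnorm()` shows how much of the distribution falls within one and two standard deviations.

```r
# Height parameters in cm (5'7" mean, 2" SD), as used later in this document
mu <- 170.18
sigma <- 5.08

# For a normal distribution the median (50th percentile) equals the mean
median_height <- qnorm(p = 0.5, mean = mu, sd = sigma)

# Share of values within 1 and 2 standard deviations of the mean
within_1sd <- pnorm(mu + sigma, mean = mu, sd = sigma) -
  pnorm(mu - sigma, mean = mu, sd = sigma)
within_2sd <- pnorm(mu + 2 * sigma, mean = mu, sd = sigma) -
  pnorm(mu - 2 * sigma, mean = mu, sd = sigma)

print(c(median = median_height, within_1sd = within_1sd, within_2sd = within_2sd))
```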
Binomial Distribution: Binomial distributions give the probability of observing a given number of successes over a fixed number of trials, where each trial has only two possible outcomes: success or failure. The outcomes must be mutually exclusive, the random variable is a count (the number of successes), the probability of success must be the same on every trial, and each trial must be independent (one has no effect on the other).
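The probability of exactly k successes in n trials is choose(n, k) p^k (1 - p)^(n - k). A small sketch with hypothetical values (10 fair coin flips, 3 heads) checks that `dbinom()` matches this formula and that the probabilities over every possible count sum to 1.

```r
n <- 10    # hypothetical number of trials
p <- 0.5   # hypothetical probability of success on each trial
k <- 3     # number of successes of interest

# Binomial probability mass function written out by hand
by_hand <- choose(n, k) * p^k * (1 - p)^(n - k)

# Same quantity from R's built-in binomial density
built_in <- dbinom(x = k, size = n, prob = p)

# Probabilities of 0, 1, ..., n successes cover every outcome, so they sum to 1
total <- sum(dbinom(x = 0:n, size = n, prob = p))

print(c(by_hand = by_hand, built_in = built_in, total = total))
```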
Poisson Distribution: Poisson distributions give the probability of an event happening a specific number of times within a given interval of time or space. A Poisson distribution is defined entirely by the mean number of events per interval, called lambda. For example, what is the probability that exactly 2 prospects become customers in a year for a tech company? The event is a prospect becoming a customer, the count of interest is 2, the interval is one year, and lambda is the average number of prospects who become customers in a year, say 5.
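The probability of k events is e^(-lambda) lambda^k / k!. A sketch of the example above in R, with lambda = 5 conversions per year and k = 2:

```r
lambda <- 5  # average number of prospects who become customers per year
k <- 2       # number of conversions of interest

# Poisson probability mass function written out by hand
by_hand <- exp(-lambda) * lambda^k / factorial(k)

# Same quantity from R's built-in Poisson density
built_in <- dpois(x = k, lambda = lambda)

print(c(by_hand = by_hand, built_in = built_in))  # both about 0.0842
```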
The PDF (probability density function) describes the relative likelihood of a continuous random variable X taking values near a particular point. For a continuous variable, the probability of any single exact value is zero; instead, the probability of X falling within a range is the area under the PDF curve over that range. The PDF is a natural fit for the normal distribution. For example, say we want to understand the likelihood of dating a male who is about 6’0”, and we know the average male height is 5’7” with a standard deviation of 2”. We can calculate the probability of selecting a male in a range around 6’0” by finding the area under the PDF over that range.
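Because a single exact height has probability zero, a sketch has to pick a range. Assuming a hypothetical window of 6’0” plus or minus half an inch (181.61 cm to 184.15 cm), `integrate()` over `dnorm()` gives the area under the PDF, which matches the difference of two `pnorm()` values.

```r
mu <- 170.18    # mean height in cm (5'7")
sigma <- 5.08   # standard deviation in cm (2")

# Hypothetical window around 6'0" (182.88 cm): plus or minus half an inch
lower <- 182.88 - 1.27
upper <- 182.88 + 1.27

# Probability as the area under the PDF, found by numerical integration
area <- integrate(dnorm, lower = lower, upper = upper, mean = mu, sd = sigma)$value

# The same probability from the CDF
via_cdf <- pnorm(upper, mean = mu, sd = sigma) - pnorm(lower, mean = mu, sd = sigma)

# The density at exactly 6'0" is not a probability (its units are 1/cm)
density_at_6ft <- dnorm(182.88, mean = mu, sd = sigma)

print(c(area = area, via_cdf = via_cdf, density_at_6ft = density_at_6ft))
```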
The CDF (cumulative distribution function) gives the probability that a random variable X takes on a value less than or equal to a specified value x, written P(X ≤ x). Every distribution has a CDF, including the normal and Poisson. For example, if X is a male’s height, the CDF at x = 5’9” is the probability that a randomly chosen male is 5’9” or shorter. Because the normal distribution is symmetric about its mean, the CDF equals 0.5 at the mean: there is a 50% chance that a randomly chosen male is 5’7” or shorter. The CDF also works with the Poisson distribution. Say we want to understand how many visits a website gets in a given hour. Given the average number of visits per hour, we can use the CDF to determine the probability of 200 or fewer visitors in an hour.
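A sketch of both CDF examples in R. The height numbers come from this document; the website’s average of 180 visits per hour is a hypothetical value, since the text does not give one.

```r
# Normal CDF: P(height <= 5'9"), heights in cm (mean 5'7", SD 2")
mu <- 170.18
sigma <- 5.08
p_at_most_5ft9 <- pnorm(q = 175.26, mean = mu, sd = sigma)  # about 0.841

# At the mean, the normal CDF equals 0.5
p_at_most_mean <- pnorm(q = mu, mean = mu, sd = sigma)

# Poisson CDF: P(200 or fewer visits in an hour), assuming a hypothetical average of 180
visits_lambda <- 180
p_at_most_200 <- ppois(q = 200, lambda = visits_lambda)

# The CDF is the running sum of the probability mass function
p_at_most_200_sum <- sum(dpois(x = 0:200, lambda = visits_lambda))

print(c(p_at_most_5ft9, p_at_most_mean, p_at_most_200, p_at_most_200_sum))
```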
For the normal distribution (pnorm() in R), the key arguments are:
q, the value (quantile) at which to evaluate the probability
mean, which defaults to 0
sd, the standard deviation, which defaults to 1
lower.tail, which lets us choose whether we want P(X ≤ x) (TRUE, the default) or P(X > x) (FALSE)
log.p, which lets us choose whether the probability is returned on the log scale
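A short sketch of the `pnorm()` arguments listed above, using the standard normal defaults (mean 0, SD 1) at q = 1:

```r
q <- 1

# Defaults: mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE
p_lower <- pnorm(q)                      # P(Z <= 1)
p_upper <- pnorm(q, lower.tail = FALSE)  # P(Z > 1)
log_p   <- pnorm(q, log.p = TRUE)        # log(P(Z <= 1))

print(c(p_lower = p_lower, p_upper = p_upper, log_p = log_p))
```

The two tails always add up to 1, and `log.p = TRUE` is mainly useful for very small probabilities that would otherwise underflow.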
For the Poisson distribution (ppois() in R), the key arguments are:
q, the count at which to evaluate the probability
lambda, the mean number of events per interval
lower.tail and log.p, which work the same way as for the normal distribution
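For a discrete distribution such as the Poisson, `lower.tail = FALSE` returns P(X > q), strictly greater than q. A sketch with a hypothetical mean of 3 events shows that P(X ≥ 4) therefore uses q = 3:

```r
lambda <- 3  # hypothetical mean number of events

p_at_most_3   <- ppois(q = 3, lambda = lambda)                      # P(X <= 3)
p_more_than_3 <- ppois(q = 3, lambda = lambda, lower.tail = FALSE)  # P(X > 3), i.e. P(X >= 4)

# Check against the probability mass function directly
p_at_least_4 <- 1 - sum(dpois(x = 0:3, lambda = lambda))

print(c(p_at_most_3 = p_at_most_3, p_more_than_3 = p_more_than_3, p_at_least_4 = p_at_least_4))
```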
For the binomial distribution (pbinom() in R), the key arguments are:
q, the number of successes at which to evaluate the probability
size, the number of trials
prob, the probability of success on each trial
lower.tail and log.p, which work the same way as for the normal distribution
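A sketch of `pbinom()` with the arguments above, using hypothetical values of 20 trials and a success probability of 0.3. The cumulative probability P(X ≤ 5) is the sum of the individual `dbinom()` terms, and the upper tail covers everything else.

```r
size <- 20   # hypothetical number of trials
prob <- 0.3  # hypothetical probability of success on each trial

p_at_most_5   <- pbinom(q = 5, size = size, prob = prob)                      # P(X <= 5)
p_more_than_5 <- pbinom(q = 5, size = size, prob = prob, lower.tail = FALSE)  # P(X > 5)

# Cumulative probability as a sum of point probabilities for 0..5 successes
p_sum <- sum(dbinom(x = 0:5, size = size, prob = prob))

print(c(p_at_most_5 = p_at_most_5, p_more_than_5 = p_more_than_5, p_sum = p_sum))
```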
?Distributions
?dpois
?dbinom
Normal distribution examples: heights of people, ocean temps at a particular location, SAT scores.
Poisson distribution examples: tree seedlings emerging from the forest floor, bugs occurring in computer code, defects occurring along a strand of yarn.
Binomial distribution examples: medical trials, toxicity tests, quality control (item is defective or it’s not).
Let’s say the average male height is 5’7” and the standard deviation is 2”. What is the probability that a male we date is between 5’9” and 6’2”? To keep the calculations simple, I will use centimeters (1 inch = 2.54 cm).
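The centimeter values used below come from multiplying inches by 2.54. A small sketch of the conversion, using a hypothetical helper `inches_to_cm()`:

```r
# Hypothetical helper: convert feet and inches to centimeters (1 inch = 2.54 cm)
inches_to_cm <- function(feet = 0, inches = 0) {
  (feet * 12 + inches) * 2.54
}

mean_cm  <- inches_to_cm(5, 7)        # 5'7"
sd_cm    <- inches_to_cm(inches = 2)  # 2"
lower_cm <- inches_to_cm(5, 9)        # 5'9"
upper_cm <- inches_to_cm(6, 2)        # 6'2"

print(c(mean_cm, sd_cm, lower_cm, upper_cm))
```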
# Set the mean (5’7”) and standard deviation (2”) in centimeters
mu <- 170.18
sigma <- 5.08
# Generate 200 heights spanning 3 standard deviations either side of the mean
x <- seq(from = mu - 3*sigma,
to = mu + 3*sigma,
length.out = 200
)
#Calculate the PDF
pdf <- dnorm(x = x,
mean = mu,
sd = sigma
)
#Plot the normal distribution
plot(x = x,
y = pdf,
type = 'l',
col = 'red',
lwd = 2,
xlab = 'Height',
ylab = 'Density',
main = 'Normal Distribution with Mean 170cm and SD 5cm'
)
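The plotting range from `mu - 3*sigma` to `mu + 3*sigma` is wide enough to show almost the whole curve. A quick check (redefining `mu` and `sigma` so the snippet runs on its own) shows how much probability that window holds:

```r
mu <- 170.18
sigma <- 5.08

# Probability inside the plotted window, mean plus or minus 3 standard deviations
in_window <- pnorm(mu + 3 * sigma, mean = mu, sd = sigma) -
  pnorm(mu - 3 * sigma, mean = mu, sd = sigma)

print(in_window)  # about 0.9973
```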
Let’s shade the area under the curve between 5’9” (175.26 cm) and 6’2” (187.96 cm); that area is the probability of dating a male in that height range.
plot(x = x,
y = pdf,
type = 'l',
col = 'blue',
lwd = 2,
xlab = 'Height',
ylab = 'Density',
main = 'Normal Distribution with Mean 170cm and SD 5cm'
)
# Shade the area under the curve between 175.26 cm (5’9”) and 187.96 cm (6’2”)
x_shade <- seq(from = 175.26,
to = 187.96,
length.out = 1000
)
pdf_shade <- dnorm(x = x_shade,
mean = mu,
sd = sigma
)
# rev() reverses x_shade so the polygon traces the curve left to right, then returns along y = 0
?rev
polygon(x = c(x_shade, rev(x_shade)),
y = c(pdf_shade, rep(x = 0,
times = length(pdf_shade)
)
),
col = 'red',
border = NA
)
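The shaded area is the probability itself. As a rough check, a trapezoid-rule sum over the same 1,000 shaded points (a numerical approximation chosen here for illustration, not part of the original analysis) lands very close to the `pnorm()` answer calculated in the next step.

```r
mu <- 170.18
sigma <- 5.08

# Same shaded x values and densities as the polygon above
x_shade <- seq(from = 175.26, to = 187.96, length.out = 1000)
pdf_shade <- dnorm(x = x_shade, mean = mu, sd = sigma)

# Trapezoid rule: width of each slice times the average of its two heights
widths <- diff(x_shade)
heights <- (head(pdf_shade, -1) + tail(pdf_shade, -1)) / 2
shaded_area <- sum(widths * heights)

print(shaded_area)  # close to 0.1584
```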
Let’s calculate the probability as the difference of two CDF values, P(X ≤ 187.96) - P(X ≤ 175.26):
pnorm(q = 187.96, mean = mu, sd = sigma) - pnorm(q = 175.26, mean = mu, sd = sigma)
## [1] 0.1584226
# Explicitly state the lower.tail argument (TRUE is the default)
pnorm(q = 187.96, mean = mu, sd = sigma, lower.tail = TRUE) - pnorm(q = 175.26, mean = mu, sd = sigma, lower.tail = TRUE)
## [1] 0.1584226
# Same result from the upper tails: (1 - P(X > 187.96)) - (1 - P(X > 175.26))
1 - pnorm(q = 187.96, mean = mu, sd = sigma, lower.tail = FALSE) - (1 - pnorm(q = 175.26, mean = mu, sd = sigma, lower.tail = FALSE))
## [1] 0.1584226
The probability of dating a male between 5’9” and 6’2” is about 0.1584, or roughly 16%.
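The same answer can be reached by standardizing each height into a z-score, z = (x - mean) / SD, and using the standard normal CDF. With these numbers the cutoffs sit exactly 1 and 3.5 standard deviations above the mean.

```r
mu <- 170.18
sigma <- 5.08

# Convert the cutoffs (5'9" and 6'2" in cm) to z-scores
z_lower <- (175.26 - mu) / sigma  # 1, up to floating-point rounding
z_upper <- (187.96 - mu) / sigma  # 3.5, up to floating-point rounding

# Standard normal CDF: mean = 0 and sd = 1 are pnorm()'s defaults
p_between <- pnorm(z_upper) - pnorm(z_lower)

print(c(z_lower = z_lower, z_upper = z_upper, p_between = p_between))
```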
# Modeled as binomial
# What is n (the number of trials)? 50
# What is pi (the probability of success)? 0.12, or 6/50
# x (the number of successes) = 4
?dbinom
dbinom(x = 4,
size = 50,
prob = .12
)
## [1] 0.1334203
The answer is 0.1334.
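# Modeled as Poisson, with lambda = n * pi
n <- 50
p <- 6 / 50
lambda <- n * p  # = 6; named p rather than pi to avoid overwriting R's built-in constant pi
dpois(x = 4, lambda = lambda)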
## [1] 0.1338526
The answer is 0.1339.
The Poisson and binomial results are very similar. This is expected: the Poisson distribution with lambda = n × pi approximates the binomial distribution when the number of trials is large and the probability of success is small. Here n = 50 and pi = 0.12, so lambda = 6.
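A sketch comparing the two models across every count from 0 to 15, plus a hypothetical case with ten times the trials and one tenth the success probability (same mean of 6), illustrates the approximation tightening as n grows and p shrinks:

```r
k <- 0:15
lambda <- 6  # n * p = 50 * 0.12

# Largest gap between binomial and Poisson probabilities in the original setting
gap_n50 <- max(abs(dbinom(k, size = 50, prob = 0.12) - dpois(k, lambda = lambda)))

# Hypothetical setting: n = 500, p = 0.012, so n * p is still 6
gap_n500 <- max(abs(dbinom(k, size = 500, prob = 0.012) - dpois(k, lambda = lambda)))

print(c(gap_n50 = gap_n50, gap_n500 = gap_n500))
```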