A normal distribution is a distribution in which the data cluster around a central mean value and fall symmetrically on either side of it. Data are treated as approximately normal when the mean and median are the same and the data follow the 68-95-99.7 rule: 68% of the data fall within 1 standard deviation of the mean, 95% within 2 standard deviations, and ~99.7% within 3 standard deviations. Because the data are symmetric about the mean, they are considered “normally distributed”. An example of a normal distribution is blood pressure: across the entire population the average is roughly 120/80, and values above and below that average decline in prevalence at about the same rate.
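As a quick check of the 68-95-99.7 rule, pnorm() can compute the area under a standard normal curve within 1, 2, and 3 standard deviations of the mean (a minimal sketch; the percentages are the same for any choice of mean and standard deviation):
#Check the 68-95-99.7 rule with the standard normal cdf
pnorm(1) - pnorm(-1)   # ~0.6827, within 1 standard deviation
pnorm(2) - pnorm(-2)   # ~0.9545, within 2 standard deviations
pnorm(3) - pnorm(-3)   # ~0.9973, within 3 standard deviations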
A binomial distribution is used when there are two distinct outcomes of a trial, e.g., success and failure, win or lose, heads or tails. These two outcomes are mutually exclusive. This distribution models the probability of a given number of successes in a fixed number of random trials. One example is the number of heads in a series of coin flips (assuming that heads = success), say, the number of heads in 10 coin flips. Each trial must be independent of the others, meaning that its outcome cannot be affected by the outcome of another trial. For this distribution, the mean is equal to the number of trials (n) multiplied by the probability of success on a single trial (p), so the mean calculation is n*p.
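As a small illustration of this formula, for 10 flips of a fair coin (n = 10, p = 0.5) the mean n*p = 5 matches the mean computed directly from the binomial probabilities (a sketch using the example values above):
#Mean of a binomial distribution two ways
n <- 10
p <- 0.5
n * p                                    # mean from the formula n*p: 5
k <- 0:n
sum(k * dbinom(k, size = n, prob = p))   # same mean from the full distribution: 5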
A Poisson distribution models the number of times a specific event occurs in a given time frame, based on the average rate at which the event happens. This distribution can be used when the average rate of the event is known, and it describes how much variation occurs around that average number of occurrences. One example is modeling the number of computer sales in a given day. The mean is denoted lambda (λ) and the standard deviation is the square root of lambda (\(\sqrt{\lambda}\)).
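To illustrate, a quick simulation with rpois() (the rate of 6 computer sales per day is an assumed value for this example) shows the sample mean landing near λ and the sample standard deviation near \(\sqrt{\lambda}\):
#Simulate daily computer sales from a Poisson distribution
set.seed(1)                  # for a reproducible simulation
lambda <- 6                  # assumed average of 6 sales per day
sales <- rpois(10000, lambda = lambda)
mean(sales)                  # close to lambda (6)
sd(sales)                    # close to sqrt(lambda) (~2.45)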
The pdf, or probability density function, describes the relative likelihood of each value of a continuous random variable. The probability that the variable falls between two values a and b is the area under the pdf between those values, represented by the formula:
\(P(a \le X \le b) = \int_{a}^{b} f(x)\,dx\)
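For example, the probability that a standard normal variable falls between a = -1 and b = 2 can be found by numerically integrating the pdf with integrate(), and it matches the difference of the corresponding cdf values (a minimal sketch using base R):
#Area under the standard normal pdf between -1 and 2
integrate(dnorm, lower = -1, upper = 2)   # ~0.8186
pnorm(2) - pnorm(-1)                      # same area from the cdf: ~0.8186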
The pdf works well for the normal distribution because the normal distribution describes continuous variables. Additionally, because a normal distribution is symmetrical, it is easy to find the area under the curve between two points.
The cdf, or cumulative distribution function, measures the probability that a random variable will take on a value that is less than or equal to a particular number. The cdf is always within the range of 0 to 1, since it accumulates the total probability that the variable takes on a value at or below a given point. If we’re looking at the binomial distribution, this would tell us the probability that up to a certain number of successes would occur. The formula is \(F(x) = P(X \le x)\), which for a continuous variable is the integral of the pdf from \(-\infty\) to x.
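Continuing the coin-flip example, pbinom() returns this cumulative probability directly; a short sketch of P(X ≤ 4) heads in 10 flips of a fair coin, computed from the cdf and by summing the individual probabilities:
#Cumulative probability of at most 4 heads in 10 fair coin flips
pbinom(4, size = 10, prob = 0.5)          # ~0.377
sum(dbinom(0:4, size = 10, prob = 0.5))   # same value by summing the pdf: ~0.377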
The normal distribution can be adjusted using 2 parameters: the mean and the standard deviation. The mean shifts the curve to the left or right, and the standard deviation makes the curve wider and flatter or narrower and steeper. For the normal distribution, R requires the mean and standard deviation to be declared.
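A brief sketch of these effects (the parameter values below are arbitrary illustrative choices): overlaying dnorm() curves with different means and standard deviations shows the horizontal shift and the change in width:
#Effect of mean and standard deviation on the normal curve
x_vals <- seq(-10, 10, length.out = 200)
plot(x_vals, dnorm(x_vals, mean = 0, sd = 1), type = "l",
     xlab = "x", ylab = "Density",
     main = "Effect of Mean and SD on the Normal Curve")
lines(x_vals, dnorm(x_vals, mean = 3, sd = 1), lty = 2)   # same sd, shifted right
lines(x_vals, dnorm(x_vals, mean = 0, sd = 2), lty = 3)   # same mean, wider and flatter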
For a binomial curve, the parameters required are the number of trials (n) and the probability of success (p). These must be declared in R for the binomial distribution.
#Normal Distribution
data(Orange)                            # built-in tree growth data set
mean_O <- mean(Orange$circumference)    # sample mean of circumference
Stdev_O <- sd(Orange$circumference)     # sample standard deviation
x <- Orange$circumference
y <- dnorm(Orange$circumference, mean = mean_O, sd = Stdev_O)  # normal density at each observation
plot(x,y,
ylab = "Probability",
xlab = "Circumference",
main = "Normal Distribution of Tree Circumference")
#Binomial Distribution
##Generate Binomial Data
Binom_data = 0:10
plot(Binom_data,dbinom(Binom_data, size = 10, prob = 0.5),
type = 'h',
xlab = "Number of Successes",
ylab = "Probability",
main = "Binomial Distribution of 10 Coin Flips"
)
#Poisson Distribution
##Example: number of car accidents in a month
###Define Range
Accidents = 0:100
plot(Accidents, dpois(Accidents, lambda = 6),   # lambda = 6 accidents per month on average
     type = 'h',
     xlab = "Number of Accidents",
     ylab = "Probability",
     main = "Number of Accidents in a Month")
N = 100    # number of trials
x = 5      # number of successes of interest
pi = .1    # probability of success
dbinom(5,
size = 100,
prob = .1
)
## [1] 0.0338658
N = 100
pi = 10/100
dpois(5,lambda = (N*pi))
## [1] 0.03783327
The results are similar because the binomial counts successes versus failures over a fixed number of trials, while the Poisson models the rate at which events (here, deaths) occur over time, and both are discrete probability distributions that model the occurrence of discrete events. In addition, when the number of trials n is large and the probability of success p is small, the binomial distribution is well approximated by a Poisson distribution with \(\lambda = n \cdot p\), which is why the two probabilities above are so close.
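To see this connection numerically, a small sketch comparing dbinom() and dpois() while increasing n and holding n*p fixed at 10 (the same λ as above) shows the binomial probability converging toward the Poisson value:
#Binomial probabilities approach the Poisson as n grows with n*p held fixed
lambda <- 10
for (n in c(20, 100, 1000, 10000)) {
  cat("n =", n,
      " binomial:", dbinom(5, size = n, prob = lambda / n),
      " Poisson:", dpois(5, lambda = lambda), "\n")
}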