Part 1

Question 1: Please explain each of the 3 distributions in less than 4 sentences.

  1. A normal distribution is a continuous, bell-shaped distribution in which data cluster around the central mean value and fall off symmetrically on either side of it. Because of this symmetry, the mean and median are the same, and about 68% of the data fall within 1 standard deviation of the mean, about 95% within 2 standard deviations, and about 99.7% within 3 standard deviations. An example is blood pressure: the average across the entire population is about 120/80, and readings are generally normally distributed, with each side declining in prevalence at the same rate as values move away from the mean.

  2. A binomial distribution is used when each trial has two distinct, mutually exclusive outcomes, e.g., success or failure, win or lose, heads or tails. It models the number of successes in a fixed number of random trials, each with the same probability of success. One example is the number of heads in 10 coin flips (assuming that heads = success). Each trial must be independent of the others, meaning that its outcome cannot be affected by the outcome of another trial. For this distribution, the mean is equal to the number of trials (n) multiplied by the probability of success on a single trial (p), so the mean is n*p.

  3. A Poisson distribution models the number of times a specific event happens in a given time frame. It can be used when the average rate at which the event happens is known, and it describes how much the observed count varies around that average. One example is modeling the number of computers sold in a given day. The mean is denoted as lambda (λ) and the standard deviation is the square root of lambda (\(\sqrt{\lambda}\)).
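
The numerical claims in the three answers above can be checked directly in base R. This is a minimal sketch rather than part of the assignment: the values n = 10, p = 0.5, and lambda = 6 are illustrative assumptions, and the range 0:200 simply stands in for the Poisson's unbounded set of possible counts.

```r
# Empirical rule: probability within k standard deviations of the mean
# for a standard normal distribution (k = 1, 2, 3).
within_sd <- sapply(1:3, function(k) pnorm(k) - pnorm(-k))
print(round(within_sd, 4))

# Binomial mean: n * p should equal the probability-weighted average of
# the possible success counts 0..n (n and p are assumed values).
n <- 10
p <- 0.5
k <- 0:n
binom_mean <- sum(k * dbinom(k, size = n, prob = p))
print(c(formula = n * p, from_pmf = binom_mean))

# Poisson: the mean is lambda and the sd is sqrt(lambda).
# Counts above 200 have negligible probability when lambda = 6.
lambda <- 6
j <- 0:200
pois_mean <- sum(j * dpois(j, lambda = lambda))
pois_sd <- sqrt(sum((j - pois_mean)^2 * dpois(j, lambda = lambda)))
print(c(mean = pois_mean, sd = pois_sd, sqrt_lambda = sqrt(lambda)))
```

Each printed pair should agree to within rounding, which is a quick way to confirm the formulas before relying on them.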

Question 2: Explain what the pdf and cdf of a distribution measures. Pick any of the three distributions (or a distribution from the list above that we have not covered in class), and provide some intuition as to if the pdf formula makes sense or not.

  1. The pdf, or probability density function, describes the relative likelihood of each value of a continuous random variable. The probability that the variable falls between two values a and b is the area under the pdf between those values. This is represented by the formula:
    \(P(a \le X \le b) = \int_{a}^{b}f(x)dx\)

  2. The pdf formula makes sense for the normal distribution because the normal distribution describes continuous variables, so probabilities come from areas under the curve rather than from individual points. Additionally, because a normal distribution is symmetrical, the area between two points on one side of the mean equals the area between the mirror-image points on the other side, which makes these areas easier to find.

  3. The cdf, or cumulative distribution function, measures the probability that a random variable will take on a value that is less than or equal to a particular number. The cdf is always within the range of 0 to 1 and accumulates probability from the smallest possible value upward. If we’re looking at the binomial distribution, this would tell us the probability that up to a certain number of successes would occur. The formula is:
    \(F_X(x) = P(X \le x)\), for all \(x \in \mathbb{R}\)
     The right-hand side is the probability that X takes on a value less than or equal to x. The cdf makes sense for a binomial distribution because it can tell us the probability that X, the number of successes that occur, is less than or equal to x, the number of successes we’re interested in.
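
The link between the pdf and the cdf can be illustrated numerically. The sketch below is only an example under assumed values (a = -1, b = 2 for a standard normal; size = 10, prob = 0.5 for the binomial): it integrates the normal pdf with base R's `integrate` and compares the result with a difference of cdf values, then rebuilds the binomial cdf as a running sum of probabilities.

```r
# Continuous case: P(a <= X <= b) is the area under the pdf from a to b.
a <- -1
b <- 2
area <- integrate(dnorm, lower = a, upper = b)$value

# The same probability from the cdf: F(b) - F(a).
from_cdf <- pnorm(b) - pnorm(a)
print(c(integral = area, cdf_difference = from_cdf))

# Discrete case: the binomial cdf is the cumulative sum of the
# individual probabilities P(X = 0), P(X = 1), ..., P(X = 10).
k <- 0:10
pmf <- dbinom(k, size = 10, prob = 0.5)
cdf_from_pmf <- cumsum(pmf)
cdf_builtin <- pbinom(k, size = 10, prob = 0.5)
print(all.equal(cdf_from_pmf, cdf_builtin))
```

The last value of the binomial cdf is 1, since it is certain that at most 10 successes occur in 10 trials.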

Question 3: What are the key parameters that define the 3 distributions above (or a distribution from the list above)? Does R require these key parameters to be declared? Type the “?distribution” command in R to find out.

  1. The normal distribution is defined by 2 parameters: the mean and the standard deviation. The mean shifts the curve to the left or right, and the standard deviation widens or narrows the curve. R does not require these to be declared: when they are omitted, the normal functions (e.g., dnorm) default to mean = 0 and sd = 1, the standard normal.

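  2. For a binomial distribution, the parameters are the number of trials (n, the size argument in R) and the probability of success on each trial (p, the prob argument). Neither has a default, so both must be declared in R.

  3. For a Poisson distribution, the only parameter is the average rate lambda (λ). It has no default, so it must be declared in R.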
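
Besides reading the help pages, one way to check which parameters R actually requires is to inspect the function signatures and try calling the density functions without them. This is a small sketch using only base R; the input values 1.5 and 3 are arbitrary.

```r
# Show each density function's arguments and any default values.
print(args(dnorm))   # mean and sd are listed with defaults
print(args(dbinom))  # size and prob are listed without defaults
print(args(dpois))   # lambda is listed without a default

# Omitting mean and sd gives the standard normal (mean = 0, sd = 1).
same_as_standard <- isTRUE(all.equal(dnorm(1.5), dnorm(1.5, mean = 0, sd = 1)))
print(same_as_standard)

# Omitting a parameter that has no default raises an error;
# try() captures the error so the script keeps running.
binom_missing <- try(dbinom(3), silent = TRUE)
pois_missing <- try(dpois(3), silent = TRUE)
print(inherits(binom_missing, "try-error"))
print(inherits(pois_missing, "try-error"))
```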

Question 4: Give a few examples of situations that can be modeled with each of the 3 distributions above.

  1. Binomial Distribution:
    1. Drug treatment trials: success = patient cured. n = number of patients treated.
    2. Seeds are planted: success = germination. n = number of seeds planted.
  2. Normal Distribution:
    1. IQ scores of the population
    2. Birthweight of all babies born in the US
  3. Poisson Distribution:
    1. Quality control: the number of faulty products coming off an assembly line in a given time period
    2. Customer service: the number of calls to a customer service center about a product in a given time period

Question 5: Plot the distribution in part B (3 if you stick close to class notes, or 1 if you venture out).

Normal Distribution Plot

#Normal Distribution
##Fit a normal curve using the sample mean and sd of the circumference data
data(Orange)
mean_O <- mean(Orange$circumference)
Stdev_O <- sd(Orange$circumference)
x <- Orange$circumference
y <- dnorm(x, mean = mean_O, sd = Stdev_O)
##dnorm returns a density (the height of the curve), not a probability
plot(x, y,
     ylab = "Density",
     xlab = "Circumference",
     main = "Normal Distribution of Tree Circumference")

Binomial Distribution Plot

#Binomial Distribution
##Generate Binomial Data
Binom_data = 0:10
plot(Binom_data,dbinom(Binom_data, size = 10, prob = 0.5),
     type = 'h',
     xlab = "Number of Successes",
     ylab = "Probability",
     main = "Binomial Distribution of 10 Coin Flips"
     )

Poisson Distribution Plot

#Poisson Distribution
##Example: number of car accidents in a month
###Define Range
Accidents = 0:20  #probabilities beyond 20 are negligible when lambda = 6
plot(Accidents,dpois(Accidents,lambda = 6),
     type = 'h',
     xlab = "Number of Accidents",
     ylab = "Probability",
     main = "Number of Accidents in a Month")

Part 2

N = 100, x = 5, pi = .1

Question A: Then model both as a binomial and a Poisson, and provide your R code solutions.

Binomial Distribution

dbinom(5, 
       size = 100, 
       prob = .1
       )
## [1] 0.0338658

Poisson Distribution

N = 100
p = 10/100  #named p rather than pi so R's built-in constant pi is not overwritten
dpois(5,lambda = (N*p))
## [1] 0.03783327

Question B:

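The two results (0.0339 from the binomial and 0.0378 from the Poisson) are similar because both are discrete probability distributions that model counts of events: the binomial counts successes vs. failures over a fixed number of trials, while the Poisson counts occurrences (here, deaths) at an average rate over time. In addition, when the number of trials is large and the probability of success is small, as with N = 100 and pi = .1, a Poisson distribution with lambda = N*pi = 10 approximates the binomial, which is why the two answers are close.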
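
One way to see why the two Part 2 answers are close is to hold the expected count N*p at 10 and let the number of trials grow. The sketch below does this in base R; the trial counts beyond 100 are illustrative assumptions added for comparison, not part of the assignment.

```r
# Expected count and target value from Part 2: lambda = N * p = 10, x = 5.
lambda <- 10
x <- 5

# Increase the number of trials while shrinking p so N * p stays at lambda.
# The N values above 100 are assumed for illustration.
N_values <- c(100, 1000, 10000, 100000)
binom_probs <- dbinom(x, size = N_values, prob = lambda / N_values)
pois_prob <- dpois(x, lambda = lambda)

# Tabulate both probabilities and the gap between them.
comparison <- data.frame(
  N = N_values,
  binomial = binom_probs,
  poisson = pois_prob,
  abs_diff = abs(binom_probs - pois_prob)
)
print(comparison)
```

The first row reproduces the two values computed above, and the gap shrinks as N increases, which is consistent with the Poisson acting as an approximation to the binomial for many trials with a small success probability.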