Common Distributions in SWIRL

Common Distributions

Suppose we roll a fair six-sided die 3 times. What is the probability of getting exactly 2 sixes? For our notation, let X be the number of sixes obtained in the 3 rolls. Then X has a binomial distribution with n = 3 and p = 1/6.

dbinom(2, 3, 1/6)

## [1] 0.06944444
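As a quick check, the same value follows directly from the binomial pmf, choose(n, x) p^x (1 − p)^(n − x). The sketch below uses only base R; the variable names are illustrative.

```r
# Binomial pmf by hand: choose(n, x) * p^x * (1 - p)^(n - x)
n <- 3; x <- 2; p <- 1/6
by_hand <- choose(n, x) * p^x * (1 - p)^(n - x)
by_hand                                # equals 15/216
all.equal(by_hand, dbinom(x, n, p))    # TRUE
```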

The R function dbinom computes the pmf of a binomial distribution. For instance, if Y has a binomial distribution with n = 60 and p = 1/6, then the probability that Y = 11 is given by the R code dbinom(11,60,1/6), which evaluates to 0.1246.

dbinom(11,60,1/6)

## [1] 0.1245574

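Suppose we flip a biased coin 5 times, where the probability of tossing a head is .8 and a tail .2. What is the probability that you’ll toss at least 3 heads? Since “at least 3 heads” is the complement of “at most 2 heads”, we can either ask pbinom for the upper tail beyond 2 or subtract the lower tail from 1.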

pbinom(2, 5, .8, lower.tail = FALSE)

## [1] 0.94208

1-pbinom(2, 5, .8, lower.tail = TRUE)

## [1] 0.94208
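The same probability can also be obtained by summing the pmf over the outcomes 3, 4 and 5 heads. A minimal sketch:

```r
# P(X >= 3) for X ~ Binomial(5, 0.8), summing the pmf over 3, 4 and 5 heads
p_at_least_3 <- sum(dbinom(3:5, size = 5, prob = 0.8))
p_at_least_3
all.equal(p_at_least_3, pbinom(2, 5, 0.8, lower.tail = FALSE))  # TRUE
```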

Negative Binomial

Consider a sequence of independent Bernoulli trials with constant probability p of success. Let the random variable Y denote the total number of failures in this sequence before the rth success, that is, Y + r is equal to the number of trials necessary to produce exactly r successes with the last trial as a success. Here r is a fixed positive integer. To determine the pmf of Y, let y be an element of {y : y = 0,1,2,…}. Then, since the trials are independent, P(Y = y) is equal to the product of the probability of obtaining exactly r − 1 successes in the first y + r − 1 trials times the probability p of a success on the (y + r)th trial. This gives P(Y = y) = C(y + r − 1, r − 1) p^r (1 − p)^y, where C(n, k) denotes the binomial coefficient “n choose k”.
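The two factors in this argument can be checked numerically. The sketch below multiplies the binomial probability of r − 1 successes in the first y + r − 1 trials by p and compares the result with R's dnbinom, which counts failures before the rth success in the same way. The values r = 3 and p = 0.4 are made up for illustration.

```r
# Negative binomial pmf built from the two factors described above.
# r and p are illustrative values, not taken from the text.
r <- 3; p <- 0.4
y <- 0:10
# probability of r - 1 successes in the first y + r - 1 trials
first_part <- dbinom(r - 1, size = y + r - 1, prob = p)
# times the probability of a success on trial y + r
by_hand <- first_part * p
all.equal(by_hand, dnbinom(y, size = r, prob = p))  # TRUE
```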

Example 3.1.6. Suppose the probability that a person has blood type B is 0.12. In order to conduct a study concerning people with blood type B, patients are sampled independently of one another until 10 are obtained who have blood type B. Determine the probability that at most 30 patients have to have their blood type determined. Let Y have a negative binomial distribution with p = 0.12 and r = 10. Since Y counts the patients without type B sampled before the 10th patient with type B, at most 30 patients in total means Y + 10 ≤ 30, that is, Y ≤ 20, so the required probability is P(Y ≤ 20).

pnbinom(20,10,0.12)

## [1] 0.001895293
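One way to sanity-check this answer: needing at most 30 patients to find 10 with type B is the same event as seeing at least 10 type B patients among the first 30. The sketch below compares the two computations.

```r
# At most 30 trials for the 10th success <=> at least 10 successes in 30 trials
via_nbinom <- pnbinom(20, size = 10, prob = 0.12)
via_binom  <- pbinom(9, size = 30, prob = 0.12, lower.tail = FALSE)
all.equal(via_nbinom, via_binom)  # TRUE
```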

Geometric Distribution

In the special case r = 1, we say that Y has a geometric distribution. In terms of Bernoulli trials, Y is the number of failures until the first success. The geometric distribution was first discussed in Example 1.6.3 of Chapter 1. In the setting of the last example, the probability that exactly 11 patients have to have their blood type determined before the first patient with type B blood is found is (0.88)^11 (0.12). This is computed in R by dgeom(11,0.12) = 0.0294.

dgeom(11,0.12)

## [1] 0.0294097
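A short check, in base R, that dgeom matches the product of 11 failure probabilities and one success probability, and that it agrees with the negative binomial pmf when r = 1:

```r
# Geometric pmf: 11 failures, then the first success
p <- 0.12
by_hand <- (1 - p)^11 * p
all.equal(by_hand, dgeom(11, p))                           # TRUE
all.equal(dgeom(11, p), dnbinom(11, size = 1, prob = p))   # TRUE: geometric is the r = 1 case
```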

Multinomial Distribution

The binomial distribution is generalized to the multinomial distribution as follows. Let a random experiment be repeated n independent times. On each repetition, there is one and only one outcome from one of k categories. Call the categories C_1, C_2, …, C_k. For example, consider the upface of a roll of a six-sided die; then the categories are C_i = {i}, i = 1,2,…,6. For i = 1,…,k, let p_i be the probability that the outcome is an element of C_i and assume that p_i remains constant throughout the n independent repetitions. Define the random variable X_i to be equal to the number of outcomes that are elements of C_i, i = 1,2,…,k − 1. Because X_k = n − X_1 − ··· − X_{k−1}, X_k is determined by the other X_i’s. Hence, for the joint distribution of interest we need only consider X_1, X_2, …, X_{k−1}.
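R's dmultinom evaluates the joint pmf of such category counts. As a sketch, the block below uses a made-up outcome for six rolls of a fair die and then checks that k = 2 categories recovers the binomial example from the start of this section.

```r
# Six rolls of a fair die; the counts (each face once) are an illustrative outcome
counts <- c(1, 1, 1, 1, 1, 1)
p_each_once <- dmultinom(counts, prob = rep(1/6, 6))
all.equal(p_each_once, factorial(6) / 6^6)   # TRUE

# With k = 2 categories (six / not six) the multinomial reduces to the binomial
two_cat <- dmultinom(c(2, 1), prob = c(1/6, 5/6))
all.equal(two_cat, dbinom(2, 3, 1/6))        # TRUE
```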

Hypergeometric Distribution

Example: Suppose we draw 2 cards from a well shuffled standard deck of 52 cards and record the number of aces. The next R segment shows the probabilities over the range {0,1,2} for sampling with and without replacement, respectively:

rng <- 0:2; dbinom(rng,2,1/13); dhyper(rng,4,48,2)

## [1] 0.85207101 0.14201183 0.00591716

## [1] 0.850678733 0.144796380 0.004524887
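The hypergeometric probabilities can be reproduced by counting: choose x of the 4 aces and 2 − x of the 48 other cards, out of all choose(52, 2) two-card hands. A minimal sketch:

```r
# Number of aces in 2 cards drawn without replacement from 52
x <- 0:2
by_hand <- choose(4, x) * choose(48, 2 - x) / choose(52, 2)
all.equal(by_hand, dhyper(x, m = 4, n = 48, k = 2))  # TRUE
```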

Poisson Distribution

The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event. In other words, the Poisson distribution models counts, or the number of events, in some interval of time. From Wikipedia: “Any variable that is Poisson distributed only takes on integer values.”

The PMF of the Poisson distribution has one parameter, lambda. As with the other distributions, the PMF gives the probability that the Poisson distributed random variable X takes the value x. Specifically, P(X = x) = lambda^x e^(-lambda) / x!, for x = 0, 1, 2, ….

The mean and variance of the Poisson distribution are both lambda.
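This can be checked numerically by summing over the pmf. The sketch below uses an illustrative lambda of 2.5 and truncates the sum at 100, where the remaining tail is negligible.

```r
# Check E[X] and Var(X) numerically for an illustrative lambda
lambda <- 2.5
x <- 0:100                       # the tail beyond 100 is negligible for this lambda
px <- dpois(x, lambda)
m <- sum(x * px)                 # mean
v <- sum((x - m)^2 * px)         # variance
c(mean = m, variance = v)        # both approximately 2.5
```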

Poisson random variables are used to model rates such as the rate of hard drive failures. We write X~Poisson(lambda*t) where lambda is the expected count per unit of time and t is the total monitoring time.

For example, suppose the number of people that show up at a bus stop is Poisson with a mean of 2.5 per hour, and we want to know the probability that at most 3 people show up in a 4-hour period. We use the R function ppois, which returns the probability that the random variable is less than or equal to its first argument, here 3. We only need to specify the quantile (3) and the mean (2.5 * 4 = 10). We can use the default parameters, lower.tail=TRUE and log.p=FALSE.

ppois(3, 2.5*4, lower.tail = TRUE)
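The same probability can be recovered by summing the Poisson pmf over x = 0, 1, 2, 3. A small sketch; the variable names are illustrative.

```r
lambda_t <- 2.5 * 4                      # expected arrivals in 4 hours
x <- 0:3
# sum of lambda^x * exp(-lambda) / x! over x = 0..3
by_hand <- sum(lambda_t^x * exp(-lambda_t) / factorial(x))
by_hand
all.equal(by_hand, ppois(3, lambda_t))   # TRUE
```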

Finally, the Poisson distribution approximates the binomial distribution in certain cases. Recall that the binomial distribution is the discrete distribution of the number of successes, k, out of n independent binary trials, each with probability p. If n is large and p is small then the Poisson distribution with lambda equal to n*p is a good approximation to the binomial distribution.

To see this, use the R function pbinom to compute the probability that you’ll see at most 5 successes out of 1000 trials, each of which has probability .01. As before, you can use the default parameter values (lower.tail=TRUE and log.p=FALSE) and just specify the quantile, size, and probability.

pbinom(5,1000,.01)

## [1] 0.06613951

Now use the function ppois with quantile equal to 5 and lambda equal to n*p to see if you get a similar result.

ppois(5,1000*.01)

## [1] 0.06708596

See how close they are? This worked because n was large (1000) and p was small (.01).
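To see how the approximation depends on n and p, the sketch below holds lambda = n*p = 10 fixed and compares the two cdfs at 5 for a few illustrative choices of n. The gap shrinks as n grows and p shrinks.

```r
# Keep n * p = 10 fixed and vary n; the choices of n are illustrative
n <- c(20, 100, 1000, 10000)
p <- 10 / n
gap <- abs(pbinom(5, n, p) - ppois(5, 10))
data.frame(n = n, p = p, gap = gap)
```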

Normal Distribution

The R function qnorm(p) returns the value of x (quantile) for which the area under the standard normal density curve to the left of x equals the argument p. (Recall that the entire area under the curve is 1.) Use qnorm now to find the 10th percentile of the standard normal. Remember the argument p must be between 0 and 1. You don’t have to specify any of the other parameters since the default is the standard normal.

qnorm(.10)

## [1] -1.281552

For a normal distribution that is not standard, we can still use R’s qnorm function and simply specify the mean and standard deviation (the square root of the variance). Do this now. Find the 97.5th percentile of a normal distribution with mean 3 and standard deviation 2.

qnorm(.975,3,2)

## [1] 6.919928
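Equivalently, a quantile of any normal distribution is the standard normal quantile rescaled by the standard deviation and shifted by the mean. A short check:

```r
# Quantile of N(mean = 3, sd = 2) from the standard normal quantile
z <- qnorm(0.975)        # about 1.96
all.equal(3 + 2 * z, qnorm(0.975, mean = 3, sd = 2))  # TRUE
```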

Suppose you have a normal distribution with mean 1020 and standard deviation of 50 and you want to compute the probability that the associated random variable X > 1200. The easiest way to do this is to use R’s pnorm function in which you specify the quantile (1200), the mean (1020) and standard deviation (50).

pnorm(1200,mean=1020,sd=50,lower.tail=FALSE)

## [1] 0.0001591086

Alternatively, we could transform the given distribution to a standard normal. We compute the number of standard deviations the specified number (1200) is from the mean with Z = (X - mu)/sigma. This is our new quantile. We can then use the standard normal distribution and the default values of pnorm. Remember to specify that lower.tail is FALSE.

pnorm((1200-1020)/50,lower.tail=FALSE)

## [1] 0.0001591086

For practice, using the same distribution, find the 75th percentile. Use qnorm and specify the probability (.75), the mean (1020) and standard deviation (50). Since a percentile is defined by the area to its left, we can use the default lower.tail=TRUE.

qnorm(.75, 1020, 50, lower.tail = TRUE)

## [1] 1053.724
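As a final check, pnorm undoes qnorm: feeding the 75th percentile back into pnorm with the same mean and standard deviation should return .75.

```r
# The area to the left of the 75th percentile is 0.75
q75 <- qnorm(0.75, mean = 1020, sd = 50)
area_left <- pnorm(q75, mean = 1020, sd = 50)
all.equal(area_left, 0.75)   # TRUE
```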

Neal V. Quizon

7/18/2019