Project #3 - Probability Distributions

Purpose

This project will demonstrate your understanding of the normal and binomial probability distributions in R and RStudio.

Question 1

Assume IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. If a person is randomly selected, find each of the requested probabilities. Here, x denotes the IQ of the randomly selected person.

P(x > 65)

#First assign values to the mean and standard deviation.
mean_IQ <- 100
sd_IQ <- 15

#Then use pnorm
pnorm(65, mean_IQ, sd_IQ, lower.tail = FALSE)

## [1] 0.9901847

The probability that a randomly selected person has an IQ over 65 is 99.02%.

P(x < 150)

#Again, use pnorm to find the percentage.
pnorm(150, mean_IQ, sd_IQ, lower.tail = TRUE)

## [1] 0.9995709

The probability that a randomly selected person has an IQ less than 150 is 99.96%.
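
As a cross-check on the two answers above, the sketch below computes the upper-tail probability for an IQ of 65 in three equivalent ways: directly, as the complement of the lower tail, and after standardizing to a z-score. The variable names `p_upper`, `p_complement`, `z_65`, and `p_standard` are introduced here for illustration only.

```r
# Values given in Question 1
mean_IQ <- 100
sd_IQ <- 15

# Upper tail computed directly, as in the answer above
p_upper <- pnorm(65, mean_IQ, sd_IQ, lower.tail = FALSE)

# The same area as the complement of the lower tail
p_complement <- 1 - pnorm(65, mean_IQ, sd_IQ)

# The same area after converting 65 to a z-score and using the standard normal
z_65 <- (65 - mean_IQ) / sd_IQ
p_standard <- pnorm(z_65, lower.tail = FALSE)

print(c(p_upper, p_complement, p_standard))
```

All three values should agree, which is a quick way to confirm that the lower.tail argument was set as intended.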

Question 2

Assume the same mean and standard deviation of IQ scores that was described in question 1.

A high school offers a special program for gifted students. In order to qualify, students must have IQ scores in the top 5%. What is the minimum qualifying IQ?

#To find the score that delineates the top 5%, use qnorm.
qnorm(0.05, mean_IQ, sd_IQ, lower.tail = FALSE )

## [1] 124.6728

To qualify for the gifted program, a student must have an IQ of at least 124.67.
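
Since qnorm is the inverse of pnorm, the cutoff can be checked by feeding it back into pnorm. The following is a minimal sketch of that check; the names `cutoff`, `tail_area`, and `cutoff_lower` are illustrative and not part of the assignment.

```r
# Values given in Question 1
mean_IQ <- 100
sd_IQ <- 15

# Score that leaves 5% of the distribution in the upper tail
cutoff <- qnorm(0.05, mean_IQ, sd_IQ, lower.tail = FALSE)

# Feeding the cutoff back into pnorm should recover the 5% tail area
tail_area <- pnorm(cutoff, mean_IQ, sd_IQ, lower.tail = FALSE)

# Equivalent formulation: the top 5% starts at the 95th percentile
cutoff_lower <- qnorm(0.95, mean_IQ, sd_IQ)

print(c(cutoff, tail_area, cutoff_lower))
```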

If one person is randomly selected, what is the probability that their IQ score is greater than 125?

#Since 125 is close to the value found in part a, I'm guessing that the percentage is close to 5%.
pnorm(125, mean_IQ, sd_IQ, lower.tail = FALSE)

## [1] 0.04779035

The probability that a randomly selected person has an IQ score greater than 125 is 4.78%.

Question 3

Still using the mean and standard deviation from question 1, what is the z-score for an IQ of 140?

The z-score is calculated with the formula \(z = \frac{(x - \mu)}{\sigma}\)

(140 - mean_IQ)/sd_IQ

## [1] 2.666667

The z-score for an IQ of 140 is +2.67.

We mentioned in week 6 that a data value is considered “unusual” if it lies more than two standard deviations from the mean. Is an IQ of 140 considered unusual?

Yes. With a z-score of 2.67, an IQ of 140 is considered unusual.

What is the probability of getting an IQ greater than 140?

#This calls for pnorm, again.
pnorm(140, mean_IQ, sd_IQ, lower.tail = FALSE)

## [1] 0.003830381

The probability of an IQ greater than 140 is 0.38%.
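
To put the two-standard-deviation rule for “unusual” values in context, the sketch below estimates how often a normally distributed value falls more than two standard deviations from the mean in either direction. It works on the standard normal scale, so it applies to IQ scores or any other normal variable; the names `p_unusual` and `p_within` are illustrative.

```r
# Probability of landing more than 2 standard deviations from the mean,
# counting both tails (the normal curve is symmetric, so double one tail)
p_unusual <- 2 * pnorm(-2)

# Probability of landing within 2 standard deviations of the mean
p_within <- pnorm(2) - pnorm(-2)

print(c(p_unusual, p_within))
```

Under this rule roughly 4.6% of values count as unusual, so an IQ of 140, with an upper-tail probability of 0.38%, sits well inside that group.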

Question 4

You are taking a 15-question multiple-choice quiz. Each question has 5 options (a, b, c, d, e), and you randomly guess on every question.

How many questions do you expect to answer correctly on average?

#The expected value is calculated by multiplying the number of questions by the probability that any one question is correct.
15*0.2

## [1] 3

I expect 3 questions of the 15 to be answered correctly.
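
The shortcut n × p can be confirmed from the definition of expected value by weighting each possible score by its binomial probability. The sketch below does that and also reports the standard deviation from the usual binomial formula sqrt(n × p × (1 − p)); all variable names are illustrative.

```r
n_questions <- 15
p_correct <- 0.2

# Every possible number of correct answers and its binomial probability
scores <- 0:n_questions
probs <- dbinom(scores, size = n_questions, prob = p_correct)

# Expected value from the definition: sum of (value * probability)
expected_from_pmf <- sum(scores * probs)

# Standard deviation of a binomial count
sd_score <- sqrt(n_questions * p_correct * (1 - p_correct))

print(c(expected_from_pmf, sd_score))
```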

What is the probability that you get every question correct?

#To find the probability of answering every question correctly, use dbinom with a probability of 0.2.
dbinom(15,15,0.2)

## [1] 3.2768e-11

For all intents and purposes, this is a probability of zero.

What is the probability that you get every question incorrect?

#Since a wrong guess is much more likely than a correct one, I expect the probability of answering every question incorrectly to be noticeably greater than zero.
dbinom(15,15,0.8)

## [1] 0.03518437

#This is an alternate method.
dbinom(x = 0, size =  15, prob = 0.2)

## [1] 0.03518437

The probability of answering ALL questions INCORRECTLY is 3.52%.
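
Because the guesses are independent, both extreme cases reduce to a single probability raised to the 15th power. The sketch below checks the dbinom results against that direct calculation; the names `p_all_correct` and `p_all_incorrect` are illustrative.

```r
# Probability of 15 correct guesses in a row (each has probability 0.2)
p_all_correct <- 0.2^15

# Probability of 15 incorrect guesses in a row (each has probability 0.8)
p_all_incorrect <- 0.8^15

# Compare with the binomial density at the two extremes
print(c(p_all_correct, dbinom(15, 15, 0.2)))
print(c(p_all_incorrect, dbinom(0, 15, 0.2)))
```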

Question 5

Consider again the 15-question multiple-choice quiz in which each question has 5 options (a, b, c, d, e) and you randomly guess on every question.

How many questions does one need to answer correctly in order to score exactly 60%?

0.6*15

## [1] 9

60% of 15 is 9.

If a grade of 60% or lower is considered failing, then what is the probability of you failing?

#Calculate the probability of answering at most 9 questions correctly
pbinom(q = 9,size = 15,prob = 0.2)

## [1] 0.9998868

The probability of failing is 99.99%.
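
The cumulative probability from pbinom is simply the sum of the individual probabilities for 0 through 9 correct answers. A minimal sketch of that equivalence, with illustrative variable names:

```r
# Cumulative probability of at most 9 correct answers
p_fail_cdf <- pbinom(q = 9, size = 15, prob = 0.2)

# The same probability built up from the individual outcomes 0, 1, ..., 9
p_fail_sum <- sum(dbinom(x = 0:9, size = 15, prob = 0.2))

print(c(p_fail_cdf, p_fail_sum))
```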

If you need a grade of 80% or higher on this quiz to maintain a passing grade, what is the probability of you maintaining that passing grade?

#A grade of 80% requires at least 12 correct answers.
#First I'll calculate this as the complement of answering at most 11 questions correctly.
1- pbinom(q = 11, size =  15, prob = 0.2, lower.tail = TRUE)

## [1] 1.011253e-06

#As a check, repeat the calculation with lower.tail = FALSE.
pbinom(q = 11, size = 15, prob = 0.2, lower.tail = FALSE)

## [1] 1.011253e-06

There is a 0.0001% chance of maintaining a passing grade.
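
The choice of q = 11 matters: with lower.tail = FALSE, pbinom returns P(X > q), not P(X ≥ q). The sketch below contrasts the correct cutoff with the off-by-one version that uses q = 12; the variable names are illustrative.

```r
# "At least 12 correct" summed outcome by outcome
p_at_least_12 <- sum(dbinom(x = 12:15, size = 15, prob = 0.2))

# Upper tail with q = 11 gives P(X > 11), which is the same event
p_upper_11 <- pbinom(q = 11, size = 15, prob = 0.2, lower.tail = FALSE)

# Upper tail with q = 12 gives P(X > 12), which leaves out exactly 12 correct
p_upper_12 <- pbinom(q = 12, size = 15, prob = 0.2, lower.tail = FALSE)

print(c(p_at_least_12, p_upper_11, p_upper_12))
```

The first two values should match, while the third is smaller by the probability of getting exactly 12 correct.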

Question 6

Suppose you own a catering company. You hire local college students as servers. Not being the most reliable employees, there is an 80% chance that any one server will actually show up for a scheduled event. For a wedding scheduled on Saturday, you need at least 5 servers.

Suppose you schedule 5 employees. What is the probability that all 5 come to work?

dbinom(x = 5,size = 5,prob = 0.8)

## [1] 0.32768

If 5 employees are scheduled, the probability that all 5 will show up is 32.77%.

Suppose you schedule 7 employees. What is the probability that at least 5 come to work?

#First, I'll answer this by adding together the probability of 5 employees coming to work and the probability of 6 employees coming to work and the probability of 7 employees coming to work.
dbinom(x = 5,size = 7, prob = 0.8) + dbinom(x = 6,size = 7, prob = 0.8) + dbinom(x = 7,size = 7, prob = 0.8)

## [1] 0.851968

#Let's see if I get the same answer with pbinom
pbinom(q = 4, size = 7,prob = 0.8,lower.tail = FALSE)

## [1] 0.851968

When I increase the number of employees scheduled to 7, the probability that 5 or more people show up is 85.20%.

It is really important that you have at least 5 servers show up! How many employees should you schedule in order to be 99% confident that at least 5 show up? Hint: there is no single formula for the answer here, so consider a trial-and-error method, but show all work in an R chunk below.

#I'll write a loop that finds the smallest number of servers that have to be scheduled to achieve a 99% probability that at least 5 will show up.

#Let n be the number of employees to be scheduled. I'll increase n by one for each iteration and calculate a new probability, P. The initial conditions will be set to the values calculated in part b.
n <- 7
P <- 0.85
while (P < 0.99) {
  n <- n + 1
  P <- pbinom(q = 4, size = n, prob = 0.8, lower.tail = FALSE)
  cat("When n = ",n,",  P = ",P,"\n")
}

## When n =  8 ,  P =  0.9437184 
## When n =  9 ,  P =  0.9804186 
## When n =  10 ,  P =  0.9936306

From this result, we see that 10 employees must be scheduled to be 99% confident that at least 5 of them will show up. Seems like a good time to find more reliable employees!
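
As an alternative to the loop, the same search can be sketched in vectorized form, since pbinom accepts a vector for size. The upper limit of 20 candidates and the names `p_show`, `needed`, `candidates`, `p_enough`, and `smallest_n` are assumptions made for this illustration.

```r
p_show <- 0.8   # chance that any one server shows up
needed <- 5     # minimum number of servers required

# Candidate numbers of employees to schedule (upper limit of 20 is arbitrary)
candidates <- needed:20

# P(at least `needed` show up) for every candidate at once
p_enough <- pbinom(q = needed - 1, size = candidates, prob = p_show,
                   lower.tail = FALSE)

# Smallest schedule size that reaches the 99% target
smallest_n <- min(candidates[p_enough >= 0.99])

print(smallest_n)
```

This version does not depend on starting values carried over from an earlier part, which makes it easier to rerun with a different show-up probability or confidence target.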