Project #3 - Normal and Binomial Distributions

Purpose

In this project, Colin will demonstrate his understanding of probability and the normal and binomial distributions and try not to forget to remove any instructions or be ridiculous.

Question 1

Assume IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. If a person is randomly selected, find each of the requested probabilities. Here, x denotes the IQ of the randomly selected person.

q1_mean <- 100
q1_sd <- 15

P(x > 65)

(1 - pnorm(65,q1_mean,q1_sd)) * 100

## [1] 99.01847

P(x < 150)

pnorm(150,q1_mean,q1_sd)*100

## [1] 99.95709
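
As a cross-check, the upper-tail probability in the first part can also be requested directly through the `lower.tail` argument of `pnorm` instead of subtracting from 1. The sketch below restates the mean and standard deviation from the question so that it runs on its own; the `ex1_` names are illustrative and not part of the original project.

```r
# Restate the parameters so this chunk is self-contained
ex1_mean <- 100
ex1_sd <- 15

# P(x > 65) two ways: complement of the lower tail, and the upper tail directly
ex1_complement <- 1 - pnorm(65, ex1_mean, ex1_sd)
ex1_upper <- pnorm(65, ex1_mean, ex1_sd, lower.tail = FALSE)

# The two should agree up to floating-point error
all.equal(ex1_complement, ex1_upper)
```

For probabilities this far from 0, the two forms are interchangeable; `lower.tail = FALSE` mainly helps when the tail is tiny and subtraction from 1 would lose precision.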

Question 2

Assume the same mean and standard deviation of IQ scores that was described in question 1.

A high school offers a special program for gifted students. In order to qualify, students must have IQ scores in the top 5%. What is the minimum qualifying IQ?

qnorm(0.95, q1_mean, q1_sd)

## [1] 124.6728
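
Since `qnorm` is the inverse of `pnorm`, one way to sanity-check a cutoff is to feed it back into `pnorm` and confirm that 95% of scores fall at or below it. This is a minimal sketch with the parameters restated; `ex2_cutoff` and `ex2_below` are illustrative names.

```r
# Minimum qualifying IQ for the top 5%, with parameters restated
ex2_cutoff <- qnorm(0.95, mean = 100, sd = 15)

# Proportion of scores at or below the cutoff; should come back as 0.95
ex2_below <- pnorm(ex2_cutoff, mean = 100, sd = 15)
ex2_below
```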

If one person is randomly selected, what is the probability that their IQ score is greater than 110?

(1 - pnorm(110, q1_mean, q1_sd)) * 100

## [1] 25.24925

Question 3

Still using the mean and standard deviation from question 1, what is the z-score for an IQ of 140?

(140 - q1_mean) / q1_sd

## [1] 2.666667
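
A z-score places the value on the standard normal scale, so `pnorm` applied to the z-score with its default mean of 0 and standard deviation of 1 should match `pnorm` applied to the raw IQ with the original parameters. The sketch below checks that equivalence; the `ex3_` names are illustrative.

```r
# Restate the parameters and compute the z-score for an IQ of 140
ex3_mean <- 100
ex3_sd <- 15
ex3_z <- (140 - ex3_mean) / ex3_sd

# Lower-tail probability on the standard normal scale and on the IQ scale
ex3_standard <- pnorm(ex3_z)
ex3_raw <- pnorm(140, ex3_mean, ex3_sd)

# Both routes describe the same area under the curve
all.equal(ex3_standard, ex3_raw)
```

Note that `pnorm` expects a numeric value here; passing a logical such as `TRUE` would be silently treated as 1.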

We mentioned in week 6 that a data value is considered “unusual” if it lies more than two standard deviations from the mean. Is an IQ of 140 considered unusual?

q3b_unusual <- abs(140 - q1_mean) > 2 * q1_sd
q3b_unusual

## [1] TRUE

What is the probability of getting an IQ greater than 140?

q3c_answer <- 1 - pnorm(140, q1_mean, q1_sd)
round(q3c_answer * 100, 2)

## [1] 0.38

Question 4

You are taking a 15-question multiple-choice quiz. Each question has 5 options (a, b, c, d, e), and you randomly guess every question.

How many questions do you expect to answer correctly on average?

q4a_answer <- 15 * (1 / 5)
q4a_answer

## [1] 3

What is the probability that you get every question correct?

q4b_answer <- (1 / 5)^15
q4b_answer

## [1] 3.2768e-11

floor(q4b_answer)

## [1] 0

What is the probability that you get every question incorrect?

q4c_answer <- ((5 - 1)/5)^15
q4c_answer * 100

## [1] 3.518437
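
The three answers above can also be read off the binomial probability mass function. The following sketch, which assumes the number of correct guesses follows a binomial distribution with 15 trials and success probability 1/5, builds the full table with `dbinom`; the `ex4_` names are illustrative.

```r
# Binomial setup for the quiz: 15 questions, 1-in-5 chance per guess
ex4_n <- 15
ex4_p <- 1 / 5
ex4_k <- 0:ex4_n

# Probability of each possible number of correct answers
ex4_pmf <- dbinom(ex4_k, ex4_n, ex4_p)

# Expected number correct as a probability-weighted sum; should match n * p
ex4_expected <- sum(ex4_k * ex4_pmf)
ex4_expected

# P(all 15 correct) is the last entry, P(none correct) is the first
ex4_all_correct <- ex4_pmf[ex4_n + 1]
ex4_none_correct <- ex4_pmf[1]
```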

Question 5

Consider again the 15-question multiple-choice quiz in which each question has 5 options (a, b, c, d, e) and you randomly guess every question.

How many questions does one need to answer correctly in order to score exactly 60%?

q5a_answer <- 15 * 0.60
q5a_answer

## [1] 9

If a grade of 60% or lower is considered failing, then what is the probability of you failing?

q5b_answer <- pbinom(q5a_answer, 15, 1 / 5)
q5b_answer * 100

## [1] 99.98868

If you need a grade of 80% or higher on this quiz to maintain a passing grade, what is the probability of you maintaining that passing grade?

q5c_needed <- 15 * 0.80
q5c_answer <- 1 - pbinom(q5c_needed - 1, 15, 1 / 5)
q5c_answer

## [1] 1.011253e-06
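
Cumulative binomial questions like these come down to which counts are included at the boundary. A sketch that sums `dbinom` over explicit ranges makes those boundaries visible: 60% of 15 questions is 9 correct, and 80% is 12 correct. The `ex5_` names are illustrative.

```r
# Binomial setup for the quiz: 15 questions, 1-in-5 chance per guess
ex5_n <- 15
ex5_p <- 1 / 5

# P(X <= 9): counts 0 through 9, compared with the cumulative function
ex5_at_most_9 <- sum(dbinom(0:9, ex5_n, ex5_p))
all.equal(ex5_at_most_9, pbinom(9, ex5_n, ex5_p))

# P(X >= 12): counts 12 through 15, compared with the upper tail above 11
ex5_at_least_12 <- sum(dbinom(12:15, ex5_n, ex5_p))
all.equal(ex5_at_least_12, pbinom(11, ex5_n, ex5_p, lower.tail = FALSE))
```

Because `pbinom(q, ...)` returns P(X <= q), "at most k" uses `q = k`, while "at least k" uses the upper tail with `q = k - 1`.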

Question 6

Suppose you own a catering company. You hire local college students as servers. They are not the most reliable employees: there is an 80% chance that any one server will actually show up for a scheduled event. For a wedding scheduled on Saturday, you need at least 5 servers.

Suppose you schedule 5 employees. What is the probability that all 5 come to work?

q6a_answer <- dbinom(5, 5, 4/5)
q6a_answer * 100

## [1] 32.768

Suppose you schedule 7 employees. What is the probability that at least 5 come to work?

q6b_answer <- 1 - pbinom(4, 7, 4/5)
q6b_answer * 100

## [1] 85.1968

It is really important that you have at least 5 servers show up! How many employees should you schedule in order to be 99% confident that at least 5 show up? Hint: there is no single formula for the answer here, so maybe use some kind of trial and error method.

for(q6c_answer in 5:20) {
  yes_theyre_here <- 1 - pbinom(4, q6c_answer, 4/5)
  if (yes_theyre_here >= 0.99) {
    break
  }
}
q6c_answer

## [1] 10
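
The same trial-and-error search can be written without a loop, because `pbinom` accepts a vector for its `size` argument. This is one possible alternative rather than a replacement for the loop above; it assumes the same candidate range of 5 to 20 employees, and the `ex6_` names are illustrative.

```r
# Candidate numbers of employees to schedule
ex6_sizes <- 5:20

# P(at least 5 show up) for every candidate size at once
ex6_probs <- pbinom(4, ex6_sizes, 4 / 5, lower.tail = FALSE)

# Smallest schedule size that reaches the 99% threshold
ex6_answer <- ex6_sizes[which(ex6_probs >= 0.99)[1]]
ex6_answer
```

Keeping `ex6_probs` around also makes it easy to see how quickly the probability climbs as more employees are scheduled.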

Question 7

Generate a random sample of 10,000 numbers from a normal distribution with a mean of 51 and a standard deviation of 7. Store that data in an object called rand_nums.

rand_nums <- rnorm(10000, 51, 7)

Create a histogram of that random sample.

hist(rand_nums, main = "Question 7", xlab = "Value", breaks = 30, col = "lightblue", border = "black")
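
Because `rnorm` draws fresh values every time the document is knit, any counts computed from the sample will change from run to run. If reproducible output is wanted, one option is to call `set.seed` before sampling. The sketch below uses an arbitrary seed of 2024 and separate `ex7_` objects so that it does not overwrite `rand_nums`; both are assumptions for illustration.

```r
# Fix the random number generator state; the seed value itself is arbitrary
set.seed(2024)
ex7_first <- rnorm(10000, 51, 7)

# Resetting to the same seed reproduces exactly the same draws
set.seed(2024)
ex7_second <- rnorm(10000, 51, 7)

identical(ex7_first, ex7_second)
```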

Question 8

How many values in your rand_nums vector are below 40?

q8a_answer <- sum(rand_nums < 40)
q8a_answer

## [1] 580

For a theoretical normal distribution, how many of those 10,000 values would you expect to be below 40?

q8b_answer <- pnorm(40, 51, 7) * 10000
q8b_answer

## [1] 580.4157

Is your answer in part a reasonably close to your answer in part b?

q8a_answer

## [1] 580

q8b_answer

## [1] 580.4157

That looks reasonably close, even after knitting this document a few dozen times. Plus, there’s this:

((q8a_answer - q8b_answer)/q8b_answer) * 100

## [1] -0.07161569

Reasonably close.
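
To put a number on "reasonably close", the count of values below 40 can be treated as a binomial variable with 10,000 trials, which gives a standard deviation for how much that count should vary between knits. This is a rough sketch under that assumption, using the normal approximation for the two-standard-deviation range; the `ex8_` names are illustrative.

```r
# Theoretical chance that a single draw falls below 40
ex8_n <- 10000
ex8_p <- pnorm(40, 51, 7)

# Expected count and its binomial standard deviation (about 23)
ex8_expected <- ex8_n * ex8_p
ex8_sd <- sqrt(ex8_n * ex8_p * (1 - ex8_p))

# Range that should cover roughly 95% of knits
ex8_range <- c(ex8_expected - 2 * ex8_sd, ex8_expected + 2 * ex8_sd)
ex8_range
```

An observed count anywhere inside that range is consistent with ordinary sampling variation.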