Project #3 - Normal and Binomial Distributions

Purpose

In this project, students will demonstrate their understanding of probability and the normal and binomial distributions.

Question 1

Assume IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. If a person is randomly selected, find each of the requested probabilities. Here, x denotes the IQ of the randomly selected person.

P(x > 65)

#Find the probability that a randomly selected person has an IQ greater than 65.
pnorm(q = 65, mean = 100, sd = 15, lower.tail = FALSE)

## [1] 0.9901847

The probability that a randomly selected person has an IQ greater than 65 is 0.9902.

P(x < 150)

#Find the probability that a randomly selected person has an IQ lower than 150.
pnorm(q = 150, mean = 100, sd = 15)

## [1] 0.9995709

The probability that a randomly selected person has an IQ lower than 150 is 0.9996.
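As a sanity check, the upper-tail probability P(x > 65) can be reproduced two other ways: with the complement rule, and by converting 65 to a z-score and using the standard normal distribution. The sketch below uses only base R, and the variable names are illustrative.

```r
# Parameters of the IQ distribution given in the question
mu <- 100
sigma <- 15

# 1. Upper tail directly
p_upper <- pnorm(q = 65, mean = mu, sd = sigma, lower.tail = FALSE)

# 2. Complement rule: P(x > 65) = 1 - P(x <= 65)
p_complement <- 1 - pnorm(q = 65, mean = mu, sd = sigma)

# 3. Standardize first, then use the standard normal (mean 0, sd 1)
p_standard <- pnorm(q = (65 - mu) / sigma, lower.tail = FALSE)

# All three approaches should agree up to floating-point error
all.equal(p_upper, p_complement)
all.equal(p_upper, p_standard)
```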

Question 2

Assume the same mean and standard deviation of IQ scores that were described in question 1.

A high school offers a special program for gifted students. In order to qualify, students must have IQ scores in the top 5%. What is the minimum qualifying IQ?

#Find the minimum qualifying score to be accepted to a special program for gifted students, given that only students with IQ scores in the top 5% can be accepted to the program. Use qnorm() function.
qnorm(p = 0.95, mean = 100, sd = 15)

## [1] 124.6728

The minimum qualifying IQ score to be in the top 5% and to be accepted to the special program in the high school is 124.7.
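Since qnorm() is the inverse of pnorm(), the cutoff can be checked by feeding it back into pnorm(): the share of scores above the cutoff should come back as 5%. The sketch below also shows the equivalent upper-tail form of the same qnorm() call. Variable names are illustrative.

```r
# Cutoff with 95% of scores below it
cutoff <- qnorm(p = 0.95, mean = 100, sd = 15)

# Same cutoff expressed as "5% of scores above it"
cutoff_upper <- qnorm(p = 0.05, mean = 100, sd = 15, lower.tail = FALSE)

# Share of the population scoring above the cutoff
share_above <- pnorm(q = cutoff, mean = 100, sd = 15, lower.tail = FALSE)

all.equal(cutoff, cutoff_upper)
all.equal(share_above, 0.05)
```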

If one person is randomly selected, what is the probability that their IQ score is greater than 110?

#Find the probability that a randomly selected person has an IQ greater than 110 using pnorm() function.
pnorm(q = 110, mean = 100, sd = 15, lower.tail = FALSE)

## [1] 0.2524925

The probability that a randomly selected person would have an IQ greater than 110 is 0.2525.

Question 3

Still using the mean and standard deviation from question 1, what is the z-score for an IQ of 140?

#Find the Z-score for an IQ of 140. The formula for finding the Z-score is (x-mean)/sd.
(140-100)/15

## [1] 2.666667

The Z-score of an IQ of 140 is 2.67.

We mentioned in week 6 that a data value is considered “unusual” if it lies more than two standard deviations from the mean. Is an IQ of 140 considered unusual?

Yes, an IQ of 140 is considered unusual because it has a Z-score of 2.67, which means it lies more than 2 standard deviations from the mean.
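The "more than two standard deviations" rule can be wrapped in two small helpers. The function names z_score() and is_unusual() are hypothetical and not part of base R. The sketch assumes the rule applies in both directions, so a value far below the mean is also flagged.

```r
# Hypothetical helper: how many standard deviations x lies from the mean
z_score <- function(x, mu, sigma) {
  (x - mu) / sigma
}

# Hypothetical helper: TRUE when |z| exceeds the cutoff (2 by default)
is_unusual <- function(x, mu, sigma, cutoff = 2) {
  abs(z_score(x, mu, sigma)) > cutoff
}

# Both helpers are vectorized, so several IQ values can be checked at once
iq_values <- c(65, 110, 140)
z_values <- z_score(iq_values, mu = 100, sigma = 15)
flags <- is_unusual(iq_values, mu = 100, sigma = 15)
data.frame(iq = iq_values, z = round(z_values, 2), unusual = flags)
```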

What is the probability of getting an IQ greater than 140?

#Find the probability of getting an IQ greater than 140 using pnorm() function.
pnorm(q = 140, mean = 100, sd = 15, lower.tail = FALSE)

## [1] 0.003830381

The probability of getting an IQ greater than 140 is 0.0038.

Question 4

You are taking a 15-question multiple choice quiz. Each question has 5 options (a, b, c, d, e), and you randomly guess every question.

How many questions do you expect to answer correctly on average?

#To find the mean or average we need to multiply number of trials by the probability of success: n*p.
15*0.2

## [1] 3

On average, a person should expect to guess 3 out of 15 questions correctly.
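The shortcut n*p can be checked against the definition of an expected value, which is the sum of each possible number of correct answers weighted by its probability. A minimal sketch with illustrative variable names:

```r
# Every possible number of correct answers on a 15-question quiz
k <- 0:15

# Probability of each outcome when guessing among 5 options
p_k <- dbinom(x = k, size = 15, prob = 0.2)

# Expected value from the definition: sum of k * P(X = k)
mean_from_definition <- sum(k * p_k)

# Shortcut for a binomial distribution
mean_from_formula <- 15 * 0.2

all.equal(mean_from_definition, mean_from_formula)
```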

What is the probability that you get every question correct?

#Find the probability of getting all 15 questions correct using dbinom() function.
dbinom(x = 15, size = 15, prob = 0.2)

## [1] 3.2768e-11

Guessing all 15 questions correctly is very unlikely: the probability is 3.2768e-11.

What is the probability that you get every question incorrect?

#Find the probability of getting all questions incorrect using dbinom() function. Each guess is wrong with probability 0.8, so we count 15 wrong answers out of 15.
dbinom(x = 15, size = 15, prob = 0.8)

## [1] 0.03518437

The probability of getting all 15 questions incorrect is 0.0352.
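The same probability can be written in terms of correct answers: getting every question wrong means getting 0 correct with a success probability of 0.2. Both forms reduce to 0.8^15. A short sketch comparing the three, with illustrative variable names:

```r
# 15 wrong answers, where "wrong" has probability 0.8
p_all_wrong <- dbinom(x = 15, size = 15, prob = 0.8)

# 0 correct answers, where "correct" has probability 0.2
p_zero_right <- dbinom(x = 0, size = 15, prob = 0.2)

# Direct calculation: 15 independent wrong guesses
p_direct <- 0.8^15

all.equal(p_all_wrong, p_zero_right)
all.equal(p_all_wrong, p_direct)
```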

Question 5

Consider again the 15-question multiple choice quiz in which each question has 5 options (a, b, c, d, e) and you randomly guess every question.

How many questions does one need to answer correctly in order to score exactly 60%?

#Find how many questions a person should answer correctly to score 60%.
15*0.6

## [1] 9

To score 60% on a test with 15 questions, a person must answer 9 questions correctly.

If a grade of 60% or lower is considered failing, then what is the probability of you failing?

#Find the probability of getting 60% or lower on the test with 15 questions using pbinom() function.
pbinom(q = 9, size = 15, prob = 0.2)

## [1] 0.9998868

The probability of scoring 60% or lower on the test, that is, the probability of failing, is 0.9999.

If you need a grade of 80% or higher on this quiz to maintain a passing grade, what is the probability of you maintaining that passing grade?

#Find the probability of getting 80% or more on the test with 15 questions by guessing the answers using pbinom() function. A score of 80% means 0.8*15 = 12 correct answers. With lower.tail = FALSE, pbinom() returns P(X > q), so q must be 11 for the result to include 12.
pbinom(q = 11, size = 15, prob = 0.2, lower.tail = FALSE)

## [1] 1.011253e-06

Getting 80% or more on the test with 15 questions by guessing the answers is highly unlikely: the probability is 1.011253e-06.
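With lower.tail = FALSE, pbinom() returns P(X > q), which excludes q itself. An "at least k" probability therefore needs q = k - 1. The sketch below wraps that in a hypothetical helper, p_at_least() (not part of base R), and cross-checks it by summing dbinom() over the same outcomes on a small example.

```r
# Hypothetical helper: P(X >= k) for X ~ Binomial(n, p)
p_at_least <- function(k, n, p) {
  # lower.tail = FALSE gives P(X > q), so use q = k - 1 to include k
  pbinom(q = k - 1, size = n, prob = p, lower.tail = FALSE)
}

# Small example: at least 3 heads in 5 fair coin flips
via_pbinom <- p_at_least(k = 3, n = 5, p = 0.5)

# Cross-check: add up P(X = 3), P(X = 4), and P(X = 5)
via_dbinom <- sum(dbinom(x = 3:5, size = 5, prob = 0.5))

all.equal(via_pbinom, via_dbinom)
```

Summing dbinom() is slower for large n, but it makes the included outcomes explicit, which helps catch off-by-one mistakes.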

Question 6

Suppose you own a catering company. You hire local college students as servers. Not being the most reliable employees, there is an 80% chance that any one server will actually show up for a scheduled event. For a wedding scheduled on Saturday, you need at least 5 servers.

Suppose you schedule 5 employees, what is the probability that all 5 come to work?

#Find the probability that 5 out of 5 employees show up for work, given that the chance of each of them showing up is 0.8. Use dbinom() function.
dbinom(x = 5, size = 5, prob = 0.8)

## [1] 0.32768

The probability that all 5 workers show up for the event is 0.3277.

Suppose you schedule 7 employees, what is the probability that at least 5 come to work?

#Find the probability that at least 5 employees out of 7 show up for work using pbinom() function. With lower.tail = FALSE, pbinom() returns P(X > q), so q must be 4 for the result to include exactly 5.
pbinom(q = 4, size = 7, prob = 0.8, lower.tail = FALSE)

## [1] 0.851968

The probability that at least 5 employees out of 7 show up for work is 0.8520.

It is really important that you have at least 5 servers show up! How many employees should you schedule in order to be 99% confident that at least 5 show up? Hint: there is no single formula for the answer here, so maybe use some kind of trial and error method.

#Find how many employees you should schedule in order to be 99% sure that at least 5 of them show up for work. Use pbinom() function with q = 4, because lower.tail = FALSE returns P(X > q). Since there's no single formula to find the size, we try different sizes until the probability first reaches 0.99. A size of 9 gives 0.9804, which is too low, and a size of 10 gives 0.9936.
pbinom(q = 4, size = 10, prob = 0.8, lower.tail = FALSE)

## [1] 0.9936306

To be 99% sure that at least 5 servers will show up for work, you have to schedule 10 servers.
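The trial and error search can also be done in one pass, because pbinom() is vectorized over its size argument. The sketch below tries every crew size from 5 to 20 and picks the smallest one that meets the target. The upper bound of 20 and the variable names are assumptions made for illustration.

```r
target <- 0.99   # required confidence
needed <- 5      # servers required at the event
p_show <- 0.8    # chance that any one server shows up

# Candidate crew sizes (upper bound of 20 is an assumed search limit)
sizes <- needed:20

# P(at least `needed` show up) for each size; q = needed - 1 because
# lower.tail = FALSE returns P(X > q)
p_enough <- pbinom(q = needed - 1, size = sizes, prob = p_show,
                   lower.tail = FALSE)

# Table of every size tried, then the smallest size that meets the target
data.frame(size = sizes, p_enough = round(p_enough, 4))
smallest_size <- min(sizes[p_enough >= target])
smallest_size
```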

Question 7

Generate a random sample of 10,000 numbers from a normal distribution with mean of 51 and standard deviation of 7. Store that data in object called rand_nums.

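#Store 10000 random numbers drawn from a normal distribution with a mean of 51 and a standard deviation of 7 in the object called rand_nums. Note that round() rounds every value to the nearest whole number, which the question does not ask for.
rand_nums <- round(rnorm(n = 10000, mean = 51, sd = 7))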

Create a histogram of that random sample.

#Create a histogram of rand_nums object.
hist(rand_nums)
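To see how well a sample follows the distribution it was drawn from, the histogram can be drawn on a density scale with the theoretical normal curve on top. This is a sketch: the seed, the object name demo_nums, the number of breaks, and the labels are all assumptions chosen for illustration.

```r
set.seed(2024)  # assumed seed, only so the sketch is reproducible

# A fresh, unrounded sample with the same parameters as in the question
demo_nums <- rnorm(n = 10000, mean = 51, sd = 7)

# freq = FALSE puts the histogram on a density scale so it is
# comparable with dnorm()
hist(demo_nums, freq = FALSE, breaks = 40,
     main = "Sample vs. theoretical density", xlab = "Value")

# Overlay the theoretical density of Normal(mean = 51, sd = 7)
curve(dnorm(x, mean = 51, sd = 7), add = TRUE, lwd = 2)
```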

Question 8

How many values in your rand_nums vector are below 40?

#Find how many numbers in the rand_nums vector are below 40 using table() function.
table(rand_nums<40)

## 
## FALSE  TRUE 
##  9508   492

There are 492 numbers that are below 40 in the vector called rand_nums.
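As an alternative to table(), a logical vector can be counted directly: sum() counts the TRUE values and mean() gives their proportion. The sketch below uses its own sample with an assumed seed, so its counts will differ from the ones reported above.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible

# Illustrative sample (not the rand_nums object from the report)
sample_vals <- rnorm(n = 10000, mean = 51, sd = 7)

# TRUE counts as 1 and FALSE as 0, so sum() counts values below 40
n_below <- sum(sample_vals < 40)

# mean() of a logical vector is the proportion of TRUE values
prop_below <- mean(sample_vals < 40)

c(count = n_below, proportion = prop_below)
```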

For a theoretical normal distribution, how many of those 10,000 values would you expect to be below 40?

#To find how many of the 10000 values we should expect to be below 40, given a mean of 51 and a standard deviation of 7, we first need the probability that a value is below 40 using pnorm() function.
pnorm(q = 40, mean = 51, sd = 7)

## [1] 0.05804157

#Now that we know that the probability of getting a number below 40 is 0.058, we can find how many values we should expect to be under that number: n*p.
10000*0.058

## [1] 580

We should expect about 580 of the 10,000 values to be lower than 40.

Is your answer in part a reasonably close to your answer in part b?

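Yes, the two counts are reasonably close relative to the sample size: 492 observed versus about 580 expected out of 10,000 values, a gap of 88, or less than 1% of the sample. Part of the gap comes from round() in Question 7, because values between 39.5 and 40 are rounded up to 40 and no longer count as below 40. The same gap of 88 in a much smaller sample (for example, 800 values) would be a large discrepancy.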
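One way to judge "reasonably close" is to compare the gap with the standard deviation of the count, which follows a binomial distribution with n = 10000 and p = P(x < 40). The sketch below also computes the expected count when values are rounded to whole numbers first, as round() does in Question 7, in which case "below 40" effectively means below 39.5. Variable names are illustrative, and the two-standard-deviation range is a rough normal approximation.

```r
n <- 10000

# Probability that a single unrounded value falls below 40
p_below <- pnorm(q = 40, mean = 51, sd = 7)

# Expected count and its standard deviation: n*p and sqrt(n*p*(1-p))
expected <- n * p_below
sd_count <- sqrt(n * p_below * (1 - p_below))

# Rough range that should cover most samples (about 2 sd either side)
typical_range <- expected + c(-2, 2) * sd_count

# After rounding, a value counts as below 40 only if it was below 39.5
expected_rounded <- n * pnorm(q = 39.5, mean = 51, sd = 7)

c(expected = expected, sd = sd_count, expected_rounded = expected_rounded)
typical_range
```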