Project #3 - Normal and Binomial Distributions

Purpose

In this project, students will demonstrate their understanding of probability and the normal and binomial distributions.

Question 1

Assume IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. If a person is randomly selected, find each of the requested probabilities. Here, x denotes the IQ of the randomly selected person.

P(x > 65)

#Using the "pnorm" function to find the probability of an IQ higher than 65 (normally distributed data):

pnorm(65, mean=100, sd=15, lower.tail=FALSE)

## [1] 0.9901847

There is a 0.990 probability that a randomly selected person would have an IQ of 65 or higher.

P(x < 150)

#Using the "pnorm" function to find the probability of an IQ lower than 150:

pnorm(150, mean=100, sd=15)

## [1] 0.9995709

There is a probability of about 0.9996 (99.96%) that a randomly selected person will have an IQ lower than 150.
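
As a cross-check, both probabilities can also be obtained by standardizing first and then using the standard normal distribution. The sketch below assumes the same mean and standard deviation as the question; the variable names (mu, sigma, z_65, z_150, p_above_65, p_below_150) are illustrative and not part of the assignment.

```r
# Illustrative names; mu and sigma come from the problem statement.
mu <- 100
sigma <- 15

# Convert each IQ value to a z-score.
z_65 <- (65 - mu) / sigma
z_150 <- (150 - mu) / sigma

# Standard normal probabilities; these should agree with the pnorm calls
# that pass mean and sd directly.
p_above_65 <- pnorm(z_65, lower.tail = FALSE)
p_below_150 <- pnorm(z_150)

print(p_above_65)
print(p_below_150)
```

Passing mean and sd to pnorm, as done above, is simply the more compact form of the same calculation.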

Question 2

Assume the same mean and standard deviation of IQ scores that was described in question 1.

A high school offers a special program for gifted students. In order to qualify, students must have IQ scores in the top 5%. What is the minimum qualifying IQ?

#Using the "qnorm" function to calculate the IQ score at the top 5% (or 95th percentile):

qnorm(0.95, mean=100,sd=15)

## [1] 124.6728

The minimum IQ score to qualify into the gifted program is approximately 124.67.
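
Since qnorm is the inverse of pnorm, the cutoff can be sanity-checked by feeding it back into pnorm and confirming that 5% of scores lie above it. This is a minimal sketch; cutoff and p_above_cutoff are illustrative names.

```r
# cutoff is an illustrative name for the 95th percentile of the IQ distribution.
cutoff <- qnorm(0.95, mean = 100, sd = 15)

# Probability of scoring above the cutoff; this should come back as 0.05.
p_above_cutoff <- pnorm(cutoff, mean = 100, sd = 15, lower.tail = FALSE)
print(p_above_cutoff)
```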

If one person is randomly selected, what is the probability that their IQ score is greater than 110?

#Using the "pnorm" function to calculate the probability of an IQ of at least 110:

pnorm(110, mean=100, sd=15, lower.tail=FALSE)

## [1] 0.2524925

There is a probability of 0.252 that a randomly selected person has an IQ greater than 110.

Question 3

Still using the mean and standard deviation from question 1, what is the z-score for an IQ of 140?

Z-score formula: \[z = \frac{(x - \mu)}{\sigma} \]

#Calculating using Z-score formula:

(140-100)/15

## [1] 2.666667

The Z-score for an IQ of 140 is 2.67.

We mentioned in week 6 that a data value is considered “unusual” if it lies more than two standard deviations from the mean. Is an IQ of 140 considered unusual?

Yes. Because it is 2.67 standard deviations away from the mean (which is more than 2), an IQ score of 140 is considered unusual.
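
The "more than two standard deviations" rule can also be written as a logical test. The sketch below is one possible way to do it; z and is_unusual are illustrative names, and the absolute value is used so the same check would work for values below the mean.

```r
# z-score for an IQ of 140, using the mean and sd from question 1.
z <- (140 - 100) / 15

# A value is flagged as unusual when it is more than 2 sd from the mean
# in either direction.
is_unusual <- abs(z) > 2
print(is_unusual)
```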

What is the probability of getting an IQ greater than 140?

#Calculating using the pnorm function:
  
pnorm(140, mean=100, sd=15, lower.tail=FALSE)

## [1] 0.003830381

The probability of getting an IQ greater than 140 is 0.00383.

Question 4

You are taking a 15-question multiple choice quiz and each question has 5 options (a,b,c,d,e) and you randomly guess every question.

How many questions do you expect to answer correctly on average?

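When every answer is a random guess, the number of correct answers follows a binomial distribution with n = 15 trials and success probability p = 1/5 = 0.2. The formula for calculating a binomial mean (expected value) is: \(\mu = n \cdot p\)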

#Multiplying 15 (number of trials) by 0.2 (probability of success):
15*0.2

## [1] 3

I would expect to answer three questions correctly.
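
The shortcut formula can be checked against the definition of an expected value, which weights each possible number of correct answers by its probability. This is a sketch under the same assumptions (15 questions, success probability 0.2); k, probs, and expected_correct are illustrative names.

```r
# All possible numbers of correct answers.
k <- 0:15

# Binomial probability of each outcome.
probs <- dbinom(k, size = 15, prob = 0.2)

# Expected value by definition: sum of outcome times probability.
expected_correct <- sum(k * probs)
print(expected_correct)
```

The result should agree with n times p up to floating-point rounding.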

What is the probability that you get every question correct?

#Calculating the probability of exactly 15 successes out of 15 trials by using the dbinom function:

dbinom(x=15, size=15, prob=0.2)

## [1] 3.2768e-11

The probability of getting every question correct is 0.0000000000328.

What is the probability that you get every question incorrect?

#Calculating the probability of exactly 0 successes out of 15 trials, with a success probability of 0.2, using the dbinom function:

dbinom(x=0, size=15, prob=0.2)

## [1] 0.03518437

The probability of getting zero questions correct is 0.0352.

Question 5

Consider again the 15-question multiple choice quiz in which each question has 5 options (a,b,c,d,e) and you randomly guess every question.

How many questions does one need to answer correctly in order to score exactly 60%?

#Calculating sixty percent of 15:

15*0.6

## [1] 9

In order to receive a score of exactly 60% on the test, I need to answer 9 questions correctly.

If a grade of 60% or lower is considered failing, then what is the probability of you failing?

#Calculating the cumulative probability of 9 or fewer successes by using the pbinom function:

pbinom(q=9, size=15, prob=0.2)

## [1] 0.9998868

There is about a 99.99% chance that I will fail.

If you need a grade of 80% or higher on this quiz to maintain a passing grade, what is the probability of you maintaining that passing grade?

# Multiplying 0.80 by 15 to find out how many successes would result in a score of 80%:

15* 0.8

## [1] 12

# Calculating the cumulative probability of 12 or more successes by using the pbinom function. With lower.tail=FALSE, pbinom returns P(X > q), so q is set to 11:

pbinom(q=11, size=15, prob=0.2, lower.tail=FALSE)

## [1] 1.011253e-06

I will need to answer at least 12 questions correctly in order to maintain a passing grade. The probability of this happening is 0.00000101.
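
Because pbinom with lower.tail=FALSE returns P(X > q) rather than P(X ≥ q), an "at least" probability is easy to get off by one. One way to guard against this is to sum the individual probabilities for 12, 13, 14, and 15 correct answers and compare the total with the pbinom result. This is only a sketch; p_sum and p_tail are illustrative names.

```r
# P(X >= 12) as a sum of point probabilities.
p_sum <- sum(dbinom(12:15, size = 15, prob = 0.2))

# The same probability from the upper tail: P(X > 11) equals P(X >= 12).
p_tail <- pbinom(11, size = 15, prob = 0.2, lower.tail = FALSE)

print(p_sum)
print(p_tail)
```

The two values should agree up to floating-point rounding.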

Question 6

Suppose you own a catering company. You hire local college students as servers. Not being the most reliable employees, there is an 80% chance that any one server will actually show up for a scheduled event. For a wedding scheduled on Saturday, you need at least 5 servers.

Suppose you schedule 5 employees, what is the probability that all 5 come to work?

#Calculating the probability of exactly 5 out of 5 employees showing up for the event by using the dbinom function:

dbinom(5, size=5, prob=0.8)

## [1] 0.32768

The probability of all five employees showing up for work is 0.328.

Suppose you schedule 7 employees, what is the probability that at least 5 come to work?

#Calculating the probability of at least 5 out of 7 employees showing up for the event by using the pbinom function. With lower.tail=FALSE, pbinom returns P(X > q), so q is set to 4:

pbinom(4, size=7, prob=0.8, lower.tail=FALSE)

## [1] 0.851968

The probability of at least five of the seven employees showing up for work is 0.852.

It is really important that you have at least 5 servers show up! How many employees should you schedule in order to be 99% confident that at least 5 show up? Hint: there is no single formula for the answer here, so maybe use some kind of trial and error method.

#Calculating the cumulative probability of five or more servers showing up using the pbinom function in a trial-and-error method that experiments with the number of trials. With lower.tail=FALSE, pbinom returns P(X > q), so q is set to 4:

pbinom(4, size=8, prob=0.8, lower.tail=FALSE)

## [1] 0.9437184

pbinom(4, size=9, prob=0.8, lower.tail=FALSE)

## [1] 0.9804186

pbinom(4, size=10, prob=0.8, lower.tail=FALSE)

## [1] 0.9936306

In order to be at least 99% confident that at least five servers will show up, I should schedule at minimum ten employees, which gives a probability of 0.994.
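
The trial-and-error search can also be automated with a short loop that increases the number of scheduled employees until the probability of at least five showing up reaches 0.99. This is a minimal sketch, not the only way to do it; target, needed, p_show, and n are illustrative names. It relies on the fact that pbinom(needed - 1, ..., lower.tail=FALSE) gives P(X ≥ needed).

```r
target <- 0.99   # required confidence
needed <- 5      # servers required at the event
p_show <- 0.8    # chance that any one server shows up

# Start with the fewest employees that could possibly be enough.
n <- needed

# pbinom(needed - 1, ..., lower.tail = FALSE) is P(X >= needed).
while (pbinom(needed - 1, size = n, prob = p_show, lower.tail = FALSE) < target) {
  n <- n + 1
}

print(n)
print(pbinom(needed - 1, size = n, prob = p_show, lower.tail = FALSE))
```

The loop stops at the smallest n that meets the target, so there is no need to guess a starting range.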

Question 7

Generate a random sample of 10,000 numbers from a normal distribution with mean of 51 and standard deviation of 7. Store that data in object called rand_nums.

#Creating a vector of randomly-chosen numbers called "rand_nums" using the rnorm function:
#Setting the seed to ensure the vector does not change:
set.seed(1)
rand_nums <- rnorm(n=10000,mean=51,sd=7)

Create a histogram of that random sample.

#Creating histogram of "rand_nums":
hist(rand_nums, xlab="randomly chosen numbers")

Question 8

How many values in your rand_nums vector are below 40?

#Finding out how many values are below 40 using the sum function:
sum(rand_nums < 40)

## [1] 617

There are 617 values in the vector “rand_nums” that are below 40.

For a theoretical normal distribution, how many of those 10,000 values would you expect to be below 40?
I will calculate the z-score for 40 by using the formula \[z = \frac{(x - \mu)}{\sigma} \]

#Calculating the z-score:
(40-51)/7

## [1] -1.571429

40 has a z-score of -1.57. Using a z-score table, the probability of a randomly chosen value being 40 or below is 0.0582, so I would expect about 582 of the 10,000 values to be below 40.
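
A z-score table requires rounding z to two decimal places. As an alternative sketch, pnorm can compute the theoretical probability directly from the mean and standard deviation, and multiplying by 10,000 gives the expected count. The names p_below_40 and expected_below_40 are illustrative; the result should be close to the table-based figure, differing only because z is not rounded.

```r
# Theoretical probability that a value from N(51, 7) falls below 40.
p_below_40 <- pnorm(40, mean = 51, sd = 7)

# Expected number of such values in a sample of 10,000.
expected_below_40 <- 10000 * p_below_40

print(p_below_40)
print(expected_below_40)
```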

Is your answer in part a reasonably close to your answer in part b?

#Calculating the actual proportion of values under 40 by dividing the number of occurrences (617) by the total number of values:

617/10000

## [1] 0.0617

In the vector “rand_nums”, 6.17% of values are under 40. This is reasonably close to the theoretical probability of 5.82%, which was calculated from the standard normal distribution.
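
One hedged way to judge "reasonably close" is to compare the gap between the observed and expected counts with the ordinary sampling variability of the count. Under the theoretical model, the number of values below 40 out of 10,000 is binomial, so its standard deviation is the square root of n times p times (1 - p). The sketch below takes the observed count of 617 from the output above; all variable names are illustrative.

```r
n_values <- 10000
observed <- 617  # count reported by sum(rand_nums < 40) above

# Theoretical probability and expected count of values below 40.
p_below_40 <- pnorm(40, mean = 51, sd = 7)
expected <- n_values * p_below_40

# Standard deviation of the count under the binomial model.
sd_count <- sqrt(n_values * p_below_40 * (1 - p_below_40))

# Gap between observed and expected, measured in standard deviations.
gap_in_sd <- (observed - expected) / sd_count
print(gap_in_sd)
```

A gap of less than about two standard deviations is usually treated as consistent with ordinary random variation.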