Project #3 - Normal and Binomial Distributions

Purpose

In this project, students will demonstrate their understanding of probability and the normal and binomial distributions.

Question 1

IQ scores are approximately normally distributed with: X ~ N(μ=100,σ=15)

What proportion of the population has an IQ greater than 65? Interpret the result in context in a complete sentence.

#This code will allow us to calculate what proportion of the population has an IQ greater than 65
pnorm(q = 65, mean = 100, sd = 15, lower.tail = FALSE)

## [1] 0.9901847

Approximately 0.99, or about 99%, of the population has an IQ greater than 65.

What IQ score represents the top 5% of the population? Explain in a sentence what this value means in plain language.

#This code allows us to calculate the IQ score that represents the top 5% of the population
qnorm(p = 0.05, mean = 100, sd = 15, lower.tail = FALSE)

## [1] 124.6728

In order for a person’s IQ to be within the top 5% of the population, they must have an IQ of around 125 or higher.
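
As a quick check on the cutoff, the value returned by qnorm() can be passed back into pnorm() to confirm that it leaves 5% of the population above it. This is a minimal sketch; the variable names `cutoff` and `upper_area` are illustrative and not part of the original work.

```r
# IQ score that separates the top 5% from the rest
cutoff <- qnorm(p = 0.05, mean = 100, sd = 15, lower.tail = FALSE)

# Proportion of the population above that cutoff; this should recover 0.05
upper_area <- pnorm(q = cutoff, mean = 100, sd = 15, lower.tail = FALSE)
upper_area
```

Because qnorm() and pnorm() are inverses of each other, the round trip returns the probability that was supplied.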

Question 2

Recall our definition: A value is considered unusual if it lies more than two standard deviations from the mean.

Find the IQ values that mark the lower and upper bounds of the “usual” range.

#This code allows us to calculate the IQ values that mark the lower and upper bounds of the "usual" range.
100 - (15*2)

## [1] 70

(15*2) + 100

## [1] 130

Any IQ score below 70 or above 130 is considered unusual, while anything between 70 and 130 falls within the usual range.

What proportion of the population falls outside this range?

#This code allows us to calculate the proportion of the population whose IQ falls above 130. 
pnorm(q = 130, mean = 100, sd = 15, lower.tail = FALSE)

## [1] 0.02275013

#This code allows us to calculate the proportion of the population whose IQ falls below 70. 
pnorm(q = 70, mean = 100, sd = 15, lower.tail = TRUE)

## [1] 0.02275013

#This code allows us to calculate the proportion of the population whose IQs are considered outliers. 
pnorm(q = 130, mean = 100, sd = 15, lower.tail = FALSE) + pnorm(q = 70, mean = 100, sd = 15, lower.tail = TRUE)

## [1] 0.04550026

Approximately 0.046 of the population, or about 4.6%, falls outside the IQ range of 70 to 130.
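
Because the bounds sit exactly two standard deviations from the mean, the same proportion can also be found on the standard normal scale. The sketch below compares the two approaches; the names `outside_z` and `outside_iq` are illustrative and not part of the original work.

```r
# Proportion more than 2 standard deviations from the mean,
# computed on the standard normal scale (mean 0, sd 1).
# By symmetry, the two tails have equal area.
outside_z <- 2 * pnorm(q = -2)

# Same proportion computed directly on the IQ scale
outside_iq <- pnorm(q = 70, mean = 100, sd = 15) +
  pnorm(q = 130, mean = 100, sd = 15, lower.tail = FALSE)

# The two approaches agree up to floating-point error
all.equal(outside_z, outside_iq)
```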

Question 3

Two students took different standardized tests.

Alex took the SAT and scored 1650. Taylor took the ACT and scored 27.

Assume the distributions:
SAT~N(1500,300) ACT~N(21,5)

Compute the z-score for each student.

#This code allows us to calculate Alex's z-score
(1650 - 1500)/300

## [1] 0.5

#This code allows us to calculate Taylor's z-score
(27 - 21)/5

## [1] 1.2

Which student performed better relative to other test-takers?

Taylor performed better relative to other test-takers. Taylor’s z-score of 1.2 is greater than Alex’s z-score of 0.5, which means Taylor scored 1.2 standard deviations above the ACT mean while Alex scored only 0.5 standard deviations above the SAT mean.

Explain why comparing the raw scores alone would be misleading.

Comparing the raw scores alone would be misleading because the tests are on two different scales and we would not know whose scores were better or worse than the average.
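
One way to put both scores on a common scale is to convert each z-score into a percentile with pnorm(). This is a sketch that assumes both tests follow the normal models given above; the variable names are illustrative.

```r
# z-scores for each student, as computed above
z_alex <- (1650 - 1500) / 300
z_taylor <- (27 - 21) / 5

# Proportion of test-takers who scored below each student
pct_alex <- pnorm(q = z_alex)
pct_taylor <- pnorm(q = z_taylor)

pct_alex
pct_taylor
```

Under these assumptions, Alex scored higher than about 69% of SAT test-takers, while Taylor scored higher than about 88% of ACT test-takers.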

Question 4

You are taking a 15-question multiple choice quiz and each question has 5 options (a,b,c,d,e) and you randomly guess every question.

How many questions do you expect to answer correctly on average?

#This code allows us to calculate how many questions are expected to be answered correctly on average
    15 * 0.2

## [1] 3

I expect 3 questions to be answered correctly on average.

What is the probability that you get every question correct?

#This code allows us to calculate the probability of guessing every question correctly
    dbinom(x = 15, size = 15, prob = 0.20)

## [1] 3.2768e-11

The probability of guessing every question correctly is approximately 3.28 x 10^-11, a very small number close to zero.

What is the probability that you get every question incorrect?

#This code allows us to calculate the probability of guessing every question incorrectly
    dbinom(x = 0, size = 15, prob = 0.20)

## [1] 0.03518437

The probability of guessing every question incorrectly is approximately 0.04.

What is the probability of getting exactly 10 questions correct?

#This code allows us to calculate the probability of guessing exactly 10 questions correctly
    dbinom(x = 10, size = 15, prob = 0.20)

## [1] 0.000100764

The probability of guessing exactly 10 questions correctly is approximately 0.0001.

What is the probability of getting 10 or more correct answers?

#This code allows us to calculate the probability of guessing 10 or more questions correctly
    sum(dbinom(x = 10:15, size = 15, prob = 0.20))

## [1] 0.0001132257

The probability of guessing 10 or more questions correctly is approximately 0.0001.
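
The same upper-tail probability can be obtained in a single call with pbinom() instead of summing individual probabilities. The sketch below shows that the two approaches match; the names `upper_tail` and `summed` are illustrative.

```r
# P(X >= 10) is the same as P(X > 9), so q = 9 with lower.tail = FALSE
upper_tail <- pbinom(q = 9, size = 15, prob = 0.20, lower.tail = FALSE)

# Summing the individual probabilities for 10 through 15 correct
summed <- sum(dbinom(x = 10:15, size = 15, prob = 0.20))

# The two approaches agree up to floating-point error
all.equal(upper_tail, summed)
```

Note that `q` must be one less than the smallest count of interest, because `lower.tail = FALSE` returns the probability of values strictly greater than `q`.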

Suppose a student claims they guessed randomly but got 10 out of 15 correct. Based on your probability above, do you believe this claim? Explain your reasoning. (There is no single correct answer, but your reasoning must use the probability you calculated.)

I would not believe this claim. If the student were truly guessing, the probability of getting 10 or more questions correct is only about 0.0001, or roughly 1 in 10,000, so a result this high is very unlikely to come from random guessing.

If you need a grade of 80% or higher on this quiz to maintain a passing grade, what is the probability of you maintaining that passing grade?

#This code allows us to calculate the probability of receiving an 80% or higher (at least 12 of 15 correct) through guessing
sum(dbinom(x = 12:15, size = 15, prob = 0.20))

## [1] 1.011253e-06

The probability of receiving a grade of 80% or higher and maintaining a passing grade in the class is 1.01 x 10^-6, a very small number.

Question 5

A company schedules 10 employees for a shift. Each employee independently shows up with probability: p = 0.85

Let X = number of employees who show up

The company needs at least 8 workers to operate normally.

What is the probability that fewer than 8 employees show up?

#This code allows us to calculate the probability that fewer than 8 employees show up 
sum(dbinom(0:7, 10, 0.85))

## [1] 0.1798035

The probability that fewer than 8 employees show up is approximately 0.18.

What is the probability the company has enough workers for this shift?

#This code allows us to calculate the probability that at least 8 workers will show up
sum(dbinom(8:10, 10, 0.85))

## [1] 0.8201965

The probability that the company has enough workers for this shift is approximately 0.82.

Explain what this probability means in the context of scheduling workers.

This probability means that the company can expect to have enough workers on about 82% of shifts, so it will operate normally most of the time. However, roughly 18% of shifts would still be short-staffed.

Management wants at least a 95% chance of having enough workers. Should they schedule more than 10 employees? Explain your reasoning.

#This code allows us to calculate the probability of having enough workers if 12 employees were scheduled for a shift
sum(dbinom(8:12, 12, 0.85))

## [1] 0.9760781

Yes, management should schedule more than 10 employees for a shift. With 10 scheduled, the probability of at least 8 workers showing up is only about 0.82, which is below the 95% target. If they were to schedule just two more people for the shift, the probability of having enough workers increases to approximately 0.98.
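
To see how the probability changes with the number of employees scheduled, the calculation can be repeated over a range of staffing levels. This is a sketch that assumes each employee still shows up independently with probability 0.85; the range 10 to 15 and the variable names are illustrative choices, not part of the original work.

```r
# Candidate numbers of employees to schedule (illustrative range)
n_scheduled <- 10:15

# P(at least 8 show up) = P(X > 7) for each staffing level
p_enough <- pbinom(q = 7, size = n_scheduled, prob = 0.85, lower.tail = FALSE)

# Staffing level next to its probability of having enough workers
data.frame(n_scheduled, p_enough)

# Smallest staffing level that reaches the 95% target
min_n <- min(n_scheduled[p_enough >= 0.95])
min_n
```

Under these assumptions, scheduling 11 employees still falls short of the 95% target, and 12 is the smallest number that meets it.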

Question 6

ACT scores are approximately normally distributed where: X ~ N(21,5)

Use R to simulate 10,000 ACT scores.

#This code allows us to simulate a random sample with the same results each time the chunk is run
set.seed(123)

#This code allows us to simulate a set of 10,000 ACT scores
ACTscores <- rnorm(n = 10000, mean = 21, sd = 5)

#This code calculates the mean of the simulated ACT scores
meanACT <- mean(ACTscores)

#This code calculates the standard deviation of the simulated ACT scores
sdACT <- sd(ACTscores)

Find what percent of your simulated ACT scores were above 30

#This code estimates the percentage of ACT scores above 30 using a normal model with the simulated mean and standard deviation
pnorm(q = 30, mean = meanACT, sd = sdACT, lower.tail = FALSE) * 100

## [1] 3.555046

Based on the mean and standard deviation of the simulated scores, approximately 3.56% of ACT scores are estimated to fall above 30.
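
The percentage above comes from pnorm() applied to the simulated mean and standard deviation. A more direct alternative is to count how many of the simulated scores actually exceed 30. The sketch below re-creates the simulation so that it runs on its own; the name `pct_above_30` is illustrative, and no output is shown because it was not part of the original work.

```r
# Re-create the simulated scores so this chunk is self-contained
set.seed(123)
ACTscores <- rnorm(n = 10000, mean = 21, sd = 5)

# ACTscores > 30 is TRUE or FALSE for each score; the mean of a
# logical vector is the proportion of TRUE values
pct_above_30 <- mean(ACTscores > 30) * 100
pct_above_30
```

This version makes no normality assumption about the simulated data, so it answers the question about the simulated scores themselves.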

Now compute the theoretical probability of getting an ACT score above 30 using pnorm().

#This code calculates the theoretical probability of getting an ACT score above 30
pnorm(q = 30, mean = 21, sd = 5, lower.tail = FALSE)

## [1] 0.03593032

Compare the two values. Why are they similar but not identical?

The estimate from the simulated ACT scores was 0.0356, while the theoretical probability of an ACT score above 30 is 0.0359. They are very similar because the simulated scores were drawn from the same N(21,5) distribution. They are not identical because of randomness: even a sample of 10,000 scores has a mean and standard deviation that differ slightly from 21 and 5, and it does not represent all ACT scores.

Question 7

Sally has 15 hens and wants to begin selling eggs at the local market for fun and a little extra money. With each hen laying an average of 5 eggs a week, she has 75 to sell. She wants to know how many eggs she’ll be able to sell as large, medium, and possibly small. This way, she can estimate the money she’ll bring in during the week.

Chicken egg weights in grams are approximately normally distributed with: X ~ N(μ=58,σ=4.5)

#This code simulates a random sample with the same results each time the chunk is run
set.seed(135)

#This code simulates a random sample of Sally's eggs and their weights
EggWeights <- rnorm(n = 75, mean = 58, sd = 4.5)

#This code calculates the mean of the simulated egg weights
meanEggWeights <- mean(EggWeights)

#This code calculates the standard deviation of the simulated egg weights
sdEggWeights <- sd(EggWeights)

How many eggs can Sally expect to sell as large? (54.34g – 61.41g)

#This code calculates the proportion of eggs that weigh at least 54.34g, which Sally can sell as large
pnorm(q = 54.34, mean = meanEggWeights, sd = sdEggWeights, lower.tail = FALSE)

## [1] 0.8503882

#This calculates the number of laid eggs that are classified as large
75 * 0.85

## [1] 63.75

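Sally can expect to have approximately 64 eggs weighing 54.34g or more to sell as large. This count also includes any eggs heavier than the 61.41g upper bound of the large class. She’ll get the most profit from these.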
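
The calculation above counts every egg heavier than 54.34g, including any above 61.41g. If only eggs inside the stated large range should count, the area between the two bounds can be found by subtracting one lower-tail probability from another. This is a sketch that uses the stated model N(58, 4.5) rather than the simulated mean and standard deviation, so its result will differ from the values above; the name `p_large` is illustrative.

```r
# Proportion of eggs between the lower and upper bounds of the large class,
# using the stated population model rather than the simulated sample
p_large <- pnorm(q = 61.41, mean = 58, sd = 4.5) -
  pnorm(q = 54.34, mean = 58, sd = 4.5)
p_large

# Expected number of eggs in the large range out of 75
75 * p_large
```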

How many eggs can Sally expect to sell as small? (40.16g - 47.24g)

#This code calculates the portion of eggs Sally can sell as small
pnorm(q = 47.24, mean = meanEggWeights, sd = sdEggWeights, lower.tail = TRUE)

## [1] 0.001088206

#This calculates the number of laid eggs that are classified as small
75 * 0.00

## [1] 0

Sally can expect to have no small eggs to sell for this upcoming local market.

How many eggs can Sally expect to sell as medium? (47.25g - 54.33g)

#This code calculates the portion of eggs Sally can sell as medium
1 - pnorm(q = 54.34, mean = meanEggWeights, sd = sdEggWeights, lower.tail = FALSE) - pnorm(q = 47.24, mean = meanEggWeights, sd = sdEggWeights, lower.tail = TRUE)

## [1] 0.1485236

#This calculates the number of laid eggs that are classified as medium
75 * 0.15

## [1] 11.25

Sally can expect to have approximately 11 medium eggs to sell at the upcoming local market.

If Sally prices her large eggs at 50 cents each and her medium eggs at 30 cents each, she can expect to bring home approximately $35 a week.
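
The revenue estimate can be reproduced from the egg counts and prices given above. This is a minimal sketch; the variable names are illustrative.

```r
# Expected egg counts from the answers above
n_large <- 64
n_medium <- 11

# Prices in dollars per egg
price_large <- 0.50
price_medium <- 0.30

# Expected weekly revenue in dollars
revenue <- n_large * price_large + n_medium * price_medium
revenue
```

The total comes to $35.30, which matches the estimate of approximately $35 a week.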