Project #3 - Normal and Binomial Distributions

Purpose

In this project, students will demonstrate their understanding of probability and the normal and binomial distributions.

Question 1

IQ scores are approximately normally distributed with: X ∼ N(μ=100,σ=15)

What proportion of the population has an IQ greater than 65? Interpret the result in context in a complete sentence.

# Calculate the Z-score for an IQ of 65
(65 - 100) / 15

## [1] -2.333333

# Calculate the proportion of the population with an IQ above the Z-score
pnorm(q = -2.333333, lower.tail = FALSE)

## [1] 0.9901847

About 99% of the population has an IQ score greater than 65.
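As a cross-check, the same proportion can be computed without standardizing by hand, since pnorm() accepts the mean and standard deviation directly. This is a minimal sketch; the variable name `p_above_65` is only illustrative.

```r
# pnorm() standardizes internally when mean and sd are supplied,
# so no hand-computed (and rounded) Z-score is needed.
p_above_65 <- pnorm(q = 65, mean = 100, sd = 15, lower.tail = FALSE)
p_above_65

# Equivalent form: complement of the lower tail
1 - pnorm(q = 65, mean = 100, sd = 15)
```

Passing the raw score avoids carrying a rounded Z-score such as -2.333333 into the next step.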

What IQ score represents the top 5% of the population? Explain in a sentence what this value means in plain language.

# Find the value for the 95th percentile of IQ scores
qnorm(p = 0.95, mean = 100, sd = 15)

## [1] 124.6728

An IQ score of about 125 is the 95th percentile of IQ scores, so it marks the cutoff for the top 5% of the population. This means that about 5% of the population has an IQ score of 125 or higher.
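One way to sanity-check a quantile is to feed it back into pnorm(), which should recover the original probability. The sketch below also rebuilds the cutoff from the standard normal quantile; the name `cutoff` is illustrative.

```r
# qnorm() and pnorm() are inverses of each other
cutoff <- qnorm(p = 0.95, mean = 100, sd = 15)
pnorm(q = cutoff, mean = 100, sd = 15)   # recovers the 0.95 that went in

# Same cutoff built from the standard normal quantile: mean + z * sd
100 + qnorm(p = 0.95) * 15
```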

Question 2

Recall our definition: A value is considered unusual if it lies more than two standard deviations from the mean.

Find the IQ values that mark the lower and upper bounds of the “usual” range.

# Subtract 2 standard deviations from the mean
100 - 2*15

## [1] 70

# Add 2 standard deviations to the mean
100 + 2*15

## [1] 130

The lower bound of the usual range of IQ scores is 70 and the upper bound is 130.

What proportion of the population falls outside this range?

# Find the proportion of scores more than 2 standard deviations below the mean
pnorm(q = -2, lower.tail = TRUE)

## [1] 0.02275013

# Find the proportion of scores more than 2 standard deviations above the mean
pnorm(q = 2, lower.tail = FALSE)

## [1] 0.02275013

# Add the values together
pnorm(q = -2, lower.tail = TRUE) + pnorm(q = 2, lower.tail = FALSE)

## [1] 0.04550026

Approximately 4.55% of the population falls outside of this range.
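The same proportion can also be computed on the IQ scale, using the bounds themselves rather than Z-scores. This is a sketch under the stated N(100, 15) model; `mu`, `sigma`, `lower`, `upper`, and `p_outside` are illustrative names.

```r
mu <- 100
sigma <- 15

# Bounds of the "usual" range: two standard deviations either side of the mean
lower <- mu - 2 * sigma
upper <- mu + 2 * sigma

# Area below the lower bound plus area above the upper bound
p_outside <- pnorm(q = lower, mean = mu, sd = sigma) +
  pnorm(q = upper, mean = mu, sd = sigma, lower.tail = FALSE)
p_outside

# Because the normal curve is symmetric, doubling one tail gives the same value
2 * pnorm(q = -2)
```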

Question 3

Two students took different standardized tests.

Alex took the SAT and scored 1650. Taylor took the ACT and scored 27.

Assume the distributions:
SAT ∼ N(1500, 300)
ACT ∼ N(21, 5)

Compute the z-score for each student.

# Alex Z-score
(1650 - 1500) / 300

## [1] 0.5

# Taylor Z-score
(27 - 21) / 5

## [1] 1.2

Alex’s Z-score was 0.5 and Taylor’s Z-score was 1.2.

Which student performed better relative to other test-takers?

Taylor performed better relative to other test-takers: Taylor’s score was 1.2 standard deviations above the ACT mean, while Alex’s was only 0.5 standard deviations above the SAT mean.

Explain why comparing the raw scores alone would be misleading.

Comparing the raw scores alone is misleading because the two tests use very different scales. On the surface, Alex’s SAT score looks higher simply because the SAT uses larger numbers. Z-scores put both results on a common scale by measuring each score in standard deviations from its own test’s mean.
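To make the comparison more concrete, each Z-score can be converted to a percentile with pnorm(). This is a sketch assuming the two normal models given in the question; the variable names are illustrative.

```r
# Z-scores from the question
z_alex   <- (1650 - 1500) / 300
z_taylor <- (27 - 21) / 5

# Proportion of test-takers scoring below each student
pct_alex   <- pnorm(q = z_alex)
pct_taylor <- pnorm(q = z_taylor)

round(100 * c(Alex = pct_alex, Taylor = pct_taylor), 1)
```

Under these models Alex lands at roughly the 69th percentile and Taylor at roughly the 88th, which agrees with the Z-score ranking.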

Question 4

You are taking a 15-question multiple-choice quiz. Each question has 5 options (a, b, c, d, e), and you randomly guess on every question.

How many questions do you expect to answer correctly on average?

On average, you should expect to answer 1/5 of the questions correctly, which is 15 × 1/5 = 3 questions.
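The expected number of correct answers for a binomial count is n × p. A short sketch of that calculation, with a check against the definition of expected value (the variable names are illustrative):

```r
n_questions <- 15
p_correct   <- 1/5

# Expected value of a binomial count: n * p
expected_correct <- n_questions * p_correct
expected_correct

# Check against the definition: sum of k * P(X = k) over all possible k
k <- 0:n_questions
expected_by_sum <- sum(k * dbinom(x = k, size = n_questions, prob = p_correct))
expected_by_sum
```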

What is the probability that you get every question correct?

# Probability of guessing all 15 questions correctly
# (parentheses make the base explicit, since ^ is evaluated before /)
(1/5)^15

## [1] 3.2768e-11

The probability of getting every question correct is 3.28e-11.

What is the probability that you get every question incorrect?

# Probability of guessing all 15 questions incorrectly
# (parentheses are required: 4/5^15 would compute 4 / (5^15))
(4/5)^15

## [1] 0.03518437

The probability of getting every question incorrect is about 0.035, or 3.5%.

What is the probability of getting exactly 10 questions correct?

# Probability of getting exactly 10 questions correct
dbinom(x = 10, size = 15, prob = 1/5)

## [1] 0.000100764

The probability of getting exactly 10 questions correct is 0.01%.

What is the probability of getting 10 or more correct answers?

# Probability of getting 10 or more correct answers, P(X >= 10) = P(X > 9)
pbinom(q = 9, size = 15, prob = 1/5, lower.tail = FALSE)

## [1] 0.0001132257

The probability of getting 10 or more questions correct is about 1.13e-04.

Suppose a student claims they guessed randomly but got 10 out of 15 correct. Based on your probability above, do you believe this claim? Explain your reasoning. (There is no single correct answer, but your reasoning must use the probability you calculated.)

The probability of guessing randomly and getting 10 or more of the 15 answers correct is about 0.0113%, or roughly a 1 in 9,000 chance. It is possible, but very unlikely, so the claim is hard to believe.
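A general point about pbinom(): with lower.tail = FALSE it returns P(X > q), which excludes q itself, so "k or more" needs q = k - 1. The sketch below illustrates this for the quiz setting by comparing against a direct sum of dbinom() values; the variable names are illustrative.

```r
n <- 15
p <- 1/5
k <- 10

# "k or more" as an upper tail: P(X >= k) = P(X > k - 1)
at_least_k <- pbinom(q = k - 1, size = n, prob = p, lower.tail = FALSE)

# Same probability by adding up the individual outcomes k, k + 1, ..., n
by_sum <- sum(dbinom(x = k:n, size = n, prob = p))

# For contrast: q = k gives "more than k", which leaves out P(X = k)
more_than_k <- pbinom(q = k, size = n, prob = p, lower.tail = FALSE)

c(at_least_k = at_least_k, by_sum = by_sum, more_than_k = more_than_k)
```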

If you need a grade of 80% or higher on this quiz to maintain a passing grade, what is the probability of you maintaining that passing grade?

# Amount of correct answers you would need for a score of 80%
15 * 0.80

## [1] 12

# Probability of getting 12 or more correct answers, P(X >= 12) = P(X > 11)
pbinom(q = 11, size = 15, prob = 1/5, lower.tail = FALSE)

## [1] 1.011253e-06

Getting a grade of 80% or higher would mean getting 12 or more questions correct. The probability of this is about 1.01e-06, or roughly 1 in a million.

Question 5

A company schedules 10 employees for a shift. Each employee independently shows up with probability: p = 0.85

Let X = number of employees who show up

The company needs at least 8 workers to operate normally.

What is the probability that fewer than 8 employees show up?

# Probability that fewer than 8 employees show up
pbinom(q = 7, size = 10, prob = 0.85)

## [1] 0.1798035

The probability that fewer than 8 employees show up is about 18%.

What is the probability the company has enough workers for this shift?

# Probability that 8 or more employees show up, P(X >= 8) = P(X > 7)
pbinom(q = 7, size = 10, prob = 0.85, lower.tail = FALSE)

## [1] 0.8201965

The company needs at least 8 workers to operate normally, and the probability that 8 or more employees show up is about 82%. This is the complement of the probability that fewer than 8 show up.

Explain what this probability means in the context of scheduling workers.

An 82% chance of operating normally means that about 18% of the time, or roughly one shift in every five or six, the company will be short-staffed when 10 employees are scheduled. If that risk is too high, the company should look into scheduling more employees.

Management wants at least a 95% chance of having enough workers. Should they schedule more than 10 employees? Explain your reasoning.

# Probability that 8 or more employees show up when 13 are scheduled, P(X >= 8) = P(X > 7)
pbinom(q = 7, size = 13, prob = 0.85, lower.tail = FALSE)

## [1] 0.9924664

Yes, they should schedule more than 10 employees. With 10 scheduled, the chance of having enough workers is below the 95% target, while increasing the schedule to 13 raises it to about 99%.
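Rather than trying a single schedule size, the probability of having enough workers can be tabulated over a range of sizes to find the smallest one that meets the target. This is a sketch assuming the same independent show-up probability of 0.85; the range 10 to 15 and the variable names are illustrative choices.

```r
needed <- 8      # workers required to operate normally
p_show <- 0.85   # probability each scheduled employee shows up
target <- 0.95   # management's desired chance of having enough workers

# Candidate schedule sizes (an arbitrary range chosen for illustration)
sizes <- 10:15

# P(X >= needed) = P(X > needed - 1) for each schedule size
p_enough <- pbinom(q = needed - 1, size = sizes, prob = p_show, lower.tail = FALSE)
data.frame(scheduled = sizes, p_enough = round(p_enough, 4))

# Smallest schedule size that meets the target
min_size <- min(sizes[p_enough >= target])
min_size
```

Under these assumptions 11 employees gives about a 93% chance and 12 gives about 98%, so 12 is the smallest schedule that clears the 95% target.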

Question 6

ACT scores are approximately normally distributed where: X ∼ N(21,5)

Use R to simulate 10,000 ACT scores.

# Randomly generate 10000 ACT scores and store them into an object called ACTscores
ACTscores <- round(rnorm(n = 10000, mean = 21, sd = 5), digits = 0)

Find what percent of your simulated ACT scores were above 30

# Find a table of the proportion of ACT scores above 30
prop.table(table(ACTscores > 30))

## 
## FALSE  TRUE 
## 0.972 0.028

The percent of simulated ACT scores above 30 is about 2.8%.

Now compute the theoretical probability of getting an ACT score above 30 using pnorm().

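# The simulated scores were rounded to whole numbers, so a rounded score
# above 30 corresponds to an unrounded score above 30.5
pnorm(q = 30.5, mean = 21, sd = 5, lower.tail = FALSE)

## [1] 0.02871656

The theoretical probability of getting a (rounded) ACT score above 30 is about 2.9%.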

Compare the two values. Why are they similar but not identical?

They are similar but not identical because the first value is an experimental probability estimated from a sample of 10,000 randomly generated ACT scores, while the second is the theoretical probability from the normal model. The theoretical probability is what we expect in the long run; the experimental probability varies from one simulation to the next because of random sampling, so it lands close to the theoretical value without matching it exactly.
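The size of the gap to expect between the two values can be estimated with the standard error of a sample proportion, sqrt(p(1 - p)/n). The sketch below is one way to look at this. The seed value is an arbitrary assumption added for reproducibility, and the threshold of 30.5 reflects the rounding of simulated scores to whole numbers.

```r
set.seed(2024)   # arbitrary seed so the simulation can be reproduced
n_sims <- 10000

# Simulate rounded ACT scores and find the proportion above 30
scores   <- round(rnorm(n = n_sims, mean = 21, sd = 5), digits = 0)
sim_prop <- mean(scores > 30)

# Theoretical probability: a rounded score above 30 is an unrounded score above 30.5
theory <- pnorm(q = 30.5, mean = 21, sd = 5, lower.tail = FALSE)

# Typical size of the random gap between the two values
std_error <- sqrt(theory * (1 - theory) / n_sims)

c(simulated = sim_prop, theoretical = theory, std_error = std_error)
```

With 10,000 simulated scores the standard error is a little under 0.002, so a difference of a few tenths of a percentage point between the simulated and theoretical values is ordinary sampling variation.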

Question 7

Create your own real-world situation that could be modeled using either a binomial distribution or a normal distribution.

Your problem must include:
* A description of the situation
* Identification of reasonable parameters (mean, sd OR n, p)
* One probability calculation in R
* A written interpretation of the result

Examples might include:
* basketball free throws
* weather events
* exam scores
* products being defective

300 patients are being tested in a phase 2 clinical trial for the effectiveness of a new gene therapy drug. The rate of effectiveness is currently 60%. Researchers need at least 170 successes to secure funding and move on to phase 3 of the study. What is the probability they will be able to secure this funding?

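# Probability of at least 170 successes, P(X >= 170) = P(X > 169)
# (dbinom(x = 170, ...) would give the probability of exactly 170 successes)
pbinom(q = 169, size = 300, prob = 0.60, lower.tail = FALSE)

The expected number of successes is 300 × 0.60 = 180, which is above the 170 needed, so the probability that the researchers will be able to secure their funding is high, roughly 89%. They will most likely move on to phase 3 of the study.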
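"At least 170 successes" is a tail probability, P(X ≥ 170), rather than the probability of exactly 170. The sketch below shows two ways to compute that tail and a normal approximation as a rough cross-check, assuming the binomial model with n = 300 and p = 0.60 from the scenario; the variable names are illustrative.

```r
n <- 300
p <- 0.60
needed <- 170

# Upper tail: P(X >= needed) = P(X > needed - 1)
p_tail <- pbinom(q = needed - 1, size = n, prob = p, lower.tail = FALSE)

# Same tail by adding the probabilities of 170, 171, ..., 300 successes
p_sum <- sum(dbinom(x = needed:n, size = n, prob = p))

# Normal approximation with a continuity correction, using the
# binomial mean n * p and standard deviation sqrt(n * p * (1 - p))
p_normal <- pnorm(q = needed - 0.5, mean = n * p,
                  sd = sqrt(n * p * (1 - p)), lower.tail = FALSE)

c(tail = p_tail, sum = p_sum, normal_approx = p_normal)
```

The first two values agree up to floating-point error, and the normal approximation should land close to them because both n × p and n × (1 - p) are large here.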