In this project, students will demonstrate their understanding of probability and the normal and binomial distributions.
Assume IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. If a person is randomly selected, find each of the requested probabilities. Here, x denotes the IQ of the randomly selected person.
#Probability that a randomly selected person has an IQ greater than 65.
pnorm(q = 65, mean = 100, sd = 15, lower.tail = FALSE)
## [1] 0.9901847
The probability is 99.02%, rounded to two decimal places.
#Probability that a randomly selected person has an IQ less than 150.
pnorm(q = 150, mean = 100, sd = 15)
## [1] 0.9995709
The probability is 99.96%, rounded to two decimal places.
Assume the same mean and standard deviation of IQ scores described in question 1.
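As a cross-check, both probabilities can be reproduced by standardizing each cutoff first and then using the standard normal distribution. This is only a sketch; the names `z_65`, `z_150`, `p_above_65`, and `p_below_150` are illustrative and not part of the original assignment.

```r
# Standardize each cutoff: z = (x - mean) / sd
z_65  <- (65 - 100) / 15
z_150 <- (150 - 100) / 15

# With no mean/sd arguments, pnorm() uses the standard normal (mean 0, sd 1)
p_above_65  <- pnorm(z_65, lower.tail = FALSE)
p_below_150 <- pnorm(z_150)

# Both should agree with the direct calls that pass mean = 100 and sd = 15
all.equal(p_above_65, pnorm(q = 65, mean = 100, sd = 15, lower.tail = FALSE))
all.equal(p_below_150, pnorm(q = 150, mean = 100, sd = 15))
```

Passing `mean` and `sd` directly, as the original calls do, is the shorter route; the standardized form simply makes the link to z-scores explicit.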
#To find the minimum qualifying IQ in the top 5%.
qnorm(p = 0.05, mean = 100, sd = 15, lower.tail = FALSE)
## [1] 124.6728
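One way to sanity-check a quantile is to feed it back into `pnorm()` and confirm that it recovers the tail probability it was built from. The sketch below assumes the same mean and standard deviation; `cutoff` and `upper_tail` are illustrative names.

```r
# Cutoff for the top 5% (same call as in the report)
cutoff <- qnorm(p = 0.05, mean = 100, sd = 15, lower.tail = FALSE)

# Feeding the cutoff back into pnorm() should recover the 5% upper tail
upper_tail <- pnorm(q = cutoff, mean = 100, sd = 15, lower.tail = FALSE)
all.equal(upper_tail, 0.05)
```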
#To find the probability that a randomly selected person will score an IQ greater than 110.
pnorm(q = 110, mean = 100, sd = 15, lower.tail = FALSE)
## [1] 0.2524925
This probability is 25.25%, rounded to two decimal places.
#store the mean and standard deviation of the IQ scores...
IQ_mean <- 100
IQ_sd <- 15
#Use stored values to compute the z-score.
(140 - IQ_mean)/IQ_sd
## [1] 2.666667
We mentioned in week 6 that a data value is considered “unusual”
if it lies more than two standard deviations from the mean. Is an IQ of
140 considered unusual?
Since the mean IQ score is 100 and the standard deviation is 15, two
standard deviations above the mean is 100 + 2 * 15 = 130. Because
140 > 130 (its z-score of 2.67 exceeds 2), an IQ score of 140 is considered unusual.
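The two-standard-deviation rule can also be checked in code. This is a minimal sketch that assumes the "more than two standard deviations from the mean" convention quoted in the question; `is_unusual` is a hypothetical helper, not a built-in R function.

```r
# Same stored values as in the report, repeated so the block runs on its own
IQ_mean <- 100
IQ_sd   <- 15

# Band of "usual" values: within two standard deviations of the mean
lower_cutoff <- IQ_mean - 2 * IQ_sd  # 70
upper_cutoff <- IQ_mean + 2 * IQ_sd  # 130

# Hypothetical helper: TRUE when a value lies outside the two-sd band
is_unusual <- function(x, mean, sd) abs((x - mean) / sd) > 2

is_unusual(140, IQ_mean, IQ_sd)  # TRUE, since 140 is above 130
is_unusual(110, IQ_mean, IQ_sd)  # FALSE, since 110 is between 70 and 130
```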
What is the probability of getting an IQ greater than 140?
#To find the probability of getting an IQ greater than 140.
pnorm(q = 140, mean = 100, sd = 15, lower.tail = FALSE)
## [1] 0.003830381
The probability of getting an IQ greater than 140 is about 0.383%.
You are taking a 15-question multiple choice quiz. Each question has 5 options (a,b,c,d,e), and you randomly guess on every question.
#Expected number of correct answers (the mean) = n * p.
15 * .2
## [1] 3
#To find the probability of 15/15.
dbinom(x = 15, size = 15, prob = .2)
## [1] 3.2768e-11
#To find the probability of 0/15.
dbinom(x = 0, size = 15, prob = .2)
## [1] 0.03518437
There is a 3.5184% chance that all questions are answered incorrectly.
Consider again the 15-question multiple choice quiz in which each question has 5 options (a,b,c,d,e) and you randomly guess on every question.
#We can find the number of questions that corresponds to 60% by multiplying the percentage by the number of questions.
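The two extreme outcomes have simple closed forms, which gives a quick check on the `dbinom()` results: getting every question right requires 15 successful guesses in a row, and getting none right requires 15 failed guesses in a row. The sketch below uses the illustrative names `n_questions` and `p_guess`.

```r
n_questions <- 15   # number of quiz questions
p_guess     <- 0.2  # chance of guessing any one question correctly (1 of 5 options)

# All correct: every guess must succeed, so the probability is p^n
all.equal(dbinom(x = n_questions, size = n_questions, prob = p_guess),
          p_guess^n_questions)

# None correct: every guess must fail, so the probability is (1 - p)^n
all.equal(dbinom(x = 0, size = n_questions, prob = p_guess),
          (1 - p_guess)^n_questions)

# The probabilities of all possible scores (0 through 15) sum to 1
sum(dbinom(x = 0:n_questions, size = n_questions, prob = p_guess))
```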
15 * .6
## [1] 9
#To find the probability of scoring a 60% or lower.
pbinom(q = 9, size = 15, prob = .2)
## [1] 0.9998868
#To find the probability of scoring a 70% or higher.
#First let's find the number of questions that corresponds to a score of 70%...
15 * .7
## [1] 10.5
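#The number of correct answers must be a whole number, so a cutoff of 10.5 means 11 or more correct. With lower.tail = FALSE, pbinom returns P(X > 10.5), which equals P(X >= 11).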
pbinom(q = 10.5, size = 15, prob = .2, lower.tail = FALSE)
## [1] 1.24617e-05
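Because the number of correct answers is always a whole number, the 10.5 cutoff behaves the same as a cutoff of 10: both give the probability of 11 or more correct. The sketch below checks that equivalence and compares it with a direct sum of `dbinom()` terms; the variable names are illustrative.

```r
# pbinom() counts whole successes, so P(X > 10.5) is the same as P(X > 10)
p_frac  <- pbinom(q = 10.5, size = 15, prob = 0.2, lower.tail = FALSE)
p_whole <- pbinom(q = 10,   size = 15, prob = 0.2, lower.tail = FALSE)

# Summing the individual probabilities of 11 through 15 correct gives P(X >= 11)
p_sum <- sum(dbinom(x = 11:15, size = 15, prob = 0.2))

all.equal(p_frac, p_whole)
all.equal(p_whole, p_sum)
```

Using the whole-number cutoff (`q = 10`) makes the intent easier to read, since it spells out that "at least 11" is the complement of "at most 10".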
Suppose you own a catering company. You hire local college students as servers. Because they are not the most reliable employees, there is only an 80% chance that any one server will actually show up for a scheduled event. For a wedding scheduled on Saturday, you need at least 5 servers.
#To find the probability that all 5 show up to work.
dbinom(x = 5, size = 5, prob = .8)
## [1] 0.32768
#To find the probability that at least 5 of the 7 come to work.
pbinom(q = 4, size = 7, prob = .8, lower.tail = FALSE)
## [1] 0.851968
#We will find the smallest number of employees that should be scheduled in order to be at least 99% confident that at least 5 show up.
# Define the parameters
p <- 0.8 # Probability that each server shows up
min_servers_needed <- 5 # Minimum number of servers needed
confidence_level <- 0.99 # Confidence level
# Function to calculate the probability of at least k successes in n trials
prob_at_least_k <- function(k, n, p) {
  sum(dbinom(k:n, n, p))
}
# Function to find the number of employees needed to be confident that at least k servers show up
find_min_employees <- function(k, p, confidence_level) {
  n <- k # Start with the minimum number of servers needed
  while (TRUE) {
    probability <- prob_at_least_k(k, n, p)
    if (probability >= confidence_level) {
      return(n)
    }
    n <- n + 1
  }
}
# Calculate the number of employees needed
min_employees <- find_min_employees(min_servers_needed, p, confidence_level)
cat("You should schedule at least", min_employees, "employees to be 99% confident that at least", min_servers_needed, "servers show up.")
## You should schedule at least 10 employees to be 99% confident that at least 5 servers show up.
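The same search can be written without an explicit loop, since `pbinom()` is vectorized over its `size` argument. The sketch below is an alternative formulation, not the method used above; the search range of 5 to 20 employees is an assumed, arbitrary upper bound, and `candidates`, `p_enough`, and `min_scheduled` are illustrative names.

```r
# Assumed search range: 5 to 20 scheduled employees (arbitrary upper bound)
candidates <- 5:20

# P(at least 5 show up) for each candidate number of scheduled employees
p_enough <- pbinom(q = 4, size = candidates, prob = 0.8, lower.tail = FALSE)

# Smallest candidate whose probability reaches 99%
min_scheduled <- candidates[which(p_enough >= 0.99)[1]]
min_scheduled

# Probabilities near the cutoff, labeled by number scheduled, for inspection
round(setNames(p_enough, candidates)[as.character(8:11)], 4)
```

Printing the probabilities near the cutoff shows how far short of 99% the next-smaller schedule falls, which a loop that returns only the final `n` does not show.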
Yes, it is important to have at least 5 servers show up.
#Generate a random sample of 10,000 values from a normal distribution with mean 51 and standard deviation 7.
rand_nums <- rnorm(n = 10000, mean = 51, sd = 7)
#Create a histogram of rand_nums.
hist(rand_nums)
# Vector of simulated values
my_vector <- rand_nums
# Threshold
threshold <- 40
# Count the number of values below the threshold
count_below_threshold <- sum(my_vector < threshold)
# Output the result
cat("Number of values below the threshold:", count_below_threshold)
## Number of values below the threshold: 599
For a theoretical normal distribution, how many of those 10,000
values would you expect to be below 40?
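I would expect about 5.8% of the values, or roughly 580 of the 10,000,
to be below 40 in a theoretical normal distribution, since 40 lies about
1.57 standard deviations below the mean of 51.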
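The theoretical count can be computed directly from the normal distribution rather than estimated. The sketch below assumes the same mean of 51 and standard deviation of 7 used to generate the sample; the seed value of 42 is an arbitrary choice made only so that one simulated run is reproducible.

```r
# Theoretical probability that a Normal(mean = 51, sd = 7) value is below 40
p_below_40 <- pnorm(q = 40, mean = 51, sd = 7)

# Expected number of such values in a sample of 10,000 (roughly 5.8% of them)
expected_count <- 10000 * p_below_40
expected_count

# Simulated counts change from run to run; fixing a seed (42 is an assumed,
# arbitrary value) makes this particular run reproducible
set.seed(42)
simulated_count <- sum(rnorm(n = 10000, mean = 51, sd = 7) < 40)
simulated_count
```

Without a call to `set.seed()`, each knit of the report draws a different sample, so the count printed in the output can differ from a count quoted in the written answer.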
Is your answer in part a reasonably close to your answer in part b?
Yes. The simulated count of 599 values below the threshold is reasonably close to the theoretical expectation of roughly 580 (about 5.8% of 10,000). The gap is small enough to be explained by random sampling variation.