CUNY Math Bridge W3

1. The weights of steers in a herd are distributed normally. The variance is 40,000 and the mean steer weight is 1300 lbs. Find the probability that the weight of a randomly selected steer is greater than 979 lbs. (Round your answer to 4 decimal places)

#Need the mean (1300), the given value (979), and the standard deviation, which is the square root of the variance (40000)
#Since the problem asks for the probability that a random steer weighs more than 979 lbs, use the upper tail (lower.tail = FALSE), which equals 1 - pnorm result

cow_variance <- 40000
cow_sd <- sqrt(cow_variance)
cow_mn <- 1300
cow_z_wt <- 979

print(paste0("There is a ", 100 * round(pnorm(cow_z_wt, mean = cow_mn, sd = cow_sd, lower.tail = FALSE),4),"% probability the steer will weigh more than 979 lbs."))

## [1] "There is a 94.58% probability the steer will weigh more than 979 lbs."
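
As a cross-check, the same tail probability can be reached by standardizing first. This is only a sketch; the names `z`, `p_upper`, and `p_complement` are introduced here for illustration.

```r
# Standardize 979 lbs against mean 1300 and sd sqrt(40000) = 200
z <- (979 - 1300) / sqrt(40000)
# Upper-tail probability on the standard normal
p_upper <- pnorm(z, lower.tail = FALSE)
# Same quantity as the complement of the lower tail on the original scale
p_complement <- 1 - pnorm(979, mean = 1300, sd = 200)
print(round(c(z = z, p_upper = p_upper, p_complement = p_complement), 4))
```

Both routes should agree, since `lower.tail = FALSE` is just the complement of the default lower-tail area.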

2. SVGA monitors manufactured by TSI Electronics have life spans that have a normal distribution with a variance of 1,960,000 and a mean life span of 11,000 hours. If a SVGA monitor is selected at random, find the probability that the life span of the monitor will be more than 8340 hours. (Round your answer to 4 decimal places)

#Second verse, same as the first. Oi'm 'enry the 8th oi am...

mntr_variance <- 1960000
mntr_sd <- sqrt(mntr_variance)
mntr_mn <- 11000
mntr_z_wt <- 8340

print(paste0("There is a ", 100 * round(pnorm(mntr_z_wt, mean = mntr_mn, sd = mntr_sd, lower.tail = FALSE),4),"% probability the monitor's life span will be more than 8340 hours."))

## [1] "There is a 97.13% probability the monitor's life span will be more than 8340 hours."

3. Suppose the mean income of firms in the industry for a year is 80 million dollars with a standard deviation of 3 million dollars. If incomes for the industry are distributed normally, what is the probability that a randomly selected firm will earn between 83 and 85 million dollars? (Round your answer to 4 decimal places)

#This is a rock cover of 'enry the 8th: Find 83 and find 85. 
inc_mn <- 80000000
inc_sd <- 3000000
low_val <- pnorm(83000000, mean=inc_mn, sd=inc_sd, lower.tail=TRUE)
high_val <- pnorm(85000000, mean=inc_mn, sd=inc_sd, lower.tail=FALSE)

print(paste0("A randomly selected firm will have a probability of ", 100 * round(1 - (low_val + high_val),4),"% of earning an obscene amount of money between $83 and $85 million."))

## [1] "A randomly selected firm will have a probability of 11.09% of earning an obscene amount of money between $83 and $85 million."
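
The area between two values can also be written as a difference of two lower-tail probabilities. A minimal sketch, assuming the same mean and standard deviation as the problem; `p_between` and `p_between_z` are illustrative names.

```r
inc_mn <- 80e6
inc_sd <- 3e6
# P(83M < X < 85M) = F(85M) - F(83M)
p_between <- pnorm(85e6, mean = inc_mn, sd = inc_sd) - pnorm(83e6, mean = inc_mn, sd = inc_sd)
# Same calculation on the z scale: z = 1 and z = 5/3
p_between_z <- pnorm(5 / 3) - pnorm(1)
print(round(c(p_between, p_between_z), 4))
```

This avoids the two-step "one minus both tails" bookkeeping while giving the same area.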

4. Suppose GRE Verbal scores are normally distributed with a mean of 456 and a standard deviation of 123. A university plans to offer tutoring jobs to students whose scores are in the top 14%. What is the minimum score required for the job offer? Round your answer to the nearest whole number, if necessary.

gre_mn <- 456
gre_sd <- 123
gre_limit <- .14

#Use qnorm so you can provide a percent instead of a value. Presto-changeo, it gives you the value that matches the percent - all things being normal, of course. :)

print(paste0("A student must score at least a ", round(qnorm(1 - gre_limit, gre_mn, gre_sd),0)," to be accepted. Hmmm...GREs must've changed a lot since 1990. I scored a 1240."))

## [1] "A student must score at least a 589 to be accepted. Hmmm...GREs must've changed a lot since 1990. I scored a 1240."

#Looky there - I don't need to do the whole "mean = blah" nonsense. I can just put the values in the correct order.
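
The top-14% cutoff can also be requested straight from the upper tail. A small sketch with the problem's mean and standard deviation; `cut_lower` and `cut_upper` are names made up for this check.

```r
gre_mn <- 456
gre_sd <- 123
# 86th percentile via the lower tail
cut_lower <- qnorm(1 - 0.14, mean = gre_mn, sd = gre_sd)
# Same cutoff requested as the point with 14% of the area above it
cut_upper <- qnorm(0.14, mean = gre_mn, sd = gre_sd, lower.tail = FALSE)
print(round(c(cut_lower, cut_upper), 0))
```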

5. The lengths of nails produced in a factory are normally distributed with a mean of 6.13 centimeters and a standard deviation of 0.06 centimeters. Find the two lengths that separate the top 7% and the bottom 7%. These lengths could serve as limits used to identify which nails should be rejected. Round your answer to the nearest hundredth, if necessary.

#mix of 3 and 4
len_mn <- 6.13
len_sd <- 0.06
len_low_val <- 0.07
len_high_val <- 1 - len_low_val

print(paste0("Reject nails that are < ",round(qnorm(len_low_val, len_mn, len_sd),2), " or > ", round(qnorm(len_high_val, len_mn, len_sd),2),"."))

## [1] "Reject nails that are < 6.04 or > 6.22."
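
One way to sanity-check quantiles is to feed them back through `pnorm` and confirm the original tail areas come out. A sketch under the problem's parameters; `limits` and `tails` are illustrative names.

```r
len_mn <- 6.13
len_sd <- 0.06
# Lengths cutting off the bottom 7% and the top 7%
limits <- qnorm(c(0.07, 0.93), mean = len_mn, sd = len_sd)
# Round trip: the cumulative area at each limit should be 0.07 and 0.93
tails <- pnorm(limits, mean = len_mn, sd = len_sd)
print(round(limits, 2))
print(tails)
```

Because the normal curve is symmetric, the two limits sit the same distance on either side of the mean.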

6. An English professor assigns letter grades on a test according to the following scheme.

A:  Top 13% of scores
B:  Scores below the top 13% and above the bottom 55%
C:  Scores below the top 45% and above the bottom 20%
D:  Scores below the top 80% and above the bottom 9%
F:  Bottom 9% of scores

Scores on the test are normally distributed with a mean of 78.8 and a standard deviation of 9.8. Find the numerical limits for a C grade. Round your answers to the nearest whole number, if necessary.

grd_mn <- 78.8
grd_sd <- 9.8
c_low_val <- .20
c_high_val <- .55

print(paste0("A student will earn a C if he/she scores between a ", round(qnorm(c_low_val, grd_mn, grd_sd),0), " and ", round(qnorm(c_high_val, grd_mn, grd_sd),0),"."))

## [1] "A student will earn a C if he/she scores between a 71 and 80."
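
The same `qnorm` call can return every grade boundary at once if the cumulative proportions are passed as a vector. This is a sketch; the labels in `bounds` are invented here to mark each boundary.

```r
grd_mn <- 78.8
grd_sd <- 9.8
# Cumulative proportions at the F/D, D/C, C/B, and B/A boundaries
bounds <- c(F_D = 0.09, D_C = 0.20, C_B = 0.55, B_A = 0.87)
cutoffs <- qnorm(bounds, mean = grd_mn, sd = grd_sd)
print(round(cutoffs, 0))
```

The second and third entries are the lower and upper limits for a C.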

7. Suppose ACT Composite scores are normally distributed with a mean of 21.2 and a standard deviation of 5.4. A university plans to admit students whose scores are in the top 45%. What is the minimum score required for admission? Round your answer to the nearest tenth, if necessary.

act_mn <- 21.2
act_sd <- 5.4
act_low_lim <- .55

print(paste0("Let's see if I could get in. The lowest score for admission is ", round(qnorm(act_low_lim, act_mn, act_sd),1),". And, I'm easily in!"))

## [1] "Let's see if I could get in. The lowest score for admission is 21.9. And, I'm easily in!"

8. Consider the probability that less than 11 out of 151 students will not graduate on time. Assume the probability that a given student will not graduate on time is 9%. Approximate the probability using the normal distribution. (Round your answer to 4 decimal places.)

#Yuck. Formulas. Luckily, some R brainiac wrote some code for me.
n <- 151 #sample size
p <- 0.09 #probability a given student does not graduate on time
grd_late <- 10 #"less than 11" means at most 10 students

#Note: pbinom returns the exact binomial probability, not the normal approximation the question asks for
print(paste0("Statistics is almost as much fun as running or eating broccoli. Use the pbinom function. The exact binomial probability that fewer than 11 of 151 students will not graduate on time is--> ", round(pbinom(grd_late, n, p),4)*100, "%"))

## [1] "Statistics is almost as much fun as running or eating broccoli. Use the pbinom function. The exact binomial probability that fewer than 11 of 151 students will not graduate on time is--> 19.2%"
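
Since the question asks for a normal approximation, here is a sketch of how that could look. It assumes the usual approximation with mean np and standard deviation sqrt(np(1-p)); the continuity correction (using 10.5) is a common textbook convention, and whether the course expects it is an assumption. The names `mu`, `sigma`, `approx_cc`, `approx_no_cc`, and `exact` are illustrative.

```r
n <- 151
p <- 0.09
mu <- n * p                       # mean of the binomial count
sigma <- sqrt(n * p * (1 - p))    # standard deviation of the binomial count
# Normal approximation with continuity correction: P(X <= 10) ~ P(Y < 10.5)
approx_cc <- pnorm(10.5, mean = mu, sd = sigma)
# Normal approximation without continuity correction: P(Y < 11)
approx_no_cc <- pnorm(11, mean = mu, sd = sigma)
# Exact binomial probability for comparison
exact <- pbinom(10, n, p)
print(round(c(approx_cc = approx_cc, approx_no_cc = approx_no_cc, exact = exact), 4))
```

The corrected approximation should land closer to the exact binomial value than the uncorrected one.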

9. The mean lifetime of a tire is 48 months with a standard deviation of 7. If 147 tires are sampled, what is the probability that the mean of the sample would be greater than 48.83 months? (Round your answer to 4 decimal places)

#Central limit theorem (thanks, Google): with a sufficiently large sample (i.e., >= 30; here n = 147), the sample mean is approximately normal, centered on the population mean, with standard deviation equal to the standard error sd/sqrt(n)
tire_mn <- 48
tire_sd <- 7
tire_sampl <- 147
tire_mn_std_error <- tire_sd/(sqrt(tire_sampl)) #subs as sd in pnorm
tire_val_chk <- 48.83

print(paste0("Probability the mean of sample will be > 48.83 months: ", 100 * round(pnorm(tire_val_chk, tire_mn, tire_mn_std_error, lower.tail = FALSE),4),"%. Unless, however, the warranty expires at 48 mos. Then, the probability is infinitesimally small. Like mean +/-3 standard deviations small."))

## [1] "Probability the mean of sample will be > 48.83 months: 7.53%. Unless, however, the warranty expires at 48 mos. Then, the probability is infinitesimally small. Like mean +/-3 standard deviations small."
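
The standard-error calculation can be checked with a quick simulation. This sketch assumes individual tire lifetimes are themselves normal (the problem does not say so), and the seed, the repetition count, and the names `reps`, `sim_means`, `p_sim`, and `p_theory` are arbitrary choices made here.

```r
set.seed(42)
reps <- 20000   # number of simulated samples
n <- 147        # tires per sample
# Each row is one sample of 147 lifetimes; rowMeans gives one sample mean per row
sim_means <- rowMeans(matrix(rnorm(reps * n, mean = 48, sd = 7), nrow = reps))
# Share of simulated sample means above 48.83 months
p_sim <- mean(sim_means > 48.83)
# Theoretical value using the standard error 7 / sqrt(147)
p_theory <- pnorm(48.83, mean = 48, sd = 7 / sqrt(n), lower.tail = FALSE)
print(c(simulated = p_sim, theoretical = p_theory))
```

The simulated share should hover near the theoretical tail area, with some run-to-run noise.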

10. The quality control manager at a computer manufacturing company believes that the mean life of a computer is 91 months, with a standard deviation of 10. If he is correct, what is the probability that the mean of a sample of 68 computers would be greater than 93.54 months? (Round your answer to 4 decimal places)

pc_mn <- 91
pc_sd <- 10
pc_sampl <- 68
pc_mn_std_error <- pc_sd/(sqrt(pc_sampl)) #subs as sd in pnorm
pc_val_chk <- 93.54

print(paste0("Probability the mean of sample will be > 93.54 months: ", 100 * round(pnorm(pc_val_chk, pc_mn, pc_mn_std_error, lower.tail = FALSE),4),"%. Again, read the warranty's small print as it will impact the PC's lifespan."))

## [1] "Probability the mean of sample will be > 93.54 months: 1.81%. Again, read the warranty's small print as it will impact the PC's lifespan."

11. A director of reservations believes that 7% of the ticketed passengers are no-shows. If the director is right, what is the probability that the proportion of no-shows in a sample of 540 ticketed passengers would differ from the population proportion by less than 3%? (Round your answer to 4 decimal places)

#gah. 
#Find area between high and low values. 
no_shows <- 0.07
psngr_smpl <- 540
psngr_std_error <- sqrt(no_shows * (1-no_shows)/psngr_smpl) #standard error of the sample proportion
psngr_high_limit <- 0.1
psngr_low_limit <- 0.04

#The sample proportion is approximately normal with mean 0.07 and sd equal to the standard error
#Find the cumulative probability at 0.10 and at 0.04, which are +/- 3% around the population proportion
psngr_10_pctl <- pnorm(psngr_high_limit, no_shows, psngr_std_error, lower.tail=TRUE)
psngr_4_pctl <- pnorm(psngr_low_limit, no_shows, psngr_std_error, lower.tail=TRUE)

#find area between 0.04 and 0.10, so minus
print(paste0("No-shows are ", 100* round(psngr_10_pctl - psngr_4_pctl,4),"% likely to be within 3% of population proportion."))

## [1] "No-shows are 99.37% likely to be within 3% of population proportion."

12. A bottle maker believes that 23% of his bottles are defective. If the bottle maker is accurate, what is the probability that the proportion of defective bottles in a sample of 602 bottles would differ from the population proportion by greater than 4%? (Round your answer to 4 decimal places)

#Ctrl V then Ctrl C
#gah. 
btl_defct <- 0.23
btl_smpl <- 602
btl_std_error <- sqrt(btl_defct * (1-btl_defct)/btl_smpl) #standard error of the sample proportion
btl_high_limit <- 0.27
btl_low_limit <- 0.19

#Find the area above 0.27 and the area below 0.19, which are +/- 4% around the population proportion
btl_27_pctl <- pnorm(btl_high_limit, btl_defct, btl_std_error, lower.tail=FALSE)
btl_19_pctl <- pnorm(btl_low_limit, btl_defct, btl_std_error, lower.tail=TRUE)

#find area < 0.19 and area > 0.27. (Not between)
print(paste0("Defects are ", 100 * round(btl_27_pctl + btl_19_pctl,4),"% likely to differ from the population proportion by more than 4%."))

## [1] "Defects are 1.97% likely to differ from the population proportion by more than 4%."
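
For sample proportions, the usual normal approximation uses the standard error sqrt(p(1-p)/n). The sketch below wraps that in a small helper; `prop_within` is a hypothetical function written for this note, not something from a package.

```r
# Probability that a sample proportion falls within +/- margin of p,
# using the normal approximation with standard error sqrt(p * (1 - p) / n)
prop_within <- function(p, n, margin) {
  se <- sqrt(p * (1 - p) / n)
  pnorm(margin / se) - pnorm(-margin / se)
}

within_3 <- prop_within(0.07, 540, 0.03)        # problem 11: differ by less than 3%
outside_4 <- 1 - prop_within(0.23, 602, 0.04)   # problem 12: differ by more than 4%
print(round(c(within_3 = within_3, outside_4 = outside_4), 4))
```

Working on the z scale (margin divided by the standard error) keeps the "within" and "outside" cases as complements of each other.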

13. A research company desires to know the mean consumption of beef per week among males over age 48. Suppose a sample of size 208 is drawn with x̄ = 3.9. Assume σ = 0.8. Construct the 80% confidence interval for the mean number of lb. of beef per week among males over 48. (Round your answers to 1 decimal place)

#Yay. The population SD is given, so the z critical value from qnorm applies. Again, thank you to the nameless / faceless R coders. Will need to do an upper and lower as there's 20% to distribute around the two tails

bf_mn <- 3.9
bf_smpl <- 208
bf_ci <- 0.8
bf_sd <- 0.8
bf_p <- (1 - bf_ci)/2
bf_z <- abs(qnorm(bf_p))
bf_se <- bf_sd /sqrt(bf_smpl)

bf_lower <- round(bf_mn - bf_z * bf_se, 1)
bf_upper <- round(bf_mn + bf_z * bf_se, 1)

print(paste0("The 80% confidence interval for mean lbs of beef per week among males over 48 is ", bf_lower, " to ", bf_upper,"."))

## [1] "The 80% confidence interval for mean lbs of beef per week among males over 48 is 3.8 to 4."

14. An economist wants to estimate the mean per capita income (in thousands of dollars) in a major city in California. Suppose a sample of size 7472 is drawn with x̄ = 16.6. Assume σ = 11. Construct the 98% confidence interval for the mean per capita income. (Round your answers to 1 decimal place)

#Note to self: You would not make a good statistician. 100% true, MoE 0 
pc_inc_mn <- 16.6
pc_inc_smpl <- 7472
pc_inc_ci <- .98
pc_inc_sd <- 11
pc_inc_p <- (1 - pc_inc_ci)/2
pc_inc_z <- abs(qnorm(pc_inc_p)) #sigma is given, so use the z critical value
pc_inc_se <- pc_inc_sd /sqrt(pc_inc_smpl)

pc_inc_lower <- round(pc_inc_mn - pc_inc_z * pc_inc_se, 1)
pc_inc_upper <- round(pc_inc_mn + pc_inc_z * pc_inc_se, 1)

print(paste0("The 98% confidence interval for per capita income in a major city in CA is  ", pc_inc_lower, "K to ", pc_inc_upper,"K. And, I don't want to move there. "))

## [1] "The 98% confidence interval for per capita income in a major city in CA is  16.3K to 16.9K. And, I don't want to move there. "
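
When the population standard deviation is given, the interval is x̄ ± z·σ/√n. A sketch of a reusable helper follows; `z_interval` is a hypothetical name introduced here, and it assumes a known σ.

```r
# Two-sided confidence interval for a mean with known population sd
z_interval <- function(xbar, sigma, n, conf) {
  z <- qnorm(1 - (1 - conf) / 2)   # critical value for the chosen confidence level
  moe <- z * sigma / sqrt(n)       # margin of error
  c(lower = xbar - moe, upper = xbar + moe)
}

print(round(z_interval(3.9, 0.8, 208, 0.80), 1))    # beef, problem 13
print(round(z_interval(16.6, 11, 7472, 0.98), 1))   # income, problem 14
```

With samples this large, the t and z critical values are nearly identical, so either gives the same interval after rounding to one decimal.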

15. Find the value of t such that 0.05 of the area under the curve is to the left of t. Assume the degrees of freedom equals 26.

Step 1. Choose the picture which best describes the problem.

print("I'll take the chart on the upper right for 200, Alex.")

## [1] "I'll take the chart on the upper right for 200, Alex."

Step 2. Write your answer below.

print(paste0("t is ", round(qt(0.05, 26), 3), " for 0.05 of the area under the curve to the left and 26 degrees of freedom. The magnitude matches the table at http://www.statisticshowto.com/tables/t-distribution-table/"))

## [1] "t is -1.706 for 0.05 of the area under the curve to the left and 26 degrees of freedom. The magnitude matches the table at http://www.statisticshowto.com/tables/t-distribution-table/"

16. The following measurements (in picocuries per liter) were recorded by a set of helium gas detectors installed in a laboratory facility: 383.6, 347.1, 371.9, 347.6, 325.8, 337.

Using these measurements, construct a 90% confidence interval for the mean level of helium gas present in the facility. Assume the population is normally distributed.

Step 1. Calculate the sample mean for the given sample data. (Round answer to 2 decimal places)

hel_vals <- c(383.6, 347.1, 371.9, 347.6, 325.8, 337)
hel_mn <- mean(hel_vals)

print(paste0("The mean is ", round(hel_mn,2),"."))

## [1] "The mean is 352.17."

Step 2. Calculate the sample standard deviation for the given sample data. (Round answer to 2 decimal places)

hel_sd <- sd(hel_vals, na.rm = FALSE)

print(paste0("The standard deviation is ", round(hel_sd,2),"."))

## [1] "The standard deviation is 21.68."

Step 3. Find the critical value that should be used in constructing the confidence interval. (Round answer to 3 decimal places)

hel_ci <- 0.90
hel_t <- abs(qt((1-hel_ci)/2, length(hel_vals) - 1))

print(paste0("The critical value needed to construct the CI is ", round(hel_t,3),". "))

## [1] "The critical value needed to construct the CI is 2.015. "

Step 4. Construct the 90% confidence interval. (Round answer to 2 decimal places)

hel_se <- hel_sd /sqrt(length(hel_vals))
hel_lower <- hel_mn - hel_t * hel_se
hel_upper <- hel_mn + hel_t * hel_se

print(paste0("The 90% confidence interval is ", round(hel_lower,2), " to ", round(hel_upper,2),"."))

## [1] "The 90% confidence interval is 334.34 to 370."
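
The four steps can be cross-checked in one call with the built-in `t.test`, which reports a t-based confidence interval for the mean. A minimal sketch; `ci` is an illustrative name.

```r
hel_vals <- c(383.6, 347.1, 371.9, 347.6, 325.8, 337)
# One-sample t interval at the 90% level
ci <- t.test(hel_vals, conf.level = 0.90)$conf.int
print(round(ci, 2))
```

This should reproduce the manual mean ± t · sd/√n calculation.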

17. A random sample of 16 fields of spring wheat has a mean yield of 46.4 bushels per acre and standard deviation of 2.45 bushels per acre. Determine the 80% confidence interval for the true mean yield. Assume the population is normally distributed.

Step 1. Find the critical value that should be used in constructing the confidence interval. (Round answer to 3 decimal places)

wheat_mn <- 46.4
wheat_sd <- 2.45
wheat_ci <- 0.80
wheat_n <- 16
wheat_t <- abs(qt((1-wheat_ci)/2, wheat_n - 1))

print(paste0("The critical value is ", round(wheat_t,3), ". This matches the t-table value of 1.341 for 15 degrees of freedom and an 80% CI."))

## [1] "The critical value is 1.341. This matches the t-table value of 1.341 for 15 degrees of freedom and an 80% CI."

Step 2. Construct the 80% confidence interval. (Round answer to 1 decimal place)

wheat_se <- wheat_sd /sqrt(wheat_n)
wheat_lower <- wheat_mn - wheat_t * wheat_se
wheat_upper <- wheat_mn + wheat_t * wheat_se

print(paste0("The 80% confidence interval is ", round(wheat_lower,1), " to ", round(wheat_upper,1),"."))

## [1] "The 80% confidence interval is 45.6 to 47.2."

18. A toy manufacturer wants to know how many new toys children buy each year. She thinks the mean is 8 toys per year. Assume a previous study found the standard deviation to be 1.9. How large of a sample would be required in order to estimate the mean number of toys bought per child at the 99% confidence level with an error of at most 0.13 toys? (Round your answer up to the next integer)

toys_mn <- 8
toys_sd <- 1.9
toys_ci <- 0.99
toys_z <- qnorm(1 - (1 - toys_ci)/2) #about 2.576
toys_moe <- 0.13
toys_n <- ((toys_z* toys_sd)/toys_moe)^2

#Round up to the next integer, as the problem asks
print(paste0("The sample needs to be at least ", ceiling(toys_n), " for 99% confidence with a MoE of 0.13."))

## [1] "The sample needs to be at least 1418 for 99% confidence with a MoE of 0.13."

19. A research scientist wants to know how many times per hour a certain strand of bacteria reproduces. He believes that the mean is 12.6. Assume the variance is known to be 3.61. How large of a sample would be required in order to estimate the mean number of reproductions per hour at the 95% confidence level with an error of at most 0.19 reproductions? (Round your answer up to the next integer)

bac_mn <- 12.6
bac_var <- 3.61
bac_sd <- sqrt(bac_var)
bac_ci <- .95
bac_z <- qnorm(1 - (1 - bac_ci)/2) #about 1.96
bac_moe <- 0.19
bac_n <- ((bac_z * bac_sd)/bac_moe)^2

#Round up to the next integer, as the problem asks
print(paste0("The sample for the busy bacteria needs to be ", ceiling(bac_n),"."))

## [1] "The sample for the busy bacteria needs to be 385."
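
Both sample-size problems follow the same formula, n = (z·σ / E)², rounded up. A sketch of a helper that takes the confidence level directly; `sample_size_mean` is a hypothetical function name, and using `qnorm` for the critical value (rather than a rounded table value) is a choice made here.

```r
# Minimum sample size to estimate a mean within margin of error moe
sample_size_mean <- function(sigma, moe, conf) {
  z <- qnorm(1 - (1 - conf) / 2)   # two-sided critical value
  ceiling((z * sigma / moe)^2)     # always round up to the next integer
}

print(sample_size_mean(1.9, 0.13, 0.99))          # toys, problem 18
print(sample_size_mean(sqrt(3.61), 0.19, 0.95))   # bacteria, problem 19
```

Rounding up with `ceiling` matters: `round` can round down and leave the sample slightly too small for the stated margin.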

20. The state education commission wants to estimate the fraction of tenth grade students that have reading skills at or below the eighth grade level.

Step 1. Suppose a sample of 2089 tenth graders is drawn. Of the students sampled, 1734 read above the eighth grade level. Using the data, estimate the proportion of tenth graders reading at or below the eighth grade level. (Write your answer as a fraction or a decimal number rounded to 3 decimal places)

soph_smpl <- 2089
above_8 <- 1734
below_8_pct <- (soph_smpl - above_8)/soph_smpl

print(paste0("The proportion of sophomores reading at or below the 8th grade level is ", round(below_8_pct,3),"."))

## [1] "The proportion of sophomores reading at or below the 8th grade level is 0.17."

Step 2. Suppose a sample of 2089 tenth graders is drawn. Of the students sampled, 1734 read above the eighth grade level. Using the data, construct the 98% confidence interval for the population proportion of tenth graders reading at or below the eighth grade level. (Round your answers to 3 decimal places)

soph_ci <- .98
soph_z <- qnorm(1 - (1 - soph_ci)/2) #about 2.33
soph_moe <- soph_z * sqrt((below_8_pct*(1 - below_8_pct))/soph_smpl)
soph_upper <- (below_8_pct + soph_moe) 
soph_lower <- (below_8_pct - soph_moe)

print(paste0("The range for 98% ci of sophs reading at or below 8th grade is ", round(soph_lower,3), " - ", round(soph_upper,3),"." ))

## [1] "The range for 98% ci of sophs reading at or below 8th grade is 0.151 - 0.189."

21. An environmentalist wants to find out the fraction of oil tankers that have spills each month.

Step 1. Suppose a sample of 474 tankers is drawn. Of these ships, 156 had spills. Using the data, estimate the proportion of oil tankers that had spills. (Write your answer as a fraction or a decimal number rounded to 3 decimal places)

tanker_smpl <- 474
tanker_spill <- 156
tanker_spill_pct <- (tanker_spill/tanker_smpl)

print(paste0("The proportion of oil tankers that leaked is ", round(tanker_spill_pct,3),"."))

## [1] "The proportion of oil tankers that leaked is 0.329."

Step 2. Suppose a sample of 474 tankers is drawn. Of these ships, 156 had spills. Using the data, construct the 95% confidence interval for the population proportion of oil tankers that have spills each month. (Round your answers to 3 decimal places)

tanker_ci <- .95
tanker_z <- qnorm(1 - (1 - tanker_ci)/2) #about 1.96
tanker_moe <- tanker_z * sqrt((tanker_spill_pct*(1 - tanker_spill_pct))/tanker_smpl)
tanker_upper <- (tanker_spill_pct + tanker_moe) 
tanker_lower <- (tanker_spill_pct - tanker_moe)

print(paste0("The range for 95% ci of leaking ships is ", round(tanker_lower,3), " - ", round(tanker_upper,3),"." ))

## [1] "The range for 95% ci of leaking ships is 0.287 - 0.371."
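
The two proportion intervals share one recipe: p̂ ± z·sqrt(p̂(1-p̂)/n). A sketch of that recipe as a helper; `wald_ci` is a hypothetical name, and this is the simple normal-approximation interval, which is not the same method as R's `prop.test`.

```r
# Normal-approximation confidence interval for a proportion
wald_ci <- function(successes, n, conf) {
  p_hat <- successes / n
  z <- qnorm(1 - (1 - conf) / 2)
  moe <- z * sqrt(p_hat * (1 - p_hat) / n)
  c(lower = p_hat - moe, upper = p_hat + moe)
}

print(round(wald_ci(2089 - 1734, 2089, 0.98), 3))   # readers at or below 8th grade level
print(round(wald_ci(156, 474, 0.95), 3))            # tankers with spills
```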

Complete all calculations in R and provide the associated code. In addition to the attached, answer the following two questions, showing all work.

22. The cumulative distribution function of the random variable X is

\[F_X(x) = 1 - e^{-ax}, \quad a>0, \quad x>0\]

What is the probability density function? A probability distribution is a statistical function that describes all the possible values and likelihoods that a random variable can take within a given range. The area under the density curve over an interval gives the probability that the variable falls in that interval. (from Investopedia.com) For a continuous random variable, the density is the derivative of the cumulative distribution function, so \[f_X(x) = \frac{d}{dx}F_X(x) = a e^{-ax}, \quad x>0\]

What is the expected value? The expected value should be regarded as the average value. When X is a discrete random variable, then the expected value of X is precisely the mean of the corresponding data. This is the exponential distribution, and from Google, I see the expected value can be translated into a formula of \[E[X] = \frac {1}{a}\]

What is the variance? The variance should be regarded as (something like) the average of the squared difference of the actual values from the average. A larger variance indicates a wider spread of values. Again, Google provides a formula \[Var[X] = \frac {1}{a^2}\]

*Note: both definitions above from https://math.berkeley.edu/~scanlon/m16bs04/ln/16b2lec30.pdf*

Determine P(X < 0.5 | a = 1).

x <- 0.5
a <- 1

print(paste0("The distribution is exponential with rate a, so P(X < x) is the CDF evaluated at x: 1 - exp(-a * x), which is the same as pexp(x, rate = a). The answer is ", round(1 - exp(-a * x), 4),"."))

## [1] "The distribution is exponential with rate a, so P(X < x) is the CDF evaluated at x: 1 - exp(-a * x), which is the same as pexp(x, rate = a). The answer is 0.3935."
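
The density, mean, and variance for this CDF can be checked numerically. The sketch below assumes a = 1, as in the last part of the problem, and uses `integrate` for the integrals; `pdf_x`, `mean_x`, `var_x`, and `p_half` are illustrative names.

```r
a <- 1
# Density: derivative of F(x) = 1 - exp(-a * x)
pdf_x <- function(x) a * exp(-a * x)
# E[X] = integral of x * f(x) over (0, Inf); should be close to 1 / a
mean_x <- integrate(function(x) x * pdf_x(x), lower = 0, upper = Inf)$value
# Var[X] = integral of (x - E[X])^2 * f(x); should be close to 1 / a^2
var_x <- integrate(function(x) (x - mean_x)^2 * pdf_x(x), lower = 0, upper = Inf)$value
# P(X < 0.5) straight from the CDF
p_half <- 1 - exp(-a * 0.5)
print(round(c(mean = mean_x, variance = var_x, p_half = p_half), 4))
```

The CDF value can also be compared against R's built-in exponential CDF, `pexp(0.5, rate = 1)`.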

23. The probability mass function for a particular random variable Y is

\[f_Y(y) = \frac{e^{-b}b^y}{y!}, \quad y \in \{0, 1, 2, \ldots\}, \quad b>0\]

What is E(Y)? What is E(Y^2)? What is the variance?

This is the probability mass function of the Poisson distribution (“Hello, Vardamon’s mom”), so E(Y) is the mean, b, and the variance is also b. Because Var(Y) = E(Y^2) - [E(Y)]^2, it follows that E(Y^2) = b + b^2. The standard deviation is the square root of the variance, sqrt(b).
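
The Poisson moments can be checked numerically by summing over the probability mass function. In this sketch the rate b = 3 is an arbitrary illustrative value (the problem leaves b general), and the sum is truncated at 200, where the remaining mass is negligible for this b.

```r
b <- 3                       # arbitrary rate chosen only for illustration
y <- 0:200                   # truncated support; tail mass beyond 200 is negligible here
pmf <- dpois(y, lambda = b)
e_y <- sum(y * pmf)          # E(Y), expected to equal b
e_y2 <- sum(y^2 * pmf)       # E(Y^2), expected to equal b + b^2
var_y <- e_y2 - e_y^2        # Var(Y) = E(Y^2) - E(Y)^2, expected to equal b
print(c(E_Y = e_y, E_Y2 = e_y2, Var_Y = var_y))
```

For the Poisson distribution, the mean and the variance are both b, and the second moment E(Y^2) is b + b^2.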