Homework 3

1. A researcher wishes to conduct a study of the color preferences of new car buyers. Suppose that 50% of this population prefers the color red. If 20 buyers are randomly selected, what is the probability that between 9 and 12 (both inclusive) buyers would prefer red?

n <- 20 #Sample size
p <- 0.5 #Probability of red preference
kseq <- (0:n)

barplot(dbinom(kseq, n, p), names.arg = kseq, ylab = "Probability", xlab = "# of Buyers Preferring Red", cex.names = 0.8)

Due to the small sample size, and relatively high rate of red preference (which we will treat like a success), the Binomial distribution will be more appropriate than the Poisson distribution. We can find the probability of 9-12 buyers preferring red by taking the CDF of k = 12 and k = 9 individually, then subtracting them from each other.

k1 <- 8 #Since the range is inclusive, we enter 8 instead of 9.
k2 <- 12

ck1 <- pbinom(k1, n, p) #CDF with 9 buyers preferring red
ck2 <- pbinom(k2, n, p) #CDF with 12 buyers preferring red

round(ck2-ck1, 4)

## [1] 0.6167

The probability that between 9 and 12 (both inclusive) buyers would prefer red is 61.67%. We can double check the math by adding up the PDF results for k = 9, 10, 11, and 12.

round(dbinom(9, n, p) + dbinom(10, n, p) + dbinom(11, n, p) + dbinom(12, n, p),4)

## [1] 0.6167

2. A quality control inspector has drawn a sample of 13 light bulbs from a recent production lot. Suppose 20% of the bulbs in the lot are defective. What is the probability that less than 6 but more than 3 bulbs from the sample are defective?

n <- 13 #Sample size
p <- 0.2 #Probability of defect
kseq <- (0:n)

barplot(dbinom(kseq, n, p), names.arg = kseq, ylab = "Probability", xlab = "# of Defective Bulbs", cex.names = 0.8)

Again, due to the small sample size and relatively high rate of failure, we will use the Binomial distribution in a way very similar to the last problem.

k1 <- 3
k2 <- 5 #Since range is exclusive, we enter 5 instead of 6

ck1 <- pbinom(k1, n, p) #CDF with 3 defective bulbs
ck2 <- pbinom(k2, n, p) #CDF with 6 defective bulbs

round(ck2-ck1, 4)

## [1] 0.2226

The probability of there being more than 3 but less than 6 defective bulbs is 22.26%. Again we will check the result by adding up the appropriate PDF results.

round(dbinom(4, n, p) + dbinom(5, n, p), 4)

## [1] 0.2226

3. The auto parts department of an automotive dealership sends out a mean of 4.2 special orders daily. What is the probability that, for any day, the number of special orders sent out will be no more than 3?

lambda <- 4.2 #rate of occurrence
kseq <- (0:15)

barplot(dpois(kseq, lambda), names.arg = kseq, ylab = "Probability", xlab = "# of Special Orders", cex.names = 0.8)

Since we are just given a rate of occurrence in absolute terms, and no population size, our best bet is to use the Poisson distribution here.

k <- 3 #Since we are just looking for the area to the left of k = 3, we only need this one point.

ck <- ppois(k, lambda) #CDF with 3 or less special orders

round(ck, 4)

## [1] 0.3954

The probability of there being no more than 3 sepcial orders per day is 39.54%. We will again double check the results by adding up the relevant PDF results.

round(dpois(0, lambda) + dpois(1, lambda) + dpois(2, lambda) + dpois(3, lambda), 4)

## [1] 0.3954

4. A pharmacist receives a shipment of 17 bottles of a drug and has 3 of the bottles tested. If 6 of the 17 bottles are contaminated, what is the probability that less than 2 of the tested bottles are contaminated?

contam <- 6
clean <- 11
sample <- 3
kseq <- (0:3)

barplot(dhyper(kseq, contam, clean, sample), names.arg = kseq, ylab = "Probability", xlab = "# of Contaminated Bottles", cex.names = 0.8)

Here we will have to use the hypergeometric distribution, as we will be drawing from a finite population without replacement.

k <- 1 #Since we are looking for 2 or less, we will take the CDF of k = 1

ck <- phyper(k, contam, clean, sample)

round(ck, 4)

## [1] 0.7279

The probability of the sample of 3 has less than 2 contaminated bottles is 72.79%. Double checking the result by adding up the PDF results…

round(dhyper(0, contam, clean, sample) + dhyper(1, contam, clean, sample), 4)

## [1] 0.7279

5. A town recently dismissed 6 employees in order to meet their new budget reductions. The town had 6 employees over 50 years of age and 19 under 50. If the dismissed employees were selected at random, what is the probability that more than 1 employee was over 50?

over <- 6
under <- 19
sample <- 6
kseq <- (0:6)

barplot(dhyper(kseq, over, under, sample), names.arg = kseq, ylab = "Probability", xlab = "# of Employees Over 50", cex.names = 0.8)

Again we are sampling without replacement, so we will use the hypergeometic distrobution.

k <- 1 #find the area to the left of k = 1

ck <- phyper(k, over, under, sample)

round(1 - ck, 4) #To get the area to the right, subtract our earlier result from 1

## [1] 0.4529

The probability of more than 1 of the 6 fired employee being over 50 is 45.29%. Double checking our results by adding up the PDF results gives us the same result.

round(dhyper(2, over, under, sample) + dhyper(3, over, under, sample) + dhyper(4, over, under, sample)
       + dhyper(5, over, under, sample) + dhyper(6, over, under, sample), 4)

## [1] 0.4529

6 The weights of steers in a herd are distributed normally. The variance is 90,000 and the mean steer weight is 800 lbs. Find the probability that the weight of a randomly selected steer is between 1040 and 1460 lbs..

mean <- 800
variance <- 90000
sd <- sqrt(variance)

curve(dnorm(x, mean = mean, sd = sd), from = 0, to = 1600, xlab = "Weight (lbs)", ylab = "Probability")

Since we are now working with a continuous curve as opposed to discrete “buckets” it is a bit simpler to compute the cumulative distribution functions needed.

k1 <- 1040 #lower bound
k2 <- 1460 #upper bound

ck1 <- pnorm(k1, mean = mean, sd = sd) #take CDF function up to lower bound
ck2 <- pnorm(k2, mean = mean, sd = sd) #take CDF function up to upper bound

round(ck2-ck1, 4) #find the area between

## [1] 0.198

The probability that the weight of a random steer is between 1040 and 1460 lbs is about 19.80%.

7.The diameters of ball bearings are distributed normally. The mean diameter is 106 millimeters and the standard deviation is 4 millimeters. Find the probability that the diameter of a selected bearing is between 103 and 111 millimeters.

mean <- 106
sd <- 4

curve(dnorm(x, mean = mean, sd = sd), from = 90, to = 122, xlab = "Diameter (mm)", ylab = "Probability")

Again we will be using a normal distribution:

k1 <- 103 #lower bound
k2 <- 111 #upper bound

ck1 <- pnorm(k1, mean = mean, sd = sd) #take CDF function up to lower bound
ck2 <- pnorm(k2, mean = mean, sd = sd) #take CDF function up to upper bound

round(ck2-ck1, 4) #find the area between

## [1] 0.6677

The probability that the diameter of a randomly selected bearing is between 103 and 111 mm is about 66.77%

8.The lengths of nails produced in a factory are normally distributed with a mean of 3.34 centimeters and a standard deviation of 0.07 centimeters. Find the two lengths that separate the top 3% and the bottom 3%. These lengths could serve as limits used to identify which nails should be rejected.

mean <- 3.34
sd <- 0.07

curve(dnorm(x, mean = mean, sd = sd), from = 3, to = 3.7, xlab = "Length (cm)", ylab = "Probability")

Next we can use the normal quantile function to see what values mark the 3rd and 97th percentile of length.

round(qnorm(0.03, mean = mean, sd = sd), 2) #find the value below which, 3% of nails fall

## [1] 3.21

round(qnorm(0.97, mean = mean, sd = sd), 2) #find the value below which, 97% of nails fall

## [1] 3.47

9. A psychology professor assigns letter grades on a test according to the following scheme.

A: Top 9% of scores

B: Scores below the top 9% and above the bottom 63%

C: Scores below the top 37% and above the bottom 17%

D: Scores below the top 83% and above the bottom 8%

F: Bottom 8% of scores

Scores on the test are normally distributed with a mean of 75.8 and a standard deviation of 8.1. Find the minimum score required for an A grade.

mean <- 75.8
sd <- 8.1

curve(dnorm(x, mean = mean, sd = sd), from = 0, to = 100, xlab = "Test Score", ylab = "Probability")

If one must be in the top 9% of scores to get an A, then we can find the minimum score required for an A by identifying the 91st percentile. If you are above this number, you are in the top 9% and will get an A. We can easily find it using the quantile function from the last problem.

round(qnorm(0.91, mean = mean, sd = sd), 4) #find the value above which is the top 9% of scores

## [1] 86.6601

If one scores and 86.66 (87 with rounding) or higher on the test, they should get an A.

10. Consider the probability that exactly 96 out of 155 computers will not crash in a day. Assume the probability that a given computer will not crash in a day is 61%. Approximate the (binomial) probability using the normal distribution.

n <- 155 #sample size
p <- 0.61 #probability of not crashing

#Calculate parameters for normal distribution
mean <- n*p 
sd <- sqrt(n*p*(1-p))

kseq <- (75:115)

plot(kseq, dbinom(kseq, n, p), type = "h", lwd = 5, ylab = "Probability", xlab = "# of Non-Crashing Computers",
     main = "Normal Approximation of Binonmial Distribution")
lines(kseq, dnorm(kseq, mean = mean, sd = sd), lwd = 2)

With the sample size large enough, we are able to perform a reasonable normal approximation of the binomial distribution. Below we calculate the probability of exactly 96 out 155 computers not crashing using both the normal and binomial density functions.

round(dnorm(96, mean = mean, sd = sd), 4)

## [1] 0.0639

round(dbinom(96, n, p), 4)

## [1] 0.064

The 2 results are nearly identical with only a difference of 0.01%