Question 1.

A researcher wishes to conduct a study of the color preferences of new car buyers.
Suppose that 50% of this population prefers the color red. If 20 buyers are randomly
selected, what is the probability that between 9 and 12 (both inclusive) buyers would
prefer red?
Round your answer to four decimal places. Use the round() function in R.

I. Identify the Distribution. This is a binomial distribution. The binomial distribution describes the number of successes in a fixed number of repeated trials of an experiment, where each trial is independent of the others and has the same probability of success. Each trial has 2 possible outcomes: success and failure. In this problem a buyer preferring a “red” car is a success and a buyer NOT preferring a red car is a failure.

Check assumptions
- There are only two possible outcomes on a trial of the experiment (prefer red or do not prefer red)
- The outcomes are mutually exclusive (cannot both prefer red and not prefer red)
- The random variable is the result of counts (X = how many people prefer red)
- Each trial is independent of any other trial (randomly selected people)

II. Identify Parameters

\(n = 20\)

\(\pi = 50\%=.50\)

III. Probability Statement

Let X be the count of people that prefer red cars.

In Words: Probability that between (inclusive) 9 and 12 buyers would prefer red cars.

In Math: \(P(9 \le X \le 12 \mid n = 20, \pi = .5)\)

IV. Compute/Evaluate

#plot the distribution
plot(x    = 0:20,
     y    = dbinom(x    = 0:20,
                   size = 20,
                   prob = .5
                   ),
     type = 'h',
     main = 'Binomial Distribution (n = 20, p = .5)',
     ylab = 'Probability',
     xlab = 'People that prefer red cars (# Successes)',
     lwd = 3
)

#Compute Probability Statement using PMF

dbinom(x = 9:12, size = 20, prob = .5) #Get individual probabilities for 9, 10, 11, 12

## [1] 0.1601791 0.1761971 0.1601791 0.1201344

sum(dbinom(x = 9:12, size = 20, prob = .5)) #sum the probabilities

## [1] 0.6166897

round(x = sum(dbinom(x = 9:12, size = 20, prob = .5)), digits = 4) #round the sums to 4 decimal places.

## [1] 0.6167

#Compute using CDF
#Notice q = 8 in the second pbinom() call: pbinom() is inclusive (it returns P(X <= q)), so subtracting P(X <= 8) keeps the scenario where exactly 9 people prefer red included in the probability.

round(x = (pbinom(q = 12, size = 20, prob = .5) - pbinom(q = 8, size = 20, prob = .5)), digits = 4)

## [1] 0.6167

As we can see, both methods give the same answer. There is a .6167 probability that, out of a randomly selected group of 20 buyers, between 9 and 12 (inclusive) will prefer the color red for their car.
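
As a cross-check, the PMF values can be rebuilt from the binomial formula \(P(X = k) = \binom{n}{k}\pi^k(1-\pi)^{n-k}\). The following is a minimal sketch, not part of the original solution; the variable names (n, p, k, manual_pmf) are chosen here for illustration only.

```r
n <- 20    # number of buyers sampled
p <- .5    # probability that a buyer prefers red
k <- 9:12  # counts of interest

# Binomial PMF written out by hand with choose()
manual_pmf <- choose(n, k) * p^k * (1 - p)^(n - k)

# Compare the hand-written PMF with the built-in dbinom()
all.equal(manual_pmf, dbinom(x = k, size = n, prob = p))

# Sum and round, as in the solution above
round(x = sum(manual_pmf), digits = 4)
```

If the formula is applied correctly, the rounded sum should match the value obtained with dbinom() and pbinom().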

Question 2.

A quality control inspector has drawn a sample of 13 light bulbs from a recent production lot. Suppose 20% of the bulbs in the lot are defective. What is the probability that less than 6 but more than 3 bulbs from the sample are defective? Round your answer to four decimal places.

I. Identify the Distribution. This is a binomial distribution. The binomial distribution describes the number of successes in a fixed number of repeated trials of an experiment, where each trial is independent of the others and has the same probability of success. Each trial has 2 possible outcomes: success and failure. In this problem a defective bulb is a success and a functioning bulb is a failure.

Check the assumptions:
- There are only two possible outcomes on one trial of the experiment (bulb is defective, bulb is NOT defective/is functioning)
- The outcomes are mutually exclusive (a bulb cannot be both defective and functioning)
- Random variable is the result of counts (X = how many defective bulbs)
- Each trial is independent of the others

II. Identify the parameters

\(n = 13\)

\(\pi = 20\% = .2\)

III. Probability Statement Required

Let X be the count of defective bulbs.

In Words: Probability that less than 6 but more than 3 bulbs from the sample of 13 bulbs are defective.

In Math: \(P(3<X<6 \mid n = 13, \pi = .2)\)

IV. Compute

#plot the distribution

plot(x   = 0:13,
     y   = dbinom(x   = 0:13,
                  size = 13,
                  prob = .2
                  ),
     type = 'h',
     main = 'Binomial Distribution (n = 13, p = .2)',
     ylab = 'Probability',
     xlab = 'Number of defective bulbs (# Successes)',
     lwd = 3
)

#The individual Probabilities of getting 4 OR 5 defective bulbs
dbinom(x = 4:5, size = 13, prob = .2)

## [1] 0.15354508 0.06909529

#Compute using PMF (sums from the previous line of code)
round(x = sum(dbinom(x = 4:5, size = 13, prob = .2)), digits = 4)

## [1] 0.2226

#Compute using CDF
#Use q = 5 in the first pbinom() because we want to count 5 but not 6; use q = 3 in the second because pbinom() is inclusive and we want to keep 4 in the result.

round(x = (pbinom(q = 5, size = 13, prob = .2) - pbinom(q = 3, size = 13, prob = .2)), digits = 4)

## [1] 0.2226

We get the same answer with both methods. There is a .2226 probability that less than 6 and more than 3 bulbs are defective in a sample of 13 bulbs (assuming that the rate of defective bulbs is .2). In other words, it is the probability of having either 4 or 5 defective bulbs in a sample of 13 randomly selected bulbs.
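
The off-by-one reasoning for strict inequalities can be wrapped in a small function. This is only a sketch: binom_strictly_between() is a hypothetical helper written for this example, not a base R function.

```r
# Hypothetical helper: P(lower < X < upper) for a binomial count,
# with BOTH bounds excluded.
binom_strictly_between <- function(lower, upper, size, prob) {
  # P(X <= upper - 1) drops the upper bound itself;
  # subtracting P(X <= lower) removes everything up to and including lower.
  pbinom(q = upper - 1, size = size, prob = prob) -
    pbinom(q = lower, size = size, prob = prob)
}

# More than 3 but less than 6 defective bulbs out of 13
round(x = binom_strictly_between(lower = 3, upper = 6, size = 13, prob = .2),
      digits = 4)
```

Writing the bounds exactly as they appear in the question (3 and 6) and letting the function do the shifting may reduce the chance of an inclusive/exclusive mistake.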

Question 3.

The auto parts department of an automotive dealership sends out a mean of 4.2 special orders daily. What is the probability that, for any day, the number of special orders sent out will be no more than 3? Round your answer to four decimal places.

I. Identify the distribution. This is a Poisson distribution problem. The Poisson distribution is used to find the probability that a certain number of events/successes (k) occurs during a specified interval of time, distance, area, or volume. Depending on the source, it is described with one or two parameters: \(\lambda\) (lambda), the mean number of successes per unit interval, and optionally the interval length \(t\). Be sure the rate matches the interval in the question (the product is sometimes written \(\lambda t\)).

Check the assumptions
- Successes in the experiment/time interval can be counted (special orders can be counted)
- The probability of a success is proportional to the length of the interval (the longer the interval, the more special orders are expected)
- Mean number of successes in a time interval is known (4.2 special orders per day)
- The intervals are independent (each day is independent from one another)

II. Identify the Parameters

\(\lambda = 4.2\)

\(t = 1\)

\(\lambda t = 4.2\)

Lambda = the mean special orders per day

Time = 1 day

III. Probability Statement

Let X be the number of special orders in a specific day.

In Words: Probability that, for any given day, the number of special orders sent out will be no more than 3.

In Math: \(P(X \le 3 \mid \lambda t = 4.2)\)

IV. Compute

dpois(x = 0:3, lambda = 4.2) #Individual probabilities of getting 0, 1, 2, or 3 special orders in one day.

## [1] 0.01499558 0.06298142 0.13226099 0.18516538

#plot the Poisson distribution
plot(x    = 0:15,
     y    = dpois(x  = 0:15,
              lambda = 4.2),
     type = 'h',
     main = 'Poisson Distribution (lambda = 4.2)',
     ylab = 'Probability',
     xlab = 'Special Orders Per Day (# Successes)',
     lwd = 3)

#Compute using PMF: sum the individual probabilities of 0, 1, 2, 3 special orders in a day.
round(sum(dpois(x = 0:3, lambda = 4.2)), digits = 4)

## [1] 0.3954

#Compute using CDF (we should get the same answer)

round(x = ppois(q = 3, lambda = 4.2), digits = 4)

## [1] 0.3954

The probability that there are no more than 3 special orders on any given day (0 to 3) is .3954.
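
The same values can be rebuilt from the Poisson formula \(P(X = k) = e^{-\lambda}\lambda^k / k!\). The sketch below is an illustrative check rather than part of the original solution; the names lambda, k, and manual_pmf are chosen for this example.

```r
lambda <- 4.2  # mean special orders per day
k <- 0:3       # "no more than 3" means 0, 1, 2, or 3

# Poisson PMF written out by hand
manual_pmf <- exp(-lambda) * lambda^k / factorial(k)

# Compare with the built-in dpois()
all.equal(manual_pmf, dpois(x = k, lambda = lambda))

# Sum and round, as in the solution above
round(x = sum(manual_pmf), digits = 4)

# Complement: probability of MORE than 3 special orders in a day
ppois(q = 3, lambda = lambda, lower.tail = FALSE)
```

The last line uses lower.tail = FALSE, which makes ppois() return P(X > q) instead of P(X <= q).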

Question 4.

A pharmacist receives a shipment of 17 bottles of a drug and has 3 of the bottles tested. If 6 of the 17 bottles are contaminated, what is the probability that less than 2 of the tested bottles are contaminated?
Round your answer to four decimal places.

I. Identify the Distribution. The distribution is Hypergeometric. A hypergeometric distribution applies when a sample is drawn without replacement from a finite population that contains a known number of successes. It describes the probability of getting x successes in k draws without replacement from a finite population of size N that contains m successes and n failures.

Assumptions:

- We have a finite population with a known number of successes (17 bottles, 6 of them contaminated).
- The sample is drawn from that population without replacement (3 bottles sampled from the population of 17).
- The random variable is a count.

II. Identify the Parameters.

\(N = 17\) : Population size

\(m = 6\) : known number of successes (from population)

\(n = 11\) : known number of failures (from population)

\(k = 3\) : Sample size

III. Probability Statement

Let X be the count of contaminated bottles in the sample of 3 bottles.

In Words: Probability of less than 2 of the tested bottles being contaminated.

In Math: \(P(X<2 \mid m = 6, n = 11, k = 3)\)

IV. Compute

plot(x   = 0:3,
     y   = dhyper(x  = 0:3,
                  m  = 6,
                  n  = 11,
                  k  = 3),
     type = 'h',
     main = 'Hypergeometric Distribution',
     xlab = '# Contaminated Bottles',
     ylab = 'Probability',
     lwd = 3)

#calculate the individual probabilities of having 0 or 1 contaminated bottles in your sample
dhyper(x = 0:1, m = 6, n = 11, k = 3)

## [1] 0.2426471 0.4852941

#sum and round the results from the last step
round(x = sum(dhyper(x = 0:1, m = 6, n = 11, k = 3)), digits = 4)

## [1] 0.7279

#Can also calculate using the CDF
round(x = (phyper(q = 1, m = 6, n = 11, k = 3)), digits = 4)

## [1] 0.7279

The probability of having less than 2 (0 or 1) of the 3 tested bottles be contaminated from the population of 17 bottles (having 6 total contaminated bottles) is .7279 or 72.79%.
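
For reference, the hypergeometric PMF can be written with binomial coefficients as \(P(X = x) = \binom{m}{x}\binom{n}{k-x} / \binom{m+n}{k}\). The sketch below checks that formula against dhyper(); the variable names follow the parameters above, and manual_pmf is an illustrative name.

```r
m <- 6    # contaminated bottles in the shipment (successes)
n <- 11   # clean bottles in the shipment (failures)
k <- 3    # bottles tested (sample size)
x <- 0:1  # "less than 2" means 0 or 1 contaminated bottles

# Ways to pick x contaminated and k - x clean bottles,
# divided by all ways to pick k bottles from m + n
manual_pmf <- choose(m, x) * choose(n, k - x) / choose(m + n, k)

# Compare with the built-in dhyper()
all.equal(manual_pmf, dhyper(x = x, m = m, n = n, k = k))

round(x = sum(manual_pmf), digits = 4)
```

Worked by hand, the two terms are 165/680 and 330/680, which is where the individual probabilities above come from.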

Question 5.

A town recently dismissed 6 employees in order to meet their new budget reductions. The town had 6 employees over 50 years of age and 19 under 50. If the dismissed employees were selected at random, what is the probability that more than 1 employee was over 50?
Round your answer to four decimal places.

I. Identify the Distribution. The distribution is Hypergeometric. A hypergeometric distribution applies when a sample is drawn without replacement from a finite population that contains a known number of successes. It describes the probability of getting x successes in k draws without replacement from a finite population of size N that contains m successes and n failures.

Assumptions:

- We have a finite population with a known number of successes (25 employees, 6 of them over the age of 50).
- The sample is drawn from that population without replacement (6 dismissed employees out of 25).
- The random variable is a count.

II. Identify the Parameters

\(N = 25\) : Population size

\(m = 6\) : Success number of the population (people over age 50)

\(n = 19\) : Failure number of the population (people under age 50)

\(k = 6\) : Sample size (from the population- fired employees)

III. Probability Statement

Let X be the count of fired employees over the age of 50.

In Words: Probability that more than 1 of the 6 fired employees was over 50.

In Math: \(P(X > 1 \mid m = 6, n = 19, k = 6)\)

IV. Compute

plot(x    = 0:6,
     y    = dhyper(x  = 0:6,
                   m  = 6,
                   n  = 19,
                   k  = 6),
     type = 'h',
     main = 'Hypergeometric Distribution',
     xlab = 'Fired Employees Over Age 50',
     ylab = 'Probability')

#Calculate using the PMF
round(x = sum(dhyper(x = 2:6, m = 6, n = 19, k = 6)), digits = 4)

## [1] 0.4529

#Can also compute using the CDF
round(x = (phyper(q = 6, m = 6, n = 19, k = 6) - phyper(q = 1, m = 6, n = 19, k = 6)), digits = 4)

## [1] 0.4529

There is a .4529 probability that more than one of the fired employees is over age 50.
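
Since "more than 1" is an upper-tail probability, it can also be computed directly with the lower.tail argument of phyper(), or as a complement. This is a small sketch of those two equivalent routes; the names upper_tail and complement are illustrative.

```r
# lower.tail = FALSE returns P(X > q) rather than P(X <= q)
upper_tail <- phyper(q = 1, m = 6, n = 19, k = 6, lower.tail = FALSE)

# Complement rule: P(X > 1) = 1 - P(X <= 1)
complement <- 1 - phyper(q = 1, m = 6, n = 19, k = 6)

all.equal(upper_tail, complement)

round(x = upper_tail, digits = 4)
```

Both forms avoid having to spell out the upper end of the range (q = 6).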

Question 6.

The weights of steers in a herd are distributed normally. The variance is 90,000 and the mean steer weight is 800 lbs. Find the probability that the weight of a randomly selected steer is between 1040 and 1460 lbs.

Round your answer to four decimal places

I. Identify the distribution. This is a Normal Distribution Problem. The normal distribution is a continuous distribution that is symmetric about the mean. The normal distribution fits many continuous variables, such as weight (this problem). The mean weight of steers is 800 lbs with a standard deviation of 300 lbs. The standard deviation is the square root of the variance (\(\sqrt{90000} = 300\)). The Normal Distribution is unimodal, asymptotic to the horizontal axis, and the area under the curve is equal to 1.

Check the assumptions
- Continuous distribution (yes, weight is a continuous variable)
- Symmetric about the mean
- Asymptotic to the horizontal axis
- Unimodal
- Area under the curve = 1
- Bell-shaped distribution

II. Identify the Parameters

\(\mu = 800 \text{ lbs}\)

\(\sigma = 300 \text{ lbs}\)

III. Probability Statement

Let X be the weight of a randomly selected steer.

In Words: Probability that a randomly selected steer is between 1040 and 1460 lbs.

In Math: \(P(1040 \le X \le 1460 \mid \mu = 800, \sigma = 300)\)

IV. Compute

#Graph
plot(x   = 0:2000,
     y   = dnorm(x = 0:2000,
                 mean = 800,
                 sd = 300),
     type = 'l',
     main = 'Normal Distribution (mean = 800, sd = 300)',
     ylab = 'Density',
     xlab = 'Steer Weight (lbs)',
     lwd = 3)
abline(v = 800, col = 'purple', lty = 3)

#Compute using the PDF: this is not the right approach, because the normal distribution is continuous. Summing dnorm() at the integers 1040:1460 treats X as discrete and only approximates the area under the curve. Although the answer is close, the correct method is to use the CDF.
round(sum(dnorm(x = 1040:1460, mean = 800, sd = 300)), digits = 4)

## [1] 0.1985

#Compute using CDF
round(x = (pnorm(q = 1460, mean = 800, sd = 300) - pnorm(q = 1040, mean = 800, sd = 300)), digits = 4)

## [1] 0.198

The probability that a randomly selected steer weighs between 1040 and 1460 lbs is .1980.
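
If one does want to start from the PDF for a continuous variable, the density has to be integrated rather than summed. The sketch below uses base R's integrate() for numerical integration and compares the result with the CDF difference; the names area and cdf_diff are illustrative.

```r
# Numerically integrate the normal density between the two weights.
# Extra arguments (mean, sd) are passed through to dnorm().
area <- integrate(f = dnorm, lower = 1040, upper = 1460, mean = 800, sd = 300)

# CDF difference, as in the solution above
cdf_diff <- pnorm(q = 1460, mean = 800, sd = 300) -
  pnorm(q = 1040, mean = 800, sd = 300)

# The two should agree up to numerical integration error
abs(area$value - cdf_diff)

round(x = area$value, digits = 4)
```

The integral is what pnorm() evaluates in closed form, so the CDF remains the simpler and more accurate choice.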

Question 7.

The diameters of ball bearings are distributed normally. The mean diameter is 106 millimeters and the standard deviation is 4 millimeters. Find the probability that the diameter of a selected bearing is between 103 and 111 millimeters.
Round your answer to four decimal places.

I. Identify the distribution. This is a Normal Distribution Problem. The normal distribution is a continuous distribution that is symmetric about the mean. The Normal Distribution is unimodal, asymptotic to the horizontal axis and the area under the curve is equal to 1. Normal distribution fits many continuous variables. This problem also specifies the normal distribution. The 2 parameters needed are the mean and standard deviation. The mean diameter of the ball bearings is 106mm and the standard deviation is 4mm.

Check the assumptions
- Continuous distribution (yes, the ball bearing diameter is on a continuous scale)
- Symmetric about the mean (yes, normal dist. with sd = 4)
- Asymptotic to the horizontal axis
- Unimodal
- Area under the curve = 1
- Bell-shaped distribution

II. Identify the Parameters.

\(\mu = 106 \text{ mm}\)

\(\sigma = 4 \text{ mm}\)

III. Probability Statement

Let X be the diameter of a randomly selected ball bearing.

In Words: Probability that a selected bearing is between 103 and 111 millimeters in diameter.

In Math: \(P(103 < X < 111 \mid \mu = 106, \sigma = 4)\)

IV. Compute

plot(x    = 80:140,
     y    = dnorm(x    = 80:140,
                  mean = 106,
                  sd   = 4),
     type = 'l',
     main = 'Normal Distribution (u = 106, sd = 4)',
     xlab = 'Diameter (mm)',
     ylab = 'Density',
     lwd  = 3)
abline(v = 106, col = 'purple', lty = 3)

#Compute using the CDF, pnorm(). Because the variable is continuous, summing dnorm() values would only approximate the area.
round( x = (pnorm(q = 111, mean = 106, sd = 4) - pnorm(q = 103, mean = 106, sd = 4)), digits = 4)

## [1] 0.6677

The probability of randomly selecting a ball bearing between 103 and 111 mm in diameter is .6677.
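
The same probability can be found by standardizing the bounds to z-scores, \(z = (x - \mu)/\sigma\), and using the standard normal CDF. This is a sketch of that equivalent route; the names mu, sigma, z_lower, and z_upper are illustrative.

```r
mu    <- 106  # mean diameter (mm)
sigma <- 4    # standard deviation (mm)

# Convert the diameter bounds to z-scores
z_lower <- (103 - mu) / sigma  # -0.75
z_upper <- (111 - mu) / sigma  #  1.25

# pnorm() defaults to mean = 0, sd = 1 (the standard normal)
round(x = pnorm(q = z_upper) - pnorm(q = z_lower), digits = 4)
```

This mirrors what a z-table lookup does by hand.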

Question 8.

The lengths of nails produced in a factory are normally distributed with a mean of 3.34 centimeters and a standard deviation of 0.07 centimeters. Find the two lengths that separate the top 3% and the bottom 3%. These lengths could serve as limits used to identify which nails should be rejected.

Round your answer to the nearest hundredth (2 decimal places), if necessary.
You will have to use the quantile function, qnorm(), here. In fact, we have seen a little bit of quantiles already when we talked about median and boxplots.

I. Identify the distribution. This is a Normal Distribution Problem. The normal distribution is a continuous distribution that is symmetric about the mean. The Normal Distribution is unimodal, asymptotic to the horizontal axis and the area under the curve is equal to 1. Normal distribution fits many continuous variables. This problem also specifies the normal distribution. The 2 parameters needed are the mean and standard deviation. The mean is 3.34 cm nail length, and the standard deviation is .07 cm.

Check the assumptions
- Continuous distribution (yes, nail length is on a continuous scale)
- Symmetric about the mean (yes, normal dist. with sd = .07)
- Asymptotic to the horizontal axis
- Unimodal
- Area under the curve = 1
- Bell-shaped distribution

II. Identify the Parameters

\(\mu = 3.34 \text{ cm}\)

\(\sigma = .07 \text{ cm}\)

III. Probability Statement

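Let X be the length of a randomly selected nail. Find the 3rd percentile and the 97th percentile.

In Words: Find the two lengths that separate the top 3% and the bottom 3%.

In Math: Find \(a\) and \(b\) such that \(P(X \le a \mid \mu = 3.34, \sigma = .07) = .03\) and \(P(X \le b \mid \mu = 3.34, \sigma = .07) = .97\)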

IV. Compute

x <- seq(2.8,3.8,.01)
plot(x     = x,
     y     = dnorm(x    = x,
                   mean = 3.34,
                   sd   = .07),
     type = 'l',
     main = 'Normal Distribution (u = 3.34, sd = .07)',
     xlab = 'length of nail (cm)',
     ylab = 'Density',
     lwd  = 3)
abline(v = 3.34, col = 'purple', lty = 3)
abline(v = qnorm(.03, mean = 3.34, sd = .07), col = 'green')
abline(v = qnorm(.97, mean = 3.34, sd = .07), col = 'green')

#3rd percentile 
round(x = qnorm(.03, mean = 3.34, sd = .07), digits = 2)

## [1] 3.21

#97th percentile
round(x = qnorm(.97, mean = 3.34, sd = .07), digits = 2)

## [1] 3.47

p <- c(.03, .97)
round(x = (qnorm(p, mean = 3.34, sd = .07)), digits = 2)

## [1] 3.21 3.47

The 3rd percentile length is 3.21 cm and the 97th percentile length is 3.47 cm. This means that any nail shorter than 3.21 cm or longer than 3.47 cm should be rejected by the factory.
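
Because qnorm() is the inverse of pnorm(), feeding the two limits back into pnorm() should recover the tail areas. The sketch below shows that round trip, plus the symmetric form \(\mu \pm z\sigma\); the names mu, sigma, limits, and z are illustrative.

```r
mu    <- 3.34  # mean nail length (cm)
sigma <- .07   # standard deviation (cm)

# Unrounded rejection limits
limits <- qnorm(p = c(.03, .97), mean = mu, sd = sigma)

# Round trip: pnorm() of the limits should return the two cumulative areas
pnorm(q = limits, mean = mu, sd = sigma)

# By symmetry, both limits sit the same distance from the mean
z <- qnorm(p = .97)  # standard normal quantile
c(mu - z * sigma, mu + z * sigma)

round(x = limits, digits = 2)
```

The symmetric form makes it clear that only one quantile lookup is needed when the two tails are equal.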

Question 9.

A psychology professor assigns letter grades on a test according to the following
scheme.
A: Top 9% of scores
B: Scores below the top 9% and above the bottom 63%
C: Scores below the top 37% and above the bottom 17%
D: Scores below the top 83% and above the bottom 8%
F: Bottom 8% of scores
Scores on the test are normally distributed with a mean of 75.8 and a standard
deviation of 8.1. Find the minimum score required for an A grade.
Round your answer to the nearest whole number, if necessary.

I. Identify the distribution. This is a Normal Distribution Problem. The normal distribution is a continuous distribution that is symmetric about the mean. The Normal Distribution is unimodal, asymptotic to the horizontal axis and the area under the curve is equal to 1. Normal distribution fits many continuous variables. This problem also specifies the normal distribution. The 2 parameters needed are the mean and standard deviation. The mean is 75.8 points, and the standard deviation is 8.1 points.

Check the assumptions
- Continuous distribution (yes, grades with decimals are continuous)
- Symmetric about the mean (yes, normal dist. with sd = 8.1)
- Asymptotic to the horizontal axis
- Unimodal
- Area under the curve = 1
- Bell-shaped distribution

II. Identify The Parameters

\(\mu = 75.8 \text{ points}\)

\(\sigma = 8.1 \text{ points}\)

III. Identify the Problem.

We need the minimum score for a grade “A”. A’s are the top 9% of scores. In other words we need to find the 91st Percentile.

IV. Compute

#Graph 
plot(x     = 40:100,
     y     = dnorm(x    = 40:100,
                   mean = 75.8,
                   sd   = 8.1),
     type = 'l',
     main = 'Normal Distribution (u = 75.8, sd = 8.1)',
     xlab = 'Test Scores',
     ylab = 'Density',
     lwd  = 3)
abline(v = 75.8, col = 'purple', lty = 3)
abline(v = qnorm(.91, 75.8, 8.1), col = 'blue', lty = 3)

round(x = qnorm(.91, mean = 75.8, sd = 8.1), digits = 0)

## [1] 87

The minimum test score for an A is 87.
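
The same approach gives the minimum score for every passing grade, since qnorm() accepts a vector of probabilities. In the sketch below, the cumulative probabilities are read off the grading scheme (F is the bottom 8%, D ends at the bottom 17%, C ends at the bottom 63%, B ends at the bottom 91%); the names grades, cum_prob, and cutoffs are illustrative.

```r
grades   <- c('D', 'C', 'B', 'A')
cum_prob <- c(.08, .17, .63, .91)  # share of scores BELOW each grade's minimum

# Minimum score for each grade
cutoffs <- qnorm(p = cum_prob, mean = 75.8, sd = 8.1)
names(cutoffs) <- grades

round(x = cutoffs, digits = 0)
```

The last element is the A cutoff computed above; the others follow from the same percentile logic.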

Question 10.

Consider the probability that exactly 96 out of 155 computers will not crash in a day. Assume the probability that a given computer will not crash in a day is 61%. Approximate the probability using the normal distribution.
Round your answer to four decimal places.

I. Identify the distribution. The count of computers that do not crash is binomial (n = 155, p = .61), but the problem asks us to approximate it with the normal distribution. The normal distribution is a continuous distribution that is symmetric about the mean. The Normal Distribution is unimodal, asymptotic to the horizontal axis and the area under the curve is equal to 1. The 2 parameters needed are the mean and standard deviation. The mean is \(np = 155 \times .61 = 94.55\) and the standard deviation is \(\sqrt{np(1-p)} = 6.072\).

Check the assumptions
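- Continuous distribution (the count is discrete, but the normal approximation treats it as continuous)
- Symmetric about the mean (yes, normal dist. with sd = 6.072)
- Asymptotic to the horizontal axis
- Unimodal
- Area under the curve = 1
- Bell-shaped distribution
- Sample is large enough for the approximation (np = 94.55 and n(1 - p) = 60.45 both exceed 10, a common rule of thumb)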

II. Identify the Parameters

#calc the SD with the formula sqrt(np(1-p))
p_prime <- 1-.61
sd <- sqrt(94.55 *p_prime)
sd

## [1] 6.072438

\(\mu = 94.55\)

\(\sigma = 6.072\)

III. Probability Statement

Let X be the number of computers that will not crash in a day.

In Words: Probability that exactly 96 out of 155 computers will not crash in a day.

In Math: \(P(X = 96 \mid \mu = 94.55, \sigma = 6.072)\)

IV. Compute

plot(x   = 50:120,
     y   = dnorm(x    = 50:120,
                 mean = 94.55,
                 sd   = sd),
     type = 'h',
     main = 'Normal Distribution (mean = 94.55, sd = 6.072)',
     xlab = 'Count of Computers that do not crash',
     ylab = 'Density'
)
abline(v = 94.55, col = 'purple', lty = 3)

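#Compute using the PDF because we are looking for a specific value. X is a count, so the normal density at 96 approximates the area of the unit-wide bar from 95.5 to 96.5.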

round(x = (dnorm(x = 96, mean = 94.55, sd = sd)), digits = 4)

## [1] 0.0639

Using the normal approximation, the probability that exactly 96 of the 155 computers do not crash in a day is approximately .0639.
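
A common alternative for approximating a single count with the normal distribution is the continuity correction, which takes the area between 95.5 and 96.5 under the normal curve. The sketch below compares the density-based value, the continuity-corrected value, and the exact binomial probability. It is not part of the original solution, and the names (n, p, mu, sigma, density_approx, cc_approx, exact) are illustrative.

```r
n <- 155  # computers
p <- .61  # probability a computer does not crash in a day

mu    <- n * p                   # mean of the approximating normal
sigma <- sqrt(n * p * (1 - p))   # standard deviation of the approximating normal

# Density at 96, as used in the solution above
density_approx <- dnorm(x = 96, mean = mu, sd = sigma)

# Continuity correction: area under the curve from 95.5 to 96.5
cc_approx <- pnorm(q = 96.5, mean = mu, sd = sigma) -
  pnorm(q = 95.5, mean = mu, sd = sigma)

# Exact binomial probability, for comparison
exact <- dbinom(x = 96, size = n, prob = p)

round(x = c(density = density_approx, continuity = cc_approx, exact = exact),
      digits = 4)
```

With a sample this large the three values should be close to one another, which is why the normal approximation is reasonable here.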

Homework3

Ryan O’Hara

2023-10-06
