These questions will help you build an understanding of the Normal, Binomial, Hypergeometric and Poisson distributions. It helps to plot each distribution before calculating the probabilities, so begin by reading up on the plot() function.

You will be using the probability density (or mass) function, the cumulative distribution function and the quantile function in this assignment.
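
As a quick orientation, the sketch below shows how the three function families relate in R, using a binomial and a standard normal as examples. The variable names (`pmf`, `cdf`, `p_below`) are illustrative only and are not used elsewhere in the assignment.

```r
# d* = density / mass function, p* = cumulative distribution, q* = quantile
pmf <- dbinom(x = 0:20, size = 20, prob = 0.5)   # P(X = x) for x = 0..20
cdf <- pbinom(q = 0:20, size = 20, prob = 0.5)   # P(X <= x) for x = 0..20

# For a discrete distribution the CDF is the running sum of the mass function
all.equal(cumsum(pmf), cdf)

# The quantile function inverts the CDF
p_below <- pnorm(q = 1.5)   # P(Z <= 1.5) for a standard normal
qnorm(p = p_below)          # recovers 1.5
```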

Q1.

A researcher wishes to conduct a study of the color preferences of new car buyers. Suppose that 50% of this population prefers the color red. If 20 buyers are randomly selected, what is the probability that between 9 and 12 (both inclusive) buyers would prefer red? Round your answer to four decimal places. Use the round() function in R.

\(P(9 \le X\le12\ |\ n=20, \pi =.5 )\)

Q1. Solution:

#binomial

#parameters
x <- 9:12 #number of buyers
n <- 20 #sample size
pi <- .50 #prob of red pref

#probability that 9 to 12 buyers prefer red
prob_prefer_red <- dbinom(x    = x,
                          size = n,
                          prob = pi)
round(sum(prob_prefer_red), digits=4)

## [1] 0.6167

#plot
probabilities <- dbinom(x = 0:20, 
                        size = n, 
                        prob = pi
                        )

barplot(height = probabilities, 
        names.arg = 0:20, 
        col = "skyblue", 
        main = "Binomial Distribution  (n=20, p=0.5)", 
        xlab = "Number of Successes", 
        ylab = "Probability"
        )

#another way to visualize
plot(x    = 0:20, 
     y    = dbinom(x= 0:20, 
                   size = 20, 
                   prob = .5
                  ), 
     type = 'h',
     main = 'Binomial Distribution (n=20, p=0.50)',
     ylab = 'Probability',
     xlab = 'No. of Buyers',
     lwd  = 3,
     col  = "turquoise"
     )

There is a 61.67% chance that 9-12 buyers would prefer red cars.
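
A possible cross-check, sketched here with illustrative variable names (`p_red`, `via_pmf`, `via_cdf`): an inclusive range probability for a discrete variable can also be written as a difference of two CDF values, \(P(9 \le X \le 12) = P(X \le 12) - P(X \le 8)\). Note the lower bound drops to 8 so that 9 itself stays in the range.

```r
n <- 20        # sample size
p_red <- 0.5   # probability a buyer prefers red

# Sum of the mass function over 9..12
via_pmf <- sum(dbinom(x = 9:12, size = n, prob = p_red))

# Difference of CDF values: P(X <= 12) - P(X <= 8)
via_cdf <- pbinom(q = 12, size = n, prob = p_red) -
           pbinom(q = 8,  size = n, prob = p_red)

round(c(via_pmf = via_pmf, via_cdf = via_cdf), digits = 4)
```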

Q2.

A quality control inspector has drawn a sample of 13 light bulbs from a recent production lot. Suppose 20% of the bulbs in the lot are defective. What is the probability that less than 6 but more than 3 bulbs from the sample are defective? Round your answer to four decimal places.

\(P(3<X<6\ |\ n=13, \pi =.2 )\)

Q2. Solution:

#binomial
x <- 4:5 # 4 to 5 defective
n <- 13 #number of bulbs
pi <- .2 #prob of defect

prob_defect_bulb <- dbinom(x    = x,
                           size = n,
                           prob = pi)
round(sum(prob_defect_bulb), digits=4)

## [1] 0.2226

#plot
probabilities <- dbinom(x = 0:13, 
                        size = n, 
                        prob = pi
                        )

barplot(height = probabilities, 
        names.arg = 0:13, 
        col = "skyblue", 
        main = "Binomial Distribution  (n=13, p=0.2)", 
        xlab = "Number of Successes", 
        ylab = "Probability"
        )

#Another way to visualize
plot(x    = 0:13, 
     y    = dbinom(x    = 0:13, 
                   size = 13, 
                   prob = .2
                  ), 
     type = 'h',
     main = 'Binomial Distribution (n=13, p=0.2)',
     ylab = 'Probability',
     xlab = 'No. of Successes',
     lwd  = 3,
     col  = "lavender"
     )

The probability of more than 3 but less than 6 bulbs being defective is 22.26%.
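
To see what dbinom() computes, the following sketch evaluates the binomial mass function \(P(X = x) = \binom{n}{x}\pi^x(1-\pi)^{n-x}\) by hand for x = 4 and 5. The names `p_defect` and `by_hand` are made up for this illustration.

```r
n <- 13           # bulbs sampled
p_defect <- 0.2   # probability a bulb is defective
x <- 4:5          # more than 3 but fewer than 6 defective

# Binomial mass function written out term by term
by_hand <- choose(n, x) * p_defect^x * (1 - p_defect)^(n - x)

# Should agree with the built-in function
all.equal(by_hand, dbinom(x = x, size = n, prob = p_defect))
round(sum(by_hand), digits = 4)
```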

Q3.

The auto parts department of an automotive dealership sends out a mean of 4.2 special orders daily. What is the probability that, for any day, the number of special orders sent out will be no more than 3? Round your answer to four decimal places.

Q3. Solution:

#poisson

#parameters
lambda <- 4.2
k <- 3

#calculate probability
round(ppois(k, lambda), digits=4)

## [1] 0.3954

#plot
hist(rpois(n=10000, lambda=4.2), col="skyblue")

#another way to visualize
plot(x    = 0:15,
     y    = dpois(x = 0:15,
                  lambda = 4.2
                ),
     type = 'h',
     main = 'Poisson Distribution (lambda=4.2)',
     ylab = 'Probability',
     xlab = 'No. of Special Orders',
     lwd  = 3,
     col  = "pink"
)

The probability that the number of special orders sent out will be no more than 3 is 39.54%.
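
As a sanity check on ppois(), "no more than 3" can be expanded into the four individual Poisson terms \(e^{-\lambda}\lambda^k/k!\) for k = 0, 1, 2, 3. This is only a sketch; `by_hand` is an illustrative name.

```r
lambda <- 4.2   # mean number of special orders per day
k <- 0:3        # "no more than 3" means 0, 1, 2 or 3 orders

# Poisson mass function written out by hand
by_hand <- exp(-lambda) * lambda^k / factorial(k)

# The cumulative probability is the sum of the individual terms
all.equal(sum(by_hand), ppois(q = 3, lambda = lambda))
round(sum(by_hand), digits = 4)
```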

Q4.

A pharmacist receives a shipment of 17 bottles of a drug and has 3 of the bottles tested. If 6 of the 17 bottles are contaminated, what is the probability that less than 2 of the tested bottles are contaminated? Round your answer to four decimal places.

Q4. Solution:

#hypergeometric

round(sum(dhyper(x=0:1, #0 or 1 tested bottle contaminated
                 m=6, #contaminated bottles in the shipment
                 n=11, #uncontaminated bottles in the shipment
                 k=3)), #bottles tested
      digits=4)

## [1] 0.7279

#plot
m<-6; n<-11; k<-3
probabilities <- dhyper(x=0:3, m, #contaminated
                 n, #uncontaminated
                 k) #tested

barplot(height = probabilities, 
        names.arg = 0:3, 
        col = "skyblue", 
        main = "Hypergeometric Distribution", 
        xlab = "Number of Contaminated Bottles in the Sample", 
        ylab = "Probability"
        )

#another way to visualize
plot(x    = 0:3, 
     y    = dhyper(x    = 0:3, 
                   m=6, #contaminated
                   n=11, #uncontaminated
                   k=3 #tested
                  ), 
     type = 'h',
     main = 'Hypergeometric Distribution',
     ylab = 'Probability',
     xlab = 'No. of Contaminated Bottles',
     lwd  = 3,
     col  = "purple"
     )

The probability that fewer than 2 of the 3 tested bottles are contaminated is 72.79%.
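
The hypergeometric probabilities can also be built from counting arguments: choose x contaminated bottles out of 6, the remaining tested bottles out of the 11 clean ones, and divide by the number of ways to pick 3 bottles from 17. The sketch below uses illustrative names (`contaminated`, `clean`, `tested`, `by_hand`).

```r
contaminated <- 6   # contaminated bottles in the shipment
clean <- 11         # uncontaminated bottles in the shipment
tested <- 3         # bottles drawn for testing
x <- 0:1            # fewer than 2 contaminated among those tested

# Hypergeometric mass function from binomial coefficients
by_hand <- choose(contaminated, x) * choose(clean, tested - x) /
           choose(contaminated + clean, tested)

# Should match the cumulative function evaluated at 1
all.equal(sum(by_hand), phyper(q = 1, m = contaminated, n = clean, k = tested))
round(sum(by_hand), digits = 4)
```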

Q5.

A town recently dismissed 6 employees in order to meet their new budget reductions. The town had 6 employees over 50 years of age and 19 under 50. If the dismissed employees were selected at random, what is the probability that more than 1 employee was over 50? Round your answer to four decimal places.

Q5. Solution:

#hypergeometric

round(sum(dhyper(x = 2:6, 
                 m = 6, 
                 n = 19, 
                 k = 6)), 
      digits=4)

## [1] 0.4529

#another way to do it: P(X > 1) = 1 - P(X <= 1)

round(1 - phyper(q = 1, #at most 1 employee over 50
                 m = 6,  #no. of employees over 50
                 n = 19, #no. of employees under 50
                 k = 6), #no. of dismissed
      digits=4)

## [1] 0.4529

#plot
m<-6; n<-19; k<-6
probabilities <- dhyper(x=0:6, m, #over 50
                 n, #under 50
                 k) #dismissed

barplot(height = probabilities, 
        names.arg = 0:6, 
        col = "skyblue", 
        main = "Hypergeometric Distribution", 
        xlab = "Number of Dismissed Employees Over 50", 
        ylab = "Probability"
        )

# another way to visualize

plot(x    = 0:6,
     y    = dhyper(0:6,
                   m = 6,
                   n = 19,
                   k = 6
                ),
     type = 'h',
     main = 'Hypergeometric Distribution',
     ylab = 'Probability',
     xlab = 'No. of Dismissed Employees Over 50',
     lwd  = 3,
     col  = "cadetblue"
)

The probability that more than 1 of the dismissed employees was over 50 is 45.29%.
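
One more way to build intuition is a small simulation. This is a rough sketch only: the seed and the number of replications are arbitrary choices, and the simulated proportion will only approximate the exact value.

```r
set.seed(42)  # arbitrary seed, used only to make the run reproducible

# Each draw is the number of over-50 employees among 6 random dismissals
draws <- rhyper(nn = 100000, m = 6, n = 19, k = 6)

simulated <- mean(draws > 1)                         # share of runs with more than 1
exact <- 1 - phyper(q = 1, m = 6, n = 19, k = 6)     # exact tail probability

round(c(simulated = simulated, exact = exact), digits = 4)
```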

Q6.

The weights of steers in a herd are distributed normally. The variance is 90,000 and the mean steer weight is 800 lbs. Find the probability that the weight of a randomly selected steer is between 1040 and 1460 lbs. Round your answer to four decimal places.

Q6. Solution:

#normal

# Set the mean and standard deviation
mu    <- 800
sigma <- sqrt(90000) #sd is the square root of the variance, i.e. 300

#probability of selected steer being between 1040 and 1460 lbs
#dnorm() gives density heights, not probabilities, so use the CDF:
#P(a < X < b) = pnorm(b) - pnorm(a)
round(pnorm(q = 1460, mean = mu, sd = sigma) -
      pnorm(q = 1040, mean = mu, sd = sigma),
      digits=4)

## [1] 0.198

# Generate a range of values around the mean
x <- seq(from       =  mu - 3*sigma,
         to         =  mu + 3*sigma, 
         length.out =  1000
         )

# Calculate the probability density function
pdf <- dnorm(x    = x, 
             mean = mu, 
             sd   = sigma
             )

# Plot the normal distribution

plot(x    = x, 
     y    = pdf,
     type = 'l', 
     col  = 'black', 
     lwd  = 2, 
     xlab = 'Weight of Steer (lbs)', 
     ylab = 'Probability Density',
     main = 'Normal Distribution with Mean 800 and SD 300'
     )

# Shade the area under the curve for values between 1040 and 1460
x_shade <- seq(from       = 1040, 
               to         = 1460, 
               length.out = 1000
               )

pdf_shade <- dnorm(x    = x_shade, 
                   mean = mu, 
                   sd   = sigma
                   )

# rev() reverses x_shade so the polygon returns along the x-axis
polygon(x      =  c(x_shade, rev(x_shade)),
        y      =  c(pdf_shade, rep(x      = 0, 
                                   times  = length(pdf_shade)
                                  )
                   ), 
        col    = 'red', 
        border = NA
        )

The probability that the weight of a randomly selected steer is between 1040 and 1460 lbs is 19.80%.
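
The same interval can be expressed in z-scores, which is how it would be looked up in a standard normal table. The sketch below standardizes both bounds and compares the result with calling pnorm() on the original scale. The names `z_low`, `z_high`, `p_z` and `p_direct` are illustrative.

```r
mu <- 800
sigma <- sqrt(90000)   # variance 90,000 gives sd 300

# Standardize the bounds: z = (x - mu) / sigma
z_low  <- (1040 - mu) / sigma   # 0.8
z_high <- (1460 - mu) / sigma   # 2.2

# Area between the z-scores under the standard normal curve
p_z <- pnorm(q = z_high) - pnorm(q = z_low)

# Same area computed on the original scale
p_direct <- pnorm(q = 1460, mean = mu, sd = sigma) -
            pnorm(q = 1040, mean = mu, sd = sigma)

all.equal(p_z, p_direct)
```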

Q7.

The diameters of ball bearings are distributed normally. The mean diameter is 106 millimeters and the standard deviation is 4 millimeters. Find the probability that the diameter of a selected bearing is between 103 and 111 millimeters. Round your answer to four decimal places.

Q7. Solution:

#normal

# Set the mean and standard deviation
mu    <- 106
sigma <- 4

#find probability of diameter being between 103 and 111 mm
#dnorm() gives density heights, not probabilities, so use the CDF:
#P(a < X < b) = pnorm(b) - pnorm(a)
round(pnorm(q = 111, mean = mu, sd = sigma) -
      pnorm(q = 103, mean = mu, sd = sigma),
      digits=4)

## [1] 0.6677

# Generate a range of values around the mean
x <- seq(from       =  mu - 3*sigma,
         to         =  mu + 3*sigma, 
         length.out =  1000
         )

# Calculate the probability density function
pdf <- dnorm(x    = x, 
             mean = mu, 
             sd   = sigma
             )

# Plot the normal distribution

plot(x    = x, 
     y    = pdf,
     type = 'l', 
     col  = "black", 
     lwd  = 2, 
     xlab = 'Diameter of a Ball Bearing (mm)', 
     ylab = 'Probability Density',
     main = 'Normal Distribution with Mean 106 and SD 4'
     )

# Shade the area under the curve for values between 103 and 111
x_shade <- seq(from       = 103, 
               to         = 111, 
               length.out = 1000
               )

pdf_shade <- dnorm(x    = x_shade, 
                   mean = mu, 
                   sd   = sigma
                   )

# rev() reverses x_shade so the polygon returns along the x-axis
polygon(x      =  c(x_shade, rev(x_shade)),
        y      =  c(pdf_shade, rep(x      = 0, 
                                   times  = length(pdf_shade)
                                  )
                   ), 
        col    = 'red', 
        border = NA
        )

#another way to visualize: density heights at whole-millimeter diameters
#(these are densities, not probabilities, because the normal is continuous)
plot(x    = 91:118,
     y    = dnorm(91:118, #range for distribution
                mean = 106, #mu 
                sd   = 4 #sd
                ),
     type = 'h',
     main = 'Normal Distribution',
     ylab = 'Probability Density',
     xlab = 'Diameter of a Ball Bearing (mm)',
     lwd  = 3,
     col  = "blue"
)

The probability that the diameter of a selected bearing is between 103 and 111 millimeters is 66.77%.
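
For a continuous distribution, an interval probability is the area under the density curve. One way to check this, sketched below, is to integrate dnorm() numerically with integrate() and compare the result with the difference of two pnorm() values. The names `area` and `p_cdf` are illustrative.

```r
mu <- 106
sigma <- 4

# Numerical integration of the density between 103 and 111;
# extra arguments (mean, sd) are passed through to dnorm()
area <- integrate(dnorm, lower = 103, upper = 111, mean = mu, sd = sigma)

# Same area from the cumulative distribution function
p_cdf <- pnorm(q = 111, mean = mu, sd = sigma) -
         pnorm(q = 103, mean = mu, sd = sigma)

c(integrated = area$value, from_cdf = p_cdf)
```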

Q8.

The lengths of nails produced in a factory are normally distributed with a mean of 3.34 centimeters and a standard deviation of 0.07 centimeters. Find the two lengths that separate the top 3% and the bottom 3%. These lengths could serve as limits used to identify which nails should be rejected. Round your answer to the nearest hundredth (2 decimal places), if necessary. You will have to use the quantile function, qnorm(), here. In fact, we have seen a little bit of quantiles already when we talked about median and boxplots.

Q8. Solution:

#normal
#qnorm() is the quantile function: it returns the value below which a given proportion lies

round(qnorm(.03, #bottom 3%
            3.34, #mean
            .07), #sd
      digits = 2)

## [1] 3.21

round(qnorm(.97, #top 3%
            3.34, #mean
            .07), #sd
      digits = 2)

## [1] 3.47

# Set the mean and standard deviation
mu    <- 3.34
sigma <- .07

# Generate a range of values around the mean
x <- seq(from       =  mu - 3*sigma,
         to         =  mu + 3*sigma, 
         length.out =  1000
         )

# Calculate the probability density function
pdf <- dnorm(x    = x, 
             mean = mu, 
             sd   = sigma
             )

# Plot the normal distribution

plot(x    = x, 
     y    = pdf,
     type = 'l', 
     col  = 'black', 
     lwd  = 2, 
     xlab = 'Length of Nails (cm)', 
     ylab = 'Density',
     main = 'Normal Distribution with Mean 3.34 and SD .07'
     )

# Shade the area under the curve for bottom 3%
x_shade <- seq(from       = mu - 3*sigma, 
               to         = 3.21, 
               length.out = 100
               )

pdf_shade <- dnorm(x    = x_shade, 
                   mean = mu, 
                   sd   = sigma
                   )

polygon(x      =  c(x_shade, rev(x_shade)),
        y      =  c(pdf_shade, rep(x      = 0, 
                                   times  = length(pdf_shade)
                                  )
                   ), 
        col    = 'red', 
        border = NA
        )

# Shade the area under the curve for values at top 3%
x_shade <- seq(from       = 3.47, 
               to         = mu + 3*sigma, 
               length.out = 100
               )

pdf_shade <- dnorm(x    = x_shade, 
                   mean = mu, 
                   sd   = sigma
                   )

polygon(x      =  c(x_shade, rev(x_shade)),
        y      =  c(pdf_shade, rep(x      = 0, 
                                   times  = length(pdf_shade)
                                  )
                   ), 
        col    = 'red', 
        border = NA
        )

Nails shorter than 3.21 cm or longer than 3.47 cm should be rejected.
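
A simple way to verify quantile cut-offs is to feed them back into pnorm() and confirm that each tail holds 3% of the distribution. The sketch below does this with the unrounded qnorm() results. `cutoffs` is an illustrative name.

```r
mu <- 3.34
sigma <- 0.07

# Lower and upper cut-offs in one vectorized call
cutoffs <- qnorm(p = c(0.03, 0.97), mean = mu, sd = sigma)
round(cutoffs, digits = 2)

# Round trip: area below the lower cut-off and above the upper cut-off
pnorm(q = cutoffs[1], mean = mu, sd = sigma)                       # 0.03
pnorm(q = cutoffs[2], mean = mu, sd = sigma, lower.tail = FALSE)   # 0.03

# By symmetry, both cut-offs are the same distance from the mean
all.equal(mu - cutoffs[1], cutoffs[2] - mu)
```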

Q9.

A psychology professor assigns letter grades on a test according to the following scheme.

A: Top 9% of scores

B: Scores below the top 9% and above the bottom 63%

C: Scores below the top 37% and above the bottom 17%

D: Scores below the top 83% and above the bottom 8%

F: Bottom 8% of scores

Scores on the test are normally distributed with a mean of 75.8 and a standard deviation of 8.1. Find the minimum score required for an A grade. Round your answer to the nearest whole number, if necessary.

Q9. Solution:

#parameters
mu <- 75.8
sigma <- 8.1
quantile <- .91

round(qnorm(quantile, 
            mu, 
            sigma), 
      digits = 0)

## [1] 87

# Generate a range of values around the mean
x <- seq(from       =  mu - 3*sigma,
         to         =  mu + 3*sigma, 
         length.out =  1000
         )

# Calculate the probability density function
pdf <- dnorm(x    = x, 
             mean = mu, 
             sd   = sigma
             )

# Plot the normal distribution

plot(x    = x, 
     y    = pdf,
     type = 'l', 
     col  = 'black', 
     lwd  = 2, 
     xlab = 'Test Score', 
     ylab = 'Probability Density',
     main = 'Normal Distribution with Mean 75.8 and SD 8.1'
     )

# Shade the area under the curve for values at top 9%
x_shade <- seq(from       = 87, 
               to         = 100,
               length.out = 100
               )

pdf_shade <- dnorm(x    = x_shade, 
                   mean = mu, 
                   sd   = sigma
                   )

polygon(x      =  c(x_shade, rev(x_shade)),
        y      =  c(pdf_shade, rep(x      = 0, 
                                   times  = length(pdf_shade)
                                  )
                   ), 
        col    = 'red', 
        border = NA
        )

The minimum score required for an A grade is 87.
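
Because qnorm() is vectorized, the lower boundary of every grade in the scheme can be found in one call. The sketch below assumes each boundary is read as "proportion of scores below it" (8%, 17%, 63% and 91% for D, C, B and A). The names `lower_tail` and `cutoffs` are illustrative.

```r
mu <- 75.8
sigma <- 8.1

# Proportion of scores falling below the lower boundary of each grade
lower_tail <- c(D = 0.08, C = 0.17, B = 0.63, A = 0.91)

# Minimum score for each grade
cutoffs <- qnorm(p = lower_tail, mean = mu, sd = sigma)
round(cutoffs, digits = 0)
```

The last element corresponds to the A boundary asked for in the question.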

Q10.

Consider the probability that exactly 96 out of 155 computers will not crash in a day. Assume the probability that a given computer will not crash in a day is 61%. Approximate the (binomial) probability using the normal distribution. Round your answer to four decimal places.

Q10. Solution:

#binomial + normal distribution

#parameters
p <- .61 #prob a computer does not crash
k <- 96 #no. of computers that do not crash
n <- 155 #sample size
mu <- n*p #mean of the binomial, 94.55
sigma <- sqrt(n*p*(1-p)) #sd is the square root of the binomial variance

#exact binomial probability, P(X = 96), for comparison
round(dbinom(x = k,
             size = n,
             prob = p),
      digits = 4)

## [1] 0.064

#normal approximation with continuity correction:
#P(X = 96) is approximated by P(95.5 < Y < 96.5)
round(pnorm(q = k + 0.5,
            mean = mu,
            sd = sigma) -
      pnorm(q = k - 0.5,
            mean = mu,
            sd = sigma),
      digits = 4)

## [1] 0.0638

The probability that exactly 96 of the 155 computers do not crash in a given day, when each computer has a 61% chance of not crashing, is approximately 6.38% using the normal approximation. The exact binomial probability is about 6.40%.
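
For reference, a normal approximation to an exact binomial count is usually worked out with a continuity correction, as sketched below. Here \(Y\) denotes the approximating normal variable and \(\Phi\) the standard normal CDF. Both symbols are notation introduced for this illustration.

\[ \mu = n\pi = 155(0.61) = 94.55, \qquad \sigma = \sqrt{n\pi(1-\pi)} = \sqrt{36.8745} \approx 6.0724 \]

\[ P(X = 96) \approx P(95.5 < Y < 96.5) = \Phi\left(\frac{96.5 - 94.55}{6.0724}\right) - \Phi\left(\frac{95.5 - 94.55}{6.0724}\right) \approx \Phi(0.3211) - \Phi(0.1564) \approx 0.0638 \]

Note that \(\sigma\) is the square root of \(n\pi(1-\pi)\). Using the variance itself in place of the standard deviation would spread the curve far too wide.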

Extra resource

**What is a quantile function?**

A quantile function, also known as the inverse cumulative distribution function, maps a probability to the corresponding quantile of a distribution. In other words, given a probability p, the quantile function returns the value below which a proportion p of the distribution lies. For example, the 0.75 quantile, also known as the 75th percentile, is the value below which 75% of the data falls.

Quantile functions are commonly used in statistics and data analysis to summarize the distribution of a dataset, and to compare the distributions of different datasets. For a continuous distribution with a strictly increasing CDF, the quantile function is unique and is the ordinary inverse of the CDF. For discrete distributions the CDF is a step function, so an exact inverse may not exist or may not be unique. The usual convention, which R follows in functions such as qbinom() and qpois(), is to return the smallest value whose cumulative probability is at least p.
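
The discrete case can be illustrated with a small sketch on a binomial with n = 20 and probability 0.5. The names `p_red` and `q75` are illustrative. The check confirms that the returned quantile is the smallest count whose cumulative probability reaches 0.75.

```r
n <- 20
p_red <- 0.5

# 0.75 quantile of the binomial distribution
q75 <- qbinom(p = 0.75, size = n, prob = p_red)

# The CDF at the quantile reaches 0.75 ...
pbinom(q = q75, size = n, prob = p_red) >= 0.75       # TRUE
# ... but one step below it does not
pbinom(q = q75 - 1, size = n, prob = p_red) < 0.75    # TRUE

# Because the CDF jumps, several probabilities can share one quantile
qbinom(p = c(0.60, 0.70, 0.75), size = n, prob = p_red)
```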

HW 3

Jiwon Ban

2024-04-07
