Stat 371 Homework #4

Submit your homework to Canvas by the due date and time. Email your lecturer if you have extenuating circumstances and need to request an extension.
If an exercise asks you to use R, include a copy of all relevant code and output in your submitted homework file. You can copy/paste your code, take screenshots, or compile your work in an Rmarkdown document.
If a problem does not specify how to compute the answer, you may use any appropriate method. I may ask you to use R or use manual calculations on your exams, so practice accordingly.
You must include an explanation and/or intermediate calculations for an exercise to be complete.
Be sure to submit the HWK4 Autograde Quiz which will give you ~20 of your 40 accuracy points.
50 points total: 40 points accuracy, and 10 points completion

Discrete Random Variables

Exercise 1. A chemical supply company ships a certain solvent in 10-gallon drums. Let X represent the number of drums ordered by a randomly chosen customer. Assume X has the following probability mass function (pmf), with mean \(\mu_X = 2.3\) and variance \(\sigma^2_X = 1.81\):

X	P(X=x)
1	0.4
2	0.2
3	0.2
4	0.1
5	0.1

Calculate \(P(X \le 2)\) and describe what it means in the context of the problem.

\(P(X \le 2) = P(X=1) + P(X=2) = 0.4 + 0.2 = 0.6\). This means there is a probability of 0.6 that a randomly chosen customer orders at most 2 drums of solvent.

Let Y be the number of gallons ordered, so \(Y=10X\). Find the probability mass function of Y.

Y	P(Y=y)
10	0.4
20	0.2
30	0.2
40	0.1
50	0.1

Calculate the expected value (mean) number of gallons ordered \(\mu_Y\).

10(0.4) + 20(0.2) + 30(0.2) + 40(0.1) + 50(0.1) = 4 + 4 + 6 + 4 + 5 = 23

Calculate the standard deviation of the number of gallons ordered, \(\sigma_Y\).

\(\sigma^2_Y\) = (10-23)^2(0.4) + (20-23)^2(0.2) + (30-23)^2(0.2) + (40-23)^2(0.1) + (50-23)^2(0.1) = (-13)^2(0.4) + (-3)^2(0.2) + (7)^2(0.2) + (17)^2(0.1) + (27)^2(0.1) = 169(0.4) + 9(0.2) + 49(0.2) + 289(0.1) + 729(0.1) = 67.6 + 1.8 + 9.8 + 28.9 + 72.9 = 181

The variance is 181, so the standard deviation is \(\sigma_Y = \sqrt{181} \approx 13.4536\) gallons.
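
The hand calculations for Y can be cross-checked in R. The sketch below recomputes the mean and standard deviation from the pmf of Y and compares them with the scaling rules \(\mu_Y = 10\mu_X\) and \(\sigma_Y = 10\sigma_X\). The variable names are illustrative only.

```r
# pmf of Y = 10X (gallons ordered), taken from the table for Y
y <- c(10, 20, 30, 40, 50)
p <- c(0.4, 0.2, 0.2, 0.1, 0.1)

mu_y  <- sum(y * p)              # expected value of Y
var_y <- sum((y - mu_y)^2 * p)   # variance of Y
sd_y  <- sqrt(var_y)             # standard deviation of Y

# Scaling rules for Y = 10X, using the given mean and variance of X
mu_y_rule <- 10 * 2.3
sd_y_rule <- 10 * sqrt(1.81)

print(c(mu_y, var_y, sd_y))
print(all.equal(mu_y, mu_y_rule))  # compares up to floating-point tolerance
print(all.equal(sd_y, sd_y_rule))
```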

Normal Random Variables

Exercise 2. Weights of female cats of a certain breed (A) are well approximated by a normal distribution with mean 4.1 kg and standard deviation of 0.6 kg \(W_A~\sim N(4.1, 0.6^2)\).

What proportion of female cats of that breed (A) have weights between 3.7 and 4.4 kg?

pnorm(3.7, 4.1, 0.6, lower.tail = F) - pnorm(4.4, 4.1, 0.6, lower.tail = F)

## [1] 0.4389699
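
The same proportion can be written with lower tails or with z-scores, which may be easier to check against a normal table. This is an equivalent sketch rather than a different method; `z_lo` and `z_hi` are names introduced here.

```r
mu <- 4.1
sigma <- 0.6

# P(3.7 < W < 4.4) as a difference of lower-tail probabilities
p_lower <- pnorm(4.4, mu, sigma) - pnorm(3.7, mu, sigma)

# Same probability after standardizing both endpoints to z-scores
z_lo <- (3.7 - mu) / sigma
z_hi <- (4.4 - mu) / sigma
p_z  <- pnorm(z_hi) - pnorm(z_lo)

print(c(p_lower, p_z))
```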

A female cat of that breed (A) has a weight that is 0.5 standard deviations above the mean. What proportion of female cats of that breed (A) are heavier than this one?

The mean is 4.1 kg and the standard deviation is 0.6 kg, so half a standard deviation is 0.3 kg. A weight 0.5 standard deviations above the mean is therefore 4.4 kg.

pnorm(4.4,4.1,0.6,lower.tail = F)

## [1] 0.3085375

How heavy is a female cat of this breed whose weight is at the 80th percentile?

qnorm(.8,4.1,0.6)

## [1] 4.604973

What is the IQR of weights for female cats of this breed using the normal distribution approximation?

qnorm(.75,4.1,0.6) - qnorm(.25,4.1,0.6)

## [1] 0.8093877
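
For any normal distribution the IQR depends only on the standard deviation, because the quartiles sit at \(\mu \pm z_{0.75}\sigma\). A small helper (the name `normal_iqr` is hypothetical) makes that explicit:

```r
# IQR of a normal distribution: Q3 - Q1 = 2 * z_0.75 * sigma
normal_iqr <- function(sigma) {
  2 * qnorm(0.75) * sigma
}

iqr_formula <- normal_iqr(0.6)
iqr_direct  <- qnorm(0.75, 4.1, 0.6) - qnorm(0.25, 4.1, 0.6)
print(c(iqr_formula, iqr_direct))
```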

Females from another breed of cats (breed B) have weights well approximated by a normal distribution with mean 10.6 lb and standard deviation 0.9 lb: \(W_{B.lb}~\sim N(10.6, 0.9^2)\). Transform the weights of cat breed B into kilograms using the conversion 1 lb \(\approx\) 0.454 kg. You can use the transformation \(W_{B}=0.454(W_{B.lb})\). Compare the shape, center, and spread of the two breeds.

10.6(0.454) = 4.8124 kg and 0.9(0.454) = 0.4086 kg, so \(W_B~\sim N(4.8124, 0.4086^2)\)

qnorm(.75, 4.8124, 0.4086) - qnorm(.25, 4.8124, 0.4086)

## [1] 0.551193

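qnorm(.75,4.1,0.6) - qnorm(.25,4.1,0.6)

## [1] 0.8093877

Both distributions are normal, so they share the same symmetric, bell-shaped form. The center is higher for breed B (4.8124 kg versus 4.1 kg). Breed A has the larger spread: its standard deviation (0.6 kg versus 0.4086 kg) and its IQR (0.8094 kg versus 0.5512 kg) are both larger.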
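
The unit conversion for breed B relies on the fact that multiplying a normal random variable by a positive constant multiplies both its mean and its standard deviation by that constant. One way to check this, as a sketch, is to confirm that quantiles converted from the pound scale match quantiles computed directly on the kilogram scale. The probabilities in `probs` are arbitrary.

```r
k <- 0.454                                   # lb-to-kg conversion factor from the exercise
probs <- c(0.1, 0.25, 0.5, 0.75, 0.9)        # arbitrary probabilities to compare

q_lb_converted <- k * qnorm(probs, 10.6, 0.9)       # quantiles in lb, then converted to kg
q_kg_direct    <- qnorm(probs, k * 10.6, k * 0.9)   # quantiles of N(4.8124, 0.4086^2)

print(all.equal(q_lb_converted, q_kg_direct))
```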

Sampling Distributions

Exercise 3. A serving of breakfast cereal has a sugar content that is well approximated by a Normal random variable X with mean 13 g and variance \(1.3^2 g^2\). We can consider each serving as an independent and identical draw from X.

In what percent of servings will the sugar content be above 13.3 g?

pnorm(13.3,13,1.3,lower.tail = F)

## [1] 0.408747

The probability of a serving having sugar content above 13.3 g is 0.4087, so about 40.87% of servings exceed 13.3 g.

What is the probability that a randomly chosen serving will have a sugar content between 13.877 and 12.123? What do we call the difference: 13.877-12.123=1.754?

pnorm(12.123,13,1.3,lower.tail = F) - pnorm(13.877,13,1.3,lower.tail = F)

## [1] 0.5000798

qnorm(.75,13,1.3) - qnorm(.25,13,1.3)

## [1] 1.753673

The probability of being between 12.123 g and 13.877 g is 0.5001. Because 12.123 and 13.877 are the first and third quartiles, the difference of 1.754 is the interquartile range (IQR).

Calculate the probability that in 6 servings, only 1 has a sugar content below 13 g.

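Because 13 g is the mean of a symmetric normal distribution, each serving has probability 0.5 of being below 13 g. The number of servings below 13 g out of 6 independent servings is then Binomial(6, 0.5).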

dbinom(1,6,0.5)

## [1] 0.09375
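
The `dbinom` value can be reproduced from the binomial formula \(\binom{6}{1}(0.5)^1(0.5)^5\), shown here as a quick check:

```r
n <- 6     # number of servings
k <- 1     # number of servings below 13 g
p <- 0.5   # probability a single serving is below 13 g

p_formula <- choose(n, k) * p^k * (1 - p)^(n - k)   # binomial pmf written out
p_dbinom  <- dbinom(k, n, p)                        # built-in binomial pmf
print(c(p_formula, p_dbinom))
```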

Describe the sampling distribution for the mean sugar content of 6 servings \(\bar{X}\).

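\(\bar{X} \sim N(13, 1.3^2/6)\): the sample mean is normally distributed with mean 13 g and standard deviation \(1.3/\sqrt{6} \approx 0.5307\) g.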

What is the interquartile range of the sampling distribution for the sample mean \(\bar{X}\) when n=6? Is that value larger or smaller than the IQR implied in part (b)? Why do the relative sizes of the IQRs make sense?

qnorm(.75,13,1.3/sqrt(6)) - qnorm(.25,13,1.3/sqrt(6))

## [1] 0.7159341

It is smaller than the IQR of 1.754 implied in part (b). This makes sense because the standard deviation of the sample mean is \(\sigma/\sqrt{n}\), so the spread of \(\bar{X}\) shrinks as n increases.
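
A short simulation can illustrate the comparison. The sketch below draws many samples of 6 servings and compares the empirical IQR of the sample means with the theoretical value. The seed and the number of replications are arbitrary choices, not part of the assignment.

```r
set.seed(371)       # arbitrary seed for reproducibility
n_reps <- 100000    # arbitrary number of simulated samples
n <- 6              # servings per sample

# Each row is one sample of 6 servings drawn from N(13, 1.3^2)
servings <- matrix(rnorm(n_reps * n, mean = 13, sd = 1.3), ncol = n)
xbar <- rowMeans(servings)

iqr_sim    <- IQR(xbar)                         # empirical IQR of the sample means
iqr_theory <- 2 * qnorm(0.75) * 1.3 / sqrt(n)   # IQR of N(13, 1.3^2 / 6)
iqr_single <- 2 * qnorm(0.75) * 1.3             # IQR of a single serving

print(c(iqr_sim, iqr_theory, iqr_single))
```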

What is the probability that the mean sugar content in 6 servings is more than 13.3 g ?

1-pnorm(13.3,13,1.3/sqrt(6))

## [1] 0.2859461

Is it more or less likely that the mean sugar content is above 13.3 g in 10 servings or 6 servings (as computed in f)? Can you explain it without actually computing the new probability?

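It is more likely to be above 13.3 g in 6 servings. With 10 servings the standard deviation of the sample mean is smaller (\(1.3/\sqrt{10}\) versus \(1.3/\sqrt{6}\)), so the mean is less likely to land 0.3 g or more above 13 g.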
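
Although the question can be answered without computation, the two probabilities are easy to compare in R as a check. The helper name `p_mean_above` is introduced here for illustration.

```r
# P(sample mean > cutoff) when the mean of n servings is N(mu, sigma^2 / n)
p_mean_above <- function(n, cutoff = 13.3, mu = 13, sigma = 1.3) {
  pnorm(cutoff, mu, sigma / sqrt(n), lower.tail = FALSE)
}

p6  <- p_mean_above(6)
p10 <- p_mean_above(10)
print(c(p6, p10))   # the n = 10 probability is the smaller of the two
```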

Suppose each cereal box of this type contains 10 servings and consider the total sugar content in each box as a sum of 10 iid random draws from \(X \sim N(13, 1.3^2)\). If you were to eat a whole box of cereal, above what total sugar content would you consume with 95% probability? Show and briefly explain your calculations.

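The total \(T\) of 10 iid servings is normal with mean 10(13) = 130 g and standard deviation \(\sqrt{10}(1.3)\) g. The amount you exceed with 95% probability is the value with 95% of the distribution above it, which is the 5th percentile of \(T\).

qnorm(.05,10*13,sqrt(10)*1.3)

## [1] 123.2381

With 95% probability you would consume more than about 123.24 g of sugar.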

Exercise 4. You will be comparing the sampling distributions for two different estimators of \(\sigma\), the population standard deviation.

When trying to estimate the standard deviation of a population (\(\sigma\)) from a sample we could use:

The graphs below give the sampling distributions produced by these estimators when drawing a sample of size 8 from a normal population with mean \(\mu_x=3\) and standard deviation \(\sigma_X=5\).

What do you notice about the mean of the standard deviations produced using the \(s_1\) estimator compared to the \(s_2\) estimator compared to the true population standard deviation? Why do we prefer to use the \(s_1\) formulation when we have a sample of data and are interested in estimating the population standard deviation? (You should use the resulting histograms to help you answer the question and use the word “bias”.)

Estimator 1 uses the sample standard deviation formula, while Estimator 2 uses the population standard deviation formula. In the histograms, the red line marks the true population standard deviation and the blue lines mark the means of the simulated standard deviations. For estimator 1 the blue line is closer to the red line than it is for estimator 2, so estimator 1 has less bias and is the better estimator of \(\sigma\). The population formula tends to underestimate \(\sigma\) because deviations are measured from the sample mean rather than the true mean, and the \(n-1\) divisor in the sample formula corrects for most of that.
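
The comparison can be reproduced with a small simulation. The sketch below assumes that \(s_1\) divides the sum of squared deviations by \(n-1\) and that \(s_2\) divides by \(n\). The estimator formulas are not reproduced in this text, so that assignment is an assumption, as are the seed and the number of replications.

```r
set.seed(371)     # arbitrary seed
n_reps <- 20000   # arbitrary number of simulated samples
n <- 8            # sample size from the exercise
mu <- 3           # population mean from the exercise
sigma <- 5        # population standard deviation from the exercise

s1 <- numeric(n_reps)   # assumed estimator 1: divisor n - 1
s2 <- numeric(n_reps)   # assumed estimator 2: divisor n

for (i in seq_len(n_reps)) {
  x  <- rnorm(n, mean = mu, sd = sigma)
  ss <- sum((x - mean(x))^2)   # sum of squared deviations from the sample mean
  s1[i] <- sqrt(ss / (n - 1))
  s2[i] <- sqrt(ss / n)
}

# Estimated bias: mean of the simulated estimates minus the true sigma
bias_s1 <- mean(s1) - sigma
bias_s2 <- mean(s2) - sigma
print(c(bias_s1, bias_s2))
```

Under these assumptions both averages fall below 5, with the \(n-1\) version landing closer to it.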

Brady Ring
