Discrete Random Variables

Exercise 1. A chemical supply company ships a certain solvent in 10-gallon drums. Let X represent the number of drums ordered by a randomly chosen customer. Assume X has the probability mass function (pmf) given below. The mean and variance of X are \(\mu_X = 2.3\) and \(\sigma^2_X = 1.81\).

X P(X=x)
1 0.4
2 0.2
3 0.2
4 0.1
5 0.1
  1. Calculate \(P(X \le 2)\) and describe what it means in the context of the problem.

P(X=1) + P(X=2) = 0.4 + 0.2 = 0.6. This means that there is a probability of 0.6 that a randomly chosen customer orders at most 2 drums of solvent.
  2. Let Y be the number of gallons ordered, so \(Y=10X\). Find the probability mass function of Y.
Y P(Y=y)
10 0.4
20 0.2
30 0.2
40 0.1
50 0.1
  1. Calculate the expected value (mean) number of gallons ordered \(\mu_Y\).

10(0.4) + 20(0.2) + 30(0.2) + 40(0.1) + 50(0.1) = 4 + 4 + 6 + 4 + 5 = 23

  1. Calculate the standard deviation of the number of gallons ordered, \(\sigma_Y\).

(10-23)^2(0.4) + (20-23)^2(0.2) + (30-23)^2(0.2) + (40-23)^2(0.1) + (50-23)^2(0.1)
= (-13)^2(0.4) + (-3)^2(0.2) + (7)^2(0.2) + (17)^2(0.1) + (27)^2(0.1)
= 169(0.4) + 9(0.2) + 49(0.2) + 289(0.1) + 729(0.1)
= 67.6 + 1.8 + 9.8 + 28.9 + 72.9 = 181

The variance is 181, so the standard deviation is sqrt(181) = 13.4536 gallons.
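
The hand calculations in Exercise 1 can be checked in R. This is only an illustrative sketch and not part of the original solution; the names `x`, `p`, `y`, and the summary variables are chosen here for convenience.

```r
# pmf of X (number of drums), copied from the table in Exercise 1
x <- 1:5
p <- c(0.4, 0.2, 0.2, 0.1, 0.1)

# Cumulative probability P(X <= 2)
p_le_2 <- sum(p[x <= 2])

# Mean and variance of X from the definitions
mu_x  <- sum(x * p)
var_x <- sum((x - mu_x)^2 * p)

# Y = 10 X keeps the same probabilities on the rescaled values
y <- 10 * x
mu_y  <- sum(y * p)
var_y <- sum((y - mu_y)^2 * p)
sd_y  <- sqrt(var_y)

# Linear-transformation rules: E(10X) = 10 E(X) and Var(10X) = 100 Var(X)
c(p_le_2 = p_le_2, mu_x = mu_x, var_x = var_x,
  mu_y = mu_y, var_y = var_y, sd_y = sd_y)
```

The same numbers follow without the table for Y, since \(\mu_Y = 10\mu_X\) and \(\sigma_Y = 10\sigma_X\).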

Normal Random Variables

Exercise 2. Weights of female cats of a certain breed (A) are well approximated by a normal distribution with mean 4.1 kg and standard deviation 0.6 kg: \(W_A \sim N(4.1, 0.6^2)\).

  1. What proportion of female cats of that breed (A) have weights between 3.7 and 4.4 kg?
pnorm(3.7, 4.1, 0.6, lower.tail = FALSE) - pnorm(4.4, 4.1, 0.6, lower.tail = FALSE)
## [1] 0.4389699
  1. A female cat of that breed (A) has a weight that is 0.5 standard deviations above the mean. What proportion of female cats of that breed (A) are heavier than this one?

The mean is 4.1 kg and the standard deviation is 0.6 kg, so half a standard deviation is 0.3 kg. A weight half a standard deviation above the mean is therefore 4.4 kg.

pnorm(4.4, 4.1, 0.6, lower.tail = FALSE)
## [1] 0.3085375
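
Because the cat's weight is stated in standard-deviation units, the same upper-tail probability can be read from the standard normal distribution. The short sketch below, with variable names chosen only for illustration, compares the two routes.

```r
# Upper-tail probability on the original scale (kg)
p_kg <- pnorm(4.4, mean = 4.1, sd = 0.6, lower.tail = FALSE)

# Same probability on the standard normal scale: z = (4.4 - 4.1) / 0.6
z <- (4.4 - 4.1) / 0.6
p_z <- pnorm(z, lower.tail = FALSE)

# The two values agree up to floating-point rounding
c(p_kg = p_kg, p_z = p_z)
```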
  1. How heavy is a female cat of this breed whose weight is at the 80th percentile?
qnorm(.8,4.1,0.6)
## [1] 4.604973
  1. What is the IQR of weights for female cats of this breed using the normal distribution approximation?
qnorm(.75,4.1,0.6) - qnorm(.25,4.1,0.6)
## [1] 0.8093877
  1. Females from another breed of cats (breed B) have weights well approximated by a normal distribution with mean 10.6 lb and standard deviation 0.9 lb: \(W_{B.lb} \sim N(10.6, 0.9^2)\). Transform the weights of cat breed B into kilograms using the conversion 1 lb \(\approx\) 0.454 kg. You can use the transformation \(W_{B}=0.454(W_{B.lb})\). Compare the shape, center, and spread of the two breeds.

10.6 × 0.454 = 4.8124 and 0.9 × 0.454 = 0.4086, so \(W_B \sim N(4.8124, 0.4086^2)\).

qnorm(.75, 4.8124, 0.4086) - qnorm(.25, 4.8124, 0.4086)
## [1] 0.551193
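qnorm(.75, 4.1, 0.6) - qnorm(.25, 4.1, 0.6)
## [1] 0.8093877

Both weight distributions are normal, so they share the same symmetric, bell-shaped form. The center is higher for breed B (mean about 4.81 kg versus 4.1 kg). The spread is larger for breed A: its standard deviation is 0.6 kg versus about 0.41 kg for breed B, and its IQR is about 0.81 kg versus about 0.55 kg.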
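
Multiplying a normal variable by a constant multiplies both its mean and its standard deviation by that constant, which is what makes the breed comparison possible on a common scale. The sketch below puts the two breeds side by side in kilograms; the variable names and the summary table layout are illustrative choices, not part of the original solution.

```r
lb_to_kg <- 0.454  # conversion factor given in the exercise

# Breed A is already in kilograms
mean_a <- 4.1
sd_a   <- 0.6

# Breed B: scale both the mean and the standard deviation
mean_b <- lb_to_kg * 10.6
sd_b   <- lb_to_kg * 0.9

# IQR of each normal distribution from its quartiles
iqr_a <- qnorm(0.75, mean_a, sd_a) - qnorm(0.25, mean_a, sd_a)
iqr_b <- qnorm(0.75, mean_b, sd_b) - qnorm(0.25, mean_b, sd_b)

# Side-by-side summary of center and spread
data.frame(breed   = c("A", "B"),
           mean_kg = c(mean_a, mean_b),
           sd_kg   = c(sd_a, sd_b),
           iqr_kg  = c(iqr_a, iqr_b))
```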

Sampling Distributions

Exercise 3. A serving of breakfast cereal has a sugar content that is well approximated by a Normal random variable X with mean 13 g and variance \(1.3^2 g^2\). We can consider each serving as an independent and identical draw from X.

  1. In what percent of servings will the sugar content be above 13.3 g?
pnorm(13.3, 13, 1.3, lower.tail = FALSE)
## [1] 0.408747

The probability that the sugar content is above 13.3 g is 0.4087, so about 40.87% of servings will exceed 13.3 g.

  1. What is the probability that a randomly chosen serving will have a sugar content between 13.877 and 12.123? What do we call the difference: 13.877-12.123=1.754?
pnorm(12.123, 13, 1.3, lower.tail = FALSE) - pnorm(13.877, 13, 1.3, lower.tail = FALSE)
## [1] 0.5000798
qnorm(.75,13,1.3) - qnorm(.25,13,1.3)
## [1] 1.753673

The probability of being between 12.123 g and 13.877 g is 0.5001. Since these two values are the first and third quartiles of X, the difference of 1.754 g is the interquartile range (IQR).
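
One way to see why 1.754 is the IQR is to compute the quartiles directly and confirm that they bracket the middle half of the distribution. The sketch below assumes only the stated model \(X \sim N(13, 1.3^2)\); the name `quartiles` is an illustrative choice.

```r
# First and third quartiles of X ~ N(13, 1.3^2)
quartiles <- qnorm(c(0.25, 0.75), mean = 13, sd = 1.3)
quartiles

# Probability that X falls between the quartiles
diff(pnorm(quartiles, mean = 13, sd = 1.3))

# Distance between the quartiles, i.e. the IQR
diff(quartiles)
```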

  1. Calculate the probability that in 6 servings, only 1 has a sugar content below 13 g.

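Because 13 g is the mean of a symmetric distribution, each serving has probability 0.5 of being below 13 g. With 6 independent servings, the number below 13 g is Binomial(6, 0.5), and we need the probability that it equals exactly 1.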

dbinom(1,6,0.5)
## [1] 0.09375
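
As a check on the `dbinom` call, the binomial probability can be written out from its formula. This is a small sketch with illustrative variable names; it derives the success probability from the normal model rather than typing 0.5 directly.

```r
n <- 6
k <- 1

# P(X < 13) for X ~ N(13, 1.3^2); equals 0.5 by symmetry
p_below <- pnorm(13, mean = 13, sd = 1.3)

# Binomial formula: choose(n, k) * p^k * (1 - p)^(n - k)
by_hand <- choose(n, k) * p_below^k * (1 - p_below)^(n - k)

c(by_hand = by_hand, dbinom = dbinom(k, n, p_below))
```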
  1. Describe the sampling distribution for the mean sugar content of 6 servings \(\bar{X}\).

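\(\bar{X} \sim N(13, 1.3^2/6)\): the sample mean is normally distributed with mean 13 g and standard deviation \(1.3/\sqrt{6} \approx 0.531\) g.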

  1. What is the interquartile range of the sampling distribution for the sample mean \(\bar{X}\) when n=6? Is that value larger or smaller than the IQR implied in part (b)? Why do the relative sizes of the IQRs make sense?
qnorm(.75,13,1.3/sqrt(6)) - qnorm(.25,13,1.3/sqrt(6))
## [1] 0.7159341

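It is smaller than the IQR in part (b) (0.716 g versus 1.754 g). This makes sense because the standard deviation of the sample mean is \(\sigma/\sqrt{n}\), so averaging 6 servings shrinks the spread by a factor of \(\sqrt{6}\).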
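
A quick simulation can illustrate the narrower sampling distribution of the mean. The sketch below is not part of the original solution: the seed and the number of simulated samples (`n_sims`) are arbitrary choices, and the simulated values will only approximate the theoretical ones.

```r
set.seed(1)  # arbitrary seed, used only for reproducibility

n_servings <- 6
n_sims <- 20000  # number of simulated samples (arbitrary choice)

# Each row holds one simulated sample of 6 servings
samples <- matrix(rnorm(n_sims * n_servings, mean = 13, sd = 1.3),
                  nrow = n_sims)
xbar <- rowMeans(samples)

# Theoretical standard deviation and IQR of the sample mean
theory_sd  <- 1.3 / sqrt(n_servings)
theory_iqr <- qnorm(0.75, 13, theory_sd) - qnorm(0.25, 13, theory_sd)

# Simulated versus theoretical spread
c(sim_sd = sd(xbar), theory_sd = theory_sd)
c(sim_iqr = IQR(xbar), theory_iqr = theory_iqr)
```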

  1. What is the probability that the mean sugar content in 6 servings is more than 13.3 g ?
1-pnorm(13.3,13,1.3/sqrt(6))
## [1] 0.2859461
  1. Is it more or less likely that the mean sugar content is above 13.3 g in 10 servings or 6 servings (as computed in f)? Can you explain it without actually computing the new probability?

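It is more likely that the mean is above 13.3 g with 6 servings. The standard deviation of the sample mean, \(1.3/\sqrt{n}\), is smaller for n = 10 than for n = 6, so the mean of 10 servings is more tightly concentrated around 13 g and is less likely to exceed 13.3 g.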
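
Although the question asks for an argument without computation, the conclusion can be confirmed numerically. The helper `p_above` below is a hypothetical function written for this illustration; its default arguments are the values from the exercise.

```r
# P(sample mean > cutoff) for a sample of n servings
p_above <- function(n, cutoff = 13.3, mu = 13, sigma = 1.3) {
  pnorm(cutoff, mean = mu, sd = sigma / sqrt(n), lower.tail = FALSE)
}

# The upper-tail probability shrinks as n grows
probs <- c(n6 = p_above(6), n10 = p_above(10))
probs
```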

  1. Suppose each cereal box of this type contains 10 servings and consider the total sugar content in each box as a sum of 10 iid random draws from \(X \sim N(13, 1.3^2)\). If you were to eat a whole box of cereal, above what total sugar content would you consume with 95% probability? Show and briefly explain your calculations.
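qnorm(.05, 10*13, sqrt(10)*1.3)
## [1] 123.2381

The total of 10 independent servings is normal with mean 10(13) = 130 g and variance 10(1.3^2), so its standard deviation is sqrt(10)(1.3) g. The question asks for the value that the total exceeds with probability 0.95, which is the 5th percentile of this distribution: with 95% probability you would consume more than about 123.24 g of sugar.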
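
Choosing the correct tail is the subtle step here, so it can help to compute both candidate percentiles of the total and check their upper-tail probabilities directly. This is a sketch with illustrative names (`mu_total`, `sd_total`, `lower_5`, `upper_95`).

```r
mu_total <- 10 * 13         # mean of the sum of 10 servings
sd_total <- sqrt(10) * 1.3  # variances add, so the sd scales by sqrt(10)

lower_5  <- qnorm(0.05, mu_total, sd_total)  # 5th percentile of the total
upper_95 <- qnorm(0.95, mu_total, sd_total)  # 95th percentile of the total

# Probability that the total exceeds each value
p_exceed_lower <- pnorm(lower_5,  mu_total, sd_total, lower.tail = FALSE)
p_exceed_upper <- pnorm(upper_95, mu_total, sd_total, lower.tail = FALSE)

c(lower_5 = lower_5, p_exceed_lower = p_exceed_lower,
  upper_95 = upper_95, p_exceed_upper = p_exceed_upper)
```

The 5th percentile is the value exceeded with probability 0.95, while the 95th percentile is exceeded with probability only 0.05.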

Exercise 4. You will be comparing the sampling distributions for two different estimators of \(\sigma\), the population standard deviation.

When trying to estimate the standard deviation of a population (\(\sigma\)) from a sample we could use:

The graphs below give the sampling distributions produced by these estimators when drawing a sample of size 8 from a normal population with mean \(\mu_X=3\) and standard deviation \(\sigma_X=5\).

What do you notice about the mean of the standard deviations produced using the \(s_1\) estimator compared to the \(s_2\) estimator compared to the true population standard deviation? Why do we prefer to use the \(s_1\) formulation when we have a sample of data and are interested in estimating the population standard deviation? (You should use the resulting histograms to help you answer the question and use the word “bias”.)

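Estimator 1 (\(s_1\)) is the sample standard deviation, which divides the sum of squared deviations by n - 1, while estimator 2 (\(s_2\)) is the population formula, which divides by n. In the histograms, the red line marks the true standard deviation (\(\sigma_X = 5\)) and the blue line marks the mean of the simulated standard deviations. For estimator 1 the blue line is closer to the red line than it is for estimator 2, so \(s_1\) has less bias and is the better estimator of \(\sigma\). The bias of \(s_2\) is larger because the deviations are measured from the sample mean rather than the true mean, which makes the squared deviations too small on average; dividing by n - 1 instead of n compensates for this, whereas \(s_2\) systematically underestimates \(\sigma\).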
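
The comparison of the two estimators can be reproduced with a small simulation. The sketch below is an assumption-laden illustration rather than the code behind the original histograms: it assumes \(s_1\) divides by n - 1 and \(s_2\) divides by n, and the seed and number of simulated samples are arbitrary.

```r
set.seed(42)  # arbitrary seed, used only for reproducibility

n <- 8          # sample size from the exercise
mu <- 3         # population mean from the exercise
sigma <- 5      # population standard deviation from the exercise
n_sims <- 20000 # number of simulated samples (arbitrary choice)

# ASSUMPTION: s1 divides by n - 1 and s2 divides by n
s1 <- numeric(n_sims)
s2 <- numeric(n_sims)
for (i in seq_len(n_sims)) {
  x <- rnorm(n, mean = mu, sd = sigma)
  ss <- sum((x - mean(x))^2)  # sum of squared deviations from the sample mean
  s1[i] <- sqrt(ss / (n - 1))
  s2[i] <- sqrt(ss / n)
}

# Average of each estimator compared with the true sigma
c(mean_s1 = mean(s1), mean_s2 = mean(s2), sigma = sigma)

# Estimated bias of each estimator (mean of estimates minus sigma)
c(bias_s1 = mean(s1) - sigma, bias_s2 = mean(s2) - sigma)
```

Under these assumptions both averages fall below 5, with the average of `s2` further from 5 than the average of `s1`, which matches the pattern described for the histograms.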