Statistics 371 Homework #4

*Submit your homework to Canvas by the due date and time. Email your instructor if you have extenuating circumstances and need to request an extension.

*If an exercise asks you to use R, include a copy of the code and output. Please edit your code and output to be only the relevant portions.

*If a problem does not specify how to compute the answer, you may use any appropriate method. I may ask you to use R or manual calculations on your exams, so practice accordingly.

*You must include an explanation and/or intermediate calculations for an exercise to be complete.

*Be sure to submit the HWK4 Autograde Quiz which will give you ~20 of your 40 accuracy points.

*50 points total: 40 points accuracy, and 10 points completion

Discrete RV Expectation, Variance, and Transformation

Exercise 1: A chemical supply company ships a certain solvent in 10-gallon drums. Let X represent the number of drums ordered by a randomly chosen customer. The mean and variance of X are \(\mu_X=2.3\) and \(\sigma^2_X=1.81\). Assume X has the following probability mass function (pmf):

X	P(X=x)
1	0.4
2	0.2
3	0.2
4	0.1
5	0.1
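
The stated mean and variance can be checked directly from the pmf. The following is a minimal sketch in R; the names `x`, `p`, `mu_x`, and `var_x` are illustrative and not part of the assignment.

```r
# Support and probabilities copied from the pmf table
x <- c(1, 2, 3, 4, 5)
p <- c(0.4, 0.2, 0.2, 0.1, 0.1)

mu_x  <- sum(x * p)               # E[X]
var_x <- sum((x - mu_x)^2 * p)    # Var(X) = E[(X - mu)^2]

mu_x    # 2.3
var_x   # 1.81
```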

Calculate \(P(X \le 2)\) and describe what it means in the context of the problem.

0.4+0.2

## [1] 0.6

There is a 60 percent chance that a randomly chosen customer orders at most 2 drums.

Let Y be the number of gallons ordered, so \(Y=10X\). Find the probability mass function of Y.

y	P(Y=y)
10	.4
20	.2
30	.2
40	.1
50	.1

Calculate the mean number of gallons ordered \(\mu_Y\).

(10*0.4)+(20*0.2)+(30*0.2)+(40*0.1)+(50*0.1)

## [1] 23

23 is the mean.

Calculate the standard deviation of the number of gallons ordered, \(\sigma_Y\).

sqrt((0.4*(10-23)^2)+(0.2*(20-23)^2)+(0.2*(30-23)^2)+(0.1*(40-23)^2)+(0.1*(50-23)^2))

## [1] 13.45362

13.45 is the standard deviation.
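
The same values follow from the rules for a linear transformation, \(\mu_{aX}=a\mu_X\) and \(\sigma_{aX}=|a|\sigma_X\), without rebuilding the pmf. A short sketch using only the values given in the exercise (the variable names are illustrative):

```r
a     <- 10      # gallons per drum, so Y = 10 * X
mu_x  <- 2.3     # mean of X given in the exercise
var_x <- 1.81    # variance of X given in the exercise

mu_y <- a * mu_x               # mean scales by a
sd_y <- abs(a) * sqrt(var_x)   # standard deviation scales by |a|

mu_y   # 23
sd_y   # about 13.45
```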

Normal RVs

Exercise 2: Weights of female cats of a certain breed (A) are well approximated by a normal distribution with mean 4.1 kg and standard deviation of 0.6 kg \(W_A~\sim N(4.1, 0.6^2)\).

What proportion of female cats of that breed (A) have weights between 3.7 and 4.4 kg?

pnorm(4.4,4.1,0.6)- pnorm(3.7,4.1,0.6)

## [1] 0.4389699

0.44 is the proportion of female cats of breed (A) that have weights between 3.7 and 4.4 kg.

A female cat of that breed (A) has a weight that is 0.5 standard deviations above the mean. What proportion of female cats of that breed (A) are heavier than this one?

1 - pnorm(4.4, 4.1, 0.6)

## [1] 0.3085375

1 - pnorm(0.5)

## [1] 0.3085375

The proportion of female cats of breed (A) heavier than this one is 0.31.
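
The raw-scale and standardized calculations agree because standardizing does not change a normal probability. A small check, assuming nothing beyond the mean and standard deviation given in the exercise:

```r
mu    <- 4.1
sigma <- 0.6

w <- mu + 0.5 * sigma    # weight 0.5 SD above the mean (4.4 kg)
z <- (w - mu) / sigma    # standardized value (0.5)

p_raw <- 1 - pnorm(w, mean = mu, sd = sigma)   # on the kg scale
p_std <- 1 - pnorm(z)                          # on the z scale

all.equal(p_raw, p_std)   # TRUE
```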

How heavy is a female cat of this breed whose weight is on the 80th percentile?

qnorm(0.8,4.1,0.6)

## [1] 4.604973

The cat is 4.60 kg.

Calculate the IQR of weights for female cats of this breed.

qnorm(0.75,4.1,.6)-qnorm(0.25,4.1,.6)

## [1] 0.8093877
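
For a normal distribution the IQR is a fixed multiple of the standard deviation, \(2 z_{0.75}\sigma\), since the quartiles sit symmetrically around the mean. A brief sketch of that shortcut (the name `iqr_a` is illustrative):

```r
sigma <- 0.6                       # SD of breed A weights, in kg
iqr_a <- 2 * qnorm(0.75) * sigma   # width of the middle 50% of weights
iqr_a                              # about 0.81 kg
```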

Females from another breed of cats (breed B) have weights well approximated by a normal distribution with mean 10.6 lb and standard deviation of 0.9 lb \(W_{B.lb}~\sim N(10.6, 0.9^2)\). Transform the weights of cat breed B into kilograms using the conversion: 1 lb \(\approx\) 0.454 kg. You can use the transformation: \(W_{B}=0.454*W_{B.lb}\). Compare the shape, center, and spread of the two breeds.

10.6*0.453592

## [1] 4.808075

0.9*0.453592

## [1] 0.4082328

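Both distributions are normal, because multiplying a normal random variable by a constant gives another normal random variable, so the shapes are the same. Breed B has the larger center (about 4.81 kg versus 4.1 kg for breed A). Breed A has the larger spread (standard deviation 0.6 kg versus about 0.41 kg for breed B).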
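
The conversion relies on the linear-transformation rules for a normal random variable: the mean and the standard deviation are both multiplied by the conversion factor. A sketch using the factor 0.454 from the problem statement (the calculation above uses the slightly more precise 0.453592); the variable names are illustrative.

```r
k <- 0.454               # kg per lb, as given in the problem

mu_b_kg <- k * 10.6      # breed B mean in kg, about 4.81
sd_b_kg <- k * 0.9       # breed B SD in kg, about 0.41

# Side-by-side comparison with breed A (mean 4.1 kg, SD 0.6 kg)
c(breed_A_mean = 4.1, breed_B_mean = mu_b_kg)
c(breed_A_sd = 0.6, breed_B_sd = sd_b_kg)
```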

Sampling Distributions

Exercise 3: A serving of breakfast cereal has a sugar content that is well approximated by a normally distributed random variable X with mean 13 g and variance \(1.3^2\ g^2\). We can consider each serving as an independent and identical draw from X.

In what percent of servings will the sugar content be above 13.3 g?

1-pnorm(13.3,13.0,1.3)

## [1] 0.408747

40.87 percent of servings will have a sugar content above 13.3 g.

What is the probability that a randomly chosen serving will have a sugar content between 13.877 and 12.123? What do we call the difference: 13.877-12.123=1.754?

pnorm(13.877,13.0,1.3)-pnorm(12.12,13.0,1.3)

## [1] 0.5008125

The probability is 0.50. Because 12.123 and 13.877 are the 25th and 75th percentiles of X, the difference is the interquartile range (IQR).

Calculate the probability that in 6 servings, only 1 has a sugar content below 13 g.

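dbinom(1,6,pnorm(13.0,13.0,1.3))

## [1] 0.09375

Each serving is below 13 g with probability 0.5, so the number of servings below 13 g out of 6 is binomial with n=6 and p=0.5. The probability that exactly 1 of the 6 is below 13 g is about 0.094.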
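
One way to treat this question is as a binomial count: each serving falls below 13 g with probability 0.5, and the 6 servings are independent. The sketch below compares R's `dbinom` with the binomial formula written out by hand; the variable names are illustrative.

```r
p_below <- pnorm(13, mean = 13, sd = 1.3)   # P(X < 13) = 0.5

# P(exactly 1 of 6 servings is below 13 g)
p_one   <- dbinom(1, size = 6, prob = p_below)
by_hand <- choose(6, 1) * p_below^1 * (1 - p_below)^5

c(p_one, by_hand)   # both 0.09375
```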

Describe the sampling distribution for the mean sugar content of 6 servings \(\bar{X}\). (Give shape, mean, and standard deviation or variance, if possible)

1.3/sqrt(6)

## [1] 0.5307228

Because X is normal, \(\bar{X}\) is also normal, with mean 13 g and standard deviation \(1.3/\sqrt{6} \approx 0.53\) g.
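
A simulation can make the sampling distribution of \(\bar{X}\) concrete. This is a sketch only: the seed and the number of replications are arbitrary choices, not part of the exercise.

```r
set.seed(371)   # arbitrary seed, for reproducibility only

# 10,000 simulated sample means, each from 6 iid servings
xbar <- replicate(10000, mean(rnorm(6, mean = 13, sd = 1.3)))

# Simulated center and spread next to the theoretical SD sigma / sqrt(n)
c(sim_mean = mean(xbar), sim_sd = sd(xbar), theory_sd = 1.3 / sqrt(6))
```

The simulated mean should land near 13 and the simulated standard deviation near \(1.3/\sqrt{6}\).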

What is the interquartile range of the sampling distribution for the sample mean \(\bar{X}\) when n=6? Is that value larger or smaller than the IQR implied in part b? Why do the relative sizes of the IQRs make sense?

qnorm(0.75,13.0,1.3/sqrt(6))-qnorm(0.25,13.0,1.3/sqrt(6))

## [1] 0.7159341

The IQR of \(\bar{X}\) is about 0.72 g, which is smaller than the 1.754 g IQR in part b. This makes sense because averaging 6 servings divides the standard deviation by \(\sqrt{6}\), so sample means vary less than individual servings do.
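
The relationship between the two IQRs can be checked numerically: for normal distributions with the same mean, the ratio of IQRs equals the ratio of standard deviations. A sketch with illustrative variable names:

```r
sigma <- 1.3
n     <- 6
se    <- sigma / sqrt(n)   # SD of the sample mean

iqr_single <- qnorm(0.75, 13, sigma) - qnorm(0.25, 13, sigma)   # one serving
iqr_mean   <- qnorm(0.75, 13, se)    - qnorm(0.25, 13, se)      # mean of 6

c(iqr_single = iqr_single, iqr_mean = iqr_mean)
iqr_single / iqr_mean   # equals sqrt(6)
```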

What is the probability that the mean sugar content in 6 servings is more than 13.3 g ?

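1-pnorm(13.3,13.0,1.3/sqrt(6))

The probability is about 0.29, using the standard deviation of the sample mean, \(1.3/\sqrt{6}\), in place of 1.3.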

Is it more or less likely that the mean sugar content is above 13.3 g in 10 servings than in 6 servings (as computed in f)? Can you explain it without actually computing the new probability?

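It is less likely with 10 servings. The standard deviation of the sample mean, \(1.3/\sqrt{n}\), is smaller for n=10 than for n=6, so the mean of 10 servings is more tightly concentrated around 13 g and is less likely to fall above 13.3 g.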
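
Although the question can be answered without computing, a quick calculation shows how the tail probability depends on the sample size. The helper `p_above` below is a hypothetical function written for this illustration, not something defined in the assignment.

```r
# P(sample mean of n servings exceeds the cutoff), with X ~ N(mu, sigma^2)
p_above <- function(n, cutoff = 13.3, mu = 13, sigma = 1.3) {
  1 - pnorm(cutoff, mean = mu, sd = sigma / sqrt(n))
}

p6  <- p_above(6)
p10 <- p_above(10)
c(n6 = p6, n10 = p10)   # the n = 10 probability is the smaller one
```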

Suppose each cereal box of this type contains 10 servings and consider the total sugar content in each box as a sum of 10 iid random draws from \(X \sim N(13, 1.3^2)\). If you were to eat a whole box of cereal, above what total sugar content would you consume with 95% probability? Show and briefly explain your calculations.

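qnorm(.05,10*13,sqrt(10)*1.3)

## [1] 123.2381

The total of 10 independent servings is normal with mean 10(13)=130 g and standard deviation \(\sqrt{10}(1.3) \approx 4.11\) g. Its 5th percentile is about 123.24 g, so with 95% probability you would consume more than 123.24 g of sugar.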
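
The total for a box is a sum of 10 iid normal servings, so its mean is \(10\mu\) and its standard deviation is \(\sqrt{10}\sigma\). The sketch below computes the 5th percentile of that total and checks it by simulation; the seed and the number of simulated boxes are arbitrary choices.

```r
set.seed(371)                                 # arbitrary seed

mu_total <- 10 * 13                           # mean of the sum: 130 g
sd_total <- sqrt(10) * 1.3                    # SD of the sum: about 4.11 g
cutoff   <- qnorm(0.05, mu_total, sd_total)   # 5th percentile of the total

# Simulate 20,000 boxes, each the sum of 10 iid servings
totals <- replicate(20000, sum(rnorm(10, mean = 13, sd = 1.3)))

cutoff                  # about 123.24 g
mean(totals > cutoff)   # share of boxes above the cutoff, close to 0.95
```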

Exercise 4: You will be comparing the sampling distributions for two different estimators of \(\sigma\), the population standard deviation.

When trying to estimate the standard deviation of a population (\(\sigma\)) from a sample we could use:

The graphs below give the sampling distributions produced by these estimators when drawing a sample of size 8 from a normal population with mean \(\mu_x=3\) and standard deviation \(\sigma_X=5\).

What do you notice about the mean of the standard deviations produced using the \(s_1\) estimator compared to the \(s_2\) estimator compared to the true population standard deviation \(\sigma_X=5\)? Why do we prefer to use the \(s_1\) formulation when we have a sample of data and are interested in estimating the population standard deviation? (You should use the resulting histograms to help you answer the question and use the word “bias”.)

The mean of the \(s_2\) estimates lies to the left of the mean of the \(s_1\) estimates, so \(s_2\) falls further below the true population standard deviation \(\sigma_X=5\) on average. We prefer \(s_1\) because it has less bias: its sampling distribution is centered closer to the true value, so on average it gives a more accurate estimate of \(\sigma_X\).
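
The comparison can be reproduced with a simulation. The formulas for the two estimators are not shown in this document, so the sketch below ASSUMES that \(s_1\) divides the sum of squared deviations by \(n-1\) and \(s_2\) divides by \(n\); the seed and number of replications are also arbitrary.

```r
set.seed(371)   # arbitrary seed
n <- 8; mu <- 3; sigma <- 5

# Each replication draws one sample and returns both estimates.
# ASSUMPTION: s1 uses the n - 1 denominator, s2 uses the n denominator.
sims <- replicate(20000, {
  x  <- rnorm(n, mean = mu, sd = sigma)
  ss <- sum((x - mean(x))^2)
  c(s1 = sqrt(ss / (n - 1)), s2 = sqrt(ss / n))
})

est_means <- rowMeans(sims)   # average estimate for each estimator
est_means
est_means - sigma             # estimated bias of each estimator
```

Under that assumption, both averages should fall below 5, with the \(s_2\) average further from 5 than the \(s_1\) average.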

Gweneth Childs
