* Submit your homework to Canvas by the due date and time. Email your instructor if you have extenuating circumstances and need to request an extension.
* If an exercise asks you to use R, include a copy of the code and output. Please edit your code and output to be only the relevant portions.
* If a problem does not specify how to compute the answer, you may use any appropriate method. I may ask you to use R or manual calculations on your exams, so practice accordingly.
* You must include an explanation and/or intermediate calculations for an exercise to be complete.
* Be sure to submit the HWK4 Autograde Quiz, which will give you ~20 of your 40 accuracy points.
* 50 points total: 40 points accuracy and 10 points completion.
Exercise 1: A chemical supply company ships a certain solvent in 10-gallon drums. Let X represent the number of drums ordered by a randomly chosen customer. Assume X has the probability mass function (pmf) below, with mean \(\mu_X=2.3\) and variance \(\sigma^2_X=1.81\):
| X | P(X=x) |
|---|---|
| 1 | 0.4 |
| 2 | 0.2 |
| 3 | 0.2 |
| 4 | 0.1 |
| 5 | 0.1 |
- Calculate \(P(X \le 2)\) and describe what it means in the context of the problem.
I added the probabilities of X=1 and X=2: 0.4+0.2=0.6. This means there is a 0.6 probability that a randomly chosen customer orders 2 or fewer drums.
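As a quick check, the pmf table can be entered in R to confirm the cumulative probability along with the stated mean and variance. This is only a minimal sketch; the names `x`, `p`, `p_le_2`, `mu_x`, and `var_x` are illustrative and not part of the assignment.

```r
# pmf of X from the table above (x and p are illustrative names)
x <- 1:5
p <- c(0.4, 0.2, 0.2, 0.1, 0.1)

# P(X <= 2): add the probabilities of every value at or below 2
p_le_2 <- sum(p[x <= 2])

# Mean and variance computed directly from the pmf
mu_x  <- sum(x * p)
var_x <- sum((x - mu_x)^2 * p)

print(c(p_le_2 = p_le_2, mean = mu_x, variance = var_x))
```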
- Let Y be the number of gallons ordered, so \(Y=10X\). Find the probability mass function of Y.
| Y | P(Y=y) |
|---|---|
| 10 | 0.4 |
| 20 | 0.2 |
| 30 | 0.2 |
| 40 | 0.1 |
| 50 | 0.1 |
- Calculate the mean number of gallons ordered \(\mu_Y\).
2.3*10
## [1] 23
- Calculate the standard deviation of the number of gallons ordered, \(\sigma_Y\).
sqrt(1.81)*10
## [1] 13.45362
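The two answers above rely on the rules \(\mu_Y = 10\mu_X\) and \(\sigma_Y = 10\sigma_X\) for \(Y=10X\). As a hedged cross-check, the same values can be computed directly from the pmf of Y; the variable names here are illustrative.

```r
# pmf of Y = 10X, taken from the table for Y (names are illustrative)
y <- c(10, 20, 30, 40, 50)
p <- c(0.4, 0.2, 0.2, 0.1, 0.1)

# Mean and standard deviation computed from the pmf of Y itself
mu_y <- sum(y * p)
sd_y <- sqrt(sum((y - mu_y)^2 * p))

# Both should agree with the shortcut values 10 * 2.3 and 10 * sqrt(1.81)
print(c(mean = mu_y, sd = sd_y))
```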
Exercise 2: Weights of female cats of a certain breed (A) are well approximated by a normal distribution with mean 4.1 kg and standard deviation of 0.6 kg \(W_A~\sim N(4.1, 0.6^2)\).
- What proportion of female cats of that breed (A) have weights between 3.7 and 4.4 kg?
pnorm(4.4,mean=4.1,sd=0.6)-pnorm(3.7,mean=4.1,sd=0.6)
## [1] 0.4389699
- A female cat of that breed (A) has a weight that is 0.5 standard deviations above the mean. What proportion of female cats of that breed (A) are heavier than this one?
1-pnorm(4.4,mean=4.1,sd=0.6)
## [1] 0.3085375
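The cutoff of 4.4 kg used above comes from \(4.1 + 0.5 \times 0.6\). A short sketch, with illustrative variable names, shows that working on the standard normal scale with z = 0.5 should give the same proportion.

```r
mu    <- 4.1   # breed A mean weight in kg
sigma <- 0.6   # breed A standard deviation in kg

# Weight that sits 0.5 standard deviations above the mean
w <- mu + 0.5 * sigma

# Proportion heavier, on the original scale and on the z scale
p_raw <- pnorm(w, mean = mu, sd = sigma, lower.tail = FALSE)
p_z   <- pnorm(0.5, lower.tail = FALSE)

print(c(weight = w, p_raw = p_raw, p_z = p_z))
```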
- How heavy is a female cat of this breed whose weight is on the 80th percentile?
qnorm(0.80,4.1,0.6)
## [1] 4.604973
- What is the IQR of weights for female cats of this breed using the normal distribution approximation?
qnorm(0.75,4.1,0.6)-qnorm(0.25,4.1,0.6)
## [1] 0.8093877
- Females from another breed of cats (breed B) have weights well approximated by a normal distribution with mean 10.6 lb and standard deviation of 0.9 lb \(W_{B.lb}~\sim N(10.6, 0.9^2)\). Transform the weights of cat breed B into kilograms using the conversion: 1 lb \(\approx\) 0.454 kg. You can use the transformation: \(W_{B}=0.454*W_{B.lb}\). Compare the shape, center, and spread of the two breeds.
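Both distributions are normal, so they have the same shape, but the center and spread differ. In kilograms, breed B has mean \(0.454 \times 10.6 \approx 4.81\) kg and standard deviation \(0.454 \times 0.9 \approx 0.41\) kg. Breed B therefore has a higher mean and median weight than breed A (by about 0.7 kg) and a lower standard deviation (about 0.41 kg versus 0.6 kg), so its weights are less spread out.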
10.6*0.454
## [1] 4.8124
0.9*0.454
## [1] 0.4086
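To put the two breeds side by side, the converted parameters can be collected in a small table. This is a sketch only; `lb_to_kg`, `mean_b`, `sd_b`, and `comparison` are illustrative names.

```r
lb_to_kg <- 0.454            # conversion factor given in the exercise

# A linear rescaling multiplies both the mean and the sd by the factor
mean_b <- lb_to_kg * 10.6
sd_b   <- lb_to_kg * 0.9

# Side-by-side comparison of center and spread in kilograms
comparison <- data.frame(
  breed   = c("A", "B"),
  mean_kg = c(4.1, mean_b),
  sd_kg   = c(0.6, sd_b)
)
print(comparison)
```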
Exercise 3: A serving of breakfast cereal has a sugar content that is well approximated by a normally distributed random variable X with mean 13 g and variance \(1.3^2\ g^2\). We can consider each serving as an independent and identical draw from X.
- In what percent of servings will the sugar content be above 13.3 g?
pnorm(13.3,13,1.3, lower.tail = FALSE)
## [1] 0.408747
- What is the probability that a randomly chosen serving will have a sugar content between 12.123 g and 13.877 g? What do we call the difference: 13.877-12.123=1.754?
pnorm(13.877,13,1.3)-pnorm(12.123,13,1.3)
## [1] 0.5000798
This difference is called the interquartile range.
- Calculate the probability that in 6 servings, only 1 has a sugar content below 13 g.
pnorm(13,13,1.3)
## [1] 0.5
dbinom(1,6,0.5)
## [1] 0.09375
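The `dbinom` result can also be reproduced from the binomial formula \(\binom{6}{1} p^1 (1-p)^5\) with \(p = P(X < 13)\). The sketch below uses illustrative names and assumes, as the exercise states, that the 6 servings are independent.

```r
# Probability that one serving has sugar content below 13 g
p_below <- pnorm(13, mean = 13, sd = 1.3)

# Exactly 1 of 6 servings below 13 g, written out with the binomial formula
by_formula <- choose(6, 1) * p_below^1 * (1 - p_below)^5

# The same probability from the built-in binomial pmf
by_dbinom <- dbinom(1, size = 6, prob = p_below)

print(c(by_formula = by_formula, by_dbinom = by_dbinom))
```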
pnorm gives the probability that a single serving has a sugar content below 13 g, and dbinom then gives the probability that exactly 1 of 6 independent servings has a sugar content below 13 g.
- Describe the sampling distribution for the mean sugar content of 6 servings \(\bar{X}\).
sqrt(1.3^2)
## [1] 1.3
1.3/sqrt(6)
## [1] 0.5307228
The standard deviation of the sample mean is \(1.3/\sqrt{6} \approx 0.5307\) g. The sampling distribution of the mean sugar content of 6 servings, \(\bar{X}\), is therefore normal with mean 13 g (the same as the population mean) and standard deviation 0.5307 g. It is normal even with n=6 because the question states that the population itself is well approximated by a normal distribution.
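A simulation offers a rough check on this description of the sampling distribution. The sketch below is illustrative only: the seed, the number of replications, and the variable names are arbitrary choices, and the simulated values will only approximately match the theoretical ones.

```r
set.seed(1)          # arbitrary seed so the run is reproducible
n_reps <- 100000     # number of simulated samples (arbitrary choice)
n      <- 6          # servings per sample

# Each row is one sample of 6 servings drawn from N(13, 1.3^2)
servings <- matrix(rnorm(n * n_reps, mean = 13, sd = 1.3), ncol = n)
xbar     <- rowMeans(servings)

# Compare simulated center and spread with 13 and 1.3 / sqrt(6)
print(c(sim_mean = mean(xbar), sim_sd = sd(xbar), theory_sd = 1.3 / sqrt(n)))
```

A histogram of `xbar` (for example with `hist(xbar)`) should look bell-shaped and centered near 13 g.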
- What is the interquartile range of the sampling distribution for the sample mean \(\bar{X}\) when n=6? Is that value larger or smaller than the IQR implied in part b? Why do the relative sizes of the IQRs make sense?
qnorm(0.75,13,0.5307)-qnorm(0.25,13,0.5307)
## [1] 0.7159034
This value is smaller than the IQR in part b (1.754) because the standard deviation of the sample mean (0.5307) is smaller than the standard deviation of a single serving (1.3). A smaller standard deviation means the values are less spread out, so the IQR is smaller. Part b describes a single serving (n=1), while this part describes the mean of 6 servings. Averaging over a larger sample reduces variability, so sample means cluster more tightly around 13 g.
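Because the IQR of a normal distribution is proportional to its standard deviation, the two IQRs should differ by a factor of \(\sqrt{6}\). A minimal sketch with illustrative names:

```r
# IQR for a single serving, X ~ N(13, 1.3^2)
iqr_single <- qnorm(0.75, 13, 1.3) - qnorm(0.25, 13, 1.3)

# IQR for the mean of 6 servings, with sd 1.3 / sqrt(6)
sd_mean6  <- 1.3 / sqrt(6)
iqr_mean6 <- qnorm(0.75, 13, sd_mean6) - qnorm(0.25, 13, sd_mean6)

# The ratio of the two IQRs should equal sqrt(6)
ratio <- iqr_single / iqr_mean6
print(c(iqr_single = iqr_single, iqr_mean6 = iqr_mean6, ratio = ratio))
```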
- What is the probability that the mean sugar content in 6 servings is more than 13.3 g ?
pnorm(13.3,13,0.5307,lower.tail = FALSE)
## [1] 0.2859379
- Is it more or less likely that the mean sugar content is above 13.3 g in 10 servings or 6 servings (as computed in f)? Can you explain it without actually computing the new probability?
It is less likely that the mean sugar content is above 13.3 g in 10 servings. The mean of the sampling distribution stays at 13 g, but its standard deviation shrinks from \(1.3/\sqrt{6}\) to \(1.3/\sqrt{10}\) as the sample size grows. With less variability, sample means concentrate closer to 13 g, so a smaller share of them fall above 13.3 g.
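Although the question can be answered without computing, the comparison is easy to confirm numerically. In this sketch `p_above` is a hypothetical helper written for illustration, not a built-in R function.

```r
# Hypothetical helper: P(sample mean > 13.3) for a sample of n servings
p_above <- function(n) {
  pnorm(13.3, mean = 13, sd = 1.3 / sqrt(n), lower.tail = FALSE)
}

p6  <- p_above(6)    # mean of 6 servings
p10 <- p_above(10)   # mean of 10 servings

# The probability for n = 10 should be the smaller of the two
print(c(n6 = p6, n10 = p10))
```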
- Suppose each cereal box of this type contains 10 servings and consider the total sugar content in each box as a sum of 10 iid random draws from \(X \sim N(13, 1.3^2)\). If you were to eat a whole box of cereal, above what total sugar content would you consume with 95% probability? Show and briefly explain your calculations.
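qnorm(.05,130,sd=1.3*sqrt(10))
## [1] 123.2381
With 95% probability the total sugar content of a box is above about 123.24 g, which is the 5th percentile of the distribution of the total. The mean of the total is 130 g because we multiply 13 g by 10, the number of servings. Because the servings are independent, the variances add, so the variance of the total is \(10 \times 1.3^2\) and the standard deviation is \(1.3\sqrt{10} \approx 4.11\) g. Dividing 1.3 by \(\sqrt{10}\) would instead give the standard deviation of the mean of 10 servings, not of their total.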
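For a sum of independent draws the variances add, so the total of 10 servings has variance \(10 \times 1.3^2\) and standard deviation \(1.3\sqrt{10}\). A minimal sketch of the calculation under that assumption, with illustrative names:

```r
n <- 10                       # servings per box

# Distribution of the total: mean n * 13, variance n * 1.3^2
mean_total <- n * 13
sd_total   <- sqrt(n) * 1.3   # square root of the summed variances

# 5th percentile: the total exceeds this value with probability 0.95
cutoff <- qnorm(0.05, mean = mean_total, sd = sd_total)

# Confirm that the probability of exceeding the cutoff is 0.95
p_exceed <- pnorm(cutoff, mean = mean_total, sd = sd_total, lower.tail = FALSE)
print(c(sd_total = sd_total, cutoff = cutoff, p_exceed = p_exceed))
```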
Exercise 4: You will be comparing the sampling distributions for two different estimators of \(\sigma\), the population standard deviation.
When trying to estimate the standard deviation of a population (\(\sigma\)) from a sample we could use:
The graphs below give the sampling distributions produced by these estimators when drawing a sample of size 8 from a normal population with mean \(\mu_X=3\) and standard deviation \(\sigma_X=5\).
- What do you notice about the mean of the standard deviations produced using the \(s_1\) estimator compared to the \(s_2\) estimator compared to the true population standard deviation? Why do we prefer to use the \(s_1\) formulation when we have a sample of data and are interested in estimating the population standard deviation? (You should use the resulting histograms to help you answer the question and use the word “bias”.)
The s1 sampling distribution looks closer to a normal distribution than the s2 one. Bias is the difference between the expected value (mean) of an estimator's sampling distribution and the parameter it estimates, here the true population standard deviation \(\sigma_X=5\). The mean of the s1 values is around 5, possibly 4.8, which is about 0.2 below the true standard deviation. The mean of the s2 values is about 4.5, which is about 0.5 below the true standard deviation. Both estimators tend to underestimate \(\sigma\), but s1 has the smaller difference and therefore less bias, so we prefer s1 when estimating the population standard deviation from a sample.
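The histograms can be approximated with a simulation. The formulas for the two estimators are not reproduced in this text, so the sketch below assumes that \(s_1\) divides the sum of squared deviations by \(n-1\) (what R's `sd()` does) and that \(s_2\) divides by \(n\). If the assignment defines them differently, the code would need to change. The seed and replication count are arbitrary.

```r
set.seed(1)          # arbitrary seed so the run is reproducible
n      <- 8          # sample size from the exercise
n_reps <- 100000     # number of simulated samples (arbitrary choice)

# Each row is one sample of size 8 from N(3, 5^2)
samples <- matrix(rnorm(n * n_reps, mean = 3, sd = 5), ncol = n)

# ASSUMPTION: s1 uses the n - 1 divisor, s2 uses the n divisor
s1 <- apply(samples, 1, sd)
s2 <- s1 * sqrt((n - 1) / n)

# Estimated bias: mean of each estimator minus the true sigma of 5
print(c(mean_s1 = mean(s1), bias_s1 = mean(s1) - 5,
        mean_s2 = mean(s2), bias_s2 = mean(s2) - 5))
```

Under these assumptions both means should fall below 5, with the mean of `s1` closer to 5 than the mean of `s2`.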