Set up workspace

Exercise 4.4

(a) What is the point estimate for the average height of active individuals? What about the median?

Point estimate of the mean: 171.1 cm

Point estimate of the median: 170.3 cm

(b) What is the point estimate for the standard deviation of the heights of active individuals? What about the IQR?

Point estimate of sd: 9.4 cm

Point estimate of IQR: 177.8 - 163.8 = 14 cm

(c) Is a person who is 1m 80cm (180 cm) tall considered unusually tall? And is a person who is 1m 55cm (155cm) considered unusually short? Explain your reasoning.

(180 - 171.1)/9.4
## [1] 0.9468085
(155 - 171.1)/9.4
## [1] -1.712766

Both are within 2 SD of the mean, so neither would be considered “unusual” by that statistical measure.

(d) The researchers take another random sample of physically active individuals. Would you expect the mean and the standard deviation of this new sample to be the ones given above? Explain your reasoning.

We would expect them to be similar but not identical, because point estimates vary from sample to sample. For instance, the new sample mean would be expected to fall within about 2 standard errors of the mean given above.

(e) The sample means obtained are point estimates for the mean height of all active individuals, if the sample of individuals is equivalent to a simple random sample. What measure do we use to quantify the variability of such an estimate? Compute this quantity using the data from the original sample under the condition that the data are a simple random sample.

We would use the Standard Error: SD/sqrt(n) = 0.417

9.4/(sqrt(507))
## [1] 0.4174687

Exercise 4.14

(a) We are 95% confident that the average spending of these 436 American adults is between $80.31 and $89.11.

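FALSE - A confidence interval is a statement about the population mean, not the sample mean. The sample mean is known exactly, so we are 100% confident that it lies between these two numbers.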

(b) This confidence interval is not valid since the distribution of spending in the sample is right skewed.

FALSE - since the number of observations is 436, even though the data is slightly skewed, we can still assume an approximately normal distribution for sample means from this population.

(c) 95% of random samples have a sample mean between $80.31 and $89.11.

FALSE - The CI is an estimate of the population mean. It does not describe where the means of other random samples will fall.

(d) We are 95% confident that the average spending of all American adults is between $80.31 and $89.11.

TRUE

(e) A 90% confidence interval would be narrower than the 95% confidence interval since we don’t need to be as sure about our estimate.

TRUE - a lower confidence level uses a smaller critical value (about 1.65 instead of 1.96), so the interval would be narrower.

(f) In order to decrease the margin of error of a 95% confidence interval to a third of what it is now, we would need to use a sample 3 times larger.

FALSE - the margin of error is proportional to 1/sqrt(n), so cutting it to a third requires a sample 9 times larger, not 3 times larger.
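The 1/sqrt(n) scaling can be checked numerically. This is a minimal sketch: the helper name `margin_of_error` and the standard deviation of 50 are illustrative assumptions (the ratio does not depend on either), while n = 436 comes from the exercise.

```r
# Margin of error of a z-based confidence interval (hypothetical helper).
margin_of_error <- function(sd, n, z = qnorm(0.975)) {
  z * sd / sqrt(n)
}

sd_assumed <- 50   # illustrative value only; not given in the exercise
n <- 436           # sample size from the exercise

# Ratio of the new margin of error to the original one
ratio_3x <- margin_of_error(sd_assumed, 3 * n) / margin_of_error(sd_assumed, n)
ratio_9x <- margin_of_error(sd_assumed, 9 * n) / margin_of_error(sd_assumed, n)

ratio_3x   # equals 1/sqrt(3): tripling n is not enough
ratio_9x   # equals 1/3: nine times the sample is needed
```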

(g) The margin of error is 4.4.

TRUE - the margin of error is half the width of the CI: (89.11 - 80.31)/2 = 4.4
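As a quick check, the point estimate and the margin of error can both be recovered from the interval endpoints. The variable names below are illustrative; the endpoints are the ones quoted in the exercise.

```r
lower <- 80.31
upper <- 89.11

point_estimate <- (lower + upper) / 2   # midpoint of the interval
margin <- (upper - lower) / 2           # half the interval width

point_estimate
margin
```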

Exercise 4.24

(a) Are conditions for inference satisfied?

Yes:

Independence - the sample is less than 10% of the total population.

Skew - the sample does not appear to contain significant outliers that would indicate a strongly skewed underlying population.

Sample size - n = 36, which is greater than 30.

(b) Suppose you read online that children first count to 10 successfully when they are 32 months old, on average. Perform a hypothesis test to evaluate if these data provide convincing evidence that the average age at which gifted children first count to 10 successfully is less than the general average of 32 months. Use a significance level of 0.10.

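H(null): the average age at which gifted children first count to 10 is 32 months. H(alt): the average age is less than 32 months. This is a one-sided test.

(30.69 - 32)/(4.31/sqrt(36))
## [1] -1.823666
qnorm(0.10)
## [1] -1.281552

The test statistic is below the critical value of -1.28 for a one-sided test at the 0.10 significance level, so we reject the null hypothesis. The data provide convincing evidence that gifted children first count to 10 at a younger average age than 32 months.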

(c) Interpret the p-value in context of the hypothesis test and the data.

2*(1 - pnorm(-(30.69 - 32)/(4.31/sqrt(36))))
## [1] 0.0682026

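The two-sided p-value is about 0.068. For the one-sided alternative (average age less than 32 months) it is half of that, about 0.034. If the true average were 32 months, a sample of 36 children would have a mean of 30.69 months or lower only about 3.4% of the time. Either value is below the 0.10 significance level, so we reject the null hypothesis.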

(d) Calculate a 90% confidence interval for the average age at which gifted children first count to 10 successfully.

pnorm(1.65)
## [1] 0.9505285
30.69 - 1.65*(4.31/sqrt(36))
## [1] 29.50475
30.69 + 1.65*(4.31/sqrt(36))
## [1] 31.87525

(e) Do your results from the hypothesis test and the confidence interval agree? Explain

Yes. The 90% confidence interval (29.50, 31.88) does not include 32, which agrees with rejecting the null hypothesis at the 0.10 significance level.
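The test and the interval can be compared side by side from the same summary statistics. This is a sketch under the normal approximation; the variable names are illustrative, and it uses `qnorm(0.95)` rather than the rounded 1.65, so the endpoints differ slightly in the third decimal place.

```r
x_bar <- 30.69   # sample mean (months)
s     <- 4.31    # sample standard deviation
n     <- 36      # sample size
mu0   <- 32      # null value

se <- s / sqrt(n)
z  <- (x_bar - mu0) / se

# One-sided p-value for H(alt): mean less than 32
p_one_sided <- pnorm(z)

# 90% confidence interval for the mean
ci_90 <- x_bar + c(-1, 1) * qnorm(0.95) * se

reject_null <- p_one_sided < 0.10   # decision at the 0.10 level
excludes_32 <- ci_90[2] < mu0       # does the interval lie entirely below 32?
```

Both `reject_null` and `excludes_32` come out `TRUE`, so the two approaches point the same way.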

Exercise 4.26

(a) Perform a hypothesis test to evaluate if these data provide convincing evidence that the average IQ of mothers of gifted children is different than the average IQ for the population at large, which is 100. Use a significance level of 0.10.

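z <- (118.2 - 100)/(6.5/sqrt(36))
z
## [1] 16.8
2*(1 - pnorm(z))
## [1] 0

H(null): the average IQ of mothers of gifted children is 100. H(alt): the average IQ is different from 100. The Z score is 16.8, so the two-sided p-value is essentially 0, far below 0.10. We reject the null hypothesis of no difference from the general population.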

(b) Calculate a 90% confidence interval for the average IQ of mothers of gifted children.

118.2 - 1.65*(6.5/sqrt(36))
## [1] 116.4125
118.2 + 1.65*(6.5/sqrt(36))
## [1] 119.9875

(c) Do your results from the hypothesis test and the confidence interval agree? Explain.

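Yes. The 90% CI (116.41, 119.99) does not include 100 by a long shot, which agrees with rejecting the null hypothesis at the 0.10 significance level. We expect 90% of CIs constructed this way to include the true mean of the population.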

Exercise 4.34

The sampling distribution of the mean is the distribution of the means of repeated samples of the same size taken from a population. We expect this distribution to be approximately normal when the observations are independent and either the sample is large or, for small samples, the underlying population distribution is not highly skewed. As the sample size increases, the standard error of the distribution decreases (the bell narrows).
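A small simulation illustrates the narrowing. This is only a sketch: the exponential population (mean 1, standard deviation 1), the sample sizes, the number of repetitions, and the seed are all assumptions chosen for illustration, not part of the exercise.

```r
set.seed(1)  # arbitrary seed so the run is reproducible

# Draw `reps` samples of size n from a right-skewed population
# and return the mean of each sample.
sample_means <- function(n, reps = 5000) {
  replicate(reps, mean(rexp(n, rate = 1)))
}

means_10  <- sample_means(10)
means_100 <- sample_means(100)

# Spread of the simulated sampling distributions
se_10  <- sd(means_10)    # theory: 1/sqrt(10), about 0.316
se_100 <- sd(means_100)   # theory: 1/sqrt(100) = 0.1
```

Plotting `hist(means_10)` next to `hist(means_100)` should show the larger-sample histogram being both narrower and more symmetric.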

Exercise 4.40

(a) What is the probability that a randomly chosen light bulb lasts more than 10,500 hours?

1 - pnorm((10500 - 9000)/1000)
## [1] 0.0668072

(b) Describe the distribution of the mean lifespan of 15 light bulbs.

1000/sqrt(15)
## [1] 258.1989

N(9000, 258.1989)

(c) What is the probability that the mean lifespan of 15 randomly chosen light bulbs is more than 10,500 hours?

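(10500 - 9000)/(1000/sqrt(15))
## [1] 5.809475

A mean of 10,500 hours is about 5.8 standard errors above 9,000, so the probability 1 - pnorm(5.809475) is essentially 0 (on the order of 1e-9).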
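For a tail this far out, the upper-tail probability can be requested directly from `pnorm` with `lower.tail = FALSE`. A minimal sketch using the values from the exercise; the variable names are illustrative.

```r
mu    <- 9000   # mean lifespan (hours)
sigma <- 1000   # standard deviation of a single bulb
n     <- 15     # bulbs in the sample

se <- sigma / sqrt(n)      # standard error of the sample mean
z  <- (10500 - mu) / se    # standardised distance from the mean

# Upper-tail probability, computed two equivalent ways
p_tail   <- pnorm(z, lower.tail = FALSE)
p_direct <- pnorm(10500, mean = mu, sd = se, lower.tail = FALSE)
```

Note that the whole standardised value goes inside `pnorm()`; dividing the result of `pnorm()` by the standard error does not give a probability.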

(e) Could you estimate the probabilities from parts (a) and (c) if the lifespans of light bulbs had a skewed distribution?

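No. Part (a) requires the lifespans themselves to be approximately normal. Part (c) requires the sample mean to be approximately normal, and a sample of 15 (fewer than 30) is too small to rely on the Central Limit Theorem when the population is skewed.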

Exercise 4.48

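The p-value will decrease. The standard error is inversely proportional to the square root of the sample size, so for the same observed difference a larger sample gives a larger test statistic and therefore a smaller p-value.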
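The effect can be seen with made-up numbers. Everything in this sketch is hypothetical (the helper `p_two_sided`, the difference of 1, the standard deviation of 5, and the sample sizes 50 and 500); only the direction of the change matters.

```r
# Two-sided p-value for a z-test, given the observed difference from the
# null value, the standard deviation, and the sample size (hypothetical helper).
p_two_sided <- function(diff, sd, n) {
  z <- diff / (sd / sqrt(n))
  2 * pnorm(-abs(z))
}

p_small_n <- p_two_sided(diff = 1, sd = 5, n = 50)
p_large_n <- p_two_sided(diff = 1, sd = 5, n = 500)

p_small_n   # same difference, smaller sample: larger p-value
p_large_n   # same difference, larger sample: smaller p-value
```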