Untitled

Exercise 4.7

Relaxing after work. The General Social Survey (GSS) is a sociological survey used to collect data on demographic characteristics and attitudes of residents of the United States. In 2010, the survey collected responses from 1,154 US residents. The survey is conducted face-to-face with an in-person interview of a randomly-selected sample of adults. One of the questions on the survey is “After an average work day, about how many hours do you have to relax or pursue activities that you enjoy?” A 95% confidence interval from the 2010 GSS survey is 3.53 to 3.83 hours.

Interpret this interval in the context of the data.

We are 95% confident that the mean number of hours US residents have to relax or pursue activities they enjoy after an average work day is between 3.53 and 3.83 hours.

What does a 95% confidence level mean in this context?

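It means that the method used to build the interval captures the population mean 95% of the time: if we took many random samples and built a 95% confidence interval from each, about 95% of those intervals would contain the true mean number of hours, and about 5% would miss it. It is not the probability that the mean is in this particular interval.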
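The "95% of intervals" reading can be checked with a small simulation. This is only a sketch: the population mean and standard deviation below are hypothetical values chosen for illustration, and the population is assumed normal, which the survey data need not be.

```r
# Simulate repeated sampling from a hypothetical population and count how
# often a 95% confidence interval captures the true mean.
set.seed(1)                 # arbitrary seed, for reproducibility
true_mean <- 3.68           # hypothetical population mean (hours)
true_sd   <- 2.6            # hypothetical population sd (hours)
n         <- 1154           # sample size reported for the 2010 GSS
z_star    <- qnorm(0.975)   # critical value for 95% confidence

captured <- replicate(2000, {
  x  <- rnorm(n, true_mean, true_sd)
  se <- sd(x) / sqrt(n)
  (mean(x) - z_star * se <= true_mean) && (true_mean <= mean(x) + z_star * se)
})

mean(captured)              # proportion of intervals that contain true_mean
```

The proportion should land close to 0.95, but any single interval either contains the true mean or it does not.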

Suppose the researchers think a 90% confidence level would be more appropriate for this interval. Will this new interval be smaller or larger than the 95% confidence interval? Assume the standard deviation has remained constant since 2010.

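The interval will be smaller (narrower). A 90% interval uses a smaller critical value (z* of about 1.645 instead of 1.96), so the margin of error shrinks.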
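As a rough check, the standard error can be backed out of the reported interval and reused at the 90% level. The sketch assumes the reported interval was built with the normal model (z* = 1.96); the variable names are illustrative.

```r
# Back out the standard error from the reported 95% interval, then rebuild
# the interval at 90% confidence.
lower <- 3.53
upper <- 3.83
x_bar <- (lower + upper) / 2                 # midpoint = sample mean
se    <- (upper - lower) / 2 / qnorm(0.975)  # margin of error / z*

ci_90 <- x_bar + c(-1, 1) * qnorm(0.95) * se
ci_90
diff(ci_90) < upper - lower                  # TRUE: the 90% interval is narrower
```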

Exercise 4.12

Thanksgiving spending, Part I. The 2009 holiday retail season, which kicked off on November 27, 2009 (the day after Thanksgiving), had been marked by somewhat lower self-reported consumer spending than was seen during the comparable period in 2008. To get an estimate of consumer spending, 436 randomly sampled American adults were surveyed. Daily consumer spending for the six-day period after Thanksgiving, spanning the Black Friday weekend and Cyber Monday, averaged $84.71. A 95% confidence interval based on this sample is ($80.31, $89.11). Determine whether the following statements are true or false, and explain your reasoning.

We are 95% confident that the average spending of these 436 American adults is between $80.31 and $89.11.

False. The confidence interval estimates the population mean, not the mean of the sample. We already know the sample mean exactly: it is $84.71.

This confidence interval is not valid since the distribution of spending in the sample is right skewed.

False. The sample size (436) is well above 30, so by the Central Limit Theorem the sampling distribution of the sample mean is nearly normal even though the spending data are right skewed, as long as the skew is not extreme.
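A quick simulation illustrates why skew in the data matters little at this sample size. The exponential population below is purely hypothetical (the exercise does not give the shape of the spending distribution), and `skewness` is a small helper defined here, not a base R function.

```r
# Sample means from a right-skewed population are close to symmetric when
# n is large. The exponential population is a hypothetical stand-in.
set.seed(42)
n        <- 436      # sample size from the exercise
pop_mean <- 84.71    # used only to set the scale of the population

sample_means <- replicate(5000, mean(rexp(n, rate = 1 / pop_mean)))

# Simple moment-based skewness
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

skewness(rexp(1e5, rate = 1 / pop_mean))  # individual values: strongly right skewed
skewness(sample_means)                    # sample means: close to 0
```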

95% of such random samples would have a sample mean between $80.31 and $89.11.

False. A confidence interval is a statement about the population mean, not about where other sample means will fall. The correct version is that we are 95% confident that the population mean is in the interval.

We are 95% confident that the average spending of all American adults is between $80.31 and $89.11.

True. This is the correct interpretation of a confidence interval: it estimates the population mean.

A 90% confidence interval would be narrower than the 95% confidence interval since we don’t need to be as sure about capturing the parameter.

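True. Lowering the confidence level from 95% to 90% lowers the critical value from about 1.96 to about 1.645, so the margin of error shrinks and the interval is narrower.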

In order to decrease the margin of error of a 95% confidence interval to a third of what it is now, we would need to use a sample 3 times larger.

False. The margin of error is proportional to 1 / sqrt(n), so cutting it to a third requires a sample 9 times larger.
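The square-root relationship is easy to confirm numerically. In this sketch `margin_of_error` is a helper written for illustration, and the standard deviation is a hypothetical value, since the exercise does not report one.

```r
# Margin of error as a function of sample size, holding sd and confidence fixed.
margin_of_error <- function(s, n, conf = 0.95) {
  z_star <- qnorm(1 - (1 - conf) / 2)
  z_star * s / sqrt(n)
}

s <- 46.9   # hypothetical sample sd in dollars (not given in the exercise)
n <- 436    # sample size from the exercise

margin_of_error(s, 3 * n) / margin_of_error(s, n)   # 1 / sqrt(3), about 0.577
margin_of_error(s, 9 * n) / margin_of_error(s, n)   # 1 / 3
```

The ratios do not depend on the value chosen for `s`, because it cancels.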

The margin of error for the reported interval is 4.4.

True. The margin of error is half the width of the interval: (89.11 - 80.31) / 2 = 4.40.

Exercise 4.14

Age at first marriage, Part I. The National Survey of Family Growth conducted by the Centers for Disease Control gathers information on family life, marriage and divorce, pregnancy, infertility, use of contraception, and men’s and women’s health. One of the variables collected on this survey is the age at first marriage. The histogram below shows the distribution of ages at first marriage of 5,534 randomly sampled women between 2006 and 2010. The average age at first marriage among these women is 23.44 with a standard deviation of 4.72.

Estimate the average age at first marriage of women using a 95% confidence interval, and interpret this interval in context. Discuss any relevant assumptions.

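# 95% confidence interval: sample mean +/- 1.96 * s / sqrt(n)
LB = 23.44 - 1.96 * 4.72 / sqrt(5534)   # lower bound
UB = 23.44 + 1.96 * 4.72 / sqrt(5534)   # upper bound
x = c(LB, UB)
x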

## [1] 23.31564 23.56436

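We are 95% confident that the average age at first marriage of women in the US is between 23.3 and 23.6 years. This assumes that the observations are independent (the women were randomly sampled and make up far less than 10% of the population), that the sample size is greater than 30, and that the distribution of ages is not so strongly skewed that the Central Limit Theorem fails to apply.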

Text Exercises

Exercise 3.8

Portfolio returns. The Capital Asset Pricing Model is a financial model that assumes returns on a portfolio are normally distributed. Suppose a portfolio has an average annual return of 14.7% (i.e. an average gain of 14.7%) with a standard deviation of 33%. A return of 0% means the value of the portfolio doesn’t change, a negative return means that the portfolio loses money, and a positive return means that the portfolio gains money.

What percent of years does this portfolio lose money, i.e. have a return less than 0%?

pnorm(0,14.7,33)

## [1] 0.3279957

33%

What is the cutoff for the highest 15% of annual returns with this portfolio?

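qnorm(0.85,14.7,33)

## [1] 48.9023

The cutoff is a return of about 48.9%: the highest 15% of annual returns are those above the 85th percentile.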
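A cutoff for the top 15% of a normal distribution is its 85th percentile, which is a quantile lookup rather than a tail probability. The sketch below builds the cutoff from the z-score and then checks that the upper-tail area at that cutoff is 15%; the variable names are illustrative.

```r
mu    <- 14.7   # mean annual return (%)
sigma <- 33     # standard deviation of annual returns (%)

# The top 15% starts at the 85th percentile.
z      <- qnorm(0.85)          # z-score with 85% of the area to its left
cutoff <- mu + z * sigma       # same value as qnorm(0.85, mu, sigma)

cutoff
1 - pnorm(cutoff, mu, sigma)   # upper-tail area at the cutoff: 0.15
```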

Exercise 3.36

Sickle cell anemia. Sickle cell anemia is a genetic blood disorder where red blood cells lose their flexibility and assume an abnormal, rigid, “sickle” shape, which results in a risk of various complications. If both parents are carriers of the disease, then a child has a 25% chance of having the disease, 50% chance of being a carrier, and 25% chance of neither having the disease nor being a carrier. If two parents who are carriers of the disease have 3 children, what is the probability that

two will have the disease?

dbinom(2,3,.25)

## [1] 0.140625

14%

none will have the disease?

dbinom(0,3,.25)

## [1] 0.421875

42%
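Both answers follow from the binomial formula, assuming the three children's outcomes are independent. As a sketch, the formula can be written out by hand and compared with `dbinom`; `binom_prob` is a helper defined here for illustration.

```r
# Binomial probability by hand: choose(n, k) * p^k * (1 - p)^(n - k)
binom_prob <- function(k, n, p) choose(n, k) * p^k * (1 - p)^(n - k)

p <- 0.25                    # chance that a child has the disease

binom_prob(2, 3, p)          # 0.140625, same as dbinom(2, 3, .25)
binom_prob(0, 3, p)          # 0.421875, same as dbinom(0, 3, .25)

# The probabilities for 0, 1, 2, and 3 affected children sum to 1.
sum(binom_prob(0:3, 3, p))
```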

Exercise 4.8

Mental health. Another question on the General Social Survey introduced in Exercise 4.7 is “For how many days during the past 30 days was your mental health, which includes stress, depression, and problems with emotions, not good?” Based on responses from 1,151 US residents, the survey reported a 95% confidence interval of 3.40 to 4.24 days in 2010.

Interpret this interval in context of the data.

We are 95% confident that the mean number of days in the past 30 on which US residents' mental health was not good in 2010 is between 3.40 and 4.24 days.

What does a 95% confidence level mean in this context?

It means that if we took many random samples and built a 95% confidence interval from each, about 95% of those intervals would contain the true mean number of days of poor mental health, and about 5% would miss it.

Suppose the researchers think a 99% confidence level would be more appropriate for this interval. Will this new interval be smaller or larger than the 95% confidence interval?

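It will be larger (wider). A 99% interval uses a larger critical value (z* of about 2.576 instead of 1.96), so the margin of error grows.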
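The critical value grows with the confidence level, and the interval width grows with it. The sketch below lists z* for three common levels and rebuilds the interval at 99%, assuming the reported 95% interval came from the normal model.

```r
# Critical values for common confidence levels
conf_levels <- c(0.90, 0.95, 0.99)
z_star <- qnorm(1 - (1 - conf_levels) / 2)
round(z_star, 3)                             # 1.645 1.960 2.576

# Rebuild the reported interval at 99% confidence
lower <- 3.40
upper <- 4.24
x_bar <- (lower + upper) / 2
se    <- (upper - lower) / 2 / qnorm(0.975)  # margin of error / z*

ci_99 <- x_bar + c(-1, 1) * qnorm(0.995) * se
ci_99
diff(ci_99) > upper - lower                  # TRUE: the 99% interval is wider
```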

If a new survey asking the same questions were to be done with 500 Americans, would the standard error of the estimate be larger, smaller, or about the same? Assume the standard deviation has remained constant since 2010.

The standard error will be larger, because the standard deviation is divided by the square root of a smaller sample size (500 instead of 1,151).
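To put a rough number on the change, the standard deviation implied by the 2010 interval can be reused with n = 500. This is a sketch under two assumptions: the 2010 interval was built with z* = 1.96, and the standard deviation is unchanged.

```r
# SE = s / sqrt(n). Back out s from the 2010 interval, then reuse it for n = 500.
n_2010  <- 1151
se_2010 <- (4.24 - 3.40) / 2 / qnorm(0.975)  # margin of error / z*
s       <- se_2010 * sqrt(n_2010)            # implied sd, assumed unchanged

se_500 <- s / sqrt(500)
c(se_2010 = se_2010, se_500 = se_500)
se_500 / se_2010                             # sqrt(1151 / 500), about 1.52
```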

Exercise 4.9

Width of a confidence interval. Earlier in Chapter 4, we calculated the 99% confidence interval for the average age of runners in the 2012 Cherry Blossom Run as (32.7, 37.4) based on a sample of 100 runners. How could we decrease the width of this interval without losing confidence?

If we increase the sample size, the standard error will be smaller, which makes the interval narrower at the same confidence level.
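The effect of sample size on width can be tabulated. The sketch below backs out the standard deviation implied by the reported interval (assuming the normal model with z* for 99% confidence) and holds it fixed; the larger sample sizes are hypothetical.

```r
# Width of a 99% interval for several sample sizes, with sd held fixed.
lower <- 32.7
upper <- 37.4
n0    <- 100                                  # sample size behind the reported interval
z99   <- qnorm(0.995)
s     <- (upper - lower) / 2 / z99 * sqrt(n0) # implied sd, about 9.1 years

n     <- c(100, 200, 400, 1600)               # hypothetical sample sizes
width <- 2 * z99 * s / sqrt(n)
data.frame(n = n, width = round(width, 2))
```

Quadrupling the sample size halves the width, so gains in precision get more expensive as n grows.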

Exercise 4.10

Confidence levels. If a higher confidence level means that we are more confident about the number we are reporting, why don’t we always report a confidence interval with the highest possible confidence level?

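Because a higher confidence level produces a wider interval. We have to balance confidence against precision: a very wide interval is almost certain to contain the parameter, but it tells us little about the parameter's value.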

Exercise 4.11

Waiting at an ER, Part I. A hospital administrator hoping to improve wait times decides to estimate the average emergency room waiting time at her hospital. She collects a simple random sample of 64 patients and determines the time (in minutes) between when they checked in to the ER until they were first seen by a doctor. A 95% confidence interval based on this sample is (128 minutes, 147 minutes), which is based on the normal model for the mean. Determine whether the following statements are true or false, and explain your reasoning for those statements you identify as false.

This confidence interval is not valid since we do not know if the population distribution of the ER wait times is nearly normal.

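False. The confidence interval is valid because the sample is a simple random sample of more than 30 patients. With a sample this large, the sampling distribution of the sample mean is nearly normal even if the population distribution is not, provided the data are not strongly skewed.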

We are 95% confident that the average waiting time of these 64 emergency room patients is between 128 and 147 minutes.

False. The confidence interval estimates the population mean, not the mean of the sample, which we already know exactly.

We are 95% confident that the average waiting time of all patients at this hospital’s emergency room is between 128 and 147 minutes.

True.

95% of such random samples would have a sample mean between 128 and 147 minutes.

False. The confidence interval is a statement about the population mean, not about where other sample means will fall.

A 99% confidence interval would be narrower than the 95% confidence interval since we need to be more sure of our estimate.

False. It would be wider: to be more sure of capturing the population mean, the interval has to cover a larger range of values.

The margin of error is 9.5 and the sample mean is 137.5.

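True. The sample mean is the midpoint of the interval, (128 + 147) / 2 = 137.5, and the margin of error is half its width, (147 - 128) / 2 = 9.5.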
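Because an interval based on the normal model is symmetric around the sample mean, both numbers can be read off the endpoints. A minimal sketch, with `ci_summary` as a helper defined here for illustration:

```r
# Recover the point estimate and margin of error from a symmetric interval.
ci_summary <- function(lower, upper) {
  c(center = (lower + upper) / 2,   # sample mean
    margin = (upper - lower) / 2)   # margin of error
}

ci_summary(128, 147)                # center 137.5, margin 9.5
```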

In order to decrease the margin of error of a 95% confidence interval to half of what it is now, we would need to double the sample size.

False. We would need a sample 4 times as large, because the margin of error is proportional to s / sqrt(n): quadrupling n doubles sqrt(n) and halves the margin of error.

Exercise 4.14

Age at first marriage, Part I. The National Survey of Family Growth conducted by the Centers for Disease Control gathers information on family life, marriage and divorce, pregnancy, infertility, use of contraception, and men’s and women’s health. One of the variables collected on this survey is the age at first marriage. The histogram below shows the distribution of ages at first marriage of 5,534 randomly sampled women between 2006 and 2010. The average age at first marriage among these women is 23.44 with a standard deviation of 4.72.