Assignment 6

Lab 5

Exercise 4.16

Identify hypotheses, Part II. Write the null and alternative hypotheses in words and using symbols for each of the following situations.

Since 2008, chain restaurants in California have been required to display calorie counts of each menu item. Prior to menus displaying calorie counts, the average calorie intake of diners at a restaurant was 1100 calories. After calorie counts started to be displayed on menus, a nutritionist collected data on the number of calories consumed at this restaurant from a random sample of diners. Do these data provide convincing evidence of a difference in the average calorie intake of diners at this restaurant?

$H_O:$ There is no change in the average calorie intake for diners.

$H_A:$ There is a difference in the average calorie intake for diners.

$H_O: \mu = 1100$
$H_A: \mu \ne 1100$

Based on the performance of those who took the GRE exam between July 1, 2004 and June 30, 2007, the average Verbal Reasoning score was calculated to be 462. In 2011 the average verbal score was slightly higher. Do these data provide convincing evidence that the average GRE Verbal Reasoning score has changed since 2004?

$H_O:$ There is no change in the average Verbal Reasoning score (VRS).

$H_A:$ There is a change in the average VRS.

$H_O: \mu = 462$
$H_A: \mu \ne 462$

Exercise 4.18

Age at first marriage, Part II. Exercise 4.14 presents the results of a 2006 - 2010 survey showing that the average age of women at first marriage is 23.44. Suppose a researcher believes that this value has increased in 2012, but he would also be interested if he found a decrease. Below is how he set up his hypotheses. Indicate any errors you see.

$H_O: \bar x = 23.44$ years old
$H_A: \bar x > 23.44$ years old

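The hypotheses should be stated about the population mean $\mu$, not the sample mean $\bar x$, and the alternative should be two-sided ($\ne$) because he is interested in either an increase or a decrease.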

Exercise 4.20

Thanksgiving spending, Part II. Exercise 4.12 provides a 95% confidence interval for the average spending by American adults during the six-day period after Thanksgiving 2009: ($80.31, $89.11).

A local news anchor claims that the average spending during this period in 2009 was $100. What do you think of this claim?

The claim is not plausible. $100 is well above the upper end of our 95% interval estimate, ($80.31, $89.11).

Would the news anchor’s claim be considered reasonable based on a 90% confidence interval?
Why or why not?

No. Lowering the confidence level makes the interval narrower, so the value of $100 would still fall outside the interval.
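
To see this numerically, here is a rough sketch that backs out the standard error from the 95% interval and rebuilds a 90% interval from it. It assumes the interval in Exercise 4.12 was built as estimate ± z* × SE with normal critical values, which the exercise text above does not confirm. The variable names are my own.

```r
# Assumption: the 95% CI was computed as estimate +/- qnorm(0.975) * SE
lower95 <- 80.31
upper95 <- 89.11

estimate <- (lower95 + upper95) / 2            # midpoint of the interval
se <- (upper95 - lower95) / 2 / qnorm(0.975)   # back out the standard error

# Rebuild the interval at 90% confidence
ci90 <- estimate + c(-1, 1) * qnorm(0.95) * se
ci90

# Is the anchor's $100 inside the 90% interval?
claim_in_ci90 <- ci90[1] <= 100 && 100 <= ci90[2]
claim_in_ci90
```

The 90% interval sits strictly inside the 95% one, so a value that was already outside stays outside.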

Exercise 4.22

Gifted children, Part I. Researchers investigating characteristics of gifted children collected data from schools in a large city on a random sample of thirty-six children who were identified as gifted children soon after they reached the age of four. The following histogram shows the distribution of the ages (in months) at which these children first counted to 10 successfully. Also provided are some sample statistics.

n 36
min 21
mean 30.69
sd 4.31
max 39

Are conditions for inference satisfied?

Yes. The sample is random, the sample size is large (n = 36 > 30), and the histogram is roughly symmetric.

Suppose you read on a parenting website that children first count to 10 successfully when they are 32 months old, on average. Perform a hypothesis test to evaluate if these data provide convincing evidence that the average age at which gifted children first count to 10 successfully is less than the general average of 32 months. Use a significance level of 0.10 ($\alpha = 0.10$).

$H_O:$ The mean age at which gifted children first count to 10 is 32 months.

$H_A:$ The mean age at which gifted children first count to 10 is less than 32 months.

$H_O: \mu = 32$
$H_A: \mu < 32$

Decision rule: Reject the null if the p-value is less than 0.10.

Test Statistic: Calculate Z

(30.69-32)/(4.31/sqrt(36))

## [1] -1.823666

Calculate the p-value:

pnorm(-1.823666,0,1)

## [1] 0.03410129

Decision: Reject the null, since the p-value of 0.034 is less than 0.10.
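
As a cross-check, the same decision can be reached with a rejection region instead of a p-value. This is only a sketch using the summary statistics given in the exercise; the variable names are mine.

```r
# Same one-sided test, decided by comparing Z to a critical value
x_bar <- 30.69   # sample mean (months)
mu_0  <- 32      # mean under the null
s     <- 4.31    # sample standard deviation
n     <- 36      # sample size
alpha <- 0.10    # significance level

z <- (x_bar - mu_0) / (s / sqrt(n))   # test statistic, about -1.82
z_crit <- qnorm(alpha)                # lower-tail cutoff, about -1.28

# Reject the null when Z falls below the cutoff
reject_null <- z < z_crit
reject_null
```

Both approaches have to agree, because Z is below the cutoff exactly when the lower-tail p-value is below alpha.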

Interpret the p-value in context of the hypothesis test and the data.

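If the true mean age were 32 months, the probability of seeing a sample mean of 30.69 months or lower in a sample of 36 children would be about 0.034. Since this is less than 0.10, there is significant evidence that gifted children first count to 10 at an earlier age, on average.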

Calculate a 90% confidence interval for the average age at which gifted children first count to 10 successfully.

30.69-1.64*4.31/sqrt(36)

## [1] 29.51193

30.69+1.64*4.31/sqrt(36)

## [1] 31.86807

Our 90% CI is (29.51, 31.87) months.

Do your results from the hypothesis test and the confidence interval agree? Explain.

Yes. The value of 32 months is above the upper end of our CI, so it is not a plausible value for the mean age, which agrees with rejecting the null in the hypothesis test.
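
The interval above used the rounded critical value 1.64. A small sketch with the critical value taken from `qnorm()` instead gives the same interval to two decimals and checks directly whether 32 is inside it. Variable names are mine.

```r
x_bar <- 30.69       # sample mean (months)
s <- 4.31            # sample standard deviation
n <- 36              # sample size
conf_level <- 0.90

# Two-sided critical value for the chosen confidence level (about 1.645)
z_star <- qnorm(1 - (1 - conf_level) / 2)
margin <- z_star * s / sqrt(n)

ci <- c(lower = x_bar - margin, upper = x_bar + margin)
round(ci, 2)

# Does the interval contain the null value of 32 months?
contains_32 <- ci[["lower"]] <= 32 && 32 <= ci[["upper"]]
contains_32
```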

Exercise 4.28

Testing for food safety. A food safety inspector is called upon to investigate a restaurant with a few customer reports of poor sanitation practices. The food safety inspector uses a hypothesis testing framework to evaluate whether regulations are not being met. If he decides the restaurant is in gross violation, its license to serve food will be revoked.

Write the hypotheses in words.

Null: The restaurant’s sanitation has no problems and meets regulations.
Alternate: The restaurant’s sanitation is bad and does not meet regulations.

What is a Type 1 error in this context?

Concluding that the sanitation is bad (rejecting the null) when it really has no problems.

What is a Type 2 error in this context?

Concluding that the sanitation has no problems (failing to reject the null) when it really is bad.

Which error is more problematic for the restaurant owner? Why?

Type 1, because the restaurant would be mistakenly shut down.

Which error is more problematic for the diners? Why?

Type 2, because they would be eating unsafe food.

As a diner, would you prefer that the food safety inspector requires strong evidence or very strong evidence of health concerns before revoking a restaurant’s license? Explain your reasoning.

Strong evidence. Requiring only strong evidence makes it easier to revoke the license of a restaurant with bad sanitation, so it errs on the side of protecting diners (fewer Type 2 errors) at the cost of more Type 1 errors.

Homework Exercises

Exercise 4.15

Identify hypotheses, Part I. Write the null and alternative hypotheses in words and then symbols for each of the following situations.

New York is known as “the city that never sleeps”. A random sample of 25 New Yorkers were asked how much sleep they get per night. Do these data provide convincing evidence that New Yorkers on average sleep less than 8 hours a night?

$H_O:$ New Yorkers sleep 8 hours per night on average.

$H_A:$ New Yorkers sleep less than 8 hours per night on average.

$H_O: \mu = 8$ hours sleep per night

$H_A: \mu < 8$ hours sleep per night

Employers at a firm are worried about the effect of March Madness, a basketball championship held each spring in the US, on employee productivity. They estimate that on a regular business day employees spend on average 15 minutes of company time checking personal email, making personal phone calls, etc. They also collect data on how much company time employees spend on such non-business activities during March Madness. They want to determine if these data provide convincing evidence that employee productivity decreases during March Madness.

First of all these employers are assholes.

$H_O:$ Employees spend 15 minutes on non-business activities

$H_A:$ Employees spend greater than 15 minutes on non-business activities

$H_O: \mu = 15$ minutes

$H_A: \mu > 15$ minutes

Exercise 4.17

Online communication. A study suggests that the average college student spends 2 hours per week communicating with others online. You believe that this is an underestimate and decide to collect your own sample for a hypothesis test. You randomly sample 60 students from your dorm and find that on average they spent 3.5 hours a week communicating with others online. A friend of yours, who offers to help you with the hypothesis test, comes up with the following set of hypotheses. Indicate any errors you see.

$H_O: \bar x < 2$ hours
$H_A: \bar x > 3.5$ hours

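The hypotheses should be about the population mean $\mu$, not the sample mean $\bar x$. The null should be $\mu = 2$ and the alternate should be $\mu > 2$. The sample mean of 3.5 hours does not belong in the hypotheses.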

Exercise 4.19

Waiting at an ER, Part II. Exercise 4.11 provides a 95% confidence interval for the mean waiting time at an emergency room (ER) of (128 minutes, 147 minutes).

A local newspaper claims that the average waiting time at this ER exceeds 3 hours. What do you think of this claim?

3 hours is 180 minutes, which exceeds the upper end of the confidence interval by 33 minutes, so the claim is not supported by these data. It would warrant a closer look at whatever data the newspaper is using.

The Dean of Medicine at this hospital claims the average wait time is 2.2 hours. What do you think of this claim?

2.2 hours is 132 minutes, which is inside the 95% interval of (128 minutes, 147 minutes), so the claim is reasonable.

Without actually calculating the interval, determine if the claim of the Dean from part (b) would be considered reasonable based on a 99% confidence interval?

A 99% confidence interval would be wider than the 95% interval, so it would still contain 132 minutes and the Dean’s claim would still be reasonable.
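
The question asks for reasoning without calculating, but as a sanity check here is a rough sketch that converts both claims to minutes and tests them against the 95% interval and an approximate 99% interval. The 99% interval is rebuilt under the assumption that the original interval used normal critical values, which Exercise 4.11 may or may not have done. Variable names are mine.

```r
ci95 <- c(128, 147)                                # minutes, from Exercise 4.11
claims <- c(newspaper = 3 * 60, dean = 2.2 * 60)   # hours converted to minutes

# Which claims fall inside the 95% interval?
inside95 <- claims >= ci95[1] & claims <= ci95[2]
inside95

# Assumption: the 95% CI was estimate +/- qnorm(0.975) * SE
se <- diff(ci95) / 2 / qnorm(0.975)
ci99 <- mean(ci95) + c(-1, 1) * qnorm(0.995) * se
ci99

# Which claims fall inside the (wider) 99% interval?
inside99 <- claims >= ci99[1] & claims <= ci99[2]
inside99
```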

Exercise 4.21

Ball bearings. A manufacturer claims that bearings produced by their machine last 7 hours on average under harsh conditions. A factory worker randomly samples 75 ball bearings, and records their lifespans under harsh conditions. He calculates a sample mean of 6.85 hours, and the standard deviation of the data is 1.25 working hours. The following histogram shows the distribution of the lifespans of the ball bearings in this sample. Conduct a formal hypothesis test of this claim. Make sure to check that relevant conditions are satisfied.

$H_O:$ Bearings last 7 hours on average under harsh conditions.
$H_A:$ Bearings last a different amount of time than 7 hours on average under harsh conditions.

$H_O: \mu = 7$
$H_A: \mu \ne 7$

$\bar x$ = 6.85
s = 1.25
n = 75

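Conditions: the bearings were randomly sampled, so the observations are independent; the histogram is roughly symmetric with no outliers; and the sample size is large (n = 75 > 30).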
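
With the conditions checked, the test itself can be sketched the same way as in Exercise 4.22. The exercise does not state a significance level, so the 0.05 used below is my assumption, as are the variable names.

```r
x_bar <- 6.85   # sample mean lifespan (hours)
mu_0  <- 7      # manufacturer's claimed mean
s     <- 1.25   # sample standard deviation
n     <- 75     # sample size
alpha <- 0.05   # ASSUMED significance level, not given in the exercise

z <- (x_bar - mu_0) / (s / sqrt(n))   # test statistic, about -1.04
p_value <- 2 * pnorm(-abs(z))         # two-sided p-value, roughly 0.30

z
p_value
reject_null <- p_value < alpha
reject_null
```

Under that assumed significance level we would fail to reject the null: the data do not provide convincing evidence that the average lifespan differs from 7 hours.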

Exercise 4.23

Waiting at an ER, Part III. The hospital administrator mentioned in Exercise 4.11 randomly selected 64 patients and measured the time (in minutes) between when they checked in to the ER and the time they were first seen by a doctor. The average time is 137.5 minutes and the standard deviation is 39 minutes. He is getting grief from his supervisor on the basis that the wait times in the ER increased greatly from last year’s average of 127 minutes. However, the administrator claims that the increase is probably just due to chance.

Are conditions for inference met? Note any assumptions you must make to proceed.
Using a significance level of alpha = 0.05, is the change in wait times statistically significant? Use a two-sided test since it seems the supervisor had to inspect the data before he suggested an increase occurred.

(137.5-127)/(39/sqrt(64))

## [1] 2.153846

2*(1-pnorm(2.15,0,1))

## [1] 0.03155521

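Reject the null. The p-value of about 0.032 is less than 0.05, so the change in wait times is statistically significant.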

Would the conclusion of the hypothesis test change if the significance level was changed to alpha = 0.01?

Yes. The p-value of about 0.032 is greater than 0.01, so we would fail to reject the null.
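
The two decisions can be laid side by side in a short sketch. It uses the unrounded Z rather than 2.15, so the p-value differs slightly in the third decimal place, but the conclusions are the same. Variable names are mine.

```r
x_bar <- 137.5   # sample mean wait (minutes)
mu_0  <- 127     # last year's average
s     <- 39      # sample standard deviation
n     <- 64      # sample size

z <- (x_bar - mu_0) / (s / sqrt(n))   # 2.153846, not rounded
p_value <- 2 * pnorm(-abs(z))         # two-sided p-value

# Compare the same p-value against both significance levels
alphas <- c(0.05, 0.01)
decision <- ifelse(p_value < alphas, "reject", "fail to reject")
data.frame(alpha = alphas, decision = decision)
```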

Exercise 4.27

Testing for Fibromyalgia. A patient named Diana was diagnosed with Fibromyalgia, a long-term syndrome of body pain, and was prescribed anti-depressants. Being the skeptic that she is, Diana didn’t initially believe that anti-depressants would help her symptoms. However after a couple months of being on the medication she decides that the anti-depressants are working, because she feels like her symptoms are in fact getting better.

Write the hypotheses in words for Diana’s skeptical position when she started taking the anti-depressants.

Null: Anti-depressants have no effect on fibromyalgia symptoms.
Alternate: Taking anti-depressants results in a decrease of fibromyalgia symptoms.

What is a Type 1 error in this context?

Type 1 error would be to incorrectly dismiss the null, and erroneously conclude that taking antidepressants results in a decrease of fibromyalgia symptoms.

What is a Type 2 error in this context?

Type 2 error would be to incorrectly fail to reject the null, and erroneously conclude that anti-depressants have no effect on fibromyalgia symptoms.

How would these errors affect the patient?

Presuming there are no negative side effects to the antidepressants, a type 1 error would not be harmful.

A type 2 error would lead the patient to refuse drugs that could help her condition.

Exercise 4.33

Ages of pennies, Part I. The histogram below shows the distribution of ages of pennies at a bank.

Describe the distribution.

The distribution is right-skewed: most of the pennies are on the newer side, and there are diminishing numbers as the penny age increases.

Sampling distributions for means from simple random samples of 5, 30, and 100 pennies are shown in the histograms below. Describe the shapes of these distributions and comment on whether they look like what you would expect to see based on the Central Limit Theorem.

As n increases, the histograms look more and more like a normal distribution, which is what the Central Limit Theorem predicts. I would expect that trend to continue for larger n.
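
The penny data are not included here, so the following is only a simulation sketch of the same idea. It uses a made-up right-skewed population (exponential with mean 10) as a stand-in for penny ages, and the function name `sample_means` is mine.

```r
set.seed(1)   # arbitrary seed so the sketch is reproducible

# HYPOTHETICAL right-skewed population standing in for penny ages (years)
population <- rexp(10000, rate = 1 / 10)

# Draw many simple random samples of size n and keep each sample mean
sample_means <- function(n, reps = 2000) {
  replicate(reps, mean(sample(population, n)))
}

sizes <- c(5, 30, 100)
means_by_n <- lapply(sizes, sample_means)

# Spread of the sample means for each sample size
spread <- sapply(means_by_n, sd)
names(spread) <- paste0("n=", sizes)
spread

# Uncomment to draw the three histograms side by side
# par(mfrow = c(1, 3))
# for (i in seq_along(sizes)) hist(means_by_n[[i]], main = names(spread)[i])
```

In a run like this the histogram for n = 5 should still show some right skew, while the ones for n = 30 and n = 100 should look closer to normal and be less spread out.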

Exercise 4.35

Identify distributions, Part I. Four plots are presented below. The plot at the top is a distribution for a population. The mean is 10 and the standard deviation is 3. Also shown below is a distribution of (1) a single random sample of 100 values from this population, (2) a distribution of 100 sample means from random samples with size 5, and (3) a distribution of 100 sample means from random samples with size 25. Determine which plot (A, B, or C) is which and explain your reasoning.
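
The plots are not reproduced here, so I can't say which letter is which, but the expected spread of each distribution can be worked out from the population standard deviation. This is a sketch of that arithmetic only; the variable names are mine.

```r
sigma <- 3   # population standard deviation given in the exercise

# A single sample of 100 values should have about the same spread as the population
sd_single_sample <- sigma

# Sample means vary less: standard error = sigma / sqrt(n)
se_n5  <- sigma / sqrt(5)    # about 1.34
se_n25 <- sigma / sqrt(25)   # 0.6

expected_sd <- c(single_sample = sd_single_sample,
                 means_n5 = se_n5,
                 means_n25 = se_n25)
round(expected_sd, 2)
```

So the widest plot, with a shape like the population, should be the single sample, the narrowest should be the means from samples of size 25, and the one in between should be the means from samples of size 5. All three should be centered near 10.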

Exercise 4.43

Spam mail, Part I. The 2004 National Technology Readiness Survey sponsored by the Smith School of Business at the University of Maryland surveyed 418 randomly sampled Americans, asking them how many spam emails they receive per day. The survey was repeated on a new random sample of 499 Americans in 2009.

What are the hypotheses for evaluating if the average spam emails per day has changed from 2004 to 2009?

$H_O: \mu_{2004} - \mu_{2009} = 0$
$H_A: \mu_{2004} - \mu_{2009} \ne 0$

In 2004 the mean was 18.5 spam emails per day, and in 2009 this value was 14.9 emails per day. What is the point estimate for the difference between the two population means?

18.5-14.9

## [1] 3.6

A report on the survey states that the observed difference between the sample means is not statistically significant. Explain what this means in context of the hypothesis test and the data.

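Not statistically significant means we failed to reject the null. The observed difference of 3.6 emails per day could reasonably be due to sampling variability, so the data do not provide convincing evidence that the average number of spam emails per day changed from 2004 to 2009.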

Would you expect a confidence interval for the difference between the two population means to contain 0? Explain your reasoning.

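Yes. Since we failed to reject the null of no difference, 0 is a plausible value for the difference between the two population means, so a confidence interval at the matching confidence level would contain 0.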