3.6 Triathlon times, Part II. In Exercise 3.4 we saw two distributions for triathlon times: N(μ = 4313, σ = 583) for Men, Ages 30 - 34 and N(μ = 5261, σ = 807) for the Women, Ages 25 - 29 group. Times are listed in seconds. Use this information to compute each of the following:

  1. The cutoff time for the fastest 5% of athletes in the men’s group, i.e. those who took the shortest 5% of time to finish.

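The fastest 5% have the shortest times, so we need the lower tail: P(X < x) = 0.05, i.e. P(Z < z) = 0.05. This gives z ≈ -1.645, so x = 4313 - 1.645 × 583 ≈ 3354 seconds.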

qnorm(.05, mean=4313, sd=583)
## [1] 3354.05
  1. The cutoff time for the slowest 10% of athletes in the women’s group.

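The slowest 10% have the longest times, so P(X > x) = 0.10, which means P(X < x) = 1 - 0.10 = 0.90. This gives z ≈ 1.2816, so x = 5261 + 1.2816 × 807 ≈ 6295 seconds.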

qnorm(.9, mean=5261 , sd=807 )
## [1] 6295.212
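The same two cutoffs can be reached by finding the z-score first and then converting back to seconds, which mirrors the hand calculation. A minimal R sketch (the variable names are illustrative):

```r
# Men, ages 30-34: the fastest 5% are the lower 5% tail of N(4313, 583)
z_men <- qnorm(0.05)                         # about -1.645
cut_men <- 4313 + z_men * 583
# Women, ages 25-29: the slowest 10% are the upper 10% tail of N(5261, 807)
z_women <- qnorm(0.10, lower.tail = FALSE)   # about 1.28
cut_women <- 5261 + z_women * 807
c(men = cut_men, women = cut_women)
```

Using `lower.tail = FALSE` for the women's cutoff reads directly as "the top 10%", which avoids the 1 - 0.10 step.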

3.12 Speeding on the I-5. The distribution of passenger vehicle speeds traveling on the Interstate 5 Freeway (I-5) in California is nearly normal with a mean of 72.6 miles/hour and a standard deviation of 4.78 miles/hour.

  1. What percent of passenger vehicles travel slower than 80 miles/hour?

P(X < 80) ≈ 0.939, so about 93.9% of passenger vehicles travel slower than 80 miles/hour.

pnorm(80, 72.6, 4.78)
## [1] 0.939203
  1. What percent of passenger vehicles travel between 60 and 80 miles/hour?
pnorm(80,72.6,4.78)-pnorm(60,72.6,4.78)
## [1] 0.9350083
  1. How fast do the fastest 5% of passenger vehicles travel?
qnorm(.95,72.6,4.78)
## [1] 80.4624
  1. The speed limit on this stretch of the I-5 is 70 miles/hour. Approximate what percentage of the passenger vehicles travel above the speed limit on this stretch of the I-5.

P(X > 70) = 1 - P(X < 70) ≈ 0.707, so roughly 70.7% of passenger vehicles travel above the speed limit.

1-pnorm(70,72.6,4.78)
## [1] 0.7067562
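All four parts of 3.12 use the same N(72.6, 4.78) model, so they can be collected in one place. A short R sketch, reporting the percentages rounded to one decimal:

```r
mu <- 72.6; sigma <- 4.78                            # speeds in miles/hour
p_below_80 <- pnorm(80, mu, sigma)                   # (a) slower than 80
p_60_to_80 <- diff(pnorm(c(60, 80), mu, sigma))      # (b) P(X < 80) - P(X < 60)
fastest_5_cutoff <- qnorm(0.95, mu, sigma)           # (c) 95th percentile
p_above_70 <- pnorm(70, mu, sigma, lower.tail = FALSE)  # (d) above the limit
round(100 * c(below_80 = p_below_80, between_60_80 = p_60_to_80,
              above_70 = p_above_70), 1)
fastest_5_cutoff
```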

3.18 Heights of female college students.

heights <- c(54,55,56,56,57,58,58,59,60,60,60,61,61,62,62,63,63,63,64,65,65,67,67,69,73)
hist(heights)

mean(heights)
## [1] 61.52
sd(heights)
## [1] 4.583667

Do these data appear to follow a normal distribution? Explain your reasoning using the graphs provided below.

Using the sample mean and standard deviation: if the data were normal, about 68% of the observations should fall within one standard deviation of the mean, i.e. between 56.94 (61.52 - 4.58) and 66.10 (61.52 + 4.58). Counting the heights in this range gives 17/25 = 68% of the observations.

About 95% should fall within two standard deviations of the mean: 61.52 - 2(4.58) = 52.36 to 61.52 + 2(4.58) = 70.68. Counting the heights in this range gives 24/25 = 96%.

About 99.7% should fall within three standard deviations of the mean: 61.52 - 3(4.58) = 47.78 to 61.52 + 3(4.58) = 75.26. Counting the heights in this range gives 25/25 = 100%.

These proportions closely match the 68-95-99.7 rule, so a normal distribution appears to be a reasonable model for these data. With only 25 observations this does not prove normality, but the data are consistent with it.
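The three counts above can be checked in R by computing, for k = 1, 2, 3, the share of heights whose distance from the mean is less than k standard deviations. A minimal sketch:

```r
heights <- c(54, 55, 56, 56, 57, 58, 58, 59, 60, 60, 60, 61, 61, 62, 62,
             63, 63, 63, 64, 65, 65, 67, 67, 69, 73)
m <- mean(heights)
s <- sd(heights)
# Proportion of observations within k standard deviations of the mean
within_k <- sapply(1:3, function(k) mean(abs(heights - m) < k * s))
within_k   # compare with 0.68, 0.95, 0.997
```

A normal probability plot (`qqnorm(heights); qqline(heights)`) is a useful visual companion to these counts.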

3.24 Speeding on the I-5 part 2.

Exercise 3.12 states that the distribution of speeds of cars traveling on the Interstate 5 Freeway (I-5) in California is nearly normal with a mean of 72.6 miles/hour and a standard deviation of 4.78 miles/hour. The speed limit on this stretch of the I-5 is 70 miles/hour.

  1. A highway patrol officer is hidden on the side of the freeway. What is the probability that 5 cars pass and none are speeding? Assume that the speeds of the cars are independent of each other.

P(X < 70)^5 ≈ 0.2932^5 ≈ 0.0022, so the probability that none of the 5 cars is speeding is about 0.0022.

# Probability that one car is not speeding
notspeeding <- pnorm(70, 72.6, 4.78)

# Raise to the 5th power for five independent cars
notspeeding^5
## [1] 0.002168423
  1. On average, how many cars would the highway patrol officer expect to watch until the first car that is speeding? 1.41

The number of cars watched until the first speeding car is a geometric random variable (a negative binomial with one success). The probability that a car is speeding is about 0.707, and the expected value of a geometric random variable is 1/p = 1/0.707 ≈ 1.41 cars.

speeding<- 1-notspeeding
firstsuccess<- 1/speeding

What is the standard deviation of the number of cars he would expect to watch? About 0.77.

For a geometric random variable, σ = sqrt((1 - p)/p^2) = sqrt(0.293/0.707^2) ≈ 0.77.

# Standard deviation of a geometric random variable: sqrt((1 - p) / p^2)
sd_cars <- sqrt(notspeeding / speeding^2)
sd_cars
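Both parts of 3.24 can be cross-checked in R: part (a) is a Binomial(5, p) probability of zero speeders, and part (b) can be verified by simulation. This sketch assumes, as the exercise does, that cars are independent; note that `rgeom()` counts failures before the first success, so 1 is added to count the speeding car itself.

```r
p_speed <- pnorm(70, mean = 72.6, sd = 4.78, lower.tail = FALSE)  # P(one car speeds)
# (a) Number of speeders among 5 independent cars ~ Binomial(5, p_speed)
p_none <- dbinom(0, size = 5, prob = p_speed)
# (b) Cars watched until the first speeder ~ Geometric(p_speed)
mean_cars <- 1 / p_speed
sd_cars <- sqrt((1 - p_speed) / p_speed^2)
# Monte Carlo check: rgeom() counts cars before the first speeder, so add 1
set.seed(1)
sim <- rgeom(1e5, prob = p_speed) + 1
round(c(p_none = p_none, mean = mean_cars, sd = sd_cars,
        sim_mean = mean(sim), sim_sd = sd(sim)), 4)
```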

3.30 Survey Response Rate. Pew Research reported in 2012 that the typical response rate to their surveys is only 9%. If for a particular survey 15,000 households are contacted, what is the probability that at least 1,500 will agree to respond?

P(X ≥ 1500) = 1 - P(X ≤ 1499)

1-pbinom(1499,15000,.09)
## [1] 1.326331e-05
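For a tail this small, `1 - pbinom()` can lose precision, and `lower.tail = FALSE` is the safer form. The sketch below also compares the exact answer with a normal approximation using a 0.5 continuity correction; the approximation is shown only for comparison and is not part of the original solution.

```r
n <- 15000; p <- 0.09
mu <- n * p                         # expected number of respondents, 1350
sigma <- sqrt(n * p * (1 - p))      # about 35.05
# Exact binomial upper tail P(X >= 1500) = P(X > 1499)
p_exact <- pbinom(1499, n, p, lower.tail = FALSE)
# Normal approximation with a 0.5 continuity correction
p_normal <- pnorm(1499.5, mean = mu, sd = sigma, lower.tail = FALSE)
c(exact = p_exact, normal_approx = p_normal, z = (1500 - mu) / sigma)
```

Both numbers are tiny because 1,500 respondents is more than four standard deviations above the expected 1,350.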

3.36 Multiple choice quiz. In a multiple choice quiz there are 5 questions and 4 choices for each question (a, b, c, d). Robin has not studied for the quiz at all, and decides to randomly guess the answers. What is the probability that

  1. the first question she gets right is the 3rd question? .141
third<- .75*.75*.25
third
## [1] 0.140625

  1. she gets exactly 3 or exactly 4 questions right? P(X = 3) + P(X = 4)

dbinom(3,5,.25) + dbinom(4,5,.25)
## [1] 0.1025391
  1. she gets the majority of the questions right? P(X>2)
1-pbinom(2,5,.25)
## [1] 0.1035156
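R's geometric and binomial helpers give the same three answers without writing out the products by hand. In this sketch, `dgeom(x, prob)` is the probability of x failures before the first success, so "first right answer on question 3" is x = 2.

```r
p <- 0.25  # chance of guessing one question correctly
# (a) two wrong answers, then the first right one
p_first_right_is_3rd <- dgeom(2, prob = p)
# (b) exactly 3 or exactly 4 right out of 5
p_3_or_4 <- sum(dbinom(3:4, size = 5, prob = p))
# (c) majority = 3 or more right, the upper tail of Binomial(5, 0.25)
p_majority <- pbinom(2, size = 5, prob = p, lower.tail = FALSE)
c(p_first_right_is_3rd, p_3_or_4, p_majority)
```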

3.42 Serving in volleyball. A not-so-skilled volleyball player has a 15% chance of making the serve, which involves hitting the ball so it passes over the net on a trajectory such that it will land in the opposing team's court. Suppose that her serves are independent of each other.

  1. What is the probability that on the 10th try she will make her 3rd successful serve?

The third success must occur on the 10th attempt, so we need exactly two successes in the first nine attempts, multiplied by the probability that the 10th attempt is a success.

dbinom(2,9,.15)*.15
## [1] 0.03895012
  1. Suppose she has made two successful serves in nine attempts. What is the probability that her 10th serve will be successful?

Each serve is independent, so the probability that the 10th serve is successful is still 0.15.

  1. Even though parts (a) and (b) discuss the same scenario, the probabilities you calculated should be different. Can you explain the reason for this discrepancy?

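The two parts ask different questions. Part (a) asks for the probability of an entire sequence of outcomes before any serves are made: exactly two successes in the first nine attempts and then a success on the 10th. Part (b) conditions on the first nine serves having already happened; because serves are independent, the only remaining uncertainty is the 10th serve itself, which succeeds with probability 0.15.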
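Part (a) is a negative binomial probability, and R's `dnbinom(x, size, prob)` counts the failures before the size-th success, so "3rd success on the 10th serve" corresponds to 7 misses before the 3rd make. A short sketch comparing it with the manual calculation:

```r
p <- 0.15
# (a) 7 misses before the 3rd successful serve (10 serves in total)
p_a <- dnbinom(7, size = 3, prob = p)
# Same quantity built by hand: 2 successes in 9 serves, then a success
p_a_manual <- dbinom(2, size = 9, prob = p) * p
# (b) independence: the 10th serve does not depend on the first nine
p_b <- p
c(p_a = p_a, p_a_manual = p_a_manual, p_b = p_b)
```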

4.6 Art after school. Elijah and Tyler, two high school juniors, conducted a survey on 15 students at their school, asking the students whether they would like the school to offer an afterschool art program, counted the number of "yes" answers, and recorded the sample proportion. 14 out of the 15 students responded "yes". They repeated this 100 times and built a distribution of sample proportions. (Note that this question requires having reviewed Section 3.4.2 on the normal approximation to the binomial distribution.)

  1. What is this distribution called?

Sampling distribution (of the sample proportion)

  1. Would you expect the shape of this distribution to be symmetric, right skewed, or left skewed? Explain your reasoning.

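The sample size is small (n = 15) and the sample proportion is close to 1 (14/15 ≈ 0.93), so the success-failure condition fails: only n(1 - p̂) = 15 × (1/15) = 1 failure is expected, far fewer than 10. Because a sample proportion cannot exceed 1, the proportions will bunch up near 1 with a longer tail toward smaller values, so I would expect the distribution to be left skewed rather than symmetric.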

  1. Calculate the variability of this distribution and state the appropriate term used to refer to this value.

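The variability of a sampling distribution is called the standard error. For a sample proportion, SE = sqrt(p̂(1 - p̂)/n) with p̂ = 14/15 ≈ 0.93 and n = 15, which gives SE ≈ 0.064.

p_hat <- 14/15    # assumes the true proportion of "yes" is about 14/15
s_size <- 15      # students per survey
n_rep <- 100      # number of repeated surveys
# Simulate the 100 surveys: each sample proportion = (number of yes) / 15
s_props <- rbinom(n_rep, s_size, p_hat) / s_size
# Simulation-based estimate of the standard error
sd(s_props)
# Formula-based standard error of a sample proportion
SE <- sqrt(p_hat * (1 - p_hat) / s_size)
SE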
  1. Suppose that the students were able to recruit a few more friends to help them with sampling, and are now able to collect data from random samples of 25 students. Once again, they record the number of “yes” answers, and record the sample proportion, and repeat this 100 times to build a new distribution of sample proportions. How will the variability of this new distribution compare to the variability of the original distribution?

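The variability will be smaller. The standard error is proportional to 1/√n, so increasing the sample size from 15 to 25 reduces it by a factor of √(15/25) ≈ 0.77 (from about 0.064 to about 0.050), even though the sample size is still below 30.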
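The effect of moving from 15 to 25 students per survey can be seen by comparing the formula-based standard errors with a quick simulation of 100 surveys at each size. This sketch assumes the true proportion of "yes" answers is 14/15, taken from the first survey; the simulated values will vary with the seed.

```r
p <- 14 / 15                                  # assumed true proportion of "yes"
se <- function(n) sqrt(p * (1 - p) / n)       # SE of a sample proportion
round(c(n15 = se(15), n25 = se(25)), 4)
# Simulated check: 100 surveys at each sample size
set.seed(42)
props_15 <- rbinom(100, size = 15, prob = p) / 15
props_25 <- rbinom(100, size = 25, prob = p) / 25
round(c(sd_15 = sd(props_15), sd_25 = sd(props_25)), 4)
```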

4.12 Mental health. The 2010 General Social Survey asked the question: "For how many days during the past 30 days was your mental health, which includes stress, depression, and problems with emotions, not good?" Based on responses from 1,151 US residents, the survey reported a 95% confidence interval of 3.40 to 4.24 days in 2010.

  1. Interpret this interval in context of the data.

We are 95% confident that the true average number of days (out of the past 30) that US residents' mental health was not good in 2010 is between 3.40 and 4.24 days.

  1. What does “95% confident” mean? Explain in the context of the application.

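If we repeated this survey many times and computed a 95% confidence interval from each sample, about 95% of those intervals would contain the true average number of days in the past 30 that US residents' mental health was not good.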

  1. Suppose the researchers think a 99% confidence level would be more appropriate for this interval. Will this new interval be smaller or larger than the 95% confidence interval?

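The 99% interval would be larger (wider), because a higher confidence level uses a larger critical value (z* ≈ 2.58 instead of 1.96).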

  1. If a new survey were to be done with 500 Americans, would the standard error of the estimate be larger, smaller, or about the same. Assume the standard deviation has remained constant since 2010.

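The SE is computed as the standard deviation divided by the square root of the number of observations, SE = s/√n. With 500 participants instead of 1,151, n is smaller, so the standard error would be larger (by a factor of about √(1151/500) ≈ 1.52).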
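Parts (c) and (d) can be made concrete by backing the standard error out of the reported interval. This sketch assumes the published interval is symmetric and was built as point estimate ± 1.96 × SE; the implied standard deviation is derived from that assumption, not reported by the survey.

```r
lower <- 3.40; upper <- 4.24; n <- 1151
point_est <- (lower + upper) / 2              # 3.82 days
se <- (upper - lower) / 2 / qnorm(0.975)      # margin of error / z*, about 0.214
ci99 <- point_est + c(-1, 1) * qnorm(0.995) * se
s <- se * sqrt(n)                             # implied sample standard deviation
se_500 <- s / sqrt(500)                       # SE for a new sample of 500
round(c(se = se, ci99_lower = ci99[1], ci99_upper = ci99[2], se_500 = se_500), 3)
```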

4.18 Identify hypotheses, Part II. Write the null and alternative hypotheses in words and using symbols for each of the following situations.

  1. Since 2008, chain restaurants in California have been required to display calorie counts of each menu item. Prior to menus displaying calorie counts, the average calorie intake of diners at a restaurant was 1100 calories. After calorie counts started to be displayed on menus, a nutritionist collected data on the number of calories consumed at this restaurant from a random sample of diners.

Do these data provide convincing evidence of a difference in the average calorie intake of a diner at this restaurant?

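Ho: μ = 1100. The average calorie intake of diners at this restaurant has not changed since calorie counts were displayed.

Ha: μ ≠ 1100. The average calorie intake of diners at this restaurant is different from 1100 calories.

The question does not specify a direction of change, so the alternative is two-sided.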

  1. Based on the performance of those who took the GRE exam between July 1, 2004 and June 30, 2007, the average Verbal Reasoning score was calculated to be 462. In 2011 the average verbal score was slightly higher. Do these data provide convincing evidence that the average GRE Verbal Reasoning score has changed since 2004?

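Ho: μ = 462. The average GRE Verbal Reasoning score has not changed since 2004.

Ha: μ ≠ 462. The average GRE Verbal Reasoning score has changed since 2004.

Although the 2011 average was slightly higher, the question only asks whether the average has changed, so the alternative is two-sided.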

4.24 Gifted children, Part I. Researchers investigating characteristics of gifted children collected data from schools in a large city on a random sample of thirty-six children who were identified as gifted children soon after they reached the age of four. The following histogram shows the distribution of the ages (in months) at which these children first counted to 10 successfully. Also provided are some sample statistics: n = 36, a sample mean of 30.69 months, and a sample standard deviation of 4.31 months.

  1. Are conditions for inference satisfied?

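Yes. The children were randomly sampled, so the observations can be treated as independent, and the sample size (36) is greater than 30, so the sample mean should be approximately normal as long as the distribution of ages is not extremely skewed.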

  1. Suppose you read online that children first count to 10 successfully when they are 32 months old, on average. Perform a hypothesis test to evaluate if these data provide convincing evidence that the average age at which gifted children first count to 10 successfully is less than the general average of 32 months. Use a significance level of 0.10.
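# Ho: mu = 32 months; Ha: mu < 32 months (one-sided)
z <- (30.69 - 32) / (4.31 / sqrt(36))
# Lower-tail area only, because Ha is one-sided
p_value <- pnorm(z)
p_value

The one-sided p-value is about 0.034.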
  1. Interpret the p-value in context of the hypothesis test and the data.

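The p-value is the probability of observing a sample mean of 30.69 months or less in a random sample of 36 gifted children if the true average age were 32 months. Because it is less than the significance level of 0.10, we reject Ho in favor of Ha: the data provide convincing evidence that gifted children first count to 10 successfully at a younger age, on average, than 32 months.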

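  1. Calculate a 90% confidence interval for the average age at which gifted children first count to 10 successfully.
SE <- 4.31 / sqrt(36)
z_star <- qnorm(0.95)   # critical value for 90% confidence, about 1.645
lower <- 30.69 - z_star * SE
lower
upper <- 30.69 + z_star * SE
upper

The 90% confidence interval is about 29.51 to 31.87 months.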
  1. Do your results from the hypothesis test and the confidence interval agree? Explain.

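Yes. The null value of 32 months falls above the 90% confidence interval, so the interval also indicates that the true average is below 32 months, which agrees with the decision to reject Ho.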
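The solution above uses the normal model, which is the usual choice for n ≥ 30. As a hedged cross-check (not the method the exercise asks for), the same test and interval can be redone with the t distribution on n - 1 = 35 degrees of freedom, which gives a slightly larger p-value and a slightly wider interval.

```r
xbar <- 30.69; s <- 4.31; n <- 36; mu0 <- 32
se <- s / sqrt(n)
stat <- (xbar - mu0) / se
p_normal <- pnorm(stat)                 # normal model, lower tail (Ha: mu < 32)
p_t <- pt(stat, df = n - 1)             # t model with 35 degrees of freedom
ci90_t <- xbar + c(-1, 1) * qt(0.95, df = n - 1) * se
round(c(p_normal = p_normal, p_t = p_t), 4)
round(ci90_t, 2)
```

Under either model the p-value is below 0.10 and 32 lies above the 90% interval, so the conclusion does not change.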

4.30 Testing for food safety. A food safety inspector is called upon to investigate a restaurant with a few customer reports of poor sanitation practices. The food safety inspector uses a hypothesis testing framework to evaluate whether regulations are not being met. If he decides the restaurant is in gross violation, its license to serve food will be revoked.

  1. Write the hypotheses in words.

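Ho: The restaurant meets food safety regulations; it does not have poor sanitation practices.

Ha: The restaurant does not meet food safety regulations; it has poor sanitation practices.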

  1. What is a Type 1 Error in this context?

The restaurant is concluded to have poor sanitation practices when it actually does not

  1. What is a Type 2 Error in this context?

The restaurant is cleared of having poor sanitation practices when it actually does

  1. Which error is more problematic for the restaurant owner? Why?

Type 1, because the owner would lose his business or potentially have to pay a large fine and spend further capital on bringing the restaurant up to the health standards.

  1. Which error is more problematic for the diners? Why?

Type 2, because their health might be compromised by eating at a restaurant with poor sanitation.

  1. As a diner, would you prefer that the food safety inspector requires strong evidence or very strong evidence of health concerns before revoking a restaurants license? Explain your reasoning.

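As a diner, I would prefer that the inspector require only strong (rather than very strong) evidence before revoking the license. A lower bar for evidence makes a Type 2 error less likely, so a restaurant with poor sanitation is less likely to stay open.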

4.32 True or false. Determine if the following statements are true or false, and explain your reasoning. If false, state how it could be corrected.

  1. If a given value (for example, the null hypothesized value of a parameter) is within a 95% confidence interval, it will also be within a 99% confidence interval.

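True. A 99% confidence interval built from the same data has the same center but a larger margin of error, so it contains the entire 95% interval and therefore any value inside it.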

  1. Decreasing the significance level will increase the probability of making a Type 1 Error.

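False. The significance level is the probability of a Type 1 Error when Ho is true, so decreasing the significance level will decrease the probability of making a Type 1 Error (at the cost of a higher chance of a Type 2 Error).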

  1. Suppose the null hypothesis is μ = 5 and we fail to reject H0. Under this scenario, the true population mean is 5.

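False. Failing to reject H0 only means the data do not provide convincing evidence that μ differs from 5; it does not show that the true population mean is 5.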

  1. If the alternative hypothesis is true, then the probability of making a Type 2 Error and the power of a test add up to 1.

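True. When the alternative is true, power is the probability of correctly rejecting H0, which equals 1 minus the probability of a Type 2 Error.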

  1. With large sample sizes, even small differences between the null value and the true value of the parameter, a difference often called the effect size, will be identified as statistically significant.

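True. The standard error shrinks as the sample size grows, so even a small effect size produces a large test statistic and a small p-value.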
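Statement (b) can be illustrated by simulation: if every sample is drawn with H0 true, every rejection is a Type 1 Error, and the rejection rate tracks the significance level. The setup below (normal data, μ = 5, σ = 2, n = 40, a z-test using the sample standard deviation) is purely hypothetical and chosen for illustration.

```r
set.seed(2024)
n <- 40; reps <- 10000
# Hypothetical setup: every sample is drawn with H0 true (mu = 5, sd = 2)
p_values <- replicate(reps, {
  x <- rnorm(n, mean = 5, sd = 2)
  z <- (mean(x) - 5) / (sd(x) / sqrt(n))
  2 * pnorm(-abs(z))                     # two-sided p-value
})
# Each rejection here is a Type 1 Error, so these are Type 1 Error rates
type1_05 <- mean(p_values < 0.05)
type1_01 <- mean(p_values < 0.01)
c(alpha_0.05 = type1_05, alpha_0.01 = type1_01)
```

Lowering α from 0.05 to 0.01 lowers the observed Type 1 Error rate, which is why the original statement in (b) is false.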

4.38 Identify distributions, Part II. Four plots are presented below. The plot at the top is a distribution for a population. The mean is 60 and the standard deviation is 18. Also shown below is a distribution of (1) a single random sample of 500 values from this population, (2) a distribution of 500 sample means from random samples of each size 18, and (3) a distribution of 500 sample means from random samples of each size 81.

Determine which plot (A, B, or C) is which and explain your reasoning.

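Plot B is the single random sample of 500 values: its values span a similar range and its shape closely resembles the population. The other two plots are distributions of sample means, both centered near 60; the one for samples of size 18 has standard error 18/√18 ≈ 4.24, and the one for samples of size 81 is narrower, with standard error 18/√81 = 2.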
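The difference in spread between the two sampling distributions can be reproduced with a small simulation. The population's true shape is not given in this text, so this sketch assumes a normal population with mean 60 and standard deviation 18; the standard errors σ/√n do not depend on that assumption.

```r
sigma <- 18
se_18 <- sigma / sqrt(18)    # about 4.24
se_81 <- sigma / sqrt(81)    # exactly 2
# Hypothetical population: normal with mean 60 and sd 18 (shape assumed)
set.seed(7)
means_18 <- replicate(500, mean(rnorm(18, mean = 60, sd = 18)))
means_81 <- replicate(500, mean(rnorm(81, mean = 60, sd = 18)))
round(c(se_18 = se_18, sd_means_18 = sd(means_18),
        se_81 = se_81, sd_means_81 = sd(means_81)), 2)
```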

4.44 Nearsighted. It is believed that nearsightedness affects about 8% of all children. In a random sample of 194 children, 21 are nearsighted.

  1. Construct hypotheses appropriate for the following question: do these data provide evidence that the 8% value is inaccurate?

Ho: p = 0.08. 8% of all children are affected by nearsightedness.

Ha: p ≠ 0.08. More or less than 8% of all children are affected by nearsightedness.

  1. What proportion of children in this sample are nearsighted?

p̂ = 21/194 ≈ 0.108, or about 10.8%

  1. Given that the standard error of the sample proportion is 0.0195 and the point estimate follows a nearly normal distribution, calculate the test statistic (the Z-statistic).

z = (p̂ - p0)/SE = (0.108 - 0.08)/0.0195 ≈ 1.45

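p_hat <- 21/194   # sample proportion of nearsighted children
p0 <- 0.08        # null value
SE <- 0.0195      # standard error given in the exercise
z <- (p_hat - p0) / SE
z
## [1] 1.448586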
  1. What is the p-value for this hypothesis test?
# Two-sided p-value, because Ha is p ≠ 0.08
p <- 2 * pnorm(-abs(z))
p
## [1] 0.1474533
  1. What is the conclusion of the hypothesis test?

Since the p-value is greater than α = 0.05, we fail to reject the null hypothesis. The data do not provide convincing evidence that the 8% figure is inaccurate.
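R's built-in one-sample tests give comparable answers. `prop.test()` with `correct = FALSE` computes a chi-square statistic equal to z², using the standard error under the null, sqrt(0.08 × 0.92 / 194) ≈ 0.0195; `binom.test()` gives an exact binomial p-value instead. This is a cross-check, not the method the exercise prescribes.

```r
# Built-in tests of Ho: p = 0.08 against a two-sided alternative
z_test <- prop.test(21, 194, p = 0.08, correct = FALSE)  # chi-square = z^2
exact_test <- binom.test(21, 194, p = 0.08)              # exact binomial test
c(z_based = z_test$p.value, exact = exact_test$p.value)
```

Both p-values are well above 0.05, matching the conclusion above.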