#Homework 4, James Lunga

#3.6

In Exercise 3.4 we saw two distributions for triathlon times: men’s finishing times, M∼N(4313,583), and women’s finishing times, W∼N(5261,807). Times are listed in seconds. Use this information to compute each of the following:

a) The cutoff time for the fastest 5% of athletes in the men’s group, i.e. those who took the shortest 5% of time to finish.

b) The cutoff time for the slowest 10% of athletes in the women’s group.

#Solution 3.6

a) We’ll use qnorm because we are looking for a quantile and we were told the times are normally distributed.

qnorm(0.05, mean = 4313, sd = 583)
## [1] 3354.05

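The cutoff time for the fastest 5% of men is about 3354 seconds.

b) For the slowest 10% of women we do the same thing as for the men, except that we use the complement of the tail probability, 1 - 0.10 = 0.90, as the quantile. The women’s mean is 5261.

qnorm(.9, mean = 5261, sd = 807)
## [1] 6295.212

The cutoff time for the slowest 10% of women is about 6295 seconds.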
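A quick way to sanity-check a quantile is to feed it back into pnorm and confirm that it returns the tail probability we started from. The sketch below does this for both cutoffs, assuming the means and standard deviations given in the exercise (4313 and 583 for men, 5261 and 807 for women). The variable names are only illustrative.

```r
# Cutoffs from Exercise 3.6 (times in seconds)
men_cutoff   <- qnorm(0.05, mean = 4313, sd = 583)
women_cutoff <- qnorm(0.90, mean = 5261, sd = 807)

# Share of men who finish faster than the men's cutoff (should be 0.05)
pnorm(men_cutoff, mean = 4313, sd = 583)

# Share of women who finish slower than the women's cutoff (should be 0.10)
1 - pnorm(women_cutoff, mean = 5261, sd = 807)
```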

#3.12

3.12 Speeding on the I-5, Part I. The distribution of passenger vehicle speeds traveling on the Interstate 5 Freeway (I-5) in California is nearly normal with a mean of 72.6 miles/hour and a standard deviation of 4.78 miles/hour.

  1. What percent of passenger vehicles travel slower than 80 miles/hour?
  2. What percent of passenger vehicles travel between 60 and 80 miles/hour?
  3. How fast do the fastest 5% of passenger vehicles travel?
  4. The speed limit on this stretch of the I-5 is 70 miles/hour. Approximate what percentage of the passenger vehicles travel above the speed limit on this stretch of the I-5.

#Solution 3.12

pnorm(80, 72.6, 4.78)
## [1] 0.939203

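a) About 93.92% of passenger vehicles travel slower than 80 mph.

pnorm(80, 72.6, 4.78) - pnorm(60, 72.6, 4.78)
## [1] 0.9350083

b) About 93.5% travel between 60 and 80 mph.

qnorm(.95, 72.6, 4.78)
## [1] 80.4624

c) The fastest 5% travel at about 80.46 mph or faster.

1-pnorm(70, 72.6,4.78)
## [1] 0.7067562

d) About 70.68% of passenger vehicles travel above the 70 mph speed limit.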

#3.18

#3.18 Heights of female college students. Below are heights of 25 female college students.
heights <- c(54,55,56,56,57,58,58,59,60,60,60,61,61,62,62,63,63,63,64,65,65,67,67,69,73)
#a) The mean height is 61.52 inches with a standard deviation of 4.58 inches. Use this information to determine if the heights approximately follow the 68-95-99.7% Rule.
avg <- 61.52
sd <- as.numeric(4.58)
#68
sigma1 <- heights <= (avg+sd) & heights >= (avg-sd)
length(sigma1[sigma1 == TRUE])/length(heights)*100
## [1] 68
#95
sigma2 <- heights <= (avg+sd*2) & heights >= (avg-sd*2)
length(sigma2[sigma2 == TRUE])/length(heights)*100
## [1] 96
#99.7 Test
sigma3 <- heights <= (avg+sd*3) & heights >= (avg-sd*3)
length(sigma3[sigma3 == TRUE])/length(heights)*100
## [1] 100
a) The proportions of heights within 1, 2, and 3 standard deviations of the mean are 68%, 96%, and 100%, so the heights approximately follow the 68-95-99.7% rule. Additionally, if you look at the histogram, the data roughly follow the normal distribution curve.
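The three checks above repeat the same calculation with a different multiplier, so they can be wrapped in one small function. The sketch below does that and also draws the histogram with a normal curve and a normal probability plot. The helper name `within_k_sd` and the variable names are hypothetical, chosen only for this illustration.

```r
heights <- c(54,55,56,56,57,58,58,59,60,60,60,61,61,62,62,63,63,63,64,65,65,67,67,69,73)
h_mean <- 61.52
h_sd   <- 4.58

# Percent of observations within k standard deviations of the mean
# (within_k_sd is a hypothetical helper name)
within_k_sd <- function(x, k, m, s) {
  mean(x >= m - k * s & x <= m + k * s) * 100
}

pct <- sapply(1:3, function(k) within_k_sd(heights, k, h_mean, h_sd))
pct

# Histogram on the density scale with the matching normal curve overlaid
hist(heights, freq = FALSE, main = "Heights of 25 female college students")
curve(dnorm(x, mean = h_mean, sd = h_sd), add = TRUE)

# Normal probability plot; points near the line suggest near-normality
qqnorm(heights)
qqline(heights)
```

Taking the mean of a logical vector gives the proportion of TRUE values, which avoids the `length(x[x == TRUE])` idiom.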

#3.24 Speeding on the I-5, Part II. Exercise 3.12 states that the distribution of speeds of cars traveling on the Interstate 5 Freeway (I-5) in California is nearly normal with a mean of 72.6 miles/hour and a standard deviation of 4.78 miles/hour. The speed limit on this stretch of the I-5 is 70 miles/hour.

  1. A highway patrol officer is hidden on the side of the freeway. What is the probability that 5 cars pass and none are speeding? Assume that the speeds of the cars are independent of each other.
  2. On average, how many cars would the highway patrol officer expect to watch until the first car that is speeding? What is the standard deviation of the number of cars he would expect to watch?

#3.24 Solutions

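# P(a car is speeding) = P(speed > 70)
p_speed <- 1 - pnorm(70, 72.6, 4.78)
# Part 1: P(none of 5 independent cars is speeding)
round((1 - p_speed)^5, 4)
## [1] 0.0022
# Part 2: geometric distribution, expected number of cars until the first speeder is 1/p
1/p_speed
## [1] 1.414915
# Standard deviation of a geometric random variable: sqrt((1 - p)/p^2)
round(sqrt((1 - p_speed)/p_speed^2), 4)
## [1] 0.7662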
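The number of cars watched until the first speeder follows a geometric distribution, whose mean is 1/p and whose standard deviation is sqrt(1 - p)/p, where p is the probability that a single car is speeding. A simulation is one way to check these values. The sketch below assumes independent cars, uses an arbitrary seed, and uses illustrative variable names.

```r
set.seed(1)  # arbitrary seed so the run is reproducible

# Probability that a single car exceeds the 70 mph limit
p_speed <- 1 - pnorm(70, mean = 72.6, sd = 4.78)

# rgeom() counts failures before the first success, so add 1 to get
# the number of cars watched up to and including the first speeder
cars_watched <- rgeom(100000, prob = p_speed) + 1

mean(cars_watched)  # should be close to 1 / p_speed
sd(cars_watched)    # should be close to sqrt(1 - p_speed) / p_speed

# Theoretical values for comparison
geom_mean <- 1 / p_speed
geom_sd   <- sqrt(1 - p_speed) / p_speed
c(geom_mean, geom_sd)
```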

#3.30 Survey response rate. Pew Research reported in 2012 that the typical response rate to their surveys is only 9%. If for a particular survey 15,000 households are contacted, what is the probability that at least 1,500 will agree to respond?

#3.30 Solution

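# P(X >= 1500) = 1 - P(X <= 1499)
signif(1 - pbinom(1499, 15000, .09), 2)
## [1] 1.3e-05

The probability is about 0.0013%, so it is very unlikely that at least 1,500 households will agree to respond. (pbinom(1499, 15000, .09) = 0.9999867 is the probability of at most 1,499 responses.)

#3.36 Multiple choice quiz. In a multiple choice quiz there are 5 questions and 4 choices for each question (a, b, c, d). Robin has not studied for the quiz at all, and decides to randomly guess the answers. What is the probability that (a) the first question she gets right is the 3rd question? (b) she gets exactly 3 or exactly 4 questions right? (c) she gets the majority of the questions right?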

#3.36 Solution

a)

pqr <- 0.25
pqw <- (1-pqr)
(pqw^2)*pqr
## [1] 0.140625

14.06%

b)

dbinom(4,5,0.25)+dbinom(3,5,0.25)
## [1] 0.1025391

10.25%

c) A majority means at least 3 of the 5 questions right.

1-pbinom(2,5,0.25)
## [1] 0.1035156

10.35%
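Parts (a) and (c) can be cross-checked with R's built-in distribution functions. The sketch below uses dgeom for the geometric probability and sums dbinom over 3, 4, and 5 correct answers. The variable name `p_right` is only illustrative.

```r
p_right <- 0.25

# a) dgeom(x, prob) gives P(x failures before the first success),
#    so "first correct answer on question 3" means x = 2 failures
dgeom(2, prob = p_right)

# c) A majority is 3, 4, or 5 correct out of 5
sum(dbinom(3:5, size = 5, prob = p_right))
```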

#3.42 Serving in volleyball. A not-so-skilled volleyball player has a 15% chance of making the serve, which involves hitting the ball so it passes over the net on a trajectory such that it will land in the opposing team’s court. Suppose that her serves are independent of each other. (a) What is the probability that on the 10th try she will make her 3rd successful serve? (b) Suppose she has made two successful serves in nine attempts. What is the probability that her 10th serve will be successful? (c) Even though parts (a) and (b) discuss the same scenario, the probabilities you calculated should be different. Can you explain the reason for this discrepancy?

#3.42 Solutions a) 4.6

prbS <- 0.15
k <- 3
n <- 10
factorial(n-1)/(factorial(k-1)*(factorial(n - k)))*prbS^k*(1-prbS)^(n-k)
## [1] 0.03895012
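b) 15%, because the serves are independent: the outcome of the first nine attempts does not change the probability on the 10th.

c) They’re different because part (a) asks for the probability of the whole sequence (exactly two successes in the first nine serves and then a success on the 10th), whereas part (b) takes the first nine serves as given and asks only about the 10th serve, which is an independent event.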
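The formula used in part (a) is the negative binomial probability, which R provides as dnbinom. One thing to watch is that dnbinom is parameterized by the number of failures before the target number of successes, not by the total number of trials. The sketch below compares the built-in with the hand-written formula. The names `nb_builtin` and `nb_formula` are illustrative.

```r
p_serve <- 0.15
k <- 3   # successes needed
n <- 10  # trial on which the k-th success occurs

# dnbinom(x, size, prob): x = number of failures before the size-th success
nb_builtin <- dnbinom(n - k, size = k, prob = p_serve)

# Same quantity from the formula, with choose() in place of factorials
nb_formula <- choose(n - 1, k - 1) * p_serve^k * (1 - p_serve)^(n - k)

c(nb_builtin, nb_formula)
```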

#4.12

Mental health. The 2010 General Social Survey asked the question: “For how many days during the past 30 days was your mental health, which includes stress, depression, and problems with emotions, not good?” Based on responses from 1,151 US residents, the survey reported a 95% confidence interval of 3.40 to 4.24 days in 2010. (a) Interpret this interval in context of the data. (b) What does “95% confident” mean? Explain in the context of the application. (c) Suppose the researchers think a 99% confidence level would be more appropriate for this interval. Will this new interval be smaller or larger than the 95% confidence interval? (d) If a new survey were to be done with 500 Americans, would the standard error of the estimate be larger, smaller, or about the same? Assume the standard deviation has remained constant since 2010.

#4.12 Solutions

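a) We are 95% confident that the average number of days in the past 30 on which US residents’ mental health was not good in 2010 is between 3.40 and 4.24 days. The interval is about the population mean, not about where 95% of individuals fall.

b) It means that if we took many random samples of the same size and built an interval from each one, about 95% of those intervals would contain the true population mean.

c) It would be larger (wider), because being more confident of capturing the true mean requires a wider interval.

d) Larger, because the standard error is the standard deviation divided by the square root of the sample size, and 500 is smaller than 1,151.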
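Parts (c) and (d) can be illustrated numerically. The sketch below assumes the reported interval was built as the sample mean plus or minus a normal critical value times the standard error, which the exercise does not state explicitly. Under that assumption we can back out the standard error and an implied standard deviation. All variable names are illustrative.

```r
lower <- 3.40
upper <- 4.24
n_2010 <- 1151

z95 <- qnorm(0.975)                     # about 1.96
se_2010 <- (upper - lower) / 2 / z95    # margin of error divided by z*
s_hat <- se_2010 * sqrt(n_2010)         # implied standard deviation (an estimate)

# c) Width of a 99% interval versus the reported 95% interval
z99 <- qnorm(0.995)
width_95 <- upper - lower
width_99 <- 2 * z99 * se_2010

# d) Standard error if only 500 people were surveyed, same standard deviation
se_500 <- s_hat / sqrt(500)

c(se_2010 = se_2010, se_500 = se_500, width_95 = width_95, width_99 = width_99)
```

The ratio of the two standard errors depends only on the sample sizes: it equals sqrt(1151/500), whatever the standard deviation is.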

#4.18 Identify hypotheses, Part II. Write the null and alternative hypotheses in words and using symbols for each of the following situations. (a) Since 2008, chain restaurants in California have been required to display calorie counts of each menu item. Prior to menus displaying calorie counts, the average calorie intake of diners at a restaurant was 1100 calories. After calorie counts started to be displayed on menus, a nutritionist collected data on the number of calories consumed at this restaurant from a random sample of diners. Do these data provide convincing evidence of a difference in the average calorie intake of diners at this restaurant? (b) Based on the performance of those who took the GRE exam between July 1, 2004 and June 30, 2007, the average Verbal Reasoning score was calculated to be 462. In 2011 the average verbal score was slightly higher. Do these data provide convincing evidence that the average GRE Verbal Reasoning score has changed since 2004?

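#4.18 Solutions

a) H0: μ = 1100. The average calorie intake of diners at this restaurant is still 1100 calories. HA: μ ≠ 1100. The average calorie intake of diners at this restaurant is different from 1100 calories. Whether the data are convincing depends on the sample mean, its variability, and the sample size.

b) H0: μ = 462. The average GRE Verbal Reasoning score is still 462. HA: μ ≠ 462. The average GRE Verbal Reasoning score has changed from 462. Note that the comparison is between a three-year average and a single year, so a plot of the scores year by year would also help show whether there was an actual change.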

#4.24

#Calculate Z
(30.69-32)/(4.31/sqrt(36))
## [1] -1.823666
#Calculate the p-value:
pnorm(-1.823666,0,1)
## [1] 0.03410129

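b) The p-value of 0.034 is below the significance level, so the results say to reject the null.

c) Because the p-value is so low, there is significant evidence that gifted children first count to 10 at a younger average age than the null value of 32.

d)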

30.69-1.64*4.31/sqrt(36)
## [1] 29.51193
30.69+1.64*4.31/sqrt(36)
## [1] 31.86807

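A 90% confidence interval runs from 29.51 to 31.87.

e) The interval lies entirely below 32, which agrees with rejecting the null in the hypothesis test.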
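The calculation above hardcodes the critical value 1.64. As a sketch of an alternative, the whole test and interval can be computed from the summary statistics, with qnorm supplying the critical value (about 1.645 for a 90% interval). The variable names are illustrative and the numbers are the ones used in the solution.

```r
xbar <- 30.69   # sample mean
s    <- 4.31    # sample standard deviation
n    <- 36      # sample size
mu0  <- 32      # null value

se <- s / sqrt(n)
z  <- (xbar - mu0) / se
p_value <- pnorm(z)            # one-sided test, lower tail

z_star <- qnorm(0.95)          # critical value for a 90% interval
ci <- xbar + c(-1, 1) * z_star * se

round(c(z = z, p_value = p_value), 4)
round(ci, 2)
```

Using qnorm rather than a rounded table value changes the endpoints only in the third decimal place.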

#4.30 and Solutions

Testing for food safety. A food safety inspector is called upon to investigate a restaurant with a few customer reports of poor sanitation practices. The food safety inspector uses a hypothesis testing framework to evaluate whether regulations are not being met. If he decides the restaurant is in gross violation, its license to serve food will be revoked.

(a) Write the hypotheses in words.
H0: The restaurant meets food safety regulations. HA: The restaurant does not meet food safety regulations.

(b) What is a Type 1 Error in this context?
The inspector concludes that the restaurant is in violation and revokes its license when the restaurant is actually in compliance (a false positive).

(c) What is a Type 2 Error in this context?
The inspector concludes that the restaurant is in compliance when it is actually in violation (a false negative).

(d) Which error is more problematic for the restaurant owner? Why?
The Type 1 Error, because he loses his license and cannot sell food even though the restaurant is in compliance.

(e) Which error is more problematic for the diners? Why?
The Type 2 Error, because the restaurant passes the inspection while in violation and the diners could get sick.

(f) As a diner, would you prefer that the food safety inspector requires strong evidence or very strong evidence of health concerns before revoking a restaurant’s license? Explain your reasoning.
Strong evidence. Requiring very strong evidence would make it harder to revoke a license, so more restaurants that are actually in violation would stay open.

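#4.32 and Solutions

True or false. Determine if the following statements are true or false, and explain your reasoning. If false, state how it could be corrected.

(a) If a given value (for example, the null hypothesized value of a parameter) is within a 95% confidence interval, it will also be within a 99% confidence interval.
True. A 99% confidence interval built from the same data is wider than the 95% interval and contains it, so any value inside the 95% interval is also inside the 99% interval.

(b) Decreasing the significance level will increase the probability of making a Type 1 Error.
False. The significance level is the probability of a Type 1 Error when the null hypothesis is true, so decreasing it decreases that probability. Corrected: decreasing the significance level will decrease the probability of making a Type 1 Error.

(c) Suppose the null hypothesis is μ = 5 and we fail to reject H0. Under this scenario, the true population mean is 5.
False. Failing to reject H0 only means the data do not provide strong evidence against μ = 5. It does not show that the true mean is 5.

(d) If the alternative hypothesis is true, then the probability of making a Type 2 Error and the power of a test add up to 1.
True. When the alternative is true, the power is 1 minus the probability of a Type 2 Error.

(e) With large sample sizes, even small differences between the null value and the true value of the parameter, a difference often called the effect size, will be identified as statistically significant.
True. The standard error shrinks as the sample size grows, so even small differences produce large test statistics.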

#4.38 and Solution

1 matches plot B, because plot B is a random sample from the top plot, so the two should be similar in shape. 2 matches plot C because, while it is normally distributed, it is based on fewer observations. 3 matches plot A, because the values concentrate more around the center and the sample size is larger.

#4.44 Nearsighted. It is believed that nearsightedness affects about 8% of all children. In a random sample of 194 children, 21 are nearsighted.

(a) Construct hypotheses appropriate for the following question: do these data provide evidence that the 8% value is inaccurate?
H0: p = 0.08 (8% of children are nearsighted). HA: p ≠ 0.08 (the proportion of nearsighted children is not 8%).

(b) What proportion of children in this sample are nearsighted?
Approximately 10.82% (21/194).

(c) Given that the standard error of the sample proportion is 0.0195 and the point estimate follows a nearly normal distribution, calculate the test statistic (the Z-statistic).

(0.1082474-0.08)/.0195
## [1] 1.448585

Z is about 1.45.

(d) What is the p-value for this hypothesis test?

2*(1-pnorm(1.448585))
## [1] 0.1474535

(e) What is the conclusion of the hypothesis test?
The p-value (0.147) is greater than any conventional significance level such as 0.05, so we fail to reject the null hypothesis. The data do not provide convincing evidence that the 8% value is inaccurate.
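The standard error of 0.0195 given in part (c) of Exercise 4.44 can be reproduced from the null proportion and the sample size, and the rest of the test follows from it. The sketch below runs the calculation end to end. It assumes a two-sided test at the 5% significance level, which the exercise does not state, and the variable names are illustrative.

```r
p0 <- 0.08   # null proportion
n  <- 194    # sample size
x  <- 21     # nearsighted children in the sample

p_hat <- x / n
se <- sqrt(p0 * (1 - p0) / n)        # standard error under the null
z  <- (p_hat - p0) / se
p_value <- 2 * (1 - pnorm(abs(z)))   # two-sided p-value

round(c(p_hat = p_hat, se = se, z = z, p_value = p_value), 4)

# FALSE here means we fail to reject H0 at the assumed 5% level
p_value < 0.05
```

Because the unrounded standard error is used here, the Z-statistic and p-value differ very slightly from the values obtained with the rounded 0.0195.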