3.6 Triathlon times, Part II. In Exercise 3.4 we saw two distributions for triathlon times: N(μ = 4313, σ = 583) for Men, Ages 30 - 34 and N(μ = 5261, σ = 807) for the Women, Ages 25 - 29 group. Times are listed in seconds. Use this information to compute each of the following:
P(X < x) = .05, the 5th percentile of the men's times
qnorm(.05, mean=4313, sd=583)
## [1] 3354.05
P(X > x) = .1, so P(X < x) = 1 - .1 = .9, the 90th percentile of the women's times
qnorm(.9, mean=5261 , sd=807 )
## [1] 6295.212
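Both cutoffs can be cross-checked by hand, since a normal quantile is the mean plus a z-score times the standard deviation. The following is a minimal sketch of that check; the variable names are illustrative and not part of the exercise.

```r
# 5th percentile of N(4313, 583): standard normal quantile, then rescale
z_low <- qnorm(0.05)                  # about -1.645
men_cutoff <- 4313 + z_low * 583

# 90th percentile of N(5261, 807)
z_high <- qnorm(0.90)                 # about 1.282
women_cutoff <- 5261 + z_high * 807

c(men = men_cutoff, women = women_cutoff)
```

The two values should agree with the qnorm() calls that pass the mean and sd directly.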
3.12 Speeding on the I-5. The distribution of passenger vehicle speeds traveling on the Interstate 5 Freeway (I-5) in California is nearly normal with a mean of 72.6 miles/hour and a standard deviation of 4.78 miles/hour.
P(X<80)
pnorm(80, 72.6, 4.78)
## [1] 0.939203
pnorm(80,72.6,4.78)-pnorm(60,72.6,4.78)
## [1] 0.9350083
qnorm(.95,72.6,4.78)
## [1] 80.4624
P(X>70)
1-pnorm(70,72.6,4.78)
## [1] 0.7067562
3.18 Heights of female college students.
heights <- c(54,55,56,56,57,58,58,59,60,60,60,61,61,62,62,63,63,63,64,65,65,67,67,69,73)
hist(heights)
mean(heights)
## [1] 61.52
sd(heights)
## [1] 4.583667
Do these data appear to follow a normal distribution? Explain your reasoning using the graphs provided below.
Using the given mean and standard deviation, about 68% of the points should fall within one standard deviation of the mean, that is, between 56.94 (61.52 - 4.58) and 66.10 (61.52 + 4.58). Counting the given heights, 17/25 = 68% of the observations fall within one standard deviation.
95% (within two standard deviations of the mean): 61.52 + (2 × 4.58) = 70.68 and 61.52 - (2 × 4.58) = 52.36. Counting the heights, 24/25 = 96% of the values fall in this range.
99.7% (within three standard deviations of the mean): 61.52 + (3 × 4.58) = 75.26 and 61.52 - (3 × 4.58) = 47.78. Counting the heights, 25/25 = 100% of the values fall in this range.
These percentages closely match the 68-95-99.7 rule, so the data appear to be approximately normally distributed.
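The counting above can be done in R instead of by hand. This is a small sketch; `within_k_sd` is a name introduced here for illustration only.

```r
heights <- c(54,55,56,56,57,58,58,59,60,60,60,61,61,62,62,63,63,
             63,64,65,65,67,67,69,73)
m <- mean(heights)
s <- sd(heights)

# Share of observations within 1, 2 and 3 standard deviations of the mean
within_k_sd <- sapply(1:3, function(k) mean(abs(heights - m) <= k * s))
within_k_sd
```

The three shares can then be compared with 0.68, 0.95 and 0.997.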
3.24 Speeding on the I-5 part 2.
Exercise 3.12 states that the distribution of speeds of cars traveling on the Interstate 5 Freeway (I-5) in California is nearly normal with a mean of 72.6 miles/hour and a standard deviation of 4.78 miles/hour. The speed limit on this stretch of the I-5 is 70 miles/hour.
P(X < 70)^5. The probability is about .0022.
#Probability of cars not speeding
notspeeding<- pnorm(70,72.6,4.78)
#Raise to the 5th power for five cars
notspeeding^5
## [1] 0.002168423
Observing the first success on the nth trial indicates a geometric random variable (a special case of the negative binomial distribution). First we need the probability that a car is speeding, which is about .71. The expected value of a geometric random variable is 1/(probability of success), which is about 1.41 cars.
speeding<- 1-notspeeding
firstsuccess<- 1/speeding
What is the standard deviation of the number of cars he would expect to watch? About .77, using sqrt((1 - p)/p^2) for a geometric random variable.
sd_cars <- sqrt((1-speeding)/speeding^2)
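For a geometric random variable with success probability p, the mean is 1/p and the standard deviation is sqrt((1 - p)/p^2). The sketch below computes both and checks them against a simulation. The seed and the number of draws are arbitrary choices made here, not part of the exercise.

```r
p_speed <- 1 - pnorm(70, mean = 72.6, sd = 4.78)   # P(a car is speeding)

expected_cars <- 1 / p_speed                       # geometric mean
sd_cars <- sqrt((1 - p_speed) / p_speed^2)         # geometric standard deviation

# rgeom() counts failures before the first success, so add 1 to get
# the trial number on which the first speeding car is seen.
set.seed(1)
sim <- rgeom(100000, p_speed) + 1

c(mean_formula = expected_cars, mean_sim = mean(sim),
  sd_formula = sd_cars, sd_sim = sd(sim))
```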
3.30 Survey Response Rate Pew Research reported in 2012 that the typical response rate to their surveys is only 9%. If for a particular survey 15,000 households are contacted, what is the probability that at least 1,500 will agree to respond?
P(X>1499)
1-pbinom(1499,15000,.09)
## [1] 1.326331e-05
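With n this large, the same probability can be approximated with the normal distribution, using mean np and standard deviation sqrt(np(1 - p)). The sketch below compares the exact binomial tail with that approximation. The 0.5 continuity correction is a choice made here, not something the exercise specifies.

```r
n <- 15000
p <- 0.09

mu <- n * p                        # expected number of responses
sigma <- sqrt(n * p * (1 - p))     # standard deviation of the count

exact_prob  <- 1 - pbinom(1499, n, p)
approx_prob <- 1 - pnorm(1499.5, mean = mu, sd = sigma)

c(exact = exact_prob, normal_approx = approx_prob)
```

Both values are tiny, because 1,500 responses is more than four standard deviations above the expected count of 1,350.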
3.36 Multiple choice quiz. In a multiple choice quiz there are 5 questions and 4 choices for each question (a, b, c, d). Robin has not studied for the quiz at all, and decides to randomly guess the answers. What is the probability that
third<- .75*.75*.25
third
## [1] 0.140625
(b) she gets exactly 3 or exactly 4 questions right? P(X=3) + P(X=4)
dbinom(3,5,.25) + dbinom(4,5,.25)
## [1] 0.1025391
1-pbinom(2,5,.25)
## [1] 0.1035156
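The three answers can also be written with R's built-in distribution functions. This sketch reads the first computation as "two wrong answers followed by the first right answer", which is a geometric probability. The variable names are illustrative.

```r
p <- 0.25   # chance of guessing one question correctly

# dgeom() takes the number of failures before the first success
first_right_is_third <- dgeom(2, p)

# Exactly 3 or exactly 4 correct out of 5
three_or_four <- sum(dbinom(3:4, size = 5, prob = p))

# At least 3 correct out of 5, using the upper tail directly
at_least_three <- pbinom(2, size = 5, prob = p, lower.tail = FALSE)

c(first_right_is_third, three_or_four, at_least_three)
```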
3.42 Serving in volleyball. A not-so-skilled volleyball player has a 15% chance of making the serve, which involves hitting the ball so it passes over the net on a trajectory such that it will land in the opposing teams court. Suppose that her serves are independent of each other.
The last attempt is fixed as a success, so we want the probability of two successes in the first nine trials, multiplied by the probability of success on the 10th trial
dbinom(2,9,.15)*.15
## [1] 0.03895012
Each trial is independent, so the probability is .15
They are asking different questions. Part (a) asks for the joint probability of two successes in the first nine trials and a success on the 10th. Part (b) asks for the probability of success on the 10th trial given the earlier results, and because the serves are independent, each trial has the same probability of .15.
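"Two successes in the first nine trials, then a success on the 10th" is the negative binomial setup, so the first answer can be checked with dnbinom(). In R, dnbinom() counts the failures before the target number of successes: here that is 7 failures before the 3rd success.

```r
p <- 0.15

# Binomial for the first nine serves, times a success on the 10th
by_binomial <- dbinom(2, size = 9, prob = p) * p

# Negative binomial: 7 failures before the 3rd success
by_negbinom <- dnbinom(7, size = 3, prob = p)

c(by_binomial, by_negbinom)
```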
4.6 Art after school. Elijah and Tyler, two high school juniors, conducted a survey on 15 students at their school, asking the students whether they would like the school to offer an afterschool art program, counted the number of “yes” answers, and recorded the sample proportion. 14 out of the 15 students responded “yes”. They repeated this 100 times and built a distribution of sample means. (Note that this question requires having reviewed Section 3.4.2 on the normal approximation to the binomial distribution.)
sampling distribution
Because each sample is small (15 students) and the proportion is close to 1, the success-failure condition is not met (15 × (1 - .93) gives only about 1 expected "no"), so I would not expect the distribution to be normal. I would expect it to be left skewed.
The standard error of the sample proportion is approximately sqrt(p(1 - p)/n), with p = .93 and n = 15
p <- .93
n <- 15
SE <- sqrt(p*(1-p)/n)
SE
## [1] 0.06587868
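The standard error of a sample proportion can be checked two ways: from the formula sqrt(p(1 - p)/n), and by simulating many surveys and taking the standard deviation of the resulting proportions. The sketch below does both, assuming p = 0.93 as used above. The seed is arbitrary.

```r
set.seed(42)                          # arbitrary, for reproducibility only
n     <- 15                           # students per survey
p     <- 0.93                         # assumed true proportion of "yes"
n_rep <- 100                          # number of repeated surveys

# Each draw is the count of "yes" answers in one survey; divide by n
# to put it on the proportion scale.
p_hats <- rbinom(n_rep, size = n, prob = p) / n

se_sim     <- sd(p_hats)              # spread of the simulated proportions
se_formula <- sqrt(p * (1 - p) / n)   # theoretical standard error

c(simulated = se_sim, formula = se_formula)
```

With only 100 repetitions the simulated value will move around from run to run, but it should land near the formula value.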
A larger sample would reduce the variability: the standard error shrinks as the sample size grows, whether or not the sample size is still below 30
4.12 Mental health. The 2010 General Social Survey asked the question: “For how many days during the past 30 days was your mental health, which includes stress, depression, and problems with emotions, not good?” Based on responses from 1,151 US residents, the survey reported a 95% confidence interval of 3.40 to 4.24 days in 2010. (a) Interpret this interval in context of the data.
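We are 95% confident that the true mean number of days in the past 30 on which US residents' mental health was not good in 2010 is within the given interval, 3.40 to 4.24 days
"95% confident" means that about 95% of intervals built this way from random samples would contain the true mean; here the interval is 3.40 to 4.24 days
the interval would be wider (a larger margin of error)
the SE is computed as the standard deviation divided by the square root of the number of observations, so as the number of participants increases the SE gets smaller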
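How the width depends on the confidence level can be illustrated by backing the standard error out of the reported interval. This sketch assumes the interval has the form point estimate ± z* × SE, which the exercise does not state explicitly, and uses a hypothetical 99% level for comparison.

```r
lower <- 3.40
upper <- 4.24

x_bar <- (lower + upper) / 2      # point estimate, the interval's midpoint
me_95 <- (upper - lower) / 2      # 95% margin of error
se    <- me_95 / qnorm(0.975)     # implied standard error

# Same estimate and SE, but with the 99% critical value
ci_99 <- x_bar + c(-1, 1) * qnorm(0.995) * se
ci_99
```

The 99% interval contains the 95% interval, so it is wider at both ends.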
4.18 Identify hypotheses, Part II. Write the null and alternative hypotheses in words and using symbols for each of the following situations.
Do these data provide convincing evidence of a difference in the average calorie intake of a diner at this restaurant?
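The question does not specify a direction, so the alternative hypothesis is two-sided
Again no direction is specified, so the null hypothesis is that there has been no change and the alternative is that there has been a change in either direction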
4.24 Gifted children, Part I. Researchers investigating characteristics of gifted children collected data from schools in a large city on a random sample of thirty-six children who were identified as gifted children soon after they reached the age of four. The following histogram shows the distribution of the ages (in months) at which these children first counted to 10 successfully. Also provided are some sample statistics
yes, the children were randomly selected and the sample size is > 30
z <- (30.69-32)/(4.31/6)
p_value <- 2*pnorm(z)
p_value
## [1] 0.0682026
since the p-value of .068 is less than the significance level of .10, we reject the null in favor of the alternative that there is a difference
SE <-(4.31/6)
upper <- 30.69 + 1.64*SE
upper
## [1] 31.86807
lower <- 30.69 - 1.64*SE
lower
## [1] 29.51193
yes, the null value of 32 months is not within the lower and upper bounds of the 90% interval, so the interval agrees with the test and we reject the null
4.30 Testing for food safety. A food safety inspector is called upon to investigate a restaurant with a few customer reports of poor sanitation practices. The food safety inspector uses a hypothesis testing framework to evaluate whether regulations are not being met. If he decides the restaurant is in gross violation, its license to serve food will be revoked.
Ho: The restaurant does not have poor sanitation practices. Ha: The restaurant has poor sanitation practices.
Type 1 error: The restaurant is concluded to have poor sanitation practices when it actually does not
Type 2 error: The restaurant is cleared of having poor sanitation practices when it actually does
Type 1, because the owner would lose the business, or potentially have to pay a large fine and spend further capital on bringing the restaurant up to health standards.
Type 2, because diners' health might be compromised
As a diner I would prefer strong evidence before revoking the license. If strong evidence does not exist then I may feel more comfortable going back.
4.32 True or false. Determine if the following statements are true or false, and explain your reasoning. If false, state how it could be corrected.
False
False
False
True
True
4.38 Identify distributions, Part II. Four plots are presented below. The plot at the top is a distribution for a population. The mean is 60 and the standard deviation is 18. Also shown below is a distribution of (1) a single random sample of 500 values from this population, (2) a distribution of 500 sample means from random samples of each size 18, and (3) a distribution of 500 sample means from random samples of each size 81.
Determine which plot (A, B, or C) is which and explain your reasoning.
Plot B, given the values are similar and the curve also closely resembles the population
4.44 Nearsighted. It is believed that nearsightedness affects about 8% of all children. In a random sample of 194 children, 21 are nearsighted.
Ho: 8% of all children are affected by nearsightedness. Ha: more or less than 8% of all children are affected by nearsightedness
21/194 = .1082, about 11%
z = (p_hat - p0)/SE, so z = (.1082 - .08)/.0195 = 1.45
p_hat<-21/194
p0<-.08
SE<-.0195
z<-(p_hat-p0)/SE
z
## [1] 1.448586
p<- 2*(1-pnorm(z))
p
## [1] 0.1474533
since the two-sided p-value is greater than alpha .05, we fail to reject the null hypothesis
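The standard error of .0195 used above comes from the null proportion, sqrt(p0(1 - p0)/n). The sketch below derives it, checks the success-failure condition, and recomputes the test, assuming the two-sided alternative stated in the hypotheses. The variable names are illustrative.

```r
n     <- 194
p0    <- 0.08          # null proportion
p_hat <- 21 / n        # observed proportion

# Success-failure condition under the null: both counts should be >= 10
expected_counts <- c(n * p0, n * (1 - p0))

se <- sqrt(p0 * (1 - p0) / n)      # standard error under the null
z  <- (p_hat - p0) / se

# Two-sided p-value: area in both tails beyond |z|
p_two_sided <- 2 * pnorm(abs(z), lower.tail = FALSE)

c(se = se, z = z, p_value = p_two_sided)
```

Because the unrounded standard error is used, z differs slightly from the hand calculation, but the conclusion at alpha = .05 is the same.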