*Submit your homework to Canvas by the due date and time. Email your instructor if you have extenuating circumstances and need to request an extension.
*If an exercise asks you to use R, include a copy of the code and output. Please edit your code and output to be only the relevant portions.
*If a problem does not specify how to compute the answer, you may use any appropriate method. I may ask you to use R or manual calculations on your exams, so practice accordingly.
*You must include an explanation and/or intermediate calculations for an exercise to be complete.
*Be sure to submit the HWK6 Autograde Quiz which will give you ~20 of your 40 accuracy points.
*50 points total: 40 points accuracy, and 10 points completion
Exercise 1. An automobile club pays for emergency road services (ERS) requested by its members. Upon examining a sample of 2927 ERS calls from the club members, the club finds that 1499 calls related to starting problems, 849 calls involved serious mechanical failures requiring towing, 498 calls involved flat tires or lockouts, and 81 calls were for other reasons.
- Construct a \(98\%\) confidence interval “by hand” for the proportion of all ERS calls from club members that are serious mechanical problems requiring towing services (after checking that necessary assumptions are well met).
p_hat<-849/2927 #point estimate
# Calculate standard error
se<-sqrt((p_hat*(1-p_hat))/2927)
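# Find critical value (a z value, qnorm(.99), is standard for a proportion; with df = 2926 the t value is nearly identical)
t<-qt(.99,2926)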
# Find margin of error
moe <- se*t
c(p_hat-moe,p_hat+moe)
## [1] 0.2705347 0.3095815
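For a proportion, the large-sample interval is conventionally built with a standard normal critical value rather than a t quantile. The sketch below, using the counts from the exercise, checks the success/failure condition and recomputes the interval with `qnorm()`. The variable names are illustrative, and the threshold of 10 is one common rule of thumb rather than a fixed requirement.

```r
# Counts from Exercise 1
n <- 2927          # total ERS calls in the sample
x <- 849           # calls involving serious mechanical failures requiring towing

p_hat <- x / n

# Success/failure check (rule of thumb: at least 10 of each outcome)
assumptions_ok <- (n * p_hat >= 10) && (n * (1 - p_hat) >= 10)

# Large-sample (Wald) 98% interval using a z critical value
z_star <- qnorm(0.99)
se_p   <- sqrt(p_hat * (1 - p_hat) / n)
ci_z   <- c(p_hat - z_star * se_p, p_hat + z_star * se_p)

assumptions_ok
round(ci_z, 4)
```

With a sample this large the z and t critical values are nearly identical, so the two intervals agree to about four decimal places.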
- The current policy rate the automobile club pays is based on the thought that \(20\%\) of services requested will be serious mechanical problems requiring towing. However, the insurance company claims that the auto club has a higher rate of serious mechanical problems requiring towing services. Using your confidence interval in part (a), respond to the insurance company’s claim.
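We are 98% confident that the proportion of all ERS calls from club members that involve serious mechanical problems requiring towing is between 0.2705 and 0.3096. Since 0.20 falls below the entire interval, the data support the insurance company's claim that the auto club's rate of serious mechanical problems requiring towing is higher than 20%.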
- The club wants to construct a \(95\%\) confidence interval for the proportion of members who want a chocolate fountain at the annual picnic. They want the margin of error to be less than 0.01. How large of a random sample of club members should they contact if they start with the assumption that \(50\%\) are in favor of a chocolate fountain at the picnic? (Hint: write out the formula for margin of error, then solve for n)
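moe <- 0.01 # target margin of error
p_hat <- 0.5 # planning value for the proportion
z <- qnorm(.975) # critical value for 95% confidence
z
## [1] 1.959964
The critical value is about 1.96. Rearranging \(z\sqrt{\hat{p}(1-\hat{p})/n} \le 0.01\) for n gives \(n \ge \hat{p}(1-\hat{p})/(0.01/z)^2\).
(p_hat*(1-p_hat))/((moe/z)^2)
## [1] 9603.647
Rounding up to a whole number, the club needs a random sample of at least 9604 members to get a margin of error less than 0.01.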
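A sample size has to be a whole number, so the value from the formula is rounded up rather than to the nearest integer. A minimal sketch of that last step, with illustrative variable names, which also confirms that the rounded-up n meets the target:

```r
# Smallest n such that z * sqrt(p(1-p)/n) <= target margin of error
target_moe <- 0.01
p_plan     <- 0.5
z_star     <- qnorm(0.975)

n_exact    <- p_plan * (1 - p_plan) / (target_moe / z_star)^2
n_required <- ceiling(n_exact)   # always round up, never down

# Margin of error actually achieved at the rounded-up n
achieved_moe <- z_star * sqrt(p_plan * (1 - p_plan) / n_required)

n_required
achieved_moe < target_moe
```

Using 0.5 as the planning value is the conservative choice, since \(\hat{p}(1-\hat{p})\) is largest there.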
Exercise 2. Recall the tree data set in R, trees. Note that the diameter (in inches) is labelled Girth in the data.
- Consider the hypothesis test of \(H_o: \mu_D=12\) vs \(H_A: \mu_D \ne 12\) where \(\mu_D\) is the mean diameter of cherry trees from which this sample was collected. Use an alpha level of \(\alpha=0.10\)
- Compute the t test statistic and pvalue not using t.test() and then confirm the values using t.test().
Girth <- trees$Girth
(x_bar <- mean(Girth))
## [1] 13.24839
(s <- sd(Girth))
## [1] 3.138139
(n <- length(Girth))
## [1] 31
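qt(.995,30) # 0.995 quantile of t(30); the two-sided critical value at alpha = 0.10 would be qt(.95,30)
## [1] 2.749996
# Test statistic: (x_bar - mu_0)/(s/sqrt(n)), using the rounded summaries above
(13.248-12)/(3.138/sqrt(31))
## [1] 2.214331
# Two-sided p-value with df = n - 1 = 30
2*pt(2.214331, 30, lower.tail=FALSE)
## [1] 0.03454771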
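t.test(trees,mu=12) # passes the whole data frame, so Girth, Height, and Volume are pooled (df = 92); t.test(Girth, mu=12) tests the diameter alone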
##
## One Sample t-test
##
## data: trees
## t = 9.3936, df = 92, p-value = 4.309e-15
## alternative hypothesis: true mean is not equal to 12
## 95 percent confidence interval:
## 33.92733 45.68557
## sample estimates:
## mean of x
## 39.80645
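`t.test()` treats whatever it is given as a single sample, so passing the whole `trees` data frame pools the Girth, Height, and Volume columns (31 × 3 = 93 values, hence df = 92 and a mean near 39.8). A minimal sketch of the confirmation step on the Girth column alone, compared against the manual formula; the object names are illustrative.

```r
girth <- trees$Girth             # built-in 'trees' data set
n     <- length(girth)

# Manual one-sample t statistic and two-sided p-value for H0: mu = 12
t_manual <- (mean(girth) - 12) / (sd(girth) / sqrt(n))
p_manual <- 2 * pt(abs(t_manual), df = n - 1, lower.tail = FALSE)

# Same test via t.test(), passing only the Girth column
res <- t.test(girth, mu = 12)

c(manual = t_manual, t.test = unname(res$statistic))
c(manual = p_manual, t.test = res$p.value)
```

The two rows should match, and `res$parameter` should report 30 degrees of freedom.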
- Use the p value to draw a conclusion about the hypothesis: \(H_o: \mu_D=12\) vs \(H_A: \mu_D \ne 12\) where \(\mu_D\) is the mean diameter of cherry trees from which this sample was collected in the context of the question.
The p-value (0.0345) is smaller than \(\alpha = 0.10\), so we reject \(H_o\). The data provide evidence that the mean diameter of the cherry trees is not 12 inches.
- Compare the conclusions drawn from the 90% confidence interval for \(\mu_D\) in homework 5, exercise 2b and the hypothesis test in the previous question.
The conclusion from the 90% confidence interval in homework 5 agrees with the hypothesis test above. The p-value is the probability of obtaining a result as extreme as or more extreme than the one observed in the sample, assuming the null hypothesis is true; because it is below \(\alpha = 0.10\), we reject \(H_o: \mu_D = 12\). Equivalently, 12 is not inside the 90% CI. The two methods must agree, since a two-sided test at level \(\alpha\) rejects exactly when the hypothesized value falls outside the \(100(1-\alpha)\%\) confidence interval. The CI adds information the p-value lacks: its midpoint is the sample mean (the point estimate), and it gives a range of plausible values for \(\mu_D\). It does not describe where 90% of the data lie.
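The link between the test and the interval can be checked directly. The sketch below, with illustrative object names, runs the two-sided t test at \(\alpha = 0.10\) alongside the matching 90% interval and records whether each one rules out 12.

```r
girth <- trees$Girth
alpha <- 0.10

# Two-sided test of H0: mu = 12 and the matching 90% confidence interval
res <- t.test(girth, mu = 12, conf.level = 1 - alpha)
ci  <- res$conf.int

reject_h0 <- res$p.value < alpha
mu0_in_ci <- ci[1] <= 12 && 12 <= ci[2]

# A two-sided level-alpha t test rejects exactly when mu_0 lies outside the CI
c(reject_h0 = reject_h0, mu0_in_ci = mu0_in_ci)
```

Exactly one of the two flags is TRUE for any hypothesized value, which is why the two approaches cannot disagree.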
- Consider the hypothesis test of \(H_o: \mu_H=77\) vs \(H_A: \mu_H \ne 77\) where \(\mu_H\) is the mean height of cherry trees from which this sample was collected. Use an alpha level of \(\alpha=0.10\)
- Compute the t test statistic and pvalue not using t.test() and then confirm the values using t.test().
qt(.9,30) # 0.90 quantile of t(30); the two-sided critical value at alpha = 0.10 would be qt(.95,30)
## [1] 1.310415
# Test statistic: (x_bar - mu_0)/(s/sqrt(n)), with x_bar = 76, s = 6.372, n = 31
(76-77)/(6.372/sqrt(31))
## [1] -0.873786
# Two-sided p-value with df = n - 1 = 30
2*pt(0.873786, 30, lower.tail=FALSE)
## [1] 0.389176
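t.test(trees,mu=77) # passes the whole data frame, so Girth, Height, and Volume are pooled (df = 92); t.test(trees$Height, mu=77) tests the height alone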
##
## One Sample t-test
##
## data: trees
## t = -12.565, df = 92, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 77
## 95 percent confidence interval:
## 33.92733 45.68557
## sample estimates:
## mean of x
## 39.80645
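As with the diameter, the confirmation step needs only the column being tested. A minimal sketch for the height, assuming the built-in `trees` data and illustrative object names; setting `conf.level` to 0.90 makes the reported interval match \(\alpha = 0.10\).

```r
height <- trees$Height
alpha  <- 0.10

# One-sample t test of H0: mu = 77 on the Height column only
res_h <- t.test(height, mu = 77, conf.level = 1 - alpha)

res_h$statistic   # t statistic
res_h$parameter   # degrees of freedom (n - 1)
res_h$p.value     # two-sided p-value
res_h$conf.int    # 90% confidence interval for the mean height
```

The statistic and p-value from this call are the ones to compare with the manual calculation.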
- Use the p value to draw a conclusion about the hypothesis: \(H_o: \mu_H=77\) vs \(H_A: \mu_H \ne 77\) where \(\mu_H\) is the mean height of cherry trees from which this sample was collected in the context of the question.
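The p-value (0.389) is larger than \(\alpha = 0.10\), so we fail to reject \(H_o\). The data do not provide evidence that the mean height of the cherry trees differs from 77, so 77 remains a plausible value.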
- Compare the conclusions drawn from the 90% confidence interval for \(\mu_H\) in homework 5, exercise 2b and the hypothesis test in the previous question.
As with the diameter, the conclusion from the 90% confidence interval in homework 5 agrees with the hypothesis test above. The p-value is the probability of obtaining a result as extreme as or more extreme than the one observed in the sample, assuming the null hypothesis is true; because it is above \(\alpha = 0.10\), we fail to reject \(H_o: \mu_H = 77\). Equivalently, 77 is inside the 90% CI, so it remains a plausible value for the mean height. The CI adds information the p-value lacks: its midpoint is the sample mean (the point estimate), and it gives a range of plausible values for \(\mu_H\). It does not describe where 90% of the data lie.
- Construct a 90% Bootstrap T Confidence interval for \(\mu_D\) (mean diameter) from the sample data in the trees data set. Compare this confidence interval to that which you computed in 2b and 2c and brainstorm possible reasons for the relationships you noticed. Feel free to use the bootstrap code provided in conf_intervals_R.Rmd.
# Build a 90% bootstrap t confidence interval for mean Girth
# Get summaries from the data
Girth <- trees$Girth
x_bar <- mean(Girth)
s <- sd(Girth)
n <- length(Girth)
se <- s/sqrt(n)
# Vector to store bootstrap t statistics
B <- 1000
t_hat <- numeric(B)
# Bootstrap loop
for(i in 1:B){
  # Draw a resample of size n from the data, with replacement
  x_star <- sample(Girth, size = n, replace = TRUE)
  # Calculate resampled mean and sd
  x_bar_star <- mean(x_star)
  s_star <- sd(x_star)
  # Calculate t_hat, and store it in vector
  t_hat[i] <- (x_bar_star - x_bar) / (s_star/sqrt(n))
}
hist(t_hat, main="Approx. Sampling Distribution of t")
# Find upper and lower critical values of approx. distribution
t_lower <- quantile(t_hat, probs = 0.05, names = TRUE)
t_upper <- quantile(t_hat, probs = 0.95, names = TRUE)
# Build final CI (the upper quantile gives the lower limit, and vice versa)
boot_ci_upper <- x_bar - t_lower*se
boot_ci_lower <- x_bar - t_upper*se
c(boot_ci_lower, boot_ci_upper)
## 95% 5%
## 12.34953 14.29916
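To make the comparison with the classical interval repeatable, the bootstrap steps can be wrapped in a function and run with a fixed seed. The sketch below is one way to do that; `boot_t_ci` and its defaults are hypothetical names chosen for illustration, and the seed value is arbitrary.

```r
# Hypothetical helper: bootstrap t confidence interval for a mean
boot_t_ci <- function(x, n_boot = 1000, alpha = 0.10) {
  n     <- length(x)
  x_bar <- mean(x)
  se    <- sd(x) / sqrt(n)

  # Studentized statistic for each resample
  t_hat <- replicate(n_boot, {
    x_star <- sample(x, size = n, replace = TRUE)
    (mean(x_star) - x_bar) / (sd(x_star) / sqrt(n))
  })

  # The upper quantile of t_hat sets the lower limit, and vice versa
  q <- quantile(t_hat, probs = c(1 - alpha / 2, alpha / 2), names = FALSE)
  x_bar - q * se
}

set.seed(1)   # arbitrary seed, only so the result can be reproduced
boot_ci_90 <- boot_t_ci(trees$Girth)
classic_90 <- as.numeric(t.test(trees$Girth, conf.level = 0.90)$conf.int)

rbind(bootstrap_t = boot_ci_90, classical_t = classic_90)
```

The bootstrap interval need not be symmetric about the sample mean, because it uses the resampled distribution of t instead of the symmetric t(30) quantiles.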