Submit your homework to Canvas by the due date and time. Email your lecturer if you have extenuating circumstances and need to request an extension.
If an exercise asks you to use R, include a copy of all relevant code and output in your submitted homework file. You can copy/paste your code, take screenshots, or compile your work in an Rmarkdown document.
If a problem does not specify how to compute the answer, you may use any appropriate method. I may ask you to use R or manual calculations on your exams, so practice accordingly.
You must include an explanation and/or intermediate calculations for an exercise to be complete.
Be sure to submit the HWK8 Autograde Quiz, which will give you ~20 of your 40 accuracy points.
50 points total: 40 points accuracy, and 10 points completion
Exercise 1. Data on household vehicle miles of travel (VMT) are compiled annually by the Federal Highway Administration. A researcher is interested in whether there is a difference in last year’s mean VMT for Midwestern and southern households (\(\mu_M\) and \(\mu_S\)). Independent random samples of 15 Midwestern households and 14 southern households provided the following data on last year’s VMT, in thousands of miles:
\[\text{Midwest}: 16.2, 12.9, 17.3, 14.6, 18.6, 10.8, 11.2, 16.6, 16.6, 24.4, 20.3, 20.9, 9.6, 15.1, 18.3\]
\[\text{South}: 22.2, 19.2, 9.3, 24.6, 20.2, 15.8, 18.0, 12.2, 20.1, 16.0, 17.5, 18.2, 22.8, 11.5\]
- Graph the data as you see fit. Why did you choose the graph(s) you did and what do they tell you? Also calculate summary statistics relevant to the research question.
midwest <- c(16.2, 12.9, 17.3, 14.6, 18.6, 10.8,
11.2, 16.6, 16.6, 24.4, 20.3, 20.9, 9.6, 15.1, 18.3)
south <- c(22.2, 19.2, 9.3, 24.6, 20.2, 15.8, 18.0,
12.2, 20.1, 16.0, 17.5, 18.2, 22.8, 11.5)
bin_breaks <- c(9, 12, 15, 18, 21, 24, 27)
hist(midwest, main = "Midwest", ylim = c(0,5), breaks = bin_breaks, xlim = c(9, 30), xlab = "VMT")
hist(south, main = "South", ylim = c(0,5), breaks = bin_breaks, xlim = c(9, 30), xlab = "VMT")
qqnorm(midwest, main = "QQ-Plot of Midwest")
qqnorm(south, main = "QQ-Plot of South")
mean(midwest)
## [1] 16.22667
sd(midwest)
## [1] 4.055062
mean(south)
## [1] 17.68571
sd(south)
## [1] 4.42247
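One further option for the graphing step is a side-by-side boxplot, which puts both regions on a common VMT axis so centers and spreads can be compared directly. The following is a minimal sketch, not the required approach; the object name `summary_stats` and the plot labels are illustrative choices.

```r
midwest <- c(16.2, 12.9, 17.3, 14.6, 18.6, 10.8,
             11.2, 16.6, 16.6, 24.4, 20.3, 20.9, 9.6, 15.1, 18.3)
south <- c(22.2, 19.2, 9.3, 24.6, 20.2, 15.8, 18.0,
           12.2, 20.1, 16.0, 17.5, 18.2, 22.8, 11.5)

# Side-by-side boxplots on a shared axis (labels are illustrative)
boxplot(list(Midwest = midwest, South = south),
        ylab = "VMT (thousands of miles)",
        main = "Household VMT by region")

# Sample size, mean, and standard deviation for each region in one table
summary_stats <- data.frame(
  region = c("Midwest", "South"),
  n      = c(length(midwest), length(south)),
  mean   = c(mean(midwest), mean(south)),
  sd     = c(sd(midwest), sd(south))
)
summary_stats
```

With samples this small, the boxplots are best read alongside the histograms and QQ-plots rather than in place of them.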
- Perform a two-sample t-test at the \(10\%\) significance level for the difference in means, assuming equal variances, to address the researcher’s question. Justify why the assumptions of the test are reasonably met, or state which assumptions you are taking as given.
2*pt(-0.926889, 27)
## [1] 0.362196
t.test(midwest, south, mu = 0, var.equal = TRUE, conf.level = 0.90)
##
## Two Sample t-test
##
## data: midwest and south
## t = -0.92689, df = 27, p-value = 0.3622
## alternative hypothesis: true difference in means is not equal to 0
## 90 percent confidence interval:
## -4.140237 1.222142
## sample estimates:
## mean of x mean of y
## 16.22667 17.68571
As part of this test, specify your assumptions and hypotheses, calculate your test statistic and p-value, and state a conclusion in the context of the question. Show all steps of the computation by hand and then check your computations using t.test().
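The hand computation for the equal-variance test can be mirrored step by step in R. This is a sketch of the standard pooled two-sample t formulas; the variable names (`sp2`, `se_pooled`, and so on) are illustrative and not part of the assignment.

```r
midwest <- c(16.2, 12.9, 17.3, 14.6, 18.6, 10.8,
             11.2, 16.6, 16.6, 24.4, 20.3, 20.9, 9.6, 15.1, 18.3)
south <- c(22.2, 19.2, 9.3, 24.6, 20.2, 15.8, 18.0,
           12.2, 20.1, 16.0, 17.5, 18.2, 22.8, 11.5)

n_m <- length(midwest)
n_s <- length(south)

# Point estimate of mu_M - mu_S
diff_means <- mean(midwest) - mean(south)

# Pooled variance: weighted average of the two sample variances
sp2 <- ((n_m - 1) * var(midwest) + (n_s - 1) * var(south)) / (n_m + n_s - 2)

# Standard error of the difference under the equal-variance assumption
se_pooled <- sqrt(sp2 * (1 / n_m + 1 / n_s))

# Test statistic for H0: mu_M - mu_S = 0, and its degrees of freedom
t_stat    <- diff_means / se_pooled
df_pooled <- n_m + n_s - 2

# Two-sided p-value
p_value <- 2 * pt(-abs(t_stat), df_pooled)

c(t = t_stat, df = df_pooled, p = p_value)
```

These values should agree with the `t`, `df`, and `p-value` lines of the `t.test(..., var.equal = TRUE)` output.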
- A confidence interval for the true difference in means \(\mu_M-\mu_S\) assuming equal population variances is reported as: (-3.527, 0.609). Identify the point estimate, margin of error, standard error, critical value, degrees of freedom, and confidence level used to construct it.
t.test(midwest, south, mu = 0, var.equal = TRUE, conf.level = 0.80)
##
## Two Sample t-test
##
## data: midwest and south
## t = -0.92689, df = 27, p-value = 0.3622
## alternative hypothesis: true difference in means is not equal to 0
## 80 percent confidence interval:
## -3.5269808 0.6088855
## sample estimates:
## mean of x mean of y
## 16.22667 17.68571
qt(0.10, 27, lower.tail=FALSE)
## [1] 1.313703
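The pieces of the reported interval can also be recovered by working backward from its endpoints. The sketch below assumes the interval has the usual form (point estimate) ± (critical value) × (pooled standard error); the variable names are illustrative.

```r
midwest <- c(16.2, 12.9, 17.3, 14.6, 18.6, 10.8,
             11.2, 16.6, 16.6, 24.4, 20.3, 20.9, 9.6, 15.1, 18.3)
south <- c(22.2, 19.2, 9.3, 24.6, 20.2, 15.8, 18.0,
           12.2, 20.1, 16.0, 17.5, 18.2, 22.8, 11.5)

# Reported interval endpoints (rounded, as given in the exercise)
lower <- -3.527
upper <- 0.609

# Midpoint is the point estimate; half the width is the margin of error
point_estimate  <- (lower + upper) / 2
margin_of_error <- (upper - lower) / 2

# Pooled standard error and degrees of freedom from the data
n_m <- length(midwest)
n_s <- length(south)
df_pooled <- n_m + n_s - 2
sp2 <- ((n_m - 1) * var(midwest) + (n_s - 1) * var(south)) / df_pooled
std_error <- sqrt(sp2 * (1 / n_m + 1 / n_s))

# margin of error = critical value * standard error
critical_value <- margin_of_error / std_error

# Confidence level implied by that critical value (two-sided)
conf_level <- 1 - 2 * pt(critical_value, df_pooled, lower.tail = FALSE)

c(estimate = point_estimate, margin = margin_of_error, se = std_error,
  crit = critical_value, df = df_pooled, level = conf_level)
```

Because the reported endpoints are rounded to three decimals, the recovered critical value and confidence level match `qt()` and the nominal level only approximately.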
- Compute the test statistic and p-value for the hypotheses \[H_0: \mu_M-\mu_S=0, \quad H_A: \mu_M-\mu_S \ne 0\] at the 10% level without assuming equal population variances. You can use t.test(), but make sure you understand how those values are computed. How does dropping the equal-variance assumption change the calculations in the two-independent-sample t-test?
t.test(midwest, south, mu = 0, var.equal = FALSE, conf.level = 0.90)
##
## Welch Two Sample t-test
##
## data: midwest and south
## t = -0.92403, df = 26.344, p-value = 0.3639
## alternative hypothesis: true difference in means is not equal to 0
## 90 percent confidence interval:
## -4.150926 1.232831
## sample estimates:
## mean of x mean of y
## 16.22667 17.68571
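To see where the Welch values come from, the unpooled standard error and the Welch-Satterthwaite degrees of freedom can be computed directly. This is a sketch of the standard formulas with illustrative variable names.

```r
midwest <- c(16.2, 12.9, 17.3, 14.6, 18.6, 10.8,
             11.2, 16.6, 16.6, 24.4, 20.3, 20.9, 9.6, 15.1, 18.3)
south <- c(22.2, 19.2, 9.3, 24.6, 20.2, 15.8, 18.0,
           12.2, 20.1, 16.0, 17.5, 18.2, 22.8, 11.5)

n_m <- length(midwest)
n_s <- length(south)

# Each sample contributes its own variance term; nothing is pooled
v_m <- var(midwest) / n_m
v_s <- var(south) / n_s

# Unpooled standard error and test statistic
se_welch <- sqrt(v_m + v_s)
t_welch  <- (mean(midwest) - mean(south)) / se_welch

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v_m + v_s)^2 / (v_m^2 / (n_m - 1) + v_s^2 / (n_s - 1))

# Two-sided p-value using the (non-integer) approximate df
p_welch <- 2 * pt(-abs(t_welch), df_welch)

c(t = t_welch, df = df_welch, p = p_welch)
```

Compared with the pooled test, only the standard error and the degrees of freedom change; the point estimate of the difference is the same.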
Exercise 2. A reporter for a national magazine was interested in residents’ levels of worry about being the victim of crime in their neighborhood. They performed a telephone poll of 1500 adults who live in the United States; 140 from urban areas, 160 from suburban areas, and 200 from rural areas gave their opinions. The results of the survey are summarized in the table below.
A national politician references this telephone poll as evidence that “Over half of United States residents are worried about being the victim of crime in their neighborhood.”
| | Urban | Suburban | Rural |
|---|---|---|---|
| Worry about being victim | 83 | 92 | 86 |
| Do Not Worry about being victim | 57 | 68 | 114 |
| Total | 140 | 160 | 200 |
- Discuss how the sampling strategy impacts the population to which the inference should be made. Based on what we know about the sampling, do we have a simple random sample of iid observations from all Americans? To what population could we more confidently make an inference?
- Use a one-sample large-sample Z test for \(\pi_W\), the proportion of the population who are worried about being a victim, to see whether there is significant evidence at the \(\alpha=0.05\) level for the politician’s claim (as it applies to the population you identified in (a)).
pnorm(0.98386, 0, 1, lower.tail = FALSE)
## [1] 0.1625922
2*pnorm(0.316, 0, 1, lower.tail = FALSE)
## [1] 0.7520025
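The one-sample z statistic passed to `pnorm()` can be reproduced from the table. This sketch assumes the politician's claim is tested as \(H_0: \pi_W = 0.5\) against \(H_A: \pi_W > 0.5\) and pools the 500 respondents in the table; the variable names are illustrative.

```r
# Respondents who worry, summed over urban, suburban, and rural columns
worried <- 83 + 92 + 86
n_total <- 140 + 160 + 200

p_hat <- worried / n_total   # sample proportion
pi_0  <- 0.5                 # null value implied by "over half"

# Large-sample check: expected counts under H0 (threshold depends on course convention)
c(n_total * pi_0, n_total * (1 - pi_0))

# The standard error uses the null value pi_0, not p_hat
se_0 <- sqrt(pi_0 * (1 - pi_0) / n_total)
z    <- (p_hat - pi_0) / se_0

# Upper-tail p-value for HA: pi_W > 0.5
p_value <- pnorm(z, lower.tail = FALSE)

c(p_hat = p_hat, z = z, p = p_value)
```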
- Perform a hypothesis test at the 5% level of significance to determine if there is evidence of a difference in the proportion of urban and suburban residents who worry about being the victim of crime. (Be sure to state your hypotheses, assumptions, and show your computations by hand.)
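A sketch of the by-hand computation for the urban versus suburban comparison follows. It assumes the usual pooled two-proportion z test of \(H_0: \pi_U = \pi_S\) against a two-sided alternative; the variable names are illustrative. Rounding intermediate values by hand can shift the third decimal of z slightly.

```r
# Counts from the table
x_u <- 83;  n_u <- 140   # urban: worried, total
x_s <- 92;  n_s <- 160   # suburban: worried, total

p_u <- x_u / n_u
p_s <- x_s / n_s

# Pooled proportion, used because H0 says the two proportions are equal
p_pool <- (x_u + x_s) / (n_u + n_s)

# Standard error under H0 and the z statistic
se_pool <- sqrt(p_pool * (1 - p_pool) * (1 / n_u + 1 / n_s))
z       <- (p_u - p_s) / se_pool

# Two-sided p-value
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)

c(p_u = p_u, p_s = p_s, p_pool = p_pool, z = z, p = p_value)

# Cross-check: prop.test() without continuity correction reports z^2 as X-squared
prop.test(c(x_u, x_s), c(n_u, n_s), correct = FALSE)
```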
- Create a 95% confidence interval for \(\pi_R-\pi_U\), the difference in proportion of rural and urban residents who worry about being a victim of crime. (Be sure to show your computations.) Interpret the confidence interval in the context of the question.
qnorm(0.025, 0, 1)
## [1] -1.959964
qnorm(0.975, 0, 1)
## [1] 1.959964
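Using that critical value, the interval for \(\pi_R-\pi_U\) can be assembled as follows. This is a sketch of the standard large-sample (unpooled) interval with illustrative variable names.

```r
# Counts from the table
x_r <- 86;  n_r <- 200   # rural: worried, total
x_u <- 83;  n_u <- 140   # urban: worried, total

p_r <- x_r / n_r
p_u <- x_u / n_u

# Point estimate of pi_R - pi_U
diff_hat <- p_r - p_u

# Unpooled standard error: no null hypothesis of equality is assumed for a CI
se_diff <- sqrt(p_r * (1 - p_r) / n_r + p_u * (1 - p_u) / n_u)

# 95% critical value and margin of error
z_crit <- qnorm(0.975)
margin <- z_crit * se_diff

ci <- c(lower = diff_hat - margin, upper = diff_hat + margin)
ci
```

If both endpoints fall on the same side of zero, the interval is consistent with a difference between rural and urban residents at the 95% confidence level.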
- Suppose you look closer at the article and see that the reporter asked all adults within a household to respond to the question and recorded their responses. Does this concern you? If so, why? If not, why not?