Instructions: Answer all questions and submit the problems in order, making sure that the computer output and discussion are placed together. (Do not put the computer output at the end of the homework.) Raw computer output is not acceptable. Make it clear what parts of the output are relevant and show how they answer the questions posed in the homework. Make sure problem numbers and each part (a, b, c) are clearly labeled. You may work together on the homework, but do not copy any part of a homework. Each student must produce his/her own homework to be handed in. Homework is to be submitted via Carmen.
R code should be inserted into R chunks. Answers/comments should be written as plain text OUTSIDE of R chunks, NOT INSIDE of R chunks prefaced by a hashtag. Inserting brief comments in your R code to describe what your code is doing is fine but detailed comments and answers to questions should be written OUTSIDE of R chunks. Review solutions to homework assignments for examples. Combining regular text with chunks of code is one of the main advantages of using a markdown file.
Refer to the lecture notes for examples of R code.
Additional help is provided by the links in the Installation of R and
RStudio document. When the assignment is completed, please upload two
files, your .RMD file and the output as a .pdf file. Filenames for each
homework assignment will have the following format:
lastname_firstname_hw#.Rmd which contains the text and R code, and
lastname_firstname_hw#.pdf, which is a copy of the output generated by
knitting lastname_firstname_hw#.Rmd. For example, for homework 6, I
would submit kelbick_nicole_hw6.Rmd and kelbick_nicole_hw6.pdf.
NOTE: When converting to a Word file first, you may edit page breaks
and center plots, tables and/or text but nothing else at this time. The
command \newpage will cause a page break in your
document.
Wherever possible, please show how each answer was derived.
Non-integer values should go out to at least 3 decimal places.
Unless indicated otherwise, use R as a
calculator as well as to obtain probabilities, such as p-values,
critical values as well as basic calculations, such as means, standard
deviations and the like.
Question 1
In addition to the computer’s calculations of miles per gallon, a car’s owner also recorded the miles per gallon by dividing the miles driven by the number of gallons at each fill up. The owner wants to know if his calculation of mpg and the computer’s calculations are significantly different. Below are the mpg recorded by both the computer and driver.
| Obs | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Computer | 41.5 | 50.7 | 36.6 | 37.3 | 34.2 | 45.0 | 48.0 | 43.2 | 47.7 | 42.2 |
| Driver | 46.5 | 57.2 | 36.0 | 39.0 | 37.9 | 49.5 | 56.0 | 45.4 | 52.6 | 45.2 |
| Obs | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|
| Computer | 43.2 | 44.6 | 48.4 | 46.4 | 46.8 | 39.2 | 37.3 | 43.5 | 44.3 | 43.3 |
| Driver | 47.6 | 44.7 | 51.4 | 47.5 | 47.9 | 44.2 | 39.4 | 47.2 | 43.7 | 39.1 |
Take each difference as computer - driver. Wherever possible,
answers should be written out to at least 3 decimal places. Check out
Pairedttest.R for R code
related to the seizure drug study example mentioned in the lecture
notes. It is located on the paired t-test lecture page on Carmen.
Clearly state the null (\(H_0\)) and alternative (\(H_a\)) hypotheses in terms of the parameter of interest.
Let \(\mu_d\) be the mean of the differences computer - driver.
Null: \(H_0: \mu_d = 0\), there is no difference between the mean miles per gallon calculated by the computer and by the driver.
Alternative: \(H_a: \mu_d \neq 0\), there is a difference between the mean miles per gallon calculated by the computer and by the driver.
Calculate the relevant test statistic.
```{r}
# mpg at each fill-up as reported by the computer and as calculated by the driver
computer <- c(41.5, 50.7, 36.6, 37.3, 34.2, 45.0, 48.0, 43.2, 47.7, 42.2, 43.2, 44.6, 48.4, 46.4, 46.8, 39.2, 37.3, 43.5, 44.3, 43.3)
driver <- c(46.5, 57.2, 36.0, 39.0, 37.9, 49.5, 56.0, 45.4, 52.6, 45.2, 47.6, 44.7, 51.4, 47.5, 47.9, 44.2, 39.4, 47.2, 43.7, 39.1)
# Paired t-test on the differences computer - driver
tTest <- t.test(computer, driver, paired = TRUE)
print(tTest)
```
p-value = 0.0003386
State your conclusion based on an \(\alpha = 10\%\) significance level.
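Since the p-value (0.0003386) is less than \(\alpha = 0.10\), we reject the null hypothesis and conclude that the mean mpg reported by the computer differs from the mean mpg calculated by the driver.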
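As a cross-check, the paired t statistic can be reproduced by hand from the differences computer - driver. This is only a sketch; the names `d`, `n`, `se_d`, `t_hand`, and `p_hand` are chosen here for illustration and do not come from the assignment or the lecture code.

```{r}
computer <- c(41.5, 50.7, 36.6, 37.3, 34.2, 45.0, 48.0, 43.2, 47.7, 42.2, 43.2, 44.6, 48.4, 46.4, 46.8, 39.2, 37.3, 43.5, 44.3, 43.3)
driver <- c(46.5, 57.2, 36.0, 39.0, 37.9, 49.5, 56.0, 45.4, 52.6, 45.2, 47.6, 44.7, 51.4, 47.5, 47.9, 44.2, 39.4, 47.2, 43.7, 39.1)

d <- computer - driver        # paired differences
n <- length(d)
se_d <- sd(d) / sqrt(n)       # standard error of the mean difference
t_hand <- mean(d) / se_d      # test statistic under H0: mu_d = 0
p_hand <- 2 * pt(-abs(t_hand), df = n - 1)  # two-sided p-value

round(c(mean_diff = mean(d), se = se_d, t = t_hand, p = p_hand), 4)
```

These values should agree with the statistic and p-value printed by `t.test(computer, driver, paired = TRUE)`.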
```{r}
# Two-sided rejection region at alpha = 0.10: reject if t < -critical or t > critical
critical <- qt(1 - 0.1 / 2, df = length(computer) - 1)
rejectionRegion <- c(-Inf, -critical, critical, Inf)
print(rejectionRegion)
```
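```{r}
# Histogram and normal Q-Q plot of the differences
hist(computer - driver, main = "differences")
qqnorm(computer - driver)
qqline(computer - driver)  # qqline() adds a line to the plot drawn by qqnorm()
```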
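```{r}
# Sign test: compare the number of positive differences with Binomial(n, 0.5)
signTest <- binom.test(sum(computer > driver), length(computer), p = 0.5, alternative = "two.sided")
print(signTest)
```
Since the p-value < alpha, we reject the null hypothesis and conclude that the computer and driver mpg readings differ.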
NOTE: The p-value for a two-sided Sign Test is a little trickier to compute than for a one-sided Sign Test. We need to account for the fact that the number of positive differences could be either above the expected number $(n/2)$ or below it. Suppose the number of positive differences is $a$. Then $n-a$ is the number of negative differences, and the p-value is $P(X \le \min(a, n-a)) + P(X \ge \max(a, n-a))$, where $X \sim \text{Binomial}(n, 0.5)$.
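The formula in the NOTE above can be sketched with `pbinom()`. The variable names (`a`, `lower`, `upper`, `p_sign`) are illustrative, and dropping zero differences before counting is an assumption about how ties would be handled (there are none in these data).

```{r}
computer <- c(41.5, 50.7, 36.6, 37.3, 34.2, 45.0, 48.0, 43.2, 47.7, 42.2, 43.2, 44.6, 48.4, 46.4, 46.8, 39.2, 37.3, 43.5, 44.3, 43.3)
driver <- c(46.5, 57.2, 36.0, 39.0, 37.9, 49.5, 56.0, 45.4, 52.6, 45.2, 47.6, 44.7, 51.4, 47.5, 47.9, 44.2, 39.4, 47.2, 43.7, 39.1)

d <- computer - driver
d <- d[d != 0]               # assumption: zero differences are dropped
n <- length(d)
a <- sum(d > 0)              # number of positive differences

# P(X <= min(a, n - a)) for X ~ Binomial(n, 0.5)
lower <- pbinom(min(a, n - a), n, 0.5)
# P(X >= max(a, n - a)) = P(X > max(a, n - a) - 1)
upper <- pbinom(max(a, n - a) - 1, n, 0.5, lower.tail = FALSE)

# Cap at 1: the two tails overlap when a equals n / 2
p_sign <- min(1, lower + upper)
p_sign
```

The result should match the p-value reported by `binom.test(a, n, p = 0.5)`.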
Question 2
Suppose Honda wants to compare gas mileage between the latest Accord and Civic models. Are there differences in MPG between the two models? Computer estimates of MPG were used for each model. Assume the readings were done on brand new cars of each model under similar conditions and that observations within a group as well as between groups are independent.
| Obs | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Accord | 41.5 | 50.7 | 36.6 | 37.3 | 34.2 | 45.0 | 48.0 | 43.2 | 47.7 | 42.2 |
| Civic | 46.5 | 57.2 | 36.0 | 39.0 | 37.9 | 49.5 | 56.0 | 45.4 | 52.6 | 45.2 |
| Obs | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|
| Accord | 43.2 | 44.6 | 48.4 | 46.4 | 46.8 | 39.2 | 37.3 | 43.5 | 44.3 | 43.3 |
| Civic | 47.6 | 44.7 | 51.4 | 47.5 | 47.9 | 44.2 | 39.4 | 47.2 | 43.7 | 39.1 |
Check out BP_Example_Handout.R for relevant examples
of R code related to the calcium and blood pressure study
example described in
BP_Example_Handout.pdf. Both files can be
downloaded from the Carmen assignment page.
```{r}
accord <- c(41.5, 50.7, 36.6, 37.3, 34.2, 45.0, 48.0, 43.2, 47.7, 42.2, 43.2, 44.6, 48.4, 46.4, 46.8, 39.2, 37.3, 43.5, 44.3, 43.3)
civic <- c(46.5, 57.2, 36.0, 39.0, 37.9, 49.5, 56.0, 45.4, 52.6, 45.2, 47.6, 44.7, 51.4, 47.5, 47.9, 44.2, 39.4, 47.2, 43.7, 39.1)

# Side-by-side histograms
par(mfrow = c(1, 2))
hist(accord, main = "Accord", xlab = "MPG")
hist(civic, main = "Civic", xlab = "MPG")

# Side-by-side normal Q-Q plots
par(mfrow = c(1, 2))
qqnorm(accord, main = "Accord, MPG")
qqline(accord)
qqnorm(civic, main = "Civic, MPG")
qqline(civic)
```
```{r}
# F test for equality of the two population variances
test <- var.test(civic, accord)
print(test)
```
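The F statistic behind `var.test()` is the ratio of the two sample variances, so it can be checked by hand with `pf()`. The sketch below uses illustrative names (`f_hand`, `df_num`, `df_den`, `p_f`) that are not part of the assignment.

```{r}
accord <- c(41.5, 50.7, 36.6, 37.3, 34.2, 45.0, 48.0, 43.2, 47.7, 42.2, 43.2, 44.6, 48.4, 46.4, 46.8, 39.2, 37.3, 43.5, 44.3, 43.3)
civic <- c(46.5, 57.2, 36.0, 39.0, 37.9, 49.5, 56.0, 45.4, 52.6, 45.2, 47.6, 44.7, 51.4, 47.5, 47.9, 44.2, 39.4, 47.2, 43.7, 39.1)

f_hand <- var(civic) / var(accord)   # ratio of sample variances
df_num <- length(civic) - 1          # numerator degrees of freedom
df_den <- length(accord) - 1         # denominator degrees of freedom

# Two-sided p-value: double the smaller tail probability
p_lower <- pf(f_hand, df_num, df_den)
p_f <- 2 * min(p_lower, 1 - p_lower)

round(c(F = f_hand, p = p_f), 4)
```

A large p-value here would support using the pooled variance t-test; a small one would favor the unpooled version.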
Take the difference as Civic - Accord. In addition, when
calculating the test statistic, show how you calculated \(SE(\bar{Y}_{C}-\bar{Y}_A)\), the standard
error of \(\bar{Y}_{C}-\bar{Y}_A\). You
may use R as a calculator of sorts to help you with
calculations, as well as use pt() for
calculating probabilities and qt() for
obtaining critical values or quantiles. Compare your results to those
obtained from t.test() by using
var.equal=TRUE as one of the arguments to
t.test(). You may find lecture notes as
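well as help(t.test) to be useful.
Let \(\mu_C\) and \(\mu_A\) be the mean gas mileage of the Civic and the Accord.
Null: \(H_0: \mu_C - \mu_A = 0\), there is no difference in the mean gas mileage between the Accord and Civic.
Alternative: \(H_a: \mu_C - \mu_A \neq 0\), there is a difference in the mean gas mileage between the Accord and Civic.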
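```{r}
# Pooled variance; with equal sample sizes the resulting SE equals the unpooled SE
sp2 <- ((length(accord) - 1) * var(accord) + (length(civic) - 1) * var(civic)) / (length(accord) + length(civic) - 2)
SE <- sqrt(sp2 * (1 / length(accord) + 1 / length(civic)))
t <- (mean(civic) - mean(accord)) / SE
```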
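```{r}
df <- length(accord) + length(civic) - 2
pValue <- 2 * pt(-abs(t), df)
# Upper critical value for a two-sided test at alpha = 0.10; the lower one is its negative
criticalValue <- qt(1 - 0.10 / 2, df)

print(t)
print(pValue)
print(criticalValue)
```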
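t.test() for a two-sample pooled variance setting.
```{r}
# 90% confidence interval by hand: estimate -/+ t quantile * SE
confidenceInterval1 <- mean(civic) - mean(accord) + c(-1, 1) * qt(0.95, df = length(civic) + length(accord) - 2) * SE
print(confidenceInterval1)

# Same interval from t.test(); conf.level must be set because the default is 0.95
tTest <- t.test(civic, accord, var.equal = TRUE, conf.level = 0.90)
print(tTest$conf.int)
```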
Take the difference as Civic - Accord. In addition, when calculating the test
statistic, show how you calculated \(SE(\bar{Y}_{C}-\bar{Y}_A)\), the standard
error of \(\bar{Y}_{C}-\bar{Y}_A\). Use
Satterthwaite’s approximation to calculate degrees of freedom. Refer to
the lecture notes for the formula as well as
BP_Example_Handout.R for examples of
R code for similar calculations used with the calcium
intake and blood pressure data. You may use R as a
calculator of sorts to help you with calculations, as well as use
pt() for calculating probabilities and
qt() for obtaining critical values or
quantiles. Compare your results to those obtained from
t.test() by using
var.equal=FALSE as one of the arguments to
t.test(). You may find
BP_Example_Handout.pdf and
BP_Example_Handout.R as well as
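help("t.test") to be helpful.
Clearly state the null (\(H_0\)) and alternative (\(H_a\)) hypotheses in terms of the parameter of interest.
Let \(\mu_C\) and \(\mu_A\) be the mean gas mileage of the Civic and the Accord.
Null: \(H_0: \mu_C - \mu_A = 0\), there is no difference in the mean gas mileage between the Accord and Civic.
Alternative: \(H_a: \mu_C - \mu_A \neq 0\), there is a difference in the mean gas mileage between the Accord and Civic.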
Calculate the relevant test statistic.
```{r}
# Unpooled standard error of the difference in sample means
SE <- sqrt(var(accord) / length(accord) + var(civic) / length(civic))
t <- (mean(civic) - mean(accord)) / SE
```
```{r}
# Satterthwaite's approximation to the degrees of freedom
vA <- var(accord) / length(accord)
vC <- var(civic) / length(civic)
df <- (vA + vC)^2 / (vA^2 / (length(accord) - 1) + vC^2 / (length(civic) - 1))

pValue <- 2 * pt(-abs(t), df)
criticalValue <- qt(1 - 0.10 / 2, df)  # upper critical value, alpha = 0.10

print(t)
print(df)
print(pValue)
print(criticalValue)
```
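```{r}
# Welch (unpooled variance) two-sample t-test
tTest <- t.test(civic, accord, var.equal = FALSE)
print(tTest)
```
The p-value reported by t.test() is slightly above 0.10, so at the 10% significance level we fail to reject the null hypothesis.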
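To compare the hand calculation with `t.test()` directly, the statistic, degrees of freedom, and p-value can be placed side by side. This is a sketch under the unpooled (Welch) setting; `v_a`, `v_c`, `t_hand`, `df_hand`, and `welch` are illustrative names, not ones used in the handout.

```{r}
accord <- c(41.5, 50.7, 36.6, 37.3, 34.2, 45.0, 48.0, 43.2, 47.7, 42.2, 43.2, 44.6, 48.4, 46.4, 46.8, 39.2, 37.3, 43.5, 44.3, 43.3)
civic <- c(46.5, 57.2, 36.0, 39.0, 37.9, 49.5, 56.0, 45.4, 52.6, 45.2, 47.6, 44.7, 51.4, 47.5, 47.9, 44.2, 39.4, 47.2, 43.7, 39.1)

# Per-group variance of the sample mean
v_a <- var(accord) / length(accord)
v_c <- var(civic) / length(civic)

t_hand <- (mean(civic) - mean(accord)) / sqrt(v_a + v_c)
# Satterthwaite degrees of freedom
df_hand <- (v_a + v_c)^2 / (v_a^2 / (length(accord) - 1) + v_c^2 / (length(civic) - 1))
p_hand <- 2 * pt(-abs(t_hand), df_hand)

welch <- t.test(civic, accord, var.equal = FALSE)

# One row per method; the rows should agree
rbind(
  by_hand = c(t = t_hand, df = df_hand, p = p_hand),
  t_test = c(t = unname(welch$statistic), df = unname(welch$parameter), p = welch$p.value)
)
```

The `statistic`, `parameter`, and `p.value` components are part of the list returned by `t.test()`, so no output needs to be copied by hand.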
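t.test() for a two-sample unpooled variance setting.
```{r}
SE_diff <- sqrt(var(accord) / length(accord) + var(civic) / length(civic))
# The unpooled interval uses Satterthwaite's degrees of freedom, not n1 + n2 - 2
vA <- var(accord) / length(accord)
vC <- var(civic) / length(civic)
dfWelch <- (vA + vC)^2 / (vA^2 / (length(accord) - 1) + vC^2 / (length(civic) - 1))
tScore <- qt(0.95, df = dfWelch)  # upper quantile, so the margin of error is positive
moe <- tScore * SE_diff
confidenceInterval <- c(mean(civic) - mean(accord) - moe, mean(civic) - mean(accord) + moe)
print(confidenceInterval)

tTest <- t.test(civic, accord, var.equal = FALSE, conf.level = 0.90)
print(tTest$conf.int)
```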
Both tests reached the same conclusion: fail to reject the null hypothesis at the 10% significance level.
The values of \(SE(\bar{Y}_{C}-\bar{Y}_A)\) were the same, since the two samples have equal sizes.
Degrees of freedom were not needed in part e.
The confidence interval in (f) was wider and possessed a larger margin of error.
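The comparisons above can be checked numerically. The sketch below assumes a 90% confidence level for both intervals and assumes that the wider interval is the unpooled one; the names `se_pooled`, `se_unpooled`, `ci_pooled`, and `ci_welch` are illustrative.

```{r}
accord <- c(41.5, 50.7, 36.6, 37.3, 34.2, 45.0, 48.0, 43.2, 47.7, 42.2, 43.2, 44.6, 48.4, 46.4, 46.8, 39.2, 37.3, 43.5, 44.3, 43.3)
civic <- c(46.5, 57.2, 36.0, 39.0, 37.9, 49.5, 56.0, 45.4, 52.6, 45.2, 47.6, 44.7, 51.4, 47.5, 47.9, 44.2, 39.4, 47.2, 43.7, 39.1)

n_a <- length(accord)
n_c <- length(civic)

# Pooled and unpooled standard errors of the difference in means
sp2 <- ((n_a - 1) * var(accord) + (n_c - 1) * var(civic)) / (n_a + n_c - 2)
se_pooled <- sqrt(sp2 * (1 / n_a + 1 / n_c))
se_unpooled <- sqrt(var(accord) / n_a + var(civic) / n_c)

# 90% intervals (assumed level) from both versions of the t-test
ci_pooled <- t.test(civic, accord, var.equal = TRUE, conf.level = 0.90)$conf.int
ci_welch <- t.test(civic, accord, var.equal = FALSE, conf.level = 0.90)$conf.int

c(
  se_pooled = se_pooled,
  se_unpooled = se_unpooled,
  width_pooled = diff(ci_pooled),
  width_welch = diff(ci_welch)
)
```

With equal sample sizes the two standard errors coincide, so any difference in interval width comes only from the smaller Satterthwaite degrees of freedom, which give a slightly larger t quantile.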