The assignment is due on January 15 at 3 pm. You need to show your work and answers to each problem, as well as plots, R code, and output to substantiate your answers. You don’t have to typeset your solution; handwritten answers together with a printout of code and output are fine. Please combine your submission into one PDF file and submit it to Canvas under the Assignments tab. You may discuss the homework with other students, but you need to prepare your solution by yourself.
This first assignment is set up to get you used to R. You will need to employ some basic R commands such as plot, lm, and qqnorm. The problems give pointers to which R functions are appropriate. The help(...) command is very useful for seeing the usage and possible arguments of each command. Remember that you don’t need to supply all the arguments that are available. At the end of each help file there is an example that shows how the command is used, which is often all you need.
Download the file to your computer. Then use the read.table command. For example, I would import the data from the Downloads folder on my computer this way: data=read.table("/Users/guentherwalther/Downloads/NewspapersData.txt", sep="\t", header=TRUE)
library(tidyverse)
data <- read.table("~/Documents/Course Work/Applied Statistics/Data/NewspapersData.txt", header = TRUE, sep = "\t")
plot(data$Daily, data$Sunday)
This plot does suggest a plausible linear relationship between Daily and Sunday circulation. I conclude that this relationship is plausible because
lmresult <- lm(data$Sunday ~ data$Daily) ## notation is Y ~ X; fits the model Y = beta_0 + beta_1 * X
summary(lmresult)
##
## Call:
## lm(formula = data$Sunday ~ data$Daily)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -255.19  -55.57  -20.89   62.73  278.17
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.83563   35.80401   0.386    0.702
## data$Daily   1.33971    0.07075  18.935   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 109.4 on 32 degrees of freedom
## Multiple R-squared: 0.9181, Adjusted R-squared: 0.9155
## F-statistic: 358.5 on 1 and 32 DF, p-value: < 2.2e-16
plot(data$Daily, data$Sunday)
abline(lmresult)
The equation for the regression line, using the intercept and slope estimates from the summary output, is: Sunday = 13.836 + 1.340 × Daily
# 95% confidence intervals using confint function
confint(lmresult)
##                  2.5 %    97.5 %
## (Intercept) -59.094743 86.766003
## data$Daily    1.195594  1.483836
# 95% confidence intervals using provided formula
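The formula-based intervals can be sketched as follows. This is a minimal sketch that assumes the "provided formula" is the usual estimate ± t quantile × standard error; the estimates, standard errors, and degrees of freedom are copied from the summary output, and the variable names (`est`, `se`, `t_crit`, `ci`) are illustrative only.

```r
# Estimates and standard errors copied from the summary(lmresult) output
est <- c(intercept = 13.83563, slope = 1.33971)
se  <- c(intercept = 35.80401, slope = 0.07075)
df  <- 32  # residual degrees of freedom reported by summary()

# Assumed formula: estimate +/- t_{0.975, df} * standard error
t_crit <- qt(0.975, df)
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
print(round(ci, 3))
```

The result should agree closely with the confint output; small differences come from rounding in the printed summary values.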
There is a significant relationship between Sunday circulation and daily circulation. See code below for a statistical test examining this relationship. The null hypothesis is that β1 = 0, which would indicate that there is no relationship between y and x. This hypothesis is rejected because…
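One way to carry out this test by hand is sketched below. It uses only the slope estimate, its standard error, and the residual degrees of freedom printed by summary(lmresult); the variable names are illustrative, and the 5% significance level is an assumption chosen to match the 95% intervals used elsewhere.

```r
# Values copied from the summary(lmresult) coefficient table
b1    <- 1.33971   # slope estimate for Daily
se_b1 <- 0.07075   # standard error of the slope
df    <- 32        # residual degrees of freedom

# t statistic for H0: beta_1 = 0 against a two-sided alternative
t_stat  <- b1 / se_b1
p_value <- 2 * pt(-abs(t_stat), df)

# Reject H0 at the (assumed) 5% level when |t| exceeds the critical value
t_crit <- qt(0.975, df)
reject <- abs(t_stat) > t_crit
print(c(t_stat = t_stat, t_crit = t_crit, p_value = p_value))
```

The t statistic computed this way matches the "t value" column of the summary output up to rounding.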
The proportion of the variability in Sunday circulation accounted for by the daily circulation is…
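This proportion is the quantity R reports as "Multiple R-squared". As a hedged consistency check, in simple linear regression it can also be recovered from the F statistic and the residual degrees of freedom, since R² = F / (F + df). The sketch below uses the values printed in the summary output; the variable names are illustrative.

```r
# Values copied from the last lines of the summary(lmresult) output
f_stat   <- 358.5  # F-statistic on 1 and 32 DF
df_resid <- 32     # residual degrees of freedom

# With one predictor, SSR / SSE = F / df_resid, so
# R^2 = SSR / (SSR + SSE) = F / (F + df_resid)
r2 <- f_stat / (f_stat + df_resid)
print(round(r2, 4))
```

With the fitted model available, the same quantity can be read directly from summary(lmresult)$r.squared.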
Provide an interval estimate (based on 95% level) for the average Sunday circulation of newspapers with daily circulation of 500,000. You can do this in two ways: use the formula in the class notes, or use the function predict.
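Both approaches can be sketched as follows. The data here are simulated toy values, not the newspaper data, and the formula is assumed to be the standard one for the mean response (the class notes are not reproduced here). The sketch also assumes circulation is recorded in thousands, so 500,000 corresponds to Daily = 500. Note that predict() can only match newdata when the model is fit with a data argument, i.e. lm(Sunday ~ Daily, data = ...), rather than with data$Sunday ~ data$Daily.

```r
# Toy data for illustration only; NOT the newspaper data set
set.seed(1)
toy <- data.frame(Daily = runif(34, 100, 1200))
toy$Sunday <- 14 + 1.34 * toy$Daily + rnorm(34, sd = 110)

# Fit with a data argument so predict() can find 'Daily' in newdata
fit <- lm(Sunday ~ Daily, data = toy)
x0 <- 500                              # assumes units of thousands
new_paper <- data.frame(Daily = x0)

# Way 1: predict() with interval = "confidence" (interval for the mean response)
ci_predict <- predict(fit, newdata = new_paper,
                      interval = "confidence", level = 0.95)

# Way 2: yhat +/- t * s * sqrt(1/n + (x0 - xbar)^2 / Sxx)
n    <- nrow(toy)
s    <- summary(fit)$sigma             # residual standard error
xbar <- mean(toy$Daily)
sxx  <- sum((toy$Daily - xbar)^2)
yhat <- sum(coef(fit) * c(1, x0))
half <- qt(0.975, n - 2) * s * sqrt(1 / n + (x0 - xbar)^2 / sxx)
ci_formula <- c(fit = yhat, lwr = yhat - half, upr = yhat + half)

print(ci_predict)
print(ci_formula)
```

The two results should coincide, which is a useful check that the formula has been applied correctly.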
The particular newspaper that is considering a Sunday edition has a daily circulation of 500,000. Provide an interval estimate (based on 95% level) for the predicted Sunday circulation of this paper. How does this interval differ from that given in (f)?
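The interval estimate, based on 95% level, would be … This interval differs from that given in (f) because it is a prediction interval for a single newspaper rather than a confidence interval for an average. It must cover the random variation of an individual observation around the regression line in addition to the uncertainty in the estimated mean, so it is wider than the interval in (f).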
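The difference between the two intervals can be illustrated with a short sketch. As before, the data are simulated toy values rather than the newspaper data, Daily = 500 assumes circulation is recorded in thousands, and the variable names are illustrative.

```r
# Toy data for illustration only; NOT the newspaper data set
set.seed(1)
toy <- data.frame(Daily = runif(34, 100, 1200))
toy$Sunday <- 14 + 1.34 * toy$Daily + rnorm(34, sd = 110)
fit <- lm(Sunday ~ Daily, data = toy)
new_paper <- data.frame(Daily = 500)   # assumes units of thousands

# Interval for the average Sunday circulation at Daily = 500
conf_int <- predict(fit, new_paper, interval = "confidence", level = 0.95)
# Interval for one new newspaper with Daily = 500
pred_int <- predict(fit, new_paper, interval = "prediction", level = 0.95)

half_conf <- (conf_int[1, "upr"] - conf_int[1, "lwr"]) / 2
half_pred <- (pred_int[1, "upr"] - pred_int[1, "lwr"]) / 2

# The prediction interval adds the variance of a single observation:
# half_pred^2 = half_conf^2 + (t * s)^2
t_s <- qt(0.975, df.residual(fit)) * summary(fit)$sigma
print(c(half_conf = half_conf, half_pred = half_pred))
```

Both intervals share the same center (the fitted value); only the half-width changes, through the extra "1 +" term under the square root in the prediction formula.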