STATS 191 Assignment #1 Winter 2022

Daniel Pimentel

2022-01-11

Introduction

The assignment is due on January 15 at 3 pm. You need to show your work and answers to each problem, as well as plots, R code, and output to substantiate your answers. You don’t have to typeset your solution; handwritten answers together with a printout of code and output are fine. Please combine your submission into one PDF file and submit it to Canvas under the Assignments tab. You may discuss the homework with other students, but you need to prepare your solution by yourself.

This first assignment is set up to get you used to R. You will need to employ some basic R commands such as plot, lm, and qqnorm. The problems give pointers to which R functions will be appropriate. The help(...) command is very useful for seeing the usage and possible arguments of each command. Remember that you don’t need to supply all the arguments that are available. At the end of the help file there is an example that shows how the command is used, which is often all you need.

  1. In order to investigate the feasibility of starting a Sunday edition for a large metropolitan newspaper, information was obtained from a sample of 34 newspapers concerning their daily and Sunday circulation (in thousands). The data are in the file ‘NewspapersData.txt’ in the directory ‘Data’ on Canvas. There is also a file ‘NewspapersData2.txt’ which only has the numbers. There are several ways to read the file into R:

Download the file to your computer. Then use the read.table command. For example, I would import the data from the Downloads folder on my computer this way: data=read.table("/Users/guentherwalther/Downloads/NewspapersData.txt", sep="", header=TRUE)

library(tidyverse)
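# Read the tab-delimited file; the first row holds the column names (including Daily and Sunday)
data <- read.table("~/Documents/Course Work/Applied Statistics/Data/NewspapersData.txt", header = TRUE, sep = '\t')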
  (a) Construct a scatter plot of Sunday circulation versus daily circulation. Does the plot suggest a linear relationship between daily and Sunday circulation? Do you think this is a plausible relationship?
plot(data$Daily, data$Sunday,
     xlab = "Daily circulation (thousands)", ylab = "Sunday circulation (thousands)")

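The plot does suggest a linear relationship between daily and Sunday circulation: the points lie close to a straight line with a positive slope. I think this relationship is plausible because newspapers with a large daily readership would be expected to sell correspondingly more copies on Sunday.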

  (b) Fit a regression line predicting Sunday circulation from daily circulation and add the regression line to the plot in (a). State the equation of the regression line.
lmresult <- lm(data$Sunday ~ data$Daily) ## notation is Y ~ X; fits the model Y = beta_0 + beta_1 X + error
summary(lmresult)
## 
## Call:
## lm(formula = data$Sunday ~ data$Daily)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -255.19  -55.57  -20.89   62.73  278.17 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.83563   35.80401   0.386    0.702    
## data$Daily   1.33971    0.07075  18.935   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 109.4 on 32 degrees of freedom
## Multiple R-squared:  0.9181, Adjusted R-squared:  0.9155 
## F-statistic: 358.5 on 1 and 32 DF,  p-value: < 2.2e-16
plot(data$Daily, data$Sunday)
abline(lmresult)
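lm() finds the least-squares line. As a sketch of what it computes, the closed-form estimates β1 = Sxy/Sxx and β0 = ȳ - β1·x̄ can be checked against coef(). The built-in cars data set (stopping distance vs. speed) is only a stand-in here, since the newspaper file is not bundled; with the assignment data you would use data$Daily as x and data$Sunday as y.

```r
# Stand-in data: built-in 'cars' (speed in mph, dist in ft)
x <- cars$speed
y <- cars$dist

# Closed-form least-squares estimates
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope = Sxy / Sxx
b0 <- mean(y) - b1 * mean(x)                                     # intercept

# Compare with the coefficients lm() reports
fit <- lm(dist ~ speed, data = cars)
c(b0 = b0, b1 = b1)
coef(fit)
```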

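The equation for the fitted regression line is: predicted Sunday circulation = 13.84 + 1.34 × (daily circulation), with both circulations measured in thousands.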
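Evaluating the fitted line by hand is a quick check on the equation. This sketch plugs in the intercept and slope printed by summary(lmresult) above; because circulation is recorded in thousands, a daily circulation of 500,000 corresponds to x0 = 500. The resulting value is the centre of both interval estimates asked for in (f) and (g).

```r
# Coefficients copied from the summary(lmresult) output above
b0 <- 13.83563  # intercept estimate
b1 <- 1.33971   # slope: extra Sunday copies (thousands) per thousand daily copies

# Circulation is in thousands, so 500,000 daily copies means x0 = 500
x0 <- 500
yhat0 <- b0 + b1 * x0
round(yhat0, 2)
```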

  (c) Obtain the 95% confidence intervals for β0 and β1. Do this in two ways: using the R function confint, and using the formula given in the class notes (the cutoff for the t-distribution can be obtained with qt). Check that the answers match.
# 95% confidence intervals using confint function
confint(lmresult)
##                  2.5 %    97.5 %
## (Intercept) -59.094743 86.766003
## data$Daily    1.195594  1.483836
# 95% confidence intervals using provided formula
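A minimal sketch of the formula route, assuming the class notes use the standard interval estimate ± t(0.975, n-2) × SE(estimate). It plugs in the estimates and standard errors printed by summary(lmresult) above, so it matches the confint() output only up to the rounding of those printed values; in the full analysis you would take them from coef(summary(lmresult)) instead.

```r
# Estimates and standard errors copied from summary(lmresult) above
est <- c(beta0 = 13.83563, beta1 = 1.33971)
se  <- c(beta0 = 35.80401, beta1 = 0.07075)
n        <- 34      # number of newspapers
resid_df <- n - 2   # residual degrees of freedom (32 in the output)

# Two-sided 95% cutoff from the t distribution with n - 2 df
t_crit <- qt(0.975, resid_df)

# estimate +/- t_crit * SE; row 1 is beta0, row 2 is beta1
ci <- cbind(lower = est - t_crit * se,
            upper = est + t_crit * se)
ci
```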
  (d) Is there a significant relationship between Sunday circulation and daily circulation? Justify your answer with a statistical test. State what hypothesis you are testing and your conclusion.

There is a significant relationship between Sunday circulation and daily circulation. The relevant test is the t-test for the slope, reported in the data$Daily row of the summary(lmresult) output above. The null hypothesis is H0: β1 = 0, which would indicate that there is no linear relationship between Sunday and daily circulation; the alternative is H1: β1 ≠ 0. This hypothesis is rejected because…
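The slope test reported by summary(lmresult) can be reproduced by hand. This sketch uses the slope estimate and standard error printed above with n - 2 = 32 degrees of freedom; it also shows that in simple regression the squared t statistic equals the F statistic (358.5 in the output).

```r
b1       <- 1.33971  # slope estimate from summary(lmresult)
se_b1    <- 0.07075  # its standard error
resid_df <- 32       # n - 2 = 34 - 2

# t statistic for H0: beta1 = 0 against H1: beta1 != 0
t_stat  <- b1 / se_b1
p_value <- 2 * pt(-abs(t_stat), resid_df)  # two-sided p-value

# Reject H0 at the 5% level when |t| exceeds the 97.5% t quantile
reject <- abs(t_stat) > qt(0.975, resid_df)

round(c(t = t_stat, F = t_stat^2), 3)
p_value
reject
```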

  (e) What proportion of the variability in Sunday circulation is accounted for by daily circulation?

The proportion of the variability in Sunday circulation accounted for by the daily circulation is…
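The proportion of variability explained is R² = 1 - SSE/SST. As a sketch, this computes it from the definition on the built-in cars data (a stand-in for the newspaper file) and checks it against summary()$r.squared and the squared correlation, which coincide in simple linear regression. For the newspaper fit, summary(lmresult)$r.squared returns the quantity printed above as "Multiple R-squared: 0.9181".

```r
# Stand-in data: built-in 'cars' (stopping distance vs speed)
fit <- lm(dist ~ speed, data = cars)

# R^2 = 1 - SSE / SST: share of the variability in y explained by the line
sse <- sum(residuals(fit)^2)
sst <- sum((cars$dist - mean(cars$dist))^2)
r2_manual <- 1 - sse / sst

# In simple linear regression R^2 also equals the squared correlation
r2_cor <- cor(cars$speed, cars$dist)^2

c(manual = r2_manual, correlation = r2_cor, lm = summary(fit)$r.squared)
```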

  (f) Provide an interval estimate (at the 95% level) for the average Sunday circulation of newspapers with a daily circulation of 500,000. You can do this in two ways: use the formula in the class notes, or use the function predict.
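A sketch of both routes, assuming the class notes use the standard interval for the mean response, ŷ0 ± t(0.975, n-2) · s · sqrt(1/n + (x0 - x̄)²/Sxx). The helper ci_mean_response() is hypothetical and the built-in cars data set stands in for the newspaper file. Because circulation is in thousands, the newspaper version would be ci_mean_response(data$Daily, data$Sunday, x0 = 500).

```r
# Hypothetical helper: CI for the mean response E(Y | x = x0)
ci_mean_response <- function(x, y, x0, level = 0.95) {
  fit  <- lm(y ~ x)
  n    <- length(x)
  s    <- summary(fit)$sigma             # residual standard error
  sxx  <- sum((x - mean(x))^2)
  yhat <- unname(coef(fit)[1] + coef(fit)[2] * x0)
  half <- qt(1 - (1 - level) / 2, n - 2) * s * sqrt(1 / n + (x0 - mean(x))^2 / sxx)
  c(fit = yhat, lwr = yhat - half, upr = yhat + half)
}

# Stand-in data: built-in 'cars' (speed in mph, dist in ft)
by_formula <- ci_mean_response(cars$speed, cars$dist, x0 = 20)

# Same interval from predict(); the model is fit with data = so newdata is used
fit <- lm(dist ~ speed, data = cars)
by_predict <- predict(fit, newdata = data.frame(speed = 20), interval = "confidence")

by_formula
by_predict
```

One design note: predict() only uses newdata when the model formula refers to column names. Since lmresult was fit as lm(data$Sunday ~ data$Daily), predict(lmresult, newdata = ...) would fall back to the 34 original rows (with a warning); refitting as lm(Sunday ~ Daily, data = data) and passing newdata = data.frame(Daily = 500) avoids this.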

  (g) The particular newspaper that is considering a Sunday edition has a daily circulation of 500,000. Provide an interval estimate (at the 95% level) for the predicted Sunday circulation of this paper. How does this interval differ from that given in (f)?

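The 95% prediction interval would be … This interval is wider than the one in (f) because it predicts the Sunday circulation of a single newspaper rather than the average over all newspapers with that daily circulation. On top of the uncertainty in the estimated regression line, it must also account for the random error of the individual newspaper, which adds an extra σ² to the variance of the prediction.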
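A sketch of the prediction interval, assuming the standard formula ŷ0 ± t(0.975, n-2) · s · sqrt(1 + 1/n + (x0 - x̄)²/Sxx); the extra "1 +" is the added σ² for a single new observation. The helper interval_at() is hypothetical and the built-in cars data set is a stand-in; the comparison with predict(..., interval = "prediction") and the width check illustrate why the interval in (g) is wider than the one in (f).

```r
# Hypothetical helper: interval at x0 for the mean response (new_obs = FALSE)
# or for a single new observation (new_obs = TRUE)
interval_at <- function(x, y, x0, new_obs = FALSE, level = 0.95) {
  fit  <- lm(y ~ x)
  n    <- length(x)
  s    <- summary(fit)$sigma
  sxx  <- sum((x - mean(x))^2)
  yhat <- unname(coef(fit)[1] + coef(fit)[2] * x0)
  # A new observation adds its own error variance: the leading 1 under the root
  se   <- s * sqrt(as.numeric(new_obs) + 1 / n + (x0 - mean(x))^2 / sxx)
  half <- qt(1 - (1 - level) / 2, n - 2) * se
  c(fit = yhat, lwr = yhat - half, upr = yhat + half)
}

conf_int <- interval_at(cars$speed, cars$dist, x0 = 20)
pred_int <- interval_at(cars$speed, cars$dist, x0 = 20, new_obs = TRUE)

# Check against predict() with a model fit via data =
fit <- lm(dist ~ speed, data = cars)
pred_check <- predict(fit, newdata = data.frame(speed = 20), interval = "prediction")

# Same centre, but the prediction interval is wider
rbind(confidence = conf_int, prediction = pred_int)
```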