Problem 1

  1. Intercept t-value: \(\frac{-17.5791}{6.7584} = -2.601\)

  2. speed t-value: \(\frac{3.9324}{0.4155} = 9.464\)

  3. speed p-value: the ’***’ code and the large t-value imply that the p-value is nearly zero (below 0.001)

  4. Residual standard error: \(\sqrt{\frac{11354}{48}} = 15.38\) on 48 degrees of freedom

  5. Multiple R-squared: \(1 - \frac{11354}{21186 + 11354} = 0.65108\)

  6. F-statistic: 89.5656 on 1 and 48 degrees of freedom

  7. p-value for F-statistic: with a single predictor, \(F = t^2\) for speed (\(9.464^2 \approx 89.57\)), so this p-value equals the speed p-value and is nearly zero

  8. speed mean square: 21186

  9. Residuals mean square: \(\frac{11354}{48} = 236.5417\)

  10. F-value: \(\frac{21186}{236.5417} = 89.5656\)

  11. p-value for F-value: as in item 7, ’***’ implies that the p-value is below 0.001
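
The entries above can be recomputed from the table values alone. The following is a minimal sketch in R; the variable names are illustrative, and the residual degrees of freedom are taken as 48, as stated in item 4.

```r
# Values copied from the regression output in Problem 1
est <- c(intercept = -17.5791, speed = 3.9324)   # coefficient estimates
se  <- c(intercept = 6.7584,  speed = 0.4155)    # standard errors
ss_speed <- 21186   # sum of squares explained by speed (1 df)
ss_res   <- 11354   # residual sum of squares
df_res   <- 48      # residual degrees of freedom (item 4)

t_val <- est / se                                # t = estimate / std. error
p_val <- 2 * pt(-abs(t_val), df = df_res)        # two-sided p-values

ms_res <- ss_res / df_res                        # residual mean square
rse    <- sqrt(ms_res)                           # residual standard error
r2     <- 1 - ss_res / (ss_speed + ss_res)       # multiple R-squared
f_val  <- ss_speed / ms_res                      # F = MS(speed) / MS(residual)
p_f    <- pf(f_val, 1, df_res, lower.tail = FALSE)

round(c(t_val, rse = rse, r2 = r2, f = f_val), 4)
```

With one predictor, `f_val` and `t_val["speed"]^2` agree closely; the small difference comes from rounding in the table entries.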

Problem 2

a.

Auto <- read.table("http://faculty.marshall.usc.edu/gareth-james/ISL/Auto.data", 
                  header=TRUE,
                  na.strings = "?",
                  stringsAsFactors = FALSE)
lim.fit <- lm(mpg ~ horsepower, data = Auto)
summary(lim.fit)
## 
## Call:
## lm(formula = mpg ~ horsepower, data = Auto)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.5710  -3.2592  -0.3435   2.7630  16.9240 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 39.935861   0.717499   55.66   <2e-16 ***
## horsepower  -0.157845   0.006446  -24.49   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.906 on 390 degrees of freedom
##   (5 observations deleted due to missingness)
## Multiple R-squared:  0.6059, Adjusted R-squared:  0.6049 
## F-statistic: 599.7 on 1 and 390 DF,  p-value: < 2.2e-16
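
The line "(5 observations deleted due to missingness)" follows from `na.strings = "?"`: entries recorded as `?` are read as `NA`, and `lm()` drops incomplete rows by default. A small sketch of the mechanism, using made-up inline data (the values below are hypothetical and not taken from Auto.data):

```r
# Hypothetical toy data: one horsepower entry is recorded as "?"
toy_txt <- "mpg horsepower
18 130
15 165
25 ?
32 65
24 95"

toy <- read.table(text = toy_txt, header = TRUE, na.strings = "?")
sum(is.na(toy$horsepower))   # number of missing horsepower values

toy_fit <- lm(mpg ~ horsepower, data = toy)
nobs(toy_fit)                # rows actually used in the fit
nrow(toy) - nobs(toy_fit)    # rows dropped due to missingness
```

Without `na.strings = "?"`, the column would be read as text rather than as numbers.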
  1. Is there a relationship between the predictor and the response?

The p-value for horsepower (< 2e-16) is far below 0.05, so there is a statistically significant relationship between the predictor and the response.

  2. How strong is the relationship between the predictor and the response?

The multiple R-squared value is 0.6059 (adjusted: 0.6049), so horsepower explains about 61% of the variance in mpg. This suggests a moderately strong relationship.

  3. Is the relationship between the predictor and the response positive or negative?

The regression coefficient for horsepower is negative (-0.157845), so the relationship is negative: mpg decreases as horsepower increases.
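
The quantities behind these answers can also be pulled from the fitted object instead of being read off the printed summary. This sketch uses R's built-in `mtcars` data (columns `mpg` and `hp`) as a stand-in so that it runs without downloading Auto.data; the object names are illustrative.

```r
# Stand-in data: built-in mtcars, with hp playing the role of horsepower
fit <- lm(mpg ~ hp, data = mtcars)
s   <- summary(fit)

slope_p <- coef(s)["hp", "Pr(>|t|)"]   # p-value of the slope t-test
r2      <- s$r.squared                 # multiple R-squared
adj_r2  <- s$adj.r.squared             # adjusted R-squared
slope   <- coef(fit)[["hp"]]           # sign gives the direction

slope_p < 0.05                # significant at the 5% level?
c(r2 = r2, adj_r2 = adj_r2)   # strength of the relationship
sign(slope)                   # -1 means mpg falls as hp rises
```

The same calls should work on `lim.fit` with `"horsepower"` in place of `"hp"`.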

  4. What is the predicted mpg associated with a horsepower of 98? What are the associated 95% confidence and prediction intervals?

Confidence Interval:

predict(lim.fit, data.frame(horsepower = c(98)), interval = "confidence")
##        fit      lwr      upr
## 1 24.46708 23.97308 24.96108

Prediction Interval:

predict(lim.fit, data.frame(horsepower = c(98)), interval = "prediction")
##        fit     lwr      upr
## 1 24.46708 14.8094 34.12476

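The 95% prediction interval is wider than the 95% confidence interval. The confidence interval only reflects uncertainty in the estimated mean mpg at that horsepower, while the prediction interval also includes the irreducible error of an individual observation that is not in the original dataset.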
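
The two intervals differ only in the standard error they use. As a sketch, both can be rebuilt by hand and compared with `predict()`. The built-in `mtcars` data (`mpg` and `hp`) stands in for Auto.data here so that the example is self-contained; all object names are illustrative.

```r
# Stand-in data: built-in mtcars (hp in place of horsepower)
fit <- lm(mpg ~ hp, data = mtcars)
x0  <- 98                                   # predictor value of interest
new <- data.frame(hp = x0)

n     <- nobs(fit)
x     <- mtcars$hp
sigma <- summary(fit)$sigma                 # residual standard error
tcrit <- qt(0.975, df = df.residual(fit))   # 95% two-sided critical value
y_hat <- sum(coef(fit) * c(1, x0))          # fitted value at x0

# Standard error of the mean response vs. of a new observation
se_mean <- sigma * sqrt(1 / n + (x0 - mean(x))^2 / sum((x - mean(x))^2))
se_new  <- sqrt(sigma^2 + se_mean^2)        # adds the irreducible error

ci_manual <- y_hat + c(-1, 1) * tcrit * se_mean
pi_manual <- y_hat + c(-1, 1) * tcrit * se_new

conf_int <- predict(fit, new, interval = "confidence")
pred_int <- predict(fit, new, interval = "prediction")
```

Because `se_new` is always larger than `se_mean`, the prediction interval is wider at every value of the predictor.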

b. and c.

library(tidyverse)
plot(Auto$horsepower, Auto$mpg, xlab = "horsepower", ylab = "mpg")
abline(lim.fit, col = "red")

par(mfrow=c(2,2))
plot(lim.fit)

In the Normal Q-Q plot, the separation of the points from the reference line at the right end of the plot suggests that the residuals are not quite normally distributed. Likewise, in the Residuals vs Leverage plot, there are a few data points with large residuals and large leverage.
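
The visual impressions from the diagnostic plots can be backed up numerically. The sketch below again uses the built-in `mtcars` data as a stand-in for Auto.data. The cutoffs (leverage above twice its average, absolute standardized residual above 2) are common rules of thumb assumed here for illustration, not fixed thresholds.

```r
# Stand-in data: built-in mtcars (hp in place of horsepower)
fit <- lm(mpg ~ hp, data = mtcars)

lev   <- hatvalues(fit)    # leverage of each observation
r_std <- rstandard(fit)    # standardized residuals

# Rules of thumb (assumptions, not fixed cutoffs):
# leverage above twice its average, |standardized residual| above 2
p <- length(coef(fit))     # number of coefficients; average leverage is p / n
high_lev  <- lev > 2 * p / nobs(fit)
large_res <- abs(r_std) > 2

# Observations that are both high-leverage and poorly fitted
names(which(high_lev & large_res))

# Formal check of residual normality to accompany the Q-Q plot
shapiro.test(residuals(fit))
```

Applied to `lim.fit`, the same calls would list the observations that stand out in the Residuals vs Leverage plot.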