Practice Problem 1

a. Intercept t-value: -17.5791/6.7584 = -2.601

b. speed t-value: 3.9324/0.4155 = 9.464

c. Because the output shows *** and the t-value is large in magnitude, the p-value for the t-statistic is nearly zero (below 0.001).

d. Residual standard error: 15.38 on 48 degrees of freedom

e. Multiple R-squared: 1 - (11354/32540) = 0.6510756

f. F-statistic = 89.57

g. The F-statistic is large and is marked ***, which leads us to believe that the p-value for the F-statistic is nearly zero.

h. speed Mean Sq: 21186

i. Residuals Mean Sq: 11354/48 -> 236.5417

j. F-value: 21186/236.5417 -> 89.57

k. The F-statistic is large and is marked ***, which leads us to believe that the p-value for the F-statistic is nearly zero.
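
As a check on the answers above, the arithmetic can be reproduced in R from the quoted numbers alone. This is only a minimal sketch: the variable names are ours, and the residual degrees of freedom are taken to be the 48 reported with the residual standard error.

```r
# Values copied from the regression and ANOVA output quoted in the answers
est <- c(intercept = -17.5791, speed = 3.9324)   # coefficient estimates
se  <- c(intercept = 6.7584,   speed = 0.4155)   # standard errors
df_resid <- 48                                   # residual degrees of freedom
ss_speed <- 21186                                # Sum Sq for speed (1 df)
ss_resid <- 11354                                # Sum Sq for residuals
ss_total <- ss_speed + ss_resid                  # 32540

# t-value = estimate / standard error; two-sided p-value from the t distribution
t_values <- est / se
p_values <- 2 * pt(-abs(t_values), df = df_resid)

# R-squared = 1 - RSS/TSS
r_squared <- 1 - ss_resid / ss_total

# Mean squares and F = MS(speed) / MS(residuals)
ms_resid <- ss_resid / df_resid
rse      <- sqrt(ms_resid)                       # residual standard error
f_value  <- (ss_speed / 1) / ms_resid
f_p      <- pf(f_value, df1 = 1, df2 = df_resid, lower.tail = FALSE)

print(round(t_values, 3))
print(signif(p_values, 3))
print(round(c(r_squared = r_squared, rse = rse, f_value = f_value), 4))
print(signif(f_p, 3))
```

The square root of the residual mean square should reproduce the residual standard error, which is a quick way to confirm that the right degrees of freedom were used.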

Problem 2 a

Auto <- read.table("http://faculty.marshall.usc.edu/gareth-james/ISL/Auto.data", 
                  header=TRUE,
                  na.strings = "?",
                  stringsAsFactors = FALSE)
lim.fit <- lm(mpg ~ horsepower, data = Auto)
summary(lim.fit)
## 
## Call:
## lm(formula = mpg ~ horsepower, data = Auto)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.5710  -3.2592  -0.3435   2.7630  16.9240 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 39.935861   0.717499   55.66   <2e-16 ***
## horsepower  -0.157845   0.006446  -24.49   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.906 on 390 degrees of freedom
##   (5 observations deleted due to missingness)
## Multiple R-squared:  0.6059, Adjusted R-squared:  0.6049 
## F-statistic: 599.7 on 1 and 390 DF,  p-value: < 2.2e-16
  1. The p-values are much lower than 0.05 (both are below 2e-16), which is strong evidence of statistical significance. This suggests that there is a relationship between the predictor and the response.

  2. There seems to be a moderately strong relationship between the predictor and response: the multiple R-squared is 0.6059 (adjusted 0.6049), so horsepower explains about 60% of the variance in mpg.

  3. The regression coefficient for the predictor, horsepower, is negative (-0.157845). This means the relationship is negative: mpg decreases as horsepower increases.

  4. confidence interval

predict(lim.fit, data.frame(horsepower = c(85)), interval = "confidence")
##        fit    lwr      upr
## 1 26.51906 25.973 27.06512

prediction interval

predict(lim.fit, data.frame(horsepower = c(85)), interval = "prediction")
##        fit      lwr      upr
## 1 26.51906 16.85857 36.17954

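The 95% prediction interval is wider than the 95% confidence interval because there is increased uncertainty when a new individual observation must be accounted for. The confidence interval only covers the mean mpg at that horsepower, while the prediction interval also includes the error of a single observation around that mean.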
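
To see where the extra width comes from, both intervals can be rebuilt by hand from the standard error of the fitted value and the residual standard error. The sketch below is only illustrative. It assumes the built-in mtcars data (with hp in place of horsepower) as a stand-in so that it runs without downloading Auto.data, so its numbers differ from the ones above.

```r
# Stand-in data: built-in mtcars (an assumption, not the Auto data used above)
fit <- lm(mpg ~ hp, data = mtcars)
new <- data.frame(hp = 85)

# se.fit = TRUE returns the fitted value, its standard error,
# the residual degrees of freedom and the residual standard error
pr      <- predict(fit, new, se.fit = TRUE)
fit_val <- unname(pr$fit)
se_mean <- unname(pr$se.fit)      # uncertainty in the mean response
sigma   <- pr$residual.scale      # residual standard error
t_crit  <- qt(0.975, df = pr$df)  # 95% two-sided critical value

# Confidence interval: uncertainty in the mean response only
ci_half <- t_crit * se_mean
# Prediction interval: adds the variance of one new observation
pi_half <- t_crit * sqrt(se_mean^2 + sigma^2)

manual_ci <- c(fit = fit_val, lwr = fit_val - ci_half, upr = fit_val + ci_half)
manual_pi <- c(fit = fit_val, lwr = fit_val - pi_half, upr = fit_val + pi_half)

# The same intervals as computed by predict()
builtin_ci <- predict(fit, new, interval = "confidence")
builtin_pi <- predict(fit, new, interval = "prediction")

print(rbind(manual_ci, builtin_ci[1, ]))
print(rbind(manual_pi, builtin_pi[1, ]))
```

The only difference between the two half-widths is the sigma^2 term under the square root, which is why the prediction interval can never be narrower than the confidence interval.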

library(tidyverse)
## ── Attaching packages ────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.2.1     ✓ purrr   0.3.3
## ✓ tibble  2.1.3     ✓ dplyr   0.8.3
## ✓ tidyr   1.0.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.4.0
## ── Conflicts ───────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
plot(Auto$horsepower, Auto$mpg, xlab = "horsepower", ylab = "mpg")
abline(lim.fit, col = "red")

par(mfrow=c(2,2))
plot(lim.fit)

In the Residuals vs. Leverage plot we see that there are points with relatively high leverage, which suggests that there may be problems with the fit. The residual plot also has a strong curve, which suggests that the relationship is not linear and that there may be problems with the fit.
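
Both observations can also be checked numerically instead of only by eye. The sketch below is one possible way to do it, not part of the original solution. It again assumes the built-in mtcars data as a stand-in for Auto, and the cutoff of twice the average leverage is a common rule of thumb, not a fixed standard.

```r
# Stand-in data: built-in mtcars (an assumption, not the Auto data used above)
fit <- lm(mpg ~ hp, data = mtcars)

# Leverage (hat values); they sum to the number of coefficients
lev <- hatvalues(fit)
p   <- length(coef(fit))   # 2: intercept and slope
n   <- nobs(fit)

# Rule-of-thumb cutoff (assumption): twice the average leverage p/n
threshold <- 2 * p / n
high_lev  <- lev[lev > threshold]
print(round(sort(high_lev, decreasing = TRUE), 3))

# Curvature check: add a squared term and compare the two nested
# models with an F-test; a small p-value points to non-linearity
fit2 <- lm(mpg ~ hp + I(hp^2), data = mtcars)
cmp  <- anova(fit, fit2)
print(cmp)
```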