HW 5

Problem 2

a.

Auto <- read.table("http://faculty.marshall.usc.edu/gareth-james/ISL/Auto.data", 
                  header=TRUE,
                  na.strings = "?",
                  stringsAsFactors = FALSE)
lim.fit <- lm(mpg ~ horsepower, data = Auto)
summary(lim.fit)

## 
## Call:
## lm(formula = mpg ~ horsepower, data = Auto)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.5710  -3.2592  -0.3435   2.7630  16.9240 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 39.935861   0.717499   55.66   <2e-16 ***
## horsepower  -0.157845   0.006446  -24.49   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.906 on 390 degrees of freedom
##   (5 observations deleted due to missingness)
## Multiple R-squared:  0.6059, Adjusted R-squared:  0.6049 
## F-statistic: 599.7 on 1 and 390 DF,  p-value: < 2.2e-16

Is there a relationship between the predictor and the response?

Yes. The p-value for horsepower (< 2e-16) is far below 0.05, so there is a statistically significant relationship between the predictor and the response.
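As a rough check, the p-value can be recomputed from the numbers printed in the coefficient table. This is a minimal sketch that copies the estimate, standard error, and degrees of freedom from the summary above; the variable names are illustrative only.

```r
# Values copied from the summary() output above.
estimate <- -0.157845   # horsepower coefficient
std_err  <- 0.006446    # its standard error
df_resid <- 390         # residual degrees of freedom

# t statistic for H0: slope = 0.
t_value <- estimate / std_err

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_value), df = df_resid)

# TRUE means the slope is significant at the 5% level.
p_value < 0.05
```

The recomputed t statistic should agree with the printed t value up to rounding.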

How strong is the relationship between the predictor and the response?

The multiple R-squared is 0.6059 (adjusted: 0.6049), so horsepower explains about 61% of the variance in mpg. This suggests a moderately strong relationship.
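The two R-squared values in the output can be reproduced from the F-statistic, which may help show how they relate. This is a sketch under the assumption of a single predictor and n = 392 complete observations (390 residual degrees of freedom plus 2 estimated coefficients); the variable names are illustrative.

```r
# Values copied from the summary() output above.
f_stat   <- 599.7
df_model <- 1      # one predictor
df_resid <- 390
n        <- df_resid + df_model + 1   # 392 complete observations

# R-squared recovered from the F-statistic.
r_squared <- (f_stat * df_model) / (f_stat * df_model + df_resid)

# Adjusted R-squared penalises the number of predictors.
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / df_resid

round(c(r_squared, adj_r_squared), 4)
```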

Is the relationship between the predictor and the response positive or negative?

The regression coefficient for horsepower is negative (-0.158), so the relationship is negative: mpg decreases as horsepower increases.
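To see how firmly the sign is established, a 95% confidence interval for the slope can be built from the printed estimate and standard error. This is a minimal sketch using values copied from the coefficient table; the variable names are illustrative.

```r
# Values copied from the Coefficients table above.
slope    <- -0.157845
slope_se <- 0.006446
df_resid <- 390

# 95% confidence interval for the slope.
t_crit   <- qt(0.975, df = df_resid)
slope_ci <- slope + c(-1, 1) * t_crit * slope_se

# Expected change in mpg for a 10-unit increase in horsepower.
delta_10hp <- 10 * slope

slope_ci
delta_10hp
```

If both ends of the interval are below zero, the data are consistent only with a negative slope at this confidence level.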

What is the predicted mpg associated with a horsepower of 85? What are the associated 95% confidence and prediction intervals?

Confidence Interval:

predict(lim.fit, data.frame(horsepower = c(85)), interval = "confidence")

##        fit    lwr      upr
## 1 26.51906 25.973 27.06512

Prediction Interval:

predict(lim.fit, data.frame(horsepower = c(85)), interval = "prediction")

##        fit      lwr      upr
## 1 26.51906 16.85857 36.17954

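The 95% prediction interval is wider than the confidence interval. The confidence interval only reflects uncertainty in the estimated mean mpg at this horsepower, while the prediction interval also includes the irreducible error of an individual observation not in the original dataset.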
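The link between the two intervals can be sketched numerically. Assuming the standard formulas for simple linear regression, the prediction interval uses sqrt(se_fit^2 + sigma^2) where the confidence interval uses se_fit. The code below copies the printed values and rebuilds the prediction interval from the confidence interval and the residual standard error; the variable names are illustrative, and small differences are expected because the inputs are rounded.

```r
# Values copied from the output above (horsepower = 85).
fit_value   <- 26.51906
ci_reported <- c(25.973, 27.06512)
pi_reported <- c(16.85857, 36.17954)
sigma_hat   <- 4.906    # residual standard error
df_resid    <- 390

t_crit <- qt(0.975, df = df_resid)

# Standard error of the fitted mean, backed out of the confidence interval.
se_fit <- (ci_reported[2] - ci_reported[1]) / 2 / t_crit

# A new observation adds the residual variance on top of that.
se_pred <- sqrt(se_fit^2 + sigma_hat^2)

pi_rebuilt <- fit_value + c(-1, 1) * t_crit * se_pred
rbind(pi_reported, pi_rebuilt)
```

Because sigma_hat is much larger than se_fit here, almost all of the prediction interval's width comes from the residual variance rather than from uncertainty in the fitted line.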

b. and c.

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──

## ✓ ggplot2 3.2.1     ✓ purrr   0.3.3
## ✓ tibble  2.1.3     ✓ dplyr   0.8.4
## ✓ tidyr   1.0.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.4.0

## ── Conflicts ────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

plot(Auto$horsepower, Auto$mpg, xlab = "horsepower", ylab = "mpg")
abline(lim.fit, col = "red")

par(mfrow=c(2,2))
plot(lim.fit)

In the Normal Q-Q plot, the points separate from the reference line at the right end of the plot, which suggests that the residuals are not quite normally distributed. Likewise, in the Residuals vs Leverage plot, there are a few data points with large residuals and large leverage.
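The points that stand out in the diagnostic plots can also be listed numerically. The sketch below is one possible approach, not the method used above: `flag_unusual` is a hypothetical helper, the cutoffs (leverage above twice the average, absolute studentized residual above 3) are common rules of thumb, and the built-in `mtcars` data stands in for `Auto` so the example runs without a download.

```r
# Hypothetical helper: list observations with high leverage or large
# studentized residuals. The cutoffs are rules of thumb, not fixed rules.
flag_unusual <- function(fit, resid_cut = 3, lev_mult = 2) {
  lev   <- hatvalues(fit)          # leverage of each observation
  stud  <- rstudent(fit)           # studentized residuals
  n_par <- length(coef(fit))       # number of coefficients (p + 1)
  n_obs <- length(lev)
  lev_cut <- lev_mult * n_par / n_obs   # multiple of the average leverage
  keep <- lev > lev_cut | abs(stud) > resid_cut
  data.frame(leverage = lev, rstudent = stud)[keep, ]
}

# Stand-in example on built-in data (assumption: mtcars replaces Auto).
demo_fit <- lm(mpg ~ hp, data = mtcars)
flagged  <- flag_unusual(demo_fit)
flagged
```

With the model from part a, the same call would be `flag_unusual(lim.fit)`.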

Johannes Griesser

3/9/2020