Intercept t-value: \(\frac{-17.5791}{6.7584} = -2.601\)
speed t-value: \(\frac{3.9324}{0.4155} = 9.464\)
speed p-value: '***' and a high t-value imply that the p-value is nearly zero
Residual standard error: 15.48 on 48 degrees of freedom
Multiple R-squared: \(1 - \frac{11354}{21186 + 11354} = 0.65108\)
F-statistic: 85.83372
p-value for F-statistic: Again, '***' and a high F-value imply that the p-value is nearly zero
speed mean square: 21186
Residuals mean square: \(\frac{11354}{46} = 246.8261\)
F-value: \(\frac{21186}{246.8261} = 85.8337\)
p-value for F-statistic: Again, '***' and a high F-value imply that the p-value is nearly zero
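The link between a large t-value and a near-zero p-value can be checked directly. The following is a minimal sketch that recomputes the t-values from the estimates and standard errors quoted above, then converts them to two-sided p-values. It assumes that the 48 residual degrees of freedom quoted above apply to both t-tests; the variable names are illustrative only.

```r
# Estimates and standard errors copied from the text above.
estimate  <- c(intercept = -17.5791, speed = 3.9324)
std_error <- c(intercept = 6.7584, speed = 0.4155)

# Assumption: the 48 residual degrees of freedom quoted above apply to both t-tests.
df_resid <- 48

# t-value = estimate / standard error
t_values <- estimate / std_error

# Two-sided p-value: probability of a |t| at least this large under the null hypothesis.
p_values <- 2 * pt(abs(t_values), df = df_resid, lower.tail = FALSE)

print(round(t_values, 3))
print(signif(p_values, 3))
```

Under these assumptions the p-value for speed falls far below 0.001, which is the threshold for the '***' significance code.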
Auto <- read.table("http://faculty.marshall.usc.edu/gareth-james/ISL/Auto.data",
                   header = TRUE,
                   na.strings = "?",
                   stringsAsFactors = FALSE)
lim.fit <- lm(mpg ~ horsepower, data = Auto)
summary(lim.fit)
##
## Call:
## lm(formula = mpg ~ horsepower, data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.5710 -3.2592 -0.3435 2.7630 16.9240
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.935861 0.717499 55.66 <2e-16 ***
## horsepower -0.157845 0.006446 -24.49 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.906 on 390 degrees of freedom
## (5 observations deleted due to missingness)
## Multiple R-squared: 0.6059, Adjusted R-squared: 0.6049
## F-statistic: 599.7 on 1 and 390 DF, p-value: < 2.2e-16
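The p-values are far below 0.05, implying a statistically significant relationship between the predictor and the response.
The R-squared value is 0.6059 (adjusted: 0.6049), so horsepower explains about 61% of the variance in mpg, a moderately strong relationship.
The regression coefficient is negative, suggesting a negative relationship: each additional unit of horsepower is associated with a decrease of about 0.158 mpg.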
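The strength and direction of the relationship can be cross-checked from the printed summary alone. The sketch below builds a 95% confidence interval for the slope and confirms that, with a single predictor, the F-statistic agrees with both the squared t-value and the R-squared value. All numbers are copied from the output above; the variable names are illustrative only.

```r
# Values copied from the summary(lim.fit) output above.
slope     <- -0.157845
slope_se  <- 0.006446
t_slope   <- -24.49
f_stat    <- 599.7
r_squared <- 0.6059
df_resid  <- 390

# 95% confidence interval for the slope:
# estimate +/- t critical value * standard error.
t_crit   <- qt(0.975, df = df_resid)
slope_ci <- slope + c(lower = -1, upper = 1) * t_crit * slope_se
print(round(slope_ci, 4))

# With a single predictor, the F-statistic equals the squared t-value
# of the slope and can also be recovered from R-squared.
f_from_t  <- t_slope^2
f_from_r2 <- r_squared / (1 - r_squared) * df_resid
print(c(f_from_t = f_from_t, f_from_r2 = f_from_r2))
```

The whole interval lies below zero, which is consistent with a negative, statistically significant slope.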
Confidence Interval:
predict(lim.fit, data.frame(horsepower = c(85)), interval = "confidence")
## fit lwr upr
## 1 26.51906 25.973 27.06512
Prediction Interval:
predict(lim.fit, data.frame(horsepower = c(85)), interval = "prediction")
## fit lwr upr
## 1 26.51906 16.85857 36.17954
The 95% prediction interval is wider than the confidence interval because it accounts for the irreducible error of an individual observation that is not in the original dataset, in addition to the uncertainty in the estimated mean response.
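The relationship between the two intervals can be checked from the printed output alone. In the sketch below, the standard error of the fitted mean is backed out of the confidence interval, and the prediction interval is then rebuilt by adding the residual variance. It assumes the rounded residual standard error of 4.906 on 390 degrees of freedom from the summary above, so small rounding differences are expected; the variable names are illustrative only.

```r
# Values copied from the predict() and summary() output above.
fit       <- 26.51906
ci_upr    <- 27.06512
pi_upr    <- 36.17954
sigma_hat <- 4.906   # residual standard error (rounded in the printed output)
df_resid  <- 390

t_crit <- qt(0.975, df = df_resid)

# Confidence interval half-width = t_crit * se(fitted mean).
ci_half <- ci_upr - fit
se_fit  <- ci_half / t_crit

# Prediction interval half-width adds the residual variance of a new observation.
pi_half_rebuilt <- t_crit * sqrt(sigma_hat^2 + se_fit^2)
pi_half_printed <- pi_upr - fit

print(c(ci_half = ci_half,
        pi_half_rebuilt = pi_half_rebuilt,
        pi_half_printed = pi_half_printed))
```

The rebuilt half-width should agree with the printed one up to rounding, and both are much larger than the confidence interval half-width.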
library(tidyverse)
plot(Auto$horsepower, Auto$mpg, xlab = "horsepower", ylab = "mpg")
abline(lim.fit, col = "red")
par(mfrow=c(2,2))
plot(lim.fit)
In the Normal Q-Q plot, the departure of the points from the reference line at the right end suggests that the residuals are not quite normally distributed. Likewise, in the Residuals vs Leverage plot, a few data points have both large residuals and high leverage.
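The points picked out by eye in the diagnostic plots can also be listed numerically. The following is a minimal sketch: `flag_influential` is a hypothetical helper, and its cutoffs (leverage above twice the average, absolute studentized residual above 2) are common rules of thumb assumed here, not thresholds taken from the analysis above. It is demonstrated on R's built-in `cars` data so that it runs without downloading anything; calling it on `lim.fit` would apply the same check to the Auto model.

```r
# Hypothetical helper: flag observations with high leverage or large
# studentized residuals. The cutoffs are rule-of-thumb assumptions.
flag_influential <- function(fit, resid_cutoff = 2) {
  leverage   <- hatvalues(fit)   # diagonal of the hat matrix
  stud_resid <- rstudent(fit)    # externally studentized residuals
  # Average leverage is (number of coefficients) / n; flag twice that.
  lev_cutoff <- 2 * length(coef(fit)) / length(leverage)
  data.frame(
    leverage       = leverage,
    rstudent       = stud_resid,
    high_leverage  = leverage > lev_cutoff,
    large_residual = abs(stud_resid) > resid_cutoff
  )
}

# Demonstration on the built-in cars data (not the Auto model above).
demo_fit <- lm(dist ~ speed, data = cars)
flags    <- flag_influential(demo_fit)

# Show only the flagged observations.
print(subset(flags, high_leverage | large_residual))
```

Observations flagged on both criteria at once are the ones most likely to pull the fitted line, so they are worth inspecting first.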