Filling in the ANOVA table:
tvalue_int = -2.6, tvalue_speed = 9.464
RSE: df = 48, R^2 = 0.6511, F-statistic: 89.57 on 1 and 48 df, p-value = 1.490127e-12
MS_speed = 21186, MS_Res = 236.54
F value = 89.566
Pr(>F) = 1.490127e-12
#Finding P value from F value
pf(89.566, 1, 48, lower.tail = FALSE)
## [1] 1.490127e-12
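As a quick consistency check on the values above, the F value can be recomputed as the ratio of the two mean squares. In a regression with a single predictor it should also equal the square of the slope's t value. The sketch below uses only the rounded numbers reported above, so small rounding differences are expected; the variable names are illustrative.

```r
# Rounded values copied from the results above
ms_speed <- 21186    # mean square for speed
ms_res   <- 236.54   # residual mean square
t_speed  <- 9.464    # t value for the speed slope
df_res   <- 48       # residual degrees of freedom

# F is the ratio of the two mean squares
f_value <- ms_speed / ms_res

# With a single predictor, F equals the squared t value (up to rounding)
t_squared <- t_speed^2

# The upper-tail F probability matches the two-sided t-test p-value
p_from_f <- pf(f_value, 1, df_res, lower.tail = FALSE)
p_from_t <- 2 * pt(abs(t_speed), df_res, lower.tail = FALSE)

print(c(f_value = f_value, t_squared = t_squared))
print(c(p_from_f = p_from_f, p_from_t = p_from_t))
```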
library(ISLR)
library(tidyverse)
## ── Attaching packages ────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.2.1 ✓ purrr 0.3.3
## ✓ tibble 2.1.3 ✓ dplyr 1.0.4
## ✓ tidyr 1.0.2 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.4.0
## ── Conflicts ───────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
data("Carseats")
head(Carseats)
##   Sales CompPrice Income Advertising Population Price ShelveLoc Age Education
## 1  9.50       138     73          11        276   120       Bad  42        17
## 2 11.22       111     48          16        260    83      Good  65        10
## 3 10.06       113     35          10        269    80    Medium  59        12
## 4  7.40       117    100           4        466    97    Medium  55        14
## 5  4.15       141     64           3        340   128       Bad  38        13
## 6 10.81       124    113          13        501    72       Bad  78        16
##   Urban  US
## 1   Yes Yes
## 2   Yes Yes
## 3   Yes Yes
## 4   Yes Yes
## 5   Yes  No
## 6    No Yes
Sales: numeric; Price: numeric; Urban: categorical; US: categorical. Both Urban and US are binary factors with levels Yes and No.
car_mlr = lm(Sales~Price+Urban+US, data=Carseats)
summary(car_mlr)
##
## Call:
## lm(formula = Sales ~ Price + Urban + US, data = Carseats)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -6.9206 -1.6220 -0.0564  1.5786  7.0581
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.043469   0.651012  20.036  < 2e-16 ***
## Price       -0.054459   0.005242 -10.389  < 2e-16 ***
## UrbanYes    -0.021916   0.271650  -0.081    0.936
## USYes        1.200573   0.259042   4.635 4.86e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.472 on 396 degrees of freedom
## Multiple R-squared: 0.2393, Adjusted R-squared: 0.2335
## F-statistic: 41.52 on 3 and 396 DF, p-value: < 2.2e-16
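Each coefficient describes how Sales changes with one predictor while the others are held fixed. For each one-unit increase in Price, Sales drops by about 0.054 on average. Stores in an urban location have Sales about 0.022 lower than non-urban stores, and stores in the US have Sales about 1.2 higher than stores outside the US.
The slope on Price is -0.054 in every case; only the intercept changes, according to the Urban and US coefficients. With y = Sales and x = Price, the model for the baseline group (a non-urban store outside the US) is: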
\[y = 13.043469 - 0.054459x\]
For an urban store outside the US, the equation is:
\[y = (13.043469 + (-0.021916)) - 0.054459x\]
For a non-urban store in the US, the equation is:
\[y = (13.043469 + 1.200573) - 0.054459x\]
For an urban store in the US, the equation is:
\[y = (13.043469 + (-0.021916) +1.200573) - 0.054459x\]
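The four equations can be tabulated in a few lines of R. This is a minimal sketch that hard-codes the estimates printed by `summary(car_mlr)` above; the object names and the example price of 100 are hypothetical choices for illustration.

```r
# Coefficient estimates copied from summary(car_mlr) above
b0      <- 13.043469   # intercept (baseline: Urban = No, US = No)
b_price <- -0.054459   # common slope on Price
b_urban <- -0.021916   # intercept shift for Urban = Yes
b_us    <-  1.200573   # intercept shift for US = Yes

# One intercept per Urban/US combination; the Price slope is shared
groups <- expand.grid(Urban = c("No", "Yes"), US = c("No", "Yes"))
groups$intercept <- b0 +
  b_urban * (groups$Urban == "Yes") +
  b_us    * (groups$US == "Yes")
groups$slope <- b_price

# Predicted Sales at an illustrative price of 100 (hypothetical value)
groups$sales_at_100 <- groups$intercept + groups$slope * 100
print(groups)
```

With the fitted model in memory, `predict(car_mlr, newdata = ...)` would give the same predictions without copying coefficients by hand.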
The p-value for Urban (0.936) is the only one greater than 0.05, so we fail to reject the null hypothesis that its coefficient is zero. For the other variables, Price and US, the p-values are far below 0.05, so we reject the null hypothesis.
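The decisions above can be reproduced from the reported t values. The sketch below assumes a two-sided test at the 5% level with the 396 residual degrees of freedom shown in the summary; because the t values are rounded, the p-values agree with the printed ones only approximately.

```r
# t values copied from summary(car_mlr) above; 396 residual degrees of freedom
t_values <- c(Price = -10.389, UrbanYes = -0.081, USYes = 4.635)
df_res   <- 396
alpha    <- 0.05

# Two-sided p-value for H0: coefficient = 0
p_values <- 2 * pt(abs(t_values), df_res, lower.tail = FALSE)

# TRUE means the null hypothesis is rejected at the 5% level
reject <- p_values < alpha
print(data.frame(t = t_values, p = signif(p_values, 3), reject = reject))
```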
small_mod = lm(Sales ~ Price + US, data = Carseats)
anova(small_mod)
## Analysis of Variance Table
##
## Response: Sales
##            Df  Sum Sq Mean Sq F value    Pr(>F)
## Price       1  630.03  630.03 103.319 < 2.2e-16 ***
## US          1  131.37  131.37  21.543 4.707e-06 ***
## Residuals 397 2420.87    6.10
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(car_mlr)
## Analysis of Variance Table
##
## Response: Sales
##            Df  Sum Sq Mean Sq  F value    Pr(>F)
## Price       1  630.03  630.03 103.0603 < 2.2e-16 ***
## Urban       1    0.10    0.10   0.0158    0.9001
## US          1  131.31  131.31  21.4802  4.86e-06 ***
## Residuals 396 2420.83    6.11
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The MSE (residual mean square) is 6.10 for the smaller model and 6.11 for the original model. The smaller model has a slightly lower MSE, but the two values are nearly identical, so dropping Urban costs essentially nothing in fit.
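Both MSE values are the residual sum of squares divided by the residual degrees of freedom, and the same numbers give a partial F statistic for dropping Urban. The sketch below works from the rounded table entries, so the F statistic is only approximate; the variable names are illustrative.

```r
# Residual sums of squares and degrees of freedom from the two ANOVA tables above
rss_small <- 2420.87; df_small <- 397   # Sales ~ Price + US
rss_full  <- 2420.83; df_full  <- 396   # Sales ~ Price + Urban + US

# MSE = residual sum of squares / residual degrees of freedom
mse_small <- rss_small / df_small
mse_full  <- rss_full / df_full

# Partial F statistic for the one extra term (Urban) in the full model
f_partial <- ((rss_small - rss_full) / (df_small - df_full)) / mse_full
p_partial <- pf(f_partial, df_small - df_full, df_full, lower.tail = FALSE)

print(round(c(mse_small = mse_small, mse_full = mse_full), 3))
print(c(f_partial = f_partial, p_partial = p_partial))
```

With both fitted models in memory, `anova(small_mod, car_mlr)` performs this comparison directly from the unrounded sums of squares.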
confint(car_mlr, level = .95)
##                   2.5 %      97.5 %
## (Intercept) 11.76359670 14.32334118
## Price       -0.06476419 -0.04415351
## UrbanYes    -0.55597316  0.51214085
## USYes        0.69130419  1.70984121
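We are 95% confident that the true value of the coefficient of Price is in the interval (-0.065, -0.044). Because this interval excludes zero, it agrees with rejecting the null hypothesis for Price at the 5% level.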
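The interval for Price can be rebuilt by hand as estimate ± t critical value × standard error. This is a minimal sketch using the estimate, standard error, and residual degrees of freedom printed in the model summary above; the variable names are illustrative.

```r
# Values copied from summary(car_mlr) above
est    <- -0.054459   # Price coefficient estimate
se     <- 0.005242    # its standard error
df_res <- 396         # residual degrees of freedom
level  <- 0.95

# Critical value of the t distribution for a two-sided interval
t_crit <- qt(1 - (1 - level) / 2, df_res)

# Lower and upper confidence limits
ci <- est + c(-1, 1) * t_crit * se
print(round(ci, 3))
```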