M. Drew LaMar
February 8, 2019
Definition: Multiple regression extends simple two-variable regression to the case of one response and many predictors (denoted \( x_1 \), \( x_2 \), \( x_3 \), …).
The method is motivated by scenarios where many variables may be simultaneously connected to an output.
We're going to look at auction data for the Mario Kart Wii game.
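library(dplyr)      # provides %>%, select(), mutate(), filter()
library(openintro)  # assumed source of the marioKart data set

mario_kart <- marioKart %>%
  dplyr::select(price = totalPr,
                cond,
                stock_photo = stockPhoto,
                duration, wheels) %>%
  # make "used" the reference level of cond
  mutate(cond = forcats::fct_relevel(cond, c("used", "new"))) %>%
  # keep only auctions with a total price below 100
  filter(price < 100)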
str(mario_kart)
'data.frame': 141 obs. of 5 variables:
$ price : num 51.5 37 45.5 44 71 ...
$ cond : Factor w/ 2 levels "used","new": 2 1 2 2 2 2 1 2 1 1 ...
$ stock_photo: Factor w/ 2 levels "no","yes": 2 2 1 2 2 2 2 2 2 1 ...
$ duration : int 3 7 3 3 1 3 1 1 3 7 ...
$ wheels : int 1 1 1 1 2 0 0 2 1 1 ...
library(ggplot2)

mario_kart %>%
  ggplot(aes(x = cond, y = price)) +
  # jitter horizontally so overlapping prices stay visible
  geom_point(position = position_jitter(width = 0.1))
mdl_cond <- lm(price ~ cond, data = mario_kart)
summary(mdl_cond)
Call:
lm(formula = price ~ cond, data = mario_kart)
Residuals:
     Min       1Q   Median       3Q      Max
-13.8911  -5.8311   0.1289   4.1289  22.1489
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   42.871      0.814  52.668  < 2e-16 ***
condnew       10.900      1.258   8.662 1.06e-14 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.371 on 139 degrees of freedom
Multiple R-squared: 0.3506, Adjusted R-squared: 0.3459
F-statistic: 75.03 on 1 and 139 DF, p-value: 1.056e-14
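To see what the two coefficients mean, here is a minimal sketch that rebuilds the fitted prices by hand. It assumes R's default treatment coding for factors, so that cond enters the model as a 0/1 indicator with "used" as the reference level. The names b0, b1 and is_new are illustrative only.

```r
# Coefficients copied from the summary(mdl_cond) output above
b0 <- 42.871   # intercept: estimated mean price for used games (reference level)
b1 <- 10.900   # condnew: estimated difference in mean price, new minus used

# cond enters the model as a 0/1 indicator: 0 = used, 1 = new
is_new <- c(used = 0, new = 1)

# Fitted price for each condition
predicted_price <- b0 + b1 * is_new
predicted_price
```

Under this coding the intercept is the fitted price for a used game, and adding the condnew estimate gives the fitted price for a new one.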
A multiple regression model is a linear model with many predictors. In general, we write the model as \[ \hat{y} = \beta_{0} + \beta_{1}x_{1} + \beta_{2}x_{2} + \cdots + \beta_{k}x_{k} \] when there are \( k \) predictors.
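As a concrete instance, suppose all four predictors in the Mario Kart data (cond, stock_photo, duration, wheels) are used, and the two factors are coded as 0/1 indicators as R does by default. The subscripted names below are illustrative labels rather than notation from the original slides. The model with \( k = 4 \) would then read \[ \widehat{\text{price}} = \beta_{0} + \beta_{1}x_{\text{new}} + \beta_{2}x_{\text{photo}} + \beta_{3}x_{\text{duration}} + \beta_{4}x_{\text{wheels}} \] where \( x_{\text{new}} \) is 1 for a new game and 0 for a used one, and \( x_{\text{photo}} \) is 1 when the listing uses a stock photo and 0 otherwise.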
mdl_full <- lm(price ~ cond + stock_photo + duration + wheels, data = mario_kart)
summary(mdl_full)
Call:
lm(formula = price ~ cond + stock_photo + duration + wheels,
data = mario_kart)
Residuals:
     Min       1Q   Median       3Q      Max
-11.3788  -2.9854  -0.9654   2.6915  14.0346
Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)    36.21097    1.51401  23.917  < 2e-16 ***
condnew         5.13056    1.05112   4.881 2.91e-06 ***
stock_photoyes  1.08031    1.05682   1.022    0.308
duration       -0.02681    0.19041  -0.141    0.888
wheels          7.28518    0.55469  13.134  < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.901 on 136 degrees of freedom
Multiple R-squared: 0.719, Adjusted R-squared: 0.7108
F-statistic: 87.01 on 4 and 136 DF, p-value: < 2.2e-16
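The fitted equation can be used to predict the price of a single auction. The sketch below does this by hand with the estimates printed above. The auction itself (a new game with a stock photo, duration 7, and 2 wheels) is a made-up example, and the names coefs and auction are illustrative.

```r
# Coefficient estimates copied from the summary(mdl_full) output above
coefs <- c(intercept      = 36.21097,
           condnew        =  5.13056,
           stock_photoyes =  1.08031,
           duration       = -0.02681,
           wheels         =  7.28518)

# Hypothetical auction: new game, stock photo, duration 7, 2 wheels.
# The leading 1 multiplies the intercept; factors enter as 0/1 indicators.
auction <- c(intercept = 1, condnew = 1, stock_photoyes = 1,
             duration = 7, wheels = 2)

# Prediction = sum of each coefficient times its predictor value
predicted_price <- sum(coefs * auction)
round(predicted_price, 2)   # roughly 56.80
```

With the fitted model object available, calling predict(mdl_full, newdata = ...) on a one-row data frame holding the same values should agree with this figure up to rounding of the printed coefficients.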