Select factors for LM

February 6, 2018

Presentation

This tool helps select the factors to include in the formula of a linear regression.

The dataset used is mtcars.

The context is the final assignment for the Linear Regression course.

The goal is to select the relevant factors to add to am as predictors, with mpg as the outcome.

Selecting factors / options for results

On the left side, you can select factors to add to the formula, by clicking on the corresponding checkbox.
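As a rough sketch of how a checkbox selection could be turned into a model formula: the vector `selected` below is a hypothetical stand-in for the app's checkbox input, and the app itself may build its formula differently.

```r
data(mtcars)

# Hypothetical stand-in for the checked boxes (not taken from the app's code)
selected <- c("hp", "wt", "qsec")

# am is always kept as a predictor; the selected factors are added to it
model_formula <- reformulate(c("am", selected), response = "mpg")

# Fit the linear model with the resulting formula
mf <- lm(model_formula, data = mtcars)
print(model_formula)
```

Building the formula with `reformulate()` avoids pasting strings together by hand and works for any number of selected factors.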

The list gives, for each factor:

the name of the factor
the variance inflation factor (in a regression with all columns as factors), noted vif
the Residual Sum of Squares of the model with am and the considered factor, noted rss
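The two indicators in the list can be reproduced with base R. This is a minimal sketch, assuming the vif is the usual 1 / (1 - R²) obtained by regressing each predictor on all the others (the app may instead call `vif()` from the car package, which should agree for numeric predictors). The names `vif_manual`, `rss` and `factor_table` are illustrative and do not come from the app.

```r
data(mtcars)

# All columns except the outcome are candidate predictors
predictors <- setdiff(names(mtcars), "mpg")

# Variance inflation factor of each predictor in the full model mpg ~ .
# Assumption: VIF_j = 1 / (1 - R2_j), with R2_j from regressing
# predictor j on all the other predictors
vif_manual <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  r2 <- summary(lm(reformulate(others, response = p), data = mtcars))$r.squared
  1 / (1 - r2)
})

# Residual sum of squares of the model mpg ~ am + factor,
# for each factor other than am
candidates <- setdiff(predictors, "am")
rss <- sapply(candidates, function(p) {
  fit <- lm(reformulate(c("am", p), response = "mpg"), data = mtcars)
  sum(residuals(fit)^2)
})

# One row per selectable factor
factor_table <- data.frame(factor = candidates,
                           vif = round(vif_manual[candidates], 2),
                           rss = round(rss, 2),
                           row.names = NULL)
print(factor_table)
```

A high vif signals that a factor is strongly correlated with the other columns, while a low rss signals that the factor explains a lot of the variation in mpg left over by am alone.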

You can also choose whether to display each result item (formula, regression results, Shapiro test, residuals plots).

Display results

The right part of the screen is updated, showing:

The resulting formula
The regression results (RSE - Residual Standard Error, R-squared, Adjusted R-squared)
A Shapiro test
The estimated effect of am on mpg (all other factors held constant) under this regression model
Residuals plots (for checking)
- Density curve of the residuals (normality)
- Residuals against predicted values (symmetry around 0)
- Residuals versus fitted (symmetry around 0)
- Normal Q-Q (normality of residuals)
- Scale-Location (no heteroscedasticity)
- Residuals versus leverage (no influential outliers)
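The effect of am on mpg listed above can be read from the fitted coefficients. The following is only a sketch, assuming the example formula mpg ~ am + hp + wt + qsec and a 95% confidence level; the names `am_effect` and `am_ci` are illustrative, and the app may present this value differently.

```r
data(mtcars)

# Example model (assumption: the same formula as in the example selection)
mf <- lm(mpg ~ am + hp + wt + qsec, data = mtcars)

# Estimated change in mpg when am goes from 0 (automatic) to 1 (manual),
# with the other factors held constant
am_effect <- coef(mf)["am"]

# 95% confidence interval for that coefficient
am_ci <- confint(mf, "am", level = 0.95)

print(round(am_effect, 2))
print(round(am_ci, 2))
```

If the confidence interval contains 0, the model does not support a clear effect of the transmission type once the other selected factors are taken into account.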

Example of selection

With the factors selected to give the formula mpg ~ am + hp + wt + qsec (the code below is evaluated on the next slide):

data(mtcars)
library(car)    # provides vif(); not used directly on this slide
library(knitr)  # provides kable()

# Fit the model with the selected factors
mf <- lm(mpg ~ am + hp + wt + qsec, data = mtcars)

# Regression summary statistics (summary computed once)
sm <- summary(mf)
r.squared <- round(sm$r.squared, 2)
adj.r.squared <- round(sm$adj.r.squared, 2)
rse <- round(sm$sigma, 2)

# Shapiro-Wilk normality test on the residuals
st <- shapiro.test(mf$residuals)
method <- st$method
W <- round(st$statistic, 2)
p.value <- round(st$p.value, 4)

# Single-row table with the regression and test results
dfprint <- data.frame(RSE = rse, R.sq = r.squared, Adj.R.sq = adj.r.squared,
                      Method = method, W = W, p.value = p.value)
kable(dfprint, format = "markdown", row.names = FALSE, align = "c")

# Residuals plots: two custom plots, then the four standard lm diagnostics
resid <- residuals(mf)
fitted <- fitted.values(mf)
par(mfrow = c(3, 2), mar = c(2, 2, 2, 2))
plot(density(resid), xlab = "Residuals", ylab = "Density", main = "")
plot(fitted, resid, xlab = "Predicted values", ylab = "Residuals")
abline(h = 0, col = "red", lty = "dashed")
plot(mf, cex = 0.8)

Results with selection

RSE	R.sq	Adj.R.sq	Method	W	p.value
2.43	0.86	0.84	Shapiro-Wilk normality test	0.94	0.0713