6 February 2018

Presentation

This tool helps select the factors to include in the formula of a linear regression.

The dataset used is mtcars.

The context is the final assignment for the Linear Regression course.

The goal is to select the relevant factors to add alongside am as predictors, with mpg as the outcome.

Selecting factors / options for results

   

On the left side, you can select the factors to add to the formula by ticking the corresponding checkboxes.
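As a rough sketch of how a set of ticked checkboxes could be turned into a model formula: the vector `selected` below is hypothetical and stands in for the user's selection; it is not taken from the tool's own code. `reformulate()` is a base R function.

```r
data(mtcars)

# Hypothetical selection, standing in for the ticked checkboxes
selected <- c("hp", "wt", "qsec")

# am is always kept as a predictor; mpg is the outcome
model_formula <- reformulate(c("am", selected), response = "mpg")
print(model_formula)

# The formula can then be passed directly to lm()
fit <- lm(model_formula, data = mtcars)
```

Building the formula from a character vector avoids pasting strings together by hand, and an empty selection can be handled by passing only "am".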

The list gives, for each factor

  • the name of the factor
  • the variance inflation factor (from a regression with all columns as factors), noted vif
  • the Residual Sum of Squares of the model with am and the considered factor, noted rss
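The two indicators in this list could be computed along the following lines. This is a minimal sketch, not the tool's actual code: the variance inflation factors are taken here from the diagonal of the inverse correlation matrix of the predictors (for a model with only numeric predictors this should agree with what `car::vif` reports), and the names `candidates` and `factor_table` are illustrative.

```r
data(mtcars)

# All columns except the outcome act as predictors in the full regression
predictors <- setdiff(names(mtcars), "mpg")

# VIF of each predictor: diagonal of the inverse correlation matrix
vif_all <- setNames(diag(solve(cor(mtcars[, predictors]))), predictors)

# Candidate factors are every predictor other than am
candidates <- setdiff(predictors, "am")

# RSS of the model mpg ~ am + <factor>, one model per candidate
rss <- sapply(candidates, function(f) {
  fit <- lm(reformulate(c("am", f), response = "mpg"), data = mtcars)
  sum(residuals(fit)^2)
})

factor_table <- data.frame(factor = candidates,
                           vif = round(vif_all[candidates], 2),
                           rss = round(rss, 2),
                           row.names = NULL)
print(factor_table)
```

A high vif flags a factor that is strongly correlated with the other columns, while a low rss flags a factor that explains much of the variation in mpg left over by am alone.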

You can also choose whether to display each result item (formula, regression results, Shapiro test, residual plots).

Display results

   

The right part of the screen is updated, showing:

  • The resulting formula
  • The regression results (RSE - Residual Standard Error, R-squared, Adjusted R-squared)
  • A Shapiro test
  • The impact of am on mpg (all other factors held constant) according to this regression model
  • Residual plots (for checking)
    • Density curve of the residuals (normality)
    • Residuals against predicted values (symmetry around 0)
    • Residuals versus fitted (symmetry around 0)
    • Normal Q-Q (normality of residuals)
    • Scale-location (no heteroscedasticity)
    • Residuals versus leverage (no outliers)
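The impact of am on mpg mentioned in the list can be read from the fitted model's am coefficient. The sketch below assumes the formula mpg ~ am + hp + wt + qsec purely for illustration, and the names `am_effect` and `am_ci` are not from the tool itself.

```r
data(mtcars)

# Example model; any formula containing am would work the same way
fit <- lm(mpg ~ am + hp + wt + qsec, data = mtcars)

# Estimated change in mpg when am goes from 0 to 1,
# with the other factors in the model held constant
am_effect <- coef(fit)["am"]

# 95% confidence interval for that effect
am_ci <- confint(fit, "am", level = 0.95)

print(am_effect)
print(am_ci)
```

If the confidence interval contains 0, the data do not clearly separate the two values of am once the other selected factors are accounted for.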

Example of selection

With the factors selected to give mpg ~ am + hp + wt + qsec (the code below is evaluated on the next slide):

data(mtcars)
library(car)
library(knitr)

# Fit the model for the selected factors
mf <- lm(mpg ~ am + hp + wt + qsec, data = mtcars)

# Regression results: R-squared, adjusted R-squared, residual standard error
smf <- summary(mf)
r.squared <- round(smf$r.squared, 2)
adj.r.squared <- round(smf$adj.r.squared, 2)
rse <- round(smf$sigma, 2)

# Shapiro-Wilk normality test on the residuals
st <- shapiro.test(mf$residuals)
method <- st$method
W <- round(st$statistic, 2)
p.value <- round(st$p.value, 4)

# Summary table of the results
dfprint <- data.frame(RSE = rse, R.sq = r.squared, Adj.R.sq = adj.r.squared,
                      Method = method, W = W, p.value = p.value)
kable(dfprint, format = "markdown", row.names = FALSE, align = "c")

# Residual plots: two custom plots, then the four standard lm diagnostics
resid <- residuals(mf)
fitted <- fitted.values(mf)
par(mfrow = c(3, 2), mar = c(2, 2, 2, 2))
plot(density(resid), xlab = "Residuals", ylab = "Density", main = "")
plot(fitted, resid, xlab = "Predicted values", ylab = "Residuals")
abline(h = 0, col = "red", lty = "dashed")
plot(mf, cex = 0.8)

Results with selection

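| RSE  | R.sq | Adj.R.sq |           Method            |  W   | p.value |
|:----:|:----:|:--------:|:---------------------------:|:----:|:-------:|
| 2.43 | 0.86 |   0.84   | Shapiro-Wilk normality test | 0.94 | 0.0713  |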