library(class)
library(MASS)
library(ISLR)
library(dplyr)
library(caret)
In this lab we study methods for selecting predictors, including best subset selection and the stepwise algorithm.
Algorithm:
AIC-based stepwise selection, as implemented by step()
https://www.rdocumentation.org/packages/stats/versions/3.6.0/topics/step
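The search uses the Akaike information criterion (AIC), an estimator of the relative quality of statistical models for a given data set. Given a collection of candidate models, AIC provides a means of comparing them, and the model with the lowest value is preferred: \(AIC = 2K - 2\ln(\hat{L})\), where \(K\) is the number of estimated parameters and \(\hat{L}\) is the maximized value of the model's likelihood.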
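To connect the formula to the numbers that step() prints, the following sketch recomputes the AIC of the intercept-only model by hand. It assumes the Auto data from ISLR and relies on the fact that, for lm fits, extractAIC() (which step() calls) drops additive constants and uses n * log(RSS / n) + 2 * k, with k the number of estimated coefficients. The variable names aic_manual, aic_step and aic_full are illustrative only.

```r
library(ISLR)  # provides the Auto data set used in this lab

# Intercept-only model, the starting point of the forward search
lm.null <- lm(mpg ~ 1, data = Auto)

n   <- nrow(Auto)                  # number of observations
rss <- sum(residuals(lm.null)^2)   # residual sum of squares
k   <- length(coef(lm.null))       # number of estimated coefficients

# AIC on the scale step() reports for lm fits (constants dropped)
aic_manual <- n * log(rss / n) + 2 * k

# Same quantity as computed by the function step() uses internally
aic_step <- extractAIC(lm.null)[2]

# Full-likelihood AIC from AIC(); it also counts the error variance
# as a parameter, so it differs from the value above by a constant
aic_full <- AIC(lm.null)

print(c(manual = aic_manual, step = aic_step, full = aic_full))
```

The first two values should agree with each other and with the "Start: AIC" line of the trace. The third differs only by a constant that is the same for every model fitted to the same data, so the ranking of candidate models is unaffected.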
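set.seed(1)
# Random half of the 392 observations, used only by lm.fit below
train <- sample(392, 196)
lm.fit <- lm(mpg ~ ., data = Auto, subset = train)
# Intercept-only model on the full data: starting point of the forward search
lm.null <- lm(mpg ~ 1, data = Auto)
forward <- step(lm.null, direction = "forward", trace = 1,
                scope = ~ cylinders + displacement + horsepower + weight +
                  acceleration + year + origin + name)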
Start: AIC=1611.93
mpg ~ 1
Df Sum of Sq RSS AIC
+ name 300 23039.2 779.7 871.58
+ weight 1 16497.8 7321.2 1151.49
+ displacement 1 15440.2 8378.8 1204.38
+ horsepower 1 14433.1 9385.9 1248.88
+ cylinders 1 14403.1 9415.9 1250.13
+ year 1 8027.7 15791.3 1452.81
+ origin 1 7609.2 16209.8 1463.07
+ acceleration 1 4268.5 19550.5 1536.52
<none> 23819.0 1611.93
Step: AIC=871.58
mpg ~ name
Df Sum of Sq RSS AIC
+ year 1 143.304 636.45 793.98
+ weight 1 90.918 688.83 824.98
+ displacement 1 85.861 693.89 827.85
+ horsepower 1 75.594 704.16 833.61
+ cylinders 1 47.303 732.45 849.05
<none> 779.75 871.58
+ acceleration 1 0.308 779.44 873.43
Step: AIC=793.98
mpg ~ name + year
Df Sum of Sq RSS AIC
+ weight 1 154.316 482.13 687.13
+ displacement 1 54.947 581.50 760.58
+ cylinders 1 37.765 598.68 772.00
+ horsepower 1 26.564 609.88 779.26
+ acceleration 1 7.389 629.06 791.40
<none> 636.45 793.98
Step: AIC=687.13
mpg ~ name + year + weight
Df Sum of Sq RSS AIC
+ acceleration 1 14.8962 467.23 676.82
+ cylinders 1 2.6621 479.47 686.95
<none> 482.13 687.13
+ horsepower 1 0.4372 481.69 688.77
+ displacement 1 0.4215 481.71 688.78
Step: AIC=676.82
mpg ~ name + year + weight + acceleration
Df Sum of Sq RSS AIC
+ horsepower 1 16.7956 450.44 664.47
+ cylinders 1 12.4871 454.75 668.20
+ displacement 1 9.1116 458.12 671.10
<none> 467.23 676.82
Step: AIC=664.47
mpg ~ name + year + weight + acceleration + horsepower
Df Sum of Sq RSS AIC
+ cylinders 1 11.5115 438.93 656.32
<none> 450.44 664.47
+ displacement 1 0.2658 450.17 666.24
Step: AIC=656.32
mpg ~ name + year + weight + acceleration + horsepower + cylinders
Df Sum of Sq RSS AIC
<none> 438.93 656.32
+ displacement 1 0.19562 438.73 658.15
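As the trace shows, the lowest AIC is reached at the final step, AIC=656.32, with the model mpg ~ name + year + weight + acceleration + horsepower + cylinders. Adding displacement would raise the AIC to 658.15, so the search stops there.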
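Each block of the trace above corresponds to one greedy pass over the remaining candidates. As a rough sketch of a single pass, the first step can be reproduced with add1(), which fits every one-term extension of the current model and reports its AIC. The names cand and best are illustrative, and this shows only one iteration, not the full loop inside step().

```r
library(ISLR)  # provides the Auto data set used in this lab

# Current model at the start of the search: intercept only
lm.null <- lm(mpg ~ 1, data = Auto)

# Evaluate every single-term addition allowed by the scope
cand <- add1(lm.null,
             scope = ~ cylinders + displacement + horsepower + weight +
               acceleration + year + origin + name)
print(cand)

# Forward selection adds the term with the lowest AIC, provided it
# beats the "<none>" row (the current model left unchanged)
best <- rownames(cand)[which.min(cand$AIC)]
print(best)
```

step() repeats this pass on the updated model until the "<none>" row has the lowest AIC, which is the situation in the last block of the trace.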
Finally, we inspect the ANOVA table of the selected model to see which added predictor contributed the smallest reduction in AIC.
forward$anova
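One possible way to read that table is to difference its AIC column, so that each row shows how much the AIC fell when the corresponding term was added. The sketch below refits the search silently (trace = 0) so that it runs on its own. The names path, aic_drop, summary_tbl and smallest are illustrative.

```r
library(ISLR)  # provides the Auto data set used in this lab

lm.null <- lm(mpg ~ 1, data = Auto)
forward <- step(lm.null, direction = "forward", trace = 0,
                scope = ~ cylinders + displacement + horsepower + weight +
                  acceleration + year + origin + name)

# One row per step: the first row is the starting model
path <- forward$anova

# Reduction in AIC achieved by each added term (NA for the start row)
aic_drop <- c(NA, -diff(path$AIC))

summary_tbl <- data.frame(step = as.character(path$Step),
                          AIC  = path$AIC,
                          drop = aic_drop)
print(summary_tbl)

# Term whose addition lowered the AIC the least
smallest <- summary_tbl$step[which.min(summary_tbl$drop)]
print(smallest)
```

Using the AIC values printed in the trace, the successive reductions are 740.35, 77.60, 106.85, 10.31, 12.35 and 8.15, so cylinders is the addition with the smallest contribution.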