library(class)
library(MASS)
library(ISLR)
library(dplyr)
library(caret)

Subset Selection

In this lab we study methods for selecting predictors and choosing the best subset of them, including the stepwise algorithm.

Stepwise Selection

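Algorithm:

- Let \(M_0\) denote the null model, which contains no predictors.
- For \(k = 0, \ldots, p-1\):
   - Consider all \(p-k\) models that augment the predictors in \(M_k\) with one additional predictor.
   - Choose the best among these \(p-k\) models, and call it \(M_{k+1}\). Here best is defined as having the smallest RSS or highest \(R^2\).
- Select a single best model from among \(M_0, \ldots, M_p\) using cross-validated prediction error, \(C_p\) (AIC), BIC, or adjusted \(R^2\).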
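The forward search can be written out by hand to make each step visible. The following is a minimal sketch, not the implementation used in this lab: the helper name forward_select is hypothetical, the built-in mtcars data set is swapped in so the block runs without extra packages, and the stopping rule (stop when no single addition lowers the AIC) is the one step() applies rather than the cross-validated choice among \(M_0, \ldots, M_p\).

```r
# Forward stepwise selection by AIC, written out by hand (illustrative helper).
forward_select <- function(data, response, candidates) {
  selected <- character(0)
  # M_0: the null model with only an intercept
  current <- lm(reformulate("1", response), data = data)
  current_aic <- extractAIC(current)[2]
  repeat {
    remaining <- setdiff(candidates, selected)
    if (length(remaining) == 0) break
    # Fit every model that adds exactly one of the remaining predictors
    aics <- sapply(remaining, function(v) {
      fit <- lm(reformulate(c(selected, v), response), data = data)
      extractAIC(fit)[2]
    })
    # Stop when no single addition lowers the AIC
    if (min(aics) >= current_aic) break
    selected <- c(selected, names(which.min(aics)))
    current <- lm(reformulate(selected, response), data = data)
    current_aic <- extractAIC(current)[2]
  }
  list(model = current, selected = selected, aic = current_aic)
}

# Example on mtcars (an assumption for illustration; the lab itself uses Auto)
res <- forward_select(mtcars, "mpg", c("cyl", "disp", "hp", "wt", "qsec"))
res$selected
```

Among models of the same size, the smallest AIC and the smallest RSS pick the same model, so the inner choice agrees with the algorithm above; only the stopping rule differs.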

Based on the AIC stepwise algorithm implemented by step() in the stats package:

https://www.rdocumentation.org/packages/stats/versions/3.6.0/topics/step

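Using the Akaike information criterion (AIC): an estimator of the relative quality of statistical models for a given data set. Given a collection of candidate models, AIC provides a means of choosing among them, with lower values preferred: \(\mathrm{AIC} = 2k - 2\ln(\hat{L})\), where \(k\) is the number of estimated parameters and \(\hat{L}\) is the maximized value of the likelihood.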
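For linear models, step() obtains its AIC from extractAIC(), which computes \(n \ln(\mathrm{RSS}/n) + 2\,\mathrm{edf}\); this equals the likelihood-based formula up to an additive constant that is the same for every model fitted to the same data. As a rough check, the sketch below recomputes two AIC values from the RSS figures in the trace printed further down. The helper name aic_lm is hypothetical, and the RSS inputs are the rounded values from that trace, so agreement is only to within rounding.

```r
# AIC as computed by extractAIC() for a linear model (up to an additive constant)
aic_lm <- function(rss, n, edf) {
  n * log(rss / n) + 2 * edf
}

n <- 392  # number of rows in Auto

# Null model: intercept only, so edf = 1
aic_null <- aic_lm(rss = 23819.0, n = n, edf = 1)

# mpg ~ weight: intercept plus one slope, so edf = 2
aic_weight <- aic_lm(rss = 7321.2, n = n, edf = 2)

# Compare with 1611.93 and 1151.49 in the trace
round(c(null = aic_null, weight = aic_weight), 1)
```

The same formula explains why name can enter first despite costing 300 degrees of freedom: its drop in RSS outweighs the penalty of 2 per parameter.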

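set.seed(1)
# Random half of the 392 rows; used only by lm.fit, not by the stepwise search
train <- sample(392, 196)
# Full model on the training half, kept for reference
lm.fit <- lm(mpg ~ ., data = Auto, subset = train)
# Null (intercept-only) model on all of Auto: the starting point for forward selection
lm.null <- lm(mpg ~ 1, data = Auto)
# Forward selection by AIC over all candidate predictors
forward <- step(lm.null, direction = "forward", trace = 1,
                scope = ~ cylinders + displacement + horsepower + weight +
                  acceleration + year + origin + name)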
Start:  AIC=1611.93
mpg ~ 1

                Df Sum of Sq     RSS     AIC
+ name         300   23039.2   779.7  871.58
+ weight         1   16497.8  7321.2 1151.49
+ displacement   1   15440.2  8378.8 1204.38
+ horsepower     1   14433.1  9385.9 1248.88
+ cylinders      1   14403.1  9415.9 1250.13
+ year           1    8027.7 15791.3 1452.81
+ origin         1    7609.2 16209.8 1463.07
+ acceleration   1    4268.5 19550.5 1536.52
<none>                       23819.0 1611.93

Step:  AIC=871.58
mpg ~ name

               Df Sum of Sq    RSS    AIC
+ year          1   143.304 636.45 793.98
+ weight        1    90.918 688.83 824.98
+ displacement  1    85.861 693.89 827.85
+ horsepower    1    75.594 704.16 833.61
+ cylinders     1    47.303 732.45 849.05
<none>                      779.75 871.58
+ acceleration  1     0.308 779.44 873.43

Step:  AIC=793.98
mpg ~ name + year

               Df Sum of Sq    RSS    AIC
+ weight        1   154.316 482.13 687.13
+ displacement  1    54.947 581.50 760.58
+ cylinders     1    37.765 598.68 772.00
+ horsepower    1    26.564 609.88 779.26
+ acceleration  1     7.389 629.06 791.40
<none>                      636.45 793.98

Step:  AIC=687.13
mpg ~ name + year + weight

               Df Sum of Sq    RSS    AIC
+ acceleration  1   14.8962 467.23 676.82
+ cylinders     1    2.6621 479.47 686.95
<none>                      482.13 687.13
+ horsepower    1    0.4372 481.69 688.77
+ displacement  1    0.4215 481.71 688.78

Step:  AIC=676.82
mpg ~ name + year + weight + acceleration

               Df Sum of Sq    RSS    AIC
+ horsepower    1   16.7956 450.44 664.47
+ cylinders     1   12.4871 454.75 668.20
+ displacement  1    9.1116 458.12 671.10
<none>                      467.23 676.82

Step:  AIC=664.47
mpg ~ name + year + weight + acceleration + horsepower

               Df Sum of Sq    RSS    AIC
+ cylinders     1   11.5115 438.93 656.32
<none>                      450.44 664.47
+ displacement  1    0.2658 450.17 666.24

Step:  AIC=656.32
mpg ~ name + year + weight + acceleration + horsepower + cylinders

               Df Sum of Sq    RSS    AIC
<none>                      438.93 656.32
+ displacement  1   0.19562 438.73 658.15

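As we can see, the lowest AIC is reached at the final step, AIC=656.32, with the model mpg ~ name + year + weight + acceleration + horsepower + cylinders. Note that name enters first with 300 degrees of freedom: it is a factor that identifies most cars individually, so this model likely overfits the data.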

Finally, we inspect the result of the selection and use the anova component to check which added term contributes the smallest reduction in AIC.

forward$anova
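The anova component lists one row per step together with the AIC reached at that step. As a sketch of how to read it, the block below reproduces the comparison from the AIC values in the trace above, so it runs without the fitted object; the variable names are illustrative, and with the real object the same vector is available as forward$anova$AIC.

```r
# Terms in the order they were added, and the AIC after each step,
# copied from the trace above (the first AIC is the null model)
terms_added <- c("name", "year", "weight", "acceleration", "horsepower", "cylinders")
aic_path <- c(1611.93, 871.58, 793.98, 687.13, 676.82, 664.47, 656.32)

# Reduction in AIC attributable to each added term
aic_drop <- setNames(-diff(aic_path), terms_added)
aic_drop

# Term with the smallest contribution
names(which.min(aic_drop))
```

By this measure the last term added, cylinders, contributes the least, lowering the AIC by about 8.15.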