The Bias-Variance Tradeoff and Cross Validation

Tony Fischetti
9/15/2015

Data Scientist at College Factual
@tonyfischetti
onthelambda.com


classification

mtcars

library(magrittr)
mtcars %>% names
 [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
[11] "carb"
mtcars %>% head
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Decision tree

(figure: decision tree)

why is this bad?

(conf.matrix2)
     [,1] [,2]
[1,]    7    4
[2,]    3   16
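
For reference, a matrix like this can be produced along these lines (a sketch, not the talk's actual code: the target variable and evaluation split aren't shown on the slide, so this assumes a classification tree predicting the transmission type am, scored against the data it was trained on):

library(rpart)
# treat am as a categorical outcome, fit a classification tree,
# and tabulate its predictions against the observed labels
mt <- transform(mtcars, am=factor(am))
tree.fit <- rpart(am ~ ., data=mt, method="class")
preds <- predict(tree.fit, type="class")
table(predicted=preds, actual=mt$am)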

irrelevant features

(cr.leaders)
                   name die.age born
1            Rosa Parks      92 1913
2 Martin Luther King Jr      39 1929
3              Malcom X      39 1925
4        Harriet Tubman      91 1822
5         Marcus Garvey      52 1887
6      Shirley Chisholm      80 1924

irrelevant features

(decision tree)

irrelevant features

(figure: decision tree)

irrelevant features

(more.leaders)
               name die.age born
1 Fredrick Douglass      77 1818
2      Marvel Cooke      97 1903
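
A sketch of the kind of overfitting being demonstrated here (not the slides' code; the rpart settings are chosen purely to let a tree memorize six rows):

library(rpart)
# with cp=0 and a tiny minsplit, the tree is free to memorize the training set
overfit.tree <- rpart(die.age ~ born, data=cr.leaders,
                      control=rpart.control(minsplit=2, cp=0))
# fitted values track the six leaders above very closely...
predict(overfit.tree)
# ...but predictions for unseen leaders are essentially arbitrary, because
# birth year carries no real information about age at death
predict(overfit.tree, newdata=more.leaders)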

linear regression

mtcars

model <- lm(mpg ~ ., data=mtcars)
summary(model)$r.squared
[1] 0.8690158
model2 <- lm(mpg ~ am + wt, data=mtcars)
summary(model2)$r.squared
[1] 0.7528348
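
A side note on why training R-squared rewards complexity (not from the talk; the seed and the junk column are purely illustrative): adding even a column of pure noise to the predictors can only raise the R-squared computed on the data the model was fit to.

set.seed(42)
# append a predictor that is pure noise, then refit the full model
with.noise <- transform(mtcars, junk=rnorm(nrow(mtcars)))
summary(lm(mpg ~ ., data=with.noise))$r.squared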

bias-variance tradeoff

(figure: bias-variance tradeoff curve)

why is it a tradeoff?
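
One way to make the tradeoff concrete is a small simulation (not from the talk; the data-generating process below is made up for illustration): as model complexity grows, error on the training sample keeps shrinking while error on fresh data eventually climbs back up.

set.seed(1)
# noisy samples from a sine curve, plus a second sample for testing
x <- runif(20, 0, 2*pi);      y <- sin(x) + rnorm(20, sd=0.5)
x.new <- runif(200, 0, 2*pi); y.new <- sin(x.new) + rnorm(200, sd=0.5)
# fit polynomials of increasing degree and record train/test MSE
errs <- sapply(1:15, function(d) {
  fit <- lm(y ~ poly(x, d))
  c(train=mean((fitted(fit) - y)^2),
    test=mean((predict(fit, data.frame(x=x.new)) - y.new)^2))
})
round(errs, 2)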

finding the optimal point

validation set approach

set.seed(1)
train.indices <- sample(1:nrow(mtcars), nrow(mtcars)/2)
training <- mtcars[train.indices,]
testing <- mtcars[-train.indices,]
model <- lm(mpg ~ ., data=training)
summary(model)$r.squared
[1] 0.9879687
mean((predict(model) - training$mpg) ^ 2)
[1] 0.4408109
mean((predict(model, newdata=testing) - testing$mpg) ^ 2)
[1] 337.9995

validation set approach

simpler.model <- lm(mpg ~ am + wt, data=training)
mean((predict(simpler.model) - training$mpg) ^ 2)
[1] 9.396091
mean((predict(simpler.model, newdata=testing) - testing$mpg) ^ 2)
[1] 12.70338

5-fold cross validation

library(boot)
bad.model <- glm(mpg ~ ., data=mtcars)
better.model <- glm(mpg ~ am + wt + qsec, data=mtcars)
bad.cv.err <- cv.glm(mtcars, bad.model, K=5)

bad.cv.err$delta[2]
[1] 9.351715
better.cv.err <- cv.glm(mtcars, better.model, K=5)
better.cv.err$delta[2]
[1] 6.746993
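
The second element of delta is the bias-adjusted estimate of prediction error (the first is the raw cross-validation estimate). Roughly what cv.glm(..., K=5) is doing behind the scenes, as a hand-rolled sketch for intuition (not boot's actual implementation):

set.seed(1)
# assign each row of mtcars to one of 5 folds
folds <- sample(rep(1:5, length.out=nrow(mtcars)))
# hold each fold out in turn, fit on the rest, and score on the held-out rows
cv.mse <- sapply(1:5, function(k) {
  fit <- lm(mpg ~ am + wt + qsec, data=mtcars[folds != k, ])
  mean((predict(fit, newdata=mtcars[folds == k, ]) - mtcars$mpg[folds == k])^2)
})
mean(cv.mse)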

bias-variance tradeoff

(figure: bias-variance tradeoff curve)

final lessons

  • interpret your results
  • rule of parsimony
  • be mindful of overfitting

thanks

Tony Fischetti

College Factual

@tonyfischetti

onthelambda.com