C. Donovan
14 Feb 2018
Want to choose the best model amongst a set of candidate models
We know how to fit some of these models and choose between them
Commonly encounter:
We want more model types!
Need-to-knows
A bit of drawing required
Take an \( n\times p \) matrix \( \mathbf{X} \), define a \( p \) dimensional space \( \mathcal{R}_p \). We wish to apply a simple rule recursively:
We could stop splitting based on a threshold for decreases in sum of squares, but this might rule out a useful split further down the tree.
Need-to-knows
leoIsDead <- read.csv("train.csv", header=TRUE)
##### Classification tree
library(rattle)
library(rpart.plot)
library(RColorBrewer)
library(rpart)
## women and children first!
simpleTree <- rpart(survived ~ sex + age, method="class", data=leoIsDead)
## plot the tree
plot(simpleTree, uniform=TRUE,
main="Women and Children First", margin = 0.1)
text(simpleTree, use.n=TRUE, all=TRUE)
## the rpart plot is terrible, so get something better
fancyRpartPlot(simpleTree)
## learn a tree for a larger subset of covariates
biggerTree <- rpart(survived ~ pclass + sex + age + sibsp + parch + fare, method="class", data=leoIsDead)
## view details
summary(biggerTree)
Call:
rpart(formula = survived ~ pclass + sex + age + sibsp + parch +
fare, data = leoIsDead, method = "class")
n= 891
CP nsplit rel error xerror xstd
1 0.44444444 0 1.0000000 1.0000000 0.04244576
2 0.03070175 1 0.5555556 0.5555556 0.03574957
3 0.02339181 3 0.4941520 0.5000000 0.03437157
4 0.02046784 4 0.4707602 0.5029240 0.03444798
5 0.01267057 5 0.4502924 0.5087719 0.03459945
6 0.01000000 8 0.4122807 0.5000000 0.03437157
Variable importance
sex fare pclass sibsp parch age
48 19 14 7 7 6
Node number 1: 891 observations, complexity param=0.4444444
predicted class=0 expected loss=0.3838384 P(node) =1
class counts: 549 342
probabilities: 0.616 0.384
left son=2 (577 obs) right son=3 (314 obs)
Primary splits:
sex splits as RL, improve=124.426300, (0 missing)
pclass < 2.5 to the right, improve= 43.781830, (0 missing)
fare < 10.48125 to the left, improve= 37.941940, (0 missing)
parch < 0.5 to the left, improve= 9.157774, (0 missing)
age < 6.5 to the right, improve= 8.814172, (177 missing)
Surrogate splits:
fare < 77.6229 to the left, agree=0.679, adj=0.089, (0 split)
parch < 0.5 to the left, agree=0.678, adj=0.086, (0 split)
Node number 2: 577 observations, complexity param=0.02339181
predicted class=0 expected loss=0.1889081 P(node) =0.647587
class counts: 468 109
probabilities: 0.811 0.189
left son=4 (553 obs) right son=5 (24 obs)
Primary splits:
age < 6.5 to the right, improve=10.788930, (124 missing)
fare < 26.26875 to the left, improve=10.216720, (0 missing)
pclass < 1.5 to the right, improve=10.019140, (0 missing)
parch < 0.5 to the left, improve= 3.350327, (0 missing)
sibsp < 0.5 to the left, improve= 1.501502, (0 missing)
Node number 3: 314 observations, complexity param=0.03070175
predicted class=1 expected loss=0.2579618 P(node) =0.352413
class counts: 81 233
probabilities: 0.258 0.742
left son=6 (144 obs) right son=7 (170 obs)
Primary splits:
pclass < 2.5 to the right, improve=31.163130, (0 missing)
fare < 48.2 to the left, improve=10.114210, (0 missing)
sibsp < 2.5 to the right, improve= 9.372551, (0 missing)
parch < 3.5 to the right, improve= 5.140857, (0 missing)
age < 12 to the left, improve= 1.891684, (53 missing)
Surrogate splits:
fare < 25.69795 to the left, agree=0.799, adj=0.563, (0 split)
sibsp < 1.5 to the right, agree=0.592, adj=0.111, (0 split)
parch < 1.5 to the right, agree=0.567, adj=0.056, (0 split)
age < 18.5 to the left, agree=0.564, adj=0.049, (0 split)
Node number 4: 553 observations
predicted class=0 expected loss=0.1681736 P(node) =0.620651
class counts: 460 93
probabilities: 0.832 0.168
Node number 5: 24 observations, complexity param=0.02046784
predicted class=1 expected loss=0.3333333 P(node) =0.02693603
class counts: 8 16
probabilities: 0.333 0.667
left son=10 (9 obs) right son=11 (15 obs)
Primary splits:
sibsp < 2.5 to the right, improve=8.8888890, (0 missing)
pclass < 2.5 to the right, improve=3.8095240, (0 missing)
fare < 20.825 to the right, improve=2.6666670, (0 missing)
age < 1.5 to the right, improve=0.6095238, (0 missing)
Surrogate splits:
pclass < 2.5 to the right, agree=0.792, adj=0.444, (0 split)
fare < 26.95 to the right, agree=0.750, adj=0.333, (0 split)
Node number 6: 144 observations, complexity param=0.03070175
predicted class=0 expected loss=0.5 P(node) =0.1616162
class counts: 72 72
probabilities: 0.500 0.500
left son=12 (27 obs) right son=13 (117 obs)
Primary splits:
fare < 23.35 to the right, improve=10.051280, (0 missing)
sibsp < 2.5 to the right, improve= 4.571429, (0 missing)
age < 38.5 to the right, improve= 3.875163, (42 missing)
parch < 1.5 to the right, improve= 3.773262, (0 missing)
Surrogate splits:
sibsp < 2.5 to the right, agree=0.882, adj=0.37, (0 split)
parch < 1.5 to the right, agree=0.882, adj=0.37, (0 split)
Node number 7: 170 observations
predicted class=1 expected loss=0.05294118 P(node) =0.1907969
class counts: 9 161
probabilities: 0.053 0.947
Node number 10: 9 observations
predicted class=0 expected loss=0.1111111 P(node) =0.01010101
class counts: 8 1
probabilities: 0.889 0.111
Node number 11: 15 observations
predicted class=1 expected loss=0 P(node) =0.01683502
class counts: 0 15
probabilities: 0.000 1.000
Node number 12: 27 observations
predicted class=0 expected loss=0.1111111 P(node) =0.03030303
class counts: 24 3
probabilities: 0.889 0.111
Node number 13: 117 observations, complexity param=0.01267057
predicted class=1 expected loss=0.4102564 P(node) =0.1313131
class counts: 48 69
probabilities: 0.410 0.590
left son=26 (93 obs) right son=27 (24 obs)
Primary splits:
age < 16.5 to the right, improve=2.4685870, (34 missing)
fare < 7.8875 to the right, improve=2.0325270, (0 missing)
sibsp < 0.5 to the right, improve=0.3076923, (0 missing)
parch < 1.5 to the left, improve=0.1582418, (0 missing)
Surrogate splits:
sibsp < 1.5 to the left, agree=0.747, adj=0.087, (34 split)
fare < 20.8 to the left, agree=0.747, adj=0.087, (0 split)
Node number 26: 93 observations, complexity param=0.01267057
predicted class=1 expected loss=0.4516129 P(node) =0.1043771
class counts: 42 51
probabilities: 0.452 0.548
left son=52 (56 obs) right son=53 (37 obs)
Primary splits:
fare < 7.8875 to the right, improve=2.92648500, (0 missing)
age < 36.5 to the right, improve=1.66181500, (33 missing)
sibsp < 0.5 to the right, improve=0.34061040, (0 missing)
parch < 1.5 to the left, improve=0.05969685, (0 missing)
Surrogate splits:
sibsp < 0.5 to the right, agree=0.667, adj=0.162, (0 split)
Node number 27: 24 observations
predicted class=1 expected loss=0.25 P(node) =0.02693603
class counts: 6 18
probabilities: 0.250 0.750
Node number 52: 56 observations, complexity param=0.01267057
predicted class=0 expected loss=0.4464286 P(node) =0.06285073
class counts: 31 25
probabilities: 0.554 0.446
left son=104 (33 obs) right son=105 (23 obs)
Primary splits:
fare < 14.8729 to the left, improve=3.30439500, (0 missing)
age < 23.5 to the left, improve=1.31484800, (13 missing)
parch < 1.5 to the left, improve=1.04027400, (0 missing)
sibsp < 0.5 to the left, improve=0.02216117, (0 missing)
Surrogate splits:
sibsp < 0.5 to the left, agree=0.696, adj=0.261, (0 split)
parch < 1.5 to the left, agree=0.679, adj=0.217, (0 split)
Node number 53: 37 observations
predicted class=1 expected loss=0.2972973 P(node) =0.04152637
class counts: 11 26
probabilities: 0.297 0.703
Node number 104: 33 observations
predicted class=0 expected loss=0.3030303 P(node) =0.03703704
class counts: 23 10
probabilities: 0.697 0.303
Node number 105: 23 observations
predicted class=1 expected loss=0.3478261 P(node) =0.02581369
class counts: 8 15
probabilities: 0.348 0.652
## plot the tree
fancyRpartPlot(biggerTree)
Consider our general regression problem (note can be classification): \[ \mathbf{y}=f(\mathbf{X}) + \mathbf{e} \] and the usual approximation model (linear in its parameters): \[ \mathbf{y}=\mathbf{X}\boldsymbol{\beta} + \mathbf{e} \]