Nathaniel Phillips, Economic Psychology, University of Basel
BaselR Meeting, March 2017
The tree dramatically outperformed the doctors' clinical judgments, producing far fewer false positives and huge cost savings.
To this day, the tree is still used at the hospital.
A fast and frugal decision tree (FFT) is a very simple, highly restricted decision tree.
In an FFT, each node has exactly two branches, where at least one branch is an exit branch (Martignon et al., 2008).
FFTs are even faster and require less information than non-FFT trees.
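To make the definition concrete, here is a minimal sketch of an FFT written as plain R control flow. The function name `classify_fft` and the cue thresholds are illustrative assumptions, not a tree fitted by any package; the cue names (thal, cp, ca) are borrowed from the heart disease data used later in this talk.

```r
# Hypothetical FFT over three cues. Thresholds are illustrative only.
# Every node has two branches and at least one of them is an exit.
classify_fft <- function(thal, cp, ca) {
  # Node 1: exit with "sick" if thal is abnormal, otherwise continue
  if (thal %in% c("rd", "fd")) return("sick")
  # Node 2: exit with "healthy" if chest pain is not type "a", otherwise continue
  if (cp != "a") return("healthy")
  # Final node: both branches are exits
  if (ca > 0) "sick" else "healthy"
}

classify_fft(thal = "fd", cp = "ta", ca = 0)      # decided at node 1
classify_fft(thal = "normal", cp = "np", ca = 3)  # decided at node 2
classify_fft(thal = "normal", cp = "a", ca = 2)   # decided at the final node
```

Because each node can exit, many cases are classified after looking at only one or two cues, which is where the speed and frugality come from.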
There is no off-the-shelf method to construct FFTs.
FFTrees
# Available on CRAN
install.packages("FFTrees")
# Or install the development version from GitHub
devtools::install_github("ndphillips/FFTrees", build_vignettes = TRUE)
library(FFTrees)
head(heartdisease)
## age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca
## 1 63 1 ta 145 233 1 hypertrophy 150 0 2.3 down 0
## 2 67 1 a 160 286 0 hypertrophy 108 1 1.5 flat 3
## 3 67 1 a 120 229 0 hypertrophy 129 1 2.6 flat 2
## 4 37 1 np 130 250 0 normal 187 0 3.5 down 0
## 5 41 0 aa 130 204 0 hypertrophy 172 0 1.4 up 0
## 6 56 1 aa 120 236 0 normal 178 0 0.8 up 0
## thal diagnosis
## 1 fd 0
## 2 normal 1
## 3 rd 1
## 4 normal 0
## 5 normal 0
## 6 normal 0
# Step 1: Create training and test data
set.seed(100)
heartdisease <- heartdisease[sample(nrow(heartdisease)),]
heart.train <- heartdisease[1:150,]
heart.test <- heartdisease[151:303,]
# Step 2: Create heart.fft
heart.fft <- FFTrees(formula = diagnosis ~ .,
                     data = heart.train,
                     data.test = heart.test)
# Step 3: Summary statistics
heart.fft
## [1] "7 FFTs using up to 4 of 13 cues"
## [1] "FFT #4 uses 3 cues {thal,cp,ca} with the following performance:"
## train test
## n 150.00 153.00
## pci 0.88 0.88
## mcu 1.74 1.73
## acc 0.80 0.82
## bacc 0.80 0.82
## sens 0.82 0.88
## spec 0.79 0.76
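The accuracy rows in this summary (acc, bacc, sens, spec) can be reproduced from a 2 x 2 confusion table. The sketch below uses base R only; `classification_stats` is a hypothetical helper written for illustration, not a function from the FFTrees package, and the `truth` and `pred` vectors are made-up toy data.

```r
# Compute accuracy statistics from binary truth and prediction vectors (1 = positive).
# Illustrative helper, not part of the FFTrees package.
classification_stats <- function(truth, pred) {
  hi <- sum(pred == 1 & truth == 1)  # hits
  fa <- sum(pred == 1 & truth == 0)  # false alarms
  mi <- sum(pred == 0 & truth == 1)  # misses
  cr <- sum(pred == 0 & truth == 0)  # correct rejections
  sens <- hi / (hi + mi)             # sensitivity: share of positives detected
  spec <- cr / (cr + fa)             # specificity: share of negatives detected
  c(acc  = (hi + cr) / length(truth),  # overall accuracy
    bacc = (sens + spec) / 2,          # balanced accuracy
    sens = sens,
    spec = spec)
}

# Toy data: 4 positive cases and 6 negative cases
truth <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
pred  <- c(1, 1, 1, 0, 0, 0, 0, 0, 1, 1)
stats <- classification_stats(truth, pred)
round(stats, 2)
```

Balanced accuracy averages sensitivity and specificity, so it is less affected than plain accuracy when the base rate is far from 0.5.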
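# Plot the accuracy of the individual cues
plot(heart.fft, what = "cues", main = "Heart Disease")
# Plot the tree alone, without performance statistics
plot(heart.fft,
     main = "Heart Disease",
     decision.names = c("healthy", "sick"),
     stats = FALSE)
# Plot the tree together with its performance statistics
plot(heart.fft,
     main = "Heart Disease",
     decision.names = c("healthy", "sick"))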
| dataset | cases | cues | base.rate |
|---|---|---|---|
| arrhythmia | 68 | 280 | 0.29 |
| audiology | 226 | 70 | 0.10 |
| breast | 683 | 10 | 0.35 |
| bridges | 92 | 10 | 0.39 |
| cmc | 1473 | 10 | 0.35 |
Table: 5 of the 10 prediction datasets