What is a Classification Tree?

Classification and regression trees are machine-learning methods for constructing prediction models from data. The models are obtained by recursively partitioning the data space and fitting a simple prediction model within each partition. As a result, the partitioning can be represented graphically as a decision tree. Classification trees are designed for dependent variables that take a finite number of unordered values, with prediction error measured in terms of misclassification cost.

The dataset contains 800 observations and 8 variables.

## 
## airbus boeing 
##    400    400

There are 400 observations from Airbus flights and 400 from Boeing flights.

Histogram

Splitting data into Training and Testing samples

Split the dataset in a 70:30 ratio: 70% of the observations form the training data on which the model is built, and the remaining 30% form the testing data, used to validate the model and check out-of-sample prediction performance in terms of how many observations are classified correctly.

set.seed(123456)
index <- sample(nrow(FAA), nrow(FAA)*0.70)
FAA.train <- FAA[index,]
FAA.test <- FAA[-index,]
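
As a quick sanity check on a split like the one above, the sketch below repeats the same steps on a small synthetic data frame. The `toy` data frame and its `id` column are hypothetical stand-ins, not the FAA data, and `round()` is an added safeguard so the sample size is always a whole number.

```r
# Hypothetical stand-in for the FAA data frame (the real data are not reproduced here)
toy <- data.frame(id = 1:800, speed_ground = seq(40, 130, length.out = 800))

set.seed(123456)
n_train <- round(0.70 * nrow(toy))   # round() guards against a fractional sample size
index   <- sample(nrow(toy), n_train)
toy.train <- toy[index, ]            # rows drawn for training
toy.test  <- toy[-index, ]           # all remaining rows

# The two samples should be disjoint and together cover every row
c(train = nrow(toy.train), test = nrow(toy.test))
length(intersect(toy.train$id, toy.test$id))
```

Negative indexing with `-index` is what guarantees that no observation appears in both samples.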

Model Building

A classification tree model is fitted on the training data with a symmetric cost function, using method="class" to predict the class of the response (0 or 1).

## n= 546 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 546 230 0 (0.57875458 0.42124542)  
##    2) speed_ground< 82.5009 301  22 0 (0.92691030 0.07308970)  
##      4) height< 40.42458 240   5 0 (0.97916667 0.02083333) *
##      5) height>=40.42458 61  17 0 (0.72131148 0.27868852)  
##       10) speed_ground< 72.5781 32   1 0 (0.96875000 0.03125000) *
##       11) speed_ground>=72.5781 29  13 1 (0.44827586 0.55172414)  
##         22) aircraft=airbus 12   1 0 (0.91666667 0.08333333) *
##         23) aircraft=boeing 17   2 1 (0.11764706 0.88235294) *
##    3) speed_ground>=82.5009 245  37 1 (0.15102041 0.84897959)  
##      6) speed_ground< 89.78981 85  33 1 (0.38823529 0.61176471)  
##       12) aircraft=airbus 46  13 0 (0.71739130 0.28260870) *
##       13) aircraft=boeing 39   0 1 (0.00000000 1.00000000) *
##      7) speed_ground>=89.78981 160   4 1 (0.02500000 0.97500000) *
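
The call that produced the tree above is not shown. A minimal sketch of such a fit, assuming the `rpart` package and using synthetic data in place of `FAA.train`, could look like this. The `toy` data frame and the rule that generates `risky_landing` are invented for illustration only; the column names are borrowed from the output above.

```r
library(rpart)

# Hypothetical stand-in for FAA.train: synthetic data, NOT the real FAA dataset
set.seed(1)
n <- 400
toy <- data.frame(
  speed_ground = runif(n, 40, 130),
  height       = runif(n, 10, 60),
  aircraft     = factor(sample(c("airbus", "boeing"), n, replace = TRUE))
)
# Invented rule: landings become risky once (noisy) ground speed exceeds 85
toy$risky_landing <- factor(as.integer(toy$speed_ground + rnorm(n, sd = 5) > 85))

# Symmetric cost: no `parms = list(loss = ...)`, so both error types weigh the same
toy.rpart <- rpart(risky_landing ~ ., data = toy, method = "class")

# type = "class" returns the predicted label for each row
pred <- predict(toy.rpart, toy, type = "class")
table(toy$risky_landing, pred, dnn = c("True", "Pred"))
```

In the printed tree, each node line lists the split, the number of observations, the loss, the predicted class, and the class probabilities, in the order given by the header `node), split, n, loss, yval, (yprob)`.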

The binary classification decision rule is: if the fitted P(y=1) > 0.5, then predict y=1. The value 0.5 is called the cut-off probability. You can choose the cut-off probability based on the mis-classification rate, a cost function, etc. In this case, the cost function expresses the trade-off between two kinds of error: predicting a safe landing when the landing was actually risky (predict 0, truth 1), and predicting a risky landing when the landing was actually safe (predict 1, truth 0).

The confusion tables below show the result of using a symmetric cost function with a cut-off probability of 0.5. Choosing a large cut-off probability will result in few cases being predicted as 1, and choosing a small cut-off probability will result in many cases being predicted as 1.

For training data

##     Pred
## True   0   1
##    0 310   6
##    1  20 210

For testing data

##     Pred
## True   0   1
##    0 141   6
##    1   9  79
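
To see how the cut-off probability changes the number of cases predicted as 1, the sketch below applies three cut-offs to a small vector of fitted probabilities. The values in `p_hat` and the helper `classify` are hypothetical and are not taken from the FAA model.

```r
# Hypothetical fitted probabilities P(y = 1); not taken from the FAA model
p_hat <- c(0.02, 0.08, 0.28, 0.45, 0.55, 0.61, 0.88, 0.97)

# Predict 1 when the fitted probability exceeds the cut-off, otherwise 0
classify <- function(p, cutoff) as.integer(p > cutoff)

# Count how many cases are predicted as 1 under each cut-off
cutoffs  <- c(0.1, 0.5, 0.9)
n_pred_1 <- sapply(cutoffs, function(k) sum(classify(p_hat, k)))
names(n_pred_1) <- paste0("cutoff=", cutoffs)
n_pred_1
```

The count of predicted 1s falls from 6 to 4 to 1 as the cut-off rises from 0.1 to 0.5 to 0.9.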

Let us define a cost-function to calculate the mis-classification or error rate of the Classification Tree model.

cost <- function(r, pi){
  weight1 = 1
  weight0 = 1
  c1 = (r==1)&(pi==0) #logical vector - true if actual 1 but predict 0
  c0 = (r==0)&(pi==1) #logical vector - true if actual 0 but predict 1
  return(mean(weight1*c1+weight0*c0))
}

Mis-classification rate for Training data

## [1] 0.04761905

Mis-classification rate for Testing data

## [1] 0.06382979
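
Both rates can be checked directly from the confusion tables: with equal weights, the cost is simply the number of off-diagonal cases divided by the total. The sketch below rebuilds label vectors from the table counts; the helper `expand_counts` is a hypothetical name introduced only for this check.

```r
# Same symmetric cost function as above, with both weights equal to 1
cost <- function(r, pi) {
  c1 <- (r == 1) & (pi == 0)   # actual 1 but predicted 0
  c0 <- (r == 0) & (pi == 1)   # actual 0 but predicted 1
  mean(1 * c1 + 1 * c0)
}

# Rebuild truth/prediction vectors from the four cell counts of a 2x2 table:
# (true 0, pred 0), (true 0, pred 1), (true 1, pred 0), (true 1, pred 1)
expand_counts <- function(n00, n01, n10, n11) {
  counts <- c(n00, n01, n10, n11)
  list(truth = rep(c(0, 0, 1, 1), counts),
       pred  = rep(c(0, 1, 0, 1), counts))
}

train <- expand_counts(310, 6, 20, 210)
test  <- expand_counts(141, 6, 9, 79)

cost(train$truth, train$pred)   # (6 + 20) / 546
cost(test$truth, test$pred)     # (6 + 9) / 235
```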

ROC Curve

The Receiver Operating Characteristic (ROC) curve is one way to give an overall measure of goodness of classification. Rather than using a single overall misclassification rate, it employs two measures: the true positive fraction (TPF) and the false positive fraction (FPF).

True positive fraction, TPF = TP/(TP+FN): the proportion of actual positives correctly predicted as positive.

False positive fraction, FPF = FP/(FP+TN) = 1 − TN/(FP+TN): the proportion of actual negatives incorrectly predicted as positive.
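
As a worked example, the two fractions can be computed from the testing confusion table of the symmetric tree (cut-off 0.5). The matrix `conf` below is typed in by hand from that table; it gives a single point on the ROC curve, which traces the same two fractions over all cut-offs.

```r
# Testing confusion table of the symmetric tree, rows = truth, columns = prediction
conf <- matrix(c(141, 9, 6, 79), nrow = 2,
               dimnames = list(True = c("0", "1"), Pred = c("0", "1")))

TP <- conf["1", "1"]   # actual 1, predicted 1
FN <- conf["1", "0"]   # actual 1, predicted 0
FP <- conf["0", "1"]   # actual 0, predicted 1
TN <- conf["0", "0"]   # actual 0, predicted 0

TPF <- TP / (TP + FN)  # 79 / 88
FPF <- FP / (FP + TN)  # 6 / 147
c(TPF = TPF, FPF = FPF)
```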

Confusion matrix

AUC

Calculating Area Under the Curve for Out-of-sample performance

## [1] 0.9542749

The above figure shows that the model separates the two classes well out of sample, classifying most 0s as 0 and most 1s as 1, with an area under the curve of approximately 95% (0.954).
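
The code that produced the AUC is not shown, so the package used is unknown. As an illustration only, the sketch below computes the AUC in base R with the rank-based (Mann-Whitney) formula; the function `auc_rank` and the `truth` and `score` vectors are hypothetical.

```r
# AUC as the probability that a random positive scores higher than a random
# negative (ties count one half), computed from ranks
auc_rank <- function(truth, score) {
  r  <- rank(score)            # tied scores receive their average rank
  n1 <- sum(truth == 1)        # number of positives
  n0 <- sum(truth == 0)        # number of negatives
  (sum(r[truth == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Hypothetical labels and predicted probabilities, not the FAA results
truth <- c(0, 0, 0, 1, 0, 1, 1, 1)
score <- c(0.02, 0.03, 0.28, 0.28, 0.61, 0.88, 0.94, 0.98)
auc_rank(truth, score)
```

For a fitted tree, `score` would be the predicted probability of class 1, for example the second column of `predict(..., type = "prob")`.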

Classification tree using Asymmetric cost function

FAA.rpart <- rpart(formula = risky_landing ~ . -distance , data = FAA.train,
                   method = "class", parms = list(loss=matrix(c(0,10,1,0),
                                                              nrow = 2)))
FAA.rpart
## n= 546 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##   1) root 546 316 1 (0.578754579 0.421245421)  
##     2) speed_ground< 72.60679 191  10 0 (0.994764398 0.005235602) *
##     3) speed_ground>=72.60679 355 126 1 (0.354929577 0.645070423)  
##       6) speed_ground< 79.29716 75  66 1 (0.880000000 0.120000000)  
##        12) height< 38.83051 55   0 0 (1.000000000 0.000000000) *
##        13) height>=38.83051 20  11 1 (0.550000000 0.450000000)  
##          26) aircraft=airbus 9   0 0 (1.000000000 0.000000000) *
##          27) aircraft=boeing 11   2 1 (0.181818182 0.818181818) *
##       7) speed_ground>=79.29716 280  60 1 (0.214285714 0.785714286)  
##        14) speed_ground< 89.78981 120  56 1 (0.466666667 0.533333333)  
##          28) aircraft=airbus 67  53 1 (0.791044776 0.208955224)  
##            56) height< 25.00017 16   0 0 (1.000000000 0.000000000) *
##            57) height>=25.00017 51  37 1 (0.725490196 0.274509804)  
##             114) speed_ground< 81.29059 11   0 0 (1.000000000 0.000000000) *
##             115) speed_ground>=81.29059 40  26 1 (0.650000000 0.350000000) *
##          29) aircraft=boeing 53   3 1 (0.056603774 0.943396226) *
##        15) speed_ground>=89.78981 160   4 1 (0.025000000 0.975000000) *
prp(FAA.rpart, extra=1)
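
The loss matrix passed through `parms` is filled column by column, which makes it easy to misread. The sketch below rebuilds the same matrix with row and column labels, which are added here only for readability, and checks it against the root node of the tree above. In `rpart`, rows correspond to the true class and columns to the predicted class.

```r
# Same values as in the rpart() call; matrix() fills column by column
loss <- matrix(c(0, 10, 1, 0), nrow = 2,
               dimnames = list(True = c("0", "1"), Pred = c("0", "1")))
loss

# Root node of the training data: 316 observations of class 0, 230 of class 1
n0 <- 316
n1 <- 230
loss_if_predict_1 <- n0 * loss["0", "1"]   # every true 0 becomes a false positive
loss_if_predict_0 <- n1 * loss["1", "0"]   # every true 1 becomes a false negative
c(predict_1 = loss_if_predict_1, predict_0 = loss_if_predict_0)
```

Predicting 1 at the root costs 316, while predicting 0 would cost 2300, which is why the root node reports class 1 with a loss of 316 even though class 0 is the majority.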

For training data

##     Pred
## True   0   1
##    0 281  35
##    1   1 229

For testing data

pred_test_asym <- predict(FAA.rpart, FAA.test, type="class")
table(FAA.test$risky_landing, pred_test_asym, dnn = c("True", "Pred"))
##     Pred
## True   0   1
##    0 126  21
##    1   0  88

Let us define a cost function to calculate the weighted mis-classification cost of the Classification Tree model. Here we assume that a false negative costs 10 times as much as a false positive. In real life, the cost structure should be carefully researched.

cost_asym <- function(r, pi){
  weight1 = 10
  weight0 = 1
  c1 = (r==1)&(pi==0) #logical vector - true if actual 1 but predict 0
  c0 = (r==0)&(pi==1) #logical vector - true if actual 0 but predict 1
  return(mean(weight1*c1+weight0*c0))
}

Asymmetric mis-classification cost for Training data

## [1] 0.08241758

Asymmetric mis-classification cost for Testing data

## [1] 0.0893617
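
These values follow from the confusion tables: each false negative counts 10 and each false positive counts 1, divided by the number of observations. The sketch below reproduces them from the counts and, for comparison, applies the same weights to the confusion tables of the symmetric tree. The helper `weighted_cost` is a hypothetical name used only here.

```r
# Weighted cost from counts: false negatives weigh 10, false positives weigh 1
weighted_cost <- function(fn, fp, n, w_fn = 10, w_fp = 1) {
  (w_fn * fn + w_fp * fp) / n
}

# Tree grown with the asymmetric loss matrix
asym_train <- weighted_cost(fn = 1, fp = 35, n = 546)   # (10*1 + 35) / 546
asym_test  <- weighted_cost(fn = 0, fp = 21, n = 235)   # (10*0 + 21) / 235

# Tree grown with the symmetric cost, scored under the same 10:1 weights
sym_train <- weighted_cost(fn = 20, fp = 6, n = 546)    # (10*20 + 6) / 546
sym_test  <- weighted_cost(fn = 9,  fp = 6, n = 235)    # (10*9 + 6) / 235

round(c(asym_train = asym_train, asym_test = asym_test,
        sym_train = sym_train, sym_test = sym_test), 4)
```

Under the 10:1 weighting, the asymmetric tree is far cheaper than the symmetric one, because it trades a few extra false positives for almost no false negatives.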

Calculating Area Under the Curve for Out-of-sample performance

## [1] 0.9753015