Decision trees

dr. Annelies Agten

2026-04-18

Decision tree models are a flexible and interpretable method for classification (and regression). They work by recursively splitting the data into subsets based on values of the predictor variables, with the goal of creating groups that are as homogeneous as possible with respect to the response variable. The final result is a tree-like structure consisting of decision rules that lead to a predicted class.

One of the main advantages of decision trees is their interpretability: the model can be visualised and understood as a sequence of if–then rules, making it easy to explain how predictions are made.

In the following sections, we will learn how to fit decision tree models in R, how to visualise the resulting tree, and how to evaluate its predictive performance.

Decision trees in R

Decision trees in R are typically fitted using the rpart() function from the rpart package. The general syntax is:

Syntax:
rpart(formula, data, method = "class", control = rpart.control())

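Where:

- formula: the model formula, with the response on the left and the predictors on the right (e.g. y ~ . to use all other columns as predictors);
- data: the data frame containing the variables in the formula;
- method: the type of tree; "class" fits a classification tree and "anova" a regression tree;
- control: optional settings from rpart.control(), such as minsplit, maxdepth and cp, that govern how far the tree is grown.

The resulting model contains a set of decision rules that can be used to classify new observations.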
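As a minimal sketch of this syntax, the block below fits a classification tree to R's built-in iris data (an assumption: it is used only as a small stand-in that needs no download) and classifies one observation. The control values are simply rpart's defaults for minsplit and cp, written out to show where such settings go.

```r
library(rpart)

# iris ships with R and is only a stand-in for the wine data here
fit <- rpart(
  Species ~ .,                                       # response ~ all other columns
  data = iris,
  method = "class",                                  # classification tree
  control = rpart.control(minsplit = 20, cp = 0.01)  # rpart's default values
)

# Classify a "new" observation (here simply the predictors of the first row)
new_obs <- iris[1, -5]
pred <- predict(fit, newdata = new_obs, type = "class")
print(pred)  # setosa
```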

Read data

First we need to read our data into R. Throughout this example we will use the wine data. These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines.

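The 13 attributes are: Alcohol, Malic acid, Ash, Alcalinity of ash, Magnesium, Total phenols, Flavanoids, Nonflavanoid phenols, Proanthocyanins, Color intensity, Hue, OD280/OD315 of diluted wines, and Proline.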

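The wine data are stored as a plain-text file with comma-separated values and no header row, so we can read them with the read.table() function using sep = "," and then add the column names ourselves.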

wine <- read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data", sep=",")

colnames(wine) <- c("Cultivar","Alcohol","Malic acid","Ash","Alcalinity of ash","Magnesium","Total phenols","Flavanoids","Nonflavanoid phenols","Proanthocyanins","Color intensity","Hue","OD280/OD315 of diluted wines","Proline")

dim(wine)
#> [1] 178  14

head(wine, 5)
#>   Cultivar Alcohol Malic acid  Ash Alcalinity of ash Magnesium Total phenols
#> 1        1   14.23       1.71 2.43              15.6       127          2.80
#> 2        1   13.20       1.78 2.14              11.2       100          2.65
#> 3        1   13.16       2.36 2.67              18.6       101          2.80
#> 4        1   14.37       1.95 2.50              16.8       113          3.85
#> 5        1   13.24       2.59 2.87              21.0       118          2.80
#>   Flavanoids Nonflavanoid phenols Proanthocyanins Color intensity  Hue
#> 1       3.06                 0.28            2.29            5.64 1.04
#> 2       2.76                 0.26            1.28            4.38 1.05
#> 3       3.24                 0.30            2.81            5.68 1.03
#> 4       3.49                 0.24            2.18            7.80 0.86
#> 5       2.69                 0.39            1.82            4.32 1.04
#>   OD280/OD315 of diluted wines Proline
#> 1                         3.92    1065
#> 2                         3.40    1050
#> 3                         3.17    1185
#> 4                         3.45    1480
#> 5                         2.93     735

The wine dataset contains 178 observations of 14 variables: the 13 measured constituents and the variable Cultivar, which indicates the grape cultivar (1, 2 or 3) from which the wine was produced.
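Because the file has no header row, read.table() only sees comma-separated numbers. As a small offline sketch, the block below parses the first two rows shown in the head() output above from an inline string with read.table(text = ...) instead of the URL, then assigns the same column names. Several names contain spaces or a slash, so they have to be quoted with backticks when accessed with $. The object name wine_small is illustrative only.

```r
# First two rows of wine.data, copied from the head() output above
raw <- "1,14.23,1.71,2.43,15.6,127,2.80,3.06,0.28,2.29,5.64,1.04,3.92,1065
1,13.20,1.78,2.14,11.2,100,2.65,2.76,0.26,1.28,4.38,1.05,3.40,1050"

# Illustrative object: same parsing as the URL version, but from a string
wine_small <- read.table(text = raw, sep = ",", header = FALSE)
colnames(wine_small) <- c("Cultivar", "Alcohol", "Malic acid", "Ash",
  "Alcalinity of ash", "Magnesium", "Total phenols", "Flavanoids",
  "Nonflavanoid phenols", "Proanthocyanins", "Color intensity", "Hue",
  "OD280/OD315 of diluted wines", "Proline")

dim(wine_small)               # 2 14
wine_small$`Color intensity`  # 5.64 4.38 (backticks needed for names with spaces)
```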

Split data into training and test set

To evaluate the performance of a decision tree classifier, we first need to split the data into a training set and a test set. The training set is used to build the model, while the test set is used to assess how well the model performs on unseen data.

In R, this can be done using the createDataPartition() function from the caret package. When the outcome is a factor, this function samples within each class, so the class distribution is preserved in both sets (i.e., a stratified sample).

A common choice is an 80–20 split, where 80% of the data is used for training and 20% for testing.

library(caret)
#> Loading required package: ggplot2
#> Warning: package 'ggplot2' was built under R version 4.5.3
#> Loading required package: lattice

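set.seed(255)  # make the random split reproducible

# Cultivar is read in as an integer; convert it to a factor so that the split
# is stratified by class and predictions can be compared with confusionMatrix()
wine$Cultivar <- factor(wine$Cultivar)

trainIndex <- createDataPartition(wine$Cultivar, 
                                  times = 1,   # a single split
                                  p = .8,      # 80% of each class for training
                                  list = FALSE)

train <- wine[trainIndex, ]   # 144 rows
test <- wine[-trainIndex, ]   # 34 rows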
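createDataPartition() takes care of the stratification for us. To make the idea concrete, here is a base-R sketch of a stratified 80–20 split (not caret's exact algorithm): it samples 80% of the row indices separately within each class. It assumes R's built-in iris data as a stand-in, since its three classes of 50 make the result easy to check.

```r
set.seed(255)

y <- iris$Species  # stand-in outcome: three classes of 50 observations each

# For each class, sample 80% (rounded) of that class's row indices
train_idx <- unlist(lapply(split(seq_along(y), y), function(idx) {
  idx[sample.int(length(idx), round(0.8 * length(idx)))]
}))

train_s <- iris[train_idx, ]
test_s  <- iris[-train_idx, ]

# Class proportions are the same in both sets (1/3 each)
prop.table(table(train_s$Species))
prop.table(table(test_s$Species))
```

Indexing with idx[sample.int(...)] rather than sample(idx) avoids R's surprising behaviour when a class has only one row.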

Decision tree building

Decision trees are fitted by recursively splitting the data into increasingly homogeneous groups based on the predictor variables. In R, this is done using the rpart() function, which at each step selects the split that makes the resulting groups as pure as possible with respect to the classes of the response variable.

library(rpart)

treeModel <- rpart(
  Cultivar ~ .,
  data = train,
  method = "class"
)
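For classification trees, rpart's default splitting criterion is the Gini index. The sketch below is a simplified, hand-written version of that criterion for a single numeric predictor: it computes the Gini impurity 1 − Σ p_k² of a node and scans candidate thresholds for the largest impurity decrease. It ignores rpart's priors, losses, surrogates and minimum node sizes, so it illustrates the idea rather than reproducing rpart exactly. The helpers gini() and best_split() are hypothetical, and iris is used as a stand-in.

```r
# Hypothetical helper: Gini impurity of a set of class labels, 1 - sum_k p_k^2
gini <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# Hypothetical helper: best threshold on one numeric predictor x.
# Candidates are midpoints between consecutive sorted unique values of x.
best_split <- function(x, y) {
  v <- sort(unique(x))
  thresholds <- (head(v, -1) + tail(v, -1)) / 2
  parent <- gini(y)
  n <- length(y)
  decreases <- sapply(thresholds, function(t) {
    left  <- y[x < t]
    right <- y[x >= t]
    # parent impurity minus the size-weighted impurity of the two children
    parent - length(left) / n * gini(left) - length(right) / n * gini(right)
  })
  list(threshold = thresholds[which.max(decreases)], decrease = max(decreases))
}

# On iris, Petal.Length < 2.45 isolates setosa: threshold 2.45, decrease 1/3
best_split(iris$Petal.Length, iris$Species)
```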

Once a decision tree has been fitted using rpart(), it can be visualised to show the sequence of decision rules that define the model. A common and convenient way to do this is using the rpart.plot package.

library(rpart.plot)
#> Warning: package 'rpart.plot' was built under R version 4.5.3

rpart.plot(treeModel)

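This produces a diagram of the tree structure, showing:

- the splitting rule at each internal node;
- the predicted class in each node;
- the proportion of each class in each node;
- the percentage of training observations that fall into each node.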

The resulting plot makes the model highly interpretable, as it shows exactly how predictions are made through a sequence of if–then decisions. Each path from the root to a terminal node corresponds to a classification rule, and the variables used in the splits closest to the root are generally those that separate the classes most strongly.
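Reading importance off the diagram is informal. A fitted rpart object also has a variable.importance component, a named numeric vector that, per the rpart documentation, sums each variable's goodness-of-split over the splits where it is the primary variable, plus a weighted contribution where it acts as a surrogate. A minimal sketch, using iris as a stand-in for the wine tree:

```r
library(rpart)

fit <- rpart(Species ~ ., data = iris, method = "class")

# Named numeric vector, sorted here from most to least important
imp <- sort(fit$variable.importance, decreasing = TRUE)
print(imp)

# Rescaled to percentages of the total, which is often easier to read
round(100 * imp / sum(imp))
```

Because surrogate splits contribute, a variable can have non-zero importance even if it never appears in the plotted tree.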

If you prefer the rules in text form (instead of tracing them through the diagram), you can extract them with the rpart.rules() function.

library(rpart.plot)
rules <- rpart.rules(treeModel)

print(rules)
#>  Cultivar     1    2    3                                                                  
#>         1 [1.00  .00  .00] when Color intensity >= 3.8 & Flavanoids >= 1.4 & Proline >= 725
#>         2 [ .11  .89  .00] when Color intensity >= 3.8 & Flavanoids >= 1.4 & Proline <  725
#>         2 [ .04  .96  .00] when Color intensity <  3.8                                     
#>         3 [ .00  .00 1.00] when Color intensity >= 3.8 & Flavanoids <  1.4

Make predictions and evaluate

Once the decision tree model has been fitted, it can be used to predict the class labels of new (test) observations. In R, this is done using the predict() function with type = "class" to obtain predicted class memberships. These predicted labels can then be compared to the true class labels to evaluate how well the model performs.
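Besides type = "class", predict() for rpart models also accepts type = "prob", which returns the class proportions of the terminal node each observation falls into (the bracketed numbers in the rpart.rules() output above). The sketch below, on iris as a stand-in, checks that with default priors and equal misclassification costs the predicted class is simply the column with the highest probability.

```r
library(rpart)

fit <- rpart(Species ~ ., data = iris, method = "class")

probs   <- predict(fit, newdata = iris, type = "prob")   # one column per class
classes <- predict(fit, newdata = iris, type = "class")

head(round(probs, 3))

# Column with the largest probability in each row, as a factor
argmax <- factor(colnames(probs)[max.col(probs, ties.method = "first")],
                 levels = levels(iris$Species))
all(argmax == classes)  # TRUE
```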

A standard way to assess classification performance is through a confusion matrix, which summarises how many observations are correctly and incorrectly classified for each class. In R, this can be computed using the confusionMatrix() function from the caret package:

predictions <- predict(treeModel, test, type = "class")

confusionMatrix(predictions, test$Cultivar)
#> Confusion Matrix and Statistics
#> 
#>           Reference
#> Prediction  1  2  3
#>          1  9  0  0
#>          2  2 14  1
#>          3  0  0  8
#> 
#> Overall Statistics
#>                                           
#>                Accuracy : 0.9118          
#>                  95% CI : (0.7632, 0.9814)
#>     No Information Rate : 0.4118          
#>     P-Value [Acc > NIR] : 1.474e-09       
#>                                           
#>                   Kappa : 0.8635          
#>                                           
#>  Mcnemar's Test P-Value : NA              
#> 
#> Statistics by Class:
#> 
#>                      Class: 1 Class: 2 Class: 3
#> Sensitivity            0.8182   1.0000   0.8889
#> Specificity            1.0000   0.8500   1.0000
#> Pos Pred Value         1.0000   0.8235   1.0000
#> Neg Pred Value         0.9200   1.0000   0.9615
#> Prevalence             0.3235   0.4118   0.2647
#> Detection Rate         0.2647   0.4118   0.2353
#> Detection Prevalence   0.2647   0.5000   0.2353
#> Balanced Accuracy      0.9091   0.9250   0.9444

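The output of confusionMatrix() contains the confusion matrix itself (predicted classes in the rows, true classes in the columns), the overall accuracy, i.e. the proportion of correctly classified observations (here 31 of 34, or 0.9118), and additional statistics such as Cohen's Kappa, which corrects the accuracy for agreement expected by chance. In this example all three errors are wines predicted as cultivar 2: two from cultivar 1 and one from cultivar 3.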
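To see where these numbers come from, the sketch below recomputes the accuracy, the No Information Rate (the share of the largest true class), Cohen's Kappa and the per-class sensitivity directly from the confusion matrix printed above, using only base R.

```r
# Confusion matrix from the output above (rows = predicted, columns = reference)
cm <- matrix(c(9, 2, 0,    # reference class 1
               0, 14, 0,   # reference class 2
               0, 1, 8),   # reference class 3
             nrow = 3,
             dimnames = list(Prediction = as.character(1:3),
                             Reference = as.character(1:3)))

n <- sum(cm)
accuracy <- sum(diag(cm)) / n                 # 31 / 34
nir <- max(colSums(cm)) / n                   # largest true class: 14 / 34
p_e <- sum(rowSums(cm) * colSums(cm)) / n^2   # agreement expected by chance
kappa_stat <- (accuracy - p_e) / (1 - p_e)    # Cohen's Kappa
sensitivity <- diag(cm) / colSums(cm)         # per-class recall

round(c(accuracy = accuracy, nir = nir, kappa = kappa_stat), 4)
# accuracy 0.9118, nir 0.4118, kappa 0.8635
round(sensitivity, 4)
# 0.8182 1.0000 0.8889
```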

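Together, the predictions and the confusion matrix give an estimate of how well the decision tree generalises to unseen data. With only 34 test observations this estimate is fairly imprecise, as the wide 95% confidence interval for the accuracy (0.7632 to 0.9814) shows.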
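The 95% CI in the caret output matches an exact (Clopper–Pearson) binomial interval for 31 correct predictions out of 34, and the P-Value [Acc > NIR] corresponds to a one-sided binomial test against the No Information Rate. As a sketch, base R's binom.test() can compute both from the counts in the confusion matrix.

```r
correct <- 31   # sum of the diagonal of the confusion matrix above
n_test  <- 34   # number of test observations

# Exact 95% confidence interval for the accuracy
ci <- binom.test(correct, n_test)$conf.int
round(ci, 4)   # 0.7632 0.9814, as in the caret output

# One-sided test of accuracy > No Information Rate (largest class: 14/34);
# compare with P-Value [Acc > NIR] in the output above
p_nir <- binom.test(correct, n_test, p = 14 / 34, alternative = "greater")$p.value
signif(p_nir, 4)
```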

Pruning

Decision trees are highly flexible models, which makes them prone to overfitting the training data. An overfitted tree can become too complex, capturing noise rather than the underlying structure of the data, which leads to poor performance on new observations.

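To address this, we use pruning, which simplifies the tree by reducing its size. There are two main approaches:

- pre-pruning: stopping the growth of the tree early, during fitting, by imposing constraints on when a node may be split;
- post-pruning: first growing a large tree and then cutting back branches that do not improve cross-validated performance enough to justify their extra complexity.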

Pre-pruning can be controlled directly in the rpart() function using the control argument:

tree_pre <- rpart(
  Cultivar ~ .,
  data = train,
  method = "class",
  control = rpart.control(
    maxdepth = 3,
    minsplit = 10,
    cp = 0.01
  )
)

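Here:

- maxdepth = 3 limits the depth of any node in the final tree to 3 (the root node counts as depth 0);
- minsplit = 10 requires at least 10 observations in a node before a split is attempted (the default is 20);
- cp = 0.01 means that a split is not attempted unless it decreases the overall lack of fit by at least a factor of 0.01 (this is the default value).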
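To see how such constraints restrict growth, the sketch below fits trees with increasing maxdepth and counts their terminal nodes. It assumes iris as a stand-in and uses a hypothetical helper, n_leaves(), which counts the rows of the fitted object's frame component whose var column is "<leaf>" (rpart's marker for terminal nodes).

```r
library(rpart)

# Hypothetical helper: number of terminal nodes in a fitted rpart tree
n_leaves <- function(fit) sum(fit$frame$var == "<leaf>")

leaves_by_depth <- sapply(1:3, function(d) {
  fit <- rpart(Species ~ ., data = iris, method = "class",
               control = rpart.control(maxdepth = d))
  n_leaves(fit)
})
names(leaves_by_depth) <- paste0("maxdepth=", 1:3)
leaves_by_depth  # 2, 3, 3
```

From maxdepth = 2 onwards the count stops growing because the default cp = 0.01 already prevents further splits on these data, which shows that the stopping rules interact.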

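Post-pruning is typically done using the complexity parameter (cp) together with the cross-validation results that rpart() computes while fitting the tree (10-fold cross-validation by default, set by the xval argument of rpart.control()).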

After fitting a full tree, we examine the complexity parameter table:

printcp(treeModel)
#> 
#> Classification tree:
#> rpart(formula = Cultivar ~ ., data = train, method = "class")
#> 
#> Variables actually used in tree construction:
#> [1] Color intensity Flavanoids      Proline        
#> 
#> Root node error: 87/144 = 0.60417
#> 
#> n= 144 
#> 
#>        CP nsplit rel error  xerror     xstd
#> 1 0.44253      0  1.000000 1.00000 0.067452
#> 2 0.08046      2  0.114943 0.17241 0.042135
#> 3 0.01000      3  0.034483 0.11494 0.035063

plotcp(treeModel)

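In this table, rel error is the training error and xerror the cross-validated error, both relative to the root node error (87/144), and xstd is the standard error of xerror. We can then choose the value of cp that minimises the cross-validated error and prune the tree accordingly. In the table above the minimum xerror (0.11494) is in the last row, so here the optimal cp is 0.01 and pruning leaves the tree unchanged; with other data, or other random cross-validation folds, the minimum may fall on a smaller tree.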

optimal_cp <- treeModel$cptable[which.min(treeModel$cptable[, "xerror"]), "CP"]  # cp with the lowest cross-validated error

pruned_tree <- prune(treeModel, cp = optimal_cp)

rpart.plot(pruned_tree)
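Choosing the cp with the minimum xerror is one option. A common alternative, reflected in the horizontal line that plotcp() draws one standard error above the minimum, is the 1-SE rule: pick the smallest tree whose xerror is within one xstd of the minimum. A minimal sketch, assuming iris as a stand-in and assuming the rows of cptable run from the smallest to the largest tree, as in the printcp() output above:

```r
library(rpart)

set.seed(1)  # xerror depends on the random cross-validation folds
fit <- rpart(Species ~ ., data = iris, method = "class")
cpt <- fit$cptable

best <- which.min(cpt[, "xerror"])                    # minimum-xerror row
threshold <- cpt[best, "xerror"] + cpt[best, "xstd"]  # 1-SE cut-off

# First (i.e. smallest) tree whose xerror is within one SE of the minimum
row_1se <- which(cpt[, "xerror"] <= threshold)[1]
cp_1se <- cpt[row_1se, "CP"]

pruned_1se <- prune(fit, cp = cp_1se)
c(nsplit_min = cpt[best, "nsplit"], nsplit_1se = cpt[row_1se, "nsplit"])
```

The 1-SE rule trades a small, statistically insignificant increase in cross-validated error for a simpler and usually more stable tree.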

Pre-pruning restricts the growth of the tree during model fitting, while post-pruning simplifies a fully grown tree after it has been constructed. Both approaches aim to improve generalisation by reducing overfitting and producing a more interpretable model.