Decision tree models are a flexible and interpretable method for classification (and regression). They work by recursively splitting the data into subsets based on values of the predictor variables, with the goal of creating groups that are as homogeneous as possible with respect to the response variable. The final result is a tree-like structure consisting of decision rules that lead to a predicted class.

One of the main advantages of decision trees is their interpretability: the model can be visualised and understood as a sequence of if–then rules, making it easy to explain how predictions are made.

In the following section, we will learn how to fit decision tree models in R, how to visualise the resulting tree, and how to evaluate its predictive performance.

Decision trees in R

Decision trees in R are typically fitted using the rpart() function from the rpart package. The general syntax is:

Syntax:
rpart(formula, data, method = "class", control = rpart.control())

Where:

formula: Specifies the response variable and predictors (e.g. y ~ .)

data: The dataset used to build the model

method: Defines the type of tree ("class" for classification, "anova" for regression)

control: Allows optional tuning of tree complexity (e.g. limiting tree depth or setting minimum split sizes)

The resulting model contains a set of decision rules that can be used to classify new observations.
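As a minimal sketch of how these arguments fit together, the call below uses the kyphosis data that ships with rpart rather than the wine data introduced later. The object name fit_demo is hypothetical, and the control values shown are simply the rpart defaults written out explicitly.

```r
library(rpart)

# kyphosis ships with rpart; Kyphosis (absent/present) is the response
fit_demo <- rpart(
  Kyphosis ~ Age + Number + Start,   # response ~ chosen predictors
  data = kyphosis,
  method = "class",                  # classification tree
  control = rpart.control(minsplit = 20, cp = 0.01)  # rpart defaults, spelled out
)

print(fit_demo)  # text listing of the splits and the class counts per node
```

Writing Kyphosis ~ . instead would use every remaining column as a predictor, which is the form used for the wine data in the rest of this section.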

Read data

First we need to read our data into R. Throughout this example we will use the wine data. These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wine.

The attributes are:

Alcohol
Malic acid
Ash
Alcalinity of ash
Magnesium
Total phenols
Flavanoids
Nonflavanoid phenols
Proanthocyanins
Color intensity
Hue
OD280/OD315 of diluted wines
Proline

The wine data are stored as a comma-separated text file without a header row, so we can read them in with the read.table() function, setting sep = "," and adding the column names afterwards.

wine <- read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data", sep=",")

colnames(wine) <- c("Cultivar","Alcohol","Malic acid","Ash","Alcalinity of ash","Magnesium","Total phenols","Flavanoids","Nonflavanoid phenols","Proanthocyanins","Color intensity","Hue","OD280/OD315 of diluted wines","Proline")

dim(wine)
#> [1] 178  14

head(wine, 5)
#>   Cultivar Alcohol Malic acid  Ash Alcalinity of ash Magnesium Total phenols
#> 1        1   14.23       1.71 2.43              15.6       127          2.80
#> 2        1   13.20       1.78 2.14              11.2       100          2.65
#> 3        1   13.16       2.36 2.67              18.6       101          2.80
#> 4        1   14.37       1.95 2.50              16.8       113          3.85
#> 5        1   13.24       2.59 2.87              21.0       118          2.80
#>   Flavanoids Nonflavanoid phenols Proanthocyanins Color intensity  Hue
#> 1       3.06                 0.28            2.29            5.64 1.04
#> 2       2.76                 0.26            1.28            4.38 1.05
#> 3       3.24                 0.30            2.81            5.68 1.03
#> 4       3.49                 0.24            2.18            7.80 0.86
#> 5       2.69                 0.39            1.82            4.32 1.04
#>   OD280/OD315 of diluted wines Proline
#> 1                         3.92    1065
#> 2                         3.40    1050
#> 3                         3.17    1185
#> 4                         3.45    1480
#> 5                         2.93     735

The wine dataset contains 178 observations of 14 variables, including the 13 measured quantities of chemicals and the variable Cultivar, which indicates the type of grape from which the wine was produced.
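Several of these column names contain spaces or a slash, so they are not syntactic R names and need backticks when a column is referred to individually. The sketch below illustrates this on the five rows printed by head() above. The object name wine_demo is hypothetical, and converting the names with make.names() is an optional alternative, not something the rest of this section relies on.

```r
# Rebuild the five rows shown by head(wine, 5) from inline text
wine_demo <- read.table(
  text = paste(
    "1,14.23,1.71,2.43,15.6,127,2.80,3.06,0.28,2.29,5.64,1.04,3.92,1065",
    "1,13.20,1.78,2.14,11.2,100,2.65,2.76,0.26,1.28,4.38,1.05,3.40,1050",
    "1,13.16,2.36,2.67,18.6,101,2.80,3.24,0.30,2.81,5.68,1.03,3.17,1185",
    "1,14.37,1.95,2.50,16.8,113,3.85,3.49,0.24,2.18,7.80,0.86,3.45,1480",
    "1,13.24,2.59,2.87,21.0,118,2.80,2.69,0.39,1.82,4.32,1.04,2.93,735",
    sep = "\n"
  ),
  sep = ","
)

colnames(wine_demo) <- c("Cultivar", "Alcohol", "Malic acid", "Ash",
                         "Alcalinity of ash", "Magnesium", "Total phenols",
                         "Flavanoids", "Nonflavanoid phenols", "Proanthocyanins",
                         "Color intensity", "Hue",
                         "OD280/OD315 of diluted wines", "Proline")

# Non-syntactic names must be wrapped in backticks
mean_colour <- mean(wine_demo$`Color intensity`)

# Optional: syntactic versions of the same names
syntactic_names <- make.names(colnames(wine_demo))
syntactic_names[c(3, 13)]
```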

Split data into training and test set

To evaluate the performance of a decision tree classifier, we first need to split the data into a training set and a test set. The training set is used to build the model, while the test set is used to assess how well the model performs on unseen data.

In R, this can be done using the createDataPartition() function from the caret package. This function ensures that the split is done in a way that preserves the class distribution in both sets (i.e., a stratified sample).

A common choice is an 80–20 split, where 80% of the data is used for training and 20% for testing.

library(caret)
#> Loading required package: ggplot2
#> Loading required package: lattice

set.seed(255)

wine$Cultivar <- factor(wine$Cultivar)

trainIndex <- createDataPartition(wine$Cultivar,
                                  times = 1,
                                  p = 0.8,
                                  list = FALSE)

train <- wine[trainIndex, ]
test <- wine[-trainIndex, ]
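To make the idea of a stratified split concrete, here is a rough base-R sketch that samples 80% of the rows within each class separately. It uses the built-in iris data as a stand-in, the object names are hypothetical, and it is only meant to illustrate the principle; it is not a description of how createDataPartition() is implemented.

```r
set.seed(255)

# Row indices grouped by class label
idx_by_class <- split(seq_len(nrow(iris)), iris$Species)

# Sample 80% of the indices within each class
train_idx <- unlist(
  lapply(idx_by_class, function(idx) sample(idx, size = round(0.8 * length(idx)))),
  use.names = FALSE
)

train_demo <- iris[train_idx, ]
test_demo  <- iris[-train_idx, ]

# Class proportions are the same in both parts
prop.table(table(train_demo$Species))
prop.table(table(test_demo$Species))
```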

Decision tree building

Decision trees are fitted by recursively splitting the data into increasingly homogeneous groups based on the predictor variables. In R, this is done using the rpart() function, which builds a tree by selecting splits that best separate the classes in the response variable.

library(rpart)

treeModel <- rpart(
  Cultivar ~ .,
  data = train,
  method = "class"
)
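For classification trees, rpart() by default scores candidate splits with the Gini index, which measures how mixed the classes are within a node. The toy calculation below, with made-up class counts and hypothetical helper names, sketches how the quality of a single split can be quantified as the drop in weighted impurity.

```r
# Gini impurity of a node, given its class counts
gini <- function(counts) {
  p <- counts / sum(counts)
  1 - sum(p^2)
}

# Made-up example: a parent node with 10 + 10 observations is split in two
parent <- c(a = 10, b = 10)
left   <- c(a = 9,  b = 1)
right  <- c(a = 1,  b = 9)

# Impurity after the split, weighted by child node size
weighted_children <- (sum(left) * gini(left) + sum(right) * gini(right)) / sum(parent)

# Impurity reduction achieved by the split
gain <- gini(parent) - weighted_children
gain
```

A perfectly mixed two-class node has impurity 0.5 and a pure node has impurity 0, so larger reductions indicate splits that separate the classes better.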

Once a decision tree has been fitted using rpart(), it can be visualised to show the sequence of decision rules that define the model. A common and convenient way to do this is using the rpart.plot package.

library(rpart.plot)

rpart.plot(treeModel)

This produces a clear diagram of the tree structure, showing:

the splitting rules at each node,
the predicted class at each terminal node,
and the proportions of observations in each group.

The resulting plot makes the model highly interpretable, as it shows exactly how predictions are made through a sequence of if–then decisions. Each path from the root to a terminal node corresponds to one classification rule, and the variables that appear in the splits (especially those near the root) are the ones the tree relies on to separate the classes.
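Besides reading importance off the plot, a fitted rpart object also stores a numeric importance score per variable in its variable.importance component. The sketch below uses the built-in iris data as a stand-in, with hypothetical object names. Note that these scores also credit variables used as surrogate splits, so a variable can rank highly without appearing in the plotted tree.

```r
library(rpart)

fit_demo <- rpart(Species ~ ., data = iris, method = "class")

# Named numeric vector, sorted from most to least important
importance <- fit_demo$variable.importance

# Express each score as a percentage of the total
round(100 * importance / sum(importance), 1)
```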

If you prefer the rules in text form (instead of tracing paths through the diagram), you can extract them with the rpart.rules() function from the rpart.plot package.

library(rpart.plot)
rules <- rpart.rules(treeModel)

print(rules)
#>  Cultivar     1    2    3                                                                  
#>         1 [1.00  .00  .00] when Color intensity >= 3.8 & Flavanoids >= 1.4 & Proline >= 725
#>         2 [ .11  .89  .00] when Color intensity >= 3.8 & Flavanoids >= 1.4 & Proline <  725
#>         2 [ .04  .96  .00] when Color intensity <  3.8                                     
#>         3 [ .00  .00 1.00] when Color intensity >= 3.8 & Flavanoids <  1.4
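Because each row of this output is an if-then rule, the tree can be written out by hand as an ordinary function. The sketch below does this for the four rules printed above and applies them to the five wines shown earlier by head(). The function name classify_wine is hypothetical, and the thresholds are the rounded values from the printed rules, so the fitted model may use slightly different cut points.

```r
# Hand-coded version of the four printed rules (rounded thresholds)
classify_wine <- function(color_intensity, flavanoids, proline) {
  ifelse(color_intensity < 3.8, "2",
    ifelse(flavanoids < 1.4, "3",
      ifelse(proline >= 725, "1", "2")))
}

# Values of the three splitting variables for the first five wines
color_intensity <- c(5.64, 4.38, 5.68, 7.80, 4.32)
flavanoids      <- c(3.06, 2.76, 3.24, 3.49, 2.69)
proline         <- c(1065, 1050, 1185, 1480, 735)

rule_pred <- classify_wine(color_intensity, flavanoids, proline)
rule_pred
```

All five wines follow the first rule (high colour intensity, high flavanoids, high proline) and are assigned to cultivar 1, which matches their recorded Cultivar value.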

Make predictions and evaluate

Once the decision tree model has been fitted, it can be used to predict the class labels of new (test) observations. In R, this is done using the predict() function with type = "class" to obtain predicted class memberships. These predicted labels can then be compared to the true class labels to evaluate how well the model performs.
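The predict() method for rpart objects can also return class probabilities via type = "prob". The sketch below contrasts the two output types on the built-in iris data, used here as a stand-in; the object names are hypothetical.

```r
library(rpart)

fit_demo <- rpart(Species ~ ., data = iris, method = "class")
new_obs  <- iris[c(1, 51, 101), ]  # one flower of each species

# Factor with one predicted label per row
pred_class <- predict(fit_demo, newdata = new_obs, type = "class")

# Matrix with one column per class; each row sums to 1
pred_prob <- predict(fit_demo, newdata = new_obs, type = "prob")

pred_class
pred_prob
```

The probabilities are the class proportions in the terminal node that each observation falls into, and the predicted label is the class with the largest proportion.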

A standard way to assess classification performance is through a confusion matrix, which summarises how many observations are correctly and incorrectly classified for each class. In R, this can be computed using the confusionMatrix() function from the caret package:

predictions <- predict(treeModel, test, type = "class")

confusionMatrix(predictions, test$Cultivar)
#> Confusion Matrix and Statistics
#> 
#>           Reference
#> Prediction  1  2  3
#>          1  9  0  0
#>          2  2 14  1
#>          3  0  0  8
#> 
#> Overall Statistics
#>                                           
#>                Accuracy : 0.9118          
#>                  95% CI : (0.7632, 0.9814)
#>     No Information Rate : 0.4118          
#>     P-Value [Acc > NIR] : 1.474e-09       
#>                                           
#>                   Kappa : 0.8635          
#>                                           
#>  Mcnemar's Test P-Value : NA              
#> 
#> Statistics by Class:
#> 
#>                      Class: 1 Class: 2 Class: 3
#> Sensitivity            0.8182   1.0000   0.8889
#> Specificity            1.0000   0.8500   1.0000
#> Pos Pred Value         1.0000   0.8235   1.0000
#> Neg Pred Value         0.9200   1.0000   0.9615
#> Prevalence             0.3235   0.4118   0.2647
#> Detection Rate         0.2647   0.4118   0.2353
#> Detection Prevalence   0.2647   0.5000   0.2353
#> Balanced Accuracy      0.9091   0.9250   0.9444

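The output starts with the confusion matrix itself, in which rows are the predicted classes and columns are the true (reference) classes. Here 31 of the 34 test observations lie on the diagonal, giving an overall accuracy of 0.91; the three errors are two wines of cultivar 1 and one of cultivar 3 that were assigned to cultivar 2. The output also reports the Kappa value, which corrects the accuracy for agreement expected by chance, and per-class statistics such as sensitivity and specificity.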
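To see where these numbers come from, the main statistics can be recomputed by hand from the counts in the confusion matrix printed above. This is a sketch of the standard formulas with hypothetical object names, not a reproduction of caret's internal code.

```r
# Counts from the output above; each line below is one Reference column
cm <- matrix(
  c(9, 2, 0,
    0, 14, 0,
    0, 1, 8),
  nrow = 3,
  dimnames = list(Prediction = c("1", "2", "3"), Reference = c("1", "2", "3"))
)

n <- sum(cm)

accuracy    <- sum(diag(cm)) / n        # share of correct predictions
sensitivity <- diag(cm) / colSums(cm)   # correct / true members of each class
ppv         <- diag(cm) / rowSums(cm)   # correct / predicted members of each class

# Cohen's kappa: accuracy corrected for chance agreement
expected  <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa_hat <- (accuracy - expected) / (1 - expected)

round(c(accuracy = accuracy, kappa = kappa_hat), 4)
round(sensitivity, 4)
round(ppv, 4)
```

The results agree with the Accuracy, Kappa, Sensitivity and Pos Pred Value rows reported by confusionMatrix().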

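Together, prediction on the test set and the confusion matrix give an estimate of how well the decision tree generalises to unseen data. With only 34 test observations that estimate is imprecise, as the wide 95% confidence interval for the accuracy (0.76 to 0.98) shows.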

Pruning

Decision trees are highly flexible models, which makes them prone to overfitting the training data. An overfitted tree can become too complex, capturing noise rather than the underlying structure of the data, which leads to poor performance on new observations.

To address this, we use pruning, which simplifies the tree by reducing its size. There are two main approaches:

Pre-pruning (early stopping): the tree is restricted during construction by setting constraints such as maximum depth, minimum number of observations in a node, or minimum improvement required for a split. This prevents the tree from growing too complex in the first place.
Post-pruning: the tree is first grown to a large size and then simplified by removing branches that do not contribute significantly to predictive performance.

Pre-pruning can be controlled directly in the rpart() function using the control argument:

tree_pre <- rpart(
  Cultivar ~ .,
  data = train,
  method = "class",
  control = rpart.control(
    maxdepth = 3,
    minsplit = 10,
    cp = 0.01
  )
)

Here:

maxdepth limits the depth of the tree,
minsplit sets the minimum number of observations required to attempt a split,
cp (complexity parameter) controls how much a split must improve the model to be included.
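The effect of such a constraint is easiest to see by comparing tree sizes. The sketch below, using the built-in iris data as a stand-in and a hypothetical helper count_leaves(), fits one tree limited to a single level of splits and one with the default settings.

```r
library(rpart)

# Number of terminal nodes: leaves are marked "<leaf>" in the frame component
count_leaves <- function(fit) sum(fit$frame$var == "<leaf>")

# A single split only (a "stump")
stump <- rpart(Species ~ ., data = iris, method = "class",
               control = rpart.control(maxdepth = 1))

# Default settings for comparison
default_tree <- rpart(Species ~ ., data = iris, method = "class")

count_leaves(stump)
count_leaves(default_tree)
```

With maxdepth = 1 the tree can contain at most one split and therefore at most two terminal nodes, however much further splitting would improve the fit.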

Post-pruning is typically done using the complexity parameter (cp) and cross-validation results from the fitted rpart model.

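After fitting a tree, we examine its complexity parameter table, which lists the cross-validated error (xerror) and its standard error (xstd) for each candidate tree size. Note that treeModel was grown with the default cp = 0.01, so it is not a fully grown tree; to consider larger trees as pruning candidates, the initial tree would have to be fitted with a smaller cp.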

printcp(treeModel)
#> 
#> Classification tree:
#> rpart(formula = Cultivar ~ ., data = train, method = "class")
#> 
#> Variables actually used in tree construction:
#> [1] Color intensity Flavanoids      Proline        
#> 
#> Root node error: 87/144 = 0.60417
#> 
#> n= 144 
#> 
#>        CP nsplit rel error  xerror     xstd
#> 1 0.44253      0  1.000000 1.00000 0.067452
#> 2 0.08046      2  0.114943 0.17241 0.042135
#> 3 0.01000      3  0.034483 0.11494 0.035063

plotcp(treeModel)

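We can then choose the value of cp that minimises the cross-validated error and prune the tree accordingly. In the table above the minimum xerror is reached in the last row (cp = 0.01, 3 splits), so in this example pruning returns the same tree that was fitted:

# Find the cp value with the smallest cross-validated error
best_row <- which.min(treeModel$cptable[, "xerror"])
optimal_cp <- treeModel$cptable[best_row, "CP"]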

pruned_tree <- prune(treeModel, cp = optimal_cp)

rpart.plot(pruned_tree)
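A commonly used alternative to taking the minimum is the one-standard-error rule, which picks the smallest tree whose cross-validated error lies within one standard error of the minimum. The sketch below applies both rules to the values printed by printcp() above, copied into a matrix by hand; the object names are hypothetical.

```r
# Values copied from the printcp(treeModel) output above
cp_demo <- matrix(
  c(0.44253, 0, 1.000000, 1.00000, 0.067452,
    0.08046, 2, 0.114943, 0.17241, 0.042135,
    0.01000, 3, 0.034483, 0.11494, 0.035063),
  ncol = 5, byrow = TRUE,
  dimnames = list(NULL, c("CP", "nsplit", "rel error", "xerror", "xstd"))
)

# Rule 1: minimum cross-validated error
best   <- which.min(cp_demo[, "xerror"])
cp_min <- cp_demo[best, "CP"]

# Rule 2: smallest tree within one standard error of that minimum
threshold <- cp_demo[best, "xerror"] + cp_demo[best, "xstd"]
simplest  <- min(which(cp_demo[, "xerror"] <= threshold))
cp_1se    <- cp_demo[simplest, "CP"]

c(cp_min = cp_min, cp_1se = cp_1se)
```

Here the threshold is 0.11494 + 0.035063 = 0.150003, and the two-split tree (xerror 0.17241) lies above it, so both rules select cp = 0.01 and keep the three-split tree.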

Pre-pruning restricts the growth of the tree during model fitting, while post-pruning simplifies a fully grown tree after it has been constructed. Both approaches aim to improve generalisation by reducing overfitting and producing a more interpretable model.

Decision trees

dr. Annelies Agten

2026-04-18