# Two-class impurity measures as a function of p = P(Class = 1)
p <- seq(0, 1, by = 0.01)
gini <- 2 * p * (1 - p)                             # Gini index: 2p(1 - p)
class_error <- 1 - pmax(p, 1 - p)                   # classification error: 1 - max(p, 1 - p)
entropy <- -(p * log2(p) + (1 - p) * log2(1 - p))   # cross-entropy
entropy[is.na(entropy)] <- 0                        # treat 0 * log2(0) as 0 at the endpoints
plot(p, gini, type = "l", col = "red", ylim = c(0, 1),
     ylab = "Impurity Measure", xlab = "P(Class = 1)")
lines(p, class_error, col = "blue")
lines(p, entropy, col = "darkgreen")
legend("topright", legend = c("Gini Index", "Classification Error", "Entropy"),
       col = c("red", "blue", "darkgreen"), lty = 1)
          X1 < 0.5
          /       \
        Yes        No
        (6)     X2 < 0.5
                /       \
              Yes        No
              (2)       (10)
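Written as a prediction rule, the sketched tree assigns the value 6 when X1 < 0.5, and otherwise 2 or 10 depending on whether X2 < 0.5. The function below is a minimal hand-written sketch of that rule; the name predict_sketch_tree is purely illustrative, not output from a tree-fitting package.

# Hand-coded prediction rule for the sketched tree (illustrative only)
predict_sketch_tree <- function(x1, x2) {
  if (x1 < 0.5) {
    6                      # Yes branch of X1 < 0.5
  } else if (x2 < 0.5) {
    2                      # Yes branch of X2 < 0.5
  } else {
    10                     # No branch of X2 < 0.5
  }
}

predict_sketch_tree(0.2, 0.9)   # 6
predict_sketch_tree(0.8, 0.3)   # 2
predict_sketch_tree(0.8, 0.9)   # 10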
#                     x1
#                0         1
#    |--------------------------------|
#    |              2.49              |
#  2 |--------------------------------|
# x2 |   -1.06   |        0.21        |
#  1 |--------------------------------|
#    |        -1.8         |   0.63   |
#    |--------------------------------|
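Read as a prediction rule, the partition assigns one constant per rectangle: 2.49 for the top strip, -1.06 / 0.21 for the middle strip, and -1.8 / 0.63 for the bottom strip. The sketch below is a hand-written version of that rule; the cutpoints X2 = 1, X2 = 2, X1 = 0, and X1 = 1 are read off the axis labels and should be treated as assumptions, as should the function name.

# Prediction rule implied by the sketched partition (cutpoints assumed from
# the axis labels: X2 splits at 1 and 2; X1 splits at 0 and 1)
predict_partition <- function(x1, x2) {
  if (x2 >= 2) {
    2.49                             # top strip
  } else if (x2 >= 1) {
    if (x1 < 0) -1.06 else 0.21      # middle strip, split at x1 = 0
  } else {
    if (x1 < 1) -1.8 else 0.63       # bottom strip, split at x1 = 1
  }
}

predict_partition(-0.5, 1.5)   # -1.06
predict_partition(1.5, 0.5)    # 0.63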
We are given 10 estimated probabilities that the observation belongs to the Red class:
0.1, 0.15, 0.2, 0.2, 0.55, 0.6, 0.6, 0.65, 0.7, 0.75
Only 4 of the 10 estimates fall below 0.5, so 6 are at or above 0.5; under the majority-vote rule the classification is Red.
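The count can be verified directly in R; the label "Green" for the other class is assumed here purely for illustration.

probs <- c(0.1, 0.15, 0.2, 0.2, 0.55, 0.6, 0.6, 0.65, 0.7, 0.75)
sum(probs < 0.5)                     # 4 estimates vote against Red
sum(probs >= 0.5)                    # 6 estimates vote for Red
if (sum(probs >= 0.5) > length(probs) / 2) "Red" else "Green"   # "Red"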
Regression Tree Fitting Algorithm:
1. Start with all data at the root node.
2. Search over all predictors and all possible cutpoints to find the split that minimizes the residual sum of squares (RSS); a minimal sketch of this search is given after the list.
3. Split the data into two regions using the optimal predictor/cutpoint pair found in step 2.
4. Repeat recursively: apply steps 2 and 3 to each resulting region, continuing to split so as to minimize RSS.
5. Stop when:
   - a minimum node size is reached,
   - the improvement in RSS is not significant, or
   - a maximum tree depth is reached.
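The following is a minimal sketch of the split search in step 2 for a single numeric predictor; best_split is a made-up helper name, not part of any package. For each candidate cutpoint it computes the combined RSS of the two resulting regions and keeps the cutpoint with the smallest value.

# Illustrative RSS-minimizing split search for one predictor (step 2)
best_split <- function(x, y) {
  xs <- sort(unique(x))
  cutpoints <- (head(xs, -1) + tail(xs, -1)) / 2      # midpoints between values
  rss <- sapply(cutpoints, function(s) {
    left  <- y[x < s]
    right <- y[x >= s]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  })
  list(cutpoint = cutpoints[which.min(rss)], rss = min(rss))
}

set.seed(1)
x <- runif(100)
y <- ifelse(x < 0.4, 2, 5) + rnorm(100, sd = 0.5)
best_split(x, y)   # the chosen cutpoint should land near 0.4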
Pruning:
To avoid overfitting, trees are often grown large and then pruned back using cost-complexity pruning, where a complexity parameter α penalizes tree size.
The best α is typically chosen via cross-validation.
This greedy, top-down approach is known as recursive binary splitting and is central to CART (Classification and Regression Trees).
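In practice, the whole procedure (greedy growing followed by cost-complexity pruning with a cross-validated complexity parameter) is what the rpart package implements, with rpart's cp parameter playing the role of α. The example below is a sketch on simulated data; the dataset and control settings are chosen purely for illustration.

# Grow a regression tree, then prune it by cost-complexity pruning, choosing
# the complexity parameter via the cross-validation results rpart stores in
# its cp table (rpart runs 10-fold CV internally by default).
library(rpart)

set.seed(42)
df <- data.frame(x1 = runif(200), x2 = runif(200))
df$y <- ifelse(df$x1 < 0.5, 6, ifelse(df$x2 < 0.5, 2, 10)) + rnorm(200)

fit <- rpart(y ~ x1 + x2, data = df, method = "anova",
             control = rpart.control(cp = 0.001, minsplit = 10))

printcp(fit)                                     # CV error for each candidate cp
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned <- prune(fit, cp = best_cp)               # prune back to the best subtree
pruned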