In a two-class classification problem, we can use different impurity measures to decide how "pure" or "impure" a node in a decision tree is. Below is R code that plots the Gini index, classification error, and entropy as a function of the proportion of class 1 observations (pm1 in the code) as it ranges from 0 to 1.
# Create a sequence of values for pm1 from 0 to 1
pm1 <- seq(0, 1, by = 0.01)
# Calculate Gini index
gini <- 2 * pm1 * (1 - pm1)
# Calculate classification error
class_error <- 1 - pmax(pm1, 1 - pm1)
# Calculate entropy
entropy <- -pm1 * log2(pm1) - (1 - pm1) * log2(1 - pm1)
entropy[is.nan(entropy)] <- 0 # Handle 0 * log(0) = NaN
# Plot all three metrics
plot(pm1, gini, type = "l", col = "red", lwd = 2,
     xlab = expression(hat(p)[m1]), ylab = "Impurity Measure",
     ylim = c(0, 1), main = expression("Impurity Measures vs. " * hat(p)[m1]))
lines(pm1, class_error, col = "blue", lwd = 2)
lines(pm1, entropy, col = "darkgreen", lwd = 2)
# Add legend
legend("top", legend = c("Gini Index", "Classification Error", "Entropy"),
col = c("red", "blue", "darkgreen"), lwd = 2)
This tree is built from the rectangular regions and their mean values shown in the left panel. The splits and the corresponding predictions are as follows:

            X2 < 1
           /      \
         Yes       No
          |         |
       X1 < 1     Y = 15
       /     \
     Yes      No
      |        |
  X1 < 0.5   Y = 5
   /     \
 Yes      No
  |        |
X2 < 0.5  Y = 10
 /    \
Yes    No
 |      |
Y = 3  Y = 0
Explanation:
The first split is on X2 < 1.
If X2 >= 1 → mean Y = 15.
If X2 < 1, split next on X1 < 1:
  If X1 >= 1 → mean Y = 5.
  If X1 < 1, split on X1 < 0.5:
    If 0.5 <= X1 < 1 → Y = 10.
    If X1 < 0.5, split on X2 < 0.5:
      If X2 < 0.5 → Y = 3.
      If X2 >= 0.5 → Y = 0.
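To make the rule concrete, here is a small R function (my own sketch, using only the split points and region means listed above) that returns the predicted mean for a point (X1, X2):

# Prediction rule implied by the tree above (region means as described in the figure)
predict_region_a <- function(x1, x2) {
  if (x2 >= 1) return(15)
  if (x1 >= 1) return(5)
  if (x1 >= 0.5) return(10)
  if (x2 < 0.5) return(3)
  0
}
predict_region_a(0.2, 0.3)  # falls in the X1 < 0.5, X2 < 0.5 region, so returns 3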
We now translate the tree from the right panel into a partition of the X1-X2 space.
Steps:
First split: X2 < 1 → creates two regions.
If X2 < 1:
  Split again on X1 < 1.
    If yes → Y = -1.80
    If no → Y = 0.63
If X2 >= 1:
  Split on X2 < 2.
    If yes → split on X1 < 0.
      If yes → Y = -1.06
      If no → Y = 0.21
    If no → Y = 2.49
You can visualize the predictor space as a rectangle:
A horizontal split at X2 = 1 separates the bottom region (X2 < 1) from the top region (X2 >= 1).
Bottom region (X2 < 1): a vertical split at X1 = 1.
Top region (X2 >= 1): a further horizontal split at X2 = 2, and a vertical split at X1 = 0 within the band 1 <= X2 < 2.
Label each rectangular region with its mean Y value:
Top band (X2 >= 2): Y = 2.49
Middle band (1 <= X2 < 2), X1 < 0: Y = -1.06
Middle band (1 <= X2 < 2), X1 >= 0: Y = 0.21
Bottom band (X2 < 1), X1 < 1: Y = -1.80
Bottom band (X2 < 1), X1 >= 1: Y = 0.63
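As a rough check, the partition can be drawn in base R (a sketch; the axis ranges are arbitrary choices for illustration, while the split locations and region means come from the tree):

# Sketch of the part (b) partition; axis limits are illustrative only
plot(NULL, xlim = c(-2, 2), ylim = c(0, 3),
     xlab = expression(X[1]), ylab = expression(X[2]))
abline(h = 1)                               # split at X2 = 1
segments(1, 0, 1, 1)                        # split at X1 = 1 within X2 < 1
abline(h = 2)                               # split at X2 = 2 within X2 >= 1
segments(0, 1, 0, 2)                        # split at X1 = 0 within 1 <= X2 < 2
text(-0.5, 0.5, "-1.80"); text(1.5, 0.5, "0.63")
text(-1.0, 1.5, "-1.06"); text(1.0, 1.5, "0.21")
text(0, 2.5, "2.49")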
We are given 10 estimates of the probability that the class is Red for a specific value of X:
0.1, 0.15, 0.2, 0.2, 0.55, 0.6, 0.6, 0.65, 0.7, 0.75
There are two common ways to combine these estimates into a final classification:

Majority vote approach:
Each tree classifies the observation as Red if its estimated probability exceeds 0.5.
Probabilities greater than 0.5: 0.55, 0.6, 0.6, 0.65, 0.7, 0.75 → 6 trees vote Red.
Probabilities less than or equal to 0.5: 0.1, 0.15, 0.2, 0.2 → 4 trees vote Green.
Since the majority of trees vote Red, the final classification under majority vote is Red.

Average probability approach:
Average the 10 estimates: (0.1 + 0.15 + 0.2 + 0.2 + 0.55 + 0.6 + 0.6 + 0.65 + 0.7 + 0.75) / 10 = 0.45.
Since 0.45 <= 0.5, the final classification under the average probability is Green.

The two approaches therefore disagree for this observation.
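Both rules are easy to verify in R (a quick check; the vector name p is my own):

# Ten bootstrap estimates of P(Class = Red | X)
p <- c(0.1, 0.15, 0.2, 0.2, 0.55, 0.6, 0.6, 0.65, 0.7, 0.75)
sum(p > 0.5)   # majority vote: 6 of 10 trees say Red -> Red
mean(p)        # average probability: 0.45 <= 0.5 -> Green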
The regression tree algorithm is used to predict a continuous outcome variable by recursively splitting the data into regions that are as homogeneous as possible. Here is a step-by-step explanation of how the algorithm works:
# Step 1: Start with the full dataset
Consider all observations and all predictor variables.
The goal is to split the data into two groups that minimize the prediction error.
# Step 2: Find the best split
For each predictor variable, consider all possible split points (thresholds).
At each possible split, divide the data into two regions.
For each candidate split, compute the residual sum of squares (RSS): the sum of squared differences between the actual Y values and the mean of Y within each resulting region, added over both regions.
# Step 3: Choose the split that minimizes RSS
Among all variables and all possible split points, choose the one that gives the lowest total RSS (sum of RSS for the two resulting regions).
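As a sketch of Steps 2 and 3, the following R function (my own illustration, not part of the text) finds the best split point for a single predictor by exhaustively minimizing the total RSS of the two resulting regions:

# Exhaustive search for the best split on one predictor x (illustrative sketch)
best_split <- function(x, y) {
  best <- list(split = NA, rss = Inf)
  for (s in sort(unique(x))) {
    left  <- y[x <  s]
    right <- y[x >= s]
    if (length(left) == 0 || length(right) == 0) next   # skip degenerate splits
    rss <- sum((left - mean(left))^2) + sum((right - mean(right))^2)
    if (rss < best$rss) best <- list(split = s, rss = rss)
  }
  best
}

# Toy example: the true change point is at x = 0.4
set.seed(1)
x <- runif(50)
y <- ifelse(x < 0.4, 1, 5) + rnorm(50, sd = 0.3)
best_split(x, y)   # the chosen split should land near 0.4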
# Step 4: Repeat the process
The same process is applied recursively to each of the two resulting regions.
At each step, choose the best split for that region based on minimizing RSS.
# Step 5: Stop when a stopping criterion is met
The splitting process continues until a stopping rule is reached, such as:
A minimum number of observations in a node
A minimum decrease in RSS
A maximum tree depth
# Step 6: Predict the outcome
For a new observation, follow the splits in the tree to reach a terminal node.
The predicted value is the average of the Y values in that terminal node.
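In practice these steps are implemented by packages such as rpart; the following is a sketch using the built-in mtcars data (the dataset and control parameter values are just illustrative choices):

# Fit a regression tree for mpg with recursive binary splitting (rpart)
library(rpart)
fit <- rpart(mpg ~ ., data = mtcars, method = "anova",
             control = rpart.control(minsplit = 10, cp = 0.01, maxdepth = 5))
print(fit)                          # shows the chosen splits and node means

# A new observation follows the splits down to a terminal node and is
# predicted with the mean mpg of the training observations in that node
predict(fit, newdata = mtcars[1, ])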