Exercise 6

Question-3

Conceptual Formulas:

Gini Index: \( G = 1 - p^2 - (1 - p)^2 = 2p(1 - p) \)

Classification Error: \( E = 1 - \max(p,\ 1 - p) \)

Entropy: \( D = -p \log_2(p) - (1 - p) \log_2(1 - p) \)

These are measures of node impurity used in decision trees.

All three are 0 when the node is pure (\(p = 0\) or \(p = 1\)) and reach their maximum at \(p = 0.5\).

# Generate values of p (probability of class 1) from 0 to 1
p <- seq(0, 1, length.out = 100)
# Compute Gini Index
gini <- 2 * p * (1 - p)

# Compute Classification Error
class_error <- 1 - pmax(p, 1 - p)
# Compute Entropy (with fix for log(0))
entropy <- -p * log2(p) - (1 - p) * log2(1 - p)
entropy[is.nan(entropy)] <- 0  # Replace NaNs with 0 (at p = 0 or 1)
plot(p, gini, type = "l", col = "blue", lwd = 2,
     ylim = c(0, 1), xlab = expression(hat(p)[m1]),
     ylab = "Impurity Measure", main = "Impurity Measures vs Class Probability")

lines(p, class_error, col = "red", lwd = 2, lty = 2)
lines(p, entropy, col = "darkgreen", lwd = 2, lty = 3)
legend("topright", legend = c("Gini Index", "Classification Error", "Entropy"),
       col = c("blue", "red", "darkgreen"), lty = c(1, 2, 3), lwd = 2)

Interpretation Summary:

- All three measures peak at p = 0.5 (maximum uncertainty).

- Entropy is the most sensitive to impurity (peaking at 1), the Gini index is intermediate, and classification error is the least sensitive, being piecewise linear in \(p\) (see the quick check below).

- The Gini index is the default splitting criterion in CART, entropy (via information gain) is used in ID3/C4.5, and classification error is often preferred when pruning, since it tracks prediction accuracy directly.
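A quick numerical check of these claims, using the formulas above:

# All three measures at p = 0.5: entropy peaks at 1, while the
# Gini index and classification error both peak at 0.5
p0 <- 0.5
c(gini    = 2 * p0 * (1 - p0),
  error   = 1 - max(p0, 1 - p0),
  entropy = -p0 * log2(p0) - (1 - p0) * log2(1 - p0))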


Question-4

(a).

The left panel of Figure 8.14 shows how the two-dimensional space of the predictors X1 and X2 is split into rectangular regions. Each box displays the average value of the response variable Y within that region.

We reverse-engineer this to construct the following tree:

                        X1 < 1
                       /      \
                  X2 < 0      Region (Y = 5)
                  /     \
            Region(Y=15)  X1 < 0
                          /    \
                Region(Y=3)   Region(Y=10)

Interpretation:

  • If X1 ≥ 1: predict 5

  • If X1 < 1 and X2 < 0: predict 15

  • If X1 < 1, X2 ≥ 0, and X1 < 0: predict 3

  • If X1 < 1, X2 ≥ 0, and X1 ≥ 0: predict 10

This is a regression tree in which splits are made recursively on X1 and X2, and each terminal node predicts the mean of Y within its region.
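The prediction rule can be written as nested conditionals; a minimal sketch (predict_region() is an illustrative helper, not from the book):

# Encode the part (a) tree: each if-statement is one internal node
predict_region <- function(x1, x2) {
  if (x1 >= 1) return(5)    # right branch of the root
  if (x2 < 0) return(15)    # X1 < 1, X2 < 0
  if (x1 < 0) return(3)     # X1 < 1, X2 >= 0, X1 < 0
  10                        # X1 < 1, X2 >= 0, X1 >= 0
}

predict_region(2, 0)    # 5
predict_region(0.5, 1)  # 10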

(b).

The right panel of Figure 8.14 displays a regression tree. To recreate the corresponding partitioning of the predictor space, we follow the tree’s logic:

                 X2 < 1
                /      \
           X1 < 1      X2 < 2
          /     \     /     \
      -1.80    0.63 -1.06  X1 < 0
                         /     \
                      0.21    2.49

Each terminal node represents the average value of Y in a subregion of the predictor space.

# Load required packages
library(rpart)
library(rpart.plot)
# Simulated data reflecting the right panel's splits and mean values:
# Y is piecewise constant on the five regions, laid out on a grid with
# enough points for rpart to recover the partition (split order may differ)
data <- expand.grid(
  X1 = seq(-1.875, 2.875, by = 0.25),  # adjacent-value midpoints land on 0 and 1
  X2 = seq(0.125, 2.875, by = 0.25)    # adjacent-value midpoints land on 1 and 2
)
data$Y <- with(data, ifelse(X2 < 1,
                            ifelse(X1 < 1, -1.80, 0.63),
                            ifelse(X2 < 2, -1.06,
                                   ifelse(X1 < 0, 0.21, 2.49))))

# Fit regression tree
tree_model <- rpart(Y ~ X1 + X2, data = data, method = "anova")

# Plot the tree
rpart.plot(tree_model, type = 4, fallen.leaves = TRUE,
           main = "Regression Tree (Right Panel of Fig 8.14)")

To visualize how the tree divides the predictor space (like the left panel):

# Draw a custom partition plot
plot(NA, xlim = c(-2, 3), ylim = c(-1, 3), xlab = "X1", ylab = "X2",
     main = "Partition Based on Tree Structure", asp = 1)

# Root split: X2 = 1 spans the full range of X1
segments(-2, 1, 3, 1, col = "blue", lty = 2)
# The X2 = 2 split applies to the region X2 >= 1, which also spans all X1
segments(-2, 2, 3, 2, col = "blue", lty = 2)
# The X1 = 1 split applies only within X2 < 1
segments(1, -1, 1, 1, col = "red", lty = 2)
# The X1 = 0 split applies only within X2 >= 2
segments(0, 2, 0, 3, col = "red", lty = 2)

# Add region labels
text(-1.5, 0.5, "-1.80")
text(1.5, 0.5, "0.63")
text(-1.5, 1.5, "-1.06")
text(-1, 2.5, "0.21")  # X1 < 0, X2 >= 2
text(2, 2.5, "2.49")

This diagram mimics the style of the book's left panel, but it is based on the splits from the right panel's tree.
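As a quick sanity check (a sketch, assuming the fitted tree recovered all five regions), predicting at one representative point per region should return the leaf means:

# Expected: -1.80, 0.63, -1.06, 0.21, 2.49
check <- data.frame(X1 = c(-1, 2, 0, -1, 1),
                    X2 = c(0.5, 0.5, 1.5, 2.5, 2.5))
round(predict(tree_model, check), 2)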


Question-5

  • Suppose we produce ten bootstrapped samples from a data set containing red and green classes. We then apply a classification tree to each bootstrapped sample and, for a specific value of \(X\), produce 10 estimates of \(P(\textrm{Class is Red} \mid X)\): \[0.1, 0.15, 0.2, 0.2, 0.55, 0.6, 0.6, 0.65, 0.7, \textrm{and } 0.75.\]

  • There are two common ways to combine these results into a single class prediction.

  • One is the majority vote approach discussed in this chapter.

  • The second approach is to classify based on the average probability. In this example, what is the final classification under each of these two approaches?

# Probability estimates from the 10 trees
probs <- c(0.1, 0.15, 0.2, 0.2, 0.55, 0.6, 0.6, 0.65, 0.7, 0.75)

# 1. Majority vote classification
votes_red <- sum(probs > 0.5)
votes_green <- sum(probs <= 0.5)
majority_class <- ifelse(votes_red > votes_green, "Red", "Green")

# 2. Average probability classification
avg_prob <- mean(probs)
avg_class <- ifelse(avg_prob > 0.5, "Red", "Green")

# Print results
cat("Majority Vote Classification:", majority_class, "\n")
## Majority Vote Classification: Red
cat("Average Probability Classification:", avg_class, "\n")
## Average Probability Classification: Green
cat("Average Probability Value:", avg_prob, "\n")
## Average Probability Value: 0.45

The two approaches disagree: six of the ten estimates exceed 0.5, so the majority vote is Red, while the four low estimates (0.1 to 0.2) pull the average down to 0.45, so the average-probability rule predicts Green.

Question-6

A regression tree recursively partitions the predictor space to minimize the residual sum of squares (RSS). Here’s how the algorithm works:

  1. Start with all training data in the root node.

  2. For each feature \(X_j\) and each potential split point \(s\), define regions:

    \(R_1 = \{X | X_j < s\}\), \(R_2 = \{X | X_j \geq s\}\)

  3. Minimize RSS: \[ RSS = \sum_{i \in R_1}(y_i - \bar{y}_{R_1})^2 + \sum_{i \in R_2}(y_i - \bar{y}_{R_2})^2 \]

  4. Split at the best (j, s) and repeat recursively.

  5. Stop when a stopping rule is reached (e.g., min node size).

  6. Prediction for any input is the mean \(\bar{y}\) in the terminal node.

  7. Prune using cost-complexity pruning, which trades off fit against tree size: \[ C_\alpha(T) = \sum_{m=1}^{|T|} \sum_{i:\, x_i \in R_m} (y_i - \bar{y}_{R_m})^2 + \alpha |T|, \] where \(|T|\) is the number of terminal nodes and \(\alpha \ge 0\) is chosen by cross-validation.

This method builds an interpretable model while minimizing variance within each region.
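To make steps 2-3 concrete, here is a minimal sketch (not from the book; best_split() and its argument names are illustrative) of the exhaustive search for a single best split:

# Steps 2-3: search every feature j and cutpoint s for the split minimizing
# RSS (one level only; the full algorithm recurses on each half)
best_split <- function(X, y) {
  best <- list(rss = Inf)
  for (j in seq_along(X)) {
    for (s in sort(unique(X[[j]]))) {
      left <- X[[j]] < s                  # R1 = {X | X_j < s}
      if (!any(left) || all(left)) next   # skip splits with an empty region
      rss <- sum((y[left]  - mean(y[left]))^2) +
             sum((y[!left] - mean(y[!left]))^2)
      if (rss < best$rss)
        best <- list(feature = names(X)[j], cutpoint = s, rss = rss)
    }
  }
  best
}

# Example: the search should recover a cutpoint near 0.5 on x1
set.seed(1)
df <- data.frame(x1 = runif(50), x2 = runif(50))
df$y <- ifelse(df$x1 < 0.5, 1, 10) + rnorm(50, sd = 0.1)
best_split(df[c("x1", "x2")], df$y)

For step 7, rpart exposes cost-complexity pruning through its cp parameter (cp is \(\alpha\) rescaled by the root-node deviance); a common recipe, reusing the simulated df above:

# Grow a deep tree, then prune back to the cp minimizing cross-validated error
library(rpart)
fit <- rpart(y ~ x1 + x2, data = df, method = "anova",
             control = rpart.control(cp = 0, minsplit = 5))
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned <- prune(fit, cp = best_cp)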