3. Consider the Gini index, classification error, and entropy in a simple classification setting with two classes. Create a single plot that displays each of these quantities as a function of p̂_m1. The x-axis should display p̂_m1, ranging from 0 to 1, and the y-axis should display the value of the Gini index, classification error, and entropy. Hint: In a setting with two classes, p̂_m1 = 1 − p̂_m2. You could make this plot by hand, but it will be much easier to make in R.
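In the two-class setting, with p̂_m1 the proportion of class 1 observations in node m and p̂_m2 = 1 − p̂_m1, the three quantities plotted below are

  Gini index:            G = 1 − p̂_m1² − p̂_m2²
  Classification error:  E = 1 − max(p̂_m1, p̂_m2)
  Entropy:               D = −p̂_m1 log₂ p̂_m1 − p̂_m2 log₂ p̂_m2

(the code uses log base 2, so the entropy reaches a maximum of 1 at p̂_m1 = 0.5).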

# Define pm1 from 0 to 1
pm1 <- seq(0, 1, by = 0.01)
pm2 <- 1 - pm1

# Gini index
gini <- 1 - pm1^2 - pm2^2

# Classification error
class_error <- 1 - pmax(pm1, pm2)

# Entropy
entropy <- -pm1 * log2(pm1) - pm2 * log2(pm2)
entropy[is.nan(entropy)] <- 0  # handle 0*log(0)

# Plot
plot(pm1, gini, type = "l", col = "red", ylim = c(0,1), ylab = "Value", xlab = expression(hat(p)[m1]))
lines(pm1, class_error, col = "blue")
lines(pm1, entropy, col = "darkgreen")
legend("topright", legend = c("Gini", "Classification Error", "Entropy"),
       col = c("red", "blue", "darkgreen"), lty = 1)

Explanation:
This code computes and plots the Gini index, classification error, and entropy as functions of the estimated class probability p̂_m1 in a two-class setting. All three curves are symmetric about p̂_m1 = 0.5, where node impurity is greatest, and fall to 0 when the node is pure (p̂_m1 = 0 or 1); the Gini index and entropy are smooth, while the classification error is piecewise linear.
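As a quick numerical check (using the same formulas as the code above), all three measures peak at p̂_m1 = 0.5, where the Gini index and classification error equal 0.5 and the base-2 entropy equals 1:

# Values at p_m1 = 0.5
c(gini    = 1 - 0.5^2 - 0.5^2,
  error   = 1 - max(0.5, 0.5),
  entropy = -0.5 * log2(0.5) - 0.5 * log2(0.5))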

4. This question relates to the plots in Figure 8.14.
(a) Sketch the tree corresponding to the partition of the predictor space illustrated in the left-hand panel of Figure 8.14. The numbers inside the boxes indicate the mean of Y within each region.

library(rpart)
library(rpart.plot)
# Fake data to match the regions (for illustration)
x1 <- c(0.25, 0.75, 0.75, 0.25, 1.5)
x2 <- c(0.75, 0.25, 0.75, 0.25, 0.75)
y <- c(15, 0, 10, 3, 5)
data <- data.frame(x1, x2, y)

# Fit on the dummy data with loose control settings so rpart will split the
# tiny data set (this is a visual sketch of the tree, not a fit to real data)
tree <- rpart(y ~ x1 + x2, data = data,
              control = rpart.control(minsplit = 2, minbucket = 1, cp = 0))
rpart.plot(tree, main = "Tree for Partition in Figure 8.14 (Left)")

Explanation:
This code uses dummy data and rpart to sketch a tree structure similar to the left panel of Figure 8.14, illustrating how the predictor space is split.
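Because the data above are only dummy values, it is worth checking which splits rpart actually chose for part (a); printing the fitted object lists the split variables, cutpoints, and node means:

# Show the chosen split variables, cutpoints, and node means
print(tree)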
(b) Create a diagram similar to the left-hand panel of Figure 8.14, using the tree illustrated in the right-hand panel of the same figure. You should divide up the predictor space into the correct regions, and indicate the mean for each region.

# Create a grid for X1 and X2
x1 <- seq(-2, 3, length = 100)
x2 <- seq(-1, 3, length = 100)
grid <- expand.grid(X1 = x1, X2 = x2)

# Use the tree splits from the right panel (manually coded)
partition <- with(grid, ifelse(X2 < 1,
                        ifelse(X1 < 1, -1.80, 0.63),
                        ifelse(X2 < 2,
                               ifelse(X1 < 0, -1.06, 0.21),
                               2.49)))

# Plot the partition
library(ggplot2)
grid$Mean <- partition
ggplot(grid, aes(x = X1, y = X2, fill = as.factor(Mean))) +
  geom_tile() +
  scale_fill_brewer(palette = "Set2") +
  labs(title = "Partition of Predictor Space (Tree from Figure 8.14 Right)",
       fill = "Mean Y") +
  theme_minimal()

Explanation:
The above code manually implements the splits from the right-hand tree in Figure 8.14 and visualizes the partition of the predictor space, coloring each region by its mean.
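The geom_tile version colours each region but does not print the mean inside it. A base-graphics alternative, using the same hard-coded splits as above (the axis limits are arbitrary choices for illustration), draws the region boundaries and writes each mean inside its region, closer to the style of the left-hand panel of Figure 8.14:

# Draw the partition boundaries and label each region with its mean
plot(NA, xlim = c(-2, 3), ylim = c(-1, 3), xlab = "X1", ylab = "X2",
     main = "Partition implied by the right-hand tree of Figure 8.14")
segments(-2, 1, 3, 1)   # split at X2 = 1
segments(-2, 2, 3, 2)   # split at X2 = 2 (within X2 >= 1)
segments(1, -1, 1, 1)   # split at X1 = 1 (within X2 < 1)
segments(0, 1, 0, 2)    # split at X1 = 0 (within 1 <= X2 < 2)
text(-0.5, 0, "-1.80"); text(2, 0, "0.63")
text(-1, 1.5, "-1.06"); text(1.5, 1.5, "0.21")
text(0.5, 2.5, "2.49")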

5. Suppose we produce ten bootstrapped samples from a data set containing red and green classes. We then apply a classification tree to each bootstrapped sample and, for a specific value of X, produce 10 estimates of P(Class is Red|X):

0.1, 0.15, 0.2, 0.2, 0.55, 0.6, 0.6, 0.65, 0.7, and 0.75.

There are two common ways to combine these results together into a single class prediction. One is the majority vote approach discussed in this chapter. The second approach is to classify based on the average probability. In this example, what is the final classification under each of these two approaches?

# Probabilities from 10 bootstrapped trees
probs <- c(0.1, 0.15, 0.2, 0.2, 0.55, 0.6, 0.6, 0.65, 0.7, 0.75)

# Majority vote: count how many of the ten estimates exceed 0.5
majority_vote <- ifelse(sum(probs > 0.5) > length(probs) / 2, "Red", "Green")

# Average probability
avg_prob <- mean(probs)
average_vote <- ifelse(avg_prob > 0.5, "Red", "Green")

cat("Majority vote prediction:", majority_vote, "\n")
## Majority vote prediction: Red
cat("Average probability prediction:", average_vote, "\n")
## Average probability prediction: Green

Explanation:
Six of the ten probability estimates exceed 0.5, so the majority-vote approach classifies the observation as Red. The mean of the ten estimates is 0.45, which is below 0.5, so the average-probability approach classifies it as Green. The two approaches can therefore disagree, as they do here.

6. Provide a detailed explanation of the algorithm that is used to fit a regression tree.

# Pseudo-code for fitting a regression tree by recursive binary splitting:
# 1. Start with all observations in a single region.
# 2. For every predictor X_j and every candidate cutpoint s, consider splitting
#    the region into {X_j < s} and {X_j >= s}, and compute the resulting
#    residual sum of squares (RSS): the sum over both halves of
#    (y_i - mean of y in that half)^2.
# 3. Greedily choose the predictor and cutpoint that give the largest reduction
#    in RSS, and split the region in two.
# 4. Repeat steps 2-3 recursively within each resulting region until a stopping
#    criterion is met (e.g. a minimum number of observations per node).
# 5. Predict with the mean of Y in each terminal region.
# 6. Optionally prune the large tree back using cost-complexity pruning, with
#    the penalty parameter chosen by cross-validation.

# Example: fit a regression tree in R (reusing the dummy data and loose
# control settings from exercise 4 so the tiny data set actually splits)
library(rpart)
library(rpart.plot)
fit <- rpart(y ~ x1 + x2, data = data,
             control = rpart.control(minsplit = 2, minbucket = 1, cp = 0))
rpart.plot(fit)

Explanation:
The comments above outline recursive binary splitting, the greedy, top-down algorithm used to grow a regression tree, and the code shows how to fit one in R using rpart. A small from-scratch sketch of the split search is given below.
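To make the greedy split search concrete, here is a minimal sketch (not rpart's internal implementation) of a single step of recursive binary splitting: it scans every predictor and every observed cutpoint and returns the split that minimizes the combined RSS of the two resulting half-planes. The helper name best_split and the reuse of the dummy data frame from exercise 4 are illustrative choices.

# One step of recursive binary splitting: exhaustively search predictors and
# cutpoints for the split that minimizes total RSS (illustrative sketch only)
best_split <- function(X, y) {
  rss <- function(v) sum((v - mean(v))^2)
  best <- list(var = NA, cut = NA, rss = Inf)
  for (j in seq_along(X)) {
    for (s in sort(unique(X[[j]]))) {
      left <- X[[j]] < s
      if (any(left) && any(!left)) {
        total <- rss(y[left]) + rss(y[!left])
        if (total < best$rss) best <- list(var = names(X)[j], cut = s, rss = total)
      }
    }
  }
  best
}

# Example: find the first split on the dummy data from exercise 4
best_split(data[c("x1", "x2")], data$y)

Growing the full tree amounts to applying this search recursively to the observations falling in each half until the stopping rule is met.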