# Impurity measures for a two-class node as a function of pm1,
# the proportion of training observations in class 1
df <- data.frame(pm1 = seq(0.01, 0.99, by = 0.01))
df$Gini    <- 2 * df$pm1 * (1 - df$pm1)                               # Gini index
df$Entropy <- -df$pm1 * log(df$pm1) - (1 - df$pm1) * log(1 - df$pm1)  # cross-entropy
df$Error   <- 1 - pmax(df$pm1, 1 - df$pm1)                            # classification error
library(ggplot2)
ggplot(df, aes(x = pm1)) +
  geom_line(aes(y = Gini, color = "Gini Index")) +
  geom_line(aes(y = Entropy, color = "Entropy")) +
  geom_line(aes(y = Error, color = "Classification Error")) +
  labs(
    title = "Impurity Measures vs. Class Probability",
    x = expression(hat(p)[m1]),
    y = "Measure Value",
    color = "Impurity Metric"
  ) +
  theme_minimal()
From this plot we can observe that the classification error is much less sensitive to changes in the class probability than the Gini index and entropy, which makes it less informative for guiding tree splits. Entropy and the Gini index both reach their maximum at p̂_m1 = 0.5, where the node is most impure; all three measures approach zero as one class dominates, i.e., as the node becomes pure.
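As a quick numerical check on the data frame built above (a small sketch using only base R):
df$pm1[which.max(df$Gini)]     # Gini index peaks at p = 0.5
df$pm1[which.max(df$Entropy)]  # entropy also peaks at p = 0.5
max(df$Error)                  # classification error also tops out at 0.5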
A: Refer to the sketch of the partition produced below.
ggplot() +
  xlim(-1, 2) + ylim(0, 3) +
  geom_segment(aes(x = -1, xend = 2, y = 1, yend = 1)) +  # split at X2 = 1
  geom_segment(aes(x = 1, xend = 1, y = 0, yend = 1)) +   # split at X1 = 1 (for X2 < 1)
  geom_segment(aes(x = 0, xend = 0, y = 1, yend = 2)) +   # split at X1 = 0 (for 1 <= X2 < 2)
  geom_segment(aes(x = -1, xend = 2, y = 2, yend = 2)) +  # split at X2 = 2
  geom_text(aes(x = 0, y = 0.5, label = "-1.80")) +       # region means
  geom_text(aes(x = 1.5, y = 0.5, label = "0.63")) +
  geom_text(aes(x = -0.5, y = 1.5, label = "-1.06")) +
  geom_text(aes(x = 1, y = 1.5, label = "0.21")) +
  geom_text(aes(x = 0.5, y = 2.5, label = "2.49")) +
  xlab(expression(X[1])) + ylab(expression(X[2])) +
  theme_minimal()
The ten estimates of P(Class is Red | X) produced by the trees are: 0.1, 0.15, 0.2, 0.2, 0.55, 0.6, 0.6, 0.65, 0.7, 0.75.
Under the majority-vote approach, each tree classifies the observation as Red when its estimated probability exceeds 0.5. Here, 6 out of 10 trees favor Red, so the final classification is Red. Now consider the second approach, averaging the probabilities:
mean(c(0.1, 0.15, 0.2, 0.2, 0.55, 0.6, 0.6, 0.65, 0.7, 0.75))
## [1] 0.45
Since the average probability is 0.45, which is below 0.5, the final classification is Green. This shows how different ensemble strategies can lead to different classifications for the same set of predictions.
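Both rules can be reproduced in a few lines of base R (a small sketch; probs holds the ten estimates listed above):
probs <- c(0.1, 0.15, 0.2, 0.2, 0.55, 0.6, 0.6, 0.65, 0.7, 0.75)
# Majority vote: each tree votes Red when its estimated probability exceeds 0.5
sum(probs > 0.5)                              # 6 of 10 votes for Red
ifelse(sum(probs > 0.5) > 5, "Red", "Green")  # "Red"
# Average probability: classify as Red only if the mean estimate exceeds 0.5
ifelse(mean(probs) > 0.5, "Red", "Green")     # "Green" (mean = 0.45)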
A: The process of building a regression tree involves growing a large tree and then pruning it back; a code sketch follows the steps below.
Step 1: Grow the full tree. Start with the entire dataset and recursively split nodes on the predictor and cutpoint that most reduce the residual sum of squares (RSS), stopping when each terminal node reaches a minimum size.
Step 2: Apply cost-complexity pruning. Introduce a penalty parameter alpha on the number of terminal nodes. As alpha increases, collapse the internal node whose removal causes the smallest increase in RSS (weakest-link pruning), producing a sequence of candidate subtrees of decreasing size.
Step 3: Use cross-validation to select alpha. Perform K-fold CV: for each fold, grow and prune the tree on the training folds, compute the test MSE on the held-out fold for each value of alpha, and average the errors across folds.
Step 4: Choose the value of alpha (equivalently, the tree size) with the lowest CV error and return the corresponding subtree fit on the entire dataset.
Step 5: Prediction. Use the final pruned tree to predict the response for new observations.
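A minimal sketch of this grow-prune-CV workflow, assuming the rpart package and a simulated dataset (the data and variable names below are illustrative, and rpart's built-in 10-fold cross-validation stands in for the manual K-fold loop described above):
library(rpart)
set.seed(1)
n   <- 500
dat <- data.frame(x1 = runif(n), x2 = runif(n))
dat$y <- 2 * (dat$x1 > 0.5) + 3 * (dat$x2 > 0.3) + rnorm(n, sd = 0.5)
# Step 1: grow a large tree (cp = 0 turns off early stopping on complexity)
big_tree <- rpart(y ~ x1 + x2, data = dat, method = "anova",
                  control = rpart.control(cp = 0, minsplit = 10, xval = 10))
# Steps 2-3: the cptable holds the cost-complexity sequence and the
# 10-fold cross-validated error (xerror) for each candidate subtree
best_cp <- big_tree$cptable[which.min(big_tree$cptable[, "xerror"]), "CP"]
# Step 4: prune back to the subtree with the lowest CV error
final_tree <- prune(big_tree, cp = best_cp)
# Step 5: predict the response for a new observation
predict(final_tree, newdata = data.frame(x1 = 0.7, x2 = 0.2))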