LFMG-Lab-9
Trees and Rules

8.1
Recreate the simulated data from Exercise 7.2:

library(mlbench)
set.seed(200)
simulated <- mlbench.friedman1(200, sd = 1)
simulated <- cbind(simulated$x, simulated$y)
simulated <- as.data.frame(simulated)
colnames(simulated)[ncol(simulated)] <- "y"
a)
Fit a random forest model to all of the predictors, then estimate the variable importance scores:
library(randomForest)
randomForest 4.7-1.2
Type rfNews() to see new features/changes/bug fixes.
library(caret)
Loading required package: ggplot2
Attaching package: 'ggplot2'
The following object is masked from 'package:randomForest':
margin
Loading required package: lattice
library(ggplot2)
library(dplyr)
Attaching package: 'dplyr'
The following object is masked from 'package:randomForest':
combine
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
library(tibble)
library(knitr)
library(gbm)
Loaded gbm 2.2.2
This version of gbm is no longer under development. Consider transitioning to gbm3, https://github.com/gbm-developers/gbm3
model1 <- randomForest(y ~ ., data = simulated,
                       importance = TRUE,
                       ntree = 1000)
rfImp1 <- varImp(model1, scale = FALSE)
rfImp1 |> arrange(desc(Overall)) |> knitr::kable()
|    |    Overall|
|:---|----------:|
|V1  |  8.7322354|
|V4  |  7.6151188|
|V2  |  6.4153694|
|V5  |  2.0235246|
|V3  |  0.7635918|
|V6  |  0.1651112|
|V7  | -0.0059617|
|V10 | -0.0749448|
|V9  | -0.0952927|
|V8  | -0.1663626|
Did the random forest model significantly use the uninformative predictors (V6–V10)?
The five informative predictors (V1–V5) have substantially positive importance scores, with V1, V2, and V4 being the most dominant. By contrast, the five uninformative predictors (V6–V10) all have importance scores very close to zero (slightly negative in places). In other words, the forest essentially ignored V6–V10; they were not used in any meaningful way by the model.
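As a quick check (a sketch that assumes the rfImp1 data frame produced above, whose row names are V1–V10), we can pull out just the rows for the noise predictors:

# importance scores for the uninformative predictors only
rfImp1[paste0("V", 6:10), , drop = FALSE]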
b)
Now add an additional predictor that is highly correlated with one of the informative predictors. For example:
simulated$duplicate1 <- simulated$V1 + rnorm(200) * .1
cor(simulated$duplicate1, simulated$V1)
[1] 0.9460206
Fit another random forest model to these data. Did the importance score for V1 change? What happens when you add another predictor that is also highly correlated with V1?
# re-fit with the first duplicate
model2 <- randomForest(
  y ~ .,
  data = simulated,
  importance = TRUE,
  ntree = 1000
)
rfImp2 <- varImp(model2, scale = FALSE)
rfImp2 |>
  arrange(desc(Overall)) |>
  knitr::kable(
    digits = 4,
    caption = "Variable importance with one V1-duplicate"
  )
|           | Overall|
|:----------|-------:|
|V4         |  7.0475|
|V2         |  6.0690|
|V1         |  5.6912|
|duplicate1 |  4.2833|
|V5         |  1.8724|
|V3         |  0.6297|
|V6         |  0.1357|
|V10        |  0.0289|
|V9         |  0.0084|
|V7         | -0.0135|
|V8         | -0.0437|
After adding one duplicate, V1's importance fell from its original 8.73 to 5.69, while duplicate1 picked up the slack with an importance of 4.28. The credit that previously belonged to V1 alone is now split roughly 60/40 between V1 and duplicate1.
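A two-line sketch (assuming the rfImp2 table above; v1_grp is just a temporary name) makes the split explicit:

# share of the combined V1 + duplicate1 importance held by each
v1_grp <- rfImp2[c("V1", "duplicate1"), "Overall"]
round(v1_grp / sum(v1_grp), 2)  # roughly 0.57 and 0.43 with the values above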
# add a second highly-correlated copy of V1
simulated$duplicate2 <- simulated$V1 + rnorm(200) * 0.1
cat("cor(duplicate2, V1) =", cor(simulated$duplicate2, simulated$V1), "\n")

cor(duplicate2, V1) = 0.9408631

# re-fit with both duplicates
model3 <- randomForest(
  y ~ .,
  data = simulated,
  importance = TRUE,
  ntree = 1000
)
rfImp3 <- varImp(model3, scale = FALSE)
rfImp3 |>
  arrange(desc(Overall)) |>
  knitr::kable(
    digits = 4,
    caption = "Variable importance with two V1-duplicates"
  )
|           | Overall|
|:----------|-------:|
|V4         |  7.0487|
|V2         |  6.5282|
|V1         |  4.9169|
|duplicate1 |  3.8007|
|V5         |  2.0312|
|duplicate2 |  1.8772|
|V3         |  0.5871|
|V6         |  0.1421|
|V7         |  0.1099|
|V10        |  0.0923|
|V9         | -0.0108|
|V8         | -0.0841|
After adding a second duplicate, V1 dropped again to 4.92, with duplicate1 at 3.80 and duplicate2 at 1.88. The original signal is now divided three ways, so each correlated predictor receives only a fraction of V1's original score.
c)
Use the cforest function in the party package to fit a random forest model using conditional inference trees. The party package function varimp can calculate predictor importance. The conditional argument of that function toggles between the traditional importance measure and the modified version described in Strobl et al. (2007). Do these importances show the same pattern as the traditional random forest model?
# 1) load party and set up cforest
library(party)
Loading required package: grid
Loading required package: mvtnorm
Loading required package: modeltools
Loading required package: stats4
Loading required package: strucchange
Loading required package: zoo
Attaching package: 'zoo'
The following objects are masked from 'package:base':
as.Date, as.Date.numeric
Loading required package: sandwich
Attaching package: 'party'
The following object is masked from 'package:dplyr':
where
cf_ctrl <- cforest_control(ntree = 1000,
                           mtry = floor(sqrt(ncol(simulated) - 1)))
cf_model <- cforest(
  y ~ .,
  data = simulated,
  controls = cf_ctrl
)
Now we compute the "raw" permutation importance (analogous to randomForest's) and then the "conditional" importance.
imp_raw <- varimp(cf_model, conditional = FALSE)
imp_raw <- sort(imp_raw, decreasing = TRUE)

imp_cond <- varimp(cf_model, conditional = TRUE)
imp_cond <- sort(imp_cond, decreasing = TRUE)

imp_tbl <- data.frame(
  Variable = names(imp_raw),
  Raw_Importance = round(imp_raw, 3),
  Conditional_Importance = round(imp_cond[names(imp_raw)], 3)
)
imp_tbl |>
  kable(
    caption = "cforest Variable Importances: Raw vs. Conditional"
  )
|Variable   | Raw_Importance| Conditional_Importance|
|:----------|--------------:|----------------------:|
|V4         |          5.593|                  3.413|
|V2         |          4.958|                  3.166|
|duplicate1 |          3.819|                  0.880|
|V1         |          3.645|                  1.027|
|duplicate2 |          1.776|                  0.291|
|V5         |          1.487|                  0.742|
|V7         |          0.069|                  0.027|
|V3         |          0.062|                  0.027|
|V10        |          0.004|                 -0.019|
|V6         |         -0.021|                  0.001|
|V9         |         -0.033|                 -0.004|
|V8         |         -0.054|                 -0.011|
In the raw permutation importance from cforest (conditional = FALSE), we see the same "dilution" effect that occurs in a traditional random forest: V1's predictive power is split among itself and its two near-duplicates (duplicate1 and duplicate2), so each of those three variables receives only a portion of what was originally V1's entire importance. The remaining low-signal predictors (V3, V6–V10) stay very close to zero.
The conditional importance (conditional = TRUE), in contrast, asks how much each variable contributes beyond the predictors it is correlated with. Under this measure V1 edges ahead of duplicate1, and duplicate2's importance collapses toward zero, showing that once we account for V1 its noisy copies add much less new information. The conditional measure helps correct the bias introduced by correlated features, giving a clearer picture of which predictors truly matter.
d)
Repeat this process with different tree models, such as boosted trees and Cubist. Does the same pattern occur?
set.seed(200)
gbm_fit <- train(
  y ~ .,
  data = simulated,
  method = "gbm",
  trControl = trainControl(method = "cv", number = 5),
  verbose = FALSE,
  tuneGrid = expand.grid(
    n.trees = 1000,
    interaction.depth = 3,
    shrinkage = 0.01,
    n.minobsinnode = 10
  )
)
gbm_imp <- varImp(gbm_fit, scale = FALSE)$importance
gbm_imp |>
  arrange(desc(Overall)) |>
  knitr::kable(caption = "GBM variable importance")
|           |    Overall|
|:----------|----------:|
|V4         | 43631.3487|
|V2         | 33053.5717|
|V1         | 23176.4461|
|V5         | 18146.9778|
|duplicate1 | 16302.5178|
|V3         | 11809.8170|
|duplicate2 |  4193.8147|
|V7         |  1672.9984|
|V6         |  1189.5452|
|V9         |  1049.2769|
|V8         |   760.6454|
|V10        |   676.3745|
set.seed(200)
cubist_fit <- train(
  y ~ .,
  data = simulated,
  method = "cubist",
  trControl = trainControl(method = "cv", number = 5),
  tuneGrid = expand.grid(committees = c(1, 5, 10), neighbors = c(0, 5))
)
cubist_imp <- varImp(cubist_fit, scale = FALSE)$importance
cubist_imp |>
  arrange(desc(Overall)) |>
  knitr::kable(caption = "Cubist variable importance")
|           | Overall|
|:----------|-------:|
|V2         |    70.0|
|V1         |    57.5|
|V4         |    52.5|
|V5         |    50.0|
|duplicate2 |    28.5|
|duplicate1 |    26.0|
|V3         |    25.0|
|V6         |     9.0|
|V8         |     4.0|
|V7         |     0.0|
|V9         |     0.0|
|V10        |     0.0|
For both GBM and Cubist, the five most informative variables (V1–V5) dominate the importance rankings, while the five noise variables (V6–V10) sit very near zero. In the GBM fit, V4 and V2 lead by a wide margin, with V1 and V5 next, and we see that duplicate1 and duplicate2 have “stolen” some of V1’s credit.
In the Cubist model we see the same idea: V2 and V1 are most important, V4 and V5 follow, and the two duplicates pick up moderate importance while V6–V10 drop to single digits or zero.
This mirrors what we saw with the randomForest and cforest raw importances: any impurity- or permutation-based measure will split a feature's importance among its highly correlated copies. The fact that the duplicates register substantial (though lower) importance while the pure noise features register essentially none is normal and expected for tree-based models that do not adjust for conditional associations.
8.2
Use a simulation to show tree bias with different granularities.
We can run multiple null-signal simulations that repeatedly generate predictors driven entirely by noise (a continuous variable plus factors with 2, 5, or 10 levels) and track how often each one ends up as the very first split. Since none of the features truly influence y, any systematic over-representation can only be due to split bias. Aggregating across hundreds of runs gives a clear picture of that bias in action.
library(rpart)
library(tidyr)
set.seed(456)
n.sim    <- 500          # 500 datasets to simulate
n        <- 200          # sample size per dataset
k.levels <- c(2, 5, 10)

# storage for which variable is used in the very first split
first_split <- matrix(NA, n.sim, length(k.levels) + 1,
                      dimnames = list(NULL,
                                      c("X_cont", paste0("X_cat_", k.levels))))

for (i in seq_len(n.sim)) {
  # simulate one dataset
  X_cont   <- runif(n)
  X_cat_2  <- factor(sample(letters[1:2], n, TRUE))
  X_cat_5  <- factor(sample(letters[1:5], n, TRUE))
  X_cat_10 <- factor(sample(letters[1:10], n, TRUE))
  y        <- rnorm(n)
  df       <- data.frame(y, X_cont, X_cat_2, X_cat_5, X_cat_10)

  # fit a full tree (no pre-pruning)
  fit <- rpart(y ~ ., data = df,
               method = "anova",
               control = rpart.control(cp = 0, minsplit = 2))

  # grab the variable used at the root node
  vs <- fit$frame$var[1]
  first_split[i, ] <- colnames(first_split) == vs
}
# compute selection frequencies
freq_df <- as_tibble(first_split) %>%
  summarise(across(everything(), mean)) %>%
  pivot_longer(everything(),
               names_to = "Variable",
               values_to = "Freq")

# plot
ggplot(freq_df, aes(x = Variable, y = Freq)) +
  geom_col() +
  labs(
    title = "Frequency of First Split by Predictor Granularity",
    y = "Proportion of Simulations",
    x = NULL
  ) +
  theme_minimal(base_size = 14)
The plot clearly shows the split‐bias we were looking for. Even though none of the predictors actually influence y, the tree picks the 10-level factor (X_cat_10) at the root in over half of the simulations, the continuous variable (X_cont) next most often, then the 5-level factor, and almost never the 2-level factor. In other words, variables with more potential cut-points get chosen more frequently purely by chance, which is exactly the bias we wanted to illustrate.
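The mechanism is the number of candidate splits each predictor offers the exhaustive search. As a rough back-of-the-envelope count (a sketch, not part of the simulation above; n_obs and k are throwaway names): a continuous predictor with n distinct values offers n - 1 cut-points, while an unordered k-level factor can be partitioned into two groups in 2^(k-1) - 1 ways.

# candidate split counts for the predictors used above (200 observations)
n_obs <- 200
k     <- c(2, 5, 10)
data.frame(
  predictor        = c("X_cont", paste0("X_cat_", k)),
  candidate_splits = c(n_obs - 1, 2^(k - 1) - 1)  # 199, then 1, 15, 511
)

The ordering of these counts (X_cat_10 > X_cont > X_cat_5 > X_cat_2) matches the first-split frequencies seen in the plot.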
8.3
In stochastic gradient boosting the bagging fraction and learning rate will govern the construction of the trees as they are guided by the gradient. Although the optimal values of these parameters should be obtained through the tuning process, it is helpful to understand how the magnitudes of these parameters affect magnitudes of variable importance. Figure 8.24 provides the variable importance plots for boosting using two extreme values for the bagging fraction (0.1 and 0.9) and the learning rate (0.1 and 0.9) for the solubility data. The left-hand plot has both parameters set to 0.1, and the right-hand plot has both set to 0.9:
a)
Why does the model on the right focus its importance on just the first few of predictors, whereas the model on the left spreads importance across more predictors?
When we crank both the bagging fraction and the learning rate up to 0.9, each tree in the ensemble sees nearly the entire data set and takes a very large "step" toward minimizing the loss. As a result, the strongest predictors absorb almost all of the residual error immediately, so the booster keeps splitting on those few variables over and over, leaving the weaker signals nearly untouched.
In the other case, with a low bag fraction (0.1) and low shrinkage (0.1), each tree is built on only 10% of the data and only nudges the fit a little bit. That randomness and those tiny steps force the algorithm to revisit and exploit secondary predictors in later iterations, so importance gets spread more broadly across many features.
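The solubility data behind Fig. 8.24 are not loaded in this lab, but a minimal sketch on the simulated data from 8.1 (using the gbm package loaded earlier; gbm_low, gbm_high, the depth of 3, and n.minobsinnode = 5 are arbitrary choices for illustration) shows the same concentration effect:

# two extreme settings analogous to Fig. 8.24 (illustrative sketch only)
set.seed(100)
gbm_low  <- gbm(y ~ ., data = simulated, distribution = "gaussian",
                n.trees = 1000, interaction.depth = 3, n.minobsinnode = 5,
                shrinkage = 0.1, bag.fraction = 0.1)
gbm_high <- gbm(y ~ ., data = simulated, distribution = "gaussian",
                n.trees = 1000, interaction.depth = 3, n.minobsinnode = 5,
                shrinkage = 0.9, bag.fraction = 0.9)

# relative influence: gbm_high should pile importance onto the top few predictors
head(summary(gbm_low,  plotit = FALSE), 5)
head(summary(gbm_high, plotit = FALSE), 5)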
b)
Which model do you think would be more predictive of other samples?
The model on the left, with both the bagging fraction and the learning rate equal to 0.1, will most likely generalize better to new data. By subsampling heavily and taking only tiny steps along the gradient, it builds many small, diverse trees that each capture a sliver of the remaining signal, making it much less likely to over-fit to any one strong predictor or noise fluctuation.
c)
How would increasing interaction depth affect the slope of predictor importance for either model in Fig. 8.24?
Interaction depth in boosting controls how many splits (or tree levels) each tree can have, which lets the algorithm capture higher-order interactions. If we increase the depth, the strongest predictors explain even more of the residual error (in part by interacting with each other), so they soak up disproportionately more "credit" and the ranking curve falls off more sharply. In other words, deeper trees amplify the top predictors' importance at the expense of the weaker ones, making the slope of importance versus rank steeper.
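One way to probe this empirically (again a sketch on the simulated data rather than the solubility set; ri_shallow and ri_deep are names I introduce here, and the outcome will depend on the data and the other tuning values) is to compare how concentrated the relative influence is at a shallow versus a deep interaction depth:

# relative influence profiles at two interaction depths (illustrative sketch)
set.seed(100)
ri_shallow <- summary(gbm(y ~ ., data = simulated, distribution = "gaussian",
                          n.trees = 1000, shrinkage = 0.1,
                          interaction.depth = 1),  plotit = FALSE)
ri_deep    <- summary(gbm(y ~ ., data = simulated, distribution = "gaussian",
                          n.trees = 1000, shrinkage = 0.1,
                          interaction.depth = 10), plotit = FALSE)

# proportion of total influence captured by the three top-ranked predictors
c(depth_1  = sum(head(ri_shallow$rel.inf, 3)) / sum(ri_shallow$rel.inf),
  depth_10 = sum(head(ri_deep$rel.inf, 3))    / sum(ri_deep$rel.inf))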
8.7
Refer to Exercises 6.3 and 7.5 which describe a chemical manufacturing process. Use the same data imputation, data splitting, and pre-processing steps as before and train several tree-based models:
library(AppliedPredictiveModeling)
data(ChemicalManufacturingProcess)
cmp <- data.frame(ChemicalManufacturingProcess)
library(RANN)
# Impute missing values
preProc <- preProcess(cmp, method = "knnImpute")  # or "medianImpute"
# Applying the preprocessing to fill in missing values
cmp_imputed <- predict(preProc, newdata = cmp)
# Checking if any NAs remain
sum(is.na(cmp_imputed))
[1] 0
# Use the imputed data from earlier
df <- cmp_imputed
# Remove near-zero variance predictors
nzv <- nearZeroVar(df)
df <- df[, -nzv]
# Splitting data into training and testing
set.seed(48)
split_index <- createDataPartition(df$Yield, p = 0.8, list = FALSE)
train_data <- df[split_index, ]
test_data <- df[-split_index, ]
# set up repeated CV
ctrl <- trainControl(
  method = "repeatedcv",
  number = 10,
  repeats = 5,
  savePredictions = "final"
)

# we start by training the tree-based models and create a list called models
models <- list(
  CART   = train(Yield ~ ., data = train_data, method = "rpart", trControl = ctrl),
  Bagged = train(Yield ~ ., data = train_data, method = "treebag", trControl = ctrl),
  RF     = train(Yield ~ ., data = train_data, method = "rf", trControl = ctrl),
  GBM    = train(Yield ~ ., data = train_data,
                 method = "gbm", trControl = ctrl, verbose = FALSE),
  Cubist = train(Yield ~ ., data = train_data,
                 method = "cubist", trControl = ctrl,
                 tuneGrid = expand.grid(committees = c(1, 5, 10), neighbors = c(0, 5)))
)
Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,
: There were missing values in resampled performance measures.
# compare resampling distributions of the models in the list
resamps <- resamples(models)
print(summary(resamps))
Call:
summary.resamples(object = resamps)
Models: CART, Bagged, RF, GBM, Cubist
Number of resamples: 50
MAE
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
CART 0.4738215 0.6678340 0.7131400 0.7295551 0.7961814 0.9970055 0
Bagged 0.2997462 0.4381138 0.4976493 0.5127379 0.5713255 0.8791791 0
RF 0.2764440 0.4198121 0.4656958 0.4830576 0.5431133 0.6953974 0
GBM 0.2649495 0.4186403 0.4722164 0.4844123 0.5567139 0.7210251 0
Cubist 0.2745832 0.3822608 0.4290977 0.4309295 0.4793674 0.6008285 0
RMSE
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
CART 0.5640456 0.8002778 0.9097597 0.9015343 1.0019018 1.3160264 0
Bagged 0.3551259 0.5690975 0.6450312 0.6677866 0.7650986 1.1363785 0
RF 0.3948928 0.5619098 0.6047475 0.6339438 0.7290587 0.9109366 0
GBM 0.3361831 0.5580418 0.5962691 0.6227318 0.7115450 0.9238865 0
Cubist 0.3515904 0.4806503 0.5429953 0.5509858 0.6473205 0.7889099 0
Rsquared
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
CART 0.001198163 0.1654309 0.3043729 0.2901753 0.3967191 0.6251660 0
Bagged 0.162801691 0.5366377 0.6344274 0.6150948 0.7118715 0.8633077 0
RF 0.373154893 0.5602971 0.6446800 0.6601402 0.7559946 0.8920301 0
GBM 0.345088951 0.5682870 0.6745453 0.6530730 0.7398090 0.8684146 0
Cubist 0.353517039 0.6724127 0.7232309 0.7193237 0.8190026 0.8891353 0
bwplot(resamps, metric = "RMSE")
bwplot(resamps, metric = "Rsquared")
a)
Which tree-based regression model gives the optimal resampling and test set performance?
# predictions on the test set
test_preds <- lapply(models, predict, newdata = test_data)

# compute test-set RMSE and R2 for each
test_perf <- sapply(test_preds, function(p) {
  c(
    RMSE = caret::RMSE(p, test_data$Yield),
    R2   = caret::R2(p, test_data$Yield)
  )
})
round(test_perf, 3)
CART Bagged RF GBM Cubist
RMSE 0.473 0.519 0.479 0.620 0.532
R2 0.671 0.618 0.664 0.488 0.655
In the resampling RMSE plot the Cubist model shows the smallest median RMSE and the tightest spread, so it is the best-performing model in terms of cross-validated prediction error. Similarly, in the R² plot Cubist has the highest median (around 0.72), clearly separated from the other models, making it the winner on explained variance in cross-validation.
However, on the independent 20% hold-out sample, CART actually delivered the best generalization, with RMSE = 0.473 and R² = 0.671, beating Cubist (RMSE 0.532, R² 0.655), RF, GBM, and bagging. CART's simplicity may have worked in its favor here: the single tree happened to fit the particular quirks of this test split better than the more complex models, which either under- or over-fit slightly.
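To put the two views side by side (a small sketch that assumes the resamps and test_perf objects created above; cv_rmse is just a temporary name):

# median cross-validated RMSE next to the hold-out RMSE for each model
cv_rmse <- summary(resamps)$statistics$RMSE[, "Median"]
round(rbind(CV_median = cv_rmse,
            Test      = test_perf["RMSE", names(cv_rmse)]), 3)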
b)
Which predictors are most important in the optimal tree-based regression model? Do either the biological or process variables dominate the list? How do the top 10 important predictors compare to the top 10 predictors from the optimal linear and nonlinear models?
We can examine both Cubist and CART, since either could be considered optimal depending on whether cross-validated or test-set performance is prioritized.
# Start with CART
cart_imp <- varImp(models$CART, scale = FALSE)$importance %>%
  rownames_to_column("var") %>%
  arrange(desc(Overall))
top_cart <- cart_imp %>%
  slice(1:10) %>%
  mutate(
    type = case_when(
      grepl("^ManufacturingProcess", var) ~ "Process",
      grepl("^BiologicalMaterial", var) ~ "Biological",
      TRUE ~ "Other"
    )
  )

# Then we continue with Cubist
cubist_imp <- varImp(models$Cubist, scale = FALSE)$importance %>%
  rownames_to_column("var") %>%
  arrange(desc(Overall))
top_cubist <- cubist_imp %>%
  slice(1:10) %>%
  mutate(
    type = case_when(
      grepl("^ManufacturingProcess", var) ~ "Process",
      grepl("^BiologicalMaterial", var) ~ "Biological",
      TRUE ~ "Other"
    )
  )
cat("Top 10 — CART:\n")
Top 10 — CART:
print(top_cart)
var Overall type
1 ManufacturingProcess32 0.3545986 Process
2 BiologicalMaterial12 0.2820327 Biological
3 ManufacturingProcess13 0.2755249 Process
4 ManufacturingProcess36 0.2621354 Process
5 ManufacturingProcess31 0.2581273 Process
6 BiologicalMaterial01 0.0000000 Biological
7 BiologicalMaterial02 0.0000000 Biological
8 BiologicalMaterial03 0.0000000 Biological
9 BiologicalMaterial04 0.0000000 Biological
10 BiologicalMaterial05 0.0000000 Biological
cat("\nTop 10 — Cubist:\n")
Top 10 — Cubist:
print(top_cubist)
var Overall type
1 ManufacturingProcess32 56.5 Process
2 ManufacturingProcess17 52.5 Process
3 ManufacturingProcess09 28.0 Process
4 ManufacturingProcess01 17.5 Process
5 BiologicalMaterial02 17.5 Biological
6 ManufacturingProcess13 16.5 Process
7 ManufacturingProcess29 14.5 Process
8 BiologicalMaterial03 13.0 Biological
9 ManufacturingProcess27 12.5 Process
10 BiologicalMaterial04 11.0 Biological
Across both CART and Cubist, the single most important predictor is ManufacturingProcess32. In the CART model it’s followed by BiologicalMaterial12, then three more process variables before all remaining biological measurements drop to zero importance. In the Cubist model, the top five are all process measures, with four more process variables rounding out the top ten alongside three biologicals at much lower importance scores.
In both cases, process variables dominate the top 10 (see the quick tally below):
CART: 4 out of the top 5 are process-related (and the next five are biological but with zero weight).
Cubist: 7 of the top 10 are process variables, versus just 3 biological.
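A quick tally of the two top-10 tables (a sketch assuming the top_cart and top_cubist data frames built above) makes the balance explicit:

# count predictor types in each top-10 list
sapply(list(CART = top_cart, Cubist = top_cubist),
       function(d) table(factor(d$type, levels = c("Process", "Biological"))))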
To compare with the linear and nonlinear models, we can go back to the results from previous labs.
SVM Predictors: 7 out of 10 are process-related variables, while 3 are biological. Process variables dominate in the SVM model, making up 70% of the top 10 predictors. This might indicate that process control factors (such as equipment settings, temperature, timing) are more influential in determining chemical yield than the biological materials used.
PLS Predictors: 6 variables are shared across both models, which is a strong indicator of consistent importance. SVM picks up a few unique predictors (Process31, Bio12, Bio03).
Across all models, whether tree-based (CART and Cubist), linear (PLS), or nonlinear (SVM), the same process measurements consistently emerge as the strongest predictors. In particular, ManufacturingProcess32, Process13, and Process36 sit at or near the top of every top-10 list, and Process31 appears in SVM and CART as well. This heavy overlap (often 6–7 shared predictors across methods) shows that these process control factors are by far the most influential determinants of yield, regardless of whether we fit a single tree, an ensemble, a linear latent-variable model, or an SVM.
c)
Plot the optimal single tree with the distribution of yield in the terminal nodes. Does this view of the data provide additional knowledge about the biological or process predictors and their relationship with yield?
library(rpart.plot)
prp(
  models$CART$finalModel,
  type = 2,            # label nodes by split criteria
  extra = 101,         # show the fitted value plus n and % of observations in each node
  fallen.leaves = TRUE,
  main = "Optimal CART Tree with Yield Distributions"
)
We can see that the single-tree split on ManufacturingProcess32 at 0.19 creates two very distinct yield regimes:
Left branch (MP32 < 0.19) (Terminal 2) has a median Yield around -0.5, with a wide spread down to about -3 and a few outliers up near +1.
Right branch (MP32 ≥ 0.19) (Terminal 3) jumps to a median Yield of roughly +0.7 and is much tighter, with most values sitting between 0 and +2.
Since no biological variables ever appear in the splits, this view confirms that process conditions (in this case, MP32) dominate the yield relationship.
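We can corroborate the two regimes directly from the training data (a sketch using the dplyr verbs already loaded; "branch" is a temporary helper column, Yield and ManufacturingProcess32 are on the centered and scaled scale produced by knnImpute, and 0.19 is the cut-point reported by the tree above):

# yield summary on either side of the root split (standardized units)
train_data %>%
  mutate(branch = ifelse(ManufacturingProcess32 < 0.19,
                         "MP32 < 0.19", "MP32 >= 0.19")) %>%
  group_by(branch) %>%
  summarise(n = n(), median_yield = median(Yield), IQR_yield = IQR(Yield))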
# grab each training row's terminal-node number
train_data$terminal <- models$CART$finalModel$where

# then plot
ggplot(train_data, aes(x = factor(terminal), y = Yield)) +
  geom_boxplot() +
  labs(
    title = "Yield Distribution Across CART Terminal Nodes",
    x = "Terminal Node",
    y = "Yield"
  ) +
  theme_minimal(base_size = 14)
The box plots don't reveal any secondary splits on the biological materials. Any remaining variability within a node is probably random noise.
Overall, plotting the yield distributions in each terminal node makes the size and consistency of the process-variable effect very clear and shows that the biological measurements add little explanatory power once MP32 is partitioned on.