To properly fit a gradient boosting model to the data, I examined four tuning parameters, \( M \) (boosting iterations), \( J \) (interaction depth), \( \nu \) (shrinkage) and \( \eta \) (subsampling fraction), on a randomly selected sample of 1000 observations.
# Draw a random subsample of the training data for tuning
sample_size = 1000
sample_idxs = sample.int(nrow(X_train), sample_size)
sample_data = data.frame(y = y_train[sample_idxs, ], X_train[sample_idxs, ])
Whilst tuning the model, I set the number of iterations, \( M=150 \), and selected the best iteration according to its 5-fold CV error. I consistently found the best iteration to be less than 150 (after which the CV error increases), so I was confident I did not need to increase \( M \) during tuning.
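A minimal sketch of one such tuning run is below, assuming the sample_data frame built above; the object name tune.fit and the interaction depth, shrinkage and subsampling values shown are illustrative placeholders that were varied across runs:

library(gbm)

# One tuning run on the subsample: fit 150 trees with 5-fold CV,
# then read off the iteration with the lowest CV error.
tune.fit = gbm(y ~ .,
               data = sample_data,
               distribution = "multinomial",
               n.trees = 150,
               interaction.depth = 3,
               shrinkage = 0.1,
               bag.fraction = 0.5,
               cv.folds = 5)
tune.best = gbm.perf(tune.fit, method = "cv", plot.it = FALSE)
tune.fit$cv.error[tune.best]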
I chose an interaction depth, \( J=3 \), because the data is naturally 3-dimensional, and it offered some reduction in the CV error compared to the additive model. Any higher \( J \) was computationally too expensive.
With \( J \) selected, I searched for the optimal shrinkage, \( \nu \), between \( 0.5 \) and \( 0.001 \). I found this optimum was dependent on the chosen \( J \), \( \eta \) and the random sample; however, \( \nu=0.1 \) generally performed best.
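A sketch of this search, reusing the tuning setup above; the grid values and the nu.grid/nu.cv names are illustrative:

# Grid search over shrinkage on the subsample; record the best CV error per value
nu.grid = c(0.5, 0.2, 0.1, 0.05, 0.01, 0.001)
nu.cv = sapply(nu.grid, function(nu) {
  fit = gbm(y ~ .,
            data = sample_data,
            distribution = "multinomial",
            n.trees = 150,
            interaction.depth = 3,
            shrinkage = nu,
            bag.fraction = 0.5,
            cv.folds = 5)
  min(fit$cv.error)
})
nu.grid[which.min(nu.cv)]  # shrinkage with the lowest CV error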
Next, I experimented with subsampling below its default value of \( \eta=\frac{1}{2} \), but found no significant improvement in the CV error for my small sample of observations. Varying \( \eta \) may yield improvements across the full training set, but this was not examined due to computational limits.
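The comparison followed the same pattern as the shrinkage search; a small sketch with illustrative grid values:

# Compare a few subsampling fractions (bag.fraction) at the chosen J and nu
eta.grid = c(0.5, 0.4, 0.3)
eta.cv = sapply(eta.grid, function(eta) {
  fit = gbm(y ~ .,
            data = sample_data,
            distribution = "multinomial",
            n.trees = 150,
            interaction.depth = 3,
            shrinkage = 0.1,
            bag.fraction = eta,
            cv.folds = 5)
  min(fit$cv.error)
})
cbind(eta.grid, eta.cv)  # CV errors were similar across these values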
Finally, when applying these parameters to the full training set, I found extra iterations yielded better results. With computational limits in mind, I decided to set \( M=300 \). My final tuning parameters were:
\[ M=300,\ J=3,\ \nu=0.1,\ \eta=0.5 \]
# Fit the final model on the full training set with the chosen parameters
gbm.fit = gbm(y ~ .,
              data = data,
              distribution = "multinomial",
              n.trees = 300,
              interaction.depth = 3,
              shrinkage = 0.1,
              bag.fraction = 0.5,
              cv.folds = 5,
              verbose = TRUE,
              n.cores = 4)
The CV error plotted against the boosting iterations clearly shows the error decreasing (and suggests further iterations may yield marginal improvements):
# Select the iteration with the lowest CV error (gbm.perf also plots CV error vs. iterations)
best.iter <- gbm.perf(gbm.fit, method = "cv")
print(gbm.fit$cv.error[best.iter])
## [1] 0.03865
Of the 561 predictors, 228 have a non-zero influence, and only a small proportion of these have a strong influence. The figure below illustrates this, although there are too many predictors to display each one in detail.
summary(gbm.fit)
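The count of influential predictors can be recovered from the relative-influence table returned by summary.gbm; a minimal sketch (the rel.inf name is a placeholder):

# Relative influence table without the plot; count predictors the model actually uses
rel.inf = summary(gbm.fit, plotit = FALSE)
sum(rel.inf$rel.inf > 0)  # number of predictors with non-zero influence
head(rel.inf, 10)         # the small set of strongly influential predictors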
The three most important predictors, V41 (tGravityAcc-mean()-X), V53 (tGravityAcc-min()-X) and V559 (angle(X,gravityMean)), have a strong influence on the classification of Laying (pink):
plot(gbm.fit, "V41")
plot(gbm.fit, "V53")
plot(gbm.fit, "V559")
A few steps down in importance, V505 (fBodyAccMag-mad()) has a strong influence on Sitting (green). If V505 is negative, the observation is less likely to be classified as Sitting:
plot(gbm.fit, "V505")
A Random Forest model is created with the default parameters:
library(randomForest)

# Fit a random forest with the default parameters (500 trees, mtry = floor(sqrt(p)))
rf.fit = randomForest(y ~ ., data = data)
The misclassification rate can be plotted against the number of trees:
plot(rf.fit)
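The final out-of-bag misclassification rate can also be read directly from the err.rate matrix stored in the fitted object:

# OOB misclassification rate after the last tree ("OOB" is the first column of err.rate)
tail(rf.fit$err.rate[, "OOB"], 1)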
TODO. The Random Forest appears to rank different predictors as most important:
plot(rf.fit$importance)
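One way to examine this is to compare the top-ranked predictors from each model; a minimal sketch using the relative-influence table from summary.gbm and the Gini importance stored by randomForest (the gbm.top/rf.top names are placeholders):

# Top 10 predictors by GBM relative influence vs. random forest Gini importance
gbm.top = as.character(head(summary(gbm.fit, plotit = FALSE)$var, 10))
rf.imp = importance(rf.fit)
rf.top = rownames(rf.imp)[order(rf.imp[, "MeanDecreaseGini"], decreasing = TRUE)][1:10]
data.frame(gbm = gbm.top, rf = rf.top)
intersect(gbm.top, rf.top)  # predictors ranked highly by both models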