The Stump Model: Base Learner of AdaBoost

The stump model is a CART model with a single split, hence two terminal nodes. The key is to find a splitting rule (a splitting variable and a cutting point) and to properly record the fitted value at each terminal node. For this question I use a one-dimensional example, so only the cutting point needs to be searched. An additional difficulty is that I need to incorporate subject weights, which are used in AdaBoost.
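
In this one-dimensional setting, a stump with cutoff \(c\) is simply

\[
f(x) = \begin{cases} c_L, & x \le c, \\ c_R, & x > c, \end{cases}
\qquad c_L, c_R \in \{-1, 1\},
\]

with the two node labels set by a (weighted) majority vote within each node.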

  set.seed(5)
  n = 200
  x = runif(n)

  # true P(Y = 1 | x); labels are coded as -1/1
  py <- function(x) sin(4*pi*x)/3 + 0.5
  y = (rbinom(n, 1, py(x)) - 0.5)*2

  # plot the data (with a little vertical jitter) and the true probability,
  # shifted down by 0.5 so the decision boundary sits at y = 0
  plot(x, y + 0.1*runif(n, -1, 1), ylim = c(-1.1, 1.1), pch = 19,
       col = ifelse(y == 1, "darkorange", "deepskyblue"), ylab = "y")
  lines(sort(x), py(x)[order(x)] - 0.5)

  # testing data on a fine grid
  testx = seq(0, 1, length.out = 1000)
  testy = (rbinom(1000, 1, py(testx)) - 0.5)*2
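
Since the true \(P(Y = 1 \mid x)\) is known in this simulation, a convenient baseline (my addition, not part of the original analysis) is the approximate Bayes error on the test grid, i.e. the average of \(\min\{p(x), 1 - p(x)\}\):

  # approximate Bayes error: the average of min(p, 1-p) over the test grid;
  # no classifier can beat this on average
  mean(pmin(py(testx), 1 - py(testx)))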

The stump model works as follows. I wrote a function stump(x, y, w) that searches over the observed values of x as candidate cutoffs, scores each candidate by the weighted Gini impurity, and outputs the chosen cutoff point along with the left and right node predictions.
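
For a candidate cutoff \(c\), the split is scored by the negative weighted Gini impurity of the two resulting nodes,

\[
\text{score}(c) = -\,\frac{\sum_{i:\, x_i \le c} w_i\, p_L(1 - p_L) + \sum_{i:\, x_i > c} w_i\, p_R(1 - p_R)}{\sum_i w_i},
\]

where \(p_L\) and \(p_R\) are the weighted proportions of \(y = 1\) in the left and right nodes. The cutoff with the largest score (i.e., the smallest weighted impurity) is selected.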

After that, I test the code in the following two scenarios:

* All training data have equal weights.
* Observations with \(x \geq 0.5\) have weight 2, while observations with \(x < 0.5\) have weight 0.1. With heavier weights on the right half, the chosen cutoff should move toward that region.

  # weighted Gini impurity of a node: p(1-p) with weighted class proportion p
  gini <- function(y, w){
    p = weighted.mean(y == 1, w)
    return(p*(1 - p))
  }

  stump <- function(x, y, w){
    n = length(x)
    score = rep(1, n - 1)

    # search for the cutoff: every observed x is a candidate split point
    for (i in 1:(n-1)){
      index = (x <= x[i])
      score[i] = -(sum(w[index])*gini(y[index], w[index]) +
                   sum(w[!index])*gini(y[!index], w[!index]))/sum(w)
    }

    C = x[which.max(score)]

    # label each terminal node by the (weighted) majority vote
    pl = weighted.mean(y[x <= C] == 1, w[x <= C])
    pr = weighted.mean(y[x > C] == 1, w[x > C])

    return(list(cutoff = C,
                left  = ifelse(pl > 0.5, 1, -1),
                right = ifelse(pr > 0.5, 1, -1)))
  }
  # check the two scenarios: first, all equal weights
  w = rep(1/n, n)
  stump(x, y, w)
## $cutoff
## [1] 0.2486734
## 
## $left
## [1] 1
## 
## $right
## [1] -1
  # second scenario: heavier weights for x >= 0.5
  w = ifelse(x >= 0.5, 2, 0.1)
  stump(x, y, w)
## $cutoff
## [1] 0.8036775
## 
## $left
## [1] 1
## 
## $right
## [1] -1

AdaBoost

Next, I write my own code for AdaBoost, an iterative method that repeatedly fits the stump model as the base learner and updates the observation weights.
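
Concretely, at iteration \(k\) the stump \(f_k\) is fit under the current weights, and with its weighted error \(e_k\) the algorithm computes a voting weight and updates the weights using a shrinkage parameter \(\delta\):

\[
e_k = \sum_i w_i \,\mathbf{1}\{f_k(x_i) \neq y_i\}, \qquad
\alpha_k = \frac{1}{2}\log\frac{1 - e_k}{e_k}, \qquad
w_i \leftarrow \frac{w_i\, e^{-\delta \alpha_k y_i f_k(x_i)}}{\sum_j w_j\, e^{-\delta \alpha_k y_j f_k(x_j)}}.
\]

The final prediction after \(K\) iterations is \(\operatorname{sign}\big(\sum_{k=1}^{K} \delta\, \alpha_k f_k(x)\big)\).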

  # fit AdaBoost with stumps; delta is the shrinkage parameter
  my_adaboost <- function(x, y, ntrees, delta = 0.5){
    modellist = list()
    w = rep(1/length(x), length(x))

    for (k in 1:ntrees){
      fk = stump(x, y, w)
      fit = ifelse(x <= fk$cutoff, fk$left, fk$right)

      # weighted training error of the current stump
      e = sum(w*(fit != y))

      # voting weight, then update and renormalize the observation weights
      a = 0.5*log((1 - e)/e)
      w = w * exp(-delta*a*y*fit)
      w = w / sum(w)

      # store alpha with the shrinkage already applied
      fk$alpha = a * delta
      modellist[[k]] = fk
    }
    return(modellist)
  }
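
As a quick sanity check (my addition, not shown in the original output; the object name checkfit is arbitrary), one can fit a handful of stumps and inspect their shrunken voting weights, which reflect how much each stump contributes to the final vote:

  # fit 5 boosting iterations with the default shrinkage and inspect the alphas
  checkfit = my_adaboost(x, y, 5)
  sapply(checkfit, function(f) f$alpha)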

  # predict with the first K trees, tracking exponential loss and 0/1 error
  my_adaboost_pred <- function(fit, K, testx, testy){
    pred = rep(0, length(testx))
    exp_loss = err = rep(NA, K)

    for (k in 1:min(K, length(fit))){
      pred_tree = ifelse(testx <= fit[[k]]$cutoff, fit[[k]]$left, fit[[k]]$right)
      pred = pred + fit[[k]]$alpha * pred_tree
      exp_loss[k] = mean(exp(-testy * pred))
      err[k] = mean(testy != sign(pred))
    }
    return(list("pred" = pred, "exp_loss" = exp_loss, "err" = err))
  }
  
  # fit the models and compute predictions for each (ntrees, delta) pair
  K = c(200, 400, 1000)
  delta = c(1, 0.5, 0.1)
  pred_test <- list()
  pred_train <- list()

  for (i in 1:3){
    myfit = my_adaboost(x, y, K[i], delta[i])
    pred_test[[i]] = my_adaboost_pred(myfit, K[i], testx, testy)
    pred_train[[i]] = my_adaboost_pred(myfit, K[i], x, y)
  }
  
  par(mfrow = c(2,3))

  # plot the exponential loss and the training/testing errors
  for (i in 1:3){
    main_text = paste0("Exponential loss and errors, delta = ", delta[i])
    plot(pred_train[[i]]$exp_loss, type = "l", ylim = c(0, 1),
         main = main_text, ylab = "exponential loss", xlab = "ntrees")
    lines(pred_train[[i]]$err, col = "green")
    lines(pred_test[[i]]$err, col = "red")
    legend("right", lty = 1, legend = c("exp loss", "train error", "test error"),
           col = c("black", "green", "red"))
  }

  # plot the fitted model at the last iteration
  for (i in 1:3){
    main_text = paste0("Final model, delta = ", delta[i])
    plot(x, y + 0.1*runif(n, -1, 1), ylim = c(-1.1, 1.1),
         col = ifelse(y == 1, "darkorange", "deepskyblue"), ylab = "y",
         main = main_text)
    lines(sort(x), py(x)[order(x)] - 0.5)
    lines(testx, pred_test[[i]]$pred, col = "red")
    lines(testx, ifelse(pred_test[[i]]$pred > 0, 1, -1), col = "blue", lty = 2)
  }
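
For a numeric summary to accompany the plots (my addition, not part of the original output), the test error at the final iteration of each setting can be read off directly:

  # test error after the final iteration for each (ntrees, delta) pair
  sapply(1:3, function(i) pred_test[[i]]$err[K[i]])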