Improving model performance

Ensemble Approach

  • As an alternative to increasing the performance of a single model, it is possible to combine several models to form a powerful team.
  • By combining the complementary strengths of several diverse members, a team of multiple weak learners can become a strong learner.
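A small worked example makes the "team of weak learners" idea concrete. It assumes an idealised binary task where each member is correct with probability \(p\) and the members' errors are fully independent. Real models are rarely that independent, which is exactly what the variance decomposition later in this section quantifies. The function name `majority_vote_accuracy` is hypothetical.

```python
from math import comb


def majority_vote_accuracy(n_models: int, p: float) -> float:
    """Probability that a strict majority of n_models independent binary
    classifiers, each correct with probability p, votes for the right class.
    Assumes independent errors (an idealisation)."""
    if n_models % 2 == 0:
        raise ValueError("use an odd number of models to avoid ties")
    need = n_models // 2 + 1  # smallest strict majority
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(need, n_models + 1)
    )


# Weak learners that are only slightly better than chance (p = 0.6).
for n in (1, 5, 11, 51):
    print(n, round(majority_vote_accuracy(n, 0.6), 3))
```

Under these assumptions the team's accuracy grows with its size whenever \(p > 0.5\). With \(p < 0.5\), voting makes things worse, so the members must be better than chance.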

Process Diagram

Advantages

Random Forest

Random Forest for Regression or Classification Algorithm

  1. For \(b=1\) to \(B\):

    1. Draw a bootstrap sample \(Z^{*}\) of size \(N\) from the training data.

    2. Grow a random-forest tree \(T_b\) on the bootstrapped data by recursively repeating the following steps for each terminal node of the tree, until the minimum node size \(n_{min}\) is reached:

      i. Select \(m\) variables at random from the \(p\) variables.

      ii. Pick the best variable/split-point among the \(m\).

      iii. Split the node into two daughter nodes.

  2. Output the ensemble of trees \(\left\{T_b\right\}_1^B\).
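The training loop above can be sketched in pure Python for the regression case. This is a minimal illustration under stated assumptions, not a reference implementation. Node impurity is the sum of squared errors. Candidate split points are the observed values of each variable. A node is treated as having reached the minimum node size when it holds fewer than \(n_{min}\) samples, which is one possible reading of the stopping rule. The function names `grow_tree`, `predict_tree` and `random_forest` are hypothetical.

```python
import random


def avg(values):
    values = list(values)
    return sum(values) / len(values)


def sse(values):
    """Sum of squared deviations from the node mean (regression impurity)."""
    mu = avg(values)
    return sum((v - mu) ** 2 for v in values)


def grow_tree(X, y, m, n_min, rng):
    """Grow one tree; a leaf is a number, an internal node is (j, t, left, right)."""
    # Assumed stopping rule: fewer than n_min samples, or a pure node.
    if len(y) < n_min or len(set(y)) == 1:
        return avg(y)
    best = None  # (impurity after split, variable j, split point t)
    # i. Select m of the p variables at random.
    for j in rng.sample(range(len(X[0])), m):
        # ii. Try every split point of variable j; keep the best over all m variables.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None:  # none of the m sampled variables varies in this node
        return avg(y)
    _, j, t = best
    # iii. Split the node into two daughter nodes and recurse on each.
    L = [i for i, row in enumerate(X) if row[j] <= t]
    R = [i for i, row in enumerate(X) if row[j] > t]
    return (j, t,
            grow_tree([X[i] for i in L], [y[i] for i in L], m, n_min, rng),
            grow_tree([X[i] for i in R], [y[i] for i in R], m, n_min, rng))


def predict_tree(node, x):
    """Follow the splits from the root down to a leaf value."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node


def random_forest(X, y, B, m, n_min, seed=0):
    rng = random.Random(seed)
    N = len(y)
    trees = []
    for _ in range(B):
        # 1.1 Bootstrap sample Z* of size N, drawn with replacement.
        idx = [rng.randrange(N) for _ in range(N)]
        # 1.2 Grow tree T_b on the bootstrap sample.
        trees.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], m, n_min, rng))
    return trees  # 2. The ensemble {T_b}, b = 1..B.


# Toy data (hypothetical): y depends only on the first of p = 3 variables.
data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(3)] for _ in range(100)]
y = [3.0 * row[0] + data_rng.gauss(0, 0.1) for row in X]
forest = random_forest(X, y, B=20, m=2, n_min=5)
pred_high = avg(predict_tree(tree, [0.9, 0.5, 0.5]) for tree in forest)
pred_low = avg(predict_tree(tree, [0.1, 0.5, 0.5]) for tree in forest)
print(round(pred_high, 2), round(pred_low, 2))
```

The values \(B = 20\), \(m = 2\) and \(n_{min} = 5\) are arbitrary choices for the toy data. Smaller \(m\) decorrelates the trees more, at the cost of weaker individual trees.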

To make a prediction at a new point \(x\):

Regression: \(\hat{f}_{rf}^B(x)=\frac{1}{B}\sum_{b=1}^{B}T_b(x)\).

Classification: Let \(\hat{C}_b(x)\) be the class prediction of the \(b\)th random-forest tree. Then \(\hat{C}_{rf}^B(x)=\text{majority vote}\left\{\hat{C}_b(x)\right\}_1^B\).
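The two aggregation rules can be written directly. In this sketch each fitted tree is assumed to be any callable that maps \(x\) to a prediction. The toy "trees" below are hypothetical stand-ins, and ties in the vote are broken by first occurrence, which is an arbitrary choice rather than part of the definition.

```python
from collections import Counter


def rf_regress(trees, x):
    """Regression: average of the B tree predictions, (1/B) * sum_b T_b(x)."""
    return sum(tree(x) for tree in trees) / len(trees)


def rf_classify(trees, x):
    """Classification: majority vote over the B class predictions C_b(x).
    Counter.most_common orders equal counts by first occurrence."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical stand-ins for fitted trees.
reg_trees = [lambda x: 2.0 * x[0], lambda x: 2.0 * x[0] + 1.0, lambda x: 2.0 * x[0] - 1.0]
clf_trees = [lambda x: "spam", lambda x: "ham", lambda x: "spam"]

print(rf_regress(reg_trees, [1.0]))   # 2.0
print(rf_classify(clf_trees, [1.0]))  # spam
```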

Difference from a Standard Decision Tree

Random Forest Variance Decomposition

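Assume each tree prediction \(T_i(x)\) has the same variance \(\sigma^2\) and every pair of trees has the same correlation \(\rho\), so that \(Cov(T_i(x),T_j(x))=\rho\sigma^2\) for \(i\neq j\). Then

\(Var(\frac{1}{B}\sum_{i=1}^{B}T_i(x))=\frac{1}{B^2}\sum_{i=1}^{B}\sum_{j=1}^{B}Cov(T_i(x),T_j(x))\)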

\(=\frac{1}{B^2}\sum_{i=1}^{B}\left (\sum_{j\neq i}^{B}Cov(T_i(x),T_j(x))+Var(T_i(x)) \right )\)

\(=\frac{1}{B^2}\sum_{i=1}^{B}\left ( (B-1) \sigma^2 \cdot \rho +\sigma ^2\right )\)

\(=\frac{B(B-1)\rho\sigma^2+B\sigma^2}{B^2}=\frac{(B-1)\rho\sigma^2}{B}+\frac{\sigma^2}{B}\)

\(=\rho\sigma^2-\frac{\rho\sigma^2}{B}+\frac{\sigma^2}{B}=\rho\sigma^2+\sigma^2\frac{1-\rho}{B}\)
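The derivation can be checked numerically. The sketch assumes, as the derivation does, that every tree has variance \(\sigma^2\) and every pair of trees has correlation \(\rho\). It sums the full \(B \times B\) covariance matrix directly and compares the result with the closed form \(\rho\sigma^2+\sigma^2\frac{1-\rho}{B}\). The function names are hypothetical.

```python
import math


def variance_direct(B, sigma2, rho):
    """(1/B^2) * sum_i sum_j Cov(T_i(x), T_j(x)), with Var = sigma2 on the
    diagonal and Cov = rho * sigma2 off the diagonal."""
    total = 0.0
    for i in range(B):
        for j in range(B):
            total += sigma2 if i == j else rho * sigma2
    return total / B**2


def variance_closed(B, sigma2, rho):
    """Closed form: rho * sigma^2 + sigma^2 * (1 - rho) / B."""
    return rho * sigma2 + sigma2 * (1 - rho) / B


for B in (1, 10, 100):
    d, c = variance_direct(B, 1.0, 0.3), variance_closed(B, 1.0, 0.3)
    assert math.isclose(d, c)  # both routes agree
    print(B, round(c, 4))
```

As \(B\) grows, the second term vanishes and the ensemble variance approaches the floor \(\rho\sigma^2\). Adding more trees therefore cannot remove the variance caused by correlation between them. This is why random forests try to reduce \(\rho\) by sampling only \(m\) of the \(p\) variables at each split.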

Decision Tree vs. Random Forest

Figure panels: Trees; Random Forest