Yan Ping Wu
September 7, 2018
Process Diagram
Better generalizability to future problems.
Improved performance on massive or minuscule datasets.
The ability to synthesize data from distinct domains.
A more nuanced understanding of difficult learning tasks.
Bagging (Bootstrap Aggregating)
Boosting
1. For \( b=1 \) to \( B \):
(a) Draw a bootstrap sample \( Z^{*} \) of size \( N \) from the training data.
(b) Grow a random-forest tree \( T_b \) to the bootstrapped data, by recursively repeating the following steps for each terminal node of the tree, until the minimum node size \( n_{min} \) is reached.
i. Select \( m \) variables at random from the \( p \) variables.
ii. Pick the best variable/split-point among the \( m \).
iii. Split the node into two daughter nodes.
2. Output the ensemble of trees \( \left\{T_b\right\}_1^B \).
To make a prediction at a new point \( x \):
Regression: \( \hat{f}_{rf}^B(x)=\frac{1}{B}\sum_{b=1}^{B}T_b(x) \).
Classification: Let \( \hat{C}_b(x) \) be the class prediction of the \( b \)th random-forest tree. Then \( \hat{C}_{rf}^B(x)=\text{majority vote}\left\{ \hat{C}_b(x)\right\}_1^B \).
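The regression case of the algorithm above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names (`random_forest`, `grow_tree`, `best_split`, `predict_forest`) are hypothetical, "best" split is assumed to mean lowest sum of squared errors, and the stopping rule is assumed to mean "stop splitting once a node holds at most `n_min` samples". The toy data is made up for the example.

```python
import random
from statistics import mean

def sse(ys):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not ys:
        return 0.0
    mu = mean(ys)
    return sum((v - mu) ** 2 for v in ys)

def best_split(X, y, features):
    """Step ii: best (variable, split-point) among the candidates, or None."""
    best, best_cost = None, sse(y)
    for j in features:
        # Dropping the largest value keeps both daughter nodes non-empty.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [v for row, v in zip(X, y) if row[j] <= t]
            right = [v for row, v in zip(X, y) if row[j] > t]
            cost = sse(left) + sse(right)
            if cost < best_cost:  # assumption: squared-error criterion
                best, best_cost = (j, t), cost
    return best

def grow_tree(X, y, m, n_min, rng):
    """Step (b): a leaf is a number, an internal node is a tuple."""
    if len(y) <= n_min:  # assumption: how the n_min rule is applied
        return mean(y)
    features = rng.sample(range(len(X[0])), m)   # step i
    split = best_split(X, y, features)           # step ii
    if split is None:
        return mean(y)
    j, t = split                                 # step iii
    left = [i for i, row in enumerate(X) if row[j] <= t]
    right = [i for i, row in enumerate(X) if row[j] > t]
    return (j, t,
            grow_tree([X[i] for i in left], [y[i] for i in left], m, n_min, rng),
            grow_tree([X[i] for i in right], [y[i] for i in right], m, n_min, rng))

def predict_tree(node, x):
    """Follow the splits down to a leaf value."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node

def random_forest(X, y, B, m, n_min, seed=0):
    """Steps 1 and 2: grow B trees, each on its own bootstrap sample."""
    rng = random.Random(seed)
    n = len(X)
    trees = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]  # (a) sample N rows with replacement
        trees.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], m, n_min, rng))
    return trees

def predict_forest(trees, x):
    """Regression prediction: average of the B tree predictions."""
    return mean(predict_tree(tree, x) for tree in trees)

# Toy data (made up): p = 3 variables, the target depends only on the first.
data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(3)] for _ in range(40)]
y = [1.0 if row[0] > 0.5 else 0.0 for row in X]

trees = random_forest(X, y, B=30, m=2, n_min=2)
low = predict_forest(trees, [0.1, 0.5, 0.5])
high = predict_forest(trees, [0.9, 0.5, 0.5])
print(low, high)
```

For classification, the same structure would apply with class labels at the leaves and a majority vote in place of the average in `predict_forest`.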
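Assume each tree prediction \( T_i(x) \) has variance \( \sigma^2 \) and each pair of distinct trees has correlation \( \rho \), so \( Cov(T_i(x),T_j(x))=\rho\sigma^2 \) for \( i\neq j \). Then:
\( Var(\frac{1}{B}\sum_{i=1}^{B}T_i(x))=\frac{1}{B^2}\sum_{i=1}^{B}\sum_{j=1}^{B}Cov(T_i(x),T_j(x)) \)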
\( =\frac{1}{B^2}\sum_{i=1}^{B}\left (\sum_{j\neq i}^{B}Cov(T_i(x),T_j(x))+Var(T_i(x)) \right ) \)
\( =\frac{1}{B^2}\sum_{i=1}^{B}\left ( (B-1) \sigma^2 \cdot \rho +\sigma ^2\right ) \)
\( =\frac{B(B-1)\rho\sigma^2+B\sigma^2}{B^2}=\frac{(B-1)\rho\sigma^2}{B}+\frac{\sigma^2}{B} \)
\( =\rho\sigma^2-\frac{\rho\sigma^2}{B}+\frac{\sigma^2}{B}=\rho\sigma^2+\sigma^2\frac{1-\rho}{B} \)
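The derivation above can be checked numerically. The sketch below compares the closed form with the double sum of covariances from the first line, and with a small Monte Carlo estimate. The function names are hypothetical, and the simulation uses equicorrelated Gaussian variables as stand-ins for tree predictions, which is an assumption made only for illustration (it requires \( \rho \ge 0 \)).

```python
import random
from statistics import pvariance

def ensemble_variance(rho, sigma2, B):
    """Closed form: rho*sigma^2 + sigma^2*(1 - rho)/B."""
    return rho * sigma2 + sigma2 * (1 - rho) / B

def double_sum_variance(rho, sigma2, B):
    """First line of the derivation: (1/B^2) * sum_i sum_j Cov(T_i, T_j)."""
    total = 0.0
    for i in range(B):
        for j in range(B):
            # Cov is sigma^2 on the diagonal and rho*sigma^2 elsewhere.
            total += sigma2 if i == j else rho * sigma2
    return total / B ** 2

def simulated_variance(rho, sigma2, B, trials=20000, seed=0):
    """Monte Carlo estimate with Gaussian stand-ins for the trees (assumption)."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    averages = []
    for _ in range(trials):
        shared = rng.gauss(0, 1)  # component common to all B "trees"
        preds = [sigma * (rho ** 0.5 * shared + (1 - rho) ** 0.5 * rng.gauss(0, 1))
                 for _ in range(B)]
        averages.append(sum(preds) / B)
    return pvariance(averages)

for B in (1, 10, 100):
    print(B, ensemble_variance(0.5, 1.0, B), double_sum_variance(0.5, 1.0, B))
print("simulated, B=10:", simulated_variance(0.5, 1.0, 10))
```

As \( B \) grows, the second term \( \sigma^2\frac{1-\rho}{B} \) vanishes and the variance approaches \( \rho\sigma^2 \), so averaging more trees cannot reduce the variance below what the correlation between trees allows.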
Trees
Random Forest