C. Donovan
04 April 2018
If it's not in the lecture or lab, it's not in the exam
combining models be good!
In principle another simple process
We've looked at AdaBoost; we now look at realBoost
The problem is similar to before:
Initialise the algorithm with \( w_i=1/n \quad(i=1,..,n) \) and repeat the following steps for \( m=1,..,M \):
Iterate over \( m \)
Set initial weights for each of the \( n \) observations to be \( \frac{1}{n} \) (note these sum to one). Start iterations at \( m=1 \).
Fit a classifier to the weighted data and predict probabilities \( \mathbf{p}_m(\mathbf{X}) \) for \( i=1,...,n \), then transform them:
\[ f_m(\mathbf{X})=\frac{1}{2}\log\left(\frac{\mathbf{p}_m(\mathbf{X})}{(1-\mathbf{p}_m(\mathbf{X}))}\right) \]
At its heart, it is \( p/(1-p) \), which is the odds
We are converting our predicted probabilities to (half) log-odds, similar to logistic regression, i.e. a GLM with a logit link and binomial errors
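A minimal sketch of the transform above, assuming a single predicted probability `p`. The function name `half_log_odds` and the clipping constant `eps` are illustrative choices, not part of the lecture; the clipping is only there to avoid taking the log of zero when a classifier predicts exactly 0 or 1.

```python
import math

def half_log_odds(p, eps=1e-10):
    """Return 0.5 * log(p / (1 - p)), with p clipped away from 0 and 1.

    eps is an assumed safety margin, not something the lecture specifies.
    """
    p = min(max(p, eps), 1.0 - eps)
    return 0.5 * math.log(p / (1.0 - p))

# p < 0.5 gives a negative value, p = 0.5 gives zero, p > 0.5 a positive value
for p in (0.1, 0.5, 0.9):
    print(p, round(half_log_odds(p), 4))
```

The sign of the result carries the predicted class and its size carries the confidence, which is what the reweighting step uses.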
Reweight for \( m+1 \) using:
\[ w_{m+1,i}=w_{m,i}e^{-y_if_m(\mathbf{x}_i)}, \quad(i=1,..,n) \]
WTF?
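Consider the weight updating function \( we^{-yf(\mathbf{x})} \), with \( y \) either \( -1 \) or \( 1 \):
If \( y \) and \( f(\mathbf{x}) \) have the same sign (a correct classification), the exponent is negative and the weight shrinks.
If they have opposite signs (a misclassification), the exponent is positive and the weight grows.
So… weights are increased for observations that the model is misclassifying, just as in ordinary AdaBoost.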
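A small numeric illustration of the weight updating function, assuming labels coded as -1 and +1. The helper name `updated_weight` and the example values are made up for illustration.

```python
import math

def updated_weight(w, y, f):
    """Unnormalised update w * exp(-y * f) for a label y in {-1, +1}."""
    return w * math.exp(-y * f)

w = 0.25
# y and f agree in sign: classified correctly, so the weight shrinks
print(updated_weight(w, y=+1, f=+1.1))
# y and f disagree in sign: misclassified, so the weight grows
print(updated_weight(w, y=+1, f=-1.1))
# f = 0 corresponds to p = 0.5 (no opinion), so the weight is unchanged
print(updated_weight(w, y=-1, f=0.0))
```

The more confident the wrong prediction (the larger the size of `f`), the larger the increase in weight.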
Then normalise the new weights, i.e. make \( \sum_{i=1}^nw_{m+1,i}=1 \)
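The update and the normalisation can be sketched together as one step. This assumes plain Python lists for the weights, the labels (coded -1/+1) and the half log-odds values; the function name `reweight` and the toy numbers are illustrative only.

```python
import math

def reweight(weights, y, f):
    """Apply w * exp(-y * f) to each observation, then rescale to sum to one."""
    raw = [w * math.exp(-yi * fi) for w, yi, fi in zip(weights, y, f)]
    total = sum(raw)
    return [r / total for r in raw]

weights = [0.25, 0.25, 0.25, 0.25]
y = [1, 1, -1, -1]
f = [0.8, -0.8, -0.8, 0.8]  # the second and fourth observations are misclassified

new_weights = reweight(weights, y, f)
print(new_weights)       # misclassified observations now carry more weight
print(sum(new_weights))  # sums to one, up to floating-point rounding
```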
Class prediction for new data \( \mathbf{x}^* \) is \( \mathrm{sign}\left(\sum_{m=1}^Mf_m(\mathbf{x}^*)\right) \), so \( \hat{y} \) is either \( -1 \) or \( 1 \).
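Putting the steps together, here is a rough sketch of the whole procedure on a one-dimensional toy problem. Everything named here is an assumption for illustration: the weak learner is a hand-rolled weighted decision stump (`fit_stump`) that picks its split by weighted misclassification, the probabilities are clipped by an arbitrary `eps`, and the data are deliberately separable so the behaviour is easy to follow. It is not a reference implementation.

```python
import math

def half_log_odds(p, eps=1e-6):
    """0.5 * log(p / (1 - p)), with p clipped away from 0 and 1 (eps is assumed)."""
    p = min(max(p, eps), 1.0 - eps)
    return 0.5 * math.log(p / (1.0 - p))

def fit_stump(x, y, w):
    """Weighted stump on 1-D x; returns (threshold, f_left, f_right).

    f_left and f_right are the half log-odds of class +1 on each side,
    computed from the weighted class proportions.
    """
    best = None
    xs = sorted(set(x))
    for t in [(a + b) / 2 for a, b in zip(xs, xs[1:])]:
        sides, loss = [], 0.0
        for is_left in (True, False):
            in_side = [(xi <= t) == is_left for xi in x]
            tot = sum(wi for wi, s in zip(w, in_side) if s)
            pos = sum(wi for wi, yi, s in zip(w, y, in_side) if s and yi == 1)
            sides.append(half_log_odds(pos / tot))
            loss += min(pos, tot - pos)  # weighted misclassification on this side
        if best is None or loss < best[0]:
            best = (loss, t, sides[0], sides[1])
    return best[1:]

def real_boost(x, y, M=10):
    """Run M rounds: fit, convert to half log-odds, reweight, normalise."""
    n = len(x)
    w = [1.0 / n] * n  # initial weights, summing to one
    stumps = []
    for _ in range(M):
        t, f_left, f_right = fit_stump(x, y, w)
        stumps.append((t, f_left, f_right))
        f = [f_left if xi <= t else f_right for xi in x]
        raw = [wi * math.exp(-yi * fi) for wi, yi, fi in zip(w, y, f)]
        total = sum(raw)
        w = [r / total for r in raw]
    return stumps

def predict(stumps, x_new):
    """Sign of the summed half log-odds over all rounds (ties go to +1)."""
    score = sum(fl if x_new <= t else fr for t, fl, fr in stumps)
    return 1 if score >= 0 else -1

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [-1, -1, -1, 1, 1, 1]
stumps = real_boost(x, y, M=5)
print([predict(stumps, xi) for xi in x])
```

In practice the weak learner would be whatever probabilistic classifier is being boosted; the stump is only there to keep the sketch self-contained.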
So
Rules of thumb
In a nutshell:
The aggregated prediction from lots of unstable models is usually better than the almost-random "best" model you get from a single fit to all the data.
Everybody on the Left
I want to build some software to predict someone's age at death
Everybody on the Right
I want to build some software to decide whether I should give someone a loan