"Developing Data Products

Javier Saravia
27/09/2019

APPLICATION DESCRIPTION:

Random Forest is an ensemble machine learning algorithm based on bagging (bootstrap aggregation).

Like most nonlinear models, Random Forest has a set of parameters (or hyperparameters) that can be tuned to make the algorithm more accurate and efficient.

Through sliders you can select the values of the parameters ntree (number of trees) and mtry (number of variables tried at each split). The idea is to observe how changes in the model's parameters affect its accuracy.
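A minimal sketch of how such sliders could be defined in the Shiny UI; the input names ntree and mtry and the slider ranges are illustrative assumptions, not the app's actual code:

library(shiny)

# Sliders for the two Random Forest hyperparameters (names and ranges assumed)
sliderInput("ntree", "Number of trees (ntree):", min = 10, max = 500, value = 100, step = 10)
sliderInput("mtry", "Variables tried at each split (mtry):", min = 1, max = 3, value = 1)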

MODEL TRAINING

The model is trained using a Random Forest regression with the initial parameters ntree = 100 and mtry = 1.
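A sketch of the corresponding training call, assuming the data come from R's built-in airquality dataset (its column names match the formula) with incomplete rows removed; the seed is added here only for reproducibility. Printing the fitted model produces the summary below:

library(randomForest)

# Assumed data source: R's built-in airquality dataset, with NA rows dropped
data <- na.omit(airquality)

set.seed(123)  # fix the bootstrap samples for reproducibility
rf_model <- randomForest(Temp ~ Month + Solar.R + Ozone, data = data,
                         ntree = 100, mtry = 1)
rf_model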


Call:
 randomForest(formula = Temp ~ Month + Solar.R + Ozone, data = data,      ntree = 100, mtry = 1) 
               Type of random forest: regression
                     Number of trees: 100
No. of variables tried at each split: 1

          Mean of squared residuals: 25.16971
                    % Var explained: 72.03

REAL VS PREDICTED PLOT

The idea is that a better model (after changing the parameters) shows a higher Pearson correlation between the real and predicted values.

[Plot: real vs. predicted temperature]
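A sketch of how this plot and the correlation could be produced, reusing rf_model and data from the training sketch above; calling predict() on the fitted object returns out-of-bag predictions, which is an assumption about how the app computes them:

predicted <- predict(rf_model)   # out-of-bag predictions on the training data
real <- data$Temp

plot(real, predicted,
     xlab = "Real temperature", ylab = "Predicted temperature",
     main = "Real vs. predicted")
abline(0, 1, col = "red")        # reference line for perfect predictions

cor(real, predicted)             # Pearson correlation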

RESIDUALS PLOT

The residuals are the difference: real value - predicted value.

If the histogram looks like a Gaussian bell curve (normal distribution), that is a sign of a more accurate model.

[Plot: histogram of residuals]
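A sketch of the residual histogram, reusing real and predicted from the previous sketch:

residuals <- real - predicted    # residual = real value - predicted value

hist(residuals,
     breaks = 20,
     main = "Distribution of residuals",
     xlab = "Residual (real - predicted)")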

MAPE ERROR

An accurate model has a lower MAPE (Mean Absolute Percentage Error).

MAPE is given by: mean(abs((real - predicted) / real))
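The same calculation, sketched with the objects defined above; the value printed below is the one reported for the initial parameters:

mape <- mean(abs((real - predicted) / real))
round(mape, 4)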

[1] 0.0546