Assignment 5 Shiny Web app

Objectives and Background
Data Preparation and Transformation
Predictors Selection
Modelling Process
Evaluation
Conclusion

# Vector with the required packages 
packages <- c('xts', 'quantmod', 'MASS', 'forecast', 'tsbox', 'dplyr', 'lubridate', 'stats', 'tis', 'tidyquant', 'prophet', 'nnet',
              'devtools', 'performanceEstimation', 'TTR', 'DMwR2', 'earth', 'rpart', 'ranger', 'e1071')
# Checking for package installations on the system and installing if not found.
if (length(setdiff(packages, rownames(installed.packages()))) > 0) {
  install.packages(setdiff(packages, rownames(installed.packages())))  
}
# Packages to use
for(package in packages){
  library(package, character.only = TRUE)
}

1. Objectives and Background

The objective of this task is to develop a web app that predicts the future evolution of an asset's closing price. The app displays the recent evolution of the selected stock's closing price together with predictions of the closing price for the next 5 daily sessions.

The stocks selected for this task are five tech stocks: Apple (AAPL), Microsoft (MSFT), Google (GOOGL), Amazon (AMZN) and Facebook (FB). According to Yahoo Finance, these are the stocks that move the market, and they are categorised as “The Only Tech Stocks That Matter”.

Before the creation of the app, a separate study was first carried out to ascertain which models and indicators are suitable for the app’s prediction task.

2. Data Preparation and Transformation

For the purpose of the assignment, we assumed that the future value of the stock can be forecasted by observing historical movements in prices.

First, we download the price data for each stock with the getSymbols function from quantmod:

getSymbols('AAPL')

## [1] "AAPL"

getSymbols('MSFT')

## [1] "MSFT"

getSymbols('GOOGL')

## [1] "GOOGL"

getSymbols('AMZN')

## [1] "AMZN"

getSymbols('FB')

## [1] "FB"

Second, we compute technical indicators for each stock, since they capture properties of the price time series. These indicators serve as predictors of the closing price and are computed with the following functions:

myATR <- function(x) ATR(HLC(x))[,'atr']
mySMI <- function(x) SMI(HLC(x))[, "SMI"]
myADX <- function(x) ADX(HLC(x))[,'ADX']
myAroon <- function(x) aroon(cbind(Hi(x),Lo(x)))$oscillator
myEMV <- function(x) EMV(cbind(Hi(x),Lo(x)),Vo(x))[,2]
myMACD <- function(x) MACD(Cl(x))[,2]
myMFI <- function(x) MFI(HLC(x), Vo(x))
mySAR <- function(x) SAR(cbind(Hi(x),Cl(x))) [,1]
myVolat <- function(x) volatility(OHLC(x),calc="garman")[,1]

3. Predictors Selection

Because the goal is to forecast the closing price up to 5 days ahead, every predictor must already be known on the day the forecast is made. We therefore lag each predictor by 5 days, assuming that the lagged values still carry enough information for adequate model performance.

With these assumptions and information, a model was created with Closing price as the target variable and the 5 day lag of the technical indicators as predictors.

data.model <- specifyModel(Cl(AAPL) ~ lag(myATR(AAPL),5) + lag(mySMI(AAPL),5) +
                             lag(myADX(AAPL),5) + lag(myAroon(AAPL),5) + lag(myEMV(AAPL),5) +
                             lag(myVolat(AAPL),5) + lag(myMACD(AAPL),5) + lag(myMFI(AAPL),5) + lag(mySAR(AAPL),5) + lag(Cl(AAPL),5))
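To make the 5-day lag concrete, the following base R sketch pairs each closing price with the value observed 5 sessions earlier. The price vector is made up and the helper lag_k is hypothetical; it only illustrates the alignment and is not the code used by specifyModel.

```r
# Hypothetical helper: shift a vector k positions forward, padding with NA.
lag_k <- function(x, k) c(rep(NA, k), head(x, -k))

# Made-up closing prices for 10 sessions (illustration only).
close <- c(100, 101, 103, 102, 105, 107, 106, 108, 110, 109)

# Each row pairs the target (Cl) with the price known 5 sessions earlier.
d <- data.frame(Cl = close, Cl.lag5 = lag_k(close, 5))

# The first 5 rows have no lagged value and cannot be used for training.
d <- na.omit(d)
print(d)
```

With 10 sessions and a lag of 5, only 5 complete rows remain, which is why a lagged model always loses the first few observations of the series.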

Using the buildModel function from the quantmod package, we then fitted a random forest to the data to estimate the importance of each technical indicator for this prediction task. The variable importance plot and the importance scores are shown below.

set.seed(1234)
rf <- buildModel(data.model, method = 'randomForest', training.per=c('2015-01-02','2019-09-06'), ntree = 500, importance = TRUE)

## loading required package: randomForest

varImpPlot(rf@fitted.model, type= 1)

imp <- randomForest::importance(rf@fitted.model, type = 1)

Variable importance is measured by the percentage increase in MSE (%IncMSE) when a predictor is permuted. Using 18 as the threshold, four predictors are selected in the output below: lag(myADX(AAPL),5), lag(myMFI(AAPL),5), lag(mySAR(AAPL),5) and lag(Cl(AAPL),5). The models in the following sections also include lag(mySMI(AAPL),5), although its importance in this run (16.27) falls below the threshold.

imp

##                     %IncMSE
## lag.myATR.AAPL.5   17.04542
## lag.mySMI.AAPL.5   16.26814
## lag.myADX.AAPL.5   26.31701
## lag.myAroon.AAPL.5 13.80980
## lag.myEMV.AAPL.5   13.29478
## lag.myVolat.AAPL.5 13.74046
## lag.myMACD.AAPL.5  16.42464
## lag.myMFI.AAPL.5   21.11486
## lag.mySAR.AAPL.5   26.71937
## lag.Cl.AAPL.5      32.54717

imp2 <- rownames(imp)[which(imp>18)]
imp2

## [1] "lag.myADX.AAPL.5" "lag.myMFI.AAPL.5" "lag.mySAR.AAPL.5"
## [4] "lag.Cl.AAPL.5"
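The selection step can be reproduced without refitting the forest by typing in the %IncMSE values printed above. The sketch below does this in base R; the matrix is copied from the output, and only the object names are new.

```r
# %IncMSE values copied from the importance output above.
imp <- matrix(
  c(17.04542, 16.26814, 26.31701, 13.80980, 13.29478,
    13.74046, 16.42464, 21.11486, 26.71937, 32.54717),
  ncol = 1,
  dimnames = list(
    c("lag.myATR.AAPL.5", "lag.mySMI.AAPL.5", "lag.myADX.AAPL.5",
      "lag.myAroon.AAPL.5", "lag.myEMV.AAPL.5", "lag.myVolat.AAPL.5",
      "lag.myMACD.AAPL.5", "lag.myMFI.AAPL.5", "lag.mySAR.AAPL.5",
      "lag.Cl.AAPL.5"),
    "%IncMSE"))

# Keep the predictors whose importance exceeds the threshold of 18.
selected <- rownames(imp)[imp[, "%IncMSE"] > 18]

# Rank all predictors from most to least important.
ranking <- sort(imp[, "%IncMSE"], decreasing = TRUE)
print(selected)
print(ranking)
```

Sorting the scores also shows how close the excluded predictors are to the cut-off, which is useful when deciding whether a borderline predictor should be kept.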

4. Modelling Process

A training set covering 2 January 2015 to 30 August 2019 was extracted for model selection and comparison. First, we tuned the SVM hyperparameters (cost and gamma) to find the combination with the lowest RMSE. The code below shows the tuning process:

## Tuning SVM with a standard workflow
## Creation of the data model data frame with the training dataset
data.model1 <- specifyModel(Cl(AAPL) ~ lag(mySMI(AAPL),5) +
                              lag(myADX(AAPL),5) + lag(myMFI(AAPL),5) + lag(mySAR(AAPL),5) + lag(Cl(AAPL),5))

Tdata.train <- as.data.frame(modelData(data.model1,
                                       data.window=c('2015-01-02','2019-08-30')))

Tform <- as.formula('Cl.AAPL ~ .')

# Model evaluation using performance estimation.
# The second gamma value is assumed to be 0.001.
p1 <- performanceEstimation(
  PredTask(Tform, Tdata.train, 'AAPL'),
  workflowVariants(learner='svm', learner.pars=list(cost=c(10,5), gamma=c(0.01,0.001))),
  EstimationTask(metrics="rmse",
                 method=MonteCarlo(nReps=5, szTrain=0.5, szTest=0.25)))

## 
## 
## ##### PERFORMANCE ESTIMATION USING  MONTE CARLO  #####
## 
## ** PREDICTIVE TASK :: AAPL
## 
## ++ MODEL/WORKFLOW :: svm.v1 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: svm.v2 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: svm.v3 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: svm.v4 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293

topPerformer(p1, 'rmse','AAPL')

## Workflow Object:
##  Workflow ID       ::  svm.v1 
##  Workflow Function ::  standardWF
##       Parameter values:
##       learner.pars  -> cost=10 gamma=0.01 
##       learner  -> svm

The best parameters for SVM with a standard workflow are cost = 10 and gamma = 0.01.
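As a rough illustration of why the log above lists four workflows (svm.v1 to svm.v4), the sketch below enumerates every combination of two cost values and two gamma values in base R. It only mimics the idea of a parameter grid and is not how performanceEstimation builds its variants internally; the gamma values and the variant labels are illustrative assumptions.

```r
# Hypothetical parameter grid: every cost value paired with every gamma value.
# The gamma values are illustrative assumptions.
grid <- expand.grid(cost = c(10, 5), gamma = c(0.01, 0.001))

# Labels chosen for illustration; the numbering used by the package may differ.
grid$variant <- paste0("svm.v", seq_len(nrow(grid)))

n_variants <- nrow(grid)
print(grid)
```

Two values for each of two parameters give 2 x 2 = 4 candidate models, and each of them is evaluated with the same Monte Carlo repetitions so that their RMSE estimates are comparable.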

## Tuning SVM with a timeseries workflow 
##creation of data model dataframe with training dataset

data.model1 <- specifyModel(Cl(AAPL) ~  lag(mySMI(AAPL),5) +
                              lag(myADX(AAPL),5)  + lag(myMFI(AAPL),5) + lag(mySAR(AAPL),5) + lag(Cl(AAPL),5))

Tdata.train <- as.data.frame(modelData(data.model1,
                                       data.window=c('2015-01-02','2019-08-30')))

Tform <- as.formula('Cl.AAPL ~ .') 

#model evaluation 
p2 <- performanceEstimation(
  PredTask(Tform, Tdata.train, 'AAPL'),
  workflowVariants('timeseriesWF', wfID="slideSVM",
                   type="slide", relearn.step=c(20, 50, 80),
                   learner='svm', learner.pars=list(cost=c(1,5,10), gamma=0.01)),
  EstimationTask(metrics="rmse",
                 method=MonteCarlo(nReps=5, szTrain=0.5, szTest=0.25)))

## 
## 
## ##### PERFORMANCE ESTIMATION USING  MONTE CARLO  #####
## 
## ** PREDICTIVE TASK :: AAPL
## 
## ++ MODEL/WORKFLOW :: svm.v1 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: svm.v2 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: svm.v3 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: svm.v4 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: svm.v5 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: svm.v6 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: svm.v7 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: svm.v8 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: svm.v9 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293

topPerformer(p2, 'rmse','AAPL')

## Workflow Object:
##  Workflow ID       ::  svm.v4 
##  Workflow Function ::  timeseriesWF
##       Parameter values:
##       learner.pars  -> cost=5 gamma=0.01 
##       type  -> slide 
##       relearn.step  -> 20 
##       learner  -> svm

The best parameters for SVM with a timeseriesWF workflow are cost = 5, gamma = 0.01 and relearn.step = 20.
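The idea behind a sliding-window workflow with a relearn.step can be sketched in base R as follows. This is a simplified illustration on synthetic data, with a linear model standing in for the SVM; the helper slide_predict and all data are hypothetical, and the code is not the timeseriesWF implementation.

```r
set.seed(1234)

# Synthetic random-walk prices and a 5-day lagged predictor (illustration only).
price <- 100 + cumsum(rnorm(105))
d <- data.frame(Cl = price[6:105], Cl.lag5 = price[1:100])

# Hypothetical helper: predict rows (n_train + 1)..N one at a time,
# refitting every `relearn_step` rows on the most recent n_train rows.
slide_predict <- function(data, n_train, relearn_step) {
  N <- nrow(data)
  preds <- rep(NA_real_, N)
  n_fits <- 0
  fit <- NULL
  for (i in (n_train + 1):N) {
    steps_done <- i - (n_train + 1)
    if (steps_done %% relearn_step == 0) {
      window <- data[(i - n_train):(i - 1), ]   # sliding training window
      fit <- lm(Cl ~ Cl.lag5, data = window)
      n_fits <- n_fits + 1
    }
    preds[i] <- predict(fit, newdata = data[i, , drop = FALSE])
  }
  list(preds = preds, n_fits = n_fits)
}

res <- slide_predict(d, n_train = 50, relearn_step = 20)
res$n_fits
```

A smaller relearn.step means the model is refreshed more often, which costs more computation but lets it adapt to recent price behaviour.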

Having tuned some models, we proceed to run 8 workflows with different models and select the 3 best performing models.

##creation of data model dataframe with training dataset
data.model1 <- specifyModel(Cl(AAPL) ~  lag(mySMI(AAPL),5) +
                              lag(myADX(AAPL),5)  + lag(myMFI(AAPL),5) + lag(mySAR(AAPL),5) + lag(Cl(AAPL),5))

Tdata.train <- as.data.frame(modelData(data.model1,
                                       data.window=c('2015-01-02','2019-08-30')))

Tform <- as.formula('Cl.AAPL ~ .') 

#model evaluation 
m <- performanceEstimation(
  PredTask(Tform, Tdata.train, 'AAPL'),
  c(Workflow(learner='lm'),
    Workflow('standardWF', wfID="standSVM",
             learner='svm', learner.pars=list(cost=10, gamma=0.01)),
    Workflow('timeseriesWF', wfID="slideSVM",
             type="slide", relearn.step=20,
             learner='svm', learner.pars=list(cost=5, gamma=0.01)),
    Workflow(learner="rpart", .fullOutput=TRUE),
    Workflow(learner="rpartXse"),
    Workflow(learner="randomForest", learner.pars=list(ntree=200),
             wfID="rf420"),
    Workflow(learner='nnet', learner.pars=list(linout=TRUE, trace=FALSE, maxit=1000, size=6, decay=0.01)),
    Workflow(learner="earth", learner.pars=list(thres=0.001))
  ),
  EstimationTask(metrics="rmse",
                 method=MonteCarlo(nReps=5, szTrain=0.5, szTest=0.25)))

## 
## 
## ##### PERFORMANCE ESTIMATION USING  MONTE CARLO  #####
## 
## ** PREDICTIVE TASK :: AAPL
## 
## ++ MODEL/WORKFLOW :: lm 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: standSVM 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: slideSVM 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: rpart 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: rpartXse 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: rf420 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: nnet 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: earth 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293

5. Evaluation

The Root Mean Squared Error (RMSE) was used as the evaluation metric, estimated by Monte Carlo simulation to obtain reliable estimates of the models’ performance. A summary of the results is shown below.

plot(m)

rankWorkflows(m, top = 4)

## $AAPL
## $AAPL$rmse
##   Workflow  Estimate
## 1       lm  7.075292
## 2 slideSVM  7.939980
## 3    earth  9.848980
## 4 standSVM 14.544396

For AAPL, the three best-performing workflows are linear regression (lm), the sliding-window SVM (slideSVM) and MARS (earth).
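The estimation procedure used above can be sketched in base R under simplifying assumptions: a random split point is drawn, the model is trained on the observations immediately before it and tested on the observations immediately after it, and the RMSE is averaged over the repetitions. The data are synthetic, a linear model stands in for the learners, and the helpers rmse and mc_rmse are hypothetical; this is not the performanceEstimation code.

```r
set.seed(1234)

# Synthetic random-walk prices with a 5-day lagged predictor (illustration only).
n <- 400
price <- 100 + cumsum(rnorm(n))
d <- data.frame(Cl = price[6:n], Cl.lag5 = price[1:(n - 5)])

# Root mean squared error between observed and predicted values.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Hypothetical Monte Carlo estimator for time-ordered data.
mc_rmse <- function(data, n_reps = 5, sz_train = 0.5, sz_test = 0.25) {
  N <- nrow(data)
  n_train <- floor(sz_train * N)
  n_test <- floor(sz_test * N)
  # A test block must have n_train rows before it and n_test rows from its start.
  starts <- sample((n_train + 1):(N - n_test + 1), n_reps)
  sapply(starts, function(s) {
    train <- data[(s - n_train):(s - 1), ]
    test <- data[s:(s + n_test - 1), ]
    fit <- lm(Cl ~ Cl.lag5, data = train)
    rmse(test$Cl, predict(fit, newdata = test))
  })
}

scores <- mc_rmse(d)
mean(scores)
```

Because training always precedes testing in time, the estimate respects the ordering of the series, unlike ordinary cross-validation with random folds.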

To validate these results, the same workflows are applied to the other stocks.

Facebook

##creation of data model dataframe with training dataset
data.model1 <- specifyModel(Cl(FB) ~  lag(mySMI(FB),5) +
                              lag(myADX(FB),5)  + lag(myMFI(FB),5) + lag(mySAR(FB),5) + lag(Cl(FB),5))

Tdata.train <- as.data.frame(modelData(data.model1,
                                       data.window=c('2015-01-02','2019-08-30')))

Tform <- as.formula('Cl.FB ~ .') 

#model evaluation 
m <- performanceEstimation(
  PredTask(Tform, Tdata.train, 'FB'),
  c(Workflow(learner='lm'),
    Workflow('standardWF', wfID="standSVM",
             learner='svm', learner.pars=list(cost=10, gamma=0.01)),
    Workflow('timeseriesWF', wfID="slideSVM",
             type="slide", relearn.step=20,
             learner='svm', learner.pars=list(cost=5, gamma=0.01)),
    Workflow(learner="rpart", .fullOutput=TRUE),
    Workflow(learner="rpartXse"),
    Workflow(learner="randomForest", learner.pars=list(ntree=200),
             wfID="rf420"),
    Workflow(learner='nnet', learner.pars=list(linout=TRUE, trace=FALSE, maxit=1000, size=6, decay=0.01)),
    Workflow(learner="earth", learner.pars=list(thres=0.001))
  ),
  EstimationTask(metrics="rmse",
                 method=MonteCarlo(nReps=5, szTrain=0.5, szTest=0.25)))

## 
## 
## ##### PERFORMANCE ESTIMATION USING  MONTE CARLO  #####
## 
## ** PREDICTIVE TASK :: FB
## 
## ++ MODEL/WORKFLOW :: lm 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: standSVM 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: slideSVM 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: rpart 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: rpartXse 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: rf420 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: nnet 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: earth 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293

rankWorkflows(m, top = 4)

## $FB
## $FB$rmse
##   Workflow Estimate
## 1 slideSVM 8.184133
## 2 standSVM 8.357157
## 3       lm 8.529362
## 4    earth 8.704058

Microsoft

##creation of data model dataframe with training dataset
data.model1 <- specifyModel(Cl(MSFT) ~  lag(mySMI(MSFT),5) +
                              lag(myADX(MSFT),5)  + lag(myMFI(MSFT),5) + lag(mySAR(MSFT),5) + lag(Cl(MSFT),5))

Tdata.train <- as.data.frame(modelData(data.model1,
                                       data.window=c('2015-01-02','2019-08-30')))

Tform <- as.formula('Cl.MSFT ~ .') 

#model evaluation
m <- performanceEstimation(
  PredTask(Tform, Tdata.train, 'MSFT'),
  c(Workflow(learner='lm'),
    Workflow('standardWF', wfID="standSVM",
             learner='svm', learner.pars=list(cost=10, gamma=0.01)),
    Workflow('timeseriesWF', wfID="slideSVM",
             type="slide", relearn.step=20,
             learner='svm', learner.pars=list(cost=5, gamma=0.01)),
    Workflow(learner="rpart", .fullOutput=TRUE),
    Workflow(learner="rpartXse"),
    Workflow(learner="randomForest", learner.pars=list(ntree=200),
             wfID="rf420"),
    Workflow(learner='nnet', learner.pars=list(linout=TRUE, trace=FALSE, maxit=1000, size=6, decay=0.01)),
    Workflow(learner="earth", learner.pars=list(thres=0.001))
  ),
  EstimationTask(metrics="rmse",
                 method=MonteCarlo(nReps=5, szTrain=0.5, szTest=0.25)))

## 
## 
## ##### PERFORMANCE ESTIMATION USING  MONTE CARLO  #####
## 
## ** PREDICTIVE TASK :: MSFT
## 
## ++ MODEL/WORKFLOW :: lm 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: standSVM 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: slideSVM 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: rpart 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: rpartXse 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: rf420 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: nnet 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: earth 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293

rankWorkflows(m, top = 4)

## $MSFT
## $MSFT$rmse
##   Workflow  Estimate
## 1       lm  2.893678
## 2 slideSVM  3.089363
## 3    earth  3.396743
## 4 standSVM 11.819423

Amazon

# Build the model data frame: closing price against 5-day lagged indicators and close
data.model1 <- specifyModel(Cl(AMZN) ~ lag(mySMI(AMZN), 5) + lag(myADX(AMZN), 5) +
                              lag(myMFI(AMZN), 5) + lag(mySAR(AMZN), 5) + lag(Cl(AMZN), 5))

# Restrict the data to the training window
Tdata.train <- as.data.frame(modelData(data.model1,
                                       data.window = c('2015-01-02', '2019-08-30')))

Tform <- as.formula('Cl.AMZN ~ .')

# Estimate the RMSE of each candidate workflow with Monte Carlo resampling
m <- performanceEstimation(
  PredTask(Tform, Tdata.train, 'AMZN'),
  c(Workflow(learner = 'lm'),
    Workflow('standardWF', wfID = "standSVM",
             learner = 'svm', learner.pars = list(cost = 10, gamma = 0.01)),
    Workflow('timeseriesWF', wfID = "slideSVM",
             type = "slide", relearn.step = 20,
             learner = 'svm', learner.pars = list(cost = 5, gamma = 0.01)),
    Workflow(learner = "rpart", .fullOutput = TRUE),
    Workflow(learner = "rpartXse"),
    Workflow(learner = "randomForest", learner.pars = list(ntree = 200),
             wfID = "rf420"),
    Workflow(learner = 'nnet',
             learner.pars = list(linout = TRUE, trace = FALSE, maxit = 1000, size = 6, decay = 0.01)),
    # 'thresh' is spelled out in full instead of relying on partial argument matching
    Workflow(learner = "earth", learner.pars = list(thresh = 0.001))
  ),
  EstimationTask(metrics = "rmse",
                 method = MonteCarlo(nReps = 5, szTrain = 0.5, szTest = 0.25)))

## 
## 
## ##### PERFORMANCE ESTIMATION USING  MONTE CARLO  #####
## 
## ** PREDICTIVE TASK :: AMZN
## 
## ++ MODEL/WORKFLOW :: lm 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: standSVM 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: slideSVM 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: rpart 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: rpartXse 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: rf420 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: nnet 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: earth 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293

rankWorkflows(m, top = 4)

## $AMZN
## $AMZN$rmse
##   Workflow  Estimate
## 1       lm  71.29069
## 2 slideSVM  74.19632
## 3    earth 219.32507
## 4 standSVM 328.68194
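
The log above reports five repetitions, each with a training window of half the rows immediately followed by a test window of a quarter of the rows, and the workflows are then compared by RMSE. The sketch below illustrates that idea in base R. It is only an approximation written for this report: `mc_windows()` and `rmse()` are hypothetical helpers, the row count of 1173 is an assumed value chosen to give a 293-row test window, and the code is not the internal implementation of the `performanceEstimation` package.

```r
# Hypothetical helper: draw Monte Carlo train/test windows for a time series.
# Each window keeps temporal order: training rows always precede test rows.
mc_windows <- function(n, n_reps = 5, sz_train = 0.5, sz_test = 0.25, seed = 1234) {
  train_size <- floor(sz_train * n)
  test_size  <- floor(sz_test * n)
  # Valid positions for the first test row: enough history before it
  # and enough rows after it to fill the test window.
  candidates <- (train_size + 1):(n - test_size + 1)
  set.seed(seed)
  starts <- sort(sample(candidates, n_reps))
  lapply(starts, function(s) list(
    train = (s - train_size):(s - 1),
    test  = s:(s + test_size - 1)
  ))
}

# Root mean squared error, the metric used to rank the workflows.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Assumed number of rows in the training data frame (illustrative only).
w <- mc_windows(n = 1173)
sapply(w, function(x) c(start_test = min(x$test), test_size = length(x$test)))
```

Because the start points are sampled at random, the values printed by this sketch are not expected to match the `start test` values in the log.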

Google

# Build the model data frame: closing price against 5-day lagged indicators and close
data.model1 <- specifyModel(Cl(GOOGL) ~ lag(mySMI(GOOGL), 5) + lag(myADX(GOOGL), 5) +
                              lag(myMFI(GOOGL), 5) + lag(mySAR(GOOGL), 5) + lag(Cl(GOOGL), 5))

# Restrict the data to the training window
Tdata.train <- as.data.frame(modelData(data.model1,
                                       data.window = c('2015-01-02', '2019-08-30')))

Tform <- as.formula('Cl.GOOGL ~ .')

# Estimate the RMSE of each candidate workflow with Monte Carlo resampling
m <- performanceEstimation(
  PredTask(Tform, Tdata.train, 'GOOGL'),
  c(Workflow(learner = 'lm'),
    Workflow('standardWF', wfID = "standSVM",
             learner = 'svm', learner.pars = list(cost = 10, gamma = 0.01)),
    Workflow('timeseriesWF', wfID = "slideSVM",
             type = "slide", relearn.step = 20,
             learner = 'svm', learner.pars = list(cost = 5, gamma = 0.01)),
    Workflow(learner = "rpart", .fullOutput = TRUE),
    Workflow(learner = "rpartXse"),
    Workflow(learner = "randomForest", learner.pars = list(ntree = 200),
             wfID = "rf420"),
    Workflow(learner = 'nnet',
             learner.pars = list(linout = TRUE, trace = FALSE, maxit = 1000, size = 6, decay = 0.01)),
    # 'thresh' is spelled out in full instead of relying on partial argument matching
    Workflow(learner = "earth", learner.pars = list(thresh = 0.001))
  ),
  EstimationTask(metrics = "rmse",
                 method = MonteCarlo(nReps = 5, szTrain = 0.5, szTest = 0.25)))

## 
## 
## ##### PERFORMANCE ESTIMATION USING  MONTE CARLO  #####
## 
## ** PREDICTIVE TASK :: GOOGL
## 
## ++ MODEL/WORKFLOW :: lm 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: standSVM 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: slideSVM 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: rpart 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: rpartXse 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: rf420 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: nnet 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293 
## 
## 
## 
## ++ MODEL/WORKFLOW :: earth 
## Task for estimating  rmse  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  685 ; test size =  293 
## Repetition  2 
##   start test =  688 ; test size =  293 
## Repetition  3 
##   start test =  698 ; test size =  293 
## Repetition  4 
##   start test =  720 ; test size =  293 
## Repetition  5 
##   start test =  871 ; test size =  293

rankWorkflows(m, top = 4)

## $GOOGL
## $GOOGL$rmse
##   Workflow  Estimate
## 1       lm  39.38511
## 2 slideSVM  40.75577
## 3 standSVM  63.07308
## 4    rpart 151.05835

After applying the performance estimation function to all five stocks, the results indicate that the best models by RMSE are linear regression and SVM (both the sliding-window and the standard workflows). The neural network performed well on the Apple and Facebook stocks, while MARS (Multivariate Adaptive Regression Splines) performed well on Amazon, Google and Microsoft. Because the app is restricted to three selectable models, we chose MARS over the neural network, since it performed better on three of the five stocks we want to predict.
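
As a rough cross-check of this comparison, the top-four rankings printed in this section can be tallied in base R. The sketch below copies only the `rankWorkflows()` output shown above for Microsoft, Amazon and Google; the Apple and Facebook rankings are not repeated here, so the tally covers three of the five stocks. The object names (`top4`, `appearances`, `rank_df`, `mean_rank`) are illustrative.

```r
# Top-four RMSE rankings copied from the rankWorkflows() output above
# (Microsoft, Amazon and Google only), best workflow first.
top4 <- list(
  MSFT  = c("lm", "slideSVM", "earth", "standSVM"),
  AMZN  = c("lm", "slideSVM", "earth", "standSVM"),
  GOOGL = c("lm", "slideSVM", "standSVM", "rpart")
)

# Number of stocks for which each workflow reaches the top four.
appearances <- sort(table(unlist(top4)), decreasing = TRUE)
print(appearances)

# Long-format table of (stock, workflow, rank), then the mean rank of each
# workflow over the stocks where it appears (lower is better).
rank_df <- do.call(rbind, lapply(names(top4), function(s) {
  data.frame(stock = s, workflow = top4[[s]], rank = seq_along(top4[[s]]))
}))
mean_rank <- tapply(rank_df$rank, rank_df$workflow, mean)
print(sort(mean_rank))
```

A tally like this only counts top-four appearances; it does not weigh how far apart the RMSE estimates are, so it complements the RMSE tables and does not replace them.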

6. Conclusion

The models selected for the Shiny app are linear regression, SVM and MARS.