# Vector with the required packages (each package listed once)
packages <- c('xts', 'quantmod', 'MASS', 'forecast', 'tsbox', 'dplyr', 'lubridate', 'stats', 'tis', 'tidyquant', 'prophet', 'nnet',
'devtools', 'performanceEstimation', 'TTR', 'DMwR2', 'earth', 'rpart', 'ranger', 'e1071')
# Check which packages are missing on the system and install only those.
missing.packages <- setdiff(packages, rownames(installed.packages()))
if (length(missing.packages) > 0) {
install.packages(missing.packages)
}
# Packages to use
for(package in packages){
library(package, character.only = TRUE)
}
The objective of this task is to develop a web app that predicts the future evolution of an asset's closing price. The app will display the recent evolution of the closing prices of the selected stock together with predictions of the closing price for the next 5 daily sessions.
The stocks selected for this task are five tech stocks: Apple (AAPL), Microsoft (MSFT), Google (GOOGL), Amazon (AMZN) and Facebook (FB). According to Yahoo Finance, these are the stocks that move the market, and they are categorised as “The Only Tech Stocks That Matter”.
Before creating the app, a separate study was carried out to ascertain which models and indicators are suitable for the app’s prediction task.
For the purpose of the assignment, we assumed that the future value of a stock can be forecast by observing historical price movements.
First, we gather the stock data with the code below:
getSymbols('AAPL')
## [1] "AAPL"
getSymbols('MSFT')
## [1] "MSFT"
getSymbols('GOOGL')
## [1] "GOOGL"
getSymbols('AMZN')
## [1] "AMZN"
getSymbols('FB')
## [1] "FB"
Secondly, we compute technical indicators for the stocks, as they reflect properties of the price time series. We will use them as predictors of the closing price. The technical indicators are computed with the following functions:
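# Each helper returns a single indicator column for the price series x
myATR <- function(x) ATR(HLC(x))[,'atr']                      # average true range
mySMI <- function(x) SMI(HLC(x))[, "SMI"]                     # stochastic momentum index
myADX <- function(x) ADX(HLC(x))[,'ADX']                      # average directional index
myAroon <- function(x) aroon(cbind(Hi(x),Lo(x)))$oscillator   # Aroon oscillator
myEMV <- function(x) EMV(cbind(Hi(x),Lo(x)),Vo(x))[,2]        # column 2: moving average of EMV
myMACD <- function(x) MACD(Cl(x))[,2]                         # column 2: MACD signal line
myMFI <- function(x) MFI(HLC(x), Vo(x))                       # money flow index
mySAR <- function(x) SAR(cbind(Hi(x),Cl(x))) [,1]             # parabolic stop-and-reverse
myVolat <- function(x) volatility(OHLC(x),calc="garman")[,1]  # Garman-Klass volatility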
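To make concrete what one of these indicators measures, the sketch below computes the true range, the quantity that ATR averages, in base R. The function name `true_range` and the prices are illustrative assumptions and not part of the study; TTR's `ATR` additionally smooths the true range with a moving average, which is omitted here.

```r
# True range: the largest of the day's high-low span and the two gaps
# between today's extremes and yesterday's close.
true_range <- function(high, low, close) {
  prev_close <- c(NA, head(close, -1))   # yesterday's close; none for day 1
  pmax(high - low, abs(high - prev_close), abs(low - prev_close))
}

# Hypothetical prices for four sessions
high  <- c(10.5, 11.0, 11.2, 10.9)
low   <- c(10.0, 10.4, 10.6, 10.1)
close <- c(10.2, 10.9, 10.7, 10.3)

tr <- true_range(high, low, close)
print(tr)   # first value is NA because there is no previous close
```

A larger true range indicates a more volatile session, which is why ATR is a natural candidate predictor for price movements.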
As the objective is to forecast the closing price for the next 5 days, the predictors must already be available on the day the forecast is made. We have assumed that a 5-day lag of every predictor provides useful information for the model and ensures adequate performance.
With these assumptions, a model was specified with the closing price as the target variable and the 5-day lags of the technical indicators (and of the closing price itself) as predictors.
data.model <- specifyModel(Cl(AAPL) ~ lag(myATR(AAPL),5) + lag(mySMI(AAPL),5) +
lag(myADX(AAPL),5) + lag(myAroon(AAPL),5) + lag(myEMV(AAPL),5) +
lag(myVolat(AAPL),5) + lag(myMACD(AAPL),5) + lag(myMFI(AAPL),5) + lag(mySAR(AAPL),5) + lag(Cl(AAPL),5))
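As a minimal sketch of what the 5-day lag does, the base R snippet below pairs each closing price with the value observed five sessions earlier, so every row only uses information that was already known five sessions before the target date. The series is synthetic, and the helper `lag_k` is a hypothetical stand-in for `lag()` on xts objects, which we assume shifts values in the same direction.

```r
# Shift a vector k positions forward: position t receives the value from t - k.
lag_k <- function(x, k) c(rep(NA, k), head(x, -k))

set.seed(1234)
close <- round(100 + cumsum(rnorm(12)), 2)   # synthetic closing prices

model_data <- data.frame(close = close, close_lag5 = lag_k(close, 5))
model_data <- na.omit(model_data)            # the first 5 rows have no lagged value

print(model_data)
```

The first usable row pairs the sixth closing price with the first one, which mirrors how the lagged predictors line up with the target in the model above.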
Using the buildModel function from the quantmod package, we then fitted a random forest to the data in order to estimate the importance of the technical indicators in this predictive task. The variable importance plot and the importance scores are shown below.
set.seed(1234)
rf <- buildModel(data.model, method = 'randomForest', training.per=c('2015-01-02','2019-09-06'), ntree = 500, importance = TRUE)
## loading required package: randomForest
varImpPlot(rf@fitted.model, type= 1)
imp <- randomForest::importance(rf@fitted.model, type = 1)
Variable importance is measured by the increase in MSE when a predictor is permuted (%IncMSE). Using 18 as the threshold, four predictors are retained: lag(myADX(AAPL),5), lag(myMFI(AAPL),5), lag(mySAR(AAPL),5) and lag(Cl(AAPL),5). The models in the following sections additionally include lag(mySMI(AAPL),5), although its score (16.27) falls below the threshold.
imp
## %IncMSE
## lag.myATR.AAPL.5 17.04542
## lag.mySMI.AAPL.5 16.26814
## lag.myADX.AAPL.5 26.31701
## lag.myAroon.AAPL.5 13.80980
## lag.myEMV.AAPL.5 13.29478
## lag.myVolat.AAPL.5 13.74046
## lag.myMACD.AAPL.5 16.42464
## lag.myMFI.AAPL.5 21.11486
## lag.mySAR.AAPL.5 26.71937
## lag.Cl.AAPL.5 32.54717
imp2 <- rownames(imp)[which(imp>18)]
imp2
## [1] "lag.myADX.AAPL.5" "lag.myMFI.AAPL.5" "lag.mySAR.AAPL.5"
## [4] "lag.Cl.AAPL.5"
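The idea behind an importance score based on the increase in MSE can be sketched in a few lines of base R: shuffle one predictor at a time and measure how much the prediction error grows. This is a simplified illustration on synthetic data with a linear model; it is not the exact computation performed by randomForest, which works on out-of-bag samples and scales the result. All variable names are assumptions made for the example.

```r
set.seed(1234)
n <- 500
x1 <- rnorm(n); x2 <- rnorm(n); noise <- rnorm(n)
y <- 3 * x1 + 0.5 * x2 + rnorm(n)          # x1 matters most, noise not at all
d <- data.frame(y, x1, x2, noise)

train <- d[1:350, ]
test  <- d[351:500, ]
fit   <- lm(y ~ ., data = train)

mse <- function(model, data) mean((data$y - predict(model, data))^2)
base_mse <- mse(fit, test)

# Percentage increase in test MSE after shuffling each predictor in turn
perm_increase <- sapply(c("x1", "x2", "noise"), function(v) {
  shuffled <- test
  shuffled[[v]] <- sample(shuffled[[v]])   # break the link between v and y
  100 * (mse(fit, shuffled) - base_mse) / base_mse
})

print(round(perm_increase, 1))
```

Predictors whose shuffling barely changes the error carry little information, which is the rationale for dropping those under the threshold.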
A training set covering 2 January 2015 to 30 August 2019 was extracted for model selection and comparison. First, we tuned the SVM parameters to find the values that give the best performance. The code below shows the tuning process:
## Tuning SVM with a standard workflow
##creation of data model dataframe with training dataset
data.model1 <- specifyModel(Cl(AAPL) ~ lag(mySMI(AAPL),5) +
lag(myADX(AAPL),5) + lag(myMFI(AAPL),5) + lag(mySAR(AAPL),5) + lag(Cl(AAPL),5))
Tdata.train <- as.data.frame(modelData(data.model1,
data.window=c('2015-01-02','2019-08-30')))
Tform <- as.formula('Cl.AAPL ~ .')
#model evaluation using performance estimation
p1 <- performanceEstimation(
PredTask(Tform , Tdata.train , 'AAPL'),
workflowVariants(learner='svm',learner.pars=list(cost=c(10,5),gamma=c(0.01,0.001))
),
EstimationTask(metrics="rmse",
method=MonteCarlo(nReps=5,szTrain=0.5,szTest=0.25)))
##
##
## ##### PERFORMANCE ESTIMATION USING MONTE CARLO #####
##
## ** PREDICTIVE TASK :: AAPL
##
## ++ MODEL/WORKFLOW :: svm.v1
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: svm.v2
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: svm.v3
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: svm.v4
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
topPerformer(p1, 'rmse','AAPL')
## Workflow Object:
## Workflow ID :: svm.v1
## Workflow Function :: standardWF
## Parameter values:
## learner.pars -> cost=10 gamma=0.01
## learner -> svm
The best parameters for the SVM with a standard workflow are cost = 10 and gamma = 0.01.
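The Monte Carlo estimates reported in the log can be pictured with the following base R sketch, assuming that each repetition picks a random point in the series, trains on the window just before it and tests on the window just after it. The function `mc_splits` and the series length of 1173 rows are illustrative assumptions (1173 is merely consistent with the reported test size of 293); the actual split points are chosen by performanceEstimation and will differ.

```r
# Build n_reps train/test index windows around random points of a series of length n.
mc_splits <- function(n, n_reps = 5, sz_train = 0.5, sz_test = 0.25, seed = 1234) {
  set.seed(seed)
  n_train <- floor(sz_train * n)
  n_test  <- floor(sz_test * n)
  # A test window may start once a full training window fits before it
  # and must leave room for a full test window after it.
  starts <- sort(sample(seq(n_train + 1, n - n_test + 1), n_reps))
  lapply(starts, function(s) {
    list(train = (s - n_train):(s - 1),
         test  = s:(s + n_test - 1))
  })
}

splits <- mc_splits(1173)
for (s in splits) {
  cat("start test =", min(s$test), "; test size =", length(s$test), "\n")
}
```

Because the training window always precedes the test window, the temporal order of the observations is respected in every repetition.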
## Tuning SVM with a timeseries workflow
##creation of data model dataframe with training dataset
data.model1 <- specifyModel(Cl(AAPL) ~ lag(mySMI(AAPL),5) +
lag(myADX(AAPL),5) + lag(myMFI(AAPL),5) + lag(mySAR(AAPL),5) + lag(Cl(AAPL),5))
Tdata.train <- as.data.frame(modelData(data.model1,
data.window=c('2015-01-02','2019-08-30')))
Tform <- as.formula('Cl.AAPL ~ .')
#model evaluation
p2 <- performanceEstimation(
PredTask(Tform , Tdata.train , 'AAPL'),
workflowVariants('timeseriesWF', wfID="slideSVM",
type="slide", relearn.step=c(20, 50, 80),
learner='svm',learner.pars=list(cost=c(1,5,10),gamma=0.01)),
EstimationTask(metrics="rmse",
method=MonteCarlo(nReps=5,szTrain=0.5,szTest=0.25)))
##
##
## ##### PERFORMANCE ESTIMATION USING MONTE CARLO #####
##
## ** PREDICTIVE TASK :: AAPL
##
## ++ MODEL/WORKFLOW :: svm.v1
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: svm.v2
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: svm.v3
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: svm.v4
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: svm.v5
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: svm.v6
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: svm.v7
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: svm.v8
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: svm.v9
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
topPerformer(p2, 'rmse','AAPL')
## Workflow Object:
## Workflow ID :: svm.v4
## Workflow Function :: timeseriesWF
## Parameter values:
## learner.pars -> cost=5 gamma=0.01
## type -> slide
## relearn.step -> 20
## learner -> svm
The best parameters for the SVM with a timeseriesWF workflow are cost = 5, gamma = 0.01 and relearn.step = 20.
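A sliding-window workflow with a relearn step of 20 can be sketched as follows, under the assumption that the model is refitted every 20 test cases on a fixed-size window that moves forward in time. The function `slide_predict` is hypothetical, the data are synthetic, and a linear model is swapped in for the SVM to keep the example in base R.

```r
# Refit the model every `relearn_step` test cases on a training window of
# constant size that slides forward over the combined train/test data.
slide_predict <- function(form, train, test, relearn_step = 20) {
  all_data <- rbind(train, test)
  n_train  <- nrow(train)
  preds    <- numeric(nrow(test))
  for (start in seq(1, nrow(test), by = relearn_step)) {
    end    <- min(start + relearn_step - 1, nrow(test))
    window <- all_data[start:(n_train + start - 1), ]  # drop oldest rows, add newest
    fit    <- lm(form, data = window)
    preds[start:end] <- predict(fit, test[start:end, , drop = FALSE])
  }
  preds
}

# Synthetic series whose relationship between x and y drifts over time
set.seed(1234)
n <- 300
x <- rnorm(n)
slope <- seq(1, 3, length.out = n)
d <- data.frame(y = slope * x + rnorm(n, sd = 0.1), x = x)
train <- d[1:200, ]
test  <- d[201:300, ]

preds  <- slide_predict(y ~ x, train, test, relearn_step = 20)
static <- predict(lm(y ~ x, data = train), test)   # single fit, never updated

rmse <- function(a, p) sqrt(mean((a - p)^2))
cat("sliding RMSE:", rmse(test$y, preds), " static RMSE:", rmse(test$y, static), "\n")
```

A smaller relearn step lets the model track recent behaviour more closely at the cost of more refits, which is the trade-off explored by the values 20, 50 and 80 in the tuning above.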
Having tuned the SVM workflows, we proceed to run 8 workflows with different models and select the 3 best-performing ones.
##creation of data model dataframe with training dataset
data.model1 <- specifyModel(Cl(AAPL) ~ lag(mySMI(AAPL),5) +
lag(myADX(AAPL),5) + lag(myMFI(AAPL),5) + lag(mySAR(AAPL),5) + lag(Cl(AAPL),5))
Tdata.train <- as.data.frame(modelData(data.model1,
data.window=c('2015-01-02','2019-08-30')))
Tform <- as.formula('Cl.AAPL ~ .')
#model evaluation
m <- performanceEstimation(
PredTask(Tform , Tdata.train , 'AAPL'),
c(Workflow(learner= 'lm'),
Workflow('standardWF', wfID="standSVM",
learner='svm',learner.pars=list(cost=10,gamma=0.01)),
Workflow('timeseriesWF', wfID="slideSVM",
type="slide", relearn.step=20,
learner='svm',learner.pars=list(cost=5,gamma=0.01)),
Workflow(learner="rpart",.fullOutput=TRUE),
Workflow(learner="rpartXse"),
Workflow(learner="randomForest",learner.pars=list(ntree=200),
wfID="rf420"),
Workflow(learner='nnet', learner.pars=list(linout=TRUE, trace=FALSE, maxit=1000, size=6, decay=0.01)),
Workflow(learner="earth",learner.pars=list(thres=0.001))
),
EstimationTask(metrics="rmse",
method=MonteCarlo(nReps=5,szTrain=0.5,szTest=0.25)))
##
##
## ##### PERFORMANCE ESTIMATION USING MONTE CARLO #####
##
## ** PREDICTIVE TASK :: AAPL
##
## ++ MODEL/WORKFLOW :: lm
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: standSVM
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: slideSVM
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: rpart
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: rpartXse
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: rf420
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: nnet
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: earth
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
The root mean squared error (RMSE) was used as the evaluation metric, with Monte Carlo simulation to derive reliable estimates of the models’ performance. A summary of the results is shown below.
plot(m)
rankWorkflows(m, top = 4)
## $AAPL
## $AAPL$rmse
## Workflow Estimate
## 1 lm 7.075292
## 2 slideSVM 7.939980
## 3 earth 9.848980
## 4 standSVM 14.544396
The best models for AAPL are linear regression (lm), the sliding-window SVM (slideSVM) and MARS (earth).
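For reference, the RMSE used in this ranking is the square root of the mean squared difference between observed and predicted prices, so it is expressed in the same units as the closing price. A minimal sketch with made-up numbers (the function name `rmse` and the values are illustrative):

```r
# Root mean squared error between observed and predicted values
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

actual    <- c(100, 102, 101)   # hypothetical closing prices
predicted <- c(101, 100, 101)   # hypothetical forecasts

# Squared errors are 1, 4 and 0, so the result is sqrt(5 / 3)
print(rmse(actual, predicted))
```

Because squaring penalises large misses more heavily, a model with a few big errors ranks worse than one with many small errors of the same total size.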
To validate these results, the same workflows are applied to the other stocks.
##creation of data model dataframe with training dataset
data.model1 <- specifyModel(Cl(FB) ~ lag(mySMI(FB),5) +
lag(myADX(FB),5) + lag(myMFI(FB),5) + lag(mySAR(FB),5) + lag(Cl(FB),5))
Tdata.train <- as.data.frame(modelData(data.model1,
data.window=c('2015-01-02','2019-08-30')))
Tform <- as.formula('Cl.FB ~ .')
#model evaluation
m <- performanceEstimation(
PredTask(Tform , Tdata.train , 'FB'),
c(Workflow(learner= 'lm'),
Workflow('standardWF', wfID="standSVM",
learner='svm',learner.pars=list(cost=10,gamma=0.01)),
Workflow('timeseriesWF', wfID="slideSVM",
type="slide", relearn.step=20,
learner='svm',learner.pars=list(cost=5,gamma=0.01)),
Workflow(learner="rpart",.fullOutput=TRUE),
Workflow(learner="rpartXse"),
Workflow(learner="randomForest",learner.pars=list(ntree=200),
wfID="rf420"),
Workflow(learner='nnet', learner.pars=list(linout=TRUE, trace=FALSE, maxit=1000, size=6, decay=0.01)),
Workflow(learner="earth",learner.pars=list(thres=0.001))
),
EstimationTask(metrics="rmse",
method=MonteCarlo(nReps=5,szTrain=0.5,szTest=0.25)))
##
##
## ##### PERFORMANCE ESTIMATION USING MONTE CARLO #####
##
## ** PREDICTIVE TASK :: FB
##
## ++ MODEL/WORKFLOW :: lm
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: standSVM
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: slideSVM
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: rpart
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: rpartXse
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: rf420
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: nnet
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: earth
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
rankWorkflows(m, top = 4)
## $FB
## $FB$rmse
## Workflow Estimate
## 1 slideSVM 8.184133
## 2 standSVM 8.357157
## 3 lm 8.529362
## 4 earth 8.704058
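The per-stock code blocks differ only in the ticker symbol, so the model formula could be generated rather than retyped. The sketch below builds the formula text for a given ticker in base R; the helper `make_formula` is hypothetical, and passing its result to `specifyModel` is an assumption we have not verified here.

```r
# Build the model formula for one ticker by filling the symbol into a template.
make_formula <- function(ticker) {
  txt <- sprintf(
    paste("Cl(%1$s) ~ lag(mySMI(%1$s), 5) + lag(myADX(%1$s), 5) +",
          "lag(myMFI(%1$s), 5) + lag(mySAR(%1$s), 5) + lag(Cl(%1$s), 5)"),
    ticker)
  as.formula(txt)   # only parses the text; the symbol is not evaluated here
}

tickers  <- c("AAPL", "MSFT", "GOOGL", "AMZN", "FB")
formulas <- lapply(tickers, make_formula)
names(formulas) <- tickers

print(formulas$FB)
```

Generating the formulas this way keeps the predictor set identical across stocks and removes the risk of a copy-and-paste mismatch between tickers.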
##creation of data model dataframe with training dataset
data.model1 <- specifyModel(Cl(MSFT) ~ lag(mySMI(MSFT),5) +
lag(myADX(MSFT),5) + lag(myMFI(MSFT),5) + lag(mySAR(MSFT),5) + lag(Cl(MSFT),5))
Tdata.train <- as.data.frame(modelData(data.model1,
data.window=c('2015-01-02','2019-08-30')))
Tform <- as.formula('Cl.MSFT ~ .')
#model evaluation
m <- performanceEstimation(
PredTask(Tform , Tdata.train , 'MSFT'),
c(Workflow(learner= 'lm'),
Workflow('standardWF', wfID="standSVM",
learner='svm',learner.pars=list(cost=10,gamma=0.01)),
Workflow('timeseriesWF', wfID="slideSVM",
type="slide", relearn.step=20,
learner='svm',learner.pars=list(cost=5,gamma=0.01)),
Workflow(learner="rpart",.fullOutput=TRUE),
Workflow(learner="rpartXse"),
Workflow(learner="randomForest",learner.pars=list(ntree=200),
wfID="rf420"),
Workflow(learner='nnet', learner.pars=list(linout=TRUE, trace=FALSE, maxit=1000, size=6, decay=0.01)),
Workflow(learner="earth",learner.pars=list(thres=0.001))
),
EstimationTask(metrics="rmse",
method=MonteCarlo(nReps=5,szTrain=0.5,szTest=0.25)))
##
##
## ##### PERFORMANCE ESTIMATION USING MONTE CARLO #####
##
## ** PREDICTIVE TASK :: MSFT
##
## ++ MODEL/WORKFLOW :: lm
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: standSVM
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: slideSVM
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: rpart
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: rpartXse
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: rf420
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: nnet
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: earth
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
rankWorkflows(m, top = 4)
## $MSFT
## $MSFT$rmse
## Workflow Estimate
## 1 lm 2.893678
## 2 slideSVM 3.089363
## 3 earth 3.396743
## 4 standSVM 11.819423
##creation of data model dataframe with training dataset
data.model1 <- specifyModel(Cl(AMZN) ~ lag(mySMI(AMZN),5) +
lag(myADX(AMZN),5) + lag(myMFI(AMZN),5) + lag(mySAR(AMZN),5) + lag(Cl(AMZN),5))
Tdata.train <- as.data.frame(modelData(data.model1,
data.window=c('2015-01-02','2019-08-30')))
Tform <- as.formula('Cl.AMZN ~ .')
#model evaluation
m <- performanceEstimation(
PredTask(Tform , Tdata.train , 'AMZN'),
c(Workflow(learner= 'lm'),
Workflow('standardWF', wfID="standSVM",
learner='svm',learner.pars=list(cost=10,gamma=0.01)),
Workflow('timeseriesWF', wfID="slideSVM",
type="slide", relearn.step=20,
learner='svm',learner.pars=list(cost=5,gamma=0.01)),
Workflow(learner="rpart",.fullOutput=TRUE),
Workflow(learner="rpartXse"),
Workflow(learner="randomForest",learner.pars=list(ntree=200),
wfID="rf420"),
Workflow(learner='nnet', learner.pars=list(linout=TRUE, trace=FALSE, maxit=1000, size=6, decay=0.01)),
Workflow(learner="earth",learner.pars=list(thres=0.001))
),
EstimationTask(metrics="rmse",
method=MonteCarlo(nReps=5,szTrain=0.5,szTest=0.25)))
##
##
## ##### PERFORMANCE ESTIMATION USING MONTE CARLO #####
##
## ** PREDICTIVE TASK :: AMZN
##
## ++ MODEL/WORKFLOW :: lm
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: standSVM
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: slideSVM
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: rpart
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: rpartXse
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: rf420
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: nnet
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: earth
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
rankWorkflows(m, top = 4)
## $AMZN
## $AMZN$rmse
## Workflow Estimate
## 1 lm 71.29069
## 2 slideSVM 74.19632
## 3 earth 219.32507
## 4 standSVM 328.68194
# Build the model data frame for GOOGL from the training window
data.model1 <- specifyModel(Cl(GOOGL) ~ lag(mySMI(GOOGL),5) +
lag(myADX(GOOGL),5) + lag(myMFI(GOOGL),5) + lag(mySAR(GOOGL),5) + lag(Cl(GOOGL),5))
Tdata.train <- as.data.frame(modelData(data.model1,
data.window=c('2015-01-02','2019-08-30')))
Tform <- as.formula('Cl.GOOGL ~ .')
# Estimate the RMSE of each candidate workflow with Monte Carlo repetitions
m <- performanceEstimation(
PredTask(Tform , Tdata.train , 'GOOGL'),
c(Workflow(learner= 'lm'),
Workflow('standardWF', wfID="standSVM",
learner='svm',learner.pars=list(cost=10,gamma=0.01)),
Workflow('timeseriesWF', wfID="slideSVM",
type="slide", relearn.step=20,
learner='svm',learner.pars=list(cost=5,gamma=0.01)),
Workflow(learner="rpart",.fullOutput=TRUE),
Workflow(learner="rpartXse"),
Workflow(learner="randomForest",learner.pars=list(ntree=200),
wfID="rf420"),
Workflow(learner='nnet', learner.pars=list(linout=TRUE, trace=FALSE, maxit=1000, size=6, decay=0.01)),
Workflow(learner="earth",learner.pars=list(thresh=0.001))
),
EstimationTask(metrics="rmse",
method=MonteCarlo(nReps=5,szTrain=0.5,szTest=0.25)))
##
##
## ##### PERFORMANCE ESTIMATION USING MONTE CARLO #####
##
## ** PREDICTIVE TASK :: GOOGL
##
## ++ MODEL/WORKFLOW :: lm
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: standSVM
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: slideSVM
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: rpart
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: rpartXse
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: rf420
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: nnet
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
##
##
##
## ++ MODEL/WORKFLOW :: earth
## Task for estimating rmse using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 685 ; test size = 293
## Repetition 2
## start test = 688 ; test size = 293
## Repetition 3
## start test = 698 ; test size = 293
## Repetition 4
## start test = 720 ; test size = 293
## Repetition 5
## start test = 871 ; test size = 293
rankWorkflows(m, top = 4)
## $GOOGL
## $GOOGL$rmse
## Workflow Estimate
## 1 lm 39.38511
## 2 slideSVM 40.75577
## 3 standSVM 63.07308
## 4 rpart 151.05835
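The log above reports, for each repetition, a random test start with a fixed train size (0.5 of the rows) and test size (0.25 of the rows). The following is a minimal base R sketch of that idea, not the internals of `performanceEstimation`: the helper names `mc_split` and `rmse`, the synthetic random-walk data and the value of `n` are all assumptions made for illustration.

```r
# Sketch of Monte Carlo estimation for time-ordered data (base R only).
# Assumption: mc_split() and rmse() are hypothetical helpers, not package functions.
mc_split <- function(n, sz_train = 0.5, sz_test = 0.25) {
  n_train <- floor(sz_train * n)
  n_test  <- floor(sz_test * n)
  # The test window must start after a full training window
  # and must end no later than the last row.
  candidates <- seq(n_train + 1, n - n_test + 1)
  start_test <- candidates[sample.int(length(candidates), 1)]
  list(train = seq(start_test - n_train, start_test - 1),
       test  = seq(start_test, start_test + n_test - 1))
}

# Root mean squared error between observed and predicted values.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Synthetic data (assumption): a random-walk predictor and a noisy linear target.
set.seed(1234)
n <- 400
d <- data.frame(x = cumsum(rnorm(n)))
d$y <- 2 * d$x + rnorm(n)

# Five repetitions: fit on the training window, score on the test window.
scores <- replicate(5, {
  s <- mc_split(n)
  fit <- lm(y ~ x, data = d[s$train, ])
  rmse(d$y[s$test], predict(fit, newdata = d[s$test, ]))
})
mean(scores)  # Monte Carlo estimate of the RMSE
```

Because the training window always precedes the test window, each repetition respects the time ordering of the series, which an ordinary random shuffle would not.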
After applying the performance estimation function to all five stocks, the results indicate that the best models are linear regression and SVM (both the sliding-window and the standard variants). The neural network performed well on the Apple and Facebook stocks, while MARS (Multivariate Adaptive Regression Splines) performed well on Amazon, Google and Microsoft. Since the app must restrict the models available for selection to three, we selected MARS over the neural network because it performs better on three of the five stocks we want to predict.
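One simple way to compare models across stocks is to count how often each workflow appears among the top-ranked ones. The sketch below is only an illustration in base R: it hard-codes the two top-4 rankings printed in this section (AMZN and GOOGL), so it covers two of the five stocks, and the object names `top4` and `counts` are assumptions.

```r
# Top-4 workflows by RMSE, copied from the rankWorkflows() output above.
# Only AMZN and GOOGL are included; the other stocks are ranked elsewhere.
top4 <- list(
  AMZN  = c("lm", "slideSVM", "earth", "standSVM"),
  GOOGL = c("lm", "slideSVM", "standSVM", "rpart")
)

# Count how many of these rankings each workflow appears in.
counts <- sort(table(unlist(top4)), decreasing = TRUE)
counts

# Workflows that appear in every ranking considered here.
names(counts)[counts == length(top4)]
```

Extending `top4` with the rankings of the remaining stocks would give the full tally behind the model selection.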