The goal of this assignment is to develop a web app with Shiny that helps a user by presenting predictions of the future price evolution of an asset. As inputs, the user can select a stock ticker from a pre-defined set and a prediction model from a fixed set of available techniques.
This notebook addresses the predictive modelling aspect of the assignment; the Shiny app is built subsequently and separately. Specifically, we assess the best modelling approach to predict the price of 5 different stocks using at least 3 prediction models.
Once the best modelling approach is selected, it will be implemented in the Shiny app.
First, we selected 5 different stocks. We decided to go for the 5 airlines (per the GICS classification) in the S&P 500. The reasons for this choice are:
The 5 stocks selected are: ALK, AAL, DAL, LUV and UAL.
# Creating a vector of packages used within.
packages <- c('dplyr', 'DMwR2', 'ggplot2', 'devtools', 'ROCR', 'performanceEstimation', 'UBL', 'ranger', 'e1071', 'xts', 'quantmod', 'TTR', 'earth', 'MASS', 'rpart', 'here', 'randomForest', 'knitr', 'gbm', 'nnet', 'cowplot', 'bizdays', 'timeDate')
# Checking for package installations on the system and installing if not found.
if (length(setdiff(packages, rownames(installed.packages()))) > 0) {
  install.packages(setdiff(packages, rownames(installed.packages())))
}
# Including the packages for use.
for (package in packages) {
  library(package, character.only = TRUE)
}
#Ensure wd is set to current location by using here()
setwd(here::here())
#Check package version. Has to be 1.1.1.
# packageDescription('performanceEstimation')
# install_github("ltorgo/performanceEstimation",ref='develop')
We use the quantmod package to pull the data for the 5 selected stocks in a for loop.
#Get stock data using a for loop
stocks <- c('ALK', 'AAL', 'DAL', 'LUV', 'UAL')
for (stock in stocks) {
  assign(stock, getSymbols(stock, from = '2010-01-01', auto.assign = FALSE))
}
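As an alternative to creating one global variable per ticker with `assign()`, the same loop can collect the results in a named list. The sketch below is only illustrative: `fetch_all` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for the real download call so that the example runs without a network connection.

```r
# Sketch: collect one object per ticker in a named list.
# `fetch` is a hypothetical argument standing in for the download call, e.g.
# function(s) quantmod::getSymbols(s, from = "2010-01-01", auto.assign = FALSE)
fetch_all <- function(tickers, fetch) {
  stats::setNames(lapply(tickers, fetch), tickers)
}

# Offline stand-in with made-up prices (an assumption for illustration only).
fake_fetch <- function(s) data.frame(ticker = s, close = c(10, 11, 12))

prices <- fetch_all(c("ALK", "AAL"), fake_fetch)
names(prices)
```

With a list, later steps can use `prices[["AAL"]]` instead of `get("AAL")`, which keeps the global environment free of per-ticker variables.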
We visualize the time series performance of the stocks to make sure data was pulled correctly.
#Loop through stocks
for (stock in stocks) {
  candleChart(xts::last(get(stock), '10 years'), theme = 'white', TA = NULL, name = paste(stock, 'last 10 years'))
}
To predict stock prices we need to decide which predictors to use. One option would be a pure time series approach based on past prices alone. Instead, we calculated a number of technical indicators provided by the TTR and quantmod packages. Technical indicators are numeric summaries that reflect properties of the price time series, such as its trend, momentum and volatility.
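To make "numeric summaries of the price series" concrete, here is a minimal base R sketch of two simple summaries, a trailing moving average and a k-period return. The helper names `roll_mean` and `pct_change` and the prices are made up for illustration; they are not the TTR or quantmod implementations, although they are similar in spirit to `runMean` and `Delt`.

```r
# Trailing simple moving average over the last n observations
# (NA for the first n - 1 positions, where no full window exists).
roll_mean <- function(x, n) {
  as.numeric(stats::filter(x, rep(1 / n, n), sides = 1))
}

# k-period arithmetic return: (x[t] - x[t - k]) / x[t - k].
pct_change <- function(x, k = 1) {
  c(rep(NA_real_, k), diff(x, lag = k) / head(x, -k))
}

close <- c(10, 11, 12, 11, 13, 14)  # made-up closing prices
roll_mean(close, 3)
pct_change(close, 1)
```

Each summary at time t uses only prices up to time t, which is what lets the indicators serve as predictors once they are lagged appropriately.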
#True Range / Avg True Range - volatility of H-L-C series
myATR <- function(x) ATR(HLC(x))[,'atr']
#Stochastic Oscillator / Stochastic Momentum Index - relates the close to the midpoint of the H-L range over previous days
mySMI <- function(x) SMI(HLC(x))[, "SMI"]
#Welles Wilder's Directional Movement Index - % of true range that is up/down
myADX <- function(x) ADX(HLC(x))[,'ADX']
#Aroon - identifies starting trends, i.e. how long since the highest/lowest point in the last n periods
myAroon <- function(x) aroon(cbind(Hi(x),Lo(x)))$oscillator
#Bollinger Bands
MyBB <- function(x) BBands(HLC(x))[,'pctB']
#Chaikin Volatility - percent difference between series
MyChaikinVol <- function(x) Delt(chaikinVolatility(cbind(Hi(x),Lo(x))))[,1]
#CLV (Close Location Value) - relates the day's close to its trading range
MyCLV <- function(x) EMA(CLV(HLC(x)))[,1]
#Arms' Ease of Movement Value - emphasizes days where the stock moves easily and minimizes days where it does not
myEMV <- function(x) EMV(cbind(Hi(x),Lo(x)),Vo(x))[,2]
#MACD Oscillator - compares a fast MA of a series with a slow MA of the same series
myMACD <- function(x) MACD(Cl(x))[,2]
#Ratio of positive and negative money flow over time
myMFI <- function(x) MFI(HLC(x), Vo(x))
#Parabolic Stop-and-Reverse - calculates a trailing stop
mySAR <- function(x) SAR(cbind(Hi(x),Cl(x))) [,1]
#Volatility estimator/indicator. Garman-Klass method.
myVolat <- function(x) volatility(OHLC(x),calc="garman")[,1]
We have identified a number of technical indicators that can potentially help in predicting the stock price. We now gauge the importance of each predictor to decide whether to keep all of them.
To decide which predictors to keep, we used a random forest model to estimate variable importance and set a threshold below which a variable is discarded. The importance measure (%IncMSE) is based on the increase in prediction error when the information carried by that variable is removed.
We plotted the importance of the different predictors in the fitted model.
We also show the %IncMSE of each indicator in a table.
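The idea behind this kind of importance measure can be sketched in base R: shuffle one predictor at a time and record how much the prediction error grows. This is a simplified illustration under stated assumptions, not the randomForest computation. It uses `lm` on synthetic data and in-sample error, whereas randomForest works with out-of-bag data and, by default, scales the result, so the numbers are not comparable with the table in this notebook.

```r
set.seed(1)
n <- 200
# Synthetic data (an assumption for illustration): x1 matters a lot,
# x2 a little, and `noise` not at all.
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), noise = rnorm(n))
d$y <- 3 * d$x1 + 0.5 * d$x2 + rnorm(n, sd = 0.1)

fit <- lm(y ~ x1 + x2 + noise, data = d)
mse <- function(model, data) mean((data$y - predict(model, data))^2)
base_mse <- mse(fit, d)

# Percentage increase in MSE after shuffling one predictor at a time.
perm_importance <- sapply(c("x1", "x2", "noise"), function(v) {
  shuffled <- d
  shuffled[[v]] <- sample(shuffled[[v]])
  100 * (mse(fit, shuffled) - base_mse) / base_mse
})
perm_importance
```

A predictor whose shuffling barely changes the error carries little information the model relies on, which is the rationale for discarding low-importance indicators.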
data.model_AAL <- specifyModel(Cl(AAL) ~ Delt(Cl(AAL),k=1:3) + myATR(AAL) + mySMI(AAL) + myADX(AAL) + myAroon(AAL) + MyBB(AAL) + MyChaikinVol(AAL) + MyCLV(AAL) + myEMV(AAL) + myVolat(AAL) + myMACD(AAL) + myMFI(AAL) + mySAR(AAL) + runMean(Cl(AAL)) + CMO(Cl(AAL)) + EMA(Delt(Cl(AAL))) + RSI(Cl(AAL)) + runSD(Cl(AAL)))
test <- data.frame(Cl(AAL),Delt(Cl(AAL),k=1:3), myATR(AAL), lag(myATR(AAL), 5), mySMI(AAL), myADX(AAL), myAroon(AAL), MyBB(AAL), MyChaikinVol(AAL), MyCLV(AAL), myEMV(AAL), myVolat(AAL), myMACD(AAL), myMFI(AAL), mySAR(AAL), runMean(Cl(AAL)), CMO(Cl(AAL)), EMA(Delt(Cl(AAL))), RSI(Cl(AAL)), runMean(Cl(AAL)), runSD(Cl(AAL)))
set.seed(1234)
rf <- buildModel(x = data.model_AAL, method = 'randomForest', training.per=c('2010-01-01','2019-08-31'), ntree = 1000, importance = TRUE)
varImpPlot(rf@fitted.model, type = 1)
imp <- randomForest::importance(rf@fitted.model, type = 1)
kable(imp)
| Predictor | %IncMSE |
|---|---|
| Delt.Cl.AAL.k.1.3.Delt.1.arithmetic | 6.7423326 |
| Delt.Cl.AAL.k.1.3.Delt.2.arithmetic | 15.0454172 |
| Delt.Cl.AAL.k.1.3.Delt.3.arithmetic | 15.0006270 |
| myATR.AAL | 21.4640305 |
| mySMI.AAL | 23.7441035 |
| myADX.AAL | 26.7947513 |
| myAroon.AAL | 18.8775610 |
| MyBB.AAL | 29.5528169 |
| MyChaikinVol.AAL | 0.2256495 |
| MyCLV.AAL | 16.4688365 |
| myEMV.AAL | 9.0908267 |
| myVolat.AAL | 19.1167234 |
| myMACD.AAL | 18.1530301 |
| myMFI.AAL | 19.7267598 |
| mySAR.AAL | 27.6380886 |
| runMean.Cl.AAL | 37.1744845 |
| CMO.Cl.AAL | 26.8975905 |
| EMA.Delt.Cl.AAL | 23.2703432 |
| RSI.Cl.AAL | 33.9688671 |
| runSD.Cl.AAL | 11.9443934 |
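Given that the mean importance was roughly 20 (the average of the %IncMSE values above) and we needed to make predictions up to 5 days ahead for 5 different stocks, we only considered predictors with a %IncMSE of 20 or more. From these, the models below use:
+ myATR
+ mySMI
+ myADX
+ mySAR
+ runMean
+ EMA of Delt
+ RSI

MyBB and CMO also exceed the threshold but are not used in the formulas that follow.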
kable(rownames(imp)[which(imp>20)])
| x |
|---|
| myATR.AAL |
| mySMI.AAL |
| myADX.AAL |
| MyBB.AAL |
| mySAR.AAL |
| runMean.Cl.AAL |
| CMO.Cl.AAL |
| EMA.Delt.Cl.AAL |
| RSI.Cl.AAL |
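The threshold step can be reproduced without refitting the random forest by working directly from the reported values. In the sketch below, the numbers are copied from the importance table above; the vector `used` lists the predictors that appear (lagged) in the model formulas later in this notebook, and the variable names are chosen for illustration.

```r
# %IncMSE values copied from the importance table in this notebook.
imp <- c(
  Delt.Cl.AAL.k.1.3.Delt.1.arithmetic = 6.7423326,
  Delt.Cl.AAL.k.1.3.Delt.2.arithmetic = 15.0454172,
  Delt.Cl.AAL.k.1.3.Delt.3.arithmetic = 15.0006270,
  myATR.AAL = 21.4640305, mySMI.AAL = 23.7441035, myADX.AAL = 26.7947513,
  myAroon.AAL = 18.8775610, MyBB.AAL = 29.5528169,
  MyChaikinVol.AAL = 0.2256495, MyCLV.AAL = 16.4688365,
  myEMV.AAL = 9.0908267, myVolat.AAL = 19.1167234, myMACD.AAL = 18.1530301,
  myMFI.AAL = 19.7267598, mySAR.AAL = 27.6380886, runMean.Cl.AAL = 37.1744845,
  CMO.Cl.AAL = 26.8975905, EMA.Delt.Cl.AAL = 23.2703432,
  RSI.Cl.AAL = 33.9688671, runSD.Cl.AAL = 11.9443934
)

threshold <- 20
kept <- names(imp)[imp > threshold]

# Predictors that appear (lagged) in the later model formulas.
used <- c("myATR.AAL", "mySMI.AAL", "myADX.AAL", "mySAR.AAL",
          "runMean.Cl.AAL", "EMA.Delt.Cl.AAL", "RSI.Cl.AAL")

kept                 # predictors that pass the threshold
setdiff(kept, used)  # pass the threshold but are not used later
```

Comparing `kept` with `used` makes it easy to check that the predictor set carried into the models matches the selection rule.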
For further analyses, we will use these predictors in our data models. Given that our 5 stocks operate within the same industry, we have assumed that these conclusions apply to the other stocks as well. This way, we will have a consistent group of predictors for all models.
Next, we specify the model and predictors. Specifically, we try to predict the closing price of each stock. Because we need predictions 1 to 5 days ahead, the technical indicators for the prediction date itself will not be available at prediction time, and using them would be leakage. We therefore lagged the predictors by 1 to 5 days, as this information would actually be available when making the predictions.
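A minimal base R sketch of the lagging step follows. The helper `lag_k` and the data are made up for illustration; the notebook itself applies `lag()` to the indicator series inside the model formula.

```r
# Shift a series k steps: row t receives the value observed at row t - k.
lag_k <- function(x, k) {
  stopifnot(k >= 0, k < length(x))
  c(rep(NA, k), head(x, length(x) - k))
}

d <- data.frame(
  date  = as.Date("2019-07-01") + 0:7,
  close = c(30, 31, 29, 32, 33, 31, 30, 34),  # made-up prices
  rsi   = c(55, 60, 48, 62, 66, 58, 52, 70)   # made-up indicator values
)

# Pair each close with the indicator value from 5 rows earlier.
d$rsi_lag5 <- lag_k(d$rsi, 5)
d
```

The first 5 rows have no lagged value and become `NA`, which is why the modelling data is passed through `na.omit()` before use.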
We split the data into a training set, which runs from January 2010 to July 2019, and a test set, which covers August 2019.
data.model_AAL <- specifyModel(Cl(AAL) ~ lag(myATR(AAL), 5) + lag(mySMI(AAL),5) + lag(myADX(AAL),5) + lag(mySAR(AAL),5) + lag(runMean(Cl(AAL)),5) + lag(EMA(Delt(Cl(AAL))),5) + lag(RSI(Cl(AAL)),5))
Tdata.train_AAL <- as.data.frame(modelData(data.model_AAL, data.window=c('2010-01-01','2019-07-31')))
Tdata.eval_AAL <- na.omit(as.data.frame(modelData(data.model_AAL, data.window=c('2019-08-01','2019-08-31'))))
Tform_AAL <- as.formula('Cl.AAL ~ .')
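The date-window split used above can be illustrated on a plain data frame. This is a sketch with made-up data; `in_window` is a hypothetical helper, and the notebook itself relies on the `data.window` argument of `modelData()`.

```r
# Made-up daily data spanning the boundary between the two windows.
d <- data.frame(
  date  = seq(as.Date("2019-07-29"), as.Date("2019-08-03"), by = "day"),
  close = c(30, 31, 29, 32, 33, 31)
)

# TRUE for dates inside the closed interval [from, to].
in_window <- function(dates, from, to) {
  dates >= as.Date(from) & dates <= as.Date(to)
}

train <- d[in_window(d$date, "2010-01-01", "2019-07-31"), ]
test  <- d[in_window(d$date, "2019-08-01", "2019-08-31"), ]
```

Because the split is by time rather than random, every training row precedes every test row, which mirrors how the model would be used in practice.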
We ran a number of models with the following settings:
+ Models: SVM (sliding and standard), lm, gbm, rpartXse, rpart, earth, nnet and randomForest.
+ We used nmae (normalized mean absolute error) as our evaluation metric, as it is normalized and easy to interpret.
+ We used Monte Carlo as our estimation method, with 5 repetitions, to gauge the variability of each model's performance.
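For reference, the metric can be sketched as follows. This assumes that nmae divides the absolute error of the model by the absolute error of a baseline that always predicts the mean of the training target, which is our understanding of the performanceEstimation definition; the function below is an illustration, not the package code, and the numbers are made up.

```r
# Normalized mean absolute error relative to a "predict the training mean" baseline.
nmae <- function(trues, preds, train_y) {
  sum(abs(trues - preds)) / sum(abs(trues - mean(train_y)))
}

train_y <- c(10, 12, 14)      # made-up training targets
trues   <- c(13, 15, 11)      # made-up test targets
preds   <- c(12.5, 14, 11.5)  # made-up predictions

nmae(trues, preds, train_y)
```

Under this definition a value below 1 means the model beats the baseline, and lower is better, which makes scores comparable across stocks with different price levels.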
gbm.predict <- function(model, test, method, ...) {
  best <- gbm.perf(model, plot.it=FALSE, method=method)
  return(predict(model, test, n.trees=best, ...))
}
exp <- performanceEstimation(
PredTask(Tform_AAL, Tdata.train_AAL, 'AAL'),
c(Workflow('standardWF', wfID="standSVM",
learner='svm',learner.pars=list(cost=10,gamma=0.01)),
Workflow('timeseriesWF', wfID="slideSVM",
type="slide", relearn.step=90,
learner='svm',learner.pars=list(cost=10,gamma=0.01)),
Workflow(learner="randomForest",learner.pars=list(ntree=100),
wfID="rf420"),
Workflow(learner="rpart",.fullOutput=TRUE),
Workflow(learner= 'lm'),
Workflow(learner='gbm', learner.pars=list(n.trees=100, cv.folds=5), predictor='gbm.predict'),
Workflow(learner="rpartXse"),
Workflow(learner="earth",learner.pars=list(thres=0.001)),
Workflow(learner='nnet', learner.pars=list(linout=TRUE, trace=FALSE, maxit=1000, size=10, decay=0.01))
),
EstimationTask(metrics="nmae",
method=MonteCarlo(nReps=5,szTrain=0.5,szTest=0.25)))
##
##
## ##### PERFORMANCE ESTIMATION USING MONTE CARLO #####
##
## ** PREDICTIVE TASK :: AAL
##
## ++ MODEL/WORKFLOW :: standSVM
## Task for estimating nmae using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 1257 ; test size = 594
## Repetition 2
## start test = 1551 ; test size = 594
## Repetition 3
## start test = 1559 ; test size = 594
## Repetition 4
## start test = 1560 ; test size = 594
## Repetition 5
## start test = 1699 ; test size = 594
Below we have included the summary of the performance estimation tasks undertaken.
summary(exp)
##
## == Summary of a Monte Carlo Performance Estimation Experiment ==
##
## Task for estimating nmae using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
##
## * Predictive Tasks :: AAL
## * Workflows :: standSVM, slideSVM, rf420, rpart, lm, gbm, rpartXse, earth, nnet
##
## -> Task: AAL
## *Workflow: standSVM
## nmae
## avg 0.115048078
## std 0.014989660
## med 0.109625144
## iqr 0.001927339
## min 0.105123482
## max 0.141619005
## invalid 0.000000000
##
## *Workflow: slideSVM
## nmae
## avg 0.1056408818
## std 0.0174007392
## med 0.1033576695
## iqr 0.0004359777
## min 0.0848402417
## max 0.1332611405
## invalid 0.0000000000
##
## *Workflow: rf420
## nmae
## avg 0.158135879
## std 0.034693511
## med 0.159661883
## iqr 0.008784552
## min 0.110368673
## max 0.207989805
## invalid 0.000000000
##
## *Workflow: rpart
## nmae
## avg 0.18628860
## std 0.02505173
## med 0.19970967
## iqr 0.03091286
## min 0.15154953
## max 0.21154471
## invalid 0.00000000
##
## *Workflow: lm
## nmae
## avg 0.108162156
## std 0.021846779
## med 0.109851183
## iqr 0.001648006
## min 0.075454319
## max 0.137020065
## invalid 0.000000000
##
## *Workflow: gbm
## nmae
## avg 0.138066715
## std 0.029488059
## med 0.140756444
## iqr 0.003013282
## min 0.092147458
## max 0.174610982
## invalid 0.000000000
##
## *Workflow: rpartXse
## nmae
## avg 0.163858290
## std 0.035289428
## med 0.160662110
## iqr 0.003181326
## min 0.118197986
## max 0.217431705
## invalid 0.000000000
##
## *Workflow: earth
## nmae
## avg 0.12273230
## std 0.02674245
## med 0.12587048
## iqr 0.00147135
## min 0.08096336
## max 0.15573573
## invalid 0.00000000
##
## *Workflow: nnet
## nmae
## avg 0.15210920
## std 0.03241967
## med 0.14507666
## iqr 0.02291699
## min 0.12059789
## max 0.20498979
## invalid 0.00000000
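The Monte Carlo estimates summarised above come from repeated train/test windows placed at random points in the series. The sketch below shows one way such windows could be drawn. It is a simplified illustration with a hypothetical helper `mc_windows`, not the performanceEstimation implementation, and it is not expected to reproduce the exact start points in the log. The value `n = 2376` is an assumption chosen to be consistent with the logged test size of 594 at `szTest = 0.25`.

```r
mc_windows <- function(n, n_reps = 5, sz_train = 0.5, sz_test = 0.25, seed = 1234) {
  set.seed(seed)
  train_size <- floor(sz_train * n)
  test_size  <- floor(sz_test * n)
  # A test window may start anywhere that leaves a full training window
  # before it and a full test window from it onwards.
  candidates <- (train_size + 1):(n - test_size + 1)
  starts <- sort(sample(candidates, n_reps))
  lapply(starts, function(s) {
    list(train = (s - train_size):(s - 1),
         test  = s:(s + test_size - 1))
  })
}

w <- mc_windows(n = 2376)
sapply(w, function(x) min(x$test))  # start of each test window
```

Each repetition trains on the window immediately before its start point and tests on the window after it, so the temporal order of the data is preserved in every repetition.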
Furthermore, the plot below shows the performance of each model across the 5 repetitions (shown as boxplots).
plot(exp)
We ran the different models using predictors lagged by 1 to 5 days. While the error increased as we lagged the predictors further (which is to be expected), lm and the sliding SVM were the best models at each lag (i.e. 1 to 5 days), with the standard SVM following. The plot above shows the results for predictors lagged by 5 days.
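The difference between the standard and the sliding workflow is that the sliding one retrains periodically on the most recent window of data. A rough base R sketch of that idea follows, using `lm` as a stand-in learner and synthetic data; `slide_lm` and its arguments are hypothetical and do not reproduce the `timeseriesWF` implementation.

```r
slide_lm <- function(data, formula, train_size, relearn_step) {
  test_idx <- (train_size + 1):nrow(data)
  preds <- rep(NA_real_, length(test_idx))
  for (start in seq(1, length(test_idx), by = relearn_step)) {
    block <- start:min(start + relearn_step - 1, length(test_idx))
    first_test_row <- test_idx[start]
    # Slide: always train on the train_size rows immediately before the block.
    train_rows <- (first_test_row - train_size):(first_test_row - 1)
    fit <- lm(formula, data = data[train_rows, ])
    preds[block] <- predict(fit, data[test_idx[block], ])
  }
  preds
}

set.seed(1)
d <- data.frame(x = 1:120)
d$y <- 2 * d$x + rnorm(120)  # synthetic trending series

p <- slide_lm(d, y ~ x, train_size = 60, relearn_step = 20)
mean(abs(p - d$y[61:120]))   # mean absolute error over the test rows
```

Retraining on recent data lets the model follow changes in the price regime, which is one plausible reason the sliding SVM does better than the standard SVM in the results above.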
Once this was concluded, we replicated this same analysis for the remaining 4 stocks, to test if the conclusions were the same.
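Since the model specification is identical for every stock apart from the ticker, the formula text could be generated rather than copied. The following is a sketch only: `build_formula_text` is a hypothetical helper that is not part of the notebook, and it merely builds the text of the formula. The indicator functions it names (`myATR`, `mySMI` and so on) are the ones defined earlier.

```r
# Build the formula text used for one ticker, with all predictors lagged by k days.
build_formula_text <- function(ticker, k = 5) {
  indicators <- c("myATR(%s)", "mySMI(%s)", "myADX(%s)", "mySAR(%s)",
                  "runMean(Cl(%s))", "EMA(Delt(Cl(%s)))", "RSI(Cl(%s))")
  terms <- sprintf("lag(%s, %d)", sprintf(indicators, ticker), k)
  sprintf("Cl(%s) ~ %s", ticker, paste(terms, collapse = " + "))
}

build_formula_text("ALK")
```

The resulting string can be turned into a formula with `as.formula()`, so a loop over the tickers could replace the repeated blocks and reduce the risk of settings drifting apart between stocks.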
#Build model
data.model_ALK <- specifyModel(Cl(ALK) ~ lag(myATR(ALK), 5) + lag(mySMI(ALK),5) + lag(myADX(ALK),5) + lag(mySAR(ALK),5) + lag(runMean(Cl(ALK)),5) + lag(EMA(Delt(Cl(ALK))),5) + lag(RSI(Cl(ALK)),5))
Tdata.train_ALK <- as.data.frame(modelData(data.model_ALK, data.window=c('2010-01-01','2019-07-31')))
Tdata.eval_ALK <- na.omit(as.data.frame(modelData(data.model_ALK, data.window=c('2019-08-01','2019-08-31'))))
Tform_ALK <- as.formula('Cl.ALK ~ .')
#Performance Estimation tasks
exp_ALK <- performanceEstimation(
PredTask(Tform_ALK, Tdata.train_ALK, 'ALK'),
c(Workflow('standardWF', wfID="standSVM",
learner='svm',learner.pars=list(cost=10,gamma=0.01)),
Workflow('timeseriesWF', wfID="slideSVM",
type="slide", relearn.step=90,
learner='svm',learner.pars=list(cost=10,gamma=0.01)),
Workflow(learner="randomForest",learner.pars=list(ntree=100),
wfID="rf420"),
Workflow(learner="rpart",.fullOutput=TRUE),
Workflow(learner= 'lm'),
Workflow(learner='gbm', learner.pars=list(n.trees=25, cv.folds=5), predictor='gbm.predict'),
Workflow(learner="rpartXse"),
Workflow(learner="earth",learner.pars=list(thres=0.001)),
Workflow(learner='nnet', learner.pars=list(linout=TRUE, trace=FALSE, maxit=1000, size=10, decay=0.01))
),
EstimationTask(metrics="nmae",
method=MonteCarlo(nReps=5,szTrain=0.5,szTest=0.25)))
##
##
## ##### PERFORMANCE ESTIMATION USING MONTE CARLO #####
##
## ** PREDICTIVE TASK :: ALK
##
## ++ MODEL/WORKFLOW :: standSVM
## Task for estimating nmae using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 1257 ; test size = 594
## Repetition 2
## start test = 1551 ; test size = 594
## Repetition 3
## start test = 1559 ; test size = 594
## Repetition 4
## start test = 1560 ; test size = 594
## Repetition 5
## start test = 1699 ; test size = 594
summary(exp_ALK)
##
## == Summary of a Monte Carlo Performance Estimation Experiment ==
##
## Task for estimating nmae using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
##
## * Predictive Tasks :: ALK
## * Workflows :: standSVM, slideSVM, rf420, rpart, lm, gbm, rpartXse, earth, nnet
##
## -> Task: ALK
## *Workflow: standSVM
## nmae
## avg 0.11112773
## std 0.02729337
## med 0.09475328
## iqr 0.02386611
## min 0.09255428
## max 0.15619063
## invalid 0.00000000
##
## *Workflow: slideSVM
## nmae
## avg 0.084054849
## std 0.013848427
## med 0.081641261
## iqr 0.002301304
## min 0.068300818
## max 0.106449469
## invalid 0.000000000
##
## *Workflow: rf420
## nmae
## avg 0.23805992
## std 0.04639408
## med 0.21282942
## iqr 0.04594395
## min 0.19824232
## max 0.31087500
## invalid 0.00000000
##
## *Workflow: rpart
## nmae
## avg 0.23379063
## std 0.05549053
## med 0.19955476
## iqr 0.04904926
## min 0.19659157
## max 0.32521408
## invalid 0.00000000
##
## *Workflow: lm
## nmae
## avg 0.0750381687
## std 0.0152039358
## med 0.0769914044
## iqr 0.0004447275
## min 0.0509355306
## max 0.0932612127
## invalid 0.0000000000
##
## *Workflow: gbm
## nmae
## avg 0.32245725
## std 0.09626978
## med 0.27250286
## iqr 0.06950965
## min 0.26080431
## max 0.48669538
## invalid 0.00000000
##
## *Workflow: rpartXse
## nmae
## avg 0.23925067
## std 0.02065089
## med 0.23703291
## iqr 0.02560922
## min 0.21931569
## max 0.26990174
## invalid 0.00000000
##
## *Workflow: earth
## nmae
## avg 0.1982986
## std 0.1477604
## med 0.1282316
## iqr 0.0264172
## min 0.1215692
## max 0.4617127
## invalid 0.0000000
##
## *Workflow: nnet
## nmae
## avg 0.16258092
## std 0.01580117
## med 0.17077124
## iqr 0.02092761
## min 0.13952096
## max 0.17599050
## invalid 0.00000000
plot(exp_ALK)
As with AAL, lm and the sliding SVM are the best-performing models for ALK at each lag (i.e. 1 to 5 days).
#Build model
data.model_DAL <- specifyModel(Cl(DAL) ~ lag(myATR(DAL), 5) + lag(mySMI(DAL),5) + lag(myADX(DAL),5) + lag(mySAR(DAL),5) + lag(runMean(Cl(DAL)),5) + lag(EMA(Delt(Cl(DAL))),5) + lag(RSI(Cl(DAL)),5))
Tdata.train_DAL <- as.data.frame(modelData(data.model_DAL, data.window=c('2010-01-01','2019-07-31')))
Tdata.eval_DAL <- na.omit(as.data.frame(modelData(data.model_DAL, data.window=c('2019-08-01','2019-08-31'))))
Tform_DAL <- as.formula('Cl.DAL ~ .')
#Performance Estimation tasks
exp_DAL <- performanceEstimation(
PredTask(Tform_DAL, Tdata.train_DAL, 'DAL'),
c(Workflow('standardWF', wfID="standSVM",
learner='svm',learner.pars=list(cost=10,gamma=0.01)),
Workflow('timeseriesWF', wfID="slideSVM",
type="slide", relearn.step=90,
learner='svm',learner.pars=list(cost=10,gamma=0.01)),
Workflow(learner="randomForest",learner.pars=list(ntree=100),
wfID="rf420"),
Workflow(learner="rpart",.fullOutput=TRUE),
Workflow(learner= 'lm'),
Workflow(learner='gbm', learner.pars=list(n.trees=25, cv.folds=5), predictor='gbm.predict'),
Workflow(learner="rpartXse"),
Workflow(learner="earth",learner.pars=list(thres=0.001)),
Workflow(learner='nnet', learner.pars=list(linout=TRUE, trace=FALSE, maxit=1000, size=10, decay=0.01))
),
EstimationTask(metrics="nmae",
method=MonteCarlo(nReps=5,szTrain=0.5,szTest=0.25)))
##
##
## ##### PERFORMANCE ESTIMATION USING MONTE CARLO #####
##
## ** PREDICTIVE TASK :: DAL
##
## ++ MODEL/WORKFLOW :: standSVM
## Task for estimating nmae using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 1257 ; test size = 594
## Repetition 2
## start test = 1551 ; test size = 594
## Repetition 3
## start test = 1559 ; test size = 594
## Repetition 4
## start test = 1560 ; test size = 594
## Repetition 5
## start test = 1699 ; test size = 594
summary(exp_DAL)
##
## == Summary of a Monte Carlo Performance Estimation Experiment ==
##
## Task for estimating nmae using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
##
## * Predictive Tasks :: DAL
## * Workflows :: standSVM, slideSVM, rf420, rpart, lm, gbm, rpartXse, earth, nnet
##
## -> Task: DAL
## *Workflow: standSVM
## nmae
## avg 0.100073835
## std 0.013129771
## med 0.102848169
## iqr 0.003088132
## min 0.079211052
## max 0.115511777
## invalid 0.000000000
##
## *Workflow: slideSVM
## nmae
## avg 0.075519507
## std 0.007686614
## med 0.079292127
## iqr 0.002548964
## min 0.061910497
## max 0.079763776
## invalid 0.000000000
##
## *Workflow: rf420
## nmae
## avg 0.19738857
## std 0.05619705
## med 0.20996232
## iqr 0.01246602
## min 0.10251671
## max 0.25236883
## invalid 0.00000000
##
## *Workflow: rpart
## nmae
## avg 0.215051861
## std 0.075929803
## med 0.228375582
## iqr 0.009220149
## min 0.093586379
## max 0.303946292
## invalid 0.000000000
##
## *Workflow: lm
## nmae
## avg 0.076281503
## std 0.009256540
## med 0.078538247
## iqr 0.002242126
## min 0.061210281
## max 0.086511125
## invalid 0.000000000
##
## *Workflow: gbm
## nmae
## avg 0.337363446
## std 0.057555511
## med 0.337117088
## iqr 0.008702466
## min 0.258411780
## max 0.420850318
## invalid 0.000000000
##
## *Workflow: rpartXse
## nmae
## avg 0.144259980
## std 0.017715653
## med 0.137440674
## iqr 0.004817362
## min 0.131045392
## max 0.175280775
## invalid 0.000000000
##
## *Workflow: earth
## nmae
## avg 0.107216708
## std 0.022825587
## med 0.096430745
## iqr 0.006583378
## min 0.092975679
## max 0.147542701
## invalid 0.000000000
##
## *Workflow: nnet
## nmae
## avg 0.11605927
## std 0.01749918
## med 0.11173092
## iqr 0.01725000
## min 0.09911605
## max 0.14347170
## invalid 0.00000000
plot(exp_DAL)
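As with AAL, the LM and the sliding SVM are the best-performing models for DAL at every lag (1 to 5 days). In the run shown above, the sliding SVM has the lowest average nmae (0.0755), closely followed by the LM (0.0763); every other workflow averages 0.10 or more.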
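To make the metric in these summaries concrete, the sketch below computes a normalised mean absolute error by hand. It assumes the usual definition, which we understand performanceEstimation to follow: the model's total absolute error divided by the total absolute error of a baseline that always predicts the mean of the training target. The function name `nmae_by_hand` and the toy numbers are illustrative only.

```r
# Normalised mean absolute error (assumed definition): the model's absolute
# error relative to a baseline that always predicts the training-set mean.
nmae_by_hand <- function(truth, pred, train_y) {
  sum(abs(truth - pred)) / sum(abs(truth - mean(train_y)))
}

train_y <- c(10, 12, 14)      # toy training targets (mean = 12)
truth   <- c(13, 15, 11)      # toy test targets
pred    <- c(12.5, 14, 11.5)  # toy model predictions

# Model error: 0.5 + 1 + 0.5 = 2; baseline error: 1 + 3 + 1 = 5
result <- nmae_by_hand(truth, pred, train_y)
print(result)  # 0.4
```

Under this definition a value below 1 means the model beats the naive training-mean baseline, and lower is better.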
#Build model
data.model_LUV <- specifyModel(Cl(LUV) ~ lag(myATR(LUV), 5) + lag(mySMI(LUV),5) + lag(myADX(LUV),5) + lag(mySAR(LUV),5) + lag(runMean(Cl(LUV)),5) + lag(EMA(Delt(Cl(LUV))),5) + lag(RSI(Cl(LUV)),5))
Tdata.train_LUV <- as.data.frame(modelData(data.model_LUV, data.window=c('2010-01-01','2019-07-31')))
Tdata.eval_LUV <- na.omit(as.data.frame(modelData(data.model_LUV, data.window=c('2019-08-01','2019-08-31'))))
Tform_LUV <- as.formula('Cl.LUV ~ .')
#Performance Estimation tasks
exp_LUV <- performanceEstimation(
PredTask(Tform_LUV, Tdata.train_LUV, 'LUV'),
c(Workflow('standardWF', wfID="standSVM",
learner='svm',learner.pars=list(cost=10,gamma=0.01)),
Workflow('timeseriesWF', wfID="slideSVM",
type="slide", relearn.step=90,
learner='svm',learner.pars=list(cost=10,gamma=0.01)),
Workflow(learner="randomForest",learner.pars=list(ntree=100),
wfID="rf420"),
Workflow(learner="rpart",.fullOutput=TRUE),
Workflow(learner= 'lm'),
Workflow(learner='gbm', learner.pars=list(n.trees=25, cv.folds=5), predictor='gbm.predict'),
Workflow(learner="rpartXse"),
Workflow(learner="earth",learner.pars=list(thres=0.001)),
Workflow(learner='nnet', learner.pars=list(linout=TRUE, trace=FALSE, maxit=1500, size=10, decay=0.01))
),
EstimationTask(metrics="nmae",
method=MonteCarlo(nReps=5,szTrain=0.5,szTest=0.25)))
##
##
## ##### PERFORMANCE ESTIMATION USING MONTE CARLO #####
##
## ** PREDICTIVE TASK :: LUV
##
## ++ MODEL/WORKFLOW :: standSVM
## Task for estimating nmae using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 1257 ; test size = 594
## Repetition 2
## start test = 1551 ; test size = 594
## Repetition 3
## start test = 1559 ; test size = 594
## Repetition 4
## start test = 1560 ; test size = 594
## Repetition 5
## start test = 1699 ; test size = 594
##
##
##
## ++ MODEL/WORKFLOW :: slideSVM
## Task for estimating nmae using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 1257 ; test size = 594
## Repetition 2
## start test = 1551 ; test size = 594
## Repetition 3
## start test = 1559 ; test size = 594
## Repetition 4
## start test = 1560 ; test size = 594
## Repetition 5
## start test = 1699 ; test size = 594
##
##
##
## ++ MODEL/WORKFLOW :: rf420
## Task for estimating nmae using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 1257 ; test size = 594
## Repetition 2
## start test = 1551 ; test size = 594
## Repetition 3
## start test = 1559 ; test size = 594
## Repetition 4
## start test = 1560 ; test size = 594
## Repetition 5
## start test = 1699 ; test size = 594
##
##
##
## ++ MODEL/WORKFLOW :: rpart
## Task for estimating nmae using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 1257 ; test size = 594
## Repetition 2
## start test = 1551 ; test size = 594
## Repetition 3
## start test = 1559 ; test size = 594
## Repetition 4
## start test = 1560 ; test size = 594
## Repetition 5
## start test = 1699 ; test size = 594
##
##
##
## ++ MODEL/WORKFLOW :: lm
## Task for estimating nmae using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 1257 ; test size = 594
## Repetition 2
## start test = 1551 ; test size = 594
## Repetition 3
## start test = 1559 ; test size = 594
## Repetition 4
## start test = 1560 ; test size = 594
## Repetition 5
## start test = 1699 ; test size = 594
##
##
##
## ++ MODEL/WORKFLOW :: gbm
## Task for estimating nmae using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 1257 ; test size = 594
## Distribution not specified, assuming gaussian ...
## Repetition 2
## start test = 1551 ; test size = 594
## Distribution not specified, assuming gaussian ...
## Repetition 3
## start test = 1559 ; test size = 594
## Distribution not specified, assuming gaussian ...
## Repetition 4
## start test = 1560 ; test size = 594
## Distribution not specified, assuming gaussian ...
## Repetition 5
## start test = 1699 ; test size = 594
## Distribution not specified, assuming gaussian ...
##
##
##
## ++ MODEL/WORKFLOW :: rpartXse
## Task for estimating nmae using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 1257 ; test size = 594
## Repetition 2
## start test = 1551 ; test size = 594
## Repetition 3
## start test = 1559 ; test size = 594
## Repetition 4
## start test = 1560 ; test size = 594
## Repetition 5
## start test = 1699 ; test size = 594
##
##
##
## ++ MODEL/WORKFLOW :: earth
## Task for estimating nmae using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 1257 ; test size = 594
## Repetition 2
## start test = 1551 ; test size = 594
## Repetition 3
## start test = 1559 ; test size = 594
## Repetition 4
## start test = 1560 ; test size = 594
## Repetition 5
## start test = 1699 ; test size = 594
##
##
##
## ++ MODEL/WORKFLOW :: nnet
## Task for estimating nmae using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 1257 ; test size = 594
## Repetition 2
## start test = 1551 ; test size = 594
## Repetition 3
## start test = 1559 ; test size = 594
## Repetition 4
## start test = 1560 ; test size = 594
## Repetition 5
## start test = 1699 ; test size = 594
summary(exp_LUV)
##
## == Summary of a Monte Carlo Performance Estimation Experiment ==
##
## Task for estimating nmae using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
##
## * Predictive Tasks :: LUV
## * Workflows :: standSVM, slideSVM, rf420, rpart, lm, gbm, rpartXse, earth, nnet
##
## -> Task: LUV
## *Workflow: standSVM
## nmae
## avg 0.143047787
## std 0.023991455
## med 0.140123523
## iqr 0.003799764
## min 0.113023544
## max 0.180149147
## invalid 0.000000000
##
## *Workflow: slideSVM
## nmae
## avg 0.066440217
## std 0.002144861
## med 0.065425170
## iqr 0.001513464
## min 0.064953334
## max 0.070094211
## invalid 0.000000000
##
## *Workflow: rf420
## nmae
## avg 0.31894760
## std 0.08558052
## med 0.33165497
## iqr 0.01114144
## min 0.18086574
## max 0.41710913
## invalid 0.00000000
##
## *Workflow: rpart
## nmae
## avg 0.36825164
## std 0.08207889
## med 0.40238222
## iqr 0.00527719
## min 0.22190681
## max 0.41581529
## invalid 0.00000000
##
## *Workflow: lm
## nmae
## avg 0.059698131
## std 0.003108320
## med 0.057988445
## iqr 0.005209291
## min 0.057174400
## max 0.063666882
## invalid 0.000000000
##
## *Workflow: gbm
## nmae
## avg 0.438441977
## std 0.089843292
## med 0.446276503
## iqr 0.006400742
## min 0.299304787
## max 0.551006090
## invalid 0.000000000
##
## *Workflow: rpartXse
## nmae
## avg 0.31001207
## std 0.07304192
## med 0.31267052
## iqr 0.02506653
## min 0.20896925
## max 0.41390823
## invalid 0.00000000
##
## *Workflow: earth
## nmae
## avg 0.11139535
## std 0.06454535
## med 0.08020100
## iqr 0.01985322
## min 0.07653918
## max 0.22587866
## invalid 0.00000000
##
## *Workflow: nnet
## nmae
## avg 0.23049011
## std 0.03099567
## med 0.23451437
## iqr 0.03487250
## min 0.19337384
## max 0.27289000
## invalid 0.00000000
plot(exp_LUV)
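As with AAL, the LM and the sliding SVM are the best-performing models for LUV at every lag (1 to 5 days). In the run shown above, the LM has the lowest average nmae (0.0597), followed by the sliding SVM (0.0664); the next-best workflow, earth, averages 0.1114.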
#Build model
data.model_UAL <- specifyModel(Cl(UAL) ~ lag(myATR(UAL), 5) + lag(mySMI(UAL),5) + lag(myADX(UAL),5) + lag(mySAR(UAL),5) + lag(runMean(Cl(UAL)),5) + lag(EMA(Delt(Cl(UAL))),5) + lag(RSI(Cl(UAL)),5))
Tdata.train_UAL <- as.data.frame(modelData(data.model_UAL, data.window=c('2010-01-01','2019-07-31')))
Tdata.eval_UAL <- na.omit(as.data.frame(modelData(data.model_UAL, data.window=c('2019-08-01','2019-08-31'))))
Tform_UAL <- as.formula('Cl.UAL ~ .')
#Performance Estimation tasks
exp_UAL <- performanceEstimation(
PredTask(Tform_UAL, Tdata.train_UAL, 'UAL'),
c(Workflow('standardWF', wfID="standSVM",
learner='svm',learner.pars=list(cost=10,gamma=0.01)),
Workflow('timeseriesWF', wfID="slideSVM",
type="slide", relearn.step=90,
learner='svm',learner.pars=list(cost=10,gamma=0.01)),
Workflow(learner="randomForest",learner.pars=list(ntree=100),
wfID="rf420"),
Workflow(learner="rpart",.fullOutput=TRUE),
Workflow(learner= 'lm'),
Workflow(learner='gbm', learner.pars=list(n.trees=25, cv.folds=5), predictor='gbm.predict'),
Workflow(learner="rpartXse"),
Workflow(learner="earth",learner.pars=list(thres=0.001)),
Workflow(learner='nnet', learner.pars=list(linout=TRUE, trace=FALSE, maxit=1000, size=10, decay=0.01))
),
EstimationTask(metrics="nmae",
method=MonteCarlo(nReps=5,szTrain=0.5,szTest=0.25)))
##
##
## ##### PERFORMANCE ESTIMATION USING MONTE CARLO #####
##
## ** PREDICTIVE TASK :: UAL
##
## ++ MODEL/WORKFLOW :: standSVM
## Task for estimating nmae using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 1257 ; test size = 594
## Repetition 2
## start test = 1551 ; test size = 594
## Repetition 3
## start test = 1559 ; test size = 594
## Repetition 4
## start test = 1560 ; test size = 594
## Repetition 5
## start test = 1699 ; test size = 594
##
##
##
## ++ MODEL/WORKFLOW :: slideSVM
## Task for estimating nmae using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 1257 ; test size = 594
## Repetition 2
## start test = 1551 ; test size = 594
## Repetition 3
## start test = 1559 ; test size = 594
## Repetition 4
## start test = 1560 ; test size = 594
## Repetition 5
## start test = 1699 ; test size = 594
##
##
##
## ++ MODEL/WORKFLOW :: rf420
## Task for estimating nmae using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 1257 ; test size = 594
## Repetition 2
## start test = 1551 ; test size = 594
## Repetition 3
## start test = 1559 ; test size = 594
## Repetition 4
## start test = 1560 ; test size = 594
## Repetition 5
## start test = 1699 ; test size = 594
##
##
##
## ++ MODEL/WORKFLOW :: rpart
## Task for estimating nmae using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 1257 ; test size = 594
## Repetition 2
## start test = 1551 ; test size = 594
## Repetition 3
## start test = 1559 ; test size = 594
## Repetition 4
## start test = 1560 ; test size = 594
## Repetition 5
## start test = 1699 ; test size = 594
##
##
##
## ++ MODEL/WORKFLOW :: lm
## Task for estimating nmae using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 1257 ; test size = 594
## Repetition 2
## start test = 1551 ; test size = 594
## Repetition 3
## start test = 1559 ; test size = 594
## Repetition 4
## start test = 1560 ; test size = 594
## Repetition 5
## start test = 1699 ; test size = 594
##
##
##
## ++ MODEL/WORKFLOW :: gbm
## Task for estimating nmae using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 1257 ; test size = 594
## Distribution not specified, assuming gaussian ...
## Repetition 2
## start test = 1551 ; test size = 594
## Distribution not specified, assuming gaussian ...
## Repetition 3
## start test = 1559 ; test size = 594
## Distribution not specified, assuming gaussian ...
## Repetition 4
## start test = 1560 ; test size = 594
## Distribution not specified, assuming gaussian ...
## Repetition 5
## start test = 1699 ; test size = 594
## Distribution not specified, assuming gaussian ...
##
##
##
## ++ MODEL/WORKFLOW :: rpartXse
## Task for estimating nmae using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 1257 ; test size = 594
## Repetition 2
## start test = 1551 ; test size = 594
## Repetition 3
## start test = 1559 ; test size = 594
## Repetition 4
## start test = 1560 ; test size = 594
## Repetition 5
## start test = 1699 ; test size = 594
##
##
##
## ++ MODEL/WORKFLOW :: earth
## Task for estimating nmae using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 1257 ; test size = 594
## Repetition 2
## start test = 1551 ; test size = 594
## Repetition 3
## start test = 1559 ; test size = 594
## Repetition 4
## start test = 1560 ; test size = 594
## Repetition 5
## start test = 1699 ; test size = 594
##
##
##
## ++ MODEL/WORKFLOW :: nnet
## Task for estimating nmae using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 1257 ; test size = 594
## Repetition 2
## start test = 1551 ; test size = 594
## Repetition 3
## start test = 1559 ; test size = 594
## Repetition 4
## start test = 1560 ; test size = 594
## Repetition 5
## start test = 1699 ; test size = 594
summary(exp_UAL)
##
## == Summary of a Monte Carlo Performance Estimation Experiment ==
##
## Task for estimating nmae using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
##
## * Predictive Tasks :: UAL
## * Workflows :: standSVM, slideSVM, rf420, rpart, lm, gbm, rpartXse, earth, nnet
##
## -> Task: UAL
## *Workflow: standSVM
## nmae
## avg 0.157833124
## std 0.026941992
## med 0.167957847
## iqr 0.007190916
## min 0.111358621
## max 0.180744908
## invalid 0.000000000
##
## *Workflow: slideSVM
## nmae
## avg 0.101754574
## std 0.006367492
## med 0.104662230
## iqr 0.006503304
## min 0.091459718
## max 0.106584051
## invalid 0.000000000
##
## *Workflow: rf420
## nmae
## avg 0.32083274
## std 0.08421363
## med 0.33302084
## iqr 0.01121475
## min 0.18835636
## max 0.42302137
## invalid 0.00000000
##
## *Workflow: rpart
## nmae
## avg 0.24044454
## std 0.04136346
## med 0.21807413
## iqr 0.03103832
## min 0.20869148
## max 0.30932475
## invalid 0.00000000
##
## *Workflow: lm
## nmae
## avg 0.10544083
## std 0.01272695
## med 0.11217347
## iqr 0.01317972
## min 0.08487747
## max 0.11445598
## invalid 0.00000000
##
## *Workflow: gbm
## nmae
## avg 0.412070861
## std 0.087038230
## med 0.414568025
## iqr 0.009477313
## min 0.288430415
## max 0.534397506
## invalid 0.000000000
##
## *Workflow: rpartXse
## nmae
## avg 0.24019676
## std 0.04223699
## med 0.21968875
## iqr 0.02227730
## min 0.20989495
## max 0.31304038
## invalid 0.00000000
##
## *Workflow: earth
## nmae
## avg 0.140842380
## std 0.017473792
## med 0.143141384
## iqr 0.003230122
## min 0.114457657
## max 0.163506613
## invalid 0.000000000
##
## *Workflow: nnet
## nmae
## avg 0.28170024
## std 0.07803857
## med 0.23462402
## iqr 0.11160134
## min 0.21989389
## max 0.39168136
## invalid 0.00000000
plot(exp_UAL)
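As with the other stocks, the LM and the sliding SVM are the best-performing models for UAL at every lag (1 to 5 days). In the run shown above, the sliding SVM averages an nmae of 0.1018 and the LM 0.1054; the next-best workflow, earth, averages 0.1408.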
plot_grid(plot(exp), plot(exp_ALK), plot(exp_DAL), plot(exp_LUV), plot(exp_UAL))
Overall, the sliding SVM and the LM outperform every other model for each stock for each lag period (i.e. 1 to 5 days).
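As a quick check of this claim, the sketch below ranks the workflows by average nmae for the three stocks whose summaries are printed above (DAL, LUV and UAL). The values are copied from those summaries and rounded to four decimals; the object names `avg_nmae` and `top2` are illustrative.

```r
# Average nmae per workflow, copied (rounded) from the summaries above.
workflows <- c("standSVM", "slideSVM", "rf420", "rpart", "lm",
               "gbm", "rpartXse", "earth", "nnet")
avg_nmae <- matrix(
  c(0.1001, 0.0755, 0.1974, 0.2151, 0.0763, 0.3374, 0.1443, 0.1072, 0.1161,
    0.1430, 0.0664, 0.3189, 0.3683, 0.0597, 0.4384, 0.3100, 0.1114, 0.2305,
    0.1578, 0.1018, 0.3208, 0.2404, 0.1054, 0.4121, 0.2402, 0.1408, 0.2817),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("DAL", "LUV", "UAL"), workflows)
)

# For each stock, the two workflows with the lowest average nmae.
top2 <- apply(avg_nmae, 1, function(x) names(sort(x))[1:2])
print(top2)
```

For all three stocks the two lowest averages belong to `slideSVM` and `lm`, in varying order.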
Given that the results are fairly consistent across stocks and prediction windows and that all stocks belong to the same industry, we believe it is reasonable to use a single model for all the predictions.
In our view, both the sliding SVM and the LM perform strongly, and either would merit selection for prediction purposes. We decided to go for the LM model for the following reasons:
Note: In the appendices section of this notebook, we have included the code we used to optimize the parameters of the sliding SVM, together with the code showing how we would have used it to make predictions had it been our selected modeling approach.
First, we create a sequence of the next 5 trading days starting today, i.e. excluding weekends and holidays per the NYSE calendar.
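The idea can be sketched in base R without a calendar package. The helper `next_trading_days` and its `holidays` argument are hypothetical names used only for this illustration, and the holiday vector is left empty here rather than filled from the NYSE calendar.

```r
# Next n trading days on or after `start`, skipping weekends and any dates
# listed in `holidays` (hypothetical helper; holidays are empty by default).
next_trading_days <- function(start, n = 5, holidays = as.Date(character(0))) {
  candidates <- seq(as.Date(start), by = "day", length.out = 3 * n + 7)
  wday <- as.POSIXlt(candidates)$wday          # 0 = Sunday, 6 = Saturday
  keep <- !(wday %in% c(0, 6)) & !(candidates %in% holidays)
  head(candidates[keep], n)
}

# Starting on Saturday 2019-09-14 gives Monday 2019-09-16 to Friday 2019-09-20.
days <- next_trading_days("2019-09-14")
print(days)
```

Generating more candidate days than needed and then keeping the first five business days mirrors the approach of starting from a longer daily sequence and filtering it.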
Next, for each stock we create one data frame per prediction horizon. For instance, for the 5-day model, we regress the close price on the technical indicators lagged by 5 days (i.e. the most recent information that would be available at prediction time).
Once we have the data frames, we can fit an LM model to each of them.
Next, we make the predictions. Note that the most recent data available is that of the most recent trading day. We feed that information to each of the five models, whose fitted coefficients (the “weights”) yield the predicted close price for the next 1, 2, 3, 4 and 5 days.
We have included all the code below. The process is the same for the 5 stocks:
holidays <- holidayNYSE()
daysSeq <- as.timeDate(seq(from = Sys.Date(), length=15, by = "day"))
daysSeq <- head(daysSeq[isBizday(daysSeq, holidays = holidays, wday = 1:5)],5)
for (stock in stocks) {
for (i in 1:5) {
namdf <- paste0('t_',stock,'_DF_Lag', i)
contdf <- data.frame(Cl(get(stock)), lag(myATR(get(stock)), i), lag(mySMI(get(stock)),i), lag(myADX(get(stock)),i), lag(mySAR(get(stock)),i), lag(runMean(Cl(get(stock))),i), lag(EMA(Delt(Cl(get(stock)))),i), lag(RSI(Cl(get(stock))),i))
colnames(contdf) <- c('Close', 'ATR', 'SMI', 'ADX', 'SAR', 'runMean', 'EMA', 'RSI')
assign(namdf, contdf)
namlm <- paste0('t_',stock,'_lmlag',i)
contlm <- lm(formula = Close ~., data = get(paste0('t_',stock,'_DF_Lag',i)))
assign(namlm, contlm)
namtestpred <- paste0('t_',stock,'_DF_testpred')
conttestpred <- filter(data.frame(Cl(get(stock)), myATR(get(stock)), mySMI(get(stock)), myADX(get(stock)), mySAR(get(stock)), runMean(Cl(get(stock))), EMA(Delt(Cl(get(stock)))), RSI(Cl(get(stock)))), row_number() == n())
colnames(conttestpred) <- c('Close', 'ATR', 'SMI', 'ADX', 'SAR', 'runMean', 'EMA', 'RSI')
assign(namtestpred, conttestpred)
nampred <- paste0('t_',stock,'_lmpredlag',i)
contpred <- predict(get(paste0('t_',stock,'_lmlag',i)), get(paste0('t_',stock,'_DF_testpred')), type = 'response')
assign(nampred, contpred)
}
nampredictions <- paste0('t_',stock,'_preds')
contpredictions <- c(get(paste0('t_',stock,'_lmpredlag1')), get(paste0('t_',stock,'_lmpredlag2')), get(paste0('t_',stock,'_lmpredlag3')), get(paste0('t_',stock,'_lmpredlag4')), get(paste0('t_',stock,'_lmpredlag5')))
assign(nampredictions, contpredictions)
namxts <- paste0('t_',stock,'_xts')
contxts <- xts(x = get(paste0('t_',stock,'_preds')), order.by = as.Date(daysSeq))
assign(namxts,contxts)
namfullts <- paste0('t_',stock,'_fullts')
contfullts <- rbind(get(stock)[,4], get(paste0('t_',stock,'_xts')))
assign(namfullts,contfullts)
}
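The loop above fits one linear model per horizon and feeds each of them the latest available indicator values. The following is a minimal sketch of that direct multi-horizon idea on synthetic data, using base R only. The series `close`, the single indicator `ind` and the helper `lag_k` are illustrative assumptions and do not reproduce the notebook's technical indicators.

```r
set.seed(1)
n     <- 200
close <- 50 + cumsum(rnorm(n))   # synthetic closing prices
ind   <- close + rnorm(n)        # synthetic indicator observed each day

# Shift a vector forward by k positions so that row t holds the value from t - k.
lag_k <- function(x, k) c(rep(NA, k), head(x, -k))

horizons <- 1:5
preds <- sapply(horizons, function(k) {
  train <- data.frame(Close = close, Ind = lag_k(ind, k))
  fit <- lm(Close ~ Ind, data = train)   # lm drops the k leading NA rows
  # Predict k days ahead from the most recent indicator value.
  unname(predict(fit, newdata = data.frame(Ind = ind[n])))
})
names(preds) <- paste0("t+", horizons)
print(preds)
```

Each horizon gets its own model, so no forecast is fed back in as an input; the price is that five separate fits are needed per stock.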
Finally, the table below contains the predictions for the next 5 days for the 5 stocks.
kable(data.frame(Date=as.Date(daysSeq), AAL_Predictions=t_AAL_preds, ALK_Predictions=t_ALK_preds, DAL_Predictions=t_DAL_preds, LUV_Predictions=t_LUV_preds, UAL_Predictions=t_UAL_preds))
Date | AAL_Predictions | ALK_Predictions | DAL_Predictions | LUV_Predictions | UAL_Predictions |
---|---|---|---|---|---|
2019-09-16 | 29.87460 | 65.26187 | 59.70945 | 54.93033 | 89.71158 |
2019-09-17 | 29.85552 | 65.26870 | 59.68034 | 54.95730 | 89.67050 |
2019-09-18 | 29.82789 | 65.27375 | 59.66205 | 54.99211 | 89.66163 |
2019-09-19 | 29.79108 | 65.27668 | 59.63059 | 55.01707 | 89.62829 |
2019-09-20 | 29.75145 | 65.29509 | 59.60191 | 55.03467 | 89.62311 |
We tried a number of variants of the relearn step and the cost (with gamma held at 0.01 in the experiment below) and concluded that the combination of a relearn step of 30 days, cost=10 and gamma=0.01 showed the best performance.
#Performance Estimation tasks
exp_UAL1 <- performanceEstimation(
PredTask(Tform_UAL, Tdata.train_UAL, 'UAL'),
workflowVariants('timeseriesWF', wfID="slideSVM",
type="slide", relearn.step=c(30, 60, 90),
learner='svm',learner.pars=list(cost=c(1,7,10),gamma=0.01)),
EstimationTask(metrics="nmae",
method=MonteCarlo(nReps=5,szTrain=0.5,szTest=0.25)))
##
##
## ##### PERFORMANCE ESTIMATION USING MONTE CARLO #####
##
## ** PREDICTIVE TASK :: UAL
##
## ++ MODEL/WORKFLOW :: svm.v1
## Task for estimating nmae using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 1257 ; test size = 594
## Repetition 2
## start test = 1551 ; test size = 594
## Repetition 3
## start test = 1559 ; test size = 594
## Repetition 4
## start test = 1560 ; test size = 594
## Repetition 5
## start test = 1699 ; test size = 594
##
##
##
## ++ MODEL/WORKFLOW :: svm.v2
## Task for estimating nmae using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 1257 ; test size = 594
## Repetition 2
## start test = 1551 ; test size = 594
## Repetition 3
## start test = 1559 ; test size = 594
## Repetition 4
## start test = 1560 ; test size = 594
## Repetition 5
## start test = 1699 ; test size = 594
##
##
##
## ++ MODEL/WORKFLOW :: svm.v3
## Task for estimating nmae using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 1257 ; test size = 594
## Repetition 2
## start test = 1551 ; test size = 594
## Repetition 3
## start test = 1559 ; test size = 594
## Repetition 4
## start test = 1560 ; test size = 594
## Repetition 5
## start test = 1699 ; test size = 594
##
##
##
## ++ MODEL/WORKFLOW :: svm.v4
## Task for estimating nmae using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 1257 ; test size = 594
## Repetition 2
## start test = 1551 ; test size = 594
## Repetition 3
## start test = 1559 ; test size = 594
## Repetition 4
## start test = 1560 ; test size = 594
## Repetition 5
## start test = 1699 ; test size = 594
##
##
##
## ++ MODEL/WORKFLOW :: svm.v5
## Task for estimating nmae using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 1257 ; test size = 594
## Repetition 2
## start test = 1551 ; test size = 594
## Repetition 3
## start test = 1559 ; test size = 594
## Repetition 4
## start test = 1560 ; test size = 594
## Repetition 5
## start test = 1699 ; test size = 594
##
##
##
## ++ MODEL/WORKFLOW :: svm.v6
## Task for estimating nmae using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 1257 ; test size = 594
## Repetition 2
## start test = 1551 ; test size = 594
## Repetition 3
## start test = 1559 ; test size = 594
## Repetition 4
## start test = 1560 ; test size = 594
## Repetition 5
## start test = 1699 ; test size = 594
##
##
##
## ++ MODEL/WORKFLOW :: svm.v7
## Task for estimating nmae using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 1257 ; test size = 594
## Repetition 2
## start test = 1551 ; test size = 594
## Repetition 3
## start test = 1559 ; test size = 594
## Repetition 4
## start test = 1560 ; test size = 594
## Repetition 5
## start test = 1699 ; test size = 594
##
##
##
## ++ MODEL/WORKFLOW :: svm.v8
## Task for estimating nmae using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 1257 ; test size = 594
## Repetition 2
## start test = 1551 ; test size = 594
## Repetition 3
## start test = 1559 ; test size = 594
## Repetition 4
## start test = 1560 ; test size = 594
## Repetition 5
## start test = 1699 ; test size = 594
##
##
##
## ++ MODEL/WORKFLOW :: svm.v9
## Task for estimating nmae using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
## Repetition 1
## start test = 1257 ; test size = 594
## Repetition 2
## start test = 1551 ; test size = 594
## Repetition 3
## start test = 1559 ; test size = 594
## Repetition 4
## start test = 1560 ; test size = 594
## Repetition 5
## start test = 1699 ; test size = 594
summary(exp_UAL1)
##
## == Summary of a Monte Carlo Performance Estimation Experiment ==
##
## Task for estimating nmae using
## 5 repetitions Monte Carlo Simulation using:
## seed = 1234
## train size = 0.5 x NROW(DataSet)
## test size = 0.25 x NROW(DataSet)
##
## * Predictive Tasks :: UAL
## * Workflows :: svm.v1, svm.v2, svm.v3, svm.v4, svm.v5, svm.v6, svm.v7, svm.v8, svm.v9
##
## -> Task: UAL
## *Workflow: svm.v1
## nmae
## avg 0.10257327
## std 0.00718602
## med 0.10654621
## iqr 0.01004526
## min 0.09291279
## max 0.10938238
## invalid 0.00000000
##
## *Workflow: svm.v2
## nmae
## avg 0.107147430
## std 0.006234698
## med 0.110773807
## iqr 0.005164900
## min 0.096786215
## max 0.111477722
## invalid 0.000000000
##
## *Workflow: svm.v3
## nmae
## avg 0.109139979
## std 0.006672939
## med 0.113286464
## iqr 0.008481079
## min 0.099132420
## max 0.114077268
## invalid 0.000000000
##
## *Workflow: svm.v4
## nmae
## avg 0.096583114
## std 0.005518074
## med 0.098844393
## iqr 0.004638806
## min 0.088229580
## max 0.102594696
## invalid 0.000000000
##
## *Workflow: svm.v5
## nmae
## avg 0.100716373
## std 0.005113150
## med 0.102934344
## iqr 0.001530555
## min 0.091655892
## max 0.103721806
## invalid 0.000000000
##
## *Workflow: svm.v6
## nmae
## avg 0.102140986
## std 0.006156431
## med 0.105511998
## iqr 0.005060912
## min 0.091863185
## max 0.106426848
## invalid 0.000000000
##
## *Workflow: svm.v7
## nmae
## avg 0.096489131
## std 0.005040812
## med 0.098997864
## iqr 0.004962818
## min 0.088866777
## max 0.101481422
## invalid 0.000000000
##
## *Workflow: svm.v8
## nmae
## avg 0.100056720
## std 0.005089199
## med 0.102229307
## iqr 0.002112098
## min 0.091093952
## max 0.103115559
## invalid 0.000000000
##
## *Workflow: svm.v9
## nmae
## avg 0.101754574
## std 0.006367492
## med 0.104662230
## iqr 0.006503304
## min 0.091459718
## max 0.106584051
## invalid 0.000000000
plot(exp_UAL1)
topPerformer(exp_UAL1, 'nmae','UAL')
## Workflow Object:
## Workflow ID :: svm.v7
## Workflow Function :: timeseriesWF
## Parameter values:
## learner.pars -> cost=10 gamma=0.01
## type -> slide
## relearn.step -> 30
## learner -> svm
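The nine variants come from crossing three relearn steps with three cost values. The sketch below enumerates such a grid with `expand.grid`. The assumption that the variants are numbered in this same order (relearn step varying fastest) is inferred from the output above: `svm.v7` is reported with cost=10 and relearn.step=30, and the `svm.v9` summary matches the `slideSVM` workflow (cost=10, relearn.step=90) from the main UAL experiment. The `wfID` column is added here only for illustration.

```r
# Enumerate the 3 x 3 parameter grid; the first factor varies fastest.
variants <- expand.grid(relearn.step = c(30, 60, 90),
                        cost = c(1, 7, 10),
                        gamma = 0.01)
# Assumed numbering, inferred from the experiment output.
variants$wfID <- paste0("svm.v", seq_len(nrow(variants)))
print(variants)

# The variant reported as top performer.
best <- variants[variants$wfID == "svm.v7", ]
print(best)
```

Read this way, `svm.v4` (cost=7, relearn.step=30) has an average nmae almost identical to that of `svm.v7`, so the shorter relearn step appears to matter more than the step from cost 7 to cost 10.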
While we did not use this prediction, we were able to implement a prediction using the Slide SVM. We have included the code below for illustrative purposes.
UAL_DF_SVM_lag5 <- data.frame(Cl(UAL), lag(myATR(UAL), 5), lag(mySMI(UAL),5), lag(myADX(UAL),5), lag(mySAR(UAL),5), lag(runMean(Cl(UAL)),5), lag(EMA(Delt(Cl(UAL))),5), lag(RSI(Cl(UAL)),5))
colnames(UAL_DF_SVM_lag5) <- c('Close', 'ATR', 'SMI', 'ADX', 'SAR', 'runMean', 'EMA', 'RSI')
relearn.step <- 30
n <- NROW(UAL_DF_SVM_lag5)
train.size_UAL <- NROW(Tdata.train_UAL)
sts <- seq(train.size_UAL+1, n, by = relearn.step)
preds_SVM_UAL <- vector()
for (s in sts) {
tr <- UAL_DF_SVM_lag5[(s-train.size_UAL):(s-1),]
colnames(tr) <- c('Close', 'ATR', 'SMI', 'ADX', 'SAR', 'runMean', 'EMA', 'RSI')
ts <- UAL_DF_SVM_lag5[s:min((s+relearn.step-1),n),]
colnames(ts) <- c('Close', 'ATR', 'SMI', 'ADX', 'SAR', 'runMean', 'EMA', 'RSI')
m <- do.call('svm', c(list(Close ~., tr), list(cost=10,gamma=0.01)))
}
UAL_svmpredlag5 <- predict(m, t_UAL_DF_testpred, type = 'response')
UAL_svmpredlag5
## 1
## 90.52084
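The loop above refits the model on each sliding window but keeps only the last fit. For reference, the sketch below shows the full sliding-window scheme, in which each refitted model predicts the block of observations that follows its training window and the predictions are collected. To stay self-contained it uses synthetic data and swaps the SVM for `lm`; the names `dat`, `train_size` and `relearn_step` are illustrative.

```r
set.seed(42)
n   <- 300
x   <- cumsum(rnorm(n))
dat <- data.frame(y = 2 * x + rnorm(n), x = x)   # synthetic series

train_size   <- 150   # width of the sliding training window
relearn_step <- 30    # refit the model every 30 observations
starts <- seq(train_size + 1, n, by = relearn_step)

preds <- numeric(0)
for (s in starts) {
  train <- dat[(s - train_size):(s - 1), ]             # most recent window
  test  <- dat[s:min(s + relearn_step - 1, n), ]       # next block to predict
  fit   <- lm(y ~ x, data = train)                     # lm stands in for svm
  preds <- c(preds, predict(fit, newdata = test))
}

# Mean absolute error over all out-of-window predictions.
mae <- mean(abs(dat$y[(train_size + 1):n] - preds))
print(mae)
```

Because the window slides rather than grows, every model is trained on the same number of observations and older data is progressively discarded.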