Goal of assignment

The goal of this assignment is to develop a web app with Shiny that helps a user by presenting predictions of the future evolution of an asset's price. As inputs, the user shall be able to select a stock ticker from a pre-defined set and a prediction model from a fixed set of available techniques.

Analytical approach

This notebook addresses the predictive modelling aspect of the assignment; the Shiny app is built subsequently and separately. Specifically, we assess the best modelling approach for predicting the price of 5 different stocks using at least 3 prediction models.

Once the best modelling approach is selected, it will be implemented in the Shiny app.

Selected stocks

First, we selected 5 different stocks. We decided to go for the 5 airlines (per the GICS classification) in the S&P 500. The reasons for this choice are:

We thought it was an interesting sector
We were interested to know whether the same modelling approach would work for all of them, given the industry similarities
These were the only 5 airlines in the S&P 500, so we captured the index's whole airline industry
Being listed in the S&P 500 makes them liquid, heavily traded stocks, which makes them well suited to intraday or short-term prediction

The 5 stocks selected are:

Alaska Air Group - ALK
American Airlines Group - AAL
Delta Air Lines - DAL
Southwest Airlines - LUV
United Airlines Holdings - UAL

Model selection for AAL

# Creating a vector of packages used within.  
packages <- c('dplyr', 'DMwR2', 'ggplot2', 'devtools', 'ROCR', 'performanceEstimation', 'UBL', 'ranger', 'e1071', 'xts', 'quantmod', 'TTR', 'earth', 'MASS', 'rpart', 'here', 'randomForest', 'knitr', 'gbm', 'nnet', 'cowplot', 'bizdays', 'timeDate')
# Checking for package installations on the system and installing if not found.
if (length(setdiff(packages, rownames(installed.packages()))) > 0) {
  install.packages(setdiff(packages, rownames(installed.packages())))  
}
# Including the packages for use.
for(package in packages){
  library(package, character.only = TRUE)
}

#Ensure wd is set to current location by using here()
setwd(here::here())

#Check package version. Has to be 1.1.1.
# packageDescription('performanceEstimation')
# install_github("ltorgo/performanceEstimation",ref='develop')

Pull stock data for the selected stocks

We use the quantmod package to pull the price data for the 5 selected stocks, looping over the tickers.

#Get stock data using a for loop
stocks <- c('ALK', 'AAL', 'DAL', 'LUV', 'UAL')

for (stock in stocks) {
  assign(stock, getSymbols(stock, from = '2010-01-01', auto.assign = FALSE))
}

Visualize the performance of the selected stocks

We plot each stock's price series to make sure the data was pulled correctly.

#Loop through stocks
for (stock in stocks) {
  candleChart(xts::last(get(stock),'10 years'),theme='white',TA=NULL, name = paste(stock,'last 10 years'))
}

Create functions to produce predictors for each stock

In order to predict stock prices, we need to determine which predictors to use. One option would be a pure time series approach based on past prices alone. Instead, we calculated a number of technical indicators provided by the TTR and quantmod packages. Technical indicators are numeric summaries that reflect some property of the price time series, and so capture aspects of its dynamics.

#True Range / Average True Range - volatility of the H-L-C series
myATR <- function(x) ATR(HLC(x))[,'atr']
#Stochastic Oscillator / Stochastic Momentum Index - relates the close to the midpoint of the H-L range in previous days
mySMI <- function(x) SMI(HLC(x))[, "SMI"]
#Welles Wilder's Directional Movement Index - % of true range that is up/down
myADX <- function(x) ADX(HLC(x))[,'ADX']
#Aroon - identifies starting trends, i.e. how long since the highest/lowest point in the last n periods
myAroon <- function(x) aroon(cbind(Hi(x),Lo(x)))$oscillator
#Bollinger Bands
MyBB <- function(x) BBands(HLC(x))[,'pctB']
#Chaikin Volatility - percent difference between series
MyChaikinVol <- function(x) Delt(chaikinVolatility(cbind(Hi(x),Lo(x))))[,1]
#CLV (Close Location Value) - relates the day's close to its trading range
MyCLV <- function(x) EMA(CLV(HLC(x)))[,1]
#Arms' Ease of Movement Value - emphasizes days where the stock moves easily and de-emphasizes days where it does not
myEMV <- function(x) EMV(cbind(Hi(x),Lo(x)),Vo(x))[,2]
#MACD Oscillator - compares a fast MA of a series with a slow MA of the same series
myMACD <- function(x) MACD(Cl(x))[,2]
#Money Flow Index - ratio of positive and negative money flow over time
myMFI <- function(x) MFI(HLC(x), Vo(x))
#Parabolic Stop-and-Reverse - calculates a trailing stop
mySAR <- function(x) SAR(cbind(Hi(x),Cl(x)))[,1]
#Volatility estimator/indicator. Garman-Klass method.
myVolat <- function(x) volatility(OHLC(x),calc="garman")[,1]
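To make the idea of a "numeric summary of the price series" concrete, here is a minimal base R sketch of two simple indicators on a made-up closing-price vector: a k-day arithmetic return (the quantity Delt computes for a single series) and a trailing moving average (comparable in spirit to runMean). The helper names `k_day_return` and `trailing_mean` and the prices are illustrative assumptions, not part of the notebook's pipeline.

```r
# Toy closing prices (made-up values, for illustration only)
close <- c(10, 11, 12, 11, 13, 14, 13, 15)

# k-day arithmetic return: (P_t - P_{t-k}) / P_{t-k}; the first k values are NA
k_day_return <- function(p, k = 1) {
  n <- length(p)
  c(rep(NA_real_, k), (p[(k + 1):n] - p[1:(n - k)]) / p[1:(n - k)])
}

# Trailing simple moving average over the last n observations
trailing_mean <- function(p, n = 3) {
  out <- rep(NA_real_, length(p))
  for (i in n:length(p)) out[i] <- mean(p[(i - n + 1):i])
  out
}

ret1 <- k_day_return(close, k = 1)
sma3 <- trailing_mean(close, n = 3)
```

Both helpers only look backwards in time, which is the property that makes an indicator usable as a predictor once it is lagged appropriately.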

Refine predictors

We have identified a number of technical indicators that could potentially help predict the stock price. Next, we gauge the importance of each predictor to decide whether to keep all of them.

To decide which predictors to keep, we used a Random Forest model to estimate variable importance and set a threshold below which a variable is discarded, i.e. based on the % increase in the error if we were to exclude that variable.

We plotted the importance of the different predictors in the fitted model.

We also show the % increase in MSE (%IncMSE) attributed to each indicator in a table.

data.model_AAL <- specifyModel(Cl(AAL) ~ Delt(Cl(AAL),k=1:3) + myATR(AAL) + mySMI(AAL) + myADX(AAL) + myAroon(AAL) + MyBB(AAL) + MyChaikinVol(AAL) + MyCLV(AAL) + myEMV(AAL) + myVolat(AAL) + myMACD(AAL) + myMFI(AAL) + mySAR(AAL) + runMean(Cl(AAL)) + CMO(Cl(AAL)) + EMA(Delt(Cl(AAL))) + RSI(Cl(AAL)) + runMean(Cl(AAL)) + runSD(Cl(AAL)))

test <- data.frame(Cl(AAL),Delt(Cl(AAL),k=1:3), myATR(AAL), lag(myATR(AAL), 5), mySMI(AAL), myADX(AAL), myAroon(AAL), MyBB(AAL), MyChaikinVol(AAL), MyCLV(AAL), myEMV(AAL), myVolat(AAL), myMACD(AAL), myMFI(AAL), mySAR(AAL), runMean(Cl(AAL)), CMO(Cl(AAL)), EMA(Delt(Cl(AAL))), RSI(Cl(AAL)), runMean(Cl(AAL)), runSD(Cl(AAL)))

set.seed(1234)
rf <- buildModel(x = data.model_AAL, method = 'randomForest', training.per=c('2010-01-01','2019-08-31'), ntree = 1000, importance = TRUE)

varImpPlot(rf@fitted.model, type = 1)

imp <- randomForest::importance(rf@fitted.model, type = 1)
kable(imp)

Predictor	%IncMSE
Delt.Cl.AAL.k.1.3.Delt.1.arithmetic	6.7423326
Delt.Cl.AAL.k.1.3.Delt.2.arithmetic	15.0454172
Delt.Cl.AAL.k.1.3.Delt.3.arithmetic	15.0006270
myATR.AAL	21.4640305
mySMI.AAL	23.7441035
myADX.AAL	26.7947513
myAroon.AAL	18.8775610
MyBB.AAL	29.5528169
MyChaikinVol.AAL	0.2256495
MyCLV.AAL	16.4688365
myEMV.AAL	9.0908267
myVolat.AAL	19.1167234
myMACD.AAL	18.1530301
myMFI.AAL	19.7267598
mySAR.AAL	27.6380886
runMean.Cl.AAL	37.1744845
CMO.Cl.AAL	26.8975905
EMA.Delt.Cl.AAL	23.2703432
RSI.Cl.AAL	33.9688671
runSD.Cl.AAL	11.9443934
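The intuition behind a %IncMSE score is that scrambling an informative predictor should make the error rise noticeably, while scrambling an uninformative one should barely matter. The sketch below illustrates that permutation idea in base R with a linear model and a holdout set on synthetic data. It is a simplification under stated assumptions: randomForest computes its measure on out-of-bag samples per tree and scales it, so the numbers here are not comparable to the table above.

```r
set.seed(1)
n <- 400
# Synthetic data: y depends strongly on x1, weakly on x2, not at all on x3
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 3 * d$x1 + 0.5 * d$x2 + rnorm(n, sd = 0.5)

train <- d[1:300, ]
hold  <- d[301:400, ]
fit   <- lm(y ~ x1 + x2 + x3, data = train)

mse <- function(model, data) mean((data$y - predict(model, data))^2)
base_mse <- mse(fit, hold)

# Percentage increase in holdout MSE after shuffling one predictor at a time
perm_importance <- sapply(c("x1", "x2", "x3"), function(v) {
  shuffled <- hold
  shuffled[[v]] <- sample(shuffled[[v]])
  100 * (mse(fit, shuffled) - base_mse) / base_mse
})
round(perm_importance, 1)
```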

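The mean importance in the table above is about 20%, and we needed to make predictions up to 5 days ahead for 5 different stocks, so we only considered predictors with a %IncMSE above 20%. The filter below returns nine predictors. Of these, the seven carried into the models that follow are listed here (MyBB and CMO also pass the threshold but are not used in the model specifications below):

myATR
mySMI
myADX
mySAR
runMean
EMA Delt
RSI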

kable(rownames(imp)[which(imp>20)])

x
myATR.AAL
mySMI.AAL
myADX.AAL
MyBB.AAL
mySAR.AAL
runMean.Cl.AAL
CMO.Cl.AAL
EMA.Delt.Cl.AAL
RSI.Cl.AAL
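The threshold step can be re-traced by hand from the printed table. The sketch below copies the %IncMSE values (rounded to two decimals) into a plain named vector, then computes the mean and applies the 20% cut-off. The short names are shorthand for the row labels and `imp_pct` is an illustrative variable name, not an object from the notebook.

```r
# %IncMSE values copied (rounded) from the importance table above
imp_pct <- c(Delt1 = 6.74, Delt2 = 15.05, Delt3 = 15.00, myATR = 21.46,
             mySMI = 23.74, myADX = 26.79, myAroon = 18.88, MyBB = 29.55,
             MyChaikinVol = 0.23, MyCLV = 16.47, myEMV = 9.09, myVolat = 19.12,
             myMACD = 18.15, myMFI = 19.73, mySAR = 27.64, runMean = 37.17,
             CMO = 26.90, EMA.Delt = 23.27, RSI = 33.97, runSD = 11.94)

mean_imp <- mean(imp_pct)                 # average importance across predictors
kept     <- names(imp_pct)[imp_pct > 20]  # predictors above the 20% cut-off
```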

For further analyses, we will use these predictors in our data models. Given that our 5 stocks operate within the same industry, we have assumed that these conclusions apply to the other stocks as well. This way, we will have a consistent group of predictors for all models.

Create data model with refined list of predictors

Next, we specify the model and predictors. Specifically, we try to predict the closing price of each stock. Because we need to make predictions 1 to 5 days ahead, the technical indicators for the prediction date itself will not be available at prediction time, and using them would be leakage. We therefore lagged the predictors by 1 to 5 days, since that information would actually be available when making the predictions.

We split the data into a training set, which runs from January 2010 to July 2019, and a test set, which covers August 2019.

data.model_AAL <- specifyModel(Cl(AAL) ~ lag(myATR(AAL), 5) + lag(mySMI(AAL),5) + lag(myADX(AAL),5) + lag(mySAR(AAL),5) + lag(runMean(Cl(AAL)),5) + lag(EMA(Delt(Cl(AAL))),5) + lag(RSI(Cl(AAL)),5))
Tdata.train_AAL <- as.data.frame(modelData(data.model_AAL, data.window=c('2010-01-01','2019-07-31')))
Tdata.eval_AAL <- na.omit(as.data.frame(modelData(data.model_AAL, data.window=c('2019-08-01','2019-08-31'))))
Tform_AAL <- as.formula('Cl.AAL ~ .')
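As a small illustration of the two ideas in this step, lagging a predictor so that only past information is used and splitting chronologically rather than at random, here is a base R sketch on made-up data. The helper `lag_k`, the prices and the stand-in indicator are assumptions for illustration; the notebook itself relies on lag() inside the specifyModel formula.

```r
# Shift a vector k steps back in time: position t holds the value from t - k
lag_k <- function(x, k) c(rep(NA, k), head(x, -k))

dates     <- seq(as.Date("2019-07-22"), by = "day", length.out = 12)
close     <- c(30, 31, 29, 28, 30, 32, 33, 31, 30, 29, 28, 27)  # made-up prices
indicator <- close / 10                                          # stand-in indicator

# Pair each day's close with the indicator value known 5 days earlier
d <- data.frame(date = dates, close = close, ind_lag5 = lag_k(indicator, 5))
d <- na.omit(d)

# Chronological split: everything up to 31 July trains, August evaluates
train <- d[d$date <= as.Date("2019-07-31"), ]
test  <- d[d$date >  as.Date("2019-07-31"), ]
```

The first five rows are dropped because no 5-day-old indicator value exists for them, which mirrors why the lagged model data starts a few days after the raw price series.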

Run model

We ran a number of models with the following setup:

Models: SVM (sliding and standard), lm, gbm, rpartXse, rpart, earth, nnet and randomForest
We used nmae as our evaluation metric, as it is normalized and easy to interpret
We used Monte Carlo as our estimation method, with 5 repetitions, to gauge how consistently each model performs
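For readers unfamiliar with the metric, the following base R sketch shows one common definition of the normalized mean absolute error: the model's total absolute error divided by that of a constant baseline which always predicts the training-set mean. We believe this matches what the performanceEstimation package reports as nmae, but treat the baseline choice as an assumption and check the package documentation for the installed version.

```r
# Normalized MAE: the model's absolute error relative to a constant baseline
# that always predicts the training-set mean (assumed baseline)
nmae <- function(trues, preds, train_y) {
  sum(abs(trues - preds)) / sum(abs(trues - mean(train_y)))
}

train_y <- c(10, 12, 14)   # training targets, mean = 12
trues   <- c(11, 15)       # test targets
preds   <- c(11.5, 14)     # model predictions

nmae(trues, preds, train_y)
```

Under this definition a value below 1 means the model beats the constant baseline, and smaller is better.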

gbm.predict <- function(model, test, method, ...) {
  best <- gbm.perf(model, plot.it=FALSE, method=method)
  return(predict(model, test, n.trees=best, ...))
}


exp <- performanceEstimation(
    PredTask(Tform_AAL, Tdata.train_AAL, 'AAL'),   
    c(Workflow('standardWF', wfID="standSVM",
               learner='svm',learner.pars=list(cost=10,gamma=0.01)),
      Workflow('timeseriesWF', wfID="slideSVM", 
               type="slide", relearn.step=90,
               learner='svm',learner.pars=list(cost=10,gamma=0.01)),
      Workflow(learner="randomForest",learner.pars=list(ntree=100),
               wfID="rf420"),
      Workflow(learner="rpart",.fullOutput=TRUE),
      Workflow(learner= 'lm'),
      Workflow(learner='gbm', learner.pars=list(n.trees=100, cv.folds=5), predictor='gbm.predict'),
      Workflow(learner="rpartXse"),
      Workflow(learner="earth",learner.pars=list(thres=0.001)),
      Workflow(learner='nnet', learner.pars=list(linout=TRUE, trace=FALSE, maxit=1000, size=10, decay=0.01))
      ),
    EstimationTask(metrics="nmae",
                   method=MonteCarlo(nReps=5,szTrain=0.5,szTest=0.25)))

## 
## 
## ##### PERFORMANCE ESTIMATION USING  MONTE CARLO  #####
## 
## ** PREDICTIVE TASK :: AAL
## 
## ++ MODEL/WORKFLOW :: standSVM 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: slideSVM 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: rf420 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: rpart 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: lm 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: gbm 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Distribution not specified, assuming gaussian ...
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Distribution not specified, assuming gaussian ...
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Distribution not specified, assuming gaussian ...
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Distribution not specified, assuming gaussian ...
## Repetition  5 
##   start test =  1699 ; test size =  594 
## Distribution not specified, assuming gaussian ...
## 
## 
## 
## ++ MODEL/WORKFLOW :: rpartXse 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: earth 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: nnet 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594
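The log above reports a "start test" index and a fixed test size for every repetition. As we read it, each Monte Carlo repetition picks a random point in time, trains on the window immediately before it and tests on the window immediately after it, so the temporal order is never violated. The base R sketch below reproduces that scheme on synthetic data with lm; the data, the learner and the variable names are illustrative assumptions rather than the package's internals.

```r
set.seed(1234)
n        <- 200                  # rows in a toy, time-ordered data set
sz_train <- floor(0.5  * n)      # 100 rows for training
sz_test  <- floor(0.25 * n)      # 50 rows for testing
n_reps   <- 5

d <- data.frame(x = seq_len(n))
d$y <- 0.1 * d$x + rnorm(n)

# Valid test start points leave room for a full training window before
# and a full test window after
starts <- sort(sample((sz_train + 1):(n - sz_test + 1), n_reps))

mae <- sapply(starts, function(s) {
  train <- d[(s - sz_train):(s - 1), ]
  test  <- d[s:(s + sz_test - 1), ]
  fit   <- lm(y ~ x, data = train)
  mean(abs(test$y - predict(fit, test)))
})
```

Because all workflows in the notebook share the same seed, they are evaluated on the same five start points, which is what makes their scores directly comparable.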

Below we have included the summary of the performance estimation tasks undertaken.

summary(exp)

## 
## == Summary of a  Monte Carlo Performance Estimation Experiment ==
## 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## 
## * Predictive Tasks ::  AAL
## * Workflows  ::  standSVM, slideSVM, rf420, rpart, lm, gbm, rpartXse, earth, nnet 
## 
## -> Task:  AAL
##   *Workflow: standSVM 
##                nmae
## avg     0.115048078
## std     0.014989660
## med     0.109625144
## iqr     0.001927339
## min     0.105123482
## max     0.141619005
## invalid 0.000000000
## 
##   *Workflow: slideSVM 
##                 nmae
## avg     0.1056408818
## std     0.0174007392
## med     0.1033576695
## iqr     0.0004359777
## min     0.0848402417
## max     0.1332611405
## invalid 0.0000000000
## 
##   *Workflow: rf420 
##                nmae
## avg     0.158135879
## std     0.034693511
## med     0.159661883
## iqr     0.008784552
## min     0.110368673
## max     0.207989805
## invalid 0.000000000
## 
##   *Workflow: rpart 
##               nmae
## avg     0.18628860
## std     0.02505173
## med     0.19970967
## iqr     0.03091286
## min     0.15154953
## max     0.21154471
## invalid 0.00000000
## 
##   *Workflow: lm 
##                nmae
## avg     0.108162156
## std     0.021846779
## med     0.109851183
## iqr     0.001648006
## min     0.075454319
## max     0.137020065
## invalid 0.000000000
## 
##   *Workflow: gbm 
##                nmae
## avg     0.138066715
## std     0.029488059
## med     0.140756444
## iqr     0.003013282
## min     0.092147458
## max     0.174610982
## invalid 0.000000000
## 
##   *Workflow: rpartXse 
##                nmae
## avg     0.163858290
## std     0.035289428
## med     0.160662110
## iqr     0.003181326
## min     0.118197986
## max     0.217431705
## invalid 0.000000000
## 
##   *Workflow: earth 
##               nmae
## avg     0.12273230
## std     0.02674245
## med     0.12587048
## iqr     0.00147135
## min     0.08096336
## max     0.15573573
## invalid 0.00000000
## 
##   *Workflow: nnet 
##               nmae
## avg     0.15210920
## std     0.03241967
## med     0.14507666
## iqr     0.02291699
## min     0.12059789
## max     0.20498979
## invalid 0.00000000

Furthermore, the plot below shows the performance of each model across the 5 repetitions (shown as boxplots).

plot(exp)

We ran the different models using predictors lagged by 1 to 5 days. While the error increased as we lagged the predictors further (which is expected), lm and the sliding SVM were the best models for every lag (i.e. 1 to 5 days), with the standard SVM following. The graph above shows the results for predictors lagged by 5 days.
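The difference between the standard and the sliding workflow is worth spelling out: the standard one fits the model once on the training window, whereas the sliding one periodically refits on the most recent window of the same size as the test period advances. The base R sketch below contrasts the two on synthetic data using lm. The variable `relearn_step` plays the role of the relearn.step argument used above, and everything else (data, learner, sizes) is an illustrative assumption.

```r
set.seed(42)
n <- 300
d <- data.frame(x = seq_len(n))
d$y <- sin(d$x / 30) * 10 + rnorm(n)   # relationship drifts over time

train_size   <- 150
relearn_step <- 50                      # refit after every 50 test rows
test_idx     <- (train_size + 1):n

# Standard: fit once on the initial training window
fit_once  <- lm(y ~ x, data = d[1:train_size, ])
pred_once <- predict(fit_once, d[test_idx, ])

# Sliding: before each block of test rows, refit on the most recent
# train_size rows, then predict that block
pred_slide <- numeric(0)
for (s in seq(train_size + 1, n, by = relearn_step)) {
  block <- s:min(s + relearn_step - 1, n)
  fit   <- lm(y ~ x, data = d[(s - train_size):(s - 1), ])
  pred_slide <- c(pred_slide, predict(fit, d[block, ]))
}

mae_once  <- mean(abs(d$y[test_idx] - pred_once))
mae_slide <- mean(abs(d$y[test_idx] - pred_slide))
```

Comparing `mae_once` with `mae_slide` gives a feel for how much periodic refitting helps when the underlying relationship changes over time.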

Once this was concluded, we replicated this same analysis for the remaining 4 stocks, to test if the conclusions were the same.
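Since the same model specification is repeated for every ticker, one way to avoid copy-pasting it is to generate the formula from a template. The sketch below is a hypothetical helper (`lagged_formula` is our name, not something the notebook defines) that substitutes a ticker and a lag into the formula text used for AAL. It only builds the formula object; evaluating it still requires the price data and the indicator functions defined earlier.

```r
# Build the lagged model formula for any ticker symbol by substituting
# the ticker and the lag into a template (hypothetical helper)
lagged_formula <- function(ticker, k = 5) {
  template <- paste(
    "Cl(%1$s) ~ lag(myATR(%1$s), %2$d) + lag(mySMI(%1$s), %2$d) +",
    "lag(myADX(%1$s), %2$d) + lag(mySAR(%1$s), %2$d) +",
    "lag(runMean(Cl(%1$s)), %2$d) + lag(EMA(Delt(Cl(%1$s))), %2$d) +",
    "lag(RSI(Cl(%1$s)), %2$d)"
  )
  as.formula(sprintf(template, ticker, as.integer(k)))
}

f_alk <- lagged_formula("ALK")
```

In principle the result could be handed to specifyModel in place of the hand-written formula, which would also make it easy to loop over lags 1 to 5.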

Model selection for the remaining stocks

ALK

#Build model
data.model_ALK <- specifyModel(Cl(ALK) ~ lag(myATR(ALK), 5) + lag(mySMI(ALK),5) + lag(myADX(ALK),5) + lag(mySAR(ALK),5) + lag(runMean(Cl(ALK)),5) + lag(EMA(Delt(Cl(ALK))),5) + lag(RSI(Cl(ALK)),5))
Tdata.train_ALK <- as.data.frame(modelData(data.model_ALK, data.window=c('2010-01-01','2019-07-31')))
Tdata.eval_ALK <- na.omit(as.data.frame(modelData(data.model_ALK, data.window=c('2019-08-01','2019-08-31'))))
Tform_ALK <- as.formula('Cl.ALK ~ .')

#Performance Estimation tasks
exp_ALK <- performanceEstimation(
    PredTask(Tform_ALK, Tdata.train_ALK, 'ALK'),   
    c(Workflow('standardWF', wfID="standSVM",
               learner='svm',learner.pars=list(cost=10,gamma=0.01)),
      Workflow('timeseriesWF', wfID="slideSVM", 
               type="slide", relearn.step=90,
               learner='svm',learner.pars=list(cost=10,gamma=0.01)),
      Workflow(learner="randomForest",learner.pars=list(ntree=100),
               wfID="rf420"),
      Workflow(learner="rpart",.fullOutput=TRUE),
      Workflow(learner= 'lm'),
      Workflow(learner='gbm', learner.pars=list(n.trees=25, cv.folds=5), predictor='gbm.predict'),
      Workflow(learner="rpartXse"),
      Workflow(learner="earth",learner.pars=list(thres=0.001)),
      Workflow(learner='nnet', learner.pars=list(linout=TRUE, trace=FALSE, maxit=1000, size=10, decay=0.01))
      ),
    EstimationTask(metrics="nmae",
                   method=MonteCarlo(nReps=5,szTrain=0.5,szTest=0.25)))

## 
## 
## ##### PERFORMANCE ESTIMATION USING  MONTE CARLO  #####
## 
## ** PREDICTIVE TASK :: ALK
## 
## ++ MODEL/WORKFLOW :: standSVM 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: slideSVM 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: rf420 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: rpart 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: lm 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: gbm 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Distribution not specified, assuming gaussian ...
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Distribution not specified, assuming gaussian ...
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Distribution not specified, assuming gaussian ...
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Distribution not specified, assuming gaussian ...
## Repetition  5 
##   start test =  1699 ; test size =  594 
## Distribution not specified, assuming gaussian ...
## 
## 
## 
## ++ MODEL/WORKFLOW :: rpartXse 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: earth 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: nnet 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594

summary(exp_ALK)

## 
## == Summary of a  Monte Carlo Performance Estimation Experiment ==
## 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## 
## * Predictive Tasks ::  ALK
## * Workflows  ::  standSVM, slideSVM, rf420, rpart, lm, gbm, rpartXse, earth, nnet 
## 
## -> Task:  ALK
##   *Workflow: standSVM 
##               nmae
## avg     0.11112773
## std     0.02729337
## med     0.09475328
## iqr     0.02386611
## min     0.09255428
## max     0.15619063
## invalid 0.00000000
## 
##   *Workflow: slideSVM 
##                nmae
## avg     0.084054849
## std     0.013848427
## med     0.081641261
## iqr     0.002301304
## min     0.068300818
## max     0.106449469
## invalid 0.000000000
## 
##   *Workflow: rf420 
##               nmae
## avg     0.23805992
## std     0.04639408
## med     0.21282942
## iqr     0.04594395
## min     0.19824232
## max     0.31087500
## invalid 0.00000000
## 
##   *Workflow: rpart 
##               nmae
## avg     0.23379063
## std     0.05549053
## med     0.19955476
## iqr     0.04904926
## min     0.19659157
## max     0.32521408
## invalid 0.00000000
## 
##   *Workflow: lm 
##                 nmae
## avg     0.0750381687
## std     0.0152039358
## med     0.0769914044
## iqr     0.0004447275
## min     0.0509355306
## max     0.0932612127
## invalid 0.0000000000
## 
##   *Workflow: gbm 
##               nmae
## avg     0.32245725
## std     0.09626978
## med     0.27250286
## iqr     0.06950965
## min     0.26080431
## max     0.48669538
## invalid 0.00000000
## 
##   *Workflow: rpartXse 
##               nmae
## avg     0.23925067
## std     0.02065089
## med     0.23703291
## iqr     0.02560922
## min     0.21931569
## max     0.26990174
## invalid 0.00000000
## 
##   *Workflow: earth 
##              nmae
## avg     0.1982986
## std     0.1477604
## med     0.1282316
## iqr     0.0264172
## min     0.1215692
## max     0.4617127
## invalid 0.0000000
## 
##   *Workflow: nnet 
##               nmae
## avg     0.16258092
## std     0.01580117
## med     0.17077124
## iqr     0.02092761
## min     0.13952096
## max     0.17599050
## invalid 0.00000000

plot(exp_ALK)

As with AAL, lm and the sliding SVM are the best-performing models for every lag (i.e. 1 to 5 days).
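The ranking can be checked directly against the printed summary. The sketch below copies the average nmae per workflow from summary(exp_ALK) (rounded to four decimals) into a named vector and sorts it. The vector name `avg_nmae_alk` is illustrative.

```r
# Average nmae per workflow, copied (rounded) from summary(exp_ALK) above
avg_nmae_alk <- c(standSVM = 0.1111, slideSVM = 0.0841, rf420 = 0.2381,
                  rpart = 0.2338, lm = 0.0750, gbm = 0.3225,
                  rpartXse = 0.2393, earth = 0.1983, nnet = 0.1626)

# Lower is better: order workflows from best to worst
ranking <- sort(avg_nmae_alk)
names(ranking)[1:3]
```

The performanceEstimation package also ships ranking helpers such as rankWorkflows(), which, if available in the installed version, should give the same ordering straight from the experiment object.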

DAL

#Build model
data.model_DAL <- specifyModel(Cl(DAL) ~ lag(myATR(DAL), 5) + lag(mySMI(DAL),5) + lag(myADX(DAL),5) + lag(mySAR(DAL),5) + lag(runMean(Cl(DAL)),5) + lag(EMA(Delt(Cl(DAL))),5) + lag(RSI(Cl(DAL)),5))
Tdata.train_DAL <- as.data.frame(modelData(data.model_DAL, data.window=c('2010-01-01','2019-07-31')))
Tdata.eval_DAL <- na.omit(as.data.frame(modelData(data.model_DAL, data.window=c('2019-08-01','2019-08-31'))))
Tform_DAL <- as.formula('Cl.DAL ~ .')

#Performance Estimation tasks
exp_DAL <- performanceEstimation(
    PredTask(Tform_DAL, Tdata.train_DAL, 'DAL'),   
    c(Workflow('standardWF', wfID="standSVM",
               learner='svm',learner.pars=list(cost=10,gamma=0.01)),
      Workflow('timeseriesWF', wfID="slideSVM", 
               type="slide", relearn.step=90,
               learner='svm',learner.pars=list(cost=10,gamma=0.01)),
      Workflow(learner="randomForest",learner.pars=list(ntree=100),
               wfID="rf420"),
      Workflow(learner="rpart",.fullOutput=TRUE),
      Workflow(learner= 'lm'),
      Workflow(learner='gbm', learner.pars=list(n.trees=25, cv.folds=5), predictor='gbm.predict'),
      Workflow(learner="rpartXse"),
      Workflow(learner="earth",learner.pars=list(thres=0.001)),
      Workflow(learner='nnet', learner.pars=list(linout=TRUE, trace=FALSE, maxit=1000, size=10, decay=0.01))
      ),
    EstimationTask(metrics="nmae",
                   method=MonteCarlo(nReps=5,szTrain=0.5,szTest=0.25)))

## 
## 
## ##### PERFORMANCE ESTIMATION USING  MONTE CARLO  #####
## 
## ** PREDICTIVE TASK :: DAL
## 
## ++ MODEL/WORKFLOW :: standSVM 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: slideSVM 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: rf420 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: rpart 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: lm 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: gbm 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Distribution not specified, assuming gaussian ...
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Distribution not specified, assuming gaussian ...
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Distribution not specified, assuming gaussian ...
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Distribution not specified, assuming gaussian ...
## Repetition  5 
##   start test =  1699 ; test size =  594 
## Distribution not specified, assuming gaussian ...
## 
## 
## 
## ++ MODEL/WORKFLOW :: rpartXse 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: earth 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: nnet 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594

summary(exp_DAL)

## 
## == Summary of a  Monte Carlo Performance Estimation Experiment ==
## 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## 
## * Predictive Tasks ::  DAL
## * Workflows  ::  standSVM, slideSVM, rf420, rpart, lm, gbm, rpartXse, earth, nnet 
## 
## -> Task:  DAL
##   *Workflow: standSVM 
##                nmae
## avg     0.100073835
## std     0.013129771
## med     0.102848169
## iqr     0.003088132
## min     0.079211052
## max     0.115511777
## invalid 0.000000000
## 
##   *Workflow: slideSVM 
##                nmae
## avg     0.075519507
## std     0.007686614
## med     0.079292127
## iqr     0.002548964
## min     0.061910497
## max     0.079763776
## invalid 0.000000000
## 
##   *Workflow: rf420 
##               nmae
## avg     0.19738857
## std     0.05619705
## med     0.20996232
## iqr     0.01246602
## min     0.10251671
## max     0.25236883
## invalid 0.00000000
## 
##   *Workflow: rpart 
##                nmae
## avg     0.215051861
## std     0.075929803
## med     0.228375582
## iqr     0.009220149
## min     0.093586379
## max     0.303946292
## invalid 0.000000000
## 
##   *Workflow: lm 
##                nmae
## avg     0.076281503
## std     0.009256540
## med     0.078538247
## iqr     0.002242126
## min     0.061210281
## max     0.086511125
## invalid 0.000000000
## 
##   *Workflow: gbm 
##                nmae
## avg     0.337363446
## std     0.057555511
## med     0.337117088
## iqr     0.008702466
## min     0.258411780
## max     0.420850318
## invalid 0.000000000
## 
##   *Workflow: rpartXse 
##                nmae
## avg     0.144259980
## std     0.017715653
## med     0.137440674
## iqr     0.004817362
## min     0.131045392
## max     0.175280775
## invalid 0.000000000
## 
##   *Workflow: earth 
##                nmae
## avg     0.107216708
## std     0.022825587
## med     0.096430745
## iqr     0.006583378
## min     0.092975679
## max     0.147542701
## invalid 0.000000000
## 
##   *Workflow: nnet 
##               nmae
## avg     0.11605927
## std     0.01749918
## med     0.11173092
## iqr     0.01725000
## min     0.09911605
## max     0.14347170
## invalid 0.00000000

plot(exp_DAL)

As with AAL, the LM and the sliding SVM are the best-performing models for DAL at each lag (1 to 5 days). In the 5-day run shown above, their average NMAE is about 0.076 each, the lowest of the nine workflows.
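The nmae statistic reported in these summaries is a normalized mean absolute error. The base-R sketch below assumes the usual definition, in which the model's total absolute error is divided by that of a baseline that always predicts the mean of the training target. The function name `nmae` and the toy numbers are illustrative only and are not taken from the notebook.

```r
# Normalized mean absolute error (NMAE), assuming the baseline is a constant
# prediction equal to the mean of the training target.
nmae <- function(truth, pred, train_y) {
  baseline <- mean(train_y)
  sum(abs(truth - pred)) / sum(abs(truth - baseline))
}

# Toy data (hypothetical values, for illustration only).
train_y <- c(10, 12, 11, 13)
truth   <- c(14, 15, 13)
pred    <- c(13.5, 15.5, 13)

score <- nmae(truth, pred, train_y)
print(score)
```

Under this definition a value below 1 means the model beats the naive baseline, so an NMAE of 0.076 corresponds to roughly 7.6% of the baseline's absolute error.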

LUV

#Build model
data.model_LUV <- specifyModel(Cl(LUV) ~ lag(myATR(LUV), 5) + lag(mySMI(LUV),5) + lag(myADX(LUV),5) + lag(mySAR(LUV),5) + lag(runMean(Cl(LUV)),5) + lag(EMA(Delt(Cl(LUV))),5) + lag(RSI(Cl(LUV)),5))
Tdata.train_LUV <- as.data.frame(modelData(data.model_LUV, data.window=c('2010-01-01','2019-07-31')))
Tdata.eval_LUV <- na.omit(as.data.frame(modelData(data.model_LUV, data.window=c('2019-08-01','2019-08-31'))))
Tform_LUV <- as.formula('Cl.LUV ~ .')

#Performance Estimation tasks
exp_LUV <- performanceEstimation(
    PredTask(Tform_LUV, Tdata.train_LUV, 'LUV'),   
    c(Workflow('standardWF', wfID="standSVM",
               learner='svm',learner.pars=list(cost=10,gamma=0.01)),
      Workflow('timeseriesWF', wfID="slideSVM", 
               type="slide", relearn.step=90,
               learner='svm',learner.pars=list(cost=10,gamma=0.01)),
      Workflow(learner="randomForest",learner.pars=list(ntree=100),
               wfID="rf420"),
      Workflow(learner="rpart",.fullOutput=TRUE),
      Workflow(learner= 'lm'),
      Workflow(learner='gbm', learner.pars=list(n.trees=25, cv.folds=5), predictor='gbm.predict'),
      Workflow(learner="rpartXse"),
      Workflow(learner="earth",learner.pars=list(thres=0.001)),
      Workflow(learner='nnet', learner.pars=list(linout=TRUE, trace=FALSE, maxit=1500, size=10, decay=0.01))
      ),
    EstimationTask(metrics="nmae",
                   method=MonteCarlo(nReps=5,szTrain=0.5,szTest=0.25)))

## 
## 
## ##### PERFORMANCE ESTIMATION USING  MONTE CARLO  #####
## 
## ** PREDICTIVE TASK :: LUV
## 
## ++ MODEL/WORKFLOW :: standSVM 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: slideSVM 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: rf420 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: rpart 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: lm 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: gbm 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Distribution not specified, assuming gaussian ...
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Distribution not specified, assuming gaussian ...
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Distribution not specified, assuming gaussian ...
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Distribution not specified, assuming gaussian ...
## Repetition  5 
##   start test =  1699 ; test size =  594 
## Distribution not specified, assuming gaussian ...
## 
## 
## 
## ++ MODEL/WORKFLOW :: rpartXse 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: earth 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: nnet 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594

summary(exp_LUV)

## 
## == Summary of a  Monte Carlo Performance Estimation Experiment ==
## 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## 
## * Predictive Tasks ::  LUV
## * Workflows  ::  standSVM, slideSVM, rf420, rpart, lm, gbm, rpartXse, earth, nnet 
## 
## -> Task:  LUV
##   *Workflow: standSVM 
##                nmae
## avg     0.143047787
## std     0.023991455
## med     0.140123523
## iqr     0.003799764
## min     0.113023544
## max     0.180149147
## invalid 0.000000000
## 
##   *Workflow: slideSVM 
##                nmae
## avg     0.066440217
## std     0.002144861
## med     0.065425170
## iqr     0.001513464
## min     0.064953334
## max     0.070094211
## invalid 0.000000000
## 
##   *Workflow: rf420 
##               nmae
## avg     0.31894760
## std     0.08558052
## med     0.33165497
## iqr     0.01114144
## min     0.18086574
## max     0.41710913
## invalid 0.00000000
## 
##   *Workflow: rpart 
##               nmae
## avg     0.36825164
## std     0.08207889
## med     0.40238222
## iqr     0.00527719
## min     0.22190681
## max     0.41581529
## invalid 0.00000000
## 
##   *Workflow: lm 
##                nmae
## avg     0.059698131
## std     0.003108320
## med     0.057988445
## iqr     0.005209291
## min     0.057174400
## max     0.063666882
## invalid 0.000000000
## 
##   *Workflow: gbm 
##                nmae
## avg     0.438441977
## std     0.089843292
## med     0.446276503
## iqr     0.006400742
## min     0.299304787
## max     0.551006090
## invalid 0.000000000
## 
##   *Workflow: rpartXse 
##               nmae
## avg     0.31001207
## std     0.07304192
## med     0.31267052
## iqr     0.02506653
## min     0.20896925
## max     0.41390823
## invalid 0.00000000
## 
##   *Workflow: earth 
##               nmae
## avg     0.11139535
## std     0.06454535
## med     0.08020100
## iqr     0.01985322
## min     0.07653918
## max     0.22587866
## invalid 0.00000000
## 
##   *Workflow: nnet 
##               nmae
## avg     0.23049011
## std     0.03099567
## med     0.23451437
## iqr     0.03487250
## min     0.19337384
## max     0.27289000
## invalid 0.00000000

plot(exp_LUV)

As with AAL, the LM and the sliding SVM are the best-performing models for LUV at each lag (1 to 5 days). In the 5-day run shown above, the LM has an average NMAE of about 0.060 and the sliding SVM about 0.066.
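The logs above show each Monte Carlo repetition reporting a "start test" row and a test size of 594. A minimal base-R sketch of how such windows could be drawn is given below. It assumes that every repetition picks a random test start, trains on the 50% of rows immediately before it and tests on the 25% of rows from it onwards. The function `mc_windows` is hypothetical, `n = 2376` is only inferred from the logged test size (594 / 0.25), and the drawn starts are not expected to match the package's own values.

```r
# Sketch of Monte Carlo estimation windows for time-ordered data.
# Assumption: each repetition draws a random test start, trains on the
# sz_train * n rows just before it and tests on the sz_test * n rows from it.
mc_windows <- function(n, n_reps = 5, sz_train = 0.5, sz_test = 0.25, seed = 1234) {
  set.seed(seed)
  n_train <- floor(sz_train * n)
  n_test  <- floor(sz_test * n)
  # Valid test starts leave room for a full training window before them
  # and a full test window after them.
  candidates <- (n_train + 1):(n - n_test + 1)
  starts <- sort(sample(candidates, n_reps))
  lapply(starts, function(s) {
    list(train = (s - n_train):(s - 1), test = s:(s + n_test - 1))
  })
}

windows <- mc_windows(n = 2376)
# One row per repetition: first test row and the two window sizes.
t(sapply(windows, function(w) c(start_test = w$test[1],
                                n_train = length(w$train),
                                n_test = length(w$test))))
```

Because the training rows always precede the test rows, no future information leaks into the fit, which is the reason for preferring this scheme over random cross-validation on price series.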

UAL

#Build model
data.model_UAL <- specifyModel(Cl(UAL) ~ lag(myATR(UAL), 5) + lag(mySMI(UAL),5) + lag(myADX(UAL),5) + lag(mySAR(UAL),5) + lag(runMean(Cl(UAL)),5) + lag(EMA(Delt(Cl(UAL))),5) + lag(RSI(Cl(UAL)),5))
Tdata.train_UAL <- as.data.frame(modelData(data.model_UAL, data.window=c('2010-01-01','2019-07-31')))
Tdata.eval_UAL <- na.omit(as.data.frame(modelData(data.model_UAL, data.window=c('2019-08-01','2019-08-31'))))
Tform_UAL <- as.formula('Cl.UAL ~ .')

#Performance Estimation tasks
exp_UAL <- performanceEstimation(
    PredTask(Tform_UAL, Tdata.train_UAL, 'UAL'),   
    c(Workflow('standardWF', wfID="standSVM",
               learner='svm',learner.pars=list(cost=10,gamma=0.01)),
      Workflow('timeseriesWF', wfID="slideSVM", 
               type="slide", relearn.step=90,
               learner='svm',learner.pars=list(cost=10,gamma=0.01)),
      Workflow(learner="randomForest",learner.pars=list(ntree=100),
               wfID="rf420"),
      Workflow(learner="rpart",.fullOutput=TRUE),
      Workflow(learner= 'lm'),
      Workflow(learner='gbm', learner.pars=list(n.trees=25, cv.folds=5), predictor='gbm.predict'),
      Workflow(learner="rpartXse"),
      Workflow(learner="earth",learner.pars=list(thres=0.001)),
      Workflow(learner='nnet', learner.pars=list(linout=TRUE, trace=FALSE, maxit=1000, size=10, decay=0.01))
      ),
    EstimationTask(metrics="nmae",
                   method=MonteCarlo(nReps=5,szTrain=0.5,szTest=0.25)))

## 
## 
## ##### PERFORMANCE ESTIMATION USING  MONTE CARLO  #####
## 
## ** PREDICTIVE TASK :: UAL
## 
## ++ MODEL/WORKFLOW :: standSVM 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: slideSVM 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: rf420 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: rpart 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: lm 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: gbm 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Distribution not specified, assuming gaussian ...
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Distribution not specified, assuming gaussian ...
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Distribution not specified, assuming gaussian ...
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Distribution not specified, assuming gaussian ...
## Repetition  5 
##   start test =  1699 ; test size =  594 
## Distribution not specified, assuming gaussian ...
## 
## 
## 
## ++ MODEL/WORKFLOW :: rpartXse 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: earth 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: nnet 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594

summary(exp_UAL)

## 
## == Summary of a  Monte Carlo Performance Estimation Experiment ==
## 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## 
## * Predictive Tasks ::  UAL
## * Workflows  ::  standSVM, slideSVM, rf420, rpart, lm, gbm, rpartXse, earth, nnet 
## 
## -> Task:  UAL
##   *Workflow: standSVM 
##                nmae
## avg     0.157833124
## std     0.026941992
## med     0.167957847
## iqr     0.007190916
## min     0.111358621
## max     0.180744908
## invalid 0.000000000
## 
##   *Workflow: slideSVM 
##                nmae
## avg     0.101754574
## std     0.006367492
## med     0.104662230
## iqr     0.006503304
## min     0.091459718
## max     0.106584051
## invalid 0.000000000
## 
##   *Workflow: rf420 
##               nmae
## avg     0.32083274
## std     0.08421363
## med     0.33302084
## iqr     0.01121475
## min     0.18835636
## max     0.42302137
## invalid 0.00000000
## 
##   *Workflow: rpart 
##               nmae
## avg     0.24044454
## std     0.04136346
## med     0.21807413
## iqr     0.03103832
## min     0.20869148
## max     0.30932475
## invalid 0.00000000
## 
##   *Workflow: lm 
##               nmae
## avg     0.10544083
## std     0.01272695
## med     0.11217347
## iqr     0.01317972
## min     0.08487747
## max     0.11445598
## invalid 0.00000000
## 
##   *Workflow: gbm 
##                nmae
## avg     0.412070861
## std     0.087038230
## med     0.414568025
## iqr     0.009477313
## min     0.288430415
## max     0.534397506
## invalid 0.000000000
## 
##   *Workflow: rpartXse 
##               nmae
## avg     0.24019676
## std     0.04223699
## med     0.21968875
## iqr     0.02227730
## min     0.20989495
## max     0.31304038
## invalid 0.00000000
## 
##   *Workflow: earth 
##                nmae
## avg     0.140842380
## std     0.017473792
## med     0.143141384
## iqr     0.003230122
## min     0.114457657
## max     0.163506613
## invalid 0.000000000
## 
##   *Workflow: nnet 
##               nmae
## avg     0.28170024
## std     0.07803857
## med     0.23462402
## iqr     0.11160134
## min     0.21989389
## max     0.39168136
## invalid 0.00000000

plot(exp_UAL)

As with the other stocks, the LM and the sliding SVM are the best-performing models for UAL at each lag (1 to 5 days). In the 5-day run shown above, the sliding SVM has an average NMAE of about 0.102 and the LM about 0.105.
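The slideSVM workflow differs from standSVM only in being a timeseriesWF with type="slide" and relearn.step=90. Assuming this means that the model is refit every 90 test cases on a fixed-size training window moved forward in time, the idea can be sketched in base R as follows. The helper `slide_predict` is hypothetical, `lm` stands in for the SVM to keep the example dependency-free, and the data are synthetic.

```r
# Sliding-window evaluation sketch: refit every `relearn_step` test rows on a
# fixed-size window that ends just before the block being predicted.
slide_predict <- function(formula, data, n_train, test_idx, relearn_step = 90) {
  preds <- numeric(0)
  block_starts <- seq(1, length(test_idx), by = relearn_step)
  for (b in block_starts) {
    block <- test_idx[b:min(b + relearn_step - 1, length(test_idx))]
    window <- (block[1] - n_train):(block[1] - 1)   # most recent n_train rows
    fit <- lm(formula, data = data[window, ])
    preds <- c(preds, predict(fit, newdata = data[block, ]))
  }
  unname(preds)
}

# Synthetic series (hypothetical data, not the airline prices).
set.seed(1)
n <- 400
toy <- data.frame(x = cumsum(rnorm(n)))
toy$y <- 2 * toy$x + rnorm(n, sd = 0.1)

p <- slide_predict(y ~ x, toy, n_train = 200, test_idx = 201:400, relearn_step = 90)
length(p)
```

A standard workflow would fit once on rows 1 to 200 and predict all 200 test rows with that single model. The sliding variant lets later predictions draw on more recent data, which is one plausible reason for the gap between standSVM and slideSVM in the summaries.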

Visualization of results

plot_grid(plot(exp), plot(exp_ALK), plot(exp_DAL), plot(exp_LUV), plot(exp_UAL))

Conclusion on model selection

Overall, the sliding SVM and the LM outperform every other model for every stock and every lag (1 to 5 days).
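As a quick check of this ranking, the snippet below re-enters the average NMAE values from the DAL, LUV and UAL summaries printed above (rounded to four decimals) and extracts the two lowest-error workflows per stock. The object names are illustrative. The AAL and ALK averages could be added as further rows in the same way.

```r
workflows <- c("standSVM", "slideSVM", "rf420", "rpart", "lm",
               "gbm", "rpartXse", "earth", "nnet")

# Average NMAE per workflow, copied from the summary() outputs above.
avg_nmae <- rbind(
  DAL = c(0.1001, 0.0755, 0.1974, 0.2151, 0.0763, 0.3374, 0.1443, 0.1072, 0.1161),
  LUV = c(0.1430, 0.0664, 0.3189, 0.3683, 0.0597, 0.4384, 0.3100, 0.1114, 0.2305),
  UAL = c(0.1578, 0.1018, 0.3208, 0.2404, 0.1054, 0.4121, 0.2402, 0.1408, 0.2817)
)
colnames(avg_nmae) <- workflows

# Two best (lowest average NMAE) workflows for each stock.
top2 <- t(apply(avg_nmae, 1, function(r) names(sort(r))[1:2]))
colnames(top2) <- c("best", "second")
top2
```

For all three stocks the two entries are lm and slideSVM, with the order between them varying by stock.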

Given that the results are fairly consistent across stocks and prediction windows and that all stocks belong to the same industry, we believe it is reasonable to use a single model for all the predictions.

In our view, both the sliding SVM and the LM perform strongly and merit consideration for prediction purposes. We decided to go with the LM for the following reasons:

It is simpler
It is easier to explain
It is computationally much quicker, which is very important given that our next goal is to build a reactive Shiny app.

Note: In the appendices of this notebook, we have included the code we used to optimize the parameters of the sliding SVM, and how we would have used it to make predictions had it been our selected modeling approach.

Prediction of stock prices for the next 5 days using LM

First, we create a sequence of the next 5 trading days starting today, i.e. excluding weekends and holidays per the NYSE calendar.

Next, for each stock and each prediction window we create a data frame that enables us to fit a model. For instance, to develop the 5-day prediction model, we regress the close price of the stock on the technical indicators lagged by 5 days (i.e. the most recent information available at prediction time).

Once we have the data frames, we can fit an LM model to each of them.

Next, we need to make a prediction. It is worth noting that the most recent data available is that of the most recent trading day. We feed that information to each of the five models, whose fitted coefficients (“weights”) then give the predicted close price for the next 1, 2, 3, 4 and 5 days.
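The alignment between lagged predictors and the forecast horizon can be illustrated with a toy example. The sketch below uses synthetic data and a single stand-in indicator instead of the notebook's technical indicators. The helper `shift` and all variable names are hypothetical.

```r
# Direct i-step-ahead forecasting with lagged predictors (toy example).
shift <- function(x, k) c(rep(NA, k), head(x, -k))   # value from k rows earlier

set.seed(42)
n <- 300
ind   <- cumsum(rnorm(n))                      # stand-in for a technical indicator
close <- 50 + 0.5 * ind + rnorm(n, sd = 0.2)   # synthetic close price

horizons <- 1:5
preds <- sapply(horizons, function(i) {
  # Row t pairs the close at t with the indicator observed at t - i.
  train <- data.frame(Close = close, Ind = shift(ind, i))
  fit <- lm(Close ~ Ind, data = train)         # rows with NA lags are dropped by lm
  # The latest indicator value is "lagged by i" relative to day n + i.
  unname(predict(fit, newdata = data.frame(Ind = ind[n])))
})
names(preds) <- paste0("t+", horizons)
preds
```

Each horizon gets its own model, so the same most recent observation yields five different forecasts, one per fitted set of coefficients.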

We have included all the code below. The process is the same for the 5 stocks:

Create data frame for each prediction window
Fit linear model
Make prediction using the most recent date available for the stock
Create the production output data frame with the prediction for the next 5 trading days

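# NYSE holidays for the current year
holidays <- holidayNYSE()
# 15 calendar days starting today, enough to contain 5 trading days
daysSeq <- as.timeDate(seq(from = Sys.Date(), length.out = 15, by = "day"))
# Keep the first 5 business days (Monday to Friday, excluding holidays)
daysSeq <- head(daysSeq[isBizday(daysSeq, holidays = holidays, wday = 1:5)],5)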

for (stock in stocks) {
  
  for (i in 1:5) {
    
    # Training frame for horizon i: close price against indicators lagged by i days
    namdf <- paste0('t_',stock,'_DF_Lag', i)
    contdf <- data.frame(Cl(get(stock)), lag(myATR(get(stock)), i), lag(mySMI(get(stock)),i), lag(myADX(get(stock)),i), lag(mySAR(get(stock)),i), lag(runMean(Cl(get(stock))),i), lag(EMA(Delt(Cl(get(stock)))),i), lag(RSI(Cl(get(stock))),i))
    colnames(contdf) <- c('Close', 'ATR', 'SMI', 'ADX', 'SAR', 'runMean', 'EMA', 'RSI')
    assign(namdf, contdf)
    
    # Linear model for horizon i
    namlm <- paste0('t_',stock,'_lmlag',i)
    contlm <- lm(formula = Close ~., data = get(paste0('t_',stock,'_DF_Lag',i)))
    assign(namlm, contlm)
    
    # Most recent row of unlagged indicators, used as the prediction input
    namtestpred <- paste0('t_',stock,'_DF_testpred')
    conttestpred <- filter(data.frame(Cl(get(stock)), myATR(get(stock)), mySMI(get(stock)), myADX(get(stock)), mySAR(get(stock)), runMean(Cl(get(stock))), EMA(Delt(Cl(get(stock)))), RSI(Cl(get(stock)))), row_number() == n())
    colnames(conttestpred) <- c('Close', 'ATR', 'SMI', 'ADX', 'SAR', 'runMean', 'EMA', 'RSI')
    assign(namtestpred, conttestpred)
  
    # Predicted close price i trading days ahead
    nampred <- paste0('t_',stock,'_lmpredlag',i)
    contpred <- predict(get(paste0('t_',stock,'_lmlag',i)), get(paste0('t_',stock,'_DF_testpred')), type = 'response')
    assign(nampred, contpred)
    
  }
  
  # Collect the five horizon predictions for this stock
  nampredictions <- paste0('t_',stock,'_preds')
  contpredictions <- c(get(paste0('t_',stock,'_lmpredlag1')), get(paste0('t_',stock,'_lmpredlag2')), get(paste0('t_',stock,'_lmpredlag3')), get(paste0('t_',stock,'_lmpredlag4')), get(paste0('t_',stock,'_lmpredlag5')))
  assign(nampredictions, contpredictions)
  
  # Index the predictions by the next five trading days
  namxts <- paste0('t_',stock,'_xts')
  contxts <- xts(x = get(paste0('t_',stock,'_preds')), order.by = as.Date(daysSeq))
  assign(namxts,contxts)
  
  # Append the predictions to the historical close prices (column 4)
  namfullts <- paste0('t_',stock,'_fullts')
  contfullts <- rbind(get(stock)[,4], get(paste0('t_',stock,'_xts')))
  assign(namfullts,contfullts)

}

Finally, the table below contains the predictions for the next 5 days for the 5 stocks.

kable(data.frame(Date=as.Date(daysSeq), AAL_Predictions=t_AAL_preds, ALK_Predictions=t_ALK_preds, DAL_Predictions=t_DAL_preds, LUV_Predictions=t_LUV_preds, UAL_Predictions=t_UAL_preds))

Date	AAL_Predictions	ALK_Predictions	DAL_Predictions	LUV_Predictions	UAL_Predictions
2019-09-16	29.87460	65.26187	59.70945	54.93033	89.71158
2019-09-17	29.85552	65.26870	59.68034	54.95730	89.67050
2019-09-18	29.82789	65.27375	59.66205	54.99211	89.66163
2019-09-19	29.79108	65.27668	59.63059	55.01707	89.62829
2019-09-20	29.75145	65.29509	59.60191	55.03467	89.62311

Appendices

Optimization of Slide SVM

We tried several combinations of relearn step (30, 60 and 90 days) and cost (1, 7 and 10), with gamma fixed at 0.01, and concluded that relearning every 30 days with cost=10 and gamma=0.01 showed the best performance.
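The search space is small enough to list explicitly. The base-R sketch below enumerates the parameter combinations passed to workflowVariants in the call that follows. The object name `grid` is illustrative, and the row order produced here is not guaranteed to match the svm.v1 to svm.v9 numbering assigned by the package.

```r
# Enumerate the combinations searched: 3 relearn steps x 3 costs, gamma fixed.
grid <- expand.grid(relearn.step = c(30, 60, 90),
                    cost = c(1, 7, 10),
                    gamma = 0.01)
nrow(grid)   # 9 combinations, one per workflow variant
grid
```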

#Performance Estimation tasks
exp_UAL1 <- performanceEstimation(
    PredTask(Tform_UAL, Tdata.train_UAL, 'UAL'),   
      workflowVariants('timeseriesWF', wfID="slideSVM", 
               type="slide", relearn.step=c(30, 60, 90),
               learner='svm',learner.pars=list(cost=c(1,7,10),gamma=0.01)),
    EstimationTask(metrics="nmae",
                   method=MonteCarlo(nReps=5,szTrain=0.5,szTest=0.25)))

## 
## 
## ##### PERFORMANCE ESTIMATION USING  MONTE CARLO  #####
## 
## ** PREDICTIVE TASK :: UAL
## 
## ++ MODEL/WORKFLOW :: svm.v1 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: svm.v2 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: svm.v3 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: svm.v4 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: svm.v5 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: svm.v6 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: svm.v7 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: svm.v8 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594 
## 
## 
## 
## ++ MODEL/WORKFLOW :: svm.v9 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## Repetition  1 
##   start test =  1257 ; test size =  594 
## Repetition  2 
##   start test =  1551 ; test size =  594 
## Repetition  3 
##   start test =  1559 ; test size =  594 
## Repetition  4 
##   start test =  1560 ; test size =  594 
## Repetition  5 
##   start test =  1699 ; test size =  594
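The log above reports where the test window starts in each repetition. The following is a minimal sketch of how such train/test windows could be laid out. It assumes a series of 2376 rows (a hypothetical length, chosen only because 25% of it gives the reported test size of 594) and a uniformly random choice of start points. The sampling inside performanceEstimation may differ, so the starts drawn here are not expected to match the log.

```r
# Hypothetical series length; only the 594-row test size comes from the log.
n_rows     <- 2376
train_size <- as.integer(0.5  * n_rows)   # rows available for training
test_size  <- as.integer(0.25 * n_rows)   # rows used for testing

# A test window must leave a full training window before it
# and a full test window from its start to the end of the series.
first_start <- train_size + 1
last_start  <- n_rows - test_size + 1

set.seed(1234)
starts <- sort(sample(first_start:last_start, 5))

mc_windows <- data.frame(
  train_from = starts - train_size,
  train_to   = starts - 1,
  test_from  = starts,
  test_to    = starts + test_size - 1
)
mc_windows

# Under the assumed length, the starts reported in the log are all feasible.
reported <- c(1257, 1551, 1559, 1560, 1699)
all(reported >= first_start & reported <= last_start)
```

Every repetition uses the same window sizes; only the position of the window along the series changes.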

summary(exp_UAL1)

## 
## == Summary of a  Monte Carlo Performance Estimation Experiment ==
## 
## Task for estimating  nmae  using
## 5  repetitions Monte Carlo Simulation using: 
##   seed =  1234 
##   train size =  0.5 x NROW(DataSet) 
##   test size =  0.25 x NROW(DataSet) 
## 
## * Predictive Tasks ::  UAL
## * Workflows  ::  svm.v1, svm.v2, svm.v3, svm.v4, svm.v5, svm.v6, svm.v7, svm.v8, svm.v9 
## 
## -> Task:  UAL
##   *Workflow: svm.v1 
##               nmae
## avg     0.10257327
## std     0.00718602
## med     0.10654621
## iqr     0.01004526
## min     0.09291279
## max     0.10938238
## invalid 0.00000000
## 
##   *Workflow: svm.v2 
##                nmae
## avg     0.107147430
## std     0.006234698
## med     0.110773807
## iqr     0.005164900
## min     0.096786215
## max     0.111477722
## invalid 0.000000000
## 
##   *Workflow: svm.v3 
##                nmae
## avg     0.109139979
## std     0.006672939
## med     0.113286464
## iqr     0.008481079
## min     0.099132420
## max     0.114077268
## invalid 0.000000000
## 
##   *Workflow: svm.v4 
##                nmae
## avg     0.096583114
## std     0.005518074
## med     0.098844393
## iqr     0.004638806
## min     0.088229580
## max     0.102594696
## invalid 0.000000000
## 
##   *Workflow: svm.v5 
##                nmae
## avg     0.100716373
## std     0.005113150
## med     0.102934344
## iqr     0.001530555
## min     0.091655892
## max     0.103721806
## invalid 0.000000000
## 
##   *Workflow: svm.v6 
##                nmae
## avg     0.102140986
## std     0.006156431
## med     0.105511998
## iqr     0.005060912
## min     0.091863185
## max     0.106426848
## invalid 0.000000000
## 
##   *Workflow: svm.v7 
##                nmae
## avg     0.096489131
## std     0.005040812
## med     0.098997864
## iqr     0.004962818
## min     0.088866777
## max     0.101481422
## invalid 0.000000000
## 
##   *Workflow: svm.v8 
##                nmae
## avg     0.100056720
## std     0.005089199
## med     0.102229307
## iqr     0.002112098
## min     0.091093952
## max     0.103115559
## invalid 0.000000000
## 
##   *Workflow: svm.v9 
##                nmae
## avg     0.101754574
## std     0.006367492
## med     0.104662230
## iqr     0.006503304
## min     0.091459718
## max     0.106584051
## invalid 0.000000000
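To help read the figures above, here is a small sketch of a normalised mean absolute error: the model's absolute error divided by the absolute error of a constant baseline. Using the training-set mean as the baseline is an assumption of this sketch, `nmae_sketch` is a hypothetical helper, and the numbers are toy values rather than UAL data.

```r
# Normalised MAE: model error relative to a constant baseline prediction.
# Assumption: the baseline is the mean of the training target.
nmae_sketch <- function(trues, preds, train_y) {
  baseline <- mean(train_y)
  sum(abs(trues - preds)) / sum(abs(trues - baseline))
}

# Toy numbers, not taken from the UAL data.
train_y <- c(10, 12, 14)
trues   <- c(13, 15)
preds   <- c(12.5, 14)

nmae_sketch(trues, preds, train_y)
```

Under this definition a value below 1 means the model beats the baseline, so averages of roughly 0.10 would correspond to about one tenth of the baseline's absolute error.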

plot(exp_UAL1)

topPerformer(exp_UAL1, 'nmae','UAL')

## Workflow Object:
##  Workflow ID       ::  svm.v7 
##  Workflow Function ::  timeseriesWF
##       Parameter values:
##       learner.pars  -> cost=10 gamma=0.01 
##       type  -> slide 
##       relearn.step  -> 30 
##       learner  -> svm
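As a cross-check, the same selection can be reproduced by hand from the average nmae values printed in the summary. The vector below copies those averages; `avg_nmae` is a name introduced here for illustration only.

```r
# Average nmae per workflow, copied from the summary output.
avg_nmae <- c(
  svm.v1 = 0.10257327,  svm.v2 = 0.107147430, svm.v3 = 0.109139979,
  svm.v4 = 0.096583114, svm.v5 = 0.100716373, svm.v6 = 0.102140986,
  svm.v7 = 0.096489131, svm.v8 = 0.100056720, svm.v9 = 0.101754574
)

# Workflow with the lowest average error.
best <- names(which.min(avg_nmae))
best

# Three lowest averages, for context.
sort(avg_nmae)[1:3]
```

The gap between svm.v7 and svm.v4 is about 0.0001, far smaller than either workflow's standard deviation of roughly 0.005, so the ranking of the top two should be read with caution.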

Prediction using Sliding SVM

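Although we did not use this model for the final predictions, we were able to produce a forecast with the sliding-window SVM, using the parameters of the top-performing workflow above (cost = 10, gamma = 0.01, relearn step = 30). The code is included below for illustrative purposes.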

# Closing price plus technical indicators, each lagged by 5 periods.
UAL_DF_SVM_lag5 <- data.frame(
  Cl(UAL),
  lag(myATR(UAL), 5),
  lag(mySMI(UAL), 5),
  lag(myADX(UAL), 5),
  lag(mySAR(UAL), 5),
  lag(runMean(Cl(UAL)), 5),
  lag(EMA(Delt(Cl(UAL))), 5),
  lag(RSI(Cl(UAL)), 5)
)
# Name the columns of the data frame that was just built.
colnames(UAL_DF_SVM_lag5) <- c('Close', 'ATR', 'SMI', 'ADX', 'SAR', 'runMean', 'EMA', 'RSI')

relearn.step <- 30
n <- NROW(UAL_DF_SVM_lag5)
train.size_UAL <- NROW(Tdata.train_UAL)
sts <- seq(train.size_UAL+1, n, by = relearn.step)
preds_SVM_UAL <- vector()

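for (s in sts) {
  # Train on the train.size_UAL rows immediately before position s.
  tr <- UAL_DF_SVM_lag5[(s-train.size_UAL):(s-1),]
  colnames(tr) <- c('Close', 'ATR', 'SMI', 'ADX', 'SAR', 'runMean', 'EMA', 'RSI')
  # Test on the next relearn.step rows (fewer at the end of the series).
  ts <- UAL_DF_SVM_lag5[s:min((s+relearn.step-1),n),]
  colnames(ts) <- c('Close', 'ATR', 'SMI', 'ADX', 'SAR', 'runMean', 'EMA', 'RSI')
  # Refit the SVM on the current window with the selected parameters.
  m <- svm(Close ~ ., tr, cost = 10, gamma = 0.01)
  # Collect this window's predictions; after the loop, m is the last fitted model.
  preds_SVM_UAL <- c(preds_SVM_UAL, predict(m, ts))
}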

UAL_svmpredlag5 <- predict(m, t_UAL_DF_testpred, type = 'response')
UAL_svmpredlag5

##        1 
## 90.52084
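The index arithmetic in the loop above can be seen more easily on a small example. This sketch uses hypothetical sizes (100 rows, a 50-row training window, a relearn step of 30) rather than the UAL dimensions, and only builds the row ranges without fitting any model.

```r
# Hypothetical sizes, chosen only to keep the example readable.
n            <- 100
train_size   <- 50
relearn_step <- 30

# Positions at which the model is refitted.
starts <- seq(train_size + 1, n, by = relearn_step)

# For each start, the training rows slide forward and keep a fixed length;
# the test rows cover the stretch up to the next refit (or the end of the data).
windows <- lapply(starts, function(s) {
  list(
    train = (s - train_size):(s - 1),
    test  = s:min(s + relearn_step - 1, n)
  )
})

for (w in windows) {
  cat("train", range(w$train), "| test", range(w$test), "\n")
}
```

With these sizes the model is refitted twice: first on rows 1 to 50 to predict rows 51 to 80, then on rows 31 to 80 to predict rows 81 to 100. The training window keeps the same length and drops its oldest rows as it moves.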

R Notebook - Study Group 13 Post-module Assignment
