This project aims to harness some of the capabilities of the h2o.ai framework to build robust models for financial data. Applying h2o.ai to large financial data sets seems to have attracted relatively little attention, and so has improved less quickly. This first part compares some widely used machine learning algorithms. It touches on only a small portion of the framework's capabilities, in the hope of inspiring others to assess its practical usefulness and perhaps improve on it.

The setup uses medium-sized data and a small number of widely used technical indicators to try to predict the direction of the stock market a certain number of days ahead. As we will see, most test-set metrics are good: AUC is above 0.92 for the random forest, GLM, and GBM. It would be interesting to see how this approach carries over to larger, more volatile data.

The algorithms are run separately here, but one can choose to train and further tune specific models, or use the H2O AutoML interface to automate supervised learning. Unsupervised learning algorithms can be trained within the framework in the same way. Moreover, h2o.ai integrates with other frameworks such as modeltime for time series analysis. It would also be interesting to compare these models with more traditional forecasting methods. In a previous post, we looked at similar algorithms within the caret framework. All comments and feedback are welcome.

library(tidyquant)
library(tidyverse)
library(h2o)
h2o.init() # start, or connect to, a local H2O cluster
#Get data & construct some technical indicators
stock.dt=function(stock, period){
  # Daily prices for the last three years
  dx=stock%>%tq_get(get = "stock.prices",from=Sys.Date()-years(3),to=Sys.Date()+1)
  # Technical indicators from TTR; helper columns are dropped along the way
  dx=dx%>%tq_mutate(c(high,low,close),mutate_fun = ATR,n=14,matype="EMA")
  dx=dx%>%select(-c(tr,ATR,ATR..1))
  dx=dx%>%tq_mutate(close,mutate_fun = RSI,n=14,maType="EMA")
  dx=dx%>%tq_mutate(close,mutate_fun = MACD,maType="EMA",percent=FALSE)
  dx=dx%>%mutate(diff=macd-signal)
  dx=dx%>%tq_mutate(c(high,low,close),mutate_fun = SMI,maType="EMA")
  dx=dx%>%select(-(signal..1))
  dx=dx%>%mutate(mfi=MFI(dx[,c("high","low","close")],dx[,"volume"],n=14))
  dx=dx%>%tq_mutate(c(high,low,close),mutate_fun = WPR,col_rename = "wrp")%>%mutate(wrp=-100*wrp)
  dx=dx%>%tq_mutate(c(high,low,close),mutate_fun = ADX,maType="EMA")
  dx=dx%>%tq_mutate(c(high,low,close),mutate_fun = CCI,n=14,maType="EMA")
  dx=dx%>%tq_mutate(close,mutate_fun = CMO,n=14)
  dx=dx%>%tq_mutate_xy(close,volume,mutate_fun = OBV)
  dx=dx%>%select(-(DIp:DX))
  # Label: UP if the close is above its value `period` rows earlier, else DOWN
  dx=dx%>%mutate(lag.chg=close-lag(close,period))
  dx=dx%>%mutate(trend=ifelse(lag.chg>0,"UP","DOWN"))
  dx=dx%>%select(-c(symbol,open:adjusted))
  # Drop rows with NA and standardize the numeric columns
  dx=na.omit(dx)%>%mutate_if(is.numeric,scale)
  # scale() returns one-column matrices; c() flattens them back to plain vectors
  dx=dx%>%mutate_if(is.numeric,function(t) c(t))
  return(dx)
}
#Application to the S&P 500
stock="^GSPC"
period=2 # horizon, in trading days, used to build the trend label
df=stock.dt(stock,period)
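To make the labelling step concrete, here is a minimal base-R sketch on made-up closing prices. The `close` vector and the names `lag_chg`, `lead_chg`, `trend_past`, and `trend_future` are illustrative and not part of the code above. It reproduces the `close-lag(close,period)` rule used inside `stock.dt` and, for contrast, shows a forward-looking variant that compares each close with the close `period` rows later. The forward-looking variant is an assumption about what one might want when the goal is to predict ahead; it is not what the function above computes.

```r
# Toy closing prices (made up for illustration, not market data)
close  <- c(100, 101, 99, 102, 103, 101)
period <- 2

# Backward-looking change, as in stock.dt(): close[t] - close[t - period]
lag_chg    <- c(rep(NA, period), diff(close, lag = period))
trend_past <- ifelse(lag_chg > 0, "UP", "DOWN")

# Forward-looking alternative (an assumption, not the code above):
# close[t + period] - close[t]
lead_chg     <- c(diff(close, lag = period), rep(NA, period))
trend_future <- ifelse(lead_chg > 0, "UP", "DOWN")

data.frame(close, lag_chg, trend_past, lead_chg, trend_future)
```

With the backward-looking rule the first `period` rows have no label; with the forward-looking rule it is the last `period` rows that are unlabelled, which is exactly the slice one would want to forecast.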

#Data manipulation in h2o:
df1=head(df,-period)
# Keep the last `period` rows of df aside; each model below is asked to predict them.
slc=df%>%slice_tail(n=period)
slc1=slc %>% select(-c(date,lag.chg))
slc.fut=as.h2o(slc1,destination_frame = "slc.fut")

df1=df1 %>% select(-c(date,lag.chg))
dh=as.h2o(df1,destination_frame = "dh")

dh["trend"]=h2o.asfactor(dh["trend"])
response="trend"
predictors=setdiff(names(dh),response)
parts=h2o.splitFrame(dh,ratios=.8)
x.train=parts[[1]]
x.test=parts[[2]]
#End segment data manipulation in h2o
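`h2o.splitFrame` assigns rows to the two parts at random. With time-ordered data, one alternative worth comparing is a chronological split, where the model is trained on the earlier rows and tested on the later ones. The sketch below illustrates the idea in base R on a toy data frame; `toy`, `n_train`, `train`, and `test` are hypothetical names, and this is not the split used for the results in this post.

```r
# Toy, time-ordered data (made up for illustration)
toy <- data.frame(
  date  = as.Date("2023-01-02") + 0:9,
  trend = rep(c("UP", "DOWN"), 5)
)

# Train on the first 80% of rows, test on the remaining rows
n_train <- floor(0.8 * nrow(toy))
train   <- toy[seq_len(n_train), ]
test    <- toy[(n_train + 1):nrow(toy), ]

# Every training date precedes every test date
max(train$date) < min(test$date)
```

Each part could then be converted with `as.h2o()` in the same way as `df1` above.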

Notice that instead of using the entire set of predictors, one can apply h2o's feature selection tools to find the most relevant predictors, the so-called "admissible features", which are the main drivers of the trend. Using only these admissible features can help improve the performance of the chosen models and enhance forecasts.

#Feature selection:
ifg=h2o.infogram(x=predictors,y="trend",training_frame = x.train)

## Warning in .verify_dataxy(training_frame, x, y): removing response variable from
## the explanatory variables

plot(ifg)

adm.var=ifg@admissible_features # admissible features
pr.var=setdiff(names(dh),c(adm.var,"trend")) # remaining, non-admissible predictors
# Infogram of the remaining predictors only:
ifg=h2o.infogram(x=pr.var,y="trend",training_frame = x.train)

plot(ifg)

#End feature selection

The following algorithms can also be run using only admissible features.

Additionally, one can use the h2o.explain(model, newdata) function for detailed performance summaries and visualizations, for instance to see how each predictor affects the trend.

#Random forest with h2o segment:

h2o.rf=h2o.randomForest(x=predictors,y="trend",training_frame = x.train)

h2o.performance(h2o.rf,x.test)

## H2OBinomialMetrics: drf
## 
## MSE:  0.09778684
## RMSE:  0.3127089
## LogLoss:  0.3186495
## Mean Per-Class Error:  0.1250896
## AUC:  0.9360215
## AUCPR:  0.9479744
## Gini:  0.872043
## R^2:  0.5951134
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##        DOWN UP    Error     Rate
## DOWN     52 10 0.161290   =10/62
## UP        8 82 0.088889    =8/90
## Totals   60 92 0.118421  =18/152
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold     value idx
## 1                       max f1  0.500000  0.901099  23
## 2                       max f2  0.260000  0.953390  31
## 3                 max f0point5  0.720000  0.897436  14
## 4                 max accuracy  0.500000  0.881579  23
## 5                max precision  1.000000  1.000000   0
## 6                   max recall  0.260000  1.000000  31
## 7              max specificity  1.000000  1.000000   0
## 8             max absolute_mcc  0.500000  0.753885  23
## 9   max min_per_class_accuracy  0.580000  0.854839  19
## 10 max mean_per_class_accuracy  0.500000  0.874910  23
## 11                     max tns  1.000000 62.000000   0
## 12                     max fns  1.000000 79.000000   0
## 13                     max fps  0.000000 62.000000  44
## 14                     max tps  0.260000 90.000000  31
## 15                     max tnr  1.000000  1.000000   0
## 16                     max fnr  1.000000  0.877778   0
## 17                     max fpr  0.000000  1.000000  44
## 18                     max tpr  0.260000  1.000000  31
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
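The headline numbers in this output can be recovered from the confusion matrix by hand, which is a useful check on how to read it. The sketch below is plain arithmetic in base R, treating UP as the positive class; the variable names are illustrative. It recomputes the accuracy, F1, and mean per-class error at the F1-optimal threshold, and the Gini coefficient from the reported AUC.

```r
# Counts from the random forest confusion matrix above
# (rows: actual, columns: predicted)
tn <- 52  # actual DOWN, predicted DOWN
fp <- 10  # actual DOWN, predicted UP
fn <- 8   # actual UP,   predicted DOWN
tp <- 82  # actual UP,   predicted UP

accuracy  <- (tp + tn) / (tp + tn + fp + fn)
precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)
f1        <- 2 * precision * recall / (precision + recall)
mean_per_class_error <- (fp / (fp + tn) + fn / (fn + tp)) / 2

# Gini is a rescaling of AUC
auc  <- 0.9360215
gini <- 2 * auc - 1

round(c(accuracy = accuracy, f1 = f1,
        mean_per_class_error = mean_per_class_error, gini = gini), 6)
```

These agree with the `max accuracy`, `max f1`, `Mean Per-Class Error`, and `Gini` entries printed by `h2o.performance`.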

pred.rf=h2o.predict(h2o.rf,newdata = x.test)

rf.table=h2o.predicted_vs_actual_by_variable(h2o.rf,x.test,pred.rf,"trend")
rf.table

## Predicted vs Actual by Variable: 
##   trend  predict   actual
## 1  DOWN 0.225806 0.000000
## 2    UP 0.911111 1.000000

h2o.varimp_plot(h2o.rf) # one can remove less important variables

#Predict the held-out slice (the last `period` rows of df)
rf.fut=h2o.predict(h2o.rf,newdata = slc.fut)

rf.fut

##   predict DOWN   UP
## 1      UP 0.54 0.46
## 2    DOWN 0.56 0.44
## 
## [2 rows x 3 columns]
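The first row above is labelled UP even though its UP probability (0.46) is below 0.5. As far as I understand H2O's behaviour for binomial models, the label is assigned by comparing the positive-class probability with a model-specific threshold (by default the max-F1 threshold from the model's own metrics) rather than with 0.5. The sketch below mimics that rule in base R; the threshold value 0.45 is hypothetical, chosen only because it lies between the two probabilities, and is not read from the model.

```r
p_up      <- c(0.46, 0.44)  # UP probabilities from rf.fut above
threshold <- 0.45           # hypothetical value, for illustration only

label <- ifelse(p_up >= threshold, "UP", "DOWN")
label
```

When the two probabilities are this close to each other and to the threshold, the predicted labels are best read together with the probabilities.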

#End random forest segment with h2o

# GLM with h2o segment:
glm.h2o=h2o.glm(x=predictors,y="trend",training_frame = x.train,family = "binomial",alpha = .75,lambda_search = TRUE)

#Note that we can also use a classic feature selection method such as the Lasso (L1 penalty, i.e. alpha=1) to remove weaker variables.
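For reference, the roles of `alpha` and `lambda` in the GLM call can be written out as the elastic net objective. This is a sketch of the penalty as documented for H2O's GLM; the exact scaling of the likelihood term $L(\beta)$ is left generic here and should be treated as an assumption.

```latex
\min_{\beta}\; L(\beta) \;+\; \lambda \left( \alpha \,\lVert \beta \rVert_1 \;+\; \frac{1-\alpha}{2}\, \lVert \beta \rVert_2^2 \right)
```

Here $L(\beta)$ is the negative log-likelihood of the binomial model, $\lambda \ge 0$ sets the overall strength of the penalty, and $\alpha \in [0,1]$ mixes the two norms: $\alpha = 1$ gives the Lasso, $\alpha = 0$ gives ridge regression, and the value $\alpha = 0.75$ used above puts most of the weight on the L1 term. With `lambda_search` enabled, $\lambda$ is chosen along a path of candidate values rather than fixed in advance.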
h2o.performance(glm.h2o,x.test)

## H2OBinomialMetrics: glm
## 
## MSE:  0.1050074
## RMSE:  0.3240484
## LogLoss:  0.3319682
## Mean Per-Class Error:  0.1206093
## AUC:  0.9274194
## AUCPR:  0.946052
## Gini:  0.8548387
## R^2:  0.5652167
## Residual Deviance:  100.9183
## AIC:  122.9183
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##        DOWN UP    Error     Rate
## DOWN     56  6 0.096774    =6/62
## UP       13 77 0.144444   =13/90
## Totals   69 83 0.125000  =19/152
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold     value idx
## 1                       max f1  0.626369  0.890173  82
## 2                       max f2  0.105243  0.937500 119
## 3                 max f0point5  0.665409  0.914634  79
## 4                 max accuracy  0.626369  0.875000  82
## 5                max precision  0.994784  1.000000   0
## 6                   max recall  0.105243  1.000000 119
## 7              max specificity  0.994784  1.000000   0
## 8             max absolute_mcc  0.626369  0.748980  82
## 9   max min_per_class_accuracy  0.626369  0.855556  82
## 10 max mean_per_class_accuracy  0.626369  0.879391  82
## 11                     max tns  0.994784 62.000000   0
## 12                     max fns  0.994784 89.000000   0
## 13                     max fps  0.000174 62.000000 151
## 14                     max tps  0.105243 90.000000 119
## 15                     max tnr  0.994784  1.000000   0
## 16                     max fnr  0.994784  0.988889   0
## 17                     max fpr  0.000174  1.000000 151
## 18                     max tpr  0.105243  1.000000 119
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`

pred.glm=h2o.predict(glm.h2o,newdata = x.test)

pred.glm

##   predict        DOWN          UP
## 1      UP 0.200336913 0.799663087
## 2      UP 0.100544015 0.899455985
## 3    DOWN 0.832841575 0.167158425
## 4    DOWN 0.995166511 0.004833489
## 5    DOWN 0.777915250 0.222084750
## 6      UP 0.005215569 0.994784431
## 
## [152 rows x 3 columns]

glm.table=h2o.predicted_vs_actual_by_variable(glm.h2o,x.test,pred.glm,"trend")
glm.table

## Predicted vs Actual by Variable: 
##   trend  predict   actual
## 1  DOWN 0.145161 0.000000
## 2    UP 0.866667 1.000000

h2o.varimp_plot(glm.h2o)

#Predict the held-out slice (the last `period` rows of df)
glm.fut=h2o.predict(glm.h2o,newdata = slc.fut)

glm.fut

##   predict      DOWN        UP
## 1    DOWN 0.6460148 0.3539852
## 2    DOWN 0.8901014 0.1098986
## 
## [2 rows x 3 columns]

#End GLM segment with h2o

# GBM with h2o segment:
gbm.h2o=h2o.gbm(x=predictors,y="trend",training_frame = x.train,ntrees = 50,max_depth = 3,min_rows = 2,learn_rate = .2,distribution ="bernoulli" )

h2o.performance(gbm.h2o,newdata = x.test)

## H2OBinomialMetrics: gbm
## 
## MSE:  0.09517636
## RMSE:  0.3085067
## LogLoss:  0.3110409
## Mean Per-Class Error:  0.1170251
## AUC:  0.938172
## AUCPR:  0.9531808
## Gini:  0.8763441
## R^2:  0.6059221
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##        DOWN UP    Error     Rate
## DOWN     53  9 0.145161    =9/62
## UP        8 82 0.088889    =8/90
## Totals   61 91 0.111842  =17/152
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold     value idx
## 1                       max f1  0.417723  0.906077  89
## 2                       max f2  0.106803  0.933610 120
## 3                 max f0point5  0.640323  0.926829  78
## 4                 max accuracy  0.417723  0.888158  89
## 5                max precision  0.998350  1.000000   0
## 6                   max recall  0.106803  1.000000 120
## 7              max specificity  0.998350  1.000000   0
## 8             max absolute_mcc  0.417723  0.767948  89
## 9   max min_per_class_accuracy  0.503996  0.866667  83
## 10 max mean_per_class_accuracy  0.640323  0.889964  78
## 11                     max tns  0.998350 62.000000   0
## 12                     max fns  0.998350 89.000000   0
## 13                     max fps  0.002888 62.000000 149
## 14                     max tps  0.106803 90.000000 120
## 15                     max tnr  0.998350  1.000000   0
## 16                     max fnr  0.998350  0.988889   0
## 17                     max fpr  0.002888  1.000000 149
## 18                     max tpr  0.106803  1.000000 120
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`

pred.gbm=h2o.predict(gbm.h2o,newdata = x.test)

pred.gbm

##   predict        DOWN        UP
## 1      UP 0.008525326 0.9914747
## 2      UP 0.004556997 0.9954430
## 3    DOWN 0.665278478 0.3347215
## 4    DOWN 0.855967711 0.1440323
## 5      UP 0.333538225 0.6664618
## 6      UP 0.013130790 0.9868692
## 
## [152 rows x 3 columns]

gbm.table=h2o.predicted_vs_actual_by_variable(gbm.h2o,x.test,pred.gbm,"trend")
gbm.table

## Predicted vs Actual by Variable: 
##   trend  predict   actual
## 1  DOWN 0.112903 0.000000
## 2    UP 0.855556 1.000000

h2o.varimp_plot(gbm.h2o)

#Predict the held-out slice (the last `period` rows of df)
gbm.fut=h2o.predict(gbm.h2o,newdata = slc.fut)

gbm.fut

##   predict      DOWN        UP
## 1    DOWN 0.6300092 0.3699908
## 2    DOWN 0.8205703 0.1794297
## 
## [2 rows x 3 columns]

#End GBM segment with h2o

# SVM with h2o segment:
svm.h2o=h2o.psvm(x=predictors,y="trend",training_frame = x.train,gamma = .01,rank_ratio = .1,disable_training_metrics = FALSE)

h2o.performance(svm.h2o,x.test)

## H2OBinomialMetrics: psvm
## 
## MSE:  0.1578947
## RMSE:  0.3973597
## LogLoss:  NaN
## Mean Per-Class Error:  0.1609319
## AUC:  NaN
## AUCPR:  NaN
## Gini:  NaN
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##        DOWN UP    Error     Rate
## DOWN     51 11 0.177419   =11/62
## UP       13 77 0.144444   =13/90
## Totals   64 88 0.157895  =24/152
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold     value idx
## 1                       max f1  1.000000  0.865169   0
## 2                       max f2  1.000000  0.859375   0
## 3                 max f0point5  1.000000  0.871041   0
## 4                 max accuracy  1.000000  0.842105   0
## 5                max precision  1.000000  0.875000   0
## 6                   max recall  1.000000  0.855556   0
## 7              max specificity  1.000000  0.822581   0
## 8             max absolute_mcc  1.000000  0.674998   0
## 9   max min_per_class_accuracy  1.000000  0.822581   0
## 10 max mean_per_class_accuracy  1.000000  0.839068   0
## 11                     max tns  1.000000 51.000000   0
## 12                     max fns  1.000000 13.000000   0
## 13                     max fps  1.000000 11.000000   0
## 14                     max tps  1.000000 77.000000   0
## 15                     max tnr  1.000000  0.822581   0
## 16                     max fnr  1.000000  0.144444   0
## 17                     max fpr  1.000000  0.177419   0
## 18                     max tpr  1.000000  0.855556   0
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
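The NaN values for LogLoss, AUC, AUCPR, and Gini are consistent with the model returning hard 0/1 class scores rather than graded probabilities (compare the `svm.fut` output further down): every maximum metric is reported at the single threshold 1. With hard labels the squared error of each prediction is either 0 or 1, so the MSE reduces to the misclassification rate. A quick base-R check with the counts from the confusion matrix above (variable names are illustrative):

```r
errors <- 11 + 13  # misclassified rows in the SVM confusion matrix
n      <- 152      # rows in the test frame

mse  <- errors / n
rmse <- sqrt(mse)

round(c(mse = mse, rmse = rmse), 7)
```

Both values match the MSE and RMSE lines of the output, and the MSE equals the overall error rate in the `Totals` row.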

pred.svm=h2o.predict(svm.h2o,newdata = x.test)

svm.table=h2o.predicted_vs_actual_by_variable(svm.h2o,x.test,pred.svm,"trend")
svm.table

## Predicted vs Actual by Variable: 
##   trend  predict   actual
## 1  DOWN 0.177419 0.000000
## 2    UP 0.855556 1.000000

#More generally: h2o.explain(svm.h2o,newdata = x.test)
#Predict the held-out slice (the last `period` rows of df)
svm.fut=h2o.predict(svm.h2o,newdata = slc.fut)

svm.fut

##   predict DOWN UP
## 1    DOWN    1  0
## 2    DOWN    1  0
## 
## [2 rows x 3 columns]

#End SVM segment with h2o

Notice how the above models evaluate the importance of predictors.

#Now we can train a deep learning model on the data:
dl.h2o=h2o.deeplearning(x=predictors,y="trend",training_frame = x.train,epochs = 15,activation = "Rectifier",hidden = c(10,5,10),input_dropout_ratio = .7)

## Warning in .verify_dataxy(training_frame, x, y, autoencoder): removing response
## variable from the explanatory variables

h2o.performance(dl.h2o,newdata = x.test)

## H2OBinomialMetrics: deeplearning
## 
## MSE:  0.1455137
## RMSE:  0.3814626
## LogLoss:  0.4470938
## Mean Per-Class Error:  0.1956989
## AUC:  0.8697133
## AUCPR:  0.9004836
## Gini:  0.7394265
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##        DOWN UP    Error     Rate
## DOWN     46 16 0.258065   =16/62
## UP       12 78 0.133333   =12/90
## Totals   58 94 0.184211  =28/152
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold     value idx
## 1                       max f1  0.518471  0.847826  93
## 2                       max f2  0.109562  0.914761 120
## 3                 max f0point5  0.668760  0.837438  78
## 4                 max accuracy  0.518471  0.815789  93
## 5                max precision  0.989168  1.000000   0
## 6                   max recall  0.046528  1.000000 131
## 7              max specificity  0.989168  1.000000   0
## 8             max absolute_mcc  0.518471  0.615705  93
## 9   max min_per_class_accuracy  0.619020  0.774194  83
## 10 max mean_per_class_accuracy  0.518471  0.804301  93
## 11                     max tns  0.989168 62.000000   0
## 12                     max fns  0.989168 89.000000   0
## 13                     max fps  0.004088 62.000000 151
## 14                     max tps  0.046528 90.000000 131
## 15                     max tnr  0.989168  1.000000   0
## 16                     max fnr  0.989168  0.988889   0
## 17                     max fpr  0.004088  1.000000 151
## 18                     max tpr  0.046528  1.000000 131
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`

h2o.predict(dl.h2o,newdata = x.test)

##   predict       DOWN        UP
## 1      UP 0.14424104 0.8557590
## 2      UP 0.07893262 0.9210674
## 3      UP 0.19644233 0.8035577
## 4    DOWN 0.79459663 0.2054034
## 5    DOWN 0.72422538 0.2757746
## 6      UP 0.17891157 0.8210884
## 
## [152 rows x 3 columns]

h2o.varimp_plot(dl.h2o)

#End of deep learning segment

Let's try the h2o AutoML function:

aml.h2o=h2o.automl(y="trend",training_frame = x.train,max_models = 3)

## 14:02:23.108: AutoML: XGBoost is not available; skipping it.

aml.h2o

## AutoML Details
## ==============
## Project Name: AutoML_2_20230822_140223 
## Leader Model ID: GLM_1_AutoML_2_20230822_140223 
## Algorithm: glm 
## 
## Total Number of Models Trained: 4 
## Start Time: 2023-08-22 14:02:23 UTC 
## End Time: 2023-08-22 14:02:34 UTC 
## Duration: 11 s
## 
## Leaderboard
## ===========
##                                                  model_id       auc   logloss
## 1                          GLM_1_AutoML_2_20230822_140223 0.9225302 0.3507170
## 2 StackedEnsemble_BestOfFamily_1_AutoML_2_20230822_140223 0.9215561 0.3524509
## 3                          DRF_1_AutoML_2_20230822_140223 0.9060080 0.4909026
## 4                          GBM_1_AutoML_2_20230822_140223 0.8825792 0.4372729
##       aucpr mean_per_class_error      rmse       mse
## 1 0.9374635            0.1496795 0.3335632 0.1112644
## 2 0.9305654            0.1464555 0.3321439 0.1103196
## 3 0.9076241            0.2003394 0.3494657 0.1221263
## 4 0.9045784            0.1931561 0.3737525 0.1396909
## 
## [4 rows x 7 columns]

h2o.performance(aml.h2o@leader,newdata = x.test)

## H2OBinomialMetrics: glm
## 
## MSE:  0.1040969
## RMSE:  0.3226405
## LogLoss:  0.3296671
## Mean Per-Class Error:  0.1206093
## AUC:  0.9281362
## AUCPR:  0.9465219
## Gini:  0.8562724
## R^2:  0.5689867
## Residual Deviance:  100.2188
## AIC:  126.2188
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##        DOWN UP    Error     Rate
## DOWN     56  6 0.096774    =6/62
## UP       13 77 0.144444   =13/90
## Totals   69 83 0.125000  =19/152
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold     value idx
## 1                       max f1  0.627000  0.890173  82
## 2                       max f2  0.112686  0.939457 118
## 3                 max f0point5  0.651365  0.914634  79
## 4                 max accuracy  0.627000  0.875000  82
## 5                max precision  0.994172  1.000000   0
## 6                   max recall  0.112686  1.000000 118
## 7              max specificity  0.994172  1.000000   0
## 8             max absolute_mcc  0.627000  0.748980  82
## 9   max min_per_class_accuracy  0.627000  0.855556  82
## 10 max mean_per_class_accuracy  0.627000  0.879391  82
## 11                     max tns  0.994172 62.000000   0
## 12                     max fns  0.994172 89.000000   0
## 13                     max fps  0.000205 62.000000 151
## 14                     max tps  0.112686 90.000000 118
## 15                     max tnr  0.994172  1.000000   0
## 16                     max fnr  0.994172  0.988889   0
## 17                     max fpr  0.000205  1.000000 151
## 18                     max tpr  0.112686  1.000000 118
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`

h2o.predict(aml.h2o,newdata = x.test)

##   predict        DOWN          UP
## 1      UP 0.201180266 0.798819734
## 2      UP 0.107515127 0.892484873
## 3    DOWN 0.829518737 0.170481263
## 4    DOWN 0.994471889 0.005528111
## 5    DOWN 0.778892309 0.221107691
## 6      UP 0.005828042 0.994171958
## 
## [152 rows x 3 columns]

h2o.predict(aml.h2o,newdata = slc.fut)

##   predict      DOWN        UP
## 1    DOWN 0.6366427 0.3633573
## 2    DOWN 0.8794057 0.1205943
## 
## [2 rows x 3 columns]

# One can retrieve more information with h2o.explain(aml.h2o,newdata), either to explain a single model or to compare the models above.
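Since the aim is to compare algorithms, it helps to put the test-set metrics side by side. The sketch below simply re-types the values printed by the `h2o.performance` calls above into a data frame and sorts by AUC; the names `results` and `ranked` are illustrative. The SVM row has no AUC or log loss because the output reports NaN for both.

```r
# Test-set metrics copied from the h2o.performance() outputs above
results <- data.frame(
  model   = c("Random forest", "GLM", "GBM", "SVM",
              "Deep learning", "AutoML leader (GLM)"),
  auc     = c(0.9360215, 0.9274194, 0.9381720, NA, 0.8697133, 0.9281362),
  logloss = c(0.3186495, 0.3319682, 0.3110409, NA, 0.4470938, 0.3296671),
  mean_per_class_error = c(0.1250896, 0.1206093, 0.1170251,
                           0.1609319, 0.1956989, 0.1206093)
)

# Sort by AUC, best first; the row without an AUC goes last
ranked <- results[order(-results$auc), ]
ranked
```

On this particular split the GBM has the highest AUC and the lowest log loss, with the random forest and the two GLMs close behind. With only 152 test rows, differences this small may well change on another split.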

Stock Market Prediction and Machine Learning with h2o.ai

ABA

2023-08-21
