Forecasting financial crises in the Brazilian stock market using Naive Bayes

Marcos J Ribeiro

04/06/2020

My Machine Learning work

Naive Bayes function

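# Fit a naive Bayes model for the class column k of data frame df.
# cd = 1: categorical predictors; cd = 0: non-categorical (numeric) predictors.
naivef = function(k, df, cd = 1){
  if (cd == 1){
    naive_marcos(k, df)
  } else if (cd == 0){
    naive_marcos2(k, df)
  } else {
    stop('Use cd = 1 for categorical predictors and cd = 0 for non-categorical predictors.')
  }
}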

Prediction function

# Predict the class column k for the new data df_n using the fitted model cl.
# cclas = 0 returns posterior probabilities; cclas = 1 returns class labels.
predf = function(k, df, df_n, cl, cclas = 0, cd = 1){
  if (cd == 1){
    pred_marcos(k, df, df_n, cl, cclas)
  } else if (cd == 0){
    pred_marcos2(k, df, df_n, cl, cclas)
  } else {
    stop('Use cd = 1 for categorical predictors and cd = 0 for non-categorical predictors.')
  }
}
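The helpers naive_marcos and pred_marcos are not shown in these slides. As a rough idea of what an inductor for categorical predictors could compute, here is a minimal sketch in base R. The names nb_fit_cat and nb_predict_cat and the toy data frame are hypothetical, and the sketch applies no smoothing, so it is an illustration under those assumptions rather than my actual implementation.

```r
# Hypothetical helper (not naive_marcos): class priors plus one table of
# P(level | class) per predictor.
nb_fit_cat <- function(target, data) {
  y <- factor(data[[target]])
  predictors <- setdiff(names(data), target)
  tab <- prop.table(table(y))
  priors <- setNames(as.numeric(tab), names(tab))
  cond <- lapply(predictors, function(p) {
    # Rows are classes, columns are predictor levels; each row sums to 1.
    prop.table(table(y, factor(data[[p]])), margin = 1)
  })
  names(cond) <- predictors
  list(priors = priors, cond = cond)
}

# Hypothetical helper (not pred_marcos): posterior = prior * product of
# conditionals, normalised across classes. Unseen levels raise an error.
nb_predict_cat <- function(model, newdata) {
  classes <- names(model$priors)
  scores <- sapply(classes, function(cl) {
    s <- rep(model$priors[[cl]], nrow(newdata))
    for (p in names(model$cond)) {
      s <- s * model$cond[[p]][cl, as.character(newdata[[p]])]
    }
    s
  })
  scores <- matrix(scores, nrow = nrow(newdata), dimnames = list(NULL, classes))
  scores / rowSums(scores)
}

# Made-up toy data, only for illustration.
toy <- data.frame(
  historia = c("boa", "boa", "ruim", "ruim", "ruim", "boa"),
  divida   = c("baixa", "alta", "alta", "alta", "baixa", "baixa"),
  risco    = c("baixo", "baixo", "alto", "alto", "baixo", "alto")
)
model <- nb_fit_cat("risco", toy)
post  <- nb_predict_cat(model, data.frame(historia = "ruim", divida = "alta"))
print(post)
```

Because no smoothing is applied, a level never observed with a class gets posterior probability zero for that class.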

My first example (default risk)

Dataset with categorical independent variables
historia divida risco
ruim alta alto
desconhecida alta alto
desconhecida baixa moderado
desconhecida baixa alto
desconhecida baixa baixo
desconhecida baixa baixo

Inductor for categorical independent variables

cl = naivef('risco', df, cd=1)
## [1] "=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-"
## [1] "Marcos Naive Bayes Classifier for Discrete Predictors"
## [1] "=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-"
## A-priori probabilities:
## 
##      alto     baixo  moderado 
## 0.4285714 0.3571429 0.2142857 
## Conditional Probabilities:

Inductor for categorical independent variables

head(cl[, ,1])
##                    alta      baixa
## boa          0.04761905 0.02380952
## desconhecida 0.09523810 0.04761905
## ruim         0.14285714 0.07142857

Predict

New data set with categorical independent variables
historia divida
boa baixa
boa alta
ruim baixa
ruim alta
desconhecida baixa
desconhecida alta

Predict

predf('risco', df, df_teste, cl, cclas = 0, cd=1)
##           alto     baixo  moderado
## [1,] 0.1190476 0.6428571 0.2380952
## [2,] 0.3030303 0.5454545 0.1515152
## [3,] 0.6000000 0.0000000 0.4000000
## [4,] 0.8571429 0.0000000 0.1428571
## [5,] 0.2631579 0.4736842 0.2631579
## [6,] 0.5405405 0.3243243 0.1351351

Predict

predf('risco', df, df_teste, cl, cclas = 1, cd=1)
## [1] "baixo" "baixo" "alto"  "alto"  "baixo" "alto"
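The class labels returned with cclas = 1 are consistent with taking, for each row, the class with the largest posterior probability from the cclas = 0 output. The sketch below copies the posterior matrix printed above and applies that rule with base R; whether pred_marcos does it exactly this way is an assumption.

```r
# Posterior probabilities copied from the cclas = 0 output.
post <- matrix(
  c(0.1190476, 0.6428571, 0.2380952,
    0.3030303, 0.5454545, 0.1515152,
    0.6000000, 0.0000000, 0.4000000,
    0.8571429, 0.0000000, 0.1428571,
    0.2631579, 0.4736842, 0.2631579,
    0.5405405, 0.3243243, 0.1351351),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("alto", "baixo", "moderado"))
)

# Index of the largest value in each row, mapped back to the class name.
labels <- colnames(post)[max.col(post, ties.method = "first")]
print(labels)
```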

Quality control

library(e1071) 
clas2 = naiveBayes(x=df[-3], y = as.factor(df$risco))
prev2 = predict(clas2, newdata = df_teste) 
print(prev2) 
## [1] baixo baixo alto  alto  baixo alto 
## Levels: alto baixo moderado

Plots

[Figure: Naive Bayes decision boundaries]

My second example (gender characteristics)

Dataset with non-categorical independent variables
height weight sex
6.00 180 male
5.92 190 male
5.58 170 male
5.92 165 male
5.00 100 female
5.50 150 female

Inductor for non-categorical independent variables

cl2 = naivef('sex', teste, cd=0)
## [1] "=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-"
## [1] "Marcos Naive Bayes Classifier for Discrete Predictors"
## [1] "=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-"
## A-priori probabilities:
## 
## female   male 
##    0.5    0.5

Inductor for non-categorical independent variables

cl2
## , , female
## 
##          mean   variance
## [1,]   5.4175  0.3118092
## [2,] 132.5000 23.6290781
## 
## , , male
## 
##         mean   variance
## [1,]   5.855  0.1871719
## [2,] 176.250 11.0867789
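As a quick check of these parameters, the four male rows listed in the data set above can be summarised directly. The sketch assumes those four rows are all the male observations; under that assumption the means match the output, and the numbers in the column labelled variance match the sample standard deviation (sd), not var.

```r
# The four male rows from the table above.
male <- data.frame(
  height = c(6.00, 5.92, 5.58, 5.92),
  weight = c(180, 190, 170, 165)
)

male_mean <- sapply(male, mean)
male_sd   <- sapply(male, sd)    # sample standard deviation (n - 1 denominator)
male_var  <- sapply(male, var)   # sample variance, shown for comparison

print(rbind(mean = male_mean, sd = male_sd, var = male_var))
```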

Predict

New dataset with non-categorical independent variables
height weight
5.4 170
5.8 183
6.0 188
5.0 188

Predict

predf('sex',teste, dfn, cl2, cclas =1, cd=0)
##      [,1]    
## [1,] "female"
## [2,] "male"  
## [3,] "male"  
## [4,] "female"

Predict

predf('sex',teste, dfn, cl2, cclas =0, cd=0)
##           female        male
## [1,] 0.642353175 0.357646825
## [2,] 0.016711702 0.983288298
## [3,] 0.007327301 0.992672699
## [4,] 0.997700955 0.002299045
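To see where a row of this matrix can come from, the sketch below recomputes the posterior for the first new observation (height 5.4, weight 170) from the fitted parameters printed for cl2, using a normal density per predictor. It assumes the second parameter column is a standard deviation and that pred_marcos2 uses Gaussian likelihoods; both are assumptions, although the result is close to the first row above.

```r
# Parameters copied from the cl2 output (order: height, weight).
params <- list(
  female = list(mean = c(5.4175, 132.5), sd = c(0.3118092, 23.6290781)),
  male   = list(mean = c(5.855, 176.25), sd = c(0.1871719, 11.0867789))
)
prior <- c(female = 0.5, male = 0.5)
x_new <- c(5.4, 170)   # first row of the new data set

# Unnormalised score per class: prior times the product of normal densities.
score <- sapply(names(params), function(cl) {
  prior[[cl]] * prod(dnorm(x_new, mean = params[[cl]]$mean, sd = params[[cl]]$sd))
})

# Normalise so the two class probabilities sum to one.
posterior <- score / sum(score)
print(round(posterior, 6))
```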

Quality control

clas3 = naiveBayes(x=teste[-3], y = teste$sex)
prev3 = predict(clas3, newdata = dfn, 'raw')
print(prev3)
##           female        male
## [1,] 0.642353175 0.357646825
## [2,] 0.016711702 0.983288298
## [3,] 0.007327301 0.992672699
## [4,] 0.997700955 0.002299045

Plots

[Two figures: Naive Bayes decision boundaries]

My third example (census data)

Census dataset head
education occupation income
HS-grad Adm-clerical <=50K
Some-college Prof-specialty <=50K
HS-grad Adm-clerical <=50K
Bachelors Prof-specialty <=50K
Bachelors Prof-specialty <=50K
HS-grad Other-service <=50K

My third example

Levels of each variable
education: 10th, 11th, 12th, 1st-4th, 5th-6th, 7th-8th, 9th, Assoc-acdm, Assoc-voc, Bachelors, Doctorate, HS-grad, Masters, Preschool, Prof-school, Some-college
occupation: Adm-clerical, Armed-Forces, Craft-repair, Exec-managerial, Farming-fishing, Handlers-cleaners, Machine-op-inspct, Other-service, Priv-house-serv, Prof-specialty, Protective-serv, Sales, Tech-support, Transport-moving
income: <=50K, >50K

Inductor for categorical independent variables

cl4 =  naivef('income', tr1, cd=1)
## [1] "=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-"
## [1] "Marcos Naive Bayes Classifier for Discrete Predictors"
## [1] "=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-"
## A-priori probabilities:
## 
##     <=50K      >50K 
## 0.5032143 0.4967857 
## Conditional Probabilities:

Predict

head(predf('income', tr1, tst1, cl4, cclas=0, cd=1))
##          <=50K      >50K
## [1,] 0.8412499 0.1587501
## [2,] 0.4708610 0.5291390
## [3,] 1.0000000 0.0000000
## [4,] 1.0000000 0.0000000
## [5,] 1.0000000 0.0000000
## [6,] 0.1352615 0.8647385

Predict

ndp = predf('income', tr1, tst1, cl4, cclas=1, cd=1)
head(ndp)
## [1] " <=50K" " >50K"  " <=50K" " <=50K" " <=50K" " >50K"

Predict

# Share of test rows whose predicted label equals the observed label, in percent.
accuracy1 = mean(ndp == tst1[,'income']) * 100
accuracy1
## [1] 73.63552
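A single accuracy figure hides which class the errors fall on. A confusion matrix built with table() makes that visible. The label vectors below are made up and only stand in for ndp and tst1[, 'income']; they are not results from the census data.

```r
# Hypothetical predicted and observed labels (illustration only).
predicted <- c(" <=50K", " >50K", " <=50K", " <=50K", " >50K", " >50K")
actual    <- c(" <=50K", " >50K", " >50K",  " <=50K", " <=50K", " >50K")

# Overall accuracy in percent.
accuracy_pct <- mean(predicted == actual) * 100

# Rows are observed classes, columns are predicted classes.
confusion <- table(actual = actual, predicted = predicted)
print(confusion)

# Share of each observed class that was predicted correctly.
recall <- diag(confusion) / rowSums(confusion)
print(recall)
```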

Plots

[Figure: Naive Bayes decision region]

[Figure: Naive Bayes decision boundaries]

Predict financial crisis using my algorithm

\begin{equation}\label{eq12} E(R^i_{t+1}) - R^f_{t+1} = \lambda_{g_{t+1}} \beta_{i,g_{t+1}} \end{equation}

where

\begin{equation}\label{eq13} \beta_{i,g_{t+1}} = \frac{Cov_t(g_{t+1}, R^i_{t+1})}{Var_t(g_{t+1})} \end{equation}

and

\begin{equation}\label{eq14} \lambda_{g_{t+1}} = \gamma Var_t(g_{t+1}) \end{equation}
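Substituting the definitions of the beta and the price of risk into the first equation shows that the variance term cancels, so the expected excess return is proportional to the conditional covariance alone. The steps below write the return of asset i as R^i_{t+1} throughout.

```latex
\begin{align*}
E(R^i_{t+1}) - R^f_{t+1}
  &= \lambda_{g_{t+1}} \, \beta_{i,g_{t+1}} \\
  &= \gamma \, Var_t(g_{t+1}) \cdot \frac{Cov_t(g_{t+1}, R^i_{t+1})}{Var_t(g_{t+1})} \\
  &= \gamma \, Cov_t(g_{t+1}, R^i_{t+1})
\end{align*}
```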

Predict financial crisis using my algorithm

Crisis proxy

\begin{equation}\label{eq15} CMAX_t = \frac{p_t}{\max(p_{t-12},\dotsb,p_t)} \end{equation}

Crisis proxy

# Rolling CMAX: w = look-back window, n = number of values to compute,
# s = price series (needs at least n + w observations).
CMAX = function(w, n, s){
  cmax = matrix(nrow = n, ncol = 1)
  for (j in 1:n){
    window = s[j:(w + j)]                  # prices p_{t-w}, ..., p_t
    cmax[j] = window[w + 1] / max(window)  # current price over the window maximum
  }
  return(cmax)
}
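A small worked example may help to read the function. The sketch below repeats the same arithmetic on a made-up price vector with a window of w = 2 (the formula above corresponds to w = 12). The helper name cmax_roll, the prices, and the 0.8 cutoff used to flag a crisis are all hypothetical; these slides do not state how the crisis indicator is derived from CMAX.

```r
# Hypothetical helper with the same arithmetic as CMAX, returning a plain vector.
cmax_roll <- function(prices, w) {
  n <- length(prices) - w
  sapply(seq_len(n), function(j) {
    window <- prices[j:(j + w)]     # the current price is the last element
    window[w + 1] / max(window)     # 1 means the price is at its window high
  })
}

prices <- c(10, 12, 11, 9, 12, 15)  # made-up prices
cm <- cmax_roll(prices, w = 2)
print(round(cm, 4))

# Hypothetical rule: flag a crisis when the price is more than 20% below
# its recent maximum.
crisis <- as.integer(cm < 0.8)
print(crisis)
```

Values close to 1 mean the price sits near its recent maximum, and lower values mean a deeper drawdown within the window.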

Crisis proxy

[Figure: CMAX for the Ibovespa]

[Figure: Ibovespa returns]

Non-categorical independent variables

Inductor for crisis forecasting in the Brazilian stock market

cl3 = naivef('x',tr, cd=0)
## [1] "=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-"
## [1] "Marcos Naive Bayes Classifier for Discrete Predictors"
## [1] "=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-"
## A-priori probabilities:
## 
##         0         1 
## 0.4935065 0.5064935

Predict

predf('x', tr, tst, cl3, cclas=0, cd=0)
##                   0 1
##  [1,] 5.545518e-105 1
##  [2,]  1.277693e-98 1
##  [3,]  5.748209e-91 1
##  [4,]  4.422171e-88 1
##  [5,]  1.389092e-87 1
##  [6,]  2.479376e-92 1
##  [7,] 1.332146e-110 1
##  [8,]  3.762972e-80 1
##  [9,]  2.888895e-67 1
## [10,]  1.063357e-14 1

Predict

predf('x', tr, tst, cl3, cclas=1, cd=0)
##       [,1]
##  [1,] "1" 
##  [2,] "1" 
##  [3,] "1" 
##  [4,] "1" 
##  [5,] "1" 
##  [6,] "1" 
##  [7,] "1" 
##  [8,] "1" 
##  [9,] "1" 
## [10,] "1"

Predict

prev = predf('x', tr, tst, cl3, cclas=1, cd=0)
accuracy = (sum((prev == tst[,1])*1)/length(tst[,1]) )*100
accuracy
## [1] 100
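Every prediction shown above is class 1, so an accuracy of 100 implies that every observed label in those test rows is also 1. If the ten rows are the whole test set, a rule that always answers 1 would score the same, and it is worth comparing the accuracy with that majority-class baseline. The sketch below uses hypothetical vectors that mimic this situation, not the actual tst data.

```r
# Hypothetical test window containing only crisis observations (class 1).
actual    <- rep(1, 10)
predicted <- rep("1", 10)   # what a constant "always 1" rule would return

# Accuracy in percent (R compares the numeric 1 with "1" as strings).
accuracy_pct <- mean(predicted == actual) * 100

# Share of each class in the test labels, keeping the empty class 0 visible.
class_share <- prop.table(table(factor(actual, levels = c(0, 1))))

# Accuracy of always predicting the most frequent class.
baseline_pct <- max(class_share) * 100

print(c(accuracy = accuracy_pct, baseline = baseline_pct))
```

When the two numbers coincide, the accuracy alone does not show whether the classifier separates the classes, and a test window containing both classes gives a more informative check.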

Plots

[Figure: Naive Bayes decision region]

[Figure: Naive Bayes decision boundaries]

Predict financial crisis using my algorithm

Inductor for crisis forecasting in the Brazilian stock market

cl4 = naivef('x',tr2, cd=0)
## [1] "=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-"
## [1] "Marcos Naive Bayes Classifier for Discrete Predictors"
## [1] "=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-"
## A-priori probabilities:
## 
##         0         1 
## 0.4935065 0.5064935

Predict

predf('x', tr2, tst2, cl4, cclas=0, cd=0)
##                  0 1
##  [1,] 5.020233e-42 1
##  [2,] 3.710212e-40 1
##  [3,] 5.763950e-46 1
##  [4,] 1.000049e-48 1
##  [5,] 4.960224e-48 1
##  [6,] 3.223118e-49 1
##  [7,] 1.396563e-46 1
##  [8,] 1.053612e-47 1
##  [9,] 3.915360e-53 1
## [10,] 4.292669e-68 1

Predict

predf('x', tr2, tst2, cl4, cclas=1, cd=0)
##       [,1]
##  [1,] "1" 
##  [2,] "1" 
##  [3,] "1" 
##  [4,] "1" 
##  [5,] "1" 
##  [6,] "1" 
##  [7,] "1" 
##  [8,] "1" 
##  [9,] "1" 
## [10,] "1"

Predict

prev2 = predf('x', tr2, tst2, cl4, cclas=1, cd=0)
accuracy = (sum((prev2 == tst2[,1])*1)/length(tst2[,1]) )*100
accuracy
## [1] 100

Plots

[Figure: Naive Bayes decision region]

[Figure: Naive Bayes decision boundaries]