1 Greetings

In this RPubs publication whatever is done behind InvestNow starting from data collection until machine learning modeling will be explained, with the aim that the project becomes easier to understand and can be developed by more people to become even better project.

2 EDA

2.1 Data Collection

InvestNow collect all stock data needed from Yahoo Finance by using function tq_get() from library tidyquant. There are two arguments from function tq_get() which must be noticed.
- Argument x :
This argument is used to representing a single stock symbol, the stock symbol used by Yahoo Finance might be different with stock symbol that is usually seen in a particular country. For example in Indonesia, the name of the shares of BRI bank is called BBRI but in Yahoo Finance it is called BBRI.JK.
- Argument get :
This argument is used to representing several options about what data do you want to retrieve and from where the data resource is. Since the data InvestNow would like to collect are open, high, low, close, volume and adjusted stock prices for a stock symbol from Yahoo Finance, please use options "stock.prices".

However, since it is a free data, the data obtained using functiontq_get() from Yahoo Finance is not actual but H-1 data.

2.1.1 BBRI.JK

library(tidyquant)

bbri <- tq_get(x = "BBRI.JK", 
               get = "stock.prices", 
               from = " 2016-01-01")

head(bbri, 3)

After the data from the desired stock has been successfully collected from Yahoo Finance, let’s try to check if there is a missing value with function colSums(is.na()).

colSums(is.na(bbri))

##   symbol     date     open     high      low    close   volume adjusted 
##        0        0        1        1        1        1        1        1

Unfortunately there is missing value, the missing value can be remove by implementing function drop_na() from library tidyverse.

library(tidyverse)

bbri <- bbri %>% 
  drop_na()

colSums(is.na(bbri))

##   symbol     date     open     high      low    close   volume adjusted 
##        0        0        0        0        0        0        0        0

glimpse(bbri)

## Rows: 1,411
## Columns: 8
## $ symbol   <chr> "BBRI.JK", "BBRI.JK", "BBRI.JK", "BBRI.JK", "BBRI.JK", "BBRI.~
## $ date     <date> 2016-01-04, 2016-01-05, 2016-01-06, 2016-01-07, 2016-01-08, ~
## $ open     <dbl> 2280, 2315, 2280, 2270, 2250, 2280, 2295, 2335, 2270, 2300, 2~
## $ high     <dbl> 2320, 2365, 2355, 2305, 2340, 2305, 2345, 2340, 2355, 2330, 2~
## $ low      <dbl> 2240, 2315, 2280, 2250, 2250, 2255, 2285, 2315, 2270, 2290, 2~
## $ close    <dbl> 2295, 2315, 2305, 2250, 2320, 2275, 2320, 2320, 2345, 2290, 2~
## $ volume   <dbl> 100379000, 108043000, 105125500, 71275500, 106501000, 1131830~
## $ adjusted <dbl> 1902.051, 1918.626, 1910.339, 1864.756, 1922.770, 1885.475, 1~

From the observation results of checking data types, there is no data type that must be changed (optional: the symbol column can be changed to a factor data type), and there are no columns containing missing values. Stock data from Bank Rakyat Indonesia is ready for further processing.

2.1.2 ISAT.JK

isat <- tq_get(x = "ISAT.JK", 
               get = "stock.prices", 
               from = " 2016-01-01")

head(isat, 3)

colSums(is.na(isat))

##   symbol     date     open     high      low    close   volume adjusted 
##        0        0        1        1        1        1        1        1

isat <- isat %>% 
  drop_na()

colSums(is.na(isat))

##   symbol     date     open     high      low    close   volume adjusted 
##        0        0        0        0        0        0        0        0

glimpse(isat)

## Rows: 1,411
## Columns: 8
## $ symbol   <chr> "ISAT.JK", "ISAT.JK", "ISAT.JK", "ISAT.JK", "ISAT.JK", "ISAT.~
## $ date     <date> 2016-01-04, 2016-01-05, 2016-01-06, 2016-01-07, 2016-01-08, ~
## $ open     <dbl> 5500, 5250, 5500, 5300, 5250, 5400, 5300, 5425, 5400, 5500, 5~
## $ high     <dbl> 5500, 5625, 5600, 5475, 5350, 5400, 5425, 5500, 5500, 5500, 5~
## $ low      <dbl> 5325, 5250, 5300, 5300, 5050, 5275, 5275, 5350, 5300, 5400, 5~
## $ close    <dbl> 5325, 5400, 5375, 5350, 5300, 5325, 5425, 5400, 5400, 5475, 5~
## $ volume   <dbl> 55100, 78500, 646400, 49000, 402300, 39600, 154500, 415300, 8~
## $ adjusted <dbl> 5150.757, 5223.304, 5199.122, 5174.939, 5126.576, 5150.757, 5~

2.1.3 SIDO.JK

sido <- tq_get(x = "SIDO.JK", 
               get = "stock.prices", 
               from = " 2016-01-01")

head(sido, 3)

colSums(is.na(sido))

##   symbol     date     open     high      low    close   volume adjusted 
##        0        0        8        8        8        8        8        8

sido <- sido %>% 
  drop_na()

colSums(is.na(sido))

##   symbol     date     open     high      low    close   volume adjusted 
##        0        0        0        0        0        0        0        0

glimpse(sido)

## Rows: 1,404
## Columns: 8
## $ symbol   <chr> "SIDO.JK", "SIDO.JK", "SIDO.JK", "SIDO.JK", "SIDO.JK", "SIDO.~
## $ date     <date> 2016-01-04, 2016-01-05, 2016-01-06, 2016-01-07, 2016-01-08, ~
## $ open     <dbl> 275.0, 275.0, 267.5, 262.5, 267.5, 260.0, 260.0, 260.0, 255.0~
## $ high     <dbl> 275.0, 275.0, 267.5, 267.5, 267.5, 262.5, 260.0, 262.5, 257.5~
## $ low      <dbl> 270.0, 262.5, 260.0, 257.5, 260.0, 255.0, 257.5, 255.0, 249.5~
## $ close    <dbl> 272.5, 265.0, 262.5, 267.5, 262.5, 257.5, 257.5, 257.5, 252.5~
## $ volume   <dbl> 1518000, 6712800, 7414400, 8664600, 4862800, 5718000, 1074980~
## $ adjusted <dbl> 199.0331, 193.5551, 191.7291, 195.3811, 191.7291, 188.0771, 1~

2.1.4 HOKI.JK

hoki <- tq_get(x = "HOKI.JK", 
               get = "stock.prices", 
               from = " 2016-01-01")

head(hoki, 3)

colSums(is.na(hoki))

##   symbol     date     open     high      low    close   volume adjusted 
##        0        0        0        0        0        0        0        0

glimpse(hoki)

## Rows: 1,042
## Columns: 8
## $ symbol   <chr> "HOKI.JK", "HOKI.JK", "HOKI.JK", "HOKI.JK", "HOKI.JK", "HOKI.~
## $ date     <date> 2017-07-03, 2017-07-04, 2017-07-05, 2017-07-06, 2017-07-07, ~
## $ open     <dbl> 85.5, 87.0, 83.0, 86.5, 91.5, 90.0, 89.0, 86.5, 88.5, 88.0, 8~
## $ high     <dbl> 87.0, 88.0, 87.5, 93.0, 93.0, 95.0, 89.0, 89.0, 89.0, 88.5, 8~
## $ low      <dbl> 80.0, 81.5, 82.5, 85.5, 88.5, 86.0, 86.0, 84.5, 85.0, 86.0, 8~
## $ close    <dbl> 87.0, 83.0, 86.5, 91.0, 89.0, 89.0, 86.5, 89.0, 88.0, 87.0, 8~
## $ volume   <dbl> 202226400, 197477200, 110545200, 294411600, 71145600, 4270480~
## $ adjusted <dbl> 78.24085, 74.64358, 77.79119, 81.83813, 80.03949, 80.03949, 7~

2.1.5 WIKA.JK

wika <- tq_get(x = "WIKA.JK", 
               get = "stock.prices", 
               from = " 2016-01-01")

head(wika, 3)

colSums(is.na(wika))

##   symbol     date     open     high      low    close   volume adjusted 
##        0        0        1        1        1        1        1        1

wika <- wika %>% 
  drop_na()

colSums(is.na(wika))

##   symbol     date     open     high      low    close   volume adjusted 
##        0        0        0        0        0        0        0        0

glimpse(bbri)

## Rows: 1,411
## Columns: 8
## $ symbol   <chr> "BBRI.JK", "BBRI.JK", "BBRI.JK", "BBRI.JK", "BBRI.JK", "BBRI.~
## $ date     <date> 2016-01-04, 2016-01-05, 2016-01-06, 2016-01-07, 2016-01-08, ~
## $ open     <dbl> 2280, 2315, 2280, 2270, 2250, 2280, 2295, 2335, 2270, 2300, 2~
## $ high     <dbl> 2320, 2365, 2355, 2305, 2340, 2305, 2345, 2340, 2355, 2330, 2~
## $ low      <dbl> 2240, 2315, 2280, 2250, 2250, 2255, 2285, 2315, 2270, 2290, 2~
## $ close    <dbl> 2295, 2315, 2305, 2250, 2320, 2275, 2320, 2320, 2345, 2290, 2~
## $ volume   <dbl> 100379000, 108043000, 105125500, 71275500, 106501000, 1131830~
## $ adjusted <dbl> 1902.051, 1918.626, 1910.339, 1864.756, 1922.770, 1885.475, 1~

2.2 Technical Analysis

Technical analysis is a way of analyzing price movements in the stock market using statistical tools, such as charts and mathematical formulas.

Technical analysis will be used to get the predictors and target variables needed in the form of decisions to Buy, Hold or Sell which will be used later to create machine learning models. Because the target variable is very crucial for training the machine learning model that will be used later, technical analysis will be carried out as perfectly as possible to avoid mistakes in making decisions.

In this project there are 4 types of technical analysis that will be used:

- Simple Moving Average (SMA)

Simple Moving Average (SMA) is one of the technical analysis by adding the latest price series in a time range, then dividing it by the number of periods. So, the average value can be obtained.

SMA is calculated by the following formula: \[SMA = (Pn1 + Pn2 + Pn3 + ...Pnx)/n \] Where:
- P = mean value based on n
- n = number of time periods (Example period = 5, meaning the price to be calculated is 4 days back)

To determine the time period in the calculation, there are no special regulations, but if the desired target is buying and selling in a short time, the period can be reduced and if you want a long period of time, the time period can be enlarged. However, if the time is shortened, it will increase the potential for errors in the decision-making process.

In determining when the right time to buy or sell there are 4 SMA with different time periods for comparison, SMA n1, SMA n2, SMA n3 & SMA n4.SMA n1 here serves as the lower limit and SMA n4 here serves as the upper limit, both of these limits will be very useful as an indicator that the stock price is in the “cheapest” and “most expensive” condition. While SMA n2 and SMA n3 are here as indicators of the movement of the stock price being declining or rising.

- Exponential Moving Average (EMA)

Exponential Moving Average (EMA) is similar to SMA, which is a technical analysis by measuring the direction of the trend over a certain period of time. However, the SMA only calculates the average of the price data whereas the EMA applies more weight to the more recent data. Due to its weighted calculations, the EMA will follow the price more closely than the SMA.

EMA is calculated by the following formula: \[EMAnow = (Closing Price - EMA before) x Multiplier + EMA before \]

Where:
- Closing Price = closing price on that day
- EMA before = EMA of the previous period (Example period = 5, meaning the price that will be calculated is the price of the past 4 days)
- Multiplier = exponential constant

In determining when the right time to buy or sell is needed 4 EMA with different time periods for comparison, after doing some experiments for BRI stocks will use EMA n1, EMA n2, EMA n3 and EMA n4. EMA n1 here functions as a lower limit and EMA n4 here functions as an upper limit, both of these limits will be very useful as an indicator that stock prices are in the “cheapest” and “most expensive” conditions. While the EMA n2 and EMA n3 here as an indicator of the movement of the stock price is declining or rising.

- Moving Average Convergence Divergence (MACD)

Moving Average Convergence Divergence (MACD) is an indicator in technical analysis that describes the relationship between two moving averages in an asset price trend.

MACD is calculated by the following formula: \[MACD = Nsmaller-Period EMA - Nbigger-PeriodEMA \]

Where:
Nsmaller-Period EMA = EMA with a smaller period
Nbigger-Period EMA = EMA with a larger period

MACD will use EMA as one of its indicators. The most popular MACD is usually calculated by subtracting the Nbigger-period Exponential Moving Average (EMA) from the Nsmaller-period EMA. MACD also needs one more EMA which is the N3-period EMA, the N3-period EMA which can serve as a trigger for buy and sell signals.

- Relative Strength Index (RSI)

The Relative Strength Index (RSI) is one of the technical analysis indicators that is usually used to measure the price of an asset. This indicator is used to evaluate whether the asset is in an overbought or oversold position.

RSI is calculated by the following formula: \[RSI = 100 – (100/(1+RS))\]

Where: - RS = Average increase in period / Average decrease in period. (Example period = 5, meaning the price that will be calculated is the price of the past 4 days)

The RSI, will not use the parameters of the SMA or EMA at all. RSI will use the calculated parameters of the RSI formula itself. The standard counting period is 14, as recommended by Welles Wilder. The period may be changed because it does not have special rules, either increasing or decreasing. However, this will affect the sensitivity of the RSI. For example, period 10 reaches overbought or oversold levels faster than period 20.

Last but not least, all four technical analysis will be done using library quantmod().

library(quantmod)

2.2.1 BBRI.JK

- SMA

In determining when the right time to buy or sell is needed 4 SMA with different time periods for comparison, after doing several experiments for this BRI stock will use SMA 5, SMA 30, SMA 50 & SMA 70.

SMA 5 here serves as the lower limit and SMA 70 here serves as the upper limit, both of these limits will be very useful as an indicator that the stock price is in the “cheapest” and “most expensive” condition. While SMA 20 and SMA 30 are here as indicators of the movement of the stock price being declining or rising.

bbri_sma <- bbri

# SMA 5
bbri_sma$SMA5 <- SMA(Ad(bbri_sma),
                n = 5)

# SMA 30
bbri_sma$SMA30 <- SMA(Ad(bbri_sma),
                n = 30)

# SMA 50
bbri_sma$SMA50 <- SMA(Ad(bbri_sma),
                      n = 50)

# SMA 70
bbri_sma$SMA70 <- SMA(Ad(bbri_sma),
                n = 70)

“Buy” and “Sell” signals:
- Buy -> When Price Close Adjusted < H-1 SMA 5 & SMA 30 < H-1 SMA 50
- Sell -> When Price Close Adjusted > H-1 SMA 70 & SMA 20 > H-1 SMA 50

The purpose is why it is developed with the H-1 price and or the H-1 SMA to make sure the price that day is cheaper than yesterday’s price when making the decision to buy, and vice versa. By providing such parameters, it is expected that the price of the shares purchased is the lowest price and when the price is sold it is the highest price.

bbri_sma <- bbri_sma %>%
  mutate(
    signal.SMA = case_when(
      adjusted < lag(SMA5, 1) & SMA30 < lag(SMA50, 1) ~ "Buy",
      adjusted > lag(SMA70, 1) & SMA30 > lag(SMA50, 1) ~ "Sell",
      TRUE ~ "Hold"
    ),
    previous_signal.SMA = lag(signal.SMA, 1),
    decision.SMA = case_when(
      signal.SMA == previous_signal.SMA ~ "Hold",
      TRUE ~ signal.SMA
    )
  )

bbri_sma %>% 
  filter(decision.SMA != "Hold" & ! is.na(SMA70))

If you pay attention to one by one of the overall “Buy” and “Sell” recommendations from the SMA method that has been filtered above, the results of the recommendations are not wrong or the recommendations provide benefits without the slightest loss. With the results as above, the target variable from SMA can be indicated as perfect because the target variable will be used as one of the predictors in the classification model.

- EMA

In determining when the right time to buy or sell is needed 4 EMA with different time periods for comparison, after doing some experiments for BRI stocks will use EMA 5, EMA 15, EMA 50 and EMA 70.

EMA 5 here functions as a lower limit and EMA 70 here functions as an upper limit, both of these limits will be very useful as an indicator that stock prices are in the “cheapest” and “most expensive” conditions. While the EMA 15 and EMA 50 here as an indicator of the movement of the stock price is declining or rising.

bbri_ema <- bbri

# EMA 5
bbri_ema$EMA5 <- EMA(Ad(bbri_ema),
                n = 5)

# EMA 15
bbri_ema$EMA15 <- EMA(Ad(bbri_ema),
                n = 15)

# EMA 50
bbri_ema$EMA50 <- EMA(Ad(bbri_ema),
                n = 50)

# EMA 70
bbri_ema$EMA70 <- EMA(Ad(bbri_ema),
                n = 70)

“Buy” and “Sell” signals:
- Buy -> Close Adjusted < H-1 EMA 5 & EMA 15 < EMA 50
- Sell -> Close Adjusted > H-1 EMA 70 & EMA 15 > EMA 50

The purpose of why it is developed with the H-1 price and or the H-1 EMA is to make sure that day’s price is cheaper than yesterday’s price when making the decision to buy, and vice versa. By providing such parameters, it is expected that the price of the shares purchased is the lowest price and when the price is sold it is the highest price.

bbri_ema <- bbri_ema %>% 
  mutate(
    signal.EMA = case_when(
      adjusted < lag(EMA5, 1) & EMA15 < lag(EMA50, 1) ~ "Buy",
      adjusted > lag(EMA70, 1) & EMA15 > lag(EMA50, 1) ~ "Sell",
      TRUE ~ "Hold"# Otherwise sell
    ),
    previous_signal.EMA = lag(signal.EMA, 1), 
    decision.EMA = case_when( 
      signal.EMA == previous_signal.EMA ~ "Hold",
      TRUE ~ signal.EMA
    )
  )

bbri_ema %>% 
  filter(decision.EMA != "Hold" & ! is.na(EMA70))

If you pay attention to one by one of the overall “Buy” and “Sell” recommendations from the EMA method that has been filtered above, the results of the recommendations are not wrong or the recommendations provide profits without the slightest loss. With the results as above, the target variable from the EMA can be indicated as perfect because the target variable will be used as one of the predictors in the classification model.

- MACD

After doing several experiments, the EMAs that will be used in MACD analysis are EMA5, EMA15 and EMA50.

bbri_macd <- bbri

# EMA 15
bbri_macd$EMA15 <- bbri_ema$EMA15

# EMA 50
bbri_macd$EMA50 <- bbri_ema$EMA50

# MACD
bbri_macd$MACD <- bbri_ema$EMA15 - bbri_ema$EMA50

“Buy” and “Sell” signals:
- Buy -> When Close Adjusted < H-1 EMA 5 & When MACD < 0 (Example period = 5, means the price will be calculated 4 days ago)
- Sell -> When Close Adjusted > H-1 EMA 5 & When MACD > 0

Slightly different from SMA and EMA, signals on MACD will use 0 as a parameter. When the result of the subtraction of the EMA15 and EMA50 is smaller than 0 or minus it is an indication that the stock price is experiencing a decline, and vice versa. As for the comparison of adjusted prices with EMA5, it is the same as SMA and EMA.

bbri_macd <- bbri_macd %>% 
  mutate(
    signal.MACD = case_when(
      adjusted < lag(bbri_ema$EMA5, 1) & MACD < 0 & lag(MACD, 1) < MACD ~ "Buy", 
      adjusted > lag(bbri_ema$EMA5, 1) & MACD > 0 & lag(MACD, 1) > MACD ~ "Sell", 
      TRUE ~ "Hold"
    ),
    previous_signal.MACD = lag(signal.MACD, 1), 
    decision.MACD = case_when( 
      signal.MACD == previous_signal.MACD ~ "Hold",
      TRUE ~ signal.MACD
    )
  )

bbri_macd %>% 
  filter(decision.MACD != "Hold" & ! is.na(EMA50))

If you pay attention to one by one of the overall “Buy” and “Sell” recommendations from the MACD method that has been filtered above, the results of the recommendations are not wrong or the recommendations provide benefits without the slightest loss. With the results as above, the target variable from MACD can be indicated as perfect because the target variable will be used as one of the predictors in the classification model.

- RSI

RSI will not use the parameters of the SMA or EMA at all. RSI will use the calculated parameters of the RSI formula itself. The standard counting period is 14, as recommended by Welles Wilder. The period may be changed because it does not have special rules, either increasing or decreasing. However, this will affect the sensitivity of the RSI. For example, period 10 reaches overbought or oversold levels faster than period 20. The RSI that will be used for BRI is RSI7, RSI14, RSI30 and RSI70, because after several experiments, these 4 RSIs showed the best results.

bbri_rsi <- bbri

# RSI 7
bbri_rsi$RSI7 <- RSI(Ad(bbri_rsi),
                     n =7)

# RSI 14
bbri_rsi$RSI14 <- RSI(Ad(bbri_rsi), 
                      n=14)

# RSI 30
bbri_rsi$RSI30 <- RSI(Ad(bbri_rsi), 
                      n=30)

# RSI 70
bbri_rsi$RSI70 <- RSI(Ad(bbri_rsi), 
                      n=70)

“Buy” and “Sell” signals:
- Buy -> When Close Adjusted Price < H-1 Close Adjusted Price & RSI7 > RSI14 & RSI7 < RSI20
- Sell -> When Price Close Adjusted > Price H-1 Close Adjusted & RSI7 < RSI14 & RSI7 > RSI80

In technical analysis using RSI signals “Buy” and “Sell” will depend on the movement of RSI 7 which will be compared to RSI 14, RSI30 and RSI 70. If the movement of RSI 7 is bigger than RSI14 it is an indication that prices will soon rise vice versa. Meanwhile, RSI 30 and RSI 70 will be used as an “overbought” indicator and RSI 70 will be used as an “oversell” indicator. larger than the RSI 30, it is an indicator that the stock price is not at the lowest price.

bbri_rsi <- bbri_rsi %>% 
  mutate(
    signal.RSI = case_when(
      adjusted < lag(adjusted,1) & RSI7 > RSI14 & RSI7 < RSI70 ~ "Buy",
      adjusted > lag(adjusted,1) & RSI7 < RSI14 & RSI7 > RSI30 ~ "Sell",
      TRUE ~ "Hold"
    ),
    previous_signal.RSI = lag(signal.RSI, 1),
    decision.RSI = case_when( 
      signal.RSI == previous_signal.RSI ~ "Hold",
      TRUE ~ signal.RSI
    )
  )

bbri_rsi %>% 
  group_by(decision.RSI) %>% 
  summarise(freq = n())

2.2.2 ISAT.JK

- SMA

isat_sma <- isat

# SMA 5
isat_sma$SMA5 <- SMA(Ad(isat_sma),
                n = 5)

# SMA 30
isat_sma$SMA20 <- SMA(Ad(isat_sma),
                n = 20)

# SMA 50
isat_sma$SMA60 <- SMA(Ad(isat_sma),
                      n = 60)

# SMA 70
isat_sma$SMA70 <- SMA(Ad(isat_sma),
                n =70)

# Membuat sinyal Buy, Sell & Hold
isat_sma <- isat_sma %>%
  mutate(
    signal.SMA = case_when(
      adjusted < lag(SMA5, 1) & SMA20 > lag(SMA60, 1) ~ "Sell",
      adjusted > lag(SMA60, 1) & SMA20 < lag(SMA60, 1) ~ "Buy",
      TRUE ~ "Hold"
    ),
    previous_signal.SMA = lag(signal.SMA, 1),
    decision.SMA = case_when(
      signal.SMA == previous_signal.SMA ~ "Hold",
      TRUE ~ signal.SMA
    )
  )

isat_sma %>% 
  filter(decision.SMA != "Hold" & ! is.na(SMA60))

- EMA

# Membuat objek baru 
isat_ema <- isat

# EMA 10
isat_ema$EMA10 <- EMA(Ad(isat_ema),
                n = 10)

# EMA 30
isat_ema$EMA30 <- EMA(Ad(isat_ema),
                n = 30)

# EMA 50
isat_ema$EMA50 <- EMA(Ad(isat_ema),
                n = 50)

# EMA 70
isat_ema$EMA70 <- EMA(Ad(isat_ema),
                n = 70)

isat_ema <- isat_ema %>% 
  mutate(
    signal.EMA = case_when(
      adjusted < lag(EMA10, 1) & EMA30 > lag(EMA50, 1) ~ "Sell",
      adjusted > lag(EMA70, 1) & EMA30 < lag(EMA50, 1) ~ "Buy",
      TRUE ~ "Hold"# Otherwise sell
    ),
    previous_signal.EMA = lag(signal.EMA, 1), 
    decision.EMA = case_when( 
      signal.EMA == previous_signal.EMA ~ "Hold",
      TRUE ~ signal.EMA
    )
  )

isat_ema %>% 
  filter(decision.EMA != "Hold" & ! is.na(EMA70))

- MACD

isat_macd <- isat

# EMA 10
isat_macd$EMA10 <- EMA(Ad(isat_ema),
                n = 10)

# EMA 18
isat_macd$EMA18 <- EMA(Ad(isat_ema),
                n = 18)

# EMA 48
isat_macd$EMA48 <- EMA(Ad(isat_ema),
                n = 48)

# MACD
isat_macd$MACD <- isat_macd$EMA18 - isat_macd$EMA48

isat_macd <- isat_macd %>% 
  mutate(
    signal.MACD = case_when(
      adjusted < lag(isat_macd$EMA10, 1) & MACD < 0 & lag(MACD, 1) < MACD ~ "Buy", 
      adjusted > lag(isat_macd$EMA10, 1) & MACD > 0 & lag(MACD, 1) > MACD ~ "Sell", 
      TRUE ~ "Hold"
    ),
    previous_signal.MACD = lag(signal.MACD, 1), 
    decision.MACD = case_when( 
      signal.MACD == previous_signal.MACD ~ "Hold",
      TRUE ~ signal.MACD
    )
  )

isat_macd %>% 
  filter(decision.MACD != "Hold")

- RSI

# Membuat objek baru
isat_rsi <- isat

# RSI 10
isat_rsi$RSI10 <- RSI(Ad(isat_rsi),
                     n =10)

# RSI 38
isat_rsi$RSI38 <- RSI(Ad(isat_rsi), 
                      n=38)

# RSI 45
isat_rsi$RSI45 <- RSI(Ad(isat_rsi), 
                      n=45)

# RSI 70
isat_rsi$RSI70 <- RSI(Ad(isat_rsi), 
                      n=70)

isat_rsi <- isat_rsi %>% 
  mutate(
    signal.RSI = case_when(
      adjusted < lag(adjusted,1) & RSI10 > RSI38 & RSI10 < RSI70 ~ "Buy",
      adjusted > lag(adjusted,1) & RSI10 < RSI38 & RSI10 > RSI45 ~ "Sell",
      TRUE ~ "Hold"
    ),
    previous_signal.RSI = lag(signal.RSI, 1),
    decision.RSI = case_when( 
      signal.RSI == previous_signal.RSI ~ "Hold",
      TRUE ~ signal.RSI
    )
  )

isat_rsi %>% 
  filter(decision.RSI != "Hold" & ! is.na(RSI70))

2.2.3 SIDO.JK

- SMA

sido_sma <- sido

# SMA 5
sido_sma$SMA5 <- SMA(Ad(sido_sma),
                n = 5)
  
# SMA 15
sido_sma$SMA15 <- SMA(Ad(sido_sma),
                n = 15)

# SMA 55
sido_sma$SMA55 <- SMA(Ad(sido_sma),
                      n = 55)

# SMA 80
sido_sma$SMA80 <- SMA(Ad(sido_sma),
                n =80)

sido_sma <- sido_sma %>%
  mutate(
    signal.SMA = case_when(
      adjusted < lag(SMA5, 1) & SMA15 < lag(SMA55, 1) ~ "Buy",
      adjusted > lag(SMA80, 1) & SMA15 > lag(SMA55, 1) ~ "Sell",
      TRUE ~ "Hold"
    ),
    previous_signal.SMA = lag(signal.SMA, 1),
    decision.SMA = case_when(
      signal.SMA == previous_signal.SMA ~ "Hold",
      TRUE ~ signal.SMA
    )
  )

sido_sma %>% 
  filter(decision.SMA != "Hold" & ! is.na(SMA80))

From the observation above SMA decision at 03-05-2017, 08-05-2017, 20-04-202 and 23-04-2020 must be changed into Hold in order to prevent trading loss.

sido_sma$decision.SMA[sido_sma$date == "2017-05-03"] <- "Hold"
sido_sma$decision.SMA[sido_sma$date == "2017-05-08"] <- "Hold"
sido_sma$decision.SMA[sido_sma$date == "2020-04-20"] <- "Hold"
sido_sma$decision.SMA[sido_sma$date == "2020-04-23"] <- "Hold"

- EMA

# Membuat objek baru 
sido_ema <- sido

# EMA 5
sido_ema$EMA5 <- EMA(Ad(sido_ema),
                n = 5)

# EMA 30
sido_ema$EMA30 <- EMA(Ad(sido_ema),
                n = 30)

# EMA 55
sido_ema$EMA55 <- EMA(Ad(sido_ema),
                n = 55)

# EMA 60
sido_ema$EMA60 <- EMA(Ad(sido_ema),
                n = 60)

sido_ema <- sido_ema %>% 
  mutate(
    signal.EMA = case_when(
      adjusted < lag(EMA5, 1) & EMA30 < lag(EMA55, 1) ~ "Buy",
      adjusted > lag(EMA60, 1) & EMA30 > lag(EMA55, 1) ~ "Sell",
      TRUE ~ "Hold"# Otherwise sell
    ),
    previous_signal.EMA = lag(signal.EMA, 1), 
    decision.EMA = case_when( 
      signal.EMA == previous_signal.EMA ~ "Hold",
      TRUE ~ signal.EMA
    )
  )

sido_ema %>% 
  filter(decision.EMA != "Hold" & ! is.na(EMA60))

sido_ema$decision.EMA[sido_ema$date == "2017-05-10"] <- "Hold"

- MACD

sido_macd <- sido

# EMA 10
sido_macd$EMA10 <- EMA(Ad(sido_macd),
                n = 10)

# EMA 15
sido_macd$EMA15 <- EMA(Ad(sido_macd),
                n = 15)

# EMA 20
sido_macd$EMA20 <- EMA(Ad(sido_macd),
                n = 20)

# MACD
sido_macd$MACD <- sido_macd$EMA15 - sido_macd$EMA20

sido_macd <- sido_macd %>% 
  mutate(
    signal.MACD = case_when(
      adjusted < lag(sido_macd$EMA10, 1) & MACD < 0 & lag(MACD, 1) < MACD ~ "Buy", 
      adjusted > lag(sido_macd$EMA10, 1) & MACD > 0 & lag(MACD, 1) > MACD ~ "Sell", 
      TRUE ~ "Hold"
    ),
    previous_signal.MACD = lag(signal.MACD, 1), 
    decision.MACD = case_when( 
      signal.MACD == previous_signal.MACD ~ "Hold",
      TRUE ~ signal.MACD
    )
  )

sido_macd %>% 
  filter(decision.MACD != "Hold")

sido_macd$decision.MACD[sido_macd$date == "2017-11-02"] <- "Hold"

- RSI

sido_rsi <- sido

# RSI 10
sido_rsi$RSI10 <- RSI(Ad(sido_rsi),
                     n =10)

# RSI 14
sido_rsi$RSI14 <- RSI(Ad(sido_rsi), 
                      n=14)

# RSI 30
sido_rsi$RSI30 <- RSI(Ad(sido_rsi), 
                      n=30)

# RSI 65
sido_rsi$RSI65 <- RSI(Ad(sido_rsi), 
                      n=65)

sido_rsi <- sido_rsi %>% 
  mutate(
    signal.RSI = case_when(
      close < lag(close,1) & RSI10 > RSI14 & RSI10 < RSI65 ~ "Sell",
      close > lag(close,1) & RSI10 < RSI14 & RSI10 > RSI30 ~ "Buy",
      TRUE ~ "Hold"
    ),
    previous_signal.RSI = lag(signal.RSI, 1),
    decision.RSI = case_when( 
      signal.RSI == previous_signal.RSI ~ "Hold",
      TRUE ~ signal.RSI
    )
  )

sido_rsi %>% 
  filter(decision.RSI != "Hold" & ! is.na(RSI65))

2.2.4 HOKI.JK

- SMA

hoki_sma <- hoki

# SMA 5
hoki_sma$SMA5 <- SMA(Ad(hoki_sma),
                n = 5)
  
# SMA 25
hoki_sma$SMA25 <- SMA(Ad(hoki_sma),
                n = 25)

# SMA 55
hoki_sma$SMA55 <- SMA(Ad(hoki_sma),
                      n = 55)

# SMA 70
hoki_sma$SMA70 <- SMA(Ad(hoki_sma),
                n =70)

# Membuat sinyal Buy, Sell & Hold
hoki_sma <- hoki_sma %>%
  mutate(
    signal.SMA = case_when(
      close < lag(SMA5, 1) & SMA25 < lag(SMA55, 1) ~ "Buy",
      close > lag(SMA70, 1) & SMA25 > lag(SMA55, 1) ~ "Sell",
      TRUE ~ "Hold"
    ),
    previous_signal.SMA = lag(signal.SMA, 1),
    decision.SMA = case_when(
      signal.SMA == previous_signal.SMA ~ "Hold",
      TRUE ~ signal.SMA
    )
  )

hoki_sma %>% 
  filter(decision.SMA != "Hold" & ! is.na(SMA70))

- EMA

hoki_ema <- hoki

# EMA 5
hoki_ema$EMA5 <- EMA(Ad(hoki_ema),
                n = 5)

# EMA 30
hoki_ema$EMA30 <- EMA(Ad(hoki_ema),
                n = 30)

# EMA 40
hoki_ema$EMA50 <- EMA(Ad(hoki_ema),
                n = 50)

# EMA 60
hoki_ema$EMA60 <- EMA(Ad(hoki_ema),
                n = 60)

hoki_ema <- hoki_ema %>% 
  mutate(
    signal.EMA = case_when(
      close < lag(EMA5, 1) & EMA30 < lag(EMA50, 1) ~ "Buy",
      close > lag(EMA60, 1) & EMA30 > lag(EMA50, 1) ~ "Sell",
      TRUE ~ "Hold"# Otherwise sell
    ),
    previous_signal.EMA = lag(signal.EMA, 1), 
    decision.EMA = case_when( 
      signal.EMA == previous_signal.EMA ~ "Hold",
      TRUE ~ signal.EMA
    )
  )

hoki_ema %>% 
  filter(decision.EMA != "Hold" & ! is.na(EMA60))

- MACD

# Membuat objek baru
hoki_macd <- hoki

# EMA 10
hoki_macd$EMA10 <- EMA(Ad(hoki_macd),
                n = 10)

# EMA 15
hoki_macd$EMA15 <- EMA(Ad(hoki_macd),
                n = 15)

# EMA 50
hoki_macd$EMA50 <- EMA(Ad(hoki_macd),
                n = 50)

# MACD
hoki_macd$MACD <- hoki_macd$EMA15 - hoki_macd$EMA50

hoki_macd <- hoki_macd %>% 
  mutate(
    signal.MACD = case_when(
      adjusted < lag(hoki_macd$EMA10, 1) & MACD < 0 & lag(MACD, 1) < MACD ~ "Buy", 
      adjusted > lag(hoki_macd$EMA10, 1) & MACD > 0 & lag(MACD, 1) > MACD ~ "Sell", 
      TRUE ~ "Hold"
    ),
    previous_signal.MACD = lag(signal.MACD, 1), 
    decision.MACD = case_when( 
      signal.MACD == previous_signal.MACD ~ "Hold",
      TRUE ~ signal.MACD
    )
  )

hoki_macd %>% 
  filter(decision.MACD != "Hold")

- RSI

hoki_rsi <- hoki

# RSI 10
hoki_rsi$RSI10 <- RSI(Ad(hoki_rsi),
                     n =10)

# RSI 14
hoki_rsi$RSI14 <- RSI(Ad(hoki_rsi), 
                      n=14)

# RSI 40
hoki_rsi$RSI40 <- RSI(Ad(hoki_rsi), 
                      n=40)

# RSI 65
hoki_rsi$RSI65 <- RSI(Ad(hoki_rsi), 
                      n=65)

hoki_rsi <- hoki_rsi %>% 
  mutate(
    signal.RSI = case_when(
      close < lag(close,1) & RSI10 > RSI14 & RSI10 < RSI65 ~ "Sell",
      close > lag(close,1) & RSI10 < RSI14 & RSI10 > RSI40 ~ "Buy",
      TRUE ~ "Hold"
    ),
    previous_signal.RSI = lag(signal.RSI, 1),
    decision.RSI = case_when( 
      signal.RSI == previous_signal.RSI ~ "Hold",
      TRUE ~ signal.RSI
    )
  )

hoki_rsi %>% 
  filter(decision.RSI != "Hold" & ! is.na(RSI65))

sido_rsi$decision.RSI[sido_rsi$date == "2016-08-03"] <- "Hold"
sido_rsi$decision.RSI[sido_rsi$date == "2016-08-08"] <- "Hold"

2.2.5 WIKA.JK

- SMA

wika_sma <- wika

# SMA 5
wika_sma$SMA20 <- SMA(Ad(wika_sma),
                n = 20)
  
# SMA 30
wika_sma$SMA30 <- SMA(Ad(wika_sma),
                n = 30)

# SMA 65
wika_sma$SMA65 <- SMA(Ad(wika_sma),
                      n =65)

# SMA 80
wika_sma$SMA80 <- SMA(Ad(wika_sma),
                n =80)

wika_sma <- wika_sma %>%
  mutate(
    signal.SMA = case_when(
      close < lag(SMA20, 1) & SMA30 < lag(SMA65, 1) ~ "Buy",
      close > lag(SMA80, 1) & SMA30 > lag(SMA65, 1) ~ "Sell",
      TRUE ~ "Hold"
    ),
    previous_signal.SMA = lag(signal.SMA, 1),
    decision.SMA = case_when(
      signal.SMA == previous_signal.SMA ~ "Hold",
      TRUE ~ signal.SMA
    )
  )

wika_sma %>% 
  filter(decision.SMA != "Hold" & ! is.na(SMA80))

wika_sma$decision.SMA[wika_sma$date == "2017-04-10"] <- "Hold"

- EMA

wika_ema <- wika

# EMA 20
wika_ema$EMA20 <- EMA(Ad(wika_ema),
                n = 20)

# EMA 30
wika_ema$EMA30 <- EMA(Ad(wika_ema),
                n = 30)

# EMA 65
wika_ema$EMA65 <- EMA(Ad(wika_ema),
                n = 65)

# EMA 80
wika_ema$EMA80 <- EMA(Ad(wika_ema),
                n = 80)

wika_ema <- wika_ema %>% 
  mutate(
    signal.EMA = case_when(
      adjusted < lag(EMA20, 1) & EMA30 < lag(EMA65, 1) ~ "Buy",
      adjusted > lag(EMA80, 1) & EMA30 > lag(EMA65, 1) ~ "Sell",
      TRUE ~ "Hold"# Otherwise sell
    ),
    previous_signal.EMA = lag(signal.EMA, 1), 
    decision.EMA = case_when( 
      signal.EMA == previous_signal.EMA ~ "Hold",
      TRUE ~ signal.EMA
    )
  )

wika_ema %>% 
  filter(decision.EMA != "Hold" & ! is.na(EMA80))

wika_ema$decision.EMA[wika_ema$date == "2020-01-29"] <- "Hold"
wika_ema$decision.EMA[wika_ema$date == "2020-02-07"] <- "Hold"
wika_ema$decision.EMA[wika_ema$date == "2020-02-24"] <- "Hold"

- MACD

# Membuat objek baru
wika_macd <- wika

# EMA 6
wika_macd$EMA6 <- EMA(Ad(wika_macd),
                n =6)

# EMA 10
wika_macd$EMA10 <- EMA(Ad(wika_macd),
                n = 10)

# EMA 25
wika_macd$EMA25 <- EMA(Ad(wika_macd),
                n = 25)

# MACD
wika_macd$MACD <- wika_macd$EMA10 - wika_macd$EMA25

wika_macd <- wika_macd %>% 
  mutate(
    signal.MACD = case_when(
      adjusted < lag(wika_macd$EMA10, 1) & MACD < 0 & lag(MACD, 1) < MACD ~ "Buy", 
      adjusted > lag(wika_macd$EMA10, 1) & MACD > 0 & lag(MACD, 1) > MACD ~ "Sell", 
      TRUE ~ "Hold"
    ),
    previous_signal.MACD = lag(signal.MACD, 1), 
    decision.MACD = case_when( 
      signal.MACD == previous_signal.MACD ~ "Hold",
      TRUE ~ signal.MACD
    )
  )

wika_macd %>% 
  filter(decision.MACD != "Hold")

wika_macd$decision.MACD[wika_macd$date == "2017-03-09"] <- "Hold"
wika_macd$decision.MACD[wika_macd$date == "2019-12-26"] <- "Hold"

- RSI

wika_rsi <- wika

# RSI 10
wika_rsi$RSI5 <- RSI(Ad(wika_rsi),
                     n =5)

# RSI 14
wika_rsi$RSI20 <- RSI(Ad(wika_rsi), 
                      n=20)

# RSI 30
wika_rsi$RSI35 <- RSI(Ad(wika_rsi), 
                      n=35)

# RSI 65
wika_rsi$RSI65 <- RSI(Ad(wika_rsi), 
                      n=65)

wika_rsi <- wika_rsi %>% 
  mutate(
    signal.RSI = case_when(
      close < lag(close,1) & RSI5 > RSI20 & RSI5 < RSI65 ~ "Buy",
      close > lag(close,1) & RSI5 < RSI20 & RSI5 > RSI35 ~ "Sell",
      TRUE ~ "Hold"
    ),
    previous_signal.RSI = lag(signal.RSI, 1),
    decision.RSI = case_when( 
      signal.RSI == previous_signal.RSI ~ "Hold",
      TRUE ~ signal.RSI
    )
  )

wika_rsi %>% 
  filter(decision.RSI != "Hold" & ! is.na(RSI65))

2.3 Combine All Indicator

After getting the required analysis results, combine them into a new data frame. The purpose of combining all indicators into one is to get a final decision, to assess whether the stock should be bought or sold on the day. The results of the final decision will also be used as a target variable to train the classification model.

2.3.1 BBRI.JK

# Combine all indicator
bbri_analisa <- cbind(bbri,bbri_sma$SMA5, bbri_sma$SMA30, bbri_sma$SMA50, bbri_sma$SMA70, bbri_ema$EMA5, bbri_ema$EMA15, bbri_ema$EMA50, bbri_ema$EMA70, bbri_macd$MACD, bbri_rsi$RSI7, bbri_rsi$RSI14, bbri_rsi$RSI30, bbri_rsi$RSI70, bbri_sma$decision.SMA, bbri_ema$decision.EMA, bbri_macd$decision.MACD, bbri_rsi$decision.RSI)

# Change column name
colnames(bbri_analisa) <- c("symbol", "date", "open", "high", "low", "close", "volume", "adjusted", "SMA5", "SMA30", "SMA50", "SMA70", "EMA5", "EMA15", "EMA50", "EMA70", "MACD", "RSI7", "RSI14", "RSI30", "RSI70", "decision.SMA", "decision.EMA", "decision.MACD", "decision.RSI")

head(bbri_analisa, 3)

After all the analysis columns are combined into a new data frame, create a new column that will be used as a final decision. The final decision will be taken from the decision column per each analysis.

bbri_analisa <- bbri_analisa %>% 
  mutate(
    final_decision = case_when(
      decision.SMA == "Buy" &  decision.EMA == "Buy" & decision.MACD == "Buy" & decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Sell" &  decision.EMA == "Buy" & decision.MACD == "Buy" & decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Buy" &  decision.EMA == "Sell" & decision.MACD == "Buy" & decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Buy" &  decision.EMA == "Buy" & decision.MACD == "Sell" & decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Buy" &  decision.EMA == "Buy" & decision.MACD == "Buy" & decision.RSI == "Sell" ~ "Buy",
      decision.SMA == "Sell" &  decision.EMA == "Sell" & decision.MACD == "Sell" & decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Buy" &  decision.EMA == "Sell" & decision.MACD == "Sell" & decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Sell" &  decision.EMA == "Buy" & decision.MACD == "Sell" & decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Sell" &  decision.EMA == "Sell" & decision.MACD == "Buy" & decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Sell" &  decision.EMA == "Sell" & decision.MACD == "Sell" & decision.RSI == "Buy" ~ "Sell",
      TRUE ~ "Hold"
    )
  )

bbri_analisa %>% 
  filter(final_decision != "Hold")

From the results above, it can be seen that there are only 3 out of 4 technical analyzes showing the same decision, from there it can be concluded that the majority of the results from each technical analysis have different results. One alternative to get a final decision from the results of combining the 4 technical analyzes above is to give weight to the technical analysis that is better than the 4 analyzes that have been done.

Each technical analysis certainly has its advantages and disadvantages, in this project there are 2 analyzes that will be given additional weight to the MACD and RSI analysis. The reason why choosing MACD is its superiority in providing signals or indications as early as possible, this is very useful to avoid losing momentum to buy or too late to take profits. While the RSI is one of the most favorite indicators due to its ability to detect prices in the market being in the overbought or oversell area, the ability to detect is what makes RSI the favorite because it can provide maximum benefits from the results of the analysis. It is hoped that by combining the two indicators above, they can complement each other.

bbri_analisa2 <- bbri_analisa %>% 
  mutate(
    final_decision = case_when(
      decision.SMA == "Buy" &  decision.EMA == "Buy" & decision.MACD == "Buy" & decision.RSI == "Buy" | decision.MACD == "Buy" | decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Sell" &  decision.EMA == "Buy" & decision.MACD == "Buy" & decision.RSI == "Buy" | decision.MACD == "Buy" | decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Buy" &  decision.EMA == "Sell" & decision.MACD == "Buy" & decision.RSI == "Buy" | decision.MACD == "Buy" | decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Buy" &  decision.EMA == "Buy" & decision.MACD == "Sell" & decision.RSI == "Buy" | decision.MACD == "Buy" | decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Buy" &  decision.EMA == "Buy" & decision.MACD == "Buy" & decision.RSI == "Sell" | decision.MACD == "Buy" | decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Sell" &  decision.EMA == "Sell" & decision.MACD == "Sell" & decision.RSI == "Sell" | decision.MACD == "Sell" | decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Buy" &  decision.EMA == "Sell" & decision.MACD == "Sell" & decision.RSI == "Sell" | decision.MACD == "Sell" | decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Sell" &  decision.EMA == "Buy" & decision.MACD == "Sell" & decision.RSI == "Sell" | decision.MACD == "Sell" | decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Sell" &  decision.EMA == "Sell" & decision.MACD == "Buy" & decision.RSI == "Sell" | decision.MACD == "Sell" | decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Sell" &  decision.EMA == "Sell" & decision.MACD == "Sell" & decision.RSI == "Buy" | decision.MACD == "Sell" | decision.RSI == "Sell" ~ "Sell",
      TRUE ~ "Hold"
    )
  )

bbri_analisa2 %>% 
  select(c("date", "open", "close", "final_decision"))

2.3.2 ISAT.JK

# Combine all indicator
isat_analisa <- cbind(isat,isat_sma$SMA5, isat_sma$SMA20, isat_sma$SMA60, isat_sma$SMA70, isat_ema$EMA10, isat_ema$EMA30, isat_ema$EMA50, isat_ema$EMA70, isat_macd$EMA10, isat_macd$EMA18, isat_macd$EMA48, isat_macd$MACD, isat_rsi$RSI10, isat_rsi$RSI38, isat_rsi$RSI45, isat_rsi$RSI70, isat_sma$decision.SMA, isat_ema$decision.EMA, isat_macd$decision.MACD, isat_rsi$decision.RSI)

# Change column name
colnames(isat_analisa) <- c("symbol", "date", "open", "high", "low", "close", "volume", "adjusted", "SMA5", "SMA20", "SMA60", "SMA70", "EMA5", "EMA10", "EMA30", "EMA50", "EMA70", "EMA18", "EMA48", "MACD", "RSI10", "RSI38", "RSI45", "RSI70", "decision.SMA", "decision.EMA", "decision.MACD", "decision.RSI")

head(isat_analisa, 3)

isat_analisa <- isat_analisa %>% 
  mutate(
    final_decision = case_when(
      decision.SMA == "Buy" &  decision.EMA == "Buy" & decision.MACD == "Buy" & decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Sell" &  decision.EMA == "Buy" & decision.MACD == "Buy" & decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Buy" &  decision.EMA == "Sell" & decision.MACD == "Buy" & decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Buy" &  decision.EMA == "Buy" & decision.MACD == "Sell" & decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Buy" &  decision.EMA == "Buy" & decision.MACD == "Buy" & decision.RSI == "Sell" ~ "Buy",
      decision.SMA == "Sell" &  decision.EMA == "Sell" & decision.MACD == "Sell" & decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Buy" &  decision.EMA == "Sell" & decision.MACD == "Sell" & decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Sell" &  decision.EMA == "Buy" & decision.MACD == "Sell" & decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Sell" &  decision.EMA == "Sell" & decision.MACD == "Buy" & decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Sell" &  decision.EMA == "Sell" & decision.MACD == "Sell" & decision.RSI == "Buy" ~ "Sell",
      TRUE ~ "Hold"
    )
  )

isat_analisa %>% 
  select(c("date", "open", "close", "final_decision")) %>% 
  filter(final_decision != "Hold")

isat_analisa2 <- isat_analisa %>% 
  mutate(
    final_decision = case_when(
      decision.SMA == "Buy" &  decision.EMA == "Buy" & decision.MACD == "Buy" & decision.MACD == "Buy" | decision.EMA == "Buy" | decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Sell" &  decision.EMA == "Buy" & decision.MACD == "Buy" & decision.MACD == "Buy" | decision.EMA == "Buy" | decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Buy" &  decision.EMA == "Sell" & decision.MACD == "Buy" & decision.MACD == "Buy" | decision.EMA == "Buy" | decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Buy" &  decision.EMA == "Buy" & decision.MACD == "Sell" & decision.MACD == "Buy" | decision.EMA == "Buy" | decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Buy" &  decision.EMA == "Buy" & decision.MACD == "Buy" & decision.MACD == "Sell" | decision.EMA == "Buy" | decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Sell" &  decision.EMA == "Sell" & decision.MACD == "Sell" & decision.MACD == "Sell" | decision.EMA == "Sell" | decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Buy" &  decision.EMA == "Sell" & decision.MACD == "Sell" & decision.MACD == "Sell" | decision.EMA == "Sell" | decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Sell" &  decision.EMA == "Buy" & decision.MACD == "Sell" & decision.MACD == "Sell" | decision.EMA == "Sell" | decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Sell" &  decision.EMA == "Sell" & decision.MACD == "Buy" & decision.MACD == "Sell" | decision.EMA == "Sell" | decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Sell" &  decision.EMA == "Sell" & decision.MACD == "Sell" & decision.MACD == "Buy" | decision.EMA == "Sell" | decision.RSI == "Sell" ~ "Sell",
      TRUE ~ "Hold"
    )
  )

isat_analisa2 %>%
  select(c("date", "open", "close", "final_decision")) %>% 
  filter(final_decision != "Hold")

2.3.3 SIDO.JK

# Combine all columns
sido_analisa <- cbind(sido, sido_sma$SMA5, sido_sma$SMA15, sido_sma$SMA55, sido_sma$SMA80, sido_ema$EMA5, sido_ema$EMA30, sido_ema$EMA55, sido_ema$EMA60, sido_macd$EMA10, sido_macd$EMA15, sido_macd$MACD, sido_rsi$RSI10, sido_rsi$RSI14, sido_rsi$RSI30, sido_rsi$RSI65, sido_sma$decision.SMA, sido_ema$decision.EMA, sido_macd$decision.MACD, sido_rsi$decision.RSI)

# Change column name
colnames(sido_analisa) <- c("symbol", "date", "open", "high", "low", "close", "volume", "adjusted", "SMA5", "SMA15", "SMA55", "SMA80", "EMA5", "EMA30", "EMA55", "EMA60", "EMA10", "EMA15", "MACD", "RSI10", "RSI14", "RSI40", "RSI65", "decision.SMA", "decision.EMA", "decision.MACD", "decision.RSI")

head(sido_analisa, 3)

sido_analisa <- sido_analisa %>% 
  mutate(
    final_decision = case_when(
      decision.SMA == "Buy" &  decision.EMA == "Buy" & decision.MACD == "Buy" & decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Sell" &  decision.EMA == "Buy" & decision.MACD == "Buy" & decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Buy" &  decision.EMA == "Sell" & decision.MACD == "Buy" & decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Buy" &  decision.EMA == "Buy" & decision.MACD == "Sell" & decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Buy" &  decision.EMA == "Buy" & decision.MACD == "Buy" & decision.RSI == "Sell" ~ "Buy",
      decision.SMA == "Sell" &  decision.EMA == "Sell" & decision.MACD == "Sell" & decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Buy" &  decision.EMA == "Sell" & decision.MACD == "Sell" & decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Sell" &  decision.EMA == "Buy" & decision.MACD == "Sell" & decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Sell" &  decision.EMA == "Sell" & decision.MACD == "Buy" & decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Sell" &  decision.EMA == "Sell" & decision.MACD == "Sell" & decision.RSI == "Buy" ~ "Sell",
      TRUE ~ "Hold"
    )
  )

sido_analisa %>%
  select(c("date", "open", "close", "final_decision")) %>% 
  filter(final_decision != "Hold")

sido_analisa2 <- sido_analisa %>% 
  mutate(
    final_decision = case_when(
      decision.SMA == "Buy" &  decision.EMA == "Buy" & decision.MACD == "Buy" & decision.RSI == "Buy" | decision.MACD == "Buy" | decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Sell" &  decision.EMA == "Buy" & decision.MACD == "Buy" & decision.RSI == "Buy" | decision.MACD == "Buy" | decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Buy" &  decision.EMA == "Sell" & decision.MACD == "Buy" & decision.RSI == "Buy" | decision.MACD == "Buy" | decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Buy" &  decision.EMA == "Buy" & decision.MACD == "Sell" & decision.RSI == "Buy" | decision.MACD == "Buy" | decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Buy" &  decision.EMA == "Buy" & decision.MACD == "Buy" & decision.RSI == "Sell" | decision.MACD == "Buy" | decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Sell" &  decision.EMA == "Sell" & decision.MACD == "Sell" & decision.RSI == "Sell" | decision.MACD == "Sell" | decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Buy" &  decision.EMA == "Sell" & decision.MACD == "Sell" & decision.RSI == "Sell" | decision.MACD == "Sell" | decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Sell" &  decision.EMA == "Buy" & decision.MACD == "Sell" & decision.RSI == "Sell" | decision.MACD == "Sell" | decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Sell" &  decision.EMA == "Sell" & decision.MACD == "Buy" & decision.RSI == "Sell" | decision.MACD == "Sell" | decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Sell" &  decision.EMA == "Sell" & decision.MACD == "Sell" & decision.RSI == "Buy" | decision.MACD == "Sell" | decision.RSI == "Sell" ~ "Sell",
      TRUE ~ "Hold"
    )
  )

sido_analisa2 %>%
  select(c("date", "open", "close", "final_decision")) %>% 
  filter(final_decision != "Hold")

sido_analisa2$final_decision[sido_analisa2$date == "2019-01-16"] <- "Hold"
sido_analisa2$final_decision[sido_analisa2$date == "2016-02-09"] <- "Hold"
sido_analisa2$final_decision[sido_analisa2$date == "2016-11-04"] <- "Hold"
sido_analisa2$final_decision[sido_analisa2$date == "2017-01-10"] <- "Buy"
sido_analisa2$final_decision[sido_analisa2$date == "2016-02-09"] <- "Hold"

sido_analisa2 %>%
  select(c("date", "open", "close", "final_decision")) %>% 
  filter(final_decision != "Hold")

2.3.4 HOKI.JK

# Combine all the columns
hoki_analisa <- cbind(hoki,hoki_sma$SMA5, hoki_sma$SMA25, hoki_sma$SMA55, hoki_sma$SMA70, hoki_ema$EMA5, hoki_ema$EMA30, hoki_ema$EMA50, hoki_ema$EMA60, hoki_macd$EMA10, hoki_macd$EMA15, hoki_macd$MACD, hoki_rsi$RSI10, hoki_rsi$RSI14, hoki_rsi$RSI40, hoki_rsi$RSI65, hoki_sma$decision.SMA, hoki_ema$decision.EMA, hoki_macd$decision.MACD, hoki_rsi$decision.RSI)

# Change column name
colnames(hoki_analisa) <- c("symbol", "date", "open", "high", "low", "close", "volume", "adjusted", "SMA5", "SMA25", "SMA55", "SMA70", "EMA5", "EMA30", "EMA50", "EMA60", "EMA10", "EMA15", "MACD", "RSI10", "RSI14", "RSI40", "RSI65", "decision.SMA", "decision.EMA", "decision.MACD", "decision.RSI")

head(hoki_analisa, 3)

hoki_analisa <- hoki_analisa %>% 
  mutate(
    final_decision = case_when(
      decision.SMA == "Buy" &  decision.EMA == "Buy" & decision.MACD == "Buy" & decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Sell" &  decision.EMA == "Buy" & decision.MACD == "Buy" & decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Buy" &  decision.EMA == "Sell" & decision.MACD == "Buy" & decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Buy" &  decision.EMA == "Buy" & decision.MACD == "Sell" & decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Buy" &  decision.EMA == "Buy" & decision.MACD == "Buy" & decision.RSI == "Sell" ~ "Buy",
      decision.SMA == "Sell" &  decision.EMA == "Sell" & decision.MACD == "Sell" & decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Buy" &  decision.EMA == "Sell" & decision.MACD == "Sell" & decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Sell" &  decision.EMA == "Buy" & decision.MACD == "Sell" & decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Sell" &  decision.EMA == "Sell" & decision.MACD == "Buy" & decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Sell" &  decision.EMA == "Sell" & decision.MACD == "Sell" & decision.RSI == "Buy" ~ "Sell",
      TRUE ~ "Hold"
    )
  )

hoki_analisa %>%
  select(c("date", "open", "close", "final_decision")) %>% 
  filter(final_decision != "Hold")

hoki_analisa2 <- hoki_analisa %>% 
  mutate(
    final_decision = case_when(
      decision.SMA == "Buy" &  decision.EMA == "Buy" & decision.MACD == "Buy" & decision.RSI == "Buy" | decision.MACD == "Buy" | decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Sell" &  decision.EMA == "Buy" & decision.MACD == "Buy" & decision.RSI == "Buy" | decision.MACD == "Buy" | decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Buy" &  decision.EMA == "Sell" & decision.MACD == "Buy" & decision.RSI == "Buy" | decision.MACD == "Buy" | decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Buy" &  decision.EMA == "Buy" & decision.MACD == "Sell" & decision.RSI == "Buy" | decision.MACD == "Buy" | decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Buy" &  decision.EMA == "Buy" & decision.MACD == "Buy" & decision.RSI == "Sell" | decision.MACD == "Buy" | decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Sell" &  decision.EMA == "Sell" & decision.MACD == "Sell" & decision.RSI == "Sell" | decision.MACD == "Sell" | decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Buy" &  decision.EMA == "Sell" & decision.MACD == "Sell" & decision.RSI == "Sell" | decision.MACD == "Sell" | decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Sell" &  decision.EMA == "Buy" & decision.MACD == "Sell" & decision.RSI == "Sell" | decision.MACD == "Sell" | decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Sell" &  decision.EMA == "Sell" & decision.MACD == "Buy" & decision.RSI == "Sell" | decision.MACD == "Sell" | decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Sell" &  decision.EMA == "Sell" & decision.MACD == "Sell" & decision.RSI == "Buy" | decision.MACD == "Sell" | decision.RSI == "Sell" ~ "Sell",
      TRUE ~ "Hold"
    )
  )

hoki_analisa2 %>%
  select(c("date", "open", "close", "final_decision")) %>% 
  filter(final_decision != "Hold")

hoki_analisa2$final_decision[hoki_analisa2$date == "2018-11-05"] <- "Hold"
hoki_analisa2$final_decision[hoki_analisa2$date == "2019-08-19"] <- "Hold"
hoki_analisa2$final_decision[hoki_analisa2$date == "2020-12-16"] <- "Hold"
hoki_analisa2$final_decision[hoki_analisa2$date == "2021-04-09"] <- "Hold"
hoki_analisa2$final_decision[hoki_analisa2$date == "2021-04-23"] <- "Hold"
hoki_analisa2$final_decision[hoki_analisa2$date == "2021-05-10"] <- "Hold"
hoki_analisa2$final_decision[hoki_analisa2$date == "2021-06-03"] <- "Hold"
hoki_analisa2$final_decision[hoki_analisa2$date == "2021-06-04"] <- "Hold"
hoki_analisa2$final_decision[hoki_analisa2$date == "2021-06-10"] <- "Hold"
hoki_analisa2$final_decision[hoki_analisa2$date == "2021-06-15"] <- "Hold"

hoki_analisa2 %>%
  select(c("date", "open", "close", "final_decision")) %>% 
  filter(final_decision != "Hold")

2.3.5 WIKA.JK

# Combine all the columns
wika_analisa <- cbind(wika, wika_sma$SMA20, wika_sma$SMA30, wika_sma$SMA65, wika_sma$SMA80, wika_ema$EMA20, wika_ema$EMA30, wika_ema$EMA65, wika_ema$EMA80, wika_macd$EMA10, wika_macd$EMA25, wika_macd$MACD, wika_rsi$RSI5, wika_rsi$RSI20, wika_rsi$RSI35, wika_rsi$RSI65, wika_sma$decision.SMA, wika_ema$decision.EMA, wika_macd$decision.MACD, wika_rsi$decision.RSI)

# Change column name
colnames(wika_analisa) <- c("symbol", "date", "open", "high", "low", "close", "volume", "adjusted", "SMA20", "SMA30", "SMA65", "SMA80", "EMA20", "EMA30", "EMA65", "EMA80", "EMA10", "EMA25", "MACD", "RSI5", "RSI20", "RSI35", "RSI65", "decision.SMA", "decision.EMA", "decision.MACD", "decision.RSI")

head(wika_analisa, 3)

wika_analisa <- wika_analisa %>% 
  mutate(
    final_decision = case_when(
      decision.SMA == "Buy" &  decision.EMA == "Buy" & decision.MACD == "Buy" & decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Sell" &  decision.EMA == "Buy" & decision.MACD == "Buy" & decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Buy" &  decision.EMA == "Sell" & decision.MACD == "Buy" & decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Buy" &  decision.EMA == "Buy" & decision.MACD == "Sell" & decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Buy" &  decision.EMA == "Buy" & decision.MACD == "Buy" & decision.RSI == "Sell" ~ "Buy",
      decision.SMA == "Sell" &  decision.EMA == "Sell" & decision.MACD == "Sell" & decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Buy" &  decision.EMA == "Sell" & decision.MACD == "Sell" & decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Sell" &  decision.EMA == "Buy" & decision.MACD == "Sell" & decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Sell" &  decision.EMA == "Sell" & decision.MACD == "Buy" & decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Sell" &  decision.EMA == "Sell" & decision.MACD == "Sell" & decision.RSI == "Buy" ~ "Sell",
      TRUE ~ "Hold"
    )
  )

wika_analisa %>% 
  select(c("date", "open", "close", "final_decision")) %>% 
  filter(final_decision != "Hold")

wika_analisa2 <- wika_analisa %>% 
  mutate(
    final_decision = case_when(
      decision.SMA == "Buy" &  decision.EMA == "Buy" & decision.MACD == "Buy" & decision.MACD == "Buy" | decision.MACD == "Buy" | decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Sell" &  decision.EMA == "Buy" & decision.MACD == "Buy" & decision.MACD == "Buy" | decision.MACD == "Buy" | decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Buy" &  decision.EMA == "Sell" & decision.MACD == "Buy" & decision.MACD == "Buy" | decision.MACD == "Buy" | decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Buy" &  decision.EMA == "Buy" & decision.MACD == "Sell" & decision.MACD == "Buy" | decision.MACD == "Buy" | decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Buy" &  decision.EMA == "Buy" & decision.MACD == "Buy" & decision.MACD == "Sell" | decision.MACD == "Buy" | decision.RSI == "Buy" ~ "Buy",
      decision.SMA == "Sell" &  decision.EMA == "Sell" & decision.MACD == "Sell" & decision.MACD == "Sell" | decision.MACD == "Sell" | decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Buy" &  decision.EMA == "Sell" & decision.MACD == "Sell" & decision.MACD == "Sell" | decision.MACD == "Sell" | decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Sell" &  decision.EMA == "Buy" & decision.MACD == "Sell" & decision.MACD == "Sell" | decision.MACD == "Sell" | decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Sell" &  decision.EMA == "Sell" & decision.MACD == "Buy" & decision.MACD == "Sell" | decision.MACD == "Sell" | decision.RSI == "Sell" ~ "Sell",
      decision.SMA == "Sell" &  decision.EMA == "Sell" & decision.MACD == "Sell" & decision.MACD == "Buy" | decision.MACD == "Sell" | decision.RSI == "Sell" ~ "Sell",
      TRUE ~ "Hold"
    )
  )

wika_analisa2 %>%
  select(c("date", "open", "close", "final_decision")) %>% 
  filter(final_decision != "Hold")

wika_analisa2$final_decision[wika_analisa2$date == "2017-04-20"] <- "Hold"
wika_analisa2$final_decision[wika_analisa2$date == "2017-10-31"] <- "Hold"
wika_analisa2$final_decision[wika_analisa2$date == "2017-11-03"] <- "Hold"
wika_analisa2$final_decision[wika_analisa2$date == "2019-08-08"] <- "Hold"

wika_analisa2 %>%
  select(c("date", "open", "close", "final_decision")) %>% 
  filter(final_decision != "Hold")

3 Data Pre-processing

Data preprocessing is a process of preparing the raw data that has been obtained and has passed the EDA process and making it suitable for a machine learning model. It is the first and crucial step while creating a machine learning model.

3.1 Cross Validation

First thing first, let’s conduct cross validation. Cross-validation is a statistical technique for testing the performance of a Machine Learning model. In particular, a good cross validation method gives us a comprehensive measure of our model’s performance throughout the whole dataset.

In this project the whole dataset will be divided into three parts, Data Train, Data Validation and Data Test.
- Training Dataset: The sample of data used to fit the model.
- Validation Dataset: The sample of data used to provide an unbiased evaluation of a model fit on the training dataset while tuning model hyperparameters. The evaluation becomes more biased as skill on the validation dataset is incorporated into the model configuration.
- Test Dataset: The sample of data used to provide an unbiased evaluation of a final model fit on the training dataset.

The main reason why cross validation are devided the dataset intro three parts is to helps avoiding overfitting, since by dividing the data intro three parts we can concretely check that our model performs well on data seen during training and not. Without cross validation, we would never know if our model is amazing worldwide or just on our sheltered training set!

Data Train and Data Validation will start from 01-01-2018 until 31-05-2021.
Data Test will start from 01-01-2016 until 31-12-2017.

3.1.1 BBRI.JK

- Data Train

data_train <- bbri_analisa2 %>% 
  filter(date > "2017-12-31" & date < "2021-06-01") %>% 
  mutate_if(is.character, as.factor)

After the data train has been separated from data test, let’s randomize the data into two parts, namely bbri_train and bbri_validation with a composition of 65% for train data and 35% for validation data. Randomize data is necessary since it will prevents any bias during the training and prevents the model from learning the order of the training.

Randomize the data and split it into data train and validation can be done using library rsample.

library(rsample)
set.seed(123)

init <- initial_split(data = data_train, #data that want to be split
                      prop = 0.65, #proportion of data that want to be split
                      strata = final_decision) #target variable

bbri_train <- training(init) 
bbri_validation <- testing(init)

table(bbri_train$final_decision)

## 
##  Buy Hold Sell 
##   24  510   25

table(bbri_validation$final_decision)

## 
##  Buy Hold Sell 
##    9  271   20

- Data Test

bbri_test <- bbri_analisa2[71:500,] %>% 
  filter(date < "2018-01-01") %>% 
  mutate_if(is.character, as.factor)

table(bbri_test$final_decision)

## 
##  Buy Hold Sell 
##   10  382   38

3.1.2 ISAT.JK

- Data Train

data_train <- isat_analisa2 %>% 
  filter(date > "2017-12-31" & date < "2021-06-01") %>% 
  mutate_if(is.character, as.factor)

set.seed(123)

init <- initial_split(data = data_train, 
                      prop = 0.65, 
                      strata = final_decision) 

isat_train <- training(init) 
isat_validation <- testing(init)

table(isat_train$final_decision)

## 
##  Buy Hold Sell 
##   21  509   29

table(isat_validation$final_decision)

## 
##  Buy Hold Sell 
##    9  277   14

- Data Test

isat_test <- isat_analisa2[71:500,] %>% 
  filter(date < "2018-01-01") %>% 
  mutate_if(is.character, as.factor)

table(isat_test$final_decision)

## 
##  Buy Hold Sell 
##   12  392   26

3.1.3 SIDO.JK

- Data Train

data_train <- sido_analisa2 %>% 
  filter(date > "2017-12-31" & date < "2021-06-01") %>% 
  mutate_if(is.character, as.factor)

set.seed(123)

init <- initial_split(data = data_train, 
                      prop = 0.65, 
                      strata = final_decision) 

sido_train <- training(init) 
sido_validation <- testing(init)

table(sido_train$final_decision)

## 
##  Buy Hold Sell 
##   19  492   48

table(sido_validation$final_decision)

## 
##  Buy Hold Sell 
##    5  265   31

- Data Test

sido_test <- sido_analisa2[81:492,] %>% 
  filter(date < "2018-01-01") %>% 
  mutate_if(is.character, as.factor)

table(sido_test$final_decision)

## 
##  Buy Hold Sell 
##   17  378   17

3.1.4 HOKI.JK

- Data Train

data_train <- hoki_analisa2 %>% 
  filter(date > "2017-12-31" & date < "2021-06-01") %>% 
  mutate_if(is.character, as.factor)

set.seed(123)

init <- initial_split(data = data_train, 
                      prop = 0.65, 
                      strata = final_decision) 

hoki_train <- training(init) 
hoki_validation <- testing(init)

table(hoki_train$final_decision)

## 
##  Buy Hold Sell 
##   30  488   41

table(hoki_validation$final_decision)

## 
##  Buy Hold Sell 
##   13  273   15

- Data Test

hoki_test <- hoki_analisa2[71:130,] %>% 
  filter(date < "2018-01-01") %>% 
  mutate_if(is.character, as.factor)

table(hoki_test$final_decision)

## 
##  Buy Hold Sell 
##    5   53    2

3.1.5 WIKA.JK

- Data Train

data_train <- wika_analisa2 %>% 
  filter(date > "2017-12-31" & date < "2021-06-01") %>% 
  mutate_if(is.character, as.factor)

set.seed(123)

init <- initial_split(data = data_train, 
                      prop = 0.65, 
                      strata = final_decision) 

wika_train <- training(init) 
wika_validation <- testing(init)

table(wika_train$final_decision)

## 
##  Buy Hold Sell 
##   39  480   40

table(wika_validation$final_decision)

## 
##  Buy Hold Sell 
##   22  262   16

- Data Test

wika_test <- wika_analisa2[81:500,] %>% 
  filter(date < "2018-01-01") %>% 
  mutate_if(is.character, as.factor)

table(wika_test$final_decision)

## 
##  Buy Hold Sell 
##   29  372   19

3.2 SMOTE

As can be seen from cross validation, the target variable proportions are imbalanced. The imbalanced training dataset cannot be used because of the severely skewed class distribution. This is the cause for poor performance with traditional machine learning models and evaluation metrics that assume a balanced class distribution.

But afraid not, there is a technique called Synthetic Minority Oversampling Technique (SMOTE). SMOTE is an oversampling technique that generates synthetic samples from the minority class. It is used to obtain a synthetically class-balanced or nearly class-balanced training set, which is then used to train the model.

Library used to activate function SMOTE is UBL. In the UBL library there is a function called SmoteClassif(), that function will be used to make the target variable in our train data balanced.

There are four arguments from function SmoteClassif() which must be noticed.
- Argument form :
A formula describing the prediction problem.
- Argument dat :
A data frame containing the original (unbalanced) data set, can be specified which column.
- Argument C.perc :
A named list containing the percentage(s) of under- or/and over-sampling to apply to each class. The over-sampling percentage is a number above 1 while the under-sampling percentage should be a number below 1. If the number 1 is provided for a given class then that class remains unchanged. Alternatively it may be “balance” (the default) or “extreme”, cases where the sampling percentages are automatically estimated either to balance the examples between the minority and majority classes or to invert the distribution of examples across the existing classes transforming the majority classes into minority and vice-versa.
- Argument dist :
The parameter dist allows the user to define the distance metric to be used in the neighbors computation. Although the default is the Euclidean distance, other metrics are available. This allows the computation of distances in data sets with, for instance, both nominal and numeric features. The options available for the distance functions are as follows:
* for data with only numeric features: “Manhattan”, “Euclidean”, “Canberra”, “Chebyshev”, “p-norm”.
* for data with only nominal features: “Overlap”.
* for dealing with both nominal and numeric features: “HEOM”, “HVDM”.

3.2.1 BBRI.JK

library(UBL)

# Since the date and stock name are not important to be duplicated, they can be excluded from the dataframe
dat <- bbri_train[, c(3:26)] 

# There are two option to balanced the data, by make it equally balanced or make it almost balanced by setting the distribution of each target varibale unique
almost_balanced <- list(Buy = 30, Hold = 1, Sell = 20)

bbri_train_smote <- SmoteClassif(form = final_decision ~ ., # The format of form = target variable ~ the rest of the column
                                 dat = dat, 
                                 C.perc = almost_balanced,  
                                 dist = "HVDM") # HEOM / HVDM can be used for upsample both nominal and numeric

# Total distribution of target variable before SMOTE
table(bbri_train$final_decision)

## 
##  Buy Hold Sell 
##   24  510   25

# Total distribution of target variable after SMOTE
table(bbri_train_smote$final_decision)

## 
##  Buy Hold Sell 
##  720  510  500

3.2.2 ISAT.JK

dat <- isat_train[, c(3:29)] 

almost_balanced <- list(Buy = 25, Hold = 1, Sell = 22)

isat_train_smote <- SmoteClassif(form = final_decision ~ ., 
                                 dat = dat, 
                                 C.perc = almost_balanced,  
                                 dist = "HVDM")

# Total distribution of target variable before SMOTE
table(isat_train$final_decision)

## 
##  Buy Hold Sell 
##   21  509   29

# Total distribution of target variable after SMOTE
table(isat_train_smote$final_decision)

## 
##  Buy Hold Sell 
##  525  509  638

3.2.3 SIDO.JK

dat <- isat_train[, c(3:29)] 

almost_balanced <- list(Buy = 25, Hold = 1, Sell = 22)

sido_train_smote <- SmoteClassif(form = final_decision ~ ., 
                                 dat = dat, 
                                 C.perc = almost_balanced,  
                                 dist = "HVDM")

# Total distribution of target variable before SMOTE
table(sido_train$final_decision)

## 
##  Buy Hold Sell 
##   19  492   48

# Total distribution of target variable after SMOTE
table(sido_train_smote$final_decision)

## 
##  Buy Hold Sell 
##  525  509  638

3.2.4 HOKI.JK

dat <- hoki_train[, c(3:28)] 

almost_balanced <- list(Buy = 15, Hold = 1, Sell = 13)

hoki_train_smote <- SmoteClassif(form = final_decision ~ ., 
                                 dat = dat, 
                                 C.perc = almost_balanced,  
                                 dist = "HVDM")

# Total distribution of target variable before SMOTE
table(hoki_train$final_decision)

## 
##  Buy Hold Sell 
##   30  488   41

# Total distribution of target variable after SMOTE
table(hoki_train_smote$final_decision)

## 
##  Buy Hold Sell 
##  450  488  533

3.2.5 WIKA.JK

dat <- wika_train[, c(3:28)] 

almost_balanced <- list(Buy = 15, Hold = 1, Sell = 15)

wika_train_smote <- SmoteClassif(form = final_decision ~ ., 
                                 dat = dat, 
                                 C.perc = almost_balanced,  
                                 dist = "HVDM")

# Total distribution of target variable before SMOTE
table(wika_train$final_decision)

## 
##  Buy Hold Sell 
##   39  480   40

# Total distribution of target variable after SMOTE
table(wika_train_smote$final_decision)

## 
##  Buy Hold Sell 
##  585  480  600

4 Machine Learning

After getting all the target variables needed through the EDA process and prepared in data pre-processing, the next thing to do is start the machine learning modeling process. There will be two models developed in this project to be compared which model is the best, those models are called Decision Tree and Random Forest.

Decision Tree and Random Forest are categorized as a classification model, which very suitable for this project since the main goal of this project to built a machine learning project which capable to classify whether today is it the right time for the investor to open position, close the position or keep position until the right time.

4.1 Model Decision Tree

Decision Tree is a model that can be used to visually and explicitly represent decisions and decision making. In this case, this model will be used to classify whether today is the right time to buy, sell or hold stock. The function that will be used to create the decision tree model is ctree() from the library partykit.

4.1.1 BBRI.JK

- Model Based On Data Train & Validation

In this process what we will do is train the decision tree model using the data train and the model that have been train will be used to evaluate the model’s performance with data validation.

Modeling in the decision tree can also be given several parameters to simplify the model to make it easier to interpret, those parameters are:
- mincriterion: The value of the test statistic (1 - p-value) that must be exceeded in order to implement a split.
- minsplit: The minimum number of observations that must exist in a node in order for a split to be attempted. (Default to 20)
- minbucket: The minimum number of observations at the terminal node. If not fulfilled, no branching is done. (default: 7)

Those three parameters above can be set using argument control = ctree_control(mincriterion = ,minsplit = , minbucket = ).

In this case to reduce the complexity of the model:
- mincriterion: the value is need to be enlarged into 0.5 since from the model above, the p-value is to small.
- minsplit: the minimum number of observations that must exist in a node in order for a split to be attempted, in this case 100.
- minbucket: the minimum number of observations in any terminal node, in this case 50.

library(partykit)

# Model training
model_dt <- ctree(final_decision ~ ., 
                  data = bbri_train_smote,
                  control = ctree_control(mincriterion = 0.5,
                                          minsplit = 100,
                                          minbucket = 50))

# Model training result visualization
plot(model_dt, type= "simple")

Based on model Decision Tree visualization above, MACD technical analysis is most often used to consider giving advice on when is the right time to buy or sell stocks.

The next thing that needs to be done after training the model is to evaluate the model on the validation data that has been separated in the cross validation process. To evaluate the model, you can follow the method below.

pred_model_dt <- predict(object = model_dt, newdata = bbri_validation, type = "response")

library(caret)
eval_pred_model_dt <- confusionMatrix(data = pred_model_dt,
                                      reference = as.factor(bbri_validation$final_decision))

table(bbri_validation$final_decision)

## 
##  Buy Hold Sell 
##    9  271   20

eval_pred_model_dt

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Buy Hold Sell
##       Buy    7    0    0
##       Hold   0  271    0
##       Sell   2    0   20
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9933          
##                  95% CI : (0.9761, 0.9992)
##     No Information Rate : 0.9033          
##     P-Value [Acc > NIR] : 3.106e-11       
##                                           
##                   Kappa : 0.9626          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: Buy Class: Hold Class: Sell
## Sensitivity             0.77778      1.0000     1.00000
## Specificity             1.00000      1.0000     0.99286
## Pos Pred Value          1.00000      1.0000     0.90909
## Neg Pred Value          0.99317      1.0000     1.00000
## Prevalence              0.03000      0.9033     0.06667
## Detection Rate          0.02333      0.9033     0.06667
## Detection Prevalence    0.02333      0.9033     0.07333
## Balanced Accuracy       0.88889      1.0000     0.99643

- Model Based On Data Test

In order to confirm once again that the model that has been made is not overfitting, then let’s try to do an evaluation again using validation data. If the results of the evaluation using test data do not differ much from the results of the evaluation with validation data, it means that our model is good enough and is not overfitting.

pred_model_dt_test <- predict(object = model_dt, newdata = bbri_test, type = "response")

eval_pred_model_dt_test <- confusionMatrix(data = pred_model_dt_test,
                                      reference = as.factor(bbri_test$final_decision))

table(bbri_test$final_decision)

## 
##  Buy Hold Sell 
##   10  382   38

eval_pred_model_dt_test

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Buy Hold Sell
##       Buy    7    0    0
##       Hold   0  382    0
##       Sell   3    0   38
## 
## Overall Statistics
##                                           
##                Accuracy : 0.993           
##                  95% CI : (0.9797, 0.9986)
##     No Information Rate : 0.8884          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.9655          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: Buy Class: Hold Class: Sell
## Sensitivity             0.70000      1.0000     1.00000
## Specificity             1.00000      1.0000     0.99235
## Pos Pred Value          1.00000      1.0000     0.92683
## Neg Pred Value          0.99291      1.0000     1.00000
## Prevalence              0.02326      0.8884     0.08837
## Detection Rate          0.01628      0.8884     0.08837
## Detection Prevalence    0.01628      0.8884     0.09535
## Balanced Accuracy       0.85000      1.0000     0.99617

4.1.2 ISAT.JK

# Model training
model_dt <- ctree(final_decision ~ ., 
                  data = isat_train_smote,
                  control = ctree_control(mincriterion = 0.5,
                                          minsplit = 100,
                                          minbucket = 50))

# Model training result visualization
plot(model_dt, type= "simple")

pred_model_dt <- predict(object = model_dt, newdata = isat_validation, type = "response")

eval_pred_model_dt <- confusionMatrix(data = pred_model_dt,
                                      reference = as.factor(isat_validation$final_decision))

table(isat_validation$final_decision)

## 
##  Buy Hold Sell 
##    9  277   14

eval_pred_model_dt

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Buy Hold Sell
##       Buy    9    9    0
##       Hold   0  268    0
##       Sell   0    0   14
## 
## Overall Statistics
##                                           
##                Accuracy : 0.97            
##                  95% CI : (0.9438, 0.9862)
##     No Information Rate : 0.9233          
##     P-Value [Acc > NIR] : 0.000562        
##                                           
##                   Kappa : 0.8247          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: Buy Class: Hold Class: Sell
## Sensitivity              1.0000      0.9675     1.00000
## Specificity              0.9691      1.0000     1.00000
## Pos Pred Value           0.5000      1.0000     1.00000
## Neg Pred Value           1.0000      0.7187     1.00000
## Prevalence               0.0300      0.9233     0.04667
## Detection Rate           0.0300      0.8933     0.04667
## Detection Prevalence     0.0600      0.8933     0.04667
## Balanced Accuracy        0.9845      0.9838     1.00000

- Model Based On Data Test

pred_model_dt_test <- predict(object = model_dt, newdata = isat_test, type = "response")

eval_pred_model_dt_test <- confusionMatrix(data = pred_model_dt_test,
                                      reference = as.factor(isat_test$final_decision))

table(isat_test$final_decision)

## 
##  Buy Hold Sell 
##   12  392   26

eval_pred_model_dt_test

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Buy Hold Sell
##       Buy   11   42    0
##       Hold   0  350    0
##       Sell   1    0   26
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9             
##                  95% CI : (0.8677, 0.9267)
##     No Information Rate : 0.9116          
##     P-Value [Acc > NIR] : 0.8259          
##                                           
##                   Kappa : 0.6012          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: Buy Class: Hold Class: Sell
## Sensitivity             0.91667      0.8929     1.00000
## Specificity             0.89952      1.0000     0.99752
## Pos Pred Value          0.20755      1.0000     0.96296
## Neg Pred Value          0.99735      0.4750     1.00000
## Prevalence              0.02791      0.9116     0.06047
## Detection Rate          0.02558      0.8140     0.06047
## Detection Prevalence    0.12326      0.8140     0.06279
## Balanced Accuracy       0.90809      0.9464     0.99876

4.1.3 SIDO.JK

# Model training
model_dt <- ctree(final_decision ~ ., 
                  data = isat_train_smote,
                  control = ctree_control(mincriterion = 0.5,
                                          minsplit = 100,
                                          minbucket = 50))

# Model training result visualization
plot(model_dt, type= "simple")

pred_model_dt <- predict(object = model_dt, newdata = isat_validation, type = "response")

eval_pred_model_dt <- confusionMatrix(data = pred_model_dt,
                                      reference = as.factor(isat_validation$final_decision))

table(isat_validation$final_decision)

## 
##  Buy Hold Sell 
##    9  277   14

eval_pred_model_dt

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Buy Hold Sell
##       Buy    9    9    0
##       Hold   0  268    0
##       Sell   0    0   14
## 
## Overall Statistics
##                                           
##                Accuracy : 0.97            
##                  95% CI : (0.9438, 0.9862)
##     No Information Rate : 0.9233          
##     P-Value [Acc > NIR] : 0.000562        
##                                           
##                   Kappa : 0.8247          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: Buy Class: Hold Class: Sell
## Sensitivity              1.0000      0.9675     1.00000
## Specificity              0.9691      1.0000     1.00000
## Pos Pred Value           0.5000      1.0000     1.00000
## Neg Pred Value           1.0000      0.7187     1.00000
## Prevalence               0.0300      0.9233     0.04667
## Detection Rate           0.0300      0.8933     0.04667
## Detection Prevalence     0.0600      0.8933     0.04667
## Balanced Accuracy        0.9845      0.9838     1.00000

- Model Based On Data Test

pred_model_dt_test <- predict(object = model_dt, newdata = isat_test, type = "response")

eval_pred_model_dt_test <- confusionMatrix(data = pred_model_dt_test,
                                      reference = as.factor(isat_test$final_decision))

table(isat_test$final_decision)

## 
##  Buy Hold Sell 
##   12  392   26

eval_pred_model_dt_test

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Buy Hold Sell
##       Buy   11   42    0
##       Hold   0  350    0
##       Sell   1    0   26
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9             
##                  95% CI : (0.8677, 0.9267)
##     No Information Rate : 0.9116          
##     P-Value [Acc > NIR] : 0.8259          
##                                           
##                   Kappa : 0.6012          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: Buy Class: Hold Class: Sell
## Sensitivity             0.91667      0.8929     1.00000
## Specificity             0.89952      1.0000     0.99752
## Pos Pred Value          0.20755      1.0000     0.96296
## Neg Pred Value          0.99735      0.4750     1.00000
## Prevalence              0.02791      0.9116     0.06047
## Detection Rate          0.02558      0.8140     0.06047
## Detection Prevalence    0.12326      0.8140     0.06279
## Balanced Accuracy       0.90809      0.9464     0.99876

4.1.4 HOKI.JK

# Model training
model_dt <- ctree(final_decision ~ ., 
                  data = hoki_train_smote,
                  control = ctree_control(mincriterion = 0.5,
                                          minsplit = 100,
                                          minbucket = 50))

# Model training result visualization
plot(model_dt, type= "simple")

pred_model_dt <- predict(object = model_dt, newdata = hoki_validation, type = "response")

eval_pred_model_dt <- confusionMatrix(data = pred_model_dt,
                                      reference = as.factor(hoki_validation$final_decision))

table(hoki_validation$final_decision)

## 
##  Buy Hold Sell 
##   13  273   15

eval_pred_model_dt

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Buy Hold Sell
##       Buy   13    2    0
##       Hold   0  271    6
##       Sell   0    0    9
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9734          
##                  95% CI : (0.9483, 0.9885)
##     No Information Rate : 0.907           
##     P-Value [Acc > NIR] : 4.307e-06       
##                                           
##                   Kappa : 0.8356          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: Buy Class: Hold Class: Sell
## Sensitivity             1.00000      0.9927     0.60000
## Specificity             0.99306      0.7857     1.00000
## Pos Pred Value          0.86667      0.9783     1.00000
## Neg Pred Value          1.00000      0.9167     0.97945
## Prevalence              0.04319      0.9070     0.04983
## Detection Rate          0.04319      0.9003     0.02990
## Detection Prevalence    0.04983      0.9203     0.02990
## Balanced Accuracy       0.99653      0.8892     0.80000

- Model Based On Data Test

pred_model_dt_test <- predict(object = model_dt, newdata = hoki_test, type = "response")

eval_pred_model_dt_test <- confusionMatrix(data = pred_model_dt_test,
                                      reference = as.factor(hoki_test$final_decision))

table(hoki_test$final_decision)

## 
##  Buy Hold Sell 
##    5   53    2

eval_pred_model_dt_test

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Buy Hold Sell
##       Buy    5    0    0
##       Hold   0   53    0
##       Sell   0    0    2
## 
## Overall Statistics
##                                      
##                Accuracy : 1          
##                  95% CI : (0.9404, 1)
##     No Information Rate : 0.8833     
##     P-Value [Acc > NIR] : 0.0005854  
##                                      
##                   Kappa : 1          
##                                      
##  Mcnemar's Test P-Value : NA         
## 
## Statistics by Class:
## 
##                      Class: Buy Class: Hold Class: Sell
## Sensitivity             1.00000      1.0000     1.00000
## Specificity             1.00000      1.0000     1.00000
## Pos Pred Value          1.00000      1.0000     1.00000
## Neg Pred Value          1.00000      1.0000     1.00000
## Prevalence              0.08333      0.8833     0.03333
## Detection Rate          0.08333      0.8833     0.03333
## Detection Prevalence    0.08333      0.8833     0.03333
## Balanced Accuracy       1.00000      1.0000     1.00000

4.1.5 WIKA.JK

# Model training
model_dt <- ctree(final_decision ~ ., 
                  data = wika_train_smote,
                  control = ctree_control(mincriterion = 0.5,
                                          minsplit = 100,
                                          minbucket = 50))

# Model training result visualization
plot(model_dt, type= "simple")

pred_model_dt <- predict(object = model_dt, newdata = wika_validation, type = "response")

eval_pred_model_dt <- confusionMatrix(data = pred_model_dt,
                                      reference = as.factor(wika_validation$final_decision))

table(wika_validation$final_decision)

## 
##  Buy Hold Sell 
##   22  262   16

eval_pred_model_dt

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Buy Hold Sell
##       Buy   21   15    0
##       Hold   1  247    3
##       Sell   0    0   13
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9367          
##                  95% CI : (0.9029, 0.9614)
##     No Information Rate : 0.8733          
##     P-Value [Acc > NIR] : 0.0002543       
##                                           
##                   Kappa : 0.7547          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: Buy Class: Hold Class: Sell
## Sensitivity             0.95455      0.9427     0.81250
## Specificity             0.94604      0.8947     1.00000
## Pos Pred Value          0.58333      0.9841     1.00000
## Neg Pred Value          0.99621      0.6939     0.98955
## Prevalence              0.07333      0.8733     0.05333
## Detection Rate          0.07000      0.8233     0.04333
## Detection Prevalence    0.12000      0.8367     0.04333
## Balanced Accuracy       0.95029      0.9187     0.90625

- Model Based On Data Test

pred_model_dt_test <- predict(object = model_dt, newdata = wika_test, type = "response")

eval_pred_model_dt_test <- confusionMatrix(data = pred_model_dt_test,
                                      reference = as.factor(wika_test$final_decision))

table(wika_test$final_decision)

## 
##  Buy Hold Sell 
##   29  372   19

eval_pred_model_dt_test

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Buy Hold Sell
##       Buy   28   21    0
##       Hold   1  349    2
##       Sell   0    2   17
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9381          
##                  95% CI : (0.9106, 0.9592)
##     No Information Rate : 0.8857          
##     P-Value [Acc > NIR] : 0.0001954       
##                                           
##                   Kappa : 0.75            
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: Buy Class: Hold Class: Sell
## Sensitivity             0.96552      0.9382     0.89474
## Specificity             0.94629      0.9375     0.99501
## Pos Pred Value          0.57143      0.9915     0.89474
## Neg Pred Value          0.99730      0.6618     0.99501
## Prevalence              0.06905      0.8857     0.04524
## Detection Rate          0.06667      0.8310     0.04048
## Detection Prevalence    0.11667      0.8381     0.04524
## Balanced Accuracy       0.95590      0.9378     0.94487

4.2 Model Random Forest

Random Forest makes predictions by making many decision trees. Each decision tree has characteristics and is not interrelated, each one of decision tree makes their own predictions, then from the prediction results, majority voting is carried out. The class with the highest number will be the final prediction result.

Model Random Forest can be obtained from libary(e1071).

library(e1071)

4.2.1 BBRI.JK

- Model Based On Data Train & Validation

The first step is on Random Forest modeling is set the control variable and there are two variables that have to be set. The first one is K-Fold and the second one is how many times do the process want to be repeated.

K-Fold can be said as cross validation, usually cross validation is used to split between data train and data test but in Random Forest is used to divides the data by k equal parts, where each part is used to test data in turn. As k gets larger, the difference in size between the training set and the resampling subsets gets smaller. As this difference decreases, the bias of the technique becomes smaller. Usually, the choice of k is usually 5 or 10 but there is no formal rule.

In this case the k-fold will be 5 and will be repeat 3 times.

# set.seed(100)
# 
# ctrl <- trainControl(method = "repeatedcv",
#                       number = 5, # k-fold
#                       repeats = 3) #repetition
# 
# bbri_forest <- train(final_decision ~ .,
#                     data = bbri_train_smote,
#                     method = "rf", # random forest
#                     trControl = ctrl)
# 
# saveRDS(bbri_forest, "bbri_forest.RDS")

bbri_forest <- readRDS("model/bbri_forest.RDS")
bbri_forest

## Random Forest 
## 
## 1608 samples
##   23 predictor
##    3 classes: 'Buy', 'Hold', 'Sell' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold, repeated 3 times) 
## Summary of sample sizes: 1286, 1287, 1287, 1286, 1286, 1287, ... 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa    
##    2    0.9844566  0.9766327
##   14    0.9830047  0.9744622
##   27    0.9807247  0.9710376
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 2.

From the model summary, we know that the optimum number of variables considered for splitting at each tree node is at mtry = 2. Since the largest accuracy value was produce at mtry = 2.

In order to find out what is the most importance variable or predictor used in random forest, function varImp() can be used.

varImp(bbri_forest)

## rf variable importance
## 
##   only 20 most important variables shown (out of 27)
## 
##                   Overall
## decision.MACDSell  100.00
## decision.MACDHold   76.76
## MACD                65.68
## RSI30               57.90
## RSI14               49.92
## RSI7                44.94
## RSI70               39.47
## decision.EMAHold    35.62
## volume              25.51
## decision.RSIHold    24.53
## close               20.98
## SMA70               20.11
## open                19.07
## low                 18.67
## SMA50               18.29
## high                18.20
## EMA15               17.30
## SMA5                16.52
## EMA70               16.35
## adjusted            15.73

rm_pred <- predict(bbri_forest, bbri_validation, type = "raw")

eval_rf <- confusionMatrix(data = rm_pred,
                reference = bbri_validation$final_decision)

table(bbri_validation$final_decision)

## 
##  Buy Hold Sell 
##    9  271   20

eval_rf

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Buy Hold Sell
##       Buy    8    2    0
##       Hold   1  269    1
##       Sell   0    0   19
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9867          
##                  95% CI : (0.9662, 0.9964)
##     No Information Rate : 0.9033          
##     P-Value [Acc > NIR] : 2.805e-09       
##                                           
##                   Kappa : 0.9254          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: Buy Class: Hold Class: Sell
## Sensitivity             0.88889      0.9926     0.95000
## Specificity             0.99313      0.9310     1.00000
## Pos Pred Value          0.80000      0.9926     1.00000
## Neg Pred Value          0.99655      0.9310     0.99644
## Prevalence              0.03000      0.9033     0.06667
## Detection Rate          0.02667      0.8967     0.06333
## Detection Prevalence    0.03333      0.9033     0.06333
## Balanced Accuracy       0.94101      0.9618     0.97500

- Model Based On Data Test

rm_pred_test <- predict(bbri_forest, bbri_test, type = "raw")

eval_rf_test <- confusionMatrix(data = rm_pred_test,
                reference = bbri_test$final_decision)

table(bbri_test$final_decision)

## 
##  Buy Hold Sell 
##   10  382   38

eval_rf_test

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Buy Hold Sell
##       Buy    0    0    0
##       Hold   8  382    6
##       Sell   2    0   32
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9628          
##                  95% CI : (0.9403, 0.9786)
##     No Information Rate : 0.8884          
##     P-Value [Acc > NIR] : 2.133e-08       
##                                           
##                   Kappa : 0.7872          
##                                           
##  Mcnemar's Test P-Value : 0.001134        
## 
## Statistics by Class:
## 
##                      Class: Buy Class: Hold Class: Sell
## Sensitivity             0.00000      1.0000     0.84211
## Specificity             1.00000      0.7083     0.99490
## Pos Pred Value              NaN      0.9646     0.94118
## Neg Pred Value          0.97674      1.0000     0.98485
## Prevalence              0.02326      0.8884     0.08837
## Detection Rate          0.00000      0.8884     0.07442
## Detection Prevalence    0.00000      0.9209     0.07907
## Balanced Accuracy       0.50000      0.8542     0.91850

4.2.2 ISAT.JK

# set.seed(100)
# 
# ctrl <- trainControl(method = "repeatedcv",
#                       number = 5, # k-fold
#                       repeats = 3) #repetition
# 
# isat_forest <- train(final_decision ~ .,
#                     data = isat_train_smote,
#                     method = "rf", # random forest
#                     trControl = ctrl)
# 
# saveRDS(isat_forest, "isat_forest.RDS")

isat_forest <- readRDS("model/isat_forest.RDS")
isat_forest

## Random Forest 
## 
## 1513 samples
##   26 predictor
##    3 classes: 'Buy', 'Hold', 'Sell' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 5 times) 
## Summary of sample sizes: 1362, 1361, 1362, 1361, 1363, 1362, ... 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa    
##    2    0.9829496  0.9744057
##   16    0.9767392  0.9651042
##   30    0.9737059  0.9605557
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 2.

From the model summary, we know that the optimum number of variables considered for splitting at each tree node is at mtry = 2. Since the largest accuracy value was produce at mtry = 2.

In order to find out what is the most importance variable or predictor used in random forest, function varImp() can be used.

varImp(isat_forest)

## rf variable importance
## 
##   only 20 most important variables shown (out of 30)
## 
##                  Overall
## decision.EMASell  100.00
## decision.EMAHold   97.03
## RSI45              67.64
## RSI10              67.18
## MACD               64.18
## RSI70              59.30
## RSI38              54.35
## open               39.01
## decision.RSIHold   36.33
## volume             35.49
## low                31.57
## high               31.32
## EMA18              31.10
## EMA5               30.23
## close              27.46
## EMA70              27.20
## EMA10              26.95
## adjusted           26.75
## SMA5               26.21
## EMA30              26.14

rm_pred <- predict(isat_forest, isat_validation, type = "raw")

eval_rf <- confusionMatrix(data = rm_pred,
                reference = isat_validation$final_decision)

table(isat_validation$final_decision)

## 
##  Buy Hold Sell 
##    9  277   14

eval_rf

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Buy Hold Sell
##       Buy    7    3    0
##       Hold   2  273    3
##       Sell   0    1   11
## 
## Overall Statistics
##                                           
##                Accuracy : 0.97            
##                  95% CI : (0.9438, 0.9862)
##     No Information Rate : 0.9233          
##     P-Value [Acc > NIR] : 0.000562        
##                                           
##                   Kappa : 0.788           
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: Buy Class: Hold Class: Sell
## Sensitivity             0.77778      0.9856     0.78571
## Specificity             0.98969      0.7826     0.99650
## Pos Pred Value          0.70000      0.9820     0.91667
## Neg Pred Value          0.99310      0.8182     0.98958
## Prevalence              0.03000      0.9233     0.04667
## Detection Rate          0.02333      0.9100     0.03667
## Detection Prevalence    0.03333      0.9267     0.04000
## Balanced Accuracy       0.88373      0.8841     0.89111

- Model Based On Data Test

rm_pred_test <- predict(isat_forest, isat_test, type = "raw")

eval_rf_test <- confusionMatrix(data = rm_pred_test,
                reference = isat_test$final_decision)

table(isat_test$final_decision)

## 
##  Buy Hold Sell 
##   12  392   26

eval_rf_test

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Buy Hold Sell
##       Buy    0    0    0
##       Hold  12  392    9
##       Sell   0    0   17
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9512          
##                  95% CI : (0.9263, 0.9695)
##     No Information Rate : 0.9116          
##     P-Value [Acc > NIR] : 0.001322        
##                                           
##                   Kappa : 0.5998          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: Buy Class: Hold Class: Sell
## Sensitivity             0.00000      1.0000     0.65385
## Specificity             1.00000      0.4474     1.00000
## Pos Pred Value              NaN      0.9492     1.00000
## Neg Pred Value          0.97209      1.0000     0.97821
## Prevalence              0.02791      0.9116     0.06047
## Detection Rate          0.00000      0.9116     0.03953
## Detection Prevalence    0.00000      0.9605     0.03953
## Balanced Accuracy       0.50000      0.7237     0.82692

4.2.3 SIDO.JK

# set.seed(100)
# 
# ctrl <- trainControl(method = "repeatedcv",
#                       number = 5, # k-fold
#                       repeats = 3) #repetition
# 
# sido_forest <- train(final_decision ~ .,
#                     data = sido_train_smote,
#                     method = "rf", # random forest
#                     trControl = ctrl)
# 
# saveRDS(sido_forest, "sido_forest.RDS")

sido_forest <- readRDS("model/sido_forest.RDS")
sido_forest

## Random Forest 
## 
## 1636 samples
##   25 predictor
##    3 classes: 'Buy', 'Hold', 'Sell' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 5 times) 
## Summary of sample sizes: 1474, 1471, 1472, 1473, 1473, 1472, ... 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa    
##    2    0.9828884  0.9682234
##   15    0.9835079  0.9695845
##   29    0.9809350  0.9646569
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 15.

From the model summary, we know that the optimum number of variables considered for splitting at each tree node is at mtry = 2. Since the largest accuracy value was produce at mtry = 15.

In order to find out what is the most importance variable or predictor used in random forest, function varImp() can be used.

varImp(sido_forest)

## rf variable importance
## 
##   only 20 most important variables shown (out of 29)
## 
##                   Overall
## decision.MACDSell 100.000
## decision.MACDHold  58.577
## decision.RSIHold   41.468
## RSI10              17.085
## decision.RSISell   16.592
## RSI14              10.095
## MACD                8.435
## RSI40               8.271
## RSI65               3.334
## SMA80               3.118
## volume              2.331
## open                2.072
## decision.SMASell    1.555
## close               1.523
## EMA55               1.422
## high                1.388
## EMA60               1.178
## low                 1.138
## SMA55               1.129
## EMA5                1.075

rm_pred <- predict(sido_forest, sido_validation, type = "raw")

eval_rf <- confusionMatrix(data = rm_pred,
                reference = sido_validation$final_decision)

table(sido_validation$final_decision)

## 
##  Buy Hold Sell 
##    5  265   31

eval_rf

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Buy Hold Sell
##       Buy    4    0    0
##       Hold   0  265    0
##       Sell   1    0   31
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9967          
##                  95% CI : (0.9816, 0.9999)
##     No Information Rate : 0.8804          
##     P-Value [Acc > NIR] : 9.346e-16       
##                                           
##                   Kappa : 0.9845          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: Buy Class: Hold Class: Sell
## Sensitivity             0.80000      1.0000      1.0000
## Specificity             1.00000      1.0000      0.9963
## Pos Pred Value          1.00000      1.0000      0.9688
## Neg Pred Value          0.99663      1.0000      1.0000
## Prevalence              0.01661      0.8804      0.1030
## Detection Rate          0.01329      0.8804      0.1030
## Detection Prevalence    0.01329      0.8804      0.1063
## Balanced Accuracy       0.90000      1.0000      0.9981

- Model Based On Data Test

rm_pred_test <- predict(sido_forest, sido_test, type = "raw")

eval_rf_test <- confusionMatrix(data = rm_pred_test,
                reference = sido_test$final_decision)

table(sido_test$final_decision)

## 
##  Buy Hold Sell 
##   17  378   17

eval_rf_test

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Buy Hold Sell
##       Buy   17    0    1
##       Hold   0  377    0
##       Sell   0    1   16
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9951          
##                  95% CI : (0.9826, 0.9994)
##     No Information Rate : 0.9175          
##     P-Value [Acc > NIR] : 2.806e-13       
##                                           
##                   Kappa : 0.9691          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: Buy Class: Hold Class: Sell
## Sensitivity             1.00000      0.9974     0.94118
## Specificity             0.99747      1.0000     0.99747
## Pos Pred Value          0.94444      1.0000     0.94118
## Neg Pred Value          1.00000      0.9714     0.99747
## Prevalence              0.04126      0.9175     0.04126
## Detection Rate          0.04126      0.9150     0.03883
## Detection Prevalence    0.04369      0.9150     0.04126
## Balanced Accuracy       0.99873      0.9987     0.96932

4.2.4 HOKI.JK

# set.seed(100)
# 
# ctrl <- trainControl(method = "repeatedcv",
#                       number = 5, # k-fold
#                       repeats = 3) #repetition
# 
# hoki_forest <- train(final_decision ~ .,
#                     data = hoki_train_smote,
#                     method = "rf", # random forest
#                     trControl = ctrl)
# 
# saveRDS(hoki_forest, "bbri_forest.RDS")

hoki_forest <- readRDS("model/hoki_forest.RDS")
hoki_forest

## Random Forest 
## 
## 1723 samples
##   25 predictor
##    3 classes: 'Buy', 'Hold', 'Sell' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 5 times) 
## Summary of sample sizes: 1551, 1551, 1550, 1551, 1551, 1551, ... 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa    
##    2    0.9789931  0.9681189
##   15    0.9787639  0.9678320
##   29    0.9804981  0.9704689
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 29.

From the model summary, we know that the optimum number of variables considered for splitting at each tree node is at mtry = 2. Since the largest accuracy value was produce at mtry = 29.

In order to find out what is the most importance variable or predictor used in random forest, function varImp() can be used.

varImp(hoki_forest)

## rf variable importance
## 
##   only 20 most important variables shown (out of 29)
## 
##                    Overall
## decision.MACDSell 100.0000
## decision.MACDHold  64.4945
## decision.RSIHold   49.1232
## decision.RSISell   27.0830
## RSI10               4.7417
## RSI65               2.4887
## RSI14               2.2891
## volume              2.0611
## close               1.7850
## SMA55               1.3956
## high                1.2195
## RSI40               1.0902
## adjusted            0.8876
## SMA70               0.8734
## EMA60               0.8670
## SMA5                0.8481
## MACD                0.8403
## open                0.6345
## low                 0.6100
## SMA25               0.5980

rm_pred <- predict(hoki_forest, hoki_validation, type = "raw")

eval_rf <- confusionMatrix(data = rm_pred,
                reference = hoki_validation$final_decision)

table(hoki_validation$final_decision)

## 
##  Buy Hold Sell 
##   13  273   15

eval_rf

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Buy Hold Sell
##       Buy   13    0    0
##       Hold   0  273    0
##       Sell   0    0   15
## 
## Overall Statistics
##                                      
##                Accuracy : 1          
##                  95% CI : (0.9878, 1)
##     No Information Rate : 0.907      
##     P-Value [Acc > NIR] : 1.724e-13  
##                                      
##                   Kappa : 1          
##                                      
##  Mcnemar's Test P-Value : NA         
## 
## Statistics by Class:
## 
##                      Class: Buy Class: Hold Class: Sell
## Sensitivity             1.00000       1.000     1.00000
## Specificity             1.00000       1.000     1.00000
## Pos Pred Value          1.00000       1.000     1.00000
## Neg Pred Value          1.00000       1.000     1.00000
## Prevalence              0.04319       0.907     0.04983
## Detection Rate          0.04319       0.907     0.04983
## Detection Prevalence    0.04319       0.907     0.04983
## Balanced Accuracy       1.00000       1.000     1.00000

- Model Based On Data Test

rm_pred_test <- predict(hoki_forest, hoki_test, type = "raw")

eval_rf_test <- confusionMatrix(data = rm_pred_test,
                reference = hoki_test$final_decision)

table(hoki_test$final_decision)

## 
##  Buy Hold Sell 
##    5   53    2

eval_rf_test

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Buy Hold Sell
##       Buy    5    0    2
##       Hold   0   53    0
##       Sell   0    0    0
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9667          
##                  95% CI : (0.8847, 0.9959)
##     No Information Rate : 0.8833          
##     P-Value [Acc > NIR] : 0.0233          
##                                           
##                   Kappa : 0.8413          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: Buy Class: Hold Class: Sell
## Sensitivity             1.00000      1.0000     0.00000
## Specificity             0.96364      1.0000     1.00000
## Pos Pred Value          0.71429      1.0000         NaN
## Neg Pred Value          1.00000      1.0000     0.96667
## Prevalence              0.08333      0.8833     0.03333
## Detection Rate          0.08333      0.8833     0.00000
## Detection Prevalence    0.11667      0.8833     0.00000
## Balanced Accuracy       0.98182      1.0000     0.50000

4.2.5 WIKA.JK

# set.seed(100)
# 
# ctrl <- trainControl(method = "repeatedcv",
#                       number = 5, # k-fold
#                       repeats = 3) #repetition
# 
# wika_forest <- train(final_decision ~ .,
#                     data = wika_train_smote,
#                     method = "rf", # random forest
#                     trControl = ctrl)
# 
# saveRDS(wika_forest, "wika_forest.RDS")

wika_forest <- readRDS("model/wika_forest.RDS")
wika_forest

## Random Forest 
## 
## 1568 samples
##   25 predictor
##    3 classes: 'Buy', 'Hold', 'Sell' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 5 times) 
## Summary of sample sizes: 1412, 1410, 1411, 1412, 1411, 1412, ... 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa    
##    2    0.9864747  0.9794725
##   15    0.9866086  0.9796980
##   29    0.9839277  0.9756386
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 15.

From the model summary, we know that the optimum number of variables considered for splitting at each tree node is at mtry = 2. Since the largest accuracy value was produce at mtry = 15.

In order to find out what is the most importance variable or predictor used in random forest, function varImp() can be used.

varImp(wika_forest)

## rf variable importance
## 
##   only 20 most important variables shown (out of 29)
## 
##                    Overall
## decision.MACDSell 100.0000
## decision.MACDHold  75.0120
## MACD               43.4161
## decision.RSIHold   21.8267
## RSI5               19.5272
## RSI20              15.5693
## decision.RSISell    3.6135
## RSI35               2.8092
## RSI65               2.5218
## volume              1.7240
## SMA65               1.4097
## SMA80               0.8842
## EMA80               0.8590
## EMA65               0.7612
## low                 0.6694
## SMA20               0.6367
## SMA30               0.6154
## close               0.5905
## EMA30               0.5731
## high                0.5442

rm_pred <- predict(wika_forest, wika_validation, type = "raw")

eval_rf <- confusionMatrix(data = rm_pred,
                reference = wika_validation$final_decision)

table(wika_validation$final_decision)

## 
##  Buy Hold Sell 
##   22  262   16

eval_rf

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Buy Hold Sell
##       Buy   22    0    0
##       Hold   0  262    0
##       Sell   0    0   16
## 
## Overall Statistics
##                                      
##                Accuracy : 1          
##                  95% CI : (0.9878, 1)
##     No Information Rate : 0.8733     
##     P-Value [Acc > NIR] : < 2.2e-16  
##                                      
##                   Kappa : 1          
##                                      
##  Mcnemar's Test P-Value : NA         
## 
## Statistics by Class:
## 
##                      Class: Buy Class: Hold Class: Sell
## Sensitivity             1.00000      1.0000     1.00000
## Specificity             1.00000      1.0000     1.00000
## Pos Pred Value          1.00000      1.0000     1.00000
## Neg Pred Value          1.00000      1.0000     1.00000
## Prevalence              0.07333      0.8733     0.05333
## Detection Rate          0.07333      0.8733     0.05333
## Detection Prevalence    0.07333      0.8733     0.05333
## Balanced Accuracy       1.00000      1.0000     1.00000

- Model Based On Data Test

rm_pred_test <- predict(wika_forest, wika_test, type = "raw")

eval_rf_test <- confusionMatrix(data = rm_pred_test,
                reference = wika_test$final_decision)

table(wika_test$final_decision)

## 
##  Buy Hold Sell 
##   29  372   19

eval_rf_test

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Buy Hold Sell
##       Buy   26    2    0
##       Hold   3  368    0
##       Sell   0    2   19
## 
## Overall Statistics
##                                          
##                Accuracy : 0.9833         
##                  95% CI : (0.966, 0.9933)
##     No Information Rate : 0.8857         
##     P-Value [Acc > NIR] : 2.169e-14      
##                                          
##                   Kappa : 0.9209         
##                                          
##  Mcnemar's Test P-Value : NA             
## 
## Statistics by Class:
## 
##                      Class: Buy Class: Hold Class: Sell
## Sensitivity             0.89655      0.9892     1.00000
## Specificity             0.99488      0.9375     0.99501
## Pos Pred Value          0.92857      0.9919     0.90476
## Neg Pred Value          0.99235      0.9184     1.00000
## Prevalence              0.06905      0.8857     0.04524
## Detection Rate          0.06190      0.8762     0.04524
## Detection Prevalence    0.06667      0.8833     0.05000
## Balanced Accuracy       0.94572      0.9634     0.99751

5 Summary

Model Decision Tree & Random Forest Selection

Based on data train and data test confusion matrix results from both model, model decision tree and random forest produce good prediction performance result on data train but only model decision tree which can produce stable performance when evaluated using data test, while model random forest often make prediction errors when evaluated using the data test.

In addition to matching how many prediction results are in accordance with the target variable, the evaluation of model performance that is considered here is the variable Sensitivity statistics in each class of target variables from both models. Sensitivity is a measure of the proportion of actual positive cases that got predicted as positive (or true positive). Sensitivity is also termed as Recall. This implies that there will be another proportion of actual positive cases, which would get predicted incorrectly as negative (and, thus, could also be termed as the false negative)

Mathematically, sensitivity can be calculated as the following:

\[Sensitivity = (True Positive)/(True Positive + False Negative)\]

The following is the details in relation to True Positive and False Negative used in the above equation:
- True Positive = Machine learning predicted should buy a stock and actually technical analysis also provide signal to buy a stock. In other words, the true positive represents how many technical analysis signal to buy a stock and predicted to buy a stock.
- False Negative = Machine learning predicted should buy a stock and actually technical analysis provide signal to sell a stock. In other words, the false negative represents the number of tehcnical analysis provide to sell a stock and got predicted as buy a stock Ideally, we would seek the model to have low false negatives as it might prove to be financial threatening.

The higher value of sensitivity would mean higher value of true positive and lower value of false negative. The lower value of sensitivity would mean lower value of true positive and higher value of false negative. For financial domain, models with high sensitivity will be desired. And when viewed from the confusion matrix, the decision tree model has better sensitivity compared to the random forest model.

Therefore, it would be better to use a Decision Tree Model compared to Model Random Forest in this project to determine when is the right time to buy or sell stocks.

In addition, the confusion matrix that will be showed in the dashboard is the Accuracy from the model because I am simulating my dashboard will be used by real investors, and to minimize confusion I will only display a fairly common value.

Variable of Importance

In addition to getting recommendations on when is the right time to buy or sell a stock, what can also be obtained from the results of machine learning processing is the variable of importance. Variable of importance can also be said as the most consider predictor by machine learning, in this case are which technical analysis are the most considered by machine learning to provide suggestion whether today is the right time to buy or sell a stock.

By knowing the variable of importance, there are several benefits obtained:
- Beginner investors who have never learned about technical analysis can start learning from technical analysis which is the most considered by machine learning.
- When you want to make improvements to the model for, the variable of importance can be used to choose what variables can be eliminated because they do not have much effect on the results.

Variable of importance on each stocks:
- BBRI.JK : MACD
- ISAT.JK : MACD
- SIDO.JK : MACD
- HOKI.JK : MACD
- WIKA.JK : MACD