1 Introduction

In this report, we will evaluate machine learning application on investing strategies. There are several ways to incorporate machine learning techniques to construct a portfolios. However, we will focus on 2 main approaches; sparse portfolio and signals.

1.1 Models

** Benchmark portfolio ** We need benchmark portfolio to evaluate our strategies. For simplicity, we use equally-weighted portfolio as our benchmark. Where it is needed, the same weighting scheme will be applied to predicted results from machine learning strategies to clearly contrast the impact of having included machine learning techniques in the strategies.

** Sparse portfolio Sparse portfolio applies machine learning algorithm to predict stock future return by using other stocks’ current returns as features. Although it is not conventional, this approach gave insightful results as discussed in the class through Lasso method that it can be used to assign weights to each stock directly. Here, we will investigate the methodology by applying elastic network**. Elastic network is a combination of Lasso and Ridge regressions. Both are regularization methods which help the model avoid overfitting problem caused by inclusion of too many features. To do this, they penalise insignificant features, meaning that they reduces beta of each feature leaving large beta’s for significant features. Nonetheless, the difference is that Ridge reduces all beta’s without kicking any of features out of the model (l2 norm), while Lasso does both (l1 norm). We decided to apply elastic network because it allows us to utilise both of their advantages; feature selection from Lasso while not kicking the feature out too fast from Ridge. Therefore, hyperparameter for this method is the degree to use l1 and l2 norms, which are alpha and alpha-1.

** Signals ** This approach is rather more conventional application of machine learning to predict future returns. We will use the features provided for each stock to predict their future returns in each period. However, to consider the distribution of the data, we also add mean, variance, and skewness of each stock.

Signal approach can be splitted into 2 sub-approaches; numerical and categorical dependent variables. As future returns are numerical variable, we can apply only some machine learning techniques. Hence, we will transform future returns into categorical variable to allow us to construct a portfolio by utilising both types of algorithm.

While still considering the sign (positive and negative returns), we will work with 3 signals; percentile, 1-period momentum, and 3-period momentum. For numerical dependent variable, we transform predicted future returns into signals before apply equally-weighted scheme. For categorical approach, we transform actual future returns into signals, then use the signal as dependent variables. Lastly, we use the predicted signals to apply equally-weighted scheme.

2 Data Preparation

2.1 Import Data

Import and preview data

load('data_full.RData')
head(data)

2.2 Data Pre-processing

In addition to the given data, we need to perform some important data-preprocessing in order to generate new variables that are essential to our analysis. Those variables include the following:

  1. P_Return (Past return): The original data only contains monthly closing price information from which we calculate past monthly return. In general, return is commonly used rather than closing price in portfolio backtesting.

  2. F_Return (Future return): Our y variable is future monthly return which can be obtained by leading past return by one period.

  3. Moments of Return (Mean, Variance, and Skewness): We decided to generate additional information including first three moments of past monthly return to maximize the benefit of the data we have.

if(!require(tidyverse)){install.packages('tidyverse')}
if(!require(PerformanceAnalytics)){install.packages('PerformanceAnalytics')}
if(!require(dplyr)){install.packages("dplyr")}
library(dplyr)
library(tidyverse)
library(PerformanceAnalytics)
load('data_full.RData')                             # Reload data
data <- data %>% arrange(Date,Tick)
tick <- levels(data$Tick) 
data <- data  %>% 
    group_by(Tick) %>%                              # Group asset by asset
    mutate(P_Return = Close / lag(Close) - 1) %>%   # Adding past returns
    mutate(F_Return = lead(P_Return)) %>%           # Adding forward returns
    na.omit() %>%                                   # Drop all NA row before calculate moments
    mutate(mean = apply.fromstart(P_Return, FUN = 'mean' , gap = 1)) %>%              # Adding past return mean
    mutate(variance = apply.fromstart(P_Return, FUN = 'var' , gap = 1)) %>%           # Adding past return variance
    mutate(skewness = apply.fromstart(P_Return, FUN = 'skewness' , gap = 1)) %>%      # Adding past return skewness
    na.omit() %>%                                                                        # Drop all NA row
    ungroup() #addadd
head(data)

Before starting the analysis, we would like to investigate correlation and autocorrelation between the features in order to be aware of any hidden relationship among the explanatory variables. First, we find correlation between explanatory variables.

corr <- data %>%
  dplyr::select(-one_of('Date', 'Tick', 'F_Return'))
corr <- data.frame(cor(corr))
corr

There are 3 pairs of interesting correlation between explanatory variables.

  1. RSI postively correlated with Past Return (0.92)
  2. D2E positively correlated with P2B (0.64)
  3. Vol1M negatively correlated with Close (-0.66)

It is important to be aware that overfitting might occur if the model uses the training data which includes 2 or more positively correlated variables due to their information redundancy.

Apart from correlation between explanatory variables, we should also be considering the autocorrelation for each feature. Below is the analysis of autocorrelation for our x variables. It is obvious than there is not any significant autocorrelation for any feature.

for (i in seq(1,length(colnames(acf)),1)){
  temp_c <- cbind(temp_c, cor(autocorr[1:nrow(autocorr)-1,i], autocorr[2:nrow(autocorr),i]))
}
Error in seq.default(1, length(colnames(acf)), 1) : 
  wrong sign in 'by' argument

2.3 Split data

Within our monthly data over 25 years, there are 229 dates from April 1994 to April 2018. We decided to split the data into training set and testing set using 70/30 ratio. Since we are dealing with timeseries data, we simply split the dataset using the separation date which will give such desirable ratio, January 2013.

dates_all <- unique(data$Date)              # Select all unique dates
sep_date <- as.Date("2013-01-02")           # Set the seperation date for training set and          out-of-sample
dates_oos <- dates_all[dates_all >= sep_date]        # Out of sample set

3 Pre-defined functions and variables

Below includes the global functions and variables, which are utilised throughout the file.

3.1 Returns

returns <- data %>%                         
    dplyr::select(Date, Tick, P_Return) %>%        
    spread(key = Tick, value = P_Return)    
head(returns)
asset_return <- filter(returns, Date>sep_date)   
returns_oos <- data %>%                         
    dplyr::select(Date, Tick, P_Return) %>%       
    spread(key = Tick, value = P_Return) %>%
    filter(Date > sep_date) %>%
    dplyr::select(-Date)

3.2 Equally-weighted calculation

getEW <- function(weight){ # Define a function to calculate equally-weighted portfolio
  ew <- (t(scale(t(weight), center=FALSE, scale=colSums(t(weight)))))  # Normalise weight (Equally weighted)
  ew[is.na(ew)] <- 0
  return (ew)
}

3.3 Feature Selection 1

Define a function for feature selection for each stock using lasso. The function receives stock ticker as a parameter and returns a dataframe with only selected explanatory variables. The function uses glmnet package to build a lasso model using lambda.min which minimize the mean cross-validate error. In general, LASSO is a common tool for feature selection due to its penalised terms in objective function. Any variables with insigficant meaning to our response variable will be eliminated by lasso. There is a special case in which lasso eliminates all features. As a result, past return (P_Return) is used for the only feature.

if(!require(glmnet)){install.packages('glmnet')}
library(glmnet)
selectFeature <- function(ticker){
  set.seed(9) # Set seed for consistency
  temp <- data %>% filter(Tick == ticker) %>% # Filter ticker
      dplyr::select(-one_of('Tick', 'Date','Close')) # Drop tick, date, and close columns
  lasso <- cv.glmnet(x = as.matrix(temp[,-which(names(temp) == "F_Return")]), y = as.matrix(temp[,which(names(temp) == "F_Return")])) # Train lasso model
  subset = coef(lasso, s = "lambda.min")@Dimnames[[1]][coef(lasso, s = "lambda.min")@i + 1] # Get list of remaining variables
  subset =  subset[!subset == "(Intercept)"] # Get rid of intercept value
  if(length(subset) == 0){ # If there is no remaining variable
    subset = c('P_Return') # Use P_Return as only explanatory variable
  }
  return (data %>% filter(Tick == ticker) %>% # Return dataframe with Date, F_Return, and the selected explanatory variables
    dplyr::select(cbind('Date','F_Return',subset)))
}

Test the function by selecting feature for Apple stock

selectFeature('AAPL')

For Apple, lasso chooses 6 meaningful features including Mkt_Cap, Vol_1M, Prof_growth, mean, variance, and skewness.

3.4 Feature Selection 2

Use signal of future returns as dependent variable and perform feature selection using lasso for each stock.

if(!require(glmnet)){install.packages('glmnet')}
library(glmnet)
selectFeature2 <- function(ticker,sig_dat){
    #sig_dat: matrix with signal outputs for each stock
  set.seed(9)
  temp <- data %>% filter(Tick == ticker) %>%
      select(-one_of('Date','Close'))
  temp <- temp[-c(1)]
  temp <- temp[37:226,] #we trim the dataset for categorical, the reason is explained in the categorial section below.
  y = as.tibble(sig_dat) %>%
    select(ticker) 
  y = as.matrix(y)
  colnames(y) <- "y"
  lasso <- cv.glmnet(x = as.matrix(temp[,-which(names(temp) == "F_Return")]), 
                     y = y)
  subset = coef(lasso, s = "lambda.min")@Dimnames[[1]][coef(lasso, s = "lambda.min")@i + 1]
  subset =  subset[!subset == "(Intercept)"]
  if(length(subset) == 0){
    subset = c('P_Return')
  }
  result <- (data %>% filter(Tick == ticker) %>%
    select(cbind('Date',subset))) %>%
    slice(37:226) %>%
    mutate(output = y)
}

3.5 Performance Metrics

perf_met <- function(portf_returns, weights, asset_returns){
    avg_ret <- mean(portf_returns, na.rm = T)                     # Arithmetic mean 
    vol <- sd(portf_returns, na.rm = T)                           # Volatility
    Sharpe_ratio <- avg_ret / vol                                 # Sharpe ratio
    VaR_5 <- quantile(portf_returns, 0.05, na.rm = TRUE)          # Value-at-risk
    turn <- 0    
    for(t in 2:dim(weights)[1]){
        realised_returns <- asset_returns %>% filter(Date == dates_oos[t]) %>%  dplyr::select(-Date)
        prior_weights <- weights[t-1,] * (1 + realised_returns)
        temp_turn <- apply(abs(weights[t,] - prior_weights/sum(prior_weights)),1,sum)
        if(!any(is.na(temp_turn))){
          turn <- turn + apply(abs(weights[t,] - prior_weights/sum(prior_weights)),1,sum)
        }
    }
    turn <- turn/(length(dates_oos)-1)                                # Average over time
    met <- data.frame(avg_ret, vol, Sharpe_ratio, VaR_5, turn)    # Aggregation of all of this
    rownames(met) <- "metrics"
    return(met)
}
perf_met2 <- function(ret,weight,dates_oos,port_name,actual_ret){
    avg_ret <- mean(ret, na.rm = T)                     # Arithmetic mean 
    vol <- sd(ret, na.rm = T)                           # Volatility
    Sharpe_ratio <- avg_ret / vol                           # Sharpe ratio
    VaR_5 <- quantile(ret, 0.05)                        # Value-at-risk
    turn <- 0
    for(t in 2:length(ret)) {
        realised_ret <- actual_ret %>%
          filter(Date == dates_oos[t]) %>%
          select(-Date)
        prior_weight <- weight[t-1,] * (1 + realised_ret) # Before rebalancing
        turn <- turn + apply(abs(weight[t,] - prior_weight/sum(prior_weight)),1,sum)
    }
    turn <- turn/(length(ret)-1)
    met <- data.frame(avg_ret, vol, Sharpe_ratio, VaR_5, turn)    # Aggregation of all of this
    rownames(met) <- port_name
    return(met)
}

4 Investing Strategies

We create various portfolios using same set of 209 stocks based on 3 different schemes.

  • Sparse portfolio Weight to each stock is assigned based on the influence of other stocks’ returns on that stock. Whereas, the degree of such influence (weight) is determined by the following algorithms.
    • Elastic Network
  • Signals
    • Numercial: use future returns as dependent variables Steps
      • Use Lasso to determine features for each stocks
      • Predict the future returns for each stock using different ML algorithms, including
        • Lasso
        • KNN
        • Random Forest
      • Translate the numerical results into signals using 3 criteria:
        • Percentile - use expanding historical data to split percentile decision regions P0-P33: ‘sell’ P33-P66: ‘hold’, includes in portfolio only if the stock is held/bought from the previous period P66-P100: ‘buy’

        • Monthly Momentum - ‘buy’ a stock if predicted returns increase from previous day, and sell otherwise
        • 3-period Momentum - ‘buy’ a stock if predicted returns have been increasing for all 3 previous periods and the predicted return for the 4th period is in the same direction (rising), and sell otherwise
      • Consider the numerical results, and override signals: ‘buy’ only when signal is ‘buy’ and numerical prediction is positive, ‘sell’ otherwise
      • Apply equally weighted scheme for stocks with ‘buy’ signals
    • Categorical: use signals of future returns as dependent variables Steps
      • Translate the numerical actual future returns into signals using the same 3 criteria:
        • Percentile
        • Monthly Momentum
        • 3-period Momentum
      • Use Lasso to determine features for each stocks
      • Predict the signals of future returns for each stock using different ML algorithms, including
        • QDA
        • KNN
        • Logistic Model
        • Probit Model
      • Apply equally weighted scheme for stocks with ‘buy’ signals
  • Benchmark portfolio To interpret and gain understanding of the use of machine learning, we set equally-weighted portfolio as a benchmark.
    • EW

4.1 Benchmark portfolio

We use equally weighted portfolio as a benchmark for other strategies. The portfolio is simply distributed the risk of each asset equally across the portfolio. The portfolio weights can be computed as it holds all selected assets equally as \(w=\frac{1}{N}\).

4.1.1 Setup functions to calculate equal weigths

weights_ew <- function(returns){
    N = length(returns[1,])
    return(rep(1/N,N))
}

4.1.2 Calculate portfolio return

# Initiate storage for portfolio weights
port_w_ew = matrix(0, nrow = length(dates_oos), ncol = length(tick))
colnames(port_w_ew) = tick


# Initiate storage for portfolio returns
port_return_ew = matrix(0, nrow = length(dates_oos))

for(i in 1:length(dates_oos)){
    
    # Get the past return data
    past_return <- returns %>% 
      filter(Date < dates_oos[i]) %>%
      dplyr::select(-Date)
    
    # Calculate portfolio weight
    port_w_ew[i,] <- weights_ew(past_return)
    
    # Get the realised data
    realised_returns <- returns %>% 
        filter(Date ==  dates_oos[i]) %>%
        dplyr::select(-Date)
    
    # Calculating returns for each portfolio
    port_return_ew[i] <- sum(port_w_ew[i,] * realised_returns)
}

4.1.3 Result: Performance matrix for Benchmark portfolio

asset_returns <- data %>% 
    dplyr::select(Tick, Date, F_Return) %>% 
    spread(key = Tick, value = F_Return)
met_ew = perf_met(port_return_ew,port_w_ew,asset_returns)
row.names(met_ew) = c("Equally weighted")
met_ew

4.2 Sparse Portfolio

4.2.1 Elastic Network

Elastic net is a form of penalized regression which used to create sparse portfolio based on returns of assets. The condition of the regression is as follows \[ \hat{\beta}^{elastic}=\underset{\beta}{\text{argmin}}\, \left\{\frac{1}{N}\sum_{n=1}^N\left( y_{i}-a_i+\sum_{k= 1}^K\beta_{k}x_{k,i}\right)^2+\lambda \alpha \sum_{k= 1}^K| \beta_k|+\lambda (1-\alpha)\sum_{k= 1}^K| \beta_{k}^2|\right\},\]

4.2.1.1 Setup for elastic network weight and return calculations

The model use two parameters \(\lambda\) and \(\alpha\). \(\alpha\) is the parameter that changes the weight between Ridge regression and LASSO regression (Ridge: \(\alpha = 0\), LASSO: \(\alpha = 1\)). \(\lambda\) is the parameter that adjust the weight of the penalised part.

To find portfolio weights using Elastic net, we regress the returns of the selected asset on the returns of other assets as. \(r_{i,t}=a_i+\sum_{n\neq i}^N\beta_{n}r_{n,t}+\epsilon_{t}\)

Using penalised regression to get. \[ \underset{\beta_{i|}}{\text{argmin}}\, \left\{\sum_{t=1}^T\left( r_{i,t}-a_i+\sum_{n\neq i}^N\beta_{i|n}r_{n,t}\right)^2+\lambda \alpha || \beta||_1+\lambda (1-\alpha)||\beta||_2^2\right\},\]

weights_lasso <- function(returns,alpha){  # The parameters are defined here
    w <- 0                                          # Initiate weights
    for(i in 1:ncol(returns)){                      # Loop on the assets
        y <- returns[,i]                            # Dependent variable
        x <- returns[,-i]                           # Independent variable
        fit <- glmnet(x,y, alpha = alpha,lambda = 0.5)
        err <- y-predict(fit, x)                    # Prediction errors
        w[i] <- (1-sum(fit$beta))/var(err)          # Output: weight of asset i
    }
    return(w / sum(w))                              # Normalisation of weights
}

4.2.1.2 Alpha selection and calculation weights & returns

Elastic net portfolio weights can be calculated using the following steps 1. Use the value of alphas ranging from 0, 0.1, 0.2,…,1 and using \(\lambda = 0.5\). 2. Selecting past return data and use the function weights_lasso to calculate protfolio weights 3. Multiply the weights with realised returns to get the portfolio returns 4. Select the highest return portfolio

# Finding alpha that generate largest average return
colMeans(port_return_elas)
         X0        X0.1        X0.2        X0.3        X0.4        X0.5        X0.6        X0.7        X0.8        X0.9          X1 
0.007399314 0.008577279 0.008623625 0.008620449 0.008619760 0.008619747 0.008619747 0.008619747 0.008619747 0.008619747 0.008619747 

4.2.1.3 Result: Performance matrix for Sparse portfolio

4.3 Signals

signals We separate the predicted returns into 3 categories based on the asset past returns. 1. BUY signal (+1): If the predicted return of asset i is higher than P66 of its entire past returns 2. SELL signal (0): If the predicted return of asset i is lower than P33 of its entire past returns 3. HOLD signal (-1):If the predicted return of asset i falls in between P33 and P66 of its entire past returns

4.3.0.1 LASSO

For LASSO method, glmnet is to use penalised regression to predict future returns using the past feature data including past returns. The procedure are as the following steps: 1. Get the past data as independent variables (x) 2. Get the past returns as a dependent variable (y) 3. Use LASSO as a model to predict the future returns 4. Perform cross validation from the function cv.glmnet. the objective is to use the lambda that minimise the variance of the error 5. Use the current feature data to predict the next period returns

4.3.0.1.1 Train LASSO model
# temp_return = matrix(NA, ncol = length(tick), nrow=length(dates_oos))
# colnames(temp_return) = tick
# for (i in 1:length(tick)){
#   print(i)
#   for(t in 1:length(dates_oos)){
#     past_data = data %>%
#       filter(Tick == tick[i]) %>%
#       filter(Date < dates_oos[t]) %>%
#       dplyr::select(-Date,-Tick,-Close) %>%
#       tail(.,48) %>%
#       as.matrix()
#     # Get the training data (3months)
#     train_data = head(past_data,36) %>%
#       as.data.frame()
#     x_train = train_data %>%
#       dplyr::select(-F_Return) %>%
#       as.matrix()
#     y_train = train_data %>%
#       dplyr::select(F_Return) %>%
#       as.matrix
# 
# 
#     MSE = matrix(NA, ncol=10,nrow=11)
# 
#     for(a in 1:11){
#       # Alpha = 0, 0.1, 0.2,...,1
#       alpha = (a-1)/10
#     for(l in 1:10){
#       # Lambda = 0, 0.1, 0.2,...,1
#       lambda = (l)/10
#     
#       # Run penalised regression
#       fit = glmnet(x = x_train, y = y_train, alpha = alpha, lambda = lambda)
# 
#       # Get the validation data (3months)
#       validate_data = tail(past_data,12) %>%
#         as.data.frame() 
#       x_validate = validate_data %>%
#         dplyr::select(-F_Return) %>%
#         as.matrix()
#       y_validate= validate_data %>%
#         dplyr::select(F_Return) %>%
#         as.matrix()
# 
#       # Calculate MSE
#       MSE[a,l] = sum((y_validate - predict(fit,x_validate))^2)
# 
#     }
#   }
# 
# # Optimal parameters
# min_alpha = (which(MSE == min(MSE), arr.ind = TRUE)[1] - 1)/10
# min_lambda = (which(MSE == min(MSE), arr.ind = TRUE)[2])/10
# 
# 
# 
# last_period_x = data %>%
#       filter(Tick == tick[i]) %>%
#       filter(Date == dates_oos[t]) %>%
#       dplyr::select(-Date,-Tick,-Close, -F_Return) %>%
#       as.matrix()
#     
#     fit2 = glmnet(x = x_train, y = y_train, alpha = min_alpha, lambda = min_lambda)
#     temp_return[t,i] <- predict(fit2, last_period_x)
#   }
# }
# 
# write.csv(temp_return, file = "return_lasso.csv")

# For further execution
temp_return = read.csv(file = "return_lasso.csv")
temp_return = temp_return[-1]

** with LASSO with percentile **

# Translate Hold signal
for (i in 1:length(tick)){
  print(i)
  for (t in 2:length(dates_oos)){
    if(q_signal[t,i] == 0){
      q_signal[t,i] = q_signal[t-1,i]
    }
  }
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
[1] 11
[1] 12
[1] 13
[1] 14
[1] 15
[1] 16
[1] 17
[1] 18
[1] 19
[1] 20
[1] 21
[1] 22
[1] 23
[1] 24
[1] 25
[1] 26
[1] 27
[1] 28
[1] 29
[1] 30
[1] 31
[1] 32
[1] 33
[1] 34
[1] 35
[1] 36
[1] 37
[1] 38
[1] 39
[1] 40
[1] 41
[1] 42
[1] 43
[1] 44
[1] 45
[1] 46
[1] 47
[1] 48
[1] 49
[1] 50
[1] 51
[1] 52
[1] 53
[1] 54
[1] 55
[1] 56
[1] 57
[1] 58
[1] 59
[1] 60
[1] 61
[1] 62
[1] 63
[1] 64
[1] 65
[1] 66
[1] 67
[1] 68
[1] 69
[1] 70
[1] 71
[1] 72
[1] 73
[1] 74
[1] 75
[1] 76
[1] 77
[1] 78
[1] 79
[1] 80
[1] 81
[1] 82
[1] 83
[1] 84
[1] 85
[1] 86
[1] 87
[1] 88
[1] 89
[1] 90
[1] 91
[1] 92
[1] 93
[1] 94
[1] 95
[1] 96
[1] 97
[1] 98
[1] 99
[1] 100
[1] 101
[1] 102
[1] 103
[1] 104
[1] 105
[1] 106
[1] 107
[1] 108
[1] 109
[1] 110
[1] 111
[1] 112
[1] 113
[1] 114
[1] 115
[1] 116
[1] 117
[1] 118
[1] 119
[1] 120
[1] 121
[1] 122
[1] 123
[1] 124
[1] 125
[1] 126
[1] 127
[1] 128
[1] 129
[1] 130
[1] 131
[1] 132
[1] 133
[1] 134
[1] 135
[1] 136
[1] 137
[1] 138
[1] 139
[1] 140
[1] 141
[1] 142
[1] 143
[1] 144
[1] 145
[1] 146
[1] 147
[1] 148
[1] 149
[1] 150
[1] 151
[1] 152
[1] 153
[1] 154
[1] 155
[1] 156
[1] 157
[1] 158
[1] 159
[1] 160
[1] 161
[1] 162
[1] 163
[1] 164
[1] 165
[1] 166
[1] 167
[1] 168
[1] 169
[1] 170
[1] 171
[1] 172
[1] 173
[1] 174
[1] 175
[1] 176
[1] 177
[1] 178
[1] 179
[1] 180
[1] 181
[1] 182
[1] 183
[1] 184
[1] 185
[1] 186
[1] 187
[1] 188
[1] 189
[1] 190
[1] 191
[1] 192
[1] 193
[1] 194
[1] 195
[1] 196
[1] 197
[1] 198
[1] 199
[1] 200
[1] 201
[1] 202
[1] 203
[1] 204
[1] 205
[1] 206
[1] 207
[1] 208
[1] 209

** with LASSO with three-momentum **

for (i in 1:length(tick)){
  print(i)
  for (t in 1:length(dates_oos)){
    three_M_past_return = data %>%
      filter(Tick == tick[i]) %>%
      filter(Date <= dates_oos[t]) %>%
      dplyr::select(P_Return) %>%
      as.matrix() %>%
      tail(3)
    
    p_1 = three_M_past_return[3]
    p_2 = three_M_past_return[2]
    p_3 = three_M_past_return[1]
    
    if(temp_return[t,i]>0 & p_1>0 & p_2>0 & p_3>0){
      m_signal[t,i] = 1
    }
      else{
      m_signal[t,i] = 0
    }
  }
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
[1] 11
[1] 12
[1] 13
[1] 14
[1] 15
[1] 16
[1] 17
[1] 18
[1] 19
[1] 20
[1] 21
[1] 22
[1] 23
[1] 24
[1] 25
[1] 26
[1] 27
[1] 28
[1] 29
[1] 30
[1] 31
[1] 32
[1] 33
[1] 34
[1] 35
[1] 36
[1] 37
[1] 38
[1] 39
[1] 40
[1] 41
[1] 42
[1] 43
[1] 44
[1] 45
[1] 46
[1] 47
[1] 48
[1] 49
[1] 50
[1] 51
[1] 52
[1] 53
[1] 54
[1] 55
[1] 56
[1] 57
[1] 58
[1] 59
[1] 60
[1] 61
[1] 62
[1] 63
[1] 64
[1] 65
[1] 66
[1] 67
[1] 68
[1] 69
[1] 70
[1] 71
[1] 72
[1] 73
[1] 74
[1] 75
[1] 76
[1] 77
[1] 78
[1] 79
[1] 80
[1] 81
[1] 82
[1] 83
[1] 84
[1] 85
[1] 86
[1] 87
[1] 88
[1] 89
[1] 90
[1] 91
[1] 92
[1] 93
[1] 94
[1] 95
[1] 96
[1] 97
[1] 98
[1] 99
[1] 100
[1] 101
[1] 102
[1] 103
[1] 104
[1] 105
[1] 106
[1] 107
[1] 108
[1] 109
[1] 110
[1] 111
[1] 112
[1] 113
[1] 114
[1] 115
[1] 116
[1] 117
[1] 118
[1] 119
[1] 120
[1] 121
[1] 122
[1] 123
[1] 124
[1] 125
[1] 126
[1] 127
[1] 128
[1] 129
[1] 130
[1] 131
[1] 132
[1] 133
[1] 134
[1] 135
[1] 136
[1] 137
[1] 138
[1] 139
[1] 140
[1] 141
[1] 142
[1] 143
[1] 144
[1] 145
[1] 146
[1] 147
[1] 148
[1] 149
[1] 150
[1] 151
[1] 152
[1] 153
[1] 154
[1] 155
[1] 156
[1] 157
[1] 158
[1] 159
[1] 160
[1] 161
[1] 162
[1] 163
[1] 164
[1] 165
[1] 166
[1] 167
[1] 168
[1] 169
[1] 170
[1] 171
[1] 172
[1] 173
[1] 174
[1] 175
[1] 176
[1] 177
[1] 178
[1] 179
[1] 180
[1] 181
[1] 182
[1] 183
[1] 184
[1] 185
[1] 186
[1] 187
[1] 188
[1] 189
[1] 190
[1] 191
[1] 192
[1] 193
[1] 194
[1] 195
[1] 196
[1] 197
[1] 198
[1] 199
[1] 200
[1] 201
[1] 202
[1] 203
[1] 204
[1] 205
[1] 206
[1] 207
[1] 208
[1] 209

** with LASSO with one-momentum **

for (i in 1:length(tick)){
  print(i)
  for (t in 1:length(dates_oos)){
    one_M_past_return = data %>%
      filter(Tick == tick[i]) %>%
      filter(Date <= dates_oos[t]) %>%
      dplyr::select(P_Return) %>%
      as.matrix() %>%
      tail(1)
    if(temp_return[t,i]>0 & one_M_past_return>0){
      m1_signal[t,i] = 1
    }
      else{
      m1_signal[t,i] = 0
    }
  }
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
[1] 11
[1] 12
[1] 13
[1] 14
[1] 15
[1] 16
[1] 17
[1] 18
[1] 19
[1] 20
[1] 21
[1] 22
[1] 23
[1] 24
[1] 25
[1] 26
[1] 27
[1] 28
[1] 29
[1] 30
[1] 31
[1] 32
[1] 33
[1] 34
[1] 35
[1] 36
[1] 37
[1] 38
[1] 39
[1] 40
[1] 41
[1] 42
[1] 43
[1] 44
[1] 45
[1] 46
[1] 47
[1] 48
[1] 49
[1] 50
[1] 51
[1] 52
[1] 53
[1] 54
[1] 55
[1] 56
[1] 57
[1] 58
[1] 59
[1] 60
[1] 61
[1] 62
[1] 63
[1] 64
[1] 65
[1] 66
[1] 67
[1] 68
[1] 69
[1] 70
[1] 71
[1] 72
[1] 73
[1] 74
[1] 75
[1] 76
[1] 77
[1] 78
[1] 79
[1] 80
[1] 81
[1] 82
[1] 83
[1] 84
[1] 85
[1] 86
[1] 87
[1] 88
[1] 89
[1] 90
[1] 91
[1] 92
[1] 93
[1] 94
[1] 95
[1] 96
[1] 97
[1] 98
[1] 99
[1] 100
[1] 101
[1] 102
[1] 103
[1] 104
[1] 105
[1] 106
[1] 107
[1] 108
[1] 109
[1] 110
[1] 111
[1] 112
[1] 113
[1] 114
[1] 115
[1] 116
[1] 117
[1] 118
[1] 119
[1] 120
[1] 121
[1] 122
[1] 123
[1] 124
[1] 125
[1] 126
[1] 127
[1] 128
[1] 129
[1] 130
[1] 131
[1] 132
[1] 133
[1] 134
[1] 135
[1] 136
[1] 137
[1] 138
[1] 139
[1] 140
[1] 141
[1] 142
[1] 143
[1] 144
[1] 145
[1] 146
[1] 147
[1] 148
[1] 149
[1] 150
[1] 151
[1] 152
[1] 153
[1] 154
[1] 155
[1] 156
[1] 157
[1] 158
[1] 159
[1] 160
[1] 161
[1] 162
[1] 163
[1] 164
[1] 165
[1] 166
[1] 167
[1] 168
[1] 169
[1] 170
[1] 171
[1] 172
[1] 173
[1] 174
[1] 175
[1] 176
[1] 177
[1] 178
[1] 179
[1] 180
[1] 181
[1] 182
[1] 183
[1] 184
[1] 185
[1] 186
[1] 187
[1] 188
[1] 189
[1] 190
[1] 191
[1] 192
[1] 193
[1] 194
[1] 195
[1] 196
[1] 197
[1] 198
[1] 199
[1] 200
[1] 201
[1] 202
[1] 203
[1] 204
[1] 205
[1] 206
[1] 207
[1] 208
[1] 209
4.3.0.1.2 Result: Performance matrix for LASSO

4.3.0.2 K-Nearest Neighbour

This algorithm consider values of K-nearest data points to a particular observation in order to predict that observations. Therefore, the hyperparameter for this model is K. Note that, since we train the model for each stock, the model automatically tunes the best K-value for each stock’s model resulting in different K’s.

4.3.0.2.1 Setup functions for KNN

Signals

#percentile
P_sig <- function(dat,hist_dat,sig_date) { 
      #dat: one column of returns (predicted for numerical, actual for categorical)
    #hist_dat: historical data to create percentile
    #sig_date: date at which the signal is assigned
  sig <- matrix (0,nrow = length(dat),ncol = 1)
  for (i in 1:length(dat)) {
    P_dat <- hist_dat[1:(sig_date-2+i)]    #expanding data to create percentile
    if (dat[i] >= quantile(P_dat, .66)) {
        sig[i] = 1
    } else if (dat[i] >= quantile(P_dat, .33)) {
        if (i == 1) {
          sig[i] = 0
        } else if (sig[i-1] == 1) {
          sig[i] = 1
        } 
    } else {
        sig[i] = 0
    }
  }
  return(sig)
}

#For Momentum functions, the input is data frame with future returns as first column and current returns as second columns

#Monthly MoM
MoM.1_sig <- function(dat){
  sig <- matrix(0,nrow = nrow(dat),ncol = 1)
  for (i in 1:nrow(dat)) {
    if (dat[i,1] > dat[i,2]) {      #buy if predicted future return > current return
      sig[i] = 1
    } else {
      sig[i] = 0
    }
  }
  return(cbind(dat,sig))
}

# Quarterly MoM
MoM.3_sig <- function(dat){
  sig <- matrix(0,nrow = nrow(dat),ncol = 1)
  for (i in 4:nrow(dat)) {
    if ((dat[i-2,2] > dat[i-3,2]) & (dat[i-1,2] > dat[i-2,2]) & (dat[i,2] > dat[i-1,2]) &
    (dat[i,1] > dat[i,2])){ 
                #buy if predicted future return > current return
                #given that the past return had been on rising momentum for last 3 periods
      sig[i] = 1
    } else {
      sig[i] = 0
    }
  }
  result <- cbind(dat,sig)
  result <- result[4:(nrow(dat)),] #somehow this does not delete first 3 rows
  #result <- result[4:67,]
  return(result)
}

Cross-validation Unlike sparse portfolio method, we deal with time-seies data to predict future returns in this section. Hence, we need to set time-series cross-validation as time slice to maintain the structure in the data (oldest to newest). To be consistent with our out-of-sample date which separates training and test sets with approximately 70/30 ratio, we will set 36 and 12 observations for each training and validation examples when we fit the model.

#carat library for training
if(!require(caret)){install.packages('caret')}
library(caret)

#Training set starts from first date until date before separation date
timeSlices_train <- createTimeSlices(1:(length(dates_all)-length(dates_oos)), 
                   initialWindow = 36, horizon = 12, fixedWindow = TRUE)

Date variables

oos_start <- ((length(dates_all)-length(dates_oos))+1)
oos_length <- length(dates_all) - oos_start + 1
oos_ret <- as_tibble(returns) %>% 
  filter(Date >= sep_date) %>%    
  dplyr::select(-Date) %>%                                   
  as.matrix()    

Equally-weighted calculation

cal_port_weight <- function(sig_dat) {
  weight <- matrix(0, nrow= nrow(sig_dat),ncol = ncol(sig_dat))     # create weight matrix
  for (i in 1:nrow(sig_dat)) {                                      # assign weights
    buy_thisDay <- which(sig_dat[i,] == 1)
    for (j in 1:length(buy_thisDay)) {
      weight[i,buy_thisDay[j]] <- 1/length(buy_thisDay)
    }
  }
 return (weight)
}
4.3.0.2.2 Train KNN with percentile
weight_knn_P = read.csv(file = "weight_knn_P")
cannot open file 'weight_knn_P': No such file or directoryError in file(file, "rt") : cannot open the connection
4.3.0.2.3 Train KNN with percentile & sign
# signals_knn_P.s <- matrix(0, nrow = oos_length, ncol = length(tick)) #create table to store signals
# pred_knn_P.s <- matrix(0, nrow = oos_length, ncol = length(tick))
# colnames(signals_knn_P.s) <- tick
# colnames(pred_knn_P.s) <- tick
# for (i in 1:length(tick)){
#   a <- selectFeature(tick[i])
#   a_pred <- train(F_Return ~ ., data = a[unlist(timeSlices_train),], 
#                   method = "knn", 
#                   preProc = c("center", "scale"))
#   a_actual.future_test = a[oos_start:length(dates_all),]
#   a_pred.future_test <- predict(a_pred, a_actual.future_test)     # get prediction
#   a_signal <- P_sig(a_pred.future_test,as.matrix(returns[tick[i]]),oos_start)# transfrom into signals
#   pred_knn_P.s[, i] <- a_pred.future_test
#   for (j in 1: length(a_signal)){
#     if ((a_signal[j] == 1) & (a_pred.future_test[j] > 0)) {
#       signals_knn_P.s[j, i] = 1
#     } else {
#       signals_knn_P.s[j, i] = 0
#     }
#   }
# }
# weight_knn_P.s <- cal_port_weight(signals_knn_P.s)
# ret_knn_P.s <- rowSums(weight_knn_P.s*oos_ret)
# write.csv(weight_knn_P.s, file = "weight_knn_P_s.csv")
# write.csv(ret_knn_P.s, file = "ret_knn_P_s.csv")

weight_knn_P.s = read.csv(file = "weight_knn_P_s.csv")
weight_knn_P.s = weight_knn_P.s[-1]

ret_knn_P.s = read.csv(file = "ret_knn_P_s.csv")
ret_knn_P.s = ret_knn_P.s[-1]
4.3.0.2.4 Train KNN with 1-period momentum
weight_knn_MoM.1 <- cal_port_weight(signals_MoM.1) #addadd. change name of sig
Error in nrow(sig_dat) : object 'signals_MoM.1' not found
4.3.0.2.5 Train KNN with 1-period momentum & sign
# signals_knn_MoM.1.s <- matrix(0, nrow = oos_length, ncol = length(tick)) #create table to store signals
# pred_knn_MoM.1.s <- matrix(0, nrow = oos_length, ncol = length(tick))
# colnames(signals_knn_MoM.1.s) <- tick
# colnames(pred_knn_MoM.1.s) <- tick
# 
# for (i in 1:length(tick)){
#   a <- selectFeature(tick[i])
#   a_pred <- train(F_Return ~ ., data = a[unlist(timeSlices_train),], 
#                   method = "knn", 
#                   preProc = c("center", "scale"))
#   a_actual.future_test = a[oos_start:length(dates_all),]
#   a_pred.future_test <- predict(a_pred, a_actual.future_test)     # get prediction
#   a_actual.current_test <- a$F_Return[(oos_start-1):(length(dates_all)-1)]
#   a_table <- cbind(a_pred.future_test,a_actual.current_test)
#   a_signal <- MoM.1_sig(a_table)                # transfrom into signals
#   pred_knn_MoM.1.s[, i] <- a_pred.future_test
#   for (j in 1: nrow(a_signal)){
#     if ((a_signal[j,3] == 1) & (a_pred.future_test[j] > 0)) {
#       signals_knn_MoM.1.s[j, i] = 1
#     } else {
#       signals_knn_MoM.1.s[j, i] = 0
#     }
#   }
# }
# weight_knn_MoM.1.s <- cal_port_weight(signals_knn_MoM.1.s) 
# ret_knn_MoM.1.s <- rowSums(weight_knn_MoM.1.s*oos_ret)
# 
# write.csv(weight_knn_MoM.1.s, file = "weight_knn_MoM1s.csv")
# write.csv(ret_knn_MoM.1.s, file = "ret_knn_MoM1s.csv")

weight_knn_MoM.1.s = read.csv(file = "weight_knn_MoM1s.csv")
weight_knn_MoM.1.s = weight_knn_MoM.1.s[-1]

ret_knn_MoM.1.s = read.csv(file = "ret_knn_MoM1s.csv")
ret_knn_MoM.1.s = ret_knn_MoM.1.s[-1]
4.3.0.2.6 Train KNN with 3-period momentum
# signals_knn_MoM.3 <- matrix(0, nrow = oos_length, ncol = length(tick)) #create table to store signals
# colnames(signals_knn_MoM.3) <- tick
# for (i in 1:length(tick)){
#   a <- selectFeature(tick[i])
#   a_pred <- train(F_Return ~ ., data = a[unlist(timeSlices_train),], 
#                   method = "knn", 
#                   preProc = c("center", "scale"))
#   a_actual.future_test = a[oos_start:length(dates_all),]
#   pred <- predict(a_pred, a_actual.future_test)     # get prediction
#   a_pred.future_test <- c(0,0,0,pred)           # make lenght longer for previous actual returns
#   a_actual.current_test <- a$F_Return[(oos_start-3):(length(dates_all))]
#   a_table <- cbind(a_pred.future_test,a_actual.current_test)
#   a_signal <- MoM.3_sig(a_table)                # transfrom into signals
#   
#   signals_knn_MoM.3[,i]  <- a_signal[,3]          # combine signals for all stocks
# }
# weight_knn_MoM.3 <- cal_port_weight(signals_knn_MoM.3)
# ret_knn_MoM.3 <-rowSums(weight_knn_MoM.3*oos_ret)
# 
# write.csv(weight_knn_MoM.3, file = "weight_knn_MoM3.csv")
# write.csv(ret_knn_MoM.3, file = "ret_knn_MoM3.csv")

weight_knn_MoM.3 = read.csv(file = "weight_knn_MoM3.csv")
weight_knn_MoM.3 = weight_knn_MoM.3[-1]

ret_knn_MoM.3 = read.csv(file = "ret_knn_MoM3.csv")
ret_knn_MoM.3 = ret_knn_MoM.3[-1]
4.3.0.2.7 Train KNN with 3-period momentum & sign
# signals_knn_MoM.3.s <- matrix(0, nrow = oos_length, ncol = length(tick)) #create table to store signals
# colnames(signals_knn_MoM.3.s) <- tick
# pred_knn_MoM.3.s <- matrix(0, nrow = oos_length, ncol = length(tick)) 
# colnames(pred_knn_MoM.3.s) <- tick
# 
# for (i in 1:length(tick)){
#   a <- selectFeature(tick[i])
#   a_pred <- train(F_Return ~ ., data = a[unlist(timeSlices_train),], 
#                   method = "knn", 
#                   preProc = c("center", "scale"))
#   a_actual.future_test = a[oos_start:length(dates_all),]
#   pred <- predict(a_pred, a_actual.future_test)     # get prediction
#   a_pred.future_test <- c(0,0,0,pred)           # make lenght longer for previous actual returns
#   a_actual.current_test <- a$F_Return[(oos_start-3):(length(dates_all))]
#   a_table <- cbind(a_pred.future_test,a_actual.current_test)
#   a_signal <- MoM.3_sig(a_table)                # transfrom into signals
#   
#   pred_knn_MoM.3.s[, i] <- a_pred.future_test[4:length(a_pred.future_test)]
#   for (j in 1: nrow(a_signal)){
#     if ((a_signal[j, 3] == 1) & (a_pred.future_test[j] > 0)) {
#       signals_knn_MoM.3.s[j, i] = 1
#     } else {
#       signals_knn_MoM.3.s[j, i] = 0
#     }
#   }
# }
# weight_knn_MoM.3.s <- cal_port_weight(signals_knn_MoM.3.s)
# ret_knn_MoM.3.s <-rowSums(weight_knn_MoM.3.s*oos_ret)
# 
# write.csv(weight_knn_MoM.3.s, file = "weight_knn_MoM3s.csv")
# write.csv(ret_knn_MoM.3.s, file = "ret_knn_MoM3s.csv")

weight_knn_MoM.3.s = read.csv(file = "weight_knn_MoM3s.csv")
weight_knn_MoM.3.s = weight_knn_MoM.3.s[-1]

ret_knn_MoM.3.s = read.csv(file = "ret_knn_MoM3s.csv")
ret_knn_MoM.3.s = ret_knn_MoM.3.s[-1]
4.3.0.2.8 Result: Peformance Metrics for KNN (Numerical)

4.3.0.3 Random Forest

4.3.0.3.1 Growing a Portfolio

Random Forest is another common machine learning techniques which developed from a decision tree. It works well with both classification and regression problems. It is a powerful technique due to its highly non-linear model. However there is a trade-off by having many hyperparameters namely mtry (number of predictors sampled for spliting at each node.) and ntree (number of tree in the forest). In this case, signal criteria (percentile of momentum) is also treat as another hyperparameter since different algorithms yield different portfolio weight.

Regression random forest is used in this setting, where the user define a value for hyperparameter. (Impact of hyperparameter is discussed in the next section). For the Random Forest portfolio, the weight for each asset at each time period will be determined by the following procedures:

  1. At each date, train Random Forest model by using past data. (Note that different assets have different model due to feature selection process for each asset)
  2. Use the model to get a prediction for future return (F_Return)
  3. Derive investment decision for each asset by applying signal criteria (either percentile or momentum)to the prediction
  4. Normalise portfolio weight using equally weighted method

Then the strategy will be evaluated on its performance metrics based on the portfolio weight and return.

First, weight_RF function is defined to calculated portfolio weight using Random Forest technique. The function receives past and current data, mtry, ntree, and s_criteria (s_criteria) as parameters and return the buying decision as output.

if(!require(randomForest)){install.packages("randomForest")}
library(randomForest)
weight_RF <- function(past_data, current_data, mtry, ntree, s_criteria, prev_w){ # Function for weight calculation using random forest technique
      train <- past_data[-c(1)] #Training data, drop date columns
      test <- current_data[-c(1,2)] #Testing data, drop date and F_Return columns
      rf.fit <- randomForest(F_Return ~ ., # Grow random forest
                         data = train, # Training data
                         mtry = min(mtry,length(test)), # Number of predictive variables used is 4 (or less than if there is less variable)
                         ntree = ntree) # Number of random trees in the forest
      pred <- predict(rf.fit, test) # Predict F_Return using the fitted model
      if (s_criteria == 1){ # percentile signal criteria
        return (ifelse(pred < quantile(past_data$F_Return, c(.33)), 0, ifelse(pred < quantile(past_data$F_Return, c(.66)), prev_w, 1)))
      } else if (s_criteria == 2){ # momentum signal criteria
        return (ifelse(pred > 0, (ifelse(past_data[nrow(past_data)-1,]$F_Return > 0 & 
                                           past_data[nrow(past_data)-2,]$F_Return > 0 & 
                                           past_data[nrow(past_data)-3,]$F_Return > 0, 1, 0)), 0))
      } else { # just invest if return is positive
        return (ifelse(pred > 0, 1, 0)) # Invest if return > 0                                                                                  
      }
}

Function doRF performs backtesting evaluation by creating a portfolio by using Random Forest technique, given a specific set of hyperparameter values. The funcion returns a performance metric as a result.

doRF <- function(mtry = 4, ntree = 10, s_criteria = 0){
  portf_weight_RF <- matrix(0, nrow = length(dates_oos)-1, ncol = length(tick)) # Initiate weight matrix for RF portfolio
  portf_return_RF <- c() # # Initiate return list for RF portfolio
  prev_w <- 0
  for (i in 1:length(tick)){ # Repeat for all ticker
    temp <- selectFeature(tick[i]) # Feature selection for ticker i 
    w <- c() # Initiate an empty list of weight for ticker i
    for (t in 2:(length(dates_oos))){ # Expandind window for out-of-sample dates
      past_data <- temp %>% filter(Date < dates_oos[t-1]) # Past data is all data from the date before date t
      current_data <- temp %>% filter(Date == dates_oos[t-1]) # Current data is data at date t
      res <- weight_RF(past_data = past_data, current_data = current_data, mtry = mtry, ntree = ntree, s_criteria = s_criteria, prev_w = prev_w) # Get weight using random forest method with specific parameter
      w = rbind(w, res) # Append date for stock i at period t
      prev_w <- w
    }
    portf_weight_RF[,i] <- w # Append weight for stock i for all period t
  }
  portf_weight_RF <- getEW(portf_weight_RF) # Normalise weight (Equally weighted)
  for(j in 1:(length(dates_oos)-1)){ 
    portf_return_RF[j] <- sum(portf_weight_RF[j,] * returns_oos[j,]) # Calculate portfolio return
  }
  return (perf_met(portf_return_RF, portf_weight_RF, asset_return)) # Get performance metrics
}

Next, create a portfolio based on Random Forest strategy and evaluate the performance

doRF()
4.3.0.3.2 Sensitivity Analysis for Hyperparameters - Using the GridSearch

Performance could be improved by using the right values of hyperparameter. During the trading period, cross-validation technique could be used to tune hyperparameter. Analysis below demonstrate how much does hypermeters impact the performance of the portfolio by applying GridSearch technique. GridSearch can be performed with the following step:

  1. Set up range of values to be tested for each hyperparameter
  2. Set up list of all possible combinations for all hyperparameters
  3. Evaluate the model on all settings from step 2

This operation could significantly consume a huge runtime. Therefore RandomSearch can also be used to reduce runtime. The concept of RandomSearch is to initially randomly set a list of hyperparameters to be tested. Then the set that gives the best performance will be used to tuned the hyperparameters once again by doing GridSearch around those points.

Below, GridSearch is performed on selected combinations of hyperparameters and the resulting average return for each setting is visualized.

mtry <- c(6, 7, 8, 9, 10) # List of mtry to be tested
ntree <- c(10, 20, 30, 40, 50) # List of ntree to be tested
s_criteria <- c(0, 1, 2) # List of signal criteria to be tested
pars_RF <- expand.grid(mtry, ntree, s_criteria)  # Exploring all combinations
mtry <- pars_RF[,1]
ntree <- pars_RF[,2]
s_criteria <- pars_RF[,3]
grid_pars_RF <- function(mtry, ntree, s_criteria){  # Parameters for the grid search
  print(paste('mtry: ', toString(mtry), ' ntree: ', toString(ntree), ' s_criteria: ', toString(s_criteria)))
  temp <- doRF(mtry = mtry, ntree = ntree, s_criteria = s_criteria)
  return (temp$avg_ret) # Return average return
}
# Load result which has already been computed to reduce runtime 
grd <- read.csv('grd_RF.csv', header=TRUE, sep=",")
# If haven't computed before, uncomment the following block
#------------------------------------------------------------------------------#
# grd <- pmap(list(mtry, ntree, s_criteria), # Parameters for the grid search
#             grid_pars_RF) # Input for function: test labels 
# grd <- data.frame(mtry, ntree, s_criteria, avg_ret = unlist(grd))
# write.csv(grd, file = "grd_RF.csv", row.names=FALSE) # Keep as csv file on local
#------------------------------------------------------------------------------#
grd$s_criteria <- ifelse(grd$s_criteria == 0, "none", ifelse(grd$s_criteria == 1, "percentile", "momentum"))
grd %>% ggplot(aes(x = s_criteria, y = avg_ret, fill = s_criteria)) + # Plot!
    geom_bar(stat = "identity") +
    facet_grid(rows = vars(mtry), cols = vars(ntree))
Error in facet_grid(rows = vars(mtry), cols = vars(ntree)) : 
  unused arguments (rows = vars(mtry), cols = vars(ntree))
4.3.0.3.3 Result: Peformance Metrics for Random Forest

Obviously, various combinations of hyparameters impact average return of the portfolio signigicantly. The best set of hyperparameter are: mtry = 9, ntree = 20, and percentile signal criteria. Full performance metrics for this setting is shown below.

doRF(mtry = 9, ntree = 20, s_criteria = 1)

4.3.1 Categorical

Classification models are used when dependent variables are categorical, for example; when they are binary. Hence, we need to transform actual outputs into categories, then use them to fit the models. We will follow the same categories in the previous section, namely percentile and momemtum.

4.3.1.1 QDA (Quadratic Discriminant Analysis)

4.3.1.1.1 QDA-Driven Portolio

QDA or quadratic discriminate analysis is one of the famous machine learning technique for classification prediction which requires no hyperparameter. Applying QDA to this scenario, the response variable or future return (F_Return) must be binarised into 1 for positive and 0 for negative return. Therefore the prediction from QDA model is a probability of having a positive return in the next period. For QDA portfolio, the weight for each asset at each time period will be determined by the following procedures:

  1. At each date, train QDA model by using past data. (Note that different assets have different model due to feature selection process for each asset)
  2. Use the model to get a prediction for future return (F_Return)
  3. Derive investment decision for each asset by applying momentum signal criteria to the prediction
  4. Normalise portfolio weight using equally weighted method

Then the strategy will be evaluated on its performance metrics based on the portfolio weight and return.

First, weight_QDA function is defined to calculated portfolio weight using QDA technique. The function receives past and current data and decision threshold as parameters. Decision threshold is a choosen value between 0 and 1 which is used to change probability of having a positive future return into a buying decision. (If the probability is higher than the threshold, then buy an asset) Lastly, function returns either 0 not to invest or 1 to invest.

if(!require(MASS)){install.packages('MASS')}
library(MASS)
weight_QDA <- function(past_data, current_data, threshold = 0.5){ # If not passing threshould then set to 0.5
  y <- past_data$F_Return # Set past data F_Return as a response variable
  y <- ifelse(y > 0 , 1, 0) # Binarise response variable (set to 1 if the return is positive and 0 o.w.)
  x <- as.matrix(past_data[-c(1,2)]) #Remove Date and F_Return from explanatory variables (training data)
  new_x <- current_data[-c(1,2)] #Remove Date and F_Return from explanatory variables (testing data)
  qda.fit <- qda(x = x, grouping = y) # Train QDA model
  pred <- predict(qda.fit, new_x) # Apply model on testing data
  if (pred$posterior[2] > threshold){ # If probability of getting a positive return is greater than threshold
    return (ifelse(y[length(y)-1] > 0 & y[length(y)-2] > 0 & y[length(y)-3] > 0, 1, 0)) # Invest if in a good momentum 
  } else { # Else
    return (0) # Do not invest...
  }
}

Function doQDA performs backtesting evaluation by creating a portfolio by using QDA technique, given a specific decision threshold. The funcion return a performance metric as a result.

doQDA <- function(threshold = 0.5){
  portf_weight_QDA <- matrix(0, nrow = length(dates_oos)-1, ncol = length(tick)) # Initiate weight matrix for QDA portfolio
  portf_return_QDA <- c() # # Initiate return list for QDA portfolio
  for (i in 1:length(tick)){ # Repeat for all ticker
    temp <- selectFeature(ticker = tick[i]) # Feature selection for ticker i 
    w = c() # Initiate an empty list of weight for ticker i
    for (t in 2:(length(dates_oos))){ # Expandind window for out-of-sample dates
      past_data <- temp %>% filter(Date < dates_oos[t-1]) # Past data is all data from the date before date t
      current_data <- temp %>% filter(Date == dates_oos[t-1]) # Current data is data at date t
      res <- weight_QDA(past_data = past_data, current_data = current_data, threshold = threshold) # Get weight using QDA method with specific                                                                                                       threshold
      w = rbind(w, res) # Append date for stock i at period t
    }
    portf_weight_QDA[,i] <- w # Append weight for stock i for all period t
  }
  portf_weight_QDA <- getEW(portf_weight_QDA) # Normalise weight (Equally weighted)
  for(j in 1:(length(dates_oos)-1)){ 
    portf_return_QDA[j] <- sum(portf_weight_QDA[j,] * returns_oos[j,]) # Calculate portfolio return
  }
  return (perf_met(portf_return_QDA, portf_weight_QDA, asset_return)) # Get performance metrics
}

Next, create a portfolio based on QDA strategy and evaluate the performance

doQDA()
4.3.1.1.2 Impact of Decision Threshold

In practice, decision threshold need to be set along trading days. For those investors who strongly believe the QDA model, they can have a low threshold which interpret a low probability of having a positive return as a buy signal. Conversely, investors that do not fully trust the model can set a high threshold to only buy an asset when the probability is very high. If the decision threshold is not set, QDA default value is 0.5 or known as predicting a class with higher probability

However, decision threshold can be set to maximize the portfolio performance. Below, decision threshold is varied from 0.1 to 0.9 and the performance metrics are plotted to illustrate the impact of decision threshold. However, in practice, threshold below 0.5 should not be choosen since there is higher probabilty of having a negative return.

if(!require(ggplot2)){install.packages('ggplot2')}
if(!require(reshape2)){install.packages('reshape2')}
library(ggplot2)
library(reshape2)
perf_met_QDA <- data.frame(matrix(ncol = 6, nrow = 0)) # Initiate dataframe for performance metric of QDA with different parameter values
colnames(perf_met_QDA) <- c('threshold', 'avg_ret', 'vol', 'Sharpe_ratio', 'VaR_5', 'turn') # Set column names
# Load result which has already been computed to reduce runtime
perf_met_QDA <- read.csv('QDA_01.csv', header=TRUE, sep=",")
# If haven't computed before, uncomment the following block
#------------------------------------------------------------------------------#
# for (threshold in seq(0.1, 0.9, 0.1)){ # Loop through set of threshold value (Hyperparameter tuning)
#   perf_met_QDA <- rbind(perf_met_QDA,  cbind(threshold, t(unlist(doQDA(threshold = threshold))))) # Train/test model and store perf. met.
# }
# write.csv(perf_met_QDA, file = "QDA_01.csv", row.names=FALSE) # Keep as csv file on local
#------------------------------------------------------------------------------#
plot_data <- melt(perf_met_QDA  ,  id.vars = 'threshold', variable.name = 'series') # Plot performance evaluation for different value of hy.par.
ggplot(plot_data, aes(threshold, value)) + 
  geom_line(colours = "blue") + 
  facet_grid(series ~ ., scale = "free_y") +
  scale_x_continuous(breaks = seq(0.1, 0.9, 0.1))

Obviously, the optimal decision threshold which provide the best performance for most metrics is about 0.6. It yields the higer average return, sharpe ratio, and value-at-risk with lower volatility. Optimal decision threshold could be search again at values around 0.6.

4.3.1.1.3 Result: Peformance Metrics for QDA
perf_met_QDA <- data.frame(matrix(ncol = 6, nrow = 0)) # Initiate dataframe for performance metric of QDA with different parameter values
colnames(perf_met_QDA) <- c('threshold', 'avg_ret', 'vol', 'Sharpe_ratio', 'VaR_5', 'turn') # Set column names
# Load result which has already been computed to reduce runtime 
perf_met_QDA <- read.csv('QDA_02.csv', header=TRUE, sep=",")
# If haven't computed before, uncomment the following block
#------------------------------------------------------------------------------#
# for (threshold in seq(0.6, 0.7, 0.01)){ # Loop through set of threshold value (Hyperparameter tuning)
#   perf_met_QDA <- rbind(perf_met_QDA,  cbind(threshold, t(unlist(doQDA(threshold = threshold))))) # Train/test model and store perf. met.
#}
# write.csv(perf_met_QDA, file = "QDA_02.csv", row.names=FALSE) # Keep as csv file on local
#------------------------------------------------------------------------------#
plot_data <- melt(perf_met_QDA  ,  id.vars = 'threshold', variable.name = 'series') # Plot performance evaluation for different value of hy.par.
ggplot(plot_data, aes(threshold, value)) + 
  geom_line() + 
  facet_grid(series ~., scale = "free_y") +
  scale_x_continuous(breaks = seq(0.6, 0.7, 0.01))

In conclusion, optimal decision threshold is choosen at 0.65 which yields the best of most performance metrics. Below is the full performance metrics at optimal decision threshold.

perf_met_QDA_opt <- perf_met_QDA %>% # Get performance metric for threshold = 0.6
  filter(threshold == 0.65) %>%
  dplyr::select(-(threshold))
perf_met_QDA_opt <- cbind("method" = "QDA", perf_met_QDA_opt) # Add method name
perf_met_QDA_opt

Important: Once again, decision threshold need to be set along trading days. A good rule of thumb is to set it at 0.5 or adjust it to suit the investor’s confident on the model.

4.3.1.2 K-Nearest Neighbour

With KNN method, we factorised the future returns using 3 criteria: percentile and momentum with 1 and 3 periods, then the model is trained with the features selected by LASSO algorithm. Lastly, equal weight is applied to the stocks with ‘buy’ signals.

Nonetheless, there is a lost of information when strategies portfolio this way.

  1. For percentile categories, we need to sacrifice some first chunk of past data to create the percentile. Here, we start computing the signals after 36 first periods. (dataset = day 37-229)
outputs_P <- matrix(0, nrow = length(dates_all)-36, ncol = length(tick))
colnames(outputs_P) <- tick
for (i in 1:length(tick)) {
   class_P <- P_sig(as.matrix(returns[tick[i]][37:length(dates_all),]),
                    as.matrix(returns[tick[i]]), 37)
   outputs_P[,i] = class_P
}
  1. We will lose the last observation for 1 momentum critaria as there is no future returns for the last period. (dataset = day 2-229)
outputs_MoM.1 <- matrix(0, nrow = length(dates_all)-1, ncol = length(tick))
colnames(outputs_MoM.1) <- tick
for (i in 1:length(tick)) {
  a_table <- cbind(as.matrix(returns[tick[i]][2:(length(dates_all)),]),
                  as.matrix(returns[tick[i]][1:(length(dates_all)-1),]))
  class_MoM.1 <- MoM.1_sig(a_table)
  outputs_MoM.1[,i] = class_MoM.1[,3]
}
  1. For 3 momentum periods, it costs 7 observations. We lose the first 4 to consider previous momemtum for the first case and the last 3 for not having future momemtum. (dataset = day 5-226)
outputs_MoM.3 <- matrix(0, nrow = (length(dates_all)-7), ncol = length(tick) )
colnames(outputs_MoM.3) <- tick
for (i in 1:length(tick)) {
  zeros <- matrix(c(0,0,0), nrow = 3)
  F_vector <- rbind(zeros,as.matrix(returns[tick[i]][5:(length(dates_all)-3),]))
  P_vector <- as.matrix(returns[tick[i]][1:(length(dates_all)-4),])
  #P_vector <- P_vector[1:(nrow(P_vector)-1),]
  a_table <- cbind(F_vector, P_vector)
  class_MoM.3 <- MoM.3_sig(a_table)
  outputs_MoM.3[,i] = class_MoM.3[,3]
}

Therefore, for consistency, we reduce all dataset for categorical outputs to all start on day 37 and end on day 226.

outputs_P <- outputs_P[1:190,]
outputs_MoM.1 <- outputs_MoM.1[36:225,]
outputs_MoM.3 <- outputs_MoM.3[33:nrow(outputs_MoM.3),]
oos_start_cat <- ((length(dates_all)-length(dates_oos))+1) - 37 
oos_length_cat <- 190 - oos_start_cat + 1
oos_ret_cat <- as_tibble(returns) %>% 
  slice(oos_start_cat:190) %>%
  dplyr::select(-Date) %>%                                   
  as.matrix()    

#due to trimming of dataframe for categorical outputs, the end date 
timeSlices_cat_train <- createTimeSlices(1:oos_start_cat-1, 
                   initialWindow = 36, horizon = 12, fixedWindow = TRUE)
4.3.1.2.1 Train KNN with percentile
for (i in 1:length(tick)){
  a <- selectFeature2(tick[i],outputs_P) %>%
    mutate(output = as.factor(output))          
  a_pred <- train(output ~ ., data = a[unlist(timeSlices_train),], 
                  method = "knn", 
                  preProc = c("center", "scale"))
  a_actual.future_test = a[oos_start_cat:nrow(a),] %>%
    mutate(output = as.factor(output))          
  a_pred.future_test <- predict(a_pred, a_actual.future_test)       # get prediction
  signals_knn_P_cat[,i]  <- as.numeric(as.character(a_pred.future_test))   
  # combine signals for all stocks
  print(i)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
[1] 11
[1] 12
[1] 13
[1] 14
[1] 15
[1] 16
[1] 17
[1] 18
[1] 19
[1] 20
[1] 21
[1] 22
[1] 23
[1] 24
[1] 25
[1] 26
[1] 27
[1] 28
[1] 29
[1] 30
[1] 31
[1] 32
[1] 33
[1] 34
[1] 35
[1] 36
[1] 37
[1] 38
[1] 39
[1] 40
[1] 41
[1] 42
[1] 43
[1] 44
[1] 45
[1] 46
[1] 47
[1] 48
[1] 49
[1] 50
[1] 51
[1] 52
[1] 53
[1] 54
[1] 55
[1] 56
[1] 57
[1] 58
[1] 59
[1] 60
[1] 61
[1] 62
[1] 63
[1] 64
[1] 65
[1] 66
[1] 67
[1] 68
[1] 69
[1] 70
[1] 71
[1] 72
[1] 73
[1] 74
[1] 75
[1] 76
[1] 77
[1] 78
[1] 79
[1] 80
[1] 81
[1] 82
[1] 83
[1] 84
[1] 85
[1] 86
[1] 87
[1] 88
[1] 89
[1] 90
[1] 91
[1] 92
[1] 93
[1] 94
[1] 95
[1] 96
[1] 97
[1] 98
[1] 99
[1] 100
[1] 101
[1] 102
[1] 103
[1] 104
[1] 105
[1] 106
[1] 107
[1] 108
[1] 109
[1] 110
[1] 111
[1] 112
[1] 113
[1] 114
[1] 115
[1] 116
[1] 117
[1] 118
[1] 119
[1] 120
[1] 121
[1] 122
[1] 123
[1] 124
[1] 125
[1] 126
[1] 127
[1] 128
[1] 129
[1] 130
[1] 131
[1] 132
[1] 133
[1] 134
[1] 135
[1] 136
[1] 137
[1] 138
[1] 139
[1] 140
[1] 141
[1] 142
[1] 143
[1] 144
[1] 145
[1] 146
[1] 147
[1] 148
[1] 149
[1] 150
[1] 151
[1] 152
[1] 153
[1] 154
[1] 155
[1] 156
[1] 157
[1] 158
[1] 159
[1] 160
[1] 161
[1] 162
[1] 163
[1] 164
[1] 165
[1] 166
[1] 167
[1] 168
[1] 169
These variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_Cap
[1] 170
[1] 171
[1] 172
[1] 173
[1] 174
[1] 175
[1] 176
[1] 177
[1] 178
[1] 179
[1] 180
[1] 181
[1] 182
[1] 183
[1] 184
These variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yieldThese variables have zero variances: Div_yield
[1] 185
[1] 186
[1] 187
[1] 188
[1] 189
[1] 190
[1] 191
[1] 192
[1] 193
[1] 194
[1] 195
[1] 196
[1] 197
[1] 198
[1] 199
[1] 200
[1] 201
[1] 202
[1] 203
[1] 204
[1] 205
[1] 206
[1] 207
[1] 208
[1] 209
4.3.1.2.2 Train KNN with 1-period momentum
for (i in 1:length(tick)){
  a <- selectFeature2(tick[i], outputs_MoM.1) %>%
    mutate(output = as.factor(output))          
  a_pred <- train(output ~ ., data = a[unlist(timeSlices_train),], 
                  method = "knn", 
                  preProc = c("center", "scale"))
  a_actual.future_test = a[oos_start_cat:nrow(a),] %>%
    mutate(output = as.factor(output))          
  a_pred.future_test <- predict(a_pred, a_actual.future_test)       # get prediction
  
  signals_knn_MoM.1_cat[,i]  <- as.numeric(as.character(a_pred.future_test))   
    # combine signals for all stocks
  print(i)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
[1] 11
[1] 12
[1] 13
[1] 14
[1] 15
[1] 16
[1] 17
[1] 18
[1] 19
[1] 20
[1] 21
[1] 22
[1] 23
[1] 24
[1] 25
[1] 26
[1] 27
[1] 28
[1] 29
[1] 30
[1] 31
[1] 32
[1] 33
[1] 34
[1] 35
[1] 36
[1] 37
[1] 38
[1] 39
[1] 40
[1] 41
[1] 42
[1] 43
[1] 44
[1] 45
[1] 46
[1] 47
[1] 48
[1] 49
[1] 50
[1] 51
[1] 52
[1] 53
[1] 54
[1] 55
[1] 56
[1] 57
[1] 58
[1] 59
[1] 60
[1] 61
[1] 62
[1] 63
[1] 64
[1] 65
[1] 66
[1] 67
[1] 68
[1] 69
[1] 70
[1] 71
[1] 72
[1] 73
[1] 74
[1] 75
[1] 76
[1] 77
[1] 78
[1] 79
[1] 80
[1] 81
[1] 82
[1] 83
[1] 84
[1] 85
[1] 86
[1] 87
[1] 88
[1] 89
[1] 90
[1] 91
[1] 92
[1] 93
[1] 94
[1] 95
[1] 96
[1] 97
[1] 98
[1] 99
[1] 100
[1] 101
[1] 102
[1] 103
[1] 104
[1] 105
[1] 106
[1] 107
[1] 108
[1] 109
[1] 110
[1] 111
[1] 112
[1] 113
[1] 114
[1] 115
[1] 116
[1] 117
[1] 118
[1] 119
[1] 120
[1] 121
[1] 122
[1] 123
[1] 124
[1] 125
[1] 126
[1] 127
[1] 128
[1] 129
[1] 130
[1] 131
[1] 132
[1] 133
[1] 134
[1] 135
[1] 136
[1] 137
[1] 138
[1] 139
[1] 140
[1] 141
[1] 142
[1] 143
[1] 144
[1] 145
[1] 146
[1] 147
[1] 148
[1] 149
[1] 150
[1] 151
[1] 152
[1] 153
[1] 154
[1] 155
[1] 156
[1] 157
[1] 158
[1] 159
[1] 160
[1] 161
[1] 162
[1] 163
[1] 164
[1] 165
[1] 166
[1] 167
[1] 168
[1] 169
These variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_CapThese variables have zero variances: Ret_Cap
[1] 170
[1] 171
[1] 172
[1] 173
[1] 174
[1] 175
[1] 176
[1] 177
[1] 178
[1] 179
[1] 180
[1] 181
[1] 182
[1] 183
[1] 184
[1] 185
[1] 186
[1] 187
[1] 188
[1] 189
[1] 190
[1] 191
[1] 192
[1] 193
[1] 194
[1] 195
[1] 196
[1] 197
[1] 198
[1] 199
[1] 200
[1] 201
[1] 202
[1] 203
[1] 204
[1] 205
[1] 206
[1] 207
[1] 208
[1] 209
4.3.1.2.3 Result: Peformance Metrics for KNN (Categorical)

4.3.1.3 Logit Model

This model transform the outputs into probability. Therefore, the outputs are capped between 0 and 1. However, it uses non-normal distribution to transform the output. It consumes less power to perform without the normal distribution constraint, but it costs fatter tails. There is no hyperparameter in this model.

4.3.1.3.1 Train logit model with percentile
4.3.1.3.2 Train logit model with 1-period momentum

Since the condition is rarely meet, some test folds contain information missing in training folds. For this reason, we will skip this method with the assumption that the signals represent a lack in evidence of consistent increasing momentum over 3 periods. In other word, it can be interpreted that 3-period past information is irrelevant in determining significant signals.

For example, considering apple stock autocorrelation below. The figure shows that past information is barely indicative, except only for lag 15-period observation which might be too further away from the current time to have an economic meaning.

4.3.1.3.3 Result: Performance Metrics for Logit Model

4.3.1.4 Probit Model

Similar to logit model, this model transform also the outputs into probability, which cap outputs between 0 and 1. On a contrary to logit model, it uses normal distribution to transform the output.

4.3.1.4.1 Train probit model with percentile
4.3.1.4.2 Train probit model with 1-period momentum

addadd –> for the same reason, no mom3 for probit

4.3.1.4.3 Result: Performance Metrics for Probit Model

5 Analysis

To assess the performance of each machine learning technique, we focus on the aspect of portfolio. Accordingly, we score each aspect of performance metrics from 1 to 18 for 18 methods, where 1 is the best in that category and 18 is the worst. Thus, we call this score ‘rank’. Namely, rank 1 for average return, sharpe ratio and VaR are the highest, while rank 1 for volatility and turnover are the lowest.

Table 1: Summary table for performance metrics & rank

Text
Text

From the table, our benchmark portfolio is quite high in the rank at #5. The all ML techniques which outperforms benchmark are categorical dependent variable with signals, except sparse portfolio which is #1. Thus, clearly, sparse portfolio outperforms signal models.

Within the signal models, categorical method outshines the numerical. Consider KNN, which is used for both categorical and numerical outputs. It performs poorly in the numerical outputs in signal models, as seen in the bottom ranks. In contrast, it gives top-ten ranks for categorical one. This might be because KNN algorithm tries to separate outputs into regions based on features, thus; it does a better job for categorical outputs as there are less regions to work with. This is because numerical outputs have more various values which the model needs to learn and fit, hence these models might face overfitting problems.

The rank shows that the simpler models, including probit and logit models, have low-end ranks. Whereas, the complex models, such as random forest and KNN, have high-end ranks. Nonetheless, the intermediate model in terms of complexity, QDA, performs the best among categorical model. This reflects that although the model tends to improve with higher complexity (lower variance), however it will face with high variance in the out-of-sample set. Therefore, only some level of complexity is appropriate, where it minimise both bias and variance.

According to performances of our portfolios, it can be assigned to each investors who prefer different levels of risk. For instance, risk-aversive investors will be willing to allocate their capital on less volatile portfolios such as Sparse portfolio. On the other hand, investors who can bear more risks can choose to invest in more risky portfolios to gain higher returns such as Categorical KNN portfolios. Sparse portfolio and Equally weighted portfolio are recommended for passive portfolio managing investors which will cost significantly less transanction cost than other portfolio strategies.

---
title: "Machine Learning in Factor Investing"
output:
  html_document:
    df_print: paged
    toc: yes
  html_notebook:
    number_sections: yes
    theme: default
    toc: yes
    toc_float:
      collapsed: no
  word_document:
    toc: yes
---
# Introduction

In this report, we will evaluate machine learning application on investing strategies. There are several ways to incorporate machine learning techniques to construct a portfolios. However, we will focus on 2 main approaches; sparse portfolio and signals. 

## Models

** Benchmark portfolio **
We need benchmark portfolio to evaluate our strategies. For simplicity, we use equally-weighted portfolio as our benchmark. Where it is needed, the same weighting scheme will be applied to predicted results from machine learning strategies to clearly contrast the impact of having included machine learning techniques in the strategies.

** Sparse portfolio **
Sparse portfolio applies machine learning algorithm to predict stock future return by using other stocks’ current returns as features.  Although it is not conventional, this approach gave insightful results as discussed in the class through Lasso method that it can be used to assign weights to each stock directly. Here, we will investigate the methodology by applying **elastic network**. Elastic network is a combination of Lasso and Ridge regressions. Both are regularization methods which help the model avoid overfitting problem caused by inclusion of too many features. To do this, they penalise insignificant features, meaning that they reduces beta of each feature leaving large beta’s for significant features. Nonetheless, the difference is that Ridge reduces all beta’s without kicking any of features out of the model (l2 norm), while Lasso does both (l1 norm). We decided to apply elastic network because it allows us to utilise both of their advantages; feature selection from Lasso while not kicking the feature out too fast from Ridge. Therefore, hyperparameter for this method is the degree to use l1 and l2 norms, which are alpha and alpha-1. 

** Signals **
This approach is rather more conventional application of machine learning to predict future returns. We will use the features provided for each stock to predict their future returns in each period.  However, to consider the distribution of the data, we also add mean, variance, and skewness of each stock. 

Signal approach can be splitted into 2 sub-approaches; numerical and categorical dependent variables. As future returns are numerical variable, we can apply only some machine learning techniques. Hence, we will transform future returns into categorical variable to allow us to construct a portfolio by utilising both types of algorithm. 

While still considering the sign (positive and negative returns), we will work with 3 signals; percentile, 1-period momentum, and 3-period momentum. For numerical dependent variable, we transform predicted future returns into signals before apply equally-weighted scheme. For categorical approach, we transform actual future returns into signals, then use the signal as dependent variables. Lastly, we use the predicted signals to apply equally-weighted scheme.

# Data Preparation
## Import Data
Import and preview data

```{r pkg_import_data, message = FALSE, warning = FALSE}
load('data_full.RData')
head(data)
```
## Data Pre-processing
In addition to the given data, we need to perform some important data-preprocessing in order to generate new variables that are essential to our analysis. Those variables include the following:

1. **P_Return** (Past return): The original data only contains monthly closing price information from which we calculate past monthly return. In general, return is commonly used rather than closing price in portfolio backtesting.

2. **F_Return** (Future return): Our y variable is future monthly return which can be obtained by leading past return by one period.

3. **Moments of Return** (**Mean**, **Variance**, and **Skewness**): We decided to generate additional information including first three moments of past monthly return to maximize the benefit of the data we have.

```{r, message = FALSE, warning = FALSE}
if(!require(tidyverse)){install.packages('tidyverse')}
if(!require(PerformanceAnalytics)){install.packages('PerformanceAnalytics')}
if(!require(dplyr)){install.packages("dplyr")}
library(dplyr)
library(tidyverse)
library(PerformanceAnalytics)
load('data_full.RData')                             # Reload data
data <- data %>% arrange(Date,Tick)
tick <- levels(data$Tick) 
data <- data  %>% 
    group_by(Tick) %>%                              # Group asset by asset
    mutate(P_Return = Close / lag(Close) - 1) %>%   # Adding past returns
    mutate(F_Return = lead(P_Return)) %>%           # Adding forward returns
    na.omit() %>%                                   # Drop all NA row before calculate moments
    mutate(mean = apply.fromstart(P_Return, FUN = 'mean' , gap = 1)) %>%              # Adding past return mean
    mutate(variance = apply.fromstart(P_Return, FUN = 'var' , gap = 1)) %>%           # Adding past return variance
    mutate(skewness = apply.fromstart(P_Return, FUN = 'skewness' , gap = 1)) %>%      # Adding past return skewness
    na.omit() %>%                                                                        # Drop all NA row
    ungroup()  
head(data)
```

Before starting the analysis, we would like to investigate correlation and autocorrelation between the features in order to be aware of any hidden relationship among the explanatory variables. First, we find correlation between explanatory variables.

``````{r}
corr <- data %>%
  dplyr::select(-one_of('Date', 'Tick', 'F_Return'))
corr <- data.frame(cor(corr))
corr
```

There are 3 pairs of interesting correlation between explanatory variables.

1. RSI postively correlated with Past Return (0.92)
2. D2E positively correlated with P2B (0.64)
3. Vol1M negatively correlated with Close (-0.66)

It is important to be aware that overfitting might occur if the model uses the training data which includes 2 or more positively correlated variables due to their information redundancy. 

Apart from correlation between explanatory variables, we should also be considering the autocorrelation for each feature. Below is the analysis of autocorrelation for our x variables. It is obvious than there is not any significant autocorrelation for any feature.

``````{r}
# autocorr <- data %>%
#   dplyr::select(-one_of('Date', 'Tick', 'F_Return'))
# temp_c = c()
# for (i in seq(1,length(colnames(acf)),1)){
#   temp_c <- cbind(temp_c, cor(autocorr[1:nrow(autocorr)-1,i], autocorr[2:nrow(autocorr),i]))
# }
# rownames(temp_c) <- c()
# acf <- data.frame(temp_c)
# acf
```

## Split data
Within our monthly data over 25 years, there are 229 dates from April 1994 to April 2018. We decided to split the data into training set and testing set using 70/30 ratio. Since we are dealing with timeseries data, we simply split the dataset using the separation date which will give such desirable ratio, January 2013.


``````{r}
dates_all <- unique(data$Date)              # Select all unique dates
sep_date <- as.Date("2013-01-02")           # Set the seperation date for training set and          out-of-sample
dates_oos <- dates_all[dates_all >= sep_date]        # Out of sample set
```

# Pre-defined functions and variables
Below includes the global functions and variables, which are utilised throughout the file. 

## Returns

```{r}
returns <- data %>%                         
    dplyr::select(Date, Tick, P_Return) %>%        
    spread(key = Tick, value = P_Return)    
head(returns)

asset_return <- filter(returns, Date>sep_date)   

returns_oos <- data %>%                         
    dplyr::select(Date, Tick, P_Return) %>%       
    spread(key = Tick, value = P_Return) %>%
    filter(Date > sep_date) %>%
    dplyr::select(-Date)

```

## Equally-weighted calculation
```{r}
getEW <- function(weight){ # Define a function to calculate equally-weighted portfolio
  ew <- (t(scale(t(weight), center=FALSE, scale=colSums(t(weight)))))  # Normalise weight (Equally weighted)
  ew[is.na(ew)] <- 0
  return (ew)
}
```

## Feature Selection 1 
Define a function for feature selection for each stock using lasso. The function receives stock ticker as a parameter and returns a dataframe with only selected explanatory variables. The function uses **glmnet** package to build a lasso model using *lambda.min* which minimize the mean cross-validate error. In general, LASSO is a common tool for feature selection due to its penalised terms in objective function. Any variables with insigficant meaning to our response variable will be eliminated by lasso. There is a special case in which lasso eliminates all features. As a result, past return (P_Return) is used for the only feature.

```{r, message = FALSE, warning = FALSE}
if(!require(glmnet)){install.packages('glmnet')}
library(glmnet)
selectFeature <- function(ticker){
  set.seed(9) # Set seed for consistency
  temp <- data %>% filter(Tick == ticker) %>% # Filter ticker
      dplyr::select(-one_of('Tick', 'Date','Close')) # Drop tick, date, and close columns
  lasso <- cv.glmnet(x = as.matrix(temp[,-which(names(temp) == "F_Return")]), y = as.matrix(temp[,which(names(temp) == "F_Return")])) # Train lasso model
  subset = coef(lasso, s = "lambda.min")@Dimnames[[1]][coef(lasso, s = "lambda.min")@i + 1] # Get list of remaining variables
  subset =  subset[!subset == "(Intercept)"] # Get rid of intercept value
  if(length(subset) == 0){ # If there is no remaining variable
    subset = c('P_Return') # Use P_Return as only explanatory variable
  }
  return (data %>% filter(Tick == ticker) %>% # Return dataframe with Date, F_Return, and the selected explanatory variables
    dplyr::select(cbind('Date','F_Return',subset)))
}
```

Test the function by selecting feature for Apple stock
```{r,message = FALSE, warning = FALSE}
selectFeature('AAPL')
```

For Apple, lasso chooses 6 meaningful features including *Mkt_Cap*, *Vol_1M*, *Prof_growth*, *mean*, *variance*, and *skewness*.

## Feature Selection 2
Use *signal* of future returns as dependent variable and perform feature selection using lasso for each stock.

```{r, message = FALSE, warning = FALSE}
if(!require(glmnet)){install.packages('glmnet')}
library(glmnet)
selectFeature2 <- function(ticker,sig_dat){
    #sig_dat: matrix with signal outputs for each stock
  set.seed(9)
  temp <- data %>% filter(Tick == ticker) %>%
      dplyr::select(-one_of('Date','Close'))
  temp <- temp[-c(1)]
  temp <- temp[37:226,] #we trim the dataset for categorical, the reason is explained in the categorial section below.
  y = as.tibble(sig_dat) %>%
    dplyr::select(ticker) 
  y = as.matrix(y)
  colnames(y) <- "y"
  lasso <- cv.glmnet(x = as.matrix(temp[,-which(names(temp) == "F_Return")]), 
                     y = y)
  subset = coef(lasso, s = "lambda.min")@Dimnames[[1]][coef(lasso, s = "lambda.min")@i + 1]
  subset =  subset[!subset == "(Intercept)"]
  if(length(subset) == 0){
    subset = c('P_Return')
  }
  result <- (data %>% filter(Tick == ticker) %>%
    dplyr::select(cbind('Date',subset))) %>%
    slice(37:226) %>%
    mutate(output = y)
}
```

## Performance Metrics 
```{r set up performance Metrics}
perf_met <- function(portf_returns, weights, asset_returns){
    avg_ret <- mean(portf_returns, na.rm = T)                     # Arithmetic mean 
    vol <- sd(portf_returns, na.rm = T)                           # Volatility
    Sharpe_ratio <- avg_ret / vol                                 # Sharpe ratio
    VaR_5 <- quantile(portf_returns, 0.05, na.rm = TRUE)          # Value-at-risk
    turn <- 0    
    for(t in 2:dim(weights)[1]){
        realised_returns <- asset_returns %>% filter(Date == dates_oos[t]) %>%  dplyr::select(-Date)
        prior_weights <- weights[t-1,] * (1 + realised_returns)
        temp_turn <- apply(abs(weights[t,] - prior_weights/sum(prior_weights)),1,sum)
        if(!any(is.na(temp_turn))){
          turn <- turn + apply(abs(weights[t,] - prior_weights/sum(prior_weights)),1,sum)
        }
    }
    turn <- turn/(length(dates_oos)-1)                                # Average over time
    met <- data.frame(avg_ret, vol, Sharpe_ratio, VaR_5, turn)    # Aggregation of all of this
    rownames(met) <- "metrics"
    return(met)
}
```

```{r set up performance Metrics2}
perf_met2 <- function(ret,weight,dates_oos,port_name,actual_ret){
    avg_ret <- mean(ret, na.rm = T)                     # Arithmetic mean 
    vol <- sd(ret, na.rm = T)                           # Volatility
    Sharpe_ratio <- avg_ret / vol                           # Sharpe ratio
    VaR_5 <- quantile(ret, 0.05)                        # Value-at-risk
    turn <- 0
    for(t in 2:length(ret)) {
        realised_ret <- actual_ret %>%
          filter(Date == dates_oos[t]) %>%
          dplyr::select(-Date)
        prior_weight <- weight[t-1,] * (1 + realised_ret) # Before rebalancing
        turn <- turn + apply(abs(weight[t,] - prior_weight/sum(prior_weight)),1,sum)
    }
    turn <- turn/(length(ret)-1)
    met <- data.frame(avg_ret, vol, Sharpe_ratio, VaR_5, turn)    # Aggregation of all of this
    rownames(met) <- port_name
    return(met)
}
```

# Investing Strategies 
We create various portfolios using same set of 209 stocks based on 3 different schemes. 

* Sparse portfolio
Weight to each stock is assigned based on the influence of other stocks' returns on that stock. Whereas, the degree of such influence (weight) is determined by the following algorithms. 
    + Elastic Network
  
* Signals
    + Numercial: use future returns as dependent variables
    **Steps**
        + Use Lasso to determine features for each stocks
        + Predict the future returns for each stock using different ML algorithms, including
            + Lasso
            + KNN
            + Random Forest
        + Translate the numerical results into signals using 3 criteria:
            + Percentile - use expanding historical data to split percentile decision regions 
            P0-P33: 'sell'
            P33-P66: 'hold', includes in portfolio only if the stock is held/bought from the previous period
            P66-P100: 'buy'
          
          + Monthly Momentum - 'buy' a stock if predicted returns increase from previous day, and sell otherwise
          + 3-period Momentum - 'buy' a stock if predicted returns have been increasing for all 3 previous periods and the predicted return for the 4th period is in the same direction (rising), and sell otherwise
        + Consider the numerical results, and override signals:
          'buy' only when signal is 'buy' and numerical prediction is positive, 
          'sell' otherwise
        + Apply equally weighted scheme for stocks with 'buy' signals

        
    + Categorical: use signals of future returns as dependent variables
    **Steps**
        + Translate the numerical actual future returns into signals using the same 3 criteria:
          + Percentile 
          + Monthly Momentum 
          + 3-period Momentum 
        + Use Lasso to determine features for each stocks
        + Predict the signals of future returns for each stock using different ML algorithms, including
          + QDA
          + KNN
          + Logistic Model
          + Probit Model
        + Apply equally weighted scheme for stocks with 'buy' signals
        
* Benchmark portfolio
To interpret and gain understanding of the use of machine learning, we set equally-weighted portfolio as a benchmark.
    + EW

## Benchmark portfolio
We use equally weighted portfolio as a benchmark for other strategies. The portfolio is simply distributed the risk of each asset equally across the portfolio. The portfolio weights can be computed as it holds all selected assets equally as $w=\frac{1}{N}$.

### Setup functions to calculate equal weigths
```{r benchmark1, message = FALSE, warning = FALSE}
weights_ew <- function(returns){
    N = length(returns[1,])
    return(rep(1/N,N))
}
```

### Calculate portfolio return
```{r benchmark2, message = FALSE, warning = FALSE}
# Initiate storage for portfolio weights
port_w_ew = matrix(0, nrow = length(dates_oos), ncol = length(tick))
colnames(port_w_ew) = tick


# Initiate storage for portfolio returns
port_return_ew = matrix(0, nrow = length(dates_oos))

for(i in 1:length(dates_oos)){
    
    # Get the past return data
    past_return <- returns %>% 
      filter(Date < dates_oos[i]) %>%
      dplyr::select(-Date)
    
    # Calculate portfolio weight
    port_w_ew[i,] <- weights_ew(past_return)
    
    # Get the realised data
    realised_returns <- returns %>% 
        filter(Date ==  dates_oos[i]) %>%
        dplyr::select(-Date)
    
    # Calculating returns for each portfolio
    port_return_ew[i] <- sum(port_w_ew[i,] * realised_returns)
}

```

### Result: Performance matrix for Benchmark portfolio
```{r}
asset_returns <- data %>% 
    dplyr::select(Tick, Date, F_Return) %>% 
    spread(key = Tick, value = F_Return)
met_ew = perf_met(port_return_ew,port_w_ew,asset_returns)
row.names(met_ew) = c("Equally weighted")
met_ew
```

## Sparse Portfolio

### Elastic Network

**Elastic net** is a form of penalized regression which used to create sparse portfolio based on returns of assets.  The condition of the regression is as follows
$$ \hat{\beta}^{elastic}=\underset{\beta}{\text{argmin}}\, \left\{\frac{1}{N}\sum_{n=1}^N\left( y_{i}-a_i+\sum_{k= 1}^K\beta_{k}x_{k,i}\right)^2+\lambda \alpha \sum_{k= 1}^K|  \beta_k|+\lambda (1-\alpha)\sum_{k= 1}^K|  \beta_{k}^2|\right\},$$

#### Setup for elastic network weight and return calculations
The model use two parameters $\lambda$ and $\alpha$. $\alpha$ is the parameter that changes the weight between Ridge regression and LASSO regression (Ridge: $\alpha = 0$, LASSO: $\alpha = 1$). $\lambda$ is the parameter that adjust the weight of the penalised part.

To find portfolio weights using Elastic net, we regress the returns of the selected asset on the returns of other assets as. $r_{i,t}=a_i+\sum_{n\neq i}^N\beta_{n}r_{n,t}+\epsilon_{t}$

Using penalised regression to get.
$$ \underset{\beta_{i|}}{\text{argmin}}\, \left\{\sum_{t=1}^T\left( r_{i,t}-a_i+\sum_{n\neq i}^N\beta_{i|n}r_{n,t}\right)^2+\lambda \alpha ||  \beta||_1+\lambda (1-\alpha)||\beta||_2^2\right\},$$
```{r, message = FALSE, warning = FALSE}
weights_lasso <- function(returns,alpha){  # The parameters are defined here
    w <- 0                                          # Initiate weights
    for(i in 1:ncol(returns)){                      # Loop on the assets
        y <- returns[,i]                            # Dependent variable
        x <- returns[,-i]                           # Independent variable
        fit <- glmnet(x,y, alpha = alpha,lambda = 0.5)
        err <- y-predict(fit, x)                    # Prediction errors
        w[i] <- (1-sum(fit$beta))/var(err)          # Output: weight of asset i
    }
    return(w / sum(w))                              # Normalisation of weights
}
```

#### Alpha selection and calculation weights & returns

Elastic net portfolio weights can be calculated using the following steps
1. Use the value of alphas ranging from 0, 0.1, 0.2,...,1 and using $\lambda = 0.5$.
2. Selecting past return data and use the function **weights_lasso** to calculate protfolio weights
3. Multiply the weights with realised returns to get the portfolio returns
4. Select the highest return portfolio

```{r, message = FALSE, warning = FALSE}
port_w_elas = matrix(0, nrow = length(dates_oos), ncol = length(tick))
colnames(port_w_elas) = tick

port_return_elas = matrix(0, nrow = length(dates_oos), ncol = 11)
colnames(port_return_elas) = seq(0,1,by = 0.1)

# for (a in 0:10){
#   alpha = a/10
#   print(a)
#   
  # for(i in 1:length(dates_oos)){
  #   # Get the past return data
  #   past_return <- returns %>%
  #     filter(Date < dates_oos[i]) %>%
  #     dplyr::select(-Date) %>%
  #     as.matrix()
  # 
  #   port_w_elas[i,] <- weights_lasso(past_return,alpha)
  # 
  #   realised_returns <- returns %>%
  #       filter(Date ==  dates_oos[i]) %>%
  #       dplyr::select(-Date)
  # 
  #   port_return_elas[i,(a+1)] <- sum(port_w_elas[i,] * realised_returns)
# }
# }

#write.csv(port_return_elas,file = "return_elas.csv")

port_return_elas = read.csv(file = "return_elas.csv")
port_return_elas = port_return_elas[-1]

# Finding alpha that generate largest average return
colMeans(port_return_elas)
port_return_elas_max = as.matrix(port_return_elas[,which.max(colMeans(port_return_elas))])

# Get the alpha from the column index
alpha = which.max(colMeans(port_return_elas))/10

# Get the weights
for(i in 1:length(dates_oos)){
    # Get the past return data
    past_return <- returns %>%
      filter(Date < dates_oos[i]) %>%
      dplyr::select(-Date) %>%
      as.matrix()

    port_w_elas[i,] <- weights_lasso(past_return,alpha)
}
```

#### Result: Performance matrix for Sparse portfolio
```{r}
asset_returns <- data %>% 
    dplyr::select(Tick, Date, F_Return) %>% 
    spread(key = Tick, value = F_Return) 

met_elas = perf_met(port_return_elas_max,port_w_elas,asset_returns)
row.names(met_elas) = c("Sparse")
met_elas
```


## Signals

*signals*
We separate the predicted returns into 3 categories based on the asset past returns.
1. BUY signal (+1): If the predicted return of asset i is higher than P66 of its entire past returns
2. SELL signal (0): If the predicted return of asset i is lower than P33 of its entire past returns
3. HOLD signal (-1):If the predicted return of asset i falls in between P33 and P66 of its entire past returns

#### LASSO

For LASSO method, **glmnet** is to use penalised regression to predict future returns using the past feature data including past returns. The procedure are as the following steps:
    1. Get the past data as independent variables (x)
    2. Get the past returns as a dependent variable (y)
    3. Use LASSO as a model to predict the future returns
    4. Perform cross validation from the function **cv.glmnet**. the objective is to use the lambda that minimise the variance of the error
    5. Use the current feature data to predict the next period returns

##### Train LASSO model 
```{r}
# temp_return = matrix(NA, ncol = length(tick), nrow=length(dates_oos))
# colnames(temp_return) = tick
# for (i in 1:length(tick)){
#   print(i)
#   for(t in 1:length(dates_oos)){
#     past_data = data %>%
#       filter(Tick == tick[i]) %>%
#       filter(Date < dates_oos[t]) %>%
#       dplyr::select(-Date,-Tick,-Close) %>%
#       tail(.,48) %>%
#       as.matrix()
#     # Get the training data (3months)
#     train_data = head(past_data,36) %>%
#       as.data.frame()
#     x_train = train_data %>%
#       dplyr::select(-F_Return) %>%
#       as.matrix()
#     y_train = train_data %>%
#       dplyr::select(F_Return) %>%
#       as.matrix
# 
# 
#     MSE = matrix(NA, ncol=10,nrow=11)
# 
#     for(a in 1:11){
#       # Alpha = 0, 0.1, 0.2,...,1
#       alpha = (a-1)/10
#     for(l in 1:10){
#       # Lambda = 0, 0.1, 0.2,...,1
#       lambda = (l)/10
#     
#       # Run penalised regression
#       fit = glmnet(x = x_train, y = y_train, alpha = alpha, lambda = lambda)
# 
#       # Get the validation data (3months)
#       validate_data = tail(past_data,12) %>%
#         as.data.frame() 
#       x_validate = validate_data %>%
#         dplyr::select(-F_Return) %>%
#         as.matrix()
#       y_validate= validate_data %>%
#         dplyr::select(F_Return) %>%
#         as.matrix()
# 
#       # Calculate MSE
#       MSE[a,l] = sum((y_validate - predict(fit,x_validate))^2)
# 
#     }
#   }
# 
# # Optimal parameters
# min_alpha = (which(MSE == min(MSE), arr.ind = TRUE)[1] - 1)/10
# min_lambda = (which(MSE == min(MSE), arr.ind = TRUE)[2])/10
# 
# 
# 
# last_period_x = data %>%
#       filter(Tick == tick[i]) %>%
#       filter(Date == dates_oos[t]) %>%
#       dplyr::select(-Date,-Tick,-Close, -F_Return) %>%
#       as.matrix()
#     
#     fit2 = glmnet(x = x_train, y = y_train, alpha = min_alpha, lambda = min_lambda)
#     temp_return[t,i] <- predict(fit2, last_period_x)
#   }
# }
# 
# write.csv(temp_return, file = "return_lasso.csv")

# For further execution
temp_return = read.csv(file = "return_lasso.csv")
temp_return = temp_return[-1]

```

** with LASSO with percentile **
```{r Signal Percentile LASSO}
q_signal = matrix(NA, ncol = length(tick), nrow=length(dates_oos))
colnames(q_signal) = tick
for (i in 1:length(tick)){
  print(i)
  for (t in 1:length(dates_oos)){
    past_return = data %>%
      filter(Tick == tick[i]) %>%
      filter(Date <= dates_oos[t]) %>%
      dplyr::select(P_Return) %>%
      as.matrix()
  
  if (temp_return[t,i] < as.numeric(quantile(past_return)[2])) {
    q_signal[t,i] = -1
    }
  
  # Returns above Q3 indicate BUY signal
    else if (temp_return[t,i] > as.numeric(quantile(past_return)[4])) {
    q_signal[t,i] = 1
    }
  
  # Returns between Q1 and Q3 indicate HOLD signal  
    else {
    q_signal[t,i] = 0
    }
}

}
# Translate Hold signal
for (i in 1:length(tick)){
  for (t in 2:length(dates_oos)){
    if(q_signal[t,i] == 0){
      q_signal[t,i] = q_signal[t-1,i]
    }
  }
}
# Change SELL signal to 0 (no-short selling)
# 1 = HOLDING , 0 = NOT HOLDING
q_signal[q_signal == -1] <- 0

# Create EW Portfolio
port_w_percentile = q_signal/rowSums(q_signal)

# Calculate Portfolio return
port_return_percentile = matrix(0, nrow = length(dates_oos))
for (i in 1:length(dates_oos)){
  realised_returns = data %>%
    filter(Date == dates_oos[i]) %>%
    dplyr::select(F_Return)
  
  port_return_percentile[i] = sum(port_w_percentile[i,] * realised_returns)
  
}
```

** with LASSO with three-momentum **
```{r MoM signal LASSO}
m_signal = matrix(NA, ncol = length(tick), nrow=length(dates_oos))
colnames(m_signal) = tick

for (i in 1:length(tick)){
  print(i)
  for (t in 1:length(dates_oos)){
    three_M_past_return = data %>%
      filter(Tick == tick[i]) %>%
      filter(Date <= dates_oos[t]) %>%
      dplyr::select(P_Return) %>%
      as.matrix() %>%
      tail(3)
    
    p_1 = three_M_past_return[3]
    p_2 = three_M_past_return[2]
    p_3 = three_M_past_return[1]
    
    if(temp_return[t,i]>0 & p_1>0 & p_2>0 & p_3>0){
      m_signal[t,i] = 1
    }
      else{
      m_signal[t,i] = 0
    }
  }
}

# Create EW Portfolio
port_w_mom = m_signal/rowSums(m_signal)
port_w_mom[is.na(port_w_mom)] <- 0

# Calculate Portfolio return
port_return_mom = matrix(0, nrow = length(dates_oos))
for (i in 1:length(dates_oos)){
  realised_returns = data %>%
    filter(Date == dates_oos[i]) %>%
    dplyr::select(F_Return)
  
  port_return_mom[i] = sum(port_w_mom[i,] * realised_returns)
  
}
```

** with LASSO with one-momentum **
```{r}
m1_signal = matrix(NA, ncol = length(tick), nrow=length(dates_oos))
colnames(m1_signal) = tick

for (i in 1:length(tick)){
  print(i)
  for (t in 1:length(dates_oos)){
    one_M_past_return = data %>%
      filter(Tick == tick[i]) %>%
      filter(Date <= dates_oos[t]) %>%
      dplyr::select(P_Return) %>%
      as.matrix() %>%
      tail(1)

    if(temp_return[t,i]>0 & one_M_past_return>0){
      m1_signal[t,i] = 1
    }
      else{
      m1_signal[t,i] = 0
    }
  }
}

# Create EW Portfolio
port_w_mom1 = m1_signal/rowSums(m1_signal)
port_w_mom1[is.na(port_w_mom1)] <- 0

# Calculate Portfolio return
port_return_mom1 = matrix(0, nrow = length(dates_oos))
for (i in 1:length(dates_oos)){
  realised_returns = data %>%
    filter(Date == dates_oos[i]) %>%
    dplyr::select(F_Return)
  
  port_return_mom1[i] = sum(port_w_mom1[i,] * realised_returns)
  
}


```

##### Result: Performance matrix for LASSO
```{r}
port_names_lasso = c("LASSO Percentile", "LASSO Momentum1", "LASSO Momentum3")
asset_returns <- data %>% 
    dplyr::select(Tick, Date, F_Return) %>% 
    spread(key = Tick, value = F_Return)

met_LASSO_q = perf_met(port_return_percentile,port_w_percentile,asset_returns)
met_LASSO_m1 = perf_met(port_return_mom1,port_w_mom1,asset_returns)
met_LASSO_m = perf_met(port_return_mom,port_w_mom,asset_returns)

lasso_met = rbind(met_LASSO_q,met_LASSO_m1,met_LASSO_m)
row.names(lasso_met) = port_names_lasso
lasso_met
```

#### K-Nearest Neighbour
This algorithm consider values of K-nearest data points to a particular observation in order to predict that observations. Therefore, the hyperparameter for this model is K. Note that, since we train the model for each stock, the model automatically tunes the best K-value for each stock’s model resulting in different K’s.

##### Setup functions for KNN
*Signals*
```{r setup signals_percentile, echo = T}
#percentile
P_sig <- function(dat,hist_dat,sig_date) { 
	  #dat: one column of returns (predicted for numerical, actual for categorical)
  	#hist_dat: historical data to create percentile
    #sig_date: date at which the signal is assigned
  sig <- matrix (0,nrow = length(dat),ncol = 1)
  for (i in 1:length(dat)) {
    P_dat <- hist_dat[1:(sig_date-2+i)]    #expanding data to create percentile
    if (dat[i] >= quantile(P_dat, .66)) {
	    sig[i] = 1
    } else if (dat[i] >= quantile(P_dat, .33)) {
    	if (i == 1) {
    	  sig[i] = 0
    	} else if (sig[i-1] == 1) {
          sig[i] = 1
        } 
    } else {
    	sig[i] = 0
    }
  }
  return(sig)
}
```

```{r setup signals_MoM, echo = T}

#For Momentum functions, the input is data frame with future returns as first column and current returns as second columns

#Monthly MoM
MoM.1_sig <- function(dat){
  sig <- matrix(0,nrow = nrow(dat),ncol = 1)
  for (i in 1:nrow(dat)) {
    if (dat[i,1] > dat[i,2]) {      #buy if predicted future return > current return
      sig[i] = 1
    } else {
      sig[i] = 0
    }
  }
  return(cbind(dat,sig))
}

# Quarterly MoM
MoM.3_sig <- function(dat){
  sig <- matrix(0,nrow = nrow(dat),ncol = 1)
  for (i in 4:nrow(dat)) {
    if ((dat[i-2,2] > dat[i-3,2]) & (dat[i-1,2] > dat[i-2,2]) & (dat[i,2] > dat[i-1,2]) &
    (dat[i,1] > dat[i,2])){ 
                #buy if predicted future return > current return
                #given that the past return had been on rising momentum for last 3 periods
      sig[i] = 1
    } else {
      sig[i] = 0
    }
  }
  result <- cbind(dat,sig)
  result <- result[4:(nrow(dat)),] #somehow this does not delete first 3 rows
  #result <- result[4:67,]
  return(result)
}
```

*Cross-validation*
Unlike sparse portfolio method, we deal with time-seies data to predict future returns in this section. Hence, we need to set time-series cross-validation as time slice to maintain the structure in the data (oldest to newest). To be consistent with our out-of-sample date which separates training and test sets with approximately 70/30 ratio, we will set 36 and 12 observations for each training and validation examples when we fit the model. 
```{r set up trainControl, echo= T, message = FALSE}
#carat library for training
if(!require(caret)){install.packages('caret')}
library(caret)

#Training set starts from first date until date before separation date
timeSlices_train <- createTimeSlices(1:(length(dates_all)-length(dates_oos)), 
                   initialWindow = 36, horizon = 12, fixedWindow = TRUE)
```

*Date variables*
```{r set up oos horizon and returns,echo = T}
oos_start <- ((length(dates_all)-length(dates_oos))+1)
oos_length <- length(dates_all) - oos_start + 1
oos_ret <- as_tibble(returns) %>% 
  filter(Date >= sep_date) %>%    
  dplyr::select(-Date) %>%                                   
  as.matrix()    
```

*Equally-weighted calculation*
```{r set up equal-weight function,echo = T}
cal_port_weight <- function(sig_dat) {
  weight <- matrix(0, nrow= nrow(sig_dat),ncol = ncol(sig_dat))     # create weight matrix
  for (i in 1:nrow(sig_dat)) {                                      # assign weights
    buy_thisDay <- which(sig_dat[i,] == 1)
    for (j in 1:length(buy_thisDay)) {
      weight[i,buy_thisDay[j]] <- 1/length(buy_thisDay)
    }
  }
 return (weight)
}
```

##### Train KNN with percentile
```{r knn_p, echo = T}
# signals_knn_P <- matrix(0, nrow = oos_length, ncol = length(tick)) #create table to store signals
# colnames(signals_knn_P) <- tick
# for (i in 1:length(tick)){
#   a <- selectFeature(tick[i])
#   a_pred <- train(F_Return ~ ., data = a[unlist(timeSlices_train),], 
#                   method = "knn", 
#                   preProc = c("center", "scale"))
#   a_actual.future_test = a[oos_start:length(dates_all),]
#   a_pred.future_test <- predict(a_pred, a_actual.future_test)     # get prediction
#   a_signal <- P_sig(a_pred.future_test,as.matrix(returns[tick[i]]),oos_start)                # transfrom into signals
#   signals_knn_P[,i]  <- a_signal                                  # combine signals for all stocks
# }
# weight_knn_P <- cal_port_weight(signals_knn_P)
# ret_knn_P <- rowSums(weight_knn_P*oos_ret)
# write.csv(weight_knn_P, file = "weight_knn_P.csv")
# write.csv(ret_knn_P, file = "ret_knn_P.csv")

weight_knn_P = read.csv(file = "weight_knn_P.csv")
weight_knn_P = weight_knn_P[-1]

ret_knn_P = read.csv(file = "ret_knn_P.csv")
ret_knn_P = ret_knn_P[-1]
```

##### Train KNN with percentile & sign
```{r knn_p + sign, echo = T}
# signals_knn_P.s <- matrix(0, nrow = oos_length, ncol = length(tick)) #create table to store signals
# pred_knn_P.s <- matrix(0, nrow = oos_length, ncol = length(tick))
# colnames(signals_knn_P.s) <- tick
# colnames(pred_knn_P.s) <- tick
# for (i in 1:length(tick)){
#   a <- selectFeature(tick[i])
#   a_pred <- train(F_Return ~ ., data = a[unlist(timeSlices_train),], 
#                   method = "knn", 
#                   preProc = c("center", "scale"))
#   a_actual.future_test = a[oos_start:length(dates_all),]
#   a_pred.future_test <- predict(a_pred, a_actual.future_test)     # get prediction
#   a_signal <- P_sig(a_pred.future_test,as.matrix(returns[tick[i]]),oos_start)# transfrom into signals
#   pred_knn_P.s[, i] <- a_pred.future_test
#   for (j in 1: length(a_signal)){
#     if ((a_signal[j] == 1) & (a_pred.future_test[j] > 0)) {
#       signals_knn_P.s[j, i] = 1
#     } else {
#       signals_knn_P.s[j, i] = 0
#     }
#   }
# }
# weight_knn_P.s <- cal_port_weight(signals_knn_P.s)
# ret_knn_P.s <- rowSums(weight_knn_P.s*oos_ret)
# write.csv(weight_knn_P.s, file = "weight_knn_P_s.csv")
# write.csv(ret_knn_P.s, file = "ret_knn_P_s.csv")

weight_knn_P.s = read.csv(file = "weight_knn_P_s.csv")
weight_knn_P.s = weight_knn_P.s[-1]

ret_knn_P.s = read.csv(file = "ret_knn_P_s.csv")
ret_knn_P.s = ret_knn_P.s[-1]
```

##### Train KNN with 1-period momentum 
```{r knn_mom1,echo = T}
# signals_knn_MoM.1 <- matrix(0, nrow = oos_length, ncol = length(tick)) #create table to store signals
# colnames(signals_knn_MoM.1) <- tick
# for (i in 1:length(tick)){
#   a <- selectFeature(tick[i])
#   a_pred <- train(F_Return ~ ., data = a[unlist(timeSlices_train),], 
#                   method = "knn", 
#                   preProc = c("center", "scale"))
#   a_actual.future_test = a[oos_start:length(dates_all),]
#   a_pred.future_test <- predict(a_pred, a_actual.future_test)     # get prediction
#   a_actual.current_test <- a$F_Return[(oos_start-1):(length(dates_all)-1)]
#   a_table <- cbind(a_pred.future_test,a_actual.current_test)
#   a_signal <- MoM.1_sig(a_table)                # transfrom into signals
#   signals_knn_MoM.1[,i]  <- a_signal[,3]          # combine signals for all stocks
# }
# weight_knn_MoM.1 <- cal_port_weight(signals_knn_MoM.1) 
# ret_knn_MoM.1 <- rowSums(weight_knn_MoM.1*oos_ret)
# 
# write.csv(weight_knn_MoM.1, file = "weight_knn_MoM1.csv")
# write.csv(ret_knn_MoM.1, file = "ret_knn_MoM1.csv")

weight_knn_MoM.1 = read.csv(file = "weight_knn_MoM1.csv")
weight_knn_MoM.1 = weight_knn_MoM.1[-1]

ret_knn_MoM.1 = read.csv(file = "ret_knn_MoM1.csv")
ret_knn_MoM.1 = ret_knn_MoM.1[-1]
```

##### Train KNN with 1-period momentum & sign
```{r knn_mom1 + sign,echo = T}
# signals_knn_MoM.1.s <- matrix(0, nrow = oos_length, ncol = length(tick)) #create table to store signals
# pred_knn_MoM.1.s <- matrix(0, nrow = oos_length, ncol = length(tick))
# colnames(signals_knn_MoM.1.s) <- tick
# colnames(pred_knn_MoM.1.s) <- tick
# 
# for (i in 1:length(tick)){
#   a <- selectFeature(tick[i])
#   a_pred <- train(F_Return ~ ., data = a[unlist(timeSlices_train),], 
#                   method = "knn", 
#                   preProc = c("center", "scale"))
#   a_actual.future_test = a[oos_start:length(dates_all),]
#   a_pred.future_test <- predict(a_pred, a_actual.future_test)     # get prediction
#   a_actual.current_test <- a$F_Return[(oos_start-1):(length(dates_all)-1)]
#   a_table <- cbind(a_pred.future_test,a_actual.current_test)
#   a_signal <- MoM.1_sig(a_table)                # transfrom into signals
#   pred_knn_MoM.1.s[, i] <- a_pred.future_test
#   for (j in 1: nrow(a_signal)){
#     if ((a_signal[j,3] == 1) & (a_pred.future_test[j] > 0)) {
#       signals_knn_MoM.1.s[j, i] = 1
#     } else {
#       signals_knn_MoM.1.s[j, i] = 0
#     }
#   }
# }
# weight_knn_MoM.1.s <- cal_port_weight(signals_knn_MoM.1.s) 
# ret_knn_MoM.1.s <- rowSums(weight_knn_MoM.1.s*oos_ret)
# 
# write.csv(weight_knn_MoM.1.s, file = "weight_knn_MoM1s.csv")
# write.csv(ret_knn_MoM.1.s, file = "ret_knn_MoM1s.csv")

weight_knn_MoM.1.s = read.csv(file = "weight_knn_MoM1s.csv")
weight_knn_MoM.1.s = weight_knn_MoM.1.s[-1]

ret_knn_MoM.1.s = read.csv(file = "ret_knn_MoM1s.csv")
ret_knn_MoM.1.s = ret_knn_MoM.1.s[-1]
```

##### Train KNN with 3-period momentum 
```{r knn_mom3,echo = T}
# signals_knn_MoM.3 <- matrix(0, nrow = oos_length, ncol = length(tick)) #create table to store signals
# colnames(signals_knn_MoM.3) <- tick
# for (i in 1:length(tick)){
#   a <- selectFeature(tick[i])
#   a_pred <- train(F_Return ~ ., data = a[unlist(timeSlices_train),], 
#                   method = "knn", 
#                   preProc = c("center", "scale"))
#   a_actual.future_test = a[oos_start:length(dates_all),]
#   pred <- predict(a_pred, a_actual.future_test)     # get prediction
#   a_pred.future_test <- c(0,0,0,pred)           # make lenght longer for previous actual returns
#   a_actual.current_test <- a$F_Return[(oos_start-3):(length(dates_all))]
#   a_table <- cbind(a_pred.future_test,a_actual.current_test)
#   a_signal <- MoM.3_sig(a_table)                # transfrom into signals
#   
#   signals_knn_MoM.3[,i]  <- a_signal[,3]          # combine signals for all stocks
# }
# weight_knn_MoM.3 <- cal_port_weight(signals_knn_MoM.3)
# ret_knn_MoM.3 <-rowSums(weight_knn_MoM.3*oos_ret)
# 
# write.csv(weight_knn_MoM.3, file = "weight_knn_MoM3.csv")
# write.csv(ret_knn_MoM.3, file = "ret_knn_MoM3.csv")

weight_knn_MoM.3 = read.csv(file = "weight_knn_MoM3.csv")
weight_knn_MoM.3 = weight_knn_MoM.3[-1]

ret_knn_MoM.3 = read.csv(file = "ret_knn_MoM3.csv")
ret_knn_MoM.3 = ret_knn_MoM.3[-1]
```

##### Train KNN with 3-period momentum & sign
```{r knn_mom3 + sign,echo = T}
# signals_knn_MoM.3.s <- matrix(0, nrow = oos_length, ncol = length(tick)) #create table to store signals
# colnames(signals_knn_MoM.3.s) <- tick
# pred_knn_MoM.3.s <- matrix(0, nrow = oos_length, ncol = length(tick)) 
# colnames(pred_knn_MoM.3.s) <- tick
# 
# for (i in 1:length(tick)){
#   a <- selectFeature(tick[i])
#   a_pred <- train(F_Return ~ ., data = a[unlist(timeSlices_train),], 
#                   method = "knn", 
#                   preProc = c("center", "scale"))
#   a_actual.future_test = a[oos_start:length(dates_all),]
#   pred <- predict(a_pred, a_actual.future_test)     # get prediction
#   a_pred.future_test <- c(0,0,0,pred)           # make lenght longer for previous actual returns
#   a_actual.current_test <- a$F_Return[(oos_start-3):(length(dates_all))]
#   a_table <- cbind(a_pred.future_test,a_actual.current_test)
#   a_signal <- MoM.3_sig(a_table)                # transfrom into signals
#   
#   pred_knn_MoM.3.s[, i] <- a_pred.future_test[4:length(a_pred.future_test)]
#   for (j in 1: nrow(a_signal)){
#     if ((a_signal[j, 3] == 1) & (a_pred.future_test[j] > 0)) {
#       signals_knn_MoM.3.s[j, i] = 1
#     } else {
#       signals_knn_MoM.3.s[j, i] = 0
#     }
#   }
# }
# weight_knn_MoM.3.s <- cal_port_weight(signals_knn_MoM.3.s)
# ret_knn_MoM.3.s <-rowSums(weight_knn_MoM.3.s*oos_ret)
# 
# write.csv(weight_knn_MoM.3.s, file = "weight_knn_MoM3s.csv")
# write.csv(ret_knn_MoM.3.s, file = "ret_knn_MoM3s.csv")

weight_knn_MoM.3.s = read.csv(file = "weight_knn_MoM3s.csv")
weight_knn_MoM.3.s = weight_knn_MoM.3.s[-1]

ret_knn_MoM.3.s = read.csv(file = "ret_knn_MoM3s.csv")
ret_knn_MoM.3.s = ret_knn_MoM.3.s[-1]
```

##### Result: Peformance Metrics for KNN (Numerical)
```{r PM for knn_numerical}
M_knn_P <- perf_met2(as.matrix(ret_knn_P),as.matrix(weight_knn_P),dates_oos,
                    port_name = "KNN_Percentile",returns)
M_knn_MoM.1 <- perf_met2(as.matrix(ret_knn_MoM.1),as.matrix(weight_knn_MoM.1),dates_oos,
                        port_name = "KNN_Momentum.1", returns)
M_knn_MoM.3 <- perf_met2(as.matrix(ret_knn_MoM.3),as.matrix(weight_knn_MoM.3),dates_oos,
                        port_name = "KNN_Momentum.3", returns)

M_knn_P.s <- perf_met2(as.matrix(ret_knn_P.s),as.matrix(weight_knn_P.s),dates_oos,
                    port_name = "KNN_Percentile with sign",returns)
M_knn_MoM.1.s <- perf_met2(as.matrix(ret_knn_MoM.1.s),as.matrix(weight_knn_MoM.1.s),dates_oos,
                        port_name = "KNN_Momentum.1 with sign", returns)
M_knn_MoM.3.s <- perf_met2(as.matrix(ret_knn_MoM.3.s),as.matrix(weight_knn_MoM.3.s),dates_oos,
                        port_name = "KNN_Momentum.3 with sign", returns)


Numerical_knn_with.s <- rbind(M_knn_P,M_knn_MoM.1,M_knn_MoM.3,
                       M_knn_P.s,M_knn_MoM.1.s,M_knn_MoM.3.s)
Numerical_knn_with.s
```

#### Random Forest

##### Growing a Portfolio
Random Forest is another common machine learning techniques which developed from a decision tree. It works well with both classification and regression problems. It is a powerful technique due to its highly non-linear model. However there is a trade-off by having many hyperparameters namely **mtry** (number of predictors sampled for spliting at each node.) and **ntree** (number of tree in the forest). In this case, signal criteria (percentile of momentum) is also treat as another hyperparameter since different algorithms yield different portfolio weight.

Regression random forest is used in this setting, where the user define a value for hyperparameter. (Impact of hyperparameter is discussed in the next section). For the Random Forest portfolio, the weight for each asset at each time period will be determined by the following procedures:

1. At each date, train Random Forest model by using past data. (Note that different assets have different model due to feature selection process for each asset)
2. Use the model to get a prediction for future return (F_Return)
3. Derive investment decision for each asset by applying signal criteria (either percentile or momentum)to the prediction 
4. Normalise portfolio weight using equally weighted method

Then the strategy will be evaluated on its performance metrics based on the portfolio weight and return.

First, **weight_RF** function is defined to calculated portfolio weight using Random Forest technique. The function receives past and current data, mtry, ntree, and s_criteria (s_criteria) as parameters and return the buying decision as output.

```{r rf 1, message = FALSE, warning = FALSE}
if(!require(randomForest)){install.packages("randomForest")}
library(randomForest)
weight_RF <- function(past_data, current_data, mtry, ntree, s_criteria, prev_w){ # Function for weight calculation using random forest technique
      train <- past_data[-c(1)] #Training data, drop date columns
      test <- current_data[-c(1,2)] #Testing data, drop date and F_Return columns
      rf.fit <- randomForest(F_Return ~ ., # Grow random forest
                         data = train, # Training data
                         mtry = min(mtry,length(test)), # Number of predictive variables used is 4 (or less than if there is less variable)
                         ntree = ntree) # Number of random trees in the forest
      pred <- predict(rf.fit, test) # Predict F_Return using the fitted model
      if (s_criteria == 1){ # percentile signal criteria
        return (ifelse(pred < quantile(past_data$F_Return, c(.33)), 0, ifelse(pred < quantile(past_data$F_Return, c(.66)), prev_w, 1)))
      } else if (s_criteria == 2){ # momentum signal criteria
        return (ifelse(pred > 0, (ifelse(past_data[nrow(past_data)-1,]$F_Return > 0 & 
                                           past_data[nrow(past_data)-2,]$F_Return > 0 & 
                                           past_data[nrow(past_data)-3,]$F_Return > 0, 1, 0)), 0))
      } else { # just invest if return is positive
        return (ifelse(pred > 0, 1, 0)) # Invest if return > 0                                                                                  
      }
}
```

Function **doRF** performs backtesting evaluation by creating a portfolio by using Random Forest technique, given a specific set of hyperparameter values. The funcion returns a performance metric as a result.
```{r rf 2, message = FALSE, warning = FALSE}
doRF <- function(mtry = 4, ntree = 10, s_criteria = 0){
  portf_weight_RF <- matrix(0, nrow = length(dates_oos)-1, ncol = length(tick)) # Initiate weight matrix for RF portfolio
  portf_return_RF <- c() # # Initiate return list for RF portfolio
  prev_w <- 0
  for (i in 1:length(tick)){ # Repeat for all ticker
    temp <- selectFeature(tick[i]) # Feature selection for ticker i 
    w <- c() # Initiate an empty list of weight for ticker i
    for (t in 2:(length(dates_oos))){ # Expandind window for out-of-sample dates
      past_data <- temp %>% filter(Date < dates_oos[t-1]) # Past data is all data from the date before date t
      current_data <- temp %>% filter(Date == dates_oos[t-1]) # Current data is data at date t
      res <- weight_RF(past_data = past_data, current_data = current_data, mtry = mtry, ntree = ntree, s_criteria = s_criteria, prev_w = prev_w) # Get weight using random forest method with specific parameter
      w = rbind(w, res) # Append date for stock i at period t
      prev_w <- w
    }
    portf_weight_RF[,i] <- w # Append weight for stock i for all period t
  }
  portf_weight_RF <- getEW(portf_weight_RF) # Normalise weight (Equally weighted)
  for(j in 1:(length(dates_oos)-1)){ 
    portf_return_RF[j] <- sum(portf_weight_RF[j,] * returns_oos[j,]) # Calculate portfolio return
  }
  return (perf_met(portf_return_RF, portf_weight_RF, asset_return)) # Get performance metrics
}
```

Next, create a portfolio based on Random Forest strategy and evaluate the performance
```{r rf 3, message = FALSE, warning = FALSE}
doRF()
```
##### Sensitivity Analysis for Hyperparameters - Using the GridSearch

Performance could be improved by using the right values of hyperparameter. During the trading period, **cross-validation** technique could be used to tune hyperparameter. Analysis below demonstrate how much does hypermeters impact the performance of the portfolio by applying **GridSearch** technique. GridSearch can be performed with the following step:

1. Set up range of values to be tested for each hyperparameter
2. Set up list of all possible combinations for all hyperparameters
3. Evaluate the model on all settings from step 2

This operation could significantly consume a huge runtime. Therefore **RandomSearch** can also be used to reduce runtime. The concept of RandomSearch is to initially randomly set a list of hyperparameters to be tested. Then the set that gives the best performance will be used to tuned the hyperparameters once again by doing GridSearch around those points.

Below, GridSearch is performed on selected combinations of hyperparameters and the resulting average return for each setting is visualized.

```{r rf 4, message = FALSE, warning = FALSE}
mtry <- c(6, 7, 8, 9, 10) # List of mtry to be tested
ntree <- c(10, 20, 30, 40, 50) # List of ntree to be tested
s_criteria <- c(0, 1, 2) # List of signal criteria to be tested
pars_RF <- expand.grid(mtry, ntree, s_criteria)  # Exploring all combinations
mtry <- pars_RF[,1]
ntree <- pars_RF[,2]
s_criteria <- pars_RF[,3]
grid_pars_RF <- function(mtry, ntree, s_criteria){  # Parameters for the grid search
  print(paste('mtry: ', toString(mtry), ' ntree: ', toString(ntree), ' s_criteria: ', toString(s_criteria)))
  temp <- doRF(mtry = mtry, ntree = ntree, s_criteria = s_criteria)
  return (temp$avg_ret) # Return average return
}

# Load result which has already been computed to reduce runtime 
grd <- read.csv('grd_RF.csv', header=TRUE, sep=",")

# If haven't computed before, uncomment the following block
#------------------------------------------------------------------------------#
# grd <- pmap(list(mtry, ntree, s_criteria), # Parameters for the grid search
#             grid_pars_RF) # Input for function: test labels 
# grd <- data.frame(mtry, ntree, s_criteria, avg_ret = unlist(grd))
# write.csv(grd, file = "grd_RF.csv", row.names=FALSE) # Keep as csv file on local
#------------------------------------------------------------------------------#

# grd$s_criteria <- ifelse(grd$s_criteria == 0, "none", ifelse(grd$s_criteria == 1, "percentile", "momentum"))
# grd %>% ggplot(aes(x = s_criteria, y = avg_ret, fill = s_criteria)) + # Plot!
#     geom_bar(stat = "identity") +
#     facet_grid(rows = vars(mtry), cols = vars(ntree))
```

##### Result: Peformance Metrics for Random Forest
Obviously, various combinations of hyparameters impact average return of the portfolio signigicantly. The best set of hyperparameter are: mtry = 9, ntree = 20, and percentile signal criteria. Full performance metrics for this setting is shown below.


```{r rf 5, message = FALSE, warning = FALSE}
doRF(mtry = 9, ntree = 20, s_criteria = 1)
```

### Categorical 
Classification models are used when dependent variables are categorical, for example; when they are binary. Hence, we need to transform actual outputs into categories, then use them to fit the models. We will follow the same categories in the previous section, namely percentile and momemtum.

#### QDA (Quadratic Discriminant Analysis)

##### QDA-Driven Portolio
QDA or quadratic discriminate analysis is one of the famous machine learning technique for classification prediction which requires no hyperparameter. Applying QDA to this scenario, the response variable or future return (F_Return) must be binarised into 1 for positive and 0 for negative return. Therefore the prediction from QDA model is a probability of having a positive return in the next period. For QDA portfolio, the weight for each asset at each time period will be determined by the following procedures: 

1. At each date, train QDA model by using past data. (Note that different assets have different model due to feature selection process for each asset)
2. Use the model to get a prediction for future return (F_Return)
3. Derive investment decision for each asset by applying momentum signal criteria to the prediction 
4. Normalise portfolio weight using equally weighted method

Then the strategy will be evaluated on its performance metrics based on the portfolio weight and return.

First, **weight_QDA** function is defined to calculated portfolio weight using QDA technique. The function receives past and current data and decision threshold as parameters. Decision threshold is a choosen value between 0 and 1 which is used to change probability of having a positive future return into a buying decision. (If the probability is higher than the threshold, then buy an asset) Lastly, function returns either 0 not to invest or 1 to invest.

```{r, message = FALSE, warning = FALSE}
if(!require(MASS)){install.packages('MASS')}
library(MASS)
weight_QDA <- function(past_data, current_data, threshold = 0.5){ # If not passing threshould then set to 0.5
  y <- past_data$F_Return # Set past data F_Return as a response variable
  y <- ifelse(y > 0 , 1, 0) # Binarise response variable (set to 1 if the return is positive and 0 o.w.)
  x <- as.matrix(past_data[-c(1,2)]) #Remove Date and F_Return from explanatory variables (training data)
  new_x <- current_data[-c(1,2)] #Remove Date and F_Return from explanatory variables (testing data)
  qda.fit <- qda(x = x, grouping = y) # Train QDA model
  pred <- predict(qda.fit, new_x) # Apply model on testing data
  if (pred$posterior[2] > threshold){ # If probability of getting a positive return is greater than threshold
    return (ifelse(y[length(y)-1] > 0 & y[length(y)-2] > 0 & y[length(y)-3] > 0, 1, 0)) # Invest if in a good momentum 
  } else { # Else
    return (0) # Do not invest...
  }
}
```

Function **doQDA** performs backtesting evaluation by creating a portfolio by using QDA technique, given a specific decision threshold. The funcion return a performance metric as a result.

```{r, message = FALSE, warning = FALSE}
doQDA <- function(threshold = 0.5){
  portf_weight_QDA <- matrix(0, nrow = length(dates_oos)-1, ncol = length(tick)) # Initiate weight matrix for QDA portfolio
  portf_return_QDA <- c() # # Initiate return list for QDA portfolio
  for (i in 1:length(tick)){ # Repeat for all ticker
    temp <- selectFeature(ticker = tick[i]) # Feature selection for ticker i 
    w = c() # Initiate an empty list of weight for ticker i
    for (t in 2:(length(dates_oos))){ # Expandind window for out-of-sample dates
      past_data <- temp %>% filter(Date < dates_oos[t-1]) # Past data is all data from the date before date t
      current_data <- temp %>% filter(Date == dates_oos[t-1]) # Current data is data at date t
      res <- weight_QDA(past_data = past_data, current_data = current_data, threshold = threshold) # Get weight using QDA method with specific                                                                                                       threshold
      w = rbind(w, res) # Append date for stock i at period t
    }
    portf_weight_QDA[,i] <- w # Append weight for stock i for all period t
  }
  portf_weight_QDA <- getEW(portf_weight_QDA) # Normalise weight (Equally weighted)
  for(j in 1:(length(dates_oos)-1)){ 
    portf_return_QDA[j] <- sum(portf_weight_QDA[j,] * returns_oos[j,]) # Calculate portfolio return
  }
  return (perf_met(portf_return_QDA, portf_weight_QDA, asset_return)) # Get performance metrics
}
```

Next, create a portfolio based on QDA strategy and evaluate the performance

```{r, message = FALSE, warning = FALSE}
doQDA()
```
##### Impact of Decision Threshold

In practice, decision threshold need to be set along trading days. For those investors who strongly believe the QDA model, they can have a low threshold which interpret a low probability of having a positive return as a buy signal. Conversely, investors that do not fully trust the model can set a high threshold to only buy an asset when the probability is very high. If the decision threshold is not set, QDA default value is 0.5 or known as predicting a class with higher probability

However, decision threshold can be set to maximize the portfolio performance. Below, decision threshold is varied from 0.1 to 0.9 and the performance metrics are plotted to illustrate the impact of decision threshold. However, in practice, threshold below 0.5 should not be choosen since there is higher probabilty of having a negative return.

```{r, message = FALSE, warning = FALSE}
if(!require(ggplot2)){install.packages('ggplot2')}
if(!require(reshape2)){install.packages('reshape2')}
library(ggplot2)
library(reshape2)

perf_met_QDA <- data.frame(matrix(ncol = 6, nrow = 0)) # Initiate dataframe for performance metric of QDA with different parameter values
colnames(perf_met_QDA) <- c('threshold', 'avg_ret', 'vol', 'Sharpe_ratio', 'VaR_5', 'turn') # Set column names

# Load result which has already been computed to reduce runtime
perf_met_QDA <- read.csv('QDA_01.csv', header=TRUE, sep=",")

# If haven't computed before, uncomment the following block
#------------------------------------------------------------------------------#
# for (threshold in seq(0.1, 0.9, 0.1)){ # Loop through set of threshold value (Hyperparameter tuning)
#   perf_met_QDA <- rbind(perf_met_QDA,  cbind(threshold, t(unlist(doQDA(threshold = threshold))))) # Train/test model and store perf. met.
# }
# write.csv(perf_met_QDA, file = "QDA_01.csv", row.names=FALSE) # Keep as csv file on local
#------------------------------------------------------------------------------#

plot_data <- melt(perf_met_QDA  ,  id.vars = 'threshold', variable.name = 'series') # Plot performance evaluation for different value of hy.par.
ggplot(plot_data, aes(threshold, value)) + 
  geom_line(colours = "blue") + 
  facet_grid(series ~ ., scale = "free_y") +
  scale_x_continuous(breaks = seq(0.1, 0.9, 0.1))
```

Obviously, the optimal decision threshold which provide the best performance for most metrics is about 0.6. It yields the higer average return, sharpe ratio, and value-at-risk with lower volatility. Optimal decision threshold could be search again at values around 0.6.

##### Result: Peformance Metrics for QDA
```{r, message = FALSE, warning = FALSE}
perf_met_QDA <- data.frame(matrix(ncol = 6, nrow = 0)) # Initiate dataframe for performance metric of QDA with different parameter values
colnames(perf_met_QDA) <- c('threshold', 'avg_ret', 'vol', 'Sharpe_ratio', 'VaR_5', 'turn') # Set column names

# Load result which has already been computed to reduce runtime 
perf_met_QDA <- read.csv('QDA_02.csv', header=TRUE, sep=",")

# If haven't computed before, uncomment the following block
#------------------------------------------------------------------------------#
# for (threshold in seq(0.6, 0.7, 0.01)){ # Loop through set of threshold value (Hyperparameter tuning)
#   perf_met_QDA <- rbind(perf_met_QDA,  cbind(threshold, t(unlist(doQDA(threshold = threshold))))) # Train/test model and store perf. met.
#}
# write.csv(perf_met_QDA, file = "QDA_02.csv", row.names=FALSE) # Keep as csv file on local
#------------------------------------------------------------------------------#


plot_data <- melt(perf_met_QDA  ,  id.vars = 'threshold', variable.name = 'series') # Plot performance evaluation for different value of hy.par.
ggplot(plot_data, aes(threshold, value)) + 
  geom_line() + 
  facet_grid(series ~., scale = "free_y") +
  scale_x_continuous(breaks = seq(0.6, 0.7, 0.01))
```

 In conclusion, optimal decision threshold is choosen at 0.65 which yields the best of most performance metrics.
 Below is the full performance metrics at optimal decision threshold.


```{r, message = FALSE, warning = FALSE}
perf_met_QDA_opt <- perf_met_QDA %>% # Get performance metric for threshold = 0.6
  filter(threshold == 0.65) %>%
  dplyr::select(-(threshold))
perf_met_QDA_opt <- cbind("method" = "QDA", perf_met_QDA_opt) # Add method name
perf_met_QDA_opt
```


**Important**: Once again, decision threshold need to be set along trading days. A good rule of thumb is to set it at 0.5 or adjust it to suit the investor's confident on the model.

#### K-Nearest Neighbour

With KNN method, we factorised the future returns using 3 criteria: percentile and momentum with 1 and 3 periods, then the model is trained with the features selected by LASSO algorithm. Lastly, equal weight is applied to the stocks with 'buy' signals.

Nonetheless, there is a lost of information when strategies portfolio this way. 

1. For percentile categories, we need to sacrifice some first chunk of past data to create the percentile. Here, we start computing the signals after 36 first periods. (dataset = day 37-229)

```{r percentile outputs, echo = T}
outputs_P <- matrix(0, nrow = length(dates_all)-36, ncol = length(tick))
colnames(outputs_P) <- tick
for (i in 1:length(tick)) {
   class_P <- P_sig(as.matrix(returns[tick[i]][37:length(dates_all),]),
                    as.matrix(returns[tick[i]]), 37)
   outputs_P[,i] = class_P
}
```
2. We will lose the last observation for 1 momentum critaria as there is no future returns for the last period. (dataset = day 2-229)
```{r MoM1 outputs, echo = T}
outputs_MoM.1 <- matrix(0, nrow = length(dates_all)-1, ncol = length(tick))
colnames(outputs_MoM.1) <- tick
for (i in 1:length(tick)) {
  a_table <- cbind(as.matrix(returns[tick[i]][2:(length(dates_all)),]),
                  as.matrix(returns[tick[i]][1:(length(dates_all)-1),]))
  class_MoM.1 <- MoM.1_sig(a_table)
  outputs_MoM.1[,i] = class_MoM.1[,3]
}
```
3. For 3 momentum periods, it costs 7 observations. We lose the first 4 to consider previous momemtum for the first case and the last 3 for not having future momemtum. (dataset = day 5-226)
```{r MoM3 outputs, echo = T}
outputs_MoM.3 <- matrix(0, nrow = (length(dates_all)-7), ncol = length(tick) )
colnames(outputs_MoM.3) <- tick
for (i in 1:length(tick)) {
  zeros <- matrix(c(0,0,0), nrow = 3)
  F_vector <- rbind(zeros,as.matrix(returns[tick[i]][5:(length(dates_all)-3),]))
  P_vector <- as.matrix(returns[tick[i]][1:(length(dates_all)-4),])
  #P_vector <- P_vector[1:(nrow(P_vector)-1),]
  a_table <- cbind(F_vector, P_vector)
  class_MoM.3 <- MoM.3_sig(a_table)
  outputs_MoM.3[,i] = class_MoM.3[,3]
}
```

Therefore, for consistency, we reduce all dataset for categorical outputs to all start on day 37 and end on day 226.
```{r}
outputs_P <- outputs_P[1:190,]
outputs_MoM.1 <- outputs_MoM.1[36:225,]
outputs_MoM.3 <- outputs_MoM.3[33:nrow(outputs_MoM.3),]
```

```{r Setup oos horizon and returns for categories outputs, echo = T, warning = F, message = F}
oos_start_cat <- ((length(dates_all)-length(dates_oos))+1) - 37 
oos_length_cat <- 190 - oos_start_cat + 1
oos_ret_cat <- as_tibble(returns) %>% 
  slice(oos_start_cat:190) %>%
  dplyr::select(-Date) %>%                                   
  as.matrix()    

#due to trimming of dataframe for categorical outputs, the end date 
timeSlices_cat_train <- createTimeSlices(1:oos_start_cat-1, 
                   initialWindow = 36, horizon = 12, fixedWindow = TRUE)
```

##### Train KNN with percentile
```{r knn_p_cat, echo = T}
# signals_knn_P_cat <- matrix(0, nrow = oos_length_cat, ncol = length(tick)) #create table to store signals
# colnames(signals_knn_P_cat) <- tick
# for (i in 1:length(tick)){
#   a <- selectFeature2(tick[i],outputs_P) %>%
#     mutate(output = as.factor(output))          
#   a_pred <- train(output ~ ., data = a[unlist(timeSlices_train),], 
#                   method = "knn", 
#                   preProc = c("center", "scale"))
#   a_actual.future_test = a[oos_start_cat:nrow(a),] %>%
#     mutate(output = as.factor(output))          
#   a_pred.future_test <- predict(a_pred, a_actual.future_test)       # get prediction
#   signals_knn_P_cat[,i]  <- as.numeric(as.character(a_pred.future_test))   
#   # combine signals for all stocks
# }
# weight_knn_P_cat <- cal_port_weight(signals_knn_P_cat)
# ret_knn_P_cat<-rowSums(weight_knn_P_cat*oos_ret_cat)
# 
# write.csv(weight_knn_P_cat, file = "weight_knn_P_cat.csv")
# write.csv(ret_knn_P_cat, file = "ret_knn_P_cat.csv")

weight_knn_P_cat = read.csv(file = "weight_knn_P_cat.csv")
weight_knn_P_cat = weight_knn_P_cat[-1]

ret_knn_P_cat = read.csv(file = "ret_knn_P_cat.csv")
ret_knn_P_cat = ret_knn_P_cat[-1]
```

##### Train KNN with 1-period momentum
```{r knn_mom1_cat,echo= T}
# signals_knn_MoM.1_cat <- matrix(0, nrow = oos_length_cat, ncol = length(tick)) #create table to store signals
# colnames(signals_knn_MoM.1_cat) <- tick
# for (i in 1:length(tick)){
#   a <- selectFeature2(tick[i], outputs_MoM.1) %>%
#     mutate(output = as.factor(output))          
#   a_pred <- train(output ~ ., data = a[unlist(timeSlices_train),], 
#                   method = "knn", 
#                   preProc = c("center", "scale"))
#   a_actual.future_test = a[oos_start_cat:nrow(a),] %>%
#     mutate(output = as.factor(output))          
#   a_pred.future_test <- predict(a_pred, a_actual.future_test)       # get prediction
#   
#   signals_knn_MoM.1_cat[,i]  <- as.numeric(as.character(a_pred.future_test))   
#     # combine signals for all stocks
#   print(i)
# }
# weight_knn_MoM.1_cat <- cal_port_weight(signals_knn_MoM.1_cat)
# ret_knn_MoM.1_cat<-rowSums(weight_knn_MoM.1_cat*oos_ret_cat)
# 
# write.csv(weight_knn_MoM.1_cat, file = "weight_knn_MoM1_cat.csv")
# write.csv(ret_knn_MoM.1_cat, file = "ret_knn_MoM1_cat.csv")

weight_knn_MoM.1_cat = read.csv(file = "weight_knn_MoM1_cat.csv")
weight_knn_MoM.1_cat = weight_knn_MoM.1_cat[-1]

ret_knn_MoM.1_cat = read.csv(file = "ret_knn_MoM1_cat.csv")
ret_knn_MoM.1_cat = ret_knn_MoM.1_cat[-1]
```


##### Result: Peformance Metrics for KNN (Categorical)
```{r}
M_knn_P_cat <- perf_met2(as.matrix(ret_knn_P_cat),as.matrix(weight_knn_P_cat),dates_oos,
                    port_name = "KNN_Percentile",returns)
M_knn_MoM.1_cat <- perf_met2(as.matrix(ret_knn_MoM.1_cat),as.matrix(weight_knn_MoM.1_cat),dates_oos,
                        port_name = "KNN_Momentum.1", returns)

cat_knn<- rbind(M_knn_P_cat,M_knn_MoM.1_cat)
cat_knn
```

#### Logit Model
This model transform the outputs into probability. Therefore, the outputs are capped between 0 and 1.  However, it uses non-normal distribution to transform the output. It consumes less power to perform without the normal distribution constraint, but it costs fatter tails. There is no hyperparameter in this model.

##### Train logit model with percentile
```{r Logit_p_cat, echo = T, message= FALSE,results = 'hide'}
# signals_logit_P_cat <- matrix(0, nrow = oos_length_cat, ncol = length(tick)) #create table to store signals
# colnames(signals_logit_P_cat) <- tick
# for (i in 1:length(tick)){
#   a <- selectFeature2(tick[i],outputs_P) %>%
#     mutate(output = as.factor(output))         
#   a_pred <- train(output ~ ., data = a[unlist(timeSlices_train),], 
#                   method = "glm", 
#                   family = binomial(link = "logit"), 
#                   preProc = c("center", "scale"))
#   a_actual.future_test = a[oos_start_cat:nrow(a),] %>%
#     mutate(output = as.factor(output))         
#   a_pred.future_test <- predict(a_pred, a_actual.future_test)       # get prediction
#   
#   signals_logit_P_cat[,i]  <- as.numeric(as.character(a_pred.future_test))              
#   # combine signals for all stocks
#    
#  
# }
# weight_logit_P_cat <- cal_port_weight(signals_logit_P_cat)
# ret_logit_P_cat<-rowSums(weight_logit_P_cat*oos_ret_cat)
# 
# write.csv(weight_logit_P_cat, file = "weight_logit_P_cat.csv")
# write.csv(ret_logit_P_cat, file = "ret_logit_P_cat.csv")

weight_logit_P_cat = read.csv(file = "weight_logit_P_cat.csv")
weight_logit_P_cat = weight_logit_P_cat[-1]

ret_logit_P_cat = read.csv(file = "ret_logit_P_cat.csv")
ret_logit_P_cat = ret_logit_P_cat[-1]
```

##### Train logit model with 1-period momentum
```{r Logit_mom1_cat,echo = T, message= FALSE,results = 'hide'}
# signals_logit_MoM.1_cat <- matrix(0, nrow = oos_length_cat, ncol = length(tick)) #create table to store signals
# colnames(signals_logit_MoM.1_cat) <- tick
# for (i in 1:length(tick)){
#   a <- selectFeature2(tick[i], outputs_MoM.1) %>%
#     mutate(output = as.factor(output))         
#   a_pred <- train(output ~ ., data = a[unlist(timeSlices_train),], 
#                   method = "glm", 
#                   family = binomial(link = "logit"),  
#                   preProc = c("center", "scale"))
#   a_actual.future_test = a[oos_start_cat:nrow(a),]  %>%
#     mutate(output = as.factor(output))         
#   a_pred.future_test <- predict(a_pred, a_actual.future_test)       # get prediction
#   
#   signals_logit_MoM.1_cat[,i]  <- as.numeric(as.character(a_pred.future_test))              
#     # combine signals for all stocks
#    
# }
# weight_logit_MoM.1_cat <- cal_port_weight(signals_logit_MoM.1_cat)
# ret_logit_MoM.1_cat<-rowSums(weight_logit_MoM.1_cat*oos_ret_cat)

# write.csv(weight_logit_MoM.1_cat, file = "weight_logit_MoM.1_cat.csv")
# write.csv(ret_logit_MoM.1_cat, file = "ret_logit_MoM.1_cat.csv")

weight_logit_MoM.1_cat = read.csv(file = "weight_logit_MoM.1_cat.csv")
weight_logit_MoM.1_cat = weight_logit_MoM.1_cat[-1]

ret_logit_MoM.1_cat = read.csv(file = "ret_logit_MoM.1_cat.csv")
ret_logit_MoM.1_cat = ret_logit_MoM.1_cat[-1]
```

Since the condition is rarely meet, some test folds contain information missing in training folds. For this reason, we will skip this method with the assumption that the signals represent a lack in evidence of consistent increasing momentum over 3 periods. In other word, it can be interpreted that 3-period past information is irrelevant in determining significant signals.

For example, considering apple stock autocorrelation below. The figure shows that past information is barely indicative, except only for lag 15-period observation which might be too further away from the current time to have an economic meaning.

```{r}
acf(outputs_MoM.3[,'AAPL'])
```

##### Result: Performance Metrics for Logit Model
```{r}
M_logit_P_cat <- perf_met2(as.matrix(ret_logit_P_cat),as.matrix(weight_logit_P_cat),dates_oos,
                    port_name = "logit_Percentile",returns)
M_logit_MoM.1_cat <- perf_met2(as.matrix(ret_logit_MoM.1_cat),as.matrix(weight_logit_MoM.1_cat),dates_oos,
                        port_name = "logit_Momentum.1", returns)
cat_logit <- rbind(M_logit_P_cat,M_logit_MoM.1_cat)
cat_logit
```

#### Probit Model
Similar to logit model, this model transform also the outputs into probability, which cap outputs between 0 and 1.  On a contrary to logit model, it uses normal distribution to transform the output. 

##### Train probit model with percentile
```{r probit_p_cat, echo = T, message= FALSE,results = 'hide'}
# signals_probit_P_cat <- matrix(0, nrow = oos_length_cat, ncol = length(tick)) #create table to store signals
# colnames(signals_probit_P_cat) <- tick
# for (i in 1:length(tick)){
#   a <- selectFeature2(tick[i],outputs_P) %>%
#     mutate(output = as.factor(output))         
#   a_pred <- train(output ~ ., data = a[unlist(timeSlices_train),], 
#                   method = "glm", 
#                   family = binomial(link = "probit"), 
#                   preProc = c("center", "scale"))
#   a_actual.future_test = a[oos_start_cat:nrow(a),] %>%
#     mutate(output = as.factor(output))         
#   a_pred.future_test <- predict(a_pred, a_actual.future_test)       # get prediction
#   
#   signals_probit_P_cat[,i]  <- as.numeric(as.character(a_pred.future_test))              
#     # combine signals for all stocks
#    
# }
# weight_probit_P_cat <- cal_port_weight(signals_probit_P_cat)
# ret_probit_P_cat<-rowSums(weight_probit_P_cat*oos_ret_cat)

# write.csv(weight_probit_P_cat, file = "weight_probit_P_cat.csv")
# write.csv(ret_probit_P_cat, file = "ret_probit_P_cat.csv")

weight_probit_P_cat = read.csv(file = "weight_probit_P_cat.csv")
weight_probit_P_cat = weight_probit_P_cat[-1]

ret_probit_P_cat = read.csv(file = "ret_probit_P_cat.csv")
ret_probit_P_cat = ret_probit_P_cat[-1]
```


##### Train probit model with 1-period momentum
```{r probit_mom1_cat,echo = T, message= FALSE,results = 'hide'}
# signals_probit_MoM.1_cat <- matrix(0, nrow = oos_length_cat, ncol = length(tick)) #create table to store signals
# colnames(signals_probit_MoM.1_cat) <- tick
# for (i in 1:length(tick)){
#   a <- selectFeature2(tick[i], outputs_MoM.1) %>%
#     mutate(output = as.factor(output))         
#   a_pred <- train(output ~ ., data = a[unlist(timeSlices_train),], 
#                   method = "glm", 
#                   family = binomial(link = "probit"),  
#                   preProc = c("center", "scale"))
#   a_actual.future_test = a[oos_start_cat:nrow(a),]%>%
#     mutate(output = as.factor(output))         
#   a_pred.future_test <- predict(a_pred, a_actual.future_test)       # get prediction
#   
#   signals_probit_MoM.1_cat[,i] <- as.numeric(as.character(a_pred.future_test))              
#     # combine signals for all stocks
#    
# }
# weight_probit_MoM.1_cat <- cal_port_weight(signals_probit_MoM.1_cat)
# ret_probit_MoM.1_cat<-rowSums(weight_probit_MoM.1_cat*oos_ret_cat)
# 
# write.csv(weight_probit_MoM.1_cat, file = "weight_probit_MoM.1_cat.csv")
# write.csv(ret_probit_MoM.1_cat, file = "ret_probit_MoM.1_cat.csv")

weight_probit_MoM.1_cat = read.csv(file = "weight_probit_MoM.1_cat.csv")
weight_probit_MoM.1_cat = weight_probit_MoM.1_cat[-1]

ret_probit_MoM.1_cat = read.csv(file = "ret_probit_MoM.1_cat.csv")
ret_probit_MoM.1_cat = ret_probit_MoM.1_cat[-1]
```

addadd --> for the same reason, no mom3 for probit

##### Result: Performance Metrics for Probit Model
```{r}
M_probit_P_cat <- perf_met2(as.matrix(ret_probit_P_cat),as.matrix(weight_probit_P_cat),dates_oos,
                    port_name = "probit_Percentile",returns)
M_probit_MoM.1_cat <- perf_met2(as.matrix(ret_probit_MoM.1_cat),as.matrix(weight_probit_MoM.1_cat),dates_oos,
                        port_name = "probit_Momentum.1", returns)
cat_probit <- rbind(M_probit_P_cat,M_probit_MoM.1_cat)
cat_probit
```


# Analysis

To assess the performance of each machine learning technique, we focus on the aspect of portfolio. Accordingly, we score each aspect of performance metrics from 1 to 18 for 18 methods, where 1 is the best in that category and 18 is the worst. Thus, we call this score ‘rank’. Namely, rank 1 for average return, sharpe ratio and VaR are the highest, while rank 1 for volatility and turnover are the lowest.

Table 1: Summary table for performance metrics & rank


![Text](table1_summaryPerfMat.jpg)

From the table, our benchmark portfolio is quite high in the rank at #5. The all ML techniques which outperforms benchmark are categorical dependent variable with signals, except sparse portfolio which is #1. Thus, clearly, sparse portfolio outperforms signal models. 

Within the signal models, categorical method outshines the numerical. Consider KNN, which is used for both categorical and numerical outputs. It performs poorly in the numerical outputs in signal models, as seen in the bottom ranks. In contrast, it gives top-ten ranks for categorical one. This might be because KNN algorithm tries to separate outputs into regions based on features, thus; it does a better job for categorical outputs as there are less regions to work with. This is because numerical outputs have more various values which the model needs to learn and fit, hence these models might face overfitting problems. 

The rank shows that the simpler models, including probit and logit models, have low-end ranks. Whereas, the complex models, such as random forest and KNN, have high-end ranks. Nonetheless, the intermediate model in terms of complexity, QDA, performs the best among categorical model. This reflects that although the model tends to improve with higher complexity (lower variance), however it will face with high variance in the out-of-sample set. Therefore, only some level of complexity is appropriate, where it minimise both bias and variance. 

According to performances of our portfolios, it can be assigned to each investors who prefer different levels of risk. For instance, risk-aversive investors will be willing to allocate their capital on less volatile portfolios such as Sparse portfolio. On the other hand, investors who can bear more risks can choose to invest in more risky portfolios to gain higher returns such as Categorical KNN portfolios. Sparse portfolio and Equally weighted portfolio are recommended for passive portfolio managing investors which will cost significantly less transanction cost than other portfolio strategies. 
