Overview

In this vignette, I demonstrate how to utilize the RiskPortfolios library for portfolio selection. The analysis below relies on different estimation methods included with the library. Specifically, I demonstrate how to apply ten different estimation techniques for covariance matrices. Based on these estimates, I find a global minimum variance portfolio without position constraints. The back-testing procedure is conducted using Fama-French monthly industry portfolios.

Getting Started

Let us begin by loading all relevant packages and downloading the data needed.

library(RiskPortfolios)
library(lubridate)
library(xts)
library(moments)
library(ggplot2)
library(plotly)

The analysis relies on different Fama-French industry portfolios. The code below downloads the data directly from Ken French’s website and stacks the five industry datasets in a list.

file_i <- "https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/XYZ_Industry_Portfolios_CSV.zip"
ds_list <- list()
for (ind in c(5,10,17,30,48)) {
  
  FF_file <- gsub("XYZ",ind,file_i,)
  temp <- tempfile()
  download.file(FF_file,temp)
  unz_files <- unzip(temp)
  ds <- read.csv(unz_files,skip = 11)
  flag_obs <- grep("Equal",ds[,1],ignore.case = T)
  ds <- ds[1:(flag_obs-1),]
  names(ds)[1] <- "date"
  ds <- data.frame(apply(ds, 2, as.numeric))
  ds[,-1] <- ds[,-1]/100
  ds$date <- ceiling_date(ymd(ds$date*100+ 01),"m")-1
  ds <- ds[ds$date >= "1990-01-01",]
  ds_list <- c(ds_list,list(ds))
}

I set the data to start from 1990. In this case, we have the follow number of observation for each data

sapply(ds_list,nrow)
[1] 378 378 378 378 378
years_data <- lapply(ds_list, function(x) unique(year(x$date)) )
years_data <- Reduce(union,years_data)
total_n_years <- max(years_data) - min(years_data)

which corresponds roughly to 31 years of data.

Note: the key is to run a program on a single data from ds_list list, so that we can rerun it through the whole datasets easily.

The Portfolio Problem

For the portfolio, I consider the global minimum variance portfolio (GMVP). Without position constraints, the portfolio weights are given by \[\begin{equation} \mathbf{w}=\frac{\boldsymbol{\Sigma}^{-1}\mathbf{1}}{\mathbf{1}^{\top}\boldsymbol{\Sigma}^{-1}\mathbf{1}} \end{equation}\] where \(\Sigma\) denotes a non-singular \(N \times N\) covariance matrix with \(N\) is the number of assets in the portfolio. Additionally, \(\mathbf{1}\) denotes a column vector of ones.

In practice, the portfolio manager needs to estimate the covariance matrix. We denote the corresponding estimate using methodology \(m\) and sample size \(T\) by \(\hat{\boldsymbol{\Sigma}}_{m,T}\). For this reason, the GMVP is a function of both \(m\) and \(T\), such that \[\begin{equation} \hat{\mathbf{w}}_{m,T}=\frac{\hat{\boldsymbol{\Sigma}}_{m,T}^{-1}\mathbf{1}}{\mathbf{1}^{\top}\hat{\boldsymbol{\Sigma}}_{m,T}^{-1}\mathbf{1}} \end{equation}\]

In the following I set \(T = 60\) denoting 5 years of monthly data. For \(m\), I consider different methods from the RiskPortfolios library.

Implementation

To utilize RiskPortfolios package we need to set different controls in order to corresponds to each method of interest.

m_1 <- list(type = "naive")
m_2 <- list(type = "ewma")
m_3 <- list(type = "lw")
m_4 <- list(type = "factor")
m_5 <- list(type = "const")
m_6 <- list(type = "cor")
m_7 <- list(type = "oneparm")
m_8 <- list(type = "diag")
m_9 <- list(type = "large")
m_10 <- list(type = "bs")

m_list <- list(m_1,m_2,m_3,
               m_4,m_5,m_6,
               m_7,m_8,m_9,
               m_10)

names(m_list) <- c("Sample","EWMA","LW",
                   "Factor","Constant","Cor",
                   "Uniform","Diagonal","Large",
                   "Bayes_Stein")

Each of the above methods results in a covariance matrix estimate and, hence, a portfolio. For further information about the methods, see the reference manual of the RiskPortfolios library.

For the ease of exposition, let us write the following GMVP function

GMVP_fun <- function(S_i) {
  S_i_inv <- solve(S_i)
  d <- nrow(S_i)
  v_ones <- rep(1,d)
  w_hat <- S_i_inv%*%v_ones/(sum(S_i_inv))
  return(w_hat)
}

The code below constitutes a nested loop. For a given dataset index data_i and method m, the inner loop iterates through each month and conducts an rolling window estimation. The outer loops iterates through each method and dataset. The end result of the following is a list of out-of-sample portfolio returns denoted by R_mat_list.

R_mat_list <- list()

for(data_i in 1:length(ds_list)) { 
  cat("This is data ", data_i,"\n")
  ds_i <- ds_list[[data_i]]
  
  total_months <- unique(ceiling_date(ds_i$date,"m") - 1)
  T_months <- 60 # corresponsd to five years of data
  month_seq <- sort(as.character(total_months[T_months:length(total_months)]))
  
  r_p_ts_list <- list()
  for (m in 1:length(m_list)) {
    m_i <- m_list[[m]]
    
    r_p_ts <- c()
    for (i in 1:(length(month_seq) - 1)) {
      month_i <- month_seq[i]
      ds_i_sub <- ds_i[ds_i$date <= month_i,]
      ds_i_sub <- tail(ds_i_sub,T_months)
      R_sub <- as.matrix(ds_i_sub[,-1])
      S_i <- covEstimation(R_sub,control = m_i)
      w_i <- GMVP_fun(S_i)
      
      month_next <- month_seq[i+1]
      ds_i_next <- ds_i[ds_i$date <= month_next & ds_i$date > month_i,]
      R_next <- as.matrix(ds_i_next[,-1])
      r_p_i <- c(t(w_i)%*%t(R_next))
      r_p_ts <- c(r_p_ts,r_p_i)
    }
    
    names(r_p_ts) <- month_seq[2:length(month_seq)]
    r_p_ts <- as.xts(r_p_ts)
    r_p_ts_list <- c(r_p_ts_list,list(r_p_ts))
    
  }
  
  R_mat <- Reduce(merge,r_p_ts_list)
  names(R_mat) <- names(m_list)
  
  R_mat_list <- c(R_mat_list,list(R_mat))
}

Portfolio Performance

To summarize the performance of the above 10 decision rules, let us consider the following performance metrics.

my_sum <- function(x) {
  m <- mean(x)*12
  s <- sd(x)*sqrt(12)
  SR <- m/s
  s_neg <- sd(x[x < 0])*sqrt(12)
  Sort <- m/s_neg
  skew <- skewness(x)
  kurt <- kurtosis(x)
  VaR <- mean(x) - quantile(x,0.01)
  results <- c(m,s,SR,Sort,skew,kurt,VaR)
  return(results)
}

For each data index data_i, I create a summary matrix using the following function:

sum_data_i <- function(data_i) {
  R_mat_i <- R_mat_list[[data_i]]
  sum_i <- apply(R_mat_i,2,my_sum)
  rownames(sum_i) <- c("Mean","Std","Sharpe","Sortino","Skewness","Kurtosis","VaR")
  return(sum_i)
}

Finally, we stack all summary results in the following list

sum_data_list <- lapply(1:length(R_mat_list),sum_data_i)

For instance, the summary performance for the 48 industry portfolios is given by

data.frame(t(sum_data_list[[5]]))

Summary

As an overall summary, let us visualize a given performance metric for each data set over the whole ten estimation methods used in our analysis. To do so, I create the following function that takes a performance metric of interest as its main argument and returns a ggplot item:

ggplot_sum <- function(perf_of_interest) {
  
  ds.plot <- data.frame()
  for (data_i in 1:length(sum_data_list)) {
    sum_data_i <- sum_data_list[[data_i]]
    y <- sum_data_i[perf_of_interest,]
    ds.plot <- rbind(ds.plot,data.frame(Performance = y, Method = names(y), Data = data_i))
  }
  
  ds.plot$Data <- as.factor(ds.plot$Data)
  p <- ggplot(ds.plot,aes(x = Data, y = Performance, colour =Method )) 
  p <- p +  geom_point() 
  p <- p +  labs(x = "FF Data", y = perf_of_interest)
  return(p)
}

For example, to summarize the risk-adjusted returns, I plot the Sortino ratio while distinguishing the data and method used.

p <- ggplot_sum("Sortino")
ggplotly(p,width = 800, height = 600)

Among all methods, we observe that the “factor” approach yields consistent performance and ranks at the top three among all datasets. The other two that follow suit are the Constant and Uniform approaches.


lapply(sum_data_list, function(x)  names(sort(x["Sortino",],decreasing = T)[1:3]))
[[1]]
[1] "EWMA"     "Factor"   "Diagonal"

[[2]]
[1] "Factor"   "Constant" "Uniform" 

[[3]]
[1] "Constant" "Factor"   "Uniform" 

[[4]]
[1] "Factor"   "Constant" "Uniform" 

[[5]]
[1] "Constant" "Factor"   "Uniform" 

Note: by definition, the “factor” methods estimates the covariance matrix using a single factor by default. Since we do not define a specific risk factor, the source code of the library utilizes the factanal stats command for factor analysis. For further info, see this link. The “constant” method refers to the case in which the covariance matrix is shrank towards a covariance matrix with constant correlation. This method is parsimonious and, potentially, leads to reduction of estimation risk. Finally, the Uniform method corresponds to “oneparm” and is considered even more parsimonious, since the prior is given by a single one-parameter matrix. In this case, all variances are equal, while covariances are set to zero.

As a final summary, let us reconstruct the previous plot but using value-at-risk (VaR):

p <- ggplot_sum("VaR")
ggplotly(p,width = 800, height = 600)

In terms of VaR, we see that the Factor approach results in the lowest downside risk for all datasets, except the 5 Fama-French industry portfolios.

lapply(sum_data_list, function(x)  names(sort(x["VaR",],decreasing = F)[1:3]))
[[1]]
[1] "EWMA"        "Bayes_Stein" "Sample"     

[[2]]
[1] "Factor"   "Cor"      "Constant"

[[3]]
[1] "Factor"   "Cor"      "Constant"

[[4]]
[1] "Factor"   "Constant" "Cor"     

[[5]]
[1] "Factor"   "Constant" "Cor"     

Overall, the above results indicate that shrinkage techniques provide greater improvement when one is dealing with larger number of assets. This denotes the case of great estimation risk. Additionally, the fact that the factor model dominates other techniques, this indicates that the decision-maker can fully shrink the covariance matrix towards a prior without the need of determining the shrinkage intensity. Clearly, to generalize this conjecture, one needs to validate over different datasets, sample sizes, and estimation frequencies.

Concluding Remarks

This vignette provides a simple paradigm to evaluate the out-of-sample performance of different estimation methods for covariance matrices. Interested researches in proposing new methods for covariance matrices would greatly benefit from RiskPortfolios in setting benchmarks and testing new decision rules. Note that the above analysis does not take into account statistical significance. One potential remedy is to perform bootstrap and assess whether outperformance is not merely a statistical fluke. I leave this for a future endeavor.

---
title: 'Global Minimum Variance RiskPortfolios'
author: "Majeed Simaan"
date: "August 28, 2021"
output:
  html_notebook: default
  pdf_document: default
  word_document: default
fig_width: 20
---

# Overview
In this vignette, I demonstrate how to utilize the `RiskPortfolios` library for portfolio selection. The analysis below relies on different estimation methods included with the library. Specifically, I demonstrate how to apply ten different estimation techniques for covariance matrices. Based on these estimates, I find a global minimum variance portfolio without position constraints. The back-testing procedure is conducted using Fama-French monthly industry portfolios. 

# Getting Started
Let us begin by loading all relevant packages and downloading the data needed. 

```{r,message=F,warning=F}
library(RiskPortfolios)
library(lubridate)
library(xts)
library(moments)
library(ggplot2)
library(plotly)

```

The analysis relies on different Fama-French industry portfolios. The code below downloads the data directly from Ken French's website and stacks the five industry datasets in a list.
```{r,message=FALSE,warning=FALSE,results='hide'}
file_i <- "https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/XYZ_Industry_Portfolios_CSV.zip"
ds_list <- list()
for (ind in c(5,10,17,30,48)) {
  
  FF_file <- gsub("XYZ",ind,file_i,)
  temp <- tempfile()
  download.file(FF_file,temp)
  unz_files <- unzip(temp)
  ds <- read.csv(unz_files,skip = 11)
  flag_obs <- grep("Equal",ds[,1],ignore.case = T)
  ds <- ds[1:(flag_obs-1),]
  names(ds)[1] <- "date"
  ds <- data.frame(apply(ds, 2, as.numeric))
  ds[,-1] <- ds[,-1]/100
  ds$date <- ceiling_date(ymd(ds$date*100+ 01),"m")-1
  ds <- ds[ds$date >= "1990-01-01",]
  ds_list <- c(ds_list,list(ds))
}

```

I set the data to start from 1990. In this case, we have the follow number of observation for each data
```{r}
sapply(ds_list,nrow)
```
```{r}
years_data <- lapply(ds_list, function(x) unique(year(x$date)) )
years_data <- Reduce(union,years_data)
total_n_years <- max(years_data) - min(years_data)
```
which corresponds roughly to `r total_n_years` years of data. 

**Note**: the key is to run a program on a single data from `ds_list` list, so that we can rerun it through the whole datasets easily.

# The Portfolio Problem
For the portfolio, I consider the global minimum variance portfolio (GMVP). Without position constraints, the portfolio weights are given by 
\begin{equation}
\mathbf{w}=\frac{\boldsymbol{\Sigma}^{-1}\mathbf{1}}{\mathbf{1}^{\top}\boldsymbol{\Sigma}^{-1}\mathbf{1}}
\end{equation}
where  $\Sigma$ denotes a non-singular $N \times N$ covariance matrix with $N$ is the number of assets in the portfolio. Additionally, $\mathbf{1}$ denotes a column vector of ones.

In practice, the portfolio manager needs to estimate the covariance matrix. We denote the corresponding estimate using  methodology $m$ and sample size $T$ by $\hat{\boldsymbol{\Sigma}}_{m,T}$.  For this reason, the GMVP is a function of both $m$ and $T$, such that 
\begin{equation}
\hat{\mathbf{w}}_{m,T}=\frac{\hat{\boldsymbol{\Sigma}}_{m,T}^{-1}\mathbf{1}}{\mathbf{1}^{\top}\hat{\boldsymbol{\Sigma}}_{m,T}^{-1}\mathbf{1}}
\end{equation}

In the following I set $T = 60$ denoting 5 years of monthly data. For $m$, I consider different methods from the `RiskPortfolios` library.

# Implementation
To utilize `RiskPortfolios` package we need to set different controls in order to corresponds to each method of interest.
```{r}
m_1 <- list(type = "naive")
m_2 <- list(type = "ewma")
m_3 <- list(type = "lw")
m_4 <- list(type = "factor")
m_5 <- list(type = "const")
m_6 <- list(type = "cor")
m_7 <- list(type = "oneparm")
m_8 <- list(type = "diag")
m_9 <- list(type = "large")
m_10 <- list(type = "bs")

m_list <- list(m_1,m_2,m_3,
               m_4,m_5,m_6,
               m_7,m_8,m_9,
               m_10)

names(m_list) <- c("Sample","EWMA","LW",
                   "Factor","Constant","Cor",
                   "Uniform","Diagonal","Large",
                   "Bayes_Stein")

```
Each of the above methods results in a covariance matrix estimate and, hence, a portfolio. For further information about the methods, see the reference manual of the `RiskPortfolios` library.

For the ease of exposition, let us write the following GMVP function
```{r}
GMVP_fun <- function(S_i) {
  S_i_inv <- solve(S_i)
  d <- nrow(S_i)
  v_ones <- rep(1,d)
  w_hat <- S_i_inv%*%v_ones/(sum(S_i_inv))
  return(w_hat)
}
```

The code below constitutes a nested loop. For a given dataset index `data_i` and method `m`, the inner loop iterates through each month and conducts an rolling window estimation. The outer loops iterates through each method and dataset. The end result of the following is a list of out-of-sample portfolio returns denoted by `R_mat_list`.

```{r,message=FALSE,warning=FALSE,results='hide'}
R_mat_list <- list()

for(data_i in 1:length(ds_list)) { 
  cat("This is data ", data_i,"\n")
  ds_i <- ds_list[[data_i]]
  
  total_months <- unique(ceiling_date(ds_i$date,"m") - 1)
  T_months <- 60 # corresponsd to five years of data
  month_seq <- sort(as.character(total_months[T_months:length(total_months)]))
  
  r_p_ts_list <- list()
  for (m in 1:length(m_list)) {
    m_i <- m_list[[m]]
    
    r_p_ts <- c()
    for (i in 1:(length(month_seq) - 1)) {
      month_i <- month_seq[i]
      ds_i_sub <- ds_i[ds_i$date <= month_i,]
      ds_i_sub <- tail(ds_i_sub,T_months)
      R_sub <- as.matrix(ds_i_sub[,-1])
      S_i <- covEstimation(R_sub,control = m_i)
      w_i <- GMVP_fun(S_i)
      
      month_next <- month_seq[i+1]
      ds_i_next <- ds_i[ds_i$date <= month_next & ds_i$date > month_i,]
      R_next <- as.matrix(ds_i_next[,-1])
      r_p_i <- c(t(w_i)%*%t(R_next))
      r_p_ts <- c(r_p_ts,r_p_i)
    }
    
    names(r_p_ts) <- month_seq[2:length(month_seq)]
    r_p_ts <- as.xts(r_p_ts)
    r_p_ts_list <- c(r_p_ts_list,list(r_p_ts))
    
  }
  
  R_mat <- Reduce(merge,r_p_ts_list)
  names(R_mat) <- names(m_list)
  
  R_mat_list <- c(R_mat_list,list(R_mat))
}



```

# Portfolio Performance
To summarize the performance of the above 10 decision rules, let us consider the following performance metrics.
```{r}
my_sum <- function(x) {
  m <- mean(x)*12
  s <- sd(x)*sqrt(12)
  SR <- m/s
  s_neg <- sd(x[x < 0])*sqrt(12)
  Sort <- m/s_neg
  skew <- skewness(x)
  kurt <- kurtosis(x)
  VaR <- mean(x) - quantile(x,0.01)
  results <- c(m,s,SR,Sort,skew,kurt,VaR)
  return(results)
}
```

For each data index `data_i`, I create a summary matrix using the following function:
```{r}
sum_data_i <- function(data_i) {
  R_mat_i <- R_mat_list[[data_i]]
  sum_i <- apply(R_mat_i,2,my_sum)
  rownames(sum_i) <- c("Mean","Std","Sharpe","Sortino","Skewness","Kurtosis","VaR")
  return(sum_i)
}

```

Finally, we stack all summary results in the following list
```{r}
sum_data_list <- lapply(1:length(R_mat_list),sum_data_i)
```
For instance, the summary performance for the 48 industry portfolios is given by
```{r}
data.frame(t(sum_data_list[[5]]))
```

## Summary
As an overall summary, let us visualize a given performance metric for each data set over the whole ten estimation methods used in our analysis. To do so, I create the following function that takes a performance metric of interest as its main argument and returns a `ggplot` item:
```{r}
ggplot_sum <- function(perf_of_interest) {
  
  ds.plot <- data.frame()
  for (data_i in 1:length(sum_data_list)) {
    sum_data_i <- sum_data_list[[data_i]]
    y <- sum_data_i[perf_of_interest,]
    ds.plot <- rbind(ds.plot,data.frame(Performance = y, Method = names(y), Data = data_i))
  }
  
  ds.plot$Data <- as.factor(ds.plot$Data)
  p <- ggplot(ds.plot,aes(x = Data, y = Performance, colour =Method )) 
  p <- p +  geom_point() 
  p <- p +  labs(x = "FF Data", y = perf_of_interest)
  return(p)
}

```

For example, to summarize the risk-adjusted returns, I plot the Sortino ratio while distinguishing the data and method used. 
```{r}
p <- ggplot_sum("Sortino")
ggplotly(p,width = 800, height = 600)
```
Among all methods, we observe that the "factor" approach yields consistent performance and ranks at the top three among all datasets. The other two that follow suit are the Constant and Uniform approaches. 
```{r}

lapply(sum_data_list, function(x)  names(sort(x["Sortino",],decreasing = T)[1:3]))

```

**Note**: by definition, the "factor" methods estimates the covariance matrix using a single factor by default. Since we do not define a specific risk factor, the source [code](https://rdrr.io/cran/RiskPortfolios/src/R/covEstimation.R) of the library utilizes the `factanal` stats command for factor analysis. For further info, see this [link](https://www.geo.fu-berlin.de/en/v/soga/Geodata-analysis/factor-analysis/A-simple-example-of-FA/index.html). The "constant" method refers to the case in which the covariance matrix is shrank towards a covariance matrix with constant correlation. This method is parsimonious and, potentially, leads to reduction of estimation risk. Finally, the Uniform method corresponds to "oneparm" and is considered  even more parsimonious, since the prior is given by a single one-parameter matrix. In this case, all variances are equal, while covariances are set to zero. 

As a final summary, let us reconstruct the previous plot but using value-at-risk (VaR):
```{r}
p <- ggplot_sum("VaR")
ggplotly(p,width = 800, height = 600)
```
In terms of VaR, we see that the Factor approach results in the lowest downside risk for all datasets, except the 5 Fama-French industry portfolios. 

```{r}
lapply(sum_data_list, function(x)  names(sort(x["VaR",],decreasing = F)[1:3]))
```
Overall, the above results indicate that shrinkage techniques provide greater improvement when one is dealing with larger number of assets. This denotes the case of great estimation risk. Additionally, the fact that the factor model dominates other techniques, this indicates that the decision-maker can fully shrink the covariance matrix towards a prior without the need of  determining the shrinkage intensity. Clearly, to generalize this conjecture, one needs to validate over different datasets, sample sizes, and estimation frequencies. 


# Concluding Remarks
This vignette provides a simple paradigm to evaluate the out-of-sample performance of different estimation methods for covariance matrices. Interested researches in proposing new methods for covariance matrices would greatly benefit from `RiskPortfolios` in setting benchmarks and testing new decision rules. Note that the above analysis does not take into account statistical significance. One potential remedy is to perform bootstrap and assess whether outperformance is not merely a statistical fluke. I leave this for a future endeavor.


