This project is on incorporating factor alphas using the Black-Litterman model. We start by with a market weighted portfolio based on the S&P500 index where the equity returns are based on our prior belief, or in other words that implied by the Capital Asset Pricing Model.

Next, we aim on incorporating the factor alphas from our previous research based on the subjective views obtained. These views are then incorporated to our prior beliefs through a Bayesian updating process in order to obtain a vector of posterior expected returns. Finally, based on the posterior returns, we calculate the mean - variance optimal portfolio.

Let’s load the libraries first

library(tidyverse)

Let’s import the data now. The data is monthly time series (ranging from 197001 till 201606) of raw variables required to construct the factors for each constituent in S&P500 prevalent at the point of time.

Now, load the data and view the summary

Data <- read_csv("Data.csv")
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   gvkey = col_character(),
##   tic = col_character(),
##   sp500 = col_integer(),
##   date = col_integer()
## )
## See spec(...) for full column specifications.
Data
## # A tibble: 483,208 x 23
##    gvkey  tic   prccm prchm prclm  trfm  trt1m cshom fyearq  fqtr  ceqq
##    <chr>  <chr> <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl> <dbl>
##  1 001010 4165A  46.4  49.9  46.1  1.38  -5.36   NaN   1969     4  192.
##  2 001010 4165A  48.1  48.4  44.4  1.40   5.07   NaN   1969     4  192.
##  3 001010 4165A  49.2  50.2  47.5  1.40   2.34   NaN   1969     4  192.
##  4 001010 4165A  45.5  51.5  42.5  1.40  -7.61   NaN   1970     1  NaN 
##  5 001010 4165A  38.5  45.1  35.6  1.42 -14.1    NaN   1970     1  NaN 
##  6 001010 4165A  38.5  42.6  38.5  1.42   0      NaN   1970     1  NaN 
##  7 001010 4165A  38.0  39.7  36.7  1.42  -1.30   NaN   1970     2  NaN 
##  8 001010 4165A  39.5  39.7  36.6  1.44   5.53   NaN   1970     2  NaN 
##  9 001010 4165A  43.2  43.5  39.0  1.44   9.49   NaN   1970     2  NaN 
## 10 001010 4165A  41.5  44.2  41.0  1.44  -4.05   NaN   1970     3  NaN 
## # ... with 483,198 more rows, and 12 more variables: cshoq <dbl>,
## #   dlttq <dbl>, epsfxq <dbl>, ibcomq <dbl>, ibq <dbl>, saleq <dbl>,
## #   seqq <dbl>, dvpsxq <dbl>, FY_1 <dbl>, LTG <dbl>, sp500 <int>,
## #   date <int>
summary(Data)
##     gvkey               tic                prccm             prchm        
##  Length:483208      Length:483208      Min.   :   0.00   Min.   :   0.00  
##  Class :character   Class :character   1st Qu.:  18.62   1st Qu.:  20.00  
##  Mode  :character   Mode  :character   Median :  30.12   Median :  32.25  
##                                        Mean   :  38.02   Mean   :  40.38  
##                                        3rd Qu.:  46.00   3rd Qu.:  48.68  
##                                        Max.   :4736.00   Max.   :5059.00  
##                                        NA's   :2722      NA's   :3955     
##      prclm              trfm             trt1m         
##  Min.   :   0.00   Min.   :  1.000   Min.   : -99.866  
##  1st Qu.:  17.12   1st Qu.:  1.081   1st Qu.:  -4.412  
##  Median :  28.00   Median :  1.537   Median :   0.944  
##  Mean   :  35.34   Mean   :  2.989   Mean   :   1.574  
##  3rd Qu.:  42.88   3rd Qu.:  2.683   3rd Qu.:   6.647  
##  Max.   :4467.00   Max.   :265.278   Max.   :9900.000  
##  NA's   :3947      NA's   :1655      NA's   :2330      
##      cshom               fyearq           fqtr            ceqq          
##  Min.   :0.000e+00   Min.   :1969    Min.   :1.000   Min.   :-136332.0  
##  1st Qu.:8.616e+07   1st Qu.:1980    1st Qu.:1.000   1st Qu.:    233.1  
##  Median :1.747e+08   Median :1992    Median :2.000   Median :    826.8  
##  Mean   :4.185e+08   Mean   :1992    Mean   :2.497   Mean   :   3142.2  
##  3rd Qu.:3.818e+08   3rd Qu.:2003    3rd Qu.:3.000   3rd Qu.:   2444.8  
##  Max.   :2.906e+10   Max.   :2017    Max.   :4.000   Max.   : 263025.0  
##  NA's   :309953      NA's   :12011   NA's   :12011   NA's   :35017      
##      cshoq              dlttq               epsfxq        
##  Min.   :    0.00   Min.   :      0.0   Min.   :-990.000  
##  1st Qu.:   17.30   1st Qu.:     71.6   1st Qu.:   0.240  
##  Median :   54.61   Median :    400.5   Median :   0.510  
##  Mean   :  199.03   Mean   :   3936.1   Mean   :   0.917  
##  3rd Qu.:  161.90   3rd Qu.:   1686.1   3rd Qu.:   0.888  
##  Max.   :29206.44   Max.   :3160074.0   Max.   :3330.000  
##  NA's   :29333      NA's   :36312       NA's   :26967     
##      ibcomq               ibq                saleq         
##  Min.   :-62059.00   Min.   :-61659.00   Min.   :-25623.0  
##  1st Qu.:     4.00   1st Qu.:     4.20   1st Qu.:   141.4  
##  Median :    21.03   Median :    21.93   Median :   459.6  
##  Mean   :    97.22   Mean   :   100.67   Mean   :  1607.6  
##  3rd Qu.:    79.00   3rd Qu.:    80.97   3rd Qu.:  1329.8  
##  Max.   :127090.00   Max.   :127140.00   Max.   :131565.0  
##  NA's   :14323       NA's   :14302       NA's   :15674     
##       seqq              dvpsxq             FY_1             LTG         
##  Min.   :-91142.0   Min.   :  0.000   Min.   :-94.12   Min.   :-708.10  
##  1st Qu.:   243.2   1st Qu.:  0.000   1st Qu.:  0.12   1st Qu.:   9.47  
##  Median :   857.8   Median :  0.160   Median :  0.32   Median :  12.25  
##  Mean   :  3253.2   Mean   :  0.228   Mean   :  0.44   Mean   :  13.46  
##  3rd Qu.:  2511.0   3rd Qu.:  0.340   3rd Qu.:  0.63   3rd Qu.:  15.86  
##  Max.   :270083.0   Max.   :216.350   Max.   : 29.88   Max.   : 310.90  
##  NA's   :28793      NA's   :12993     NA's   :263543   NA's   :273410   
##      sp500             date         
##  Min.   :0.0000   Min.   :19700131  
##  1st Qu.:0.0000   1st Qu.:19800831  
##  Median :1.0000   Median :19920131  
##  Mean   :0.5815   Mean   :19918209  
##  3rd Qu.:1.0000   3rd Qu.:20030531  
##  Max.   :1.0000   Max.   :20161130  
## 

Defining the columns of the dataset.

  1. gvkey: Unique identifier for compustat database
  2. tic: Stock ticker
  3. prccm: Adjusted closing price
  4. prchm: Adjusted monthly high
  5. prclm: Adjusted monthly low
  6. trfm: Total return factor
  7. trt1m: Monthly total return
  8. cshom: Adjusted monthly common shares outstanding
  9. fyearq: Fiscal year
  10. fqtr: Quarter #
  11. ceqq: Quarterly common equity in millions
  12. cshoq: Adjusted quarterly common shares outstanding
  13. dlttq: Quarterly long term debt
  14. epsfxq: Quarterly earnings per share
  15. ibcomq: Quarterly income before extraordinary items, available for Common Equity in millions
  16. ibq: Quarterly income before extraordinary items
  17. saleq: Quarterly sales
  18. seqq: Shareholders equity - total
  19. dvpsxq: Quarterly dividends per share
  20. LTG: Analyst consensus for long term growth
  21. sp500: Indicator where the stock was part of the index
  22. date: Date

Let’s do some data pre-processing. * Keeping the dates up to six digits * Taking out the ticker and sp500 since they are no longer relevant * Reorganizing the dataset a bit

Date_dat <- round(Data[, "date"]/100)

Data <- Data %>% select(- tic, - date) %>%
                 bind_cols(Date_dat) %>%
                 select(date, gvkey:sp500)

Data
## # A tibble: 483,208 x 22
##      date gvkey  prccm prchm prclm  trfm  trt1m cshom fyearq  fqtr  ceqq
##     <dbl> <chr>  <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl> <dbl>
##  1 197001 001010  46.4  49.9  46.1  1.38  -5.36   NaN   1969     4  192.
##  2 197002 001010  48.1  48.4  44.4  1.40   5.07   NaN   1969     4  192.
##  3 197003 001010  49.2  50.2  47.5  1.40   2.34   NaN   1969     4  192.
##  4 197004 001010  45.5  51.5  42.5  1.40  -7.61   NaN   1970     1  NaN 
##  5 197005 001010  38.5  45.1  35.6  1.42 -14.1    NaN   1970     1  NaN 
##  6 197006 001010  38.5  42.6  38.5  1.42   0      NaN   1970     1  NaN 
##  7 197007 001010  38.0  39.7  36.7  1.42  -1.30   NaN   1970     2  NaN 
##  8 197008 001010  39.5  39.7  36.6  1.44   5.53   NaN   1970     2  NaN 
##  9 197009 001010  43.2  43.5  39.0  1.44   9.49   NaN   1970     2  NaN 
## 10 197010 001010  41.5  44.2  41.0  1.44  -4.05   NaN   1970     3  NaN 
## # ... with 483,198 more rows, and 11 more variables: cshoq <dbl>,
## #   dlttq <dbl>, epsfxq <dbl>, ibcomq <dbl>, ibq <dbl>, saleq <dbl>,
## #   seqq <dbl>, dvpsxq <dbl>, FY_1 <dbl>, LTG <dbl>, sp500 <int>
summary(Data)
##       date           gvkey               prccm             prchm        
##  Min.   :197001   Length:483208      Min.   :   0.00   Min.   :   0.00  
##  1st Qu.:198008   Class :character   1st Qu.:  18.62   1st Qu.:  20.00  
##  Median :199201   Mode  :character   Median :  30.12   Median :  32.25  
##  Mean   :199182                      Mean   :  38.02   Mean   :  40.38  
##  3rd Qu.:200305                      3rd Qu.:  46.00   3rd Qu.:  48.68  
##  Max.   :201611                      Max.   :4736.00   Max.   :5059.00  
##                                      NA's   :2722      NA's   :3955     
##      prclm              trfm             trt1m         
##  Min.   :   0.00   Min.   :  1.000   Min.   : -99.866  
##  1st Qu.:  17.12   1st Qu.:  1.081   1st Qu.:  -4.412  
##  Median :  28.00   Median :  1.537   Median :   0.944  
##  Mean   :  35.34   Mean   :  2.989   Mean   :   1.574  
##  3rd Qu.:  42.88   3rd Qu.:  2.683   3rd Qu.:   6.647  
##  Max.   :4467.00   Max.   :265.278   Max.   :9900.000  
##  NA's   :3947      NA's   :1655      NA's   :2330      
##      cshom               fyearq           fqtr            ceqq          
##  Min.   :0.000e+00   Min.   :1969    Min.   :1.000   Min.   :-136332.0  
##  1st Qu.:8.616e+07   1st Qu.:1980    1st Qu.:1.000   1st Qu.:    233.1  
##  Median :1.747e+08   Median :1992    Median :2.000   Median :    826.8  
##  Mean   :4.185e+08   Mean   :1992    Mean   :2.497   Mean   :   3142.2  
##  3rd Qu.:3.818e+08   3rd Qu.:2003    3rd Qu.:3.000   3rd Qu.:   2444.8  
##  Max.   :2.906e+10   Max.   :2017    Max.   :4.000   Max.   : 263025.0  
##  NA's   :309953      NA's   :12011   NA's   :12011   NA's   :35017      
##      cshoq              dlttq               epsfxq        
##  Min.   :    0.00   Min.   :      0.0   Min.   :-990.000  
##  1st Qu.:   17.30   1st Qu.:     71.6   1st Qu.:   0.240  
##  Median :   54.61   Median :    400.5   Median :   0.510  
##  Mean   :  199.03   Mean   :   3936.1   Mean   :   0.917  
##  3rd Qu.:  161.90   3rd Qu.:   1686.1   3rd Qu.:   0.888  
##  Max.   :29206.44   Max.   :3160074.0   Max.   :3330.000  
##  NA's   :29333      NA's   :36312       NA's   :26967     
##      ibcomq               ibq                saleq         
##  Min.   :-62059.00   Min.   :-61659.00   Min.   :-25623.0  
##  1st Qu.:     4.00   1st Qu.:     4.20   1st Qu.:   141.4  
##  Median :    21.03   Median :    21.93   Median :   459.6  
##  Mean   :    97.22   Mean   :   100.67   Mean   :  1607.6  
##  3rd Qu.:    79.00   3rd Qu.:    80.97   3rd Qu.:  1329.8  
##  Max.   :127090.00   Max.   :127140.00   Max.   :131565.0  
##  NA's   :14323       NA's   :14302       NA's   :15674     
##       seqq              dvpsxq             FY_1             LTG         
##  Min.   :-91142.0   Min.   :  0.000   Min.   :-94.12   Min.   :-708.10  
##  1st Qu.:   243.2   1st Qu.:  0.000   1st Qu.:  0.12   1st Qu.:   9.47  
##  Median :   857.8   Median :  0.160   Median :  0.32   Median :  12.25  
##  Mean   :  3253.2   Mean   :  0.228   Mean   :  0.44   Mean   :  13.46  
##  3rd Qu.:  2511.0   3rd Qu.:  0.340   3rd Qu.:  0.63   3rd Qu.:  15.86  
##  Max.   :270083.0   Max.   :216.350   Max.   : 29.88   Max.   : 310.90  
##  NA's   :28793      NA's   :12993     NA's   :263543   NA's   :273410   
##      sp500       
##  Min.   :0.0000  
##  1st Qu.:0.0000  
##  Median :1.0000  
##  Mean   :0.5815  
##  3rd Qu.:1.0000  
##  Max.   :1.0000  
## 

Before we can start the calculation process, it might be feasible to create panel data tables for each raw variable which will be used to calculate these factors. This way we can directly call the variables from the local environment. We can use the “spread” function from the “tidyr” package by taking our initial data and create variable tables based on the gvkey.

for (i in 3:ncol(Data)) {
    
          samp <- Data[, c(1, 2, i)]
          colnames(samp)[3] <- c("x")
          assign(colnames(Data)[i], as.matrix(spread(samp, gvkey, x))) 
                          
}

For building the Black Litterman framework, we need to start out with a value weighted portfolio with equities which follow the criteria below

  1. Belonging to the S&P500 index in Dec 2010
  2. Has more than 200 trading observations prior to Dec 2010, this condition will help us develop a better behaved variance - covariance matrix when estimated using historical data
  3. Has no more than 50 missing returns during the out of sample evaluation period (201101 - 201606)

First, we apply the non-member filter,

t_0 <- which(sp500[, 1] == 201012)
const_201012 <- which(sp500[t_0, ] == 1.0000)
trt1m_t0 <- trt1m[,const_201012]

let’s find the equities satisfying the second condition,

# creating a separate vector for storing the dates 

date_id <- prccm[, 1]

# this loop will go over each constituent column for 201012 and check to see if they don't have sufficient trading observations
trd_id  <- matrix(NA, nrow = dim(trt1m_t0)[2])
for (i in 1:dim(trt1m_t0)[2]){
  
  if ((sum(is.na(trt1m_t0[1:(t_0 - 1), i])) <= 291) == FALSE) {
            
    trd_id[i,1] <- colnames(trt1m_t0)[i]        
    
    } else {
      
    trd_id[i,1] <- NA        

  }
}

trd_id <- trd_id[complete.cases(trd_id)]
trt1m_t0 <- trt1m_t0[, which(!colnames(trt1m_t0)%in%trd_id)]

Now the third condition,

T_0 <- which(sp500[, 1] == 201606) # index to find 201606 or the last oos

# the loop below will detect all the equities which have more than 50 missing # observations

trd_idpost <- matrix(nrow = dim(trt1m_t0)[2])
for (j in 1:dim(trt1m_t0)[2]) {
  
  if ((sum(is.na(trt1m_t0[(t_0+1):T_0, j])) <= 50) == FALSE){
          
    trd_idpost[j,1] <- colnames(trt1m_t0)[j]        
          
  } else {
    
    trd_idpost[j,1] <- NA
    
    }
}

trd_idpost <- trd_idpost[complete.cases(trd_idpost),]
trt1m_t0 <- trt1m_t0[, which(!colnames(trt1m_t0)%in%trd_idpost)]

const_id <- colnames(trt1m_t0)

Next, we use the three significant factors we found from our previous research and calculate them here to derive an expectation for all three.

The significant factors we found earlier were,

  1. Size: Market Cap \[LogMCap_{i,t} = \log(CSHOM_{i,t}*CloseM_{i,t})\]

  2. Valuation: book to price \[BP_{i,t} = \frac{CEQQ_{i,t}}{CSHOQ_{i,t}*{CloseM_{i,t}}}\]

  3. Price momentum \[HL1M_i,_t = \frac{HighM_{i,t} - CloseM_{i,t}}{CloseM_{i,t} - LowM_{i,t}}\]

Calculating Log of Market Cap

# Log Market Cap

logmcap <- function(cshom, prccm){
  
  LogMCap <- log(cshom*prccm)
  
  return(LogMCap)

}

LogMCap <- logmcap(cshom = cshom[, const_id], prccm = prccm[, const_id])
as.tibble(LogMCap)
## # A tibble: 563 x 378
##    `001075` `001078` `001161` `001177` `001209` `001300` `001356` `001380`
##       <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
##  1      NaN      NaN       NA      NaN      NaN      NaN      NaN      NaN
##  2      NaN      NaN       NA      NaN      NaN      NaN      NaN      NaN
##  3      NaN      NaN       NA      NaN      NaN      NaN      NaN      NaN
##  4      NaN      NaN       NA      NaN      NaN      NaN      NaN      NaN
##  5      NaN      NaN       NA      NaN      NaN      NaN      NaN      NaN
##  6      NaN      NaN       NA      NaN      NaN      NaN      NaN      NaN
##  7      NaN      NaN       NA      NaN      NaN      NaN      NaN      NaN
##  8      NaN      NaN       NA      NaN      NaN      NaN      NaN      NaN
##  9      NaN      NaN       NA      NaN      NaN      NaN      NaN      NaN
## 10      NaN      NaN       NA      NaN      NaN      NaN      NaN      NaN
## # ... with 553 more rows, and 370 more variables: `001408` <dbl>,
## #   `001440` <dbl>, `001447` <dbl>, `001449` <dbl>, `001487` <dbl>,
## #   `001602` <dbl>, `001632` <dbl>, `001661` <dbl>, `001678` <dbl>,
## #   `001690` <dbl>, `001704` <dbl>, `001722` <dbl>, `001878` <dbl>,
## #   `001891` <dbl>, `001913` <dbl>, `001920` <dbl>, `001976` <dbl>,
## #   `001988` <dbl>, `001995` <dbl>, `002019` <dbl>, `002044` <dbl>,
## #   `002086` <dbl>, `002111` <dbl>, `002136` <dbl>, `002154` <dbl>,
## #   `002184` <dbl>, `002269` <dbl>, `002285` <dbl>, `002312` <dbl>,
## #   `002403` <dbl>, `002435` <dbl>, `002547` <dbl>, `002574` <dbl>,
## #   `002663` <dbl>, `002710` <dbl>, `002751` <dbl>, `002783` <dbl>,
## #   `002817` <dbl>, `002884` <dbl>, `002968` <dbl>, `002991` <dbl>,
## #   `003024` <dbl>, `003062` <dbl>, `003107` <dbl>, `003121` <dbl>,
## #   `003144` <dbl>, `003170` <dbl>, `003221` <dbl>, `003226` <dbl>,
## #   `003231` <dbl>, `003243` <dbl>, `003310` <dbl>, `003336` <dbl>,
## #   `003362` <dbl>, `003413` <dbl>, `003439` <dbl>, `003505` <dbl>,
## #   `003532` <dbl>, `003650` <dbl>, `003735` <dbl>, `003813` <dbl>,
## #   `003835` <dbl>, `003897` <dbl>, `003905` <dbl>, `003980` <dbl>,
## #   `004029` <dbl>, `004040` <dbl>, `004058` <dbl>, `004060` <dbl>,
## #   `004066` <dbl>, `004087` <dbl>, `004093` <dbl>, `004094` <dbl>,
## #   `004108` <dbl>, `004145` <dbl>, `004199` <dbl>, `004213` <dbl>,
## #   `004242` <dbl>, `004321` <dbl>, `004423` <dbl>, `004430` <dbl>,
## #   `004494` <dbl>, `004503` <dbl>, `004510` <dbl>, `004517` <dbl>,
## #   `004560` <dbl>, `004598` <dbl>, `004611` <dbl>, `004640` <dbl>,
## #   `004674` <dbl>, `004699` <dbl>, `004723` <dbl>, `004737` <dbl>,
## #   `004818` <dbl>, `004839` <dbl>, `004843` <dbl>, `004885` <dbl>,
## #   `004988` <dbl>, `004990` <dbl>, `005046` <dbl>, ...

Calculating Book to Price

# Valuation: Book to Price

bp_ratio <- function(ceqq, cshoq, prccm){
  
  t <- nrow(ceqq)
  BP <- ceqq/(cshoq*prccm)
  return(BP[13:t,])
  
}

BP <- bp_ratio(ceqq = ceqq[, const_id], cshoq = cshoq[, const_id], prccm = prccm[, const_id])

as.tibble(BP)
## # A tibble: 551 x 378
##    `001075` `001078` `001161` `001177` `001209` `001300` `001356` `001380`
##       <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
##  1      NaN    0.242       NA  NaN      NaN        1.15     0.837    0.708
##  2      NaN    0.239       NA  NaN      NaN        0.995    0.860    0.675
##  3      NaN    0.258       NA  NaN      NaN        1.06     0.880    0.593
##  4      NaN  NaN           NA    0.808  NaN      NaN      NaN      NaN    
##  5      NaN  NaN           NA    0.861  NaN      NaN      NaN      NaN    
##  6      NaN  NaN           NA    0.796  NaN      NaN      NaN      NaN    
##  7      NaN  NaN           NA    0.772  NaN      NaN      NaN      NaN    
##  8      NaN  NaN           NA    0.720  NaN      NaN      NaN      NaN    
##  9      NaN  NaN           NA    0.749  NaN      NaN      NaN      NaN    
## 10      NaN  NaN           NA    0.823    0.550  NaN      NaN      NaN    
## # ... with 541 more rows, and 370 more variables: `001408` <dbl>,
## #   `001440` <dbl>, `001447` <dbl>, `001449` <dbl>, `001487` <dbl>,
## #   `001602` <dbl>, `001632` <dbl>, `001661` <dbl>, `001678` <dbl>,
## #   `001690` <dbl>, `001704` <dbl>, `001722` <dbl>, `001878` <dbl>,
## #   `001891` <dbl>, `001913` <dbl>, `001920` <dbl>, `001976` <dbl>,
## #   `001988` <dbl>, `001995` <dbl>, `002019` <dbl>, `002044` <dbl>,
## #   `002086` <dbl>, `002111` <dbl>, `002136` <dbl>, `002154` <dbl>,
## #   `002184` <dbl>, `002269` <dbl>, `002285` <dbl>, `002312` <dbl>,
## #   `002403` <dbl>, `002435` <dbl>, `002547` <dbl>, `002574` <dbl>,
## #   `002663` <dbl>, `002710` <dbl>, `002751` <dbl>, `002783` <dbl>,
## #   `002817` <dbl>, `002884` <dbl>, `002968` <dbl>, `002991` <dbl>,
## #   `003024` <dbl>, `003062` <dbl>, `003107` <dbl>, `003121` <dbl>,
## #   `003144` <dbl>, `003170` <dbl>, `003221` <dbl>, `003226` <dbl>,
## #   `003231` <dbl>, `003243` <dbl>, `003310` <dbl>, `003336` <dbl>,
## #   `003362` <dbl>, `003413` <dbl>, `003439` <dbl>, `003505` <dbl>,
## #   `003532` <dbl>, `003650` <dbl>, `003735` <dbl>, `003813` <dbl>,
## #   `003835` <dbl>, `003897` <dbl>, `003905` <dbl>, `003980` <dbl>,
## #   `004029` <dbl>, `004040` <dbl>, `004058` <dbl>, `004060` <dbl>,
## #   `004066` <dbl>, `004087` <dbl>, `004093` <dbl>, `004094` <dbl>,
## #   `004108` <dbl>, `004145` <dbl>, `004199` <dbl>, `004213` <dbl>,
## #   `004242` <dbl>, `004321` <dbl>, `004423` <dbl>, `004430` <dbl>,
## #   `004494` <dbl>, `004503` <dbl>, `004510` <dbl>, `004517` <dbl>,
## #   `004560` <dbl>, `004598` <dbl>, `004611` <dbl>, `004640` <dbl>,
## #   `004674` <dbl>, `004699` <dbl>, `004723` <dbl>, `004737` <dbl>,
## #   `004818` <dbl>, `004839` <dbl>, `004843` <dbl>, `004885` <dbl>,
## #   `004988` <dbl>, `004990` <dbl>, `005046` <dbl>, ...

Calculating Price Momentum

hl1m <- function(prchm, prccm, prclm){
  
  t <- nrow(prchm)
  
  HL1M <- (prchm - prccm)/(prccm - prclm)
  rownames(HL1M) <- prchm[,1]
  
  return(HL1M[13:t,])
  
}

HL1M <- hl1m(prchm = prchm[, const_id], prccm = prccm[, const_id], prclm = prclm[, const_id])
as.tibble(HL1M)
## # A tibble: 551 x 378
##    `001075` `001078` `001161` `001177` `001209` `001300` `001356` `001380`
##       <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
##  1    0.714   0.0833       NA   0.0405   0.659     1.30    0.0526   0     
##  2  Inf       2            NA   0.400    0.273     0.172   2.00     0.667 
##  3    0.3     2.5          NA   0.429    0.0250   16.0     2.25     0.0274
##  4    3.25    1.44         NA   2.36     2.00      0.179   0.667    0.226 
##  5    5       3.3          NA   2.00     4.00      3.00    0.556    1.45  
##  6    0       4.91         NA   0        1.00      0.235   3.69     0.0868
##  7    1       2.15         NA   0.423   12.0     Inf       4.56     4.20  
##  8   10       0.216        NA   0.400    0.151     0       0.690    4.26  
##  9    1       0.455        NA   3.85    17.0       6      70.0      2.71  
## 10    5.5     1.36         NA   1.52     2.43      4       4.29    11.6   
## # ... with 541 more rows, and 370 more variables: `001408` <dbl>,
## #   `001440` <dbl>, `001447` <dbl>, `001449` <dbl>, `001487` <dbl>,
## #   `001602` <dbl>, `001632` <dbl>, `001661` <dbl>, `001678` <dbl>,
## #   `001690` <dbl>, `001704` <dbl>, `001722` <dbl>, `001878` <dbl>,
## #   `001891` <dbl>, `001913` <dbl>, `001920` <dbl>, `001976` <dbl>,
## #   `001988` <dbl>, `001995` <dbl>, `002019` <dbl>, `002044` <dbl>,
## #   `002086` <dbl>, `002111` <dbl>, `002136` <dbl>, `002154` <dbl>,
## #   `002184` <dbl>, `002269` <dbl>, `002285` <dbl>, `002312` <dbl>,
## #   `002403` <dbl>, `002435` <dbl>, `002547` <dbl>, `002574` <dbl>,
## #   `002663` <dbl>, `002710` <dbl>, `002751` <dbl>, `002783` <dbl>,
## #   `002817` <dbl>, `002884` <dbl>, `002968` <dbl>, `002991` <dbl>,
## #   `003024` <dbl>, `003062` <dbl>, `003107` <dbl>, `003121` <dbl>,
## #   `003144` <dbl>, `003170` <dbl>, `003221` <dbl>, `003226` <dbl>,
## #   `003231` <dbl>, `003243` <dbl>, `003310` <dbl>, `003336` <dbl>,
## #   `003362` <dbl>, `003413` <dbl>, `003439` <dbl>, `003505` <dbl>,
## #   `003532` <dbl>, `003650` <dbl>, `003735` <dbl>, `003813` <dbl>,
## #   `003835` <dbl>, `003897` <dbl>, `003905` <dbl>, `003980` <dbl>,
## #   `004029` <dbl>, `004040` <dbl>, `004058` <dbl>, `004060` <dbl>,
## #   `004066` <dbl>, `004087` <dbl>, `004093` <dbl>, `004094` <dbl>,
## #   `004108` <dbl>, `004145` <dbl>, `004199` <dbl>, `004213` <dbl>,
## #   `004242` <dbl>, `004321` <dbl>, `004423` <dbl>, `004430` <dbl>,
## #   `004494` <dbl>, `004503` <dbl>, `004510` <dbl>, `004517` <dbl>,
## #   `004560` <dbl>, `004598` <dbl>, `004611` <dbl>, `004640` <dbl>,
## #   `004674` <dbl>, `004699` <dbl>, `004723` <dbl>, `004737` <dbl>,
## #   `004818` <dbl>, `004839` <dbl>, `004843` <dbl>, `004885` <dbl>,
## #   `004988` <dbl>, `004990` <dbl>, `005046` <dbl>, ...

Long - Short Factor Portfolio

Next, we calculate the Qspreads based on these factors for our portfolio.

# Sorting stocks into quintiles based on the values 
# of various stock characteristics 
# This function sorts returns based on factors and 
# also calculates the Q spread
sort_q <- function(factor_data, trt1m, flip = F){
  
  t <- nrow(factor_data)
  k <- ncol(factor_data)
  
  top20 <- data.frame(matrix(data = NA, nrow = t, ncol = k))
  colnames(top20) <- colnames(factor_data)
  

  bot20 <- data.frame(matrix(data = NA, nrow = t, ncol = k))
  colnames(bot20) <- colnames(factor_data)
   
  for (i in 1:(t-1)){
          
    sort_rank <- sort(rank(factor_data[i, is.finite(factor_data[i,])], 
                           na.last = NA, ties.method = "min"))
    ID_bot20 <- names(sort_rank[which(sort_rank <= quantile(sort_rank, 
                                                           .20))])
    ID_top20 <- names(sort_rank[which(sort_rank >= quantile(sort_rank,
                                                           .80))])
          
      for (q in 1:length(ID_bot20)) {
        
               bot20[i+1, ID_bot20[q]] <- trt1m[i+1, ID_bot20[q]]
      }
      for (j in 1:length(ID_top20)){
        
              top20[i+1, ID_top20[j]] <- trt1m[i+1, ID_top20[j]]
      }
  }  
  
  # Evaluate the portfolio performance and calcualte Q-spread
  PP_QS <- data.frame(matrix(data = NA, nrow = t, ncol = 3))
  colnames(PP_QS) <- c("top20", "bot20", "Q-spread")

  for (i in 1:(t-1)){
      
            PP_QS[i, 1] <- mean(as.numeric(top20[i, 1:k]), na.rm = TRUE)
            PP_QS[i, 2] <- mean(as.numeric(bot20[i, 1:k]), na.rm = TRUE)
  }
  
  if (flip == T){
    
    PP_QS[, 3] <- PP_QS[, 2] - PP_QS[, 1]
    
    } else {
  # Q-spread
    PP_QS[, 3] <- PP_QS[, 1] - PP_QS[, 2]
  
    }
  
  return (PP_QS)
  
}


Q_HL1M <- sort_q(HL1M, trt1m_t0, flip = F)
Q_LogMCap <- sort_q(LogMCap, trt1m_t0, flip = T)
Q_BP <- sort_q(BP, trt1m_t0, flip = F)

Let’s inspect the calculated Q spreads for each factor at each point in time.

QSpreads <- cbind(Q_HL1M[,"Q-spread"], 
                      Q_BP[,"Q-spread"],
                      Q_LogMCap[,"Q-spread"])
## Warning in cbind(Q_HL1M[, "Q-spread"], Q_BP[, "Q-spread"], Q_LogMCap[, "Q-
## spread"]): number of rows of result is not a multiple of vector length (arg
## 1)
colnames(QSpreads) <- c("HL1M", "BP", "LogMCap")

for(i in 1:ncol(QSpreads)){
  
  barplot(height = QSpreads[,i], names.arg = ceqq[, 1], 
       main = colnames(QSpreads)[i], 
       xlab = "Time", ylab = "QSpread", col = i)

}

The Black Litterman Model - Overview

Now, the first step in the BL framework is to calculate the implied excess equilibrium returns denoted by \(\Pi\), the formula can be written as,

\[\Pi = \delta\Sigma w_{mkt}\] Where, \(\delta\) is the risk averson coefficient \(\Sigma\) is the covariance matrix of excess returns \(w_{mkt}\) is the market capitalization weight

Next, the combined return vector \(E[R]\) can be calculated based on the following formula,

\[E[R] = [(\tau\Sigma)^{-1}+P^{'}\Omega^{-1}P]^{-1}[(\tau\Sigma)^{-1}\Pi+P^{'}\Omega^{-1}Q]\] Where, \(\tau\) is a scalar \(P\) is a matrix which identifies assets involved in the views \(\Omega\) is a diagonal covariance matrix of error terms from the expressed views representing uncertainty embedded within each views \(Q\) is the vector for views

After calculating the posterior returns or \(E[R]\), the final Black Litterman weights \(w_{BL}\) for the portfolio can be calculated by solving the first equation for weights, \[w_{BL} = (\delta\Sigma)^{-1}E[R]\] # Calculating Priori Inputs To start off, let’s assume values for \(\tau\) and \(\delta\),

  tau <- 0.5
  delta <- 10

Next, let’s calculate the covariance matrix based on the historical returns of the portfolio.

trt1m_t0 <- trt1m_t0/100
cov_mat <- corpcor::make.positive.definite(cov(trt1m_t0[1:t_0, ],
                                               use = "pairwise.complete.obs"))

Next, we calculate the market weights using a simple function,

MKT_weight <- function(prccm, cshom, ids){
  

  mkt_weight <- matrix(prccm*cshom/sum(prccm*cshom, na.rm = T))
  rownames(mkt_weight) <- ids
  
  return(mkt_weight)
}

mkt_weight <- MKT_weight(prccm[t_0, const_id], cshom[t_0, const_id], const_id)

barplot(mkt_weight[,1], names.arg = const_id, main = "Portfolio Weights")

Now, we can calculate \(\Pi\) vector. Let’s also plot them.

eq_ret <- delta*cov_mat%*%mkt_weight
hist(eq_ret[,1], main = "Implied Equilibrium Excess Returns", 
     breaks = 20, 
     xlab = "Returns",
     col = "blue",
     border = "green")

Moments for Factor Views

Next, we move on to calculating the return vector and the covariance matrix of errors for the factor views (\(Q\) & \(\Omega\) respectively)

# constructing the Q vector
mu_HL1M <- mean(Q_HL1M[1:t_0, "Q-spread"], na.rm = TRUE)
mu_LogMCap <- mean(Q_LogMCap[1:t_0, "Q-spread"], na.rm = TRUE)
mu_BP <- mean(Q_BP[1:t_0, "Q-spread"], na.rm = TRUE)

Q <- rbind(mu_HL1M, mu_LogMCap, mu_BP)
  
# Constructing the omega matrix  
Omega1 <- var(Q_HL1M[1:t_0, "Q-spread"],na.rm = TRUE)
Omega2 <- var(Q_LogMCap[1:t_0, "Q-spread"],na.rm = TRUE)
Omega3 <- var(Q_BP[1:t_0, "Q-spread"],na.rm = TRUE)

Omega <- matrix(0,3,3)
diag(Omega)<- c(Omega1,Omega2,Omega3)

Constructing View Matrix

Next, we calculate the matrix for views, The matrix \(P\) is created based on equal weights for stocks belonging to the long side or short side of each investment factor portfolio.

# weights for the view matrix P
P_weights <- function(factor_data, flip = F){
      
  k <- length(factor_data)
  P_weights <- data.frame(matrix(data = NA, ncol = k))
  names(P_weights) <- names(factor_data)

  sort_rank <- sort(rank(factor_data[is.finite(factor_data)],
                         na.last = NA,ties.method = "min"))
  bot_20 <- names(sort_rank[which(sort_rank <= quantile(sort_rank, 0.20))])
  top_20 <- names(sort_rank[which(sort_rank >= quantile(sort_rank, 0.80))])
  
            
  for (j in 1:length(bot_20)){
    
    if (flip == F){
      
      P_weights[,bot_20[j]] <- -1/length(bot_20)
          
        }else{
            
      P_weights[,bot_20[j]] <- 1/length(bot_20)
            
    }
  }
  
  for (k in 1:length(top_20)){
    
    if (flip == F){      
      
      P_weights[,top_20[k]] <- 1/length(top_20)
      
      }else{
        
      P_weights[,top_20[k]] <- -1/length(top_20)
      
      }
  }
      
      return(P_weights)
} 

P1 <- P_weights(HL1M[t_0, ], flip = F)
P2 <- P_weights(LogMCap[t_0, ], flip = T)
P3 <- P_weights(BP[t_0, ], flip = F)

P1[is.na(P1)] <- 0
P2[is.na(P2)] <- 0
P3[is.na(P3)] <- 0

P <- rbind(P1,P2,P3)

Black - Litterman Weights

Next, we calculate the posterior return vector and the corresponding Black-Litterman weights.

# Black Litterman weights, combined return vector and updated covariance matrix

BL_weights <- function(tau, delta, cov_mat, P, Omega, eq_ret, Q){
  
  cov_post <- solve(solve(tau*cov_mat)+t(as.matrix(P))%*%solve(Omega)%*%as.matrix(P))
  er_bl <- cov_post%*%(solve(tau*cov_mat)%*%eq_ret+t(P)%*%solve(Omega)%*%Q)
  BL_weights <- solve(delta*cov_mat)%*%er_bl
  
  return(BL_weights)
}

bl_weights <- BL_weights(tau, delta, cov_mat, P, Omega, eq_ret, Q)

barplot(bl_weights[,1], names.arg = const_id, main = "Black-Litterman Weights")

Out of Time Performance

For both the Black-Litterman updated portfolio and the value-weighted S&P 500 market portfolio, the portfolio weights were determined on Dec 2010, and held fixed over the subsequent evaluation period from 201101 to 201606

ix <- nrow(trt1m_t0) - t_0
ret_mat <- matrix(NA, nrow = ix, ncol = 2)
  
for (i in 1:ix){ 
    
    ret_mat[i, 1] <- sum(mkt_weight*trt1m_t0[t_0 + i, const_id], na.rm = T)
    eq_ret <- delta*cov_mat%*%mkt_weight
    ret_mat[i, 2] <- sum(bl_weights*trt1m_t0[t_0 + i, const_id], na.rm = T)
}

plot(y = ret_mat[,1], x = date_id[(t_0+1):length(date_id)], 
     main = "Out of Sample Performance - Value Weighted Portfolio", 
     type = "o", col = "red", lty = 1)

plot(y = ret_mat[,2], x = date_id[(t_0+1):length(date_id)], 
     main = "Out of Sample Performance - Black Litterman Portfolio", 
     type = "o", col = "green", lty = 1)

Performance Stats

performance_summary <- data.frame(matrix(NA, nrow = 2, ncol = 4))
colnames(performance_summary) <- c("Mean", "Std_dev", "Sharpe_ratio", "t_stats")
rownames(performance_summary) <- c("Value_weighted", "Black_litterman")
performance_summary[, 1] <- apply(ret_mat, 2, mean, na.rm = T)
performance_summary[, 2] <- apply(ret_mat, 2, sd, na.rm = T)
performance_summary[, 3] <- performance_summary[, 1]/performance_summary[, 2]
performance_summary[, 4] <- c(t.test(ret_mat[,1])$statistic , t.test(ret_mat[,2])$statistic)

performance_summary
##                        Mean    Std_dev Sharpe_ratio  t_stats
## Value_weighted  0.009509087 0.03361586     0.282875 2.383547
## Black_litterman 0.009545313 0.03347553     0.285143 2.402658

We can see that the portfolio constructed based on black litterman weights have a slightly higher OOT mean return (significant under 5% significance level) than the value weighted portfolio. On a risk adjusted basis, the value weighted portfolio also marginally underperforms the black-litterman portfolio. There are several uncertainties which may arise which may affect the performance. One reason could be relatively high uncertainty embedded within the factor views compared to only following the market factor. This could be dependent on the length of time window picked for calculating the average spreads to be incorporated in the posterior calculation and the possible relatively higher volatility associated with the historical performance of the factor portfolios. Also, different assumptions regarding the model inputs (tau and delta, to be specific) can also affect how the posterior weights are determined.