This project is on incorporating factor alphas using the Black-Litterman model. We start by with a market weighted portfolio based on the S&P500 index where the equity returns are based on our prior belief, or in other words that implied by the Capital Asset Pricing Model.
Next, we aim on incorporating the factor alphas from our previous research based on the subjective views obtained. These views are then incorporated to our prior beliefs through a Bayesian updating process in order to obtain a vector of posterior expected returns. Finally, based on the posterior returns, we calculate the mean - variance optimal portfolio.
Let’s load the libraries first
library(tidyverse)
Let’s import the data now. The data is monthly time series (ranging from 197001 till 201606) of raw variables required to construct the factors for each constituent in S&P500 prevalent at the point of time.
Now, load the data and view the summary
Data <- read_csv("Data.csv")
## Parsed with column specification:
## cols(
## .default = col_double(),
## gvkey = col_character(),
## tic = col_character(),
## sp500 = col_integer(),
## date = col_integer()
## )
## See spec(...) for full column specifications.
Data
## # A tibble: 483,208 x 23
## gvkey tic prccm prchm prclm trfm trt1m cshom fyearq fqtr ceqq
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 001010 4165A 46.4 49.9 46.1 1.38 -5.36 NaN 1969 4 192.
## 2 001010 4165A 48.1 48.4 44.4 1.40 5.07 NaN 1969 4 192.
## 3 001010 4165A 49.2 50.2 47.5 1.40 2.34 NaN 1969 4 192.
## 4 001010 4165A 45.5 51.5 42.5 1.40 -7.61 NaN 1970 1 NaN
## 5 001010 4165A 38.5 45.1 35.6 1.42 -14.1 NaN 1970 1 NaN
## 6 001010 4165A 38.5 42.6 38.5 1.42 0 NaN 1970 1 NaN
## 7 001010 4165A 38.0 39.7 36.7 1.42 -1.30 NaN 1970 2 NaN
## 8 001010 4165A 39.5 39.7 36.6 1.44 5.53 NaN 1970 2 NaN
## 9 001010 4165A 43.2 43.5 39.0 1.44 9.49 NaN 1970 2 NaN
## 10 001010 4165A 41.5 44.2 41.0 1.44 -4.05 NaN 1970 3 NaN
## # ... with 483,198 more rows, and 12 more variables: cshoq <dbl>,
## # dlttq <dbl>, epsfxq <dbl>, ibcomq <dbl>, ibq <dbl>, saleq <dbl>,
## # seqq <dbl>, dvpsxq <dbl>, FY_1 <dbl>, LTG <dbl>, sp500 <int>,
## # date <int>
summary(Data)
## gvkey tic prccm prchm
## Length:483208 Length:483208 Min. : 0.00 Min. : 0.00
## Class :character Class :character 1st Qu.: 18.62 1st Qu.: 20.00
## Mode :character Mode :character Median : 30.12 Median : 32.25
## Mean : 38.02 Mean : 40.38
## 3rd Qu.: 46.00 3rd Qu.: 48.68
## Max. :4736.00 Max. :5059.00
## NA's :2722 NA's :3955
## prclm trfm trt1m
## Min. : 0.00 Min. : 1.000 Min. : -99.866
## 1st Qu.: 17.12 1st Qu.: 1.081 1st Qu.: -4.412
## Median : 28.00 Median : 1.537 Median : 0.944
## Mean : 35.34 Mean : 2.989 Mean : 1.574
## 3rd Qu.: 42.88 3rd Qu.: 2.683 3rd Qu.: 6.647
## Max. :4467.00 Max. :265.278 Max. :9900.000
## NA's :3947 NA's :1655 NA's :2330
## cshom fyearq fqtr ceqq
## Min. :0.000e+00 Min. :1969 Min. :1.000 Min. :-136332.0
## 1st Qu.:8.616e+07 1st Qu.:1980 1st Qu.:1.000 1st Qu.: 233.1
## Median :1.747e+08 Median :1992 Median :2.000 Median : 826.8
## Mean :4.185e+08 Mean :1992 Mean :2.497 Mean : 3142.2
## 3rd Qu.:3.818e+08 3rd Qu.:2003 3rd Qu.:3.000 3rd Qu.: 2444.8
## Max. :2.906e+10 Max. :2017 Max. :4.000 Max. : 263025.0
## NA's :309953 NA's :12011 NA's :12011 NA's :35017
## cshoq dlttq epsfxq
## Min. : 0.00 Min. : 0.0 Min. :-990.000
## 1st Qu.: 17.30 1st Qu.: 71.6 1st Qu.: 0.240
## Median : 54.61 Median : 400.5 Median : 0.510
## Mean : 199.03 Mean : 3936.1 Mean : 0.917
## 3rd Qu.: 161.90 3rd Qu.: 1686.1 3rd Qu.: 0.888
## Max. :29206.44 Max. :3160074.0 Max. :3330.000
## NA's :29333 NA's :36312 NA's :26967
## ibcomq ibq saleq
## Min. :-62059.00 Min. :-61659.00 Min. :-25623.0
## 1st Qu.: 4.00 1st Qu.: 4.20 1st Qu.: 141.4
## Median : 21.03 Median : 21.93 Median : 459.6
## Mean : 97.22 Mean : 100.67 Mean : 1607.6
## 3rd Qu.: 79.00 3rd Qu.: 80.97 3rd Qu.: 1329.8
## Max. :127090.00 Max. :127140.00 Max. :131565.0
## NA's :14323 NA's :14302 NA's :15674
## seqq dvpsxq FY_1 LTG
## Min. :-91142.0 Min. : 0.000 Min. :-94.12 Min. :-708.10
## 1st Qu.: 243.2 1st Qu.: 0.000 1st Qu.: 0.12 1st Qu.: 9.47
## Median : 857.8 Median : 0.160 Median : 0.32 Median : 12.25
## Mean : 3253.2 Mean : 0.228 Mean : 0.44 Mean : 13.46
## 3rd Qu.: 2511.0 3rd Qu.: 0.340 3rd Qu.: 0.63 3rd Qu.: 15.86
## Max. :270083.0 Max. :216.350 Max. : 29.88 Max. : 310.90
## NA's :28793 NA's :12993 NA's :263543 NA's :273410
## sp500 date
## Min. :0.0000 Min. :19700131
## 1st Qu.:0.0000 1st Qu.:19800831
## Median :1.0000 Median :19920131
## Mean :0.5815 Mean :19918209
## 3rd Qu.:1.0000 3rd Qu.:20030531
## Max. :1.0000 Max. :20161130
##
Defining the columns of the dataset.
Let’s do some data pre-processing. * Keeping the dates up to six digits * Taking out the ticker and sp500 since they are no longer relevant * Reorganizing the dataset a bit
Date_dat <- round(Data[, "date"]/100)
Data <- Data %>% select(- tic, - date) %>%
bind_cols(Date_dat) %>%
select(date, gvkey:sp500)
Data
## # A tibble: 483,208 x 22
## date gvkey prccm prchm prclm trfm trt1m cshom fyearq fqtr ceqq
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 197001 001010 46.4 49.9 46.1 1.38 -5.36 NaN 1969 4 192.
## 2 197002 001010 48.1 48.4 44.4 1.40 5.07 NaN 1969 4 192.
## 3 197003 001010 49.2 50.2 47.5 1.40 2.34 NaN 1969 4 192.
## 4 197004 001010 45.5 51.5 42.5 1.40 -7.61 NaN 1970 1 NaN
## 5 197005 001010 38.5 45.1 35.6 1.42 -14.1 NaN 1970 1 NaN
## 6 197006 001010 38.5 42.6 38.5 1.42 0 NaN 1970 1 NaN
## 7 197007 001010 38.0 39.7 36.7 1.42 -1.30 NaN 1970 2 NaN
## 8 197008 001010 39.5 39.7 36.6 1.44 5.53 NaN 1970 2 NaN
## 9 197009 001010 43.2 43.5 39.0 1.44 9.49 NaN 1970 2 NaN
## 10 197010 001010 41.5 44.2 41.0 1.44 -4.05 NaN 1970 3 NaN
## # ... with 483,198 more rows, and 11 more variables: cshoq <dbl>,
## # dlttq <dbl>, epsfxq <dbl>, ibcomq <dbl>, ibq <dbl>, saleq <dbl>,
## # seqq <dbl>, dvpsxq <dbl>, FY_1 <dbl>, LTG <dbl>, sp500 <int>
summary(Data)
## date gvkey prccm prchm
## Min. :197001 Length:483208 Min. : 0.00 Min. : 0.00
## 1st Qu.:198008 Class :character 1st Qu.: 18.62 1st Qu.: 20.00
## Median :199201 Mode :character Median : 30.12 Median : 32.25
## Mean :199182 Mean : 38.02 Mean : 40.38
## 3rd Qu.:200305 3rd Qu.: 46.00 3rd Qu.: 48.68
## Max. :201611 Max. :4736.00 Max. :5059.00
## NA's :2722 NA's :3955
## prclm trfm trt1m
## Min. : 0.00 Min. : 1.000 Min. : -99.866
## 1st Qu.: 17.12 1st Qu.: 1.081 1st Qu.: -4.412
## Median : 28.00 Median : 1.537 Median : 0.944
## Mean : 35.34 Mean : 2.989 Mean : 1.574
## 3rd Qu.: 42.88 3rd Qu.: 2.683 3rd Qu.: 6.647
## Max. :4467.00 Max. :265.278 Max. :9900.000
## NA's :3947 NA's :1655 NA's :2330
## cshom fyearq fqtr ceqq
## Min. :0.000e+00 Min. :1969 Min. :1.000 Min. :-136332.0
## 1st Qu.:8.616e+07 1st Qu.:1980 1st Qu.:1.000 1st Qu.: 233.1
## Median :1.747e+08 Median :1992 Median :2.000 Median : 826.8
## Mean :4.185e+08 Mean :1992 Mean :2.497 Mean : 3142.2
## 3rd Qu.:3.818e+08 3rd Qu.:2003 3rd Qu.:3.000 3rd Qu.: 2444.8
## Max. :2.906e+10 Max. :2017 Max. :4.000 Max. : 263025.0
## NA's :309953 NA's :12011 NA's :12011 NA's :35017
## cshoq dlttq epsfxq
## Min. : 0.00 Min. : 0.0 Min. :-990.000
## 1st Qu.: 17.30 1st Qu.: 71.6 1st Qu.: 0.240
## Median : 54.61 Median : 400.5 Median : 0.510
## Mean : 199.03 Mean : 3936.1 Mean : 0.917
## 3rd Qu.: 161.90 3rd Qu.: 1686.1 3rd Qu.: 0.888
## Max. :29206.44 Max. :3160074.0 Max. :3330.000
## NA's :29333 NA's :36312 NA's :26967
## ibcomq ibq saleq
## Min. :-62059.00 Min. :-61659.00 Min. :-25623.0
## 1st Qu.: 4.00 1st Qu.: 4.20 1st Qu.: 141.4
## Median : 21.03 Median : 21.93 Median : 459.6
## Mean : 97.22 Mean : 100.67 Mean : 1607.6
## 3rd Qu.: 79.00 3rd Qu.: 80.97 3rd Qu.: 1329.8
## Max. :127090.00 Max. :127140.00 Max. :131565.0
## NA's :14323 NA's :14302 NA's :15674
## seqq dvpsxq FY_1 LTG
## Min. :-91142.0 Min. : 0.000 Min. :-94.12 Min. :-708.10
## 1st Qu.: 243.2 1st Qu.: 0.000 1st Qu.: 0.12 1st Qu.: 9.47
## Median : 857.8 Median : 0.160 Median : 0.32 Median : 12.25
## Mean : 3253.2 Mean : 0.228 Mean : 0.44 Mean : 13.46
## 3rd Qu.: 2511.0 3rd Qu.: 0.340 3rd Qu.: 0.63 3rd Qu.: 15.86
## Max. :270083.0 Max. :216.350 Max. : 29.88 Max. : 310.90
## NA's :28793 NA's :12993 NA's :263543 NA's :273410
## sp500
## Min. :0.0000
## 1st Qu.:0.0000
## Median :1.0000
## Mean :0.5815
## 3rd Qu.:1.0000
## Max. :1.0000
##
Before we can start the calculation process, it might be feasible to create panel data tables for each raw variable which will be used to calculate these factors. This way we can directly call the variables from the local environment. We can use the “spread” function from the “tidyr” package by taking our initial data and create variable tables based on the gvkey.
for (i in 3:ncol(Data)) {
samp <- Data[, c(1, 2, i)]
colnames(samp)[3] <- c("x")
assign(colnames(Data)[i], as.matrix(spread(samp, gvkey, x)))
}
For building the Black Litterman framework, we need to start out with a value weighted portfolio with equities which follow the criteria below
First, we apply the non-member filter,
t_0 <- which(sp500[, 1] == 201012)
const_201012 <- which(sp500[t_0, ] == 1.0000)
trt1m_t0 <- trt1m[,const_201012]
let’s find the equities satisfying the second condition,
# creating a separate vector for storing the dates
date_id <- prccm[, 1]
# this loop will go over each constituent column for 201012 and check to see if they don't have sufficient trading observations
trd_id <- matrix(NA, nrow = dim(trt1m_t0)[2])
for (i in 1:dim(trt1m_t0)[2]){
if ((sum(is.na(trt1m_t0[1:(t_0 - 1), i])) <= 291) == FALSE) {
trd_id[i,1] <- colnames(trt1m_t0)[i]
} else {
trd_id[i,1] <- NA
}
}
trd_id <- trd_id[complete.cases(trd_id)]
trt1m_t0 <- trt1m_t0[, which(!colnames(trt1m_t0)%in%trd_id)]
Now the third condition,
T_0 <- which(sp500[, 1] == 201606) # index to find 201606 or the last oos
# the loop below will detect all the equities which have more than 50 missing # observations
trd_idpost <- matrix(nrow = dim(trt1m_t0)[2])
for (j in 1:dim(trt1m_t0)[2]) {
if ((sum(is.na(trt1m_t0[(t_0+1):T_0, j])) <= 50) == FALSE){
trd_idpost[j,1] <- colnames(trt1m_t0)[j]
} else {
trd_idpost[j,1] <- NA
}
}
trd_idpost <- trd_idpost[complete.cases(trd_idpost),]
trt1m_t0 <- trt1m_t0[, which(!colnames(trt1m_t0)%in%trd_idpost)]
const_id <- colnames(trt1m_t0)
Next, we use the three significant factors we found from our previous research and calculate them here to derive an expectation for all three.
The significant factors we found earlier were,
Size: Market Cap \[LogMCap_{i,t} = \log(CSHOM_{i,t}*CloseM_{i,t})\]
Valuation: book to price \[BP_{i,t} = \frac{CEQQ_{i,t}}{CSHOQ_{i,t}*{CloseM_{i,t}}}\]
Price momentum \[HL1M_i,_t = \frac{HighM_{i,t} - CloseM_{i,t}}{CloseM_{i,t} - LowM_{i,t}}\]
# Log Market Cap
logmcap <- function(cshom, prccm){
LogMCap <- log(cshom*prccm)
return(LogMCap)
}
LogMCap <- logmcap(cshom = cshom[, const_id], prccm = prccm[, const_id])
as.tibble(LogMCap)
## # A tibble: 563 x 378
## `001075` `001078` `001161` `001177` `001209` `001300` `001356` `001380`
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 NaN NaN NA NaN NaN NaN NaN NaN
## 2 NaN NaN NA NaN NaN NaN NaN NaN
## 3 NaN NaN NA NaN NaN NaN NaN NaN
## 4 NaN NaN NA NaN NaN NaN NaN NaN
## 5 NaN NaN NA NaN NaN NaN NaN NaN
## 6 NaN NaN NA NaN NaN NaN NaN NaN
## 7 NaN NaN NA NaN NaN NaN NaN NaN
## 8 NaN NaN NA NaN NaN NaN NaN NaN
## 9 NaN NaN NA NaN NaN NaN NaN NaN
## 10 NaN NaN NA NaN NaN NaN NaN NaN
## # ... with 553 more rows, and 370 more variables: `001408` <dbl>,
## # `001440` <dbl>, `001447` <dbl>, `001449` <dbl>, `001487` <dbl>,
## # `001602` <dbl>, `001632` <dbl>, `001661` <dbl>, `001678` <dbl>,
## # `001690` <dbl>, `001704` <dbl>, `001722` <dbl>, `001878` <dbl>,
## # `001891` <dbl>, `001913` <dbl>, `001920` <dbl>, `001976` <dbl>,
## # `001988` <dbl>, `001995` <dbl>, `002019` <dbl>, `002044` <dbl>,
## # `002086` <dbl>, `002111` <dbl>, `002136` <dbl>, `002154` <dbl>,
## # `002184` <dbl>, `002269` <dbl>, `002285` <dbl>, `002312` <dbl>,
## # `002403` <dbl>, `002435` <dbl>, `002547` <dbl>, `002574` <dbl>,
## # `002663` <dbl>, `002710` <dbl>, `002751` <dbl>, `002783` <dbl>,
## # `002817` <dbl>, `002884` <dbl>, `002968` <dbl>, `002991` <dbl>,
## # `003024` <dbl>, `003062` <dbl>, `003107` <dbl>, `003121` <dbl>,
## # `003144` <dbl>, `003170` <dbl>, `003221` <dbl>, `003226` <dbl>,
## # `003231` <dbl>, `003243` <dbl>, `003310` <dbl>, `003336` <dbl>,
## # `003362` <dbl>, `003413` <dbl>, `003439` <dbl>, `003505` <dbl>,
## # `003532` <dbl>, `003650` <dbl>, `003735` <dbl>, `003813` <dbl>,
## # `003835` <dbl>, `003897` <dbl>, `003905` <dbl>, `003980` <dbl>,
## # `004029` <dbl>, `004040` <dbl>, `004058` <dbl>, `004060` <dbl>,
## # `004066` <dbl>, `004087` <dbl>, `004093` <dbl>, `004094` <dbl>,
## # `004108` <dbl>, `004145` <dbl>, `004199` <dbl>, `004213` <dbl>,
## # `004242` <dbl>, `004321` <dbl>, `004423` <dbl>, `004430` <dbl>,
## # `004494` <dbl>, `004503` <dbl>, `004510` <dbl>, `004517` <dbl>,
## # `004560` <dbl>, `004598` <dbl>, `004611` <dbl>, `004640` <dbl>,
## # `004674` <dbl>, `004699` <dbl>, `004723` <dbl>, `004737` <dbl>,
## # `004818` <dbl>, `004839` <dbl>, `004843` <dbl>, `004885` <dbl>,
## # `004988` <dbl>, `004990` <dbl>, `005046` <dbl>, ...
# Valuation: Book to Price
bp_ratio <- function(ceqq, cshoq, prccm){
t <- nrow(ceqq)
BP <- ceqq/(cshoq*prccm)
return(BP[13:t,])
}
BP <- bp_ratio(ceqq = ceqq[, const_id], cshoq = cshoq[, const_id], prccm = prccm[, const_id])
as.tibble(BP)
## # A tibble: 551 x 378
## `001075` `001078` `001161` `001177` `001209` `001300` `001356` `001380`
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 NaN 0.242 NA NaN NaN 1.15 0.837 0.708
## 2 NaN 0.239 NA NaN NaN 0.995 0.860 0.675
## 3 NaN 0.258 NA NaN NaN 1.06 0.880 0.593
## 4 NaN NaN NA 0.808 NaN NaN NaN NaN
## 5 NaN NaN NA 0.861 NaN NaN NaN NaN
## 6 NaN NaN NA 0.796 NaN NaN NaN NaN
## 7 NaN NaN NA 0.772 NaN NaN NaN NaN
## 8 NaN NaN NA 0.720 NaN NaN NaN NaN
## 9 NaN NaN NA 0.749 NaN NaN NaN NaN
## 10 NaN NaN NA 0.823 0.550 NaN NaN NaN
## # ... with 541 more rows, and 370 more variables: `001408` <dbl>,
## # `001440` <dbl>, `001447` <dbl>, `001449` <dbl>, `001487` <dbl>,
## # `001602` <dbl>, `001632` <dbl>, `001661` <dbl>, `001678` <dbl>,
## # `001690` <dbl>, `001704` <dbl>, `001722` <dbl>, `001878` <dbl>,
## # `001891` <dbl>, `001913` <dbl>, `001920` <dbl>, `001976` <dbl>,
## # `001988` <dbl>, `001995` <dbl>, `002019` <dbl>, `002044` <dbl>,
## # `002086` <dbl>, `002111` <dbl>, `002136` <dbl>, `002154` <dbl>,
## # `002184` <dbl>, `002269` <dbl>, `002285` <dbl>, `002312` <dbl>,
## # `002403` <dbl>, `002435` <dbl>, `002547` <dbl>, `002574` <dbl>,
## # `002663` <dbl>, `002710` <dbl>, `002751` <dbl>, `002783` <dbl>,
## # `002817` <dbl>, `002884` <dbl>, `002968` <dbl>, `002991` <dbl>,
## # `003024` <dbl>, `003062` <dbl>, `003107` <dbl>, `003121` <dbl>,
## # `003144` <dbl>, `003170` <dbl>, `003221` <dbl>, `003226` <dbl>,
## # `003231` <dbl>, `003243` <dbl>, `003310` <dbl>, `003336` <dbl>,
## # `003362` <dbl>, `003413` <dbl>, `003439` <dbl>, `003505` <dbl>,
## # `003532` <dbl>, `003650` <dbl>, `003735` <dbl>, `003813` <dbl>,
## # `003835` <dbl>, `003897` <dbl>, `003905` <dbl>, `003980` <dbl>,
## # `004029` <dbl>, `004040` <dbl>, `004058` <dbl>, `004060` <dbl>,
## # `004066` <dbl>, `004087` <dbl>, `004093` <dbl>, `004094` <dbl>,
## # `004108` <dbl>, `004145` <dbl>, `004199` <dbl>, `004213` <dbl>,
## # `004242` <dbl>, `004321` <dbl>, `004423` <dbl>, `004430` <dbl>,
## # `004494` <dbl>, `004503` <dbl>, `004510` <dbl>, `004517` <dbl>,
## # `004560` <dbl>, `004598` <dbl>, `004611` <dbl>, `004640` <dbl>,
## # `004674` <dbl>, `004699` <dbl>, `004723` <dbl>, `004737` <dbl>,
## # `004818` <dbl>, `004839` <dbl>, `004843` <dbl>, `004885` <dbl>,
## # `004988` <dbl>, `004990` <dbl>, `005046` <dbl>, ...
hl1m <- function(prchm, prccm, prclm){
t <- nrow(prchm)
HL1M <- (prchm - prccm)/(prccm - prclm)
rownames(HL1M) <- prchm[,1]
return(HL1M[13:t,])
}
HL1M <- hl1m(prchm = prchm[, const_id], prccm = prccm[, const_id], prclm = prclm[, const_id])
as.tibble(HL1M)
## # A tibble: 551 x 378
## `001075` `001078` `001161` `001177` `001209` `001300` `001356` `001380`
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.714 0.0833 NA 0.0405 0.659 1.30 0.0526 0
## 2 Inf 2 NA 0.400 0.273 0.172 2.00 0.667
## 3 0.3 2.5 NA 0.429 0.0250 16.0 2.25 0.0274
## 4 3.25 1.44 NA 2.36 2.00 0.179 0.667 0.226
## 5 5 3.3 NA 2.00 4.00 3.00 0.556 1.45
## 6 0 4.91 NA 0 1.00 0.235 3.69 0.0868
## 7 1 2.15 NA 0.423 12.0 Inf 4.56 4.20
## 8 10 0.216 NA 0.400 0.151 0 0.690 4.26
## 9 1 0.455 NA 3.85 17.0 6 70.0 2.71
## 10 5.5 1.36 NA 1.52 2.43 4 4.29 11.6
## # ... with 541 more rows, and 370 more variables: `001408` <dbl>,
## # `001440` <dbl>, `001447` <dbl>, `001449` <dbl>, `001487` <dbl>,
## # `001602` <dbl>, `001632` <dbl>, `001661` <dbl>, `001678` <dbl>,
## # `001690` <dbl>, `001704` <dbl>, `001722` <dbl>, `001878` <dbl>,
## # `001891` <dbl>, `001913` <dbl>, `001920` <dbl>, `001976` <dbl>,
## # `001988` <dbl>, `001995` <dbl>, `002019` <dbl>, `002044` <dbl>,
## # `002086` <dbl>, `002111` <dbl>, `002136` <dbl>, `002154` <dbl>,
## # `002184` <dbl>, `002269` <dbl>, `002285` <dbl>, `002312` <dbl>,
## # `002403` <dbl>, `002435` <dbl>, `002547` <dbl>, `002574` <dbl>,
## # `002663` <dbl>, `002710` <dbl>, `002751` <dbl>, `002783` <dbl>,
## # `002817` <dbl>, `002884` <dbl>, `002968` <dbl>, `002991` <dbl>,
## # `003024` <dbl>, `003062` <dbl>, `003107` <dbl>, `003121` <dbl>,
## # `003144` <dbl>, `003170` <dbl>, `003221` <dbl>, `003226` <dbl>,
## # `003231` <dbl>, `003243` <dbl>, `003310` <dbl>, `003336` <dbl>,
## # `003362` <dbl>, `003413` <dbl>, `003439` <dbl>, `003505` <dbl>,
## # `003532` <dbl>, `003650` <dbl>, `003735` <dbl>, `003813` <dbl>,
## # `003835` <dbl>, `003897` <dbl>, `003905` <dbl>, `003980` <dbl>,
## # `004029` <dbl>, `004040` <dbl>, `004058` <dbl>, `004060` <dbl>,
## # `004066` <dbl>, `004087` <dbl>, `004093` <dbl>, `004094` <dbl>,
## # `004108` <dbl>, `004145` <dbl>, `004199` <dbl>, `004213` <dbl>,
## # `004242` <dbl>, `004321` <dbl>, `004423` <dbl>, `004430` <dbl>,
## # `004494` <dbl>, `004503` <dbl>, `004510` <dbl>, `004517` <dbl>,
## # `004560` <dbl>, `004598` <dbl>, `004611` <dbl>, `004640` <dbl>,
## # `004674` <dbl>, `004699` <dbl>, `004723` <dbl>, `004737` <dbl>,
## # `004818` <dbl>, `004839` <dbl>, `004843` <dbl>, `004885` <dbl>,
## # `004988` <dbl>, `004990` <dbl>, `005046` <dbl>, ...
Next, we calculate the Qspreads based on these factors for our portfolio.
# Sorting stocks into quintiles based on the values
# of various stock characteristics
# This function sorts returns based on factors and
# also calculates the Q spread
sort_q <- function(factor_data, trt1m, flip = F){
t <- nrow(factor_data)
k <- ncol(factor_data)
top20 <- data.frame(matrix(data = NA, nrow = t, ncol = k))
colnames(top20) <- colnames(factor_data)
bot20 <- data.frame(matrix(data = NA, nrow = t, ncol = k))
colnames(bot20) <- colnames(factor_data)
for (i in 1:(t-1)){
sort_rank <- sort(rank(factor_data[i, is.finite(factor_data[i,])],
na.last = NA, ties.method = "min"))
ID_bot20 <- names(sort_rank[which(sort_rank <= quantile(sort_rank,
.20))])
ID_top20 <- names(sort_rank[which(sort_rank >= quantile(sort_rank,
.80))])
for (q in 1:length(ID_bot20)) {
bot20[i+1, ID_bot20[q]] <- trt1m[i+1, ID_bot20[q]]
}
for (j in 1:length(ID_top20)){
top20[i+1, ID_top20[j]] <- trt1m[i+1, ID_top20[j]]
}
}
# Evaluate the portfolio performance and calcualte Q-spread
PP_QS <- data.frame(matrix(data = NA, nrow = t, ncol = 3))
colnames(PP_QS) <- c("top20", "bot20", "Q-spread")
for (i in 1:(t-1)){
PP_QS[i, 1] <- mean(as.numeric(top20[i, 1:k]), na.rm = TRUE)
PP_QS[i, 2] <- mean(as.numeric(bot20[i, 1:k]), na.rm = TRUE)
}
if (flip == T){
PP_QS[, 3] <- PP_QS[, 2] - PP_QS[, 1]
} else {
# Q-spread
PP_QS[, 3] <- PP_QS[, 1] - PP_QS[, 2]
}
return (PP_QS)
}
Q_HL1M <- sort_q(HL1M, trt1m_t0, flip = F)
Q_LogMCap <- sort_q(LogMCap, trt1m_t0, flip = T)
Q_BP <- sort_q(BP, trt1m_t0, flip = F)
Let’s inspect the calculated Q spreads for each factor at each point in time.
QSpreads <- cbind(Q_HL1M[,"Q-spread"],
Q_BP[,"Q-spread"],
Q_LogMCap[,"Q-spread"])
## Warning in cbind(Q_HL1M[, "Q-spread"], Q_BP[, "Q-spread"], Q_LogMCap[, "Q-
## spread"]): number of rows of result is not a multiple of vector length (arg
## 1)
colnames(QSpreads) <- c("HL1M", "BP", "LogMCap")
for(i in 1:ncol(QSpreads)){
barplot(height = QSpreads[,i], names.arg = ceqq[, 1],
main = colnames(QSpreads)[i],
xlab = "Time", ylab = "QSpread", col = i)
}
Now, the first step in the BL framework is to calculate the implied excess equilibrium returns denoted by \(\Pi\), the formula can be written as,
\[\Pi = \delta\Sigma w_{mkt}\] Where, \(\delta\) is the risk averson coefficient \(\Sigma\) is the covariance matrix of excess returns \(w_{mkt}\) is the market capitalization weight
Next, the combined return vector \(E[R]\) can be calculated based on the following formula,
\[E[R] = [(\tau\Sigma)^{-1}+P^{'}\Omega^{-1}P]^{-1}[(\tau\Sigma)^{-1}\Pi+P^{'}\Omega^{-1}Q]\] Where, \(\tau\) is a scalar \(P\) is a matrix which identifies assets involved in the views \(\Omega\) is a diagonal covariance matrix of error terms from the expressed views representing uncertainty embedded within each views \(Q\) is the vector for views
After calculating the posterior returns or \(E[R]\), the final Black Litterman weights \(w_{BL}\) for the portfolio can be calculated by solving the first equation for weights, \[w_{BL} = (\delta\Sigma)^{-1}E[R]\] # Calculating Priori Inputs To start off, let’s assume values for \(\tau\) and \(\delta\),
tau <- 0.5
delta <- 10
Next, let’s calculate the covariance matrix based on the historical returns of the portfolio.
trt1m_t0 <- trt1m_t0/100
cov_mat <- corpcor::make.positive.definite(cov(trt1m_t0[1:t_0, ],
use = "pairwise.complete.obs"))
Next, we calculate the market weights using a simple function,
MKT_weight <- function(prccm, cshom, ids){
mkt_weight <- matrix(prccm*cshom/sum(prccm*cshom, na.rm = T))
rownames(mkt_weight) <- ids
return(mkt_weight)
}
mkt_weight <- MKT_weight(prccm[t_0, const_id], cshom[t_0, const_id], const_id)
barplot(mkt_weight[,1], names.arg = const_id, main = "Portfolio Weights")
Now, we can calculate \(\Pi\) vector. Let’s also plot them.
eq_ret <- delta*cov_mat%*%mkt_weight
hist(eq_ret[,1], main = "Implied Equilibrium Excess Returns",
breaks = 20,
xlab = "Returns",
col = "blue",
border = "green")
Next, we move on to calculating the return vector and the covariance matrix of errors for the factor views (\(Q\) & \(\Omega\) respectively)
# constructing the Q vector
mu_HL1M <- mean(Q_HL1M[1:t_0, "Q-spread"], na.rm = TRUE)
mu_LogMCap <- mean(Q_LogMCap[1:t_0, "Q-spread"], na.rm = TRUE)
mu_BP <- mean(Q_BP[1:t_0, "Q-spread"], na.rm = TRUE)
Q <- rbind(mu_HL1M, mu_LogMCap, mu_BP)
# Constructing the omega matrix
Omega1 <- var(Q_HL1M[1:t_0, "Q-spread"],na.rm = TRUE)
Omega2 <- var(Q_LogMCap[1:t_0, "Q-spread"],na.rm = TRUE)
Omega3 <- var(Q_BP[1:t_0, "Q-spread"],na.rm = TRUE)
Omega <- matrix(0,3,3)
diag(Omega)<- c(Omega1,Omega2,Omega3)
Next, we calculate the matrix for views, The matrix \(P\) is created based on equal weights for stocks belonging to the long side or short side of each investment factor portfolio.
# weights for the view matrix P
P_weights <- function(factor_data, flip = F){
k <- length(factor_data)
P_weights <- data.frame(matrix(data = NA, ncol = k))
names(P_weights) <- names(factor_data)
sort_rank <- sort(rank(factor_data[is.finite(factor_data)],
na.last = NA,ties.method = "min"))
bot_20 <- names(sort_rank[which(sort_rank <= quantile(sort_rank, 0.20))])
top_20 <- names(sort_rank[which(sort_rank >= quantile(sort_rank, 0.80))])
for (j in 1:length(bot_20)){
if (flip == F){
P_weights[,bot_20[j]] <- -1/length(bot_20)
}else{
P_weights[,bot_20[j]] <- 1/length(bot_20)
}
}
for (k in 1:length(top_20)){
if (flip == F){
P_weights[,top_20[k]] <- 1/length(top_20)
}else{
P_weights[,top_20[k]] <- -1/length(top_20)
}
}
return(P_weights)
}
P1 <- P_weights(HL1M[t_0, ], flip = F)
P2 <- P_weights(LogMCap[t_0, ], flip = T)
P3 <- P_weights(BP[t_0, ], flip = F)
P1[is.na(P1)] <- 0
P2[is.na(P2)] <- 0
P3[is.na(P3)] <- 0
P <- rbind(P1,P2,P3)
Next, we calculate the posterior return vector and the corresponding Black-Litterman weights.
# Black Litterman weights, combined return vector and updated covariance matrix
BL_weights <- function(tau, delta, cov_mat, P, Omega, eq_ret, Q){
cov_post <- solve(solve(tau*cov_mat)+t(as.matrix(P))%*%solve(Omega)%*%as.matrix(P))
er_bl <- cov_post%*%(solve(tau*cov_mat)%*%eq_ret+t(P)%*%solve(Omega)%*%Q)
BL_weights <- solve(delta*cov_mat)%*%er_bl
return(BL_weights)
}
bl_weights <- BL_weights(tau, delta, cov_mat, P, Omega, eq_ret, Q)
barplot(bl_weights[,1], names.arg = const_id, main = "Black-Litterman Weights")
For both the Black-Litterman updated portfolio and the value-weighted S&P 500 market portfolio, the portfolio weights were determined on Dec 2010, and held fixed over the subsequent evaluation period from 201101 to 201606
ix <- nrow(trt1m_t0) - t_0
ret_mat <- matrix(NA, nrow = ix, ncol = 2)
for (i in 1:ix){
ret_mat[i, 1] <- sum(mkt_weight*trt1m_t0[t_0 + i, const_id], na.rm = T)
eq_ret <- delta*cov_mat%*%mkt_weight
ret_mat[i, 2] <- sum(bl_weights*trt1m_t0[t_0 + i, const_id], na.rm = T)
}
plot(y = ret_mat[,1], x = date_id[(t_0+1):length(date_id)],
main = "Out of Sample Performance - Value Weighted Portfolio",
type = "o", col = "red", lty = 1)
plot(y = ret_mat[,2], x = date_id[(t_0+1):length(date_id)],
main = "Out of Sample Performance - Black Litterman Portfolio",
type = "o", col = "green", lty = 1)
performance_summary <- data.frame(matrix(NA, nrow = 2, ncol = 4))
colnames(performance_summary) <- c("Mean", "Std_dev", "Sharpe_ratio", "t_stats")
rownames(performance_summary) <- c("Value_weighted", "Black_litterman")
performance_summary[, 1] <- apply(ret_mat, 2, mean, na.rm = T)
performance_summary[, 2] <- apply(ret_mat, 2, sd, na.rm = T)
performance_summary[, 3] <- performance_summary[, 1]/performance_summary[, 2]
performance_summary[, 4] <- c(t.test(ret_mat[,1])$statistic , t.test(ret_mat[,2])$statistic)
performance_summary
## Mean Std_dev Sharpe_ratio t_stats
## Value_weighted 0.009509087 0.03361586 0.282875 2.383547
## Black_litterman 0.009545313 0.03347553 0.285143 2.402658
We can see that the portfolio constructed based on black litterman weights have a slightly higher OOT mean return (significant under 5% significance level) than the value weighted portfolio. On a risk adjusted basis, the value weighted portfolio also marginally underperforms the black-litterman portfolio. There are several uncertainties which may arise which may affect the performance. One reason could be relatively high uncertainty embedded within the factor views compared to only following the market factor. This could be dependent on the length of time window picked for calculating the average spreads to be incorporated in the posterior calculation and the possible relatively higher volatility associated with the historical performance of the factor portfolios. Also, different assumptions regarding the model inputs (tau and delta, to be specific) can also affect how the posterior weights are determined.