In the realm of investment management, constructing a well-balanced portfolio that maximizes returns while minimizing risks remains a formidable challenge. To address this, Modern Portfolio Theory (MPT) has emerged as a prominent and empirically grounded framework. This implementation aims to leverage the principles of MPT to create a data-driven investment approach that emphasizes diversification, risk management, and evidence-based decision-making. By focusing on long-term objectives and employing statistical analysis, this implementation seeks to identify optimal portfolio combinations that lie on the efficient frontier, providing investors with superior risk-adjusted returns. Through the implementation of MPT, we endeavor to offer insights that can enhance the rational allocation of assets, instill confidence in investment decisions, and potentially pave the way for more efficient wealth accumulation strategies.
This implementation optimizes the following function:
\[\large \min_{\textbf{w}} \textbf{w}^T \Sigma\textbf{w}\] Subject to: \[\large A\textbf{w} \geq b\] Where
In other words, this implementation optimizes a portfolio to maximize expected returns and minimize risk. It determines expected returns and risk of each asset based on a weighted measure of its past performance, volatility, and correlation with other assets.
In constructing the weighted covariance matrix between different assets, we used a linear function \(f(date) = 1 + dateDiff(date - startDate)/totalDays\) to determine the weighting of movements between assets on each trading day. This ensures that more recent correlations between assets are taken more into account.
A similar approach was used to determine annual returns. Each annual return is weighted according to the number of years since the start of a user-defined record tracking start date, with weighting ranging from 0.75 to 1.25, scaling linearly. This amplifies the importance of more recent returns, and more closely reflects the current state of the market.
This implementation does NOT allow for short selling. That is, allocations cannot go below zero for any asset. This is so users can enjoy a beginner-friendly allocation that still largely achieves its purpose.
In the case where there is no possible allocation for the desired return, this implementation will simply return nothing. A lottery ticket may be better in those cases :)
In determine which pool of assets to choose from, we landed on a selection of all ETFs with at least $100 million in AUM and expense ratio less than 0.6%. The dataset we used can be found here: https://infolific.com/money-management/exchange-traded-funds/list-of-etfs/
Here are the reasons for this rationale:
We only looked at ETFs since they are more diversified, so projections on expected returns based purely on historical returns should have less error
Only ETFs with > $100M AUM are considered for liquidity concerns
Only ETFs with < 0.6% expense ratios are considered so lost returns don’t compound significantly in the long run
With these constraints, there are 363 ETFs in the pool of candidate assets. They cover essentially all major investing styles, strategies, and asset classes. This is meant to largely represent the investable portion of the market, but it’s open to change and iterative updates.
#Gets info from quantmod API
getInfo = function(tickers, years){
# call API
keep = c()
for (symbol in tickers){
res = tryCatch(getSymbols(symbol, type = "xts"),
warning = function(x){return("no")},
error = function(x){return("no")})
if (res != "no"){
keep = c(keep, symbol)
}
}
tickers = keep
n = length(tickers)
# get prices for this ticker for the last "years" years
getSymbols(tickers[1], type = "xts")
tmp = tail(get(tickers[1])[,1],252*years)
# get percent changes over this period
changes = diff(log(tmp[,1]))[-1,]
# initialize price / daily change dataframes
portfolio = data.frame(diff = tmp)
p_dailyChange = data.frame(changes)
badNames = c()
# populate dataframes
for (name in tickers[-1]){
getSymbols(name, type = "xts")
tmp = na.approx(tail(get(name)[,1],252*years))
if (length(tmp[,1]) < 252*years){
n = n - 1
badNames = c(badNames, name)
next
}
changes = diff(log(tmp[,1]))[-1,]
portfolio = data.frame(portfolio, tmp)
p_dailyChange = data.frame(p_dailyChange, changes)
}
remainingTickers = tickers[!tickers %in% unique(badNames)]
names(portfolio) = remainingTickers
names(p_dailyChange) = remainingTickers
# get weighted covariance matrix of daily returns between tickers, where
# most recent entry is twice as important as the earliest record.
covWeights = 1 + c(1:nrow(p_dailyChange))/nrow(p_dailyChange)
covmatrix = matrix(c(cov.wt(p_dailyChange, wt = covWeights))[[1]],
nrow=n, ncol=n)
dimnames(covmatrix) = list(remainingTickers, remainingTickers)
logReturn = function(rowNum){
prices = portfolio[seq(1,252*years, length.out = years), rowNum]
dates = as.Date(rownames(portfolio)[seq(1,252*years,
length.out = years)])
zoo_data <- zoo(xts(prices, order.by = dates), order.by = dates)
xts_data <- as.xts(zoo_data)
log_returns <- na.omit(Return.calculate(xts_data, method = "log"))
weights = seq(0.75, 1.25, length.out = years - 1)
weightedReturn = mean((exp(log_returns) - 1) * weights)
return(weightedReturn)
}
# annual geometric returns for each stock
avgReturns = round(unlist(lapply(c(1:n),logReturn)), 3)
names(avgReturns) = remainingTickers
return(list(avgReturns, covmatrix, remainingTickers))
}
getAllocHelper = function(desiredReturn, estReturns, covmatrix,
tickers, principle){
# set up constrained optimization problem
n = length(estReturns)
covmatrix[is.na(covmatrix) | is.infinite(covmatrix)] <- 1
Dmat = covmatrix
dvec = rep(0,n)
Amat = cbind(matrix(estReturns, nrow = n),
matrix(rep(1, n), nrow = n), diag(n))
bvec = matrix(c(desiredReturn, 1, rep(0,n)), ncol = 1)
# first "meq" constraints are equality constraints
# 1. expected return = desiredReturn
# 2. sum of weights = 1
# 3. all weights >= 0
solution <- tryCatch(list("pass!", solve.QP(Dmat, dvec, Amat, bvec, meq = 2)),
error = function(error){
noSol = data.frame(matrix(rep(0, 3*length(tickers)),
ncol = 3))
rownames(noSol) = tickers
names(noSol) = c("strategy", "allocation", "amount")
return(list("error!", noSol))
})
if (solution[[1]] == "error!"){
return(solution[[2]])
}
solution = solution[[2]]
# extract weighting and fund allocation
strategy = round(solution$solution, digits = 3)
allocation = abs(strategy)/sqrt(sum((strategy)^2))
allocation = allocation/sum(allocation)
overallStrat = data.frame(t(rbind(strategy, allocation)))
rownames(overallStrat) = tickers
overallStrat = round(overallStrat, 3)
overallStrat$amount = overallStrat$allocation * principle
return(overallStrat)
}
getAlloc = function(years, principle, myEst, symbols,
baseReturn, desiredReturn, stepSize, actual){
c(avgReturns, covmatrix, tickers) %<-% getInfo(symbols, years)
returns = data.frame(Actual = avgReturns, myGuess = myEst)
if(actual == 1){
estReturns = avgReturns
}
else{
estReturns = myEst
}
# dataframe of allocations for all tickers across all
# desired return preferences
targetWeights = data.frame(t(matrix(unlist(
mclapply(seq(baseReturn,desiredReturn,stepSize),
function(x){
return(getAllocHelper(x, estReturns,
covmatrix, tickers,
principle)$amount)
})), nrow = length(tickers))
)
)
names(targetWeights) = tickers
# additional metrics: variance & achieved estimated return
portfolio_sd <- sqrt(252)* apply(targetWeights, 1,
function(row){
return(sqrt(t(row/principle)%*%
covmatrix%*%row/principle))})
portfolio_sd = round(portfolio_sd, 4)
est_return <- round(apply(targetWeights, 1,
function(row){
return((row%*%estReturns)/principle)}), 2)
output = data.frame(est_return = est_return,
annualRisk_sd = portfolio_sd,
targetWeights[,rev(order(apply(targetWeights,
2, which.max)))])
zeros_cols <- colSums(output) == 0
zeros_rows <- rowSums(output) == 0
output = data.frame(subset(output, select = !zeros_cols))[!zeros_rows,]
rownames(output) = NULL
return(output)
}
etfs = read.csv("etfList.csv")
etfs = subset(etfs, Structure == "ETF")
etfs$Market.Cap..millions. = as.numeric(gsub("(\\$|,)","",
etfs$Market.Cap..millions.))
etfs$Expense.Ratio = as.numeric(gsub("(\\%|,)","",
etfs$Expense.Ratio))
etfs = subset(etfs, Market.Cap..millions. > 100)
etfs = subset(etfs, Expense.Ratio < 0.6)
# Liquidated ETF (edge case)
etfs = subset(etfs, Symbol != "ELV")
tickers = c(etfs$Symbol)
Now we will, following Modern Portfolio Theory, determine the optimal portfolios across expected returns from 0% to 30%. As an example, we will designate the data collection look-back period to be 10 years and a total of $10000USD to allocate to the assets.
optimalAlloc = getAlloc(years = 10, principle = 10000,
myEst = NA, symbols = tickers,
baseReturn = 0, desiredReturn = 0.5,
stepSize = 0.02, actual = 1)
kable_styling(kable(optimalAlloc),bootstrap_options = "striped") %>%
scroll_box(width = "800px", height = "650px")
| est_return | annualRisk_sd | SMH | XHB | PSCD | EDV | RWJ | IHI | VGT | TLT | SLX | SHV | VTIP | TDTT | PLW | PGX | GSY | DBJP | CYB | BIL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.00 | 0.0047 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4530 | 50 | 10 | 10 | 30 | 630 | 10 | 20 | 4700 |
| 0.02 | 0.0172 | 280 | 0 | 200 | 0 | 80 | 170 | 20 | 160 | 30 | 5790 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3270 |
| 0.04 | 0.0337 | 560 | 0 | 400 | 0 | 170 | 330 | 40 | 330 | 50 | 6510 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1600 |
| 0.06 | 0.0510 | 850 | 0 | 610 | 0 | 250 | 500 | 60 | 500 | 80 | 7150 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.08 | 0.0677 | 1130 | 0 | 810 | 0 | 340 | 660 | 80 | 670 | 100 | 6210 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.10 | 0.0847 | 1420 | 0 | 1010 | 0 | 420 | 830 | 100 | 840 | 120 | 5260 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.12 | 0.1015 | 1700 | 0 | 1210 | 0 | 510 | 990 | 120 | 1020 | 140 | 4300 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.14 | 0.1189 | 1990 | 0 | 1410 | 0 | 600 | 1160 | 140 | 1190 | 170 | 3350 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.16 | 0.1357 | 2270 | 0 | 1620 | 0 | 680 | 1320 | 160 | 1360 | 190 | 2400 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.18 | 0.1528 | 2560 | 0 | 1820 | 0 | 770 | 1480 | 180 | 1530 | 210 | 1450 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.20 | 0.1695 | 2840 | 0 | 2020 | 0 | 850 | 1650 | 200 | 1700 | 230 | 500 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.22 | 0.1866 | 3230 | 0 | 2260 | 100 | 880 | 1710 | 180 | 1400 | 230 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.24 | 0.2040 | 3720 | 0 | 2520 | 860 | 860 | 1690 | 120 | 0 | 220 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.26 | 0.2225 | 4350 | 80 | 2810 | 340 | 760 | 1480 | 0 | 0 | 190 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.28 | 0.2413 | 5140 | 210 | 3140 | 0 | 620 | 730 | 0 | 0 | 160 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.30 | 0.2682 | 8130 | 0 | 1870 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
From the output, a few things stand out. First, we notice very heavy allocations to SHV and BIL when our target returns are low. This makes sense as these are treasury ETFs and are considered to be essentially risk-free assets. We also notice that as the target returns increases, allocations in more growth-oriented ETFs start to pick up, namely SMH, PSCD, and IHI. This also makes sense as these names have a higher historical average returns over the last 10 years compared to other low-risk assets. We also see the portfolio risk steadily increasing as the target returns increase, which is to be expected – it’s the price investors pay for higher expected returns.
Now with the optimal allocations, we can visualize the relationship between expected annual returns and portfolio variance and find the allocation with the highest Sharpe Ratio given a 2.5% risk free rate (rough average of the 10-year treasury yield over the last 10 years).
frontier = data.frame(est_return = optimalAlloc$est_return,
risk = optimalAlloc$annualRisk_sd,
risk2 = optimalAlloc$annualRisk_sd^2)
model = lm(est_return ~ risk + risk2, data = frontier)
result <- function(a,b,c){
if(delta(a,b,c) > 0){ # first case D>0
x_1 = (-b+sqrt(delta(a,b,c)))/(2*a)
x_2 = (-b-sqrt(delta(a,b,c)))/(2*a)
result = c(x_1,x_2)
}
else if(delta(a,b,c) == 0){ # second case D=0
x = -b/(2*a)
}
else {"There are no real roots."} # third case D<0
}
# Constructing delta
delta<-function(a,b,c){
b^2-4*a*c
}
riskFreeRate = 0.025
bestVar = result(model$coefficients[3], 0,
riskFreeRate - model$coefficients[1])
bestEstReturns = predict(model, newdata =
list(risk = max(bestVar[2]),
risk2 = max(bestVar[2])^2))
ggplot(frontier, aes(x = risk, y = model$fitted.values)) +
geom_smooth(method = "loess", formula = "y~x", se = FALSE, linewidth = 1.5) +
geom_abline(intercept = riskFreeRate,
slope = 2*model$coefficients[3]*max(bestVar) +
model$coefficients[2],
color="red", linetype = "dashed", linewidth=0.8, alpha = 8) +
geom_point(aes(x = max(bestVar), y = bestEstReturns), size = 2) +
labs(title =
"Efficient Frontier, CAL, and Tangency Portfolio at 2.5% Risk-Free Rate",
y = "Expected Annual Returns", x = "Annual Portfolio Risk (sd)") +
ylim(0,0.35) +
scale_x_continuous(n.breaks = 7, limits = c(0,0.3))
From the plot above, we observe that the efficient frontier curve (blue) demonstrates a positive correlation between daily portfolio risk and expected returns, which is to be expected – the higher the expected return, the higher the portfolio risk. Any point on this line represents an optimal possible allocation of assets given an investors risk tolerance.
The red dashed line shows the capital allocation line and the point tangent to the efficient frontier is the allocation with the highest Sharpe Ratio. In this case, the expected return of the tangency portfolio is 27.4% with a portfolio risk of 23.9%. In the next section we will take the allocation of the tangency portfolio and simulate returns with its risk reward profile.
To simulate this, we will assume that annual returns follow a normal distribution with mean = principle * 1.274 and has the previously stated standard deviation of 23.9%. We will draw 500000 samples from this distribution to observe the distribution of possible outcomes from following this variance minimizing allocation strategy.
principle = 10000
endReturn = rnorm(500000, mean = principle*1.274, sd = principle*0.239)
ggplot(data.frame(endReturn), aes(x = endReturn)) +
geom_histogram(aes(y = after_stat(density)),
fill = "maroon", color = "orange", bins = 30) +
scale_x_continuous(n.breaks = 15) +
geom_density() +
labs(x = "EOY Return", title = "EOY Returns at E(return) = 27.4%")
lossProb = length(which(endReturn < principle*1.025))/length(endReturn)
#SPY Avg gain over last 10 years = 13.09%
gainProb = length(which(endReturn > 11309))/length(endReturn)
As shown, this risk-reward portfolio profile can lead to returns ranging from around -20% to +75% in the middle 95% range. This demonstrates that even when achieving minimum variance, having a portfolio with expected annual return standard deviation of 24.3% can still lead to significant negative returns. In fact, according to this model, there is a ~14.8% chance in expectation that a hypothetical investor with this portfolio does not beat the long-term risk-free rate of 2.5%. However, this risk is compensated by a ~72.5% probability that this allocation will beat the S&P500 index in returns in a year. Overall, this plot provides a theoretical lower bound visualization on the variance of EOY returns of a portfolio given previously stated parameters and risk tolerance – it serves as a useful guideline approximation on real-world potential returns.
There are plenty of use cases for these optimal allocation given an investor’s risk tolerance, namely:
However, there are a few key assumptions that one should keep in mind about utilizing MPT in this context, namely:
To address these issues, this implementation has introduced a new parameter, “myEst”, where users can define their estimated returns for each asset independent of past performance. Note that changing this parameter will not alter the covariance matrix, but will likely alter the resulting allocation weights for different assets.
Even with this in mind, it’s important to note the optimal allocations should on be used as a reference instead of pure truth. It doesn’t account for real world events nor circumstances and any output should therefore be interpreted with a large grain of salt.
In this implementation, we provided a weighted covariance and geometric return approach to the MPT optimization problem. We were able to show the optimal allocations given the ETFs as candidate assets. We also showed the risk reward profile of an hypothetical investor holding the tangency portfolio with a 27.4% expected annual return. In terms of next steps, we hope to incorporate analyst estimates on the stocks as a factor in estimating expected returns to obtain a more comprehensive view on an asset’s outlook.
Special thanks to Professor O. at Rice University on Coursera for motivating this mini-project!