logo

Introduction

Portfolio optimization is an important topic in Finance. Modern portfolio theory (MPT) states that investors are risk averse and given a level of risk, they will choose the portfolios that offer the most return. To do that we need to optimize the portfolios.

To perform the optimization we will need

So lets begin

Downloading data

Let’s select a few mutual funds to build our portfolios. These are the top mutual funds to invest in for a long time horizon, they don’t all have the same goals, but they share a number of advantages.

10 Great Mutual Funds to Invest In for the Long Haul

Let’s download the price data.

tick = c('VFIAX', 'VWUSX', 'FSDIX', 'PARMX', 'WMICX',
         'PRHSX','FSCSX','FIVFX','VWELX','FTBFX')

price_data = tq_get(tick,
                     from = '2014-01-01',
                     to = '2020-12-20',
                     get = 'stock.prices')

Next we will calculate the daily returns for these mutual funds. We will use the logarithmic returns.

log_ret_tidy = price_data %>%
  group_by(symbol) %>%
  tq_transmute(select = adjusted,
               mutate_fun = periodReturn,
               period = 'daily',
               col_rename = 'ret',
               type = 'log')

Let’s look at the first few rows.

head(log_ret_tidy)
## # A tibble: 6 x 3
## # Groups:   symbol [1]
##   symbol date              ret
##   <chr>  <date>          <dbl>
## 1 VFIAX  2014-01-02  0        
## 2 VFIAX  2014-01-03 -0.000355 
## 3 VFIAX  2014-01-06 -0.00249  
## 4 VFIAX  2014-01-07  0.00610  
## 5 VFIAX  2014-01-08  0.0000592
## 6 VFIAX  2014-01-09  0.000354

As you can see that this data is in tidy format. We will use the spread() function to convert it to a wide format. And we will also convert it into a time series object using xts() function.

log_ret_xts = log_ret_tidy %>%
  spread(symbol, value = ret) %>%
  tk_xts()
## Warning: Non-numeric columns being dropped: date
## Using column `date` for date_var.
head(log_ret_xts)
##                    FIVFX        FSCSX        FSDIX         FTBFX         PARMX
## 2014-01-02  0.0000000000  0.000000000  0.000000000  0.0000000000  0.0000000000
## 2014-01-03  0.0000000000  0.001020929  0.001432774  0.0000000000  0.0016129025
## 2014-01-06  0.0006038837 -0.001446201 -0.001432774  0.0019121710 -0.0060615396
## 2014-01-07  0.0024126914  0.013530616  0.005005371  0.0009543614  0.0044486371
## 2014-01-08  0.0000000000  0.001678662 -0.002142092 -0.0028665325 -0.0012112848
## 2014-01-09 -0.0024126914 -0.006224916  0.000000000  0.0009567795  0.0008080076
##                   PRHSX         VFIAX         VWELX         VWUSX        WMICX
## 2014-01-02  0.000000000  0.000000e+00  0.0000000000  0.0000000000  0.000000000
## 2014-01-03  0.001389965 -3.552476e-04  0.0005303929 -0.0014059898  0.006246138
## 2014-01-06 -0.009243989 -2.491073e-03 -0.0007953234 -0.0049365347 -0.004993730
## 2014-01-07  0.021323494  6.097524e-03  0.0042348106  0.0091484776  0.013673403
## 2014-01-08  0.016332512  5.915859e-05 -0.0013214261  0.0020993374  0.002466210
## 2014-01-09  0.011409552  3.540491e-04  0.0015857432 -0.0006993078  0.004913703

This is better for our purpose.

Next let’s calculate the mean daily returns for each asset.

mean_ret = colMeans(log_ret_xts)
print(round(mean_ret, 5))
##   FIVFX   FSCSX   FSDIX   FTBFX   PARMX   PRHSX   VFIAX   VWELX   VWUSX   WMICX 
## 0.00039 0.00074 0.00035 0.00017 0.00041 0.00058 0.00048 0.00034 0.00074 0.00068

Next we will calculate the covariance matrix for all these mutual funds. We will annualize it by multiplying by 252.

cov_mat = cov(log_ret_xts) * 252

print(round(cov_mat,4))
##        FIVFX   FSCSX  FSDIX   FTBFX   PARMX   PRHSX   VFIAX  VWELX   VWUSX
## FIVFX 0.0256  0.0288 0.0175  0.0003  0.0226  0.0246  0.0251 0.0157  0.0273
## FSCSX 0.0288  0.0477 0.0228 -0.0005  0.0300  0.0350  0.0348 0.0204  0.0406
## FSDIX 0.0175  0.0228 0.0163  0.0001  0.0201  0.0205  0.0220 0.0138  0.0218
## FTBFX 0.0003 -0.0005 0.0001  0.0013 -0.0003 -0.0005 -0.0005 0.0001 -0.0004
## PARMX 0.0226  0.0300 0.0201 -0.0003  0.0276  0.0271  0.0283 0.0174  0.0287
## PRHSX 0.0246  0.0350 0.0205 -0.0005  0.0271  0.0433  0.0304 0.0181  0.0346
## VFIAX 0.0251  0.0348 0.0220 -0.0005  0.0283  0.0304  0.0321 0.0195  0.0331
## VWELX 0.0157  0.0204 0.0138  0.0001  0.0174  0.0181  0.0195 0.0124  0.0195
## VWUSX 0.0273  0.0406 0.0218 -0.0004  0.0287  0.0346  0.0331 0.0195  0.0403
## WMICX 0.0276  0.0386 0.0230 -0.0002  0.0310  0.0363  0.0331 0.0199  0.0366
##         WMICX
## FIVFX  0.0276
## FSCSX  0.0386
## FSDIX  0.0230
## FTBFX -0.0002
## PARMX  0.0310
## PRHSX  0.0363
## VFIAX  0.0331
## VWELX  0.0199
## VWUSX  0.0366
## WMICX  0.0485

Before we apply our methods to thousands of random portfolio, let us demonstrate the steps on a single portfolio.

To calculate the portfolio returns and risk (standard deviation) we will us need

Lets create random weights first.

wts = runif(n = length(tick))
print(wts)
##  [1] 0.08347867 0.22541292 0.06921215 0.13578196 0.69971388 0.83693655
##  [7] 0.49515624 0.60552065 0.65795977 0.73867803
print(sum(wts))
## [1] 4.547851

We created some random weights, but the problem is that their sum is more than 1. We can fix this as shown below.

wts = wts/sum(wts)
print(wts)
##  [1] 0.01835563 0.04956471 0.01521865 0.02985629 0.15385594 0.18402902
##  [7] 0.10887697 0.13314435 0.14467488 0.16242354
sum(wts)
## [1] 1

Next we will calculate the annualized portfolio returns.

port_returns = (sum(wts * mean_ret) + 1)^252 - 1

Next we will calculate the portfolio risk (Standard deviation). This will be annualized Standard deviation for the portfolio. We will use linear algebra to calculate our portfolio risk.

port_risk = sqrt(t(wts) %*% (cov_mat %*% wts))
print(port_risk)
##           [,1]
## [1,] 0.1660733

Next we will assume 0% risk free rate to calculate the Sharpe Ratio.

# Since Risk free rate is 0% 
sharpe_ratio = port_returns/port_risk
print(sharpe_ratio)
##           [,1]
## [1,] 0.8732792

Let’s put all these steps together.

# Calculate the random weights
wts = runif(n = length(tick))
wts = wts/sum(wts)

# Calculate the portfolio returns
port_returns = (sum(wts * mean_ret) + 1)^252 - 1

# Calculate the portfolio risk
port_risk = sqrt(t(wts) %*% (cov_mat %*% wts))

# Calculate the Sharpe Ratio
sharpe_ratio = port_returns/port_risk

print(wts)
##  [1] 0.223507514 0.060107662 0.144730038 0.040609628 0.006096751 0.094284572
##  [7] 0.013432645 0.183711865 0.085338102 0.148181224
print(port_returns)
## [1] 0.1279638
print(port_risk)
##           [,1]
## [1,] 0.1483227
print(sharpe_ratio)
##          [,1]
## [1,] 0.862739

We have everything we need to perform our optimization. All we need now is to run this code on 5000 random portfolios. For that we will use a for loop.

Before we do that, we need to create empty vectors and matrix for storing our values.

num_port = 5000

# Creating a matrix to store the weights

all_wts = matrix(nrow = num_port,
                  ncol = length(tick))

# Creating an empty vector to store
# Portfolio returns

port_returns = vector('numeric', length = num_port)

# Creating an empty vector to store
# Portfolio Standard deviation

port_risk = vector('numeric', length = num_port)

# Creating an empty vector to store
# Portfolio Sharpe Ratio

sharpe_ratio = vector('numeric', length = num_port)

Next lets run the for loop 5000 times.

for (i in seq_along(port_returns)) {
  
  wts = runif(length(tick))
  wts = wts/sum(wts)
  
  # Storing weight in the matrix
  all_wts[i,] = wts
  
  # Portfolio returns
  
  port_ret = sum(wts * mean_ret)
  port_ret = ((port_ret + 1)^252) - 1
  
  # Storing Portfolio Returns values
  port_returns[i] = port_ret
  
  
  # Creating and storing portfolio risk
  port_sd = sqrt(t(wts) %*% (cov_mat  %*% wts))
  port_risk[i] = port_sd
  
  # Creating and storing Portfolio Sharpe Ratios
  # Assuming 0% Risk free rate
  
  sr = port_ret/port_sd
  sharpe_ratio[i] = sr
  
}

All the heavy lifting has been done and now we can create a data table to store all the values together.

# Storing the values in the table
portfolio_values = tibble(Return = port_returns,
                  Risk = port_risk,
                  SharpeRatio = sharpe_ratio)


# Converting matrix to a tibble and changing column names
all_wts = tk_tbl(all_wts)
## Warning in tk_tbl.data.frame(as.data.frame(data), preserve_index,
## rename_index, : Warning: No index to preserve. Object otherwise converted to
## tibble successfully.
colnames(all_wts) = colnames(log_ret_xts)

# Combing all the values together
portfolio_values = tk_tbl(cbind(all_wts, portfolio_values))
## Warning in tk_tbl.data.frame(cbind(all_wts, portfolio_values)): Warning: No
## index to preserve. Object otherwise converted to tibble successfully.

Lets look at the first few values.

head(portfolio_values)
## # A tibble: 6 x 13
##     FIVFX  FSCSX  FSDIX  FTBFX  PARMX  PRHSX  VFIAX  VWELX   VWUSX  WMICX Return
##     <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>   <dbl>  <dbl>  <dbl>
## 1 0.00417 0.0249 0.0748 0.127  0.0973 0.147  0.150  0.0204 0.173   0.182   0.140
## 2 0.0605  0.162  0.176  0.0668 0.0830 0.0484 0.0843 0.159  0.00690 0.154   0.129
## 3 0.0576  0.126  0.0585 0.126  0.124  0.0786 0.125  0.120  0.125   0.0600  0.130
## 4 0.118   0.0569 0.0348 0.132  0.105  0.0920 0.132  0.0780 0.106   0.145   0.130
## 5 0.181   0.115  0.189  0.0218 0.115  0.0613 0.108  0.190  0.00110 0.0179  0.116
## 6 0.159   0.121  0.0184 0.160  0.0606 0.111  0.0649 0.128  0.105   0.0720  0.127
## # ... with 2 more variables: Risk <dbl>, SharpeRatio <dbl>

We have the weights in each asset with the risk and returns along with the Sharpe ratio of each portfolio.

Next lets look at the portfolios that matter the most.

min_var = portfolio_values[which.min(portfolio_values$Risk),]
max_sr = portfolio_values[which.max(portfolio_values$SharpeRatio),]

Lets plot the weights of each portfolio. First with the minimum variance portfolio.

p = min_var %>%
  gather(FIVFX:WMICX, key = Asset,
         value = Weights) %>%
  mutate(Asset = as.factor(Asset)) %>%
  ggplot(aes(x = fct_reorder(Asset,Weights), y = Weights, fill = Asset)) +
  geom_bar(stat = 'identity') +
  theme_minimal() +
  labs(x = 'Assets', y = 'Weights', title = "Minimum Variance Portfolio Weights") +
  scale_y_continuous(labels = scales::percent) 

ggplotly(p)

As we can observe the Minimum variance portfolio has no allocation to VFIAX and very little allocation to WMICX The majority of the portfolio is invested in VWELX and FTBFX mutual funds.

Next lets look at the tangency portfolio or the the portfolio with the highest sharpe ratio.

p = max_sr %>%
  gather(FIVFX:WMICX, key = Asset,
         value = Weights) %>%
  mutate(Asset = as.factor(Asset)) %>%
  ggplot(aes(x = fct_reorder(Asset,Weights), y = Weights, fill = Asset)) +
  geom_bar(stat = 'identity') +
  theme_minimal() +
  labs(x = 'Assets', y = 'Weights', title = "Tangency Portfolio Weights") +
  scale_y_continuous(labels = scales::percent) 

ggplotly(p)

Not surprisingly, the portfolio with the highest sharpe ratio has very little invested in FSDIX and VFIAX. This portfolio has most of the assets invested in VWUSX, WMICX and FTBFX. Three best performing mutual funds in the last decade.

Finally lets plot all the random portfolios and visualize the efficient frontier.

p = portfolio_values %>%
  ggplot(aes(x = Risk, y = Return, color = SharpeRatio)) +
  ylim(0,.20)+
  geom_point() +
  theme_classic() +
  scale_y_continuous(labels = scales::percent) +
  scale_x_continuous(labels = scales::percent) +
  labs(x = 'Annualized Risk',
       y = 'Annualized Returns',
       title = "Portfolio Optimization & Efficient Frontier") +
  geom_point(aes(x = Risk,
                 y = Return), data = min_var, color = 'red') +
  geom_point(aes(x = Risk,
                 y = Return), data = max_sr, color = 'red') +
  annotate('text', x = 0.20, y = 0.42, label = "Tangency Portfolio") +
  annotate('text', x = 0.18, y = 0.01, label = "Minimum variance portfolio") +
  annotate(geom = 'segment', x = 0.14, xend = 0.135,  y = 0.01, 
           yend = 0.06, color = 'red', arrow = arrow(type = "open")) +
  annotate(geom = 'segment', x = 0.22, xend = 0.2275,  y = 0.405, 
           yend = 0.365, color = 'red', arrow = arrow(type = "open"))
## Scale for 'y' is already present. Adding another scale for 'y', which will
## replace the existing scale.
ggplotly(p)

In the chart above we can observe all the 5000 portfolios. As mentioned above, a risk averse investor will demand a highest return for a given level of risk. In other words he/she will try to obtain portfolios that lie on the efficient frontier.