# #| include: false
import yfinance as yf
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import iguizarFuncs as igPortfolio Analysis
Disclaimer: This document is intended for educational purposes only. It does not constitute financial advice. It is part of my financial modeling course at Tec de Monterrey.
1 Intro
Portfolio theory, developed by Markowitz (1952), provides a framework for how investors can distribute their wealth to achieve optimal outcomes, while balancing the desire to maximize expected returns and to minimize risk. It remains a cornerstone of modern investment strategies. The objective of this document is to illustrate the principles that underpin this theory by developing an algorithm to obtain optimal portfolios via simulations, starting with a set of only two stocks and later generalizing it to \(N>2\) risky assets. The emphasis here is on the applications, the conceptual framework is covered in standard investment textbooks such as Elton et al. (2014).
2 Two risky assets
An investor has to decide how to distribute her wealth into two risky assets, A (with returns \(R_A\)) and B (with returns \(R_B\)). The proportion of wealth allocated to each asset is \(\omega_A\) and \(\omega_B\), respectively. The return on the portfolio is then a random variable \(R_p = \omega_A R_A + \omega_BR_B\).
The expected return on the portfolio is:
\[ \mu_p = E(R_p)= \omega_A \mu_A + \omega_B \mu_B \]
which in matrix form can be written as:
\[ \mathbf{\mu_p=\omega'\mu} \]
where \(\mu' = [\mu_A \ \ \mu_B]\) is the vector of expected returns and \(\omega' = [\omega_A \ \ \omega_B]\) is the vector of portfolio weights.
The variance of the return on the portfolio is: \[ \sigma^2_p = Var(R_p)= \omega^2_A\sigma^2_A + \omega^2_B\sigma^2_B + 2\omega_A \omega_B \sigma_{AB} \] where \(\sigma_{AB}\) is the covariance of \(R_A\) and \(R_B\). It follows that the correlation coefficient between \(R_A\) and \(R_B\) is \(\rho=\frac{\sigma_{AB}}{\sigma_A \sigma_B}\).
In matrix form:
\[ \mathbf{\sigma^2_p}= \omega' \Sigma\omega \]
where \(\Sigma = \begin{bmatrix}\sigma^2_A & \sigma_{AB} \\\sigma_{BA} & \sigma^2_B \end{bmatrix}\) is the variance-covariance matrix.
To maintain the measure of risk in the same units as the expected returns, we use the standard deviation instead of the variance, \(\sigma_p=[\sigma_p^2]^{1/2}\).
Determine an optimal portfolio using historical returns of two stocks: Unitedhealth Group (UNH) and Wells Fargo (WFC). To facilitate the analysis use monthly instead of daily data.
Get data:
tickers = ['UNH','WFC']
start_date = '2022-09-30'
end_date = '2024-09-30'
Data = ig.get_returns(tickers, start_date, end_date)
Data['Date'] = pd.to_datetime(Data['Date'])
monthly_rets = Data.resample('ME', on='Date')[tickers].sum().reset_index()
monthly_rets['Date'] = pd.to_datetime(monthly_rets['Date']).dt.dateCalculate the return, risk, and correlation:
mu_rets = monthly_rets[tickers].mean()
round(100*mu_rets,2)Ticker
UNH 0.71
WFC 1.61
dtype: float64
std_rets = monthly_rets[tickers].std()
round(std_rets*100,2)Ticker
UNH 4.82
WFC 8.63
dtype: float64
cov_matrix = monthly_rets[tickers].cov()
corr_matrix = monthly_rets[tickers].corr()
plt.figure(figsize=(7, 5))
sns.heatmap(corr_matrix,annot=True,cmap='Blues')
plt.show()We can simulate several portfolios where the investor can choose from, for instance:
omega = np.array([0.5, 0.5])
# return
mu_p = np.dot(omega, mu_rets)
round(mu_p*100,2)1.16
# risk
sigma_p = np.sqrt(np.dot(omega.T, np.dot(cov_matrix, omega)))
round(sigma_p*100,2)5.19
Note that the expected return is exactly half-way between the expected returns on the two assets, but the risk is less than half between the standard deviations. This illustrates the benefits of diversification.
omega = np.array([0.2, 0.8])
# return
mu_p = np.dot(omega, mu_rets)
round(mu_p*100,2)1.43
# risk
sigma_p = np.sqrt(np.dot(omega.T, np.dot(cov_matrix, omega)))
round(sigma_p*100,2)7.08
omega = np.array([0.8, 0.2])
# return
mu_p = np.dot(omega, mu_rets)
round(mu_p*100,2)0.89
# risk
sigma_p = np.sqrt(np.dot(omega.T, np.dot(cov_matrix, omega)))
round(sigma_p*100,2)4.41
Each of these portfolios are feasible.
2.1 Investment opportunity set
There is not limit to the number of portfolios the analyst can simulate, as long as \(\omega_A + \omega_B =1\). The entire set of feasible portfolios is called the Investment Opportunity Set
num_assets = len(tickers)
simul_portfolios = 1000
portfolios = pd.DataFrame(index=range(simul_portfolios), columns= tickers + ['port_return', 'port_risk'])
np.random.seed(42)
for port in range(simul_portfolios):
omega = np.random.random(num_assets)
omega /= np.sum(omega)
ret = np.dot(omega, mu_rets)
risk = np.sqrt(np.dot(omega.T, np.dot(cov_matrix, omega)))
portfolios.loc[port, tickers] = omega
portfolios.at[port, 'port_return'] = ret
portfolios.at[port, 'port_risk'] = riskGraphically:
plt.figure(figsize=(7, 5))
plt.scatter(portfolios['port_risk'], portfolios['port_return'], color='blue', marker='o', s = 10, alpha=0.7)
plt.scatter(std_rets.iloc[0], mu_rets.iloc[0], color='c', marker='o', s=30)
plt.scatter(std_rets.iloc[1], mu_rets.iloc[1], color='olive', marker='o', s=30)
plt.title(f'Investment Opportunity Set (n = {simul_portfolios})')
plt.xlabel('Risk (SD)')
plt.ylabel('Expected Return')
plt.grid(True, linestyle=':', alpha=0.25)
plt.tight_layout()
plt.show()How does the correlation between the returns shape the investment opportunity set?
rho = 1
cov_matrix2 = cov_matrix.copy()
std_devs = np.sqrt(np.diag(cov_matrix))
cov_matrix2.iloc[0,1] = rho*std_devs[0]*std_devs[1]
cov_matrix2.iloc[1,0] = cov_matrix2.iloc[0,1]
portfolios2 = pd.DataFrame(index=range(simul_portfolios), columns= tickers + ['port_return', 'port_risk'])
np.random.seed(42)
for port in range(simul_portfolios):
omega = np.random.random(num_assets)
omega /= np.sum(omega)
ret = np.dot(omega, mu_rets)
risk = np.sqrt(np.dot(omega.T, np.dot(cov_matrix2, omega)))
portfolios2.loc[port, tickers] = omega
portfolios2.at[port, 'port_return'] = ret
portfolios2.at[port, 'port_risk'] = risk
plt.figure(figsize=(7, 5))
plt.scatter(portfolios2['port_risk'], portfolios2['port_return'], color='blue', marker='o', s = 10, alpha=0.7)
plt.scatter(std_rets.iloc[0], mu_rets.iloc[0], color='c', marker='o', s=30)
plt.scatter(std_rets.iloc[1], mu_rets.iloc[1], color='olive', marker='o', s=30)
plt.title(f'Investment Opportunity Set (rho = {rho})')
plt.xlabel('Risk (SD)')
plt.ylabel('Expected Return')
plt.grid(True, linestyle=':', alpha=0.25)
plt.tight_layout()
plt.show()rho = 0
cov_matrix2 = cov_matrix.copy()
std_devs = np.sqrt(np.diag(cov_matrix))
cov_matrix2.iloc[0,1] = rho*std_devs[0]*std_devs[1]
cov_matrix2.iloc[1,0] = cov_matrix2.iloc[0,1]
portfolios2 = pd.DataFrame(index=range(simul_portfolios), columns= tickers + ['port_return', 'port_risk'])
np.random.seed(42)
for port in range(simul_portfolios):
omega = np.random.random(num_assets)
omega /= np.sum(omega)
ret = np.dot(omega, mu_rets)
risk = np.sqrt(np.dot(omega.T, np.dot(cov_matrix2, omega)))
portfolios2.loc[port, tickers] = omega
portfolios2.at[port, 'port_return'] = ret
portfolios2.at[port, 'port_risk'] = risk
plt.figure(figsize=(7, 5))
plt.scatter(portfolios2['port_risk'], portfolios2['port_return'], color='blue', marker='o', s = 10, alpha=0.7)
plt.scatter(std_rets.iloc[0], mu_rets.iloc[0], color='c', marker='o', s=30)
plt.scatter(std_rets.iloc[1], mu_rets.iloc[1], color='olive', marker='o', s=30)
plt.title(f'Investment Opportunity Set (rho = {rho})')
plt.xlabel('Risk (SD)')
plt.ylabel('Expected Return')
plt.grid(True, linestyle=':', alpha=0.25)
plt.tight_layout()
plt.show()rho = -1
cov_matrix2 = cov_matrix.copy()
std_devs = np.sqrt(np.diag(cov_matrix))
cov_matrix2.iloc[0,1] = rho*std_devs[0]*std_devs[1]
cov_matrix2.iloc[1,0] = cov_matrix2.iloc[0,1]
portfolios2 = pd.DataFrame(index=range(simul_portfolios), columns= tickers + ['port_return', 'port_risk'])
np.random.seed(42)
for port in range(simul_portfolios):
omega = np.random.random(num_assets)
omega /= np.sum(omega)
ret = np.dot(omega, mu_rets)
risk = np.sqrt(np.dot(omega.T, np.dot(cov_matrix2, omega)))
portfolios2.loc[port, tickers] = omega
portfolios2.at[port, 'port_return'] = ret
portfolios2.at[port, 'port_risk'] = risk
plt.figure(figsize=(7, 5))
plt.scatter(portfolios2['port_risk'], portfolios2['port_return'], color='blue', marker='o', s = 10, alpha=0.7)
plt.scatter(std_rets.iloc[0], mu_rets.iloc[0], color='c', marker='o', s=30)
plt.scatter(std_rets.iloc[1], mu_rets.iloc[1], color='olive', marker='o', s=30)
plt.title(f'Investment Opportunity Set (rho = {rho})')
plt.xlabel('Risk (SD)')
plt.ylabel('Expected Return')
plt.grid(True, linestyle=':', alpha=0.25)
plt.tight_layout()
plt.show()2.2 Portfolio of minimum risk
Of particular importance is the portfolio of minimum risk, which can be found by minimizing the equation for the variance. For only two risky assets, this is:
\[ \omega_A^{min} = \frac{\sigma^2_B-\sigma_{AB}}{\sigma^2_A+\sigma^2_B-2\sigma_{AB}} \]
\[ \omega_B^{min}= 1- \omega_A^{min} \]
Or we can simply select the row where the standard deviation is the lowest:
min_risk = portfolios.loc[portfolios['port_risk'].idxmin()]
min_risk*100UNH 79.129108
WFC 20.870892
port_return 0.901146
port_risk 4.409023
Name: 75, dtype: object
2.3 Efficient portfolios of risky assets
The investor would prefer portfolios that maximize the expected returns for a given level of risk. This subset of portfolios, shown in orange in Figure 1, is the efficient frontier.
2.4 Adding a risk-free asset
A natural extension of the previous analysis is to allow the investor to form portfolios that include a risk-free asset. Denote the expected return from this asset as \(r_f\), by definition the variance on its returns is zero. The investment opportunity set that combines the risky asset \(i = \{A,B\}\) with a risk-free asset is characterized by:
\[ \mu_p = \omega_i \mu_i + w_f\ r_f \]
and
\[ \sigma_p = \omega_i \sigma_i \]
By substituting the risk equation into the returns equation, where we have used the fact that \(\omega_f = 1-\omega_i\), we arrive to:
\[ \mu_p = r_f + \sigma_p\frac{(\mu_i-r_f)}{\sigma_i} \]
This is a straight line, where \(r_f\) is the intercept and \(\Big[\frac{\mu_i -r_f}{\sigma_i} \Big]\) is the slope, which in this context, represents the risk-premium on asset \(i\) per unit of risk, that is the Sharpe Ratio.
Add a risk-free asset to the portfolio of risky assets, using the U.S. Treasury Securities at 1-Month Constant Maturity (DGS1MO).
Obtain the risk-free rate of return on the Treasury bill:
# !pip install pandas-datareader
from pandas_datareader import data as pdr
tbill = pdr.DataReader('DGS1MO', 'fred', start_date, end_date)
tbill = tbill.reset_index()
month_tbill = tbill.resample('ME', on='DATE')['DGS1MO'].mean().reset_index()
rf = month_tbill['DGS1MO'].mean()/100/12
print(f"The monthly risk-free rate is: {100*rf:.2f}%")The monthly risk-free rate is: 0.42%
Obtain the Sharpe ratio:
sharpe1 = (mu_rets.iloc[0] - rf) / std_rets.iloc[0]
sharpe2 = (mu_rets.iloc[1] - rf) / std_rets.iloc[1]
print(f" The Sharpe ratio of portfolios that combine {tickers[0]} with the tbill is: {sharpe1:.3f}, \n while the mixture of {tickers[1]} with the tbill renders: {sharpe2:.3f} ") The Sharpe ratio of portfolios that combine UNH with the tbill is: 0.062,
while the mixture of WFC with the tbill renders: 0.139
We would prefer to invest only in the assets where the Sharpe’s ratio is higher. As shown in the Figure 2, these are portfolios with higher expected return, at any level of risk, than the set with a lower Sharpe ratio.
2.5 Tangency portfolio
An even better choice would combine the risk-free asset with a mixture of risky assets that yield the highest possible Sharpe ratio. This is the tangency portfolio:
portfolios['sharpe'] = (portfolios['port_return'] - rf) / portfolios['port_risk']
tangency = portfolios.loc[portfolios['sharpe'].idxmax()]
values = pd.to_numeric(tangency)
100*values.round(4)UNH 38.20
WFC 61.80
port_return 1.27
port_risk 5.84
sharpe 14.60
Name: 51, dtype: float64
Graphically, this is the tangent:
tl_x = np.linspace(0, tangency['port_risk']*(1.1), 100)
tl_y = rf + (tangency['port_return'] - rf) / tangency['port_risk'] * tl_x
plt.figure(figsize=(7, 5))
plt.scatter(portfolios['port_risk'], portfolios['port_return']-0.0001, color='blue', marker='o', s=1, alpha=0.7)
plt.scatter(min_risk.iloc[3], min_risk.iloc[2], color='y', marker='o', s=50)
plt.scatter(tangency.iloc[3], tangency.iloc[2], color='purple', marker='o', s=60)
plt.plot(tl_x, tl_y, color='orange', linewidth=2)
plt.plot(tl_x1, tl_y1, color='c', linewidth=2, alpha = 0.25)
plt.plot(tl_x2, tl_y2, color='olive', linewidth=2, alpha = 0.25)
plt.title(f'Investment Opportunity Set')
plt.xlabel('Risk')
plt.ylabel('Expected Return')
plt.grid(True, linestyle=':', alpha=0.25)
plt.tight_layout()
plt.show()Analytically, the values of the tangency portfolio are found by maximizing the Sharpe ratio. With only two assets, the solution is:
\[ \omega^{tan}_A = \frac{\pi_A\sigma^2_B-\pi_B\sigma_{AB}}{\pi_A\sigma^2_B + \pi_B\sigma^2_A-(\pi_A+\pi_B)\sigma_{AB}} \] and
\[ \omega^{tan}_B = 1 - \omega^{tan}_A \] where \(\pi_i = \mu_i - r_f\) is the risk premium on asset \(i\).
3 More than two risky assets
Determine an optimal portfolio using historical returns of all the stocks that were selected using the Momentum investment strategy studied in the previous topic. Recall the file saved was named “selectStocks.xslx”. Continue using monthly instead of daily data.
filtered = pd.read_excel('selectStocks.xlsx')
tickers = filtered['Ticker'].tolist()
start_date = '2022-09-30'
end_date = '2024-09-30'Calculate the expected returns, risk, and the variance-covariance matrix:
Data = ig.get_returns(tickers, start_date, end_date)
Data['Date'] = pd.to_datetime(Data['Date'])
monthly_rets = Data.resample('ME', on='Date')[tickers].sum().reset_index()
monthly_rets['Date'] = pd.to_datetime(monthly_rets['Date']).dt.date
mu_rets = monthly_rets[tickers].mean()
std_rets = monthly_rets[tickers].std()One should notice that the expected return for some of the selected stocks were negative during the period. While it’s perfectly valid to mantain those in the portolio, I will simplify the analysis by considering only those with positive expected returns:
tickers = mu_rets[mu_rets > 0].index.tolist()
tickers['BAC', 'BRK-A', 'BRK-B', 'CMCSA', 'ELV', 'MS', 'PANW', 'PG', 'QCOM', 'VRTX']
tickers = mu_rets[mu_rets > 0].index.tolist()
mu_rets = mu_rets[tickers]
std_rets = std_rets[tickers]The correlation matrix helps us to identify diversification opportunities:
cov_matrix = monthly_rets[tickers].cov()
corr_matrix = monthly_rets[tickers].corr()
mask = np.triu(np.ones_like(corr_matrix, dtype=bool)) # upper triangle
plt.figure(figsize=(7, 5))
ax = sns.heatmap(
corr_matrix,
annot=True,
cmap='Blues',
# mask=mask,
annot_kws={"size": 8},
cbar_kws={"shrink": 0.75}
)
ax.set_xticklabels(ax.get_xticklabels(), fontsize=8)
ax.set_yticklabels(ax.get_yticklabels(), fontsize=8)
# Display the plot
plt.show()3.1 Investment opportunity set
num_assets = len(tickers)
simul_portfolios = 1000
portfolios = pd.DataFrame(index=range(simul_portfolios), columns= tickers + ['port_return','port_risk'])
np.random.seed(42)
for port in range(simul_portfolios):
omega = np.random.random(num_assets)
omega /= np.sum(omega)
ret = np.dot(omega, mu_rets)
risk = np.sqrt(np.dot(omega.T, np.dot(cov_matrix, omega)))
portfolios.loc[port, tickers] = omega
portfolios.at[port, 'port_return'] = ret
portfolios.at[port, 'port_risk'] = riskplt.figure(figsize=(7, 5))
plt.scatter(portfolios['port_risk'], portfolios['port_return'], color='blue', marker='o', s = 20, alpha=0.5)
plt.title(f'Investment Opportunity Set (n = {simul_portfolios})')
plt.xlabel('Risk (SD)')
plt.ylabel('Expected Return')
plt.grid(True, linestyle=':', alpha=0.25)
plt.tight_layout()
plt.show()3.2 Portfolio of minimum risk
min_risk = portfolios.loc[portfolios['port_risk'].idxmin()]
val_risk = pd.to_numeric(min_risk)
round(val_risk*100,2)BAC 0.84
BRK-A 1.56
BRK-B 22.56
CMCSA 3.47
ELV 13.25
MS 10.23
PANW 8.64
PG 22.39
QCOM 0.54
VRTX 16.52
port_return 1.79
port_risk 3.87
Name: 201, dtype: float64
3.3 Tangency portfolio
Select the row where the Sharpe ratio is the highest:
portfolios['sharpe'] = (portfolios['port_return'] - rf) / portfolios['port_risk']
tangency = portfolios.loc[portfolios['sharpe'].idxmax()]
values = pd.to_numeric(tangency)
100*values.round(4)BAC 1.07
BRK-A 3.48
BRK-B 27.89
CMCSA 8.91
ELV 3.26
MS 3.03
PANW 20.34
PG 24.16
QCOM 7.27
VRTX 0.59
port_return 2.07
port_risk 4.07
sharpe 40.50
Name: 780, dtype: float64
Show the investment opportunity set:
tl_x = np.linspace(0, tangency['port_risk']*(1.1), 100)
tl_y = rf + (tangency['port_return'] - rf) / tangency['port_risk'] * tl_x
plt.figure(figsize=(7, 5))
sc = plt.scatter(portfolios['port_risk'], portfolios['port_return']-0.001, c=portfolios['sharpe'], cmap='plasma_r', marker='o', s=20, alpha=0.7)
plt.colorbar(sc)
plt.scatter(min_risk['port_risk'], min_risk['port_return'], color='r', marker='o', s=50)
plt.scatter(tangency['port_risk'], tangency['port_return'], color='purple', marker='o', s=60, label='Tangency')
plt.plot(tl_x, tl_y, color='orange', linewidth=2)
plt.title('Investment Opportunity Set')
plt.xlabel('Risk (Standard Deviation)')
plt.ylabel('Expected Return')
plt.grid(True, linestyle=':', alpha=0.25)
plt.tight_layout()
plt.show()3.4 Efficient portfolios
The efficient portfolios lie on the tangent, which, as previously stated, it is characterized by:
\[ \mu_p = r_f + \omega_{tan} (\mu^{tan}_p - r_f) \]
and
\[ \sigma_p = \omega_{tan} \sigma^{tan}_p \]
which is:
\[ \mu_p = r_f + \sigma_p\frac{(\mu^{tan}_p-r_f)}{\sigma^{tan}_p} \]
We can now define our target level of risk or expected returns, and derive the optimal investment.
<Figure size 672x480 with 0 Axes>
What would occur to the investment opportunity set if we continue adding stocks?
4 A simple risk analysis under normality assumptions
4.1 Recall: Value at risk
While the standard deviation is a useful measure of risk, it can be less intuitive to interpret. Value at Risk (VaR) addresses the issue by summarizing risk in a single, easy-to-understand number. It answers the question:
What loss level is such that we are X% confident it will not be exceeded in N business days?
The choice of confidence level and time horizon is determined by the analyst. For financial institutions, Basel II suggested capital requirements for credit risk based on a one-year 99.9% VaR. For market risk, the capital requirement is derived from multiples of a 10-day horizon at a 99% confidence level.
The simplified assumption that individual security returns are normally distributed is convenient because, since a linear combination of jointly normal random variables is normally distributed, it guarantees the portfolio return is also normally distributed.
The VaR at X% level of confidence is:
\[ VaR(X\%)= \mu_p + \sigma_p \Phi^{-1}(X\%) \]
where \(\Phi(\cdot)\) is the standard normal standard distribution. What is the same:
\[ VaR(X\%) = N^{-1}(X\%) \] where \(N(\cdot)\) is the normal distribution with mean \(\mu_p\) and standard deviation \(\sigma_p\).
Consider an investment of $128,000.00 in the tangency portfolio, estimate and interpret the Value at Risk of the investment.
from scipy.stats import norm
mu = tangency['port_return']
sigma = tangency['port_risk']
# Define the confidence level
cl = 0.99
VaR = mu + sigma*norm.ppf(1 - cl)
print(f"The {cl * 100:.1f}% VaR for the tangency portfolio over one month is: {-(np.exp(VaR)-1):.2%}")The 99.0% VaR for the tangency portfolio over one month is: 7.15%
Interpretation: We are 99% confident that the tangency portfolio will not lose more than 8.94% over one month.
In monetary terms:
W = 128000
VaR2 = W*(np.exp(VaR)-1)
print(f"The VaR on the tangency portfolio over one month is: ${-VaR2:.2f}")The VaR on the tangency portfolio over one month is: $9145.65
4.2 Recall: Expected shortfall
Expected shortfall asks the question:
If things do get bad, what is the expected loss?
The expected shortfall, also called conditional value at risk, indicates the expected loss over a specified time period conditional on the loss being greater than the \(Xth\) percentile of the loss distribution.
While, VaR is the loss level that will not be exceeded with a specified probability, Expected shortfall is the expected loss given that the loss is greater than the VaR level (also called C-VaR and Tail Loss).
For \(\mu>0\) and under the normality assumption, this is:
\[ ES(X\%) = \mu_{p} - \sigma_p \cdot \frac{\phi \Big(\Phi^{-1}(1-X\%)\Big)}{1-X\%} \]
z = norm.ppf(1-cl)
ES = mu - sigma * norm.pdf(z) / (1-cl)
es_per = round((np.exp(ES)-1) * 100, 2)
print(f"The Expected Shortfall (ES) is: {-es_per}%")The Expected Shortfall (ES) is: 8.42%