The Portfolio Problem
The decision-making problem is denoted by a decision vector \(\mathbf{d}\) and future realization of the underlying assets denoted by \(\mathbf{y}\). The former denotes the portfolio weights allocated to risky \(N\) assets, whereas the latter denotes the asset returns. The two vectors determine the wealth/reward of the investor, which is uncertain. It is commonly assumed that the parameters of the \(\mathbf{y}\) are known denoted by \(\boldsymbol{\theta}\), such that the density of \(\mathbf{y}\) is known. Nonetheless, in practice, these parameters are unknown, and their estimation is subjected to uncertainty. As a result, the decision process takes into account the conditional distribution of \(\mathbf{y}\mid\boldsymbol{\theta}\). In this regard, if we know the conditional distribution, then the decision-making process boils down to maximizing the following conditional expected utility: \[
\underset{\mathbf{d}}{\text{max}}\,\mathbb{E}\left[U(\mathbf{y}^{\top}\mathbf{d})\mid\boldsymbol{\theta}\right]
\]
The standard solution to the optimization problem replaces the parameters with the sample estimates and solves for optimal portfolio. This is known as the plug-in approach. However, it is well-recognized that such practice is associated with estimation risk and poor out-of-sample portfolio performance. Additionally, the derivation of this portfolio is inconsistent with utility theory (see, e.g., Klein and Bawa (1976)).
Uncertainty about \(\boldsymbol{\theta}\) can be taken into account employing Bayes’ decision criterion. According to the Bayes’ rule, we know that the posterior is proportional to the product between the likelihood function and the prior, i.e., \[\begin{equation}
p(\boldsymbol{\theta}\mid\mathbf{X},\boldsymbol{\theta}_{0})\propto p(\mathbf{X}\mid\boldsymbol{\theta})\pi(\boldsymbol{\theta})
\end{equation}\] where \(\mathbf{X}\) and \(\pi(\boldsymbol{\theta})\) denote, respectively, the sample-related information and the prior distribution. Here, \(\boldsymbol{\theta}_{0}\) denotes the beliefs about the unknown parameters.
Optimal Portfolio
Under the above assumptions, the posterior distribution \(p(\boldsymbol{\theta}\mid\mathbf{X},\boldsymbol{\theta}_{0})\) is known. We can determine the expected utility and, hence, the optimal portfolio accordingly. Under CARA utility, we have \[\begin{equation}
U(w)=-\frac{1}{\gamma}e^{-\gamma w}
\end{equation}\] with \(w=\mathbf{d}^{\top}\mathbf{y}\) denote the terminal wealth for allocation \(\mathbf{d}\). According to the tower property, we know that \[\begin{equation}
\mathbb{E}[U(w)]=\mathbb{E}_{\boldsymbol{\theta}}\left[\mathbb{E}[U(w)\mid\boldsymbol{\theta}]\right]
\end{equation}\] where \(\mathbb{E}_{\boldsymbol{\theta}}\) denotes the expectation condition on observing data \(\mathbf{X}\) and prior \(\boldsymbol{\theta}_{0}\).
If the parameters are known, then from the moment generating function of a Gaussian variable, we have \[\begin{equation}
\mathbb{E}[U(w)\mid\boldsymbol{\theta}]=-\frac{1}{\gamma}\mathbb{E}[e^{-\gamma\mathbf{d}^{\top}\mathbf{y}}\mid\boldsymbol{\theta}]=-\frac{1}{\gamma}e^{-\gamma\mathbf{d}^{\top}\boldsymbol{\mu}+\frac{\gamma^{2}}{2}\mathbf{d}^{\top}\boldsymbol{\Sigma}\mathbf{d}}
\end{equation}\] Finally, the expected utility function depends on finding the expectation of \[\begin{equation}
\mathbb{E}[U(w)]=-\frac{1}{\gamma}\mathbb{E}_{\boldsymbol{\theta}}\left[e^{-\gamma\mathbf{d}^{\top}\boldsymbol{\mu}+\frac{\gamma^{2}}{2}\mathbf{d}^{\top}\boldsymbol{\Sigma}\mathbf{d}}\right]
\end{equation}\] which requires knowing the posterior distribution \(p(\boldsymbol{\theta}\mid\mathbf{X},\boldsymbol{\theta}_{0})\). Alternatively, we can consider the predictive density function of \(\mathbf{y}\). Under the above assumptions, \(\mathbf{y}\mid\mathbf{X},\boldsymbol{\theta}_{0}\) follows a multivariate student-t distribution. Finding the expectation of which can be obtained numerically.
Known \(\Sigma\)
While a numerical solution provides a more accurate approximation, it comes at the cost of intuition. To gain further insights about the updating process, we make one simplification. We assume that the decision-maker knows the covariance matrix without any uncertainty. In this regards, we have \(\boldsymbol{\Sigma}=\boldsymbol{\Sigma}_{0}\). Therefore, the expected utility simplifies to \[\begin{equation}
\mathbb{E}[U(w)]=-\frac{1}{\gamma}e^{\frac{\gamma^{2}}{2}\mathbf{d}^{\top}\boldsymbol{\Sigma}\mathbf{d}}\mathbb{E}_{\boldsymbol{\theta}}\left[e^{-\gamma\mathbf{d}^{\top}\boldsymbol{\mu}}\right].
\end{equation}\]
In this case, we know that \[\begin{equation}
\boldsymbol{\mu}\mid\mathbf{X},\boldsymbol{\mu}_{0}\sim N\left(\boldsymbol{\Sigma}_{1}\boldsymbol{\Sigma}_{0}^{-1}\boldsymbol{\mu}_{0}+\boldsymbol{\Sigma}_{1}T\boldsymbol{\Sigma}_{0}^{-1}\bar{\mathbf{y}},\boldsymbol{\Sigma}_{1}\right)
\end{equation}\] with \[\begin{equation}
\boldsymbol{\Sigma}_{1}^{-1}=\boldsymbol{\Sigma}_{0}^{-1}+T\boldsymbol{\Sigma}^{-1}
\end{equation}\] Since \(\boldsymbol{\Sigma}=\boldsymbol{\Sigma}_{0}\), then \[\begin{equation}
\boldsymbol{\Sigma}_{1}=\frac{1}{1+T}\boldsymbol{\Sigma}_{0}
\end{equation}\] and \[\begin{equation}
\boldsymbol{\Sigma}_{1}\boldsymbol{\Sigma}_{0}^{-1}\boldsymbol{\mu}_{0}+\boldsymbol{\Sigma}_{1}T\boldsymbol{\Sigma}_{0}^{-1}\bar{\mathbf{y}}=\frac{1}{T}\boldsymbol{\mu}_{0}+\frac{T}{1+T}\bar{\mathbf{y}}
\end{equation}\] with \(T\) denoting the sample size. As a result, we have \[\begin{equation}
\boldsymbol{\mu}\mid\mathbf{X},\boldsymbol{\mu}_{0}\sim N\left(\alpha\boldsymbol{\mu}_{0}+(1-\alpha)\bar{\mathbf{y}},\frac{1}{1+T}\boldsymbol{\Sigma}_{0}\right)
\end{equation}\] with
\[\begin{equation}
\alpha=\frac{1}{1+T}\in(0,1)\:\text{for }T>0.
\end{equation}\] The expected utility is given by \[\begin{equation}
\mathbb{E}[U(w)]=-\frac{1}{\gamma}\text{exp}\left(-\gamma\mathbf{d}^{\top}\left[\alpha\boldsymbol{\mu}_{0}+(1-\alpha)\bar{\mathbf{y}}\right]+(1+\alpha)\frac{\gamma^{2}}{2}\mathbf{d}^{\top}\boldsymbol{\Sigma}_{0}\mathbf{d}\right).
\end{equation}\] Applying a positive transformation on the expected utility, its optimization is consistent with the following mean-variance (MV) preference: \[\begin{equation}
\mathbf{d}^{\top}\left[\alpha\boldsymbol{\mu}_{0}+(1-\alpha)\bar{\mathbf{y}}\right]-\frac{\gamma}{2}(1+\alpha)\mathbf{d}^{\top}\boldsymbol{\Sigma}_{0}\mathbf{d}
\end{equation}\]
Consider the case of full information in which \(T\rightarrow\infty\) such that \(\alpha\rightarrow0\). Under this extreme scenario, it is clear that the Bayesian portfolio problem converges to the conventional MV optimization. For a finite sample size, \(\alpha\) denotes the ambiguity about the mean return. The larger it is, the more variance the decision market assigns to the portfolio variance and the greater weight is attributed to the prior.
Under the constraint that \(\mathbf{d}^{\top}\mathbf{1}=1\), the optimal portfolio is given by \[\begin{equation}
\mathbf{d}^{*}=\mathbf{d}_{g}+\frac{1}{1+\alpha}\mathbf{B}[\alpha\boldsymbol{\mu}_{0}+(1-\alpha)\bar{\mathbf{y}}]
\end{equation}\] with \[\begin{equation}
\mathbf{d}_{g}=\frac{\boldsymbol{\Sigma}_{0}^{-1}\mathbf{1}}{\mathbf{1}^{\top}\boldsymbol{\Sigma}_{0}^{-1}\mathbf{1}}
\end{equation}\] denoting the global minimum variance (GMV) portfolio and \[\begin{equation}
\mathbf{B}=\boldsymbol{\Sigma}_{0}^{-1}\left[\mathbf{I}-\mathbf{1}\mathbf{d}_{g}^{\top}\right]
\end{equation}\] is a centering matrix around the GMV portfolio (see, e.g., Simaan et al. (2018)).
It is clear that the MV optimal portfolio consists of three funds. The first is the GMV. The second (third) is the arbitrage portfolio that depends on the prior (sample) mean return: \[\begin{equation}
\mathbf{d}^{*}=\mathbf{d}_{g}+\frac{\alpha}{1+\alpha}\mathbf{B}\boldsymbol{\mu}_{0}+\frac{1-\alpha}{1+\alpha}\mathbf{B}\bar{\mathbf{y}}.
\end{equation}\] To simplify, let \(\mathbf{B}_{\alpha}=(1+\alpha)^{-1}\mathbf{B}\), such that
\[\begin{equation}
\mathbf{d}^{*}=\mathbf{d}_{g}+\alpha\mathbf{B}_{\alpha}\boldsymbol{\mu}_{0}+(1-\alpha)\mathbf{B}_{\alpha}\bar{\mathbf{y}}.
\end{equation}\]
We observe that the Bayesian MV portfolio is a convex combination between two portfolios. Specifically, for \(\alpha\in(0,1)\) it follows that \[\begin{equation}
\mathbf{d}^{*}=\alpha\mathbf{d}^{*}(\boldsymbol{\mu}_{0})+\left[1-\alpha\right]\mathbf{d}^{*}(\bar{\mathbf{y}})
\end{equation}\] with
\[\begin{equation}
\mathbf{d}^{*}(\mathbf{u})=\mathbf{d}_{g}+\mathbf{B}_{\alpha}\mathbf{u},
\end{equation}\] denoting a linear function of vector \(\mathbf{u}\).
Empirical Example
Let us start with a simple numerical example. To do so, I will be working with two ETFs. The first one is the SPY ETF denoting the stock market portfolio, whereas the other is the IEF denoting the return on 7-10 years Treasury bonds.
library(quantmod)
library(lubridate)
v <- c("SPY","IEF")
t1 <- "1996-01-01" # the starting date
P.list <- lapply(v,function(x) get(getSymbols(x,from = t1)) )
P <- lapply(P.list,function(x) x[,grep("Adjusted",names(x))])
P <- Reduce(function(...) merge(...),P )
names(P) <- v
P <- apply.monthly(P,function(x) x[nrow(x),] )
R <- na.omit(P/lag(P)-1)
R <- R["/2020-12-31",]
We consider the returns on both ETFs up to late 2020. Note that the inception of the IEF took place during the second half of 2002. For this reason, the data dates between
range(date(R))
[1] "2002-08-30" "2020-12-31"
The key to applying the Bayesian portfolio is to construct several matrices and vectors. We are also interested in evaluating the portfolio weights over time and, hence, its performance. To do so, we perform a simple backtesting procedure. At the end of each month, we consider the recent \(T\) months of data as our sample.
d <- ncol(R)
T_sample <- 12*2
month_series <- unique(ceiling_date(date(R),"m") - 1)
month_series <- month_series[year(month_series) > 2004] # keep two years for initial estimation
d_star_mat <- data.frame()
for (month_i in month_series) {
alpha <- 1/(1+T_sample)
e <- rep(1,d) # vector of ones
I_mat <- diag(d) # diagonal matrix
R_sub <- R[date(R) <= month_i,]
R_sub <- tail(R_sub,T_sample) # sample
y_bar <- apply(R_sub,2,mean) # sample mean vector
mu_0 <- c(0.11,0.06)
Sigma <- var(R_sub) # covariance matrix
Sigma_inv <- solve(Sigma) # covariance matrix inverse
d_g <- Sigma_inv%*%e/sum(Sigma_inv)
B_mat <- Sigma_inv%*%(I_mat - e%*%t(d_g) )
B_mat_a <- (1/(1+alpha))*B_mat
d_star <- d_g + alpha*B_mat_a%*%mu_0 + (1-alpha)*B_mat_a%*%y_bar
d_star <- exp(d_star)/(1+exp(d_star))
d_star <- d_star/sum(d_star)
d_star_mat <- rbind(d_star_mat,t(d_star))
}
A number of comments are in order. First, we note that the above procedure loops through all unique months in the data one by one. This application allows updating the sample estimates on a monthly basis. Second, the prior above corresponds to the total sample estimates of the mean return of each ETF. This approach is subjected to data snooping (look-ahead bias), since such information is only revealed after the fact. However, we do so for the sake of illustration. In the following discussion, we consider the sensitivity of the results to different priors. Third, it is well-recognized that the MV portfolio weights result in extreme positions that are considered unrealistic. To mitigate this, we consider a simple logit transformation, which is applied using these two commands:
d_star <- exp(d_star)/(1+exp(d_star))
d_star <- d_star/sum(d_star)
The second command normalizes the portfolio weights, such that its weights sum to one. While this may seem like an ad-hoc approach, this transformation results in a lower portfolio norm and is consistent with DeMiguel et al. (2009).
The plot below illustrates the weight allocated to the SPY ETF over time. We observe that the weight is above 80% the majority of the time. The weight decreases during market selloffs such as the 2007-09 financial crisis, late 2016, and COVID-19.
rownames(d_star_mat) <- month_series
d_star_mat <- as.xts(d_star_mat)
plot(d_star_mat[,1],main = "Weight to SPY")

Let us evaluate the portfolio performance in terms of return over time
R_out <- R[year(date(R)) > 2004 ,]
R_out <- as.matrix(R_out[-1,])
W_in <- as.matrix(d_star_mat[-nrow(d_star_mat)])
r_port_oos <- apply(W_in*R_out,1,sum)
names(r_port_oos) <- rownames(R_out)
r_port_oos <- as.xts(r_port_oos)
As a benchmark, we consider the famous 60-40 decision rule
W_naive <- matrix(c(0.6,0.4),nrow = nrow(W_in),2,byrow = T)
r_port_oos_naive <- apply(W_naive*R_out,1,sum)
names(r_port_oos_naive) <- rownames(R_out)
r_port_oos_naive <- as.xts(r_port_oos_naive)
r_port_oos <- merge(r_port_oos,r_port_oos_naive)
names(r_port_oos) <- c("Bayes","Naive")
plot(cumprod(r_port_oos+1),main = "Cumulative Return",legend.loc = "topleft")

Overall, we observe that the Bayesian portfolio outperforms the 60-40 decision rule.
Different Priors
In this section, we repeat the above analysis for different priors. For example, suppose that the decision-maker believes that the return on the SPY is negative 11%, whereas the return on the IEF stays the same. In this case, the corresponding portfolio would be tilted further towards the IEF (safer asset). As the figure below demonstrates, we note that Bayes portfolio underperforms in terms of cumulative return.

In terms of risk-adjusted returns, we compute the Sharpe-ratio for each portfolio and report below
SR_f <- function(x) sqrt(12)*mean(x)/sd(x)
result <- apply(r_port_oos, 2, SR_f)
round(data.frame(t(result)),2)
While the Bayes results in lower volatility, it is also associated with a lower return. Taken altogether, we realize that the Bayes results in a roughly similar Sharpe ratio.
In the next example, we modify the backtesting program a little. In doing so, the prior incorporates the recent one-month return as the investor’s belief. For example, at the end of month \(t\), after realizing the monthly returns between \(t-1\) and \(t\) denoted by \(\mathbf{y}_t\), the decision-maker sets the prior to \(\boldsymbol{\mu}_0 = \mathbf{y}_t\). To do so, we repeat the previous commands, however, with a small tweak in defining the prior mu_0, i.e.,
mu_0 <- t(tail(R_sub,1))
We summarize the cumulative return in the plot below:

Interestingly, we observe that the Bayes portfolio outperforms the naive decision rule. Furthermore, in terms of risk-adjusted performance, we note that the Bayes rule outperforms the naive, resulting in a Sharpe ratio of 1.06:
result <- apply(r_port_oos, 2, SR_f)
round(data.frame(t(result)),2)
One concern with the above is that the Bayes portfolio may exhibit a higher turnover. Taking into account transaction costs (TC), it is unclear whether the Bayes still outperforms. To evaluate this, we consider a similar TC analysis pursued in this article
W_to <- W_in*(1+R_out)
W_to <- t(apply(W_to,1,function(x) t(x)/sum(x)))
W_to <- W_in[-1,] - W_to[-nrow(W_to),]
W_to <- mean(apply(W_to,1,function(x) sum(abs(x)) ))
TO_bayes <- W_to
whereas for the naive portfolio, we have
W_to <- W_naive*(1+R_out)
W_to <- t(apply(W_to,1,function(x) t(x)/sum(x)))
W_to <- W_in[-1,] - W_to[-nrow(W_to),]
W_to <- mean(apply(W_to,1,function(x) sum(abs(x)) ))
data.frame(Bayes = TO_bayes,Naive = W_to)
While the Bayes does include recent returns into the allocation and is subjected to more noise, its incorporation seems to have a mitigating effect. This result could potentially be due to the persistence of returns over time. To relate to this conjecture, consider the autocorrelation of each ETF over time. Using the same sample size T_sample, we compute the rolling window autocorrelation for each asset and plot the results below.
library(plyr)
R_lag <- na.omit(cbind(R_out,lag.xts(R_out,1)))
cor_roll <- rollapply(R_lag,T_sample, function(x) as.vector(cor(x)), by.column = F, align = "right")
cor_roll <- alply(cor_roll, 1, function(x) matrix(x, nrow = ncol(R_lag) ) )
spy_ar <- sapply(cor_roll,function(x) x[3,1])
ief_ar <- sapply(cor_roll,function(x) x[4,2])
AR_mat <- cbind(spy_ar,ief_ar)
rownames(AR_mat) <- rownames(R_lag)[T_sample:nrow(R_lag)]
AR_mat <- as.xts(AR_mat)
names(AR_mat) <- c("SPY","IEF")
plot(AR_mat,legend.loc = "bottomleft",main = "Autocorrelation")

The above analysis relies on the same trick done in the aforementioned article. Overall, we note the serial correlation varies over time, with values ranging between -53% and 45% for the SPY, whereas the corresponding numbers for IEF are -46% and 35%, respectively. Given the contemporaneous negative relation between the two ETFs, we also observe that both autocorrelations co-move together over time. Nonetheless, over the entire sample period, we note that both coefficients are positive.
AR_result <- data.frame(SPY = cor(R$SPY,lag(R$SPY), use = "pairwise"),
IEF = cor(R$IEF,lag(R$IEF), use = "pairwise"))
rownames(AR_result) <- "Autocorrelation (%)"
round(AR_result*100,2)
---
title: "Bayesian Portfolio Selection: Application to Tactical Asset Allocation"
#output: rmarkdown::github_document
output:
  html_notebook: default
  pdf_document: default
author: Majeed Simaan
date: July 7, 2021
fig_width: 20
---

# Overview
The literature on portfolio selection under parameter uncertainty is numerous. The most common techniques involve Bayesian statistics or shrinkage. For instance, Jorion (1986) demonstrates how a decision-maker can employ Bayesian statistics from a shrinkage point of view. This decision rule is commonly known as the Bayes-Stein. Another famous paradigm is the Black and Litterman (1990)  model. The latter model allows the decision-maker to incorporate individual views/beliefs about the underlying assets. As the paper's title indicates, these views could reflect a deviation from an equilibrium model. In either case, since estimation error can be severe, especially for the mean returns, the premise behind the analysis is motivated by a variance-bias trade-off. On the one hand, the decision-maker imposes a bias on the portfolio weights. On the other hand, the inclusion of the prior is fixed and less subjected to estimation. Overall, if the prior is informative, one expects that its inclusion leads to a lower mean-squared error.


This article discusses the portfolio selection problem from a Bayesian perspective. In doing so, I first provide an overview of the portfolio problem and motivate the decision-making process from an expected utility point of view. Then, I demonstrate the analytical solution to the problem and stress the intuition behind the Bayesian application. In particular, in the case of risky assets, the optimal portfolio corresponds to three funds. The first is the minimum variance portfolio, whereas the others denote two self-financing portfolios corresponding to the mean returns. The combination between the first and the second funds is consistent with the conventional mean-variance portfolio. Furthermore, with the inclusion of the third fund, the portfolio incorporates the priors/beliefs of the decision-making into the portfolio selection. Based on the analytical insights, I conduct a small empirical experiment using two ETFs. The experiment emulates a tactical asset allocation problem. All the empirical analysis is conducted using R.



# The Portfolio Problem
The decision-making problem is denoted by a decision vector $\mathbf{d}$
and future realization of the underlying assets denoted by $\mathbf{y}$.
The former denotes the portfolio weights allocated to risky $N$ assets,
whereas the latter denotes the asset returns. The two vectors determine
the wealth/reward of the investor, which is uncertain. It is commonly
assumed that the parameters of the $\mathbf{y}$ are known denoted
by $\boldsymbol{\theta}$, such that the density of $\mathbf{y}$
is known. Nonetheless, in practice, these parameters are unknown, and their estimation is subjected to uncertainty. As a result, the decision process takes into account the conditional distribution of $\mathbf{y}\mid\boldsymbol{\theta}$.
In this regard, if we know the conditional distribution, then the decision-making process boils down to
maximizing the following conditional expected utility:
$$
\underset{\mathbf{d}}{\text{max}}\,\mathbb{E}\left[U(\mathbf{y}^{\top}\mathbf{d})\mid\boldsymbol{\theta}\right]
$$

The standard solution to the optimization problem replaces the parameters with the sample estimates and solves for optimal portfolio. This is known as the plug-in approach. However, it is well-recognized that such practice is associated with estimation risk and poor out-of-sample portfolio performance. Additionally, the derivation of this portfolio is inconsistent with utility theory (see, e.g., Klein and Bawa (1976)). 

Uncertainty about $\boldsymbol{\theta}$ can be taken into account
employing Bayes' decision criterion. According to the Bayes' rule,
we know that the posterior is proportional to the product between
the likelihood function and the prior, i.e.,
\begin{equation}
p(\boldsymbol{\theta}\mid\mathbf{X},\boldsymbol{\theta}_{0})\propto p(\mathbf{X}\mid\boldsymbol{\theta})\pi(\boldsymbol{\theta})
\end{equation}
where $\mathbf{X}$ and $\pi(\boldsymbol{\theta})$ denote, respectively,
the sample-related information and the prior distribution. Here, $\boldsymbol{\theta}_{0}$
denotes the beliefs about the unknown parameters.

# Assumptions
We make a couple of assumptions. The first is that the decision-maker's preference is represented using a constant absolute risk
aversion (CARA) utility function. The second assumption relates to
the distribution function. For the returns, we assume that the 
\begin{equation}
\mathbf{y}\,\mid\boldsymbol{\theta}\sim N(\boldsymbol{\mu},\boldsymbol{\Sigma})
\end{equation}
where $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ are unknown. 

To determine the expected utility, we need to postulate an assumption
regarding the priors of these two unknowns. In particular, we assume
that the joint distribution between the two is Normal-Wishart, such that
\begin{equation}
\boldsymbol{\theta}\sim NW\left(\boldsymbol{\mu}_{0},\tau,\boldsymbol{\Sigma}_{0},\nu\right)
\end{equation}
with $\tau$ and $\mu$ denoting the parameters that determine the
strength of the prior belief, whereas $\boldsymbol{\mu}_{0}$ and
$\boldsymbol{\Sigma}_{0}$ denote the prior knowledge about the mean
vector and the covariance matrix, respectively.

# Optimal Portfolio
Under the above assumptions, the posterior distribution $p(\boldsymbol{\theta}\mid\mathbf{X},\boldsymbol{\theta}_{0})$
is known. We can determine the expected utility and, hence, the optimal
portfolio accordingly. Under CARA utility, we have 
\begin{equation}
U(w)=-\frac{1}{\gamma}e^{-\gamma w}
\end{equation}
with $w=\mathbf{d}^{\top}\mathbf{y}$ denote the terminal wealth for
allocation $\mathbf{d}$. According to the tower property, we know
that
\begin{equation}
\mathbb{E}[U(w)]=\mathbb{E}_{\boldsymbol{\theta}}\left[\mathbb{E}[U(w)\mid\boldsymbol{\theta}]\right]
\end{equation}
where $\mathbb{E}_{\boldsymbol{\theta}}$ denotes the expectation
condition on observing data $\mathbf{X}$ and prior $\boldsymbol{\theta}_{0}$.

If the parameters are known, then from the moment generating function
of a Gaussian variable, we have 
\begin{equation}
\mathbb{E}[U(w)\mid\boldsymbol{\theta}]=-\frac{1}{\gamma}\mathbb{E}[e^{-\gamma\mathbf{d}^{\top}\mathbf{y}}\mid\boldsymbol{\theta}]=-\frac{1}{\gamma}e^{-\gamma\mathbf{d}^{\top}\boldsymbol{\mu}+\frac{\gamma^{2}}{2}\mathbf{d}^{\top}\boldsymbol{\Sigma}\mathbf{d}}
\end{equation}
Finally, the expected utility function depends on finding the expectation
of 
\begin{equation}
\mathbb{E}[U(w)]=-\frac{1}{\gamma}\mathbb{E}_{\boldsymbol{\theta}}\left[e^{-\gamma\mathbf{d}^{\top}\boldsymbol{\mu}+\frac{\gamma^{2}}{2}\mathbf{d}^{\top}\boldsymbol{\Sigma}\mathbf{d}}\right]
\end{equation}
which requires knowing the posterior distribution $p(\boldsymbol{\theta}\mid\mathbf{X},\boldsymbol{\theta}_{0})$.
Alternatively, we can consider the predictive density function of
$\mathbf{y}$. Under the above assumptions,  $\mathbf{y}\mid\mathbf{X},\boldsymbol{\theta}_{0}$
follows a multivariate student-t distribution. Finding the expectation
of which can be obtained numerically. 

## Known $\Sigma$
While a numerical solution provides a more accurate approximation, it comes at the cost of intuition. To gain further insights about the updating process, we make one simplification. We assume that the decision-maker knows the covariance matrix without any uncertainty. In this regards, we have $\boldsymbol{\Sigma}=\boldsymbol{\Sigma}_{0}$. Therefore, the expected utility simplifies to 
\begin{equation}
\mathbb{E}[U(w)]=-\frac{1}{\gamma}e^{\frac{\gamma^{2}}{2}\mathbf{d}^{\top}\boldsymbol{\Sigma}\mathbf{d}}\mathbb{E}_{\boldsymbol{\theta}}\left[e^{-\gamma\mathbf{d}^{\top}\boldsymbol{\mu}}\right].
\end{equation}

In this case, we know that
\begin{equation}
\boldsymbol{\mu}\mid\mathbf{X},\boldsymbol{\mu}_{0}\sim N\left(\boldsymbol{\Sigma}_{1}\boldsymbol{\Sigma}_{0}^{-1}\boldsymbol{\mu}_{0}+\boldsymbol{\Sigma}_{1}T\boldsymbol{\Sigma}_{0}^{-1}\bar{\mathbf{y}},\boldsymbol{\Sigma}_{1}\right)
\end{equation}
with
\begin{equation}
\boldsymbol{\Sigma}_{1}^{-1}=\boldsymbol{\Sigma}_{0}^{-1}+T\boldsymbol{\Sigma}^{-1}
\end{equation}
Since $\boldsymbol{\Sigma}=\boldsymbol{\Sigma}_{0}$, then
\begin{equation}
\boldsymbol{\Sigma}_{1}=\frac{1}{1+T}\boldsymbol{\Sigma}_{0}
\end{equation}
and
\begin{equation}
\boldsymbol{\Sigma}_{1}\boldsymbol{\Sigma}_{0}^{-1}\boldsymbol{\mu}_{0}+\boldsymbol{\Sigma}_{1}T\boldsymbol{\Sigma}_{0}^{-1}\bar{\mathbf{y}}=\frac{1}{T}\boldsymbol{\mu}_{0}+\frac{T}{1+T}\bar{\mathbf{y}}
\end{equation}
with $T$ denoting the sample size. As a result, we have
\begin{equation}
\boldsymbol{\mu}\mid\mathbf{X},\boldsymbol{\mu}_{0}\sim N\left(\alpha\boldsymbol{\mu}_{0}+(1-\alpha)\bar{\mathbf{y}},\frac{1}{1+T}\boldsymbol{\Sigma}_{0}\right)
\end{equation}
with

\begin{equation}
\alpha=\frac{1}{1+T}\in(0,1)\:\text{for }T>0.
\end{equation}
The expected utility is given by
\begin{equation}
\mathbb{E}[U(w)]=-\frac{1}{\gamma}\text{exp}\left(-\gamma\mathbf{d}^{\top}\left[\alpha\boldsymbol{\mu}_{0}+(1-\alpha)\bar{\mathbf{y}}\right]+(1+\alpha)\frac{\gamma^{2}}{2}\mathbf{d}^{\top}\boldsymbol{\Sigma}_{0}\mathbf{d}\right).
\end{equation}
Applying a positive transformation on the expected utility, its optimization
is consistent with the following mean-variance (MV) preference:
\begin{equation}
\mathbf{d}^{\top}\left[\alpha\boldsymbol{\mu}_{0}+(1-\alpha)\bar{\mathbf{y}}\right]-\frac{\gamma}{2}(1+\alpha)\mathbf{d}^{\top}\boldsymbol{\Sigma}_{0}\mathbf{d}
\end{equation}

Consider the case of full information in which $T\rightarrow\infty$
such that $\alpha\rightarrow0$. Under this extreme scenario, it is
clear that the Bayesian portfolio problem converges to the conventional
MV optimization. For a finite sample size, $\alpha$ denotes the ambiguity
about the mean return. The larger it is, the more variance the decision
market assigns to the portfolio variance and the greater weight is
attributed to the prior.

Under the constraint that $\mathbf{d}^{\top}\mathbf{1}=1$, the optimal
portfolio is given by 
\begin{equation}
\mathbf{d}^{*}=\mathbf{d}_{g}+\frac{1}{1+\alpha}\mathbf{B}[\alpha\boldsymbol{\mu}_{0}+(1-\alpha)\bar{\mathbf{y}}]
\end{equation}
with 
\begin{equation}
\mathbf{d}_{g}=\frac{\boldsymbol{\Sigma}_{0}^{-1}\mathbf{1}}{\mathbf{1}^{\top}\boldsymbol{\Sigma}_{0}^{-1}\mathbf{1}}
\end{equation}
denoting the global minimum variance (GMV) portfolio and 
\begin{equation}
\mathbf{B}=\boldsymbol{\Sigma}_{0}^{-1}\left[\mathbf{I}-\mathbf{1}\mathbf{d}_{g}^{\top}\right]
\end{equation}
is a centering matrix around the GMV portfolio (see, e.g., Simaan et al. (2018)). 

It is clear that the MV optimal portfolio consists of three funds. The first is the GMV. The second (third) is the arbitrage portfolio that depends on the prior (sample) mean return:
\begin{equation}
\mathbf{d}^{*}=\mathbf{d}_{g}+\frac{\alpha}{1+\alpha}\mathbf{B}\boldsymbol{\mu}_{0}+\frac{1-\alpha}{1+\alpha}\mathbf{B}\bar{\mathbf{y}}.
\end{equation}
To simplify, let $\mathbf{B}_{\alpha}=(1+\alpha)^{-1}\mathbf{B}$,
such that

\begin{equation}
\mathbf{d}^{*}=\mathbf{d}_{g}+\alpha\mathbf{B}_{\alpha}\boldsymbol{\mu}_{0}+(1-\alpha)\mathbf{B}_{\alpha}\bar{\mathbf{y}}.
\end{equation}

We observe that the Bayesian MV portfolio is a convex combination
between two portfolios. Specifically, for $\alpha\in(0,1)$ it follows
that
\begin{equation}
\mathbf{d}^{*}=\alpha\mathbf{d}^{*}(\boldsymbol{\mu}_{0})+\left[1-\alpha\right]\mathbf{d}^{*}(\bar{\mathbf{y}})
\end{equation}
with

\begin{equation}
\mathbf{d}^{*}(\mathbf{u})=\mathbf{d}_{g}+\mathbf{B}_{\alpha}\mathbf{u},
\end{equation}
denoting a linear function of vector $\mathbf{u}$. 

# Empirical Example
Let us start with a simple numerical example. To do so, I will be working with two ETFs. The first one is the SPY ETF denoting the stock market portfolio, whereas the other is the IEF denoting the return on 7-10 years Treasury bonds. 
```{r,message=F,warning=F}
library(quantmod)
library(lubridate)
v <- c("SPY","IEF")
t1 <- "1996-01-01" # the starting date
P.list <- lapply(v,function(x) get(getSymbols(x,from = t1)) )
P <- lapply(P.list,function(x) x[,grep("Adjusted",names(x))])
P <- Reduce(function(...) merge(...),P  )
names(P) <- v
P <- apply.monthly(P,function(x) x[nrow(x),] )
R <- na.omit(P/lag(P)-1)
R <- R["/2020-12-31",]
```

We consider the returns on both ETFs up to late 2020. Note that the inception of the IEF took place during the second half of 2002.  For this reason, the data dates between
```{r}
range(date(R))
```
The key to applying the Bayesian portfolio is to construct several matrices and vectors. We are also interested in evaluating the portfolio weights over time and, hence, its performance. To do so, we perform a simple backtesting procedure. At the end of each month, we consider the recent $T$ months of data as our sample.
```{r,out.width = '75%',fig.align="center"}
d <- ncol(R)
T_sample <- 12*2
month_series <- unique(ceiling_date(date(R),"m") - 1)
month_series <- month_series[year(month_series) > 2004] # keep two years for initial estimation
d_star_mat <- data.frame()
for (month_i in month_series) {
  alpha <- 1/(1+T_sample)
  e <- rep(1,d) # vector of ones
  I_mat <- diag(d) # diagonal matrix
  R_sub <- R[date(R) <= month_i,]
  R_sub <- tail(R_sub,T_sample) # sample 
  y_bar <- apply(R_sub,2,mean) # sample mean vector
  mu_0 <-  c(0.11,0.06)
  Sigma <- var(R_sub) # covariance matrix
  Sigma_inv <- solve(Sigma) # covariance matrix inverse
  d_g <- Sigma_inv%*%e/sum(Sigma_inv)
  B_mat <- Sigma_inv%*%(I_mat - e%*%t(d_g) )
  B_mat_a <- (1/(1+alpha))*B_mat
  d_star <- d_g + alpha*B_mat_a%*%mu_0 + (1-alpha)*B_mat_a%*%y_bar
  d_star <- exp(d_star)/(1+exp(d_star))
  d_star <- d_star/sum(d_star)
  d_star_mat <- rbind(d_star_mat,t(d_star))
}
```
A number of comments are in order. First, we note that the above procedure loops through all unique months in the data one by one. This application allows updating the sample estimates on a monthly basis. 
Second, the prior above corresponds to the total sample estimates of the mean return of each ETF. This approach is subjected to data snooping (look-ahead bias), since such information is only revealed after the fact. However, we do so for the sake of illustration. In the following discussion, we consider the sensitivity of the results to different priors.
Third, it is well-recognized that the MV portfolio weights result in extreme positions that are considered unrealistic. To mitigate this, we consider a simple logit transformation, which is applied using these two commands:
```{r,eval=FALSE}
d_star <- exp(d_star)/(1+exp(d_star))
d_star <- d_star/sum(d_star)
```
The second command normalizes the portfolio weights, such that its weights sum to one. While this may seem like an ad-hoc approach, this transformation results in a lower portfolio norm and is consistent with DeMiguel et al. (2009). 

The plot below illustrates the weight allocated to the SPY ETF over time. We observe that the weight is above 80\% the majority of the time. The weight decreases during market selloffs such as the 2007-09 financial crisis, late 2016, and COVID-19. 
```{r}
rownames(d_star_mat) <- month_series
d_star_mat <- as.xts(d_star_mat)
plot(d_star_mat[,1],main = "Weight to SPY")
```

Let us evaluate the portfolio performance in terms of return over time
```{r}
R_out <- R[year(date(R)) > 2004 ,]
R_out <- as.matrix(R_out[-1,])
W_in <- as.matrix(d_star_mat[-nrow(d_star_mat)])
r_port_oos <- apply(W_in*R_out,1,sum)
names(r_port_oos) <- rownames(R_out)
r_port_oos <- as.xts(r_port_oos)
```
As a benchmark, we consider the famous 60-40 decision rule
```{r}
W_naive <- matrix(c(0.6,0.4),nrow = nrow(W_in),2,byrow = T)
r_port_oos_naive <- apply(W_naive*R_out,1,sum)
names(r_port_oos_naive) <-  rownames(R_out)
r_port_oos_naive <- as.xts(r_port_oos_naive)
```

```{r,out.width = '75%',fig.align="center"}
r_port_oos <- merge(r_port_oos,r_port_oos_naive) 
names(r_port_oos) <- c("Bayes","Naive")
plot(cumprod(r_port_oos+1),main = "Cumulative Return",legend.loc = "topleft")
```
Overall, we observe that the Bayesian portfolio outperforms the 60-40 decision rule. 

## Different Priors
In this section, we repeat the above analysis for different priors. For example, suppose that the decision-maker believes that the return on the SPY is negative 11\%, whereas the return on the IEF stays the same. In this case, the corresponding portfolio would be tilted further towards the IEF (safer asset). As the figure below demonstrates, we note that Bayes portfolio underperforms in terms of cumulative return.
```{r,echo=F,fig.align="center",out.width = '75%'}
d_star_mat <- data.frame()
for (month_i in month_series) {
  alpha <- 1/(1+T_sample)
  e <- rep(1,d) # vector of ones
  I_mat <- diag(d) # diagonal matrix
  R_sub <- R[date(R) <= month_i,]
  R_sub <- tail(R_sub,T_sample) # sample 
  y_bar <- apply(R_sub,2,mean) # sample mean vector
  mu_0 <-  c(-0.11,0.06)
  Sigma <- var(R_sub) # covariance matrix
  Sigma_inv <- solve(Sigma) # covariance matrix inverse
  d_g <- Sigma_inv%*%e/sum(Sigma_inv)
  B_mat <- Sigma_inv%*%(I_mat - e%*%t(d_g) )
  B_mat_a <- (1/(1+alpha))*B_mat
  d_star <- d_g + alpha*B_mat_a%*%mu_0 + (1-alpha)*B_mat_a%*%y_bar
  d_star <- exp(d_star)/(1+exp(d_star))
  d_star <- d_star/sum(d_star)
  d_star_mat <- rbind(d_star_mat,t(d_star))
}

rownames(d_star_mat) <- month_series
d_star_mat <- as.xts(d_star_mat)
R_out <- R[year(date(R)) > 2004 ,]
R_out <- as.matrix(R_out[-1,])
W_in <- as.matrix(d_star_mat[-nrow(d_star_mat)])
r_port_oos <- apply(W_in*R_out,1,sum)
names(r_port_oos) <- rownames(R_out)
r_port_oos <- as.xts(r_port_oos)

W_naive <- matrix(c(0.6,0.4),nrow = nrow(W_in),2,byrow = T)
r_port_oos_naive <- apply(W_naive*R_out,1,sum)
names(r_port_oos_naive) <-  rownames(R_out)
r_port_oos_naive <- as.xts(r_port_oos_naive)

r_port_oos <- merge(r_port_oos,r_port_oos_naive) 
names(r_port_oos) <- c("Bayes","Naive")
plot(cumprod(r_port_oos+1),main = "Cumulative Return",legend.loc = "topleft")
```
In terms of risk-adjusted returns, we compute the Sharpe-ratio for each portfolio and report below
```{r}
SR_f <- function(x) sqrt(12)*mean(x)/sd(x)
result <- apply(r_port_oos, 2, SR_f)
round(data.frame(t(result)),2)
```
While the Bayes results in lower volatility, it is also associated with a lower return. Taken altogether, we realize that the Bayes results in a roughly similar Sharpe ratio.

In the next example, we modify the backtesting program a little. In doing so, the prior incorporates the recent one-month return as the investor's belief. For example, at the end of month $t$, after realizing the monthly returns between $t-1$ and $t$ denoted by $\mathbf{y}_t$, the decision-maker sets the prior to $\boldsymbol{\mu}_0 = \mathbf{y}_t$. To do so, we repeat the previous commands, however, with a small tweak in defining the prior `mu_0`, i.e.,
```{r,eval=FALSE}
mu_0 <-  t(tail(R_sub,1))
```
We summarize the cumulative return in the plot below:
```{r,fig.align="center",echo=FALSE,out.width = '75%'}
d_star_mat <- data.frame()
for (month_i in month_series) {
  alpha <- 1/(1+T_sample)
  e <- rep(1,d) # vector of ones
  I_mat <- diag(d) # diagonal matrix
  R_sub <- R[date(R) <= month_i,]
  R_sub <- tail(R_sub,T_sample) # sample 
  y_bar <- apply(R_sub,2,mean) # sample mean vector
  mu_0 <-  t(tail(R_sub,1))
  Sigma <- var(R_sub) # covariance matrix
  Sigma_inv <- solve(Sigma) # covariance matrix inverse
  d_g <- Sigma_inv%*%e/sum(Sigma_inv)
  B_mat <- Sigma_inv%*%(I_mat - e%*%t(d_g) )
  B_mat_a <- (1/(1+alpha))*B_mat
  d_star <- d_g + alpha*B_mat_a%*%mu_0 + (1-alpha)*B_mat_a%*%y_bar
  d_star <- exp(d_star)/(1+exp(d_star))
  d_star <- d_star/sum(d_star)
  d_star_mat <- rbind(d_star_mat,t(d_star))
}

rownames(d_star_mat) <- month_series
d_star_mat <- as.xts(d_star_mat)
R_out <- R[year(date(R)) > 2004 ,]
R_out <- as.matrix(R_out[-1,])
W_in <- as.matrix(d_star_mat[-nrow(d_star_mat)])
r_port_oos <- apply(W_in*R_out,1,sum)
names(r_port_oos) <- rownames(R_out)
r_port_oos <- as.xts(r_port_oos)

W_naive <- matrix(c(0.6,0.4),nrow = nrow(W_in),2,byrow = T)
r_port_oos_naive <- apply(W_naive*R_out,1,sum)
names(r_port_oos_naive) <-  rownames(R_out)
r_port_oos_naive <- as.xts(r_port_oos_naive)

r_port_oos <- merge(r_port_oos,r_port_oos_naive) 
names(r_port_oos) <- c("Bayes","Naive")
plot(cumprod(r_port_oos+1),main = "Cumulative Return",legend.loc = "topleft")
```
Interestingly, we observe that the Bayes portfolio outperforms the naive decision rule. Furthermore, in terms of risk-adjusted performance, we note that the Bayes rule outperforms the naive, resulting in a Sharpe ratio of `r round(result,2)[1]`:
```{r}
result <- apply(r_port_oos, 2, SR_f)
round(data.frame(t(result)),2)
```

One concern with the above is that the Bayes portfolio may exhibit a higher turnover. Taking into account transaction costs (TC), it is unclear whether the Bayes still outperforms. To evaluate this, we consider a similar TC analysis pursued in this [article](https://rpubs.com/simaan84/port_opt)
```{r}
W_to <- W_in*(1+R_out)
W_to <- t(apply(W_to,1,function(x) t(x)/sum(x)))
W_to <- W_in[-1,] - W_to[-nrow(W_to),] 
W_to <- mean(apply(W_to,1,function(x) sum(abs(x))  ))
TO_bayes <- W_to
```
whereas for the naive portfolio, we have
```{r}
W_to <- W_naive*(1+R_out)
W_to <- t(apply(W_to,1,function(x) t(x)/sum(x)))
W_to <- W_in[-1,] - W_to[-nrow(W_to),] 
W_to <- mean(apply(W_to,1,function(x) sum(abs(x))  ))
data.frame(Bayes = TO_bayes,Naive = W_to)
```

While the Bayes does include recent returns into the allocation and is subjected to more noise, its incorporation seems to have a mitigating effect. This result could potentially be due to the persistence of returns over time. To relate to this conjecture, consider the autocorrelation of each ETF over time. Using the same sample size `T_sample`, we compute the rolling window autocorrelation for each asset and plot the results below.
```{r,warning=F,message=F,out.width = '75%',fig.align="center"}
library(plyr)
R_lag <- na.omit(cbind(R_out,lag.xts(R_out,1)))
cor_roll <- rollapply(R_lag,T_sample, function(x) as.vector(cor(x)), by.column = F, align = "right")
cor_roll <- alply(cor_roll, 1, function(x) matrix(x, nrow = ncol(R_lag) )   )

spy_ar <- sapply(cor_roll,function(x) x[3,1])
ief_ar <- sapply(cor_roll,function(x) x[4,2])
AR_mat <- cbind(spy_ar,ief_ar)
rownames(AR_mat) <- rownames(R_lag)[T_sample:nrow(R_lag)]
AR_mat <- as.xts(AR_mat)
names(AR_mat) <- c("SPY","IEF")
plot(AR_mat,legend.loc = "bottomleft",main = "Autocorrelation")

```
The above analysis relies on the same trick done in the aforementioned [article](https://rpubs.com/simaan84/port_opt). Overall, we note the serial correlation varies over time, with values ranging between -53\% and 45\% for the SPY, whereas the corresponding numbers for IEF are -46\% and 35\%, respectively. Given the contemporaneous negative relation between the two ETFs, we also observe that both autocorrelations co-move together over time. Nonetheless, over the entire sample period, we note that both coefficients are positive.
```{r}
AR_result <- data.frame(SPY = cor(R$SPY,lag(R$SPY), use = "pairwise"),
                        IEF = cor(R$IEF,lag(R$IEF), use = "pairwise"))
rownames(AR_result) <- "Autocorrelation (%)"
round(AR_result*100,2)
```

# Remarks
The analysis above highlights several important aspects. First, the article illustrates a simple theory to link beliefs into the portfolio problem. Second, the empirical results illustrate a backtesting procedure using two assets and evaluate the sensitivity of the results using different priors. Finally, the above paradigm could provide theoretical justification for incorporating predictive models into mean-variance optimization. I leave this for future research.  



# References 

1. Black, F., & Litterman, R. (1990). Asset allocation: combining investor views with market equilibrium. Goldman Sachs Fixed Income Research, 115.

2. DeMiguel, V., Garlappi, L., Nogales, F. J., & Uppal, R. (2009). A generalized approach to portfolio optimization: Improving performance by constraining portfolio norms. Management science, 55(5), 798-812.

3. Jorion, P. (1986). Bayes-Stein estimation for portfolio analysis. Journal of Financial and Quantitative analysis, 21(3), 279-292.


4. Klein, R. W., & Bawa, V. S. (1976). The effect of estimation risk on optimal portfolio choice. Journal of financial economics, 3(3), 215-231.


5. Simaan, M., Simaan, Y., & Tang, Y. (2018). Estimation error in mean returns and the mean-variance efficient frontier. International Review of Economics & Finance, 56, 109-124.