Portfolio optimization addresses a fundamental capital allocation problem: how to distribute investment across a set of securities to achieve the best possible return for a given level of risk. This project utilized the Markowitz Mean-Variance framework to identify ‘non-dominated’ portfolios; those where no other combination of weights offers a higher return for the same risk. Our analysis focused on a diverse basket of seven equities (GE, XOM, GBX, SBUX, PFE, HMC, and NVDA) during the volatile 2021–2024 period. Rather than relying on traditional quadratic programming solvers, we employed a simulation-based approach. By generating 700 random weight vectors, we were able to empirically map the feasible set and approximate the Efficient Frontier, providing a more intuitive and visual understanding of the risk-return trade-offs inherent in these specific assets.
Data retrieval was performed using the yfR package,
which provides adjusted closing prices by default and supports batch
downloading of multiple tickers in a single call, offering a more robust
and up-to-date alternative to quantmod.
myMeanVarPort <- function(tickers, begin_date, end_date, rf) {
# Download monthly adjusted returns via yfR
df_long <- yf_get(tickers = tickers,
first_date = begin_date,
last_date = end_date,
freq_data = "monthly",
type_return = "arit",
be_quiet = TRUE)
# Reshape from long to wide format, preserve ticker order, drop NAs
retout <- df_long %>%
select(ref_date, ticker, ret_adjusted_prices) %>%
pivot_wider(names_from = ticker,
values_from = ret_adjusted_prices) %>%
arrange(ref_date) %>%
select(all_of(tickers)) %>%
na.omit()
# Aggregate means and covariance matrix over the full period
meanret <- as.matrix(colMeans(retout))
covar <- var(retout)
# Simulate 100*N random weight vectors (seed = 12)
n <- length(tickers)
niter <- 100 * n
set.seed(12)
randomnums <- data.frame(replicate(n, runif(niter, 0, 1)))
wt_sim <- randomnums / rowSums(randomnums)
# Convert annual rf to monthly before computing Sharpe ratios
rf_monthly <- rf / 12
# Compute portfolio mean, sigma, and Sharpe ratio for each simulation
weight <- matrix(NA, nrow = n, ncol = 1)
Results <- matrix(NA, nrow = niter, ncol = n + 3)
for (i in 1:niter) {
for (k in 1:n) {
Results[i, k] <- weight[k, 1] <- wt_sim[i, k]
}
port_mean <- t(weight) %*% meanret
port_sigma <- sqrt(t(weight) %*% covar %*% weight)
sharpe <- (port_mean - rf_monthly) / port_sigma
Results[i, n + 1] <- port_mean
Results[i, n + 2] <- port_sigma
Results[i, n + 3] <- sharpe
}
colnames(Results) <- c(paste0(tickers, "_wt"), "PortMean", "PortSigma", "Sharpe")
list(
stock_means = round(meanret, 6),
cov_matrix = round(covar, 8),
sim_portfolios = as.data.frame(Results)
)
}The function accepts four inputs: tickers,
begin_date, end_date, and rf
(annual risk-free rate) and returns a named list containing the vector
of stock means, the covariance matrix, and a data frame of simulated
portfolio results (weights, mean, sigma, Sharpe ratio).
yfR::yf_get() retrieves all tickers in a single call and
returns monthly arithmetic returns on adjusted prices directly, removing
the need for per-ticker loops or manual adjusted price extraction. The
long format output was reshaped to wide format via
pivot_wider() before any matrix calculations were
performed.
The function was run with the required inputs below. With 7 securities, 700 portfolios were simulated (100 × 7).
The mean returns and covariance matrix are single aggregate values computed over the full sample period. The diagonal elements of the covariance matrix are individual stock variances; off-diagonal elements measure pairwise co-movement and drive diversification benefits.
## Mean Monthly Returns (2021-2024):
## [,1]
## GE 0.029508
## XOM 0.025301
## GBX 0.023086
## SBUX 0.003455
## PFE -0.000358
## HMC 0.006483
## NVDA 0.063030
## Variance-Covariance Matrix:
## GE XOM GBX SBUX PFE HMC
## GE 0.00909667 0.00394022 0.00641849 0.00314594 -0.00075865 0.00270523
## XOM 0.00394022 0.00711813 0.00387242 0.00071498 -0.00066581 0.00141214
## GBX 0.00641849 0.00387242 0.02247839 0.00226764 0.00175633 0.00220912
## SBUX 0.00314594 0.00071498 0.00226764 0.00623296 0.00093618 0.00024185
## PFE -0.00075865 -0.00066581 0.00175633 0.00093618 0.00566407 -0.00046055
## HMC 0.00270523 0.00141214 0.00220912 0.00024185 -0.00046055 0.00396152
## NVDA 0.00739327 0.00005447 0.00755342 0.00277184 0.00288030 0.00331454
## NVDA
## GE 0.00739327
## XOM 0.00005447
## GBX 0.00755342
## SBUX 0.00277184
## PFE 0.00288030
## HMC 0.00331454
## NVDA 0.02393421
## Total portfolios simulated: 700
| GE_wt | XOM_wt | GBX_wt | SBUX_wt | PFE_wt | HMC_wt | NVDA_wt | PortMean | PortSigma | Sharpe |
|---|---|---|---|---|---|---|---|---|---|
| 0.02200 | 0.12060 | 0.10444 | 0.14058 | 0.26511 | 0.21988 | 0.12739 | 0.01596 | 0.05211 | 0.27424 |
| 0.17490 | 0.09083 | 0.16895 | 0.18331 | 0.12098 | 0.20000 | 0.06103 | 0.01709 | 0.05882 | 0.26227 |
| 0.24676 | 0.01779 | 0.26135 | 0.04117 | 0.14936 | 0.11392 | 0.16966 | 0.02529 | 0.07712 | 0.30625 |
| 0.08228 | 0.22659 | 0.06260 | 0.01112 | 0.27532 | 0.07600 | 0.26609 | 0.02681 | 0.06470 | 0.38862 |
| 0.05806 | 0.20295 | 0.28035 | 0.08887 | 0.16370 | 0.06580 | 0.14027 | 0.02284 | 0.07026 | 0.30132 |
| 0.00897 | 0.11912 | 0.24735 | 0.18626 | 0.10236 | 0.10331 | 0.23262 | 0.02493 | 0.07284 | 0.31936 |
| 0.04826 | 0.19277 | 0.22443 | 0.16279 | 0.26723 | 0.10418 | 0.00034 | 0.01265 | 0.05669 | 0.19366 |
| 0.20001 | 0.15077 | 0.26378 | 0.01574 | 0.18823 | 0.17998 | 0.00148 | 0.01705 | 0.06508 | 0.23643 |
| 0.00667 | 0.10845 | 0.27122 | 0.09188 | 0.25678 | 0.14664 | 0.11837 | 0.01784 | 0.06533 | 0.24757 |
| 0.00282 | 0.21481 | 0.24811 | 0.17142 | 0.14310 | 0.11865 | 0.10109 | 0.01893 | 0.06292 | 0.27435 |
opt_idx <- which.max(Results$Sharpe)
opt_port <- Results[opt_idx, ]
mvp_idx <- which.min(Results$PortSigma)
mvp_port <- Results[mvp_idx, ]
cat("OPTIMAL PORTFOLIO (Highest Sharpe)\n")## OPTIMAL PORTFOLIO (Highest Sharpe)
## Portfolio Mean: 0.03802
## Portfolio Sigma: 0.08575
## Sharpe Ratio: 0.42395
## Weights:
## GE : 14.04%
## XOM : 18.07%
## GBX : 2.32%
## SBUX : 8.09%
## PFE : 1.31%
## HMC : 12.22%
## NVDA : 43.95%
##
## MINIMUM VARIANCE PORTFOLIO
## Portfolio Mean: 0.01091
## Portfolio Sigma: 0.04371
## Sharpe Ratio: 0.21155
## Weights:
## GE : 7.49%
## XOM : 19.45%
## GBX : 2.44%
## SBUX : 23.79%
## PFE : 20.96%
## HMC : 24.46%
## NVDA : 1.40%
The “feasible set” represents the universe of all 700 simulated portfolios, visualized here as a characteristic “bullet” shape. Each steel-blue point represents a unique combination of the seven stocks (GE, XOM, GBX, SBUX, PFE, HMC, and NVDA). The interior points represent diversified but sub-optimal holdings, while the outer boundary represents the limit of what is achievable given the historical returns and correlations of these specific assets.
ggplot(Results, aes(x = PortSigma, y = PortMean)) +
geom_point(colour = "steelblue", alpha = 0.3, size = 1.2) +
labs(
title = "Feasible Set of Simulated Portfolios",
subtitle = "GE, XOM, GBX, SBUX, PFE, HMC, NVDA | 2021-2024 | 700 Simulations",
x = "Portfolio Sigma (Monthly Std Dev)",
y = "Portfolio Mean (Monthly Return)"
) +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(face = "bold", hjust = 0.5), # Centered Title
plot.subtitle = element_text(hjust = 0.5) # Centered Subtitle
)To identify the best-performing portfolios from our simulation, we applied a binning method that traces the “outer shell” of the feasible set. By selecting the portfolio with the minimum risk (\(\sigma\)) for every return level, we highlight the Efficient Frontier (red points). The Minimum Variance Portfolio (green triangle) marks the absolute lowest volatility achievable. Critically, only the segment extending above the MVP is considered the true Efficient Frontier; any points below it are considered inefficient. The Optimal Portfolio (gold diamond) identifies the specific weight allocation that maximizes the Sharpe Ratio, representing the most efficient balance of risk and reward relative to our 2% risk-free rate.
# Bin by sigma (x-axis) and take the MAX return per bin -> upper envelope
seqmret <- seq(round(min(Results$PortMean), 3) - .001,
max(Results$PortMean) + .001, .005)
optim_frontier <- Results %>%
mutate(portnumber = row_number(),
ints = cut(PortMean, breaks = seqmret)) %>%
group_by(ints) %>%
summarise(sig_optim = min(PortSigma),
retn_optim = PortMean[which.min(PortSigma)],
.groups = "drop") %>%
filter(!is.na(ints)) %>%
arrange(retn_optim)
ggplot() +
geom_point(data = Results, aes(x = PortSigma, y = PortMean),
colour = "steelblue", alpha = 0.2, size = 1.2) +
geom_point(data = optim_frontier, aes(x = sig_optim, y = retn_optim),
colour = "darkred", size = 2) +
geom_point(data = opt_port, aes(x = PortSigma, y = PortMean),
colour = "gold", size = 5, shape = 18) +
geom_point(data = mvp_port, aes(x = PortSigma, y = PortMean),
colour = "green4", size = 5, shape = 17) +
annotate("text",
x = opt_port$PortSigma + 0.001,
y = opt_port$PortMean,
label = sprintf("Optimal Portfolio \n (Sharpe = %.3f)", opt_port$Sharpe),
hjust = 0, size = 3.2, colour = "darkgoldenrod") +
annotate("text",
x = mvp_port$PortSigma + 0.001,
y = mvp_port$PortMean,
label = sprintf("Min Variance Portfolio \n (σ = %.4f)", mvp_port$PortSigma),
hjust = 0, size = 3.2, colour = "green4") +
labs(
title = "Mean-Variance Portfolio Optimization via Simulation",
subtitle = "GE, XOM, GBX, SBUX, PFE, HMC, NVDA | 2021-2024 | rf = 2% p.a.",
x = "Portfolio Sigma (Monthly Std Dev)",
y = "Portfolio Mean (Monthly Return)",
caption = paste("Gold diamond = Optimal Portfolio (Max Sharpe)",
"| Green triangle = Min Variance Portfolio")
) +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(face = "bold", size = 13, hjust = 0.5), # Centered
plot.subtitle = element_text(hjust = 0.5), # Centered
plot.caption = element_text(size = 9, hjust = 0.5) # Centered
)The simulation results revealed a significant performance gap between the selected securities from 2021–2024, offering a practical look at the trade-offs between risk and reward. NVDA dominated the set with a monthly mean return of ~6.3%, driven by the AI semiconductor boom, while GE and XOM provided strong secondary growth. Conversely, SBUX and PFE showed near-zero or negative returns, acting primarily as diversifiers rather than return-drivers.
This disparity is clearly visible in our two benchmark portfolios:
The Optimal Portfolio (Gold Diamond): This represents the “sweet spot” that maximizes the Sharpe Ratio
\[\text{Sharpe Ratio} = \frac{\bar{R}_p - R_f}{\sigma_p}\]
To achieve its high risk-adjusted return, the model allocated a heavy 44% weight to NVDA. While mathematically ideal, this concentration highlights a real-world risk: relying heavily on a single high-performer can make the portfolio vulnerable if that sector’s trend reverses.
The Minimum Variance Portfolio (Green Triangle): Positioned at the absolute “nose” of the feasible set, the MVP prioritizes stability over growth. It achieved the lowest volatility (\(\sigma\)) by shifting weight away from NVDA and into more stable assets like HMC and SBUX.
Ultimately, the Efficient Frontier (red points) above the MVP illustrate that diversification is not just about owning many stocks, but about finding the right weights to eliminate “dominated” portfolios. Any point below the MVP is inefficient because an investor could achieve a higher return for the same level of risk by simply moving to a point higher up on the frontier.
| Simulation | Quadratic Optimization | |
|---|---|---|
| Precision | Approximate | Exact |
| Interpretability | High — full feasible set visible | Low — solution only |
| Solver required | No | Yes (quadprog,
PortfolioAnalytics) |
| Scalability | Poor for large N | Better |
| Constraints | Simple (filter results) | Requires explicit specification |
The primary advantage of the simulation method is its transparency and interpretability. By visualizing the full feasible set, investors can see the “density” of potential outcomes without relying on a black-box mathematical solver. However, simulation is an approximation; the accuracy of the frontier depends on the number of iterations and the random seed used. While we can easily add constraints (like “no short-selling”) by simply filtering our results, both this method and quadratic optimization suffer from estimation risk. Because the model relies on historical means and covariances, any “garbage in” regarding past data will result in “garbage out” for future projections.
This report demonstrated simulation-based mean-variance optimization
using seven stocks (GE, XOM, GBX, SBUX, PFE, HMC, NVDA) over January
2021 to December 2024. Data retrieval was performed using the
yfR package, which provides adjusted closing prices by
default and enables batch downloading in a single function call.
Generating 700 random portfolios with all return, sigma, and Sharpe
values treated as aggregate figures across the full period which allowed
us to visualize the feasible set, trace the efficient frontier, and
identify both the Optimal Portfolio and the Minimum Variance Portfolio.
The simulation approach is intuitive and requires no specialized
solvers, making it a practical alternative to quadratic optimization at
the cost of an approximate rather than exact solution.
Running this simulation really helped our group move past the theory and actually see how mean-variance optimization works in practice. By testing 700 different weight combinations and plotting the results, we could clearly see how diversification helps lower risk. It was also easy to spot why portfolios below the Minimum Variance Portfolio (MVP) just don’t make sense, as they offer less return for the same or even higher risk. A few things really clicked for us. We noticed how much NVDA dominated the Optimal Portfolio with a 44% weight, which showed us how a single high-performing stock can take over when you’re focused solely on the Sharpe ratio. It made us wonder if relying that heavily on one asset would actually work out in the real world. We also spent time comparing the Optimal Portfolio to the MVP. It was a great lesson in trade-offs: one aims for the best possible return per unit of risk, while the other just tries to keep things steady. Neither is “better” overall; it really just depends on how much risk an investor is willing to handle. Finally, we recognized that while this model is powerful, it has its limits. Since it relies on historical data to predict the future, any errors in our initial assumptions get magnified. Overall, this project helped us connect the math to real-world numbers in a way that finally made the concepts feel intuitive.