1 Introduction

Portfolio optimization addresses a fundamental capital allocation problem: how to distribute investment across a set of securities to achieve the best possible return for a given level of risk. This project utilized the Markowitz Mean-Variance framework to identify ‘non-dominated’ portfolios; those where no other combination of weights offers a higher return for the same risk. Our analysis focused on a diverse basket of seven equities (GE, XOM, GBX, SBUX, PFE, HMC, and NVDA) during the volatile 2021–2024 period. Rather than relying on traditional quadratic programming solvers, we employed a simulation-based approach. By generating 700 random weight vectors, we were able to empirically map the feasible set and approximate the Efficient Frontier, providing a more intuitive and visual understanding of the risk-return trade-offs inherent in these specific assets.

Data retrieval was performed using the yfR package, which provides adjusted closing prices by default and supports batch downloading of multiple tickers in a single call, offering a more robust and up-to-date alternative to quantmod.

2 Required Libraries

library(yfR)      # Stock price data (adjusted prices by default)
library(tidyr)    # pivot_wider for reshaping yfR long format output
library(ggplot2)  # Plotting
library(dplyr)    # Data manipulation
library(knitr)    # kable for formatted table output

3 The myMeanVarPort Function

myMeanVarPort <- function(tickers, begin_date, end_date, rf) {

  # Download monthly adjusted returns via yfR
  df_long <- yf_get(tickers    = tickers,
                    first_date  = begin_date,
                    last_date   = end_date,
                    freq_data   = "monthly",
                    type_return = "arit",
                    be_quiet    = TRUE)

  # Reshape from long to wide format, preserve ticker order, drop NAs
  retout <- df_long %>%
    select(ref_date, ticker, ret_adjusted_prices) %>%
    pivot_wider(names_from  = ticker,
                values_from = ret_adjusted_prices) %>%
    arrange(ref_date) %>%
    select(all_of(tickers)) %>%
    na.omit()

  # Aggregate means and covariance matrix over the full period
  meanret <- as.matrix(colMeans(retout))
  covar   <- var(retout)

  # Simulate 100*N random weight vectors (seed = 12)
  n     <- length(tickers)
  niter <- 100 * n
  set.seed(12)
  randomnums <- data.frame(replicate(n, runif(niter, 0, 1)))
  wt_sim     <- randomnums / rowSums(randomnums)

  # Convert annual rf to monthly before computing Sharpe ratios
  rf_monthly <- rf / 12

  # Compute portfolio mean, sigma, and Sharpe ratio for each simulation
  weight  <- matrix(NA, nrow = n, ncol = 1)
  Results <- matrix(NA, nrow = niter, ncol = n + 3)

  for (i in 1:niter) {
    for (k in 1:n) {
      Results[i, k] <- weight[k, 1] <- wt_sim[i, k]
    }
    port_mean         <- t(weight) %*% meanret
    port_sigma        <- sqrt(t(weight) %*% covar %*% weight)
    sharpe            <- (port_mean - rf_monthly) / port_sigma
    Results[i, n + 1] <- port_mean
    Results[i, n + 2] <- port_sigma
    Results[i, n + 3] <- sharpe
  }

  colnames(Results) <- c(paste0(tickers, "_wt"), "PortMean", "PortSigma", "Sharpe")

  list(
    stock_means    = round(meanret, 6),
    cov_matrix     = round(covar, 8),
    sim_portfolios = as.data.frame(Results)
  )
}

The function accepts four inputs: tickers, begin_date, end_date, and rf (annual risk-free rate) and returns a named list containing the vector of stock means, the covariance matrix, and a data frame of simulated portfolio results (weights, mean, sigma, Sharpe ratio).

yfR::yf_get() retrieves all tickers in a single call and returns monthly arithmetic returns on adjusted prices directly, removing the need for per-ticker loops or manual adjusted price extraction. The long format output was reshaped to wide format via pivot_wider() before any matrix calculations were performed.

4 Results

4.1 Function Inputs

The function was run with the required inputs below. With 7 securities, 700 portfolios were simulated (100 × 7).

tickers    <- c("GE", "XOM", "GBX", "SBUX", "PFE", "HMC", "NVDA")
begin_date <- "2021-01-01"
end_date   <- "2024-12-31"
rf         <- 0.02

port_output <- myMeanVarPort(tickers, begin_date, end_date, rf)
Results     <- port_output$sim_portfolios

4.2 Stock Means and Covariance Matrix

The mean returns and covariance matrix are single aggregate values computed over the full sample period. The diagonal elements of the covariance matrix are individual stock variances; off-diagonal elements measure pairwise co-movement and drive diversification benefits.

cat("Mean Monthly Returns (2021-2024):\n")
## Mean Monthly Returns (2021-2024):
print(port_output$stock_means)
##           [,1]
## GE    0.029508
## XOM   0.025301
## GBX   0.023086
## SBUX  0.003455
## PFE  -0.000358
## HMC   0.006483
## NVDA  0.063030
cat("Variance-Covariance Matrix:\n")
## Variance-Covariance Matrix:
print(port_output$cov_matrix)
##               GE         XOM        GBX       SBUX         PFE         HMC
## GE    0.00909667  0.00394022 0.00641849 0.00314594 -0.00075865  0.00270523
## XOM   0.00394022  0.00711813 0.00387242 0.00071498 -0.00066581  0.00141214
## GBX   0.00641849  0.00387242 0.02247839 0.00226764  0.00175633  0.00220912
## SBUX  0.00314594  0.00071498 0.00226764 0.00623296  0.00093618  0.00024185
## PFE  -0.00075865 -0.00066581 0.00175633 0.00093618  0.00566407 -0.00046055
## HMC   0.00270523  0.00141214 0.00220912 0.00024185 -0.00046055  0.00396152
## NVDA  0.00739327  0.00005447 0.00755342 0.00277184  0.00288030  0.00331454
##            NVDA
## GE   0.00739327
## XOM  0.00005447
## GBX  0.00755342
## SBUX 0.00277184
## PFE  0.00288030
## HMC  0.00331454
## NVDA 0.02393421

4.3 Simulated Portfolios (First 10 Rows)

cat("Total portfolios simulated:", nrow(Results), "\n\n")
## Total portfolios simulated: 700
kable(round(head(Results, 10), 5),
      caption = "First 10 Simulated Portfolios",
      align   = "r")
First 10 Simulated Portfolios
GE_wt XOM_wt GBX_wt SBUX_wt PFE_wt HMC_wt NVDA_wt PortMean PortSigma Sharpe
0.02200 0.12060 0.10444 0.14058 0.26511 0.21988 0.12739 0.01596 0.05211 0.27424
0.17490 0.09083 0.16895 0.18331 0.12098 0.20000 0.06103 0.01709 0.05882 0.26227
0.24676 0.01779 0.26135 0.04117 0.14936 0.11392 0.16966 0.02529 0.07712 0.30625
0.08228 0.22659 0.06260 0.01112 0.27532 0.07600 0.26609 0.02681 0.06470 0.38862
0.05806 0.20295 0.28035 0.08887 0.16370 0.06580 0.14027 0.02284 0.07026 0.30132
0.00897 0.11912 0.24735 0.18626 0.10236 0.10331 0.23262 0.02493 0.07284 0.31936
0.04826 0.19277 0.22443 0.16279 0.26723 0.10418 0.00034 0.01265 0.05669 0.19366
0.20001 0.15077 0.26378 0.01574 0.18823 0.17998 0.00148 0.01705 0.06508 0.23643
0.00667 0.10845 0.27122 0.09188 0.25678 0.14664 0.11837 0.01784 0.06533 0.24757
0.00282 0.21481 0.24811 0.17142 0.14310 0.11865 0.10109 0.01893 0.06292 0.27435

4.4 Optimal and Minimum Variance Portfolios

opt_idx  <- which.max(Results$Sharpe)
opt_port <- Results[opt_idx, ]

mvp_idx  <- which.min(Results$PortSigma)
mvp_port <- Results[mvp_idx, ]

cat("OPTIMAL PORTFOLIO (Highest Sharpe)\n")
## OPTIMAL PORTFOLIO (Highest Sharpe)
cat(sprintf("  Portfolio Mean:  %.5f\n", opt_port$PortMean))
##   Portfolio Mean:  0.03802
cat(sprintf("  Portfolio Sigma: %.5f\n", opt_port$PortSigma))
##   Portfolio Sigma: 0.08575
cat(sprintf("  Sharpe Ratio:    %.5f\n", opt_port$Sharpe))
##   Sharpe Ratio:    0.42395
cat("  Weights:\n")
##   Weights:
for (t in tickers) cat(sprintf("    %-4s : %.2f%%\n", t,
                                opt_port[[paste0(t, "_wt")]] * 100))
##     GE   : 14.04%
##     XOM  : 18.07%
##     GBX  : 2.32%
##     SBUX : 8.09%
##     PFE  : 1.31%
##     HMC  : 12.22%
##     NVDA : 43.95%
cat("\nMINIMUM VARIANCE PORTFOLIO\n")
## 
## MINIMUM VARIANCE PORTFOLIO
cat(sprintf("  Portfolio Mean:  %.5f\n", mvp_port$PortMean))
##   Portfolio Mean:  0.01091
cat(sprintf("  Portfolio Sigma: %.5f\n", mvp_port$PortSigma))
##   Portfolio Sigma: 0.04371
cat(sprintf("  Sharpe Ratio:    %.5f\n", mvp_port$Sharpe))
##   Sharpe Ratio:    0.21155
cat("  Weights:\n")
##   Weights:
for (t in tickers) cat(sprintf("    %-4s : %.2f%%\n", t,
                                mvp_port[[paste0(t, "_wt")]] * 100))
##     GE   : 7.49%
##     XOM  : 19.45%
##     GBX  : 2.44%
##     SBUX : 23.79%
##     PFE  : 20.96%
##     HMC  : 24.46%
##     NVDA : 1.40%

5 Plots

5.1 Feasible Set

The “feasible set” represents the universe of all 700 simulated portfolios, visualized here as a characteristic “bullet” shape. Each steel-blue point represents a unique combination of the seven stocks (GE, XOM, GBX, SBUX, PFE, HMC, and NVDA). The interior points represent diversified but sub-optimal holdings, while the outer boundary represents the limit of what is achievable given the historical returns and correlations of these specific assets.

ggplot(Results, aes(x = PortSigma, y = PortMean)) +
  geom_point(colour = "steelblue", alpha = 0.3, size = 1.2) +
  labs(
    title    = "Feasible Set of Simulated Portfolios",
    subtitle = "GE, XOM, GBX, SBUX, PFE, HMC, NVDA | 2021-2024 | 700 Simulations",
    x        = "Portfolio Sigma (Monthly Std Dev)",
    y        = "Portfolio Mean (Monthly Return)"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5), # Centered Title
    plot.subtitle = element_text(hjust = 0.5)              # Centered Subtitle
  )

5.2 Optimal & Minimum Variance Portfolios

To identify the best-performing portfolios from our simulation, we applied a binning method that traces the “outer shell” of the feasible set. By selecting the portfolio with the minimum risk (\(\sigma\)) for every return level, we highlight the Efficient Frontier (red points). The Minimum Variance Portfolio (green triangle) marks the absolute lowest volatility achievable. Critically, only the segment extending above the MVP is considered the true Efficient Frontier; any points below it are considered inefficient. The Optimal Portfolio (gold diamond) identifies the specific weight allocation that maximizes the Sharpe Ratio, representing the most efficient balance of risk and reward relative to our 2% risk-free rate.

# Bin by sigma (x-axis) and take the MAX return per bin -> upper envelope
seqmret <- seq(round(min(Results$PortMean), 3) - .001,
               max(Results$PortMean) + .001, .005)

optim_frontier <- Results %>%
  mutate(portnumber = row_number(),
         ints       = cut(PortMean, breaks = seqmret)) %>%
  group_by(ints) %>%
  summarise(sig_optim  = min(PortSigma),
            retn_optim = PortMean[which.min(PortSigma)],
            .groups    = "drop") %>%
  filter(!is.na(ints)) %>%
  arrange(retn_optim)
ggplot() +
  geom_point(data = Results, aes(x = PortSigma, y = PortMean),
             colour = "steelblue", alpha = 0.2, size = 1.2) +
  geom_point(data = optim_frontier, aes(x = sig_optim, y = retn_optim),
             colour = "darkred", size = 2) +
  geom_point(data = opt_port, aes(x = PortSigma, y = PortMean),
             colour = "gold", size = 5, shape = 18) +
  geom_point(data = mvp_port, aes(x = PortSigma, y = PortMean),
             colour = "green4", size = 5, shape = 17) +
  annotate("text",
           x     = opt_port$PortSigma + 0.001,
           y     = opt_port$PortMean,
           label = sprintf("Optimal Portfolio \n (Sharpe = %.3f)", opt_port$Sharpe),
           hjust = 0, size = 3.2, colour = "darkgoldenrod") +
  annotate("text",
           x     = mvp_port$PortSigma + 0.001,
           y     = mvp_port$PortMean,
           label = sprintf("Min Variance Portfolio \n (σ = %.4f)", mvp_port$PortSigma),
           hjust = 0, size = 3.2, colour = "green4") +
  labs(
    title    = "Mean-Variance Portfolio Optimization via Simulation",
    subtitle = "GE, XOM, GBX, SBUX, PFE, HMC, NVDA | 2021-2024 | rf = 2% p.a.",
    x        = "Portfolio Sigma (Monthly Std Dev)",
    y        = "Portfolio Mean (Monthly Return)",
    caption  = paste("Gold diamond = Optimal Portfolio (Max Sharpe)",
                     "| Green triangle = Min Variance Portfolio")
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title   = element_text(face = "bold", size = 13, hjust = 0.5), # Centered
    plot.subtitle = element_text(hjust = 0.5),                         # Centered
    plot.caption = element_text(size = 9, hjust = 0.5)                 # Centered
  )

5.3 Interpreting the Outputs

The simulation results revealed a significant performance gap between the selected securities from 2021–2024, offering a practical look at the trade-offs between risk and reward. NVDA dominated the set with a monthly mean return of ~6.3%, driven by the AI semiconductor boom, while GE and XOM provided strong secondary growth. Conversely, SBUX and PFE showed near-zero or negative returns, acting primarily as diversifiers rather than return-drivers.

This disparity is clearly visible in our two benchmark portfolios:

The Optimal Portfolio (Gold Diamond): This represents the “sweet spot” that maximizes the Sharpe Ratio

\[\text{Sharpe Ratio} = \frac{\bar{R}_p - R_f}{\sigma_p}\]

To achieve its high risk-adjusted return, the model allocated a heavy 44% weight to NVDA. While mathematically ideal, this concentration highlights a real-world risk: relying heavily on a single high-performer can make the portfolio vulnerable if that sector’s trend reverses.

The Minimum Variance Portfolio (Green Triangle): Positioned at the absolute “nose” of the feasible set, the MVP prioritizes stability over growth. It achieved the lowest volatility (\(\sigma\)) by shifting weight away from NVDA and into more stable assets like HMC and SBUX.

Ultimately, the Efficient Frontier (red points) above the MVP illustrate that diversification is not just about owning many stocks, but about finding the right weights to eliminate “dominated” portfolios. Any point below the MVP is inefficient because an investor could achieve a higher return for the same level of risk by simply moving to a point higher up on the frontier.

5.4 Pros and Cons of the Simulation Method

Simulation Quadratic Optimization
Precision Approximate Exact
Interpretability High — full feasible set visible Low — solution only
Solver required No Yes (quadprog, PortfolioAnalytics)
Scalability Poor for large N Better
Constraints Simple (filter results) Requires explicit specification

The primary advantage of the simulation method is its transparency and interpretability. By visualizing the full feasible set, investors can see the “density” of potential outcomes without relying on a black-box mathematical solver. However, simulation is an approximation; the accuracy of the frontier depends on the number of iterations and the random seed used. While we can easily add constraints (like “no short-selling”) by simply filtering our results, both this method and quadratic optimization suffer from estimation risk. Because the model relies on historical means and covariances, any “garbage in” regarding past data will result in “garbage out” for future projections.

6 Conclusion

This report demonstrated simulation-based mean-variance optimization using seven stocks (GE, XOM, GBX, SBUX, PFE, HMC, NVDA) over January 2021 to December 2024. Data retrieval was performed using the yfR package, which provides adjusted closing prices by default and enables batch downloading in a single function call. Generating 700 random portfolios with all return, sigma, and Sharpe values treated as aggregate figures across the full period which allowed us to visualize the feasible set, trace the efficient frontier, and identify both the Optimal Portfolio and the Minimum Variance Portfolio. The simulation approach is intuitive and requires no specialized solvers, making it a practical alternative to quadratic optimization at the cost of an approximate rather than exact solution.

Running this simulation really helped our group move past the theory and actually see how mean-variance optimization works in practice. By testing 700 different weight combinations and plotting the results, we could clearly see how diversification helps lower risk. It was also easy to spot why portfolios below the Minimum Variance Portfolio (MVP) just don’t make sense, as they offer less return for the same or even higher risk. A few things really clicked for us. We noticed how much NVDA dominated the Optimal Portfolio with a 44% weight, which showed us how a single high-performing stock can take over when you’re focused solely on the Sharpe ratio. It made us wonder if relying that heavily on one asset would actually work out in the real world. We also spent time comparing the Optimal Portfolio to the MVP. It was a great lesson in trade-offs: one aims for the best possible return per unit of risk, while the other just tries to keep things steady. Neither is “better” overall; it really just depends on how much risk an investor is willing to handle. Finally, we recognized that while this model is powerful, it has its limits. Since it relies on historical data to predict the future, any errors in our initial assumptions get magnified. Overall, this project helped us connect the math to real-world numbers in a way that finally made the concepts feel intuitive.