First things First

Getting all 30 Stocks that comprise the Index

We are going to get historical prices for all 30 stocks from yahoo finance using quantmod library.

suppressPackageStartupMessages(library(quantmod,warn.conflicts = FALSE))
suppressPackageStartupMessages(library(dplyr,warn.conflicts = FALSE))

symbols=c(
  'MMM','AXP','AMGN','AAPL','BA','CAT','CVX','CSCO','KO','DOW','GS','HD','HON',
  'IBM','INTC','JNJ','JPM','MCD','MRK','MSFT','NKE','PG','CRM','TRV','UNH','VZ',
  'V','WBA','WMT','DIS'
)
getSymbols(symbols, src='yahoo', from='2020-01-01', to ='2023-10-01')

##  [1] "MMM"  "AXP"  "AMGN" "AAPL" "BA"   "CAT"  "CVX"  "CSCO" "KO"   "DOW" 
## [11] "GS"   "HD"   "HON"  "IBM"  "INTC" "JNJ"  "JPM"  "MCD"  "MRK"  "MSFT"
## [21] "NKE"  "PG"   "CRM"  "TRV"  "UNH"  "VZ"   "V"    "WBA"  "WMT"  "DIS"

Putting all 30 stocks in a list to apply cleaning functions to all companies.

listStocks = list(
  MMM,AXP,AMGN, AAPL,BA,CAT,CVX,CSCO, KO,DOW,GS,HD,HON,IBM,INTC,JNJ,JPM,MCD,
  MRK,MSFT,NKE,PG,CRM,TRV,UNH,VZ,V, WBA,WMT,DIS
)

# all elements of list as dataframe
listStocks1=lapply(listStocks,as.data.frame)

# function to move rownames (date) to column 
getDateToColumn=function(aDataFrame){
  x=rownames(aDataFrame)
  aDataFrame$Date = as.Date(x)
  aDataFrame
}

listStocks2=lapply(listStocks1,getDateToColumn)

#making sure we don't have any missing dates
df_dates=data.frame(allDates=seq(as.Date('2020-01-02'),as.Date('2023-10-01'), by='day' ))

fn_leftJoinDates = function(df){
  df1=df_dates %>% left_join(df,by=c('allDates'='Date'))
  df1
}

listStocks3=lapply(listStocks2,fn_leftJoinDates)
rm(listStocks,listStocks1,listStocks2)

library(zoo)
#filling in values based on last existing non-NA value
listStocks4=lapply(listStocks3,na.locf)

suppressPackageStartupMessages(library(lubridate))

fn_filterLastDayMonth = function(df){
  #filtering for last day of the month
  df1=df %>% filter(day(allDates) == days_in_month(allDates))
  #selecting columns 1 and 5
  df1[,c(1,5)]
}

listStocks5=lapply(listStocks4,fn_filterLastDayMonth)

#joining all df into 1 
joiningAllDf = function(aList){
  baseStart = aList[[1]]
  
  for(i in 2:30){
      baseStart = baseStart %>% left_join(aList[[i]], by='allDates')  
  }
  
  baseStart
  
}

df_stockPricesAll=joiningAllDf(listStocks5)

After initial data wrangling is done, this is the output:

df_stockPricesAll %>% head()

Composition of the Dow Jones as of Sept 30, 2023

library(ggplot2)

df_September=df_stockPricesAll %>% filter(allDates=='2023-09-30') %>% 
  tidyr::pivot_longer(cols=2:31) 
df_September$name=stringr::str_replace(df_September$name,'.Close','')

df_September$name <- reorder(df_September$name, -df_September$value)

  df_September%>% 
  ggplot()+
  geom_col(
    aes(x=name,y=value),fill='steelblue4'
  )+
    labs(title = 'Price of Stock End of September 2023',
         y='Price of Stock',x='Company Name') +
  theme(
    axis.text.x = ggplot2::element_text(angle = 45, hjust=1)
  )

We notice that United Health Group, followed by a distant second Goldman Sachs take the top 2 spots in the Dow Jones Index.

Important Note 1 It is important to notice that the DJIA is a Price Weighed Index, as explained here: link to document

A price weighed index just means that the weight assigned to each stock is based on its price.

Important Note 2 The composition of the Dow Jones can change over time, and it certainly has. As recent as of August 31, 2020 Pfizer, ExxonMobil, and Raytheon are no longer part of the index. They were replaced by Amgen, Honeywell, and Salesforce. link to article

This means that our analysis will be based on the historical performance of the current composition of the index, expecting that it will stay unchanged going into the future.

Now, we want to calculate the return of each stock and that of the DJIA index.

So, return is basically the percentage of price change.

\[\dfrac{Price\_This\_Month - Price\_Last\_Month}{Price\_Last\_Month}\]

df_stockPricesAll %>% 
  mutate(
    totalIndexPrice = df_stockPricesAll[,2:31] %>% rowSums()
    
  ) -> df_stockPricesAll


fn_calculateReturn=function(aDf){
  aDf1=aDf %>% select(-allDates)
  
  percentage_change_df <- aDf1 %>%
    mutate_all(funs((. - lag(.))/lag(.))) 
  
  # Combine with the date column
  percentage_change_df <- cbind(aDf['allDates'], percentage_change_df)
  
  percentage_change_df[-1,]
}

df_returns_all=fn_calculateReturn(df_stockPricesAll)

## Warning: `funs()` was deprecated in dplyr 0.8.0.
## ℹ Please use a list of either functions or lambdas:
## 
## # Simple named list: list(mean = mean, median = median)
## 
## # Auto named with `tibble::lst()`: tibble::lst(mean, median)
## 
## # Using lambdas list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

df_returns_all %>% head

Ok, now let’s observe the return of the Dow Jones in time:

library(scales)
df_returns_all %>% select(allDates,totalIndexPrice) %>% 
  ggplot()+geom_line(aes(x=allDates,y=totalIndexPrice),color='steelblue4')+
  scale_y_continuous(labels = label_percent(scale = 100))+
  geom_smooth(aes(x=allDates,y=totalIndexPrice),method = "lm", se = T, color = "blue",linetype = "dotted")+
  labs(y='Index Price',x='Time')

## `geom_smooth()` using formula = 'y ~ x'

It seems on average that the Dow Jones index has been performing well below 2.5% return month to month, with very high spikes and very deep lows which is typical of a risky asset.

The average return of the index, and its standard deviation has been:

df_returns_all %>% select(totalIndexPrice) %>% summarise(average_return_index=mean(totalIndexPrice),st.dev.index=sd(totalIndexPrice))

What this tells us is that the index return average is close to zero percent, and its volatility has been +/- 5 percentual points from its average.

What’s Next?

Ideally, what we would like to know if there is another combination of the same 30 stocks where we can achieve a higher average return, or a smaller volatility or both. In other words, we would like to find an optimal combination of weights where we can maximize:

\[\dfrac{E(r)}{Risk}\]

E(r) = Expected Portfolio Return
Risk as measured by the standard deviation of the portfolio.

And with this we are approximating the equation of the Sharpe Ratio learn about Sharpe Ratio here. Except that the Sharpe ratio depends on a constant being the risk free rate. Since it is as constant, it would not affect the comparison to find our optimal porfolio. So, we will proceed with our formula above.

But, how are we going to visualize this?

Well, we are going to use a bit of geometry First, to visualize a few different weighted portfolios, we are going to draw a circumference with all 30 stocks equally distant from each other forming a circle. As such:

#getting 30 degrees equally distant from each other: 12, 24, 36... 360
degrees=seq(12,360, by=12)

#circle of radius = 1 
heights1 <- sin(degrees*pi/180) #all heights
lengths1 <-  cos(degrees*pi/180) #all lengths

df_init=data.frame(l1=lengths1,h1=heights1,s1=symbols)

df_init %>% ggplot()+
  geom_text(aes(l1,h1,label=s1))

Let’s now visualize, equally weighted portfolio vs arbitrarily weighted portfolio

#equally weighed portfolio: all stocks are given the same weight (1/30)
df_equalWeight1=df_init %>% mutate(eqWeightedLength = (1/30) * l1, 
                                   eqWeighedHeight = (1/30) * h1) %>% 
  summarise(weightedLenght= sum(eqWeightedLength),
            weightedHeight = sum(eqWeighedHeight))




#arbitrarily providing more weight to bottom-right quadrant (60% weight)
myWeights = c( 
                #anything else
                rep(1,21),18,
               #bottom right quadrant
               10,10,10,10,10,5,5,1)/100


anotherDf=data.frame(name=symbols,value=myWeights)


df_myWeight1=df_init %>% left_join(anotherDf,by=c('s1'='name')) %>% 
  mutate(
    weightedLenght= l1*value,
    weightedHeight = h1*value
  ) %>% 
  summarise(
    totalWeightedLength = sum(weightedLenght),
    totalWeightedHeight = sum(weightedHeight)
  )


df_init %>% ggplot()+
  geom_text(aes(l1,h1,label=s1))+
  geom_point(aes(df_equalWeight1$weightedLenght,df_equalWeight1$weightedHeight),color='red')+
  geom_text(aes(df_equalWeight1$weightedLenght,df_equalWeight1$weightedHeight + 0.1,label='equally weighted (1/30)'), color ='red')+
  geom_point(aes(df_myWeight1$totalWeightedLength,df_myWeight1$totalWeightedHeight),color='blue')+
  geom_text(aes(df_myWeight1$totalWeightedLength,df_myWeight1$totalWeightedHeight+0.1,label='arbitrary - higher weight to bottom right quad'), color='blue')

Now, to get an optimum portfolio we will make use of Monte Carlo Simulation. We will arbitrarily select 10,000 different random portfolios.

Getting 10,000 random portfolios with different weights

Example of first 3 porfolios

set.seed(123)
# get 10,000 portfolios of 30 random weights that add to 1. Each porfolio of weights is a column
matrix_weights_columnWise=replicate(10000,crearNumerosRandomSumanUno(30))

matrix_weights_columnWise[,1:3]

##       [,1] [,2] [,3]
##  [1,] 0.01 0.00 0.00
##  [2,] 0.00 0.05 0.00
##  [3,] 0.00 0.00 0.00
##  [4,] 0.00 0.00 0.00
##  [5,] 0.00 0.00 0.03
##  [6,] 0.00 0.00 0.00
##  [7,] 0.00 0.00 0.00
##  [8,] 0.00 0.95 0.00
##  [9,] 0.13 0.00 0.00
## [10,] 0.02 0.00 0.00
## [11,] 0.01 0.00 0.00
## [12,] 0.00 0.00 0.00
## [13,] 0.00 0.00 0.00
## [14,] 0.02 0.00 0.00
## [15,] 0.00 0.00 0.00
## [16,] 0.30 0.00 0.00
## [17,] 0.00 0.00 0.00
## [18,] 0.00 0.00 0.00
## [19,] 0.00 0.00 0.00
## [20,] 0.00 0.00 0.24
## [21,] 0.00 0.00 0.00
## [22,] 0.00 0.00 0.00
## [23,] 0.00 0.00 0.00
## [24,] 0.50 0.00 0.00
## [25,] 0.00 0.00 0.05
## [26,] 0.00 0.00 0.00
## [27,] 0.00 0.00 0.00
## [28,] 0.01 0.00 0.68
## [29,] 0.00 0.00 0.00
## [30,] 0.00 0.00 0.00

Function to Calculate Return, and Risk of all 10,000 Portfolios

df_numeric_returnsOnly = 
  df_returns_all %>% 
  filter(allDates>= '2021-01-01' & 
         allDates <= '2022-12-31') %>% 
  select(-c(allDates,totalIndexPrice))

mean_return=
  df_numeric_returnsOnly %>% 
  colMeans()

covariance_portfolio = cov(df_numeric_returnsOnly)

calculateReturn = function(columna_de_pesos){
  sum(mean_return * columna_de_pesos)
}

calculateRisk = function(columna_de_pesos){
  varPortfolio=t(columna_de_pesos) %*% 
  covariance_portfolio %*% 
  t(t(columna_de_pesos)) 
  
  sqrt(varPortfolio)
}


vector_allReturns = apply(matrix_weights_columnWise,2,calculateReturn) 
vector_allRisk = apply(matrix_weights_columnWise,2,calculateRisk) 
vector_SharpeR = vector_allReturns/vector_allRisk

The current risk-return trade-off of Dow Jones is:

df_returns_all %>% 
  filter(allDates >= '2021-01-31' & allDates <= '2022-12-31') %>%  
  select(totalIndexPrice) %>%
  summarise(average_return_index=mean(totalIndexPrice),
            st.dev.index=sd(totalIndexPrice),
            currentSharpe = mean(totalIndexPrice)/sd(totalIndexPrice)
            )

Let’s view those portfolios with: Highest Sharpe Ratio

df_risk_and_return=data.frame(vector_allReturns,vector_allRisk, vector_SharpeR)
df_risk_and_return$portfolioNumber = 1:10000 #giving portfolio numbers


# 5 Portfolios with Highest Sharpe Ratio
df_risk_and_return %>% 
  arrange(desc(vector_SharpeR)) %>% 
  head(5)

Visually inspecting 5 portfolios:

Dow Jones
Dow Jones with similar Risk but Higher Return
Highest Sharpe Ratio
Lowest Risk
Highest Return

library(scales)

df_forLinePlot=data.frame(allDates=df_stockPricesAll %>% filter(allDates>= '2020-12-31') %>% pull(allDates),
           `DJIA`=fn_moneyInvested_inTime(weight_vector = index_initialWeights_Dec2020)  ,
           `DJIA Low Risk and High Return`= fn_moneyInvested_inTime(weight_vector =matrix_weights_columnWise[,portafolio_similarDJIA_butBetter]),
           `Highest Return`= fn_moneyInvested_inTime(weight_vector =matrix_weights_columnWise[,portafolio_highestReturn]),
           `Lowest Risk`= fn_moneyInvested_inTime(weight_vector =matrix_weights_columnWise[,portafolio_lowestRisk]),
           `Highest Sharpe`= fn_moneyInvested_inTime(weight_vector =matrix_weights_columnWise[,portafolio_highestSharpe])
           
           ) %>% tidyr::pivot_longer(cols=2:6) 

df_forLinePlot_v2=df_forLinePlot %>% filter(allDates=='2023-09-30')



  df_forLinePlot%>% 
  ggplot()+
  geom_line(aes(x=allDates,y=value,color=name,linetype=name))+
  geom_text(data=df_forLinePlot_v2,aes(x=allDates,y=value+7000,
                                       label=sprintf("$%s", formatC(round(value,0), format = "d", big.mark = ","))),size=2.5)+
  labs(x='Time',y='Portfolio Value', title='Investing $100,000 in Rebalanced Dow Jones Index')+
  scale_y_continuous(labels=scales::dollar_format())+
  scale_x_date(labels = scales::date_format("%b %Y"),breaks = '3 month')+
  theme(
    legend.position = "bottom",
    legend.key.size = unit(0.5, "cm"),  # Adjust the size of the legend keys
    legend.text = element_text(size = 8),  # Adjust the size of legend text
    legend.title = element_blank(),
    axis.text.x = element_text(angle = 45, hjust = 1,size = 8)
    
  )

We can notice that if we had started with $100,000 in Dec 2020, and invested them in a similar risky portfolio but with a higher return, we would have $137,815 rather than $109,510 if we had invested in the DJIA. Let’s remember that we are considering the same 30 stocks as the DJIA, we are just assigning different weights on how to distribute our money.

There are clearly portfolios than in the same time period would have netted us a negative 26 thousand dollars, or others with higher volatility but a higher return as the top 2 investment vehicles in the chart above. At the end, we are looking to have the highest return with the lowest risk, but this is easier said than done. There are many more ways that we can combine those 30 stocks and do worse than the index, than the ways we can combine them and do better. The reality is that the lower the risk, the lower the return and vice-versa.

Finally, I think it would be great visualizing how these portfolios look in our circle diagram:

#equally weighed portfolio: all stocks are given the same weight (1/30)
df_equalWeight1=df_init %>% mutate(eqWeightedLength = (1/30) * l1, 
                                   eqWeighedHeight = (1/30) * h1) %>% 
  summarise(weightedLenght= sum(eqWeightedLength),
            weightedHeight = sum(eqWeighedHeight))




#arbitrarily providing more weight to bottom-right quadrant (60% weight)
myWeights = c( 
                #anything else
                rep(1,21),18,
               #bottom right quadrant
               10,10,10,10,10,5,5,1)/100


anotherDf=data.frame(name=symbols,value=myWeights)


fn_visualizePortfolioInCircle=function(theWeights){

  weighedLength1 = df_init$l1 * theWeights 
  weighedHeight1 = df_init$h1 * theWeights 
  
  c(sum(weighedLength1),sum(weighedHeight1))
  
}

df_init %>% ggplot()+
  geom_text(aes(l1,h1,label=s1))+
  geom_point(aes(df_equalWeight1$weightedLenght,df_equalWeight1$weightedHeight),color='tan',size=3)+
  geom_text(aes(df_equalWeight1$weightedLenght,df_equalWeight1$weightedHeight + 0.1,label='equally weighted (1/30)'), color ='tan')+
  
  #DJIA
  geom_point(aes(fn_visualizePortfolioInCircle(index_initialWeights_Dec2020)[1],fn_visualizePortfolioInCircle(index_initialWeights_Dec2020)[2]),color='black',size=3)+
  geom_text(aes(fn_visualizePortfolioInCircle(index_initialWeights_Dec2020)[1],fn_visualizePortfolioInCircle(index_initialWeights_Dec2020)[2]+0.1,label='DJIA'), color='black')+
  
  #Similar DJIA but Higher Return
  geom_point(aes(fn_visualizePortfolioInCircle(matrix_weights_columnWise[,portafolio_similarDJIA_butBetter])[1],
                 fn_visualizePortfolioInCircle(matrix_weights_columnWise[,portafolio_similarDJIA_butBetter])[2]),color='navy',size=3)+
  geom_text(aes(
    fn_visualizePortfolioInCircle(matrix_weights_columnWise[,portafolio_similarDJIA_butBetter])[1],
    fn_visualizePortfolioInCircle(matrix_weights_columnWise[,portafolio_similarDJIA_butBetter])[2]+0.1,
    label='DJIA Equal Risk and Higher Return'), color='navy')+
  
  #Highest Return 
  
  geom_point(aes(fn_visualizePortfolioInCircle(matrix_weights_columnWise[,portafolio_highestReturn])[1],
                 fn_visualizePortfolioInCircle(matrix_weights_columnWise[,portafolio_highestReturn])[2]),color='red',size=3)+
  geom_text(aes(
    fn_visualizePortfolioInCircle(matrix_weights_columnWise[,portafolio_highestReturn])[1],
    fn_visualizePortfolioInCircle(matrix_weights_columnWise[,portafolio_highestReturn])[2]+0.1,
    label='Highest Return'), color='red')+
  
  #Highest Sharpe
   geom_point(aes(fn_visualizePortfolioInCircle(matrix_weights_columnWise[,portafolio_highestSharpe])[1],
                 fn_visualizePortfolioInCircle(matrix_weights_columnWise[,portafolio_highestSharpe])[2]),color='steelblue3',size=3)+
  geom_text(aes(
    fn_visualizePortfolioInCircle(matrix_weights_columnWise[,portafolio_highestSharpe])[1],
    fn_visualizePortfolioInCircle(matrix_weights_columnWise[,portafolio_highestSharpe])[2]+0.1,
    label='Highest Sharpe'), color='steelblue3')+
  
  theme(
    axis.text = element_blank(),
    axis.ticks = element_blank(),
    axis.title = element_blank()
  )

This graph shows that the highest return is actually putting all of our money on CVX (Chevron Corporation), with a high risk no doubt.

This is an attempt to demonstrate the risk-tradeoff with return and how it could potentially be used to study Portfolio Optimization. (Note: Weight of DJIA as of Dec 2020 to be comparable with chart before)

Dow Jones Industrial Average Rebalancing

Intro

First things First

Getting all 30 Stocks that comprise the Index

What’s Next?

Visually inspecting 5 portfolios: