In this project I will attempt to rebalance the Dow Jones Industrial Average in such a way as to expect a higher return, a lower risk, or both. I feel this is a lot to digest at first. So, let’s start with the basics.
1. What is the Dow Jones?
Investopedia Link
1
From investopedia, we learn:
The Dow Jones Industrial Average (DJIA) is a stock market index that tracks 30 large, publicly-owned blue-chip companies trading on the New York Stock Exchange (NYSE) and Nasdaq.
2. How is the index calculated?
Basically, you
get 1 stock from each of the 30 companies, add them up and then multiply
by an arbitrary constant (yes, basically arbitrary) which is 6.59. More
on this constant here: Investopedia
Link 2
Since we are interested in the return from Month-to-Month, we can skip this constant. Removing it from the equation will not affect the calculated return.
We are going to get historical prices for all 30 stocks from yahoo finance using quantmod library.
suppressPackageStartupMessages(library(quantmod,warn.conflicts = FALSE))
suppressPackageStartupMessages(library(dplyr,warn.conflicts = FALSE))
symbols=c(
'MMM','AXP','AMGN','AAPL','BA','CAT','CVX','CSCO','KO','DOW','GS','HD','HON',
'IBM','INTC','JNJ','JPM','MCD','MRK','MSFT','NKE','PG','CRM','TRV','UNH','VZ',
'V','WBA','WMT','DIS'
)
getSymbols(symbols, src='yahoo', from='2020-01-01', to ='2023-10-01')
## [1] "MMM" "AXP" "AMGN" "AAPL" "BA" "CAT" "CVX" "CSCO" "KO" "DOW"
## [11] "GS" "HD" "HON" "IBM" "INTC" "JNJ" "JPM" "MCD" "MRK" "MSFT"
## [21] "NKE" "PG" "CRM" "TRV" "UNH" "VZ" "V" "WBA" "WMT" "DIS"
Putting all 30 stocks in a list to apply cleaning functions to all companies.
listStocks = list(
MMM,AXP,AMGN, AAPL,BA,CAT,CVX,CSCO, KO,DOW,GS,HD,HON,IBM,INTC,JNJ,JPM,MCD,
MRK,MSFT,NKE,PG,CRM,TRV,UNH,VZ,V, WBA,WMT,DIS
)
# all elements of list as dataframe
listStocks1=lapply(listStocks,as.data.frame)
# function to move rownames (date) to column
getDateToColumn=function(aDataFrame){
x=rownames(aDataFrame)
aDataFrame$Date = as.Date(x)
aDataFrame
}
listStocks2=lapply(listStocks1,getDateToColumn)
#making sure we don't have any missing dates
df_dates=data.frame(allDates=seq(as.Date('2020-01-02'),as.Date('2023-10-01'), by='day' ))
fn_leftJoinDates = function(df){
df1=df_dates %>% left_join(df,by=c('allDates'='Date'))
df1
}
listStocks3=lapply(listStocks2,fn_leftJoinDates)
rm(listStocks,listStocks1,listStocks2)
library(zoo)
#filling in values based on last existing non-NA value
listStocks4=lapply(listStocks3,na.locf)
suppressPackageStartupMessages(library(lubridate))
fn_filterLastDayMonth = function(df){
#filtering for last day of the month
df1=df %>% filter(day(allDates) == days_in_month(allDates))
#selecting columns 1 and 5
df1[,c(1,5)]
}
listStocks5=lapply(listStocks4,fn_filterLastDayMonth)
#joining all df into 1
joiningAllDf = function(aList){
baseStart = aList[[1]]
for(i in 2:30){
baseStart = baseStart %>% left_join(aList[[i]], by='allDates')
}
baseStart
}
df_stockPricesAll=joiningAllDf(listStocks5)
After initial data wrangling is done, this is the output:
df_stockPricesAll %>% head()
Composition of the Dow Jones as of Sept 30, 2023
library(ggplot2)
df_September=df_stockPricesAll %>% filter(allDates=='2023-09-30') %>%
tidyr::pivot_longer(cols=2:31)
df_September$name=stringr::str_replace(df_September$name,'.Close','')
df_September$name <- reorder(df_September$name, -df_September$value)
df_September%>%
ggplot()+
geom_col(
aes(x=name,y=value),fill='steelblue4'
)+
labs(title = 'Price of Stock End of September 2023',
y='Price of Stock',x='Company Name') +
theme(
axis.text.x = ggplot2::element_text(angle = 45, hjust=1)
)
We notice that United Health Group, followed by a distant second
Goldman Sachs take the top 2 spots in the Dow Jones Index.
Important Note 1 It is important to notice that the DJIA is a Price Weighed Index, as explained here: link to document
A price weighed index just means that the weight assigned to each stock is based on its price.
Important Note 2 The composition of the Dow Jones can change over time, and it certainly has. As recent as of August 31, 2020 Pfizer, ExxonMobil, and Raytheon are no longer part of the index. They were replaced by Amgen, Honeywell, and Salesforce. link to article
This means that our analysis will be based on the historical performance of the current composition of the index, expecting that it will stay unchanged going into the future.
Now, we want to calculate the return of each stock and that of the DJIA index.
So, return is basically the percentage of price change.
\[\dfrac{Price\_This\_Month - Price\_Last\_Month}{Price\_Last\_Month}\]
df_stockPricesAll %>%
mutate(
totalIndexPrice = df_stockPricesAll[,2:31] %>% rowSums()
) -> df_stockPricesAll
fn_calculateReturn=function(aDf){
aDf1=aDf %>% select(-allDates)
percentage_change_df <- aDf1 %>%
mutate_all(funs((. - lag(.))/lag(.)))
# Combine with the date column
percentage_change_df <- cbind(aDf['allDates'], percentage_change_df)
percentage_change_df[-1,]
}
df_returns_all=fn_calculateReturn(df_stockPricesAll)
## Warning: `funs()` was deprecated in dplyr 0.8.0.
## ℹ Please use a list of either functions or lambdas:
##
## # Simple named list: list(mean = mean, median = median)
##
## # Auto named with `tibble::lst()`: tibble::lst(mean, median)
##
## # Using lambdas list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
df_returns_all %>% head
Ok, now let’s observe the return of the Dow Jones in time:
library(scales)
df_returns_all %>% select(allDates,totalIndexPrice) %>%
ggplot()+geom_line(aes(x=allDates,y=totalIndexPrice),color='steelblue4')+
scale_y_continuous(labels = label_percent(scale = 100))+
geom_smooth(aes(x=allDates,y=totalIndexPrice),method = "lm", se = T, color = "blue",linetype = "dotted")+
labs(y='Index Price',x='Time')
## `geom_smooth()` using formula = 'y ~ x'
It seems on average that the Dow Jones index has been performing
well below 2.5% return month to month, with very high spikes and very
deep lows which is typical of a risky asset.
The average return of the index, and its standard deviation has been:
df_returns_all %>% select(totalIndexPrice) %>% summarise(average_return_index=mean(totalIndexPrice),st.dev.index=sd(totalIndexPrice))
What this tells us is that the index return average is close to zero percent, and its volatility has been +/- 5 percentual points from its average.
Ideally, what we would like to know if there is another combination of the same 30 stocks where we can achieve a higher average return, or a smaller volatility or both. In other words, we would like to find an optimal combination of weights where we can maximize:
\[\dfrac{E(r)}{Risk}\]
E(r) = Expected Portfolio Return
Risk as measured by the standard
deviation of the portfolio.
And with this we are approximating the equation of the Sharpe Ratio learn about Sharpe Ratio here. Except that the Sharpe ratio depends on a constant being the risk free rate. Since it is as constant, it would not affect the comparison to find our optimal porfolio. So, we will proceed with our formula above.
But, how are we going to visualize this?
Well, we are going to use a bit of geometry First, to visualize a few different weighted portfolios, we are going to draw a circumference with all 30 stocks equally distant from each other forming a circle. As such:
#getting 30 degrees equally distant from each other: 12, 24, 36... 360
degrees=seq(12,360, by=12)
#circle of radius = 1
heights1 <- sin(degrees*pi/180) #all heights
lengths1 <- cos(degrees*pi/180) #all lengths
df_init=data.frame(l1=lengths1,h1=heights1,s1=symbols)
df_init %>% ggplot()+
geom_text(aes(l1,h1,label=s1))
Let’s now visualize, equally weighted portfolio vs arbitrarily
weighted portfolio
#equally weighed portfolio: all stocks are given the same weight (1/30)
df_equalWeight1=df_init %>% mutate(eqWeightedLength = (1/30) * l1,
eqWeighedHeight = (1/30) * h1) %>%
summarise(weightedLenght= sum(eqWeightedLength),
weightedHeight = sum(eqWeighedHeight))
#arbitrarily providing more weight to bottom-right quadrant (60% weight)
myWeights = c(
#anything else
rep(1,21),18,
#bottom right quadrant
10,10,10,10,10,5,5,1)/100
anotherDf=data.frame(name=symbols,value=myWeights)
df_myWeight1=df_init %>% left_join(anotherDf,by=c('s1'='name')) %>%
mutate(
weightedLenght= l1*value,
weightedHeight = h1*value
) %>%
summarise(
totalWeightedLength = sum(weightedLenght),
totalWeightedHeight = sum(weightedHeight)
)
df_init %>% ggplot()+
geom_text(aes(l1,h1,label=s1))+
geom_point(aes(df_equalWeight1$weightedLenght,df_equalWeight1$weightedHeight),color='red')+
geom_text(aes(df_equalWeight1$weightedLenght,df_equalWeight1$weightedHeight + 0.1,label='equally weighted (1/30)'), color ='red')+
geom_point(aes(df_myWeight1$totalWeightedLength,df_myWeight1$totalWeightedHeight),color='blue')+
geom_text(aes(df_myWeight1$totalWeightedLength,df_myWeight1$totalWeightedHeight+0.1,label='arbitrary - higher weight to bottom right quad'), color='blue')
Now, to get an optimum portfolio we will make use of Monte Carlo Simulation. We will arbitrarily select 10,000 different random portfolios.
Getting 10,000 random portfolios with different weights
Example of first 3 porfolios
set.seed(123)
# get 10,000 portfolios of 30 random weights that add to 1. Each porfolio of weights is a column
matrix_weights_columnWise=replicate(10000,crearNumerosRandomSumanUno(30))
matrix_weights_columnWise[,1:3]
## [,1] [,2] [,3]
## [1,] 0.01 0.00 0.00
## [2,] 0.00 0.05 0.00
## [3,] 0.00 0.00 0.00
## [4,] 0.00 0.00 0.00
## [5,] 0.00 0.00 0.03
## [6,] 0.00 0.00 0.00
## [7,] 0.00 0.00 0.00
## [8,] 0.00 0.95 0.00
## [9,] 0.13 0.00 0.00
## [10,] 0.02 0.00 0.00
## [11,] 0.01 0.00 0.00
## [12,] 0.00 0.00 0.00
## [13,] 0.00 0.00 0.00
## [14,] 0.02 0.00 0.00
## [15,] 0.00 0.00 0.00
## [16,] 0.30 0.00 0.00
## [17,] 0.00 0.00 0.00
## [18,] 0.00 0.00 0.00
## [19,] 0.00 0.00 0.00
## [20,] 0.00 0.00 0.24
## [21,] 0.00 0.00 0.00
## [22,] 0.00 0.00 0.00
## [23,] 0.00 0.00 0.00
## [24,] 0.50 0.00 0.00
## [25,] 0.00 0.00 0.05
## [26,] 0.00 0.00 0.00
## [27,] 0.00 0.00 0.00
## [28,] 0.01 0.00 0.68
## [29,] 0.00 0.00 0.00
## [30,] 0.00 0.00 0.00
Function to Calculate Return, and Risk of all 10,000 Portfolios
df_numeric_returnsOnly =
df_returns_all %>%
filter(allDates>= '2021-01-01' &
allDates <= '2022-12-31') %>%
select(-c(allDates,totalIndexPrice))
mean_return=
df_numeric_returnsOnly %>%
colMeans()
covariance_portfolio = cov(df_numeric_returnsOnly)
calculateReturn = function(columna_de_pesos){
sum(mean_return * columna_de_pesos)
}
calculateRisk = function(columna_de_pesos){
varPortfolio=t(columna_de_pesos) %*%
covariance_portfolio %*%
t(t(columna_de_pesos))
sqrt(varPortfolio)
}
vector_allReturns = apply(matrix_weights_columnWise,2,calculateReturn)
vector_allRisk = apply(matrix_weights_columnWise,2,calculateRisk)
vector_SharpeR = vector_allReturns/vector_allRisk
The current risk-return trade-off of Dow Jones is:
df_returns_all %>%
filter(allDates >= '2021-01-31' & allDates <= '2022-12-31') %>%
select(totalIndexPrice) %>%
summarise(average_return_index=mean(totalIndexPrice),
st.dev.index=sd(totalIndexPrice),
currentSharpe = mean(totalIndexPrice)/sd(totalIndexPrice)
)
Let’s view those portfolios with: Highest Sharpe Ratio
df_risk_and_return=data.frame(vector_allReturns,vector_allRisk, vector_SharpeR)
df_risk_and_return$portfolioNumber = 1:10000 #giving portfolio numbers
# 5 Portfolios with Highest Sharpe Ratio
df_risk_and_return %>%
arrange(desc(vector_SharpeR)) %>%
head(5)
library(scales)
df_forLinePlot=data.frame(allDates=df_stockPricesAll %>% filter(allDates>= '2020-12-31') %>% pull(allDates),
`DJIA`=fn_moneyInvested_inTime(weight_vector = index_initialWeights_Dec2020) ,
`DJIA Low Risk and High Return`= fn_moneyInvested_inTime(weight_vector =matrix_weights_columnWise[,portafolio_similarDJIA_butBetter]),
`Highest Return`= fn_moneyInvested_inTime(weight_vector =matrix_weights_columnWise[,portafolio_highestReturn]),
`Lowest Risk`= fn_moneyInvested_inTime(weight_vector =matrix_weights_columnWise[,portafolio_lowestRisk]),
`Highest Sharpe`= fn_moneyInvested_inTime(weight_vector =matrix_weights_columnWise[,portafolio_highestSharpe])
) %>% tidyr::pivot_longer(cols=2:6)
df_forLinePlot_v2=df_forLinePlot %>% filter(allDates=='2023-09-30')
df_forLinePlot%>%
ggplot()+
geom_line(aes(x=allDates,y=value,color=name,linetype=name))+
geom_text(data=df_forLinePlot_v2,aes(x=allDates,y=value+7000,
label=sprintf("$%s", formatC(round(value,0), format = "d", big.mark = ","))),size=2.5)+
labs(x='Time',y='Portfolio Value', title='Investing $100,000 in Rebalanced Dow Jones Index')+
scale_y_continuous(labels=scales::dollar_format())+
scale_x_date(labels = scales::date_format("%b %Y"),breaks = '3 month')+
theme(
legend.position = "bottom",
legend.key.size = unit(0.5, "cm"), # Adjust the size of the legend keys
legend.text = element_text(size = 8), # Adjust the size of legend text
legend.title = element_blank(),
axis.text.x = element_text(angle = 45, hjust = 1,size = 8)
)
We can notice that if we had started with $100,000 in Dec 2020, and invested them in a similar risky portfolio but with a higher return, we would have $137,815 rather than $109,510 if we had invested in the DJIA. Let’s remember that we are considering the same 30 stocks as the DJIA, we are just assigning different weights on how to distribute our money.
There are clearly portfolios than in the same time period would have netted us a negative 26 thousand dollars, or others with higher volatility but a higher return as the top 2 investment vehicles in the chart above. At the end, we are looking to have the highest return with the lowest risk, but this is easier said than done. There are many more ways that we can combine those 30 stocks and do worse than the index, than the ways we can combine them and do better. The reality is that the lower the risk, the lower the return and vice-versa.
Finally, I think it would be great visualizing how these portfolios look in our circle diagram:
#equally weighed portfolio: all stocks are given the same weight (1/30)
df_equalWeight1=df_init %>% mutate(eqWeightedLength = (1/30) * l1,
eqWeighedHeight = (1/30) * h1) %>%
summarise(weightedLenght= sum(eqWeightedLength),
weightedHeight = sum(eqWeighedHeight))
#arbitrarily providing more weight to bottom-right quadrant (60% weight)
myWeights = c(
#anything else
rep(1,21),18,
#bottom right quadrant
10,10,10,10,10,5,5,1)/100
anotherDf=data.frame(name=symbols,value=myWeights)
fn_visualizePortfolioInCircle=function(theWeights){
weighedLength1 = df_init$l1 * theWeights
weighedHeight1 = df_init$h1 * theWeights
c(sum(weighedLength1),sum(weighedHeight1))
}
df_init %>% ggplot()+
geom_text(aes(l1,h1,label=s1))+
geom_point(aes(df_equalWeight1$weightedLenght,df_equalWeight1$weightedHeight),color='tan',size=3)+
geom_text(aes(df_equalWeight1$weightedLenght,df_equalWeight1$weightedHeight + 0.1,label='equally weighted (1/30)'), color ='tan')+
#DJIA
geom_point(aes(fn_visualizePortfolioInCircle(index_initialWeights_Dec2020)[1],fn_visualizePortfolioInCircle(index_initialWeights_Dec2020)[2]),color='black',size=3)+
geom_text(aes(fn_visualizePortfolioInCircle(index_initialWeights_Dec2020)[1],fn_visualizePortfolioInCircle(index_initialWeights_Dec2020)[2]+0.1,label='DJIA'), color='black')+
#Similar DJIA but Higher Return
geom_point(aes(fn_visualizePortfolioInCircle(matrix_weights_columnWise[,portafolio_similarDJIA_butBetter])[1],
fn_visualizePortfolioInCircle(matrix_weights_columnWise[,portafolio_similarDJIA_butBetter])[2]),color='navy',size=3)+
geom_text(aes(
fn_visualizePortfolioInCircle(matrix_weights_columnWise[,portafolio_similarDJIA_butBetter])[1],
fn_visualizePortfolioInCircle(matrix_weights_columnWise[,portafolio_similarDJIA_butBetter])[2]+0.1,
label='DJIA Equal Risk and Higher Return'), color='navy')+
#Highest Return
geom_point(aes(fn_visualizePortfolioInCircle(matrix_weights_columnWise[,portafolio_highestReturn])[1],
fn_visualizePortfolioInCircle(matrix_weights_columnWise[,portafolio_highestReturn])[2]),color='red',size=3)+
geom_text(aes(
fn_visualizePortfolioInCircle(matrix_weights_columnWise[,portafolio_highestReturn])[1],
fn_visualizePortfolioInCircle(matrix_weights_columnWise[,portafolio_highestReturn])[2]+0.1,
label='Highest Return'), color='red')+
#Highest Sharpe
geom_point(aes(fn_visualizePortfolioInCircle(matrix_weights_columnWise[,portafolio_highestSharpe])[1],
fn_visualizePortfolioInCircle(matrix_weights_columnWise[,portafolio_highestSharpe])[2]),color='steelblue3',size=3)+
geom_text(aes(
fn_visualizePortfolioInCircle(matrix_weights_columnWise[,portafolio_highestSharpe])[1],
fn_visualizePortfolioInCircle(matrix_weights_columnWise[,portafolio_highestSharpe])[2]+0.1,
label='Highest Sharpe'), color='steelblue3')+
theme(
axis.text = element_blank(),
axis.ticks = element_blank(),
axis.title = element_blank()
)
This graph shows that the highest return is actually putting
all of our money on CVX (Chevron Corporation), with a high risk no
doubt.
This is an attempt to demonstrate the risk-tradeoff with return and how it could potentially be used to study Portfolio Optimization. (Note: Weight of DJIA as of Dec 2020 to be comparable with chart before)