Background : If you’ve looked into investing at all you’ve probably heard about the historical returns of the S&P 500. Reliable news sources have reported them at anywhere from 8% to 11%. That’s quite a range so what’s the real answer?
Goal : Let’s calculate the historical returns for ourselves. We will look at a simple average return as well as run some Monte Carlo simulations.
Let’s start by clearing the environment and loading the packages required.
# Optional memory clear
rm(list=ls())
# Disable Scientific Notation in printing
options(scipen=999)
# Load libraries
require(tidyverse)
require(quantmod)
require(lubridate)
require(scales)
require(kableExtra)
The Quantmod package allows us to read in historical pricing data for stocks, funds and indexes. We will use it to load historical S&P500 prices and for comparison we will also load the Dow Jones Industrial average as well as 10 Year Treasury yields.
# Load data using the QuantMod package to retrive historical data
# The stock indexes are best loaded from yahoo finance
getSymbols(c("^DJI", "^GSPC"), src='yahoo', from = '1900-01-01',
to = Sys.Date(), warnings = TRUE)
# The 10 Year treasury yields can be loaded from FRED, the
# Federal Reserve Economic Data service of the St. Louis Fed
# It has thousands of data sources
# check it out at https://fred.stlouisfed.org/
getSymbols(c("DGS10"), src='FRED', from = '1900-01-01',
to = Sys.Date(), warnings = TRUE)
# THe Quantmod package returns time series objects so we need to
# convert them to data frames that show prices and the aggregated
# year over year returns.
# First the S&P500
SP500YearReturns <- data.frame(GSPC) %>%
# Because it's a time series the date has to be extracted from the row name
mutate(QuoteDate = as.Date(row.names(.), "%Y-%m-%d"),
# Only the closing prices is relevant for our analysis
CloseValue = GSPC.Close,
# Extract the beginning date of the year
Year = floor_date(QuoteDate, unit = "year")) %>%
# Group by the year
group_by(Year) %>%
# 12/31 is not always the last day the markets are open in a given year
# this extracts the last day prices are available for each year
mutate(MaxYearDate = max(QuoteDate)) %>%
ungroup() %>%
# Now the data frame is filtered to one row for each year
# with the end of year closing price
filter(QuoteDate == MaxYearDate) %>%
# Using the lead/lag function the value in each row can be compared to n
# rows above or below. Using a lag of 1 the year over year return is calculated
mutate(Return = CloseValue / lag(CloseValue, n = 1, order_by = Year) - 1) %>%
dplyr::select(Year, Return) %>%
filter(!is.na(Return))
# Repeat for the Dow Jones Industrial
DOWYearReturns <- data.frame(DJI) %>%
mutate(QuoteDate = as.Date(row.names(.), "%Y-%m-%d"),
CloseValue = DJI.Close,
Year = floor_date(QuoteDate, unit = "year")) %>%
group_by(Year) %>%
mutate(MaxYearDate = max(QuoteDate)) %>%
ungroup() %>%
filter(QuoteDate == MaxYearDate) %>%
mutate(Return = CloseValue / lag(CloseValue, n = 1, order_by = Year) - 1) %>%
dplyr::select(Year, Return) %>%
filter(!is.na(Return))
# And finally the 10 year treasury which is a bit different because the returned data
# already has the yield (return) so in that case we will just return the mean yield
# for each year
TR10YearReturns <- data.frame(DGS10) %>%
mutate(QuoteDate = as.Date(row.names(.), "%Y-%m-%d"),
Return = DGS10 / 100,
Year = floor_date(QuoteDate, unit = "year")) %>%
group_by(Year) %>%
summarise(Return = mean(Return, na.rm = TRUE))
The results are all in the same format as this 5 record sample of the SP500Year data frame:
Year | Return |
---|---|
1928-01-01 | 37.9% |
1929-01-01 | -11.9% |
1930-01-01 | -28.5% |
1931-01-01 | -47.1% |
1932-01-01 | -14.8% |
We could chart the returns or prices by year but those charts are available every day on Yahoo, Marketwatch or FRED.
To demonstrate how often an index provides a particular rate of return, a histogram is much more useful.
SP500YearReturns %>%
ggplot() +
geom_histogram(aes(Return), stat = "bin", bins = 20,
fill = "blue", color = "white") +
geom_vline(aes(xintercept = median(Return)), colour="black", size = 3) +
scale_x_continuous(labels = percent_format()) +
theme_classic() +
geom_label(aes(median(Return), 0, label = percent(median(Return),
accuracy = .1) )) +
labs(
title = paste0("Histogram : SP500 Returns Since ", year(min(SP500YearReturns$Year))),
subtitle = "Annual Returns EOY to EOY - Median Return Highlighted",
x = "Return Rate",
y = "# Years")
So from this data we have calculated that the S&P500 has produced a median return of 10.5% from 1928 through 2019. This excludes any returns from dividend payments which average an additional 2% for the S&P500.
It’s interesting to note that the very worst years are slightly more unfavorable than the very best years. The median is carried upward to 10.5% by the preponderance of those years where returns are between 10% and 25%.
DOWYearReturns %>%
ggplot() +
geom_histogram(aes(Return), stat = "bin", bins = 20,
fill = "dark grey", color = "white") +
geom_vline(aes(xintercept = median(Return)), colour="red", size = 3) +
scale_x_continuous(labels = percent_format()) +
theme_classic() +
geom_label(aes(median(Return), 0, label = percent(median(Return),
accuracy = .1) )) +
labs(
title = paste0("Histogram : DJI Returns Since ", year(min(DOWYearReturns$Year))),
subtitle = "Annual Returns EOY to EOY - Median Return Highlighted",
x = "Return Rate",
y = "# Years")
The Dow Jones Industrials have produced a median return of 11% from 1986 through 2019. This excludes any returns from dividend payments which average an additional 2%.
It is disappointing that the historical data available doesn’t go back further through any of the sources available via the Quantmod package.
TR10YearReturns %>%
mutate(Timeframe = ifelse(Year <= as.Date("1990-01-01") &
Year >= as.Date("1975-01-01"),
"1975-1990", "Other")) %>%
ggplot() +
geom_histogram(aes(Return, fill = Timeframe), stat = "bin", bins = 20,
color = "white") +
scale_fill_manual(values = c("green", "dark grey")) +
geom_vline(aes(xintercept = median(Return)), colour="dark green", size = 3) +
scale_x_continuous(labels = percent_format()) +
theme_classic() +
theme(legend.position="bottom") +
geom_label(aes(median(Return), 0, label = percent(median(Return),
accuracy = .1) )) +
labs(
title = paste0("Histogram : 10 Year Treasury Returns Since ", year(min(TR10YearReturns$Year))),
subtitle = "Annual Returns EOY to EOY - Median Return Highlighted",
x = "Return Rate",
y = "# Years")
10 Year Treasury notes have produced a median return of 5.9% from 1962 through 2019.
It’s also interesting to note that most of the higher yields haven’t repeated in over 30 years. For those reasons we are going to exclude yields prior to 1990 from this point on.
For the next step in our analysis we will do a Monte Carlo simulation on each of these indexes. There are many types of Monte Carlo simulations but ours will be simple as follows:
First we are going to add a numerical index to our historical return data. The index will make it easier to reference a given value when we randomly sample them.
SP500YearReturns <- SP500YearReturns %>%
mutate(Return = round(Return, 3),
Index = row_number()) %>%
dplyr::select(Index, Return)
DOWYearReturns <- DOWYearReturns %>%
mutate(Return = round(Return, 3),
Index = row_number()) %>%
dplyr::select(Index, Return)
# Filter the higher yields from the high inflation era of the 70's and 80's
TR10YearReturns <- TR10YearReturns %>%
filter(Year > as.Date("1990-01-01")) %>%
mutate(Return = round(Return, 3),
Index = row_number()) %>%
dplyr::select(Index, Return)
The results are all in the same format as this 5 record sample of the SP500Year data frame:
Index | Return |
---|---|
1 | 37.9% |
2 | -11.9% |
3 | -28.5% |
4 | -47.1% |
5 | -14.8% |
Now we will set-up and run the Monte Carlo simulations.
# Set number of simulations to run
Simulations <- 500
# Set the number of years to run each simulation for
SimulationYears <- 30
# Using the simulation value and years, sample the actual unique
# returns at random and create a vector of those random returns
# the set.seed prior to each sampling ensures that these exact results
# will be repeated each time the code is ran
set.seed(45239)
SP500ReturnIndex <- sample.int(nrow(SP500YearReturns), (SimulationYears * Simulations),
replace = TRUE)
DOWReturnIndex <- sample.int(nrow(DOWYearReturns), (SimulationYears * Simulations),
replace = TRUE)
TR10ReturnIndex <- sample.int(nrow(TR10YearReturns), (SimulationYears * Simulations),
replace = TRUE)
# Using the crossing function we create a cross join of
# 30 years for each of 500 simulations or 30 * 500 = 15,000 rows
SP500YearMC <- crossing(data.frame(
Simulation = as.numeric(seq(1:Simulations))),
Year = as.numeric(seq(1:SimulationYears))) %>%
mutate(Index = SP500ReturnIndex) %>%
left_join(SP500YearReturns, by=("Index" = "Index"))
DOWYearMC <- crossing(data.frame(
Simulation = as.numeric(seq(1:Simulations))),
Year = as.numeric(seq(1:SimulationYears))) %>%
mutate(Index = DOWReturnIndex) %>%
left_join(DOWYearReturns, by=("Index" = "Index"))
TR10YearMC <- crossing(data.frame(
Simulation = as.numeric(seq(1:Simulations))),
Year = as.numeric(seq(1:SimulationYears))) %>%
mutate(Index = TR10ReturnIndex) %>%
left_join(TR10YearReturns, by=("Index" = "Index"))
# Create a custom function to calculate a vector of
# return balances given a start value and a vector of returns
# also we will add in average dividends and assume that they
# are reinvested; otherwise the stock index performances will be
# understated since the price doesn't include dividends
ReturnBalance <- function(Start, Returns, AvgDividend) {
Balance <- rep(0, length(Returns))
for (i in 1:length(Returns)) {
if (i==1) {
Balance[i] = Start
} else {
Balance[i] = Balance[i-1]
}
Balance[i] <- Balance[i] * (1 + (Returns[i] + AvgDividend))
}
Return <- Balance
}
# Add a Result field based on a $10000 initial investment for each index
# Also included is the historical average dividend
SP500YearMC <- SP500YearMC %>%
group_by(Simulation) %>%
mutate(Result = ReturnBalance(10000, Return, .02)) %>%
ungroup()
DOWYearMC <- DOWYearMC %>%
group_by(Simulation) %>%
mutate(Result = ReturnBalance(10000, Return, .02)) %>%
ungroup()
TR10YearMC <- TR10YearMC %>%
group_by(Simulation) %>%
mutate(Result = ReturnBalance(10000, Return, 0)) %>%
ungroup()
The results are all in the same format as this 5 record sample of the SP500YearMC data frame:
Simulation | Year | Index | Return | Result |
---|---|---|---|---|
1 | 1 | 16 | 19.4% | $12,140.00 |
1 | 2 | 35 | -11.8% | $10,950.28 |
1 | 3 | 42 | -11.4% | $9,920.95 |
1 | 4 | 69 | 20.3% | $12,133.33 |
1 | 5 | 56 | 17.3% | $14,475.06 |
Now let’s plot the results of all 500 simulations over 30 years for the S&P
SP500YearMC %>%
ggplot() +
geom_line(aes(Year, Result, group = Simulation, color = Simulation)) +
scale_y_continuous(labels=dollar_format(prefix = "$")) +
scale_color_gradient(low='dark blue', high='light blue') +
theme_classic() +
labs(title = "S&P 500 Monte Carlo Simulation",
subtitle = "Return on $10K over 30 Years - 500 Simulations",
x = "Year", y = "Portfolio Value")
That plot is a little busy but you can see all 500 simulations at work! There are some extreme outliers where your $10K is worth $1.5M or more.
Let’s look at some box plots to see the quartiles. To keep the scale readable we will focus on the front end of the timeline.
# Create a function to render an annotated box plot
annotatedboxplot <- function(DataFrame, Title, BoxColor, DotColor, selectvalues) {
# selectvalues <- c(2,4,6,8,10)
mcboxplot <- DataFrame %>%
filter(Year %in% selectvalues) %>%
ggplot(aes(Year, Result, group = Year)) +
geom_boxplot(fill = BoxColor, outlier.shape = NA) +
geom_jitter(color = DotColor, width = 0.2) +
scale_x_continuous(breaks = selectvalues) +
scale_y_continuous(labels=dollar_format(prefix = "$")) +
theme_classic() +
labs(title = Title,
subtitle = "Return on $10K - 500 Simulations",
caption = paste0("Median Compound Average Growth Rate : ",
percent(
((median(
filter(DataFrame, Year == selectvalues[length(selectvalues)])$Result)
/
10000)^(1/selectvalues[length(selectvalues)])) -1
)
),
x = "Years Invested",
y = "Portfolio Value")
boxdata <- ggplot_build(mcboxplot)$data[[1]] %>%
dplyr::select(ymin:ymax, x) %>%
gather(type, value, - x) %>%
mutate(value = round(value)) %>%
arrange(x) %>%
filter(str_detect(type, "lower|middle|upper"))
shift <- mean(selectvalues[1:2]) - selectvalues[1]
mcboxplot +
annotate("label",
x = filter(boxdata, type!="middle")$x + shift,
y = filter(boxdata, type!="middle")$value,
label = dollar(filter(boxdata, type!="middle")$value), size = 3) +
annotate("label",
x = filter(boxdata, type=="middle")$x,
y = filter(boxdata, type=="middle")$value,
label = dollar(filter(boxdata, type=="middle")$value), size = 3, color="red")
}
annotatedboxplot(SP500YearMC, "S&P500 Monte Carlo Simulation", "light blue", "dark blue",
c(2,4,6,8,10))
Much more readable and this shows how important that investment time horizon is.
Based on this simulation, if you invest for 2 years or less there is about a 25% chance you could break-even or worse. If you need your money soon, you should probably invest it in something other than the stock market.
Here is a look at the outer years of that investment. (from this point on the code is hidden since we’ve already demonstrated how the plots are generated)
You might be tempted to say that these returns look pessimistic versus history but keep in mind that these are the results of 500 simulations and we’ve got less than 100 years of actual history.
Now the same plot for the Dow Jones Industrials whose returns initially appear to be more favorable:
However, keep in mind that the DJI only has retrievable history since 1985 so a direct comparison omits the great depression of the 30’s and stagflation of the 70’s. That makes it an interesting but not equivalent view.
Finally, 10 Year US Treasuries.
Running a Monte Carlo simulation on treasuries may be excessive since the yields tend to change more slowly than stock market returns but there still have been some fairly quick swings when the Fed takes drastic monetary action.
With potential yields in a tighter range you can see a much steadier albeit limited return on your investment. Also, don’t forget that yields prior to 1990 were excluded as previously noted.
For the treasury bonds we only looked at the historical yields. Therefore this view assumes a “hold until maturity” strategy where the bonds purchased would mature at the points shown. If you were to buy bonds and then sell them prior to maturity, the price could be higher or lower depending on how the market values the yield of the bonds you are selling.
Conclusions :
While the median historical return of the S&P500 is over 10%, the likely result of investing in it is going to be closer to 8%. Why is that true? Consider a simple scenario where the returns in 4 consecutive years are -5%, 15%, 15% and -5%. Both the mean and median return is 5% but the compound annual growth rate is only 4.5%. Every time you lose money it deflates your principal so the next round of growth starts from a lower point. Keep that in mind for your long term planning.
Intermediate term treasury bonds are of course a very safe investment. However, over a longer term like 10 years you are sacrificing significant upside while avoiding relatively little risk; the values at the 25th percentile are nearly identical!
I hope these visualizations helped you understand market risks a little bit better.
Invest wisely.