In this project, we will perform a Monte Carlo Simulation of Stock Market Returns. A 120-day simulation of random stock market movements will be computed. Extensive statistical analysis will be performed. Using the Monte Carlo Simulation, we will be able to find answers to the following questions: What returns can we expect to make on the stock market in 120 days? What is the probability of making more than a 12% return in 120 days? What is the probability of losing more than 20% of an investment in 120 days?
The Monte Carlo Simulation takes its name from the famous gambling city of Monaco. The method is a very versatile tool with countless applications, ranging from finance or mathematics to quantum physics. It has many types and varieties but in every case, the method relies on randomness. In mathematics, the number pi, or any complex integral can be easily estimated using MC. In quantum physics, it is used to simulate unusual movements and reactions that are very hard to predict. Every type of Monte Carlo Simulation has the following steps implemented:
The Stock Market is a very interesting application of MC. Because of its complexity, daily moves on the stock market are impossible to predict - from our perspective they are random. Based on a probability distribution we will generate random returns for each day. These returns will be applied to the starting price to receive a price action graph. We will be then able to run statistical analysis on the data and answer our questions.
We will simulate returns the SPY, an SP500 ETF, for 120 days in the future. I have downloaded 10 years of historical SPY daily returns using the tidyquant library. The price data is sourced from Yahoo Fianance.
getSymbols("SPY", from = '2011-09-08', to = "2021-09-08", auto.assign = TRUE, warnings = FALSE)
## [1] "SPY"
daily_mean <- mean(dailyReturn(SPY)) #dailyReturn function from the tidyquant library
daily_std_dev <- sd(dailyReturn(SPY))
For this simulation, we will assume that stock market returns are normally distributed (in reality, they are almost normally distributed). I have calculated the daily returns, the mean daily return, and the standard deviation of the returns. We will use the mean and the standard deviation to generate normally distributed random variables. We don’t know the real mean and standard deviation, but we assume that the empirical figures from the last 10 years are good enough for our needs.
Let’s define the number of days to be simulated. We will simulate SPY for 120 days in the future. The starting price is taken from the last element of the historical price data.
no_of_days <- 120 # Set variable to 120 days
starting_price <- last(SPY$SPY.Close)[[1]] #Closing price from 8.09.2021 - last item in our frame
In the following code block, we generate the daily return using the normal distribution with the mean and standard deviation that was calculated earlier. We are left with a list of daily percentage returns, so to calculate prices we calculate the cumulative product of the returns list and the starting price. The result is a list of the simulated price action of 120 days in the future.
set.seed(101) #Set seed for reproducibility of the random numbers
returns <- 1+rnorm(no_of_days, mean=daily_mean, sd=daily_std_dev) #Generate random variables
prices <- cumprod(c(starting_price, returns)) #Calculate cumulative product
plot(prices, type='l', ylab="Simulated price of SPY", xlab="Days")
On the plot we can see the price chart of our SPY simulation. It looks very much like a real stock chart.
Running only one simulation is not a very safe approach. We know from the Law of Large Numbers that the results obtained from a large number of trials will be closer to the expected value. The more simulations we run, the more we can rely on the data. So, we will run 1001 simulations of 120 days each.
We run a for loop 1001 times, where each iteration a new 120 day simulation is performed. We save all of our data in matrices instead of lists, in order to store our two-dimensional data. Next, a plot of 50 first simulations is shown. A plot of 1001 lines would be unreadable.
no_of_sims <- 1001
returns_list <- matrix(0, nrow = no_of_sims, ncol = no_of_days) #define matrices
prices_list <- matrix(0, nrow = no_of_sims, ncol = no_of_days+1)
#Note: returns_list and prices_list are actually matrices, I just chose a poor name
for(i in 1:no_of_sims) { # for loop - 1001 iterations
returns_list[i,] <- rnorm(no_of_days, mean=daily_mean, sd=daily_std_dev) #Generate random variables
prices_list[i,] <- cumprod(c(starting_price, 1+returns_list[i,]))#Calculate cumulative product
}
#Drawing a plot of 50 first simulations
plot(prices_list[1,], type='l', ylab="Simulated price of SPY", xlab="Days", ylim=c(380, 620))
for(i in 1:50) {
lines(prices_list[i, ], type = 'l', col=i)
}
As we can see, the results of different simulations vary greatly. Some end up below the starting point and some increase very rapidly to over $600. We can also notice, just by looking at the graphs, that most simulations tend to end up with a positive return. As this is only 50 of 1001 simulations we can only assume that’s the case for all simulations.
We have calculated all one thousand simulations and now have all the necessary price data data. Let’s calculate the total returns:
total_returns <- array(NA, dim= no_of_sims, dimnames=NULL)
for (i in 1:no_of_sims) {
total_returns[i] <- (prices_list[i, 121]-prices_list[i, 1])/prices_list[i,1] #calculate total % return for each 120 day simulation
}
For each simulation a total cumulative return is calculated. We store all returns in an array of size 1001.
Now we have calculated the total return data we can start with descriptive statistics:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.300685 -0.008446 0.073283 0.074826 0.147201 0.496959
## 98th percentile: 0.357299
## Standard Deviation: 0.1220927
## Variance: 0.01490663
The above data show us that the average simulation ends up profitable. Moreover, the first quantile is almost zero, so we can say that in total a little bit over 25% of simulations results in a negative return.
Let’s visualize our data. A histogram, boxplot, and density plot are drawn:
The boxplot shows the mean, interquartile range, and the max and min values. Outliers are much more prominent on the positive side of the boxplot, with only one outlier on the negative side. As is shown on the histogram and density chart, the returns are skewed to the right. We can confirm it by calculating skewness:
skewness(total_returns)
## [1] 0.269092
Skewness is positive, confirming that the results are skewed to the right of the y axis, and therefore the mean is larger than the median result. We can confirm that by looking at the median and mean, which were calculated earlier.
Let’s show a price chart that is easier to read. Only three simulations will be included: the median, the min. and the max. We gather the indexes of the simulations in the matrix, so we can show them on graphs. For this we need the number of simulations to be odd, hence the 1001 simulations. If the number of simulations is even, we cannot show the median simulation on the graph.
max <- which.max(total_returns) # Find index of maximum value
min <- which.min(total_returns) # Find index of minimum value
if (no_of_sims %% 2 != 0) {med <- match(median(total_returns), total_returns)} # Find index of median value, if the median result is not an average of two results.
# Draw plot:
plot(prices_list[min,], type='l', ylab="Simulated price of SPY", xlab="Days", ylim=c(300, 700), col="red")
if (no_of_sims %% 2 != 0) {lines(prices_list[med, ], type = 'l', col='yellow')}
lines(prices_list[max, ], type = 'l', col='green')
We can now easily visualize the performance of the best, worst and median simulation. We can understand it a as kind of an ‘interval’ (not a confidence interval!) of where our real investment might exist.
To answer our first question, based on the graphs, and on the data, we can expect a return ranging from -30.07% to 49.7%.
We have already answered the first question earlier. The last questions will be answered by pure Monte Carlo: we will just count results that meet our requirements and use them to calculate the probability of it occurring.
condition1 <- sum(total_returns > 0.12) #Count results higher than 12%
condition2 <- sum(total_returns < -0.2) #Count results lower than 20%
probability1 <- condition1/no_of_sims #Calculate probability
probability2 <- condition2/no_of_sims
#Providing the answer:
cat("Probability of making more than 12% of return is ", probability1*100, "%", ", probability of losing more than 20% is ", probability2*100, "%")
## Probability of making more than 12% of return is 33.86613 % , probability of losing more than 20% is 0.5994006 %
As we can see the probability of gaining more than 12% is 33.87%, while the probability of losing more than 20% is only 0.599%. These are good odds for an investor.
In this project, we have defined the distribution and its parameters of the random variables. We have then generated the random variables from the natural distribution with parameters derived from empirical data. Lastly, we have performed the deterministic computation and extensively analyzed its results. This study has provided us with many insights about the stock market that would be very hard to compute otherwise. The probabilities, for example, that we calculated earlier would be almost impossible to calculate using any other methods. This project illustrates, that the Monte Carlo Simulation has a practical application by investors, hedge funds, and other financial institutions.