Monte Carlo Simulation

A Monte Carlo simulation is a numerical method that uses repeated random sampling to solve problems, including problems that are deterministic in nature. In finance, it is a useful tool for estimating risk under high uncertainty. Simulations can also be used to price an equity from many randomly generated price paths. As the number of simulated (pseudo-)random draws grows, the results tend to get closer to the true value, as the Law of Large Numbers predicts.
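The convergence described above can be illustrated with a minimal sketch. It assumes draws from \(U[0,1]\), whose true mean is 0.5, and tracks the running mean as more draws are added. The names `draws`, `running_mean`, and `checkpoints` are chosen here for illustration only.

```r
set.seed(123)

# Draw from U[0,1]; the true mean is 0.5
draws <- runif(100000)

# Running mean after each additional draw
running_mean <- cumsum(draws) / seq_along(draws)

# Estimate and absolute error at a few sample sizes
checkpoints <- c(10, 100, 1000, 10000, 100000)
convergence <- data.frame(
  draws = checkpoints,
  estimate = running_mean[checkpoints],
  abs_error = abs(running_mean[checkpoints] - 0.5)
)
print(convergence)
```

The error does not have to shrink at every step, but it should be small at the largest sample sizes.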

The Experiment

This is a simple experiment based on the Lecture Slides. We assume the random variables are independent and identically distributed, and consider two distributions: a uniform distribution \(U[0,1]\) and a log-normal distribution \(Y = \exp(X)\), where \(X \sim N(0,1)\). We draw \(N = 100000\) values from each distribution and later use \(n\) of these observations, which I will set at 250, for the plots. The value of \(n\) can be changed to any value so long as \(n \le N\).

set.seed(123)

#Number of simulations
N <- 100000

#Generating N random variables from U[0,1]
uniform <- runif(N, min = 0, max = 1)

#Generating N random variables from N(0,1) i.e. standard normal
normal <- rnorm(N, mean = 0, sd = 1)

#Transforming the standard normal rvs to log-normal
lognormal <- exp(normal)

#Let's see if this works. Printing the first few values
print(head(uniform))
## [1] 0.2875775 0.7883051 0.4089769 0.8830174 0.9404673 0.0455565
print(head(lognormal))
## [1] 1.2963981 2.5030518 0.4856251 0.4456221 0.8681836 9.5545115

The printed values look as expected: the uniform draws lie in \([0,1]\) and the log-normal draws are all positive. Because of set.seed(123), the same values are reproduced on every run.
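Beyond looking at the first few values, a quick check is to compare the sample means with their theoretical values, \(E[U] = 1/2\) and \(E[\exp(X)] = \exp(1/2)\) for \(X \sim N(0,1)\). The sketch below regenerates the data so that it runs on its own; `theory`, `estimate`, and `std_error` are illustrative names, and the standard error is the usual sample standard deviation divided by \(\sqrt{N}\).

```r
set.seed(123)
N <- 100000

# Same distributions as in the experiment
uniform <- runif(N, min = 0, max = 1)
lognormal <- exp(rnorm(N, mean = 0, sd = 1))

# Theoretical means of U[0,1] and exp(X) with X ~ N(0,1)
theory <- c(uniform = 0.5, lognormal = exp(0.5))

# Monte Carlo estimates and their standard errors
estimate <- c(uniform = mean(uniform), lognormal = mean(lognormal))
std_error <- c(uniform = sd(uniform), lognormal = sd(lognormal)) / sqrt(N)

print(cbind(theory, estimate, std_error))
```

If the simulation behaves as intended, each estimate should lie within a few standard errors of its theoretical value.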

Adding Plots

We have not yet used the \(n\) observations, so we select them now in order to plot the simulation. We start with a histogram containing both distributions:

#Loading the libraries
library(ggplot2)

#For graphing and performance purposes, we plot n = 250 observations; n can be changed as long as n <= N
n <- 250

#Take a simple random subset of each sample, so the subset keeps the shape of its distribution
uniform_subset <- sample(uniform, n)
lognormal_subset <- sample(lognormal, n)

#Create a data frame for each distribution
uniform_data <- data.frame(value = uniform_subset, distribution = "Uniform")
lognormal_data <- data.frame(value = lognormal_subset, distribution = "Log-normal")

#Combine the data frames
combined_data <- rbind(uniform_data, lognormal_data)

#Create the histogram with a density line
ggplot(combined_data, aes(x = value, fill = distribution)) +
  geom_histogram(aes(y = after_stat(density)), bins = 25, alpha = 0.5, position = "identity") +
  stat_function(fun = dnorm, args = list(mean = mean(combined_data$value), sd = sd(combined_data$value)), color = "black", linewidth = 0.1) +
  ggtitle("Histogram at n Iterations")
## Warning: Multiple drawing groups in `geom_function()`
## ℹ Did you use the correct group, colour, or fill aesthetics?

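Here, the histogram visualizes the empirical distribution of the 250 observations taken from each of the \(N\) simulated samples. The black line is a normal density with the mean and standard deviation of the combined data, drawn for reference; it is not the standard normal density. In theory, \(U[0,1]\) is symmetric around 0.5, bounded between 0 and 1, and has a kurtosis of 1.8, lower than the normal distribution's 3, so a representative uniform sample should look flat over a narrow range. The log-normal distribution is strongly right-skewed with a long right tail and a much higher kurtosis than the normal distribution, so its histogram should pile up near small values and stretch far to the right.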
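The shape statements above can be checked numerically. The following is a minimal sketch that estimates skewness and (non-excess) kurtosis from sample moments. The helper functions `skewness` and `kurtosis` are written here for illustration and are not taken from any package, and the data are regenerated so the block runs on its own.

```r
set.seed(123)
n_draws <- 100000
u <- runif(n_draws)   # U[0,1]
x <- rnorm(n_draws)   # N(0,1)
y <- exp(x)           # log-normal

# Illustrative moment-based helpers (not from a package)
skewness <- function(v) mean((v - mean(v))^3) / mean((v - mean(v))^2)^1.5
kurtosis <- function(v) mean((v - mean(v))^4) / mean((v - mean(v))^2)^2

shape <- rbind(
  uniform   = c(skewness = skewness(u), kurtosis = kurtosis(u)),
  normal    = c(skewness = skewness(x), kurtosis = kurtosis(x)),
  lognormal = c(skewness = skewness(y), kurtosis = kurtosis(y))
)
print(round(shape, 3))
```

The uniform and normal rows should land near their theoretical values (skewness 0, kurtosis 1.8 and 3). The log-normal estimates are dominated by a few extreme draws, so they can vary noticeably between seeds.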

To visually assess how the simulated data compare with a theoretical distribution, a QQ plot can be created for each sample. Note that stat_qq() compares against the standard normal distribution by default. Keeping the same number of observations, the QQ plots of both samples are as follows:

#QQ plot (Uniform)
ggplot() +
  stat_qq(data = data.frame(uniform_subset), aes(sample = uniform_subset)) +
  stat_qq_line(data = data.frame(uniform_subset), aes(sample = uniform_subset), color = "blue") +
  ggtitle("QQ Plot (Uniform Distribution)")

#QQ plot (Log-normal)
ggplot() +
  stat_qq(data = data.frame(lognormal_subset), aes(sample = lognormal_subset)) +
  stat_qq_line(data = data.frame(lognormal_subset), aes(sample = lognormal_subset), color = "blue") +
  ggtitle("QQ Plot (Log-normal Distribution)")

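Because stat_qq() uses standard normal theoretical quantiles by default, these plots compare each sample with a normal distribution rather than with its own theoretical distribution. The blue line drawn by stat_qq_line() passes through the first and third quartiles of the data; it is not a 45° line. The uniform sample should bend away from the line at both ends, because \(U[0,1]\) is bounded and has lighter tails than the normal distribution. The log-normal sample should depart sharply from the line in the upper tail, which reflects its right skew and heavy right tail. A symmetric distribution such as the t-distribution would not capture this skew; by construction, taking the logarithm of the log-normal sample recovers a normal sample.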
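To compare each sample with its own theoretical distribution instead, the theoretical quantiles can come from `qunif` and `qlnorm`. The sketch below does this in base R to stay dependency-free, with regenerated data and illustrative names (`qq_uniform`, `qq_lognormal`, `cor_*`). In ggplot2, the same idea should be available through the `distribution` argument of stat_qq() and stat_qq_line().

```r
set.seed(123)
n <- 250
u <- runif(n)        # U[0,1] sample
y <- exp(rnorm(n))   # log-normal sample

# Plotting positions and the matching theoretical quantiles
p <- ppoints(n)
qq_uniform <- data.frame(theoretical = qunif(p), sample = sort(u))
qq_lognormal <- data.frame(theoretical = qlnorm(p, meanlog = 0, sdlog = 1),
                           sample = sort(y))

# A correlation close to 1 means the points lie near a straight line
cor_uniform <- cor(qq_uniform$theoretical, qq_uniform$sample)
cor_lognormal <- cor(qq_lognormal$theoretical, qq_lognormal$sample)
# For contrast: the log-normal sample against normal quantiles
cor_vs_normal <- cor(qnorm(p), sort(y))
print(c(uniform = cor_uniform, lognormal = cor_lognormal, lognormal_vs_normal = cor_vs_normal))

# QQ plots with a true 45-degree reference line (y = x)
plot(qq_uniform$theoretical, qq_uniform$sample,
     xlab = "Theoretical U[0,1] quantiles", ylab = "Sample quantiles",
     main = "QQ Plot (Uniform vs U[0,1])")
abline(0, 1, col = "blue")
plot(qq_lognormal$theoretical, qq_lognormal$sample,
     xlab = "Theoretical log-normal quantiles", ylab = "Sample quantiles",
     main = "QQ Plot (Log-normal vs log-normal)")
abline(0, 1, col = "blue")
```

Here the 45° line is meaningful, because both axes are on the same scale. Points in the far right tail of the log-normal plot can still scatter, since extreme quantiles are estimated from very few observations.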

Conclusion

This concludes the simple experiment! Thank you for going through my first public project involving R. Feedback is always appreciated, as it will help me understand R better and practice more.