Fit a Geometric Brownian Motion in R

Beniamino Sartini

16/7/2022

Content

This article is a review the basics of the stochastic differential equations, in particular the geometric brownian motion. In the article is covered the mathematics behind and how to implement a function to fit such model in R. In the last part the function is applied to real stock data, in particular we consider Apple, Amazon and Tesla.

Geometric Brownian Motion

The SDE for a geometric brownian motion is given by:

\[\frac{dS_t}{S_t} = \mu dt + \sigma dW_t\] where:

  • \(\mu\) is the drift and \(\sigma\) the diffusion and are the parameters that need to be calibrated.

It is possible to solve the SDE in \([t,T]\) in order to obtain an expression of \(S_T\) in terms of \(S_t\):

\[S_T = S_t \, e^{(\mu-\frac{\sigma^2}{2})(T-t) + \sigma (W_T - W_t) }\] This equation hold in an interval of time given by \([t,T]\), in order to have a more fittable version we need to rewrite it in:

\[\frac{S_T}{S_t} = e^{(\mu-\frac{\sigma^2}{2})(T-t) + \sigma (W_T - W_t) } \, \, \, \Rightarrow \, \, \, ln\biggl(\frac{S_T}{S_t}\biggl) = (\mu-\frac{\sigma^2}{2})(T-t) + \sigma (W_T - W_t) \]

We have represented the solution of the SDE in terms of the log returns. In this way we can represent every \(S_t\) in this way:

\[\begin{cases} \\ S_0 = S_0 \\ \\ S_1 = S_0 \frac{S_1}{S_0} \\ \\ S_2 = S_0 \frac{S_1}{S_0} \frac{S_2}{S_1} \\ ... \\ S_t = S_0 \prod_{i = 1}^{t-1} \frac{S_i}{S_{i-1}} \\ \end{cases}\]

Pitfalls on Calibration

Given our variable of interest that is \(S_t\) we have to:

  • compute the dynamic of \(\frac{S_T}{S_t}\)
  • recover \(S_T\) in the final stage considering the product representation: \(S_T = S_0 \prod_{i = 1}^{T-1} \frac{S_i}{S_{i-1}}\)

Another problem is the simulation of the random noise given by the brownian motion. It is known that if we consider \(dW_t = W_t - W_s \sim N(0, t-s)\). Therefore we are able to recover it simulating a normal random variable. The problem is in practice since in order to simulate it we need to set a seed that determine a possible path of our random variable. In order to make it more clear, consider a random variable \(X: \Omega \rightarrow R\) if we can immagine to run a simulation on this random variable and stop the simulation after \(t\)-realizations every simulation will represent a different seed. Therefore we have that in our process there is a noise given by a realization of a random variable, therefore the optimization does not only consist in finding the best parameters, but also to find the right seed that is representative of the noise realizated in our stochastic process.

Implementation in R

The first function that need to be defined is a generic one called geometric_brownian. It takes as inputs a vector of parameters in the following order: \((\mu, \sigma)\). The output is given by a list containig the parameters and a function under the method fit that allow to fit a dynamic of a process. The fit function takes as parameters:

  • an initial point: S0 = 1
  • a time interval: N = 10
  • a time difference: dt = 1
  • a random seed: seed = 1
geometric_brownian <- function(params){
  
  mu = params[1]
  sigma = params[2]
  

  fitGBM <-  function(S0 = 1, N = 10, dt = 1,  seed = 1){
    
    set.seed(seed) 
    
    index = seq(dt, N, dt)
    
    dWt = rnorm(length(index), 0, dt)
    
    Rt =  exp( (mu - sigma^2/2)*dt + sigma*dWt )
    
    St = S0 * cumprod(Rt)
    
    tibble(
      t = index, 
      St = St
    )
  }
  
  structure(
    list(
      mu = params[1],
      sigma = params[2],
      fit = fitGBM

    )
  )
}

The second function, called optimize_geometric_brownian, is an optimizer to find the best parameters, this is done by minimizing the mean square error between the real values and the fitted ones.

The function takes as input:

  • a vector St ordered in this way \((S_0, S_1, ..., S_N)\).
  • a vector of starting parameters: params.
  • a random seed: seed = 1

The output of this function is a tibble object, similar to a dataframe but allow to nest more elements in a single row. In this way the output of this function will be a 1x5 tibble containing:

  • mu: the optimized drift \(\mu\).
  • sigma: the optimized diffusion \(\sigma\).
  • seed: the random seed in which the optimum was found.
  • sumMSE: the sum of the square of the errors, is the quantity that we minimize.
  • path: in this row is contained a 1x3 tibble with the real values and the fitted ones.

This tibble is the type of data called in the following part as df_optim.

optimize_geometric_brownian <- function(St = NULL, params = NULL, seed = 1){
  
  set.seed(seed) 
  
  loss_function <- function(params){
    
    mu = params[1]
    sigma = params[2]
    
    # sigma positive 
    if(sigma < 0 ){
      return(NA)
    }
    
    sum((St - geometric_brownian(params = params)$fit(S0 = St[1], N = length(St), dt = 1,  seed = seed)$St)^2)
    
  }
  
  optim_params = optim(params, loss_function)
  
  fitted_values = geometric_brownian(params = optim_params$par)$fit(S0 = St[1], N = length(St), dt = 1,  seed = seed)
  
  df_optim =tibble( t      = 1:length(St),
                    mu     = optim_params$par[1], 
                    sigma  = optim_params$par[2], 
                    sumMSE = optim_params$value,
                    MSE    = sqrt(sumMSE/length(St)),
                    seed   = seed,
                    real   = St,
                    fitted = fitted_values$St
                  )
  
  df_optim = group_by(df_optim, mu, sigma, sumMSE,MSE, seed) %>% nest(path = c("t", "real", "fitted") )
  
  df_optim$seed_used = list(seed)
    
  return(df_optim)
}

The third function have the purpose to find the right seed in which we can find the minimum sum of square of the errors. In order to do it we start from the first minimization done at seed 1 using the last function than, using the parameters found we can change the seed and perform again the optimization. If the sum of square is decreased we have found a new minimum and we start from there, if not we just change seed and repeat the procedure. This is done N times till the sum of square is under a certain threshold if specified, or till N is reached.

The inputs of the function calibrate_geometric_brownian are given by:

  • df_optim: dataset with the best parameters obtained with the function
  • N: the number of simulations
  • threshold: the square of errors under that the procedure stops.
  • verbose: if TRUE display some messages when it is runned such as when a minimum is find tells the seed, the improvement with respect to the last best and the actual MSE. Also every 50 iteration it display a status message counting the actual iteration with respect to the total in percentage.

The output of the function is a dataset of the kind df_optim.

calibrate_geometric_brownian <- function(df_optim = NULL, N = 1000, threshold = NULL, verbose = TRUE ){
  
  params = c(df_optim$mu, df_optim$sigma)
  
  St = df_optim$path[[1]]$real
  
  seed_used = df_optim$seed_used[[1]]
  
  dynamic_seed = c(max(seed_used) + 1)
  
  for(i in 2:N){
    
    dynamic_seed[i] = dynamic_seed[i-1] + 1
    
    set.seed(dynamic_seed[i])
    
    new_optim = optimize_geometric_brownian(St = St, params = params, seed = dynamic_seed[i])
    
    new_optim$sumMSE < df_optim$sumMSE
    
    if(new_optim$sumMSE < df_optim$sumMSE){
      
      if(verbose) message("New MINIMUM found with seed: ", dynamic_seed[i] , " MSE reduction: ", new_optim$sumMSE - df_optim$sumMSE, " New MSE: ", new_optim$MSE)
      
      df_optim = new_optim
      
    } else {
      
      if(verbose && (i %% 50 == 0)){message("Nothing found, iteration: ", i, "/", N, " (", round(i/N, 2)*100, " %)" )}
      
    }
    
    if(!is.null(threshold) && new_optim$sumMSE <= threshold){
      break
    } 
  }
  
  df_optim$seed_used = list(unique(c(seed_used, dynamic_seed )))
  
  return(df_optim)
}

Last function is the IterativeCalibration, takes as input a dataset df_optim and perform the optimization iteratively. We have note that the last procedure after 200 or 300 iteration starts to not give more minimums. In order to find more possible minimums we have implemented this function that:

  • n.iterations = 1 is the starting point: from the first simulation’s parameters the best parameters are calibrated using the function calibrate_geometric_brownian for N-times.
  • n.iterations = 2 the parameters obtained in the last iteration are used as starting parameters and the seed is changed again, new parameters are calibrated using the function calibrate_geometric_brownian again and so on till the last n.iterations.

In this way we will obtain multiple results in the form of a df_optim with n.iterations rows and 5 columns.

IterativeCalibration <- function(df_optim = NULL, n.iterations = 10, N = 300, threshold = 100000, verbose = TRUE){
  
  iterations = list()
  
  iterations[[1]] = calibrate_geometric_brownian(df_optim, N = N, threshold = threshold, verbose = verbose )
  
  St = df_optim$path[[1]]$real
  
  if(n.iterations > 1){
    
     for(i in 2:n.iterations){
    
    last_seed = max(iterations[[i-1]]$seed_used[[1]])
    
    params = c(iterations[[i-1]]$mu, iterations[[i-1]]$sigma)
    
    new_df_optim = optimize_geometric_brownian(St = St, params = params, seed = last_seed + i) 
    
    iterations[[i]] = calibrate_geometric_brownian(new_df_optim, N = N, threshold = threshold, verbose = verbose )
     }
  }
 
  iterations = bind_rows(iterations)
  iterations$simulations = 1:n.iterations
  
  return(iterations)
  
}

Example Using Bitcoin Prices

Using cryptocompare.com we have recovered the hourly prices of Bitcoin in the last 16 days. The day of writing is the 16/07/2022 and the starting date is 01/07/2022.

Xt = BTC_hourly_price$Adj

# first optimization 
df_optim = optimize_geometric_brownian(St = Xt, params = c(1, 1), seed = 1)

# we can perfom just one simulation with this function or more than one using the other
# df_optim = calibrate_geometric_brownian(df_optim, N = 300, threshold = 100000, verbose = FALSE)


# permform 5 iteration with 200 simulation each, in total 1000 simulations, 5 minimums
df_optim_iterated = IterativeCalibration(df_optim, n.iterations = 5, N = 200, threshold = 100000, verbose = FALSE )


# select the best result
index_optim = which(df_optim_iterated$sumMSE == min(df_optim_iterated$sumMSE))

# extraction of the new df_optim 
df_optim = df_optim_iterated[index_optim, ]


# Create a df for plotting the result 
df_plot = df_optim$path[[1]]
df_plot$t = BTC_hourly_price$Date

df_plot %>%
  ggplot()+
  geom_line(data = df_plot, aes(t, fitted), color = "red")+
  geom_line(data = df_plot, aes(t, real), color = "black") +
  geom_label(data = df_plot[30,], aes(t, real + 1500, label = paste0("MSE: ", round(df_optim$MSE, 2))))+
  geom_label(data = df_plot[330,], aes(t, real + 2600, label = paste0("mu: ", round(df_optim$mu, 6), " \n sigma: ", round(df_optim$sigma, 6))))+
  ggthemes::theme_solarized()+
  theme(axis.text.x = element_text(angle = 0, face = "bold", size = 7), 
        axis.text.y = element_text(face = "bold"), 
        axis.title  = element_text(face = "bold"),
        plot.title  = element_text(face = "bold"),
        plot.subtitle = element_text(face = "italic"),
        plot.caption = element_text(face = "italic"),
        panel.grid.major.x = element_line(colour="grey60", linetype="dotted"),
        panel.grid.minor.x = element_blank(),
        panel.grid.major.y = element_line(colour="grey60", linetype="dotted"),
        legend.text = element_text(face = "italic", size = 5),
        legend.title = element_text(face = "bold"),
        legend.position = "top" ) +
  ggtitle("Geometric Brownian with Bitcoin, Hourly Price", subtitle = "Fitted Value (Red), Real value (Black)") +
  xlab("")+
  ylab("Price")+
  labs(caption = "Cyptocompare, All Exchanges July 2022")

df_optim_iterated %>% 
  ungroup() %>%
  select(simulations, mu, sigma, MSE, seed) %>%
  knitr::kable(caption = "Simulations results for the GBM") %>%
  kableExtra::kable_classic() %>%
  kable_styling(latex_options = "hold_position")
Simulations results for the GBM
simulations mu sigma MSE seed
1 -0.0000750 0.0037949 512.2167 158
2 -0.0001517 0.0052223 427.1527 400
3 0.0000201 0.0041106 468.0472 523
4 -0.0004247 0.0056277 414.1161 686
5 0.0000110 0.0052486 495.6788 929