ChatGPT is a powerful language model developed by OpenAI that has the capability to generate human-like text based on the input it receives. It has been trained on a large corpus of text data, including technical writing and programming languages. While it may not have been specifically trained to write code to solve financial problems, its ability to understand and generate code could potentially be used to write code to solve simple problems in finance. However, it is important to note that the accuracy and efficiency of the code generated by ChatGPT may vary and it would be wise to thoroughly review and test the code before using it in any financial applications.

The above paragraph was written by chatGPT.¹ As a researcher in Finance focusing on quantitative finance, I write a lot of code and am also involved in teaching programming/machine learning courses to business school students. I’ve seen many posts and threads showcasing what chatGPT can do in terms of writing and analyzing code, which made me interested in exploring what kind of code chatGPT can generate to tackle simple problems in Finance. This is not a detailed analysis, just an initial exploration of chatGPT’s capabilities.

Application 1: Moving Averages

I’ll start with something very simple: downloading data for the SPY, calculating and plotting some moving averages. I tried the following prompt:

Prompt: Could you generate R code to download SPY data, then plot 50-day and 200-day moving averages?

The code generated by chatGPT is shown below. It relies on quantmod to download data, which makes sense. For some reason, it also loads tidyverse, although it doesn’t seem to use it. As has been pointed out by many people, chatGPT doesn’t really know anything. All it knows how to do is arrange words (or code) based on patterns it has seen before. But it did get the job done, which is interesting. Also, it plotted the price and the moving averages, even though I only asked for the moving averages. Of course, any examples chatGPT would have had access to would have likely done the same, so this is not surprising.

# Install required packages if not already installed
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("quantmod")) install.packages("quantmod")

# Load packages
library(tidyverse)
library(quantmod)

# Download SPY data
getSymbols("SPY")

## [1] "SPY"

# Calculate 50-day and 200-day moving averages
spy_50ma <- SMA(Cl(SPY), n = 50)
spy_200ma <- SMA(Cl(SPY), n = 200)

# Plot SPY data with 50-day and 200-day moving averages
ggplot() +
  geom_line(aes(x = index(SPY), y = Cl(SPY))) +
  geom_line(aes(x = index(spy_50ma), y = spy_50ma), color = "red") +
  geom_line(aes(x = index(spy_200ma), y = spy_200ma), color = "blue") +
  labs(x = "", y = "Closing Price") +
  ggtitle("SPY Data with 50-day and 200-day Moving Averages")

Now, chatGPT chose to use ggplot. I never use ggplot, and I never will. So I asked chatGPT to redo this without using ggplot and to also add a legend to the graph.

Prompt: Please redo without using ggplot and this time add the legend.

This was the resulting code. It still “thinks” it needs tidyverse. It tried to add legends, but since the data is in xts format, it doesn’t work.

# Install required packages if not already installed
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("quantmod")) install.packages("quantmod")

# Load packages
library(tidyverse)
library(quantmod)

# Download SPY data
getSymbols("SPY")

## [1] "SPY"

# Calculate 50-day and 200-day moving averages
spy_50ma <- SMA(Cl(SPY), n = 50)
spy_200ma <- SMA(Cl(SPY), n = 200)

# Plot SPY data with 50-day and 200-day moving averages
plot(Cl(SPY), type = "l", col = "black",
     xlab = "", ylab = "Closing Price",
     main = "SPY Data with 50-day and 200-day Moving Averages")
lines(spy_50ma, col = "red")
lines(spy_200ma, col = "blue")
legend("topright", c("Closing Price", "50-day MA", "200-day MA"),
       col = c("black", "red", "blue"), lty = 1)

Although the legend doesn’t work properly, this could be fixed easily with the addLegend command.

plot(cbind(Cl(SPY), spy_200ma, spy_50ma), 
     lty=c(1, 1,1), 
     lwd=c(1,1,1), 
     col = c("black", "blue", "red"))
addLegend("topleft", 
          legend.names = c("SPY", "SMA(200)", "SMA(50)"), 
          lty=c(1, 1,1), 
          lwd=c(1,1,1), 
          col = c("black", "blue", "red"))

Next, let’s see if chatGPT can create and test a moving average crossover strategy:

Prompt: Please generate R code that implements a moving average crossover strategy and test it on the SPY.

The result was the code below, which is not really usable. The code starts with the (by now) standard approach of using quantmod to download the data, and the package tidyverse because … it’s popular? The getSymbols function creates by default an xts object. Then, it calculates the two moving averages. So far, so good. But in the next step, chatGPT did something nonsensical. It created a function strategy to calculate the signal of the strategy, which sounds like something you could do. But the function takes as input some data which is never used. Instead, the function used the previously calculated moving averages. So when it calls signals <- strategy(SPY), this is the same as calling the function with any other argument, or even with no argument. It still works in this case as the moving averages were pre-calculated.
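The scoping issue is easy to reproduce in a few lines. In the sketch below (a toy example with made-up names, not chatGPT's code), one function ignores its argument and reads a variable from the global environment, so any input gives the same result:

```r
# Hypothetical toy example: 'precomputed' plays the role of the moving averages
precomputed <- c(1, 2, 3)

ignores_input <- function(data) {
  # 'data' is never used; R finds 'precomputed' in the global environment
  precomputed * 2
}

uses_input <- function(data) {
  # this version depends only on what is passed in
  data * 2
}

identical(ignores_input(1:10), ignores_input("anything"))  # same result for any argument
identical(ignores_input(), ignores_input(NULL))            # even with no argument at all
```

Because R evaluates arguments lazily, the call with no argument does not even raise an error, which is why this kind of bug can go unnoticed.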

In the next step, chatGPT tries to create a new dataframe with the date, the closing price, and the signal from the trading strategy. However, this doesn’t make much sense: they are all xts objects already, so the date is carried by the index and doesn’t need its own column. In addition, because these are xts objects, the column names will not be set as intended, which will cause an error in the calculation of returns. It also prepends an NA when it tries to calculate log-returns, which is not needed with xts objects since diff already returns an NA in the first position. Finally, the calculation of the strategy returns combines today’s signal with today’s return. Since today’s signal uses today’s moving averages, it is only available at the end of the day and can only be traded from the next day onward. Using it with today’s return introduces look-ahead bias, so the signal needs to be lagged by one day.

Simply put, the code generated by chatGPT doesn’t work. I would point out, however, that it’s possible to modify the prompt to get better results.

# Install required packages if not already installed
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("quantmod")) install.packages("quantmod")

# Load packages
library(tidyverse)
library(quantmod)

# Download SPY data
getSymbols("SPY")

# Calculate 50-day and 200-day moving averages
spy_50ma <- SMA(Cl(SPY), n = 50)
spy_200ma <- SMA(Cl(SPY), n = 200)

# Create a moving average crossover strategy
strategy <- function(data) {
  # Buy signal
  buy_signal <- ifelse(spy_50ma[, 1] > spy_200ma[, 1], 1, 0)
  # Sell signal
  sell_signal <- ifelse(spy_50ma[, 1] < spy_200ma[, 1], -1, 0)
  # Combine signals
  signals <- buy_signal + sell_signal
  return(signals)
}

# Apply strategy to SPY data
signals <- strategy(SPY)

# Create a data frame to store strategy results
results <- data.frame(Date = index(SPY),
                      Close = Cl(SPY),
                      Signals = signals)
 
# Create a new column to store returns
results$Returns <- c(NA, diff(log(results$Close)))

# Calculate strategy returns
results$Strategy_Returns <- ifelse(results$Signals == 1, results$Returns, 0)
cumsum(results$Strategy_Returns)

# Plot strategy returns
plot(cumsum(results$Strategy_Returns), type = "l",
     xlab = "", ylab = "Cumulative Returns",
     main = "Moving Average Crossover Strategy on SPY")
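The timing problem in the last step is worth making concrete. The sketch below uses simulated prices (not SPY data), shorter windows, and made-up names. It computes the strategy return once with today's signal (look-ahead) and once with yesterday's signal (tradable):

```r
# Simulated prices (not real SPY data); all names here are illustrative
set.seed(42)
n <- 500
ret <- rnorm(n, mean = 0.0003, sd = 0.01)   # simulated daily log-returns
price <- 100 * exp(cumsum(ret))

# Simple moving average using only current and past prices
sma <- function(x, k) as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))

fast <- sma(price, 10)
slow <- sma(price, 50)
signal <- ifelse(fast > slow, 1, 0)   # known only at today's close

# Look-ahead version: today's signal earns today's return
ret_lookahead <- signal * ret

# Tradable version: yesterday's signal earns today's return
signal_lagged <- c(NA, head(signal, -1))
ret_tradable <- signal_lagged * ret

# Compare cumulative log-returns over the days where both are defined
ok <- !is.na(ret_lookahead) & !is.na(ret_tradable)
c(lookahead = sum(ret_lookahead[ok]), tradable = sum(ret_tradable[ok]))
```

The two numbers generally differ, and only the lagged version corresponds to something that could actually have been traded.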

Let’s fix the code to make it more general.

if (!require("quantmod")) install.packages("quantmod")
if (!require("PerformanceAnalytics")) install.packages("PerformanceAnalytics")

# Load packages
library(quantmod)
library(PerformanceAnalytics)

# Download SPY data
getSymbols("SPY")

## [1] "SPY"

# Create a moving average crossover strategy
strategy <- function(data, n_fast = 50, n_slow = 200, long = FALSE) {
  
  # Calculate the fast and slow moving averages
  fast_ma <- SMA(Cl(data), n = n_fast)
  slow_ma <- SMA(Cl(data), n = n_slow)

  # Signal: long-only (1/0) if long = TRUE, otherwise long-short (1/-1)

  if (long) {
    signals <- ifelse(fast_ma > slow_ma, 1, 0)
  } else{
    signals <- ifelse(fast_ma > slow_ma, 1, -1)
  }
  return(signals)
}

# Apply long-short strategy to SPY data with default parameters
signals <- strategy(SPY)

# Merge prices and signals into a single xts object
results <- merge(Cl(SPY),
                 signals)
colnames(results) <- c("Close", "Signals")
 
# Create a new column to store returns
results$Returns <- CalculateReturns(results$Close)

# Calculate strategy returns
results$Strategy_Returns <- stats::lag(results$Signals) * results$Returns

# Plot SPY and strategy returns
chart.CumReturns(results[201:nrow(results), c( "Returns", "Strategy_Returns")], main = "Moving Average Crossover Strategy on SPY", legend.loc = "topleft")

Note that the code would now work with any other series. For example, let’s run it on BND:

getSymbols("BND")

## [1] "BND"

signals <- strategy(BND)

# Merge prices and signals into a single xts object
results <- merge(Cl(BND),
                 signals)
colnames(results) <- c("Close", "Signals")
 
# Create a new column to store returns
results$Returns <- CalculateReturns(results$Close)

# Calculate strategy returns
results$Strategy_Returns <- stats::lag(results$Signals) * results$Returns

# Plot BND and strategy returns
chart.CumReturns(results[201:nrow(results), c( "Returns", "Strategy_Returns")], main = "Moving Average Crossover Strategy on BND", legend.loc = "topleft")

Obviously, there are multiple ways to do this and the code could be made much more efficient. What is interesting is that, even though the code generated by chatGPT initially didn’t work and had multiple issues, it was relatively straightforward to fix it.

Application 2: Single sorts in asset pricing

Many studies in asset pricing rely on sorting stocks into groups based on some characteristic. Then, portfolios are formed based on these groups, and held over some holding period. A typical workflow to implement this involves getting data for all stocks in the universe, cycling through portfolio formation dates, removing certain stocks (for example, microcaps), and then assigning the remaining stocks to a number of groups (for example, 10 groups for decile portfolios). The data often is sourced in a “long” format, where each row contains data for one stock and one date. One possible solution is to convert the data to a “wide” format.² But we can also do everything without this step.

To test chatGPT’s ability to perform these tasks, I first create some random data that exemplifies a typical dataset. The code below creates a dataframe with daily data for 2000 stocks. The variables in the dataframe are the date, the stock code (stock_cd), the market capitalization (mkt_cap), the exchange code (exchcd) and the volatility. In the simulated data, most stocks exist through the entire period, while the data for some stocks start later, and some stocks disappear (i.e. are delisted), as in real life. The resulting dataframe stock_data has approximately 4 million rows.

# create some random data
library(bizdays)      # for business days
library(extraDistr)   # for discrete unif distr

# create some dates
dates <- as.Date(bizseq("2017-01-01", "2023-02-21"))

# create a dataframe with data for stocks with arbitrary starting points
n_stocks <- 2000
n_dates <- length(dates)
stock_data <- data.frame()
  
for (i in 1:n_stocks){
  # with high probability, the stock already exists at the beginning of our sample.  
  i_start <- ifelse(runif(1) > 0.1, 1, rdunif(1, 1, floor(n_dates/2)))
  # with high probability, the stock existed until the end of the sample, but some stocks are "delisted"
  i_end <- ifelse(runif(1) > 0.2, n_dates, rdunif(1, i_start, n_dates))  
  n_dates_stock <- i_end - i_start+1
  stock_cd <- rep(paste0("Stock_", i), n_dates_stock)
  mkt_cap <- rnorm(n_dates_stock)
  exchcd <- rep(rdunif(1, 1, 3), n_dates_stock)
  volatility <- rnorm(n_dates_stock)
  this_stock_data <- data.frame(date = dates[i_start:i_end],
                                stock_cd,
                                mkt_cap,
                                exchcd,
                                volatility)
  stock_data <- rbind(stock_data, this_stock_data)
}

I provided the following prompt to chatGPT:

I have a dataframe called stock_data in R with the following columns: date, stock_cd, mkt_cap, exchcd, volatility. Each row contains an observation for one stock in one day. I would like you to create R code to do the following. At the end of each month, calculate the 20th percentile of mkt_cap for all stocks for which exchcd is equal to 1. Then, select all stocks whose mkt_cap is higher than this value. Finally, assign stocks to one of 10 groups based on volatility. Keep only the month ends. Do not alter the original dataframe.

The idea is to replicate the single-factor sorting process in many US studies in empirical asset pricing, in which microcaps are defined relative to the 20th percentile of the market capitalization of NYSE stocks (in the prompt, identified by exchcd equal to 1). These stocks are removed at formation date, and the remaining stocks are sorted into deciles according to the desired characteristic (in our case, volatility). This was the initial code generated by chatGPT. It looks good, but there are two issues. The first one is that it’s using data from all month ends to calculate the market cap breakpoint. The second one is that it’s selecting only stocks with exchcd equal to 1, which is not what I wanted.

library(dplyr)
library(lubridate)

# convert date column to a date object
stock_data$date <- as.Date(stock_data$date)

# create a new column for the end of the month
stock_data$end_of_month <- floor_date(stock_data$date, unit = "month") + days(days_in_month(stock_data$date) - 1)

# filter to only include month ends
stock_data_month_ends <- stock_data %>% filter(date == end_of_month)

# calculate the 20th percentile of mkt_cap for all stocks for which exchcd is equal to 1
mkt_cap_20th_percentile <- stock_data_month_ends %>% filter(exchcd == 1) %>% summarize(mkt_cap_20th_percentile = quantile(mkt_cap, 0.2))

# select all stocks whose mkt_cap is higher than this value
selected_stocks <- stock_data_month_ends %>% filter(mkt_cap > mkt_cap_20th_percentile$mkt_cap_20th_percentile & exchcd == 1)

# assign stocks to one of 10 groups based on volatility, with values ranging from 1 to 10
selected_stocks$volatility_group <- as.numeric(cut(selected_stocks$volatility, breaks = quantile(selected_stocks$volatility, probs = seq(0, 1, 0.1)), labels = 1:10))

# print the selected_stocks dataframe
selected_stocks

I asked chatGPT to remove the filter exchcd when selecting stocks, and also pointed out that the market capitalization percentile needs to be calculated at the end of each month.³ chatGPT made the corrections and generated the code below:

library(dplyr)
library(lubridate)

# convert date column to a date object
stock_data$date <- as.Date(stock_data$date)

# create a new column for the end of the month
stock_data$end_of_month <- floor_date(stock_data$date, unit = "month") + days(days_in_month(stock_data$date) - 1)

# filter to only include month ends
stock_data_month_ends <- stock_data %>% filter(date == end_of_month)

# for each month, calculate the 20th percentile of mkt_cap for all stocks with exchcd == 1
mkt_cap_20th_percentile_by_month <- stock_data_month_ends %>% 
  filter(exchcd == 1) %>% 
  group_by(end_of_month) %>% 
  summarize(mkt_cap_20th_percentile = quantile(mkt_cap, 0.2))

# join the mkt_cap_20th_percentile_by_month data with the stock_data_month_ends data
stock_data_month_ends <- stock_data_month_ends %>% 
  left_join(mkt_cap_20th_percentile_by_month, by = "end_of_month")

# select all stocks whose mkt_cap is higher than this value
selected_stocks <- stock_data_month_ends %>% 
  filter(mkt_cap > mkt_cap_20th_percentile)

# assign stocks to one of 10 groups based on volatility, with values ranging from 1 to 10
selected_stocks$volatility_group <- as.numeric(cut(selected_stocks$volatility, breaks = quantile(selected_stocks$volatility, probs = seq(0, 1, 0.1)), labels = 1:10))

This is pretty good! We could sort the dataframe with the selected stocks by date, and check how many stocks we have per group:

# print the selected_stocks dataframe
selected_stocks <- selected_stocks[order(selected_stocks$date),]
head(selected_stocks, 10)

##           date stock_cd     mkt_cap exchcd  volatility end_of_month
## 1   2017-01-31  Stock_1  0.07311324      2 -0.93966117   2017-01-31
## 120 2017-01-31  Stock_3  1.28858545      2 -0.90402062   2017-01-31
## 297 2017-01-31  Stock_7  0.87986849      3 -1.01039056   2017-01-31
## 352 2017-01-31  Stock_8  1.70374794      3 -0.08445289   2017-01-31
## 411 2017-01-31  Stock_9 -0.27182618      3  0.59708537   2017-01-31
## 426 2017-01-31 Stock_11  0.08721117      2  0.02489237   2017-01-31
## 477 2017-01-31 Stock_12 -0.91399469      2  2.74052282   2017-01-31
## 533 2017-01-31 Stock_13  0.10363270      2 -0.38076908   2017-01-31
## 564 2017-01-31 Stock_14 -0.38902889      1 -1.19413366   2017-01-31
## 715 2017-01-31 Stock_17 -0.67802222      3 -1.32402846   2017-01-31
##     mkt_cap_20th_percentile volatility_group
## 1                 -1.023015                2
## 120               -1.023015                2
## 297               -1.023015                2
## 352               -1.023015                5
## 411               -1.023015                8
## 426               -1.023015                6
## 477               -1.023015               10
## 533               -1.023015                4
## 564               -1.023015                2
## 715               -1.023015                1

# show how many stocks per group on first two dates
head(selected_stocks %>% 
    group_by(date,volatility_group) %>% 
    summarize(num_stocks = n()),20)

## `summarise()` has grouped output by 'date'. You can override using the
## `.groups` argument.

## # A tibble: 20 × 3
## # Groups:   date [2]
##    date       volatility_group num_stocks
##    <date>                <dbl>      <int>
##  1 2017-01-31                1        127
##  2 2017-01-31                2        172
##  3 2017-01-31                3        141
##  4 2017-01-31                4        150
##  5 2017-01-31                5        151
##  6 2017-01-31                6        163
##  7 2017-01-31                7        124
##  8 2017-01-31                8        145
##  9 2017-01-31                9        144
## 10 2017-01-31               10        168
## 11 2017-02-28                1        154
## 12 2017-02-28                2        139
## 13 2017-02-28                3        149
## 14 2017-02-28                4        132
## 15 2017-02-28                5        159
## 16 2017-02-28                6        145
## 17 2017-02-28                7        138
## 18 2017-02-28                8        138
## 19 2017-02-28                9        135
## 20 2017-02-28               10        153
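One detail worth noting in the counts above: the groups are not equally sized within a date. The reason is that the code computes the volatility breakpoints once, from the pooled sample of all month ends, rather than separately at each formation date. In addition, cut() without include.lowest = TRUE leaves the observation with the lowest volatility unassigned (NA). If the intent is to sort within each month, a minimal base-R sketch could look like the following. It uses a small made-up data frame and a hypothetical helper called decile_rank, not the simulated stock_data:

```r
set.seed(1)
# Made-up month-end sample: 3 dates x 200 stocks
toy <- data.frame(
  date = rep(as.Date(c("2017-01-31", "2017-02-28", "2017-03-31")), each = 200),
  volatility = rnorm(600)
)

# Hypothetical helper: decile (1-10) of each value within its own vector
decile_rank <- function(x) {
  breaks <- quantile(x, probs = seq(0, 1, 0.1))
  # include.lowest = TRUE keeps the minimum value in group 1 instead of NA
  as.integer(cut(x, breaks = breaks, labels = FALSE, include.lowest = TRUE))
}

# ave() applies the helper separately within each date
toy$volatility_group <- ave(toy$volatility, toy$date, FUN = decile_rank)

# Each date now has 20 stocks in every group
table(toy$date, toy$volatility_group)
```

Whether pooled or per-date breakpoints are appropriate depends on the study design, but per-date sorting is the usual choice when portfolios are formed at each month end.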

Overall, although this application seems relatively complex, it can be broken down into a series of simple sequential steps, which chatGPT handled pretty well, with some guidance and correction.

Application 3: Regressions and Betas

Let’s see if chatGPT can calculate a stock beta:

Prompt: Can you generate R code that downloads prices of TSLA and SPY, calculates the returns of both series, and then run a regression of the returns of TSLA on the returns of SPY?

Here is the code it generated. It almost works, except that the two time series have different sizes. Let’s try to solve it by being more specific.

# Install the required packages
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("quantmod")) install.packages("quantmod")

# Load the required packages
library(tidyverse)
library(quantmod)

# Download the prices of TSLA and SPY
getSymbols("TSLA", from = "2010-01-01", auto.assign = TRUE)
getSymbols("SPY", from = "2010-01-01", auto.assign = TRUE)

# Calculate the returns of TSLA and SPY
tsla_returns = ROC(Cl(TSLA))
spy_returns = ROC(Cl(SPY))

# Run the regression of TSLA returns on SPY returns
reg = lm(tsla_returns ~ spy_returns)

# Print the summary of the regression results
summary(reg)

I had to play around with the prompt for a while to get it to work properly. For example, if I don’t ask it to rename the columns, it refers to incorrect column names. The prompt below worked:

Could you please modify this code by creating a new xts object merging the returns of TSLA and SPY, renaming the columns of the merged object to “TSLA” and “SPY”, then discarding any rows with missing observations, and running the regression using the merged returns?

The code generated by this prompt is shown below. It calculated the beta of TSLA to be approximately 1.40.

# Install the required packages
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("quantmod")) install.packages("quantmod")

# Load the required packages
library(tidyverse)
library(quantmod)

# Download the prices of TSLA and SPY
getSymbols("TSLA", from = "2010-01-01", auto.assign = TRUE)

## [1] "TSLA"

getSymbols("SPY", from = "2010-01-01", auto.assign = TRUE)

## [1] "SPY"

# Calculate the returns of TSLA and SPY
tsla_returns = ROC(Cl(TSLA))
spy_returns = ROC(Cl(SPY))

# Create a new xts object merging the returns of TSLA and SPY
merged_returns = cbind(tsla_returns, spy_returns)

# Rename the columns of the merged object to "TSLA" and "SPY"
colnames(merged_returns) = c("TSLA", "SPY")

# Discard any rows with missing observations
merged_returns = merged_returns[complete.cases(merged_returns), ]

# Run the regression using the merged returns
reg = lm(TSLA ~ SPY, data = data.frame(merged_returns))

# Print the summary of the regression results
summary(reg)

## 
## Call:
## lm(formula = TSLA ~ SPY, data = data.frame(merged_returns))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.208446 -0.016252 -0.000204  0.015975  0.221310 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.0009245  0.0005775   1.601    0.109    
## SPY         1.3977946  0.0520356  26.862   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03256 on 3181 degrees of freedom
## Multiple R-squared:  0.1849, Adjusted R-squared:  0.1846 
## F-statistic: 721.6 on 1 and 3181 DF,  p-value: < 2.2e-16
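As a sanity check on output like this, the slope of a univariate regression equals the covariance of the two return series divided by the variance of the regressor. The sketch below uses simulated returns (not the TSLA and SPY data), with a true beta of 1.4 chosen purely for illustration. It also shows the date-alignment step for two series of different lengths:

```r
set.seed(123)
dates <- seq(as.Date("2020-01-01"), by = "day", length.out = 500)

# Simulated "market" returns over the full sample
mkt <- data.frame(date = dates, MKT = rnorm(500, 0, 0.01))

# Simulated "stock" returns that start 100 days later (like a later listing)
stock <- data.frame(date = dates[101:500])
stock$STK <- 1.4 * mkt$MKT[101:500] + rnorm(400, 0, 0.02)

# Align by date: an inner join keeps only dates present in both series
merged <- merge(stock, mkt, by = "date")

# Beta from the regression and from the covariance formula
beta_lm  <- unname(coef(lm(STK ~ MKT, data = merged))[2])
beta_cov <- cov(merged$STK, merged$MKT) / var(merged$MKT)
c(lm = beta_lm, cov_over_var = beta_cov)
```

The two estimates agree up to floating-point error, which is a quick way to confirm that the regression was run on the intended, aligned data.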

Now, let’s ask chatGPT to calculate and plot a rolling beta. Still on the same chat, I asked for this:

Prompt: Could you please modify this code to implement a recursive regression using a rolling window of 60 days, store the slope coefficient and plot it?

The generated code is below. It does exactly what I asked it to.

# Install the required packages
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("quantmod")) install.packages("quantmod")

# Load the required packages
library(tidyverse)
library(quantmod)

# Download the prices of TSLA and SPY
getSymbols("TSLA", from = "2010-01-01", auto.assign = TRUE)

## [1] "TSLA"

getSymbols("SPY", from = "2010-01-01", auto.assign = TRUE)

## [1] "SPY"

# Calculate the returns of TSLA and SPY
tsla_returns = ROC(Cl(TSLA))
spy_returns = ROC(Cl(SPY))

# Create a new xts object merging the returns of TSLA and SPY
merged_returns = cbind(tsla_returns, spy_returns)

# Rename the columns of the merged object to "TSLA" and "SPY"
colnames(merged_returns) = c("TSLA", "SPY")

# Discard any rows with missing observations
merged_returns = merged_returns[complete.cases(merged_returns), ]

# Define a function to run a regression with a rolling window of 60 days
reg_rolling = function(data, window = 60) {
  roll_reg = rollapplyr(data, window, function(x) {
    reg = lm(TSLA ~ SPY, data = data.frame(x))
    coef(reg)[2]
  }, by.column = FALSE, align = "right")
  return(roll_reg)
}

# Run the recursive regression with a rolling window of 60 days
slope_coef = reg_rolling(merged_returns)

# Plot the slope coefficient
plot(slope_coef, type = "l", col = "blue", ylab = "Slope", xlab = "Time")
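To see what the rolling window is doing step by step, here is a base-R sketch of the same idea on simulated data, computing the slope in each window as covariance over variance. The helper name rolling_beta and the data are hypothetical; this is not the code chatGPT produced.

```r
set.seed(7)
n <- 300
x <- rnorm(n, 0, 0.01)               # simulated market returns
y <- 1.2 * x + rnorm(n, 0, 0.02)     # simulated stock returns

# Hypothetical helper: slope of y on x over the trailing 'window' observations
rolling_beta <- function(y, x, window = 60) {
  out <- rep(NA_real_, length(y))
  for (t in window:length(y)) {
    idx <- (t - window + 1):t        # the last 'window' observations up to t
    out[t] <- cov(y[idx], x[idx]) / var(x[idx])
  }
  out
}

beta_60 <- rolling_beta(y, x)
plot(beta_60, type = "l", xlab = "Observation", ylab = "Rolling beta (60 obs)")
```

The first 59 values are NA because a full window is not yet available, which matches the right-aligned window used above.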

Application 4: Simple GARCH model

I asked chatGPT to estimate a GARCH model and plot the conditional volatility:

Prompt: Can you generate R code that downloads prices for the SPY, calculates returns, estimates a GARCH(1,1) model, and plots the conditional volatility?

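Here’s what chatGPT provided, with one modification to make it work: I replaced fitted(fit) with sigma(fit) below, since in rugarch fitted() returns the conditional mean, while sigma() returns the conditional volatility.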

library(tseries)
library(rugarch)

# Download SPY prices
spy <- get.hist.quote(instrument = "SPY", start = "2000-01-01", end = "2023-02-10", quote = "AdjClose", provider = "yahoo", compression = "d", retclass = "zoo")

## time series starts 2000-01-03
## time series ends   2023-02-09

# Calculate returns
returns <- diff(log(spy))

# Estimate GARCH(1,1) model
garch11.spec <- ugarchspec(mean.model = list(armaOrder = c(0,0)), variance.model = list(garchOrder = c(1,1)))
fit <- ugarchfit(spec = garch11.spec, data = returns)

# Plot the estimated volatility
plot(sigma(fit), main = "Estimated Volatility (GARCH(1,1))", ylab = "Volatility")
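To make concrete what the conditional volatility in this plot represents, here is a base-R sketch of the GARCH(1,1) variance recursion, sigma2[t] = omega + alpha1 * e[t-1]^2 + beta1 * sigma2[t-1]. The parameter values are made up for illustration and are not estimates from the SPY data.

```r
set.seed(10)
omega <- 1e-6; alpha1 <- 0.1; beta1 <- 0.85   # hypothetical parameters
n <- 1000

e <- numeric(n)        # return innovations
sigma2 <- numeric(n)   # conditional variances

# Start the recursion at the unconditional variance
sigma2[1] <- omega / (1 - alpha1 - beta1)
e[1] <- sqrt(sigma2[1]) * rnorm(1)

for (t in 2:n) {
  # Today's variance depends on yesterday's squared shock and yesterday's variance
  sigma2[t] <- omega + alpha1 * e[t - 1]^2 + beta1 * sigma2[t - 1]
  e[t] <- sqrt(sigma2[t]) * rnorm(1)
}

plot(sqrt(sigma2), type = "l", xlab = "Observation", ylab = "Conditional volatility")
```

The plotted series is the square root of the conditional variance, which is the quantity a fitted GARCH model reports as its conditional volatility.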

I then asked chatGPT to also estimate an EGARCH and plot both conditional volatilities:

Prompt: Now please estimate an EGARCH and plot both estimates of volatility.

Again, I had to fix the fitted(fit) thing, but it works:

library(tseries)
library(rugarch)

# Download SPY prices
spy <- get.hist.quote(instrument = "SPY", start = "2000-01-01", end = "2023-02-10", quote = "AdjClose", provider = "yahoo", compression = "d", retclass = "zoo")

## time series starts 2000-01-03
## time series ends   2023-02-09

# Calculate returns
returns <- diff(log(spy))

# Estimate GARCH(1,1) model
garch11.spec <- ugarchspec(mean.model = list(armaOrder = c(0,0)), variance.model = list(garchOrder = c(1,1)))
garch11.fit <- ugarchfit(spec = garch11.spec, data = returns)

# Estimate EGARCH(1,1) model
egarch11.spec <- ugarchspec(mean.model = list(armaOrder = c(0,0)), variance.model = list(model = "eGARCH", garchOrder = c(1,1)))
egarch11.fit <- ugarchfit(spec = egarch11.spec, data = returns)

# Plot the estimated volatilities
par(mfrow = c(2, 1))
plot(sigma(garch11.fit), main = "GARCH(1,1) Estimated Volatility", ylab = "Volatility")
plot(sigma(egarch11.fit), main = "EGARCH(1,1) Estimated Volatility", ylab = "Volatility")

Application 5: Portfolio Optimization

Can we leverage chatGPT to demonstrate Markowitz-style portfolio optimization in R?

Prompt: Generate R code to demonstrate portfolio optimization. Create and plot the efficient frontier using the following assets: BND, VTI, GSG, and USRT.

This was the code generated:

# Install required packages if not already installed
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("quantmod")) install.packages("quantmod")
if (!require("PerformanceAnalytics")) install.packages("PerformanceAnalytics")

# Load packages
library(tidyverse)
library(quantmod)
library(PerformanceAnalytics)

# Define the list of assets
assets <- c("BND", "VTI", "GSG", "USRT")

# Download data for all assets
getSymbols(assets, from = "2010-01-01")

# Calculate returns for all assets
returns <- lapply(assets, function(x) Ad(get(x)))
returns <- do.call(cbind, returns)
colnames(returns) <- assets

# Plot efficient frontier
chart.Portfolio(R = returns, weight = matrix(0.125, ncol = length(assets), nrow = 1),
               risk.target = "sd", return.target = "mean",
                portfolio = "frontier", col = "blue")

I didn’t really expect this to work, and it doesn’t. The request is quite general and doesn’t offer any hints about where to look. The first thing that is weird is this line of code:

returns <- lapply(assets, function(x) Ad(get(x)))

It looks like a concise way to use lapply to get all the returns at once, but what it does is get all the adjusted closing prices. Again, chatGPT doesn’t really know what is a price and what is a return. The above step looks like something you could use to gather all prices, and then calculate returns.
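That missing step is short. Given a matrix of prices, log-returns are differences of log prices and simple returns are percentage changes. A sketch with made-up prices for two hypothetical assets:

```r
# Made-up prices for two hypothetical assets (rows are consecutive days)
prices <- cbind(A = c(100, 102, 101, 105),
                B = c(50, 49, 50, 52))

# Log-returns: difference of log prices, one fewer row than the price matrix
log_returns <- diff(log(prices))

# Simple returns: P_t / P_{t-1} - 1
simple_returns <- prices[-1, ] / prices[-nrow(prices), ] - 1

log_returns
simple_returns
```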

Moving on, we see something quite interesting. While, in the previous examples, the code generated by chatGPT “kinda” worked or could be salvaged, in this example, chatGPT just straight-up calls a made-up function, chart.Portfolio, to do the portfolio optimization. In the explanation of the code, chatGPT wrote:

The chart.Portfolio function from the PerformanceAnalytics package is used to plot the efficient frontier based on the mean return and standard deviation of each portfolio combination.

It sounds legit, until we realize that this function doesn’t exist in the package. I tried a few more prompts to see if I could get it to generate better code using the PortfolioAnalytics package, which does have a function to generate an efficient frontier, but with no luck. This is not really surprising: chatGPT is a neural network trained to generalize. Because of this, it gets even simple questions with precise answers wrong. But at first glance, we could almost think the code would work.

I gave it one more shot in a new chat:

Prompt: Please generate R code to demonstrate portfolio optimization. Create and plot the efficient frontier using the following assets: BND, VTI, GSG, USRT.

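This was the resulting code, which attempts to simulate a bunch of portfolios with random weights (uniform draws, rescaled to sum to one), and then plot their returns and variances. However, the code does not run. The reason is the way it uses getSymbols: with auto.assign = TRUE, the function loads each series into the workspace and returns only the ticker names, so prices ends up holding a character vector rather than price data, and the call to Ad(prices) fails.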

# Load necessary packages
library(tidyverse)
library(PerformanceAnalytics)
library(quantmod)

# Get the daily closing prices for the assets
start_date <- as.Date("2017-01-01")
end_date <- as.Date("2022-12-31")
tickers <- c("BND", "VTI", "GSG", "USRT")

prices <- getSymbols(tickers, from = start_date, to = end_date, auto.assign = TRUE)
prices <- Ad(prices) # get adjusted closing prices

# Calculate daily returns
returns <- prices %>% 
  log() %>% 
  diff() %>% 
  na.omit() %>% 
  as.data.frame()
colnames(returns) <- tickers

# Define the number of simulations and portfolios to generate
num_portfolios <- 1000
num_assets <- length(tickers)

# Set the seed for reproducibility
set.seed(123)

# Generate the portfolios and calculate their expected returns and variances
portfolio_returns <- numeric(num_portfolios)
portfolio_variances <- numeric(num_portfolios)
for (i in 1:num_portfolios) {
  weights <- runif(num_assets)
  weights <- weights / sum(weights)
  portfolio_returns[i] <- sum(weights * colMeans(returns))
  portfolio_variances[i] <- as.numeric(t(weights) %*% cov(returns) %*% weights)
}

# Plot the efficient frontier
plot(portfolio_variances, portfolio_returns, xlab = "Variance", ylab = "Expected Return",
     main = "Efficient Frontier", col = "blue")

I fixed the issue to create the prices object and also changed the plot to use standard deviations instead of variances. The result is this:

# Load necessary packages
library(tidyverse)
library(PerformanceAnalytics)
library(quantmod)

# Get the daily closing prices for the assets
start_date <- as.Date("2017-01-01")
end_date <- as.Date("2022-12-31")
tickers <- c("BND", "VTI", "GSG", "USRT")

getSymbols(tickers, from = start_date, to = end_date, auto.assign = TRUE)

## [1] "BND"  "VTI"  "GSG"  "USRT"

prices <- lapply(tickers, function(x) Ad(get(x)))
prices <- do.call('cbind', prices)
colnames(prices) <- tickers

# Calculate daily returns
returns <- prices %>% 
  log() %>% 
  diff() %>% 
  na.omit() %>% 
  as.data.frame()
colnames(returns) <- tickers

# Define the number of simulations and portfolios to generate
num_portfolios <- 1000
num_assets <- length(tickers)

# Set the seed for reproducibility
set.seed(123)

# Generate the portfolios and calculate their expected returns and variances
portfolio_returns <- numeric(num_portfolios)
portfolio_variances <- numeric(num_portfolios)
for (i in 1:num_portfolios) {
  weights <- runif(num_assets)
  weights <- weights / sum(weights)
  portfolio_returns[i] <- sum(weights * colMeans(returns))
  portfolio_variances[i] <- as.numeric(t(weights) %*% cov(returns) %*% weights)
}

# Plot the efficient frontier
plot(sqrt(portfolio_variances), portfolio_returns, xlab = "Standard Deviation", ylab = "Expected Return",
     main = "Efficient Frontier", col = "blue")

I mean, it’s not what we wanted, but it did something…related? It didn’t plot the efficient frontier, only a bunch of random portfolios. On the other hand, I didn’t provide much information to chatGPT. It’s possible that other prompts would get it to do something more reasonable. For example, we could try to direct it to use the create.EfficientFrontier function from the PortfolioAnalytics package, a very straightforward way to accomplish the task. Alternatively, we could correct chatGPT and see what comes of it.
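For comparison, here is a minimal sketch of what computing the frontier itself could look like. It does not use create.EfficientFrontier. Instead it swaps in the textbook closed-form solution for the minimum-variance frontier, which assumes short selling is allowed and no other constraints apply. The function name efficient_frontier and the simulated returns are my own illustrative assumptions, standing in for the downloaded ETF data.

```r
# Closed-form minimum-variance frontier (assumes short selling is allowed).
# mu: vector of expected returns; Sigma: covariance matrix;
# targets: vector of target portfolio returns.
efficient_frontier <- function(mu, Sigma, targets) {
  ones <- rep(1, length(mu))
  Sigma_inv <- solve(Sigma)
  A <- as.numeric(t(ones) %*% Sigma_inv %*% ones)
  B <- as.numeric(t(ones) %*% Sigma_inv %*% mu)
  C <- as.numeric(t(mu) %*% Sigma_inv %*% mu)
  D <- A * C - B^2
  # One column of weights per target return
  weights <- sapply(targets, function(m) {
    as.numeric(Sigma_inv %*% (((C - B * m) / D) * ones + ((A * m - B) / D) * mu))
  })
  list(
    target = targets,
    sd = sqrt((A * targets^2 - 2 * B * targets + C) / D),
    weights = t(weights),   # one row per target return
    gmv_return = B / A      # return of the global minimum-variance portfolio
  )
}

# Simulated daily returns (hypothetical means and volatilities, not real data)
set.seed(123)
sim_returns <- mapply(function(m, s) rnorm(1000, mean = m, sd = s),
                      c(0.0001, 0.0004, 0.0002, 0.0003),
                      c(0.003, 0.012, 0.015, 0.013))
colnames(sim_returns) <- c("BND", "VTI", "GSG", "USRT")

mu <- colMeans(sim_returns)
Sigma <- cov(sim_returns)
targets <- seq(min(mu), max(mu), length.out = 50)
frontier <- efficient_frontier(mu, Sigma, targets)

# The efficient part is the branch with target >= frontier$gmv_return
plot(frontier$sd, frontier$target, type = "l",
     xlab = "Standard Deviation", ylab = "Expected Return",
     main = "Minimum-Variance Frontier (simulated data)")
```

Each point on this curve is the lowest-risk portfolio for a given target return, which is what the cloud of random portfolios only approximates from the inside. With long-only or other constraints there is no closed form, and a numerical optimizer would be needed instead.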

Conclusion

In conclusion, ChatGPT can certainly be a valuable tool for writing code for finance applications. Its ability to understand natural language input and generate code based on that input can save time and effort for developers, particularly those without a strong background in programming. Additionally, its machine learning capabilities can enable it to learn from past code and improve over time, leading to more efficient and effective development.

However, it’s important to note that ChatGPT should not be seen as a replacement for skilled programmers or a complete solution to all development needs. It can certainly aid in certain aspects of code writing, but ultimately it’s up to the developer to ensure the code is secure, efficient, and appropriate for the specific finance application at hand. With that in mind, ChatGPT can be a useful tool in the finance industry and beyond, enabling developers to work more efficiently and effectively.⁴

The conclusion above has, of course, been written by chatGPT. Now for my own conclusion:

chatGPT is an impressive tool with a lot of potential in different domains. In this short piece, I have explored the capabilities of chatGPT to write code in R to solve a few tasks in Finance with varying degrees of complexity. For simple tasks, like plotting moving averages of prices, calculating a stock’s beta, or estimating GARCH models, chatGPT was able to produce code that was relevant to solve the problem at hand, but needed some tweaking in order to run correctly. In a slightly more complicated application related to empirical asset pricing (sorting stocks into groups according to some characteristic), which includes a series of sequential, logical steps, chatGPT produced good code that worked immediately. This is impressive and useful: you can quickly put together some code that gives you a decent starting point.

In a more complicated application involving portfolio optimization, I asked chatGPT to create and plot the efficient frontier for a set of assets. Although the code it generated looked good at first glance, chatGPT “cheated” by calling a function that doesn’t exist. In a separate attempt, it did produce code that was related to the task, but which didn’t run correctly and ultimately didn’t solve the problem.

So can we use chatGPT to help write code for Finance? I believe the answer is yes. However, like any tool, we need to use it carefully and for the right purpose. chatGPT is designed to generalize: it doesn’t really know how to solve your specific problem; all it does is arrange words (or code) based on patterns it has seen previously. For simple problems, this may be enough; but for problems that require many specific choices, it’s very unlikely that you can immediately get a good answer that runs without any issues. Nevertheless, I believe chatGPT can be very useful, especially in some situations:

Your problem or application is simple and common. In this case, chatGPT will likely generalize well from the sources it was trained on.
You want to have a skeleton script to get started quickly. chatGPT can help you to quickly structure some code (or an initial version at least) to tackle a problem, as long as you describe it well. The more detail you provide, the better the answer will be.
You want to understand a piece of code. Sometimes we’re using someone else’s code, or even our own code from six months ago, which is not easily readable or not documented properly. If it’s not evident what the code does, or how it does it, we can just ask chatGPT to explain it to us.
You want to achieve a set of manipulations that follow logical, sequential steps which can be easily described in words. If you can describe in words the structure of the data and what you want to do, chatGPT is likely to give you a good answer. This was the case in the portfolio sorting application.
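
To illustrate that last point, here is a minimal sketch of the kind of step-by-step manipulation that is easy to describe in words: sort stocks into size quintiles each month, then average returns within each group. It is not the code chatGPT produced for the portfolio sorting application. The panel is simulated, and the column names (stock, month, mktcap, ret) and the helper assign_quintile are hypothetical. A real application would typically sort on information available before the return is earned.

```r
set.seed(42)

# Hypothetical panel: 100 stocks over 12 months, simulated market caps and returns
panel <- expand.grid(stock = 1:100, month = 1:12)
panel$mktcap <- exp(rnorm(nrow(panel), mean = 7, sd = 1.5))
panel$ret <- rnorm(nrow(panel), mean = 0.01, sd = 0.05)

# Step 1: within each month, assign each stock to a size quintile (1 = smallest)
assign_quintile <- function(x) {
  breaks <- quantile(x, probs = seq(0, 1, by = 0.2))
  cut(x, breaks = breaks, labels = FALSE, include.lowest = TRUE)
}
panel$size_group <- ave(panel$mktcap, panel$month, FUN = assign_quintile)

# Step 2: equal-weighted average return of each group in each month
group_returns <- aggregate(ret ~ month + size_group, data = panel, FUN = mean)

# Step 3: time-series average return of each group
avg_by_group <- tapply(group_returns$ret, group_returns$size_group, mean)
print(avg_by_group)
```

Each step maps directly to one sentence of the description, which is probably why this type of task suits a language model well.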

While working on this quick exploration of using chatGPT to generate R code for simple Finance applications, a few things stood out to me:

The need to thoroughly verify and test any AI-generated code. This one is pretty obvious (even to chatGPT).
You need to work with the tool to refine your prompts to give chatGPT context and correct it when it makes a mistake. The quality of the output will be directly related to how much effort you put into providing context to the tool, which in turn depends on your knowledge of the subject or problem in question. If you don’t know enough of the subject matter to correct the AI, you will end up with poor results. I believe the skills to interact and guide AI in this way will become essential in the future.
I believe we’re in very early days regarding the use of these AI tools. In fact, I fully expect this short piece to already be outdated. I haven’t tested Bing AI yet, but it’s probable that it could do better. Looking back at this post some years from now will be very interesting (not to mention awkward!).
One of my main concerns before writing this post was the possibility of students using chatGPT or other AI tools to cheat in my courses. After having played with it for a short while, I don’t think this concern is where we should put our attention. Sure, in the short term, some students may cheat using chatGPT, and perhaps some kind of watermarking or cheat detection tool may be appropriate. But we should be looking forward. In my opinion, the ubiquity of this type of tool is inevitable. What is needed is to teach people how to use them properly, efficiently, and ethically. I’ve been seeing inspiring examples of how other educators are incorporating AI into their courses.
chatGPT-style AI will become another resource, another tool in our toolbox, although a special tool with which we can interact, ask questions, and describe the problems we’re trying to solve. Like many technological tools, AI will enhance our capabilities and it will be difficult to imagine how things were done previously. But it won’t replace hard work, critical thinking, creativity, or deep understanding of a topic.

¹ Prompt: Please write an introductory paragraph on the following topic: “Can chatGPT be used to write code to solve simple problems in finance?”
² See an example using dcast here.
³ These were the prompts: “Could you redo this, but when you select stocks in step 6, remove the filter for exchcd” and “Thanks but there is an issue. The market capitalization percentile needs to be calculated at the end of each month.”
⁴ Prompt: “Could you write a two-paragraph conclusion to a short essay on whether chatGPT can be useful to write code for finance applications?”

Writing R code for finance with chatGPT

Alexandre Rubesam

2023-02-10