The purpose of this analysis is to create and evaluate a spread-reversion trading strategy between PJX (Invesco Oil & Gas Services ETF) and OIH (VanEck Oil Services ETF). While both ETFs seek to replicate the performance of the same industry, the products track different indicies and trade with very different volumes. These difference have led to periods where performance between the ETFs differ, a return spread. The goal is to identify a trading strategy that capitalizes on the reversion of that return spread towards zero.
The general idea of the trading strategy is to identify periods when one ETF has performed better than the other above a given threshold. Once the return spread has reached the specified threshold, the strategy will short the ETF that has outperformed and buy the ETF that has under performed. If the return spread between the ETFs reverts back towards zero, the strategy should capture profits as the ETF we bough should outperform the ETF we sold. This may not always be the case, as the spread is calculated using a rolling look back window. If the window includes days of large return differences that are rolled off, the spread could revert back towards zero, even though the return of the ETFs since the trade was put on do not differ (or move against us). This could arise dues to the differences in the benchmark construction.
For backtesting there are a number of parameters that need to be set, including the number of days to use to calculate the spread, the level of spread at which we put a trade on, the level of spread at which we take a trade off, a stop loss level to limit run away trades, and trading costs. Once a stop loss level is triggered, the trade is closed, and a new trade can not be opened until the start of next month.
To generate the trade strategy, I opted to create an S4 class object representing a trade (SpreadTrade), in which a single order is placed and tracked, as well as a trading strategy S4 class object (SpreadBT) that collects all of the trades entered during a backtesting period. Using different objects for trades and a strategy allows me to look at the performance of each trade individually and as a whole.
My assumptions in the backtesting is that we can enter a trade on the same day that we get a spread signal, and that we can execute the trade at the closing price. I used the adjusted close price of the ETFs, in order to minimize the complexity of dealing with dividends. The size of the trade is based on the lesser of the capital available and the volume available. Volume available is based on the the lesser of the 15 day moving median dollar volume traded in each ETF divided by 100.
The following functions are used to download data from Ken French’s website.
FFGetDatasetNames <- function(
url = "https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html"){
# Extract DataSet Names and URLs from Ken French Website
#
# Args:
# url: Ken French Website
#
# Returns:
# List containing
# Description: Description of dataset
# Data: List of data sets with descriptions from file
url %>%
rvest::read_html() %>%
rvest::html_nodes("b") %>%
as.character() %>%
tibble::as_tibble(column_name = "value") %>%
dplyr::mutate(`lead2` = lead(`value`, 2)) %>%
dplyr::filter(str_detect(`lead2`, "CSV.zip")) %>%
dplyr::transmute(
`Name` = stringr::str_extract(`value`, "(?<=\\<b\\>).*(?=\\</b\\>)"),
`Link` = stringr::str_extract(`lead2`, "ftp.+zip"))
}
FFExtract <- function(df){
# Extracts description and data for a dataset from Ken French's website.
#
# Args:
# df: Dataframe containing name/description of data followed by data set
# with labels in first rows.
#
# Returns:
# List containing
# Description (char) - the description of the data
# Data (numeric dataframe) - the data
# Location of data description
name.location <- stringr::str_which(df$V1, "[A-z]+")
# If no description give NA otherwise combine multiple rows
if(length(name.location) == 0){
name <- NA
}else{
name <- stringr::str_c(
stringr::str_trim(df[name.location,1][[1]]),
collapse = " ")}
# Location of data
data.location <- stringr::str_which(df$V1, "^\\s*[0-9]{4,}")
data <- df[data.location,] %>% # Clean data frame
`colnames<-`(df[min(data.location)-1,]) %>% # Assign first row as col names
janitor::clean_names() %>% # Make col names clean
dplyr::rename(`Date` = 1) %>% # First col name will be blank so give "Date"
dplyr::mutate_all(as.numeric) # Since df had chars for description convert to num
return(list("Description" = name, "Data" = data))
}
FFDataDownload <- function(dataset = "Fama/French 3 Factor"){
# Downloads CSV Data for Data Ken French's website.
#
# Args:
# dataset: Name of Fama French Data Set
#
# Returns:
# List containing
# Description: Description of dataset
# Data: List of data sets with descriptions from file
childurls <- FFGetDatasetNames()
child <- childurls$Link[which(childurls$Name %in% dataset)]
# Create working URL of child
file.name <-
child %>%
stringr::str_extract("(?<=/).+(?=\\_CSV\\.zip)") %>%
stringr::str_c(".CSV")
# Create temporary file for downloading data
temp <- tempfile()
# Download zip file from Ken French website
download.file(
url =
stringr::str_c(
"https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/",
child),
destfile = temp)
unzip(temp) # Unzip data file
# Read in CSV file
raw.data <- read.csv(
file.name,
header = FALSE,
sep = ",",
col.names = paste0("V",seq_len(200)),
fill = T,
blank.lines.skip = F)
# Remove Leading rows
file.desc <-
raw.data[1:(min(stringr::str_which(raw.data$V1, "^[0-9]{4,}"))-2),1]
raw.data <-
raw.data[(min(stringr::str_which(raw.data$V1, "^[0-9]{4,}"))-2):nrow(raw.data),]
# Split data into different data chunks
data.chunks <- raw.data %>%
dplyr::filter(!stringr::str_detect(.[[1]], "Copyright")) %>%
dplyr::select_if(~!all(is.na(.))) %>%
dplyr::select_if(~!all(. == "")) %>%
dplyr::mutate(
`istext` = ifelse(stringr::str_detect(.[[1]], "[a-z]+"),1,0),
`change` = ifelse(`istext` == 1 & lag(`istext`,1, default = 0) == 0 , 1, 0),
`group` = cumsum(`change`)) %>%
dplyr::select(-c(`istext`, `change`)) %>%
dplyr::group_split(`group`, .keep = F)
# Extract desc/data from chunks
extracted.list <-
lapply(data.chunks, FFExtract)
return(list(
"Description" = file.desc,
"Data" = extracted.list))
}
The following are utility functions. RollApplyFunc is a function to apply a provided function on a rolling basis over a given number of lags. calcSignal calculates the rolling m day return difference between the adjusted returns of security x vs security y. calcLiquidity calculated the rolling minimum median dollar volume traded between two securities over the past 15 periods. It calculates dollar volumn by multiplying the close price by the volume on that day.
RollApplyFunc <- function(x, func, lags, ...){
x_lag_mat <- sapply(lags, lag, x=x)
apply(x_lag_mat, MARGIN = 1,func, ...)
}
# -------------------------------------------------------------------------
calcSignal <- function(data_x, data_y, dates, m){
signal_df <-
merge(
dplyr::select(data_x, `adj_close`),
dplyr::select(data_y, `adj_close`),
by = 0) %>%
dplyr::mutate(
`x_log` = log(`adj_close.x`),
`y_log` = log(`adj_close.y`),
.keep = "unused") %>%
dplyr::mutate(
`x_return` = exp(`x_log` - lag(`x_log`, m)),
`y_return` = exp(`y_log` - lag(`y_log`, m)),
.keep = "unused") %>%
dplyr::mutate(
`signal` = `x_return` - `y_return`,
.keep = "unused") %>%
#dplyr::mutate(`signal` = lag(`signal`,1)) %>%
column_to_rownames("Row.names") %>%
dplyr::filter(rownames(.) %in% seq.Date(dates[1], dates[2], by = "day"))
return(signal_df)
}
# -------------------------------------------------------------------------
calcLiquidity <- function(data_x, data_y, dates){
# Calculate 15 day rolling median of security x
N_x <- data_x %>%
dplyr::mutate(
`avg_px` = (`close`),
`dolvol` = `avg_px` * `volume`,
`N` = RollApplyFunc(`dolvol`, median, 1:16)) %>%
dplyr::select(`N`)
# Calculate 15 day rlling median of security y
N_y <- data_y %>%
dplyr::mutate(
`avg_px` = (`close`),
`dolvol` = `avg_px` * `volume`,
`N` = RollApplyFunc(`dolvol`, median, 1:16)) %>%
dplyr::select(`N`)
# Find minimum of liquidity from x and y
N_df <- merge(N_x, N_y, by = 0) %>%
rowwise(`Row.names`) %>%
dplyr::mutate(`N` = min(`N.x`, `N.y`), .keep = "unused") %>%
ungroup() %>%
column_to_rownames("Row.names") %>%
dplyr::filter(rownames(.) %in% seq.Date(dates[1], dates[2], by = "day"))
return(N_df)
}
For this analysis I opted to use S4 class objects and methods. Below are the class class structures for a trade and a backtest, along with their associated methods and functions.
The structure of the backtest is to loop through each day of data. If the trade signal indicates a new trade should be opened, a “SpreadTrade” object is created. If a “SpreadTrade” is already opened, new data is appended to the object. If the trade signal indicates the trade should be closed, the “SpreadTrade” is closed and appended to a list of trades. The “SpreadBT” contains the list of “SpreadTrade”. Methods for “SpreadTrade” and “SpreadBT” return metrics about each trades profit and losses as well as the profit and losses of the “SpreadBT”.
While this process is not as fast a vectorized approach, it allows for more customization in the backtest because it better replicates the actually environment in which an investment professional would receive and digest additional data. From my experience, it is usually best to actually replicate the investment process during the backtesting stage so additional considerations can be added more easily. Nothing is worse than a PM asking you to build a backtest with certain parameters, then once it is done, asking for additional parameters that require a complete retooling.
# SpreadTrade (S4 Object) -------------------------------------------------
setClass(
"SpreadTrade",
representation(
side = "character",
entrydate = "Date",
exitdate = "Date",
entry_capital = "numeric",
position_x = "numeric",
position_y = "numeric",
entry_gross = "numeric",
dates = "Date",
signal = "numeric",
cash = "numeric",
price_x = "numeric",
price_y = "numeric",
value_net = "numeric",
pnl = "numeric",
pnl_pct = "numeric"))
# SpreadBT (S4 Object) ----------------------------------------------------
setClass(
"SpreadBT",
representation(
init_capital = "numeric",
date_range = "Date",
m = "numeric",
g = "numeric",
j = "numeric",
s = "numeric",
c = "numeric",
dates = "Date",
signal = "list",
trades = "list"))
# Create Spread Trade Function --------------------------------------------
createNewSpreadTrade <- function(
date, side, capital, price_x, price_y, trade_cost, signal){
# Set Position in X and Y based on Spread Trade Side
# Position is closest integer to initial capital supplied
if(side == "Long"){
position_x = capital %/% price_x
position_y = -capital %/% price_y}
else{
position_x = -capital %/% price_x
position_y = capital %/% price_y}
# Calculate the Initial Values of X and Y Traded
value_x <- (price_x * position_x)
value_y <- (price_y * position_y)
# Calculate Gross Value of Positions Traded
entry_gross <- abs(value_x) + abs(value_y)
# Calculate the trading cost in dollars
trade_cost_dollars <- entry_gross * trade_cost
# Calculate the Initial Cash balance
# Initial Capital - Purchases + Short Sales - Trading Costs
cash <- capital - value_x - value_y - trade_cost_dollars
# Calculate the net value of the trade
# Cash + Long Position Value - Short Position Value
value_net <- cash + value_x + value_y
# Calculate the Profit & Loss based on the Capital Supplied
pnl <- value_net - capital
# Calculate the Profit & Loss Percent based on the Gross Entry Value
# This is used for stop losses
pnl_pct <- pnl / entry_gross
new("SpreadTrade",
side = side,
entrydate = as.Date(date),
exitdate = as.Date(NA),
entry_capital = capital,
position_x = position_x,
position_y = position_y,
entry_gross = entry_gross,
dates = date,
signal = signal,
cash = cash,
price_x = price_x,
price_y = price_y,
value_net = value_net,
pnl = pnl,
pnl_pct = pnl_pct)
}
# UpdateSpreadTrade Method ------------------------------------------------
setGeneric("UpdateSpreadTrade", function(object, date, price_x, price_y, signal) standardGeneric("UpdateSpreadTrade"))
## [1] "UpdateSpreadTrade"
setMethod("UpdateSpreadTrade", signature(object = "SpreadTrade"),
function(object, date, price_x, price_y, signal){
if(!is.na(object@exitdate)) return(object)
value_net <-
(object@position_x * price_x) +
(object@position_y * price_y) +
dplyr::last(object@cash)
pnl <- value_net - object@entry_capital
pnl_pct <- pnl / object@entry_gross
object@dates <- c(object@dates, date)
object@signal <- c(object@signal, signal)
object@price_x <- c(object@price_x, price_x)
object@price_y <- c(object@price_y, price_y)
object@value_net <- c(object@value_net, value_net)
object@pnl <- c(object@pnl, pnl)
object@pnl_pct <- pnl_pct
return(object)
})
# CloseSpreadTrade Method -------------------------------------------------
setGeneric("CloseSpreadTrade", function(object, trade_cost) standardGeneric("CloseSpreadTrade"))
## [1] "CloseSpreadTrade"
setMethod("CloseSpreadTrade", signature(object = "SpreadTrade"),
function(object, trade_cost){
last_date <- dplyr::last(object@dates)
object@exitdate <- last_date
trade_cost_dollars <-
trade_cost *
(abs(object@position_x * dplyr::last(object@price_x)) +
abs(object@position_y * dplyr::last(object@price_y)))
value_net <- dplyr::last(object@value_net) - trade_cost_dollars
pnl <- value_net - object@entry_capital
object@value_net <- c(object@value_net[-length(object@value_net)], value_net)
object@pnl <- c(object@pnl[-length(object@pnl)], pnl)
return(object)
})
# getPNL Method -----------------------------------------------------------
setGeneric("getCumPNL", function(object,...) standardGeneric("getCumPNL"))
## [1] "getCumPNL"
setMethod("getCumPNL", signature(object = "SpreadTrade"),
function(object,...){
data.frame(
"date" = object@dates,
"pnl" = object@pnl,
row.names = "date")
})
setMethod("getCumPNL", signature(object = "SpreadBT"),
function(object, ...){
getPNL(object) %>%
dplyr::mutate(`pnl` = cumsum(`pnl`)) %>%
tibble::column_to_rownames("date")
})
# getPNLChange Method -----------------------------------------------------
setGeneric("getPNL", function(object, ...) standardGeneric("getPNL", ...))
## [1] "getPNL"
setMethod("getPNL", signature(object = "SpreadTrade"),
function(object, ...){
CumPNL <- getCumPNL(object)
CumPNL - dplyr::lag(CumPNL$pnl, n=1, defaul = 0)
}
)
setMethod("getPNL", signature(object = "SpreadBT"),
function(object, ...){
if(length(object@trades) == 0){
return(
data.frame(
"date" = seq.Date(object@date_range[1], object@date_range[2], "day"),
"pnl" = 0))
}
lapply(object@trades, getPNL) %>%
dplyr::bind_rows() %>%
tibble::rownames_to_column("date") %>%
dplyr::mutate(`date` = as.Date(`date`)) %>%
dplyr::group_by(`date`) %>%
dplyr::summarise(`pnl` = sum(`pnl`)) %>%
dplyr::ungroup() %>%
dplyr::full_join(
data.frame(
'date' = seq.Date(object@date_range[1], object@date_range[2], "day")),
by = "date") %>%
dplyr::arrange(`date`) %>%
dplyr::mutate(`pnl` = replace_na(`pnl`, 0))
})
# getTotalPNL Method ------------------------------------------------------
setGeneric("getTotalPNL", function(object,...) standardGeneric("getTotalPNL"))
## [1] "getTotalPNL"
setMethod("getTotalPNL", signature(object = "SpreadTrade"),
function(object, ...){
dplyr::last(object@pnl)
})
setMethod("getTotalPNL", signature(object = "SpreadBT"),
function(object, ...){
getCumPNL(object) %>%
dplyr::last() %>%
as.numeric()
})
# getPNLPct Method --------------------------------------------------------
setGeneric("getPNLPct", function(object, ...) standardGeneric("getPNLPct"))
## [1] "getPNLPct"
setMethod("getPNLPct", signature(object = "SpreadBT"),
function(object, ...){
as.numeric(getTotalPNL(object) / object@init_capital)
})
setGeneric("getPeriodicReturn", function(object, ...) standardGeneric("getPeriodicReturn"))
## [1] "getPeriodicReturn"
setMethod("getPeriodicReturn", signature(object = "SpreadBT"),
function(object, ...){
cumpnl <- getCumPNL(object) + object@init_capital
cumpnl / lag(cumpnl$pnl, 1, default = object@init_capital) - 1
})
# getSummaryStats Method -----------------------------------------------___
setGeneric("getSummaryStats", function(object, ...) standardGeneric("getSummaryStats"))
## [1] "getSummaryStats"
setMethod("getSummaryStats", signature(object = "SpreadBT"),
function(object, ...){
TotalPNL <- getTotalPNL(object)
ReturnonCapital <- getPNLPct(object)
pnlpct <- as.double(getPeriodicReturn(object)$pnl)
annmean <- mean(pnlpct) * 252
annvol <- sd(pnlpct) * sqrt(252)
downside_vol <- sd(ifelse(pnlpct <0 , pnlpct, 0)) * sqrt(252)
skew <- Skew(pnlpct)
kurtosis <- Kurt(pnlpct)
Sharpe <- annmean / annvol
Sortino <- annmean / downside_vol
NAV <- getCumPNL(object)$pnl + object@init_capital
previous_peaks <- cummax(NAV)
drawdowns <- (NAV - previous_peaks) / previous_peaks
max_drawdown <- min(drawdowns)
NTrades <- length(object@trades)
MostCapTrade <- max(sapply(object@trades, \(x) x@entry_capital))
PctProftTrades <- length(which(sapply(object@trades, getTotalPNL)>0)) / NTrades
BestTrade <- max(sapply(object@trades, \(x) getTotalPNL(x)/x@entry_capital))
WorstTrade <- min(sapply(object@trades, \(x) getTotalPNL(x)/x@entry_capital))
AvgTradeLen <- mean(sapply(object@trades, \(x) x@exitdate-x@entrydate))
cat(
paste("Total Profit/Loss:", paste0("$",formatC(TotalPNL, format="f", digits = 2, big.mark = ",")), sep = "\t"),
paste("Return on Capital:", paste0(formatC(ReturnonCapital * 100, digits = 2),"%"), sep = "\t"),
paste("Ann. Mean Returns:", paste0(formatC(annmean * 100, digits = 2),"%"), sep = "\t"),
paste("Ann. Return Vol:", paste0(formatC(annvol * 100, digits = 3),"%"), sep = "\t"),
paste("Sharpe Ratio (no rf):", formatC(Sharpe, digits = 3), sep = "\t"),
paste("Sortino Ratio (no rf):", formatC(Sortino, digits = 3), sep = "\t"),
paste("Skewness:", formatC(skew, digits = 3), sep = "\t\t"),
paste("Excess Kurtosis", formatC(kurtosis, digits = 3), sep = "\t\t"),
paste("Max Drawdown:", paste0(formatC(max_drawdown * 100, digits = 3),"%"), sep = "\t\t"),
paste("# of Trades Entered:", NTrades, sep = "\t"),
paste("% of Trades Profitable:", paste0(formatC(PctProftTrades*100, digits = 3),"%"), sep = "\t"),
paste("Most Capital Used:", paste0("$",formatC(MostCapTrade, format="f", digits = 2, big.mark = ",")), sep = "\t"),
paste("Best Trade Return:",paste0(formatC(BestTrade * 100, digits = 3),"%"), sep = "\t"),
paste("Worst Trade Return:", paste0(formatC(WorstTrade * 100, digits = 3),"%"), sep = "\t"),
paste("Average Trade Length:", paste(formatC(AvgTradeLen, digits=2), "Days"), sep = "\t"),
sep = "\n"
)
})
# -------------------------------------------------------------------------
SpreadTradeStrategy <- function(data_x, data_y, dates, capital, m, g, j, s, c){
init_capital <- capital
N_df <- calcLiquidity(data_x, data_y, dates)
signal_df <- calcSignal(data_x, data_y, dates, m)
# Initial Set of Values for Loop
trade <- NULL
stop_loss <- F
trades <- list()
# Main Loop Through Backtest -----------------------------------
for(date in as.character(seq.Date(dates[1], dates[2], by = "day"))){
d <- as.Date(date)
signal <- signal_df[date,"signal"]
N <- N_df[date, "N"] / 100
price_x <- data_x[date, 'adj_close']
price_y <- data_y[date, 'adj_close']
# Check for Stop Loss ----------------------------------------
# If stop loss and first of the month
# clear stop loss
if(stop_loss & lubridate::day(d) == 1){
stop_loss <- F
}
# If stop loss and not first of the month
# next
if(stop_loss & lubridate::day(d) > 1){
next()
}
# If missing data
# next
if(is.na(signal) || is.na(N) || is.na(price_x) || is.na(price_y)){
next()
}
# Close Last Day ---------------------------------------------
# Check for last day in test & if open trade
# Update and close trade
if(date == last(rownames(signal_df)) && !is.null(trade)){
trade <- UpdateSpreadTrade(trade, d, price_x, price_y, signal)
trade <- CloseSpreadTrade(trade, c)
capital <- capital + getTotalPNL(trade)
trades <- c(trades, trade)
trade <- NULL
next()
}
# Signal Above High Level ------------------------------------
# If signal is greater than the high level & No open trade
# Create New Short Trade
if(signal >= g && is.null(trade)){
trade <- createNewSpreadTrade(
d, "Short", min(capital, N), price_x, price_y, c, signal)
next()
}
# If signal is greater than the high level & open trade is Long
# Close Long, add it to trades list, and open short trade
if(signal >= g && trade@side == "Long"){
trade <- UpdateSpreadTrade(trade, d, price_x, price_y, signal)
trade <- CloseSpreadTrade(trade, c)
capital <- capital + getTotalPNL(trade)
trades <- c(trades, trade)
trade <- createNewSpreadTrade(d, "Short", min(capital, N), price_x, price_y, c, signal)
next()
}
# If signal is greater than the high level & open trade is short
# Update trade, check for stop loss
if(signal >= g && trade@side == "Short"){
trade <- UpdateSpreadTrade(trade, d, price_x, price_y, signal)
# If stop loss triggered
# Close and set stop loss flag
if(trade@pnl_pct < s){
trade <- CloseSpreadTrade(trade, c)
capital <- capital + getTotalPNL(trade)
trades <- c(trades, trade)
trade <- NULL
stop_loss <- T
}
next()
}
# Signal Below Negative High Level ---------------------------
# If signal is less than the neg high level & no open trade
# Create new Long Trade
if(signal <= -g && is.null(trade)){
trade <- createNewSpreadTrade(d, "Long", min(capital, N), price_x, price_y, c, signal)
next()
}
# If signal is less than neg high level & open trade is short
# Close Short, add it to trades list, and open long short
if(signal <=-g && trade@side == "Short"){
trade <- UpdateSpreadTrade(trade, d, price_x, price_y, signal)
trade <- CloseSpreadTrade(trade, c)
trades <- c(trades, trade)
capital <- capital + getTotalPNL(trade)
trade <- createNewSpreadTrade(d, "Long", min(capital, N), price_x, price_y, c, signal)
next()
}
# If signal is less than neg high level & open trade is long
# Update trade, check for stop loss
if(signal <= -g && trade@side == "Long"){
trade <- UpdateSpreadTrade(trade, d, price_x, price_y, signal)
# If stop loss triggered
# Close and set stop loss flag
if(trade@pnl_pct < s){
trade <- CloseSpreadTrade(trade, c)
trades <- c(trades, trade)
capital <- capital + getTotalPNL(trade)
trade <- NULL
stop_loss <- T
}
next()
}
# Signal Between Levels --------------------------------------
# If signal is between levels and no open trade
if(signal >= j && signal < g && is.null(trade)){
next()
}
# If signal is between levels and open trade is short
# Update trade
if(signal >= j && signal < g && trade@side == "Short"){
trade <- UpdateSpreadTrade(trade, d, price_x, price_y, signal)
next()
}
# If signal is between levels and open trade is long
# Close trade
if(signal >=j && signal < g && trade@side == "Long"){
trade <- UpdateSpreadTrade(trade, d, price_x, price_y, signal)
trade <- CloseSpreadTrade(trade, c)
trades <- c(trades, trade)
capital <- capital + getTotalPNL(trade)
trade <- NULL
next()
}
# Signal Between Negative Levels -----------------------------
# If signal is between negative levels and no open trade
# Do nothing
if(signal <= -j && signal > -g && is.null(trade)){
next()
}
# If signal is between negative levels and open trade is long
# Update trade
if(signal <= -j && signal > -g && trade@side == "Long"){
trade <- UpdateSpreadTrade(trade, d, price_x, price_y, signal)
next()
}
# If signal is between negative levels and open trade is short
# Close trade
if(signal <= -j && signal >-g && trade@side == "Short"){
trade <- UpdateSpreadTrade(trade, d, price_x, price_y, signal)
trade <- CloseSpreadTrade(trade, c)
trades <- c(trades, trade)
capital <- capital + getTotalPNL(trade)
trade <- NULL
next()
}
# Signal Between Lower Levels --------------------------------
# If signal between lower levels and no trade
# do nothing
if(signal < j && signal > -j && is.null(trade)){
next()
}
# If Signal between lower levels and trade
# Close Trade
if(signal < j && signal > -j && !is.null(trade)){
trade <- UpdateSpreadTrade(trade, d, price_x, price_y, signal)
trade <- CloseSpreadTrade(trade, c)
trades <- c(trades, trade)
capital <- capital + getTotalPNL(trade)
trade <- NULL
next()
}
}
new("SpreadBT",
init_capital = init_capital,
date_range = dates,
m = m, g = g, j = j, s = s, c = c,
dates = as.Date(rownames(signal_df)),
signal = signal_df,
trades = trades)
}
Here we will download the necessary data for the analysis. In addition to the quandl data for PJX and OIH, I also downloaded quandl data for SVOL, as well as the Fama/French Developed 3 Factors. These data sets will be used to further analysis of the spread-reversion strategy.
PJX_df <-
Quandl.datatable('QUOTEMEDIA/PRICES', ticker = "PXJ") %>%
dplyr::arrange(`date`) %>%
tibble::column_to_rownames("date")
OIH_df <-
Quandl.datatable('QUOTEMEDIA/PRICES', ticker = "OIH") %>%
dplyr::arrange(`date`) %>%
tibble::column_to_rownames("date")
SVOL_df <-
Quandl.datatable('QUOTEMEDIA/PRICES', ticker = "SVOL") %>%
dplyr::arrange(`date`)
FF3Developed <-
FFDataDownload("Fama/French Developed 3 Factors [Daily]") %>%
purrr::pluck("Data") %>%
purrr::pluck(1) %>%
purrr::pluck("Data")
The two ETFs used in this analysis, PJX and OIH, track the same industry, but with very different volumes. ETFs are baskets of stocks that are traded on exchanges. Since the demand for the ETFs and the demand for the underlying securities in the ETF may differ, there are times where the price of the ETF and the net asset value (NAV) of the underlying securities in the ETF differ, Authorized participants are allowed to create new ETF shares by buying the underlying securities in the correct proportions, or redeem ETFs by buying the ETF shares and selling the underlying holdings. If the price of the ETF is higher than the NAV, the authorized participants are incentivized to create new ETF shares, since they can sell them at a higher cost than the price to purchase the securities to make the ETF. Conversely, if the NAV of the ETF is higher than the price, the authorized participants are incentivized to redeem the ETF shares, by purchasing the ETF and selling the underlying securities at a higher price. This process keeps the NAV and the price of the ETF close to the same value.
The difference in volumes may lead to periods where the performance differs between the two, since ETF authorized participants may let the price of the ETF drift away from the NAV if they do not believe they will be able to buy or sell the ETF without impacting the price enough that profits are impacted. We would suspect that with two ETFs that track the performance of similar companies to have slightly different returns if the volumes are very different, since the redemption and creation thresholds may be different.
The chart below shows the 15 day rolling median dollar volume traded in PJX and OIH over our analysis period. It is clear that OIH has much more liquidity than PJX over the time frame. For the strategy analysis, we will therefore be using the liquidity available in PJX.
PJX_dolvol_15DMM <-
PJX_df %>%
rownames_to_column("date") %>%
dplyr::mutate(
`date` = as.Date(`date`),
`avg_px` = (`high` + `low`)/2,
`dolvol` = `avg_px` * `volume`,
`dolvol_15DMM` = RollApplyFunc(`dolvol`, median, 1:16)) %>%
dplyr::filter(dplyr::between(`date`, analysis_start, analysis_end)) %>%
dplyr::select(`date`, `ticker`, `dolvol_15DMM`)
OIH_dolvol_15DMM <-
OIH_df %>%
rownames_to_column("date") %>%
dplyr::mutate(
`date` = as.Date(`date`),
`avg_px` = (`high` + `low`)/2,
`dolvol` = `avg_px` * `volume`,
`dolvol_15DMM` = RollApplyFunc(`dolvol`, median, 1:16)) %>%
dplyr::filter(dplyr::between(`date`, analysis_start, analysis_end)) %>%
dplyr::select(`date`, `ticker`, `dolvol_15DMM`)
plot_dolvol_15DMM <-
dplyr::bind_rows(PJX_dolvol_15DMM, OIH_dolvol_15DMM) %>%
ggplot() +
aes(x = date, y = dolvol_15DMM, colour = ticker) +
geom_line() +
scale_y_continuous(labels = scales::dollar_format()) +
scale_color_hue(direction = 1) +
theme_minimal() +
labs(
title = "Rolling 15 Day Median Estimated Dollar Volume of PJX and OIH",
subtitle = "Rolling 15 Day Median of Volume * Average(High Price, Low Price)",
x = "Date",
y = "Estimated Dollar Volume",
legend = "Ticker")
plot_dolvol_15DMM
We will test a variety of parameter sets for backtesting the spread trade, and then compare the performance by looking at the return on capital of the strategy. The parameters include: m: The number of days used to calculate the return spread between ETF1 and ETF2 g: The outer signal level. When the signal crosses above this level the strategy will enter a new short spread trade. When the signal crosses below the -1 * this level the strategy will enter a new long spread trade. j: The inner signal. If a short spread trade is open and the signal crosses below this level, the short trade is closed. If a long spread trade is open and the signal crosses above -1 * this level, the long trade is closed. s: The stop loss level. If the (PNL of a trade) / (gross exposure of trade) crosses below this level, the trade is closed and a new trade can not be opened until the first day of the next month. c: Trading cost as a percent of the notional traded.
For tuning the model well will look at “m” in a range from 2 days to 63 days with steps of 2, 5, 10, 20, and 63 days. “g” in a range from 1% to 5% in intervals of 1%, as well as 10%. “j” in a range from 0 to 2.5% with intervals of 0.5%, as well as 5%. And “s” values of -2%, -5%, -10% and -20%. Excluding situations where j greater than or equal to g, we are left with 600 permutations of parameters. We will set the initial capital to $8,471,176, which is 2x the maximum of N over the entire analysis period and set the trading cost to 0.001%
The backtest is run on all of the hyperparameters, and hyperparameters where no trades where made are excluded.
params <-
expand.grid(
"m" = c(2, 5, 10, 20, 63),
"g" = c(seq(.01, .05, .01),0.1),
"j" = c(seq(0, .025, .005),0.5),
"s" = c(-0.02, -0.05, -0.1, -0.2)) %>%
dplyr::filter(`g` > `j`)
param_tuning <-
apply(
params,
1,
\(x) SpreadTradeStrategy(
PJX_df,
OIH_df,
c(analysis_start, analysis_end),
capital = 8471176,
m = x[1],
g = x[2],
j = x[3],
s = x[4],
c = 0.00001))
param_tuning_BT_PNL <- sapply(param_tuning, getPNLPct)
param_tuning_BT_Ntrade <- sapply(param_tuning, \(x) length(x@trades))
param_tuning_results <-
params %>%
dplyr::bind_cols(
"trades" = param_tuning_BT_Ntrade,
"return" = param_tuning_BT_PNL) %>%
dplyr::filter(`trades` > 0) %>%
dplyr::mutate(
`m` = as.factor(`m`),
`g` = as.factor(`g`),
`j` = as.factor(`j`),
`s` = as.factor(`s`))
We can then look at different combinations of parameters and the resulting returns to the backtested spread trading strategy.
First we will look at m, g, and j. Each facet represents a different m parameter, with the x axis representing our j parameter, and the y axis representing our g parameter. The color of the box indicates the median profit level of the strategy using the combination of the three hyperparameters.
The highest returns seem to come from when m is set to 2 days, a very short spread window.
param_tuning_results %>%
dplyr::group_by(`m`, `g`, `j`) %>%
dplyr::summarise(`return` = max(`return`)) %>%
ggplot() +
aes(x = j, y = g, fill = return) +
geom_tile() +
scale_fill_distiller(palette = "RdBu", direction = 1) +
labs(
x = "Hyper Parameter J",
y = "Hyper Parameter G",
title = "PJX vs OIH Spread Trade Strategy Returns by Hyperparameters M, G, and J",
fill = "Strategy Return") +
theme_minimal() +
facet_wrap(vars(m))
## `summarise()` has grouped output by 'm', 'g'. You can override using the
## `.groups` argument.
Filtering the results to just hyperparameter sets where m is 2, we then look at the returns for different combinations of s, g, and j parameters. The chart below shows that all of the returns are the same, signifying that the stop loss was no triggered in any of the trades. To be safe we will set the stop loss to -2%.
The chart also shows that the optimal g and j parameters are 1% and 0.5%.
param_tuning_results %>%
filter(`m` == 2) %>%
#dplyr::group_by(`s`, `g`, `j`) %>%
#dplyr::summarise(`return` = median(`return`)) %>%
ggplot() +
aes(x = j, y = g, fill = return) +
geom_tile() +
scale_fill_distiller(palette = "RdBu", direction = 1) +
labs(
x = "Hyper Parameter J",
y = "Hyper Parameter G",
title = "Spread Trade Strategy Returns by Hyperparameters S, G, and J",
subtitle = "m parameter set to 2",
fill = "Strategy Return") +
theme_minimal() +
facet_wrap(vars(s))
For the following analysis we will look at the performance of the trading strategy using the parameters m = 2, g = 1%, j = 0.5%, s = -2%, and c = 0.001%.
Strategy <- SpreadTradeStrategy(
PJX_df,
OIH_df,
c(analysis_start, analysis_end),
capital = 8471176,
m = 2,
g = 0.01,
j = 0.005,
s = -0.02,
c = 0.00001)
The chart below shows the 2 day return spread between PJX and OIH over the entirety of the analysis period. Superimposed on the chart are the trade entering and trade exiting levels. When the spread moves above the red line, the strategy will sell the spread, selling PJX and buying OIH in equal dollar volumes, rounded down to the nearest whole share. When the spread moved below the green line, the strategy will buy the spread, selling OIH and buying PJX. If the spread moves from above to below (below to above) the upper (lower) black lines, the strategy will close the open trade.
signal <-
data.frame(
"date" = Strategy@dates,
"signal" = Strategy@signal)
trades <- lapply(
Strategy@trades,
\(x) data.frame(
"date" = c(x@entrydate, x@exitdate),
"trade" = c(
ifelse(x@side == "Long",1,-1),
ifelse(x@side == "Long",-1,1)))) %>%
dplyr::bind_rows() %>%
dplyr::summarise(`trade` = sum(`trade`), .by = `date`)
trade_signal_data <-
dplyr::full_join(signal, trades, by = "date") %>%
dplyr::mutate(
`side` = dplyr::case_when(
`trade` > 0 ~ "buy",
`trade` < 0 ~ "sell",
.default = NA),
`sizeabs` = abs(`trade`))
trade_signal_data %>%
ggplot() +
aes(x = date, y = signal) +
geom_line(colour = "#112446") +
theme_minimal() +
labs(
x = "Date",
y = "2 Day Return Spread",
title = "PJX vs OIH Spread Reversion Strategy - 2 Day Return Spread",
subtitle = "Dotted Lines represent Entry and Exit Points for Trades") +
scale_y_continuous(labels = scales::percent_format()) +
geom_hline(yintercept=0.01, linetype='dotted', colour = "red")+
geom_hline(yintercept=0.005, linetype='dotted') +
geom_hline(yintercept=-0.01, linetype='dotted', colour = "green")+
geom_hline(yintercept=-0.005, linetype='dotted')
The chart below shows the points at which the strategy buys and sells the spreads. Each dot represents a trade, with larger dots representing that a long (short) spread trade was closes and a new short (long) spread trade was opened on the same day. This occurs when the spread moves dramatically from one signal to the other.
trade_signal_data %>%
dplyr::mutate(`signal_t` = ifelse(is.na(`trade`), NA, `signal`)) %>%
dplyr::mutate(`Trade Size` = as.factor(sizeabs)) %>%
ggplot() +
aes(x = date, y = signal) +
geom_line(colour = "#112446") +
theme_minimal() +
labs(
x = "Date",
y = "2 Day Return Spread",
title = "PJX vs OIH Spread Reversion Strategy - 2 Day Return Spread",
subtitle = "Dotted Lines represent Entry and Exit Points for Trades
Dots Represent Trades
Trade Size of 2 is Close of Previous Trade and Open of Opposite Side") +
scale_y_continuous(labels = scales::percent_format()) +
geom_hline(yintercept=0.01, linetype='dotted', colour = "red")+
geom_hline(yintercept=0.005, linetype='dotted') +
geom_hline(yintercept=-0.01, linetype='dotted', colour = "green")+
geom_hline(yintercept=-0.005, linetype='dotted') +
geom_point(aes(x = date, y = signal_t, colour = side, size = `Trade Size`), shape = "circle") +
scale_color_manual(
values = c(buy = "#00A658",sell = "#A50026"))
## Warning: Using size for a discrete variable is not advised.
## Warning: Removed 378 rows containing missing values (`geom_point()`).
The chart below shows the cumulative dollar profit/loss of the strategy over the analysis period. Over the entire analysis period, the strategy returned a total profit of $1,888.82.
getCumPNL(Strategy) %>%
rownames_to_column("date") %>%
dplyr::mutate(`date` = as.Date(`date`)) %>%
ggplot() +
aes(x = date, y = pnl) +
geom_line(colour = "#112446") +
labs(
x = "Date",
y = "Strategy Cumulative Profit/Loss",
title = "PJX vs OIH Spread Reversion Strategy - Cumulative Profit/Loss",
subtitle = "m = 2, g = 1%, j = 0.5%, s = -2%") +
theme_minimal() +
scale_y_continuous(labels = scales::dollar_format())
Below are some summary statistics about the PJX vs OIH spread reversion strategy.
getSummaryStats(Strategy)
## Total Profit/Loss: $1,888.82
## Return on Capital: 0.022%
## Ann. Mean Returns: 0.0082%
## Ann. Return Vol: 0.00563%
## Sharpe Ratio (no rf): 1.46
## Sortino Ratio (no rf): 3.07
## Skewness: 3.52
## Excess Kurtosis 34.8
## Max Drawdown: -0.00517%
## # of Trades Entered: 48
## % of Trades Profitable: 60.4%
## Most Capital Used: $39,822.39
## Best Trade Return: 1.47%
## Worst Trade Return: -3.74%
## Average Trade Length: 2.4 Days
The strategy produced a total of $1,888.82 in profits which equates to a return on capital of 0.022%. Assuming a risk free rate of zero, the strategy had a Sharpe Ratio of 1.46 and a Sortino Ratio of 3.07. The distribution of returns had a positive skew and high excess kurtosis. It had a max-drawdown of capital of -0.005%.
Looking at the trade level details, the strategy had 48 trades opened over the analysis period, with 60% of the trades generating profits after trading costs. At most, the trades deployed ~$40k in capital, representing less than 0.5% of the initial capital. On average, trades were closed in less than 3 days of opening. The best trade returned 1.5% on the capital deployed, and the worst trade lost 3.7%.
Overall, the strategy was profitable and had a decent return per unit of risk, but was not very profitable in terms of the initial capital since at most we could only deploy 0.5% of the initial capital, as outlined in the assignment. If we use the trade with the most capital deployed as the denominator, the strategy returned 4.7% ($1,888.82 / $39,822.39) after trading fees.
The regression results below show the PJX vs OIH return spread reversion strategy’s beta to the Fama/French 3 Factor Developed Market returns. The results show that while the beta to the Mkt - RF, SMB (Small - Big Market Value) and HML (High Minus Low Book Yield) are positive, none are statistically significant from zero.
FF3Developed_analysis <-
FF3Developed %>%
dplyr::mutate(`date` = as.Date(as.character(`Date`), format = "%Y%m%d"), .keep = "unused") %>%
dplyr::select(`date`, dplyr::everything()) %>%
dplyr::filter(dplyr::between(`date`, analysis_start, analysis_end)) %>%
dplyr::full_join(
getPeriodicReturn(Strategy) %>%
tibble::rownames_to_column("date") %>%
dplyr::mutate(`date` = as.Date(`date`)) %>%
dplyr::rename(`return` = `pnl`),
by = "date")
summary(lm(`return` ~ `mkt_rf` + `smb` + `hml`, data = FF3Developed_analysis))
##
## Call:
## lm(formula = return ~ mkt_rf + smb + hml, data = FF3Developed_analysis)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.010e-05 -6.230e-07 -4.570e-07 -2.770e-07 3.676e-05
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.614e-07 1.914e-07 2.411 0.0163 *
## mkt_rf 1.854e-07 2.019e-07 0.918 0.3590
## smb 3.703e-07 4.687e-07 0.790 0.4299
## hml 2.737e-07 2.983e-07 0.917 0.3594
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.2e-06 on 484 degrees of freedom
## (196 observations deleted due to missingness)
## Multiple R-squared: 0.003091, Adjusted R-squared: -0.003088
## F-statistic: 0.5003 on 3 and 484 DF, p-value: 0.6822
The chart below shows the close price of SVOL, with the PJX vs OIH spread reversion strategy trades overlayed on top. The chart indicates that the strategy places trades regardless of SVOL adjusted close price. This would theoretically mean that the strategy places the same amount of trades when volatility is high and low since SVOL is a short VIX futures ETF. However, there are a lot of differences between VIX and trading VIX futures. VIX futures exhibit significant negative roll yield in most periods, aside from when the VIX spikes. Shorting VIX futures results in a positive roll yield in most time horizons. For now I will assume that SVOL is a good indicator for the inverse of VIX.
trade_svol_data <-
SVOL_df %>%
dplyr::filter(dplyr::between(`date`, analysis_start, analysis_end)) %>%
dplyr::full_join(trades, by = "date") %>%
dplyr::mutate(
`side` = dplyr::case_when(
`trade` > 0 ~ "buy",
`trade` < 0 ~ "sell",
.default = NA),
`sizeabs` = abs(`trade`))
trade_svol_data %>%
dplyr::mutate(`svol_t` = ifelse(is.na(`trade`), NA, `adj_close`)) %>%
dplyr::mutate(`Trade Size` = as.factor(sizeabs)) %>%
ggplot() +
aes(x = date, y = adj_close) +
geom_line(colour = "#112446") +
theme_minimal() +
labs(
x = "Date",
y = "SVOL Adjusted Close Price",
title = "SVOL Adj Close & PJX vs OIH Spread Reversion Strategy Trades",
subtitle = "Dots represent PJX vs OIH Spread Reversion Strategy Trades
Trade Size of 2 is Close of Previous Trade and Open of Opposite Side") +
scale_y_continuous(labels = scales::dollar_format()) +
geom_point(aes(x = date, y = svol_t, colour = side, size = `Trade Size`), shape = "circle") +
scale_color_manual(
values = c(buy = "#00A658",sell = "#A50026"))
## Warning: Using size for a discrete variable is not advised.
## Warning: Removed 378 rows containing missing values (`geom_point()`).
We can test to see if the strategy is more likely to trade when SVOL is lower or higher by comparing the adjusted close price of SVOL when there are no trades on vs the adjusted close price of SVOL when we enter into a new spread trade. The statistics below show that the two-sample z statistic of the mean difference is 1.3, indicating that the the level of SVOL does not influence if a trade is entered or not.
trade_entry_dates <- sapply(Strategy@trades, \(x) as.character(x@entrydate))
trade_on_dates <- unlist(sapply(Strategy@trades, \(x) as.character(x@dates)))
SVOL_notrades <-
SVOL_df %>%
dplyr::filter(dplyr::between(`date`, analysis_start, analysis_end)) %>%
dplyr::filter(!(`date` %in% as.Date(trade_on_dates)))
mean_SVOL_notrades <- mean(SVOL_notrades$adj_close)
var_SVOL_notrades <- var(SVOL_notrades$adj_close)
SVOL_trades <-
SVOL_df %>%
dplyr::filter(`date` %in% as.Date(trade_entry_dates))
mean_SVOL_trades <- mean(SVOL_trades$adj_close)
var_SVOL_trades <- var(SVOL_trades$adj_close)
z = (mean_SVOL_notrades - mean_SVOL_trades) /
sqrt((var_SVOL_notrades/nrow(SVOL_notrades)) + (var_SVOL_trades/nrow(SVOL_trades)))
cat(
paste("SVOL Close Mean With No Trades:", paste0("$",formatC(mean_SVOL_notrades, digits = 4)), sep="\t\t"),
paste("SVOL Close Var With No Trades:", paste0("$",formatC(var_SVOL_notrades, digits = 2)), sep="\t\t"),
paste("SVOL Close Mean When Trades Entered:", paste0("$",formatC(mean_SVOL_trades, digits = 4)), sep="\t"),
paste("SVOL Close Var When Trades Entered:", paste0("$",formatC(var_SVOL_trades, digits = 2)), sep="\t"),
paste("Z Statistic of Mean Difference:", formatC(z, digits = 2), sep ="\t\t"),
sep = "\n"
)
## SVOL Close Mean With No Trades: $22.29
## SVOL Close Var With No Trades: $0.48
## SVOL Close Mean When Trades Entered: $22.15
## SVOL Close Var When Trades Entered: $0.55
## Z Statistic of Mean Difference: 1.3
We then look to see if the profitability of the strategy is a factor of the return of SVOL. Regressing the PJX - OIH return spread reversion strategy returns on the returns of SVOL. The results below show that there is a negative beta to SVOL returns, but it is not statistically significant from zero. We can therefore conclude that SVOL does not impact the frequency of trades, or the profitability of the strategy.
SVOL_analysis <-
SVOL_df %>%
dplyr::mutate(`svol_return` = `adj_close` / dplyr::lag(`adj_close`) -1) %>%
dplyr::filter(dplyr::between(`date`, analysis_start, analysis_end)) %>%
dplyr::full_join(
getPeriodicReturn(Strategy) %>%
tibble::rownames_to_column("date") %>%
dplyr::mutate(`date` = as.Date(`date`)) %>%
dplyr::rename(`return` = `pnl`),
by = "date")
summary(lm(`return` ~ `svol_return`, data = SVOL_analysis))
##
## Call:
## lm(formula = return ~ svol_return, data = SVOL_analysis)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.006e-05 -5.170e-07 -4.670e-07 -4.280e-07 3.697e-05
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.729e-07 1.968e-07 2.403 0.0167 *
## svol_return -6.667e-06 2.075e-05 -0.321 0.7481
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.272e-06 on 469 degrees of freedom
## (213 observations deleted due to missingness)
## Multiple R-squared: 0.0002201, Adjusted R-squared: -0.001912
## F-statistic: 0.1032 on 1 and 469 DF, p-value: 0.7481
In conclusion, the optimal PJX - OIH return spread reversion strategy looks at the trailing 2 day return spread, enters trades when the spread is above 1% or below -1% and closes trades when the spread crosses below 0.5% or above -0.5%. This strategy only returns 0.022% on the capital of $8,471,176, but the trades have a hit rate of 60%. Overall, the risk adjusted returns are decent, with a Sharpe of 1.46, but the initial capital provided is not used enough to generate a high return on capital. The returns are not correlated to the Fama-French 3 Factor Model returns, and are not correlated to SVOL returns.