Student ID end in 26 ICE T T versus ICE G G × 0.11 (2) CBT_TU_TU versus CBT_TY_TY × 0.8445 (6)

Introduction

This work is to analyze the spread between two futures contracts for two sets of commodities. The first spread (spread1) is the price spread between ICE_T_T (WTI Crude Futures) and ICE_G_G (Low Sulphur Gasoil Futures). The second spread (spread2) is the price spread between CBT_TU_TU (2Yr Treasury Note) and CBT_TY_TY (10 Yr Treasury Note).

For the analysis I used the second month (first contract with more than 30 days until termination) of quarterly futures contract data. The analysis was conducted on futures data from 12/3/2020 through 8/31/2023.

Required Packages and Functions

Below are the required R packages necessary to run the analysis.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(MASS)
## 
## Attaching package: 'MASS'
## 
## The following object is masked from 'package:dplyr':
## 
##     select
library(Quandl)
## Loading required package: xts
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## 
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## 
## 
## ######################### Warning from 'xts' package ##########################
## #                                                                             #
## # The dplyr lag() function breaks how base R's lag() function is supposed to  #
## # work, which breaks lag(my_xts). Calls to lag(my_xts) that you type or       #
## # source() into this session won't work correctly.                            #
## #                                                                             #
## # Use stats::lag() to make sure you're not using dplyr::lag(), or you can add #
## # conflictRules('dplyr', exclude = 'lag') to your .Rprofile to stop           #
## # dplyr from breaking base R's lag() function.                                #
## #                                                                             #
## # Code in packages is not affected. It's protected by R's namespace mechanism #
## # Set `options(xts.warn_dplyr_breaks_lag = FALSE)` to suppress this warning.  #
## #                                                                             #
## ###############################################################################
## 
## Attaching package: 'xts'
## 
## The following objects are masked from 'package:dplyr':
## 
##     first, last
library(ggplot2)
library(stringr)
library(R.cache)
## R.cache v0.16.0 (2022-07-21 16:20:02 UTC) successfully loaded. See ?R.cache for help.
library(data.table)
## 
## Attaching package: 'data.table'
## 
## The following objects are masked from 'package:xts':
## 
##     first, last
## 
## The following objects are masked from 'package:lubridate':
## 
##     hour, isoweek, mday, minute, month, quarter, second, wday, week,
##     yday, year
## 
## The following objects are masked from 'package:dplyr':
## 
##     between, first, last
## 
## The following object is masked from 'package:purrr':
## 
##     transpose
library(DescTools)
## 
## Attaching package: 'DescTools'
## 
## The following object is masked from 'package:data.table':
## 
##     %like%
library(tseries)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
Quandl.api_key("EsFRJz6Xd3CxmytxKgCB")

analysis_start <- as.Date("2020-12-03")
analysis_end <- as.Date("2023-08-31")

The following functions are used to download the necessary data from Quandl. This code is a translation from Python functions provided by Professor Boonstra.

grab_quandl_table <- function(
  root_data_dir = NULL,
  table_path, avoid_download=F, replace_existing=F,
  date_override=NULL, allow_old_file=F,...)
  {
  
  print(date_override)
  if(is.null(root_data_dir)){
    root_data_dir <- file.path(Sys.getenv("HOME"), "quandl_data_table_downloads")
  }
  
  
  data_symlink <- file.path(root_data_dir, paste0(table_path, "_latest.zip"))
  
  # If root data directory doesnt exist
  # create root data directory
  if(!dir.exists(root_data_dir)){
    cat("Creating new root dir", root_data_dir)
    dir.create(root_data_dir)}
  
  # If avoiding downloads and the data file already exists
  # return the existind data
  if(avoid_download && file.exists(data_symlink)){
    cat("Skipping any possible download of", table_path)
    return(data_symlink)}
  
  # If the table directory does not already exist
  # create the table directory
  table_dir <- dirname(data_symlink)
  if(!dir.exists(table_dir)){
    cat("Creating new data dir", table_path)
    dir.create(table_dir)}
  
  # If date_override is NULL
  # set my_date to system date
  # else set my_date to date_override
  if(is.null(date_override)){
    my_date <- format(Sys.Date(), "%Y%m%d")
  }else{
    my_date <- paste(format(date_override, "%Y%m%d"), collapse = "%2C")
  }
  
  
  # Set data zip file path 
  data_file <- file.path(root_data_dir, paste0(table_path, "_", my_date, ".zip"))
  
  # If file already exists
  # and if replace existing is trye or the filse size > 0
  # Remove old file
  if(file.exists(data_file)){
    file_info <- file.info(data_file)
    file_size <- file_info$size
    if(replace_existing || !file_size >0){
      cat("Removing old file", data_file, "size", file_size)
    }else{
      cat("Data file", data_file, "size", file_size, "exists already, no need to download")
      return(data_file)
    }}
  
  dl <- Quandl.datatable.bulk_download_to_file(table_path, filename = data_file, ...)
  file_size <- file.info(data_file)$size
  
  if(file.exists(data_file) && file_size > 0) {
    cat("Download finished:", file_size,"bytes")
    #if(is.null(date_override)) {
    #  if(file.exists(data_symlink)) {
    #    cat("Removing old symlink")
    #    file.remove(data_symlink)}
    #  
    #  cat("Creating symlink:"data_file," -> " data_symlink))
    #  system2("ln", args = c("-s", data_file, data_symlink), stdout = TRUE)
    #}
    return(data_file)
  }else{
    cat("Data file",date_file,"failed download")
    return()
  }
  
  if(is.null(date_override) || allow_old_file){
    return(data_file)
  }else{
    return("NoFileAvailable")
  }
}

fetch_quandl_table <- function(
    table_path, root_data_dir = NULL,
    date_override = NULL, avoid_download=T, ...){
  results <- readr::read_csv(
    grab_quandl_table(
      table_path, 
      root_data_dir = root_data_dir,
      avoid_download = avoid_download,
      date_override = date_override,
      ...))
  return(results)
}

Initial Data Cleaning

The code below downloads the entire AR/IVM table from Quandl.

OWF_data_all <- fetch_quandl_table(table_path = "AR/IVM", avoid_download = F)
## NULL
## Data file /Users/taylor/quandl_data_table_downloads/AR/IVM_20240127.zip size 291196619 exists already, no need to download
## Rows: 4918394 Columns: 21
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr   (4): exchange_code, futures_code, option_code, expiration
## dbl  (16): futures, atm, rr25, rr10, fly25, fly10, beta1, beta2, beta3, beta...
## date  (1): date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Once the entire data set has been downloaded, I filtered out the imputed constant maturity entries, non-quarterly futures, and any futures with termination dates that are not the second month, as defined in the introduction.

OWF_data_secondmonth <-
  OWF_data_all %>%
  arrange(`date`) %>%
  dplyr::filter(between(`date`, analysis_start, analysis_end)) %>%
  # Only use actual (not interpolated) futures
  dplyr::filter(!(`expiration` %in% c("1W","1M","2M","3M","6M","9M","1Y"))) %>%
  dplyr::filter(str_starts(`expiration`, "H|M|U|Z")) %>%
  dplyr::mutate(`id` = paste(`exchange_code`, `futures_code`, `option_code`, sep="_")) %>%
  # Keep only second expiration
  dplyr::filter(`days_termination` > 30) %>%
  dplyr::group_by(`id`, `date`) %>%
  dplyr::mutate(`exp` = rank(`days_termination`)) %>%
  dplyr::filter(`exp` == 1) %>%
  dplyr::select(-`exp`) %>%
  dplyr::ungroup()

Using the filtered data set with only second month quarterly futures, I created a dataframe for each of the two spreads, containing the futures prices and the spread between them. For time series analysis, I also create an ts series of each of the spreads. Figure 3.1 below plots the spreads through time.

spread1_ids <- c("ICE_T_T", "ICE_G_G")
spread2_ids <- c("CBT_TU_TU", "CBT_TY_TY")

spread1_df <-
  OWF_data_secondmonth %>%
  dplyr::filter(`id` %in% spread1_ids) %>%
  dplyr::select(`date`, `id`, `futures`) %>%
  # Pivot values wide
  tidyr::pivot_wider(names_from = `id`, values_from = `futures`) %>%
  # Calculate spread
  dplyr::mutate(`spread` = (`ICE_G_G` * 0.11) - `ICE_T_T`)

spread1 <- as.xts(spread1_df)
  
  
spread2_df <-
  OWF_data_secondmonth %>%
  dplyr::filter(`id` %in% spread2_ids) %>%
  dplyr::select(`date`, `id`, `futures`) %>%
  tidyr::pivot_wider(names_from = `id`, values_from = `futures`) %>%
  dplyr::mutate(`spread` = (`CBT_TY_TY` * 0.8445) - `CBT_TU_TU`)

spread2 <- as.xts(spread2_df)

plot_spreads <- 
  dplyr::full_join(
    spread1_df, spread2_df, 
    by = c("date"),
    suffix = c("(ICE_G_G-ICE_T_T)", "(CBT_TY_TY-CBT_TU_TU)")) %>%
  dplyr::select(c(`date`, contains("spread"))) %>%
  tidyr::pivot_longer(
    cols = c(everything(), -`date`),
    names_to = "Spread_Name",
    values_to = "Spread_Value") %>%
  ggplot() +
  aes(x = date, y = Spread_Value) +
  geom_step(colour = "#112446") +
  labs(
    title = "Figure 3.1: Futures Spreads Through Time",
    subtitle = "Spread between 2nd Month Quarterly Futures Contracts",
    x = "Date",
    y = "Futures Contract Price Spread") +
  theme_bw() +
  facet_wrap(vars(Spread_Name), scales = "free_y")
plot_spreads

At first glance, Figure 3.1 shows that through the analysis period, the spread between the US 10 Year and US 2 Year (CBT_TY_TY * 0.8445 - CBT_TU_TU) had steadily declined through 2021 and most of 2022, before rising through the middle of 2023, and subsequently declining again. Figure 3.1 also displays a somewhat stable spread between Low Sulphur Gasoil Futures and WTI Crude Futures (ICE_G_G *0.11 - ICE_T_T) in the period prior to the Russian invasion of Ukraine in early 2022. Following the invasion, the spread increased dramatically, before falling in early 2023.

Professor Boonstra was kind enough to provide us with a couple data points to ensure of spread calculations are being calculated correctly. The following codechecks to make sure my spreads match the observations provided.

spread1_check_dates <- c(as.Date("2021-10-18"), as.Date("2021-10-19"), as.Date("2021-11-10"))
spread1_check_values <- c(-1.3075, 0.6475, -0.3350)

spread1_check_data <-
  spread1_df %>%
  dplyr::filter(`date` %in% spread1_check_dates) %>%
  pull(`spread`) %>%
  round(4)

if(setequal(spread1_check_data, spread1_check_values)){
  print("Spread 1 matches examples provided")
}else{
  print("Spread 1 does match examples provided")
}
## [1] "Spread 1 matches examples provided"
spread2_check_dates <- c(as.Date("2021-11-19"), as.Date("2021-11-24"), as.Date("2021-11-29"))
spread2_check_values <- c(0.859875, -0.498039, 0.616266)

spread2_check_data <-
  spread2_df %>%
  dplyr::filter(`date` %in% spread2_check_dates) %>%
  pull(`spread`) %>%
  round(6)

if(setequal(spread2_check_data, spread2_check_values)){
  print("Spread 2 matches examples provided")
}else{
  print("Spread 2 does match examples provided")
}
## [1] "Spread 2 matches examples provided"

Analysis of Spread1: Low Sulphur Gasoil- WTI Crude (ICE_G_G * 0.11) - ICE_T_T)

Low Sulphur Gasoil Futures (ICE_G) are based on the underlying physical market for diesel barges delivered in Amsterdam, Rotterdam, and Antwerp, and is used as the pricing reference for distilled Gasoil in Europe.

WTI Futures (ICE_T) are based on the underlying physical market for WTI (West Texas Intermediate) Light Sweet Crude produced in the United States, and is used as a benchmark in oil pricing from the US.

The spread between ICE_G and ICE_T captured the pricing differences in demand for refined gasoil in Europe and crude oil from the US, The spread is influenced by both the difference in supply and demand for refined vs crude oil, as well as the difference in supply and demand in oil products from the US consumed in Europe.

The right hand plot in Figure 3.1 above shows the spread over the analysis period. During the analysis period, Russia invaded Ukraine on 2/24/2022, leading to global sanctions and bans of all Russian seaborne petroleum products into the EU on 2/5/2023. Around this time, US President Biden announced a release of of 1 million barrels of oil per day for 180 days. During this period, Spread1 increased rapidly.

First, Spread1 was decomposed into its seasonal, trend, and irregular components using Loess. Gasoil and crude oil have been shown in other studies to have a seasonality component, my hypothesis was that the spread may be seasonally affected as well. Figure 4.1 below shows that while there does exists some seasonality component to Spread1, the accompanying statistics show that the interquartile range (IQR) of the seasonality component was on;y 26.4% of the IQR of Spread1, while the trend IQR was 83.6% of Spread1 IQR. This indicated that the trend over the analysis period is more impactful to Spread1 than seasonality.

spread1_df_exp <- 
  spread1_df %>%
  full_join(
    data.frame("date" = seq.Date(min(spread1_df$date), max(spread1_df$date), by = "day")),
    by = "date") %>%
  arrange(`date`)

spread1.ts <- 
  ts(
    data = spread1_df_exp$spread,
    start = c(2020, 338),
    end = c(2023, 244),
    frequency = 365) %>%
  na.locf()
  
spread1_stl <- stl(spread1.ts, s.window = "periodic")
spread1_stl_df <-
  spread1_stl %>%
  pluck("time.series") %>%
  as.data.frame() %>%
  cbind("date" = spread1_df_exp$date) 

plot_spread1_stl <-
  spread1_stl_df %>%
  pivot_longer(
    cols = c(`seasonal`, `trend`, `remainder`),
    names_to = "Decomposition",
    values_to = "value") %>%
  mutate(`Decomposition` = factor(`Decomposition`, levels=c("trend","seasonal","remainder"))) %>%
  ggplot() +
  aes(x = date, y = value) +
  geom_line(colour = "#112446") +
  theme_minimal() +
  facet_wrap(vars(Decomposition)) +
  labs(
    x = "Date",
    y = "Spread1 Value Contribution",
    title = "Spread1 Seasonal Decomposition by LOESS",
    subtitle = 
      paste("Figure 4.1: Spread ICE_G_G - ICE_T_T  Decomposition into Trend, Seasonality, and Remainder", 
        "\n", "12/3/2020 through 8/31/2023", 
        "\n","Grouped by Decomposition Component"))


plot_spread1_stl

summary(spread1_stl)
##  Call:
##  stl(x = spread1.ts, s.window = "periodic")
## 
##  Time.series components:
##     seasonal             trend             remainder         
##  Min.   :-5.816650   Min.   :-2.604275   Min.   :-11.812114  
##  1st Qu.:-2.832776   1st Qu.:-1.459571   1st Qu.: -2.731602  
##  Median :-0.624351   Median : 7.421004   Median :  0.002203  
##  Mean   :-0.243171   Mean   : 6.886741   Mean   :  0.190465  
##  3rd Qu.: 2.061635   3rd Qu.:14.040619   3rd Qu.:  2.886004  
##  Max.   : 7.010474   Max.   :18.179594   Max.   : 16.078322  
##  IQR:
##      STL.seasonal STL.trend STL.remainder data  
##       4.894       15.500     5.618        18.548
##    %  26.4         83.6      30.3         100.0 
## 
##  Weights: all == 1
## 
##  Other components: List of 5
##  $ win  : Named num [1:3] 10021 549 365
##  $ deg  : Named int [1:3] 0 1 1
##  $ jump : Named num [1:3] 1003 55 37
##  $ inner: int 2
##  $ outer: int 0

Figure 4.2 below shows the distribution of Spread1 over the entire time frame of the analysis. The plot and the accompanying statistics show the distribution is right skewed with negative excess kurtosis.

plot_spread1_dist <-
  spread1_df %>%
  ggplot() +
  aes(x = spread) +
  geom_histogram(bins = 30L, fill = "#112446") +
  labs(
    x = "Spread",
    y = "Count",
    title = "Figure 4.2: Histogram of Spread1",
    subtitle = "Distribution of ICE_G_G - ICE_T_T from 12/3/2020 through 8/31/2023") +
  theme_minimal()

plot_spread1_dist

cat(
  "Spread ICE_G_G - ICE_T_T", "\n",
  "Mean:", formatC(mean(spread1$spread, na.rm=T),digits =3), "\n",
  "Median", formatC(median(spread1$spread, na.rm = T), digits=3), "\n",
  "Std Dev:", formatC(sd(spread1$spread, na.rm = T), digits=3), "\n",
  "Skew:", formatC(Skew(spread1$spread, na.rm = T), digits=3), "\n",
  "Excess Kurtosis:", formatC(Kurt(spread1$spread, na.rm=T), digits= 3))
## Spread ICE_G_G - ICE_T_T 
##  Mean: 6.85 
##  Median 3.63 
##  Std Dev: 10.5 
##  Skew: 0.621 
##  Excess Kurtosis: -0.939

Since the chart of Spread1 shows that the Russian invasion impacted the spread, it is useful to examine Spread1 before and after the invasion on 2/24/2022. Figure 4.3 below shows the distribution of Spread1 pre and post Russia’s invasion of Ukraine. This chart shows that prior to the invasion, Spread1 had a much lower mean and standard deviation.

plot_spread1_dist_ban <-
  spread1_df %>%
  mutate(`Invasion` = 
    factor(ifelse(`date` < as.Date("2022-02-23"), "Pre-Invasion", "Post-Invasion"),
      levels = c("Pre-Invasion", "Post-Invasion"))) %>%
  mutate(`year` = as.factor(year(`date`))) %>%
  ggplot() +
  aes(x = spread) +
  geom_histogram(bins = 30L, fill = "#112446") +
  theme_minimal() +
  facet_wrap(vars(Invasion)) +
  labs(
    x = "Spread",
    y = "Count",
    title = "Figure 4.3: Histogram of Spread1 Pre- & Post- Russian Invasion of Ukraine",
    subtitle = 
      paste("Distribution of ICE_G_G - ICE_T_T from 12/3/2020 through 8/31/2023", "\n",
          "Grouped by Observations Prior and Post 2/24/2022"))

plot_spread1_dist_ban

Figure 4.4 below shows the rolling average and standard deviation of Spread1 using different rolling periods, further highlighting the changing dynamics in spread1 prior to early 2022. The changing rolling mean and rolling volatilty indicate that Spread1 is non-stationary.

spread1_rolling_sd <- 
  spread1_df %>%
  dplyr::mutate(
    `21 Days` = rollapply(`spread`, 21, sd, na.rm=T, fill=NA, align="right"),
    `42 Days` = rollapply(`spread`, 42, sd, na.rm=T, fill=NA, align="right"),
    `63 Days` = rollapply(`spread`, 63, sd, na.rm=T, fill=NA, align="right"),
    `126 Days` = rollapply(`spread`, 126, sd, na.rm=T, fill=NA, align="right")) %>%
  dplyr::select(`date`, contains("Days")) %>%
  pivot_longer(
    cols = contains("Days"),
    names_to = "Rolling Period",
    values_to = "Value") %>%
  dplyr::mutate(`metric` = "Standard Deviation")

spread1_rolling_mean <-
  spread1_df %>%
  dplyr::mutate(
    `21 Days` = rollapply(`spread`, 21, mean, na.rm=T, fill=NA, align="right"),
    `42 Days` = rollapply(`spread`, 42, mean, na.rm=T, fill=NA, align="right"),
    `63 Days` = rollapply(`spread`, 63, mean, na.rm=T, fill=NA, align="right"),
    `126 Days` = rollapply(`spread`, 126, mean, na.rm=T, fill=NA, align="right")) %>%
  dplyr::select(`date`, contains("Days")) %>%
  pivot_longer(
    cols = contains("Days"),
    names_to = "Rolling Period",
    values_to = "Value") %>%
  dplyr::mutate(`metric` = "Average")

spread1_rolling_metrics <-
  bind_rows(
    spread1_rolling_sd,
    spread1_rolling_mean)

plot_spread1_rolling_metrics <-
  spread1_rolling_metrics %>%
  ggplot() +
  aes(x = date, y = Value, colour = `Rolling Period`) +
  geom_line() +
  scale_color_hue(direction = 1) +
  theme_minimal() +
  facet_wrap(vars(metric), scales = "free_y") +
  labs(
    x = "Date",
    y = "Rolling Value",
    title = "Figure 4.4: Rolling Average and Standard Deviation of Spread1",
    subtitle = 
      paste("Rolling Standard Deviation and Average of ICE_G_G - ICE_T_T from 12/3/2020 through 8/31/2023", "\n",
          "Grouped by Rolling Period"))

plot_spread1_rolling_metrics
## Warning: Removed 248 rows containing missing values (`geom_line()`).

Figure 4.5 below shows the autocorrelation and partial autocorrelation function for Spread1 over the entire analysis period, using week end observations. Given that ACF plot shows a high level of autocorrelation that drops slowly as the lags increase, it is likely that the series is non-stationary over the entire time frame.

ep <- endpoints(spread1$spread, on = "weeks")
spread1_we <- spread1[ep,]

spread1_acf <- acf(spread1_we$spread, plot = F,  lag.max = 52)
spread1_pacf <- pacf(spread1_we$spread,plot=F, lag.max = 52)

plot_spread1_acf <-
  data.frame(
    "lags" = spread1_acf$lag[,1,],
    "acf" = spread1_acf$acf[,1,],
    "pacf" = c(1,spread1_pacf$acf[,1,])) %>%
  pivot_longer(
    cols = c("acf","pacf"),
    names_to = "Metric",
    values_to = "Value") %>%
  ggplot() +
  aes(x = lags, y = Value) +
  geom_col(colour = "#112446") +
  theme_minimal() +
  facet_wrap(vars(Metric), scales = "free_y") +
  labs(
    x = "Number of Lags (Weeks)",
    y = "ACF",
    title = "Figure 4.5: Autocorrelation Function and Partial Autocorrelation Function ",
    subtitle =
      paste("ACF and PACF of Week End ICE_G_G - ICE_T_T from 12/3/2020 through 8/31/2023"))
plot_spread1_acf

To further test for stationary of Series1, a Dickey-Fuller Test was performed, with the results shown below. The Dickey-Fuller Test indicates that we can not reject the null hypothesis that the series is non-stationary.

tseries::adf.test(spread1_we$spread)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  spread1_we$spread
## Dickey-Fuller = -1.2662, Lag order = 5, p-value = 0.882
## alternative hypothesis: stationary

Given that Spread1 appears to be non-stationary, with both changing mean and volatility, the rolling z-score was calculated using 42 days (~2 months) of data. Figure 4.6 below shows the rolling 42 day Z-Score of Spread1.

spread1_rollz_df <-
  spread1_rolling_metrics %>%
  dplyr::filter(`Rolling Period` == "42 Days") %>%
  pivot_wider(
    values_from = `Value`,
    names_from = `metric`) %>%
  left_join(spread1_df, by = "date") %>%
  dplyr::mutate(`Z-Score` = (`spread`-`Average`)/`Standard Deviation`) %>%
  dplyr::select(`date`, `Z-Score`)

plot_spread1_rollz <-
  ggplot(spread1_rollz_df) +
  aes(x = date, y = `Z-Score`) +
  geom_line(colour = "#112446") +
  theme_minimal() + 
  labs(
    x = "Date",
    y = "42 Day Rolling Z-Score of Spread1",
    title = "Figure 4.6: 42 Day Rolling Z-Score of Spread1",
    subtitle =
      paste("42 Day Rolling Z-Score of ICE_G_G - ICE_T_T from 12/3/2020 through 8/31/2023"))

plot_spread1_rollz
## Warning: Removed 41 rows containing missing values (`geom_line()`).

Using the 42 Day Rolling Z-Score of Spread1, the autocorrelation and partial autocorrelation show an a geometric decline in the ACF indicating an AR series, and the PACF plot indicates an Autoregressive process.

spread1_rollz <- as.xts(spread1_rollz_df)
ep <- endpoints(spread1_rollz$`Z-Score`, on = "weeks")
spread1_rollz_we <- spread1_rollz[ep,]


spread1_rollz_acf <- acf(spread1_rollz_we$`Z-Score`["20210202/"], plot = F,  lag.max = 52)
spread1_rollz_pacf <- pacf(spread1_rollz_we$`Z-Score`["20210202/"],plot=F, lag.max = 52)

plot_spread1_rollz_acf <-
  data.frame(
    "lags" = spread1_rollz_acf$lag[,1,],
    "acf" = spread1_rollz_acf$acf[,1,],
    "pacf" = c(1,spread1_rollz_pacf$acf[,1,])) %>%
  pivot_longer(
    cols = c("acf","pacf"),
    names_to = "Metric",
    values_to = "Value") %>%
  ggplot() +
  aes(x = lags, y = Value) +
  geom_col(colour = "#112446") +
  theme_minimal() +
  facet_wrap(vars(Metric), scales = "free_y") +
  labs(
    x = "Number of Lags (Weeks)",
    y = "ACF",
    title = "Figure 4.7: Spread1 42 Day Rolling Z-Score ACF and PACF",
    subtitle =
      paste("ACF and PACF of Week End 42 Days Rolling Z-Score of ICE_G_G - ICE_T_T", "\n","12/3/2020 through 8/31/2023"))

plot_spread1_rollz_acf

The Dickey-Fuller Test shows that we can reject the null and conclude that the rolling 42 day Z-Score of Spread1 is stationary with an AR(5) process.

tseries::adf.test(spread1_rollz_we$`Z-Score`["20210202/"])
## 
##  Augmented Dickey-Fuller Test
## 
## data:  spread1_rollz_we$`Z-Score`["20210202/"]
## Dickey-Fuller = -3.7622, Lag order = 5, p-value = 0.02299
## alternative hypothesis: stationary

Analysis of Spread2: 10 Yr T-Note - 2 Yr T-Note (CBT_TY_TY *0.8445 - CBT_TU_TU)

10 Year T-Note Futures (CBT_TY) are based on the delivery of 10 Year US Treasury bonds. 2 Year T-Note Futures (CBT_TU) are based on the delivery of 2 Year US Treasury Noted

The spread between CBT_TY and CBT_TU captures the steepness in the US Treasury Yield curve. The spread is influenced by expectations of future Federal Reserve interest rate decisions.

During the analysis period, inflation in the US came in higher than expected following fiscal and monetary stimulus used to prop up the economy during the COVID pandemic. As the economic growth remained strong and inflation persistently high, the Federal Reserve began a series of rate hikes in order to maintain price stability. The expectation of rate hikes caused what is known as a bear-flattening in rates in early 2021, that lasted though 2022. During a bear-flattener environment, short term rates rise more quickly than rates on the long end, as the market prices in rising fed fund rates. Since long-term interest rates are based more on growth than short term rates, when the fed is expected to raise rates, the front end of the curve rises more than the long end.

Figure 5.1 below shows Spread2 over the entire analysis period. Spread2 fell from the beginning of the period through most of 2022.

spread2_df %>%
  ggplot() +
  aes(x = date, y = spread) +
  geom_line(colour = "#112446") +
  theme_minimal() +
  labs(
    x = "Date",
    y = "Spread2",
    title = "Figure 5.1: Spread2 ",
    subtitle =
      paste("CBT_TY_TY - CBT_TU_TU", "\n","12/3/2020 through 8/31/2023"))

Given that there is clearly a trend (bear-flattening) for much of the analysis period, it makes sense to de-trend the data. Figure 5.2 below shows the in-sample Local Polynomial Regression using LOESS with a span of 0.5.

spread2_loess <- loess(spread ~ index, data=mutate(spread2_df, `index` = row_number()), span = 0.5)
spread2_loess_smoothed <- predict(spread2_loess)

spread2_df_loess <-
  spread2_df %>%
  cbind("loess" = spread2_loess_smoothed) %>%
  dplyr::mutate(`resid` = `spread` - `loess`)

plot_spread2_loess <-
  spread2_df_loess %>%
  rename(
    `Spread2` = `spread`,
    `Spread2 Smoothed` = `loess`) %>%
  dplyr::select(`date`, `Spread2`, `Spread2 Smoothed`) %>%
  pivot_longer(
    cols = c(`Spread2`, `Spread2 Smoothed`),
    names_to = "Metric",
    values_to = "Value") %>%
  ggplot() +
  aes(x = date, y = Value, colour = Metric) +
  geom_line() +
  scale_color_hue(direction = 1) +
  theme_minimal() +
  labs(
    x = "Date",
    y = "Spread2",
    title = "Figure 5.2: Spread2 & Local Regression Smoothing",
    subtitle =
      paste("Spread and Loess CBT_TY_TY - CBT_TU_TU", "\n","12/3/2020 through 8/31/2023"))
  
plot_spread2_loess

With the smoothed values calculated, the polynomial trend can be removed from Spread 2. The residuals from de-trending (Spread2 Residuals) can be seen in Figure 5.3.

plot_spread2_resid <-
  ggplot(spread2_df_loess) +
  aes(x = date, y = resid) +
  geom_line(colour = "#112446") +
  theme_minimal() + 
  labs(
    x = "Date",
    y = "Spread2",
    title = "Figure 5.3: Spread2 Residuals From LOESS",
    subtitle =
      paste("Residual Values of Spread CBT_TY_TY - CBT_TU_TU - LOESS Regression of Spread " , 
            "\n","12/3/2020 through 8/31/2023"))

plot_spread2_resid

Figure 5.3 shows the rolling 126 and 252 average and standard deviation of Spread2 Residuals. Given the range of residual values as plotted in Figure 5.2, the rolling average and rolling volatility are fairly stable.

spread2_resid_rolling_sd <- 
  spread2_df_loess %>%
  dplyr::mutate(
    `126 Days` = rollapply(`resid`, 126, sd, na.rm=T, fill=NA, align="right"),
    `252 Days` = rollapply(`resid`, 252, sd, na.rm=T, fill=NA, align="right")) %>%
  dplyr::select(`date`, contains("Days")) %>%
  pivot_longer(
    cols = contains("Days"),
    names_to = "Rolling Period",
    values_to = "Value") %>%
  dplyr::mutate(`metric` = "Standard Deviation")

spread2_resid_rolling_mean <-
  spread2_df_loess %>%
  dplyr::mutate(
    `126 Days` = rollapply(`resid`, 126, mean, na.rm=T, fill=NA, align="right"),
    `252 Days` = rollapply(`resid`, 252, mean, na.rm=T, fill=NA, align="right")) %>%
  dplyr::select(`date`, contains("Days")) %>%
  pivot_longer(
    cols = contains("Days"),
    names_to = "Rolling Period",
    values_to = "Value") %>%
  dplyr::mutate(`metric` = "Average")

spread2_resid_rolling_metrics <-
  bind_rows(
    spread2_resid_rolling_sd,
    spread2_resid_rolling_mean)

plot_spread2_resid_rolling_metrics <-
  spread2_resid_rolling_metrics %>%
  ggplot() +
  aes(x = date, y = Value, colour = `Rolling Period`) +
  geom_line() +
  scale_color_hue(direction = 1) +
  theme_minimal() +
  facet_wrap(vars(metric), scales = "free_y") +
  labs(
    x = "Date",
    y = "Rolling Value",
    title = "Figure 5.4: Rolling Average and Standard Deviation of Spread2 Residuals",
    subtitle = 
      paste("Rolling Standard Deviation and Average of LOESS residual CBT_TY_TY - CBT_TU_TU","\n","12/3/2020 through 8/31/2023", "\n",
          "Grouped by Rolling Period"))

plot_spread2_resid_rolling_metrics
## Warning: Removed 376 rows containing missing values (`geom_line()`).

Next, we can check for stationary using Dickey Fuller Test, using the month-end Spread 2 Residuals. The output below shows that we can conclude the Spread2 Residuals are stationary.

spread2_loess <- as.xts(spread2_df_loess)
ep <- endpoints(spread2_loess$resid, on = "weeks")
spread2_loess_we <- spread2_loess[ep,]

tseries::adf.test(spread2_loess_we$resid)
## Warning in tseries::adf.test(spread2_loess_we$resid): p-value smaller than
## printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  spread2_loess_we$resid
## Dickey-Fuller = -4.2497, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary

Figure 5.4 below shows the the autocorrelation and partial autocorrelation functions of the Spread2 Residual. The plots show an a geometric decline in the ACF indicating an AR series,

spread2_resid_acf <- acf(spread2_loess_we$`resid`, plot = F,  lag.max = 52)
spread2_resid_pacf <- pacf(spread2_loess_we$`resid`,plot=F, lag.max = 52)

plot_spread2_resid_acf <-
  data.frame(
    "lags" = spread2_resid_acf$lag[,1,],
    "acf" = spread2_resid_acf$acf[,1,],
    "pacf" = c(1,spread2_resid_pacf$acf[,1,])) %>%
  pivot_longer(
    cols = c("acf","pacf"),
    names_to = "Metric",
    values_to = "Value") %>%
  ggplot() +
  aes(x = lags, y = Value) +
  geom_col(colour = "#112446") +
  theme_minimal() +
  facet_wrap(vars(Metric), scales = "free_y") +
  labs(
    x = "Number of Lags (Weeks)",
    y = "ACF",
    title = "Figure 5.5: Spread2 Residual ACF and PACF",
    subtitle =
      paste("ACF and PACF of LOESS residual CBT_TY_TY - CBT_TU_TU", "\n","12/3/2020 through 8/31/2023"))

plot_spread2_resid_acf

The output below and the Augmented Dickey Fuller test performed previously indicated that Spread2 Residuals have an AR(5) process.

ar(spread2_loess_we$resid)
## 
## Call:
## ar(x = spread2_loess_we$resid)
## 
## Coefficients:
##       1        2        3        4        5  
##  0.8447   0.1072  -0.1968   0.1609  -0.1874  
## 
## Order selected 5  sigma^2 estimated as  0.3748

Conclusion

Both sets of spreads exhibited unique non-stationary behaviors due to the environment in which the analysis was conducted. Spread1 was influenced by Russia’s invasion of Ukraine, which caused a step change in the spread with a higher mean and larger volatility. To account for this I used a short horizon rolling Z-Score to normalize the spread, and found it to be stationary when calculated with a 42 day rolling window. Spread2 was influenced by market expectations of Federal Reserve rate hikes as a response to persistently above target inflation. To account for this I used localized regression to de-trend the data. The residuals where stationary and exhibited an AR(5) process using week_end data.