Author

Aftikhar Mominzada

Code
digraph presentation_flow {
  graph [rankdir=LR,
         fontname="Helvetica",
         nodesep=0.3,
         ranksep=0.2,
         bgcolor="#F8F9FA",
         splines=ortho]

  node [shape=rectangle,
        style="filled",
        fillcolor="#E9ECEF",
        fontsize=11,
        width=2.8,
        height=0.9,
        margin=0.1]

  // Compact node definitions
  intro [label="Introduction\nStrategy Overview", fillcolor="#B3E5FC"]
  data [label="Data Preparation\nSources & Preprocessing", fillcolor="#C8E6C9"]
  analysis [label="Analysis\nADF Tests & Modeling", fillcolor="#C8E6C9"]
  risk [label="Risk Management\nStop-loss Mechanisms", fillcolor="#C8E6C9"]
  results [label="Results\nMetrics & Backtesting", fillcolor="#C8E6C9"]
  limitations [label="Limitations &\nImprovements", fillcolor="#FFECB3"]
  conclusion [label="Conclusion\nFindings & Future Work", fillcolor="#B3E5FC"]
  learning [label="Learning Outcomes\nKey Insights", fillcolor="#FFECB3"]

  // Main horizontal flow
  intro -> data -> analysis -> risk -> results -> limitations -> conclusion -> learning [weight=10]

  // Vertical relationships for compact layout
  {rank=same; intro}
  {rank=same; data analysis risk results}
  {rank=same; limitations conclusion learning}
  
  // Shortened edge paths
  edge [color="#666666", penwidth=1, arrowsize=0.7]
}


Pairs trading between Brent and WTI crude oil involves exploiting the price relationship between these two globally significant crude oil benchmarks. The strategy focuses on identifying deviations in their spread and capitalizing on the tendency for the relationship to revert to a mean.
Why Brent and WTI?

Highly Correlated Assets: Both crude types are substitutes and are influenced by global supply-demand factors. As such, they tend to move in similar directions, making them good candidates for pairs trading.
Spread Dynamics: The price spread (spread = Brent - WTI) often exhibits mean-reverting behavior due to arbitrage and relative demand-supply dynamics.

Transportation and Logistics: WTI prices can diverge due to storage bottlenecks or pipeline issues in the U.S., especially at the delivery hub in Cushing, Oklahoma. Transportation costs (tanker prices) also play an important role in the divergence, convergence, and volatility of the spread. Transportation cost will be treated as the fundamental factor in the quantitative model.

Geo-politics: Brent is more sensitive to geopolitical tensions in Europe, the Middle East, and Africa. WTI is more influenced by U.S. domestic production and storage conditions. Although this model does not rely on geopolitics, its relevance is worth noting; the scope of this report, however, remains limited to transportation cost as the fundamental factor.

Note: This model uses the front-month continuous contract, adjusted for rollover. Only the closing price is used rather than the full OHLC data.

Code
library(dplyr)   # data wrangling and the %>% pipe
library(readr)   # read_csv() for the Baltic Dirty Tanker index file

s_date <- "2020-01-01"

# iuser / ipassword: data-feed credentials, assumed to be defined elsewhere in the session
CL25F <- RTL::getPrice(
  feed = "CME_NymexFutures_EOD_continuous",
  contract = "CL_001_Month",
  from = s_date,
  iuser = iuser,
  ipassword = ipassword
) %>% RTL::rolladjust(
  commodityname = c("cmewti"),
  rolltype = c("Last.Trade")
)


BRN25F <- RTL::getPrice(
  feed = "ICE_EuroFutures_continuous",
  contract = "BRN_001_Month",
  from = s_date,
  iuser = iuser,
  ipassword = ipassword
) %>% RTL::rolladjust(
  commodityname = c("icebrent"),
  rolltype = c("Last.Trade"))


  
futures_data <- BRN25F %>% inner_join(CL25F, by ="date")
names(futures_data)[c(2, 3)] <- c("BRNF", "CLF")

futures_data02 <- futures_data %>% 
  transmute(date, BRNF, CLF, spreadF = BRNF - CLF) %>%
  mutate(return_spreadF = spreadF/lag(spreadF)-1) %>% 
  mutate(normalized_z_spreadF = scale(spreadF, center = TRUE, scale = TRUE))



adf_test <- futures_data02$spreadF %>% na.omit() %>% urca::ur.df(type = "trend", selectlags = "AIC")




##############################################################################
# Detrending the normalized data: fit a linear time trend to the z-scored spread
lm_fit <- lm(normalized_z_spreadF ~ seq_along(normalized_z_spreadF), data = futures_data02)

# Subtract the fitted values (the linear trend) from 'normalized_z_spreadF' to obtain the detrended series
futures_data02$normalized_z_detrended_spreadF <- futures_data02$normalized_z_spreadF - lm_fit$fitted.values

##############################################################################




file_path <- "https://raw.githubusercontent.com/Aftikharmnz/Baltic/refs/heads/main/Baltic%20Dirty%20Tanker%20Historical%20Data.csv"

### Note: the Baltic Dirty Tanker index history is stored on GitHub and imported from there because the website providing the data is protected and could not be web-scraped.

baltic_data <- read_csv(file_path, col_types = cols(.default = col_character()))




baltic_data <- baltic_data %>%
  transmute(
    Date = as.Date(Date, format = "%m/%d/%Y"),
    index = as.numeric(Price),
    change = index/lag(index)-1)

O1 <- futures_data %>% transmute(Date = date,
                                 spread = BRNF - CLF,
    spread_ratio = spread / dplyr::lag(spread) -1) %>% na.omit()

comparsion <- baltic_data %>% inner_join(O1, by = "Date")
Code
library(slider)
library(dplyr)

# Prepare your data by joining datasets
corr_data <- baltic_data %>% 
  inner_join(futures_data02 %>% mutate(Date = date), by = "Date") %>% select(Date, change, return_spreadF)

# Fit the linear regression model
lm_model <- lm(return_spreadF ~ change, data = corr_data)




# Plot the data points
library(ggplot2)
spread_changeplot <- ggplot(corr_data, aes(x = change, y = return_spreadF)) +
  geom_point(color = "blue") +  # scatter plot of data points
  geom_smooth(method = "lm", color = "red", se = FALSE) +  # regression line
  labs(title = "Linear Regression: Return SpreadF vs Change",
       x = "Change",
       y = "Return SpreadF") +
  theme_minimal()
Code
url02 <- "https://raw.githubusercontent.com/Aftikharmnz/Baltic_test01/refs/heads/main/Baltic_test%20data_2018_2020.csv"
url02_data <- read.csv(url02)[, c(1, 2)]

# Convert the Date column to Date format
library(dplyr)
comparsion_test <- url02_data %>%
  mutate(Date = as.Date(Date, "%m/%d/%Y"),
         change = Price/lag(Price)-1)


comparsion_test <- comparsion_test %>% mutate(normalized_z_index_baltic_test = scale(change, center = TRUE, scale = TRUE)) %>% arrange(as.Date(Date))


s_date_test <- "2018-01-01"
CLF_test <- RTL::getPrice(
  feed = "CME_NymexFutures_EOD_continuous",
  contract = "CL_001_Month",
  from = s_date_test,
  iuser = iuser,
  ipassword = ipassword
) %>% RTL::rolladjust(
  commodityname = c("cmewti"),
  rolltype = c("Last.Trade")
)


BRNF_test <- RTL::getPrice(
  feed = "ICE_EuroFutures_continuous",
  contract = "BRN_001_Month",
  from = s_date_test,
  iuser = iuser,
  ipassword = ipassword
) %>% RTL::rolladjust(
  commodityname = c("icebrent"),
  rolltype = c("Last.Trade"))
  
futures_data_test <- BRNF_test %>% inner_join(CLF_test, by ="date")
names(futures_data_test)[c(2, 3)] <- c("BRNF_test", "CLF_test")
Code
library(plotly)  # interactive time-series plot

plot_stationarity <- plot_ly(futures_data02, x = ~date, y = ~normalized_z_detrended_spreadF, 
        type = 'scatter', mode = 'lines', name = 'Normalized Spread_F', 
        line = list(color = 'blue', width = 1)) %>%
  add_trace(
    x = futures_data02$date, y = rep(0.5, length(futures_data02$date)),
    type = 'scatter', mode = 'lines', name = 'Positive 0.5 SD',
    line = list(color = 'green', dash = 'dash', width = 1)
  ) %>%
  add_trace(
    x = futures_data02$date, y = rep(-0.5, length(futures_data02$date)),
    type = 'scatter', mode = 'lines', name = 'Negative 0.5 SD',
    line = list(color = 'red', dash = 'dash', width = 1)
  ) %>% layout(
    legend = list(
      orientation = 'h',    # Horizontal orientation
      x = 0.5,              # Center the legend
      y = -0.2,             # Position below the chart
      xanchor = 'center',   # Align horizontally to the center
      yanchor = 'top'       # Align vertically to the top of the legend
    )
  )

ADF test: Baltic Dirty Tanker index changes and Brent-WTI spread changes

Augmented Dickey-Fuller (ADF) Test Results for Brent-WTI Spread
------------------------------------------
Test Statistic:  -13.303 
Critical Values (tau3):
  1%:  -3.96 
  5%:  -3.41 
 10%:  -3.12 
Null Hypothesis: Series is not stationary
Alternative Hypothesis: Series is stationary
P-Value (Confidence level 99%):  < 0.01 (reject null) 
P-Value (Confidence level 95%):  < 0.05 (reject null) 
P-Value (Confidence level 90%):  < 0.10 (reject null) 
Augmented Dickey-Fuller (ADF) Test Results for Baltic Dirty Tanker index changes
-----------------------------------------------------------
Test Statistic:  -13.926 
Critical Values (tau3):
  1%:  -3.96 
  5%:  -3.41 
 10%:  -3.12 
Null Hypothesis: Series is not stationary
Alternative Hypothesis: Series is stationary
P-Value (confidence level 99%):  < 0.01 (reject null) 
P-Value (Confidence level 95%):  < 0.05 (reject null) 
P-Value (Confidence level 90%):  < 0.10 (reject null) 

Reasoning: The Augmented Dickey-Fuller (ADF) test is a statistical test used to determine whether a time series is stationary. A stationary time series has a constant mean, variance, and auto-covariance over time, which is a critical assumption for many time series models. Given that the spread between Brent and WTI (and likewise the Baltic Dirty Tanker index changes) is stationary, we can confidently continue building the quantitative model under the assumption that the spread will mean-revert. In other words, since fluctuations away from the mean occur, we can capitalize on them by assuming they will revert, building a quantitative model around this behavior while optimizing standard-deviation thresholds as benchmarks for shorting or going long the spread.
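For reference, here is a minimal sketch of how the statistics reported above could be pulled from the object returned by `urca::ur.df()`; the helper `report_adf` is hypothetical and only mirrors the printed summary, using the `@teststat` and `@cval` slots of the `ur.df` class.

Code
library(urca)

# Hypothetical helper: extract the tau statistic (first column of @teststat)
# and the corresponding row of critical values from a ur.df object.
report_adf <- function(adf_obj, label) {
  cat("Augmented Dickey-Fuller (ADF) Test Results for", label, "\n")
  cat("Test Statistic: ", round(adf_obj@teststat[1, 1], 3), "\n")
  cat("Critical Values (tau3):\n")
  print(adf_obj@cval[1, ])
}

report_adf(adf_test, "Brent-WTI Spread")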

Conclusion from stationarity in spread/Baltic changes:

  • Confirms a strong long-term equilibrium relationship.

  • Provides a foundation for statistical arbitrage (mean-reversion-based) pairs trading.

  • Indicates that deviations in the spread are tradeable anomalies rather than signs of a fundamental divergence between the two markets (in the context of the WTI-Brent spread).

Normalized and de-trended Baltic dirty tanker index changes and Brent - WTI spread

The graphic below shows that the spread oscillates around its mean.
The reasoning behind detrending the normalized z-scores lies in ensuring that the spread between prices/index reflects only short-term deviations from its historical equilibrium, free from any residual linear trends. Even though the spread/index changes have been found to be stationary, slight directional drifts may still be present due to market dynamics or structural shifts, which could obscure the true mean-reverting nature of the spread. By fitting a linear regression model to the normalized data and subtracting the fitted trend, we effectively remove these long-term drifts, isolating the oscillatory component of the spread. This detrended series provides a clearer signal for identifying mean-reversion opportunities, which is the cornerstone of pairs trading strategies. In this refined form, the data better represents temporary anomalies that can be exploited for trading, enabling more reliable and robust signal generation while minimizing the risk of misinterpreting persistent trends as trading opportunities. This process enhances both the quality and interpretability of the data for market-neutral strategies focused on statistical arbitrage. We will use the standard deviation as the benchmark/signal for when to enter and exit a trade and will optimize for it.
(Please zoom in for a better view of the mean-reverting behavior.)

Here is a visual representation of the mean-reversion behaviour.

Code
plot_stationarity
Code
plott0011

Composite indicator1

\[ \text{composite\_indicator} = (\text{spreadweight} \times \text{normalized\_z\_detrended\_spreadF}) + (\text{balticweight} \times \text{normalized\_z\_index\_baltic}) \]

The composite indicator I developed combines two critical factors—detrended spread values and changes in the Baltic index—into a single metric designed to capture market dynamics comprehensively. By assigning weights to these components, the indicator balances the localized imbalances reflected in the spread with the global shipping demand trends represented by the Baltic index. The rationale behind this approach is to create a nuanced measure of market conditions, where the weighted contributions of these variables adjust the sensitivity of trade signals. The indicator is further optimized by setting thresholds to identify overvalued or undervalued conditions, enabling systematic buy and sell decisions. Moving forward, I plan to optimize the weights and thresholds to improve the indicator’s predictive power and adapt it to different market environments. The Baltic weight contributes to the composite indicator by increasing it when tanker prices rise (and vice versa), providing a signal better rooted in fundamentals.

The weights (spread weight and baltic weight2) allow for flexibility in emphasizing either the spread or the Baltic index changes based on market dynamics. The thresholds (Upper standard deviation and Lower standard deviation) for spread can be adjusted to control the sensitivity of the strategy.

Baltic weight: The Baltic weight is particularly impactful as it highlights discrepancies in transportation costs and logistical bottlenecks, which are significant drivers of the price differential between the two benchmarks. Brent prices are more influenced by seaborne trade, while WTI, tied to inland U.S. production, is more sensitive to domestic transportation constraints. The Baltic weight bridges this gap by quantifying global transport demand, making it the most critical component of the composite indicator. Its inclusion ensures the model captures broader macroeconomic and fundamental factors that directly affect the spread, providing a robust basis for trading and forecasting.
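As a small, self-contained illustration (not part of the original pipeline), the sketch below applies the composite-indicator formula and the signal thresholds to a toy data frame; the weights and thresholds are the same hypothetical values used in the sanity check later on, and the toy z-scores are made up.

Code
library(dplyr)

# Toy z-scores for the detrended spread and the Baltic index changes
toy <- data.frame(
  normalized_z_detrended_spreadF = c(-1.2, -0.3, 0.1, 0.9, 1.4),
  normalized_z_index_baltic      = c(-0.8,  0.2, 0.0, 0.5, 1.1)
)

spreadweight <- 0.7; balticweight <- 0.3   # weights sum to 1 (see footnote 2)
par1value    <- 0.7; par2value   <- -0.7   # upper / lower thresholds

toy %>%
  mutate(
    composite_indicator = spreadweight * normalized_z_detrended_spreadF +
                          balticweight * normalized_z_index_baltic,
    signal = case_when(
      composite_indicator < par2value ~  1,   # buy (long the spread)
      composite_indicator > par1value ~ -1,   # sell (short the spread)
      TRUE                            ~  0    # hold
    )
  )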

Highlight of the model logic

Critical: when calculating returns, I made sure to compute them from the actual spread. This is noteworthy because the model works with both normalized and actual prices/index values.

  • For practicality, the model charges a 1% transaction cost and applies a -10% stop loss (both embedded in the model’s logic)
  • Par1value - upper standard deviation (above mean/zero)
  • Par2value - lower standard deviation (below mean/zero)
Code
digraph G {

size="25,20";
  ratio=compress;
  node [fontsize=45, width=6];
  edge [fontsize=45];
  nodesep=2;
  ranksep=1.5;
  
  graph [rankdir=TB, 
         fontsize=45, 
         labelloc=top, 
         label="Trading Strategy Decision Flowchart: Signal Generation and Execution Logic"]
  
  node [shape=rectangle, style=filled, fillcolor=lightblue]
  
  Start [label="Start: Input Composite Indicator"]
  Detrend [label="Detrending and Normalizing Data"]
  Check_Thresholds [label="Check Composite Indicator\nvs. Thresholds:\npar1value (Upper), par2value (Lower)"]
  Buy_Signal [label="Buy Signal:\nIndicator < par2value (Lower Threshold)"]
  Sell_Signal [label="Sell Signal:\nIndicator > par1value (Upper Threshold)"]
  Hold [label="Hold Signal:\npar2value ≤ Indicator ≤ par1value"]
  Stop_Loss_Check [label="Stop Loss Check:\nIs Loss > Threshold?"]
  Close_Position [label="Close Position: Stop Loss Triggered"]
  Trade_Execution [label="Trade Execution:\nBuy/Sell Spread at Next Open"]
  Trade_Cost [label="Adjust for 1% Trade Charge"]
  Position_Update [label="Update Position:\nLong, Short, Flat"]
  End [label="End"]

  Start -> Detrend -> Check_Thresholds
  Check_Thresholds -> Buy_Signal [label="Indicator < par2value"]
  Check_Thresholds -> Sell_Signal [label="Indicator > par1value"]
  Check_Thresholds -> Hold [label="Within Thresholds"]
  Buy_Signal -> Stop_Loss_Check
  Sell_Signal -> Stop_Loss_Check
  Stop_Loss_Check -> Close_Position [label="Loss > Threshold"]
  Stop_Loss_Check -> Trade_Execution [label="No Stop Loss Triggered"]
  Close_Position -> Trade_Cost
  Trade_Execution -> Trade_Cost
  Trade_Cost -> Position_Update -> End
  Hold -> End
  

}


A check of the model before optimization, using the following constraints:

par1value = 0.7

par2value = -0.7

spreadweight = 0.7

balticweight = 0.3

The model is working as intended: executing trades, taking positions, and producing a cumulative performance series. All looks good.

Code
initial_strategy <- function(data = train,
                     par1value = 0.7,
                     par2value = -0.7,
                     spreadweight = 0.7,
                     balticweight = 0.3) {
  
  data <- data %>% 
    select(
      Date,
      spreadF,
      return_spreadF,
      normalized_z_detrended_spreadF,
      normalized_z_index_baltic
    ) %>%
    mutate(
      # Create composite indicator using weighted spread and Baltic changes
      composite_indicator = spreadweight * normalized_z_detrended_spreadF + balticweight * normalized_z_index_baltic,
      
      # Generate trade signals based on the composite indicator
      signal = dplyr::case_when(
        composite_indicator < par2value ~ 1,  # Buy signal
        composite_indicator > par1value ~ -1, # Sell signal
        TRUE ~ 0  # No trade signal
      )
    )
  
  # Create trade orders based on the change in signal
  withtrades <- data %>%
    dplyr::mutate(
      trade = tidyr::replace_na(dplyr::lag(signal) - dplyr::lag(signal, n = 2L), 0)
    )
  
  # Track positions and calculate returns
  withposition <- withtrades %>%
    mutate(
      pos = cumsum(trade),  # Cumulative position based on trade signals
      
      # Return on new trades
      ret_new = ifelse(trade != 0, trade * return_spreadF, 0),
      
      # Return on existing trades (carry the same position)
      ret_exist = ifelse(trade == 0 & pos != 0, pos * return_spreadF, 0),
    
    # Total return considering new and existing trades
    ret = ret_new + ret_exist,
    ret = ret - abs(trade) * 0.01 # transaction cost
    )
  
  # Calculate cumulative equity (cumulative product of total returns)
  final <- withposition %>%
    dplyr::mutate(
      cumeq = cumprod(1 + ret)  # Cumulative return
    )
  
  return(final)
}


strategy <- function(data = train,
                     par1value = 0.7,
                     par2value = -0.7,
                     spreadweight = 0.7,
                     balticweight = 0.3,
                     stop_loss_threshold = -0.1) {  # Stop-loss threshold (-10%)

  data <- data %>% 
    select(
      Date,
      spreadF,
      return_spreadF,
      normalized_z_detrended_spreadF,
      normalized_z_index_baltic
    ) %>%
    mutate(
      # composite indicator using weighted spread and Baltic changes
      composite_indicator = spreadweight * normalized_z_detrended_spreadF + balticweight * normalized_z_index_baltic,
      
      # Generate trade signals based on the composite indicator
      signal = dplyr::case_when(
        composite_indicator < par2value ~ 1,  # Buy signal
        composite_indicator > par1value ~ -1, # Sell signal
        TRUE ~ 0  # No trade signal
      )
    )
  
  #  trade orders based on the change in signal
  withtrades <- data %>%
    dplyr::mutate(
      trade = tidyr::replace_na(dplyr::lag(signal) - dplyr::lag(signal, n = 2L), 0)
    )
  
  # Track positions and calculate returns
  withposition <- withtrades %>%
    mutate(
      pos = cumsum(trade),  # Cumulative position based on trade signals
      
      # Track entry price when trade is executed
      entry_price = ifelse(trade != 0, spreadF, NA),
      
      # Initialize a column to track the stop-loss condition
      stop_loss_triggered = FALSE,
      
      # For each trade, check whether the stop-loss is triggered
      # (note: abs(...) >= stop_loss_threshold with a negative threshold of -0.1
      #  evaluates to TRUE as soon as a position is entered)
      stop_loss = ifelse(
        !is.na(entry_price) & pos != 0, 
        abs((spreadF - entry_price) / entry_price) >= stop_loss_threshold, 
        FALSE
      ),
      
      # Update the trade position based on stop-loss logic
      adjusted_trade = ifelse(stop_loss, 0, trade),
      
      # Calculate returns (new, existing, or stopped out)
      ret_new = ifelse(adjusted_trade != 0, adjusted_trade * return_spreadF, 0),
      
      ret_exist = ifelse(adjusted_trade == 0 & pos != 0, pos * return_spreadF, 0),
      
      # Total return considering new, existing, and stop-loss trades
      ret = ret_new + ret_exist,
      ret = ret - abs(adjusted_trade) * 0.01  # transaction cost
    )
  
  # Calculate cumulative equity (cumulative product of total returns)
  final <- withposition %>%
    dplyr::mutate(
      cumeq = cumprod(1 + ret)  # Cumulative return
    )
  
  return(final)
}



par1value = seq(from = 0, to = 2, by = 0.05)
par2value <- seq(from = -2, to = 0, by = 0.05)

param_grid  <- expand.grid(par1value = par1value, par2value = par2value)
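For completeness, here is a sketch of what the sanity-check run described above might look like, assuming `train` is already prepared with the columns the function selects (Date, spreadF, return_spreadF, normalized_z_detrended_spreadF, normalized_z_index_baltic):

Code
# Run the strategy with the check constraints listed above
check_run <- strategy(
  data         = train,
  par1value    = 0.7,
  par2value    = -0.7,
  spreadweight = 0.7,
  balticweight = 0.3
)

# Inspect the tail: signal, trade, position, per-period return and cumulative equity
check_run %>%
  dplyr::select(Date, composite_indicator, signal, trade, pos, ret, cumeq) %>%
  tail()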

Justification for using the spread rather than buying/selling WTI/Brent separately

  1. Statistical Validity: The spread is stationary and mean-reverting, so the strategy aligns with statistical arbitrage principles (cointegration and mean reversion).

  2. Risk Mitigation: Spread trading reduces systemic risk and focuses on relative value, making it more robust in a volatile oil market (risk-neutral hedging).

  3. Simplicity and Execution: Trading the spread simplifies both modeling and execution, while still capturing the trading edge identified by this model.

  4. Common industry practice: Trading the spread as a unit is a well-established strategy in commodity markets, particularly in the oil industry. Many institutional investors and hedge funds use spread trading to exploit inefficiencies between related assets.

My risk appetite

As an investor/trader with defined limitations, my primary concern is determining the maximum duration for which to lock capital into a trade without incurring significant losses. If the market reverts within a week, the risk is minimal; however, if it extends beyond three or more weeks, the opportunity cost of potentially higher returns elsewhere becomes significant, especially if contracts are expiring.

Other key considerations include:

  1. Time Horizon & Reinvestment Risk: A short reversion (within a week) poses minimal risk, while longer durations increase the opportunity cost of better returns elsewhere.

  2. Risk Perception: I have a high tolerance for market volatility and can handle fluctuations without significant emotional impact. I do ask myself whether I would lose sleep over substantial market movements; my answer: not typically. In the context of this model, I consider my risk perception to be low.

  3. Confidence: When data and analytics fail to provide insight, I tend to follow my gut, although I must guard against excessive risk-taking and ensure my market calls are sound and rooted in fundamentals. One question I struggle with: how do I make better decisions without complete information, and is there a principled approach?

  4. Investment/Trading Knowledge: My trading knowledge, especially in commodities, is less than that of seasoned professionals, so I must carefully balance risk with my level of expertise. That said, it does not mean I lack knowledge of how markets work.

Here is a visual of how I understand and navigate my risk appetite. The diagram is not exact in its numbers but gives a general sense of where I stand. Based on this, I chose a stop loss of -10%.

Code
# Install and load the fmsb package
if (!require(fmsb)) install.packages("fmsb")
library(fmsb)

# Define the categories and scores
categories <- c("Time Horizon & Reinvestment Risk", 
                "Risk Perception", 
                "Confidence in Decision-Making", 
                "Investment/Trading Knowledge")

# Create a data frame with scores (fmsb::radarchart expects row 1 = max, row 2 = min)
scores <- data.frame(
  `Time Horizon & Reinvestment Risk` = c(10, 0, 7),  # Max, Min, Self-Assessment
  `Risk Perception` = c(10, 0, 9),
  `Confidence in Decision-Making` = c(10, 0, 4),
  `Investment/Trading Knowledge` = c(10, 0, 5),
  check.names = FALSE  # keep the descriptive column names as axis labels
)

# Add row names for clarity
rownames(scores) <- c("Max", "Min", "Self_Assessment")

# Create the radar chart
radarchart(scores, 
           axistype = 1,             # Axis type
           pcol = "blue",            # Line color
           pfcol = rgb(0.1, 0.2, 0.8, 0.3), # Fill color with transparency
           plwd = 2,                 # Line width
           cglcol = "grey",          # Grid line color
           cglty = 1,                # Grid line type
           cglwd = 0.8,              # Grid line width
           vlcex = 0.8               # Label text size
)
title(main = "Risk Appetite Assessment", col.main = "darkblue", font.main = 2, cex.main = 1.5)

Model performance

The graph below (cumulative return of all permutations vs. spread weight) highlights that the model performs well at a spread weight of 0.75 and above. As the Baltic weight increases the model performs better, and when the Baltic weight is zero the model is not as good. At this point we realize that choosing the Baltic index as a fundamental factor adds value to the model. Therefore, a risk profile with a spread weight of 0.85 (Baltic = 0.15) will be used as a sample to check the spread of performance against other factors. Moreover, being cautious, we recognize that there are risks associated with such a measurement and will also analyze what the risk looks like in more detail.

Understanding the risk!

Distributions with low lower_standard_deviations are concentrated around zero, with negative skewness, as seen in the graph below. Even if a model shows high returns with a low lower_standard_deviation (as in the two-dimensional dotted plot for spreadweight you just saw), it does not guarantee consistency, due to the concentration of returns around zero and the low probability of extremely high returns, which can distort the Omega ratio (shown in a bit). For instance, when the Baltic weight is 0.4 and lower_STD is -1, there is a very high positive return with low probability, which distorts the Omega ratio. For this model the focus is on positively skewed distributions with higher lower_standard_deviations, to improve risk-adjusted performance.

Low Consistency and High Volatility: The presence of rare, large positive returns (even if they are substantial in magnitude) introduces a disproportionate positive skew in the performance data, leading to an overstatement of the strategy’s long-term profitability. However, the strategy may suffer from low consistency in more typical scenarios, with returns clustered around zero or exhibiting negative skewness. This can result in a high risk of drawdowns despite an attractive Omega ratio (discussed next), especially when the underlying returns are concentrated in a narrow range, such as peaked distributions (on positive returns) with negative skewness (on negative returns). Such a distribution is indicative of high concentration risk rather than true efficiency in the strategy. Failing to recognize this and relying on ordinary risk measures like Sharpe (or even Omega) can bring the elephant to its knees. Coincidentally, this particular model is dealing with exactly this issue.

Without looking at this graph, a spread weight of 0.8 (Baltic weight 0.2) seems to provide the most desirable return profile, with lower downside and higher upside returns. However, we know that standard deviations below -1 have a very low-probability high/low return distribution (you will see next how and why). The Omega ratio cannot detect it because it is essentially the sum of all profits divided by the sum of all losses given a threshold (in our case 4%). There might be a very high return occurring once (e.g., with a spread weight of 0.7) that would dwarf smaller, consistent losses, and from a risk management perspective we do not want that. With this knowledge we can make a more informed decision.

Statistical Sensitivity and Misleading Optimism: The Omega ratio’s sensitivity to tail events means that strategies relying on outlier-driven returns will distort the signal-to-noise ratio of risk-adjusted performance. In practical terms, this could indicate that a strategy is superior in terms of its risk-adjusted return, when, in fact, it is simply relying on an asymmetric payoff structure that is unlikely to be repeatable. The risk of non-normality and heavy tails often leads to incorrect inferences about the distribution’s predictability and sustainability. All the Omega ratios with attractive returns are associated with very large up/down STDs (mostly ±2), and we just learned that returns with large STDs are not consistent.

Omega Ratio and Distribution Shape: The Omega ratio does not account for the underlying shape of the return distribution, focusing primarily on the ratio of gains above a defined threshold relative to losses below it. As such, it remains agnostic to the frequency and magnitude of these gains and losses, and is particularly vulnerable to distortion from heavy tails or outliers in the return distribution.

Impact of Rare Outliers: In scenarios where the return distribution is highly skewed with a small probability of extreme positive outliers (i.e., Pareto-like tail behavior), the Omega ratio can be disproportionately influenced by these rare occurrences. This results in a hyperbolic inflation of the ratio, which masks the volatility clustering and mean reversion that may characterize the bulk of the return series. Consequently, a strategy exhibiting such distributional characteristics may appear far more attractive in terms of the Omega ratio than it is in practice, as the ratio is driven by non-representative events that are not indicative of the strategy’s day-to-day performance.
So what to do? Let’s move on to the next step.
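To make the definition used in this section concrete, here is a minimal sketch of the Omega ratio as described above (sum of gains above a threshold divided by the sum of shortfalls below it); the 4% threshold matches the one mentioned earlier, and the return series passed in is only illustrative. `PerformanceAnalytics::Omega()` offers a packaged alternative.

Code
# Omega ratio: sum of excess gains above the threshold over sum of shortfalls below it
omega_ratio <- function(returns, threshold = 0.04) {
  returns <- na.omit(returns)
  gains   <- sum(pmax(returns - threshold, 0))
  losses  <- sum(pmax(threshold - returns, 0))
  gains / losses
}

# Illustrative call on a realized return series (e.g. the sanity-check run above)
omega_ratio(check_run$ret, threshold = 0.04)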

Code
heatmap_omega

Code
fig02

Therefore, a direct comparison of the risk profile (VaR) as a function of lower_STD for varying spreadweight allocations is required. This helps identify how different spread weights, such as 0.8, which seemed optimal initially, perform under different levels of risk. Referring to the graph below, one consistent pattern is that at low lower_STD values (mostly around -1) the VaR is consistent across different spread weights at that level of STD. In other words, with a constraint of lower_STD around -1 I might do well, but if things hit the fan I will be the one ending up shirtless. The reason there is no VaR for values of lower_STD below -1 is that in those scenarios the model did not execute any trades, and in the ones where it did trade (e.g., lower_STD of -1) it had a disastrous loss.
Therefore, from an analysis considering both VaR and Omega, we conclude that a spread weight of 0.5, evaluated with VaR at a 90% CI, a lower_STD of -0.7, and an upper_STD of 1.3, looks appropriate. Next we will test it on the other dataset.
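As a companion to the faceted plot below, here is a minimal sketch of a historical VaR at the 90% confidence level (the 10th percentile of realized returns); the series used is illustrative rather than the actual parameter-sweep output.

Code
# Historical VaR at a given confidence level: the (1 - conf) quantile of returns
hist_var <- function(returns, conf = 0.90) {
  quantile(na.omit(returns), probs = 1 - conf, names = FALSE)
}

hist_var(check_run$ret, conf = 0.90)

# PerformanceAnalytics provides an equivalent:
# PerformanceAnalytics::VaR(check_run$ret, p = 0.90, method = "historical")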

Code
VaR_plot_facet

The graphs below use the chosen constraints; the first shows the test model and the second the training model.

How good are my returns? Can I rely on them?

Code
strategy_test <- function(data = train,
                          par1value = 0.7,
                          par2value = -0.7,
                          spreadweight = 0.7,
                          balticweight = 0.3,
                          stop_loss_threshold = -0.1) {  # Stop-loss threshold (-10%)

  data <- data %>% 
    select(
      Date,
      spreadF,
      return_spreadF,
      normalized_z_detrended_spreadF,
      normalized_z_index_baltic
    ) %>%
    mutate(
      # Composite indicator using weighted spread and Baltic changes
      composite_indicator = spreadweight * normalized_z_detrended_spreadF + balticweight * normalized_z_index_baltic,
      
      # Generate trade signals based on the composite indicator
      signal = dplyr::case_when(
        composite_indicator < par2value ~ 1,  # Buy signal
        composite_indicator > par1value ~ -1, # Sell signal
        TRUE ~ 0  # No trade signal
      )
    )
  
  # Trade orders based on the change in signal
  withtrades <- data %>%
    dplyr::mutate(
      trade = tidyr::replace_na(dplyr::lag(signal) - dplyr::lag(signal, n = 2L), 0)
    )
  
  # Track positions and calculate returns
  withposition <- withtrades %>%
    mutate(
      pos = cumsum(trade),  # Cumulative position based on trade signals
      
      # Track entry price when trade is executed
      entry_price = ifelse(trade != 0, spreadF, NA),
      
      # Initialize a column to track the stop-loss condition
      stop_loss_triggered = FALSE,
      
      # For each trade, checking if stop-loss is triggered
      stop_loss = ifelse(
        !is.na(entry_price) & pos != 0, 
        abs((spreadF - entry_price) / entry_price) >= stop_loss_threshold, 
        FALSE
      ),
      
      # Update the trade position based on stop-loss logic
      adjusted_trade = ifelse(stop_loss, 0, trade),
      
      # Calculate returns (new, existing, or stopped out)
      ret_new = ifelse(adjusted_trade != 0, adjusted_trade * return_spreadF, 0),
      
      ret_exist = ifelse(adjusted_trade == 0 & pos != 0, pos * return_spreadF, 0),
      
      # Total return considering new, existing, and stop-loss trades
      ret = ret_new + ret_exist,
      ret = ret - abs(adjusted_trade) * 0.01  # Transaction cost
    )
  
  # Calculate cumulative equity (cumulative product of total returns)
  final_test <- withposition %>%
    dplyr::mutate(
      cumeq = cumprod(1 + ret)  # Cumulative return
    )
  
  return(final_test)
}

# Create a new dataset with cleaned column names for final_test
final_clean <- strategy_test(train)  # Assuming you use 'train' here
colnames(final_clean) <- gsub("_test$", "", colnames(final_clean))  # Clean column names

# Detect available cores and register the parallel cluster
library(doParallel)  # loads foreach and parallel, needed for %dopar% below
n_cores <- detectCores() - 1
cl <- makeCluster(n_cores)
registerDoParallel(cl)

# Define the updated optimize_strategy function for `final_test`
optimize_strategy_test <- function(test_data, par1value, par2value, spreadweight, balticweight) {
  
  balticweight <- 1 - spreadweight  # derived from spreadweight, so the weights sum to 1
  # Run the strategy with the given parameters
  results_test <- strategy_test(test_data, 
                      par1value = par1value, 
                      par2value = par2value, 
                      spreadweight = spreadweight, 
                      balticweight = balticweight)
  
  # Calculate the performance metric (e.g., final cumulative equity)
  performance <- last(results_test$cumeq)  # Assuming 'cumeq' is the final cumulative equity column
  
  return(performance)
}

# Set parameter grid for optimization
par1value <- seq(from = 0, to = 2, by = 0.05)
par2value <- seq(from = -2, to = 0, by = 0.05)
spreadweight <- seq(from = 0.05, to = 1, by = 0.05)
param_grid <- expand.grid(spreadweight = spreadweight, par2value = par2value, par1value = par1value)

# Parallel optimization using foreach for the `final_test` dataset
results_test <- foreach(i = 1:nrow(param_grid), .combine = rbind, .packages = c("dplyr", "tidyr")) %dopar% {
  par1 <- param_grid$par1value[i]
  par2 <- param_grid$par2value[i]
  spreadweight <- param_grid$spreadweight[i]
  
  # Run strategy and evaluate performance on `final_test`
  performance <- optimize_strategy_test(final_clean, 
                                        par1value = par1, 
                                        par2value = par2, 
                                        spreadweight = spreadweight)
  
  # Return results as a data frame
  data.frame(par1value = par1, par2value = par2, spreadweight = spreadweight, performance = performance)
}

# Stop parallel cluster
stopCluster(cl)
  • Overall Percentage Win with set model constraints (during training phase): 56.25% — The strategy achieved positive returns in 56.25% of trades during the training phase. It’s a moderate success rate, indicating the strategy’s consistency during backtesting.

  • Overall Percentage Win with set model constraints (during test phase): 48.84% — This shows a lower success rate in the testing phase, meaning the strategy was less consistent in generating positive returns on unseen data.
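The objects `overall_win_train` and `overall_win_test` printed in the next two chunks are not defined in the code shown; below is a plausible (hypothetical) reconstruction, assuming a "win" is a period in which the strategy has a position and its realized return is positive.

Code
# Hypothetical definition of the win percentage: share of active periods with a positive return
pct_win <- function(returns) {
  active <- returns[!is.na(returns) & returns != 0]  # periods with exposure only
  round(100 * mean(active > 0), 2)
}

overall_win_train <- pct_win(train_strategy$ret)
overall_win_test  <- pct_win(test_strategy$ret)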

Code
cat("Overall Percentage Win (Train):", overall_win_train, "%\n")
Overall Percentage Win (Train): 56.25 %
Code
cat("Overall Percentage Win (Test):", overall_win_test, "%\n")
Overall Percentage Win (Test): 48.83721 %
Code
library(PerformanceAnalytics)
library(xts)


train_returns <- xts(train_strategy$ret, order.by = train_strategy$Date)
test_returns <- xts(test_strategy$ret, order.by = test_strategy$Date)
Code
PerformanceAnalytics::chart.Drawdown(test_returns, main = "Drawdown Chart (Test Strategy)")

Code
PerformanceAnalytics::chart.Drawdown(train_returns, main = "Drawdown Chart (Train Strategy)")

The absence of persistent drawdowns in both scenarios can largely be attributed to the embedded stop-loss mechanism, which effectively mitigates the occurrence of significant losses. The relatively low frequency of large drawdowns in both scenarios indicates that the model carries a lower overall risk profile. In the training model there is only one pronounced drawdown, but the subsequent recovery is swift, demonstrating the model’s resilience in the face of market shocks. From a personal risk tolerance perspective, such sharp fluctuations should not provoke significant emotional distress as an investor, given their relatively brief duration and the model’s ability to recover quickly.
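As an optional supplement to the drawdown charts above, summary drawdown statistics can be pulled from PerformanceAnalytics using the xts return series already created:

Code
# Worst peak-to-trough loss for each strategy
PerformanceAnalytics::maxDrawdown(train_returns)
PerformanceAnalytics::maxDrawdown(test_returns)

# Tabulate the deepest drawdown episodes (depth, start, trough, recovery dates)
PerformanceAnalytics::table.Drawdowns(test_returns, top = 5)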

Potential Weaknesses and Limitations:

  1. Sensitivity to Model Parameters:
    The effectiveness of the composite indicator depends on the weighting of the spread and Baltic index components. Inappropriate weighting may overemphasize one factor, leading to suboptimal signals. The linear trend line for the spread and Baltic changes is attached in the appendix for reference.

  2. Impact of Roll Adjustments:
    Using front-month continuous contracts adjusted for rollovers introduces a risk of data distortions, particularly during periods of sharp contango or backwardation in the futures curve. These distortions can affect the stationarity assumptions.

  3. Assumption of Constant Market Dynamics:
    The model assumes that the fundamental relationship between Brent and WTI prices and their drivers remains consistent over time. Structural changes in market conditions, such as shifts in global oil production or changes in transportation infrastructure, may weaken the model’s predictive power. In other words, the decay rate of models of this type is high compared to multidimensional pure factor models.

Opportunities for Future Improvement:

  1. Incorporate Volatility Adjustments:
    Adding a volatility-adjusted measure to the composite indicator could help account for periods of market stress, providing additional safeguards against unexpected deviations.

  2. Expand Fundamental Drivers:
    Including other fundamental factors such as refinery demand, storage capacity, or geopolitical risk indices could enhance the explanatory power of the model.

  3. Backtesting Across Multiple Regimes:
    Conducting extensive backtesting across various market regimes (e.g., high volatility, geopolitical tensions) would provide deeper insights into the model’s resilience and adaptability.

  4. Dynamic Weighting Mechanism:
    A dynamic adjustment of the weights for the spread and Baltic index components could allow the model to better adapt to changing market conditions, improving performance over time.

Conclusion:

The developed pairs trading strategy for the Brent-WTI spread demonstrates a robust and disciplined framework for statistical arbitrage. Its strengths lie in combining stationarity-based statistical insights with a fundamental market driver, providing a reliable foundation for mean-reversion trades. The built-in risk management framework—comprising stop-loss mechanisms and continuous monitoring—further enhances its practical applicability by safeguarding against excessive losses.

However, the model has limitations, including sensitivity to parameter choices, exclusion of exogenous risks, and potential vulnerability to structural market changes. Future iterations could address these challenges by incorporating dynamic weighting, volatility adjustments, and expanded fundamental drivers.

Ultimately, the model is a powerful tool for market-neutral trading in the crude oil market, capable of capturing short-term mispricings while maintaining a strong focus on capital preservation and disciplined risk management.

Learning Outcomes

1. Understanding Risk Appetite

I developed a nuanced understanding of risk appetite, recognizing its critical role in shaping trading strategies and decision-making. This involved identifying tolerance levels and aligning them with practical, executable logic.

2. Codifying Logic and Testing with Historical Data

The process of transforming my views into a structured logic code and testing it with historical data was an enriching yet challenging experience. Algorithmic trading fundamentally revolves around codifying logic into a specific set of processes while adhering to well-defined constraints. Through this, I realized that designing a robust solution is as intellectually demanding as coding it. The iterative process was often frustrating but ultimately rewarding, as it deepened my understanding of systematic problem-solving.

3. Iterative Development Approach

Rather than attempting to build the entire model at once, I adopted an iterative approach akin to agile methodology. I started with a simple foundation and progressively added features as needed, such as incorporating transaction costs and using both VaR and Omega rather than Omega alone. This approach allowed for continuous refinement, where elements that made sense were retained, while others were removed or adjusted based on outcomes.

4. Organizing Code for Efficiency

One of the key lessons I learned was the importance of organizing code effectively. Much like maintaining an orderly house, having well-structured and logically arranged code significantly reduces time spent searching for functions or debugging issues. This practice enhanced my productivity and allowed me to focus more on improving the model’s performance.

5. Understanding Key Metrics: Omega and VaR

This project introduced me to financial risk metrics such as Omega and Value at Risk (VaR), both of which I had not utilized before. Exploring these metrics enhanced my understanding of their applications in evaluating the performance and risk of trading strategies.

Appendix

Code
spread_changeplot

Footnotes

  1. Linear regression graph of the two indicators is attached in the appendix.

  2. Spread weight + baltic weight = 1