BTC Bridge: Multi-Exchange Decomposition Feature Engineering

Author

Adrian Aziza

library(fpp3)
library(tidyverse)
library(lubridate)
library(patchwork)
library(knitr)

1 1. Concept: The Collapse Layer

The collapse layer is the foundation of multi-feed time series feature engineering:

Define a single temporal grid (e.g., 1-minute bars)
Enforce a consistent missing-data policy (forward-fill, interpolate, or drop)
Align all feeds to this grid
Generate features on the aligned data

This enables: - Time-aligned feature matrices (X_t) where each row is a timestamp - Cross-feed analysis (correlations, spreads, leads/lags) - Decomposition-based features (trend, seasonal, remainder)

2 2. Simulated Multi-Exchange BTC Data

Since we don’t have live API access in this environment, we’ll simulate realistic multi-exchange data with: - Different sampling frequencies - Missing timestamps (gaps) - Noise and drift between exchanges

set.seed(42)

# Base BTC price signal (shared trend)
start_time <- ymd_hms("2024-01-01 00:00:00")
n_minutes <- 1440  # 24 hours of minute data

base_grid <- tibble(
  timestamp = start_time + minutes(0:(n_minutes-1)),
  base_price = 42000 + cumsum(rnorm(n_minutes, mean = 0.5, sd = 50))
)

# Exchange 1: Binance (complete data, slight premium)
binance <- base_grid %>%
  mutate(
    exchange = "binance",
    price = base_price * 1.002 + rnorm(n(), 0, 10)  # 0.2% premium + noise
  ) %>%
  select(timestamp, exchange, price)

# Exchange 2: Coinbase (90% completeness, slight discount)
coinbase <- base_grid %>%
  sample_frac(0.90) %>%  # Missing 10% of timestamps
  mutate(
    exchange = "coinbase",
    price = base_price * 0.998 + rnorm(n(), 0, 15)  # 0.2% discount + more noise
  ) %>%
  select(timestamp, exchange, price)

# Exchange 3: Kraken (85% completeness, variable spread)
kraken <- base_grid %>%
  sample_frac(0.85) %>%  # Missing 15% of timestamps
  mutate(
    exchange = "kraken",
    price = base_price * (1 + rnorm(n(), 0, 0.003)) + rnorm(n(), 0, 20)  # Variable spread
  ) %>%
  select(timestamp, exchange, price)

# Combine all exchanges
raw_data <- bind_rows(binance, coinbase, kraken)

# Summary
raw_data %>%
  count(exchange) %>%
  kable(caption = "Raw data counts by exchange (before alignment)")

Raw data counts by exchange (before alignment)
exchange	n
binance	1440
coinbase	1296
kraken	1224

3 3. The Collapse Layer: Grid Alignment

# Step 1: Define the master temporal grid (every minute)
master_grid <- tibble(
  timestamp = start_time + minutes(0:(n_minutes-1))
)

# Step 2: Pivot to wide format (one column per exchange)
wide_data <- raw_data %>%
  pivot_wider(
    names_from = exchange,
    values_from = price,
    names_prefix = "price_"
  )

# Step 3: Align to master grid and handle missing data
aligned_data <- master_grid %>%
  left_join(wide_data, by = "timestamp") %>%
  # Forward-fill missing values (last observation carried forward)
  fill(starts_with("price_"), .direction = "down") %>%
  # Drop any remaining leading NAs
  drop_na()

# Verify alignment
cat("Master grid rows:", nrow(master_grid), "\n")

Master grid rows: 1440

cat("Aligned data rows:", nrow(aligned_data), "\n")

Aligned data rows: 1440

cat("Missing values:\n")

Missing values:

aligned_data %>% 
  summarise(across(starts_with("price_"), ~sum(is.na(.)))) %>%
  kable()

price_binance	price_coinbase	price_kraken
0	0	0

# Visualize the aligned price feeds
aligned_data %>%
  pivot_longer(
    cols = starts_with("price_"),
    names_to = "exchange",
    values_to = "price",
    names_prefix = "price_"
  ) %>%
  ggplot(aes(x = timestamp, y = price, color = exchange)) +
  geom_line(alpha = 0.7) +
  labs(
    title = "Aligned BTC Prices Across Exchanges (Post-Collapse)",
    x = "Time",
    y = "Price (USD)",
    color = "Exchange"
  ) +
  theme_minimal()

4 4. STL Decomposition on Each Feed

Now we decompose each exchange feed into trend + seasonal + remainder components.

# Convert to tsibble format (required for fable/feasts)
# We'll use 15-minute aggregation for seasonality detection
btc_ts <- aligned_data %>%
  mutate(
    time_15min = floor_date(timestamp, "15 minutes")
  ) %>%
  group_by(time_15min) %>%
  summarise(
    across(starts_with("price_"), mean, na.rm = TRUE)
  ) %>%
  ungroup() %>%
  as_tsibble(index = time_15min)

# Verify tsibble properties
cat("Tsibble interval:", format(interval(btc_ts)), "\n")

Tsibble interval: 15m

cat("Has gaps:", has_gaps(btc_ts)$.gaps, "\n")

Has gaps: FALSE

# Decompose each exchange feed using STL
# Note: We need at least 2 full seasonal periods for STL
# With 15-min bars over 24 hours, we have 96 observations
# We'll use a 4-hour seasonal window (16 bars)

decomp_models <- btc_ts %>%
  pivot_longer(
    cols = starts_with("price_"),
    names_to = "exchange",
    values_to = "price",
    names_prefix = "price_"
  ) %>%
  as_tsibble(key = exchange, index = time_15min) %>%
  model(
    stl = STL(price ~ trend(window = 21) + season(window = "periodic"))
  )

# Extract components
decomp_components <- decomp_models %>%
  components()

# View structure
decomp_components %>%
  head(10) %>%
  kable(digits = 2, caption = "Sample STL components (first 10 rows)")

Sample STL components (first 10 rows)
exchange	.model	time_15min	price	trend	season_hour	remainder	season_adjust
binance	stl	2024-01-01 00:00:00	42309.90	42320.97	-9.68	-1.40	42319.57
binance	stl	2024-01-01 00:15:00	42278.66	42289.26	1.41	-12.01	42277.25
binance	stl	2024-01-01 00:30:00	42137.62	42257.55	20.91	-140.84	42116.71
binance	stl	2024-01-01 00:45:00	42053.16	42225.84	-12.63	-160.04	42065.80
binance	stl	2024-01-01 01:00:00	42179.54	42192.25	-9.68	-3.03	42189.22
binance	stl	2024-01-01 01:15:00	42273.51	42158.66	1.41	113.44	42272.10
binance	stl	2024-01-01 01:30:00	42376.58	42125.07	20.91	230.61	42355.68
binance	stl	2024-01-01 01:45:00	42362.75	42093.26	-12.63	282.12	42375.39
binance	stl	2024-01-01 02:00:00	42056.52	42061.46	-9.68	4.74	42066.20
binance	stl	2024-01-01 02:15:00	41997.11	42029.65	1.41	-33.94	41995.70

5 5. Visualize Decompositions

# Plot all three exchange decompositions
decomp_components %>%
  autoplot() +
  facet_wrap(~ exchange, ncol = 1, scales = "free_y") +
  labs(title = "STL Decomposition: All Exchanges") +
  theme_minimal()

6 6. Feature Matrix Construction

Extract features from each component and create the time-aligned feature matrix (X_t).

# Check what columns STL produced
cat("STL decomposition columns:\n")

STL decomposition columns:

cat(paste(colnames(decomp_components), collapse = "\n"))

exchange
.model
time_15min
price
trend
season_hour
remainder
season_adjust

cat("\n\n")

# Show sample of decomposition
decomp_components %>%
  head(3) %>%
  kable(digits = 2, caption = "Sample decomposition output")

Sample decomposition output
exchange	.model	time_15min	price	trend	season_hour	remainder	season_adjust
binance	stl	2024-01-01 00:00:00	42309.90	42320.97	-9.68	-1.40	42319.57
binance	stl	2024-01-01 00:15:00	42278.66	42289.26	1.41	-12.01	42277.25
binance	stl	2024-01-01 00:30:00	42137.62	42257.55	20.91	-140.84	42116.71

# First, let's see what columns we have
decomp_df <- decomp_components %>% as_tibble()
cat("Available columns:\n")

Available columns:

print(colnames(decomp_df))

[1] "exchange"      ".model"        "time_15min"    "price"        
[5] "trend"         "season_hour"   "remainder"     "season_adjust"

cat("\n")

# Get the seasonal column name (it varies based on the period)
seasonal_cols <- colnames(decomp_df)[grepl("season", colnames(decomp_df))]
seasonal_col <- seasonal_cols[1]  # Take the first seasonal column
cat("Seasonal column found:", seasonal_col, "\n\n")

Seasonal column found: season_hour

# Extract trend features: level, slope (first difference), curvature (second difference)
trend_features <- decomp_df %>%
  group_by(exchange) %>%
  arrange(time_15min) %>%
  mutate(
    trend_level = trend,
    trend_slope = trend - lag(trend, default = first(trend)),
    trend_curvature = trend_slope - lag(trend_slope, default = first(trend_slope))
  ) %>%
  ungroup() %>%
  select(time_15min, exchange, trend_level, trend_slope, trend_curvature)

# Extract seasonal and remainder features
# Use the seasonal column we found dynamically
seasonal_remainder <- decomp_df %>%
  select(time_15min, exchange, season_component = all_of(seasonal_col), remainder)

# Combine all features
features_long <- trend_features %>%
  left_join(seasonal_remainder, by = c("time_15min", "exchange"))

# Pivot to wide format (one column per exchange-feature combination)
feature_matrix <- features_long %>%
  pivot_wider(
    names_from = exchange,
    values_from = c(trend_level, trend_slope, trend_curvature, season_component, remainder),
    names_sep = "_"
  ) %>%
  arrange(time_15min)

# Display feature matrix structure
cat("Feature matrix dimensions:", nrow(feature_matrix), "x", ncol(feature_matrix), "\n\n")

Feature matrix dimensions: 96 x 16

cat("Feature columns:\n")

Feature columns:

colnames(feature_matrix) %>% head(10) %>% cat(sep = "\n")

time_15min
trend_level_binance
trend_level_coinbase
trend_level_kraken
trend_slope_binance
trend_slope_coinbase
trend_slope_kraken
trend_curvature_binance
trend_curvature_coinbase
trend_curvature_kraken

# Show sample
feature_matrix %>%
  head(5) %>%
  select(1:8) %>%  # First 8 columns only for display
  kable(digits = 3, caption = "Sample feature matrix (first 5 rows, 8 cols)")

Sample feature matrix (first 5 rows, 8 cols)
time_15min	trend_level_binance	trend_level_coinbase	trend_level_kraken	trend_slope_binance	trend_slope_coinbase	trend_slope_kraken	trend_curvature_binance
2024-01-01 00:00:00	42320.97	42158.79	42261.32	0.000	0.000	0.000	0.000
2024-01-01 00:15:00	42289.26	42126.31	42226.80	-31.712	-32.475	-34.518	-31.712
2024-01-01 00:30:00	42257.55	42093.84	42192.28	-31.712	-32.475	-34.518	0.000
2024-01-01 00:45:00	42225.84	42061.36	42157.76	-31.712	-32.475	-34.518	0.000
2024-01-01 01:00:00	42192.25	42027.08	42121.32	-33.589	-34.282	-36.444	-1.878

7 7. Cross-Feed Analysis: Spreads and Correlations

# Calculate exchange spreads (price differences)
spread_data <- aligned_data %>%
  mutate(
    spread_binance_coinbase = price_binance - price_coinbase,
    spread_binance_kraken = price_binance - price_kraken,
    spread_coinbase_kraken = price_coinbase - price_kraken
  )

# Visualize spreads
spread_data %>%
  select(timestamp, starts_with("spread_")) %>%
  pivot_longer(-timestamp, names_to = "spread", values_to = "value") %>%
  ggplot(aes(x = timestamp, y = value, color = spread)) +
  geom_line(alpha = 0.7) +
  labs(
    title = "Inter-Exchange Price Spreads",
    x = "Time",
    y = "Spread (USD)",
    color = "Spread"
  ) +
  theme_minimal()

# Calculate rolling correlations between exchanges
library(zoo)

# 60-minute rolling window correlations
roll_window <- 60

correlations <- aligned_data %>%
  mutate(
    corr_binance_coinbase = rollapply(
      cbind(price_binance, price_coinbase),
      width = roll_window,
      FUN = function(x) cor(x[,1], x[,2]),
      by.column = FALSE,
      fill = NA,
      align = "right"
    ),
    corr_binance_kraken = rollapply(
      cbind(price_binance, price_kraken),
      width = roll_window,
      FUN = function(x) cor(x[,1], x[,2]),
      by.column = FALSE,
      fill = NA,
      align = "right"
    ),
    corr_coinbase_kraken = rollapply(
      cbind(price_coinbase, price_kraken),
      width = roll_window,
      FUN = function(x) cor(x[,1], x[,2]),
      by.column = FALSE,
      fill = NA,
      align = "right"
    )
  ) %>%
  drop_na()

# Visualize rolling correlations
correlations %>%
  select(timestamp, starts_with("corr_")) %>%
  pivot_longer(-timestamp, names_to = "pair", values_to = "correlation") %>%
  ggplot(aes(x = timestamp, y = correlation, color = pair)) +
  geom_line(alpha = 0.7) +
  labs(
    title = paste("Rolling Correlation (", roll_window, "-minute window)"),
    x = "Time",
    y = "Correlation",
    color = "Exchange Pair"
  ) +
  ylim(0.9, 1.0) +
  theme_minimal()

8 8. ACF/PACF Analysis on Remainder Components

# ACF/PACF on remainder to check for remaining autocorrelation
remainder_data <- decomp_components %>%
  as_tibble() %>%
  select(time_15min, exchange, remainder) %>%
  as_tsibble(key = exchange, index = time_15min)

# Plot ACF for each exchange
p_acf <- remainder_data %>%
  ACF(remainder, lag_max = 48) %>%
  autoplot() +
  facet_wrap(~ exchange, ncol = 1) +
  labs(title = "ACF of Remainder (checking for leftover structure)") +
  theme_minimal()

# Plot PACF for each exchange
p_pacf <- remainder_data %>%
  PACF(remainder, lag_max = 48) %>%
  autoplot() +
  facet_wrap(~ exchange, ncol = 1) +
  labs(title = "PACF of Remainder") +
  theme_minimal()

p_acf / p_pacf

9 9. Export Feature Matrix for Modeling

# Export the feature matrix for downstream modeling
library(writexl)

write_xlsx(feature_matrix, "btc_feature_matrix.xlsx")
write.csv(feature_matrix, "btc_feature_matrix.csv", row.names = FALSE)

cat("Feature matrix exported to:\n")

Feature matrix exported to:

cat("- btc_feature_matrix.xlsx\n")

- btc_feature_matrix.xlsx

cat("- btc_feature_matrix.csv\n\n")

- btc_feature_matrix.csv

cat("Ready for:\n")

Ready for:

cat("- ML models (regression, classification, reinforcement learning)\n")

- ML models (regression, classification, reinforcement learning)

cat("- ARIMA/VAR models on trend components\n")

- ARIMA/VAR models on trend components

cat("- Volatility forecasting on remainder\n")

- Volatility forecasting on remainder

cat("- Spread/arbitrage signal generation\n")

- Spread/arbitrage signal generation

10 10. Summary: The Complete Pipeline

10.1 What We Built

Multi-source data ingestion (simulated 3 exchanges)
Collapse layer (alignment to master grid + forward-fill)
STL decomposition (trend + seasonal + remainder per exchange)
Feature engineering (trend level/slope/curvature, seasonal, remainder)
Cross-feed analysis (spreads, rolling correlations)
Persistence diagnostics (ACF/PACF on remainder)
Time-aligned feature matrix (X_t) (ready for modeling)

10.2 Why This Matters

Traditional approaches: - Use raw prices directly → noisy, non-stationary - Ignore cross-feed structure → miss arbitrage opportunities - No systematic decomposition → hard to interpret model features

This pipeline: - ✅ Separates signal (trend/seasonal) from noise (remainder) - ✅ Creates interpretable features (trend slope = momentum) - ✅ Enables multi-timescale analysis (trend for position, remainder for risk) - ✅ Generalizes to any multi-feed scenario (crypto indices, FX, commodities)

10.3 Next Steps

Feature selection: Use feature importance (RF, LASSO) on (X_t)
Forecasting: Build ARIMAX/VAR models on trend components
Regime detection: Cluster on remainder variance (high vol = regime shift)
Signal generation: Spread crossovers, trend slope thresholds
Backtesting: Simulate trades using feature-based rules

10.4 Connection to Discussion 3

This is the direct extension of the PAYNSA decomposition work:

PAYNSA (Single Series)	BTC Engine (Multi-Feed)
1 series (employment)	N series (exchanges)
Monthly frequency	Minute/tick frequency
Additive vs multiplicative	Additive (price levels)
Forecast next value	Generate trading signals
Excel verification	Production pipeline

Core insight remains the same: Decomposition transforms raw data into structured features that are easier to model, interpret, and act upon.