library(fpp3)
library(tidyverse)
library(lubridate)
library(patchwork)
library(knitr)
BTC Bridge: Multi-Exchange Decomposition Feature Engineering
1. Concept: The Collapse Layer
The collapse layer is the foundation of multi-feed time series feature engineering:
- Define a single temporal grid (e.g., 1-minute bars)
- Enforce a consistent missing-data policy (forward-fill, interpolate, or drop)
- Align all feeds to this grid
- Generate features on the aligned data
This enables:
- Time-aligned feature matrices (X_t) where each row is a timestamp
- Cross-feed analysis (correlations, spreads, leads/lags)
- Decomposition-based features (trend, seasonal, remainder)
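The three missing-data policies behave differently on the same gaps. As a minimal sketch, assuming a made-up six-bar feed (the `toy` tibble and its values are illustrative, not taken from the data used later) and using `zoo::na.locf()` and `zoo::na.approx()` as one possible way to implement forward-fill and interpolation:

```r
library(dplyr)
library(tidyr)
library(zoo)

# Toy feed on a 1-minute grid with three missing bars (values are made up)
toy <- tibble(
  minute = 0:5,
  price  = c(100, NA, 104, NA, NA, 110)
)

policies <- toy %>%
  mutate(
    # Forward-fill: carry the last observed price into each gap
    ffill  = zoo::na.locf(price, na.rm = FALSE),
    # Interpolate: draw a straight line between the neighbouring observations
    interp = zoo::na.approx(price, na.rm = FALSE)
  )

# Drop: keep only the bars that were actually observed
dropped <- toy %>% drop_na(price)

print(policies)
print(dropped)
```

Interpolation uses the next observed value, so in a live setting it leaks future information into the current bar; forward-fill does not, which is one reason to prefer it for features that feed a trading model.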
2. Simulated Multi-Exchange BTC Data
Since we don’t have live API access in this environment, we’ll simulate realistic multi-exchange data with:
- Different sampling frequencies
- Missing timestamps (gaps)
- Noise and drift between exchanges
set.seed(42)
# Base BTC price signal (shared trend)
start_time <- ymd_hms("2024-01-01 00:00:00")
n_minutes <- 1440 # 24 hours of minute data
base_grid <- tibble(
timestamp = start_time + minutes(0:(n_minutes-1)),
base_price = 42000 + cumsum(rnorm(n_minutes, mean = 0.5, sd = 50))
)
# Exchange 1: Binance (complete data, slight premium)
binance <- base_grid %>%
mutate(
exchange = "binance",
price = base_price * 1.002 + rnorm(n(), 0, 10) # 0.2% premium + noise
) %>%
select(timestamp, exchange, price)
# Exchange 2: Coinbase (90% completeness, slight discount)
coinbase <- base_grid %>%
sample_frac(0.90) %>% # Missing 10% of timestamps
mutate(
exchange = "coinbase",
price = base_price * 0.998 + rnorm(n(), 0, 15) # 0.2% discount + more noise
) %>%
select(timestamp, exchange, price)
# Exchange 3: Kraken (85% completeness, variable spread)
kraken <- base_grid %>%
sample_frac(0.85) %>% # Missing 15% of timestamps
mutate(
exchange = "kraken",
price = base_price * (1 + rnorm(n(), 0, 0.003)) + rnorm(n(), 0, 20) # Variable spread
) %>%
select(timestamp, exchange, price)
# Combine all exchanges
raw_data <- bind_rows(binance, coinbase, kraken)
# Summary
raw_data %>%
count(exchange) %>%
kable(caption = "Raw data counts by exchange (before alignment)")
| exchange | n |
|---|---|
| binance | 1440 |
| coinbase | 1296 |
| kraken | 1224 |
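The counts match the sampling fractions (0.90 × 1440 = 1296 and 0.85 × 1440 = 1224), but counts alone do not show how long individual gaps are. A rough sketch of a gap-length check, assuming a feed thinned the same way as the simulated Coinbase data (the names `grid`, `observed`, and `step_min` are illustrative, and the random draw will not reproduce the exact rows sampled above):

```r
set.seed(42)

# One-minute grid for 24 hours
grid <- as.POSIXct("2024-01-01 00:00:00", tz = "UTC") + 60 * (0:1439)

# Keep a random 90% of the timestamps, in time order
keep <- sort(sample(length(grid), size = round(0.90 * length(grid))))
observed <- grid[keep]

# Minutes between consecutive observations: 1 means no gap, k means k - 1 missing bars
step_min <- as.numeric(diff(observed), units = "mins")

gap_table <- table(step_min)
print(gap_table)

longest_gap <- max(step_min) - 1
cat("Longest run of missing bars between observations:", longest_gap, "\n")
```

Long runs matter more than the overall missing rate, because forward-fill holds a single stale price across the whole run.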
3. The Collapse Layer: Grid Alignment
# Step 1: Define the master temporal grid (every minute)
master_grid <- tibble(
timestamp = start_time + minutes(0:(n_minutes-1))
)
# Step 2: Pivot to wide format (one column per exchange)
wide_data <- raw_data %>%
pivot_wider(
names_from = exchange,
values_from = price,
names_prefix = "price_"
)
# Step 3: Align to master grid and handle missing data
aligned_data <- master_grid %>%
left_join(wide_data, by = "timestamp") %>%
# Forward-fill missing values (last observation carried forward)
fill(starts_with("price_"), .direction = "down") %>%
# Drop any remaining leading NAs
drop_na()
# Verify alignment
cat("Master grid rows:", nrow(master_grid), "\n")
Master grid rows: 1440
cat("Aligned data rows:", nrow(aligned_data), "\n")
Aligned data rows: 1440
cat("Missing values:\n")
Missing values:
aligned_data %>%
summarise(across(starts_with("price_"), ~sum(is.na(.)))) %>%
kable()
| price_binance | price_coinbase | price_kraken |
|---|---|---|
| 0 | 0 | 0 |
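Forward-filling leaves zero NAs, but it also hides which values are stale. One possible way to keep that information, sketched on a made-up feed (the `was_missing` and `staleness` columns are illustrative additions, not part of the pipeline above):

```r
library(dplyr)
library(tidyr)

# Six-bar master grid and a feed that only reported at minutes 0, 1 and 4
grid <- tibble(minute = 0:5)
feed <- tibble(minute = c(0L, 1L, 4L), price = c(100, 101, 105))

aligned <- grid %>%
  left_join(feed, by = "minute") %>%
  # Record which bars were missing before they are filled
  mutate(was_missing = is.na(price)) %>%
  fill(price, .direction = "down") %>%
  # Each real observation starts a new run; staleness counts bars since that observation
  group_by(run = cumsum(!was_missing)) %>%
  mutate(staleness = row_number() - 1L) %>%
  ungroup() %>%
  select(-run)

print(aligned)
```

A staleness column can be used as a feature in its own right or as a filter, for example by ignoring spreads computed from prices that are more than a few bars old.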
# Visualize the aligned price feeds
aligned_data %>%
pivot_longer(
cols = starts_with("price_"),
names_to = "exchange",
values_to = "price",
names_prefix = "price_"
) %>%
ggplot(aes(x = timestamp, y = price, color = exchange)) +
geom_line(alpha = 0.7) +
labs(
title = "Aligned BTC Prices Across Exchanges (Post-Collapse)",
x = "Time",
y = "Price (USD)",
color = "Exchange"
) +
theme_minimal()
4. STL Decomposition on Each Feed
Now we decompose each exchange feed into trend + seasonal + remainder components.
# Convert to tsibble format (required for fable/feasts)
# We'll use 15-minute aggregation for seasonality detection
btc_ts <- aligned_data %>%
mutate(
time_15min = floor_date(timestamp, "15 minutes")
) %>%
group_by(time_15min) %>%
summarise(
across(starts_with("price_"), ~ mean(.x, na.rm = TRUE)) # lambda form; passing extra args through across() is deprecated
) %>%
ungroup() %>%
as_tsibble(index = time_15min)
# Verify tsibble properties
cat("Tsibble interval:", format(interval(btc_ts)), "\n")
Tsibble interval: 15m
cat("Has gaps:", has_gaps(btc_ts)$.gaps, "\n")
Has gaps: FALSE
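# Decompose each exchange feed using STL
# Note: STL needs at least 2 full seasonal periods
# With 15-min bars over 24 hours, we have 96 observations
# season() is given no explicit period, so the hourly period (4 bars) is used,
# which is why the seasonal column in the output is named season_hour.
# For a 4-hour cycle (16 bars), set it explicitly, e.g. season(period = 16, window = "periodic")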
decomp_models <- btc_ts %>%
pivot_longer(
cols = starts_with("price_"),
names_to = "exchange",
values_to = "price",
names_prefix = "price_"
) %>%
as_tsibble(key = exchange, index = time_15min) %>%
model(
stl = STL(price ~ trend(window = 21) + season(window = "periodic"))
)
# Extract components
decomp_components <- decomp_models %>%
components()
# View structure
decomp_components %>%
head(10) %>%
kable(digits = 2, caption = "Sample STL components (first 10 rows)")
| exchange | .model | time_15min | price | trend | season_hour | remainder | season_adjust |
|---|---|---|---|---|---|---|---|
| binance | stl | 2024-01-01 00:00:00 | 42309.90 | 42320.97 | -9.68 | -1.40 | 42319.57 |
| binance | stl | 2024-01-01 00:15:00 | 42278.66 | 42289.26 | 1.41 | -12.01 | 42277.25 |
| binance | stl | 2024-01-01 00:30:00 | 42137.62 | 42257.55 | 20.91 | -140.84 | 42116.71 |
| binance | stl | 2024-01-01 00:45:00 | 42053.16 | 42225.84 | -12.63 | -160.04 | 42065.80 |
| binance | stl | 2024-01-01 01:00:00 | 42179.54 | 42192.25 | -9.68 | -3.03 | 42189.22 |
| binance | stl | 2024-01-01 01:15:00 | 42273.51 | 42158.66 | 1.41 | 113.44 | 42272.10 |
| binance | stl | 2024-01-01 01:30:00 | 42376.58 | 42125.07 | 20.91 | 230.61 | 42355.68 |
| binance | stl | 2024-01-01 01:45:00 | 42362.75 | 42093.26 | -12.63 | 282.12 | 42375.39 |
| binance | stl | 2024-01-01 02:00:00 | 42056.52 | 42061.46 | -9.68 | 4.74 | 42066.20 |
| binance | stl | 2024-01-01 02:15:00 | 41997.11 | 42029.65 | 1.41 | -33.94 | 41995.70 |
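In the sample rows, trend + season_hour + remainder reproduces price (for the first row, 42320.97 - 9.68 - 1.40 ≈ 42309.90, up to rounding), and the seasonal values repeat every four bars. A small self-contained check of both properties, using base R's `stats::stl()` on a synthetic series rather than the feasts model above (the series and its 4-bar cycle are made up for illustration):

```r
set.seed(1)

# Synthetic series: linear trend + 4-bar cycle + noise, 96 observations
n <- 96
x <- ts(
  100 + 0.1 * seq_len(n) + rep(c(-2, 1, 3, -2), n / 4) + rnorm(n),
  frequency = 4
)

fit <- stl(x, s.window = "periodic")

# fit$time.series has columns: seasonal, trend, remainder
parts <- fit$time.series
recon <- rowSums(parts)

# Additive identity: the three components sum back to the original series
max_err <- max(abs(recon - as.numeric(x)))
cat("Max reconstruction error:", max_err, "\n")

# With a periodic window the seasonal pattern repeats exactly every cycle
seasonal <- as.numeric(parts[, "seasonal"])
print(round(seasonal[1:8], 3))
```

Because the identity holds by construction, the remainder absorbs whatever the trend and seasonal smoothers do not capture, so a large remainder is a statement about the smoother settings as much as about the data.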
5. Visualize Decompositions
# Plot all three exchange decompositions
decomp_components %>%
autoplot() +
facet_wrap(~ exchange, ncol = 1, scales = "free_y") +
labs(title = "STL Decomposition: All Exchanges") +
theme_minimal()
6. Feature Matrix Construction
Extract features from each component and create the time-aligned feature matrix (X_t).
# Check what columns STL produced
cat("STL decomposition columns:\n")
STL decomposition columns:
cat(paste(colnames(decomp_components), collapse = "\n"))
exchange
.model
time_15min
price
trend
season_hour
remainder
season_adjust
cat("\n\n")
# Show sample of decomposition
decomp_components %>%
head(3) %>%
kable(digits = 2, caption = "Sample decomposition output")
| exchange | .model | time_15min | price | trend | season_hour | remainder | season_adjust |
|---|---|---|---|---|---|---|---|
| binance | stl | 2024-01-01 00:00:00 | 42309.90 | 42320.97 | -9.68 | -1.40 | 42319.57 |
| binance | stl | 2024-01-01 00:15:00 | 42278.66 | 42289.26 | 1.41 | -12.01 | 42277.25 |
| binance | stl | 2024-01-01 00:30:00 | 42137.62 | 42257.55 | 20.91 | -140.84 | 42116.71 |
# First, let's see what columns we have
decomp_df <- decomp_components %>% as_tibble()
cat("Available columns:\n")
Available columns:
print(colnames(decomp_df))
[1] "exchange" ".model" "time_15min" "price"
[5] "trend" "season_hour" "remainder" "season_adjust"
cat("\n")
# Get the seasonal column name (it varies based on the period)
seasonal_cols <- colnames(decomp_df)[grepl("season", colnames(decomp_df))]
seasonal_col <- seasonal_cols[1] # Take the first seasonal column
cat("Seasonal column found:", seasonal_col, "\n\n")
Seasonal column found: season_hour
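# Extract trend features: level, slope (first difference), curvature (second difference)
# Note: default = first(...) pads each lag, so the first slope is 0 and the second
# curvature equals the second slope (a padding artifact, not real curvature).
# Consider dropping the first two rows per exchange before modeling.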
trend_features <- decomp_df %>%
group_by(exchange) %>%
arrange(time_15min) %>%
mutate(
trend_level = trend,
trend_slope = trend - lag(trend, default = first(trend)),
trend_curvature = trend_slope - lag(trend_slope, default = first(trend_slope))
) %>%
ungroup() %>%
select(time_15min, exchange, trend_level, trend_slope, trend_curvature)
# Extract seasonal and remainder features
# Use the seasonal column we found dynamically
seasonal_remainder <- decomp_df %>%
select(time_15min, exchange, season_component = all_of(seasonal_col), remainder)
# Combine all features
features_long <- trend_features %>%
left_join(seasonal_remainder, by = c("time_15min", "exchange"))
# Pivot to wide format (one column per exchange-feature combination)
feature_matrix <- features_long %>%
pivot_wider(
names_from = exchange,
values_from = c(trend_level, trend_slope, trend_curvature, season_component, remainder),
names_sep = "_"
) %>%
arrange(time_15min)
# Display feature matrix structure
cat("Feature matrix dimensions:", nrow(feature_matrix), "x", ncol(feature_matrix), "\n\n")
Feature matrix dimensions: 96 x 16
cat("Feature columns:\n")
Feature columns:
colnames(feature_matrix) %>% head(10) %>% cat(sep = "\n")
time_15min
trend_level_binance
trend_level_coinbase
trend_level_kraken
trend_slope_binance
trend_slope_coinbase
trend_slope_kraken
trend_curvature_binance
trend_curvature_coinbase
trend_curvature_kraken
# Show sample
feature_matrix %>%
head(5) %>%
select(1:8) %>% # First 8 columns only for display
kable(digits = 3, caption = "Sample feature matrix (first 5 rows, 8 cols)")
| time_15min | trend_level_binance | trend_level_coinbase | trend_level_kraken | trend_slope_binance | trend_slope_coinbase | trend_slope_kraken | trend_curvature_binance |
|---|---|---|---|---|---|---|---|
| 2024-01-01 00:00:00 | 42320.97 | 42158.79 | 42261.32 | 0.000 | 0.000 | 0.000 | 0.000 |
| 2024-01-01 00:15:00 | 42289.26 | 42126.31 | 42226.80 | -31.712 | -32.475 | -34.518 | -31.712 |
| 2024-01-01 00:30:00 | 42257.55 | 42093.84 | 42192.28 | -31.712 | -32.475 | -34.518 | 0.000 |
| 2024-01-01 00:45:00 | 42225.84 | 42061.36 | 42157.76 | -31.712 | -32.475 | -34.518 | 0.000 |
| 2024-01-01 01:00:00 | 42192.25 | 42027.08 | 42121.32 | -33.589 | -34.282 | -36.444 | -1.878 |
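Slope and curvature here are first and second backward differences of the trend. In the sample above, the second-row curvature equals the slope (-31.712) because the lags are padded with the first value. A minimal sketch of the same features with NA padding instead, on a made-up quadratic trend where the expected answers are known (slope 2t - 1, curvature 2):

```r
library(dplyr)

# Toy trend: t^2 for t = 1..6 (illustrative values only)
toy <- tibble(t = 1:6, trend = (1:6)^2)

feat <- toy %>%
  mutate(
    # First difference; lag() without a default leaves NA where no history exists
    slope     = trend - lag(trend),
    # Second difference; needs two prior points, so the first two rows are NA
    curvature = slope - lag(slope)
  )

print(feat)
```

Leaving the warm-up rows as NA makes the missing history explicit, at the cost of two rows per series that have to be dropped or imputed before modeling.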
7. Cross-Feed Analysis: Spreads and Correlations
# Calculate exchange spreads (price differences)
spread_data <- aligned_data %>%
mutate(
spread_binance_coinbase = price_binance - price_coinbase,
spread_binance_kraken = price_binance - price_kraken,
spread_coinbase_kraken = price_coinbase - price_kraken
)
# Visualize spreads
spread_data %>%
select(timestamp, starts_with("spread_")) %>%
pivot_longer(-timestamp, names_to = "spread", values_to = "value") %>%
ggplot(aes(x = timestamp, y = value, color = spread)) +
geom_line(alpha = 0.7) +
labs(
title = "Inter-Exchange Price Spreads",
x = "Time",
y = "Spread (USD)",
color = "Spread"
) +
theme_minimal()
# Calculate rolling correlations between exchanges
library(zoo)
# 60-minute rolling window correlations
roll_window <- 60
correlations <- aligned_data %>%
mutate(
corr_binance_coinbase = rollapply(
cbind(price_binance, price_coinbase),
width = roll_window,
FUN = function(x) cor(x[,1], x[,2]),
by.column = FALSE,
fill = NA,
align = "right"
),
corr_binance_kraken = rollapply(
cbind(price_binance, price_kraken),
width = roll_window,
FUN = function(x) cor(x[,1], x[,2]),
by.column = FALSE,
fill = NA,
align = "right"
),
corr_coinbase_kraken = rollapply(
cbind(price_coinbase, price_kraken),
width = roll_window,
FUN = function(x) cor(x[,1], x[,2]),
by.column = FALSE,
fill = NA,
align = "right"
)
) %>%
drop_na()
# Visualize rolling correlations
correlations %>%
select(timestamp, starts_with("corr_")) %>%
pivot_longer(-timestamp, names_to = "pair", values_to = "correlation") %>%
ggplot(aes(x = timestamp, y = correlation, color = pair)) +
geom_line(alpha = 0.7) +
labs(
title = paste0("Rolling Correlation (", roll_window, "-minute window)"),
x = "Time",
y = "Correlation",
color = "Exchange Pair"
) +
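# Zoom the y-axis without discarding points below 0.9 (ylim() would turn them into NA)
coord_cartesian(ylim = c(0.9, 1.0)) +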
theme_minimal()
8. ACF/PACF Analysis on Remainder Components
# ACF/PACF on remainder to check for remaining autocorrelation
remainder_data <- decomp_components %>%
as_tibble() %>%
select(time_15min, exchange, remainder) %>%
as_tsibble(key = exchange, index = time_15min)
# Plot ACF for each exchange
p_acf <- remainder_data %>%
ACF(remainder, lag_max = 48) %>%
autoplot() +
facet_wrap(~ exchange, ncol = 1) +
labs(title = "ACF of Remainder (checking for leftover structure)") +
theme_minimal()
# Plot PACF for each exchange
p_pacf <- remainder_data %>%
PACF(remainder, lag_max = 48) %>%
autoplot() +
facet_wrap(~ exchange, ncol = 1) +
labs(title = "PACF of Remainder") +
theme_minimal()
p_acf / p_pacf
9. Export Feature Matrix for Modeling
# Export the feature matrix for downstream modeling
library(writexl)
write_xlsx(feature_matrix, "btc_feature_matrix.xlsx")
write.csv(feature_matrix, "btc_feature_matrix.csv", row.names = FALSE)
cat("Feature matrix exported to:\n")
Feature matrix exported to:
cat("- btc_feature_matrix.xlsx\n")
- btc_feature_matrix.xlsx
cat("- btc_feature_matrix.csv\n\n")
- btc_feature_matrix.csv
cat("Ready for:\n")
Ready for:
cat("- ML models (regression, classification, reinforcement learning)\n")
- ML models (regression, classification, reinforcement learning)
cat("- ARIMA/VAR models on trend components\n")
- ARIMA/VAR models on trend components
cat("- Volatility forecasting on remainder\n")
- Volatility forecasting on remainder
cat("- Spread/arbitrage signal generation\n")
- Spread/arbitrage signal generation
10. Summary: The Complete Pipeline
10.1 What We Built
- Multi-source data ingestion (simulated 3 exchanges)
- Collapse layer (alignment to master grid + forward-fill)
- STL decomposition (trend + seasonal + remainder per exchange)
- Feature engineering (trend level/slope/curvature, seasonal, remainder)
- Cross-feed analysis (spreads, rolling correlations)
- Persistence diagnostics (ACF/PACF on remainder)
- Time-aligned feature matrix (X_t) (ready for modeling)
10.2 Why This Matters
Traditional approaches:
- Use raw prices directly → noisy, non-stationary
- Ignore cross-feed structure → miss arbitrage opportunities
- No systematic decomposition → hard to interpret model features
This pipeline:
- ✅ Separates signal (trend/seasonal) from noise (remainder)
- ✅ Creates interpretable features (trend slope = momentum)
- ✅ Enables multi-timescale analysis (trend for position, remainder for risk)
- ✅ Generalizes to any multi-feed scenario (crypto indices, FX, commodities)
10.3 Next Steps
- Feature selection: Use feature importance (RF, LASSO) on (X_t)
- Forecasting: Build ARIMAX/VAR models on trend components
- Regime detection: Cluster on remainder variance (high vol = regime shift)
- Signal generation: Spread crossovers, trend slope thresholds
- Backtesting: Simulate trades using feature-based rules
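As one hedged illustration of the signal-generation step, a spread can be standardized against its own trailing window and thresholded. The sketch below runs on made-up two-exchange data; the 60-bar window, the ±2 threshold, and all column names are assumptions chosen for illustration, not tuned or recommended values:

```r
library(dplyr)
library(zoo)

set.seed(7)
n <- 300

# Two synthetic feeds sharing a random-walk price, offset by roughly 80 USD
toy <- tibble(
  t       = seq_len(n),
  price_a = 42000 + cumsum(rnorm(n, 0, 50)),
  price_b = price_a - 80 + rnorm(n, 0, 15)
)

window <- 60

signals <- toy %>%
  mutate(
    spread    = price_a - price_b,
    # Trailing (right-aligned) statistics: only current and past bars are used
    roll_mean = rollapplyr(spread, window, mean, fill = NA),
    roll_sd   = rollapplyr(spread, window, sd, fill = NA),
    z         = (spread - roll_mean) / roll_sd,
    # -1: spread unusually wide, +1: unusually narrow, 0: no signal (including warm-up NAs)
    signal    = case_when(z > 2 ~ -1L, z < -2 ~ 1L, TRUE ~ 0L)
  )

print(table(signals$signal))
```

The first `window - 1` rows have no z-score and therefore no signal, which is worth keeping in mind when counting trades in a backtest.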
10.4 Connection to Discussion 3
This is the direct extension of the PAYNSA decomposition work:
| PAYNSA (Single Series) | BTC Engine (Multi-Feed) |
|---|---|
| 1 series (employment) | N series (exchanges) |
| Monthly frequency | Minute/tick frequency |
| Additive vs multiplicative | Additive (price levels) |
| Forecast next value | Generate trading signals |
| Excel verification | Production pipeline |
Core insight remains the same: Decomposition transforms raw data into structured features that are easier to model, interpret, and act upon.