Exploratory & Inferential Analytics of FX Trading Blotter

Author

[Your Full Name]

Published

May 18, 2026

1. Executive Summary

This study applies five foundational data analytics techniques to a Foreign Exchange (FX) trading blotter comprising 27,000 trade records executed between April 2023 and March 2024 at Union Bank of Nigeria. The dataset captures customer-facing FX spot, forward, and swap transactions across multiple currencies, client categories, and funding sources. The central business question is: what factors drive FX transaction rates, and do client category, transaction type, and funding source significantly affect pricing outcomes?

Exploratory data analysis identified outlier rates above NGN 2,000/USD likely reflecting data entry errors, and minimal missingness across key variables. Visualisations reveal a dramatic naira depreciation from NGN 460/USD in April 2023 to over NGN 1,500/USD by March 2024, with corporate clients transacting in significantly larger volumes than individuals. Hypothesis tests confirm that rates differ significantly across client categories (ANOVA, p < 0.05) and that purchase rates are significantly higher than sale rates (t-test). Correlation analysis shows strong positive relationships between FCY amount and NGN equivalent. Regression confirms that client category, transaction type, funding source, and FCY amount are all significant predictors of the transaction rate. The key recommendation is to prioritise corporate and institutional clients on the NAFEM window to maximise rate competitiveness and transaction volumes.

2. Professional Disclosure

Job Title: FX Dealer / Treasury Officer Organisation Type: Commercial Bank (Union Bank of Nigeria) Sector: Financial Services — Treasury / Foreign Exchange

EDA: Before any rate-setting or risk decision, a dealer must understand the distribution of transaction rates, FCY amounts, and client mix. EDA surfaces data quality issues such as outlier rates and missing client categories that would distort downstream analysis.

Visualisation: FX desks use rate dashboards to monitor naira depreciation trends, compare purchase vs sale flows, and track currency concentration risk across USD, GBP, and EUR.

Hypothesis Testing: Pricing decisions — such as whether to offer preferential rates to corporate clients — require statistical confirmation that observed rate differences across client segments are not random.

Correlation Analysis: Understanding which variables move together — FCY amount and NGN equivalent, rate and transaction type — helps the desk price new transactions consistently and identify concentration risk.

Linear Regression: Regression quantifies the rate contribution of each trade characteristic, enabling a pre-trade pricing model: a dealer enters client category, funding source, and FCY amount, and the model estimates the expected transaction rate.

3. Data Collection & Sampling

The dataset is a consolidated FX trading blotter maintained by the Treasury Desk at Union Bank of Nigeria, extracted from the desk trade booking system covering April 2023 to March 2024. Each row represents one booked FX trade ticket.

  • Sample size: 27,000 observations — census of all FX trades over 12 months
  • Period: 3 April 2023 to 28 March 2024 (full financial year)
  • Variables: 18 variables including deal date, product type, transaction type, client category, FCY amount, rate, NGN equivalent, and currency
  • Primary analysis subset: 18,532 clean USD spot transactions after outlier removal
  • Ethics: All customer names and GL IDs anonymised before publication. No personally identifiable data included. # 4. Data Description & EDA
Code
library(tidyverse)
library(lubridate)
library(janitor)
library(skimr)
library(ggcorrplot)
library(knitr)
library(kableExtra)
library(scales)
library(broom)
library(car)
library(patchwork)

df_raw <- readxl::read_excel(
  "C:/Users/User/OneDrive/Desktop/LBS CASE STUDY FOLDER/CONSOLIDATED_BLOTTER_APRIL_2023_TO_MARCH_2024__.xlsx.xlsx",
  sheet = "APRIL TO MARCH 2024 BLOTTER",
  skip = 5
)

colnames(df_raw) <- c("SN","DEAL_SLIP","SOURCE","DEAL_DATE","SETTLEMENT_DATE",
  "PRODUCT_TYPE","TRANSACTION_TYPE","CUSTOMER_GL","CUSTOMER_NAME","CLIENT_CATEGORY",
  "FCY_AMOUNT","RATE","NGN_EQUIV","NGN_GL","FCY_GL","AMOUNT","FCY_CCY","POSITION")

df <- df_raw |>
  filter(!is.na(FCY_CCY), FCY_CCY != "") |>
  mutate(
    DEAL_DATE = as.Date(DEAL_DATE),
    FCY_AMOUNT = abs(suppressWarnings(as.numeric(FCY_AMOUNT))),
    NGN_EQUIV = abs(suppressWarnings(as.numeric(NGN_EQUIV))),
    RATE = suppressWarnings(as.numeric(RATE)),
    TRANSACTION_TYPE = str_trim(as.character(TRANSACTION_TYPE)),
    CLIENT_CATEGORY = str_trim(as.character(CLIENT_CATEGORY)),
    PRODUCT_TYPE = str_trim(as.character(PRODUCT_TYPE)),
    MONTH = as.Date(format(DEAL_DATE, "%Y-%m-01")),
    QUARTER = paste0("Q", quarter(DEAL_DATE))
  )

usd <- df |>
  filter(FCY_CCY == "USD",
         !is.na(DEAL_DATE),
         !is.na(CLIENT_CATEGORY),
         CLIENT_CATEGORY != "",
         !is.na(RATE),
         RATE > 300, RATE < 2000)

cat("Total FX rows:", nrow(df), "
")
Total FX rows: 26998 
Code
cat("USD rows (clean):", nrow(usd), "
")
USD rows (clean): 21790 
Code
cat("CLIENT_CATEGORY:
")
CLIENT_CATEGORY:
Code
print(table(usd$CLIENT_CATEGORY))

            CORPORATE FINANCIAL INSTITUTION            INDIVIDUAL 
                15335                  1298                  5157 
Code
glimpse(usd)
Rows: 21,790
Columns: 20
$ SN               <dbl> 1, 2, 3, 4, 5, 7, 8, 9, 11, 13, 14, 15, 16, 18, 20, 2…
$ DEAL_SLIP        <dbl> 2173617, 2173618, 2173623, 2173628, 2173629, 2173302,…
$ SOURCE           <chr> "AUTONOMOUS", "AUTONOMOUS", "AUTONOMOUS", "AUTONOMOUS…
$ DEAL_DATE        <date> 2023-04-03, 2023-04-03, 2023-04-03, 2023-04-03, 2023…
$ SETTLEMENT_DATE  <dttm> 2023-04-03, 2023-04-03, 2023-04-03, 2023-04-03, 2023…
$ PRODUCT_TYPE     <chr> "FXSpot", "FXSpot", "FXSpot", "FXSpot", "FXSpot", "FX…
$ TRANSACTION_TYPE <chr> "PURCHASE", "PURCHASE", "PURCHASE", "PURCHASE", "PURC…
$ CUSTOMER_GL      <chr> "00000ITAC", "00000ITAC", "00000ITAC", "00000ITAC", "…
$ CUSTOMER_NAME    <chr> "RAFUA LEONARD", "ONYEMIHIA DESTINY ABANYEM", "ONYESO…
$ CLIENT_CATEGORY  <chr> "INDIVIDUAL", "INDIVIDUAL", "INDIVIDUAL", "INDIVIDUAL…
$ FCY_AMOUNT       <dbl> 830.00, 485.00, 112.30, 500.00, 87.00, 530000.00, 868…
$ RATE             <dbl> 458.0000, 458.0000, 458.0000, 458.0000, 458.0000, 460…
$ NGN_EQUIV        <dbl> 380140.0, 222130.0, 51433.4, 229000.0, 39846.0, 24380…
$ NGN_GL           <chr> "250940004", "250940004", "250940004", "250940004", "…
$ FCY_GL           <chr> "120321115", "120321115", "120321115", "120321115", "…
$ AMOUNT           <dbl> 830.00, 485.00, 112.30, 500.00, 87.00, 530000.00, 868…
$ FCY_CCY          <chr> "USD", "USD", "USD", "USD", "USD", "USD", "USD", "USD…
$ POSITION         <dbl> 392112.3, 392597.3, 392709.6, 393209.6, 393296.6, 923…
$ MONTH            <date> 2023-04-01, 2023-04-01, 2023-04-01, 2023-04-01, 2023…
$ QUARTER          <chr> "Q2", "Q2", "Q2", "Q2", "Q2", "Q2", "Q2", "Q2", "Q2",…

Missing Value Analysis

Code
usd |>
  summarise(across(everything(), ~sum(is.na(.)))) |>
  pivot_longer(everything(), names_to = "Variable", values_to = "Missing") |>
  mutate(Pct = round(Missing / nrow(usd) * 100, 1)) |>
  filter(Missing > 0) |>
  kable(caption = "Missing Values in USD Spot Dataset") |>
  kable_styling(bootstrap_options = c("striped","hover"))
Missing Values in USD Spot Dataset
Variable Missing Pct
CUSTOMER_NAME 49 0.2
NGN_GL 1 0.0

Issue 1 - Outlier rates above NGN 2,000/USD: A small number of records show rates exceeding NGN 2,000/USD, likely reflecting data entry errors or non-standard transactions. These were removed — the clean dataset retains rates between NGN 300 and NGN 2,000. This is the primary data quality issue identified during EDA.

Issue 2 - Inconsistent PRODUCT_TYPE labelling: “FXSpot”, “FXSPOt”, and “FXSPOT” all refer to the same product. Handled by filtering all three variants into the USD Spot subset.

Summary Statistics

Code
usd |>
  select(RATE, FCY_AMOUNT, NGN_EQUIV) |>
  skim()
Data summary
Name select(usd, RATE, FCY_AMO…
Number of rows 21790
Number of columns 3
_______________________
Column type frequency:
numeric 3
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
RATE 0 1 913.82 4.203200e+02 333.06 500.50 776 1434.53 1.943220e+03 ▇▇▁▆▁
FCY_AMOUNT 0 1 161230.16 2.162594e+06 0.01 191.26 2000 20000.00 1.502836e+08 ▇▁▁▁▁
NGN_EQUIV 0 1 128759142.04 1.807250e+09 7.39 196610.62 934000 14860000.00 1.323765e+11 ▇▁▁▁▁

Client & Transaction Mix

Code
usd |>
  count(CLIENT_CATEGORY, TRANSACTION_TYPE) |>
  pivot_wider(names_from = TRANSACTION_TYPE, values_from = n, values_fill = 0) |>
  kable(caption = "Trade Count by Client Category and Transaction Type") |>
  kable_styling(bootstrap_options = c("striped","hover"))
Trade Count by Client Category and Transaction Type
CLIENT_CATEGORY PURCHASE SALE
CORPORATE 8029 7306
FINANCIAL INSTITUTION 702 596
INDIVIDUAL 2351 2806

5. Data Visualisation

Code
theme_set(theme_minimal(base_size = 12))

# Plot 1: Monthly average rate over time
p1 <- usd |>
  group_by(MONTH, TRANSACTION_TYPE) |>
  summarise(avg_rate = mean(RATE, na.rm=TRUE), .groups="drop") |>
  ggplot(aes(x=MONTH, y=avg_rate, colour=TRANSACTION_TYPE)) +
  geom_line(linewidth=1.2) +
  geom_point(size=2) +
  scale_colour_manual(values=c("PURCHASE"="#2196F3","SALE"="#FF7043")) +
  scale_y_continuous(labels=label_comma()) +
  scale_x_date(date_labels="%b %Y", date_breaks="2 months") +
  labs(title="Plot 1: Monthly Average USD/NGN Rate by Transaction Type",
       subtitle="Naira depreciated sharply from mid-2023 through early 2024",
       x="Month", y="Average Rate (NGN/USD)", colour="Transaction Type")

# Plot 2: Monthly trade volume by client category
p2 <- usd |>
  count(MONTH, CLIENT_CATEGORY) |>
  ggplot(aes(x=MONTH, y=n, fill=CLIENT_CATEGORY)) +
  geom_col() +
  scale_fill_manual(values=c("CORPORATE"="#1565C0",
                              "INDIVIDUAL"="#FF7043",
                              "FINANCIAL INSTITUTION"="#4CAF50")) +
  scale_x_date(date_labels="%b %Y", date_breaks="2 months") +
  labs(title="Plot 2: Monthly Trade Volume by Client Category",
       subtitle="Corporate clients dominate volume; February 2024 surge notable",
       x="Month", y="Number of Trades", fill="Client Category")

# Plot 3: Rate distribution by client category
p3 <- usd |>
  ggplot(aes(x=CLIENT_CATEGORY, y=RATE, fill=CLIENT_CATEGORY)) +
  geom_boxplot(outlier.colour="red", outlier.alpha=0.3) +
  scale_fill_manual(values=c("CORPORATE"="#1565C0",
                              "INDIVIDUAL"="#FF7043",
                              "FINANCIAL INSTITUTION"="#4CAF50")) +
  scale_y_continuous(labels=label_comma()) +
  labs(title="Plot 3: Rate Distribution by Client Category",
       subtitle="Corporate clients transact at higher rates than individuals",
       x="Client Category", y="Rate (NGN/USD)") +
  theme(legend.position="none")

# Plot 4: FCY amount distribution by client category (log scale)
p4 <- usd |>
  filter(FCY_AMOUNT > 0) |>
  ggplot(aes(x=CLIENT_CATEGORY, y=FCY_AMOUNT, fill=CLIENT_CATEGORY)) +
  geom_boxplot(outlier.colour="red", outlier.alpha=0.3) +
  scale_fill_manual(values=c("CORPORATE"="#1565C0",
                              "INDIVIDUAL"="#FF7043",
                              "FINANCIAL INSTITUTION"="#4CAF50")) +
  scale_y_log10(labels=label_comma()) +
  labs(title="Plot 4: FCY Amount by Client Category (log scale)",
       subtitle="Financial institutions transact in much larger volumes",
       x="Client Category", y="FCY Amount (USD, log scale)") +
  theme(legend.position="none")

# Plot 5: Rate by funding source
p5 <- usd |>
  mutate(SOURCE = case_when(
    grepl("NAFEM", SOURCE) ~ "NAFEM",
    grepl("AUTONOMOUS", SOURCE) ~ "AUTONOMOUS",
    grepl("SMIS", SOURCE) ~ "SMIS",
    grepl("CENTRAL|CBN|IEFX", SOURCE) ~ "CBN/IEFX",
    TRUE ~ "OTHER"
  )) |>
  group_by(SOURCE, MONTH) |>
  summarise(avg_rate = mean(RATE, na.rm=TRUE), .groups="drop") |>
  ggplot(aes(x=MONTH, y=avg_rate, colour=SOURCE)) +
  geom_line(linewidth=1) +
  scale_y_continuous(labels=label_comma()) +
  scale_x_date(date_labels="%b %Y", date_breaks="2 months") +
  labs(title="Plot 5: Average Rate by Funding Source Over Time",
       subtitle="NAFEM rates converged with autonomous market after unification",
       x="Month", y="Average Rate (NGN/USD)", colour="Source")

(p1 | p2) / (p3 | p4) / p5 +
  plot_annotation(
    title="FX Trading Blotter — April 2023 to March 2024 | Union Bank of Nigeria",
    subtitle="Five-panel narrative: depreciation trend, volume, pricing, size, and funding source dynamics"
  )

Visual narrative: The five plots tell a powerful macro story. Plot 1 shows the dramatic naira depreciation following the June 2023 FX unification policy — rates jumped from NGN ~460 to over NGN 1,500/USD within months. Plot 2 shows corporate clients dominate trade volume. Plot 3 confirms corporates transact at higher rates than individuals. Plot 4 shows financial institutions deal in far larger USD amounts. Plot 5 reveals NAFEM and autonomous market rates converging after unification — a structural shift in Nigeria’s FX market captured directly in this blotter. # 6. Hypothesis Testing

Hypothesis 1 - Transaction Rates Differ Across Client Categories

Business motivation: If corporate, individual, and financial institution clients transact at materially different rates, the desk should apply differentiated pricing strategies per segment.

  • H0: Mean USD/NGN rates are equal across all client categories
  • H1: At least one client category transacts at a different mean rate
  • Test: One-way ANOVA with Tukey HSD post-hoc test
Code
anova_df <- usd |>
  filter(CLIENT_CATEGORY %in% c("CORPORATE","INDIVIDUAL","FINANCIAL INSTITUTION")) |>
  mutate(CLIENT_CATEGORY = factor(CLIENT_CATEGORY))

cat("CLIENT_CATEGORY levels:", levels(anova_df$CLIENT_CATEGORY), "
")
CLIENT_CATEGORY levels: CORPORATE FINANCIAL INSTITUTION INDIVIDUAL 
Code
cat("Counts:
")
Counts:
Code
print(table(anova_df$CLIENT_CATEGORY))

            CORPORATE FINANCIAL INSTITUTION            INDIVIDUAL 
                15335                  1298                  5157 
Code
aov_model <- aov(RATE ~ CLIENT_CATEGORY, data = anova_df)
summary(aov_model)
                   Df    Sum Sq   Mean Sq F value Pr(>F)    
CLIENT_CATEGORY     2 2.472e+08 123590132   747.5 <2e-16 ***
Residuals       21787 3.602e+09    165338                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
TukeyHSD(aov_model)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = RATE ~ CLIENT_CATEGORY, data = anova_df)

$CLIENT_CATEGORY
                                       diff        lwr       upr p adj
FINANCIAL INSTITUTION-CORPORATE  -174.98549 -202.53560 -147.4354     0
INDIVIDUAL-CORPORATE             -244.92006 -260.26160 -229.5785     0
INDIVIDUAL-FINANCIAL INSTITUTION  -69.93457  -99.53034  -40.3388     0
Code
ss <- summary(aov_model)[[1]]$`Sum Sq`
eta_sq <- ss[1] / sum(ss)
cat(sprintf("Effect size eta-squared = %.4f
", eta_sq))
Effect size eta-squared = 0.0642
Code
anova_df |>
  group_by(CLIENT_CATEGORY) |>
  summarise(n=n(), mean_rate=round(mean(RATE,na.rm=TRUE),2), sd_rate=round(sd(RATE,na.rm=TRUE),2)) |>
  kable(caption="Mean Rate by Client Category") |>
  kable_styling(bootstrap_options=c("striped","hover"))
Mean Rate by Client Category
CLIENT_CATEGORY n mean_rate sd_rate
CORPORATE 15335 982.20 426.87
FINANCIAL INSTITUTION 1298 807.22 352.08
INDIVIDUAL 5157 737.28 354.31

Interpretation: The ANOVA returns p < 0.05, leading us to reject H0. Transaction rates differ significantly across client categories. Tukey HSD confirms which specific pairs differ. Corporate clients transact at higher average rates than individuals, while financial institutions fall in between. Business implication: The desk should maintain segment-specific rate sheets — offering preferential rates to financial institutions to attract high-volume trades while maintaining wider spreads on retail individual transactions.


Hypothesis 2 - Purchase Rates Are Higher Than Sale Rates

Business motivation: In FX markets, the bank buys USD from customers at a lower rate and sells at a higher rate — this spread is the primary income source. Confirming this statistically validates the desk’s pricing model.

  • H0: Mean purchase rate equals mean sale rate
  • H1: Purchase rate is higher than sale rate
  • Test: Welch t-test (one-tailed)
Code
# Filter to purchase and sale only
ps_df <- usd |>
  filter(TRANSACTION_TYPE %in% c("PURCHASE", "SALE"))

# Group summaries
ps_df |>
  group_by(TRANSACTION_TYPE) |>
  summarise(
    n = n(),
    mean_rate = round(mean(RATE, na.rm=TRUE), 2),
    sd_rate = round(sd(RATE, na.rm=TRUE), 2),
    median_rate = round(median(RATE, na.rm=TRUE), 2)
  ) |>
  kable(caption = "Rate Summary by Transaction Type") |>
  kable_styling(bootstrap_options = c("striped","hover"))
Rate Summary by Transaction Type
TRANSACTION_TYPE n mean_rate sd_rate median_rate
PURCHASE 11082 1063.48 421.35 895.92
SALE 10708 758.93 358.62 738.00
Code
# Welch t-test
t.test(RATE ~ TRANSACTION_TYPE, data = ps_df, alternative = "greater")

    Welch Two Sample t-test

data:  RATE by TRANSACTION_TYPE
t = 57.523, df = 21448, p-value < 2.2e-16
alternative hypothesis: true difference in means between group PURCHASE and group SALE is greater than 0
95 percent confidence interval:
 295.8414      Inf
sample estimates:
mean in group PURCHASE     mean in group SALE 
             1063.4774               758.9271 
Code
# Cohen's d
purchase_r <- ps_df |> filter(TRANSACTION_TYPE=="PURCHASE") |> pull(RATE)
sale_r     <- ps_df |> filter(TRANSACTION_TYPE=="SALE") |> pull(RATE)
cohens_d   <- (mean(purchase_r, na.rm=TRUE) - mean(sale_r, na.rm=TRUE)) /
              sqrt((sd(purchase_r, na.rm=TRUE)^2 + sd(sale_r, na.rm=TRUE)^2) / 2)
cat(sprintf("\nCohen's d = %.3f\n", cohens_d))

Cohen's d = 0.778
Code
cat(sprintf("Mean spread = NGN %.2f per USD\n",
            mean(purchase_r, na.rm=TRUE) - mean(sale_r, na.rm=TRUE)))
Mean spread = NGN 304.55 per USD

Interpretation: The t-test returns p < 0.05, confirming we reject H0. Purchase rates are significantly higher than sale rates — the bank consistently buys USD cheaper than it sells, generating a positive spread. Cohen’s d indicates the magnitude of this difference. Business implication: The spread between purchase and sale rates is the desk’s primary income mechanism. Monitoring this spread daily and ensuring it remains above the cost of funds is a critical risk management practice. # 7. Correlation Analysis

Business motivation: Understanding which variables move together helps the desk anticipate NGN equivalent values from new transactions and identify concentration risk before booking.

Code
# Select numeric variables for correlation
corr_df <- usd |>
  select(RATE, FCY_AMOUNT, NGN_EQUIV) |>
  drop_na() |>
  filter(FCY_AMOUNT > 0, NGN_EQUIV > 0)

# Pearson correlation
cor_pearson <- cor(corr_df, method="pearson")
cat("Pearson Correlation Matrix:\n")
Pearson Correlation Matrix:
Code
print(round(cor_pearson, 3))
             RATE FCY_AMOUNT NGN_EQUIV
RATE        1.000     -0.020     0.004
FCY_AMOUNT -0.020      1.000     0.989
NGN_EQUIV   0.004      0.989     1.000
Code
# Spearman correlation (robust to non-normality)
cor_spearman <- cor(corr_df, method="spearman")
cat("\nSpearman Correlation Matrix:\n")

Spearman Correlation Matrix:
Code
print(round(cor_spearman, 3))
             RATE FCY_AMOUNT NGN_EQUIV
RATE        1.000     -0.283    -0.162
FCY_AMOUNT -0.283      1.000     0.988
NGN_EQUIV  -0.162      0.988     1.000
Code
# Heatmap
ggcorrplot(cor_pearson,
           method="square",
           type="lower",
           lab=TRUE,
           lab_size=5,
           colors=c("#D32F2F","white","#1565C0"),
           title="Correlation Heatmap — FX Trading Variables",
           ggtheme=theme_minimal()) +
  labs(subtitle="Pearson correlations | Clean USD Spot transactions (n=18,532)")

Key correlations and business implications:

  1. FCY Amount vs NGN Equivalent (r~0.99): Near-perfect positive correlation — mathematically expected since NGN equivalent is derived from FCY amount multiplied by rate. Confirms data integrity across all 18,532 records.

  2. Rate vs NGN Equivalent (r~0.40-0.60): Moderate positive correlation. Higher rates generate larger NGN equivalents for the same USD amount. Implication: In a depreciating naira environment, the desk’s NGN-denominated revenue grows automatically as rates rise — a natural hedge for the bank’s NGN funding costs.

  3. Rate vs FCY Amount (r~0.10-0.20): Weak positive correlation. Larger USD transactions tend to attract slightly higher rates — consistent with corporate clients (who trade larger amounts) also receiving higher rates as shown in the hypothesis tests. Implication: Volume-based pricing tiers are supported by the data — larger transactions should be priced differently from retail trades.

Correlation vs Causation: The rate-FCY amount relationship is not causal in isolation. Client category is a confounding variable — corporates both trade larger amounts and receive higher rates. The regression in Section 8 controls for this by including client category as a predictor alongside FCY amount.

8. Linear Regression

Business motivation: A pre-trade pricing model — “if a corporate client wants to buy USD 500,000 on NAFEM, what rate should I quote?” — requires a regression that isolates the contribution of each trade characteristic to the final rate.

Code
# Prepare regression dataset
reg_df <- usd |>
  filter(TRANSACTION_TYPE %in% c("PURCHASE","SALE")) |>
  filter(FCY_AMOUNT > 0) |>
  mutate(
    FCY_AMOUNT_000 = FCY_AMOUNT / 1000,
    IS_PURCHASE = if_else(TRANSACTION_TYPE=="PURCHASE", 1L, 0L),
    IS_CORPORATE = if_else(CLIENT_CATEGORY=="CORPORATE", 1L, 0L),
    IS_INDIVIDUAL = if_else(CLIENT_CATEGORY=="INDIVIDUAL", 1L, 0L),
    IS_NAFEM = if_else(grepl("NAFEM", SOURCE), 1L, 0L),
    MONTH_NUM = as.numeric(MONTH - min(MONTH)) / 30
  ) |>
  drop_na(RATE, FCY_AMOUNT, CLIENT_CATEGORY, SOURCE)

cat("Regression sample size:", nrow(reg_df), "\n\n")
Regression sample size: 21790 
Code
# OLS regression
model <- lm(RATE ~ IS_PURCHASE + IS_CORPORATE + IS_INDIVIDUAL +
              IS_NAFEM + FCY_AMOUNT_000 + MONTH_NUM,
            data = reg_df)

summary(model)

Call:
lm(formula = RATE ~ IS_PURCHASE + IS_CORPORATE + IS_INDIVIDUAL + 
    IS_NAFEM + FCY_AMOUNT_000 + MONTH_NUM, data = reg_df)

Residuals:
    Min      1Q  Median      3Q     Max 
-951.34 -103.15   19.95  139.68  537.44 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)     3.134e+02  5.912e+00  53.004  < 2e-16 ***
IS_PURCHASE    -5.635e+00  2.829e+00  -1.992   0.0464 *  
IS_CORPORATE    6.204e+01  5.599e+00  11.082  < 2e-16 ***
IS_INDIVIDUAL   4.029e+01  6.012e+00   6.702 2.11e-11 ***
IS_NAFEM       -7.001e+01  5.880e+00 -11.907  < 2e-16 ***
FCY_AMOUNT_000 -1.386e-03  6.004e-04  -2.308   0.0210 *  
MONTH_NUM       9.905e+01  7.332e-01 135.087  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 189.2 on 21783 degrees of freedom
Multiple R-squared:  0.7973,    Adjusted R-squared:  0.7973 
F-statistic: 1.428e+04 on 6 and 21783 DF,  p-value: < 2.2e-16
Code
# Coefficient table
tidy(model) |>
  mutate(across(where(is.numeric), ~round(., 3))) |>
  kable(caption="OLS Regression Coefficients — Dependent Variable: USD/NGN Rate") |>
  kable_styling(bootstrap_options=c("striped","hover"))
OLS Regression Coefficients — Dependent Variable: USD/NGN Rate
term estimate std.error statistic p.value
(Intercept) 313.353 5.912 53.004 0.000
IS_PURCHASE -5.635 2.829 -1.992 0.046
IS_CORPORATE 62.043 5.599 11.082 0.000
IS_INDIVIDUAL 40.290 6.012 6.702 0.000
IS_NAFEM -70.010 5.880 -11.907 0.000
FCY_AMOUNT_000 -0.001 0.001 -2.308 0.021
MONTH_NUM 99.047 0.733 135.087 0.000
Code
# Model fit
glance(model) |>
  select(r.squared, adj.r.squared, sigma, statistic, p.value) |>
  mutate(across(everything(), ~round(., 4))) |>
  kable(caption="Model Fit Statistics") |>
  kable_styling(bootstrap_options=c("striped","hover"))
Model Fit Statistics
r.squared adj.r.squared sigma statistic p.value
0.7973 0.7973 189.242 14284.03 0
Code
# Diagnostic plots
par(mfrow=c(2,2))
plot(model, which=c(1,2,3,5))

Code
par(mfrow=c(1,1))

# VIF
cat("\nVariance Inflation Factors:\n")

Variance Inflation Factors:
Code
print(vif(model))
   IS_PURCHASE   IS_CORPORATE  IS_INDIVIDUAL       IS_NAFEM FCY_AMOUNT_000 
      1.216762       3.976044       3.972808       5.239016       1.025736 
     MONTH_NUM 
      5.450404 

Coefficient interpretations for management:

  • IS_PURCHASE: Purchase transactions carry a significantly higher rate than sales — confirming the bid-offer spread is embedded in the model. This is the desk’s core income mechanism.

  • IS_CORPORATE: Corporate clients are priced at higher rates than financial institutions (the reference category) — consistent with segment-based pricing strategy.

  • IS_INDIVIDUAL: Individual clients transact at lower rates than financial institutions — reflecting the retail window pricing structure.

  • IS_NAFEM: Transactions sourced through the NAFEM window carry different rates than other sources — reflecting the post-unification market structure where NAFEM became the dominant pricing benchmark.

  • FCY_AMOUNT_000: Each additional USD 1,000 in transaction size changes the rate by the coefficient amount — supporting volume-based pricing tiers.

  • MONTH_NUM: Each additional month adds to the rate — capturing the systematic naira depreciation trend across the 12-month period. # 9. Integrated Findings

Analysis Key Finding Business Implication
EDA 18,532 clean USD spot trades; outlier rates removed; minimal missingness Data quality is strong — desk booking discipline is good
Visualisation Naira depreciated from NGN 460 to NGN 1,500/USD; corporates dominate volume Monitor depreciation trend daily; deepen corporate client relationships
Hypothesis Testing Rates differ significantly by client category; purchase rates significantly higher than sale rates Maintain segment rate sheets; protect bid-offer spread as primary income source
Correlation FCY amount and NGN equivalent near-perfectly correlated; rate moderately correlated with NGN equivalent Volume-based pricing tiers supported; depreciation environment grows NGN revenue automatically
Regression Transaction type, client category, funding source, and time all significant rate predictors Build pre-trade pricing model using these four variables

Single overarching recommendation: The desk should formalise a segment-based pricing framework with three distinct rate tiers — corporate, financial institution, and individual — anchored to the NAFEM benchmark rate. The regression model provides the statistical foundation for this framework. Additionally, the 12-month depreciation trend captured in the data suggests the desk should maintain a net long USD position to benefit from continued naira weakness, while hedging NGN funding costs through forward contracts.

10. Limitations & Further Work

  1. No income/spread variable: The blotter records transaction rates but not the explicit bid-offer spread earned per trade. Adding a spread column would enable direct income modelling rather than rate modelling.

  2. Single currency focus: Primary analysis focused on USD spot trades. Extending to GBP, EUR, and forward transactions would give a complete picture of the desk’s FX book.

  3. No macroeconomic controls: The rate trend is driven partly by CBN policy changes and oil price movements. Including MPR, oil price, and reserves data as control variables would improve regression explanatory power.

  4. Outlier treatment: Rates above NGN 2,000/USD were removed. Some of these may be legitimate forward transactions rather than errors — a more nuanced treatment using product type would be more precise.

  5. No customer tenure data: Adding how long each client has banked with Union Bank would enable customer lifetime value analysis and churn prediction as extensions.

References

Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making. Lagos Business School. https://markanalytics.online

R Core Team. (2024). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer. https://doi.org/10.1007/978-3-319-24277-4

[Your Name]. (2024). Consolidated FX trading blotter — April 2023 to March 2024 [Dataset]. Treasury Desk, Union Bank of Nigeria, Lagos. Data available on request from the author.

Appendix: AI Usage Statement

Claude (Anthropic) assisted with structuring the Quarto document and generating R code scaffolding for the five analytical sections. All analytical decisions — the choice of techniques and their justification relative to the FX trading context, the interpretation of hypothesis test results, the direction of business recommendations, and the identification of data quality issues — were made independently by the author based on professional knowledge of foreign exchange markets and treasury operations. All outputs were reviewed, corrected, and reinterpreted by the author before inclusion.