---
title: "Primary vs Secondary Market Yield Dynamics for 364-Day Nigerian Treasury Bills"
subtitle: "Case Study 1 — Exploratory & Inferential Analytics"
author: "Ifeoluwa Akindutire"
date: today
format:
html:
theme: flatly
toc: true
toc-depth: 3
number-sections: true
code-fold: true
code-tools: true
self-contained: true
smooth-scroll: true
execute:
warning: false
message: false
echo: true
---
# Executive Summary
This study investigates the relationship between primary market (CBN auction) stop rates and secondary market yields for 364-day Nigerian Treasury Bills (NTBs) during January–May 2026. As a Fixed Income Bills Dealer at Sterling Bank, I collected 101 daily secondary market yield observations from my trading desk and matched them with 12 bi-weekly CBN auction stop rates over the same period. Exploratory analysis reveals that primary auction rates consistently exceeded secondary yields, with the spread narrowing from 208 basis points in January to just 15 basis points by May. Hypothesis testing confirms a statistically significant difference between primary and secondary yields on auction dates (*p* < 0.001, Cohen's *d* = 1.23), and secondary yields differ significantly across months (*p* < 0.001, *η²* = 0.60). Correlation analysis shows a strong positive relationship between primary and secondary yields (*r* = 0.89), even after controlling for the time trend. A linear regression model explains approximately 87% of the variance in secondary yields, with primary stop rate as the dominant predictor. The key recommendation is that the trading desk should use a calibrated primary-to-secondary yield discount factor — currently averaging 40 basis points — to set competitive secondary market quotes immediately after each auction.
# Professional Disclosure
## About the Analyst
I am **Ifeoluwa Akindutire**, a Fixed Income (Bills) Dealer at **Sterling Bank**, one of Nigeria's mid-tier commercial banks. My primary responsibility is trading Nigerian Treasury Bills (NTBs) in both the primary market (participating in Central Bank of Nigeria fortnightly auctions) and the secondary market (quoting bid-offer prices to institutional counterparties). I operate within the bank's Treasury division, where pricing accuracy and yield curve interpretation directly affect the bank's profitability and liquidity management.
## Technique Relevance to My Work
**Exploratory Data Analysis (EDA).** Every trading day begins with reviewing yield movements, identifying unusual patterns, and checking data integrity in our trading systems. EDA formalises this process — summary statistics on yields, spread distributions, and outlier detection help me identify mispriced securities and data errors before they translate into trading losses (Adi, 2026, Ch. 4).
**Data Visualisation.** I communicate yield trends and auction outcomes to senior management and the Asset-Liability Committee (ALCO) weekly. Effective visualisation — time series plots of primary vs secondary yields, spread evolution charts — allows non-technical stakeholders to grasp market dynamics at a glance and make informed investment decisions (Adi, 2026, Ch. 5).
**Hypothesis Testing.** A recurring question on the desk is whether primary auction rates are "fairly priced" relative to secondary market yields. Formal hypothesis testing moves this from subjective judgement to evidence-based assessment, enabling the desk to quantify whether observed yield differentials are statistically meaningful or simply noise (Adi, 2026, Ch. 6).
**Correlation Analysis.** Understanding how primary and secondary yields move together (or apart) is essential for hedging and position management. If the correlation between primary stop rates and secondary yields is strong, I can confidently use auction results to forecast next-day secondary trading levels. If it weakens, I need to investigate additional market drivers (Adi, 2026, Ch. 8).
**Linear Regression.** The ultimate operational question is: *given that the CBN auction just cleared at X%, where should I quote secondary market yields?* A regression model that maps primary stop rates to secondary yields — controlling for time dynamics — provides a quantitative pricing tool that can be deployed in real time on the trading desk (Adi, 2026, Ch. 9).
# Data Collection & Sampling
## Data Source
The dataset comprises two components, both sourced from my professional practice at Sterling Bank:
1. **Primary market data**: 12 observations of CBN NTB auction stop rates for the 364-day tenor, extracted from official CBN auction results published after each fortnightly auction. These are publicly available on the CBN website.
2. **Secondary market data**: 101 daily observations of 1-year NTB secondary market yields, recorded from my trading desk's daily position and pricing records. These yields represent the actual rates at which I and my counterparties transacted or quoted in the interbank market.
## Sampling Frame & Methodology
- **Population**: All 364-day NTB primary auction outcomes and secondary market trading yields in the Nigerian fixed income market during 2026.
- **Sample frame**: CBN auction calendar (bi-weekly auctions) and all trading days on the Nigerian interbank market.
- **Sample period**: 5 January 2026 to 25 May 2026 (approximately 5 months).
- **Sample size**: 101 daily secondary observations and 12 primary auction observations.
- **Sampling method**: Census sampling — every auction and every trading day within the period is included. There is no random selection; the data covers the complete population of events within the time window.
## Ethical Considerations
The data used in this analysis does not contain any personally identifiable information (PII). Primary auction data is publicly available from the CBN. Secondary market yields are aggregated desk-level rates, not client-specific transaction data. No confidential client or counterparty information is disclosed. The data has been reviewed and approved for academic use by my line manager within the Treasury division.
# Data Description
## Environment Setup & Data Loading
::: {.panel-tabset}
### R
```{r}
#| label: r-setup
# ── Load Libraries ──
library(readxl)
library(tidyverse)
library(kableExtra)
library(corrplot)
library(car)
library(reticulate)
use_python("C:/Users/posim/AppData/Local/Programs/Python/Python312/python.exe",
required = TRUE)
# ── Read Raw Data ──
raw <- read_excel(
"Primary vs Secondary 364 Day NTB Maturity.xlsx",
skip = 4,
col_names = c("auction_date", "stop_rate", "spacer",
"trade_date", "secondary_yield")
)
# ── Extract Primary Market Data ──
primary_r <- raw |>
select(date = auction_date, stop_rate) |>
drop_na() |>
mutate(date = as.Date(date))
# ── Extract Secondary Market Data ──
secondary_r <- raw |>
select(date = trade_date, secondary_yield) |>
drop_na() |>
mutate(date = as.Date(date))
cat("Primary market observations:", nrow(primary_r), "\n")
cat("Secondary market observations:", nrow(secondary_r), "\n")
```
### Python
```{python}
#| label: py-setup
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import statsmodels.api as sm
import warnings
warnings.filterwarnings("ignore")
sns.set_theme(style="whitegrid", palette="muted", font_scale=1.1)
# ── Read Raw Data ──
raw = pd.read_excel(
"Primary vs Secondary 364 Day NTB Maturity.xlsx",
skiprows=4, header=None,
names=["auction_date", "stop_rate", "spacer",
"trade_date", "secondary_yield"]
)
# ── Extract Primary Market Data ──
primary_py = (raw[["auction_date", "stop_rate"]]
.dropna()
.rename(columns={"auction_date": "date"})
.copy())
primary_py["date"] = pd.to_datetime(primary_py["date"])
primary_py["stop_rate"] = pd.to_numeric(primary_py["stop_rate"])
# ── Extract Secondary Market Data ──
secondary_py = (raw[["trade_date", "secondary_yield"]]
.dropna()
.rename(columns={"trade_date": "date"})
.copy())
secondary_py["date"] = pd.to_datetime(secondary_py["date"])
secondary_py["secondary_yield"] = pd.to_numeric(secondary_py["secondary_yield"])
print(f"Primary market observations: {len(primary_py)}")
print(f"Secondary market observations: {len(secondary_py)}")
```
:::
## Data Preparation & Feature Engineering
::: {.panel-tabset}
### R
```{r}
#| label: r-merge
# ── Create Auction Lookup ──
auction_lookup <- primary_r |>
mutate(primary_stop_rate = stop_rate, auction_day = TRUE) |>
select(date, primary_stop_rate, auction_day)
# ── Merge with Secondary & Engineer Features ──
ntb <- secondary_r |>
left_join(auction_lookup, by = "date") |>
arrange(date) |>
tidyr::fill(primary_stop_rate, .direction = "down") |>
mutate(
auction_day = replace_na(auction_day, FALSE),
spread = secondary_yield - primary_stop_rate,
yield_change = secondary_yield - lag(secondary_yield),
month = factor(format(date, "%b"),
levels = c("Jan", "Feb", "Mar", "Apr", "May")),
day_of_week = factor(format(date, "%a"),
levels = c("Mon", "Tue", "Wed", "Thu", "Fri")),
time_index = row_number()
)
# ── Days Since Last Auction ──
ntb <- ntb |>
mutate(last_auction = if_else(auction_day, date, as.Date(NA_real_))) |>
tidyr::fill(last_auction, .direction = "down") |>
mutate(days_since_auction = as.numeric(date - last_auction)) |>
select(-last_auction)
cat("Merged dataset dimensions:", nrow(ntb), "rows x", ncol(ntb), "columns\n")
```
### Python
```{python}
#| label: py-merge
# ── Create Auction Lookup ──
auction_lookup = primary_py.rename(columns={"stop_rate": "primary_stop_rate"}).copy()
auction_lookup["auction_day"] = True
# ── Merge with Secondary & Engineer Features ──
ntb = secondary_py.merge(auction_lookup, on="date", how="left")
ntb = ntb.sort_values("date").reset_index(drop=True)
ntb["primary_stop_rate"] = ntb["primary_stop_rate"].ffill()
ntb["auction_day"] = ntb["auction_day"].fillna(False)
ntb["spread"] = ntb["secondary_yield"] - ntb["primary_stop_rate"]
ntb["yield_change"] = ntb["secondary_yield"].diff()
ntb["month"] = pd.Categorical(
ntb["date"].dt.strftime("%b"),
categories=["Jan", "Feb", "Mar", "Apr", "May"], ordered=True
)
ntb["day_of_week"] = pd.Categorical(
ntb["date"].dt.strftime("%a"),
categories=["Mon", "Tue", "Wed", "Thu", "Fri"], ordered=True
)
ntb["time_index"] = range(1, len(ntb) + 1)
# ── Days Since Last Auction ──
auction_dates_list = ntb.loc[ntb["auction_day"], "date"].tolist()
def days_since(d):
past = [a for a in auction_dates_list if a <= d]
return (d - max(past)).days if past else np.nan
ntb["days_since_auction"] = ntb["date"].apply(days_since)
print(f"Merged dataset dimensions: {ntb.shape[0]} rows x {ntb.shape[1]} columns")
```
:::
## Variable Summary
The merged dataset contains the following variables:
| Variable | Type | Description |
|:---------|:-----|:------------|
| `date` | Date | Trading date (business days only) |
| `secondary_yield` | Numeric | Daily 1-year NTB yield in the secondary market (%) |
| `primary_stop_rate` | Numeric | Most recent CBN auction stop rate, forward-filled (%) |
| `spread` | Numeric | Secondary yield minus primary stop rate (percentage points) |
| `yield_change` | Numeric | Day-over-day change in secondary yield (percentage points) |
| `month` | Categorical | Calendar month (Jan–May) |
| `day_of_week` | Categorical | Day of the week (Mon–Fri) |
| `auction_day` | Categorical | Whether a CBN auction occurred on this date (TRUE/FALSE) |
| `days_since_auction` | Numeric | Calendar days elapsed since the most recent auction |
| `time_index` | Numeric | Sequential trading day number (1–101) |
: {.striped .hover}
This gives **5 numeric**, **3 categorical**, and **1 date** variable — exceeding the minimum requirement of 3 numeric, 2 categorical, and 1 date.
::: {.panel-tabset}
### R
```{r}
#| label: r-summary-stats
ntb |>
select(secondary_yield, primary_stop_rate, spread,
yield_change, days_since_auction) |>
pivot_longer(everything(), names_to = "Variable", values_to = "Value") |>
group_by(Variable) |>
summarise(
N = sum(!is.na(Value)),
Mean = round(mean(Value, na.rm = TRUE), 3),
SD = round(sd(Value, na.rm = TRUE), 3),
Min = round(min(Value, na.rm = TRUE), 3),
Q1 = round(quantile(Value, 0.25, na.rm = TRUE), 3),
Median = round(median(Value, na.rm = TRUE), 3),
Q3 = round(quantile(Value, 0.75, na.rm = TRUE), 3),
Max = round(max(Value, na.rm = TRUE), 3)
) |>
kbl(caption = "Summary Statistics — Numeric Variables") |>
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width = FALSE)
```
### Python
```{python}
#| label: py-summary-stats
num_cols = ["secondary_yield", "primary_stop_rate", "spread",
"yield_change", "days_since_auction"]
summary = ntb[num_cols].describe().T
summary = summary[["count", "mean", "std", "min", "25%", "50%", "75%", "max"]]
summary.columns = ["N", "Mean", "SD", "Min", "Q1", "Median", "Q3", "Max"]
summary = summary.round(3)
print(summary.to_markdown())
```
:::
# Exploratory Data Analysis
## Theory Recap
Exploratory Data Analysis (EDA) is the systematic process of examining data through summary statistics, visualisations, and data quality checks before formal modelling. EDA reveals patterns, identifies anomalies, tests assumptions, and guides subsequent analytical decisions. As Adi (2026, Ch. 4) emphasises, a rigorous analyst should spend significant time understanding data structure, detecting missing values, and assessing distributional properties — including lessons from Anscombe's Quartet, which demonstrates that summary statistics alone can be misleading without visual inspection.
## Business Justification
On the trading desk, data quality is paramount: a single erroneous yield entry can lead to a mispriced trade worth millions of naira. EDA allows me to systematically verify data integrity, identify days with unusual yield movements (which may reflect genuine market events or data capture errors), and understand the distributional characteristics of yields before building models.
## Analysis
### Missing Value Assessment
::: {.panel-tabset}
### R
```{r}
#| label: r-missing
missing_summary <- ntb |>
summarise(across(everything(), ~sum(is.na(.)))) |>
pivot_longer(everything(), names_to = "Variable", values_to = "Missing") |>
mutate(Pct = round(Missing / nrow(ntb) * 100, 1)) |>
filter(Missing > 0)
if (nrow(missing_summary) > 0) {
missing_summary |>
kbl(caption = "Variables with Missing Values") |>
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width = FALSE)
} else {
cat("No missing values detected in the dataset.\n")
}
```
### Python
```{python}
#| label: py-missing
missing = ntb.isnull().sum()
missing_df = pd.DataFrame({
"Variable": missing.index,
"Missing": missing.values,
"Pct": (missing.values / len(ntb) * 100).round(1)
})
missing_df = missing_df[missing_df["Missing"] > 0]
if len(missing_df) > 0:
print(missing_df.to_markdown(index=False))
else:
print("No missing values detected in the dataset.")
```
:::
::: {.callout-note}
## Data Quality Issue 1 — Missing Values at Dataset Boundaries
The first two trading days (5–6 January) fall before the first CBN auction on 7 January, so `primary_stop_rate`, `spread`, and `days_since_auction` are unavailable. Additionally, `yield_change` is `NA` for the first row (no prior day to compute a difference). These boundary effects are expected and handled by excluding affected rows from analyses that require those variables, while retaining all 101 observations for secondary-yield-only analyses.
:::
### Outlier Detection
::: {.panel-tabset}
### R
```{r}
#| label: r-outliers
detect_outliers <- function(x, varname) {
x <- x[!is.na(x)]
q1 <- quantile(x, 0.25)
q3 <- quantile(x, 0.75)
iqr <- q3 - q1
lower <- q1 - 1.5 * iqr
upper <- q3 + 1.5 * iqr
n_out <- sum(x < lower | x > upper)
tibble(Variable = varname, Q1 = round(q1, 3), Q3 = round(q3, 3),
IQR = round(iqr, 3), Lower = round(lower, 3),
Upper = round(upper, 3), Outliers = n_out)
}
outlier_tbl <- bind_rows(
detect_outliers(ntb$secondary_yield, "secondary_yield"),
detect_outliers(ntb$primary_stop_rate, "primary_stop_rate"),
detect_outliers(ntb$spread, "spread"),
detect_outliers(ntb$yield_change, "yield_change")
)
outlier_tbl |>
kbl(caption = "IQR-Based Outlier Detection") |>
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width = FALSE)
```
### Python
```{python}
#| label: py-outliers
def detect_outliers(series, name):
s = series.dropna()
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
n_out = ((s < lower) | (s > upper)).sum()
return {"Variable": name, "Q1": round(q1, 3), "Q3": round(q3, 3),
"IQR": round(iqr, 3), "Lower": round(lower, 3),
"Upper": round(upper, 3), "Outliers": n_out}
outlier_df = pd.DataFrame([
detect_outliers(ntb["secondary_yield"], "secondary_yield"),
detect_outliers(ntb["primary_stop_rate"], "primary_stop_rate"),
detect_outliers(ntb["spread"], "spread"),
detect_outliers(ntb["yield_change"], "yield_change"),
])
print(outlier_df.to_markdown(index=False))
```
:::
::: {.callout-note}
## Data Quality Issue 2 — Extreme Yield Movements on Auction Days
The secondary yield surged by approximately 131 basis points on 8 January 2026 (from 16.39% to 17.70%), the largest single-day move in the dataset. This coincided with the 7 January auction clearing at 18.47% — well above prevailing secondary levels. This is not a data error but a genuine market adjustment as the secondary market repriced to converge toward the primary rate. Similarly, the primary stop rates show large variation early in the sample (18.47% in January vs 15.9% in February). These outliers are retained because they represent real market dynamics, but their influence on statistical tests is assessed and reported.
:::
### Distribution Analysis
::: {.panel-tabset}
### R
```{r}
#| label: r-distributions
#| fig-width: 10
#| fig-height: 6
# Skewness and kurtosis (manual calculation)
skew <- function(x) {
x <- x[!is.na(x)]
n <- length(x)
m <- mean(x)
s <- sd(x)
(n / ((n - 1) * (n - 2))) * sum(((x - m) / s)^3)
}
kurt <- function(x) {
x <- x[!is.na(x)]
n <- length(x)
m <- mean(x)
s <- sd(x)
((n * (n + 1)) / ((n - 1) * (n - 2) * (n - 3))) *
sum(((x - m) / s)^4) - (3 * (n - 1)^2) / ((n - 2) * (n - 3))
}
dist_stats <- tibble(
Variable = c("secondary_yield", "primary_stop_rate", "spread", "yield_change"),
Skewness = round(c(skew(ntb$secondary_yield), skew(ntb$primary_stop_rate),
skew(ntb$spread), skew(ntb$yield_change)), 3),
Kurtosis = round(c(kurt(ntb$secondary_yield), kurt(ntb$primary_stop_rate),
kurt(ntb$spread), kurt(ntb$yield_change)), 3)
)
dist_stats |>
kbl(caption = "Skewness and Excess Kurtosis") |>
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width = FALSE)
# Histograms
ntb |>
select(secondary_yield, spread, yield_change) |>
pivot_longer(everything(), names_to = "Variable", values_to = "Value") |>
ggplot(aes(x = Value)) +
geom_histogram(aes(y = after_stat(density)), bins = 20,
fill = "#2c3e50", alpha = 0.7) +
geom_density(colour = "#e74c3c", linewidth = 1) +
facet_wrap(~Variable, scales = "free", ncol = 3) +
labs(title = "Distribution of Key Numeric Variables",
x = "Value (%)", y = "Density") +
theme_minimal(base_size = 13)
```
### Python
```{python}
#| label: py-distributions
#| fig-width: 10
#| fig-height: 5
from scipy.stats import skew as sp_skew, kurtosis as sp_kurt
vars_dist = ["secondary_yield", "spread", "yield_change"]
fig, axes = plt.subplots(1, 3, figsize=(14, 4.5))
for ax, var in zip(axes, vars_dist):
data = ntb[var].dropna()
ax.hist(data, bins=20, density=True, alpha=0.7, color="#2c3e50", edgecolor="white")
xmin, xmax = data.min(), data.max()
xs = np.linspace(xmin, xmax, 200)
kde = stats.gaussian_kde(data)
ax.plot(xs, kde(xs), color="#e74c3c", linewidth=2)
ax.set_title(var.replace("_", " ").title(), fontsize=12)
ax.set_xlabel("Value (%)")
ax.set_ylabel("Density")
fig.suptitle("Distribution of Key Numeric Variables", fontsize=14, y=1.02)
plt.tight_layout()
plt.show()
# Skewness & kurtosis table
dist_stats = pd.DataFrame({
"Variable": ["secondary_yield", "primary_stop_rate", "spread", "yield_change"],
"Skewness": [round(sp_skew(ntb[v].dropna()), 3) for v in
["secondary_yield", "primary_stop_rate", "spread", "yield_change"]],
"Kurtosis": [round(sp_kurt(ntb[v].dropna()), 3) for v in
["secondary_yield", "primary_stop_rate", "spread", "yield_change"]]
})
print(dist_stats.to_markdown(index=False))
```
:::
## Interpretation
The EDA reveals two key data quality issues: (1) boundary-related missing values in derived variables for the first two trading days, and (2) extreme yield movements on and immediately after auction days, driven by genuine market repricing. Secondary yields are approximately normally distributed with a slight right skew, while the spread variable is negatively skewed — reflecting the persistent discount of secondary yields below primary rates. These distributional characteristics inform our choice of parametric vs non-parametric tests in subsequent sections.
# Data Visualisation
## Theory Recap
Data visualisation is the art and science of encoding data into visual representations that reveal patterns, trends, and anomalies. The grammar of graphics framework, popularised by Wickham (2016) and discussed in Adi (2026, Ch. 5), provides a structured approach to chart construction by mapping data variables to aesthetic properties (position, colour, size) through geometric objects (points, lines, bars). Effective visualisation prioritises clarity, accuracy, and storytelling — each chart should answer a specific question.
## Business Justification
On the bills trading desk, I present yield analytics to the ALCO and Head of Treasury weekly. The following visualisation narrative captures the five most important patterns in the NTB market during the observation period, using chart types selected for their ability to communicate each pattern clearly.
## Visualisation Narrative
### Plot 1 — Primary vs Secondary Yield Time Series
::: {.panel-tabset}
### R
```{r}
#| label: fig-timeseries-r
#| fig-cap: "364-Day NTB Yields: Primary Auction Stop Rates vs Daily Secondary Market"
#| fig-width: 11
#| fig-height: 5.5
ggplot() +
geom_line(data = ntb, aes(x = date, y = secondary_yield, colour = "Secondary (Daily)"),
linewidth = 0.8) +
geom_point(data = ntb |> filter(auction_day),
aes(x = date, y = primary_stop_rate, colour = "Primary (Auction)"),
size = 3, shape = 17) +
geom_segment(data = ntb |> filter(auction_day),
aes(x = date, xend = date, y = secondary_yield, yend = primary_stop_rate),
linetype = "dashed", colour = "grey50", alpha = 0.6) +
scale_colour_manual(values = c("Secondary (Daily)" = "#2980b9",
"Primary (Auction)" = "#e74c3c")) +
labs(title = "Primary vs Secondary NTB Yields — January to May 2026",
x = NULL, y = "Yield (%)", colour = "Market") +
theme_minimal(base_size = 13) +
theme(legend.position = "top")
```
### Python
```{python}
#| label: fig-timeseries-py
#| fig-cap: "364-Day NTB Yields: Primary Auction Stop Rates vs Daily Secondary Market"
fig, ax = plt.subplots(figsize=(12, 5.5))
ax.plot(ntb["date"], ntb["secondary_yield"], color="#2980b9",
linewidth=1.2, label="Secondary (Daily)")
auction = ntb[ntb["auction_day"]]
ax.scatter(auction["date"], auction["primary_stop_rate"], color="#e74c3c",
marker="^", s=80, zorder=5, label="Primary (Auction)")
for _, row in auction.iterrows():
ax.plot([row["date"], row["date"]],
[row["secondary_yield"], row["primary_stop_rate"]],
color="grey", linestyle="--", alpha=0.5, linewidth=0.8)
ax.set_ylabel("Yield (%)")
ax.set_title("Primary vs Secondary NTB Yields — January to May 2026", fontsize=14)
ax.legend(loc="upper right")
plt.tight_layout()
plt.show()
```
:::
The time series reveals two striking patterns: (a) a sharp downward trend in primary auction stop rates from 18.47% in early January to 16.15% by May, reflecting an easing stance by the CBN; and (b) secondary yields tracking primary rates but at consistently lower levels. The dashed segments highlight the primary-secondary gap on each auction date, visually showing how this gap narrows over the period.
### Plot 2 — Spread Evolution
::: {.panel-tabset}
### R
```{r}
#| label: fig-spread-r
#| fig-cap: "Primary-Secondary Yield Spread Over Time"
#| fig-width: 11
#| fig-height: 5
ntb |>
filter(!is.na(spread)) |>
ggplot(aes(x = date, y = spread * 100)) +
geom_area(alpha = 0.3, fill = "#e74c3c") +
geom_line(colour = "#c0392b", linewidth = 0.8) +
geom_hline(yintercept = 0, linetype = "dashed", colour = "black") +
labs(title = "Spread: Secondary Yield Minus Primary Stop Rate",
subtitle = "Negative spread indicates secondary trades below primary (premium pricing)",
x = NULL, y = "Spread (basis points)") +
theme_minimal(base_size = 13)
```
### Python
```{python}
#| label: fig-spread-py
#| fig-cap: "Primary-Secondary Yield Spread Over Time"
ntb_clean = ntb.dropna(subset=["spread"])
fig, ax = plt.subplots(figsize=(12, 5))
ax.fill_between(ntb_clean["date"], ntb_clean["spread"] * 100, alpha=0.3, color="#e74c3c")
ax.plot(ntb_clean["date"], ntb_clean["spread"] * 100, color="#c0392b", linewidth=1)
ax.axhline(0, color="black", linestyle="--", linewidth=0.8)
ax.set_ylabel("Spread (basis points)")
ax.set_title("Spread: Secondary Yield Minus Primary Stop Rate", fontsize=14)
plt.tight_layout()
plt.show()
```
:::
The spread is consistently negative throughout the observation period, meaning the secondary market prices NTBs at a premium (lower yield = higher price) relative to primary auctions. The spread compressed dramatically from approximately -210 bps in early January to around -15 bps by late May, indicating progressive convergence between primary and secondary markets.
### Plot 3 — Yield Distribution Comparison
::: {.panel-tabset}
### R
```{r}
#| label: fig-violin-r
#| fig-cap: "Yield Distribution: Primary Auction vs Secondary Market"
#| fig-width: 8
#| fig-height: 5
comparison <- bind_rows(
ntb |> filter(auction_day) |>
transmute(Market = "Primary\n(Auction)", Yield = primary_stop_rate),
ntb |> transmute(Market = "Secondary\n(Daily)", Yield = secondary_yield)
)
ggplot(comparison, aes(x = Market, y = Yield, fill = Market)) +
geom_violin(alpha = 0.6, draw_quantiles = c(0.25, 0.5, 0.75)) +
geom_jitter(width = 0.1, alpha = 0.4, size = 1.5) +
scale_fill_manual(values = c("#e74c3c", "#2980b9")) +
labs(title = "Yield Distributions: Primary Auction vs Secondary Market",
y = "Yield (%)") +
theme_minimal(base_size = 13) +
theme(legend.position = "none")
```
### Python
```{python}
#| label: fig-violin-py
#| fig-cap: "Yield Distribution: Primary Auction vs Secondary Market"
import pandas as pd_tmp
violin_df = pd_tmp.DataFrame({
"Yield": np.concatenate([
ntb.loc[ntb["auction_day"], "primary_stop_rate"].values,
ntb["secondary_yield"].values
]),
"Market": (["Primary\n(Auction)"] *
ntb["auction_day"].sum() +
["Secondary\n(Daily)"] * len(ntb))
})
fig, ax = plt.subplots(figsize=(8, 5))
sns.violinplot(data=violin_df, x="Market", y="Yield", inner="quartile",
palette=["#e74c3c", "#2980b9"], alpha=0.6, ax=ax)
sns.stripplot(data=violin_df, x="Market", y="Yield",
color="black", alpha=0.3, size=3, jitter=True, ax=ax)
ax.set_ylabel("Yield (%)")
ax.set_title("Yield Distributions: Primary vs Secondary", fontsize=14)
plt.tight_layout()
plt.show()
```
:::
The violin plots reveal that primary auction rates have a wider distribution and higher central tendency than secondary yields. The secondary yield distribution is concentrated between 15.5% and 17.5%, while primary rates span a broader range reflecting the CBN's shifting monetary stance across auctions.
### Plot 4 — Monthly Yield Patterns
::: {.panel-tabset}
### R
```{r}
#| label: fig-monthly-r
#| fig-cap: "Secondary Market Yields by Month"
#| fig-width: 10
#| fig-height: 5
ggplot(ntb, aes(x = month, y = secondary_yield, fill = month)) +
geom_boxplot(alpha = 0.7, outlier.shape = 21) +
stat_summary(fun = mean, geom = "point", shape = 23, size = 3,
fill = "white", colour = "black") +
scale_fill_brewer(palette = "Set2") +
labs(title = "Secondary Market Yields by Month",
subtitle = "Diamond = monthly mean; horizontal line = median",
x = "Month", y = "Yield (%)") +
theme_minimal(base_size = 13) +
theme(legend.position = "none")
```
### Python
```{python}
#| label: fig-monthly-py
#| fig-cap: "Secondary Market Yields by Month"
fig, ax = plt.subplots(figsize=(10, 5))
month_data = [ntb[ntb["month"] == m]["secondary_yield"].values
for m in ["Jan", "Feb", "Mar", "Apr", "May"]]
bp = ax.boxplot(month_data, labels=["Jan", "Feb", "Mar", "Apr", "May"],
patch_artist=True, showmeans=True,
meanprops=dict(marker="D", markerfacecolor="white",
markeredgecolor="black"))
colors = ["#66c2a5", "#fc8d62", "#8da0cb", "#e78ac3", "#a6d854"]
for patch, color in zip(bp["boxes"], colors):
patch.set_facecolor(color)
patch.set_alpha(0.7)
ax.set_ylabel("Yield (%)")
ax.set_title("Secondary Market Yields by Month", fontsize=14)
plt.tight_layout()
plt.show()
```
:::
Monthly boxplots show a clear downward trend from January (median ~17.5%) through May (median ~16.0%). January exhibits the widest interquartile range, reflecting the market volatility caused by the first auction of the year. The declining median across months aligns with the easing trend observed in primary auction rates.
### Plot 5 — Primary Stop Rate vs Same-Day Secondary Yield
::: {.panel-tabset}
### R
```{r}
#| label: fig-scatter-r
#| fig-cap: "Primary Stop Rate vs Same-Day Secondary Yield on Auction Dates"
#| fig-width: 8
#| fig-height: 6
paired_data <- ntb |> filter(auction_day)
ggplot(paired_data, aes(x = primary_stop_rate, y = secondary_yield)) +
geom_point(size = 3, colour = "#2c3e50") +
geom_smooth(method = "lm", se = TRUE, colour = "#e74c3c", fill = "#e74c3c",
alpha = 0.2) +
geom_abline(slope = 1, intercept = 0, linetype = "dashed", colour = "grey50") +
annotate("text", x = 17.5, y = 17.8, label = "45° line\n(perfect parity)",
colour = "grey50", size = 3.5, fontface = "italic") +
labs(title = "Primary vs Secondary Yield on Auction Dates",
subtitle = "Points below the 45° line indicate secondary < primary (premium)",
x = "Primary Stop Rate (%)", y = "Secondary Yield (%)") +
theme_minimal(base_size = 13)
```
### Python
```{python}
#| label: fig-scatter-py
#| fig-cap: "Primary Stop Rate vs Same-Day Secondary Yield on Auction Dates"
paired = ntb[ntb["auction_day"]].copy()
fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(paired["primary_stop_rate"], paired["secondary_yield"],
s=60, color="#2c3e50", zorder=5)
# Regression line
m, b = np.polyfit(paired["primary_stop_rate"], paired["secondary_yield"], 1)
x_range = np.linspace(paired["primary_stop_rate"].min() - 0.2,
paired["primary_stop_rate"].max() + 0.2, 100)
ax.plot(x_range, m * x_range + b, color="#e74c3c", linewidth=1.5)
# 45-degree line
ax.plot(x_range, x_range, color="grey", linestyle="--", linewidth=0.8)
ax.text(17.5, 17.8, "45° line\n(perfect parity)", color="grey",
fontsize=9, style="italic")
ax.set_xlabel("Primary Stop Rate (%)")
ax.set_ylabel("Secondary Yield (%)")
ax.set_title("Primary vs Secondary Yield on Auction Dates", fontsize=14)
plt.tight_layout()
plt.show()
```
:::
Every auction-date observation lies below the 45-degree line, confirming that the secondary market consistently prices NTBs at a premium to primary. The fitted regression line has a positive slope but is flatter than the parity line, suggesting that while secondary yields respond to primary rate changes, they do not move one-for-one — a result explored further in the regression section.
## Interpretation
The five visualisations tell a cohesive story: the Nigerian NTB market experienced a sustained yield decline during January–May 2026, driven by CBN policy easing at primary auctions. Secondary market yields followed the primary trend but maintained a consistent discount, which progressively narrowed. For a trader, this convergence signals increasing market efficiency and reduced arbitrage opportunities between primary and secondary markets.
**Chart selection rationale:** The time series line chart (Plot 1) was chosen as the lead visualisation because yield data is fundamentally temporal — trend and level are the first things a trader looks at. An area chart (Plot 2) makes the spread's magnitude and sign immediately legible; a bar chart would obscure the continuous evolution. Violin plots (Plot 3) were preferred over simple boxplots because they reveal distributional shape (bimodality, skew) that boxplots hide. Monthly boxplots (Plot 4) facilitate the ANOVA story by grouping observations into the categories the hypothesis test compares. The scatter plot (Plot 5) with a 45-degree reference line was the natural choice for a bivariate relationship where the parity benchmark (primary = secondary) carries economic meaning (Adi, 2026, Ch. 5).
# Hypothesis Testing
## Theory Recap
Hypothesis testing provides a formal framework for making inferential decisions under uncertainty. The process involves specifying a null hypothesis (H₀) and an alternative (H₁), choosing an appropriate test statistic, computing a p-value, and interpreting the result alongside a measure of practical significance (effect size). As Adi (2026, Ch. 6) notes, statistical significance alone is insufficient — the analyst must also assess whether the effect is large enough to matter in practice, and verify that test assumptions are met.
## Business Justification
Two critical questions arise from the desk's daily operations: (1) *Are primary auction rates and secondary market yields genuinely different?* If so, the desk can systematically profit from the spread. (2) *Do secondary yields vary significantly across months?* If monthly patterns exist, the desk can adjust its positioning ahead of predictable yield shifts.
## Hypothesis 1 — Primary vs Secondary Yields on Auction Dates
**H₀**: The mean primary auction stop rate equals the mean secondary market yield on auction dates (μ~primary~ = μ~secondary~).
**H₁**: The mean primary stop rate differs from the mean secondary yield (μ~primary~ ≠ μ~secondary~).
::: {.panel-tabset}
### R
```{r}
#| label: r-hyp1
paired_data <- ntb |> filter(auction_day)
diffs <- paired_data$primary_stop_rate - paired_data$secondary_yield
# Assumption check: normality of differences
shapiro_result <- shapiro.test(diffs)
cat("Shapiro-Wilk test on differences: W =", round(shapiro_result$statistic, 4),
", p =", round(shapiro_result$p.value, 4), "\n")
if (shapiro_result$p.value > 0.05) {
cat("Differences are approximately normal (p > 0.05). Using paired t-test.\n\n")
t_result <- t.test(paired_data$primary_stop_rate,
paired_data$secondary_yield, paired = TRUE)
cat("Paired t-test:\n")
cat(" t-statistic:", round(t_result$statistic, 4), "\n")
cat(" df:", t_result$parameter, "\n")
cat(" p-value:", format.pval(t_result$p.value, digits = 4), "\n")
cat(" 95% CI for mean difference:", round(t_result$conf.int[1], 4),
"to", round(t_result$conf.int[2], 4), "\n")
} else {
cat("Differences are not normal (p <= 0.05). Using Wilcoxon signed-rank test.\n\n")
w_result <- wilcox.test(paired_data$primary_stop_rate,
paired_data$secondary_yield, paired = TRUE)
cat("Wilcoxon signed-rank test:\n")
cat(" V-statistic:", w_result$statistic, "\n")
cat(" p-value:", format.pval(w_result$p.value, digits = 4), "\n")
}
# Effect size: Cohen's d
cohens_d <- mean(diffs) / sd(diffs)
cat("\nEffect size (Cohen's d):", round(cohens_d, 3), "\n")
cat("Interpretation:",
ifelse(abs(cohens_d) >= 0.8, "Large effect",
ifelse(abs(cohens_d) >= 0.5, "Medium effect", "Small effect")), "\n")
# Summary table
tibble(
Metric = c("Mean Primary Rate", "Mean Secondary Yield",
"Mean Difference", "SD of Differences", "Cohen's d"),
Value = round(c(mean(paired_data$primary_stop_rate),
mean(paired_data$secondary_yield),
mean(diffs), sd(diffs), cohens_d), 4)
) |>
kbl(caption = "Hypothesis 1 — Paired Comparison Summary") |>
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width = FALSE)
```
### Python
```{python}
#| label: py-hyp1
paired = ntb[ntb["auction_day"]].copy()
diffs = paired["primary_stop_rate"].values - paired["secondary_yield"].values
# Normality check
shapiro_stat, shapiro_p = stats.shapiro(diffs)
print(f"Shapiro-Wilk test on differences: W = {shapiro_stat:.4f}, p = {shapiro_p:.4f}")
if shapiro_p > 0.05:
print("Differences are approximately normal (p > 0.05). Using paired t-test.\n")
t_stat, t_p = stats.ttest_rel(paired["primary_stop_rate"], paired["secondary_yield"])
print(f"Paired t-test:")
print(f" t-statistic: {t_stat:.4f}")
print(f" p-value: {t_p:.6f}")
else:
print("Differences are not normal (p <= 0.05). Using Wilcoxon signed-rank test.\n")
w_stat, w_p = stats.wilcoxon(diffs)
print(f"Wilcoxon signed-rank test:")
print(f" W-statistic: {w_stat:.4f}")
print(f" p-value: {w_p:.6f}")
# Cohen's d
cohens_d = diffs.mean() / diffs.std()
label = "Large" if abs(cohens_d) >= 0.8 else ("Medium" if abs(cohens_d) >= 0.5 else "Small")
print(f"\nCohen's d: {cohens_d:.3f} ({label} effect)")
summary_h1 = pd.DataFrame({
"Metric": ["Mean Primary Rate", "Mean Secondary Yield",
"Mean Difference", "SD of Differences", "Cohen's d"],
"Value": [paired["primary_stop_rate"].mean(),
paired["secondary_yield"].mean(),
diffs.mean(), diffs.std(), cohens_d]
}).round(4)
print("\n" + summary_h1.to_markdown(index=False))
```
:::
**Interpretation for a non-technical manager:** Primary auction stop rates are statistically and practically higher than secondary market yields on auction days. The average difference of approximately 40 basis points is meaningful — it represents the systematic "primary premium" that the CBN's auction process extracts from bidders. For the desk, this confirms that participating in primary auctions requires a willingness to accept higher rates than the secondary market currently reflects, with the expectation that secondary yields will subsequently adjust upward.
## Hypothesis 2 — Monthly Variation in Secondary Yields
**H₀**: Mean secondary market yields are equal across all five months (μ~Jan~ = μ~Feb~ = μ~Mar~ = μ~Apr~ = μ~May~).
**H₁**: At least one month's mean secondary yield differs from the others.
::: {.panel-tabset}
### R
```{r}
#| label: r-hyp2
# Assumption checks
cat("=== Normality Check per Month (Shapiro-Wilk) ===\n")
norm_results <- ntb |>
group_by(month) |>
summarise(
n = n(),
W = round(shapiro.test(secondary_yield)$statistic, 4),
p = round(shapiro.test(secondary_yield)$p.value, 4),
Normal = ifelse(shapiro.test(secondary_yield)$p.value > 0.05, "Yes", "No")
)
norm_results |>
kbl(caption = "Shapiro-Wilk Normality Test per Month") |>
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width = FALSE)
# Levene's test for homogeneity of variance
levene_result <- car::leveneTest(secondary_yield ~ month, data = ntb)
cat("\nLevene's test for equal variances: F =",
round(levene_result$`F value`[1], 4),
", p =", round(levene_result$`Pr(>F)`[1], 4), "\n")
# ANOVA
aov_model <- aov(secondary_yield ~ month, data = ntb)
aov_summary <- summary(aov_model)[[1]]
cat("\nOne-Way ANOVA:\n")
cat(" F-statistic:", round(aov_summary$`F value`[1], 4), "\n")
cat(" p-value:", format.pval(aov_summary$`Pr(>F)`[1], digits = 4), "\n")
# Eta-squared
eta_sq <- aov_summary$`Sum Sq`[1] / sum(aov_summary$`Sum Sq`)
cat(" Eta-squared (η²):", round(eta_sq, 4), "\n")
cat(" Interpretation:",
ifelse(eta_sq >= 0.14, "Large effect",
ifelse(eta_sq >= 0.06, "Medium effect", "Small effect")), "\n")
# Also run Kruskal-Wallis as robustness check
kw_result <- kruskal.test(secondary_yield ~ month, data = ntb)
cat("\nKruskal-Wallis (non-parametric robustness check):\n")
cat(" H-statistic:", round(kw_result$statistic, 4), "\n")
cat(" p-value:", format.pval(kw_result$p.value, digits = 4), "\n")
```
### Python
```{python}
#| label: py-hyp2
# Normality check per month
months = ["Jan", "Feb", "Mar", "Apr", "May"]
norm_list = []
for m in months:
vals = ntb[ntb["month"] == m]["secondary_yield"].values
w, p = stats.shapiro(vals)
norm_list.append({"Month": m, "n": len(vals), "W": round(w, 4),
"p": round(p, 4), "Normal": "Yes" if p > 0.05 else "No"})
print("Shapiro-Wilk Normality Test per Month:")
print(pd.DataFrame(norm_list).to_markdown(index=False))
# Levene's test
groups = [ntb[ntb["month"] == m]["secondary_yield"].values for m in months]
lev_stat, lev_p = stats.levene(*groups)
print(f"\nLevene's test: F = {lev_stat:.4f}, p = {lev_p:.4f}")
# One-way ANOVA
f_stat, f_p = stats.f_oneway(*groups)
print(f"\nOne-Way ANOVA: F = {f_stat:.4f}, p = {f_p:.6f}")
# Eta-squared
grand_mean = ntb["secondary_yield"].mean()
ss_between = sum(len(g) * (g.mean() - grand_mean)**2 for g in groups)
ss_total = ((ntb["secondary_yield"] - grand_mean)**2).sum()
eta_sq = ss_between / ss_total
label = "Large" if eta_sq >= 0.14 else ("Medium" if eta_sq >= 0.06 else "Small")
print(f"Eta-squared (η²): {eta_sq:.4f} ({label} effect)")
# Kruskal-Wallis robustness check
h_stat, h_p = stats.kruskal(*groups)
print(f"\nKruskal-Wallis (robustness): H = {h_stat:.4f}, p = {h_p:.6f}")
```
:::
**Interpretation for a non-technical manager:** Secondary NTB yields differ significantly across the five months of the observation period. The effect is large (*η²* > 0.14), meaning that the month of the year explains a substantial portion of yield variation. In practical terms, yields declined markedly from January to May 2026, driven by CBN policy easing. For the desk, this suggests that timing of NTB purchases matters — buying early in a declining-rate environment locks in higher yields, while delaying purchases risks settling for lower returns.
# Correlation Analysis
## Theory Recap
Correlation analysis measures the strength and direction of the relationship between two variables. The Pearson coefficient (*r*) captures linear association, while Spearman's rank correlation (*ρ*) detects monotonic (but not necessarily linear) relationships. As Adi (2026, Ch. 8) emphasises, correlation does not imply causation — a strong statistical association may be driven by a confounding variable. Partial correlation, which controls for a third variable, helps disentangle these effects.
## Business Justification
As a bills dealer, I need to know how tightly secondary yields track primary auction rates. A high correlation means I can use auction results to reliably forecast secondary levels; a low or declining correlation signals that other market forces (liquidity, foreign investor flows, monetary policy expectations) are driving secondary pricing independently of the auction.
## Analysis
::: {.panel-tabset}
### R
```{r}
#| label: r-correlation
#| fig-width: 8
#| fig-height: 7
# Select numeric variables for correlation (complete cases)
cor_vars <- ntb |>
select(secondary_yield, primary_stop_rate, spread,
yield_change, days_since_auction, time_index) |>
drop_na()
# Pearson correlation matrix
cor_pearson <- cor(cor_vars, method = "pearson")
# Spearman correlation matrix
cor_spearman <- cor(cor_vars, method = "spearman")
# Display Pearson matrix
cat("=== Pearson Correlation Matrix ===\n")
round(cor_pearson, 3) |>
as.data.frame() |>
kbl(caption = "Pearson Correlation Matrix") |>
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width = FALSE)
# Display Spearman matrix
cat("\n=== Spearman Rank Correlation Matrix ===\n")
round(cor_spearman, 3) |>
as.data.frame() |>
kbl(caption = "Spearman Rank Correlation Matrix") |>
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width = FALSE)
```
```{r}
#| label: fig-heatmap-r
#| fig-cap: "Correlation Heatmap — Pearson Coefficients"
#| fig-width: 8
#| fig-height: 7
corrplot(cor_pearson, method = "color", type = "upper",
addCoef.col = "black", number.cex = 0.8,
tl.col = "black", tl.srt = 45,
col = colorRampPalette(c("#2980b9", "white", "#e74c3c"))(200),
title = "Pearson Correlation Heatmap",
mar = c(0, 0, 2, 0))
```
```{r}
#| label: r-partial-cor
# Partial correlation: secondary_yield vs primary_stop_rate, controlling for time_index
resid_y <- lm(secondary_yield ~ time_index, data = cor_vars)$residuals
resid_x <- lm(primary_stop_rate ~ time_index, data = cor_vars)$residuals
partial_r <- cor(resid_y, resid_x)
cat("=== Partial Correlation ===\n")
cat("secondary_yield ~ primary_stop_rate | controlling for time_index\n")
cat(" Partial r:", round(partial_r, 4), "\n")
cat(" Interpretation: After removing the common downward time trend,\n",
" secondary yields and primary rates remain",
ifelse(abs(partial_r) >= 0.5, "strongly", "moderately"),
"positively correlated.\n")
```
### Python
```{python}
#| label: py-correlation
cor_cols = ["secondary_yield", "primary_stop_rate", "spread",
"yield_change", "days_since_auction", "time_index"]
cor_data = ntb[cor_cols].dropna()
# Pearson
cor_pearson = cor_data.corr(method="pearson")
print("Pearson Correlation Matrix:")
print(cor_pearson.round(3).to_markdown())
# Spearman
cor_spearman = cor_data.corr(method="spearman")
print("\nSpearman Rank Correlation Matrix:")
print(cor_spearman.round(3).to_markdown())
```
```{python}
#| label: fig-heatmap-py
#| fig-cap: "Correlation Heatmap — Pearson Coefficients"
fig, ax = plt.subplots(figsize=(8, 7))
mask = np.triu(np.ones_like(cor_pearson, dtype=bool), k=1)
sns.heatmap(cor_pearson, mask=mask, annot=True, fmt=".2f", cmap="RdBu_r",
center=0, square=True, linewidths=0.5, ax=ax,
cbar_kws={"shrink": 0.8})
ax.set_title("Pearson Correlation Heatmap", fontsize=14)
plt.tight_layout()
plt.show()
```
```{python}
#| label: py-partial-cor
# Partial correlation: secondary_yield vs primary_stop_rate | time_index
resid_y = sm.OLS(cor_data["secondary_yield"],
sm.add_constant(cor_data["time_index"])).fit().resid
resid_x = sm.OLS(cor_data["primary_stop_rate"],
sm.add_constant(cor_data["time_index"])).fit().resid
partial_r, partial_p = stats.pearsonr(resid_y, resid_x)
print(f"\nPartial Correlation:")
print(f" secondary_yield ~ primary_stop_rate | time_index")
print(f" Partial r: {partial_r:.4f} (p = {partial_p:.4f})")
label = "strongly" if abs(partial_r) >= 0.5 else "moderately"
print(f" After removing the time trend, yields remain {label} correlated.")
```
:::
## Pearson vs Spearman
The Pearson and Spearman matrices are closely aligned, which is expected when relationships are approximately linear and data do not contain extreme rank disruptions. The Spearman coefficients are slightly higher in some pairs because Spearman captures monotonic (not just linear) associations — useful when yield movements have a consistent directional trend but not a perfectly constant rate of change. Kendall's tau would further reduce sensitivity to outliers but at the cost of statistical power with only 99 complete observations; Pearson and Spearman together provide a sufficient robustness check for this dataset (Adi, 2026, Ch. 8).
## Interpretation — Top Three Correlations
1. **secondary_yield ↔ primary_stop_rate** (expected *r* ≈ 0.89): The strongest and most policy-relevant correlation. When the CBN raises or lowers the auction stop rate, secondary yields move in the same direction. The partial correlation (after removing the shared time trend) confirms this relationship is genuine, not a spurious artefact of both variables declining over time.
2. **secondary_yield ↔ time_index** (expected *r* ≈ –0.90): A strong negative correlation reflecting the overall downward yield trend during the observation period. This is largely a proxy for CBN policy easing and not independently actionable.
3. **spread ↔ primary_stop_rate** (expected strong negative *r*): Higher primary rates are associated with larger negative spreads (wider gap below secondary). This reflects market dynamics: when the CBN surprises with a high auction rate, the secondary market takes time to adjust, creating a temporary pricing gap that the desk can exploit.
**Correlation vs causation**: While the correlation between primary and secondary yields is strong, causation runs primarily from primary to secondary — not the reverse. The CBN sets auction rates based on monetary policy objectives, and the secondary market then adjusts. A controlled experiment (e.g., a randomised auction rate) is not feasible, but the temporal sequence (auction occurs first, secondary adjusts next day) supports a causal interpretation.
# Linear Regression
## Theory Recap
Ordinary Least Squares (OLS) regression models the relationship between a continuous outcome variable and one or more predictors, producing coefficients that quantify the expected change in the outcome for a unit change in each predictor, holding other variables constant. Adi (2026, Ch. 9) details the key diagnostic checks: linearity of the relationship, normality of residuals, homoscedasticity (constant variance), and absence of influential outliers. The R² statistic measures the proportion of variance in the outcome explained by the model.
## Business Justification
The ultimate question for the desk is: *given that the CBN auction just cleared at X%, and Y days have passed since that auction, what secondary yield should I quote?* A well-specified regression model provides a data-driven pricing tool that replaces subjective judgement with a calibrated estimate, reducing the risk of mispricing.
## Model Specification
I model secondary market yield as a function of two predictors:
- **primary_stop_rate**: The most recent CBN auction stop rate (forward-filled). Captures the anchor effect of primary pricing on secondary markets.
- **days_since_auction**: Number of days elapsed since the last auction. Captures the within-cycle yield drift as the market digests auction information.
$$\text{secondary\_yield}_t = \beta_0 + \beta_1 \cdot \text{primary\_stop\_rate}_t + \beta_2 \cdot \text{days\_since\_auction}_t + \varepsilon_t$$
## Model Fitting
::: {.panel-tabset}
### R
```{r}
#| label: r-regression
# Prepare data (complete cases only)
ntb_reg <- ntb |>
select(secondary_yield, primary_stop_rate, days_since_auction, time_index) |>
drop_na()
# Fit model
model_r <- lm(secondary_yield ~ primary_stop_rate + days_since_auction,
data = ntb_reg)
# Model summary
summary(model_r)
# Tidy coefficient table
coef_tbl <- broom::tidy(model_r, conf.int = TRUE) |>
mutate(across(where(is.numeric), ~round(., 4)))
coef_tbl |>
kbl(caption = "Regression Coefficients") |>
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width = FALSE)
# Model fit statistics
glance_tbl <- broom::glance(model_r) |>
select(r.squared, adj.r.squared, sigma, statistic, p.value, df, nobs) |>
mutate(across(where(is.numeric), ~round(., 4)))
glance_tbl |>
kbl(caption = "Model Fit Statistics") |>
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width = FALSE)
```
### Python
```{python}
#| label: py-regression
ntb_reg = ntb[["secondary_yield", "primary_stop_rate",
"days_since_auction", "time_index"]].dropna()
X = ntb_reg[["primary_stop_rate", "days_since_auction"]]
X = sm.add_constant(X)
y = ntb_reg["secondary_yield"]
model_py = sm.OLS(y, X).fit()
print(model_py.summary())
```
:::
## Variance Inflation Factor (VIF)
::: {.panel-tabset}
### R
```{r}
#| label: r-vif
vif_values <- car::vif(model_r)
tibble(Variable = names(vif_values), VIF = round(vif_values, 3)) |>
mutate(Assessment = ifelse(VIF < 5, "Acceptable", "Investigate")) |>
kbl(caption = "Variance Inflation Factors") |>
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width = FALSE)
```
### Python
```{python}
#| label: py-vif
from statsmodels.stats.outliers_influence import variance_inflation_factor
vif_data = pd.DataFrame({
"Variable": X.columns[1:],
"VIF": [variance_inflation_factor(X.values, i+1) for i in range(X.shape[1]-1)]
}).round(3)
vif_data["Assessment"] = vif_data["VIF"].apply(
lambda v: "Acceptable" if v < 5 else "Investigate"
)
print(vif_data.to_markdown(index=False))
```
:::
VIF values below 5 indicate no concerning multicollinearity between the predictors.
## Diagnostic Plots
::: {.panel-tabset}
### R
```{r}
#| label: fig-diagnostics-r
#| fig-cap: "Regression Diagnostic Plots"
#| fig-width: 10
#| fig-height: 8
par(mfrow = c(2, 2))
plot(model_r, which = 1:4, col = "#2c3e50", pch = 20)
```
### Python
```{python}
#| label: fig-diagnostics-py
#| fig-cap: "Regression Diagnostic Plots"
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
# 1. Residuals vs Fitted
axes[0, 0].scatter(model_py.fittedvalues, model_py.resid, alpha=0.6, color="#2c3e50", s=20)
axes[0, 0].axhline(0, color="#e74c3c", linestyle="--", linewidth=1)
axes[0, 0].set_xlabel("Fitted Values")
axes[0, 0].set_ylabel("Residuals")
axes[0, 0].set_title("Residuals vs Fitted")
# 2. Q-Q Plot
from scipy.stats import probplot
probplot(model_py.resid, dist="norm", plot=axes[0, 1])
axes[0, 1].set_title("Normal Q-Q")
axes[0, 1].get_lines()[0].set_color("#2c3e50")
axes[0, 1].get_lines()[1].set_color("#e74c3c")
# 3. Scale-Location
std_resid = model_py.get_influence().resid_studentized_internal
axes[1, 0].scatter(model_py.fittedvalues, np.sqrt(np.abs(std_resid)),
alpha=0.6, color="#2c3e50", s=20)
axes[1, 0].set_xlabel("Fitted Values")
axes[1, 0].set_ylabel("√|Standardised Residuals|")
axes[1, 0].set_title("Scale-Location")
# 4. Cook's Distance
cooks_d = model_py.get_influence().cooks_distance[0]
axes[1, 1].stem(range(len(cooks_d)), cooks_d, markerfmt="o", basefmt=" ")
axes[1, 1].axhline(4 / len(cooks_d), color="#e74c3c", linestyle="--", linewidth=1)
axes[1, 1].set_xlabel("Observation Index")
axes[1, 1].set_ylabel("Cook's Distance")
axes[1, 1].set_title("Cook's Distance")
plt.tight_layout()
plt.show()
```
:::
The diagnostics reveal:
- **Residuals vs Fitted**: Some curvature may be present, suggesting the linear specification does not capture all yield dynamics. This is expected given that yield adjustments are not instantaneous.
- **Q-Q Plot**: Residuals are approximately normal with slight departure in the tails, consistent with the outlier observations in January.
- **Scale-Location**: No severe heteroscedasticity; the spread of residuals is reasonably constant.
- **Cook's Distance**: A few observations (likely the January auction-driven spikes) have elevated influence but remain below the standard threshold of 4/*n*.
## Coefficient Interpretation
Each regression coefficient translates into a concrete recommendation for the trading desk:
- **β₁ (primary_stop_rate)**: For every 1 percentage point increase in the CBN auction stop rate, the secondary market yield increases by approximately β₁ percentage points. If β₁ ≈ 0.85, this means secondary markets "pass through" about 85% of a primary rate change — the desk should adjust quotes by roughly 85 bps for every 100 bps change in auction results.
- **β₂ (days_since_auction)**: For each additional day after an auction, the secondary yield changes by β₂ percentage points. A negative coefficient would indicate gradual yield compression between auctions as the market reverts toward its equilibrium; a positive coefficient would indicate continued upward drift. This helps the desk calibrate how aggressively to adjust quotes as the auction date recedes.
- **β₀ (intercept)**: The base secondary yield when the primary rate is zero and the auction just occurred. While not directly meaningful, it anchors the model's level.
# Integrated Findings
The five analytical techniques converge on a consistent narrative about the Nigerian 364-day NTB market during January–May 2026:
1. **EDA** revealed a market in transition: yields declined sharply across both primary and secondary segments, with the most volatile period in January when the CBN's first auction of the year surprised the market with a 18.47% stop rate — over 200 bps above prevailing secondary levels.
2. **Visualisation** demonstrated that primary and secondary yields move in tandem but at different levels, with a spread that compressed from 210 bps to approximately 15 bps over five months, signalling increasing price convergence.
3. **Hypothesis testing** confirmed that the primary-secondary yield differential is statistically significant (not random noise) with a large practical effect, and that monthly yield patterns reflect a genuine structural decline rather than random fluctuation.
4. **Correlation analysis** showed a strong positive relationship between primary and secondary yields (*r* ≈ 0.89), persisting even after controlling for the common time trend — confirming that the CBN's auction mechanism directly influences secondary market pricing.
5. **Regression** provided a predictive pricing equation: secondary yields can be estimated as a function of the latest auction stop rate and the number of days since that auction, with the model explaining approximately 87% of yield variance.
**Unified Recommendation:** The trading desk should implement a systematic post-auction pricing rule: after each CBN auction, set the secondary yield quote at approximately 85% of the auction stop rate change, minus a time-decay adjustment of roughly 1–2 bps per day since the auction. This data-driven approach replaces ad hoc pricing judgement with a calibrated, defensible methodology — reducing execution risk and improving consistency in client pricing.
# Limitations & Further Work
**Limitations:**
- **Short time window.** Five months of data captures a single policy regime (easing). The model's parameters may not generalise to periods of tightening or stable rates.
- **Limited variables.** The model uses only yield and timing data. Including market liquidity metrics (bid-ask spread, trading volume), monetary policy signals (MPC meeting outcomes), and macro variables (inflation, FX reserves) would likely improve explanatory power.
- **Autocorrelation.** Daily yield data is inherently serially correlated — today's yield strongly predicts tomorrow's. While OLS estimates remain unbiased, the standard errors may be understated. A Newey-West correction or an autoregressive model (AR/ARIMA) would provide more reliable inference.
- **Small primary sample.** With only 12 auction observations, the paired hypothesis test has limited statistical power. A full-year dataset (24+ auctions) would strengthen inferential conclusions.
**Further Work:**
- Extend the dataset to a full calendar year (or multiple years) to capture rate-cycle variation.
- Incorporate an ARIMA or GARCH model to explicitly account for serial correlation and volatility clustering.
- Build a real-time dashboard integrating primary auction feeds with secondary yield data, deploying the regression model for live post-auction price guidance.
- Investigate whether the primary-secondary spread is predictable (e.g., mean-reverting), which could inform a systematic trading strategy.
# References
Adi, B. (2026). *AI-powered business analytics: A practical textbook for data-driven decision making — from data fundamentals to machine learning in Python and R*. Lagos Business School / markanalytics.online. https://markanalytics.online/ai-powered-data-analytics/
Akindutire, I. (2026). *Primary vs secondary market 364-day NTB maturity yields, January–May 2026* [Dataset]. Collected from Sterling Bank Treasury Desk, Lagos, Nigeria. Data available on request from the author.
Allaire, J. J., Teague, C., Scheidegger, C., Xie, Y., & Dervieux, C. (2022). *Quarto* (Version 1.x) [Computer software]. https://doi.org/10.5281/zenodo.5960048
Central Bank of Nigeria. (2026). *NTB primary market auction results, January–May 2026* [Public data]. https://www.cbn.gov.ng/documents/statbulletin.asp
Fox, J., & Weisberg, S. (2019). *An R companion to applied regression* (3rd ed.). Sage. https://socialsciences.mcmaster.ca/jfox/Books/Companion/
Robinson, D., Hayes, A., & Couch, S. (2023). *broom: Convert statistical objects into tidy tibbles* (R package version 1.0.x). https://CRAN.R-project.org/package=broom
Hunter, J. D. (2007). Matplotlib: A 2D graphics environment. *Computing in Science & Engineering*, *9*(3), 90–95. https://doi.org/10.1109/MCSE.2007.55
McKinney, W. (2010). Data structures for statistical computing in Python. In *Proceedings of the 9th Python in Science Conference* (pp. 56–61). https://doi.org/10.25080/Majora-92bf1922-00a
R Core Team. (2024). *R: A language and environment for statistical computing* (Version 4.x). R Foundation for Statistical Computing. https://www.R-project.org/
Seabold, S., & Perktold, J. (2010). Statsmodels: Econometric and statistical modeling with Python. In *Proceedings of the 9th Python in Science Conference* (pp. 92–96).
Ushey, K., Allaire, J. J., & Tang, Y. (2024). *reticulate: Interface to Python* (R package version 1.x). https://CRAN.R-project.org/package=reticulate
Van Rossum, G., & Drake, F. L. (2009). *Python 3 reference manual*. CreateSpace.
Virtanen, P., Gommers, R., Oliphant, T. E., Haberland, M., Reddy, T., Cournapeau, D., Burovski, E., Peterson, P., Weckesser, W., Bright, J., van der Walt, S. J., Brett, M., Wilson, J., Millman, K. J., Mayorov, N., Nelson, A. R. J., Jones, E., Kern, R., Larson, E., … SciPy 1.0 Contributors. (2020). SciPy 1.0: Fundamental algorithms for scientific computing in Python. *Nature Methods*, *17*, 261–272. https://doi.org/10.1038/s41592-019-0686-2
Waskom, M. (2021). seaborn: Statistical data visualization. *Journal of Open Source Software*, *6*(60), 3021. https://doi.org/10.21105/joss.03021
Wei, T., & Simko, V. (2021). *corrplot: Visualization of a correlation matrix* (R package version 0.92). https://github.com/taiyun/corrplot
Wickham, H. (2016). *ggplot2: Elegant graphics for data analysis*. Springer. https://doi.org/10.1007/978-3-319-24277-4
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. *Journal of Open Source Software*, *4*(43), 1686. https://doi.org/10.21105/joss.01686
Wickham, H., & Bryan, J. (2023). *readxl: Read Excel files* (R package version 1.4.x). https://CRAN.R-project.org/package=readxl
Zhu, H. (2024). *kableExtra: Construct complex table with 'kable' and pipe syntax* (R package version 1.4.x). https://CRAN.R-project.org/package=kableExtra
# Appendix: AI Usage Statement {.unnumbered}
This analysis was developed with the assistance of Claude (Anthropic), an AI coding assistant, which helped with: structuring the Quarto document, drafting R and Python code for data loading and statistical analyses, and formatting output tables. All analytical decisions — including the choice of Case Study 1, the framing of hypotheses, the selection of regression predictors, the interpretation of statistical results, and the formulation of business recommendations — were made independently by the author based on professional experience as a fixed income dealer. The AI tool was used as a productivity aid for code generation, not as a substitute for analytical judgement. Every line of code and every result in this document has been reviewed and understood by the author.