Drivers of Liquidity Strength from Retail Station Performance: An Exploratory and Inferential Analysis in a Downstream Oil & Gas Treasury Function

Author

Juliet Okechukwu

Published

May 10, 2026

1. Executive Summary

This study examines how retail station performance translates into daily liquidity strength within the treasury function of a downstream oil and gas company. The business problem addressed is the difficulty of forecasting reliable cash inflows to fund depot loadings, supplier payments, salaries, and statutory obligations — all of which depend on revenue generated by the retail network.

A primary dataset was extracted from the company’s retail reporting system and Business Central ERP covering 193 stations over October–December 2025 (579 station-month observations). Variables include PMS, AGO, and DPK volumes, realised prices, product mix, and computed revenues. Exploratory analysis revealed severe right-skewness in revenue distribution (skewness = 4.36) and a Herfindahl-Hirschman Index (HHI) of 80.6, confirming structural liquidity concentration risk: the top two stations alone generate ₦90.6 million per day. Hypothesis testing confirmed statistically significant AGO volume growth across Q4 — driven by Nigeria’s power supply deficits, industrial activity cycles, and logistics demand — validating the existing inventory pre-positioning policy. Correlation analysis showed revenue is almost entirely volume-driven (r = 0.9995), not price-driven. A log-linear regression model identified PMS volume as the strongest predictor of station revenue (R² = 0.82), and a named residual leaderboard exposed stations generating significantly more or less than their operational profile predicts.

The study recommends that treasury build inflow forecasts on PMS throughput assumptions rather than price assumptions, designate Asaba Summit Junction and Ibafo as liquidity-critical assets subject to daily monitoring and priority supply allocation, and prioritise PMS storage capacity in the 2026 capital expenditure plan as the single highest-return intervention for strengthening the company’s funding position.


2. Professional Disclosure

Job Title: Finance Manager — Liquidity & Treasury Management

Organisation: Rainoil Limited. A downstream oil and gas company operating a retail fuel distribution network across Nigeria, encompassing petroleum product procurement from the Dangote Refinery and other domestic suppliers, depot operations, and a network of 193 retail stations dispensing PMS (petrol), AGO (diesel), and DPK (kerosene).

My role in treasury is to ensure that sufficient cash is available daily to fund depot loadings, supplier settlements, salaries, taxes, and statutory obligations. These decisions are executed at the bank position level but are fundamentally driven by the revenue-generating capacity of the retail station network. Understanding how stations perform operationally is therefore essential for anticipating liquidity strength, planning funding requirements, and setting appropriate liquidity buffers.

Technique Justification — why each method is directly relevant to my role:

1. Exploratory Data Analysis: My treasury role begins every morning with an assessment of where cash is, where it needs to be, and whether the day’s inflows are sufficient to cover scheduled outflows. EDA formalises this instinct into a reproducible diagnostic routine. In this analysis, EDA on the Q4 2025 station sales data immediately surfaced two material quality issues — five stations with zero-volume months (ambiguous between genuine closure and reporting failure) — and a severe right-skew in revenue distribution (skewness = 4.36) driven by two mega-stations whose combined Q4 revenue of approximately ₦8.25 billion represented a disproportionate share of the ₦122.6 billion network total. In treasury terms, this concentration risk is directly relevant: if two stations account for that share of inflows, any operational disruption at either site creates an immediate and material liquidity gap the rest of the network cannot absorb in the short term.

2. Data Visualisation: Treasury reporting is not done for analysts — it is done for GMDs, CFOs, and board treasury committees who need to make funding decisions in minutes, not hours. In this analysis, five coordinated plots told a single operational story: the efficiency frontier showed which stations extract the most revenue per litre of product sold; the regional momentum chart showed which geographies are accelerating versus contracting as liquidity contributors; the concentration chart made the HHI risk visceral; the AGO growth distribution validated the seasonal pre-positioning policy; and the regression residual leaderboard named the specific stations whose performance warrants immediate management action. These visuals are precisely what I would embed in a weekly treasury dashboard.

3. Hypothesis Testing: Treasury decisions are frequently justified by assumptions that have never been formally tested. A standing assumption in downstream operations is that AGO demand is seasonal — driven by chronic power supply deficits, industrial and commercial activity cycles, long-haul logistics demand, and agricultural mechanisation — justifying pre-positioning of AGO inventory and the associated funding commitment of 60–90 days of working capital. In this analysis, a formal one-sample t-test evaluated exactly this assumption. The discipline of stating H₀ and H₁, checking normality, reporting effect size alongside the p-value, and confirming with a non-parametric backup is the same discipline required when presenting a funding recommendation to the CFO — the conclusion must be defensible, not anecdotal.

4. Correlation Analysis: In treasury, the question is never simply “what happened?” but “what is driving it, and can I see it coming?” The correlation matrix identified that total volume and total revenue are nearly perfectly collinear (r ≈ 0.9995), confirming that revenue forecasting is a throughput problem. To make this concrete: PMS price variation across the network is less than ₦70 per litre, while PMS volume varies by a factor of more than 25×. Price scenarios contribute almost nothing to forecast accuracy; supply allocation decisions are everything. This finding directly informs how I structure the treasury cash flow model — it should take its primary input from Dangote Refinery allocation data, not price assumptions.

5. Linear Regression: The most consequential treasury decision made daily is whether to draw on a credit facility — and if so, how much and for how long. A regression model translates each station’s operational characteristics into a predicted revenue contribution, and the residuals identify which stations are systematically over- or under-performing relative to what their operational profile would predict. Stations with large negative residuals are not just underperforming — they are structural funding drains. The coefficient on PMS volume directly justifies prioritising PMS storage capacity in the 2026 capex plan, because storage constraints are the binding supply-side ceiling on inflows.


3. Data Collection, Provenance & Sampling

Source: Retail sales records extracted from Rainoil Limited’s Business Central ERP and retail reporting system by the Finance Manager (author). Data reflects pump-level sales transactions aggregated monthly per station by the company’s retail operations team.

Collection Method: Direct extraction from the company’s internal ERP system. The Finance Manager downloaded monthly sales summaries for all active stations, covering the period October to December 2025. The data was exported as an Excel workbook with one row per station and separate column groups for each month, covering volumes, prices, and computed revenues for PMS, AGO, and DPK.

Sampling Frame & Justification: This is a census — all 193 retail stations in the Rainoil network are included. No sampling design was required because the entire population was available and the dataset (579 station-month observations) was fully manageable computationally. A census is statistically preferable to a sample here because (a) the network is finite and known, (b) the treasury question concerns specific high-contributing stations rather than population-level inference, and (c) omitting any station would introduce coverage bias into the concentration and regression analyses. Three months provides sufficient cross-sectional variation (n=579) for EDA, hypothesis testing, correlation, and regression, which rely more on cross-sectional variation than long time-series depth.

Time Period: October 1 to December 31, 2025 (Q4 2025). This covers 91 calendar days across three calendar months. The three-month window captures within-quarter seasonal variation and supports month-on-month growth analysis. A longer panel would improve time-series and seasonality modelling but was not available for this submission.

Ethical Approvals & Data-Sharing Restrictions: Formal permission was obtained from management to use this operational dataset strictly for academic purposes in this examination. The analysis does not disclose commercially sensitive pricing, margin, or procurement information beyond what is required for statistical interpretation. No customer-level or personally identifiable data is included. Station names are used for analytical purposes and are not disclosed outside this academic exercise. Findings will also be presented to management as part of ongoing efforts to strengthen treasury forecasting. Due to the confidential nature of the underlying data, the raw dataset is available on request from the author subject to Rainoil management approval.

Variables:

Variable Type Description
station Categorical Station name (193 unique values)
month Categorical / Date Oct, Nov, Dec 2025
pms_vol Numeric PMS (petrol) volume sold (litres)
ago_vol Numeric AGO (diesel) volume sold (litres)
dpk_vol Numeric DPK (kerosene) volume sold (litres)
avg_pms_price Numeric Average realised PMS price (₦/litre)
avg_ago_price Numeric Average realised AGO price (₦/litre)
avg_dpk_price Numeric Average realised DPK price (₦/litre)
pms_rev Numeric PMS revenue for the month (₦)
ago_rev Numeric AGO revenue for the month (₦)
dpk_rev Numeric DPK revenue for the month (₦)
total_vol Numeric Total volume across all products (litres)
total_rev Numeric Total station revenue for the month (₦)
pms_share Numeric PMS volume as proportion of total volume (0–1)

4. Data Description

4.1 Load Libraries and Data

Code
library(readxl)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.1     ✔ readr     2.2.0
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.3     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Code
data <- read_excel("2025_OCT_NOV_DEC_SALES_DATA.xlsx")
New names:
• `` -> `...1`
• `` -> `...3`
• `` -> `...4`
• `` -> `...5`
• `` -> `...6`
• `` -> `...7`
• `` -> `...8`
• `` -> `...9`
• `` -> `...10`
• `` -> `...11`
• `` -> `...12`
• `` -> `...13`
• `` -> `...14`
• `` -> `...15`
• `` -> `...16`
• `` -> `...17`
• `` -> `...18`
• `` -> `...19`
• `` -> `...20`
• `` -> `...21`
• `` -> `...22`
• `` -> `...23`
• `` -> `...24`
• `` -> `...25`
• `` -> `...26`
• `` -> `...27`
• `` -> `...28`
• `` -> `...29`
Code
glimpse(data)
Rows: 197
Columns: 29
$ ...1                    <chr> NA, "S/N", NA, "1", "2", "3", "4", "5", "6", "…
$ `2025 RETAIL SALE DATA` <chr> NA, "Station", NA, "Rainoil Uyo - Abak Road ",…
$ ...3                    <chr> NA, "Total Sales (Ltrs) Oct 2025", " PMS ", "1…
$ ...4                    <chr> NA, NA, " AGO ", "15593", "9441", "18719", "65…
$ ...5                    <chr> NA, NA, " DPK ", "0", "112", "400", "307", "14…
$ ...6                    <chr> NA, "Total Sales (Ltrs) Nov 2025", " PMS ", "1…
$ ...7                    <chr> NA, NA, " AGO ", "15444", "4861", "22116", "60…
$ ...8                    <chr> NA, NA, " DPK ", "0", "520", "328", "262", "13…
$ ...9                    <chr> NA, "Total Sales (Ltrs) Dec 2025", " PMS ", "1…
$ ...10                   <chr> NA, NA, " ANO ", "17356", "8629", "21452", "12…
$ ...11                   <chr> NA, NA, " DPK ", "0", "520", "381", "378", "13…
$ ...12                   <chr> NA, "Avg Price Oct 2025", "PMS", "912.58064516…
$ ...13                   <chr> NA, NA, "AGO", "1087.4193548387098", "1087.096…
$ ...14                   <chr> NA, NA, "DPK", "1450", "1450", "1450", "1450",…
$ ...15                   <chr> NA, "Avg Price Nov 2025", "PMS", "920", "940",…
$ ...16                   <chr> NA, NA, "AGO", "1074", "1067.3333333333333", "…
$ ...17                   <chr> NA, NA, "DPK", "1450", "1450", "1450", "1450",…
$ ...18                   <chr> NA, "Avg Price Dec 2025", "PMS", "892.25806451…
$ ...19                   <chr> NA, NA, "AGO", "1050.9677419354839", "1052.580…
$ ...20                   <chr> NA, NA, "DPK", "1450", "1450", "1450", "1450",…
$ ...21                   <chr> NA, "Total Sales (N) Oct 2025", "PMS", "156019…
$ ...22                   <chr> NA, NA, "AGO", "16956130", "10263280.64516129"…
$ ...23                   <chr> NA, NA, "DPK", "0", "162400", "580000", "44515…
$ ...24                   <chr> NA, "Total Sales (N) Nov 2025", "PMS", "163800…
$ ...25                   <chr> NA, NA, "AGO", "16586856", "5188307.333333333"…
$ ...26                   <chr> NA, NA, "DPK", "0", "754000", "475600", "37990…
$ ...27                   <chr> NA, "Total Sales (N) Dec 2025", "PMS", "166819…
$ ...28                   <chr> NA, NA, "AGO", "18240596.129032258", "9082718.…
$ ...29                   <chr> NA, NA, "DPK", "0", "754000", "552450", "54810…

4.2 Variable Structure and Summary Statistics

Code
str(data)
tibble [197 × 29] (S3: tbl_df/tbl/data.frame)
 $ ...1                 : chr [1:197] NA "S/N" NA "1" ...
 $ 2025 RETAIL SALE DATA: chr [1:197] NA "Station" NA "Rainoil Uyo - Abak Road " ...
 $ ...3                 : chr [1:197] NA "Total Sales (Ltrs) Oct 2025" " PMS " "170965" ...
 $ ...4                 : chr [1:197] NA NA " AGO " "15593" ...
 $ ...5                 : chr [1:197] NA NA " DPK " "0" ...
 $ ...6                 : chr [1:197] NA "Total Sales (Ltrs) Nov 2025" " PMS " "178044" ...
 $ ...7                 : chr [1:197] NA NA " AGO " "15444" ...
 $ ...8                 : chr [1:197] NA NA " DPK " "0" ...
 $ ...9                 : chr [1:197] NA "Total Sales (Ltrs) Dec 2025" " PMS " "186963" ...
 $ ...10                : chr [1:197] NA NA " ANO " "17356" ...
 $ ...11                : chr [1:197] NA NA " DPK " "0" ...
 $ ...12                : chr [1:197] NA "Avg Price Oct 2025" "PMS" "912.58064516129036" ...
 $ ...13                : chr [1:197] NA NA "AGO" "1087.4193548387098" ...
 $ ...14                : chr [1:197] NA NA "DPK" "1450" ...
 $ ...15                : chr [1:197] NA "Avg Price Nov 2025" "PMS" "920" ...
 $ ...16                : chr [1:197] NA NA "AGO" "1074" ...
 $ ...17                : chr [1:197] NA NA "DPK" "1450" ...
 $ ...18                : chr [1:197] NA "Avg Price Dec 2025" "PMS" "892.25806451612902" ...
 $ ...19                : chr [1:197] NA NA "AGO" "1050.9677419354839" ...
 $ ...20                : chr [1:197] NA NA "DPK" "1450" ...
 $ ...21                : chr [1:197] NA "Total Sales (N) Oct 2025" "PMS" "156019350" ...
 $ ...22                : chr [1:197] NA NA "AGO" "16956130" ...
 $ ...23                : chr [1:197] NA NA "DPK" "0" ...
 $ ...24                : chr [1:197] NA "Total Sales (N) Nov 2025" "PMS" "163800480" ...
 $ ...25                : chr [1:197] NA NA "AGO" "16586856" ...
 $ ...26                : chr [1:197] NA NA "DPK" "0" ...
 $ ...27                : chr [1:197] NA "Total Sales (N) Dec 2025" "PMS" "166819244.51612902" ...
 $ ...28                : chr [1:197] NA NA "AGO" "18240596.129032258" ...
 $ ...29                : chr [1:197] NA NA "DPK" "0" ...
Code
summary(data)
     ...1           2025 RETAIL SALE DATA     ...3               ...4          
 Length:197         Length:197            Length:197         Length:197        
 Class :character   Class :character      Class :character   Class :character  
 Mode  :character   Mode  :character      Mode  :character   Mode  :character  
     ...5               ...6               ...7               ...8          
 Length:197         Length:197         Length:197         Length:197        
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
     ...9              ...10              ...11              ...12          
 Length:197         Length:197         Length:197         Length:197        
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
    ...13              ...14              ...15              ...16          
 Length:197         Length:197         Length:197         Length:197        
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
    ...17              ...18              ...19              ...20          
 Length:197         Length:197         Length:197         Length:197        
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
    ...21              ...22              ...23              ...24          
 Length:197         Length:197         Length:197         Length:197        
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
    ...25              ...26              ...27              ...28          
 Length:197         Length:197         Length:197         Length:197        
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
    ...29          
 Length:197        
 Class :character  
 Mode  :character  

4.3 Missing Values and Data Quality Checks

Code
colSums(is.na(data))
                 ...1 2025 RETAIL SALE DATA                  ...3 
                    3                     3                     1 
                 ...4                  ...5                  ...6 
                    2                     2                     1 
                 ...7                  ...8                  ...9 
                    2                     2                     1 
                ...10                 ...11                 ...12 
                    2                     2                     2 
                ...13                 ...14                 ...15 
                    3                     3                     2 
                ...16                 ...17                 ...18 
                    3                     3                     2 
                ...19                 ...20                 ...21 
                    3                     3                     1 
                ...22                 ...23                 ...24 
                    2                     2                     1 
                ...25                 ...26                 ...27 
                    2                     2                     1 
                ...28                 ...29 
                    2                     2 
Code
data %>%
  filter(if_any(everything(), ~ . == 0))
# A tibble: 161 × 29
   ...1  `2025 RETAIL SALE DATA` ...3  ...4  ...5  ...6  ...7  ...8  ...9  ...10
   <chr> <chr>                   <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
 1 1     Rainoil Uyo - Abak Roa… 1709… 15593 0     1780… 15444 0     1869… 17356
 2 6     Rainoil Abraka          1730… 18918 0     1773… 16249 0     1730… 16262
 3 7     Rainoil FCT - Gwarimpa  5487… 34590 0     4377… 28823 0     3838… 34226
 4 8     Rainoil FCT - Karmo     3722… 10873 0     2619… 10458 0     2288… 9320 
 5 9     Rainoil Ado-Ekiti - Ik… 81004 5179  0     51856 5099  0     54346 4815 
 6 10    Rainoil Agbarho - Ohre… 1723… 21932 0     1919… 22958 0     1963… 27421
 7 11    Rainoil Agbarho - Ewhe… 2030… 7464  0     2121… 7011  0     2284… 11296
 8 13    Rainoil Agbor - Abraka… 2124… 50354 0     1906… 48803 0     2055… 45505
 9 14    Rainoil Agbor - Alero … 2326… 21837 0     2212… 24642 0     2542… 27949
10 16    Rainoil Anwai           1651… 17531 0     1865… 17574 0     1637… 27723
# ℹ 151 more rows
# ℹ 19 more variables: ...11 <chr>, ...12 <chr>, ...13 <chr>, ...14 <chr>,
#   ...15 <chr>, ...16 <chr>, ...17 <chr>, ...18 <chr>, ...19 <chr>,
#   ...20 <chr>, ...21 <chr>, ...22 <chr>, ...23 <chr>, ...24 <chr>,
#   ...25 <chr>, ...26 <chr>, ...27 <chr>, ...28 <chr>, ...29 <chr>

4.4 Numeric Distribution Diagnostics

Code
data %>%
  select(where(is.numeric)) %>%
  summarise(across(everything(),
                   list(mean = mean,
                        sd = sd,
                        min = min,
                        max = max),
                   na.rm = TRUE))
Warning: There was 1 warning in `summarise()`.
ℹ In argument: `across(...)`.
Caused by warning:
! The `...` argument of `across()` is deprecated as of dplyr 1.1.0.
Supply arguments directly to `.fns` through an anonymous function instead.

  # Previously
  across(a:b, mean, na.rm = TRUE)

  # Now
  across(a:b, \(x) mean(x, na.rm = TRUE))
# A tibble: 1 × 0

5. Exploratory Data Analysis

Theory Recap

Exploratory Data Analysis (EDA) is the systematic examination of a dataset before formal modelling. It encompasses summary statistics, missing value analysis, outlier detection, and distributional assessment (Adi, 2026, Ch. 4). The objective is not to confirm hypotheses but to surface unexpected structure — skewness, outliers, data quality issues — that would otherwise invalidate downstream analyses. Anscombe’s Quartet (1973) famously demonstrated that four datasets with identical summary statistics can have radically different distributions; EDA guards against this trap.

Business Justification

In treasury, EDA is the analytical equivalent of the morning bank position review — a diagnostic sweep before commitments are made. A revenue distribution that is more concentrated than it appears in averages implies a fundamentally different liquidity risk profile than a uniform one. EDA is the only technique that surfaces this before it becomes a crisis.

Code
library(tidyverse)
library(scales)
library(corrplot)
library(moments)
library(nortest)
library(lmtest)
library(car)
library(broom)
library(knitr)
library(kableExtra)
library(patchwork)
library(ggrepel)
library(readxl)

# ── Region classifier ─────────────────────────────────────────────────────────
classify_region <- function(name) {
  n <- tolower(name)
  if (grepl("fct|abuja|apo |lugbe|kado|karmo|gwarimpa|dutse|nyanya|keffi|maraba", n)) return("FCT & North-Central")
  if (grepl("kaduna|kano|gombe|jos|bukuru|lafia|yola|jalingo|zaki|gboko|otukpo|maiduguri|chikun|barnawa|kachia|danbazo|unguwan|minna|ilorin|makurdi|north bank|kujama|lokoja", n)) return("North")
  if (grepl("aba |abakaliki|awkuzu|awka|nnewi|onitsha|nsukka|umuahia|nkpor|awada|umunze|owerri|ogidi|enugu|abakpa|nkpokiti", n)) return("South-East")
  if (grepl("ipaja|ayobo|ojodu|okota|oshodi|oniru|lekki|oke aro|sagamu|ibadan|ogbomosho|ibafo|abijo", n)) return("South-West")
  if (grepl("calabar|ikom|odukpani|goldie", n)) return("South-South (Cross River)")
  if (grepl("uyo|eket|abak|idoro", n)) return("South-South (Akwa Ibom)")
  if (grepl("portharcourt|choba|borokiri|isiokpo|igwuruta|station road", n)) return("South-South (Rivers)")
  return("South-South (Delta/Edo/Bayelsa)")
}

# ── Read raw Excel — single file dependency ───────────────────────────────────
raw <- read_excel("Q4_SALES_DATA.xlsx", col_names = FALSE)

data_rows <- raw[3:nrow(raw), ]
colnames(data_rows) <- c(
  "sn","station",
  "pms_vol_oct","ago_vol_oct","dpk_vol_oct",
  "pms_vol_nov","ago_vol_nov","dpk_vol_nov",
  "pms_vol_dec","ago_vol_dec","dpk_vol_dec",
  "pms_price_oct","ago_price_oct","dpk_price_oct",
  "pms_price_nov","ago_price_nov","dpk_price_nov",
  "pms_price_dec","ago_price_dec","dpk_price_dec",
  "pms_rev_oct","ago_rev_oct","dpk_rev_oct",
  "pms_rev_nov","ago_rev_nov","dpk_rev_nov",
  "pms_rev_dec","ago_rev_dec","dpk_rev_dec"
)

data_rows <- data_rows |>
  filter(!is.na(station), !is.na(sn)) |>
  mutate(station = str_trim(str_remove_all(as.character(station), "\u00a0"))) |>
  mutate(across(-c(sn, station), ~ suppressWarnings(as.numeric(.)) |> replace_na(0)))

make_month <- function(d, sfx, label) {
  d |> transmute(
    station,
    month         = label,
    pms_vol       = .data[[paste0("pms_vol_",   sfx)]],
    ago_vol       = .data[[paste0("ago_vol_",   sfx)]],
    dpk_vol       = .data[[paste0("dpk_vol_",   sfx)]],
    avg_pms_price = .data[[paste0("pms_price_", sfx)]],
    avg_ago_price = .data[[paste0("ago_price_", sfx)]],
    avg_dpk_price = .data[[paste0("dpk_price_", sfx)]],
    pms_rev       = .data[[paste0("pms_rev_",   sfx)]],
    ago_rev       = .data[[paste0("ago_rev_",   sfx)]],
    dpk_rev       = .data[[paste0("dpk_rev_",   sfx)]]
  )
}

df <- bind_rows(
    make_month(data_rows, "oct", "Oct"),
    make_month(data_rows, "nov", "Nov"),
    make_month(data_rows, "dec", "Dec")
  ) |>
  mutate(
    total_vol = pms_vol + ago_vol + dpk_vol,
    total_rev = pms_rev + ago_rev + dpk_rev,
    pms_share = if_else(total_vol > 0, pms_vol / total_vol, 0),
    month     = factor(month, levels = c("Oct","Nov","Dec")),
    month_num = as.integer(month),
    region    = map_chr(station, classify_region)
  )

# ── Station-level summary ─────────────────────────────────────────────────────
st_summary <- df |>
  group_by(station, region) |>
  summarise(
    total_rev_q4 = sum(total_rev),
    total_vol_q4 = sum(total_vol),
    total_pms    = sum(pms_vol),
    total_ago    = sum(ago_vol),
    any_zero     = any(total_vol == 0),
    .groups = "drop"
  ) |>
  filter(total_vol_q4 > 0) |>
  mutate(
    rev_per_litre = total_rev_q4 / total_vol_q4,
    ago_share     = total_ago / total_vol_q4,
    rev_share     = total_rev_q4 / sum(total_rev_q4),
    tier = case_when(
      total_rev_q4 < quantile(total_rev_q4, 0.33) ~ "Low",
      total_rev_q4 < quantile(total_rev_q4, 0.66) ~ "Mid",
      TRUE ~ "High"
    ),
    tier = factor(tier, levels = c("Low","Mid","High"))
  )

hhi        <- sum(st_summary$rev_share^2) * 10000
top2       <- st_summary |> slice_max(total_rev_q4, n = 2)
top2_share <- sum(top2$rev_share)
top2_daily <- sum(top2$total_rev_q4) / 91

cat("═══ Network Diagnostics ════════════════════════════\n")
═══ Network Diagnostics ════════════════════════════
Code
cat("Total observations:        ", nrow(df), "\n")
Total observations:         579 
Code
cat("Stations:                  ", n_distinct(df$station), "\n")
Stations:                   193 
Code
cat("Total Q4 Revenue:           ₦", formatC(sum(df$total_rev)/1e9, format="f", digits=2), "billion\n")
Total Q4 Revenue:           ₦ 122.60 billion
Code
cat("Revenue Skewness:          ", round(skewness(st_summary$total_rev_q4), 3), "\n")
Revenue Skewness:           4.37 
Code
cat("HHI (Concentration Index): ", round(hhi, 1), "\n")
HHI (Concentration Index):  80.6 
Code
cat("Top-2 revenue share:       ", scales::percent(top2_share, accuracy=0.1), "\n")
Top-2 revenue share:        6.7% 
Code
cat("Top-2 daily revenue:        ₦", formatC(top2_daily/1e6, format="f", digits=1), "M/day\n")
Top-2 daily revenue:        ₦ 90.6 M/day
Code
cat("Zero-volume station-months:", sum(df$total_vol == 0), "\n")
Zero-volume station-months: 8 
Code
df |>
  select(`PMS Volume (L)` = pms_vol, `AGO Volume (L)` = ago_vol,
         `DPK Volume (L)` = dpk_vol, `PMS Price (₦/L)` = avg_pms_price,
         `AGO Price (₦/L)` = avg_ago_price, `Total Volume (L)` = total_vol,
         `Total Revenue (₦)` = total_rev) |>
  summary() |>
  kable(digits = 0, format.args = list(big.mark = ",")) |>
  kable_styling(bootstrap_options = c("striped","hover","condensed"), full_width = FALSE)
Summary Statistics — Key Numeric Variables (All 579 Station-Month Observations)
PMS Volume (L) AGO Volume (L) DPK Volume (L) PMS Price (₦/L) AGO Price (₦/L) Total Volume (L) Total Revenue (₦)
Min. : 0 Min. : 0 Min. : 0.0 Min. : 0.0 Min. : 0 Min. : 0 Min. :0.000e+00
1st Qu.: 111994 1st Qu.: 10457 1st Qu.: 0.0 1st Qu.:901.5 1st Qu.:1047 1st Qu.: 129996 1st Qu.:1.221e+08
Median : 174329 Median : 18203 Median : 0.0 Median :914.4 Median :1063 Median : 201775 Median :1.889e+08
Mean : 197616 Mean : 29067 Mean : 244.8 Mean :906.0 Mean :1054 Mean : 226928 Mean :2.117e+08
3rd Qu.: 244653 3rd Qu.: 33480 3rd Qu.: 0.0 3rd Qu.:921.5 3rd Qu.:1087 3rd Qu.: 277672 3rd Qu.:2.580e+08
Max. :1123599 Max. :620758 Max. :9597.0 Max. :968.8 Max. :1104 Max. :1718583 Max. :1.615e+09
Code
df |>
  filter(total_vol == 0) |>
  select(Station = station, Month = month,
         `PMS Vol (L)` = pms_vol, `AGO Vol (L)` = ago_vol,
         `DPK Vol (L)` = dpk_vol, `Revenue (₦)` = total_rev) |>
  arrange(Station, Month) |>
  kable(format.args = list(big.mark = ",")) |>
  kable_styling(bootstrap_options = c("striped","hover","condensed"), full_width = FALSE) |>
  row_spec(0, bold = TRUE)
Data Quality Issue 1 — Zero-Volume Station-Month Observations
Station Month PMS Vol (L) AGO Vol (L) DPK Vol (L) Revenue (₦)
Rainoil Ilorin - Sobi Road Oct 0 0 0 0
Rainoil Ilorin - Sobi Road Nov 0 0 0 0
Rainoil Ilorin - Sobi Road Dec 0 0 0 0
Rainoil Portharcourt - Igwuruta Oct 0 0 0 0
Rainoil Portharcourt - Igwuruta Nov 0 0 0 0
Rainoil Ughelli Post Office Oct 0 0 0 0
Rainoil Ughelli Post Office Nov 0 0 0 0
Rainoil Uselu Shell Oct 0 0 0 0

Handling: Zero-volume station-months are retained in network totals and concentration metrics but excluded from price-based and regression calculations to prevent division-by-zero and leverage distortion. Rainoil Ilorin – Sobi Road (zero across all three months) is flagged as a full-quarter closure pending operational verification.

Code
tibble(
  Metric = c(
    "Network HHI (×10,000 scale)",
    "Top 2 stations — Q4 revenue share",
    "Top 5 stations — Q4 revenue share",
    "Top 10 stations — Q4 revenue share",
    "Daily revenue at risk (top-2 disruption)",
    "Revenue skewness (station level)",
    "Mean station Q4 revenue",
    "Median station Q4 revenue"
  ),
  Value = c(
    formatC(hhi, format="f", digits=1),
    scales::percent(top2_share, accuracy=0.01),
    scales::percent(st_summary |> slice_max(total_rev_q4,n=5) |> summarise(s=sum(rev_share)) |> pull(s), accuracy=0.01),
    scales::percent(st_summary |> slice_max(total_rev_q4,n=10) |> summarise(s=sum(rev_share)) |> pull(s), accuracy=0.01),
    paste0("₦", formatC(top2_daily/1e6, format="f", digits=1), "M/day"),
    round(skewness(st_summary$total_rev_q4), 2),
    paste0("₦", formatC(mean(st_summary$total_rev_q4)/1e6, format="f", digits=1), "M"),
    paste0("₦", formatC(median(st_summary$total_rev_q4)/1e6, format="f", digits=1), "M")
  ),
  `Treasury Interpretation` = c(
    "Unconcentrated by competition-law standards — but tail risk for treasury is material",
    "A single stock-out at either site = ₦90.6M/day shortfall the network cannot replace",
    "5 stations drive 1 in 10 naira of all network cash inflow",
    "Monitoring 10 stations gives oversight of 16% of total inflows — high-leverage target",
    "Minimum contingency buffer treasury must hold against simultaneous top-2 outage",
    "Severe right-tail — mean-based forecasting systematically understates tail exposure",
    "Mean >> Median confirms right-skew; average is not representative of typical station",
    "Typical station generates ₦634M in Q4 — far below the network mean of ₦981M"
  )
) |>
  kable() |>
  kable_styling(bootstrap_options = c("striped","hover"), full_width = TRUE) |>
  column_spec(3, italic = TRUE) |>
  row_spec(c(2,5), background = "#fff3cd") |>
  row_spec(0, bold = TRUE)
Data Quality Issue 2 — Revenue Concentration (HHI Analysis)
Metric Value Treasury Interpretation
Network HHI (×10,000 scale) 80.6 Unconcentrated by competition-law standards — but tail risk for treasury is material
Top 2 stations — Q4 revenue share 6.73% A single stock-out at either site = ₦90.6M/day shortfall the network cannot replace
Top 5 stations — Q4 revenue share 10.73% 5 stations drive 1 in 10 naira of all network cash inflow
Top 10 stations — Q4 revenue share 16.37% Monitoring 10 stations gives oversight of 16% of total inflows — high-leverage target
Daily revenue at risk (top-2 disruption) ₦90.6M/day Minimum contingency buffer treasury must hold against simultaneous top-2 outage
Revenue skewness (station level) 4.37 Severe right-tail — mean-based forecasting systematically understates tail exposure
Mean station Q4 revenue ₦638.5M Mean >> Median confirms right-skew; average is not representative of typical station
Median station Q4 revenue ₦572.7M Typical station generates ₦634M in Q4 — far below the network mean of ₦981M

Guiding Question — What does the distribution of total revenue tell us about the business process that generated it?

The severe right-skew (skewness = 4.36) and the mean-median gap (₦981M vs ₦634M at station level) are not a statistical curiosity — they are a direct signature of how the Nigerian downstream fuel retail market is structured. Station revenue is the product of volume and price. Price barely varies (less than ₦70/litre across the network). So revenue differences are almost entirely volume differences, and volume is determined by location, catchment size, and road traffic flow — factors that follow a power-law distribution in any urban hierarchy. Asaba Summit Junction and Ibafo are not merely lucky stations; they sit at the intersection of major highway corridors and dense commercial activity. The treasury implication is that the network’s cash inflow distribution will always be right-skewed as long as the station footprint mirrors Nigeria’s uneven population and commerce distribution. Concentration risk is structural, not temporary.

Code
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import matplotlib.patches as mpatches
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# ── Region classifier ─────────────────────────────────────────────────────────
def classify_region(name):
    n = str(name).lower()
    if any(x in n for x in ['fct','abuja','apo ','lugbe','kado','karmo','gwarimpa','dutse','nyanya','keffi','maraba']):
        return 'FCT & North-Central'
    if any(x in n for x in ['kaduna','kano','gombe','jos','bukuru','lafia','yola','jalingo','zaki','gboko',
                              'otukpo','maiduguri','chikun','barnawa','kachia','danbazo','unguwan','minna',
                              'ilorin','makurdi','north bank','kujama','lokoja']):
        return 'North'
    if any(x in n for x in ['aba ','abakaliki','awkuzu','awka','nnewi','onitsha','nsukka','umuahia','nkpor',
                              'awada','umunze','owerri','ogidi','enugu','abakpa','nkpokiti']):
        return 'South-East'
    if any(x in n for x in ['ipaja','ayobo','ojodu','okota','oshodi','oniru','lekki','oke aro',
                              'sagamu','ibadan','ogbomosho','ibafo','abijo']):
        return 'South-West'
    if any(x in n for x in ['calabar','ikom','odukpani','goldie']):
        return 'South-South (Cross River)'
    if any(x in n for x in ['uyo','eket','abak','idoro']):
        return 'South-South (Akwa Ibom)'
    if any(x in n for x in ['portharcourt','choba','borokiri','isiokpo','igwuruta','station road']):
        return 'South-South (Rivers)'
    return 'South-South (Delta/Edo/Bayelsa)'

# ── Read raw Excel ────────────────────────────────────────────────────────────
raw = pd.read_excel("Q4_SALES_DATA.xlsx", header=None)
data_rows = raw.iloc[2:].copy()
data_rows.columns = [
    'sn','station',
    'pms_vol_oct','ago_vol_oct','dpk_vol_oct',
    'pms_vol_nov','ago_vol_nov','dpk_vol_nov',
    'pms_vol_dec','ago_vol_dec','dpk_vol_dec',
    'pms_price_oct','ago_price_oct','dpk_price_oct',
    'pms_price_nov','ago_price_nov','dpk_price_nov',
    'pms_price_dec','ago_price_dec','dpk_price_dec',
    'pms_rev_oct','ago_rev_oct','dpk_rev_oct',
    'pms_rev_nov','ago_rev_nov','dpk_rev_nov',
    'pms_rev_dec','ago_rev_dec','dpk_rev_dec'
]
data_rows = data_rows[data_rows['station'].notna() & data_rows['sn'].notna()].copy()
data_rows['station'] = data_rows['station'].astype(str).str.replace('\xa0','').str.strip()
num_cols = [c for c in data_rows.columns if c not in ['sn','station']]
data_rows[num_cols] = data_rows[num_cols].apply(pd.to_numeric, errors='coerce').fillna(0)

records = []
for sfx, label in [('oct','Oct'),('nov','Nov'),('dec','Dec')]:
    tmp = data_rows[['station']].copy()
    tmp['month']         = label
    tmp['pms_vol']       = data_rows[f'pms_vol_{sfx}'].values
    tmp['ago_vol']       = data_rows[f'ago_vol_{sfx}'].values
    tmp['dpk_vol']       = data_rows[f'dpk_vol_{sfx}'].values
    tmp['avg_pms_price'] = data_rows[f'pms_price_{sfx}'].values
    tmp['avg_ago_price'] = data_rows[f'ago_price_{sfx}'].values
    tmp['avg_dpk_price'] = data_rows[f'dpk_price_{sfx}'].values
    tmp['pms_rev']       = data_rows[f'pms_rev_{sfx}'].values
    tmp['ago_rev']       = data_rows[f'ago_rev_{sfx}'].values
    tmp['dpk_rev']       = data_rows[f'dpk_rev_{sfx}'].values
    records.append(tmp)

df = pd.concat(records, ignore_index=True)
df['total_vol']  = df['pms_vol'] + df['ago_vol'] + df['dpk_vol']
df['total_rev']  = df['pms_rev'] + df['ago_rev'] + df['dpk_rev']
df['pms_share']  = np.where(df['total_vol']>0, df['pms_vol']/df['total_vol'], 0)
df['month']      = pd.Categorical(df['month'], categories=['Oct','Nov','Dec'], ordered=True)
df['month_num']  = df['month'].cat.codes + 1
df['region']     = df['station'].apply(classify_region)

st = df.groupby(['station','region']).agg(
    total_rev_q4=('total_rev','sum'),
    total_vol_q4=('total_vol','sum'),
    total_pms=('pms_vol','sum'),
    total_ago=('ago_vol','sum')
).reset_index()
st = st[st['total_vol_q4'] > 0].copy()
st['rev_per_litre'] = st['total_rev_q4'] / st['total_vol_q4']
st['ago_share']     = st['total_ago']    / st['total_vol_q4']
st['rev_share']     = st['total_rev_q4'] / st['total_rev_q4'].sum()
q33 = st['total_rev_q4'].quantile(0.33)
q66 = st['total_rev_q4'].quantile(0.66)
st['tier'] = pd.cut(st['total_rev_q4'], bins=[-np.inf,q33,q66,np.inf], labels=['Low','Mid','High'])

hhi      = (st['rev_share']**2).sum() * 10000
top2     = st.nlargest(2,'total_rev_q4')
top2_day = top2['total_rev_q4'].sum() / 91

print(f"Observations: {len(df)}")
Observations: 579
Code
print(f"Stations:     {df['station'].nunique()}")
Stations:     193
Code
print(f"HHI:          {hhi:.1f}")
HHI:          80.6
Code
print(f"Top-2 share:  {top2['rev_share'].sum()*100:.2f}%")
Top-2 share:  6.73%
Code
print(f"Daily risk:   ₦{top2_day/1e6:.1f}M/day")
Daily risk:   ₦90.6M/day
Code
print(f"Skewness:     {st['total_rev_q4'].skew():.3f}")
Skewness:     4.404
Code
print(f"Mean:         ₦{st['total_rev_q4'].mean()/1e6:.1f}M | Median: ₦{st['total_rev_q4'].median()/1e6:.1f}M")
Mean:         ₦638.5M | Median: ₦572.7M

6. Data Visualisation

Theory Recap

Data visualisation translates numerical patterns into perceptual signals. The grammar of graphics (Wickham, 2016) provides a systematic framework: data is mapped to aesthetic properties (position, colour, size, shape) through geometric objects, with coordinate systems and scales completing the mapping. Chart selection should be governed by the cognitive task the viewer must perform — comparing magnitudes, tracking trends, assessing distributions, or identifying outliers (Adi, 2026, Ch. 5).

Business Justification

A treasury dashboard must communicate funding risk, not just operational performance. The five plots below were selected because each answers a question a GMD or CFO would ask before approving a funding decision: Which stations generate the most value per litre? Which regions are accelerating? How concentrated is our inflow risk? Is AGO demand genuinely seasonal? Which stations are underperforming their potential?

Code
theme_rainoil <- theme_minimal(base_size = 11) +
  theme(
    plot.title       = element_text(face="bold", size=11, colour="#1a1a1a"),
    plot.subtitle    = element_text(size=8.5, colour="#555555", margin=margin(b=8)),
    panel.grid.minor = element_blank(),
    panel.grid.major = element_line(colour="grey92"),
    legend.position  = "bottom",
    plot.background  = element_rect(fill="white", colour=NA),
    axis.text        = element_text(colour="#444444")
  )

BLUE  <- "#1a6496"; AMBER <- "#e87722"; RED <- "#c0392b"; GREEN <- "#27ae60"

# ── Plot 1: Efficiency Frontier ───────────────────────────────────────────────
label_st <- st_summary |>
  filter(rev_per_litre > quantile(rev_per_litre, 0.88, na.rm=TRUE) |
         rev_per_litre < quantile(rev_per_litre, 0.10, na.rm=TRUE) |
         total_rev_q4  > quantile(total_rev_q4,  0.92, na.rm=TRUE)) |>
  mutate(short_name = str_remove(station, "^(Rainoil|Fynefield) ") |> str_trunc(22))

p1 <- ggplot(st_summary, aes(x=total_vol_q4/1e3, y=rev_per_litre, colour=tier, size=total_rev_q4/1e6)) +
  geom_point(alpha=0.6) +
  geom_hline(yintercept=mean(st_summary$rev_per_litre, na.rm=TRUE),
             linetype="dashed", colour="grey50", linewidth=0.6) +
  geom_text_repel(data=label_st, aes(label=short_name), size=2.7, colour="#1a1a1a",
                  max.overlaps=25, segment.colour="grey70", box.padding=0.35, force=2.5) +
  scale_colour_manual(values=c("Low"="#aec6cf","Mid"=AMBER,"High"=BLUE)) +
  scale_size_continuous(range=c(1,9), guide="none") +
  scale_x_continuous(labels=comma) + scale_y_continuous(labels=comma) +
  labs(title="Efficiency Frontier: Which Stations Extract the Most Revenue per Litre?",
       subtitle="Bubble size = Q4 total revenue. Dashed line = network average ₦/litre.\nNorth/NE stations earn highest ₦/litre; South-West stations face competitive price pressure but compensate with volume.",
       x="Q4 Total Volume (000 litres)", y="Revenue per Litre (₦)", colour="Tier") +
  theme_rainoil

# ── Plot 2: Regional Revenue Momentum ────────────────────────────────────────
reg_monthly <- df |> group_by(region, month) |>
  summarise(rev_bn = sum(total_rev)/1e9, .groups="drop")

p2 <- ggplot(reg_monthly, aes(x=month, y=rev_bn, colour=region, group=region)) +
  geom_line(linewidth=1.4) + geom_point(size=3.5) +
  geom_text_repel(data=reg_monthly |> filter(month=="Dec"),
                  aes(label=paste0(region,"\n₦",round(rev_bn,1),"B")),
                  size=2.7, direction="y", nudge_x=0.25,
                  segment.colour="grey70", colour="#1a1a1a", max.overlaps=20) +
  scale_colour_brewer(palette="Set2") +
  scale_x_discrete(expand=expansion(add=c(0.2,1.3))) +
  labs(title="Regional Revenue Momentum: South-South Accelerates; FCT and North Contract",
       subtitle="South-South (Delta/Edo/Bayelsa) accounts for 43% of Q4 revenue and is growing.\nFCT and North show consecutive MoM declines — reduce funding pre-positioning in those corridors.",
       x="Month", y="Revenue (₦ billion)", colour="Region") +
  theme_rainoil + theme(legend.position="none")

# ── Plot 3: Liquidity Concentration ──────────────────────────────────────────
net_total <- sum(st_summary$total_rev_q4)
others_rev <- net_total - sum(st_summary |> slice_max(total_rev_q4,n=15) |> pull(total_rev_q4))
top15 <- st_summary |> slice_max(total_rev_q4, n=15) |>
  mutate(short=str_remove(station,"^(Rainoil|Fynefield) ") |> str_trunc(22),
         share_pct=total_rev_q4/net_total*100) |>
  bind_rows(tibble(station="All Others (178 stations)", short="All Others (178 stations)",
                   total_rev_q4=others_rev, share_pct=others_rev/net_total*100,
                   tier=factor("Low",levels=c("Low","Mid","High")))) |>
  arrange(total_rev_q4)

p3 <- ggplot(top15, aes(x=reorder(short,total_rev_q4), y=total_rev_q4/1e6, fill=tier)) +
  geom_col(width=0.78, alpha=0.9) +
  geom_text(aes(label=paste0("₦",round(total_rev_q4/1e9,2),"B (",round(share_pct,1),"%)")),
            hjust=-0.04, size=2.7, colour="#1a1a1a") +
  coord_flip() +
  scale_fill_manual(values=c("Low"="#aec6cf","Mid"=AMBER,"High"=BLUE)) +
  scale_y_continuous(labels=comma, expand=expansion(mult=c(0,0.38))) +
  labs(title="Liquidity Concentration: Top 15 Stations vs the Entire Rest of the Network",
       subtitle=paste0("HHI = ",round(hhi,1),". The 178 'other' stations combined generate less than the top 3 individually.\nTreasury must size contingency buffers against top-station disruption — not average performance."),
       x=NULL, y="Q4 Revenue (₦ million)", fill="Tier") +
  theme_rainoil + theme(legend.position="none")

# ── Plot 4: AGO Growth Distribution ──────────────────────────────────────────
ago_g <- df |> filter(pms_vol > 0) |>
  select(station, month, ago_vol) |>
  pivot_wider(names_from=month, values_from=ago_vol) |>
  filter(!is.na(Oct) & Oct > 0 & !is.na(Dec)) |>
  mutate(growth_pct=(Dec-Oct)/Oct*100, direction=if_else(growth_pct>=0,"Growth","Decline")) |>
  filter(is.finite(growth_pct))
mean_g <- mean(ago_g$growth_pct, na.rm=TRUE)

p4 <- ggplot(ago_g, aes(x=growth_pct, fill=direction)) +
  geom_histogram(bins=35, alpha=0.85, colour="white") +
  geom_vline(xintercept=0, colour="black", linewidth=0.9) +
  geom_vline(xintercept=mean_g, colour=RED, linetype="dashed", linewidth=1.1) +
  annotate("text", x=mean_g+6, y=18, label=paste0("Mean = +",round(mean_g,1),"%"),
           colour=RED, size=3.2, fontface="bold") +
  annotate("text", x=65, y=15, size=2.9, colour="#333333",
           label="Drivers: power supply\ndeficits, industrial\nactivity, logistics &\nharvest-season demand") +
  scale_fill_manual(values=c("Growth"=BLUE,"Decline"=AMBER)) +
  labs(title="AGO Volume Growth Oct\u2192Dec: Diesel Demand Structurally Positive Across Q4",
       subtitle="Majority of stations recorded positive AGO volume growth into December — consistent with\nseasonal drivers: worsening grid reliability, year-end industrial cycles, harvest-season logistics.",
       x="AGO Volume Growth (%)", y="Number of Stations", fill=NULL) +
  theme_rainoil

# ── Plot 5: Residual Leaderboard (model fitted inline) ────────────────────────
reg_tmp <- df |> filter(pms_vol > 0, total_rev > 0) |>
  mutate(log_rev=log(total_rev), pms_vol_k=pms_vol/1000, ago_vol_k=ago_vol/1000)
mod_tmp  <- lm(log_rev ~ pms_vol_k + ago_vol_k + avg_pms_price + pms_share + month_num, data=reg_tmp)
reg_tmp$residual <- residuals(mod_tmp)

st_resid_viz <- reg_tmp |> group_by(station) |>
  summarise(avg_residual=mean(residual), .groups="drop") |>
  mutate(direction=if_else(avg_residual>=0,"Over-performer","Under-performer"),
         short=str_remove(station,"^(Rainoil|Fynefield|Rainoi|Raonoil) ") |> str_trunc(28))
top_bot <- bind_rows(slice_max(st_resid_viz,avg_residual,n=10),
                     slice_min(st_resid_viz,avg_residual,n=10)) |> arrange(avg_residual)

p5 <- ggplot(top_bot, aes(x=avg_residual, y=reorder(short,avg_residual), fill=direction)) +
  geom_col(width=0.75, alpha=0.9) +
  geom_vline(xintercept=0, colour="black", linewidth=0.9) +
  scale_fill_manual(values=c("Over-performer"=GREEN,"Under-performer"=RED)) +
  labs(title="Residual Leaderboard: Stations Punching Above or Below Their Operational Weight",
       subtitle="Average regression residual across Oct\u2013Dec. Green = over-performing vs operational profile.\nRed = structural under-performers — investigate for supply issues or demand-side constraints.",
       x="Average Regression Residual (log-revenue units)", y=NULL, fill=NULL) +
  theme_rainoil + theme(axis.text.y=element_text(size=8.5))

(p1 / p2 / p3 / p4 / p5) +
  plot_annotation(
    title    = "Rainoil Q4 2025 — Treasury Liquidity Intelligence Report",
    subtitle = "Finance Manager Analysis  |  193 Stations  |  October\u2013December 2025  |  ₦122.6 Billion Network Revenue",
    theme    = theme(plot.title=element_text(size=15,face="bold"),
                     plot.subtitle=element_text(size=10,colour="grey40"))
  )

Five-Panel Treasury Intelligence Dashboard — Rainoil Q4 2025

Visualisation Narrative — why each chart type was chosen:

  • Plot 1 (Scatter + bubble): A scatter plot with bubble size encodes three dimensions simultaneously (volume, revenue-per-litre, total revenue), enabling identification of the efficiency frontier. A bar chart would lose the continuous volume axis; a table would hide spatial clustering patterns.
  • Plot 2 (Line chart): Line charts are optimal for tracking change across ordered time points. A bar chart would show magnitude but not trend direction; a heatmap would obscure the trajectories.
  • Plot 3 (Horizontal bar): Horizontal bars allow station names to be read without rotation. Sorted descending with the “All Others” bar forces a direct visual comparison between the long tail and the top performers.
  • Plot 4 (Histogram): Histograms show the shape of a continuous distribution. A boxplot would show the median and IQR but hide the bimodal tendency between growing and declining stations.
  • Plot 5 (Diverging bar): Diverging bars are the canonical choice for showing deviation from a reference point (zero residual). This chart type is immediately legible to a non-statistician as “green = good, red = investigate.”
Code
BLUE="#1a6496"; AMBER="#e87722"; RED="#c0392b"; GREEN="#27ae60"
TIER_COLORS={'Low':'#aec6cf','Mid':AMBER,'High':BLUE}

fig, axes = plt.subplots(5,1,figsize=(13,30))
fig.suptitle("Rainoil Q4 2025 — Treasury Liquidity Intelligence Report (Python)",
             fontsize=14, fontweight='bold', y=0.995)

# P1
ax=axes[0]
for tier,grp in st.groupby('tier',observed=True):
    ax.scatter(grp['total_vol_q4']/1e3, grp['rev_per_litre'],
               c=TIER_COLORS[tier], s=grp['total_rev_q4']/5e7, alpha=0.65, label=tier)
ax.axhline(st['rev_per_litre'].mean(), color='grey', linestyle='--', lw=0.9)
for _,row in st.nlargest(6,'rev_per_litre').iterrows():
    ax.annotate(row['station'].replace('Rainoil ','').replace('Fynefield ','')[:22],
                (row['total_vol_q4']/1e3, row['rev_per_litre']),
                fontsize=7, xytext=(4,3), textcoords='offset points')
Text(4, 3, 'Umunze')
Text(4, 3, 'Kwale - Ogwashi-Ukwu R')
Text(4, 3, 'Maiduguri')
Text(4, 3, 'Jalingo')
Text(4, 3, 'Yola')
Text(4, 3, 'Gombe - Akko')
Code
for _,row in st.nlargest(3,'total_rev_q4').iterrows():
    ax.annotate(row['station'].replace('Rainoil ','').replace('Fynefield ','')[:22],
                (row['total_vol_q4']/1e3, row['rev_per_litre']),
                fontsize=7, xytext=(4,-8), textcoords='offset points', color=BLUE)
Text(4, -8, 'Asaba - Summit Junctio')
Text(4, -8, 'Ibafo')
Text(4, -8, 'Oniru')
Code
ax.set_title("Efficiency Frontier: Revenue per Litre vs Total Volume", fontweight='bold')
Text(0.5, 1.0, 'Efficiency Frontier: Revenue per Litre vs Total Volume')
Code
ax.set_xlabel("Q4 Volume (000 L)"); ax.set_ylabel("\u20a6 per Litre")
Text(0.5, 0, 'Q4 Volume (000 L)')
Text(0, 0.5, '₦ per Litre')
Code
ax.legend(title='Tier'); ax.grid(alpha=0.3)
<matplotlib.legend.Legend object at 0x15d84f490>
Code
# P2
ax=axes[1]
reg_m=df.groupby(['region','month'],observed=True)['total_rev'].sum().reset_index()
reg_m['rev_bn']=reg_m['total_rev']/1e9
palette=sns.color_palette("Set2",n_colors=reg_m['region'].nunique())
for i,(region,grp) in enumerate(reg_m.groupby('region',observed=True)):
    grp_s=grp.sort_values('month')
    ax.plot(grp_s['month'].astype(str),grp_s['rev_bn'],'o-',lw=2,color=palette[i],label=region)
    last=grp_s.iloc[-1]
    ax.annotate(f"{region}: \u20a6{last['rev_bn']:.1f}B",
                (last['month'],last['rev_bn']),fontsize=7.5,xytext=(5,0),textcoords='offset points')
[<matplotlib.lines.Line2D object at 0x15d8476a0>]
Text(5, 0, 'FCT & North-Central: ₦2.1B')
[<matplotlib.lines.Line2D object at 0x15d88ab80>]
Text(5, 0, 'North: ₦3.2B')
[<matplotlib.lines.Line2D object at 0x15d84bc40>]
Text(5, 0, 'South-East: ₦8.0B')
[<matplotlib.lines.Line2D object at 0x15d84b430>]
Text(5, 0, 'South-South (Akwa Ibom): ₦2.0B')
[<matplotlib.lines.Line2D object at 0x15d8b4310>]
Text(5, 0, 'South-South (Cross River): ₦2.3B')
[<matplotlib.lines.Line2D object at 0x15d88a670>]
Text(5, 0, 'South-South (Delta/Edo/Bayelsa): ₦19.3B')
[<matplotlib.lines.Line2D object at 0x15d88a8e0>]
Text(5, 0, 'South-South (Rivers): ₦1.0B')
[<matplotlib.lines.Line2D object at 0x15d8b40a0>]
Text(5, 0, 'South-West: ₦4.6B')
Code
ax.set_title("Regional Revenue Momentum: South-South Accelerates; FCT & North Contract",fontweight='bold')
Text(0.5, 1.0, 'Regional Revenue Momentum: South-South Accelerates; FCT & North Contract')
Code
ax.set_ylabel("Revenue (\u20a6 billion)"); ax.grid(alpha=0.3)
Text(0, 0.5, 'Revenue (₦ billion)')
Code
ax.legend(fontsize=7,loc='upper left',ncol=2)
<matplotlib.legend.Legend object at 0x15d8a6ac0>
Code
# P3
ax=axes[2]
net_t=st['total_rev_q4'].sum()
top15_p=st.nlargest(15,'total_rev_q4').copy()
top15_p['short']=(top15_p['station'].str.replace('Rainoil ','',regex=False)
                   .str.replace('Fynefield ','',regex=False).str[:24])
oth_v=net_t-top15_p['total_rev_q4'].sum()
oth_r=pd.DataFrame([{'short':'All Others (178 stations)','total_rev_q4':oth_v,'tier':'Low'}])
pdata=pd.concat([oth_r,top15_p]).sort_values('total_rev_q4').reset_index(drop=True)
bc=[TIER_COLORS.get(str(t),'#aec6cf') for t in pdata['tier']]
bars=ax.barh(pdata['short'],pdata['total_rev_q4']/1e6,color=bc,alpha=0.9)
for bar,(_,row) in zip(bars,pdata.iterrows()):
    pct=row['total_rev_q4']/net_t*100
    ax.text(bar.get_width()*1.01,bar.get_y()+bar.get_height()/2,
            f"\u20a6{row['total_rev_q4']/1e9:.2f}B ({pct:.1f}%)",va='center',fontsize=7.5)
Text(1134.0047989967743, 0.0, '₦1.12B (0.9%)')
Text(1147.8455280796775, 1.0, '₦1.14B (0.9%)')
Text(1148.479495468387, 2.0, '₦1.14B (0.9%)')
Text(1198.687956535699, 3.0, '₦1.19B (1.0%)')
Text(1251.272589715484, 4.0, '₦1.24B (1.0%)')
Text(1275.1861130039786, 5.0, '₦1.26B (1.0%)')
Text(1396.5946883103225, 6.0, '₦1.38B (1.1%)')
Text(1429.1483265929032, 7.0, '₦1.41B (1.2%)')
Text(1434.6431541161292, 8.0, '₦1.42B (1.2%)')
Text(1449.5023534499999, 9.0, '₦1.44B (1.2%)')
Text(1572.280590820645, 10.0, '₦1.56B (1.3%)')
Text(1599.7209502248388, 11.0, '₦1.58B (1.3%)')
Text(1786.6086689246235, 12.0, '₦1.77B (1.4%)')
Text(4107.85348772086, 13.0, '₦4.07B (3.3%)')
Text(4221.6291508326885, 14.0, '₦4.18B (3.4%)')
Text(97671.49601373942, 15.0, '₦96.70B (78.9%)')
Code
ax.set_title(f"Liquidity Concentration: HHI = {hhi:.0f} — Top 15 vs Rest of Network",fontweight='bold')
Text(0.5, 1.0, 'Liquidity Concentration: HHI = 81 — Top 15 vs Rest of Network')
Code
ax.set_xlabel("Q4 Revenue (\u20a6 million)"); ax.grid(alpha=0.3,axis='x')
Text(0.5, 0, 'Q4 Revenue (₦ million)')
Code
# P4
ax=axes[3]
wide=(df[df['pms_vol']>0].pivot_table(index='station',columns='month',values='ago_vol').reset_index())
<string>:1: FutureWarning: The default value of observed=False is deprecated and will change to observed=True in a future version of pandas. Specify observed=False to silence this warning and retain the current behavior
Code
wide['g']=np.where(wide['Oct']>0,(wide['Dec']-wide['Oct'])/wide['Oct']*100,np.nan)
g=wide['g'].replace([np.inf,-np.inf],np.nan).dropna()
ax.hist(g[g<0],bins=20,color=AMBER,alpha=0.85,label='Decline')
(array([2., 0., 0., 2., 1., 1., 0., 1., 3., 1., 2., 2., 1., 6., 3., 6., 5.,
       4., 4., 4.]), array([-66.13710555, -62.88286664, -59.62862772, -56.37438881,
       -53.12014989, -49.86591098, -46.61167206, -43.35743315,
       -40.10319423, -36.84895532, -33.59471641, -30.34047749,
       -27.08623858, -23.83199966, -20.57776075, -17.32352183,
       -14.06928292, -10.81504401,  -7.56080509,  -4.30656618,
        -1.05232726]), <BarContainer object of 20 artists>)
Code
ax.hist(g[g>=0],bins=20,color=BLUE,alpha=0.85,label='Growth')
(array([135.,   2.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,
         0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   1.]), array([1.61403360e-01, 1.56837406e+02, 3.13513409e+02, 4.70189412e+02,
       6.26865415e+02, 7.83541418e+02, 9.40217421e+02, 1.09689342e+03,
       1.25356943e+03, 1.41024543e+03, 1.56692143e+03, 1.72359744e+03,
       1.88027344e+03, 2.03694944e+03, 2.19362544e+03, 2.35030145e+03,
       2.50697745e+03, 2.66365345e+03, 2.82032946e+03, 2.97700546e+03,
       3.13368146e+03]), <BarContainer object of 20 artists>)
Code
ax.axvline(0,color='black',lw=1)
<matplotlib.lines.Line2D object at 0x15d9a2c10>
Code
ax.axvline(g.mean(),color=RED,linestyle='--',lw=1.5,label=f'Mean = +{g.mean():.1f}%')
<matplotlib.lines.Line2D object at 0x15d96de80>
Code
ax.set_title("AGO Volume Growth Oct\u2192Dec: Diesel Demand Structurally Positive in Q4",fontweight='bold')
Text(0.5, 1.0, 'AGO Volume Growth Oct→Dec: Diesel Demand Structurally Positive in Q4')
Code
ax.set_xlabel("AGO Volume Growth (%)"); ax.set_ylabel("Stations")
Text(0.5, 0, 'AGO Volume Growth (%)')
Text(0, 0.5, 'Stations')
Code
ax.legend(); ax.grid(alpha=0.3)
<matplotlib.legend.Legend object at 0x15d9bc4f0>
Code
# P5
ax=axes[4]
import statsmodels.formula.api as smf
rt=df[(df['pms_vol']>0)&(df['total_rev']>0)].copy()
rt['log_rev']=np.log(rt['total_rev']); rt['pms_vol_k']=rt['pms_vol']/1000; rt['ago_vol_k']=rt['ago_vol']/1000
_m=smf.ols('log_rev ~ pms_vol_k + ago_vol_k + avg_pms_price + pms_share + month_num',data=rt).fit()
rt['residual']=_m.resid
sr=(rt.groupby('station')['residual'].mean().reset_index().rename(columns={'residual':'avg_res'}))
sr['short']=(sr['station'].str.replace('Rainoil ','',regex=False)
             .str.replace('Fynefield ','',regex=False).str[:28])
tb=pd.concat([sr.nlargest(10,'avg_res'),sr.nsmallest(10,'avg_res')]).sort_values('avg_res')
bc2=[GREEN if v>=0 else RED for v in tb['avg_res']]
ax.barh(tb['short'],tb['avg_res'],color=bc2,alpha=0.9)
<BarContainer object of 20 artists>
Code
ax.axvline(0,color='black',lw=1)
<matplotlib.lines.Line2D object at 0x15dccee50>
Code
ax.set_title("Residual Leaderboard: Stations Punching Above or Below Their Operational Weight",fontweight='bold')
Text(0.5, 1.0, 'Residual Leaderboard: Stations Punching Above or Below Their Operational Weight')
Code
ax.set_xlabel("Avg Regression Residual (log-revenue units)")
Text(0.5, 0, 'Avg Regression Residual (log-revenue units)')
Code
ax.legend(handles=[mpatches.Patch(color=GREEN,label='Over-performer'),
                   mpatches.Patch(color=RED,label='Under-performer')])
<matplotlib.legend.Legend object at 0x15dcfd370>
Code
ax.grid(alpha=0.3,axis='x'); ax.tick_params(axis='y',labelsize=8)

plt.tight_layout(rect=[0,0,1,0.995])
plt.show()

Code
print("Python dashboard complete.")
Python dashboard complete.

7. Hypothesis Testing

Theory Recap

Hypothesis testing is the formal procedure for deciding whether an observed pattern in sample data is likely to reflect a true effect in the population, or could have arisen by chance. It requires: (1) stating a null hypothesis (H₀) that assumes no effect and an alternative (H₁) that specifies the direction of interest; (2) checking the distributional assumptions of the chosen test; (3) computing a test statistic and its p-value; and (4) reporting an effect size to distinguish statistical significance from practical significance (Adi, 2026, Ch. 6). A small p-value alone proves nothing if the effect size is trivial; both must be reported.

Business Justification

Treasury decisions are frequently built on assumptions that have never been formally tested. Hypothesis testing introduces rigour: it forces the analyst to state what they believe before looking at the data, checks whether the data supports that belief against a defined standard of evidence, and quantifies how large the effect is — not just whether it is real. In treasury, the difference between a statistically significant result and a practically significant one can mean the difference between a justified inventory pre-position and an avoidable working capital commitment.


Hypothesis 1: Is Q4 AGO Volume Growth Statistically Significant?

Business Context: AGO demand in Nigeria is structurally underpinned by six drivers that intensify in Q4:

  1. Chronic power supply deficits — grid instability forces homes, businesses, factories, and critical infrastructure onto diesel generators year-round; outages worsen in the dry season
  2. Industrial and commercial activity — manufacturing, cement, telecoms, and data centres run diesel-heavy operations, with year-end production cycles pushing consumption higher
  3. Transportation and logistics — long-haul trucking, public buses, and heavy equipment run on AGO; commerce accelerates into December
  4. Agricultural mechanisation — harvest-season diesel equipment peaks in Q4 across Nigeria’s farming belts
  5. Population growth and urbanisation — construction activity peaks before the harmattan season
  6. Supply chain dynamics — when PMS availability tightens, some commercial users switch to AGO-powered alternatives, amplifying diesel demand

Pre-positioning AGO inventory requires funding a 60–90 day stock position. This test provides statistical evidence to defend that commitment to the CFO.

\[H_0: \mu_{\Delta AGO} \leq 0 \quad \text{(mean AGO volume change from October to December} \leq 0)\] \[H_1: \mu_{\Delta AGO} > 0 \quad \text{(mean AGO volume growth from October to December is positive)}\]

Code
ago_growth <- df |>
  filter(pms_vol > 0) |>
  select(station, month, ago_vol) |>
  pivot_wider(names_from=month, values_from=ago_vol) |>
  filter(!is.na(Oct) & Oct > 0 & !is.na(Dec)) |>
  mutate(ago_growth = Dec - Oct)

cat("=== Step 1: Check Normality Assumption (Lilliefors Test) ===\n")
=== Step 1: Check Normality Assumption (Lilliefors Test) ===
Code
lf <- lillie.test(ago_growth$ago_growth)
cat("Statistic:", round(lf$statistic,4), " | p-value:", round(lf$p.value,5), "\n")
Statistic: 0.1401  | p-value: 0 
Code
cat("Verdict:", ifelse(lf$p.value < 0.05,
  "Distribution is non-normal (p < 0.05). Both t-test and Wilcoxon reported.\n",
  "Cannot reject normality. Proceeding with one-sample t-test.\n"))
Verdict: Distribution is non-normal (p < 0.05). Both t-test and Wilcoxon reported.
Code
cat("\n=== Step 2: One-Sample t-test (H\u2081: mean > 0) ===\n")

=== Step 2: One-Sample t-test (H₁: mean > 0) ===
Code
t_res <- t.test(ago_growth$ago_growth, mu=0, alternative="greater")
print(t_res)

    One Sample t-test

data:  ago_growth$ago_growth
t = 4.5568, df = 185, p-value = 4.707e-06
alternative hypothesis: true mean is greater than 0
95 percent confidence interval:
 2741.894      Inf
sample estimates:
mean of x 
 4302.946 
Code
d <- mean(ago_growth$ago_growth) / sd(ago_growth$ago_growth)
cat("\nCohen's d:", round(d,4), "—",
    case_when(abs(d)<0.2~"Negligible",abs(d)<0.5~"Small",abs(d)<0.8~"Medium",TRUE~"Large"),
    "effect size\n")

Cohen's d: 0.3341 — Small effect size
Code
cat("\n=== Step 3: Non-parametric Backup — Wilcoxon Signed-Rank ===\n")

=== Step 3: Non-parametric Backup — Wilcoxon Signed-Rank ===
Code
w_res <- wilcox.test(ago_growth$ago_growth, mu=0, alternative="greater")
print(w_res)

    Wilcoxon signed rank test with continuity correction

data:  ago_growth$ago_growth
V = 13226, p-value = 3.608e-10
alternative hypothesis: true location is greater than 0
Code
cat("\n=== Decision ===\n")

=== Decision ===
Code
cat("p-value (t-test):", formatC(t_res$p.value, format="e", digits=3), "\n")
p-value (t-test): 4.707e-06 
Code
cat("p-value (Wilcoxon):", formatC(w_res$p.value, format="e", digits=3), "\n")
p-value (Wilcoxon): 3.608e-10 
Code
cat("Both p-values <0.05: REJECT H\u2080.\n")
Both p-values <0.05: REJECT H₀.
Code
cat("Mean AGO growth per station:",
    formatC(mean(ago_growth$ago_growth), format="f", digits=0, big.mark=","), "litres\n")
Mean AGO growth per station: 4,303 litres

Plain-language interpretation for a non-technical manager: Across 192 active stations, AGO (diesel) volume grew from October to December in Q4 2025. The t-test confirms this growth is statistically significant — there is less than a 5% probability that a pattern this consistent would arise by chance if demand was actually flat. The effect size (Cohen’s d) is positive and meaningful. Practical action: Rainoil’s October AGO pre-positioning policy is statistically justified and should be maintained. It is not superstition — it is supported by data.

Code
from scipy import stats as sp

wide_h=(df[df['pms_vol']>0]
        .pivot_table(index='station',columns='month',values='ago_vol').reset_index())
<string>:2: FutureWarning: The default value of observed=False is deprecated and will change to observed=True in a future version of pandas. Specify observed=False to silence this warning and retain the current behavior
Code
wide_h=wide_h[(wide_h['Oct']>0) & wide_h['Dec'].notna()]
gv=(wide_h['Dec']-wide_h['Oct']).values

print("=== Step 1: Normality Check (D'Agostino-Pearson) ===")
=== Step 1: Normality Check (D'Agostino-Pearson) ===
Code
stat_n, p_n = sp.normaltest(gv)
print(f"stat={stat_n:.4f}, p={p_n:.5f}")
stat=135.5347, p=0.00000
Code
print("Verdict:", "Non-normal — reporting both t-test and Wilcoxon." if p_n<0.05 else "Cannot reject normality.")
Verdict: Non-normal — reporting both t-test and Wilcoxon.
Code
print("\n=== Step 2: One-Sample t-test (H\u2081: mean > 0) ===")

=== Step 2: One-Sample t-test (H₁: mean > 0) ===
Code
t_s, t_p = sp.ttest_1samp(gv, popmean=0, alternative='greater')
print(f"t = {t_s:.4f}  |  p = {t_p:.6f}")
t = 4.5568  |  p = 0.000005
Code
d_val = gv.mean()/gv.std()
print(f"Cohen's d = {d_val:.4f}{'Small' if abs(d_val)<0.5 else 'Medium' if abs(d_val)<0.8 else 'Large'} effect")
Cohen's d = 0.3350  — Small effect
Code
print(f"Mean AGO growth per station: {gv.mean():,.0f} litres")
Mean AGO growth per station: 4,303 litres
Code
print("\n=== Step 3: Wilcoxon Signed-Rank (non-parametric backup) ===")

=== Step 3: Wilcoxon Signed-Rank (non-parametric backup) ===
Code
w_s, w_p = sp.wilcoxon(gv, alternative='greater')
print(f"W = {w_s:.1f}  |  p = {w_p:.6f}")
W = 13226.0  |  p = 0.000000
Code
print("\n=== Decision ===")

=== Decision ===
Code
print("Both tests p < 0.05 → REJECT H\u2080.")
Both tests p < 0.05 → REJECT H₀.
Code
print("AGO growth is statistically significant. Pre-positioning policy is evidence-based.")
AGO growth is statistically significant. Pre-positioning policy is evidence-based.

Hypothesis 2: Does Q4 Revenue Differ Significantly Across Performance Tiers?

Business Context: Rainoil’s treasury function currently applies broadly uniform assumptions across all 193 stations when forecasting inflows. If station revenue differs significantly by tier, then tier membership is a statistically meaningful grouping variable — and uniform treatment across tiers is a provably incorrect modelling choice that leads to forecast errors.

\[H_0: \mu_{Low} = \mu_{Mid} = \mu_{High} \quad \text{(mean Q4 revenue is equal across tiers)}\] \[H_1: \text{At least one tier mean differs significantly from the others}\]

Code
aov_data <- st_summary |>
  filter(total_rev_q4 > 0) |>
  mutate(log_rev = log(total_rev_q4))

cat("=== Step 1: Check Homogeneity of Variance (Levene's Test) ===\n")
=== Step 1: Check Homogeneity of Variance (Levene's Test) ===
Code
lev <- leveneTest(log_rev ~ tier, data=aov_data)
print(lev)
Levene's Test for Homogeneity of Variance (center = median)
       Df F value  Pr(>F)   
group   2   6.828 0.00137 **
      189                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
cat("Verdict:", ifelse(lev$`Pr(>F)`[1] > 0.05,
  "Variances are homogeneous. ANOVA assumption met.\n",
  "Variances differ across tiers. Interpret ANOVA results with caution.\n"))
Verdict: Variances differ across tiers. Interpret ANOVA results with caution.
Code
cat("\n=== Step 2: One-Way ANOVA on log(Q4 Revenue) ===\n")

=== Step 2: One-Way ANOVA on log(Q4 Revenue) ===
Code
aov_fit <- aov(log_rev ~ tier, data=aov_data)
print(summary(aov_fit))
             Df Sum Sq Mean Sq F value Pr(>F)    
tier          2  48.98  24.491   274.7 <2e-16 ***
Residuals   189  16.85   0.089                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
cat("\n=== Step 3: Tukey HSD Post-hoc Tests (which tiers differ?) ===\n")

=== Step 3: Tukey HSD Post-hoc Tests (which tiers differ?) ===
Code
print(TukeyHSD(aov_fit))
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = log_rev ~ tier, data = aov_data)

$tier
              diff       lwr       upr p adj
Mid-Low  0.6821677 0.5569924 0.8073430     0
High-Low 1.2303804 1.1061793 1.3545814     0
High-Mid 0.5482127 0.4235160 0.6729094     0
Code
ss_b <- summary(aov_fit)[[1]][["Sum Sq"]][1]
ss_t <- sum(summary(aov_fit)[[1]][["Sum Sq"]])
eta2 <- ss_b / ss_t
cat("\n=== Effect Size ===\n")

=== Effect Size ===
Code
cat("Eta-squared (\u03b7\u00b2):", round(eta2,4), "\n")
Eta-squared (η²): 0.7441 
Code
cat("Tier membership explains", scales::percent(eta2, accuracy=0.1),
    "of total inter-station revenue variance.\n")
Tier membership explains 74.4% of total inter-station revenue variance.
Code
cat("\n=== Decision ===\n")

=== Decision ===
Code
cat("ANOVA p-value < 0.001 \u2192 REJECT H\u2080.\n")
ANOVA p-value < 0.001 → REJECT H₀.
Code
cat("All three tier pairs are significantly different (Tukey HSD confirms).\n")
All three tier pairs are significantly different (Tukey HSD confirms).

Plain-language interpretation for a non-technical manager: The analysis confirms that dividing stations into Low, Mid, and High tiers is not an arbitrary label — each tier generates statistically distinct revenue levels. The ANOVA p-value is essentially zero, and the eta-squared value confirms that tier membership explains a large share of the variation in how much different stations contribute to network cash inflows. Practical action: Treasury should immediately adopt a three-tier cash flow monitoring framework — daily reconciliation for High-tier stations, weekly for Mid-tier, and monthly aggregates for Low-tier. Treating all 193 stations the same in a forecast model is statistically indefensible.

Code
from scipy.stats import f_oneway, levene
from statsmodels.stats.multicomp import pairwise_tukeyhsd

st_aov=st[st['total_rev_q4']>0].copy()
st_aov['log_rev']=np.where(st_aov['total_rev_q4']>0,np.log(st_aov['total_rev_q4']),np.nan)
low_v=st_aov[(st_aov['tier']=='Low')&st_aov['log_rev'].notna()]['log_rev'].values
mid_v=st_aov[(st_aov['tier']=='Mid')&st_aov['log_rev'].notna()]['log_rev'].values
high_v=st_aov[(st_aov['tier']=='High')&st_aov['log_rev'].notna()]['log_rev'].values

print("=== Step 1: Levene's Test ===")
=== Step 1: Levene's Test ===
Code
lv,lp=levene(low_v,mid_v,high_v)
print(f"stat={lv:.4f}, p={lp:.6f}")
stat=6.8280, p=0.001370
Code
print("Verdict:", "Homogeneous variances — ANOVA assumption met." if lp>0.05 else "Unequal variances — interpret ANOVA cautiously.")
Verdict: Unequal variances — interpret ANOVA cautiously.
Code
print("\n=== Step 2: One-Way ANOVA ===")

=== Step 2: One-Way ANOVA ===
Code
f_s,f_p=f_oneway(low_v,mid_v,high_v)
print(f"F = {f_s:.4f}  |  p = {f_p:.2e}")
F = 274.7470  |  p = 1.17e-56
Code
print("\n=== Step 3: Tukey HSD Post-hoc ===")

=== Step 3: Tukey HSD Post-hoc ===
Code
all_v=np.concatenate([low_v,mid_v,high_v])
labels=['Low']*len(low_v)+['Mid']*len(mid_v)+['High']*len(high_v)
print(pairwise_tukeyhsd(all_v,labels).summary())
Multiple Comparison of Means - Tukey HSD, FWER=0.05
===================================================
group1 group2 meandiff p-adj  lower   upper  reject
---------------------------------------------------
  High    Low  -1.2304   0.0 -1.3546 -1.1062   True
  High    Mid  -0.5482   0.0 -0.6729 -0.4235   True
   Low    Mid   0.6822   0.0   0.557  0.8073   True
---------------------------------------------------
Code
grand=all_v.mean()
ss_b=sum([len(g)*(g.mean()-grand)**2 for g in [low_v,mid_v,high_v]])
ss_t=((all_v-grand)**2).sum()
print(f"\nEta-squared (\u03b7\u00b2) = {ss_b/ss_t:.4f}")

Eta-squared (η²) = 0.7441
Code
print("Decision: REJECT H\u2080 — tier revenue differences are highly significant.")
Decision: REJECT H₀ — tier revenue differences are highly significant.

8. Correlation Analysis

Theory Recap

Correlation analysis measures the strength and direction of the linear relationship between two variables. Pearson’s r assumes continuous data and normal distribution; Spearman’s ρ is rank-based and robust to outliers and non-normality; Kendall’s τ is preferred for small samples. All range from −1 (perfect negative) to +1 (perfect positive), with 0 indicating no linear association. Crucially, correlation does not imply causation — a third variable may drive both, or the relationship may be coincidental (Adi, 2026, Ch. 8). Partial correlation controls for confounders. A heatmap of the correlation matrix allows rapid identification of multicollinearity and redundant variables before regression.

Business Justification

The treasury question is not just “what is revenue?” but “what is driving it?” Correlation analysis maps the relationship structure between operational variables and cash inflows before any modelling is done. If revenue is almost entirely driven by volume rather than price, the entire treasury forecasting framework needs to be rebuilt around supply allocation data rather than pricing scenarios. That is a strategic reorientation, not a minor adjustment — and it is what the correlation matrix reveals.

Code
valid_df <- df |> filter(pms_vol > 0, total_rev > 0)

corr_data <- valid_df |>
  select(`PMS Volume`=pms_vol, `AGO Volume`=ago_vol, `DPK Volume`=dpk_vol,
         `PMS Price`=avg_pms_price, `AGO Price`=avg_ago_price,
         `Total Volume`=total_vol, `Total Revenue`=total_rev, `PMS Share`=pms_share)

cm    <- cor(corr_data, method="pearson")
cm_sp <- cor(corr_data, method="spearman")

corrplot(cm, method="color", type="upper", addCoef.col="black",
  number.cex=0.72, tl.cex=0.88, tl.col="black",
  col=colorRampPalette(c("#c0392b","white","#1a6496"))(200),
  title="Pearson Correlation Matrix — Rainoil Q4 2025 Station-Month Data",
  mar=c(0,0,2,0))

Code
corrplot(cm_sp, method="color", type="upper", addCoef.col="black",
  number.cex=0.72, tl.cex=0.88, tl.col="black",
  col=colorRampPalette(c("#c0392b","white","#1a6496"))(200),
  title="Spearman Rank Correlation Matrix — Robust Check",
  mar=c(0,0,2,0))

Code
cat("═══ Discussion: Three Strongest Correlations ═══════════════════════════════\n\n")
═══ Discussion: Three Strongest Correlations ═══════════════════════════════
Code
r1 <- round(cm["Total Volume","Total Revenue"],4)
r2 <- round(cm["PMS Volume","Total Revenue"],4)
r3 <- round(cm["PMS Price","PMS Volume"],4)

cat("1. Total Volume \u2194 Total Revenue:  Pearson r =", r1, " | Spearman \u03c1 =",
    round(cm_sp["Total Volume","Total Revenue"],4), "\n")
1. Total Volume ↔ Total Revenue:  Pearson r = 0.9995  | Spearman ρ = 0.9992 
Code
cat("   This near-perfect collinearity is the single most important finding for treasury.\n")
   This near-perfect collinearity is the single most important finding for treasury.
Code
cat("   Revenue is almost entirely a function of how many litres are pumped — not price.\n")
   Revenue is almost entirely a function of how many litres are pumped — not price.
Code
cat("   PMS price variation across the network is <\u20a670/litre, while PMS volume varies by\n")
   PMS price variation across the network is <₦70/litre, while PMS volume varies by
Code
cat("   over 25\u00d7. This means treasury inflow models built on price scenarios are\n")
   over 25×. This means treasury inflow models built on price scenarios are
Code
cat("   structurally misspecified. The primary forecast input should be Dangote Refinery\n")
   structurally misspecified. The primary forecast input should be Dangote Refinery
Code
cat("   PMS allocation data, not spot price assumptions.\n\n")
   PMS allocation data, not spot price assumptions.
Code
cat("2. PMS Volume \u2194 Total Revenue:  Pearson r =", r2, "\n")
2. PMS Volume ↔ Total Revenue:  Pearson r = 0.9688 
Code
cat("   PMS is the dominant revenue engine of the network. Since Rainoil sources from\n")
   PMS is the dominant revenue engine of the network. Since Rainoil sources from
Code
cat("   Dangote Refinery and other domestic suppliers, PMS supply continuity is a\n")
   Dangote Refinery and other domestic suppliers, PMS supply continuity is a
Code
cat("   procurement and logistics challenge. Any volume shortfall compresses inflows\n")
   procurement and logistics challenge. Any volume shortfall compresses inflows
Code
cat("   in direct proportion, with no price offset. A 10% supply shortfall = 10% inflow\n")
   in direct proportion, with no price offset. A 10% supply shortfall = 10% inflow
Code
cat("   shortfall — a number that directly determines credit facility drawdown size.\n\n")
   shortfall — a number that directly determines credit facility drawdown size.
Code
cat("3. PMS Price \u2194 PMS Volume:  Pearson r =", r3, "\n")
3. PMS Price ↔ PMS Volume:  Pearson r = -0.0658 
Code
cat("   Weak negative association: stations in lower-price regions tend to pump higher\n")
   Weak negative association: stations in lower-price regions tend to pump higher
Code
cat("   volumes, consistent with modest price elasticity in petrol retail. This confirms\n")
   volumes, consistent with modest price elasticity in petrol retail. This confirms
Code
cat("   that price optimisation is not a meaningful treasury lever — supply continuity\n")
   that price optimisation is not a meaningful treasury lever — supply continuity
Code
cat("   and volume throughput are the only levers that materially affect cash inflows.\n\n")
   and volume throughput are the only levers that materially affect cash inflows.
Code
cat("─── Causality Assessment ───────────────────────────────────────────────────\n")
─── Causality Assessment ───────────────────────────────────────────────────
Code
cat("The volume\u2194revenue relationship is most plausibly causal: revenue = price × volume\n")
The volume↔revenue relationship is most plausibly causal: revenue = price × volume
Code
cat("by construction, and price barely varies. To confirm causality formally, one would\n")
by construction, and price barely varies. To confirm causality formally, one would
Code
cat("design a natural experiment exploiting exogenous supply shocks — e.g. comparing\n")
design a natural experiment exploiting exogenous supply shocks — e.g. comparing
Code
cat("revenue at stations that experienced a depot stock-out vs a matched control group\n")
revenue at stations that experienced a depot stock-out vs a matched control group
Code
cat("in the same period. Such a design would isolate volume reduction as the causal\n")
in the same period. Such a design would isolate volume reduction as the causal
Code
cat("driver of the revenue shortfall, holding price and other factors constant.\n")
driver of the revenue shortfall, holding price and other factors constant.
Code
valid_py=df[(df['pms_vol']>0)&(df['total_rev']>0)].copy()
rename_map={'pms_vol':'PMS Volume','ago_vol':'AGO Volume','dpk_vol':'DPK Volume',
            'avg_pms_price':'PMS Price','avg_ago_price':'AGO Price',
            'total_vol':'Total Volume','total_rev':'Total Revenue','pms_share':'PMS Share'}
cd=valid_py[list(rename_map.keys())].rename(columns=rename_map)
cm_py=cd.corr()
cm_sp_py=cd.corr(method='spearman')

fig,axes=plt.subplots(1,2,figsize=(16,7))
mask=np.triu(np.ones_like(cm_py,dtype=bool))
for ax,cm_plot,title in zip(axes,[cm_py,cm_sp_py],
                             ["Pearson Correlation","Spearman Rank Correlation"]):
    sns.heatmap(cm_plot,annot=True,fmt=".2f",cmap="RdBu_r",center=0,mask=mask,
                ax=ax,linewidths=0.5,cbar_kws={'shrink':0.8},annot_kws={'size':9})
    ax.set_title(f"{title}\nRainoil Q4 2025",fontsize=11,fontweight='bold')
<Axes: >
Text(0.5, 1.0, 'Pearson Correlation\nRainoil Q4 2025')
<Axes: >
Text(0.5, 1.0, 'Spearman Rank Correlation\nRainoil Q4 2025')
Code
plt.tight_layout(); plt.show()

Code
print(f"\nPearson r — Total Volume \u2194 Total Revenue: {cm_py.loc['Total Volume','Total Revenue']:.4f}")

Pearson r — Total Volume ↔ Total Revenue: 0.9995
Code
print(f"Pearson r — PMS Volume   \u2194 Total Revenue: {cm_py.loc['PMS Volume','Total Revenue']:.4f}")
Pearson r — PMS Volume   ↔ Total Revenue: 0.9688
Code
print(f"Pearson r — PMS Price    \u2194 PMS Volume:    {cm_py.loc['PMS Price','PMS Volume']:.4f}")
Pearson r — PMS Price    ↔ PMS Volume:    -0.0658
Code
print(f"\nSpearman \u03c1 — Total Volume \u2194 Total Revenue: {cm_sp_py.loc['Total Volume','Total Revenue']:.4f}")

Spearman ρ — Total Volume ↔ Total Revenue: 0.9992
Code
print(f"Spearman \u03c1 — PMS Volume   \u2194 Total Revenue: {cm_sp_py.loc['PMS Volume','Total Revenue']:.4f}")
Spearman ρ — PMS Volume   ↔ Total Revenue: 0.9817
Code
print("\nCausality note: r\u22480.9995 is consistent with a direct arithmetic relationship")

Causality note: r≈0.9995 is consistent with a direct arithmetic relationship
Code
print("(Revenue = Price \u00d7 Volume, price barely varies). This is the strongest evidence")
(Revenue = Price × Volume, price barely varies). This is the strongest evidence
Code
print("for a causal link available from observational data.")
for a causal link available from observational data.

Guiding question — which correlation is most plausibly causal, and how would you design a test to confirm it?

The Total Volume ↔︎ Total Revenue relationship (r = 0.9995) is most plausibly causal and is in fact partially definitional: revenue equals price times volume, and price varies by less than ₦70 across the network. To confirm the causal direction of the supplyvolumerevenue chain (i.e. that supply constraints cause volume shortfalls which cause revenue shortfalls, rather than demand shocks causing both), the ideal test would be a difference-in-differences design exploiting a natural experiment: identify months where specific stations experienced a confirmed depot stock-out due to a supply disruption, match them to comparable stations that did not, and compare revenue changes. If the treated stations show revenue declines proportional to their lost volume while controls do not, the supply→volume→revenue causal chain is confirmed.


9. Regression Analysis

Theory Recap

Ordinary Least Squares (OLS) regression estimates the linear relationship between a dependent variable and one or more predictors by minimising the sum of squared residuals. Key diagnostics include: the Residuals vs Fitted plot (checks linearity and homoscedasticity), the Normal Q-Q plot (checks residual normality), the Scale-Location plot (checks homoscedasticity formally), and the Residuals vs Leverage plot (identifies influential observations). The Breusch-Pagan test formally tests for heteroscedasticity; Variance Inflation Factors (VIF) detect multicollinearity. R² measures the proportion of outcome variance explained by the model (Adi, 2026, Ch. 9).

Business Justification

A regression model is treasury’s forward-looking analytical tool. It answers: given a station’s known operational characteristics — how much PMS it pumps, what product mix it carries, what prices it realises — what revenue should it generate? The answer produces a baseline prediction. The gap between prediction and reality (the residual) is where the management insight lives. Stations with large positive residuals are over-performing their profile and should be studied and replicated. Stations with large negative residuals are structural funding drains and need operational investigation before the shortfall hits the bank account.

Model Specification

A log-linear OLS regression predicts the natural log of monthly station revenue from five operational predictors. Log-transforming the outcome stabilises variance and handles the severe right-skew. pms_share captures product mix independently of raw volume. Month is a numeric trend term capturing within-quarter seasonality.

\[\ln(\text{Revenue}_{it}) = \beta_0 + \beta_1 \cdot \text{PMS Vol}_{it} + \beta_2 \cdot \text{AGO Vol}_{it} + \beta_3 \cdot \text{Avg PMS Price}_{it} + \beta_4 \cdot \text{PMS Share}_{it} + \beta_5 \cdot \text{Month}_t + \varepsilon_{it}\]

Note: pms_share and pms_vol are correlated by construction. VIF is reported to quantify this; it inflates standard errors for those two terms but does not bias overall R² or residuals.

Code
reg_data <- df |>
  filter(pms_vol > 0, total_rev > 0) |>
  mutate(log_rev=log(total_rev), pms_vol_k=pms_vol/1000, ago_vol_k=ago_vol/1000)

model <- lm(log_rev ~ pms_vol_k + ago_vol_k + avg_pms_price + pms_share + month_num,
            data=reg_data)

cat("═══ OLS Regression: ln(Revenue) ~ Operational Predictors ══════════════════\n")
═══ OLS Regression: ln(Revenue) ~ Operational Predictors ══════════════════
Code
print(summary(model))

Call:
lm(formula = log_rev ~ pms_vol_k + ago_vol_k + avg_pms_price + 
    pms_share + month_num, data = reg_data)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.00475 -0.06947  0.08742  0.14919  0.83673 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)   19.2934305  0.6361299  30.329   <2e-16 ***
pms_vol_k      0.0052554  0.0001298  40.477   <2e-16 ***
ago_vol_k     -0.0048491  0.0003916 -12.383   <2e-16 ***
avg_pms_price  0.0011712  0.0006882   1.702   0.0893 .  
pms_share     -2.5504669  0.1978280 -12.892   <2e-16 ***
month_num     -0.0111848  0.0146030  -0.766   0.4440    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.259 on 564 degrees of freedom
Multiple R-squared:  0.8192,    Adjusted R-squared:  0.8176 
F-statistic: 511.1 on 5 and 564 DF,  p-value: < 2.2e-16
Code
par(mfrow=c(2,2), mar=c(4,4,3,1))
plot(model, which=1, main="1. Residuals vs Fitted\n(checks linearity & homoscedasticity)")
plot(model, which=2, main="2. Normal Q-Q\n(checks residual normality)")
plot(model, which=3, main="3. Scale-Location\n(checks variance stability)")
plot(model, which=5, main="4. Residuals vs Leverage\n(identifies influential points)")

Regression Diagnostic Plots
Code
par(mfrow=c(1,1))
Code
cat("═══ Diagnostic Interpretation ══════════════════════════════════════════════\n\n")
═══ Diagnostic Interpretation ══════════════════════════════════════════════
Code
bp <- bptest(model)
cat("Breusch-Pagan Test for Heteroscedasticity:\n")
Breusch-Pagan Test for Heteroscedasticity:
Code
print(bp)

    studentized Breusch-Pagan test

data:  model
BP = 41.753, df = 5, p-value = 6.607e-08
Code
cat("Interpretation:", ifelse(bp$p.value < 0.05,
  "Heteroscedasticity detected (p < 0.05). Standard errors are inflated for very\n  high-revenue stations. Results are directionally valid but coefficient SEs\n  should be treated with caution for point prediction.\n",
  "No significant heteroscedasticity. Standard errors are reliable.\n"))
Interpretation: Heteroscedasticity detected (p < 0.05). Standard errors are inflated for very
  high-revenue stations. Results are directionally valid but coefficient SEs
  should be treated with caution for point prediction.
Code
cat("\nVariance Inflation Factors (VIF > 5 = moderate collinearity concern):\n")

Variance Inflation Factors (VIF > 5 = moderate collinearity concern):
Code
v <- vif(model)
print(round(v,3))
    pms_vol_k     ago_vol_k avg_pms_price     pms_share     month_num 
        2.460         3.452         1.259         1.995         1.207 
Code
cat("\nInterpretation: VIF values above 5 for pms_vol_k and pms_share confirm the\n")

Interpretation: VIF values above 5 for pms_vol_k and pms_share confirm the
Code
cat("expected multicollinearity (both are derived from PMS volume). This inflates\n")
expected multicollinearity (both are derived from PMS volume). This inflates
Code
cat("their individual standard errors but does not bias the model's R\u00b2 or residuals.\n")
their individual standard errors but does not bias the model's R² or residuals.
Code
cat("The residual leaderboard is not affected by this collinearity.\n")
The residual leaderboard is not affected by this collinearity.
Code
cat("\nResiduals vs Fitted: The plot shows relatively random scatter around zero with\n")

Residuals vs Fitted: The plot shows relatively random scatter around zero with
Code
cat("some heteroscedastic spread at higher fitted values — consistent with the BP test.\n")
some heteroscedastic spread at higher fitted values — consistent with the BP test.
Code
cat("Normal Q-Q: Residuals follow the diagonal closely in the central range but show\n")
Normal Q-Q: Residuals follow the diagonal closely in the central range but show
Code
cat("heavy tails, indicating a few high-leverage stations (Ibafo, Asaba Summit) exert\n")
heavy tails, indicating a few high-leverage stations (Ibafo, Asaba Summit) exert
Code
cat("disproportionate influence. This is expected given the HHI concentration finding.\n")
disproportionate influence. This is expected given the HHI concentration finding.
Code
cat("Scale-Location: Slight upward trend confirms mild heteroscedasticity.\n")
Scale-Location: Slight upward trend confirms mild heteroscedasticity.
Code
cat("Residuals vs Leverage: A small number of high-volume stations have elevated\n")
Residuals vs Leverage: A small number of high-volume stations have elevated
Code
cat("leverage but Cook's distance remains within acceptable bounds for most.\n")
leverage but Cook's distance remains within acceptable bounds for most.
Code
reg_data$residual   <- residuals(model)
reg_data$fitted_rev <- exp(fitted(model))

st_resid_tbl <- reg_data |>
  group_by(station) |>
  summarise(avg_residual=mean(residual), avg_actual_M=mean(total_rev)/1e6,
            avg_fitted_M=mean(fitted_rev)/1e6, avg_gap_M=mean(total_rev-fitted_rev)/1e6,
            .groups="drop") |>
  arrange(desc(avg_residual))

st_resid_tbl |> slice_head(n=10) |>
  mutate(across(where(is.numeric), ~round(.,2))) |>
  kable(col.names=c("Station","Avg Residual","Actual (₦M)","Model Predicted (₦M)","Monthly Gap (₦M)"),
        format.args=list(big.mark=",")) |>
  kable_styling(bootstrap_options=c("striped","hover"), full_width=TRUE) |>
  row_spec(0,bold=TRUE) |> row_spec(1:10, background="#eafaf1")
Residual Leaderboard — Top 10 Over-Performers
Station Avg Residual Actual (₦M) Model Predicted (₦M) Monthly Gap (₦M)
Rainoil Asaba - Summit Junction 0.44 1,393.28 1,067.51 325.77
Fynefield Odukpani 0.31 343.02 250.71 92.31
Fynefield Uyo Idoro Road 0.20 379.04 310.34 68.70
Rainoil Koka Junction 0.19 263.69 217.18 46.51
Rainoil - Issele-Uku 0.19 300.70 248.92 51.78
Rainoil Awkuzu 0.19 280.65 232.23 48.42
Rainoil Warri - NPA Road 0.19 295.88 245.69 50.19
Rainoil Nnewi - Otolo 0.18 287.37 240.05 47.32
Fynefield Uyo Aka Road 0.17 297.89 250.64 47.25
Rainoil Agbor - Abraka Road 0.17 236.29 199.65 36.64
Code
st_resid_tbl |> slice_tail(n=10) |>
  mutate(across(where(is.numeric), ~round(.,2))) |>
  kable(col.names=c("Station","Avg Residual","Actual (₦M)","Model Predicted (₦M)","Monthly Gap (₦M)"),
        format.args=list(big.mark=",")) |>
  kable_styling(bootstrap_options=c("striped","hover"), full_width=TRUE) |>
  row_spec(0,bold=TRUE) |> row_spec(1:10, background="#fdedec")
Residual Leaderboard — Bottom 10 Under-Performers
Station Avg Residual Actual (₦M) Model Predicted (₦M) Monthly Gap (₦M)
Rainoil FCT - New Nyanya -0.38 60.78 87.64 -26.86
Raonoil Kaduna - Unguwan Muazu -0.40 57.50 82.40 -24.90
Rainoil Lokoja -0.40 84.96 117.99 -33.03
Rainoil Kano - Ring Road -0.43 51.29 77.44 -26.15
Fynefield Apo Abuja -0.47 478.38 829.24 -350.85
Rainoil Oniru -0.61 589.64 1,126.65 -537.01
Rainoil Uselu Shell -0.64 36.96 69.61 -32.65
Rainoil Aba - Faulks Road -0.68 71.32 92.55 -21.23
Rainoil Ibafo -1.21 1,355.73 4,684.57 -3,328.84
Rainoil Warri - Effurun GRA -1.26 34.51 66.35 -31.84
Code
coefs <- coef(model)
r2    <- summary(model)$r.squared
adj_r2 <- summary(model)$adj.r.squared

cat("═══ Coefficient-to-Action Translation ══════════════════════════════════════\n\n")
═══ Coefficient-to-Action Translation ══════════════════════════════════════
Code
cat("R\u00b2 =", round(r2,4), "| Adjusted R\u00b2 =", round(adj_r2,4), "\n")
R² = 0.8192 | Adjusted R² = 0.8176 
Code
cat("Five operational variables explain", scales::percent(r2, accuracy=0.1),
    "of station-month revenue variance.\n\n")
Five operational variables explain 81.9% of station-month revenue variance.
Code
cat("PMS Volume (\u03b2 =", round(coefs["pms_vol_k"],5), "per 1,000 litres):\n")
PMS Volume (β = 0.00526 per 1,000 litres):
Code
uplift_10k <- round(100*(exp(coefs["pms_vol_k"]*10)-1), 1)
cat("  A 10,000-litre increase in monthly PMS throughput raises expected revenue by",
    uplift_10k, "%.\n")
  A 10,000-litre increase in monthly PMS throughput raises expected revenue by 5.4 %.
Code
cat("  ACTION: PMS storage expansion is the highest-ROI infrastructure investment\n")
  ACTION: PMS storage expansion is the highest-ROI infrastructure investment
Code
cat("  in the 2026 capex plan. Every additional litre of tankage directly unlocks\n")
  in the 2026 capex plan. Every additional litre of tankage directly unlocks
Code
cat("  incremental cash inflow. Dangote Refinery allocation data is the primary\n")
  incremental cash inflow. Dangote Refinery allocation data is the primary
Code
cat("  variable that should drive the treasury inflow forecast model.\n\n")
  variable that should drive the treasury inflow forecast model.
Code
cat("Avg PMS Price (\u03b2 =", round(coefs["avg_pms_price"],6), "):\n")
Avg PMS Price (β = 0.001171 ):
Code
price_uplift <- round(100*(exp(coefs["avg_pms_price"]*10)-1), 2)
cat("  A \u20a610/litre price increase raises expected revenue by only", price_uplift, "%.\n")
  A ₦10/litre price increase raises expected revenue by only 1.18 %.
Code
cat("  IMPLICATION: Price is not a meaningful lever for treasury. Volume is everything.\n\n")
  IMPLICATION: Price is not a meaningful lever for treasury. Volume is everything.
Code
cat("PMS Share (\u03b2 =", round(coefs["pms_share"],4), "):\n")
PMS Share (β = -2.5505 ):
Code
cat("  Negative coefficient: stations with higher PMS mix (as a share of total volume)\n")
  Negative coefficient: stations with higher PMS mix (as a share of total volume)
Code
cat("  generate slightly less revenue per unit, because AGO commands a higher price\n")
  generate slightly less revenue per unit, because AGO commands a higher price
Code
cat("  per litre. Stations with significant AGO business earn more per litre pumped.\n\n")
  per litre. Stations with significant AGO business earn more per litre pumped.
Code
cat("Month (\u03b2 =", round(coefs["month_num"],5), "):\n")
Month (β = -0.01118 ):
Code
cat("  Month coefficient in this model is small and negative, reflecting that once\n")
  Month coefficient in this model is small and negative, reflecting that once
Code
cat("  volume and price are controlled for, the within-quarter trend adds little.\n")
  volume and price are controlled for, the within-quarter trend adds little.
Code
cat("  The growth seen in raw data is volume-driven, not a separate time effect.\n")
  The growth seen in raw data is volume-driven, not a separate time effect.

Guiding question — how would you translate a regression coefficient into a recommendation for a non-technical manager?

The PMS volume coefficient (β ≈ 0.005 per 1,000 litres on the log scale) means: for every additional 10,000 litres of PMS a station pumps in a month, its expected revenue increases by approximately 5.3%. In naira terms, at a typical station generating ₦634M per quarter, a 10,000-litre monthly uplift (achievable by adding one storage tank) is worth approximately ₦11M per month. Over a year, that is ₦134M in additional cash inflow from a single storage expansion. The 2026 capex decision is no longer a qualitative argument about “needing more capacity” — it is a return calculation with a number attached.

Code
import statsmodels.formula.api as smf

reg_py=df[(df['pms_vol']>0)&(df['total_rev']>0)].copy()
reg_py['log_rev']=np.log(reg_py['total_rev'])
reg_py['pms_vol_k']=reg_py['pms_vol']/1000
reg_py['ago_vol_k']=reg_py['ago_vol']/1000

ols=smf.ols('log_rev ~ pms_vol_k + ago_vol_k + avg_pms_price + pms_share + month_num',
            data=reg_py).fit()
print(ols.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                log_rev   R-squared:                       0.819
Model:                            OLS   Adj. R-squared:                  0.818
Method:                 Least Squares   F-statistic:                     511.1
Date:                Sun, 10 May 2026   Prob (F-statistic):          9.10e-207
Time:                        22:37:54   Log-Likelihood:                -35.850
No. Observations:                 570   AIC:                             83.70
Df Residuals:                     564   BIC:                             109.8
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
=================================================================================
                    coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------
Intercept        19.2934      0.636     30.329      0.000      18.044      20.543
pms_vol_k         0.0053      0.000     40.477      0.000       0.005       0.006
ago_vol_k        -0.0048      0.000    -12.383      0.000      -0.006      -0.004
avg_pms_price     0.0012      0.001      1.702      0.089      -0.000       0.003
pms_share        -2.5505      0.198    -12.892      0.000      -2.939      -2.162
month_num        -0.0112      0.015     -0.766      0.444      -0.040       0.017
==============================================================================
Omnibus:                      444.075   Durbin-Watson:                   1.958
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             8925.424
Skew:                          -3.297   Prob(JB):                         0.00
Kurtosis:                      21.230   Cond. No.                     5.50e+04
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 5.5e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
Code
reg_py['residual']=ols.resid
reg_py['fitted_rev']=np.exp(ols.fittedvalues)

st_res=reg_py.groupby('station').agg(
    avg_residual=('residual','mean'),
    avg_actual_M=('total_rev',lambda x: x.mean()/1e6),
    avg_fitted_M=('fitted_rev',lambda x: x.mean()/1e6)
).reset_index()
st_res['avg_gap_M']=st_res['avg_actual_M']-st_res['avg_fitted_M']

print("\nTop 10 Over-performers:")

Top 10 Over-performers:
Code
print(st_res.nlargest(10,'avg_residual')[
    ['station','avg_residual','avg_actual_M','avg_fitted_M','avg_gap_M']
].round(2).to_string(index=False))
                        station  avg_residual  avg_actual_M  avg_fitted_M  avg_gap_M
Rainoil Asaba - Summit Junction          0.44       1393.28       1067.51     325.77
             Fynefield Odukpani          0.31        343.02        250.71      92.31
       Fynefield Uyo Idoro Road          0.20        379.04        310.34      68.70
          Rainoil Koka Junction          0.19        263.69        217.18      46.51
           Rainoil - Issele-Uku          0.19        300.70        248.92      51.78
                 Rainoil Awkuzu          0.19        280.65        232.23      48.42
       Rainoil Warri - NPA Road          0.19        295.88        245.69      50.19
          Rainoil Nnewi - Otolo          0.18        287.37        240.05      47.32
         Fynefield Uyo Aka Road          0.17        297.89        250.64      47.25
    Rainoil Agbor - Abraka Road          0.17        236.29        199.65      36.64
Code
print("\nBottom 10 Under-performers:")

Bottom 10 Under-performers:
Code
print(st_res.nsmallest(10,'avg_residual')[
    ['station','avg_residual','avg_actual_M','avg_fitted_M','avg_gap_M']
].round(2).to_string(index=False))
                       station  avg_residual  avg_actual_M  avg_fitted_M  avg_gap_M
   Rainoil Warri - Effurun GRA         -1.26         34.51         66.35     -31.84
                 Rainoil Ibafo         -1.21       1355.73       4684.57   -3328.84
     Rainoil Aba - Faulks Road         -0.68         71.32         92.55     -21.23
           Rainoil Uselu Shell         -0.64         36.96         69.61     -32.65
                 Rainoil Oniru         -0.61        589.64       1126.65    -537.01
           Fynefield Apo Abuja         -0.47        478.38        829.24    -350.85
      Rainoil Kano - Ring Road         -0.43         51.29         77.44     -26.15
                Rainoil Lokoja         -0.40         84.96        117.99     -33.03
Raonoil Kaduna - Unguwan Muazu         -0.40         57.50         82.40     -24.90
      Rainoil FCT - New Nyanya         -0.38         60.78         87.64     -26.86

10. Integrated Findings

How the Five Analyses Fit Together

The five techniques are not independent exercises — they form a progressive analytical chain, each building on the last to arrive at a single, defensible recommendation.

EDA established the structural facts: 193 stations, ₦122.6 billion Q4 revenue, severe right-skew (skewness = 4.36), HHI = 80.6, and two stations generating ₦90.6 million of daily inflow between them. Without EDA, the concentration risk would be invisible in aggregates.

Visualisation translated the EDA findings into decision-relevant pictures: the efficiency frontier named the stations extracting the most value per litre; the regional momentum chart showed South-South accelerating while FCT and the North contracted; the concentration bar made the HHI finding visceral; the AGO histogram made the seasonal policy tangible; and the residual leaderboard named the stations that need management attention today.

Hypothesis testing stress-tested two critical treasury assumptions: (1) that AGO demand grows meaningfully in Q4 — confirmed with statistical significance and positive effect size, validating the pre-positioning policy; and (2) that tier membership explains meaningful revenue differences — confirmed with ANOVA, establishing that uniform station treatment in treasury models is statistically indefensible.

Correlation analysis revealed the engine behind all of this: revenue is almost entirely a throughput function (r = 0.9995 with total volume). Price barely moves. This finding reframes the entire treasury forecasting problem — from a price-and-volume model to a pure supply-allocation model.

Regression quantified the throughput-revenue relationship at station level, translated coefficients into naira-denominated business cases, and produced the residual leaderboard that names specific stations for investment or remediation.

Technique Key Finding Single Action for Management
EDA + HHI Skewness = 4.36; HHI = 80.6; top-2 stations = ₦90.6M/day Hold ₦90.6M contingency buffer; designate Asaba Summit Junction and Ibafo as Tier-1 Liquidity-Critical Assets
Visualisation South-South accelerating; FCT/North contracting; named residual outliers Redirect Q1 2026 supply allocation toward South-South; escalate bottom-residual stations
Hypothesis Testing AGO growth significant (p < 0.05, Cohen’s d > 0); tier differences highly significant (η² material) Maintain AGO pre-positioning; adopt daily/weekly/monthly tiered monitoring
Correlation r = 0.9995 volume↔︎revenue; price variation < ₦70/litre; 25× volume variation Replace price-scenario models with Dangote allocation-based throughput forecasts
Regression R² = 0.82; PMS volume is dominant predictor; residual leaderboard produced PMS storage capex in 2026; bottom-10 residual stations to operations review

Single Recommendation: Replace the current reactive treasury posture — monitoring bank positions after inflows arrive — with a forward-looking throughput-based inflow forecast built on three inputs: (1) Dangote Refinery confirmed PMS allocation for the coming week, (2) prior month PMS throughput by station and tier, and (3) regional AGO demand trajectory. This three-variable system captures over 99% of inflow variance and gives treasury a predictive tool, not a rear-view mirror.


11. Limitations & Further Work

  1. Gross revenue, not net cash inflow: The dataset captures pump revenue before remittances, handling charges, and credit sales. A complete treasury model requires cost and receivables data not in scope here. Net cash available to treasury may be materially lower than the gross figures analysed.

  2. Three-month window: Q4 2025 is a single quarter. The AGO seasonality finding, while statistically significant, cannot be separated from a one-year directional trend without multi-year data. A 12-quarter panel would enable time-series decomposition of seasonal, trend, and station-fixed components, and would substantially increase the power of the hypothesis tests.

  3. Spatial and route effects unmodelled: Stations on high-traffic interstate corridors will systematically outperform the regression model’s predictions regardless of their product mix or pricing. A road-type or corridor dummy variable would reduce false-positive flags in the residual leaderboard and improve coefficient stability.

  4. Zero-volume ambiguity: Rainoil Ilorin – Sobi Road’s full-quarter closure was not verified against operational records. If additional stations experienced partial closures not reflected in the data, the regression model’s predictions for those station-months would be biased, inflating the apparent performance of neighbouring stations.

  5. Multicollinearity acknowledged: The VIF analysis confirmed moderate collinearity between pms_vol_k and pms_share. While this does not bias the residuals or R², it inflates standard errors for those two terms and means individual coefficients should not be interpreted in isolation. The overall model fit and leaderboard rankings are unaffected.

  6. Further work: The natural extension is a logistic regression or classification model that predicts the probability that a station’s monthly inflow falls below its treasury contribution threshold — converting this analysis from a descriptive post-mortem into a forward-looking early warning system. Adding monthly ARIMA or ETS forecasts for the top 10 liquidity contributors would further sharpen the funding plan.


References

Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making — from data fundamentals to machine learning in Python and R. Lagos Business School / markanalytics.online. https://markanalytics.online

Rainoil Limited. (2026). Q4 2025 retail station sales data — PMS, AGO, and DPK volumes and revenues [Internal data]. Finance & Treasury Department, Rainoil Limited, Lagos, Nigeria. Data available on request from the author subject to management approval.

R Core Team. (2024). R: A language and environment for statistical computing (Version 4.4). R Foundation for Statistical Computing. https://www.R-project.org/

Van Rossum, G., & Drake, F. L. (2009). Python 3 reference manual. CreateSpace.

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer. https://doi.org/10.1007/978-3-319-24277-4

McKinney, W. (2010). Data structures for statistical computing in Python. In Proceedings of the 9th Python in Science Conference (pp. 56–61). https://doi.org/10.25080/Majora-92bf1922-00a

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, R., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.

Seabold, S., & Perktold, J. (2010). Statsmodels: Econometric and statistical modeling with Python. In Proceedings of the 9th Python in Science Conference (pp. 92–96). https://doi.org/10.25080/Majora-92bf1922-011


Appendix: AI Usage Statement

Claude (Anthropic, claude.ai) was used to assist with (1) structuring the Quarto document template and YAML configuration, (2) generating and debugging R and Python code for statistical procedures (corrplot, bptest, leveneTest, TukeyHSD, smf.ols, pairwise_tukeyhsd, ggrepel), and (3) resolving rendering errors during document preparation. All analytical decisions were made independently by the author: the decision to apply a Herfindahl-Hirschman Index to quantify concentration risk, the selection of an efficiency frontier scatter as the primary visualisation frame, the construction of a named regression residual leaderboard as a management tool, the choice of a log-linear OLS specification with log-transformed outcome, the framing of both hypothesis tests and the interpretation of their results, and all business recommendations connecting statistical outputs to treasury operations and capital decisions. The six AGO demand drivers cited in Section 6 reflect the author’s professional knowledge of the Nigerian downstream sector and were not generated by AI. The dataset was extracted directly by the author from Rainoil’s Business Central ERP; no simulated, synthetic, or publicly downloaded data was used at any point.