Drivers of Liquidity Strength from Retail Station Performance: An Exploratory and Inferential Analysis in a Downstream Oil & Gas Treasury Function
Author
Juliet Okechukwu
Published
May 9, 2026
Executive Summary
This study examines how retail station performance translates into daily liquidity strength within the treasury function of a downstream oil and gas company. The business problem addressed is the difficulty of forecasting reliable cash inflows to fund depot loadings, supplier payments, salaries, and statutory obligations, all of which depend on revenue generated by the retail network.
A primary dataset was extracted from the company’s retail reporting system and Business Central ERP, covering 193 stations over October–December 2025 (579 station-month observations). Variables include PMS, AGO, and DPK volumes, realised prices, product mix, and computed revenues. Exploratory analysis revealed severe right-skewness in the revenue distribution, with two mega-stations contributing a disproportionate share of total inflows, indicating concentration risk. Hypothesis testing found statistically significant growth in AGO volumes across the period (one-sided t-test, p = 0.004). Correlation results showed that revenue is almost entirely volume-driven rather than price-driven. A regression model identified PMS volume as the strongest predictor of station revenue.
The study recommends that treasury base inflow forecasts primarily on throughput assumptions, prioritise PMS supply continuity and storage capacity, and monitor high-contributing stations as liquidity-critical assets to strengthen funding planning and liquidity buffers.
Organisation: Rainoil Limited, a downstream oil and gas company operating a retail fuel distribution network across Nigeria, encompassing petroleum product procurement, depot operations, and a network of 193 retail stations dispensing PMS (petrol), AGO (diesel), and DPK (kerosene).
My role in treasury is to ensure that sufficient cash is available daily to fund depot loadings, supplier settlements, salaries, taxes, and statutory obligations. While these decisions are executed at the bank position level, they are fundamentally driven by the revenue-generating capacity of the retail station network. Understanding how stations perform operationally is therefore essential for anticipating liquidity strength, planning funding requirements, and setting appropriate liquidity buffers.
Why these five techniques are operationally relevant to my role:
Exploratory Data Analysis (EDA): My role begins with an assessment of where cash is, where it needs to be, and whether the day’s inflows are sufficient to cover scheduled outflows. EDA formalises this instinct into a reproducible diagnostic routine. In this analysis, EDA on the Q4 2025 station sales data immediately surfaced two material issues. First, five stations reported zero sales volume in at least one month, which is ambiguous between genuine closure and a failure to submit. Second, the revenue distribution is severely right-skewed (skewness = 4.25), driven by two mega-stations, Asaba Summit Junction and Ibafo, whose combined Q4 revenue of approximately ₦8.25 billion is roughly 6.7% of the ₦122.6 billion network total, although they make up only about 1% of the 193 stations. In treasury terms, this concentration risk is directly relevant: any operational disruption at either site (a supply stock-out, a POS system failure, a regulatory hold) creates an immediate and material liquidity gap that the rest of the network cannot absorb in the short term.
Data Visualisation: Treasury reporting is not done for analysts; it is done for CFOs, MDs, and board treasury committees who need to make funding decisions in minutes, not hours. In this analysis, five coordinated plots told a single operational story: the monthly PMS versus AGO revenue trend showed both products growing into December, with PMS contributing the dominant share of the ₦122.6 billion quarterly total; the performance tier boxplot showed that High-tier stations carry extreme variance, meaning inflow forecasts built on averages are structurally unreliable; and the month-on-month growth distribution revealed that most stations grew from October to December, which has direct implications for Q1 2026 working capital sizing. These are precisely the visuals I would embed in a weekly treasury dashboard, not to decorate a report but to anchor a funding conversation with a specific number and a specific station.
Hypothesis Testing: Treasury decisions are frequently justified by assumptions that have never been tested. A common one in downstream operations is that AGO demand is seasonal: dry-season economic activity is assumed to drive diesel volumes upward in Q4, justifying pre-positioning of AGO inventory and the associated funding commitment. In this analysis, a formal one-sample t-test evaluated this assumption for Q4 2025: whether the mean station-level AGO volume growth from October to December was significantly greater than zero. The test reports a p-value alongside a Cohen’s d effect size, which helps distinguish growth that is broad-based from growth that merely reflects a few high-performing stations pulling the average up. The discipline of stating H₀ and H₁, checking the normality assumption via the Lilliefors test, and reporting effect size alongside the p-value is the same discipline required when I present a funding recommendation to the CFO: the conclusion must be defensible, not anecdotal.
Correlation Analysis: In treasury, the question is never simply “what happened?” but “what is driving it, and can I see it coming?” The correlation matrix in this analysis showed that total volume and total revenue are almost perfectly correlated across stations (r ≈ 0.99), confirming that revenue forecasting for this network is essentially a throughput forecasting problem: price variation across stations is narrow and contributes minimally to revenue differences. It also surfaced a negative association between average PMS price and volume sold, consistent with price elasticity in petrol retail. For treasury planning, the implication is direct: inflow projections should be built primarily on volume assumptions, not price assumptions, and any supply disruption that constrains PMS throughput, even at a handful of High-tier stations, will reduce cash inflows in a way that a price-based model would fail to capture.
Regression Analysis (linear, with a proposed logistic extension): The most consequential treasury decision I make on any given day is whether to draw on a credit facility and, if so, by how much and for how long. That decision is currently made on a combination of real-time bank position data and experience. In this analysis, a linear regression model was built to predict log-transformed station revenue from operational characteristics: PMS volume, AGO volume, average realised prices, product mix share, and month-on-month growth. Each significant coefficient was translated into a concrete operational action. The dominant positive coefficient on PMS volume, for instance, directly justifies prioritising PMS storage capacity investments in the 2026 capex plan, because storage constraints are a supply-side ceiling on the inflows that fund operations. Stations with large positive residuals were flagged as over-performers relative to their characteristics; those with large negative residuals were flagged for operational review. Extending this to a logistic regression framework, which would model the probability that a station’s inflows fall below its funding threshold in a given month based on its volume trajectory, product mix, and growth rate, would give the treasury function a forward-looking early warning system rather than the current reactive monitoring posture.
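The logistic extension described above can be sketched as follows. This is a minimal illustration on simulated data, not a model fitted in this study: the variable names (log_vol, growth, shortfall), the coefficients used to generate the data, and the 50% alert cut-off are all assumptions made for the example.

```r
# Hypothetical early-warning sketch: simulated data, not Rainoil records.
set.seed(2025)
n <- 579  # matches the station-month count, used only for scale

# Assumed standardised predictors (names are illustrative)
log_vol <- rnorm(n)   # standardised log throughput
growth  <- rnorm(n)   # standardised month-on-month growth

# Simulate a shortfall flag: lower volume and weaker growth raise the risk
p_true    <- plogis(-1 - 1.5 * log_vol - 0.5 * growth)
shortfall <- rbinom(n, size = 1, prob = p_true)

sim <- data.frame(shortfall, log_vol, growth)

# Logistic regression: P(inflows fall below the funding threshold)
fit <- glm(shortfall ~ log_vol + growth, data = sim, family = binomial())

# Predicted shortfall probability for each station-month
sim$p_hat <- predict(fit, type = "response")

# Flag observations above an assumed 50% alert cut-off
sim$alert <- sim$p_hat > 0.5

print(round(coef(fit), 2))
print(table(alert = sim$alert, shortfall = sim$shortfall))
```

In practice the alert cut-off would be set from the cost of a missed shortfall relative to the cost of an unnecessary facility drawdown, rather than fixed at 50%.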
Disclaimer: The following analysis is based on retail sales records for the period October to December 2025, covering 193 stations across the Rainoil network. Revenue figures are denominated in Nigerian Naira and reflect gross sales at the pump; they do not account for procurement costs, depot handling charges, or statutory remittances. Data accuracy is dependent on the reporting integrity of individual retail stations. Five stations were flagged during EDA for zero-volume anomalies and treated accordingly. Statistical models and hypothesis tests are intended for strategic and planning guidance only, and should be used in conjunction with current NNPC supply allocation data, CBN liquidity circulars, and broader downstream sector regulatory updates from the NMDPRA. No model output in this analysis constitutes a standalone funding recommendation.
Data Collection and Sampling
The dataset used in this analysis is a primary operational extract obtained directly from the company’s retail performance reporting system. The data captures monthly sales activity for 193 retail stations across Nigeria for the period October to December 2025. Each observation represents a station–month record containing product volumes (PMS, AGO, DPK), realised average selling prices, and computed revenue values.
With 193 stations observed across three months, the dataset contains 579 observations and more than 10 operational variables, exceeding the minimum requirement of 100 observations and 5 variables.
This data is not publicly available and was collected specifically for this study from internal reporting systems used by the finance and retail teams.
The Q4 2025 station performance dataset used in this study directly represents the source of the cash inflows that treasury depends on.
How the Data Were Collected (Methodology & Tools)
The data was extracted from the organisation’s retail performance monitoring system, which consolidates daily station sales submitted through the point-of-sale and back-office reporting platform into monthly summaries used by finance and treasury for performance monitoring.
The extraction and preparation used internal retail performance dashboards, Excel export tools from Business Central ERP, and Excel and RStudio for data cleaning and structuring. This mirrors the exact data source treasury relies on for monitoring daily inflows.
Sampling Frame
The sampling frame consists of all active retail stations within Rainoil’s network during Q4 2025. No sampling or filtering was applied. The dataset represents a complete census of the retail network rather than a subset. This eliminates sampling bias and ensures the analysis reflects the true operational drivers of cash inflows.
Sample Size and Statistical Rationale
193 stations × 3 months = 579 observations. This sample size is statistically robust for hypothesis testing (t-tests), correlation analysis, and regression modelling. The size ensures sufficient variability across station tiers, geography, and product mix to produce reliable statistical inference.
The large cross-sectional and temporal coverage improves the power and reliability of the analytical techniques applied.
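One way to make the sample-size claim concrete is a power calculation. The sketch below uses base R's power.t.test() for a one-sided, one-sample t-test with one observation per station. The standardised effect size of 0.2 (a conventionally "small" effect) is an assumption chosen for illustration, not a value taken from this study.

```r
# Station count from the dataset; the effect size is an assumption.
n_stations  <- 193
effect_size <- 0.2   # assumed "small" standardised effect (in SD units)

# Power of a one-sided, one-sample t-test at the 5% level
pw <- power.t.test(n = n_stations, delta = effect_size, sd = 1,
                   sig.level = 0.05, type = "one.sample",
                   alternative = "one.sided")

# Number of stations that would be needed for 80% power
n_req <- power.t.test(delta = effect_size, sd = 1, sig.level = 0.05,
                      power = 0.80, type = "one.sample",
                      alternative = "one.sided")

cat("Power at n = 193:", round(pw$power, 3), "\n")
cat("Stations needed for 80% power:", ceiling(n_req$n), "\n")
```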
Time Period Covered
The period October to December 2025 (Q4) was deliberately chosen because: (1) it represents peak downstream activity during the dry season; (2) it captures seasonal growth patterns in AGO demand; and (3) it is a critical period for year-end liquidity planning and funding decisions in treasury.
Ethical Considerations and Data Sharing Restrictions
Formal permission was obtained from management to use this operational dataset strictly for academic purposes in this examination. The analysis does not disclose commercially sensitive information beyond what is required for statistical interpretation. No customer-level or personally identifiable data is included. Station identities are used solely for analytical purposes and are not disclosed outside this academic exercise. Findings from this study will also be presented to management as part of ongoing efforts to strengthen treasury forecasting and liquidity planning using retail station performance data.
Data Description
Load Libraries
Code
library(readxl)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.2.1 ✔ readr 2.2.0
✔ forcats 1.0.1 ✔ stringr 1.6.0
✔ ggplot2 4.0.3 ✔ tibble 3.3.1
✔ lubridate 1.9.5 ✔ tidyr 1.3.2
✔ purrr 1.2.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Attaching package: 'scales'
The following object is masked from 'package:purrr':
discard
The following object is masked from 'package:readr':
col_factor
Loading required package: carData
Attaching package: 'car'
The following object is masked from 'package:dplyr':
recode
The following object is masked from 'package:purrr':
some
Code
library(lmtest) # bptest (Breusch-Pagan)
Loading required package: zoo
Attaching package: 'zoo'
The following objects are masked from 'package:base':
as.Date, as.Date.numeric
Code
library(nortest)       # lillie.test
library(patchwork)     # combine ggplots
library(ggrepel)       # non-overlapping text labels
library(RColorBrewer)  # palettes
library(effectsize)    # cohen's d, eta squared
Data Ingestion
Code
# NOTE: The raw Excel file has a multi-row merged header.
# Rows 1-2: blank/title; Row 3: group headers; Row 4: product sub-headers.
# Actual data starts at row 5 (Excel row index).
# We skip the first 3 rows and assign clean column names manually.
raw <- read_excel(
  "Q4_Sales_Data.xlsx",
  sheet = "2025 OCT NOV DEC",
  col_names = FALSE,
  skip = 3  # skip title + blank + merged group header rows
)
# Some stations report zero litres across ALL products for a month.
# This is operationally implausible for an open station; it likely
# indicates: (a) station temporarily closed, (b) data not submitted,
# or (c) data entry error (missed entry).
zero_stations <- df %>%
  filter(total_ltrs_oct == 0 | total_ltrs_nov == 0 | total_ltrs_dec == 0) %>%
  select(station, total_ltrs_oct, total_ltrs_nov, total_ltrs_dec, total_ltrs_all)
cat("Stations with at least one zero-volume month:\n")
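The zero-volume check above can be tried on a small stand-alone example. The sketch below uses a toy data frame in base R so that it runs without the study's data: the station names and volumes are invented, the column names follow the chunk above, and the rule that two or more zero months suggest closure while a single zero month suggests a missed submission is an assumption for illustration only.

```r
# Toy data: station names and volumes are invented for illustration
toy <- data.frame(
  station        = c("Station A", "Station B", "Station C", "Station D"),
  total_ltrs_oct = c(120000, 0, 95000, 40000),
  total_ltrs_nov = c(125000, 30000, 0, 42000),
  total_ltrs_dec = c(140000, 35000, 0, 45000)
)

month_cols <- c("total_ltrs_oct", "total_ltrs_nov", "total_ltrs_dec")

# Count zero-volume months per station
toy$zero_months <- rowSums(toy[month_cols] == 0)

# Keep any station with at least one zero-volume month
zero_stations <- toy[toy$zero_months > 0, ]

# Assumed triage rule: separate a single missing month from a sustained gap
zero_stations$pattern <- ifelse(zero_stations$zero_months >= 2,
                                "sustained gap: check for closure",
                                "single gap: check for missed submission")

print(zero_stations[, c("station", "zero_months", "pattern")])
```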
Test 1: Do High-Tier stations realise significantly higher PMS prices than Low-Tier stations?
Business logic: If high-volume stations negotiate better wholesale allocations, their realised price may differ due to less reliance on spot/expensive supply.
H0: Mean PMS price is the same for High-tier and Low-tier stations
H1: High-tier stations have a different mean PMS price (two-sided)
Welch Two Sample t-test
data: high_price and low_price
t = -1.8771, df = 65.065, p-value = 0.06499
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-9.8643081 0.3056444
sample estimates:
mean of x mean of y
914.2562 919.0355
if (t1$p.value < 0.05) {
  cat("Result: REJECT H0. High-tier and Low-tier stations charge statistically\n")
  cat("different average PMS prices (p < 0.05).\n")
  cat("Business action: High-volume stations have pricing leverage.\n")
  cat("Investigate whether High-tier stations absorb more of the subsidy\n")
  cat("differential or enjoy better NNPC direct supply allocation.\n")
} else {
  cat("Result: FAIL TO REJECT H0. No significant price difference detected.\n")
  cat("Business action: PMS pricing appears relatively uniform across tiers,\n")
  cat("suggesting a centrally controlled pricing policy at Rainoil.\n")
}
Result: FAIL TO REJECT H0. No significant price difference detected.
Business action: PMS pricing appears relatively uniform across tiers,
suggesting a centrally controlled pricing policy at Rainoil.
Test 2: Did AGO (diesel) volumes grow Oct→Dec at the network level? (One-sample t-test on growth rate)
Business logic: Q4 is the dry season and economic activity (farming, construction) ramps up, potentially boosting AGO demand.
Did AGO Volume Significantly Grow Oct→Dec?
H0: Mean network AGO growth rate (Oct→Dec) = 0 (no change)
H1: Mean network AGO growth rate > 0 (volumes grew)
Lilliefors (Kolmogorov-Smirnov) normality test
data: ago_growth
D = 0.3408, p-value < 2.2e-16
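The Lilliefors test above rejects normality (p < 2.2e-16), so the one-sample t-test that follows leans on the large-sample behaviour of the mean. A percentile bootstrap is one way to check whether the conclusion survives without the normality assumption. The sketch below is illustrative only: sim_growth is simulated right-skewed data standing in for ago_growth, and the number of resamples is an arbitrary choice; no bootstrap was reported in this study.

```r
# Simulated right-skewed growth rates standing in for ago_growth (assumption)
set.seed(42)
sim_growth <- rexp(187, rate = 1) - 0.55   # skewed, population mean 0.45

# Percentile bootstrap of the mean: resample stations with replacement
n_boot <- 2000
boot_means <- replicate(n_boot, mean(sample(sim_growth, replace = TRUE)))

# One-sided 95% lower bound, analogous to the one-sided t interval
lower_bound <- unname(quantile(boot_means, probs = 0.05))

cat("Sample mean:              ", round(mean(sim_growth), 3), "\n")
cat("Bootstrap 95% lower bound:", round(lower_bound, 3), "\n")
cat("Lower bound above zero:   ", lower_bound > 0, "\n")
```

If the bootstrap lower bound on the real growth rates were also above zero, the one-sided conclusion would not depend on the normality assumption.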
Code
# One-sample t-test (one-sided, mu = 0)
t2 <- t.test(ago_growth, mu = 0, alternative = "greater")
print(t2)
One Sample t-test
data: ago_growth
t = 2.6464, df = 186, p-value = 0.004417
alternative hypothesis: true mean is greater than 0
95 percent confidence interval:
0.1682728 Inf
sample estimates:
mean of x
0.448325
Code
# Effect size: Cohen's d (one-sample, comparing to mu=0)
d2 <- mean(ago_growth) / sd(ago_growth)
cat("\nCohen's d (one-sample vs. 0):", round(d2, 4), "\n")
if (t2$p.value < 0.05) {
  cat("Result: REJECT H0. AGO volumes did grow significantly from October to\n")
  cat("December (p < 0.05, one-sided test).\n")
  cat("Business action: Pre-position AGO inventory ahead of Q4 each year.\n")
  cat("Expand AGO storage capacity at high-growth stations ahead of dry season.\n")
} else {
  cat("Result: FAIL TO REJECT H0. AGO growth is not statistically significant.\n")
  cat("Business action: AGO demand is lumpy and station-specific rather than\n")
  cat("a network-wide seasonal trend. Target AGO promotions station-by-station.\n")
}
Result: REJECT H0. AGO volumes did grow significantly from October to
December (p < 0.05, one-sided test).
Business action: Pre-position AGO inventory ahead of Q4 each year.
Expand AGO storage capacity at high-growth stations ahead of dry season.
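The Cohen's d printed by the effect-size chunk is not shown in the output above, but for a one-sample test against zero it can be recovered from the reported t statistic and degrees of freedom, since t = mean / (sd / sqrt(n)) implies d = mean / sd = t / sqrt(n). The sketch below only re-uses numbers from the t-test output; the variable names are illustrative.

```r
# Values copied from the one-sample t-test output above
t_stat <- 2.6464
df     <- 186
n      <- df + 1          # a one-sample t-test has n - 1 degrees of freedom
mean_g <- 0.448325

# Cohen's d for a one-sample test against mu = 0
d_from_t <- t_stat / sqrt(n)

# Implied standard deviation of the station-level growth rates
sd_implied <- mean_g / d_from_t

cat("Cohen's d:", round(d_from_t, 3), "\n")
cat("Implied SD of growth:", round(sd_implied, 2), "\n")
```

A d of about 0.19 is small by Cohen's conventional benchmarks: the average growth is statistically detectable, but station-level growth rates vary widely around it, so the network mean is a weak guide to any individual station.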
Code
print(plot_corr)
Code
cat("\n--- Top Correlations & Business Implications ---\n\n")
cat(" Stations that charge slightly lower prices tend to move higher volumes,\n")
cat(" consistent with price elasticity in petrol retail. Micro-pricing strategy\n")
cat(" at competitive intersections could unlock latent volume.\n\n")
Stations that charge slightly lower prices tend to move higher volumes,
consistent with price elasticity in petrol retail. Micro-pricing strategy
at competitive intersections could unlock latent volume.
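The volume-driven pattern in the correlation results can be illustrated with a short simulation: when prices vary only narrowly across stations, revenue (volume × price) tracks volume almost one-for-one. All numbers below are assumptions chosen for illustration. The price level is loosely based on the tier means of roughly ₦914 to ₦919 reported in Test 1, while the ₦10 spread and the lognormal volume distribution are invented.

```r
# Simulated stations: all numbers are illustrative assumptions
set.seed(7)
n_stations <- 193
volume  <- rlnorm(n_stations, meanlog = 12, sdlog = 0.8)  # litres, right-skewed
price   <- rnorm(n_stations, mean = 915, sd = 10)         # naira per litre, narrow spread
revenue <- volume * price

# Pearson correlation matrix of the three variables
corr <- cor(cbind(volume, price, revenue))
print(round(corr, 3))

# Coefficient of variation: how much each driver varies relative to its mean
cv <- c(volume = sd(volume) / mean(volume), price = sd(price) / mean(price))
print(round(cv, 3))
```

Because the relative spread of price is tiny next to that of volume, almost all of the variation in revenue comes from volume, which is the pattern the correlation matrix reports.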
Linear Regression — Predicting Q4 Revenue
Scenario: Build a model to predict a station’s log-total Q4 revenue from its operational characteristics. The model will guide where to invest capacity in 2026.
# Use base R par(mfrow) for the 4-panel diagnostic grid
par(mfrow = c(2, 2), bg = "white", col.main = "#1a1a2e", font.main = 2)
suppressWarnings(
  plot(model, which = c(1, 2, 3, 4), col = "#4472C4", pch = 16, cex = 0.6, col.smooth = "#C0392B")
)
cat(" A 1% increase in PMS volume is associated with a ~", round(coef(model)["log_pms_vol"], 3), "% increase in total revenue. Action: PMS throughput is the single most powerful revenue lever. Direct supply allocation, nozzle counts, and PMS storage upgrades at mid-tier stations should be prioritised in the 2026 capex plan.\n")
A 1% increase in PMS volume is associated with a ~ 1.003 % increase in total revenue. Action: PMS throughput is the single most powerful revenue lever. Direct supply allocation, nozzle counts, and PMS storage upgrades at mid-tier stations should be prioritised in the 2026 capex plan.
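Because both revenue and PMS volume enter the model in logs, the coefficient on log_pms_vol is an elasticity: it gives the approximate percentage change in revenue for a 1% change in volume. The code and output above imply a coefficient of about 1.003. The sketch below shows the exact conversion for any percentage change; the helper name pct_effect_loglog is hypothetical and natural logs are assumed.

```r
# Percentage change in revenue implied by a log-log coefficient
# beta:  coefficient on log(volume)
# pct_x: percentage change in volume
pct_effect_loglog <- function(beta, pct_x) {
  ((1 + pct_x / 100)^beta - 1) * 100
}

beta_pms <- 1.003  # coefficient on log_pms_vol implied by the output above

effect_1pct  <- pct_effect_loglog(beta_pms, 1)    # 1% more PMS volume
effect_10pct <- pct_effect_loglog(beta_pms, 10)   # 10% more PMS volume

cat("Revenue change for +1% volume: ", round(effect_1pct, 3), "%\n")
cat("Revenue change for +10% volume:", round(effect_10pct, 2), "%\n")
```

An elasticity this close to 1 means revenue moves almost exactly in proportion to PMS throughput, holding the other predictors fixed.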
cat(" AGO (diesel) volume contributes meaningfully to revenue beyond PMS. Action: Stations near industrial clusters or major highways should be targeted for AGO storage expansion — AGO margins are typically higher and its demand is less elastic.\n")
AGO (diesel) volume contributes meaningfully to revenue beyond PMS. Action: Stations near industrial clusters or major highways should be targeted for AGO storage expansion — AGO margins are typically higher and its demand is less elastic.
Code
cat("3. pms_price_std ─ If significant: a 1 SD increase in PMS price\n")
3. pms_price_std ─ If significant: a 1 SD increase in PMS price
Code
cat(" corresponds to a ", round(coef(model)["pms_price_std"],4)," change in log-revenue. Action: Price setting is secondary to volume. Uniform pricing policy is appropriate — do not sacrifice volume for marginal price uplifts.\n")
corresponds to a 0.0118 change in log-revenue. Action: Price setting is secondary to volume. Uniform pricing policy is appropriate — do not sacrifice volume for marginal price uplifts.
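A change in log-revenue is easier to communicate as a percentage. For a predictor that is not logged, such as the standardised PMS price, a coefficient b corresponds to a revenue change of 100 × (exp(b) − 1) percent per one-unit (here, one standard deviation) increase. The sketch below applies this to the printed coefficient; the helper name pct_effect_loglevel is hypothetical and natural logs are assumed.

```r
# Percentage change in revenue for a one-unit change in a predictor
# when the response is log(revenue): 100 * (exp(b) - 1)
pct_effect_loglevel <- function(b) {
  (exp(b) - 1) * 100
}

b_price <- 0.0118  # coefficient on pms_price_std printed above

price_effect <- pct_effect_loglevel(b_price)
cat("Revenue change per 1 SD increase in PMS price:",
    round(price_effect, 2), "%\n")
```

At roughly 1.2% per standard deviation of price, the price effect is small next to the near one-for-one volume effect, which is consistent with the recommendation to treat price setting as secondary.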
Code
cat("4. growth_std ─ Oct->Dec growth as a revenue predictor\n")
4. growth_std ─ Oct->Dec growth as a revenue predictor
Code
cat(" Action: Stations with strong Q4 momentum are natural candidates for Q1 2026 throughput targets. Monitor the top-quartile growth stations for early capacity constraints.\n")
Action: Stations with strong Q4 momentum are natural candidates for Q1 2026 throughput targets. Monitor the top-quartile growth stations for early capacity constraints.