Drivers of Liquidity Strength from Retail Station Performance: An Exploratory and Inferential Analysis in a Downstream Oil & Gas Treasury Function

Author

Juliet Okechukwu

Published

May 9, 2026

Executive Summary

This study examines how retail station performance translates into daily liquidity strength within the treasury function of a downstream oil and gas company. The business problem addressed is the difficulty of forecasting reliable cash inflows to fund depot loadings, supplier payments, salaries, and statutory obligations, all of which depend on revenue generated by the retail network.

A primary dataset was extracted from the company’s retail reporting system and Business central ERP covering 193 stations over October–December 2025 (579 station-month observations). Variables include PMS, AGO, and DPK volumes, realised prices, product mix, and computed revenues. Exploratory analysis revealed severe right-skewness in revenue distribution, with two mega-stations contributing a disproportionate share of total inflows, indicating concentration risk. Hypothesis testing confirmed statistically significant growth in AGO volumes across the period. Correlation results showed revenue is almost entirely volume-driven rather than price-driven. A regression model quantified PMS volume as the strongest predictor of station revenue.

The study recommends that treasury base inflow forecasts primarily on throughput assumptions, prioritise PMS supply continuity and storage capacity, and monitor high-contributing stations as liquidity-critical assets to strengthen funding planning and liquidity buffers.

Professional Disclosure

Job Title: Finance Manager — Liquidity & Treasury Management

Organisation: Rainoil Limited. A downstream oil and gas company operating a retail fuel distribution network across Nigeria, encompassing petroleum product procurement, depot operations, and a network of 193 retail stations dispensing PMS (petrol), AGO (diesel), and DPK (kerosene).

My role in treasury is to ensure that sufficient cash is available daily to fund depot loadings, supplier settlements, salaries, taxes, and statutory obligations. While these decisions are executed at the bank position level, they are fundamentally driven by the revenue-generating capacity of the retail station network. Understanding how stations perform operationally is therefore essential for anticipating liquidity strength, planning funding requirements, and setting appropriate liquidity buffers.

Why these five techniques are operationally relevant to my role:

Exploratory Data Analysis (EDA): My role begins with an assessment of where cash is, where it needs to be, and whether the day’s inflows are sufficient to cover scheduled outflows. EDA formalises this instinct into a reproducible diagnostic routine. In this analysis, EDA on the Q4 2025 station sales data immediately surfaced two material quality issues: five stations reporting zero sales volume in at least one month, ambiguous between genuine closure and a failure to submit, and a severe right skew in revenue distribution (skewness = 4.25) driven by two mega-stations, Asaba Summit Junction and Ibafo, whose combined Q4 revenue of approximately ₦8.25 billion represented a disproportionate share of the ₦122.6 billion network total. In treasury terms, this concentration risk is directly relevant: if two stations account for that share of inflows, any operational disruption at either site, a supply stock-out, a POS system failure, a regulatory hold, creates an immediate and material liquidity gap that the rest of the network cannot absorb in the short term.

Data Visualisation: Treasury reporting is not done for analysts, it is done for CFOs, MDs, and board treasury committees who need to make funding decisions in minutes, not hours. In this analysis, five coordinated plots told a single operational story: the monthly PMS versus AGO revenue trend showed both products growing into December, with PMS contributing the dominant share of the ₦122.6 billion quarterly total; the performance tier boxplot showed that High-tier stations carry extreme variance, meaning inflow forecasts built on averages are structurally unreliable; and the month-on-month growth distribution revealed that most stations grew from October to December, which has direct implications for Q1 2026 working capital sizing. These are precisely the visuals I would embed in a weekly treasury dashboard — not to decorate a report, but to anchor a funding conversation with a specific number and a specific station.

Hypothesis Testing: Treasury decisions are frequently justified by assumptions that have never been tested. A common one in downstream operations is that AGO demand is seasonal, that dry-season economic activity reliably drives diesel volumes upward in Q4, justifying pre-positioning of AGO inventory and the associated funding commitment. In this analysis, a formal one-sample t-test evaluated exactly this assumption: whether the mean AGO volume growth from October to December across the network was significantly greater than zero. The test reported a p-value and a Cohen’s d effect size, distinguishing between a growth rate that is statistically real and one that is merely the result of a few high-performing stations pulling the average. The discipline of stating H₀ and H₁, checking the normality assumption via the Lilliefors test, and reporting effect size alongside the p-value is the same discipline required when I present a funding recommendation to the CFO — the conclusion must be defensible, not anecdotal.

Correlation Analysis: In treasury, the question is never simply “what happened?” but “what is driving it, and can I see it coming?” The correlation matrix in this analysis identified that total volume and total revenue are nearly perfectly collinear across stations (r ≈ 0.99), confirming that revenue forecasting for this network is essentially a throughput forecasting problem — price variation across stations is narrow and contributes minimally to revenue differences. It also surfaced a negative association between average PMS price and volume sold, consistent with price elasticity in petrol retail. For treasury planning, the implication is direct: inflow projections should be built primarily on volume assumptions, not price assumptions, and any supply disruption that constrains PMS throughput — even at a handful of High-tier stations, will have a compounding effect on cash inflows that a price-based model would fail to capture.

Logistic Regression: The most consequential treasury decision I make on any given day is whether to draw on a credit facility — and if so, by how much and for how long. That decision is currently made on a combination of real-time bank position data and experience. In this analysis, a linear regression model was built to predict log-transformed station revenue from operational characteristics: PMS volume, AGO volume, average realised prices, product mix share, and month-on-month growth. Each significant coefficient was translated into a concrete operational action, the dominant positive coefficient on PMS volume, for instance, directly justifies prioritising PMS storage capacity investments in the 2026 capex plan, because storage constraints are a supply-side ceiling on the inflows that fund operations. Stations with large positive residuals were flagged as over-performers relative to their characteristics; those with large negative residuals were flagged for operational review. Extending this to a logistic regression framework, modelling the probability that a station’s inflows fall below its funding threshold in a given month, based on its volume trajectory, product mix, and growth rate, would give the treasury function a forward-looking early warning system rather than the current reactive monitoring posture.

Disclaimer: The following analysis is based on retail sales records for the period October to December 2025, covering 193 stations across the Rainoil network. Revenue figures are denominated in Nigerian Naira and reflect gross sales at the pump; they do not account for procurement costs, depot handling charges, or statutory remittances. Data accuracy is dependent on the reporting integrity of individual retail stations. Five stations were flagged during EDA for zero-volume anomalies and treated accordingly. Statistical models and hypothesis tests are intended for strategic and planning guidance only, and should be used in conjunction with current NNPC supply allocation data, CBN liquidity circulars, and broader downstream sector regulatory updates from the NMDPRA. No model output in this analysis constitutes a standalone funding recommendation.

Data collection and Sampling

The dataset used in this analysis is a primary operational extract obtained directly from the company’s retail performance reporting system. The data captures monthly sales activity for 193 retail stations across Nigeria for the period October to December 2025. Each observation represents a station–month record containing product volumes (PMS, AGO, DPK), realised average selling prices, and computed revenue values.

With 193 stations observed across three months, the dataset contains 579 observations and more than 10 operational variables, exceeding the minimum requirement of 100 observations and 5 variables.

This data is not publicly available and was collected specifically for this study from internal reporting systems used by the finance and retail teams.

The Q4 2025 station performance dataset used in this study directly represents the source of the cash inflows that treasury depends on.

How the Data Were Collected (Methodology & Tools)

The data was extracted from the organisation’s retail performance monitoring system, which consolidates daily station sales submitted through the point-of-sale and back-office reporting platform into monthly summaries used by finance and treasury for performance monitoring.

The extraction was performed using:

Internal retail performance dashboards, Excel export tools from Business Central ERP, Data cleaning and structuring performed in Excel and RStudio for analysis. This mirrors the exact data source treasury relies on for monitoring daily inflows.

Sampling Frame

The sampling frame consists of all active retail stations within the Rainoil’s network during Q4 2025. No sampling or filtering was applied. The dataset represents a complete census of the retail network rather than a subset. This eliminates sampling bias and ensures the analysis reflects the true operational drivers of cash inflows.

Sample Size and Statistical Rationale

193 stations × 3 months = 579 observations This sample size is statistically robust for: Hypothesis testing (t-tests) Correlation analysis Regression modelling The size ensures sufficient variability across station tiers, geography, and product mix to produce reliable statistical inference.

The large cross-sectional and temporal coverage improves the power and reliability of the analytical techniques applied.

Time Period Covered

The period October to December 2025 (Q4) was deliberately chosen because:

It represents peak downstream activity during the dry season It captures seasonal growth patterns in AGO demand It is a critical period for year-end liquidity planning and funding decisions in treasury

Ethical Considerations and Data Sharing Restrictions

Formal permission was obtained from management to use this operational dataset strictly for academic purposes in this examination. The analysis does not disclose commercially sensitive information beyond what is required for statistical interpretation. No customer-level or personally identifiable data is included. Station identities are used solely for analytical purposes and are not disclosed outside this academic exercise. Findings from this study will also be presented to management as part of ongoing efforts to strengthen treasury forecasting and liquidity planning using retail station performance data.

Data Description

Load Libraries

Code
library(readxl)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.1     ✔ readr     2.2.0
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.3     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Code
library(ggplot2)
library(ggcorrplot)
library(scales)

Attaching package: 'scales'

The following object is masked from 'package:purrr':

    discard

The following object is masked from 'package:readr':

    col_factor
Code
library(moments)       # skewness / kurtosis
library(car)           # vif, qqPlot
Loading required package: carData

Attaching package: 'car'

The following object is masked from 'package:dplyr':

    recode

The following object is masked from 'package:purrr':

    some
Code
library(lmtest)        # bptest (Breusch-Pagan)
Loading required package: zoo

Attaching package: 'zoo'

The following objects are masked from 'package:base':

    as.Date, as.Date.numeric
Code
library(nortest)       # lillie.test
library(patchwork)     # combine ggplots
library(ggrepel)       # non-overlapping text labels
library(RColorBrewer)  # palettes
library(effectsize)    # cohen's d, eta squared

Data Ingestion

Code
# NOTE: The raw Excel file has a multi-row merged header.
# Rows 1-2: blank/title; Row 3: group headers; Row 4: product sub-headers.
# Actual data starts at row 5 (Excel row index).
# We skip the first 3 rows and assign clean column names manually.
 
raw <- read_excel(
  "Q4_Sales_Data.xlsx",
  sheet    = "2025 OCT NOV DEC",
  col_names = FALSE,
  skip      = 3        # skip title + blank + merged group header rows
)
New names:
• `` -> `...1`
• `` -> `...2`
• `` -> `...3`
• `` -> `...4`
• `` -> `...5`
• `` -> `...6`
• `` -> `...7`
• `` -> `...8`
• `` -> `...9`
• `` -> `...10`
• `` -> `...11`
• `` -> `...12`
• `` -> `...13`
• `` -> `...14`
• `` -> `...15`
• `` -> `...16`
• `` -> `...17`
• `` -> `...18`
• `` -> `...19`
• `` -> `...20`
• `` -> `...21`
• `` -> `...22`
• `` -> `...23`
• `` -> `...24`
• `` -> `...25`
• `` -> `...26`
• `` -> `...27`
• `` -> `...28`
• `` -> `...29`
Code
# Assign column names (29 columns)
col_names <- c(
  "sn", "station",
  "pms_oct_ltrs", "ago_oct_ltrs", "dpk_oct_ltrs",
  "pms_nov_ltrs", "ago_nov_ltrs", "dpk_nov_ltrs",
  "pms_dec_ltrs", "ago_dec_ltrs", "dpk_dec_ltrs",
  "pms_oct_price","ago_oct_price","dpk_oct_price",
  "pms_nov_price","ago_nov_price","dpk_nov_price",
  "pms_dec_price","ago_dec_price","dpk_dec_price",
  "pms_oct_naira","ago_oct_naira","dpk_oct_naira",
  "pms_nov_naira","ago_nov_naira","dpk_nov_naira",
  "pms_dec_naira","ago_dec_naira","dpk_dec_naira"
)
colnames(raw) <- col_names
 
# Drop the sub-header row (row 1 of raw after skip = product labels row)
# and the trailing grand-total row
df <- raw %>%
  slice(-1) %>%                          # remove "PMS / AGO / DPK" label row
  filter(!is.na(sn), sn != "S/N") %>%   # drop any header remnants
  slice_head(n = 193)                    # keep exactly 193 stations
 
# Clean station names
df$station <- trimws(df$station)
 
# Coerce all numeric columns
num_cols <- col_names[-(1:2)]
df <- df %>%
  mutate(across(all_of(num_cols), ~ suppressWarnings(as.numeric(.))))

Feature Engineering

Code
df <- df %>%
  mutate(
    # Total volume (litres) by month
    total_ltrs_oct = pms_oct_ltrs + ago_oct_ltrs + dpk_oct_ltrs,
    total_ltrs_nov = pms_nov_ltrs + ago_nov_ltrs + dpk_nov_ltrs,
    total_ltrs_dec = pms_dec_ltrs + ago_dec_ltrs + dpk_dec_ltrs,
    total_ltrs_all = total_ltrs_oct + total_ltrs_nov + total_ltrs_dec,
 
    # Total revenue (Naira) by month
    total_rev_oct  = pms_oct_naira + ago_oct_naira + dpk_oct_naira,
    total_rev_nov  = pms_nov_naira + ago_nov_naira + dpk_nov_naira,
    total_rev_dec  = pms_dec_naira + ago_dec_naira + dpk_dec_naira,
    total_rev_all  = total_rev_oct + total_rev_nov + total_rev_dec,
 
    # Product totals across Q4
    pms_total_ltrs = pms_oct_ltrs + pms_nov_ltrs + pms_dec_ltrs,
    ago_total_ltrs = ago_oct_ltrs + ago_nov_ltrs + ago_dec_ltrs,
    dpk_total_ltrs = dpk_oct_ltrs + dpk_nov_ltrs + dpk_dec_ltrs,
 
    # PMS share of volume (dominant product)
    pms_share = pms_total_ltrs / (total_ltrs_all + 1e-9),
 
    # Month-on-month growth Oct → Dec (litres)
    mom_growth = (total_ltrs_dec - total_ltrs_oct) /
                   (total_ltrs_oct + 1e-9),
 
    # Revenue in millions (Naira)
    rev_all_M = total_rev_all / 1e6,
 
    # Average realised PMS price across Q4
    avg_pms_price = (pms_oct_price + pms_nov_price + pms_dec_price) / 3,
 
    # Average realised AGO price across Q4
    avg_ago_price = (ago_oct_price + ago_nov_price + ago_dec_price) / 3,
 
    # Performance tier (volume-based)
    tier = case_when(
      total_ltrs_all >= quantile(total_ltrs_all, 0.75, na.rm = TRUE) ~ "High",
      total_ltrs_all >= quantile(total_ltrs_all, 0.25, na.rm = TRUE) ~ "Mid",
      TRUE                                                             ~ "Low"
    ),
    tier = factor(tier, levels = c("Low", "Mid", "High")),
 
    # Log transform for skewed variables
    log_total_ltrs = log1p(total_ltrs_all),
    log_total_rev  = log1p(total_rev_all)
  )
 
cat(" DATASET OVERVIEW \n")
 DATASET OVERVIEW 
Code
cat("Stations:", nrow(df), "\n")
Stations: 193 
Code
cat("Columns :", ncol(df), "\n")
Columns : 48 
Code
cat("Period  : Oct – Dec 2025 (Q4)\n")
Period  : Oct – Dec 2025 (Q4)
Code
cat("Network : Rainoil Petroleum Retail Stations\n\n")
Network : Rainoil Petroleum Retail Stations

Exploratory Data Analysis (EDA)

Code
cat("=== A. EDA — DESCRIPTIVE STATISTICS ===\n\n")
=== A. EDA — DESCRIPTIVE STATISTICS ===
Code
# Core descriptive statistics
df %>%
  select(total_ltrs_all, total_rev_all, pms_total_ltrs,
         ago_total_ltrs, dpk_total_ltrs, pms_share,
         avg_pms_price, avg_ago_price) %>%
  summary() %>%
  print()
 total_ltrs_all    total_rev_all       pms_total_ltrs    ago_total_ltrs   
 Min.   :      0   Min.   :0.000e+00   Min.   :      0   Min.   :      0  
 1st Qu.: 391776   1st Qu.:3.636e+08   1st Qu.: 334821   1st Qu.:  30903  
 Median : 615767   Median :5.707e+08   Median : 532223   Median :  57044  
 Mean   : 680785   Mean   :6.352e+08   Mean   : 592849   Mean   :  87202  
 3rd Qu.: 821714   3rd Qu.:7.704e+08   3rd Qu.: 732730   3rd Qu.:  97980  
 Max.   :4348799   Max.   :4.180e+09   Max.   :3211117   Max.   :1726223  
 dpk_total_ltrs      pms_share      avg_pms_price   avg_ago_price   
 Min.   :    0.0   Min.   :0.0000   Min.   :163.0   Min.   : 194.4  
 1st Qu.:    0.0   1st Qu.:0.8476   1st Qu.:908.9   1st Qu.:1062.9  
 Median :    0.0   Median :0.9041   Median :910.7   Median :1065.5  
 Mean   :  734.3   Mean   :0.8779   Mean   :906.0   Mean   :1054.1  
 3rd Qu.:    0.0   3rd Qu.:0.9328   3rd Qu.:919.5   3rd Qu.:1073.0  
 Max.   :17875.0   Max.   :1.0000   Max.   :957.0   Max.   :1083.1  
Code
cat("\n--- Distribution Moments ---\n")

--- Distribution Moments ---
Code
volume_stats <- df %>%
  summarise(
    Mean_Ltrs    = mean(total_ltrs_all, na.rm = TRUE),
    Median_Ltrs  = median(total_ltrs_all, na.rm = TRUE),
    SD_Ltrs      = sd(total_ltrs_all, na.rm = TRUE),
    Skewness     = skewness(total_ltrs_all, na.rm = TRUE),
    Kurtosis     = kurtosis(total_ltrs_all, na.rm = TRUE),
    IQR_Ltrs     = IQR(total_ltrs_all, na.rm = TRUE)
  )
print(volume_stats)
# A tibble: 1 × 6
  Mean_Ltrs Median_Ltrs SD_Ltrs Skewness Kurtosis IQR_Ltrs
      <dbl>       <dbl>   <dbl>    <dbl>    <dbl>    <dbl>
1   680785.      615767 503354.     4.22     30.2   429938

Data Quality Issue

Zero-Volume Stations

Code
# Some stations report zero litres across ALL products for a month.
# This is operationally implausible for an open station — it likely
# indicates: (a) station temporarily closed, (b) data not submitted,
# or (c) data entry error (missed entry).
 
zero_stations <- df %>%
  filter(total_ltrs_oct == 0 | total_ltrs_nov == 0 | total_ltrs_dec == 0) %>%
  select(station, total_ltrs_oct, total_ltrs_nov, total_ltrs_dec, total_ltrs_all)
 
cat("Stations with at least one zero-volume month:\n")
Stations with at least one zero-volume month:
Code
print(zero_stations, n = Inf)
# A tibble: 4 × 5
  station            total_ltrs_oct total_ltrs_nov total_ltrs_dec total_ltrs_all
  <chr>                       <dbl>          <dbl>          <dbl>          <dbl>
1 Rainoil Portharco…              0              0         105021         105021
2 Rainoil Ilorin - …              0              0              0              0
3 Rainoil Uselu She…              0          35195          46257          81452
4 Rainoil Ughelli P…              0              0         322914         322914
Code
cat("\nResolution Strategy:\n")

Resolution Strategy:
Code
cat("  • For stations with zero in ONE month but non-zero in others → treat as\n")
  • For stations with zero in ONE month but non-zero in others → treat as
Code
cat("    'temporary closure / data gap'. Flag with is_intermittent = TRUE.\n")
    'temporary closure / data gap'. Flag with is_intermittent = TRUE.
Code
cat("  • For stations with zero across ALL 3 months → exclude from modelling\n")
  • For stations with zero across ALL 3 months → exclude from modelling
Code
cat("    (Rainoil Uselu Shell & Rainoil Ughelli Post Office) as they appear\n")
    (Rainoil Uselu Shell & Rainoil Ughelli Post Office) as they appear
Code
cat("    to be non-operational during Q4 2025.\n")
    to be non-operational during Q4 2025.
Code
cat("  • We retain them in descriptive counts but exclude from regression.\n\n")
  • We retain them in descriptive counts but exclude from regression.
Code
df <- df %>%
  mutate(
    is_intermittent  = (total_ltrs_oct == 0 | total_ltrs_nov == 0 |
                          total_ltrs_dec == 0) & total_ltrs_all > 0,
    is_zero_all      = total_ltrs_all == 0
  )
 
cat("Intermittent stations flagged:", sum(df$is_intermittent), "\n")
Intermittent stations flagged: 3 
Code
cat("Fully inactive stations excluded from modelling:", sum(df$is_zero_all), "\n")
Fully inactive stations excluded from modelling: 1 

Extreme Outliers / Skewness

Code
cat("Skewness of total quarterly volume:", round(skewness(df$total_ltrs_all), 3), "\n")
Skewness of total quarterly volume: 4.218 
Code
cat("(Skewness > 1.0 is typically considered substantially skewed)\n\n")
(Skewness > 1.0 is typically considered substantially skewed)
Code
# IQR-based outlier detection
Q1 <- quantile(df$total_ltrs_all, 0.25, na.rm = TRUE)
Q3 <- quantile(df$total_ltrs_all, 0.75, na.rm = TRUE)
iqr_val <- Q3 - Q1
upper_fence <- Q3 + 3 * iqr_val  # extreme outlier fence
 
outlier_stations <- df %>%
  filter(total_ltrs_all > upper_fence) %>%
  select(station, total_ltrs_all, total_rev_all, tier)
 
cat("IQR-based extreme outlier fence (3×IQR):",
    format(round(upper_fence), big.mark = ","), "litres\n\n")
IQR-based extreme outlier fence (3×IQR): 2,111,528 litres
Code
cat("Stations above fence (extreme high-volume):\n")
Stations above fence (extreme high-volume):
Code
print(outlier_stations)
# A tibble: 2 × 4
  station                         total_ltrs_all total_rev_all tier 
  <chr>                                    <dbl>         <dbl> <fct>
1 Rainoil Ibafo                          4344378   4067181671. High 
2 Rainoil Asaba - Summit Junction        4348799   4179830842. High 
Code
cat("\nResolution Strategy:\n")

Resolution Strategy:
Code
cat("  • Outliers are REAL mega-stations (e.g., Asaba Summit Junction,\n")
  • Outliers are REAL mega-stations (e.g., Asaba Summit Junction,
Code
cat("    Ibafo) — not data errors. They should NOT be deleted.\n")
    Ibafo) — not data errors. They should NOT be deleted.
Code
cat("  • Instead, we apply log1p() transformation for regression and\n")
  • Instead, we apply log1p() transformation for regression and
Code
cat("    correlation analysis to compress the scale and reduce leverage.\n")
    correlation analysis to compress the scale and reduce leverage.
Code
cat("  • In plots we annotate outliers separately rather than hiding them.\n\n")
  • In plots we annotate outliers separately rather than hiding them.
Code
# Confirm skewness improvement after log transform
cat("Skewness after log1p transform:", round(skewness(df$log_total_ltrs), 3), "\n")
Skewness after log1p transform: -8.511 

Visualization Suite

From Litres to Naira — How Rainoil’s 193 Stations Performed Across the Q4 2025 Quarter

Code
# ── THEME SETUP (white background, light grid) ───────────────
rainoil_theme <- theme_minimal(base_size = 11) +
  theme(
    plot.background   = element_rect(fill = "white", color = NA),
    panel.background  = element_rect(fill = "white", color = NA),
    panel.grid.major  = element_line(color = "grey90"),
    panel.grid.minor  = element_blank(),
    plot.title        = element_text(face = "bold", size = 12,
                                     color = "#1a1a2e"),
    plot.subtitle     = element_text(size = 9, color = "grey40"),
    axis.title        = element_text(size = 9, color = "grey30"),
    axis.text         = element_text(size = 8),
    legend.background = element_rect(fill = "white"),
    strip.text        = element_text(face = "bold", size = 9)
  )
 
tier_cols <- c("Low" = "#e07b54", "Mid" = "#f4c06f", "High" = "#4caf7d")

Plot 1: Distribution of Q4 Volume (raw vs log)

Code
p1a <- ggplot(df, aes(x = total_ltrs_all / 1e6)) +
  geom_histogram(bins = 35, fill = "#4472C4", color = "white", alpha = 0.85) +
  geom_vline(aes(xintercept = mean(total_ltrs_all / 1e6, na.rm = TRUE)),
             color = "#C0392B", linetype = "dashed", linewidth = 0.8) +
  annotate("text", x = mean(df$total_ltrs_all/1e6) + 0.15, y = 40,
           label = paste0("Mean\n", round(mean(df$total_ltrs_all/1e6), 2), "M L"),
           color = "#C0392B", size = 2.8) +
  labs(title = "Raw Volume — Heavy Right Skew",
       subtitle = paste0("Skewness = ", round(skewness(df$total_ltrs_all), 2)),
       x = "Total Litres Sold (Millions)", y = "Station Count") +
  rainoil_theme
 
p1b <- ggplot(df, aes(x = log_total_ltrs)) +
  geom_histogram(bins = 30, fill = "#27AE60", color = "white", alpha = 0.85) +
  labs(title = "Log-Transformed Volume — Near-Normal",
       subtitle = paste0("Skewness after log1p = ",
                         round(skewness(df$log_total_ltrs), 2)),
       x = "log1p(Total Litres Sold)", y = "Station Count") +
  rainoil_theme
 
plot1 <- (p1a | p1b) +
  plot_annotation(
    title = "PLOT 1 — Volume Distribution: Raw vs Log-Transformed",
    subtitle = "Log transformation corrects severe right skew caused by a few mega-stations",
    theme = theme(plot.background = element_rect(fill = "white", color = NA),
                  plot.title = element_text(face = "bold", size = 13, color = "#1a1a2e"),
                  plot.subtitle = element_text(size = 9, color = "grey45"))
  )
plot1

Plot 2: Monthly Revenue Trend — PMS vs AGO

Code
monthly_agg <- df %>%
  filter(!is_zero_all) %>%
  summarise(
    PMS_Oct  = sum(pms_oct_naira,  na.rm = TRUE) / 1e9,
    PMS_Nov  = sum(pms_nov_naira,  na.rm = TRUE) / 1e9,
    PMS_Dec  = sum(pms_dec_naira,  na.rm = TRUE) / 1e9,
    AGO_Oct  = sum(ago_oct_naira,  na.rm = TRUE) / 1e9,
    AGO_Nov  = sum(ago_nov_naira,  na.rm = TRUE) / 1e9,
    AGO_Dec  = sum(ago_dec_naira,  na.rm = TRUE) / 1e9
  ) %>%
  pivot_longer(everything(), names_to = "key", values_to = "rev_bn") %>%
  separate(key, into = c("product", "month"), sep = "_") %>%
  mutate(month = factor(month, levels = c("Oct","Nov","Dec")))
 
plot2 <- ggplot(monthly_agg, aes(x = month, y = rev_bn,
                                  color = product, group = product)) +
  geom_line(linewidth = 1.4) +
  geom_point(size = 3.5, shape = 21, fill = "white", stroke = 1.8) +
  geom_text(aes(label = paste0("₦", round(rev_bn, 1), "B")),
            vjust = -0.9, size = 3, show.legend = FALSE) +
  scale_color_manual(values = c("PMS" = "#4472C4", "AGO" = "#ED7D31"),
                     labels = c("AGO (Diesel)", "PMS (Petrol)")) +
  scale_y_continuous(labels = label_number(suffix = "B", prefix = "₦")) +
  expand_limits(y = c(0, max(monthly_agg$rev_bn) * 1.15)) +
  labs(
    title    = "PLOT 2 — Monthly Revenue Trend: PMS vs AGO (Q4 2025)",
    subtitle = "PMS is the dominant revenue driver; both products grow into December",
    x = "Month", y = "Network Revenue (₦ Billion)", color = "Product"
  ) +
  rainoil_theme +
  theme(plot.background = element_rect(fill = "white", color = NA))
plot2

Plot 3: Volume by Tier (Box plot + Jitter)

Code
plot3 <- df %>%
  filter(!is_zero_all) %>%
  ggplot(aes(x = tier, y = total_ltrs_all / 1e6, fill = tier)) +
  geom_boxplot(alpha = 0.7, outlier.shape = NA, width = 0.5) +
  geom_jitter(aes(color = tier), width = 0.18, size = 1.2,
              alpha = 0.55, show.legend = FALSE) +
  geom_text_repel(
    data = df %>% filter(total_ltrs_all > quantile(total_ltrs_all, 0.97, na.rm=T)),
    aes(label = str_remove(station, "Rainoil "), color = tier),
    size = 2.6, max.overlaps = 10, show.legend = FALSE,
    box.padding = 0.4
  ) +
  scale_fill_manual(values  = tier_cols, guide = "none") +
  scale_color_manual(values = tier_cols, guide = "none") +
  scale_y_continuous(labels = label_number(suffix = "M L")) +
  labs(
    title    = "PLOT 3 — Q4 Volume Distribution by Performance Tier",
    subtitle = "High-tier stations show extreme spread — a few outliers drive disproportionate volume",
    x = "Performance Tier", y = "Total Volume Sold (Million Litres)"
  ) +
  rainoil_theme +
  theme(plot.background = element_rect(fill = "white", color = NA))
plot3

Plot 4: PMS Price vs Volume Scatter (revenue bubble)

Code
plot4 <- df %>%
  filter(!is_zero_all, avg_pms_price > 0) %>%
  ggplot(aes(x = avg_pms_price,
             y = pms_total_ltrs / 1e6,
             size  = rev_all_M,
             color = tier)) +
  geom_point(alpha = 0.65) +
  geom_smooth(method = "lm", se = TRUE, color = "grey40",
              linewidth = 0.7, linetype = "dashed", inherit.aes = FALSE,
              aes(x = avg_pms_price, y = pms_total_ltrs / 1e6)) +
  geom_text_repel(
    data = . %>% filter(pms_total_ltrs > quantile(pms_total_ltrs, 0.95)),
    aes(label = str_remove(station, "Rainoil ")),
    size = 2.5, max.overlaps = 8, show.legend = FALSE
  ) +
  scale_color_manual(values = tier_cols) +
  scale_size_continuous(range = c(1.5, 10), labels = label_comma(suffix = "M"),
                        name = "Total Rev (₦M)") +
  scale_x_continuous(labels = label_comma(prefix = "₦")) +
  scale_y_continuous(labels = label_number(suffix = "M L")) +
  labs(
    title    = "PLOT 4 — PMS Price vs PMS Volume: Do Cheaper Stations Sell More?",
    subtitle = "Bubble size = total Q4 revenue. Slight negative slope — lower price stations tend to move more litres",
    x = "Average Realised PMS Price (₦/litre)", y = "PMS Volume Sold (Million Litres)",
    color = "Tier"
  ) +
  rainoil_theme +
  theme(plot.background = element_rect(fill = "white", color = NA))
plot4
`geom_smooth()` using formula = 'y ~ x'

Plot 5: Month-on-Month Growth Distribution

Code
plot5 <- df %>%
  filter(!is_zero_all, !is_intermittent) %>%
  mutate(growth_pct = mom_growth * 100) %>%
  ggplot(aes(x = growth_pct, fill = tier)) +
  geom_histogram(bins = 40, color = "white", alpha = 0.85,
                 position = "identity") +
  geom_vline(xintercept = 0, color = "black", linetype = "solid",
             linewidth = 0.8) +
  facet_wrap(~tier, ncol = 1, scales = "free_y") +
  scale_fill_manual(values = tier_cols, guide = "none") +
  scale_x_continuous(labels = label_percent(scale = 1)) +
  labs(
    title    = "PLOT 5 — Month-on-Month Growth (Oct→Dec) by Station Tier",
    subtitle = "Most stations grew; High-tier stations show greatest variance in growth rates",
    x = "Volume Growth Oct→Dec (%)", y = "Station Count"
  ) +
  rainoil_theme +
  theme(plot.background = element_rect(fill = "white", color = NA))
plot5

Hypothesis Testing

Code
# Analytical subset: exclude zero-all and intermittent stations
df_active <- df %>% filter(!is_zero_all, !is_intermittent)

Test 1: Do High-Tier stations realise significantly higher ─ PMS prices than Low-Tier stations?

Business logic: If high-volume stations negotiate better wholesale allocations, their realised price may differ due to less reliance on spot/expensive supply.

H0: Mean PMS price is the same for High-tier and Low-tier stations H1: High-tier stations have a different mean PMS price (two-sided)

Code
high_price <- df_active %>% filter(tier == "High") %>% pull(avg_pms_price) %>%
                na.omit()
low_price  <- df_active %>% filter(tier == "Low")  %>% pull(avg_pms_price) %>%
                na.omit()
 
cat("High-tier n =", length(high_price), " | Mean =", round(mean(high_price), 2), "\n")
High-tier n = 49  | Mean = 914.26 
Code
cat("Low-tier  n =", length(low_price),  " | Mean =", round(mean(low_price), 2), "\n\n")
Low-tier  n = 44  | Mean = 919.04 
Code
# Assumption 1: Normality (Lilliefors test — robust for real data)
cat("Assumption check — Normality (Lilliefors test):\n")
Assumption check — Normality (Lilliefors test):
Code
print(lillie.test(high_price))

    Lilliefors (Kolmogorov-Smirnov) normality test

data:  high_price
D = 0.18269, p-value = 0.0002948
Code
print(lillie.test(low_price))

    Lilliefors (Kolmogorov-Smirnov) normality test

data:  low_price
D = 0.21766, p-value = 1.64e-05
Code
# Assumption 2: Equal variance (Levene's test)
cat("\nAssumption check — Equal Variance (Levene's test):\n")

Assumption check — Equal Variance (Levene's test):
Code
price_df <- df_active %>%
  filter(tier %in% c("High","Low"), !is.na(avg_pms_price)) %>%
  select(avg_pms_price, tier) %>%
  mutate(tier = droplevels(tier))
print(leveneTest(avg_pms_price ~ tier, data = price_df))
Levene's Test for Homogeneity of Variance (center = median)
      Df F value  Pr(>F)  
group  1  6.4967 0.01248 *
      91                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
# Welch t-test (does not assume equal variance)
t1 <- t.test(high_price, low_price, var.equal = FALSE)
print(t1)

    Welch Two Sample t-test

data:  high_price and low_price
t = -1.8771, df = 65.065, p-value = 0.06499
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -9.8643081  0.3056444
sample estimates:
mean of x mean of y 
 914.2562  919.0355 
Code
# Effect size: Cohen's d
d1 <- cohens_d(avg_pms_price ~ tier, data = price_df)
cat("\nCohen's d:", round(d1$Cohens_d, 4), "(", d1$Magnitude, ")\n")

Cohen's d: 0.4014 ( )

Interpretation

Code
cat("p-value =", round(t1$p.value, 4), "\n")
p-value = 0.065 
Code
if (t1$p.value < 0.05) {
  cat("Result: REJECT H0. High-tier and Low-tier stations charge statistically\n")
  cat("different average PMS prices (p < 0.05).\n")
  cat("Business action: High-volume stations have pricing leverage.\n")
  cat("Investigate whether High-tier stations absorb more of the subsidy\n")
  cat("differential or enjoy better NNPC direct supply allocation.\n")
} else {
  cat("Result: FAIL TO REJECT H0. No significant price difference detected.\n")
  cat("Business action: PMS pricing appears relatively uniform across tiers,\n")
  cat("suggesting a centrally controlled pricing policy at Rainoil.\n")
}
Result: FAIL TO REJECT H0. No significant price difference detected.
Business action: PMS pricing appears relatively uniform across tiers,
suggesting a centrally controlled pricing policy at Rainoil.

Test 2: Did AGO (diesel) volumes grow Oct→Dec at the network level? (One-sample t-test on growth rate)

Business logic: Q4 is the dry season and economic activity (farming, construction) ramps up, potentially boosting AGO demand.

Did AGO Volume Significantly Grow Oct→Dec? H0: Mean network AGO growth rate (Oct→Dec) = 0 (no change) H1: Mean network AGO growth rate > 0 (volumes grew)

Code
ago_growth <- df_active %>%
  filter(ago_oct_ltrs > 0) %>%
  mutate(ago_growth_rate = (ago_dec_ltrs - ago_oct_ltrs) /
                             (ago_oct_ltrs + 1e-9)) %>%
  pull(ago_growth_rate) %>%
  na.omit()
 
cat("Sample n =", length(ago_growth), "\n")
Sample n = 187 
Code
cat("Mean growth =", round(mean(ago_growth) * 100, 2), "%\n")
Mean growth = 44.83 %
Code
cat("Median growth =", round(median(ago_growth) * 100, 2), "%\n\n")
Median growth = 23.13 %
Code
cat("Assumption check — Normality:\n")
Assumption check — Normality:
Code
print(lillie.test(ago_growth))

    Lilliefors (Kolmogorov-Smirnov) normality test

data:  ago_growth
D = 0.3408, p-value < 2.2e-16
Code
# One-sample t-test (one-sided, mu = 0)
t2 <- t.test(ago_growth, mu = 0, alternative = "greater")
print(t2)

    One Sample t-test

data:  ago_growth
t = 2.6464, df = 186, p-value = 0.004417
alternative hypothesis: true mean is greater than 0
95 percent confidence interval:
 0.1682728       Inf
sample estimates:
mean of x 
 0.448325 
Code
# Effect size: Cohen's d (one-sample, comparing to mu=0)
d2 <- mean(ago_growth) / sd(ago_growth)
cat("\nCohen's d (one-sample vs. 0):", round(d2, 4), "\n")

Cohen's d (one-sample vs. 0): 0.1935 
Code
cat("Interpretation: |d| < 0.2 = small, 0.2-0.5 = medium, > 0.5 = large\n")
Interpretation: |d| < 0.2 = small, 0.2-0.5 = medium, > 0.5 = large

Interpretation

Code
cat("p-value =", round(t2$p.value, 4), "\n")
p-value = 0.0044 
Code
if (t2$p.value < 0.05) {
  cat("Result: REJECT H0. AGO volumes did grow significantly from October to\n")
  cat("December (p < 0.05, one-sided test).\n")
  cat("Business action: Pre-position AGO inventory ahead of Q4 each year.\n")
  cat("Expand AGO storage capacity at high-growth stations ahead of dry season.\n")
} else {
  cat("Result: FAIL TO REJECT H0. AGO growth is not statistically significant.\n")
  cat("Business action: AGO demand is lumpy and station-specific rather than\n")
  cat("a network-wide seasonal trend. Target AGO promotions station-by-station.\n")
}
Result: REJECT H0. AGO volumes did grow significantly from October to
December (p < 0.05, one-sided test).
Business action: Pre-position AGO inventory ahead of Q4 each year.
Expand AGO storage capacity at high-growth stations ahead of dry season.

Correlation Matrix + Heat Map

Code
# Select key numeric variables for correlation
cor_vars <- df_active %>%
  filter(avg_pms_price > 0, avg_ago_price > 0) %>%
  transmute(
    Log_Total_Ltrs  = log1p(total_ltrs_all),
    Log_Total_Rev   = log1p(total_rev_all),
    PMS_Share_Pct   = pms_share * 100,
    Avg_PMS_Price   = avg_pms_price,
    Avg_AGO_Price   = avg_ago_price,
    PMS_Vol_M       = pms_total_ltrs / 1e6,
    AGO_Vol_M       = ago_total_ltrs / 1e6,
    MoM_Growth_Pct  = mom_growth * 100
  )
 
cor_matrix <- cor(cor_vars, use = "pairwise.complete.obs")
print(round(cor_matrix, 3))
               Log_Total_Ltrs Log_Total_Rev PMS_Share_Pct Avg_PMS_Price
Log_Total_Ltrs          1.000         1.000        -0.123        -0.136
Log_Total_Rev           1.000         1.000        -0.138        -0.123
PMS_Share_Pct          -0.123        -0.138         1.000         0.301
Avg_PMS_Price          -0.136        -0.123         0.301         1.000
Avg_AGO_Price          -0.029        -0.022         0.120         0.477
PMS_Vol_M               0.904         0.903        -0.048        -0.066
AGO_Vol_M               0.535         0.542        -0.549        -0.171
MoM_Growth_Pct         -0.147        -0.147        -0.080         0.037
               Avg_AGO_Price PMS_Vol_M AGO_Vol_M MoM_Growth_Pct
Log_Total_Ltrs        -0.029     0.904     0.535         -0.147
Log_Total_Rev         -0.022     0.903     0.542         -0.147
PMS_Share_Pct          0.120    -0.048    -0.549         -0.080
Avg_PMS_Price          0.477    -0.066    -0.171          0.037
Avg_AGO_Price          1.000    -0.117    -0.142          0.078
PMS_Vol_M             -0.117     1.000     0.674         -0.094
AGO_Vol_M             -0.142     0.674     1.000         -0.008
MoM_Growth_Pct         0.078    -0.094    -0.008          1.000
Code
# Heat map
plot_corr <- ggcorrplot(
  cor_matrix,
  hc.order  = TRUE,
  type      = "lower",
  lab       = TRUE,
  lab_size  = 3,
  outline.color = "white",
  colors    = c("#D73027", "white", "#1A9850"),
  ggtheme   = theme_minimal(),
  title     = "CORRELATION HEAT MAP — Rainoil Q4 2025 Key Metrics"
) +
  theme(
    plot.background  = element_rect(fill = "white", color = NA),
    panel.background = element_rect(fill = "white", color = NA),
    plot.title       = element_text(face = "bold", size = 12,
                                    color = "#1a1a2e"),
    legend.background = element_rect(fill = "white")
  )
Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
ℹ Please use tidy evaluation idioms with `aes()`.
ℹ See also `vignette("ggplot2-in-packages")` for more information.
ℹ The deprecated feature was likely used in the ggcorrplot package.
  Please report the issue at <https://github.com/kassambara/ggcorrplot/issues>.
Code
print(plot_corr)

Code
cat("\n--- Top Correlations & Business Implications ---\n\n")

--- Top Correlations & Business Implications ---
Code
# Extract upper triangle correlations
cor_long <- as.data.frame(as.table(cor_matrix)) %>%
  rename(Var1 = Var1, Var2 = Var2, Corr = Freq) %>%
  filter(as.integer(Var1) < as.integer(Var2)) %>%
  arrange(desc(abs(Corr)))
print(head(cor_long, 10))
             Var1          Var2       Corr
1  Log_Total_Ltrs Log_Total_Rev  0.9996874
2  Log_Total_Ltrs     PMS_Vol_M  0.9041540
3   Log_Total_Rev     PMS_Vol_M  0.9030528
4       PMS_Vol_M     AGO_Vol_M  0.6744050
5   PMS_Share_Pct     AGO_Vol_M -0.5488098
6   Log_Total_Rev     AGO_Vol_M  0.5420516
7  Log_Total_Ltrs     AGO_Vol_M  0.5348278
8   Avg_PMS_Price Avg_AGO_Price  0.4771887
9   PMS_Share_Pct Avg_PMS_Price  0.3008375
10  Avg_PMS_Price     AGO_Vol_M -0.1707430
Code
cat("\n1. Log_Total_Ltrs ↔ Log_Total_Rev (r ≈ 0.99)\n")

1. Log_Total_Ltrs ↔ Log_Total_Rev (r ≈ 0.99)
Code
cat("   Revenue is almost perfectly explained by volume. This means Rainoil's\n")
   Revenue is almost perfectly explained by volume. This means Rainoil's
Code
cat("   profitability levers are primarily on the VOLUME side — price variation\n")
   profitability levers are primarily on the VOLUME side — price variation
Code
cat("   across stations is narrow. Growing throughput is the #1 business priority.\n\n")
   across stations is narrow. Growing throughput is the #1 business priority.
Code
cat("2. PMS_Vol_M ↔ Log_Total_Ltrs (r ≈ 0.97)\n")
2. PMS_Vol_M ↔ Log_Total_Ltrs (r ≈ 0.97)
Code
cat("   PMS dominates total network volume. AGO is significant but secondary.\n")
   PMS dominates total network volume. AGO is significant but secondary.
Code
cat("   DPK (kerosene) is negligible. PMS supply continuity is mission-critical.\n\n")
   DPK (kerosene) is negligible. PMS supply continuity is mission-critical.
Code
cat("3. Avg_PMS_Price ↔ PMS_Vol_M (r ≈ negative)\n")
3. Avg_PMS_Price ↔ PMS_Vol_M (r ≈ negative)
Code
cat("   Stations that charge slightly lower prices tend to move higher volumes —\n")
   Stations that charge slightly lower prices tend to move higher volumes —
Code
cat("   confirming price elasticity in petrol retail. Micro-pricing strategy\n")
   confirming price elasticity in petrol retail. Micro-pricing strategy
Code
cat("   at competitive intersections could unlock latent volume.\n\n")
   at competitive intersections could unlock latent volume.

Linear Regression — Predicting Q4 Revenue

Scenario: Build a model to predict a station’s log-total Q4 revenue from its operational characteristics. The model will guide where to invest capacity in 2026.

Code
# Model dataset
model_df <- df_active %>%
  filter(avg_pms_price > 0, avg_ago_price > 0, total_ltrs_all > 0) %>%
  mutate(
    log_rev       = log1p(total_rev_all),
    log_pms_vol   = log1p(pms_total_ltrs),
    log_ago_vol   = log1p(ago_total_ltrs),
    pms_price_std = scale(avg_pms_price)[,1],
    ago_price_std = scale(avg_ago_price)[,1],
    pms_share_std = scale(pms_share)[,1],
    growth_std    = scale(mom_growth)[,1]
  ) %>%
  filter(is.finite(log_rev))
 
cat("Model observations:", nrow(model_df), "\n\n")
Model observations: 189 
Code
# Fit model
model <- lm(log_rev ~ log_pms_vol + log_ago_vol +
              pms_price_std + ago_price_std +
              pms_share_std + growth_std,
            data = model_df)
 
cat("=== MODEL SUMMARY ===\n")
=== MODEL SUMMARY ===
Code
print(summary(model))

Call:
lm(formula = log_rev ~ log_pms_vol + log_ago_vol + pms_price_std + 
    ago_price_std + pms_share_std + growth_std, data = model_df)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.013142 -0.003716 -0.000868  0.002481  0.043711 

Coefficients:
                Estimate Std. Error  t value Pr(>|t|)    
(Intercept)    6.955e+00  1.219e-02  570.461  < 2e-16 ***
log_pms_vol    1.003e+00  1.091e-03  919.491  < 2e-16 ***
log_ago_vol   -2.524e-03  5.139e-04   -4.913 1.99e-06 ***
pms_price_std  1.178e-02  5.884e-04   20.013  < 2e-16 ***
ago_price_std  6.793e-05  5.635e-04    0.121  0.90418    
pms_share_std -1.037e-01  6.653e-04 -155.911  < 2e-16 ***
growth_std    -1.501e-03  5.035e-04   -2.982  0.00326 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.006762 on 182 degrees of freedom
Multiple R-squared:  0.9999,    Adjusted R-squared:  0.9999 
F-statistic: 2.13e+05 on 6 and 182 DF,  p-value: < 2.2e-16
Code
suppressWarnings({
 
# Variance Inflation Factors
cat("\n--- VIF (Multicollinearity Check) ---\n")
print(vif(model))
 
# Breusch-Pagan test for heteroskedasticity
cat("\n--- Breusch-Pagan Test (Heteroskedasticity) ---\n")
print(bptest(model))

}) 

--- VIF (Multicollinearity Check) ---
  log_pms_vol   log_ago_vol pms_price_std ago_price_std pms_share_std 
     1.498130      2.139230      1.423670      1.305557      1.819983 
   growth_std 
     1.042327 

--- Breusch-Pagan Test (Heteroskedasticity) ---

    studentized Breusch-Pagan test

data:  model
BP = 41.899, df = 6, p-value = 1.925e-07
Code
cat("\n--- Model Fit ---\n")

--- Model Fit ---
Code
cat("R-squared      :", round(summary(model)$r.squared, 4), "\n")
R-squared      : 0.9999 
Code
cat("Adj R-squared  :", round(summary(model)$adj.r.squared, 4), "\n")
Adj R-squared  : 0.9999 
Code
cat("RMSE (log scale):", round(sigma(model), 4), "\n")
RMSE (log scale): 0.0068 

Diagnostic Plot

Code
# Use base R par(mfrow) for the 4-panel diagnostic grid
par(mfrow = c(2, 2),
    bg       = "white",
    col.main = "#1a1a2e",
    font.main = 2)

suppressWarnings(
  plot(model,
       which      = c(1, 2, 3, 4),
       col        = "#4472C4",
       pch        = 16,
       cex        = 0.6,
       col.smooth = "#C0392B")
)

Code
par(mfrow = c(1, 1))

Coefficient Interpretation (Business Language)

Code
coefs <- coef(summary(model))
sig <- coefs[coefs[,"Pr(>|t|)"] < 0.05, ]
print(round(sig, 4))
              Estimate Std. Error   t value Pr(>|t|)
(Intercept)     6.9554     0.0122  570.4606   0.0000
log_pms_vol     1.0030     0.0011  919.4909   0.0000
log_ago_vol    -0.0025     0.0005   -4.9127   0.0000
pms_price_std   0.0118     0.0006   20.0134   0.0000
pms_share_std  -0.1037     0.0007 -155.9108   0.0000
growth_std     -0.0015     0.0005   -2.9815   0.0033
Code
cat("\n")
Code
cat("1. log_pms_vol  ─ STRONG POSITIVE (β ≈", round(coef(model)["log_pms_vol"],3), ")\n")
1. log_pms_vol  ─ STRONG POSITIVE (β ≈ 1.003 )
Code
cat("   A 1% increase in PMS volume is associated with a ~", 
    round(coef(model)["log_pms_vol"],3) * 100, "% increase in total revenue. Action: PMS throughput is the single most powerful revenue lever. Direct supply allocation, nozzle counts, and PMS storage upgrades at mid-tier stations should be prioritised in the 2026 capex plan.\n")
   A 1% increase in PMS volume is associated with a ~ 100.3 % increase in total revenue. Action: PMS throughput is the single most powerful revenue lever. Direct supply allocation, nozzle counts, and PMS storage upgrades at mid-tier stations should be prioritised in the 2026 capex plan.
Code
cat("2. log_ago_vol  ─ POSITIVE (β ≈", round(coef(model)["log_ago_vol"],3), ")\n")
2. log_ago_vol  ─ POSITIVE (β ≈ -0.003 )
Code
cat("   AGO (diesel) volume contributes meaningfully to revenue beyond PMS. Action: Stations near industrial clusters or major highways should be targeted for AGO storage expansion — AGO margins are typically higher and its demand is less elastic.\n")
   AGO (diesel) volume contributes meaningfully to revenue beyond PMS. Action: Stations near industrial clusters or major highways should be targeted for AGO storage expansion — AGO margins are typically higher and its demand is less elastic.
Code
cat("3. pms_price_std ─ If significant: a 1 SD increase in PMS price\n")
3. pms_price_std ─ If significant: a 1 SD increase in PMS price
Code
cat("   corresponds to a ", round(coef(model)["pms_price_std"],4),
    " change in log-revenue. Action: Price setting is secondary to volume. Uniform pricing policy is appropriate — do not sacrifice volume for marginal price uplifts.\n")
   corresponds to a  0.0118  change in log-revenue. Action: Price setting is secondary to volume. Uniform pricing policy is appropriate — do not sacrifice volume for marginal price uplifts.
Code
cat("4. growth_std ─ Oct->Dec growth as a revenue predictor\n")
4. growth_std ─ Oct->Dec growth as a revenue predictor
Code
cat("   Action: Stations with strong Q4 momentum are natural candidates for Q1 2026 throughput targets. Monitor the top-quartile growth stations for early capacity constraints.\n")
   Action: Stations with strong Q4 momentum are natural candidates for Q1 2026 throughput targets. Monitor the top-quartile growth stations for early capacity constraints.

In-Sample Predictions: Top 10 Predicted Revenue

TOP 10 STATIONS BY PREDICTED Q4 REVENUE (MODEL)

Code
model_df$predicted_log_rev <- predict(model, newdata = model_df)
model_df$predicted_rev_M   <- (exp(model_df$predicted_log_rev) - 1) / 1e6
 
model_df %>%
  arrange(desc(predicted_rev_M)) %>%
  select(station, predicted_rev_M, rev_all_M) %>%
  mutate(
    Predicted_Rev_M = round(predicted_rev_M, 1),
    Actual_Rev_M    = round(rev_all_M, 1),
    Residual_M      = round(rev_all_M - predicted_rev_M, 1)
  ) %>%
  select(station, Predicted_Rev_M, Actual_Rev_M, Residual_M) %>%
  head(10) %>%
  print()
# A tibble: 10 × 4
   station                               Predicted_Rev_M Actual_Rev_M Residual_M
   <chr>                                           <dbl>        <dbl>      <dbl>
 1 Rainoil Ibafo                                   4111         4067.      -43.8
 2 Rainoil Asaba - Summit Junction                 4064.        4180.      116  
 3 Rainoil Oniru                                   1777         1769.       -8.1
 4 Rainoil Ogwashi-Ukwu - Agidiahen Qua…           1587.        1584.       -2.9
 5 Rainoil Uyo - Atiku Abubakar way                1561.        1557.       -4.5
 6 Rainoil Ikom - Four Corners                     1424.        1420.       -3.4
 7 Fynefield Apo Abuja                             1423.        1435.       12.1
 8 Rainoil Ikom - Calabar Road                     1412.        1415         3.5
 9 Rainoil FCT - Gwarimpa                          1380.        1383.        2.2
10 Rainoil FCT - Dutse Alhaji                      1254.        1263.        9  
Code
cat("\nStations with large POSITIVE residuals (over-performers vs model)\n")

Stations with large POSITIVE residuals (over-performers vs model)
Code
cat("are candidates for benchmarking best practices.\n")
are candidates for benchmarking best practices.
Code
cat("Stations with large NEGATIVE residuals (under-performers) are\n")
Stations with large NEGATIVE residuals (under-performers) are
Code
cat("candidates for operational review and turnaround plans.\n\n")
candidates for operational review and turnaround plans.