Price Sensitivity, Seasonality, and Sales Performance Across Rainoil Retail Outlets: An Exploratory and Inferential Analysis of 2024–2025 Fuel Sales Data
Author
Susan Aigbobhiose
Published
May 7, 2026
1 Executive Summary
Nigeria’s downstream petroleum retail sector is characterised by price volatility, seasonal demand swings, and growing regional competition. This analysis, conducted in my role as Strategy and Business Development Manager at Rainoil Limited, addresses three strategic questions critical to the Retail Sales team’s 2026 planning: (1) Do pump price variations significantly explain differences in PMS (Premium Motor Spirit, i.e. petrol) and AGO (Automotive Gas Oil, i.e. diesel) sales volumes? (2) Do Nigeria’s rainy season and festive periods drive volume fluctuations independently of price? (3) Which station and area characteristics are associated with top retail performance?
Five analytical techniques were applied to 193 stations across 24 months (January 2024 – December 2025; 4,632 station-month observations). The key findings are:

- Sales volumes are highly right-skewed, with a cluster of highway mega-stations dominating network volume.
- A ₦100 pump price increase reduces monthly PMS volume by approximately 8–10% per station.
- Rainy-season months suppress volumes independently of price, particularly in South-South stations.
- December (Christmas) and April (Easter) generate statistically significant demand surges.
- South-West and South-South zones significantly out-sell Northern zones (ANOVA, η² = 0.057, p < 0.001).

Year-on-year, mean annual PMS volume per station fell significantly from 2024 to 2025 (paired t = −9.19, df = 178, p < 0.001; mean change ≈ −577,000 litres per station).
Recommendation: The Retail Sales team should deploy a seasonally-aware, zone-differentiated strategy for 2026 — adjusting pricing within competitive thresholds, pre-positioning stock ahead of festive surges, and reducing loading to flood-prone stations during peak rainy months.
2 Professional Disclosure
Name: Susan Aigbobhiose
Job Title: Strategy and Business Development Manager
Department: Strategy and Business Development
Organisation: Rainoil Limited, founded in 1997, is a significant Nigerian oil and gas company with experience across the Downstream, Gas, and Upstream sectors. The Rainoil Group’s business operations span Bulk Product Storage, Haulage/Distribution, Refined Products Marketing, and Retail Sales. The retail arm, the focus of this analysis, operates 193 service stations across all six geopolitical zones, making it one of Nigeria’s largest independent retail fuel networks.
Why this analysis matters — strategic justification:
The Strategy and Business Development function at Rainoil is responsible for translating performance data into actionable strategic direction for the business. As the retail network continues to expand — with 14 new stations commissioned in 2025 alone — the organisation faces increasingly complex decisions about where to grow, how to price competitively, when to pre-position supply, and how to advise the Retail Sales team on deploying the right strategies across different market conditions. This analysis was designed specifically to equip the Retail Sales team with evidence-based answers to questions that have historically relied on intuition:
Are our volume losses driven by prices being too high, or by seasonal forces we cannot control?
Which areas and zones are structurally strong versus which are temporarily underperforming?
When should we prepare for demand surges, and when should we reduce supply loading?
Answering these questions through rigorous data analysis — rather than anecdote — is central to Rainoil’s strategy of building a data-driven retail operation ahead of increasing competition in the downstream sector. The findings of this report are intended to directly inform the Retail Sales team’s 2026 strategy, including pricing guidelines, supply chain planning, and station expansion sequencing.
Operational relevance of each technique to the Strategy and Business Development role:
Exploratory Data Analysis: The Strategy team reviews monthly MOM (Month-on-Month) station performance data to identify network-wide trends, flag underperforming areas, and build the evidence base for strategic recommendations to the Group Head Retail Sales and Executive Management. EDA formalises this process — detecting data quality issues, identifying outlier stations, and establishing the statistical baseline before any strategic conclusions are drawn.
Data Visualisation: Strategy presentations to Rainoil’s Executive Committee and Board require clear, compelling visual communication of network performance. Selecting the right chart — a heatmap for station-level patterns, a trend line for price-volume dynamics, a bar chart for zone comparisons — directly determines whether decision-makers act on the insights or set them aside. This analysis builds a visual narrative that the Retail Sales team can use directly in their 2026 planning sessions.
Hypothesis Testing: Before recommending a pricing strategy change or a supply reallocation to the Retail Sales team, the Strategy function must be able to demonstrate that observed differences are real and not driven by random variation. Formal hypothesis tests — on zone performance, year-on-year volume recovery, and seasonal effects — provide the statistical credibility that transforms an observation into a strategic recommendation.
Correlation Analysis: Rainoil’s retail forecourts sell PMS, AGO, and DPK (Dual Purpose Kerosene). Understanding how prices and volumes of these products co-move is essential for product-mix planning, working capital allocation across the depot network, and advising the Retail Sales team on which product categories to prioritise in each zone and season.
Linear Regression: The regression model is the analytical centrepiece of this study — it simultaneously quantifies the independent effect of price, season, festive periods, and zone on PMS volume. For the Retail Sales team, this translates directly into a decision tool: given a proposed price change and the current month of the year, what volume change should the team plan for? This is the kind of evidence-based planning that separates strategic retail management from reactive operations.
3 Data Collection & Sampling
Source: Internal MOM (Month-on-Month) Retail Sales Summary Analysis Report — extracted from Rainoil Limited’s retail management information system by the Strategy and Business Development Department.
Collection method: The dataset was exported directly from Rainoil’s internal reporting portal as an Excel workbook. Data is compiled monthly by the Retail Operations team from station-level daily meter readings submitted by station supervisors across all 21 operational areas.
Sampling frame: All Rainoil retail stations listed in the MOM report for either 2024 or 2025, yielding 193 unique stations. Stations commissioned during the study period remain in the panel for all 24 months. Months before their first recorded sale are coded as zero volume rather than dropped, and are marked non-operational so they can be excluded where appropriate.
Variables included: For each station-month observation: station name, operational area (21 areas), geopolitical zone (derived), monthly PMS volume (litres), AGO volume (litres), DPK volume (litres), average PMS pump price (₦/litre), and average AGO pump price (₦/litre). Engineered variables are:

- year;
- half-year;
- a sequential month counter (1–24);
- an operational flag;
- season, festive-period and demand-period labels.
Time period: January 2024 – December 2025 (24 months), yielding 193 stations × 24 months = 4,632 station-month observations before removing non-operational months.
Statistical rationale for sample adequacy:

- ANOVA: For one-way ANOVA with six zone groups, a common rule of thumb is at least 30 observations per group for the Central Limit Theorem to apply reliably. Each zone in this dataset contains at least 200 station-months, far exceeding this threshold.
- Regression: Model 2 uses six predictor variables, which become 11 slope coefficients once season, festive period and zone are dummy-coded. The 10–20 observations-per-coefficient rule therefore requires roughly 110–220 observations. The 4,296 operational station-months used in the regression far exceed this.
- Seasonality: The 24-month panel covers two full annual cycles. This is the minimum needed to check whether a seasonal pattern recurs from one year to the next. It supports comparing the same months across years, but it is short for separating seasonality from longer-term trends.
Ethical notes: The data is aggregate station-level operational data. It contains no personally identifiable information. Permission to use this data for academic analysis was obtained from the Executive Director, Strategy and Business Development and Group Head Retail Sales. Commercially sensitive absolute figures have been retained as they are necessary for the analysis, but the dataset is not shared beyond this submission.
```r
# Stations present in 2025 but not 2024 (newly commissioned)
new_in_2025 <- setdiff(stations_wide_2025$station_name,
                       stations_wide_2024$station_name)
cat("\nStations new in 2025 (", length(new_in_2025), "):\n", sep = "")
```
Theory recap: EDA is the systematic process of summarising a dataset’s main characteristics before formal modelling. The goal is to understand distributions, detect anomalies, identify relationships, and surface questions worth testing (Adi, 2026, Ch. 4). Anscombe’s Quartet famously demonstrated that identical summary statistics can conceal radically different data structures — hence the importance of visual and distributional inspection.
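Anscombe’s point is easy to reproduce. Base R ships the quartet as the built-in anscombe data frame (columns x1–x4 and y1–y4). The sketch below computes the usual summary statistics for each x–y pair and then plots the four pairs.

```r
# Anscombe's quartet is bundled with base R (datasets package)
data(anscombe)

# Summary statistics for pair i: means, variances, correlation and OLS slope
summarise_pair <- function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(mean_x = mean(x), var_x = var(x),
    mean_y = mean(y), var_y = var(y),
    cor_xy = cor(x, y),
    slope  = unname(coef(lm(y ~ x))[2]))
}

# One column per data set, one row per statistic
quartet_stats <- sapply(1:4, summarise_pair)
colnames(quartet_stats) <- paste0("set", 1:4)
print(round(quartet_stats, 2))

# Plotting the same pairs reveals four very different structures
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i), main = paste("Set", i))
}
par(op)
```

The four columns of statistics are virtually identical: means of 9 and about 7.5, correlation about 0.82, and slope about 0.5. The plots, however, show four different shapes:

- a straight line;
- a curve;
- a line with one outlier;
- a vertical cluster with one high-leverage point.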
Business justification: Before the Strategy and Business Development team can advise the Retail Sales team on network performance, it is essential to know whether the data is reliable (no missing periods, no implausible values), how volumes are distributed across the network (are averages meaningful?), and which stations are true outliers vs. which are simply large-volume outlets.
Interpretation: PMS volume is strongly right-skewed (skewness > 3). The majority of station-months cluster below 400,000 litres, but a handful of mega-stations (Summit Junction, Ibafo, Nnebisi 1, Oniru) record monthly volumes exceeding 1 million litres. The log transformation produces an approximately normal distribution, which is used in the regression section.
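The histogram code behind this interpretation is not shown. The sketch below shows one way to check the skewness figure and the effect of the log transform. The volumes are simulated from a log-normal distribution and are hypothetical, not Rainoil data. skewness() is a small hand-written helper using the moment definition; packages such as moments offer an equivalent.

```r
# Moment-based sample skewness: mean cubed deviation / (population SD)^3
skewness <- function(x) {
  x <- x[is.finite(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Hypothetical station-month volumes: log-normal, centred near 360,000 litres
set.seed(2024)
pms_vol <- rlnorm(5000, meanlog = log(360000), sdlog = 0.8)

skew_raw <- skewness(pms_vol)
skew_log <- skewness(log1p(pms_vol))   # log1p(), as used elsewhere in this report

cat(sprintf("Skewness, raw litres:     %.2f\n", skew_raw))
cat(sprintf("Skewness, log1p(litres):  %.2f\n", skew_log))
```

A strongly positive value on the raw scale that drops to near zero after the transform is the pattern described above.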
```r
# Issue 2: stations with fewer than 12 active months in 2025
# (mid-2025 openings, or stations with temporary closures during the year)
ramp_up <- stations_long |>
  filter(year == 2025) |>
  group_by(station_name, area) |>
  summarise(
    first_active_month = min(month[operational], na.rm = TRUE),
    months_active_2025 = sum(operational, na.rm = TRUE),
    .groups = "drop"
  ) |>
  filter(months_active_2025 < 12) |>
  arrange(months_active_2025)

cat("\nStations with fewer than 12 active months in 2025:\n")
```

```
Stations with fewer than 12 active months in 2025:
```
```r
print(ramp_up, n = 20)
```

```
# A tibble: 18 × 4
   station_name                    area    first_active_month months_active_2025
   <chr>                           <chr>   <ord>                           <int>
 1 Rainoil Ughelli Post Office     Ughelli Dec                                 1
 2 Rainoil Uselu Shell             Benin   Nov                                 2
 3 Rainoil Aba - Faulks Road       South-… Oct                                 3
 4 Rainoil Ughelli Otovwodo        Ughelli Sep                                 4
 5 Rainoil Eboh Road               Warri   Jul                                 6
 6 Rainoil Lafia                   North … Jul                                 6
 7 Rainoil Patani                  Ughelli Jul                                 6
 8 Fynefield Abak Town             Calaba… Jun                                 7
 9 Rainoil Giwa-Amu                Warri   May                                 8
10 Rainoil Ilorin - Sobi Road      South-… Jan                                 9
11 Rainoil North Bank              North … Apr                                 9
12 Rainoil Ugbolu                  Delta   Apr                                 9
13 Rainoil Gboko                   North … Jan                                10
14 Rainoil Ogbomosho               South-… Mar                                10
15 Rainoil Portharcourt - Igwuruta Calaba… Jan                                10
16 Rainoil Lokoja                  Fct     Jan                                11
17 Rainoil Osubi - Airport Road    Warri   Feb                                11
18 Rainoil Yenagoa Kpansia         Yenagoa Feb                                11
```
Handling: (1) Missing pump prices (where stations recorded zero for AVG PRICE) are replaced with NA rather than zero to avoid distorting price analysis — they are excluded from the regression on a listwise basis. (2) Stations with zero PMS across several months (e.g. Enugu Abakpa, Lafia, Otukpo) reflect either new openings or temporary closures; they are retained in EDA but flagged in regression diagnostics.
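The report’s cleaning code is not shown. The sketch below applies the two handling rules to a few made-up station-month rows; the data frame toy and all of its values are hypothetical. It also assumes that zero-volume months are dropped before logs are taken, since log(0) is undefined. Dividing price by 100 is likewise an assumption, not the report’s documented scaling.

```r
# Hypothetical station-month records; a recorded price of 0 means "not reported"
toy <- data.frame(
  station   = c("A", "A", "B", "B", "C", "C"),
  pms_vol   = c(310000, 295000, 120000, 0, 450000, 430000),
  pms_price = c(650, 0, 700, 0, 640, 690)
)

# Rule (1): treat zero prices as missing rather than as a real price of ₦0
toy$pms_price[toy$pms_price == 0] <- NA

# Assumed step: drop non-operational (zero-volume) months before taking logs
reg_toy <- subset(toy, pms_vol > 0)
reg_toy$log_pms <- log(reg_toy$pms_vol)

# lm() drops rows with a missing predictor by default (na.action = na.omit),
# i.e. listwise deletion
fit <- lm(log_pms ~ I(pms_price / 100), data = reg_toy)
cat("Rows available:", nrow(reg_toy), " Rows used in fit:", nobs(fit), "\n")
```

Five operational rows survive the volume filter. Only four reach the regression, because one has a missing price.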
5.3 Outlier Detection — Top and Bottom Stations
```r
# Annual PMS and AGO totals per station and year
annual_pms <- stations_long |>
  group_by(station_name, area, zone, year) |>
  summarise(
    total_pms     = sum(pms_vol, na.rm = TRUE),
    total_ago     = sum(ago_vol, na.rm = TRUE),
    months_active = sum(operational, na.rm = TRUE),
    .groups = "drop"
  ) |>
  arrange(desc(total_pms))

# 2024 baseline for charts that need a single-year ranking
annual_pms_2024 <- annual_pms |> filter(year == 2024)

# All-time ranking (sum across both years)
annual_pms_all <- annual_pms |>
  group_by(station_name, area, zone) |>
  summarise(total_pms = sum(total_pms), .groups = "drop") |>
  arrange(desc(total_pms))

# Top 15 stations by combined 2024-2025 PMS volume
top15 <- annual_pms_all |> slice_head(n = 15)

top15 |>
  ggplot(aes(x = reorder(station_name, total_pms),
             y = total_pms / 1e6, fill = area)) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(labels = label_comma()) +
  scale_fill_brewer(palette = "Set3") +
  labs(title = "Top 15 Stations by Combined PMS Volume — 2024 & 2025",
       subtitle = "Ranked by total litres sold across the full 24-month period",
       x = NULL, y = "Total PMS Volume (Million Litres)", fill = "Area") +
  theme_minimal(base_size = 11) +
  theme(legend.position = "bottom")
```
```r
# Area-level annual PMS comparison 2024 vs 2025
stations_long |>
  group_by(area, year) |>
  summarise(total_pms = sum(pms_vol, na.rm = TRUE) / 1e6, .groups = "drop") |>
  ggplot(aes(x = reorder(area, total_pms),
             y = total_pms, fill = factor(year))) +
  geom_col(position = "dodge") +
  coord_flip() +
  scale_fill_manual(values = c("2024" = "#2E86AB", "2025" = "#A23B72")) +
  scale_y_continuous(labels = label_comma()) +
  labs(title = "Annual PMS Volume by Rainoil Operational Area — 2024 vs 2025",
       subtitle = "Direct year-on-year comparison across all 21 areas",
       x = NULL, y = "Total PMS Volume (Million Litres)", fill = "Year") +
  theme_minimal(base_size = 11) +
  theme(legend.position = "bottom")
```
Interpretation: Summit Junction (Asaba) and Ibafo (Lagos) are clear network outliers — each selling over 10 million litres annually, roughly 5–8× the network median. These are highway mega-stations serving transit traffic and should be modelled separately in any pricing analysis.
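The outliers above are identified by inspection. A reproducible alternative is Tukey’s fence, which flags any value above Q3 + 1.5 × IQR. The sketch below applies it to hypothetical annual totals; the station codes and volumes are invented.

```r
# Hypothetical annual PMS volumes per station, in millions of litres
annual_mlitres <- c(
  S01 = 1.2, S02 = 1.6, S03 = 2.0, S04 = 2.1, S05 = 2.4, S06 = 2.6,
  S07 = 2.9, S08 = 3.1, S09 = 3.4, S10 = 3.8, S11 = 4.5, S12 = 11.5, S13 = 12.8
)

# Tukey's upper fence: Q3 + 1.5 * IQR (quantile() default type 7)
q <- quantile(annual_mlitres, c(0.25, 0.75))
upper_fence <- q[[2]] + 1.5 * (q[[2]] - q[[1]])
outliers <- names(annual_mlitres)[annual_mlitres > upper_fence]

# How many times the network median each flagged station sells
ratio_to_median <- annual_mlitres[outliers] / median(annual_mlitres)
cat("Upper fence:", upper_fence, "million litres\n")
print(round(ratio_to_median, 1))
```

Station volumes are right-skewed, so applying the same fence to log volumes usually flags fewer stations. Either way, the rule makes the "outlier" label auditable rather than a judgement call.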
6 Technique 2 — Data Visualisation
Theory recap: Effective data visualisation applies the grammar of graphics — mapping data attributes to geometric properties (position, colour, size) to reveal patterns that numerical summaries cannot (Adi, 2026, Ch. 5; Wickham, 2016). Chart selection must match the data type and the question being asked.
Business justification: Rainoil’s monthly performance reviews require regional managers to identify quickly which zones are growing, which products are declining, and how price movements track volume, all within a single slide deck. The charts below form a coherent visual narrative of 2024–2025 performance.
6.1 Monthly PMS Volume Trend by Zone
```r
stations_long |>
  group_by(month_date, zone, year) |>
  summarise(total_pms = sum(pms_vol, na.rm = TRUE) / 1e6, .groups = "drop") |>
  ggplot(aes(x = month_date, y = total_pms, colour = zone, group = zone)) +
  geom_line(linewidth = 1.1) +
  geom_point(size = 2.5) +
  facet_wrap(~year, scales = "free_x") +
  scale_x_date(date_labels = "%b", date_breaks = "2 months") +
  scale_colour_brewer(palette = "Set1") +
  labs(title = "Monthly PMS Sales Volume by Geopolitical Zone — 2024 vs 2025",
       subtitle = "South-South and South-West dominate in both years",
       x = NULL, y = "Total PMS Volume (Million Litres)", colour = "Zone") +
  theme_minimal(base_size = 12) +
  theme(legend.position = "bottom",
        axis.text.x = element_text(angle = 30, hjust = 1))
```
6.2 Boxplot of Monthly PMS Volume by Zone
```r
stations_long |>
  filter(!is.na(pms_vol)) |>
  ggplot(aes(x = reorder(zone, pms_vol, FUN = median), y = pms_vol, fill = zone)) +
  geom_boxplot(outlier.alpha = 0.4, outlier.size = 1.5) +
  coord_flip() +
  scale_y_continuous(labels = label_comma()) +
  scale_fill_brewer(palette = "Set2") +
  labs(title = "Distribution of Station-Level Monthly PMS Volume by Zone",
       subtitle = "Boxes show median and interquartile range of station-months; points are outlying station-months",
       x = NULL, y = "Monthly PMS Volume (Litres)", fill = "Zone") +
  theme_minimal(base_size = 12) +
  theme(legend.position = "none")
```
6.3 PMS Price Trend Over Time
```r
stations_long |>
  filter(!is.na(pms_price)) |>
  group_by(month_date, zone, year) |>
  summarise(avg_price = mean(pms_price, na.rm = TRUE), .groups = "drop") |>
  ggplot(aes(x = month_date, y = avg_price, colour = zone, group = zone)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  facet_wrap(~year, scales = "free_x") +
  scale_x_date(date_labels = "%b", date_breaks = "2 months") +
  scale_y_continuous(labels = label_comma(prefix = "₦")) +
  scale_colour_brewer(palette = "Set1") +
  labs(title = "Average PMS Pump Price by Zone — 2024 vs 2025",
       subtitle = "Prices stabilised in 2025 after the 2024 subsidy-removal surge",
       x = NULL, y = "Average PMS Price (₦/Litre)", colour = "Zone") +
  theme_minimal(base_size = 12) +
  theme(legend.position = "bottom",
        axis.text.x = element_text(angle = 30, hjust = 1))
```
6.5 Network Heatmap — Station Performance by Month
```r
# Normalise each station's monthly PMS to its own 24-month peak (0–1 scale)
heat_data <- stations_long |>
  filter(!is.na(pms_vol)) |>
  group_by(station_name) |>
  mutate(station_max = max(pms_vol, na.rm = TRUE)) |>
  ungroup() |>
  mutate(pms_norm = if_else(station_max > 0, pms_vol / station_max, 0)) |>
  # keep only the 35 highest-volume stations over the full 24 months
  semi_join(annual_pms_all |> slice_head(n = 35), by = "station_name")

heat_data |>
  ggplot(aes(x = month_date,
             y = reorder(station_name, pms_norm, FUN = mean),
             fill = pms_norm)) +
  geom_tile(colour = "white", linewidth = 0.25) +
  scale_x_date(date_labels = "%b\n%Y", date_breaks = "3 months") +
  scale_fill_gradient2(low = "#d73027", mid = "#ffffbf", high = "#1a9850",
                       midpoint = 0.55, name = "Relative\nPerformance") +
  labs(title = "24-Month PMS Performance Heatmap — Top 35 Stations (Jan 2024 – Dec 2025)",
       subtitle = "Green = near station's own peak; Red = well below peak",
       x = NULL, y = NULL) +
  theme_minimal(base_size = 9) +
  theme(axis.text.y = element_text(size = 7),
        legend.position = "right")
```
Narrative: Across the charts, a coherent story emerges. South-West and South-South zones consistently lead in volume. In 2024, PMS prices climbed steeply from ₦625–670/litre in January to ₦950–1,100/litre by October, reflecting the cascading effect of the 2023 subsidy removal. As prices rose, the price–volume scatter plot shows volume trending downward, a demand-suppression effect. The heatmap shows that the Q3 dip (Jul–Sep) was network-wide rather than isolated to specific stations. This points to a common driver rather than station-level operational failure. However, the Q3 dip coincides with both the price peak and the peak rainy season, so these charts alone cannot say which driver is responsible; the regression in Technique 5 separates the two.
6.6 Seasonal Demand Pattern
```r
stations_long |>
  filter(pms_vol > 0) |>
  group_by(month, season, festive, year) |>
  summarise(avg_pms = mean(pms_vol, na.rm = TRUE) / 1000, .groups = "drop") |>
  ggplot(aes(x = month, y = avg_pms, fill = season)) +
  geom_col() +
  geom_text(aes(label = if_else(festive != "Non-Festive", festive, "")),
            vjust = -0.4, size = 2.6, colour = "black", fontface = "bold") +
  facet_wrap(~year) +
  scale_fill_manual(values = c("Rainy Season" = "#2E86AB",
                               "Dry Season" = "#E8B84B")) +
  scale_y_continuous(labels = label_comma()) +
  labs(title = "Average Station PMS Volume by Month — 2024 & 2025",
       subtitle = "Blue = rainy season (Apr–Oct); Gold = dry season. Festive periods labelled above bar.",
       x = NULL, y = "Avg PMS Volume per Station (000 Litres)",
       fill = "Season") +
  theme_minimal(base_size = 12) +
  theme(legend.position = "bottom",
        axis.text.x = element_text(angle = 45, hjust = 1))
```
Interpretation: December (Christmas/New Year travel surge) and April (Easter) are visible demand spikes in both years. The rainy season months (June–September) show a consistent dip driven by flooding, reduced road traffic, and logistics disruptions in the Niger Delta — Rainoil’s core operating territory. This chart begins to separate the price story from the seasonal story, which the regression in Technique 5 will formally disentangle.
7 Technique 3 — Hypothesis Testing
Theory recap: Hypothesis testing provides a formal framework for deciding whether observed differences in data could plausibly arise by chance (Adi, 2026, Ch. 6). We state a null hypothesis (H₀), choose a test suited to the data type and distribution, compute a test statistic and p-value, and report effect size alongside statistical significance.
Business justification: When the MD or Head of Retail Sales asks “Do stations in the South-West genuinely out-sell those in the North, or is the difference noise?”, a formal test provides the evidence-based answer. Three hypotheses are tested:

- regional differences (one-way ANOVA);
- year-on-year volume change (paired t-test);
- seasonal demand periods (one-way ANOVA).

The third test is a first check on whether rainy-season and festive periods are associated with different volumes. Because it does not control for price, separating seasonal effects from price effects is left to the regression in Technique 5.
7.1 Hypothesis 1 — Do mean monthly PMS volumes differ across zones? (One-Way ANOVA)
H₀: Mean monthly PMS volume is the same across all geopolitical zones. H₁: At least one zone has a significantly different mean monthly PMS volume.
Assumption checks:
```r
# Check normality by zone: QQ plots of log-transformed volume
stations_long |>
  filter(!is.na(pms_vol), pms_vol > 0) |>
  ggplot(aes(sample = log1p(pms_vol))) +
  stat_qq(alpha = 0.4, size = 0.8, colour = "#2E86AB") +
  stat_qq_line(colour = "red", linewidth = 0.8) +
  facet_wrap(~zone, scales = "free") +
  labs(title = "QQ Plots of log(PMS Volume) by Zone",
       subtitle = "Points close to diagonal indicate approximate normality",
       x = "Theoretical Quantiles", y = "Sample Quantiles") +
  theme_minimal(base_size = 11)
```
```r
library(car)  # provides leveneTest()

# Levene's test for homogeneity of variance
levene_result <- leveneTest(log1p(pms_vol) ~ zone,
                            data = stations_long |>
                              filter(!is.na(pms_vol), pms_vol > 0))
print(levene_result)
```

```
Levene's Test for Homogeneity of Variance (center = median)
        Df F value    Pr(>F)
group    5  52.341 < 2.2e-16 ***
      4290
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
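Levene’s test rejects equal variances across zones, which is one of the classical ANOVA assumptions. A common robustness check is Welch’s one-way ANOVA, which base R provides as oneway.test(). On the report’s data the call would presumably be oneway.test(log1p(pms_vol) ~ zone, data = anova_data). The sketch below uses simulated zones with deliberately unequal spreads (all values hypothetical) to show the two versions side by side.

```r
# Hypothetical log-volumes for three zones with deliberately unequal spreads
set.seed(7)
toy <- data.frame(
  zone = factor(rep(c("North-West", "South-South", "South-West"), each = 300)),
  log_pms = c(rnorm(300, mean = 12.4, sd = 0.9),
              rnorm(300, mean = 12.8, sd = 0.5),
              rnorm(300, mean = 12.9, sd = 0.4))
)

# Classical one-way ANOVA F-test: assumes equal variances across groups
classic <- oneway.test(log_pms ~ zone, data = toy, var.equal = TRUE)

# Welch's ANOVA: drops the equal-variance assumption (this is the default)
welch <- oneway.test(log_pms ~ zone, data = toy, var.equal = FALSE)

cat(sprintf("Classical F = %.1f (denominator df = %.0f), p = %.2g\n",
            classic$statistic, classic$parameter[2], classic$p.value))
cat(sprintf("Welch F     = %.1f (denominator df = %.0f), p = %.2g\n",
            welch$statistic, welch$parameter[2], welch$p.value))
```

Welch’s version lowers the denominator degrees of freedom when group variances differ. If it agrees with the classical F-test, the zone finding does not depend on the equal-variance assumption.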
```r
# One-way ANOVA on log-transformed volume
anova_data <- stations_long |> filter(!is.na(pms_vol), pms_vol > 0)
anova_model <- aov(log1p(pms_vol) ~ zone, data = anova_data)
anova_summary <- summary(anova_model)
print(anova_summary)
```

```
              Df Sum Sq Mean Sq F value Pr(>F)
zone           5   94.1   18.83   52.24 <2e-16 ***
Residuals   4290 1545.9    0.36
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
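Result & Interpretation: The one-way ANOVA is significant (F(5, 4290) = 52.24, p < 2e-16), so we reject H₀: at least one zone differs from the others in mean log PMS volume. Levene’s test also rejects equal variances (F(5, 4290) = 52.34, p < 2.2e-16). This result should therefore be confirmed with a test that does not assume equal variances, such as Welch’s ANOVA. The Tukey post-hoc test identifies which specific pairs differ. The effect size is η² = SS_zone / SS_total = 94.1 / (94.1 + 1,545.9) ≈ 0.057. Zone membership therefore accounts for about 5.7% of the variation in log monthly PMS volume, a small-to-medium effect; the remaining variation lies within zones, between stations and across months.

Business implication: Zone is a meaningful but partial differentiator of sales performance. Rainoil’s resource allocation (fleet assignment, credit terms to dealers, marketing investment) should be explicitly zone-stratified rather than based on national averages. At the same time, the team should recognise that station- and month-level factors within each zone account for most of the variation.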
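The effect size quoted above can be reproduced directly from the ANOVA table, and the Tukey post-hoc step runs on the same aov() object. In the sketch below, the sums of squares are copied from the printed output. The three-zone data set passed to TukeyHSD() is simulated and hypothetical.

```r
# Eta-squared from the printed ANOVA table: SS_between / SS_total
ss_zone  <- 94.1
ss_resid <- 1545.9
eta_sq <- ss_zone / (ss_zone + ss_resid)
cat(sprintf("eta^2 = %.4f\n", eta_sq))

# Tukey HSD on hypothetical data, to show the post-hoc workflow
set.seed(11)
toy <- data.frame(
  zone = factor(rep(c("North-Central", "South-South", "South-West"), each = 200)),
  log_pms = c(rnorm(200, 12.40, 0.6), rnorm(200, 12.80, 0.6), rnorm(200, 12.85, 0.6))
)
fit <- aov(log_pms ~ zone, data = toy)
tukey <- TukeyHSD(fit, "zone", conf.level = 0.95)
print(tukey)
```

On the report’s data the equivalent call would be TukeyHSD(anova_model, "zone"). Each row of the result gives, for one zone pair:

- the difference in mean log volume;
- its 95% family-wise confidence interval;
- an adjusted p-value.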
7.2 Hypothesis 2 — Did PMS volume change significantly between 2024 and 2025? (Paired t-test)
H₀: Mean annual PMS volume per station is the same in 2024 and 2025. H₁: Mean annual PMS volume per station differs between 2024 and 2025.
```r
# Annual PMS per station per year: only stations present in both years
year_data <- annual_pms |>
  filter(year %in% c(2024, 2025)) |>
  select(station_name, year, total_pms) |>
  pivot_wider(names_from = year, values_from = total_pms) |>
  drop_na()

cat("Stations present in both years:", nrow(year_data), "\n")
```

```
Stations present in both years: 179
```
```r
# Check normality of differences
diff_vec <- year_data$`2025` - year_data$`2024`
shapiro_result <- shapiro.test(diff_vec)
cat("\nShapiro-Wilk on 2025 - 2024 differences:\n")
cat(" W =", round(shapiro_result$statistic, 4),
    ", p =", round(shapiro_result$p.value, 4), "\n")
```

```
Shapiro-Wilk on 2025 - 2024 differences:
 W = 0.9307 , p = 0
```
```r
# Paired t-test: each station is its own pair (difference = 2025 minus 2024)
t_result <- t.test(year_data$`2025`, year_data$`2024`,
                   paired = TRUE, alternative = "two.sided")
print(t_result)
```

```
	Paired t-test

data:  year_data$`2025` and year_data$`2024`
t = -9.1904, df = 178, p-value < 2.2e-16
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 -700842.8 -453071.4
sample estimates:
mean difference
      -576957.1
```
```r
# Visualise 2024 vs 2025 annual distribution per station
annual_pms |>
  filter(year %in% c(2024, 2025), total_pms > 0) |>
  ggplot(aes(x = factor(year), y = total_pms / 1e6, fill = factor(year))) +
  geom_violin(alpha = 0.6, trim = FALSE) +
  geom_boxplot(width = 0.12, fill = "white", outlier.size = 1) +
  scale_fill_manual(values = c("2024" = "#2E86AB", "2025" = "#A23B72")) +
  scale_y_continuous(labels = label_comma()) +
  labs(title = "Station Annual PMS Volume: 2024 vs 2025",
       subtitle = "Distribution of per-station annual totals; shows whether the network grew year-on-year",
       x = "Year", y = "Annual PMS Volume (Million Litres)", fill = "Year") +
  theme_minimal(base_size = 12) +
  theme(legend.position = "none")
```
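Result & Interpretation: The paired t-test is highly significant (t = −9.19, df = 178, p < 0.001), so we reject H₀: mean annual PMS volume per station differed between 2024 and 2025. t.test() computes the 2025 − 2024 difference. The negative t-statistic and the mean difference of about −577,000 litres per station (95% CI: −700,843 to −453,071) therefore show that the average station sold less PMS in 2025 than in 2024, not more.

The Shapiro–Wilk test rejects normality of the differences (W = 0.9307). The printed “p = 0” is the p-value rounded to four decimal places, i.e. p < 0.0001. With 179 pairs the t-test is reasonably robust to this, and a Wilcoxon signed-rank test is a useful non-parametric cross-check. Cohen’s d for paired data, d_z = t/√n = −9.19/√179 ≈ −0.69, indicates a moderate-to-large year-on-year shift.

Business implication: The statistically confirmed fall in per-station volume in 2025 does not support a network-wide recovery narrative. Before setting 2026 volume-growth targets, the Strategy team should establish whether the decline reflects:

- demand (pricing, macroeconomic conditions);
- supply (product availability);
- network changes (for example, new stations drawing volume from existing ones).

It should also track whether the decline continues into 2026.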
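Both checks mentioned above can be scripted. Cohen’s d for paired data (d_z) is the mean difference divided by the standard deviation of the differences. This equals t/√n, so it can be recovered from the printed t-test alone. The Wilcoxon signed-rank test drops the normality assumption that Shapiro–Wilk rejected. The five-station volumes in this sketch are hypothetical; on the real data the calls would use year_data$`2025` and year_data$`2024`.

```r
# Paired effect size from the reported t-test: d_z = t / sqrt(n)
t_stat  <- -9.1904
n_pairs <- 179            # df = 178, so n = df + 1
d_z_reported <- t_stat / sqrt(n_pairs)
cat(sprintf("Cohen's d_z from the reported t-test: %.2f\n", d_z_reported))

# The same quantity from raw paired differences: mean(diff) / sd(diff)
cohens_dz <- function(after, before) {
  d <- after - before
  mean(d) / sd(d)
}

# Hypothetical annual volumes (litres) for five stations, to show the workflow
vol_2024 <- c(3.1e6, 2.4e6, 5.8e6, 1.9e6, 4.2e6)
vol_2025 <- c(2.6e6, 2.2e6, 5.1e6, 1.5e6, 3.4e6)
cat(sprintf("d_z on toy data: %.2f\n", cohens_dz(vol_2025, vol_2024)))

# Non-parametric check: Wilcoxon signed-rank test on the paired differences
wilcox.test(vol_2025, vol_2024, paired = TRUE)
```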
7.3 Hypothesis 3 — Does season or festive period significantly affect PMS volume? (One-Way ANOVA)
H₀: Mean monthly PMS volume is the same across all demand periods (Dry Season, Early Rainy, Peak Rainy, Late Rainy, Easter, Christmas).
H₁: At least one demand period has a significantly different mean PMS volume — i.e. seasonality and festive periods matter beyond price alone.
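The code that builds demand_period is not shown. One way to derive it from the calendar month uses the six labels that appear in the boxplot code for this hypothesis. The sketch below does this; the helper name demand_period_for_month() is hypothetical.

```r
# Hypothetical helper: map calendar month (1-12) to the report's demand-period labels
demand_period_for_month <- function(month) {
  stopifnot(all(month %in% 1:12))
  lookup <- c(rep("Dry Season (Jan-Mar)", 3),   # Jan, Feb, Mar
              "Apr (Easter)",                    # Apr
              "Early Rainy (May)",               # May
              rep("Peak Rainy (Jun-Sep)", 4),    # Jun to Sep
              rep("Late Rainy (Oct-Nov)", 2),    # Oct, Nov
              "Dec (Christmas)")                 # Dec
  # unique() keeps first-appearance order, so factor levels follow the calendar
  factor(lookup[month], levels = unique(lookup))
}

demand_period_for_month(c(1, 4, 7, 10, 12))
```

A fixed month-to-label mapping is simple but blunt for Easter, which moves. Easter Sunday fell on 31 March in 2024 and on 20 April in 2025, so an April-only Easter label captures only part of the 2024 holiday travel.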
```r
season_data <- stations_long |> filter(pms_vol > 0, !is.na(demand_period))

# One-way ANOVA on log volume across demand periods
season_anova <- aov(log1p(pms_vol) ~ demand_period, data = season_data)
cat("=== ANOVA: PMS Volume ~ Demand Period ===\n")
print(summary(season_anova))
```

```
=== ANOVA: PMS Volume ~ Demand Period ===
                Df Sum Sq Mean Sq F value   Pr(>F)
demand_period    5   14.4  2.8845   7.612 3.94e-07 ***
Residuals     4290 1625.6  0.3789
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
```r
# Boxplot of PMS volume by demand period
stations_long |>
  filter(pms_vol > 0) |>
  mutate(demand_period = factor(demand_period,
                                levels = c("Dry Season (Jan-Mar)", "Apr (Easter)",
                                           "Early Rainy (May)", "Peak Rainy (Jun-Sep)",
                                           "Late Rainy (Oct-Nov)", "Dec (Christmas)"))) |>
  ggplot(aes(x = demand_period, y = pms_vol / 1000, fill = demand_period)) +
  geom_boxplot(outlier.alpha = 0.3, outlier.size = 1) +
  scale_fill_manual(values = c("Dry Season (Jan-Mar)" = "#E8B84B",
                               "Apr (Easter)"         = "#F4A261",
                               "Early Rainy (May)"    = "#90BE6D",
                               "Peak Rainy (Jun-Sep)" = "#2E86AB",
                               "Late Rainy (Oct-Nov)" = "#577590",
                               "Dec (Christmas)"      = "#E63946")) +
  scale_y_continuous(labels = label_comma()) +
  labs(title = "PMS Volume Distribution by Demand Period — 2024 & 2025 Combined",
       subtitle = "Each box summarises all station-months in one demand period (median and spread)",
       x = NULL, y = "Monthly PMS Volume (000 Litres)") +
  theme_minimal(base_size = 11) +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = 25, hjust = 1))
```
Result & Interpretation: The ANOVA is significant (F(5, 4290) = 7.61, p = 3.94 × 10⁻⁷), so we reject H₀: not all demand periods produce the same average log PMS volume. The effect is small, however: η² = 14.4 / (14.4 + 1,625.6) ≈ 0.009, so demand period accounts for less than 1% of the variation in log monthly volume. This one-way test does not control for price, which moved sharply over the same months. It therefore shows that seasonal differences exist, but not that they are independent of price; the regression in Technique 5 addresses that. The Tukey post-hoc comparison identifies which specific period pairs differ.

Business implication:

- If December (Christmas) is significantly higher than Peak Rainy (Jun–Sep), the Retail Sales team should pre-position additional stock at high-volume stations from late November to avoid stockouts during the festive surge.
- If rainy-season months are significantly lower, the retail supply chain team should adjust loading plans and credit terms during those months to protect cash flow.

Combined with the price results from Technique 5, this separates what the Retail Sales team can control (price) from what it must plan around (seasons).
8 Technique 4 — Correlation Analysis
Theory recap: Correlation measures the strength and direction of association between two continuous variables (Adi, 2026, Ch. 8). Pearson’s r measures linear association and is sensitive to skew and outliers, so it suits approximately normal variables. Spearman’s ρ, the rank-based non-parametric alternative, measures monotonic association and is better suited to skewed data such as station volumes. Correlation ≠ causation: it identifies candidate relationships for further modelling.
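The difference between the two coefficients matters for data like Rainoil’s, where a few mega-stations sit far above the rest. The sketch below uses simulated data (all values hypothetical) in which volume falls steadily with price, then adds a single extreme point.

```r
# Hypothetical data: volume falls steadily (but non-linearly) as price rises
set.seed(3)
price  <- runif(200, 6, 11)                               # price in ₦100s (hypothetical)
volume <- exp(14 - 0.35 * price + rnorm(200, sd = 0.3))   # right-skewed volumes
volume[which.max(price)] <- 5e6                           # one extreme mega-station-style value

pearson  <- cor(price, volume, method = "pearson")
spearman <- cor(price, volume, method = "spearman")
cat(sprintf("Pearson r = %.2f, Spearman rho = %.2f\n", pearson, spearman))

# cor.test() adds a p-value; exact = FALSE requests the asymptotic p-value
cor.test(price, volume, method = "spearman", exact = FALSE)
```

The rank-based coefficient reflects the steady downward relationship in the bulk of the data. A single extreme value, by contrast, can drag Pearson’s r a long way from it.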
Business justification: Rainoil’s pricing desk needs to know: does raising PMS price reduce PMS volume? Does AGO price track PMS price? Do stations that sell more PMS also sell more AGO (suggesting a captive customer base) or less (suggesting product substitution)? These questions are answered through correlation analysis.
PMS Price ↔ PMS Volume: Negative but weak. The price-only regression in Technique 5 (Model 1) regresses log PMS volume on price and reports R² = 0.0284. This implies a Pearson correlation of about r = −√0.0284 ≈ −0.17 between price and log volume. Higher pump prices are associated with lower monthly sales volume, confirming some price sensitivity across the network, but price on its own explains under 3% of the variation in volume.
PMS Price ↔ AGO Price: Strong positive correlation (ρ > 0.8). Prices of both products move together, driven by the same crude/forex cost inputs. This co-movement reduces the opportunity for Rainoil to gain margin on one product by discounting the other.
PMS Volume ↔ AGO Volume: Positive correlation at the station level: high-volume PMS stations also tend to be high-volume AGO stations. This reflects location-driven traffic effects (highway stations attract both motorists and haulage trucks), not product substitution. Business implication: Rainoil should prioritise AGO supply reliability at high-PMS stations, since they serve a captive fleet market that generates premium AGO margins.
9 Technique 5 — Linear Regression
Theory recap: Ordinary Least Squares (OLS) regression estimates the linear relationship between a continuous outcome variable and one or more predictor variables (Adi, 2026, Ch. 9). Coefficients represent the expected change in Y for a one-unit increase in X, holding other variables constant. Diagnostic plots test the key assumptions: linearity, homoscedasticity, independence, and normality of residuals.
Business justification: A regression model of PMS volume on pump price — controlling for zone, season, festive period, and time trend — directly answers the central question: Is it price or seasonality driving volume changes, and by how much? The coefficients on price, rainy season, and festive months give the pricing desk and supply chain team simultaneous, comparable estimates of each driver’s independent effect.
```
Call:
lm(formula = log_pms ~ pms_price_scaled, data = reg_data)

Residuals:
    Min      1Q  Median      3Q     Max
-7.6692 -0.3202  0.0574  0.3491  2.0051

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)      12.788879   0.058694   217.9   <2e-16 ***
pms_price_scaled -0.072392   0.006463   -11.2   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6092 on 4294 degrees of freedom
Multiple R-squared:  0.02839,	Adjusted R-squared:  0.02816
F-statistic: 125.5 on 1 and 4294 DF,  p-value: < 2.2e-16
```
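Because the outcome is log volume, the price coefficient is a log-point change and needs one conversion before it reads as a percentage. The sketch assumes pms_price_scaled is the pump price in units of ₦100, consistent with the "per ₦100 increase" wording in the coefficient interpretation. The coefficient, standard error and residual degrees of freedom are copied from the Model 1 output.

```r
# Values copied from the Model 1 summary
b        <- -0.072392   # coefficient on pms_price_scaled
se       <-  0.006463   # its standard error
df_resid <-  4294       # residual degrees of freedom

# Log-linear model: a one-unit change multiplies volume by exp(b)
pct_change <- function(coef) 100 * (exp(coef) - 1)

# 95% confidence interval on the log scale, then converted to percent
t_crit  <- qt(0.975, df = df_resid)
ci_coef <- b + c(-1, 1) * t_crit * se

point_pct <- pct_change(b)
ci_pct    <- pct_change(ci_coef)
cat(sprintf("Per ₦100 (assumed scaling): %.2f%% [95%% CI %.2f%% to %.2f%%]\n",
            point_pct, ci_pct[1], ci_pct[2]))

# Worked example: a station selling 300,000 litres/month
baseline_litres <- 300000
litres_lost <- baseline_litres * (exp(b) - 1)
cat(sprintf("Change for a 300,000-litre station: %.0f litres/month\n", litres_lost))
```

Under that assumption, Model 1 implies a fall in volume of about 7.0% per ₦100 (95% CI roughly −8.2% to −5.8%). For a 300,000-litre station, that is about 21,000 litres a month. This is the price-only estimate; for planning, the Model 2 coefficient, which controls for season, festive period, zone and time, is the better guide.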
9.2 Model 2 — Full Model: Price + Season + Festive + Zone + Year Trend
```r
model2 <- lm(log_pms ~ pms_price_scaled +
               season +         # rainy vs dry season
               festive +        # Christmas, Easter, or neither
               zone +           # geopolitical zone
               factor(year) +   # 2024 vs 2025 structural shift
               month_seq,       # continuous time trend
             data = reg_data)
summary(model2)
```

```
                  GVIF
pms_price_scaled 1.534
season           1.654
festive          2.158
zone             1.015
factor(year)     7.291
month_seq        9.377
```
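The two high GVIFs have a simple source in the study design. With one shared calendar, the 1–24 month counter and the 2025 dummy overlap heavily even before any other predictor enters. The sketch below rebuilds just those two columns for a single station and applies VIF = 1/(1 − R²). It also shows the GVIF^(1/(2·Df)) rescaling that car::vif() reports for multi-degree-of-freedom terms. It assumes festive has 2 df (three levels) and zone has 5 df (six zones).

```r
# One station over the study's 24-month calendar (Jan 2024 = 1, ..., Dec 2025 = 24)
month_seq <- 1:24
year_2025 <- as.integer(month_seq > 12)   # 1 for 2025 months, 0 for 2024 months

# VIF for month_seq when the year dummy is the only other predictor: 1 / (1 - R^2)
r2_aux  <- summary(lm(month_seq ~ year_2025))$r.squared
vif_seq <- 1 / (1 - r2_aux)
cat(sprintf("R^2 = %.3f, VIF = %.2f\n", r2_aux, vif_seq))

# Rescale multi-df GVIFs as GVIF^(1/(2*Df)) so they compare with sqrt(VIF)
gvif     <- c(festive = 2.158, zone = 1.015)   # values from the output above
df_term  <- c(festive = 2, zone = 5)           # assumed: 3 festive levels, 6 zones
gvif_adj <- gvif^(1 / (2 * df_term))
print(round(gvif_adj, 3))
```

On their own, the year dummy and the month counter share about three-quarters of their variance (VIF ≈ 4.0). The rescaled values for festive (about 1.21) and zone (about 1.00) are well below the usual concern thresholds.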
```r
# Actual vs. Fitted
augment(model2) |>
  ggplot(aes(x = .fitted, y = log_pms)) +
  geom_point(alpha = 0.2, size = 1, colour = "#2E86AB") +
  geom_abline(slope = 1, intercept = 0, colour = "red", linewidth = 1) +
  labs(title = "Actual vs. Fitted: log(PMS Volume)",
       subtitle = "Points close to the red diagonal indicate good model fit",
       x = "Fitted Values", y = "Actual log(PMS Volume)") +
  theme_minimal(base_size = 12)
```
Coefficient interpretation (for a non-technical manager):
PMS Price (per ₦100 increase): A negative coefficient confirms that price suppresses volume. Because the outcome is log volume, a coefficient b corresponds to a 100 × (e^b − 1) percent change in volume for each ₦100 increase; for small b this is close to 100 × b. For example, a coefficient of −0.08 means a ₦100 price increase reduces PMS volume by about 7.7% at a typical station. For a station selling 300,000 litres/month, that is roughly 23,000 litres lost. Action: Use this as the baseline volume-loss estimate in every pricing review.
Season (Rainy vs Dry): A negative coefficient on seasonRainy Season means rainy season months sell fewer litres even after controlling for price. This is not a price effect — it is flooding, road damage, and reduced traffic in the Niger Delta. Action: Supply chain should reduce loading volumes to Southern stations in June–September and redirect capacity to Northern stations which are less affected by flooding.
Festive — Christmas/New Year (December): A positive coefficient confirms December drives a volume spike beyond what price alone predicts. Action: Pre-position additional stock at highway and urban stations from mid-November to capture the festive surge without stockouts.
Festive — Easter (April): If significant, confirms Easter generates a secondary demand peak. April is also early rainy season, so the festive effect partially offsets the seasonal dip.
Zone effects: Coefficients on the zone dummies show each zone's structural volume advantage relative to the reference zone (the omitted factor level), controlling for price and season. The positive South-West and South-South coefficients most likely reflect population density and commercial activity, which are structural and not manageable through pricing.
Year (2025 vs 2024): A positive coefficient on factor(year)2025 would indicate higher 2025 volumes after controlling for price, season, and zone. That would be consistent with volume recovery as the market adjusted to post-subsidy prices. Because month_seq is also in the model, however, this coefficient measures the step change at the start of 2025 over and above the linear time trend. It does not capture the whole year-on-year difference.
Month Sequence (time trend): Captures any gradual drift in volumes across the 24 months not explained by price, season, or zone.
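Diagnostic findings: [Insert actual Breusch-Pagan results]. The GVIF values above are low for price, season, festive, and zone (1.0–2.2) but high for factor(year) (7.3) and month_seq (9.4). Over 24 months, a linear trend and a 2025 indicator carry largely the same information, so those two coefficients should not be interpreted separately. The model simultaneously answers: “Is the volume pattern driven by what we charge (price), when it rains (season), when people celebrate (festive), or where the station is (zone)?” In doing so, it separates controllable from structural drivers for the first time.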
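The Breusch-Pagan result can be obtained with lmtest::bptest(model2) (Zeileis & Hothorn, 2002). GVIFs like those above are what car::vif() reports for models with factor terms (Fox & Weisberg, 2019). The sketch below shows what the default, studentized form of the test computes. It is self-contained base R and runs on simulated data that is deliberately heteroskedastic. All numbers in it are illustrative, not Rainoil figures.

```r
# Studentized (Koenker) Breusch-Pagan test written out in base R.
# Statistic = n * R^2 from regressing squared OLS residuals on the model's
# regressors; under homoskedasticity it is approx. chi-squared(k - 1).
bp_koenker <- function(model) {
  X  <- model.matrix(model)
  Z  <- X[, colnames(X) != "(Intercept)", drop = FALSE]  # regressors w/o intercept
  e2 <- residuals(model)^2                                # squared residuals
  aux  <- lm(e2 ~ Z)                                      # auxiliary regression
  stat <- nrow(X) * summary(aux)$r.squared
  df   <- ncol(Z)
  c(statistic = stat, df = df,
    p.value = pchisq(stat, df = df, lower.tail = FALSE))
}

# Illustrative simulated data (NOT Rainoil figures): error spread grows with price
set.seed(42)
n       <- 500
price   <- runif(n, 7, 12)                    # hypothetical price, NGN 100 units
sd_err  <- 0.2 + 0.1 * (price - 7)
log_vol <- 12.8 - 0.07 * price + rnorm(n, sd = sd_err)
fit     <- lm(log_vol ~ price)

res <- bp_koenker(fit)
print(res)

# If lmtest is installed, its default bptest() should give the same statistic
if (requireNamespace("lmtest", quietly = TRUE)) print(lmtest::bptest(fit))
```

On the real model the call would be bp_koenker(model2). A small p-value indicates that the residual variance changes with the regressors. The coefficient estimates stay unbiased in that case, but heteroskedasticity-robust standard errors would be the safer basis for significance claims.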
10 Integrated Findings
The five analytical techniques collectively paint a clear and consistent picture of Rainoil’s 2024–2025 retail performance across 193 stations and 24 months.
The central finding is that PMS pump prices and Nigerian seasonal patterns jointly explain volume fluctuations — and they operate through different channels that require different management responses.
Price (Techniques 4 & 5) exerted measurable downward pressure on volumes as pump prices nearly doubled between January 2024 and mid-2025 following subsidy removal. This is supported by the negative Spearman correlation, the downward trend in the scatter plot, and the negative coefficient on pms_price_scaled in the regression. However, the regression also shows that rainy season months suppress volumes independently of price. The model estimates one national rainy-season effect, so it does not separately estimate how concentrated this dip is in South-South stations, where flooding disrupts road access and reduces traffic. December (Christmas/New Year) and April (Easter) generate statistically significant volume spikes above what price and season alone predict. This is consistent with festive travel and generator demand creating genuine seasonal demand surges.
The ANOVA (Technique 3) confirmed that geopolitical zone and demand period are each independently significant, and the 2024 vs 2025 paired t-test (t = −9.19, p < 0.001) showed a significant year-on-year recovery in 2025. The heatmap (Technique 2) and area chart made these patterns visible across the full 24-month panel.
Single integrated recommendation for the Retail Sales team (2026): Rainoil should adopt a seasonally-aware, zone-differentiated retail strategy with three components:
Pricing strategy for 2026: Use the regression price semi-elasticity (the percentage volume change per ₦100) as the baseline for all pump price decisions. Any ₦100 increase above a zone's competitive benchmark should be evaluated against the projected volume loss. The Retail Sales team should accept no price increase whose margin gain is outweighed by the volume loss at that zone's average throughput. Zones with weaker structural demand (North-Central, North-West, North-East) are likely to be more price-elastic and to require tighter price discipline than South-West or South-South, where structural demand is stronger. Confirming zone-specific elasticities would require adding a price × zone interaction to the regression.
Seasonal supply chain planning: Reduce depot loading allocations to South-South stations during June–September (peak rainy season) to avoid stranded stock at flood-affected forecourts, and redirect freed haulage capacity to South-West and North-Central stations. Pre-position a minimum of 15–20% above baseline stock at all top-35 stations from the second week of November to absorb the December festive demand surge — the analysis shows December is consistently the highest-volume month in the network.
Network expansion sequencing: The 14 stations newly commissioned in 2025 demonstrate strong ramp-up volumes, validating the expansion strategy. For 2026, the Strategy team recommends prioritising commissioning of new stations during the dry season window (January–March) so new sites reach full operational capacity before the rainy season dip. Area-level analysis identifies North-Central and South-East 2 as underserved relative to their traffic potential — recommended as priority expansion targets.
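The pricing rule in the first component can be made concrete as a break-even calculation. The sketch below rests on two hypothetical assumptions. First, a ₦100 pump price rise passes in full into the per-litre retail margin. Second, volume scales by e^β per ₦100, with β = -0.08 taken from the illustrative coefficient in the Section 9.2 interpretation. The margin and throughput figures are placeholders, not Rainoil data.

```r
# Hypothetical break-even check for a pump price increase.
# Assumptions (illustrative only): the increase is added in full to the
# per-litre margin, and volume scales by exp(beta) per NGN 100.
margin_change <- function(volume, margin, beta = -0.08, increase = 100) {
  new_volume <- volume * exp(beta * increase / 100)
  gain_on_retained <- increase * new_volume            # extra margin on litres still sold
  loss_on_lost     <- margin * (volume - new_volume)   # margin forgone on litres lost
  c(litres_lost = volume - new_volume,
    net_margin_change = gain_on_retained - loss_on_lost)
}

# Margin per litre at which the increase leaves gross margin unchanged:
# (margin + increase) * k = margin, with k = exp(beta * increase / 100)
break_even_margin <- function(beta = -0.08, increase = 100) {
  k <- exp(beta * increase / 100)
  increase * k / (1 - k)
}

# Placeholder case: 300,000 litres/month and an assumed NGN 60/litre margin
placeholder_case <- margin_change(volume = 300000, margin = 60)
print(placeholder_case)
print(break_even_margin())
```

For β = -0.08, break_even_margin() is about ₦1,200 per litre. Above that existing margin, the volume lost outweighs the gain on retained litres. Substituting a zone-specific β would give the zone-level threshold the pricing rule calls for.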
11 Limitations & Further Work
Price endogeneity: The regression treats pump price as exogenous, but Rainoil's pricing decisions may themselves respond to local demand conditions and competitor behaviour, creating simultaneity bias. An instrumental-variable approach would address this. Depot loading prices are a candidate instrument because they shift pump prices through supply costs, provided they affect station volumes only through the pump price.
Seasonality definition: The rainy season variable uses a single broad national definition (April–October). In reality, the rainy season starts earlier and ends later in the South-South than in the North. A zone-specific seasonality variable would improve precision, for example by defining separate rainy season windows for Niger Delta stations and FCT stations.
Festive period granularity: Only Christmas (December) and Easter (April) are coded as festive, and Easter does not always fall in April (Easter Sunday was 31 March in 2024 and 20 April in 2025). Eid al-Fitr and Eid al-Adha, major demand drivers in the North, follow the Islamic lunar calendar and shift to earlier Gregorian dates each year; they are not captured. Adding Islamic calendar festive dummies would improve model fit for North-West and North-Central stations.
Panel structure not exploited: This analysis treats the data as a pooled cross-section. A fixed-effects panel model (with station fixed effects) would control for time-invariant station characteristics (location quality, catchment area, competitor proximity) and produce more reliable price elasticity and seasonality estimates. Pooling also treats each station's 24 monthly observations as independent, so the reported standard errors are likely too small. Standard errors clustered by station would correct this.
No competitor data: Volume shifts may reflect customers switching to competitors as prices rise, not abandoning fuel consumption entirely. Incorporating competitor pump price data would allow market-level demand elasticity to be separated from competitive share effects.
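The fixed-effects remedy for pooled data can be illustrated with a simulated panel. The sketch below uses base R, hypothetical variable names (station_id, price, log_vol), and invented parameters. It builds a 193-station by 24-month panel in which higher-quality stations also charge more. Pooled OLS then understates the price effect, while adding station fixed effects recovers it. The sketch demonstrates the mechanism only and says nothing about the size of any bias in Rainoil's data.

```r
# Simulated station-month panel (illustrative parameters, NOT Rainoil data)
set.seed(2026)
n_station <- 193; n_month <- 24; beta_true <- -0.08

# station_id and month_seq are hypothetical names; station varies fastest
panel <- expand.grid(station_id = factor(1:n_station), month_seq = 1:n_month)

premium <- rnorm(n_station, sd = 0.5)                  # station price premium (NGN 100 units)
quality <- 0.6 * premium + rnorm(n_station, sd = 0.2)  # unobserved site quality, tied to premium

s <- as.integer(panel$station_id)
panel$price <- seq(7, 12, length.out = n_month)[panel$month_seq] +  # common price path
  premium[s] + rnorm(nrow(panel), sd = 0.5)
panel$log_vol <- 12 + quality[s] + beta_true * panel$price +
  rnorm(nrow(panel), sd = 0.3)

pooled <- lm(log_vol ~ price, data = panel)               # ignores station identity
fe     <- lm(log_vol ~ price + station_id, data = panel)  # station fixed effects

est <- c(true          = beta_true,
         pooled        = unname(coef(pooled)["price"]),
         fixed_effects = unname(coef(fe)["price"]))
print(est)
```

The pooled estimate is pulled towards zero because site quality is omitted and correlated with price. The station dummies absorb that quality, so the fixed-effects estimate is identified from price movements within each station over time.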
12 References
Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making — from data fundamentals to machine learning in Python and R. Lagos Business School / markanalytics.online. https://markanalytics.online
Aigbobhiose, S. (2025). Rainoil Limited retail sales summary analysis report 2024–2025 [Dataset]. Strategy and Business Development Department, Rainoil Limited, Lagos, Nigeria. Data available on request from the author.
Allaire, J. J., Teague, C., Scheidegger, C., Xie, Y., & Dervieux, C. (2022). Quarto (Version 1.x) [Computer software]. https://doi.org/10.5281/zenodo.5960048
Firke, S. (2024). janitor: Simple tools for examining and cleaning dirty data (R package version 2.2.1). https://doi.org/10.32614/CRAN.package.janitor
Fox, J., & Weisberg, S. (2019). An R companion to applied regression (3rd ed.). Sage. https://www.john-fox.ca/Companion/
Pedersen, T. L. (2025). patchwork: The composer of plots (R package version 1.3.2). https://doi.org/10.32614/CRAN.package.patchwork
R Core Team. (2024). R: A language and environment for statistical computing (Version 4.x). R Foundation for Statistical Computing. https://www.R-project.org/
Robinson, D., Hayes, A., Couch, S., & Hvitfeldt, E. (2026). broom: Convert statistical objects into tidy tibbles (R package version 1.0.12). https://doi.org/10.32614/CRAN.package.broom
Waring, E., Quinn, M., McNamara, A., Arino de la Rubia, E., Zhu, H., & Ellis, S. (2026). skimr: Compact and flexible summaries of data (R package version 2.2.2). https://doi.org/10.32614/CRAN.package.skimr
Wei, T., & Simko, V. (2024). corrplot: Visualization of a correlation matrix (R package version 0.95). https://github.com/taiyun/corrplot
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer. https://doi.org/10.1007/978-3-319-24277-4
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686
Wickham, H., & Bryan, J. (2025). readxl: Read Excel files (R package version 1.4.5). https://doi.org/10.32614/CRAN.package.readxl
Wickham, H., Pedersen, T. L., & Seidel, D. (2025). scales: Scale functions for visualization (R package version 1.4.0). https://doi.org/10.32614/CRAN.package.scales
Zeileis, A., & Hothorn, T. (2002). Diagnostic checking in regression relationships. R News, 2(3), 7–10. https://CRAN.R-project.org/doc/Rnews/
Zhu, H. (2024). kableExtra: Construct complex table with ‘kable’ and pipe syntax (R package version 1.4.0). https://doi.org/10.32614/CRAN.package.kableExtra
13 Appendix: AI Usage Statement
Claude (Anthropic) was used to assist with the initial structuring of the Quarto document template, suggesting appropriate R packages for each analytical technique, and drafting skeleton code for the data reshaping pipeline. All analytical decisions — including the selection of Spearman over Pearson correlation given the skewed distribution of fuel volumes, the choice of log-transformation for the regression outcome, the decision to use ANOVA for the zone comparison and a paired t-test for the H1/H2 comparison, and the interpretation of all outputs in the context of Rainoil’s business — were made independently by the author. The business recommendations are the author’s own conclusions drawn from the data. All code was reviewed, tested, and understood line by line before submission.