Gas Consumption & Billing Analytics — Mall Facilities

Dharmattan Nigeria Limited | May 2025 – May 2026

Author

Ifeoluwa Olosunde

Published

May 25, 2026


1. Executive Summary

This study analyses thirteen months of anonymised natural gas billing and meter-reading records (May 2025 – May 2026) covering 11 commercial tenants across four categories — Restaurant, Café, Bakery, and Cinema — supplied by Dharmattan Nigeria Limited, Lagos, Nigeria. A total of 135 billing observations were examined to understand gas consumption patterns, billing amounts, and payment behaviour.

Key findings: (1) Cinema and Bakery tenants are the highest gas consumers per billing cycle, averaging roughly 692 and 646 litres respectively; (2) gas consumption is almost perfectly correlated with billing amount (r = 0.99), consistent with accurate conversion of meter readings into charges; (3) ANOVA and Kruskal-Wallis both indicate significant differences in payment delay across tenant categories (p < 0.001), with Cinema tenants averaging 25 days versus 12 days for Restaurants; (4) regression identifies the Restaurant category (β = −8.43 days, p = 0.003) as the strongest driver of prompt payment, while the billing model built on gas consumption explains 99% of billing amount variance.

Recommendation: Implement a tiered collections calendar — dedicated Cinema follow-up from Day 7, Bakery automated reminders at Day 14, standard Café reminders at Day 21, and no change for self-resolving Restaurants.

  • Billing Records: 135
  • Tenant Units: 11
  • Categories: 4
  • Billing Months: 13
  • Consumption–Bill r: 0.99
  • Avg Days to Pay: 17 days

2. Professional Disclosure

Role: Facilities / Mall Operations Analyst
Organisation: Dharmattan Nigeria Limited, Lagos, Nigeria
Sector: Energy & Utilities / Commercial Gas Supply to Retail Mall Tenants

Technique Justifications

1. Exploratory Data Analysis (EDA): As a facilities analyst responsible for utility cost recovery at Dharmattan Nigeria Limited, EDA allows me to detect anomalous meter readings, missing payments, and consumption spikes before they escalate into billing disputes. It forms the diagnostic foundation of monthly billing reviews and is the first step taken when onboarding a new billing cycle.

2. Data Visualisation: Monthly and categorical consumption charts are presented to mall management and the Dharmattan finance team in routine operational reports. Visualisation translates raw billing records into actionable intelligence — identifying which tenant group is straining the gas infrastructure and communicating patterns to non-technical stakeholders such as property managers.

3. Hypothesis Testing (ANOVA / Kruskal-Wallis): The finance team requires evidence-based justification to implement differentiated payment-chasing protocols. Formal hypothesis testing provides the statistical basis for claiming that tenant categories genuinely differ in gas consumption and payment behaviour — not just by chance — and shields the recommendation from challenge during management review.

4. Correlation Analysis: Understanding whether consumption volume drives billing amount (meter integrity check) and whether bill size influences payment delay (financial stress indicator) informs both engineering maintenance schedules and Dharmattan’s credit risk policy for tenant onboarding.

5. Linear Regression: Regression identifies the specific operational levers — tenant type, monthly consumption, billing period — that most influence revenue collection performance, giving Dharmattan management a prioritised, quantified action list rather than anecdotal observations.


3. Data Collection & Sampling

Source & Collection Method

Data was extracted from Dharmattan Nigeria Limited’s monthly gas billing management ledger maintained by the Facilities department. Billing records are generated from digital meter readings taken on the first working day of each month by the facilities engineer. The dataset was exported to CSV format and anonymised by replacing tenant business names with coded identifiers (TSR001, TAF002, etc.) before analysis.

Sampling Frame & Period

  • Population: All 11 metered gas tenants serviced by Dharmattan Nigeria Limited at the mall
  • Sample: All 11 tenants with continuous meter records across the study period (census — no sampling error)
  • Period: May 2025 – May 2026 (13 billing cycles)
  • Observations: 135 billing records retained after data cleaning (2 removed: 1 negative days-to-payment entry; 1 billing period with incomplete meter data)
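  • Statistical rationale: With 135 observations across 4 tenant categories, the dataset comfortably supports one-way ANOVA for large effects at α = 0.05 and multiple linear regression with up to 6 predictors. Power for a medium effect (Cohen's f = 0.25) falls short of the conventional 0.80, which needs roughly 45 observations per group in a balanced design; group sizes here are unequal (13 to 51), so small between-category differences may go undetected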
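
The power statement above can be checked with base R. The sketch below is illustrative only: it assumes a balanced design (the real groups are unequal) and uses Cohen's conventional medium effect, f = 0.25, which is a convention and not a figure taken from the billing data.

```r
# Power of a balanced one-way ANOVA for a "medium" effect (Cohen's f = 0.25).
# Assumption: equal group sizes; the actual tenant categories are unbalanced.
k <- 4                           # tenant categories
f <- 0.25                        # Cohen's f, "medium" by convention
n_per_group <- floor(135 / k)    # 33 per group if 135 records were split evenly

# power.anova.test() expects the variance of the group means (denominator k - 1),
# so convert from Cohen's f, where f^2 = sum((mu_i - mu)^2) / (k * sigma^2).
between_var <- f^2 * k / (k - 1)

# Power achieved with the available sample size
achieved <- power.anova.test(groups = k, n = n_per_group,
                             between.var = between_var, within.var = 1,
                             sig.level = 0.05)

# Per-group sample size needed for 80% power
required <- power.anova.test(groups = k, power = 0.80,
                             between.var = between_var, within.var = 1,
                             sig.level = 0.05)

achieved$power
required$n
```

Comparing `achieved$power` with 0.80, and `required$n` with the actual group sizes, shows how far the design is from the conventional target for medium effects.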

Ethical Notes

All tenant business names have been replaced with anonymous codes. No personally identifiable information (PII) is retained. Data was shared with verbal approval of the Facilities Manager and is published in aggregate form consistent with Dharmattan Nigeria Limited’s internal data-sharing policy. No external ethical clearance was required as the data concerns operational billing records, not human subjects.

Data Quality Notes

  • 24 records carry no payment date: 23 come from the March–May 2026 billing cycles and represent invoices not yet settled at the time of extraction, and 1 has a missing billing date; all 24 are excluded from payment-delay analyses but retained for consumption and billing analyses
  • 1 record showed a negative days-to-payment value (payment recorded before the billing date, a data entry error); it was removed from the dataset during cleaning
  • No missing values were detected in meter readings; 4 records (all Bakery) lack consumption and billing amounts and therefore drop out of the consumption tests and the billing model
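
A per-column missingness count is a quick way to reproduce checks like those listed above. The data frame below is a toy stand-in with made-up values and hypothetical tenant codes, not an extract of the ledger.

```r
# Toy ledger extract; all values are illustrative, not taken from the dataset.
ledger <- data.frame(
  tenant_id          = c("T001", "T002", "T003", "T004"),
  meter_start        = c(100, 250, 400, 90),
  meter_end          = c(130, 290, 455, 120),
  consumption_litres = c(198.0, NA, 363.0, 198.0),
  amount_billed      = c(160000, NA, 300000, 161000),
  date_paid          = c("12/06/2025", "20/06/2025", NA, NA),
  stringsAsFactors   = FALSE
)

# Number of missing values in each column
missing_by_col <- colSums(is.na(ledger))
missing_by_col

# Rows usable for each family of analysis
usable_consumption <- sum(!is.na(ledger$consumption_litres))
usable_payment     <- sum(!is.na(ledger$date_paid))
```

Keeping separate counts per analysis makes it explicit that a record can be usable for consumption work while being excluded from payment-delay work, and vice versa.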

4. Data Description & EDA

Code
# ── Load & clean ───────────────────────────────────────────────
library(tidyverse)   # read_csv(), dplyr verbs, stringr, ggplot2
library(lubridate)   # dmy()
library(skimr)       # skim()
library(scales)      # comma()
library(patchwork)   # plot composition with | and /
# kbl_dnl(), theme_dnl(), cat_colours and the dnl_* colours are assumed to come from the report's setup chunk
df_raw <- read_csv("DATA.csv", skip = 1, col_names = TRUE, show_col_types = FALSE)

df_raw <- df_raw %>%
  select(1:11) %>%
  rename(
    sn                 = 1,
    tenant_id          = 2,
    tenant_category    = 3,
    meter_start        = 4,
    meter_end          = 5,
    consumption_m3     = 6,
    conversion_factor  = 7,
    consumption_litres = 8,
    amount_billed      = 9,
    billing_date       = 10,
    date_paid          = 11
  )

df <- df_raw %>%
  filter(tenant_category %in% c("Bakery", "Restaurant", "Café", "Cinema")) %>%
  mutate(
    across(c(meter_start, meter_end, consumption_m3,
             conversion_factor, consumption_litres, amount_billed),
           ~ as.numeric(str_remove_all(as.character(.), ","))),
    billing_date    = dmy(billing_date),
    date_paid       = suppressWarnings(dmy(date_paid)),
    days_to_payment = as.numeric(date_paid - billing_date),
    billing_month   = format(billing_date, "%Y-%m"),
    paid            = !is.na(date_paid),
    tenant_category = factor(tenant_category,
                             levels = c("Bakery", "Café", "Cinema", "Restaurant"))
  ) %>%
  filter(is.na(days_to_payment) | days_to_payment >= 0)

cat("Dataset:", nrow(df), "observations |", ncol(df), "variables\n")
Dataset: 135 observations | 14 variables
Code
cat("Categories:", levels(df$tenant_category), "\n")
Categories: Bakery Café Cinema Restaurant 
Code
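# n_distinct() counts NA as a value: 13 calendar months plus one record with a missing billing date
cat("Billing months:", n_distinct(df$billing_month), "\n")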
Billing months: 14 
Code
cat("Unique tenants:", n_distinct(df$tenant_id), "\n")
Unique tenants: 11 
Code
cat("Unpaid (excluded from payment analysis):", sum(!df$paid), "\n")
Unpaid (excluded from payment analysis): 24 

4.1 Variable Overview

Code
df %>%
  select(tenant_category, consumption_litres, amount_billed,
         days_to_payment, meter_start, meter_end, conversion_factor) %>%
  skim() %>%
  kbl_dnl(cap = "Descriptive Statistics — Key Variables | Dharmattan Nigeria Limited")
Descriptive Statistics — Key Variables | Dharmattan Nigeria Limited
skim_type skim_variable n_missing complete_rate factor.ordered factor.n_unique factor.top_counts numeric.mean numeric.sd numeric.p0 numeric.p25 numeric.p50 numeric.p75 numeric.p100 numeric.hist
factor tenant_category 0 1.0000000 FALSE 4 Res: 51, Caf: 39, Bak: 32, Cin: 13 NA NA NA NA NA NA NA NA
numeric consumption_litres 4 0.9703704 NA NA NA 390.02496 2.952196e+02 4.410 169.9 368.11 506.815 1472.43 ▅▇▂▁▁
numeric amount_billed 4 0.9703704 NA NA NA 347598.71603 2.531056e+05 5179.970 149949.9 328565.69 469945.925 1199115.90 ▆▇▃▁▁
numeric days_to_payment 24 0.8222222 NA NA NA 17.14414 9.980751e+00 2.000 8.0 15.00 26.000 53.00 ▇▅▅▁▁
numeric meter_start 0 1.0000000 NA NA NA 2992.23343 3.677224e+03 0.234 889.5 1702.00 2346.500 13345.00 ▇▁▁▁▁
numeric meter_end 0 1.0000000 NA NA NA 3027.12524 3.691170e+03 78.123 910.5 1720.00 2374.704 13345.00 ▇▁▁▁▁
numeric conversion_factor 0 1.0000000 NA NA NA 16.42111 9.984918e+00 4.870 6.6 15.93 28.320 28.32 ▇▁▃▁▇

4.2 Data Quality Issues Identified

Code
# Issue 1 — Missing payment dates
unpaid <- df %>% filter(!paid) %>% count(billing_month)
kbl_dnl(unpaid, cap = "Issue 1: Invoices Without Payment Date by Month")
Issue 1: Invoices Without Payment Date by Month
billing_month n
2026-03 1
2026-04 11
2026-05 11
NA 1
Code
# Issue 2 — Consumption outliers (IQR method)
q1  <- quantile(df$consumption_litres, 0.25, na.rm = TRUE)
q3  <- quantile(df$consumption_litres, 0.75, na.rm = TRUE)
iqr <- q3 - q1

outliers <- df %>%
  filter(consumption_litres < (q1 - 1.5 * iqr) |
         consumption_litres > (q3 + 1.5 * iqr)) %>%
  select(tenant_id, tenant_category, billing_month,
         consumption_litres, amount_billed)

kbl_dnl(outliers, cap = "Issue 2: Outlier Consumption Records (IQR Method)")
Issue 2: Outlier Consumption Records (IQR Method)
tenant_id tenant_category billing_month consumption_litres amount_billed
TSR001 Bakery 2025-05 1444.12 1190582.3
TSR001 Bakery 2025-06 1472.43 1199115.9
TSR001 Bakery 2025-07 1274.22 1037831.1
TSR001 Bakery 2025-08 1359.17 1079645.6
TSR001 Bakery 2025-09 1189.27 926894.4

Handling Issue 1: The 24 unpaid invoices are concentrated in March–May 2026 billing cycles — current invoices, not delinquent accounts. Excluded from payment-delay analyses but retained for consumption and billing amount analyses.

Handling Issue 2: All five outlier records belong to tenant TSR001 (Bakery), driven by consistently high gas usage from a commercial oven operation. Records retained — removing them would distort the Bakery profile and hide a genuine operational insight about high-volume industrial tenants.
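
The IQR rule can be verified by hand from the quartiles printed in the descriptive statistics table. This is a sketch that assumes the printed quartiles (169.9 and 506.815) are the exact values used by `quantile()`.

```r
# Tukey fences rebuilt from the printed quartiles of consumption_litres
q1  <- 169.9      # numeric.p25 in the summary table
q3  <- 506.815    # numeric.p75 in the summary table
iqr <- q3 - q1

lower_fence <- q1 - 1.5 * iqr   # negative, so no low-side outliers are possible
upper_fence <- q3 + 1.5 * iqr

# The five TSR001 readings listed in the outlier table
flagged <- c(1444.12, 1472.43, 1274.22, 1359.17, 1189.27)

upper_fence
all(flagged > upper_fence)
```

Because the lower fence is negative, only unusually high consumption can be flagged by this rule, which matches the one-sided outlier table.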

4.3 Distributions

Code
p1 <- ggplot(df, aes(x = consumption_litres)) +
  geom_histogram(bins = 25, fill = dnl_navy, colour = "white", alpha = 0.9) +
  geom_vline(xintercept = median(df$consumption_litres, na.rm = TRUE),
             colour = dnl_orange, linetype = "dashed", linewidth = 0.9) +
  scale_x_continuous(labels = comma) +
  annotate("text", x = median(df$consumption_litres, na.rm=TRUE) + 40,
           y = Inf, vjust = 1.6, hjust = 0, size = 3, colour = dnl_orange,
           label = "median") +
  labs(title = "Gas Consumption (Litres)",
       subtitle = "Right-skewed — Bakery & Cinema drive the upper tail",
       x = "Litres", y = "Count") +
  theme_dnl()

p2 <- ggplot(df, aes(x = amount_billed)) +
  geom_histogram(bins = 25, fill = dnl_orange, colour = "white", alpha = 0.9) +
  geom_vline(xintercept = median(df$amount_billed, na.rm = TRUE),
             colour = dnl_navy, linetype = "dashed", linewidth = 0.9) +
  scale_x_continuous(labels = comma) +
  labs(title = "Billing Amount (₦)",
       subtitle = "Mirrors consumption — confirms metering linearity",
       x = "₦", y = "Count") +
  theme_dnl()

p3 <- ggplot(df %>% filter(paid), aes(x = days_to_payment)) +
  geom_histogram(bins = 20, fill = dnl_teal, colour = "white", alpha = 0.9) +
  geom_vline(xintercept = mean(df$days_to_payment, na.rm = TRUE),
             colour = dnl_orange, linetype = "dashed", linewidth = 0.9) +
  annotate("text", x = mean(df$days_to_payment, na.rm=TRUE) + 1.5,
           y = Inf, vjust = 1.6, hjust = 0, size = 3, colour = dnl_orange,
           label = "mean") +
  labs(title = "Days to Payment",
       subtitle = "Mean ≈ 17 days; most tenants settle within three weeks",
       x = "Days", y = "Count") +
  theme_dnl()

p4 <- df %>%
  count(tenant_category) %>%
  ggplot(aes(x = reorder(tenant_category, n), y = n,
             fill = tenant_category)) +
  geom_col(show.legend = FALSE, width = 0.65, alpha = 0.92) +
  geom_text(aes(label = n), hjust = -0.3, size = 3.5,
            colour = "#2D3748", fontface = "bold") +
  scale_fill_manual(values = cat_colours) +
  coord_flip() +
  scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
  labs(title = "Billing Records by Category",
       subtitle = "Restaurants most frequent; Cinema least frequent",
       x = NULL, y = "Count") +
  theme_dnl() +
  theme(panel.grid.major.y = element_blank())

(p1 | p2) / (p3 | p4) +
  plot_annotation(
    title   = "Figure 1 — Distribution of Key Billing Variables",
    caption = "Source: Dharmattan Nigeria Limited billing ledger | May 2025 – May 2026",
    theme   = theme(
      plot.title   = element_text(colour = dnl_navy, face = "bold", size = 14),
      plot.caption = element_text(colour = "#9CA3AF", size = 9)
    )
  )

Interpretation: Both gas consumption and billing amounts are right-skewed, driven by Bakery tenant TSR001 and the Cinema operator. Days-to-payment is less skewed, with a median of 15 days and a mean of 17, though a tail extends to 53 days. Restaurants generate the most invoices (51 of 135), reflecting multiple separately metered restaurant units within the mall.


5. Data Visualisation

Narrative: A 13-month story of which of Dharmattan Nigeria Limited’s mall tenants use the most gas, when consumption peaks, and who pays promptly, told through five complementary chart types.

Code
# Plot 1 — Monthly total consumption trend
monthly_df <- df %>%
  filter(!is.na(billing_month)) %>%   # drop the record with no billing date
  group_by(billing_month) %>%
  summarise(total = sum(consumption_litres, na.rm = TRUE))

ggplot(monthly_df, aes(x = billing_month, y = total, group = 1)) +
  geom_area(fill = dnl_navy, alpha = 0.12) +
  geom_line(colour = dnl_navy, linewidth = 1.4) +
  geom_point(colour = dnl_orange, size = 3.5, shape = 21,
             fill = "white", stroke = 2) +
  geom_text(aes(label = comma(round(total, 0))),
            vjust = -1.2, size = 2.8, colour = dnl_navy, fontface = "bold") +
  scale_y_continuous(labels = comma, expand = expansion(mult = c(0.05, 0.2))) +
  labs(title = "Total Monthly Gas Consumption — All Tenants",
       subtitle = "Dharmattan Nigeria Limited | May 2025 – May 2026 | 11 tenant units",
       caption = "Source: Dharmattan Nigeria Limited billing ledger",
       x = NULL, y = "Total Litres Consumed") +
  theme_dnl() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 9))

Code
# Plot 2 — Consumption by category
ggplot(df, aes(x = reorder(tenant_category, consumption_litres, median),
               y = consumption_litres, fill = tenant_category)) +
  geom_boxplot(show.legend = FALSE, outlier.colour = dnl_orange,
               outlier.size = 2.5, outlier.alpha = 0.8,
               width = 0.55, alpha = 0.88, linewidth = 0.5) +
  geom_jitter(aes(colour = tenant_category), show.legend = FALSE,
              width = 0.15, alpha = 0.35, size = 1.5) +
  scale_fill_manual(values  = cat_colours) +
  scale_colour_manual(values = cat_colours) +
  coord_flip() +
  scale_y_continuous(labels = comma) +
  labs(title = "Gas Consumption Distribution by Tenant Category",
       subtitle = "Orange dots = statistical outliers | jitter shows individual billing periods",
       caption = "Source: Dharmattan Nigeria Limited billing ledger",
       x = NULL, y = "Monthly Consumption (Litres)") +
  theme_dnl() +
  theme(panel.grid.major.y = element_blank())

Code
# Plot 3 — Heatmap consumption by category × month
df %>%
  filter(!is.na(billing_month)) %>%   # avoid an NA column in the heatmap
  group_by(tenant_category, billing_month) %>%
  summarise(avg = mean(consumption_litres, na.rm = TRUE), .groups = "drop") %>%
  ggplot(aes(x = billing_month, y = tenant_category, fill = avg)) +
  geom_tile(colour = "white", linewidth = 0.8) +
  geom_text(aes(label = comma(round(avg, 0))),
            size = 2.8, colour = "white", fontface = "bold") +
  scale_fill_gradient(low = "#C8DCF5", high = dnl_navy,
                      labels = comma, name = "Avg\nLitres") +
  labs(title = "Average Monthly Gas Consumption Heatmap — Category × Month",
       subtitle = "Darker = higher consumption | Bakery peaks Oct–Jan (festive baking season)",
       caption = "Source: Dharmattan Nigeria Limited billing ledger",
       x = NULL, y = NULL) +
  theme_dnl() +
  theme(axis.text.x  = element_text(angle = 45, hjust = 1, size = 8.5),
        panel.grid   = element_blank(),
        legend.position = "right")

Code
# Plot 4 — Payment delay violin + box
df_paid_viz <- df %>% filter(paid, days_to_payment >= 0)

avg_days <- df_paid_viz %>%
  group_by(tenant_category) %>%
  summarise(m = mean(days_to_payment))

ggplot(df_paid_viz,
       aes(x = reorder(tenant_category, days_to_payment, median),
           y = days_to_payment, fill = tenant_category)) +
  geom_violin(show.legend = FALSE, alpha = 0.80, trim = FALSE) +
  geom_boxplot(width = 0.1, fill = "white", outlier.size = 1.8,
               outlier.colour = dnl_orange, linewidth = 0.6,
               show.legend = FALSE) +
  geom_point(data = avg_days,
             aes(x = tenant_category, y = m),
             colour = dnl_orange, size = 3.5, shape = 18,
             inherit.aes = FALSE) +
  scale_fill_manual(values = cat_colours) +
  coord_flip() +
  labs(title = "Payment Delay Distribution by Tenant Category",
       subtitle = "Diamond ◆ = category mean | Wider violin = more variable payment behaviour",
       caption = "Source: Dharmattan Nigeria Limited billing ledger (settled invoices only)",
       x = NULL, y = "Days to Payment") +
  theme_dnl() +
  theme(panel.grid.major.y = element_blank())

Code
# Plot 5 — Billing amount vs days-to-payment scatter
ggplot(df_paid_viz,
       aes(x = amount_billed, y = days_to_payment, colour = tenant_category)) +
  geom_point(alpha = 0.65, size = 2.8) +
  geom_smooth(method = "lm", se = TRUE, linewidth = 1.1,
              aes(fill = tenant_category), alpha = 0.12) +
  scale_colour_manual(values = cat_colours, name = "Tenant Category") +
  scale_fill_manual(values = cat_colours, guide = "none") +
  scale_x_continuous(labels = comma) +
  labs(title = "Does a Higher Bill Lead to Slower Payment?",
       subtitle = "Shaded bands = 95% CI | Cinema slopes upward — larger bills take longer to settle",
       caption = "Source: Dharmattan Nigeria Limited billing ledger",
       x = "Billing Amount (₦)", y = "Days to Payment") +
  theme_dnl() +
  theme(legend.position = "bottom")

Visualisation narrative: Plot 1 (area-line) shows stable but gently declining total consumption through 2026, with monthly values labelled for quick reference. Plot 2 (boxplot + jitter) reveals that Cinema and Bakery are structurally high consumers — their entire distributions sit above Café. Plot 3 (heatmap) pins the time dimension: Bakery consumption darkens Oct–Jan, a festive-season procurement signal for Dharmattan Nigeria Limited. Plot 4 (violin) shows Cinema’s payment spread is the widest of any category — unpredictable even by its own norms. Plot 5 (scatter + CI bands) confirms that for Cinema, higher bills and longer payment delays travel together, while Restaurants show virtually no slope.


6. Hypothesis Testing

6.1 Hypothesis 1 — Gas Consumption Differs Across Tenant Categories

H₀: Mean monthly gas consumption is equal across all tenant categories
H₁: At least one tenant category has a significantly different mean consumption

Code
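library(car)          # leveneTest()
library(effectsize)   # eta_squared()
library(rstatix)      # kruskal_effsize(), dunn_test()
library(broom)        # tidy(), used on the Tukey HSD results below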

# Step 1 — Shapiro-Wilk normality per group
norm_tbl <- df %>%
  group_by(tenant_category) %>%
  summarise(
    n         = n(),
    shapiro_p = round(shapiro.test(consumption_litres)$p.value, 4),
    normal    = ifelse(shapiro.test(consumption_litres)$p.value > 0.05,
                       "✓ Yes", "✗ No")
  )
kbl_dnl(norm_tbl, cap = "Step 1: Normality Test — Shapiro-Wilk (α = 0.05)")
Step 1: Normality Test — Shapiro-Wilk (α = 0.05)
tenant_category n shapiro_p normal
Bakery 32 0.0000 ✗ No
Café 39 0.0011 ✗ No
Cinema 13 0.5231 ✓ Yes
Restaurant 51 0.0003 ✗ No
Code
# Step 2 — Levene's test
leveneTest(consumption_litres ~ tenant_category, data = df)
Levene's Test for Homogeneity of Variance (center = median)
       Df F value  Pr(>F)  
group   3   2.685 0.04943 *
      127                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Assumption verdict: Normality is violated for three of four categories (Bakery, Café, and Restaurant all p < 0.05). Levene’s test returns p ≈ 0.049, indicating borderline unequal variances. The Kruskal-Wallis test is therefore used as the primary inferential test. ANOVA is reported alongside for completeness; its usual robustness to moderate non-normality relies on roughly 30 or more observations per group, which Cinema (n = 13) and Bakery (28 valid readings) do not meet, so the F-test is read as supporting evidence only.
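
When variances look unequal, Welch's one-way test is a common parametric alternative to classical ANOVA. It is not part of the original analysis; the sketch below runs it on synthetic data whose group sizes mimic the four categories, and all means and spreads are invented for illustration.

```r
set.seed(42)

# Synthetic data with unequal group sizes and spreads (NOT the billing data)
sim <- data.frame(
  category = factor(rep(c("Bakery", "Cafe", "Cinema", "Restaurant"),
                        times = c(28, 39, 13, 51))),
  litres   = c(rnorm(28, mean = 650, sd = 400),
               rnorm(39, mean = 180, sd = 90),
               rnorm(13, mean = 690, sd = 150),
               rnorm(51, mean = 330, sd = 200))
)

# Welch's ANOVA: does not assume equal variances across groups
welch <- oneway.test(litres ~ category, data = sim, var.equal = FALSE)

welch$statistic   # Welch F
welch$parameter   # numerator and (fractional) denominator df
welch$p.value
```

On the real data the equivalent call would be `oneway.test(consumption_litres ~ tenant_category, data = df, var.equal = FALSE)`, which could sit alongside the Kruskal-Wallis result as a second robustness check.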

Code
# Step 3a — ANOVA (reported for completeness)
anova_consump <- aov(consumption_litres ~ tenant_category, data = df)
summary(anova_consump)
                 Df  Sum Sq Mean Sq F value   Pr(>F)    
tenant_category   3 4919464 1639821   32.49 1.19e-15 ***
Residuals       127 6410635   50477                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
4 observations deleted due to missingness
Code
eta_squared(anova_consump) %>%
  kbl_dnl(cap = "ANOVA Effect Size (η²) — Gas Consumption")
ANOVA Effect Size (η²) — Gas Consumption
Parameter Eta2
tenant_category 0.4341943
Code
tidy(TukeyHSD(anova_consump)) %>%
  filter(adj.p.value < 0.05) %>%
  kbl_dnl(cap = "Tukey HSD — Significantly Different Pairs (α = 0.05)")
Tukey HSD — Significantly Different Pairs (α = 0.05)
term contrast null.value estimate conf.low conf.high adj.p.value
tenant_category Café-Bakery 0 -467.2368 -612.11777 -322.3559 0.0000000
tenant_category Restaurant-Bakery 0 -312.5463 -450.11970 -174.9728 0.0000002
tenant_category Cinema-Café 0 512.6662 325.34672 699.9856 0.0000000
tenant_category Restaurant-Café 0 154.6906 30.27091 279.1102 0.0082927
tenant_category Restaurant-Cinema 0 -357.9756 -539.70212 -176.2490 0.0000063
Code
# Step 3b — Kruskal-Wallis (primary)
kruskal.test(consumption_litres ~ tenant_category, data = df)

    Kruskal-Wallis rank sum test

data:  consumption_litres by tenant_category
Kruskal-Wallis chi-squared = 66.852, df = 3, p-value = 2.014e-14
Code
df %>% kruskal_effsize(consumption_litres ~ tenant_category) %>%
  kbl_dnl(cap = "Kruskal-Wallis Effect Size (η²[H]): Gas Consumption")
Kruskal-Wallis Effect Size (η²[H]): Gas Consumption
.y. n effsize method magnitude
consumption_litres 135 0.4874236 eta2[H] large
Code
df %>%
  dunn_test(consumption_litres ~ tenant_category,
            p.adjust.method = "bonferroni") %>%
  filter(p.adj < 0.05) %>%
  kbl_dnl(cap = "Dunn's Post-Hoc — Significant Pairs, Consumption (Bonferroni, α = 0.05)")
Dunn's Post-Hoc — Significant Pairs, Consumption (Bonferroni, α = 0.05)
.y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
consumption_litres Bakery Café 28 39 -6.511826 0.0000000 0.0000000 ****
consumption_litres Bakery Restaurant 28 51 -3.668376 0.0002441 0.0014646 **
consumption_litres Café Cinema 39 13 6.703818 0.0000000 0.0000000 ****
consumption_litres Café Restaurant 39 51 3.526519 0.0004211 0.0025264 **
consumption_litres Cinema Restaurant 13 51 -4.495695 0.0000069 0.0000416 ****

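Decision: We reject H₀. ANOVA (F(3, 127) = 32.49, p < 0.001) and Kruskal-Wallis (H(3) = 66.85, p < 0.001) agree. η² = 0.43, so tenant category alone explains 43% of the variance in monthly gas consumption, a large effect. Post-hoc tests find every pair of categories different except Bakery and Cinema.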
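
Both effect sizes reported for this hypothesis can be recomputed from the printed test output. The sketch below uses the sums of squares from the ANOVA table and the H statistic from the Kruskal-Wallis test; the helper name `eta_sq_h` is introduced here for illustration, and n = 135 is the sample size shown in the effect-size table.

```r
# Eta-squared from the ANOVA table: SS_between / (SS_between + SS_residual)
ss_between  <- 4919464
ss_residual <- 6410635
eta_sq <- ss_between / (ss_between + ss_residual)

# Eta-squared based on the Kruskal-Wallis H statistic:
# (H - k + 1) / (n - k), with k groups and n observations
eta_sq_h <- function(h, k, n) (h - k + 1) / (n - k)

round(eta_sq, 3)
round(eta_sq_h(h = 66.852, k = 4, n = 135), 3)
```

Using the 131 complete cases in place of 135 would give a slightly larger η²[H], so the reported value is, if anything, conservative.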

Business interpretation: The type of business a tenant runs is the single biggest determinant of gas load on Dharmattan Nigeria Limited’s network. Procurement contracts and pipeline capacity should be category-weighted, not headcount-weighted.


6.2 Hypothesis 2 — Payment Delay Differs Across Tenant Categories

H₀: Mean days-to-payment is equal across all tenant categories
H₁: At least one tenant category takes significantly longer or shorter to pay

Code
df_paid <- df %>% filter(paid, days_to_payment >= 0)

df_paid %>%
  group_by(tenant_category) %>%
  summarise(
    n           = n(),
    mean_days   = round(mean(days_to_payment), 1),
    median_days = round(median(days_to_payment), 1),
    shapiro_p   = round(shapiro.test(days_to_payment)$p.value, 4),
    normal      = ifelse(shapiro.test(days_to_payment)$p.value > 0.05,
                         "✓ Yes", "✗ No")
  ) %>%
  kbl_dnl(cap = "Step 1: Payment Delay Summary & Normality by Category")
Step 1: Payment Delay Summary & Normality by Category
tenant_category n mean_days median_days shapiro_p normal
Bakery 24 20.0 26 0.0036 ✗ No
Café 33 18.8 21 0.0545 ✓ Yes
Cinema 11 25.4 22 0.0711 ✓ Yes
Restaurant 43 12.1 9 0.0000 ✗ No
Code
# ANOVA
anova_payment <- aov(days_to_payment ~ tenant_category, data = df_paid)
summary(anova_payment)
                 Df Sum Sq Mean Sq F value   Pr(>F)    
tenant_category   3   2128   709.2   8.593 3.66e-05 ***
Residuals       107   8830    82.5                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
eta_squared(anova_payment) %>%
  kbl_dnl(cap = "ANOVA Effect Size (η²) — Payment Delay")
ANOVA Effect Size (η²) — Payment Delay
Parameter Eta2
tenant_category 0.1941585
Code
tidy(TukeyHSD(anova_payment)) %>%
  arrange(adj.p.value) %>%
  kbl_dnl(cap = "Tukey HSD — All Category Pairs (Payment Delay)")
Tukey HSD — All Category Pairs (Payment Delay)
term contrast null.value estimate conf.low conf.high adj.p.value
tenant_category Restaurant-Cinema 0 -13.247357 -21.258182 -5.236533 0.0002063
tenant_category Restaurant-Bakery 0 -7.925388 -13.966382 -1.884393 0.0047891
tenant_category Restaurant-Café 0 -6.732206 -12.219099 -1.245313 0.0095819
tenant_category Cinema-Café 0 6.515151 -1.739219 14.769522 0.1730374
tenant_category Cinema-Bakery 0 5.321970 -3.310657 13.954597 0.3779712
tenant_category Café-Bakery 0 -1.193182 -7.553601 5.167237 0.9612606
Code
# Kruskal-Wallis (primary)
kruskal.test(days_to_payment ~ tenant_category, data = df_paid)

    Kruskal-Wallis rank sum test

data:  days_to_payment by tenant_category
Kruskal-Wallis chi-squared = 19.588, df = 3, p-value = 0.0002066
Code
df_paid %>% kruskal_effsize(days_to_payment ~ tenant_category) %>%
  kbl_dnl(cap = "Kruskal-Wallis Effect Size (η²[H]): Payment Delay")
Kruskal-Wallis Effect Size (η²[H]): Payment Delay
.y. n effsize method magnitude
days_to_payment 111 0.1550306 eta2[H] large
Code
df_paid %>%
  dunn_test(days_to_payment ~ tenant_category,
            p.adjust.method = "bonferroni") %>%
  arrange(p.adj) %>%
  kbl_dnl(cap = "Dunn's Post-Hoc — Payment Delay Pairs (Bonferroni, α = 0.05)")
Dunn's Post-Hoc — Payment Delay Pairs (Bonferroni, α = 0.05)
.y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
days_to_payment Cinema Restaurant 11 43 -3.3834935 0.0007157 0.0042942 **
days_to_payment Bakery Restaurant 24 43 -3.3423691 0.0008307 0.0049840 **
days_to_payment Café Restaurant 33 43 -3.0257351 0.0024803 0.0148818 *
days_to_payment Bakery Café 24 33 -0.5643259 0.5725324 1.0000000 ns
days_to_payment Bakery Cinema 24 11 0.8008383 0.4232253 1.0000000 ns
days_to_payment Café Cinema 33 11 1.2723790 0.2032385 1.0000000 ns

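Decision: We reject H₀. ANOVA (F(3, 107) = 8.59, p < 0.001) and Kruskal-Wallis (H(3) = 19.59, p < 0.001) both indicate a category effect. Cinema tenants take approximately 13 more days than Restaurants. Post-hoc tests (Tukey and Dunn) show that Restaurants pay significantly faster than each of the other three categories, while differences among Bakery, Café, and Cinema are not statistically significant.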

Business interpretation: Restaurants are Dharmattan Nigeria Limited’s most reliable payers (avg 12 days). Cinema has the longest average delay (25 days, based on 11 settled invoices) and the widest spread, although its gap to Bakery and Café is not statistically significant. The size of the Cinema delay still supports a dedicated, earlier collections intervention for Cinema accounts from Day 7 post-invoice.
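
The Bonferroni-adjusted p-values in the Dunn table can be reproduced from the raw p-values with base R. The pair labels below are shorthand introduced for this sketch; the raw values are copied from the table.

```r
# Raw Dunn p-values for the six category pairs (payment delay)
raw_p <- c(
  "Cinema-Restaurant" = 0.0007157,
  "Bakery-Restaurant" = 0.0008307,
  "Cafe-Restaurant"   = 0.0024803,
  "Bakery-Cafe"       = 0.5725324,
  "Bakery-Cinema"     = 0.4232253,
  "Cafe-Cinema"       = 0.2032385
)

# Bonferroni: multiply each p-value by the number of comparisons, cap at 1
adj_p <- p.adjust(raw_p, method = "bonferroni")

round(adj_p, 7)
names(adj_p)[adj_p < 0.05]   # pairs that remain significant
```

Only the three pairs involving Restaurants survive the correction, which is why the collections tiers rest on the size of the delays as well as on significance.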


7. Correlation Analysis

Code
library(corrplot)
library(Hmisc)

cor_df <- df %>%
  select(
    `Gas Consumption (L)` = consumption_litres,
    `Amount Billed (₦)`   = amount_billed,
    `Days to Payment`     = days_to_payment,
    `Meter Start`         = meter_start,
    `Conversion Factor`   = conversion_factor
  ) %>%
  drop_na()

cor_result <- rcorr(as.matrix(cor_df))
pmat <- cor_result$P
diag(pmat) <- 0

par(bg = "#FAFBFC")
corrplot(
  cor_result$r,
  method       = "color",
  type         = "upper",
  order        = "hclust",
  col          = colorRampPalette(c("#E05C2A", "#FAFBFC", "#1B3A6B"))(200),
  addCoef.col  = "#2D3748",
  number.cex   = 0.85,
  number.font  = 2,
  p.mat        = pmat,
  sig.level    = 0.05,
  insig        = "blank",
  tl.col       = "#1B3A6B",
  tl.srt       = 45,
  tl.cex       = 0.88,
  cl.cex       = 0.78,
  title        = "Pearson Correlation Matrix — Dharmattan Nigeria Limited Billing Variables",
  mar          = c(0, 0, 2.5, 0),
  bg           = "#FAFBFC"
)

Code
cor_result$r %>%
  as.data.frame() %>%
  round(3) %>%
  kbl_dnl(cap = "Pearson Correlation Coefficients (all pairs shown; non-significant pairs are blanked in the plot only)")
Pearson Correlation Coefficients (all pairs shown; non-significant pairs are blanked in the plot only)
Gas Consumption (L) Amount Billed (₦) Days to Payment Meter Start Conversion Factor
Gas Consumption (L) 1.000 0.993 0.097 0.795 0.265
Amount Billed (₦) 0.993 1.000 0.121 0.773 0.220
Days to Payment 0.097 0.121 1.000 0.084 -0.134
Meter Start 0.795 0.773 0.084 1.000 0.148
Conversion Factor 0.265 0.220 -0.134 0.148 1.000
Code
cor(cor_df, method = "spearman") %>%
  round(3) %>%
  kbl_dnl(cap = "Spearman Rank Correlations — Non-Parametric Robustness Check")
Spearman Rank Correlations — Non-Parametric Robustness Check
Gas Consumption (L) Amount Billed (₦) Days to Payment Meter Start Conversion Factor
Gas Consumption (L) 1.000 0.991 0.153 0.661 0.220
Amount Billed (₦) 0.991 1.000 0.169 0.669 0.184
Days to Payment 0.153 0.169 1.000 0.183 -0.077
Meter Start 0.661 0.669 0.183 1.000 0.078
Conversion Factor 0.220 0.184 -0.077 0.078 1.000

Key findings:

1. Gas Consumption ↔︎ Amount Billed (r = 0.99, p < 0.001): Near-perfect positive correlation. This is consistent with Dharmattan Nigeria Limited’s billing system converting meter readings to naira charges proportionally; a correlation alone cannot rule out a uniform over- or under-charge.

2. Amount Billed ↔︎ Days to Payment (r ≈ 0.12, non-significant): Weak and non-significant. Bill size is not the primary driver of late payment; tenant category is. Discounts tied to invoice value alone would be misguided.

3. Conversion Factor ↔︎ Consumption (r ≈ 0.26): Tenants with larger conversion factors tend to consume more, though the association is weak. If the factor reflects equipment type (commercial ovens, cinema HVAC), it could serve as an equipment-type proxy in future predictive models.
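
Whether a Pearson coefficient is significant depends on the sample size as well as on r. The sketch below applies the standard t-test for a correlation; the helper name `cor_p_value` is introduced here, and n = 111 (the number of settled invoices) is an assumption, since the exact row count after `drop_na()` is not printed.

```r
# Two-sided p-value for H0: rho = 0, given a Pearson r and sample size n
cor_p_value <- function(r, n) {
  t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
  2 * pt(-abs(t_stat), df = n - 2)
}

n_complete <- 111   # assumption: settled invoices with all five variables present

p_bill_delay   <- cor_p_value(0.121, n_complete)   # Amount Billed vs Days to Payment
p_factor_usage <- cor_p_value(0.265, n_complete)   # Conversion Factor vs Consumption

p_bill_delay
p_factor_usage
```

A coefficient near 0.12 stays above the 0.05 threshold at this sample size, while one near 0.26 falls below it, which is the pattern the blanked cells in the correlation plot should reflect.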


8. Regression Analysis

8.1 Model 1 — What Drives Billing Amount?

Code
library(broom)
library(ggfortify)

df_model <- df %>%
  mutate(month_index = as.numeric(factor(billing_month,
                                         levels = sort(unique(billing_month)))))

model_billing <- lm(
  amount_billed ~ tenant_category + consumption_litres + month_index,
  data = df_model
)

summary(model_billing)

Call:
lm(formula = amount_billed ~ tenant_category + consumption_litres + 
    month_index, data = df_model)

Residuals:
   Min     1Q Median     3Q    Max 
-87356 -13983  -1593  12239  81968 

Coefficients:
                           Estimate Std. Error t value Pr(>|t|)    
(Intercept)                16209.52   10104.78   1.604  0.11121    
tenant_categoryCafé       -22205.43    7991.16  -2.779  0.00630 ** 
tenant_categoryCinema      28916.39    8595.21   3.364  0.00102 ** 
tenant_categoryRestaurant -24435.56    6836.16  -3.574  0.00050 ***
consumption_litres           825.71      10.42  79.277  < 2e-16 ***
month_index                 3210.70     620.77   5.172 8.93e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 25570 on 125 degrees of freedom
  (4 observations deleted due to missingness)
Multiple R-squared:  0.9902,    Adjusted R-squared:  0.9898 
F-statistic:  2522 on 5 and 125 DF,  p-value: < 2.2e-16
Code
tidy(model_billing, conf.int = TRUE) %>%
  mutate(across(where(is.numeric), ~ round(., 3))) %>%
  kbl_dnl(cap = "Model 1 Coefficients — Billing Amount (₦)")
Model 1 Coefficients — Billing Amount (₦)
term                         estimate   std.error  statistic  p.value    conf.low   conf.high
(Intercept)                 16209.523   10104.775      1.604    0.111   -3789.082   36208.128
tenant_categoryCafé        -22205.432    7991.165     -2.779    0.006  -38020.939   -6389.925
tenant_categoryCinema       28916.386    8595.208      3.364    0.001   11905.402   45927.370
tenant_categoryRestaurant  -24435.559    6836.159     -3.574    0.000  -37965.167  -10905.951
consumption_litres            825.706      10.415     79.277    0.000     805.093     846.320
month_index                  3210.698     620.772      5.172    0.000    1982.113    4439.283
Code
glance(model_billing) %>%
  select(r.squared, adj.r.squared, sigma, statistic, p.value, AIC) %>%
  mutate(across(everything(), ~ round(., 4))) %>%
  kbl_dnl(cap = "Model 1 Fit Statistics — Billing Amount")
Model 1 Fit Statistics — Billing Amount
r.squared  adj.r.squared    sigma  statistic  p.value       AIC
   0.9902         0.9898  25572.3    2522.05        0  3038.728
Code
autoplot(model_billing, which = 1:4, ncol = 2,
         colour = dnl_navy, smooth.colour = dnl_orange,
         ad.colour = "#9CA3AF", label.colour = dnl_orange) +
  theme_dnl() +
  plot_annotation(
    title   = "Figure — Model 1 Regression Diagnostics (Billing Amount)",
    caption = "Source: Dharmattan Nigeria Limited billing ledger",
    theme   = theme(plot.title = element_text(colour = dnl_navy, face = "bold"))
  )

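Interpretation: Model 1 achieves R² ≈ 0.99, so 99% of the variance in billing amounts is explained. Each additional litre of gas consumed increases the invoice by approximately ₦826 (95% CI ₦805–₦846), holding category and month constant. Tenant category shifts the intercept: relative to the Bakery baseline, Cinema invoices are about ₦28,900 higher, while Café and Restaurant invoices are about ₦22,200 and ₦24,400 lower at the same consumption. The month_index coefficient is positive and highly significant (≈ ₦3,211 per month, p < 0.001): at constant consumption, invoices rose by roughly ₦3,200 with each successive billing month. Billing amounts are therefore dominated by consumption but have also drifted upward over the study period (for example through tariff or fixed-charge changes), which should be checked against the tariff schedule.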
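To make the coefficient table concrete, the sketch below plugs the rounded Model 1 estimates into the fitted equation by hand and reproduces the confidence interval for the per-litre slope. `coef_m1` and `predict_bill` are hypothetical names introduced for illustration; with the fitted object available, `predict(model_billing, newdata = ...)` would be the usual route.

```r
# Rounded Model 1 estimates copied from the coefficient table.
# Bakery is the reference category, so it has no offset of its own.
coef_m1 <- c(
  intercept  = 16209.52,
  Cafe       = -22205.43,
  Cinema     = 28916.39,
  Restaurant = -24435.56,
  per_litre  = 825.71,
  per_month  = 3210.70
)

# Hypothetical helper: fitted bill (in naira) for one invoice.
predict_bill <- function(category, litres, month_index) {
  offset <- if (category == "Bakery") 0 else coef_m1[[category]]
  coef_m1[["intercept"]] + offset +
    coef_m1[["per_litre"]] * litres +
    coef_m1[["per_month"]] * month_index
}

bakery_500 <- predict_bill("Bakery", litres = 500, month_index = 1)
bakery_501 <- predict_bill("Bakery", litres = 501, month_index = 1)
cinema_500 <- predict_bill("Cinema", litres = 500, month_index = 1)

bakery_501 - bakery_500  # marginal effect of one extra litre
cinema_500 - bakery_500  # Cinema offset at identical consumption and month

# 95% CI for the per-litre slope from its estimate and standard error,
# using the 125 residual degrees of freedom reported by summary().
ci_litre <- 825.706 + c(-1, 1) * qt(0.975, df = 125) * 10.415
round(ci_litre, 2)
```

The hand-computed interval should agree with the conf.low and conf.high columns of the coefficient table up to rounding.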


8.2 Model 2 — What Drives Payment Delay?

Code
model_payment <- lm(
  days_to_payment ~ tenant_category + consumption_litres +
                    amount_billed + month_index,
  data = df_model %>% filter(paid, days_to_payment >= 0)
)

summary(model_payment)

Call:
lm(formula = days_to_payment ~ tenant_category + consumption_litres + 
    amount_billed + month_index, data = df_model %>% filter(paid, 
    days_to_payment >= 0))

Residuals:
    Min      1Q  Median      3Q     Max 
-18.207  -5.491  -1.371   6.184  27.136 

Coefficients:
                            Estimate Std. Error t value Pr(>|t|)    
(Intercept)                2.226e+01  3.937e+00   5.653 1.39e-07 ***
tenant_categoryCafé       -2.251e+00  3.177e+00  -0.708  0.48030    
tenant_categoryCinema      4.704e+00  3.530e+00   1.333  0.18552    
tenant_categoryRestaurant -8.427e+00  2.774e+00  -3.039  0.00301 ** 
consumption_litres        -2.129e-02  2.839e-02  -0.750  0.45488    
amount_billed              2.233e-05  3.422e-05   0.652  0.51558    
month_index               -1.644e-01  3.044e-01  -0.540  0.59033    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 9.169 on 104 degrees of freedom
Multiple R-squared:  0.2021,    Adjusted R-squared:  0.156 
F-statistic:  4.39 on 6 and 104 DF,  p-value: 0.0005407
Code
tidy(model_payment, conf.int = TRUE) %>%
  mutate(across(where(is.numeric), ~ round(., 3))) %>%
  kbl_dnl(cap = "Model 2 Coefficients — Days to Payment")
Model 2 Coefficients — Days to Payment
term                       estimate  std.error  statistic  p.value  conf.low  conf.high
(Intercept)                  22.256      3.937      5.653    0.000    14.449     30.063
tenant_categoryCafé          -2.251      3.177     -0.708    0.480    -8.551      4.050
tenant_categoryCinema         4.704      3.530      1.333    0.186    -2.295     11.703
tenant_categoryRestaurant    -8.427      2.774     -3.039    0.003   -13.927     -2.927
consumption_litres           -0.021      0.028     -0.750    0.455    -0.078      0.035
amount_billed                 0.000      0.000      0.652    0.516     0.000      0.000
month_index                  -0.164      0.304     -0.540    0.590    -0.768      0.439
Code
glance(model_payment) %>%
  select(r.squared, adj.r.squared, sigma, statistic, p.value, AIC) %>%
  mutate(across(everything(), ~ round(., 4))) %>%
  kbl_dnl(cap = "Model 2 Fit Statistics — Payment Delay")
Model 2 Fit Statistics — Payment Delay
r.squared  adj.r.squared  sigma  statistic  p.value       AIC
   0.2021          0.156  9.169     4.3899    5e-04  815.6878
Code
tidy(model_payment, conf.int = TRUE) %>%
  filter(term != "(Intercept)") %>%
  mutate(
    significant = p.value < 0.05,
    term = case_when(
      str_detect(term, "Caf")        ~ "Category: Café vs Bakery",
      str_detect(term, "Cinema")     ~ "Category: Cinema vs Bakery",
      str_detect(term, "Restaurant") ~ "Category: Restaurant vs Bakery",
      term == "consumption_litres"   ~ "Gas Consumption (L)",
      term == "amount_billed"        ~ "Billing Amount (₦)",
      term == "month_index"          ~ "Billing Month (trend)",
      TRUE                           ~ term
    )
  ) %>%
  ggplot(aes(x = reorder(term, estimate),
             y = estimate,
             ymin = conf.low, ymax = conf.high,
             colour = significant, fill = significant)) +
  geom_hline(yintercept = 0, linetype = "dashed",
             colour = dnl_orange, linewidth = 0.8) +
  geom_errorbar(width = 0.25, linewidth = 0.9, alpha = 0.7) +
  geom_point(size = 4, shape = 21, stroke = 1.5) +
  scale_colour_manual(values = c("TRUE" = dnl_navy, "FALSE" = "#9CA3AF"),
                      labels = c("TRUE" = "Significant (p < 0.05)",
                                 "FALSE" = "Not significant (p ≥ 0.05)")) +
  scale_fill_manual(values = c("TRUE" = "#C8DCF5", "FALSE" = "#F3F4F6"),
                    guide = "none") +
  coord_flip() +
  labs(title = "Factors Influencing Days to Payment — Model 2",
       subtitle = "Intervals not crossing the orange dashed line are statistically significant",
       caption = "Source: Dharmattan Nigeria Limited billing ledger",
       x = NULL, y = "Coefficient Estimate (additional days vs. Bakery baseline)",
       colour = NULL) +
  theme_dnl() +
  theme(legend.position = "bottom",
        panel.grid.major.y = element_blank())

Code
autoplot(model_payment, which = 1:4, ncol = 2,
         colour = dnl_teal, smooth.colour = dnl_orange,
         ad.colour = "#9CA3AF", label.colour = dnl_orange) +
  theme_dnl() +
  plot_annotation(
    title   = "Figure — Model 2 Regression Diagnostics (Payment Delay)",
    caption = "Source: Dharmattan Nigeria Limited billing ledger",
    theme   = theme(plot.title = element_text(colour = dnl_navy, face = "bold"))
  )

Interpretation: Model 2 explains 20% of variance in payment delay (R² = 0.20, Adj. R² = 0.16), modest but statistically significant (F(6, 104) = 4.39, p < 0.001). The lower R² is expected — payment behaviour is shaped by factors outside the billing data (tenant cash-flow cycles, bank processing times, account manager relationships).
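The fit statistics quoted above are linked by standard identities, so they can be cross-checked from R² and the degrees of freedom alone. The short sketch below does this with the reported values; the variable names are illustrative.

```r
# Reported Model 2 values: R-squared, number of slope terms, residual df.
r2       <- 0.2021
k        <- 6
df_resid <- 104
n        <- df_resid + k + 1  # 111 paid invoices entered the model

# Overall F statistic and its p-value, derived from R-squared.
f_stat <- (r2 / k) / ((1 - r2) / df_resid)
p_val  <- pf(f_stat, df1 = k, df2 = df_resid, lower.tail = FALSE)

# Adjusted R-squared penalises R-squared for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid

round(c(F = f_stat, p = p_val, adj_R2 = adj_r2), 4)
```

The results should match the summary() output (F ≈ 4.39, Adj. R² ≈ 0.156) up to rounding of the reported R².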

The only statistically significant predictor is tenant category: Restaurant (β = −8.43, p = 0.003). Restaurant tenants pay 8.43 days faster than the Bakery baseline, holding all else constant. Cinema shows β = +4.70 days but does not reach significance in the regression (p = 0.186) once consumption and billing amount are controlled. Neither consumption (p = 0.455) nor billing amount (p = 0.516) is significant either, so the model cannot separate a Cinema category effect from a bill-size effect, and with only 13 Cinema invoices (11 of them paid) the Cinema estimate is imprecise. The regression and hypothesis test are complementary: Cinema tenants are the slowest payers on average (25 days), but the regression does not attribute that delay to high bills specifically.
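One caveat when reading the consumption and billing coefficients in Model 2: the correlation analysis found these two variables move almost in lockstep (r = 0.99), and predictors that closely related can inflate each other's standard errors. The sketch below illustrates the variance inflation factor (VIF) on synthetic data, not on the ledger; `toy` and `vif_manual` are hypothetical names, and on the real model `car::vif(model_payment)` would be the usual call.

```r
set.seed(42)

# SYNTHETIC data (not ledger records): bills are built as roughly 826 naira
# per litre plus noise, so the two predictors are almost collinear.
n      <- 111
litres <- runif(n, min = 100, max = 900)
bill   <- 826 * litres + rnorm(n, sd = 25000)
days   <- 20 + rnorm(n, sd = 9)
toy    <- data.frame(days, litres, bill)

# VIF for one predictor = 1 / (1 - R^2), where R^2 comes from regressing
# that predictor on the remaining predictors.
vif_manual <- function(predictor, others, data) {
  fit <- lm(reformulate(others, response = predictor), data = data)
  1 / (1 - summary(fit)$r.squared)
}

vif_litres <- vif_manual("litres", "bill", toy)
vif_bill   <- vif_manual("bill", "litres", toy)

round(c(litres = vif_litres, bill = vif_bill), 1)
```

A VIF well above the common rule-of-thumb value of 10 would suggest fitting the payment model with only one of the two variables, or interpreting their individual coefficients with caution.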

Business recommendation: Dharmattan Nigeria Limited should treat Restaurant accounts as a benchmark and investigate what account-management practices sustain their promptness. For Cinema, any invoice above ₦500,000 should trigger a Day 7 courtesy call before the delay materialises.


9. Summary Table

Code
df %>%
  group_by(tenant_category) %>%
  summarise(
    `N (Invoices)`           = n(),
    `Avg Consumption (L)`    = round(mean(consumption_litres, na.rm = TRUE), 0),
    `Median Consumption (L)` = round(median(consumption_litres, na.rm = TRUE), 0),
    `Avg Bill (₦)`           = comma(round(mean(amount_billed, na.rm = TRUE), 0)),
    `Avg Days to Pay`        = round(mean(days_to_payment, na.rm = TRUE), 1),
    `Median Days to Pay`     = round(median(days_to_payment, na.rm = TRUE), 1),
    `Unpaid Invoices`        = sum(!paid)
  ) %>%
  arrange(desc(`Avg Consumption (L)`)) %>%
  kbl_dnl(cap = "Gas Billing Summary by Tenant Category — Dharmattan Nigeria Limited | May 2025 – May 2026")
Gas Billing Summary by Tenant Category — Dharmattan Nigeria Limited | May 2025 – May 2026
tenant_category  N (Invoices)  Avg Consumption (L)  Median Consumption (L)  Avg Bill (₦)  Avg Days to Pay  Median Days to Pay  Unpaid Invoices
Cinema                     13                  692                     720       638,763             25.4                  22                2
Bakery                     32                  646                     492       572,450             20.0                  26                8
Restaurant                 51                  334                     368       290,080             12.1                   9                8
Café                       39                  179                     170       164,329             18.8                  21                6

10. Integrated Findings & Recommendation

The five analytical techniques converge on a consistent, coherent story about Dharmattan Nigeria Limited’s tenant billing operations:

EDA revealed two data quality issues (24 unpaid current-cycle invoices and one data-entry anomaly) and established that gas consumption and billing amounts are right-skewed, driven by Bakery tenant TSR001 and the Cinema operator.

Visualisation added the time dimension: Bakery consumption peaks in October–January (a festive-season signal for Dharmattan’s gas procurement team), and Cinema’s payment delays are not only high on average but highly variable, with the widest distribution of any category.

Hypothesis testing confirmed via both ANOVA and Kruskal-Wallis that gas consumption (p < 0.001, η² = 0.43) and payment delays (p < 0.05) differ significantly across tenant categories: these are structural patterns, not random fluctuations.

Correlation analysis validated metering integrity (r = 0.99), ruled out bill size as a meaningful standalone driver of payment delay (r = 0.11, non-significant), and identified conversion factor as a candidate equipment-type proxy (r = 0.26).

Regression quantified the effects: Model 1 explains 99% of billing amount variation, with gas consumption as the dominant predictor, while the Restaurant category advantage (β = −8.43 days, p = 0.003) is the only statistically significant predictor of payment speed.

Single Recommendation: Implement a tiered collections calendar at Dharmattan Nigeria Limited:

Tenant Category   Recommended Action                                  Trigger
🎬 Cinema         Dedicated account manager call + written reminder   Day 7 post-billing
🥐 Bakery         Automated SMS + email reminder                      Day 14 post-billing
☕ Café           Standard automated reminder                         Day 21 post-billing
🍽️ Restaurant     No change, maintain current cycle                   Self-resolving (avg 12 days)

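Secondary recommendation: Flag any Cinema invoice above ₦500,000 at the point of generation for an escalated Day 7 call. Model 2 does not show a significant bill-size effect (p = 0.516), so this threshold is a precaution based on Cinema’s long average delay (25.4 days) and large invoices rather than a regression result; early engagement is likely to be more cost-effective than chasing overdue accounts.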
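The tiered calendar and the ₦500,000 Cinema threshold above could be encoded as a small lookup so that reminder dates are generated with each invoice. This is a minimal sketch under stated assumptions: `reminder_day` and `schedule_reminder` are hypothetical names, the example invoices are made up, and nothing here reflects an existing Dharmattan billing system.

```r
# Reminder offsets in days after billing, copied from the tiered calendar.
# Restaurant is NA because the recommendation is to leave its cycle unchanged.
reminder_day <- c(Cinema = 7, Bakery = 14, Cafe = 21, Restaurant = NA)

# Hypothetical helper: reminder date and escalation flag for one invoice.
schedule_reminder <- function(category, billing_date, amount_billed) {
  billing_date <- as.Date(billing_date)
  data.frame(
    category      = category,
    reminder_date = billing_date + reminder_day[[category]],
    # Escalate only Cinema invoices above the 500,000 naira threshold.
    escalate      = category == "Cinema" && amount_billed > 500000
  )
}

# Made-up example invoices, for illustration only.
cinema_invoice     <- schedule_reminder("Cinema", "2026-05-01", 640000)
restaurant_invoice <- schedule_reminder("Restaurant", "2026-05-01", 290000)

rbind(cinema_invoice, restaurant_invoice)
```

Keeping the offsets in a single named vector means the calendar can be revised after a review period without touching the scheduling logic.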


11. Limitations & Further Work

  • Small tenant pool: With only 11 tenants, statistical power is limited. A multi-property dataset across all malls supplied by Dharmattan Nigeria Limited would increase generalisability and enable property-level fixed effects.
  • Single utility type: Gas consumption is analysed in isolation. Integrating electricity and water billing data would enable a full utility cost-recovery model and reveal whether payment delay is utility-specific or tenant-level behaviour.
  • Payment mechanism unobserved: The dataset records date paid but not payment method (bank transfer, cash, cheque). Payment channel may explain variance in days-to-payment that Model 2 cannot currently capture (Adj. R² = 0.16).
  • No dispute flag: Contested invoices could inflate days-to-payment for certain tenants. A dispute indicator variable would improve Model 2 and prevent misclassifying legitimate disputes as collection failures.
  • Further work: (1) Time-series decomposition to formally separate trend, seasonality, and residuals; (2) survival analysis (Kaplan-Meier + Cox proportional hazards) treating days-to-payment as a right-censored event — properly handling the 24 unpaid invoices discarded by the current regression; (3) a logistic regression predicting the probability that any invoice will exceed 30 days, deployable as a real-time early-warning tool for Dharmattan’s collections team.
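The survival-analysis idea in the last bullet can be sketched as follows. The data are synthetic and exist only to show how unpaid invoices would enter as right-censored observations instead of being dropped; `toy` and its columns are hypothetical, and the sketch assumes the survival package (shipped with standard R installations) is available.

```r
library(survival)

set.seed(1)

# SYNTHETIC invoices (not ledger data). For paid invoices, `days` is the time
# from billing to payment; for unpaid ones it is the time from billing to the
# data cut-off, and paid = FALSE marks the observation as right-censored.
n   <- 60
toy <- data.frame(
  category = rep(c("Cinema", "Restaurant"), each = n / 2),
  days     = c(rexp(n / 2, rate = 1 / 25), rexp(n / 2, rate = 1 / 12)),
  paid     = runif(n) > 0.2
)

# Kaplan-Meier estimate of the probability that an invoice is still unpaid
# after t days, estimated separately for each category.
km_fit <- survfit(Surv(days, paid) ~ category, data = toy)

print(km_fit)  # per-category counts, events and median days to payment
```

On the real ledger, the same call would keep all 135 invoices in the analysis, with the 24 unpaid ones contributing information up to the cut-off date.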

References

Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making — from data fundamentals to machine learning in Python and R. Lagos Business School / markanalytics.online. https://markanalytics.online

R Core Team. (2024). R: A language and environment for statistical computing (Version 4.4). R Foundation for Statistical Computing. https://www.R-project.org/

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer. https://doi.org/10.1007/978-3-319-24277-4

Grolemund, G., & Wickham, H. (2011). Dates and times made easy with lubridate. Journal of Statistical Software, 40(3), 1–25. https://doi.org/10.18637/jss.v040.i03

Wei, T., & Simko, V. (2021). R package corrplot: Visualisation of a correlation matrix (Version 0.92). https://github.com/taiyun/corrplot

Fox, J., & Weisberg, S. (2019). An R companion to applied regression (3rd ed.). SAGE Publications. [R package: car]

Ben-Shachar, M. S., Lüdecke, D., & Makowski, D. (2020). effectsize: Estimation of effect size indices and standardized parameters. Journal of Open Source Software, 5(56), 2815. https://doi.org/10.21105/joss.02815

Kassambara, A. (2023). rstatix: Pipe-friendly framework for basic statistical tests (Version 0.7.2). https://CRAN.R-project.org/package=rstatix

Robinson, D., Hayes, A., & Couch, S. (2023). broom: Convert statistical objects into tidy tibbles (Version 1.0.5). https://CRAN.R-project.org/package=broom

Harrell, F. E., & Dupont, C. (2023). Hmisc: Harrell miscellaneous (Version 5.1). https://CRAN.R-project.org/package=Hmisc

Zhu, H. (2024). kableExtra: Construct complex table with kable and pipe syntax (Version 1.4.0). https://CRAN.R-project.org/package=kableExtra

Pedersen, T. L. (2024). patchwork: The composer of plots (Version 1.2.0). https://CRAN.R-project.org/package=patchwork

Waring, E., Quinn, M., McNamara, A., Arino de la Rubia, E., Zhu, H., & Ellis, S. (2022). skimr: Compact and flexible summaries of data (Version 2.1.5). https://CRAN.R-project.org/package=skimr

Tang, Y., Horikoshi, M., & Li, W. (2016). ggfortify: Unified interface to visualize statistical results of popular R packages. The R Journal, 8(2), 478–489. https://doi.org/10.32614/RJ-2016-060

Olosunde, I. (2026). Dharmattan Nigeria Limited — mall tenant gas billing records, anonymised [Dataset]. Collected from Dharmattan Nigeria Limited Facilities Department, Lagos, Nigeria. Data available on request from the author.


Appendix: AI Usage Statement

Claude (Anthropic) was used to assist with R code structuring and Quarto formatting for the data cleaning pipeline, custom ggplot2 theme design, ANOVA, Kruskal-Wallis, correlation, and regression sections. The identification of the normality violation and the decision to use Kruskal-Wallis as the primary inferential test alongside ANOVA, the selection of Dunn’s test with Bonferroni correction for post-hoc analysis, the interpretation of regression model coefficients using the actual rendered output values (β = −8.43 for Restaurant, β = +4.70 for Cinema), and all business recommendations were independently determined by me based on direct operational knowledge of Dharmattan Nigeria Limited’s billing processes and tenant payment behaviour. The dataset was personally collected, cleaned, and verified by the author from the organisation’s billing ledger.