Gas Consumption & Billing Analytics — Mall Facilities

Dharmattan Nigeria Limited | May 2025 – May 2026

Author

Ifeoluwa Olosunde

Published

May 25, 2026

1. Executive Summary

This study analyses thirteen months of anonymised natural gas billing and meter-reading records (May 2025 – May 2026) covering 11 commercial tenants across four categories — Restaurant, Café, Bakery, and Cinema — supplied by Dharmattan Nigeria Limited, Lagos, Nigeria. A total of 135 billing observations were examined to understand gas consumption patterns, billing amounts, and payment behaviour.

Key findings:

1. Cinema and Bakery tenants are the highest gas consumers per billing cycle, averaging 692 and 626 litres respectively.

2. Gas consumption is almost perfectly correlated with billing amount (r = 0.99), confirming metering integrity.

3. ANOVA and Kruskal-Wallis both confirm significant differences in payment delays across tenant categories (p < 0.001), with Cinema tenants averaging 25 days versus 12 days for Restaurants.

4. Regression identifies the Restaurant category (β = −8.43 days, p = 0.003) as the strongest driver of prompt payment, while the billing model built on gas consumption explains 99% of billing amount variance.

Recommendation: Implement a tiered collections calendar — dedicated Cinema follow-up from Day 7, Bakery automated reminders at Day 14, standard Café reminders at Day 21, and no change for self-resolving Restaurants.

135 Billing Records | 11 Tenant Units

2. Professional Disclosure

Role: Facilities / Mall Operations Analyst
Organisation: Dharmattan Nigeria Limited, Lagos, Nigeria
Sector: Energy & Utilities / Commercial Gas Supply to Retail Mall Tenants

Technique Justifications

1. Exploratory Data Analysis (EDA): As a facilities analyst responsible for utility cost recovery at Dharmattan Nigeria Limited, EDA allows me to detect anomalous meter readings, missing payments, and consumption spikes before they escalate into billing disputes. It forms the diagnostic foundation of monthly billing reviews and is the first step taken when onboarding a new billing cycle.

2. Data Visualisation: Monthly and categorical consumption charts are presented to mall management and the Dharmattan finance team in routine operational reports. Visualisation translates raw billing records into actionable intelligence — identifying which tenant group is straining the gas infrastructure and communicating patterns to non-technical stakeholders such as property managers.

3. Hypothesis Testing (ANOVA / Kruskal-Wallis): The finance team requires evidence-based justification to implement differentiated payment-chasing protocols. Formal hypothesis testing provides the statistical basis for claiming that tenant categories genuinely differ in gas consumption and payment behaviour — not just by chance — and shields the recommendation from challenge during management review.

4. Correlation Analysis: Understanding whether consumption volume drives billing amount (meter integrity check) and whether bill size influences payment delay (financial stress indicator) informs both engineering maintenance schedules and Dharmattan’s credit risk policy for tenant onboarding.

5. Linear Regression: Regression identifies the specific operational levers — tenant type, monthly consumption, billing period — that most influence revenue collection performance, giving Dharmattan management a prioritised, quantified action list rather than anecdotal observations.

3. Data Collection & Sampling

Source & Collection Method

Data was extracted from Dharmattan Nigeria Limited’s monthly gas billing management ledger maintained by the Facilities department. Billing records are generated from digital meter readings taken on the first working day of each month by the facilities engineer. The dataset was exported to CSV format and anonymised by replacing tenant business names with coded identifiers (TSR001, TAF002, etc.) before analysis.

Sampling Frame & Period

Population: All 11 metered gas tenants serviced by Dharmattan Nigeria Limited at the mall
Sample: All 11 tenants with continuous meter records across the study period (census — no sampling error)
Period: May 2025 – May 2026 (13 billing cycles)
Observations: 135 billing records retained after data cleaning (2 removed: 1 negative days-to-payment entry; 1 billing period with incomplete meter data)
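Statistical rationale: With 135 observations across 4 tenant categories and 13 months, the dataset is well powered (> 0.80 at α = 0.05) to detect large effects (Cohen's f = 0.40) in a one-way ANOVA; for medium effects (f = 0.25) a balanced four-group design needs roughly 180 observations to reach 0.80, so power there is lower. The sample also supports multiple linear regression with up to 6 predictors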
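The power statement can be checked with base R's power.anova.test(). The sketch below assumes a balanced design of about 34 records per category (the real groups are unbalanced, ranging from 13 to 51 records) and converts Cohen's f into the between-group variance that the function expects. The helper name anova_power is illustrative only; treat the result as a rough planning figure, not a property of the actual data.

```r
# Rough power check for a one-way ANOVA with 4 groups at alpha = 0.05.
# Assumption: balanced groups of 34 (about 135 / 4); the real design is unbalanced.
anova_power <- function(f, groups = 4, n_per_group = 34, alpha = 0.05) {
  # power.anova.test() takes the sample variance of the group means.
  # With within.var = 1, Cohen's f^2 = between.var * (groups - 1) / groups.
  between_var <- f^2 * groups / (groups - 1)
  power.anova.test(groups = groups, n = n_per_group,
                   between.var = between_var, within.var = 1,
                   sig.level = alpha)$power
}

power_medium <- anova_power(0.25)  # Cohen's "medium" effect
power_large  <- anova_power(0.40)  # Cohen's "large" effect
cat("Power, medium effect:", round(power_medium, 2), "\n")
cat("Power, large effect: ", round(power_large, 2), "\n")
```

Because the Cinema group is much smaller than the others, the power of the real, unbalanced design is likely to be somewhat lower than this balanced approximation suggests.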

Ethical Notes

All tenant business names have been replaced with anonymous codes. No personally identifiable information (PII) is retained. Data was shared with verbal approval of the Facilities Manager and is published in aggregate form consistent with Dharmattan Nigeria Limited’s internal data-sharing policy. No external ethical clearance was required as the data concerns operational billing records, not human subjects.

Data Quality Notes

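24 records carry no payment date: 23 fall in the March–May 2026 billing cycles and one has no parsed billing date. They represent invoices not yet settled at the time of extraction and are excluded from payment-delay analyses but retained for consumption and billing analyses
1 record showed a negative days-to-payment value (payment recorded before billing date, a data entry error); excluded from all payment analyses
No missing values detected in meter readings (meter_start, meter_end); 4 records lack consumption_litres and amount_billed values (see the summary table in Section 4.1) and drop out of the consumption ANOVA and the billing regression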
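The quality rules above can be illustrated on a toy ledger in base R. The rows, tenant codes, and column names below are invented for illustration and are not taken from the real dataset; the report's own cleaning code uses dplyr and lubridate instead.

```r
# Toy ledger rows (hypothetical values, not from the real billing ledger)
ledger <- data.frame(
  tenant_id    = c("T001", "T002", "T003", "T004"),
  billing_date = c("01/03/2026", "01/03/2026", "01/04/2026", "01/04/2026"),
  date_paid    = c("15/03/2026", "25/02/2026", NA, "20/04/2026"),
  amount       = c("328,565.69", "149,949.90", "469,945.93", "5,179.97"),
  stringsAsFactors = FALSE
)

# Strip thousands separators and parse day/month/year dates
ledger$amount       <- as.numeric(gsub(",", "", ledger$amount))
ledger$billing_date <- as.Date(ledger$billing_date, format = "%d/%m/%Y")
ledger$date_paid    <- as.Date(ledger$date_paid, format = "%d/%m/%Y")

# Derive the payment delay and flag the two quality issues
ledger$days_to_payment <- as.numeric(ledger$date_paid - ledger$billing_date)
ledger$unpaid   <- is.na(ledger$date_paid)
ledger$negative <- !is.na(ledger$days_to_payment) & ledger$days_to_payment < 0

# Keep unpaid rows (their delay stays NA); drop the impossible negative delay
clean <- ledger[!ledger$negative, ]
print(clean[, c("tenant_id", "amount", "days_to_payment", "unpaid")])
```

Keeping the unpaid rows with an NA delay, rather than deleting them, is what lets the same table feed both the consumption analyses and the payment analyses.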

4. Data Description & EDA

Code

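# ── Load & clean ───────────────────────────────────────────────
# Packages used by the chunks in this report. The helpers kbl_dnl(), theme_dnl(),
# cat_colours and the dnl_* colours are not defined in the code shown here; they
# are assumed to come from a setup chunk.
library(tidyverse)   # readr, dplyr, tidyr, stringr, ggplot2
library(lubridate)   # dmy()
library(skimr)       # skim()
library(scales)      # comma()
library(patchwork)   # plot composition with | and /

df_raw <- read_csv("DATA.csv", skip = 1, col_names = TRUE, show_col_types = FALSE)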

df_raw <- df_raw %>%
  select(1:11) %>%
  rename(
    sn                 = 1,
    tenant_id          = 2,
    tenant_category    = 3,
    meter_start        = 4,
    meter_end          = 5,
    consumption_m3     = 6,
    conversion_factor  = 7,
    consumption_litres = 8,
    amount_billed      = 9,
    billing_date       = 10,
    date_paid          = 11
  )

df <- df_raw %>%
  filter(tenant_category %in% c("Bakery", "Restaurant", "Café", "Cinema")) %>%
  mutate(
    across(c(meter_start, meter_end, consumption_m3,
             conversion_factor, consumption_litres, amount_billed),
           ~ as.numeric(str_remove_all(as.character(.), ","))),
    billing_date    = dmy(billing_date),
    date_paid       = suppressWarnings(dmy(date_paid)),
    days_to_payment = as.numeric(date_paid - billing_date),
    billing_month   = format(billing_date, "%Y-%m"),
    paid            = !is.na(date_paid),
    tenant_category = factor(tenant_category,
                             levels = c("Bakery", "Café", "Cinema", "Restaurant"))
  ) %>%
  filter(is.na(days_to_payment) | days_to_payment >= 0)

cat("Dataset:", nrow(df), "observations |", ncol(df), "variables\n")

Dataset: 135 observations | 14 variables

Code

cat("Categories:", levels(df$tenant_category), "\n")

Categories: Bakery Café Cinema Restaurant

Code

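# n_distinct() counts NA as a value: one record has no parsed billing date
# (see Issue 1 below), so this prints 14 for the 13 calendar months studied.
cat("Billing months:", n_distinct(df$billing_month), "\n")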

Billing months: 14

Code

cat("Unique tenants:", n_distinct(df$tenant_id), "\n")

Unique tenants: 11

Code

cat("Unpaid (excluded from payment analysis):", sum(!df$paid), "\n")

Unpaid (excluded from payment analysis): 24

4.1 Variable Overview

Code

df %>%
  select(tenant_category, consumption_litres, amount_billed,
         days_to_payment, meter_start, meter_end, conversion_factor) %>%
  skim() %>%
  kbl_dnl(cap = "Descriptive Statistics — Key Variables | Dharmattan Nigeria Limited")

Descriptive Statistics — Key Variables | Dharmattan Nigeria Limited
skim_type	skim_variable	n_missing	complete_rate	factor.ordered	factor.n_unique	factor.top_counts	numeric.mean	numeric.sd	numeric.p0	numeric.p25	numeric.p50	numeric.p75	numeric.p100	numeric.hist
factor	tenant_category	0	1.0000000	FALSE	4	Res: 51, Caf: 39, Bak: 32, Cin: 13	NA	NA	NA	NA	NA	NA	NA	NA
numeric	consumption_litres	4	0.9703704	NA	NA	NA	390.02496	2.952196e+02	4.410	169.9	368.11	506.815	1472.43	▅▇▂▁▁
numeric	amount_billed	4	0.9703704	NA	NA	NA	347598.71603	2.531056e+05	5179.970	149949.9	328565.69	469945.925	1199115.90	▆▇▃▁▁
numeric	days_to_payment	24	0.8222222	NA	NA	NA	17.14414	9.980751e+00	2.000	8.0	15.00	26.000	53.00	▇▅▅▁▁
numeric	meter_start	0	1.0000000	NA	NA	NA	2992.23343	3.677224e+03	0.234	889.5	1702.00	2346.500	13345.00	▇▁▁▁▁
numeric	meter_end	0	1.0000000	NA	NA	NA	3027.12524	3.691170e+03	78.123	910.5	1720.00	2374.704	13345.00	▇▁▁▁▁
numeric	conversion_factor	0	1.0000000	NA	NA	NA	16.42111	9.984918e+00	4.870	6.6	15.93	28.320	28.32	▇▁▃▁▇

4.2 Data Quality Issues Identified

Code

# Issue 1 — Missing payment dates
unpaid <- df %>% filter(!paid) %>% count(billing_month)
kbl_dnl(unpaid, cap = "Issue 1: Invoices Without Payment Date by Month")

Issue 1: Invoices Without Payment Date by Month
billing_month	n
2026-03	1
2026-04	11
2026-05	11
NA	1

Code

# Issue 2 — Consumption outliers (IQR method)
q1  <- quantile(df$consumption_litres, 0.25, na.rm = TRUE)
q3  <- quantile(df$consumption_litres, 0.75, na.rm = TRUE)
iqr <- q3 - q1

outliers <- df %>%
  filter(consumption_litres < (q1 - 1.5 * iqr) |
         consumption_litres > (q3 + 1.5 * iqr)) %>%
  select(tenant_id, tenant_category, billing_month,
         consumption_litres, amount_billed)

kbl_dnl(outliers, cap = "Issue 2: Outlier Consumption Records (IQR Method)")

Issue 2: Outlier Consumption Records (IQR Method)
tenant_id	tenant_category	billing_month	consumption_litres	amount_billed
TSR001	Bakery	2025-05	1444.12	1190582.3
TSR001	Bakery	2025-06	1472.43	1199115.9
TSR001	Bakery	2025-07	1274.22	1037831.1
TSR001	Bakery	2025-08	1359.17	1079645.6
TSR001	Bakery	2025-09	1189.27	926894.4

Handling Issue 1: Of the 24 unpaid invoices, 23 fall in the March–May 2026 billing cycles and one has no parsed billing date. These are current invoices, not delinquent accounts. They are excluded from payment-delay analyses but retained for consumption and billing amount analyses.

Handling Issue 2: All five outlier records belong to tenant TSR001 (Bakery), driven by consistently high gas usage from a commercial oven operation. Records retained — removing them would distort the Bakery profile and hide a genuine operational insight about high-volume industrial tenants.
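The IQR fences behind Issue 2 can be reproduced by hand from the quartiles printed in the Section 4.1 summary table. This is a worked check using the values as printed there, so it may differ from the full-precision result in the last decimals.

```r
# Quartiles of consumption_litres as printed in the Section 4.1 summary table
q1  <- 169.9
q3  <- 506.815
iqr <- q3 - q1

lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr
cat("Lower fence:", lower_fence, "| Upper fence:", upper_fence, "\n")

# The five flagged TSR001 readings from the Issue 2 table
flagged <- c(1444.12, 1472.43, 1274.22, 1359.17, 1189.27)
all(flagged > upper_fence)  # TRUE: every flagged reading exceeds the upper fence
```

The lower fence is negative, so no reading can be flagged as unusually low; only the upper fence (about 1,012 litres) is active for this variable.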

4.3 Distributions

Code

p1 <- ggplot(df, aes(x = consumption_litres)) +
  geom_histogram(bins = 25, fill = dnl_navy, colour = "white", alpha = 0.9) +
  geom_vline(xintercept = median(df$consumption_litres, na.rm = TRUE),
             colour = dnl_orange, linetype = "dashed", linewidth = 0.9) +
  scale_x_continuous(labels = comma) +
  annotate("text", x = median(df$consumption_litres, na.rm=TRUE) + 40,
           y = Inf, vjust = 1.6, hjust = 0, size = 3, colour = dnl_orange,
           label = "median") +
  labs(title = "Gas Consumption (Litres)",
       subtitle = "Right-skewed — Bakery & Cinema drive the upper tail",
       x = "Litres", y = "Count") +
  theme_dnl()

p2 <- ggplot(df, aes(x = amount_billed)) +
  geom_histogram(bins = 25, fill = dnl_orange, colour = "white", alpha = 0.9) +
  geom_vline(xintercept = median(df$amount_billed, na.rm = TRUE),
             colour = dnl_navy, linetype = "dashed", linewidth = 0.9) +
  scale_x_continuous(labels = comma) +
  labs(title = "Billing Amount (₦)",
       subtitle = "Mirrors consumption — confirms metering linearity",
       x = "₦", y = "Count") +
  theme_dnl()

p3 <- ggplot(df %>% filter(paid), aes(x = days_to_payment)) +
  geom_histogram(bins = 20, fill = dnl_teal, colour = "white", alpha = 0.9) +
  geom_vline(xintercept = mean(df$days_to_payment, na.rm = TRUE),
             colour = dnl_orange, linetype = "dashed", linewidth = 0.9) +
  annotate("text", x = mean(df$days_to_payment, na.rm=TRUE) + 1.5,
           y = Inf, vjust = 1.6, hjust = 0, size = 3, colour = dnl_orange,
           label = "mean") +
  labs(title = "Days to Payment",
       subtitle = "Mean ≈ 17 days; most tenants settle within three weeks",
       x = "Days", y = "Count") +
  theme_dnl()

p4 <- df %>%
  count(tenant_category) %>%
  ggplot(aes(x = reorder(tenant_category, n), y = n,
             fill = tenant_category)) +
  geom_col(show.legend = FALSE, width = 0.65, alpha = 0.92) +
  geom_text(aes(label = n), hjust = -0.3, size = 3.5,
            colour = "#2D3748", fontface = "bold") +
  scale_fill_manual(values = cat_colours) +
  coord_flip() +
  scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
  labs(title = "Billing Records by Category",
       subtitle = "Restaurants most frequent; Cinema least frequent",
       x = NULL, y = "Count") +
  theme_dnl() +
  theme(panel.grid.major.y = element_blank())

(p1 | p2) / (p3 | p4) +
  plot_annotation(
    title   = "Figure 1 — Distribution of Key Billing Variables",
    caption = "Source: Dharmattan Nigeria Limited billing ledger | May 2025 – May 2026",
    theme   = theme(
      plot.title   = element_text(colour = dnl_navy, face = "bold", size = 14),
      plot.caption = element_text(colour = "#9CA3AF", size = 9)
    )
  )

Interpretation: Both gas consumption and billing amounts are right-skewed, driven by Bakery tenant TSR001 and the Cinema operator. Days-to-payment is less skewed, with a median of 15 days and a mean of about 17 days. Restaurants generate the most invoices across 13 months, reflecting multiple separately-metered restaurant units within the mall.

5. Data Visualisation

Narrative: A 13-month story of which of Dharmattan Nigeria Limited’s mall tenants use the most gas, when consumption peaks, and who pays promptly, told through five complementary chart types.

Code

# Plot 1 — Monthly total consumption trend
monthly_df <- df %>%
  group_by(billing_month) %>%
  summarise(total = sum(consumption_litres, na.rm = TRUE))

ggplot(monthly_df, aes(x = billing_month, y = total, group = 1)) +
  geom_area(fill = dnl_navy, alpha = 0.12) +
  geom_line(colour = dnl_navy, linewidth = 1.4) +
  geom_point(colour = dnl_orange, size = 3.5, shape = 21,
             fill = "white", stroke = 2) +
  geom_text(aes(label = comma(round(total, 0))),
            vjust = -1.2, size = 2.8, colour = dnl_navy, fontface = "bold") +
  scale_y_continuous(labels = comma, expand = expansion(mult = c(0.05, 0.2))) +
  labs(title = "Total Monthly Gas Consumption — All Tenants",
       subtitle = "Dharmattan Nigeria Limited | May 2025 – May 2026 | 11 tenant units",
       caption = "Source: Dharmattan Nigeria Limited billing ledger",
       x = NULL, y = "Total Litres Consumed") +
  theme_dnl() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 9))

Code

# Plot 2 — Consumption by category
ggplot(df, aes(x = reorder(tenant_category, consumption_litres, median),
               y = consumption_litres, fill = tenant_category)) +
  geom_boxplot(show.legend = FALSE, outlier.colour = dnl_orange,
               outlier.size = 2.5, outlier.alpha = 0.8,
               width = 0.55, alpha = 0.88, linewidth = 0.5) +
  geom_jitter(aes(colour = tenant_category), show.legend = FALSE,
              width = 0.15, alpha = 0.35, size = 1.5) +
  scale_fill_manual(values  = cat_colours) +
  scale_colour_manual(values = cat_colours) +
  coord_flip() +
  scale_y_continuous(labels = comma) +
  labs(title = "Gas Consumption Distribution by Tenant Category",
       subtitle = "Orange dots = statistical outliers | jitter shows individual billing periods",
       caption = "Source: Dharmattan Nigeria Limited billing ledger",
       x = NULL, y = "Monthly Consumption (Litres)") +
  theme_dnl() +
  theme(panel.grid.major.y = element_blank())

Code

# Plot 3 — Heatmap consumption by category × month
df %>%
  group_by(tenant_category, billing_month) %>%
  summarise(avg = mean(consumption_litres, na.rm = TRUE), .groups = "drop") %>%
  ggplot(aes(x = billing_month, y = tenant_category, fill = avg)) +
  geom_tile(colour = "white", linewidth = 0.8) +
  geom_text(aes(label = comma(round(avg, 0))),
            size = 2.8, colour = "white", fontface = "bold") +
  scale_fill_gradient(low = "#C8DCF5", high = dnl_navy,
                      labels = comma, name = "Avg\nLitres") +
  labs(title = "Average Monthly Gas Consumption Heatmap — Category × Month",
       subtitle = "Darker = higher consumption | Bakery peaks Oct–Jan (festive baking season)",
       caption = "Source: Dharmattan Nigeria Limited billing ledger",
       x = NULL, y = NULL) +
  theme_dnl() +
  theme(axis.text.x  = element_text(angle = 45, hjust = 1, size = 8.5),
        panel.grid   = element_blank(),
        legend.position = "right")

Code

# Plot 4 — Payment delay violin + box
df_paid_viz <- df %>% filter(paid, days_to_payment >= 0)

avg_days <- df_paid_viz %>%
  group_by(tenant_category) %>%
  summarise(m = mean(days_to_payment))

ggplot(df_paid_viz,
       aes(x = reorder(tenant_category, days_to_payment, median),
           y = days_to_payment, fill = tenant_category)) +
  geom_violin(show.legend = FALSE, alpha = 0.80, trim = FALSE) +
  geom_boxplot(width = 0.1, fill = "white", outlier.size = 1.8,
               outlier.colour = dnl_orange, linewidth = 0.6,
               show.legend = FALSE) +
  geom_point(data = avg_days,
             aes(x = tenant_category, y = m),
             colour = dnl_orange, size = 3.5, shape = 18,
             inherit.aes = FALSE) +
  scale_fill_manual(values = cat_colours) +
  coord_flip() +
  labs(title = "Payment Delay Distribution by Tenant Category",
       subtitle = "Diamond ◆ = category mean | Wider violin = more variable payment behaviour",
       caption = "Source: Dharmattan Nigeria Limited billing ledger (settled invoices only)",
       x = NULL, y = "Days to Payment") +
  theme_dnl() +
  theme(panel.grid.major.y = element_blank())

Code

# Plot 5 — Billing amount vs days-to-payment scatter
ggplot(df_paid_viz,
       aes(x = amount_billed, y = days_to_payment, colour = tenant_category)) +
  geom_point(alpha = 0.65, size = 2.8) +
  geom_smooth(method = "lm", se = TRUE, linewidth = 1.1,
              aes(fill = tenant_category), alpha = 0.12) +
  scale_colour_manual(values = cat_colours, name = "Tenant Category") +
  scale_fill_manual(values = cat_colours, guide = "none") +
  scale_x_continuous(labels = comma) +
  labs(title = "Does a Higher Bill Lead to Slower Payment?",
       subtitle = "Shaded bands = 95% CI | Cinema slopes upward — larger bills take longer to settle",
       caption = "Source: Dharmattan Nigeria Limited billing ledger",
       x = "Billing Amount (₦)", y = "Days to Payment") +
  theme_dnl() +
  theme(legend.position = "bottom")

Visualisation narrative: Plot 1 (area-line) shows stable but gently declining total consumption through 2026, with monthly values labelled for quick reference. Plot 2 (boxplot + jitter) reveals that Cinema and Bakery are structurally high consumers — their entire distributions sit above Café. Plot 3 (heatmap) pins the time dimension: Bakery consumption darkens Oct–Jan, a festive-season procurement signal for Dharmattan Nigeria Limited. Plot 4 (violin) shows Cinema’s payment spread is the widest of any category — unpredictable even by its own norms. Plot 5 (scatter + CI bands) confirms that for Cinema, higher bills and longer payment delays travel together, while Restaurants show virtually no slope.

6. Hypothesis Testing

6.1 Hypothesis 1 — Gas Consumption Differs Across Tenant Categories

H₀: Mean monthly gas consumption is equal across all tenant categories
H₁: At least one tenant category has a significantly different mean consumption

Code

library(car)
library(effectsize)
library(rstatix)

# Step 1 — Shapiro-Wilk normality per group
norm_tbl <- df %>%
  group_by(tenant_category) %>%
  summarise(
    n         = n(),
    shapiro_p = round(shapiro.test(consumption_litres)$p.value, 4),
    normal    = ifelse(shapiro.test(consumption_litres)$p.value > 0.05,
                       "✓ Yes", "✗ No")
  )
kbl_dnl(norm_tbl, cap = "Step 1: Normality Test — Shapiro-Wilk (α = 0.05)")

Step 1: Normality Test — Shapiro-Wilk (α = 0.05)
tenant_category	n	shapiro_p	normal
Bakery	32	0.0000	✗ No
Café	39	0.0011	✗ No
Cinema	13	0.5231	✓ Yes
Restaurant	51	0.0003	✗ No

Code

# Step 2 — Levene's test
leveneTest(consumption_litres ~ tenant_category, data = df)

Levene's Test for Homogeneity of Variance (center = median)
       Df F value  Pr(>F)  
group   3   2.685 0.04943 *
      127                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Assumption verdict: Normality is violated for three of four categories (Bakery, Café, Restaurant all p < 0.05). Levene’s test returns p ≈ 0.049, indicating borderline unequal variances. The Kruskal-Wallis test is therefore used as the primary inferential test; ANOVA is reported alongside for completeness. ANOVA is robust to moderate non-normality when n > 30 per group (Central Limit Theorem), a condition that Café and Restaurant meet but Cinema (n = 13) and Bakery (28 non-missing readings) do not.

Code

# Step 3a — ANOVA (reported for completeness)
anova_consump <- aov(consumption_litres ~ tenant_category, data = df)
summary(anova_consump)

                 Df  Sum Sq Mean Sq F value   Pr(>F)    
tenant_category   3 4919464 1639821   32.49 1.19e-15 ***
Residuals       127 6410635   50477                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
4 observations deleted due to missingness

Code

eta_squared(anova_consump) %>%
  kbl_dnl(cap = "ANOVA Effect Size (η²) — Gas Consumption")

ANOVA Effect Size (η²) — Gas Consumption
	x
tenant_category	0.4341943

Code

tidy(TukeyHSD(anova_consump)) %>%
  filter(adj.p.value < 0.05) %>%
  kbl_dnl(cap = "Tukey HSD — Significantly Different Pairs (α = 0.05)")

Tukey HSD — Significantly Different Pairs (α = 0.05)
term	contrast	estimate	conf.low	conf.high	adj.p.value
tenant_category	Café-Bakery	-467.2368	-612.11777	-322.3559	0.0000000
tenant_category	Restaurant-Bakery	-312.5463	-450.11970	-174.9728	0.0000002
tenant_category	Cinema-Café	512.6662	325.34672	699.9856	0.0000000
tenant_category	Restaurant-Café	154.6906	30.27091	279.1102	0.0082927
tenant_category	Restaurant-Cinema	-357.9756	-539.70212	-176.2490	0.0000063

Code

# Step 3b — Kruskal-Wallis (primary)
kruskal.test(consumption_litres ~ tenant_category, data = df)


    Kruskal-Wallis rank sum test

data:  consumption_litres by tenant_category
Kruskal-Wallis chi-squared = 66.852, df = 3, p-value = 2.014e-14

Code

df %>% kruskal_effsize(consumption_litres ~ tenant_category) %>%
  kbl_dnl(cap = "Kruskal-Wallis Effect Size (η²[H]): Gas Consumption")

Kruskal-Wallis Effect Size (η²[H]): Gas Consumption
.y.	n	effsize	method	magnitude
consumption_litres	135	0.4874236	eta2[H]	large

Code

df %>%
  dunn_test(consumption_litres ~ tenant_category,
            p.adjust.method = "bonferroni") %>%
  filter(p.adj < 0.05) %>%
  kbl_dnl(cap = "Dunn's Post-Hoc — Significant Pairs, Consumption (Bonferroni, α = 0.05)")

Dunn's Post-Hoc — Significant Pairs, Consumption (Bonferroni, α = 0.05)
.y.	group1	group2	n1	n2	statistic	p	p.adj	p.adj.signif
consumption_litres	Bakery	Café	28	39	-6.511826	0.0000000	0.0000000	****
consumption_litres	Bakery	Restaurant	28	51	-3.668376	0.0002441	0.0014646	**
consumption_litres	Café	Cinema	39	13	6.703818	0.0000000	0.0000000	****
consumption_litres	Café	Restaurant	39	51	3.526519	0.0004211	0.0025264	**
consumption_litres	Cinema	Restaurant	13	51	-4.495695	0.0000069	0.0000416	****

Decision: We reject H₀. The primary Kruskal-Wallis test (χ²(3) = 66.85, p < 0.001) and the ANOVA (F(3, 127) = 32.49, p < 0.001) agree. η² = 0.43: tenant category alone explains 43% of the variance in monthly gas consumption, a large effect.
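Both effect sizes can be recomputed from the statistics printed above, which is a quick way to check that the tables and the decision agree. The sketch uses only the reported sums of squares, the Kruskal-Wallis statistic, and the n listed in the effect-size table; the variable names are illustrative.

```r
# Sums of squares from the ANOVA table above
ss_between <- 4919464
ss_within  <- 6410635
eta_sq <- ss_between / (ss_between + ss_within)

# Kruskal-Wallis H, number of groups k, and n as listed in the effect-size table
H <- 66.852
k <- 4
n <- 135
eta_sq_H <- (H - k + 1) / (n - k)

cat("eta squared:   ", round(eta_sq, 4), "\n")
cat("eta squared[H]:", round(eta_sq_H, 4), "\n")
```

The n of 135 in the effect-size table counts every row, including the 4 without a consumption value; using the 131 complete rows would give a slightly larger rank-based estimate, so the reported figure is on the conservative side.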

Business interpretation: The type of business a tenant runs is the single biggest determinant of gas load on Dharmattan Nigeria Limited’s network. Procurement contracts and pipeline capacity should be category-weighted, not headcount-weighted.

6.2 Hypothesis 2 — Payment Delay Differs Across Tenant Categories

H₀: Mean days-to-payment is equal across all tenant categories
H₁: At least one tenant category takes significantly longer or shorter to pay

Code

df_paid <- df %>% filter(paid, days_to_payment >= 0)

df_paid %>%
  group_by(tenant_category) %>%
  summarise(
    n           = n(),
    mean_days   = round(mean(days_to_payment), 1),
    median_days = round(median(days_to_payment), 1),
    shapiro_p   = round(shapiro.test(days_to_payment)$p.value, 4),
    normal      = ifelse(shapiro.test(days_to_payment)$p.value > 0.05,
                         "✓ Yes", "✗ No")
  ) %>%
  kbl_dnl(cap = "Step 1: Payment Delay Summary & Normality by Category")

Step 1: Payment Delay Summary & Normality by Category
tenant_category	n	mean_days	median_days	shapiro_p	normal
Bakery	24	20.0	26	0.0036	✗ No
Café	33	18.8	21	0.0545	✓ Yes
Cinema	11	25.4	22	0.0711	✓ Yes
Restaurant	43	12.1	9	0.0000	✗ No

Code

# ANOVA
anova_payment <- aov(days_to_payment ~ tenant_category, data = df_paid)
summary(anova_payment)

                 Df Sum Sq Mean Sq F value   Pr(>F)    
tenant_category   3   2128   709.2   8.593 3.66e-05 ***
Residuals       107   8830    82.5                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Code

eta_squared(anova_payment) %>%
  kbl_dnl(cap = "ANOVA Effect Size (η²) — Payment Delay")

ANOVA Effect Size (η²) — Payment Delay
	x
tenant_category	0.1941585

Code

tidy(TukeyHSD(anova_payment)) %>%
  arrange(adj.p.value) %>%
  kbl_dnl(cap = "Tukey HSD — All Category Pairs (Payment Delay)")

Tukey HSD — All Category Pairs (Payment Delay)
term	contrast	estimate	conf.low	conf.high	adj.p.value
tenant_category	Restaurant-Cinema	-13.247357	-21.258182	-5.236533	0.0002063
tenant_category	Restaurant-Bakery	-7.925388	-13.966382	-1.884393	0.0047891
tenant_category	Restaurant-Café	-6.732206	-12.219099	-1.245313	0.0095819
tenant_category	Cinema-Café	6.515151	-1.739219	14.769522	0.1730374
tenant_category	Cinema-Bakery	5.321970	-3.310657	13.954597	0.3779712
tenant_category	Café-Bakery	-1.193182	-7.553601	5.167237	0.9612606

Code

# Kruskal-Wallis (primary)
kruskal.test(days_to_payment ~ tenant_category, data = df_paid)


    Kruskal-Wallis rank sum test

data:  days_to_payment by tenant_category
Kruskal-Wallis chi-squared = 19.588, df = 3, p-value = 0.0002066

Code

df_paid %>% kruskal_effsize(days_to_payment ~ tenant_category) %>%
  kbl_dnl(cap = "Kruskal-Wallis Effect Size (η²[H]): Payment Delay")

Kruskal-Wallis Effect Size (η²[H]): Payment Delay
.y.	n	effsize	method	magnitude
days_to_payment	111	0.1550306	eta2[H]	large

Code

df_paid %>%
  dunn_test(days_to_payment ~ tenant_category,
            p.adjust.method = "bonferroni") %>%
  arrange(p.adj) %>%
  kbl_dnl(cap = "Dunn's Post-Hoc — Payment Delay Pairs (Bonferroni, α = 0.05)")

Dunn's Post-Hoc — Payment Delay Pairs (Bonferroni, α = 0.05)
.y.	group1	group2	n1	n2	statistic	p	p.adj	p.adj.signif
days_to_payment	Cinema	Restaurant	11	43	-3.3834935	0.0007157	0.0042942	**
days_to_payment	Bakery	Restaurant	24	43	-3.3423691	0.0008307	0.0049840	**
days_to_payment	Café	Restaurant	33	43	-3.0257351	0.0024803	0.0148818	*
days_to_payment	Bakery	Café	24	33	-0.5643259	0.5725324	1.0000000	ns
days_to_payment	Bakery	Cinema	24	11	0.8008383	0.4232253	1.0000000	ns
days_to_payment	Café	Cinema	33	11	1.2723790	0.2032385	1.0000000	ns

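Decision: We reject H₀. Both ANOVA (F(3, 107) = 8.59, p < 0.001) and Kruskal-Wallis (χ²(3) = 19.59, p < 0.001) are significant, with η² = 0.19. The post-hoc tests show that Restaurants pay significantly faster than every other category, while Cinema, Bakery, and Café do not differ significantly from one another. Cinema tenants take approximately 13 more days than Restaurants. This is a statistically confirmed structural pattern, not random variation.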

Business interpretation: Restaurants are Dharmattan Nigeria Limited’s most reliable payers (avg 12 days). Cinema is the highest-risk category — 25 days average with the widest spread. This justifies a dedicated, earlier collections intervention for Cinema accounts from Day 7 post-invoice.
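The tiered calendar proposed in the Executive Summary can be expressed as a small lookup. This is only a sketch: the follow-up days come from that recommendation, while the vector and function names are hypothetical and not part of any existing Dharmattan system.

```r
# First follow-up day per category, as proposed in the Executive Summary.
# NA = no scheduled reminder (Restaurants are left unchanged).
first_follow_up <- c(Cinema = 7, Bakery = 14, "Café" = 21, Restaurant = NA)

# Hypothetical helper: date of the first reminder for an invoice
follow_up_date <- function(category, billing_date) {
  as.Date(billing_date) + first_follow_up[[category]]
}

cinema_reminder     <- follow_up_date("Cinema", "2026-05-01")
restaurant_reminder <- follow_up_date("Restaurant", "2026-05-01")
print(cinema_reminder)      # seven days after the billing date
print(restaurant_reminder)  # NA: no reminder scheduled
```

Keeping the schedule in one named vector means the days can be retuned after each review cycle without touching the reminder logic.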

7. Correlation Analysis

Code

library(corrplot)
library(Hmisc)

cor_df <- df %>%
  select(
    `Gas Consumption (L)` = consumption_litres,
    `Amount Billed (₦)`   = amount_billed,
    `Days to Payment`     = days_to_payment,
    `Meter Start`         = meter_start,
    `Conversion Factor`   = conversion_factor
  ) %>%
  drop_na()

cor_result <- rcorr(as.matrix(cor_df))
pmat <- cor_result$P
diag(pmat) <- 0

par(bg = "#FAFBFC")
corrplot(
  cor_result$r,
  method       = "color",
  type         = "upper",
  order        = "hclust",
  col          = colorRampPalette(c("#E05C2A", "#FAFBFC", "#1B3A6B"))(200),
  addCoef.col  = "#2D3748",
  number.cex   = 0.85,
  number.font  = 2,
  p.mat        = pmat,
  sig.level    = 0.05,
  insig        = "blank",
  tl.col       = "#1B3A6B",
  tl.srt       = 45,
  tl.cex       = 0.88,
  cl.cex       = 0.78,
  title        = "Pearson Correlation Matrix — Dharmattan Nigeria Limited Billing Variables",
  mar          = c(0, 0, 2.5, 0),
  bg           = "#FAFBFC"
)

Code

cor_result$r %>%
  as.data.frame() %>%
  round(3) %>%
  kbl_dnl(cap = "Pearson Correlation Coefficients (all pairs; non-significant pairs are blanked only in the plot)")

Pearson Correlation Coefficients (all pairs; non-significant pairs are blanked only in the plot)
	Gas Consumption (L)	Amount Billed (₦)	Days to Payment	Meter Start	Conversion Factor
Gas Consumption (L)	1.000	0.993	0.097	0.795	0.265
Amount Billed (₦)	0.993	1.000	0.121	0.773	0.220
Days to Payment	0.097	0.121	1.000	0.084	-0.134
Meter Start	0.795	0.773	0.084	1.000	0.148
Conversion Factor	0.265	0.220	-0.134	0.148	1.000

Code

cor(cor_df, method = "spearman") %>%
  round(3) %>%
  kbl_dnl(cap = "Spearman Rank Correlations — Non-Parametric Robustness Check")

Spearman Rank Correlations — Non-Parametric Robustness Check
	Gas Consumption (L)	Amount Billed (₦)	Days to Payment	Meter Start	Conversion Factor
Gas Consumption (L)	1.000	0.991	0.153	0.661	0.220
Amount Billed (₦)	0.991	1.000	0.169	0.669	0.184
Days to Payment	0.153	0.169	1.000	0.183	-0.077
Meter Start	0.661	0.669	0.183	1.000	0.078
Conversion Factor	0.220	0.184	-0.077	0.078	1.000

Key findings:

1. Gas Consumption ↔︎ Amount Billed (r = 0.99, p < 0.001): Near-perfect positive correlation. Confirms that Dharmattan Nigeria Limited’s billing system accurately converts meter readings to naira charges — no systematic distortion.

2. Amount Billed ↔︎ Days to Payment (r ≈ 0.12, non-significant): Weak and non-significant. Bill size is not the primary driver of late payment; tenant category is. Discounts tied to invoice value alone would be misguided.

3. Conversion Factor ↔︎ Consumption (r ≈ 0.26): Tenants with higher-energy equipment (commercial ovens, cinema HVAC) have larger conversion factors and consume more — confirms the conversion factor as a useful equipment-type proxy for future predictive models.
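The significance labels attached to these coefficients can be checked with the usual t-test for a Pearson correlation. The sketch assumes n = 111 complete pairs, the number of settled invoices; this is an upper bound, because drop_na() may have removed a few more rows. The function name cor_p_value is illustrative.

```r
# Two-sided p-value for a Pearson correlation r computed from n complete pairs
cor_p_value <- function(r, n) {
  t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
  2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)
}

n_pairs <- 111  # assumption: upper bound, the number of settled invoices
p_bill_delay <- cor_p_value(0.121, n_pairs)  # Amount Billed vs Days to Payment
p_cons_bill  <- cor_p_value(0.993, n_pairs)  # Gas Consumption vs Amount Billed
cat("p (bill vs delay):      ", round(p_bill_delay, 3), "\n")
cat("p (consumption vs bill):", format.pval(p_cons_bill), "\n")
```

A smaller n would only push the bill-versus-delay p-value higher, so the non-significant reading of that pair does not depend on the exact row count.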

8. Regression Analysis

8.1 Model 1 — What Drives Billing Amount?

Code

library(broom)
library(ggfortify)

df_model <- df %>%
  mutate(month_index = as.numeric(factor(billing_month,
                                         levels = sort(unique(billing_month)))))

model_billing <- lm(
  amount_billed ~ tenant_category + consumption_litres + month_index,
  data = df_model
)

summary(model_billing)


Call:
lm(formula = amount_billed ~ tenant_category + consumption_litres + 
    month_index, data = df_model)

Residuals:
   Min     1Q Median     3Q    Max 
-87356 -13983  -1593  12239  81968 

Coefficients:
                           Estimate Std. Error t value Pr(>|t|)    
(Intercept)                16209.52   10104.78   1.604  0.11121    
tenant_categoryCafé       -22205.43    7991.16  -2.779  0.00630 ** 
tenant_categoryCinema      28916.39    8595.21   3.364  0.00102 ** 
tenant_categoryRestaurant -24435.56    6836.16  -3.574  0.00050 ***
consumption_litres           825.71      10.42  79.277  < 2e-16 ***
month_index                 3210.70     620.77   5.172 8.93e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 25570 on 125 degrees of freedom
  (4 observations deleted due to missingness)
Multiple R-squared:  0.9902,    Adjusted R-squared:  0.9898 
F-statistic:  2522 on 5 and 125 DF,  p-value: < 2.2e-16

Code

tidy(model_billing, conf.int = TRUE) %>%
  mutate(across(where(is.numeric), ~ round(., 3))) %>%
  kbl_dnl(cap = "Model 1 Coefficients — Billing Amount (₦)")

Model 1 Coefficients — Billing Amount (₦)
term	estimate	std.error	statistic	p.value	conf.low	conf.high
(Intercept)	16209.523	10104.775	1.604	0.111	-3789.082	36208.128
tenant_categoryCafé	-22205.432	7991.165	-2.779	0.006	-38020.939	-6389.925
tenant_categoryCinema	28916.386	8595.208	3.364	0.001	11905.402	45927.370
tenant_categoryRestaurant	-24435.559	6836.159	-3.574	0.000	-37965.167	-10905.951
consumption_litres	825.706	10.415	79.277	0.000	805.093	846.320
month_index	3210.698	620.772	5.172	0.000	1982.113	4439.283

Code

glance(model_billing) %>%
  select(r.squared, adj.r.squared, sigma, statistic, p.value, AIC) %>%
  mutate(across(everything(), ~ round(., 4))) %>%
  kbl_dnl(cap = "Model 1 Fit Statistics — Billing Amount")

Model 1 Fit Statistics — Billing Amount
r.squared	adj.r.squared	sigma	statistic	p.value	AIC
0.9902	0.9898	25572.3	2522.05	0	3038.728

Code

autoplot(model_billing, which = 1:4, ncol = 2,
         colour = dnl_navy, smooth.colour = dnl_orange,
         ad.colour = "#9CA3AF", label.colour = dnl_orange) +
  theme_dnl() +
  plot_annotation(
    title   = "Figure — Model 1 Regression Diagnostics (Billing Amount)",
    caption = "Source: Dharmattan Nigeria Limited billing ledger",
    theme   = theme(plot.title = element_text(colour = dnl_navy, face = "bold"))
  )

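Interpretation: Model 1 achieves R² ≈ 0.99, so about 99% of the variance in billing amounts is explained. Holding tenant category and billing month constant, each additional litre of gas consumed increases the invoice by approximately ₦826 (95% CI ₦805 to ₦846). Tenant category shifts the intercept: relative to the Bakery baseline, Cinema invoices are about ₦28,900 higher, while Café and Restaurant invoices are about ₦22,200 and ₦24,400 lower at the same consumption and month. The month_index coefficient is positive and highly significant (about ₦3,211 per month, p < 0.001), meaning that invoices for the same consumption rose over the study period. Billing amounts are therefore driven overwhelmingly by consumption but have drifted upward over time, which is consistent with a change in unit price or other time-varying charges and should be checked against the tariff history. Four observations were dropped from the fit because of missing values.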
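As a worked illustration of how the Model 1 coefficients combine, the sketch below rebuilds a fitted invoice by hand from the estimates printed in the coefficient table. The tenant profile (692 litres in the seventh billing month) and the helper name predict_bill are hypothetical choices made for illustration, not part of the original analysis.

```r
# Model 1 estimates copied from the coefficient table (values in naira).
coefs <- c(
  intercept  = 16209.52,
  Cafe       = -22205.43,
  Cinema     = 28916.39,
  Restaurant = -24435.56,
  per_litre  = 825.71,
  per_month  = 3210.70
)

# Hypothetical helper: fitted invoice for a category, consumption and month.
# Bakery is the reference category, so it adds no category offset.
predict_bill <- function(category, litres, month_index) {
  offset <- if (category == "Bakery") 0 else coefs[[category]]
  coefs[["intercept"]] + offset +
    coefs[["per_litre"]] * litres +
    coefs[["per_month"]] * month_index
}

cinema_bill <- predict_bill("Cinema", litres = 692, month_index = 7)
bakery_bill <- predict_bill("Bakery", litres = 692, month_index = 7)

# At equal consumption and month, the gap is exactly the Cinema coefficient.
cinema_bill - bakery_bill
```

With the fitted model object available, predict(model_billing, newdata = ...) would return the same quantity at full precision; the manual version is shown only to make the role of each coefficient explicit.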

8.2 Model 2 — What Drives Payment Delay?

Code

model_payment <- lm(
  days_to_payment ~ tenant_category + consumption_litres +
                    amount_billed + month_index,
  data = df_model %>% filter(paid, days_to_payment >= 0)
)

summary(model_payment)


Call:
lm(formula = days_to_payment ~ tenant_category + consumption_litres + 
    amount_billed + month_index, data = df_model %>% filter(paid, 
    days_to_payment >= 0))

Residuals:
    Min      1Q  Median      3Q     Max 
-18.207  -5.491  -1.371   6.184  27.136 

Coefficients:
                            Estimate Std. Error t value Pr(>|t|)    
(Intercept)                2.226e+01  3.937e+00   5.653 1.39e-07 ***
tenant_categoryCafé       -2.251e+00  3.177e+00  -0.708  0.48030    
tenant_categoryCinema      4.704e+00  3.530e+00   1.333  0.18552    
tenant_categoryRestaurant -8.427e+00  2.774e+00  -3.039  0.00301 ** 
consumption_litres        -2.129e-02  2.839e-02  -0.750  0.45488    
amount_billed              2.233e-05  3.422e-05   0.652  0.51558    
month_index               -1.644e-01  3.044e-01  -0.540  0.59033    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 9.169 on 104 degrees of freedom
Multiple R-squared:  0.2021,    Adjusted R-squared:  0.156 
F-statistic:  4.39 on 6 and 104 DF,  p-value: 0.0005407

Code

tidy(model_payment, conf.int = TRUE) %>%
  mutate(across(where(is.numeric), ~ round(., 3))) %>%
  kbl_dnl(cap = "Model 2 Coefficients — Days to Payment")

Model 2 Coefficients — Days to Payment
term	estimate	std.error	statistic	p.value	conf.low	conf.high
(Intercept)	22.256	3.937	5.653	0.000	14.449	30.063
tenant_categoryCafé	-2.251	3.177	-0.708	0.480	-8.551	4.050
tenant_categoryCinema	4.704	3.530	1.333	0.186	-2.295	11.703
tenant_categoryRestaurant	-8.427	2.774	-3.039	0.003	-13.927	-2.927
consumption_litres	-0.021	0.028	-0.750	0.455	-0.078	0.035
amount_billed	0.000	0.000	0.652	0.516	0.000	0.000
month_index	-0.164	0.304	-0.540	0.590	-0.768	0.439

Code

glance(model_payment) %>%
  select(r.squared, adj.r.squared, sigma, statistic, p.value, AIC) %>%
  mutate(across(everything(), ~ round(., 4))) %>%
  kbl_dnl(cap = "Model 2 Fit Statistics — Payment Delay")

Model 2 Fit Statistics — Payment Delay
r.squared	adj.r.squared	sigma	statistic	p.value	AIC
0.2021	0.156	9.169	4.3899	5e-04	815.6878

Code

tidy(model_payment, conf.int = TRUE) %>%
  filter(term != "(Intercept)") %>%
  mutate(
    significant = p.value < 0.05,
    term = case_when(
      str_detect(term, "Caf")        ~ "Category: Café vs Bakery",
      str_detect(term, "Cinema")     ~ "Category: Cinema vs Bakery",
      str_detect(term, "Restaurant") ~ "Category: Restaurant vs Bakery",
      term == "consumption_litres"   ~ "Gas Consumption (L)",
      term == "amount_billed"        ~ "Billing Amount (₦)",
      term == "month_index"          ~ "Billing Month (trend)",
      TRUE                           ~ term
    )
  ) %>%
  ggplot(aes(x = reorder(term, estimate),
             y = estimate,
             ymin = conf.low, ymax = conf.high,
             colour = significant, fill = significant)) +
  geom_hline(yintercept = 0, linetype = "dashed",
             colour = dnl_orange, linewidth = 0.8) +
  geom_errorbar(width = 0.25, linewidth = 0.9, alpha = 0.7) +
  geom_point(size = 4, shape = 21, stroke = 1.5) +
  scale_colour_manual(values = c("TRUE" = dnl_navy, "FALSE" = "#9CA3AF"),
                      labels = c("TRUE" = "Significant (p < 0.05)",
                                 "FALSE" = "Not significant (p ≥ 0.05)")) +
  scale_fill_manual(values = c("TRUE" = "#C8DCF5", "FALSE" = "#F3F4F6"),
                    guide = "none") +
  coord_flip() +
  labs(title = "Factors Influencing Days to Payment — Model 2",
       subtitle = "Intervals not crossing the orange dashed line are statistically significant",
       caption = "Source: Dharmattan Nigeria Limited billing ledger",
       x = NULL, y = "Coefficient Estimate (additional days vs. Bakery baseline)",
       colour = NULL) +
  theme_dnl() +
  theme(legend.position = "bottom",
        panel.grid.major.y = element_blank())

Code

autoplot(model_payment, which = 1:4, ncol = 2,
         colour = dnl_teal, smooth.colour = dnl_orange,
         ad.colour = "#9CA3AF", label.colour = dnl_orange) +
  theme_dnl() +
  plot_annotation(
    title   = "Figure — Model 2 Regression Diagnostics (Payment Delay)",
    caption = "Source: Dharmattan Nigeria Limited billing ledger",
    theme   = theme(plot.title = element_text(colour = dnl_navy, face = "bold"))
  )

Interpretation: Model 2 explains 20% of variance in payment delay (R² = 0.20, Adj. R² = 0.16), modest but statistically significant (F(6, 104) = 4.39, p < 0.001). The lower R² is expected — payment behaviour is shaped by factors outside the billing data (tenant cash-flow cycles, bank processing times, account manager relationships).
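The fit statistics quoted above are linked by simple arithmetic. As a quick check, the sketch below recomputes the adjusted R² and the overall F-statistic from the reported R², the residual degrees of freedom (104) and the number of slope terms (6). The variable names are illustrative.

```r
r2          <- 0.2021  # Multiple R-squared from the Model 2 summary
p           <- 6       # slope terms: 3 category dummies + 3 numeric predictors
df_residual <- 104     # residual degrees of freedom
n           <- df_residual + p + 1  # observations used in the fit

# Adjusted R-squared penalises R-squared for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_residual

# Overall F: explained variance per term over residual variance per df.
f_stat <- (r2 / p) / ((1 - r2) / df_residual)

round(adj_r2, 3)
round(f_stat, 2)
```

The implied sample size of 111 is consistent with the 135 billing records less the 24 unpaid invoices excluded by the filter.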

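The only statistically significant predictor is tenant category: Restaurant (β = −8.43, p = 0.003). Restaurant tenants pay 8.43 days faster than the Bakery baseline, holding all else constant. Cinema shows β = +4.70 days but does not reach significance in the regression (p = 0.186). This does not demonstrate that Cinema’s delay is explained by its higher bills: neither amount_billed (p = 0.516) nor consumption_litres (p = 0.455) is significant, and because the two are almost perfectly correlated (r = 0.99) their individual coefficients are imprecisely estimated. The wide Cinema interval (−2.3 to +11.7 days) more plausibly reflects the small number of paid Cinema invoices (at most 11 of 13). Taken together, the hypothesis tests establish that Cinema tenants are the slowest payers on average, while the regression cannot separate a Cinema category effect from a bill-size effect with the data available.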
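Because consumption_litres and amount_billed are almost perfectly correlated (r = 0.99), entering both in Model 2 invites multicollinearity. A minimal sketch of a variance inflation factor (VIF) check follows. It runs on simulated data, not the Dharmattan ledger, and the simulated values, seed and helper name vif_one are assumptions made purely for illustration.

```r
set.seed(42)

# Simulated stand-in for the billing data (NOT the real ledger).
n <- 111
sim <- data.frame(
  consumption_litres = runif(n, min = 100, max = 900),
  month_index        = sample(1:13, n, replace = TRUE)
)
# Billing amount is almost a linear function of consumption, as in Model 1.
sim$amount_billed <- 826 * sim$consumption_litres + rnorm(n, sd = 25000)

# VIF for one predictor = 1 / (1 - R^2), where R^2 comes from regressing
# that predictor on all the other predictors in the model.
vif_one <- function(target, others, data) {
  fit <- lm(reformulate(others, response = target), data = data)
  1 / (1 - summary(fit)$r.squared)
}

predictors <- c("consumption_litres", "amount_billed", "month_index")
vifs <- sapply(predictors, function(v) {
  vif_one(v, setdiff(predictors, v), sim)
})

round(vifs, 1)
```

A VIF above roughly 5 to 10 is a common rule of thumb for problematic collinearity. On the real data, car::vif(model_payment) would report generalised VIFs that also account for the tenant_category factor; if inflation is confirmed, dropping one of the two correlated variables is a natural next step.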

Business recommendation: Dharmattan Nigeria Limited should treat Restaurant accounts as a benchmark and investigate what account-management practices sustain their promptness. For Cinema, any invoice above ₦500,000 should trigger a Day 7 courtesy call before the delay materialises.

9. Summary Table

Code

df %>%
  group_by(tenant_category) %>%
  summarise(
    `N (Invoices)`           = n(),
    `Avg Consumption (L)`    = round(mean(consumption_litres, na.rm = TRUE), 0),
    `Median Consumption (L)` = round(median(consumption_litres, na.rm = TRUE), 0),
    `Avg Bill (₦)`           = comma(round(mean(amount_billed, na.rm = TRUE), 0)),
    `Avg Days to Pay`        = round(mean(days_to_payment, na.rm = TRUE), 1),
    `Median Days to Pay`     = round(median(days_to_payment, na.rm = TRUE), 1),
    `Unpaid Invoices`        = sum(!paid)
  ) %>%
  arrange(desc(`Avg Consumption (L)`)) %>%
  kbl_dnl(cap = "Gas Billing Summary by Tenant Category — Dharmattan Nigeria Limited | May 2025 – May 2026")

Gas Billing Summary by Tenant Category — Dharmattan Nigeria Limited | May 2025 – May 2026
tenant_category	N (Invoices)	Avg Consumption (L)	Median Consumption (L)	Avg Bill (₦)	Avg Days to Pay	Median Days to Pay	Unpaid Invoices
Cinema	13	692	720	638,763	25.4	22	2
Bakery	32	646	492	572,450	20.0	26	8
Restaurant	51	334	368	290,080	12.1	9	8
Café	39	179	170	164,329	18.8	21	6

10. Integrated Findings & Recommendation

The five analytical techniques converge on a consistent, coherent story about Dharmattan Nigeria Limited’s tenant billing operations:

EDA revealed two data quality issues (24 unpaid current-cycle invoices and one data-entry anomaly) and established that gas consumption and billing amounts are right-skewed, driven by Bakery tenant TSR001 and the Cinema operator.
Visualisation added the time dimension: Bakery consumption peaks in October–January (a festive-season signal for Dharmattan’s gas procurement team), and Cinema’s payment delays are not only high on average but also highly variable, with the widest distribution of any category.
Hypothesis testing confirmed via both ANOVA and Kruskal-Wallis that gas consumption (p < 0.001, η² = 0.43) and payment delays (p < 0.05) differ significantly across tenant categories; these are structural patterns, not random fluctuations.
Correlation analysis validated metering integrity (r = 0.99), ruled out billing-amount size as a meaningful standalone driver of payment delay (r = 0.11, non-significant), and pointed to conversion factor as a possible, if weak, equipment-type proxy (r ≈ 0.26).
Regression quantified the effects: Model 1, dominated by gas consumption, explains 99% of billing amount variation and shows a significant upward monthly drift in invoices (about ₦3,211 per month), while the Restaurant category advantage (β = −8.43 days, p = 0.003) is the only statistically significant predictor of payment speed.

Single Recommendation: Implement a tiered collections calendar at Dharmattan Nigeria Limited:

Tenant Category	Recommended Action	Trigger
🎬 Cinema	Dedicated account manager call + written reminder	Day 7 post-billing
🥐 Bakery	Automated SMS + email reminder	Day 14 post-billing
☕ Café	Standard automated reminder	Day 21 post-billing
🍽️ Restaurant	No change — maintain current cycle	Self-resolving (avg 12 days)

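Secondary recommendation: Flag any Cinema invoice above ₦500,000 at the point of generation for an escalated Day 7 call. Model 2 does not find billing amount to be a significant predictor of delay (p = 0.516), so the threshold rests on exposure rather than on a demonstrated bill-size effect: Cinema invoices are the largest on average (₦638,763) and the slowest to be paid (25.4 days), and early engagement is more cost-effective than chasing overdue accounts.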
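The tiered calendar and escalation rule above amount to a lookup from tenant category to a follow-up day plus a threshold flag. A minimal sketch, assuming each invoice carries a category label, a billing date and an amount; the helper names follow_up_date and needs_escalation and the sample invoice amounts are hypothetical, while the trigger days and the ₦500,000 threshold are those stated in the recommendation.

```r
# Days after billing at which the first follow-up is due, per the table.
# Restaurants have no scheduled follow-up, so their trigger is NA.
trigger_days <- c(Cinema = 7, Bakery = 14, Cafe = 21, Restaurant = NA)

# Hypothetical helper: date of the first follow-up for each invoice.
follow_up_date <- function(category, billing_date) {
  as.Date(billing_date) + unname(trigger_days[as.character(category)])
}

# Hypothetical helper: escalate Cinema invoices above the naira threshold.
needs_escalation <- function(category, amount_billed, threshold = 500000) {
  category == "Cinema" & amount_billed > threshold
}

# Illustrative invoices (made-up amounts, one per category).
invoices <- data.frame(
  category      = c("Cinema", "Bakery", "Cafe", "Restaurant"),
  billing_date  = as.Date("2026-05-01"),
  amount_billed = c(640000, 570000, 160000, 290000),
  stringsAsFactors = FALSE
)
invoices$follow_up <- follow_up_date(invoices$category, invoices$billing_date)
invoices$escalate  <- needs_escalation(invoices$category, invoices$amount_billed)
invoices
```

Keeping the trigger days in a single named vector means the calendar can be retuned after each review cycle without touching the scheduling logic.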

11. Limitations & Further Work

Small tenant pool: A pool of only 11 tenants limits statistical power. A multi-property dataset across all malls supplied by Dharmattan Nigeria Limited would increase generalisability and enable property-level fixed effects.
Single utility type: Gas consumption is analysed in isolation. Integrating electricity and water billing data would enable a full utility cost-recovery model and reveal whether payment delay is utility-specific or tenant-level behaviour.
Payment mechanism unobserved: The dataset records date paid but not payment method (bank transfer, cash, cheque). Payment channel may explain variance in days-to-payment that Model 2 cannot currently capture (Adj. R² = 0.16).
No dispute flag: Contested invoices could inflate days-to-payment for certain tenants. A dispute indicator variable would improve Model 2 and prevent misclassifying legitimate disputes as collection failures.
Further work: (1) Time-series decomposition to formally separate trend, seasonality, and residuals; (2) survival analysis (Kaplan-Meier + Cox proportional hazards) treating days-to-payment as a right-censored event — properly handling the 24 unpaid invoices discarded by the current regression; (3) a logistic regression predicting the probability that any invoice will exceed 30 days, deployable as a real-time early-warning tool for Dharmattan’s collections team.
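To make the third item of further work concrete, the sketch below fits a logistic regression for the probability that an invoice exceeds 30 days and scores a new invoice. It uses simulated data, not the Dharmattan ledger: the sample size, effect sizes, seed and the variable names late_30 and amount_100k are all invented for illustration, so the fitted numbers carry no meaning for the real tenants.

```r
set.seed(1)

# Simulated invoices (NOT the real ledger).
n <- 300
sim <- data.frame(
  tenant_category = factor(sample(c("Bakery", "Cafe", "Cinema", "Restaurant"),
                                  n, replace = TRUE)),
  amount_100k     = runif(n, min = 0.5, max = 9)  # bill in units of 100,000 naira
)

# Assumed data-generating process: Cinema invoices are late more often,
# Restaurant invoices less often. These effect sizes are invented.
log_odds <- -1 +
  1.5 * (sim$tenant_category == "Cinema") -
  1.5 * (sim$tenant_category == "Restaurant")
sim$late_30 <- rbinom(n, size = 1, prob = plogis(log_odds))

# Logistic regression: probability that an invoice exceeds 30 days.
model_late <- glm(late_30 ~ tenant_category + amount_100k,
                  data = sim, family = binomial)

# Score a new invoice at the point of generation.
new_invoice <- data.frame(
  tenant_category = factor("Cinema", levels = levels(sim$tenant_category)),
  amount_100k     = 6.4
)
p_late <- predict(model_late, newdata = new_invoice, type = "response")
p_late
```

On the real ledger the outcome would need a clear rule for unpaid invoices, for example counting an invoice as late once it has been outstanding for more than 30 days, so that the unpaid records are not simply discarded.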

References

Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making — from data fundamentals to machine learning in Python and R. Lagos Business School / markanalytics.online. https://markanalytics.online

R Core Team. (2024). R: A language and environment for statistical computing (Version 4.4). R Foundation for Statistical Computing. https://www.R-project.org/

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer. https://doi.org/10.1007/978-3-319-24277-4

Grolemund, G., & Wickham, H. (2011). Dates and times made easy with lubridate. Journal of Statistical Software, 40(3), 1–25. https://doi.org/10.18637/jss.v040.i03

Wei, T., & Simko, V. (2021). R package corrplot: Visualisation of a correlation matrix (Version 0.92). https://github.com/taiyun/corrplot

Fox, J., & Weisberg, S. (2019). An R companion to applied regression (3rd ed.). SAGE Publications. [R package: car]

Ben-Shachar, M. S., Lüdecke, D., & Makowski, D. (2020). effectsize: Estimation of effect size indices and standardized parameters. Journal of Open Source Software, 5(56), 2815. https://doi.org/10.21105/joss.02815

Kassambara, A. (2023). rstatix: Pipe-friendly framework for basic statistical tests (Version 0.7.2). https://CRAN.R-project.org/package=rstatix

Robinson, D., Hayes, A., & Couch, S. (2023). broom: Convert statistical objects into tidy tibbles (Version 1.0.5). https://CRAN.R-project.org/package=broom

Harrell, F. E., & Dupont, C. (2023). Hmisc: Harrell miscellaneous (Version 5.1). https://CRAN.R-project.org/package=Hmisc

Zhu, H. (2024). kableExtra: Construct complex table with kable and pipe syntax (Version 1.4.0). https://CRAN.R-project.org/package=kableExtra

Pedersen, T. L. (2024). patchwork: The composer of plots (Version 1.2.0). https://CRAN.R-project.org/package=patchwork

Waring, E., Quinn, M., McNamara, A., Arino de la Rubia, E., Zhu, H., & Ellis, S. (2022). skimr: Compact and flexible summaries of data (Version 2.1.5). https://CRAN.R-project.org/package=skimr

Tang, Y., Horikoshi, M., & Li, W. (2016). ggfortify: Unified interface to visualize statistical results of popular R packages. The R Journal, 8(2), 478–489. https://doi.org/10.32614/RJ-2016-060

Olosunde, I. (2026). Dharmattan Nigeria Limited — mall tenant gas billing records, anonymised [Dataset]. Collected from Dharmattan Nigeria Limited Facilities Department, Lagos, Nigeria. Data available on request from the author.

Appendix: AI Usage Statement

Claude (Anthropic) was used to assist with R code structuring and Quarto formatting for the data cleaning pipeline, custom ggplot2 theme design, ANOVA, Kruskal-Wallis, correlation, and regression sections. The identification of the normality violation and the decision to use Kruskal-Wallis as the primary inferential test alongside ANOVA, the selection of Dunn’s test with Bonferroni correction for post-hoc analysis, the interpretation of regression model coefficients using the actual rendered output values (β = −8.43 for Restaurant, β = +4.70 for Cinema), and all business recommendations were independently determined by me based on direct operational knowledge of Dharmattan Nigeria Limited’s billing processes and tenant payment behaviour. The dataset was personally collected, cleaned, and verified by the author from the organisation’s billing ledger.