Receivables Analytics at Pastures Hub: An Exploratory & Inferential Study of B2B Payment Behavior Across Four Nigerian Cities
Author
Chioma Imemba
Published
May 13, 2026
1. Executive Summary
Every week, I look at our receivables dashboard and ask the same question: why are some clients sitting on unpaid invoices for 60, 70, 80 days when our terms say 30? I am the founder and CEO of Pastures Hub, a Nigerian B2B food distribution company serving the HoReCa sector (hotels, restaurants, and catering) across four major regions: Lagos, Port Harcourt (PHC), Abuja, and Ibadan. For me, cash flow is not an abstract finance concept. It is the difference between meeting our financial obligations and failing to meet them.
I applied five techniques to 3,375 invoices issued between November 2024 and April 2026 across 68 unique clients. The approach was to understand the data first, then test, then model:
Exploratory Data Analysis (EDA)
Visualization
Hypothesis Testing
Correlation
Regression
The results were clearer than I expected. PHC clients pay 16 days faster than Lagos clients on average, and that gap is statistically significant. More surprisingly, the terms I set are almost self-fulfilling: every extra day of credit I extend adds about two more days to how long clients actually take to pay. That matters.
Recommendation: Based on the evidence, Pastures Hub should move immediately to a tiered credit structure: Net 15 for new clients and small invoices, Net 21 as our standard, and Net 30 reserved only for clients with a proven track record. Alongside this, Lagos needs a dedicated collections protocol — a firm reminder call or message at Day 14, before invoices go overdue. Together, these two changes address the root cause the data identified: we are giving clients more time than we need to, and Lagos clients in particular are taking full advantage of it. What makes this more urgent is that our internal CLIENT_TERMS records show 42% of credit clients are already paying beyond their contracted terms, meaning the problem is not just about what terms we set but how well we enforce them.
Organization: Pastures Hub, a B2B food distribution company operating from four major hubs - Lagos, Port Harcourt, Abuja, and Ibadan.
Relationship to data: Invoice data are from our Zoho Books accounting system; client payment terms are from an internal Pastures Hub masterfile. I am the data controller for both and have authorized their use for this submission. All client names have been anonymized.
EDA is the starting point for every credit and operations decision I make. Before I can manage our receivables, I need to understand the shape of our invoice portfolio — how much is outstanding, where it sits, which clients and regions dominate exposure, and where the data has gaps or anomalies. EDA gives me that picture systematically rather than relying on gut feel. It’s like switching on the lights in a dark room.
Data Visualization is how I communicate financial health to my team and to potential investors. A well-constructed dashboard or chart showing the receivables stack by branch or the trend in overdue invoices over time conveys in seconds what a table of numbers cannot. A picture speaks a thousand words.
Hypothesis Testing addresses questions I face in every monthly review: Do Lagos clients actually pay slower than PHC clients, or does it just feel that way? Is there a real difference in invoice size across branches, or is it random variation? Hypothesis testing is how we validate evidence with numbers rather than instincts, and this is critical when making staffing, credit, and expansion decisions.
Correlation Analysis helps me understand how one variable impacts another. For Pastures Hub, the key question is what actually drives payment speed — is it how much a client owes, how long they’ve been with us, or the credit terms we set? Knowing which variable matters most tells me where to focus.
Regression helps me begin with the end in mind by turning correlation into prediction and prediction into action. It’s like a formula I can actually use before I issue the invoice. Instead of finding out 60 days later that a client is late, I can anticipate it upfront and plan my cash flow accordingly.
3. Data Collection & Sampling
Source & Collection Method
The data were extracted directly from Zoho Books, Pastures Hub’s invoicing and accounts-receivable system. Zoho Books is the system of record for all sales transactions; every invoice issued to a client is created, tracked, and closed (or marked overdue) within this platform.
Exports were performed using Zoho Books’ built-in CSV export function under Sales → Invoices → Export. Because Zoho limits exports to 180-day windows, six separate exports were performed covering the period November 2024 to April 2026 and subsequently merged.
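The merge step described above could look roughly like the sketch below. It is a minimal illustration, not the exact procedure used: the file names, the two-column layout, and de-duplicating on `invoice_id` are all assumptions, and temporary files stand in for the real Zoho Books exports.

```r
# Hypothetical stand-ins for Zoho Books CSV exports (real exports have more columns)
export_dir <- tempfile("zoho_exports_")
dir.create(export_dir)
write.csv(
  data.frame(invoice_id = c("INV-001", "INV-002"), total = c(1000, 2500)),
  file.path(export_dir, "export_1.csv"), row.names = FALSE
)
write.csv(
  data.frame(invoice_id = c("INV-002", "INV-003"), total = c(2500, 400)),
  file.path(export_dir, "export_2.csv"), row.names = FALSE
)

# Read every export and stack them into one data frame
files  <- list.files(export_dir, pattern = "\\.csv$", full.names = TRUE)
merged <- do.call(rbind, lapply(files, read.csv, stringsAsFactors = FALSE))

# Drop invoices that appear in more than one export window
# (assumes invoice_id uniquely identifies an invoice)
merged <- merged[!duplicated(merged$invoice_id), ]

nrow(merged)
```

De-duplicating after the merge matters when export windows overlap at their boundaries, since an invoice dated on a boundary day could otherwise be counted twice.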
Sampling Frame
The dataset is a complete census of all invoices issued by Pastures Hub during the study period — not a sample. Every invoice created in Zoho Books within the date window is included. No invoices were excluded except those with missing dates or zero-value totals (data quality removals documented in Section 4). A supplementary dataset of contracted client payment terms was drawn from an internal Pastures Hub operations masterfile and is used in Section 9.
Variables
| Variable | Type | Description |
|---|---|---|
| invoice_date | Date | Date invoice was issued |
| invoice_id | ID | Unique Zoho invoice identifier |
| invoice_status | Categorical | Closed / Overdue / Open / Draft / PartiallyPaid |
| customer_name (anonymized) | Categorical | Client identity (anonymized to Customer_001 etc.) |
| location | Categorical | Branch: Lagos / PHC / Abuja / Ibadan |
| total | Numeric | Invoice value in NGN |
| balance | Numeric | Outstanding amount in NGN |
| due_date | Date | Payment due date |
| last_payment_date | Date | Date of final payment |
| payment_terms_label | Categorical | e.g. Net 15, Net 21, Net 30 |
| days_to_payment | Numeric (derived) | Last Payment Date − Invoice Date |
| paid_on_time | Binary (derived) | 1 if paid on or before due date, else 0 |
| days_late | Numeric (derived) | max(0, Last Payment Date − Due Date) |
| size_bucket | Categorical (derived) | Invoice size tier |
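The three derived date variables in the table above could be computed along these lines. This is a sketch on made-up rows: the column names follow the table, but the values are invented and the actual derivation code may differ.

```r
# Toy invoices; column names follow the variables table, values are made up
inv <- data.frame(
  invoice_date      = as.Date(c("2025-01-01", "2025-01-01", "2025-01-01")),
  due_date          = as.Date(c("2025-01-31", "2025-01-31", "2025-01-31")),
  last_payment_date = as.Date(c("2025-01-20", "2025-03-02", NA))
)

# Days from invoice to final payment (NA while the invoice is unpaid)
inv$days_to_payment <- as.numeric(inv$last_payment_date - inv$invoice_date)

# 1 if paid on or before the due date, else 0 (NA while unpaid)
inv$paid_on_time <- as.integer(inv$last_payment_date <= inv$due_date)

# Days past due, floored at zero so early payments do not go negative
inv$days_late <- pmax(0, as.numeric(inv$last_payment_date - inv$due_date))

inv
```

Leaving unpaid invoices as `NA` rather than zero keeps them out of averages, which is consistent with the same number of missing values appearing for all three derived variables in the EDA output.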
Ethical Statement
All client names have been anonymized using sequential codes (Customer_001, Customer_002, …) before analysis. No personally identifiable information appears in the published document. As founder of Pastures Hub, I am the data controller for these records and have authorized their use for this academic submission. The data are internal business records and are not shared beyond this submission.
4. Data Description (EDA)
Theoretical Background
Before I could ask any analytical question about Pastures Hub’s receivables, I needed to know what I was actually working with. What I found was exactly what you would expect from a business running on parallel systems.
These were not clean Zoho exports. Our supply chain team had been manually entering daily supplies into Google Sheets alongside Zoho Books, keeping both files for reconciliation. Over time that became a drag, but it meant the data carried all the marks of manual entry: client names in multiple variations (“Genesis Restaurant”, “GENESIS RESTAURANT LAGOS”, “Genesis Rst”), inconsistent date formats, and records that needed significant cleaning before any analysis could begin.
I standardised the client names manually. I reviewed every variation and collapsed them, which meant that by the time I ran any code, I knew exactly who was in this dataset and could spot anomalies immediately. EDA is the discipline of doing this systematically rather than hoping the data is clean (Adi, 2026).
Anscombe’s Quartet is the classic reminder of why you cannot skip this step. Four datasets can share nearly identical summary statistics and look completely different when you plot them. I needed to see the shape of our invoice portfolio, not just its totals.
What I found surprised me in places. 34 invoices sat above the 99th percentile in value, extreme outliers I retained but flagged. And while I feared there might be payment dates recorded before invoice dates, a common sign of data entry error, there were none. The data was messier than I expected in some ways and cleaner than I expected in others.
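The two checks described above, flagging extreme invoice values and looking for payments dated before the invoice, can be sketched as follows. The data are simulated and the column name `is_outlier` is hypothetical; only the logic is meant to carry over.

```r
set.seed(1)
# Simulated right-skewed invoice totals (not Pastures Hub data)
inv <- data.frame(total = round(rlnorm(1000, meanlog = 12, sdlog = 1)))
inv$days_to_payment <- sample(0:120, 1000, replace = TRUE)

# Flag, rather than drop, invoices above the 99th percentile of value
p99 <- quantile(inv$total, 0.99)
inv$is_outlier <- inv$total > p99

# Integrity check: a payment recorded before the invoice date would
# show up as a negative days_to_payment
n_negative <- sum(inv$days_to_payment < 0, na.rm = TRUE)

c(outliers = sum(inv$is_outlier), negative_dtp = n_negative)
```

Keeping a flag column instead of filtering means later steps can be run with and without the extreme invoices to see how much they move the results.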
Code
# Comprehensive dataset overview using skimr: the standard first-look tool
# for understanding distributions, missingness, and data types simultaneously (Adi, 2026)
skim(
  invoices |>
    select(invoice_date, invoice_status, location, total, balance,
           payment_terms_label, days_to_payment, paid_on_time, days_late)
)
Data summary
Name: select(…)
Number of rows: 3375
Number of columns: 9
Column type frequency: character 2, Date 1, factor 1, numeric 5
Group variables: None

Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| invoice_status | 0 | 1 | 4 | 13 | 0 | 5 | 0 |
| payment_terms_label | 0 | 1 | 5 | 14 | 0 | 10 | 0 |

Variable type: Date
| skim_variable | n_missing | complete_rate | min | max | median | n_unique |
|---|---|---|---|---|---|---|
| invoice_date | 0 | 1 | 2024-11-06 | 2026-04-24 | 2025-09-08 | 454 |

Variable type: factor
| skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
|---|---|---|---|---|---|
| location | 0 | 1 | FALSE | 4 | Lag: 1975, PHC: 1299, Abu: 52, Iba: 49 |

Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| total | 0 | 1.00 | 368412.70 | 687963.07 | 400 | 77638.25 | 198603 | 406086.2 | 16169820 | ▇▁▁▁▁ |
| balance | 0 | 1.00 | 66705.94 | 472908.29 | 0 | 0.00 | 0 | 0.0 | 16169820 | ▇▁▁▁▁ |
| days_to_payment | 569 | 0.83 | 49.52 | 47.12 | 0 | 14.00 | 38 | 76.0 | 457 | ▇▁▁▁▁ |
| paid_on_time | 569 | 0.83 | 0.34 | 0.47 | 0 | 0.00 | 0 | 1.0 | 1 | ▇▁▁▁▅ |
| days_late | 569 | 0.83 | 29.02 | 41.16 | 0 | 0.00 | 16 | 47.0 | 397 | ▇▁▁▁▁ |
Code
# Basic summary statistics: build as a simple tibble to avoid type conflicts
tibble(
  Metric = c(
    "Total Invoices", "Unique Clients",
    "Date Range Start", "Date Range End",
    "Total Revenue (NGN)", "Total Outstanding (NGN)",
    "Median Invoice (NGN)", "Mean Invoice (NGN)"
  ),
  Value = c(
    as.character(nrow(invoices)),
    as.character(n_distinct(invoices$customer_id)),
    as.character(min(invoices$invoice_date)),
    as.character(max(invoices$invoice_date)),
    formatC(sum(invoices$total, na.rm = TRUE), format = "f", digits = 0, big.mark = ","),
    formatC(sum(invoices$balance, na.rm = TRUE), format = "f", digits = 0, big.mark = ","),
    formatC(median(invoices$total, na.rm = TRUE), format = "f", digits = 0, big.mark = ","),
    formatC(mean(invoices$total, na.rm = TRUE), format = "f", digits = 0, big.mark = ",")
  )
) |>
  kable(caption = "Portfolio Summary Statistics")
EDA Interpretation: The portfolio tells a clear story: ₦1.24 billion invoiced over 18 months, with ₦225 million (roughly 18%) still outstanding. The mean invoice value of ₦368,413 is nearly double the median of ₦198,603, confirming strong right-skew. In practice this means a small number of large hotel clients are driving a disproportionate share of both revenue and receivables risk. Lagos handles the most volume (1,975 invoices) but also carries the heaviest outstanding burden (₦155 million). PHC, despite fewer invoices, has a healthier outstanding-to-revenue ratio. Two data quality checks were run and reported transparently: 34 extreme-value outliers above the 99th percentile were retained but flagged, and no negative days-to-payment values were found, which is reassuring about data integrity.
Sample Data (Anonymized)
Code
# Show 10 anonymized rows to demonstrate data structure
set.seed(42)
invoices |>
  select(invoice_date, customer_id, location, invoice_status,
         total, balance, payment_terms_label, days_to_payment) |>
  slice_sample(n = 10) |>
  mutate(
    total = formatC(total, format = "f", digits = 0, big.mark = ","),
    balance = formatC(balance, format = "f", digits = 0, big.mark = ",")
  ) |>
  rename(
    `Invoice Date` = invoice_date,
    `Client ID` = customer_id,
    `Branch` = location,
    `Status` = invoice_status,
    `Total (₦)` = total,
    `Balance (₦)` = balance,
    `Terms` = payment_terms_label,
    `Days to Pay` = days_to_payment
  ) |>
  kable(caption = "10 Randomly Sampled Invoices (anonymized) — illustrating data structure")
10 Randomly Sampled Invoices (anonymized) — illustrating data structure
| Invoice Date | Client ID | Branch | Status | Total (₦) | Balance (₦) | Terms | Days to Pay |
|---|---|---|---|---|---|---|---|
| 2026-01-15 | Customer_29 | Lagos | Closed | 524,300 | 0 | Net 30 | 96 |
| 2025-12-10 | Customer_29 | Lagos | Closed | 104,820 | 0 | Net 30 | 98 |
| 2025-06-10 | Customer_45 | Lagos | Closed | 63,330 | 0 | Net 21 | 11 |
| 2025-05-27 | Customer_6 | PHC | Closed | 214,680 | 0 | Net 30 | 26 |
| 2025-06-20 | Customer_29 | Lagos | Closed | 445,972 | 0 | Net 30 | 84 |
| 2026-03-30 | Customer_29 | Lagos | Open | 762,560 | 762,560 | Net 30 | NA |
| 2025-03-17 | Customer_29 | Lagos | Closed | 203,240 | 0 | Net 30 | 60 |
| 2025-11-06 | Customer_29 | Lagos | Closed | 2,600 | 0 | Net 30 | 82 |
| 2025-06-06 | Customer_29 | Lagos | Closed | 379,500 | 0 | Net 30 | 90 |
| 2025-07-04 | Customer_8 | PHC | Closed | 208,882 | 0 | Net 30 | 10 |
5. Data Visualization
Theoretical Background
I have sat in enough management meetings to know that a table of numbers loses people in thirty seconds. A chart does not. The five visualisations in this section are not decoration. Each one was chosen deliberately to answer a specific question about Pastures Hub’s receivables that a table could not answer clearly (Adi, 2026).
A histogram shows me the shape of our invoice distribution in a way that a mean and median cannot. A density plot shows me not just that PHC pays faster, but how consistent that payment behavior is compared to Lagos. The sequence below is intentional: growth first, then concentration, then exposure, then speed, then trajectory. Together they tell one story.
Code
invoices |>
  count(year_month) |>
  ggplot(aes(x = year_month, y = n)) +
  geom_col(fill = "#2c7bb6", alpha = 0.85) +
  geom_smooth(method = "loess", se = FALSE, color = "#d7191c", linewidth = 1) +
  scale_x_date(date_breaks = "2 months", date_labels = "%b %Y") +
  scale_y_continuous(labels = label_comma()) +
  labs(
    title = "Monthly Invoice Volume — Pastures Hub",
    subtitle = "Nov 2024 – Apr 2026 | Red line = smoothed trend",
    x = NULL, y = "Number of Invoices"
  ) +
  theme_minimal(base_size = 13) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Code
invoices |>
  group_by(location) |>
  summarise(
    Invoices = n(),
    Revenue = sum(total) / 1e6
  ) |>
  pivot_longer(c(Invoices, Revenue), names_to = "Metric", values_to = "Value") |>
  ggplot(aes(x = location, y = Value, fill = location)) +
  geom_col(alpha = 0.85) +
  facet_wrap(~Metric, scales = "free_y") +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title = "Invoice Volume and Revenue by Branch",
    subtitle = "Revenue in millions NGN",
    x = NULL, y = NULL, fill = "Branch"
  ) +
  theme_minimal(base_size = 13) +
  theme(legend.position = "bottom")
Invoice Volume and Revenue by Branch — Lagos and PHC dominate
Code
invoices |>
  filter(balance > 0) |>
  group_by(location, invoice_status) |>
  summarise(Outstanding = sum(balance) / 1e6, .groups = "drop") |>
  ggplot(aes(x = location, y = Outstanding, fill = invoice_status)) +
  geom_col() +
  scale_fill_manual(
    values = c("Overdue" = "#d7191c", "Open" = "#fdae61",
               "PartiallyPaid" = "#abd9e9", "Draft" = "#aaaaaa")
  ) +
  scale_y_continuous(labels = label_comma()) +
  labs(
    title = "Outstanding Receivables by Branch and Status",
    subtitle = "Values in millions NGN",
    x = NULL, y = "Outstanding (₦ millions)", fill = "Status"
  ) +
  theme_minimal(base_size = 13) +
  theme(legend.position = "bottom")
Outstanding Receivables Stack by Branch — the overdue burden
Code
paid |>
  ggplot(aes(x = days_to_payment, fill = location)) +
  geom_density(alpha = 0.5) +
  facet_wrap(~location, nrow = 2) +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title = "Days-to-Payment Distribution by Branch",
    subtitle = "Paid invoices only | Excludes outliers > 365 days",
    x = "Days from Invoice Date to Payment",
    y = "Density"
  ) +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none")
Days-to-Payment Distribution by Branch — PHC pays faster
Code
invoices |>
  group_by(year_month) |>
  summarise(avg_invoice = mean(total, na.rm = TRUE)) |>
  ggplot(aes(x = year_month, y = avg_invoice)) +
  geom_line(color = "#2c7bb6", linewidth = 1) +
  geom_point(color = "#2c7bb6", size = 2) +
  geom_smooth(method = "lm", se = TRUE, color = "#d7191c",
              linetype = "dashed", alpha = 0.15) +
  scale_x_date(date_breaks = "2 months", date_labels = "%b %Y") +
  scale_y_continuous(labels = label_comma()) +
  labs(
    title = "Average Invoice Value Over Time",
    subtitle = "Red dashed line = linear trend | Pastures Hub Nov 2024 – Apr 2026",
    x = NULL, y = "Average Invoice Value (NGN)"
  ) +
  theme_minimal(base_size = 13) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Average Invoice Value Over Time — rising trend signals business growth
Visualization Narrative: The five charts tell a single story about a growing business sitting on a receivables problem it has not yet systematically addressed. Volume is up — invoice counts roughly doubled between November 2024 and early 2026 (Chart 1). Lagos and PHC dominate, but Lagos carries nearly three times the outstanding balance of PHC relative to its revenue share (Chart 2 and Chart 3). The density plots reveal not just that PHC pays faster on average, but that PHC payments cluster tightly around 30–40 days while Lagos payments are spread across a much wider range — meaning Lagos has not just a speed problem but a consistency problem (Chart 4). Average invoice values are rising, particularly since mid-2025, suggesting Pastures Hub is winning larger clients — which will intensify the receivables challenge if credit policy does not keep pace (Chart 5).
6. Hypothesis Testing
Theoretical Background
I had a strong suspicion going into this analysis that Lagos clients pay slower than PHC clients. But a suspicion is not evidence. When I am making credit policy decisions that affect cash flow across four branches, I need more than gut feel.
Hypothesis testing is the tool that converts an observation into a verdict (Adi, 2026). It forces me to state what I expect to find, then asks: could this result have happened by chance? The p-value answers that question.
But Prof. Bongo is clear on something I also want to be clear about here: statistical significance is not the same as practical significance. A difference can be real and still be too small to act on. That is why I report effect sizes, Cohen’s d and Cramér’s V, alongside every p-value in this section.
Hypothesis 1: Do Lagos clients pay slower than PHC clients?
Code
# Welch two-sample t-test (does not assume equal variances)
lagos_dtp <- paid |> filter(location == "Lagos") |> pull(days_to_payment)
phc_dtp <- paid |> filter(location == "PHC") |> pull(days_to_payment)

# Assumption check: normality (Shapiro on sample)
set.seed(42)
shap_lagos <- shapiro.test(sample(lagos_dtp, min(5000, length(lagos_dtp))))
shap_phc <- shapiro.test(sample(phc_dtp, min(5000, length(phc_dtp))))

# Since n is large, CLT applies; proceed with Welch t-test
h1_test <- t.test(lagos_dtp, phc_dtp, alternative = "greater", var.equal = FALSE)

# tidy() output converts the test result to a clean tibble (Adi, 2026)
tidy(h1_test)

# Effect size: Cohen's d
pooled_sd <- sqrt((var(lagos_dtp) + var(phc_dtp)) / 2)
cohens_d <- (mean(lagos_dtp) - mean(phc_dtp)) / pooled_sd

# Summary (the first column is named with a single space so its header prints blank;
# an empty backtick name is a syntax error in R)
tibble(
  ` ` = c("H₀", "H₁", "Test", "Lagos Mean DTP (days)", "PHC Mean DTP (days)",
          "t-statistic", "p-value", "Cohen's d", "Decision"),
  Value = c(
    "Mean days-to-payment is equal for Lagos and PHC clients",
    "Lagos clients take longer to pay than PHC clients",
    "Welch two-sample t-test (one-tailed)",
    round(mean(lagos_dtp), 1),
    round(mean(phc_dtp), 1),
    round(h1_test$statistic, 3),
    format.pval(h1_test$p.value, digits = 3),
    round(cohens_d, 3),
    ifelse(h1_test$p.value < 0.05, "Reject H₀ — significant difference", "Fail to reject H₀")
  )
) |>
  kable(caption = "Hypothesis 1: Lagos vs PHC Days-to-Payment")
Hypothesis 1: Lagos vs PHC Days-to-Payment
| | Value |
|---|---|
| H₀ | Mean days-to-payment is equal for Lagos and PHC clients |
| H₁ | Lagos clients take longer to pay than PHC clients |
| Test | Welch two-sample t-test (one-tailed) |
| Lagos Mean DTP (days) | 56 |
| PHC Mean DTP (days) | 39.5 |
| t-statistic | 9.158 |
| p-value | <2e-16 |
| Cohen’s d | 0.363 |
| Decision | Reject H₀ — significant difference |
Business Interpretation: The result is unambiguous: Lagos clients take an average of 56 days to pay versus 39.5 days for PHC clients — a gap of 16.5 days. With p < 0.001, this is not a statistical artefact. Cohen’s d of 0.363 places this in the small-to-medium range by conventional benchmarks (0.2 = small, 0.5 = medium), which in a receivables context is operationally significant. On a Lagos outstanding balance of ₦155 million, collecting 16 days faster would represent a meaningful acceleration of cash inflow. The practical implication is immediate: Lagos clients should receive shorter default payment terms (Net 21 rather than Net 30), more frequent reminders, and escalation protocols that PHC does not yet require.
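For readers who want the two statistics in symbols, the Welch t-statistic and the version of Cohen's d computed in the code above (with L for Lagos and P for PHC, and a pooled standard deviation taken as the root of the average of the two variances) can be written as:

```latex
\[
t = \frac{\bar{x}_{L} - \bar{x}_{P}}
         {\sqrt{\dfrac{s_{L}^{2}}{n_{L}} + \dfrac{s_{P}^{2}}{n_{P}}}},
\qquad
d = \frac{\bar{x}_{L} - \bar{x}_{P}}
         {\sqrt{\dfrac{s_{L}^{2} + s_{P}^{2}}{2}}}
\]
```

Working backwards from the reported values, a 16.5-day gap with d = 0.363 implies a pooled standard deviation of roughly 45 days. The gap is real, but individual invoices in both branches vary by far more than the difference between the branch averages.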
Hypothesis 2: Is invoice size independent of branch location?
Code
# Chi-squared test of independence: size bucket vs location
cross_tab <- table(invoices$size_bucket, invoices$location)
h2_test <- chisq.test(cross_tab)

# tidy() output, consistent with broom workflow throughout (Adi, 2026)
tidy(h2_test)

# Expected counts (check assumption: all expected > 5)
min_expected <- min(h2_test$expected)

# Cramér's V for effect size
n_total <- sum(cross_tab)
cramers_v <- sqrt(h2_test$statistic / (n_total * (min(nrow(cross_tab), ncol(cross_tab)) - 1)))

# First column is named with a single space so its header prints blank
tibble(
  ` ` = c("H₀", "H₁", "Test", "Chi-squared statistic", "Degrees of freedom",
          "p-value", "Cramér's V", "Min expected count", "Decision"),
  Value = c(
    "Invoice size bucket and branch location are independent",
    "Invoice size distribution differs across branch locations",
    "Pearson chi-squared test of independence",
    round(h2_test$statistic, 3),
    h2_test$parameter,
    format.pval(h2_test$p.value, digits = 3),
    round(cramers_v, 3),
    round(min_expected, 1),
    ifelse(h2_test$p.value < 0.05,
           "Reject H₀ — size and location are not independent",
           "Fail to reject H₀")
  )
) |>
  kable(caption = "Hypothesis 2: Invoice Size vs Branch Location")
Hypothesis 2: Invoice Size vs Branch Location
| | Value |
|---|---|
| H₀ | Invoice size bucket and branch location are independent |
| H₁ | Invoice size distribution differs across branch locations |
| Test | Pearson chi-squared test of independence |
| Chi-squared statistic | 164.425 |
| Degrees of freedom | 9 |
| p-value | <2e-16 |
| Cramér’s V | 0.127 |
| Min expected count | 7.8 |
| Decision | Reject H₀ — size and location are not independent |
Code
invoices |>
  count(location, size_bucket) |>
  group_by(location) |>
  mutate(pct = n / sum(n)) |>
  ggplot(aes(x = location, y = pct, fill = size_bucket)) +
  geom_col() +
  scale_y_continuous(labels = label_percent()) +
  scale_fill_brewer(palette = "RdYlBu") +
  labs(
    title = "Invoice Size Distribution by Branch",
    x = NULL, y = "Proportion of Invoices", fill = "Size Bucket"
  ) +
  theme_minimal(base_size = 13) +
  theme(legend.position = "bottom")
Invoice Size Distribution by Branch — revealing structural differences
Business Interpretation: The chi-squared test confirms (p < 0.001) that invoice size mix is not the same across branches — the distribution of small, medium, large, and very large invoices differs meaningfully by location. Cramér’s V of 0.127 indicates a weak-to-moderate association: branches differ in size mix, but it is not the dominant factor. Looking at the chart, Abuja and PHC carry a higher proportion of large and very large invoices, while Lagos has more medium-sized invoices in volume terms. This matters for credit management: a branch dominated by large invoices carries more concentrated risk — one delayed ₦2 million invoice affects cash flow far more than ten delayed ₦200,000 ones. Abuja, despite low volume, warrants disproportionate attention given its large-invoice concentration and relatively high outstanding balance (₦11 million on just 52 invoices).
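The reported Cramér's V can be reproduced by hand from the test output. The sketch below assumes every one of the 3,375 invoices entered the cross-tabulation (no missing size buckets) and reads the table shape off the degrees of freedom: with four branches, df = 9 implies four size buckets.

```r
chi_sq <- 164.425   # chi-squared statistic reported for Hypothesis 2
n      <- 3375      # assumed: all invoices are in the cross-tabulation
k      <- 4         # smaller table dimension; df = 9 implies a 4 x 4 table

# Cramér's V rescales chi-squared to a 0-1 range
cramers_v <- sqrt(chi_sq / (n * (k - 1)))
round(cramers_v, 3)
```

The result agrees with the 0.127 in the table, which is a useful reminder that a very small p-value here owes a lot to the sample size: the same V on a few hundred invoices would look far less decisive.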
7. Correlation Analysis
Theoretical Background
Hypothesis testing told me that Lagos clients pay slower. Correlation analysis is where I start asking why. Does it come down to invoice size? Is it the credit terms I set? Or something else entirely?
Correlation measures how strongly two variables move together, on a scale from -1 to +1 (Adi, 2026). Pearson’s r captures linear relationships; Spearman’s ρ is more robust when the data is skewed, which ours is.
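A small simulation shows why the two coefficients can disagree. The data below are invented: the relationship is strictly increasing but curved, which is the situation where Spearman's rank-based measure tends to come out higher than Pearson's.

```r
set.seed(42)
# Simulated monotonic but non-linear relationship with a skewed outcome
x <- runif(500, min = 1, max = 10)
y <- exp(x / 2) + rnorm(500, sd = 2)

pearson  <- cor(x, y, method = "pearson")   # sensitive to the curvature
spearman <- cor(x, y, method = "spearman")  # uses ranks only

round(c(pearson = pearson, spearman = spearman), 3)
```

When Spearman exceeds Pearson by a clear margin in real data, it is a hint that the variables move together in order but not along a straight line.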
But the most important thing Prof. Bongo taught me about correlation is what it cannot tell you: that one thing causes another. Two variables can move together because a third variable is driving both. That is exactly why I run a partial correlation in this section, to control for payment terms and see what is left.
Code
corr_matrix <- cor(corr_display, method = "pearson", use = "complete.obs")

# corrplot: the standard visualisation used in Adi (2026) for correlation matrices
corrplot(
  corr_matrix,
  method = "circle",
  type = "upper",
  addCoef.col = "black",
  number.cex = 0.8,
  tl.cex = 0.85,
  title = "Pearson Correlation Matrix — Pastures Hub Invoice Data",
  mar = c(0, 0, 2, 0)
)
Pearson Correlation Matrix — linear relationships between numeric variables
Code
# ggcorrplot version for HTML rendering compatibility
ggcorrplot(
  corr_matrix,
  method = "circle",
  type = "lower",
  lab = TRUE,
  lab_size = 3.5,
  colors = c("#d7191c", "white", "#2c7bb6"),
  title = "Pearson Correlation Matrix — Pastures Hub Invoice Data"
) +
  theme(plot.title = element_text(size = 13, face = "bold"))
Pearson Correlation Matrix (ggcorrplot) — alternative view
Code
# Hmisc::rcorr() provides correlation coefficients AND p-values simultaneously
# This tests whether each correlation is statistically significant (Adi, 2026)
rcorr_result <- rcorr(as.matrix(corr_display), type = "pearson")

# Correlation coefficients
cat("=== Pearson Correlation Coefficients ===\n")
=== Pearson Correlation Coefficients ===
Code
print(round(rcorr_result$r, 3))
| | Days to Payment | Log Invoice Value | Payment Terms (days) | Days Late | Location (encoded) |
|---|---|---|---|---|---|
| Days to Payment | 1.000 | 0.015 | 0.337 | 0.979 | -0.132 |
| Log Invoice Value | 0.015 | 1.000 | 0.078 | -0.003 | 0.058 |
| Payment Terms (days) | 0.337 | 0.078 | 1.000 | 0.200 | 0.095 |
| Days Late | 0.979 | -0.003 | 0.200 | 1.000 | -0.097 |
| Location (encoded) | -0.132 | 0.058 | 0.095 | -0.097 | 1.000 |
Code
# P-values for each correlation
cat("\n=== P-values (H0: correlation = 0) ===\n")
=== P-values (H0: correlation = 0) ===
Code
print(round(rcorr_result$P, 4))
| | Days to Payment | Log Invoice Value | Payment Terms (days) | Days Late | Location (encoded) |
|---|---|---|---|---|---|
| Days to Payment | NA | 0.4300 | 0.0000 | 0.0000 | 0.0000 |
| Log Invoice Value | 0.4300 | NA | 0.0000 | 0.8707 | 0.0026 |
| Payment Terms (days) | 0.0000 | 0.0000 | NA | 0.0000 | 0.0000 |
| Days Late | 0.0000 | 0.8707 | 0.0000 | NA | 0.0000 |
| Location (encoded) | 0.0000 | 0.0026 | 0.0000 | 0.0000 | NA |
Code
# Spearman (rank-based, robust to non-normality and outliers)
spear_matrix <- cor(corr_display, method = "spearman", use = "complete.obs")

# Key correlations table
key_corrs <- tibble(
  `Variable Pair` = c(
    "Days to Payment ~ Log Invoice Value",
    "Days to Payment ~ Payment Terms",
    "Days to Payment ~ Days Late",
    "Log Invoice Value ~ Payment Terms"
  ),
  Pearson = c(
    corr_matrix["Days to Payment", "Log Invoice Value"],
    corr_matrix["Days to Payment", "Payment Terms (days)"],
    corr_matrix["Days to Payment", "Days Late"],
    corr_matrix["Log Invoice Value", "Payment Terms (days)"]
  ),
  Spearman = c(
    spear_matrix["Days to Payment", "Log Invoice Value"],
    spear_matrix["Days to Payment", "Payment Terms (days)"],
    spear_matrix["Days to Payment", "Days Late"],
    spear_matrix["Log Invoice Value", "Payment Terms (days)"]
  )
) |>
  mutate(across(c(Pearson, Spearman), ~ round(.x, 3)))

kable(key_corrs, caption = "Key Correlations: Pearson vs Spearman")
Key Correlations: Pearson vs Spearman
| Variable Pair | Pearson | Spearman |
|---|---|---|
| Days to Payment ~ Log Invoice Value | 0.015 | 0.013 |
| Days to Payment ~ Payment Terms | 0.337 | 0.422 |
| Days to Payment ~ Days Late | 0.979 | 0.957 |
| Log Invoice Value ~ Payment Terms | 0.078 | 0.086 |
Code
# Partial correlation: controlling for payment_terms_n to isolate the
# direct relationship between days_to_payment and log_total (Adi, 2026)
# Method: regress out the confounder, then correlate residuals

# Zero-order (unadjusted) correlations
r_dtp_total <- corr_matrix["Days to Payment", "Log Invoice Value"]
r_dtp_terms <- corr_matrix["Days to Payment", "Payment Terms (days)"]
r_total_terms <- corr_matrix["Log Invoice Value", "Payment Terms (days)"]

# Partial correlation formula: r_xy.z = (r_xy - r_xz*r_yz) / sqrt((1-r_xz²)(1-r_yz²))
partial_r <- (r_dtp_total - r_dtp_terms * r_total_terms) /
  sqrt((1 - r_dtp_terms^2) * (1 - r_total_terms^2))

# Verify via residuals method (Adi, 2026: "regress out the confounder")
dtp_resid <- residuals(lm(`Days to Payment` ~ `Payment Terms (days)`, data = corr_display))
total_resid <- residuals(lm(`Log Invoice Value` ~ `Payment Terms (days)`, data = corr_display))
partial_r_verify <- cor(dtp_resid, total_resid)

# First column is named with a single space so its header prints blank
tibble(
  ` ` = c(
    "Zero-order r (Days to Payment ~ Log Invoice Value)",
    "Partial r (controlling for Payment Terms)",
    "Verified via residuals method"
  ),
  Value = round(c(r_dtp_total, partial_r, partial_r_verify), 3),
  Interpretation = c(
    "Weak positive — invoice size barely predicts payment speed",
    "After removing effect of credit terms: relationship nearly zero",
    "Matches formula — confirms robustness of partial correlation"
  )
) |>
  kable(caption = "Partial Correlation: Days-to-Payment ~ Log Invoice Value, controlling for Payment Terms")
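Partial Correlation: Days-to-Payment ~ Log Invoice Value, controlling for Payment Terms
| | Value | Interpretation |
|---|---|---|
| Zero-order r (Days to Payment ~ Log Invoice Value) | 0.015 | Weak positive — invoice size barely predicts payment speed |
| Partial r (controlling for Payment Terms) | -0.012 | After removing effect of credit terms: relationship nearly zero |
| Verified via residuals method | -0.012 | Matches formula — confirms robustness of partial correlation |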
Partial Correlation Insight: The zero-order correlation between days-to-payment and invoice size is already weak (r = 0.015). Once we control for payment terms — which is correlated with both invoice size and payment speed — the partial correlation drops to near zero (r = -0.012). This confirms Adi’s (2026) caution about confounding: what little relationship existed between invoice size and payment speed was being driven by the terms I extended, not by the size itself. The management implication holds: focus on terms and client behavior, not invoice size.
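The partial correlation can be checked by hand from the three Pearson coefficients reported in the matrix, writing x for days-to-payment, y for log invoice value, and z for payment terms:

```latex
\[
r_{xy \cdot z}
  = \frac{r_{xy} - r_{xz}\, r_{yz}}{\sqrt{(1 - r_{xz}^{2})(1 - r_{yz}^{2})}}
  = \frac{0.015 - (0.337)(0.078)}{\sqrt{(1 - 0.337^{2})(1 - 0.078^{2})}}
  \approx \frac{-0.0113}{0.9386}
  \approx -0.012
\]
```

Because the inputs are rounded to three decimals, the hand calculation is approximate, but it lands on the same value as the code.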
Correlation Interpretation:
Three relationships stand out from the matrix, each with a distinct business implication:
1. Days-to-Payment ~ Days Late (Pearson r = 0.979): This near-perfect correlation is close to definitional, because days late is derived from the same payment date as days-to-payment. It is not operationally surprising, but it confirms that our overdue problem and our slow-payment problem are the same problem. There is no hidden category of invoices that are slow but technically on time.
2. Days-to-Payment ~ Payment Terms (Pearson r = 0.337, Spearman ρ = 0.422): This is the most actionable finding in the entire correlation analysis. Invoices issued on longer terms take longer to be paid, and apart from Days Late no other variable in the matrix moves as closely with payment time. The association is moderate rather than deterministic: an r of 0.337 corresponds to roughly 11% of the variation in days-to-payment, and only 34% of invoices with a recorded payment were settled on or before their due date (Section 4), so many clients overrun whatever terms they are given. The terms I set still act as an anchor for when clients start thinking about paying. The Spearman coefficient is higher than Pearson here (0.422 vs 0.337), suggesting the relationship is monotonic but not perfectly linear. The implication is direct: tightening terms is the fastest lever I have to accelerate cash collection, at zero additional cost.
3. Days-to-Payment ~ Log Invoice Value (Pearson r = 0.015): Surprisingly weak. Large invoices do not systematically take longer to collect once payment terms are accounted for. This tells me that client behavior — not invoice size — is the primary driver of payment speed. A hotel that pays promptly will pay a ₦2 million invoice in the same number of days as a ₦200,000 one. This finding shifts my focus from invoice-level controls to client-level credit management.
8. Regression Analysis
Theoretical Background
Correlation told me which variables are associated with payment speed. Regression is where I turn that into something I can actually use. The model in this section predicts days-to-payment from three inputs: the credit terms I set, the branch location, and the invoice size.
What makes regression more powerful than correlation is that it isolates each variable’s effect while holding the others constant (Adi, 2026). So when the model tells me that PHC clients pay faster than Lagos clients, it is not because PHC clients happen to get shorter terms. The model has already accounted for that. Each coefficient is a direct answer to a management question.
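To make "holding the others constant" concrete, here is a minimal sketch of a model with the three inputs named above. Everything numeric is simulated: the effect sizes, the two-branch setup, and the example invoice are invented for illustration and are not the Pastures Hub estimates. The names `payment_terms_n` and `log_total` follow the comments in the correlation code.

```r
set.seed(123)
n <- 600
# Simulated invoices; the effect sizes below are invented for illustration
sim <- data.frame(
  payment_terms_n = sample(c(15, 21, 30), n, replace = TRUE),
  location        = factor(sample(c("Lagos", "PHC"), n, replace = TRUE),
                           levels = c("Lagos", "PHC")),
  log_total       = rnorm(n, mean = 12, sd = 1)
)
sim$days_to_payment <- 10 + 1.5 * sim$payment_terms_n -
  12 * (sim$location == "PHC") + rnorm(n, sd = 15)

# Each coefficient is the expected change in days-to-payment for a one-unit
# change in that predictor, with the other predictors held fixed
fit <- lm(days_to_payment ~ payment_terms_n + location + log_total, data = sim)
round(coef(fit), 2)

# Predicted days-to-payment for a hypothetical Net 30 Lagos invoice of NGN 250,000
new_invoice <- data.frame(
  payment_terms_n = 30,
  location        = factor("Lagos", levels = c("Lagos", "PHC")),
  log_total       = log(250000)
)
predict(fit, newdata = new_invoice)
```

In this setup the `locationPHC` coefficient is the PHC-versus-Lagos gap for invoices with the same terms and the same size, which is the comparison the paragraph above describes.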
Like any model, this one has assumptions that need checking, which is why diagnostic plots follow the results. Violations do not automatically invalidate a model, but they must be acknowledged transparently (Adi, 2026).
par(mfrow = c(2, 2), mar = c(4, 4, 2, 1))
plot(model)
Regression Diagnostic Plots — checking model assumptions
Code
par(mfrow =c(1, 1))
Code
# Shapiro-Wilk test on standardised residuals: formal normality check (Adi, 2026)
# Sample capped at 5000 as Shapiro-Wilk requires n ≤ 5000
set.seed(42)
std_resids <- rstandard(model)
shap_result <- shapiro.test(sample(std_resids, min(5000, length(std_resids))))
cat(sprintf("Shapiro-Wilk normality test: W = %.4f, p-value = %.4f\n",
            shap_result$statistic, shap_result$p.value))
Shapiro-Wilk normality test: W = 0.7722, p-value = 0.0000
Code
cat(ifelse(shap_result$p.value >=0.05,"✓ Residuals appear normally distributed (p ≥ 0.05)","⚠ Residuals deviate from normality — results robust given large n (CLT)"))
⚠ Residuals deviate from normality — results robust given large n (CLT)
Residuals vs Fitted (top-left): The residuals are approximately centred around zero across the fitted value range, with no strong curvature. This supports the linearity assumption — the model is not systematically over- or under-predicting at any part of the range. Some spread widens at higher fitted values, indicating mild heteroscedasticity, which is expected with payment data.
Normal Q-Q (top-right): The standardised residuals follow the theoretical quantile line well through the middle of the distribution. There are heavier tails at both ends, which is typical of payment duration data: right-skewed and bounded at zero. The Shapiro-Wilk result above (W = 0.7722, p < 0.001) confirms the departure from normality. With n > 1,000, the Central Limit Theorem means the coefficient estimates are still approximately normally distributed, so the t-tests remain usable; it does not, however, correct the standard errors for the unequal variance discussed in the next panel.
Scale-Location (bottom-left): The square root of absolute standardised residuals should be roughly flat if homoscedasticity holds. The slight upward slope suggests variance increases with fitted values — a known feature of time-based outcome variables. This does not invalidate the model but suggests that a log-transformed outcome could be explored in future work.
Residuals vs Leverage (bottom-right): No observations fall outside the Cook’s distance dashed lines, indicating no single invoice is exerting undue influence on the coefficient estimates. The model is stable across the data range.
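The visual impression of heteroscedasticity can be backed by a formal test. The sketch below is a base-R version of the studentized Breusch-Pagan (Koenker) test, run on simulated data whose noise deliberately grows with the predictor. The data and the names `x`, `y`, and `bp_fit` are assumptions for illustration; on the real model, the auxiliary regression would use the same predictors as the main regression.

```r
set.seed(2)
n <- 1000
x <- runif(n, 1, 10)
y <- 2 * x + rnorm(n, sd = x)   # noise standard deviation grows with x by construction
bp_fit <- lm(y ~ x)

# Auxiliary regression: squared residuals on the predictors.
# Under constant variance, n * R^2 follows a chi-square distribution
# with degrees of freedom equal to the number of predictors (here 1).
aux     <- lm(resid(bp_fit)^2 ~ x)
lm_stat <- n * summary(aux)$r.squared
p_value <- pchisq(lm_stat, df = 1, lower.tail = FALSE)

cat(sprintf("Breusch-Pagan statistic = %.2f, p-value = %.4g\n", lm_stat, p_value))
```

A small p-value is evidence against constant variance. In that case the coefficients stay unbiased, but the reported standard errors should be treated with caution.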
The model (F-statistic significant, p < 0.001) confirms that branch location and payment terms are the dominant predictors of how long Pastures Hub waits to be paid. Each coefficient translates directly into a management decision:
Payment Terms (β = 2.103, p < 0.001): For every additional day of credit extended, clients take approximately 2.1 extra days to pay. This is the most important number in the model. It means the difference between Net 21 and Net 30 is not 9 days — it is 9 × 2.1 = approximately 19 additional days in actual collection time. If Pastures Hub moved its entire Lagos book from Net 30 to Net 21, we should expect to collect roughly 19 days faster. On ₦155 million outstanding, that is a substantial working capital improvement.
PHC vs Lagos (β = −23.9, p < 0.001): Controlling for invoice size and payment terms, PHC clients pay nearly 24 days faster than Lagos clients. This is not simply because PHC clients get shorter terms — the regression controls for that. There is something structural about PHC client behavior, the relationship dynamics there, or the branch’s collections discipline that Lagos has not yet replicated. This deserves a qualitative investigation: what is the PHC team doing differently?
Ibadan vs Lagos (β = +35.1, p < 0.001): Ibadan clients pay 35 days slower than Lagos, controlling for other factors. With only 49 invoices, this may reflect early-stage relationships where credit discipline has not yet been established. It is a warning sign for a branch I am trying to grow.
Log Invoice Value (β = 0.101, p = 0.871): Not significant. Once I control for payment terms and branch, invoice size has no meaningful additional effect on payment speed. This is consistent with the correlation finding: client behavior matters more than invoice size.
R² = 0.184: The model explains 18.4% of the variance in days-to-payment. In behavioral business data this is respectable for a parsimonious model. The remaining 82% is unexplained by the model and plausibly reflects factors not captured in Zoho: client finance team efficiency, invoice disputes, relationship history, and seasonal cash flow pressures. A richer dataset would likely improve this.
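The Net 30 to Net 21 arithmetic from the payment terms coefficient can be written out explicitly. This is a back-of-envelope sketch: it takes the reported β = 2.103 at face value and assumes the relationship is linear and would hold if terms were actually changed, which an observational model cannot guarantee. The variable names are illustrative.

```r
beta_terms     <- 2.103   # payment terms coefficient reported above
current_terms  <- 30      # Net 30
proposed_terms <- 21      # Net 21

# Predicted change in days-to-payment = change in terms x coefficient
expected_change <- (proposed_terms - current_terms) * beta_terms
cat(sprintf("Expected change in days-to-payment: %.1f days\n", expected_change))
```

The negative sign indicates faster collection, in line with the roughly 19-day figure quoted for the Lagos book.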
Taken together, the diagnostics indicate that the model assumptions are approximately met: residuals are roughly centred around zero (linearity holds), the Q-Q plot shows reasonable normality in the middle of the distribution with heavy tails (expected with payment data), the spread of residuals widens only mildly at higher fitted values, and VIF scores are all well below 5 (no multicollinearity concern).
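For readers who want to see where a VIF comes from, here is a minimal base-R sketch on simulated predictors. Each VIF is 1 / (1 − R²), where R² comes from regressing one predictor column on all the others. The data and column names are assumptions. This per-column version treats each branch dummy separately, whereas packages such as car report a generalised VIF for a factor as a whole, so the numbers are not directly comparable with that output.

```r
set.seed(3)
n <- 1000
sim <- data.frame(
  payment_terms = sample(c(7, 15, 21, 30), n, replace = TRUE),
  branch        = factor(sample(c("Lagos", "PHC", "Abuja", "Ibadan"), n, replace = TRUE)),
  log_value     = rnorm(n, mean = 12, sd = 1)
)

# Design matrix without the intercept column; branch expands into dummy columns
X <- model.matrix(~ payment_terms + branch + log_value, data = sim)[, -1]

# VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing column j on the remaining columns
vif <- sapply(seq_len(ncol(X)), function(j) {
  r2 <- summary(lm(X[, j] ~ X[, -j]))$r.squared
  1 / (1 - r2)
})
names(vif) <- colnames(X)
round(vif, 2)
```

Values near 1 mean a predictor shares almost no variance with the others; the conventional warning threshold used in this report is 5.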
9. Integrated Findings
The five techniques converge on a single, coherent diagnosis: Pastures Hub has a payment terms problem, a Lagos problem, and an early-warning system problem — and all three are solvable.
EDA established the scale: ₦225 million outstanding on ₦1.24 billion invoiced — an 18% receivables rate that would concern any investor or lender reviewing our books. The portfolio is right-skewed, meaning a small number of large clients drive disproportionate exposure. This alone justifies moving from a uniform credit policy to a tiered one.
Visualization made the Lagos problem visible in a way no spreadsheet had done before. The outstanding receivables stack in Chart 3 is the chart I will be showing at our next management review. Lagos carries ₦155 million of the ₦225 million total. The density plot in Chart 4 shows that PHC payments cluster tightly — predictable, manageable — while Lagos payments are scattered across 20 to 120+ days. That unpredictability is as damaging as the slowness.
Hypothesis testing confirmed, formally, that the Lagos–PHC gap is real (p < 0.001) and that invoice size mix differs across branches (p < 0.001). These are not impressions. They are evidence. This matters when presenting to a bank, a co-founder, or a board — “the data shows” is more powerful than “I believe.”
Correlation analysis identified payment terms as the strongest controllable variable (r = 0.337, ρ = 0.422). Crucially, invoice size barely correlates with payment speed (r = 0.015), and the regression confirms this once terms and branch are controlled for. The problem is not the size of the invoices but the terms we are offering.
Regression quantified the lever precisely: every additional day of credit extended adds 2.1 days to collection time. PHC collects 24 days faster than Lagos even after controlling for terms and invoice size. Ibadan is a warning flag.
Action Plan
The five techniques converge on seven concrete actions, each grounded in a specific statistical finding that can be cited and defended.
| Priority | Action | Statistical Basis | Quantified Expected Impact |
| --- | --- | --- | --- |
| 🔴 Immediate | Move default credit terms from Net 30 to Net 21 for all clients | Regression: β(payment_terms) = 2.103, p < 0.001 | ~19 days faster collection (9 days × 2.1); ~₦155M Lagos book freed ~3 weeks earlier |
| 🔴 Immediate | Introduce Day-14 reminder protocol for all Lagos clients | Hypothesis test: Lagos 16.5 days slower than PHC (p < 0.001, Cohen’s d = 0.363); 62% of Lagos clients breaching contracted terms | Intercepts invoices before they cross the overdue threshold |
| 🔴 Immediate | Enforce late-payment clause for any client > contracted terms | CLIENT_TERMS: 42% overall breach rate; Lagos clients average 18-day overshoot | Converts a policy problem into a contractual enforcement mechanism |
| 🟡 Short-term | Introduce Net 15 for all new clients (< 6 months tenure) | Correlation: payment terms is the strongest controllable variable (ρ = 0.422); new clients lack established payment culture | Anchors payment behavior from onboarding; avoids inheriting bad habits |
| 🟡 Short-term | Flag all Abuja invoices > ₦500k for weekly MD review | EDA: ₦11M outstanding on just 52 invoices, the highest outstanding-per-invoice ratio of any branch | Reduces concentration risk; one delayed Abuja invoice = ~₦212k average exposure |
| 🟠 Medium-term | Conduct a qualitative review of PHC collections process | Regression: PHC clients pay 23.9 days faster than Lagos after controlling for terms and invoice size (p < 0.001) | |
| | Monitor Ibadan branch closely; restrict to Net 15 only | Regression: Ibadan clients pay 35.1 days slower than Lagos (p < 0.001) on only 49 invoices | Early intervention before the branch scales with a slow-payment culture embedded |
What This Means in Practice: The Next 30 Days
The analysis gives Pastures Hub a very specific 30-day agenda — not a vague strategy but a set of actions with measurable outcomes:
Week 1: Revise the standard client contract template to Net 21. For the handful of anchor clients currently on Net 30 (the large Lagos hotels that account for the bulk of the outstanding balance), initiate a conversation: the data shows we are effectively giving them 40–50 days anyway, so formalising Net 21 simply closes the gap between what the contract says and what we enforce.
Week 2: Set up a WhatsApp/email reminder sequence triggered at Day 14 for every open Lagos invoice. This costs nothing and targets the exact window the data identifies — Lagos clients start drifting past their terms between Day 14 and Day 21.
Week 3: Pull the Abuja invoice list and schedule a call for any invoice over ₦500k that is more than 10 days from due. With 52 invoices and ₦11M outstanding, this is a 30-minute weekly task.
Week 4: Brief the PHC branch manager and the Lagos branch manager together, side by side with the regression output. The data is the conversation — PHC collects 24 days faster, not because of terms (those are controlled for), but because of something in the client relationship or follow-up discipline. Make it explicit and make it a cross-branch learning exercise.
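The Week 2 reminder trigger is simple enough to sketch in a few lines. The example below flags open Lagos invoices that have reached Day 14. The data frame, its column names (`invoice_id`, `branch`, `invoice_date`, `status`), and the fixed `as_of` date are all hypothetical; a real version would read the open-invoice list from the accounting export and use the current date.

```r
# Hypothetical open-invoice extract; column names are assumptions, not the export schema
open_invoices <- data.frame(
  invoice_id   = c("INV-001", "INV-002", "INV-003", "INV-004"),
  branch       = c("Lagos", "Lagos", "PHC", "Lagos"),
  invoice_date = as.Date(c("2026-05-01", "2026-05-10", "2026-05-01", "2026-04-20")),
  status       = c("open", "open", "open", "paid")
)

as_of <- as.Date("2026-05-15")   # fixed date so the example is reproducible

# Invoice age in days as of the reference date
open_invoices$age_days <- as.numeric(as_of - open_invoices$invoice_date)

# Reminder list: Lagos invoices that are still open and at least 14 days old
reminders <- subset(open_invoices,
                    branch == "Lagos" & status == "open" & age_days >= 14)
reminders[, c("invoice_id", "age_days")]
```

Only INV-001 qualifies here: INV-002 is too recent, INV-003 belongs to PHC, and INV-004 is already paid.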
Supplementary Analysis: Agreed vs. Actual Payment Terms (Client-Level Evidence)
The Zoho invoice data tells us when clients paid (Imemba, 2026a). The CLIENT_TERMS sheet from our internal Pastures Hub masterfile tells us what they agreed to — the credit terms signed at onboarding (Imemba, 2026b). Comparing the two surfaces a finding that goes beyond slow payment: systematic breach of contracted terms.
Code
# Load client-level agreed vs actual payment terms from internal masterfile
client_terms_raw <- read_csv("data/client_terms.csv", show_col_types = FALSE)

# Clean: remove clients with 0 agreed days (cash/COD clients, no credit agreement)
# Deduplicate: normalise to uppercase, collapse duplicates by taking the row with
# the largest agreed_days (and within that, largest actual_days) per name.
# This handles cases like "Genesis Restaurant Lagos" / "GENESIS RESTAURANT" and
# "Victoria Crown Plaza" / "VICTORIA CROWN PLAZA HOTEL" appearing twice.
client_terms <- client_terms_raw |>
  filter(agreed_days > 0) |>
  mutate(name_clean = str_to_upper(str_squish(client_name))) |>
  group_by(name_clean) |>
  slice_max(order_by = agreed_days * 1000 + actual_days, n = 1, with_ties = FALSE) |>
  ungroup() |>
  mutate(
    breach = actual_days > agreed_days,
    overshoot_days = actual_days - agreed_days
  ) |>
  # Anonymise client names for public publication (same approach as invoice analysis)
  arrange(region, client_name) |>
  mutate(client_code = sprintf("Client_%03d", row_number()))

# Summary by region
terms_summary <- client_terms |>
  group_by(region) |>
  summarise(
    Clients = n(),
    `Breaching Contract (%)` = round(100 * mean(breach), 0),
    `Avg Overshoot (days)` = round(mean(overshoot_days[breach], na.rm = TRUE), 1),
    `Max Overshoot (days)` = max(overshoot_days[breach], na.rm = TRUE),
    .groups = "drop"
  )

kable(terms_summary,
      caption = "Contracted vs. Actual Payment Terms: Breach Rate by Region",
      align = "lcccc")
Contracted vs. Actual Payment Terms: Breach Rate by Region
| region | Clients | Breaching Contract (%) | Avg Overshoot (days) | Max Overshoot (days) |
| --- | --- | --- | --- | --- |
| Ibadan | 3 | 33 | 23.0 | 23 |
| Lagos | 21 | 62 | 18.5 | 30 |
| PHC | 19 | 21 | 28.2 | 38 |
Code
# Show individual clients where actual > agreed (anonymized for public publication)
breach_table <- client_terms |>
  filter(breach) |>
  arrange(region, desc(overshoot_days)) |>
  select(
    `Client` = client_code,
    `Region` = region,
    `Agreed (days)` = agreed_days,
    `Actual (days)` = actual_days,
    `Overshoot (days)` = overshoot_days
  )

kable(breach_table,
      caption = "Clients Paying Later Than Their Contracted Terms",
      align = "lcccc")
Clients Paying Later Than Their Contracted Terms
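| Client | Region | Agreed (days) | Actual (days) | Overshoot (days) |
| --- | --- | --- | --- | --- |
| Client_003 | Ibadan | 7 | 30 | 23 |
| Client_013 | Lagos | 30 | 60 | 30 |
| Client_014 | Lagos | 30 | 60 | 30 |
| Client_018 | Lagos | 30 | 60 | 30 |
| Client_008 | Lagos | 30 | 45 | 15 |
| Client_009 | Lagos | 30 | 45 | 15 |
| Client_012 | Lagos | 30 | 45 | 15 |
| Client_016 | Lagos | 30 | 45 | 15 |
| Client_019 | Lagos | 30 | 45 | 15 |
| Client_020 | Lagos | 30 | 45 | 15 |
| Client_021 | Lagos | 30 | 45 | 15 |
| Client_022 | Lagos | 30 | 45 | 15 |
| Client_023 | Lagos | 30 | 45 | 15 |
| Client_024 | Lagos | 30 | 45 | 15 |
| Client_037 | PHC | 7 | 45 | 38 |
| Client_035 | PHC | 30 | 60 | 30 |
| Client_036 | PHC | 30 | 60 | 30 |
| Client_043 | PHC | 45 | 60 | 15 |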
Code
# Dot-plot restricted to clients who breach their agreed terms (anonymized)
# Shows how far each client overshoots their contracted credit period
plot_data <- client_terms |>
  filter(breach) |>
  arrange(region, desc(overshoot_days)) |>
  mutate(client_code = fct_inorder(client_code))

ggplot(plot_data) +
  geom_segment(aes(x = agreed_days, xend = actual_days,
                   y = client_code, yend = client_code,
                   color = region),
               arrow = arrow(length = unit(0.15, "cm"), type = "closed"),
               linewidth = 0.7, alpha = 0.7) +
  geom_point(aes(x = agreed_days, y = client_code),
             shape = 1, size = 2.5, color = "grey40") +
  geom_point(aes(x = actual_days, y = client_code, color = region), size = 2.5) +
  geom_vline(xintercept = 0, linetype = "dashed", color = "grey70") +
  scale_color_manual(values = c("Lagos" = "#E63946", "PHC" = "#457B9D", "Ibadan" = "#2A9D8F")) +
  labs(
    title = "Agreed vs. Actual Payment Days per Client (Anonymized)",
    subtitle = "Arrow: agreed terms → actual behavior. Rightward arrow = breach of contract.",
    x = "Days", y = "Client Code", color = "Region"
  ) +
  theme_minimal(base_size = 11) +
  theme(legend.position = "top")
Contracted vs. Actual Payment Days — Breaching Clients Only
What this means for Pastures Hub: Of the 43 credit clients in our masterfile, 18 are paying later than their contracted terms — a 42% contract breach rate. The average overshoot among breaching clients is 20.9 days. This is not a collections problem alone: it is a contract enforcement problem. Clients know their agreed terms; they are choosing not to honor them. This finding — that clients systematically violate written credit agreements — strengthens the case for the actions proposed above. Renegotiating to Net 21 closes the contractual gap for the majority of Lagos clients, but enforcement mechanisms (late-payment clauses, account suspension thresholds) are equally important.
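The headline figures in this paragraph can be re-derived from the two tables above. The sketch below transcribes the overshoot values of the 18 breaching clients and the per-region client counts, then recomputes the breach rate and average overshoot. The object names are illustrative; the numbers come straight from the published tables.

```r
# Overshoot days for the 18 breaching clients, transcribed from the table above
breaches <- data.frame(
  region    = c("Ibadan", rep("Lagos", 13), rep("PHC", 4)),
  overshoot = c(23, rep(30, 3), rep(15, 10), 38, 30, 30, 15)
)

# Credit clients per region, from the breach-rate summary table
credit_clients <- c(Ibadan = 3, Lagos = 21, PHC = 19)

breach_rate   <- 100 * nrow(breaches) / sum(credit_clients)   # share of clients in breach
avg_overshoot <- mean(breaches$overshoot)                      # mean over breaching clients
by_region     <- tapply(breaches$overshoot, breaches$region, mean)

cat(sprintf("Breach rate: %.0f%% | Average overshoot: %.1f days\n",
            breach_rate, avg_overshoot))
round(by_region, 1)
```

This reproduces the 42% breach rate and the 20.9-day average overshoot, and the regional means match the summary table.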
10. Limitations & Further Work
Several limitations should be noted. First, the dataset covers only November 2024 to April 2026 (18 months). A longer time series (ideally 3+ years) would allow seasonal decomposition and more stable regression estimates. Second, important predictor variables are absent from the Zoho export: product category, salesperson assigned, number of line items, and client industry segment. Including these could substantially improve the regression’s explanatory power. Third, the regression model assumes linearity, constant variance, and normally distributed residuals. The Shapiro-Wilk test (W = 0.7722) and the Scale-Location plot show that the last two are only partially met, which is unsurprising for a right-skewed outcome such as days-to-payment. A log-transformed outcome or quantile regression could address this in further work. Finally, the Abuja and Ibadan branches have very few invoices relative to Lagos and PHC, which limits statistical power for branch-specific inferences; as these branches grow, branch-level modeling will become more reliable.
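As a pointer for that further work, the sketch below shows why a log-transformed outcome is worth trying. It simulates a right-skewed days-to-payment variable, fits the model on the raw and log scales, and compares the Shapiro-Wilk statistic of the residuals. The data are simulated and the variable names are assumptions; whether the real invoice data behave this way is untested.

```r
set.seed(4)
n <- 500
payment_terms <- sample(c(7, 15, 21, 30), n, replace = TRUE)

# Simulated right-skewed outcome: noise is multiplicative, so log(days) is linear in terms
days_to_payment <- exp(2 + 0.05 * payment_terms + rnorm(n, sd = 0.5))

fit_raw <- lm(days_to_payment ~ payment_terms)
fit_log <- lm(log(days_to_payment) ~ payment_terms)

# Shapiro-Wilk W closer to 1 means residuals are closer to normal
w_raw <- shapiro.test(rstandard(fit_raw))$statistic
w_log <- shapiro.test(rstandard(fit_log))$statistic
cat(sprintf("Shapiro-Wilk W: raw = %.3f, log = %.3f\n", w_raw, w_log))
```

Two caveats apply. A log outcome requires strictly positive values, so same-day payments would need handling (for example `log1p()`). Coefficients from the log model are also read as approximate percentage changes rather than days.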
References
Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making — from data fundamentals to machine learning in Python and R. Lagos Business School / markanalytics.online. https://markanalytics.online
Allaire, J. J., Teague, C., Scheidegger, C., Xie, Y., & Dervieux, C. (2022). Quarto (Version 1.x) [Computer software]. https://doi.org/10.5281/zenodo.5960048
Imemba, C. (2026a). Pastures Hub invoice records, November 2024 – April 2026 [Dataset]. Collected from Zoho Books accounting system, Pastures Hub, Lagos, Nigeria. Data available on request from the author.
Imemba, C. (2026b). Pastures Hub client payment terms masterfile [Internal business record]. Pastures Hub, Lagos, Nigeria. Data available on request from the author.
R Core Team. (2024). R: A language and environment for statistical computing (Version 4.5). R Foundation for Statistical Computing. https://www.R-project.org/
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer. https://doi.org/10.1007/978-3-319-24277-4
Appendix: AI Usage Statement
Claude (Anthropic) was used as a coding assistant throughout this project. Specifically, Claude helped write and debug R code for data loading, cleaning, visualization, and statistical tests, and suggested the structure of the Quarto document. All analytical decisions — including the choice of Case Study 1, the selection of days-to-payment as the primary outcome variable, the formulation of both hypotheses, the regression specification, and the business interpretation of every result — were made independently by the author. The author reviewed every line of code, verified that outputs matched expectations, and can explain and defend all results. The AI usage statement is disclosed in accordance with the academic integrity guidelines set out in the assessment brief (Adi, 2026, Section 4.4).