Receivables Analytics at Pastures Hub: An Exploratory & Inferential Study of B2B Payment Behavior Across Four Nigerian Cities

Author

Chioma Imemba

Published

May 13, 2026

1. Executive Summary

Every week, I look at our receivables dashboard and ask the same question: why are some clients sitting on unpaid invoices for 60, 70, 80 days when our terms say 30? I am the founder and CEO of Pastures Hub, a Nigerian B2B food distribution company serving the HoReCa sector (hotels, restaurants, and catering) from four hubs: Lagos, Port Harcourt (PHC), Abuja, and Ibadan. For us, cash flow is not an abstract finance concept. It is the difference between meeting our financial obligations and failing to meet them.

I applied five techniques to 3,375 invoices issued between November 2024 and April 2026 to 68 unique clients. The approach was to understand the data first, then test, then model:

Exploratory Data Analysis (EDA)
Visualization
Hypothesis Testing
Correlation
Regression

The results were clearer than I expected. PHC clients pay 16 days faster than Lagos clients on average, and that gap is statistically significant. More surprisingly, the terms I set are almost self-fulfilling: every extra day of credit I extend adds about two more days to how long clients actually take to pay. That matters.

Recommendation: Based on the evidence, Pastures Hub should move immediately to a tiered credit structure: Net 15 for new clients and small invoices, Net 21 as our standard, and Net 30 reserved only for clients with a proven track record. Alongside this, Lagos needs a dedicated collections protocol — a firm reminder call or message at Day 14, before invoices go overdue. Together, these two changes address the root cause the data identified: we are giving clients more time than we need to, and Lagos clients in particular are taking full advantage of it. What makes this more urgent is that our internal CLIENT_TERMS records show 42% of credit clients are already paying beyond their contracted terms, meaning the problem is not just about what terms we set but how well we enforce them.

2. Professional Disclosure

Job Title: Founder & CEO, Pastures Hub Nigeria Ltd.

Organization: Pastures Hub, a B2B food distribution company operating from four major hubs - Lagos, Port Harcourt, Abuja, and Ibadan.

Relationship to data: Invoice data are from our Zoho Books accounting system; client payment terms are from an internal Pastures Hub masterfile. I am the data controller for both and have authorized their use for this submission. All client names have been anonymized.

Case Study: CS1 - Exploratory & Inferential Analytics (EDA, Visualization, Hypothesis Testing, Correlation, Regression).

Why These Five Techniques Are Relevant to My Work

EDA is the starting point for every credit and operations decision I make. Before I can manage our receivables, I need to understand the shape of our invoice portfolio — how much is outstanding, where it sits, which clients and regions dominate exposure, and where the data has gaps or anomalies. EDA gives me that picture systematically rather than relying on gut feel. It’s like switching on the lights in a dark room.

Data Visualization is how I communicate financial health to my team and to potential investors. A well-constructed dashboard or chart showing the receivables stack by branch or the trend in overdue invoices over time conveys in seconds what a table of numbers cannot. A picture speaks a thousand words.

Hypothesis Testing addresses questions I face in every monthly review: Do Lagos clients actually pay slower than PHC clients, or does it just feel that way? Is there a real difference in invoice size across branches, or is it random variation? Hypothesis testing lets me check those hunches against the data rather than rely on instinct, which is critical when making staffing, credit, and expansion decisions.

Correlation Analysis helps me understand how strongly two variables move together. For Pastures Hub, the key question is what is most closely associated with payment speed: is it how much a client owes, how long they’ve been with us, or the credit terms we set? Knowing which variable matters most tells me where to focus.

Regression helps me begin with the end in mind by turning correlation into prediction and prediction into action. It’s like a formula I can actually use before I issue the invoice. Instead of finding out 60 days later that a client is late, I can anticipate it upfront and plan my cash flow accordingly.

3. Data Collection & Sampling

Source & Collection Method

The data were extracted directly from Zoho Books, Pastures Hub’s invoicing and accounts-receivable system. Zoho Books is the system of record for all sales transactions; every invoice issued to a client is created, tracked, and closed (or marked overdue) within this platform.

Exports were performed using Zoho Books’ built-in CSV export function under Sales → Invoices → Export. Because Zoho limits exports to 180-day windows, six separate exports were performed covering the period November 2024 to April 2026 and subsequently merged.
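The merge step can be sketched as follows. This is a minimal illustration rather than the script actually used: the folder, the file names, and the use of `invoice_id` as the de-duplication key are assumptions, and the two toy exports deliberately share one invoice to show why overlapping export windows need de-duplicating.

```r
# Minimal sketch: merge several CSV exports and drop invoices that appear
# in more than one export window. Folder and file names are hypothetical.
export_dir <- file.path(tempdir(), "zoho_exports")
dir.create(export_dir, showWarnings = FALSE)

# Two toy exports standing in for the real files; INV-003 appears in both
write.csv(
  data.frame(invoice_id = c("INV-001", "INV-002", "INV-003"),
             total      = c(120000, 85000, 40000)),
  file.path(export_dir, "export_01.csv"), row.names = FALSE
)
write.csv(
  data.frame(invoice_id = c("INV-003", "INV-004"),
             total      = c(40000, 230000)),
  file.path(export_dir, "export_02.csv"), row.names = FALSE
)

# Read every export in the folder and stack them into one data frame
files  <- list.files(export_dir, pattern = "^export_.*\\.csv$", full.names = TRUE)
merged <- do.call(rbind, lapply(files, read.csv, stringsAsFactors = FALSE))

# Keep the first occurrence of each invoice ID
invoices_merged <- merged[!duplicated(merged$invoice_id), ]
nrow(invoices_merged)
```

Comparing `nrow(merged)` with `nrow(invoices_merged)` is a quick way to see how many rows the overlapping windows duplicated.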

Sampling Frame

The dataset is a complete census of all invoices issued by Pastures Hub during the study period, not a sample. Every invoice created in Zoho Books within the date window is included, except those with missing dates or zero-value totals (data quality removals documented in Section 4). A supplementary dataset of contracted client payment terms was drawn from an internal Pastures Hub operations masterfile and is used in Section 9.

Variables

Variable	Type	Description
`invoice_date`	Date	Date invoice was issued
`invoice_id`	ID	Unique Zoho invoice identifier
`invoice_status`	Categorical	Closed / Overdue / Open / Draft / PartiallyPaid
`customer_name` (anonymized)	Categorical	Client identity (anonymized to Customer_001 etc.)
`location`	Categorical	Branch: Lagos / PHC / Abuja / Ibadan
`total`	Numeric	Invoice value in NGN
`balance`	Numeric	Outstanding amount in NGN
`due_date`	Date	Payment due date
`last_payment_date`	Date	Date of final payment
`payment_terms_label`	Categorical	e.g. Net 15, Net 21, Net 30
`days_to_payment`	Numeric (derived)	Last Payment Date − Invoice Date
`paid_on_time`	Binary (derived)	1 if paid on or before due date, else 0
`days_late`	Numeric (derived)	max(0, Last Payment Date − Due Date)
`size_bucket`	Categorical (derived)	Invoice size tier
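
The three derived payment fields follow directly from the definitions in the table. A minimal sketch, using three made-up invoices rather than real records (the dates are hypothetical, and this is not necessarily how the fields were built in the original pipeline):

```r
# Minimal sketch of the derived fields, using three made-up invoices
toy <- data.frame(
  invoice_date      = as.Date(c("2025-01-06", "2025-01-06", "2025-02-10")),
  due_date          = as.Date(c("2025-02-05", "2025-02-05", "2025-03-12")),
  last_payment_date = as.Date(c("2025-01-30", "2025-03-17", NA))
)

# Days from invoice to final payment (NA while the invoice is unpaid)
toy$days_to_payment <- as.numeric(toy$last_payment_date - toy$invoice_date)

# 1 if the final payment landed on or before the due date, else 0
toy$paid_on_time <- as.integer(toy$last_payment_date <= toy$due_date)

# Days past due, floored at zero so early payers do not go negative
toy$days_late <- pmax(0, as.numeric(toy$last_payment_date - toy$due_date))

toy
```

The unpaid third invoice comes out as `NA` in all three fields, which is why the same invoices show up as missing in every derived payment variable.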

Ethical Statement

All client names have been anonymized using sequential codes (Customer_001, Customer_002, …) before analysis. No personally identifiable information appears in the published document. As founder of Pastures Hub, I am the data controller for these records and have authorized their use for this academic submission. The data are internal business records and are not shared beyond this submission.
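
Sequential coding of this kind can be sketched in a few lines. The client names below are invented, and the zero-padded code format is an assumption based on the description above; the actual anonymization script is not shown in this report.

```r
# Minimal sketch: replace client names with sequential codes.
# The names below are invented for illustration.
client_names <- c("Hotel Alpha", "Bistro Beta", "Hotel Alpha", "Caterer Gamma")

# One code per distinct name, numbered in order of first appearance
lookup      <- unique(client_names)
customer_id <- sprintf("Customer_%03d", match(client_names, lookup))

customer_id
```

The `lookup` vector is the re-identification key, so it should be stored separately from the analysis data. Name variants need to be standardised before this step, otherwise a single client would receive several codes.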

4. Data Description (EDA)

Theoretical Background

Before I could ask any analytical question about Pastures Hub’s receivables, I needed to know what I was actually working with. What I found was exactly what you would expect from a business running on parallel systems.

These were not clean Zoho exports. Our supply chain team had been manually entering daily supplies into Google Sheets alongside Zoho Books, keeping both files for reconciliation. Over time that became a drag, but it meant the data carried all the marks of manual entry: client names in multiple variations (“Genesis Restaurant”, “GENESIS RESTAURANT LAGOS”, “Genesis Rst”), inconsistent date formats, and records that needed significant cleaning before any analysis could begin.

I standardised the client names manually. I reviewed every variation and collapsed them, which meant that by the time I ran any code, I knew exactly who was in this dataset and could spot anomalies immediately. EDA is the discipline of doing this systematically rather than hoping the data is clean (Adi, 2026).

Anscombe’s Quartet is the classic reminder of why you cannot skip this step. Two datasets can have identical summary statistics and look completely different when you plot them. I needed to see the shape of our invoice portfolio, not just its totals.

What I found surprised me in places. 34 invoices sat above the 99th percentile in value (the top 1% by definition); I retained them but flagged them as extreme values. And while I feared there might be payment dates recorded before invoice dates, a common sign of data entry error, there were none. The data was messier than I expected in some ways and cleaner than I expected in others.

Code

# Comprehensive dataset overview using skimr — the standard first-look tool
# for understanding distributions, missingness, and data types simultaneously (Adi, 2026)
skim(invoices |> select(invoice_date, invoice_status, location, total, balance,
                         payment_terms_label, days_to_payment, paid_on_time, days_late))

Data summary
Name	select(…)
Number of rows	3375
Number of columns	9
_______________________
Column type frequency:
character	2
Date	1
factor	1
numeric	5
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
invoice_status	0	1	4	13	0	5	0
payment_terms_label	0	1	5	14	0	10	0

Variable type: Date

skim_variable	n_missing	complete_rate	min	max	median	n_unique
invoice_date	0	1	2024-11-06	2026-04-24	2025-09-08	454

Variable type: factor

skim_variable	n_missing	complete_rate	ordered	n_unique	top_counts
location	0	1	FALSE	4	Lag: 1975, PHC: 1299, Abu: 52, Iba: 49

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
total	0	1.00	368412.70	687963.07	400	77638.25	198603	406086.2	16169820	▇▁▁▁▁
balance	0	1.00	66705.94	472908.29	0	0.00	0	0.0	16169820	▇▁▁▁▁
days_to_payment	569	0.83	49.52	47.12	0	14.00	38	76.0	457	▇▁▁▁▁
paid_on_time	569	0.83	0.34	0.47	0	0.00	0	1.0	1	▇▁▁▁▅
days_late	569	0.83	29.02	41.16	0	0.00	16	47.0	397	▇▁▁▁▁

Code

# Basic summary statistics — build as a simple tibble to avoid type conflicts
tibble(
  Metric = c(
    "Total Invoices", "Unique Clients",
    "Date Range Start", "Date Range End",
    "Total Revenue (NGN)", "Total Outstanding (NGN)",
    "Median Invoice (NGN)", "Mean Invoice (NGN)"
  ),
  Value = c(
    as.character(nrow(invoices)),
    as.character(n_distinct(invoices$customer_id)),
    as.character(min(invoices$invoice_date)),
    as.character(max(invoices$invoice_date)),
    formatC(sum(invoices$total, na.rm = TRUE), format = "f", digits = 0, big.mark = ","),
    formatC(sum(invoices$balance, na.rm = TRUE), format = "f", digits = 0, big.mark = ","),
    formatC(median(invoices$total, na.rm = TRUE), format = "f", digits = 0, big.mark = ","),
    formatC(mean(invoices$total, na.rm = TRUE), format = "f", digits = 0, big.mark = ",")
  )
) |>
  kable(caption = "Portfolio Summary Statistics")

Portfolio Summary Statistics
Metric	Value
Total Invoices	3375
Unique Clients	68
Date Range Start	2024-11-06
Date Range End	2026-04-24
Total Revenue (NGN)	1,243,392,873
Total Outstanding (NGN)	225,132,542
Median Invoice (NGN)	198,603
Mean Invoice (NGN)	368,413

Code

# Invoice status breakdown
invoices |>
  count(invoice_status, sort = TRUE) |>
  mutate(Percent = round(n / sum(n) * 100, 1)) |>
  rename(Status = invoice_status, Count = n) |>
  kable(caption = "Invoice Status Distribution")

Invoice Status Distribution
Status	Count	Percent
Closed	2784	82.5
Overdue	386	11.4
Open	201	6.0
PartiallyPaid	3	0.1
Draft	1	0.0

Code

# By location
invoices |>
  group_by(location) |>
  summarise(
    Invoices        = n(),
    Clients         = n_distinct(customer_id),
    `Total (NGN)`   = sum(total),
    `Outstanding (NGN)` = sum(balance),
    `Median Invoice`= median(total)
  ) |>
  kable(caption = "Invoice Portfolio by Branch Location",
        format.args = list(big.mark = ","))

Invoice Portfolio by Branch Location
location	Invoices	Clients	Total (NGN)	Outstanding (NGN)	Median Invoice
Lagos	1,975	22	763,771,004	155,845,643	172,500
PHC	1,299	45	448,472,862	53,007,544	226,800
Abuja	52	8	20,858,018	11,385,702	261,852
Ibadan	49	6	10,290,989	4,893,653	203,412

Code

# DATA QUALITY ISSUE 1: Closed invoices with no last_payment_date recorded
missing_payment <- invoices |>
  filter(invoice_status == "Closed", is.na(last_payment_date)) |>
  nrow()

# DATA QUALITY ISSUE 2: Extreme outliers in invoice value
q99 <- quantile(invoices$total, 0.99, na.rm = TRUE)
outliers_high <- invoices |> filter(total > q99) |> nrow()

# DATA QUALITY ISSUE 3: days_to_payment < 0 (payment before invoice — data entry error)
negative_dtp <- invoices |>
  filter(!is.na(days_to_payment), days_to_payment < 0) |>
  nrow()

tibble(
  Issue = c(
    "Closed invoices with no Last Payment Date recorded",
    "Invoice values above 99th percentile (extreme outliers)",
    "Negative days-to-payment (payment date before invoice date)"
  ),
  Count = c(missing_payment, outliers_high, negative_dtp),
  Action = c(
    "Excluded from days_to_payment analysis; retained for portfolio totals",
    "Retained but flagged; robust statistics used where appropriate",
    "Excluded from payment-speed analysis (likely data entry errors)"
  )
) |>
  kable(caption = "Data Quality Issues Identified and Handled")

Data Quality Issues Identified and Handled
Issue	Count	Action
Closed invoices with no Last Payment Date recorded	0	Excluded from days_to_payment analysis; retained for portfolio totals
Invoice values above 99th percentile (extreme outliers)	34	Retained but flagged; robust statistics used where appropriate
Negative days-to-payment (payment date before invoice date)	0	Excluded from payment-speed analysis (likely data entry errors)

Code

invoices |>
  filter(total <= quantile(total, 0.99)) |>
  ggplot(aes(x = total)) +
  geom_histogram(bins = 50, fill = "#2c7bb6", color = "white", alpha = 0.85) +
  scale_x_continuous(labels = label_comma()) +
  labs(
    title    = "Distribution of Invoice Values",
    subtitle = "Pastures Hub | Nov 2024 – Apr 2026 | 99th percentile cap applied",
    x        = "Invoice Total (NGN)",
    y        = "Number of Invoices"
  ) +
  theme_minimal(base_size = 13)

Distribution of Invoice Values (NGN) — right-skewed, typical of B2B invoicing

Code

# Confirm skewness
tibble(
  Statistic = c("Mean", "Median", "Std Dev", "Skewness"),
  Value = c(
    mean(invoices$total, na.rm = TRUE),
    median(invoices$total, na.rm = TRUE),
    sd(invoices$total, na.rm = TRUE),
    (mean(invoices$total, na.rm=TRUE) - median(invoices$total, na.rm=TRUE)) /
      sd(invoices$total, na.rm=TRUE) * 3  # Pearson's 2nd skewness coefficient
  )
) |>
  mutate(Value = round(Value, 2)) |>
  kable(caption = "Invoice Value Distribution Statistics")

Invoice Value Distribution Statistics
Statistic	Value
Mean	368412.70
Median	198603.00
Std Dev	687963.07
Skewness	0.74
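
For reference, the skewness figure in this table is Pearson's second skewness coefficient, as the code comment notes, rather than the moment-based skewness. Worked through with the reported mean, median, and standard deviation, where \(\bar{x}\) is the mean, \(\tilde{x}\) the median, and \(s\) the standard deviation:

```latex
\[
\mathrm{Sk}_2 \;=\; \frac{3\,(\bar{x} - \tilde{x})}{s}
\;=\; \frac{3\,(368{,}412.70 - 198{,}603.00)}{687{,}963.07}
\;\approx\; 0.74
\]
```

A positive value means the mean sits above the median, which is the signature of a right-skewed distribution.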

EDA Interpretation: The portfolio tells a clear story: ₦1.24 billion invoiced over 18 months, with ₦225 million (roughly 18%) still outstanding. The mean invoice value of ₦368,413 is nearly double the median of ₦198,603, confirming strong right-skew. In practice this means a small number of large hotel clients are driving a disproportionate share of both revenue and receivables risk. Lagos handles the most volume (1,975 invoices) but also carries the heaviest outstanding burden (₦155.8 million). PHC, despite fewer invoices, has a healthier outstanding-to-revenue ratio (about 12% of invoiced value outstanding, against about 20% for Lagos). Three data quality checks were run and reported transparently: 34 extreme-value invoices above the 99th percentile were retained but flagged, while no closed invoices with a missing payment date and no negative days-to-payment values were found, which is reassuring about data integrity.
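
The outstanding-to-revenue comparison can be reproduced directly from the branch table. The sketch below re-enters the published totals by hand, so it does not depend on the `invoices` data frame; the column names `total`, `outstanding`, and `pct_outstanding` are chosen here for illustration.

```r
# Outstanding-to-revenue ratio by branch, from the published branch table
branch <- data.frame(
  location    = c("Lagos", "PHC", "Abuja", "Ibadan"),
  total       = c(763771004, 448472862, 20858018, 10290989),
  outstanding = c(155845643, 53007544, 11385702, 4893653)
)

# Share of invoiced value still unpaid, as a percentage
branch$pct_outstanding <- round(100 * branch$outstanding / branch$total, 1)

# Portfolio-wide figure for comparison
overall_pct <- round(100 * sum(branch$outstanding) / sum(branch$total), 1)

branch[order(-branch$pct_outstanding), c("location", "pct_outstanding")]
overall_pct
```

On these figures Lagos has 20.4% of its invoiced value outstanding and PHC 11.8%, against 18.1% for the portfolio as a whole. Abuja (54.6%) and Ibadan (47.6%) are small in naira terms but have roughly half of their invoiced value unpaid. Part of that may simply be recent invoices that are not yet due, so the ratio is best read alongside the status breakdown.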

Sample Data (Anonymized)

Code

# Show 10 anonymized rows to demonstrate data structure
set.seed(42)
invoices |>
  select(invoice_date, customer_id, location, invoice_status,
         total, balance, payment_terms_label, days_to_payment) |>
  slice_sample(n = 10) |>
  mutate(
    total   = formatC(total, format = "f", digits = 0, big.mark = ","),
    balance = formatC(balance, format = "f", digits = 0, big.mark = ",")
  ) |>
  rename(
    `Invoice Date` = invoice_date,
    `Client ID`    = customer_id,
    `Branch`       = location,
    `Status`       = invoice_status,
    `Total (₦)`    = total,
    `Balance (₦)`  = balance,
    `Terms`        = payment_terms_label,
    `Days to Pay`  = days_to_payment
  ) |>
  kable(caption = "10 Randomly Sampled Invoices (anonymized) — illustrating data structure")

10 Randomly Sampled Invoices (anonymized) — illustrating data structure
Invoice Date	Client ID	Branch	Status	Total (₦)	Balance (₦)	Terms	Days to Pay
2026-01-15	Customer_29	Lagos	Closed	524,300	0	Net 30	96
2025-12-10	Customer_29	Lagos	Closed	104,820	0	Net 30	98
2025-06-10	Customer_45	Lagos	Closed	63,330	0	Net 21	11
2025-05-27	Customer_6	PHC	Closed	214,680	0	Net 30	26
2025-06-20	Customer_29	Lagos	Closed	445,972	0	Net 30	84
2026-03-30	Customer_29	Lagos	Open	762,560	762,560	Net 30	NA
2025-03-17	Customer_29	Lagos	Closed	203,240	0	Net 30	60
2025-11-06	Customer_29	Lagos	Closed	2,600	0	Net 30	82
2025-06-06	Customer_29	Lagos	Closed	379,500	0	Net 30	90
2025-07-04	Customer_8	PHC	Closed	208,882	0	Net 30	10

5. Data Visualization

Theoretical Background

I have sat in enough management meetings to know that a table of numbers loses people in thirty seconds. A chart does not. The five visualisations in this section are not decoration. Each one was chosen deliberately to answer a specific question about Pastures Hub’s receivables that a table could not answer clearly (Adi, 2026).

A histogram shows me the shape of our invoice distribution in a way that a mean and median cannot. A density plot shows me not just that PHC pays faster, but how consistent that payment behavior is compared to Lagos. The sequence below is intentional: growth first, then concentration, then exposure, then speed, then trajectory. Together they tell one story.

Code

invoices |>
  count(year_month) |>
  ggplot(aes(x = year_month, y = n)) +
  geom_col(fill = "#2c7bb6", alpha = 0.85) +
  geom_smooth(method = "loess", se = FALSE, color = "#d7191c", linewidth = 1) +
  scale_x_date(date_breaks = "2 months", date_labels = "%b %Y") +
  scale_y_continuous(labels = label_comma()) +
  labs(
    title    = "Monthly Invoice Volume — Pastures Hub",
    subtitle = "Nov 2024 – Apr 2026 | Red line = smoothed trend",
    x        = NULL, y = "Number of Invoices"
  ) +
  theme_minimal(base_size = 13) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Monthly Invoice Volume — showing Pastures Hub’s growth trajectory

Code

invoices |>
  group_by(location) |>
  summarise(
    Invoices = n(),
    Revenue  = sum(total) / 1e6
  ) |>
  pivot_longer(c(Invoices, Revenue), names_to = "Metric", values_to = "Value") |>
  ggplot(aes(x = location, y = Value, fill = location)) +
  geom_col(alpha = 0.85) +
  facet_wrap(~Metric, scales = "free_y") +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title    = "Invoice Volume and Revenue by Branch",
    subtitle = "Revenue in millions NGN",
    x        = NULL, y = NULL, fill = "Branch"
  ) +
  theme_minimal(base_size = 13) +
  theme(legend.position = "bottom")

Invoice Volume and Revenue by Branch — Lagos and PHC dominate

Code

invoices |>
  filter(balance > 0) |>
  group_by(location, invoice_status) |>
  summarise(Outstanding = sum(balance) / 1e6, .groups = "drop") |>
  ggplot(aes(x = location, y = Outstanding, fill = invoice_status)) +
  geom_col() +
  scale_fill_manual(
    values = c("Overdue" = "#d7191c", "Open" = "#fdae61",
               "PartiallyPaid" = "#abd9e9", "Draft" = "#aaaaaa")
  ) +
  scale_y_continuous(labels = label_comma()) +
  labs(
    title    = "Outstanding Receivables by Branch and Status",
    subtitle = "Values in millions NGN",
    x        = NULL, y = "Outstanding (₦ millions)", fill = "Status"
  ) +
  theme_minimal(base_size = 13) +
  theme(legend.position = "bottom")

Outstanding Receivables Stack by Branch — the overdue burden

Code

paid |>
  ggplot(aes(x = days_to_payment, fill = location)) +
  geom_density(alpha = 0.5) +
  facet_wrap(~location, nrow = 2) +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title    = "Days-to-Payment Distribution by Branch",
    subtitle = "Paid invoices only | Excludes outliers > 365 days",
    x        = "Days from Invoice Date to Payment",
    y        = "Density"
  ) +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none")

Days-to-Payment Distribution by Branch — PHC pays faster

Code

invoices |>
  group_by(year_month) |>
  summarise(avg_invoice = mean(total, na.rm = TRUE)) |>
  ggplot(aes(x = year_month, y = avg_invoice)) +
  geom_line(color = "#2c7bb6", linewidth = 1) +
  geom_point(color = "#2c7bb6", size = 2) +
  geom_smooth(method = "lm", se = TRUE, color = "#d7191c",
              linetype = "dashed", alpha = 0.15) +
  scale_x_date(date_breaks = "2 months", date_labels = "%b %Y") +
  scale_y_continuous(labels = label_comma()) +
  labs(
    title    = "Average Invoice Value Over Time",
    subtitle = "Red dashed line = linear trend | Pastures Hub Nov 2024 – Apr 2026",
    x        = NULL, y = "Average Invoice Value (NGN)"
  ) +
  theme_minimal(base_size = 13) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Average Invoice Value Over Time — rising trend signals business growth

Visualization Narrative: The five charts tell a single story about a growing business sitting on a receivables problem it has not yet systematically addressed. Volume is up: invoice counts roughly doubled between November 2024 and early 2026 (Chart 1). Lagos and PHC dominate, but Lagos carries nearly three times the outstanding balance of PHC in absolute terms (₦155.8 million against ₦53.0 million), and about 20% of its invoiced value is still outstanding compared with about 12% for PHC (Chart 2 and Chart 3). The density plots reveal not just that PHC pays faster on average, but that PHC payments cluster tightly around 30–40 days while Lagos payments are spread across a much wider range, meaning Lagos has not just a speed problem but a consistency problem (Chart 4). Average invoice values are rising, particularly since mid-2025, suggesting Pastures Hub is winning larger clients, which will intensify the receivables challenge if credit policy does not keep pace (Chart 5).

6. Hypothesis Testing

Theoretical Background

I had a strong suspicion going into this analysis that Lagos clients pay slower than PHC clients. But a suspicion is not evidence. When I am making credit policy decisions that affect cash flow across four branches, I need more than gut feel.

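Hypothesis testing is the tool that converts an observation into a verdict (Adi, 2026). It forces me to state what I expect to find, then asks: could this result have happened by chance? The p-value answers that question: it is the probability of seeing a difference at least as large as the one observed if there were truly no difference between the groups.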

But Prof. Bongo is clear on something I also want to be clear about here: statistical significance is not the same as practical significance. A difference can be real and still be too small to act on. That is why I report effect sizes, Cohen’s d and Cramér’s V, alongside every p-value in this section.

Hypothesis 1: Do Lagos clients pay slower than PHC clients?

Code

# Welch two-sample t-test (does not assume equal variances)
lagos_dtp <- paid |> filter(location == "Lagos") |> pull(days_to_payment)
phc_dtp   <- paid |> filter(location == "PHC")   |> pull(days_to_payment)

# Assumption check: normality (Shapiro on sample)
set.seed(42)
shap_lagos <- shapiro.test(sample(lagos_dtp, min(5000, length(lagos_dtp))))
shap_phc   <- shapiro.test(sample(phc_dtp,   min(5000, length(phc_dtp))))

# Since n is large, CLT applies — proceed with Welch t-test
h1_test <- t.test(lagos_dtp, phc_dtp, alternative = "greater", var.equal = FALSE)

# tidy() output — converts test result to a clean tibble (Adi, 2026)
tidy(h1_test)

# A tibble: 1 × 10
  estimate estimate1 estimate2 statistic  p.value parameter conf.low conf.high
     <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl>    <dbl>     <dbl>
1     16.5      56.0      39.5      9.16 6.26e-20     2025.     13.5       Inf
# ℹ 2 more variables: method <chr>, alternative <chr>

Code

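# Effect size: Cohen's d
# The pooled SD here is the root of the unweighted mean of the two variances;
# with unequal group sizes a sample-size-weighted pooled SD would differ slightly
pooled_sd <- sqrt((var(lagos_dtp) + var(phc_dtp)) / 2)
cohens_d  <- (mean(lagos_dtp) - mean(phc_dtp)) / pooled_sd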

# Summary
tibble(
  ` ` = c("H₀", "H₁", "Test", "Lagos Mean DTP (days)", "PHC Mean DTP (days)",
           "t-statistic", "p-value", "Cohen's d", "Decision"),
  Value = c(
    "Mean days-to-payment is equal for Lagos and PHC clients",
    "Lagos clients take longer to pay than PHC clients",
    "Welch two-sample t-test (one-tailed)",
    round(mean(lagos_dtp), 1),
    round(mean(phc_dtp), 1),
    round(h1_test$statistic, 3),
    format.pval(h1_test$p.value, digits = 3),
    round(cohens_d, 3),
    ifelse(h1_test$p.value < 0.05, "Reject H₀ — significant difference", "Fail to reject H₀")
  )
) |> kable(caption = "Hypothesis 1: Lagos vs PHC Days-to-Payment")

Hypothesis 1: Lagos vs PHC Days-to-Payment
	Value
H₀	Mean days-to-payment is equal for Lagos and PHC clients
H₁	Lagos clients take longer to pay than PHC clients
Test	Welch two-sample t-test (one-tailed)
Lagos Mean DTP (days)	56
PHC Mean DTP (days)	39.5
t-statistic	9.158
p-value	<2e-16
Cohen’s d	0.363
Decision	Reject H₀ — significant difference

Business Interpretation: The result is unambiguous: Lagos clients take an average of 56 days to pay versus 39.5 days for PHC clients, a gap of 16.5 days. With p < 0.001, a gap this large is very unlikely to be chance variation. Cohen’s d of 0.363 places this in the small-to-medium range by conventional benchmarks (0.2 = small, 0.5 = medium), which in a receivables context is operationally significant. On a Lagos outstanding balance of ₦155.8 million, collecting 16 days faster would represent a meaningful acceleration of cash inflow. The practical implication is immediate: Lagos clients should receive shorter default payment terms (Net 21 rather than Net 30), more frequent reminders, and escalation protocols that PHC does not yet require.
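
For readers who want to see what sits behind the effect size, the Cohen's d in the code above divides the difference in means by a pooled standard deviation built from the unweighted average of the two group variances. Written out with the reported (rounded) values, where the subscripts L and P for Lagos and PHC are notation introduced here:

```latex
\[
d \;=\; \frac{\bar{x}_{L} - \bar{x}_{P}}{s_{p}},
\qquad
s_{p} \;=\; \sqrt{\frac{s_{L}^{2} + s_{P}^{2}}{2}}
\]
\[
0.363 \;\approx\; \frac{56.0 - 39.5}{s_{p}}
\quad\Longrightarrow\quad
s_{p} \;\approx\; \frac{16.5}{0.363} \;\approx\; 45 \text{ days}
\]
```

In other words, the 16.5-day gap is a little over a third of the typical spread in payment times of roughly 45 days, which is why the effect reads as small-to-medium even though the p-value is tiny.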

Hypothesis 2: Is invoice size independent of branch location?

Code

# Chi-squared test of independence: size bucket vs location
cross_tab <- table(invoices$size_bucket, invoices$location)

h2_test <- chisq.test(cross_tab)

# tidy() output — consistent with broom workflow throughout (Adi, 2026)
tidy(h2_test)

# A tibble: 1 × 4
  statistic  p.value parameter method                    
      <dbl>    <dbl>     <int> <chr>                     
1      164. 8.93e-31         9 Pearson's Chi-squared test

Code

# Expected counts (check assumption: all expected > 5)
min_expected <- min(h2_test$expected)

# Cramér's V for effect size
n_total  <- sum(cross_tab)
cramers_v <- sqrt(h2_test$statistic / (n_total * (min(nrow(cross_tab), ncol(cross_tab)) - 1)))

tibble(
  ` ` = c("H₀", "H₁", "Test", "Chi-squared statistic", "Degrees of freedom",
           "p-value", "Cramér's V", "Min expected count", "Decision"),
  Value = c(
    "Invoice size bucket and branch location are independent",
    "Invoice size distribution differs across branch locations",
    "Pearson chi-squared test of independence",
    round(h2_test$statistic, 3),
    h2_test$parameter,
    format.pval(h2_test$p.value, digits = 3),
    round(cramers_v, 3),
    round(min_expected, 1),
    ifelse(h2_test$p.value < 0.05, "Reject H₀ — size and location are not independent",
           "Fail to reject H₀")
  )
) |> kable(caption = "Hypothesis 2: Invoice Size vs Branch Location")

Hypothesis 2: Invoice Size vs Branch Location
	Value
H₀	Invoice size bucket and branch location are independent
H₁	Invoice size distribution differs across branch locations
Test	Pearson chi-squared test of independence
Chi-squared statistic	164.425
Degrees of freedom	9
p-value	<2e-16
Cramér’s V	0.127
Min expected count	7.8
Decision	Reject H₀ — size and location are not independent

Code

invoices |>
  count(location, size_bucket) |>
  group_by(location) |>
  mutate(pct = n / sum(n)) |>
  ggplot(aes(x = location, y = pct, fill = size_bucket)) +
  geom_col() +
  scale_y_continuous(labels = label_percent()) +
  scale_fill_brewer(palette = "RdYlBu") +
  labs(
    title    = "Invoice Size Distribution by Branch",
    x        = NULL, y = "Proportion of Invoices", fill = "Size Bucket"
  ) +
  theme_minimal(base_size = 13) +
  theme(legend.position = "bottom")

Invoice Size Distribution by Branch — revealing structural differences

Business Interpretation: The chi-squared test confirms (p < 0.001) that invoice size mix is not the same across branches: the distribution of small, medium, large, and very large invoices differs by location. Cramér’s V of 0.127 indicates a weak-to-moderate association: branches differ in size mix, but it is not the dominant factor. Looking at the chart, Abuja and PHC carry a higher proportion of large and very large invoices, while Lagos has more medium-sized invoices in volume terms. This matters for credit management: a branch dominated by large invoices carries more concentrated risk. One delayed ₦2 million invoice ties up the same cash as ten delayed ₦200,000 invoices, but the exposure rests on a single client’s payment rather than being spread across ten. Abuja, despite low volume, warrants disproportionate attention given its large-invoice concentration and relatively high outstanding balance (₦11 million on just 52 invoices).
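
As a cross-check, Cramér's V can be recomputed from the reported test results alone. The sketch below takes the chi-squared statistic, the invoice count, and the degrees of freedom from the tables above, and assumes the contingency table is 4 × 4 (four size buckets by four branches), which is what nine degrees of freedom implies.

```r
# Recompute Cramér's V from the reported chi-squared results
chi_sq <- 164.425   # reported chi-squared statistic
n      <- 3375      # invoices in the contingency table
df     <- 9         # reported degrees of freedom

# For an r x c table, df = (r - 1)(c - 1); with four size buckets and
# four branches, k = min(r, c) = 4 and (k - 1)^2 should equal df
k <- 4
stopifnot((k - 1)^2 == df)

cramers_v <- sqrt(chi_sq / (n * (k - 1)))
round(cramers_v, 3)
```

The result rounds to 0.127, matching the table. Because V divides by the sample size, it stays small even when a large dataset pushes the p-value far below 0.001.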

7. Correlation Analysis

Theoretical Background

Hypothesis testing told me that Lagos clients pay slower. Correlation analysis is where I start asking why. Does it come down to invoice size? Is it the credit terms I set? Or something else entirely?

Correlation measures how strongly two variables move together, on a scale from -1 to +1 (Adi, 2026). Pearson’s r captures linear relationships; Spearman’s ρ is more robust when the data is skewed, which ours is.

But the most important thing Prof. Bongo taught me about correlation is what it cannot tell you: that one thing causes another. Two variables can move together because a third variable is driving both. That is exactly why I run a partial correlation in this section, to control for payment terms and see what is left.
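
The idea behind a partial correlation can be sketched with the standard first-order formula, which needs only the three pairwise correlations. The function name `partial_cor` and the example values are made up for illustration; they are not taken from the Pastures Hub data.

```r
# First-order partial correlation of x and y, controlling for z,
# computed from the three pairwise correlations
partial_cor <- function(r_xy, r_xz, r_yz) {
  (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
}

# Illustration with made-up correlations: x and y look related (0.50),
# but both are strongly tied to a third variable z (0.70 each)
partial_cor(r_xy = 0.50, r_xz = 0.70, r_yz = 0.70)
# about 0.02: almost nothing is left once z is held constant
```

When the control variable is unrelated to both x and y, the formula returns the original correlation unchanged, so any drop is attributable to the shared third variable.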

Code

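# Build numeric correlation dataset
corr_data <- paid |>
  mutate(
    log_total        = log1p(total),
    # Numeric days from the terms label (e.g. "Net 30" -> 30); labels with no digits become NA
    payment_terms_n  = as.numeric(str_extract(payment_terms_label, "\\d+")),
    # Caution: location is nominal, so these integer codes have no natural order;
    # correlations involving location_n are a rough indicator only
    location_n       = as.numeric(factor(location))
  ) |>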
  select(
    days_to_payment,
    log_total,
    payment_terms_n,
    days_late,
    location_n
  ) |>
  drop_na()

# Rename for display
corr_display <- corr_data |>
  rename(
    `Days to Payment`  = days_to_payment,
    `Log Invoice Value` = log_total,
    `Payment Terms (days)` = payment_terms_n,
    `Days Late`        = days_late,
    `Location (encoded)` = location_n
  )

Code

corr_matrix <- cor(corr_display, method = "pearson", use = "complete.obs")

# corrplot — the standard visualisation used in Adi (2026) for correlation matrices
corrplot(
  corr_matrix,
  method      = "circle",
  type        = "upper",
  addCoef.col = "black",
  number.cex  = 0.8,
  tl.cex      = 0.85,
  title       = "Pearson Correlation Matrix — Pastures Hub Invoice Data",
  mar         = c(0, 0, 2, 0)
)

Pearson Correlation Matrix — linear relationships between numeric variables

Code

# ggcorrplot version for HTML rendering compatibility
ggcorrplot(
  corr_matrix,
  method   = "circle",
  type     = "lower",
  lab      = TRUE,
  lab_size = 3.5,
  colors   = c("#d7191c", "white", "#2c7bb6"),
  title    = "Pearson Correlation Matrix — Pastures Hub Invoice Data"
) +
  theme(plot.title = element_text(size = 13, face = "bold"))

Pearson Correlation Matrix (ggcorrplot) — alternative view

Code

# Hmisc::rcorr() — provides correlation coefficients AND p-values simultaneously
# This tests whether each correlation is statistically significant (Adi, 2026)
rcorr_result <- rcorr(as.matrix(corr_display), type = "pearson")

# Correlation coefficients
cat("=== Pearson Correlation Coefficients ===\n")

=== Pearson Correlation Coefficients ===

Code

print(round(rcorr_result$r, 3))

                     Days to Payment Log Invoice Value Payment Terms (days)
Days to Payment                1.000             0.015                0.337
Log Invoice Value              0.015             1.000                0.078
Payment Terms (days)           0.337             0.078                1.000
Days Late                      0.979            -0.003                0.200
Location (encoded)            -0.132             0.058                0.095
                     Days Late Location (encoded)
Days to Payment          0.979             -0.132
Log Invoice Value       -0.003              0.058
Payment Terms (days)     0.200              0.095
Days Late                1.000             -0.097
Location (encoded)      -0.097              1.000

Code

# P-values for each correlation
cat("\n=== P-values (H0: correlation = 0) ===\n")


=== P-values (H0: correlation = 0) ===

Code

print(round(rcorr_result$P, 4))

                     Days to Payment Log Invoice Value Payment Terms (days)
Days to Payment                   NA            0.4300                    0
Log Invoice Value               0.43                NA                    0
Payment Terms (days)            0.00            0.0000                   NA
Days Late                       0.00            0.8707                    0
Location (encoded)              0.00            0.0026                    0
                     Days Late Location (encoded)
Days to Payment         0.0000             0.0000
Log Invoice Value       0.8707             0.0026
Payment Terms (days)    0.0000             0.0000
Days Late                   NA             0.0000
Location (encoded)      0.0000                 NA

Code

# Spearman (rank-based, robust to non-normality and outliers)
spear_matrix <- cor(corr_display, method = "spearman", use = "complete.obs")

# Key correlations table
key_corrs <- tibble(
  `Variable Pair` = c(
    "Days to Payment ~ Log Invoice Value",
    "Days to Payment ~ Payment Terms",
    "Days to Payment ~ Days Late",
    "Log Invoice Value ~ Payment Terms"
  ),
  Pearson  = c(
    corr_matrix["Days to Payment", "Log Invoice Value"],
    corr_matrix["Days to Payment", "Payment Terms (days)"],
    corr_matrix["Days to Payment", "Days Late"],
    corr_matrix["Log Invoice Value", "Payment Terms (days)"]
  ),
  Spearman = c(
    spear_matrix["Days to Payment", "Log Invoice Value"],
    spear_matrix["Days to Payment", "Payment Terms (days)"],
    spear_matrix["Days to Payment", "Days Late"],
    spear_matrix["Log Invoice Value", "Payment Terms (days)"]
  )
) |>
  mutate(across(c(Pearson, Spearman), ~ round(.x, 3)))

kable(key_corrs, caption = "Key Correlations: Pearson vs Spearman")

Key Correlations: Pearson vs Spearman
Variable Pair	Pearson	Spearman
Days to Payment ~ Log Invoice Value	0.015	0.013
Days to Payment ~ Payment Terms	0.337	0.422
Days to Payment ~ Days Late	0.979	0.957
Log Invoice Value ~ Payment Terms	0.078	0.086

Code

# Partial correlation — controlling for payment_terms_n to isolate the
# direct relationship between days_to_payment and log_total (Adi, 2026)
# Method: regress out the confounder, then correlate residuals

# Zero-order (unadjusted) correlation
r_dtp_total <- corr_matrix["Days to Payment", "Log Invoice Value"]
r_dtp_terms <- corr_matrix["Days to Payment", "Payment Terms (days)"]
r_total_terms <- corr_matrix["Log Invoice Value", "Payment Terms (days)"]

# Partial correlation formula: r_xy.z = (r_xy - r_xz*r_yz) / sqrt((1-r_xz²)(1-r_yz²))
partial_r <- (r_dtp_total - r_dtp_terms * r_total_terms) /
             sqrt((1 - r_dtp_terms^2) * (1 - r_total_terms^2))

# Verify via residuals method (Adi, 2026 — "regress out the confounder")
dtp_resid   <- residuals(lm(`Days to Payment`    ~ `Payment Terms (days)`, data = corr_display))
total_resid <- residuals(lm(`Log Invoice Value`   ~ `Payment Terms (days)`, data = corr_display))
partial_r_verify <- cor(dtp_resid, total_resid)

tibble(
  ` ` = c(
    "Zero-order r (Days to Payment ~ Log Invoice Value)",
    "Partial r (controlling for Payment Terms)",
    "Verified via residuals method"
  ),
  Value = round(c(r_dtp_total, partial_r, partial_r_verify), 3),
  Interpretation = c(
    "Weak positive — invoice size barely predicts payment speed",
    "After removing effect of credit terms: relationship nearly zero",
    "Matches formula — confirms robustness of partial correlation"
  )
) |>
  kable(caption = "Partial Correlation: Days-to-Payment ~ Log Invoice Value, controlling for Payment Terms")

Partial Correlation: Days-to-Payment ~ Log Invoice Value, controlling for Payment Terms
	Value	Interpretation
Zero-order r (Days to Payment ~ Log Invoice Value)	0.015	Weak positive — invoice size barely predicts payment speed
Partial r (controlling for Payment Terms)	-0.012	After removing effect of credit terms: relationship nearly zero
Verified via residuals method	-0.012	Matches formula — confirms robustness of partial correlation

Partial Correlation Insight: The zero-order correlation between days-to-payment and invoice size is already negligible (r = 0.015, p = 0.43). Once we control for payment terms, which correlate with both invoice size (r = 0.078) and payment speed (r = 0.337), the partial correlation stays just as close to zero (r = -0.012). This is consistent with Adi’s (2026) caution about confounding: what little relationship existed between invoice size and payment speed was being driven by the terms I extended, not by the size itself. The management implication holds: focus on terms and client behavior, not invoice size.
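To put a p-value on the partial correlation, one option is the standard t-test for a partial correlation with one control variable, which uses n − 3 degrees of freedom. The sketch below is illustrative only: it plugs in the rounded r = -0.012 from the table above and assumes n = 2,740, the sample size reported for the regression, because the row count of corr_data is not shown.

```r
# Significance test for a first-order partial correlation (one control variable).
# ASSUMPTIONS: r_partial is the rounded value from the table above; n is borrowed
# from the regression sample (2,740) because the row count of corr_data is not shown.
r_partial <- -0.012
n         <- 2740
k         <- 1                      # number of variables controlled for

df_resid <- n - 2 - k
t_stat   <- r_partial * sqrt(df_resid / (1 - r_partial^2))
p_value  <- 2 * pt(-abs(t_stat), df = df_resid)

cat(sprintf("t = %.3f, df = %d, p = %.3f\n", t_stat, as.integer(df_resid), p_value))
```

Under these assumptions the test does not reject a partial correlation of zero at the 5% level, which is in line with reading invoice size as unrelated to payment speed.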

Correlation Interpretation:

Three relationships stand out from the matrix, each with a distinct business implication:

1. Days-to-Payment ~ Days Late (Pearson r = 0.979): This near-perfect correlation is close to definitional, since both variables measure how long an invoice stays unpaid. It is not operationally surprising, but it confirms that our overdue problem and our slow-payment problem are the same problem. There is no hidden category of invoices that are slow but technically on time.

2. Days-to-Payment ~ Payment Terms (Pearson r = 0.337, Spearman ρ = 0.422): This is the most actionable finding in the entire correlation analysis. When I extend Net 30 terms, clients use all 30 days, and often more. When I extend Net 21, they pay in roughly 21 days. The credit terms I set are close to a self-fulfilling prophecy. The Spearman coefficient is higher than Pearson here (0.422 vs 0.337), suggesting the relationship is monotonic but not perfectly linear: some clients ignore terms entirely, but most pay later when given longer terms. The correlation is moderate and observational, so it does not prove cause on its own, but the implication is direct: tightening terms is likely the fastest lever I have to accelerate cash collection, at zero additional cost.

3. Days-to-Payment ~ Log Invoice Value (Pearson r = 0.015, p = 0.43): Surprisingly weak, and not statistically significant. Large invoices do not systematically take longer to collect, and the partial correlation shows this still holds once payment terms are accounted for (r = -0.012). This tells me that client behavior, not invoice size, is the primary driver of payment speed. A hotel that pays promptly will pay a ₦2 million invoice in the same number of days as a ₦200,000 one. This finding shifts my focus from invoice-level controls to client-level credit management.
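Why a Spearman coefficient can exceed Pearson is easiest to see on a toy example. The sketch below uses synthetic data (not Pastures Hub invoices) in which y always rises with x but along a curve: the rank-based measure is perfect, while the linear measure is not.

```r
# Toy illustration on synthetic data: a strictly increasing but curved relationship.
x <- 1:30
y <- exp(x / 5)

pearson  <- cor(x, y, method = "pearson")   # measures linear association
spearman <- cor(x, y, method = "spearman")  # measures monotonic (rank) association

cat(sprintf("Pearson = %.3f, Spearman = %.3f\n", pearson, spearman))
```

Because the ranks of x and y agree exactly, Spearman equals 1 here, while Pearson falls below 1 because a straight line cannot follow the curve.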

8. Regression Analysis

Theoretical Background

Correlation told me which variables are associated with payment speed. Regression is where I turn that into something I can actually use. The model in this section predicts days-to-payment from three inputs: the credit terms I set, the branch location, and the invoice size.

What makes regression more powerful than correlation is that it isolates each variable’s effect while holding the others constant (Adi, 2026). So when the model tells me that PHC clients pay faster than Lagos clients, it is not because PHC clients happen to get shorter terms. The model has already accounted for that. Each coefficient is a direct answer to a management question.

Like any model, this one has assumptions that need checking, which is why diagnostic plots follow the results. Violations do not automatically invalidate a model, but they must be acknowledged transparently (Adi, 2026).

OLS Regression: Predicting Days-to-Payment

Code

# Prepare regression dataset
reg_data <- paid |>
  mutate(
    log_total       = log1p(total),
    payment_terms_n = as.numeric(str_extract(payment_terms_label, "\\d+")),
    location        = relevel(factor(location), ref = "Lagos")
  ) |>
  select(days_to_payment, log_total, payment_terms_n, location) |>
  drop_na()

Code

# Fit OLS model
model <- lm(days_to_payment ~ log_total + payment_terms_n + location, data = reg_data)

# Tidy output
# Significance stars are computed from the numeric p-values before formatting:
# format.pval() returns strings such as "<2e-16", which as.numeric() turns into NA
tidy(model) |>
  mutate(
    significance = case_when(
      p.value < 0.001 ~ "***",
      p.value < 0.01  ~ "**",
      p.value < 0.05  ~ "*",
      TRUE ~ ""
    ),
    across(c(estimate, std.error, statistic), ~ round(.x, 3)),
    p.value = format.pval(p.value, digits = 3)
  ) |>
  rename(
    Term = term, Estimate = estimate, `Std. Error` = std.error,
    `t value` = statistic, `p value` = p.value, Sig = significance
  ) |>
  kable(caption = "OLS Regression: Predicting Days-to-Payment")

OLS Regression: Predicting Days-to-Payment
Term	Estimate	Std. Error	t value	p value	Sig
(Intercept)	5.664	7.703	0.735	0.462
log_total	0.101	0.620	0.162	0.871
payment_terms_n	2.103	0.093	22.542	<2e-16	***
locationPHC	-23.911	1.664	-14.365	<2e-16	***
locationAbuja	-11.639	8.025	-1.450	0.147
locationIbadan	35.093	7.786	4.507	0	***

Code

# Model fit statistics
glance(model) |>
  select(r.squared, adj.r.squared, sigma, statistic, p.value, nobs) |>
  mutate(across(c(r.squared, adj.r.squared), ~ round(.x, 3)),
         sigma = round(sigma, 1),
         statistic = round(statistic, 1),
         p.value = format.pval(p.value, digits = 3)) |>
  rename(
    `R²` = r.squared, `Adj. R²` = adj.r.squared,
    `Residual Std. Error` = sigma, `F statistic` = statistic,
    `p value` = p.value, `N observations` = nobs
  ) |>
  kable(caption = "Model Fit Statistics")

Model Fit Statistics
R²	Adj. R²	Residual Std. Error	F statistic	p value	N observations
0.184	0.182	41.2	123	<2e-16	2740

Code

par(mfrow = c(2, 2), mar = c(4, 4, 2, 1))
plot(model)

Regression Diagnostic Plots — checking model assumptions

Code

par(mfrow = c(1, 1))

Code

# Shapiro-Wilk test on standardised residuals — formal normality check (Adi, 2026)
# Sample capped at 5000 as Shapiro-Wilk requires n ≤ 5000
set.seed(42)
std_resids <- rstandard(model)
shap_result <- shapiro.test(sample(std_resids, min(5000, length(std_resids))))
cat(sprintf("Shapiro-Wilk normality test: W = %.4f, p-value = %.4f\n",
            shap_result$statistic, shap_result$p.value))

Shapiro-Wilk normality test: W = 0.7722, p-value = 0.0000

Code

cat(ifelse(shap_result$p.value >= 0.05,
           "✓ Residuals appear normally distributed (p ≥ 0.05)",
           "⚠ Residuals deviate from normality — results robust given large n (CLT)"))

⚠ Residuals deviate from normality — results robust given large n (CLT)

Diagnostic Plot Interpretation (Adi, 2026 framework):

Residuals vs Fitted (top-left): The residuals are approximately centred around zero across the fitted value range, with no strong curvature. This supports the linearity assumption — the model is not systematically over- or under-predicting at any part of the range. Some spread widens at higher fitted values, indicating mild heteroscedasticity, which is expected with payment data.
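Normal Q-Q (top-right): The standardised residuals follow the theoretical quantile line well through the middle of the distribution, with heavier tails at both ends. This is typical of payment duration data, which is right-skewed and bounded at zero. The Shapiro-Wilk result above (W = 0.772) confirms the departure from normality. With n > 1,000, the Central Limit Theorem keeps the sampling distribution of the coefficient estimates approximately normal, so non-normal residuals alone do not undermine inference. It does not correct for unequal variance, however, so the standard errors should be read alongside the heteroscedasticity noted in the other panels.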
Scale-Location (bottom-left): The square root of absolute standardised residuals should be roughly flat if homoscedasticity holds. The slight upward slope suggests variance increases with fitted values — a known feature of time-based outcome variables. This does not invalidate the model but suggests that a log-transformed outcome could be explored in future work.
Residuals vs Leverage (bottom-right): No observations fall outside the Cook’s distance dashed lines, indicating no single invoice is exerting undue influence on the coefficient estimates. The model is stable across the data range.
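One common response to the widening spread seen in these panels is to report heteroscedasticity-robust standard errors next to the usual OLS ones. The following is a minimal base-R sketch of the HC1 "sandwich" estimator on synthetic data. The variable names and simulated values are assumptions for illustration and are not taken from the invoice data; the sandwich package's vcovHC() is a packaged alternative.

```r
# Synthetic data whose noise grows with the predictor (heteroscedastic by design).
set.seed(42)
n     <- 500
terms <- sample(c(7, 15, 21, 30, 45), n, replace = TRUE)
days  <- 5 + 2 * terms + rnorm(n, sd = 0.8 * terms)
fit   <- lm(days ~ terms)

# HC1 sandwich covariance: (X'X)^-1 X' diag(e^2) X (X'X)^-1, scaled by n / (n - p)
X     <- model.matrix(fit)
e     <- residuals(fit)
bread <- solve(crossprod(X))
meat  <- crossprod(X * e)            # equals X' diag(e^2) X
vcov_hc1 <- bread %*% meat %*% bread * n / (n - ncol(X))

# Side-by-side comparison of classical and robust standard errors
se_table <- data.frame(
  term      = names(coef(fit)),
  estimate  = unname(coef(fit)),
  se_ols    = sqrt(diag(vcov(fit))),
  se_robust = sqrt(diag(vcov_hc1)),
  row.names = NULL
)
print(se_table)
```

If the robust and classical standard errors are close, the heteroscedasticity has little practical effect on inference; if they differ noticeably, the robust ones are the safer basis for p-values.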

Code

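# Check for multicollinearity
# The model contains a factor (location), so car::vif() returns a matrix of
# generalised VIFs with columns GVIF, Df and GVIF^(1/(2*Df)), not a named vector.
# Variable names therefore come from rownames(), and each column is kept separate.
vif_vals <- vif(model)
gvif_adj <- vif_vals[, 3]   # GVIF^(1/(2*Df)), comparable across different Df
tibble(
  Variable          = rownames(vif_vals),
  GVIF              = round(vif_vals[, "GVIF"], 2),
  Df                = vif_vals[, "Df"],
  `GVIF^(1/(2*Df))` = round(gvif_adj, 2),
  # Squaring the adjusted value puts it back on the usual VIF scale (threshold 5)
  Concern           = ifelse(gvif_adj^2 > 5, "High: multicollinearity concern", "OK")
) |>
  kable(caption = "Variance Inflation Factors (VIF): Multicollinearity Check")

Variance Inflation Factors (VIF): Multicollinearity Check
Variable	GVIF	Df	GVIF^(1/(2*Df))	Concern
log_total	1.01	1	1.00	OK
payment_terms_n	1.09	1	1.04	OK
location	1.09	3	1.01	OK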

Regression Interpretation:

The model (F-statistic significant, p < 0.001) confirms that branch location and payment terms are the dominant predictors of how long Pastures Hub waits to be paid. Each coefficient translates directly into a management decision:

Payment Terms (β = 2.103, p < 0.001): For every additional day of credit extended, clients take approximately 2.1 extra days to pay, holding invoice size and branch constant. This is the most important number in the model. It means the difference between Net 21 and Net 30 is not 9 days: it is 9 × 2.1 ≈ 19 additional days in actual collection time. If Pastures Hub moved its entire Lagos book from Net 30 to Net 21, the model predicts collection roughly 19 days faster, assuming the association is causal and clients respond to new terms the way the historical data suggests. On ₦155 million outstanding, that is a substantial working capital improvement.
PHC vs Lagos (β = −23.9, p < 0.001): Controlling for invoice size and payment terms, PHC clients pay nearly 24 days faster than Lagos clients. This is not simply because PHC clients get shorter terms — the regression controls for that. There is something structural about PHC client behavior, the relationship dynamics there, or the branch’s collections discipline that Lagos has not yet replicated. This deserves a qualitative investigation: what is the PHC team doing differently?
Ibadan vs Lagos (β = +35.1, p < 0.001): Ibadan clients pay 35 days slower than Lagos, controlling for other factors. With only 49 invoices, this may reflect early-stage relationships where credit discipline has not yet been established. It is a warning sign for a branch I am trying to grow.
Log Invoice Value (β = 0.101, p = 0.871): Not significant. Once I control for payment terms and branch, invoice size has no meaningful additional effect on payment speed. This is consistent with the correlation finding: client behavior matters more than invoice size.
R² = 0.184: The model explains 18.4% of the variance in days-to-payment. In behavioral business data this is respectable for a parsimonious model. The remaining 82% is left unexplained and likely reflects factors not captured in Zoho: client finance team efficiency, invoice disputes, relationship history, and seasonal cash flow pressures. A richer dataset would likely improve this.

The diagnostic checks give a mixed picture. Residuals are roughly centred around zero (linearity is plausible) and VIF scores are all well below 5 (no multicollinearity concern). However, the residuals are clearly non-normal (Shapiro-Wilk W = 0.772) with heavy tails, and their spread widens at higher fitted values. The coefficient estimates remain usable at this sample size, but the standard errors and p-values should be read with some caution.
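The Net 30 versus Net 21 arithmetic can be reproduced directly from the reported coefficients. The sketch below predicts days-to-payment for a Lagos invoice (the reference branch, so no location term applies) under both terms. The invoice value of ₦500,000 is a hypothetical input chosen only for illustration; it cancels out of the difference.

```r
# Coefficients as reported in the OLS table above (rounded to 3 decimals).
b_intercept <- 5.664
b_log_total <- 0.101
b_terms     <- 2.103

# Hypothetical invoice value, used only for illustration.
invoice_value <- 500000
log_total     <- log1p(invoice_value)   # same transform as the model

# Lagos is the reference level, so its location effect is zero.
predict_lagos <- function(terms_days) {
  b_intercept + b_log_total * log_total + b_terms * terms_days
}

net30 <- predict_lagos(30)
net21 <- predict_lagos(21)

cat(sprintf("Predicted days, Net 30: %.1f\n", net30))
cat(sprintf("Predicted days, Net 21: %.1f\n", net21))
cat(sprintf("Difference: %.1f days\n", net30 - net21))
```

The difference equals 9 × 2.103, about 18.9 days, whatever invoice value is used, which is where the "roughly 19 days" figure comes from.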

9. Integrated Findings

The five techniques converge on a single, coherent diagnosis: Pastures Hub has a payment terms problem, a Lagos problem, and an early-warning system problem — and all three are solvable.

EDA established the scale: ₦225 million outstanding on ₦1.24 billion invoiced — an 18% receivables rate that would concern any investor or lender reviewing our books. The portfolio is right-skewed, meaning a small number of large clients drive disproportionate exposure. This alone justifies moving from a uniform credit policy to a tiered one.

Visualization made the Lagos problem visible in a way no spreadsheet had done before. The outstanding receivables stack in Chart 3 is the chart I will be showing at our next management review. Lagos carries ₦155 million of the ₦225 million total. The density plot in Chart 4 shows that PHC payments cluster tightly — predictable, manageable — while Lagos payments are scattered across 20 to 120+ days. That unpredictability is as damaging as the slowness.

Hypothesis testing confirmed, formally, that the Lagos–PHC gap is real (p < 0.001) and that invoice size mix differs across branches (p < 0.001). These are not impressions. They are evidence. This matters when presenting to a bank, a co-founder, or a board — “the data shows” is more powerful than “I believe.”

Correlation analysis identified payment terms as the strongest controllable variable (r = 0.337, ρ = 0.422). Crucially, invoice size barely correlates with payment speed once terms are accounted for — meaning the problem is not the size of our clients but the terms we are offering them.

Regression quantified the lever precisely: every additional day of credit extended adds 2.1 days to collection time. PHC collects 24 days faster than Lagos even after controlling for terms and invoice size. Ibadan is a warning flag.

Action Plan

These findings translate into seven concrete actions, each grounded in a specific statistical finding that can be cited and defended.

Priority	Action	Statistical Basis	Quantified Expected Impact
🔴 Immediate	Move default credit terms from Net 30 to Net 21 for all clients	Regression: β(payment_terms) = 2.103, p < 0.001	~19 days faster collection (9 days × 2.1); ~₦155M Lagos book freed ~3 weeks earlier
🔴 Immediate	Introduce Day-14 reminder protocol for all Lagos clients	Hypothesis test: Lagos 16.5 days slower than PHC (p < 0.001, Cohen’s d = 0.363); 62% of Lagos clients breaching contracted terms	Intercepts invoices before they cross the overdue threshold
🔴 Immediate	Enforce late-payment clause for any client > contracted terms	CLIENT_TERMS: 42% overall breach rate; Lagos clients average 18-day overshoot	Converts a policy problem into a contractual enforcement mechanism
🟡 Short-term	Introduce Net 15 for all new clients (< 6 months tenure)	Correlation: payment terms is the strongest controllable variable (ρ = 0.422); new clients lack established payment culture	Anchors payment behavior from onboarding; avoids inheriting bad habits
🟡 Short-term	Flag all Abuja invoices > ₦500k for weekly MD review	EDA: ₦11M outstanding on just 52 invoices — highest outstanding-per-invoice ratio of any branch	Reduces concentration risk; one delayed Abuja invoice = ~₦212k average exposure
🟠 Medium-term	Conduct a qualitative review of PHC collections process	Regression: PHC clients pay 23.9 days faster than Lagos after controlling for terms and invoice size (p < 0.001)	Identify transferable practices — relationship management, invoice follow-up discipline — for Lagos replication
🟠 Medium-term	Monitor Ibadan branch closely; restrict to Net 15 only	Regression: Ibadan clients pay 35.1 days slower than Lagos (p < 0.001) on only 49 invoices	Early intervention before the branch scales with a slow-payment culture embedded

What This Means in Practice: The Next 30 Days

The analysis gives Pastures Hub a very specific 30-day agenda — not a vague strategy but a set of actions with measurable outcomes:

Week 1: Revise the standard client contract template to Net 21. For the handful of anchor clients currently on Net 30 (the large Lagos hotels that account for the bulk of the outstanding balance), initiate a conversation: the data shows we are effectively giving them 40–50 days anyway, so formalising Net 21 simply closes the gap between what the contract says and what we enforce.

Week 2: Set up a WhatsApp/email reminder sequence triggered at Day 14 for every open Lagos invoice. This costs nothing and targets the exact window the data identifies — Lagos clients start drifting past their terms between Day 14 and Day 21.

Week 3: Pull the Abuja invoice list and schedule a call for any invoice over ₦500k that is more than 10 days from due. With 52 invoices and ₦11M outstanding, this is a 30-minute weekly task.

Week 4: Brief the PHC branch manager and the Lagos branch manager together, side by side with the regression output. The data is the conversation — PHC collects 24 days faster, not because of terms (those are controlled for), but because of something in the client relationship or follow-up discipline. Make it explicit and make it a cross-branch learning exercise.
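The Week 2 and Week 3 rules are simple enough to automate as filters on an open-invoice list. The sketch below is a base-R illustration on a toy ledger. The column names, invoice IDs, and reporting date are all hypothetical and do not reflect the Zoho export schema, and the Week 3 filter applies only the ₦500k threshold, not the due-date condition.

```r
# Toy open-invoice ledger. Column names and values are hypothetical.
open_invoices <- data.frame(
  invoice_id   = c("INV-001", "INV-002", "INV-003", "INV-004"),
  location     = c("Lagos", "Lagos", "Abuja", "Abuja"),
  total        = c(250000, 900000, 650000, 120000),
  invoice_date = as.Date(c("2026-04-01", "2026-04-10", "2026-04-05", "2026-04-12")),
  stringsAsFactors = FALSE
)

as_of <- as.Date("2026-04-20")   # assumed reporting date
open_invoices$age_days <- as.integer(as_of - open_invoices$invoice_date)

# Week 2 rule: Lagos invoices that have reached Day 14 get a reminder
lagos_reminders <- subset(open_invoices, location == "Lagos" & age_days >= 14)

# Week 3 rule: Abuja invoices over NGN 500k go on the weekly review list
abuja_review <- subset(open_invoices, location == "Abuja" & total > 500000)

print(lagos_reminders)
print(abuja_review)
```

Run weekly against the live open-invoice list, the same two filters would produce the reminder queue and the Abuja call list without manual screening.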

Supplementary Analysis: Agreed vs. Actual Payment Terms (Client-Level Evidence)

The Zoho invoice data tells us when clients paid (Imemba, 2026a). The CLIENT_TERMS sheet from our internal Pastures Hub masterfile tells us what they agreed to — the credit terms signed at onboarding (Imemba, 2026b). Comparing the two surfaces a finding that goes beyond slow payment: systematic breach of contracted terms.

Code

# Load client-level agreed vs actual payment terms from internal masterfile
client_terms_raw <- read_csv("data/client_terms.csv", show_col_types = FALSE)

# Clean: remove clients with 0 agreed days (cash/COD clients, no credit agreement)
# Deduplicate: normalise names to uppercase with squished whitespace, then keep one
# row per normalised name: the row with the largest agreed_days (and within that,
# the largest actual_days).
# NOTE: this only merges names that differ in case or spacing. Variants such as
# "Genesis Restaurant Lagos" / "GENESIS RESTAURANT" or "Victoria Crown Plaza" /
# "VICTORIA CROWN PLAZA HOTEL" differ in wording and need an explicit name mapping.
client_terms <- client_terms_raw |>
  filter(agreed_days > 0) |>
  mutate(name_clean = str_to_upper(str_squish(client_name))) |>
  group_by(name_clean) |>
  slice_max(order_by = agreed_days * 1000 + actual_days, n = 1, with_ties = FALSE) |>
  ungroup() |>
  mutate(
    breach = actual_days > agreed_days,
    overshoot_days = actual_days - agreed_days
  ) |>
  # Anonymise client names for public publication (same approach as invoice analysis)
  arrange(region, client_name) |>
  mutate(client_code = sprintf("Client_%03d", row_number()))

# Summary by region
terms_summary <- client_terms |>
  group_by(region) |>
  summarise(
    Clients = n(),
    `Breaching Contract (%)` = round(100 * mean(breach), 0),
    `Avg Overshoot (days)` = round(mean(overshoot_days[breach], na.rm = TRUE), 1),
    `Max Overshoot (days)` = max(overshoot_days[breach], na.rm = TRUE),
    .groups = "drop"
  )

kable(terms_summary,
      caption = "Contracted vs. Actual Payment Terms: Breach Rate by Region",
      align = "lcccc")

Contracted vs. Actual Payment Terms: Breach Rate by Region
region	Clients	Breaching Contract (%)	Avg Overshoot (days)	Max Overshoot (days)
Ibadan	3	33	23.0	23
Lagos	21	62	18.5	30
PHC	19	21	28.2	38

Code

# Show individual clients where actual > agreed (anonymized for public publication)
breach_table <- client_terms |>
  filter(breach) |>
  arrange(region, desc(overshoot_days)) |>
  select(
    `Client` = client_code,
    `Region` = region,
    `Agreed (days)` = agreed_days,
    `Actual (days)` = actual_days,
    `Overshoot (days)` = overshoot_days
  )

kable(breach_table,
      caption = "Clients Paying Later Than Their Contracted Terms",
      align = "lcccc")

Clients Paying Later Than Their Contracted Terms
Client	Region	Agreed (days)	Actual (days)	Overshoot (days)
Client_003	Ibadan	7	30	23
Client_013	Lagos	30	60	30
Client_014	Lagos	30	60	30
Client_018	Lagos	30	60	30
Client_008	Lagos	30	45	15
Client_009	Lagos	30	45	15
Client_012	Lagos	30	45	15
Client_016	Lagos	30	45	15
Client_019	Lagos	30	45	15
Client_020	Lagos	30	45	15
Client_021	Lagos	30	45	15
Client_022	Lagos	30	45	15
Client_023	Lagos	30	45	15
Client_024	Lagos	30	45	15
Client_037	PHC	7	45	38
Client_035	PHC	30	60	30
Client_036	PHC	30	60	30
Client_043	PHC	45	60	15

Code

# Dot-plot restricted to clients who breach their agreed terms (anonymized)
# Shows how far each client overshoots their contracted credit period
plot_data <- client_terms |>
  filter(breach) |>
  arrange(region, desc(overshoot_days)) |>
  mutate(client_code = fct_inorder(client_code))

ggplot(plot_data) +
  geom_segment(aes(x = agreed_days, xend = actual_days,
                   y = client_code, yend = client_code,
                   color = region),
               arrow = arrow(length = unit(0.15, "cm"), type = "closed"),
               linewidth = 0.7, alpha = 0.7) +
  geom_point(aes(x = agreed_days, y = client_code), shape = 1, size = 2.5, color = "grey40") +
  geom_point(aes(x = actual_days, y = client_code, color = region), size = 2.5) +
  geom_vline(xintercept = 0, linetype = "dashed", color = "grey70") +
  scale_color_manual(values = c("Lagos" = "#E63946", "PHC" = "#457B9D", "Ibadan" = "#2A9D8F")) +
  labs(
    title = "Agreed vs. Actual Payment Days per Client (Anonymized)",
    subtitle = "Arrow: agreed terms → actual behavior. Rightward arrow = breach of contract.",
    x = "Days", y = "Client Code", color = "Region"
  ) +
  theme_minimal(base_size = 11) +
  theme(legend.position = "top")

Contracted vs. Actual Payment Days — Breaching Clients Only

What this means for Pastures Hub: Of the 43 credit clients in our masterfile, 18 are paying later than their contracted terms — a 42% contract breach rate. The average overshoot among breaching clients is 20.9 days. This is not a collections problem alone: it is a contract enforcement problem. Clients know their agreed terms; they are choosing not to honor them. This finding — that clients systematically violate written credit agreements — strengthens the case for the actions proposed above. Renegotiating to Net 21 closes the contractual gap for the majority of Lagos clients, but enforcement mechanisms (late-payment clauses, account suspension thresholds) are equally important.
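The headline figures in this paragraph can be checked against the two tables above. The short sketch below rebuilds the 42% breach rate and the 20.9-day average overshoot from the per-client overshoot values and the regional client counts.

```r
# Credit clients per region, from the regional summary table above.
clients <- c(Ibadan = 3, Lagos = 21, PHC = 19)

# Overshoot days of each breaching client, from the client-level table above.
overshoot <- list(
  Ibadan = 23,
  Lagos  = c(rep(30, 3), rep(15, 10)),
  PHC    = c(38, 30, 30, 15)
)

breaching     <- lengths(overshoot)                       # breaching clients per region
breach_rate   <- round(100 * sum(breaching) / sum(clients))
avg_overshoot <- round(mean(unlist(overshoot)), 1)

cat(sprintf("%d of %d credit clients breach terms (%d%%)\n",
            sum(breaching), as.integer(sum(clients)), as.integer(breach_rate)))
cat(sprintf("Average overshoot among breaching clients: %.1f days\n", avg_overshoot))
```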

10. Limitations & Further Work

Several limitations should be noted. First, the dataset covers only November 2024 to April 2026, a span of 18 months. A longer time series (ideally 3+ years) would allow seasonal decomposition and more stable regression estimates. Second, important predictor variables are absent from the Zoho export: product category, salesperson assigned, number of line items, and client industry segment. Including these could substantially improve the regression’s explanatory power. Third, the regression model assumes linearity, constant variance, and normally distributed residuals. The diagnostics show these are partially violated: the Shapiro-Wilk test rejects normality (W = 0.772) and the residual spread widens at higher fitted values, reflecting the right-skewed distribution of payment durations. A log-transformed outcome or quantile regression could address this in further work. Finally, the Abuja and Ibadan branches have very few invoices relative to Lagos and PHC, which limits statistical power for branch-specific inferences; as these branches grow, branch-level modeling will become more reliable.
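As a pointer for that further work, the sketch below shows what a log-transformed outcome could look like, using synthetic right-skewed durations in place of the invoice data. All values and variable names are simulated assumptions. With real data, log1p() would be the safer transform if any invoice is paid on day 0.

```r
# Synthetic, right-skewed payment durations (not real invoices).
set.seed(1)
n     <- 400
terms <- sample(c(7, 15, 21, 30), n, replace = TRUE)
days  <- exp(2 + 0.03 * terms + rnorm(n, sd = 0.5))

fit_raw <- lm(days ~ terms)        # outcome on the original scale
fit_log <- lm(log(days) ~ terms)   # outcome on the log scale

# Shapiro-Wilk W is closer to 1 when residuals look more normal.
w_raw <- unname(shapiro.test(residuals(fit_raw))$statistic)
w_log <- unname(shapiro.test(residuals(fit_log))$statistic)

# On the log scale, 100 * (exp(b) - 1) is the % change in days per extra day of terms.
pct_per_day <- unname(100 * (exp(coef(fit_log)["terms"]) - 1))

cat(sprintf("W (raw outcome) = %.3f, W (log outcome) = %.3f\n", w_raw, w_log))
cat(sprintf("Estimated change per extra day of terms: %.1f%%\n", pct_per_day))
```

The trade-off is interpretability: coefficients become percentage effects, not days, so the "2.1 days per day of credit" framing would need restating.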

References

Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making — from data fundamentals to machine learning in Python and R. Lagos Business School / markanalytics.online. https://markanalytics.online

Allaire, J. J., Teague, C., Scheidegger, C., Xie, Y., & Dervieux, C. (2022). Quarto (Version 1.x) [Computer software]. https://doi.org/10.5281/zenodo.5960048

Imemba, C. (2026a). Pastures Hub invoice records, November 2024 – April 2026 [Dataset]. Collected from Zoho Books accounting system, Pastures Hub, Lagos, Nigeria. Data available on request from the author.

Imemba, C. (2026b). Pastures Hub client payment terms masterfile [Internal business record]. Pastures Hub, Lagos, Nigeria. Data available on request from the author.

R Core Team. (2024). R: A language and environment for statistical computing (Version 4.5). R Foundation for Statistical Computing. https://www.R-project.org/

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer. https://doi.org/10.1007/978-3-319-24277-4

Appendix: AI Usage Statement

Claude (Anthropic) was used as a coding assistant throughout this project. Specifically, Claude helped write and debug R code for data loading, cleaning, visualization, and statistical tests, and suggested the structure of the Quarto document. All analytical decisions — including the choice of Case Study 1, the selection of days-to-payment as the primary outcome variable, the formulation of both hypotheses, the regression specification, and the business interpretation of every result — were made independently by the author. The author reviewed every line of code, verified that outputs matched expectations, and can explain and defend all results. The AI usage statement is disclosed in accordance with the academic integrity guidelines set out in the assessment brief (Adi, 2026, Section 4.4).