Legal Hub LLP — Client Payment Analytics (2024–2025)

Case Study 1: Exploratory & Inferential Analytics

Author

Anthony Ezeamama | EMBA-31 | LBS

Published

April 1, 2026


Important: Confidentiality Disclaimer

All client and counterparty names appearing in the underlying dataset have been changed from their real identities for confidentiality and professional conduct purposes. The financial figures are real and have not been altered. This anonymisation was carried out prior to analysis and has no bearing on the integrity of the findings presented in this report. See Section 3.6 for the full ethical and confidentiality statement.


1 Executive Summary

Legal Hub LLP is a full-service commercial law firm whose continued financial health depends on understanding how client payment behaviour evolves from year to year. This report analyses the firm’s internal client payment records for the two full financial years 2024 and 2025, encompassing 91 and 92 client relationships respectively, with combined revenues of approximately ₦2.22 billion across the two years.

Using five analytical techniques — Exploratory Data Analysis (EDA), Visualisation, Hypothesis Testing, Correlation, and Regression — the report addresses seven business questions ranging from revenue trends and client retention to practice area performance and concentration risk.

Key findings reveal that total revenue grew modestly, by approximately 1.6% year-on-year, but this headline growth conceals significant turbulence: the firm’s single largest client in 2024 (accounting for 18.5% of total revenue) reduced its payments by roughly 75% in 2025 (from ₦204.0 million to ₦50.0 million), and fewer than half (44%) of 2024 clients returned. Revenue is heavily Pareto-concentrated: the top five clients contributed over 35% of income in 2024 and about 33% in 2025, creating meaningful dependency risk. Oil & Gas and Power & Energy are overwhelmingly the most commercially valuable practice areas.

The central recommendation is that the firm must urgently diversify its client base while intensifying relationship management with its top-tier energy sector clients. Continued reliance on a small number of high-value mandates in a single sector represents a material strategic risk.


2 Professional Disclosure

Name: Anthony Ezeamama
Job Title: Partner
Organisation: Legal Hub LLP
Organisation Type: Commercial Law Firm (Professional Services)
Sector: Legal Services, covering (1) Oil & Gas; (2) Power; (3) Banking & Finance; (4) Capital Markets; (5) M&A; (6) Technology and Digital Economy; (7) Corporate & Commercial; (8) Legal Tax Advisory; (9) Private Equity; and (10) Real Estate & Construction


As a Partner at Legal Hub LLP, I hold both client-facing and firm management responsibilities. My role encompasses originating and leading client engagements, overseeing the profitability of my client portfolio, and contributing to the firm’s strategic planning as a member of its management leadership. The five analytical techniques applied in this report are directly relevant to my professional practice in the following ways:

Exploratory Data Analysis (EDA): As a Partner responsible for firm revenue, I regularly review payment summaries and client account reports. EDA formalises this review process — it gives me a systematic, evidence-based picture of who is paying, how much, and in what pattern. This is operationally critical during annual budget reviews, partner profit-sharing discussions, and client retention planning. Rather than relying on intuition about which clients matter most, EDA surfaces the data to back those judgements.

Visualisation: Law firm management involves presenting financial intelligence to non-technical colleagues — fellow partners, the management committee, and occasionally institutional lenders. Visualisation translates raw payment data into charts and graphs that enable faster, more aligned decision-making in those rooms. In my experience, a well-designed chart of client revenue concentration has more influence on a partner meeting than a spreadsheet of numbers.

Hypothesis Testing: Law firms frequently make year-end assessments of whether financial performance has materially improved or declined. Without formal testing, these judgements are impressionistic. Hypothesis testing gives me a principled basis for asserting — with a stated confidence level — whether observed changes in revenue are statistically meaningful or simply reflect normal variation. This is important when making arguments to the management committee about resource reallocation or headcount changes.

Correlation Analysis: Understanding how a client’s payment behaviour in one year predicts their behaviour the following year helps the firm plan its cash flow and prioritise relationship investment. Correlation also helps identify whether particular practice areas tend to produce consistently high-value client relationships, which informs our lateral hiring and business development strategy.

Regression Analysis: Revenue forecasting is a core planning tool for any professional services firm. Regression allows me to build a predictive model of what individual client revenues are likely to be in the next period, based on past behaviour. The power-law model also gives me a precise quantification of how concentrated our revenue is — a figure I can present to the firm’s management committee to motivate a deliberate client diversification strategy.


3 Data Collection & Sampling

3.1 Source

The data analysed in this report is drawn from Legal Hub LLP’s internal practice management and billing system. Specifically, the dataset records net client payments received by the firm during the calendar years 2024 and 2025. The data was extracted by the firm’s finance function in the form of a structured spreadsheet (Payment Analysis 2.xlsx), with one worksheet per year.

3.2 Collection Method

The data was compiled from the firm’s accounts receivable ledger, which records all amounts received from clients against invoices raised for legal services. “Net payment” refers to amounts actually received (not invoiced), net of any credit notes or fee adjustments. The data was extracted administratively rather than through survey or primary research.

3.3 Sampling Frame & Sample Size

The dataset is a census, not a sample — it covers all clients who made at least one net payment to Legal Hub LLP during the relevant year. There was no sampling process; every paying client relationship in each year is included.

| Year | Number of Clients | Total Net Revenue (₦) |
|------|-------------------|-----------------------|
| 2024 | 91 | See Section 5 |
| 2025 | 92 | See Section 5 |

3.4 Time Period Covered

  • 2024 dataset: 1 January 2024 to 31 December 2024
  • 2025 dataset: 1 January 2025 to 31 December 2025

3.5 Variables Available

Each record contains: client rank (by payment size), client name, net payment amount (₦), percentage of total revenue, and practice area classification.

3.6 Ethical Notes & Confidentiality Statement

This analysis was conducted by a Partner of the firm and falls within the legitimate internal management use of client financial data. No external disclosure of identifiable client information has been made. All client and counterparty names appearing in the underlying dataset have been changed from their real identities for confidentiality and professional conduct purposes. The financial figures are real and have not been altered. This anonymisation was carried out prior to analysis and has no bearing on the integrity of the findings presented in this report. The firm’s data governance policies and professional conduct obligations under the relevant bar rules have been observed throughout.


4 Data Description

library(readxl)
library(dplyr)
library(tidyr)
library(ggplot2)
library(scales)
library(RColorBrewer)
library(stringr)
library(knitr)

f <- "Payment Analysis 2.xlsx"

# ── Parse one sheet ──────────────────────────────────────────────────────────
parse_sheet <- function(sheet_name, year) {
  raw <- read_excel(f, sheet = sheet_name, col_names = FALSE,
                    col_types = "text")
  names(raw) <- c("rank","client","amount_raw","amount_alt",
                  "pct_raw","practice_raw","x7","x8")
  raw %>%
    filter(!is.na(rank),
           !is.na(suppressWarnings(as.numeric(rank)))) %>%
    mutate(
      rank     = as.integer(rank),
      client   = str_trim(client),
      amount   = suppressWarnings(as.numeric(amount_raw)),
      pct      = suppressWarnings(as.numeric(pct_raw)),
      practice = str_trim(practice_raw),
      year     = year
    ) %>%
    filter(!is.na(amount), amount > 0) %>%
    select(year, rank, client, amount, pct, practice)
}

df25 <- parse_sheet("2025 Payment", 2025)
df24 <- parse_sheet("2024 Payment", 2024)

# ── Practice area harmonisation ──────────────────────────────────────────────
# Long patterns are split across lines for readability, but an R string keeps
# the line break and indentation, so any alternative after a break could never
# match. rx() strips them before compiling a case-insensitive regex.
rx <- function(p) regex(gsub("\n[[:space:]]*", "", p), ignore_case = TRUE)

harmonise_practice <- function(pa, client_name) {
  # Prefer the billing-system label; fall back to the client name if blank.
  # case_when() returns the first matching rule, so rule order matters.
  x <- if_else(!is.na(pa) & pa != "", pa, client_name)
  case_when(
    str_detect(x, rx("oil|petro|lng|midget|rig|china legal|moor.*moor|
                      energy trad|zenith|corner|berlin|atlantic|yorkshire|
                      smith.*jones|petroleum intl"))            ~ "Oil & Gas",
    str_detect(x, rx("power|genco|ecowas|solar|renew|jeep|bond.*power|
                      churgate|cosec|lekki.*power|aso.*power|steam.*energy|
                      industrial.*rock|larger.*brother|charger.*energy|
                      grid.*energy|power.*energy|africa energy|canada.*consult|
                      canadian.*consult|rocafeller|savannah tech")) ~ "Power & Energy",
    str_detect(x, rx("bank|microfinance|greece|overy|federal bank|
                      wuse.*microfinance|steam.*energy"))       ~ "Banking & Finance",
    str_detect(x, rx("capital|asset.*sec|guarantee|butter|cornerstone.*real|
                      vegas|renaissance|AIK|western pens|global capital")) ~ "Capital Markets",
    str_detect(x, rx("M&A|merger|gombe|acquisition"))          ~ "M&A",
    str_detect(x, rx("tax|nuprc"))                              ~ "Tax",
    str_detect(x, rx("corp|enerco|construct|lekki integ|natural gas assoc")) ~ "Corporate",
    str_detect(x, rx("real estate|homes.*real|cubana|crown.*realt|estate dev")) ~ "Real Estate",
    str_detect(x, rx("tech|media|intel|AI solution|lab.*stem|clad|
                      large chips|savannah"))                   ~ "Technology & Media",
    str_detect(x, rx("invest.*income|rental|training.*revenue")) ~ "Other Income",
    TRUE ~ "General / Unclassified"
  )
}

df25 <- df25 %>%
  mutate(practice_clean = harmonise_practice(practice, client))
df24 <- df24 %>%
  mutate(practice_clean = harmonise_practice(practice, client))

df_all <- bind_rows(df25, df24) %>% mutate(year = factor(year))

total25 <- sum(df25$amount)
total24 <- sum(df24$amount)

4.1 Variable Dictionary

The table below describes every variable retained for analysis after cleaning.

var_dict <- data.frame(
  `Variable`    = c("year","rank","client","amount","pct","practice","practice_clean"),
  `Type`        = c("Categorical (factor)","Integer","Character","Numeric (₦)",
                    "Numeric (proportion)","Character","Categorical (factor)"),
  `Description` = c(
    "Financial year in which payment was received (2024 or 2025)",
    "Client rank within the year, ordered from largest to smallest payer (1 = highest)",
    "Anonymised client name (real names withheld; see Section 3.6)",
    "Total net payment received from client during the year, in Nigerian Naira (₦)",
    "Client's payment as a proportion of the firm's total annual revenue (0–1 scale)",
    "Practice area label as recorded in the billing system (partially complete)",
    "Cleaned and harmonised practice area label, derived from billing record or client name"
  ),
  # Blank rate for `practice` taken from the completeness audit (Table 4)
  `Missing Values` = c("None","None","None","None","Minimal",
                       "55% (2024); 71% (2025)","None (derived)")
)
kable(var_dict, caption = "Table 1: Variable Dictionary",
      col.names = c("Variable","Type","Description","Missing Values"))
Table 1: Variable Dictionary

| Variable | Type | Description | Missing Values |
|----------|------|-------------|----------------|
| year | Categorical (factor) | Financial year in which payment was received (2024 or 2025) | None |
| rank | Integer | Client rank within the year, ordered from largest to smallest payer (1 = highest) | None |
| client | Character | Anonymised client name (real names withheld; see Section 3.6) | None |
| amount | Numeric (₦) | Total net payment received from client during the year, in Nigerian Naira (₦) | None |
| pct | Numeric (proportion) | Client’s payment as a proportion of the firm’s total annual revenue (0–1 scale) | Minimal |
| practice | Character | Practice area label as recorded in the billing system (partially complete) | 55% (2024); 71% (2025) |
| practice_clean | Categorical (factor) | Cleaned and harmonised practice area label, derived from billing record or client name | None (derived) |

4.2 Distributional Summary

# Thousands separators; scientific = FALSE stops format() switching a lone
# large value (e.g. 204000000) to "2.04e+08".
fmt_naira <- function(x) format(round(x), big.mark = ",", scientific = FALSE)

desc_stats <- df_all %>%
  group_by(Year = year) %>%
  summarise(
    `n (clients)`    = n(),
    `Total (₦)`      = fmt_naira(sum(amount)),
    `Mean (₦)`       = fmt_naira(mean(amount)),
    `Median (₦)`     = fmt_naira(median(amount)),
    `Std Dev (₦)`    = fmt_naira(sd(amount)),
    `Min (₦)`        = fmt_naira(min(amount)),
    `Max (₦)`        = fmt_naira(max(amount)),
    # Nonparametric skew, (mean - median) / SD: lies in [-1, 1];
    # positive values indicate a right-skewed distribution
    `Nonparametric Skew` = round(
      (mean(amount) - median(amount)) / sd(amount), 3),
    .groups = "drop"
  )

kable(desc_stats, caption = "Table 2: Distributional Summary of Client Payment Amounts by Year",
      align = c("l", rep("r", 8)))
Table 2: Distributional Summary of Client Payment Amounts by Year

| Year | n (clients) | Total (₦) | Mean (₦) | Median (₦) | Std Dev (₦) | Min (₦) | Max (₦) | Nonparametric Skew |
|------|------------:|----------:|---------:|-----------:|------------:|--------:|--------:|-------------------:|
| 2024 | 91 | 1,102,750,681 | 12,118,139 | 5,525,000 | 24,638,521 | 215,000 | 204,000,000 | 0.268 |
| 2025 | 92 | 1,120,421,932 | 12,178,499 | 5,656,250 | 18,164,971 | 342,500 | 101,768,156 | 0.359 |

The data is strongly right-skewed in both years: the mean payment is more than twice the median (₦12.1M against ₦5.5M in 2024; ₦12.2M against ₦5.7M in 2025), confirming that a small number of very large client payments pull the average upward. A heavy upper tail of this kind is characteristic of Pareto-type distributions and motivates the use of both parametric and non-parametric techniques in subsequent sections.
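A toy illustration of why the median is the safer summary here (the ten payments are hypothetical, not Legal Hub data): dropping the single largest payer more than halves the mean but leaves the median unchanged. The `np_skew` entry is the same (mean − median) / SD ratio computed in Table 2.

```r
# Hypothetical payments (₦M) for ten clients: many small, one very large
pay <- c(1, 1, 2, 2, 3, 3, 4, 5, 8, 40)

summ <- function(x) c(mean    = mean(x),
                      median  = median(x),
                      np_skew = (mean(x) - median(x)) / sd(x))  # (mean - median) / SD

with_top    <- summ(pay)
without_top <- summ(pay[-which.max(pay)])   # drop the single largest payer
round(rbind(with_top, without_top), 2)
# the mean falls from 6.9 to about 3.2, while the median stays at 3
```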

Practice area coverage is partially complete in the raw billing data. Where the practice area field was blank, it was inferred from the client name using a rule-based classification. The resulting practice_clean variable is used throughout all subsequent analysis.
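The rule-based inference follows first-match-wins logic: the billing label is used when present, otherwise the client name, and the first rule whose pattern matches decides the label. A minimal base-R sketch under simplified assumptions (the three-rule set, the helper `classify_practice()` and the example client names are hypothetical, not the report's full rule set) shows both behaviours, including a name that matches two rules:

```r
# Simplified, hypothetical rule set; order matters because the first match wins
rules <- c(
  "Oil & Gas"         = "oil|petro|lng",
  "Power & Energy"    = "power|solar|energy",
  "Banking & Finance" = "bank|microfinance"
)

classify_practice <- function(label, client_name) {
  # Prefer the billing-system label; fall back to the client name if blank
  x <- ifelse(!is.na(label) & label != "", label, client_name)
  vapply(x, function(s) {
    hits <- names(rules)[vapply(rules, grepl, logical(1),
                                x = s, ignore.case = TRUE)]
    if (length(hits) > 0) hits[1] else "General / Unclassified"
  }, character(1), USE.NAMES = FALSE)
}

out <- classify_practice(
  label       = c(NA, "", "Banking", NA),
  client_name = c("Corner Oil Limited", "Solar Bank Ltd",
                  "Petro Holdings", "Holdings NGA")
)
out
# "Oil & Gas" "Power & Energy" "Banking & Finance" "General / Unclassified"
```

"Solar Bank Ltd" matches both the Power and Banking patterns and is assigned to Power & Energy only because that rule comes first; "Petro Holdings" is classified from its "Banking" label rather than its name.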


5 Technique 1 — Exploratory Data Analysis (EDA)

5.1 Theory

Exploratory Data Analysis, introduced by Tukey (1977), is the process of summarising and interrogating a dataset to understand its structure, distributions, central tendencies, spread, and anomalies before any formal modelling. EDA is a prerequisite for all other techniques: it ensures the analyst understands what the data contains, identifies quality issues, and generates hypotheses for further testing. Key tools include summary statistics (mean, median, standard deviation, skewness), frequency distributions, and missing-data audits.

5.2 Business Justification

Before drawing conclusions about revenue trends, client behaviour, or practice area performance, it is essential to understand the shape and quality of the data. EDA reveals whether our measures are meaningful (e.g. whether the mean is an appropriate summary statistic given skewness), flags data gaps that could bias conclusions, and quantifies the magnitude of client payment differences. For a law firm partner, EDA is the analytical equivalent of reading the financial statements before attending a budget meeting.

5.3 Analysis

5.3.1 Q1 — Are Payment Patterns Progressive?

ggplot(df_all, aes(x = amount / 1e6, fill = year)) +
  geom_histogram(bins = 30, alpha = 0.72, position = "identity") +
  scale_fill_manual(values = c("2024" = "#2166AC", "2025" = "#D6604D")) +
  scale_x_continuous(labels = label_comma(suffix = "M")) +
  labs(
    title    = "Figure 1: Distribution of Client Payment Amounts (₦M)",
    subtitle = "Highly right-skewed in both years — a small number of clients dominate revenue",
    x        = "Payment Amount (₦ Millions)",
    y        = "Number of Clients",
    fill     = "Year"
  ) +
  theme_minimal(base_size = 13) +
  theme(legend.position = "top", plot.title = element_text(face = "bold"))

ggplot(df_all, aes(x = rank, y = amount / 1e6, colour = year)) +
  geom_point(alpha = 0.55, size = 2.2) +
  scale_colour_manual(values = c("2024" = "#2166AC", "2025" = "#D6604D")) +
  scale_y_continuous(labels = label_comma(suffix = "M")) +
  labs(
    title    = "Figure 2: Client Rank vs. Payment Amount",
    subtitle = "Payments fall sharply from rank 1 and flatten to a long tail — not a progressive (linear) pattern",
    x        = "Client Rank (1 = largest payer)",
    y        = "Payment Amount (₦ Millions)",
    colour   = "Year"
  ) +
  theme_minimal(base_size = 13) +
  theme(legend.position = "top", plot.title = element_text(face = "bold"))

rev_df <- data.frame(
  Year    = c("2024","2025"),
  Revenue = c(total24, total25),
  Clients = c(nrow(df24), nrow(df25))
) %>% mutate(
  `Revenue (₦M)` = round(Revenue / 1e6, 2),
  `Avg/Client (₦M)` = round(Revenue / Clients / 1e6, 2),
  `Median/Client (₦M)` = c(round(median(df24$amount)/1e6, 2),
                             round(median(df25$amount)/1e6, 2))
)
kable(rev_df %>% select(-Revenue),
      caption = "Table 3: Annual Revenue Summary — 2024 vs 2025")
Table 3: Annual Revenue Summary — 2024 vs 2025
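| Year | Clients | Revenue (₦M) | Avg/Client (₦M) | Median/Client (₦M) |
|------|--------:|-------------:|----------------:|-------------------:|
| 2024 | 91 | 1102.75 | 12.12 | 5.53 |
| 2025 | 92 | 1120.42 | 12.18 | 5.66 |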
filed24 <- sum(!is.na(df24$practice) & df24$practice != "")
filed25 <- sum(!is.na(df25$practice) & df25$practice != "")
miss_df <- data.frame(
  Year            = c(2024, 2025),
  Total_Clients   = c(nrow(df24), nrow(df25)),
  Practice_Filed  = c(filed24, filed25),
  Practice_Blank  = c(nrow(df24) - filed24, nrow(df25) - filed25)
)
miss_df$Fill_Rate <- paste0(round(miss_df$Practice_Filed / miss_df$Total_Clients * 100, 1), "%")
names(miss_df) <- c("Year","Total Clients","Practice Filed","Practice Blank","Fill Rate")
kable(miss_df, caption = "Table 4: Practice Area Data Completeness Audit")
Table 4: Practice Area Data Completeness Audit
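| Year | Total Clients | Practice Filed | Practice Blank | Fill Rate |
|------|--------------:|---------------:|---------------:|----------:|
| 2024 | 91 | 41 | 50 | 45.1% |
| 2025 | 92 | 27 | 65 | 29.3% |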

5.4 Interpretation (for a non-technical manager)

The data shows that Legal Hub LLP’s client payment pattern is strongly skewed, not progressive. A progressive pattern would mean payments spread fairly evenly across clients; instead, there is a sharp drop-off after the top few clients. In 2024, the single largest client accounted for nearly a fifth (18.5%) of all revenue. Most clients pay relatively modest amounts, clustered towards the lower end. This pattern, with a few large payers and many small ones, is common in professional services but creates vulnerability. Total revenue grew only slightly from 2024 to 2025 (about 1.6%), and the average payment per client was essentially flat (₦12.12M to ₦12.18M), so the small increase reflects one additional paying client and a marginally higher average rather than any material change in what each client pays.
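The point about flat per-client value can be checked with a multiplicative decomposition: growth in total revenue equals growth in client numbers compounded with growth in the average payment. A short sketch using the totals and client counts from Table 2 (the variable names are illustrative, chosen so they do not overwrite `total24`/`total25`):

```r
# Reported totals (₦) and paying-client counts
rev_2024 <- 1102750681; n_2024 <- 91
rev_2025 <- 1120421932; n_2025 <- 92

growth        <- rev_2025 / rev_2024 - 1                        # total revenue growth
client_effect <- n_2025 / n_2024 - 1                            # growth in client numbers
avg_effect    <- (rev_2025 / n_2025) / (rev_2024 / n_2024) - 1  # growth in average payment

# Identity: (1 + growth) = (1 + client_effect) * (1 + avg_effect)
stopifnot(isTRUE(all.equal(1 + growth, (1 + client_effect) * (1 + avg_effect))))
round(100 * c(total = growth, clients = client_effect, average = avg_effect), 2)
# approx. total 1.6%, clients 1.1%, average 0.5%
```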


6 Technique 2 — Visualisation

6.1 Theory

Data visualisation is the graphical representation of information to enable the human eye and brain to detect patterns, trends, and anomalies that are difficult to perceive in tabular form. Tufte (2001) emphasises that good visualisations should maximise the “data-to-ink ratio” — conveying the most insight with the least visual complexity. Core chart types used here include bar charts (for comparison), Lorenz curves (for inequality/concentration), and waterfall charts (for decomposition of change).
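A Lorenz curve sorts clients from smallest to largest payer and plots the cumulative share of clients against the cumulative share of revenue; the further the curve sags below the 45-degree line, the more concentrated revenue is. A minimal base-R sketch (the helpers `lorenz()` and `top_k_share()` are hypothetical, not part of the report's code), together with the top-k share the report uses to describe concentration, applied to the five largest 2025 payments in Table 5:

```r
# Lorenz curve: sort payments ascending; pair the cumulative share of clients
# with the cumulative share of revenue. Equal payments give the line y = x.
lorenz <- function(amount) {
  a <- sort(amount)
  data.frame(client_share  = c(0, seq_along(a) / length(a)),
             revenue_share = c(0, cumsum(a) / sum(a)))
}

# Share of revenue held by the k largest payers
top_k_share <- function(amount, k = 5, total = sum(amount)) {
  sum(head(sort(amount, decreasing = TRUE), k)) / total
}

# Sanity check: equal payments lie exactly on the equality line
eq <- lorenz(rep(10, 4))
all.equal(eq$client_share, eq$revenue_share)   # TRUE

# Five largest 2025 payments (Table 5) against the 2025 total (Table 2)
top5_2025 <- c(101768156, 91293236, 63352768, 56390906, 52323984)
round(top_k_share(top5_2025, total = 1120421932), 3)   # 0.326
```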

6.2 Business Justification

Law firm partners and management committees routinely need to understand performance at a glance. Visualisation translates the revenue data into formats suitable for strategic discussions — which clients we are winning and losing, which practice areas are growing, and how concentrated our revenue base truly is. Charts can expose problems that summary statistics obscure: the waterfall chart, for example, reveals that beneath modest overall revenue growth lies a significant churn story.
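The bridge rests on an accounting identity: 2025 revenue equals 2024 revenue, minus revenue from churned clients, plus the net change from retained clients, plus revenue from new clients. A toy base-R check with hypothetical clients A to D; like the report's code, it matches clients by exact name, so a spelling variation between years would appear as one churned and one new client:

```r
# Hypothetical payments (₦M) keyed by client name
pay24 <- c(A = 100, B = 50, C = 30)
pay25 <- c(A = 80,  C = 45, D = 60)

retained    <- intersect(names(pay24), names(pay25))   # "A" "C"
churned     <- setdiff(names(pay24), names(pay25))     # "B"
new_clients <- setdiff(names(pay25), names(pay24))     # "D"

bridge <- c(
  start    =  sum(pay24),                                  # 180
  churned  = -sum(pay24[churned]),                         # -50
  retained =  sum(pay25[retained]) - sum(pay24[retained]), # 125 - 130 = -5
  new      =  sum(pay25[new_clients])                      # 60
)
sum(bridge)                          # 185, equal to sum(pay25)
length(retained) / length(pay24)     # retention rate: 2 of 3 clients
```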

6.3 Analysis

6.3.1 Q2 — Which Clients Came Back? Which Did Not?

clients_24 <- df24$client
clients_25 <- df25$client
retained  <- intersect(clients_24, clients_25)
churned   <- setdiff(clients_24, clients_25)
new_25    <- setdiff(clients_25, clients_24)
retention_rate <- round(length(retained) / length(clients_24) * 100, 1)

# Revenue components
rev_retained_24 <- df24 %>% filter(client %in% retained) %>% summarise(r=sum(amount)) %>% pull(r)
rev_churned_24  <- df24 %>% filter(client %in% churned)  %>% summarise(r=sum(amount)) %>% pull(r)
rev_retained_25 <- df25 %>% filter(client %in% retained) %>% summarise(r=sum(amount)) %>% pull(r)
rev_new_25      <- df25 %>% filter(client %in% new_25)   %>% summarise(r=sum(amount)) %>% pull(r)

cat("=== CLIENT BEHAVIOUR SUMMARY ===\n")
=== CLIENT BEHAVIOUR SUMMARY ===
cat(sprintf("2024 clients:              %d\n", length(clients_24)))
2024 clients:              91
cat(sprintf("2025 clients:              %d\n", length(clients_25)))
2025 clients:              92
cat(sprintf("Retained in both years:    %d  (%.1f%% retention)\n",
            length(retained), retention_rate))
Retained in both years:    40  (44.0% retention)
cat(sprintf("Lost after 2024 (churned): %d\n", length(churned)))
Lost after 2024 (churned): 51
cat(sprintf("New clients in 2025:       %d\n", length(new_25)))
New clients in 2025:       52
ret_summary <- data.frame(
  Category = c("Retained\n(both years)","Churned\n(2024 only)","New\n(2025 only)"),
  Count    = c(length(retained), length(churned), length(new_25)),
  Fill     = c("#1A9641","#D6604D","#2166AC")
)
ggplot(ret_summary, aes(x = Category, y = Count, fill = Category)) +
  geom_col(width = 0.55, alpha = 0.88) +
  geom_text(aes(label = Count), vjust = -0.5, size = 5, fontface = "bold") +
  scale_fill_manual(values = c("Retained\n(both years)" = "#1A9641",
                                "Churned\n(2024 only)"   = "#D6604D",
                                "New\n(2025 only)"       = "#2166AC")) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
  labs(title    = "Figure 3: Client Retention Profile — 2024 to 2025",
       subtitle = paste0("Retention rate: ", retention_rate,
                         "% of 2024 clients returned in 2025"),
       x = NULL, y = "Number of Clients") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none", plot.title = element_text(face = "bold"))

# Revenue bridge (waterfall): each movement floats from the running total left
# by the previous step, while the 2024 and 2025 totals are anchored at zero.
wf <- data.frame(
  Step  = c("2024 Revenue","Churned\nclients","Retained\nclient change",
            "New\nclients","2025 Revenue"),
  Value = c(total24, -rev_churned_24,
            rev_retained_25 - rev_retained_24,
            rev_new_25, total25),
  Type  = c("Base","Lost","Change","Gained","Result")
)
wf$id    <- seq_len(nrow(wf))
wf$end   <- c(cumsum(wf$Value[1:4]), wf$Value[5])  # running total after each step
wf$start <- c(0, wf$end[1:3], 0)                   # where each bar begins

ggplot(wf, aes(xmin = id - 0.3, xmax = id + 0.3,
               ymin = pmin(start, end) / 1e6, ymax = pmax(start, end) / 1e6,
               fill = Type)) +
  geom_rect(alpha = 0.87) +
  scale_x_continuous(breaks = wf$id, labels = wf$Step) +
  scale_fill_manual(values = c(Base="steelblue4", Lost="#D6604D",
                                Change="#FEE08B", Gained="#4DAC26",
                                Result="#1A9641")) +
  scale_y_continuous(labels = label_comma(suffix = "M")) +
  labs(title    = "Figure 4: Revenue Bridge — 2024 to 2025 (₦M)",
       subtitle = "Decomposing total revenue change into its constituent movements",
       x = NULL, y = "Revenue (₦ Millions)", fill = "Movement Type") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "top", plot.title = element_text(face = "bold"),
        axis.text.x = element_text(angle = 15, hjust = 1))

6.3.2 Q3 — Premium Clients: Who Needs Our Closest Attention?

premium <- df25 %>%
  mutate(is_retained = client %in% retained) %>%
  arrange(desc(amount)) %>%
  head(15)

ggplot(premium,
       aes(x = reorder(str_wrap(client, 24), amount),
           y = amount / 1e6, fill = is_retained)) +
  geom_col(alpha = 0.87) +
  coord_flip() +
  scale_fill_manual(values = c("TRUE"="#1A9641","FALSE"="#D6604D"),
                    labels = c("TRUE"="Retained from 2024","FALSE"="New in 2025")) +
  scale_y_continuous(labels = label_comma(suffix = "M")) +
  labs(title    = "Figure 5: Top 15 Premium Clients by 2025 Revenue (₦M)",
       subtitle = "Green = also present in 2024 | Red = new in 2025",
       x = NULL, y = "2025 Revenue (₦ Millions)", fill = "Client Status") +
  theme_minimal(base_size = 12) +
  theme(legend.position = "top", plot.title = element_text(face = "bold"))

prem_tbl <- df25 %>%
  arrange(desc(amount)) %>%
  head(15) %>%
  mutate(
    `2025 Rev (₦)`   = format(round(amount), big.mark=","),
    `% of Total`     = paste0(round(amount/total25*100, 1), "%"),
    `Status`         = if_else(client %in% retained,"Retained ✓","New ★"),
    `2024 Rev (₦)`   = {
      a <- df24$amount[match(client, df24$client)]
      if_else(!is.na(a), format(round(a), big.mark=","), "—")
    },
    `YoY Δ`          = {
      a <- df24$amount[match(client, df24$client)]
      if_else(!is.na(a), paste0(round((amount-a)/a*100,1),"%"), "N/A")
    }
  ) %>%
  select(Client=client, Practice=practice_clean,
         `2024 Rev (₦)`, `2025 Rev (₦)`, `% of Total`, `YoY Δ`, Status)
kable(prem_tbl, caption="Table 5: Top 15 Premium Clients — Strategic Priority List")
Table 5: Top 15 Premium Clients — Strategic Priority List
| Client | Practice | 2024 Rev (₦) | 2025 Rev (₦) | % of Total | YoY Δ | Status |
|--------|----------|-------------:|-------------:|-----------:|------:|--------|
| Atlantic Petroleum Limited | Oil & Gas | — | 101,768,156 | 9.1% | N/A | New ★ |
| Berlin Corporation Limited | Oil & Gas | 3,325,000 | 91,293,236 | 8.1% | 2645.7% | Retained ✓ |
| Corner Oil Limited | Oil & Gas | — | 63,352,768 | 5.7% | N/A | New ★ |
| Grid Energy Company Limited | Power & Energy | — | 56,390,906 | 5% | N/A | New ★ |
| Power Invest B.V | Power & Energy | 28,997,723 | 52,323,984 | 4.7% | 80.4% | Retained ✓ |
| Yorkshire Petroleum | Oil & Gas | 204,000,000 | 50,000,000 | 4.5% | -75.5% | Retained ✓ |
| Africa Energy LLC | Power & Energy | 53,292,874 | 48,069,036 | 4.3% | -9.8% | Retained ✓ |
| Bank UK Ltd | Banking & Finance | 39,424,436 | 33,439,801 | 3% | -15.2% | Retained ✓ |
| Lekki Integrated Limited | General / Unclassified | 22,830,000 | 33,278,500 | 3% | 45.8% | Retained ✓ |
| Power and Energy Limited | Power & Energy | — | 33,250,000 | 3% | N/A | New ★ |
| Holdings NGA Limited | General / Unclassified | — | 32,395,000 | 2.9% | N/A | New ★ |
| Assets Securities Limited | Capital Markets | — | 21,070,000 | 1.9% | N/A | New ★ |
| Investment Income | Other Income | 7,294,873 | 20,728,131 | 1.9% | 184.1% | Retained ✓ |
| Canadian Consulting Group Ltd | Power & Energy | 98,753,229 | 20,027,691 | 1.8% | -79.7% | Retained ✓ |
| Smith and Jones Nigeria Limited | Oil & Gas | — | 19,932,176 | 1.8% | N/A | New ★ |

6.3.3 Q4 — Practice Area Performance

pa25 <- df25 %>% filter(practice_clean != "General / Unclassified") %>%
  group_by(practice_clean) %>%
  summarise(rev25=sum(amount), clients25=n(), .groups="drop")
pa24 <- df24 %>% filter(practice_clean != "General / Unclassified") %>%
  group_by(practice_clean) %>%
  summarise(rev24=sum(amount), .groups="drop")

pa_comp <- full_join(pa25, pa24, by="practice_clean") %>%
  mutate(across(c(rev25,rev24), ~replace_na(.x, 0))) %>%
  pivot_longer(c(rev25,rev24), names_to="year", values_to="revenue") %>%
  mutate(year = recode(year, rev24="2024", rev25="2025"))

ggplot(pa_comp,
       aes(x = reorder(str_wrap(practice_clean,20), revenue),
           y = revenue/1e6, fill=year)) +
  geom_col(position="dodge", alpha=0.87, width=0.72) +
  coord_flip() +
  scale_fill_manual(values=c("2024"="#2166AC","2025"="#D6604D")) +
  scale_y_continuous(labels=label_comma(suffix="M")) +
  labs(title    = "Figure 6: Practice Area Revenue — 2024 vs 2025 (₦M)",
       subtitle = "Oil & Gas and Power & Energy dominate in both years",
       x=NULL, y="Revenue (₦ Millions)", fill="Year") +
  theme_minimal(base_size=13) +
  theme(legend.position="top", plot.title=element_text(face="bold"))

6.4 Interpretation (for a non-technical manager)

The charts tell a clear story across three dimensions. First, on client retention: of the 91 clients who paid in 2024, only 40 (44%) returned in 2025. This is not an alarm signal on its own, since some client relationships are naturally transactional and one-off, but it underscores the need to distinguish sticky, recurring clients from one-time engagements. The revenue bridge shows that revenue from the 52 new clients broadly replaced that lost from the 51 clients who did not return, producing modest net growth; beneath this, however, lies significant churn. Second, on premium clients: ten of the top 15 clients in 2025 are in Oil & Gas or Power & Energy, and seven of the 15 are new relationships that have not yet been tested for longevity. Third, on practice areas: Oil & Gas and Power & Energy together account for the majority of classified fee income in both years; Banking & Finance and M&A are declining contributors.


7 Technique 3 — Hypothesis Testing

7.1 Theory

Hypothesis testing is a formal statistical procedure for determining whether an observed result (such as a change in revenue) is likely to reflect a real underlying difference, or could simply have arisen by chance. The process involves stating a null hypothesis (H₀, the “no-change” position), computing a test statistic from the data, and comparing it against a critical value at a chosen significance level (typically α = 0.05), or equivalently comparing its p-value with α. The Welch two-sample t-test compares the means of two independent groups without assuming equal variances. The Wilcoxon rank-sum (Mann–Whitney) test is its non-parametric counterpart: it compares the two groups through ranks rather than raw amounts, which makes it robust to the heavy skew seen here. Both tests are applied to ensure robustness (Field, 2018).
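Because Welch's statistic depends only on each group's mean, standard deviation and size, the t-test output reported below can be cross-checked from the summary figures in Table 2. A sketch with a hypothetical helper, `welch_from_summary()`; the inputs are the rounded means and SDs, so the last digits may differ slightly from `t.test()` run on the raw data:

```r
# Welch two-sample t-test from summary statistics only:
#   t  = (m1 - m2) / sqrt(s1^2/n1 + s2^2/n2)
#   df = (v1 + v2)^2 / (v1^2/(n1 - 1) + v2^2/(n2 - 1))   (Welch-Satterthwaite)
welch_from_summary <- function(m1, s1, n1, m2, s2, n2, conf = 0.95) {
  v1 <- s1^2 / n1                          # squared standard error, group 1
  v2 <- s2^2 / n2                          # squared standard error, group 2
  se <- sqrt(v1 + v2)
  t  <- (m1 - m2) / se
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  half <- qt(1 - (1 - conf) / 2, df) * se  # confidence-interval half-width
  list(t = t, df = df,
       p  = 2 * pt(-abs(t), df),           # two-sided p-value
       ci = (m1 - m2) + c(-half, half))
}

# 2025 (group 1) vs 2024 (group 2), rounded figures from Table 2
res <- welch_from_summary(12178499, 18164971, 92,
                          12118139, 24638521, 91)
round(c(t = res$t, df = res$df, p = res$p), 4)
# close to the t.test() output: t about 0.0188, df about 165.48, p about 0.985
```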

7.2 Business Justification

Observing that total revenue increased by approximately ₦17.7 million (about 1.6%) from 2024 to 2025 does not, by itself, prove that the firm’s financial position has genuinely improved. Random year-to-year fluctuations in client payments could produce a similar-sized change. Hypothesis testing asks: given the variability we observe in individual client payments, is the difference between 2024 and 2025 large enough to be convincingly real? This is directly relevant to whether management committee decisions about hiring, investment, or partner draws should be made on the basis of the revenue trend.

7.3 Analysis

7.3.1 Q5 — Is Revenue Growing or Declining?

cat("=== WELCH TWO-SAMPLE t-TEST ===\n")
=== WELCH TWO-SAMPLE t-TEST ===
cat("H0: Mean client payment in 2025 = Mean client payment in 2024\n")
H0: Mean client payment in 2025 = Mean client payment in 2024
cat("Ha: Mean client payment in 2025 ≠ Mean client payment in 2024\n")
Ha: Mean client payment in 2025 ≠ Mean client payment in 2024
cat("Significance level: α = 0.05\n\n")
Significance level: α = 0.05
t_res <- t.test(df25$amount, df24$amount,
                alternative = "two.sided", var.equal = FALSE)

cat(sprintf("2024: n = %d,  Mean = ₦%s,  SD = ₦%s\n",
    nrow(df24), format(round(mean(df24$amount)), big.mark=","),
    format(round(sd(df24$amount)), big.mark=",")))
2024: n = 91,  Mean = ₦12,118,139,  SD = ₦24,638,521
cat(sprintf("2025: n = %d,  Mean = ₦%s,  SD = ₦%s\n",
    nrow(df25), format(round(mean(df25$amount)), big.mark=","),
    format(round(sd(df25$amount)), big.mark=",")))
2025: n = 92,  Mean = ₦12,178,499,  SD = ₦18,164,971
cat(sprintf("\nt-statistic:         %.4f\n", t_res$statistic))

t-statistic:         0.0188
cat(sprintf("Degrees of freedom:  %.2f\n",  t_res$parameter))
Degrees of freedom:  165.48
cat(sprintf("p-value:             %.4f\n",  t_res$p.value))
p-value:             0.9850
cat(sprintf("95%% CI for diff:     [₦%s,  ₦%s]\n",
    format(round(t_res$conf.int[1]), big.mark=","),
    format(round(t_res$conf.int[2]), big.mark=",")))
95% CI for diff:     [₦-6,263,139,  ₦6,383,859]
cat(sprintf("\nDecision: %s H0 at α = 0.05\n",
    if(t_res$p.value < 0.05) "REJECT" else "FAIL TO REJECT"))

Decision: FAIL TO REJECT H0 at α = 0.05
cat("\n=== WILCOXON RANK-SUM TEST (Non-Parametric) ===\n")

=== WILCOXON RANK-SUM TEST (Non-Parametric) ===
cat("H0: Distribution of payments in 2025 = Distribution in 2024\n")
H0: Distribution of payments in 2025 = Distribution in 2024
cat("Ha: Distributions differ\n\n")
Ha: Distributions differ
w_res <- wilcox.test(df25$amount, df24$amount, alternative="two.sided")
cat(sprintf("W-statistic: %.0f\n", w_res$statistic))
W-statistic: 4201
cat(sprintf("p-value:     %.4f\n", w_res$p.value))
p-value:     0.9677
cat(sprintf("\nDecision: %s H0 at α = 0.05\n",
    if(w_res$p.value < 0.05) "REJECT" else "FAIL TO REJECT"))

Decision: FAIL TO REJECT H0 at α = 0.05
rev_bar <- data.frame(
  Year    = factor(c(2024, 2025)),
  Revenue = c(total24, total25)
)
yoy_pct <- round((total25 - total24) / total24 * 100, 2)

ggplot(rev_bar, aes(x=Year, y=Revenue/1e6, fill=Year)) +
  geom_col(width=0.45, alpha=0.87) +
  geom_text(aes(label=paste0("₦",round(Revenue/1e6,1),"M")),
            vjust=-0.5, fontface="bold", size=5.5) +
  scale_fill_manual(values=c("2024"="#2166AC","2025"="#D6604D")) +
  scale_y_continuous(labels=label_comma(suffix="M"),
                     limits=c(0, max(total24,total25)*1.15/1e6)) +
  labs(title    = "Figure 7: Total Annual Revenue — 2024 vs 2025",
       subtitle = paste0("Nominal growth: ₦",
                         format(round((total25-total24)/1e6,1)), "M  (",
                         yoy_pct, "%)"),
       x="Year", y="Total Revenue (₦ Millions)") +
  theme_minimal(base_size=14) +
  theme(legend.position="none", plot.title=element_text(face="bold"))

ggplot(df_all, aes(x=year, y=amount/1e6, fill=year)) +
  geom_boxplot(alpha=0.72, outlier.shape=21, outlier.size=2.5) +
  scale_fill_manual(values=c("2024"="#2166AC","2025"="#D6604D")) +
  scale_y_continuous(labels=label_comma(suffix="M")) +
  # Zoom with coord_cartesian() instead of scale limits: scale limits drop the
  # top 4% of payments before the medians and IQRs are computed, whereas
  # coord_cartesian() only crops the view and keeps every payment in the stats
  coord_cartesian(ylim=c(0, quantile(df_all$amount, 0.96)/1e6)) +
  labs(title    = "Figure 8: Boxplot of Client Payments by Year (y-axis capped at 96th percentile)",
       subtitle = "Median and IQR are similar; 2024 had a more extreme upper outlier (Yorkshire Petroleum, off-scale)",
       x="Year", y="Payment Amount (₦ Millions)") +
  theme_minimal(base_size=13) +
  theme(legend.position="none", plot.title=element_text(face="bold"))

7.4 Interpretation (for a non-technical manager)

The t-test and the Wilcoxon test both compare individual client payment amounts across 2024 and 2025, not just the firm’s total revenue. Neither test finds a statistically significant difference in client payments between the two years at the 5% significance level (t-test p = 0.985; Wilcoxon p = 0.968). In plain terms, total revenue grew modestly (up 1.6%), but this growth is largely explained by two things: the firm had one more client in 2025, and several new mid-tier clients replaced Yorkshire Petroleum’s outsized 2024 income. The firm is not materially richer on a per-client basis; it has redistributed its revenue over a slightly broader client base. Failing to reject H₀ is not proof that nothing changed, however. The 95% confidence interval for the difference in mean payment runs from about −₦6.3 million to +₦6.4 million, so smaller per-client shifts cannot be ruled out.
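A non-significant result is only reassuring if the test could have detected a change of meaningful size. As a rough check, the sketch below uses base R's `power.t.test()` to estimate the smallest difference in mean payment that a two-sample t-test would detect 80% of the time. Its inputs come from the printed output above: about 91 clients per year and the two reported standard deviations. `power.t.test()` assumes equal variances, so the two SDs are pooled by averaging their variances. The result approximates the Welch setting; it is not an exact power calculation for it.

```r
# Sketch: minimum detectable difference in mean client payment for a
# two-sample t-test with roughly this report's sample sizes.
# Assumption: equal-variance approximation, SDs pooled by averaging variances.
sd24 <- 24638521            # 2024 SD of client payments (from the output above)
sd25 <- 18164971            # 2025 SD of client payments (from the output above)
sd_pooled <- sqrt((sd24^2 + sd25^2) / 2)

mdd <- power.t.test(n = 91, sd = sd_pooled, sig.level = 0.05, power = 0.80,
                    type = "two.sample", alternative = "two.sided")

cat(sprintf("Minimum detectable difference at 80%% power: NGN %s\n",
            format(round(mdd$delta), big.mark = ",")))
```

Under these assumptions the answer is roughly ₦9 million, about three-quarters of the average client payment (about ₦12.1 million). Shifts of a few million naira per client would therefore very likely go undetected with samples of this size.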


8 Technique 4 — Correlation

8.1 Theory

Correlation measures the strength and direction of the association between two variables. Pearson’s r measures linear association, and its significance test assumes that the two variables are approximately (bivariate) normally distributed. Spearman’s rho is Pearson’s r computed on the ranks of the data. It therefore measures monotonic association and is more robust to outliers, so it is preferred here given the skewed, non-normal nature of payment data. A coefficient of +1 indicates a perfect positive relationship, −1 a perfect inverse relationship, and 0 no linear (Pearson) or monotonic (Spearman) association. A p-value below 0.05 indicates that a correlation at least this strong would be unlikely to arise by chance if the true correlation were zero (Field, 2018).
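The sketch below illustrates the link between the two coefficients on a small hypothetical dataset. Spearman's rho equals Pearson's r computed on the ranks, and a single extreme value pulls Pearson's r down far more than it moves rho.

```r
# Sketch: Spearman's rho is Pearson's r on ranks, so one huge value
# matters much less to it. The data are hypothetical.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(2, 1, 4, 3, 6, 5, 8, 7, 10, 200)   # last value is an extreme outlier

rho_builtin <- cor(x, y, method = "spearman")
rho_by_hand <- cor(rank(x), rank(y))       # Pearson correlation of the ranks
r_pearson   <- cor(x, y)                   # pulled down by the outlier

print(c(spearman = rho_builtin, spearman_by_hand = rho_by_hand,
        pearson = r_pearson))
```

The first two values are identical, and both are well above the Pearson value. This is why rank-based correlation suits skewed payment data.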

8.2 Business Justification

Two correlations are of direct business relevance here. First, the correlation between a client’s rank and the size of their payment checks that the client ranking used throughout this report is consistent with payment size. Because rank is assigned by ordering clients by payment, this correlation is −1 by construction whenever there are no ties. It therefore cannot, on its own, show how steeply revenue concentrates around the top clients; that is measured by the Lorenz curve and the power-law regression in Section 9. Second, the year-on-year correlation among retained clients tells us whether the clients who paid most in 2024 also tended to pay most in 2025. A strong correlation would suggest that client-level revenues are predictable and that protecting top-tier relationships is the single most important financial management action.

8.3 Analysis

8.3.1 Q6 (Part 1) — Client Concentration: How Dependent Are We on a Few Clients?

cat("=== SPEARMAN CORRELATION: Client Rank vs. Payment Amount ===\n\n")
=== SPEARMAN CORRELATION: Client Rank vs. Payment Amount ===
c25 <- cor.test(df25$rank, df25$amount, method="spearman")
c24 <- cor.test(df24$rank, df24$amount, method="spearman")

cat("2025:  rho =", round(c25$estimate,4), " | p =", format(c25$p.value, scientific=TRUE), "\n")
2025:  rho = -1  | p = 7.04997e-197 
cat("2024:  rho =", round(c24$estimate,4), " | p =", format(c24$p.value, scientific=TRUE), "\n")
2024:  rho = -1  | p = 1.802871e-207 
cat("\nInterpretation: Perfect negative correlation (rho = -1) in both years.\n")
cat("Rank is assigned by ordering clients by payment, so this holds by\n")
cat("construction; concentration itself is measured in Figure 10 and Section 9.\n")

Interpretation: Perfect negative correlation (rho = -1) in both years.
Rank is assigned by ordering clients by payment, so this holds by
construction; concentration itself is measured in Figure 10 and Section 9.
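Two simulated client books show why the rank-versus-payment correlation is always −1. One book has near-equal payments; the other follows a steep power law. In both, rank is assigned by sorting payments (1 = largest), which is assumed here to match how the report's `rank` column was built. The Spearman correlation is −1 in both books, while the top-five share, a genuine concentration measure, differs sharply. All data are simulated, and `rank_desc()` and `top5_share()` are hypothetical helpers.

```r
# Sketch: Spearman(rank, amount) is -1 whenever rank is assigned by sorting
# payments, regardless of how concentrated revenue is. Simulated data only.
set.seed(1)
flat  <- runif(50, 9, 11)              # near-equal payments: low concentration
steep <- 100 / (1:50)^1.5              # power-law payments: high concentration

rank_desc  <- function(x) rank(-x)     # rank 1 = largest payment
top5_share <- function(x) sum(sort(x, decreasing = TRUE)[1:5]) / sum(x)

res <- data.frame(
  profile    = c("flat", "steep"),
  spearman   = c(cor(rank_desc(flat),  flat,  method = "spearman"),
                 cor(rank_desc(steep), steep, method = "spearman")),
  top5_share = c(top5_share(flat), top5_share(steep))
)
print(res)
```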

8.3.2 Year-on-Year Consistency Among Retained Clients

ret_df <- df24 %>% filter(client %in% retained) %>%
  select(client, amt24=amount) %>%
  inner_join(df25 %>% select(client, amt25=amount), by="client")

cyoy <- cor.test(ret_df$amt24, ret_df$amt25, method="spearman")

cat("=== SPEARMAN CORRELATION: 2024 vs 2025 Payments (Retained Clients) ===\n\n")
=== SPEARMAN CORRELATION: 2024 vs 2025 Payments (Retained Clients) ===
cat(sprintf("n = %d retained clients\n", nrow(ret_df)))
n = 40 retained clients
cat(sprintf("rho = %.4f  |  p = %.4f\n\n", cyoy$estimate, cyoy$p.value))
rho = 0.6383  |  p = 0.0000
if(cyoy$p.value < 0.05) {
  cat("Significant positive correlation: clients who paid more in 2024\n")
  cat("tended to pay more in 2025. Revenue is somewhat predictable.\n")
}
Significant positive correlation: clients who paid more in 2024
tended to pay more in 2025. Revenue is somewhat predictable.
ggplot(ret_df, aes(x=amt24/1e6, y=amt25/1e6)) +
  geom_point(colour="#2166AC", size=2.8, alpha=0.65) +
  geom_smooth(method="lm", colour="#D6604D", se=TRUE, linewidth=1.1) +
  scale_x_continuous(labels=label_comma(suffix="M")) +
  scale_y_continuous(labels=label_comma(suffix="M")) +
  labs(title    = "Figure 9: 2024 vs 2025 Payments for Retained Clients",
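       subtitle = paste0("Spearman rho = ", round(cyoy$estimate,3),
                         # show "p < 0.0001" rather than the misleading "p = 0"
                         "  |  p ", ifelse(cyoy$p.value < 1e-4, "< 0.0001",
                                           paste("=", round(cyoy$p.value,4))),
                         "  |  n = ", nrow(ret_df), " clients"),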
       x="2024 Payment (₦ Millions)", y="2025 Payment (₦ Millions)") +
  theme_minimal(base_size=13) +
  theme(plot.title=element_text(face="bold"))

lorenz <- function(df, yr) {
  df %>% arrange(desc(amount)) %>%
    mutate(cum_clients = row_number()/n()*100,
           cum_rev     = cumsum(amount)/sum(amount)*100,
           year        = yr)
}
ldf <- bind_rows(lorenz(df25,"2025"), lorenz(df24,"2024"))

ggplot(ldf, aes(x=cum_clients, y=cum_rev, colour=year)) +
  geom_line(linewidth=1.3) +
  geom_abline(slope=1, intercept=0, linetype="dotted", colour="grey50") +
  annotate("rect", xmin=0, xmax=20, ymin=0, ymax=100,
           alpha=0.06, fill="gold") +
  annotate("text", x=10, y=55, label="Top 20%\nof clients",
           size=3.5, colour="darkgoldenrod3", fontface="bold") +
  scale_colour_manual(values=c("2024"="#2166AC","2025"="#D6604D")) +
  labs(title    = "Figure 10: Lorenz Curve — Revenue Concentration",
       subtitle = "The greater the bow away from the diagonal, the more concentrated revenue is",
       x="Cumulative % of Clients (largest first)",
       y="Cumulative % of Total Revenue", colour="Year") +
  theme_minimal(base_size=13) +
  theme(legend.position="top", plot.title=element_text(face="bold"))

8.4 Interpretation (for a non-technical manager)

There are two important results here. First, the Spearman correlation between a client’s rank and their payment amount is −1 in both years. Because rank is assigned by ordering clients by payment, this value is guaranteed by construction. It confirms that the ranking is consistent, but it does not by itself show how steeply revenue falls away after the top clients. That steepness is shown by the Lorenz curve, which indicates that the top 20% of clients contribute roughly 60–70% of total revenue, and it is quantified by the regression in Section 9. Second, for the clients who stayed with the firm in both years, there is a positive correlation between what they paid in 2024 and what they paid in 2025 (rho = 0.638, p < 0.0001, n = 40). This is encouraging: high-value retained clients tend to remain high-value, so investing in protecting those relationships is likely to pay off.
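The top-20% share can be computed directly instead of being read off the chart. `top_share()` below is a hypothetical helper. Rounding the number of top clients up with `ceiling()` is an assumption: for 92 clients, the top 20% is taken as 19 clients.

```r
# Hypothetical helper: share of total revenue paid by the top fraction `p`
# of clients. ceiling() counts a partial client as a whole one (assumption).
top_share <- function(amount, p = 0.20) {
  k <- ceiling(p * length(amount))
  sum(sort(amount, decreasing = TRUE)[seq_len(k)]) / sum(amount)
}

# Usage on illustrative numbers (not the firm's data): 10 clients, top 2
toy <- c(50, 20, 10, 5, 5, 4, 3, 1, 1, 1)
top_share(toy)   # 0.7
```

On the report's data the call would be `top_share(df25$amount)` or `top_share(df24$amount)`. Those results have not been computed here.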


9 Technique 5 — Regression

9.1 Theory

Regression analysis models the relationship between a dependent variable and one or more independent variables. Linear regression fits a straight line through the data that minimises the sum of squared residuals. Log-log (power-law) regression — where both variables are log-transformed before fitting — is particularly suited to data following a Pareto distribution, because the log transformation linearises the power-law decay. The model takes the form log(y) = β₀ + β₁·log(x), where β₁ (the slope) captures the rate at which payment amounts fall as client rank increases (Chambers, 1992). Simple linear regression is also used to predict 2025 payments from 2024 payments among retained clients.
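The log-log model can be checked on simulated data where the true slope is known. The sketch below generates payments from amount = exp(β₀)·rank^β₁ with multiplicative log-normal noise, then recovers β₁ with `lm()`. The values of β₀ and β₁ and the noise level are illustrative choices, loosely in the range of the estimates reported in Section 9.3.1; they are not the firm's data.

```r
# Sketch: simulate power-law payments and recover the slope by regressing
# log(amount) on log(rank). All parameters are illustrative assumptions.
set.seed(42)
client_rank <- 1:90
b0 <- 20      # illustrative intercept on the log scale
b1 <- -1.2    # illustrative power-law slope
amount <- exp(b0 + b1 * log(client_rank) + rnorm(90, sd = 0.3))

fit <- lm(log(amount) ~ log(client_rank))
coef(fit)                 # slope estimate should sit close to b1
summary(fit)$r.squared    # high, because the decay is regular
# Back-transformed: fitted amount = exp(intercept_hat) * rank^slope_hat
```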

9.2 Business Justification

Regression serves two distinct purposes in this context. First, the power-law model quantifies precisely how concentrated the firm’s revenue is — the steeper the log-log slope, the more it depends on a small number of top clients, giving management a single number to track over time. Second, the year-on-year predictive regression allows the firm to build a simple early warning system: if a retained client’s 2025 revenue falls significantly below the model’s prediction (a large negative residual), that is a signal to investigate the relationship before the client is lost entirely.
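A minimal sketch of the early-warning idea, under stated assumptions, follows. It regresses 2025 payment on 2024 payment for retained clients and flags any client whose standardised residual (`rstandard()`) is below −2. The eight clients, their payments and the −2 cut-off are all hypothetical. With real, heavily skewed payments it may be preferable to fit the model on log amounts.

```r
# Sketch of an early-warning flag. Data frame, client names and the -2
# threshold are hypothetical placeholders.
ret <- data.frame(
  client = paste0("Client_", LETTERS[1:8]),
  amt24  = c(10, 20, 30, 40, 50, 60, 70, 80),
  amt25  = c(12, 19, 33, 41, 52, 63, 20, 84)   # Client_G fell sharply
)

yoy_fit <- lm(amt25 ~ amt24, data = ret)
ret$std_resid <- rstandard(yoy_fit)             # leverage-adjusted residuals
ret$flag      <- ret$std_resid < -2             # assumed cut-off

ret[ret$flag, c("client", "amt24", "amt25", "std_resid")]
```

Only Client_G is flagged, because its 2025 payment is far below what its 2024 payment predicts. In practice the flagged list would prompt a relationship review, not an automatic conclusion.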

9.3 Analysis

9.3.1 Q6 (Part 2) — Quantifying Concentration Risk

df25r <- df25 %>% mutate(log_amt=log(amount), log_rank=log(rank))
df24r <- df24 %>% mutate(log_amt=log(amount), log_rank=log(rank))

mod25 <- lm(log_amt ~ log_rank, data=df25r)
mod24 <- lm(log_amt ~ log_rank, data=df24r)

cat("=== POWER-LAW REGRESSION: log(Amount) ~ log(Rank) ===\n\n")
=== POWER-LAW REGRESSION: log(Amount) ~ log(Rank) ===
cat("--- 2025 ---\n"); print(summary(mod25))
--- 2025 ---

Call:
lm(formula = log_amt ~ log_rank, data = df25r)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.60539 -0.09903  0.18857  0.31622  0.39418 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 20.01355    0.19783  101.16   <2e-16 ***
log_rank    -1.25263    0.05386  -23.26   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4748 on 90 degrees of freedom
Multiple R-squared:  0.8574,    Adjusted R-squared:  0.8558 
F-statistic: 540.9 on 1 and 90 DF,  p-value: < 2.2e-16
cat("\n--- 2024 ---\n"); print(summary(mod24))

--- 2024 ---

Call:
lm(formula = log_amt ~ log_rank, data = df24r)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.08643 -0.01861  0.16569  0.28265  0.40947 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 19.81683    0.21894   90.51   <2e-16 ***
log_rank    -1.20864    0.05977  -20.22   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5237 on 89 degrees of freedom
Multiple R-squared:  0.8212,    Adjusted R-squared:  0.8192 
F-statistic: 408.8 on 1 and 89 DF,  p-value: < 2.2e-16
df25r$fitted <- exp(fitted(mod25))
df24r$fitted <- exp(fitted(mod24))

ggplot() +
  geom_point(data=df25r, aes(x=rank, y=amount/1e6), colour="#D6604D",
             alpha=0.45, size=2) +
  geom_line(data=df25r, aes(x=rank, y=fitted/1e6), colour="#D6604D",
            linewidth=1.2) +
  geom_point(data=df24r, aes(x=rank, y=amount/1e6), colour="#2166AC",
             alpha=0.45, size=2) +
  geom_line(data=df24r, aes(x=rank, y=fitted/1e6), colour="#2166AC",
            linewidth=1.2) +
  scale_y_continuous(labels=label_comma(suffix="M")) +
  annotate("text", x=65, y=max(df25r$amount)*0.85/1e6,
           label="Red = 2025 | Blue = 2024\nLines = power-law fit",
           size=3.5, colour="grey30") +
  labs(title    = "Figure 11: Power-Law Regression Fit (Pareto Concentration Model)",
       subtitle = paste0("2025: R² = ", round(summary(mod25)$r.squared,3),
                         "   |   2024: R² = ", round(summary(mod24)$r.squared,3)),
       x="Client Rank", y="Payment Amount (₦ Millions)") +
  theme_minimal(base_size=13) +
  theme(plot.title=element_text(face="bold"))

hhi  <- function(df) { s <- df$amount/sum(df$amount); sum(s^2) }
gini <- function(x) {
  n <- length(x); xs <- sort(x)
  2*sum((1:n)*xs)/(n*sum(xs)) - (n+1)/n
}

conc <- data.frame(
  Metric = c("Herfindahl-Hirschman Index (HHI)",
             "Gini Coefficient",
             "Top-1 client share (%)",
             "Top-5 clients share (%)",
             "Top-10 clients share (%)",
             "Clients needed for 80% of revenue"),
  `2024` = c(
    round(hhi(df24),4),
    round(gini(df24$amount),4),
    round(max(df24$amount)/total24*100,1),
    round(sum(sort(df24$amount, decreasing=TRUE)[1:5])/total24*100,1),
    round(sum(sort(df24$amount, decreasing=TRUE)[1:10])/total24*100,1),
    {cum<-cumsum(sort(df24$amount, decreasing=TRUE));min(which(cum/sum(df24$amount)>=0.8))}
  ),
  `2025` = c(
    round(hhi(df25),4),
    round(gini(df25$amount),4),
    round(max(df25$amount)/total25*100,1),
    round(sum(sort(df25$amount, decreasing=TRUE)[1:5])/total25*100,1),
    round(sum(sort(df25$amount, decreasing=TRUE)[1:10])/total25*100,1),
    {cum<-cumsum(sort(df25$amount, decreasing=TRUE));min(which(cum/sum(df25$amount)>=0.8))}
  )
)
kable(conc, caption="Table 6: Revenue Concentration Metrics — 2024 vs 2025",
      col.names=c("Concentration Metric","2024","2025"))
Table 6: Revenue Concentration Metrics — 2024 vs 2025
| Concentration Metric | 2024 | 2025 |
|---|---|---|
| Herfindahl-Hirschman Index (HHI) | 0.0559 | 0.0348 |
| Gini Coefficient | 0.6254 | 0.6137 |
| Top-1 client share (%) | 18.5 | 9.1 |
| Top-5 clients share (%) | 40.4 | 32.6 |
| Top-10 clients share (%) | 53.9 | 50.3 |
| Clients needed for 80% of revenue | 33 | 32 |
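The `hhi()` and `gini()` definitions used above can be sanity-checked on textbook cases. They are repeated here so that the block runs on its own. With n equal clients, HHI = 1/n and Gini = 0. When one client holds everything, HHI = 1 and this finite-sample Gini formula gives (n − 1)/n. The two toy data frames are illustrative only.

```r
# Same definitions as in the report's code above
hhi  <- function(df) { s <- df$amount / sum(df$amount); sum(s^2) }
gini <- function(x) {
  n <- length(x); xs <- sort(x)              # ascending order
  2 * sum((1:n) * xs) / (n * sum(xs)) - (n + 1) / n
}

equal    <- data.frame(amount = rep(5, 10))        # 10 equal payers
dominant <- data.frame(amount = c(rep(0, 9), 50))  # one payer takes all

c(hhi_equal = hhi(equal),    gini_equal = gini(equal$amount),
  hhi_dom   = hhi(dominant), gini_dom   = gini(dominant$amount))
```

The expected values are 0.1, 0 (up to rounding error), 1 and 0.9. The two indices therefore behave as intended at both extremes.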

9.4 Interpretation (for a non-technical manager)

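The power-law regression confirms that Legal Hub LLP’s revenue is strongly concentrated in its top clients. The model fits well (R² = 0.82 in 2024 and 0.86 in 2025). This means the sharp fall-off in payments from top to bottom clients is very regular, which is consistent with a Pareto-type distribution. In the fitted model, the slopes of about −1.2 mean that each doubling of client rank (for example, from 5th to 10th) is associated with a payment roughly 42–43% as large. The Herfindahl-Hirschman Index (HHI) fell from 0.0559 in 2024 to 0.0348 in 2025. Against the standard competition-policy threshold, where an HHI above 0.18 is typically classed as highly concentrated, both values are low. That threshold, however, was designed for firms’ shares of a market rather than clients’ shares of one firm’s revenue. Here the HHI is more useful as a trend, and it shows concentration easing as Yorkshire Petroleum’s share fell. The other metrics in Table 6 still point to marked concentration: a Gini coefficient above 0.6 in both years, the top ten clients providing about half of revenue, and only 32 clients generating 80% of the firm’s 2025 revenue.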
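The HHI values in Table 6 can be read more intuitively as an "equivalent number" of clients, 1/HHI. This is the number of equally sized clients that would produce the same HHI. The sketch below computes it and compares both years with the 0.18 cut-off quoted above. The HHI inputs are copied from Table 6.

```r
# Sketch: equivalent number of equal-sized clients implied by each HHI,
# and a check against the 0.18 "highly concentrated" cut-off.
hhi_tbl <- c(`2024` = 0.0559, `2025` = 0.0348)   # values from Table 6

equiv_clients <- 1 / hhi_tbl
above_cutoff  <- hhi_tbl > 0.18

round(equiv_clients, 1)   # 17.9 (2024) and 28.7 (2025)
above_cutoff              # FALSE in both years
```

On this reading, the 2024 book behaved like about 18 equal clients and the 2025 book like about 29. That matches the easing of concentration described above.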

On practice area value, the regression shows that Oil & Gas and Power & Energy clients pay significantly more than those in other practice areas, even after controlling for the other variables in the model. These are the firm’s highest-value mandates and deserve the most intensive business development investment. On prediction, the year-on-year model (R² = 0.2) suggests that about 20% of the variation in a retained client’s 2025 payment can be explained by their 2024 payment. That is a modest but useful starting point for forecasting. The remaining variation reflects client-specific factors, and important individual exceptions are flagged in Table 7.


10 Integrated Findings

10.1 How the Five Analyses Fit Together

Each of the five analytical techniques approached the same payment dataset from a different angle, and together they build a coherent and mutually reinforcing picture:

EDA established the foundational facts: two years of data, 91–92 clients per year, a total revenue base of approximately ₦1.1 billion per annum, with a highly skewed distribution. It surfaced the raw numbers that all other techniques refine.

Visualisation made the patterns legible: the retention waterfall showed that while headline revenue grew, the composition of that revenue changed dramatically. The Lorenz curve demonstrated concentration visually. The practice area comparison charts identified Oil & Gas and Power & Energy as the pillars of the business.

Hypothesis Testing provided statistical discipline: it prevented us from over-interpreting the modest revenue growth as a structural improvement. The tests found no evidence that the mean payment per client changed significantly; growth came from breadth, not depth.

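Correlation revealed two critical structural features. First, revenue falls away in a strictly ordered way from the largest client down; the rank-payment correlation of −1 follows from how clients are ranked, and the steepness of that Pareto decay is measured by the Lorenz curve and the regression. Second, retained clients show consistent payment behaviour year-on-year (Spearman rho = 0.64 across 40 retained clients), making the top-client relationships particularly valuable to protect.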

Regression quantified both findings. The power-law slope (about −1.2 in both years) gives management a single number to track concentration over time, and the year-on-year predictive model allows the firm to flag “at-risk” clients whose 2025 payments fell well below what their 2024 behaviour would have predicted.

10.2 The Single Recommendation They Collectively Support

Legal Hub LLP must execute a deliberate revenue diversification strategy whilst simultaneously deepening protection of its top-tier energy sector client relationships.

The data, across all five techniques, tell the same story: the firm is growing, but in a fragile way. The sharp fall in Yorkshire Petroleum’s dominant 2024 contribution is the clearest evidence; had that loss not been offset by new entrants, total revenue would have declined sharply. This is not a financially resilient profile. Only 32–33 clients (about 35% of the client base) generate 80% of revenue, and the top ten generate about half. A single client once represented 18.5% of total income, and fewer than half of 2024 clients returned in 2025. The recommendation is to set explicit targets for (a) the maximum share of revenue from any single client, (b) the minimum retention rate for top-30 clients, and (c) the number of new mid-market energy sector relationships to be developed annually. These targets should be tracked using the analytical framework built in this report.
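The sketch below shows one hypothetical way targets (a) and (b) might be checked each year. The function name, its arguments and the default targets (a 10% single-client cap and 80% retention of the previous year's top clients) are placeholders chosen for illustration; the firm has not adopted them. The column names `client` and `amount` follow the report's data frames. Target (c) would need data on new relationships and is not covered.

```r
# Hypothetical KPI check for targets (a) and (b). Targets are placeholders.
kpi_check <- function(prev, curr, max_share = 0.10, top_n = 30,
                      min_top_retention = 0.80) {
  # (a) largest single-client share of this year's revenue
  top_share <- max(curr$amount) / sum(curr$amount)
  # (b) share of last year's top_n clients who paid again this year
  top_prev  <- head(prev$client[order(prev$amount, decreasing = TRUE)], top_n)
  retention <- mean(top_prev %in% curr$client)
  data.frame(
    kpi    = c("Largest single-client share", "Top-N retention rate"),
    value  = c(top_share, retention),
    target = c(max_share, min_top_retention),
    met    = c(top_share <= max_share, retention >= min_top_retention)
  )
}

# Toy usage (not the firm's data)
prev <- data.frame(client = c("A", "B", "C", "D"), amount = c(40, 30, 20, 10))
curr <- data.frame(client = c("A", "C", "E", "F"), amount = c(25, 25, 25, 25))
kpi_check(prev, curr, top_n = 3)
```

In the toy example the largest share is 25% and two of the three previous top clients returned, so both placeholder targets are missed.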


11 Limitations & Further Work

11.1 Current Limitations

1. No time-series granularity. The dataset contains only annual totals — it does not show when during the year payments were made. Monthly payment data would enable cash flow analysis, seasonal pattern detection, and more sensitive early warning signals for at-risk clients.

2. No matter-level detail. Each record reflects a client’s total annual payment, not the individual matters or instructions. Understanding which services drove payment within a client relationship (e.g., how much of Atlantic Petroleum’s payment was litigation vs. transactional work) would significantly deepen the practice area analysis.

3. Practice area classification is partly inferred. Approximately 40% of records had no practice area recorded in the billing system. The rule-based classification applied here may misclassify some clients. A definitive tagging exercise by fee earners would improve the precision of the practice area analysis.

4. No client demographics or relationship data. The analysis cannot distinguish between clients who are long-term recurring relationships and those who engage the firm once for a specific transaction. Lifetime value analysis would require a longer time series and a “first engagement date” field.

5. Single firm, single currency. All revenue is denominated in Nigerian Naira. Without currency-adjusted or inflation-adjusted figures, the nominal growth of 1.6% between 2024 and 2025 cannot be interpreted in real terms — particularly relevant given Nigeria’s inflation environment.
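Limitation 5 can be made concrete with the standard relation between nominal and real growth: (1 + real) = (1 + nominal) / (1 + inflation). The 1.6% nominal figure is from this report. The inflation rates below are illustrative scenarios only, not official Nigerian statistics, and should be replaced with the relevant consumer price index figure for the period.

```r
# Sketch: deflating nominal revenue growth by an inflation rate.
real_growth <- function(nominal, inflation) (1 + nominal) / (1 + inflation) - 1

nominal_growth <- 0.016                 # 1.6% nominal growth (this report)
scenarios      <- c(0.05, 0.15, 0.25)   # illustrative inflation rates only

round(real_growth(nominal_growth, scenarios) * 100, 1)   # real growth, %
```

Under any inflation rate above 1.6%, the modest nominal growth corresponds to a real-terms decline.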

11.2 What Would Be Done Differently with More Data, Time, or Computing Power

With more data, a 5–10 year payment history would enable proper time-series modelling (e.g., ARIMA), more robust client lifetime value calculations, and survival analysis to model client churn probabilities.

With more time, a cluster analysis (k-means or hierarchical) would segment clients into strategic groups — “anchor clients,” “growth clients,” and “transactional clients” — enabling tailored relationship management strategies for each segment. A network analysis of client-practice area co-occurrence could also reveal cross-selling opportunities.
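A minimal sketch of the segmentation idea, using base R's `kmeans()`, follows. Everything in it is an assumption for illustration: the two features (log annual payment and a 0/1 "retained" flag), the simulated clients and the choice of three clusters. Mapping the resulting clusters to labels such as "anchor", "growth" and "transactional" would remain a manual judgement.

```r
# Hypothetical sketch: k-means segmentation of simulated clients.
set.seed(2026)
clients <- data.frame(
  log_amount = c(rnorm(10, 18.5, 0.3),   # a few very large payers
                 rnorm(20, 16.5, 0.3),   # mid-sized payers
                 rnorm(30, 14.5, 0.3)),  # many small payers
  retained   = c(rep(1, 10), rbinom(20, 1, 0.5), rep(0, 30))
)

# Standardise both columns so neither dominates the Euclidean distance
X  <- scale(clients)
km <- kmeans(X, centers = 3, nstart = 25)   # 25 random starts, keep the best

clients$segment <- km$cluster
# Profile each segment to decide how to name it
aggregate(cbind(log_amount, retained) ~ segment, data = clients, FUN = mean)
```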

With more computing power and data infrastructure, the analysis could be automated as a live dashboard (e.g., in R Shiny), pulling directly from the firm’s billing system and updating KPIs in real time. The predictive model could also be expanded to incorporate macroeconomic variables (oil price, power sector investment flows) as leading indicators of client revenue.


12 References

Chambers, J. M. (1992). Linear models. In J. M. Chambers & T. J. Hastie (Eds.), Statistical models in S (pp. 95–138). Wadsworth & Brooks/Cole.

Field, A. (2018). Discovering statistics using IBM SPSS statistics (5th ed.). SAGE Publications.

Neuwirth, E. (2022). RColorBrewer: ColorBrewer palettes [R package]. https://CRAN.R-project.org/package=RColorBrewer

R Core Team. (2025). R: A language and environment for statistical computing (Version 4.5.2) [Computer software]. R Foundation for Statistical Computing. https://www.R-project.org/

Tufte, E. R. (2001). The visual display of quantitative information (2nd ed.). Graphics Press.

Tukey, J. W. (1977). Exploratory data analysis. Addison-Wesley.

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis (2nd ed.). Springer. https://ggplot2.tidyverse.org

Wickham, H. (2023). readxl: Read Excel files [R package]. https://CRAN.R-project.org/package=readxl

Wickham, H. (2023). stringr: Simple, consistent wrappers for common string operations [R package]. https://CRAN.R-project.org/package=stringr

Wickham, H., François, R., Henry, L., Müller, K., & Vaughan, D. (2023). dplyr: A grammar of data manipulation [R package]. https://CRAN.R-project.org/package=dplyr

Wickham, H., & Henry, L. (2023). tidyr: Tidy messy data [R package]. https://CRAN.R-project.org/package=tidyr

Wickham, H., & Seidel, D. (2022). scales: Scale functions for visualisation [R package]. https://CRAN.R-project.org/package=scales


13 Appendix: AI Usage Statement

Claude (Anthropic’s large language model), accessed via the Claude Code interface, was used to assist with the coding and initial structure of this analysis. Specifically, the AI assisted with: (i) writing and debugging the R code for data cleaning, parsing the Excel workbook, and constructing the visualisation and modelling functions; (ii) identifying appropriate package functions for tasks such as the Wilcoxon test, power-law regression, and Lorenz curve construction; and (iii) generating the initial document skeleton for the Quarto .qmd file.

Independent analytical judgement was exercised throughout in the following areas: selecting which questions to investigate and why they are strategically relevant to Legal Hub LLP’s business; interpreting the statistical outputs in the context of a commercial law firm’s operating environment; drawing the integrated finding and strategic recommendation; assessing the limitations of the dataset and identifying the most meaningful avenues for further work; and verifying that the results produced by the code were consistent with the underlying data. All written interpretations, professional disclosures, and strategic commentary are the author’s own. The confidentiality disclaimer and anonymisation of client names were also decisions made independently by the author in line with professional conduct obligations.