Legal Hub LLP — Client Payment Analytics (2024–2025)

Case Study 1: Exploratory & Inferential Analytics

Author

Anthony Ezeamama | EMBA-31 | LBS

Published

April 1, 2026

Confidentiality Disclaimer

All client and counterparty names appearing in the underlying dataset have been changed from their real identities for confidentiality and professional conduct purposes. The financial figures are real and have not been altered. This anonymisation was carried out prior to analysis and has no bearing on the integrity of the findings presented in this report. See Section 3.6 for the full ethical and confidentiality statement.

1 Executive Summary

Legal Hub LLP is a full-service commercial law firm whose continued financial health depends on understanding how client payment behaviour evolves from year to year. This report analyses the firm’s internal client payment records for the two full financial years 2024 and 2025, encompassing 91 and 92 client relationships respectively, with combined revenues of approximately ₦2.22 billion across the two years.

Using five analytical techniques — Exploratory Data Analysis (EDA), Visualisation, Hypothesis Testing, Correlation, and Regression — the report addresses seven business questions ranging from revenue trends and client retention to practice area performance and concentration risk.

Key findings reveal that total revenue grew modestly by approximately 1.6% year-on-year, but this headline growth conceals significant turbulence: the firm’s single largest client in 2024 (accounting for 18.5% of total revenue) reduced its payments by about 75% in 2025, and fewer than half of 2024 clients returned. Revenue is heavily concentrated in Pareto fashion: the top five clients contributed roughly a third of 2025 income (32.6%, per Table 5) and a larger share in 2024, which creates meaningful dependency risk. Oil & Gas and Power & Energy are overwhelmingly the most commercially valuable practice areas.

The central recommendation is that the firm must urgently diversify its client base while intensifying relationship management with its top-tier energy sector clients. Continued reliance on a small number of high-value mandates in a single sector represents a material strategic risk.

2 Professional Disclosure

Name: Anthony Ezeamama
Job Title: Partner
Organisation: Legal Hub LLP
Organisation Type: Commercial Law Firm — Professional Services
Sector: Legal Services — (1) Oil & Gas; (2) Power; (3) Banking & Finance; (4) Capital Markets; (5) M&A; (6) Technology and Digital Economy; (7) Corporate & Commercial; (8) Legal Tax Advisory; (9) Private Equity; and (10) Real Estate & Construction

As a Partner at Legal Hub LLP, I hold both client-facing and firm management responsibilities. My role encompasses originating and leading client engagements, overseeing the profitability of my client portfolio, and contributing to the firm’s strategic planning as a member of its management leadership. The five analytical techniques applied in this report are directly relevant to my professional practice in the following ways:

Exploratory Data Analysis (EDA): As a Partner responsible for firm revenue, I regularly review payment summaries and client account reports. EDA formalises this review process — it gives me a systematic, evidence-based picture of who is paying, how much, and in what pattern. This is operationally critical during annual budget reviews, partner profit-sharing discussions, and client retention planning. Rather than relying on intuition about which clients matter most, EDA surfaces the data to back those judgements.

Visualisation: Law firm management involves presenting financial intelligence to non-technical colleagues — fellow partners, the management committee, and occasionally institutional lenders. Visualisation translates raw payment data into charts and graphs that enable faster, more aligned decision-making in those rooms. In my experience, a well-designed chart of client revenue concentration has more influence on a partner meeting than a spreadsheet of numbers.

Hypothesis Testing: Law firms frequently make year-end assessments of whether financial performance has materially improved or declined. Without formal testing, these judgements are impressionistic. Hypothesis testing gives me a principled basis for asserting — with a stated confidence level — whether observed changes in revenue are statistically meaningful or simply reflect normal variation. This is important when making arguments to the management committee about resource reallocation or headcount changes.

Correlation Analysis: Understanding how a client’s payment behaviour in one year predicts their behaviour the following year helps the firm plan its cash flow and prioritise relationship investment. Correlation also helps identify whether particular practice areas tend to produce consistently high-value client relationships, which informs our lateral hiring and business development strategy.

Regression Analysis: Revenue forecasting is a core planning tool for any professional services firm. Regression allows me to build a predictive model of what individual client revenues are likely to be in the next period, based on past behaviour. The power-law model also gives me a precise quantification of how concentrated our revenue is — a figure I can present to the firm’s management committee to motivate a deliberate client diversification strategy.

3 Data Collection & Sampling

3.1 Source

The data analysed in this report is drawn from Legal Hub LLP’s internal practice management and billing system. Specifically, the dataset records net client payments received by the firm during the calendar years 2024 and 2025. The data was extracted by the firm’s finance function in the form of a structured spreadsheet (Payment Analysis 2.xlsx), with one worksheet per year.

3.2 Collection Method

The data was compiled from the firm’s accounts receivable ledger, which records all amounts received from clients against invoices raised for legal services. “Net payment” refers to amounts actually received (not invoiced), net of any credit notes or fee adjustments. The data was extracted administratively rather than through survey or primary research.

3.3 Sampling Frame & Sample Size

The dataset is a census, not a sample — it covers all clients who made at least one net payment to Legal Hub LLP during the relevant year. There was no sampling process; every paying client relationship in each year is included.

Year	Number of Clients	Total Net Revenue (₦)
2024	91 clients	See Section 5
2025	92 clients	See Section 5

3.4 Time Period Covered

2024 dataset: 1 January 2024 to 31 December 2024
2025 dataset: 1 January 2025 to 31 December 2025

3.5 Variables Available

Each record contains: client rank (by payment size), client name, net payment amount (₦), percentage of total revenue, and practice area classification.

3.6 Ethical Notes & Confidentiality Statement

This analysis was conducted by a Partner of the firm and falls within the legitimate internal management use of client financial data. No external disclosure of identifiable client information has been made. All client and counterparty names appearing in the underlying dataset have been changed from their real identities for confidentiality and professional conduct purposes. The financial figures are real and have not been altered. This anonymisation was carried out prior to analysis and has no bearing on the integrity of the findings presented in this report. The firm’s data governance policies and professional conduct obligations under the relevant bar rules have been observed throughout.

4 Data Description

library(readxl)
library(dplyr)
library(tidyr)
library(ggplot2)
library(scales)
library(RColorBrewer)
library(stringr)
library(knitr)

f <- "Payment Analysis 2.xlsx"

# ── Parse one sheet ──────────────────────────────────────────────────────────
parse_sheet <- function(sheet_name, year) {
  raw <- read_excel(f, sheet = sheet_name, col_names = FALSE,
                    col_types = "text")
  names(raw) <- c("rank","client","amount_raw","amount_alt",
                  "pct_raw","practice_raw","x7","x8")
  raw %>%
    filter(!is.na(rank),
           !is.na(suppressWarnings(as.numeric(rank)))) %>%
    mutate(
      rank     = as.integer(rank),
      client   = str_trim(client),
      amount   = suppressWarnings(as.numeric(amount_raw)),
      pct      = suppressWarnings(as.numeric(pct_raw)),
      practice = str_trim(practice_raw),
      year     = year
    ) %>%
    filter(!is.na(amount), amount > 0) %>%
    select(year, rank, client, amount, pct, practice)
}

df25 <- parse_sheet("2025 Payment", 2025)
df24 <- parse_sheet("2024 Payment", 2024)

# ── Practice area harmonisation ──────────────────────────────────────────────
harmonise_practice <- function(pa, client_name) {
  x <- if_else(!is.na(pa) & pa != "", pa, client_name)
  # Build each pattern with paste0() so that source line breaks and
  # indentation never become part of the regular expression
  rx <- function(...) regex(paste0(...), ignore_case = TRUE)
  # case_when() returns the first match, so the order of the rules matters
  case_when(
    str_detect(x, rx("oil|petro|lng|midget|rig|china legal|moor.*moor|",
                     "energy trad|zenith|corner|berlin|atlantic|yorkshire|",
                     "smith.*jones|petroleum intl"))           ~ "Oil & Gas",
    str_detect(x, rx("power|genco|ecowas|solar|renew|jeep|bond.*power|",
                     "churgate|cosec|lekki.*power|aso.*power|steam.*energy|",
                     "industrial.*rock|larger.*brother|charger.*energy|",
                     "grid.*energy|power.*energy|africa energy|canada.*consult|",
                     "canadian.*consult|rocafeller|savannah tech")) ~ "Power & Energy",
    str_detect(x, rx("bank|microfinance|greece|overy|federal bank|",
                     "wuse.*microfinance|steam.*energy"))      ~ "Banking & Finance",
    str_detect(x, rx("capital|asset.*sec|guarantee|butter|cornerstone.*real|",
                     "vegas|renaissance|AIK|western pens|global capital")) ~ "Capital Markets",
    str_detect(x, rx("M&A|merger|gombe|acquisition"))          ~ "M&A",
    str_detect(x, rx("tax|nuprc"))                             ~ "Tax",
    str_detect(x, rx("corp|enerco|construct|lekki integ|natural gas assoc")) ~ "Corporate",
    str_detect(x, rx("real estate|homes.*real|cubana|crown.*realt|estate dev")) ~ "Real Estate",
    str_detect(x, rx("tech|media|intel|AI solution|lab.*stem|clad|",
                     "large chips|savannah"))                  ~ "Technology & Media",
    str_detect(x, rx("invest.*income|rental|training.*revenue")) ~ "Other Income",
    TRUE ~ "General / Unclassified"
  )
}

df25 <- df25 %>%
  mutate(practice_clean = harmonise_practice(practice, client))
df24 <- df24 %>%
  mutate(practice_clean = harmonise_practice(practice, client))

df_all <- bind_rows(df25, df24) %>% mutate(year = factor(year))

total25 <- sum(df25$amount)
total24 <- sum(df24$amount)

4.1 Variable Dictionary

The table below describes every variable retained for analysis after cleaning.

var_dict <- data.frame(
  `Variable`    = c("year","rank","client","amount","pct","practice","practice_clean"),
  `Type`        = c("Categorical (factor)","Integer","Character","Numeric (₦)",
                    "Numeric (proportion)","Character","Categorical (factor)"),
  `Description` = c(
    "Financial year in which payment was received (2024 or 2025)",
    "Client rank within the year, ordered from largest to smallest payer (1 = highest)",
    "Anonymised client name (real names withheld — see Section 3.6)",
    "Total net payment received from client during the year, in Nigerian Naira (₦)",
    "Client's payment as a proportion of the firm's total annual revenue (0–1 scale)",
    "Practice area label as recorded in the billing system (partially complete)",
    "Cleaned and harmonised practice area label, derived from billing record or client name"
  ),
  `Missing Values` = c("None","None","None","None","Minimal","54.9% (2024), 70.7% (2025)","Derived — none")
)
kable(var_dict, caption = "Table 1: Variable Dictionary",
      col.names = c("Variable","Type","Description","Missing Values"))

Table 1: Variable Dictionary
Variable	Type	Description	Missing Values
year	Categorical (factor)	Financial year in which payment was received (2024 or 2025)	None
rank	Integer	Client rank within the year, ordered from largest to smallest payer (1 = highest)	None
client	Character	Anonymised client name (real names withheld — see Section 3.6)	None
amount	Numeric (₦)	Total net payment received from client during the year, in Nigerian Naira (₦)	None
pct	Numeric (proportion)	Client’s payment as a proportion of the firm’s total annual revenue (0–1 scale)	Minimal
practice	Character	Practice area label as recorded in the billing system (partially complete)	54.9% (2024), 70.7% (2025)
practice_clean	Categorical (factor)	Cleaned and harmonised practice area label, derived from billing record or client name	Derived — none

4.2 Distributional Summary

desc_stats <- df_all %>%
  group_by(Year = year) %>%
  summarise(
    `n (clients)`    = n(),
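    # scientific = FALSE keeps large values such as 204,000,000 out of
    # exponent notation in the printed table
    `Total (₦)`      = format(round(sum(amount)), big.mark = ",", scientific = FALSE),
    `Mean (₦)`       = format(round(mean(amount)), big.mark = ",", scientific = FALSE),
    `Median (₦)`     = format(round(median(amount)), big.mark = ",", scientific = FALSE),
    `Std Dev (₦)`    = format(round(sd(amount)), big.mark = ",", scientific = FALSE),
    `Min (₦)`        = format(round(min(amount)), big.mark = ",", scientific = FALSE),
    `Max (₦)`        = format(round(max(amount)), big.mark = ",", scientific = FALSE),
    # Non-parametric skew, (mean - median) / sd: bounded in [-1, 1] and
    # not the same quantity as the moment-based skewness coefficient
    `Skewness`       = round(
      (mean(amount) - median(amount)) / sd(amount), 3),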
    .groups = "drop"
  )

kable(desc_stats, caption = "Table 2: Distributional Summary of Client Payment Amounts by Year",
      align = c("l", rep("r", 8)))

Table 2: Distributional Summary of Client Payment Amounts by Year
Year	n (clients)	Total (₦)	Mean (₦)	Median (₦)	Std Dev (₦)	Min (₦)	Max (₦)	Skewness
2024	91	1,102,750,681	12,118,139	5,525,000	24,638,521	215,000	204,000,000	0.268
2025	92	1,120,421,932	12,178,499	5,656,250	18,164,971	342,500	101,768,156	0.359

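The data exhibit strong right skew in both years: the mean is more than double the median (₦12.1M against ₦5.5M in 2024), confirming that a small number of very large client payments pull the average upward. The Skewness column in Table 2 is the non-parametric skew, (mean − median) / SD, which is bounded between −1 and 1, so values of 0.27 and 0.36 indicate a pronounced asymmetry rather than a mild one. This heavy right tail is consistent with a Pareto-type distribution and motivates the use of both parametric and non-parametric techniques in subsequent sections.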
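
To make the skewness figure in Table 2 concrete, the sketch below computes the non-parametric skew used in this report alongside the conventional moment-based coefficient. The vector `toy_payments` and both helper functions are illustrative assumptions; they are not the firm's data and not part of the original analysis.

```r
# Hypothetical payments (₦ millions): many small clients, one very large one
toy_payments <- c(1, 2, 2, 3, 3, 4, 5, 6, 8, 60)

# Non-parametric skew: (mean - median) / sd, always within [-1, 1]
nonparam_skew <- function(x) (mean(x) - median(x)) / sd(x)

# Moment-based skewness: third central moment divided by the cubed
# population standard deviation (unbounded, more sensitive to outliers)
moment_skew <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

c(nonparametric = nonparam_skew(toy_payments),
  moment        = moment_skew(toy_payments))
```

The two measures answer the same question on different scales, so a modest-looking non-parametric value can sit alongside a much larger moment-based one when a single payment dominates.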

Practice area coverage is partially complete in the raw billing data. Where the practice area field was blank, it was inferred from the client name using a rule-based classification. The resulting practice_clean variable is used throughout all subsequent analysis.
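
Because most practice labels are inferred rather than recorded, it may help to keep track of where each cleaned label came from. The following is a minimal sketch on made-up rows; the client names, the reduced rule set, and the `practice_source` column are all hypothetical and serve only to illustrate the idea.

```r
library(dplyr)
library(stringr)

# Hypothetical client names and billing labels, for illustration only
toy <- tibble(
  client   = c("Delta Oil Services", "Sunrise Solar Ltd",
               "First Trust Bank", "Acme Holdings"),
  practice = c(NA, "", "Banking", NA)
)

# Assemble patterns with paste0() so line breaks stay out of the regex
rx <- function(...) regex(paste0(...), ignore_case = TRUE)

classify <- function(practice, client) {
  # Prefer the recorded label; fall back to the client name when blank
  x <- if_else(!is.na(practice) & practice != "", practice, client)
  case_when(
    str_detect(x, rx("oil|petro|", "lng")) ~ "Oil & Gas",
    str_detect(x, rx("power|solar|renew")) ~ "Power & Energy",
    str_detect(x, rx("bank|microfinance")) ~ "Banking & Finance",
    TRUE                                   ~ "General / Unclassified"
  )
}

toy_clean <- toy %>%
  mutate(
    practice_clean  = classify(practice, client),
    # Record whether the label came from the billing field or the name
    practice_source = if_else(!is.na(practice) & practice != "",
                              "recorded", "inferred from name")
  )

count(toy_clean, practice_source)
```

Tabulating `practice_source` against `practice_clean` would show how much of each practice area's revenue rests on inferred labels.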

5 Technique 1 — Exploratory Data Analysis (EDA)

5.1 Theory

Exploratory Data Analysis, introduced by Tukey (1977), is the process of summarising and interrogating a dataset to understand its structure, distributions, central tendencies, spread, and anomalies before any formal modelling. EDA is a prerequisite for all other techniques: it ensures the analyst understands what the data contains, identifies quality issues, and generates hypotheses for further testing. Key tools include summary statistics (mean, median, standard deviation, skewness), frequency distributions, and missing-data audits.

5.2 Business Justification

Before drawing conclusions about revenue trends, client behaviour, or practice area performance, it is essential to understand the shape and quality of the data. EDA reveals whether our measures are meaningful (e.g. whether the mean is an appropriate summary statistic given skewness), flags data gaps that could bias conclusions, and quantifies the magnitude of client payment differences. For a law firm partner, EDA is the analytical equivalent of reading the financial statements before attending a budget meeting.

5.3 Analysis

5.3.1 Q1 — Are Payment Patterns Progressive?

ggplot(df_all, aes(x = amount / 1e6, fill = year)) +
  geom_histogram(bins = 30, alpha = 0.72, position = "identity") +
  scale_fill_manual(values = c("2024" = "#2166AC", "2025" = "#D6604D")) +
  scale_x_continuous(labels = label_comma(suffix = "M")) +
  labs(
    title    = "Figure 1: Distribution of Client Payment Amounts (₦M)",
    subtitle = "Highly right-skewed in both years — a small number of clients dominate revenue",
    x        = "Payment Amount (₦ Millions)",
    y        = "Number of Clients",
    fill     = "Year"
  ) +
  theme_minimal(base_size = 13) +
  theme(legend.position = "top", plot.title = element_text(face = "bold"))

ggplot(df_all, aes(x = rank, y = amount / 1e6, colour = year)) +
  geom_point(alpha = 0.55, size = 2.2) +
  scale_colour_manual(values = c("2024" = "#2166AC", "2025" = "#D6604D")) +
  scale_y_continuous(labels = label_comma(suffix = "M")) +
  labs(
    title    = "Figure 2: Client Rank vs. Payment Amount",
    subtitle = "Payments fall sharply from rank 1 and flatten to a long tail — not a progressive (linear) pattern",
    x        = "Client Rank (1 = largest payer)",
    y        = "Payment Amount (₦ Millions)",
    colour   = "Year"
  ) +
  theme_minimal(base_size = 13) +
  theme(legend.position = "top", plot.title = element_text(face = "bold"))

rev_df <- data.frame(
  Year    = c("2024","2025"),
  Revenue = c(total24, total25),
  Clients = c(nrow(df24), nrow(df25))
) %>% mutate(
  `Revenue (₦M)` = round(Revenue / 1e6, 2),
  `Avg/Client (₦M)` = round(Revenue / Clients / 1e6, 2),
  `Median/Client (₦M)` = c(round(median(df24$amount)/1e6, 2),
                             round(median(df25$amount)/1e6, 2))
)
kable(rev_df %>% select(-Revenue),
      caption = "Table 3: Annual Revenue Summary — 2024 vs 2025")

Table 3: Annual Revenue Summary — 2024 vs 2025
Year	Clients	Revenue (₦M)	Avg/Client (₦M)	Median/Client (₦M)
2024	91	1102.75	12.12	5.53
2025	92	1120.42	12.18	5.66

filed24 <- sum(!is.na(df24$practice) & df24$practice != "")
filed25 <- sum(!is.na(df25$practice) & df25$practice != "")
miss_df <- data.frame(
  Year            = c(2024, 2025),
  Total_Clients   = c(nrow(df24), nrow(df25)),
  Practice_Filed  = c(filed24, filed25),
  Practice_Blank  = c(nrow(df24) - filed24, nrow(df25) - filed25)
)
miss_df$Fill_Rate <- paste0(round(miss_df$Practice_Filed / miss_df$Total_Clients * 100, 1), "%")
names(miss_df) <- c("Year","Total Clients","Practice Filled","Practice Blank","Fill Rate")
kable(miss_df, caption = "Table 4: Practice Area Data Completeness Audit")

Table 4: Practice Area Data Completeness Audit
Year	Total Clients	Practice Filled	Practice Blank	Fill Rate
2024	91	41	50	45.1%
2025	92	27	65	29.3%

5.4 Interpretation (for a non-technical manager)

The data shows that Legal Hub LLP’s client payment pattern is strongly skewed, not progressive. A progressive pattern would mean payments spread fairly evenly across clients; instead, there is a sharp drop-off after the top few clients. In 2024, the single largest client accounted for nearly a fifth of all revenue (18.5%). Most clients pay relatively modest amounts, clustered towards the lower end. This pattern of a few large payers and many small ones is common in professional services but creates vulnerability. Total revenue grew by about 1.6% from 2024 to 2025. The average payment per client was essentially flat, edging up from ₦12.12M to ₦12.18M (Table 3), so the growth came from one additional paying client and a marginally higher average rather than from any material change in what each client pays.
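
The year-on-year change in total revenue can be split into a client-count effect and an average-payment effect. The sketch below uses only the totals and client counts reported in Table 3; the variable names `n24`, `n25`, `rev_growth`, `client_growth` and `mean_growth` are introduced here for illustration.

```r
# Totals and client counts as reported in Tables 2 and 3
total24 <- 1102750681; n24 <- 91
total25 <- 1120421932; n25 <- 92

rev_growth    <- total25 / total24 - 1                  # change in total revenue
client_growth <- n25 / n24 - 1                          # change in number of payers
mean_growth   <- (total25 / n25) / (total24 / n24) - 1  # change in average payment

# Identity: (1 + revenue growth) = (1 + client growth) * (1 + mean growth)
stopifnot(isTRUE(all.equal(1 + rev_growth,
                           (1 + client_growth) * (1 + mean_growth))))

round(100 * c(revenue = rev_growth, clients = client_growth,
              avg_payment = mean_growth), 2)
```

On these figures the result is roughly 1.6% revenue growth, of which about 1.1% is attributable to the client count and about 0.5% to the average payment.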

6 Technique 2 — Visualisation

6.1 Theory

Data visualisation is the graphical representation of information to enable the human eye and brain to detect patterns, trends, and anomalies that are difficult to perceive in tabular form. Tufte (2001) emphasises that good visualisations should maximise the “data-ink ratio”, conveying the most insight with the least visual complexity. Core chart types used here include bar charts (for comparison), Lorenz curves (for inequality/concentration), and waterfall charts (for decomposition of change).

6.2 Business Justification

Law firm partners and management committees routinely need to understand performance at a glance. Visualisation translates the revenue data into formats suitable for strategic discussions — which clients we are winning and losing, which practice areas are growing, and how concentrated our revenue base truly is. Charts can expose problems that summary statistics obscure: the waterfall chart, for example, reveals that beneath modest overall revenue growth lies a significant churn story.

6.3 Analysis

6.3.1 Q2 — Which Clients Came Back? Which Did Not?

clients_24 <- df24$client
clients_25 <- df25$client
retained  <- intersect(clients_24, clients_25)
churned   <- setdiff(clients_24, clients_25)
new_25    <- setdiff(clients_25, clients_24)
retention_rate <- round(length(retained) / length(clients_24) * 100, 1)

# Revenue components
rev_retained_24 <- df24 %>% filter(client %in% retained) %>% summarise(r=sum(amount)) %>% pull(r)
rev_churned_24  <- df24 %>% filter(client %in% churned)  %>% summarise(r=sum(amount)) %>% pull(r)
rev_retained_25 <- df25 %>% filter(client %in% retained) %>% summarise(r=sum(amount)) %>% pull(r)
rev_new_25      <- df25 %>% filter(client %in% new_25)   %>% summarise(r=sum(amount)) %>% pull(r)

cat("=== CLIENT BEHAVIOUR SUMMARY ===\n")

=== CLIENT BEHAVIOUR SUMMARY ===

cat(sprintf("2024 clients:              %d\n", length(clients_24)))

2024 clients:              91

cat(sprintf("2025 clients:              %d\n", length(clients_25)))

2025 clients:              92

cat(sprintf("Retained in both years:    %d  (%.1f%% retention)\n",
            length(retained), retention_rate))

Retained in both years:    40  (44.0% retention)

cat(sprintf("Lost after 2024 (churned): %d\n", length(churned)))

Lost after 2024 (churned): 51

cat(sprintf("New clients in 2025:       %d\n", length(new_25)))

New clients in 2025:       52

ret_summary <- data.frame(
  Category = c("Retained\n(both years)","Churned\n(2024 only)","New\n(2025 only)"),
  Count    = c(length(retained), length(churned), length(new_25)),
  Fill     = c("#1A9641","#D6604D","#2166AC")
)
ggplot(ret_summary, aes(x = Category, y = Count, fill = Category)) +
  geom_col(width = 0.55, alpha = 0.88) +
  geom_text(aes(label = Count), vjust = -0.5, size = 5, fontface = "bold") +
  scale_fill_manual(values = c("Retained\n(both years)" = "#1A9641",
                                "Churned\n(2024 only)"   = "#D6604D",
                                "New\n(2025 only)"       = "#2166AC")) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
  labs(title    = "Figure 3: Client Retention Profile — 2024 to 2025",
       subtitle = paste0("Retention rate: ", retention_rate,
                         "% of 2024 clients returned in 2025"),
       x = NULL, y = "Number of Clients") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none", plot.title = element_text(face = "bold"))
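
The retention figures above rely on exact matching of client names between the two worksheets, so a client recorded as "Ltd" in one year and "Limited" in the next would be counted once as churned and once as new. The classification rules in Section 4 list both `canada.*consult` and `canadian.*consult`, which may hint at such spelling variants, although that is an inference rather than something the data confirms. A possible sensitivity check, with a hypothetical helper `normalise_name()` and made-up names, is sketched below.

```r
library(stringr)

# Hypothetical helper: reduce a client name to a comparison key
normalise_name <- function(x) {
  key <- str_to_lower(x)
  key <- str_replace_all(key, "[[:punct:]]", " ")                  # drop punctuation
  key <- str_replace_all(key, "\\b(limited|ltd|plc|llc)\\b", " ")  # drop legal suffixes
  str_squish(key)                                                  # collapse whitespace
}

names_24 <- c("Example Energy Ltd.", "Sample Bank PLC")        # made-up names
names_25 <- c("EXAMPLE ENERGY LIMITED", "Another Client Ltd")  # made-up names

exact_matches <- intersect(names_24, names_25)
keyed_matches <- intersect(normalise_name(names_24), normalise_name(names_25))

length(exact_matches)   # exact matching finds no common client
keyed_matches           # keyed matching pairs the two "Example Energy" spellings
```

If the retained count computed on normalised keys differs from the exact-match count, the gap gives a rough bound on how much of the measured churn is a naming artefact.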

wf <- data.frame(
  Step  = factor(c("2024 Revenue","Churned\nclients","Retained\nclient change",
                    "New\nclients","2025 Revenue"),
                  levels = c("2024 Revenue","Churned\nclients","Retained\nclient change",
                              "New\nclients","2025 Revenue")),
  Value = c(total24, -rev_churned_24,
            rev_retained_25 - rev_retained_24,
            rev_new_25, total25),
  Type  = c("Base","Lost","Change","Gained","Result")
)
ggplot(wf, aes(x = Step, y = Value / 1e6, fill = Type)) +
  geom_col(alpha = 0.87, width = 0.6) +
  scale_fill_manual(values = c(Base="steelblue4", Lost="#D6604D",
                                Change="#FEE08B", Gained="#4DAC26",
                                Result="#1A9641")) +
  scale_y_continuous(labels = label_comma(suffix = "M")) +
  labs(title    = "Figure 4: Revenue Bridge — 2024 to 2025 (₦M)",
       subtitle = "Decomposing total revenue change into its constituent movements",
       x = NULL, y = "Revenue (₦ Millions)", fill = "Movement Type") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "top", plot.title = element_text(face = "bold"),
        axis.text.x = element_text(angle = 15, hjust = 1))
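
The code for Figure 4 draws every bar from a zero baseline with `geom_col()`. A conventional revenue bridge instead floats each movement so that it starts where the previous one ended. The sketch below shows one way to build that layout with `geom_rect()`; the figures in `steps` are hypothetical round numbers chosen only to illustrate the geometry, not the firm's results.

```r
library(ggplot2)

# Hypothetical figures in ₦ millions, chosen only to illustrate the layout
steps <- data.frame(
  step  = c("2024 Revenue", "Churned", "Retained change", "New", "2025 Revenue"),
  delta = c(1000, -300, -50, 400, NA),
  type  = c("Base", "Lost", "Change", "Gained", "Result")
)

# Each movement bar starts where the previous one ended;
# the opening and closing totals are anchored at zero
running    <- cumsum(steps$delta[1:4])
steps$ymax <- c(running, running[4])
steps$ymin <- c(0, running[1:3], 0)
steps$x    <- seq_len(nrow(steps))

bridge_plot <- ggplot(steps) +
  geom_rect(aes(xmin = x - 0.3, xmax = x + 0.3,
                ymin = ymin, ymax = ymax, fill = type)) +
  scale_x_continuous(breaks = steps$x, labels = steps$step) +
  labs(x = NULL, y = "Revenue (₦ Millions)", fill = "Movement Type")

bridge_plot
```

With the report's own objects, `delta` would be `c(total24, -rev_churned_24, rev_retained_25 - rev_retained_24, rev_new_25, NA)` divided by 1e6, and the closing bar would then equal `total25` by construction.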

6.3.2 Q3 — Premium Clients: Who Needs Our Closest Attention?

premium <- df25 %>%
  mutate(is_retained = client %in% retained) %>%
  arrange(desc(amount)) %>%
  head(15)

ggplot(premium,
       aes(x = reorder(str_wrap(client, 24), amount),
           y = amount / 1e6, fill = is_retained)) +
  geom_col(alpha = 0.87) +
  coord_flip() +
  scale_fill_manual(values = c("TRUE"="#1A9641","FALSE"="#D6604D"),
                    labels = c("TRUE"="Retained from 2024","FALSE"="New in 2025")) +
  scale_y_continuous(labels = label_comma(suffix = "M")) +
  labs(title    = "Figure 5: Top 15 Premium Clients by 2025 Revenue (₦M)",
       subtitle = "Green = also present in 2024 | Red = new in 2025",
       x = NULL, y = "2025 Revenue (₦ Millions)", fill = "Client Status") +
  theme_minimal(base_size = 12) +
  theme(legend.position = "top", plot.title = element_text(face = "bold"))

prem_tbl <- df25 %>%
  arrange(desc(amount)) %>%
  head(15) %>%
  mutate(
    `2025 Rev (₦)`   = format(round(amount), big.mark=","),
    `% of Total`     = sprintf("%.1f%%", amount/total25*100),  # always one decimal place
    `Status`         = if_else(client %in% retained,"Retained ✓","New ★"),
    `2024 Rev (₦)`   = {
      a <- df24$amount[match(client, df24$client)]
      if_else(!is.na(a), format(round(a), big.mark=","), "—")
    },
    `YoY Δ`          = {
      a <- df24$amount[match(client, df24$client)]
      if_else(!is.na(a), paste0(round((amount-a)/a*100,1),"%"), "N/A")
    }
  ) %>%
  select(Client=client, Practice=practice_clean,
         `2024 Rev (₦)`, `2025 Rev (₦)`, `% of Total`, `YoY Δ`, Status)
kable(prem_tbl, caption="Table 5: Top 15 Premium Clients — Strategic Priority List")

Table 5: Top 15 Premium Clients — Strategic Priority List
Client	Practice	2024 Rev (₦)	2025 Rev (₦)	% of Total	YoY Δ	Status
Atlantic Petroleum Limited	Oil & Gas	—	101,768,156	9.1%	N/A	New ★
Berlin Corporation Limited	Oil & Gas	3,325,000	91,293,236	8.1%	2645.7%	Retained ✓
Corner Oil Limited	Oil & Gas	—	63,352,768	5.7%	N/A	New ★
Grid Energy Company Limited	Power & Energy	—	56,390,906	5.0%	N/A	New ★
Power Invest B.V	Power & Energy	28,997,723	52,323,984	4.7%	80.4%	Retained ✓
Yorkshire Petroleum	Oil & Gas	204,000,000	50,000,000	4.5%	-75.5%	Retained ✓
Africa Energy LLC	Power & Energy	53,292,874	48,069,036	4.3%	-9.8%	Retained ✓
Bank UK Ltd	Banking & Finance	39,424,436	33,439,801	3.0%	-15.2%	Retained ✓
Lekki Integrated Limited	General / Unclassified	22,830,000	33,278,500	3.0%	45.8%	Retained ✓
Power and Energy Limited	Power & Energy	—	33,250,000	3.0%	N/A	New ★
Holdings NGA Limited	General / Unclassified	—	32,395,000	2.9%	N/A	New ★
Assets Securities Limited	Capital Markets	—	21,070,000	1.9%	N/A	New ★
Investment Income	Other Income	7,294,873	20,728,131	1.9%	184.1%	Retained ✓
Canadian Consulting Group Ltd	Power & Energy	98,753,229	20,027,691	1.8%	-79.7%	Retained ✓
Smith and Jones Nigeria Limited	Oil & Gas	—	19,932,176	1.8%	N/A	New ★
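
The concentration implied by Table 5 can be checked directly from the published figures. The sketch below copies the five largest 2025 payments from the table and the 2025 total from Table 2; the names `top5_2025` and `cumulative` are introduced here for illustration.

```r
# 2025 payments of the five largest clients, copied from Table 5 (₦)
top5_2025 <- c(101768156, 91293236, 63352768, 56390906, 52323984)
total25   <- 1120421932   # 2025 total revenue from Table 2

top5_share <- sum(top5_2025) / total25
cumulative <- cumsum(top5_2025) / total25   # share held by the top 1, 2, ..., 5

round(100 * cumulative, 1)
```

On these figures the cumulative shares come to roughly 9.1%, 17.2%, 22.9%, 27.9% and 32.6%, so the five largest clients supplied about a third of 2025 revenue.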

6.3.3 Q4 — Practice Area Performance

pa25 <- df25 %>% filter(practice_clean != "General / Unclassified") %>%
  group_by(practice_clean) %>%
  summarise(rev25=sum(amount), clients25=n(), .groups="drop")
pa24 <- df24 %>% filter(practice_clean != "General / Unclassified") %>%
  group_by(practice_clean) %>%
  summarise(rev24=sum(amount), .groups="drop")

pa_comp <- full_join(pa25, pa24, by="practice_clean") %>%
  mutate(across(c(rev25,rev24), ~replace_na(.x, 0))) %>%
  pivot_longer(c(rev25,rev24), names_to="year", values_to="revenue") %>%
  mutate(year = recode(year, rev24="2024", rev25="2025"))

ggplot(pa_comp,
       aes(x = reorder(str_wrap(practice_clean,20), revenue),
           y = revenue/1e6, fill=year)) +
  geom_col(position="dodge", alpha=0.87, width=0.72) +
  coord_flip() +
  scale_fill_manual(values=c("2024"="#2166AC","2025"="#D6604D")) +
  scale_y_continuous(labels=label_comma(suffix="M")) +
  labs(title    = "Figure 6: Practice Area Revenue — 2024 vs 2025 (₦M)",
       subtitle = "Oil & Gas and Power & Energy dominate in both years",
       x=NULL, y="Revenue (₦ Millions)", fill="Year") +
  theme_minimal(base_size=13) +
  theme(legend.position="top", plot.title=element_text(face="bold"))

6.4 Interpretation (for a non-technical manager)

The charts tell a clear story across three dimensions. First, on client retention: of the 91 clients who paid in 2024, only 40 (44%) returned in 2025. This is not an alarm signal on its own, since some client relationships are naturally transactional and one-off, but it underscores the need to distinguish between sticky, recurring clients and one-time engagements. The revenue bridge shows that revenue from new clients broadly replaced the revenue lost from those who did not return, resulting in modest net growth despite significant churn. Second, on premium clients: ten of the top 15 clients are in Oil & Gas or Power & Energy, and seven of the 15 are new relationships that have not yet been tested for longevity. Third, on practice areas: Oil & Gas and Power & Energy together account for the majority of fee income in both years; Banking & Finance and M&A are declining contributors.

7 Technique 3 — Hypothesis Testing

7.1 Theory

Hypothesis testing is a formal statistical procedure for determining whether an observed result (such as a change in revenue) is likely to reflect a real underlying difference, or could simply have arisen by chance. The process involves stating a null hypothesis (H₀, the “no-change” position), computing a test statistic from sample data, and comparing it against a critical value at a chosen significance level (typically α = 0.05). The Welch two-sample t-test is used when comparing means of two independent groups with potentially unequal variances. The Wilcoxon rank-sum test is the non-parametric equivalent, appropriate when data are skewed (as here). Both tests are applied to ensure robustness (Field, 2018).

7.2 Business Justification

Observing that total revenue increased by ₦17.7 million from 2024 to 2025 does not, by itself, prove that the firm’s financial position has genuinely improved. Random year-to-year fluctuations in client payments could produce a similar-sized change. Hypothesis testing asks: given the variability we observe in individual client payments, is the difference between 2024 and 2025 large enough to be convincingly real? This is directly relevant to whether management committee decisions about hiring, investment, or partner draws should be made on the basis of the revenue trend.

7.3 Analysis

7.3.1 Q5 — Is Revenue Growing or Declining?

cat("=== WELCH TWO-SAMPLE t-TEST ===\n")

=== WELCH TWO-SAMPLE t-TEST ===

cat("H0: Mean client payment in 2025 = Mean client payment in 2024\n")

H0: Mean client payment in 2025 = Mean client payment in 2024

cat("Ha: Mean client payment in 2025 ≠ Mean client payment in 2024\n")

Ha: Mean client payment in 2025 ≠ Mean client payment in 2024

cat("Significance level: α = 0.05\n\n")

Significance level: α = 0.05

t_res <- t.test(df25$amount, df24$amount,
                alternative = "two.sided", var.equal = FALSE)

cat(sprintf("2024: n = %d,  Mean = ₦%s,  SD = ₦%s\n",
    nrow(df24), format(round(mean(df24$amount)), big.mark=","),
    format(round(sd(df24$amount)), big.mark=",")))

2024: n = 91,  Mean = ₦12,118,139,  SD = ₦24,638,521

cat(sprintf("2025: n = %d,  Mean = ₦%s,  SD = ₦%s\n",
    nrow(df25), format(round(mean(df25$amount)), big.mark=","),
    format(round(sd(df25$amount)), big.mark=",")))

2025: n = 92,  Mean = ₦12,178,499,  SD = ₦18,164,971

cat(sprintf("\nt-statistic:         %.4f\n", t_res$statistic))


t-statistic:         0.0188

cat(sprintf("Degrees of freedom:  %.2f\n",  t_res$parameter))

Degrees of freedom:  165.48

cat(sprintf("p-value:             %.4f\n",  t_res$p.value))

p-value:             0.9850

cat(sprintf("95%% CI for diff:     [₦%s,  ₦%s]\n",
    format(round(t_res$conf.int[1]), big.mark=","),
    format(round(t_res$conf.int[2]), big.mark=",")))

95% CI for diff:     [₦-6,263,139,  ₦6,383,859]

cat(sprintf("\nDecision: %s H0 at α = 0.05\n",
    if(t_res$p.value < 0.05) "REJECT" else "FAIL TO REJECT"))


Decision: FAIL TO REJECT H0 at α = 0.05
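
As a cross-check, the Welch statistic can be rebuilt by hand from the summary statistics printed above. The sketch below is illustrative only: the variable names are introduced here, and the inputs are the rounded means and standard deviations from the output rather than the raw `df24` and `df25` data, so small rounding differences are expected.

```r
# Summary statistics copied from the printed output above (rounded values)
n25 <- 92; m25 <- 12178499; s25 <- 18164971
n24 <- 91; m24 <- 12118139; s24 <- 24638521

# Variance of each sample mean
v25 <- s25^2 / n25
v24 <- s24^2 / n24

# Welch statistic: difference in means divided by its standard error
se     <- sqrt(v25 + v24)
t_stat <- (m25 - m24) / se

# Welch-Satterthwaite approximation to the degrees of freedom
df_w <- (v25 + v24)^2 / (v25^2 / (n25 - 1) + v24^2 / (n24 - 1))

# Two-sided p-value and 95% confidence interval for the difference in means
p_val <- 2 * pt(-abs(t_stat), df_w)
ci    <- (m25 - m24) + c(-1, 1) * qt(0.975, df_w) * se

cat(sprintf("t = %.4f, df = %.2f, p = %.4f\n", t_stat, df_w, p_val))
```

The difference in means (about ₦60,000) is tiny relative to its standard error (about ₦3.2 million), which is why the statistic sits so close to zero.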

cat("\n=== WILCOXON RANK-SUM TEST (Non-Parametric) ===\n")


=== WILCOXON RANK-SUM TEST (Non-Parametric) ===

cat("H0: Distribution of payments in 2025 = Distribution in 2024\n")

H0: Distribution of payments in 2025 = Distribution in 2024

cat("Ha: Distributions differ\n\n")

Ha: Distributions differ

w_res <- wilcox.test(df25$amount, df24$amount, alternative="two.sided")
cat(sprintf("W-statistic: %.0f\n", w_res$statistic))

W-statistic: 4201

cat(sprintf("p-value:     %.4f\n", w_res$p.value))

p-value:     0.9677

cat(sprintf("\nDecision: %s H0 at α = 0.05\n",
    if(w_res$p.value < 0.05) "REJECT" else "FAIL TO REJECT"))


Decision: FAIL TO REJECT H0 at α = 0.05
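
To make the W statistic less of a black box, the following sketch computes it by hand on a small toy sample and compares it with `wilcox.test()`. The toy vectors are hypothetical and stand in for `df25$amount` and `df24$amount`; only the sample sizes 92 and 91 come from the report.

```r
# Hypothetical toy payments, for illustration only (not the firm's data)
x <- c(1.2, 3.4, 5.6, 7.8)   # plays the role of df25$amount
y <- c(2.1, 4.3, 6.5)        # plays the role of df24$amount

# W = (sum of the ranks of x in the pooled sample) - n_x * (n_x + 1) / 2
r        <- rank(c(x, y))
w_manual <- sum(r[seq_along(x)]) - length(x) * (length(x) + 1) / 2
w_r      <- unname(wilcox.test(x, y)$statistic)

# Under H0, W is centred on n_x * n_y / 2
w_centre_report <- 92 * 91 / 2   # sample sizes from the output above

cat("toy W:", w_manual, "| wilcox.test W:", w_r,
    "| H0 centre for n = 92 and 91:", w_centre_report, "\n")
```

The reported W of 4201 lies very close to the null-hypothesis centre of 4186, which is consistent with the large p-value of 0.9677.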

rev_bar <- data.frame(
  Year    = factor(c(2024, 2025)),
  Revenue = c(total24, total25)
)
yoy_pct <- round((total25 - total24) / total24 * 100, 2)

ggplot(rev_bar, aes(x=Year, y=Revenue/1e6, fill=Year)) +
  geom_col(width=0.45, alpha=0.87) +
  geom_text(aes(label=paste0("₦",round(Revenue/1e6,1),"M")),
            vjust=-0.5, fontface="bold", size=5.5) +
  scale_fill_manual(values=c("2024"="#2166AC","2025"="#D6604D")) +
  scale_y_continuous(labels=label_comma(suffix="M"),
                     limits=c(0, max(total24,total25)*1.15/1e6)) +
  labs(title    = "Figure 7: Total Annual Revenue — 2024 vs 2025",
       subtitle = paste0("Nominal growth: ₦",
                         format(round((total25-total24)/1e6,1)), "M  (",
                         yoy_pct, "%)"),
       x="Year", y="Total Revenue (₦ Millions)") +
  theme_minimal(base_size=14) +
  theme(legend.position="none", plot.title=element_text(face="bold"))

ggplot(df_all, aes(x=year, y=amount/1e6, fill=year)) +
  geom_boxplot(alpha=0.72, outlier.shape=21, outlier.size=2.5) +
  scale_fill_manual(values=c("2024"="#2166AC","2025"="#D6604D")) +
  scale_y_continuous(labels=label_comma(suffix="M"),
                     limits=c(0, quantile(df_all$amount,0.96)/1e6)) +
  labs(title    = "Figure 8: Boxplot of Client Payments by Year (top 4% trimmed)",
       subtitle = "Median and IQR are similar; 2024 had a more extreme upper outlier (Yorkshire Petroleum)",
       x="Year", y="Payment Amount (₦ Millions)") +
  theme_minimal(base_size=13) +
  theme(legend.position="none", plot.title=element_text(face="bold"))

7.4 Interpretation (for a non-technical manager)

The t-test and the Wilcoxon test both compare individual client payment amounts across 2024 and 2025, not just the firm’s total revenue. Neither test finds a statistically significant difference between the two years at the 5% level (Welch t-test p = 0.985; Wilcoxon p = 0.968), so we cannot say that the typical client payment changed. In plain terms: overall revenue grew modestly (up 1.6%), but this growth is largely explained by having one more client in 2025 and by replacing Yorkshire Petroleum’s outsized 2024 income with several new mid-tier clients. The firm is not materially richer on a per-client basis; it has simply spread its revenue over a slightly broader client base.
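
The split between "more clients" and "higher payments per client" can be made explicit with a short calculation. This is a sketch under one stated assumption: totals are rebuilt as client count times the rounded mean printed in Section 7.3.1, so they approximate rather than reproduce the report's `total24` and `total25`.

```r
# Client counts and means as printed in Section 7.3.1 (means are rounded,
# so the reconstructed totals are approximations)
n24 <- 91; mean24 <- 12118139
n25 <- 92; mean25 <- 12178499

total24_approx <- n24 * mean24
total25_approx <- n25 * mean25

growth_total   <- total25_approx / total24_approx - 1  # headline growth
growth_clients <- n25 / n24 - 1                        # effect of one extra client
growth_mean    <- mean25 / mean24 - 1                  # change in average payment

cat(sprintf("Total: %.2f%% | client count: %.2f%% | mean payment: %.2f%%\n",
            100 * growth_total, 100 * growth_clients, 100 * growth_mean))
```

On these figures, roughly 1.1 percentage points of the 1.6% growth come from the additional client and about 0.5 points from a higher average payment.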

8 Technique 4 — Correlation

8.1 Theory

Correlation measures the strength and direction of the linear (or monotonic) association between two variables. Pearson’s r measures linear association, and its significance test assumes approximately normal data; Spearman’s rho (a rank-based version) measures monotonic association, is more robust to outliers, and is preferred here given the skewed, non-normal nature of payment data. A coefficient of +1 indicates a perfect positive relationship, −1 a perfect inverse relationship, and 0 no linear (or, for Spearman, monotonic) association. A p-value below 0.05 indicates that a correlation this strong would be unlikely to arise by chance if there were no true association (Field, 2018).
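
The relationship between the two coefficients can be illustrated on simulated data. The sketch below uses hypothetical log-normal values (not the firm's payments) and an arbitrary seed; it shows that Spearman's rho is Pearson's r applied to ranks, and that it is unchanged by a monotone transformation such as taking logs.

```r
set.seed(42)  # arbitrary seed, so the illustration is reproducible

# Hypothetical right-skewed data (log-normal), not the firm's payments
x <- rlnorm(50, meanlog = 15, sdlog = 1.2)
y <- x^1.5 * rlnorm(50, sdlog = 0.3)   # monotone but non-linear relationship

pearson  <- cor(x, y, method = "pearson")
spearman <- cor(x, y, method = "spearman")

# Spearman's rho is Pearson's r computed on the ranks
spearman_by_hand <- cor(rank(x), rank(y))

# A monotone transformation (here log) leaves ranks, and so rho, unchanged
spearman_log <- cor(log(x), log(y), method = "spearman")

cat(sprintf("Pearson r = %.3f | Spearman rho = %.3f\n", pearson, spearman))
```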

8.2 Business Justification

Two correlations are of direct business relevance here. First, the correlation between a client’s rank and the size of their payment checks that the ranking is internally consistent. Where rank is assigned by ordering clients by payment, as the results in Section 8.3.1 indicate, this correlation is −1 by construction and does not itself measure how steeply revenue concentrates around top clients; that steepness is captured by the Lorenz curve (Figure 10) and by the power-law slope in Section 9. Second, the year-on-year correlation among retained clients tells us whether the clients who paid most in 2024 also tended to pay most in 2025. If that correlation is strong, client-level revenues are predictable and protecting top-tier relationships is the single most important financial management action.

8.3 Analysis

8.3.1 Q6 (Part 1) — Client Concentration: How Dependent Are We on a Few Clients?

cat("=== SPEARMAN CORRELATION: Client Rank vs. Payment Amount ===\n\n")

=== SPEARMAN CORRELATION: Client Rank vs. Payment Amount ===

c25 <- cor.test(df25$rank, df25$amount, method="spearman")
c24 <- cor.test(df24$rank, df24$amount, method="spearman")

cat("2025:  rho =", round(c25$estimate,4), " | p =", format(c25$p.value, scientific=TRUE), "\n")

2025:  rho = -1  | p = 7.04997e-197

cat("2024:  rho =", round(c24$estimate,4), " | p =", format(c24$p.value, scientific=TRUE), "\n")

2024:  rho = -1  | p = 1.802871e-207

cat("\nInterpretation: Strong negative correlation in both years.\n")


Interpretation: Strong negative correlation in both years.

cat("Higher rank (= smaller payer) is systematically associated\n")

Higher rank (= smaller payer) is systematically associated

cat("with lower payment amounts — confirming Pareto concentration.\n")

with lower payment amounts — confirming Pareto concentration.
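
A rho of exactly −1 is what one would expect if `rank` was created by sorting clients on `amount`. The sketch below assumes that construction (the report does not show how `rank` was built) and uses hypothetical amounts to show that the result is −1 whether the payments are highly concentrated or almost equal.

```r
# Hypothetical payments; 'rank' is ASSUMED to be built by ordering on amount
amount    <- c(120, 45, 45.5, 9, 300, 2)
rank_desc <- rank(-amount)               # 1 = largest payer

rho_skewed <- cor(rank_desc, amount, method = "spearman")

# Same check on a nearly even book of business with distinct amounts
flat     <- c(10.5, 10.4, 10.3, 10.2, 10.1, 10.0)
rho_flat <- cor(rank(-flat), flat, method = "spearman")

cat("rho (concentrated):", rho_skewed, "| rho (near-equal):", rho_flat, "\n")
```

Because both cases return −1, this statistic works as a consistency check on the ranking rather than as a measure of concentration; the Lorenz curve and the concentration metrics carry that information.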

8.3.2 Year-on-Year Consistency Among Retained Clients

ret_df <- df24 %>% filter(client %in% retained) %>%
  select(client, amt24=amount) %>%
  inner_join(df25 %>% select(client, amt25=amount), by="client")

cyoy <- cor.test(ret_df$amt24, ret_df$amt25, method="spearman")

cat("=== SPEARMAN CORRELATION: 2024 vs 2025 Payments (Retained Clients) ===\n\n")

=== SPEARMAN CORRELATION: 2024 vs 2025 Payments (Retained Clients) ===

cat(sprintf("n = %d retained clients\n", nrow(ret_df)))

n = 40 retained clients

cat(sprintf("rho = %.4f  |  p = %.4f\n\n", cyoy$estimate, cyoy$p.value))

rho = 0.6383  |  p = 0.0000

if(cyoy$p.value < 0.05) {
  cat("Significant positive correlation: clients who paid more in 2024\n")
  cat("tended to pay more in 2025. Revenue is somewhat predictable.\n")
}

Significant positive correlation: clients who paid more in 2024
tended to pay more in 2025. Revenue is somewhat predictable.
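
The printed p-value of 0.0000 is a formatting effect of `%.4f`, not a true zero. The sketch below gives an order-of-magnitude check using the large-sample t approximation for a correlation coefficient; `cor.test()` may use an exact or different method for Spearman's rho, so the approximate value is an assumption-laden estimate rather than the reported figure.

```r
# Values copied from the printed output above
rho <- 0.6383
n   <- 40

# Large-sample t approximation for a correlation coefficient
# (an order-of-magnitude check only; cor.test() may compute p differently)
t_approx <- rho * sqrt((n - 2) / (1 - rho^2))
p_approx <- 2 * pt(-abs(t_approx), df = n - 2)

# format.pval() reports very small values as "< eps" instead of 0
p_label <- format.pval(p_approx, eps = 1e-4)

cat("t approx:", round(t_approx, 2), "| p:", p_label, "\n")
```

Reporting the result as p < 0.0001 avoids implying that the probability is literally zero.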

ggplot(ret_df, aes(x=amt24/1e6, y=amt25/1e6)) +
  geom_point(colour="#2166AC", size=2.8, alpha=0.65) +
  geom_smooth(method="lm", colour="#D6604D", se=TRUE, linewidth=1.1) +
  scale_x_continuous(labels=label_comma(suffix="M")) +
  scale_y_continuous(labels=label_comma(suffix="M")) +
  labs(title    = "Figure 9: 2024 vs 2025 Payments for Retained Clients",
       subtitle = paste0("Spearman rho = ", round(cyoy$estimate,3),
                         "  |  p = ", round(cyoy$p.value,4),
                         "  |  n = ", nrow(ret_df), " clients"),
       x="2024 Payment (₦ Millions)", y="2025 Payment (₦ Millions)") +
  theme_minimal(base_size=13) +
  theme(plot.title=element_text(face="bold"))

lorenz <- function(df, yr) {
  df %>% arrange(desc(amount)) %>%
    mutate(cum_clients = row_number()/n()*100,
           cum_rev     = cumsum(amount)/sum(amount)*100,
           year        = yr)
}
ldf <- bind_rows(lorenz(df25,"2025"), lorenz(df24,"2024"))

ggplot(ldf, aes(x=cum_clients, y=cum_rev, colour=year)) +
  geom_line(linewidth=1.3) +
  geom_abline(slope=1, intercept=0, linetype="dotted", colour="grey50") +
  annotate("rect", xmin=0, xmax=20, ymin=0, ymax=100,
           alpha=0.06, fill="gold") +
  annotate("text", x=10, y=55, label="Top 20%\nof clients",
           size=3.5, colour="darkgoldenrod3", fontface="bold") +
  scale_colour_manual(values=c("2024"="#2166AC","2025"="#D6604D")) +
  labs(title    = "Figure 10: Lorenz Curve — Revenue Concentration",
       subtitle = "The greater the bow away from the diagonal, the more concentrated revenue is",
       x="Cumulative % of Clients (largest first)",
       y="Cumulative % of Total Revenue", colour="Year") +
  theme_minimal(base_size=13) +
  theme(legend.position="top", plot.title=element_text(face="bold"))

8.4 Interpretation (for a non-technical manager)

There are two important results here. First, the Spearman correlation between a client’s rank and their payment amount is −1 in both years. This is expected, because rank is derived from payment size, so the statistic confirms only that the ranking is consistent. The degree of concentration is shown by the Lorenz curve, which indicates that the top 20% of clients contribute roughly 60–70% of total revenue. Second, for the clients who stayed with the firm in both years, there is a positive correlation between what they paid in 2024 and what they paid in 2025 (rho = 0.638, p < 0.0001, n = 40). This is encouraging: high-value retained clients tend to remain high-value, so investing in protecting those relationships is likely to have a reliable return.
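
A figure such as "the top 20% of clients" can be read off the Lorenz data directly instead of by eye. The helper below is a hypothetical sketch (the name `top_share` and the toy vector are introduced here, not taken from the report).

```r
# Share of total revenue (in %) contributed by the top fraction p of clients
top_share <- function(amount, p = 0.20) {
  k <- ceiling(length(amount) * p)   # number of clients in the top p
  sum(sort(amount, decreasing = TRUE)[seq_len(k)]) / sum(amount) * 100
}

# Hypothetical ten-client book of business, for illustration only
toy <- c(50, 20, 10, 5, 5, 4, 3, 1, 1, 1)

cat("Top 20% share:", top_share(toy), "%\n")
```

Applied to the report's data, a call such as `top_share(df25$amount)` would replace the approximate 60–70% reading with an exact percentage.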

9 Technique 5 — Regression

9.1 Theory

Regression analysis models the relationship between a dependent variable and one or more independent variables. Linear regression fits a straight line through the data that minimises the sum of squared residuals. Log-log (power-law) regression — where both variables are log-transformed before fitting — is particularly suited to data following a Pareto distribution, because the log transformation linearises the power-law decay. The model takes the form log(y) = β₀ + β₁·log(x), where β₁ (the slope) captures the rate at which payment amounts fall as client rank increases (Chambers, 1992). Simple linear regression is also used to predict 2025 payments from 2024 payments among retained clients.
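
The linearising effect of the log transformation can be seen on synthetic data. In the sketch below, the constant `C` and exponent `b` are hypothetical values chosen for illustration; the data are generated without noise so that the regression recovers them exactly.

```r
# Synthetic, noise-free power law: amount = C * rank^(-b)
# C and b are hypothetical values chosen for illustration
C <- 5e8
b <- 1.25
rank_i <- 1:92
amount <- C * rank_i^(-b)

fit       <- lm(log(amount) ~ log(rank_i))
intercept <- unname(coef(fit)[1])   # recovers log(C)
slope     <- unname(coef(fit)[2])   # recovers -b

# With slope -b, doubling the rank multiplies the payment by 2^(-b)
doubling_factor <- 2^slope

cat(sprintf("slope = %.3f | payment ratio when rank doubles = %.3f\n",
            slope, doubling_factor))
```

A slope near −1.25 therefore means that each doubling of client rank is associated with a payment of roughly 42% of the previous level.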

9.2 Business Justification

Regression serves two distinct purposes in this context. First, the power-law model quantifies precisely how concentrated the firm’s revenue is — the steeper the log-log slope, the more it depends on a small number of top clients, giving management a single number to track over time. Second, the year-on-year predictive regression allows the firm to build a simple early warning system: if a retained client’s 2025 revenue falls significantly below the model’s prediction (a large negative residual), that is a signal to investigate the relationship before the client is lost entirely.

9.3 Analysis

9.3.1 Q6 (Part 2) — Quantifying Concentration Risk

df25r <- df25 %>% mutate(log_amt=log(amount), log_rank=log(rank))
df24r <- df24 %>% mutate(log_amt=log(amount), log_rank=log(rank))

mod25 <- lm(log_amt ~ log_rank, data=df25r)
mod24 <- lm(log_amt ~ log_rank, data=df24r)

cat("=== POWER-LAW REGRESSION: log(Amount) ~ log(Rank) ===\n\n")

=== POWER-LAW REGRESSION: log(Amount) ~ log(Rank) ===

cat("--- 2025 ---\n"); print(summary(mod25))

--- 2025 ---


Call:
lm(formula = log_amt ~ log_rank, data = df25r)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.60539 -0.09903  0.18857  0.31622  0.39418 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 20.01355    0.19783  101.16   <2e-16 ***
log_rank    -1.25263    0.05386  -23.26   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4748 on 90 degrees of freedom
Multiple R-squared:  0.8574,    Adjusted R-squared:  0.8558 
F-statistic: 540.9 on 1 and 90 DF,  p-value: < 2.2e-16

cat("\n--- 2024 ---\n"); print(summary(mod24))


--- 2024 ---


Call:
lm(formula = log_amt ~ log_rank, data = df24r)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.08643 -0.01861  0.16569  0.28265  0.40947 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 19.81683    0.21894   90.51   <2e-16 ***
log_rank    -1.20864    0.05977  -20.22   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5237 on 89 degrees of freedom
Multiple R-squared:  0.8212,    Adjusted R-squared:  0.8192 
F-statistic: 408.8 on 1 and 89 DF,  p-value: < 2.2e-16

df25r$fitted <- exp(fitted(mod25))
df24r$fitted <- exp(fitted(mod24))

ggplot() +
  geom_point(data=df25r, aes(x=rank, y=amount/1e6), colour="#D6604D",
             alpha=0.45, size=2) +
  geom_line(data=df25r, aes(x=rank, y=fitted/1e6), colour="#D6604D",
            linewidth=1.2) +
  geom_point(data=df24r, aes(x=rank, y=amount/1e6), colour="#2166AC",
             alpha=0.45, size=2) +
  geom_line(data=df24r, aes(x=rank, y=fitted/1e6), colour="#2166AC",
            linewidth=1.2) +
  scale_y_continuous(labels=label_comma(suffix="M")) +
  annotate("text", x=65, y=max(df25r$amount)*0.85/1e6,
           label="Red = 2025 | Blue = 2024\nLines = power-law fit",
           size=3.5, colour="grey30") +
  labs(title    = "Figure 11: Power-Law Regression Fit (Pareto Concentration Model)",
       subtitle = paste0("2025: R² = ", round(summary(mod25)$r.squared,3),
                         "   |   2024: R² = ", round(summary(mod24)$r.squared,3)),
       x="Client Rank", y="Payment Amount (₦ Millions)") +
  theme_minimal(base_size=13) +
  theme(plot.title=element_text(face="bold"))

hhi  <- function(df) { s <- df$amount/sum(df$amount); sum(s^2) }
gini <- function(x) {
  n <- length(x); xs <- sort(x)
  2*sum((1:n)*xs)/(n*sum(xs)) - (n+1)/n
}

conc <- data.frame(
  Metric = c("Herfindahl-Hirschman Index (HHI)",
             "Gini Coefficient",
             "Top-1 client share (%)",
             "Top-5 clients share (%)",
             "Top-10 clients share (%)",
             "Clients needed for 80% of revenue"),
  `2024` = c(
    round(hhi(df24),4),
    round(gini(df24$amount),4),
    round(max(df24$amount)/total24*100,1),
    round(sum(sort(df24$amount,dec=T)[1:5])/total24*100,1),
    round(sum(sort(df24$amount,dec=T)[1:10])/total24*100,1),
    {cum<-cumsum(sort(df24$amount,dec=T));min(which(cum/sum(df24$amount)>=0.8))}
  ),
  `2025` = c(
    round(hhi(df25),4),
    round(gini(df25$amount),4),
    round(max(df25$amount)/total25*100,1),
    round(sum(sort(df25$amount,dec=T)[1:5])/total25*100,1),
    round(sum(sort(df25$amount,dec=T)[1:10])/total25*100,1),
    {cum<-cumsum(sort(df25$amount,dec=T));min(which(cum/sum(df25$amount)>=0.8))}
  )
)
kable(conc, caption="Table 6: Revenue Concentration Metrics — 2024 vs 2025",
      col.names=c("Concentration Metric","2024","2025"))

Table 6: Revenue Concentration Metrics — 2024 vs 2025
Concentration Metric	2024	2025
Herfindahl-Hirschman Index (HHI)	0.0559	0.0348
Gini Coefficient	0.6254	0.6137
Top-1 client share (%)	18.5000	9.1000
Top-5 clients share (%)	40.4000	32.6000
Top-10 clients share (%)	53.9000	50.3000
Clients needed for 80% of revenue	33.0000	32.0000
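
To put the HHI and Gini values in Table 6 on a scale, it helps to evaluate both measures at their extremes. The sketch below re-states the report's helpers as vector functions (the names `hhi_vec` and `gini_vec` are introduced here) and applies them to two hypothetical client books of 92 clients.

```r
# Vector versions of the report's hhi() and gini() helpers
hhi_vec <- function(x) { s <- x / sum(x); sum(s^2) }
gini_vec <- function(x) {
  n <- length(x); xs <- sort(x)
  2 * sum(seq_len(n) * xs) / (n * sum(xs)) - (n + 1) / n
}

equal_92   <- rep(1, 92)          # 92 clients paying identical amounts
one_client <- c(rep(0, 91), 1)    # all revenue from a single client

cat(sprintf("Equal shares: HHI = %.4f, Gini = %.4f\n",
            hhi_vec(equal_92), gini_vec(equal_92)))
cat(sprintf("One client:   HHI = %.4f, Gini = %.4f\n",
            hhi_vec(one_client), gini_vec(one_client)))
```

With 92 clients the HHI can range from about 0.0109 (perfectly equal shares) to 1 (a single payer), so the 2025 value of 0.0348 is roughly three times the equal-share floor but far from the upper end.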

9.3.2 Q7 — Which Legal Services Are Most Commercially Valuable?

pa_val <- bind_rows(df25 %>% mutate(yr=2025), df24 %>% mutate(yr=2024)) %>%
  filter(!practice_clean %in% c("General / Unclassified","Other Income")) %>%
  group_by(practice_clean, yr) %>%
  summarise(rev=sum(amount), n=n(), avg=mean(amount), .groups="drop")

pa_2yr <- pa_val %>% group_by(practice_clean) %>%
  summarise(total_2yr=sum(rev), total_n=sum(n), avg_per_client=mean(avg),
            .groups="drop") %>%
  arrange(desc(total_2yr))

# Regression: is practice area a significant predictor of payment size?
pa_reg_data <- bind_rows(df25,df24) %>%
  filter(!practice_clean %in% c("General / Unclassified","Other Income"),
         !is.na(practice_clean))
pa_reg_data$practice_f <- relevel(factor(pa_reg_data$practice_clean),
                                   ref="Banking & Finance")
mod_pa <- lm(log(amount) ~ practice_f, data=pa_reg_data)

cat("=== REGRESSION: log(Amount) ~ Practice Area ===\n")

=== REGRESSION: log(Amount) ~ Practice Area ===

cat("(Reference category: Banking & Finance)\n\n")

(Reference category: Banking & Finance)

print(summary(mod_pa))


Call:
lm(formula = log(amount) ~ practice_f, data = pa_reg_data)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.36431 -0.82864  0.00908  0.84104  2.93874 

Coefficients:
                             Estimate Std. Error t value Pr(>|t|)    
(Intercept)                  15.55578    0.32007  48.601   <2e-16 ***
practice_fCapital Markets    -0.02963    0.49208  -0.060   0.9521    
practice_fCorporate          -1.00030    0.69758  -1.434   0.1547    
practice_fM&A                 2.17175    1.28029   1.696   0.0929 .  
practice_fOil & Gas           0.63912    0.40801   1.566   0.1204    
practice_fPower & Energy      0.28748    0.37532   0.766   0.4455    
practice_fReal Estate        -0.10069    0.59880  -0.168   0.8668    
practice_fTax                 1.12047    1.28029   0.875   0.3836    
practice_fTechnology & Media -0.10214    0.54271  -0.188   0.8511    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.24 on 101 degrees of freedom
Multiple R-squared:  0.1031,    Adjusted R-squared:  0.03201 
F-statistic: 1.451 on 8 and 101 DF,  p-value: 0.185
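
Because the response is log(amount), each coefficient is a difference on the log scale and is easier to read once exponentiated. The sketch below does this for the Oil & Gas row, using only the estimate, standard error and residual degrees of freedom printed above; the variable names are introduced here for illustration.

```r
# Oil & Gas row copied from the regression output above
est      <- 0.63912
se       <- 0.40801
df_resid <- 101

t_val <- est / se
p_val <- 2 * pt(-abs(t_val), df_resid)
ci    <- est + c(-1, 1) * qt(0.975, df_resid) * se

# On the log scale, a coefficient b corresponds to a multiplicative effect
# exp(b) relative to the reference category (a ratio of geometric means)
ratio    <- exp(est)
ratio_ci <- exp(ci)

cat(sprintf("ratio vs Banking & Finance = %.2f, 95%% CI [%.2f, %.2f], p = %.3f\n",
            ratio, ratio_ci[1], ratio_ci[2], p_val))
```

The point estimate suggests Oil & Gas payments are typically about 1.9 times those in Banking & Finance, but the interval spans values below and above 1, so the data do not establish a difference at the 5% level.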

ggplot(pa_2yr,
       aes(x=reorder(str_wrap(practice_clean,20), total_2yr),
           y=total_2yr/1e6)) +
  geom_col(aes(fill=total_2yr/1e6), alpha=0.9, width=0.7,
           show.legend=FALSE) +
  scale_fill_gradient(low="#FEE08B", high="#1A9641") +
  geom_text(aes(label=paste0("₦",round(total_2yr/1e6,0),"M")),
            hjust=-0.1, size=3.5) +
  coord_flip() +
  scale_y_continuous(labels=label_comma(suffix="M"),
                     expand=expansion(mult=c(0,0.18))) +
  labs(title    = "Figure 12: 2-Year Combined Revenue by Practice Area (₦M)",
       subtitle = "Oil & Gas and Power & Energy are the firm's commercial engine",
       x=NULL, y="Combined Revenue 2024+2025 (₦ Millions)") +
  theme_minimal(base_size=13) +
  theme(plot.title=element_text(face="bold"))

# Predictive model: 2025 revenue from 2024 revenue (retained clients)
mod_yoy <- lm(amt25 ~ amt24, data=ret_df)
cat("=== REGRESSION: Predict 2025 Revenue from 2024 (Retained Clients) ===\n\n")

=== REGRESSION: Predict 2025 Revenue from 2024 (Retained Clients) ===

print(summary(mod_yoy))


Call:
lm(formula = amt25 ~ amt24, data = ret_df)

Residuals:
      Min        1Q    Median        3Q       Max 
-12461905  -8031676  -6962030  -1298451  81277291 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)   
(Intercept) 9.233e+06  2.981e+06   3.097  0.00366 **
amt24       2.355e-01  7.728e-02   3.047  0.00419 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 16970000 on 38 degrees of freedom
Multiple R-squared:  0.1964,    Adjusted R-squared:  0.1752 
F-statistic: 9.287 on 1 and 38 DF,  p-value: 0.004185

ret_df$pred25  <- fitted(mod_yoy)
ret_df$resid   <- residuals(mod_yoy)
ret_df$flag    <- if_else(ret_df$resid > 0,
                           "Above forecast", "Below forecast")

surprises <- ret_df %>%
  arrange(desc(abs(resid))) %>%
  head(10) %>%
  mutate(across(c(amt24,amt25,pred25,resid),
                ~format(round(.x), big.mark=","))) %>%
  select(Client=client,
         `2024 (₦)`=amt24, `2025 (₦)`=amt25,
         `Model Forecast (₦)`=pred25,
         `Residual (₦)`=resid, Direction=flag)

kable(surprises,
      caption="Table 7: Clients with Largest Forecast Deviation (2025 vs. Regression Prediction)")

Table 7: Clients with Largest Forecast Deviation (2025 vs. Regression Prediction)
Client	2024 (₦)	2025 (₦)	Model Forecast (₦)	Residual (₦)	Direction
Berlin Corporation Limited	3,325,000	91,293,236	10,015,945	81,277,291	Above forecast
Power Invest B.V	28,997,723	52,323,984	16,061,953	36,262,031	Above forecast
Africa Energy LLC	53,292,874	48,069,036	21,783,538	26,285,498	Above forecast
Lekki Integrated Limited	22,830,000	33,278,500	14,609,435	18,669,065	Above forecast
Bank UK Ltd	39,424,436	33,439,801	18,517,477	14,922,324	Above forecast
Canadian Consulting Group Ltd	98,753,229	20,027,691	32,489,597	-12,461,905	Below forecast
Investment Income	7,294,873	20,728,131	10,950,863	9,777,268	Above forecast
Aso Power Solutions Limited	20,415,024	4,522,137	14,040,700	-9,518,563	Below forecast
Larger Brothers Africa Plc	7,736,188	1,763,000	11,054,794	-9,291,794	Below forecast
Bank of Ghana Plc	2,005,814	855,000	9,705,273	-8,850,273	Below forecast
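
The early-warning idea can be sketched for a single client using the rounded coefficients from the model summary. The flagging rule below (a shortfall of more than one residual standard error) is a hypothetical threshold chosen for illustration, not one proposed in the report, and the rounded coefficients mean the forecast differs slightly from Table 7.

```r
# Coefficients and residual standard error copied from the model summary
b0    <- 9.233e6
b1    <- 0.2355
sigma <- 16970000

# One row from Table 7 (Canadian Consulting Group Ltd)
amt24 <- 98753229
amt25 <- 20027691

pred25  <- b0 + b1 * amt24
resid25 <- amt25 - pred25
z       <- resid25 / sigma   # residual in units of the residual standard error

# HYPOTHETICAL rule: flag when the shortfall exceeds one standard error
flagged <- z < -1

cat(sprintf("forecast = %.0f | residual = %.0f | z = %.2f | flagged = %s\n",
            pred25, resid25, z, flagged))
```

This client's payments fell by about 80%, yet the standardised residual is only about −0.73, largely because the residual standard error is inflated by a few very large positive residuals. A relative measure, such as percentage change or a model fitted on the log scale, may therefore be a more sensitive trigger.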

9.4 Interpretation (for a non-technical manager)

The power-law regression confirms mathematically that Legal Hub LLP’s revenue is concentrated among its top clients. The model fits well (R² = 0.82 in 2024 and 0.86 in 2025), meaning the sharp fall-off in payments from top to bottom clients is regular and predictable, which is the mathematical signature of a Pareto distribution. The slope is similar in both years (−1.21 in 2024, −1.25 in 2025). The HHI (Herfindahl-Hirschman Index), by contrast, fell from 0.0559 in 2024 to 0.0348 in 2025 and sits well below the 0.18 level typically classed as high concentration, so on that measure the client base is not dominated by any single payer. The concentration is instead visible in the Gini coefficient (0.61 in 2025) and in the client shares: the top five clients contributed 32.6% of 2025 revenue, and it takes only 32 of 92 clients to generate 80% of it.

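On practice area value, the regression does not show a statistically significant difference in payment size between practice areas: no coefficient is significant at the 5% level (Oil & Gas p = 0.12; Power & Energy p = 0.45), the overall model is not significant (F-test p = 0.185), and practice area explains only about 10% of the variation in log payments. The model contains no other control variables. The commercial importance of Oil & Gas and Power & Energy therefore rests on their combined revenue (Figure 12) rather than on demonstrably larger payments per client, and on that basis they still merit the most intensive business development investment. On prediction: the year-on-year model (R² = 0.20) suggests that about 20% of the variation in a retained client’s 2025 payment can be explained by their 2024 payment. That is a modest forecasting foundation, with important individual exceptions flagged in Table 7.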

10 Integrated Findings

10.1 How the Five Analyses Fit Together

Each of the five analytical techniques approached the same payment dataset from a different angle, and together they build a coherent and mutually reinforcing picture:

EDA established the foundational facts: two years of data, 91–92 clients per year, a total revenue base of approximately ₦1.1 billion per annum, with a highly skewed distribution. It surfaced the raw numbers that all other techniques refine.

Visualisation made the patterns legible: the retention waterfall showed that while headline revenue grew, the composition of that revenue changed dramatically. The Lorenz curve demonstrated concentration visually. The practice area comparison charts identified Oil & Gas and Power & Energy as the pillars of the business.

Hypothesis Testing provided statistical discipline: it prevented us from over-interpreting the modest revenue growth as a structural improvement. The tests found no significant change in the typical payment per client, which suggests that growth came from breadth, not depth.

Correlation revealed a critical structural feature: retained clients show consistent payment behaviour year-on-year (rho = 0.638), making the top-client relationships particularly valuable to protect. The rank-payment correlation of −1 served as a consistency check on the client ranking, while the Lorenz curve showed how steeply revenue concentrates among the largest clients.

Regression quantified both findings precisely — the power-law slope gives management a single number to track concentration over time, and the predictive model allows the firm to flag “at-risk” clients whose 2025 payments fell well below what their 2024 behaviour would have predicted.

10.2 The Single Recommendation They Collectively Support

Legal Hub LLP must execute a deliberate revenue diversification strategy whilst simultaneously deepening protection of its top-tier energy sector client relationships.

The data, across all five techniques, tell the same story: the firm is growing, but in a fragile way. The sharp fall in Yorkshire Petroleum’s dominant 2024 contribution is the clearest evidence. Had that decline not been offset by new entrants, total revenue would have fallen sharply. A firm where ten clients generate roughly half of revenue, where a single client once represented 18.5% of total income, and where fewer than half of 2024 clients returned in 2025 is not financially resilient. The recommendation is to set explicit targets for (a) the maximum share of revenue from any single client, (b) the minimum retention rate for top-30 clients, and (c) the number of new mid-market energy sector relationships to be developed annually. These targets should be tracked using the analytical framework built in this report.

11 Limitations & Further Work

11.1 Current Limitations

1. No time-series granularity. The dataset contains only annual totals — it does not show when during the year payments were made. Monthly payment data would enable cash flow analysis, seasonal pattern detection, and more sensitive early warning signals for at-risk clients.

2. No matter-level detail. Each record reflects a client’s total annual payment, not the individual matters or instructions. Understanding which services drove payment within a client relationship (e.g., how much of Atlantic Petroleum’s payment was litigation vs. transactional work) would significantly deepen the practice area analysis.

3. Practice area classification is partly inferred. Approximately 40% of records had no practice area recorded in the billing system. The rule-based classification applied here may misclassify some clients. A definitive tagging exercise by fee earners would improve the precision of the practice area analysis.

4. No client demographics or relationship data. The analysis cannot distinguish between clients who are long-term recurring relationships and those who engage the firm once for a specific transaction. Lifetime value analysis would require a longer time series and a “first engagement date” field.

5. Single firm, single currency. All revenue is denominated in Nigerian Naira. Without currency-adjusted or inflation-adjusted figures, the nominal growth of 1.6% between 2024 and 2025 cannot be interpreted in real terms — particularly relevant given Nigeria’s inflation environment.

11.2 What Would Be Done Differently with More Data, Time, or Computing Power

With more data, a 5–10 year payment history would enable proper time-series modelling (e.g., ARIMA), more robust client lifetime value calculations, and survival analysis to model client churn probabilities.

With more time, a cluster analysis (k-means or hierarchical) would segment clients into strategic groups — “anchor clients,” “growth clients,” and “transactional clients” — enabling tailored relationship management strategies for each segment. A network analysis of client-practice area co-occurrence could also reveal cross-selling opportunities.

With more computing power and data infrastructure, the analysis could be automated as a live dashboard (e.g., in R Shiny), pulling directly from the firm’s billing system and updating KPIs in real time. The predictive model could also be expanded to incorporate macroeconomic variables (oil price, power sector investment flows) as leading indicators of client revenue.

12 References

Chambers, J. M. (1992). Linear models. In J. M. Chambers & T. J. Hastie (Eds.), Statistical models in S (pp. 95–138). Wadsworth & Brooks/Cole.

Field, A. (2018). Discovering statistics using IBM SPSS statistics (5th ed.). SAGE Publications.

R Core Team. (2025). R: A language and environment for statistical computing (Version 4.5.2) [Computer software]. R Foundation for Statistical Computing. https://www.R-project.org/

Tufte, E. R. (2001). The visual display of quantitative information (2nd ed.). Graphics Press.

Tukey, J. W. (1977). Exploratory data analysis. Addison-Wesley.

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis (2nd ed.). Springer. https://ggplot2.tidyverse.org

Wickham, H., François, R., Henry, L., Müller, K., & Vaughan, D. (2023). dplyr: A grammar of data manipulation [R package]. https://CRAN.R-project.org/package=dplyr

Wickham, H., & Henry, L. (2023). tidyr: Tidy messy data [R package]. https://CRAN.R-project.org/package=tidyr

Wickham, H. (2023). readxl: Read Excel files [R package]. https://CRAN.R-project.org/package=readxl

Wickham, H., & Seidel, D. (2022). scales: Scale functions for visualisation [R package]. https://CRAN.R-project.org/package=scales

Wickham, H. (2023). stringr: Simple, consistent wrappers for common string operations [R package]. https://CRAN.R-project.org/package=stringr

Neuwirth, E. (2022). RColorBrewer: ColorBrewer palettes [R package]. https://CRAN.R-project.org/package=RColorBrewer

13 Appendix: AI Usage Statement

Claude (Anthropic’s large language model), accessed via the Claude Code interface, was used to assist with the coding and initial structure of this analysis. Specifically, the AI assisted with: (i) writing and debugging the R code for data cleaning, parsing the Excel workbook, and constructing the visualisation and modelling functions; (ii) identifying appropriate package functions for tasks such as the Wilcoxon test, power-law regression, and Lorenz curve construction; and (iii) generating the initial document skeleton for the Quarto .qmd file.

Independent analytical judgement was exercised throughout in the following areas: selecting which questions to investigate and why they are strategically relevant to Legal Hub LLP’s business; interpreting the statistical outputs in the context of a commercial law firm’s operating environment; drawing the integrated finding and strategic recommendation; assessing the limitations of the dataset and identifying the most meaningful avenues for further work; and verifying that the results produced by the code were consistent with the underlying data. All written interpretations, professional disclosures, and strategic commentary are the author’s own. The confidentiality disclaimer and anonymisation of client names were also decisions made independently by the author in line with professional conduct obligations.