All client and counterparty names appearing in the underlying dataset have been changed from their real identities for confidentiality and professional conduct purposes. The financial figures are real and have not been altered. This anonymisation was carried out prior to analysis and has no bearing on the integrity of the findings presented in this report. See Section 3.6 for the full ethical and confidentiality statement.
1 Executive Summary
Legal Hub LLP is a full-service commercial law firm whose continued financial health depends on understanding how client payment behaviour evolves from year to year. This report analyses the firm’s internal client payment records for the two full financial years 2024 and 2025, encompassing 91 and 92 client relationships respectively, with combined revenues of approximately ₦2.22 billion across the two years.
Using five analytical techniques — Exploratory Data Analysis (EDA), Visualisation, Hypothesis Testing, Correlation, and Regression — the report addresses seven business questions ranging from revenue trends and client retention to practice area performance and concentration risk.
Key findings reveal that total revenue grew modestly by approximately 1.6% year-on-year, but this headline growth conceals significant turbulence: the firm’s single largest client in 2024 (accounting for 18.5% of total revenue) reduced its payments by roughly 75% in 2025, and fewer than half of 2024 clients returned. Revenue is heavily skewed towards the largest payers: the top five clients contributed about a third of 2025 income (32.6%, per Table 5), creating meaningful dependency risk. Oil & Gas and Power & Energy are the largest practice areas by total revenue in both years.
The central recommendation is that the firm must urgently diversify its client base while intensifying relationship management with its top-tier energy sector clients. Continued reliance on a small number of high-value mandates in a single sector represents a material strategic risk.
2 Professional Disclosure
Name: Anthony Ezeamama
Job Title: Partner
Organisation: Legal Hub LLP
Organisation Type: Commercial Law Firm (Professional Services)
Sector: Legal Services: (1) Oil & Gas; (2) Power; (3) Banking & Finance; (4) Capital Markets; (5) M&A; (6) Technology and Digital Economy; (7) Corporate & Commercial; (8) Legal Tax Advisory; (9) Private Equity; and (10) Real Estate & Construction
As a Partner at Legal Hub LLP, I hold both client-facing and firm management responsibilities. My role encompasses originating and leading client engagements, overseeing the profitability of my client portfolio, and contributing to the firm’s strategic planning as a member of its management leadership. The five analytical techniques applied in this report are directly relevant to my professional practice in the following ways:
Exploratory Data Analysis (EDA): As a Partner responsible for firm revenue, I regularly review payment summaries and client account reports. EDA formalises this review process — it gives me a systematic, evidence-based picture of who is paying, how much, and in what pattern. This is operationally critical during annual budget reviews, partner profit-sharing discussions, and client retention planning. Rather than relying on intuition about which clients matter most, EDA surfaces the data to back those judgements.
Visualisation: Law firm management involves presenting financial intelligence to non-technical colleagues — fellow partners, the management committee, and occasionally institutional lenders. Visualisation translates raw payment data into charts and graphs that enable faster, more aligned decision-making in those rooms. In my experience, a well-designed chart of client revenue concentration has more influence on a partner meeting than a spreadsheet of numbers.
Hypothesis Testing: Law firms frequently make year-end assessments of whether financial performance has materially improved or declined. Without formal testing, these judgements are impressionistic. Hypothesis testing gives me a principled basis for asserting — with a stated confidence level — whether observed changes in revenue are statistically meaningful or simply reflect normal variation. This is important when making arguments to the management committee about resource reallocation or headcount changes.
Correlation Analysis: Understanding how a client’s payment behaviour in one year predicts their behaviour the following year helps the firm plan its cash flow and prioritise relationship investment. Correlation also helps identify whether particular practice areas tend to produce consistently high-value client relationships, which informs our lateral hiring and business development strategy.
Regression Analysis: Revenue forecasting is a core planning tool for any professional services firm. Regression allows me to build a predictive model of what individual client revenues are likely to be in the next period, based on past behaviour. The power-law model also gives me a precise quantification of how concentrated our revenue is — a figure I can present to the firm’s management committee to motivate a deliberate client diversification strategy.
3 Data Collection & Sampling
3.1 Source
The data analysed in this report is drawn from Legal Hub LLP’s internal practice management and billing system. Specifically, the dataset records net client payments received by the firm during the calendar years 2024 and 2025. The data was extracted by the firm’s finance function in the form of a structured spreadsheet (Payment Analysis 2.xlsx), with one worksheet per year.
3.2 Collection Method
The data was compiled from the firm’s accounts receivable ledger, which records all amounts received from clients against invoices raised for legal services. “Net payment” refers to amounts actually received (not invoiced), net of any credit notes or fee adjustments. The data was extracted administratively rather than through survey or primary research.
3.3 Sampling Frame & Sample Size
The dataset is a census, not a sample — it covers all clients who made at least one net payment to Legal Hub LLP during the relevant year. There was no sampling process; every paying client relationship in each year is included.
| Year | Number of Clients | Total Net Revenue (₦) |
|------|-------------------|-----------------------|
| 2024 | 91 clients | See Section 5 |
| 2025 | 92 clients | See Section 5 |
3.4 Time Period Covered
2024 dataset: 1 January 2024 to 31 December 2024
2025 dataset: 1 January 2025 to 31 December 2025
3.5 Variables Available
Each record contains: client rank (by payment size), client name, net payment amount (₦), percentage of total revenue, and practice area classification.
3.6 Ethical Notes & Confidentiality Statement
This analysis was conducted by a Partner of the firm and falls within the legitimate internal management use of client financial data. No external disclosure of identifiable client information has been made. All client and counterparty names appearing in the underlying dataset have been changed from their real identities for confidentiality and professional conduct purposes. The financial figures are real and have not been altered. This anonymisation was carried out prior to analysis and has no bearing on the integrity of the findings presented in this report. The firm’s data governance policies and professional conduct obligations under the relevant bar rules have been observed throughout.
The table below describes every variable retained for analysis after cleaning.
# kable() comes from the knitr package
var_dict <- data.frame(
  `Variable` = c("year", "rank", "client", "amount", "pct", "practice", "practice_clean"),
  `Type` = c(
    "Categorical (factor)", "Integer", "Character", "Numeric (₦)",
    "Numeric (proportion)", "Character", "Categorical (factor)"
  ),
  `Description` = c(
    "Financial year in which payment was received (2024 or 2025)",
    "Client rank within the year, ordered from largest to smallest payer (1 = highest)",
    "Anonymised client name (real names withheld — see Section 3.6)",
    "Total net payment received from client during the year, in Nigerian Naira (₦)",
    "Client's payment as a proportion of the firm's total annual revenue (0–1 scale)",
    "Practice area label as recorded in the billing system (partially complete)",
    "Cleaned and harmonised practice area label, derived from billing record or client name"
  ),
  `Missing Values` = c("None", "None", "None", "None", "Minimal", "~40% in each year", "Derived — none")
)

kable(
  var_dict,
  caption = "Table 1: Variable Dictionary",
  col.names = c("Variable", "Type", "Description", "Missing Values")
)
Table 1: Variable Dictionary

| Variable | Type | Description | Missing Values |
|----------|------|-------------|----------------|
| year | Categorical (factor) | Financial year in which payment was received (2024 or 2025) | None |
| rank | Integer | Client rank within the year, ordered from largest to smallest payer (1 = highest) | None |
| client | Character | Anonymised client name (real names withheld — see Section 3.6) | None |
| amount | Numeric (₦) | Total net payment received from client during the year, in Nigerian Naira (₦) | None |
| pct | Numeric (proportion) | Client’s payment as a proportion of the firm’s total annual revenue (0–1 scale) | Minimal |
| practice | Character | Practice area label as recorded in the billing system (partially complete) | ~40% in each year |
| practice_clean | Categorical (factor) | Cleaned and harmonised practice area label, derived from billing record or client name | Derived — none |
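Table 2: Distributional Summary of Client Payment Amounts by Year

| Year | n (clients) | Total (₦) | Mean (₦) | Median (₦) | Std Dev (₦) | Min (₦) | Max (₦) | Skewness |
|------|-------------|-----------|----------|------------|-------------|---------|---------|----------|
| 2024 | 91 | 1,102,750,681 | 12,118,139 | 5,525,000 | 24,638,521 | 215,000 | 204,000,000 | 0.268 |
| 2025 | 92 | 1,120,421,932 | 12,178,499 | 5,656,250 | 18,164,971 | 342,500 | 101,768,156 | 0.359 |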
The data exhibits strong positive skewness in both years (mean substantially exceeds median), confirming that a small number of very large client payments pull the average upward. This is the hallmark of a Pareto-distributed variable and motivates the use of both parametric and non-parametric techniques in subsequent sections.
Practice area coverage is partially complete in the raw billing data. Where the practice area field was blank, it was inferred from the client name using a rule-based classification. The resulting practice_clean variable is used throughout all subsequent analysis.
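A rule-based fill of this kind can be sketched as below. This is only an illustration: the keyword patterns and the helper names `classify_practice` and `fill_practice` are hypothetical, and the rules actually used to build practice_clean are not reproduced in this report.

```r
# Hypothetical keyword rules, for illustration only. Simple substring
# patterns like these can over-match, so real rules would need manual review.
classify_practice <- function(client_name) {
  name <- tolower(client_name)
  if (grepl("petroleum|oil|gas", name)) {
    "Oil & Gas"
  } else if (grepl("power|energy|grid", name)) {
    "Power & Energy"
  } else if (grepl("bank", name)) {
    "Banking & Finance"
  } else if (grepl("securities|capital", name)) {
    "Capital Markets"
  } else {
    "General / Unclassified"
  }
}

# Keep the billing-system label where present; infer only where it is blank.
fill_practice <- function(practice, client) {
  inferred <- vapply(client, classify_practice, character(1), USE.NAMES = FALSE)
  blank <- is.na(practice) | trimws(practice) == ""
  ifelse(blank, inferred, practice)
}

# Toy records using anonymised names that appear in Table 5.
toy <- data.frame(
  client = c("Atlantic Petroleum Limited", "Grid Energy Company Limited",
             "Holdings NGA Limited", "Bank UK Ltd"),
  practice = c(NA, "", NA, "Banking & Finance"),
  stringsAsFactors = FALSE
)
toy$practice_clean <- fill_practice(toy$practice, toy$client)
print(toy)
```

Names that match no rule fall through to "General / Unclassified", which is why that category still appears in the later tables.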
5 Technique 1 — Exploratory Data Analysis (EDA)
5.1 Theory
Exploratory Data Analysis, introduced by Tukey (1977), is the process of summarising and interrogating a dataset to understand its structure, distributions, central tendencies, spread, and anomalies before any formal modelling. EDA is a prerequisite for all other techniques: it ensures the analyst understands what the data contains, identifies quality issues, and generates hypotheses for further testing. Key tools include summary statistics (mean, median, standard deviation, skewness), frequency distributions, and missing-data audits.
5.2 Business Justification
Before drawing conclusions about revenue trends, client behaviour, or practice area performance, it is essential to understand the shape and quality of the data. EDA reveals whether our measures are meaningful (e.g. whether the mean is an appropriate summary statistic given skewness), flags data gaps that could bias conclusions, and quantifies the magnitude of client payment differences. For a law firm partner, EDA is the analytical equivalent of reading the financial statements before attending a budget meeting.
5.3 Analysis
5.3.1 Q1 — Are Payment Patterns Progressive?
# label_comma() comes from the scales package
ggplot(df_all, aes(x = amount / 1e6, fill = year)) +
  geom_histogram(bins = 30, alpha = 0.72, position = "identity") +
  scale_fill_manual(values = c("2024" = "#2166AC", "2025" = "#D6604D")) +
  scale_x_continuous(labels = label_comma(suffix = "M")) +
  labs(
    title = "Figure 1: Distribution of Client Payment Amounts (₦M)",
    subtitle = "Highly right-skewed in both years — a small number of clients dominate revenue",
    x = "Payment Amount (₦ Millions)",
    y = "Number of Clients",
    fill = "Year"
  ) +
  theme_minimal(base_size = 13) +
  theme(legend.position = "top", plot.title = element_text(face = "bold"))

ggplot(df_all, aes(x = rank, y = amount / 1e6, colour = year)) +
  geom_point(alpha = 0.55, size = 2.2) +
  scale_colour_manual(values = c("2024" = "#2166AC", "2025" = "#D6604D")) +
  scale_y_continuous(labels = label_comma(suffix = "M")) +
  labs(
    title = "Figure 2: Client Rank vs. Payment Amount",
    subtitle = "Payments fall sharply from rank 1 and flatten to a long tail — not a progressive (linear) pattern",
    x = "Client Rank (1 = largest payer)",
    y = "Payment Amount (₦ Millions)",
    colour = "Year"
  ) +
  theme_minimal(base_size = 13) +
  theme(legend.position = "top", plot.title = element_text(face = "bold"))
The data shows that Legal Hub LLP’s client payment pattern is strongly skewed, not progressive. A progressive pattern would mean payments declining gradually and fairly evenly from one client rank to the next; instead, there is a sharp drop-off after the top few clients. In 2024, the single largest client accounted for nearly a fifth of all revenue (18.5%). Most clients pay relatively modest amounts, clustered towards the lower end. This pattern of a few large payers and many small ones is common in professional services but creates vulnerability. Total revenue grew slightly from 2024 to 2025 (about 1.6%), while the average payment per client was essentially unchanged (₦12.12 million against ₦12.18 million), meaning most of the growth came from having one more paying client rather than from earning more per client.
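The growth arithmetic behind this interpretation can be checked directly from the totals and client counts in Table 2. The sketch below uses only those published figures; the variable and helper names are illustrative.

```r
# Figures taken from Table 2 (totals in Naira, client counts per year).
total <- c(`2024` = 1102750681, `2025` = 1120421932)
n_clients <- c(`2024` = 91, `2025` = 92)
mean_pay <- total / n_clients

# Percentage change from 2024 to 2025 for any named two-year vector.
pct_change <- function(x) unname((x["2025"] / x["2024"] - 1) * 100)

growth <- c(
  total   = pct_change(total),
  clients = pct_change(n_clients),
  mean    = pct_change(mean_pay)
)
print(round(growth, 1))

# Share of 2024 revenue from the largest 2024 payment (₦204,000,000).
top_share_2024 <- 204000000 / total[["2024"]] * 100
print(round(top_share_2024, 1))
```

Total revenue rises by about 1.6%, of which about 1.1 percentage points come from the extra client and about 0.5 from a marginally higher mean payment.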
6 Technique 2 — Visualisation
6.1 Theory
Data visualisation is the graphical representation of information to enable the human eye and brain to detect patterns, trends, and anomalies that are difficult to perceive in tabular form. Tufte (2001) emphasises that good visualisations should maximise the “data-to-ink ratio” — conveying the most insight with the least visual complexity. Core chart types used here include bar charts (for comparison), Lorenz curves (for inequality/concentration), and waterfall charts (for decomposition of change).
6.2 Business Justification
Law firm partners and management committees routinely need to understand performance at a glance. Visualisation translates the revenue data into formats suitable for strategic discussions — which clients we are winning and losing, which practice areas are growing, and how concentrated our revenue base truly is. Charts can expose problems that summary statistics obscure: the waterfall chart, for example, reveals that beneath modest overall revenue growth lies a significant churn story.
6.3 Analysis
6.3.1 Q2 — Which Clients Came Back? Which Did Not?
Table 5: Top 15 Premium Clients — Strategic Priority List

| Client | Practice | 2024 Rev (₦) | 2025 Rev (₦) | % of Total | YoY Δ | Status |
|--------|----------|--------------|--------------|------------|-------|--------|
| Atlantic Petroleum Limited | Oil & Gas | - | 101,768,156 | 9.1% | N/A | New ★ |
| Berlin Corporation Limited | Oil & Gas | 3,325,000 | 91,293,236 | 8.1% | 2645.7% | Retained ✓ |
| Corner Oil Limited | Oil & Gas | - | 63,352,768 | 5.7% | N/A | New ★ |
| Grid Energy Company Limited | Power & Energy | - | 56,390,906 | 5% | N/A | New ★ |
| Power Invest B.V | Power & Energy | 28,997,723 | 52,323,984 | 4.7% | 80.4% | Retained ✓ |
| Yorkshire Petroleum | Oil & Gas | 204,000,000 | 50,000,000 | 4.5% | -75.5% | Retained ✓ |
| Africa Energy LLC | Power & Energy | 53,292,874 | 48,069,036 | 4.3% | -9.8% | Retained ✓ |
| Bank UK Ltd | Banking & Finance | 39,424,436 | 33,439,801 | 3% | -15.2% | Retained ✓ |
| Lekki Integrated Limited | General / Unclassified | 22,830,000 | 33,278,500 | 3% | 45.8% | Retained ✓ |
| Power and Energy Limited | Power & Energy | - | 33,250,000 | 3% | N/A | New ★ |
| Holdings NGA Limited | General / Unclassified | - | 32,395,000 | 2.9% | N/A | New ★ |
| Assets Securities Limited | Capital Markets | - | 21,070,000 | 1.9% | N/A | New ★ |
| Investment Income | Other Income | 7,294,873 | 20,728,131 | 1.9% | 184.1% | Retained ✓ |
| Canadian Consulting Group Ltd | Power & Energy | 98,753,229 | 20,027,691 | 1.8% | -79.7% | Retained ✓ |
| Smith and Jones Nigeria Limited | Oil & Gas | - | 19,932,176 | 1.8% | N/A | New ★ |
6.3.3 Q4 — Practice Area Performance
# Revenue and client count per practice area, excluding unclassified clients
pa25 <- df25 %>%
  filter(practice_clean != "General / Unclassified") %>%
  group_by(practice_clean) %>%
  summarise(rev25 = sum(amount), clients25 = n(), .groups = "drop")

pa24 <- df24 %>%
  filter(practice_clean != "General / Unclassified") %>%
  group_by(practice_clean) %>%
  summarise(rev24 = sum(amount), .groups = "drop")

# Practice areas present in only one year get zero revenue for the other
pa_comp <- full_join(pa25, pa24, by = "practice_clean") %>%
  mutate(across(c(rev25, rev24), ~ replace_na(.x, 0))) %>%
  pivot_longer(c(rev25, rev24), names_to = "year", values_to = "revenue") %>%
  mutate(year = recode(year, rev24 = "2024", rev25 = "2025"))

ggplot(
  pa_comp,
  aes(x = reorder(str_wrap(practice_clean, 20), revenue), y = revenue / 1e6, fill = year)
) +
  geom_col(position = "dodge", alpha = 0.87, width = 0.72) +
  coord_flip() +
  scale_fill_manual(values = c("2024" = "#2166AC", "2025" = "#D6604D")) +
  scale_y_continuous(labels = label_comma(suffix = "M")) +
  labs(
    title = "Figure 6: Practice Area Revenue — 2024 vs 2025 (₦M)",
    subtitle = "Oil & Gas and Power & Energy dominate in both years",
    x = NULL, y = "Revenue (₦ Millions)", fill = "Year"
  ) +
  theme_minimal(base_size = 13) +
  theme(legend.position = "top", plot.title = element_text(face = "bold"))
6.4 Interpretation (for a non-technical manager)
The charts tell a clear story across three dimensions. First, on client retention: of the 91 clients who paid in 2024, only 40 (44%) returned in 2025. This is not an alarm signal on its own — some client relationships are naturally transactional and one-off — but it underscores the need to distinguish between sticky, recurring clients and one-time engagements. The revenue bridge shows that new clients broadly replaced the revenue from those who did not return, resulting in modest net growth; however, beneath this lies significant churn. Second, on premium clients: the top 15 clients are overwhelmingly in Oil & Gas and Power & Energy, and several are new relationships that have not yet been tested for longevity. Third, on practice areas: Oil & Gas and Power & Energy together account for the majority of fee income in both years; Banking & Finance and M&A are declining contributors.
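The retained / lost / new split behind the retention figure can be sketched with set operations on client names. The client labels below are hypothetical placeholders, and the approach assumes a client's name is spelled identically in both years' worksheets.

```r
# Hypothetical client lists, for illustration only.
clients_2024 <- c("Client A", "Client B", "Client C", "Client D", "Client E")
clients_2025 <- c("Client B", "Client D", "Client F")

retained <- intersect(clients_2024, clients_2025)  # paid in both years
lost     <- setdiff(clients_2024, clients_2025)    # paid in 2024 only
new      <- setdiff(clients_2025, clients_2024)    # paid in 2025 only

# Retention rate is measured against the 2024 client base.
retention_rate <- length(retained) / length(clients_2024) * 100
cat(sprintf("Retained %d of %d (%.0f%%)\n",
            length(retained), length(clients_2024), retention_rate))

# The reported figure: 40 of the 91 clients from 2024 returned in 2025.
reported_rate <- 40 / 91 * 100
cat(sprintf("Reported retention: %.0f%%\n", reported_rate))
```

Inconsistent spellings of the same client across years would understate retention, so name harmonisation matters before this step.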
7 Technique 3 — Hypothesis Testing
7.1 Theory
Hypothesis testing is a formal statistical procedure for determining whether an observed result (such as a change in revenue) is likely to reflect a real underlying difference, or could simply have arisen by chance. The process involves stating a null hypothesis (H₀, the “no-change” position), computing a test statistic from sample data, and comparing it against a critical value at a chosen significance level (typically α = 0.05). The Welch two-sample t-test is used when comparing means of two independent groups with potentially unequal variances. The Wilcoxon rank-sum test is the non-parametric equivalent, appropriate when data are skewed (as here). Both tests are applied to ensure robustness (Field, 2018).
7.2 Business Justification
Observing that total revenue increased by approximately ₦17.7 million from 2024 to 2025 (the difference between the two totals in Table 2) does not, by itself, prove that the firm’s financial position has genuinely improved. Random year-to-year fluctuations in client payments could produce a similar-sized change. Hypothesis testing asks: given the variability we observe in individual client payments, is the difference between 2024 and 2025 large enough to be convincingly real? This is directly relevant to whether management committee decisions about hiring, investment, or partner draws should be made on the basis of the revenue trend.
7.3 Analysis
7.3.1 Q5 — Is Revenue Growing or Declining?
cat("=== WELCH TWO-SAMPLE t-TEST ===\n")
=== WELCH TWO-SAMPLE t-TEST ===
cat("H0: Mean client payment in 2025 = Mean client payment in 2024\n")
H0: Mean client payment in 2025 = Mean client payment in 2024
cat("Ha: Mean client payment in 2025 ≠ Mean client payment in 2024\n")
Ha: Mean client payment in 2025 ≠ Mean client payment in 2024
cat("Significance level: α = 0.05\n\n")
Significance level: α = 0.05
# Welch test: var.equal = FALSE allows the two years to have different variances
t_res <- t.test(df25$amount, df24$amount, alternative = "two.sided", var.equal = FALSE)
cat(sprintf(
  "2024: n = %d, Mean = ₦%s, SD = ₦%s\n",
  nrow(df24),
  format(round(mean(df24$amount)), big.mark = ","),
  format(round(sd(df24$amount)), big.mark = ",")
))
2024: n = 91, Mean = ₦12,118,139, SD = ₦24,638,521
cat(sprintf(
  "2025: n = %d, Mean = ₦%s, SD = ₦%s\n",
  nrow(df25),
  format(round(mean(df25$amount)), big.mark = ","),
  format(round(sd(df25$amount)), big.mark = ",")
))
2025: n = 92, Mean = ₦12,178,499, SD = ₦18,164,971
# The y-axis limit drops the top 4% of payments before the boxes are drawn
ggplot(df_all, aes(x = year, y = amount / 1e6, fill = year)) +
  geom_boxplot(alpha = 0.72, outlier.shape = 21, outlier.size = 2.5) +
  scale_fill_manual(values = c("2024" = "#2166AC", "2025" = "#D6604D")) +
  scale_y_continuous(
    labels = label_comma(suffix = "M"),
    limits = c(0, quantile(df_all$amount, 0.96) / 1e6)
  ) +
  labs(
    title = "Figure 8: Boxplot of Client Payments by Year (top 4% trimmed)",
    subtitle = "Median and IQR are similar; 2024 had a more extreme upper outlier (Yorkshire Petroleum)",
    x = "Year", y = "Payment Amount (₦ Millions)"
  ) +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none", plot.title = element_text(face = "bold"))
7.4 Interpretation (for a non-technical manager)
The t-test and Wilcoxon test both compare the individual client payment amounts across 2024 and 2025, not just the firm’s total revenue. The result is that there is no statistically significant difference in the average client payment between the two years (Welch t-test p = 0.985, far above the 5% significance threshold). In plain terms: the overall revenue figures grew modestly (up 1.6%), but this growth is largely explained by having one more client in 2025 and by several new mid-tier clients replacing Yorkshire Petroleum’s outsized 2024 income. The firm is not materially richer on a per-client basis; the same level of revenue is now spread over a slightly broader client base.
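The reported p-value can be reproduced from the summary statistics printed in Section 7.3.1, without access to the client-level data. The sketch below applies the standard Welch formulas; the helper name `welch_from_summary` is illustrative and is not part of the original analysis code.

```r
# Welch two-sample t-test computed from means, standard deviations and counts.
welch_from_summary <- function(m1, s1, n1, m2, s2, n2) {
  v1 <- s1^2 / n1                      # squared standard error, group 1
  v2 <- s2^2 / n2                      # squared standard error, group 2
  t_stat <- (m2 - m1) / sqrt(v1 + v2)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  p <- 2 * pt(-abs(t_stat), df)        # two-sided p-value
  list(t = t_stat, df = df, p = p)
}

# 2024 and 2025 figures as printed in the report output.
res <- welch_from_summary(
  m1 = 12118139, s1 = 24638521, n1 = 91,
  m2 = 12178499, s2 = 18164971, n2 = 92
)
cat(sprintf("t = %.4f, df = %.1f, p = %.3f\n", res$t, res$df, res$p))
```

The mean difference of roughly ₦60,000 is tiny next to a standard error of about ₦3.2 million, which is why the test statistic is close to zero.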
8 Technique 4 — Correlation
8.1 Theory
Correlation measures the strength and direction of the linear (or monotonic) association between two variables. Pearson’s r is appropriate when both variables are normally distributed; Spearman’s rho (a rank-based version) is more robust and is preferred here given the skewed, non-normal nature of payment data. A correlation coefficient of +1 indicates a perfect positive relationship, −1 a perfect inverse relationship, and 0 no linear association. A p-value below 0.05 indicates the observed correlation is unlikely to have arisen by chance (Field, 2018).
8.2 Business Justification
Two correlations are of direct business relevance here. First, the correlation between a client’s rank and the size of their payment quantifies how steeply revenue concentrates around top clients — the more negative this correlation, the more concentrated the firm’s revenue profile. Second, the year-on-year correlation among retained clients tells us whether the clients who paid most in 2024 also tended to pay most in 2025 — which, if strong, suggests that client-level revenues are predictable and that protecting top-tier relationships is the single most important financial management action.
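Both correlations can be sketched on toy data. The amounts and client labels below are hypothetical; only the method (Spearman's rho, with the year-on-year test restricted to clients present in both years) follows the description above.

```r
# Hypothetical payments, for illustration only.
pay_2024 <- data.frame(client = c("A", "B", "C", "D", "E"),
                       amount_2024 = c(50, 20, 8, 5, 1))
pay_2025 <- data.frame(client = c("A", "B", "C", "D", "F"),
                       amount_2025 = c(30, 25, 4, 6, 9))

# (1) Rank vs. amount: rank is derived from amount, so with no ties the
# Spearman correlation is -1 by construction.
client_rank <- rank(-pay_2024$amount_2024)
rho_rank <- cor(client_rank, pay_2024$amount_2024, method = "spearman")

# (2) Year-on-year: an inner join keeps only clients who paid in both years.
retained <- merge(pay_2024, pay_2025, by = "client")
yoy <- cor.test(retained$amount_2024, retained$amount_2025,
                method = "spearman", exact = FALSE)
rho_yoy <- unname(yoy$estimate)

cat(sprintf("rank vs amount rho = %.2f\n", rho_rank))
cat(sprintf("year-on-year rho = %.2f (n = %d retained)\n", rho_yoy, nrow(retained)))
```

Because the first correlation is close to -1 whenever rank is computed from the same amounts, the year-on-year correlation is the more informative of the two for planning purposes.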
8.3 Analysis
8.3.1 Q6 (Part 1) — Client Concentration: How Dependent Are We on a Few Clients?
cat("=== SPEARMAN CORRELATION: Client Rank vs. Payment Amount ===\n\n")
=== SPEARMAN CORRELATION: Client Rank vs. Payment Amount ===
cat(sprintf("rho = %.4f | p = %.4f\n\n", cyoy$estimate, cyoy$p.value))
rho = 0.6383 | p = 0.0000
if (cyoy$p.value < 0.05) {
  cat("Significant positive correlation: clients who paid more in 2024\n")
  cat("tended to pay more in 2025. Revenue is somewhat predictable.\n")
}
Significant positive correlation: clients who paid more in 2024
tended to pay more in 2025. Revenue is somewhat predictable.
# Cumulative share of clients and revenue, with clients ordered largest first
lorenz <- function(df, yr) {
  df %>%
    arrange(desc(amount)) %>%
    mutate(
      cum_clients = row_number() / n() * 100,
      cum_rev = cumsum(amount) / sum(amount) * 100,
      year = yr
    )
}

ldf <- bind_rows(lorenz(df25, "2025"), lorenz(df24, "2024"))

ggplot(ldf, aes(x = cum_clients, y = cum_rev, colour = year)) +
  geom_line(linewidth = 1.3) +
  geom_abline(slope = 1, intercept = 0, linetype = "dotted", colour = "grey50") +
  annotate("rect", xmin = 0, xmax = 20, ymin = 0, ymax = 100, alpha = 0.06, fill = "gold") +
  annotate("text", x = 10, y = 55, label = "Top 20%\nof clients",
           size = 3.5, colour = "darkgoldenrod3", fontface = "bold") +
  scale_colour_manual(values = c("2024" = "#2166AC", "2025" = "#D6604D")) +
  labs(
    title = "Figure 10: Lorenz Curve — Revenue Concentration",
    subtitle = "The greater the bow away from the diagonal, the more concentrated revenue is",
    x = "Cumulative % of Clients (largest first)",
    y = "Cumulative % of Total Revenue", colour = "Year"
  ) +
  theme_minimal(base_size = 13) +
  theme(legend.position = "top", plot.title = element_text(face = "bold"))
8.4 Interpretation (for a non-technical manager)
There are two important results here. First, there is an extremely strong negative correlation between a client’s rank and their payment amount in both years (Spearman rho ≈ −0.95). Because rank is itself assigned by payment size, a strongly negative rank correlation is expected by construction, so this result mainly confirms what the histograms showed visually: the firm’s revenue falls off very steeply after the top-ranked clients. The Lorenz curve is the more informative measure, and it shows that the top 20% of clients contribute roughly 60–70% of total revenue. Second, for the clients who stayed with the firm in both years, there is a moderately strong positive correlation between what they paid in 2024 and what they paid in 2025 (rho = 0.638, p < 0.001). This is encouraging: high-value retained clients tend to remain high-value, so investing in protecting those relationships is likely to have a reliable return.
9 Technique 5 — Regression
9.1 Theory
Regression analysis models the relationship between a dependent variable and one or more independent variables. Linear regression fits a straight line through the data that minimises the sum of squared residuals. Log-log (power-law) regression — where both variables are log-transformed before fitting — is particularly suited to data following a Pareto distribution, because the log transformation linearises the power-law decay. The model takes the form log(y) = β₀ + β₁·log(x), where β₁ (the slope) captures the rate at which payment amounts fall as client rank increases (Chambers, 1992). Simple linear regression is also used to predict 2025 payments from 2024 payments among retained clients.
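A minimal sketch of the log-log fit is shown below on synthetic data. The scale (2e8), the exponent (-1.2) and the small deterministic wobble are arbitrary assumptions chosen for illustration; they are not estimates from the firm's data.

```r
# Synthetic rank-size data following an approximate power law (hypothetical).
client_rank <- 1:90
amount <- 2e8 * client_rank^(-1.2) * exp(0.1 * sin(client_rank))

# Log-transforming both sides turns the power law into a straight line:
# log(amount) = b0 + b1 * log(rank), with b1 the decay exponent.
fit <- lm(log(amount) ~ log(client_rank))
slope <- unname(coef(fit)[2])
r_squared <- summary(fit)$r.squared

cat(sprintf("Estimated slope (b1): %.3f\n", slope))
cat(sprintf("R-squared: %.3f\n", r_squared))
```

A slope further below zero means payments fall away faster as rank increases, which is the sense in which the slope serves as a single concentration figure to track over time.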
9.2 Business Justification
Regression serves two distinct purposes in this context. First, the power-law model quantifies precisely how concentrated the firm’s revenue is — the steeper the log-log slope, the more it depends on a small number of top clients, giving management a single number to track over time. Second, the year-on-year predictive regression allows the firm to build a simple early warning system: if a retained client’s 2025 revenue falls significantly below the model’s prediction (a large negative residual), that is a signal to investigate the relationship before the client is lost entirely.
Call:
lm(formula = log(amount) ~ practice_f, data = pa_reg_data)

Residuals:
     Min       1Q   Median       3Q      Max
-2.36431 -0.82864  0.00908  0.84104  2.93874

Coefficients:
                             Estimate Std. Error t value Pr(>|t|)
(Intercept)                  15.55578    0.32007  48.601   <2e-16 ***
practice_fCapital Markets    -0.02963    0.49208  -0.060   0.9521
practice_fCorporate          -1.00030    0.69758  -1.434   0.1547
practice_fM&A                 2.17175    1.28029   1.696   0.0929 .
practice_fOil & Gas           0.63912    0.40801   1.566   0.1204
practice_fPower & Energy      0.28748    0.37532   0.766   0.4455
practice_fReal Estate        -0.10069    0.59880  -0.168   0.8668
practice_fTax                 1.12047    1.28029   0.875   0.3836
practice_fTechnology & Media -0.10214    0.54271  -0.188   0.8511
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.24 on 101 degrees of freedom
Multiple R-squared:  0.1031,	Adjusted R-squared:  0.03201
F-statistic: 1.451 on 8 and 101 DF,  p-value: 0.185
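Because the dependent variable is log(amount), each coefficient is easier to read once exponentiated into a multiplier relative to the reference practice area (the level omitted from the coefficient table). The sketch below does this for the two energy-sector rows, using the estimates and standard errors printed above; the interval uses the usual t-based formula.

```r
# Estimates and standard errors copied from the regression output above.
est <- c(`Oil & Gas` = 0.63912, `Power & Energy` = 0.28748)
se  <- c(`Oil & Gas` = 0.40801, `Power & Energy` = 0.37532)
df_resid <- 101

# exp(coefficient) = ratio of typical payment vs. the reference category.
multiplier <- exp(est)

# 95% confidence interval on the log scale, then back-transformed.
crit <- qt(0.975, df_resid)
ci_low  <- exp(est - crit * se)
ci_high <- exp(est + crit * se)

print(round(cbind(multiplier, ci_low, ci_high), 2))
```

Both intervals span 1 (no difference from the reference category), consistent with the non-significant p-values for these rows and with the overall F-test p-value of 0.185.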
ggplot(
  pa_2yr,
  aes(x = reorder(str_wrap(practice_clean, 20), total_2yr), y = total_2yr / 1e6)
) +
  geom_col(aes(fill = total_2yr / 1e6), alpha = 0.9, width = 0.7, show.legend = FALSE) +
  scale_fill_gradient(low = "#FEE08B", high = "#1A9641") +
  geom_text(aes(label = paste0("₦", round(total_2yr / 1e6, 0), "M")), hjust = -0.1, size = 3.5) +
  coord_flip() +
  scale_y_continuous(
    labels = label_comma(suffix = "M"),
    expand = expansion(mult = c(0, 0.18))  # extra room on the right for the bar labels
  ) +
  labs(
    title = "Figure 12: 2-Year Combined Revenue by Practice Area (₦M)",
    subtitle = "Oil & Gas and Power & Energy are the firm's commercial engine",
    x = NULL, y = "Combined Revenue 2024+2025 (₦ Millions)"
  ) +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))
Table 7: Clients with Largest Forecast Deviation (2025 vs. Regression Prediction)

| Client | 2024 (₦) | 2025 (₦) | Model Forecast (₦) | Residual (₦) | Direction |
|--------|----------|----------|--------------------|--------------|-----------|
| Berlin Corporation Limited | 3,325,000 | 91,293,236 | 10,015,945 | 81,277,291 | Above forecast |
| Power Invest B.V | 28,997,723 | 52,323,984 | 16,061,953 | 36,262,031 | Above forecast |
| Africa Energy LLC | 53,292,874 | 48,069,036 | 21,783,538 | 26,285,498 | Above forecast |
| Lekki Integrated Limited | 22,830,000 | 33,278,500 | 14,609,435 | 18,669,065 | Above forecast |
| Bank UK Ltd | 39,424,436 | 33,439,801 | 18,517,477 | 14,922,324 | Above forecast |
| Canadian Consulting Group Ltd | 98,753,229 | 20,027,691 | 32,489,597 | -12,461,905 | Below forecast |
| Investment Income | 7,294,873 | 20,728,131 | 10,950,863 | 9,777,268 | Above forecast |
| Aso Power Solutions Limited | 20,415,024 | 4,522,137 | 14,040,700 | -9,518,563 | Below forecast |
| Larger Brothers Africa Plc | 7,736,188 | 1,763,000 | 11,054,794 | -9,291,794 | Below forecast |
| Bank of Ghana Plc | 2,005,814 | 855,000 | 9,705,273 | -8,850,273 | Below forecast |
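The forecasts in Table 7 lie on a single straight line, so the fitted line can be recovered from any two rows and checked against a third. The sketch below does this using only figures printed in the table; the recovered coefficients are approximate because the table values are rounded, and the variable names are illustrative.

```r
# Two (2024 payment, model forecast) pairs from Table 7:
# Berlin Corporation Limited and Power Invest B.V.
pay_2024 <- c(3325000, 28997723)
forecast_2025 <- c(10015945, 16061953)

# Slope and intercept of the line through those two points.
slope <- diff(forecast_2025) / diff(pay_2024)
intercept <- forecast_2025[1] - slope * pay_2024[1]
predict_2025 <- function(p) intercept + slope * p

cat(sprintf("Implied model: forecast = %.0f + %.4f x (2024 payment)\n",
            intercept, slope))

# Check against a third row: Canadian Consulting Group Ltd.
canadian_forecast <- predict_2025(98753229)
canadian_residual <- 20027691 - canadian_forecast  # actual minus forecast
cat(sprintf("Canadian Consulting: forecast = %.0f, residual = %.0f\n",
            canadian_forecast, canadian_residual))
```

A slope of about 0.24 means each extra Naira paid in 2024 raises the 2025 forecast by only about a quarter of a Naira, which is why large 2024 payers can fall sharply and still sit fairly close to the model's prediction.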
9.4 Interpretation (for a non-technical manager)
The power-law regression confirms that Legal Hub LLP’s revenue is concentrated in its top-ranked clients. The model fits very well (R² > 0.85 in both years), meaning the fall-off in payments from top to bottom clients is regular and predictable, which is the mathematical signature of a Pareto-type distribution. The HHI (Herfindahl-Hirschman Index) of 0.0348 in 2025, however, is well below the 0.18 level typically classed as high concentration, so by that standard economic measure the client base as a whole is not highly concentrated; an HHI of 0.0348 is equivalent to revenue spread evenly across roughly 29 clients (1 / 0.0348). The exposure lies at the top of the ranking rather than across the base. In practical terms: it takes only 32 of the 92 clients to generate 80% of the firm’s 2025 revenue.
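The concentration figures can be sketched as follows. The `hhi` helper is illustrative (the report's own HHI code is not shown); the 2025 amounts are the fifteen payments listed in Table 5 and the total is taken from Table 2.

```r
# Herfindahl-Hirschman Index: sum of squared revenue shares (0 to 1 scale).
hhi <- function(amount) {
  share <- amount / sum(amount)
  sum(share^2)
}

# Sanity check: four equal clients give an HHI of 1/4.
equal_four <- hhi(rep(1, 4))

# "Effective number" of equal-sized clients implied by the reported HHI.
effective_n <- 1 / 0.0348

# Top-15 payments for 2025 (Table 5) and the 2025 total (Table 2).
top15_2025 <- c(101768156, 91293236, 63352768, 56390906, 52323984,
                50000000, 48069036, 33439801, 33278500, 33250000,
                32395000, 21070000, 20728131, 20027691, 19932176)
total_2025 <- 1120421932

top5_share <- sum(top15_2025[1:5]) / total_2025 * 100
top15_share <- sum(top15_2025) / total_2025 * 100

# Contribution of the top 15 alone to the HHI (a lower bound on the full index).
hhi_lower_bound <- sum((top15_2025 / total_2025)^2)

cat(sprintf("Effective number of clients: %.1f\n", effective_n))
cat(sprintf("Top 5 share: %.1f%% | Top 15 share: about %.0f%%\n",
            top5_share, top15_share))
cat(sprintf("HHI contribution of top 15: %.4f\n", hhi_lower_bound))
```

Most of the reported HHI comes from the top 15 clients, which supports reading the index alongside top-client shares rather than in isolation.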
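On practice area value, the regression of log payment on practice area does not show statistically significant differences in typical client payment between practice areas: the overall F-test is not significant (p = 0.185), the model explains only about 10% of the variation (adjusted R² = 0.03), and the Oil & Gas and Power & Energy coefficients (p = 0.12 and p = 0.45) cannot be distinguished from zero. The model contains practice area only, so it does not control for any other client characteristic. Oil & Gas and Power & Energy nevertheless generate the largest total revenue (Figure 12), driven by a handful of very large mandates, and on that basis they deserve the most intensive business development investment. On prediction: the year-on-year model (R² = 0.2) suggests that only about 20% of the variation in a retained client’s 2025 payment can be explained by their 2024 payment. That is a modest starting point for forecasting rather than a precise one, with important individual exceptions flagged in Table 7.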
10 Integrated Findings
10.1 How the Five Analyses Fit Together
Each of the five analytical techniques approached the same payment dataset from a different angle, and together they build a coherent and mutually reinforcing picture:
EDA established the foundational facts: two years of data, 91–92 clients per year, a total revenue base of approximately ₦1.1 billion per annum, with a highly skewed distribution. It surfaced the raw numbers that all other techniques refine.
Visualisation made the patterns legible: the retention waterfall showed that while headline revenue grew, the composition of that revenue changed dramatically. The Lorenz curve demonstrated concentration visually. The practice area comparison charts identified Oil & Gas and Power & Energy as the pillars of the business.
Hypothesis Testing provided statistical discipline: it prevented us from over-interpreting the modest revenue growth as a structural improvement. The tests confirm that the mean payment per client did not significantly increase — growth came from breadth, not depth.
Correlation revealed two critical structural features: first, that revenue concentration is not random but follows a highly ordered, predictable Pareto decay; and second, that retained clients show consistent payment behaviour year-on-year, making the top-client relationships particularly valuable to protect.
Regression quantified both findings precisely — the power-law slope gives management a single number to track concentration over time, and the predictive model allows the firm to flag “at-risk” clients whose 2025 payments fell well below what their 2024 behaviour would have predicted.
10.2 The Single Recommendation They Collectively Support
Legal Hub LLP must execute a deliberate revenue diversification strategy whilst simultaneously deepening protection of its top-tier energy sector client relationships.
The data, across all five techniques, tells the same story: the firm is growing, but it is growing in a fragile way. The loss of most of Yorkshire Petroleum’s dominant 2024 contribution is the clearest evidence. Had that decline not been offset by new entrants, total revenue would have fallen sharply. A firm where about a third of clients (32 of 92) generate 80% of revenue, where a single client once represented 18.5% of total income, and where fewer than half of clients return each year is not financially resilient. The recommendation is to set explicit targets for (a) the maximum share of revenue from any single client, (b) the minimum retention rate for top-30 clients, and (c) the number of new mid-market energy sector relationships to be developed annually. These targets should be tracked using the analytical framework built in this report.
11 Limitations & Further Work
11.1 Current Limitations
1. No time-series granularity. The dataset contains only annual totals — it does not show when during the year payments were made. Monthly payment data would enable cash flow analysis, seasonal pattern detection, and more sensitive early warning signals for at-risk clients.
2. No matter-level detail. Each record reflects a client’s total annual payment, not the individual matters or instructions. Understanding which services drove payment within a client relationship (e.g., how much of Atlantic Petroleum’s payment was litigation vs. transactional work) would significantly deepen the practice area analysis.
3. Practice area classification is partly inferred. Approximately 40% of records had no practice area recorded in the billing system. The rule-based classification applied here may misclassify some clients. A definitive tagging exercise by fee earners would improve the precision of the practice area analysis.
4. No client demographics or relationship data. The analysis cannot distinguish between clients who are long-term recurring relationships and those who engage the firm once for a specific transaction. Lifetime value analysis would require a longer time series and a “first engagement date” field.
5. Single firm, single currency. All revenue is denominated in Nigerian Naira. Without currency-adjusted or inflation-adjusted figures, the nominal growth of 1.6% between 2024 and 2025 cannot be interpreted in real terms — particularly relevant given Nigeria’s inflation environment.
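The inflation point in limitation 5 can be made concrete with the standard deflator identity, (1 + real) = (1 + nominal) / (1 + inflation). The sketch below applies it to the report's nominal growth of approximately 1.6%. The inflation rates are hypothetical scenarios chosen for illustration, not official statistics, and a proper adjustment would use a published price index for the relevant period.

```r
# Convert nominal growth to real growth using the deflator identity:
# (1 + real) = (1 + nominal) / (1 + inflation)
real_growth <- function(nominal, inflation) {
  (1 + nominal) / (1 + inflation) - 1
}

nominal_growth <- 0.016                      # the report's approximate nominal growth
inflation_scenarios <- c(0.10, 0.20, 0.30)   # hypothetical rates, not official figures

scenario_table <- data.frame(
  inflation   = inflation_scenarios,
  real_growth = real_growth(nominal_growth, inflation_scenarios)
)
print(scenario_table)
```

Under any inflation rate above the nominal growth rate, real growth is negative, which is why the headline figure should not be read as an increase in purchasing power.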
11.2 What Would Be Done Differently with More Data, Time, or Computing Power
With more data, a 5–10 year payment history would enable proper time-series modelling (e.g., ARIMA), more robust client lifetime value calculations, and survival analysis to model client churn probabilities.
With more time, a cluster analysis (k-means or hierarchical) would segment clients into strategic groups — “anchor clients,” “growth clients,” and “transactional clients” — enabling tailored relationship management strategies for each segment. A network analysis of client-practice area co-occurrence could also reveal cross-selling opportunities.
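A minimal sketch of the k-means option, using base R's `kmeans()`, is shown below. The two features (`log_payment` and `years_retained`) and all values are synthetic assumptions. The current dataset has no retention-length field, so a real segmentation would depend on the longer payment history described above.

```r
# Synthetic client features, invented purely for illustration.
clients <- data.frame(
  log_payment    = c(seq(1.8, 2.2, length.out = 10),
                     seq(4.8, 5.2, length.out = 10),
                     seq(7.8, 8.2, length.out = 10)),
  years_retained = rep(c(1, 3, 6), each = 10)
)

# Standardise so that neither feature dominates the distance calculation.
features <- scale(clients)

# k-means depends on random starting centres: fix the seed and use several starts.
set.seed(42)
km <- kmeans(features, centers = 3, nstart = 25)
clients$segment <- km$cluster

# Profile each segment by its average payment size and relationship length.
segment_profile <- aggregate(
  cbind(log_payment, years_retained) ~ segment,
  data = clients,
  FUN = mean
)
print(segment_profile)
```

The cluster numbers are arbitrary labels. Mapping them to names such as "anchor", "growth", or "transactional" is a judgement made after inspecting the segment profile, and the choice of three clusters is itself an assumption that should be checked against the data.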
With more computing power and data infrastructure, the analysis could be automated as a live dashboard (e.g., in R Shiny), pulling directly from the firm’s billing system and updating KPIs in real time. The predictive model could also be expanded to incorporate macroeconomic variables (oil price, power sector investment flows) as leading indicators of client revenue.
12 References
Chambers, J. M. (1992). Linear models. In J. M. Chambers & T. J. Hastie (Eds.), Statistical models in S (pp. 95–138). Wadsworth & Brooks/Cole.
Field, A. (2018). Discovering statistics using IBM SPSS statistics (5th ed.). SAGE Publications.
R Core Team. (2025). R: A language and environment for statistical computing (Version 4.5.2) [Computer software]. R Foundation for Statistical Computing. https://www.R-project.org/
Tufte, E. R. (2001). The visual display of quantitative information (2nd ed.). Graphics Press.
Tukey, J. W. (1977). Exploratory data analysis. Addison-Wesley.
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis (2nd ed.). Springer. https://ggplot2.tidyverse.org
Wickham, H., François, R., Henry, L., Müller, K., & Vaughan, D. (2023). dplyr: A grammar of data manipulation [R package]. https://CRAN.R-project.org/package=dplyr
Wickham, H., & Henry, L. (2023). tidyr: Tidy messy data [R package]. https://CRAN.R-project.org/package=tidyr
Wickham, H. (2023). readxl: Read Excel files [R package]. https://CRAN.R-project.org/package=readxl
Wickham, H., & Seidel, D. (2022). scales: Scale functions for visualisation [R package]. https://CRAN.R-project.org/package=scales
Wickham, H. (2023). stringr: Simple, consistent wrappers for common string operations [R package]. https://CRAN.R-project.org/package=stringr
Neuwirth, E. (2022). RColorBrewer: ColorBrewer palettes [R package]. https://CRAN.R-project.org/package=RColorBrewer
13 Appendix: AI Usage Statement
Claude (Anthropic’s large language model), accessed via the Claude Code interface, was used to assist with the coding and initial structure of this analysis. Specifically, the AI assisted with: (i) writing and debugging the R code for data cleaning, parsing the Excel workbook, and constructing the visualisation and modelling functions; (ii) identifying appropriate package functions for tasks such as the Wilcoxon test, power-law regression, and Lorenz curve construction; and (iii) generating the initial document skeleton for the Quarto .qmd file.
Independent analytical judgement was exercised throughout in the following areas: selecting which questions to investigate and why they are strategically relevant to Legal Hub LLP’s business; interpreting the statistical outputs in the context of a commercial law firm’s operating environment; drawing the integrated finding and strategic recommendation; assessing the limitations of the dataset and identifying the most meaningful avenues for further work; and verifying that the results produced by the code were consistent with the underlying data. All written interpretations, professional disclosures, and strategic commentary are the author’s own. The confidentiality disclaimer and anonymisation of client names were also decisions made independently by the author in line with professional conduct obligations.