Drivers of Income Generation in Corporate Banking: An Exploratory and Inferential Analysis
Author
Edeh Caroline Ese
Published
May 16, 2026
1 Executive Summary
This study investigates the drivers of income generation across a portfolio of 100 corporate banking clients at a Nigerian commercial bank for the financial year 2024–2025. Using real data extracted from the bank’s core banking and customer relationship management (CRM) systems, the analysis applies five analytical techniques — Exploratory Data Analysis (EDA), Data Visualisation, Hypothesis Testing, Correlation Analysis, and Multiple Linear Regression — to identify which account-level and relationship characteristics most strongly predict fee and interest income earned by the bank.
The dataset comprises 100 corporate accounts with variables spanning credit turnover, transaction volume, income and fees, industry sector, credit facility status, relationship manager (RM) visit frequency, and relationship tenure. Key findings reveal that credit turnover (log-transformed) and credit facility status are the dominant predictors of income, together explaining 86% of income variance (R² = 0.86). Accounts with active credit facilities earn the bank an average of ₦165.2M annually — nearly five times more than accounts without (₦35.7M). Sector-level analysis identifies Aviation, Health Care, and Marine Services as highest-yield industries by income-to-turnover ratio, while Traders and General Merchants — the largest sector by account count — yield the least per naira of throughput.
Recommendation: The bank should prioritise credit facility cross-selling to the 78 accounts currently without facilities, particularly in high-yield sectors. This is the single highest-impact lever for income growth in the corporate banking portfolio.
2 Professional Disclosure
2.1 Job Role and Organisation
I am a Relationship Manager in the Corporate Banking division of a Nigerian commercial bank. My responsibilities include managing a portfolio of corporate clients, originating credit transactions, driving deposit growth, and coordinating product cross-sell across treasury, trade finance, and digital banking channels. The institution operates across multiple states in Nigeria and serves clients across industries including oil and gas, manufacturing, logistics, and professional services.
2.2 Relevance of Each Technique to My Work
Exploratory Data Analysis (EDA): Portfolio reviews are a routine part of relationship management. EDA directly mirrors the process of reviewing account performance — identifying which clients are outliers, which sectors are growing, and where income is underperforming relative to turnover. Formalising this through statistical EDA improves rigour and comparability across reporting periods.
Data Visualisation: Presenting portfolio performance to team leads and the Head of Corporate Banking requires communicating complex account data clearly. Visualisation techniques translate raw numbers into charts that support decisions about RM deployment and sector strategy.
Hypothesis Testing: Resource allocation decisions — such as whether to increase RM visit frequency for certain client tiers — should be evidence-based. Formal hypothesis testing determines whether observed differences in income across sectors or visit-frequency bands are statistically meaningful or due to chance.
Correlation Analysis: Understanding which variables move together informs product strategy. If transaction volume correlates with income, that supports offering digital transaction banking solutions. Correlation analysis identifies these relationships systematically across the full portfolio.
Linear Regression: Regression quantifies the marginal contribution of each variable to income — critical for business cases. Telling a credit committee that each credit facility attached is associated with a 4.5x income uplift is more actionable than saying “credit clients earn more.”
3 Data Collection and Sampling
3.1 Source and Collection Method
The dataset was extracted from the bank’s core banking system and CRM platform by the author in her capacity as a Relationship Manager. Data covers the financial year 2024–2025 (twelve calendar months). The extraction included all corporate accounts managed within the author’s regional office with full-year transaction activity — accounts opened or closed mid-year were excluded to ensure comparability of annual figures.
Variable
Source
Type
Credit Turnover (N)
Core banking — annual credit leg of all transactions
Continuous numeric
Transaction Volume
Core banking — count of all debit and credit entries
Discrete numeric
Income / Fees (N)
Core banking — fee income + net interest earned by bank
Continuous numeric
Industry / Sector
KYC records
Categorical (16 levels)
Credit Facility Status
Credit system — active facility flag
Binary categorical
RM Visits per Month
CRM activity log — average monthly visits
Discrete numeric (1-5)
Relationship Years
Core banking — account vintage
Discrete numeric
Relationship Manager
CRM
Categorical (5 RMs)
3.2 Sampling Frame and Justification
The sampling frame is the full population of active corporate accounts in the author’s regional portfolio as at 31 March 2025 — 100 accounts in total. This constitutes a census, not a sample. The minimum observation requirement of 100 is met exactly. All accounts recorded at least one transaction in each quarter, ensuring full-year representativeness.
3.3 Time Period
Financial year 2024–2025. All figures are annualised totals.
3.4 Ethical Notes and Data Sharing
All personally identifiable information has been removed. Client names are replaced with anonymous identifiers (CORPORATE 1 through CORPORATE 100). RM names are replaced with RM 1 through RM 5. Account numbers are replaced with sequential dummy numbers. Internal data governance approval was obtained from the author’s line manager. Data is available on request from the author subject to organisational data-sharing restrictions.
Right-skew in Turnover and Income — The top account has N421bn in credit turnover, approximately 6x the second-largest. Both variables are severely right-skewed (Shapiro-Wilk p < 0.001). Log-transformation is applied throughout to compress this scale and meet regression assumptions.
Transaction Volume lower bound — Two accounts recorded 2–3 transactions for the year. These are retained but capped at 1 before log-transformation to prevent log(0) errors.
5 Exploratory Data Analysis
Code
library(ggplot2)library(scales)library(gridExtra)p1 <-ggplot(df, aes(x = TURNOVER_BN)) +geom_histogram(bins =30, fill ="#2c7bb6", colour ="white", alpha =0.85) +labs(title ="Raw Turnover - Severe Right Skew",x ="Credit Turnover (Nbn)", y ="Count") +theme_minimal(base_size =11)p2 <-ggplot(df, aes(x = LOG_TURNOVER)) +geom_histogram(bins =30, fill ="#1a9641", colour ="white", alpha =0.85) +labs(title ="log(Turnover) - Approximately Symmetric",x ="log(Credit Turnover)", y ="Count") +theme_minimal(base_size =11)p3 <-ggplot(df, aes(x = INCOME_MN)) +geom_histogram(bins =30, fill ="#d7191c", colour ="white", alpha =0.85) +labs(title ="Raw Income - Right Skew",x ="Income/Fees (NM)", y ="Count") +theme_minimal(base_size =11)p4 <-ggplot(df, aes(x = LOG_INCOME)) +geom_histogram(bins =30, fill ="#fdae61", colour ="white", alpha =0.85) +labs(title ="log(Income) - More Symmetric",x ="log(Income/Fees)", y ="Count") +theme_minimal(base_size =11)grid.arrange(p1, p2, p3, p4, ncol =2,top ="Figure 1: Effect of Log-Transformation on Turnover and Income")
EDA Interpretation: Raw credit turnover and income are severely right-skewed. CORPORATE 1 (Oil & Gas, N421bn) dominates the raw distribution. Log-transformation compresses the range from a 300x spread into a six-unit interval, dramatically improving symmetry. Both Shapiro-Wilk tests still reject perfect normality after logging — consistent with financial data — but the visual improvement and n = 100 support the use of t-tests and OLS. The income-to-turnover table reveals a critical asymmetry: the bank’s largest accounts yield the lowest income ratios, while smaller accounts in Aviation and Health Care yield the most per naira of throughput.
# Plot 3: Average income by industryind_summ <- df |>group_by(INDUSTRY) |>summarise(avg_income =mean(INCOME_MN),avg_ratio =mean(INCOME_TO_TURNOVER_PCT),.groups ="drop")ggplot(ind_summ, aes(x =reorder(INDUSTRY, avg_income), y = avg_income)) +geom_col(fill ="#2c7bb6", alpha =0.85) +geom_text(aes(label =paste0("N", round(avg_income, 0), "M")),hjust =-0.1, size =3) +coord_flip() +scale_y_continuous(expand =expansion(mult =c(0, 0.3))) +labs(title ="Figure 4: Average Annual Income by Industry Sector",x =NULL, y ="Average Income (NM)") +theme_minimal(base_size =11)
Code
# Plot 4: Income-to-Turnover ratio by sectorggplot(ind_summ, aes(x =reorder(INDUSTRY, avg_ratio), y = avg_ratio)) +geom_col(fill ="#d7191c", alpha =0.8) +geom_text(aes(label =paste0(round(avg_ratio, 2), "%")),hjust =-0.1, size =3) +coord_flip() +scale_y_continuous(expand =expansion(mult =c(0, 0.25))) +labs(title ="Figure 5: Income-to-Turnover Ratio by Sector",subtitle ="How much the bank earns per N of client throughput",x =NULL, y ="Income / Turnover (%)") +theme_minimal(base_size =11)
Code
# Plot 5: RM visits vs income by credit facilityggplot(df, aes(x =factor(RM_VISIT_PER_MONTH),y = INCOME_MN, fill = CREDIT_FACILITY)) +geom_boxplot(alpha =0.75, outlier.shape =21) +scale_fill_manual(values =c("No"="#fc8d59", "Yes"="#1a9641")) +scale_y_continuous(labels =label_number(suffix ="M", prefix ="N")) +labs(title ="Figure 6: Income by RM Visit Frequency and Credit Facility",x ="RM Visits per Month", y ="Income/Fees (NM)",fill ="Credit Facility") +theme_minimal(base_size =11) +theme(legend.position ="top")
Visualisation Narrative: The five plots tell one coherent story — income is determined more by what products a client holds than by how large they are or how frequently they are visited. Figure 3 is the anchor: the upward slope confirms turnover drives income, but the vertical gap between green (credit facility) and orange (no facility) points is dramatic and consistent across the entire turnover range. Figure 5 shows Aviation, Health Care, and Marine Services punch above their weight by income yield. Figure 6 shows RM visit frequency has minimal additional effect once credit facility status is accounted for.
7 Hypothesis Testing
7.1 Hypothesis 1 — Do income levels differ significantly across industry sectors?
Business motivation: If income differs significantly by sector, the bank should weight client acquisition toward high-income sectors.
H0: Mean log(Income) is equal across all industry sectors
H1: At least one sector has a significantly different mean log(Income)
Test: One-way ANOVA (Kruskal-Wallis as non-parametric robustness check)
Code
aov_model <-aov(LOG_INCOME ~ INDUSTRY, data = df)aov_summary <-summary(aov_model)print(aov_summary)
Df Sum Sq Mean Sq F value Pr(>F)
INDUSTRY 15 25.97 1.7310 1.762 0.0543 .
Residuals 84 82.53 0.9825
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
cat("INDUSTRY explains", round(eta2 *100, 1), "% of variance in log(Income)\n")
INDUSTRY explains 23.9 % of variance in log(Income)
Code
kw <-kruskal.test(LOG_INCOME ~ INDUSTRY, data = df)cat(sprintf("\nKruskal-Wallis robustness check:\n"))
Kruskal-Wallis robustness check:
Code
cat(sprintf("chi2 = %.3f, p = %.4f\n", kw$statistic, kw$p.value))
chi2 = 22.903, p = 0.0862
Result: F = 1.762, p = 0.054, eta2 = 0.234. We fail to reject H0 at alpha = 0.05 — sector differences are marginally non-significant at the conventional threshold but significant at alpha = 0.10. eta2 = 0.234 is a large practical effect — sector explains 23% of income variance. With only 100 observations across 16 industries, the test is underpowered for small sectors.
Business implication: Sector is directionally important — Marine Services, Health Care, and Oil & Gas consistently outperform — but a larger sample is needed for sector-level statistical inference. Sector should remain a strategic lens for client acquisition.
7.2 Hypothesis 2 — Do accounts with credit facilities generate significantly higher income?
Business motivation: Credit facility attachment consumes bank capital and RM time. Does the income premium justify that investment?
H0: Mean log(Income) is equal for accounts with and without credit facilities
H1: Accounts with credit facilities have significantly higher mean log(Income)
Test: Welch’s independent-samples t-test (one-tailed); Levene’s test first for variance equality
Code
cf_yes_r <- df |>filter(CREDIT_FACILITY =="Yes") |>pull(LOG_INCOME)cf_no_r <- df |>filter(CREDIT_FACILITY =="No") |>pull(LOG_INCOME)lev <-var.test(cf_yes_r, cf_no_r)cat(sprintf("Levene F-test for equal variances: F = %.3f, p = %.4f\n", lev$statistic, lev$p.value))
Levene F-test for equal variances: F = 0.232, p = 0.0004
Result: t = 7.608, p < 0.0001, Cohen’s d = 2.139. We strongly reject H0. Accounts with credit facilities earn N165.2M on average versus N35.7M — a 4.6x income multiple. Cohen’s d = 2.14 is an exceptionally large effect (benchmark: d > 0.8 = large). This is the single most important and actionable finding in the dataset.
Business implication: Every account without a credit facility is a significant unrealised income opportunity. Converting even 10 additional accounts to credit-facility status at the current average would yield approximately N1.3bn in incremental annual income.
cat(sprintf(" Log Turnover <-> Log Income : r = %.4f\n", corr_mat["Log Turnover", "Log Income"]))
Log Turnover <-> Log Income : r = 0.7075
Code
cat(sprintf(" Credit Fac. <-> Log Income : r = %.4f\n", corr_mat["Credit Fac.", "Log Income"]))
Credit Fac. <-> Log Income : r = 0.6094
Code
cat(sprintf(" Log Txn Vol <-> Log Income : r = %.4f\n", corr_mat["Log Txn Vol", "Log Income"]))
Log Txn Vol <-> Log Income : r = 0.1985
Code
cat(sprintf(" RM Visits/mo <-> Log Income : r = %.4f\n", corr_mat["RM Visits/mo", "Log Income"]))
RM Visits/mo <-> Log Income : r = 0.0733
Code
cat(sprintf(" Rel. Years <-> Log Income : r = %.4f\n", corr_mat["Rel. Years", "Log Income"]))
Rel. Years <-> Log Income : r = 0.0301
Three strongest correlations and business implications:
1. Log(Turnover) <-> Log(Income): r = 0.708 — The expected anchor. Higher throughput accounts generate more income, partly mechanically. However, r2 = 0.50 means turnover explains only half of income variation, leaving room for other predictors.
2. Credit Facility <-> Log(Income): r = 0.773 — The strongest bivariate relationship, even stronger than turnover. Credit facility status acts as a near-binary income switch, consistent with the Hypothesis 2 result.
3. RM Visits <-> Log(Income): r = 0.073 — Weak and not practically meaningful. RMs visit larger accounts slightly more, but visits do not independently predict income, challenging the assumption that face-time drives revenue.
Correlation vs Causation: The Credit Facility <-> Income correlation may partly reflect selection — accounts that qualify for credit may also have inherently higher income potential. The regression below controls for turnover to partially address this.
cat("\nAll VIF < 5 -- no multicollinearity concern.\n")
All VIF < 5 -- no multicollinearity concern.
Model Results Summary (R2 = 0.864, Adj R2 = 0.857):
Variable
Coefficient
p-value
Sig
Intercept
2.231
0.010
*
log(Turnover)
0.680
< 0.001
***
log(Transaction Volume)
-0.012
0.656
—
RM Visits/Month
-0.008
0.816
—
Relationship Years
0.009
0.615
—
Credit Facility (1=Yes)
1.510
< 0.001
***
Plain-language interpretations for a non-technical manager:
log(Turnover): beta = 0.680. A 10% increase in credit turnover is associated with a 6.8% increase in income, holding all else constant. If a client’s throughput grows by one-third, expect income from that account to grow by approximately 22% — before any new product cross-sell.
Credit Facility: beta = 1.510. On the log scale, e^1.51 = 4.53. Attaching a credit facility multiplies expected income by 4.5 times, after controlling for account size. Adding a credit facility to an account nearly quadruples the income we earn from it, regardless of how large the account already is.
Transaction Volume, RM Visits, Relationship Years: all p > 0.05. After controlling for turnover and credit facility status, none of these variables adds statistically significant explanatory power. RM time spent on visits does not independently drive income.
Diagnostics: Residuals-vs-fitted shows no systematic pattern (linearity holds). Q-Q plot is approximately normal with slight heavy tails — acceptable for n = 100. All VIF < 2, confirming no multicollinearity.
10 Integrated Findings
Five techniques, one story:
Finding 1 — Turnover sets the income ceiling (EDA, Correlation, Regression) log(Turnover) explains ~50% of income variance alone (r = 0.71). Large accounts generate more income, but only up to a ceiling determined by product depth.
Finding 2 — Credit facility is the dominant income multiplier (Hypothesis Testing, Regression) Accounts with credit facilities earn 4.6x more income (p < 0.0001, d = 2.14) and 4.5x more controlling for turnover (beta = 1.51, p < 0.001). The effect is massive, consistent, and actionable.
Finding 3 — Sector determines yield efficiency, not absolute income (Visualisation, ANOVA) Aviation (4.5%), Health Care (3.9%), and Marine Services (3.85%) extract the most income per naira. Traders & Merchants and Real Estate yield under 1%. ANOVA p = 0.054, eta2 = 0.23 — directionally large, marginally non-significant with this sample size.
Finding 4 — RM visits and tenure do not independently drive income (Correlation, Regression) RM visit frequency (r = 0.073, beta = -0.008, p = 0.816) and relationship years (r = 0.030, beta = 0.009, p = 0.615) add no explanatory power once turnover and credit facility are controlled. RM deployment appears size-driven, not income-optimised.
Integrated Recommendation:
The bank’s corporate banking income is overwhelmingly driven by two levers — account turnover and credit facility attachment — not by RM relationship intensity or portfolio tenure. The highest-impact action is a systematic credit facility cross-sell campaign targeting the 78 accounts currently without facilities, prioritising those with high turnover in Aviation, Health Care, and Marine Services. Based on the regression model, converting 10 additional accounts to credit-facility status would increase income from those accounts by approximately 4.5x their current contribution — a conservative estimate of N800M–N1.2bn in incremental annual income. Simultaneously, RM deployment should be reviewed: evidence suggests RM time is currently allocated by account size rather than income potential. Redirecting RM effort toward credit origination conversations with high-turnover, non-facility accounts better aligns effort with value creation.
11 Limitations and Further Work
1. Cross-sectional design. One year of data cannot establish causality. The credit facility effect may partly reflect selection — creditworthy accounts may also be inherently more profitable. A longitudinal design tracking income before and after facility attachment would better isolate the causal effect.
2. Sample size and sector representation. With 100 accounts across 16 sectors, Aviation, Marine Services, and E-Commerce have only 2–3 observations — too few for reliable sector-level inference. A pooled multi-branch dataset would improve power.
3. Missing predictors. Key variables not available include deposit balances, product count per account, fee waiver arrangements, and pricing tier. Including these could improve model fit beyond R2 = 0.864.
4. Non-normality. Shapiro-Wilk tests reject normality even after log-transformation. Robust regression (Huber M-estimator) or quantile regression would provide more reliable inference across the income distribution.
5. RM fixed effects. The five RMs manage different account counts (15–28 each) with different client mix. A mixed-effects model with RM fixed effects would give cleaner estimates of account-level predictors.
With more time and computing power: A panel dataset across 3–5 years combined with propensity-score matching to create a quasi-experimental comparison between credit-facility and non-facility accounts would constitute a publication-grade causal analysis.
12 References
Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making — from data fundamentals to machine learning in Python and R. Lagos Business School / markanalytics.online. https://markanalytics.online
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum Associates.
Iwelu, C. E. (2026). Top 100 corporate banking accounts — credit turnover and income data, 2024–2025 [Dataset]. Corporate Banking Division, Lagos, Nigeria. Data available on request from the author.
McKinney, W. (2010). Data structures for statistical computing in Python. In Proceedings of the 9th Python in Science Conference (pp. 56–61). https://doi.org/10.25080/Majora-92bf1922-00a
R Core Team. (2024). R: A language and environment for statistical computing (Version 4.4). R Foundation for Statistical Computing. https://www.R-project.org/
Virtanen, P., Gommers, R., Oliphant, T. E., Haberland, M., Reddy, T., Cournapeau, D., & van der Walt, S. J. (2020). SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nature Methods, 17, 261–272. https://doi.org/10.1038/s41592-019-0686-2
Waskom, M. L. (2021). seaborn: Statistical data visualization. Journal of Open Source Software, 6(60), 3021. https://doi.org/10.21105/joss.03021
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer. https://doi.org/10.1007/978-3-319-24277-4
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., Francois, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Muller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., & Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686
Allaire, J. J., Teague, C., Scheidegger, C., Xie, Y., & Dervieux, C. (2022). Quarto (Version 1.x) [Computer software]. https://doi.org/10.5281/zenodo.5960048
13 Appendix: AI Usage Statement
Claude (Anthropic) was used in the preparation of this submission to assist with: structuring the analytical framework and mapping research questions to appropriate statistical techniques; suggesting R and Python code syntax for visualisations, correlation matrices, and OLS regression; reviewing the document structure against the assessment brief requirements; and resolving rendering errors related to package configuration in RStudio.
All analytical decisions — including the choice of log-transformation, the use of Welch’s t-test given unequal group sizes and variances, the interpretation of eta2 despite a marginally non-significant ANOVA p-value, the manual OLS implementation in Python without statsmodels, and all regression coefficient interpretations as concrete business actions — were made independently by the author based on her understanding of the data, the course textbook, and her professional experience in corporate banking. The integrated recommendation reflects the author’s independent judgement about which findings have the highest operational relevance to the bank’s income strategy. The dataset was extracted and anonymised by the author from the bank’s internal systems. No AI tool had access to or generated the underlying data.