Analyzing Customer Revenue and Product Adoption in Commercial Banking
Author: [Your Name]
Published: May 11, 2026
1. Executive Summary
This analysis examines the financial performance and product adoption patterns of 1,802 commercial banking customers. The primary objective was to identify drivers of total revenue and to evaluate how business segment and product usage relate to profitability. Exploratory data analysis showed that revenue is highly positively skewed, with a small number of high-value accounts contributing disproportionately to the total. Inferential testing confirmed a significant difference in revenue between the small and medium business segments (\(p < 0.001\)), while loan status was not significantly associated with average balances (\(p = 0.43\)). A correlation analysis revealed a near-perfect linear relationship between average balance and total revenue (\(r = 0.99\)). Finally, a multiple linear regression model (\(R^2 = 0.98\)) estimated that each ₦1 increase in average balance is associated with a ₦0.058 increase in revenue, holding other factors constant, with the West region showing the highest regional revenue uplift. Based on these findings, it is recommended that the bank focus on migrating small-segment customers to the medium segment through targeted credit facilities to maximize revenue growth.
2. Technique Relevance
1. EDA: Essential for identifying outliers in transaction volumes that could indicate fraud or reporting errors.
2. Visualisation: Critical for communicating complex financial trends to non-technical regional heads.
3. Hypothesis Testing: Used to test whether marketing campaigns for digital banking actually lead to higher balances.
4. Correlation: Helps identify which products (e.g., POS or Mobile Banking) are most strongly linked to revenue.
5. Regression: Allows annual revenue to be forecast from projected deposit growth.
3. Data Collection & Sampling
Source: Internal organizational records (anonymized for academic purposes).
Collection Method: Data extraction from the Core Banking System (CBS) via SQL query.
Sampling Frame: Active commercial accounts as of Q1 2026.
Sample Size: 1,802 observations.
Time Period: January 2026 to April 2026.
Ethical Note: All personally identifiable information (PII) has been removed. Customer numbers are replaced with generic IDs (e.g., “Acc_11”).
4. Data Description
The dataset contains 1,802 rows and 17 variables, including:
- Categorical: REGION, SUB_BUS_SGT, CATEGORY.
- Numeric: 2026_AVG_BALANCE, TOTAL REVENUE, 2026_DR_TOVER.
- Date: AC_OPEN_DATE.
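The chart code in Section 6 refers to X2026_AVG_BALANCE and TOTAL.REVENUE rather than the names listed above. Assuming the extract was loaded with base R's read.csv() (the loading step is not shown, so this is an assumption), that is R's default name repair, which make.names() reproduces:

```r
# Column names as listed in the data description.
raw_names <- c("2026_AVG_BALANCE", "TOTAL REVENUE", "2026_DR_TOVER",
               "SUB_BUS_SGT", "AC_OPEN_DATE")

# make.names() applies the rules read.csv() uses when check.names = TRUE
# (the default): a leading digit gets an "X" prefix and invalid
# characters such as spaces become ".". Underscores are left alone.
r_names <- make.names(raw_names)
print(setNames(r_names, raw_names))
```

Loading with check.names = FALSE would keep the original names, which then need backticks in code (e.g. `` `TOTAL REVENUE` ``).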
5. Analysis 1: Exploratory Data Analysis (EDA)
During the EDA phase, two primary data issues were identified:
1. Skewness: TOTAL REVENUE has an extreme positive skewness of 14.51, so the mean is heavily influenced by a few top-tier customers.
2. Outliers: 17 accounts were identified as statistical outliers in AVG_BALANCE (Z-score > 3).
Handling: To address skewness in the visualizations, a log(x + 1) transformation was applied. Outliers were retained in the final model to reflect the business reality of “whale” accounts in commercial banking.
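The section reports the skewness and outlier count but not how they were computed. A minimal base-R sketch follows; skewness_g1() and flag_z_outliers() are hypothetical helper names, and the moment-based g1 estimator is an assumption (packages such as e1071 offer variants with small-sample corrections, so values can differ slightly). It uses |z| > 3; with right-skewed balances, flagged accounts will in practice sit in the upper tail. The simulated balances only illustrate the mechanics and are not the bank's data.

```r
# Moment-based sample skewness (g1): third central moment divided by
# the second central moment raised to the power 1.5.
skewness_g1 <- function(x) {
  x <- x[is.finite(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Flag observations whose absolute z-score exceeds a threshold.
# scale() standardises with the sample standard deviation (n - 1).
flag_z_outliers <- function(x, threshold = 3) {
  z <- as.numeric(scale(x))
  abs(z) > threshold
}

# Simulated right-skewed balances (illustration only).
set.seed(42)
balance <- rlnorm(1802, meanlog = 13, sdlog = 1.5)

skewness_g1(balance)            # raw scale: strongly positive
skewness_g1(log(balance + 1))   # log(x + 1) scale: much closer to 0
sum(flag_z_outliers(balance))   # number of accounts with |z| > 3
```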
6. Analysis 2: Data Visualisation
Four charts are produced to explore revenue patterns across distribution, segment, region, and balance.
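Chart 1 – Distribution of Total Revenue
library(ggplot2)
# log(x + 1) returns NaN for x < -1 and -Inf for x = -1; ggplot drops
# such non-finite rows, which produces the warnings below.
ggplot(df, aes(x = log(TOTAL.REVENUE + 1))) +
  geom_histogram(bins = 40, fill = "#2c7bb6", color = "white") +
  labs(
    title = "Distribution of Total Revenue (Log-Transformed)",
    x = "Log(Total Revenue + 1)",
    y = "Number of Customers"
  )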
Warning in log(TOTAL.REVENUE + 1): NaNs produced
Warning: Removed 3 rows containing non-finite outside the scale range
(`stat_bin()`).
Chart 2 – Revenue by Business Segment
ggplot(df, aes(x = SUB_BUS_SGT, y = log(TOTAL.REVENUE + 1))) +
  geom_boxplot(fill = "#2c7bb6") +
  labs(
    title = "Revenue Distribution by Business Segment",
    x = "Business Segment",
    y = "Log(Total Revenue + 1)"
  )
Warning in log(TOTAL.REVENUE + 1): NaNs produced
Warning: Removed 3 rows containing non-finite outside the scale range
(`stat_boxplot()`).
ggplot(df, aes(x = log(X2026_AVG_BALANCE + 1), y = log(TOTAL.REVENUE + 1))) +
  geom_point(alpha = 0.3, size = 1.2, color = "#2c7bb6") +
  labs(
    title = "Average Balance vs Total Revenue (Log Scale)",
    x = "Log(Avg Balance + 1)",
    y = "Log(Total Revenue + 1)"
  )
Warning in log(TOTAL.REVENUE + 1): NaNs produced
Warning: Removed 3 rows containing missing values or values outside the scale range
(`geom_point()`).
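Each chart above reports three rows dropped because log(TOTAL.REVENUE + 1) is not finite for them, which happens when revenue is at or below -1. If those accounts should stay visible, one option (a suggestion, not the author's method) is a signed log transform; signed_log1p() is a hypothetical helper and the revenue values are made up:

```r
# Signed log transform: sign(x) * log(1 + |x|).
# Unlike log(x + 1), it is finite for every finite x, maps 0 to 0,
# and keeps negative values negative.
signed_log1p <- function(x) sign(x) * log1p(abs(x))

revenue <- c(-2500, -1, 0, 1, 120000)   # hypothetical revenue values
suppressWarnings(log(revenue + 1))      # NaN for -2500, -Inf for -1
signed_log1p(revenue)                   # finite for all five values
```

Inside aes(), signed_log1p(TOTAL.REVENUE) could replace log(TOTAL.REVENUE + 1), keeping negative-revenue accounts on the plot; the axis label would need to say so.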
7. Analysis 3: Hypothesis Testing
Hypothesis 1:
\(H_0\): There is no difference in mean revenue between ‘SMALL’ and ‘MEDIUM’ segments.
\(H_1\): Mean revenue differs between the ‘SMALL’ and ‘MEDIUM’ segments.
Result: \(p < 0.001\). We reject the null hypothesis. The MEDIUM segment generates significantly more revenue than the SMALL segment.
Hypothesis 2:
\(H_0\): Average balance is independent of loan status.
\(H_1\): Customers with loans maintain higher average balances.
Result: \(p = 0.43\). We fail to reject the null hypothesis; the data provide no evidence that customers with loans hold higher balances.
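The section does not state which test produced these p-values. Because Section 5 reports heavy right skew, a reasonable sketch (an assumption, not necessarily the author's method) runs both Welch's t-test and the Wilcoxon rank-sum test; compare_groups() is a hypothetical helper and the toy vectors are not the bank's data.

```r
# Run two complementary two-sample tests on the same pair of groups:
# Welch's t-test compares means without assuming equal variances;
# the Wilcoxon rank-sum test compares ranks and is robust to heavy skew.
compare_groups <- function(x, y, alternative = "two.sided") {
  list(
    welch  = t.test(x, y, alternative = alternative, var.equal = FALSE),
    wilcox = wilcox.test(x, y, alternative = alternative)
  )
}

# Hypothetical toy values for illustration only.
small_rev  <- c(1, 2, 3, 4, 5)
medium_rev <- c(10, 11, 12, 13, 14)

# Hypothesis 1 (two-sided): do SMALL and MEDIUM revenues differ?
h1 <- compare_groups(small_rev, medium_rev)
c(welch = h1$welch$p.value, wilcox = h1$wilcox$p.value)

# Hypothesis 2 (one-sided): do customers with loans hold higher balances?
# With alternative = "greater", the tests ask whether x tends to exceed y.
with_loan    <- c(4, 5, 6, 7, 8)
without_loan <- c(4.5, 5.5, 6.5, 7.5, 8.5)
h2 <- compare_groups(with_loan, without_loan, alternative = "greater")
c(welch = h2$welch$p.value, wilcox = h2$wilcox$p.value)
```

If the two tests disagree on the real data, the skew is likely driving the t-test result, and the rank-based result is the safer one to report.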
8. Analysis 4: Correlation Analysis
The correlation matrix identifies 2026_AVG_BALANCE (\(r = 0.99\)) and 2026_DR_TOVER (\(r = 0.49\)) as the variables most strongly correlated with revenue. Digital product adoption (HAS_POS, HAS_MOBILE_BANKING) showed negligible correlation with revenue, suggesting that these products act as utility tools rather than direct revenue drivers.
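With revenue as skewed as Section 5 reports, a Pearson r of 0.99 may partly reflect a handful of very large accounts. A quick check, offered as a suggestion rather than the author's method, is to compare Pearson with Spearman's rank correlation using base R's cor(); compare_correlations() is a hypothetical helper and all data are toy values. For 0/1 flags such as HAS_POS (assuming that coding), Pearson's r is the point-biserial correlation.

```r
# Pearson measures linear association and is sensitive to extreme values;
# Spearman correlates ranks and is robust to skew and outliers.
compare_correlations <- function(x, y) {
  c(
    pearson  = cor(x, y, method = "pearson",  use = "complete.obs"),
    spearman = cor(x, y, method = "spearman", use = "complete.obs")
  )
}

# Toy example 1: a monotone but non-linear relationship.
x <- 1:10
y <- x^3
cc <- compare_correlations(x, y)   # spearman is 1, pearson is below 1
cc

# Toy example 2: one extreme "whale" shared by two otherwise
# unrelated variables can inflate Pearson r.
set.seed(7)
a <- c(rnorm(50), 100)
b <- c(rnorm(50), 100)
whale <- compare_correlations(a, b)
whale
```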
9. Analysis 5: Linear Regression
A multiple regression model was built to predict TOTAL REVENUE:
- Average Balance Coefficient: 0.058. Holding the other predictors constant, each ₦1 increase in average balance is associated with a ₦0.058 increase in revenue.
- Regional Impact: Accounts in the West region earn ₦18,727 more on average than the baseline (North), holding all other factors constant.
- Diagnostics: The R-squared of 0.98 means the model explains about 98% of the variance in total revenue, an exceptionally strong fit.
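The model specification is not shown, so the following is a minimal sketch under stated assumptions: balance and region as the only predictors, North as the reference level, and simulated data whose "true" coefficients are set to the reported values purely so that lm() has something to recover. The column names balance, region, and revenue are hypothetical.

```r
# Simulated portfolio (NOT the bank's data).
set.seed(2026)
n <- 600
sim <- data.frame(
  balance = rlnorm(n, meanlog = 13, sdlog = 1.2),
  region  = sample(c("North", "South", "West"), n, replace = TRUE)
)

# Put North first so each region coefficient reads as
# "difference from North, other predictors held fixed".
sim$region <- factor(sim$region, levels = c("North", "South", "West"))

# Revenue = 0.058 per unit of balance + West uplift + noise.
sim$revenue <- 0.058 * sim$balance +
  ifelse(sim$region == "West", 18727, 0) +
  rnorm(n, mean = 0, sd = 5000)

fit <- lm(revenue ~ balance + region, data = sim)
coef(fit)                 # balance slope near 0.058; regionWest near 18727
summary(fit)$r.squared    # share of revenue variance explained

# Reading the slope: a 1,000,000 (naira) rise in balance, other
# predictors fixed, goes with about 0.058 * 1e6 = 58,000 more revenue.
```

On an existing factor, relevel(x, ref = "North") sets the same baseline without retyping all levels.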
10. Integrated Findings
The analysis shows that revenue is almost entirely explained by the deposit base (AVG_BALANCE) rather than by transaction frequency or product count. While the “Small” business segment comprises the majority of the portfolio, the “Medium” segment is the engine of profitability.
Recommendation: Shift focus from broad digital product acquisition to “Deep Wallet” strategies aimed at increasing the deposit balances of existing Small segment customers in the West and South regions.
11. Limitations & Further Work
Limitation: The cross-sectional nature of the data does not capture seasonal fluctuations.
Further Work: Future analysis should incorporate a time-series approach to model the churn rate of these accounts over a 12-month period.
Appendix: AI Usage Statement
AI was used to assist in writing the Python cleaning scripts and translating them into R syntax. Independent analytical judgment was exercised in the selection of the regression variables and the interpretation of the regional coefficients.