Exploratory and Inferential Analysis of Compliance Behaviour Across Nigerian Tax Obligations
Author: Lilian Obadan
Published: May 11, 2026
1. Executive Summary
This study examines filing compliance behaviour in 104 tax filing observations from 13 companies across four major Nigerian tax obligations: Value Added Tax (VAT), Pay-As-You-Earn (PAYE), Withholding Tax (WHT), and Companies Income Tax (CIT). The business problem is that companies operating within the same regulatory environment may file inconsistently across different tax obligations. This creates delayed remittances and reconciliation challenges for the companies and their tax consultants, and compliance inefficiencies and potential revenue leakage for the tax authorities.
Using anonymised operational tax compliance data collected from tax returns filed on behalf of clients, this study applies five analytical techniques: Exploratory Data Analysis, Data Visualisation, Hypothesis Testing, Correlation Analysis, and Regression Analysis. The purpose is to understand whether tax compliance — measured by filing delay behaviour — differs across tax types and whether other operational variables such as filing frequency, company turnover, amount paid, industry sector, and management type are associated with compliance behaviour.
The study finds that compliance varies significantly across tax types. VAT achieves the highest level of compliance, with a 100% on-time filing rate. Plausible contributors are its clear monthly filing deadlines, automatic penalties for late filing, and the high filing frequency, which builds habitual compliance behaviour, although Section 11 notes that source deduction in the Oil and Gas sector may also explain this result. CIT records the highest average delay and the most severe individual non-compliance event (677 days), which is plausibly attributable to the substantial cash outlay required, the infrequent annual filing cycle, and the complexity of the tax computation. A major recommendation arising from this study is the introduction of monthly estimated CIT payments for all companies, similar to the practice already established in the upstream oil and gas sector.
2. Professional Disclosure
I work as a tax compliance professional within the tax advisory and compliance sector, where I am responsible for monitoring filing timelines, reviewing statutory obligations, supporting taxpayers, analysing compliance behaviour, and assisting companies in meeting their compliance goals. These responsibilities form the operational context from which this dataset was drawn. The study is directly connected to my professional practice because it uses operational tax filing data to examine compliance behaviour across VAT, PAYE, WHT, and CIT obligations.
Exploratory Data Analysis is relevant because tax compliance data typically contains filing variations, outliers, and distributional patterns that must first be understood before any deeper analysis is conducted.
Data Visualisation is relevant because tax findings must frequently be communicated clearly to clients, management teams, and regulators who are not statisticians.
Hypothesis Testing is relevant because it allows me to determine whether observed differences in filing delays across tax types are statistically meaningful.
Correlation Analysis is relevant because filing delay behaviour may be associated with operational variables such as filing frequency, turnover, and amount paid.
Regression Analysis is relevant because it estimates the independent contribution of each factor to filing delay, enabling targeted, prioritised recommendations.
3. Data Collection & Sampling
The dataset was collected from anonymised operational tax compliance records relating to 13 selected companies and their filing behaviour across VAT, PAYE, WHT, and CIT obligations. Each observation represents a unique company–tax type–filing period combination, yielding 104 records across four filing periods from January 2024 to July 2025.
The sampling method is purposive sampling: data was extracted from compliance tracking registers maintained in the normal course of professional tax advisory work. All company identifiers were anonymised and replaced with codes (Company A through Company M) before analysis. No taxpayer identification numbers, personal identifiers, or confidential client details are disclosed.
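The anonymisation step described above can be sketched in base R. This is a minimal illustration, not the procedure actually used: the client names, the column names (`client_name`, `client`), and the alphabetical assignment of codes are all assumptions.

```r
# Hypothetical client names; the real register is confidential.
register <- data.frame(
  client_name = c("Acme Ltd", "Beta Plc", "Acme Ltd", "Gamma Ltd"),
  tax_type    = c("VAT", "CIT", "PAYE", "WHT"),
  stringsAsFactors = FALSE
)

# Build a lookup from each distinct name to a letter code (A, B, C, ...).
unique_names <- sort(unique(register$client_name))
stopifnot(length(unique_names) <= length(LETTERS))
code_lookup <- setNames(
  paste("Company", LETTERS[seq_along(unique_names)]),
  unique_names
)

# Replace names with codes, then drop the identifying column.
register$client <- unname(code_lookup[register$client_name])
register$client_name <- NULL
print(register)
```

Assigning codes in a shuffled order (for example with `sample(unique_names)`) rather than alphabetically would make it harder to infer a name from its code.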
The 104 observations exceed the assessment minimum of 100 and support all five inferential techniques applied. The dataset is a census of available filing records for the 13 clients under active engagement during the observation period, which maximises completeness but limits generalisation to the broader Nigerian corporate taxpayer population.
4. Data Description
Variable Dictionary
Table 1 — Variable Dictionary
| Variable | Type | Description |
|---|---|---|
| Client | Categorical | Anonymised company code (A-M); 13 unique companies |
| Filing_Delay_Days | Numeric | Days between statutory due date and actual filing; 0 = on time |
| Filing_Period | Categorical | Tax filing period; 4 periods covered |
| Tax_Type | Categorical | Statutory obligation: VAT, PAYE, WHT, or CIT |
| Filing_Frequency | Numeric | Required filings per year: 12 = monthly; 1 = annual (CIT) |
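As a minimal sketch of how Filing_Delay_Days could be derived from the definition in Table 1, assuming the register holds a due date and a filing date for each record. The column names `due_date` and `filing_date` and the dates themselves are hypothetical.

```r
# Hypothetical due and filing dates; column names are assumptions.
filings <- data.frame(
  due_date    = as.Date(c("2024-02-21", "2024-06-30", "2024-03-10")),
  filing_date = as.Date(c("2024-02-19", "2024-07-15", "2024-03-10"))
)

# Days late; early or same-day filings are floored at 0 (on time).
raw_gap <- as.integer(filings$filing_date - filings$due_date)
filings$filing_delay_days <- pmax(raw_gap, 0L)
filings$on_time <- filings$filing_delay_days == 0L
print(filings)
```

Flooring at zero matches the "0 = on time" convention, but it also discards how early a return was filed, which is a design choice worth stating in the data description.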
5. Exploratory Data Analysis
Before any compliance recommendation can be made, the data must be examined for quality issues, distributional irregularities, and anomalies. This step mirrors the mandatory pre-engagement data review I conduct before submitting client representations to FIRS.
Issue 1 — Nil Amount_Paid entries (20 records, 19.2%): filings with zero remittance, representing nil-return obligations. Treated as valid zero values, not missing data.
Issue 2 — Right-skewed, zero-inflated outcome: 75% of filings carry no delay; the mean of 21.3 days is distorted by extreme CIT outliers.
Issue 3 — Extreme outlier at 677 days: One CIT filing — a genuine compliance failure, not a data error. Retained in the analysis with appropriate acknowledgement.
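The three checks above can be sketched in base R as follows. The data frame is a made-up stand-in for the real dataset, and the 1.5 × IQR rule applied to late filings is one common outlier convention, not necessarily the one used in this study.

```r
# Toy data standing in for the real 104-row dataset (values are made up).
df <- data.frame(
  tax_type          = rep(c("VAT", "PAYE", "WHT", "CIT"), each = 3),
  filing_delay_days = c(0, 0, 0,  0, 5, 0,  12, 0, 0,  677, 45, 20),
  amount_paid_n     = c(1.2e6, 0, 8e5,  3.4e5, 5e5, 0,
                        7.5e4, 0, 2e5,  8.9e6, 2.1e6, 4e6)
)

# Issue 1: zero remittances are nil returns, distinct from missing values.
nil_share     <- mean(df$amount_paid_n == 0)
missing_count <- sum(is.na(df$amount_paid_n))

# Issue 2: share of on-time filings, and the gap between mean and median.
on_time_share <- mean(df$filing_delay_days == 0)
delay_summary <- c(mean   = mean(df$filing_delay_days),
                   median = median(df$filing_delay_days))

# Issue 3: flag extreme delays with the 1.5 * IQR rule, among late filings only.
late     <- df$filing_delay_days[df$filing_delay_days > 0]
upper    <- quantile(late, 0.75) + 1.5 * IQR(late)
outliers <- df[df$filing_delay_days > upper, ]

print(list(nil_share = nil_share, on_time_share = on_time_share,
           delay_summary = delay_summary, outliers = outliers))
```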
6. Data Visualisation
Technique 2 of 5 — Grammar of graphics, chart selection, storytelling
Figure 1 — Distribution of Filing Delay Days
Figure 2 — Compliance Rate by Tax Type
Figure 3 — Filing Delay Distribution by Tax Type
Figure 4 — Mean Filing Delay by Revenue Band & Management Type
Figure 5 — Proportion Delayed by Revenue Band
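The quantity plotted in Figure 2 can be computed as in the sketch below. It uses toy data and a base R bar chart rather than the plotting code behind the original figures, which is not shown in this report.

```r
# Toy data (made up); the real dataset has 104 rows.
df <- data.frame(
  tax_type          = rep(c("VAT", "PAYE", "WHT", "CIT"), each = 3),
  filing_delay_days = c(0, 0, 0,  0, 5, 0,  12, 0, 0,  677, 45, 20)
)

# On-time rate (%) per tax type: share of filings with zero delay.
df$on_time <- df$filing_delay_days == 0
rate <- tapply(df$on_time, df$tax_type, mean) * 100
rate <- rate[order(rate, decreasing = TRUE)]
print(round(rate, 1))

# Draw the chart only in an interactive session.
if (interactive()) {
  barplot(rate, ylim = c(0, 100),
          ylab = "On-time filings (%)",
          main = "Compliance rate by tax type (toy data)")
}
```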
7. Hypothesis Testing
Technique 3 of 5 — ANOVA, chi-squared, post-hoc tests
7.1 H1: Does Filing Delay Differ Across Tax Types?
H₀: Mean filing delay is the same across all four tax types. H₁: At least one tax type has a significantly different mean filing delay.
Code
anova_model <- aov(filing_delay_days ~ tax_type, data = df)
summary(anova_model)
             Df Sum Sq Mean Sq F value  Pr(>F)
tax_type      3 100954   33651   5.387 0.00177 **
Residuals   100 624689    6247
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
kruskal.test(filing_delay_days ~ tax_type, data = df)
Kruskal-Wallis rank sum test
data: filing_delay_days by tax_type
Kruskal-Wallis chi-squared = 16.337, df = 3, p-value = 0.0009672
Table 7 — Dunn Post-Hoc Pairwise Comparisons
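| group1 | group2 | statistic | p.adj | p.adj.signif |
|---|---|---|---|---|
| VAT | WHT | 1.4178 | 0.1875 | ns |
| VAT | PAYE | 3.5158 | 0.0026 | ** |
| VAT | CIT | 3.2407 | 0.0036 | ** |
| WHT | PAYE | 2.0980 | 0.0718 | ns |
| WHT | CIT | 1.8229 | 0.1025 | ns |
| PAYE | CIT | -0.2751 | 0.7832 | ns |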
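The adjusted p-values in Table 7 can be checked from the reported statistics. The sketch below assumes the statistics are z-scores with two-sided p-values and that the Benjamini-Hochberg adjustment was applied. The adjustment method is not stated in the report; it is inferred here because it reproduces the table.

```r
# Dunn z-statistics as reported in Table 7.
z <- c("VAT-WHT"  = 1.4178, "VAT-PAYE" = 3.5158, "VAT-CIT"  = 3.2407,
       "WHT-PAYE" = 2.0980, "WHT-CIT"  = 1.8229, "PAYE-CIT" = -0.2751)

# Two-sided p-values from the standard normal distribution.
p_raw <- 2 * pnorm(-abs(z))

# Benjamini-Hochberg adjustment across the six comparisons (an assumption).
p_adj <- p.adjust(p_raw, method = "BH")
print(round(p_adj, 4))

# Pairs significant at the 5% level after adjustment.
print(names(p_adj)[p_adj < 0.05])
```

Only the VAT-PAYE and VAT-CIT comparisons fall below 0.05 after adjustment.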
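Result: Hypothesis 1
H₀ is rejected (ANOVA p = 0.0018; Kruskal-Wallis p < 0.001). Filing delay differs significantly across tax types. The Dunn post-hoc tests show that VAT differs significantly from PAYE (adjusted p = 0.0026) and from CIT (adjusted p = 0.0036). The remaining pairwise comparisons, including WHT versus CIT (adjusted p = 0.1025), are not significant at the 5% level. The tax-type effect is therefore driven mainly by VAT's lower delays relative to PAYE and CIT.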
7.2 H2: Is Management Type Associated with Compliance Status?
H₀: No association between management type and compliance status. H₁: Management type and compliance status are associated.
The chi-squared test fails to reject H₀ (p > 0.05). Management type is not significantly associated with compliance status. Tax type is a far stronger differentiator than ownership structure.
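A minimal sketch of this kind of test is shown below. The counts are hypothetical because the cross-tabulation is not reported here, and the "Local" label for the non-foreign category is an assumption.

```r
# Hypothetical 2x2 counts; the real cross-tabulation is not reported.
counts <- matrix(c(40, 12,
                   38, 14),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(management_type = c("Local", "Foreign"),
                                 compliance      = c("On time", "Delayed")))

# Pearson chi-squared test of independence
# (Yates continuity correction is the default for 2x2 tables).
chi <- chisq.test(counts)
print(chi$expected)  # check that expected counts are not too small
print(chi$p.value)

# Fisher's exact test is a common fallback when expected counts are small.
fisher <- fisher.test(counts)
print(fisher$p.value)
```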
8. Correlation Analysis
Technique 4 of 5 — Pearson, Spearman; correlation vs causation
Table 8 — Spearman Correlations with Filing Delay
| Predictor | Spearman r |
|---|---|
| filing_frequency | -0.1401 |
| turnover_ngn | 0.1105 |
| amount_paid_n | 0.0650 |
Key observations: Filing frequency has the strongest correlation with delay in Table 8 (Spearman r ≈ −0.14), but the association is weak. Its negative sign indicates that monthly filers tend to file with less delay than annual filers. Company turnover and amount paid show even weaker relationships with delay. Correlation does not imply causation; the relationships reflect structural features of the tax system rather than direct causal mechanisms.
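A rank correlation of this kind can be computed as sketched below. The values are toy data, and `exact = FALSE` is set because the many tied zero delays rule out an exact p-value.

```r
# Toy data (made up): monthly filers (12) versus annual filers (1).
df <- data.frame(
  filing_frequency  = c(12, 12, 12, 12, 12, 12, 1, 1, 1),
  filing_delay_days = c(0, 0, 5, 0, 12, 0, 677, 45, 0)
)

# Spearman rank correlation; ranks are robust to the extreme 677-day value.
res <- cor.test(df$filing_delay_days, df$filing_frequency,
                method = "spearman", exact = FALSE)
print(res$estimate)
print(res$p.value)
```

Because Spearman's coefficient uses ranks, the single 677-day outlier carries no more weight than any other largest value, which suits a zero-inflated, right-skewed outcome.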
9. Regression Analysis
Technique 5 of 5 — OLS regression with diagnostics
Table 9 — OLS Regression Coefficients
| term | estimate | std.error | statistic | p.value | conf.low | conf.high | significant |
|---|---|---|---|---|---|---|---|
| (Intercept) | -10.0095 | 89.4891 | -0.1119 | 0.9112 | -188.2424 | 168.2234 | No |
| tax_typeWHT | -0.8837 | 46.0032 | -0.0192 | 0.9847 | -92.5070 | 90.7396 | No |
| tax_typePAYE | -1.0271 | 45.9989 | -0.0223 | 0.9822 | -92.6417 | 90.5876 | No |
| tax_typeCIT | 93.6902 | 46.5435 | 2.0130 | 0.0477 | 0.9908 | 186.3897 | Yes |
| management_typeForeign | 6.2296 | 97.3037 | 0.0640 | 0.9491 | -187.5675 | 200.0267 | No |
| industry_sectorOil and Gas | 5.4404 | 99.5133 | 0.0547 | 0.9565 | -192.7575 | 203.6383 | No |
| turnover_ngn | 0.0000 | 0.0000 | 0.8596 | 0.3927 | 0.0000 | 0.0000 | No |
| amount_paid_n | 0.0000 | 0.0000 | -1.9017 | 0.0610 | 0.0000 | 0.0000 | No |
Table 10 — Model Fit Statistics
| r.squared | adj.r.squared | sigma | statistic | p.value |
|---|---|---|---|---|
| 0.166 | 0.0891 | 88.5442 | 2.1602 | 0.0472 |
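The fit statistics in Table 10 allow a back-of-envelope check on how many observations entered the model. This check is not part of the original analysis. It takes the seven slope terms from Table 9 and rearranges the standard identity F = (R²/k) / ((1 − R²)/df) to recover the residual degrees of freedom.

```r
r_squared   <- 0.166    # Table 10
f_statistic <- 2.1602   # Table 10
k <- 7                  # slope terms in Table 9 (excluding the intercept)

# F = (R^2 / k) / ((1 - R^2) / df_resid), solved for df_resid.
df_resid  <- f_statistic * k * (1 - r_squared) / r_squared
n_implied <- round(df_resid) + k + 1

# Cross-check: adjusted R^2 recomputed from the implied sample size.
adj_r2 <- 1 - (1 - r_squared) * (n_implied - 1) / round(df_resid)

print(c(df_resid = df_resid, n_implied = n_implied, adj_r2 = adj_r2))
```

If the implied sample size is smaller than the 104 filings described in Section 3, one possible explanation is that rows with missing values were dropped during fitting. That is only an inference from rounded figures and should be confirmed against the original fitting script.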
Table 11 — Variance Inflation Factors
| Predictor | VIF |
|---|---|
| tax_type | 1.842 |
| management_type | 22.931 |
| industry_sector | 24.360 |
| turnover_ngn | 1.325 |
| amount_paid_n | 1.517 |
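Coefficient Interpretation
The term names in Table 9 (tax_typeWHT, tax_typePAYE, tax_typeCIT) show that VAT is the reference category. Relative to VAT, CIT filings are associated with about 94 additional delay days (estimate 93.69, p = 0.048, 95% CI 0.99 to 186.39), after controlling for management type, industry sector, turnover, and amount paid. WHT and PAYE do not differ significantly from VAT. Turnover and amount paid are not significant at the 5% level, so the CIT compliance gap is not explained by company size or remittance amount; it is an obligation-type effect. The model explains about 17% of the variation in delay (R² = 0.166, adjusted R² = 0.089). The remaining variation reflects factors not captured here, such as internal client capacity, FIRS portal availability, and management attention. The variance inflation factors for management_type (22.9) and industry_sector (24.4) indicate strong collinearity between these two predictors, so their individual coefficients should be interpreted with caution.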
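How the reference category shapes the coefficient table can be seen in a small sketch. The data are made up, and the explicit level order (VAT first) is an assumption chosen to match the term names in Table 9.

```r
# Toy data (made up); three filings per tax type.
df <- data.frame(
  tax_type          = rep(c("VAT", "WHT", "PAYE", "CIT"), each = 3),
  filing_delay_days = c(0, 0, 0,  0, 12, 0,  0, 5, 0,  677, 45, 20)
)

# The first factor level becomes the reference (baseline) category.
df$tax_type <- factor(df$tax_type, levels = c("VAT", "WHT", "PAYE", "CIT"))

fit <- lm(filing_delay_days ~ tax_type, data = df)
print(coef(fit))

# Each tax_type coefficient is that group's mean delay minus the VAT mean.
group_means <- tapply(df$filing_delay_days, df$tax_type, mean)
print(group_means - group_means[["VAT"]])
```

No coefficient appears for the reference level itself; its mean is absorbed into the intercept, so every other tax-type term reads as a difference from VAT.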
10. Integrated Findings & Recommendations
Single Integrated Finding
Filing compliance failure in this dataset is a concentrated, obligation-type-specific problem centred on CIT. It persists independently of company size, management structure, and industry sector.
How the Five Techniques Build the Conclusion
EDA reveals zero-inflation (75% on time) and extreme CIT outliers.
Visualisation shows VAT at 100% on time; CIT with the highest severity.
Hypothesis Testing confirms tax-type differences are statistically significant.
Correlation identifies filing frequency as the strongest numerical correlate.
Regression quantifies the gap: CIT is associated with roughly 94 more delay days than VAT, while WHT and PAYE do not differ significantly from VAT, and company size and amount paid are not significant.
Policy & Operational Recommendations
The integrated findings support the following six recommendations for tax authorities, advisory practices, and corporate finance teams:
Adopt a tiered, obligation-specific compliance monitoring protocol. Place CIT on a dedicated 90/60/30-day pre-deadline alert schedule and introduce monthly estimated CIT payments modelled on upstream oil and gas practice, while maintaining lighter-touch monthly monitoring for VAT and PAYE where compliance is already largely self-managing.
Strengthen enforcement communication for WHT and PAYE. Tax authorities should publish clearer guidance on deadlines, penalties, and enforcement consequences for WHT and PAYE to lift compliance behaviour closer to VAT’s perfect on-time record.
Deploy automated reminder systems across monthly obligations. Email, SMS, and taxpayer portal alerts can meaningfully reduce operational delays and improve timely filing for recurring monthly returns.
Adopt risk-based compliance monitoring driven by filing analytics. Filing delay patterns and payment behaviour should be used by tax authorities and advisory firms to identify high-risk taxpayers and target compliance interventions precisely rather than uniformly.
Phase in periodic or instalment-based CIT reporting. More frequent CIT reporting cycles or instalment-based compliance structures would reduce year-end pressure, ease cash-flow burdens, and strengthen filing discipline.
Build an integrated national digital tax compliance ecosystem. A unified digital tax infrastructure with automated validation and real-time compliance tracking would improve transparency, reduce filing errors, and strengthen voluntary compliance across all four tax obligations.
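The 90/60/30-day alert schedule in the first recommendation could be generated as in the sketch below. The due date, the "today" date, and the column names are hypothetical.

```r
# Hypothetical CIT due date for one client.
cit_due_date  <- as.Date("2025-06-30")
alert_offsets <- c(90, 60, 30)  # days before the deadline

alerts <- data.frame(
  days_before = alert_offsets,
  alert_date  = cit_due_date - alert_offsets
)

# Mark which alerts should already have gone out as of a given date.
today <- as.Date("2025-05-10")  # hypothetical "current" date
alerts$status <- ifelse(alerts$alert_date <= today, "sent", "pending")
print(alerts)
```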
11. Limitations & Further Work
Sample scope: 104 observations across 13 clients within a single advisory portfolio, not a random sample of Nigerian taxpayers.
Clustered structure: Each client contributes multiple filings, so observations are not independent and standard errors may be understated. A mixed-effects model with a client-level random effect would be more appropriate.
Zero-inflation: Severe zero-inflation violates OLS assumptions. A hurdle model combining logistic regression with truncated regression would be more suitable.
Filing_Frequency collinearity: Perfectly collinear with Tax_Type (CIT is the only annual filer) and was not included as an independent predictor.
Sectoral coverage and the VAT compliance finding: The dataset covers only two industry sectors — Oil and Gas and ICT. Importantly, Oil and Gas companies typically have their VAT deducted at source by their counterparties, which mechanically guarantees on-time VAT remittance regardless of any active compliance effort by the filing company. The 100% on-time VAT compliance rate observed in this study should therefore be interpreted with caution: it likely reflects this structural feature of source deduction in the Oil and Gas sector rather than universally strong VAT compliance behaviour. Replication using additional sectors — such as manufacturing, financial services, or consumer goods, where VAT is self-assessed and self-remitted — would provide a more representative measure of genuine VAT compliance.
With more data, time, or computing power: extend the panel to additional years for fixed-effects regression; apply a zero-inflated negative binomial model; incorporate FIRS penalty data; build a machine-learning classifier to predict next-period non-compliance.
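The two-part idea behind the hurdle model mentioned above can be sketched in base R. This is a simplified stand-in, not the proposed model: the second stage uses an ordinary regression on log delay among late filings instead of a truncated regression, the data are made up, and `is_cit` is a hypothetical indicator.

```r
# Toy data (made up): is_cit = 1 for CIT filings, 0 otherwise.
df <- data.frame(
  is_cit            = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1),
  filing_delay_days = c(0, 0, 0, 0, 0, 0, 12, 5, 0, 677, 45, 20)
)
df$any_delay <- as.integer(df$filing_delay_days > 0)

# Stage 1: logistic regression for whether a filing is late at all.
stage1 <- glm(any_delay ~ is_cit, data = df, family = binomial)

# Stage 2: among late filings only, model the size of the delay.
# A log-linear model is swapped in here for the truncated regression.
late   <- df[df$any_delay == 1, ]
stage2 <- lm(log(filing_delay_days) ~ is_cit, data = late)

print(coef(stage1))
print(coef(stage2))
```

Separating the two stages lets the many zero-delay filings inform the probability of lateness without distorting the estimate of how long late filings are delayed. In the real data, a tax type with no late filings at all, such as VAT, would cause separation in the first stage and would need to be handled explicitly.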
References
Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making. Lagos Business School / markanalytics.online. https://markanalytics.online
R Core Team. (2024). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/
Wickham, H., et al. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer.
Obadan, L. (2026). Corporate tax compliance filing records — anonymised client dataset [Dataset]. Compiled from operational compliance tracking registers, Lagos, Nigeria, January 2024 to July 2025. Data available on request from the author.
Appendix: AI Usage Statement
Claude (Anthropic) was used to assist with structuring the Quarto document, generating initial R code scaffolding, and suggesting appropriate diagnostic approaches. All analytical decisions — selection of the outcome variable, choice of statistical tests, interpretation of coefficients, and the integrated recommendation — were made independently by the author. The dataset was compiled from records within my own professional practice, and no data was shared with any AI tool at any stage.