Exploratory and Inferential Analysis of Compliance Behaviour Across Nigerian Tax Obligations

Author

Lilian Obadan

Published

May 11, 2026


1. Executive Summary

This study examines filing compliance behaviour in 104 tax filing observations from 13 companies across four major Nigerian tax obligations: Value Added Tax (VAT), Pay-As-You-Earn (PAYE), Withholding Tax (WHT), and Companies Income Tax (CIT). The business problem is that companies operating within the same regulatory environment may file inconsistently across different tax obligations. This creates delayed remittances and reconciliation challenges for the companies and their tax consultants, and compliance inefficiencies and potential revenue leakage for the tax authorities.

Using anonymised operational tax compliance data collected from tax returns filed on behalf of clients, this study applies five analytical techniques: Exploratory Data Analysis, Data Visualisation, Hypothesis Testing, Correlation Analysis, and Regression Analysis. The purpose is to understand whether tax compliance — measured by filing delay behaviour — differs across tax types and whether other operational variables such as filing frequency, company turnover, amount paid, industry sector, and management type are associated with compliance behaviour.

The study finds that compliance varies significantly across tax types. VAT achieves the highest level of compliance with a 100% on-time filing rate, which I attribute to its clear monthly filing deadlines, automatic penalties for late filing, and a high filing frequency that builds habitual compliance. PAYE has the lowest on-time rate (53.8%), but its delays are short. CIT records the highest average delay (75.15 days) and the most severe individual non-compliance event (677 days), which I attribute to the substantial cash outlay required, the infrequent annual filing cycle, and the complexity of the tax computation. These attributions are professional judgements rather than effects estimated from the data. A major recommendation arising from this study is the introduction of monthly filing of estimated CIT payments by all companies, similar to the practice already established in the upstream oil and gas sector.


2. Professional Disclosure

I work as a tax compliance professional in the tax advisory and compliance sector, where I monitor filing timelines, review statutory obligations, support taxpayers, analyse compliance behaviour, and assist companies in meeting their compliance goals. This day-to-day work is the operational context from which the dataset was drawn: the study uses operational tax filing data to examine compliance behaviour across VAT, PAYE, WHT, and CIT obligations.

Exploratory Data Analysis is relevant because tax compliance data typically contains filing variations, outliers, and distributional patterns that must first be understood before any deeper analysis is conducted.

Data Visualisation is relevant because tax findings must frequently be communicated clearly to clients, management teams, and regulators who are not statisticians.

Hypothesis Testing is relevant because it allows me to determine whether observed differences in filing delays across tax types are statistically meaningful.

Correlation Analysis is relevant because filing delay behaviour may be associated with operational variables such as filing frequency, turnover, and amount paid.

Regression Analysis is relevant because it estimates the independent contribution of each factor to filing delay, enabling targeted, prioritised recommendations.


3. Data Collection & Sampling

The dataset was collected from anonymised operational tax compliance records relating to 13 selected companies and their filing behaviour across VAT, PAYE, WHT, and CIT obligations. Each observation represents one unique combination of company, tax type, and filing period, yielding 104 records (26 per tax type) across four filing periods from January 2024 to July 2025.

The sampling method is purposive sampling: data was extracted from compliance tracking registers maintained in the normal course of professional tax advisory work. All company identifiers were anonymised and replaced with codes (Company A through Company M) before analysis. No taxpayer identification numbers, personal identifiers, or confidential client details are disclosed.

Code
cat("Rows:", nrow(df), "| Columns:", ncol(df), "\n")
Rows: 104 | Columns: 14 
Code
cat("Missing values per column:\n")
Missing values per column:
Code
print(colSums(is.na(df)))
           client filing_delay_days     filing_period          tax_type 
                0                 0                 0                 0 
 filing_frequency   industry_sector  company_turnover       amount_paid 
                0                 0                 0                 0 
        date_paid   management_type      turnover_ngn     amount_paid_n 
                0                 0                 0                20 
     revenue_band compliance_status 
                0                 0 

The 104 observations exceed the assessment minimum of 100 and support all five analytical techniques applied. The dataset is a census of available filing records for the 13 clients under active engagement during the observation period, which maximises completeness but limits generalisation to the broader Nigerian corporate taxpayer population.


4. Data Description

Variable Dictionary

Table 1 — Variable Dictionary
Variable            Type          Description
Client              Categorical   Anonymised company code (A-M); 13 unique companies
Filing_Delay_Days   Numeric       Days between statutory due date and actual filing; 0 = on time
Filing_Period       Categorical   Tax filing period; 4 periods covered
Tax_Type            Categorical   Statutory obligation: VAT, PAYE, WHT, or CIT
Filing_Frequency    Numeric       Required filings per year: 12 = monthly, 1 = annual (CIT)
Industry_Sector     Categorical   Industry: ICT or Oil and Gas
Company_Turnover    Numeric       Annual turnover in Naira
Amount_Paid         Numeric       Amount remitted in Naira; 20 entries are nil
Date_Paid           Date          Date payment was received
Management_Type     Categorical   Local or Foreign management
compliance_status   Derived       On Time vs Delayed (binary outcome)
revenue_band        Derived       Four turnover tiers

Distributional Summary

Code
df |>
  dplyr::select(filing_delay_days, turnover_ngn, amount_paid_n,
                filing_frequency, tax_type, industry_sector,
                management_type, compliance_status, revenue_band) |>
  skim()
Data summary
Name dplyr::select(…)
Number of rows 104
Number of columns 9
_______________________
Column type frequency:
factor 5
numeric 4
________________________
Group variables None

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
tax_type 0 1 FALSE 4 VAT: 26, WHT: 26, PAY: 26, CIT: 26
industry_sector 0 1 FALSE 2 Oil: 72, ICT: 32
management_type 0 1 FALSE 2 Loc: 73, For: 31
compliance_status 0 1 FALSE 2 On : 78, Del: 26
revenue_band 0 1 FALSE 4 Ban: 48, Ban: 25, Ban: 22, Ban: 9

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
filing_delay_days 0 1.00 2.131000e+01 8.393000e+01 0 0.000000e+00 0 2.500000e-01 677 ▇▁▁▁▁
turnover_ngn 0 1.00 2.666650e+10 2.673740e+10 1487384000 7.748103e+09 13053109840 3.498751e+10 99647644000 ▇▃▁▁▁
amount_paid_n 20 0.81 3.215205e+08 1.023805e+09 1117171 6.019648e+06 14687968 6.285962e+07 7319115867 ▇▁▁▁▁
filing_frequency 0 1.00 9.250000e+00 4.790000e+00 1 9.250000e+00 12 1.200000e+01 12 ▂▁▁▁▇

Observations by Tax Type

Table 2a — Observations by Tax Type
Tax Type n %
VAT 26 25.0%
WHT 26 25.0%
PAYE 26 25.0%
CIT 26 25.0%

Observations by Industry Sector

Table 2b — Observations by Industry Sector
Industry n %
ICT 32 30.8%
Oil and Gas 72 69.2%

Observations by Management Type

Table 2c — Observations by Management Type
Management Type n %
Local 73 70.2%
Foreign 31 29.8%

Observations by Revenue Band

Table 3 — Observations by Revenue Band
Revenue Band n %
Band 1 < N5B 9 8.7%
Band 2 N5-15B 48 46.2%
Band 3 N15-35B 25 24.0%
Band 4 > N35B 22 21.2%
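
The two derived fields in Table 1 could be produced along the following lines. This is a minimal sketch rather than the code used in the study: the band cut-offs are read from the labels in Table 3, the treatment of values that fall exactly on a boundary (placed in the higher band here) is an assumption, and `derive_fields` is a hypothetical helper name.

```r
# Hypothetical helper: derive compliance_status and revenue_band.
# Cut-offs (N5B, N15B, N35B) come from the Table 3 labels; boundary
# handling (right = FALSE, i.e. intervals of the form [lower, upper))
# is an assumption, not something the report states.
derive_fields <- function(filing_delay_days, turnover_ngn) {
  data.frame(
    compliance_status = factor(
      ifelse(filing_delay_days == 0, "On Time", "Delayed"),
      levels = c("On Time", "Delayed")
    ),
    revenue_band = cut(
      turnover_ngn,
      breaks = c(-Inf, 5e9, 15e9, 35e9, Inf),
      labels = c("Band 1 < N5B", "Band 2 N5-15B",
                 "Band 3 N15-35B", "Band 4 > N35B"),
      right = FALSE
    )
  )
}

# Usage: turnover minimum, median, and maximum from the summary above;
# the delay values (0, 12, 0) are illustrative only.
example <- derive_fields(
  filing_delay_days = c(0, 12, 0),
  turnover_ngn      = c(1487384000, 13053109840, 99647644000)
)
print(example)
```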

5. Exploratory Data Analysis

Technique 1 of 5 — Summary statistics, missing-value analysis, outlier detection

Note: Business Justification

Before any compliance recommendation can be made, the data must be examined for quality issues, distributional irregularities, and anomalies. This step mirrors the mandatory pre-engagement data review I conduct before submitting client representations to FIRS.

5.1 Summary Statistics

Table 4 — Descriptive Statistics: Filing_Delay_Days
Statistic Value
N 104
Mean 21.31
Median 0
Std Dev 83.93
Min 0
Q1 (25%) 0
Q3 (75%) 0.25
Max 677
% On Time (delay = 0) 75%
Skewness 5.61

5.2 Filing Delay by Tax Type

Table 5 — Filing Delay Summary by Tax Type
Tax Type   n    Mean Delay (days)   Median Delay (days)   Min (days)   Max (days)   % On Time
VAT        26   0.00                0                     0            0            100%
WHT        26   5.15                0                     0            119          80.8%
PAYE       26   4.92                0                     0            48           53.8%
CIT        26   75.15               0                     0            677          65.4%

5.3 Outlier Detection

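IQR upper fence: 0.625 | Outliers flagged: 26

Because 75% of filings have zero delay, the fence (Q3 + 1.5 × IQR = 0.25 + 1.5 × 0.25 = 0.625 days) sits below one day, so the IQR method flags every delayed filing rather than only the extreme ones.
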
Table 6 — Outlier Observations (IQR Method)
client tax_type filing_period filing_delay_days revenue_band management_type
Company B CIT 2025-06 677 Band 3 N15-35B Local
Company E CIT 2024-06 324 Band 2 N5-15B Foreign
Company B CIT 2024-06 312 Band 2 N5-15B Local
Company F CIT 2024-06 212 Band 2 N5-15B Foreign
Company E CIT 2025-06 170 Band 3 N15-35B Foreign
Company D WHT 2025-07 119 Band 4 > N35B Local
Company M CIT 2024-06 90 Band 1 < N5B Local
Company C CIT 2024-06 89 Band 3 N15-35B Local
Company C CIT 2025-06 70 Band 4 > N35B Local
Company D PAYE 2025-01 48 Band 4 > N35B Local
Company G PAYE 2025-07 14 Band 3 N15-35B Local
Company B PAYE 2025-07 11 Band 2 N5-15B Local
Company F PAYE 2025-07 11 Band 2 N5-15B Foreign
Company G CIT 2025-06 10 Band 3 N15-35B Local
Company M PAYE 2025-01 10 Band 2 N5-15B Local
Company E PAYE 2025-01 9 Band 3 N15-35B Foreign
Company F PAYE 2025-01 9 Band 2 N5-15B Foreign
Company J WHT 2025-07 9 Band 2 N5-15B Local
Company I PAYE 2025-07 8 Band 3 N15-35B Local
Company L WHT 2025-01 4 Band 2 N5-15B Foreign
Company L PAYE 2025-01 3 Band 2 N5-15B Foreign
Company B PAYE 2025-01 2 Band 2 N5-15B Local
Company H PAYE 2025-01 2 Band 4 > N35B Local
Company B WHT 2025-01 1 Band 2 N5-15B Local
Company C PAYE 2025-01 1 Band 4 > N35B Local
Company K WHT 2025-01 1 Band 1 < N5B Local
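
The fence and the flag count can be re-derived from the published figures alone. The sketch below rebuilds the delay vector from Table 6 (every filing not listed there has zero delay) and applies the IQR rule. It assumes R's default quantile method, which may not be the one used in the original analysis.

```r
# Delays of the 26 late filings, copied from Table 6; the other 78 are zero.
late <- c(677, 324, 312, 212, 170, 119, 90, 89, 70, 48, 14, 11, 11,
          10, 10, 9, 9, 9, 8, 4, 3, 2, 2, 1, 1, 1)
delay <- c(late, rep(0, 104 - length(late)))

# Quartiles with R's default method (type 7), an assumption here.
q <- quantile(delay, c(0.25, 0.75), names = FALSE)
upper_fence <- q[2] + 1.5 * (q[2] - q[1])
n_flagged <- sum(delay > upper_fence)

cat("Q1:", q[1], "| Q3:", q[2],
    "| Upper fence:", upper_fence,
    "| Flagged:", n_flagged, "\n")
```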
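
Warning: Data Quality Issues

Issue 1: Nil Amount_Paid entries (20 records, 19.2%). These are filings with no remittance, representing nil-return obligations. They are valid nil returns rather than data-entry gaps, but the derived numeric column amount_paid_n stores them as missing (n_missing = 20 in the summary in Section 4), not as zero. The regression in Section 9, which includes amount_paid_n, is therefore fitted on the remaining 84 records, consistent with the fit statistics in Table 10.

Issue 2: Right-skewed, zero-inflated outcome. 75% of filings carry no delay; the mean of 21.3 days is distorted by extreme CIT outliers.

Issue 3: Extreme outlier at 677 days. This single CIT filing (Company B, 2025-06) is a genuine compliance failure, not a data error. It is retained in the analysis and acknowledged as an influential observation.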


6. Data Visualisation

Technique 2 of 5 — Grammar of graphics, chart selection, storytelling

Figure 1 — Distribution of Filing Delay Days

Figure 2 — Compliance Rate by Tax Type

Figure 3 — Filing Delay Distribution by Tax Type

Figure 4 — Mean Filing Delay by Revenue Band & Management Type

Figure 5 — Proportion Delayed by Revenue Band
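
The figures themselves are not reproduced in this text. As a rough stand-in for Figure 2, the on-time rates can be recomputed and plotted from the counts implied by Table 5. This is a base-R sketch under that assumption, not the original chart code.

```r
# On-time filings per tax type, implied by Table 5 (26 filings each):
# 26 minus the number of late filings listed in Table 6.
on_time <- c(VAT = 26, WHT = 21, PAYE = 14, CIT = 17)
rate_pct <- round(100 * on_time / 26, 1)
print(rate_pct)

# Simple bar chart of the compliance rate by tax type.
barplot(rate_pct,
        ylim = c(0, 100),
        xlab = "Tax type",
        ylab = "On-time filings (%)",
        main = "Compliance Rate by Tax Type")
```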


7. Hypothesis Testing

Technique 3 of 5 — ANOVA, chi-squared, post-hoc tests

7.1 H1: Does Filing Delay Differ Across Tax Types?

H₀: Mean filing delay is the same across all four tax types. H₁: At least one tax type has a significantly different mean filing delay.

Code
anova_model <- aov(filing_delay_days ~ tax_type, data = df)
summary(anova_model)
             Df Sum Sq Mean Sq F value  Pr(>F)   
tax_type      3 100954   33651   5.387 0.00177 **
Residuals   100 624689    6247                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
kruskal.test(filing_delay_days ~ tax_type, data = df)

    Kruskal-Wallis rank sum test

data:  filing_delay_days by tax_type
Kruskal-Wallis chi-squared = 16.337, df = 3, p-value = 0.0009672
Table 7 — Dunn Post-Hoc Pairwise Comparisons
group1 group2 statistic p.adj p.adj.signif
VAT WHT 1.4178 0.1875 ns
VAT PAYE 3.5158 0.0026 **
VAT CIT 3.2407 0.0036 **
WHT PAYE 2.0980 0.0718 ns
WHT CIT 1.8229 0.1025 ns
PAYE CIT -0.2751 0.7832 ns
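
Tip: Result for Hypothesis 1

H₀ is rejected (ANOVA p = 0.0018; Kruskal-Wallis p = 0.0010). Filing delay differs significantly across tax types. The Dunn post-hoc tests show that VAT differs significantly from PAYE (adjusted p = 0.0026) and from CIT (adjusted p = 0.0036). No other pair, including WHT versus CIT (adjusted p = 0.1025), reaches significance. The overall difference is therefore driven by the contrast between VAT's perfect record and the two obligations with the most delayed filings, not by CIT alone.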
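
The ANOVA table also supports an effect size, which the significance test alone does not convey. The sketch below computes eta squared, a standard measure not reported in the original output, using only the sums of squares printed above.

```r
# Sums of squares copied from the ANOVA table above.
ss_tax_type  <- 100954
ss_residuals <- 624689

# Eta squared: share of total variation in delay attributable to tax type.
eta_squared <- ss_tax_type / (ss_tax_type + ss_residuals)
cat("Eta squared:", round(eta_squared, 3), "\n")
```

On these figures tax type accounts for roughly 14% of the variation in filing delay, so most of the variation lies within tax types.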

7.2 H2: Is Management Type Associated with Compliance Status?

H₀: No association between management type and compliance status. H₁: Management type and compliance status are associated.

Code
ct <- table(df$management_type, df$compliance_status)
print(ct)
         
          On Time Delayed
  Local        55      18
  Foreign      23       8
Code
chi_result <- chisq.test(ct)
print(chi_result)

    Pearson's Chi-squared test with Yates' continuity correction

data:  ct
X-squared = 0, df = 1, p-value = 1
Code
cramers_v <- sqrt(chi_result$statistic / (sum(ct) * (min(dim(ct)) - 1)))
cat("\nCramer's V:", round(cramers_v, 4), "— negligible effect\n")

Cramer's V: 0 — negligible effect
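
Tip: Result for Hypothesis 2

The chi-squared test fails to reject H₀ (p = 1). The on-time rates are almost identical for locally managed (55 of 73, 75.3%) and foreign-managed (23 of 31, 74.2%) companies, so management type is not associated with compliance status in this sample. Tax type is a far stronger differentiator than management type.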
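
A statistic of exactly zero is unusual and follows from the continuity correction: every observed count is within 0.25 of its expected value, and R caps the Yates adjustment at the size of the deviation itself. The sketch below re-enters the contingency table printed above and recomputes Cramér's V without the correction, which is an alternative to the reported calculation rather than a replacement for it.

```r
# Contingency table as printed above.
ct <- matrix(c(55, 18, 23, 8), nrow = 2, byrow = TRUE,
             dimnames = list(management = c("Local", "Foreign"),
                             status = c("On Time", "Delayed")))

chi_yates <- chisq.test(ct)                   # default: Yates' correction
chi_plain <- chisq.test(ct, correct = FALSE)  # no continuity correction

# Expected counts under independence, for comparison with the observed table.
print(round(chi_plain$expected, 2))

# Cramer's V from the uncorrected statistic.
cramers_v <- sqrt(unname(chi_plain$statistic) /
                    (sum(ct) * (min(dim(ct)) - 1)))
cat("Uncorrected X-squared:", round(unname(chi_plain$statistic), 4),
    "| Cramer's V:", round(cramers_v, 4), "\n")
```

The uncorrected V is small but not exactly zero, and the conclusion of a negligible association is unchanged.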


8. Correlation Analysis

Technique 4 of 5 — Pearson, Spearman; correlation vs causation

Table 8 — Spearman Correlations with Filing Delay
Predictor Spearman r
filing_frequency -0.1401
turnover_ngn 0.1105
amount_paid_n 0.0650

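Key observations: Filing frequency has the largest correlation with delay in absolute terms (r ≈ −0.14, Table 8), indicating that monthly filers tend to file more promptly than annual filers, but all three coefficients are weak (|r| < 0.15). Because filing frequency takes only two values (12 and 1), its coefficient mainly restates the contrast between CIT and the three monthly obligations. Company turnover and amount paid show even weaker relationships with delay. Correlation does not imply causation; the relationships reflect structural features of the tax system rather than direct causal mechanisms.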
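
Table 8 reports Spearman rather than Pearson coefficients. A short illustration of why that choice matters for an outcome with extreme values is sketched below, using toy numbers that are not drawn from the dataset.

```r
# Toy values (not from the dataset): a perfectly monotone relationship
# with one extreme value, loosely mimicking a single very long delay.
x <- 1:8
y <- c(0, 1, 2, 3, 4, 5, 6, 677)

pearson  <- cor(x, y, method = "pearson")   # sensitive to the extreme value
spearman <- cor(x, y, method = "spearman")  # uses ranks only

cat("Pearson:", round(pearson, 3),
    "| Spearman:", round(spearman, 3), "\n")
```

Spearman's coefficient depends only on the ordering of the values, so a single extreme delay cannot dominate it in the way it dominates the Pearson coefficient.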


9. Regression Analysis

Technique 5 of 5 — OLS regression with diagnostics

Table 9 — OLS Regression Coefficients
term estimate std.error statistic p.value conf.low conf.high significant
(Intercept) -10.0095 89.4891 -0.1119 0.9112 -188.2424 168.2234 No
tax_typeWHT -0.8837 46.0032 -0.0192 0.9847 -92.5070 90.7396 No
tax_typePAYE -1.0271 45.9989 -0.0223 0.9822 -92.6417 90.5876 No
tax_typeCIT 93.6902 46.5435 2.0130 0.0477 0.9908 186.3897 Yes
management_typeForeign 6.2296 97.3037 0.0640 0.9491 -187.5675 200.0267 No
industry_sectorOil and Gas 5.4404 99.5133 0.0547 0.9565 -192.7575 203.6383 No
turnover_ngn 0.0000 0.0000 0.8596 0.3927 0.0000 0.0000 No
amount_paid_n 0.0000 0.0000 -1.9017 0.0610 0.0000 0.0000 No
Table 10 — Model Fit Statistics
r.squared adj.r.squared sigma statistic p.value
0.166 0.0891 88.5442 2.1602 0.0472
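
Table 10 does not state how many observations entered the model. A quick consistency check, sketched below using only the rounded values in Tables 9 and 10, suggests the fit statistics correspond to 84 complete cases rather than all 104 rows. That is what `lm()` does by default when a predictor such as `amount_paid_n` has 20 missing values.

```r
# Values copied from Table 10; p = 7 slope coefficients in Table 9.
r2 <- 0.166
p  <- 7

# Adjusted R-squared and overall F statistic as functions of sample size n.
adj_r2 <- function(n) 1 - (1 - r2) * (n - 1) / (n - p - 1)
f_stat <- function(n) (r2 / p) / ((1 - r2) / (n - p - 1))

for (n in c(104, 84)) {
  f <- f_stat(n)
  cat("n =", n,
      "| adj R2 =", round(adj_r2(n), 4),
      "| F =", round(f, 3),
      "| p =", round(pf(f, p, n - p - 1, lower.tail = FALSE), 4), "\n")
}
```

Only n = 84 comes close to the reported adjusted R² of 0.0891 and F statistic of 2.1602.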

Table 11 — Variance Inflation Factors
Predictor VIF
tax_type 1.842
management_type 22.931
industry_sector 24.360
turnover_ngn 1.325
amount_paid_n 1.517
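
Tip: Coefficient Interpretation

VAT is the reference category for tax type. Holding management type, industry sector, turnover, and amount paid constant, CIT filings are delayed by an estimated 93.7 days more than VAT filings (p = 0.048; 95% CI 1.0 to 186.4 days), while WHT and PAYE do not differ significantly from VAT. No other predictor is significant at the 5% level. The CIT compliance gap is therefore not explained by company size or remittance amount; it is an obligation-type effect. The model explains about 17% of the variation in delay (R² = 0.166; adjusted R² = 0.089). The remaining variation reflects factors not captured here, such as internal client capacity, FIRS portal availability, and management attention. The variance inflation factors for management type (22.9) and industry sector (24.4) far exceed the common threshold of 10, so these two predictors are highly collinear in this sample and their individual coefficients should not be interpreted separately.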
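
R treats the first level of a factor as the baseline, and the coefficient names in Table 9 (there is no tax_typeVAT row) indicate that VAT fills that role. The sketch below shows how the baseline drives the reading of each coefficient. It uses tax type as the only predictor, with delays rebuilt from Table 6, so the estimates are unadjusted group-mean differences and will not match Table 9.

```r
# Late filings by tax type, copied from Table 6; VAT has none.
late_wht  <- c(119, 9, 4, 1, 1)
late_paye <- c(48, 14, 11, 11, 10, 9, 9, 8, 3, 2, 2, 1)
late_cit  <- c(677, 324, 312, 212, 170, 90, 89, 70, 10)
pad <- function(x) c(x, rep(0, 26 - length(x)))  # 26 filings per tax type

d <- data.frame(
  tax_type = factor(rep(c("VAT", "WHT", "PAYE", "CIT"), each = 26),
                    levels = c("VAT", "WHT", "PAYE", "CIT")),
  filing_delay_days = c(rep(0, 26), pad(late_wht),
                        pad(late_paye), pad(late_cit))
)

# VAT as baseline: each coefficient is that tax type's mean delay minus VAT's.
fit_vat_ref <- lm(filing_delay_days ~ tax_type, data = d)
print(round(coef(fit_vat_ref), 2))

# CIT as baseline: the same contrast now appears with the opposite sign.
d$tax_type_cit_ref <- relevel(d$tax_type, ref = "CIT")
fit_cit_ref <- lm(filing_delay_days ~ tax_type_cit_ref, data = d)
print(round(coef(fit_cit_ref), 2))
```

Statements of the form "fewer delay days than CIT" are only read directly from a model fitted with CIT as the baseline, as in the second fit.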


10. Integrated Findings & Recommendations

Important: Single Integrated Finding

Filing compliance failure in this dataset is a concentrated, obligation-type-specific problem centred on CIT. Company size, management structure, and industry sector show no significant association with filing delay in this sample.

How the Five Techniques Build the Conclusion

  • EDA reveals zero-inflation (75% on time) and extreme CIT outliers.
  • Visualisation shows VAT at 100% on time; CIT with the highest severity.
  • Hypothesis Testing confirms tax-type differences are statistically significant.
  • Correlation identifies filing frequency as the strongest numerical correlate, although the relationship is weak (r ≈ −0.14).
  • Regression quantifies that CIT is associated with about 94 more delay days than the VAT baseline, that WHT and PAYE do not differ significantly from VAT, and that company size and amount paid are not significant.

Policy & Operational Recommendations

The integrated findings support the following six recommendations for tax authorities, advisory practices, and corporate finance teams:

  1. Adopt a tiered, obligation-specific compliance monitoring protocol. Place CIT on a dedicated 90/60/30-day pre-deadline alert schedule and introduce monthly estimated CIT payments modelled on upstream oil and gas practice, while maintaining lighter-touch monthly monitoring for VAT and PAYE where compliance is already largely self-managing.

  2. Strengthen enforcement communication for WHT and PAYE. Tax authorities should publish clearer guidance on deadlines, penalties, and enforcement consequences for WHT and PAYE to lift compliance behaviour closer to VAT’s perfect on-time record.

  3. Deploy automated reminder systems across monthly obligations. Email, SMS, and taxpayer portal alerts can meaningfully reduce operational delays and improve timely filing for recurring monthly returns.

  4. Adopt risk-based compliance monitoring driven by filing analytics. Filing delay patterns and payment behaviour should be used by tax authorities and advisory firms to identify high-risk taxpayers and target compliance interventions precisely rather than uniformly.

  5. Phase in periodic or instalment-based CIT reporting. More frequent CIT reporting cycles or instalment-based compliance structures would reduce year-end pressure, ease cash-flow burdens, and strengthen filing discipline.

  6. Build an integrated national digital tax compliance ecosystem. A unified digital tax infrastructure with automated validation and real-time compliance tracking would improve transparency, reduce filing errors, and strengthen voluntary compliance across all four tax obligations.


11. Limitations & Further Work

Warning

Sample scope: 104 observations across 13 clients within a single advisory portfolio, not a random sample of Nigerian taxpayers.

Clustered structure: Each client contributes multiple filings, so observations are not independent and standard errors may be understated. A mixed-effects model with a client-level random effect would be more appropriate.

Zero-inflation: Severe zero-inflation violates OLS assumptions. A hurdle model combining logistic regression with truncated regression would be more suitable.

Filing_Frequency collinearity: Filing_Frequency is perfectly collinear with Tax_Type (CIT is the only annual filer) and was therefore not included as an independent predictor.

Sectoral coverage and the VAT compliance finding: The dataset covers only two industry sectors — Oil and Gas and ICT. Importantly, Oil and Gas companies typically have their VAT deducted at source by their counterparties, which mechanically guarantees on-time VAT remittance regardless of any active compliance effort by the filing company. The 100% on-time VAT compliance rate observed in this study should therefore be interpreted with caution: it likely reflects this structural feature of source deduction in the Oil and Gas sector rather than universally strong VAT compliance behaviour. Replication using additional sectors — such as manufacturing, financial services, or consumer goods, where VAT is self-assessed and self-remitted — would provide a more representative measure of genuine VAT compliance.

With more data, time, or computing power: extend the panel to additional years for fixed-effects regression; apply a zero-inflated negative binomial model; incorporate FIRS penalty data; build a machine-learning classifier to predict next-period non-compliance.
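
The two-part idea behind the hurdle model mentioned under zero-inflation can be sketched in base R. This is an illustration under stated assumptions, not a fitted result of the study: delays are rebuilt from Table 6, a single CIT indicator replaces the four-level tax type (VAT has no late filings, which would cause perfect separation in the logistic part), and an ordinary regression on log delay stands in for the truncated regression.

```r
# Late filings by tax type, copied from Table 6; VAT has none.
late_wht  <- c(119, 9, 4, 1, 1)
late_paye <- c(48, 14, 11, 11, 10, 9, 9, 8, 3, 2, 2, 1)
late_cit  <- c(677, 324, 312, 212, 170, 90, 89, 70, 10)
pad <- function(x) c(x, rep(0, 26 - length(x)))  # 26 filings per tax type

d <- data.frame(
  is_cit = rep(c(0, 0, 0, 1), each = 26),  # order: VAT, WHT, PAYE, CIT
  delay  = c(rep(0, 26), pad(late_wht), pad(late_paye), pad(late_cit))
)
d$delayed <- as.integer(d$delay > 0)

# Part 1: probability that a filing is late at all.
part1 <- glm(delayed ~ is_cit, family = binomial, data = d)

# Part 2: size of the delay, given that the filing is late
# (log-linear regression used as a simple stand-in).
part2 <- lm(log(delay) ~ is_cit, data = d[d$delay > 0, ])

p_late <- predict(part1, newdata = data.frame(is_cit = c(0, 1)),
                  type = "response")
cat("P(late | not CIT):", round(p_late[[1]], 3),
    "| P(late | CIT):", round(p_late[[2]], 3), "\n")
print(round(coef(part2), 3))
```

Separating the two parts distinguishes whether a filing is late from how late it is, which a single OLS model on the raw delay cannot do.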


References

Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making. Lagos Business School / markanalytics.online. https://markanalytics.online

R Core Team. (2024). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/

Wickham, H., et al. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer.

Obadan, L. (2026). Corporate tax compliance filing records — anonymised client dataset [Dataset]. Compiled from operational compliance tracking registers, Lagos, Nigeria, January 2024 to July 2025. Data available on request from the author.


Appendix: AI Usage Statement

Claude (Anthropic) was used to assist with structuring the Quarto document, generating initial R code scaffolding, and suggesting appropriate diagnostic approaches. All analytical decisions — selection of the outcome variable, choice of statistical tests, interpretation of coefficients, and the integrated recommendation — were made independently by the author. The dataset was compiled from records within my own professional practice, and no data was shared with any AI tool at any stage.