ANOMALY DETECTION IN PUBLIC EXPENDITURES: THE SÃO PAULO DEPARTMENT OF PUBLIC HEALTH CASE

John Vasconcelos

RESEARCH INTEREST

Applicability of machine learning and AI in identifying disruptive patterns in public and nonprofit budgeting, finance, and accounting.

1 CONTEXT

While AI is still expensive, ML is scalable and accessible
There are multiple algorithms available
Algorithms are complex and general
Algorithms must be studied for specialized uses
Anomaly detection models can optimize public oversight by minimizing false positives

2 OBJECTIVE

To assess the efficacy of the Isolation Forest (iForest) algorithm in identifying public expenditure anomalies that merit investigation by governmental oversight bodies.

3 RESEARCH QUESTION

Can iForest leverage granular expenditure data to objectively uncover disruptive patterns of fraud or errors?

4 HYPOTHESIS

iForest is effective at identifying anomalous expenditure amounts based on public expenditure elements
iForest is less effective than simple aggregation at identifying inaccurate ledger entries based on public expenditure categorization

5 REVIEW OF LITERATURE

Most research is ML-centered or private-sector oriented
Most research applied to the public sector lacks validation evidence because labeled data are unavailable
Most research applies a single method rather than comparing which method is most effective for each case

6 CONCEPTS

ML is a field of AI that enables computers to “learn” patterns from data, without being explicitly programmed for specific tasks.
Supervised learning requires “labeled” training datasets. Unsupervised learning does not require such labels, working on internal patterns.

7 METHODOLOGY

7.1 Tools

The industry standard for tabular ML is the scikit-learn package in Python. For deep learning and AI, the standards are TensorFlow and PyTorch.
R offers ML packages, but they are not as popular
Stata is not a contender for ML work
Presentation: Quarto

7.2 Data

Census of public expenditures downloaded from the open data website of the state government of São Paulo, Brazil.
The datasets utilized covered a period of 10 years, from 2014 to 2023.
The dataset was composed of approximately 1.6 million rows across 36 features, of which 5 were used in the model after cleaning and reshaping. The remaining features were useful for identifying the flagged transactions.

7.3 Dataset overview - Clean data

7.4 Expenditure categorization in Brazil

7.5 iForest algorithm

One of the most popular algorithms for anomaly detection
Works with an ensemble of independently built random trees
Each tree recursively splits on randomly chosen features and values, seeking to isolate observations
Anomalous observations are isolated after fewer splits
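
The isolation idea above can be illustrated with a minimal sketch using scikit-learn's IsolationForest. The data are synthetic and the parameter values are assumptions chosen for illustration, not the settings used in this study.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic stand-in for expenditure amounts (an assumption):
# a dense cluster of 500 values plus one extreme value at row 500.
dense = rng.normal(loc=1_000.0, scale=50.0, size=(500, 1))
extreme = np.array([[50_000.0]])
X = np.vstack([dense, extreme])

# Each of the 100 trees is grown on a random subsample using random splits.
model = IsolationForest(n_estimators=100, random_state=0).fit(X)

# score_samples returns the negated anomaly score: lower means "more anomalous",
# i.e. the observation was isolated after fewer splits on average.
scores = model.score_samples(X)
most_anomalous = int(np.argmin(scores))
```

In this toy case the extreme value sits far from the dense cluster, so it is the easiest observation to isolate and receives the lowest score.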

7.6 iForest choice

Unsupervised algorithm
Fast processing, good scalability
Good for millions of rows and multiple dimensions
No need to determine the number of clusters in advance (it is not a clustering algorithm)
Good for outlier identification
Nonparametric

7.7 Observational design

Summarized category tables to reveal unique combinations of expenditure categories
Unusual combinations could reveal error or fraud
Will the algorithm spot these changes?
Assumption: Anomalous category combinations are not restricted by the financial systems
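
A summary table of category combinations could be built along these lines. This is a sketch only: the column names (category, element, amount) and the toy rows are hypothetical, not taken from the actual dataset.

```python
import pandas as pd

# Hypothetical column names and toy rows for illustration only.
df = pd.DataFrame({
    "category": ["current", "current", "current", "current",
                 "capital", "capital", "current"],
    "element": ["supplies", "supplies", "services", "services",
                "equipment", "equipment", "equipment"],
    "amount": [120.0, 80.0, 300.0, 200.0, 5_000.0, 7_500.0, 900.0],
})

# One row per unique (category, element) combination,
# with its number of transactions and total amount.
combos = (
    df.groupby(["category", "element"], as_index=False)
      .agg(n_transactions=("amount", "size"), total_amount=("amount", "sum"))
      .sort_values("n_transactions")
)

# Rare combinations are candidates for human review, not proof of error or fraud.
rare = combos[combos["n_transactions"] == 1]
```

Here the only rare combination is a capital-type element (equipment) recorded under current expenses, which is the kind of pairing an analyst would inspect.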

7.8 Quasi-experimental design

Group T1: three sets of 20 randomly selected transactions had their accrued value manually multiplied by 20, 100, and 500, respectively, totaling 60 transactions
Group T2: 60 randomly selected transactions had their expense element changed to an element inconsistent with their category (current expenses with capital elements and vice versa)
Control group: Remaining transactions
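
The treatment assignment can be sketched as follows on synthetic data. The column names, the element labels, and the choice to keep T1 and T2 disjoint are assumptions made for illustration. The slides state only the group sizes and the multipliers.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Synthetic stand-in data with hypothetical column names.
n = 1_000
df = pd.DataFrame({
    "amount": rng.lognormal(mean=7.0, sigma=1.0, size=n),
    "category": rng.choice(["current", "capital"], size=n),
})
df["element"] = np.where(df["category"] == "current", "supplies", "equipment")
df["group"] = "control"

# Draw 120 distinct rows: 60 for T1 (three sets of 20) and 60 for T2.
picked = rng.choice(df.index.to_numpy(), size=120, replace=False)
t1_idx, t2_idx = picked[:60], picked[60:]

# T1: multiply the amount by 20, 100, and 500, 20 transactions each.
for factor, idx in zip((20, 100, 500), np.split(t1_idx, 3)):
    df.loc[idx, "amount"] *= factor
    df.loc[idx, "group"] = f"T1_x{factor}"

# T2: swap in an element that is inconsistent with the row's category.
swapped = np.where(df.loc[t2_idx, "category"] == "current", "equipment", "supplies")
df.loc[t2_idx, "element"] = swapped
df.loc[t2_idx, "group"] = "T2"
```

Keeping a group label on every row makes it straightforward to count, after scoring, how many treated transactions the model flagged.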

8 RESULTS

Distribution of categories

Selected measure

Descriptive of elements by year (In BRL)

Descriptive of categories by year (In BRL)

Descriptive per bidding type by year (In BRL)

Descriptive - Categories (In BRL)

Descriptive - Main elements (In BRL)

Visual normality test

The Shapiro-Wilk test does not handle very large samples well. The Anderson-Darling test is too sensitive with very large samples.
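
One common form of visual normality check is a Q-Q plot. The sketch below computes the Q-Q pairs with SciPy on synthetic, right-skewed data. Both the data and the choice of a Q-Q plot are assumptions, since the slides do not say which visual test was used.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic, right-skewed stand-in for expenditure amounts (an assumption).
amounts = rng.lognormal(mean=7.0, sigma=1.5, size=100_000)

# probplot pairs theoretical normal quantiles with the ordered sample values;
# r is the correlation of that pairing (1.0 would be a perfectly straight line).
(theoretical_q, ordered_raw), (_, _, r_raw) = stats.probplot(amounts, dist="norm")
(_, ordered_log), (_, _, r_log) = stats.probplot(np.log(amounts), dist="norm")
```

Plotting theoretical_q against ordered_raw (for example with matplotlib) gives the usual Q-Q picture. A strongly curved line, or an r well below 1, suggests the raw amounts are far from normal.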

Histograms: entire period (In BRL)

Anomaly score distribution

9 CONCLUSION

Numerous simulations with different parameters revealed that the algorithm (i) was ineffective at identifying errors in expenditure categorization and (ii) had low effectiveness in identifying numerical outliers.

The simulation that detected the most errors returned 8 of the treated transactions, out of a total of 1,580 possible outliers. Of these 8, six were transactions multiplied by 500, one by 20, and one by 100.
The most efficient simulation returned 4 treated transactions out of 158 possible outliers.
No transaction treated for categorization (Group T2) was identified
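
The counts above can be restated as rates. The sketch below assumes that 1,580 and 158 are the numbers of observations each simulation flagged, and measures recall against the 60 transactions in Group T1. Both are interpretations of the slide text, not figures stated in it.

```python
# Counts reported on the slides; "flagged" is an interpretation (see lead-in).
t1_size = 60  # three sets of 20 transactions in Group T1
runs = {
    "most detections": {"hits": 8, "flagged": 1_580},
    "most efficient": {"hits": 4, "flagged": 158},
}

summary = {}
for name, run in runs.items():
    summary[name] = {
        # Share of flagged observations that were treated transactions.
        "precision": run["hits"] / run["flagged"],
        # Share of T1 transactions that were flagged.
        "recall_t1": run["hits"] / t1_size,
    }
    print(f"{name}: precision={summary[name]['precision']:.2%}, "
          f"T1 recall={summary[name]['recall_t1']:.2%}")
```

Under these assumptions, precision is about 0.51% and 2.53%, and T1 recall is about 13.33% and 6.67%. Precision here counts only the injected anomalies as hits, so genuine untreated anomalies among the flagged observations would not be credited.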

Grouping was more effective for identifying incorrect categorization, although it required intensive human review.

The descriptive analysis showed that values per element were densely concentrated, with a high occurrence of outliers. These outliers were nearly evenly distributed, except for the extreme ones.
The model easily identifies outliers separated by “blank space” in the distribution
In practical applications, errors and fraud are not necessarily so extreme

Data on line items might have allowed for more accurate anomaly isolation due to shorter and more even distribution ranges
For financial outliers, the model might be useful for narrowing down the scope for human analysis
Although applicable to categorization, mere grouping does not address individualized outlier behavior in the context of each group

10 VALIDITY

11 FURTHER RESEARCH

Complementary data:

Comparative research:

Experimental research:

Questions?