Getting insights from data: Call Reports
I recently (Gow, 2026) proposed an extension to the data science “whole game” of R for Data Science (Wickham et al., 2023). In Gow (2026), I used Australian stock price data to illustrate the data curation process and, in this note, I use US bank “Call Report” data as a second illustration. In effect, I provide complete instructions for building a high-performance data library covering all Call Reports data provided by the FFIEC Bulk Data website that can be constructed in less than ten minutes on fast hardware (or a couple of hours on an older machine). I also give a few brief demonstrations of how to use the curated data, with examples for both R and Python. I conclude by discussing challenges encountered during processing and offering some observations about AI and data curation.
My extension of the data science “whole game”—depicted in Figure 1 below—adds a persist step to the original version, groups it with import and tidy into a single process, which I call Curate. As a complement to the new persist step, I also add a load step to the Understand process.1
This note was written and rendered with Quarto using RStudio, an integrated development environment (IDE) for working with R. The source code for this note is available here and the latest PDF version is available here.
In writing the R portions of this note, I used the packages listed below.2
I have not submitted ffiec.pq to CRAN. To install ffiec.pq, first install the pak package using install.packages("pak").3 Then, use pak to install ffiec.pq by typing pak::pak("iangow/ffiec.pq") in the R console.
library(ffiec.pq)
library(tidyverse)
library(farr)
library(dbplyr)
library(DBI)For the Python portion of this note, I believe that you need polars, pandas, and ffiec_data_collector (if you use Python to get the zip files). In generating some plots, I use seaborn and matplotlib, but I don’t include the associated code for reasons of space. So pip install polars pandas ffiec_data_collector probably suffices.
db <- dbConnect(duckdb::duckdb())por <- ffiec_scan_pqs(db, "por")
por |>
filter(IDRSSD == 802866) |>
rename(last_sub = last_date_time_submission_updated_on) |>
select(IDRSSD, financial_institution_name, date,last_sub) |>
mutate(sub_gap = sql("(last_sub::timestamp - date::timestamp)")) |>
mutate(sub_gap = epoch(sub_gap) / (60 * 60 * 24)) |>
arrange(desc(date)) |>
head() |>
collect() # A tibble: 6 × 5
IDRSSD financial_institution_name date last_sub sub_gap
<int> <chr> <date> <dttm> <dbl>
1 802866 SILICON VALLEY BANK 2022-12-31 2023-02-16 17:03:37 47.7
2 802866 SILICON VALLEY BANK 2022-09-30 2022-11-04 15:48:53 35.7
3 802866 SILICON VALLEY BANK 2022-06-30 2022-08-04 18:44:07 35.8
4 802866 SILICON VALLEY BANK 2022-03-31 2022-08-29 18:48:12 152.
5 802866 SILICON VALLEY BANK 2021-12-31 2022-06-10 21:41:20 162.
6 802866 SILICON VALLEY BANK 2021-09-30 2022-03-04 19:46:37 156.
ffiec_float <- ffiec_scan_pqs(db, "ffiec_float")figure1_items <- c(
# Core Figure 1 (Schedule RC)
"RCFD0081", "RCON0081", # Cash and non-interest-bearing balances
"RCFD0071", "RCON0071", # Interest-bearing balances
"RCFD2170", "RCON2170", # Total assets
"RCFD2200", "RCON2200", "RCFN2200", # Total deposits
"RCFD2122", "RCON2122", # Total loans & leases, net
"RCFD1754", "RCON1754", # Total securities
"RCFD3190", "RCON3190", # Total borrowed money
"RCFD3210", "RCON3210", # Total bank equity capital
"RCFD2948", "RCON2948", # Total liabilities
"RCFDG105", "RCONG105", # Total equity (including NCI)
# Uninsured deposits (RC-O)
"RCFD5597", "RCON5597", # Estimated uninsured deposits
# Securities (mostly RC-B memo items)
"RCFDJJ34", "RCONJJ34", # HTM securities: balance sheet
"RCFDJ465", "RCONJ465", # HTM securities: amortized cost
"RCFD1771", "RCON1771", # HTM securities: fair value
"RCFD1773", "RCON1773", # AFS securities: fair value
"RCFD1772", "RCON1772", # AFS securities: amortized cost
"RCFDJA22", "RCONJA22" # Equity securities
)The following pulls together all the data on the items above.
bs_data <-
ffiec_float |>
ffiec_pivot(items = figure1_items) |>
compute() |>
system_time() user system elapsed
9.691 1.222 1.041
Note that bs_data has data on all banks for all quarters, even though we will only look at SVB here.
bs_data |> count() |> collect()# A tibble: 1 × 1
n
<dbl>
1 657971
The following code prepares the data underlying Figure 1.
figure1_data <-
bs_data |>
filter(IDRSSD == 802866,
date %in% c("2019-12-31", "2022-12-31")) |>
mutate(cash =
coalesce(RCFD0081, RCON0081) +
coalesce(RCFD0071, RCON0071),
securities =
coalesce(RCFDJJ34, RCONJJ34) +
coalesce(RCFD1773, RCON1773) +
coalesce(RCFDJA22, RCONJA22),
loans =
coalesce(RCFD2122, RCON2122),
assets =
coalesce(RCFD2170, RCON2170),
other_assets = assets - cash - securities - loans,
deposits =
coalesce(RCFD2200, RCON2200) + RCFN2200,
liabilities =
coalesce(RCFD2948, RCON2948),
other_liabilities = liabilities - deposits,
htm_gains =
coalesce(RCFD1771, RCON1771) -
coalesce(RCFD1754, RCON1754),
afs_gains =
coalesce(RCFD1773, RCON1773) -
coalesce(RCFD1772, RCON1772),
equity =
coalesce(RCFDG105, RCONG105),
mtom_metrick = htm_gains + afs_gains,
mtom_correct = htm_gains,
equity_mtom_metrick = equity + mtom_metrick,
equity_mtom = equity + mtom_correct,
liab_eq = liabilities + equity,
) |>
select(IDRSSD, date,
cash, securities, loans, other_assets, assets,
deposits, other_liabilities, liabilities,
equity, liab_eq,
equity_mtom_metrick, equity_mtom)| Item | Value ($B) |
|---|---|
| Cash | 6 |
| Securities | 28 |
| Loans | 33 |
| Other assets | 3 |
| Total assets | 70 |
| Item | Value ($B) |
|---|---|
| Deposits | 63 |
| Other liabilities | 2 |
| Liabilities | 65 |
| Equity | 5 |
| Total liabilities & equity | 70 |
| Equity (Metrick MTM) | 5 |
| Equity (correct MTM) | 5 |
| Item | Value ($B) |
|---|---|
| Cash | 13 |
| Securities | 117 |
| Loans | 74 |
| Other assets | 5 |
| Total assets | 209 |
| Item | Value ($B) |
|---|---|
| Deposits | 175 |
| Other liabilities | 18 |
| Liabilities | 194 |
| Equity | 15 |
| Total liabilities & equity | 209 |
| Equity (Metrick MTM) | -2 |
| Equity (correct MTM) | 0 |
References
Footnotes
As will be seen this, this load step will not generally be an elaborate one. The inclusion of a separate load step serves more to better delineate the distinction between the Curate process and the Understand process.↩︎
Execute
install.packages(c("tidyverse", "farr", "DBI", "duckdb", "pak", dbplyr")within R to install all the packages other thanffiec.pqthat you will need to run the R code in this note.↩︎I already included
pakin the command in the footnote to the previous paragraph.↩︎