Getting insights from data: Call Reports

Author

Ian D. Gow

Published

7 February 2026

I recently (Gow, 2026) proposed an extension to the data science “whole game” of R for Data Science (Wickham et al., 2023). In Gow (2026), I used Australian stock price data to illustrate the data curation process and, in this note, I use US bank “Call Report” data as a second illustration. In effect, I provide complete instructions for building a high-performance data library covering all Call Reports data provided by the FFIEC Bulk Data website that can be constructed in less than ten minutes on fast hardware (or a couple of hours on an older machine). I also give a few brief demonstrations of how to use the curated data, with examples for both R and Python. I conclude by discussing challenges encountered during processing and offering some observations about AI and data curation.

My extension of the data science “whole game”—depicted in Figure 1 below—adds a persist step to the original version, groups it with import and tidy into a single process, which I call Curate. As a complement to the new persist step, I also add a load step to the Understand process.¹

A diagram displaying the data science workflow. — Figure 1: A representation of the data science workflow

Tip

This note was written and rendered with Quarto using RStudio, an integrated development environment (IDE) for working with R. The source code for this note is available here and the latest PDF version is available here.

In writing the R portions of this note, I used the packages listed below.²

I have not submitted ffiec.pq to CRAN. To install ffiec.pq, first install the pak package using install.packages("pak").³ Then, use pak to install ffiec.pq by typing pak::pak("iangow/ffiec.pq") in the R console.

library(ffiec.pq)
library(tidyverse)
library(farr)
library(dbplyr)
library(DBI)

For the Python portion of this note, I believe that you need polars, pandas, and ffiec_data_collector (if you use Python to get the zip files). In generating some plots, I use seaborn and matplotlib, but I don’t include the associated code for reasons of space. So pip install polars pandas ffiec_data_collector probably suffices.

db <- dbConnect(duckdb::duckdb())

por <- ffiec_scan_pqs(db, "por")
por |>
  filter(IDRSSD == 802866) |>
  rename(last_sub = last_date_time_submission_updated_on) |>
  select(IDRSSD, financial_institution_name, date,last_sub) |>
  mutate(sub_gap = sql("(last_sub::timestamp - date::timestamp)")) |>
  mutate(sub_gap = epoch(sub_gap) / (60 * 60 * 24)) |>
  arrange(desc(date)) |>
  head() |>
  collect()

# A tibble: 6 × 5
  IDRSSD financial_institution_name date       last_sub            sub_gap
   <int> <chr>                      <date>     <dttm>                <dbl>
1 802866 SILICON VALLEY BANK        2022-12-31 2023-02-16 17:03:37    47.7
2 802866 SILICON VALLEY BANK        2022-09-30 2022-11-04 15:48:53    35.7
3 802866 SILICON VALLEY BANK        2022-06-30 2022-08-04 18:44:07    35.8
4 802866 SILICON VALLEY BANK        2022-03-31 2022-08-29 18:48:12   152. 
5 802866 SILICON VALLEY BANK        2021-12-31 2022-06-10 21:41:20   162. 
6 802866 SILICON VALLEY BANK        2021-09-30 2022-03-04 19:46:37   156.

ffiec_float <- ffiec_scan_pqs(db, "ffiec_float")

figure1_items <- c(
  # Core Figure 1 (Schedule RC)
  "RCFD0081", "RCON0081",             # Cash and non-interest-bearing balances
  "RCFD0071", "RCON0071",             # Interest-bearing balances
  "RCFD2170", "RCON2170",             # Total assets
  "RCFD2200", "RCON2200", "RCFN2200", # Total deposits
  "RCFD2122", "RCON2122",             # Total loans & leases, net
  "RCFD1754", "RCON1754",             # Total securities
  "RCFD3190", "RCON3190",             # Total borrowed money
  "RCFD3210", "RCON3210",             # Total bank equity capital
  "RCFD2948", "RCON2948",             # Total liabilities
  "RCFDG105", "RCONG105",             # Total equity (including NCI)

  # Uninsured deposits (RC-O)
  "RCFD5597", "RCON5597",             # Estimated uninsured deposits

  # Securities (mostly RC-B memo items)
  "RCFDJJ34", "RCONJJ34",             # HTM securities: balance sheet
  "RCFDJ465", "RCONJ465",             # HTM securities: amortized cost
  "RCFD1771", "RCON1771",             # HTM securities: fair value
  "RCFD1773", "RCON1773",             # AFS securities: fair value
  "RCFD1772", "RCON1772",             # AFS securities: amortized cost
  "RCFDJA22", "RCONJA22"              # Equity securities
)

The following pulls together all the data on the items above.

bs_data <-
  ffiec_float |> 
  ffiec_pivot(items = figure1_items) |>
  compute() |>
  system_time()

   user  system elapsed 
  9.691   1.222   1.041

Note that bs_data has data on all banks for all quarters, even though we will only look at SVB here.

bs_data |> count() |> collect()

# A tibble: 1 × 1
       n
   <dbl>
1 657971

The following code prepares the data underlying Figure 1.

figure1_data <-
  bs_data |> 
  filter(IDRSSD == 802866, 
         date %in% c("2019-12-31", "2022-12-31")) |>
  mutate(cash = 
           coalesce(RCFD0081, RCON0081) + 
           coalesce(RCFD0071, RCON0071),
         securities =
           coalesce(RCFDJJ34, RCONJJ34) +
           coalesce(RCFD1773, RCON1773) +
           coalesce(RCFDJA22, RCONJA22),
         loans = 
           coalesce(RCFD2122, RCON2122),
         assets = 
           coalesce(RCFD2170, RCON2170),
         other_assets = assets - cash - securities - loans,
         deposits = 
           coalesce(RCFD2200, RCON2200) + RCFN2200,
         liabilities = 
           coalesce(RCFD2948, RCON2948),
         other_liabilities = liabilities - deposits,
         htm_gains = 
           coalesce(RCFD1771, RCON1771) -
           coalesce(RCFD1754, RCON1754),
         afs_gains = 
           coalesce(RCFD1773, RCON1773) -
           coalesce(RCFD1772, RCON1772),
         equity = 
           coalesce(RCFDG105, RCONG105),
         mtom_metrick = htm_gains + afs_gains,
         mtom_correct = htm_gains,
         equity_mtom_metrick = equity + mtom_metrick,
         equity_mtom = equity + mtom_correct,
         liab_eq = liabilities + equity,
         ) |>
  select(IDRSSD, date, 
         cash, securities, loans, other_assets, assets,
         deposits, other_liabilities, liabilities, 
         equity, liab_eq,
         equity_mtom_metrick, equity_mtom)

Assets (2019-12-31)
Item	Value ($B)
Cash	6
Securities	28
Loans	33
Other assets	3
Total assets	70

Liabilities and Equity (2019-12-31)
Item	Value ($B)
Deposits	63
Other liabilities	2
Liabilities	65
Equity	5
Total liabilities & equity	70
Equity (Metrick MTM)	5
Equity (correct MTM)	5

Assets (2022-12-31)
Item	Value ($B)
Cash	13
Securities	117
Loans	74
Other assets	5
Total assets	209

Liabilities and Equity (2022-12-31)
Item	Value ($B)
Deposits	175
Other liabilities	18
Liabilities	194
Equity	15
Total liabilities & equity	209
Equity (Metrick MTM)	-2
Equity (correct MTM)	0

References

Gow, I.D., 2026. Data curation and the data science workflow.

Wickham, H., Çetinkaya-Rundel, M., Grolemund, G., 2023. R for data science. O’Reilly Media, Sebastopol, CA.

Footnotes

As will be seen this, this load step will not generally be an elaborate one. The inclusion of a separate load step serves more to better delineate the distinction between the Curate process and the Understand process.↩︎
Execute install.packages(c("tidyverse", "farr", "DBI", "duckdb", "pak", dbplyr") within R to install all the packages other than ffiec.pq that you will need to run the R code in this note.↩︎
I already included pak in the command in the footnote to the previous paragraph.↩︎