Identifying Bank Subsidiaries and Parent Holding Companies in the US

Getting the Data

The datasets used in this article are public and can be downloaded from this link for the commercial banks level and this link for the bank holding company level. In either case, the variables names follow suit and can be identified using the website data dictionary. After downloading the data, I place the commercial and parent zipped files into two folders, named COM and BHC, respectively.

To get started, I load the usual suspects plyr and lubridate. Additionally, I refer to Hmisc and SASxport packages in order to deal with the SAS formatted files for the Call data.

library(plyr)
library(lubridate)
library(Hmisc)
library(haven)

rm(list = ls())

data.dir <- getwd()

From my own experience, I found it very useful to create a list of variables of interest and, hence, work with a smaller subset, especially when the regulatory data have thousands of variables.

var_list_file <- "/home/simaan/Dropbox/FNA/Data/var_list.csv"
var.ds <- read.csv(var_list_file,stringsAsFactors = FALSE)
var.ds

In this case, I focus only on identifying information. Moreover, I use the same variable list for both the Call and the FR Y-9C data.

Call Reports

I start with the Call reports and focus on the last four quarters of data.

com.dir <- paste(data.dir,"COM",sep = "/")
zip.files <- paste(com.dir,list.files(com.dir),sep = "/")
zip.l <- sapply(zip.files,unzip)
zip.l

## /home/simaan/Dropbox/RMarkdown/Identify_BHC_Sub/COM/call1703-zip.zip 
##                                                     "./call1703.xpt" 
## /home/simaan/Dropbox/RMarkdown/Identify_BHC_Sub/COM/call1706-zip.zip 
##                                                     "./call1706.xpt" 
## /home/simaan/Dropbox/RMarkdown/Identify_BHC_Sub/COM/call1709-zip.zip 
##                                                     "./call1709.xpt" 
## /home/simaan/Dropbox/RMarkdown/Identify_BHC_Sub/COM/call1712-zip.zip 
##                                                     "./call1712.xpt"

The above command unzips the listed files for R to read them. Note that each zipped file corresponds to a given YYMM quarter. Next, we select the variables of interest while reading each file separately:

var.sel <- var.ds$var

# read into a list
com.l <- lapply(zip.l, read_xpt )
com.l <- lapply(com.l, function(ds.i) ds.i[,names(ds.i) %in% var.sel] )

# stack altogether
com.ds <- ldply(com.l,data.frame)
com.ds$.id <- NULL
dim(com.ds)

## [1] 24209    13

head(com.ds)

After reading the above data, I replace each variable’s name with its label using the var.ds object. In particular, I do the following

var.ds <- merge(var.ds,data.frame(var = names(com.ds)), by = "var")
com.ds <- com.ds[,var.ds$var]
names(com.ds) <- var.ds$name
summary(com.ds)

##      lei                  id            country              name          
##  Length:24209       Min.   :     37   Length:24209       Length:24209      
##  Class :character   1st Qu.: 328357   Class :character   Class :character  
##  Mode  :character   Median : 652977   Mode  :character   Mode  :character  
##                     Mean   :1010749                                        
##                     3rd Qu.: 966731                                        
##                     Max.   :5143788                                        
##                                                                            
##   name_legal        fed_district_code    org_type       charter_type  
##  Length:24209       Min.   : 0.000    Min.   : 0.000   Min.   :  0.0  
##  Class :character   1st Qu.: 6.000    1st Qu.: 1.000   1st Qu.:200.0  
##  Mode  :character   Median : 7.000    Median : 1.000   Median :200.0  
##                     Mean   : 7.158    Mean   : 1.302   Mean   :208.5  
##                     3rd Qu.:10.000    3rd Qu.: 1.000   3rd Qu.:200.0  
##                     Max.   :12.000    Max.   :11.000   Max.   :400.0  
##                     NA's   :3         NA's   :3        NA's   :3      
##      city              state           fed_regulator2       parent_id      
##  Length:24209       Length:24209       Length:24209       Min.   :      0  
##  Class :character   Class :character   Class :character   1st Qu.:1056804  
##  Mode  :character   Mode  :character   Mode  :character   Median :1203853  
##                                                           Mean   :1700863  
##                                                           3rd Qu.:2744883  
##                                                           Max.   :5170805  
##                                                           NA's   :3        
##       date         
##  Min.   :20170331  
##  1st Qu.:20170331  
##  Median :20170630  
##  Mean   :20170777  
##  3rd Qu.:20170930  
##  Max.   :20171231  
##

It looks like that some the parent’s id are given zero. If that’s the case, I assign them the same value of the subsidiary id.

com.ds[com.ds$parent_id %in% 0,"parent_id"] <- com.ds[com.ds$parent_id %in% 0,"id"] 
com.ds <- na.omit(com.ds)

In addition, I drop all missing values from the com.ds data frame.

FR Y-9C Reports

The com.ds provides identification of all banks on the subsidiary level with a reference to the parent. In order to link the subsidiaries along with their parents, we move to the holding company data. Following approach as above, I read the parent BHCs data with respect to the description provided here:

# unzip
bhc.dir <- paste(data.dir,"BHC",sep = "/")
zip.files <- paste(bhc.dir,list.files(bhc.dir),sep = "/")
zip.files <- tail(zip.files,4)
zip.l <- sapply(zip.files,unzip)

# load the selected variables
var.ds <- read.csv(var_list_file ,stringsAsFactors = F)
var.sel <- var.ds$var

# read each file
read.bhc <- function(x) read.table(x, sep="^", comment.char="", header=TRUE, quote="", na.strings="--------", as.is=TRUE)
bhc.l <- lapply(zip.l,read.bhc)
bhc.l <- lapply(bhc.l,function(x) x[,names(x) %in% var.sel] )

# put list in data.frame
bhc.ds <- ldply(bhc.l,data.frame)
bhc.ds$.id <- NULL

# change var names to the given labels
var.ds <- read.csv(var_list_file ,stringsAsFactors = F)
var.ds <- merge(var.ds,data.frame(var = names(bhc.ds)), by = "var")
bhc.ds <- bhc.ds[,var.ds$var]
names(bhc.ds) <- var.ds$name
summary(bhc.ds)

##      lei            fed_regulator            id            country         
##  Length:10852       Length:10852       Min.   :1020180   Length:10852      
##  Class :character   Class :character   1st Qu.:1132140   Class :character  
##  Mode  :character   Mode  :character   Median :2090779   Mode  :character  
##                                        Mean   :2283760                     
##                                        3rd Qu.:3253678                     
##                                        Max.   :5170805                     
##                                        NA's   :4                           
##      name            name_legal        fed_district_code    org_type     
##  Length:10852       Length:10852       Min.   : 0.00     Min.   : 1.000  
##  Class :character   Class :character   1st Qu.: 6.00     1st Qu.: 1.000  
##  Mode  :character   Mode  :character   Median : 8.00     Median : 1.000  
##                                        Mean   : 7.46     Mean   : 1.424  
##                                        3rd Qu.:10.00     3rd Qu.: 1.000  
##                                        Max.   :12.00     Max.   :99.000  
##                                        NA's   :4         NA's   :4       
##   charter_type       city              state             parent_id      
##  Min.   :500.0   Length:10852       Length:10852       Min.   :      0  
##  1st Qu.:500.0   Class :character   Class :character   1st Qu.:      0  
##  Median :500.0   Mode  :character   Mode  :character   Median :      0  
##  Mean   :500.7                                         Mean   : 213353  
##  3rd Qu.:500.0                                         3rd Qu.:      0  
##  Max.   :720.0                                         Max.   :5141195  
##  NA's   :4                                             NA's   :4        
##       date         
##  Min.   :20170331  
##  1st Qu.:20170630  
##  Median :20170630  
##  Mean   :20170884  
##  3rd Qu.:20171231  
##  Max.   :20171231  
##  NA's   :4

It looks like most parent ids are zeroes. If that’s the case, then I assign the parent id number the same value of the id column, i.e.

bhc.ds[bhc.ds$parent_id %in% 0,"parent_id"] <- bhc.ds[bhc.ds$parent_id %in% 0,"id"] 
bhc.ds <- na.omit(bhc.ds)

Linking the Two Datasets

Given the datasets com.ds and bhc.ds, I merge them in the following fashion:

names(com.ds) <- paste(names(com.ds),"L1",sep = "_")
names(com.ds)[names(com.ds) == "date_L1"] <- "date"
com.ds$common_id_L1 <- com.ds$parent_id_L1
bhc.ds$common_id_L1 <- bhc.ds$id

com.bhc.ds <- merge(com.ds,bhc.ds, by = c("common_id_L1","date") )
head(com.bhc.ds)

Let’s compute the number of the subsidiaries for this child-parent dataset

no.sub1 <- ddply(com.bhc.ds,c("common_id_L1","date"), function(ds.i)  length(unique(ds.i$id_L1))  )
names(no.sub1)[3] <- "number_subs_1"
summary(no.sub1)

##   common_id_L1          date          number_subs_1   
##  Min.   :1020180   Min.   :20170331   Min.   : 1.000  
##  1st Qu.:1129186   1st Qu.:20170630   1st Qu.: 1.000  
##  Median :1944204   Median :20170630   Median : 1.000  
##  Mean   :2210358   Mean   :20170888   Mean   : 1.122  
##  3rd Qu.:3157150   3rd Qu.:20171231   3rd Qu.: 1.000  
##  Max.   :5170805   Max.   :20171231   Max.   :17.000

We observe that, mainly, there is a one-to-one relationship, with the exception for a number of cases. For instance, we note that in the last quarter of 2017, the Wintrust Financial Corporation had 17 subsidiaries:

id17 <- unique(no.sub1[no.sub1$number_subs_1 == 17,"common_id_L1"])
WT.list <- com.bhc.ds[com.bhc.ds$common_id_L1 ==  id17 & com.bhc.ds$date == 20171231,"name_L1"]
cat(sort(as.character(WT.list)),sep = "\n")

## BARRINGTON B&TC NA
## BEVERLY B&TC NA
## CHICAGO TC NA
## CRYSTAL LAKE B&TC NA
## FIFC EDGE INTL CORP
## HINSDALE B&TC
## LAKE FOREST B&TC NA
## LIBERTYVILLE B&TC
## NORTHBROOK B&TC
## OLD PLANK TRAIL CMNTY BK NA
## SCHAUMBURG B&TC NA
## ST CHARLES B&TC
## STATE BK OF THE LAKES
## TOWN BK
## VILLAGE B&TC
## WHEATON B&TC
## WINTRUST BK

Relatively, it is considered a large BHC with more than $20 billion in assets. To confirm such, I look into the 10-K annual filing by the corporation to confirm whether this the case. The report points out that:

“We now own fifteen banks, including eight Illinois-chartered banks: Hinsdale Bank and Trust Company (“Hinsdale Bank”), Wintrust Bank, Libertyville Bank and Trust Company (“Libertyville Bank”), Northbrook Bank & Trust Company (“Northbrook Bank”), Village Bank & Trust (“Village Bank”), Wheaton Bank & Trust Company (“Wheaton Bank”), State Bank of the Lakes and St. Charles Bank & Trust Company (“St. Charles Bank”). In addition, we have one Wisconsin-chartered bank, Town Bank, and six nationally chartered banks: Lake Forest Bank, Barrington Bank, Crystal Lake Bank & Trust Company, N.A. (“Crystal Lake Bank”), Schaumburg Bank & Trust Company, N.A. (“Schaumburg Bank”), Beverly Bank & Trust Company, N.A. (“Beverly Bank”) and Old Plank Trail Community Bank, N.A. (“Old Plank Trail Bank”). As of December 31, 2017, we had 157 banking locations.”

On the other hand, if we look at JP Morgan, for instance, we observe that it has two main subsidiaries at the end of 2017. This is in line with the 10-K report from the SEC (see link). Nevertheless, it is also appears that each subsidiary has a number of other subsidiaries in the annual report, which our data does not seem to convey such complexity in the structure:

JP.list <- com.bhc.ds[com.bhc.ds$common_id_L1 ==  1039502 & com.bhc.ds$date == 20171231,"name_legal_L1"]
cat(sort(as.character(JP.list)),sep = "\n")

## J.P. MORGAN INTERNATIONAL FINANCE LIMITED
## JPMORGAN CHASE BANK, NATIONAL ASSOCIATION

Final Data

To finalize, I consider the date of the report, the name and id of both the subsidiary and the parent, as well as the LEI code for each one. I also add the number of subsidiaries for each BHC in the sample

ds.final <- com.bhc.ds[,c("date","name_legal_L1","lei_L1","id_L1","name_legal","lei","id")]

no_sub <- ddply(ds.final,c("id","date"),function(x)  data.frame(no_sub = nrow(x)) )
no_sub <- no_sub[!is.na(no_sub$id),]
ds.final <- merge(ds.final,no_sub,by = c("id","date"),all = T)

Note that the LEI on the subsidiary level has multiple zeroes, whereas this does not seem to be the case for the parent.

Publically Traded Banks

In total, we have

length(unique(ds.final$id))

## [1] 4291

BHCs. However, only a small proportions are publicly traded. To identify these banks, I refer to the Banking Research Datasets from the Federal Reserve Bank of New York :

frb_crsp_file <- "https://www.newyorkfed.org/medialibrary/media/research/banking_research/data/crsp_20161231.csv?la=en"
crsp <- read.csv(frb_crsp_file,stringsAsFactors = F)
ds.final <- ds.final[ds.final$id %in% crsp$entity,]

In total, we identify 419 publicly traded companies according to 2017

length(unique(ds.final$id))

## [1] 419

Recall that commercial banks file the Call report on the lower level, whereas SEC filings and stock data take place at the parent level (see this link for a further discussion about the link data). Hence, for researchers interested in stock data, the final dataset would constitute a much smaller sample size.

Identifying Bank Subsidiaries and Parent Holding Companies in the US

Majeed Simaan

Feb 2, 2024