When does a bank fail?
A bank typically fails for one of the two reasons: Insolvency or Illiquidity. Insolvency occurs when the value of the assets held by a bank is less than the value of the liabilities(Deposits) held. Illiquidity occurs when a bank is not able to meet its current obligations as they come due.
What is the role of FDIC(Federal Deposit Insurance Corporation) in the event of bank failure?
In the event of a bank failure, the FDIC acts in two capacities. First, as the insurer of the bank’s deposits, the FDIC pays insurance to the depositors up to the insurance limit. Second, the FDIC, as the “Receiver” of the failed bank, assumes the task of selling/collecting the assets of the failed bank and settling its debts, including claims for deposits in excess of the insured limit.
The dataset holds information of all the banks in US which have failed from 1934 to present.
Problem Statement
What are the risk indicators which a depositor should take into consideration while depositing money into a bank?. The exploratory data analysis focuses on the typical risk indicators leading to a bank failure.
Solution Overview
Observing the typical characteristics of a bank failure using data visualization and summary statistics which will aid in guiding a depositor to choose a bank which most likely would not run into a failure.
The following packages were required:
# Loading the libraries
library(flexdashboard)
library(dplyr)
library(tidyr)
library(stringr)
library(ggplot2)
library(printr)
library(DT)
library(knitr)1.Data Import
Source: Commercial bank failures, 1934 - Present from Kaggle
This dataset contains 12 variables and 3484 observations of the failure of a commercial bank, savings association, and savings bank since the establishment of the FDIC in 1933. The bank failure report is available on the FDIC website. Comprehensive data on estimated losses are not available for FDIC-insured failures prior to 1986. Estimated loss is presented as “N/A” in years for which comprehensive information is not available.
# URL to the github repository
url <- "https://raw.githubusercontent.com/sayalidw93/Data-Wrangling/master/project%20data.csv"
# Read csv file
bank_raw <- read.csv(url, stringsAsFactors = FALSE)# Display class of the dataset
class(bank_raw)[1] "data.frame"
# Display variable names in the dataset
colnames(bank_raw) [1] "Financial.Institution.Number" "Institution.Name"
[3] "Institution.Type" "Charter.Type"
[5] "Headquarters" "Failure.Date"
[7] "Insurance.Fund" "Certificate.Number"
[9] "Transaction.Type" "Total.Deposits"
[11] "Total.Assets" "Estimated.Loss..2015."
# Display the dimensions of the dataset
dim(bank_raw)[1] 3484 12
2. Codebook
Codebook is available on the FDIC website. Description of the variables in the dataset is as follows:
3. Data Cleaning
Missing values in the data:
I checked the data for missing values.
# Number of missing values for each variable
colSums(is.na(bank_raw))Financial.Institution.Number Institution.Name
601 0
Institution.Type Charter.Type
0 0
Headquarters Failure.Date
0 0
Insurance.Fund Certificate.Number
0 485
Transaction.Type Total.Deposits
0 0
Total.Assets Estimated.Loss..2015.
151 975
Thus, there are missing values for Financial Institution Number, Certificate Number, Total Assets and Estimated Losses. As the variables Financial Institution Number and Certificate Number are not much of a concern in the exploratory analysis for this dataset, I will drop these two variables. However, for Total Assets and Estimated Losses I have decided not to remove any rows with missing data yet as these variables are significant for analysis. I will take care of this issue while doing individual analysis.
# Dropping insignificant variables
bank_raw <- bank_raw %>% select(-Financial.Institution.Number,-Certificate.Number)Splitting Failure Date and Headquarter variables
For the ease of analysis and to find relation between the variables, I split the Headquarters variable into Headquarter city and State variables, and the Failure Date variable into Month, Date and Year variables.
# Splitting the Headquarter and Failure Date variables
banks <- bank_raw %>%
separate(Headquarters, c("Headquarter city","State"), sep = ",") %>%
separate(Failure.Date, c("Month","Date","Year"), sep = "/")Imputing data
In the process of splitting the variables, a warning popped up for 2914th observation.# Checking the record after the variables were split
banks[2914,]| Institution.Name | Institution.Type | Charter.Type | Headquarter city | State | Month | Date | Year | Insurance.Fund | Transaction.Type | Total.Deposits | Total.Assets | Estimated.Loss..2015. | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2914 | BANK USA, N.A. | COMMERCIAL BANK | FEDERAL | KIHEI | MAUI | 5 | 19 | 1995 | BIF | ACQUISITION | 9124 | 9048 | 2593 |
# Checking the record before the variables were split
bank_raw[2914,]| Institution.Name | Institution.Type | Charter.Type | Headquarters | Failure.Date | Insurance.Fund | Transaction.Type | Total.Deposits | Total.Assets | Estimated.Loss..2015. | |
|---|---|---|---|---|---|---|---|---|---|---|
| 2914 | BANK USA, N.A. | COMMERCIAL BANK | FEDERAL | KIHEI, MAUI, HI | 5/19/1995 | BIF | ACQUISITION | 9124 | 9048 | 2593 |
Thus, we see that, the state variable is MAUI instead of HI(Hawaii). Comparing it with the original dataset, I decided to replace the Value “MAUI” with “HI” for this record, as the state variable is an important variable in our analysis.
# Imputing value in the State variable
banks$State <- str_replace(banks$State, "MAUI","HI")4. Data Preview
In the data, each row is the failed bank and the columns represent the information for each failed bank. Dollar amount is in thousands for Total Assets, Total Deposits amd Estimated Losses.
# Get a sense of cleaned data
datatable(tail(banks,100))5. Structure and Summary of Clean Data
# Create a datatable to display structure of clean data
Var.type <- lapply(banks,class)
Var.name <- colnames(banks)
strdata <- as_data_frame(cbind(Var.name,Var.type))
colnames(strdata) <- c("Variable Name","Variable Type")
kable(strdata)| Variable Name | Variable Type |
|---|---|
| Institution.Name | character |
| Institution.Type | character |
| Charter.Type | character |
| Headquarter city | character |
| State | character |
| Month | character |
| Date | character |
| Year | character |
| Insurance.Fund | character |
| Transaction.Type | character |
| Total.Deposits | integer |
| Total.Assets | integer |
| Estimated.Loss..2015. | integer |
# Summary Statistics
Total.Deposits <- summary(banks$Total.Deposits)
Total.Assets <- summary(banks$Total.Assets)
Estimated.Loss <- summary(banks$Estimated.Loss..2015.)
# Merging the summary statistics for 3 variables in a data frame
summary_stats <- rbind(Total.Deposits,Total.Assets,Estimated.Loss)
# Display the summary statistics table
kable(summary_stats)| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | NA’s | |
|---|---|---|---|---|---|---|---|
| Total.Deposits | 0 | 12293 | 41721 | 299776.18 | 156241.5 | 188260793 | 0 |
| Total.Assets | 14 | 15051 | 47523 | 404588.41 | 173478.0 | 307021614 | 151 |
| Estimated.Loss | -4386 | 3852 | 13354 | 70644.35 | 44585.0 | 12438005 | 975 |
The summary statistics table gives us an idea about Mean, Median, Minimum, Maximum, Missing values, 1st Quartile and 3rd Quartile for the numeric variables of interest. The negative Minimum value of Estimated loss may indicate that those banks had a slight margin for profit, however going forward estimated to run out of money in the near future, hence opted for early bankruptcy.Thus, those are not unusual values and are not deleted.
This section aims to explore the various risk indicators for a bank failure.
I intend on exploring the following relations :
Analyze the bank failure rate over the period of time using a barplot/time series plot.This will help the depositor in deciding if the bank is risky or if the bank failed because of adverse market conditions.
Also, analyze the state-wise bank failure rate over the period of time using barplots, heatmaps etc. This will give an idea which states were more likely to witness bank failures.
Institution Type wise bank failure rate over the given period of time using barplot. This helps in deciding which institution types had a high failure rate.
From the data it looks like the difference between the total assets and total deposits is minimal. I plan to create a new Difference variable and try to check its variation over the given period of time. Also, a time series plot of the two variables, total assets and total deposits will give a clearer picture of how these variables has had an impact on bank failure. Theoretically, the bookish value of total assets (Total Assets) is more than actual value of total assets at the time when the bank is declared bankrupt.
Analyze the variation of estimated losses with respect to the transaction type used to resolve a bank failure.
What do I not know now? :(
Still exploring possible ways of building a model which gives the relationship of the estimated cost of resolving the bank failure with the other significant variables.