Exploratory Data Analysis
This section contains some preliminary EDA. We construct the sample from all firms for which we have time series of financial statements from FactSet. Flannery and Rangan exclude financials and regulated utilities (SIC codes 4900 to 4999); we have not applied this exclusion yet and need to do so in future research. All variables are winsorized at the 1st and 99th percentiles to limit the influence of extreme observations.
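As an illustration of the winsorization step, here is a minimal sketch using pandas. The helper `winsorize` and the column name `leverage` are hypothetical and only serve the example; the actual variable names and implementation used for this analysis are not shown in this note.

```python
import pandas as pd


def winsorize(series: pd.Series, lower: float = 0.01, upper: float = 0.99) -> pd.Series:
    """Clip values below the `lower` quantile and above the `upper` quantile."""
    lo, hi = series.quantile([lower, upper])
    return series.clip(lower=lo, upper=hi)


# Toy data: the values 0..100 plus one extreme outlier (hypothetical variable).
df = pd.DataFrame({"leverage": [float(x) for x in range(101)] + [10_000.0]})
df["leverage_w"] = winsorize(df["leverage"])

# The outlier is pulled back to the 99th percentile of the raw data.
print(df["leverage_w"].describe())
```

Note that clipping keeps every observation in the sample; it only replaces the tails with the percentile values, which is different from trimming.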
We define two samples:
- All US firms that report in USD and have two or more years of history.
- All US firms that report in USD and have twenty or more years of history. We call these “survivor” firms.
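The two sample definitions above can be sketched as a filter on the number of years observed per firm. This is only a sketch under assumptions: the column names `firm_id` and `fiscal_year` are hypothetical, and "years of history" is interpreted here as the number of distinct fiscal years with data, which may differ from the definition actually used.

```python
import pandas as pd

# Hypothetical firm-year panel: firm A has 25 years, B has 3, C has 1.
panel = pd.DataFrame({
    "firm_id": ["A"] * 25 + ["B"] * 3 + ["C"] * 1,
    "fiscal_year": list(range(1995, 2020)) + [2017, 2018, 2019] + [2019],
})


def firms_with_history(panel: pd.DataFrame, min_years: int) -> list:
    """Return ids of firms observed in at least `min_years` distinct fiscal years."""
    years = panel.groupby("firm_id")["fiscal_year"].nunique()
    return sorted(years[years >= min_years].index)


full_ids = firms_with_history(panel, min_years=2)       # two or more years
survivor_ids = firms_with_history(panel, min_years=20)  # twenty or more years

full_sample = panel[panel["firm_id"].isin(full_ids)]
survivor_sample = panel[panel["firm_id"].isin(survivor_ids)]
```

By construction the survivor sample is a subset of the full sample, so any statistic reported for both is computed on overlapping firms.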
The rationale for also looking at survivor firms is that, if we are interested in modelling the dynamics of firms over a long horizon (as is the case for the CvaR use case), we might want to build the model with firms that have not defaulted or otherwise disappeared over a long horizon. Also, my intuition is that estimating firm-level effects requires enough firm-level history as input. However, Flannery does not take the latter into account, so I might be wrong. In any case, it does no harm.
For the regression, we also require complete information on the explanatory variables. Specifically:
- We have 14491 US firms that report in USD in the data.
- Of these, 6495 have complete information to run the regression.
- The pooled panel has 64024 observations.
- We classify 2456 of the US firms that report in USD as survivors.
- Of the survivors, 1793 have complete information to run the regression.
- The pooled panel for survivor firms has 35036 observations.
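Counts of this kind can be obtained with a complete-case filter on the explanatory variables. The sketch below uses a toy panel; the regressor names `x1` and `x2` are placeholders, and it assumes that a firm counts as having "complete information" if at least one of its firm-year rows has no missing regressor, which may not match the exact rule used for the numbers above.

```python
import numpy as np
import pandas as pd

# Hypothetical firm-year panel with two placeholder regressors.
panel = pd.DataFrame({
    "firm_id": ["A", "A", "B", "B", "C"],
    "fiscal_year": [2018, 2019, 2018, 2019, 2019],
    "x1": [0.1, 0.2, np.nan, 0.4, np.nan],
    "x2": [1.0, 1.1, 1.2, 1.3, 1.4],
})
regressors = ["x1", "x2"]

# Keep only firm-years where every regressor is observed.
complete = panel.dropna(subset=regressors)

n_firms_total = panel["firm_id"].nunique()        # all firms in the data
n_firms_complete = complete["firm_id"].nunique()  # firms with usable rows
n_obs = len(complete)                             # pooled panel size

print(n_firms_total, n_firms_complete, n_obs)
```

The same three quantities can be computed on the survivor subset by filtering the panel to survivor firms before calling `dropna`.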


- Descriptive statistics for the full sample (winsorized at the 1st and 99th percentiles):
- Descriptive statistics for the survivor sample (winsorized at the 1st and 99th percentiles):
- What do industry medians (IndMed) look like?
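One way to inspect industry medians is to compute them per industry and year and attach them back to the panel. This is a sketch under assumptions: the columns `industry`, `fiscal_year`, and `ratio` are hypothetical, and the grouping by industry and year is an assumed definition of IndMed that is not confirmed by this note.

```python
import pandas as pd

# Hypothetical panel with an industry label and one ratio per firm-year.
panel = pd.DataFrame({
    "industry": ["tech", "tech", "tech", "retail", "retail"],
    "fiscal_year": [2019] * 5,
    "ratio": [0.1, 0.3, 0.8, 0.4, 0.6],
})

# Attach the industry-year median to every firm-year row.
panel["ind_med"] = (
    panel.groupby(["industry", "fiscal_year"])["ratio"].transform("median")
)

# Wide table (years as rows, industries as columns) for plotting or inspection.
ind_med_table = (
    panel.groupby(["industry", "fiscal_year"])["ratio"].median().unstack("industry")
)
print(ind_med_table)
```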

