First, collect data on Fama-French industries for each firm-year (Rhodes-Kropf et al. [2005] use the twelve-industry classification).
library(RPostgreSQL)
## Loading required package: DBI
drv <- dbDriver("PostgreSQL")
pg<- dbConnect(drv, dbname = "crsp")
# Create a CRSP-Compustat link table that includes SIC data from CRSP and this linked in turn
# to Fama-French (12) industries. Note that Compustat only has a "header" SIC code, whereas
# the CRSP data can change for a given firm over time.
rs <- dbGetQuery(pg, "DROP TABLE IF EXISTS sics")
rs <- dbGetQuery(pg, "
CREATE TEMP TABLE sics AS
SELECT gvkey, permno, namedt, nameenddt, siccd,
(CASE WHEN ind_num IS NULL THEN 12 ELSE ind_num::integer END) AS ff_ind
FROM
(SELECT DISTINCT gvkey, lpermno AS permno, siccd,
min(namedt) AS namedt, max(nameenddt) AS nameenddt
FROM crsp.ccmxpf_linktable AS a
INNER JOIN crsp.stocknames AS b
ON a.lpermno=b.permno
WHERE USEDFLAG='1' AND namedt >= linkdt AND nameenddt <= linkenddt OR linkenddt IS NULL
GROUP BY gvkey, lpermno, siccd
ORDER BY permno, namedt, nameenddt) AS a
LEFT JOIN ff.ind_12 AS b
ON siccd BETWEEN b.sic_low AND b.sic_high")
Rhodes-Kropf et al. (2005) describe the following variable calculations:
Using Compustat,we calculate a variety of size, performance,and leverage ratios. Market value is CRSP market equity plus Compustat book assets (item6=AT) minus deferred taxes (item74=TXDB) minus book equity (item60=CEQ). In addition, we obtain the following size-related measures: Total Plant, Property, Equipment (item 8=PPENT), Total Cash (item 1=CHE), Long-term Debt (item 9=DLTT), capital expenditures (CAPEX) (item 128=CAPX), and Net Income (item172=NI). Return on assets and equity are calculated by dividing net income in year t by assets (item6=AT) or book equity (item60=CEQ) in year t-1.
For leverage measures, we obtain the Current Ratio (items 4/5=ACT/LCT), Quick Ratio [items (4 - 3)/5=(ACT-INVT)/LCT], market leverage (1 - market equity/market value), and book leverage (1 - book equity/total book assets).
Finally, the announcement and closing dates of mergers, the method of payment (when available), and a dummy for whether the merger was withdrawn were taken from SDC and merged to the data from Compustat and CRSP.
# Based on this, pull together data from Compustat and merge with CRSP SIC data
reg.data <- dbGetQuery(pg,
"SELECT gvkey, datadate, fyear, b.ff_ind, b.siccd,
prcc_f * csho + at - txdb - ceq AS mv, prcc_f * csho AS mve,
at - txdb AS bv,
ppent, che, dltt, capx, ni, ceq,
CASE WHEN lag(at) OVER w > 0 THEN ni/lag(at) OVER w END AS roa,
CASE WHEN lag(ceq) OVER w > 0 THEN ni/lag(ceq) OVER w END AS roe,
CASE WHEN lct > 0 THEN act/lct END AS cr,
CASE WHEN lct > 0 THEN (act-invt)/lct END AS qr,
CASE WHEN (prcc_f * csho + at - txdb - ceq) > 0 THEN
1 - (prcc_f * csho)/(prcc_f * csho + at - txdb - ceq) END AS lev_mv,
CASE WHEN at > 0 THEN 1 - ceq/at END AS lev_book
FROM comp.funda AS a
LEFT JOIN sics AS b
USING (gvkey)
WHERE indfmt='INDL' AND consol='C' AND datafmt='STD' AND prcc_f IS NOT NULL
AND a.datadate BETWEEN b.namedt AND b.nameenddt
WINDOW w AS (PARTITION BY gvkey ORDER BY datadate)")
The following is a rough replication of Model 3 of Table 4. The coefficients are pretty similar, except for leverage (perhaps due to a different definition; Rhodes-Kropf et al. [2005] aren't too clear on how they calculated this). Also the \( R^2 \) values I'm getting are a bit higher, which can't be a bad thing. Note that I am including the main effect of the loss indicator (\( I_{NI<0} \)), though this doesn't actually affect the results much at all.
| Industry | Constant | \( b \) | \( \textbf{LEV} \) | \( \log(\left|\mathbf{NI}\right|) \) | \( I_{\mathrm{NI}< 0} \) | \( I_{\mathrm{NI}< 0} \cdot \log(\left|\mathbf{NI}\right|) \) | \( R^2 \) |
|---|---|---|---|---|---|---|---|
| 1 | 1.74 | 0.68 | -1.68 | 0.35 | -0.01 | -0.10 | 0.93 |
| 2 | 1.67 | 0.68 | -1.44 | 0.33 | -0.06 | -0.08 | 0.94 |
| 3 | 1.78 | 0.68 | -1.62 | 0.30 | 0.04 | -0.06 | 0.92 |
| 4 | 1.66 | 0.74 | -1.25 | 0.23 | -0.15 | 0.01 | 0.92 |
| 5 | 1.93 | 0.65 | -1.74 | 0.35 | 0.04 | -0.06 | 0.95 |
| 6 | 2.22 | 0.61 | -2.10 | 0.36 | -0.04 | -0.07 | 0.88 |
| 7 | 2.47 | 0.62 | -1.49 | 0.30 | -0.13 | 0.11 | 0.91 |
| 8 | 1.59 | 0.81 | -1.56 | 0.18 | -0.06 | -0.04 | 0.97 |
| 9 | 2.01 | 0.67 | -1.86 | 0.32 | -0.07 | -0.08 | 0.91 |
| 10 | 2.38 | 0.59 | -1.96 | 0.37 | -0.08 | -0.11 | 0.90 |
| 11 | 1.91 | 0.58 | -1.19 | 0.41 | -0.11 | -0.16 | 0.89 |
| 12 | 2.12 | 0.64 | -1.81 | 0.31 | -0.07 | -0.06 | 0.87 |
Now plot the predicted values against actual values
library(ggplot2)
qplot(log(mve), predicted, data = predicted)