Replicating Rhodes-Kropf et al. (2005)

Data step

First, collect data on Fama-French industries for each firm-year (Rhodes-Kropf et al. [2005] use the twelve-industry classification).

library(RPostgreSQL)

## Loading required package: DBI

drv <- dbDriver("PostgreSQL")
pg<- dbConnect(drv, dbname = "crsp")

# Create a CRSP-Compustat link table that includes SIC data from CRSP and this linked in turn 
# to Fama-French (12) industries. Note that Compustat only has a "header" SIC code, whereas 
# the CRSP data can change for a given firm over time.
rs <- dbGetQuery(pg, "DROP TABLE IF EXISTS sics")
rs <- dbGetQuery(pg, "
  CREATE TEMP TABLE sics AS
  SELECT gvkey, permno, namedt, nameenddt, siccd, 
    (CASE WHEN ind_num IS NULL THEN 12 ELSE ind_num::integer END) AS ff_ind
  FROM
    (SELECT DISTINCT gvkey, lpermno AS permno, siccd, 
      min(namedt) AS namedt, max(nameenddt) AS nameenddt
    FROM crsp.ccmxpf_linktable AS a
    INNER JOIN crsp.stocknames AS b
    ON a.lpermno=b.permno
    WHERE USEDFLAG='1' AND namedt >= linkdt AND nameenddt <= linkenddt OR linkenddt IS NULL
    GROUP BY gvkey, lpermno, siccd
    ORDER BY permno, namedt, nameenddt) AS a
  LEFT JOIN ff.ind_12 AS b
  ON siccd BETWEEN b.sic_low AND b.sic_high")

Rhodes-Kropf et al. (2005) describe the following variable calculations:

Using Compustat,we calculate a variety of size, performance,and leverage ratios. Market value is CRSP market equity plus Compustat book assets (item6=AT) minus deferred taxes (item74=TXDB) minus book equity (item60=CEQ). In addition, we obtain the following size-related measures: Total Plant, Property, Equipment (item 8=PPENT), Total Cash (item 1=CHE), Long-term Debt (item 9=DLTT), capital expenditures (CAPEX) (item 128=CAPX), and Net Income (item172=NI). Return on assets and equity are calculated by dividing net income in year t by assets (item6=AT) or book equity (item60=CEQ) in year t-1.

For leverage measures, we obtain the Current Ratio (items 4/5=ACT/LCT), Quick Ratio [items (4 - 3)/5=(ACT-INVT)/LCT], market leverage (1 - market equity/market value), and book leverage (1 - book equity/total book assets).

Finally, the announcement and closing dates of mergers, the method of payment (when available), and a dummy for whether the merger was withdrawn were taken from SDC and merged to the data from Compustat and CRSP.

# Based on this, pull together data from Compustat and merge with CRSP SIC data
reg.data <- dbGetQuery(pg,
  "SELECT gvkey, datadate, fyear,  b.ff_ind, b.siccd, 
    prcc_f * csho + at - txdb - ceq AS mv, prcc_f * csho AS mve,
    at - txdb AS bv,
    ppent, che, dltt, capx, ni, ceq,
    CASE WHEN lag(at) OVER w > 0 THEN ni/lag(at) OVER w END AS roa, 
    CASE WHEN lag(ceq) OVER w > 0 THEN ni/lag(ceq) OVER w END AS roe,
    CASE WHEN lct > 0 THEN act/lct END AS cr, 
    CASE WHEN lct > 0 THEN (act-invt)/lct END AS qr, 
    CASE WHEN (prcc_f * csho + at - txdb - ceq) > 0 THEN
    1 - (prcc_f * csho)/(prcc_f * csho + at - txdb - ceq) END AS lev_mv,
    CASE WHEN at > 0 THEN 1 - ceq/at END AS lev_book
  FROM comp.funda AS a
  LEFT JOIN sics AS b
  USING (gvkey) 
  WHERE indfmt='INDL' AND consol='C' AND datafmt='STD' AND prcc_f IS NOT NULL 
    AND a.datadate BETWEEN b.namedt AND b.nameenddt
  WINDOW w AS (PARTITION BY gvkey ORDER BY datadate)")

Replication and plot

The following is a rough replication of Model 3 of Table 4. The coefficients are pretty similar, except for leverage (perhaps due to a different definition; Rhodes-Kropf et al. [2005] aren't too clear on how they calculated this). Also the \( R^2 \) values I'm getting are a bit higher, which can't be a bad thing. Note that I am including the main effect of the loss indicator (\( I_{NI<0} \)), though this doesn't actually affect the results much at all.

Industry	Constant	\( b \)	\( \textbf{LEV} \)	\( \log(\left\|\mathbf{NI}\right\|) \)	\( I_{\mathrm{NI}< 0} \)	\( I_{\mathrm{NI}< 0} \cdot \log(\left\|\mathbf{NI}\right\|) \)	\( R^2 \)
1	1.74	0.68	-1.68	0.35	-0.01	-0.10	0.93
2	1.67	0.68	-1.44	0.33	-0.06	-0.08	0.94
3	1.78	0.68	-1.62	0.30	0.04	-0.06	0.92
4	1.66	0.74	-1.25	0.23	-0.15	0.01	0.92
5	1.93	0.65	-1.74	0.35	0.04	-0.06	0.95
6	2.22	0.61	-2.10	0.36	-0.04	-0.07	0.88
7	2.47	0.62	-1.49	0.30	-0.13	0.11	0.91
8	1.59	0.81	-1.56	0.18	-0.06	-0.04	0.97
9	2.01	0.67	-1.86	0.32	-0.07	-0.08	0.91
10	2.38	0.59	-1.96	0.37	-0.08	-0.11	0.90
11	1.91	0.58	-1.19	0.41	-0.11	-0.16	0.89
12	2.12	0.64	-1.81	0.31	-0.07	-0.06	0.87

Now plot the predicted values against actual values

library(ggplot2)
qplot(log(mve), predicted, data = predicted)

plot of chunk unnamed-chunk-1