Executive Summary

In this homework, we are interesting in running a basic Fama-Macbeth type of regression predicting cross-sectional stock returns with independent variable.This type of forecasts could simulate investors behaviors on combining multiple companies financial metrics to estimate stocks expected return.

To study this problem, the data was retrieved from WRDS Compustat database. The dataset was from 2010-2019 and the the year \(t\) Calendar stock return was predicted by the \(t-2\) year variables, as taking in to consideration that the company may end its fiscal year in May and report its 10-K in October. In the Table and Figure section, a summary statistics of the coefficient estimates along with the coefficient estimates listed by year. Also, a facet wrap graph by financial year was provided by the end of the page with non-linear least square (a.k.a “nls”) method in showing the Root-mean-sqaure Error (RMSE) trend.

Methodology

    1. E/P ratio: \[\text{E/P ratio} = \frac{\text{Earnings Per Share}}{\text{Stock Price}}\]
    1. Firm-cap \[\text{Firm-cap} = \text{Shares Outstandings} \times \text{Stock Price}\]
    1. Price Inverse \[\text{Price Inverse}= \frac{1}{\text{Price}}\]
    1. Normalized Accurals \[\text{Normalized Accural}= \text{EBITDA}-\text{Depreiciation and Amortization}-\text{Tax Payable}\]
    1. Variable included explanations:
    • CSHO – Common Shares Outstanding:

      This item represents the net number of all common shares outstanding at year-end, excluding treasury shares and scrip.

    • GVKEY – Standard and Poor’s Identifier:

      The fundamental Compustat company identifier.

    • FYEAR – Fiscal Year: This item represents the fiscal year of the current fiscal year-end month.

    • RECT – Receivables Total

    • DP – Depreciation and Amortization:

      Depreciation is concerned with spreading the actual cost or other basic value of tangible capital assets over their estimated useful life is a systematic manner. Amortization is the process of cost allocation for intangible assets

    • TXP – Income Taxes Payable:

      This item represents the accrued tax liability on income due within one year.

    • XACC – Accrued Expenses

    • EBITDA – Earnings Before Interest:

      This item is the sum of Sales - Net (SALE) minus Cost of Goods Sold (COGS) minus Selling, General & Administrative Expense (XSGA).

    • PRCC_F – Price Close - Annual - Fiscal

Table and Figure

Summary Statistics for Estimate Coefficient
Term Mean Standard Deviation
(Intercept) -0.5034755 17.1193241
ep.ratio -0.0021094 0.0355030
mkt.cap -0.0722294 1.9626103
p.accrual 0.0028746 0.0095591
p.inv 0.3591119 0.8099841

Summary Statistics of Coefficient Estimates

Facet Wrap by fyear

Coefficient Table by fyear

Conclusion

By the Facet Wrap graph, as the graph was plotted as RMSE vs market cap.When first plotting the plot, the range of the RMSE is too large to observe a trend. After truncated the data at 10th percentile and 90th percentile, we can easily observe that the RMSE rapidly increase as the market cap gets larger, which shows that the our error are left skewed when plotting in terms of the market cap. We would doubt the FM regression quality with larger Market Cap firms.

Computer Code

#mutate needed variables to data
data = data %>% 
  group_by(gvkey,fyear) %>%
  mutate(ep.ratio = ebitda/prcc_f,#calculate ep ratio
         mkt.cap = log(csho * prcc_f),#calculate market cap
         p.inv = 1/prcc_f, #calculate 1/p ratio
         p.accrual = rect-dp-txp-xacc) %>% #calculate price accrual
  drop_na()#drop any NA 
 
rtn = data %>% 
  group_by(gvkey) %>% 
  mutate(return = prcc_f/lag(prcc_f),# calculate return
         fwd.rtn=lead(prcc_f,2)/lead(prcc_f)) %>%  # calculate forward returns
  drop_na()
 
#run regression of fwd rtn
out.fwd = rtn %>%
    group_by(fyear) %>%
    do(out = tidy(lm(fwd.rtn ~ mkt.cap + p.accrual + ep.ratio + p.inv, data = .))) %>% 
  unnest(out)

#getting avg
coef.stat.fwd = out.fwd %>% 
  group_by(term) %>% 
  summarize(mean = mean(estimate),
              sd = sd(estimate))

colnames(coef.stat.fwd)=c("Term","Mean","Standard Deviation")
knitr::kable(coef.stat.fwd,caption = "Summary Statistics for Estimate Coefficient")  

#forecast rtn
pred.rtn = rtn %>% 
  group_by(gvkey) %>%
  mutate(pred.rtn = coef.stat.fwd$Mean[1]+
           coef.stat.fwd$Mean[2]*lag(mkt.cap,2)+
           coef.stat.fwd$Mean[3]*lag(p.accrual,2)+
           coef.stat.fwd$Mean[4]*lag(ep.ratio,2)+
           coef.stat.fwd$Mean[5]*lag(p.inv,2)) %>% 
  drop_na()
  

pred.rtn=pred.rtn %>% 
  mutate(rmse=rmse(return,pred.rtn))# calculate rmse

pred.rtn=pred.rtn %>% 
  filter(rmse>=quantile(rmse,0.1) & rmse<= quantile(rmse,0.9) ) #trucate the data

pred.rtn$gvkey=as.character(pred.rtn$gvkey)


#truncate outliers in rmse
ggplot(data=pred.rtn,
       aes(x=mkt.cap,y=rmse))+
  geom_point()+
  stat_smooth(method = 'nls', formula = y ~ a*exp(b *x), aes(colour = 'Exponential Fit'), se = FALSE, method.args =list(start=c(a=1,b=1)))+
  facet_wrap(~ fyear)+
  theme_classic()