In this homework, we are interesting in running a basic Fama-Macbeth type of regression predicting cross-sectional stock returns with independent variable.This type of forecasts could simulate investors behaviors on combining multiple companies financial metrics to estimate stocks expected return.
To study this problem, the data was retrieved from WRDS Compustat database. The dataset was from 2010-2019 and the the year \(t\) Calendar stock return was predicted by the \(t-2\) year variables, as taking in to consideration that the company may end its fiscal year in May and report its 10-K in October. In the Table and Figure section, a summary statistics of the coefficient estimates along with the coefficient estimates listed by year. Also, a facet wrap graph by financial year was provided by the end of the page with non-linear least square (a.k.a “nls”) method in showing the Root-mean-sqaure Error (RMSE) trend.
CSHO – Common Shares Outstanding:
This item represents the net number of all common shares outstanding at year-end, excluding treasury shares and scrip.
GVKEY – Standard and Poor’s Identifier:
The fundamental Compustat company identifier.
FYEAR – Fiscal Year: This item represents the fiscal year of the current fiscal year-end month.
RECT – Receivables Total
DP – Depreciation and Amortization:
Depreciation is concerned with spreading the actual cost or other basic value of tangible capital assets over their estimated useful life is a systematic manner. Amortization is the process of cost allocation for intangible assets
TXP – Income Taxes Payable:
This item represents the accrued tax liability on income due within one year.
XACC – Accrued Expenses
EBITDA – Earnings Before Interest:
This item is the sum of Sales - Net (SALE) minus Cost of Goods Sold (COGS) minus Selling, General & Administrative Expense (XSGA).
PRCC_F – Price Close - Annual - Fiscal
| Term | Mean | Standard Deviation |
|---|---|---|
| (Intercept) | -0.5034755 | 17.1193241 |
| ep.ratio | -0.0021094 | 0.0355030 |
| mkt.cap | -0.0722294 | 1.9626103 |
| p.accrual | 0.0028746 | 0.0095591 |
| p.inv | 0.3591119 | 0.8099841 |
By the Facet Wrap graph, as the graph was plotted as RMSE vs market cap.When first plotting the plot, the range of the RMSE is too large to observe a trend. After truncated the data at 10th percentile and 90th percentile, we can easily observe that the RMSE rapidly increase as the market cap gets larger, which shows that the our error are left skewed when plotting in terms of the market cap. We would doubt the FM regression quality with larger Market Cap firms.
#mutate needed variables to data
data = data %>%
group_by(gvkey,fyear) %>%
mutate(ep.ratio = ebitda/prcc_f,#calculate ep ratio
mkt.cap = log(csho * prcc_f),#calculate market cap
p.inv = 1/prcc_f, #calculate 1/p ratio
p.accrual = rect-dp-txp-xacc) %>% #calculate price accrual
drop_na()#drop any NA
rtn = data %>%
group_by(gvkey) %>%
mutate(return = prcc_f/lag(prcc_f),# calculate return
fwd.rtn=lead(prcc_f,2)/lead(prcc_f)) %>% # calculate forward returns
drop_na()
#run regression of fwd rtn
out.fwd = rtn %>%
group_by(fyear) %>%
do(out = tidy(lm(fwd.rtn ~ mkt.cap + p.accrual + ep.ratio + p.inv, data = .))) %>%
unnest(out)
#getting avg
coef.stat.fwd = out.fwd %>%
group_by(term) %>%
summarize(mean = mean(estimate),
sd = sd(estimate))
colnames(coef.stat.fwd)=c("Term","Mean","Standard Deviation")
knitr::kable(coef.stat.fwd,caption = "Summary Statistics for Estimate Coefficient")
#forecast rtn
pred.rtn = rtn %>%
group_by(gvkey) %>%
mutate(pred.rtn = coef.stat.fwd$Mean[1]+
coef.stat.fwd$Mean[2]*lag(mkt.cap,2)+
coef.stat.fwd$Mean[3]*lag(p.accrual,2)+
coef.stat.fwd$Mean[4]*lag(ep.ratio,2)+
coef.stat.fwd$Mean[5]*lag(p.inv,2)) %>%
drop_na()
pred.rtn=pred.rtn %>%
mutate(rmse=rmse(return,pred.rtn))# calculate rmse
pred.rtn=pred.rtn %>%
filter(rmse>=quantile(rmse,0.1) & rmse<= quantile(rmse,0.9) ) #trucate the data
pred.rtn$gvkey=as.character(pred.rtn$gvkey)
#truncate outliers in rmse
ggplot(data=pred.rtn,
aes(x=mkt.cap,y=rmse))+
geom_point()+
stat_smooth(method = 'nls', formula = y ~ a*exp(b *x), aes(colour = 'Exponential Fit'), se = FALSE, method.args =list(start=c(a=1,b=1)))+
facet_wrap(~ fyear)+
theme_classic()