Forecasting Predicted Returns

i. EXECUTIVE SUMMARY
ii. ANALYSIS
iii. PROCESS
    1.1 Methodology
    1.2 Data Cleaning
    1.3 Time Series Return Plots & Regression Outputs
    1.4 Mathematical Proof for Forecasting Predictions
    2. Expected Return v Forecast
    2.1 Model Description
    2.2 Plots : Historical v Prediction
iv. CODE

EXECUTIVE SUMMARY

This analysis shows that variables such as the dividend yield, term spread, and default spread can be used to make a reasonable prediction of stock returns. This can be seen in the plot of the estimated expected 12-month excess return obtained from the 12-month forecasting regression. Comparing the predicted returns with the actual returns, one can see that the two lines trend similarly. The regression results fit with Campbell-Shiller’s theory that if the market-to-book ratio is high today, either future expected return on equity is high or expected future returns are low, and vice versa if the market-to-book ratio is low today.

The fitted regressions suggest that periods with a high dividend yield tend to be followed by high excess returns. This could be because, during periods of high dividend yields, investors receive more dividends per $1 invested in stocks. Since dividends are part of investors' returns, higher dividends ultimately lead to higher forecasted returns.

Periods with a high term spread (i.e., a steep yield curve) likewise tend to be followed by high excess returns. Thus, when term spreads are large, investors foresee higher returns because the economy is in expansion. Conversely, when term spreads are small, investors foresee lower returns because the economy is in recession.

ANALYSIS

Note : Blue line is the actual data, and red line is the forecast data.

Periods with a high dividend yield tend to be followed by high excess returns (Beta 1 is +0.2231487 in the 1-month regression). This could be because, during periods of high dividend yields, investors receive more dividends per $1 invested in stocks. Since dividends are part of investors' returns, higher dividends ultimately lead to higher forecasted returns.

Periods with a high term spread (i.e., a steep yield curve) tend to be followed by high excess returns (Beta 2 is +3.2723950 in the 12-month regression). During expansions the term structure is usually steep, because investors foresee higher inflation and lower unemployment and therefore expect the Federal Reserve to tighten monetary policy (i.e., raise interest rates) to return the economy to its long-run dynamics. Conversely, during recessions the term structure is usually flat, because investors foresee lower inflation and higher unemployment and therefore expect the Federal Reserve to loosen monetary policy (i.e., lower interest rates). Thus, when term spreads are large, investors foresee higher returns because the economy is in expansion; when term spreads are small, they foresee lower returns because the economy is in recession.

Periods with a high default spread (i.e., the spread between BAA and AAA debt) tend to be followed by high excess returns (Beta 3 is +1.26416929 in the 12-month regression). When default risk is high (i.e., when the default spread is high), investors require higher excess returns in the market as compensation for the extra risk.

The risks for which D/P, the default spread, and the term spread are proxies are generally higher when times are bad and lower when times are good. The regression coefficients are all positive, so it makes sense that the expected excess return moves opposite to business conditions.

Compared to simply regressing returns on lagged variables, an AR(1) is more efficient (it has only two parameters), because we use monthly observations to estimate the model and then infer longer-horizon expectations from its dynamics. It also means that we have to assume a specific functional form, so there is a trade-off between efficiency and flexibility.
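The efficiency argument can be illustrated with a minimal sketch. It assumes the monthly series follows x_{t+1} = mu + phi (x_t - mu) + e_{t+1}, in which case the expected sum over the next h months has a closed form. The data are simulated, and `mu_true`, `phi_true`, and `ar1_cum_forecast` are hypothetical names chosen for this illustration only.

```r
set.seed(42)

# Simulated monthly series from an AR(1); the parameter values are illustrative only
mu_true <- 0.005
phi_true <- 0.9
n <- 600
x <- numeric(n)
x[1] <- mu_true
for (t in 2:n) {
  x[t] <- mu_true + phi_true * (x[t - 1] - mu_true) + rnorm(1, sd = 0.01)
}

# Estimate the two AR(1) parameters by OLS of x_t on x_{t-1}
fit <- lm(x[-1] ~ x[-n])
phi <- unname(coef(fit)[2])
mu  <- unname(coef(fit)[1]) / (1 - phi)

# Expected sum of the next h values implied by the AR(1) dynamics:
# sum_{j=1}^{h} [mu + phi^j (x_t - mu)]
ar1_cum_forecast <- function(x_t, mu, phi, h) {
  h * mu + (x_t - mu) * phi * (1 - phi^h) / (1 - phi)
}

# Every horizon is inferred from the same two monthly parameters
horizons <- c(1, 3, 12, 24, 60)
forecasts <- sapply(horizons, function(h) ar1_cum_forecast(x[n], mu, phi, h))
names(forecasts) <- paste0(horizons, "m")
forecasts
```

The direct regressions, by contrast, estimate a separate set of coefficients for each horizon, which is more flexible but uses the data less efficiently.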

PROCESS

1.1 METHODOLOGY

Using CRSP data on the S&P 500, we downloaded monthly market returns ex- and cum-dividends, as well as the monthly T-bill rate, from 1963 through 2019. We created the market dividend yield by summing the dividends over the last 12 months and dividing by the current price, with dividends and prices extracted from the ex- and cum-dividend returns. We then constructed log excess returns by subtracting the log of the 1-month gross T-bill rate from the log of the 1-month gross cum-dividend return.
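The dividend-yield construction described above can be sketched in vectorized form. This is a minimal illustration on simulated inputs rather than the CRSP data. The function name `div_yield` and the toy numbers are hypothetical, and normalizing the first price to 1 is an assumption of the sketch.

```r
set.seed(1)

# Toy monthly inputs (hypothetical numbers, not CRSP data)
n      <- 36
vwretx <- rnorm(n, mean = 0.005, sd = 0.04)   # ex-dividend return
vwretd <- vwretx + runif(n, 0.001, 0.003)     # cum-dividend return

div_yield <- function(vwretd, vwretx, window = 12) {
  n  <- length(vwretx)
  # Price index with the first value normalized to 1 (assumption)
  Pt <- c(1, cumprod(1 + vwretx[-1]))
  # Dividend paid in month t: (cum return - ex return) * previous price
  Dt <- (vwretd - vwretx) * c(1, Pt[-n])
  # Trailing sum of dividends; the first (window - 1) entries are NA
  DA <- as.numeric(stats::filter(Dt, rep(1, window), sides = 1))
  DA / Pt
}

dp <- div_yield(vwretd, vwretx)
head(dp, 14)
```

`stats::filter` is called with its namespace because `dplyr::filter` would otherwise mask it once dplyr is attached.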

From the St. Louis Federal Reserve Economic Data (FRED) website, we downloaded data on the term and default spreads for the same sample. The monthly term spread is the “10-Year Treasury Constant Maturity Minus Federal Funds Rate” series. The monthly default spread is Moody’s BAA corporate bond rate less Moody’s AAA corporate bond rate, obtained by subtracting “Moody’s Seasoned AAA Corporate Bond Minus Federal Funds Rate” from “Moody’s Seasoned Baa Corporate Bond Minus Federal Funds Rate”, so that the federal funds rate cancels.
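When a spread series arrives at a higher frequency than monthly, it has to be filled and averaged before it can be matched to the monthly returns. The sketch below shows one way to do this in base R on made-up observations. The `locf` helper is a hypothetical stand-in for `zoo::na.locf`, and the numbers are not FRED data.

```r
# Hypothetical daily spread observations in percent; NA marks a missing day
daily <- data.frame(
  date   = as.Date("2019-01-29") + 0:5,
  T10YFF = c(0.30, NA, 0.34, 0.28, NA, 0.26)
)

# Carry the last observation forward (base-R stand-in for zoo::na.locf)
locf <- function(x) {
  idx <- cummax(seq_along(x) * !is.na(x))  # position of the last non-missing value
  idx[idx == 0] <- NA                      # leading NAs stay NA
  x[idx]
}
daily$T10YFF <- locf(daily$T10YFF)

# Average within each calendar month, then convert percent to decimal
daily$month <- format(daily$date, "%Y-%m")
monthly <- aggregate(T10YFF ~ month, data = daily, FUN = mean)
monthly$ret <- monthly$T10YFF / 100
monthly
```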

1.2 DATA CLEANING

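library(readxl)  # read_excel
library(dplyr)   # mutate, group_by, summarize, %>%
library(zoo)     # na.locf

return <- read_excel("data.xlsx", sheet = "Return")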
return = data.frame(return)
colnames(return) = c("date","vwretd","vwretx")
return <- return %>%
          mutate(date = as.Date(as.character(date),format = "%Y%m%d"),DP = vwretd - vwretx,Pt = NA,Dt = NA)
return$Pt[1] = 1  # assuming P_0 = 1
return$Dt[1] = (return$DP[1]) * 1  #Assuming P_-1 = 1
for (i in 2 : nrow(return)){
  return$Pt[i] = (1 + return$vwretx[i]) * return$Pt[i-1]
  return$Dt[i] = (return$DP[i]) * return$Pt[i-1]
}

DA = c()
DA = rep(NA,11)
cumsum_12 = 0
for(i in 1:12){
  cumsum_12 = cumsum_12 + return$Dt[i]
}
DA[12] = cumsum_12
for(i in 13 : nrow(return)){
  cumsum_12 = cumsum_12 - return$Dt[i-12] + return$Dt[i]
  DA[i] = cumsum_12
}
return = return %>%
         mutate(DA = DA,
                market_dividend = DA/Pt)  #dividend yield = DA/Pt

T_Note = read_excel("data.xlsx",sheet = "TNote")
T_Note = data.frame(T_Note)
colnames(T_Note) = c("date","t30")
T_Note = T_Note %>%
         mutate(date = as.Date(as.character(date), format = "%Y%m%d"),log_return = log(1 + t30))
return = return %>%
         mutate(t_log_return = T_Note$log_return,
                excess_return = log(1 + vwretd) - t_log_return)

term_spread = read_excel("spreads.xlsx",sheet = "T10YFF")
term_spread = data.frame(term_spread)
colnames(term_spread) = c("date", "T10YFF")

term_spread <- term_spread %>%
               mutate(date = as.Date(date, format = "%Y-%m-%d"),
                      # non-numeric entries become NA on coercion; carry the previous value forward
                      T10YFF = na.locf(suppressWarnings(as.numeric(as.character(T10YFF))), na.rm = FALSE))
term_spred_m <- term_spread %>%
                 group_by(cut(date, breaks = "1 month")) %>%
                 summarize(T10YFF = mean(T10YFF))%>%
                 mutate(ret = T10YFF/100)  
a = as.numeric(term_spred_m$ret)
AAA = read_excel("spreads.xlsx", sheet = "AAAFFM")
AAA = data.frame(AAA)
colnames(AAA) = c("date","AAAFFM")
AAA = AAA %>%
      mutate(date = as.Date(date, format = "%Y-%m-%d"),
             AAAFFM = AAAFFM / 100)

BAA = read_excel("spreads.xlsx", sheet = "BAAFFM")
BAA = data.frame(BAA)
colnames(BAA) = c("date","BAAFFM")
BAA = BAA %>%
  mutate(date = as.Date(date, format = "%Y-%m-%d"),
         BAAFFM = BAAFFM / 100,
         spread = BAAFFM - AAA$AAAFFM)

1.3 Time Series Plots & Regression

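Prediction Regression (Newey-West standard errors)
horizon	intercept	SE	dividend yield	SE	term spread	SE	default spread	SE	R^2
1m	-0.0057179	0.0058015	0.2231487	0.1928545	0.3016158	0.1141633	0.0367866	0.6953044	0.0138658
3m	-0.0174720	0.0194749	0.6458052	0.6082086	0.8380644	0.3547351	0.2531925	1.7792268	0.0360835
12m	-0.0769157	0.1051819	0.2231487	0.1928545	3.2723950	1.7645725	1.2641693	6.1390901	0.1305369
24m	-0.0880871	0.2256513	4.1771807	5.7128192	4.6949814	2.7553200	1.3685092	8.9830961	0.1387652
60m	-0.1630004	0.2274923	6.4602075	5.1060864	0.3016158	0.1141633	12.7504726	14.5257271	0.2601113

Note: the 12m dividend-yield entries and the 60m term-spread entries are identical to the 1m values (0.2231487 with SE 0.1928545, and 0.3016158 with SE 0.1141633). They are carried over from the 1m regression and should not be read as estimates for those horizons.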
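To judge how precisely a coefficient is estimated, each estimate can be divided by its standard error. The sketch below does this for the 1m row of the table. The two-sided p-values use a normal approximation, which is an assumption of this sketch and not something the regressions report.

```r
# 1m row of the prediction-regression table: estimates and standard errors
est <- c(intercept = -0.0057179, dividend = 0.2231487,
         term_spread = 0.3016158, default_spread = 0.0367866)
se  <- c(intercept = 0.0058015, dividend = 0.1928545,
         term_spread = 0.1141633, default_spread = 0.6953044)

# t-statistic = estimate / standard error
t_stat <- est / se

# Two-sided p-values under a normal approximation (assumption of this sketch)
p_val <- 2 * pnorm(-abs(t_stat))

round(cbind(estimate = est, se = se, t = t_stat, p = p_val), 4)
```

The same two lines of arithmetic apply to any other row of the table.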

1.4 Mathematical Proof for Forecasting Predictions

The above regression outputs fit with Campbell-Shiller’s Theory that if the market-to-book ratio is high today, either future expected return on equity is high or expected future returns are low. Vice versa if the market-to-book ratio is low today. Below is the proof of this theory.

A log-linear present-value formula can be derived from the Campbell-Shiller return decomposition ($r_{t+1} \approx \kappa_0 + \rho \, pd_{t+1} - pd_t + \Delta d_{t+1}$). While this decomposition is very accurate, it requires a stationary pd-ratio and positive dividends. Many firms do not pay dividends, which means this formula is not always useful at the firm level.

However, Ohlson (Contemporary Accounting Research, 1995) and Vuolteenaho (Journal of Finance, 2002) derive a return decomposition that uses market-to-book ratios and return on equity instead of the pd-ratio and dividends. In particular, the alternative return decomposition is: \[r_{t+1} \approx \kappa \, mb_{t+1} - mb_t + roe_{t+1}\] where $\kappa = 0.97$ with annual data, $mb_t = \ln(M_t/B_t)$ and $roe_t = \ln(1+ROE_t)$. Here, $M_t$ is the market value of equity, $B_t$ is the book value of equity, and $ROE_t = \frac{E_t}{B_{t-1}}$ is return on equity, where $E_t$ is earnings. With this decomposition we have:

\[mb_t \approx \mathbb{E}_t\Bigg(\sum_{j=1}^\infty \kappa^{j-1} roe_{t+j}\Bigg) - \mathbb{E}_t\Bigg(\sum_{j=1}^\infty \kappa^{j-1} r_{t+j}\Bigg)\] where $\mathbb{E}_t(.)$ is the conditional expectations operator.
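The step from the one-period decomposition to the present-value relation is a forward iteration. The sketch below assumes that the coefficient $\kappa$ multiplies $mb_{t+1}$ in the decomposition and that $\kappa^{J} mb_{t+J} \to 0$ as $J \to \infty$ (a no-bubble condition). Both are assumptions of this sketch.

```latex
\begin{align*}
mb_t &\approx \kappa \, mb_{t+1} + roe_{t+1} - r_{t+1} \\
     &\approx \kappa \left( \kappa \, mb_{t+2} + roe_{t+2} - r_{t+2} \right) + roe_{t+1} - r_{t+1} \\
     &\approx \sum_{j=1}^{J} \kappa^{j-1} \left( roe_{t+j} - r_{t+j} \right) + \kappa^{J} mb_{t+J}
\end{align*}
```

Letting $J \to \infty$ removes the last term, and taking conditional expectations at time $t$ of both sides gives the expression for $mb_t$ above, since $mb_t$ is known at time $t$. A high $mb_t$ must therefore be matched by high expected future $roe$, low expected future returns, or both.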

EXPECTED RETURNS V FORECAST

2.1 Model Description

An AR(1) model is more parsimonious, so fewer parameters have to be estimated. Also, by using an AR(1) model, we inherently predict excess returns at long horizons from lagged excess returns, which could be useful when the excess return series is highly autocorrelated. In this case, for long horizons, many of the returns overlap when we run the OLS regressions. Thus, to account for the overlap in observations, we can consider an AR(1) model that takes advantage of the high autocorrelation in the series.
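The effect of overlapping observations can be seen in a small simulation. The sketch below draws i.i.d. monthly returns (an assumption made only to isolate the overlap effect) and compares the lag-1 autocorrelation of the monthly series with that of the overlapping 12-month sums. All numbers are simulated.

```r
set.seed(7)

n <- 5000
k <- 12
r <- rnorm(n, mean = 0.005, sd = 0.04)  # simulated i.i.d. monthly excess returns

# Overlapping k-month sums: entry t adds r[t - k + 1], ..., r[t]
sum_k <- as.numeric(stats::filter(r, rep(1, k), sides = 1))
sum_k <- sum_k[!is.na(sum_k)]

# Lag-1 sample autocorrelations
acf_1m  <- acf(r, lag.max = 1, plot = FALSE)$acf[2]      # near 0 for i.i.d. data
acf_12m <- acf(sum_k, lag.max = 1, plot = FALSE)$acf[2]  # near (k - 1) / k

c(monthly = acf_1m, overlapping_12m = acf_12m)
```

Consecutive 12-month sums share 11 of their 12 terms, so they are strongly autocorrelated even when the underlying monthly returns are not. This is why the long-horizon regressions need autocorrelation-robust standard errors.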

In addition to this, there may be other predictor variables (other than the lagged dividend yield, term spread, and default spread) outside the model that we are not accounting for. Lags of excess returns could be such a variable that is left unaccounted for in the model that may be useful.

2.2 Plots : Historical v Prediction

The plot compares the realized 12-month excess return with the estimated expected 12-month excess return obtained from the 12-month forecasting regression.

CODE

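library(readxl)    # read_excel
library(dplyr)     # mutate, group_by, summarize, %>%
library(zoo)       # na.locf
library(lmtest)    # coeftest, bptest
library(sandwich)  # NeweyWest

return <- read_excel("data.xlsx", sheet = "Return")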
return = data.frame(return)
colnames(return) = c("date","vwretd","vwretx")
return <- return %>%
          mutate(date = as.Date(as.character(date),format = "%Y%m%d"),DP = vwretd - vwretx,Pt = NA,Dt = NA)
return$Pt[1] = 1  # assuming P_0 = 1
return$Dt[1] = (return$DP[1]) * 1  #Assuming P_-1 = 1
for (i in 2 : nrow(return)){
  return$Pt[i] = (1 + return$vwretx[i]) * return$Pt[i-1]
  return$Dt[i] = (return$DP[i]) * return$Pt[i-1]
}

DA = c()
DA = rep(NA,11)
cumsum_12 = 0
for(i in 1:12){
  cumsum_12 = cumsum_12 + return$Dt[i]
}
DA[12] = cumsum_12
for(i in 13 : nrow(return)){
  cumsum_12 = cumsum_12 - return$Dt[i-12] + return$Dt[i]
  DA[i] = cumsum_12
}
return = return %>%
         mutate(DA = DA,
                market_dividend = DA/Pt)  #dividend yield = DA/Pt

T_Note = read_excel("data.xlsx",sheet = "TNote")
T_Note = data.frame(T_Note)
colnames(T_Note) = c("date","t30")
T_Note = T_Note %>%
         mutate(date = as.Date(as.character(date), format = "%Y%m%d"),log_return = log(1 + t30))
return = return %>%
         mutate(t_log_return = T_Note$log_return,
                excess_return = log(1 + vwretd) - t_log_return)

term_spread = read_excel("spreads.xlsx",sheet = "T10YFF")
term_spread = data.frame(term_spread)
colnames(term_spread) = c("date", "T10YFF")

term_spread <- term_spread %>%
               mutate(date = as.Date(date, format = "%Y-%m-%d"),
                      # non-numeric entries become NA on coercion; carry the previous value forward
                      T10YFF = na.locf(suppressWarnings(as.numeric(as.character(T10YFF))), na.rm = FALSE))

term_spred_m <- term_spread %>%
                 group_by(cut(date, breaks = "1 month")) %>%
                 summarize(T10YFF = mean(T10YFF))%>%
                 mutate(ret = T10YFF/100)  
a = as.numeric(term_spred_m$ret)
AAA = read_excel("spreads.xlsx", sheet = "AAAFFM")
AAA = data.frame(AAA)
colnames(AAA) = c("date","AAAFFM")
AAA = AAA %>%
      mutate(date = as.Date(date, format = "%Y-%m-%d"),
             AAAFFM = AAAFFM / 100)

BAA = read_excel("spreads.xlsx", sheet = "BAAFFM")
BAA = data.frame(BAA)
colnames(BAA) = c("date","BAAFFM")
BAA = BAA %>%
  mutate(date = as.Date(date, format = "%Y-%m-%d"),
         BAAFFM = BAAFFM / 100,
         spread = BAAFFM - AAA$AAAFFM)

#plot(x = return$date[12:nrow(return)], y = return$market_dividend[12:nrow(return)],type = "l", main = "Market Dividend yield over time", xlab = "Time", ylab = "Market Dividend Yield", col = "blue")
#plot(x = return$date, y = return$excess_return, main = "Excess Return Over time", xlab = "Time", ylab = "Excess Return(log)", type = "l", col = "blue")
#plot(x = AAA$date, y = term_spred_m$ret, main = "Term Spread Over Time", col = "blue", type = "l", xlab = "Time", ylab = "Term Spread")
#plot(x = BAA$date, y = BAA$spread, main = "Default Spread Over Time", col = "blue", type = "l", xlab = "Time", ylab = "Default Spread")


sum_3 = c()
sum_3 = rep(NA,2)
temp = 0   #cum sum 3 periods of r
for(i in 1:3){  # first 3 months can't use
  temp = temp + return$excess_return[i] 
}
sum_3[3] = temp
for(i in 4 : nrow(return)){
  temp = temp - return$excess_return[i-3] + return$excess_return[i]  #sum r_t = sum r_(t-1) + r_t - r_(t-3)
  sum_3[i] = temp
} 

#12 month
sum_12 = c()
sum_12 = rep(NA,11)
temp = 0   #cum sum 12 periods of r
for(i in 1:12){  # first 12 months can't use
  temp = temp + return$excess_return[i] 
}
sum_12[12] = temp
for(i in 13 : nrow(return)){
  temp = temp - return$excess_return[i-12] + return$excess_return[i]  #sum r_t = sum r_(t-1) + r_t - r_(t-12)
  sum_12[i] = temp
} ##sum_12[i] = cumsum of past 12 months r

#24 month
sum_24 = c()
sum_24 = rep(NA,23)
temp = 0   #cum sum 24 periods of r
for(i in 1:24){  # first 24 months can't use
  temp = temp + return$excess_return[i] 
}
sum_24[24] = temp
for(i in 25 : nrow(return)){
  temp = temp - return$excess_return[i-24] + return$excess_return[i]  #sum r_t = sum r_(t-1) + r_t - r_(t-24)
  sum_24[i] = temp
}

#60 month
sum_60 = c()
sum_60 = rep(NA,59)
temp = 0   
for(i in 1:60){  
  temp = temp + return$excess_return[i] 
}
sum_60[60] = temp
for(i in 61 : nrow(return)){
  temp = temp - return$excess_return[i-60] + return$excess_return[i]  
  sum_60[i] = temp}

return = return %>%
         mutate(sum_3 = sum_3,
                sum_12 = sum_12,
                sum_24 = sum_24,
                sum_60 = sum_60)



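# back() is not defined anywhere in this listing. The definition below is an assumed
# reconstruction that lags x by k periods, matching the regression comments
# (r_t+1 ~ dividend_t + ...): the first k entries are NA and the last k are dropped.
back = function(x, k) c(rep(NA, k), head(x, -k))

dividend_yield_new = return$market_dividend[12:nrow(return)] #starts from not NA dividend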
excess_return_new = return$excess_return[12:nrow(return)]
term_spread_new = term_spred_m$ret[12:nrow(term_spred_m)]
default_spread_new = BAA$spread[12:nrow(BAA)]
out = lm(return$excess_return[12:nrow(return)] ~ back(dividend_yield_new,1) + back(term_spread_new,1) + back(default_spread_new,1)) #r_t+1~ dividend_t + term spread_t + default spread_t
out_3 = lm(return$sum_3[12:nrow(return)] ~ back(dividend_yield_new,3) + back(term_spread_new,3) + back(default_spread_new,3))
out_12 = lm(return$sum_12[12:nrow(return)] ~ back(dividend_yield_new,12) + back(term_spread_new,12) + back(default_spread_new,12)) #sum back 12m return ~ dividend 12m back + term spread 12m back + default 12m back  
out_24 = lm(return$sum_24[24:nrow(return)]~ back(return$market_dividend[24:nrow(return)],24) + back(term_spred_m$ret[24:nrow(term_spred_m)],24) + back(BAA$spread[24:nrow(BAA)],24))
out_60 = lm(return$sum_60[60:nrow(return)]~back(return$market_dividend[60:nrow(return)],60) + back(term_spred_m$ret[60:nrow(term_spred_m)],60) + back(BAA$spread[60:nrow(BAA)],60))

##Test heteroskedasticity(p small)
#lmtest::bptest(out)  
#lmtest::bptest(out_3) 
#lmtest::bptest(out_12) 
#lmtest::bptest(out_24)  
#lmtest::bptest(out_60) 

#Change to Newey-West standard errors
month_1_summary = summary(out)
month_1_summary$coefficients <- unclass(coeftest(out, vcov. = NeweyWest))
month_3_summary = summary(out_3)
month_3_summary$coefficients <- unclass(coeftest(out_3, vcov. = NeweyWest))
month_12_summary = summary(out_12)
month_12_summary$coefficients <- unclass(coeftest(out_12, vcov. = NeweyWest))
month_24_summary = summary(out_24)
month_24_summary$coefficients <- unclass(coeftest(out_24, vcov. = NeweyWest))
month_60_summary = summary(out_60)
month_60_summary$coefficients <- unclass(coeftest(out_60, vcov. = NeweyWest))

# One row per horizon: (estimate, Newey-West SE) for each of the 4 coefficients, then R^2.
# Building every row from its own summary avoids mixing horizons within a row.
row_of = function(s) c(t(s$coefficients[1:4, 1:2]), s$r.squared)
predict_regression = rbind(row_of(month_1_summary),
                           row_of(month_3_summary),
                           row_of(month_12_summary),
                           row_of(month_24_summary),
                           row_of(month_60_summary))
rownames(predict_regression) = c("1m","3m","12m","24m","60m")
colnames(predict_regression) = c("intercept", "std", "dividend","std","term spread","std", "default spread","std", "R^2")


return = return %>%
         mutate(forecast = NA)
for(i in 12:nrow(return)){
  return$forecast[i] = out_12$coefficients[1] + out_12$coefficients[2] * return$market_dividend[i] + out_12$coefficients[3] * term_spred_m$ret[i] + out_12$coefficients[4] * BAA$spread[i]
}
                                                                                                                                                
#par(mfrow =c(1,1))
#plot(x = return$date[24:nrow(return)], y = return$excess_return[24 : nrow(return)], type = "l", col = "blue", xlab = "Time", ylab = "Excess return", main = "Monthly Historical Excess Return Over Time")
#plot(x = return$date[24:nrow(return)], y = return$sum_12[24 : nrow(return)], type = "l", col = "blue", xlab = "Time", ylab = "12 month Excess and Forecasted Excess return", main = "12 Month Historical and Predicted Excess Return Over Time")
#lines(x = return$date[24:nrow(return)], y = return$forecast[24 : nrow(return)], col = "red", type = "l", xlab = "Time", ylab = "Forecasted Excess return", main = "Predicted Excess Return Over Time")