library(tidyverse)
library(lubridate)
library(kableExtra)
library(knitr)

Sentiment and Equity Markets

In this section, we evaluate how our sentiment index compares to a broad US equity index (the Russell 1000 Index). We examine the fluctuations of sentiment relative to the equity market in two ways: a visual analysis of the normalized levels of both variables, and a linear regression on the time series data. To accomplish this, we first merge 3 data sets aligned by the 102 FOMC meeting dates. To normalize the variables, we calculate Z-scores of both over the sample period. Lastly, we perform both analyses using the Z-score data.

# First load all 3 files into data frames.
# ------------------------------------------------------
mgData<-readRDS(file = "fomc_merged_data_v2.rds")
sData <- readRDS( file = "../DATA/SentimentDF.rds")
file_fred_ru1000tr = "../DATA/FRED_RU1000TR.csv"

ru1000tr = read_csv(file_fred_ru1000tr, 
                    col_types = cols(DATE=col_character(), 
                                     RU1000TR = col_double() ) )

# Generate a lubridate date column to join with the FOMC data.
# -----------------------------------------------------------------
ru1000tr %>% mutate( date_mdy = lubridate::ymd( DATE ) )-> ruData

 #z_ru_daily = (RU1000TR - mean(RU1000TR, na.rm=TRUE))/sd(RU1000TR, na.rm = TRUE )

#  Second, join the data:
#  Since this is a 2-way inner join, we start with the FOMC statement data
#  and join it to the sentiment data by date string (yyyymmdd)
# -------------------------------------------------------------------------
mgData %>% inner_join(sData, by = c( "statement.dates" = "FOMC_Date")) -> msData

#  Join the sentiment-FOMC data to the Russell 1000 Index data from FRED
#  Make sure to add a Z-score for each of the time series: sentiment and Russell index values
#     Save the raw data and normalized data by FOMC date.
# ----------------------------------------------------------------------------------
msEQdata = msData %>% left_join(ruData, by = c("date_mdy" = "date_mdy") ) %>% 
                    select( date_mdy, Sentiment_Score, RU1000TR ) %>%
                    mutate( z_ru_fomc = (RU1000TR - mean(RU1000TR, na.rm = TRUE) ) / sd( RU1000TR, na.rm=TRUE ) ,
                            z_sentiment = ( Sentiment_Score - mean( Sentiment_Score, na.rm = TRUE) ) / 
                              sd( Sentiment_Score, na.rm=TRUE) )
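
Because the equity data are attached with a left join, any FOMC date missing from the FRED file would silently produce an NA index level. A minimal base-R sketch of a pre-join check is shown here. It uses made-up dates and hypothetical names (`fomc_dates`, `equity_dates`, `unmatched`); the real check would presumably compare `msData$date_mdy` with `ruData$date_mdy`.

```r
# Toy dates only, not taken from the actual data files.
fomc_dates   <- as.Date(c("2007-01-31", "2007-03-21", "2007-05-09"))
equity_dates <- as.Date(c("2007-01-31", "2007-03-20", "2007-03-22", "2007-05-09"))

# FOMC dates with no matching equity observation.
# After a left join these rows would carry NA in the equity column.
unmatched <- fomc_dates[!(fomc_dates %in% equity_dates)]
print(unmatched)
```

An empty result would indicate that every FOMC date has an index level and that the `na.rm = TRUE` arguments in the Z-score calculation are only a safeguard.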

Data Transformation: Scale and Frequency Domain Issues

Let’s inspect the data for accuracy and scaling issues. Exploratory data analysis shows 3 issues.

  • Normalization to Z-score format is needed to ensure that scale is not a problem. Since Russell Index levels are expressed in the thousands, while sentiment is expressed in units of 0.01, scaling is essential along the y-dimension. To solve the scale problem, we convert the entire sample to Z-score equivalents, which brings both time series to the same mean and order of magnitude.

  • There is also a need to normalize in the frequency domain. FOMC meetings occur 8 times per year, so their sentiment levels and changes reflect nearly 2 months of news. Russell equity index levels are collected daily to ensure the data collection is complete. The volatility of lower-frequency data is much greater in absolute terms than the volatility of higher-frequency (daily) data. To address this, we calculate Z-scores using only the Russell equity index levels observed on the FOMC dates.

  • Lastly, Russell Index levels increase at a roughly geometric rate. Thus, values at the start of the sample period are smaller than values at the end of the period. The residuals in a regression of such data show a significant increase in volatility over the sample period. This is solved by applying a logarithmic transformation to Russell Index levels. This change fixes the non-constant residual volatility and also improves the model fit from roughly 36 to 39 percent adjusted R-squared.
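
The first two points can be illustrated with a small base-R sketch. The helper name `zscore` and the toy numbers are assumptions for illustration only, not part of the original analysis. The sketch shows that standardizing only the observations on the sparse dates is not the same as standardizing the daily series and then picking out those dates.

```r
# Hypothetical helper; the name zscore does not appear in the original code.
zscore <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# Toy "daily" series and a sparse index standing in for the FOMC dates.
daily    <- c(100, 102, 101, 105, 110, 108, 115, 120, 118, 125)
fomc_idx <- c(1, 5, 10)

z_subset_first <- zscore(daily[fomc_idx])  # subset, then standardize (the approach used here)
z_daily_first  <- zscore(daily)[fomc_idx]  # standardize all days, then subset

# Only the first version has mean 0 and standard deviation 1 on the sparse dates.
print(round(c(mean = mean(z_subset_first), sd = sd(z_subset_first)), 6))
print(round(c(mean = mean(z_daily_first),  sd = sd(z_daily_first)), 6))
```

Standardizing on the FOMC-date subset keeps the equity Z-scores on the same footing as the sentiment Z-scores, which are also computed over those dates only.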

The following code produces the log-transformed z-scores of FOMC periodic equity values.

msEQdata %>% mutate( logEquity = log(RU1000TR) ) %>%
             mutate( z_logEquity = ( logEquity - mean(logEquity, na.rm = TRUE) ) / sd( logEquity, na.rm = TRUE ) ) -> msEQdata
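
Why the log transformation helps with geometric growth can be seen in a toy example. The sketch below uses an invented index that grows 5 percent per period (not Russell data): period-to-period changes widen in levels but stay constant in logs.

```r
# Toy index growing 5% per period; the numbers are illustrative only.
level <- 1000 * 1.05^(0:20)

# Changes in levels grow over time...
print(range(diff(level)))

# ...while changes in log levels are constant (equal to log(1.05)).
print(range(diff(log(level))))
```

Real index data are noisy rather than perfectly geometric, so the log transformation would only be expected to reduce the growth in variability, not remove it entirely.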
  

msEQdata %>%  kable() %>% scroll_box(width="100%", height="200px")
date_mdy Sentiment_Score RU1000TR z_ru_fomc z_sentiment logEquity z_logEquity
2007-01-31 0.0263158 3497.78 -0.7129900 2.8720683 8.159884 -0.6445636
2007-03-21 -0.0285714 3505.86 -0.7089570 -0.7514790 8.162191 -0.6388734
2007-05-09 -0.0280374 3698.71 -0.6127007 -0.7162224 8.215739 -0.5068180
2007-06-28 -0.0090090 3682.96 -0.6205620 0.5399938 8.211472 -0.5173417
2007-08-07 -0.0307692 3610.26 -0.6568484 -0.8965736 8.191535 -0.5665083
2007-08-17 -0.0126582 3531.38 -0.6962194 0.2990795 8.169444 -0.6209871
2007-09-18 -0.0250000 3726.95 -0.5986054 -0.5157003 8.223346 -0.4880606
2007-10-31 -0.0370370 3816.32 -0.5539986 -1.3103620 8.247042 -0.4296229
2007-12-11 -0.0169492 3650.86 -0.6365839 0.0158010 8.202718 -0.5389300
2008-01-22 -0.0522876 3234.17 -0.8445644 -2.3171733 8.081528 -0.8377977
2008-01-30 -0.0259740 3353.44 -0.7850337 -0.5800036 8.117742 -0.7484895
2008-03-18 -0.0451977 3299.30 -0.8120564 -1.8491157 8.101466 -0.7886286
2008-04-30 -0.0310881 3452.16 -0.7357601 -0.9176236 8.146755 -0.6769394
2008-06-25 -0.0379747 3328.66 -0.7974020 -1.3722636 8.110325 -0.7667802
2008-08-05 -0.0454545 3219.69 -0.8517917 -1.8660695 8.077040 -0.8488637
2008-09-16 -0.0647482 3047.69 -0.9376413 -3.1397991 8.022139 -0.9842554
2008-10-08 -0.0490566 2454.63 -1.2336525 -2.1038704 7.805731 -1.5179389
2008-10-29 -0.0478723 2309.41 -1.3061355 -2.0256876 7.744747 -1.6683314
2008-12-16 -0.0466926 2277.87 -1.3218779 -1.9478039 7.730996 -1.7022435
2009-01-28 -0.0136519 2201.85 -1.3598214 0.2334807 7.697053 -1.7859500
2009-03-18 -0.0270270 2016.35 -1.4524092 -0.6495206 7.609044 -2.0029889
2009-04-29 -0.0447761 2234.29 -1.3436298 -1.8212812 7.711679 -1.7498818
2009-06-24 -0.0430622 2311.77 -1.3049576 -1.7081316 7.745769 -1.6658126
2009-08-12 -0.0270270 2597.64 -1.1622726 -0.6495206 7.862359 -1.3782903
2009-09-23 -0.0377358 2751.63 -1.0854123 -1.3564962 7.919949 -1.2362673
2009-11-04 -0.0307167 2711.23 -1.1055770 -0.8931072 7.905158 -1.2727435
2009-12-16 -0.0224719 2889.67 -1.0165131 -0.3488007 7.968898 -1.1155546
2010-01-27 -0.0146628 2868.68 -1.0269897 0.1667444 7.961607 -1.1335332
2010-03-16 -0.0071942 3049.00 -0.9369874 0.6598011 8.022569 -0.9831956
2010-04-28 -0.0154440 3143.24 -0.8899499 0.1151672 8.053009 -0.9081264
2010-06-23 -0.0084746 2886.76 -1.0179655 0.5752760 7.967890 -1.1180393
2010-08-10 -0.0273973 2963.05 -0.9798873 -0.6739627 7.993974 -1.0537126
2010-09-21 -0.0185185 3026.69 -0.9481229 -0.0878055 8.015225 -1.0013068
2010-11-03 -0.0193548 3195.65 -0.8637907 -0.1430177 8.069546 -0.8673460
2010-12-14 -0.0263158 3329.18 -0.7971425 -0.6025661 8.110481 -0.7663950
2011-01-26 -0.0144928 3485.32 -0.7192091 0.1779677 8.156315 -0.6533642
2011-03-15 0.0034965 3460.65 -0.7315225 1.3655834 8.149212 -0.6708819
2011-04-27 0.0034843 3674.91 -0.6245799 1.3647791 8.209284 -0.5227379
2011-06-22 -0.0350877 3499.60 -0.7120816 -1.1816718 8.160404 -0.6432807
2011-08-09 -0.0280374 3173.99 -0.8746018 -0.7162224 8.062745 -0.8841181
2011-09-21 -0.0298913 3167.68 -0.8777512 -0.8386146 8.060755 -0.8890257
2011-11-02 -0.0132013 3366.74 -0.7783954 0.2632256 8.121700 -0.7387281
2011-12-13 -0.0110294 3337.07 -0.7932044 0.4066108 8.112849 -0.7605573
2012-01-25 -0.0114504 3630.15 -0.6469208 0.3788192 8.197029 -0.5529591
2012-03-13 -0.0185185 3838.77 -0.5427932 -0.0878055 8.252907 -0.4151583
2012-04-25 0.0000000 3827.04 -0.5486480 1.1347511 8.249847 -0.4227054
2012-06-20 -0.0198675 3731.04 -0.5965640 -0.1768659 8.224442 -0.4853558
2012-08-01 -0.0140351 3781.35 -0.5714530 0.2081819 8.237836 -0.4523247
2012-09-13 -0.0028736 4040.51 -0.4420997 0.9450440 8.304126 -0.2888473
2012-10-24 -0.0086207 3908.01 -0.5082338 0.5656299 8.270784 -0.3710736
2012-12-12 -0.0160183 3988.68 -0.4679694 0.0772537 8.291216 -0.3206861
2013-01-30 -0.0218978 4215.41 -0.3548027 -0.3108997 8.346502 -0.1843440
2013-03-20 -0.0169903 4397.11 -0.2641116 0.0130851 8.388703 -0.0802730
2013-05-01 -0.0141844 4464.70 -0.2303757 0.1983248 8.403957 -0.0426539
2013-06-19 -0.0160920 4604.29 -0.1607028 0.0723916 8.434744 0.0332686
2013-07-31 -0.0227790 4789.46 -0.0682798 -0.3690770 8.474173 0.1305049
2013-09-18 -0.0181087 4931.06 0.0023963 -0.0607469 8.503309 0.2023580
2013-10-30 -0.0204918 5048.76 0.0611434 -0.2180779 8.526898 0.2605301
2013-12-18 -0.0255941 5198.37 0.1358175 -0.5549249 8.556100 0.3325462
2014-01-29 -0.0227704 5114.60 0.0940058 -0.3685063 8.539855 0.2924821
2014-03-19 -0.0232975 5393.54 0.2332318 -0.4033039 8.592957 0.4234388
2014-04-30 -0.0211946 5446.93 0.2598801 -0.2644755 8.602807 0.4477304
2014-06-18 -0.0116279 5680.51 0.3764658 0.3670993 8.644796 0.5512792
2014-07-30 -0.0092937 5719.90 0.3961263 0.5212004 8.651707 0.5683207
2014-09-17 -0.0069808 5831.98 0.4520683 0.6738921 8.671112 0.6161759
2014-10-29 -0.0043290 5772.43 0.4223454 0.8489586 8.660848 0.5908653
2014-12-17 -0.0064795 5873.73 0.4729068 0.7069883 8.678245 0.6337674
2015-01-28 -0.0053619 5872.69 0.4723877 0.7807669 8.678068 0.6333307
2015-03-18 -0.0106383 6193.92 0.6327217 0.4324314 8.731323 0.7646637
2015-04-29 -0.0214477 6219.01 0.6452448 -0.2811857 8.735366 0.7746331
2015-06-17 -0.0056180 6225.98 0.6487237 0.7638631 8.736486 0.7773955
2015-07-29 -0.0084746 6244.98 0.6582071 0.5752760 8.739533 0.7849099
2015-09-17 -0.0105541 5927.08 0.4995351 0.4379906 8.687287 0.6560654
2015-10-28 -0.0134771 6193.52 0.6325221 0.2450199 8.731259 0.7645045
2015-12-16 -0.0107527 6147.98 0.6097919 0.4248795 8.723879 0.7463046
2016-01-27 -0.0226629 5577.69 0.3251457 -0.3614088 8.626530 0.5062326
2016-03-16 0.0000000 6035.22 0.5535105 1.1347511 8.705368 0.7006540
2016-04-27 -0.0027397 6265.80 0.6685988 0.9538797 8.742862 0.7931179
2016-06-15 -0.0205882 6215.64 0.6435627 -0.2244441 8.734824 0.7732964
2016-07-27 -0.0084507 6512.26 0.7916133 0.5768520 8.781442 0.8882607
2016-09-21 -0.0053763 6531.17 0.8010517 0.7798153 8.784341 0.8954113
2016-11-02 -0.0054348 6335.69 0.7034827 0.7759573 8.753954 0.8204730
2016-12-14 -0.0117302 6840.08 0.9552365 0.3603458 8.830555 1.0093779
2017-02-01 -0.0123457 6941.77 1.0059926 0.3197134 8.845312 1.0457711
2017-03-15 -0.0117302 7274.32 1.1719767 0.3603458 8.892106 1.1611686
2017-05-03 -0.0235294 7298.39 1.1839906 -0.4186149 8.895409 1.1693152
2017-06-14 -0.0085470 7472.19 1.2707386 0.5704942 8.918943 1.2273533
2017-07-26 -0.0092593 7607.26 1.3381555 0.5234728 8.936858 1.2715333
2017-09-20 -0.0260870 7727.34 1.3980904 -0.5874590 8.952520 1.3101565
2017-11-01 -0.0212766 7954.93 1.5116864 -0.2698884 8.981547 1.3817404
2017-12-13 -0.0129870 8231.14 1.6495498 0.2773738 9.015680 1.4659149
2018-01-31 -0.0037037 8732.78 1.8999310 0.8902398 9.074839 1.6118075
2018-03-21 0.0069686 8440.61 1.7541016 1.5948072 9.040810 1.5278882
2018-05-02 0.0000000 8213.04 1.6405156 1.1347511 9.013478 1.4604861
2018-06-13 0.0000000 8687.40 1.8772807 1.1347511 9.069629 1.5989590
2018-08-01 0.0147059 8800.09 1.9335271 2.1056048 9.082517 1.6307427
2018-09-26 0.0104712 9118.25 2.0923288 1.8260396 9.118033 1.7183286
2018-11-08 0.0050761 8798.28 1.9326237 1.4698681 9.082311 1.6302354
2018-12-19 0.0000000 7877.65 1.4731140 1.1347511 8.971785 1.3576657
2019-01-30 0.0046729 8468.86 1.7682018 1.4432466 9.044151 1.5361282
2019-03-20 -0.0177778 8948.39 2.0075474 -0.0389032 9.099229 1.6719554
2019-05-01 -0.0144231 9277.94 2.1720341 0.1825676 9.135395 1.7611441

Charting the Time Series Alternatives

In this section, we show 3 time series charts that illustrate the alternative data transformations considered for the regression model.

The first chart below shows the raw sentiment compared to raw Russell equity levels. Scale issues are obvious since the sentiment values are compressed to the appearance of a slightly fuzzy flat line. The chart below shows scaling is essential.

ggplot() + 
  geom_line(data=msEQdata, aes(x=date_mdy, y=Sentiment_Score) , color = "red" ) +
  geom_line(data=msEQdata, aes(x=date_mdy, y=RU1000TR), color="green") +
  ggtitle("Sentiment vs. Russell 1000 Equity Level", subtitle="Not usable without fixes")

The second chart shows scaled sentiment versus scaled Russell equity levels. Scale issues remain because the right-hand side (the more recent years) shows higher variation than the left-hand side (the earliest years).

ggplot() + 
  geom_line(data=msEQdata, aes(x=date_mdy, y=z_sentiment) , color = "red" ) +
  geom_line(data=msEQdata, aes(x=date_mdy, y=z_ru_fomc), color="green") +
  ggtitle("Scaled Sentiment vs. Scaled Equity Index", subtitle = "Nearly There...")

Finally, the third chart shows the variables we will use in the regression analysis.

ggplot() + 
  geom_line(data=msEQdata, aes(x=date_mdy, y=z_sentiment) , color = "red" ) +
  geom_line(data=msEQdata, aes(x=date_mdy, y=z_logEquity), color="green") +
  ggtitle("Scaled-Sentiment vs. Scaled Log Equity Price", subtitle="What we will use")

Regressing Sentiment to Financial Variables

The final regression model we present uses the scaled, log-transformed data with an influential outlier removed (observation 1, Jan 2007). For a reason yet to be determined, Jan 2007 generates the highest sentiment of the entire observation period. This is arguably wrong, as the Sept 2018 period was possibly the most euphoric in recent memory. The model is fitted in the code chunk below.

mod1 = lm( z_logEquity ~ z_sentiment, data=msEQdata[2:102,])

summary(mod1)
## 
## Call:
## lm(formula = z_logEquity ~ z_sentiment, data = msEQdata[2:102, 
##     ])
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.96078 -0.59194  0.09201  0.58226  1.67231 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.02467    0.07894   0.313    0.755    
## z_sentiment  0.64312    0.08237   7.807 6.19e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.793 on 99 degrees of freedom
## Multiple R-squared:  0.3811, Adjusted R-squared:  0.3748 
## F-statistic: 60.96 on 1 and 99 DF,  p-value: 6.192e-12

The slope coefficient of mod1 is statistically significant, with a p-value of 6.19e-12. The adjusted R-squared of about 37 percent suggests the model has some explanatory power.
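
One way to read the slope: when both variables are standardized on the same sample that the model is fitted to, the slope of a simple regression equals the Pearson correlation, and R-squared equals its square. The sketch below checks this on simulated data (the variable names and the seed are arbitrary). In mod1 the relationship holds only approximately, because the Z-scores were computed over all 102 observations while the model is fitted to 101; the reported R-squared of 0.3811 corresponds to a correlation of about 0.62, close to the slope of 0.64.

```r
set.seed(1)  # arbitrary seed, for reproducibility only
x <- rnorm(50)
y <- 0.5 * x + rnorm(50)

# Standardize both variables on the sample used for fitting.
zx <- (x - mean(x)) / sd(x)
zy <- (y - mean(y)) / sd(y)

fit   <- lm(zy ~ zx)
slope <- unname(coef(fit)["zx"])

# Slope equals the correlation; R-squared equals the squared correlation.
print(c(slope = slope, r = cor(x, y), r_squared = summary(fit)$r.squared))
```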

Examining the diagnostic plots below shows:

  • The Q-Q plot and histogram of residuals show a reasonable approximation to normality.

  • The residuals have relatively homogeneous variance across the range of observations.

  • The residuals show little trend relative to the fitted values.

  • The leverage plot shows that the most influential outlier (observation 1) has been controlled for.

par(mfrow=c(3,2))
plot(mod1)
hist(mod1$residuals )
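
Influential observations can also be flagged numerically rather than by eye. The sketch below uses Cook's distance (`cooks.distance` from base R's stats package) on simulated data with one planted outlier. The data and names are hypothetical, and this is not necessarily how observation 1 was identified in the analysis above.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

# Thirty well-behaved points plus one planted outlier in position 1.
x <- c(10, seq(-2, 2, length.out = 30))
y <- c(-10, 0.6 * x[-1] + rnorm(30, sd = 0.5))

fit <- lm(y ~ x)
cd  <- cooks.distance(fit)

# Index of the observation with the largest influence on the fit.
most_influential <- unname(which.max(cd))
print(most_influential)
```

Applied to a model fitted on all 102 rows, the same call would show whether the Jan 2007 observation stands out before it is dropped.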

Finally, we present a scatterplot of the regression data overlaid with the fitted regression line to study the model fit.

ggplot(data=msEQdata[2:102,], aes(x=z_sentiment, y=z_logEquity) ) + 
   geom_point() + 
   geom_smooth(method=lm) +
   ggtitle("ScatterPlot of Fitted Regression Model", subtitle="X=Z-Sentiment, Y=Z-LogRussell 1000 (2007-2019)")

Discussion of Results

There are two comments related to the time series and regression we should make.

First, the time series of sentiment clearly shows a pattern characteristic of other financial variables through the 2007-2019 period. During Q4 2008, at the depths of the financial crisis, sentiment appears to be at a low. During H2 2009, when the financial markets had miraculously recovered, sentiment spikes upward. Other signs that sentiment is effective include the 2018 euphoria, when equity markets reached daily highs during the summer and fall. Moreover, sentiment in Q4 2018 and Q1 2019 declined in concert with the observed selloff of risk assets in the same period.

However, the sentiment index is imperfect. The 2013 taper tantrum is not reflected correctly from a bond investor's point of view. As we recall, on May 22, 2013, bond markets panicked when Bernanke told Congress that quantitative easing would likely be terminated at a future date. More investigation is needed to understand the market and FOMC dynamics around that historical episode, and we regard this as future work.

Second, the regression suggests that sentiment is positively associated with equity levels: positive sentiment is associated with higher Russell 1000 Index levels. We think this makes sense. Whether sentiment causes equity markets to move or vice versa is too complex to answer with the crude econometric analysis we have conducted. However, the trend and regression results suggest that a more detailed regression analysis of sentiment differences vs. equity returns (instead of levels), either contemporaneous or lagged, could reveal some predictive value in sentiment analysis. The project timeline did not allow for this more extensive regression analysis, but we view it as fertile ground for future research.
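
As a rough outline of what that future analysis might look like, the sketch below builds sentiment differences and log returns from simulated series and fits a contemporaneous and a one-period-lagged regression. All data and names here are hypothetical, and the series are random, so no relationship should be expected in the output; only the mechanics are shown.

```r
set.seed(7)  # arbitrary seed, for reproducibility only
n <- 40

# Simulated stand-ins for the sentiment score and the index level.
sentiment <- cumsum(rnorm(n, sd = 0.01))
price     <- 1000 * exp(cumsum(rnorm(n, sd = 0.03)))

d_sent <- diff(sentiment)    # change in sentiment between meetings
ret    <- diff(log(price))   # log return between meetings

# Contemporaneous: return over the same interval as the sentiment change.
fit_contemp <- lm(ret ~ d_sent)

# Lagged: return over the next interval on the previous sentiment change.
ret_next    <- ret[-1]
d_sent_prev <- d_sent[-length(d_sent)]
fit_lagged  <- lm(ret_next ~ d_sent_prev)

print(coef(fit_contemp))
print(coef(fit_lagged))
```

Working in differences and returns would also sidestep the shared upward trend in levels, which can inflate the apparent fit of a levels regression.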