DS607_Validate_FOMC

library(tidyverse)

## ── Attaching packages ──────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──

## ✔ ggplot2 3.1.0       ✔ purrr   0.3.0  
## ✔ tibble  2.0.1       ✔ dplyr   0.8.0.1
## ✔ tidyr   0.8.2       ✔ stringr 1.4.0  
## ✔ readr   1.3.1       ✔ forcats 0.4.0

## ── Conflicts ─────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

library(stringr)
library(kableExtra)

## 
## Attaching package: 'kableExtra'

## The following object is masked from 'package:dplyr':
## 
##     group_rows

library(knitr)

Overview

This markdown reviews the validity of the FOMC data frame created by Jagdish. We assume the fomc_data.rds file is located in the working directory.

d2<-readRDS(file = "fomc_data.rds")

The FOMC dates are out of order and should be arranged in ascending order. Because the dates are YYYYMMDD strings, alphabetical order is also chronological order. Word counts should be tabulated using str_count.

d2 %>% 
  mutate_if( is.factor, as.character) %>%
  arrange(statement.dates) %>%
  mutate( numwords = str_count(statement.content, "\\S+")) %>%
  select( statement.dates, numwords ) %>%
  filter(numwords > 700 ) %>% arrange(numwords)

##    statement.dates numwords
## 1         20121212      702
## 2         20130619      713
## 3         20130731      716
## 4         20141029      731
## 5         20141217      748
## 6         20131030      792
## 7         20130918      814
## 8         20140618      822
## 9         20140430      835
## 10        20140129      856
## 11        20140730      865
## 12        20131218      893
## 13        20140319      901
## 14        20140917      919
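As a minimal sketch of the two ideas used above, the example below counts words as runs of non-whitespace characters and sorts YYYYMMDD strings. The sample text and the names txt, counts and dates are illustrative assumptions, not taken from fomc_data.rds.

```r
library(stringr)

# Illustrative text only; not taken from the FOMC data
txt <- c("The Committee decided\nto keep its target rate.",
         "  Two   words ")

# "\\S+" matches each maximal run of non-whitespace characters,
# so spaces, tabs and newlines all act as word separators
counts <- str_count(txt, "\\S+")

# YYYYMMDD strings sort chronologically when sorted as plain text
dates <- c("20190501", "20070810", "20070131")
sorted_dates <- sort(dates)

print(counts)
print(sorted_dates)
```

Punctuation attached to a word (such as "rate.") is counted as part of that word, so this is a whitespace-delimited token count rather than a linguistic word count.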

Revised DataFrame

d2 %>% mutate_if( is.factor, as.character) %>%  # needed to change dates from factors to string
    arrange(statement.dates) %>% 
    mutate( numwords = str_count(statement.content, "\\S+")) -> d3

d3 %>% select( statement.dates, numwords) %>% 
    kable() %>% 
    kable_styling() %>% 
    scroll_box(width="85%", height="200px")

statement.dates	numwords
20070131	189
20070321	175
20070509	176
20070618	187
20070807	215
20070810	92
20070817	138
20070918	267
20071031	313
20071211	289
20080122	263
20080130	261
20080311	415
20080318	299
20080430	317
20080625	259
20080805	254
20080916	232
20081008	439
20081029	297
20081216	411
20090128	485
20090318	432
20090429	437
20090624	349
20090812	434
20090923	437
20091104	475
20091216	584
20100127	563
20100316	458
20100428	422
20100509	290
20100623	372
20100810	471
20100921	440
20101103	506
20101214	505
20110126	450
20110315	476
20110427	477
20110622	469
20110809	537
20110921	606
20111102	501
20111213	445
20120125	434
20120313	445
20120425	454
20120620	501
20120801	473
20120913	571
20121024	564
20121212	702
20130130	661
20130320	659
20130501	685
20130619	713
20130731	716
20130918	814
20131030	792
20131218	893
20140129	856
20140319	901
20140430	835
20140618	822
20140730	865
20140917	919
20141029	731
20141217	748
20150128	583
20150318	598
20150429	574
20150617	559
20150729	553
20150917	602
20151028	593
20151216	606
20160127	573
20160316	586
20160427	591
20160615	555
20160727	580
20160921	618
20161102	612
20161214	550
20170201	515
20170315	536
20170503	532
20170614	566
20170726	521
20170920	549
20171101	527
20171213	501
20180131	436
20180321	462
20180502	436
20180613	336
20180801	324
20180926	306
20181108	319
20181219	346
20190130	344
20190320	360
20190501	336

dim(d3)

## [1] 105   4

We need to make 4 changes to the rows of the data frame.

First, remove the 3 dates that correspond to extraordinary FOMC meetings related to swap lines, TALF and other measures. These meetings did not have the normal rate-setting objectives. The dates to remove from the data frame are:

20070810, 20080311 and 20100509.

Lastly, one of the dates is misnamed in the HTML file link as 20070618, but the actual statement was released on 20070628. This is a typo in the URL: the statement itself carries the date June 28, 2007, which confirms that the URL is misnamed.

d3 %>% 
  filter( statement.dates != "20070810") %>%
  filter( statement.dates != "20080311") %>%
  filter( statement.dates != "20100509")  -> d4


d4[d4$statement.dates == "20070618", "statement.dates"] <- "20070628"

d4 %>% select(statement.dates, numwords) %>%
  kable() %>%
  kable_styling() %>%
  scroll_box(width="100%", height="250px")

statement.dates	numwords
20070131	189
20070321	175
20070509	176
20070628	187
20070807	215
20070817	138
20070918	267
20071031	313
20071211	289
20080122	263
20080130	261
20080318	299
20080430	317
20080625	259
20080805	254
20080916	232
20081008	439
20081029	297
20081216	411
20090128	485
20090318	432
20090429	437
20090624	349
20090812	434
20090923	437
20091104	475
20091216	584
20100127	563
20100316	458
20100428	422
20100623	372
20100810	471
20100921	440
20101103	506
20101214	505
20110126	450
20110315	476
20110427	477
20110622	469
20110809	537
20110921	606
20111102	501
20111213	445
20120125	434
20120313	445
20120425	454
20120620	501
20120801	473
20120913	571
20121024	564
20121212	702
20130130	661
20130320	659
20130501	685
20130619	713
20130731	716
20130918	814
20131030	792
20131218	893
20140129	856
20140319	901
20140430	835
20140618	822
20140730	865
20140917	919
20141029	731
20141217	748
20150128	583
20150318	598
20150429	574
20150617	559
20150729	553
20150917	602
20151028	593
20151216	606
20160127	573
20160316	586
20160427	591
20160615	555
20160727	580
20160921	618
20161102	612
20161214	550
20170201	515
20170315	536
20170503	532
20170614	566
20170726	521
20170920	549
20171101	527
20171213	501
20180131	436
20180321	462
20180502	436
20180613	336
20180801	324
20180926	306
20181108	319
20181219	346
20190130	344
20190320	360
20190501	336

dim(d4)

## [1] 102   4
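The three removals and the date correction can also be written more compactly. The sketch below is one possible alternative, not the code used to build d4. It runs on a small toy data frame whose values are copied from the table above; toy, drop_dates and fixed are illustrative names.

```r
library(dplyr)

# Toy stand-in for d3, using a few rows from the table above
toy <- data.frame(
  statement.dates = c("20070618", "20070810", "20080311", "20080318", "20100509"),
  numwords        = c(187L, 92L, 415L, 299L, 290L),
  stringsAsFactors = FALSE
)

# Dates of the extraordinary meetings to drop
drop_dates <- c("20070810", "20080311", "20100509")

fixed <- toy %>%
  # keep only the rows whose date is not in the drop list
  filter(!(statement.dates %in% drop_dates)) %>%
  # correct the misnamed date, leaving all other dates unchanged
  mutate(statement.dates = if_else(statement.dates == "20070618",
                                   "20070628", statement.dates))

print(fixed)
```

Keeping the dates to drop in a single vector means a further removal only requires adding one string, rather than another filter() call.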

Manual Validation of statements

I checked 4 statements against the actual webpages: Oct 10, 2008; Dec 16, 2009; Sep 17, 2014 (the longest statement); and May 1, 2019 (the most recent).

They are all accurate except for one detail: the removal of newline characters joins consecutive words, producing nonsense words. The line of code below should be removed from DS607_FOMC_Sentiment_Analysis_v3.Rmd:

#reports$statement.content[i]<-gsub("\n","",reports$statement.content[i])
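To illustrate the problem, the sketch below applies the same gsub pattern to a made-up sentence and compares it with replacing the newline by a space. The sample string and the names raw, joined, spaced and count_words are assumptions for illustration; replacing with a space is one possible alternative, not necessarily the fix adopted in the sentiment analysis file.

```r
# Made-up sentence with a line break between two words
raw <- "economic activity has continued\nto expand at a moderate pace"

# Deleting the newline glues the neighbouring words together
joined <- gsub("\n", "", raw)

# Replacing the newline with a space keeps the word boundary
spaced <- gsub("\n", " ", raw)

# Count whitespace-delimited words in a single string
count_words <- function(x) length(strsplit(x, "\\s+")[[1]])

print(joined)               # contains the nonsense word "continuedto"
print(count_words(joined))  # one word fewer than the original
print(count_words(spaced))  # same number of words as the original
```

Since the word counts above are computed with "\\S+", statements whose newlines were deleted this way would also be undercounted by one word per glued pair.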

Conclusion

Once this minor change to the gsub command is made, the data frame d4 above should be fit for use in research.

Nonetheless, I export the dataframe as a binary object below.

saveRDS(d4, file = "fomc_corrected_data_v1.rds")
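A quick way to confirm that an export of this kind is lossless is to read the file back and compare it with the original object. The sketch below does this on a toy data frame written to a temporary file; toy, path and restored are illustrative names, and the check is an optional extra rather than part of the original workflow.

```r
# Toy stand-in for d4, using two rows from the table above
toy <- data.frame(
  statement.dates = c("20070131", "20070321"),
  numwords        = c(189L, 175L),
  stringsAsFactors = FALSE
)

# Write to a temporary file so the working directory is left untouched
path <- tempfile(fileext = ".rds")
saveRDS(toy, file = path)

# Read the object back and compare it with the original
restored <- readRDS(path)
print(identical(toy, restored))
print(dim(restored))
```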

DS607_Validate_FOMC_DATA

Alexander Ng

5/6/2019
