library(tidyverse)
## ── Attaching packages ──────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0       ✔ purrr   0.3.0  
## ✔ tibble  2.0.1       ✔ dplyr   0.8.0.1
## ✔ tidyr   0.8.2       ✔ stringr 1.4.0  
## ✔ readr   1.3.1       ✔ forcats 0.4.0
## ── Conflicts ─────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(stringr)
library(kableExtra)
## 
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
## 
##     group_rows
library(knitr)

Overview

This markdown reviews the validity of the FOMC data frame created by Jagdish. We assume the fomc_data.rds file is located in the working directory.

d2<-readRDS(file = "fomc_data.rds")

The FOMC dates are out of order and should be arranged alphabetically. Because the dates are YYYYMMDD strings, alphabetical order is also chronological order. Word counts should be tabulated using str_count.

d2 %>% 
  mutate_if( is.factor, as.character) %>%
  arrange(statement.dates) %>%
  mutate( numwords = str_count(statement.content, "\\S+")) %>%
  select( statement.dates, numwords ) %>%
  filter(numwords > 700 ) %>% arrange(numwords)
##    statement.dates numwords
## 1         20121212      702
## 2         20130619      713
## 3         20130731      716
## 4         20141029      731
## 5         20141217      748
## 6         20131030      792
## 7         20130918      814
## 8         20140618      822
## 9         20140430      835
## 10        20140129      856
## 11        20140730      865
## 12        20131218      893
## 13        20140319      901
## 14        20140917      919
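The two ideas used above can be illustrated on toy input. This is a minimal sketch: the text fragment and the three dates are chosen only for illustration, and the variable names are hypothetical.

```r
library(stringr)

# Hypothetical fragment of statement text, not taken from the data.
txt <- "The Committee decided  to maintain\nthe target range."

# "\\S+" matches each maximal run of non-whitespace characters,
# so double spaces and newlines do not inflate the count.
n_words <- str_count(txt, "\\S+")
print(n_words)  # 8

# YYYYMMDD strings: sorting as text gives the same order as sorting as dates.
dates <- c("20081008", "20070131", "20100509")
by_text <- sort(dates)
by_date <- dates[order(as.Date(dates, format = "%Y%m%d"))]
print(identical(by_text, by_date))  # TRUE
```

The equivalence of text order and date order relies on every date having the same fixed-width YYYYMMDD layout.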

Revised DataFrame

d2 %>% mutate_if( is.factor, as.character) %>%  # needed to change dates from factors to string
    arrange(statement.dates) %>% 
    mutate( numwords = str_count(statement.content, "\\S+")) -> d3

d3 %>% select( statement.dates, numwords) %>% 
    kable() %>% 
    kable_styling() %>% 
    scroll_box(width="85%", height="200px")
statement.dates numwords
20070131 189
20070321 175
20070509 176
20070618 187
20070807 215
20070810 92
20070817 138
20070918 267
20071031 313
20071211 289
20080122 263
20080130 261
20080311 415
20080318 299
20080430 317
20080625 259
20080805 254
20080916 232
20081008 439
20081029 297
20081216 411
20090128 485
20090318 432
20090429 437
20090624 349
20090812 434
20090923 437
20091104 475
20091216 584
20100127 563
20100316 458
20100428 422
20100509 290
20100623 372
20100810 471
20100921 440
20101103 506
20101214 505
20110126 450
20110315 476
20110427 477
20110622 469
20110809 537
20110921 606
20111102 501
20111213 445
20120125 434
20120313 445
20120425 454
20120620 501
20120801 473
20120913 571
20121024 564
20121212 702
20130130 661
20130320 659
20130501 685
20130619 713
20130731 716
20130918 814
20131030 792
20131218 893
20140129 856
20140319 901
20140430 835
20140618 822
20140730 865
20140917 919
20141029 731
20141217 748
20150128 583
20150318 598
20150429 574
20150617 559
20150729 553
20150917 602
20151028 593
20151216 606
20160127 573
20160316 586
20160427 591
20160615 555
20160727 580
20160921 618
20161102 612
20161214 550
20170201 515
20170315 536
20170503 532
20170614 566
20170726 521
20170920 549
20171101 527
20171213 501
20180131 436
20180321 462
20180502 436
20180613 336
20180801 324
20180926 306
20181108 319
20181219 346
20190130 344
20190320 360
20190501 336
dim(d3)
## [1] 105   4

We need to make 4 changes to the dataframe rows.

Remove 3 dates corresponding to extraordinary FOMC meetings related to swap lines, TALF and other measures. These meetings did not have normal rate-setting objectives. The dates to remove from the dataframe are:

20070810, 20080311 and 20100509.

Lastly, one of the dates is misnamed in the html file link as 20070618, but the actual statement was released on 20070628. This is a typo in the URL: the statement itself carries the date June 28, 2007, which confirms that the URL is misnamed.

d3 %>% 
  filter( statement.dates != "20070810") %>%
  filter( statement.dates != "20080311") %>%
  filter( statement.dates != "20100509")  -> d4


d4[d4$statement.dates == "20070618", "statement.dates"] <- "20070628"
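The same four row changes can also be written as one pipeline with a built-in check. This is a sketch on a toy stand-in for d3, assuming the dates are already character strings; the names toy, drop_dates and fixed are hypothetical.

```r
library(dplyr)

# Toy stand-in for d3; dates and word counts are copied from the table above.
toy <- data.frame(
  statement.dates = c("20070618", "20070807", "20070810", "20080311", "20100509"),
  numwords = c(187, 215, 92, 415, 290),
  stringsAsFactors = FALSE
)

# The three extraordinary meetings to drop.
drop_dates <- c("20070810", "20080311", "20100509")

fixed <- toy %>%
  filter(!statement.dates %in% drop_dates) %>%
  # Correct the date that is misnamed in the URL.
  mutate(statement.dates = if_else(statement.dates == "20070618",
                                   "20070628", statement.dates))

# Fail loudly if a dropped or misnamed date survives.
stopifnot(!any(fixed$statement.dates %in% c(drop_dates, "20070618")))
print(fixed)
```

Keeping the dates to drop in a single vector makes the list easier to audit than a chain of separate filter() calls.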

d4 %>% select(statement.dates, numwords) %>%
  kable() %>%
  kable_styling() %>%
  scroll_box(width="100%", height="250px")
statement.dates numwords
20070131 189
20070321 175
20070509 176
20070628 187
20070807 215
20070817 138
20070918 267
20071031 313
20071211 289
20080122 263
20080130 261
20080318 299
20080430 317
20080625 259
20080805 254
20080916 232
20081008 439
20081029 297
20081216 411
20090128 485
20090318 432
20090429 437
20090624 349
20090812 434
20090923 437
20091104 475
20091216 584
20100127 563
20100316 458
20100428 422
20100623 372
20100810 471
20100921 440
20101103 506
20101214 505
20110126 450
20110315 476
20110427 477
20110622 469
20110809 537
20110921 606
20111102 501
20111213 445
20120125 434
20120313 445
20120425 454
20120620 501
20120801 473
20120913 571
20121024 564
20121212 702
20130130 661
20130320 659
20130501 685
20130619 713
20130731 716
20130918 814
20131030 792
20131218 893
20140129 856
20140319 901
20140430 835
20140618 822
20140730 865
20140917 919
20141029 731
20141217 748
20150128 583
20150318 598
20150429 574
20150617 559
20150729 553
20150917 602
20151028 593
20151216 606
20160127 573
20160316 586
20160427 591
20160615 555
20160727 580
20160921 618
20161102 612
20161214 550
20170201 515
20170315 536
20170503 532
20170614 566
20170726 521
20170920 549
20171101 527
20171213 501
20180131 436
20180321 462
20180502 436
20180613 336
20180801 324
20180926 306
20181108 319
20181219 346
20190130 344
20190320 360
20190501 336
dim(d4)
## [1] 102   4

Manual Validation of statements

I checked four statements against the actual webpages: Oct 10, 2008; Dec 16, 2009; Sep 17, 2014, which is the longest; and May 1, 2019, which is the most recent.

They are all accurate except for one detail. The removal of newline characters joins consecutive words together, producing nonsense words. The line of code below should be removed from DS607_FOMC_Sentiment_Analysis_v3.Rmd.

#reports$statement.content[i]<-gsub("\n","",reports$statement.content[i])
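The effect is easy to reproduce on a toy string. The fragment below is hypothetical and not taken from an actual statement. Replacing the newline with a space is shown only as an assumed alternative to deleting the line outright.

```r
# Hypothetical two-line fragment of text.
x <- "labor market conditions\nremain strong"

# Deleting the newline fuses "conditions" and "remain" into one token.
joined <- gsub("\n", "", x)

# Assumed alternative: turn the newline into a space so the words stay apart.
spaced <- gsub("\n", " ", x)

print(joined)  # "labor market conditionsremain strong"
print(spaced)  # "labor market conditions remain strong"

# Word counts differ by one because of the fused token.
print(lengths(strsplit(joined, "\\s+")))  # 4
print(lengths(strsplit(spaced, "\\s+")))  # 5
```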

Conclusion

Once the minor change to the gsub command is made, the dataframe d4 above should be fit for use in research.

Nonetheless, I export the dataframe as a binary object below.

saveRDS(d4, file = "fomc_corrected_data_v1.rds")
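A quick round-trip check can confirm that an exported object reads back unchanged. The sketch below uses a toy data frame and a temporary file instead of d4 and the real output path; the names toy, path and restored are hypothetical.

```r
# Toy stand-in for d4; the values are copied from the first two rows of the table.
toy <- data.frame(
  statement.dates = c("20070131", "20070321"),
  numwords = c(189L, 175L),
  stringsAsFactors = FALSE
)

# Write to a temporary file so no real output is overwritten.
path <- tempfile(fileext = ".rds")
saveRDS(toy, file = path)

# Read the object back and compare it with the original.
restored <- readRDS(path)
stopifnot(identical(toy, restored))
print(dim(restored))  # 2 2
```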