From prices to politics:
The causes and consequences of inflation at the Fed

Authors

Benjamin Braun, Jérôme Deyris, Monica DiLeo

Abstract

Managing inflation has been the core task of central banks, at least in the Western world, for the past fifty years. Not only is it the task to which central banks are legally mandated; their reputations also hinge on their success in fighting inflation. Central bankers are inclined to portray their responses to inflation as scientific rather than political: this is a core premise that justifies their independence. However, the economics literature has reached little consensus on the relative importance of different drivers of inflation in different episodes, undermining this view. At the same time, different causes of inflation imply different policy responses, which have varied and far-reaching distributional implications. Given this, how do central bankers identify the causes of inflation and thus the appropriate policy responses? We examine this question in the context of the Federal Reserve (Fed). Leveraging large language models to code Fed speeches and Federal Open Market Committee transcripts beginning in the 1970s, we trace how Fed policymakers’ understanding of the drivers of inflation varies both over time and within episodes. Through qualitative and quantitative analysis, we examine how individual, institutional, and political factors influence how Fed policymakers understand and frame, both publicly and privately, the causes and consequences of inflation.

This data report is organized as follows. First, we present our textual data of public speeches (section 1) and FOMC transcripts (section 2). In section 3, we explain how we merge our data and filter our corpus to remove irrelevant short sentences. Section 4 presents our dictionary-based approach to flag inflation-related sentences. Finally, section 5 presents our final corpus of inflation-related excerpts and discusses the problems inherent in quantifying the different inflation drivers.

1. Speeches

Our collection of public speeches comes from two different sources.

First, we start with the database created by Campiglio et al. (2024). This dataset aggregates speeches from three different sources for the Fed: (i) the BIS repository, (ii) the Fed's websites, and (iii) the FRASER online archives maintained by the Federal Reserve Bank of St. Louis.

Second, since this dataset only goes back to 1986, we complement it with a new scraping of the FRASER archives for the years 1970-1986. We complete the missing metadata to match the format of Campiglio et al. (2024) (notably the gender of all speakers).

1.1 Sources

Our dataset comprises 8,253 speeches from three distinct sources (Panel A). Federal Reserve Bank Presidents delivered the majority of these speeches, rather than Board members or the (Vice) Chair (Panels B and D). While the proportion of speeches by female speakers has increased over time, Fed communications remain predominantly male, reflecting the institution’s gender imbalance (Panel C).

Distribution of Fed Speeches

1.2 OCR Quality

The quality of the text varies depending on the quality of the document and of its OCR. We approximate this quality by the share of words that are recognized as actual words by the package hunspell and its English dictionary. Given the number of idiosyncratic words and central-banking-specific vocabulary, and after close manual inspection, we consider speeches above a 90% threshold to be of very high quality.
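As an illustration, the quality score can be computed as follows. This is a minimal sketch, assuming a data frame speeches with a text column; the variable names are illustrative rather than our actual implementation.

library(dplyr)
library(stringr)
library(hunspell)

# share of alphabetic tokens recognized by hunspell's US English dictionary
ocr_quality <- speeches %>%
  mutate(share_recognized = vapply(text, function(txt) {
    words <- str_extract_all(str_to_lower(txt), "[a-z]+")[[1]]
    if (length(words) == 0) return(NA_real_)
    mean(hunspell_check(words, dict = dictionary("en_US")))
  }, numeric(1)))

# keep only speeches above the 90% threshold
speeches_clean <- ocr_quality %>% filter(share_recognized >= 0.90)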

Text Quality Analysis Across Different Dimensions

99.4% of our speeches are above this threshold. We exclude the remaining 0.6%, which mainly affects early speeches (see panels A and B) from regional Fed presidents, especially from the Federal Reserve Banks of Chicago and Kansas City (see panels C and D). We are left with 8,184 speeches.

1.3 Audience

Finally, we code the audience for all speeches given by members of the Board (including the Chair and Vice Chairs). Values are currently missing for the most recent years, as well as for a few Greenspan and Volcker speeches.

Audience Distribution

There seems to be a drop in the proportion of speeches given to finance, academic, and parliamentary audiences, in favour of central-bank-academic and press audiences. However, most speeches currently remain uncoded. The goal is to use the speeches already coded as a validation sample to check whether LLMs can code the remaining speeches based on the title and incipit of each speech.

1.4 Sentences

We then parse speeches into individual sentences, using the package spacyr with its largest and most accurate English parsing model, en_core_web_lg.
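A minimal sketch of this step is below, assuming a data frame speeches with speech_id and text columns (the column names are illustrative).

library(spacyr)
library(tibble)
library(tidyr)
library(dplyr)

# load spaCy's large English model once per session
spacy_initialize(model = "en_core_web_lg")

# split each speech into sentences, keeping the link to its speech_id
sent_list <- spacy_tokenize(speeches$text, what = "sentence")
sentences <- tibble(speech_id = speeches$speech_id, sentence = sent_list) %>%
  unnest(sentence) %>%
  group_by(speech_id) %>%
  mutate(sentence_id = row_number()) %>%
  ungroup()

spacy_finalize()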

We manually inspect speeches containing more than 300 sentences to remove appendices that may artificially inflate document length. These appendices are common in parliamentary testimony transcripts, where they serve as reference materials for central bankers during question periods. We locate the actual final sentence of each speech and exclude everything that follows.

We end up with a corpus of 1,097,717 sentences. The figure below plots the average number of sentences and words per speech per year (panels A and C), as well as their yearly distribution (panels B and D).

Parsed speeches

2. FOMC meetings

2.1 Collection and pre-processing steps

Our private interventions all come from the FOMC meeting transcripts, which the Board of Governors publicly releases as PDF files with a five-year delay.

However, we carried out this collection in late 2023, so the most recent meetings available are from 2017. We may repeat the collection in early 2025 to add two additional years.

These documents pose no OCR problems, but they need to be parsed into individual interventions. To do so, [to be completed by Monica].

We harmonize metadata (speaker names, gender, positions) using information from both the speech database above and the Federal Reserve History website. When a board member’s position changes within a year and the exact transition date is unknown, we assign all speeches from that year to the new position (e.g., for a 2014 position change, all 2014 speeches are labeled with the new position, while pre-2014 speeches retain the previous position).
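As an illustration, the transition-year rule can be applied with a simple conditional recode. This is a hedged sketch using a hypothetical speaker and positions, not our actual code.

library(dplyr)

# hypothetical example: a speaker whose position changed at some point in 2014;
# all 2014 observations get the new position, earlier ones keep the old one
fomc_interventions <- fomc_interventions %>%
  mutate(position = case_when(
    speaker == "Jane Doe" & year >= 2014 ~ "Vice Chair",
    speaker == "Jane Doe" & year <  2014 ~ "Governor",
    TRUE ~ position
  ))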

2.2 Descriptive plots

We end up with 104,821 distinct interventions. Almost half of these interventions are made by the Fed Chair (panel A). However, Chair interventions are on average shorter, since the Chair mostly distributes speaking time and makes other brief interventions, whereas regional Fed presidents speak less often but deliver more comprehensive and detailed interventions (typically informing the others about economic and monetary developments in their district). As a result, Fed presidents account for more than half of the words spoken, a share that increases further in 2006 and again in 2011 (to be investigated why?).

The other panels zoom in on regional Fed presidents, displaying their share of interventions (panel B) or of words (panel D).

Distribution of Federal Reserve Speeches

2.3 Sentences

Last, we parse each intervention into sentences, which yields 536,236 sentences. As the plot below shows, the average length of interventions increased throughout the period, whether counted in sentences (panel A) or in words (panel C). However, the variance is high, as panels B and D make clear: a lot of interventions are in fact quite short.

3. Merging the two datasets

We now merge our two corpora. This first allows us to check whether some speakers only speak publicly in speeches but never in FOMC meetings, or vice versa. Doing so, we found a few regional Fed presidents for whom the FRASER archives had no public speeches registered.

3.1 Extra-long and extra-short sentences

We then inspect very short sentences (below 5 words or 20 characters) and very long sentences (above 150 words or 1,000 characters) to remove those that do not carry information relevant to our research question. After manual inspection, we noticed that a few very long sentences had been kept as one by the parsing model because the original document was missing punctuation. To fix this problem, we export all sentences above this threshold to Python, where we apply the model oliverguhr/fullstop-punctuation-multilang-large, available at https://huggingface.co/oliverguhr/fullstop-punctuation-multilang-large.

After restoring punctuation in these 8,421 very long sentences, we rerun our parsing model and obtain 59,530 sentences. Note that some sentences are not split, which shows that the punctuation model does not always add full stops.

We then deal with extra-short sentences, excluding those that have fewer than 3 words or 10 characters. As shown in the figure below, this choice mainly removes sentences from FOMC transcripts, which often contain short procedural or interjection sentences that are captured by the transcription but carry no useful information (e.g. “Is there coffee?”).
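The filter itself is straightforward; a minimal sketch, with illustrative variable names, is:

library(dplyr)
library(stringr)

# drop sentences with fewer than 3 words or fewer than 10 characters
sentences <- sentences %>%
  mutate(
    n_words = str_count(sentence, "\\S+"),
    n_chars = str_length(sentence)
  ) %>%
  filter(n_words >= 3, n_chars >= 10)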

This exclusion rule removes about 100,000 sentences, or 6.42 percent of the corpus. We are left with 1,532,866 sentences.

3.2 Public vs Private

We then plot, for each speaker, how much they speak publicly (speeches) and privately (interventions in FOMC meetings).

With a few exceptions (e.g. Volcker), US central bankers speak more publicly than in FOMC meetings. It should be noted that some speakers appear to have zero FOMC interventions, simply because the temporal coverage of FOMC transcripts is narrower than that of speeches (we have speeches for 1970-1976 and 2017-2023, years for which FOMC transcripts are unavailable).

The same comparison can be made in words rather than sentences, which does not change the results much:

4. Inflation sentences

We then move on to addressing our research question: how do Fed officials discuss the drivers of inflation, both privately and publicly? Before being able to identify these drivers, we first need to identify any relevant discussion of inflation. To do so, we implement a simple dictionary-based approach.

4.1 Keywords and their prominence

Our dictionary consists of the following words and expressions:

          "inflation", "cpi", "pce", "price increas", 
          "price stab", "overall price", "price level", 
          "consumer price", "producer price", "price increas", 
          "price index", "price pressure", 
          "price shock", "higher price", "raise price", 
          "raising price", "rising price"

In total, these keywords appear 129,484 times in our corpus, in 129,457 different sentences. This means that 8.8 percent of the sentences in our corpus are potentially relevant. However, some keywords are much more prominent than others:

library(gt)
keyword_stats <- readRDS("/Users/jerome.deyris/Library/CloudStorage/GoogleDrive-jerome.deyris@sciencespo.fr/Mon Drive/KnowLegPo/Fed_speeches_and_fomc/data/inflation_keyword_stats.rds")

gt(keyword_stats) %>%
  fmt_number(columns = "share", decimals = 2) %>%
  fmt_number(columns = "matches", decimals = 0, use_seps = TRUE) %>%
  cols_label(
    keyword = "Keyword",
    matches = "Occurrences",
    share = "Share of Total (%)"
  ) %>%
  tab_header(title = "Inflation-Related Keywords in Fed Communications")
Inflation-Related Keywords in Fed Communications
Keyword Occurrences Share of Total (%)
inflation 112,039 85.38
price stab 10,358 7.89
pce 4,288 3.27
cpi 3,491 2.66
price index 2,856 2.18
price increas 2,679 2.04
price level 2,345 1.79
consumer price 2,168 1.65
price pressure 1,510 1.15
higher price 1,066 0.81
price shock 561 0.43
rising price 453 0.35
raise price 349 0.27
producer price 201 0.15
overall price 198 0.15
raising price 127 0.10

Interestingly, this prominence also varies (although only slightly) over time.

Notice the appearance of pce and the decline of cpi around 2000, when the Fed shifted its preferred inflation measure to the PCE price index.

4.2 Inflation salience in time

We can now also explore the salience of inflation over time, both in public and private discussions. We can do this yearly, of course, but also monthly, to highlight periods of more sudden and acute attention to inflation.
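The underlying measure is simply the share of flagged sentences per period and per source. A minimal sketch, assuming the sentences data frame carries a date column and a source column ("speech" vs "fomc"); names are illustrative.

library(dplyr)
library(lubridate)

# monthly share of inflation-flagged sentences, by public vs private source
salience_monthly <- sentences %>%
  mutate(month = floor_date(date, "month")) %>%
  group_by(source, month) %>%
  summarise(share_inflation = mean(inflation_flag), .groups = "drop")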

Notice how the two series generally correlate very well, except for an interesting mismatch in the late 1970s and early 1980s, when there was little discussion of inflation in FOMC transcripts but a lot of attention in speeches. This may be only a relative phenomenon, with fewer public speeches overall in that period?

4.3 Inflation salience across speakers

Last, we can explore the relative importance of inflation in public and private speeches across different variables, such as position, gender, and institution. Specifically, the Federal Reserve Banks of St. Louis, San Francisco, Richmond, and Cleveland appear to be the most vocal about inflation. By contrast, Fed Board members and Vice Chairs speak less about the issue, both in absolute and relative terms.

5. Building inflation excerpts

To determine whether these sentences discuss the drivers of inflation, we need more context than the single sentence containing a keyword. Otherwise, we may miss relevant discussions.

5.1 Adapting the triplets methodology

We follow standard practice in adding the sentence before and the sentence after. However, we add a twist to this method to make sure each sentence is fed only once to our LLM pipeline.

Indeed, the standard “triplets” approach leads to the following situation. If two relevant sentences are consecutive, they will be coded twice: once as relevant, with the n-1 and n+1 sentences added for context, and once as context for the neighbouring sentence. In a similar fashion, almost-consecutive relevant sentences will lead a single context sentence to be coded twice:

Notice how sentences 8, 10, 15, 16 and 17 all appear in two text chunks. Sentence 9 even appears in three distinct excerpts, getting coded three times.

To make sure this does not happen, we adapt the triplet method. After spotting all relevant sentences, we merge consecutive and semi-consecutive inflation sentences together, before adding the previous and next sentences as context as usual. This leads to some excerpts being more than 3 sentences long, but ensures that each relevant sentence is captured only once. It has the added benefit of not cutting long discussions of inflation into several chunks, which we found easier to code for both humans and LLMs. A sketch of the construction follows.
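The sketch below illustrates the idea for a single document: relevant sentences whose positions are at most two apart are grouped into one excerpt, and one sentence of context is added on each side. Function and column names, and the gap threshold of two, are illustrative assumptions rather than our exact implementation.

library(dplyr)
library(tibble)

build_excerpts <- function(doc) {
  # doc: one document's sentences in order, with a logical inflation_flag column
  hits <- which(doc$inflation_flag)
  if (length(hits) == 0) return(NULL)

  # start a new excerpt whenever the gap to the previous relevant sentence exceeds 2
  excerpt_id <- cumsum(c(1, diff(hits) > 2))

  tibble(position = hits, excerpt_id = excerpt_id) %>%
    group_by(excerpt_id) %>%
    summarise(
      start = max(min(position) - 1, 1),          # previous sentence as context
      end   = min(max(position) + 1, nrow(doc)),  # next sentence as context
      .groups = "drop"
    ) %>%
    rowwise() %>%
    mutate(text = paste(doc$sentence[start:end], collapse = " ")) %>%
    ungroup()
}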

5.2 How to normalize inflation discussions?

This method leads to 56,743 inflation excerpts, containing our 129,457 inflation-related sentences. The majority of these excerpts are 3 sentences long, but a few are shorter (because some FOMC interventions contain fewer than 3 sentences), and some are longer (due to our aggregation method).

It is important to note that the figure above is drawn after removing the top 1% longest excerpts, in order to keep the plot readable. Indeed, after inspecting the data, we found a few inflation chunks that are very long (above 20 sentences, and as long as 92 sentences), which made the plots uninformative.

I don’t know what to do with those. Should we try to split them again? This would probably look bad, and it is arguably not necessary: only one excerpt exceeds 3,000 words, and maybe 30 exceed 1,000 words, so all should still fit in the context window of our LLMs. In any case, this brings us to the “scaling” problem we need to settle: how do we want to weight the importance of the different drivers of inflation for each speaker?

I see three main possibilities (a worked example follows this list):

  • The first is a simple count. For each year/month, we simply count, for each speaker, the number of times each driver has been identified. The problem with this approach is that it is highly vulnerable to the chunking process: whether the sentences are chunked together or not greatly influences the measure. For example, a speech with 6 distinct chunks discussing inflation will count six times more than if the same sentences had been merged into a single chunk.

  • The second is to normalize at the speech / FOMC-meeting level. For any given speech or FOMC meeting, the speaker has 1 point to distribute across the different drivers. For example, imagine a speech with 4 relevant chunks: two labeled fiscal, one labeled fiscal + labor, and one labeled energy. This yields 5 labels across 4 chunks. With this method, energy and labor each take 0.2 (1/5), and fiscal takes 0.6 (3/5). This does not entirely solve the issue, because if the 4 relevant chunks were merged into one, all three labels would receive 0.33 (1/3). But although we may still have measurement noise in the relative importance of each driver, we no longer add noise to the overall salience of inflation, since the measure is normalized.

  • If we want to go one step further, we could simply assign equal weights to all drivers identified in a speech. This way, our measure is insensitive to the chunking process: it does not matter how the relevant sentences are cut, all identified drivers receive the same weight. We lose some information on relative weight within a speech, but the hope is that aggregating several speeches per year/month for each speaker will restore variability and granularity.
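To make the options concrete, here is a small worked example on the hypothetical speech from the second bullet (four chunks labeled fiscal, fiscal, fiscal + labor, and energy). The data and names are purely illustrative.

library(dplyr)
library(tibble)

# four hypothetical chunks and their identified drivers
chunk_labels <- list(c("fiscal"), c("fiscal"), c("fiscal", "labor"), c("energy"))
labels <- tibble(driver = unlist(chunk_labels))

# option 1: raw counts per driver (fiscal 3, labor 1, energy 1)
counts <- labels %>% count(driver, name = "raw_count")

# option 2: one point per speech, distributed across labels
# (fiscal 3/5 = 0.6, labor and energy 1/5 = 0.2 each)
normalized <- labels %>%
  count(driver) %>%
  mutate(weight = n / sum(n)) %>%
  select(driver, weight)

# option 3: equal weight to each distinct driver mentioned in the speech
# (fiscal, labor, energy each 1/3)
equal <- tibble(driver = unique(labels$driver)) %>%
  mutate(weight = 1 / n())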

5.3 Chunks per speaker

In any case, the upside is that we do have several chunks per speaker and per year in office. Given that our validation sample suggests that about 20% of excerpts discuss the drivers of inflation, we would need at least 5 chunks per year in office for each speaker (1 / 0.20 = 5) to expect at least one driver discussion and thus avoid NA values. This threshold is represented by the gray line in the figure below:

Some speakers are unfortunately below this limit, suggesting we may have data gaps, i.e. speaker-years without any identified inflation driver. This may be less of an issue if we focus on inflation-intense periods, but it is worth noting already. We can always remove these few speakers, or assume that their value for missing years is the same as in the previous year.

Generally speaking, the number of excerpts varies a lot across speakers:

With the same data, but in a scatterplot: