Introduction

The National Bureau of Economic Research (NBER) is a think tank committed to “undertaking and disseminating unbiased economic research among public policymakers, business professionals, and the academic community.” Its contribution to the history of modern economics is hard to overstate; the NBER is best known for dating the start and end of recessions in the United States. Not surprisingly, considering its success, 22 Nobel Prize winners in economics have been part of the NBER, including Milton Friedman, Paul Krugman, and Joseph Stiglitz.

The more than 1,600 economists who are NBER researchers are the leading scholars in their fields. Most NBER-affiliated researchers are either Faculty Research Fellows (FRFs) or Research Associates (RAs). Faculty Research Fellows are typically junior scholars. Research Associates, whose appointments are approved by the NBER Board of Directors, hold tenured positions at their home institutions.

NBER working papers present new research by NBER affiliates, circulated for discussion and comment. The NBER distributes more than 1,200 working papers each year. These papers have not been peer reviewed. Papers issued more than 18 months ago are open access. More recent papers are available without charge to affiliates of subscribing academic institutions, employees of NBER Corporate Associates, government employees in the US, journalists, and residents of low-income countries.

Data Wrangling

library(nberwp)
library(tidyverse)
library(tidytuesdayR)
library(lubridate)
library(ggplot2)
library(stringr)
library(CGPfunctions)
library(scales)
library(ggpubr)
library(treemapify)
theme_set(theme_pubr())

Read data

tuesdata <- tidytuesdayR::tt_load('2021-09-28')

papers <- tuesdata$papers
authors <- tuesdata$authors
programs <- tuesdata$programs
paper_authors <- tuesdata$paper_authors
paper_programs <- tuesdata$paper_programs

Data Structuring

# joining needed data into a single dataframe
paper_authors <- paper_authors %>%
  left_join(papers, by = "paper")

paper_authors <- paper_authors %>%
  left_join(authors, by = "author")

paper_authors <- paper_authors %>%
  left_join(paper_programs, by = "paper")

nber <- paper_authors %>%
  left_join(programs, by = "program")

nber
str(nber)
## spec_tbl_df [130,081 x 11] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ paper           : chr [1:130081] "w0001" "w0002" "w0003" "w0004" ...
##  $ author          : chr [1:130081] "w0001.1" "w0002.1" "w0003.1" "w0004.1" ...
##  $ year            : num [1:130081] 1973 1973 1973 1973 1973 ...
##  $ month           : num [1:130081] 6 6 6 7 7 7 8 9 9 9 ...
##  $ title           : chr [1:130081] "Education, Information, and Efficiency" "Hospital Utilization: An Analysis of SMSA Differences in Hospital Admission Rates, Occupancy Rates and Bed Rates" "Error Components Regression Models and Their Applications" "Human Capital Life Cycle of Earnings Models: A Specific Solution and Estimation" ...
##  $ name            : chr [1:130081] "Finis Welch" "Barry R Chiswick" "Swarnjit S Arora" "Lee A Lillard" ...
##  $ user_nber       : chr [1:130081] "finis_welch" "barry_chiswick" "swarnjit_arora" NA ...
##  $ user_repec      : chr [1:130081] NA "pch425" NA "pli669" ...
##  $ program         : chr [1:130081] NA NA NA NA ...
##  $ program_desc    : chr [1:130081] NA NA NA NA ...
##  $ program_category: chr [1:130081] NA NA NA NA ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   paper = col_character(),
##   ..   author = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>
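The chain of joins above produces one row per paper-author-program combination, so a paper with several authors or programs appears more than once. The following is a minimal sketch of that fan-out on toy tables. The values are hypothetical, and base R `merge()` stands in for `left_join()` so the snippet runs without extra packages.

```r
# Toy tables (hypothetical values) mirroring the structure of the NBER tables
paper_authors <- data.frame(
  paper  = c("w1", "w1", "w2"),
  author = c("a1", "a2", "a1")
)
paper_programs <- data.frame(
  paper   = c("w1", "w1", "w2"),
  program = c("LS", "PE", "LS")
)

# Base R counterpart of left_join(paper_authors, paper_programs, by = "paper")
joined <- merge(paper_authors, paper_programs, by = "paper", all.x = TRUE)

# w1 has 2 authors and 2 programs (4 rows), w2 has 1 of each (1 row)
n_rows   <- nrow(joined)
n_papers <- length(unique(joined$paper))
print(c(rows = n_rows, papers = n_papers))
```

Row counts on the joined data therefore measure combinations, not papers; counting distinct paper IDs gives the number of papers.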

Data Dictionary

  • paper: Paper ID
  • author: Author ID
  • year: Publish year
  • month: Publish month
  • title: Title of the paper
  • user_nber: Author username on NBER
  • user_repec: Author username on REPEC
  • name: Author name
  • program: program ID
  • program_desc: Subcategory of the paper
  • program_category: Category of the paper

Data Cleaning

Since we won’t be using user_nber and user_repec data, we will remove these variables.

nber <- nber[-c(7,8)]

Handling missing values

# checking for missing value.
colSums((is.na(nber)))
##            paper           author             year            month 
##                0                0                0                0 
##            title             name          program     program_desc 
##                0                0              530              530 
## program_category 
##             1516

The number of missing values is small relative to the size of the dataset, so we will leave them in place.
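To put "small" into numbers, the missing counts reported above can be expressed as a percentage of the 130,081 rows. This is a quick arithmetic sketch using only the figures printed in this report; the variable names are illustrative.

```r
# Figures taken from the str() and colSums() output above
n_rows    <- 130081
n_missing <- c(program = 530, program_desc = 530, program_category = 1516)

# Percentage of rows with a missing value, per column
pct_missing <- round(100 * n_missing / n_rows, 2)
print(pct_missing)
```

On the full data frame, `colMeans(is.na(nber))` would give the same shares directly.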

Data Transformation

Now we create a date column from the year and month columns, using the first day of each month.

nber <- transform(nber, Date = as.Date(paste(year, month, 1, sep = "-")))
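The date construction can be checked on a few values. This sketch uses three illustrative year-month pairs and passes an explicit `format`, which is an optional addition rather than part of the original call.

```r
# Illustrative year and month values, including single-digit months
year  <- c(1973, 1999, 2021)
month <- c(6, 12, 1)

# Paste into "YYYY-M-1" strings and parse them as dates
d <- as.Date(paste(year, month, 1, sep = "-"), format = "%Y-%m-%d")
print(format(d))
```

Single-digit months are parsed correctly, so no zero-padding step is needed before calling `as.Date()`.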

To analyze monthly patterns year over year, we create a publish_month variable.

nber$publish_month <- month(nber$Date, label = TRUE, abbr = TRUE)

Next, we rename the columns so that their meaning is easier to understand.

nber <- rename(nber, 
               paper_id = paper,
               author_id = author,
               publish_date = Date,
               author_name = name,
               program_subcategory = program_desc,
               program_id = program
               )

Several columns are categorical and should therefore be converted to the factor data type.

nber <- nber %>%
  mutate(across(c(author_name, program_id, program_subcategory, program_category), as.factor))

We can now remove the redundant month variable.

nber <- nber[-c(4)]

Then we rearrange the columns in the data frame.

nber <- nber[, c(1,4,2,5,8,6,7,9,10,3)]
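Reordering by position works, but it silently breaks if an earlier step adds or drops a column. A possible alternative is to reorder by name. The sketch below uses a one-row toy data frame with values from the first record shown earlier and checks that both approaches agree.

```r
# One-row toy data frame with a subset of the columns
df <- data.frame(
  paper_id  = "w0001",
  author_id = "w0001.1",
  year      = 1973,
  title     = "Education, Information, and Efficiency"
)

# Same reordering expressed by position and by name
by_position <- df[, c(1, 4, 2, 3)]
by_name     <- df[, c("paper_id", "title", "author_id", "year")]

same_result <- identical(by_position, by_name)
print(same_result)
```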

Let’s see what the final dataset looks like after wrangling.

str(nber)
## 'data.frame':    130081 obs. of  10 variables:
##  $ paper_id           : chr  "w0001" "w0002" "w0003" "w0004" ...
##  $ title              : chr  "Education, Information, and Efficiency" "Hospital Utilization: An Analysis of SMSA Differences in Hospital Admission Rates, Occupancy Rates and Bed Rates" "Error Components Regression Models and Their Applications" "Human Capital Life Cycle of Earnings Models: A Specific Solution and Estimation" ...
##  $ author_id          : chr  "w0001.1" "w0002.1" "w0003.1" "w0004.1" ...
##  $ author_name        : Factor w/ 15398 levels "A Abigail Payne",..: 4604 1505 13732 8457 6149 14475 8548 9851 12115 8457 ...
##  $ program_category   : Factor w/ 3 levels "Finance","Macro/International",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ program_id         : Factor w/ 21 levels "AG","AP","CF",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ program_subcategory: Factor w/ 21 levels "Asset Pricing",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ publish_date       : Date, format: "1973-06-01" "1973-06-01" ...
##  $ publish_month      : Ord.factor w/ 12 levels "Jan"<"Feb"<"Mar"<..: 6 6 6 7 7 7 8 9 9 9 ...
##  $ year               : num  1973 1973 1973 1973 1973 ...

Finally, we add a decade variable derived from the publication year; 2020 and 2021 are grouped with the 2010s.

# floor each year to its decade and cap at 2010 so that 2020-2021 fall into "2010s"
pw <- function(x) {
  paste0(pmin(floor(x / 10) * 10, 2010), "s")
}
nber$decade <- as.factor(pw(nber$year))
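Another way to bin years into decades is base R's `cut()`. The sketch below applies it to a handful of illustrative years and keeps the same convention of folding 2020 and 2021 into the "2010s" label. The break points and labels are chosen here for illustration.

```r
# Illustrative years covering every decade boundary
year <- c(1973, 1980, 1999, 2009, 2010, 2021)

# Left-closed intervals: [1970, 1980), ..., [2010, 2022)
decade <- cut(
  year,
  breaks = c(1970, 1980, 1990, 2000, 2010, 2022),
  labels = c("1970s", "1980s", "1990s", "2000s", "2010s"),
  right  = FALSE
)
print(as.character(decade))
```

`cut()` returns a factor directly, so no separate `as.factor()` call is needed.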

Data Description

summary(nber)
##    paper_id            title            author_id        
##  Length:130081      Length:130081      Length:130081     
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##            author_name                program_category   program_id   
##  Jonathan Gruber :   359   Finance            :12957   LS     :14084  
##  James J Heckman :   331   Macro/International:35929   PE     :13967  
##  Daron Acemoglu  :   308   Micro              :79679   EFG    :13113  
##  Janet M Currie  :   306   NA's               : 1516   IFM    : 8570  
##  Michael D Bordo :   297                               ITI    : 7125  
##  Edward L Glaeser:   291                               (Other):72692  
##  (Other)         :128189                               NA's   :  530  
##                                program_subcategory  publish_date       
##  Labor Studies                           :14084    Min.   :1973-06-01  
##  Public Economics                        :13967    1st Qu.:2005-10-01  
##  Economic Fluctuations and Growth        :13113    Median :2013-03-01  
##  International Finance and Macroeconomics: 8570    Mean   :2010-07-13  
##  International Trade and Investment      : 7125    3rd Qu.:2018-03-01  
##  (Other)                                 :72692    Max.   :2021-06-01  
##  NA's                                    :  530                        
##  publish_month        year        decade     
##  May    :12102   Min.   :1973   1970s:  627  
##  Jun    :12090   1st Qu.:2005   1980s: 6247  
##  Oct    :11234   Median :2013   1990s:11231  
##  Dec    :11139   Mean   :2010   2000s:31403  
##  Sep    :11044   3rd Qu.:2018   2010s:80573  
##  Jan    :11018   Max.   :2021                
##  (Other):61454

The nber dataset consists of about 130 thousand rows, one per paper-author-program combination, covering NBER working papers published between 1973 and 2021.

length(unique(nber$author_id))
## [1] 15437

About 15 thousand authors published NBER working papers during 1973-2021.

unique(nber$program_category)
## [1] <NA>                Macro/International Micro              
## [4] Finance            
## Levels: Finance Macro/International Micro

Each program category maps loosely to a traditional field of economics: Finance, Micro, and Macro/International.

unique(nber$program_subcategory)
##  [1] <NA>                                          
##  [2] Economic Fluctuations and Growth              
##  [3] International Finance and Macroeconomics      
##  [4] International Trade and Investment            
##  [5] Public Economics                              
##  [6] Labor Studies                                 
##  [7] Health Economics                              
##  [8] Monetary Economics                            
##  [9] Productivity, Innovation, and Entrepreneurship
## [10] Law and Economics                             
## [11] Children                                      
## [12] Corporate Finance                             
## [13] Economics of Aging                            
## [14] Development of the American Economy           
## [15] Environment and Energy Economics              
## [16] Industrial Organization                       
## [17] Asset Pricing                                 
## [18] Health Care                                   
## [19] Economics of Education                        
## [20] Political Economics                           
## [21] Technical Working Papers                      
## [22] Development Economics                         
## 21 Levels: Asset Pricing Children Corporate Finance ... Technical Working Papers

Ignoring NA, there are 21 subcategories of research programs in the NBER working paper series.

The 70s: A Turning Point in NBER History

Martin S. Feldstein, a professor at Harvard, became president of the NBER in 1977. He altered every aspect of the NBER. He moved the headquarters from New York to Cambridge. He also advanced the NBER’s role in disseminating economic research by developing the NBER Working Paper Series into a leading source of pre-publication findings. In 1977, when he started, there were 142 such papers. In 2008, by the close of his term, there were 14,000. During Feldstein’s term, the yearly number of publications in our data multiplied roughly twentyfold (Graphic 1).

Graphic 1.

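# n() counts rows, i.e. paper-author-program combinations,
# so a paper with several authors or programs is counted more than once
number_year <- nber %>%
  group_by(year) %>%
  summarise(titles = n()) %>%
  mutate(log_titles = log10(titles))

ggplot(number_year, aes(x = year, y = log_titles))+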
  geom_line(size =1, color = "#003f5c")+
  labs(title = "Number of NBER Publications",
       subtitle = "from 1973-2021",
       y = "Logarithmic scale of number of titles",
       x = NULL,
       caption = "Source: Tidytuesday")+
    theme(plot.title = element_text(hjust = 0, face = "bold"),
        plot.subtitle = element_text(hjust = 0),
        axis.title = element_text(size=9),
        legend.position = "bottom",
        legend.title = element_text(size=9, face = "bold"),
        axis.line = element_line(),
        panel.border = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())+
    scale_x_continuous(breaks = breaks_width(10)) +
    scale_y_continuous(breaks = breaks_width(1))
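Because Graphic 1 uses a base-10 logarithmic axis, a growth factor shows up as a vertical distance rather than a ratio. The sketch below shows how a fold change between two years relates to that distance. The counts are hypothetical and chosen only to produce a twentyfold increase; they are not the values in the data.

```r
# Hypothetical yearly counts, for illustration only
number_year <- data.frame(year = c(1977, 2008), titles = c(100, 2000))

# Fold change between the two years
fold_change <- number_year$titles[number_year$year == 2008] /
  number_year$titles[number_year$year == 1977]

# Vertical distance between the two points on a log10 axis
log_diff <- diff(log10(number_year$titles))

print(c(fold_change = fold_change, log_diff = round(log_diff, 3)))
```

A twentyfold increase corresponds to about 1.3 units on the log10 axis, since `10^log_diff` recovers the fold change.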

The Coming Era of Microeconomics

Founded in 1920, the NBER was largely a response to heated controversies over income distribution. Back then, economics was just economics; there were no subfields as we know them now. Naturally, macroeconomic issues such as recession and economic growth were the topics of the decade during the Great Depression. Most economics papers focused on macroeconomic issues up to the 1960s. In the 1950s and 1960s, the NBER retained its focus on economic fluctuations, productivity, and economic growth. However, it all changed when John R. Meyer was appointed president. His focus was on microeconomics rather than macroeconomics. From then on, we can observe a shift in the mix of program categories among published papers, as seen in Graphic 2. It is a slightly modified version of Graphic 1: we switch from a logarithmic to a linear scale and split the number of publications by program category.

Graphic 2.

num_cat <- nber %>%
  group_by(year, program_category) %>%
  summarise(freq = n()) %>%
  arrange(desc(freq))
## `summarise()` has grouped output by 'year'. You can override using the `.groups` argument.
num_cat <- na.omit(num_cat)
ggplot(num_cat, aes(x = year, y= freq))+
  geom_area(size =1, aes(group = program_category, fill = program_category)) +
  labs(title = "Number of NBER Publications by Economics Subfields",
       subtitle = "from 1973-2021",
       y = "Number of Titles",
       x = NULL,
       caption = "Source: Tidytuesday")+
  scale_fill_manual(values = c("#666262","#003f5c", "#ffa600")) +
  scale_x_continuous(breaks = breaks_width(10))+
  theme(plot.title = element_text(hjust = 0.5, face = "bold"),
        plot.subtitle = element_text(hjust = 0.5),
        axis.title = element_text(size = 9),
        legend.position = "bottom",
        legend.title = element_text(size=9, face = "bold"),
        axis.line = element_line(),
        panel.border = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())+
  scale_y_continuous(labels=function(x) format(x, big.mark = ",", decimal.mark = ".", scientific = FALSE)) +
  guides(fill=guide_legend(title="Category"))
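The shift between categories is easier to quantify as a share of each year's total than as a raw count. The sketch below computes such shares on a toy version of `num_cat`. The frequencies are hypothetical, and the `share` column is an illustrative addition that is not part of the original analysis.

```r
# Toy version of num_cat with hypothetical frequencies
num_cat <- data.frame(
  year             = c(1980, 1980, 1980, 2020, 2020, 2020),
  program_category = rep(c("Finance", "Macro/International", "Micro"), 2),
  freq             = c(10, 60, 30, 20, 30, 50)
)

# Divide each count by the total for its year
num_cat$share <- num_cat$freq / ave(num_cat$freq, num_cat$year, FUN = sum)
print(num_cat)
```

Mapping `share` to the y aesthetic instead of `freq` would turn the stacked area chart into a 100% stacked chart.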

In line with the wider trend of microeconomics taking over the economics field, microeconomic research, once a minority, now accounts for the most publications each year in the NBER. Unlike macroeconomics, which is divided among schools of thought, microeconomics is largely unified in its assumptions. It has exploded in popularity as an alternative to macroeconomics, which has repeatedly proved unable to predict events. Some researchers have even begun to call macroeconomics a pseudoscience that no longer qualifies as scientific research.

The programs with the most publications in the NBER are Labor Studies, Public Economics, and Economic Fluctuations and Growth, two of which are micro subfields (Graphic 3). Among the top ten programs in the NBER, 4 are from macroeconomics, 4 from microeconomics, and 2 from finance.

Graphic 3.

sub_cat <- nber %>%
  group_by(program_category, program_subcategory) %>%
  summarise(freq = n()) %>%
  arrange(desc(freq)) %>%
  head(10)
## `summarise()` has grouped output by 'program_category'. You can override using the `.groups` argument.
plot2 <- ggplot(data = sub_cat, 
       mapping = aes(x = freq, y = reorder(program_subcategory, freq), group = program_category))+
  geom_col(aes(fill = program_category))+
  labs( title = "Top Economic Fields Publications in NBER Journal",
        subtitle = "from 1973-2021",
        caption = "Source: Tidytuesday",
        x = "Number of Published Papers",
        y = NULL)+
  scale_y_discrete(labels = function(x) str_wrap(x, width = 35))+
  theme_minimal()+
  scale_fill_manual(values = c("#666262", "#003f5c", "#ffa600")) +
  theme(plot.title = element_text(hjust = 0.5, face = "bold"),
        plot.subtitle = element_text(hjust = 0.5),
        axis.title = element_text(size = 9),
        legend.position = "bottom",
        legend.title = element_text(size=9, face = "bold"),
        legend.justification = "center")+
  
  scale_x_continuous(labels=function(x) format(x, big.mark = ",", decimal.mark = ".", scientific = FALSE))

plot2 <- plot2 + guides(fill=guide_legend(title="Category"))
plot2
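The category breakdown of the top programs can be tallied from the same summary table that feeds the chart. The following is a minimal base R sketch on a toy `sub_cat` with hypothetical counts for five programs. It takes the top three by frequency and counts how many fall into each category.

```r
# Toy version of sub_cat with hypothetical counts
sub_cat <- data.frame(
  program_category    = c("Micro", "Micro", "Macro/International", "Finance", "Micro"),
  program_subcategory = c("Labor Studies", "Public Economics",
                          "Economic Fluctuations and Growth",
                          "Asset Pricing", "Children"),
  freq                = c(50, 40, 30, 20, 10)
)

# Sort by descending frequency and keep the top three programs
top3 <- head(sub_cat[order(-sub_cat$freq), ], 3)

# Number of top programs per category
category_counts <- table(top3$program_category)
print(category_counts)
```

Applied to the real top ten, the same `table()` call would reproduce the split between macroeconomics, microeconomics, and finance.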

A Tale of Top Universities and the Male-Dominated World of Economics

Economics is the worst profession for women (Bayer & Rouse, 2016). And the consequences of this aren’t felt only by the women who work in the field and must endure sexist policies and hostile behavior. Government policies would likely look very different were more women involved in drafting them.

The NBER isn’t immune to this stark gender disparity. First, let’s take a look at the most productive authors on the NBER. We assume that the authors with the most publications can be regarded as senior figures in the economics field.

Graphic 4.

author_freq <- nber %>%
  group_by(author_name) %>%
  summarise(freq = n()) %>%
  arrange(desc(freq)) %>%
  head(10)
plot_author <- 
  ggplot(author_freq, aes(x=reorder(author_name, freq), y=freq)) +
  geom_segment(size =1.5, aes(x=reorder(author_name, freq), xend=reorder(author_name, freq), y=0, yend=freq, color ="#003f5c")) +
  geom_point( size=4, color="#003f5c", fill=alpha("#52b3bf", 0.3), alpha=0.7, shape=21, stroke=2) +
  scale_colour_identity()+
  labs( title = "Most Prolific Authors on NBER Journal",
      subtitle = "from 1973-2021",
      caption = "Source: Tidytuesday",
      x = NULL,
      y = "Number of Publications")+
  theme_light() +
  coord_flip() +
  theme(
    panel.grid.major.y = element_blank(),
    panel.border = element_blank(),
    axis.ticks.y = element_blank()
  )+
  theme(plot.title = element_text(hjust = 0.5, face = "bold"),
        plot.subtitle = element_text(hjust = 0.5),
        axis.title = element_text(size=9),
        legend.position = "none")

plot_author

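Only one woman, Janet M Currie, makes the list. What about the top 20, or the top 30? To find out, I compiled the affiliation, RePEc rank, and gender of the 30 authors with the most publications on the NBER.

# author_name, author_affiliation, author_repec_rank, and author_gender are
# hand-compiled vectors with one entry per top-30 author (their definitions are not shown here)
author1 <- data.frame(author_name, author_affiliation, author_repec_rank, author_gender)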
author1$author_gender <- as.factor(author1$author_gender)
author_freq1 <- nber %>%
  group_by(author_name) %>%
  summarise(freq = n()) %>%
  arrange(desc(freq)) %>%
  head(30)
author3 <- author_freq1 %>%
  left_join(author1, by = "author_name")

author3$author_repec_rank <- as.numeric(author3$author_repec_rank)
author_gen <- author3 %>%
  group_by(author_gender) %>%
  summarise(freq =n ()) %>%
  mutate(proportion = freq/sum(freq)) %>%
  arrange(desc(proportion))
# dummy variable for creating plot
author_gen <- author_gen %>%
  mutate(Dummy = "2021")

Graphic 5.

ggplot(data = author_gen, aes(x = proportion, y = Dummy, fill = author_gender))+
  geom_col(position = "fill")+
  labs(title = "Male to Female Ratio in the NBER Top 30 Authors",
       subtitle = "from 1973-2021",
       caption = "Source: Tidytuesday",
       x = NULL,
       y = NULL,
       fill = NULL)+
  theme_minimal()+
  scale_fill_manual(values = c("#666262", "#003f5c", "#ffa600")) +
  theme(plot.title = element_text(hjust = 0.5, face = "bold"),
        plot.subtitle = element_text(hjust = 0.5),
        axis.title = element_text(size = 9),
        legend.position = "bottom",
        legend.title = element_text(size=9, face = "bold"),
        legend.justification = "center",
        axis.text.y = element_blank())

Only 2 of the 30 most prolific authors on the NBER are female. The picture across all NBER affiliates isn’t much better: barely one fifth of NBER members are women. At virtually every level of training and every professional rank within economics, women are a minority. And women are less likely than men to progress along the career path, so the imbalance is more lopsided at senior levels. In economics, men receive tenure at a rate 12 percentage points higher than women do, after controlling for family circumstances and publication records. Women who clear that hurdle are about half as likely as men to be named full professor within seven years. The gap is not limited to gender: just 4% of doctoral degrees in economics were awarded to African-Americans in 2011 (compared with about 8% across all academic fields).
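The share behind the "2 of 30" figure can be computed with a frequency table. The sketch below rebuilds the gender vector from that stated count alone (2 female, 28 male). The variable names are illustrative.

```r
# Gender labels reconstructed from the stated count: 2 of 30 are female
author_gender <- c(rep("Female", 2), rep("Male", 28))

# Proportion of each gender among the top 30 authors
prop <- prop.table(table(author_gender))
print(round(100 * prop, 1))
```

Women make up about 6.7% of the top 30, well below the roughly one fifth share among NBER members overall.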

It is ironic how economics, the study of inequality, is plagued by inequality itself. Science is supposed to be ultimately about meritocracy, but an academic field such as economics is cluttered with barriers to entry. Another example of such inequality is the gap between senior economists from top universities and those from “the others”.

Graphic 6.

author_aff <- author3 %>%
  group_by(author_affiliation) %>%
  summarise(freq =n ()) 

author_aff <- author_aff %>% 
  add_row(author_affiliation = "Others", freq = 6) %>%
  mutate(proportion = freq/sum(freq))

author_aff <- author_aff[-c(1:3, 6:9, 12:13), ] 
ggplot(author_aff, aes(area = proportion, fill = author_affiliation,
                       label = paste(author_affiliation, scales::percent(proportion), sep = "\n"))) +
  geom_treemap() +
  geom_treemap_text(color = "white", place = "center") +
  scale_fill_manual(values = c("#666262", "#666262", "#003f5c", "#666262", "#666262")) +
  labs(title = "Number of Senior Authors in the NBER by University Affiliations",
       subtitle = "from 1973-2021",
       caption = "Source: Tidytuesday") +
  theme(plot.title = element_text(hjust = 0.5, face = "bold"),
        plot.subtitle = element_text(hjust = 0.5),
        legend.position = "none")

Fewer than 1 in 5 of the most productive authors in the NBER come from an institution other than Harvard, UC, MIT, or Chicago. From the beginning, the playing field is tilted in favor of those from prestigious universities. Acceptance into a journal, or even a working paper series, is a steep hill for the average economist.
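One way to compute affiliation shares so that they sum to one is to lump the smaller institutions into "Others" before taking proportions, rather than appending a total and dropping rows by position. The sketch below illustrates this with hypothetical affiliations for ten authors; it is not the hand-compiled top-30 data.

```r
# Hypothetical affiliations, for illustration only
affiliation <- c("Harvard", "Harvard", "MIT", "Chicago", "UC",
                 "Yale", "Stanford", "Harvard", "MIT", "Princeton")

# Institutions to keep as their own group
top <- c("Harvard", "UC", "MIT", "Chicago")

# Lump everything else into "Others", then compute shares
lumped <- ifelse(affiliation %in% top, affiliation, "Others")
prop   <- prop.table(table(lumped))
print(round(100 * prop, 1))
```

Because the lumping happens before `prop.table()`, each author is counted exactly once and the shares add up to 100%.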

Conclusion

The National Bureau of Economic Research has been tremendously important in the modern history of economics. Born in the 1920s, on the brink of the Great Depression, the NBER pursued projects, such as those on business cycles and economic growth, that played a vital role in shaping economics as we know it today. Since the 1970s, the NBER has grown exponentially in both publications and affiliated researchers. Mirroring the current trend in the field, microeconomics is taking over the research agenda, as the predictive ability of macroeconomics is increasingly questioned. However, this change has not yet shaken off the gender gap and institutional bias in the NBER, which mirror those found elsewhere in the economics field.