The National Bureau of Economic Research (NBER) is a think tank committed to “undertaking and disseminating unbiased economic research among public policymakers, business professionals, and the academic community.” Its contribution to the history of modern economics is hard to overstate; the NBER is best known for dating the start and end of recessions in the United States. Not surprisingly, considering its success, 22 winners of the Nobel Prize in economics have been part of the NBER, including Milton Friedman, Paul Krugman, and Joseph Stiglitz.
The more than 1,600 economists who are NBER researchers are the leading scholars in their fields. Most NBER-affiliated researchers are either Faculty Research Fellows (FRFs) or Research Associates (RAs). Faculty Research Fellows are typically junior scholars. Research Associates, whose appointments are approved by the NBER Board of Directors, hold tenured positions at their home institutions.
NBER working papers present new research by NBER affiliates, circulated for discussion and comment. The NBER distributes more than 1,200 working papers each year. These papers have not been peer reviewed. Papers issued more than 18 months ago are open access. More recent papers are available without charge to affiliates of subscribing academic institutions, employees of NBER Corporate Associates, government employees in the US, journalists, and residents of low-income countries.
library(nberwp)
library(tidyverse)
library(tidytuesdayR)
library(lubridate)
library(ggplot2)
library(stringr)
library(CGPfunctions)
library(scales)
library(ggpubr)
library(treemapify)
theme_set(theme_pubr())
Read data
tuesdata <- tidytuesdayR::tt_load('2021-09-28')
papers <- tuesdata$papers
authors <- tuesdata$authors
programs <- tuesdata$programs
paper_authors <- tuesdata$paper_authors
paper_programs <- tuesdata$paper_programs
Data Structuring
# joining needed data into a single dataframe
paper_authors <- paper_authors %>%
  left_join(papers, by = "paper")
paper_authors <- paper_authors %>%
  left_join(authors, by = "author")
paper_authors <- paper_authors %>%
  left_join(paper_programs, by = "paper")
nber <- paper_authors %>%
  left_join(programs, by = "program")
nber
str(nber)
## spec_tbl_df [130,081 x 11] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ paper : chr [1:130081] "w0001" "w0002" "w0003" "w0004" ...
## $ author : chr [1:130081] "w0001.1" "w0002.1" "w0003.1" "w0004.1" ...
## $ year : num [1:130081] 1973 1973 1973 1973 1973 ...
## $ month : num [1:130081] 6 6 6 7 7 7 8 9 9 9 ...
## $ title : chr [1:130081] "Education, Information, and Efficiency" "Hospital Utilization: An Analysis of SMSA Differences in Hospital Admission Rates, Occupancy Rates and Bed Rates" "Error Components Regression Models and Their Applications" "Human Capital Life Cycle of Earnings Models: A Specific Solution and Estimation" ...
## $ name : chr [1:130081] "Finis Welch" "Barry R Chiswick" "Swarnjit S Arora" "Lee A Lillard" ...
## $ user_nber : chr [1:130081] "finis_welch" "barry_chiswick" "swarnjit_arora" NA ...
## $ user_repec : chr [1:130081] NA "pch425" NA "pli669" ...
## $ program : chr [1:130081] NA NA NA NA ...
## $ program_desc : chr [1:130081] NA NA NA NA ...
## $ program_category: chr [1:130081] NA NA NA NA ...
## - attr(*, "spec")=
## .. cols(
## .. paper = col_character(),
## .. author = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
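Because a paper can have several authors and belong to several programs, the chained left joins above yield one row per paper-author-program combination rather than one row per paper. A minimal base-R sketch of that effect, assuming made-up toy tables (the IDs and program codes below are hypothetical, not taken from the dataset) and using `merge(all.x = TRUE)` as a stand-in for `left_join()`:

```r
# Toy link tables (hypothetical values, for illustration only)
toy_paper_authors <- data.frame(
  paper  = c("p1", "p1", "p2"),
  author = c("a1", "a2", "a1")
)
toy_paper_programs <- data.frame(
  paper   = c("p1", "p1", "p2"),
  program = c("LS", "PE", "LS")
)

# Base-R equivalent of a left join on "paper"
toy_joined <- merge(toy_paper_authors, toy_paper_programs,
                    by = "paper", all.x = TRUE)

nrow(toy_joined)                  # rows after the join
length(unique(toy_joined$paper))  # distinct papers
```

Paper p1 has two authors and two programs, so it alone contributes four rows. Row counts on the joined table therefore measure combinations, not papers.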
Data Dictionary
paper: Paper ID
author: Author ID
year: Publish year
month: Publish month
title: Title of the paper
user_nber: Author username on NBER
user_repec: Author username on RePEc
name: Author name
program: Program ID
program_desc: Subcategory of the paper
program_category: Category of the paper
Data Cleaning
Since we won’t be using the user_nber and user_repec data, we will remove these variables.
nber <- nber[-c(7,8)]
Handling missing values
# checking for missing value.
colSums((is.na(nber)))
## paper author year month
## 0 0 0 0
## title name program program_desc
## 0 0 530 530
## program_category
## 1516
The number of missing values is small relative to the size of the dataset, so we will ignore them.
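To put "small" into numbers, the counts from the `colSums()` output can be turned into percentages of the 130,081 rows. This is only an arithmetic sketch; the variable names are made up for illustration:

```r
# Counts copied from the colSums() output above
n_rows    <- 130081
n_missing <- c(program = 530, program_desc = 530, program_category = 1516)

# Percentage of rows with a missing value in each column
pct_missing <- round(100 * n_missing / n_rows, 2)
pct_missing
```

Every column is missing in under 1.2% of rows, which supports leaving the NA values in place.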
Data Transformation
Now we create a date column from the year and month columns.
nber <- transform(nber, Date = as.Date(paste(year, month, 1, sep = "-")))
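The date construction works because `as.Date()` parses strings of the form year-month-day even when the month has a single digit. A small base-R sketch with two year/month pairs from the dataset's range (the variable names are illustrative):

```r
# Two year/month pairs taken from the dataset's date range
year  <- c(1973, 2021)
month <- c(6, 6)

# Paste into "YYYY-M-D" strings, then parse; the day is fixed to 1
publish_date <- as.Date(paste(year, month, 1, sep = "-"))
publish_date

# month.abb is a built-in constant, so this label does not depend on the locale
month.abb[as.integer(format(publish_date, "%m"))]
```

Fixing the day to 1 is a convention only; the data carry no day-of-month information.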
To analyze monthly data year on year, we create a publish_month variable.
nber$publish_month <- month(nber$Date, label = TRUE, abbr = TRUE)
Afterwards, renaming columns to be more understandable is considered best practice.
nber <- rename(nber,
  paper_id = paper,
  author_id = author,
  publish_date = Date,
  author_name = name,
  program_subcategory = program_desc,
  program_id = program
)
Several columns are categorical and should therefore be converted to the factor data type.
nber <- nber %>%
  mutate(across(c(author_name, program_id, program_subcategory, program_category), as.factor))
We can remove the redundant month variable.
nber <- nber[-c(4)]
Then we rearrange the columns in the dataframe.
nber <- nber[, c(1,4,2,5,8,6,7,9,10,3)]
Let’s see what the final dataset looks like after wrangling.
str(nber)
## 'data.frame': 130081 obs. of 10 variables:
## $ paper_id : chr "w0001" "w0002" "w0003" "w0004" ...
## $ title : chr "Education, Information, and Efficiency" "Hospital Utilization: An Analysis of SMSA Differences in Hospital Admission Rates, Occupancy Rates and Bed Rates" "Error Components Regression Models and Their Applications" "Human Capital Life Cycle of Earnings Models: A Specific Solution and Estimation" ...
## $ author_id : chr "w0001.1" "w0002.1" "w0003.1" "w0004.1" ...
## $ author_name : Factor w/ 15398 levels "A Abigail Payne",..: 4604 1505 13732 8457 6149 14475 8548 9851 12115 8457 ...
## $ program_category : Factor w/ 3 levels "Finance","Macro/International",..: NA NA NA NA NA NA NA NA NA NA ...
## $ program_id : Factor w/ 21 levels "AG","AP","CF",..: NA NA NA NA NA NA NA NA NA NA ...
## $ program_subcategory: Factor w/ 21 levels "Asset Pricing",..: NA NA NA NA NA NA NA NA NA NA ...
## $ publish_date : Date, format: "1973-06-01" "1973-06-01" ...
## $ publish_month : Ord.factor w/ 12 levels "Jan"<"Feb"<"Mar"<..: 6 6 6 7 7 7 8 9 9 9 ...
## $ year : num 1973 1973 1973 1973 1973 ...
# map each publication year to a decade label
pw <- function(x){
  if(x %in% 1970:1979){
    x <- "1970s"
  } else if(x %in% 1980:1989){
    x <- "1980s"
  } else if(x %in% 1990:1999){
    x <- "1990s"
  } else if(x %in% 2000:2009){
    x <- "2000s"
  } else if(x %in% 2010:2021){
    # 2020 and 2021 are grouped with the 2010s
    x <- "2010s"
  }
  x
}
nber$decade <- sapply(as.character(nber$year), FUN = pw)
nber$decade <- as.factor(nber$decade)
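The same labels can be produced without an if/else chain. The sketch below is an alternative, not the function used above; `decade_label` is a hypothetical helper that uses integer division and caps the result at 2010 so that 2020 and 2021 stay in the "2010s" group:

```r
# Vectorised decade labels via integer division (a sketch, not the original function)
decade_label <- function(year) {
  decade_start <- (year %/% 10) * 10
  # Cap at 2010 so 2020 and 2021 fall under "2010s", matching the grouping above
  decade_start <- pmin(decade_start, 2010)
  paste0(decade_start, "s")
}

decade_label(c(1973, 1989, 2000, 2019, 2021))
```

Because the helper is vectorised, it could be applied to the whole year column in one call instead of through `sapply()`.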
summary(nber)
## paper_id title author_id
## Length:130081 Length:130081 Length:130081
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## author_name program_category program_id
## Jonathan Gruber : 359 Finance :12957 LS :14084
## James J Heckman : 331 Macro/International:35929 PE :13967
## Daron Acemoglu : 308 Micro :79679 EFG :13113
## Janet M Currie : 306 NA's : 1516 IFM : 8570
## Michael D Bordo : 297 ITI : 7125
## Edward L Glaeser: 291 (Other):72692
## (Other) :128189 NA's : 530
## program_subcategory publish_date
## Labor Studies :14084 Min. :1973-06-01
## Public Economics :13967 1st Qu.:2005-10-01
## Economic Fluctuations and Growth :13113 Median :2013-03-01
## International Finance and Macroeconomics: 8570 Mean :2010-07-13
## International Trade and Investment : 7125 3rd Qu.:2018-03-01
## (Other) :72692 Max. :2021-06-01
## NA's : 530
## publish_month year decade
## May :12102 Min. :1973 1970s: 627
## Jun :12090 1st Qu.:2005 1980s: 6247
## Oct :11234 Median :2013 1990s:11231
## Dec :11139 Mean :2010 2000s:31403
## Sep :11044 3rd Qu.:2018 2010s:80573
## Jan :11018 Max. :2021
## (Other):61454
The nber dataset consists of about 130 thousand rows, one for each paper-author-program combination, covering NBER working papers published from 1973 to 2021.
length(unique(nber$author_id))
## [1] 15437
Around 15 thousand authors published in the NBER working paper series during 1973-2021.
unique(nber$program_category)
## [1] <NA> Macro/International Micro
## [4] Finance
## Levels: Finance Macro/International Micro
Each program category corresponds loosely to a traditional field of economics: Finance, Macro/International, and Micro.
unique(nber$program_subcategory)
## [1] <NA>
## [2] Economic Fluctuations and Growth
## [3] International Finance and Macroeconomics
## [4] International Trade and Investment
## [5] Public Economics
## [6] Labor Studies
## [7] Health Economics
## [8] Monetary Economics
## [9] Productivity, Innovation, and Entrepreneurship
## [10] Law and Economics
## [11] Children
## [12] Corporate Finance
## [13] Economics of Aging
## [14] Development of the American Economy
## [15] Environment and Energy Economics
## [16] Industrial Organization
## [17] Asset Pricing
## [18] Health Care
## [19] Economics of Education
## [20] Political Economics
## [21] Technical Working Papers
## [22] Development Economics
## 21 Levels: Asset Pricing Children Corporate Finance ... Technical Working Papers
Ignoring NA, there are 21 subcategories of research programs in the NBER working paper series.
Martin S. Feldstein, a professor at Harvard, became president of the NBER in 1977. He altered every aspect of the NBER. He moved the headquarters from New York to Cambridge. He also advanced the NBER’s role in disseminating economic research by developing the NBER Working Paper Series into a leading source of pre-publication findings. In 1977, when he started, there were 142 such papers. In 2008, by the close of his term, there were 14,000. During Feldstein’s term, the number of publications multiplied twentyfold (Graphic 1).
Graphic 1.
# n() counts rows (paper-author-program combinations) per year
number_year <- nber %>%
  group_by(year) %>%
  summarise(titles = n()) %>%
  mutate(log_titles = log10(titles))
ggplot(number_year, aes(x = year, y = log_titles))+
geom_line(size =1, color = "#003f5c")+
labs(title = "Number of NBER Publications",
subtitle = "from 1973-2021",
y = "Logarithmic scale of number of titles",
x = NULL,
caption = "Source: Tidytuesday")+
theme(plot.title = element_text(hjust = 0, face = "bold"),
plot.subtitle = element_text(hjust = 0),
axis.title = element_text(size=9),
legend.position = "bottom",
legend.title = element_text(size=9, face = "bold"),
axis.line = element_line(),
panel.border = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank())+
scale_x_continuous(breaks = breaks_width(10)) +
scale_y_continuous(breaks = breaks_width(1))
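Two details help when reading Graphic 1. First, `n()` counts rows per year, which can differ from the number of distinct papers. Second, the y-axis shows `log10` of the count, so a value of 2 means 100 and a value of 3 means 1,000. A base-R sketch on a toy table (the values are hypothetical) illustrates both points:

```r
# Toy rows: one per paper-author-program combination (hypothetical values)
toy <- data.frame(
  year     = c(1973, 1973, 1973, 1974, 1974),
  paper_id = c("w1", "w1", "w2", "w3", "w3")
)

# Rows per year, which is what n() counts
rows_per_year <- table(toy$year)

# Distinct papers per year
papers_per_year <- tapply(toy$paper_id, toy$year,
                          function(x) length(unique(x)))

rows_per_year
papers_per_year

# The plotted value is log10 of the count; 10^value recovers the count
log10(c(100, 1000, 10000))
```

Either count can be plotted; they answer slightly different questions about publication volume.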
Founded in 1920, the NBER was largely a response to heated controversies over income distribution. Back then, economics was just economics; there were none of the subfields we know now. Naturally, macroeconomic issues such as recession and economic growth were the topics of the decade during the Great Depression. Most economics papers focused on macroeconomic issues up to the 1960s. In the 1950s and 1960s, the NBER retained its focus on economic fluctuations, productivity, and economic growth. However, that changed when John R. Meyer was appointed president. He focused on microeconomics rather than macroeconomics. From then on, we can observe a shift in the proportion of subfields among published papers, as seen in Graphic 2. It is a slightly modified version of Graphic 1: we swapped the logarithmic scale for levels and split the number of publications by economic subfield.
Graphic 2.
num_cat <- nber %>%
  group_by(year, program_category) %>%
summarise(freq = n()) %>%
arrange(desc(freq))
## `summarise()` has grouped output by 'year'. You can override using the `.groups` argument.
num_cat <- na.omit(num_cat)
ggplot(num_cat, aes(x = year, y= freq))+
geom_area(size =1, aes(group = program_category, fill = program_category)) +
labs(title = "Number of NBER Publications by Economics Subfields",
subtitle = "from 1973-2021",
y = "Number of Titles",
x = NULL,
caption = "Source: Tidytuesday")+
scale_fill_manual(values = c("#666262","#003f5c", "#ffa600")) +
scale_x_continuous(breaks = breaks_width(10))+
theme(plot.title = element_text(hjust = 0.5, face = "bold"),
plot.subtitle = element_text(hjust = 0.5),
axis.title = element_text(size = 9),
legend.position = "bottom",
legend.title = element_text(size=9, face = "bold"),
axis.line = element_line(),
panel.border = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank())+
scale_y_continuous(labels=function(x) format(x, big.mark = ",", decimal.mark = ".", scientific = FALSE)) +
guides(fill=guide_legend(title="Category"))
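The area chart shows levels, so a shift in proportions has to be read off by eye. One way to make the shares explicit is to divide each year's counts by that year's total. A minimal base-R sketch with made-up counts (the numbers below are hypothetical and only illustrate the calculation):

```r
# Toy counts per year and category (hypothetical numbers)
toy_counts <- matrix(
  c(10, 30, 60,
    20, 20, 160),
  nrow = 2, byrow = TRUE,
  dimnames = list(year = c("1980", "2020"),
                  category = c("Finance", "Macro/International", "Micro"))
)

# Row-wise proportions: each year's shares sum to 1
shares <- prop.table(toy_counts, margin = 1)
round(shares, 2)
```

With real data, the same idea could be applied to the yearly counts per category before plotting.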
Following the broader trend of microeconomics taking over the field, microeconomics research, once a minority, has taken the lead in the number of NBER publications each year. Unlike macroeconomics, which is divided among schools of thought, microeconomics is largely unified in its assumptions. It has exploded in popularity as an alternative to macroeconomics, which continues to struggle to predict events. Some researchers have even begun to call macroeconomics a pseudoscience that no longer qualifies as scientific research.
The programs with the most publications in the NBER are Labor Studies, Public Economics, and Economic Fluctuations and Growth, two of which are microeconomics subfields (Graphic 3). Among the top ten programs in the NBER, 4 are from macroeconomics, 4 from microeconomics, and 2 from finance.
Graphic 3.
sub_cat <- nber %>%
  group_by(program_category, program_subcategory) %>%
summarise(freq = n()) %>%
arrange(desc(freq)) %>%
head(10)
## `summarise()` has grouped output by 'program_category'. You can override using the `.groups` argument.
plot2 <- ggplot(data = sub_cat,
  mapping = aes(x = freq, y = reorder(program_subcategory, freq), group = program_category))+
geom_col(aes(fill = program_category))+
labs( title = "Top Economic Fields Publications in NBER Journal",
subtitle = "from 1973-2021",
caption = "Source: Tidytuesday",
x = "Number of Published Papers",
y = NULL)+
scale_y_discrete(labels = function(x) str_wrap(x, width = 35))+
theme_minimal()+
scale_fill_manual(values = c("#666262", "#003f5c", "#ffa600")) +
theme(plot.title = element_text(hjust = 0.5, face = "bold"),
plot.subtitle = element_text(hjust = 0.5),
axis.title = element_text(size = 9),
legend.position = "bottom",
legend.title = element_text(size=9, face = "bold"),
legend.justification = "center")+
scale_x_continuous(labels=function(x) format(x, big.mark = ",", decimal.mark = ".", scientific = FALSE))
plot2 <- plot2 + guides(fill=guide_legend(title="Category"))
plot2
Economics is the worst profession for women (Bayer & Rouse, 2016). The consequences aren’t felt only by the women who work in the field and must endure sexist policies and hostile behavior. Government policies would likely look very different were more women involved in drafting them.
The NBER isn’t immune to this stark gender disparity. First, let’s take a look at the most productive authors in the NBER. We assume that the authors with the most publications can be considered senior figures in economics.
Graphic 4.
author_freq <- nber %>%
  group_by(author_name) %>%
summarise(freq = n()) %>%
arrange(desc(freq)) %>%
head(10)
plot_author <-
  ggplot(author_freq, aes(x=reorder(author_name, freq), y=freq)) +
geom_segment(size =1.5, aes(x=reorder(author_name, freq), xend=reorder(author_name, freq), y=0, yend=freq, color ="#003f5c")) +
geom_point( size=4, color="#003f5c", fill=alpha("#52b3bf", 0.3), alpha=0.7, shape=21, stroke=2) +
scale_colour_identity()+
labs( title = "Most Prolific Authors on NBER Journal",
subtitle = "from 1973-2021",
caption = "Source: Tidytuesday",
x = NULL,
y = "Number of Publication")+
theme_light() +
coord_flip() +
theme(
panel.grid.major.y = element_blank(),
panel.border = element_blank(),
axis.ticks.y = element_blank()
) +
theme(plot.title = element_text(hjust = 0.5, face = "bold"),
plot.subtitle = element_text(hjust = 0.5),
axis.title = element_text(size=9),
legend.position = "none")
plot_author
Women are scarce on this list; Janet M Currie, fourth by the counts in the summary above, is a rare exception. What about the top 20? The top 30? To check, I compiled the top 30 authors with the most paper publications on the NBER.
# author_name, author_affiliation, author_repec_rank and author_gender are
# hand-compiled vectors for the top 30 authors (their definitions are not shown here)
author1 <- data.frame(author_name, author_affiliation, author_repec_rank, author_gender)
author1$author_gender <- as.factor(author1$author_gender)
author_freq1 <- nber %>%
  group_by(author_name) %>%
  summarise(freq = n()) %>%
  arrange(desc(freq)) %>%
  head(30)
author3 <- author_freq1 %>%
  left_join(author1, by = "author_name")
author3$author_repec_rank <- as.numeric(author3$author_repec_rank)
author_gen <- author3 %>%
  group_by(author_gender) %>%
  summarise(freq = n()) %>%
  mutate(proportion = freq/sum(freq)) %>%
  arrange(desc(proportion))
# dummy variable for creating plot
author_gen <- author_gen %>%
  mutate(Dummy = "2021")
Graphic 5.
ggplot(data = author_gen, aes(x = proportion, y = Dummy, fill = author_gender))+
geom_col(position = "fill")+
labs(title = "Male to Female Ratio in the NBER Top 30 Authors",
subtitle = "from 1973-2021",
caption = "Source: Tidytuesday",
x = NULL,
y = NULL,
fill = NULL)+
theme_minimal()+
scale_fill_manual(values = c("#666262", "#003f5c", "#ffa600")) +
theme(plot.title = element_text(hjust = 0.5, face = "bold"),
plot.subtitle = element_text(hjust = 0.5),
axis.title = element_text(size = 9),
legend.position = "bottom",
legend.title = element_text(size=9, face = "bold"),
legend.justification = "center",
axis.text.y = element_blank())
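The proportion column behind Graphic 5 is a simple share of counts. As a sketch, assuming the split the accompanying text reports (2 women among the 30 most prolific authors) and hypothetical level names "Female" and "Male", the same shares can be reproduced in base R:

```r
# Counts as reported in the text: 2 women among the 30 most prolific authors
# (the level names are assumptions; the original labels are not shown)
author_gender <- factor(c(rep("Female", 2), rep("Male", 28)))

gender_share <- prop.table(table(author_gender))
round(100 * gender_share, 1)
```

Under these assumptions women account for roughly 6.7% of the list.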
Only 2 of the 30 most prolific authors on the NBER are female. The picture across all NBER affiliates isn’t looking good either: barely one fifth of NBER members are women. At virtually every level of training and every professional rank within economics, women are a minority. Women are also less likely than men to progress along the career path, so the imbalance is more lopsided at senior levels. In economics, men receive tenure at a rate 12 percentage points higher than women do, after controlling for family circumstances and publication records. Women who clear that hurdle are about half as likely as men to be named full professor within seven years. Just 4% of doctoral degrees in economics were awarded to African-Americans in 2011 (compared with about 8% across all academic fields).
It is ironic that economics, a field that studies inequality, is itself plagued by inequality. Science is supposed to be a meritocracy, but academic economics is cluttered with barriers to entry. Another example is the gap between senior economists from top universities and those from “the others”.
Graphic 6.
author_aff <- author3 %>%
  group_by(author_affiliation) %>%
  summarise(freq = n())
# add a manual "Others" row, then compute shares over all rows
author_aff <- author_aff %>%
  add_row(author_affiliation = "Others", freq = 6) %>%
  mutate(proportion = freq/sum(freq))
# drop the rows that are not shown in the treemap
author_aff <- author_aff[-c(1:3, 6:9, 12:13), ]
ggplot(author_aff, aes(area = proportion, fill = author_affiliation,
label = paste(author_affiliation, scales::percent(proportion), sep = "\n"))) +
geom_treemap() +
geom_treemap_text(color = "white", place = "center") +
scale_fill_manual(values = c("#666262", "#666262", "#003f5c", "#666262", "#666262")) +
labs(title = "Number of Senior Authors in the NBER by University Affiliations",
subtitle = "from 1973-2021",
caption = "Source: Tidytuesday") +
theme(plot.title = element_text(hjust = 0.5, face = "bold"),
plot.subtitle = element_text(hjust = 0.5),
legend.position = "none")
Not even 1 in 5 of the most productive authors in the NBER is from an institution other than Harvard, UC, MIT, and Chicago. From the beginning, the playing field is tilted in favor of those from prestigious universities. Acceptance into a journal, or even a working paper series, is a steep hill for the average economist.
The National Bureau of Economic Research has been tremendously important in the modern history of economics. Born in the 1920s, on the brink of the turbulent Great Depression, the projects of the NBER founders, such as those on business cycles and economic growth, played a vital role in shaping economics as we know it today. Since the 1970s, the NBER has been growing exponentially in both publications and researcher affiliations. Mirroring the current trend in the field, microeconomics is taking over the research agenda, as the predictive ability of macroeconomics is being questioned. However, this change has not managed to shake off the gender gap and institutional bias in the NBER, which mirror those found everywhere else in economics.
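Whether "not even 1 in 5" holds depends on the denominator used for the "Others" share. The sketch below only works through the arithmetic for the two possible readings of the code above; which one applies cannot be determined from the text, and the variable names are illustrative:

```r
top_n  <- 30  # authors in the hand-compiled top-30 list
others <- 6   # frequency assigned to the "Others" row in the code above

# If the 6 "Others" are part of the 30, their share is exactly one fifth
share_within <- others / top_n

# If the row is added on top of rows that already sum to 30,
# the denominator becomes 36 and every share shrinks
share_on_top <- others / (top_n + others)

c(within = share_within, on_top = share_on_top)
```

The two readings give 20% and about 16.7%, so the treemap percentages are worth checking against the underlying counts.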