Overview

The nihexporter package provides the following tables of data:

The nihexporter package is avavilable at https://github.com/jayhesselberth/nihexporter.

Examples

Many of the examples generate project numbers that you can use to look up project specifics in NIH REPORTER http://projectreporter.nih.gov/reporter.cfm

Award amounts

The total.cost column from the projects table has many NA values. You need to use na.rm in aggregation functions to ensure robust handling of these values.

Let’s look at spending over time for NIGMS in years that have such data:

cost.over.time <- projects %>%
  select(institute, total.cost, fiscal.year) %>%
  filter(institute == 'GM') %>%
  group_by(fiscal.year, institute) %>%
  summarize(yearly.cost = sum(total.cost, na.rm = TRUE)) %>%
  filter(yearly.cost > 0)

cost.over.time %>%
  ggplot(aes(x = fiscal.year, y = yearly.cost)) +
  geom_line()

Spending at NIGMS

Let’s look at WHERE the money is going. This example illustrates linking of the project and project.orgs tables via the org.duns column.

money.per.institution <- projects %>%
  filter(org.duns != '' & activity == 'R01') %>%
  select(org.duns, fiscal.year, total.cost) %>%
  group_by(org.duns, fiscal.year) %>%
  summarise(total.award.billions = sum(total.cost, na.rm = TRUE) / 1e9) %>%
  ungroup() %>%
  arrange(desc(total.award.billions)) %>%
  left_join(project.orgs) %>%
  select(fiscal.year, org.name, total.award.billions)
## Warning in left_join_impl(x, y, by$x, by$y): joining factors with different
## levels, coercing to character vector
money.per.institution %>%
  head(10) %>% kable()
fiscal.year org.name total.award.billions
2005 EMORY UNIVERSITY 0.4188344
2005 UNIVERSITY OF PENNSYLVANIA 0.4188344
2005 BOSTON UNIVERSITY 0.4188344
2005 FOX CHASE CANCER CENTER 0.4188344
2005 WISTAR INSTITUTE 0.4188344
2004 EMORY UNIVERSITY 0.4156760
2004 UNIVERSITY OF PENNSYLVANIA 0.4156760
2004 BOSTON UNIVERSITY 0.4156760
2004 FOX CHASE CANCER CENTER 0.4156760
2004 WISTAR INSTITUTE 0.4156760

And WHO are the big winners? Look these PIs up in NIH REPORTER …

money.per.pi <- projects %>%
  filter(activity == 'R01') %>%
  select(project.num, total.cost) %>%
  group_by(project.num) %>%
  summarise(total.award.millions = sum(total.cost, na.rm = TRUE) / 1e6) %>%
  ungroup() %>%
  arrange(desc(total.award.millions)) %>%
  left_join(project.pis) %>%
  filter(pi.id != '') %>%
  group_by(pi.id) %>%
  summarise(pi.millionaire = sum(total.award.millions)) %>%
  arrange(desc(pi.millionaire))  

money.per.pi %>%
  head(10) %>% kable()
pi.id pi.millionaire
1883846 1427.1710
1862760 983.2956
1858142 918.6283
6719627 768.9067
8802713 626.4762
8187946 597.4397
1880471 508.7985
7733544 505.7752
1889062 501.9305
1862962 479.7646

Productivity

In order to measure the “return”" on the money the NIH invests in the research enterprise, we can measure scholarly output (i.e., publications) per dollar invested.

Here we identify th highest performing grants outside of the R01 category. Much has been made of the wasteful spending outside of investigator-initiated research. Here we can see that this is not always the case …

high.perf.not.r01 <- projects %>%
  filter(activity != 'R01') %>%
  group_by(project.num) %>%
  summarise(overall.cost = sum(total.cost, na.rm = TRUE)) %>%
  filter(overall.cost > 1e6) %>%
  left_join(publinks) %>%
  filter(!is.na(pmid)) %>%
  group_by(project.num, overall.cost) %>%
  summarize(n.pubs = n()) %>%
  mutate(cost.per.pub = overall.cost / n.pubs) %>%
  ungroup() %>%
  arrange(cost.per.pub)
## Warning in left_join_impl(x, y, by$x, by$y): joining factors with different
## levels, coercing to character vector
high.perf.not.r01 %>%
  head(10) %>%
  kable()
project.num overall.cost n.pubs cost.per.pub
N01HC055015 1082263 855 1265.805
P30CA007175 2449282 721 3397.062
P30DK025295 2564892 691 3711.855
N01WH022110 1227228 326 3764.503
P30MH030929 1710028 444 3851.414
P30MH031302 1045586 266 3930.774
P01CA018221 2727397 692 3941.325
P41RR001008 1506034 366 4114.847
P01NS010828 1390846 335 4151.779
P40OD010440 1307418 268 4878.425

The highest performer has 855 publications at a cost of 1265.8046784 dollars per publication. Here are some of those publications:

## Warning in left_join_impl(x, y, by$x, by$y): joining factor and character
## vector, coercing into character vector
project.num pmid
N01HC055015 2646917
N01HC055015 2161309
N01HC055015 2141054
N01HC055015 1789804
N01HC055015 1877584
N01HC055015 1442716
N01HC055015 1462967
N01HC055015 1442718
N01HC055015 1342324
N01HC055015 1342298

Here we quantitate the return on R01 investment. One might argue that grants with higher cost.per.pub are less good investments.

costly.pubs.r01 <- projects %>%
  filter(activity == 'R01') %>%
  group_by(project.num) %>%
  summarise(overall.cost = sum(total.cost, na.rm = TRUE)) %>%
  left_join(publinks) %>%
  filter(!is.na(pmid)) %>%
  group_by(project.num, overall.cost) %>%
  summarize(n.pubs = n()) %>%
  mutate(cost.per.pub = overall.cost / n.pubs) %>%
  ungroup() %>%
  arrange(desc(cost.per.pub))
## Warning in left_join_impl(x, y, by$x, by$y): joining factors with different
## levels, coercing to character vector
costly.pubs.r01 %>%
  head(10) %>%
  kable()
project.num overall.cost n.pubs cost.per.pub
R01HL095647 18466884 1 18466884
R01MH085543 12383648 1 12383648
R01MH081107 10682303 1 10682303
R01HL084568 10227118 1 10227118
R01CA079572 9923542 1 9923542
R01HD042350 9400096 1 9400096
R01HL098237 27257346 3 9085782
R01HL105448 8024843 1 8024843
R01ES016443 15566054 2 7783027
R01EY005661 6445863 1 6445863

Duration

I am always impressed at how long people keep their grants. Let’s identify the longest running R01 projects …

long.grants <- projects %>%
  filter(activity == 'R01') %>%
  select(project.num, project.start, project.end) %>%
  group_by(project.num) %>%
  summarize(longest.run = max(project.end) - min(project.start)) %>%
  arrange(desc(longest.run)) %>%
  mutate(in.years = as.numeric(longest.run) / 365)

long.grants %>%
  head(10) %>%
  kable()
project.num longest.run in.years
R01HL009610 18261 days 50.03014
R01HL013629 17166 days 47.03014
R01HD008188 16282 days 44.60822
R01CA013202 15644 days 42.86027
R01EB001960 15400 days 42.19178
R01DA001411 15282 days 41.86849
R01GM023303 15219 days 41.69589
R01HL017964 15156 days 41.52329
R01DC000117 15127 days 41.44384
R01EY002686 15066 days 41.27671

A 50 year R01? Wow.