The nihexporter package provides the following tables of data:
projects: information about funded projects
project.pis: PI information (look these up in NIH REPORTER) for each project.num
project.orgs: links DUNS numbers (org.duns) from projects table to information on specific organizations
publinks: links grants (project.num) to PUBMED IDs (pmid).
The nihexporter package is avavilable at https://github.com/jayhesselberth/nihexporter.
Many of the examples generate project numbers that you can use to look up project specifics in NIH REPORTER http://projectreporter.nih.gov/reporter.cfm
The total.cost column from the projects table has many NA values. You need to use na.rm in aggregation functions to ensure robust handling of these values.
Let’s look at spending over time for NIGMS in years that have such data:
cost.over.time <- projects %>%
select(institute, total.cost, fiscal.year) %>%
filter(institute == 'GM') %>%
group_by(fiscal.year, institute) %>%
summarize(yearly.cost = sum(total.cost, na.rm = TRUE)) %>%
filter(yearly.cost > 0)
cost.over.time %>%
ggplot(aes(x = fiscal.year, y = yearly.cost)) +
geom_line()
Let’s look at WHERE the money is going. This example illustrates linking of the project and project.orgs tables via the org.duns column.
money.per.institution <- projects %>%
filter(org.duns != '' & activity == 'R01') %>%
select(org.duns, fiscal.year, total.cost) %>%
group_by(org.duns, fiscal.year) %>%
summarise(total.award.billions = sum(total.cost, na.rm = TRUE) / 1e9) %>%
ungroup() %>%
arrange(desc(total.award.billions)) %>%
left_join(project.orgs) %>%
select(fiscal.year, org.name, total.award.billions)
## Warning in left_join_impl(x, y, by$x, by$y): joining factors with different
## levels, coercing to character vector
money.per.institution %>%
head(10) %>% kable()
| fiscal.year | org.name | total.award.billions |
|---|---|---|
| 2005 | EMORY UNIVERSITY | 0.4188344 |
| 2005 | UNIVERSITY OF PENNSYLVANIA | 0.4188344 |
| 2005 | BOSTON UNIVERSITY | 0.4188344 |
| 2005 | FOX CHASE CANCER CENTER | 0.4188344 |
| 2005 | WISTAR INSTITUTE | 0.4188344 |
| 2004 | EMORY UNIVERSITY | 0.4156760 |
| 2004 | UNIVERSITY OF PENNSYLVANIA | 0.4156760 |
| 2004 | BOSTON UNIVERSITY | 0.4156760 |
| 2004 | FOX CHASE CANCER CENTER | 0.4156760 |
| 2004 | WISTAR INSTITUTE | 0.4156760 |
And WHO are the big winners? Look these PIs up in NIH REPORTER …
money.per.pi <- projects %>%
filter(activity == 'R01') %>%
select(project.num, total.cost) %>%
group_by(project.num) %>%
summarise(total.award.millions = sum(total.cost, na.rm = TRUE) / 1e6) %>%
ungroup() %>%
arrange(desc(total.award.millions)) %>%
left_join(project.pis) %>%
filter(pi.id != '') %>%
group_by(pi.id) %>%
summarise(pi.millionaire = sum(total.award.millions)) %>%
arrange(desc(pi.millionaire))
money.per.pi %>%
head(10) %>% kable()
| pi.id | pi.millionaire |
|---|---|
| 1883846 | 1427.1710 |
| 1862760 | 983.2956 |
| 1858142 | 918.6283 |
| 6719627 | 768.9067 |
| 8802713 | 626.4762 |
| 8187946 | 597.4397 |
| 1880471 | 508.7985 |
| 7733544 | 505.7752 |
| 1889062 | 501.9305 |
| 1862962 | 479.7646 |
In order to measure the “return”" on the money the NIH invests in the research enterprise, we can measure scholarly output (i.e., publications) per dollar invested.
Here we identify th highest performing grants outside of the R01 category. Much has been made of the wasteful spending outside of investigator-initiated research. Here we can see that this is not always the case …
high.perf.not.r01 <- projects %>%
filter(activity != 'R01') %>%
group_by(project.num) %>%
summarise(overall.cost = sum(total.cost, na.rm = TRUE)) %>%
filter(overall.cost > 1e6) %>%
left_join(publinks) %>%
filter(!is.na(pmid)) %>%
group_by(project.num, overall.cost) %>%
summarize(n.pubs = n()) %>%
mutate(cost.per.pub = overall.cost / n.pubs) %>%
ungroup() %>%
arrange(cost.per.pub)
## Warning in left_join_impl(x, y, by$x, by$y): joining factors with different
## levels, coercing to character vector
high.perf.not.r01 %>%
head(10) %>%
kable()
| project.num | overall.cost | n.pubs | cost.per.pub |
|---|---|---|---|
| N01HC055015 | 1082263 | 855 | 1265.805 |
| P30CA007175 | 2449282 | 721 | 3397.062 |
| P30DK025295 | 2564892 | 691 | 3711.855 |
| N01WH022110 | 1227228 | 326 | 3764.503 |
| P30MH030929 | 1710028 | 444 | 3851.414 |
| P30MH031302 | 1045586 | 266 | 3930.774 |
| P01CA018221 | 2727397 | 692 | 3941.325 |
| P41RR001008 | 1506034 | 366 | 4114.847 |
| P01NS010828 | 1390846 | 335 | 4151.779 |
| P40OD010440 | 1307418 | 268 | 4878.425 |
The highest performer has 855 publications at a cost of 1265.8046784 dollars per publication. Here are some of those publications:
## Warning in left_join_impl(x, y, by$x, by$y): joining factor and character
## vector, coercing into character vector
| project.num | pmid |
|---|---|
| N01HC055015 | 2646917 |
| N01HC055015 | 2161309 |
| N01HC055015 | 2141054 |
| N01HC055015 | 1789804 |
| N01HC055015 | 1877584 |
| N01HC055015 | 1442716 |
| N01HC055015 | 1462967 |
| N01HC055015 | 1442718 |
| N01HC055015 | 1342324 |
| N01HC055015 | 1342298 |
Here we quantitate the return on R01 investment. One might argue that grants with higher cost.per.pub are less good investments.
costly.pubs.r01 <- projects %>%
filter(activity == 'R01') %>%
group_by(project.num) %>%
summarise(overall.cost = sum(total.cost, na.rm = TRUE)) %>%
left_join(publinks) %>%
filter(!is.na(pmid)) %>%
group_by(project.num, overall.cost) %>%
summarize(n.pubs = n()) %>%
mutate(cost.per.pub = overall.cost / n.pubs) %>%
ungroup() %>%
arrange(desc(cost.per.pub))
## Warning in left_join_impl(x, y, by$x, by$y): joining factors with different
## levels, coercing to character vector
costly.pubs.r01 %>%
head(10) %>%
kable()
| project.num | overall.cost | n.pubs | cost.per.pub |
|---|---|---|---|
| R01HL095647 | 18466884 | 1 | 18466884 |
| R01MH085543 | 12383648 | 1 | 12383648 |
| R01MH081107 | 10682303 | 1 | 10682303 |
| R01HL084568 | 10227118 | 1 | 10227118 |
| R01CA079572 | 9923542 | 1 | 9923542 |
| R01HD042350 | 9400096 | 1 | 9400096 |
| R01HL098237 | 27257346 | 3 | 9085782 |
| R01HL105448 | 8024843 | 1 | 8024843 |
| R01ES016443 | 15566054 | 2 | 7783027 |
| R01EY005661 | 6445863 | 1 | 6445863 |
I am always impressed at how long people keep their grants. Let’s identify the longest running R01 projects …
long.grants <- projects %>%
filter(activity == 'R01') %>%
select(project.num, project.start, project.end) %>%
group_by(project.num) %>%
summarize(longest.run = max(project.end) - min(project.start)) %>%
arrange(desc(longest.run)) %>%
mutate(in.years = as.numeric(longest.run) / 365)
long.grants %>%
head(10) %>%
kable()
| project.num | longest.run | in.years |
|---|---|---|
| R01HL009610 | 18261 days | 50.03014 |
| R01HL013629 | 17166 days | 47.03014 |
| R01HD008188 | 16282 days | 44.60822 |
| R01CA013202 | 15644 days | 42.86027 |
| R01EB001960 | 15400 days | 42.19178 |
| R01DA001411 | 15282 days | 41.86849 |
| R01GM023303 | 15219 days | 41.69589 |
| R01HL017964 | 15156 days | 41.52329 |
| R01DC000117 | 15127 days | 41.44384 |
| R01EY002686 | 15066 days | 41.27671 |
A 50 year R01? Wow.