Overview

What are the rates of publication of papers on various topics?

R Markdown

This is an R Markdown file, a format that interweaves three things:

  • Code in R (or another programming language!)
  • Statistical or scientific reasoning about the code we write, and
  • The output of that code

… all in one place. You can export this mix of human language, computer language, and code output to various formats: PDF, Microsoft Word, or HTML. Are you new to R Markdown? Here are some great resources:

  • R Markdown was developed by RStudio. Read their descriptions and examples on their site.
  • The RStudio gurus wrote a great book about it – check it out!

Load Packages

Here we add the packages we’ll use. tidyverse helps reshape data, easyPubMed simplifies the use of the PubMed API, and printr allows us to show data frames more attractively.

library(tidyverse)
library(easyPubMed)
library(printr)

Create Search Terms

Let’s create a data frame that contains every combination of the variables we want to search for. First, we’ll define the years and the three search terms that we’ll combine.

We’ll build the combinations using expand_grid() from the tidyr package, which is loaded as part of the tidyverse.

years <- 2012:2023

terms <- c(
  "medicine disparity",
  "medicine racism",
  "medicine racial bias"
)

search_terms <- expand_grid(year = years,
                            term = terms)

Let’s peek!

head(search_terms, 10)
year term
2012 medicine disparity
2012 medicine racism
2012 medicine racial bias
2013 medicine disparity
2013 medicine racism
2013 medicine racial bias
2014 medicine disparity
2014 medicine racism
2014 medicine racial bias
2015 medicine disparity
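
As a quick sanity check on the grid, the number of rows should equal the number of years times the number of terms. The sketch below rebuilds the same combinations with base R's expand.grid() (a stand-in used only for cross-checking, not the tidyr call this document uses) and confirms the count. The name check_grid is introduced here purely for illustration.

```r
# Same inputs as in the document
years <- 2012:2023
terms <- c("medicine disparity", "medicine racism", "medicine racial bias")

# expand.grid() varies its FIRST argument fastest, so `term` is listed first
# to get rows ordered by year; the columns are then put back in year/term order.
check_grid <- expand.grid(term = terms, year = years,
                          stringsAsFactors = FALSE)
check_grid <- check_grid[, c("year", "term")]

# One row per (year, term) pair: 12 years * 3 terms
expected_rows <- length(years) * length(terms)
stopifnot(nrow(check_grid) == expected_rows)
```

If the row count ever differs from the product of the input lengths, a year or term was probably duplicated or dropped before the grid was built.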

OK, now we’ll pad those search terms with the text that the API requires:

search_terms <- search_terms %>%
  mutate(final = paste(year, 
                       "[Date - Publication]",
                       " AND ",
                       term,
                       sep = ""
                       ))

And let’s look again:

head(search_terms, 20)
year term final
2012 medicine disparity 2012[Date - Publication] AND medicine disparity
2012 medicine racism 2012[Date - Publication] AND medicine racism
2012 medicine racial bias 2012[Date - Publication] AND medicine racial bias
2013 medicine disparity 2013[Date - Publication] AND medicine disparity
2013 medicine racism 2013[Date - Publication] AND medicine racism
2013 medicine racial bias 2013[Date - Publication] AND medicine racial bias
2014 medicine disparity 2014[Date - Publication] AND medicine disparity
2014 medicine racism 2014[Date - Publication] AND medicine racism
2014 medicine racial bias 2014[Date - Publication] AND medicine racial bias
2015 medicine disparity 2015[Date - Publication] AND medicine disparity
2015 medicine racism 2015[Date - Publication] AND medicine racism
2015 medicine racial bias 2015[Date - Publication] AND medicine racial bias
2016 medicine disparity 2016[Date - Publication] AND medicine disparity
2016 medicine racism 2016[Date - Publication] AND medicine racism
2016 medicine racial bias 2016[Date - Publication] AND medicine racial bias
2017 medicine disparity 2017[Date - Publication] AND medicine disparity
2017 medicine racism 2017[Date - Publication] AND medicine racism
2017 medicine racial bias 2017[Date - Publication] AND medicine racial bias
2018 medicine disparity 2018[Date - Publication] AND medicine disparity
2018 medicine racism 2018[Date - Publication] AND medicine racism
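
The padding step can also be wrapped in a small helper so the query format lives in one place. This is only a sketch: build_query is a hypothetical name, not part of easyPubMed or the tidyverse, and it assumes the same "[Date - Publication]" field tag and AND operator used above.

```r
# Hypothetical helper: combine a year and a term into one PubMed query string.
# paste0() is vectorized, so this also works on whole columns.
build_query <- function(year, term) {
  paste0(year, "[Date - Publication] AND ", term)
}

# A single query
one_query <- build_query(2012, "medicine disparity")

# Several years for the same term
many_queries <- build_query(2012:2014, "medicine racism")
```

Inside the pipeline above, the same helper could be used as mutate(final = build_query(year, term)).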

Search in PubMed

Now we’ll make a short function that returns the count of results for a given term:

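count_results <- function(term) {
  # Run the search on PubMed; the result includes the total hit count
  results <- get_pubmed_ids(term)
  # Convert the reported count to an integer
  count <- as.integer(results$Count)
  return(count)
}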
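
Because every call goes over the network, a single failed request would otherwise stop the whole run. One possible variation, sketched below, returns NA instead of failing. The names safe_count, fake_ok, and fake_fail are hypothetical, and the fetch argument is an assumption added here so the function can be tried without contacting PubMed; by default it falls back to easyPubMed::get_pubmed_ids as used above.

```r
# Hypothetical defensive wrapper around the count lookup.
# `fetch` defaults to the easyPubMed function but can be swapped for a stub.
safe_count <- function(query, fetch = easyPubMed::get_pubmed_ids) {
  tryCatch(
    {
      results <- fetch(query)
      count <- as.integer(results$Count)
      # Guard against a missing Count field
      if (length(count) != 1) NA_integer_ else count
    },
    error = function(e) {
      warning("Search failed for '", query, "': ", conditionMessage(e))
      NA_integer_
    }
  )
}

# Stubs standing in for the real API call (illustration only, not real results)
fake_ok   <- function(query) list(Count = "42")
fake_fail <- function(query) stop("no network")

ok_count     <- safe_count("any query", fetch = fake_ok)
failed_count <- suppressWarnings(safe_count("any query", fetch = fake_fail))
```

Returning NA keeps the failed rows visible in the data frame, so they can be retried later rather than silently lost.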

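And now we’ll use that function to populate a new column. Note that we wrap each call in an anonymous function so we can pause between searches and stay within the request rate PubMed allows for “anonymous” use (that is, without an API key). We use map_int() from purrr (part of the tidyverse) rather than lapply() so that num_results is a plain integer column instead of a list column, which is easier to print and plot.

search_terms <- search_terms %>% 
  mutate(num_results = map_int(final, function(f) {
    # Pause before each request to stay under the rate limit
    Sys.sleep(0.5)
    count_results(f)
    }))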
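
The pause-then-call pattern can be factored into a reusable helper. The sketch below uses only base R; throttled_map_int is a hypothetical name, and the 0.5-second default mirrors the pause used above, which works out to about two requests per second (NCBI's E-utilities guidelines allow up to three per second without an API key).

```r
# Hypothetical helper: apply f to each element of x, sleeping before each call,
# and return a plain integer vector (one value per element).
throttled_map_int <- function(x, f, pause = 0.5) {
  vapply(
    x,
    function(item) {
      Sys.sleep(pause)
      as.integer(f(item))
    },
    integer(1),
    USE.NAMES = FALSE
  )
}

# Offline demonstration with a stand-in function (nchar) and no pause
demo_counts <- throttled_map_int(c("a", "bb", "ccc"), nchar, pause = 0)
```

With the objects defined in this document, the call would look like throttled_map_int(search_terms$final, count_results). vapply() raises an error if any call returns something other than a single integer, which surfaces malformed responses early.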

Visualize the Data

ggplot lets us take a look at our results graphically:

ggplot(search_terms,
       aes(x = year, y = num_results)) +
  geom_col() +
  facet_wrap(~ term) +
  xlab("Year") +
  ylab("Count") +
  ggtitle("Healthcare Disparity Articles")