Aims

To create an R script that queries PubMed online, reports the total number of hits, and plots the number of articles by year.

PubMed Query in R

This script uses the RISmed package to query PubMed and download the matching records.

First, load the package and set up the query term.

# load RISmed package
library(RISmed)

# Set query term:
q <- "E. coli ST131"
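
PubMed rewrites free-text terms through automatic term mapping (the query translation printed by `summary(res)` shows "E. coli" expanded into MeSH and all-fields terms). If a narrower search is wanted, the query string can carry PubMed field tags instead. A minimal sketch, assuming the `[tiab]` (title/abstract) and `[dp]` (publication date) tags and a hypothetical 2010 to 2014 date range chosen only for illustration:

```r
# Hypothetical narrower query: restrict both terms to title/abstract
# and limit publication dates to an illustrative 2010-2014 range.
organism <- "E. coli"
strain   <- "ST131"
years    <- c(2010, 2014)

q_tagged <- paste0('"', organism, '"[tiab] AND ', strain, "[tiab] AND ",
                   years[1], ":", years[2], "[dp]")
print(q_tagged)
```

The tagged string could be passed to `EUtilsSummary()` in place of `q`; expect a different result count from the untagged query.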

Then send the query to PubMed (this requires an internet connection) and print a summary of the result.

# Search PubMed for the query term
res <- EUtilsSummary(q, type="esearch", db="pubmed")

# Print out the query date:
date()
## [1] "Mon Dec  8 15:18:18 2014"
# Print out the summary of the query
summary(res)
## Query:
## ("escherichia coli"[MeSH Terms] OR ("escherichia"[All Fields] AND "coli"[All Fields]) OR "escherichia coli"[All Fields] OR "e coli"[All Fields]) AND ST131[All Fields] 
## 
## Result count:  367
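
`EUtilsSummary()` works through NCBI's E-utilities `esearch` service. To see roughly what such a request looks like, the sketch below assembles an esearch URL by hand from the same query term. It only builds the string and does not contact NCBI; the `retmax` value is an illustrative assumption, and RISmed may add parameters of its own.

```r
# Build (but do not send) an NCBI E-utilities esearch request for a query.
base_url <- "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
q <- "E. coli ST131"

esearch_url <- paste0(
  base_url,
  "?db=pubmed",
  "&term=", utils::URLencode(q, reserved = TRUE),  # spaces become %20
  "&retmax=1000"                                   # illustrative cap on returned IDs
)
print(esearch_url)
```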

Download the full records for the articles found by the query.

# Download the records for the query results
fetch <- EUtilsGet(res, type = "efetch", db = "pubmed")
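
`EUtilsGet()` retrieves full records through the E-utilities `efetch` service, which takes a list of PubMed IDs. NCBI's documentation advises using HTTP POST when more than about 200 IDs are sent in one request, so large result sets are often split into batches. A minimal sketch of that batching, using made-up placeholder IDs rather than real PMIDs and only building the efetch URLs:

```r
# Hypothetical PubMed IDs (placeholders, not real PMIDs) for 450 records
ids <- 1000001:1000450
batch_size <- 200

# Assign each ID to a batch: IDs 1-200 -> batch 1, 201-400 -> batch 2, rest -> batch 3
batches <- split(ids, ceiling(seq_along(ids) / batch_size))

# One efetch URL per batch, requesting PubMed XML records
efetch_urls <- vapply(batches, function(b) {
  paste0("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi",
         "?db=pubmed&retmode=xml&id=", paste(b, collapse = ","))
}, character(1))

lengths(batches)  # number of IDs in each batch
```

Each batch would then be fetched separately and the records combined; that step is not shown here.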

Create a plot of the number of articles containing the query term in each year, with a line showing the cumulative total.

library(ggplot2)
count <- table(Year(fetch)) # count number of articles per year

# Save counts as data.frame to use with ggplot
count <- as.data.frame(count)
names(count) <- c("Year","Counts")

# Cumulative number of articles up to each year
ccount <- data.frame(Year = count$Year, Counts = cumsum(count$Counts))
ccount$g <- "g"  # constant group so geom_line() joins points across the discrete Year axis

# Make plot: bars for the yearly counts and a line for the cumulative count
p <- ggplot(count, aes(x = Year, y = Counts)) +
    geom_col() +
    geom_line(aes(x = Year, y = Counts, colour = "cumulative counts", group = g), data = ccount) +
    ggtitle(paste0("PubMed articles containing '", q, "'")) +
    ylab("Number of articles") +
    xlab(paste0("Year \n Query time: ", Sys.time())) +
    labs(colour = "") +
    theme_bw()

# Put the legend inside the panel and label the final cumulative total.
# Year is a factor, so as.numeric() returns level positions, which are the
# x positions on the discrete axis (the last level is the most recent year).
p + theme(legend.position = c(0.2, 0.85)) +
    annotate("text", x = max(as.numeric(ccount$Year)), y = max(ccount$Counts), label = max(ccount$Counts))
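
`table()` only counts years that actually occur, so a year with no matching articles is silently missing from the bars, and the cumulative line simply joins its neighbours. If a continuous run of years is preferred, the years can be converted to a factor spanning the full range before counting. A sketch using a hypothetical vector of publication years in place of `Year(fetch)`:

```r
# Hypothetical publication years standing in for Year(fetch); note 2010 is absent
years <- c(2008, 2008, 2009, 2011, 2011, 2011, 2012)

# Use every year from the first to the last as a level, so empty years count as 0
all_years <- seq(min(years), max(years))
count <- as.data.frame(table(factor(years, levels = all_years)))
names(count) <- c("Year", "Counts")

# Cumulative total, as in the plotting code
count$Cumulative <- cumsum(count$Counts)
print(count)
```

In the script, this would mean building the table from such a factor instead of calling `table(Year(fetch))` directly, assuming `Year(fetch)` returns numeric years.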

Reference

This script follows the instructions on these websites:

http://davetang.org/muse/2013/10/31/querying-pubmed-using-r/

http://freshbiostats.wordpress.com/2013/12/03/analysis-of-pubmed-search-results-using-r/