This R script runs a PubMed query online, returns the total number of hits, and plots the number of articles by year.
The script uses the RISmed package to query and download content from PubMed.
First, load the package and set up the query term.
# load RISmed package
library(RISmed)
# Set query term:
q <- "E. coli ST131"
Then submit the query to PubMed (requires an internet connection) and print a summary of the query.
# Look up the query on pubmed
res <- EUtilsSummary(q, type="esearch", db="pubmed")
# Print out the query date:
date()
## [1] "Mon Dec 8 15:18:18 2014"
# Print out the summary of the query
summary(res)
## Query:
## ("escherichia coli"[MeSH Terms] OR ("escherichia"[All Fields] AND "coli"[All Fields]) OR "escherichia coli"[All Fields] OR "e coli"[All Fields]) AND ST131[All Fields]
##
## Result count: 367
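The total number of hits can also be extracted as a value rather than read off the printed summary. The following is a minimal sketch, assuming the installed RISmed version provides the `QueryCount()` accessor for the object returned by `EUtilsSummary()`. The helper name `get_hit_count` is hypothetical and not part of RISmed; it returns `NA` when the package is missing or the request fails.

```r
# Hypothetical helper (not part of RISmed): return the number of PubMed hits
# for a query, or NA if RISmed is not installed or the request fails.
get_hit_count <- function(query) {
  if (!requireNamespace("RISmed", quietly = TRUE)) {
    return(NA_integer_)
  }
  tryCatch({
    res <- RISmed::EUtilsSummary(query, type = "esearch", db = "pubmed")
    # Assumption: QueryCount() returns the total hit count of the search
    as.integer(RISmed::QueryCount(res))
  }, error = function(e) NA_integer_)
}

n_hits <- get_hit_count("E. coli ST131")
print(n_hits)
```

Storing the count in a variable makes it easy to reuse, for example in a plot title or a log of repeated queries.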
Download the records that match the query.
# Download results
fetch <- EUtilsGet(res,type="efetch", db="pubmed")
Create a plot of the number of articles containing the query term by year.
library(ggplot2)
# Count the number of articles per year
# (newer RISmed releases may expose this accessor as YearPubmed() instead of Year())
count <- table(Year(fetch))
# Save counts as data.frame to use with ggplot
count <- as.data.frame(count)
names(count) <- c("Year","Counts")
# Cumulative number of articles per year
ccount <- data.frame(Year=count$Year, Counts=cumsum(count$Counts))
# Constant grouping variable so geom_line() joins the points across the discrete Year axis
ccount$g <- "g"
# Make plot: bars show articles per year, the line shows the cumulative count
# (ggplot() + geom_bar() replaces qplot(), which no longer accepts a stat argument in current ggplot2)
p <- ggplot(count, aes(x=Year, y=Counts)) + geom_bar(stat="identity")
p <- p + geom_line(aes(x=Year, y=Counts, colour="cumulative counts", group=g), data=ccount) +
  ggtitle(paste("PubMed articles containing '", q, "'", sep="")) +
  ylab("Number of articles") +
  xlab(paste("Year \n Query time: ", Sys.time(), sep="")) +
  labs(colour="") +
  theme_bw()
# Place the legend inside the plot and label the final cumulative count
p + theme(legend.position=c(0.2,0.85)) +
  annotate("text", x=max(as.numeric(ccount$Year)), y=max(ccount$Counts), label=max(ccount$Counts))
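The counting step can be tried without an internet connection. The sketch below uses a made-up vector of publication years (hypothetical data standing in for the years extracted from the downloaded records) and base R only. It also shows one possible variation: `table()` on the raw years drops any year with no articles, so the years are first converted to a factor whose levels cover the full range, which keeps empty years in the table with a count of 0.

```r
# Hypothetical publication years, standing in for the years of the fetched records
years <- c(2008, 2010, 2010, 2011, 2013, 2013, 2013)

# Use a factor with every year in the range as a level, so that
# years with no articles are counted as 0 instead of being dropped
all_years <- seq(min(years), max(years))
count <- as.data.frame(table(factor(years, levels = all_years)))
names(count) <- c("Year", "Counts")

# Running total of articles up to and including each year
count$Cumulative <- cumsum(count$Counts)
print(count)
```

Whether to fill in empty years is a presentation choice: it keeps the year axis evenly spaced, at the cost of showing zero-height bars.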
This script follows the instructions from these websites:
http://davetang.org/muse/2013/10/31/querying-pubmed-using-r/
http://freshbiostats.wordpress.com/2013/12/03/analysis-of-pubmed-search-results-using-r/