PubMed is a phenomenal source of medical literature.

"PubMed is a free search engine accessing primarily the MEDLINE database of references and abstracts on life sciences and biomedical topics." (<a href="http://en.wikipedia.org/wiki/PubMed" target="_blank">Wikipedia</a>)

For anyone working on a Natural Language Processing (NLP) project and looking for topic-based medical text, PubMed is the most relevant resource!

"PubMed comprises more than 24 million citations for biomedical literature from MEDLINE, life science journals, and online books. Citations may include links to full-text content from PubMed Central and publisher web sites."
(<a href="http://www.ncbi.nlm.nih.gov/pubmed" target="_blank">PubMed</a>)

There are many standalone tools and many programming-library extensions that help query and extract PubMed data.

The available information ranges from topics, titles, and citations to abstracts and full articles. Researchers use it to see what is trending in the medical community, which subjects are being discussed, who is writing what and when, and so on.

In the end, I needed a large amount of unstructured medical data on a very specific topic, and the RISmed package let me get that data easily.

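Load RISmed and define the query

library(RISmed)
# The query definition is reconstructed from the summary printed below (an assumption):
# topic "copd", entry dates within 2012, and the first 100 matching IDs.
search_topic <- "copd"
search_query <- EUtilsSummary(search_topic, retmax = 100, mindate = 2012, maxdate = 2012)
summary(search_query)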
## Query:
## ("pulmonary disease, chronic obstructive"[MeSH Terms] OR ("pulmonary"[All Fields] AND "disease"[All Fields] AND "chronic"[All Fields] AND "obstructive"[All Fields]) OR "chronic obstructive pulmonary disease"[All Fields] OR "copd"[All Fields]) AND 2012[EDAT] : 2012[EDAT] 
## 
## Result count:  3569

Query IDs

QueryId(search_query)
##   [1] "23272298" "23271905" "23271904" "23271829" "23271821" "23271819"
##   [7] "23271818" "23271817" "23271741" "23271621" "23271620" "23270668"
##  [13] "23270360" "23270062" "23270045" "23249528" "23269884" "23269866"
##  [19] "23268483" "23268465" "23267696" "23266884" "23266537" "23266127"
##  [25] "23265910" "23265333" "23265285" "23265268" "23265228" "23264836"
##  [31] "23264660" "23264538" "23263935" "23263604" "23262518" "23262512"
##  [37] "23261311" "23261310" "23260455" "23259787" "23259710" "23259655"
##  [43] "23258927" "23258787" "23258786" "23258785" "23258783" "23258777"
##  [49] "23258776" "23258731" "23258580" "23258576" "23258471" "23258468"
##  [55] "23258247" "23258244" "23257773" "23257650" "23257530" "23257347"
##  [61] "23256918" "23256845" "23256723" "23256722" "23256721" "23256720"
##  [67] "23256719" "23256718" "23256717" "23256716" "23256715" "23256714"
##  [73] "23256713" "23256346" "23256175" "23256174" "23256173" "23256172"
##  [79] "23256171" "23256170" "23256169" "23256168" "23256167" "23256166"
##  [85] "23256165" "23256164" "23256163" "23256162" "23256161" "23255854"
##  [91] "23255616" "23255540" "23254770" "23253873" "23253549" "23253321"
##  [97] "23252578" "23252355" "23252287" "23251993"

Fetch the records from PubMed

records <- EUtilsGet(search_query)
class(records)
## [1] "Medline"
## attr(,"package")
## [1] "RISmed"
# str(records)

Arrange the results in a data frame

pubmed_data <- data.frame('Title' = ArticleTitle(records), 'Abstract' = AbstractText(records))
head(pubmed_data,1)
##                                                                     Title
## 1 Burning HOT: revisiting guidelines associated with home oxygen therapy.
##   Abstract
## 1 Burn injuries secondary to home oxygen therapy (HOT) have become increasingly common in recent years, yet several guidelines for HOT and chronic obstructive pulmonary disease (COPD) neglect to stress the dangers of open flames. This retrospective review of burn injury admissions secondary to HOT to our burn centre from 2007 to 2012 aimed to establish the extent of this problem and to discuss the current literature and a selection of national guidelines. Out of six patients (five female, one male) with a median age of 72 (range 58-79), four were related to smoking, and two due to lighting candles. The mean total body surface area (TBSA) affected was 17% (range 2-60%). Five patients sustained facial burns, two suffered from inhalation injury (33.3%), and five required surgery (83.3%). Mean total length of stay was 20 days (range 8 to 33), and one patient died. Although mentioned in the majority, some guidelines fail to address the issue of smoking in light of the associated risk for injury, which in turn might have future implications in litigation related to iatrogenic injuries. Improved HOT guidelines will empower physicians to discourage smoking, and fully consider the risks versus benefits of home oxygen before prescription. With a view on impeding a rising trend of burns secondary to HOT, we suggest revision to national guidelines, where appropriate.
pubmed_data$Abstract <- as.character(pubmed_data$Abstract)
pubmed_data$Abstract <- gsub(",", " ", pubmed_data$Abstract, fixed = TRUE)

Inspect the structure

str(pubmed_data)
## 'data.frame':    100 obs. of  2 variables:
##  $ Title   : Factor w/ 100 levels "[Advances in pulmonology in year 2012].",..: 24 47 63 69 76 18 92 98 37 1 ...
##  $ Abstract: chr  "Burn injuries secondary to home oxygen therapy (HOT) have become increasingly common in recent years  yet sever"| __truncated__ "BACKGROUND: High-intensity (high-pressure and high backup rate) noninvasive ventilation has recently been advoc"| __truncated__ "" "Oxygen is necessary for all aerobic life  and nothing is more important in respiratory care than its proper und"| __truncated__ ...
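
The `str()` output above shows that at least one abstract is an empty string. For NLP work it may be useful to drop such rows before saving. The following is a minimal base R sketch, assuming a small made-up data frame (`toy`, not part of the original post) in place of `pubmed_data`:

```r
# Toy stand-in for pubmed_data; titles and abstracts are invented for illustration.
toy <- data.frame(
  Title = c("Paper A", "Paper B", "Paper C"),
  Abstract = c("Some abstract text", "", "More abstract text"),
  stringsAsFactors = FALSE
)

# nzchar() is TRUE for non-empty strings, so this keeps only rows with an abstract.
toy_nonempty <- toy[nzchar(toy$Abstract), ]

nrow(toy_nonempty)  # 2 rows remain
```

The same filter could be applied as `pubmed_data[nzchar(pubmed_data$Abstract), ]`, provided the `Abstract` column has already been converted to character.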

Save to your directory

write.csv(pubmed_data, "/your directory/pubmed_data.csv")