This can be a manual and protracted iterative process which may involve using specialised searching services, downloading abstracts, reading and filtering, secondary searching and so on, and may involve sifting many thousands of abstracts.
Often we may just want a rapid overview of the literature to help focus further reviewing.
In this vignette we demonstrate the use of R packages for large-scale extraction of abstracts, and analytical techniques for identifying topics or themes in those abstracts.
The vignette is based on a number of R packages:
1. europepmc - a sophisticated tool which interacts with the Europe PMC API and provides access to additional fields.
2. adjutant - a fully fledged package with retrieval and clustering functions.
3. tidytext - a package for text mining using tidy data principles.
4. Rtsne - uses the tSNE algorithm for data reduction and cluster visualisation.
5. dbscan - applies the HDBSCAN algorithm for data clustering.
6. myScrapers - wraps some functions built on other packages to automate the search, extraction, and filtering process.

We have “hacked” some of the functions in these packages and written additional functions to develop a workflow from searching and retrieval to analysis.
europepmc
This is a package which allows searching of EuropePMC via the API.
It can be downloaded from CRAN.
if(!require("europepmc")) install.packages("europepmc")
library(europepmc)
The main function is epmc_search, which allows us to search the site and retrieve article metadata and citation counts.
We’ll use it with the search term “social media” AND “public health” AND surveillance.
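The calls in this vignette read the query from `params$search`, which is not defined in the text shown here. A minimal stand-in is sketched below, assuming it holds the search term just quoted; the `limit` value is purely illustrative and the whole list is an assumption rather than the vignette's actual parameters.

```r
# Stand-in for the R Markdown `params` object (assumed; not shown in the vignette)
params <- list(
  search = '"social media" AND "public health" AND surveillance',
  limit  = 1000  # illustrative value only
)

# Print the query string that would be sent to the search functions
cat(params$search, "\n")
```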
head(epmc_search(params$search, limit = 10))
#> # A tibble: 6 x 28
#> id source pmid doi title authorString journalTitle pubYear
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 3144~ MED 3144~ 10.1~ Harn~ Strathdee S~ Curr Opin H~ 2019
#> 2 3141~ MED 3141~ 10.1~ Rece~ Conway M, H~ Yearb Med I~ 2019
#> 3 3127~ MED 3127~ 10.1~ A Sy~ Karmegam D,~ Disaster Me~ 2019
#> 4 3148~ MED 3148~ 10.2~ Comm~ Fontaine G,~ JMIR Public~ 2019
#> 5 3141~ MED 3141~ 10.1~ Arti~ Thiébaut R,~ Yearb Med I~ 2019
#> 6 PMC6~ PMC <NA> 10.5~ Keyw~ Hawes AN. Online J Pu~ 2019
#> # ... with 20 more variables: journalIssn <chr>, pubType <chr>,
#> # isOpenAccess <chr>, inEPMC <chr>, inPMC <chr>, hasPDF <chr>,
#> # hasBook <chr>, citedByCount <int>, hasReferences <chr>,
#> # hasTextMinedTerms <chr>, hasDbCrossReferences <chr>,
#> # hasLabsLinks <chr>, hasTMAccessionNumbers <chr>, firstIndexDate <chr>,
#> # firstPublicationDate <chr>, pmcid <chr>, issue <chr>,
#> # journalVolume <chr>, pageInfo <chr>, hasSuppl <chr>
This doesn’t extract the abstract text or MeSH headings (keywords). To facilitate this we have wrapped the search function into get_full_search in myScrapers.
library(tictoc)
tic()
search1 <- get_full_search(search = params$search, limit = params$limit)
toc()
#> 1695.38 sec elapsed
head(search1, 20)
#> # A tibble: 20 x 33
#> id source pmid doi title authorString journalTitle pubYear
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 3144~ MED 3144~ 10.1~ Harn~ Strathdee S~ Curr Opin H~ 2019
#> 2 3141~ MED 3141~ 10.1~ Rece~ Conway M, H~ Yearb Med I~ 2019
#> 3 3127~ MED 3127~ 10.1~ A Sy~ Karmegam D,~ Disaster Me~ 2019
#> 4 3148~ MED 3148~ 10.2~ Comm~ Fontaine G,~ JMIR Public~ 2019
#> 5 3141~ MED 3141~ 10.1~ Arti~ Thiébaut R,~ Yearb Med I~ 2019
#> 6 PMC6~ PMC <NA> 10.5~ Keyw~ Hawes AN. Online J Pu~ 2019
#> 7 3144~ MED 3144~ 10.3~ Wher~ Majmundar A~ Int J Envir~ 2019
#> 8 3116~ MED 3116~ 10.2~ Moni~ Liu S, Chen~ J Med Inter~ 2019
#> 9 3094~ MED 3094~ 10.1~ Pers~ Degeling C,~ Health Res ~ 2019
#> 10 3142~ MED 3142~ 10.1~ Soci~ Cesare N, N~ BMJ Open Sp~ 2019
#> 11 3132~ MED 3132~ 10.1~ Avia~ Chen Y, Zha~ Sci Rep 2019
#> 12 3128~ MED 3128~ 10.2~ Iden~ Chu KH, Col~ J Med Inter~ 2019
#> 13 3106~ MED 3106~ 10.2~ A So~ Reuter K, M~ JMIR Public~ 2019
#> 14 PMC6~ PMC <NA> 10.5~ Anal~ Burkom H, D~ Online J Pu~ 2019
#> 15 3092~ MED 3092~ 10.1~ Esta~ Brosch S, d~ Drug Saf 2019
#> 16 PMC6~ PMC <NA> 10.5~ Unde~ Park A, Wes~ Online J Pu~ 2019
#> 17 3109~ MED 3109~ 10.2~ Goog~ Lykens J, P~ JMIR Public~ 2019
#> 18 3070~ MED 3070~ 10.7~ Prec~ Bempong NE,~ J Glob Heal~ 2019
#> 19 3143~ MED 3143~ 10.3~ IDOM~ Béré WRC, C~ Stud Health~ 2019
#> 20 3131~ MED 3131~ 10.7~ Twit~ Schaible BJ~ Perm J 2019
#> # ... with 25 more variables: journalIssn <chr>, pubType <chr>,
#> # isOpenAccess <chr>, inEPMC <chr>, inPMC <chr>, hasPDF <chr>,
#> # hasBook <chr>, citedByCount <int>, hasReferences <chr>,
#> # hasTextMinedTerms <chr>, hasDbCrossReferences <chr>,
#> # hasLabsLinks <chr>, hasTMAccessionNumbers <chr>, firstIndexDate <chr>,
#> # firstPublicationDate <chr>, pmcid <chr>, issue <chr>,
#> # journalVolume <chr>, pageInfo <chr>, hasSuppl <chr>, bookid <chr>,
#> # name <int>, absText <list>, mesh <list>, keywords <chr>
We can see that the get_full_search function returns additional metadata such as citation counts, whether the journal is open access and whether there is a PDF available. It also includes MeSH headings and abstract text. By default, 1000 article descriptions are downloaded.
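The full retrieval above took 1695 seconds (about 28 minutes), so it can be worth saving the result rather than repeating the search on every run. The sketch below shows one way to do this with base R; `cached()`, `slow_search()` and the file location are hypothetical names introduced for illustration and are not part of myScrapers.

```r
# Generic cache helper: run `compute()` once, then reuse the saved result
cached <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- compute()
  saveRDS(result, path)
  result
}

# Stand-in for the slow call; in the vignette this would be
# get_full_search(search = params$search, limit = params$limit)
slow_search <- function() {
  data.frame(id = c("a", "b"), stringsAsFactors = FALSE)
}

cache_file  <- file.path(tempdir(), "search1.rds")  # hypothetical location
search_demo <- cached(cache_file, slow_search)
```

A second call to `cached()` with the same path reads the saved file and does not run the search again.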
We can see how many articles are available altogether by running epmc_profile.
profile <- epmc_profile(query = params$search)
Running epmc_profile allows us to see that there are 2867 articles, of which 2767 are full text articles and 1934 are open access.
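These counts are easier to compare as shares of the total. The short calculation below uses only the three figures quoted above; the variable names are ours.

```r
# Counts reported by epmc_profile() in the text above
total       <- 2867
full_text   <- 2767
open_access <- 1934

# Percentage of all hits that are full text / open access
pct <- round(100 * c(full_text = full_text, open_access = open_access) / total, 1)
pct
```

That is, about 96.5% of the hits are full text articles and about 67.5% are open access.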
We can easily look at annual abstract frequency - we can readily see the growth in publication frequency in the last 3 years.
library(dplyr)    # count() and the %>% pipe
library(ggplot2)  # plotting
search1 %>%
  count(pubYear) %>%
  ggplot(aes(pubYear, n)) +
  geom_col(fill = "blue") +
  labs(title = "Abstracts per year",
       subtitle = paste("Search: ", params$search)) +
  phecharts::theme_phe() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Similarly, we can identify the most frequent journals.
journal_count <- search1 %>%
  count(journalTitle) %>%
  top_n(20, n) %>%  # rank explicitly by the count column
  arrange(-n)
journal_count %>%
ggplot(aes(reorder(journalTitle, n), n)) +
geom_col(fill = "blue") +
coord_flip() +
labs(title = "Journal frequency") +
phecharts::theme_phe()
JMIR Public Health Surveill and PLoS One are the most frequent journals publishing articles on “social media” AND “public health” AND surveillance.
Once we have a data frame of 2869 records with abstract text, we can prepare the data for analysis. The create_corpus function is designed for this.
out1 <- search1 %>%
  select(pmid, pmcid, doi, title, pubYear, citedByCount, absText, journalTitle) %>%
  filter(absText != "NULL") %>%
  mutate(text = paste(title, absText))
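The filter above removes records without an abstract before title and abstract are pasted together. The base R sketch below makes that step explicit, assuming missing abstracts are stored as NULL elements of the absText list column (which is what the comparison with "NULL" suggests); the three toy records are invented.

```r
# Toy records: the second has no abstract (NULL element in the list)
titles  <- c("Paper A", "Paper B", "Paper C")
absText <- list("Abstract of A.", NULL, "Abstract of C.")

# TRUE where an abstract is present
has_abstract <- !vapply(absText, is.null, logical(1))

# Combine title and abstract for the records that are kept
text <- paste(titles[has_abstract], unlist(absText[has_abstract]))
text
```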
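We will use a method exemplified in the adjutant package, which uses unsupervised machine learning to try to cluster similar articles and attach themes.
In this approach we undertake some natural language processing: the abstracts are split into stemmed words weighted by tf-idf, reduced to two dimensions with tSNE, clustered with HDBSCAN, and each cluster is labelled with its most prominent terms.
The ultimate output of this analysis is a visualisation of clustered and labelled abstracts and an interactive table.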
library(tidytext)
corp <- create_corpus(df = search1)
head(corp$corpus)
#> # A tibble: 6 x 6
#> pmid word n tf idf tf_idf
#> <chr> <chr> <int> <dbl> <dbl> <dbl>
#> 1 17572960 abil 1 0.00595 3.34 0.0199
#> 2 17572960 activ 2 0.0119 1.63 0.0194
#> 3 17572960 address 1 0.00595 1.83 0.0109
#> 4 17572960 adopt 6 0.0357 2.97 0.106
#> 5 17572960 adult 1 0.00595 2.48 0.0148
#> 6 17572960 advocaci 1 0.00595 4.05 0.0241
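The tf, idf and tf_idf columns can be reproduced by hand. The sketch below assumes the usual tidytext definitions: tf is a term's share of the words in a document, and idf is the natural log of the number of documents divided by the number of documents containing the term. We have not checked that create_corpus computes them in exactly this way, and the three toy documents are invented for illustration.

```r
# Toy corpus: three already-stemmed "abstracts" (hypothetical data)
docs <- list(
  d1 = c("vaccin", "health", "health"),
  d2 = c("tweet", "health"),
  d3 = c("vaccin", "outbreak", "measl", "health")
)

# Long table of term counts per document
counts <- do.call(rbind, lapply(names(docs), function(id) {
  tab <- table(docs[[id]])
  data.frame(doc = id, word = names(tab), n = as.integer(tab),
             stringsAsFactors = FALSE)
}))

# tf: share of the document's words taken up by the term
counts$tf <- counts$n / ave(counts$n, counts$doc, FUN = sum)

# idf: natural log of (number of documents / documents containing the term)
n_docs   <- length(docs)
doc_freq <- table(counts$word)
counts$idf <- log(n_docs / as.integer(doc_freq[counts$word]))

# tf-idf is the product of the two
counts$tf_idf <- counts$tf * counts$idf
counts
```

A word such as "health" that occurs in every document gets an idf of zero and so carries no weight, while rarer words are weighted up. The first row of the output above is consistent with these definitions: 0.00595 × 3.34 ≈ 0.0199.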
clust <- create_cluster(corpus = corp$corpus, minPts = 10)
#> 609.87 sec elapsed
clust$cluster_size
#> # A tibble: 43 x 2
#> cluster n
#> <dbl> <int>
#> 1 0 767
#> 2 1 230
#> 3 2 113
#> 4 3 55
#> 5 4 11
#> 6 5 13
#> 7 6 15
#> 8 7 100
#> 9 8 121
#> 10 9 13
#> # ... with 33 more rows
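In the output of the HDBSCAN implementation in dbscan, a cluster value of 0 marks noise points, that is, records not assigned to any cluster. Assuming create_cluster passes these assignments through unchanged, cluster 0 in the table above is the unclustered remainder rather than a topic. The sketch below, on an invented assignment vector, shows how cluster sizes and the noise share can be tabulated.

```r
# Hypothetical cluster assignments for ten abstracts; 0 marks noise points
clustering <- c(0, 1, 1, 0, 2, 2, 2, 1, 0, 2)

# Size of each cluster, including the noise group
sizes <- table(clustering)
sizes

# Proportion of abstracts left unclustered
noise_share <- mean(clustering == 0)

# Assignments with noise removed, as in a `cluster != 0` filter
clustered_only <- clustering[clustering != 0]
```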
labels <- label_clusters(corp$corpus, clustering = clust$clustering, top_n = 4)
#> 1.14 sec elapsed
labels$labels
#> # A tibble: 43 x 2
#> # Groups: cluster [43]
#> cluster clus_names
#> <dbl> <chr>
#> 1 0 data-inform-health-studi
#> 2 1 hiv-sex-prevent-studi
#> 3 2 cancer-research-studi-health
#> 4 3 resist-effect-public-health
#> 5 4 measl-outbreak-vaccin-transmiss-public-health
#> 6 5 messag-health-research-inform-method-studi
#> 7 6 abstract-annual-null-meet
#> 8 7 null-surveil-public-health
#> 9 8 vaccin-public-inform-health
#> 10 9 biosurveil-system-data-inform-identifi-health
#> # ... with 33 more rows
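The labels are strings of stemmed terms joined by hyphens, and some contain more than the four terms requested with top_n = 4. One plausible explanation is that terms tied at the cut-off are all kept. The sketch below illustrates that idea only; `label_one()` and the scores are hypothetical, and we have not confirmed that label_clusters works this way.

```r
# Hypothetical per-cluster term scores (e.g. summed tf-idf); not real output
scores <- data.frame(
  cluster = c(1, 1, 1, 1, 2, 2, 2),
  word    = c("hiv", "sex", "prevent", "studi", "cancer", "research", "health"),
  score   = c(0.9, 0.7, 0.4, 0.4, 0.8, 0.5, 0.3),
  stringsAsFactors = FALSE
)

label_one <- function(df, top_n = 2) {
  # Keep terms ranked in the top `top_n`; ties at the cut-off are all kept
  keep <- rank(-df$score, ties.method = "min") <= top_n
  df <- df[keep, ]
  # Join the kept terms, highest score first
  paste(df$word[order(-df$score)], collapse = "-")
}

# One label per cluster
labels_toy <- vapply(split(scores, scores$cluster), label_one,
                     character(1), top_n = 3)
labels_toy
```

With top_n = 3, cluster 1 still receives four terms because "prevent" and "studi" share third place.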
p <- labels$results %>%
  left_join(search1, by = c("pmid.value" = "pmid")) %>%
  ggplot(aes(X1, X2)) +
  geom_point(aes(colour = clustered, size = citedByCount)) +
  ggrepel::geom_text_repel(
    data = labels$plot,
    aes(medX, medY, label = clus_names),
    size = 3, colour = "red", alpha = 0.9
  )
p + scale_alpha_manual(values = c(1, 0)) +
  viridis::scale_color_viridis(discrete = TRUE, option = "viridis", alpha = .6, begin = .8, end = .1) +
  phecharts::theme_phe() +
  theme(panel.background = element_rect(fill = "#f0f0f0")) +
  labs(subtitle = paste("Clustering: ", nrow(labels$plot), " topics"),
       title = paste("Search ", "= ", params$search))
most_cited <- labels$results %>%
left_join(search1, by = c("pmid.value" = "pmid")) %>%
filter(cluster !=0) %>%
group_by(clus_names) %>%
top_n(n = 3, citedByCount) %>%
select(clus_names, title, pubYear, citedByCount) %>%
ungroup() %>%
arrange(clus_names, -citedByCount)
most_cited %>%
formattable::formattable()
clus_names | title | pubYear | citedByCount |
---|---|---|---|
abstract-annual-null-meet | Abstracts of the 36th Annual Meeting of the Society of General Internal Medicine. April 24-27, 2013. Denver, Colorado, USA. | 2013 | 3 |
abstract-annual-null-meet | Abstracts from the 38th annual meeting of the society of general internal medicine. | 2015 | 3 |
abstract-annual-null-meet | Abstracts of the 2014 NATA Annual Meeting & Clinical Symposia, June 26-28, 2014, Indianapolis, Indiana. | 2014 | 2 |
alcohol-drink-studi-health | New research findings since the 2007 Surgeon General’s Call to Action to Prevent and Reduce Underage Drinking: a review. | 2014 | 35 |
alcohol-drink-studi-health | Use of alcohol before suicide in the United States. | 2014 | 25 |
alcohol-drink-studi-health | A feasibility study of short message service text messaging as a surveillance tool for alcohol consumption and vehicle for interventions in university students. | 2013 | 17 |
biosurveil-system-data-inform-identifi-health | Advancing a framework to enable characterization and evaluation of data streams useful for biosurveillance. | 2014 | 7 |
biosurveil-system-data-inform-identifi-health | Biosurveillance capability requirements for the global health security agenda: lessons from the 2009 H1N1 pandemic. | 2014 | 6 |
biosurveil-system-data-inform-identifi-health | Digital disease detection: A systematic review of event-based internet biosurveillance systems. | 2017 | 6 |
canadian-activ-survei-develop-health-time-includ | Canadian 24-Hour Movement Guidelines for the Early Years (0-4 years): An Integration of Physical Activity, Sedentary Behaviour, and Sleep. | 2017 | 30 |
canadian-activ-survei-develop-health-time-includ | A collaborative approach to adopting/adapting guidelines - The Australian 24-Hour Movement Guidelines for the early years (Birth to 5 years): an integration of physical activity, sedentary behavior, and sleep. | 2017 | 22 |
canadian-activ-survei-develop-health-time-includ | Health insurance coverage and its impact on medical cost: observations from the floating population in China. | 2014 | 15 |
cancer-research-studi-health | Communication inequalities and public health implications of adult social networking site use in the United States. | 2010 | 50 |
cancer-research-studi-health | Principles and Recommendations for the Provision of Healthcare in Canada to Adolescent and Young Adult-Aged Cancer Patients and Survivors. | 2011 | 41 |
cancer-research-studi-health | Talking About Cancer and Meeting Peer Survivors: Social Information Needs of Adolescents and Young Adults Diagnosed with Cancer. | 2013 | 38 |
care-improv-health-includ-studi | The COMET Handbook: version 1.0. | 2017 | 101 |
care-improv-health-includ-studi | Prevention of acute exacerbations of COPD: American College of Chest Physicians and Canadian Thoracic Society Guideline. | 2015 | 69 |
care-improv-health-includ-studi | Harmonized patient-reported data elements in the electronic health record: supporting meaningful use by primary care action on health behaviors and key psychosocial factors. | 2012 | 65 |
cigarett-tobacco-smoke-studi | e-Cigarette awareness, use, and harm perceptions in US adults. | 2012 | 274 |
cigarett-tobacco-smoke-studi | Awareness and ever-use of electronic cigarettes among U.S. adults, 2010-2011. | 2013 | 238 |
cigarett-tobacco-smoke-studi | The global epidemiology of waterpipe smoking. | 2015 | 116 |
confer-research-particip-health | White Paper Report of the 2010 RAD-AID Conference on International Radiology for Developing Countries: identifying sustainable strategies for imaging services in the developing world. | 2011 | 7 |
confer-research-particip-health | Community-oriented integrated care and health promotion - views from the street. | 2015 | 3 |
confer-research-particip-health | Assessing the need for a new nationally representative household panel survey in the United States. | 2015 | 2 |
data-clinic-research-develop-health | Big data analytics in healthcare: promise and potential. | 2014 | 167 |
data-clinic-research-develop-health | Big data and biomedical informatics: a challenging opportunity. | 2014 | 34 |
data-clinic-research-develop-health | Big Data Application in Biomedical Research and Health Care: A Literature Review. | 2016 | 28 |
data-identifi-approach-health-result-base | Identifying localized changes in large systems: Change-point detection for biomolecular simulations. | 2015 | 6 |
data-identifi-approach-health-result-base | Getting the Word Out: New Approaches for Disseminating Public Health Science. | 2018 | 6 |
data-identifi-approach-health-result-base | A systematic review of data mining and machine learning for air pollution epidemiology. | 2017 | 6 |
diabet-risk-studi-health | Psychological language on Twitter predicts county-level heart disease mortality. | 2015 | 58 |
diabet-risk-studi-health | Position Statement on Active Outdoor Play. | 2015 | 28 |
diabet-risk-studi-health | Sharing data for public health research by members of an international online diabetes social network. | 2011 | 26 |
digit-data-social-health | Digital epidemiology. | 2012 | 111 |
digit-data-social-health | Assessing the feasibility and sample quality of a national random-digit dialing cellular phone survey of young adults. | 2014 | 16 |
digit-data-social-health | Ethical perspectives on recommending digital technology for patients with mental illness. | 2017 | 12 |
disast-respons-inform-health | Pro-anorexia and pro-recovery photo sharing: a tale of two warring tribes. | 2012 | 13 |
disast-respons-inform-health | Public Trauma after the Sewol Ferry Disaster: The Role of Social Media in Understanding the Public Mood. | 2015 | 12 |
disast-respons-inform-health | Local health department capacity for community engagement and its implications for disaster resilience. | 2013 | 11 |
diseas-surveil-approach-health | Approaches to passive mosquito surveillance in the EU. | 2015 | 29 |
diseas-surveil-approach-health | Data for action: collection and use of local data to end tuberculosis. | 2015 | 28 |
diseas-surveil-approach-health | Mapping population and pathogen movements. | 2014 | 14 |
drug-data-studi-health | Utilizing social media data for pharmacovigilance: A review. | 2015 | 88 |
drug-data-studi-health | Pharmacovigilance from social media: mining adverse drug reaction mentions using sequence labeling with word embedding cluster features. | 2015 | 63 |
drug-data-studi-health | Portable automatic text classification for adverse drug reaction detection via multi-corpus training. | 2015 | 43 |
facebook-post-media-social-studi-health | Leveraging Big Data to Improve Health Awareness Campaigns: A Novel Evaluation of the Great American Smokeout. | 2016 | 18 |
facebook-post-media-social-studi-health | Facebook Advertising Across an Engagement Spectrum: A Case Example for Public Health Communication. | 2016 | 11 |
facebook-post-media-social-studi-health | Social Network Behavior and Engagement Within a Smoking Cessation Facebook Page. | 2016 | 6 |
hiv-sex-prevent-studi | Minimal Awareness and Stalled Uptake of Pre-Exposure Prophylaxis (PrEP) Among at Risk, HIV-Negative, Black Men Who Have Sex with Men. | 2015 | 78 |
hiv-sex-prevent-studi | Acceptability of smartphone application-based HIV prevention among young men who have sex with men. | 2014 | 77 |
hiv-sex-prevent-studi | HIV incidence in men who have sex with men in England and Wales 2001-10: a nationwide population study. | 2013 | 72 |
influenza-data-time-health | National and local influenza surveillance through Twitter: an analysis of the 2012-2013 influenza epidemic. | 2013 | 112 |
influenza-data-time-health | Combining Search, Social Media, and Traditional Data Sources to Improve Influenza Surveillance. | 2015 | 58 |
influenza-data-time-health | Monitoring influenza epidemics in china with search query from baidu. | 2013 | 54 |
injuri-prevent-identifi-studi-health | A review of CDC’s Web-based Injury Statistics Query and Reporting System (WISQARS™): Planning for the future of injury surveillance. | 2017 | 12 |
injuri-prevent-identifi-studi-health | The Road Traffic Injuries Research Network: a decade of research capacity strengthening in low- and middle-income countries. | 2016 | 7 |
injuri-prevent-identifi-studi-health | Epidemiology of training injuries in amateur taekwondo athletes: a retrospective cohort study. | 2015 | 6 |
injuri-prevent-identifi-studi-health | Health and Economic Burden of Running-Related Injuries in Dutch Trailrunners: A Prospective Cohort Study. | 2017 | 6 |
injuri-risk-identifi-health | Active surveillance of sudden cardiac death in young athletes by periodic Internet searches. | 2013 | 7 |
injuri-risk-identifi-health | Police Brutality and Black Health: Setting the Agenda for Public Health Scholars. | 2017 | 6 |
injuri-risk-identifi-health | Coccidioidomycosis among cast and crew members at an outdoor television filming event–California, 2012. | 2014 | 5 |
measl-outbreak-vaccin-transmiss-public-health | Measles Outbreak with Unique Virus Genotyping, Ontario, Canada, 2015. | 2017 | 5 |
measl-outbreak-vaccin-transmiss-public-health | Social capital and pet ownership - A tale of four cities. | 2017 | 3 |
measl-outbreak-vaccin-transmiss-public-health | A national measles outbreak in Ireland linked to a single imported case, April to September, 2016. | 2018 | 3 |
media-social-research-health | A new dimension of health care: systematic review of the uses, benefits, and limitations of social media for health communication. | 2013 | 294 |
media-social-research-health | Social media: a review and tutorial of applications in medicine and health care. | 2014 | 95 |
media-social-research-health | Capturing the Patient’s Perspective: a Review of Advances in Natural Language Processing of Health-Related Text. | 2017 | 9 |
medicin-medic-patient-health-inform | Health 2050: The Realization of Personalized Medicine through Crowdsourcing, the Quantified Self, and the Participatory Biocitizen. | 2012 | 54 |
medicin-medic-patient-health-inform | Making sense of big data in health research: Towards an EU action plan. | 2016 | 44 |
medicin-medic-patient-health-inform | The new holism: P4 systems medicine and the medicalization of health and life itself. | 2016 | 27 |
messag-health-research-inform-method-studi | The REporting of studies Conducted using Observational Routinely-collected health Data (RECORD) statement. | 2015 | 356 |
messag-health-research-inform-method-studi | Media coverage of health issues and how to work more effectively with journalists: a qualitative study. | 2010 | 22 |
messag-health-research-inform-method-studi | Public health emergency preparedness and response communications with health care providers: a literature review. | 2011 | 11 |
mobil-data-health-studi | Mobile health (mHealth) approaches and lessons for increased performance and retention of community health workers in low- and middle-income countries: a review. | 2013 | 154 |
mobil-data-health-studi | The Asthma Mobile Health Study, a large-scale clinical observational study using ResearchKit. | 2017 | 32 |
mobil-data-health-studi | Health Worker mHealth Utilization: A Systematic Review. | 2016 | 12 |
null-surveil-public-health | Influenza A (H7N9) and the importance of digital epidemiology. | 2013 | 53 |
null-surveil-public-health | Ethical challenges of big data in public health. | 2015 | 49 |
null-surveil-public-health | Direct-to-Consumer Pharmaceutical Advertising: Therapeutic or Toxic? | 2011 | 26 |
obes-childhood-preval-prevent-activ-increas-research-studi-includ-health | Patterns of childhood obesity prevention legislation in the United States. | 2007 | 34 |
obes-childhood-preval-prevent-activ-increas-research-studi-includ-health | An evolving scientific basis for the prevention and treatment of pediatric obesity. | 2014 | 25 |
obes-childhood-preval-prevent-activ-increas-research-studi-includ-health | Incorporating primary and secondary prevention approaches to address childhood obesity prevention and treatment in a low-income, ethnically diverse population: study design and demographic data from the Texas Childhood Obesity Research Demonstration (TX CORD) study. | 2015 | 22 |
outbreak-viru-diseas-health | Zika Virus: Medical Countermeasure Development Challenges. | 2016 | 56 |
outbreak-viru-diseas-health | What factors might have led to the emergence of Ebola in West Africa? | 2015 | 55 |
outbreak-viru-diseas-health | The emergence of ebola as a global health security threat: from ‘lessons learned’ to coordinated multilateral containment efforts. | 2014 | 24 |
particip-ag-effect-studi-includ-health | Support for healthy breastfeeding mothers with healthy term babies. | 2017 | 33 |
particip-ag-effect-studi-includ-health | Interventions for promoting the initiation of breastfeeding. | 2016 | 22 |
particip-ag-effect-studi-includ-health | Factors influencing sex differences in poststroke functional outcome. | 2015 | 17 |
particip-ag-effect-studi-includ-health | Population-level interventions in government jurisdictions for dietary sodium reduction. | 2016 | 17 |
physic-activ-ag-studi-health | Trends in television time, non-gaming PC use and moderate-to-vigorous physical activity among German adolescents 2002-2010. | 2014 | 26 |
physic-activ-ag-studi-health | Recreational screen-time among Chinese adolescents: a cross-sectional study. | 2014 | 19 |
physic-activ-ag-studi-health | Aerobic Capacity, Physical Activity and Metabolic Risk Factors in Firefighters Compared with Police Officers and Sedentary Clerks. | 2015 | 17 |
research-public-health-develop | Promoting integrated approaches to reducing health inequities among low-income workers: applying a social ecological framework. | 2014 | 44 |
research-public-health-develop | The public health exposome: a population-based, exposure science approach to health disparities research. | 2014 | 32 |
research-public-health-develop | Nature Contact and Human Health: A Research Agenda. | 2017 | 31 |
resist-effect-public-health | The comprehensive antibiotic resistance database. | 2013 | 439 |
resist-effect-public-health | Dissemination of health information through social networks: twitter and antibiotics. | 2010 | 156 |
resist-effect-public-health | Management of patients with multidrug-resistant/extensively drug-resistant tuberculosis in Europe: a TBNET consensus statement. | 2014 | 93 |
search-data-inform-studi | FluBreaks: early epidemic detection from Google flu trends. | 2012 | 23 |
search-data-inform-studi | Incidence of online health information search: a useful proxy for public health risk perception. | 2013 | 12 |
search-data-inform-studi | Surveillance Tools Emerging From Search Engines and Social Media Data for Determining Eye Disease Patterns. | 2016 | 11 |
sleep-behavior-time-cross-studi-data-health | Characterizing Sleep Issues Using Twitter. | 2015 | 16 |
sleep-behavior-time-cross-studi-data-health | Sleep, Health and Wellness at Work: A Scoping Review. | 2017 | 14 |
sleep-behavior-time-cross-studi-data-health | Digital Media and Sleep in Childhood and Adolescence. | 2017 | 8 |
sleep-behavior-time-cross-studi-data-health | Decreases in self-reported sleep duration among U.S. adolescents 2009-2015 and association with new media screen time. | 2017 | 8 |
social-health-public-studi | The influence of social networking sites on health behavior change: a systematic review and meta-analysis. | 2015 | 120 |
social-health-public-studi | Comparison of response rates and cost-effectiveness for a community-based survey: postal, internet and telephone modes with generic or personalised recruitment approaches. | 2012 | 63 |
social-health-public-studi | Public preferences about secondary uses of electronic health information. | 2013 | 44 |
suicid-prevent-rate-risk-health | Predicting national suicide numbers with social media data. | 2013 | 26 |
suicid-prevent-rate-risk-health | Suicide among children and adolescents in Canada: trends and sex differences, 1980-2008. | 2012 | 25 |
suicid-prevent-rate-risk-health | Accessing suicide-related information on the internet: a retrospective observational study of search behavior. | 2013 | 21 |
suicid-prevent-relat-develop-health-studi | Connecting the invisible dots: reaching lesbian, gay, and bisexual adolescents and young adults at risk for suicide through online social networks. | 2009 | 26 |
suicid-prevent-relat-develop-health-studi | Efficacy of Web-Based Collection of Strength-Based Testimonials for Text Message Extension of Youth Suicide Prevention Program: Randomized Controlled Experiment. | 2016 | 5 |
suicid-prevent-relat-develop-health-studi | Artificial intelligence (AI) and global health: how can AI contribute to health in resource-poor settings? | 2018 | 4 |
surveil-diseas-data-public | Scoping review on search queries and social media for disease surveillance: a chronology of innovation. | 2013 | 46 |
surveil-diseas-data-public | Social media in public health. | 2013 | 24 |
surveil-diseas-data-public | Optimizing provider recruitment for influenza surveillance networks. | 2012 | 18 |
surveil-diseas-data-public | Health department use of social media to identify foodborne illness - Chicago, Illinois, 2013-2014. | 2014 | 18 |
surveil-diseas-public-health | Internet-based surveillance systems for monitoring emerging infectious diseases. | 2014 | 61 |
surveil-diseas-public-health | Social media and internet-based data in global systems for public health surveillance: a systematic review. | 2014 | 45 |
surveil-diseas-public-health | Enhancing disease surveillance with novel data streams: challenges and opportunities. | 2015 | 26 |
technologi-research-commun-data-health-develop | Using Electronic Health Records for Population Health Research: A Review of Methods and Applications. | 2016 | 41 |
technologi-research-commun-data-health-develop | Use of health information technology among racial and ethnic underserved communities. | 2011 | 18 |
technologi-research-commun-data-health-develop | Public preferences and the challenge to genetic research policy. | 2014 | 9 |
tweet-twitter-social-health | Social media use in the United States: implications for health communication. | 2009 | 257 |
tweet-twitter-social-health | Adoption and use of social media among public health departments. | 2012 | 79 |
tweet-twitter-social-health | Tweaking and tweeting: exploring Twitter for nonmedical use of a psychostimulant drug (Adderall) among college students. | 2013 | 54 |
vaccin-public-inform-health | Vaccine hesitancy: an overview. | 2013 | 122 |
vaccin-public-inform-health | Assessing vaccination sentiments with online social media: implications for infectious disease dynamics and control. | 2011 | 119 |
vaccin-public-inform-health | Communicating with parents about vaccination: a framework for health professionals. | 2012 | 117 |
We can review the commonest MeSH headings associated with each cluster label.
labels$results %>%
left_join(search1, by = c("pmid.value" = "pmid")) %>%
select(clus_names, mesh) %>%
filter(mesh != "NULL") %>%
unnest(mesh) %>%
count(clus_names, mesh,sort = TRUE) %>%
filter(n < 30) %>%
ungroup() %>%
group_by(clus_names) %>%
top_n(10) %>%
mutate(summary = paste(mesh, collapse = "; " )) %>%
select(-c(mesh, n)) %>%
distinct() %>%
arrange(clus_names) %>%
knitr::kable()
clus_names | summary |
---|---|
abstract-annual-null-meet | Humans; Animals; Internal Medicine; Athletic Injuries; California; Education, Pharmacy; Libraries, Medical; Library Associations; Schools, Pharmacy; Sports; Students, Pharmacy; United States |
alcohol-drink-studi-health | Humans; Alcohol Drinking; Female; Male; Adolescent; Young Adult; Adult; Social Media; Alcoholic Beverages; Alcoholic Intoxication |
biosurveil-system-data-inform-identifi-health | Humans; Biosurveillance; Internet; Animals; Bioterrorism; Communicable Disease Control; Databases, Factual; Decision Support Techniques; Disease Outbreaks; Influenza A Virus, H1N1 Subtype; Influenza, Human; Public Health; Public Health Surveillance; Statistics as Topic |
canadian-activ-survei-develop-health-time-includ | Humans; Canada; Child, Preschool; Exercise; Female; Male; Infant; Guidelines as Topic; Infant, Newborn; Adult; Guideline Adherence; Health Promotion; Sleep; Surveys and Questionnaires; Time Factors; United States; Young Adult |
cancer-research-studi-health | Neoplasms; Adult; Middle Aged; Aged; United States; Young Adult; Adolescent; Early Detection of Cancer; Survivors; Health Knowledge, Attitudes, Practice |
care-improv-health-includ-studi | Humans; Female; Adult; Male; United States; Internet; Attitude of Health Personnel; Canada; Delivery of Health Care; Health Promotion; Middle Aged; Practice Guidelines as Topic; Pregnancy; Qualitative Research; Retrospective Studies; Risk Assessment; Treatment Outcome; Young Adult |
cigarett-tobacco-smoke-studi | Adult; Young Adult; Tobacco Products; Smoking Cessation; United States; Cross-Sectional Studies; Internet; Middle Aged; Marketing; Tobacco Industry |
confer-research-particip-health | Humans; Biomedical Research; Congresses as Topic; Africa; Community-Institutional Relations; Cooperative Behavior; Delivery of Health Care, Integrated; Developing Countries; Diagnostic Imaging; Global Health; Group Processes; Health Education; Health Knowledge, Attitudes, Practice; Health Policy; Health Services Research; Health Status Disparities; Healthcare Disparities; Information Dissemination; International Agencies; Medical Informatics; Models, Theoretical; Noncommunicable Diseases; Organizational Objectives; Patient Education as Topic; Public Health; Radiology; Research; Students, Medical; United States; World Health Organization |
data-clinic-research-develop-health | Humans; Medical Informatics; Electronic Health Records; Data Collection; Data Mining; Databases, Factual; Delivery of Health Care; Privacy; United States; Biomedical Research; Confidentiality; Data Anonymization; Datasets as Topic; Epidemiology; Information Systems; Internet; Medical Records; Public Health Informatics; Reproducibility of Results; Social Media |
data-identifi-approach-health-result-base | Humans; Air Pollution; Data Mining; Epidemiological Monitoring; Population Surveillance; Algorithms; Armed Conflicts; Artificial Intelligence; Bacterial Infections; Bibliometrics; Biophysical Phenomena; Bombs; Cities; Conservation of Natural Resources; Crime Victims; Crowdsourcing; Culture; Databases, Factual; Delivery of Health Care; Demography; Drug Resistance, Bacterial; Environmental Monitoring; Epidemiologic Studies; Exposure to Violence; Extraction and Processing Industry; Formaldehyde; Geography; Government; Health Facilities; Health Personnel; Health Resources; Health Workforce; History, 20th Century; History, 21st Century; Incidence; Internet; Likelihood Functions; Machine Learning; Malaria; Malawi; Mass Casualty Incidents; Medical Informatics; Models, Biological; Models, Theoretical; Molecular Dynamics Simulation; Natural Gas; Oil and Gas Fields; Physicians; Pilot Projects; Policy; Policy Making; Protein Conformation; Protein Folding; Proteins; Satellite Imagery; Social Planning; Social Values; Software; Surveys and Questionnaires; Sweden; Syria; Texas; Uncertainty |
data-inform-health-studi | Global Health; Health Knowledge, Attitudes, Practice; Child, Preschool; Health Policy; Risk Factors; Canada; Health Promotion; Pregnancy; Influenza, Human; Information Dissemination |
diabet-risk-studi-health | Humans; Male; Female; Diabetes Mellitus; Middle Aged; Risk Factors; Adult; United States; Cross-Sectional Studies; Social Media; Social Support |
digit-data-social-health | Humans; Adolescent; Female; Internet; Male; Public Health; Social Media; Adult; Cell Phones; Data Collection; Population Surveillance; Smoking; Smoking Cessation; Telemedicine; United States; Young Adult |
disast-respons-inform-health | Humans; Disasters; Cyclonic Storms; Adolescent; Female; Male; Social Media; Child; Adult; Child, Preschool; Disaster Planning; Public Health |
diseas-surveil-approach-health | Humans; Animals; Communicable Diseases; Disease Outbreaks; Male; Disease Vectors; Female; Poverty; Public Health Surveillance; Travel; United States |
drug-data-studi-health | Social Media; Internet; Drug-Related Side Effects and Adverse Reactions; Pharmacovigilance; United States; Adverse Drug Reaction Reporting Systems; Data Mining; Databases, Factual; United States Food and Drug Administration; Prescription Drugs |
facebook-post-media-social-studi-health | Humans; Social Media; Adult; Female; Communication; Cross-Sectional Studies; Data Collection; Government Agencies; Health Promotion; Hospitals; Information Seeking Behavior; Malaysia; Male; Middle Aged; Physicians; Public Health; Smoking Cessation; Social Behavior; Social Networking; Social Support; Surveys and Questionnaires; Taiwan; Wit and Humor as Topic |
hiv-sex-prevent-studi | Sexual Partners; Risk Factors; Cross-Sectional Studies; Mass Screening; Risk-Taking; Internet; Surveys and Questionnaires; Health Knowledge, Attitudes, Practice; Pre-Exposure Prophylaxis; Sexual and Gender Minorities; Sexually Transmitted Diseases |
influenza-data-time-health | Internet; United States; Seasons; Disease Outbreaks; Forecasting; Social Media; Population Surveillance; Epidemiological Monitoring; Female; Models, Statistical |
injuri-prevent-identifi-studi-health | Humans; Wounds and Injuries; Adolescent; Child; Risk Factors; United States; Centers for Disease Control and Prevention (U.S.); Population Surveillance; Public Health; Adult; Athletic Injuries; Health Promotion; Incidence; Suicide |
injuri-risk-identifi-health | Humans; Female; Male; Adult; Young Adult; Middle Aged; Accidents, Traffic; Adolescent; Incidence; Risk Factors; United States; Wounds and Injuries |
measl-outbreak-vaccin-transmiss-public-health | Humans; Measles; Disease Outbreaks; Vaccination; Adolescent; Child; Communicable Disease Control; Female; Genotype; Male; Measles virus |
media-social-research-health | Social Media; Humans; Confidentiality; Delivery of Health Care; Health Communication; Health Personnel; Health Promotion; Internet; Medical Informatics; Privacy; Social Networking |
medicin-medic-patient-health-inform | Humans; Databases, Factual; Precision Medicine; Delivery of Health Care; Algorithms; Big Data; Biomedical Research; Evidence-Based Medicine; Health Behavior; Health Knowledge, Attitudes, Practice; Healthy Lifestyle; Information Dissemination; Preventive Medicine; Public Health |
messag-health-research-inform-method-studi | Humans; Health Personnel; Natural Language Processing; Public Health; Biomedical Research; Checklist; Electronic Mail; Guidelines as Topic; Interviews as Topic; Research Report |
mobil-data-health-studi | Humans; Male; Adult; Female; Middle Aged; Telemedicine; Aged; Cell Phone; Communication; Community Health Workers; Delivery of Health Care; Developing Countries; Health Services Accessibility; Mobile Applications; Population Surveillance; Prospective Studies; Smartphone; Surveys and Questionnaires; Text Messaging; Young Adult |
null-surveil-public-health | Public Health; United States; Social Media; Health Promotion; Population Surveillance; Public Health Surveillance; Animals; Chronic Disease; Cooperative Behavior; Female; Internet |
obes-childhood-preval-prevent-activ-increas-research-studi-includ-health | Humans; Adolescent; Female; Male; Pediatric Obesity; Child; Obesity; Adult; Cross-Sectional Studies; Health Promotion; Prevalence; Public Health |
outbreak-viru-diseas-health | Hemorrhagic Fever, Ebola; Female; Animals; Male; Public Health; United States; Adult; Middle Aged; Population Surveillance; Zika Virus Infection |
particip-ag-effect-studi-includ-health | Male; Adult; Adolescent; Young Adult; Middle Aged; Child; Aged; Health Behavior; United States; Exercise; Pregnancy |
physic-activ-ag-studi-health | Humans; Female; Male; Adolescent; Child; Exercise; Adult; Socioeconomic Factors; Young Adult; Cross-Sectional Studies; Internet; Middle Aged; Prospective Studies; Risk Factors; Surveys and Questionnaires |
research-public-health-develop | United States; Public Health; Female; Health Policy; Male; Delivery of Health Care; Adult; Environmental Exposure; Health Status Disparities; Policy Making |
resist-effect-public-health | Anti-Bacterial Agents; Drug Resistance, Bacterial; Female; Male; Adult; Drug Resistance, Microbial; Food Safety; Food Supply; Middle Aged; Food Industry; Health Knowledge, Attitudes, Practice; Young Adult |
search-data-inform-studi | Humans; Internet; United States; Search Engine; Female; Population Surveillance; Adult; Centers for Disease Control and Prevention (U.S.); Forecasting; Incidence; Influenza, Human; Male; Public Health; Risk Assessment; Social Media |
sleep-behavior-time-cross-studi-data-health | Humans; Sleep; Female; Adolescent; Cross-Sectional Studies; Male; Aged; Middle Aged; Sleep Wake Disorders; Social Media; Time Factors |
social-health-public-studi | Humans; Adult; Female; Male; Health Behavior; Health Promotion; Middle Aged; Social Media; Aged; United States |
suicid-prevent-rate-risk-health | Humans; Suicide; Male; Female; Adult; Risk Factors; Middle Aged; Retrospective Studies; Adolescent; China; Internet; Models, Statistical; Primary Prevention; Republic of Korea; Search Engine; Social Media; Suicidal Ideation; Suicide, Attempted; Young Adult |
suicid-prevent-relat-develop-health-studi | Adolescent; Adult; Female; Humans; Internet; Male; Suicide; Young Adult; Adolescent Behavior; Age Factors; Algorithms; Child; Confidence Intervals; Focus Groups; Homosexuality, Female; Homosexuality, Male; Information Seeking Behavior; Monte Carlo Method; Pilot Projects; Prevalence; Qualitative Research; Risk Assessment; Risk Factors; Self-Injurious Behavior; Social Support; Suicide, Attempted; United Kingdom; United States |
surveil-diseas-data-public | Humans; Social Media; Disease Outbreaks; Population Surveillance; Influenza, Human; Models, Statistical; China; Communicable Diseases; Influenza A Virus, H7N9 Subtype; Internet; Public Health; Software |
surveil-diseas-public-health | Humans; Communicable Diseases; Public Health Surveillance; Internet; Population Surveillance; Disease Outbreaks; Social Media; Public Health; Animals; Data Collection; Epidemiological Monitoring |
technologi-research-commun-data-health-develop | Humans; Electronic Health Records; Access to Information; Age Factors; Air Pollutants; Air Pollution; Cell Phone; Communication; Community Health Workers; Computer Security; Continental Population Groups; Cultural Diversity; Culture; Data Collection; Data Curation; Data Mining; Dementia; Diagnostic Techniques and Procedures; Disabled Persons; Environment; Environmental Monitoring; Epidemiologic Research Design; Ethnic Groups; Focus Groups; Forecasting; Health Behavior; Health Knowledge, Attitudes, Practice; Health Services Accessibility; Health Status Disparities; Hospital Information Systems; Human Rights; India; Malawi; Maryland; Medical Records Systems, Computerized; Medically Underserved Area; Mental Health; Mobile Applications; Models, Theoretical; Patient Satisfaction; Physician-Patient Relations; Public Health; Reproducibility of Results; Research; Rural Health Services; Self-Help Devices; Sex Factors; Social Environment; Socioeconomic Factors; United Nations; United States; Universal Health Insurance; Vital Signs |
tweet-twitter-social-health | United States; Internet; Information Dissemination; Female; Male; Public Health; Adult; Communication; Disease Outbreaks; Adolescent; Data Collection; Public Opinion |
vaccin-public-inform-health | Male; Papillomavirus Vaccines; Health Knowledge, Attitudes, Practice; Social Media; Child; Immunization Programs; Adolescent; Adult; Patient Acceptance of Health Care; United States |
Let's explore the articles for which Public Health is a MeSH heading.
ph <- labels$results %>%
left_join(search1, by = c("pmid.value" = "pmid")) %>%
filter(str_detect(keywords, "Public Health"))
ph %>%
count(clus_names, sort = TRUE)
#> # A tibble: 39 x 2
#> clus_names n
#> <chr> <int>
#> 1 data-inform-health-studi 108
#> 2 null-surveil-public-health 27
#> 3 tweet-twitter-social-health 22
#> 4 influenza-data-time-health 17
#> 5 outbreak-viru-diseas-health 16
#> 6 vaccin-public-inform-health 16
#> 7 research-public-health-develop 15
#> 8 surveil-diseas-public-health 13
#> 9 cigarett-tobacco-smoke-studi 12
#> 10 hiv-sex-prevent-studi 12
#> # ... with 29 more rows
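Note that str_detect matches substrings, so the pattern "Public Health" also picks up headings such as "Public Health Surveillance" or "Public Health Informatics". If only the exact heading is wanted, one option is to split the keywords into one row per heading first. The following is a minimal sketch on a made-up tibble (toy and its values are illustrative, not the search results), assuming the headings are separated by "; ":

```r
library(dplyr)
library(stringr)
library(tidyr)

# Toy data: hypothetical pmids with "; "-separated MeSH headings
toy <- tibble::tibble(
  pmid = c("1", "2", "3"),
  keywords = c(
    "Humans; Public Health; Social Media",
    "Humans; Public Health Surveillance; Internet",
    "Humans; Influenza, Human"
  )
)

# Substring match: also catches "Public Health Surveillance"
substring_hits <- toy %>%
  filter(str_detect(keywords, "Public Health"))

# Exact match: one row per heading, then compare whole headings
exact_hits <- toy %>%
  separate_rows(keywords, sep = ";\\s*") %>%
  mutate(keywords = str_trim(keywords)) %>%
  filter(keywords == "Public Health") %>%
  distinct(pmid)

substring_hits$pmid
exact_hits$pmid
```

Whether the broader or the stricter match is preferable depends on the review question; the substring version is the more inclusive of the two.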
One article tagged with ai-intellig-artifici-health-data has Public Health as a MeSH heading. We can use epmc_ftxt to extract the full text of the article.
library(rvest)
get_pmcids <- ph %>%
filter(clus_names == "data-research-health-develop") %>%
select(id, pmcid) %>%
filter(!is.na(pmcid))
details <- mutate(ids, details = map(get_ids, epmc_details))
full_text <- details %>%
mutate(full_text = map(details, "ftx")) %>%
unnest(full_text) %>%
filter(availability == "Free") %>%
left_join(get_pmcids, by = c("value" = "id")) %>%
distinct()
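# epmc_ftxt() returns the full text of an open-access article as an XML document
full_text <- europepmc::epmc_ftxt("PMC5171550")
# Extract the text content from the XML
ft <- full_text %>%
html_text()
# Split into rough sentences on ". " and display the result as a table
ft %>%
str_split(., "\\. ") %>%
enframe() %>%
formattable::formattable()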
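Splitting on "\\. " only breaks at a full stop followed by a space, and it drops the full stop itself. As a rough alternative, assuming sentences end in ., ! or ? followed by whitespace, a lookbehind keeps the punctuation attached to each sentence. The text below is a made-up example rather than the retrieved article:

```r
library(stringr)
library(tibble)

# Made-up text standing in for the extracted article text
txt <- "Social media can support surveillance. Is it reliable? Results vary!"

# Split after ., ! or ? when followed by whitespace; the lookbehind
# keeps the terminating punctuation with each sentence
sentences <- str_split(txt, "(?<=[.!?])\\s+")[[1]]

# One row per sentence, numbered in order
sentence_tbl <- enframe(sentences, name = "sentence_no", value = "sentence")
sentence_tbl
```

This is still a heuristic: abbreviations such as "e.g. " or "et al. " will be split as if they ended a sentence.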
Finally we can gather all the abstracts into a single interactive table which can be searched, filtered and shared.
labels$results %>%
left_join(search1, by = c("pmid.value" = "pmid")) %>%
select(cluster, clus_names, doi, title, journalTitle, pubYear, citedByCount, absText) %>%
mutate(doi = paste0("<a href='https://doi.org/", doi, "'>doi</a>")) %>%
DT::datatable(escape = FALSE, extensions = c('Responsive','Buttons', 'FixedHeader'),
filter = "top",
options = list(
autoWidth = TRUE,
columnDefs = list(list(width = '450px')),
dom = 'Bfrtip',
buttons = c('csv', 'excel'),
fixedHeader=TRUE)
)
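When the link column is built with paste0, a record with a missing DOI ends up with a link ending in "NA". A minimal sketch of a helper that leaves missing values as NA, assuming DOIs resolve through https://doi.org/ (doi_link and the toy records are hypothetical and not part of any package used here):

```r
library(dplyr)

# Hypothetical helper: build an HTML link for a DOI, or NA when the DOI is missing
doi_link <- function(doi) {
  if_else(
    is.na(doi),
    NA_character_,
    paste0("<a href='https://doi.org/", doi, "' target='_blank'>doi</a>")
  )
}

# Made-up records; the second has no DOI
toy <- tibble::tibble(
  title = c("Paper A", "Paper B"),
  doi = c("10.1000/example", NA)
)

linked <- toy %>%
  mutate(doi = doi_link(doi))
linked
```

The resulting column can be passed to DT::datatable with escape = FALSE in the same way as above, so that the HTML is rendered as a clickable link.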