Introduction

  • Literature and evidence review is essential in public health practice
  • The volume of literature is growing exponentially
  • First steps are usually:
    • Developing search strategy
    • Reviewing and filtering abstracts
    • Obtaining full text (if possible)
    • Data extraction

This can be a manual, protracted and iterative process. It may involve using specialised searching services, downloading abstracts, reading and filtering, secondary searching and so on, and sifting many thousands of abstracts.

Often we may just want a rapid overview of the literature to help focus further reviewing.

In this vignette we demonstrate the use of R packages for large-scale extraction of abstracts, and analytical techniques for identifying topics or themes in those abstracts.

The vignette is based on a number of R packages:

  1. europepmc - a sophisticated tool which interacts with the Europe PMC API and provides access to additional fields.
  2. adjutant - a fully fledged package with retrieval and clustering functions.
  3. tidytext - a package for text mining using tidy data principles.
  4. Rtsne - uses the tSNE algorithm for dimensionality reduction and cluster visualisation.
  5. dbscan - applies the HDBSCAN algorithm for data clustering.
  6. myScrapers - wraps some functions built on other packages to automate the search, extraction, and filtering process.

We have “hacked” some of the functions in these packages and written additional functions to develop a workflow from searching and retrieval to analysis.

A simple example using europepmc

Searching Europe PubMed Central (epmc)

The europepmc package allows searching of Europe PMC via its API.

It can be downloaded from CRAN.


if(!require("europepmc")) install.packages("europepmc")
library(europepmc)

The main function is epmc_search, which allows us to search the site and retrieve article records, metadata and citation counts.

We’ll use it with the search term “social media” AND “public health” AND surveillance.
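The search term reaches the code as a document parameter (params$search). As a minimal sketch, assuming the parameter holds the query quoted above, the string could be built and passed to epmc_search as follows. The names search_term and run_search are illustrative only, and the call itself needs the europepmc package and an internet connection, so it is switched off by default.

```r
# Illustrative only: `search_term` stands in for params$search
search_term <- paste('"social media"', '"public health"', "surveillance",
                     sep = " AND ")
search_term

# Set to TRUE to query Europe PMC (needs the europepmc package and network access)
run_search <- FALSE
if (run_search && requireNamespace("europepmc", quietly = TRUE)) {
  hits <- europepmc::epmc_search(query = search_term, limit = 10)
  print(head(hits))
}
```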


head(epmc_search(params$search, limit = 10))
#> # A tibble: 6 x 28
#>   id    source pmid  doi   title authorString journalTitle pubYear
#>   <chr> <chr>  <chr> <chr> <chr> <chr>        <chr>        <chr>  
#> 1 3144~ MED    3144~ 10.1~ Harn~ Strathdee S~ Curr Opin H~ 2019   
#> 2 3141~ MED    3141~ 10.1~ Rece~ Conway M, H~ Yearb Med I~ 2019   
#> 3 3127~ MED    3127~ 10.1~ A Sy~ Karmegam D,~ Disaster Me~ 2019   
#> 4 3148~ MED    3148~ 10.2~ Comm~ Fontaine G,~ JMIR Public~ 2019   
#> 5 3141~ MED    3141~ 10.1~ Arti~ Thiébaut R,~ Yearb Med I~ 2019   
#> 6 PMC6~ PMC    <NA>  10.5~ Keyw~ Hawes AN.    Online J Pu~ 2019   
#> # ... with 20 more variables: journalIssn <chr>, pubType <chr>,
#> #   isOpenAccess <chr>, inEPMC <chr>, inPMC <chr>, hasPDF <chr>,
#> #   hasBook <chr>, citedByCount <int>, hasReferences <chr>,
#> #   hasTextMinedTerms <chr>, hasDbCrossReferences <chr>,
#> #   hasLabsLinks <chr>, hasTMAccessionNumbers <chr>, firstIndexDate <chr>,
#> #   firstPublicationDate <chr>, pmcid <chr>, issue <chr>,
#> #   journalVolume <chr>, pageInfo <chr>, hasSuppl <chr>

This doesn’t extract the abstract text or MeSH headings (keywords). To facilitate this we have wrapped the search function in get_full_search in myScrapers.

library(tictoc)
library(myScrapers)  # provides get_full_search()

tic()
search1 <- get_full_search(search = params$search, limit = params$limit)
toc()
#> 1695.38 sec elapsed

head(search1, 20)
#> # A tibble: 20 x 33
#>    id    source pmid  doi   title authorString journalTitle pubYear
#>    <chr> <chr>  <chr> <chr> <chr> <chr>        <chr>        <chr>  
#>  1 3144~ MED    3144~ 10.1~ Harn~ Strathdee S~ Curr Opin H~ 2019   
#>  2 3141~ MED    3141~ 10.1~ Rece~ Conway M, H~ Yearb Med I~ 2019   
#>  3 3127~ MED    3127~ 10.1~ A Sy~ Karmegam D,~ Disaster Me~ 2019   
#>  4 3148~ MED    3148~ 10.2~ Comm~ Fontaine G,~ JMIR Public~ 2019   
#>  5 3141~ MED    3141~ 10.1~ Arti~ Thiébaut R,~ Yearb Med I~ 2019   
#>  6 PMC6~ PMC    <NA>  10.5~ Keyw~ Hawes AN.    Online J Pu~ 2019   
#>  7 3144~ MED    3144~ 10.3~ Wher~ Majmundar A~ Int J Envir~ 2019   
#>  8 3116~ MED    3116~ 10.2~ Moni~ Liu S, Chen~ J Med Inter~ 2019   
#>  9 3094~ MED    3094~ 10.1~ Pers~ Degeling C,~ Health Res ~ 2019   
#> 10 3142~ MED    3142~ 10.1~ Soci~ Cesare N, N~ BMJ Open Sp~ 2019   
#> 11 3132~ MED    3132~ 10.1~ Avia~ Chen Y, Zha~ Sci Rep      2019   
#> 12 3128~ MED    3128~ 10.2~ Iden~ Chu KH, Col~ J Med Inter~ 2019   
#> 13 3106~ MED    3106~ 10.2~ A So~ Reuter K, M~ JMIR Public~ 2019   
#> 14 PMC6~ PMC    <NA>  10.5~ Anal~ Burkom H, D~ Online J Pu~ 2019   
#> 15 3092~ MED    3092~ 10.1~ Esta~ Brosch S, d~ Drug Saf     2019   
#> 16 PMC6~ PMC    <NA>  10.5~ Unde~ Park A, Wes~ Online J Pu~ 2019   
#> 17 3109~ MED    3109~ 10.2~ Goog~ Lykens J, P~ JMIR Public~ 2019   
#> 18 3070~ MED    3070~ 10.7~ Prec~ Bempong NE,~ J Glob Heal~ 2019   
#> 19 3143~ MED    3143~ 10.3~ IDOM~ Béré WRC, C~ Stud Health~ 2019   
#> 20 3131~ MED    3131~ 10.7~ Twit~ Schaible BJ~ Perm J       2019   
#> # ... with 25 more variables: journalIssn <chr>, pubType <chr>,
#> #   isOpenAccess <chr>, inEPMC <chr>, inPMC <chr>, hasPDF <chr>,
#> #   hasBook <chr>, citedByCount <int>, hasReferences <chr>,
#> #   hasTextMinedTerms <chr>, hasDbCrossReferences <chr>,
#> #   hasLabsLinks <chr>, hasTMAccessionNumbers <chr>, firstIndexDate <chr>,
#> #   firstPublicationDate <chr>, pmcid <chr>, issue <chr>,
#> #   journalVolume <chr>, pageInfo <chr>, hasSuppl <chr>, bookid <chr>,
#> #   name <int>, absText <list>, mesh <list>, keywords <chr>

The get_full_search function returns the same metadata as epmc_search, such as citation counts, whether the article is open access and whether a PDF is available, and adds MeSH headings and abstract text. By default, 1000 article descriptions are downloaded.

We can see how many articles are available altogether by running epmc_profile.


profile <- epmc_profile(query = params$search)

Running epmc_profile allows us to see that there are 2867 articles of which 2767 are full text articles, and 1934 are open access.

Analysing abstracts

Abstracts per year

Counting abstracts by publication year shows the growth in publication frequency over the last 3 years.


library(dplyr)    # %>% and count()
library(ggplot2)

search1 %>%
  count(pubYear) %>%
  ggplot(aes(pubYear, n)) +
  geom_col(fill = "blue") +
  labs(title = "Abstracts per year",
       subtitle = paste("Search: ", params$search)) +
  phecharts::theme_phe() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Journal frequency

Similarly, we can identify the most frequent journals.


journal_count <- search1 %>%
  count(journalTitle) %>%
  top_n(20, n) %>%       # keep the 20 most frequent journals (ties are kept)
  arrange(desc(n))

journal_count %>%
  ggplot(aes(reorder(journalTitle, n), n)) +
  geom_col(fill = "blue") +
  coord_flip() +
  labs(title = "Journal frequency") +
  phecharts::theme_phe()

JMIR Public Health Surveill and PLoS One are the most frequent journals publishing articles on “social media” AND “public health” AND surveillance.

Topic identification

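Once we have a data frame of 2869 records with abstract text, we can prepare the data for analysis. The create_corpus function, used in the next section, is designed for this. The code below shows the kind of preparation involved: selecting the fields of interest, dropping records with no abstract text, and combining title and abstract into one text field.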


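out1 <- search1 %>%
  # keep identifiers, bibliographic fields and the abstract text
  select(pmid, pmcid, doi, title, pubYear, citedByCount, absText, journalTitle) %>%
  # drop records with no abstract text
  filter(absText != "NULL") %>%
  # combine title and abstract into a single text field
  mutate(text = paste(title, absText))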

Text mining

We will use a method exemplified in the adjutant package, which uses unsupervised machine learning to try to cluster similar articles and attach themes.

In this approach we undertake some natural language processing. We will:

  • Split each abstract into single words
  • Remove numbers and common (stop) words
  • Stem each word (reduce it to its root, so that, for example, “vaccine” and “vaccination” both become “vaccin”)
  • Calculate the tf-idf score for each word in each abstract - this gives more weight to words which are more “typical” of the abstracts
  • Create a document feature matrix
  • Undertake dimensionality reduction using tSNE to simplify
  • Run HDBSCAN to identify clusters
  • Name the clusters
  • QA the result
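The first half of these steps (tokenising, removing stop words, scoring with tf-idf and building a document feature matrix) can be sketched in base R. This is an illustration on three invented abstracts with a hand-picked stop word list, not the code inside create_corpus. Stemming is left out to keep the example free of extra packages.

```r
# Toy abstracts standing in for real abstract text (invented for illustration)
abstracts <- c(
  a1 = "Twitter data can support influenza surveillance",
  a2 = "Vaccine hesitancy and the role of social media",
  a3 = "Influenza vaccine uptake and Twitter sentiment"
)

# A tiny hand-picked stop word list; real analyses use a much longer one
stop_words <- c("and", "the", "of", "can")

# 1. Tokenise: lower-case, then split on anything that is not a letter
tokens <- strsplit(tolower(abstracts), "[^a-z]+")

# 2. Remove stop words and empty strings (stemming is omitted here)
tokens <- lapply(tokens, function(w) w[nzchar(w) & !w %in% stop_words])

# 3. Document feature matrix of raw counts (rows = abstracts, columns = words)
vocab <- sort(unique(unlist(tokens)))
dfm <- t(sapply(tokens, function(w) table(factor(w, levels = vocab))))

# 4. tf-idf: term frequency within a document times log inverse document frequency
tf <- dfm / rowSums(dfm)
idf <- log(nrow(dfm) / colSums(dfm > 0))
tf_idf <- sweep(tf, 2, idf, `*`)

round(tf_idf, 3)
```

Words that occur in only one abstract (such as “data”) score higher than words shared between abstracts (such as “twitter”), which is the weighting the tf-idf step is meant to achieve.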

The ultimate output of this analysis is a visualisation of clustered and labelled abstracts and an interactive table.


library(tidytext)

corp <- create_corpus(df = search1)

head(corp$corpus)
#> # A tibble: 6 x 6
#>   pmid     word         n      tf   idf tf_idf
#>   <chr>    <chr>    <int>   <dbl> <dbl>  <dbl>
#> 1 17572960 abil         1 0.00595  3.34 0.0199
#> 2 17572960 activ        2 0.0119   1.63 0.0194
#> 3 17572960 address      1 0.00595  1.83 0.0109
#> 4 17572960 adopt        6 0.0357   2.97 0.106 
#> 5 17572960 adult        1 0.00595  2.48 0.0148
#> 6 17572960 advocaci     1 0.00595  4.05 0.0241
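As a quick check on how these columns relate, the printed (rounded) values can be multiplied by hand. The snippet uses only numbers copied from the output above and shows that tf_idf is the product of tf and idf. Because tf is a word's count divided by the number of words counted for the record, n / tf also gives a rough estimate of that total.

```r
# Values copied from the first and fourth rows of head(corp$corpus) above
tf  <- c(abil = 0.00595, adopt = 0.0357)
idf <- c(abil = 3.34,    adopt = 2.97)

# tf_idf is the product of the two columns: 0.0199 and 0.106
signif(tf * idf, 3)

# n / tf estimates the number of words counted for this record: about 168
round(c(abil = 1, adopt = 6) / tf)
```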

clust <- create_cluster(corpus = corp$corpus, minPts = 10)
#> 609.87 sec elapsed


clust$cluster_size
#> # A tibble: 43 x 2
#>    cluster     n
#>      <dbl> <int>
#>  1       0   767
#>  2       1   230
#>  3       2   113
#>  4       3    55
#>  5       4    11
#>  6       5    13
#>  7       6    15
#>  8       7   100
#>  9       8   121
#> 10       9    13
#> # ... with 33 more rows
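The internals of create_cluster are not shown here. A minimal sketch of the two steps it is described as performing, tSNE followed by HDBSCAN, might look like the following, using Rtsne and dbscan directly on a simulated matrix. The data, perplexity and minPts values are assumptions chosen for the toy example. In the output of dbscan's hdbscan function, cluster 0 holds noise points that were not assigned to any cluster, which is how the large cluster 0 above is probably best read.

```r
library(Rtsne)
library(dbscan)

set.seed(42)

# Toy stand-in for a document feature matrix:
# 3 groups of 30 "documents", each described by 10 features
centres <- rep(c(0, 5, 10), each = 30)
X <- matrix(rnorm(90 * 10, mean = centres, sd = 1), nrow = 90)

# Reduce to 2 dimensions; perplexity must be well below the number of rows
tsne <- Rtsne(X, dims = 2, perplexity = 10)

# Cluster the 2-D embedding; minPts is the smallest group treated as a cluster
cl <- hdbscan(tsne$Y, minPts = 10)

# Cluster 0, if present, holds points that HDBSCAN left unassigned (noise)
table(cl$cluster)
```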

Labelling clusters


labels <- label_clusters(corp$corpus, clustering = clust$clustering, top_n = 4)
#> 1.14 sec elapsed

labels$labels
#> # A tibble: 43 x 2
#> # Groups:   cluster [43]
#>    cluster clus_names                                   
#>      <dbl> <chr>                                        
#>  1       0 data-inform-health-studi                     
#>  2       1 hiv-sex-prevent-studi                        
#>  3       2 cancer-research-studi-health                 
#>  4       3 resist-effect-public-health                  
#>  5       4 measl-outbreak-vaccin-transmiss-public-health
#>  6       5 messag-health-research-inform-method-studi   
#>  7       6 abstract-annual-null-meet                    
#>  8       7 null-surveil-public-health                   
#>  9       8 vaccin-public-inform-health                  
#> 10       9 biosurveil-system-data-inform-identifi-health
#> # ... with 33 more rows
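How label_clusters builds these names is not shown. One plausible approach, sketched below on invented data, is to total the tf-idf score of each stemmed word within a cluster and join the highest-scoring words with hyphens. The data frames corpus and clustering and the helper label_one are hypothetical, and the real function may rank or tie-break differently (some labels above contain more than four terms).

```r
# Toy corpus in the same shape as corp$corpus, plus a cluster id per document
corpus <- data.frame(
  pmid   = c("1", "1", "2", "2", "3", "3"),
  word   = c("hiv", "prevent", "hiv", "sex", "vaccin", "measl"),
  tf_idf = c(0.30, 0.10, 0.25, 0.20, 0.40, 0.35)
)
clustering <- data.frame(pmid = c("1", "2", "3"), cluster = c(1, 1, 2))

# Attach cluster ids, then total the tf-idf of each word within each cluster
scored <- merge(corpus, clustering, by = "pmid")
totals <- aggregate(tf_idf ~ cluster + word, data = scored, FUN = sum)

# Hypothetical helper: keep the top words of one cluster and join with hyphens
label_one <- function(d, top_n = 2) {
  top <- head(d[order(-d$tf_idf), ], top_n)
  paste(top$word, collapse = "-")
}

cluster_labels <- sapply(split(totals, totals$cluster), label_one)
cluster_labels
```

With this toy data the two clusters are labelled “hiv-sex” and “vaccin-measl”.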

Visualise


p <- labels$results %>%
  left_join(search1, by = c("pmid.value" = "pmid")) %>%
  ggplot(aes(X1, X2)) +
  geom_point(aes(colour = clustered, size = citedByCount) ) +
  ggrepel::geom_text_repel(data = labels$plot, aes(medX, medY, label = clus_names), size = 3, colour = "red", alpha = 0.9)

p + scale_alpha_manual(values=c(1,0)) +
  viridis::scale_color_viridis(discrete = TRUE, option = "viridis", alpha = .6, begin = .8, end = .1) +
  phecharts::theme_phe() +
  theme(panel.background = element_rect(fill = "#f0f0f0")) +
  labs(subtitle = paste("Clustering: ", nrow(labels$plot), " topics" ), 
       title = paste("Search ", "= ", params$search ))
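The plot places each cluster name at the coordinates medX and medY taken from labels$plot. How those are computed is not shown; the column names suggest the median position of each cluster's points, which could be derived as in this sketch. The data frame pts is invented, and excluding cluster 0 is an assumption.

```r
# Toy 2-D embedding with a cluster id per point (0 = unclustered)
pts <- data.frame(
  X1 = c(1, 2, 3, 10, 11, 12, 50),
  X2 = c(1, 3, 2, 10, 12, 11, 50),
  cluster = c(1, 1, 1, 2, 2, 2, 0)
)

# One label position per cluster: the median of its points on each axis
clustered_pts <- pts[pts$cluster != 0, ]
centres <- aggregate(clustered_pts[, c("X1", "X2")],
                     by = list(cluster = clustered_pts$cluster),
                     FUN = median)
names(centres) <- c("cluster", "medX", "medY")
centres
```

Using the median rather than the mean keeps a label near the bulk of its cluster even when a few points sit far from the rest.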

Understanding the labels

Most cited articles


most_cited <- labels$results %>%
  left_join(search1, by = c("pmid.value" = "pmid")) %>%
  filter(cluster !=0) %>%
  group_by(clus_names) %>%
  top_n(n = 3, citedByCount) %>%
  select(clus_names, title, pubYear, citedByCount) %>%
  ungroup() %>%
  arrange(clus_names, -citedByCount)

most_cited %>%
  formattable::formattable()

clus_names | title | pubYear | citedByCount
--- | --- | --- | ---
abstract-annual-null-meet | Abstracts of the 36th Annual Meeting of the Society of General Internal Medicine. April 24-27, 2013. Denver, Colorado, USA. | 2013 | 3
abstract-annual-null-meet | Abstracts from the 38th annual meeting of the society of general internal medicine. | 2015 | 3
abstract-annual-null-meet | Abstracts of the 2014 NATA Annual Meeting & Clinical Symposia, June 26-28, 2014, Indianapolis, Indiana. | 2014 | 2
alcohol-drink-studi-health | New research findings since the 2007 Surgeon General’s Call to Action to Prevent and Reduce Underage Drinking: a review. | 2014 | 35
alcohol-drink-studi-health | Use of alcohol before suicide in the United States. | 2014 | 25
alcohol-drink-studi-health | A feasibility study of short message service text messaging as a surveillance tool for alcohol consumption and vehicle for interventions in university students. | 2013 | 17
biosurveil-system-data-inform-identifi-health | Advancing a framework to enable characterization and evaluation of data streams useful for biosurveillance. | 2014 | 7
biosurveil-system-data-inform-identifi-health | Biosurveillance capability requirements for the global health security agenda: lessons from the 2009 H1N1 pandemic. | 2014 | 6
biosurveil-system-data-inform-identifi-health | Digital disease detection: A systematic review of event-based internet biosurveillance systems. | 2017 | 6
canadian-activ-survei-develop-health-time-includ | Canadian 24-Hour Movement Guidelines for the Early Years (0-4 years): An Integration of Physical Activity, Sedentary Behaviour, and Sleep. | 2017 | 30
canadian-activ-survei-develop-health-time-includ | A collaborative approach to adopting/adapting guidelines - The Australian 24-Hour Movement Guidelines for the early years (Birth to 5 years): an integration of physical activity, sedentary behavior, and sleep. | 2017 | 22
canadian-activ-survei-develop-health-time-includ | Health insurance coverage and its impact on medical cost: observations from the floating population in China. | 2014 | 15
cancer-research-studi-health | Communication inequalities and public health implications of adult social networking site use in the United States. | 2010 | 50
cancer-research-studi-health | Principles and Recommendations for the Provision of Healthcare in Canada to Adolescent and Young Adult-Aged Cancer Patients and Survivors. | 2011 | 41
cancer-research-studi-health | Talking About Cancer and Meeting Peer Survivors: Social Information Needs of Adolescents and Young Adults Diagnosed with Cancer. | 2013 | 38
care-improv-health-includ-studi | The COMET Handbook: version 1.0. | 2017 | 101
care-improv-health-includ-studi | Prevention of acute exacerbations of COPD: American College of Chest Physicians and Canadian Thoracic Society Guideline. | 2015 | 69
care-improv-health-includ-studi | Harmonized patient-reported data elements in the electronic health record: supporting meaningful use by primary care action on health behaviors and key psychosocial factors. | 2012 | 65
cigarett-tobacco-smoke-studi | e-Cigarette awareness, use, and harm perceptions in US adults. | 2012 | 274
cigarett-tobacco-smoke-studi | Awareness and ever-use of electronic cigarettes among U.S. adults, 2010-2011. | 2013 | 238
cigarett-tobacco-smoke-studi | The global epidemiology of waterpipe smoking. | 2015 | 116
confer-research-particip-health | White Paper Report of the 2010 RAD-AID Conference on International Radiology for Developing Countries: identifying sustainable strategies for imaging services in the developing world. | 2011 | 7
confer-research-particip-health | Community-oriented integrated care and health promotion - views from the street. | 2015 | 3
confer-research-particip-health | Assessing the need for a new nationally representative household panel survey in the United States. | 2015 | 2
data-clinic-research-develop-health | Big data analytics in healthcare: promise and potential. | 2014 | 167
data-clinic-research-develop-health | Big data and biomedical informatics: a challenging opportunity. | 2014 | 34
data-clinic-research-develop-health | Big Data Application in Biomedical Research and Health Care: A Literature Review. | 2016 | 28
data-identifi-approach-health-result-base | Identifying localized changes in large systems: Change-point detection for biomolecular simulations. | 2015 | 6
data-identifi-approach-health-result-base | Getting the Word Out: New Approaches for Disseminating Public Health Science. | 2018 | 6
data-identifi-approach-health-result-base | A systematic review of data mining and machine learning for air pollution epidemiology. | 2017 | 6
diabet-risk-studi-health | Psychological language on Twitter predicts county-level heart disease mortality. | 2015 | 58
diabet-risk-studi-health | Position Statement on Active Outdoor Play. | 2015 | 28
diabet-risk-studi-health | Sharing data for public health research by members of an international online diabetes social network. | 2011 | 26
digit-data-social-health | Digital epidemiology. | 2012 | 111
digit-data-social-health | Assessing the feasibility and sample quality of a national random-digit dialing cellular phone survey of young adults. | 2014 | 16
digit-data-social-health | Ethical perspectives on recommending digital technology for patients with mental illness. | 2017 | 12
disast-respons-inform-health | Pro-anorexia and pro-recovery photo sharing: a tale of two warring tribes. | 2012 | 13
disast-respons-inform-health | Public Trauma after the Sewol Ferry Disaster: The Role of Social Media in Understanding the Public Mood. | 2015 | 12
disast-respons-inform-health | Local health department capacity for community engagement and its implications for disaster resilience. | 2013 | 11
diseas-surveil-approach-health | Approaches to passive mosquito surveillance in the EU. | 2015 | 29
diseas-surveil-approach-health | Data for action: collection and use of local data to end tuberculosis. | 2015 | 28
diseas-surveil-approach-health | Mapping population and pathogen movements. | 2014 | 14
drug-data-studi-health | Utilizing social media data for pharmacovigilance: A review. | 2015 | 88
drug-data-studi-health | Pharmacovigilance from social media: mining adverse drug reaction mentions using sequence labeling with word embedding cluster features. | 2015 | 63
drug-data-studi-health | Portable automatic text classification for adverse drug reaction detection via multi-corpus training. | 2015 | 43
facebook-post-media-social-studi-health | Leveraging Big Data to Improve Health Awareness Campaigns: A Novel Evaluation of the Great American Smokeout. | 2016 | 18
facebook-post-media-social-studi-health | Facebook Advertising Across an Engagement Spectrum: A Case Example for Public Health Communication. | 2016 | 11
facebook-post-media-social-studi-health | Social Network Behavior and Engagement Within a Smoking Cessation Facebook Page. | 2016 | 6
hiv-sex-prevent-studi | Minimal Awareness and Stalled Uptake of Pre-Exposure Prophylaxis (PrEP) Among at Risk, HIV-Negative, Black Men Who Have Sex with Men. | 2015 | 78
hiv-sex-prevent-studi | Acceptability of smartphone application-based HIV prevention among young men who have sex with men. | 2014 | 77
hiv-sex-prevent-studi | HIV incidence in men who have sex with men in England and Wales 2001-10: a nationwide population study. | 2013 | 72
influenza-data-time-health | National and local influenza surveillance through Twitter: an analysis of the 2012-2013 influenza epidemic. | 2013 | 112
influenza-data-time-health | Combining Search, Social Media, and Traditional Data Sources to Improve Influenza Surveillance. | 2015 | 58
influenza-data-time-health | Monitoring influenza epidemics in china with search query from baidu. | 2013 | 54
injuri-prevent-identifi-studi-health | A review of CDC’s Web-based Injury Statistics Query and Reporting System (WISQARS™): Planning for the future of injury surveillance. | 2017 | 12
injuri-prevent-identifi-studi-health | The Road Traffic Injuries Research Network: a decade of research capacity strengthening in low- and middle-income countries. | 2016 | 7
injuri-prevent-identifi-studi-health | Epidemiology of training injuries in amateur taekwondo athletes: a retrospective cohort study. | 2015 | 6
injuri-prevent-identifi-studi-health | Health and Economic Burden of Running-Related Injuries in Dutch Trailrunners: A Prospective Cohort Study. | 2017 | 6
injuri-risk-identifi-health | Active surveillance of sudden cardiac death in young athletes by periodic Internet searches. | 2013 | 7
injuri-risk-identifi-health | Police Brutality and Black Health: Setting the Agenda for Public Health Scholars. | 2017 | 6
injuri-risk-identifi-health | Coccidioidomycosis among cast and crew members at an outdoor television filming event–California, 2012. | 2014 | 5
measl-outbreak-vaccin-transmiss-public-health | Measles Outbreak with Unique Virus Genotyping, Ontario, Canada, 2015. | 2017 | 5
measl-outbreak-vaccin-transmiss-public-health | Social capital and pet ownership - A tale of four cities. | 2017 | 3
measl-outbreak-vaccin-transmiss-public-health | A national measles outbreak in Ireland linked to a single imported case, April to September, 2016. | 2018 | 3
media-social-research-health | A new dimension of health care: systematic review of the uses, benefits, and limitations of social media for health communication. | 2013 | 294
media-social-research-health | Social media: a review and tutorial of applications in medicine and health care. | 2014 | 95
media-social-research-health | Capturing the Patient’s Perspective: a Review of Advances in Natural Language Processing of Health-Related Text. | 2017 | 9
medicin-medic-patient-health-inform | Health 2050: The Realization of Personalized Medicine through Crowdsourcing, the Quantified Self, and the Participatory Biocitizen. | 2012 | 54
medicin-medic-patient-health-inform | Making sense of big data in health research: Towards an EU action plan. | 2016 | 44
medicin-medic-patient-health-inform | The new holism: P4 systems medicine and the medicalization of health and life itself. | 2016 | 27
messag-health-research-inform-method-studi | The REporting of studies Conducted using Observational Routinely-collected health Data (RECORD) statement. | 2015 | 356
messag-health-research-inform-method-studi | Media coverage of health issues and how to work more effectively with journalists: a qualitative study. | 2010 | 22
messag-health-research-inform-method-studi | Public health emergency preparedness and response communications with health care providers: a literature review. | 2011 | 11
mobil-data-health-studi | Mobile health (mHealth) approaches and lessons for increased performance and retention of community health workers in low- and middle-income countries: a review. | 2013 | 154
mobil-data-health-studi | The Asthma Mobile Health Study, a large-scale clinical observational study using ResearchKit. | 2017 | 32
mobil-data-health-studi | Health Worker mHealth Utilization: A Systematic Review. | 2016 | 12
null-surveil-public-health | Influenza A (H7N9) and the importance of digital epidemiology. | 2013 | 53
null-surveil-public-health | Ethical challenges of big data in public health. | 2015 | 49
null-surveil-public-health | Direct-to-Consumer Pharmaceutical Advertising: Therapeutic or Toxic? | 2011 | 26
obes-childhood-preval-prevent-activ-increas-research-studi-includ-health | Patterns of childhood obesity prevention legislation in the United States. | 2007 | 34
obes-childhood-preval-prevent-activ-increas-research-studi-includ-health | An evolving scientific basis for the prevention and treatment of pediatric obesity. | 2014 | 25
obes-childhood-preval-prevent-activ-increas-research-studi-includ-health | Incorporating primary and secondary prevention approaches to address childhood obesity prevention and treatment in a low-income, ethnically diverse population: study design and demographic data from the Texas Childhood Obesity Research Demonstration (TX CORD) study. | 2015 | 22
outbreak-viru-diseas-health | Zika Virus: Medical Countermeasure Development Challenges. | 2016 | 56
outbreak-viru-diseas-health | What factors might have led to the emergence of Ebola in West Africa? | 2015 | 55
outbreak-viru-diseas-health | The emergence of ebola as a global health security threat: from ‘lessons learned’ to coordinated multilateral containment efforts. | 2014 | 24
particip-ag-effect-studi-includ-health | Support for healthy breastfeeding mothers with healthy term babies. | 2017 | 33
particip-ag-effect-studi-includ-health | Interventions for promoting the initiation of breastfeeding. | 2016 | 22
particip-ag-effect-studi-includ-health | Factors influencing sex differences in poststroke functional outcome. | 2015 | 17
particip-ag-effect-studi-includ-health | Population-level interventions in government jurisdictions for dietary sodium reduction. | 2016 | 17
physic-activ-ag-studi-health | Trends in television time, non-gaming PC use and moderate-to-vigorous physical activity among German adolescents 2002-2010. | 2014 | 26
physic-activ-ag-studi-health | Recreational screen-time among Chinese adolescents: a cross-sectional study. | 2014 | 19
physic-activ-ag-studi-health | Aerobic Capacity, Physical Activity and Metabolic Risk Factors in Firefighters Compared with Police Officers and Sedentary Clerks. | 2015 | 17
research-public-health-develop | Promoting integrated approaches to reducing health inequities among low-income workers: applying a social ecological framework. | 2014 | 44
research-public-health-develop | The public health exposome: a population-based, exposure science approach to health disparities research. | 2014 | 32
research-public-health-develop | Nature Contact and Human Health: A Research Agenda. | 2017 | 31
resist-effect-public-health | The comprehensive antibiotic resistance database. | 2013 | 439
resist-effect-public-health | Dissemination of health information through social networks: twitter and antibiotics. | 2010 | 156
resist-effect-public-health | Management of patients with multidrug-resistant/extensively drug-resistant tuberculosis in Europe: a TBNET consensus statement. | 2014 | 93
search-data-inform-studi | FluBreaks: early epidemic detection from Google flu trends. | 2012 | 23
search-data-inform-studi | Incidence of online health information search: a useful proxy for public health risk perception. | 2013 | 12
search-data-inform-studi | Surveillance Tools Emerging From Search Engines and Social Media Data for Determining Eye Disease Patterns. | 2016 | 11
sleep-behavior-time-cross-studi-data-health | Characterizing Sleep Issues Using Twitter. | 2015 | 16
sleep-behavior-time-cross-studi-data-health | Sleep, Health and Wellness at Work: A Scoping Review. | 2017 | 14
sleep-behavior-time-cross-studi-data-health | Digital Media and Sleep in Childhood and Adolescence. | 2017 | 8
sleep-behavior-time-cross-studi-data-health | Decreases in self-reported sleep duration among U.S. adolescents 2009-2015 and association with new media screen time. | 2017 | 8
social-health-public-studi | The influence of social networking sites on health behavior change: a systematic review and meta-analysis. | 2015 | 120
social-health-public-studi | Comparison of response rates and cost-effectiveness for a community-based survey: postal, internet and telephone modes with generic or personalised recruitment approaches. | 2012 | 63
social-health-public-studi | Public preferences about secondary uses of electronic health information. | 2013 | 44
suicid-prevent-rate-risk-health | Predicting national suicide numbers with social media data. | 2013 | 26
suicid-prevent-rate-risk-health | Suicide among children and adolescents in Canada: trends and sex differences, 1980-2008. | 2012 | 25
suicid-prevent-rate-risk-health | Accessing suicide-related information on the internet: a retrospective observational study of search behavior. | 2013 | 21
suicid-prevent-relat-develop-health-studi | Connecting the invisible dots: reaching lesbian, gay, and bisexual adolescents and young adults at risk for suicide through online social networks. | 2009 | 26
suicid-prevent-relat-develop-health-studi | Efficacy of Web-Based Collection of Strength-Based Testimonials for Text Message Extension of Youth Suicide Prevention Program: Randomized Controlled Experiment. | 2016 | 5
suicid-prevent-relat-develop-health-studi | Artificial intelligence (AI) and global health: how can AI contribute to health in resource-poor settings? | 2018 | 4
surveil-diseas-data-public | Scoping review on search queries and social media for disease surveillance: a chronology of innovation. | 2013 | 46
surveil-diseas-data-public | Social media in public health. | 2013 | 24
surveil-diseas-data-public | Optimizing provider recruitment for influenza surveillance networks. | 2012 | 18
surveil-diseas-data-public | Health department use of social media to identify foodborne illness - Chicago, Illinois, 2013-2014. | 2014 | 18
surveil-diseas-public-health | Internet-based surveillance systems for monitoring emerging infectious diseases. | 2014 | 61
surveil-diseas-public-health | Social media and internet-based data in global systems for public health surveillance: a systematic review. | 2014 | 45
surveil-diseas-public-health | Enhancing disease surveillance with novel data streams: challenges and opportunities. | 2015 | 26
technologi-research-commun-data-health-develop | Using Electronic Health Records for Population Health Research: A Review of Methods and Applications. | 2016 | 41
technologi-research-commun-data-health-develop | Use of health information technology among racial and ethnic underserved communities. | 2011 | 18
technologi-research-commun-data-health-develop | Public preferences and the challenge to genetic research policy. | 2014 | 9
tweet-twitter-social-health | Social media use in the United States: implications for health communication. | 2009 | 257
tweet-twitter-social-health | Adoption and use of social media among public health departments. | 2012 | 79
tweet-twitter-social-health | Tweaking and tweeting: exploring Twitter for nonmedical use of a psychostimulant drug (Adderall) among college students. | 2013 | 54
vaccin-public-inform-health | Vaccine hesitancy: an overview. | 2013 | 122
vaccin-public-inform-health | Assessing vaccination sentiments with online social media: implications for infectious disease dynamics and control. | 2011 | 119
vaccin-public-inform-health | Communicating with parents about vaccination: a framework for health professionals. | 2012 | 117

Use of keywords

We can review the commonest MeSH headings associated with each cluster label.


labels$results %>%
  left_join(search1, by = c("pmid.value" = "pmid")) %>%
  select(clus_names, mesh) %>%
  filter(mesh != "NULL") %>%
  unnest(mesh) %>%
  count(clus_names, mesh,sort = TRUE) %>%
  filter(n < 30) %>%
  ungroup() %>%
  group_by(clus_names) %>%
  top_n(10)  %>%
  mutate(summary = paste(mesh, collapse = "; " )) %>%
  select(-c(mesh, n)) %>%
  distinct() %>%
  arrange(clus_names) %>%
  knitr::kable()
clus_names summary
abstract-annual-null-meet Humans; Animals; Internal Medicine; Athletic Injuries; California; Education, Pharmacy; Libraries, Medical; Library Associations; Schools, Pharmacy; Sports; Students, Pharmacy; United States
alcohol-drink-studi-health Humans; Alcohol Drinking; Female; Male; Adolescent; Young Adult; Adult; Social Media; Alcoholic Beverages; Alcoholic Intoxication
biosurveil-system-data-inform-identifi-health Humans; Biosurveillance; Internet; Animals; Bioterrorism; Communicable Disease Control; Databases, Factual; Decision Support Techniques; Disease Outbreaks; Influenza A Virus, H1N1 Subtype; Influenza, Human; Public Health; Public Health Surveillance; Statistics as Topic
canadian-activ-survei-develop-health-time-includ Humans; Canada; Child, Preschool; Exercise; Female; Male; Infant; Guidelines as Topic; Infant, Newborn; Adult; Guideline Adherence; Health Promotion; Sleep; Surveys and Questionnaires; Time Factors; United States; Young Adult
cancer-research-studi-health Neoplasms; Adult; Middle Aged; Aged; United States; Young Adult; Adolescent; Early Detection of Cancer; Survivors; Health Knowledge, Attitudes, Practice
care-improv-health-includ-studi Humans; Female; Adult; Male; United States; Internet; Attitude of Health Personnel; Canada; Delivery of Health Care; Health Promotion; Middle Aged; Practice Guidelines as Topic; Pregnancy; Qualitative Research; Retrospective Studies; Risk Assessment; Treatment Outcome; Young Adult
cigarett-tobacco-smoke-studi Adult; Young Adult; Tobacco Products; Smoking Cessation; United States; Cross-Sectional Studies; Internet; Middle Aged; Marketing; Tobacco Industry
confer-research-particip-health Humans; Biomedical Research; Congresses as Topic; Africa; Community-Institutional Relations; Cooperative Behavior; Delivery of Health Care, Integrated; Developing Countries; Diagnostic Imaging; Global Health; Group Processes; Health Education; Health Knowledge, Attitudes, Practice; Health Policy; Health Services Research; Health Status Disparities; Healthcare Disparities; Information Dissemination; International Agencies; Medical Informatics; Models, Theoretical; Noncommunicable Diseases; Organizational Objectives; Patient Education as Topic; Public Health; Radiology; Research; Students, Medical; United States; World Health Organization
data-clinic-research-develop-health Humans; Medical Informatics; Electronic Health Records; Data Collection; Data Mining; Databases, Factual; Delivery of Health Care; Privacy; United States; Biomedical Research; Confidentiality; Data Anonymization; Datasets as Topic; Epidemiology; Information Systems; Internet; Medical Records; Public Health Informatics; Reproducibility of Results; Social Media
data-identifi-approach-health-result-base Humans; Air Pollution; Data Mining; Epidemiological Monitoring; Population Surveillance; Algorithms; Armed Conflicts; Artificial Intelligence; Bacterial Infections; Bibliometrics; Biophysical Phenomena; Bombs; Cities; Conservation of Natural Resources; Crime Victims; Crowdsourcing; Culture; Databases, Factual; Delivery of Health Care; Demography; Drug Resistance, Bacterial; Environmental Monitoring; Epidemiologic Studies; Exposure to Violence; Extraction and Processing Industry; Formaldehyde; Geography; Government; Health Facilities; Health Personnel; Health Resources; Health Workforce; History, 20th Century; History, 21st Century; Incidence; Internet; Likelihood Functions; Machine Learning; Malaria; Malawi; Mass Casualty Incidents; Medical Informatics; Models, Biological; Models, Theoretical; Molecular Dynamics Simulation; Natural Gas; Oil and Gas Fields; Physicians; Pilot Projects; Policy; Policy Making; Protein Conformation; Protein Folding; Proteins; Satellite Imagery; Social Planning; Social Values; Software; Surveys and Questionnaires; Sweden; Syria; Texas; Uncertainty
data-inform-health-studi Global Health; Health Knowledge, Attitudes, Practice; Child, Preschool; Health Policy; Risk Factors; Canada; Health Promotion; Pregnancy; Influenza, Human; Information Dissemination
diabet-risk-studi-health Humans; Male; Female; Diabetes Mellitus; Middle Aged; Risk Factors; Adult; United States; Cross-Sectional Studies; Social Media; Social Support
digit-data-social-health Humans; Adolescent; Female; Internet; Male; Public Health; Social Media; Adult; Cell Phones; Data Collection; Population Surveillance; Smoking; Smoking Cessation; Telemedicine; United States; Young Adult
disast-respons-inform-health Humans; Disasters; Cyclonic Storms; Adolescent; Female; Male; Social Media; Child; Adult; Child, Preschool; Disaster Planning; Public Health
diseas-surveil-approach-health Humans; Animals; Communicable Diseases; Disease Outbreaks; Male; Disease Vectors; Female; Poverty; Public Health Surveillance; Travel; United States
drug-data-studi-health Social Media; Internet; Drug-Related Side Effects and Adverse Reactions; Pharmacovigilance; United States; Adverse Drug Reaction Reporting Systems; Data Mining; Databases, Factual; United States Food and Drug Administration; Prescription Drugs
facebook-post-media-social-studi-health Humans; Social Media; Adult; Female; Communication; Cross-Sectional Studies; Data Collection; Government Agencies; Health Promotion; Hospitals; Information Seeking Behavior; Malaysia; Male; Middle Aged; Physicians; Public Health; Smoking Cessation; Social Behavior; Social Networking; Social Support; Surveys and Questionnaires; Taiwan; Wit and Humor as Topic
hiv-sex-prevent-studi Sexual Partners; Risk Factors; Cross-Sectional Studies; Mass Screening; Risk-Taking; Internet; Surveys and Questionnaires; Health Knowledge, Attitudes, Practice; Pre-Exposure Prophylaxis; Sexual and Gender Minorities; Sexually Transmitted Diseases
influenza-data-time-health Internet; United States; Seasons; Disease Outbreaks; Forecasting; Social Media; Population Surveillance; Epidemiological Monitoring; Female; Models, Statistical
injuri-prevent-identifi-studi-health Humans; Wounds and Injuries; Adolescent; Child; Risk Factors; United States; Centers for Disease Control and Prevention (U.S.); Population Surveillance; Public Health; Adult; Athletic Injuries; Health Promotion; Incidence; Suicide
injuri-risk-identifi-health Humans; Female; Male; Adult; Young Adult; Middle Aged; Accidents, Traffic; Adolescent; Incidence; Risk Factors; United States; Wounds and Injuries
measl-outbreak-vaccin-transmiss-public-health Humans; Measles; Disease Outbreaks; Vaccination; Adolescent; Child; Communicable Disease Control; Female; Genotype; Male; Measles virus
media-social-research-health Social Media; Humans; Confidentiality; Delivery of Health Care; Health Communication; Health Personnel; Health Promotion; Internet; Medical Informatics; Privacy; Social Networking
medicin-medic-patient-health-inform Humans; Databases, Factual; Precision Medicine; Delivery of Health Care; Algorithms; Big Data; Biomedical Research; Evidence-Based Medicine; Health Behavior; Health Knowledge, Attitudes, Practice; Healthy Lifestyle; Information Dissemination; Preventive Medicine; Public Health
messag-health-research-inform-method-studi Humans; Health Personnel; Natural Language Processing; Public Health; Biomedical Research; Checklist; Electronic Mail; Guidelines as Topic; Interviews as Topic; Research Report
mobil-data-health-studi Humans; Male; Adult; Female; Middle Aged; Telemedicine; Aged; Cell Phone; Communication; Community Health Workers; Delivery of Health Care; Developing Countries; Health Services Accessibility; Mobile Applications; Population Surveillance; Prospective Studies; Smartphone; Surveys and Questionnaires; Text Messaging; Young Adult
null-surveil-public-health Public Health; United States; Social Media; Health Promotion; Population Surveillance; Public Health Surveillance; Animals; Chronic Disease; Cooperative Behavior; Female; Internet
obes-childhood-preval-prevent-activ-increas-research-studi-includ-health Humans; Adolescent; Female; Male; Pediatric Obesity; Child; Obesity; Adult; Cross-Sectional Studies; Health Promotion; Prevalence; Public Health
outbreak-viru-diseas-health Hemorrhagic Fever, Ebola; Female; Animals; Male; Public Health; United States; Adult; Middle Aged; Population Surveillance; Zika Virus Infection
particip-ag-effect-studi-includ-health Male; Adult; Adolescent; Young Adult; Middle Aged; Child; Aged; Health Behavior; United States; Exercise; Pregnancy
physic-activ-ag-studi-health Humans; Female; Male; Adolescent; Child; Exercise; Adult; Socioeconomic Factors; Young Adult; Cross-Sectional Studies; Internet; Middle Aged; Prospective Studies; Risk Factors; Surveys and Questionnaires
research-public-health-develop United States; Public Health; Female; Health Policy; Male; Delivery of Health Care; Adult; Environmental Exposure; Health Status Disparities; Policy Making
resist-effect-public-health Anti-Bacterial Agents; Drug Resistance, Bacterial; Female; Male; Adult; Drug Resistance, Microbial; Food Safety; Food Supply; Middle Aged; Food Industry; Health Knowledge, Attitudes, Practice; Young Adult
search-data-inform-studi Humans; Internet; United States; Search Engine; Female; Population Surveillance; Adult; Centers for Disease Control and Prevention (U.S.); Forecasting; Incidence; Influenza, Human; Male; Public Health; Risk Assessment; Social Media
sleep-behavior-time-cross-studi-data-health Humans; Sleep; Female; Adolescent; Cross-Sectional Studies; Male; Aged; Middle Aged; Sleep Wake Disorders; Social Media; Time Factors
social-health-public-studi Humans; Adult; Female; Male; Health Behavior; Health Promotion; Middle Aged; Social Media; Aged; United States
suicid-prevent-rate-risk-health Humans; Suicide; Male; Female; Adult; Risk Factors; Middle Aged; Retrospective Studies; Adolescent; China; Internet; Models, Statistical; Primary Prevention; Republic of Korea; Search Engine; Social Media; Suicidal Ideation; Suicide, Attempted; Young Adult
suicid-prevent-relat-develop-health-studi Adolescent; Adult; Female; Humans; Internet; Male; Suicide; Young Adult; Adolescent Behavior; Age Factors; Algorithms; Child; Confidence Intervals; Focus Groups; Homosexuality, Female; Homosexuality, Male; Information Seeking Behavior; Monte Carlo Method; Pilot Projects; Prevalence; Qualitative Research; Risk Assessment; Risk Factors; Self-Injurious Behavior; Social Support; Suicide, Attempted; United Kingdom; United States
surveil-diseas-data-public Humans; Social Media; Disease Outbreaks; Population Surveillance; Influenza, Human; Models, Statistical; China; Communicable Diseases; Influenza A Virus, H7N9 Subtype; Internet; Public Health; Software
surveil-diseas-public-health Humans; Communicable Diseases; Public Health Surveillance; Internet; Population Surveillance; Disease Outbreaks; Social Media; Public Health; Animals; Data Collection; Epidemiological Monitoring
technologi-research-commun-data-health-develop Humans; Electronic Health Records; Access to Information; Age Factors; Air Pollutants; Air Pollution; Cell Phone; Communication; Community Health Workers; Computer Security; Continental Population Groups; Cultural Diversity; Culture; Data Collection; Data Curation; Data Mining; Dementia; Diagnostic Techniques and Procedures; Disabled Persons; Environment; Environmental Monitoring; Epidemiologic Research Design; Ethnic Groups; Focus Groups; Forecasting; Health Behavior; Health Knowledge, Attitudes, Practice; Health Services Accessibility; Health Status Disparities; Hospital Information Systems; Human Rights; India; Malawi; Maryland; Medical Records Systems, Computerized; Medically Underserved Area; Mental Health; Mobile Applications; Models, Theoretical; Patient Satisfaction; Physician-Patient Relations; Public Health; Reproducibility of Results; Research; Rural Health Services; Self-Help Devices; Sex Factors; Social Environment; Socioeconomic Factors; United Nations; United States; Universal Health Insurance; Vital Signs
tweet-twitter-social-health United States; Internet; Information Dissemination; Female; Male; Public Health; Adult; Communication; Disease Outbreaks; Adolescent; Data Collection; Public Opinion
vaccin-public-inform-health Male; Papillomavirus Vaccines; Health Knowledge, Attitudes, Practice; Social Media; Child; Immunization Programs; Adolescent; Adult; Patient Acceptance of Health Care; United States

Investigating individual themes to identify full-text articles

Let's explore the articles for which Public Health is a MeSH heading.


ph <- labels$results %>%
  left_join(search1, by = c("pmid.value" = "pmid")) %>%
  filter(str_detect(keywords, "Public Health"))

ph %>%
  count(clus_names, sort = TRUE)
#> # A tibble: 39 x 2
#>    clus_names                         n
#>    <chr>                          <int>
#>  1 data-inform-health-studi         108
#>  2 null-surveil-public-health        27
#>  3 tweet-twitter-social-health       22
#>  4 influenza-data-time-health        17
#>  5 outbreak-viru-diseas-health       16
#>  6 vaccin-public-inform-health       16
#>  7 research-public-health-develop    15
#>  8 surveil-diseas-public-health      13
#>  9 cigarett-tobacco-smoke-studi      12
#> 10 hiv-sex-prevent-studi             12
#> # ... with 29 more rows
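
Note that str_detect does substring matching, so the pattern "Public Health" also matches longer headings such as "Public Health Surveillance" or "Public Health Informatics". If only the exact heading is wanted, one option is to split the keyword string into individual headings first. The sketch below uses a small made-up table (toy, with hypothetical pmid values) and assumes the headings are separated by semicolons, as in the table above.

```r
library(dplyr)
library(stringr)
library(purrr)

# Toy data (hypothetical): keyword strings mimic the "; "-separated MeSH headings
toy <- tibble::tibble(
  pmid = c("1", "2", "3"),
  keywords = c(
    "Humans; Public Health; Social Media",
    "Humans; Public Health Surveillance; Internet",
    "Humans; Measles; Vaccination"
  )
)

# Substring match: also keeps the "Public Health Surveillance" row
loose <- toy %>%
  filter(str_detect(keywords, "Public Health"))

# Exact match: split each string into headings, then test membership
exact <- toy %>%
  filter(map_lgl(str_split(keywords, ";\\s*"), ~ "Public Health" %in% .x))

loose$pmid
exact$pmid
```

Which behaviour is preferable depends on the question: the looser match is useful for a broad sweep, the exact match for a specific heading.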

There is one article in the ai-intellig-artifici-health-data cluster which has Public Health as a MeSH heading. We can use epmc_ftxt to extract the full text of the article.

library(rvest)



# Articles in the chosen cluster that have a PMC ID (full-text candidates)
get_pmcids <- ph %>%
  filter(clus_names == "ai-intellig-artifici-health-data") %>%
  select(id, pmcid) %>%
  filter(!is.na(pmcid))


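# Retrieve the full Europe PMC record for each article.
# Assumes `id` is the PubMed ID, matching the default data_src = "med".
details <- get_pmcids %>%
  mutate(details = map(id, epmc_details))

# The `ftx` element lists full-text links; keep those that are free to access
full_text <- details %>%
  mutate(full_text = map(details, "ftx")) %>%
  unnest(full_text) %>%
  filter(availability == "Free") %>%
  select(-details) %>%
  distinct()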


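# Download the full text (XML) of one open-access article by its PMC ID
ft_xml <- europepmc::epmc_ftxt("PMC5171550")

# Strip the markup to leave plain text
ft <- ft_xml %>%
  html_text()

# Split into rough sentences and show one sentence per table row
ft %>%
  str_split("\\. ") %>%
  unlist() %>%
  enframe() %>%
  formattable::formattable()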
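
Splitting on a full stop followed by a space is a rough approach and will break on abbreviations and decimal numbers. As an alternative sketch, the tidytext package (already used in this vignette) can tokenise text into sentences with unnest_tokens. The example below uses a short made-up string (toy_text) in place of the real article text.

```r
library(dplyr)
library(tidytext)

# Toy text (hypothetical) standing in for the output of html_text()
toy_text <- tibble::tibble(
  text = "Social media data are noisy. Tweets can be sarcastic. Validation is needed."
)

# One row per sentence; to_lower = FALSE keeps the original capitalisation
sentences <- toy_text %>%
  unnest_tokens(sentence, text, token = "sentences", to_lower = FALSE)

sentences
```

The resulting table can be passed to formattable::formattable() in the same way as before.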

Finally, we can gather all the abstracts into a single interactive table that can be searched, filtered and shared.


labels$results %>%
  left_join(search1, by = c("pmid.value" = "pmid"))  %>%
  select(cluster, clus_names, doi, title, journalTitle, pubYear, citedByCount, absText) %>%
  # Build a clickable link via the doi.org resolver
  mutate(doi = paste0("<a href='https://doi.org/", doi, "'>doi</a>")) %>%
  DT::datatable(
    escape = FALSE,
    extensions = c("Responsive", "Buttons", "FixedHeader"),
    filter = "top",
    options = list(
      autoWidth = TRUE,
      columnDefs = list(list(width = "450px")),
      dom = "Bfrtip",
      buttons = c("csv", "excel"),
      fixedHeader = TRUE
    )
  )
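
Some records have no DOI, in which case pasting the link together produces an anchor pointing at "NA". A minimal sketch of one way to handle this is shown below, using a made-up table (toy_doi) and a placeholder DOI; the doi_link column name is our own choice.

```r
library(dplyr)

# Toy data (hypothetical): one placeholder DOI and one missing value
toy_doi <- tibble::tibble(doi = c("10.1000/xyz123", NA))

# Only build the HTML anchor where a DOI is present
linked <- toy_doi %>%
  mutate(
    doi_link = if_else(
      is.na(doi),
      NA_character_,
      paste0("<a href='https://doi.org/", doi, "'>doi</a>")
    )
  )

linked
```

Rows without a DOI then show as empty cells in the table rather than as broken links.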