Introduction

  • Literature and evidence review essential in public health practice
  • Exponential growth in volume of literature
  • Initial steps are usually:
    • Developing search strategy
    • Reviewing and filtering abstracts
    • Obtaining full text (if possible)
    • Data extraction

This can be a manual, protracted and iterative process. It may involve using specialised search services, downloading abstracts, reading and filtering, secondary searching and so on, and sifting many thousands of abstracts.

Often we may just want a rapid overview of the literature to help focus further reviewing.

In this vignette we demonstrate the use of R packages for large-scale extraction of abstracts, and analytical techniques for identifying topics or themes in those abstracts.

The vignette is based on a number of R packages:

  1. europepmc - this is a sophisticated tool which interacts with the Europe PMC API and provides access to additional fields.
  2. adjutant - this is a fully fledged package with retrieval and clustering functions.
  3. tidytext - a package for text mining using tidy data principles.
  4. Rtsne - this uses the tSNE algorithm for dimensionality reduction and cluster visualisation.
  5. dbscan - applies the HDBSCAN algorithm for data clustering.
  6. myScrapers - wraps some functions built on other packages to automate the search, extraction, and filtering process.

We have “hacked” some of the functions in these packages and written additional functions to develop a workflow from searching and retrieval to analysis.

A simple example using europepmc

Searching Europe PubMed Central (epmc)

The europepmc package allows searching of Europe PMC via its API.

It can be downloaded from CRAN.


if(!require("europepmc")) install.packages("europepmc")
library(europepmc)

The main function is epmc_search which allows us to search the site and retrieve abstracts, metadata and citation counts.

We’ll use it with the search term (“data science” OR “big data” OR KW:machine learning OR KW:artificial intelligence) AND (KW:public health OR KW:population health OR surveillance).
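The code in this vignette reads the query from `params$search`, an R Markdown parameter whose definition is not shown. As a minimal sketch, assuming you want to build the same query yourself, the string can be assembled as below. The names `methods_terms`, `domain_terms`, `or_group` and `search_term` are hypothetical and not part of any package.

```r
# Hypothetical stand-in for params$search: the query quoted in the text,
# assembled from its two halves so that each can be edited separately.
methods_terms <- c('"data science"', '"big data"',
                   'KW:machine learning', 'KW:artificial intelligence')
domain_terms  <- c('KW:public health', 'KW:population health', 'surveillance')

# Join a set of terms with OR and wrap them in brackets.
or_group <- function(terms) paste0("(", paste(terms, collapse = " OR "), ")")

search_term <- paste(or_group(methods_terms), "AND", or_group(domain_terms))
print(search_term)
```

`search_term` could then be passed wherever `params$search` appears, for example `epmc_search(search_term, limit = 10)`.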


head(epmc_search(params$search, limit = 10))
#> # A tibble: 6 x 28
#>   id    source pmid  pmcid doi   title authorString journalTitle issue
#>   <chr> <chr>  <chr> <chr> <chr> <chr> <chr>        <chr>        <chr>
#> 1 3104… MED    3104… PMC6… 10.1… An o… Kamel Boulo… Int J Healt… 1    
#> 2 3176… MED    3176… <NA>  10.1… Crow… Filice RW, … J Digit Ima… <NA> 
#> 3 3108… MED    3108… PMC6… 10.1… Mach… Lake IR, Co… BMC Public … 1    
#> 4 3054… MED    3054… PMC6… 10.3… Arti… Benke K, Be… Int J Envir… 12   
#> 5 3164… MED    3164… PMC6… 10.1… Reap… Huston P, E… Can Commun … 11   
#> 6 3175… MED    3175… <NA>  10.1… Impr… Bennie M, M… Br J Clin P… <NA> 
#> # … with 19 more variables: journalVolume <chr>, pubYear <chr>,
#> #   journalIssn <chr>, pageInfo <chr>, pubType <chr>, isOpenAccess <chr>,
#> #   inEPMC <chr>, inPMC <chr>, hasPDF <chr>, hasBook <chr>, hasSuppl <chr>,
#> #   citedByCount <int>, hasReferences <chr>, hasTextMinedTerms <chr>,
#> #   hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> #   hasTMAccessionNumbers <chr>, firstIndexDate <chr>,
#> #   firstPublicationDate <chr>

We can see how many articles are available altogether by running epmc_profile.


profile <- epmc_profile(query = params$search)
art_count <- profile$pubType$count[1]

Running epmc_profile finds 2027 articles of which 1437 are full text articles, and 830 are open access.
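The chunk above only keeps the first count. The sketch below shows one way the other two figures might be pulled out and turned into percentages. It runs on a mock data frame so that it works offline. The `name` column and the row labels "ALL", "FULL TEXT" and "OPEN ACCESS" are assumptions about what `profile$pubType` contains, so check the real object before relying on them. `get_count` is a hypothetical helper.

```r
# Mock of profile$pubType using the counts quoted in the text.
# The column names and row labels are assumptions, not confirmed output.
pub_type <- data.frame(
  name  = c("ALL", "FULL TEXT", "OPEN ACCESS"),
  count = c(2027L, 1437L, 830L),
  stringsAsFactors = FALSE
)

# Look counts up by label rather than by row position.
get_count <- function(tbl, label) tbl$count[match(label, tbl$name)]

art_count       <- get_count(pub_type, "ALL")
pct_full_text   <- round(100 * get_count(pub_type, "FULL TEXT") / art_count, 1)
pct_open_access <- round(100 * get_count(pub_type, "OPEN ACCESS") / art_count, 1)

print(c(full_text = pct_full_text, open_access = pct_open_access))
```

Looking values up by label is a little more robust than `count[1]` if the row order ever changes.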

Neither epmc_search nor epmc_profile extracts the abstract text or MeSH headings (keywords). To facilitate this we have wrapped the search function in get_epmc_abstracts in myScrapers.

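library(tictoc)
library(adjutant)
library(myScrapers) # provides get_epmc_abstracts()
set.seed(42)

tic()
# Retrieve every matching record, including abstract text and MeSH headings
search1 <- get_epmc_abstracts(search = params$search, limit = art_count)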
#> 613 sec elapsed
toc()
#> 613.03 sec elapsed

head(search1, 20)
#> $df
#> # A tibble: 2,027 x 32
#>    id    source pmid  pmcid doi   title authorString journalTitle issue
#>    <chr> <chr>  <chr> <chr> <chr> <chr> <chr>        <chr>        <chr>
#>  1 3104… MED    3104… PMC6… 10.1… An o… Kamel Boulo… Int J Healt… 1    
#>  2 3176… MED    3176… <NA>  10.1… Crow… Filice RW, … J Digit Ima… <NA> 
#>  3 3108… MED    3108… PMC6… 10.1… Mach… Lake IR, Co… BMC Public … 1    
#>  4 3054… MED    3054… PMC6… 10.3… Arti… Benke K, Be… Int J Envir… 12   
#>  5 3164… MED    3164… PMC6… 10.1… Reap… Huston P, E… Can Commun … 11   
#>  6 3175… MED    3175… <NA>  10.1… Impr… Bennie M, M… Br J Clin P… <NA> 
#>  7 3165… MED    3165… PMC6… 10.3… A Bi… Wang L, Xia… Int J Envir… 20   
#>  8 3123… MED    3123… PMC6… 10.1… Comp… Tapak L, Ha… BMC Res Not… 1    
#>  9 3173… MED    3173… PMC6… 10.2… Digi… Ding H, Fat… J Thorac Dis supp…
#> 10 3015… MED    3015… PMC6… 10.1… Arti… Thiébaut R,… Yearb Med I… 1    
#> # … with 2,017 more rows, and 23 more variables: journalVolume <chr>,
#> #   pubYear <chr>, journalIssn <chr>, pageInfo <chr>, pubType <chr>,
#> #   isOpenAccess <chr>, inEPMC <chr>, inPMC <chr>, hasPDF <chr>, hasBook <chr>,
#> #   hasSuppl <chr>, citedByCount <int>, hasReferences <chr>,
#> #   hasTextMinedTerms <chr>, hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> #   hasTMAccessionNumbers <chr>, firstIndexDate <chr>,
#> #   firstPublicationDate <chr>, name <int>, absText <list>, mesh <list>,
#> #   keywords <chr>
#> 
#> $time
#> $time$tic
#> elapsed 
#>    32.2 
#> 
#> $time$toc
#> elapsed 
#>   645.2 
#> 
#> $time$msg
#> logical(0)

We can see that the get_epmc_abstracts function returns additional metadata such as citation counts, whether the journal is open access and whether a PDF is available. It also includes MeSH headings and abstract text. By default, 1000 article descriptions are downloaded.

Analysing abstracts

Abstracts per year

We can easily look at annual abstract frequency - we can readily see the growth in publication frequency in the last 3 years.


library(dplyr)
library(ggplot2)

search1$df %>%
  mutate(pubYear = as.integer(pubYear)) %>%
  count(pubYear) %>%
  ggplot(aes(pubYear, n)) +
  geom_col(fill = "blue") +
  labs(title = "Abstracts per year",
       subtitle = paste0("Search: ", "\n", params$search)) +
  phecharts::theme_phe() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.subtitle = element_text(size = 7)) +
  xlim(c(1990, 2020))

Journal frequency

Similarly we can identify the most frequent journals


journal_count <- search1$df %>%
  count(journalTitle) %>%
  top_n(20, n) %>% # 20 most frequent journals (ties are kept)
  arrange(-n)

journal_count %>%
  ggplot(aes(reorder(journalTitle, n), n)) +
  geom_col(fill = "blue") +
  coord_flip() +
  labs(title = "Journal frequency") +
  phecharts::theme_phe()

Public Health Rep and Commun Dis Intell Q Rep are the most frequent journals publishing articles on (“data science” OR “big data” OR KW:machine learning OR KW:artificial intelligence) AND (KW:public health OR KW:population health OR surveillance).

Topic identification

Once we have a data frame of records with abstract text, we can prepare the data for analysis. The create_abstract_corpus function in myScrapers is designed for this.


out1 <- search1$df %>%
  select(pmid, pmcid, doi, title, pubYear, citedByCount, absText, journalTitle) %>%
  filter(absText != "NULL") %>%
  mutate(text = paste(title, absText))

Text mining

We will use a method exemplified in the adjutant package which uses unsupervised machine learning to try and cluster similar articles and attach themes.

In this approach we undertake some natural language processing. We will:

  • Split each abstract into single words (tokens)
  • Remove numbers and common (stop) words
  • Stem each word, that is, reduce it to its root form so that, for example, “studies” and “studied” are both counted as “studi”
  • Calculate the tf-idf score for each word in each abstract - this gives more weight to words which are frequent in an abstract but rare across the other abstracts, and so more “typical” of it
  • Create a document feature matrix
  • Undertake dimensionality reduction using tSNE to simplify the matrix to two dimensions
  • Run HDBSCAN to identify clusters
  • Name the clusters
  • QA the result
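To make the first steps in this list concrete, here is a minimal base R sketch of tokenising, stop-word removal, tf-idf scoring and building a document feature matrix. It is not the myScrapers implementation. The three toy abstracts and the stop-word list are invented, stemming is left out to avoid extra dependencies, and the idf is taken as the natural log of (number of documents / number of documents containing the word), which is an assumption here.

```r
# Three toy "abstracts" (invented for illustration, not from the search).
abstracts <- c(
  a1 = "Machine learning for disease surveillance",
  a2 = "Surveillance of disease outbreaks in the population",
  a3 = "Machine learning in public health"
)
stop_words <- c("for", "of", "in", "the")  # tiny hypothetical stop-word list

# Tokenise: lower-case, split on non-letters, drop empties and stop words.
tokenise <- function(x) {
  words <- unlist(strsplit(tolower(x), "[^a-z]+"))
  words[nzchar(words) & !words %in% stop_words]
}
tokens <- lapply(abstracts, tokenise)

# Long table of counts: one row per (document, word) pair.
counts <- do.call(rbind, lapply(names(tokens), function(id) {
  tab <- table(tokens[[id]])
  data.frame(doc = id, word = names(tab), n = as.integer(tab),
             stringsAsFactors = FALSE)
}))

# tf = share of the document's words; idf = ln(documents / documents with word).
counts$tf     <- counts$n / ave(counts$n, counts$doc, FUN = sum)
doc_freq      <- table(counts$word)
counts$idf    <- log(length(tokens) / as.numeric(doc_freq[counts$word]))
counts$tf_idf <- counts$tf * counts$idf

# Document feature matrix: rows are abstracts, columns are words.
dfm <- xtabs(tf_idf ~ doc + word, data = counts)
print(round(dfm, 3))
```

Words that appear in only one abstract, such as "outbreaks", receive the highest scores, while words shared by two abstracts are down-weighted.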

The ultimate output of this analysis is a visualisation of clustered and labelled abstracts and an interactive table.


library(tidytext)

corp <- myScrapers::create_abstract_corpus(df = search1$df)

head(corp$corpus)
#> # A tibble: 6 x 6
#>   pmid     word         n     tf   idf tf_idf
#>   <chr>    <chr>    <int>  <dbl> <dbl>  <dbl>
#> 1 10071946 appli        2 0.0286  2.77 0.0792
#> 2 10071946 argument     1 0.0143  4.28 0.0611
#> 3 10071946 articl       2 0.0286  2.49 0.0712
#> 4 10071946 attempt      1 0.0143  3.72 0.0531
#> 5 10071946 base         4 0.0571  1.40 0.0802
#> 6 10071946 child        1 0.0143  3.33 0.0476
#corp$corpus %>%
 # count(pmid)
library(factoextra)

clust <- create_abstract_cluster(corpus = corp$corpus, minPts = 10, perplexity = 30)
#> If there are a small number of abstracts, set perplexity value 
#> to less than 30% of abstract count

#> 287.86 sec elapsed

hc <- clust$dbscan$hc 

#fviz_dend(hc, 9, color_labels_by_k = TRUE, type = "phylogenic")


clust$cluster_size
#> # A tibble: 38 x 2
#>    cluster     n
#>      <dbl> <int>
#>  1       0   590
#>  2       1    18
#>  3       2    18
#>  4       3    23
#>  5       4    13
#>  6       5    24
#>  7       6    12
#>  8       7   303
#>  9       8    27
#> 10       9    24
#> # … with 28 more rows
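The message printed by create_abstract_cluster advises keeping perplexity below 30% of the abstract count. The sketch below turns that advice into a small helper and then runs the same two stages, tSNE followed by HDBSCAN, on toy data. `choose_perplexity` is a hypothetical helper, the toy matrix is random, and the `Rtsne::Rtsne` and `dbscan::hdbscan` calls illustrate the general technique rather than what myScrapers does internally. The second part only runs if both packages are installed.

```r
# Largest whole number strictly below 30% of n, capped at the default of 30.
# Integer arithmetic avoids floating-point surprises at exact multiples.
choose_perplexity <- function(n_abstracts, default = 30L) {
  below_30_pct <- (3L * as.integer(n_abstracts) - 1L) %/% 10L
  max(1L, min(as.integer(default), below_30_pct))
}

print(choose_perplexity(2027))  # large corpus: the default is kept
print(choose_perplexity(50))    # small corpus: the value is reduced

# Toy data standing in for a document feature matrix: two separated blobs.
set.seed(42)
n_docs <- 60
toy <- rbind(matrix(rnorm(30 * 5, mean = 0), ncol = 5),
             matrix(rnorm(30 * 5, mean = 6), ncol = 5))

if (requireNamespace("Rtsne", quietly = TRUE) &&
    requireNamespace("dbscan", quietly = TRUE)) {
  # Reduce to two dimensions, then cluster the 2-D coordinates.
  embedding <- Rtsne::Rtsne(toy, dims = 2, pca = FALSE,
                            perplexity = choose_perplexity(n_docs),
                            check_duplicates = FALSE)$Y
  fit <- dbscan::hdbscan(embedding, minPts = 10)
  # In dbscan, cluster 0 holds the points treated as noise.
  print(table(cluster = fit$cluster))
}
```

If the clustering here follows the dbscan convention, cluster 0 in the output above (590 abstracts) is the group of abstracts that could not be assigned to any topic, which would explain why it is excluded later in the vignette.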

Labelling clusters


labels <- create_cluster_labels(corp$corpus, clustering = clust$clustering, top_n = 4)

labels$labels
#> # A tibble: 38 x 2
#> # Groups:   cluster [38]
#>    cluster clus_names                         
#>      <dbl> <chr>                              
#>  1       0 data-health-studi-public           
#>  2       1 brief-variat-null-data             
#>  3       2 intellectu-measur-public-health    
#>  4       3 epidem-intellig-null-health        
#>  5       4 ethic-research-medic-null          
#>  6       5 31-1-surveil-null                  
#>  7       6 communic-surveil-diseas-null-health
#>  8       7 polici-null-public-health          
#>  9       8 doctor-null-imag-studi             
#> 10       9 messag-commun-health-public        
#> # … with 28 more rows
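The labels above are stemmed words joined with hyphens. One plausible way to build labels of this kind, shown here as a sketch under stated assumptions and not as the confirmed create_cluster_labels implementation, is to sum the tf-idf scores of each word within a cluster and join the highest-scoring words. The toy data and the `label_cluster` helper are hypothetical.

```r
# Toy long-format corpus with a cluster id per row (values invented).
toy_corpus <- data.frame(
  cluster = c(1, 1, 1, 1, 2, 2, 2, 2),
  word    = c("surveil", "diseas", "health", "surveil",
              "tobacco", "industri", "health", "tobacco"),
  tf_idf  = c(0.30, 0.20, 0.05, 0.25, 0.40, 0.15, 0.05, 0.35),
  stringsAsFactors = FALSE
)

# Total tf-idf for each word within each cluster.
totals <- aggregate(tf_idf ~ cluster + word, data = toy_corpus, FUN = sum)

# Keep the top_k highest-scoring words in a cluster and join them with "-".
label_cluster <- function(df, top_k = 2) {
  ranked <- df[order(-df$tf_idf), ]
  paste(head(ranked$word, top_k), collapse = "-")
}

cluster_labels <- vapply(split(totals, totals$cluster), label_cluster,
                         character(1))
print(cluster_labels)
```

Because a generic word such as "health" scores low in every cluster, it drops out of short labels. With more words per label it would appear in most of them, as it does in the output above.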

Visualise


p <- labels$results %>%
  left_join(search1$df, by = c("pmid.value" = "pmid")) %>%
  ggplot(aes(X1, X2)) +
  geom_point(aes(colour = clustered, size = citedByCount) ) +
  ggrepel::geom_text_repel(data = labels$plot, aes(medX, medY, label = clus_names), size = 3, colour = "black", alpha = 0.9) 


p + scale_alpha_manual(values=c(1,0)) +
  viridis::scale_color_viridis(discrete = TRUE, option = "viridis", alpha = .5, begin = .8, end = .1, direction = -1) +
  phecharts::theme_phe() +
  theme(panel.background = element_rect(fill = "#ffffff"), 
        plot.title = element_text(size = 8)) +
  labs(subtitle = paste("Clustering: ", nrow(labels$plot), " topics" ), 
       title = paste("Search ", "= ", params$search ))

Understanding the labels

Most cited articles


most_cited <- labels$results %>%
  left_join(search1$df, by = c("pmid.value" = "pmid")) %>%
  filter(cluster != 0) %>% # drop cluster 0
  group_by(clus_names) %>%
  top_n(n = 3, citedByCount) %>% # ties are kept, so a cluster can return more than 3 rows
  select(clus_names, title, pubYear, citedByCount) %>%
  ungroup() %>%
  arrange(clus_names, -citedByCount)

most_cited %>%
  gt::gt()
clus_names title pubYear citedByCount
1-control-studi-assess-base-public-health Spatially explicit multi-criteria decision analysis for managing vector-borne diseases. 2011 23
1-control-studi-assess-base-public-health Neurodevelopmental outcomes at 7 years' corrected age in preterm infants who were fed high-dose docosahexaenoic acid to term equivalent: a follow-up of a randomised controlled trial. 2015 20
1-control-studi-assess-base-public-health Risk factors for cerebrovascular disease mortality among the elderly in Beijing: a competing risk analysis. 2014 11
31-1-surveil-null OzFoodNet quarterly report, 1 July to 30 September 2010. 2010 2
31-1-surveil-null Australian Gonococcal Surveillance Programme, 1 July to 30 September 2015. 2016 2
31-1-surveil-null Australian Gonococcal Surveillance Programme, 1 July to 30 September 2016. 2017 2
artifici-intellig-health-public Deep Learning for Health Informatics. 2017 52
artifici-intellig-health-public The Association of Public Health Observatories (APHO) Diabetes Prevalence Model: estimates of total diabetes prevalence for England, 2010-2030. 2011 48
artifici-intellig-health-public Artificial neural networks for predicting failure to survive following in-hospital cardiopulmonary resuscitation. 1993 32
brief-variat-null-data Data briefing. Variation in primary care spend on HRG. 2006 0
brief-variat-null-data Data briefing. Variations in length of hospital stays. 2006 0
brief-variat-null-data Data briefing. Waiting times and patient flow. 2006 0
brief-variat-null-data Data briefing. A&E admissions by diagnosis and SHA. 2006 0
brief-variat-null-data Variations in length of stay. 2006 0
brief-variat-null-data Data briefing. Length of stay--the day-case drive. 2006 0
brief-variat-null-data Data briefing. Variation in A and E admissions. 2006 0
brief-variat-null-data Data briefing. Variance in primary care HRG spending. 2007 0
brief-variat-null-data Data briefing. Understanding preventable injury. 2007 0
brief-variat-null-data Data briefing. How length of stay varies by SHA area. 2007 0
brief-variat-null-data Data briefing. Massive variations in waiting times. 2007 0
brief-variat-null-data Data briefing. Day-case rates on nationwide increase. 2007 0
brief-variat-null-data Data briefing. New era of mental healthcare insights. 2007 0
brief-variat-null-data Data briefing. Is your PCT being overcharged? 2007 0
brief-variat-null-data Data briefing. Waiting times: cutting inequality. 2008 0
brief-variat-null-data Data briefing. Timely admission reflects efficiency. 2008 0
brief-variat-null-data Data briefing. Elective procedures: all in a day's work? 2008 0
brief-variat-null-data Data briefing. Admissions up for treatable illnesses. 2008 0
children-ag-studi-health Skill formation and the economics of investing in disadvantaged children. 2006 386
children-ag-studi-health Environmental lead exposure: a public health problem of global dimensions. 2000 187
children-ag-studi-health Children's health and the environment: public health issues and challenges for risk assessment. 2004 125
communic-surveil-diseas-null-health Drug-resistant malaria--occurrence, control, and surveillance. 1980 22
communic-surveil-diseas-null-health The use of social media in public health surveillance. 2015 7
communic-surveil-diseas-null-health The need for and the role of a coordinator in child health surveillance/promotion. 2001 6
curv-predict-risk-perform-valid-data-identifi-studi-health An exploratory study of a text classification framework for Internet-based surveillance of emerging epidemics. 2011 11
curv-predict-risk-perform-valid-data-identifi-studi-health Individualised risk assessment for diabetic retinopathy and optimisation of screening intervals: a scientific approach to reducing healthcare costs. 2016 10
curv-predict-risk-perform-valid-data-identifi-studi-health Derivation and validation of different machine-learning models in mortality prediction of trauma in motorcycle riders: a cross-sectional retrospective study in southern Taiwan. 2018 6
data-studi-public-health Development of a clinical data warehouse for hospital infection control. 2003 48
data-studi-public-health Intelligent information: a national system for monitoring clinical performance. 2008 29
data-studi-public-health Learning from death: a hospital mortality reduction programme. 2006 26
doctor-health-studi-public Epidemiology in Latin America and the Caribbean: current situation and challenges. 2012 64
doctor-health-studi-public Dilemmas in rationing health care services: the case for implicit rationing. 1995 43
doctor-health-studi-public Health Impacts of Climate Change in Pacific Island Countries: A Regional Assessment of Vulnerabilities and Adaptation Priorities. 2016 16
doctor-null-imag-studi Thinking the "unthinkable": why Philip Morris considered quitting. 2003 21
doctor-null-imag-studi The image of the nurse on the internet. 2007 19
doctor-null-imag-studi A new kind of doctor. 1981 15
effect-studi-health-public Cancer risk assessment, indicators, and guidelines for polycyclic aromatic hydrocarbons in the ambient air. 2002 481
effect-studi-health-public Characterization of potential endocrine-related health effects at low-dose levels of exposure to PCBs. 1999 170
effect-studi-health-public Trends and affordability of cigarette prices: ample room for tax increases and related health gains. 2002 85
epidem-intellig-null-health Epidemic intelligence during mass gatherings. 2006 3
epidem-intellig-null-health Experiences of the Student Epidemic Intelligence Society in strengthening public health response and epidemiologic capacity. 2010 2
epidem-intellig-null-health The ideal minister of health. 2002 1
epidem-intellig-null-health New intelligence. 2005 1
epidem-intellig-null-health Different approaches to gathering epidemic intelligence in Europe. 2006 1
ethic-research-medic-null The reversal test: eliminating status quo bias in applied ethics. 2006 25
ethic-research-medic-null Ethics and international research. 1997 12
ethic-research-medic-null The long-term prognosis of pre-term infants: conceptual, methodological, and ethical issues. 1994 5
genet-medicin-null-public Genetic screening programs and public policy. 1977 10
genet-medicin-null-public The problem with academic medicine: engineering our way into and out of the mess. 2005 4
genet-medicin-null-public Genetics: let the public decide. 1997 3
genet-medicin-null-public The genetics of human nature. 1973 3
genet-medicin-null-public Digital medicine, on its way to being just plain medicine. 2018 3
genet-research-health-public-develop-studi Choline: an essential nutrient for public health. 2009 192
genet-research-health-public-develop-studi Measuring paternal discrepancy and its public health consequences. 2005 74
genet-research-health-public-develop-studi The great opportunity: Evolutionary applications to medicine and public health. 2008 64
genet-social-research-determin-human-health Public attitudes regarding the donation and storage of blood specimens for genetic research. 2001 74
genet-social-research-determin-human-health Genetic fatalism and social policy: the implications of behavior genetics research. 1993 14
genet-social-research-determin-human-health Public deliberation and private choice in genetics and reproduction. 2000 12
health-studi-identifi-public-includ Rodent reservoirs of future zoonotic diseases. 2015 77
health-studi-identifi-public-includ The use of social networking sites for public health practice and research: a systematic review. 2014 58
health-studi-identifi-public-includ Crowdsourcing, citizen sensing and sensor web technologies for public and environmental health surveillance and crisis management: trends, OGC standards and application examples. 2011 46
hiv-infect-health-public Community-based treatment of advanced HIV disease: introducing DOT-HAART (directly observed therapy with highly active antiretroviral therapy). 2001 90
hiv-infect-health-public Does racial concordance between HIV-positive patients and their physicians affect the time to receipt of protease inhibitors? 2004 67
hiv-infect-health-public Heroin in brown, black and white: structural factors and medical consequences in the US heroin market. 2009 52
intellectu-measur-public-health Infodemiology and infoveillance: framework for an emerging set of public health informatics methods to analyze search, communication and publication behavior on the Internet. 2009 232
intellectu-measur-public-health Infection control in the multidrug-resistant era: tending the human microbiome. 2012 48
intellectu-measur-public-health Tobacco control advocates must demand high-quality media campaigns: the California experience. 1998 38
intellig-health-studi-public A conceptual framework for intelligence-based public health nutrition workforce development. 2003 17
intellig-health-studi-public Effect of spiritual intelligence, emotional intelligence, psychological ownership and burnout on caring behaviour of nurses: a cross-sectional study. 2013 15
intellig-health-studi-public The politics of nursing knowledge and education critical pedagogy in the face of the militarization of nursing in the war on terror. 2010 10
intellig-health-studi-public The impact of emotional intelligence on work engagement of registered nurses: the mediating role of organisational justice. 2015 10
mental-studi-health-develop-public-includ The disorders induced by iodine deficiency. 1994 211
mental-studi-health-develop-public-includ The "backbone" of stigma: identifying the global core of public prejudice associated with mental illness. 2013 54
mental-studi-health-develop-public-includ Public mental health: the time is ripe for translation of evidence into practice. 2015 38
messag-commun-health-public Salutogenesis. 2005 140
messag-commun-health-public Using crowdsourcing technology for testing multilingual public health promotion materials. 2012 23
messag-commun-health-public How to exploit twitter for public health monitoring? 2013 17
nh-research-public-health Reoperation rates after breast conserving surgery for breast cancer among women in England: retrospective study of hospital episode statistics. 2012 96
nh-research-public-health Hype and public trust in science. 2013 15
nh-research-public-health Evidence-based commissioning in the English NHS: who uses which sources of evidence? A survey 2010/2011. 2013 5
nh-research-public-health Reversing the pipeline? Implementing public health evidence-based guidance in english local government. 2017 5
polici-null-public-health Public health assessment of potential biological terrorism agents. 2002 393
polici-null-public-health The making of a disease: female sexual dysfunction. 2003 111
polici-null-public-health Surveillance Sans Frontières: Internet-based emerging infectious disease intelligence and the HealthMap project. 2008 106
poor-poverti-famili-social-develop-health-increas-result-public Health care and equity in India. 2011 191
poor-poverti-famili-social-develop-health-increas-result-public Barriers and incentives to orphan care in a time of AIDS and economic crisis: a cross-sectional survey of caregivers in rural Zimbabwe. 2006 24
poor-poverti-famili-social-develop-health-increas-result-public Rebuilding transformation strategies in post-Ebola epidemics in Africa. 2017 4
research-health-public-develop Cost-utility analysis. 1993 104
research-health-public-develop Understanding the information needs of public health practitioners: a literature review to inform design of an interactive digital knowledge management system. 2007 71
research-health-public-develop The knowledge-value chain: A conceptual framework for knowledge translation in health. 2006 49
scenario-respons-understand-health-public-develop A Review on Internet of Things for Defense and Public Safety. 2016 21
scenario-respons-understand-health-public-develop Business and public health collaboration for emergency preparedness in Georgia: a case study. 2006 13
scenario-respons-understand-health-public-develop A probabilistic characterization of the health benefits of reducing methyl mercury intake in the United States. 2010 13
scienc-null-public-health CRISPR: Science can't solve it. 2015 15
scienc-null-public-health Well-being: towards an integration of psychology, neurobiology and social science. 2004 9
scienc-null-public-health Communicating with the public on issues of science and public health. 1995 7
scienc-research-public-health The construct of resilience: implications for interventions and social policies. 2000 245
scienc-research-public-health Understanding the human health effects of chemical mixtures. 2002 118
scienc-research-public-health Why should we promote public engagement with science? 2014 49
scienc-research-public-health Crowdsourcing applications for public health. 2014 49
studi-assess-includ-health Vancomycin-resistant enterococci outside the health-care setting: prevalence, sources, and public health implications. 1997 81
studi-assess-includ-health Patterns of childhood obesity prevention legislation in the United States. 2007 35
studi-assess-includ-health Breast milk and cognitive development--the role of confounders: a systematic review. 2013 26
surveil-data-public-health Global capacity for emerging infectious disease detection. 2010 79
surveil-data-public-health Fungal infections associated with contaminated methylprednisolone injections. 2013 71
surveil-data-public-health An outbreak of syphilis in Alabama prisons: correctional health policy and communicable disease control. 2001 32
surveil-diseas-health-public Strengthening public health surveillance and response using the health systems strengthening agenda in developing countries. 2010 38
surveil-diseas-health-public Evolution of ebola virus disease from exotic infection to global health priority, Liberia, mid-2014. 2015 21
surveil-diseas-health-public Effective animal health disease surveillance using a network-enabled approach. 2010 9
surveil-diseas-report-health Concepts for risk-based surveillance in the field of veterinary medicine and veterinary public health: review of current approaches. 2006 84
surveil-diseas-report-health Public health surveillance: historical origins, methods and evaluation. 1994 53
surveil-diseas-report-health Social media and internet-based data in global systems for public health surveillance: a systematic review. 2014 46
syndrom-surveil-system-health-public Implementing syndromic surveillance: a practical guide informed by the early experience. 2004 128
syndrom-surveil-system-health-public First confirmed cases of Middle East respiratory syndrome coronavirus (MERS-CoV) infection in the United States, updated information on the epidemiology of MERS-CoV infection, and guidance for the public, clinicians, and public health authorities - May 2014. 2014 52
syndrom-surveil-system-health-public Enhanced drop-in syndromic surveillance in New York City following September 11, 2001. 2003 18
tobacco-industri-health-includ-public The education effect on population health: a reassessment. 2011 104
tobacco-industri-health-includ-public Tobacco industry tactics for resisting public policy on health. 2000 73
tobacco-industri-health-includ-public The next public health revolution: public health information fusion and social networks. 2010 22
vaccin-diseas-public-health Investigation of bioterrorism-related anthrax, United States, 2001: epidemiologic findings. 2002 291
vaccin-diseas-public-health Mass distribution of free, intranasally administered influenza vaccine in a public school system. 2007 48
vaccin-diseas-public-health Smallpox: An attack scenario. 1999 47
water-drink-suppli-sampl-prevent-survei-level-health-public Pollution status of Pakistan: a retrospective review on heavy metal contamination of water, soil, and vegetables. 2014 22
water-drink-suppli-sampl-prevent-survei-level-health-public Comprehensive smoke-free legislation in England: how advocacy won the day. 2007 20
water-drink-suppli-sampl-prevent-survei-level-health-public Water fluoridation: a critical review of the physiological effects of ingested fluoride as a public health intervention. 2014 10
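The brief-variat-null-data cluster contributes 18 rows to this table even though only three per cluster were requested. This is because top_n keeps every row tied at the cut-off, and all articles in that cluster have 0 citations. The base R sketch below reproduces that ranking rule (minimum rank) on invented data and contrasts it with a tie-free selection. `desc_min_rank` is a hypothetical helper.

```r
# Toy citation counts for two clusters (values invented for illustration).
cites <- data.frame(
  cluster      = c("a", "a", "a", "a", "b", "b", "b", "b"),
  citedByCount = c(50, 20, 10, 5, 0, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Minimum rank in descending order: tied rows share the best rank of the tie.
desc_min_rank <- function(x) rank(-x, ties.method = "min")

# Keep rows ranked 3 or better within each cluster (all ties are kept).
keep_with_ties <- ave(cites$citedByCount, cites$cluster,
                      FUN = desc_min_rank) <= 3
with_ties <- cites[keep_with_ties, ]

# Tie-free alternative: order within cluster and take the first three rows.
ordered <- cites[order(cites$cluster, -cites$citedByCount), ]
row_in_cluster <- ave(seq_len(nrow(ordered)), ordered$cluster, FUN = seq_along)
without_ties <- ordered[row_in_cluster <= 3, ]

print(table(with_ties$cluster))
print(table(without_ties$cluster))
```

In dplyr, `slice_max(citedByCount, n = 3, with_ties = FALSE)` is the tie-free equivalent if exactly three rows per cluster are wanted.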

Use of keywords

We can review the commonest MeSH headings associated with each cluster tag.


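labels$results %>%
  left_join(search1$df, by = c("pmid.value" = "pmid")) %>%
  select(clus_names, mesh) %>%
  filter(mesh != "NULL") %>% # drop records without MeSH headings
  unnest(mesh) %>% # one row per heading (unnest() is from tidyr)
  count(clus_names, mesh, sort = TRUE) %>%
  filter(n < 30) %>% # exclude headings used 30 or more times within a cluster
  ungroup() %>%
  group_by(clus_names) %>%
  top_n(10, n) %>% # ties are kept, so some clusters list more than 10 headings
  mutate(summary = paste(mesh, collapse = "; ")) %>%
  select(-c(mesh, n)) %>%
  distinct() %>%
  arrange(clus_names) %>%
  gt::gt()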
summary
1-control-studi-assess-base-public-health
Humans; Male; Female; Public Health; Middle Aged; Adult; Animals; Child; Risk Factors; Adolescent; Age Factors; Aged; Appointments and Schedules; Cost-Benefit Analysis; Denmark; Disease Outbreaks; Hospitals, Public; No-Show Patients; Prevalence; Reminder Systems; Risk Assessment; Sex Factors; Young Adult
31-1-surveil-null
Australia; Humans; Public Health Surveillance; Disease Notification; Female; Male; Child, Preschool; Infant; Child; Adolescent; Communicable Diseases
artifici-intellig-health-public
Humans; Public Health; Adult; Female; Neural Networks (Computer); Aged; Aged, 80 and over; Artificial Intelligence; Male; Middle Aged
brief-variat-null-data
Hospitals, Public; State Medicine; United Kingdom; Humans; Patient Admission; Primary Health Care; England; Length of Stay; Waiting Lists; Ambulatory Surgical Procedures; Efficiency, Organizational; Elective Surgical Procedures; Emergency Service, Hospital; Hospitalization
children-ag-studi-health
Female; Child; Male; Child, Preschool; Infant; Public Health; Environmental Exposure; Adult; Adolescent; Lead; Socioeconomic Factors
communic-surveil-diseas-null-health
Humans; Public Health; Public Health Surveillance; Communicable Disease Control; Population Surveillance; Australia; Child; Communicable Diseases; Data Collection; Health Promotion; History, 20th Century; United Kingdom; United States
curv-predict-risk-perform-valid-data-identifi-studi-health
Humans; United States; Algorithms; Cross-Sectional Studies; Disease Outbreaks; Government Agencies; Intelligence; Mass Media; Public Health; Public Policy; Research; Risk Assessment; ROC Curve
data-health-studi-public
History, 20th Century; Epidemiology; Public Health Administration; Centers for Disease Control and Prevention (U.S.); Communicable Disease Control; Risk Factors; Health Policy; Australia; Infant; Internet
data-studi-public-health
Humans; Female; Hospitals, Public; Male; England; Public Health; Aged; Adult; Aged, 80 and over; Delivery of Health Care; Medical Audit; Middle Aged; Retrospective Studies; Socioeconomic Factors
doctor-health-studi-public
Humans; Female; Male; Health Services Accessibility; Middle Aged; Public Health; Socioeconomic Factors; Surveys and Questionnaires; Adult; Attitude; Delivery of Health Care; Health Knowledge, Attitudes, Practice; Nurses; Physicians; Public Opinion; Risk Factors; State Medicine; United Kingdom; Young Adult
doctor-null-imag-studi
Humans; Public Opinion; Female; Male; Physicians; Adult; Hospitals, Public; United States; Attitude to Health; Intelligence; Surveys and Questionnaires
effect-studi-health-public
Male; Public Health; Adult; Adolescent; Middle Aged; Public Policy; Young Adult; Socioeconomic Factors; Aged; Surveys and Questionnaires
epidem-intellig-null-health
Humans; Public Health; United Kingdom; Hospitals, Public; State Medicine; Disease Outbreaks; Health Policy; Information Dissemination; Intelligence Tests; United States
ethic-research-medic-null
Humans; Ethics, Medical; Intelligence; Public Policy; International Cooperation; Internationality; Parents; Physician-Patient Relations; Research; Social Control, Formal; United Kingdom
genet-medicin-null-public
Humans; Eugenics; Public Health; Genetic Diseases, Inborn; Genetic Engineering; Genetic Testing; Genetics; Intelligence; Public Opinion; Public Policy
genet-research-health-public-develop-studi
Humans; Public Health; Public Opinion; Adolescent; Female; Male; Adult; Young Adult; Aged; Child; Health Knowledge, Attitudes, Practice; Middle Aged; United States
genet-social-research-determin-human-health
Humans; Public Policy; United States; Ethics, Medical; Genetic Research; Genetic Testing; Genetics, Medical; Intelligence; Public Opinion; Socioeconomic Factors
health-studi-identifi-public-includ
Public Health; United States; Female; Male; Public Health Practice; Adult; Emergencies; Health Personnel; Biomedical Research; Cross-Sectional Studies; Epidemiology; Medical Informatics; Public-Private Sector Partnerships; Public Health Administration; Qualitative Research; Social Media; United Kingdom
hiv-infect-health-public
Humans; HIV Infections; United States; Female; Adult; Male; Adolescent; Middle Aged; Public Health; Public Policy; Young Adult
intellectu-measur-public-health
Humans; Female; Intelligence; Male; Adult; Public Health; Public Sector; Child; Cross-Sectional Studies; Environmental Exposure; Private Sector; United States
intellig-health-studi-public
Humans; Male; Female; Adult; Emotional Intelligence; Public Health; Hospitals, Public; Young Adult; Middle Aged; Cross-Sectional Studies; Emotions; Professional Competence; Public Policy; United Kingdom
mental-studi-health-develop-public-includ
Adult; Female; Male; Adolescent; Mental Health; Middle Aged; Intelligence; Public Health; Child; United States
messag-commun-health-public
Humans; Public Health; Female; Adolescent; Adult; Male; Young Adult; Public Opinion; United States; Animals; Disease Outbreaks; Middle Aged
nh-research-public-health
Humans; State Medicine; Hospitals, Public; United Kingdom; Adult; England; Health Services Research; Middle Aged; Public Health; Aged; Data Collection; Female; Marketing of Health Services; Public Health Administration; Public Opinion
polici-null-public-health
Public Opinion; United Kingdom; Disease Outbreaks; Intelligence; Research; Female; Child; History, 20th Century; Male; Animals; Delivery of Health Care; Government; Health Policy; International Cooperation
poor-poverti-famili-social-develop-health-increas-result-public
Humans; Poverty; Adolescent; Adult; Child; Delivery of Health Care; Health Services Accessibility; Healthcare Disparities; Petroleum; Private Sector; Public Health; Public Sector
research-health-public-develop
Public Policy; Female; Male; United States; Adult; Middle Aged; Health Policy; Public Opinion; England; Qualitative Research; United Kingdom
scenario-respons-understand-health-public-develop
Humans; Public Health; United States; Cost-Benefit Analysis; Models, Organizational; Models, Statistical; Public Health Administration; Risk Assessment; Aerospace Medicine; Algorithms; Antidotes; Bioterrorism; Chemical Terrorism; Chemical Warfare Agents; Cholera; Civil Defense; Cloud Computing; Commerce; Communicable Disease Control; Communicable Diseases; Computer Communication Networks; Computer Simulation; Confidentiality; Cooperative Behavior; Delivery of Health Care; Disaster Planning; Disasters; Electrocardiography, Ambulatory; Emigration and Immigration; Environmental Monitoring; Environmental Pollutants; Ethnic Groups; Female; Food Supply; Geography; Georgia; Government Agencies; Health Care Costs; Health Promotion; Health Services Accessibility; History, 20th Century; Hunger; India; Influenza A Virus, H5N1 Subtype; Influenza, Human; Insurance Benefits; Interinstitutional Relations; Internet; Interviews as Topic; Liability, Legal; Male; Malnutrition; Mass Chest X-Ray; Mass Screening; Methylmercury Compounds; Models, Economic; Models, Theoretical; Mortality; Motivation; Newspapers as Topic; Organizational Case Studies; Organizational Culture; Organizational Objectives; Organizations, Nonprofit; Pandemics; Political Systems; Probability; Program Development; Public-Private Sector Partnerships; Quarantine; Remote Sensing Technology; Research; Smallpox; Social Conditions; Social Medicine; Social Problems; Starvation; Strategic Stockpile; Travel; Triage; Tuberculosis, Pulmonary; Ukraine; United States Public Health Service; USSR; Voluntary Workers
scienc-null-public-health
Humans; Science; Public Health; Public Policy; United States; Biomedical Research; Federal Government; Politics; Public Opinion; Artificial Intelligence; Culture; Environment; Genetic Engineering; Mass Media; Research; Social Sciences; Software; Technology
scienc-research-public-health
Public Health; Public Policy; United States; Public Opinion; Science; Community Participation; History, 20th Century; Biomedical Research; History, 21st Century; Politics
studi-assess-includ-health
Humans; Public Health; United States; Adult; Child; Air Pollution; Europe; Female; International Cooperation; Obesity
surveil-data-public-health
Humans; Disease Outbreaks; Public Health; Public Health Surveillance; Male; Population Surveillance; Female; Internet; Middle Aged; Adult; United States
surveil-diseas-health-public
Humans; Public Health; Population Surveillance; Disease Outbreaks; Epidemiology; United States; Animals; Public Health Practice; Female; Laboratory Personnel; Male
surveil-diseas-report-health
Female; Male; Adolescent; Australia; Child; Adult; Middle Aged; Aged; Child, Preschool; Infant
syndrom-surveil-system-health-public
Humans; Public Health Surveillance; Public Health; United States; Disease Outbreaks; Male; Population Surveillance; Adult; Child; Female; Middle Aged
tobacco-industri-health-includ-public
Humans; Public Health; Tobacco Industry; Public Relations; United States; Smoking; Female; Health Policy; Male; Politics; Smoking Prevention
vaccin-diseas-public-health
United States; Bioterrorism; Disease Outbreaks; Vaccination; Adult; Female; Centers for Disease Control and Prevention (U.S.); Smallpox; Anthrax; Male
water-drink-suppli-sampl-prevent-survei-level-health-public
Humans; Public Health; Adolescent; Child; Drinking Water; Female; Male; Adult; Child, Preschool; Infant; Young Adult

Systematic reviews

We can extract systematic reviews in a similar way.


sr <- labels$results %>%
  left_join(search1$df, by = c("pmid.value" = "pmid")) %>%
  filter(str_detect(keywords, "Review")|str_detect(absText, "review")) %>%
  unnest("absText")

table_sr <- sr %>%
  select(title, journalTitle, pubYear, clus_names, keywords, absText)

There are 250 articles tagged with Review as a MeSH heading or containing "review" as a text word in the abstract. These are shown in Table 2.
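
The filter above is case sensitive: it matches "Review" in the keywords and lower-case "review" in the abstract, and a plain substring match would also pick up words such as "overview". The following is a minimal sketch of a stricter variant, run on an invented toy table (`toy` and its values are hypothetical; only the column names mirror the joined data above). It uses a case-insensitive pattern anchored at a word boundary and treats missing values as non-matches.

```r
library(dplyr)
library(stringr)
library(tibble)

# Toy data: column names mirror the joined table above, values are invented
toy <- tibble(
  title    = c("A", "B", "C", "D"),
  keywords = c("Humans", "Humans", NA, "Public Health; Review"),
  absText  = c("An overview of methods.", "A Systematic Review of ...",
               "Cohort study.", NA)
)

# Case-insensitive match on words that start with "review"
# (so "Review" and "reviews" match, but "overview" does not)
review_pattern <- regex("\\breview", ignore_case = TRUE)

toy_sr <- toy %>%
  filter(
    coalesce(str_detect(keywords, review_pattern), FALSE) |
      coalesce(str_detect(absText, review_pattern), FALSE)
  )

toy_sr$title
```

Whether the stricter pattern is preferable depends on how much noise is acceptable at this stage; for a rapid overview the looser filter may be good enough.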

Full table of abstracts

Finally we can gather all the abstracts into a single interactive table which can be searched, filtered and shared.


labels <- labels$results %>%
  left_join(search1$df, by = c("pmid.value" = "pmid"))  %>%
  select(cluster, clus_names, pmcid,  doi, title, journalTitle, pubYear, citedByCount, absText) %>%
  mutate(doi = paste0("<a href='http://google.com/search?q=", doi, "'>doi</a>"))
         #cluster = factor(cluster)) 

labels %>%
  DT::datatable(escape = FALSE, extensions = c('Responsive','Buttons', 'FixedHeader'), 
                filter = "top", 
  options = list(pageLength = 25,
    autoWidth = TRUE,
    columnDefs = list( ),
    dom = 'Bfrtip',
    buttons = c('csv', 'excel'),
    fixedHeader=TRUE) 
  )
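
In the table code above, a missing `doi` is pasted into the link as the text "NA". As a hedged alternative, the sketch below builds links only where a DOI is present and points them at the `https://doi.org/` resolver instead of a web search. The toy table and its DOI value are invented placeholders, and `doi_link` is a hypothetical column name.

```r
library(dplyr)
library(tibble)

# Toy data: the DOI is an invented placeholder, not a real article
toy <- tibble(
  title = c("Paper one", "Paper two"),
  doi   = c("10.1000/example.1", NA)
)

toy_links <- toy %>%
  mutate(
    doi_link = if_else(
      is.na(doi),
      NA_character_,  # leave the cell empty when there is no DOI
      paste0("<a href='https://doi.org/", doi, "' target='_blank'>", doi, "</a>")
    )
  )

toy_links$doi_link
```

As in the table above, `escape = FALSE` is needed in `DT::datatable()` for the links to render as HTML rather than as literal text.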

Full texts

Selected full texts

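library(rvest)
library(europepmc)
library(tidyverse)   # enframe(), map(), safely(), unnest(), str_detect()
library(myScrapers)  # get_page_text() is assumed to come from myScrapers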



get_pmcids <- labels %>%
  select(pmcid) %>%
  filter(!is.na(pmcid)) %>%
  pull(pmcid)


details <- enframe(get_pmcids) %>%
  mutate(details = map(value, epmc_details, data_src = "pmc"))

details %>%
  unnest(cols = details)

full_text_url <- details %>%
    mutate(full_text = map(details, "ftx")) %>%
    unnest(full_text) %>%
  filter(availability == "Free", documentStyle != "pdf") %>%
  select(value, url)

safe_text <- safely(get_page_text)

ftxt <- full_text_url %>%
  mutate(ftext = map(url, safe_text))

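# safely() returns a list(result, error) for each URL.
# Keep only the successful results (assumes get_page_text returns text).
ftxt_ok <- ftxt %>%
  mutate(ftext = map(ftext, "result")) %>%
  filter(!map_lgl(ftext, is.null)) %>%
  unnest(cols = ftext) %>%
  distinct()

ftxt_ok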
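
Because the scraper is wrapped in `safely()`, each element of the `ftext` column is a list with a `result` and an `error` slot rather than the text itself. The sketch below illustrates this on a hypothetical stand-in function (`fake_scraper` is invented for the example and does no web access), showing one way to separate successful results from failures.

```r
library(purrr)
library(dplyr)
library(tibble)

# Hypothetical stand-in for a page scraper: fails on one input
fake_scraper <- function(url) {
  if (grepl("broken", url)) stop("page not found")
  paste("text from", url)
}

safe_scraper <- safely(fake_scraper)

pages <- tibble(url = c("site/a", "site/broken", "site/c")) %>%
  mutate(
    out    = map(url, safe_scraper),
    # result is NULL when the call failed
    text   = map_chr(out, ~ if (is.null(.x$result)) NA_character_ else .x$result),
    # error is NULL when the call succeeded
    failed = map_lgl(out, ~ !is.null(.x$error))
  )

pages %>% select(url, text, failed)
```

Keeping a `failed` flag makes it easy to see which URLs need to be retried or fetched by another route.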

full_text_pdf <- details %>%
    mutate(full_text = map(details, "ftx")) %>%
    unnest(full_text) %>%
  filter(availability == "Free", documentStyle == "pdf") %>%
  select(value, url)

# summary_ftext <- ftxt %>%
#   group_by(id) %>%
#   mutate(col = paste(ftxt, collapse = " ")) %>%
#   select(-ftext) %>%
#   distinct() %>%
#   mutate(summary = map(col, text_summariser, 6))
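
The commented-out code above appears to be aiming to collapse the scraped paragraphs into one text per article before summarising. A minimal sketch of that collapsing step, on invented toy data, might look like the following; the `id` and `full_text` names are assumptions for illustration, and the summarising step itself is left out.

```r
library(dplyr)
library(tibble)

# Toy data: one row per paragraph, `id` identifies the article (invented values)
paras <- tibble(
  id    = c("PMC1", "PMC1", "PMC2"),
  ftext = c("First paragraph.", "Second paragraph.", "Only paragraph.")
)

# One row per article, with its paragraphs joined in their original order
docs <- paras %>%
  group_by(id) %>%
  summarise(full_text = paste(ftext, collapse = " "), .groups = "drop")

docs
```

Using `summarise()` rather than `mutate()` followed by `distinct()` returns one row per article directly.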