Introduction

  • Literature and evidence review essential in public health practice
  • Exponential growth in volume of literature
  • First steps are usually:
    • Developing a search strategy
    • Reviewing and filtering abstracts
    • Obtaining full text (if possible)
    • Data extraction

This can be a manual, protracted and iterative process: using specialised searching services, downloading abstracts, reading and filtering, secondary searching and so on. It may involve sifting many thousands of abstracts.

Often we may just want a rapid overview of the literature to help focus further reviewing.

In this vignette we demonstrate the use of R packages for large scale extraction of abstracts, and analytical techniques for identifying topics or themes in the abstracts.

The vignette is based on a number of R packages:

  1. europepmc - this is a sophisticated tool which interacts with the Europe PMC API and provides access to additional fields.
  2. adjutant - this is a fully fledged package with retrieval and clustering functions.
  3. tidytext - a package for text mining using tidy data principles.
  4. Rtsne - this uses the t-SNE algorithm for dimensionality reduction and cluster visualisation.
  5. dbscan - applies the HDBSCAN algorithm for data clustering.
  6. myScrapers - wraps some functions built on other packages to automate the search, extraction, and filtering process.

We have “hacked” some of the functions in these packages and written additional functions to develop a workflow from searching and retrieval to analysis.

A simple example using europepmc

Searching Europe PubMed Central (epmc)

This is a package which allows searching of EuropePMC via the API.

It can be downloaded from CRAN.


if(!require("europepmc")) install.packages("europepmc")
library(europepmc)

The main function is epmc_search which allows us to search the site and retrieve abstracts, metadata and citation counts.

We’ll use it with the search term “big data” AND “public health”.


head(epmc_search(params$search, limit = 10))
#> # A tibble: 6 x 28
#>   id    source pmid  doi   title authorString journalTitle issue
#>   <chr> <chr>  <chr> <chr> <chr> <chr>        <chr>        <chr>
#> 1 3148~ MED    3148~ 10.1~ mHea~ Madanian S,~ BMJ Health ~ 1    
#> 2 3109~ MED    3109~ 10.1~ Big ~ Zetino J, M~ Soc Work Pu~ <NA> 
#> 3 3143~ MED    3143~ 10.3~ Usin~ Cassim N, M~ Stud Health~ <NA> 
#> 4 3147~ MED    3147~ 10.1~ Usin~ Zhang Y, Ba~ Int J Biome~ <NA> 
#> 5 3106~ MED    3106~ 10.1~ Auto~ Shaw D, Fav~ Med Health ~ <NA> 
#> 6 3142~ MED    3142~ 10.1~ The ~ Kenney M, M~ Med Humanit  <NA> 
#> # ... with 20 more variables: journalVolume <chr>, pubYear <chr>,
#> #   journalIssn <chr>, pubType <chr>, isOpenAccess <chr>, inEPMC <chr>,
#> #   inPMC <chr>, hasPDF <chr>, hasBook <chr>, citedByCount <int>,
#> #   hasReferences <chr>, hasTextMinedTerms <chr>,
#> #   hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> #   hasTMAccessionNumbers <chr>, firstIndexDate <chr>,
#> #   firstPublicationDate <chr>, pageInfo <chr>, pmcid <chr>,
#> #   hasSuppl <chr>
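
The call above takes its query from params$search, which is presumably a parameter of the R Markdown document. Outside a parameterised document it is not defined, so as a minimal sketch the query string described above could be built by hand. The names build_query and query are illustrative and are not part of europepmc or myScrapers.

```r
# Hypothetical helper: quote each phrase and join the phrases with AND
build_query <- function(phrases) {
  paste(sprintf('"%s"', phrases), collapse = " AND ")
}

query <- build_query(c("big data", "public health"))
cat(query, "\n")

# The string can then be used in place of params$search, e.g.
# europepmc::epmc_search(query = query, limit = 10)
```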

This doesn’t extract the abstract text or MeSH headings (keywords). To facilitate this we have wrapped the search function into get_full_search in myScrapers.

library(myScrapers)  # provides get_full_search
library(tictoc)      # tic() and toc() time the download

tic()
search1 <- get_full_search(search = params$search, limit = params$limit)
toc()
#> 1751.07 sec elapsed

head(search1, 20)
#> # A tibble: 20 x 32
#>    id    source pmid  doi   title authorString journalTitle issue
#>    <chr> <chr>  <chr> <chr> <chr> <chr>        <chr>        <chr>
#>  1 3148~ MED    3148~ 10.1~ mHea~ Madanian S,~ BMJ Health ~ 1    
#>  2 3109~ MED    3109~ 10.1~ Big ~ Zetino J, M~ Soc Work Pu~ <NA> 
#>  3 3143~ MED    3143~ 10.3~ Usin~ Cassim N, M~ Stud Health~ <NA> 
#>  4 3147~ MED    3147~ 10.1~ Usin~ Zhang Y, Ba~ Int J Biome~ <NA> 
#>  5 3106~ MED    3106~ 10.1~ Auto~ Shaw D, Fav~ Med Health ~ <NA> 
#>  6 3142~ MED    3142~ 10.1~ The ~ Kenney M, M~ Med Humanit  <NA> 
#>  7 3126~ MED    3126~ 10.2~ Topi~ Cho HW.      Osong Publi~ 3    
#>  8 3131~ MED    3131~ 10.1~ Big ~ Yang C, Kon~ Nephrology ~ <NA> 
#>  9 3130~ MED    3130~ 10.1~ Use ~ Vermund SH.  Lancet HIV   8    
#> 10 3123~ MED    3123~ 10.3~ [Ran~ Xu L, Wang ~ Zhonghua Li~ 6    
#> 11 3138~ MED    3138~ 10.1~ Data~ Hoeyer K, B~ Soc Stud Sci 4    
#> 12 3109~ MED    3109~ 10.1~ Arti~ Leatherdale~ Cancer Caus~ 7    
#> 13 3149~ MED    3149~ 10.1~ "Hea~ Cheng YC, H~ Drug Alcoho~ <NA> 
#> 14 3133~ MED    3133~ 10.3~ Appl~ Kim H, Han ~ Int J Envir~ 14   
#> 15 3110~ MED    3110~ 10.3~ Usin~ Rocklöv J, ~ Emerg Infec~ 6    
#> 16 3141~ MED    3141~ 10.1~ Publ~ Rodríguez-G~ Yearb Med I~ 1    
#> 17 3120~ MED    3120~ 10.1~ Does~ Caliebe A, ~ BMC Med Res~ 1    
#> 18 3127~ MED    3127~ 10.1~ Data~ Hoeyer K.    Soc Stud Sci 4    
#> 19 3145~ MED    3145~ 10.1~ Why ~ Lehne M, Sa~ NPJ Digit M~ <NA> 
#> 20 3127~ MED    3127~ 10.1~ Leve~ Thompson ME~ N C Med J    4    
#> # ... with 24 more variables: journalVolume <chr>, pubYear <chr>,
#> #   journalIssn <chr>, pubType <chr>, isOpenAccess <chr>, inEPMC <chr>,
#> #   inPMC <chr>, hasPDF <chr>, hasBook <chr>, citedByCount <int>,
#> #   hasReferences <chr>, hasTextMinedTerms <chr>,
#> #   hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> #   hasTMAccessionNumbers <chr>, firstIndexDate <chr>,
#> #   firstPublicationDate <chr>, pageInfo <chr>, pmcid <chr>,
#> #   hasSuppl <chr>, name <int>, absText <list>, mesh <list>,
#> #   keywords <chr>

We can see that get_full_search returns the same metadata as epmc_search, such as citation counts, whether the article is open access and whether a PDF is available, and adds the abstract text (absText), MeSH headings (mesh) and keywords. By default, 1000 article descriptions are downloaded.

We can see how many articles are available altogether by running epmc_profile.


profile <- epmc_profile(query = params$search)

Running epmc_profile allows us to see that there are 4001 articles of which 3521 are full text articles, and 2457 are open access.

Analysing abstracts

Abstracts per year

We can easily look at annual abstract frequency - we can readily see the growth in publication frequency in the last 3 years.


library(dplyr)    # %>% and count()
library(ggplot2)

search1 %>%
  count(pubYear) %>%
  ggplot(aes(pubYear, n)) +
  geom_col(fill = "blue") +
  labs(title = "Abstracts per year", 
       subtitle = paste("Search: ", params$search)) +
  phecharts::theme_phe() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Journal frequency

Similarly we can identify the most frequent journals.


journal_count <- search1 %>%
  count(journalTitle) %>%
  top_n(20, n) %>%      # rank explicitly by the count column
  arrange(desc(n))

journal_count %>%
  ggplot(aes(reorder(journalTitle, n), n)) +
  geom_col(fill = "blue") +
  coord_flip() +
  labs(title = "Journal frequency") +
  phecharts::theme_phe()

PLoS One and Int J Environ Res Public Health are the most frequent journals publishing articles on “big data” AND “public health”.

Topic identification

Once we have a data frame of 4004 records with abstract text, we can prepare the data for analysis. The create_corpus function is designed for this.


out1 <- search1 %>%
  select(pmid, pmcid, doi, title, pubYear, citedByCount, absText, journalTitle) %>%
  filter(absText != "NULL") %>%
  mutate(text = paste(title, absText))

Text mining

We will use a method exemplified in the adjutant package which uses unsupervised machine learning to try and cluster similar articles and attach themes.

In this approach we undertake some natural language processing. We will:

  • Split each abstract into single words (tokens)
  • Remove numbers and common (stop) words
  • Stem each word (reduce it to its root form, so that, for example, “study” and “studies” both become “studi”)
  • Calculate the tf-idf score for each word in each abstract - this gives more weight to words which are frequent in one abstract but rare across the collection, i.e. more “typical” of that abstract
  • Create a document feature matrix
  • Undertake dimensionality reduction using t-SNE to simplify the matrix to two dimensions
  • Run HDBSCAN to identify clusters
  • Name the clusters
  • QA the result
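
To make the tf-idf step concrete, the following is a minimal base R sketch on three invented toy “abstracts”; it is not the create_corpus implementation. It assumes the usual definitions: tf is a word’s count divided by the number of words in the document, and idf is the natural log of the number of documents divided by the number of documents containing the word. In practice tidytext offers bind_tf_idf for this.

```r
# Toy documents (invented for illustration, not taken from the search results)
docs <- c(
  d1 = "air pollution health data",
  d2 = "vaccine health data",
  d3 = "air quality data"
)

# Split each document into single words
tokens <- strsplit(unname(docs), " ", fixed = TRUE)
names(tokens) <- names(docs)

# One row per document-word pair, with count n and term frequency tf
counts <- do.call(rbind, lapply(names(tokens), function(id) {
  tab <- table(tokens[[id]])
  data.frame(doc = id, word = names(tab), n = as.integer(tab),
             tf = as.integer(tab) / sum(tab), stringsAsFactors = FALSE)
}))

# Each word appears at most once per document here, so the number of rows
# per word is the number of documents containing it
doc_freq <- table(counts$word)
counts$idf <- log(length(docs) / as.integer(doc_freq[counts$word]))
counts$tf_idf <- counts$tf * counts$idf

print(counts[order(-counts$tf_idf), ], row.names = FALSE)
```

A word such as “data”, which occurs in every toy document, gets an idf of zero and so carries no weight, whereas “pollution”, which occurs in only one, scores highest in that document.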

The ultimate output of this analysis is a visualisation of clustered and labelled abstracts and an interactive table.


library(tidytext)

corp <- create_corpus(df = search1)

head(corp$corpus)
#> # A tibble: 6 x 6
#>   pmid     word         n      tf   idf tf_idf
#>   <chr>    <chr>    <int>   <dbl> <dbl>  <dbl>
#> 1 18769432 biocur       1 0.25    7.57  1.89  
#> 2 18769432 data         1 0.25    0.466 0.117 
#> 3 18769432 futur        1 0.25    1.83  0.457 
#> 4 18769432 null         1 0.25    2.19  0.546 
#> 5 21214922 accuraci     2 0.0185  3.03  0.0561
#> 6 21214922 aim          1 0.00926 1.88  0.0174

clust <- create_cluster(corpus = corp$corpus, minPts = 10)
#> 1640.52 sec elapsed


clust$cluster_size
#> # A tibble: 70 x 2
#>    cluster     n
#>      <dbl> <int>
#>  1       0  1326
#>  2       1    62
#>  3       2    12
#>  4       3    49
#>  5       4    49
#>  6       5    13
#>  7       6    11
#>  8       7    37
#>  9       8    74
#> 10       9    42
#> # ... with 60 more rows
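
The internals of create_cluster are not shown here. As a rough sketch of the two underlying steps, the example below runs t-SNE and then HDBSCAN on random toy data that stand in for the document feature matrix. It assumes the Rtsne and dbscan packages are installed; the object names toy, embedding and clusters are illustrative. In dbscan::hdbscan output, cluster 0 denotes noise points that were not assigned to any cluster.

```r
# Sketch only: toy data stand in for the document feature matrix
if (requireNamespace("Rtsne", quietly = TRUE) &&
    requireNamespace("dbscan", quietly = TRUE)) {
  set.seed(42)
  # 100 toy "documents" with 10 features, drawn from two well separated groups
  toy <- rbind(
    matrix(rnorm(50 * 10, mean = 0), ncol = 10),
    matrix(rnorm(50 * 10, mean = 6), ncol = 10)
  )
  # Reduce to two dimensions; perplexity must be below (nrow - 1) / 3
  embedding <- Rtsne::Rtsne(toy, dims = 2, perplexity = 10)$Y
  # Density-based clustering on the embedding; label 0 marks noise points
  clusters <- dbscan::hdbscan(embedding, minPts = 10)$cluster
  print(table(clusters))
}
```

Because t-SNE is stochastic, cluster counts and sizes can vary between runs unless a seed is set.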

Labelling clusters


labels <- label_clusters(corp$corpus, clustering = clust$clustering, top_n = 4)
#> 1.08 sec elapsed

labels$labels
#> # A tibble: 70 x 2
#> # Groups:   cluster [70]
#>    cluster clus_names                          
#>      <dbl> <chr>                               
#>  1       0 research-health-data-studi          
#>  2       1 hiv-studi-data-health               
#>  3       2 ehealth-care-health-inform-base-data
#>  4       3 pollut-air-studi-health             
#>  5       4 vaccin-develop-data-health          
#>  6       5 cancer-null-report-data             
#>  7       6 pregnanc-popul-studi-data           
#>  8       7 nutrit-approach-health-data         
#>  9       8 imag-clinic-data-studi              
#> 10       9 injuri-data-studi-health            
#> # ... with 60 more rows
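
The label format suggests that each cluster is named after its highest scoring stemmed words joined by hyphens. The sketch below illustrates that idea on invented toy scores; it is an assumption about the approach, not the label_clusters implementation, and label_one is a hypothetical helper.

```r
# Toy word scores per cluster (invented for illustration)
toy <- data.frame(
  cluster = c(1, 1, 1, 2, 2, 2),
  word    = c("hiv", "studi", "data", "pollut", "air", "data"),
  tf_idf  = c(0.9, 0.4, 0.1, 0.8, 0.7, 0.1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sum tf-idf per word within a cluster and
# join the top_n highest scoring words with hyphens
label_one <- function(d, top_n = 2) {
  scores <- sort(tapply(d$tf_idf, d$word, sum), decreasing = TRUE)
  paste(names(scores)[seq_len(min(top_n, length(scores)))], collapse = "-")
}

cluster_labels <- sapply(split(toy, toy$cluster), label_one)
print(cluster_labels)
```

Under this scheme, words that are common to most abstracts, such as “data” or “health”, can still appear in many labels if they rank among a cluster’s top terms.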

Visualise


p <- labels$results %>%
  left_join(search1, by = c("pmid.value" = "pmid")) %>%
  ggplot(aes(X1, X2)) +
  geom_point(aes(colour = clustered, size = citedByCount) ) +
  ggrepel::geom_text_repel(data = labels$plot, aes(medX, medY, label = clus_names), size = 3, colour = "#006d2c", alpha = 0.9)

p + scale_alpha_manual(values=c(1,0)) +
  viridis::scale_color_viridis(discrete = TRUE, option = "cividis", alpha = .6) +
  phecharts::theme_phe() +
  theme(panel.background = element_rect(fill = "#f0f0f0")) +
  labs(subtitle = paste("Clustering: ", nrow(labels$plot), " topics" ), 
       title = paste("Search ", "= ", params$search ))

Understanding the labels

Most cited articles


most_cited <- labels$results %>%
  left_join(search1, by = c("pmid.value" = "pmid")) %>%
  filter(cluster !=0) %>%
  group_by(clus_names) %>%
  top_n(n = 3, citedByCount) %>%
  select(clus_names, title, pubYear, citedByCount) %>%
  ungroup() %>%
  arrange(clus_names, -citedByCount)

most_cited %>%
  formattable::formattable()
clus_names title pubYear citedByCount
abstract-meet-annual-null Pandemic influenza and socioeconomic disparities: Lessons from 1918 Chicago. 2016 4
abstract-meet-annual-null Oral abstracts of the 22nd International AIDS Conference, 23-27 July 2018, Amsterdam, the Netherlands. 2018 4
abstract-meet-annual-null Abstracts from the 38th annual meeting of the society of general internal medicine. 2015 3
abstract-meet-annual-null Proceedings of the 3rd Biennial Conference of the Society for Implementation Research Collaboration (SIRC) 2015: advancing efficient methodologies through community partnerships and team science : Seattle, WA, USA. 24-26 September 2015. 2016 3
ag-associ-risk-studi Sex-age-specific association of body mass index with all-cause mortality among 12.8 million Korean adults: a prospective cohort study. 2015 35
ag-associ-risk-studi Association Between Smoking and Physician-Diagnosed Stroke and Myocardial Infarction in Male Adults in Korea. 2016 3
ag-associ-risk-studi Socioeconomic disparities in first stroke incidence, quality of care, and survival: a nationwide registry-based cohort study of 44 million adults in England. 2018 3
ag-nation-health-studi Cohort profile: the National Health Insurance Service-National Health Screening Cohort (NHIS-HEALS) in Korea. 2017 41
ag-nation-health-studi National dental policies and socio-demographic factors affecting changes in the incidence of periodontal treatments in Korean: A nationwide population-based retrospective cohort study from 2002-2013. 2016 12
ag-nation-health-studi Longitudinal change in estimated GFR among CKD patients: A 10-year follow-up study of an integrated kidney disease care program in Taiwan. 2017 8
ag-popul-studi-result Space-Time Covariation of Mortality with Temperature: A Systematic Study of Deaths in France, 1968-2009. 2015 18
ag-popul-studi-result Multiscale Entropy of Electroencephalogram as a Potential Predictor for the Prognosis of Neonatal Seizures. 2015 6
ag-popul-studi-result Maximising follow-up participation rates in a large scale 45 and Up Study in Australia. 2016 5
ai-artifici-intellig-health Global Evolution of Research in Artificial Intelligence in Health and Medicine: A Bibliometric Study. 2019 3
ai-artifici-intellig-health An overview of GeoAI applications in health and healthcare. 2019 2
ai-artifici-intellig-health Digital Diabetes Data and Artificial Intelligence: A Time for Humility Not Hubris. 2019 1
ai-artifici-intellig-health Artificial Intelligence for infectious disease Big Data Analytics. 2019 1
ai-artifici-intellig-health Artificial Intelligence vs. Natural Stupidity: Evaluating AI readiness for the Vietnamese Medical Information System. 2019 1
analysi-approach-data-base Meta-analysis in clinical trials revisited. 2015 250
analysi-approach-data-base VFDB 2016: hierarchical and refined dataset for big data analysis–10 years on. 2016 184
analysi-approach-data-base Metabolomics: beyond biomarkers and towards mechanisms. 2016 160
burden-global-diseas-studi Global, regional, and national life expectancy, all-cause mortality, and cause-specific mortality for 249 causes of death, 1980-2015: a systematic analysis for the Global Burden of Disease Study 2015. 2016 1071
burden-global-diseas-studi Global, regional, and national comparative risk assessment of 79 behavioural, environmental and occupational, and metabolic risks or clusters of risks, 1990-2015: a systematic analysis for the Global Burden of Disease Study 2015. 2016 534
burden-global-diseas-studi Global, regional, and national incidence, prevalence, and years lived with disability for 328 diseases and injuries for 195 countries, 1990-2016: a systematic analysis for the Global Burden of Disease Study 2016. 2017 484
cancer-gene-express-studi-data Combining multidimensional genomic measurements for predicting cancer prognosis: observations from TCGA. 2015 48
cancer-gene-express-studi-data Genetic Mechanisms of Immune Evasion in Colorectal Cancer. 2018 22
cancer-gene-express-studi-data ISG15 in the tumorigenesis and treatment of cancer: An emerging role in malignancies of the digestive system. 2016 11
cancer-null-report-data The State of Cancer Care in America, 2016: A Report by the American Society of Clinical Oncology. 2016 43
cancer-null-report-data AACR Cancer Progress Report 2014. 2014 23
cancer-null-report-data The evolution of cancer registration. 2014 3
cancer-null-report-data AACR Cancer Progress Report 2015. 2015 3
cancer-patient-studi-data The National Cancer Institute’s Dietary Assessment Primer: A Resource for Diet Research. 2015 36
cancer-patient-studi-data Mechanisms of NAFLD development and therapeutic strategies. 2018 36
cancer-patient-studi-data Systems analysis of the prostate transcriptome in African-American men compared with European-American men. 2016 21
cancer-patient-studi-data Precancer Atlas to Drive Precision Prevention Trials. 2017 21
cancer-popul-studi-data Global, Regional, and National Cancer Incidence, Mortality, Years of Life Lost, Years Lived With Disability, and Disability-Adjusted Life-Years for 29 Cancer Groups, 1990 to 2016: A Systematic Analysis for the Global Burden of Disease Study. 2018 63
cancer-popul-studi-data Incidence and survival of adult cancer patients in Taiwan, 2002-2012. 2016 56
cancer-popul-studi-data Reproducibility, reliability and validity of population-based administrative health data for the assessment of cancer non-related comorbidities. 2017 12
cancer-research-patient-data Gut Microbiota, Inflammation, and Colorectal Cancer. 2016 67
cancer-research-patient-data Effectiveness of acupuncture and related therapies for palliative care of cancer: overview of systematic reviews. 2015 21
cancer-research-patient-data Future cancer research priorities in the USA: a Lancet Oncology Commission. 2017 21
cardiovascular-diseas-risk-studi The Incidence of Major Cardiovascular Events in Immigrants to Ontario, Canada: The CANHEART Immigrant Study. 2015 15
cardiovascular-diseas-risk-studi Cardiovascular Event Prediction by Machine Learning: The Multi-Ethnic Study of Atherosclerosis. 2017 14
cardiovascular-diseas-risk-studi The Association of Arsenic Metabolism with Cancer, Cardiovascular Disease, and Diabetes: A Systematic Review of the Epidemiological Evidence. 2017 14
care-health-studi-data Healthcare Access and Quality Index based on mortality from causes amenable to personal health care in 195 countries and territories, 1990-2015: a novel analysis from the Global Burden of Disease Study 2015. 2017 75
care-health-studi-data Measuring progress and projecting attainment on the basis of past trends of the health-related Sustainable Development Goals in 188 countries: an analysis from the Global Burden of Disease Study 2016. 2017 53
care-health-studi-data Mapping under-5 and neonatal mortality in Africa, 2000-15: a baseline analysis for the Sustainable Development Goals. 2017 28
care-patient-health-data Big data and new knowledge in medicine: the thinking, training, and tools needed for a learning health system. 2014 66
care-patient-health-data Insights from advanced analytics at the Veterans Health Administration. 2014 52
care-patient-health-data Innovating to enhance clinical data management using non-commercial and open source solutions across a multi-center network supporting inpatient pediatric care and research in Kenya. 2016 29
clinic-data-learn-studi Using Big Data to Emulate a Target Trial When a Randomized Trial Is Not Available. 2016 60
clinic-data-learn-studi Medical big data: promise and challenges. 2017 19
clinic-data-learn-studi Are Randomized Controlled Trials the (G)old Standard? From Clinical Intelligence to Prescriptive Analytics. 2016 8
confer-intern-challeng-research-health-develop Making big data useful for health care: a summary of the inaugural mit critical data conference. 2014 18
confer-intern-challeng-research-health-develop U.S. Physician-Scientist Workforce in the 21st Century: Recommendations to Attract and Sustain the Pipeline. 2018 4
confer-intern-challeng-research-health-develop Promoting Secondary Analysis of Electronic Medical Records in China: Summary of the PLAGH-MIT Critical Data Conference and Health Datathon. 2017 3
confound-adjust-estim-effect-model-studi-data Uncertainty in Propensity Score Estimation: Bayesian Methods for Variable Selection and Model Averaged Causal Effects. 2014 12
confound-adjust-estim-effect-model-studi-data Adjusting for unmeasured confounding in nonrandomized longitudinal studies: a methodological review. 2017 7
confound-adjust-estim-effect-model-studi-data Bias Analysis for Uncontrolled Confounding in the Health Sciences. 2017 5
data-decis-health-challeng-public-develop The challenge of big data in public health: an opportunity for visual analytics. 2014 11
data-decis-health-challeng-public-develop Data mashups: potential contribution to decision support on climate change and health. 2014 7
data-decis-health-challeng-public-develop If you build it, they will come: unintended future uses of organised health data collections. 2016 7
digit-technologi-health-data Digital epidemiology. 2012 111
digit-technologi-health-data Developing and Evaluating Digital Interventions to Promote Behavior Change in Health and Health Care: Recommendations Resulting From an International Workshop. 2017 62
digit-technologi-health-data Applying and advancing behavior change theories and techniques in the context of a digital health revolution: proposals for more effectively realizing untapped potential. 2017 20
diseas-clinic-research-data Drug development in Alzheimer’s disease: the path to 2025. 2016 61
diseas-clinic-research-data Big data analytics to improve cardiovascular care: promise and challenges. 2016 55
diseas-clinic-research-data Brain-Body Pathways Linking Psychological Stress and Physical Health. 2015 40
effect-result-studi-data Big Data and Disease Prevention: From Quantified Self to Quantified Communities. 2013 20
effect-result-studi-data Changing the environment to improve population health: a framework for considering exposure in natural experimental studies. 2016 19
effect-result-studi-data Weather effects on the patterns of people’s everyday activities: a study using GPS traces of mobile phone users. 2013 9
ehealth-care-health-inform-base-data How can research keep up with eHealth? Ten strategies for increasing the timeliness and usefulness of eHealth research. 2014 46
ehealth-care-health-inform-base-data What is eHealth (6)? Development of a Conceptual Model for eHealth: Qualitative Study with Key Informants. 2017 15
ehealth-care-health-inform-base-data Designing an Electronic Patient Management System for Multiple Sclerosis: Building a Next Generation Multiple Sclerosis Documentation System. 2016 14
epidemiologi-health-data-research Transforming epidemiology for 21st century medicine and public health. 2013 52
epidemiologi-health-data-research Using Electronic Health Records for Population Health Research: A Review of Methods and Applications. 2016 41
epidemiologi-health-data-research Commentary: Epidemiology in the era of big data. 2015 29
epidemiologi-null-health-data The last two decades of life course epidemiology, and its relevance for research on ageing. 2016 22
epidemiologi-null-health-data Analysis commons, a team approach to discovery in a big-data environment for genetic epidemiology. 2017 12
epidemiologi-null-health-data The Future of Cardiovascular Epidemiology. 2016 11
forecast-model-time-data Accurate estimation of influenza epidemics using Google search data via ARGO. 2015 40
forecast-model-time-data Global disease monitoring and forecasting with Wikipedia. 2014 39
forecast-model-time-data Using networks to combine “big data” and traditional surveillance to improve influenza predictions. 2015 17
gene-associ-diseas-studi-data Pitfalls of predicting complex traits from SNPs. 2013 203
gene-associ-diseas-studi-data Developing and evaluating polygenic risk prediction models for stratified disease prevention. 2016 77
gene-associ-diseas-studi-data Analysis of shared heritability in common disorders of the brain. 2018 40
gene-associ-identifi-studi Rare and low-frequency coding variants alter human adult height. 2017 111
gene-associ-identifi-studi Trans-ancestry meta-analyses identify rare and common variants associated with blood pressure and hypertension. 2016 77
gene-associ-identifi-studi New loci for body fat percentage reveal link between adiposity and cardiometabolic disease risk. 2016 63
genom-challeng-research-data H3ABioNet, a sustainable pan-African bioinformatics network for human heredity and health in Africa. 2016 27
genom-challeng-research-data Stakeholder engagement: a key component of integrating genomic information into electronic health records. 2013 25
genom-challeng-research-data Building a data sharing model for global genomic research. 2014 15
genom-sequenc-base-data Insights from 20 years of bacterial genome sequencing. 2015 144
genom-sequenc-base-data IslandViewer 4: expanded prediction of genomic islands for larger-scale datasets. 2017 99
genom-sequenc-base-data Towards structural systems pharmacology to study complex diseases and personalized medicine. 2014 26
healthcar-data-develop-health Big data analytics in healthcare: promise and potential. 2014 167
healthcar-data-develop-health Driving Innovation in Health Systems through an Apps-Based Information Economy. 2015 29
healthcar-data-develop-health Towards Actualizing the Value Potential of Korea Health Insurance Review and Assessment (HIRA) Data as a Resource for Health Research: Strengths, Limitations, Applications, and Strategies for Optimal Use of HIRA Data. 2017 29
hiv-studi-data-health The changing epidemiology of the global paediatric HIV epidemic: keeping track of perinatally HIV-infected adolescents. 2013 51
hiv-studi-data-health Methods of using real-time social media technologies for detection and remote monitoring of HIV outcomes. 2014 47
hiv-studi-data-health Spending on health and HIV/AIDS: domestic health spending and development assistance in 188 countries, 1995-2015. 2018 17
hospit-patient-ag-studi-data Predicting suicides after psychiatric hospitalization in US Army soldiers: the Army Study To Assess Risk and rEsilience in Servicemembers (Army STARRS). 2015 73
hospit-patient-ag-studi-data Epidemiology and socioeconomic features of appendicitis in Taiwan: a 12-year population-based study. 2015 12
hospit-patient-ag-studi-data Automated comparison of last hospital main diagnosis and underlying cause of death ICD10 codes, France, 2008-2009. 2014 7
host-infect-sequenc-genom 25-Hydroxycholesterol Protects Host against Zika Virus Infection and Its Associated Microcephaly in a Mouse Model. 2017 46
host-infect-sequenc-genom Genomic Infectious Disease Epidemiology in Partially Sampled and Ongoing Outbreaks. 2017 31
host-infect-sequenc-genom Transferrin receptor 1 is a reticulocyte-specific receptor for Plasmodium vivax. 2018 20
idea-health-research-system-develop Innovation in academic chemical screening: filling the gaps in chemical biology. 2013 11
idea-health-research-system-develop OpenFDA: an innovative platform providing access to a wealth of FDA’s publicly available data. 2016 8
idea-health-research-system-develop Reimagining Human Research Protections for 21st Century Science. 2016 7
imag-clinic-data-studi A critical review of the first 10 years of candidate gene-by-environment interaction research in psychiatry. 2011 380
imag-clinic-data-studi Unraveling the miswired connectome: a developmental perspective. 2014 99
imag-clinic-data-studi Impact of the Alzheimer’s Disease Neuroimaging Initiative, 2004 to 2014. 2015 53
informat-health-system-inform-data Big Data Application in Biomedical Research and Health Care: A Literature Review. 2016 28
informat-health-system-inform-data Big Data: Are Biomedical and Health Informatics Training Programs Ready? Contribution of the IMIA Working Group for Health and Medical Informatics Education. 2014 5
informat-health-system-inform-data Does Informatics Enable or Inhibit the Delivery of Patient-centred, Coordinated, and Quality-assured Care: a Delphi Study. A Contribution of the IMIA Primary Health Care Informatics Working Group. 2015 4
injuri-data-studi-health Pre-Clinical Traumatic Brain Injury Common Data Elements: Toward a Common Language Across Laboratories. 2015 27
injuri-data-studi-health Applications for detection of acute kidney injury using electronic medical records and clinical information systems: workgroup statements from the 15(th) ADQI Consensus Conference. 2016 17
injuri-data-studi-health Utilizing electronic health records to predict acute kidney injury risk and outcomes: workgroup statements from the 15(th) ADQI Consensus Conference. 2016 15
learn-predict-model-data Big Data and machine learning in radiation oncology: State of the art and future prospects. 2016 33
learn-predict-model-data Machine Learning and Data Mining Methods in Diabetes Research. 2017 22
learn-predict-model-data Automatic prediction of rheumatoid arthritis disease activity from the electronic medical records. 2013 18
malaria-transmiss-infect-control-model Modeling infectious disease dynamics in the complex landscape of global health. 2015 89
malaria-transmiss-infect-control-model Global Epidemiology of Plasmodium vivax. 2016 71
malaria-transmiss-infect-control-model Tools and Strategies for Malaria Control and Elimination: What Do We Need to Achieve a Grand Convergence in Malaria? 2016 46
media-social-health-data Experimental evidence of massive-scale emotional contagion through social networks. 2014 139
media-social-health-data Behavioral intervention technologies: evidence review and recommendations for future research in mental health. 2013 138
media-social-health-data A cross-sectional examination of marketing of electronic cigarettes on Twitter. 2014 104
medic-care-research-health-data Routinely collected data as a strategic resource for research: priorities for methods and workforce. 2015 12
medic-care-research-health-data Setting a research agenda for interprofessional education and collaborative practice in the context of United States health system reform. 2016 7
medic-care-research-health-data Biostatistical and medical statistics graduate education. 2014 6
medic-care-research-health-data Accelerating Research Impact in a Learning Health Care System: VA’s Quality Enhancement Research Initiative in the Choice Act Era. 2017 6
method-applic-data-develop-research Building better biomarkers: brain models in translational neuroimaging. 2017 72
method-applic-data-develop-research Twenty-five years of confirmatory adaptive designs: opportunities and pitfalls. 2016 35
method-applic-data-develop-research Toward Good Read-Across Practice (GRAP) guidance. 2016 23
model-health-research-data-studi Spatial and temporal epidemiological analysis in the Big Data era. 2015 10
model-health-research-data-studi Scalable combinatorial tools for health disparities research. 2014 9
model-health-research-data-studi Using Big Data to Understand the Human Condition: The Kavli HUMAN Project. 2015 9
model-studi-result-data User Acceptance of Wrist-Worn Activity Trackers Among Community-Dwelling Older Adults: Mixed Method Study. 2017 12
model-studi-result-data Comparing baseline characteristics between groups: an introduction to the CBCgrps package. 2017 11
model-studi-result-data Expanding the boundaries of local similarity analysis. 2013 6
null-research-health-data Big data: The future of biocuration. 2008 206
null-research-health-data Data Resource Profile: The National Health Information Database of the National Health Insurance Service in South Korea. 2017 55
null-research-health-data Ethical challenges of big data in public health. 2015 49
null-scienc-health-data Radiogenomics: radiobiology enters the era of big data and team science. 2014 38
null-scienc-health-data Patient focused registries can improve health, care, and science. 2016 22
null-scienc-health-data Enabling a Learning Health System through a Unified Enterprise Data Warehouse: The Experience of the Northwestern University Clinical and Translational Sciences (NUCATS) Institute. 2015 14
null-scienc-health-data Navigating knowledge landscapes: on health, science, communication, media, and society. 2015 14
null-scienc-health-data Setting the Agenda for a New Discipline: Population Health Science. 2016 14
nutrit-approach-health-data Feeding the brain and nurturing the mind: Linking nutrition and the gut microbiota to brain development. 2015 41
nutrit-approach-health-data Popular Nutrition-Related Mobile Apps: A Feature Assessment. 2016 19
nutrit-approach-health-data Precision nutrition for prevention and management of type 2 diabetes. 2018 10
obes-adult-ag-relat-data-studi-health Variations in the Prevalence of Obesity Among European Countries, and a Consideration of Possible Causes. 2017 10
obes-adult-ag-relat-data-studi-health The effects of community environmental factors on obesity among Korean adults: a multilevel analysis. 2014 3
obes-adult-ag-relat-data-studi-health Effect of Maternal Age at Childbirth on Obesity in Postmenopausal Women: A Nationwide Population-Based Study in Korea. 2016 2
obes-adult-ag-relat-data-studi-health Investigation Of Obesity-Related Mortality Rates In Delaware. 2017 2
obes-adult-ag-relat-data-studi-health Compliance with Dietary Guidelines Varies by Weight Status: A Cross-Sectional Study of Australian Adults. 2018 2
obes-adult-ag-relat-data-studi-health Socioeconomic disparities in abdominal obesity over the life course in China. 2018 2
obes-adult-ag-relat-data-studi-health Estimating the potential impact of the UK government’s sugar reduction programme on child and adult health: modelling study. 2019 2
paper-editori-editor-research Contributions from the 2016 Literature on Clinical Decision Support. 2017 3
paper-editori-editor-research Discussion of “Representation of People’s Decisions in Health Information Systems: A Complementary Approach for Understanding Health Care Systems and Population Health”. 2017 1
paper-editori-editor-research Clinical Information Systems as the Backbone of a Complex Information Logistics Process: Findings from the Clinical Information Systems Perspective for 2016. 2017 1
paper-editori-editor-research A new community for those involved and interested in diagnosis and prognosis. 2017 1
patient-base-studi-data Time-varying effects of a text-based smoking cessation intervention for urban adolescents. 2015 9
patient-base-studi-data Influence of APOE Genotype on Hippocampal Atrophy over Time - An N=1925 Surface-Based ADNI Study. 2016 7
patient-base-studi-data Applying MetaMap to Medline for identifying novel associations in a large clinical dataset: a feasibility analysis. 2014 6
patient-health-studi-data Multimorbidity in chronic disease: impact on health care resources and costs. 2016 36
patient-health-studi-data Characterizing treatment pathways at scale using the OHDSI network. 2016 35
patient-health-studi-data Successful aging: Advancing the science of physical independence in older adults. 2015 34
pollut-air-studi-health High-Resolution Air Pollution Mapping with Google Street View Cars: Exploiting Big Data. 2017 22
pollut-air-studi-health A Survey of Wireless Sensor Network Based Air Pollution Monitoring Systems. 2015 20
pollut-air-studi-health An association between air pollution and daily outpatient visits for respiratory disease in a heavy industry area. 2013 19
precis-medicin-research-data Building the foundation for genomics in precision medicine. 2015 78
precis-medicin-research-data Metabolomics enables precision medicine: “A White Paper, Community Perspective”. 2016 69
precis-medicin-research-data From big data analysis to personalized medicine for all: challenges and opportunities. 2015 62
precis-null-medicin-health Medicine. Big data meets public health. 2014 79
precis-null-medicin-health Precision Public Health for the Era of Precision Medicine. 2016 75
precis-null-medicin-health Will Precision Medicine Improve Population Health? 2016 40
pregnanc-popul-studi-data Association Between Methylphenidate and Amphetamine Use in Pregnancy and Risk of Congenital Malformations: A Cohort Study From the International Pregnancy Safety Study Consortium. 2018 7
pregnanc-popul-studi-data Aortic dissection in pregnancy in England: an incidence study using linked national databases. 2015 5
pregnanc-popul-studi-data Outcomes Associated With Paroxysmal Supraventricular Tachycardia During Pregnancy. 2017 3
pregnanc-popul-studi-data Maternal and fetal outcomes of pregnant women with type 1 diabetes, a national population study. 2017 3
public-health-system-commun-inform-data-popul Big bad data: law, public health, and biomedical databases. 2013 14
public-health-system-commun-inform-data-popul From urban planning and emergency training to Pokémon Go: applications of virtual reality GIS (VRGIS) and augmented reality GIS (ARGIS) in personal, public and environmental health. 2017 6
public-health-system-commun-inform-data-popul The EVOTION Decision Support System: Utilizing It for Public Health Policy-Making in Hearing Loss. 2017 1
public-health-system-commun-inform-data-popul Environmental Public Health Tracking Program Advances and Successes: Highlights From the First 15 Years. 2017 1
public-health-system-commun-inform-data-popul Public Views on Using Mobile Phone Call Detail Records in Health Research: Qualitative Study. 2019 1
public-health-system-commun-inform-data-popul Adjusting the focus: A public health ethics approach to data research. 2019 1
research-data-health-identifi Crowdsourcing–harnessing the masses to advance health and medicine, a systematic review. 2014 75
research-data-health-identifi Research impact: a narrative review. 2016 21
research-data-health-identifi Research impact in the community-based health sciences: an analysis of 162 case studies from the 2014 UK Research Excellence Framework. 2015 19
research-data-inform-health Big data and biomedical informatics: a challenging opportunity. 2014 34
research-data-inform-health Genetic data and electronic health records: a discussion of ethical, logistical and technological considerations. 2014 24
research-data-inform-health Has the biobank bubble burst? Withstanding the challenges for sustainable biobanking in the digital era. 2016 19
resist-develop-studi-data CARD 2017: expansion and model-centric curation of the comprehensive antibiotic resistance database. 2017 230
resist-develop-studi-data Contemporary status of insecticide resistance in the major Aedes vectors of arboviruses infecting humans. 2017 59
resist-develop-studi-data MEGARes: an antimicrobial resistance database for high throughput sequencing. 2017 36
risk-ag-associ-cardiovascular-studi Associations of Omega-3 Fatty Acid Supplement Use With Cardiovascular Disease Risks: Meta-analysis of 10 Trials Involving 77 917 Individuals. 2018 47
risk-ag-associ-cardiovascular-studi Measuring Burden of Unhealthy Behaviours Using a Multivariable Predictive Approach: Life Expectancy Lost in Canada Attributable to Smoking, Alcohol, Physical Inactivity, and Diet. 2016 18
risk-ag-associ-cardiovascular-studi Association of Physical Activity With Risk of Major Cardiovascular Diseases in Chinese Men and Women. 2017 5
risk-diseas-studi-health Very large database of lipids: rationale and design. 2013 18
risk-diseas-studi-health Cerebral Amyloid and Hypertension are Independently Associated with White Matter Lesions in Elderly. 2015 16
risk-diseas-studi-health Education and coronary heart disease: mendelian randomisation study. 2017 16
risk-patient-cohort-studi Proton Pump Inhibitors and Risk of Incident CKD and Progression to ESRD. 2016 54
risk-patient-cohort-studi Impact of diabetes on hospital admission and length of stay among a general population aged 45 year or more: a record linkage study. 2015 13
risk-patient-cohort-studi Intradialytic hypotension, blood pressure changes and mortality risk in incident hemodialysis patients. 2018 7
scienc-research-data-health What is a representative brain? Neuroscience meets population science. 2013 35
scienc-research-data-health Nature Contact and Human Health: A Research Agenda. 2017 31
scienc-research-data-health Advancing Symptom Science Through Use of Common Data Elements. 2015 17
sensor-health-develop-data-base A framework for using GPS data in physical activity and sedentary behavior studies. 2015 44
sensor-health-develop-data-base Data mining for wearable sensors in health monitoring systems: a review of recent trends and challenges. 2013 42
sensor-health-develop-data-base Devices for Self-Monitoring Sedentary Time or Physical Activity: A Scoping Review. 2016 32
surveil-public-health-data Approaches to canine health surveillance. 2014 33
surveil-public-health-data Infectious Disease Surveillance in the Big Data Era: Towards Faster and Locally Relevant Systems. 2016 18
surveil-public-health-data Big Data for Infectious Disease Surveillance and Modeling. 2016 17
toxicologi-approach-data-develop-health FutureTox II: in vitro data and in silico models for predictive toxicology. 2015 18
toxicologi-approach-data-develop-health Neonatal abstinence syndrome: Pharmacologic strategies for the mother and infant. 2016 12
toxicologi-approach-data-develop-health Systems Toxicology: Real World Applications and Opportunities. 2017 12
transmiss-infect-model-diseas-control-data Virus genomes reveal factors that spread and sustained the Ebola epidemic. 2017 66
transmiss-infect-model-diseas-control-data Exposure Patterns Driving Ebola Transmission in West Africa: A Retrospective Observational Study. 2016 25
transmiss-infect-model-diseas-control-data Updates to the zoonotic niche map of Ebola virus disease in Africa. 2016 14
vaccin-develop-data-health Spread of yellow fever virus outbreak in Angola and the Democratic Republic of the Congo 2015-16: a modelling study. 2017 41
vaccin-develop-data-health Systems vaccinology: Enabling rational vaccine design with systems biological approaches. 2015 37
vaccin-develop-data-health The Humoral Immune Response to HCV: Understanding is Key to Vaccine Development. 2014 24

Use of keywords

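We can review the commonest MeSH headings associated with each cluster tag. Headings that occur 30 or more times within a cluster are dropped, and because top_n() keeps ties, clusters in which many headings share the same low count list more than ten.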


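labels$results %>%
  left_join(search1, by = c("pmid.value" = "pmid")) %>%
  select(clus_names, mesh) %>%
  filter(mesh != "NULL") %>%
  # one row per cluster / MeSH heading pair
  unnest(mesh) %>%
  count(clus_names, mesh, sort = TRUE) %>%
  # drop headings that occur 30 or more times in a cluster
  filter(n < 30) %>%
  ungroup() %>%
  group_by(clus_names) %>%
  # rank on the count column explicitly; ties are kept
  top_n(10, n) %>%
  # collapse the remaining headings into one string per cluster
  mutate(summary = paste(mesh, collapse = "; ")) %>%
  select(-c(mesh, n)) %>%
  distinct() %>%
  arrange(clus_names) %>%
  knitr::kable()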
clus_names summary
abstract-meet-annual-null Humans; Animals; Delivery of Health Care; United States; Accreditation; Africa; Annual Reports as Topic; Biomedical Research; Canada; Career Choice; Career Mobility; Chicago; Clinical Competence; Committee Membership; Disease Outbreaks; Education, Medical, Graduate; Education, Pharmacy; Forecasting; Global Health; Healthy People Programs; History, 20th Century; Influenza Pandemic, 1918-1919; Influenza, Human; Information Management; Information Services; Internal Medicine; Internship and Residency; Job Description; Libraries, Digital; Libraries, Medical; Library Associations; Library Services; Marijuana Smoking; Medical Marijuana; Mental Disorders; Neuropharmacology; Ohio; Organizational Objectives; Organizational Policy; Pandemics; Patient Advocacy; Physicians; Primary Health Care; Psychopharmacology; Psychotropic Drugs; Quality Indicators, Health Care; Societies; Societies, Medical; Societies, Pharmaceutical; Socioeconomic Factors; Surgeons; Women’s Health
ag-associ-risk-studi Humans; Male; Adult; Aged; Female; Middle Aged; Prospective Studies; Risk Factors; Aged, 80 and over; Smoking
ag-nation-health-studi Humans; Female; Male; Republic of Korea; Adult; Middle Aged; Aged; Databases, Factual; Health Surveys; Prevalence; Risk Factors
ag-popul-studi-result Humans; Female; Male; Adolescent; Infant, Newborn; Adult; Age Factors; Aged; Aged, 80 and over; Case-Control Studies; Child; Child, Preschool; Infant; Time Factors; Young Adult
ai-artifici-intellig-health Artificial Intelligence; Big Data; Humans; Communicable Disease Control; Communicable Diseases; Communication; Disaster Planning; Disasters; Disease Eradication; Israel; Malaria; Public Health; Risk Management
analysi-approach-data-base Humans; Genomics; Computational Biology; Algorithms; Databases, Factual; Animals; Software; Female; Data Mining; Male; Models, Statistical
burden-global-diseas-studi Female; Global Burden of Disease; Male; Adult; Middle Aged; Aged; Child; Adolescent; Child, Preschool; Infant
cancer-gene-express-studi-data Humans; Female; Databases, Genetic; Genomics; Aged; Breast Neoplasms; DNA Copy Number Variations; Gene Expression Regulation, Neoplastic; Male; Neoplasms; Polymorphism, Single Nucleotide; Prognosis
cancer-null-report-data Humans; Neoplasms; Medical Oncology; Animals; Accidental Falls; Artificial Intelligence; Automation; Biomedical Research; Breast Neoplasms; Computational Biology; Confidentiality; Data Mining; Decision Making, Computer-Assisted; Delivery of Health Care; Diabetes Mellitus, Type 1; Diagnosis, Computer-Assisted; Economics, Behavioral; Female; Government Programs; Health Behavior; Health Care Costs; Health Services Accessibility; History, 21st Century; Image Interpretation, Computer-Assisted; Insurance, Health; Interinstitutional Relations; Magnetic Resonance Imaging; Metabolomics; Models, Organizational; Molecular Diagnostic Techniques; Motivation; Organizational Objectives; Population Surveillance; Proteomics; Quality of Health Care; Registries; Research Report; Tomography, X-Ray Computed; United States
cancer-patient-studi-data Male; Adult; Aged, 80 and over; Neoplasms; Breast Neoplasms; Risk Factors; Cohort Studies; Colorectal Neoplasms; Algorithms; Databases, Factual; Kaplan-Meier Estimate; Prognosis; Retrospective Studies; United States
cancer-popul-studi-data Humans; Female; Middle Aged; Male; Adult; Aged; Neoplasms; Incidence; Cohort Studies; Risk Factors
cancer-research-patient-data Humans; Neoplasms; Treatment Outcome; United States; Databases, Factual; Delivery of Health Care; Forecasting; Information Systems; Medical Oncology; Palliative Care
cardiovascular-diseas-risk-studi Humans; Cardiovascular Diseases; Female; Male; Risk Factors; Aged; Middle Aged; Biomarkers; Cohort Studies; Risk Assessment
care-health-studi-data Humans; Female; Male; Aged; Hospitalization; Middle Aged; Age Factors; Delivery of Health Care; Global Burden of Disease; Sex Factors
care-patient-health-data Humans; Electronic Health Records; Data Mining; Datasets as Topic; Delivery of Health Care; Medical Informatics; United States; Software; Biomedical Research; Data Collection; Databases, Factual; Quality Improvement; Quality of Health Care; Registries
clinic-data-learn-studi Humans; Electronic Health Records; Databases, Factual; Randomized Controlled Trials as Topic; Algorithms; Big Data; Data Mining; Observational Studies as Topic; Accidents, Occupational; Adult; Ambulatory Care Facilities; Biomedical Research; Breast Neoplasms; Causality; China; Clinical Studies as Topic; Clinical Trials as Topic; Comparative Effectiveness Research; Computer-Assisted Instruction; Computer Simulation; Construction Industry; Critical Illness; Data Analysis; Data Interpretation, Statistical; Database Management Systems; Decision Making; Drug Therapy, Combination; Estrogen Replacement Therapy; Female; Forecasting; Informed Consent; Intelligence; Intensive Care Units; Medical Informatics Applications; Models, Theoretical; Occupational Health; Ophthalmology; Pharmaceutical Preparations; Population Health Management; Postmenopause; Progestins; Propensity Score; Rare Diseases; Research Design; Research Subjects; Respiration, Artificial; Safety Management; Software; Tidal Volume; Treatment Outcome; User-Computer Interface; Ventilators, Mechanical; Virtual Reality; Workflow
confer-intern-challeng-research-health-develop Humans; Advisory Committees; Anniversaries and Special Events; Biomedical Research; Canada; China; Congresses as Topic; Consensus; Consensus Development Conferences as Topic; Data Mining; Databases as Topic; Europe; Food; Food Quality; Harm Reduction; Health Status Disparities; History, 20th Century; History, 21st Century; Information Dissemination; Internet; Medical Informatics; Mycotoxins; Physicians; Public Health; Public Health Administration; Public Health Informatics; Public Health Systems Research; Registries; Research Personnel; Retrospective Studies; Social Justice; Social Media; Societies; Socioeconomic Factors; Technology; United States; Violence; Washington; Workforce
confound-adjust-estim-effect-model-studi-data Humans; Bias; Confounding Factors (Epidemiology); Propensity Score; Research Design; Algorithms; Computer Simulation; Female; Logistic Models; Male; Multivariate Analysis
data-decis-health-challeng-public-develop Humans; Data Collection; Data Mining; Decision Support Techniques; Algorithms; Animals; Climate Change; Databases, Factual; Decision Making; Dual Use Research; Evidence-Based Practice; Health; Health Planning; Healthy Lifestyle; Human Rights; Information Dissemination; Informed Consent; Internet; Knowledge; Mental Health; Mental Health Services; Neural Networks (Computer); Policy; Privacy; Public Health; Records; Regional Health Planning; Safety; Spain; Systems Analysis; Technology; Veterinary Medicine
digit-technologi-health-data Humans; Telemedicine; Computer Security; Internet; Delivery of Health Care; Developing Countries; Electronic Health Records; Health Behavior; Public Health; Social Media; Software
diseas-clinic-research-data Humans; Cardiovascular Diseases; Databases, Factual; Biomedical Research; United States; Animals; Data Mining; Diffusion of Innovation; Electronic Health Records; Exercise; Forecasting; Information Dissemination; National Heart, Lung, and Blood Institute (U.S.)
effect-result-studi-data Humans; Cities; Air Pollutants; China; Environmental Monitoring; Air Pollution; Environmental Exposure; Particulate Matter; Spatial Analysis; Urbanization
ehealth-care-health-inform-base-data Humans; Telemedicine; Delivery of Health Care; Internet; Randomized Controlled Trials as Topic; Research Design; Adult; Behavior Therapy; Biomedical Research; Cell Phone; Communication; Data Mining; Decision Making; Evidence-Based Medicine; Female; Health Literacy; Health Promotion; History, 20th Century; Male; Medical Informatics; Mental Health; Middle Aged; Neoplasms; Obesity; Pathology, Clinical; Patient Participation; Physician-Patient Relations; Precision Medicine; Primary Health Care; Qualitative Research; Quality Improvement; Research; Self Care; Social Support; Weight Loss
epidemiologi-health-data-research Humans; Public Health; Epidemiology; Epidemiologic Methods; Epidemiologic Studies; Research Design; Data Collection; United States; Artificial Intelligence; Biomedical Research; Forecasting; Health Behavior; History, 21st Century; Precision Medicine
epidemiologi-null-health-data Humans; Epidemiologic Methods; Big Data; Biomedical Research; Confidentiality; Data Mining; Epidemiology; Forecasting; Public Health; Research
forecast-model-time-data Humans; Influenza, Human; Forecasting; Internet; Models, Statistical; Incidence; China; Epidemics; Models, Theoretical; Public Health; Time Factors; United States
gene-associ-diseas-studi-data Humans; Genome-Wide Association Study; Phenotype; Genetic Variation; Genomics; Genotype; Male; Models, Genetic; Polymorphism, Single Nucleotide; Algorithms; Animals; Gene-Environment Interaction; Genetic Predisposition to Disease; Transcriptome
gene-associ-identifi-studi Female; Genetic Predisposition to Disease; Male; Polymorphism, Single Nucleotide; Phenotype; Adult; Case-Control Studies; Middle Aged; Quantitative Trait Loci; Aged; Body Mass Index; European Continental Ancestry Group; Genetic Loci; Genetic Variation
genom-challeng-research-data Humans; Genomics; Genome, Human; Computational Biology; High-Throughput Nucleotide Sequencing; Information Dissemination; Electronic Health Records; Genetics, Medical; Infant, Newborn; Neonatal Screening; Precision Medicine; Translational Medical Research; United States
genom-sequenc-base-data Genomics; Sequence Analysis, DNA; Genome, Bacterial; Humans; Bacteria; Computational Biology; Phylogeny; Software; Animals; Genetic Variation; High-Throughput Nucleotide Sequencing; Internet; Reproducibility of Results; Species Specificity; Transcriptome
healthcar-data-develop-health Humans; Delivery of Health Care; Data Mining; Databases, Factual; Decision Making; Adult; Aged; Confidentiality; Datasets as Topic; Female; Information Dissemination; Information Storage and Retrieval; Internet; Male; Middle Aged; Organizational Innovation; Outcome Assessment (Health Care); Precision Medicine; Reproducibility of Results; Telemedicine
hiv-studi-data-health Male; Female; Adult; Public Health; Adolescent; HIV-1; Middle Aged; Social Media; Viral Load; Young Adult
hospit-patient-ag-studi-data Humans; Female; Middle Aged; Male; Aged; Adult; Young Adult; Aged, 80 and over; Adolescent; Retrospective Studies; Taiwan
host-infect-sequenc-genom Humans; Animals; Host-Parasite Interactions; Malaria; Phylogeny; Transcriptome; Algorithms; Computational Biology; Disease Models, Animal; Disease Transmission, Infectious; Ebolavirus; Genome, Viral; Genotype; Membrane Proteins; MicroRNAs; Mutation; Mycobacterium tuberculosis; Oligonucleotides; Orthomyxoviridae; Plasmodium; Plasmodium vivax; Protozoan Proteins; RNA, Messenger; Software
idea-health-research-system-develop Humans; Animals; Data Mining; Forecasting; History, 21st Century; Internet; Research; Software; Adverse Drug Reaction Reporting Systems; Algorithms; Bayes Theorem; Biology; Biomedical Research; Chemistry; Computer Graphics; Consent Forms; Cooperative Behavior; Critical Care; Curriculum; Databases, Factual; Datasets as Topic; Decision Making; Decision Support Systems, Clinical; Delivery of Health Care; Drug Development; Drug Evaluation, Preclinical; Drug Industry; Drug Labeling; Education, Graduate; Electronic Health Records; Europe; Female; Government Regulation; Health Behavior; Health Knowledge, Attitudes, Practice; Health Promotion; High-Throughput Screening Assays; History, 16th Century; History, 17th Century; History, 18th Century; History, 19th Century; History, 20th Century; History, Ancient; Human Experimentation; Industry; Information Dissemination; Informed Consent; Inventions; Life Style; Logic; Male; Medical Informatics; Medical Informatics Applications; Mobile Applications; Models, Statistical; One Health; Ownership; Phenotype; Private Sector; Probability; Product Recalls and Withdrawals; Programming Languages; Public Health; Public Sector; Research Design; Science; Social Responsibility; Statistics as Topic; Telemedicine; United States; United States Food and Drug Administration
imag-clinic-data-studi Female; Male; Image Processing, Computer-Assisted; Algorithms; Biomarkers; Diagnostic Imaging; Adult; Aged; Alzheimer Disease; Magnetic Resonance Imaging
informat-health-system-inform-data Medical Informatics; Humans; Societies, Medical; Patient Participation; Public Health Informatics; Telemedicine; Bibliometrics; Biological Ontologies; Communication; Computer Security; Confidentiality; Consensus; Consumer Health Informatics; Consumer Health Information; Data Anonymization; Datasets as Topic; Delphi Technique; Epidemiology; Forecasting; Genomics; Global Health; Health Equity; History, 20th Century; History, 21st Century; Informatics; Information Systems; Meaningful Use; Patient-Centered Care; Periodicals as Topic; Population Health; Precision Medicine; Privacy; Quality Assurance, Health Care; Social Media; Software; United States
injuri-data-studi-health Humans; Female; Male; Child; Child, Preschool; New South Wales; Oceanic Ancestry Group; Adolescent; Infant; Hospitalization; Infant, Newborn
learn-predict-model-data Humans; Machine Learning; Algorithms; Female; Male; Databases, Factual; Electronic Health Records; Data Mining; Middle Aged; ROC Curve; Support Vector Machine
malaria-transmiss-infect-control-model Animals; Malaria; Malaria, Falciparum; Child, Preschool; Global Health; Infant; Prevalence; Communicable Disease Control; Female; Insect Vectors; Male
media-social-health-data Internet; Female; Male; Adolescent; Adult; Social Networking; Information Dissemination; Public Health; Public Opinion; Social Support; Young Adult
medic-care-research-health-data Humans; Cooperative Behavior; United States; Public Health; Interprofessional Relations; Patient-Centered Care; Quality of Health Care; Research; Biomedical Research; Child; Child Welfare; Cost-Benefit Analysis; Data Collection; Delivery of Health Care; Evidence-Based Medicine; Health Personnel; History, 20th Century; History, 21st Century; Holistic Health
method-applic-data-develop-research Humans; Databases, Factual; Information Storage and Retrieval; Software; Algorithms; Animals; Artificial Intelligence; Bibliometrics; Biomarkers; Biometry; Brain; Brain Mapping; Chemical Safety; Clinical Trials as Topic; Cloud Computing; Computer Simulation; Data Interpretation, Statistical; Data Mining; Database Management Systems; Datasets as Topic; Endpoint Determination; Hazardous Substances; History, 20th Century; History, 21st Century; Image Processing, Computer-Assisted; Medical Informatics; Meta-Analysis as Topic; Metabolomics; Models, Theoretical; Neuroimaging; Pattern Recognition, Physiological; Reproducibility of Results; Research Design; Risk Assessment; Safety Management; Sample Size; Software Design; Toxicology; Uncertainty; User-Computer Interface; Workflow
model-health-research-data-studi Humans; Algorithms; Models, Theoretical; China; Delivery of Health Care; Geographic Information Systems; Health Services Accessibility; Research; Adult; Animal Diseases; Animals; Attitude; Catchment Area (Health); Cities; Cloud Computing; Commerce; Consumer Behavior; Costs and Cost Analysis; Crowdsourcing; Data Analysis; Data Collection; Data Interpretation, Statistical; Databases as Topic; Databases, Factual; Decision Making; Entropy; Environmental Exposure; Environmental Health; Environmental Monitoring; Epidemiologic Research Design; Food Safety; Fuzzy Logic; Garbage; Gene-Environment Interaction; Gravitation; Health; Health Impact Assessment; Health Personnel; Health Plan Implementation; Health Resources; Health Status Disparities; Industry; Information Services; Internet; Longitudinal Studies; Medically Underserved Area; Motor Activity; Motor Vehicles; Physicians; Probability; Program Evaluation; Public Health; Railroads; Recycling; Refuse Disposal; Remote Sensing Technology; Reproducibility of Results; Research Design; Risk Management; Rural Population; Safety Management; Socioeconomic Factors; Spatio-Temporal Analysis; Technology; Time Factors; Travel; Uncertainty; United States
model-studi-result-data Humans; Female; Adult; Algorithms; Data Mining; Electronic Health Records; Male; Middle Aged; Models, Theoretical; Young Adult
null-research-health-data Public Health; Biomedical Research; United States; Data Mining; Data Collection; Female; Male; Electronic Health Records; Datasets as Topic; Statistics as Topic
null-scienc-health-data Humans; Science; Health Services Research; Information Dissemination; Translational Medical Research; Computational Biology; Data Interpretation, Statistical; Genomics; Models, Statistical; United States
nutrit-approach-health-data Humans; Health Promotion; Nutrition Policy; Diet; Female; United States; Child; Databases, Factual; Feeding Behavior; Gastrointestinal Microbiome; Healthy Diet; Nutritional Status; Precision Medicine
obes-adult-ag-relat-data-studi-health Humans; Body Mass Index; Female; Male; Middle Aged; Obesity; Prevalence; Adult; Aged; Diet; Ethnic Groups; Health Surveys; Pediatric Obesity; Risk Factors
paper-editori-editor-research Humans; Electronic Health Records; Information Dissemination; Medical Informatics; Clinical Decision-Making; Data Mining; Decision Support Systems, Clinical; Biomedical Research; Comparative Effectiveness Research; Confidentiality; Data Anonymization; Datasets as Topic; Decision Making; Decision Support Techniques; Drug Interactions; Health Information Systems; Information Storage and Retrieval; Information Systems; Informed Consent; Internationality; Medical Order Entry Systems; Models, Organizational; Natural Language Processing; Patient Acceptance of Health Care; Patient Care Team; Peer Review; Quality Control; Societies, Medical
patient-base-studi-data Humans; Male; Female; Aged; Middle Aged; Adult; Adolescent; Aged, 80 and over; Alzheimer Disease; Calibration; Logistic Models; Risk Factors; Time Factors; Treatment Outcome
patient-health-studi-data Female; Male; Middle Aged; Aged; Adult; Electronic Health Records; Databases, Factual; Young Adult; Aged, 80 and over; Adolescent
pollut-air-studi-health Air Pollution; Humans; Air Pollutants; Female; Particulate Matter; Male; China; Environmental Monitoring; Environmental Exposure; Aged; Middle Aged
precis-medicin-research-data Genomics; Evidence-Based Medicine; Databases, Factual; Biomedical Research; Systems Biology; Translational Medical Research; Delivery of Health Care; Electronic Health Records; Animals; Biomarkers
precis-null-medicin-health Humans; Precision Medicine; Public Health; Biomedical Research; Computational Biology; Diagnostic Imaging; Education, Medical; Genomics; History, 21st Century; Medical Informatics; United States
pregnanc-popul-studi-data Adult; Female; Humans; Pregnancy; Cohort Studies; Adolescent; Pregnancy Outcome; Cesarean Section; Databases, Factual; Infant, Newborn; Pregnancy Complications; Pregnancy Trimester, First; Premature Birth; Prevalence; Smoking; Young Adult
public-health-system-commun-inform-data-popul Humans; Public Health; Health Policy; Causality; Chicago; City Planning; Civil Defense; Data Interpretation, Statistical; Decision Making; Decision Support Techniques; Electronic Health Records; Environmental Health; Epidemiologic Factors; Epidemiologic Studies; Geographic Information Systems; Government Regulation; Hearing Loss; Information Storage and Retrieval; Mobile Applications; Nutrition Policy; Policy Making; Population Surveillance; Public Health Practice; Public Policy; Public Sector; Quality Control; United States; User-Computer Interface; Video Games
research-data-health-identifi Humans; Research Design; Public Health; Biomedical Research; United Kingdom; Adolescent; Australia; Child; Cost-Benefit Analysis; Decision Making; Dental Research; Female; Male; Regression Analysis; Translational Medical Research
research-data-inform-health Humans; Biomedical Research; Information Dissemination; Informed Consent; Confidentiality; Female; Male; Medical Informatics; Reproducibility of Results; Adult; Data Mining; Databases, Factual; Internet; Models, Theoretical; United States
research-health-data-studi Information Storage and Retrieval; Models, Statistical; Social Media; Prevalence; Research Design; Information Dissemination; Infant; Population Surveillance; Cardiovascular Diseases; Child, Preschool; Models, Theoretical
resist-develop-studi-data Humans; Animals; Anti-Bacterial Agents; Drug Resistance, Microbial; Insecticide Resistance; Arboviruses; Mosquito Control; Mosquito Vectors; Aedes; Arbovirus Infections; Insecticides; Metagenomics; Streptococcus pneumoniae
risk-ag-associ-cardiovascular-studi Humans; Female; Male; Aged; Middle Aged; Adult; Child; Cohort Studies; Adolescent; Aged, 80 and over; Cardiovascular Diseases; Hypertension; Incidence; Myocardial Infarction; Obesity; Risk Factors; Sex Factors; Smoking; Stroke; Taiwan
risk-diseas-studi-health Humans; Female; Male; Middle Aged; Adult; Risk Factors; Aged; Hypertension; Prevalence; Taiwan
risk-patient-cohort-studi Female; Humans; Male; Middle Aged; Aged; Adult; Diabetes Mellitus, Type 2; Risk Factors; Adolescent; Aged, 80 and over; China; Comorbidity; Hospitalization; Kidney Failure, Chronic; Young Adult
scienc-research-data-health Humans; Biomedical Research; United States; Translational Medical Research; Science; Animals; Computational Biology; Cooperative Behavior; National Institutes of Health (U.S.); Public Health; Research Design
sensor-health-develop-data-base Humans; Exercise; Mobile Applications; Telemedicine; Geographic Information Systems; Internet; Monitoring, Physiologic; Public Health; Time Factors; Adolescent; Adult; Algorithms; Benchmarking; Biomedical Research; Biosensing Techniques; Environmental Monitoring; Female; Male; Monitoring, Ambulatory; Sedentary Lifestyle; Surveys and Questionnaires; Wearable Electronic Devices
surveil-public-health-data Humans; Communicable Diseases; Population Surveillance; Disease Outbreaks; Epidemiological Monitoring; Internet; Public Health Surveillance; Databases, Factual; Influenza, Human; Adolescent; Child; Child, Preschool; China; Communicable Diseases, Emerging; Data Collection; Electronic Data Processing; Information Dissemination; Insurance Claim Review; Models, Statistical; Public Health; Retrospective Studies; United States
toxicologi-approach-data-develop-health Humans; Analgesics, Opioid; Toxicology; Animals; Computer Simulation; Female; Practice Patterns, Physicians’; United States; Aged; Animal Testing Alternatives; Australia; Drug Prescriptions; In Vitro Techniques; Male; Middle Aged; Opioid-Related Disorders; Toxicity Tests
transmiss-infect-model-diseas-control-data Humans; Animals; Disease Outbreaks; Hemorrhagic Fever, Ebola; Communicable Diseases; Databases, Factual; Epidemiological Monitoring; Models, Theoretical; Data Collection; Female
vaccin-develop-data-health Vaccination; Vaccines; Influenza Vaccines; Influenza, Human; United States; Female; Aged; Health Knowledge, Attitudes, Practice; Immunization Programs; Infant; Male; Social Media; Socioeconomic Factors; Systems Biology; Vaccination Coverage

Investigating individual themes to identify full-text articles

Let's explore the articles for which Public Health is a MeSH heading.


ph <- labels$results %>%
  left_join(search1, by = c("pmid.value" = "pmid")) %>%
  filter(str_detect(keywords, "Public Health"))

ph %>%
  count(clus_names, sort = TRUE)
#> # A tibble: 39 x 2
#>    clus_names                            n
#>    <chr>                             <int>
#>  1 research-health-data-studi           76
#>  2 null-research-health-data            21
#>  3 epidemiologi-health-data-research    13
#>  4 media-social-health-data             10
#>  5 hiv-studi-data-health                 8
#>  6 precis-medicin-research-data          5
#>  7 surveil-public-health-data            5
#>  8 cancer-patient-studi-data             4
#>  9 digit-technologi-health-data          4
#> 10 forecast-model-time-data              4
#> # ... with 29 more rows
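Note that `str_detect(keywords, "Public Health")` is a substring match, so it will also keep records whose headings merely contain that phrase (for example "Public Health Surveillance"). The sketch below illustrates the difference on a small made-up table. It assumes the `keywords` column holds headings separated by "; ", as in the table of cluster headings above; the `toy` data and its values are hypothetical.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(tibble)

# Toy data standing in for the joined search results (hypothetical values)
toy <- tibble(
  pmid = c("1", "2", "3"),
  keywords = c(
    "Humans; Public Health; Big Data",
    "Humans; Public Health Surveillance",
    "Animals; Metagenomics"
  )
)

# Substring match: also picks up "Public Health Surveillance"
substring_hits <- toy %>%
  filter(str_detect(keywords, "Public Health"))

# Exact match: split the headings into rows, then compare each one in full
exact_hits <- toy %>%
  separate_rows(keywords, sep = ";\\s*") %>%
  filter(keywords == "Public Health") %>%
  distinct(pmid)

nrow(substring_hits)
nrow(exact_hits)
```

Whether the broader or the exact match is preferable depends on the question; for a rapid overview the substring match may well be what is wanted.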

There is one article tagged with ai-intellig-artifici-health-data which has Public Health as a MeSH heading. We can use epmc_ftxt to extract the full-text article.

library(rvest)



get_pmcids <- ph %>%
  filter(clus_names == "data-research-health-develop") %>%
  select(id, pmcid) %>%
  filter(!is.na(pmcid))


# Retrieve the full Europe PMC record for each article id
details <- get_pmcids %>%
  mutate(details = map(id, epmc_details))

# List the free full-text links recorded for each article
full_text_links <- details %>%
  mutate(full_text = map(details, "ftx")) %>%
  select(-details) %>%
  unnest(full_text) %>%
  filter(availability == "Free") %>%
  distinct()


full_text <- europepmc::epmc_ftxt("PMC5171550")

ft <- full_text %>%
  html_text()

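# One row per sentence (split on a full stop followed by a space)
ft %>%
  str_split("\\. ") %>%
  unlist() %>%
  enframe(name = "sentence", value = "text") %>%
  formattable::formattable()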
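Splitting on full stops is quick but crude, since abbreviations and decimal numbers also break sentences. As an alternative, here is a minimal sketch that pulls out one row per paragraph using the document structure instead. It assumes the full-text XML marks paragraphs with `<p>` elements; `toy_xml` is a hypothetical stand-in for the object returned by `epmc_ftxt`.

```r
library(xml2)
library(tibble)

# Toy document standing in for the result of epmc_ftxt() (hypothetical content)
toy_xml <- read_xml(
  "<article>
     <body>
       <sec><title>Background</title><p>Big data is growing. It is used in health.</p></sec>
       <sec><title>Methods</title><p>We searched Europe PMC.</p></sec>
     </body>
   </article>"
)

# One row per <p> element, rather than splitting the whole text on full stops
paragraphs <- tibble(
  text = xml_text(xml_find_all(toy_xml, "//p"))
)

paragraphs
```

The same XPath approach could be narrowed to a single section if only, say, the methods are of interest.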

Finally, we can gather all the abstracts into a single interactive table which can be searched, filtered and shared.


labels$results %>%
  left_join(search1, by = c("pmid.value" = "pmid"))  %>%
  select(cluster, clus_names, doi, title, journalTitle, pubYear, citedByCount, absText) %>%
  # Build a clickable link through the DOI resolver
  mutate(doi = paste0("<a href='https://doi.org/", doi, "'>doi</a>")) %>%
  DT::datatable(escape = FALSE, extensions = c('Responsive','Buttons', 'FixedHeader'), 
                filter = "top", 
  options = list(
    autoWidth = TRUE,
    columnDefs = list(list(width = '450px')),
    dom = 'Bfrtip',
    buttons = c('csv', 'excel'),
    fixedHeader=TRUE) 
  )
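One way to share the table is to save the widget as an HTML file. The following is a minimal sketch using `htmlwidgets::saveWidget` on a small made-up table; `toy_table`, its contents and the output file name are hypothetical, and `selfcontained = FALSE` is chosen here only to avoid needing pandoc.

```r
library(DT)
library(htmlwidgets)

# Toy table standing in for the full abstracts table (hypothetical values)
toy_table <- datatable(
  data.frame(title = c("Paper A", "Paper B"), pubYear = c(2017, 2018)),
  filter = "top"
)

# Write the widget to an HTML file; supporting files go in a folder beside it
out_file <- file.path(tempdir(), "abstracts_table.html")
saveWidget(toy_table, out_file, selfcontained = FALSE)

file.exists(out_file)
```

With `selfcontained = FALSE` the HTML file and its accompanying library folder need to be shared together.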