Introduction

Literature and evidence review is essential in public health practice, but the volume of literature is growing exponentially. The first steps usually involve:
- Developing a search strategy
- Reviewing and filtering abstracts
- Obtaining the full text (where possible)
- Extracting data

This can be a protracted, manual and iterative process. It may involve using specialised search services, downloading, reading and filtering abstracts, and running secondary searches, often sifting through many thousands of abstracts.

Often we may just want a rapid overview of the literature to help focus further reviewing.

In this vignette we demonstrate the use of R packages for large-scale extraction of abstracts, and analytical techniques for identifying topics or themes within them.

The vignette is based on a number of R packages:

- europepmc: an R client for the Europe PMC RESTful API, which also provides access to additional fields.
- adjutant: a fully fledged package with retrieval and clustering functions.
- tidytext: a package for text mining using tidy data principles.
- Rtsne: implements the t-SNE algorithm for dimensionality reduction and cluster visualisation.
- dbscan: provides the HDBSCAN algorithm for density-based clustering.
- myScrapers: wraps functions built on these packages to automate the search, extraction and filtering process.

We have “hacked” some of the functions in these packages and written additional functions to build a workflow that runs from searching and retrieval through to analysis.

A simple example using `europepmc`

Searching Europe PubMed Central (epmc)

The europepmc package allows Europe PMC to be searched from R via its API.

It can be downloaded from CRAN.


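# Install europepmc from CRAN if it is not already available, then load it
if (!requireNamespace("europepmc", quietly = TRUE)) install.packages("europepmc")
library(europepmc)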

The main function is epmc_search(), which searches Europe PMC and returns article metadata, including citation counts.

We’ll use it with the search term blockchain AND health, which is stored in the vignette parameter params$search.


head(epmc_search(params$search, limit = 10))
#> # A tibble: 6 x 28
#>   id    source pmid  doi   title authorString journalTitle issue
#>   <chr> <chr>  <chr> <chr> <chr> <chr>        <chr>        <chr>
#> 1 3150~ MED    3150~ 10.2~ Appl~ Jin XL, Zha~ J Med Inter~ 9    
#> 2 3141~ MED    3141~ 10.2~ Priv~ Jones M, Jo~ J Med Inter~ 8    
#> 3 3147~ MED    3147~ 10.2~ A Bl~ Hylock RH, ~ J Med Inter~ 8    
#> 4 3133~ MED    3133~ 10.2~ Clou~ Zhu X, Shi ~ J Med Inter~ 7    
#> 5 3139~ MED    3139~ 10.3~ A Le~ Leeming G, ~ Front Med (~ <NA> 
#> 6 3141~ MED    3141~ 10.1~ Med-~ Zhou T, Li ~ J Med Syst   9    
#> # ... with 20 more variables: journalVolume <chr>, pubYear <chr>,
#> #   journalIssn <chr>, pageInfo <chr>, pubType <chr>, isOpenAccess <chr>,
#> #   inEPMC <chr>, inPMC <chr>, hasPDF <chr>, hasBook <chr>,
#> #   citedByCount <int>, hasReferences <chr>, hasTextMinedTerms <chr>,
#> #   hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> #   hasTMAccessionNumbers <chr>, firstIndexDate <chr>,
#> #   firstPublicationDate <chr>, pmcid <chr>, hasSuppl <chr>

By default, epmc_search() does not return the abstract text or MeSH headings (keywords). To retrieve these, we have wrapped the search function into get_full_search() in myScrapers.

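library(myScrapers)  # provides get_full_search()
library(tictoc)      # simple timing with tic() / toc()
set.seed(42)         # fix the random seed for reproducibility

tic()
search1 <- get_full_search(search = params$search, limit = params$limit)
toc()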
#> 188.88 sec elapsed

head(search1, 20)
#> # A tibble: 20 x 32
#>    id    source pmid  doi   title authorString journalTitle issue
#>    <chr> <chr>  <chr> <chr> <chr> <chr>        <chr>        <chr>
#>  1 3150~ MED    3150~ 10.2~ Appl~ Jin XL, Zha~ J Med Inter~ 9    
#>  2 3141~ MED    3141~ 10.2~ Priv~ Jones M, Jo~ J Med Inter~ 8    
#>  3 3147~ MED    3147~ 10.2~ A Bl~ Hylock RH, ~ J Med Inter~ 8    
#>  4 3133~ MED    3133~ 10.2~ Clou~ Zhu X, Shi ~ J Med Inter~ 7    
#>  5 3139~ MED    3139~ 10.3~ A Le~ Leeming G, ~ Front Med (~ <NA> 
#>  6 3141~ MED    3141~ 10.1~ Med-~ Zhou T, Li ~ J Med Syst   9    
#>  7 3143~ MED    3143~ 10.1~ Dece~ Coelho FC, ~ Mem Inst Os~ <NA> 
#>  8 3132~ MED    3132~ 10.3~ A Se~ Kim M, Park~ Sensors (Ba~ 13   
#>  9 3133~ MED    3133~ 10.3~ Bloc~ Shuaib K, S~ J Pers Med   3    
#> 10 3132~ MED    3132~ 10.3~ A Bl~ Rathee G, S~ Sensors (Ba~ 14   
#> 11 3129~ MED    3129~ 10.3~ Bloc~ Pop C, Anta~ Sensors (Ba~ 14   
#> 12 3131~ MED    3131~ 10.3~ Bloc~ Derhab A, G~ Sensors (Ba~ 14   
#> 13 3122~ MED    3122~ 10.2~ The ~ Esmaeilzade~ J Med Inter~ 6    
#> 14 3143~ MED    3143~ 10.3~ Poss~ Giordanengo~ Stud Health~ <NA> 
#> 15 3135~ MED    3135~ 10.3~ Enab~ Fernández-C~ Sensors (Ba~ 15   
#> 16 3134~ MED    3134~ 10.3~ Bloc~ Bernardi F,~ Stud Health~ <NA> 
#> 17 3134~ MED    3134~ 10.3~ Movi~ Balis C, Ta~ Stud Health~ <NA> 
#> 18 3134~ MED    3134~ 10.3~ Crow~ Mihelj J, Z~ Sensors (Ba~ 15   
#> 19 3134~ MED    3134~ 10.3~ Bloc~ Shifrin M, ~ Stud Health~ <NA> 
#> 20 3124~ MED    3124~ 10.3~ A Pe~ Cai W, Du X~ Sensors (Ba~ 12   
#> # ... with 24 more variables: journalVolume <chr>, pubYear <chr>,
#> #   journalIssn <chr>, pageInfo <chr>, pubType <chr>, isOpenAccess <chr>,
#> #   inEPMC <chr>, inPMC <chr>, hasPDF <chr>, hasBook <chr>,
#> #   citedByCount <int>, hasReferences <chr>, hasTextMinedTerms <chr>,
#> #   hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> #   hasTMAccessionNumbers <chr>, firstIndexDate <chr>,
#> #   firstPublicationDate <chr>, pmcid <chr>, hasSuppl <chr>, name <int>,
#> #   absText <list>, mesh <list>, keywords <chr>

Compared with epmc_search(), get_full_search() returns four additional columns (name, absText, mesh and keywords). Alongside metadata such as citation counts, open-access status and PDF availability, we therefore also get the MeSH headings and abstract text. By default, 1000 article records are downloaded.

We can see how many articles are available altogether by running epmc_profile().


profile <- epmc_profile(query = params$search)

The profile shows that there are 329 matching articles, of which 254 are available as full text and 232 are open access.

Analysing abstracts

Abstracts per year

We can easily plot the number of abstracts per publication year, which shows the growth in publications over the last three years.


library(dplyr)
library(ggplot2)

# Count abstracts per publication year and plot as a bar chart
search1 %>%
  count(pubYear) %>%
  ggplot(aes(pubYear, n)) +
  geom_col(fill = "blue") +
  labs(title = "Abstracts per year",
       subtitle = paste("Search: ", params$search)) +
  phecharts::theme_phe() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Journal frequency

Similarly, we can identify the most frequent journals.


# Top 20 journals by number of articles (top_n keeps ties)
journal_count <- search1 %>%
  count(journalTitle) %>%
  top_n(20, n) %>%
  arrange(desc(n))

journal_count %>%
  ggplot(aes(reorder(journalTitle, n), n)) +
  geom_col(fill = "blue") +
  coord_flip() +
  labs(title = "Journal frequency") +
  phecharts::theme_phe()

Sensors (Basel) and PLoS One are the most frequent journals publishing articles on blockchain AND health.

Topic identification

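Once we have a data frame of 329 records with abstract text, we can prepare the data for analysis. This means keeping the fields we need, dropping records with no abstract, and combining each title and abstract into a single text field. The create_corpus function in myScrapers is designed for this.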


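# Keep the fields of interest, drop records without an abstract,
# and combine title and abstract into one text field
out1 <- search1 %>%
  select(pmid, pmcid, doi, title, pubYear, citedByCount, absText, journalTitle) %>%
  filter(absText != "NULL") %>%
  mutate(text = paste(title, absText))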
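The filter above relies on absText being a list column in which missing abstracts are NULL; these become the string "NULL" when coerced to character. A more explicit base-R alternative tests each element with is.null(). The sketch below runs on a made-up three-row stand-in for search1, not on the real search results.

```r
# Toy stand-in for search1 (made-up values); absText is a list column
toy <- data.frame(
  pmid  = c("1", "2", "3"),
  title = c("Blockchain in health", "Smart contracts", "Cloud EHR"),
  stringsAsFactors = FALSE
)
toy$absText <- I(list("Abstract one.", NULL, "Abstract three."))

# Keep only rows whose abstract is not NULL
has_abstract <- !vapply(toy$absText, is.null, logical(1))
toy <- toy[has_abstract, ]

# Combine title and abstract into a single text field
toy$text <- paste(toy$title, unlist(toy$absText))
toy[, c("pmid", "text")]
```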

Text mining

We use an approach exemplified by the adjutant package. It applies unsupervised machine learning to cluster similar articles and attaches a theme to each cluster.

This approach starts with some natural language processing. We:

- Split each abstract into single words (tokens)
- Remove numbers and common (stop) words
- Stem each word, i.e. reduce it to its root form, so that “technology” and “technologies” both become “technologi”
- Calculate the tf-idf score for each word in each abstract. This gives more weight to words that are frequent in a particular abstract but rare across the collection as a whole
- Create a document feature matrix (one row per abstract, one column per word)
- Reduce the matrix to two dimensions using t-SNE
- Run HDBSCAN to identify clusters
- Name the clusters
- Quality-assure (QA) the result
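The first steps of this list can be illustrated in base R on three made-up abstracts. This sketch is not the code inside create_corpus(). It leaves out stemming, uses a tiny hand-written stop-word list, and assumes tf-idf is computed the way tidytext's bind_tf_idf() does it: term frequency multiplied by the natural log of (number of abstracts / number of abstracts containing the word).

```r
# Toy abstracts and stop words (illustrative only; stemming omitted)
abstracts <- c(
  a1 = "Blockchain can secure health data in 2018",
  a2 = "Clinical trial data can be recorded on a blockchain",
  a3 = "A secure cloud system for health records"
)
stop_words <- c("a", "an", "the", "in", "on", "for", "can", "be", "of", "and")

# Split into lower-case single words, then drop numbers and stop words
tokens <- lapply(abstracts, function(x) {
  words <- strsplit(tolower(x), "[^a-z0-9]+")[[1]]
  words <- words[words != ""]
  words[!grepl("^[0-9]+$", words) & !(words %in% stop_words)]
})

# Long table: one row per (abstract, word) with its count
long <- do.call(rbind, lapply(names(tokens), function(id) {
  counts <- table(tokens[[id]])
  data.frame(doc = id, word = names(counts), n = as.integer(counts),
             stringsAsFactors = FALSE)
}))

# tf = share of the abstract's words; idf = log(N / abstracts containing word)
long$tf <- long$n / ave(long$n, long$doc, FUN = sum)
n_docs <- length(abstracts)
docs_with_word <- table(long$word)
long$idf <- log(n_docs / as.numeric(docs_with_word[long$word]))
long$tf_idf <- long$tf * long$idf

# Document feature matrix: rows are abstracts, columns are words
dfm <- xtabs(tf_idf ~ doc + word, data = long)
print(round(dfm, 3))
```

Words that appear in every abstract would get an idf of zero and drop out of the weighting. This is why tf-idf highlights words that distinguish one abstract from the rest.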

The final output of this analysis is a visualisation of the clustered and labelled abstracts and an interactive table.


library(tidytext)

corp <- create_corpus(df = search1)

head(corp$corpus)
#> # A tibble: 6 x 6
#>   pmid     word        n      tf   idf tf_idf
#>   <chr>    <chr>   <int>   <dbl> <dbl>  <dbl>
#> 1 24505257 accumul     1 0.00943  4.36 0.0411
#> 2 24505257 agent       1 0.00943  4.14 0.0390
#> 3 24505257 amount      1 0.00943  3.04 0.0287
#> 4 24505257 analysi     1 0.00943  1.74 0.0164
#> 5 24505257 analyz      3 0.0283   2.28 0.0645
#> 6 24505257 attach      2 0.0189   4.65 0.0877
corp$corpus %>%
  count(pmid)
#> # A tibble: 314 x 2
#>    pmid         n
#>    <chr>    <int>
#>  1 24505257    79
#>  2 25874694    57
#>  3 27037387     8
#>  4 27239273    44
#>  5 27240373    60
#>  6 27565509    72
#>  7 27638214    99
#>  8 27695049    76
#>  9 27768691    70
#> 10 28029119    91
#> # ... with 304 more rows
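The first row of the corpus can be checked by hand, with N = 314 abstracts. This assumes create_corpus() weights words the way tidytext's bind_tf_idf() does:

- tf is a word's count divided by the total number of words in that abstract.
- idf is the natural log of the number of abstracts divided by the number of abstracts containing the word.

$$
\text{tf-idf}(t, d) = \frac{n_{t,d}}{\sum_{t'} n_{t',d}} \times \ln\frac{N}{N_t}
$$

$$
\text{tf-idf}(\text{accumul},\ 24505257) \approx 0.00943 \times 4.36 \approx 0.0411
$$

An idf of 4.36 is consistent with ln(314/4) ≈ 4.36, i.e. the stem “accumul” occurring in about four abstracts. A tf of 0.00943 for a single occurrence implies roughly 1/0.00943 ≈ 106 words in that abstract after cleaning.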


clust <- create_cluster(corpus = corp$corpus, minPts = params$minPts, perplexity = params$perplexity)
#> If there are small numbers of abstracts, 
#> try lowering the perpexlity value to less than 30% of the number of returns
#> 9.15 sec elapsed
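create_cluster() warns that small searches need a lower perplexity. The function below is a hypothetical helper, not part of myScrapers. It applies that 30% rule of thumb together with Rtsne's check that three times the perplexity must not exceed the number of rows minus one.

```r
# Hypothetical helper: choose a t-SNE perplexity for n_docs abstracts.
# Caps the preferred value at 30% of n_docs and at (n_docs - 1) / 3.
choose_perplexity <- function(n_docs, preferred = 30) {
  if (n_docs < 4) stop("too few abstracts for t-SNE")
  upper <- min(0.3 * n_docs, (n_docs - 1) / 3)
  max(1, min(preferred, floor(upper)))
}

choose_perplexity(314)  # returns 30: large search, preferred value kept
choose_perplexity(40)   # returns 12: small search, capped at 30% of 40
```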


clust$cluster_size
#> # A tibble: 4 x 2
#>   cluster     n
#>     <dbl> <int>
#> 1       0    47
#> 2       1    10
#> 3       2    20
#> 4       3   236
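In dbscan's hdbscan(), a cluster label of 0 denotes noise: points that are not assigned to any cluster. Assuming create_cluster() passes those labels through unchanged, cluster 0 above holds 47 unclustered abstracts rather than a topic. That is presumably why it is excluded later when listing the most cited articles. A short base-R check of the noise share, using the cluster sizes printed above:

```r
# Cluster sizes as printed by clust$cluster_size
cluster_size <- data.frame(cluster = c(0, 1, 2, 3), n = c(47L, 10L, 20L, 236L))

# Assumption: label 0 is HDBSCAN noise (unclustered abstracts)
is_noise <- cluster_size$cluster == 0
noise_share <- sum(cluster_size$n[is_noise]) / sum(cluster_size$n)

# Share of each real topic among the clustered abstracts only
topics <- cluster_size[!is_noise, ]
topics$share <- topics$n / sum(topics$n)

sprintf("%.1f%% of abstracts are unclustered", 100 * noise_share)  # 15.0%
topics
```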

Labelling clusters


labels <- label_clusters(corp$corpus, clustering = clust$clustering, top_n = 4)
#> 0.14 sec elapsed

labels$labels
#> # A tibble: 4 x 2
#> # Groups:   cluster [4]
#>   cluster clus_names                               
#>     <dbl> <chr>                                    
#> 1       0 blockchain-technologi-result-paper-system
#> 2       1 trial-clinic-blockchain-data             
#> 3       2 secur-data-blockchain-base-propos        
#> 4       3 data-technologi-system-base
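The labels above look like the top few word stems for each cluster joined with hyphens. The sketch below illustrates that idea on toy data by ranking stems on their summed tf-idf within each cluster. The function label_clusters_sketch() is hypothetical, and label_clusters() may well rank words differently.

```r
# Toy corpus in the same shape as corp$corpus (made-up values)
corpus <- data.frame(
  pmid   = c("p1", "p1", "p2", "p2", "p3", "p3"),
  word   = c("trial", "blockchain", "trial", "clinic", "secur", "data"),
  tf_idf = c(0.30, 0.05, 0.25, 0.20, 0.40, 0.10),
  stringsAsFactors = FALSE
)
clustering <- c(p1 = 1, p2 = 1, p3 = 2)  # cluster assigned to each pmid

# Hypothetical labeller: top_n stems by summed tf-idf, joined with "-"
label_clusters_sketch <- function(corpus, clustering, top_n = 2) {
  corpus$cluster <- clustering[corpus$pmid]
  scores <- aggregate(tf_idf ~ cluster + word, data = corpus, FUN = sum)
  labels <- sapply(split(scores, scores$cluster), function(s) {
    s <- s[order(-s$tf_idf), ]
    paste(head(s$word, top_n), collapse = "-")
  })
  data.frame(cluster = as.numeric(names(labels)),
             clus_names = unname(labels), stringsAsFactors = FALSE)
}

# cluster 1 -> "trial-clinic", cluster 2 -> "secur-data"
label_clusters_sketch(corpus, clustering)
```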

Visualise


p <- labels$results %>%
  left_join(search1, by = c("pmid.value" = "pmid")) %>%
  ggplot(aes(X1, X2)) +
  geom_point(aes(colour = clustered, size = citedByCount) ) +
  ggrepel::geom_text_repel(data = labels$plot, aes(medX, medY, label = clus_names), size = 3, colour = "red", alpha = 0.9)

p + scale_alpha_manual(values=c(1,0)) +
  viridis::scale_color_viridis(discrete = TRUE, option = "viridis", alpha = .5, begin = .8, end = .1, direction = -1) +
  phecharts::theme_phe() +
  theme(panel.background = element_rect(fill = "#ffffff")) +
  labs(subtitle = paste("Clustering: ", nrow(labels$plot), " topics" ), 
       title = paste("Search ", "= ", params$search ))

Understanding the labels

Most cited articles


most_cited <- labels$results %>%
  left_join(search1, by = c("pmid.value" = "pmid")) %>%
  filter(cluster !=0) %>%
  group_by(clus_names) %>%
  top_n(n = 3, citedByCount) %>%
  select(clus_names, title, pubYear, citedByCount) %>%
  ungroup() %>%
  arrange(clus_names, -citedByCount)

most_cited %>%
  formattable::formattable()

clus_names	title	pubYear	citedByCount
data-technologi-system-base	Opportunities and obstacles for deep learning in biology and medicine.	2018	41
data-technologi-system-base	Blockchain distributed ledger technologies for biomedical and health care applications.	2017	26
data-technologi-system-base	Healthcare Data Gateways: Found Healthcare Intelligence on Blockchain with Novel Privacy Risk Control.	2016	19
secur-data-blockchain-base-propos	Secure Cloud-Based EHR System Using Attribute-Based Cryptosystem and Blockchain.	2018	5
secur-data-blockchain-base-propos	Secure and Trustable Electronic Medical Records Sharing using Blockchain.	2017	3
secur-data-blockchain-base-propos	Combining Cryptography with EEG Biometrics.	2018	3
secur-data-blockchain-base-propos	Blockchain-Based Data Preservation System for Medical Data.	2018	3
trial-clinic-blockchain-data	Blockchain technology for improving clinical research quality.	2017	14
trial-clinic-blockchain-data	How blockchain-timestamped protocols could improve the trustworthiness of medical science.	2016	4
trial-clinic-blockchain-data	Improving data transparency in clinical trials using blockchain smart contracts.	2016	4

Use of keywords

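We can review the commonest MeSH headings associated with each cluster label. The code below drops headings that occur 30 or more times within a cluster, then keeps the ten most frequent of the rest (more where there are ties).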


library(tidyr)  # for unnest()

labels$results %>%
  left_join(search1, by = c("pmid.value" = "pmid")) %>%
  select(clus_names, mesh) %>%
  filter(mesh != "NULL") %>%
  unnest(mesh) %>%
  count(clus_names, mesh, sort = TRUE) %>%
  filter(n < 30) %>%                 # drop very common headings
  ungroup() %>%
  group_by(clus_names) %>%
  top_n(10, n) %>%                   # ten most frequent per cluster (ties kept)
  mutate(summary = paste(mesh, collapse = "; ")) %>%
  select(-c(mesh, n)) %>%
  distinct() %>%
  arrange(clus_names) %>%
  knitr::kable()

clus_names	summary
blockchain-technologi-result-paper-system	Humans; Genomics; Electronic Health Records; Genome, Human; Algorithms; American Medical Association; Commerce; Computer Security; Confidentiality; Cooperative Behavior; Food Safety; United States
data-technologi-system-base	Computer Security; Delivery of Health Care; Electronic Health Records; Internet; Technology; Confidentiality; Information Dissemination; Privacy; Medical Informatics; Telemedicine
secur-data-blockchain-base-propos	Computer Security; Humans; Electronic Health Records; Confidentiality; Privacy; Health Information Exchange; Information Dissemination; Algorithms; Cloud Computing; Information Storage and Retrieval; Insurance, Health; Telemedicine
trial-clinic-blockchain-data	Humans; Clinical Trials as Topic; Computer Security; Internet; Algorithms; Confidentiality; Data Collection; Delivery of Health Care; Electronic Health Records; Information Dissemination; Medical Audit; Mobile Applications; Privacy; Proof of Concept Study; Quality Control; Quality Improvement; Research Design

Public health related abstracts


ph <- labels$results %>%
  left_join(search1, by = c("pmid.value" = "pmid")) %>%
  filter(str_detect(keywords, "Public Health")|str_detect(absText, "public health|population health"))

table <- ph %>%
  select(title, journalTitle, pubYear, clus_names, keywords, absText)
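str_detect() is case-sensitive, so a sentence starting with “Public health” or “Population health” would not match the lower-case pattern above on its own. The base-R sketch below compares grepl() with and without ignore.case = TRUE on made-up sentences. In stringr, the equivalent is to wrap the pattern in regex(..., ignore_case = TRUE).

```r
# Made-up sentences to show the effect of case sensitivity
texts <- c(
  "Population health involves integration of services.",
  "Blockchain could protect public health, including patients.",
  "A decentralized storage system for glucose data."
)
pattern <- "public health|population health"

case_sensitive   <- grepl(pattern, texts)                      # misses sentence 1
case_insensitive <- grepl(pattern, texts, ignore.case = TRUE)  # matches 1 and 2

data.frame(text = substr(texts, 1, 30), case_sensitive, case_insensitive)
```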

There are 9 articles tagged with “Public Health” as a MeSH heading, or in which “public health” or “population health” is mentioned in the abstract text. These are shown in Table 1.

Table 1
title	journalTitle	pubYear	clus_names	keywords	absText
The use of technology to promote vaccination: A social ecological model based framework.	Hum Vaccin Immunother	2018	data-technologi-system-base	c(“Humans”, “Vaccination”, “Public Health”, “Models, Theoretical”, “Educational Technology”, “Text Messaging”, “Cell Phone”)	Vaccinations are an important and effective cornerstone of preventive medical care. Growing technologic capabilities and use by both patients and providers present critical opportunities to leverage these tools to improve vaccination rates and public health. We propose the Social Ecological Model as a useful theoretical framework to identify areas in which technology has been or may be leveraged to target undervaccination across the individual, interpersonal, organizational, community, and society levels and the ways in which these levels interact.
Governance on the Drug Supply Chain via Gcoin Blockchain.	Int J Environ Res Public Health	2018	blockchain-technologi-result-paper-system	c(“Humans”, “Drug Industry”, “Internet”, “Prescription Drugs”)	As a trust machine, blockchain was recently introduced to the public to provide an immutable, consensus based and transparent system in the Fintech field. However, there are ongoing efforts to apply blockchain to other fields where trust and value are essential. In this paper, we suggest Gcoin blockchain as the base of the data flow of drugs to create transparent drug transaction data. Additionally, the regulation model of the drug supply chain could be altered from the inspection and examination only model to the surveillance net model, and every unit that is involved in the drug supply chain would be able to participate simultaneously to prevent counterfeit drugs and to protect public health, including patients.
Geospatial blockchain: promises, challenges, and scenarios in health and healthcare.	Int J Health Geogr	2018	data-technologi-system-base	c(“Humans”, “Confidentiality”, “Computer Security”, “Patient Participation”, “Delivery of Health Care”, “Spatial Analysis”)	A PubMed query run in June 2018 using the keyword ‘blockchain’ retrieved 40 indexed papers, a reflection of the growing interest in blockchain among the medical and healthcare research and practice communities. Blockchain’s foundations of decentralisation, cryptographic security and immutability make it a strong contender in reshaping the healthcare landscape worldwide. Blockchain solutions are currently being explored for: (1) securing patient and provider identities; (2) managing pharmaceutical and medical device supply chains; (3) clinical research and data monetisation; (4) medical fraud detection; (5) public health surveillance; (6) enabling truly public and open geo-tagged data; (7) powering many Internet of Things-connected autonomous devices, wearables, drones and vehicles, via the distributed peer-to-peer apps they run, to deliver the full vision of smart healthy cities and regions; and (8) blockchain-enabled augmented reality in crisis mapping and recovery scenarios, including mechanisms for validating, crediting and rewarding crowdsourced geo-tagged data, among other emerging use cases. Geospatially-enabled blockchain solutions exist today that use a crypto-spatial coordinate system to add an immutable spatial context that regular blockchains lack. These geospatial blockchains do not just record an entry’s specific time, but also require and validate its associated proof of location, allowing accurate spatiotemporal mapping of physical world events. Blockchain and distributed ledger technology face similar challenges as any other technology threatening to disintermediate legacy processes and commercial interests, namely the challenges of blockchain interoperability, security and privacy, as well as the need to find suitable and sustainable business models of implementation. Nevertheless, we expect blockchain technologies to get increasingly powerful and robust, as they become coupled with artificial intelligence (AI) in various real-word healthcare solutions involving AI-mediated data exchange on blockchains.
Reimagining Health Data Exchange: An Application Programming Interface-Enabled Roadmap for India.	J Med Internet Res	2018	data-technologi-system-base	c(“Humans”, “Public Health”, “Computer Security”, “India”, “Electronic Health Records”, “Universal Health Insurance”)	In February 2018, the Government of India announced a massive public health insurance scheme extending coverage to 500 million citizens, in effect making it the world’s largest insurance program. To meet this target, the government will rely on technology to effectively scale services, monitor quality, and ensure accountability. While India has seen great strides in informational technology development and outsourcing, cellular phone penetration, cloud computing, and financial technology, the digital health ecosystem is in its nascent stages and has been waiting for a catalyst to seed the system. This National Health Protection Scheme is expected to provide just this impetus for widespread adoption. However, health data in India are mostly not digitized. In the few instances that they are, the data are not standardized, not interoperable, and not readily accessible to clinicians, researchers, or policymakers. While such barriers to easy health information exchange are hardly unique to India, the greenfield nature of India’s digital health infrastructure presents an excellent opportunity to avoid the pitfalls of complex, restrictive, digital health systems that have evolved elsewhere. We propose here a federated, patient-centric, application programming interface (API)-enabled health information ecosystem that leverages India’s near-universal mobile phone penetration, universal availability of unique ID systems, and evolving privacy and data protection laws. It builds on global best practices and promotes the adoption of human-centered design principles, data minimization, and open standard APIs. The recommendations are the result of 18 months of deliberations with multiple stakeholders in India and the United States, including from academia, industry, and government.
A study on Chinese consumer preferences for food traceability information using best-worst scaling.	PLoS One	2018	blockchain-technologi-result-paper-system	c(“Humans”, “Vegetables”, “Socioeconomic Factors”, “Dairy Products”, “Adolescent”, “Adult”, “Middle Aged”, “China”, “Female”, “Male”, “Young Adult”, “Food Safety”, “Consumer Behavior”, “Red Meat”)	Food safety is a global public health issue, which often arises from asymmetric information between consumers and suppliers. With the development of information technology in human life, building a food traceability information sharing platform is viewed as one of the best ways to overcome the trust crisis and resolve the problem of information asymmetry in China. However, among the myriad information available from the food supply chain, there is a lack of knowledge on consumer preference. Based on the best-worst scaling approach, this paper investigated consumer preferences for vegetable, pork, and dairy product traceability information. Specifically, this paper measured the relative importance that consumers place on the traceable information. The results indicate that consumers have varying priorities for information in different cases. “Pesticide/veterinary use,” “picking/slaughtering date,” and “fertilizer/feed use” are the most preferred traceable information for Chinese consumers in the case of vegetables, while “picking/slaughtering date” and “history of illness and taking protective measures” are the most preferred information in the case of pork. In the case of dairy products, consumers prefer “processing information,” “environmental information of the origin,” and “traceable tag certification information” most. The results of this study call for the direct involvement of the Chinese government in the food safety information sharing system as following. First, given consumers’ diverse preferences, different types of traceable information should be recorded into the information sharing platform depending on food types. Second, the government could promote the step-by-step construction of such a platform based on the priority of consumers’ preferences. Third, new technology should be applied to guarantee the reliability of traceable information. Finally, local preferences in terms of the way consumers receive and understand information should be taken into consideration.
Precision Medicine: Changing the way we think about healthcare.	Clinics (Sao Paulo)	2018	data-technologi-system-base	c(“Humans”, “Neoplasms”, “Mental Disorders”, “Genomics”, “Education, Medical”, “Precision Medicine”)	Health care has changed since the decline in mortality caused by infectious diseases as well as chronic and non-contagious diseases, with a direct impact on the cost of public health and individual health care. We must now transition from traditional reactive medicine based on symptoms, diagnosis and treatment to a system that targets the disease before it occurs and, if it cannot be avoided, treats the disease in a personalized manner. Precision Medicine is that new way of thinking about medicine. In this paper, we performed a thorough review of the literature to present an updated review on the subject, discussing the impact of the use of genetics and genomics in the care process as well as medical education, clinical research and ethical issues. The Precision Medicine model is expanded upon in this article to include its principles of prediction, prevention, personalization and participation. Finally, we discuss Precision Medicine in various specialty fields and how it has been implemented in developing countries and its effects on public health and medical education.
Service supply chains for population health: Overcoming fragmentation of service delivery ecosystems.	Learn Health Syst	2019	data-technologi-system-base	NULL	Introduction:Population health involves integration of health, education, and social services to keep a defined population healthy, to address health challenges holistically, and to assist with the realities of being mortal. The fragmentation of the US population health delivery system is addressed. The impacts of this fragmentation on the treatment of substance abuse in the United States are considered. Innovations needed to overcome this fragmentation are proposed. Approach:Treatment capacity issues, including scheduling practices, are discussed. Costs of treatment and lack of treatment are considered. Models of integrated care delivery are reviewed. Potential innovations from systems science, behavioral economics, and social networks are considered. The implications of these innovations are discussed in terms of information technology (IT) systems and governance. Conclusions:Enormous savings are possible with more integrated treatment. Based on a range of empirical findings, it is argued that investments of these resources in integrated delivery of care have the potential to dramatically improve health outcomes, thereby significantly reducing the costs of population health.
Beyond the hype of big data and artificial intelligence: building foundations for knowledge and wisdom.	BMC Med	2019	data-technologi-system-base	NULL	Big data, coupled with the use of advanced analytical approaches, such as artificial intelligence (AI), have the potential to improve medical outcomes and population health. Data that are routinely generated from, for example, electronic medical records and smart devices have become progressively easier and cheaper to collect, process, and analyze. In recent decades, this has prompted a substantial increase in biomedical research efforts outside traditional clinical trial settings. Despite the apparent enthusiasm of researchers, funders, and the media, evidence is scarce for successful implementation of products, algorithms, and services arising that make a real difference to clinical care. This article collection provides concrete examples of how “big data” can be used to advance healthcare and discusses some of the limitations and challenges encountered with this type of research. It primarily focuses on real-world data, such as electronic medical records and genomic medicine, considers new developments in AI and digital health, and discusses ethical considerations and issues related to data sharing. Overall, we remain positive that big data studies and associated new technologies will continue to guide novel, exciting research that will ultimately improve healthcare and medicine-but we are also realistic that concerns remain about privacy, equity, security, and benefit to all.
Enabling the Internet of Mobile Crowdsourcing Health Things: A Mobile Fog Computing, Blockchain and IoT Based Continuous Glucose Monitoring System for Diabetes Mellitus Research and Care.	Sensors (Basel)	2019	data-technologi-system-base	NULL	Diabetes patients suffer from abnormal blood glucose levels, which can cause diverse health disorders that affect their kidneys, heart and vision. Due to these conditions, diabetes patients have traditionally checked blood glucose levels through Self-Monitoring of Blood Glucose (SMBG) techniques, like pricking their fingers multiple times per day. Such techniques involve a number of drawbacks that can be solved by using a device called Continuous Glucose Monitor (CGM), which can measure blood glucose levels continuously throughout the day without having to prick the patient when carrying out every measurement. This article details the design and implementation of a system that enhances commercial CGMs by adding Internet of Things (IoT) capabilities to them that allow for monitoring patients remotely and, thus, warning them about potentially dangerous situations. The proposed system makes use of smartphones to collect blood glucose values from CGMs and then sends them either to a remote cloud or to distributed fog computing nodes. Moreover, in order to exchange reliable, trustworthy and cybersecure data with medical scientists, doctors and caretakers, the system includes the deployment of a decentralized storage system that receives, processes and stores the collected data. Furthermore, in order to motivate users to add new data to the system, an incentive system based on a digital cryptocurrency named GlucoCoin was devised. Such a system makes use of a blockchain that is able to execute smart contracts in order to automate CGM sensor purchases or to reward the users that contribute to the system by providing their own data. Thanks to all the previously mentioned technologies, the proposed system enables patient data crowdsourcing and the development of novel mobile health (mHealth) applications for diagnosing, monitoring, studying and taking public health actions that can help to advance in the control of the disease and raise global awareness on the increasing prevalence of diabetes.

Systematic reviews

We can extract systematic reviews in a similar way.


sr <- labels$results %>%
  left_join(search1, by = c("pmid.value" = "pmid")) %>%
  filter(str_detect(keywords, "Review")|str_detect(absText, "systematic review"))

table_sr <- sr %>%
  select(title, journalTitle, pubYear, clus_names, keywords, absText)
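Matching whole MeSH terms rather than substrings gives a tighter filter than str_detect(keywords, "Review"). The sketch uses the headings of the IEEE Pulse article in the results table. The list of review-related headings is an illustrative assumption and should be checked against the MeSH vocabulary before use.

```r
# MeSH headings of one retrieved article (copied from the results table)
mesh <- c("Humans", "Databases, Factual", "Insurance Claim Review",
          "Electronic Health Records")

# Substring match: "Review" is found inside "Insurance Claim Review"
any(grepl("Review", mesh))      # TRUE

# Whole-term match against an illustrative list of review-related headings
review_terms <- c("Systematic Reviews as Topic", "Review Literature as Topic")
any(mesh %in% review_terms)     # FALSE
```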

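There are 7 articles with “Review” in their keywords or “systematic review” in the abstract text. The keyword pattern also matches headings such as “Insurance Claim Review”, so not all of these are systematic reviews. These are shown in Table 2.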

title	journalTitle	pubYear	clus_names	keywords	absText
Blockchain Technology: Applications in Health Care.	Circ Cardiovasc Qual Outcomes	2017	blockchain-technologi-result-paper-system	c(“Humans”, “Confidentiality”, “Biomedical Technology”, “Diffusion of Innovation”, “Computer Security”, “Database Management Systems”, “Insurance Claim Review”, “Electronic Health Records”, “Administrative Claims, Healthcare”)	NULL
(Block) Chain Reaction: A Blockchain Revolution Sweeps into Health Care, Offering the Possibility for a Much-Needed Data Solution.	IEEE Pulse	2018	data-technologi-system-base	c(“Humans”, “Databases, Factual”, “Insurance Claim Review”, “Electronic Health Records”)	Electronic health records may have digitized patient data, but getting that data from one clinician to another remains a huge challenge, especially since patients often have multiple doctors ordering tests, prescribing drugs, and providing treatment. Many experts now believe that blockchain technology might be just the thing to get a patient’s pertinent medical information from where it is stored to where it is needed, as well as to allow patients to easily view their own medical histories. In addition, blockchain technology might also be able to help with other aspects of health care, such as improving the insurance claim or other administrative processes within healthcare networks and making health-related population data available to biomedical researchers.
Findings from 2017 on Health Information Management	Yearb Med Inform	2018	data-technologi-system-base	c(“Humans”, “Confidentiality”, “Health Policy”, “Health Records, Personal”, “Health Information Management”, “Health Information Exchange”, “Data Anonymization”)	OBJECTIVE:To summarize the recent literature and research and present a selection of the best papers published in 2017 in the field of Health Information Management (HIM) and Health Informatics. METHODS:A systematic review of the literature was performed by the two HIM section editors of the International Medical Informatics Association (IMIA) Yearbook with the help of a medical librarian. We searched bibliographic databases for HIM-related papers using both MeSH descriptors and keywords in titles and abstracts. A shortlist of 15 candidate best papers was first selected by section editors before being peer-reviewed by independent external reviewers. RESULTS:Health Information Exchange was a major theme within candidate best papers. The four papers ultimately selected as ‘Best Papers’ represent themes that include health information exchange, governance and policy issues, results of health information exchange, and methods of integrating information from multiple sources. Other articles within the candidate best papers include these themes as well as those focusing on authentication and de-identification and usability of information systems. CONCLUSIONS:The papers discussed in the HIM section of IMIA Yearbook reflect the overall theme of the 2018 edition of the Yearbook, i.e., the tension between privacy and access to information. While most of the papers focused on health information exchange, which reflects the “access” side of the equation, most of the others addressed privacy issues. This synopsis discusses these key issues at the intersection of HIM and informatics.
Implementing Blockchains for Efficient Health Care: Systematic Review.	J Med Internet Res	2019	data-technologi-system-base	NULL	BACKGROUND:The decentralized nature of sensitive health information can bring about situations where timely information is unavailable, worsening health outcomes. Furthermore, as patient involvement in health care increases, there is a growing need for patients to access and control their data. Blockchain is a secure, decentralized online ledger that could be used to manage electronic health records (EHRs) efficiently, therefore with the potential to improve health outcomes by creating a conduit for interoperability. OBJECTIVE:This study aimed to perform a systematic review to assess the feasibility of blockchain as a method of managing health care records efficiently. METHODS:Reviewers identified studies via systematic searches of databases including PubMed, MEDLINE, Scopus, EMBASE, ProQuest, and Cochrane Library. Suitability for inclusion of each was assessed independently. RESULTS:Of the 71 included studies, the majority discuss potential benefits and limitations without evaluation of their effectiveness, although some systems were tested on live data. CONCLUSIONS:Blockchain could create a mechanism to manage access to EHRs stored on the cloud. Using a blockchain can increase interoperability while maintaining privacy and security of data. It contains inherent integrity and conforms to strict legal regulations. Increased interoperability would be beneficial for health outcomes. Although this technology is currently unfamiliar to most, investments into creating a sufficiently user-friendly interface and educating users on how best to take advantage of it would lead to improved health outcomes. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID):RR2-10.2196/10994.
Comparison of blockchain platforms: a systematic review and healthcare examples.	J Am Med Inform Assoc	2019	data-technologi-system-base	NULL	OBJECTIVES:To introduce healthcare or biomedical blockchain applications and their underlying blockchain platforms, compare popular blockchain platforms using a systematic review method, and provide a reference for selection of a suitable blockchain platform given requirements and technical features that are common in healthcare and biomedical research applications. TARGET AUDIENCE:Healthcare or clinical informatics researchers and software engineers who would like to learn about the important technical features of different blockchain platforms to design and implement blockchain-based health informatics applications. SCOPE:Covered topics include (1) a brief introduction to healthcare or biomedical blockchain applications and the benefits to adopt blockchain; (2) a description of key features of underlying blockchain platforms in healthcare applications; (3) development of a method for systematic review of technology, based on the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) statement, to investigate blockchain platforms for healthcare and medicine applications; (4) a review of 21 healthcare-related technical features of 10 popular blockchain platforms; and (5) a discussion of findings and limitations of the review.
Blockchain Technology in Healthcare: A Systematic Review.	Healthcare (Basel)	2019	data-technologi-system-base	NULL	Since blockchain was introduced through Bitcoin, research has been ongoing to extend its applications to non-financial use cases. Healthcare is one industry in which blockchain is expected to have significant impacts. Research in this area is relatively new but growing rapidly; so, health informatics researchers and practitioners are always struggling to keep pace with research progress in this area. This paper reports on a systematic review of the ongoing research in the application of blockchain technology in healthcare. The research methodology is based on the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) guidelines and a systematic mapping study process, in which a well-designed search protocol is used to search four scientific databases, to identify, extract and analyze all relevant publications. The review shows that a number of studies have proposed different use cases for the application of blockchain in healthcare; however, there is a lack of adequate prototype implementations and studies to characterize the effectiveness of these proposed use cases. The review further highlights the state-of-the-art in the development of blockchain applications for healthcare, their limitations and the areas for future research. To this end, therefore, there is still the need for more research to better understand, characterize and evaluate the utility of blockchain in healthcare.
Design Choices and Trade-Offs in Health Care Blockchain Implementations: Systematic Review.	J Med Internet Res	2019	data-technologi-system-base	NULL	BACKGROUND:A blockchain is a list of records that uses cryptography to make stored data immutable; their use has recently been proposed for electronic medical record (EMR) systems. This paper details a systematic review of trade-offs in blockchain technologies that are relevant to EMRs. Trade-offs are defined as “a compromise between two desirable but incompatible features.” OBJECTIVE:This review’s primary research question was: “What are the trade-offs involved in different blockchain designs that are relevant to the creation of blockchain-based electronic medical records systems?” METHODS:Seven databases were systematically searched for relevant articles using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). Papers published from January 1, 2017 to June 15, 2018 were selected. Quality assessments of papers were performed using the Risk Of Bias In Non-randomized Studies-of Interventions (ROBINS-I) tool and the Critical Assessment Skills Programme (CASP) tool. Database searches identified 2885 articles, of which 15 were ultimately included for analysis. RESULTS:A total of 17 trade-offs were identified impacting the design, development, and implementation of blockchain systems; these trade-offs are organized into themes, including business, application, data, and technology architecture. CONCLUSIONS:The key findings concluded the following: (1) multiple trade-offs can be managed adaptively to improve EMR utility; (2) multiple trade-offs involve improving the security of blockchain systems at the cost of other features, meaning EMR efficacy highly depends on data protection standards; and (3) multiple trade-offs result in improved blockchain scalability. Consideration of these trade-offs will be important to the specific environment in which electronic medical records are being developed. This review also uses its findings to suggest useful design choices for a hypothetical National Health Service blockchain. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID):RR2-10.2196/10994.

Full texts

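library(rvest)
library(dplyr)
library(purrr)
library(tidyr)

# `sr` is assumed to hold the epmc_search() results from earlier;
# keep only articles that have a PubMed Central ID
get_pmcids <- sr %>%
  select(id, pmcid) %>%
  filter(!is.na(pmcid))

# Retrieve the full record for each article; its `ftx` element lists full-text links
details <- mutate(get_pmcids, details = map(id, epmc_details))

# Keep open-access links that are not PDFs: the link format is held in
# `documentStyle`, while `url` holds the full address
full_text <- details %>%
  mutate(full_text = map(details, "ftx")) %>%
  unnest(full_text) %>%
  filter(availability == "Open access", documentStyle != "pdf") %>%
  select(id, url)

# Scrape each page; get_page_text() is assumed to come from myScrapers
# and to return a character vector of page text
ftxt <- mutate(full_text, ftext = map(url, get_page_text)) %>%
  unnest(ftext) %>%
  distinct()

# Optional: collapse each article's text into one string and summarise it
# summary_ftext <- ftxt %>%
#   group_by(id) %>%
#   mutate(col = paste(ftext, collapse = " ")) %>%
#   select(-ftext) %>%
#   distinct() %>%
#   mutate(summary = map(col, text_summariser, 6))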
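The two wrangling steps in the chunk above (filtering the link table, then joining scraped paragraphs per article) can be tried offline. The sketch below is a minimal base-R illustration, assuming the `ftx` table from `epmc_details()` carries `availability`, `documentStyle` and `url` columns; the rows, ids and example.org URLs are invented toy data, not real search results.

```r
# Toy stand-in for the `ftx` table returned by europepmc::epmc_details().
# Column names are assumed to match that table; the rows are invented.
ftx <- data.frame(
  id            = c("1", "1", "2"),
  availability  = c("Open access", "Open access", "Subscription required"),
  documentStyle = c("html", "pdf", "doi"),
  url           = c("https://example.org/article-1",
                    "https://example.org/article-1.pdf",
                    "https://example.org/article-2"),
  stringsAsFactors = FALSE
)

# Keep open-access links that are not PDFs. The format lives in
# `documentStyle`; `url` holds a full address, so `url != "pdf"` keeps everything.
html_links <- subset(ftx, availability == "Open access" & documentStyle != "pdf")

# Toy scraped text: one row per paragraph, keyed by article id.
paras <- data.frame(
  id    = c("1", "1", "3"),
  ftext = c("First paragraph.", "Second paragraph.", "Only paragraph."),
  stringsAsFactors = FALSE
)

# Collapse paragraphs into one document per article (the same idea as the
# commented group_by()/paste() step), ready for summarising or text mining.
docs <- aggregate(ftext ~ id, data = paras, FUN = paste, collapse = " ")
print(docs)
```

`aggregate()` with `paste` is used here only to keep the sketch dependency-free; in the tidyverse pipeline the equivalent is `group_by(id) %>% summarise(col = paste(ftext, collapse = " "))`.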

Full table of abstracts

Finally, we can gather all the abstracts into a single interactive table that can be searched, filtered and downloaded as CSV or Excel for sharing.


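# `labels$results` (cluster assignments) and `search1` (search results) come from earlier steps
labels$results %>%
  left_join(search1, by = c("pmid.value" = "pmid")) %>%
  select(cluster, clus_names, doi, title, journalTitle, pubYear, citedByCount, absText) %>%
  # Turn each DOI into a link that searches for it
  mutate(doi = paste0('<a href="https://www.google.com/search?q=', doi, '">doi</a>')) %>%
  DT::datatable(
    escape = FALSE,  # render the HTML links instead of showing raw tags
    extensions = c("Responsive", "Buttons", "FixedHeader"),
    filter = "top",
    options = list(
      autoWidth = TRUE,
      # targets are 0-based and count the row-names column as 0, so 8 = absText
      columnDefs = list(list(width = "450px", targets = 8)),
      dom = "Bfrtip",
      buttons = c("csv", "excel"),
      fixedHeader = TRUE
    )
  )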
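Missing DOIs are worth handling before the table is built, because pasting `NA` into the link produces a search for the literal text "NA". One option is a small helper that resolves DOIs through https://doi.org/ and leaves the cell empty when no DOI is available. `doi_link()` is a hypothetical name, not part of any package used here, and the sketch assumes DOIs arrive as a character vector.

```r
# Hypothetical helper (not from any package used in this vignette):
# builds an HTML link per DOI that resolves via https://doi.org/.
# Missing or empty DOIs give an empty string, so DT shows a blank cell.
doi_link <- function(doi) {
  missing_doi <- is.na(doi) | doi == ""
  links <- sprintf('<a href="https://doi.org/%s" target="_blank">doi</a>', doi)
  links[missing_doi] <- ""
  links
}

doi_link(c("10.1000/182", NA, ""))
```

In the pipeline it would replace the `paste0()` call, e.g. `mutate(doi = doi_link(doi))`; `escape = FALSE` is still needed so that DT renders the anchors rather than printing them.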

Rapid literature reviewing: automated methods in R

blockchain AND health