Big data, AI, data science and public health

PHE - Health Improvement: Julian Flowers

2018-11-28

Outline

How digital, big data, data science and machine learning/AI might impact on public health

Big data

Formal definition

95% unstructured

Bigness

Types of big data for public health

Source: Mooney SJ, Pejaver V. Big Data in Public Health: Terminology, Machine Learning, and Privacy. Annual Review of Public Health 2018;1–18. PMID: 29261408

| Source | Examples | ‘Bigness’ | Technical issues | Typical uses |
|---|---|---|---|---|
| Omics/biological | Whole exome profiling, metabolomics | Wide | Lab effects, informatics pipeline | Etiologic research, screening |
| Geospatial | Neighborhood characteristics | Wide | Spatial autocorrelation | Etiologic research, surveillance |
| Electronic health records | Records of all patients with hypertension | Tall, also wide | Data cleaning, natural language | Clinical research, surveillance |
| Personal monitoring | Daily GPS records, Fitbit readings | Tall | Redundancy, inference | Aetiological research, potentially clinical decision making |
| Effluent data | Google search results, Reddit | Tall | Selection biases, natural language | Surveillance, screening, identification of hidden social networks |
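
The ‘Bigness’ column can be read as a statement about a dataset's shape: ‘tall’ data have many observations, while ‘wide’ data have many variables per observation. A minimal sketch of that idea follows. The function name `describe_bigness`, the thresholds and the example shapes are all hypothetical illustrations, not values taken from Mooney and Pejaver.

```python
def describe_bigness(n_rows, n_cols, tall_threshold=1_000_000, wide_threshold=1_000):
    """Label a dataset as tall, wide, both, or neither from its shape.

    The thresholds are arbitrary illustrations, not values from the source.
    """
    is_tall = n_rows >= tall_threshold   # many observations
    is_wide = n_cols >= wide_threshold   # many variables per observation
    if is_tall and is_wide:
        return "tall and wide"
    if is_tall:
        return "tall"
    if is_wide:
        return "wide"
    return "neither"


# Hypothetical (rows, columns) shapes, chosen only to mirror rows of the table.
examples = {
    "exome profiles": (500, 20_000),
    "GPS pings": (50_000_000, 5),
    "health records": (2_000_000, 3_000),
}
labels = {name: describe_bigness(rows, cols) for name, (rows, cols) in examples.items()}
print(labels)
```

In practice the distinction matters because tall data mainly strain storage and compute, whereas wide data raise the risk of spurious associations when variables outnumber observations.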

Drivers

Changing face of PH analysis

PubMed search: 16,000 most recent abstracts for “machine learning/ ai public health”

Data science processes

Exposomes and phenomes

Precision public health

https://rpubs.com/jflowers/428618

Various definitions, but all encompass a set of consistent ideas:

* Use of big data and new technology to improve health
* Data used to give more precise descriptions of populations and individuals
* Application of new techniques and methods
* Speed, accuracy and scale
* “the application and combination of new and existing technologies, which more precisely describe and analyse individuals and their environment over the life course, to tailor preventive interventions for at-risk groups and improve the overall health of the population.” (Weeramanthri et al. 2018)
* “improving the ability to prevent disease, promote health, and reduce health disparities in populations by applying emerging methods and technologies for measuring disease, pathogens, exposures, behaviors, and susceptibility in populations; and developing policies and targeted implementation programs to improve health” (Khoury and Galea 2016)
* “requires robust primary surveillance data, rapid application of sophisticated analytics to track the geographical distribution of disease, and the capacity to act on such information” (Dowell, Blazes, and Desmond-Hellmann 2016)
* “Precision public health is characterized by discovering, validating, and optimizing care strategies for well-characterized population strata” (Arnett and Claas 2016)

Precision

Modernising PHI

| PHI 1.0 | PHI 2.0 |
|---|---|
| Health profiling | Analysis and insight |
| Statistical analysis | Natural language processing, data wrangling, tidy data, web-scraping |
| Collation and description | Prediction and prescription |
| Excel and stats packages | R/Python/PowerBI/Tableau/Cloud |
| Static reports | Interactive reports |
| Manual processing | Automated processing |
| Waterfall | Agile |
| User feedback | User need |
| Epidemiology & stats | Epidemiology + models + machine learning |
| Structured/small data | Structured and unstructured/big data |
| Bias/confounding | Bias/confounding |

Modernising PHI

When? Software

Modernising PHI

Current focus in PHE

Machine learning examples: local health cluster analysis

Scaled up

Machine learning examples: area obesity rates

AI in literature searches and reviews

http://rpubs.com/jflowers/441656

Novel data uses: Google street maps

Use of search engine’s imagery forms part of the era of ‘data abundance’, according to the UK’s national statistician

App data

Active 10