Big data, AI, data science and public health

PHE - Health Improvement: Julian Flowers

2018-11-28

Outline

How digital, big data, data science and machine learning/AI might impact public health

Data and analysis
Evidence synthesis
Decision making tools
Digital interventions
Policy and priority
Forecasting
Evaluation and impact assessment
“Precision public health”

Big data

Formal definition

95% unstructured

Bigness

GBD: 38 billion estimates
Fingertips: 20 million rows of data
PHE data lake: 10 billion rows of data
Active 10: 5 GB of data per day
1- and 2-gram document-term matrix of a corpus of DPH reports: ~1 billion columns
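To illustrate why an n-gram document-term matrix gets so wide, here is a minimal sketch in plain Python. The two-sentence corpus is invented for illustration (it is not taken from any DPH report), and the helper names `ngrams` and `document_term_matrix` are hypothetical. It counts unigrams and bigrams per document and shows how adding bigrams increases the number of columns.

```python
from collections import Counter


def ngrams(tokens, n_min=1, n_max=2):
    """Return all n-grams (space-joined strings) for n in [n_min, n_max]."""
    out = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            out.append(" ".join(tokens[i:i + n]))
    return out


def document_term_matrix(docs, n_min=1, n_max=2):
    """Build a dense matrix: one row per document, one column per n-gram."""
    counts = [Counter(ngrams(doc.lower().split(), n_min, n_max)) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


# Toy corpus: hypothetical sentences, used only to show the mechanics.
docs = [
    "obesity rates are rising",
    "smoking rates are falling",
]

vocab_1, _ = document_term_matrix(docs, 1, 1)    # unigrams only
vocab_12, dtm = document_term_matrix(docs, 1, 2)  # unigrams and bigrams
print(len(vocab_1), "unigram columns;", len(vocab_12), "columns with bigrams")
```

Even on two short sentences the column count nearly doubles once bigrams are included. On a real corpus most of those columns are zero for any given document, so a sparse representation would normally be used instead of the dense lists shown here.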

Types of big data for public health

Source: Mooney SJ, Pejaver V. Big Data in Public Health: Terminology, Machine Learning, and Privacy. Annual Review of Public Health 2018;1–18. PMID: 29261408

Source	Examples	‘Bigness’	Technical Issues	Typical uses
omics/biological	Whole exome profiling, metabolomics	Wide	Lab effects, informatics pipeline	Etiologic research, screening
Geospatial	Neighborhood characteristics	Wide	Spatial autocorrelation	Etiologic research, surveillance
Electronic health records	Records of all patients with hypertension	Tall, also wide	Data cleaning, natural language	Clinical research, surveillance
Personal	Daily GPS records monitoring, Fitbit readings	Tall	Redundancy, inference	Etiologic research, potentially clinical decision making
Effluent data	Google search results, Reddit	Tall	Selection biases, natural language	Surveillance, screening, identification of hidden social networks

Drivers

Data growth (exabytes) (big data)
Data variety
Compute power - Moore’s Law, quantum computing
Data science in govt.
GDS
Digital by design
Data and technology standards and codes of practice
Technology service manual
Reproducible analytical pipelines
Mainstreaming data science tools
Precision public health
Open data and democratisation of data
Industrial strategy
HDR-UK

Changing face of PH analysis

PubMed search: 16,000 most recent abstracts for “machine learning/AI public health”

Data science processes


Exposomes and phenomes


Precision public health

https://rpubs.com/jflowers/428618

Various definitions but all encompass a set of consistent ideas:
* Use of big data and new technology to improve health
* Data used to give more precise descriptions of populations and individuals
* Application of new techniques and methods
* Speed, accuracy and scale
* “the application and combination of new and existing technologies, which more precisely describe and analyse individuals and their environment over the life course, to tailor preventive interventions for at-risk groups and improve the overall health of the population.” (Weeramanthri et al. 2018)
* “improving the ability to prevent disease, promote health, and reduce health disparities in populations by applying emerging methods and technologies for measuring disease, pathogens, exposures, behaviors, and susceptibility in populations; and developing policies and targeted implementation programs to improve health” (Khoury and Galea 2016)
* “requires robust primary surveillance data, rapid application of sophisticated analytics to track the geographical distribution of disease, and the capacity to act on such information” (Dowell, Blazes, and Desmond-Hellmann 2016)
* “Precision public health is characterized by discovering, validating, and optimizing care strategies for well-characterized population strata” (Arnett and Claas 2016)

Precision

A second more expansive version of precision public health is about the use of data and analytical techniques to design and implement interventions that benefit whole populations. This version emphasises the use of sophisticated surveillance and modelling and has been promoted by the Bill and Melinda Gates Foundation. It is certainly appealing and is typified for example by the recent Global Burden of Disease study and much of PHE’s work on infection control and health improvement. The era of precision public health is upon us and we all need to become familiar with the concepts. In addition (as the Topol review on the future of the healthcare workforce is likely to report next year) the public health workforce, like others in the health sector, needs to upskill in the areas of data science and digital delivery as these are the shiny new tools in the precision public health toolbox.

John Newton, Michael Ekpe, Peter Bradley https://publichealthmatters.blog.gov.uk/2018/11/20/predictive-prevention-and-the-drive-for-precision-public-health/

Modernising PHI

PHI 1.0	PHI 2.0
Health profiling	Analysis and insight
Statistical analysis	Natural language processing, data wrangling, tidy data, web-scraping
Collation and description	Prediction and prescription
Excel and stats packages	R/Python/PowerBI/Tableau/Cloud
Static reports	Interactive reports
Manual processing	Automated processing
Waterfall	Agile
User feedback	User need
Epidemiology & stats	Epidemiology + models + machine learning
Structured/ small data	Structured and unstructured/ big data
Bias/ confounding	Bias/ confounding

Modernising PHI

When?	Software

Yesterday
Today
Tomorrow

Modernising PHI

Current focus in PHE

Data acquisition and consolidation
- Surveys
- Primary care
- App data/ marketing
Profile and indicator rationalisation
R package strategy
- fingertipsR
- fingertipscharts
- odsr
- PHEindicatormethods
Automation
DHIP
PHISy
ICT challenges
Data fluency

Machine learning examples: local health cluster analysis

Local health contains ~ 60 small area indicators - many of which are highly correlated
By using unsupervised machine learning techniques such as k-means analysis we can cluster areas into similar groups based on all the indicator data
This shows distinct small area clustering in inner and outer London, coastal areas, northern industrial areas, affluent rural areas and so on (map)
This is similar to ONS classification but is completely data driven
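The clustering idea above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the analysis behind the slide: the six areas and their three indicator values are invented, the helper names are hypothetical, and the deterministic farthest-point initialisation is a simplification chosen so the example is reproducible. Indicators are standardised first so that no single indicator dominates the distance calculation.

```python
import math


def standardise(rows):
    """Scale each indicator (column) to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two indicator vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, max_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point initialisation."""
    centroids = [rows[0]]
    while len(centroids) < k:
        centroids.append(
            max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        # Assign every area to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(r, centroids[j]))
                      for r in rows]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical areas: [deprivation score, life expectancy, child obesity %].
areas = {
    "Area A": [35.0, 76.5, 24.0],
    "Area B": [33.0, 77.0, 23.0],
    "Area C": [36.5, 76.0, 25.5],
    "Area D": [10.0, 83.0, 14.0],
    "Area E": [12.0, 82.5, 15.0],
    "Area F": [9.0, 83.5, 13.5],
}

labels, _ = kmeans(standardise(list(areas.values())), k=2)
for name, label in zip(areas, labels):
    print(name, "-> cluster", label)
```

With around 60 correlated indicators, as in Local Health, a dimension reduction step before clustering may also be worth considering, and the number of clusters `k` would need to be chosen rather than assumed.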

Scaled up

Machine learning examples: area obesity rates

Can we predict area obesity rates from Health Profile data?
Use supervised machine learning: try a range of algorithms to predict area obesity
Penalised linear models make reasonable predictions (R² ~ 0.84; RMSE ~ 2 percentage points, i.e. area obesity is estimated to within about 2 percentage points on average, a relative error of ~ 8%)
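As a rough sketch of what a penalised linear model and its fit statistics involve, the example below fits a ridge regression by coordinate descent in plain Python and reports R², RMSE and the RMSE relative to the mean outcome on held-out data. Everything here is an assumption for illustration: the data are synthetic (not Health Profile data), ridge is only one possible penalty, and the function names and the penalty value `lam=0.1` are hypothetical choices.

```python
import math
import random


def fit_ridge(X, y, lam=0.1, n_sweeps=50):
    """Ridge regression by coordinate descent; returns (intercept, coefficients)."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]  # centred features
    resid = [yi - y_mean for yi in y]  # residuals when all coefficients are 0
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            col = [row[j] for row in Xc]
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            new_b = rho / (sum(c * c for c in col) / n + lam)  # shrunk update
            resid = [r - c * (new_b - beta[j]) for c, r in zip(col, resid)]
            beta[j] = new_b
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_means))
    return intercept, beta


def predict(intercept, beta, X):
    return [intercept + sum(b * x for b, x in zip(beta, row)) for row in X]


def r2_and_rmse(y_true, y_pred):
    """Coefficient of determination and root mean squared error."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / len(y_true))


# Synthetic "areas": three standardised predictors and an obesity-like outcome (%).
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(300)]
y = [25 + 3 * a - 2 * b + 0.5 * c + rng.gauss(0, 1) for a, b, c in X]

# Hold out the last 100 areas so the metrics reflect out-of-sample error.
X_train, y_train, X_test, y_test = X[:200], y[:200], X[200:], y[200:]
intercept, beta = fit_ridge(X_train, y_train)
r2, rmse = r2_and_rmse(y_test, predict(intercept, beta, X_test))
relative_error = rmse / (sum(y_test) / len(y_test))
print(f"R2 = {r2:.2f}, RMSE = {rmse:.2f} points, relative error = {relative_error:.1%}")
```

The relative error is simply the RMSE divided by the mean outcome, which is one way an RMSE of about 2 points could correspond to an error of roughly 8% when the average rate is in the mid-20s.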

AI in literature searches and reviews

AI is impacting on literature searching, review and synthesis…

http://rpubs.com/jflowers/441656

Novel data uses: Google street maps

Use of search engine’s imagery forms part of the era of ‘data abundance’, according to the UK’s national statistician

App data

Active 10