Public health data science in PHE

PHE - Health Improvement: Julian Flowers

2018-11-27

Outline

Who we are
What we do
Where we are going

A bit about PHE and PHDS in PHE

Formed in 2013 from 70+ public health agencies in England
Origins of public health data science in Public Health Observatories and public health intelligence (PHI)
PHDS team formed in 2015 following reorganisation
Now ~15 WTE (whole-time equivalent) staff
5 main areas of work
- Data
- Methods and indicators
- GBD and analysis
- Developing and applying data science to PHI problems
- Developing capacity and capability

http://organogram.phe.gov.uk/

Our definition of PHDS

Application of data science methods, tools and techniques to improve our use of data in improving health and reducing health inequalities

Drivers

Demands => “data science techniques”, efficiency improvements
Data growth
Data variety
Compute power
Data science in govt.
GDS
Digital by design
Data and technology standards and codes of practice
Technology service manual
Reproducible analytical pipelines
Mainstreaming data science tools
Open data and democratisation of data
Precision public health [more later]

Frameworks

active10

Some outputs

active10

http://datascience.phe.gov.uk/MortalityTool/Summary/Details/824

More outputs

https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(18)32207-4/fulltext

https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(15)00195-6/fulltext

Training

F2F for GBD, GIS, SQL
Webinars
Hold DataCamp licences
On the job support

Wider networking

GBD networks
Academic networks
Gov networks
NHS networks

Changing face of PH analysis

Precision

A second more expansive version of precision public health is about the use of data and analytical techniques to design and implement interventions that benefit whole populations. This version emphasises the use of sophisticated surveillance and modelling and has been promoted by the Bill and Melinda Gates Foundation. It is certainly appealing and is typified for example by the recent Global Burden of Disease study and much of PHE’s work on infection control and health improvement. The era of precision public health is upon us and we all need to become familiar with the concepts. In addition (as the Topol review on the future of the healthcare workforce is likely to report next year) the public health workforce, like others in the health sector, needs to upskill in the areas of data science and digital delivery as these are the shiny new tools in the precision public health toolbox.

John Newton, Michael Ekpe, Peter Bradley https://publichealthmatters.blog.gov.uk/2018/11/20/predictive-prevention-and-the-drive-for-precision-public-health//

Types of big data for public health

Source: Mooney SJ, Pejaver V. Big Data in Public Health: Terminology, Machine Learning, and Privacy. Annual Review of Public Health 2018;1–18. doi:29261408

Source	Examples	‘Bigness’	Technical Issues	Typical uses
omics/biological	Whole exome profiling, metabolomics	Wide	Lab effects, informatics pipeline	Etiologic research, screening
Geospatial	Neighborhood characteristics	Wide	Spatial autocorrelation	Etiologic research, surveillance
Electronic health records	Records of all patients with hypertension	Tall, also wide	Data cleaning, natural language	Clinical research, surveillance
Personal	Daily GPS records monitoring, Fitbit readings	Tall	Redundancy, inference	Etiologic research, potentially clinical decision making
Effluent data	Google search results, Reddit	Tall	Selection biases, natural language	Surveillance, screening, identification of hidden social networks

PubMed search: 16,000 most recent abstracts for “machine learning/AI public health”

Modernising PHI

PHI 1.0	PHI 2.0
Health profiling	Analysis and insight
Statistical analysis	Natural language processing, data wrangling, tidy data, web-scraping
Collation and description	Prediction and prescription
Excel and stats packages	R/Python/PowerBI/Tableau/Cloud
Static reports	Interactive reports
Manual processing	Automated processing
Waterfall	Agile
User feedback	User need
Epidemiology & stats	Epidemiology + models + machine learning
Structured/ small data	Structured and unstructured/ big data
Bias/ confounding	Bias/ confounding

Modernising PHI

When?	Software

Yesterday
Today
Tomorrow

Sorts of questions we need to answer

Basics about prevalence and incidence
Inclusion health
Impact and effectiveness
Real world evaluation
Implications of precision PH - granularity, timeliness, data integration
Value of new datasets (inc primary care, digital, apps)
Health in the future
Which interventions, where and for whom
Practical daily linkage - cohort generation
Maximising efficiency and effective use of data

Population health intelligence system discovery (PHISy)

PERSONAS

Machine learning examples: local health cluster analysis

Local health contains ~ 60 small area indicators - many of which are highly correlated
By using unsupervised machine learning techniques such as k-means analysis we can cluster areas into similar groups based on all the indicator data
This shows distinct small area clustering in inner and outer London, coastal areas, northern industrial areas, affluent rural areas and so on (map)
This is similar to ONS classification but is completely data driven
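
A minimal sketch of the clustering idea above, in Python using only the standard library. The six areas and three indicators are invented for illustration (they are not Local Health data), and both the z-score standardisation and k = 2 are assumptions; the real analysis uses ~60 indicators and would need its own choice of k.

```python
import random

# Hypothetical small areas: [life expectancy, child poverty %, smoking %].
# Values are invented for illustration only.
areas = {
    "A1": [76.0, 31.0, 24.0],
    "A2": [76.5, 29.0, 25.0],
    "A3": [75.8, 33.0, 26.0],
    "B1": [83.0, 10.0, 15.0],
    "B2": [82.5, 12.0, 14.0],
    "B3": [83.4, 9.0, 16.0],
}


def standardise(rows):
    """Convert each indicator (column) to z-scores so scales are comparable."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [(sum((x - m) ** 2 for x in c) / len(c)) ** 0.5 or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def kmeans(rows, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = [-1] * len(rows)
    for _ in range(iterations):
        # Assignment step: each area joins its nearest centroid.
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2
                                  for x, m in zip(row, centroids[c])))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids


names = list(areas)
labels, centroids = kmeans(standardise(list(areas.values())), k=2)
for name, label in zip(names, labels):
    print(name, "-> cluster", label)
```

Standardising first matters because indicators are on different scales; without it the indicator with the largest numeric range would dominate the distance calculation.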

Machine learning examples: area obesity rates

Can we predict area obesity rates from Health Profile data?
Use supervised machine learning - a range of algorithms to try and predict area obesity
Penalised linear models make reasonable predictions (R² ~ 0.84; RMSE ~ 2 percentage points, i.e. area obesity is estimated to within about 2 percentage points on average, roughly 8% relative error)
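
A minimal sketch of one penalised linear model (ridge regression) with leave-one-out evaluation, in Python using only the standard library. The eight areas, the two predictors and the penalty value are all invented for illustration; this is not the model or data behind the figures above. The last line shows how an RMSE relates to a relative error: for example, an RMSE of 2 percentage points against an assumed mean obesity rate of 25% would be 8%.

```python
# Hypothetical areas: predictors are [deprivation score, inactivity %],
# target is obesity %. All values are invented for illustration only.
X = [[10, 18], [15, 22], [20, 20], [25, 25],
     [30, 27], [35, 26], [40, 31], [45, 30]]
y = [19.5, 21.4, 22.2, 24.9, 27.2, 27.1, 30.5, 30.8]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ridge(X, y, lam):
    """Fit ridge regression on standardised predictors; return a predictor."""
    n, p = len(X), len(X[0])
    cols = list(zip(*X))
    means = [sum(c) / n for c in cols]
    sds = [(sum((v - m) ** 2 for v in c) / n) ** 0.5
           for c, m in zip(cols, means)]
    y_mean = sum(y) / n
    Z = [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]
    # Penalised normal equations: (Z'Z + lam * I) beta = Z'(y - mean(y))
    A = [[sum(z[i] * z[j] for z in Z) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(z[i] * (t - y_mean) for z, t in zip(Z, y)) for i in range(p)]
    beta = solve(A, b)

    def predict(row):
        return y_mean + sum(bj * (v - m) / s
                            for bj, v, m, s in zip(beta, row, means, sds))
    return predict


# Leave-one-out: predict each area from a model fitted on the others.
preds = []
for i in range(len(X)):
    model = fit_ridge(X[:i] + X[i + 1:], y[:i] + y[i + 1:], lam=0.5)
    preds.append(model(X[i]))

y_bar = sum(y) / len(y)
ss_res = sum((p - t) ** 2 for p, t in zip(preds, y))
ss_tot = sum((t - y_bar) ** 2 for t in y)
rmse = (ss_res / len(y)) ** 0.5
r2 = 1 - ss_res / ss_tot
print(f"R2 = {r2:.2f}, RMSE = {rmse:.2f}, relative error = {rmse / y_bar:.1%}")
```

Evaluating on held-out areas rather than the fitting data gives a fairer picture of predictive accuracy, which is the quantity of interest when the aim is to estimate obesity for areas without direct measurement.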

AI in literature searches and reviews

AI is impacting on literature searching, review and synthesis…

http://rpubs.com/jflowers/441656

Novel data uses: Google Street View

"Urban trees provide a wide range of environmental, social and economic benefits, such as improving air quality and are known to be associated with lower crime levels and greater community cohesion. In collaboration with the Office for National Statistics (ONS) Natural Capital team, we have developed an experimental method for estimating the density of trees and vegetation present at 10 metre intervals for all 112 major towns and cities in England and Wales."

"Our approach uses images sampled from Google Street View as the input to an image segmentation algorithm. This has enabled us to derive a vegetation density map by percentage, for the road network of an entire city. The developed system is built on recent advancements in the field of deep learning for semantic image segmentation."

"This blog summarises the approaches in our research to establish a city-wide geospatial vegetation indicator. Beginning with attempts to identify green vegetation in arbitrary scenes, we then move to evaluate models of increasing complexity, finishing with the use and validation of deep image segmentation neural networks for visual scene understanding."
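
To make "vegetation density by percentage" concrete, here is a deliberately naive colour-threshold sketch in Python. It is not the method described in the quoted blog, which moves on to deep segmentation networks; the threshold rule, the margin of 20 and the tiny 2x2 "image" are assumptions for illustration only.

```python
def is_green(pixel, margin=20):
    """Flag a pixel as vegetation if green clearly exceeds red and blue."""
    r, g, b = pixel
    return g > r + margin and g > b + margin


def vegetation_percent(image):
    """Percentage of pixels in an image (rows of RGB tuples) flagged green."""
    pixels = [px for row in image for px in row]
    green = sum(is_green(px) for px in pixels)
    return 100.0 * green / len(pixels)


# Hypothetical 2x2 street scene: leaf, road, grass, sky.
image = [
    [(34, 139, 34), (120, 120, 120)],
    [(60, 179, 113), (135, 206, 235)],
]
print(vegetation_percent(image))
```

A simple rule like this is easily fooled by green cars, painted surfaces or shaded foliage, which is one reason to evaluate more complex models for the same task.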

"Researchers from Stanford University have applied deep learning-based computer vision techniques to 50 million images across 200 regions to identify 22 million cars, which is roughly 8 percent of all automobiles in the United States. Based on the types of cars and their locations, the researchers estimated the income, race, education, and voting patterns of the people living in those areas. The results they derived from pictures are impressively accurate."

App data

active10