Public health data science in PHE

PHE - Health Improvement: Julian Flowers

2018-11-27

Outline

A bit about PHE and PHDS in PHE

http://organogram.phe.gov.uk/

Our definition of PHDS

Application of data science methods, tools and techniques to improve our use of data in improving health and reducing health inequalities

Drivers

Frameworks

Some outputs

http://datascience.phe.gov.uk/MortalityTool/Summary/Details/824

More outputs

https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(18)32207-4/fulltext

https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(15)00195-6/fulltext

Training

Wider networking

Changing face of PH analysis

Precision

Types of big data for public health

Source: Mooney SJ, Pejaver V. Big Data in Public Health: Terminology, Machine Learning, and Privacy. Annual Review of Public Health 2018;1–18. PMID: 29261408

| Source | Examples | 'Bigness' | Technical issues | Typical uses |
|---|---|---|---|---|
| Omics/biological | Whole exome profiling, metabolomics | Wide | Lab effects, informatics pipeline | Etiologic research, screening |
| Geospatial | Neighborhood characteristics | Wide | Spatial autocorrelation | Etiologic research, surveillance |
| Electronic health records | Records of all patients with hypertension | Tall, also wide | Data cleaning, natural language | Clinical research, surveillance |
| Personal monitoring | Daily GPS records, Fitbit readings | Tall | Redundancy, inference | Aetiological research, potentially clinical decision making |
| Effluent data | Google search results, Reddit | Tall | Selection biases, natural language | Surveillance, screening, identification of hidden social networks |
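The 'Bigness' column distinguishes tall data (many observations relative to variables) from wide data (many variables relative to observations). A minimal sketch of that distinction follows. The function name `describe_shape` and the 10:1 ratio threshold are assumptions made for illustration only; they do not come from the cited paper.

```python
def describe_shape(n_rows: int, n_cols: int, ratio: float = 10.0) -> str:
    """Label a dataset as 'tall', 'wide' or 'neither' from its dimensions.

    The ratio threshold is an arbitrary illustrative choice, not a standard.
    """
    if n_rows <= 0 or n_cols <= 0:
        raise ValueError("n_rows and n_cols must be positive")
    if n_rows >= ratio * n_cols:
        return "tall"   # many observations, comparatively few variables
    if n_cols >= ratio * n_rows:
        return "wide"   # many variables, comparatively few observations
    return "neither"


# Hypothetical dimensions, chosen only to illustrate each label.
gps_traces = describe_shape(n_rows=1_000_000, n_cols=20)    # daily GPS records
exome_panel = describe_shape(n_rows=500, n_cols=20_000)     # omics-style data
small_survey = describe_shape(n_rows=100, n_cols=50)
```

Under these assumed dimensions the three calls give "tall", "wide" and "neither" respectively. A dataset such as electronic health records can be both tall and wide in practice, which a single ratio cannot capture.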

PubMed search: the 16,000 most recent abstracts for "machine learning/ ai public health"

Modernising PHI

| PHI 1.0 | PHI 2.0 |
|---|---|
| Health profiling | Analysis and insight |
| Statistical analysis | Natural language processing, data wrangling, tidy data, web-scraping |
| Collation and description | Prediction and prescription |
| Excel and stats packages | R/Python/PowerBI/Tableau/Cloud |
| Static reports | Interactive reports |
| Manual processing | Automated processing |
| Waterfall | Agile |
| User feedback | User need |
| Epidemiology & stats | Epidemiology + models + machine learning |
| Structured/small data | Structured and unstructured/big data |
| Bias/confounding | Bias/confounding |

Modernising PHI

When? Software

Sorts of questions we need to answer

Population health intelligence system discovery (PHISy)

PERSONAS

Machine learning examples: local health cluster analysis

Machine learning examples: area obesity rates

AI in literature searches and reviews

http://rpubs.com/jflowers/441656

Novel data uses: Google Street View

## [1] "Urban trees provide a wide range of environmental, social and economic benefits, such as improving air quality and are known to be associated with lower crime levels and greater community cohesion. In collaboration with the Office for National Statistics (ONS) Natural Capital team, we have developed an experimental method for estimating the density of trees and vegetation present at 10 metre intervals for all 112 major towns and cities in England and Wales."
## [2] "Our approach uses images sampled from Google Street View as the input to an image segmentation algorithm. This has enabled us to derive a vegetation density map by percentage, for the road network of an entire city. The developed system is built on recent advancements in the field of deep learning for semantic image segmentation."
## [3] "This blog summarises the approaches in our research to establish a city-wide geospatial vegetation indicator. Beginning with attempts to identify green vegetation in arbitrary scenes, we then move to evaluate models of increasing complexity, finishing with the use and validation of deep image segmentation neural networks for visual scene understanding."
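The quoted text describes deriving vegetation density as a percentage from segmented street-level images. The sketch below illustrates only the final step, turning a per-pixel class mask into a vegetation percentage. It assumes a segmentation model has already produced the mask; the function name, the label codes and the example mask are hypothetical, and this is not the ONS implementation.

```python
from typing import Sequence, Set


def vegetation_percentage(mask: Sequence[Sequence[int]],
                          vegetation_labels: Set[int]) -> float:
    """Return the share of pixels (0 to 100) labelled as vegetation.

    mask is a 2D grid of integer class labels, one per pixel.
    """
    total = 0
    green = 0
    for row in mask:
        for label in row:
            total += 1
            if label in vegetation_labels:
                green += 1
    if total == 0:
        raise ValueError("mask contains no pixels")
    return 100.0 * green / total


# Hypothetical 3x4 mask: 0 = road, 1 = building, 2 = vegetation.
example_mask = [
    [2, 2, 1, 1],
    [2, 0, 0, 1],
    [0, 0, 0, 0],
]
example_share = vegetation_percentage(example_mask, {2})  # 3 of 12 pixels -> 25.0
```

Repeating this for images sampled at regular intervals along a road network would give one percentage per sample point, which could then be mapped.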
## [1] "Researchers from Stanford University have applied deep learning-based computer vision techniques to 50 million images across 200 regions to identify 22 million cars, which is roughly 8 percent of all automobiles in the United States. Based on the types of cars and their locations, the researchers estimated the income, race, education, and voting patterns of the people living in those areas. The results they derived from pictures are impressively accurate."

App data

active10