PHE - Health Improvement: Julian Flowers
2018-11-27
Applying data science methods, tools and techniques to make better use of data for improving health and reducing health inequalities
https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(18)32207-4/fulltext
https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(15)00195-6/fulltext
Source: Mooney SJ, Pejaver V. Big Data in Public Health: Terminology, Machine Learning, and Privacy. Annual Review of Public Health 2018;1–18. PMID: 29261408
Source | Examples | ‘Bigness’ | Technical issues | Typical uses |
---|---|---|---|---|
Omics/biological | Whole exome profiling, metabolomics | Wide | Lab effects, informatics pipeline | Etiologic research, screening |
Geospatial | Neighborhood characteristics | Wide | Spatial autocorrelation | Etiologic research, surveillance |
Electronic health records | Records of all patients with hypertension | Tall and wide | Data cleaning, natural language | Clinical research, surveillance |
Personal | Daily GPS monitoring records, Fitbit readings | Tall | Redundancy, inference | Etiologic research, potentially clinical decision making |
Effluent data | Google search results, Reddit | Tall | Selection biases, natural language | Surveillance, screening, identification of hidden social networks |
PHI 1.0 | PHI 2.0 |
---|---|
Health profiling | Analysis and insight |
Statistical analysis | Natural language processing, data wrangling, tidy data, web scraping |
Collation and description | Prediction and prescription |
Excel and stats packages | R/Python/Power BI/Tableau/Cloud |
Static reports | Interactive reports |
Manual processing | Automated processing |
Waterfall | Agile |
User feedback | User need |
Epidemiology & stats | Epidemiology + models + machine learning |
Structured/small data | Structured and unstructured/big data |
Bias/confounding | Bias/confounding |
When? | Software |
---|---|
Yesterday | |
Today | |
Tomorrow | |
PERSONAS
Local Health contains ~60 small-area indicators, many of which are highly correlated
Using unsupervised machine learning techniques such as k-means clustering, we can group areas with similar profiles across all the indicator data
This shows distinct small-area clusters in inner and outer London, coastal areas, northern industrial areas, affluent rural areas and so on (map)
This is similar to the ONS area classification but is entirely data-driven
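A minimal sketch of the clustering idea, in plain Python so it runs without extra libraries. It standardises each indicator to z-scores and then runs Lloyd's k-means, using a simple deterministic farthest-first initialisation in place of the k-means++ seeding that libraries such as scikit-learn use by default. The six areas and their three indicator values are made up for illustration; this is not the analysis behind the map.

```python
import math
import statistics


def standardise(rows):
    """Z-score each indicator column so indicators on large scales don't dominate distances."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # avoid dividing by zero for constant columns
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with farthest-first initialisation; returns (labels, centroids)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Seed the next centroid with the point farthest from all existing centroids.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each area joins its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # memberships stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member areas.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = [statistics.mean(col) for col in zip(*members)]
    return labels, centroids


# Hypothetical indicator values for six small areas (three indicators each).
areas = {
    "A": [10.0, 80.0, 5.0],
    "B": [11.0, 82.0, 6.0],
    "C": [12.0, 79.0, 5.5],
    "D": [30.0, 60.0, 20.0],
    "E": [31.0, 58.0, 21.0],
    "F": [29.0, 61.0, 19.0],
}
labels, _ = kmeans(standardise(list(areas.values())), k=2)
clusters = dict(zip(areas, labels))
print(clusters)  # → {'A': 0, 'B': 0, 'C': 0, 'D': 1, 'E': 1, 'F': 1}
```

Because the indicators are highly correlated, groups of near-duplicate indicators effectively get extra weight in the Euclidean distance; reducing them first (for example with principal component analysis) is a common refinement that this sketch omits.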
"Urban trees provide a wide range of environmental, social and economic benefits, such as improving air quality and are known to be associated with lower crime levels and greater community cohesion. In collaboration with the Office for National Statistics (ONS) Natural Capital team, we have developed an experimental method for estimating the density of trees and vegetation present at 10 metre intervals for all 112 major towns and cities in England and Wales."
"Our approach uses images sampled from Google Street View as the input to an image segmentation algorithm. This has enabled us to derive a vegetation density map by percentage, for the road network of an entire city. The developed system is built on recent advancements in the field of deep learning for semantic image segmentation."
"This blog summarises the approaches in our research to establish a city-wide geospatial vegetation indicator. Beginning with attempts to identify green vegetation in arbitrary scenes, we then move to evaluate models of increasing complexity, finishing with the use and validation of deep image segmentation neural networks for visual scene understanding."
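The early step of identifying green vegetation in arbitrary scenes can be illustrated with a much simpler technique than the deep segmentation networks described above: thresholding the excess green index, ExG = 2g - r - b, computed on chromatic coordinates (each channel divided by r + g + b). The sketch below is that colour-threshold baseline, not the ONS method; the 0.1 threshold, the tiny RGB grids and the sample-point names are all hypothetical.

```python
def excess_green(r, g, b):
    """Excess green index on chromatic coordinates: ExG = 2g - r - b, where r + g + b = 1."""
    total = r + g + b
    if total == 0:
        return 0.0  # black pixel carries no colour information
    rn, gn, bn = r / total, g / total, b / total
    return 2 * gn - rn - bn


def vegetation_percentage(image, threshold=0.1):
    """Percentage of pixels whose ExG exceeds an assumed (hypothetical) threshold."""
    pixels = [px for row in image for px in row]
    if not pixels:
        return 0.0
    green = sum(1 for r, g, b in pixels if excess_green(r, g, b) > threshold)
    return 100.0 * green / len(pixels)


# Hypothetical street-level images at two sample points 10 m apart (tiny RGB grids).
samples = {
    "0m": [[(34, 139, 34), (135, 206, 235)],     # foliage, sky
           [(34, 139, 34), (128, 128, 128)]],    # foliage, grey road
    "10m": [[(135, 206, 235), (135, 206, 235)],  # sky, sky
            [(128, 128, 128), (128, 128, 128)]], # road, road
}
density = {point: vegetation_percentage(img) for point, img in samples.items()}
print(density)  # → {'0m': 50.0, '10m': 0.0}
```

A fixed colour threshold like this is easily fooled by green cars, painted surfaces or shadowed foliage, which is one reason to move on to learned segmentation models.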
"Researchers from Stanford University have applied deep learning-based computer vision techniques to 50 million images across 200 regions to identify 22 million cars, which is roughly 8 percent of all automobiles in the United States. Based on the types of cars and their locations, the researchers estimated the income, race, education, and voting patterns of the people living in those areas. The results they derived from pictures are impressively accurate."