PHE - Health Improvement: Julian Flowers
2018-11-28
How digital, big data, data science and machine learning/AI might impact on public health
Source: Mooney SJ, Pejaver V. Big Data in Public Health: Terminology, Machine Learning, and Privacy. Annual Review of Public Health 2018;1–18. doi:29261408
Source | Examples | ‘Bigness’ | Technical Issues | Typical uses |
---|---|---|---|---|
Omics/biological | Whole exome profiling, metabolomics | Wide | Lab effects, informatics pipeline | Etiologic research, screening |
Geospatial | Neighborhood characteristics | Wide | Spatial autocorrelation | Etiologic research, surveillance |
Electronic health records | Records of all patients with hypertension | Tall, also wide | Data cleaning, natural language | Clinical research, surveillance |
Personal monitoring | Daily GPS records, Fitbit readings | Tall | Redundancy, inference | Etiologic research, potentially clinical decision making |
Effluent data | Google search results, Reddit | Tall | Selection biases, natural language | Surveillance, screening, identification of hidden social networks |
https://rpubs.com/jflowers/428618
Various definitions but all encompass a set of consistent ideas:
* Use of big data and new technology to improve health
* Data used to give more precise descriptions of populations and individuals
* Application of new techniques and methods
* Speed, accuracy and scale
* “the application and combination of new and existing technologies, which more precisely describe and analyse individuals and their environment over the life course, to tailor preventive interventions for at-risk groups and improve the overall health of the population.” (Weeramanthri et al. 2018)
* “improving the ability to prevent disease, promote health, and reduce health disparities in populations by applying emerging methods and technologies for measuring disease, pathogens, exposures, behaviors, and susceptibility in populations; and developing policies and targeted implementation programs to improve health” (Khoury and Galea 2016)
* “requires robust primary surveillance data, rapid application of sophisticated analytics to track the geographical distribution of disease, and the capacity to act on such information” (Dowell, Blazes, and Desmond-Hellmann 2016)
* “Precision public health is characterized by discovering, validating, and optimizing care strategies for well-characterized population strata” (Arnett and Claas 2016)
PHI 1.0 | PHI 2.0 |
---|---|
Health profiling | Analysis and insight |
Statistical analysis | Natural language processing, data wrangling, tidy data, web-scraping |
Collation and description | Prediction and prescription |
Excel and stats packages | R/Python/PowerBI/Tableau/Cloud |
Static reports | Interactive reports |
Manual processing | Automated processing |
Waterfall | Agile |
User feedback | User need |
Epidemiology & stats | Epidemiology + models + machine learning |
Structured/ small data | Structured and unstructured/ big data |
Bias/ confounding | Bias/ confounding |
When? | Software |
---|---|
Yesterday | |
Today | |
Tomorrow | |
Local Health contains about 60 small-area indicators, many of which are highly correlated
Using unsupervised machine learning techniques such as k-means clustering, we can group areas into similar clusters based on all the indicator data
This shows distinct small-area clusters in inner and outer London, coastal areas, northern industrial areas, affluent rural areas and so on (map)
This is similar to the ONS classification but is completely data-driven
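The clustering idea above can be sketched in a few lines. This is a minimal illustration only, not the analysis behind the map: the six areas, the three indicator names and all values are invented for the example, and the plain-Python k-means below stands in for whatever implementation was actually used. Indicators are standardised first so that no single indicator dominates the distance calculation.

```python
import random
from statistics import mean, pstdev

# Hypothetical data: area -> [deprivation score, life expectancy, child obesity %].
# Names and values are invented for illustration; they are not Local Health data.
areas = {
    "Area A": [42.0, 76.5, 26.0],
    "Area B": [39.5, 77.0, 24.5],
    "Area C": [44.0, 76.0, 27.0],
    "Area D": [12.0, 83.0, 15.0],
    "Area E": [10.5, 83.5, 14.0],
    "Area F": [14.0, 82.5, 16.0],
}


def standardise(rows):
    """Convert each indicator (column) to z-scores."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # avoid dividing by zero for constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, seed=0, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centre updates until labels stop changing."""
    rng = random.Random(seed)
    centres = [list(r) for r in rng.sample(rows, k)]  # start from k randomly chosen areas
    labels = None
    for _ in range(max_iter):
        # Assign every area to its nearest centre.
        new_labels = [min(range(k), key=lambda j: sq_dist(r, centres[j])) for r in rows]
        if new_labels == labels:
            break
        labels = new_labels
        # Move each centre to the mean of its members.
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = [mean(c) for c in zip(*members)]
    return labels, centres


names = list(areas)
z = standardise([areas[n] for n in names])
labels, centres = kmeans(z, k=2)
for name, lab in zip(names, labels):
    print(name, "-> cluster", lab)
```

With real data the number of clusters is a choice rather than a given, and because the indicators are highly correlated it may be worth reducing them (for example with principal components) before clustering.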
## [1] "Certain features of this site make use of javascript. For maximum benefit it is strongly advised that you \r\n switch on javascript before continuing."
## [2] "Use of search engine’s imagery forms part of the era of ‘data abundance’, according to the UK’s national statistician"
## [3] "Credit: Byrion Smith/CC BY 2.0"
## [1] "(Credit: Shutterstock)"