We access Fingertips data with the fingertipsR package.
We use the practice profiles and other profiles containing GP data (e.g. diabetes).
Note that the names of identical indicators vary across Fingertips profiles.
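The same indicator can carry different names in different profiles. One way to avoid extracting it twice is to key on the numeric indicator ID rather than the name. The sketch below is a stdlib-only Python illustration, not the fingertipsR code used in this analysis. The rows, IDs and names are hypothetical placeholders.

```python
# Hypothetical (profile, indicator_id, indicator_name) rows, as might be
# collected from several Fingertips profiles. IDs and names are placeholders.
rows = [
    ("Practice profiles", 101, "Diabetes: QOF prevalence"),
    ("Diabetes", 101, "Diabetes QOF prevalence (17+)"),
    ("Practice profiles", 202, "Smoking prevalence"),
]


def unique_indicators(rows):
    """Map each indicator ID to the first name seen for it.

    Keying on the ID rather than the free-text name stops the same
    indicator being extracted twice when profiles label it differently.
    """
    names = {}
    for _profile, ind_id, name in rows:
        names.setdefault(ind_id, name)
    return names


indicators = unique_indicators(rows)
print(sorted(indicators))  # [101, 202]
```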
For this analysis we extract prevalence estimates, deprivation scores and demographic variables. The prevalence estimates are not age-adjusted.
We have a list of 40 indicator IDs for which we will extract data.
We can now extract the data (NB: this takes a few minutes).
Let's see what we have:
We now have 57 variables for 7,265 GP practices.
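Extracted indicator data usually arrive in long format, with one row per practice and variable. These rows then need reshaping into a practice-by-variable table before any clustering. Below is a minimal Python sketch of that pivot, assuming records of the form (practice code, variable, value). The codes and variable names are hypothetical.

It is also an assumption that each column corresponds to an indicator/breakdown combination (e.g. age band or sex). That would be one way 40 indicator IDs could yield more than 40 variables.

```python
from collections import defaultdict

# Hypothetical long-format records: (practice_code, variable, value).
records = [
    ("P001", "diabetes_prev", 7.1),
    ("P001", "smoking_prev", 18.0),
    ("P002", "diabetes_prev", 5.4),
    ("P002", "smoking_prev", 12.5),
    ("P003", "diabetes_prev", 6.2),  # P003 has no smoking value
]


def pivot_wide(records):
    """Return (practice codes, variable names, rows) with one row per practice.

    Missing practice/variable combinations are filled with None.
    """
    table = defaultdict(dict)
    variables = set()
    for practice, variable, value in records:
        table[practice][variable] = value
        variables.add(variable)
    practices = sorted(table)
    variables = sorted(variables)
    rows = [[table[p].get(v) for v in variables] for p in practices]
    return practices, variables, rows


practices, variables, rows = pivot_wide(records)
print(len(practices), "practices x", len(variables), "variables")
print(rows[2])  # [6.2, None]
```

Rows containing None would need to be dropped or imputed before clustering. Which of the two was done here is not stated.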
We can now look at the relationships between variables.
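A simple way to summarise these relationships is a matrix of pairwise Pearson correlations. In R, `cor()` applied to the wide table does this. The pure-Python sketch below computes the same quantity on a few toy columns. The variable names and values are illustrative only, and a column with zero variance would cause a division by zero.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """columns: dict of variable name -> values, in the same practice order."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Toy values for four practices; not real data.
columns = {
    "deprivation": [10.0, 20.0, 30.0, 40.0],
    "smoking_prev": [12.0, 15.0, 19.0, 24.0],
    "over65_pct": [25.0, 20.0, 16.0, 10.0],
}
corr = correlation_matrix(columns)
print(round(corr[("deprivation", "smoking_prev")], 3))
```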
Cluster | Size (practices) | Description | Geography |
---|---|---|---|
1 | 801 | Relatively deprived, “middle-aged”, higher levels of obesity and smoking, high prevalence of diabetes in younger age groups, high prevalence of CVD, MSK, epilepsy and respiratory disease | Urban NE / NW / Midlands / some coastal / East London |
2 | 687 | Most deprived, “middle-aged”, highest levels of obesity and smoking, high prevalence of diabetes in younger age groups, high prevalence of mental health problems and respiratory disease | Similar to cluster 1: urban NE / NW / Midlands / some coastal / East London |
3 | 1632 | Average: “middle-aged” age structure, not deprived, average levels of obesity and smoking, average prevalence for most diseases | Suburban NE / NW / Midlands / some coastal / London / South coast |
4 | 296 | Relatively deprived, most “middle-aged”, low levels of obesity but higher smoking rates, high prevalence of diabetes in younger age groups, low prevalence of CVD, MSK, epilepsy and respiratory disease, higher levels of mental health problems | Mostly suburban London |
5 | 433 | Deprived, youngest population, higher levels of obesity and smoking, high prevalence of diabetes in younger age groups, high prevalence of CVD, MSK, epilepsy and respiratory disease | Similar to cluster 4 |
6 | 537 | Not deprived, oldest population, lower levels of obesity and smoking, high prevalence of diabetes in younger age groups, high prevalence of CVD, MSK, epilepsy and respiratory disease | Largely coastal areas across England |
7 | 594 | Similar to cluster 5: deprived, youngest population, higher levels of obesity and smoking, high prevalence of diabetes in younger age groups, high prevalence of CVD, MSK, epilepsy and respiratory disease | Similar to cluster 1 |
8 | 442 | Similar to cluster 4 but less deprived, more middle-aged population, lower levels of obesity and smoking, high prevalence of diabetes in older age groups, lower prevalence of CVD, MSK, epilepsy and respiratory disease | Largely central London |
9 | 1121 | Similar to cluster 6: not deprived, older population, lower levels of obesity but higher rate of smoking, high prevalence of diabetes in younger age groups, high prevalence of CVD, MSK, epilepsy and respiratory disease | Widespread across the country |
10 | 731 | Not deprived, younger population, low rates of risk factors and low prevalence across a range of diseases | South East England |
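The clustering code itself is not reproduced here. The sketch below assumes the approach was k-means on standardised variables (in R, roughly `kmeans(scale(x), centers = 10)`) and shows the idea in pure Python.

- **Initialisation:** it uses a deterministic farthest-point start, swapped in so that the toy example is reproducible. R's `kmeans()` uses random starts instead.
- **Data:** the three "practices" variables and their values are invented.
- **Labels:** cluster numbers are arbitrary in any k-means run.

```python
import math


def zscore_columns(rows):
    """Standardise each column to mean 0, SD 1 so no variable dominates distances."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
        for j in range(p)
    ]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Start with the first point, then repeatedly add the point farthest from the chosen centres."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda pt: min(sq_dist(pt, c) for c in centres)))
    return [list(c) for c in centres]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: assign points to the nearest centre, then move centres to cluster means."""
    centres = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(pt, centres[c])) for pt in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


# Toy "practices": (deprivation score, % aged 65+, diabetes prevalence). Not real data.
practices = [
    [40.0, 10.0, 8.0], [42.0, 11.0, 8.5], [38.0, 9.0, 7.8],
    [10.0, 28.0, 6.0], [12.0, 30.0, 6.2], [9.0, 27.0, 5.9],
]
labels, centres = kmeans(zscore_columns(practices), k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Standardising first matters because the raw variables sit on very different scales (scores, percentages, prevalences). Without it, Euclidean distances would be driven mainly by whichever variable has the largest spread.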
Through simple clustering approaches we have been able to define a number of distinct GP population phenotypes. For the purposes of this analysis, we use co-morbidity to mean the co-existence of high prevalence across a range of diseases in the same population. Using this definition, we can distinguish four groups of practice populations with high levels of co-morbidity:
Cluster 9, which is similar to Cluster 6 but has a different geography
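The working definition of co-morbidity above (high prevalence across a range of diseases in the same population) can be written as an explicit rule on cluster profiles. The sketch below assumes z-scored prevalence variables and cluster labels are already available. It computes each cluster's mean z-score per disease and flags clusters that are above average for most diseases. The 0 SD threshold and the 75% share are hypothetical choices, not values used in this analysis.

```python
def cluster_profiles(zrows, labels):
    """Mean z-score of each variable within each cluster."""
    sums, counts = {}, {}
    for row, lab in zip(zrows, labels):
        acc = sums.setdefault(lab, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def high_comorbidity(profiles, min_share=0.75, threshold=0.0):
    """Clusters whose mean prevalence exceeds `threshold` (in SD units)
    for at least `min_share` of the disease variables (hypothetical rule)."""
    flagged = []
    for lab, means in sorted(profiles.items()):
        share = sum(m > threshold for m in means) / len(means)
        if share >= min_share:
            flagged.append(lab)
    return flagged


# Toy z-scored prevalence for four practices
# (columns: CVD, MSK, epilepsy, respiratory disease). Not real data.
zrows = [
    [0.9, 1.1, 0.4, 0.8],
    [1.2, 0.7, 0.6, 1.0],
    [-0.8, -1.0, -0.3, -0.9],
    [-1.3, -0.8, -0.7, 0.9],
]
labels = [1, 1, 2, 2]
profiles = cluster_profiles(zrows, labels)
print(high_comorbidity(profiles))  # [1]
```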
Note that we have used crude prevalence estimates from QOF (the Quality and Outcomes Framework); age-adjusted estimates could change the clustering. We could also improve demographic profiling by including ethnicity estimates. Additional risk factor estimates (e.g. alcohol) could also change the clustering. The choice of 10 clusters is somewhat arbitrary, and some clusters appear to be subsets of others (e.g. cluster 7 resembles cluster 5, and cluster 9 resembles cluster 6).
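To see why age adjustment could matter, compare crude and directly age-standardised prevalence for two hypothetical practices. Both have identical age-specific prevalence but different age structures.

The directly standardised rate weights each age band's rate r_i by a standard population w_i and divides by the total standard population: sum(r_i × w_i) / sum(w_i). The age bands, counts and standard population below are made up. For English public health data, the European Standard Population is the standard commonly used.

```python
def crude_rate(cases, pop):
    """Total cases over total population, ignoring age structure."""
    return sum(cases) / sum(pop)


def direct_standardised_rate(cases, pop, std_pop):
    """Apply each age band's rate to a standard population and pool the result."""
    rates = [c / p for c, p in zip(cases, pop)]
    return sum(r * w for r, w in zip(rates, std_pop)) / sum(std_pop)


# Hypothetical age bands 17-44, 45-64, 65+. Both practices have the same
# age-specific prevalence (1%, 5%, 15%) but different age structures.
std_pop = [4000, 3500, 2500]  # illustrative standard, not the ESP
young = {"cases": [60, 150, 150], "pop": [6000, 3000, 1000]}
old = {"cases": [20, 150, 750], "pop": [2000, 3000, 5000]}

for name, p in [("young", young), ("old", old)]:
    print(name,
          round(crude_rate(p["cases"], p["pop"]), 3),
          round(direct_standardised_rate(p["cases"], p["pop"], std_pop), 3))
# young 0.036 0.059
# old 0.092 0.059
```

The crude rates differ by a factor of more than two purely because of age structure, while the standardised rates are equal. Variables that behave like this could pull practices into different clusters.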
From an inequality perspective, Cluster 1 is of most concern: it appears to have high levels of co-morbidity despite having a relatively young population profile.
London has a different set of phenotypes from the rest of the country: clusters 4 and 8 are found mostly in London.
PubMed search strategy:
("unsupervised machine learning"[MeSH Terms] OR ("unsupervised"[All Fields] AND "machine"[All Fields] AND "learning"[All Fields]) OR "unsupervised machine learning"[All Fields]) OR (("cluster analysis"[MeSH Terms] OR ("cluster"[All Fields] AND "analysis"[All Fields]) OR "cluster analysis"[All Fields]) AND ("population"[MeSH Terms] OR "population"[All Fields] OR "population groups"[MeSH Terms] OR ("population"[All Fields] AND "groups"[All Fields]) OR "population groups"[All Fields]) AND segmentation[All Fields]) AND 2010[PDAT] : 2018[PDAT]
1. Yan S, Kwan YH, Tan CS, Thumboo J, Low LL. A systematic review of the clinical application of data-driven population segmentation analysis. BMC Med Res Methodol. 2018;18(1):121.
2. Leslie HH, Zhou X, Spiegelman D, Kruk ME. Health system measurement: Harnessing machine learning to advance global health. PLoS One. 2018;13(10):e0204958.
3. Cleret de Langavant L, Bayen E, Yaffe K. Unsupervised Machine Learning to Identify High Likelihood of Dementia in Population-Based Surveys: Development and Validation Study. J Med Internet Res. 2018;20(7):e10493.
4. Vuik SI, Mayer E, Darzi A. A quantitative evidence base for population health: applying utilization-based cluster analysis to segment a patient population. Popul Health Metr. 2016;14:44.
5. Demydas T. Consumer segmentation based on the level and structure of fruit and vegetable intake: an empirical evidence for US adults from the National Health and Nutrition Examination Survey (NHANES) 2005-2006. Public Health Nutr. 2011;14(6):1088-95.