library(dplyr)
library(ggplot2)
library(knitr)
library(tidyr)
library(kableExtra)
library(plotly)
library(reshape2)
library(ggridges)
library(tidyverse)
The United Kingdom government's open data website, https://data.gov.uk, provides open data from many different sectors. Analyzing clinical health data can offer important insights into how treatment might be improved and, at times, into disease pathology.
This open data website was chosen for the breadth and depth of the open clinical health data available. The web interface and the data retrieval process are user-friendly. Datasets can be searched using multiple parameters, including a filter that identifies datasets as ‘open government licence (OGL) only’.
Using this filter and the readily available dataset summary information for the “National Diabetes Foot Care Audit 2014-2018” (https://data.gov.uk/dataset/970ebe30-cb7b-4205-9149-f99f0400a881/national-diabetes-foot-care-audit-2014-2018), it was determined that this was an open dataset, available for free use.
db_data_master <- read.csv("/Users/cara/Desktop/dd.csv")
#head(db_data_master)
#colnames(db_data_master)
# Keep the organisation identifiers and the outcome columns of interest,
# then drop STP rows and rows where key fields are marked '*'.
abr_db_data <- db_data_master %>%
  select(
    Parent.Code, Organisation.Type, Parent.Name,
    NDFA.patients..Age.at.assessment..years.,
    NDFA.patients.linked.to.NDA..Duration.of.diabetes..years.,
    Cases.with.any.amputation.in.6.months....,
    Cases.with.SINBAD.Neuropathy.recorded..n.,
    Cases.with.SINBAD.Neuropathy.recorded....,
    Cases.with.FD.admission..debridement..in.6.months....,
    Cases.with.FD.admission..amputation..in.6.months....,
    Cases.with.FD.admission..circulatory.comp..in.6.months....,
    Cases.with.FD.admission..lower.leg.ulcer..in.6.months....,
    Cases.with.FD.admission..decubitus.ulcer..in.6.months....,
    Cases.with.FD.admission..cellulitis..in.6.months....,
    Cases.with.FD.admission..gangrene..in.6.months....,
    Cases.with.FD.admission..osteomyelitis..in.6.months....,
    Cases.with.FD.admission..bacterial...sepsis..in.6.months....
  ) %>%
  filter(
    Organisation.Type != 'STP',
    NDFA.patients..Age.at.assessment..years. != '*',
    Cases.with.any.amputation.in.6.months.... != '*',
    NDFA.patients.linked.to.NDA..Duration.of.diabetes..years. != '*'
  ) %>%
  group_by(Parent.Name)
#head(abr_db_data)
int_db_data <- abr_db_data %>% select("Location" = Parent.Name, "Amputations" = Cases.with.any.amputation.in.6.months....)
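# Convert via as.character() so that a factor column yields its printed values rather than its internal level codes.
int_db_data$Amputations <- as.numeric(as.character(int_db_data$Amputations))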
ordered_amputations <- int_db_data[order(-int_db_data$Amputations),]
ordered_amputations_nodup <- ordered_amputations %>% distinct(Location, .keep_all= TRUE)
highest_amputations <- ordered_amputations_nodup[1:10,]
kable(highest_amputations, booktabs = TRUE, caption = "Table 1: United Kingdom locations with the highest number of diabetes-related amputations in the past 6 months.") %>% kable_styling(latex_options = "striped")
| Location | Amputations |
|---|---|
| Cornwall and the Isles of Scilly | 105 |
| NHS England South West (South West South) | 105 |
| Cornwall Partnership NHS Foundation Trust | 105 |
| Devon | 104 |
| NHS England South East (Hampshire, Isle of Wight and Thames Valley) | 104 |
| NHS England North (Yorkshire and Humber) | 104 |
| Northern Devon Healthcare NHS Trust | 104 |
| Royal Berkshire NHS Foundation Trust | 104 |
| NHS England Midlands and East (Central Midlands) | 103 |
| University Hospitals of Leicester NHS Trust | 103 |
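One caveat when reading Table 1: the amputation column is converted with `as.numeric()`, and the result depends on how `read.csv()` stored that column. If it was read as a factor (the default for text columns before R 4.0, and likely here because some cells contain `*`), `as.numeric()` returns the internal level codes rather than the recorded values. The sketch below uses made-up values, not the audit data, to show the difference. Whether this affected the table above is an assumption that can only be checked against the raw file.

```r
# Hypothetical stand-in for a column that read.csv() stored as a factor.
# The values are invented for illustration only.
pct <- factor(c("2.5", "10.1", "0.4", "2.5"))

# as.numeric() on a factor returns the level codes (position of each
# value among the sorted unique strings), not the values themselves.
codes <- as.numeric(pct)

# Converting to character first recovers the numbers as written.
values <- as.numeric(as.character(pct))

print(codes)
print(values)
```

Reading the file with `stringsAsFactors = FALSE` is another way to avoid the issue, since the column then arrives as character and `as.numeric()` parses it directly.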
Evaluating this dataset gave insight into how several severe diabetes-related symptoms are distributed and how they relate to one another. The heatmap depicts the strength of the correlations between reported cases of three common diabetes-related symptoms and amputation incidence across locations. These results suggest that cellulitis and osteomyelitis incidence may be more strongly related to amputations than gangrene incidence is. This is initially a surprising result, as you might typically associate gangrene with amputations. However, it may be because cellulitis and osteomyelitis, bacterial infections of the skin and bone respectively, are early symptoms that can lead to gangrene. Alternatively, these results may show the weakness of using this type of plot to understand this type of interaction without more information.
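The code for the heatmap is not shown in this report. As a rough sketch of how such a correlation heatmap could be built with `reshape2` and `ggplot2`, the example below uses randomly generated values. The data frame `demo` and its column names are hypothetical placeholders, not the audit columns, and Pearson correlation is an assumption about the measure used.

```r
library(ggplot2)
library(reshape2)

set.seed(1)
n <- 50

# Hypothetical per-location percentages; random values, not the audit data.
demo <- data.frame(
  Amputation    = runif(n, 0, 10),
  Cellulitis    = runif(n, 0, 10),
  Osteomyelitis = runif(n, 0, 10),
  Gangrene      = runif(n, 0, 10)
)

# Pairwise Pearson correlations between all columns.
cor_mat <- cor(demo, use = "pairwise.complete.obs")

# Reshape the matrix to long format: one row per pair of variables.
cor_long <- melt(cor_mat, varnames = c("Var1", "Var2"), value.name = "r")

# One tile per pair, coloured by correlation on a fixed -1 to 1 scale.
heatmap_plot <- ggplot(cor_long, aes(x = Var1, y = Var2, fill = r)) +
  geom_tile() +
  scale_fill_gradient2(limits = c(-1, 1)) +
  labs(x = NULL, y = NULL, fill = "Pearson r")
```

Fixing the colour scale to the full range from -1 to 1 keeps weak correlations from looking strong merely because they are the largest values in the plot.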
In the ridgeline plot, the incidence of several severe diabetes-related symptoms is displayed across the locations with the highest reported number of amputations. This can provide some insight into the type of patient care demands in locations with the highest amputation rates. Locations with more reported amputations are likely to be higher-level facilities. Interestingly, circulatory issues were reported more often among the top amputation sites, while osteomyelitis, gangrene and cellulitis were reported less often. A likely explanation is that early symptoms (i.e. osteomyelitis, gangrene, cellulitis) are reported by patients to community doctors, and the most severe cases are funneled to specialized treatment centers where the amputation is performed.
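The ridgeline plot code is likewise not included in this report. A minimal sketch of one way to draw such a plot with `ggridges` follows. It assumes one row per location and one column per complication. The `demo` data frame, the site names and all of the values are hypothetical.

```r
library(ggplot2)
library(ggridges)
library(tidyr)

set.seed(2)

# Hypothetical percentages for ten top-ranked sites; not the audit data.
demo <- data.frame(
  Location      = paste("Site", 1:10),
  Circulatory   = runif(10, 5, 20),
  Osteomyelitis = runif(10, 0, 8),
  Gangrene      = runif(10, 0, 8),
  Cellulitis    = runif(10, 0, 8)
)

# Long format: one row per location and complication.
long <- pivot_longer(
  demo,
  cols = -Location,
  names_to = "Complication",
  values_to = "Percent"
)

# One density ridge per complication, showing how the percentages
# are spread across the selected sites.
ridge_plot <- ggplot(long, aes(x = Percent, y = Complication, fill = Complication)) +
  geom_density_ridges(alpha = 0.7) +
  theme_ridges() +
  theme(legend.position = "none")
```

With only ten sites per ridge the density estimates are coarse, so the shapes are better read as a rough comparison than as precise distributions.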
While there were numerous learning opportunities in cleaning and organizing the data, the biggest challenge was to find an open dataset that contained health information and was easily accessible. Many datasets were described as ‘open’, but finding a file of raw data was much harder than the main webpage of many of these ‘open data’ websites would suggest.