This is the R Notebook for the story “Avoidable mortality 32% higher in the West Midlands most deprived cities” published in Birmingham Eastside.
It provides with the statistical analysis for the story.
The data comes from the BBC Data Shared Unit, and the correlation analysis that I did for them can be found in the R Notebook called Death correlations.
library(readxl)
library(dplyr)
library(tidyr)
library(ggplot2)
library(knitr)
knit('Death_correlation.Rmd')
##
|
| | 0%
|
|.. | 2%
## ordinary text without R code
##
##
|
|... | 5%
## label: unnamed-chunk-22
##
|
|..... | 7%
## ordinary text without R code
##
##
|
|...... | 10%
## label: unnamed-chunk-23
##
|
|........ | 12%
## ordinary text without R code
##
##
|
|......... | 14%
## label: unnamed-chunk-24
##
|
|........... | 17%
## ordinary text without R code
##
##
|
|............ | 19%
## label: unnamed-chunk-25
##
|
|.............. | 21%
## ordinary text without R code
##
##
|
|............... | 24%
## label: unnamed-chunk-26
##
|
|................. | 26%
## ordinary text without R code
##
##
|
|................... | 29%
## label: unnamed-chunk-27
##
|
|.................... | 31%
## label: unnamed-chunk-28
##
|
|...................... | 33%
## ordinary text without R code
##
##
|
|....................... | 36%
## label: unnamed-chunk-29
##
|
|......................... | 38%
## ordinary text without R code
##
##
|
|.......................... | 40%
## label: unnamed-chunk-30
##
|
|............................ | 43%
## ordinary text without R code
##
##
|
|............................. | 45%
## label: unnamed-chunk-31
##
|
|............................... | 48%
## label: unnamed-chunk-32
##
|
|................................ | 50%
## ordinary text without R code
##
##
|
|.................................. | 52%
## label: unnamed-chunk-33
##
|
|.................................... | 55%
## label: unnamed-chunk-34
##
|
|..................................... | 57%
## label: unnamed-chunk-35
##
|
|....................................... | 60%
## ordinary text without R code
##
##
|
|........................................ | 62%
## label: unnamed-chunk-36
##
|
|.......................................... | 64%
## ordinary text without R code
##
##
|
|........................................... | 67%
## label: unnamed-chunk-37
##
|
|............................................. | 69%
## ordinary text without R code
##
##
|
|.............................................. | 71%
## label: unnamed-chunk-38
##
|
|................................................ | 74%
## ordinary text without R code
##
##
|
|.................................................. | 76%
## label: unnamed-chunk-39
##
|
|................................................... | 79%
## ordinary text without R code
##
##
|
|..................................................... | 81%
## label: unnamed-chunk-40
##
|
|...................................................... | 83%
## ordinary text without R code
##
##
|
|........................................................ | 86%
## label: unnamed-chunk-41
##
|
|......................................................... | 88%
## ordinary text without R code
##
##
|
|........................................................... | 90%
## label: unnamed-chunk-42
##
|
|............................................................ | 93%
## ordinary text without R code
##
##
|
|.............................................................. | 95%
## label: unnamed-chunk-43
##
|
|............................................................... | 98%
## ordinary text without R code
##
##
|
|.................................................................| 100%
## label: unnamed-chunk-44
## [1] "Death_correlation.md"
wm <- england %>% filter(Region == "West Midlands")
colnames(wm)[11]<- "Deprivation"
wm %>% filter(`Local Authority` == "Birmingham")
#Fill down empty rows
wm <- wm %>% group_by(`Local Authority`) %>% fill(Deprivation)
length(which(is.na(wm)))
## [1] 0
wm_persons_avoidable <- wm %>% filter(Sex == "Persons" & `Mortality
` == "Avoidable")
wm_women_avoidable <- wm %>% filter(Sex == "Females" & `Mortality
` == "Avoidable")
wm_men_avoidable <- wm %>% filter(Sex == "Males" & `Mortality
` == "Avoidable")
write.csv(wm_persons_avoidable, "wm_avoidable.csv")
wm_gender_avoidable <- wm %>% filter(`Mortality
` == "Avoidable" & Sex != "Persons")
write.csv(wm_gender_avoidable, "wm_gender_avoidable.csv")
summary(wm_persons_avoidable$`AS Rate`)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 173.9 192.0 219.8 223.3 241.2 295.8
sd(wm_persons_avoidable$`AS Rate`)
## [1] 37.36339
cor.test(wm_persons_avoidable$`AS Rate`, wm_persons_avoidable$Deprivation)
##
## Pearson's product-moment correlation
##
## data: wm_persons_avoidable$`AS Rate` and wm_persons_avoidable$Deprivation
## t = -7.4067, df = 28, p-value = 4.575e-08
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.9078258 -0.6414475
## sample estimates:
## cor
## -0.8136803
plot(wm_persons_avoidable$Deprivation,wm_persons_avoidable$`AS Rate`, main = "Avoidable mortality higher in deprived areas", xlab = "1 most deprived and 211 least deprived", ylab = "Avoidable death rate per 100,000 inhabitants")
abline(lm(wm_persons_avoidable$`AS Rate` ~ wm_persons_avoidable$Deprivation))
boxplot(wm_persons_avoidable$`AS Rate`, horizontal = TRUE, axes = FALSE, main = "Avoidable deaths per 100,000 persons")
text(x=fivenum(wm_persons_avoidable$`AS Rate`), labels = fivenum(wm_persons_avoidable$`AS Rate`), y=1.25)
Who?
The national mean in England is 218 avoidable deaths per 100,000 similar to the West Midlands one. 50% of the local authorities are above that mean.
#Highest death rate
wm_persons_avoidable %>% select(`Local Authority`, `AS Rate`, Deprivation) %>% filter(`AS Rate` >= 241) %>% arrange(desc(`AS Rate`))
#Lowest death rate
wm_persons_avoidable %>% select(`Local Authority`, `AS Rate`, Deprivation) %>% filter(`AS Rate` <= 191) %>% arrange(desc(`AS Rate`))
# Fourth quantile
wm_persons_avoidable %>% filter(`AS Rate` >= 241) %>% group_by(Region) %>% summarise(mean(`AS Rate`))
# 2-4 quantile
wm_persons_avoidable %>% filter(`AS Rate` < 241) %>% group_by(Region) %>% summarise(mean(`AS Rate`))
(272.1125-205.5909)/205.5909*100
## [1] 32.3563
32% higher death rate in the most deprived quantile.
The random variation in the data is higher in small populations. The funnel plots are a useful way to control for that.
wm_funnelplot <- eng_all %>% filter(Region == "West Midlands")
wm_funnelplot <- wm_funnelplot %>% select(`Local Authority`, `AS Rate`, `All ages`)
wm_funnelplot$`AS Rate` <- wm_funnelplot$`AS Rate`/100000
wm_funnelplot$D_SE <- sqrt((wm_funnelplot$`AS Rate`*(1-wm_funnelplot$`AS Rate`)) / (wm_funnelplot$`All ages`))
colnames(wm_funnelplot) <- c("LA", "Death", "Pop", "SE_Death")
## common effect (fixed effect model) -- Mean
death.fem <- weighted.mean(wm_funnelplot$Death, 1/wm_funnelplot$SE_Death^2)
## lower and upper limits for 95% and 99.9% CI, based on FEM estimator
pop.seq <- seq(10000, max(wm_funnelplot$Pop), 10000)
ll95 <- death.fem - 1.96 * sqrt((death.fem*(1-death.fem)) / (pop.seq))
ul95 <- death.fem + 1.96 * sqrt((death.fem*(1-death.fem)) / (pop.seq))
ll999 <- death.fem - 3.29 * sqrt((death.fem*(1-death.fem)) / (pop.seq))
ul999 <- death.fem + 3.29 * sqrt((death.fem*(1-death.fem)) / (pop.seq))
CI <- data.frame(ll95, ul95, ll999, ul999, pop.seq, death.fem)
## draw plot
FP_WM_only <- ggplot(aes(x = Pop, y = Death), data = wm_funnelplot) +
geom_point(shape = 1, aes(text = paste(LA,"<br>","Avoidable death rate: ",(Death*100000))), size = 2) +
geom_line(aes(x = pop.seq, y = ll95, color = "steelblue3"), data = CI, color = "steelblue3") +
geom_line(aes(x = pop.seq, y = ul95, color = "steelblue3"), data = CI, color = "steelblue3")+
geom_line(aes(x = pop.seq, y = ll999, color = "steelblue1"), linetype = "dashed", data = CI, color = "steelblue1") +
geom_line(aes(x = pop.seq, y = ul999, color = "steelblue1"), linetype = "dashed", data = CI, color = "steelblue1") + geom_hline(aes(yintercept = death.fem, color = "olivedrab"), data = CI, color = "olivedrab") + xlim(0,1150000) + scale_y_continuous(labels = function(y) y*100000) + labs(title = "Avoidable deaths in the West Midlands", x = "Local authority population size", y = "Avoidable deaths per 100,000\n\n") + annotate("text",900000,.00390, label="99.9% limits", col="steelblue1", size=3.5, hjust=0)+ annotate("text",900000,.00372, label="95% limits", col="steelblue3", size=3.5, hjust=0) + annotate("text",900000,.00355, label="Regional mean", col="olivedrab", size=3.5, hjust=0) + theme_bw()
FP_WM_only
The blue lines are the CI 95% and 99.9% and the green line is the regional mean. The dots inside the lines are within the “expected” random variation, while the dots outside is more unlikely to be due to the random variation caused by the sample side.
I make the chart interactive to explore the information.
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
chart <- ggplotly(FP_WM_only, tooltip = c("text"))
chart
#To share in plotly
Sys.setenv("plotly_username"="CarmenAguilar")
Sys.setenv("plotly_api_key"="Tdpsdo2VWILpCenOMOQi")
chart_link = api_create(chart, filename = "FunnelPlot")
chart_link
The chart in plotly does not display properly. It does not respect the transformation in the y axis. As that information is in each data point, I removed it to avoid misundertandings.
Few, S., and Rowell, K. (2013) Variation and its discontents funnel plots for fair comparisons, Visual Business Intelligence Newsletter (Oct-Dec) Available at: http://www.perceptualedge.com/articles/visual_business_intelligence/variation_and_its_discontents.pdf [Accessed at July 19]
Goldacre, B. (2011) DIY statistical analysis: experience the thrill of touching real data, The Guardian, Oct. 28. Available at: https://www.theguardian.com/commentisfree/2011/oct/28/bad-science-diy-data-analysis [Accessed at July 19]
Barden, P. (2011) ‘Three-fold variation’ in UK bowel cancer death rates, Plumblum [blog], Sept. 15. Available at: https://pb204.blogspot.com/2011/09/three-fold-variation-in-uk-bowel-cancer.html [Accessed at July 19]
Bradshaw, P. (2011) Power Tools for Aspiring Data Journalists: Funnel Plots in R, Online Journalism Blog [blog] Oct. 31, available at: https://onlinejournalismblog.com/2011/10/31/power-tools-for-aspiring-data-journalists-funnel-plots-in-r/ [Accessed at July 19]
Gil Bellosta, C. (2015) ¿Os lo podéis creer? ¡Funnel plots en la prensa española!, Datanalytics [blog], Dec 4, available at:https://www.datanalytics.com/2015/12/04/os-lo-podeis-creer-funnel-plots-en-la-prensa-espanola/ [Accessed at July 19]
Mohammadian, C. (2016) Edit labels in tooltip for plotly maps using ggplot2 in r: comment. Stackoverflow [blog] Aug 14. Available at: https://stackoverflow.com/questions/38733403/edit-labels-in-tooltip-for-plotly-maps-using-ggplot2-in-r [Accessed at July 19]