Slope index of inequality

Julian Flowers
26 August 2015

Calculating the slope index of inequality (SII) in R

  • SII is a widely used measure of health inequality which summarises the distribution of a health outcome measure (e.g. life expectancy) against a socioeconomic measure such as deprivation
  • When applied to life expectancy, it can be interpreted as the absolute difference in years between the least and most deprived subgroups within a larger group
  • In this example we have extracted male and female life expectancy values (2008-11), IMD scores, and 2013 total populations from the National General Practice Profiles
  • We then calculate, on the basis of general practice, the SII for each CCG for male and female life expectancy in a few lines of code
  • The method can be used to calculate SIIs for any area based on sub-areas, and for any health measure, given the population, health value and deprivation score

Read data

The calculation takes a data frame with area name/code, population, deprivation score, and health measure(s)

sii <- read.csv("~/Downloads/sii.csv")
head(sii)
##   Pr.code                                     CCG   Pop   IMD   Lem   Lef
## 1  A81001 NHS Hartlepool And Stockton-On-Tees CCG  4084 26.09 76.24 81.98
## 2  A81002 NHS Hartlepool And Stockton-On-Tees CCG 20111 28.44 77.12 81.81
## 3  A81003 NHS Hartlepool And Stockton-On-Tees CCG  3542 42.12 75.94 80.95
## 4  A81004                      NHS South Tees CCG  8513 31.01 76.72 81.36
## 5  A81005                      NHS South Tees CCG  7953 13.92 82.03 82.95
## 6  A81006 NHS Hartlepool And Stockton-On-Tees CCG 12221 29.81 76.76 81.40
dim(sii)
## [1] 7891    6

Install and load packages; look at data summary; remove NAs

  • We use dplyr to calculate the necessary variables within each CCG, and the broom package to return the regression output in a tidy format
  • Look at data summary
library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(broom)
library(Hmisc)
## Loading required package: grid
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
## Loading required package: ggplot2
## 
## Attaching package: 'Hmisc'
## 
## The following objects are masked from 'package:dplyr':
## 
##     combine, src, summarize
## 
## The following objects are masked from 'package:base':
## 
##     format.pval, round.POSIXt, trunc.POSIXt, units
library(ggplot2)
sii <- tbl_df(sii)
summary(sii)
##     Pr.code                                              CCG      
##  A81001 :   1   NHS Northern, Eastern And Western Devon CCG: 123  
##  A81002 :   1   NHS Birmingham Crosscity CCG               : 115  
##  A81003 :   1   NHS Cambridgeshire and Peterborough CCG    : 106  
##  A81004 :   1   NHS Sandwell And West Birmingham CCG       : 105  
##  A81005 :   1   NHS Dorset CCG                             :  99  
##  A81006 :   1   NHS Liverpool CCG                          :  94  
##  (Other):7885   (Other)                                    :7249  
##       Pop             IMD             Lem             Lef       
##  Min.   :    0   Min.   : 2.86   Min.   :70.01   Min.   :76.61  
##  1st Qu.: 3700   1st Qu.:13.56   1st Qu.:77.16   1st Qu.:81.66  
##  Median : 6211   Median :21.41   Median :78.79   Median :83.04  
##  Mean   : 7053   Mean   :23.51   Mean   :78.69   Mean   :83.00  
##  3rd Qu.: 9552   3rd Qu.:31.74   3rd Qu.:80.32   3rd Qu.:84.34  
##  Max.   :52386   Max.   :68.47   Max.   :90.07   Max.   :91.88  
##                  NA's   :15      NA's   :260     NA's   :261
sii <- na.omit(sii)
summary(sii)
##     Pr.code                                              CCG      
##  A81001 :   1   NHS Northern, Eastern And Western Devon CCG: 120  
##  A81002 :   1   NHS Birmingham Crosscity CCG               : 112  
##  A81003 :   1   NHS Cambridgeshire and Peterborough CCG    : 104  
##  A81004 :   1   NHS Sandwell And West Birmingham CCG       : 100  
##  A81005 :   1   NHS Dorset CCG                             :  94  
##  A81006 :   1   NHS Liverpool CCG                          :  87  
##  (Other):7610   (Other)                                    :6999  
##       Pop             IMD             Lem             Lef       
##  Min.   :    0   Min.   : 2.86   Min.   :70.01   Min.   :76.61  
##  1st Qu.: 3703   1st Qu.:13.48   1st Qu.:77.16   1st Qu.:81.66  
##  Median : 6210   Median :21.23   Median :78.79   Median :83.04  
##  Mean   : 7058   Mean   :23.38   Mean   :78.69   Mean   :83.00  
##  3rd Qu.: 9564   3rd Qu.:31.43   3rd Qu.:80.31   3rd Qu.:84.34  
##  Max.   :52386   Max.   :68.47   Max.   :90.07   Max.   :91.88  
## 

Calculate mean IMD and life expectancy values per CCG (for use later)

aggsii <- summarise(group_by(sii, CCG), mean(IMD), mean(Lem), mean(Lef))
head(aggsii)
## Source: local data frame [6 x 4]
## 
##                                      CCG mean(IMD) mean(Lem) mean(Lef)
## 1 NHS Airedale, Wharfdale And Craven CCG  24.13765  77.61882  82.06824
## 2                        NHS Ashford CCG  33.34071  78.61429  83.50286
## 3                 NHS Aylesbury Vale CCG  15.43619  80.21000  84.08524
## 4           NHS Barking And Dagenham CCG  21.22400  78.85450  82.74200
## 5                         NHS Barnet CCG  13.09299  80.00672  83.89388
## 6                       NHS Barnsley CCG  29.44405  77.43676  81.88351

Load the cumrank function (written by Hugh Mallinson)

source('~/Downloads/findXvals.R')
cumrank
## function (xvals) 
## {
##     prop <- xvals/sum(xvals)
##     cumprop <- numeric(length(xvals))
##     output <- numeric(length(xvals))
##     cumprop[1] <- prop[1]
##     output[1] <- prop[1]/2
##     for (i in 2:length(xvals)) {
##         cumprop[i] <- sum(prop[1:i])
##         output[i] <- prop[i]/2 + cumprop[i - 1]
##     }
##     FindXVals <- output
## }
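
The function printed above gives each row the midpoint of its share on the cumulative population scale. The same calculation can be sketched without the loop. The name `cumrank_vec` is hypothetical and is not part of findXvals.R. This form also copes with a group that has a single row, where the loop index `2:length(xvals)` would run backwards from 2 to 1.

```r
# Hypothetical vectorised equivalent of cumrank(); the name cumrank_vec
# is an assumption for illustration and is not in findXvals.R.
cumrank_vec <- function(xvals) {
  prop <- xvals / sum(xvals)   # each row's share of the group total
  cumsum(prop) - prop / 2      # cumulative share up to the row's midpoint
}

# Made-up populations, already sorted from least to most deprived
pop <- c(100, 300, 600)
relrank <- cumrank_vec(pop)
print(relrank)

# A single-row group gets the midpoint of the whole scale
single <- cumrank_vec(5)
print(single)
```

Because `cumsum(prop)[i] - prop[i]/2` equals `prop[i]/2 + cumprop[i - 1]`, the two versions agree for any group with two or more rows.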

Calculate relative rank

sii1 <- sii %>% arrange(CCG, IMD) %>% group_by(CCG) %>% mutate(relrank = cumrank(Pop))
head(sii1); tail(sii1)
## Source: local data frame [6 x 7]
## Groups: CCG
## 
##   Pr.code                                    CCG   Pop   IMD   Lem   Lef
## 1  B83006 NHS Airedale, Wharfdale And Craven CCG 10758  7.00 80.45 83.39
## 2  B83021 NHS Airedale, Wharfdale And Craven CCG 12748  9.84 77.68 78.64
## 3  B82028 NHS Airedale, Wharfdale And Craven CCG 13912 10.31 80.36 85.18
## 4  B83002 NHS Airedale, Wharfdale And Craven CCG  4526 10.70 80.17 85.45
## 5  B82099 NHS Airedale, Wharfdale And Craven CCG  4109 11.39 81.02 85.27
## 6  B82007 NHS Airedale, Wharfdale And Craven CCG  9309 12.04 75.77 80.71
## Variables not shown: relrank (dbl)
## Source: local data frame [6 x 7]
## Groups: CCG
## 
##   Pr.code                 CCG   Pop   IMD   Lem   Lef   relrank
## 1  M81073 NHS Wyre Forest CCG  8503 13.82 79.63 85.55 0.6573086
## 2  M81068 NHS Wyre Forest CCG 13106 17.28 78.98 84.99 0.7535265
## 3  M81608 NHS Wyre Forest CCG  2921 18.94 80.71 85.69 0.8248896
## 4  M81090 NHS Wyre Forest CCG  3217 23.17 81.11 85.49 0.8522201
## 5  M81027 NHS Wyre Forest CCG  6922 44.47 76.46 82.30 0.8973658
## 6  M81015 NHS Wyre Forest CCG  8064 50.11 74.99 81.25 0.9640936

Set up variables for regression analysis (for SII for male life expectancy)

sii1 <- sii %>% arrange(CCG, IMD) %>% group_by(CCG) %>% mutate(relrank = cumrank(Pop), 
tot = sum(Pop), prop = Pop/tot, Y = sqrt(prop)*Lem, a = sqrt(prop), b = relrank * a)

head(sii1[,5:12])
## Source: local data frame [6 x 8]
## 
##     Lem   Lef   relrank    tot       prop        Y         a           b
## 1 80.45 83.39 0.0344245 156255 0.06884900 21.10935 0.2623909 0.009032676
## 2 77.68 78.64 0.1096413 156255 0.08158459 22.18775 0.2856302 0.031316860
## 3 80.36 85.18 0.1949506 156255 0.08903395 23.97826 0.2983856 0.058170435
## 4 80.17 85.45 0.2539503 156255 0.02896547 13.64433 0.1701925 0.043220422
## 5 81.02 85.27 0.2815814 156255 0.02629676 13.13843 0.1621628 0.045662013
## 6 75.77 80.71 0.3245176 156255 0.05957569 18.49404 0.2440813 0.079208690
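
The variables `Y`, `a` and `b` can be read as a population-weighted regression of life expectancy on relative rank, with every row multiplied by the square root of its weight. The symbols below ($y_i$ for life expectancy, $p_i$ for `prop`, $r_i$ for `relrank`, $\beta_0$, $\beta_1$) are notation introduced here for illustration and do not appear in the original code.

```latex
\begin{align}
y_i &= \beta_0 + \beta_1 r_i + \varepsilon_i,
  \qquad \text{weight } p_i = \frac{\mathrm{Pop}_i}{\sum_j \mathrm{Pop}_j} \\
\sqrt{p_i}\, y_i &= \beta_0 \sqrt{p_i} + \beta_1 r_i \sqrt{p_i} + \sqrt{p_i}\,\varepsilon_i \\
Y_i &= \beta_0 a_i + \beta_1 b_i + \sqrt{p_i}\,\varepsilon_i
\end{align}
```

Under this reading, $\beta_0$ is carried by `a`, which is why the fitted model has no separate intercept, and the coefficient $\beta_1$ on `b` is the SII.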

Calculate SIIs

siiM <- sii1 %>% group_by(CCG) %>% do(tidy(lm(Y ~ 0 + a + b, data=.))) ##set intercept at 0
 
siiM <- filter(siiM, term == "b") %>% mutate (lci = estimate - 1.96*std.error,
  uci = estimate + 1.96*std.error) %>% 
  select(CCG, estimate, lci, uci) ##extract SII values from regression and calculate 95% CIs
 
head(siiM)
## Source: local data frame [6 x 4]
## Groups: CCG
## 
##                                      CCG  estimate        lci       uci
## 1 NHS Airedale, Wharfdale And Craven CCG -6.388822  -9.381419 -3.396226
## 2                        NHS Ashford CCG -4.618791  -6.727257 -2.510325
## 3                 NHS Aylesbury Vale CCG -7.403199 -10.178371 -4.628027
## 4           NHS Barking And Dagenham CCG -6.277636  -7.384998 -5.170273
## 5                         NHS Barnet CCG -4.050214  -5.146874 -2.953554
## 6                       NHS Barnsley CCG -6.445051  -7.874487 -5.015614
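
The zero-intercept regression on `Y`, `a` and `b` should give the same slope as an ordinary weighted regression of life expectancy on relative rank, with `prop` as the weight. The base-R sketch below checks this on made-up data for a single area. The data frame `toy` and all of its values are assumptions for illustration.

```r
# Made-up data for one area: five sub-areas sorted from least to most deprived
toy <- data.frame(
  Pop = c(4000, 9000, 2500, 7000, 5500),
  Lem = c(81.2, 80.1, 79.4, 77.8, 76.5)
)
toy$prop    <- toy$Pop / sum(toy$Pop)
toy$relrank <- cumsum(toy$prop) - toy$prop / 2   # midpoint cumulative share

# Transformed regression, using the same variable names as the slides
toy$Y <- sqrt(toy$prop) * toy$Lem
toy$a <- sqrt(toy$prop)
toy$b <- toy$relrank * toy$a
fit_transformed <- lm(Y ~ 0 + a + b, data = toy)

# Weighted regression on the untransformed values
fit_weighted <- lm(Lem ~ relrank, data = toy, weights = prop)

sii_transformed <- unname(coef(fit_transformed)["b"])
sii_weighted    <- unname(coef(fit_weighted)["relrank"])
print(c(transformed = sii_transformed, weighted = sii_weighted))
```

The weighted form may be easier to read, while the transformed form keeps every step explicit in the data frame.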

Same for female LE

sii2 <- sii %>% arrange(CCG, IMD) %>% group_by(CCG) %>% mutate(relrank = cumrank(Pop), 
tot = sum(Pop), prop = Pop/tot, Y = sqrt(prop)*Lef, a = sqrt(prop), b = relrank * a)
siiF <- sii2 %>% group_by(CCG) %>% do(tidy(lm(Y ~ 0 + a + b, data=.))) ##set intercept at 0
 
siiF <- filter(siiF, term == "b") %>% mutate (lci = estimate - 1.96*std.error,
  uci = estimate + 1.96*std.error) %>% 
  select(CCG, estimate, lci, uci) ##extract SII values from regression and calculate 95% CIs

head(siiF)
## Source: local data frame [6 x 4]
## Groups: CCG
## 
##                                      CCG   estimate       lci         uci
## 1 NHS Airedale, Wharfdale And Craven CCG -3.9265523 -7.753242 -0.09986274
## 2                        NHS Ashford CCG -0.2193803 -1.826304  1.38754370
## 3                 NHS Aylesbury Vale CCG -3.7221070 -5.674428 -1.76978574
## 4           NHS Barking And Dagenham CCG -5.2663517 -6.472143 -4.06056026
## 5                         NHS Barnet CCG -3.2725833 -4.340735 -2.20443173
## 6                       NHS Barnsley CCG -4.4419603 -5.805199 -3.07872185
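
The multiplier 1.96 used for both sexes assumes a normal sampling distribution. For a CCG with few practices there are few residual degrees of freedom, and an interval based on a t quantile is wider. The base-R sketch below compares the two on made-up data. The data frame `toy` is an assumption for illustration, and `confint()` is the standard t-based interval for an `lm` fit.

```r
# Made-up data for a small area with six sub-areas
toy <- data.frame(
  relrank = c(0.05, 0.20, 0.40, 0.60, 0.80, 0.95),
  le      = c(81.0, 80.4, 79.1, 78.8, 77.2, 76.9),
  prop    = c(0.10, 0.20, 0.20, 0.20, 0.20, 0.10)
)
fit <- lm(le ~ relrank, data = toy, weights = prop)

est <- coef(summary(fit))["relrank", "Estimate"]
se  <- coef(summary(fit))["relrank", "Std. Error"]

# Normal approximation, as used in the slides
ci_normal <- c(est - 1.96 * se, est + 1.96 * se)

# t-based interval using the residual degrees of freedom of the fit
t_mult <- qt(0.975, df = df.residual(fit))
ci_t   <- c(est - t_mult * se, est + t_mult * se)

print(rbind(normal = ci_normal, t = ci_t))
```

With many practices per CCG the two intervals are close. The difference matters mainly for the smallest groups.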

Extract SIIs and plot

tidy(model)
##          term  estimate std.error statistic      p.value
## 1 (Intercept) -1.704137 0.1652248 -10.31405 2.156211e-20
## 2           x  0.886490 0.0427591  20.73220 2.064050e-52

Compare distributions

##            x         y
## 1 -3.9265523 -6.388822
## 2 -0.2193803 -4.618791
## 3 -3.7221070 -7.403199
## 4 -5.2663517 -6.277636
## 5 -3.2725833 -4.050214
## 6 -4.4419603 -6.445051
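
The code that built `model` and this x/y data frame is not shown in the slides. The printed values match the female SII estimates (x) and the male SII estimates (y). The sketch below assumes `model` was a simple linear regression of y on x. It refits that form on only the six rows printed above, so its coefficients will differ from the `tidy(model)` output for all CCGs. The names `cmp` and `model6` are hypothetical.

```r
# The six rows printed above: x = female SII, y = male SII
cmp <- data.frame(
  x = c(-3.9265523, -0.2193803, -3.7221070, -5.2663517, -3.2725833, -4.4419603),
  y = c(-6.388822, -4.618791, -7.403199, -6.277636, -4.050214, -6.445051)
)

# Assumed form of the hidden model: male SII regressed on female SII
model6 <- lm(y ~ x, data = cmp)
print(coef(model6))

# Compare the two distributions side by side
print(summary(cmp))
```

In these six rows the male SII is further from zero than the female SII on average, which points to a steeper deprivation gradient in male life expectancy.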

Show values

## Source: local data frame [6 x 11]
## 
##                                      CCG  estimate        lci       uci
## 1 NHS Airedale, Wharfdale And Craven CCG -6.388822  -9.381419 -3.396226
## 2                        NHS Ashford CCG -4.618791  -6.727257 -2.510325
## 3                 NHS Aylesbury Vale CCG -7.403199 -10.178371 -4.628027
## 4           NHS Barking And Dagenham CCG -6.277636  -7.384998 -5.170273
## 5                         NHS Barnet CCG -4.050214  -5.146874 -2.953554
## 6                       NHS Barnsley CCG -6.445051  -7.874487 -5.015614
## Variables not shown: CCG.1 (fctr), estimate.1 (dbl), lci.1 (dbl), uci.1
##   (dbl), mean(IMD) (dbl), mean(Lem) (dbl), mean(Lef) (dbl)
## [1] 209  11
## Classes 'tbl_df', 'tbl' and 'data.frame':    209 obs. of  11 variables:
##  $ CCG       : Factor w/ 209 levels "NHS Airedale, Wharfdale And Craven CCG",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ estimate  : num  -6.39 -4.62 -7.4 -6.28 -4.05 ...
##  $ lci       : num  -9.38 -6.73 -10.18 -7.38 -5.15 ...
##  $ uci       : num  -3.4 -2.51 -4.63 -5.17 -2.95 ...
##  $ CCG.1     : Factor w/ 209 levels "NHS Airedale, Wharfdale And Craven CCG",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ estimate.1: num  -3.927 -0.219 -3.722 -5.266 -3.273 ...
##  $ lci.1     : num  -7.75 -1.83 -5.67 -6.47 -4.34 ...
##  $ uci.1     : num  -0.0999 1.3875 -1.7698 -4.0606 -2.2044 ...
##  $ mean(IMD) : num  24.1 33.3 15.4 21.2 13.1 ...
##  $ mean(Lem) : num  77.6 78.6 80.2 78.9 80 ...
##  $ mean(Lef) : num  82.1 83.5 84.1 82.7 83.9 ...