This analysis extends an earlier local authority level analysis on school readiness. That analysis aimed to find local authorities that have higher rates of child development at five years than would be expected based on the local authority’s level of deprivation, as measured by its IMD 2015 score.
It found that one local authority (Trafford) appeared to perform better than expected based on deprivation alone, and that local authority appeared to underperform when considering those children eligible for free school meals.
However, it is possible that there is important variation within local authorities, and analysing at local authority level misses this.
Here I will repeat the analysis using electoral wards nested within local authorities. This multilevel approach should help to tease out how much school readiness varies between versus within local authorities.
Obviously, the ideal approach would involve individual-level data. However it would require an enormous sample to look at individuals within wards within schools. This probably isn’t feasible, so I’ll continue with ecological analyses.
The aim of this project is to identify wards and local authorities that are performing better than would be expected based on deprivation alone.
The objectives that support this aim are:
School readiness data is available from PHE’s Local Health website. That website includes an option to download all the indicator data. I haven’t found a way to get a url for the data to allow it all to be downloaded in the code, so I’ve had to do that part manually.
As in the local authority level analysis I’m going to use the overall IMD score for simplicity. One issue with the overall score is that it includes an education domain. While the indicators underlying this do not include child development at 5 years (the earliest indicator used is key stage 2, which is ages 7-11), there is still a risk of circularity in the analysis.
# Load packages
require(tidyverse)
require(knitr)
require(cowplot)
require(lme4)
# Load the ward-level indicator data from www.localhealth.org.uk
sr_wards <- read.csv("LocalHealth_All_indicators_Ward_data.csv")
# Find indicators to use
sr_wards %>%
select(Indicator.ID,
Indicator.Name) %>%
distinct(Indicator.ID,
.keep_all = TRUE) %>%
filter(grepl("[Dd]eprivation|[Cc]hild [Dd]evelopment",
Indicator.Name))
## Indicator.ID
## 1 93275
## 2 93268
## 3 93265
## 4 93094
## 5 93279
## Indicator.Name
## 1 Index of Multiple Deprivation Score 2015, IMD
## 2 Income deprivation, English Indices of Deprivation 2015
## 3 Child Development at age 5 (%)
## 4 Child Poverty, English Indices of Deprivation 2015, IDACI
## 5 Older People in Deprivation, English Indices of Deprivation 2015, IDAOPI
# Use indicators 93275 (IMD 2015) and 93265 (child development)
sr_wards <- sr_wards %>%
filter(Indicator.ID %in% c(93275, 93265),
Area.Type == "Ward") %>%
select(ward_code = Area.Code,
ward_name = Area.Name,
la_name = Parent.Name,
la_code = Parent.Code,
Indicator.Name,
Value) %>%
spread(Indicator.Name,
Value) %>%
rename(child_dev = 5,
imd = 6)
# Get local authority to region lookup
if(!file.exists("la_to_region_lookup.csv")){
url <- "http://geoportal1-ons.opendata.arcgis.com/datasets/c457af6314f24b20bb5de8fe41e05898_0.csv"
download.file(url = url,
destfile = "la_to_region_lookup.csv",
method = "curl")
}
lookup <- read.csv("la_to_region_lookup.csv",
stringsAsFactors = FALSE,
header = TRUE) %>%
select(la_code = LAD17CD,
region_code = RGN17CD,
region_name = RGN17NM)
# Merge region data with school readiness data
sr_wards <- merge(sr_wards, lookup,
by = "la_code")
kable(head(sr_wards))
| la_code | ward_code | ward_name | la_name | child_dev | imd | region_code | region_name |
|---|---|---|---|---|---|---|---|
| E06000001 | E05008949 | Manor House | Hartlepool | 58.60565 | 48.978 | E12000001 | North East |
| E06000001 | E05008950 | Rural West | Hartlepool | 70.92100 | 10.962 | E12000001 | North East |
| E06000001 | E05008945 | Foggy Furze | Hartlepool | 53.38994 | 31.752 | E12000001 | North East |
| E06000001 | E05008943 | De Bruce | Hartlepool | 56.04470 | 42.163 | E12000001 | North East |
| E06000001 | E05008946 | Hart | Hartlepool | 67.79412 | 10.456 | E12000001 | North East |
| E06000001 | E05008944 | Fens and Rossmere | Hartlepool | 63.45931 | 18.191 | E12000001 | North East |
Unfortunately, it doesn’t look like denominator data is available, so the analysis will have to be unweighted (unlike the local-authority level analysis). I assume this is due to small numbers and risk of identifying individuals.
It is also worth noting that child development data is 2013/14 and IMD 2015 is based on data aggregated across a number of years up to 2008-2013.
It’s always good to plot the data first. While the overall analysis will use a multilevel approach, here I’ll fit a simple linear regression to allow a trend-line.
# Fit a simple linear regression model
m1 <- lm(child_dev ~ imd,
data = sr_wards)
# Make a scatter plot
g1 <- ggplot(data = sr_wards,
aes(x = imd,
y = child_dev)) +
geom_point(size = 0.5,
alpha = 0.5) +
geom_abline(intercept = m1$coefficients[1],
slope = m1$coefficients[2],
colour = "red") +
theme_bw() +
scale_colour_brewer(palette = "Set2") +
guides(alpha = FALSE) +
labs(title = "Child development and deprivation",
subtitle = "English wards, 2013/14",
x = "IMD (2015) score",
y = "% 'good' development at 5 years",
caption = "Data source: PHE\nSchool readiness data for 2013/14\nIMD data is IMD 2015")
g1
# Print a summary of the linear regression
summary(m1)
##
## Call:
## lm(formula = child_dev ~ imd, data = sr_wards)
##
## Residuals:
## Min 1Q Median 3Q Max
## -35.804 -5.197 0.335 5.365 29.485
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 70.159614 0.172697 406.26 <2e-16 ***
## imd -0.420908 0.007778 -54.12 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.781 on 7416 degrees of freedom
## (27 observations deleted due to missingness)
## Multiple R-squared: 0.2831, Adjusted R-squared: 0.283
## F-statistic: 2929 on 1 and 7416 DF, p-value: < 2.2e-16
This preliminary look at the data suggests that:
My approach here is to model the wards (level 1) nested within local authorities (level 2). I’ll start with a random intercepts model, and proceed to a random slopes model. The latter model allows local authority performance to be compared both in terms of their average child development at 5 (intercept) and the extant to which child development varies with deprivation. Both of these are relevant when judging an area’s performance, so it’s worth testing whether the slopes or intercepts vary substantially.
First we can plot the data fitting separate linear slopes for each local authority area to get a sense of what’s going on:
# Plot the data
g2 <- ggplot(data = sr_wards,
aes(x = imd,
y = child_dev,
colour = as.factor(region_name),
group = as.factor(la_name))) +
#geom_point(size=.7,
# alpha=.5,
# position = "jitter") +
geom_smooth(method = lm,
se = FALSE,
alpha = 0.5,
size = 0.5) +
geom_abline(intercept = m1$coefficients[1],
slope = m1$coefficients[2],
colour = "red") +
theme_linedraw() +
guides(colour = FALSE) +
scale_colour_viridis_d() +
labs(title = "Child development and deprivation",
subtitle = "English wards, 2013/14",
x = "IMD (2015) score",
y = "% 'good' development at 5 years",
caption = "Data source: PHE\nSchool readiness data for 2013/14\nIMD data is IMD 2015")
g2
The plot is messy, but it suggests that:
We can test whether a random slopes model improves the fit of the model to the data by fitting a random intercepts and a random intercepts and slopes models and comparing the fits:
# Loading packages
require(lme4)
# Random intercepts model
m2 <- lmer(child_dev ~ imd + (1 | la_name),
data = sr_wards,
REML = FALSE)
# Random slopes model
m3 <- lmer(child_dev ~ imd + (1 + imd | la_name),
data = sr_wards,
REML = FALSE)
anova(m3, m2)
## Data: sr_wards
## Models:
## m2: child_dev ~ imd + (1 | la_name)
## m3: child_dev ~ imd + (1 + imd | la_name)
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## m2 4 49642 49669 -24817 49634
## m3 6 49511 49553 -24750 49499 134.35 2 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The ANOVA suggests that there is evidence that the slopes do vary across local authorities, and we’re justified in using a random slopes model.
# Get a summary of the random slopes model
summary(m3)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: child_dev ~ imd + (1 + imd | la_name)
## Data: sr_wards
##
## AIC BIC logLik deviance df.resid
## 49511.2 49552.7 -24749.6 49499.2 7412
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.4708 -0.6147 0.0207 0.6315 4.6813
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## la_name (Intercept) 21.43835 4.6302
## imd 0.03359 0.1833 -0.43
## Residual 40.30390 6.3485
## Number of obs: 7418, groups: la_name, 326
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 70.90940 0.32192 220.27
## imd -0.47122 0.01449 -32.52
##
## Correlation of Fixed Effects:
## (Intr)
## imd -0.616
## convergence code: 0
## Model failed to converge with max|grad| = 0.0247414 (tol = 0.002, component 1)
The output an average effect of IMD score on child development is a 0.47 decrease in the % of children scoring ‘good’ development at 5 years for every unit increase in IMD score (which is pretty meaningless, given that IMD is a composite-of-composites), and this is not wildly different than the slope obtained from a simple linear regression (0.42).
Another thing to note is the correlation between the intercept and slope reisduals is -0.43, which suggests that those with the bigger positive intercept residuals (those that have higher rates of child development overall) tend to have steeper negative slopes (tend to perform worse in terms of equity). This suggests a trade off in terms of overall performance and equity. We can illustrate this in the plot below:
# Get residuals
residuals <- ranef(m3, condVar = TRUE)[[1]]
residuals <- rownames_to_column(residuals) %>%
rename(la_name = rowname,
intercept = 2,
slope = imd)
# Plot the residuals
g3 <- qplot(x = intercept,
y = slope,
data = residuals,
geom = "point") +
geom_smooth(method = "lm",
colour = "red",
se = FALSE) +
geom_hline(yintercept = 0) +
geom_vline(xintercept = 0) +
theme_bw() +
labs(title = "Local authority slope and intercept residuals",
x = "Intercept",
y = "Slope",
caption = "Plot of intercept and slope residuals for 326 local authorities\nData source: PHE Local Health")
g3
Points here represent local authority areas. The negative sloping trendline shows that, on average, areas with better overall performance have worse equity (i.e. steeper slopes). However, those points in the upper right quadrant represent local authorities whose intercept and slope residuals are both positive - i.e. those that both do better than expected overall, and do well in terms of equity. The number of points in that quadrant suggests that it is possible to do well both in terms of overall child development, and in terms of equity.
The next step is to extract the local authority intercepts and slopes to identify local authorities that perform well in terms of both overall performance (i.e. large positive intercepts) and equity (i.e. positive slope terms) as well as identifying high performing wards.
There is likely to be variation in child development within as well as between local authorities. This means we might be interested in the ward-level residuals to find small areas that are doing particularly well. This might help us to identify factors at the smallest area level that influence child development. We can get a table of these too:
# Get a table of ward-level residuals
ward_resids <- residuals(m3, level = 2)
# Add the residuals to the ward-level data
sr_wards <- sr_wards %>%
filter(!is.na(imd),
!is.na(child_dev)) %>%
mutate(residual = ward_resids)
# Tabulate top 20 standardised residuals for the North West
ward_resid_table <- sr_wards %>%
select(ward_name,
la_name,
region_name,
residual) %>%
mutate(residual = residual / sd(residual)) %>%
filter(region_name == "North West") %>%
arrange(desc(residual))
kable(head(ward_resid_table,
n = 20))
| ward_name | la_name | region_name | residual |
|---|---|---|---|
| Bamber Bridge West | South Ribble | North West | 2.737936 |
| Hensingham | Copeland | North West | 2.657819 |
| Dalton | Allerdale | North West | 2.481971 |
| Broughton St Bridget’s | Allerdale | North West | 2.427813 |
| Harbour | Copeland | North West | 2.380902 |
| Windermere Town | South Lakeland | North West | 2.296096 |
| Staveley-in-Westmorland | South Lakeland | North West | 2.285754 |
| Sudell | Blackburn with Darwen | North West | 2.261724 |
| Great Corby and Geltsdale | Carlisle | North West | 2.261308 |
| Windermere Bowness South | South Lakeland | North West | 2.216826 |
| Hawcoat | Barrow-in-Furness | North West | 2.215149 |
| Priory | Trafford | North West | 2.178815 |
| Marsh | Allerdale | North West | 2.173724 |
| Preston Rural North | Preston | North West | 2.053989 |
| Wilmslow West and Chorley | Cheshire East | North West | 2.042569 |
| Pilkington Park | Bury | North West | 1.986735 |
| Samlesbury & Walton | South Ribble | North West | 1.979850 |
| Macclesfield Tytherington | Cheshire East | North West | 1.936555 |
| Windermere Bowness North | South Lakeland | North West | 1.891614 |
| Hayton | Carlisle | North West | 1.890295 |
There are 3789 wards with positive residuals, 492 of which are in the North West region.
The above analysis suggests that:
It’s worth stressing that this analysis is descriptive only. Even so, there are a number of drawbacks to this analysis. These include: