All data was obtained for 2015 (the most recent year available at the time of writing) from https://data.gov.uk/dataset/road-accidents-safety-data. All shapefiles were sourced from the Office for National Statistics.
library("dplyr") # For data reshaping / summarising
library("tidyr") # Data reshaping
library("lubridate") # For handling dates
library("ggplot2") # For data visualisation
library("viridis") # Colour palettes
library("knitr") # Contains the kable function
library("leaflet") # Interactive maps
library("leaflet.extras") # For heatmaps
library("htmltools") # For the htmlEscape function used by leaflet for labels
library("zoo") # Has the rollmeans function
library("sp") # Spatial stuff
library("rgdal") # Spatial stuff
library("rgeos") # Spatial stuff
library("maptools") # Spatial stuff
#library("shiny") # Interactivity
theme_set(theme_bw()) # Set ggplot default theme
We exclude Scotland straight away as the shapefile we’re using later on doesn’t include it.
accidents <- read.csv("RoadSafetyData_2015\\Accidents_2015.csv", stringsAsFactors = FALSE)
# Exclude Scotland
accidents <- accidents %>% filter(substring(Local_Authority_.Highway.,1,1)!="S")
Here we assign lookup values to some of the coded variables. Note that I haven’t done this for all coded variables, just the ones I’m looking at in this document.
# Accident severity ----------
accidents$Accident_Severity <- ifelse(accidents$Accident_Severity==1,"Fatal",
ifelse(accidents$Accident_Severity==2,"Serious",
ifelse(accidents$Accident_Severity==3,"Slight",NA
)))
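# A more compact alternative sketch using dplyr::recode (not run; shown for reference):
# accidents$Accident_Severity <- recode(accidents$Accident_Severity,
#                                       `1`="Fatal", `2`="Serious", `3`="Slight")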
# Road Type ----------
roadtype <- read.csv("RoadSafetyData_2015\\lookups\\roadtype.csv", stringsAsFactors=F)
accidents <- accidents %>%
left_join(roadtype, by=c("Road_Type"="code"))
rm(roadtype)
# Day of week Name ----------
# Create a function first
name_the_days <- function(weekday_number){
factor(
ifelse(weekday_number==1,"Sunday",
ifelse(weekday_number==2,"Monday",
ifelse(weekday_number==3,"Tuesday",
ifelse(weekday_number==4,"Wednesday",
ifelse(weekday_number==5,"Thursday",
ifelse(weekday_number==6,"Friday",
ifelse(weekday_number==7,"Saturday",NA
))))))),
levels=c("Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday"))}
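# An equivalent, more compact definition (a sketch; behaviour should be identical):
# name_the_days <- function(weekday_number){
#   factor(weekday_number, levels=1:7,
#          labels=c("Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday"))}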
# Run the name_the_days function
accidents$Day_of_Week_Name <- name_the_days(accidents$Day_of_Week)
# Change Date field into an actual date ----------
accidents$Date <- dmy(accidents$Date) # Already character thanks to stringsAsFactors=FALSE
# Create a month number field from the Date field ----------
accidents$Month_Number <- factor(month(accidents$Date))
We need to do a bit of work here as the geocoding of the data is a little off. First we'll load a shapefile from the ONS and try to match all the codes in the accidents data to it. Any that don't match will be assigned to a local authority based on a spatial join. It's a little fiddly but works quite well. I'm not certain whether there are any special rules around how accidents are allocated to local authorities; hopefully a lat/lon location falling within an authority's borders will match that definition, but these things can sometimes be more complex than expected.
# Load County & UA shapefile
# We will use this to check the codes in the dataset
# and then to do a spatial join if we have any failures
CTYUA2015_BGC <- readOGR(dsn="shp\\Counties_and_Unitary_Authorities_December_2015_GCB", layer="Counties_and_Unitary_Authorities_December_2015_Generalised_Clipped_Boundaries_in_England_and_Wales", stringsAsFactors=FALSE)
## OGR data source with driver: ESRI Shapefile
## Source: "shp\Counties_and_Unitary_Authorities_December_2015_GCB", layer: "Counties_and_Unitary_Authorities_December_2015_Generalised_Clipped_Boundaries_in_England_and_Wales"
## with 174 features
## It has 6 fields
## Integer64 fields read as strings: objectid
# Ensure the layer uses British National Grid (EPSG:27700)
CTYUA2015_BGC <- spTransform(CTYUA2015_BGC, CRS("+init=epsg:27700"))
# We should now be able to check whether all the Local Authority Highway codes will match this dataset
CTYUA2015_Codes <- unique(CTYUA2015_BGC@data$ctyua15cd)
accidents %>%
filter(!Local_Authority_.Highway. %in% CTYUA2015_Codes) %>%
group_by(Local_Authority_.Highway.) %>%
summarise("n"=n())
## # A tibble: 3 x 2
## Local_Authority_.Highway. n
## <chr> <int>
## 1 E06000048 716
## 2 E08000020 517
## 3 EHEATHROW 39
# We find that we have three local authorities which will need fixing
# Two of these are due to boundary changes (Gateshead and Northumberland)
# The remaining one is some odd quirk of this dataset and relates to Heathrow
# We'll solve these by assigning them using a spatial join and a shapefile
# Pull out the unique missing points so we can do a spatial join
# (group_by plus an empty summarise() returns one row per coordinate pair)
missing_la_points <- accidents %>%
filter(!Local_Authority_.Highway. %in% CTYUA2015_Codes) %>%
group_by(Location_Easting_OSGR,Location_Northing_OSGR) %>%
summarise() %>%
ungroup()
# Promote the data frame to spatial points using easting/northing as the coordinates
coordinates(missing_la_points) <- ~Location_Easting_OSGR+Location_Northing_OSGR
# Add a CRS to the points
proj4string(missing_la_points) <- CRS("+init=epsg:27700")
The following plot shows where the erroneous area codes fall. As we’d expect, we have a cluster in the north east over Northumberland and Gateshead and another in the south on top of Heathrow airport.
plot(CTYUA2015_BGC,border="grey")
plot(missing_la_points,col=rgb(1, 0, 0, 0.25), add=TRUE)
Now that we have the problem data in the correct format, we can perform a spatial join to identify the correct Local Authority code for each point.
# Spatial join! over() returns, for each point, the attributes of the polygon it falls within
assigned_missing_la_points <- over(missing_la_points,CTYUA2015_BGC)
# Attach the spatial join results to the original missing points data frame
# (over() preserves point order, so cbind is safe)
assigned_missing_la_points <- cbind(as.data.frame(missing_la_points),assigned_missing_la_points)
# Get rid of everything except the fields we need
assigned_missing_la_points <- assigned_missing_la_points %>%
select(ctyua15cd,Location_Easting_OSGR,Location_Northing_OSGR)
# Join the results back on to the accidents dataset
accidents <- accidents %>% left_join(assigned_missing_la_points)
## Joining, by = c("Location_Easting_OSGR", "Location_Northing_OSGR")
# Use ifelse to fill in the missing la codes in the new ctyua15cd field
accidents$ctyua15cd <- ifelse(is.na(accidents$ctyua15cd),accidents$Local_Authority_.Highway.,accidents$ctyua15cd)
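# Equivalent one-liner using dplyr::coalesce (a sketch):
# accidents$ctyua15cd <- coalesce(accidents$ctyua15cd, accidents$Local_Authority_.Highway.)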
# Add the names
CTYUA2015_LKP <- unique(data.frame("ctyua15cd"=CTYUA2015_BGC@data$ctyua15cd,"ctyua15nm"=CTYUA2015_BGC@data$ctyua15nm,stringsAsFactors=FALSE))
accidents <- accidents %>%
left_join(CTYUA2015_LKP)
## Joining, by = "ctyua15cd"
There is an error in the dataset meaning that the day of the week needs correcting for two dates: the 13th May and the 11th November. We can see below that each of these dates has records tagged as both Tuesday and Wednesday.
accidents %>%
filter(Date %in% as.Date(c("2015-05-13","2015-11-11"))) %>%
group_by(Date,Day_of_Week,Day_of_Week_Name) %>%
summarise("n"=n())
## # A tibble: 4 x 4
## # Groups: Date, Day_of_Week [?]
## Date Day_of_Week Day_of_Week_Name n
## <date> <int> <fctr> <int>
## 1 2015-05-13 3 Tuesday 1
## 2 2015-05-13 4 Wednesday 447
## 3 2015-11-11 3 Tuesday 1
## 4 2015-11-11 4 Wednesday 390
We fix this by using lubridate to derive these values from the date field and then re-running our name_the_days function used previously.
accidents$Day_of_Week <- wday(accidents$Date)
accidents$Day_of_Week_Name <- name_the_days(accidents$Day_of_Week)
# Check that we fixed them
accidents %>%
filter(Date %in% as.Date(c("2015-05-13","2015-11-11"))) %>%
group_by(Date,Day_of_Week,Day_of_Week_Name) %>%
summarise("n"=n())
## # A tibble: 2 x 4
## # Groups: Date, Day_of_Week [?]
## Date Day_of_Week Day_of_Week_Name n
## <date> <dbl> <fctr> <int>
## 1 2015-05-13 4 Wednesday 448
## 2 2015-11-11 4 Wednesday 391
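As an aside, lubridate can produce these labels directly; a sketch (the labels are locale-dependent, which is one reason to prefer the explicit name_the_days function above):
# accidents$Day_of_Week_Name <- wday(accidents$Date, label=TRUE, abbr=FALSE)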
Now things have been cleaned up a touch, let’s have a quick look at the dataset. We’ll start by just looking at the volume of accidents cut by a few different variables.
accidents %>%
group_by(Date,Month_Number) %>%
summarise("n"=n()) %>%
ggplot(data=., aes(Month_Number,n, fill=Month_Number)) +
geom_boxplot() +
scale_fill_viridis(option="viridis", discrete=T) +
geom_point(alpha=0.5) +
labs(title="Number of Accidents by Month",subtitle="Each point relates to one day of the year",
x="Month Number",y="Number of Accidents",fill="Month Number")
Interesting that the vast majority take place on roads with a 30mph speed limit (though this doesn't mean that the vehicles involved were actually travelling within that limit, of course!).
accidents %>%
group_by(road_type_name,Speed_limit) %>%
summarise("n"=n()) %>%
ggplot(data=.,aes(road_type_name,n,fill=factor(Speed_limit))) +
geom_col() +
scale_fill_viridis(option="viridis", discrete=T) +
coord_flip() +
labs(title="Number of Accidents by Road Type and Speed Limit",
x="Road Type",y="Number of Accidents",fill="Speed Limit")
accidents %>%
group_by(Date,Day_of_Week_Name, Month_Number) %>%
summarise("n"=n()) %>%
ggplot(data=., aes(Day_of_Week_Name,n, fill=Day_of_Week_Name)) +
geom_boxplot() +
scale_fill_viridis(option="viridis", discrete=T) +
geom_point(alpha=0.5) +
labs(title="Number of Accidents by Day of the Week",subtitle="Each point relates to one day of the year",
x="Weekday",y="Number of Accidents",fill="Weekday")
accidents %>%
group_by(Date,Day_of_Week_Name, Month_Number) %>%
summarise("n"=n()) %>%
group_by(Day_of_Week_Name, Month_Number) %>%
summarise("n"=mean(n)) %>%
ggplot(data=., aes(Month_Number,Day_of_Week_Name, fill=n)) +
geom_tile() +
scale_fill_viridis(option="plasma") +
coord_fixed() +
labs(title="Mean Number of Daily Accidents by Day of the Week and Month",
x="Month Number",y="Weekday",fill="Mean Daily Accidents")
We can easily check for temporal patterns, though I won't be going into any depth here. An ARIMA analysis would be appropriate, but for now some rolling means will do.
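If you did want to take the modelling a step further, here's a minimal ARIMA sketch using the forecast package (an assumption on my part; it isn't among the libraries loaded above):
library("forecast") # Assumed installed; not loaded earlier
daily_counts <- accidents %>%
group_by(Date) %>%
summarise("n"=n())
daily_ts <- ts(daily_counts$n, frequency=7) # Treat the week as the seasonal period
fit <- auto.arima(daily_ts)
plot(forecast(fit, h=28)) # Project four weeks ahead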
We want to see where Bank holidays fell in 2015 in case there's a link with the volume of accidents. You can get the list from https://www.gov.uk/bank-holidays.
bank_holidays <- data.frame("Date"=as.Date(c("2015-01-01",
"2015-04-03",
"2015-04-06",
"2015-05-04",
"2015-05-25",
"2015-08-31",
"2015-12-25",
"2015-12-28")))
Let's also generate an overall average and Poisson confidence interval to get a feel for the variance here. We'll approximate the Poisson standard error as the square root of the mean daily accident count, giving limits of mean ± 1.96 × √mean.
(time_series_limits <- accidents %>%
group_by(Date) %>%
summarise("n"=n()) %>%
group_by() %>%
summarise(
"mean_accidents"=mean(n)) %>%
mutate(
"se"=sqrt(mean_accidents),
"UCL"=mean_accidents+(1.96*se),
"LCL"=mean_accidents-(1.96*se)
))
## # A tibble: 1 x 4
## mean_accidents se UCL LCL
## <dbl> <dbl> <dbl> <dbl>
## 1 360.526 18.98752 397.7416 323.3105
Now let's look at what happens over the year; the rolling average should iron out the weekly pattern we saw earlier. However, there are still some unusual patterns in the series.
accidents %>%
group_by(Date) %>%
summarise("n"=n()) %>%
ungroup() %>%
mutate("Rollmean_Accidents" = rollmean(n,
k = 7,
fill = NA,
align = "right"
)) %>%
ggplot(data=., aes(Date,n)) +
geom_point(size=0.75,alpha=0.6) +
geom_vline(data=bank_holidays,aes(xintercept=as.numeric(Date)), linetype=1,size=2,col="lightblue",alpha=0.5) +
geom_hline(data=time_series_limits,aes(yintercept=mean_accidents), linetype=2,size=0.5,col="red") +
geom_hline(data=time_series_limits,aes(yintercept=UCL), linetype=2,size=0.5,col="grey") +
geom_hline(data=time_series_limits,aes(yintercept=LCL), linetype=2,size=0.5,col="grey") +
geom_line(alpha=0.6) +
geom_line(aes(Date,Rollmean_Accidents),col="red",size=1) +
scale_x_date(date_breaks="1 month", date_labels="%b-%y") +
theme(panel.grid.minor = element_blank(),panel.grid.major.y = element_blank()) +
labs(title="Time Series of Number of Accidents by Date",
x="Date",y="Number of Accidents")
What happens to accident timings over the course of a week?
weekday_hour_summary <- accidents %>%
mutate("Hour"=as.numeric(substr(as.character(Time),1,2))) %>%
group_by(Date,Day_of_Week_Name,Hour) %>%
summarise(
"n"=n()
) %>%
group_by(Day_of_Week_Name,Hour) %>%
summarise("mean_number_of_accidents"=mean(n))
ggplot(weekday_hour_summary, aes(Hour,mean_number_of_accidents)) +
geom_col() +
facet_wrap(~Day_of_Week_Name) +
labs(title="Time Series of Mean Number of Accidents by Hour of the Day",
x="Hour of the Day",y="Mean Number of Accidents")
In this section we will calculate a rate based on a numerator of the number of individuals killed or seriously injured (KSI) in accidents and a denominator of the total length of road in a given local authority.
We obtain these figures from the casualties dataset which accompanies the accidents dataset.
casualties <- read.csv("RoadSafetyData_2015\\Casualties_2015.csv", stringsAsFactors=FALSE)
# Assign severity description
casualties$Casualty_Severity <- ifelse(casualties$Casualty_Severity==1,"Fatal",
ifelse(casualties$Casualty_Severity==2,"Serious",
ifelse(casualties$Casualty_Severity==3,"Slight",NA
)))
# There are 1759 casualty records which don't match a corresponding accident
# (presumably because we excluded Scotland from the accidents data earlier)
casualties %>%
filter(Casualty_Severity!="Slight") %>%
left_join(accidents) %>%
filter(is.na(ctyua15cd)) %>%
group_by() %>%
summarise("ksi"=n())
## Joining, by = "Accident_Index"
## # A tibble: 1 x 1
## ksi
## <int>
## 1 1759
# Create a ctyua15 level count of casualties, excluding the 1759 which don't match
(ksi_casualties_ctyua <- casualties %>%
filter(Casualty_Severity!="Slight") %>%
left_join(accidents) %>%
group_by(ctyua15cd) %>%
summarise("ksi"=n()) %>%
filter(!is.na(ctyua15cd)))
## Joining, by = "Accident_Index"
## # A tibble: 173 x 2
## ctyua15cd ksi
## <chr> <int>
## 1 E06000001 38
## 2 E06000002 53
## 3 E06000003 50
## 4 E06000004 66
## 5 E06000005 32
## 6 E06000006 32
## 7 E06000007 89
## 8 E06000008 78
## 9 E06000009 59
## 10 E06000010 105
## # ... with 163 more rows
We're going to try a very basic denominator, but one which still feels a lot more realistic than using an area's population: road length by local authority. This could be refined by matching the road types between the STATS19 data and the road length report (perhaps some form of standardisation would even be possible), but for now we'll just use overall road length as a starter for ten.
All road length data taken from https://www.gov.uk/government/statistics/road-lengths-in-great-britain-2015
# Compare this to what is recorded in the National Statistics Road lengths in Great Britain: 2015 report
# I've cleaned this up in Excel to create a simple csv file (for shame!)
# Based on table RDL0202a of the report, all distances in km
road_lengths <- read.csv("F:\\Statistics\\STATS19\\road-lengths-in-great-britain-2015\\rdl0202a_2015.csv",stringsAsFactors=F)
road_lengths <- road_lengths %>%
filter(substr(la_code,1,1)!="S") # Remove Scotland
a <- road_lengths[,c(1,2,3)]
a$type <- "motorways"
names(a)[3] <- "length_km"
b <- road_lengths[,c(1,2,4)]
b$type <- "a_roads"
names(b)[3] <- "length_km"
c <- road_lengths[,c(1,2,5)]
c$type <- "minor_roads"
names(c)[3] <- "length_km"
road_lengths <- rbind(a,b,c)
rm(list=c("a","b","c"))
road_lengths <- arrange(road_lengths,la_name)
head(road_lengths,9)
## la_code la_name length_km type
## 1 E09000002 Barking and Dagenham 0.0 motorways
## 2 E09000002 Barking and Dagenham 37.1 a_roads
## 3 E09000002 Barking and Dagenham 300.1 minor_roads
## 4 E09000003 Barnet 12.0 motorways
## 5 E09000003 Barnet 96.4 a_roads
## 6 E09000003 Barnet 648.8 minor_roads
## 7 E08000016 Barnsley 17.3 motorways
## 8 E08000016 Barnsley 150.6 a_roads
## 9 E08000016 Barnsley 1046.8 minor_roads
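As an aside, the manual rbind reshape above could be written more compactly with tidyr (loaded at the start). A sketch, assuming columns 3 to 5 of the wide csv hold the motorway, A road and minor road lengths (note the type values would keep the csv's original column names):
# road_lengths <- read.csv("F:\\Statistics\\STATS19\\road-lengths-in-great-britain-2015\\rdl0202a_2015.csv",stringsAsFactors=F) %>%
#   filter(substr(la_code,1,1)!="S") %>%
#   gather(key="type", value="length_km", 3:5) %>%
#   arrange(la_name)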
Note that we have to be careful with the Isles of Scilly: they recorded no KSI casualties, so the join below leaves an NA which we replace with a zero rate.
ksi_casualties_ctyua <- road_lengths %>%
group_by(la_code) %>%
summarise("length_km"=sum(length_km)) %>%
left_join(ksi_casualties_ctyua,by=c("la_code"="ctyua15cd")) %>%
mutate("ksi_per_1000km"=(ksi/length_km)*1000)
# Because the Isles of Scilly had no KSI casualties, we have to enter a zero rather than an NA
ksi_casualties_ctyua[which(is.na(ksi_casualties_ctyua$ksi_per_1000km)),]$ksi <- 0
ksi_casualties_ctyua[which(is.na(ksi_casualties_ctyua$ksi_per_1000km)),]$ksi_per_1000km <- 0
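# Equivalent using tidyr::replace_na (a sketch):
# ksi_casualties_ctyua <- ksi_casualties_ctyua %>%
#   mutate("ksi"=replace_na(ksi, 0),
#          "ksi_per_1000km"=replace_na(ksi_per_1000km, 0))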
# Check the histogram
ggplot(ksi_casualties_ctyua,aes(ksi_per_1000km)) +
geom_histogram(bins=100) +
labs(title="Histogram of KSI per 1000km",x="KSI per 1000km",y="Count")
Here we use the leaflet package to generate an interactive map. This shows the following:

* Boundaries for counties and unitary authorities, shaded based on deciles of the number killed or seriously injured per 1000km of road
* Clustered points showing the locations of all accidents (not just those resulting in a casualty); note that you have to zoom in quite closely before the points will split out to show individual accidents

I suspect really we're just mapping busy-ness here but it's a start.
# Now we need to reproject our ctyua shapefile so that leaflet can display using latitude and longitude rather than Eastings and Northings.
CTYUA2015_BGC <- spTransform(CTYUA2015_BGC, CRS("+init=epsg:4326"))
#plot(CTYUA2015_BGC)
# Join the data to the shapefile
CTYUA2015_BGC <- merge(CTYUA2015_BGC, ksi_casualties_ctyua,by.x="ctyua15cd",by.y="la_code")
# Create a rounding function so figures can be displayed to one decimal place
trunc.round <- function(x){trunc((x*10)+0.5)/10}
# Create colour bins based on deciles
pal <- colorQuantile("YlOrRd", domain = CTYUA2015_BGC@data$ksi_per_1000km,n=10)
m <- leaflet(data=accidents) %>%
addProviderTiles(providers$CartoDB.Positron) %>%
addPolygons(data=CTYUA2015_BGC,
fillColor = ~pal(ksi_per_1000km),
fillOpacity = 0.6,
weight=0.5,
color="black",
group="Local Authorities",
label=~htmlEscape(paste0(ctyua15nm,": ",trunc.round(ksi_per_1000km)," KSI per 1000km"))) %>%
addMarkers(lng=~Longitude,
lat=~Latitude,
label = ~htmlEscape(paste0("No. of vehicles: ",Number_of_Vehicles,", No. of Casualties: ",Number_of_Casualties)),
clusterOptions=markerClusterOptions(),
group="Individual Accidents") %>%
# Unfortunately there are too many points for the heatmap function to work
# (a sampled alternative is sketched after the map below).
# addHeatmap(lng=~Longitude,
# lat=~Latitude,
# blur = 20, max = 0.05, radius = 15,
# group="Heatmap"
# ) %>%
addLegend(pal = pal, values = CTYUA2015_BGC@data$ksi_per_1000km, opacity = 0.7, title = "Deciles",
position = "bottomright") %>%
addLayersControl(
overlayGroups = c("Individual Accidents", "Local Authorities"),
options = layersControlOptions(collapsed = FALSE)
) %>%
addMiniMap(toggleDisplay = TRUE, tiles=providers$CartoDB.Positron)
# %>% hideGroup(c("Individual Accidents"))
m
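As noted in the comments above, the full set of points overwhelms addHeatmap. One workaround (a sketch, untested at this scale) is to build the heatmap layer from a random sample of the accidents:
set.seed(1234) # For a reproducible sample
m %>%
addHeatmap(data=sample_n(accidents, 20000), # 20000 is an arbitrary sample size
lng=~Longitude, lat=~Latitude,
blur = 20, max = 0.05, radius = 15,
group="Heatmap")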