Welcome

The goal of this project is to get an idea of the nature of traffic accidents in the United Kingdom. To help with the goal, an analysis of weekdays vs weekends will be done to give a better picture of how accidents occur. On-top of this, a interactive map will be created to better show the accident patterns in the United Kingdom.

Data Source

The data set used contains compilation of countrywide traffic accidents in all of United Kingdom. The data ranges from 2005 to 2015 and has about 1.7 million records.

There are 28 columns included in the data set.

ColumnName<-c("Accident_Index","Location_Easting_OSGR","Location_Northing_OSGR","Longitude","Latitude","Police_Force","Accident_Severity","Number_of_Vehicles","Number_of_Casualtues","Date","Day_of_Week","Time","Local_Authority_.District","Local_Authority_.Highway","Road_Type","Speed_limit","Junction_Control","Pedestrian_Crossing.Human_Control","Pedestrian_Crossing.Physical_Facilities","Light_Conditions","Weather_Conditions","Road_Surface_Conditions","Special_Conditions_at_Site","Carriageway_Hazards","Urban_or_Rural_Area","Did_Police_Officer_Attend_Scene_of_Accident","LSOA_of_Accident_Location","Year")
Description<-c("This is a unique identifier of the accident record.","States the estward-measured distance of the accident (the x-coordinate).","States the northward-measured distance of the accident(the y-coordinate).","Shows longitude in GPS coordinates of the accident.","Shows latitude in GPS coordinates of the accident.","The police force number (each number represents the police force name) which was present on the scene of the accident.","Shows the severity of the accident, a number between 1 and 3, where 1 indicates that there has been a fatal injurty on the scene and 3 indicates a slight injury.","Number of vehicles involved in the car accident. ","Number of individuals involved in the car accident.","The date of the car accident.","The day of the week in which the car accident occured. (1 being Sunday and 7 being Saturday).","The time of the car accident.","States the local authority district in which the accident occured at at.","States the local authority highway in which the accident occured at.","States the road type in which the accident occured (1=Roundabout,2=One way street,3=Dual carriageway,6=Single carriageway,7=Slip road,9=Unknown).","States the speed limit of the road where the accident occured (30,40,60,20,50 and 70).","Shows by who/what the the junction is being controlled by (Giveway or uncontrolled,Stop Sign,Authorised person and Automatic traffic signal).","Shows by who the nearest pedestrian crossing from the accident is being controlled by (None within 50 metres,Control by school crossing patrol and Control by other authorised person).","Shows the nearest pedestrial crossing physical facilities to the accidnet (No physical crossing within 50 meters,non-junction pedestrian crossing,Pedestrian phase at traffic signal junction and Central refuge).","Shows the ligh conditions of when the accident occured (Darkness: Street lights present and lit,Daylight: Street light present,Darkeness: No street lighting,Darkness: Street lights present but unlit and Darkness: Street lighting unknown).","Shows the weather condition of when the accident occured (Raining withouth high winds,Raining with high winds, Fog or mist, Unknown,Fine without high winds,Other, Snowing with high winds and Fine with high winds ).","Shows the road surface conditions (1=Dry,2=Wet/Damp,3=Snow,4=Frost/Ice,5=Flood (Over 3cm of water),-1=Unknown).","Shows if there were any special conditions at site when the accident occured (None,Oil or diseal, Road surface defective, Roadworks, Auto traffic signal out, Mud, Perment sign or marking defective or obscured, Auto traffic signal partly defective).","Shows if the carriageway contained any hazards when the accident occured (None, Other object in carriageway, Any animal (except a ridden horse), Pedestrian in carriageway (not injured), Involvement with previous accident and Dislodged vehicle load in cariageway).","Shows if the location of the accident was in an Urban or Rural area. (1 = Urban,2 = Rural and 3 = Unallocated).","Shows if police authority was present on the scene of the accident (No or Yes).","Shows the Lower Layer Super Output Areas of the accident.","The year of the car accident.")
COLUMNS<-data.frame(ColumnName,Description)
knitr::kable(COLUMNS,"pipe",align = c("l","l","c"))
ColumnName Description
Accident_Index This is a unique identifier of the accident record.
Location_Easting_OSGR States the estward-measured distance of the accident (the x-coordinate).
Location_Northing_OSGR States the northward-measured distance of the accident(the y-coordinate).
Longitude Shows longitude in GPS coordinates of the accident.
Latitude Shows latitude in GPS coordinates of the accident.
Police_Force The police force number (each number represents the police force name) which was present on the scene of the accident.
Accident_Severity Shows the severity of the accident, a number between 1 and 3, where 1 indicates that there has been a fatal injurty on the scene and 3 indicates a slight injury.
Number_of_Vehicles Number of vehicles involved in the car accident.
Number_of_Casualtues Number of individuals involved in the car accident.
Date The date of the car accident.
Day_of_Week The day of the week in which the car accident occured. (1 being Sunday and 7 being Saturday).
Time The time of the car accident.
Local_Authority_.District States the local authority district in which the accident occured at at.
Local_Authority_.Highway States the local authority highway in which the accident occured at.
Road_Type States the road type in which the accident occured (1=Roundabout,2=One way street,3=Dual carriageway,6=Single carriageway,7=Slip road,9=Unknown).
Speed_limit States the speed limit of the road where the accident occured (30,40,60,20,50 and 70).
Junction_Control Shows by who/what the the junction is being controlled by (Giveway or uncontrolled,Stop Sign,Authorised person and Automatic traffic signal).
Pedestrian_Crossing.Human_Control Shows by who the nearest pedestrian crossing from the accident is being controlled by (None within 50 metres,Control by school crossing patrol and Control by other authorised person).
Pedestrian_Crossing.Physical_Facilities Shows the nearest pedestrial crossing physical facilities to the accidnet (No physical crossing within 50 meters,non-junction pedestrian crossing,Pedestrian phase at traffic signal junction and Central refuge).
Light_Conditions Shows the ligh conditions of when the accident occured (Darkness: Street lights present and lit,Daylight: Street light present,Darkeness: No street lighting,Darkness: Street lights present but unlit and Darkness: Street lighting unknown).
Weather_Conditions Shows the weather condition of when the accident occured (Raining withouth high winds,Raining with high winds, Fog or mist, Unknown,Fine without high winds,Other, Snowing with high winds and Fine with high winds ).
Road_Surface_Conditions Shows the road surface conditions (1=Dry,2=Wet/Damp,3=Snow,4=Frost/Ice,5=Flood (Over 3cm of water),-1=Unknown).
Special_Conditions_at_Site Shows if there were any special conditions at site when the accident occured (None,Oil or diseal, Road surface defective, Roadworks, Auto traffic signal out, Mud, Perment sign or marking defective or obscured, Auto traffic signal partly defective).
Carriageway_Hazards Shows if the carriageway contained any hazards when the accident occured (None, Other object in carriageway, Any animal (except a ridden horse), Pedestrian in carriageway (not injured), Involvement with previous accident and Dislodged vehicle load in cariageway).
Urban_or_Rural_Area Shows if the location of the accident was in an Urban or Rural area. (1 = Urban,2 = Rural and 3 = Unallocated).
Did_Police_Officer_Attend_Scene_of_Accident Shows if police authority was present on the scene of the accident (No or Yes).
LSOA_of_Accident_Location Shows the Lower Layer Super Output Areas of the accident.
Year The year of the car accident.

Let us take a look at what our data looks like!

AllAccidents %>% 
  tibble::as_tibble()
## # A tibble: 1,780,653 x 34
##    ï..Accident_Index Location_Easting_O~ Location_Northing_O~ Longitude Latitude
##    <chr>                           <int>                <int>     <dbl>    <dbl>
##  1 200501BS00001                  525680               178240    -0.191     51.5
##  2 200501BS00002                  524170               181650    -0.212     51.5
##  3 200501BS00003                  524520               182240    -0.206     51.5
##  4 200501BS00004                  526900               177530    -0.174     51.5
##  5 200501BS00005                  528060               179040    -0.157     51.5
##  6 200501BS00006                  524770               181160    -0.203     51.5
##  7 200501BS00007                  524220               180830    -0.211     51.5
##  8 200501BS00009                  525890               179710    -0.188     51.5
##  9 200501BS00010                  527350               177650    -0.167     51.5
## 10 200501BS00011                  524550               180810    -0.207     51.5
## # ... with 1,780,643 more rows, and 29 more variables: Police_Force <int>,
## #   Accident_Severity <int>, Number_of_Vehicles <int>,
## #   Number_of_Casualties <int>, Date <chr>, Day_of_Week <int>, Time <chr>,
## #   Local_Authority_.District. <int>, Local_Authority_.Highway. <chr>,
## #   X1st_Road_Class <int>, X1st_Road_Number <int>, Road_Type <int>,
## #   Speed_limit <int>, Junction_Detail <int>, Junction_Control <int>,
## #   X2nd_Road_Class <int>, X2nd_Road_Number <int>, ...

If you would like to learn more about the dataset, please visit: https://www.kaggle.com/silicon99/dft-accident-data

A Quick Overview

Lets start with taking a look at a general overview of the data.

AllAccidents %>% 
  subset(select = c(Police_Force, Accident_Severity, Number_of_Vehicles, Number_of_Casualties)) %>% 
  summary()
##   Police_Force   Accident_Severity Number_of_Vehicles Number_of_Casualties
##  Min.   : 1.00   Min.   :1.000     Min.   : 1.000     Min.   : 1.000      
##  1st Qu.: 7.00   1st Qu.:3.000     1st Qu.: 1.000     1st Qu.: 1.000      
##  Median :31.00   Median :3.000     Median : 2.000     Median : 1.000      
##  Mean   :30.75   Mean   :2.838     Mean   : 1.832     Mean   : 1.349      
##  3rd Qu.:46.00   3rd Qu.:3.000     3rd Qu.: 2.000     3rd Qu.: 1.000      
##  Max.   :98.00   Max.   :3.000     Max.   :67.000     Max.   :93.000

Here we can see the general statistics for the continuous data.

Overview of Severities

summary.table <- AllAccidents %>% 
  count(Accident_Severity) %>% 
  group_by(Accident_Severity)
  
options(dplyr.summarise.inform = FALSE)

hc <- summary.table %>%
  hchart(
    "treemap", 
    hcaes(x = paste("Severity:", Accident_Severity, sep=" "), value = n, color = Accident_Severity)
  ) %>%
  hc_colorAxis(minColor = "#F4D03F", maxColor="red") %>% 
  hc_title(text = "Total Accidents by Severity, for the UK",
           align = "left") %>%
  hc_subtitle(text = "The color corrosponds to the number of accidents",
              align = "left") %>%
  hc_tooltip(formatter = JS("function(){
                            return (' Severity: ' + this.point.Accident_Severity + ' <br> # AccidentCount: ' + this.point.n)}"))

hc

This Treemap can tell us a few things regarding the Severity Category in the data, for example:

There are 3 Severity levels in the data,

Therefore, only 1.29% of the recorded accidents are categorized as level 1, the most severe accident.

The total number of recorded accidents in UK between 2005 to 2015 is 1780653 accidents.

Overview of Area

AllAccidents$Urban_or_Rural_Area_Word <- case_when (
  AllAccidents$Urban_or_Rural_Area == '1'  ~ 'Urban',
  AllAccidents$Urban_or_Rural_Area == '2' ~ 'Rural',
  AllAccidents$Urban_or_Rural_Area == '3' ~ 'Unallocated',
  
)


options(dplyr.summarise.inform = FALSE)

summary.table2 <- AllAccidents %>% 
  group_by(Urban_or_Rural_Area,Urban_or_Rural_Area_Word) %>% 
  summarise(
        Accident_Severity_Mean = mean(Accident_Severity),
        Accident_Count = sum(new1)
  ) 






hc <- summary.table2 %>% 
  slice(1:3) %>% 
  hchart(type = "bar",
         hcaes(x=Urban_or_Rural_Area_Word, y=Accident_Count, color=Urban_or_Rural_Area)) %>%
  hc_add_theme(hc_theme_flat()) %>% 
  hc_yAxis(title = list(text = "# of Accidents")) %>% 
  hc_title(text = "Total Accidents by area, in UK",
           align = "left") %>% 
  hc_tooltip(formatter = JS("function(){
                            return (' Area: ' + this.point.Urban_or_Rural_Area + ' <br> # AccidentCount: ' + this.point.Accident_Count + ' <br> # Accident_Severity_Mean: ' + this.point.Accident_Severity_Mean)}"))

hc

From the above, we can see there are 3 Areas in the UK which are Rural, Urban and Unknown.The graph shows the disparity of the three areas.

We can see that most car accidents happen in Urban areas. Urban areas had 64.38% of the acciedents. Rural areas contained 35.61% and Unallocated areas only contained 0.01%.

Overview of Road Types

AllAccidents$Road_Type_Word <- case_when (
  AllAccidents$Road_Type == '1'  ~ 'Roundabout',
  AllAccidents$Road_Type == '2' ~ 'One way street',
  AllAccidents$Road_Type == '3' ~ 'Dual carriageway',
  AllAccidents$Road_Type == '6' ~ 'Single carriageway',
  AllAccidents$Road_Type == '7' ~ 'Slip road',
  AllAccidents$Road_Type == '9' ~ 'Unknown',
  
)

summary.table8 <- AllAccidents %>% 
  group_by(Road_Type_Word) %>% 
  summarise(
    AccidentCount = sum(new1),
    meanAccidentSeverity = mean(Accident_Severity)
  ) 


hc <- summary.table8 %>%
  hchart(
    "treemap", 
    hcaes(x = paste("Road Type:", Road_Type_Word, sep=" "), value = AccidentCount, color = AccidentCount)
  ) %>%
  hc_colorAxis(minColor = "#F4D03F", maxColor="red") %>% 
  hc_title(text = "Total Accidents by Road Type, for the UK",
           align = "left") %>%
  hc_subtitle(text = "The color corrosponds to the number of accidents",
              align = "left") %>%
  hc_tooltip(formatter = JS("function(){
                            return (' Road Type: ' + this.point.Road_Type_Word + ' <br> # AccidentCount: ' + this.point.AccidentCount + ' <br> # Accident Severity Average: ' + this.point.meanAccidentSeverity)}"))

hc

From the above, we can see there are 6 Road Types in the UK which are Single carriageway,Dual carriageway, Roundabout,One way street, Slip road and Unknown. The graph shows the disparity of the 6 road types.

The data shows that most car accidents occur in carriageways more specifically single carriageways.Single carriageways contained 74.83% of the accidents.Carriageways all together contained 89.59% of the accidents while the other road types all together only contained 11% of the car accidents.

Complete Timeline

summary.table4 <- AllAccidents %>% 
  group_by(Year,Date,Day_of_Week) %>% 
  summarise(
    Accident_Count = sum(new1),
    Accident_Severity_Mean = mean(Accident_Severity),

  ) 



hc <- summary.table4 %>% 
  hchart(type="column", hcaes(x=Date, y=Accident_Count), color="black") %>% 
  hc_add_theme(hc_theme_flat()) %>% 
  hc_title(text = "Total Accidents by Date",
           align = "left") %>% 
  hc_yAxis(title = list(text = "Number of Accidents")) %>% 
  hc_xAxis(title = "", dateTimeLabelFormats = list(Year = '%Y')) %>%
  hc_tooltip(formatter = JS("function(){
                            return (' Day of Weak: ' + this.point.Date + ' <br> Day of Weak: ' + this.point.Day_of_Week + ' <br> # Accident Count: ' + this.point.Accident_Count + ' <br> Severity Mean: ' + this.point.Accident_Severity_Mean)}"))
hc

This graph shows exactly how much data is in this data set, as well as any patterns in the number of Accidents. For example,

The graph seems to show high variation throughout the individual weeks.This can be tested by looking at the accidents per weekday.

There might be a small decline in the number of accidents over time,

More analysis on this will need to be done to see if this is true.

Pearsons r and r2

raw_cor1 <- AllAccidents %>% 
  subset(!is.na(Accident_Severity)) %>% 
  subset(!is.na(Speed_limit)) %>%
  subset(!is.na(Road_Surface_Conditions)) %>%
  subset(select = c(Accident_Severity, Speed_limit, Road_Surface_Conditions)) %>% 
  as.matrix() %>% 
  cor()

raw_cor1
##                         Accident_Severity Speed_limit Road_Surface_Conditions
## Accident_Severity             1.000000000  -0.0806467             0.009436703
## Speed_limit                  -0.080646696   1.0000000             0.129754037
## Road_Surface_Conditions       0.009436703   0.1297540             1.000000000

Pearsons r cant tell us a lot, however we can gather a few bits of information:

Severity and Road Surface conditions appear to be positively correlated.The persons r value is positive, telling us that as the Severity increases when the Road surface conditions worsen.This might suggest that the worse the surface conditions are, the more sever an accident might be.

Severity and Speed limit are negatively correlated. Suggesting that as the Speed limit gets lower, the severity of an accident might be lower.

Lets check how well correlated these factors are, we can do this by simply squaring the above correlation.

raw_cor_2 <- raw_cor1 %>% 
  '^'(2) %>% 
  round(5) %>% #round
  hchart() %>% #graph
  hc_colorAxis(minColor = "#F4D03F", maxColor="red") %>%
  hc_add_theme(hc_theme_flat()) %>%
  hc_title(text = "Pearsons r squared matchup",
           align = "left") %>% 
  hc_subtitle(text = "For UK accidents",
              align = "left") %>% 
  hc_xAxis(categories = list("Severity", "Speed Limit", "Road Surface Conditions")) %>% 
  hc_yAxis(categories = list("Severity", "Speed Limit", "Road Surface Conditions"))%>% 
  hc_legend(align = "left") %>% 
  hc_plotOptions(
    series = list(
      boderWidth = 0,
      dataLabels = list(enabled = TRUE)))

raw_cor_2

All of the r squared values are very low, with the highest being Road surface conditions and Speed Limit of the road (0.01428).

As all of these r2 values are very low, this suggests that none of the factors can be explained by the other.

ie, the Road Surface Conditions of an Accident has little to do with the Severity of the Accident. The same can be said for all of the other match ups as well.

Total Accidents by Week per Year

AllAccidents$Day_of_Week_Words <- case_when (
  AllAccidents$Day_of_Week == '1'  ~ 'Sunday',
  AllAccidents$Day_of_Week == '2' ~ 'Monday',
  AllAccidents$Day_of_Week == '3' ~ 'Tuesday',
  AllAccidents$Day_of_Week == '4' ~ 'Wednesday',
  AllAccidents$Day_of_Week == '5'  ~ 'Thursday',
  AllAccidents$Day_of_Week == '6'  ~ 'Friday',
  AllAccidents$Day_of_Week == '7'  ~ 'Saturday'
  
  
)

summary.table7 <- AllAccidents %>% 
  group_by(Year,Day_of_Week,Day_of_Week_Words) %>% 
  summarise(
    Accident_Count = sum(new1),
    Accident_Severity_Mean = mean(Accident_Severity),
    
  ) 




hc <- summary.table7 %>% 
  hchart(type = "bar", 
         hcaes(x=Day_of_Week_Words, y=Accident_Count, group=Year)) %>%
  hc_add_theme(hc_theme_flat()) %>% 
  hc_plotOptions(column = list(stacking = "normal")) %>%
  hc_yAxis(title = list(text = "# of Accidents")) %>% 
  hc_title(text = "Total Accidents by Week, per year",
           align = "left") %>% 
  hc_tooltip(formatter = JS("function(){
                            return (' Area: ' + this.point.Year + ' <br> # AccidentCount: ' + this.point.Accident_Count + ' <br> # Accident_Severity_Mean: ' + this.point.Accident_Severity_Mean)}"))

hc

While this huge drop is apparent on the weekend, one reason why this might occur is that there are less people working, causing less people to be on the roads.

We can also clearly see that the number of accidents has greatly decreased over years.This could be due to a number of factors. Such as more money being invested into the infrastructure (better roads), due to an increase of speeding cameras people are more obliged to follow road rules which could possibly lead to less accidents or it could be the fact that over years the quality of cars has greatly improved which has lead to less faults occuring while people drive on the road.

Interactive Map of Accidents in 2015

Map of UK Accidents in 2015

Here is an interactive map of Accidents that have occurred within the UK in 2015. Data points are clustered together based off the level of Severity.

Feel free to play around with some of the controls.

AllAccidents <- AllAccidents[complete.cases(AllAccidents$Latitude), ]


AllAccidents <- AllAccidents %>% filter(Year == '2015')


# Set color palette 
pal <- colorBin("YlOrRd", domain = AllAccidents$Accident_Severity, 1:4)

# Creating groups of individual severity levels for filtering
groups <- as.character(sort(unique(AllAccidents$Accident_Severity), decreasing = TRUE))

# Creating html labels for map
labs <- lapply(seq(nrow(AllAccidents)), function(i) {
  paste0(
  'Severity: ', AllAccidents[i, "Accident_Severity"],'<br/>',
  'Number of Casualties: ', AllAccidents[i, "Number_of_Casualties"], ' <br/>',
  'Date: ', AllAccidents[i, "Date"], '<br/>',
  'Time: ', AllAccidents[i, "Time"], '<br/>',
  'Road Type: ', AllAccidents[i, "Road_Type_Word"], ' <br/>',
  'Speed Limit: ', AllAccidents[i, "Speed_limit"], ' <br/>',
  'Road Surface Conditions: ', AllAccidents[i, "Road_Surface_Conditions"], ' <br/>'
  
  
  )})

# Create a df for the labels
labeldf <- data.frame(Severity =  AllAccidents$Accident_Severity, Labels = unlist(labs), stringsAsFactors = FALSE)


# Create a map from leaflet to build from
map <- leaflet(AllAccidents, options = leafletOptions(minZoom = 5, maxZoom = 18)) %>%
  setView(lng = 3.4360, lat = 55.3781, zoom = 5) %>%
  setMaxBounds(6.4360, 62.000, -8.4360, 45.000) %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  addLayersControl(overlayGroups = groups,
                   options = layersControlOptions(collapsed = TRUE))

# Create layers for each severity level to be filtered
for (x in groups) {
  indiGroup = AllAccidents[AllAccidents$Accident_Severity == x,]
  map <- map %>%
    addCircleMarkers(data = indiGroup,
                     clusterOptions = markerClusterOptions(),
                     ~Longitude, ~Latitude,
                     group = x,
                     radius = 5,
                     color = ~pal(Accident_Severity),
                     stroke = FALSE,
                     fillOpacity = 0.5,
                     label = lapply(labeldf[labeldf$Severity == x,]$Labels, htmltools::HTML)
    )
}

# Display graph
map

Conclusion

While occurences of accidents is seemingly independent of weather, speed limit and severity, it is clear to see that accidents are more likely to happen on carriageways more specifically single carriageways. Using this it is clear to see there is a underlying pattern in what carriageways. are more accident prone, or less accident prone, based on the day of the week. This could help the surrounding cities better plan for future accidents, and could help reduce response time!

Challenges

It could be helpful to have a column that listed a cause for the accident, whether it be alcohol or drug-use related, or purely accidental. This type of data could assist in forecasting accidents for each category.

It would have been very beneficial if the data set contained a column which stated how much an accident disrupted traffic flow.

Since most of the accidents took place on carriageways, the categorical columns that had information on stop signs were not as useful when attempting to model accidents across UK.

Additional Work to be Done

Changes to the interactive map to add more filters (Months, Years, Highways vs Roads, Light conditions), display different information at various zoom levels (County clustered accidents, City Clustered Accidents, Road-specific Accidents), and to facilitate those additions, a geojson file containing the shapes of the Counties/Cities as well as data from out dataset that could be displayed in a simliar labelling fashion.