The goal of this project is to get an idea of the nature of traffic accidents in the United Kingdom. To help with the goal, an analysis of weekdays vs weekends will be done to give a better picture of how accidents occur. On-top of this, a interactive map will be created to better show the accident patterns in the United Kingdom.
The data set used contains compilation of countrywide traffic accidents in all of United Kingdom. The data ranges from 2005 to 2015 and has about 1.7 million records.
There are 28 columns included in the data set.
ColumnName<-c("Accident_Index","Location_Easting_OSGR","Location_Northing_OSGR","Longitude","Latitude","Police_Force","Accident_Severity","Number_of_Vehicles","Number_of_Casualtues","Date","Day_of_Week","Time","Local_Authority_.District","Local_Authority_.Highway","Road_Type","Speed_limit","Junction_Control","Pedestrian_Crossing.Human_Control","Pedestrian_Crossing.Physical_Facilities","Light_Conditions","Weather_Conditions","Road_Surface_Conditions","Special_Conditions_at_Site","Carriageway_Hazards","Urban_or_Rural_Area","Did_Police_Officer_Attend_Scene_of_Accident","LSOA_of_Accident_Location","Year")
Description<-c("This is a unique identifier of the accident record.","States the estward-measured distance of the accident (the x-coordinate).","States the northward-measured distance of the accident(the y-coordinate).","Shows longitude in GPS coordinates of the accident.","Shows latitude in GPS coordinates of the accident.","The police force number (each number represents the police force name) which was present on the scene of the accident.","Shows the severity of the accident, a number between 1 and 3, where 1 indicates that there has been a fatal injurty on the scene and 3 indicates a slight injury.","Number of vehicles involved in the car accident. ","Number of individuals involved in the car accident.","The date of the car accident.","The day of the week in which the car accident occured. (1 being Sunday and 7 being Saturday).","The time of the car accident.","States the local authority district in which the accident occured at at.","States the local authority highway in which the accident occured at.","States the road type in which the accident occured (1=Roundabout,2=One way street,3=Dual carriageway,6=Single carriageway,7=Slip road,9=Unknown).","States the speed limit of the road where the accident occured (30,40,60,20,50 and 70).","Shows by who/what the the junction is being controlled by (Giveway or uncontrolled,Stop Sign,Authorised person and Automatic traffic signal).","Shows by who the nearest pedestrian crossing from the accident is being controlled by (None within 50 metres,Control by school crossing patrol and Control by other authorised person).","Shows the nearest pedestrial crossing physical facilities to the accidnet (No physical crossing within 50 meters,non-junction pedestrian crossing,Pedestrian phase at traffic signal junction and Central refuge).","Shows the ligh conditions of when the accident occured (Darkness: Street lights present and lit,Daylight: Street light present,Darkeness: No street lighting,Darkness: Street lights present but unlit and Darkness: Street lighting unknown).","Shows the weather condition of when the accident occured (Raining withouth high winds,Raining with high winds, Fog or mist, Unknown,Fine without high winds,Other, Snowing with high winds and Fine with high winds ).","Shows the road surface conditions (1=Dry,2=Wet/Damp,3=Snow,4=Frost/Ice,5=Flood (Over 3cm of water),-1=Unknown).","Shows if there were any special conditions at site when the accident occured (None,Oil or diseal, Road surface defective, Roadworks, Auto traffic signal out, Mud, Perment sign or marking defective or obscured, Auto traffic signal partly defective).","Shows if the carriageway contained any hazards when the accident occured (None, Other object in carriageway, Any animal (except a ridden horse), Pedestrian in carriageway (not injured), Involvement with previous accident and Dislodged vehicle load in cariageway).","Shows if the location of the accident was in an Urban or Rural area. (1 = Urban,2 = Rural and 3 = Unallocated).","Shows if police authority was present on the scene of the accident (No or Yes).","Shows the Lower Layer Super Output Areas of the accident.","The year of the car accident.")
COLUMNS<-data.frame(ColumnName,Description)
knitr::kable(COLUMNS,"pipe",align = c("l","l","c"))
| ColumnName | Description |
|---|---|
| Accident_Index | This is a unique identifier of the accident record. |
| Location_Easting_OSGR | States the estward-measured distance of the accident (the x-coordinate). |
| Location_Northing_OSGR | States the northward-measured distance of the accident(the y-coordinate). |
| Longitude | Shows longitude in GPS coordinates of the accident. |
| Latitude | Shows latitude in GPS coordinates of the accident. |
| Police_Force | The police force number (each number represents the police force name) which was present on the scene of the accident. |
| Accident_Severity | Shows the severity of the accident, a number between 1 and 3, where 1 indicates that there has been a fatal injurty on the scene and 3 indicates a slight injury. |
| Number_of_Vehicles | Number of vehicles involved in the car accident. |
| Number_of_Casualtues | Number of individuals involved in the car accident. |
| Date | The date of the car accident. |
| Day_of_Week | The day of the week in which the car accident occured. (1 being Sunday and 7 being Saturday). |
| Time | The time of the car accident. |
| Local_Authority_.District | States the local authority district in which the accident occured at at. |
| Local_Authority_.Highway | States the local authority highway in which the accident occured at. |
| Road_Type | States the road type in which the accident occured (1=Roundabout,2=One way street,3=Dual carriageway,6=Single carriageway,7=Slip road,9=Unknown). |
| Speed_limit | States the speed limit of the road where the accident occured (30,40,60,20,50 and 70). |
| Junction_Control | Shows by who/what the the junction is being controlled by (Giveway or uncontrolled,Stop Sign,Authorised person and Automatic traffic signal). |
| Pedestrian_Crossing.Human_Control | Shows by who the nearest pedestrian crossing from the accident is being controlled by (None within 50 metres,Control by school crossing patrol and Control by other authorised person). |
| Pedestrian_Crossing.Physical_Facilities | Shows the nearest pedestrial crossing physical facilities to the accidnet (No physical crossing within 50 meters,non-junction pedestrian crossing,Pedestrian phase at traffic signal junction and Central refuge). |
| Light_Conditions | Shows the ligh conditions of when the accident occured (Darkness: Street lights present and lit,Daylight: Street light present,Darkeness: No street lighting,Darkness: Street lights present but unlit and Darkness: Street lighting unknown). |
| Weather_Conditions | Shows the weather condition of when the accident occured (Raining withouth high winds,Raining with high winds, Fog or mist, Unknown,Fine without high winds,Other, Snowing with high winds and Fine with high winds ). |
| Road_Surface_Conditions | Shows the road surface conditions (1=Dry,2=Wet/Damp,3=Snow,4=Frost/Ice,5=Flood (Over 3cm of water),-1=Unknown). |
| Special_Conditions_at_Site | Shows if there were any special conditions at site when the accident occured (None,Oil or diseal, Road surface defective, Roadworks, Auto traffic signal out, Mud, Perment sign or marking defective or obscured, Auto traffic signal partly defective). |
| Carriageway_Hazards | Shows if the carriageway contained any hazards when the accident occured (None, Other object in carriageway, Any animal (except a ridden horse), Pedestrian in carriageway (not injured), Involvement with previous accident and Dislodged vehicle load in cariageway). |
| Urban_or_Rural_Area | Shows if the location of the accident was in an Urban or Rural area. (1 = Urban,2 = Rural and 3 = Unallocated). |
| Did_Police_Officer_Attend_Scene_of_Accident | Shows if police authority was present on the scene of the accident (No or Yes). |
| LSOA_of_Accident_Location | Shows the Lower Layer Super Output Areas of the accident. |
| Year | The year of the car accident. |
Let us take a look at what our data looks like!
AllAccidents %>%
tibble::as_tibble()
## # A tibble: 1,780,653 x 34
## ï..Accident_Index Location_Easting_O~ Location_Northing_O~ Longitude Latitude
## <chr> <int> <int> <dbl> <dbl>
## 1 200501BS00001 525680 178240 -0.191 51.5
## 2 200501BS00002 524170 181650 -0.212 51.5
## 3 200501BS00003 524520 182240 -0.206 51.5
## 4 200501BS00004 526900 177530 -0.174 51.5
## 5 200501BS00005 528060 179040 -0.157 51.5
## 6 200501BS00006 524770 181160 -0.203 51.5
## 7 200501BS00007 524220 180830 -0.211 51.5
## 8 200501BS00009 525890 179710 -0.188 51.5
## 9 200501BS00010 527350 177650 -0.167 51.5
## 10 200501BS00011 524550 180810 -0.207 51.5
## # ... with 1,780,643 more rows, and 29 more variables: Police_Force <int>,
## # Accident_Severity <int>, Number_of_Vehicles <int>,
## # Number_of_Casualties <int>, Date <chr>, Day_of_Week <int>, Time <chr>,
## # Local_Authority_.District. <int>, Local_Authority_.Highway. <chr>,
## # X1st_Road_Class <int>, X1st_Road_Number <int>, Road_Type <int>,
## # Speed_limit <int>, Junction_Detail <int>, Junction_Control <int>,
## # X2nd_Road_Class <int>, X2nd_Road_Number <int>, ...
If you would like to learn more about the dataset, please visit: https://www.kaggle.com/silicon99/dft-accident-data
Lets start with taking a look at a general overview of the data.
AllAccidents %>%
subset(select = c(Police_Force, Accident_Severity, Number_of_Vehicles, Number_of_Casualties)) %>%
summary()
## Police_Force Accident_Severity Number_of_Vehicles Number_of_Casualties
## Min. : 1.00 Min. :1.000 Min. : 1.000 Min. : 1.000
## 1st Qu.: 7.00 1st Qu.:3.000 1st Qu.: 1.000 1st Qu.: 1.000
## Median :31.00 Median :3.000 Median : 2.000 Median : 1.000
## Mean :30.75 Mean :2.838 Mean : 1.832 Mean : 1.349
## 3rd Qu.:46.00 3rd Qu.:3.000 3rd Qu.: 2.000 3rd Qu.: 1.000
## Max. :98.00 Max. :3.000 Max. :67.000 Max. :93.000
Here we can see the general statistics for the continuous data.
The average severity in UK is 2.836
The average number of vehicles which are affected in a single accident is 1.835 (rounding up to 2)
The average number of casualties which are affected in a single accident is 1.329.
summary.table <- AllAccidents %>%
count(Accident_Severity) %>%
group_by(Accident_Severity)
options(dplyr.summarise.inform = FALSE)
hc <- summary.table %>%
hchart(
"treemap",
hcaes(x = paste("Severity:", Accident_Severity, sep=" "), value = n, color = Accident_Severity)
) %>%
hc_colorAxis(minColor = "#F4D03F", maxColor="red") %>%
hc_title(text = "Total Accidents by Severity, for the UK",
align = "left") %>%
hc_subtitle(text = "The color corrosponds to the number of accidents",
align = "left") %>%
hc_tooltip(formatter = JS("function(){
return (' Severity: ' + this.point.Accident_Severity + ' <br> # AccidentCount: ' + this.point.n)}"))
hc
This Treemap can tell us a few things regarding the Severity Category in the data, for example:
There are 3 Severity levels in the data,
The Severity ranges on a scale of 1 - 3,
Severity 1: 22998 accidents
Severity 2: 242080 accidents
Severity 3: 1515575 accidents
98% of the recorded accidents are categorized as Severity level 3 or 2.
1.29% of the recorded accidents are level 1, respectively
13.6% of the recorded accidents are level 2, respectively
85.11% of the recorded accidents are level 3, respectively
Therefore, only 1.29% of the recorded accidents are categorized as level 1, the most severe accident.
The total number of recorded accidents in UK between 2005 to 2015 is 1780653 accidents.
AllAccidents$Urban_or_Rural_Area_Word <- case_when (
AllAccidents$Urban_or_Rural_Area == '1' ~ 'Urban',
AllAccidents$Urban_or_Rural_Area == '2' ~ 'Rural',
AllAccidents$Urban_or_Rural_Area == '3' ~ 'Unallocated',
)
options(dplyr.summarise.inform = FALSE)
summary.table2 <- AllAccidents %>%
group_by(Urban_or_Rural_Area,Urban_or_Rural_Area_Word) %>%
summarise(
Accident_Severity_Mean = mean(Accident_Severity),
Accident_Count = sum(new1)
)
hc <- summary.table2 %>%
slice(1:3) %>%
hchart(type = "bar",
hcaes(x=Urban_or_Rural_Area_Word, y=Accident_Count, color=Urban_or_Rural_Area)) %>%
hc_add_theme(hc_theme_flat()) %>%
hc_yAxis(title = list(text = "# of Accidents")) %>%
hc_title(text = "Total Accidents by area, in UK",
align = "left") %>%
hc_tooltip(formatter = JS("function(){
return (' Area: ' + this.point.Urban_or_Rural_Area + ' <br> # AccidentCount: ' + this.point.Accident_Count + ' <br> # Accident_Severity_Mean: ' + this.point.Accident_Severity_Mean)}"))
hc
From the above, we can see there are 3 Areas in the UK which are Rural, Urban and Unknown.The graph shows the disparity of the three areas.
URBAN:
1146421 recorded accidents
mean severity level of 2.86
RURAL:
634089 recorded accidents
mean severity level of 2.79
UNALLOCATED:
143 recorded accidents
mean severity level of 2.87
We can see that most car accidents happen in Urban areas. Urban areas had 64.38% of the acciedents. Rural areas contained 35.61% and Unallocated areas only contained 0.01%.
AllAccidents$Road_Type_Word <- case_when (
AllAccidents$Road_Type == '1' ~ 'Roundabout',
AllAccidents$Road_Type == '2' ~ 'One way street',
AllAccidents$Road_Type == '3' ~ 'Dual carriageway',
AllAccidents$Road_Type == '6' ~ 'Single carriageway',
AllAccidents$Road_Type == '7' ~ 'Slip road',
AllAccidents$Road_Type == '9' ~ 'Unknown',
)
summary.table8 <- AllAccidents %>%
group_by(Road_Type_Word) %>%
summarise(
AccidentCount = sum(new1),
meanAccidentSeverity = mean(Accident_Severity)
)
hc <- summary.table8 %>%
hchart(
"treemap",
hcaes(x = paste("Road Type:", Road_Type_Word, sep=" "), value = AccidentCount, color = AccidentCount)
) %>%
hc_colorAxis(minColor = "#F4D03F", maxColor="red") %>%
hc_title(text = "Total Accidents by Road Type, for the UK",
align = "left") %>%
hc_subtitle(text = "The color corrosponds to the number of accidents",
align = "left") %>%
hc_tooltip(formatter = JS("function(){
return (' Road Type: ' + this.point.Road_Type_Word + ' <br> # AccidentCount: ' + this.point.AccidentCount + ' <br> # Accident Severity Average: ' + this.point.meanAccidentSeverity)}"))
hc
From the above, we can see there are 6 Road Types in the UK which are Single carriageway,Dual carriageway, Roundabout,One way street, Slip road and Unknown. The graph shows the disparity of the 6 road types.
Single carriageway
1332384 recorded accidents
mean severity level of 2.83
Dual carriageway
262950 recorded accidents
mean severity level of 2.84
Roundabout
119472 recorded accidents
mean severity level of 2.90
One way street
36755 recorded accidents
mean severity level of 2.86
Slip road
18647 recorded accidents
mean severity level of 2.89
Unknown
10445 recorded accidents
mean severity level of 2.87
The data shows that most car accidents occur in carriageways more specifically single carriageways.Single carriageways contained 74.83% of the accidents.Carriageways all together contained 89.59% of the accidents while the other road types all together only contained 11% of the car accidents.
summary.table4 <- AllAccidents %>%
group_by(Year,Date,Day_of_Week) %>%
summarise(
Accident_Count = sum(new1),
Accident_Severity_Mean = mean(Accident_Severity),
)
hc <- summary.table4 %>%
hchart(type="column", hcaes(x=Date, y=Accident_Count), color="black") %>%
hc_add_theme(hc_theme_flat()) %>%
hc_title(text = "Total Accidents by Date",
align = "left") %>%
hc_yAxis(title = list(text = "Number of Accidents")) %>%
hc_xAxis(title = "", dateTimeLabelFormats = list(Year = '%Y')) %>%
hc_tooltip(formatter = JS("function(){
return (' Day of Weak: ' + this.point.Date + ' <br> Day of Weak: ' + this.point.Day_of_Week + ' <br> # Accident Count: ' + this.point.Accident_Count + ' <br> Severity Mean: ' + this.point.Accident_Severity_Mean)}"))
hc
This graph shows exactly how much data is in this data set, as well as any patterns in the number of Accidents. For example,
The graph seems to show high variation throughout the individual weeks.This can be tested by looking at the accidents per weekday.
There might be a small decline in the number of accidents over time,
More analysis on this will need to be done to see if this is true.
raw_cor1 <- AllAccidents %>%
subset(!is.na(Accident_Severity)) %>%
subset(!is.na(Speed_limit)) %>%
subset(!is.na(Road_Surface_Conditions)) %>%
subset(select = c(Accident_Severity, Speed_limit, Road_Surface_Conditions)) %>%
as.matrix() %>%
cor()
raw_cor1
## Accident_Severity Speed_limit Road_Surface_Conditions
## Accident_Severity 1.000000000 -0.0806467 0.009436703
## Speed_limit -0.080646696 1.0000000 0.129754037
## Road_Surface_Conditions 0.009436703 0.1297540 1.000000000
Pearsons r cant tell us a lot, however we can gather a few bits of information:
Severity and Road Surface conditions appear to be positively correlated.The persons r value is positive, telling us that as the Severity increases when the Road surface conditions worsen.This might suggest that the worse the surface conditions are, the more sever an accident might be.
Severity and Speed limit are negatively correlated. Suggesting that as the Speed limit gets lower, the severity of an accident might be lower.
Lets check how well correlated these factors are, we can do this by simply squaring the above correlation.
raw_cor_2 <- raw_cor1 %>%
'^'(2) %>%
round(5) %>% #round
hchart() %>% #graph
hc_colorAxis(minColor = "#F4D03F", maxColor="red") %>%
hc_add_theme(hc_theme_flat()) %>%
hc_title(text = "Pearsons r squared matchup",
align = "left") %>%
hc_subtitle(text = "For UK accidents",
align = "left") %>%
hc_xAxis(categories = list("Severity", "Speed Limit", "Road Surface Conditions")) %>%
hc_yAxis(categories = list("Severity", "Speed Limit", "Road Surface Conditions"))%>%
hc_legend(align = "left") %>%
hc_plotOptions(
series = list(
boderWidth = 0,
dataLabels = list(enabled = TRUE)))
raw_cor_2
All of the r squared values are very low, with the highest being Road surface conditions and Speed Limit of the road (0.01428).
As all of these r2 values are very low, this suggests that none of the factors can be explained by the other.
ie, the Road Surface Conditions of an Accident has little to do with the Severity of the Accident. The same can be said for all of the other match ups as well.
AllAccidents$Day_of_Week_Words <- case_when (
AllAccidents$Day_of_Week == '1' ~ 'Sunday',
AllAccidents$Day_of_Week == '2' ~ 'Monday',
AllAccidents$Day_of_Week == '3' ~ 'Tuesday',
AllAccidents$Day_of_Week == '4' ~ 'Wednesday',
AllAccidents$Day_of_Week == '5' ~ 'Thursday',
AllAccidents$Day_of_Week == '6' ~ 'Friday',
AllAccidents$Day_of_Week == '7' ~ 'Saturday'
)
summary.table7 <- AllAccidents %>%
group_by(Year,Day_of_Week,Day_of_Week_Words) %>%
summarise(
Accident_Count = sum(new1),
Accident_Severity_Mean = mean(Accident_Severity),
)
hc <- summary.table7 %>%
hchart(type = "bar",
hcaes(x=Day_of_Week_Words, y=Accident_Count, group=Year)) %>%
hc_add_theme(hc_theme_flat()) %>%
hc_plotOptions(column = list(stacking = "normal")) %>%
hc_yAxis(title = list(text = "# of Accidents")) %>%
hc_title(text = "Total Accidents by Week, per year",
align = "left") %>%
hc_tooltip(formatter = JS("function(){
return (' Area: ' + this.point.Year + ' <br> # AccidentCount: ' + this.point.Accident_Count + ' <br> # Accident_Severity_Mean: ' + this.point.Accident_Severity_Mean)}"))
hc
Sunday has 10.97% of the total accidents.
Monday has 14.22% of the total accidents.
Tuesday has 14.98% of the total accidents.
Wednesday has 15.07% of the total accidents.
Thursday has 15.02% of the total accidents.
Friday has 16.36% of the total accidents.
Saturday has 13.37% of the total accidents.
While this huge drop is apparent on the weekend, one reason why this might occur is that there are less people working, causing less people to be on the roads.
We can also clearly see that the number of accidents has greatly decreased over years.This could be due to a number of factors. Such as more money being invested into the infrastructure (better roads), due to an increase of speeding cameras people are more obliged to follow road rules which could possibly lead to less accidents or it could be the fact that over years the quality of cars has greatly improved which has lead to less faults occuring while people drive on the road.
Here is an interactive map of Accidents that have occurred within the UK in 2015. Data points are clustered together based off the level of Severity.
Feel free to play around with some of the controls.
You can do the following:
Zoom In and Out with your Scroll Wheel
Click on Specific Clusters of Data Points to Zoom In
Use the Filters on the Right to Display Different Levels of Severity
AllAccidents <- AllAccidents[complete.cases(AllAccidents$Latitude), ]
AllAccidents <- AllAccidents %>% filter(Year == '2015')
# Set color palette
pal <- colorBin("YlOrRd", domain = AllAccidents$Accident_Severity, 1:4)
# Creating groups of individual severity levels for filtering
groups <- as.character(sort(unique(AllAccidents$Accident_Severity), decreasing = TRUE))
# Creating html labels for map
labs <- lapply(seq(nrow(AllAccidents)), function(i) {
paste0(
'Severity: ', AllAccidents[i, "Accident_Severity"],'<br/>',
'Number of Casualties: ', AllAccidents[i, "Number_of_Casualties"], ' <br/>',
'Date: ', AllAccidents[i, "Date"], '<br/>',
'Time: ', AllAccidents[i, "Time"], '<br/>',
'Road Type: ', AllAccidents[i, "Road_Type_Word"], ' <br/>',
'Speed Limit: ', AllAccidents[i, "Speed_limit"], ' <br/>',
'Road Surface Conditions: ', AllAccidents[i, "Road_Surface_Conditions"], ' <br/>'
)})
# Create a df for the labels
labeldf <- data.frame(Severity = AllAccidents$Accident_Severity, Labels = unlist(labs), stringsAsFactors = FALSE)
# Create a map from leaflet to build from
map <- leaflet(AllAccidents, options = leafletOptions(minZoom = 5, maxZoom = 18)) %>%
setView(lng = 3.4360, lat = 55.3781, zoom = 5) %>%
setMaxBounds(6.4360, 62.000, -8.4360, 45.000) %>%
addProviderTiles(providers$CartoDB.Positron) %>%
addLayersControl(overlayGroups = groups,
options = layersControlOptions(collapsed = TRUE))
# Create layers for each severity level to be filtered
for (x in groups) {
indiGroup = AllAccidents[AllAccidents$Accident_Severity == x,]
map <- map %>%
addCircleMarkers(data = indiGroup,
clusterOptions = markerClusterOptions(),
~Longitude, ~Latitude,
group = x,
radius = 5,
color = ~pal(Accident_Severity),
stroke = FALSE,
fillOpacity = 0.5,
label = lapply(labeldf[labeldf$Severity == x,]$Labels, htmltools::HTML)
)
}
# Display graph
map