library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
1. Import the dataset using read_csv()
rad <- read_csv("data/Road Accident Data.csv")
Rows: 307973 Columns: 23
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (16): Accident_Index, Accident Date, Month, Day_of_Week, Junction_Contr...
dbl (6): Year, Latitude, Longitude, Number_of_Casualties, Number_of_Vehicl...
time (1): Time
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
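Base R's read.csv() and readr's read_csv() do not treat column names the same way. By default read.csv() passes the header through make.names(), so `Accident Date` becomes Accident.Date and `Local_Authority_(District)` becomes Local_Authority_.District. In contrast, read_csv() keeps the names exactly as written, which is why they need backticks in rad. The sketch below shows this on a two-column CSV string. The header names come from the column specification above, and the data row is illustrative only.

```r
# Two header names from the readr column specification; the data row is illustrative
csv_text <- "Accident Date,Local_Authority_(District)\n1/1/2021,Kensington and Chelsea"

# Default read.csv(): check.names = TRUE, so make.names() repairs the header
base_df <- read.csv(text = csv_text)
names(base_df)   # "Accident.Date" "Local_Authority_.District."

# check.names = FALSE keeps the names as written, like readr::read_csv()
raw_df <- read.csv(text = csv_text, check.names = FALSE)
names(raw_df)    # "Accident Date" "Local_Authority_(District)"

# Non-syntactic names then have to be wrapped in backticks
raw_df$`Accident Date`
```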
2. Inspect the dataset
head(rad)  # returns the first six rows to preview column names and values
# A tibble: 6 × 23
Accident_Index `Accident Date` Month Day_of_Week Year Junction_Control
<chr> <chr> <chr> <chr> <dbl> <chr>
1 200901BS70001 1/1/2021 Jan Thursday 2021 Give way or uncontroll…
2 200901BS70002 1/5/2021 Jan Monday 2021 Give way or uncontroll…
3 200901BS70003 1/4/2021 Jan Sunday 2021 Give way or uncontroll…
4 200901BS70004 1/5/2021 Jan Monday 2021 Auto traffic signal
5 200901BS70005 1/6/2021 Jan Tuesday 2021 Auto traffic signal
6 200901BS70006 1/1/2021 Jan Thursday 2021 Give way or uncontroll…
# ℹ 17 more variables: Junction_Detail <chr>, Accident_Severity <chr>,
# Latitude <dbl>, Light_Conditions <chr>, `Local_Authority_(District)` <chr>,
# Carriageway_Hazards <chr>, Longitude <dbl>, Number_of_Casualties <dbl>,
# Number_of_Vehicles <dbl>, Police_Force <chr>,
# Road_Surface_Conditions <chr>, Road_Type <chr>, Speed_limit <dbl>,
# Time <time>, Urban_or_Rural_Area <chr>, Weather_Conditions <chr>,
# Vehicle_Type <chr>
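Because `Accident Date` is read as character, it has to be parsed before it can be used as a date. The Month column reads Jan for every row shown, so the month comes first. The sketch below therefore assumes the format %m/%d/%Y, parses the six dates printed by head(rad), and cross-checks them against Day_of_Week. On these six rows none of the stored weekdays agree with the 2021 calendar: 1 January 2021 was a Friday, while the data says Thursday. Both fields are therefore worth checking across the full column before either is relied on.

```r
# Accident Date and Day_of_Week for the six rows printed by head(rad)
sample_rows <- data.frame(
  accident_date = c("1/1/2021", "1/5/2021", "1/4/2021",
                    "1/5/2021", "1/6/2021", "1/1/2021"),
  day_of_week   = c("Thursday", "Monday", "Sunday",
                    "Monday", "Tuesday", "Thursday")
)

# Assumed format: month/day/year (the Month column reads Jan for every row)
parsed <- as.Date(sample_rows$accident_date, format = "%m/%d/%Y")

# POSIXlt$wday runs from 0 (Sunday) to 6 (Saturday); mapping it to names here
# avoids the locale-dependent output of weekdays()
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")
sample_rows$computed_day <- day_names[as.POSIXlt(parsed)$wday + 1]

# TRUE where the stored weekday agrees with the weekday implied by the date
sample_rows$day_matches <- sample_rows$computed_day == sample_rows$day_of_week
sample_rows
```

On the full data the same steps apply to rad$`Accident Date` and rad$Day_of_Week. Taking the mean of the resulting logical vector then gives the share of rows that agree.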
str(rad)  # returns the structure of the dataset: dimensions, variable names, and types
Our dataset has 307,973 rows and 23 columns. According to the column specification above, 16 columns are character, 6 are double, and 1 (Time) is a time column. Accident_Index appears to be a unique identifier. It is long, but with roughly 300,000 rows it is probably better to keep it than to create our own key. The dataset also appears to come from the UK. We did not verify this, but the word "carriageway" (as in Carriageway_Hazards) and the mention of "Kensington and Chelsea", a London borough, both point to a UK source.
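Both claims above can be checked directly. For the key, anyDuplicated() returns 0 when every value is unique and otherwise returns the position of the first repeat. For the UK question, Latitude and Longitude can be compared against a rough bounding box. The limits used here (about 49.8 to 61 degrees north, 8.7 west to 1.8 east) are an approximation. Because the box also covers parts of Ireland and northern France, it can rule a source out but cannot confirm it. The IDs are the six shown by head(rad) plus a deliberate duplicate. The coordinates are hypothetical points.

```r
# The six Accident_Index values printed by head(rad), plus a deliberate
# duplicate (hypothetical) so the check has something to find
ids <- c("200901BS70001", "200901BS70002", "200901BS70003",
         "200901BS70004", "200901BS70005", "200901BS70006",
         "200901BS70006")

anyDuplicated(ids[1:6])   # 0: the six real IDs are unique
anyDuplicated(ids)        # 7: position of the first repeated value
ids[duplicated(ids)]      # the repeated key itself

# Hypothetical coordinates (London, Edinburgh, New York) checked against an
# approximate UK bounding box; the limits are a rough assumption
coords <- data.frame(
  latitude  = c(51.50, 55.95, 40.71),
  longitude = c(-0.19, -3.19, -74.01)
)
in_uk_box <- with(coords,
  latitude >= 49.8 & latitude <= 61 &
  longitude >= -8.7 & longitude <= 1.8)
in_uk_box          # TRUE TRUE FALSE
mean(in_uk_box)    # share of points inside the box
```

On the real data, anyDuplicated(rad$Accident_Index) and the same box test on rad$Latitude and rad$Longitude would answer both questions.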
The dataset contains road accident data classified by type of injury, type of vehicle involved, type of road, type of geographical area, time of day, and road conditions.
What are your motivations for exploring this dataset?
Although this subject was not my first choice, I could not find a dataset that fit what I originally wanted to study. I do have a personal connection to it: I have been in a car accident, and it shook me up for a while. I won't say that driving was hard afterward, but the accident definitely sits in the back of my mind when I drive. Understanding the common causes of car accidents could help ease my mind.
What questions do you want to answer? (broad)
What attributes are most strongly associated with severe car accidents?
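One way to make this question measurable is to score each candidate column by how strongly it is associated with a severe/not-severe indicator. The sketch below uses Cramér's V, computed from a chi-squared statistic, where 0 means no association and 1 a perfect one. This is one possible measure chosen for illustration, not a method the analysis above specifies. It also says nothing about cause or confounding. The sketch assumes "severe" means Serious or Fatal and runs on a small hypothetical data frame that reuses real column names with made-up values.

```r
# Hypothetical toy data: real column names from rad, made-up values
toy <- data.frame(
  Accident_Severity   = c("Slight", "Serious", "Slight", "Fatal", "Slight",
                          "Serious", "Slight", "Slight", "Serious", "Slight"),
  Light_Conditions    = c("Daylight", "Darkness", "Daylight", "Darkness", "Daylight",
                          "Darkness", "Daylight", "Daylight", "Daylight", "Daylight"),
  Urban_or_Rural_Area = c("Urban", "Rural", "Urban", "Rural", "Rural",
                          "Urban", "Urban", "Rural", "Rural", "Urban")
)

# Assumption: "severe" means Serious or Fatal
toy$severe <- toy$Accident_Severity %in% c("Serious", "Fatal")

# Cramér's V: sqrt(chi^2 / (n * (min(rows, cols) - 1)))
cramers_v <- function(x, y) {
  tab <- table(x, y)
  # correct = FALSE skips the Yates correction on 2 x 2 tables;
  # suppressWarnings hides the small-sample warning on this toy data
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  unname(sqrt(chi2 / (sum(tab) * (min(dim(tab)) - 1))))
}

# Score each candidate attribute against the severe indicator and rank them
candidates <- c("Light_Conditions", "Urban_or_Rural_Area")
scores <- sapply(candidates, function(col) cramers_v(toy[[col]], toy$severe))
sort(scores, decreasing = TRUE)   # Light_Conditions about 0.80, Urban_or_Rural_Area about 0.41
```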
Hypothesis
Accidents at junctions in rainy weather are more likely to be severe than accidents in any other situation.
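Read as a comparison of rates, the hypothesis says that the share of severe accidents at junctions in the rain is higher than elsewhere. A minimal base-R sketch with a one-sided prop.test() follows. The data are hypothetical, and the definitions are assumptions:

- severe means Serious or Fatal;
- rainy means the Weather_Conditions label contains "Rain";
- a junction is any Junction_Detail other than "Not at junction".

The real level names should be checked with unique() first. Comparing against all other accidents pooled is a simplification. Comparing against each other weather-by-junction combination would follow the hypothesis more literally. With only ten toy rows, prop.test() warns that the chi-squared approximation may be incorrect, which is expected at this size.

```r
# Hypothetical toy data: real column names, made-up level labels and values
toy <- data.frame(
  Weather_Conditions = c("Raining", "Raining", "Raining", "Raining", "Fine",
                         "Fine", "Fine", "Fine", "Raining", "Fine"),
  Junction_Detail    = c("Crossroads", "T junction", "Crossroads", "Not at junction",
                         "Crossroads", "Not at junction", "T junction",
                         "Not at junction", "T junction", "Crossroads"),
  Accident_Severity  = c("Serious", "Slight", "Serious", "Slight", "Slight",
                         "Slight", "Serious", "Slight", "Fatal", "Slight")
)

# Assumed definitions (check the real labels with unique() first)
severe       <- toy$Accident_Severity %in% c("Serious", "Fatal")
rain_at_junc <- grepl("Rain", toy$Weather_Conditions) &
                toy$Junction_Detail != "Not at junction"

# Severe counts and group sizes: rainy junctions first, all other accidents second
severe_counts <- c(sum(severe[rain_at_junc]), sum(severe[!rain_at_junc]))
group_sizes   <- c(sum(rain_at_junc), sum(!rain_at_junc))

# One-sided test: is the severe share higher at rainy junctions?
result <- prop.test(x = severe_counts, n = group_sizes, alternative = "greater")
result$estimate   # severe share in each group: 0.75 and about 0.17
result$p.value
```

A small p-value on the real data would support the hypothesis but would not establish that rain at junctions causes severity.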
Biases
One bias I may have is that, having been in a car accident myself, I may weight my own experience too heavily. I also bring prior beliefs about driving: I already expect rain and intersections to create volatile driving situations.
Data Dictionary

| Variable Name | Class | Continuity | Description | Suggested R Functions |
|---|---|---|---|---|
| Accident_Index | Character | Discrete | Unique identifier for each accident | unique(), anyDuplicated() |
| `Accident Date` | Character | Discrete | Date when the accident occurred | as.Date(x, format = "%m/%d/%Y"), summary() |
| Month | Factor | Discrete | Month in which the accident occurred (e.g., Jan, Feb) | table(), levels() |
| Day_of_Week | Factor | Discrete | Day of the week when the accident occurred | table(), barplot(table()) |
| Year | Integer | Discrete | Year when the accident occurred | unique(), hist() |
| Junction_Control | Factor | Discrete | Type of junction control at the accident location | table(), prop.table(table()) |
| Junction_Detail | Factor | Discrete | Description of the junction where the accident occurred | table(), unique() |
| Accident_Severity | Factor | Discrete | Severity of the accident (e.g., Slight, Serious) | table(), barplot(table()) |
| Latitude | Numeric | Continuous | Latitude coordinate of the accident location | summary(), range() |
| Longitude | Numeric | Continuous | Longitude coordinate of the accident location | summary(), range() |
| Light_Conditions | Factor | Discrete | Light conditions during the accident (e.g., Daylight, Darkness) | table(), pie(table()) |
| `Local_Authority_(District)` | Factor | Discrete | Local authority district where the accident occurred | table(), length(unique()) |
| Carriageway_Hazards | Factor | Discrete | Any hazards present on the carriageway | table(), which.max(table()) |
| Number_of_Casualties | Integer | Discrete | Number of casualties in the accident | summary(), boxplot() |
| Number_of_Vehicles | Integer | Discrete | Number of vehicles involved in the accident | summary(), hist() |
| Police_Force | Factor | Discrete | Police force responsible for the accident location | table(), sort(table()) |
| Road_Surface_Conditions | Factor | Discrete | Surface condition of the road at the accident location | table(), prop.table(table()) |
| Road_Type | Factor | Discrete | Type of road where the accident occurred | table(), summary() |
| Speed_limit | Integer | Discrete | Speed limit on the road where the accident occurred | summary(), hist() |
| Time | Time (hms) | Discrete | Time when the accident occurred (HH:MM) | substr(), strptime(Time, format="%H:%M") |
| Urban_or_Rural_Area | Factor | Discrete | Indicates if the accident occurred in an urban or rural area | table(), barplot(table()) |
| Weather_Conditions | Factor | Discrete | Weather conditions during the accident (e.g., Fine, Rain, Fog) | |