The accidents_data_raw dataset contains over 200,000 observations of 24 variables, stored as character, integer, and numeric types. It describes traffic accidents from 2016 to 2023, with variables covering the conditions, type, and outcomes of each accident. On initial inspection, “crash_date” stands out as a column of interest: it is stored as a character but would be more useful as a datetime, so it is a candidate for conversion later. The author states that the data for this dataset was “obtained from the internet”.
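A minimal sketch of the crash_date conversion, assuming the column is formatted like "07/29/2023 01:00:00 PM". The source does not state the format, so the format string, the toy rows, and the new column names (crash_datetime, crash_year, crash_month) are all assumptions to check against the real data.

```r
library(dplyr)

# Toy stand-in for accidents_data_raw; the timestamp format is an assumption.
accidents_data_raw <- data.frame(
  crash_date = c("07/29/2023 01:00:00 PM", "01/05/2016 11:30:00 AM"),
  stringsAsFactors = FALSE
)

accidents_data <- accidents_data_raw %>%
  mutate(
    # Parse the character column into POSIXct; values that do not match
    # the format become NA. AM/PM parsing via %p can depend on the locale.
    crash_datetime = as.POSIXct(crash_date,
                                format = "%m/%d/%Y %I:%M:%S %p",
                                tz = "UTC"),
    # Hypothetical helper columns for later grouping by year or month.
    crash_year  = as.integer(format(crash_datetime, "%Y")),
    crash_month = as.integer(format(crash_datetime, "%m"))
  )

# Count rows that failed to parse before relying on the new column.
sum(is.na(accidents_data$crash_datetime))
```

If many rows come back as NA, the assumed format string is probably wrong for this dataset and should be adjusted.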
I want to explore this dataset to gain a better understanding of the potential causes of traffic accidents. Last year, my sister was in a car accident that left her with a broken leg. She wasn’t able to walk for a while, and still has a bit of a limp. I’ve always been a careful driver, but since then I’ve been extra vigilant. Exploring this data will help me be more aware of the conditions behind traffic accidents and how I can best avoid them.
Questions
In what weather and lighting conditions do the most traffic accidents occur?
What is the most common primary cause of severe traffic accidents?
Is there a correlation between crash type, severity, and/or damage?
Do significantly more traffic accidents occur during a specific time of year?
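As a starting point for the first question, the sketch below counts accidents for each combination of weather and lighting. The toy rows and category labels are placeholders, not values confirmed from the dataset, and n_accidents is a made-up column name.

```r
library(dplyr)

# Toy rows; the category labels are placeholders, not confirmed values.
accidents <- data.frame(
  weather_condition  = c("CLEAR", "CLEAR", "RAIN", "CLEAR", "SNOW"),
  lighting_condition = c("DAYLIGHT", "DAYLIGHT", "DARKNESS", "DARKNESS", "DAYLIGHT"),
  stringsAsFactors = FALSE
)

# Number of accidents per weather/lighting combination, most frequent first.
condition_counts <- accidents %>%
  count(weather_condition, lighting_condition, name = "n_accidents", sort = TRUE)

condition_counts
```

Raw counts mostly reflect how often people drive in each condition, so the most common combination is not necessarily the most dangerous one.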
Hypothesis
Traffic accidents that result in severe injuries (fatal or incapacitating) occur more often in adverse weather and poor lighting, which would indicate a significant positive correlation between poor driving conditions and accident-related injuries.
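One way this hypothesis could be checked is to flag each accident as severe or not, flag the conditions as poor or not, and compare the share of severe accidents between the two groups. The sketch below does this on toy rows. The labels "CLEAR" and "DAYLIGHT", and the definition of poor conditions as anything other than clear weather in daylight, are assumptions rather than facts from the dataset.

```r
library(dplyr)

# Toy rows; the labels "CLEAR" and "DAYLIGHT" are assumed, not confirmed.
accidents <- data.frame(
  weather_condition       = c("CLEAR", "RAIN", "CLEAR", "SNOW", "CLEAR", "RAIN"),
  lighting_condition      = c("DAYLIGHT", "DARKNESS", "DAYLIGHT", "DAYLIGHT", "DARKNESS", "DARKNESS"),
  injuries_fatal          = c(0, 1, 0, 0, 0, 0),
  injuries_incapacitating = c(0, 0, 0, 1, 0, 1),
  stringsAsFactors = FALSE
)

severity_by_condition <- accidents %>%
  mutate(
    # Severe = at least one fatal or incapacitating injury.
    severe = (injuries_fatal + injuries_incapacitating) > 0,
    # Poor conditions = not clear weather in daylight (an assumed definition).
    poor_conditions = weather_condition != "CLEAR" | lighting_condition != "DAYLIGHT"
  ) %>%
  group_by(poor_conditions) %>%
  summarise(n_accidents = n(), severe_rate = mean(severe), .groups = "drop")

severity_by_condition
```

On the full dataset, a chi-squared test on the two flags (for example with chisq.test()) could then indicate whether the difference in severe rates is statistically significant.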
Ethical Considerations
Because the data was obtained from the internet, it is unclear how it was collected, and its source could be unreliable or inaccurate. There is also no indication of what area or region the data covers. Based on the types of variables in the data, I am assuming that the information likely came from police or insurance reports.
Data Dictionary
| Variable Name | Class/Data Type | Continuity | Description | Suggested R Function |
| --- | --- | --- | --- | --- |
| crash_date | Character | Continuous | The date the accident occurred | separate(), mutate() |
| traffic_control_device | Character | Discrete | The type of traffic control device involved (e.g., traffic light, sign) | group_by() |
| weather_condition | Character | Discrete | The weather conditions at the time of the accident | group_by() |
| lighting_condition | Character | Discrete | The lighting conditions at the time of the accident | group_by() |
| first_crash_type | Character | Discrete | The initial type of the crash (e.g., head-on, rear-end) | group_by() |
| trafficway_type | Character | Discrete | The type of roadway involved in the accident (e.g., highway, local road) | group_by() |
| alignment | Character | Discrete | The alignment of the road where the accident occurred (e.g., straight, curved) | group_by() |
| roadway_surface_cond | Character | Discrete | The condition of the roadway surface (e.g., dry, wet, icy) | group_by() |
| road_defect | Character | Discrete | Any defects present on the road surface | group_by() |
| crash_type | Character | Discrete | The overall type of the crash | group_by() |
| intersection_related_i | Character | Discrete | Whether the accident was related to an intersection | filter(), group_by() |
| damage | Character | Discrete | The extent of the damage caused by the accident | group_by() |
| prim_contributory_cause | Character | Discrete | The primary cause contributing to the crash | group_by() |
| num_units | Numerical: Integer | Discrete | The number of vehicles involved in the accident | filter() |
| most_severe_injury | Character | Discrete | The most severe injury sustained in the crash | group_by() |
| injuries_total | Numerical | Discrete | The total number of injuries reported | filter(), summary() |
| injuries_fatal | Numerical | Discrete | The number of fatal injuries resulting from the accident | filter(), summary() |
| injuries_incapacitating | Numerical | Discrete | The number of incapacitating injuries | filter(), summary() |
| injuries_non_incapacitating | Numerical | Discrete | The number of non-incapacitating injuries | filter(), summary() |
| injuries_reported_not_evident | Numerical | Discrete | The number of injuries reported but not visibly evident | |
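
The filter() and summary() suggestions for the numerical columns could be combined as in the sketch below. The toy rows stand in for the real data, and the cutoff of two or more units is only an example.

```r
library(dplyr)

# Toy rows standing in for the real dataset.
accidents <- data.frame(
  num_units      = c(2L, 1L, 3L, 2L),
  injuries_total = c(0, 1, 4, 0)
)

# Keep accidents involving two or more units (an example cutoff),
# then summarise the distribution of total injuries.
multi_vehicle <- accidents %>%
  filter(num_units >= 2)

summary(multi_vehicle$injuries_total)
```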