This report examines traffic accidents in the United States from February 2016 to March 2023. Road safety remains a major problem: millions of accidents happen every year, resulting in serious injuries, fatalities, and large financial losses. Identifying the reasons behind these collisions is essential to creating plans that lower risk and improve traffic safety. Several factors contribute to accidents, but this report focuses on weather conditions, temperature, time of year, and hour of the day. The report is intended as a resource for ordinary citizens and for transportation authorities. The findings can help officials plan emergency responses, build safer road infrastructure, and allocate resources.
The dataset includes approximately 7.7 million accident records collected from 2016 to 2023 across 49 states in the United States. It provides a comprehensive overview of traffic incidents and covers a wide range of factors, such as visibility, temperature, wind direction, weather conditions, and road features, which offer insight into how environmental and situational factors contribute to accidents.
The following tabs provide detailed information about accidents that occurred in the US from February 2016 to March 2023. Each tab contains a different visualization that highlights one aspect of the accidents.
The bar chart displays the 10 U.S. states with the highest number of traffic accidents from 2016 to 2023. California (CA) has the highest accident count, followed by Florida (FL) and Texas (TX).
# Packages needed for this chart
library(dplyr)
library(ggplot2)
# Count accidents per state and keep the 10 states with the highest totals
top_states <- df %>%
  group_by(State) %>%
  summarise(Accident_Count = n(), .groups = "drop") %>%
  slice_max(Accident_Count, n = 10)
# Bars are ordered by accident count, from lowest to highest
plot_top_states <- ggplot(top_states, aes(x = reorder(State, Accident_Count), y = Accident_Count, fill = State)) +
  geom_col() +
  labs(title = "Top 10 States with Most Accidents from 2016 to 2023", x = "State", y = "Number of Accidents") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
print(plot_top_states)
The pie charts show, for each year, how accidents are distributed across weather conditions. Conditions that account for less than 1% of accidents are grouped under "Other".
# Parse the timestamp, derive the year, and keep records from 2016 to 2023
df <- df %>%
  mutate(Start_Time = as.POSIXct(Start_Time, format = "%Y-%m-%d %H:%M:%S"),
         Year = format(Start_Time, "%Y")) %>%
  filter(Year %in% 2016:2023)
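# Count accidents per year and weather condition, then compute each
# condition's share within its own year (not across the whole table)
weather_summary <- df %>%
  group_by(Year, Weather_Condition) %>%
  summarise(Count = n(), .groups = "drop_last") %>%
  mutate(Percentage = Count / sum(Count) * 100) %>%
  ungroup()
# Merge conditions below 1% of a year's accidents into "Other" and
# recompute the yearly shares so that each year sums to 100%
weather_summary <- weather_summary %>%
  mutate(Weather_Condition = ifelse(Percentage < 1, "Other", Weather_Condition)) %>%
  group_by(Year, Weather_Condition) %>%
  summarise(Count = sum(Count), .groups = "drop_last") %>%
  mutate(Percentage = Count / sum(Count) * 100) %>%
  ungroup()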
# One pie per year; each slice is one weather condition
pie_charts <- ggplot(weather_summary, aes(x = "", y = Percentage, fill = Weather_Condition)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar(theta = "y") +
  facet_wrap(~Year, scales = "free") +
  # scale_fill_viridis_d() ships with ggplot2, so no extra package is required
  scale_fill_viridis_d(option = "D") +
  labs(title = "Proportion of US Accidents by Weather Condition and Year",
       x = NULL, y = NULL, fill = "Weather Condition") +
  theme_void() +
  theme(legend.position = "right")
print(pie_charts)
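The yearly pies are only comparable if each year's shares add up to 100%, which means the percentages have to be computed within each year rather than over the whole table. The following is a minimal sketch of that idea on a small invented table; the `toy` data frame and its counts are hypothetical and are not taken from the dataset.

```r
library(dplyr)

# Toy counts, invented for illustration only (not real accident data)
toy <- data.frame(
  Year = c(2016, 2016, 2016, 2017, 2017),
  Weather_Condition = c("Clear", "Rain", "Fog", "Clear", "Snow"),
  Count = c(900, 95, 5, 150, 50)
)

shares <- toy %>%
  group_by(Year) %>%
  # Share of each condition within its own year
  mutate(Percentage = Count / sum(Count) * 100) %>%
  # Conditions below 1% of the year's accidents are merged into "Other"
  mutate(Weather_Condition = ifelse(Percentage < 1, "Other", Weather_Condition)) %>%
  group_by(Year, Weather_Condition) %>%
  summarise(Count = sum(Count), .groups = "drop_last") %>%
  # Recompute the shares after merging; the data is still grouped by Year
  mutate(Percentage = Count / sum(Count) * 100) %>%
  ungroup()

print(shares)
```

In this toy table, Fog makes up 0.5% of the 2016 records and is therefore relabelled "Other", while the shares for each year still sum to 100.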
The line graph shows the number of accidents reported for each hour of the day. Accidents are most frequent during the morning peak of 8-9 AM and least frequent at 3-4 AM.
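library(lubridate)
# Extract the hour of day. Start_Time may already be a date-time from the
# earlier step, so it is parsed only when it is still text.
df <- df %>%
  mutate(Start_Time = if (is.character(Start_Time)) ymd_hms(Start_Time) else Start_Time,
         Hour = hour(Start_Time)) %>%
  filter(!is.na(Start_Time))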
# Count accidents for each hour of the day (0 = midnight, 23 = 11 PM)
hourly_accidents <- df %>%
  group_by(Hour) %>%
  summarise(Accident_Count = n(), .groups = "drop")
max_accidents <- max(hourly_accidents$Accident_Count)
min_accidents <- min(hourly_accidents$Accident_Count)
# The busiest and quietest hours are highlighted in green and labelled with their counts
hourly_plot <- ggplot(hourly_accidents, aes(x = Hour, y = Accident_Count)) +
  geom_line(group = 1, color = "black") +
  geom_point(color = "red", size = 3, show.legend = FALSE) +
  geom_text(data = subset(hourly_accidents, Accident_Count == max_accidents | Accident_Count == min_accidents),
            aes(label = Accident_Count), vjust = -0.5, color = "blue") +
  geom_point(data = subset(hourly_accidents, Accident_Count == max_accidents | Accident_Count == min_accidents),
             aes(x = Hour, y = Accident_Count), color = "green", size = 4) +
  # Tick labels use the actual hour values (0-23) so they match the Hour column
  scale_x_continuous(breaks = 0:23) +
  labs(title = "Accidents Reported by Hour", x = "Hour of the Day", y = "Number of Accidents") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
print(hourly_plot)
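The busiest and quietest hours can also be read from the hourly table directly instead of off the chart. The sketch below assumes a table with the same `Hour` and `Accident_Count` columns; the `hourly_toy` values and the `hour_label()` helper are hypothetical and exist only for illustration.

```r
# Toy counts, invented for illustration only (not real accident data)
hourly_toy <- data.frame(
  Hour = c(3, 8, 12, 17),
  Accident_Count = c(40, 520, 300, 480)
)

# Hypothetical helper: turn an hour value (0-23) into a readable one-hour window
hour_label <- function(h) sprintf("%02d:00-%02d:00", h, (h + 1) %% 24)

peak_hour <- hourly_toy$Hour[which.max(hourly_toy$Accident_Count)]
quiet_hour <- hourly_toy$Hour[which.min(hourly_toy$Accident_Count)]

hour_label(peak_hour)   # "08:00-09:00"
hour_label(quiet_hour)  # "03:00-04:00"
```

Because hour values start at 0, a value of 8 covers the window from 8:00 to 8:59, which is worth keeping in mind when comparing the table with the axis labels.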
The bar chart displays the 10 cities with the most accidents in each year from 2016 to 2023. Miami and Houston consistently report high accident numbers.
# Count accidents per city and year, then keep the 10 busiest cities in each year.
# Year is derived without overwriting Start_Time, so the time of day stays available.
top_cities_by_year <- df %>%
  mutate(Year = as.integer(format(as.Date(Start_Time), "%Y"))) %>%
  filter(Year >= 2016, Year <= 2023) %>%
  group_by(Year, City) %>%
  summarise(Accident_Count = n(), .groups = "drop") %>%
  group_by(Year) %>%
  slice_max(order_by = Accident_Count, n = 10) %>%
  arrange(desc(Accident_Count), .by_group = TRUE) %>%
  ungroup()
plot_top_cities <- ggplot(top_cities_by_year, aes(x = reorder(City, Accident_Count), y = Accident_Count, fill = factor(Year))) +
  geom_col() +
  coord_flip() +
  labs(title = "Top 10 Cities with Most Accidents in each Year (2016-2023)", x = "City", y = "Number of Accidents", fill = "Year") +
  theme_minimal() +
  theme(axis.text.y = element_text(angle = 0, hjust = 1), legend.position = "bottom") +
  scale_fill_brewer(palette = "Set3")
print(plot_top_cities)
The heatmap shows the number of accidents by temperature bracket and month. More accidents are recorded at warmer temperatures than at colder ones.
# Month number ("01" to "12") taken from the accident date
df$Month <- format(as.Date(df$Start_Time), "%m")
# Convert temperature to numeric and drop records without a reading
df$Temperature.F <- as.numeric(as.character(df$Temperature.F))
na_count <- sum(is.na(df$Temperature.F))
df <- df[!is.na(df$Temperature.F), ]
min_temp <- min(df$Temperature.F, na.rm = TRUE)
max_temp <- max(df$Temperature.F, na.rm = TRUE)
# Keep readings between -29 °F and 131 °F, the range covered by the brackets below
df_filtered <- df[df$Temperature.F >= -29 & df$Temperature.F <= 131, ]
if (nrow(df_filtered) == 0) {
  stop("No valid temperature data available after filtering.")
}
df <- df_filtered
# Group temperatures into 10-degree brackets
df$Temp_Bracket <- cut(df$Temperature.F,
                       breaks = seq(from = -29, to = 131, by = 10),
                       include.lowest = TRUE)
# Count accidents for each month and temperature bracket
accidents_summary <- df %>%
  group_by(Month, Temp_Bracket) %>%
  summarise(Accident_Count = n(), .groups = "drop")
heatmap_plot <- ggplot(accidents_summary, aes(x = Month, y = Temp_Bracket, fill = Accident_Count)) +
  geom_tile(color = "black") +
  scale_fill_gradient(low = "blue", high = "red") +
  labs(title = "Heatmap of Accidents by Temperature and Month",
       x = "Month (1=Jan, 12=Dec)",
       y = "Temperature (°F)",
       fill = "Number of Accidents") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 0, hjust = 0.5))
print(heatmap_plot)
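The rows of the heatmap come from `cut()`, which assigns each temperature reading to a 10-degree bracket. The short sketch below uses the same breaks on a few invented readings (the `temps` vector is hypothetical) to show how the bracket labels are formed and what happens to values outside the range.

```r
# Example readings in °F, invented for illustration only
temps <- c(-29, 32, 75, 131)
breaks <- seq(from = -29, to = 131, by = 10)

# include.lowest = TRUE closes the first bracket on the left, so -29 is kept
brackets <- cut(temps, breaks = breaks, include.lowest = TRUE)
print(data.frame(Temperature = temps, Bracket = as.character(brackets)))
# Brackets: "[-29,-19]", "(31,41]", "(71,81]", "(121,131]"

# A reading outside the breaks is not assigned to any bracket
out_of_range <- cut(140, breaks = breaks, include.lowest = TRUE)
is.na(out_of_range)  # TRUE
```

Each bracket includes its upper bound but not its lower bound, except for the first bracket, which includes both.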
Hazardous conditions are not necessarily the major cause of accidents, as a considerable number of incidents take place in cloudy, overcast, or fair weather. Because traffic is heavier in favorable conditions, clear weather does not always translate into safer roads. California has the most accidents, which is plausible given that it is one of the largest states. Crowded cities such as Miami and Los Angeles are accident hotspots. Accidents tend to happen more during the peak hours of 7-9 AM and 4-7 PM, when traffic is heavy and people are in a hurry. Extremely cold and extremely hot temperatures have lower accident counts, possibly due to reduced travel activity, whereas warm temperatures of 50-80 °F see many more accidents, likely because of heavier traffic and more outdoor activity. We can conclude that weather conditions, temperature, time of day, and how crowded a city is all play a part in accidents. By keeping these factors in mind, citizens and the relevant authorities can make travel safer.