Chicago Traffic Accidents Project

Introduction

As in any city that has road infrastructure, traffic accidents are quite common in Chicago, Illinois. The purpose of this project is to leverage visualizations to determine which conditions lead to the most traffic accidents in the city. The dataset used in this project is taken from the Chicago Police Department’s electronic crash reporting system (known as E-Crash). It covers traffic accidents reported from 2013 to January 2024 within the city limits at the time of this project.

Dataset

The data used in this project was found on Kaggle, linked here: https://www.kaggle.com/datasets/anoopjohny/traffic-crashes-crashes/data. There are 794,956 recorded accidents in this dataset. It details crash dates, crash locations, time of crash, outside conditions, and contributing factors. There are 48 collected columns, and I have used CRASH_DATE, POSTED_SPEED_LIMIT, WEATHER_CONDITION, LIGHTING_CONDITION, FIRST_CRASH_TYPE, PRIM_CONTRIBUTORY_CAUSE, DAMAGE, STREET_NAME, INJURIES_TOTAL, ROADWAY_SURFACE_COND, and MOST_SEVERE_INJURY.

For this project I have removed 2013 and 2014 from the data for some charts because the entries are sparse. I have also removed any “Not Applicable”, “Unknown”, or “Unable to Determine” entries from the data to get a clearer picture of which conditions truly lead to the most traffic accidents.

The findings reported here come from data covering January 2015 to January 2024.
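
The two cleaning rules described above (dropping the sparse 2013 and 2014 records, and dropping placeholder values) can be sketched in base R roughly as follows. The toy data frame `crashes` and the helper `is_placeholder` are hypothetical names used only for illustration, the values are invented, and the date format is assumed to match the Kaggle export.

```r
# Toy stand-in for the crash data; values are invented for illustration.
crashes <- data.frame(
  CRASH_DATE = c("03/14/2014 08:15:00 AM", "07/04/2019 05:30:00 PM",
                 "11/20/2021 11:45:00 PM", "01/09/2024 02:10:00 PM"),
  WEATHER_CONDITION = c("CLEAR", "UNKNOWN", "RAIN", "CLEAR"),
  stringsAsFactors = FALSE
)

# The first 10 characters hold the mm/dd/yyyy part of the timestamp.
crash_day <- as.Date(substr(crashes$CRASH_DATE, 1, 10), format = "%m/%d/%Y")
crashes$year <- as.integer(format(crash_day, "%Y"))

# Hypothetical helper: TRUE for values that carry no usable information.
is_placeholder <- function(x) {
  is.na(x) | x %in% c("NOT APPLICABLE", "UNKNOWN", "UNABLE TO DETERMINE")
}

# Keep 2015 onward and drop rows whose weather value is a placeholder.
cleaned <- crashes[crashes$year >= 2015 &
                     !is_placeholder(crashes$WEATHER_CONDITION), ]
print(cleaned[, c("year", "WEATHER_CONDITION")])
```

Filtering on the parsed year, rather than on row positions, keeps the rule valid if the dataset is refreshed and the row order changes.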

Findings

Visualizations include the top 10 most accident-prone Chicago streets by accident count; traffic accidents by month and year; the most common primary causes and which one causes the most injuries; the weather, road, and lighting conditions that lead to the most accidents; and the most common crash types per year. I have also included charts that show the most common posted speed limits at accident locations, the percentage of injury types, and the percentage of damage each year.

Top 10 Most Dangerous Streets in Chicago

This bar chart shows the top 10 most dangerous streets in Chicago. There are thousands of streets in the city, so looking only at the top 10 makes this visualization more manageable. Above each bar is the total number of accidents that occurred on that street from 2013 to January 2024.

#Loading packages used for counting and plotting
library(dplyr)
library(ggplot2)
library(scales) #provides comma() for axis and bar labels

#Creating data frame with count of accidents by street name, sorted from most to fewest
streetcount <- data.frame(count(df, STREET_NAME))
streetcount <- streetcount[order(streetcount$n, decreasing = TRUE), ]

rownames(streetcount) <- 1:nrow(streetcount) #making data frame index start from 1

streetcount$n <- as.numeric(streetcount$n) #making "n" into a numeric value for bar chart

#Bar chart of the 10 streets with the most accidents, with the count printed above each bar
chart1 <- ggplot(streetcount[1:10, ], aes(x = reorder(STREET_NAME, -n), y = n)) +
  theme_bw() +
  geom_bar(colour = "black", fill = "#08519C", stat = "identity") +
  labs(title = "10 Most Dangerous Streets in Chicago", x = "Street Name", y = "Number of Traffic Accidents") +
  theme(plot.title = element_text(hjust = 0.5), text = element_text(size = 10)) +
  geom_text(aes(label = comma(n)), vjust = -.5) +
  scale_y_continuous(labels = comma)

chart1

Western Avenue is the most accident-prone street in Chicago. According to the Chicago Public Library, Western Ave. is actually famous for being Chicago’s longest street. It is 24 miles long and runs from Howard Street at the northern city limit to 119th Street, the southern city limit. It even continues south of the city for approximately another 26 miles, although any accidents past the city limits are not included in this project.

It does seem reasonable to me that a long, famous street that cuts right through the middle of the city would have the most traffic accidents because of the volume of traffic traveling on it each day. There are residences, businesses, and industrial buildings all located on this road.
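
A street can have the most accidents without accounting for most of the accidents citywide. One way to check this is to compute each street's share of the total, sketched here in base R. The street list is a toy example with invented counts, and `top1_share` and `top3_share` are hypothetical names.

```r
# Toy street names; counts are invented and do not reflect the real data.
street <- c(rep("WESTERN AVE", 5), rep("PULASKI RD", 3),
            rep("CICERO AVE", 2), rep("ASHLAND AVE", 2),
            "HALSTED ST", "STATE ST", "CLARK ST")

# Count accidents per street and sort from most to fewest.
street_counts <- sort(table(street), decreasing = TRUE)

# Share of all accidents on the single busiest street and on the top 3.
top1_share <- as.numeric(street_counts[1]) / length(street)
top3_share <- sum(street_counts[1:3]) / length(street)

print(round(100 * c(top1 = top1_share, top3 = top3_share), 1))
```

In this toy example the busiest street leads the ranking while holding only a third of all accidents, which is the distinction between "the most" and "the majority".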

The rest of the roads included in this top 10 are major highways or streets, significant to residents, and very busy. N Michigan Ave. even contains the Magnificent Mile! This visualization demonstrates that large and busy streets often lead to more traffic accidents.

Traffic Accidents by Month and Year

This multi-line chart shows the number of accidents per month, broken down by year. 2013 and 2014 have been removed from this set because there is a single-digit number of accidents in the dataset for both years combined. 2024 only includes a point for January because, at the time of this project, that is all the available data for this year.

library(lubridate) #provides year() and mdy_hms() for parsing CRASH_DATE
library(ggrepel) #provides geom_label_repel()

#Counting accidents per year and month
#2013 and 2014 are filtered out by year (instead of by row position) because there is not enough data
months_df2 <- df %>%
  select(CRASH_DATE) %>%
  mutate(year = year(mdy_hms(CRASH_DATE)), 
         months = months(mdy_hms(CRASH_DATE), abbreviate = TRUE)) %>%
  filter(!year %in% c(2013, 2014)) %>%
  group_by(year, months) %>%
  summarise(n = length(CRASH_DATE), .groups = 'keep') %>%
  data.frame()

months_df2$year <- as.factor(months_df2$year) #making "year" into a factor

#putting the abbreviated month names in calendar order for the x axis
month_order2 <- factor(months_df2$months, levels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"))

chart2 <- ggplot(months_df2, aes(x = month_order2, y = n, group = year)) +
  geom_line(aes(color = year), linewidth = 3) +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5),
        axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  labs(title = "Traffic Accidents in Chicago per Month", x = "Months", y = "Number of Traffic Accidents") +
  geom_point(shape = 21, size = 4, color = "black", fill = "white") +
  scale_y_continuous(labels = comma) +
  scale_color_brewer(palette = "RdBu", name = "Year", guide = guide_legend(reverse = TRUE)) +
  #labeling the single January 2024 point (508 accidents)
  geom_label_repel(aes(label = ifelse(n == 508, "Jan 2024, 508", "")), 
                   box.padding = 1, 
                   point.padding = 1, 
                   size = 4, 
                   color = "darkgray", 
                   segment.color = "darkgray") +
  facet_wrap(~year, ncol = 5, nrow = 5)

chart2

According to this chart, it appears that most traffic accidents occur from about May to October. There are always events, festivals, and concerts happening in this time frame. According to The Savvy Globetrotter, the summer and early fall are the most popular seasons for tourists to visit Chicago because the weather is great and there are so many outdoor events to enjoy. More tourists likely means more cars on Chicago roads, and therefore more traffic accidents.

I also noted an interesting dip in accidents from February to May of 2020, during the COVID pandemic stay-at-home orders. Fewer people on the streets appears to correlate with fewer traffic accidents.

The dramatic drop from January and December 2023 to January 2024 is also intriguing. I wonder if the data was not entirely up-to-date for January 2024, or if there was another reason for the sudden decrease.
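
One way to test whether a month is only partially recorded is to look at the latest crash day present in each month. The sketch below uses base R on a handful of invented dates; `crash_day`, `year_month`, and `last_day` are hypothetical names, and the real check would run on the parsed CRASH_DATE column.

```r
# Toy crash dates (mm/dd/yyyy); invented for illustration.
crash_day <- as.Date(c("12/30/2023", "12/31/2023", "01/02/2024",
                       "01/05/2024", "01/09/2024"), format = "%m/%d/%Y")

# Label each crash with its year-month, e.g. "2024-01".
year_month <- format(crash_day, "%Y-%m")

# Latest recorded day of the month within each year-month.
last_day <- tapply(as.integer(format(crash_day, "%d")), year_month, max)
print(last_day)
```

If the latest day recorded for January 2024 falls well before the 31st, the low count would more likely reflect an incomplete month than a real drop in accidents.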

Total Injuries by Primary Cause of Traffic Accidents

This dual-axis chart shows the primary cause of each accident as recorded by the police. There are 40 different primary causes listed in the data, so I decided to include only causes with more than 1,000 accidents; in the code below, that threshold is applied to each cause's count within a single year. That turned out to be 12 primary causes. The line shows the total number of injuries for each cause.
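
A 1,000-accident threshold can be read in two ways: against a cause's total across all years, or against its count in any single year. The base R sketch below shows that the two readings can keep different causes. The counts are invented, and `cause_year`, `keep_by_total`, and `keep_by_year` are hypothetical names.

```r
# Toy counts per cause and year; numbers are invented for illustration.
cause_year <- data.frame(
  cause = c("FAILING TO YIELD", "FAILING TO YIELD",
            "IMPROPER BACKING", "IMPROPER BACKING",
            "ANIMAL", "ANIMAL"),
  year  = c(2022, 2023, 2022, 2023, 2022, 2023),
  n     = c(1500, 1700, 700, 600, 40, 30)
)

# Option A: keep a cause if its total across all years exceeds the threshold.
totals <- tapply(cause_year$n, cause_year$cause, sum)
keep_by_total <- names(totals)[totals > 1000]

# Option B: keep a cause if it exceeds the threshold in at least one year.
yearly_max <- tapply(cause_year$n, cause_year$cause, max)
keep_by_year <- names(yearly_max)[yearly_max > 1000]

print(keep_by_total)
print(keep_by_year)
```

Here "IMPROPER BACKING" passes the all-years test (1,300 in total) but not the single-year test, so it is worth stating which reading a chart uses.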

#Setting up table for stacked bar chart------
library(tidyr) #provides drop_na()
library(DescTools) #provides the %like any% operator

#Creating df with primary cause of accident and accident year (the year column is named "Year" here)
new_table <- data.frame(count(df, PRIM_CONTRIBUTORY_CAUSE, Year = year(mdy_hms(CRASH_DATE))))

#Removing NAs from df
new_table <- drop_na(new_table)

new_table <- new_table[order(new_table$n, decreasing = TRUE), ]

#Removing "unable to determine" from primary cause of accident
UDRows2 <- which(new_table$PRIM_CONTRIBUTORY_CAUSE %like any% c("%UNABLE TO DETERMINE%"))
new_table2 <- new_table[-UDRows2, ]

#Removing 2013 and 2014 from table because there isn't enough data
#(matching on the year value instead of hard-coded row numbers)
new_table2 <- new_table2[!new_table2$Year %in% c(2013, 2014), ]

#Making table index start from 1
rownames(new_table2) <- 1:nrow(new_table2)

#Keeping only cause-and-year rows with a count over 1,000 (n is the count for one cause in one year)
new_table3 <- new_table2 %>%
  filter(n > 1000)

new_table3$Year <- as.factor(new_table3$Year) #making "Year" a factor to use in fill


#Creating df with total number of accidents per primary cause
primcount <- data.frame(count(df, PRIM_CONTRIBUTORY_CAUSE))

primcount <- primcount[order(primcount$n, decreasing = TRUE), ]

#Removing "unable to determine" rows from totals per cause
UDPrim <- which(primcount$PRIM_CONTRIBUTORY_CAUSE %like any% c("%UNABLE TO DETERMINE%"))

primcount <- primcount[-UDPrim, ]

#Removing NA from totals per cause
primcount <- drop_na(primcount)


#Adding INJURY TOTAL line to stacked bar chart----

#finding total injuries for top primary causes
prim_top <- primcount$PRIM_CONTRIBUTORY_CAUSE[1:12] #Top 12 primary causes in stacked bar



#Adding up all injuries for each primary cause
injuries_df <- df %>%
  filter(PRIM_CONTRIBUTORY_CAUSE %in% prim_top) %>%
  select(PRIM_CONTRIBUTORY_CAUSE, INJURIES_TOTAL) %>%
  drop_na(INJURIES_TOTAL) %>%
  group_by(PRIM_CONTRIBUTORY_CAUSE) %>%
  summarise(totinjuries = sum(INJURIES_TOTAL)) %>%
  data.frame()

injuries_df$totinjuries <-as.numeric(injuries_df$totinjuries) #making sure total injuries are a number to use in line plot


#Making 2nd y axis labels and breaks
ylab <- seq(0, max(injuries_df$totinjuries), 3000)


mylabels <- paste0(ylab)


#Code for dual axis chart
#Only one scale_y_continuous() is kept: a second call replaces the first,
#so the comma labels and the secondary axis are set together
chart3 <- ggplot(new_table3, aes(x = reorder(PRIM_CONTRIBUTORY_CAUSE, n, sum), y = n, fill = Year)) +
  geom_bar(stat = "identity", position = position_stack(reverse = TRUE)) +
  coord_flip() +
  theme_bw() +
  labs(title = "Total Injuries by Primary Cause of Traffic Accidents", x = "Cause of Accident", y = "Number of Accidents") +
  theme(plot.title = element_text(hjust = 0.5), text = element_text(size = 10)) +
  scale_fill_brewer(palette = "Spectral", guide = guide_legend(reverse = TRUE)) +
  #injury totals are doubled so the line sits on the accident-count scale
  geom_line(inherit.aes = FALSE, data = injuries_df,
            aes(x = PRIM_CONTRIBUTORY_CAUSE, y = totinjuries*2, color = "Total Injuries", group = 1), linewidth = 1) +
  scale_color_manual(NULL, values = "black") +
  #the secondary axis halves the values again so its labels show raw injury totals
  scale_y_continuous(labels = comma, 
                     sec.axis = sec_axis(~. /2, name = "Total Injuries", labels = mylabels, breaks = ylab)) +
  geom_point(inherit.aes = FALSE, data = injuries_df,
             aes(x = PRIM_CONTRIBUTORY_CAUSE, y = totinjuries*2, group = 1),
             size = 2, shape = 21, fill = "white", color = "black")


chart3
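
The dual-axis code above relies on a simple round trip: injury totals are multiplied by a factor so the line can be drawn on the accident-count axis, and the secondary axis divides by the same factor so its labels read as raw injury counts. A minimal base R sketch of that arithmetic follows. The numbers are invented, and `scale_factor` and `data_driven_factor` are hypothetical names.

```r
# Invented injury totals and accident totals for three causes.
total_injuries  <- c(27000, 12000, 3000)
accident_totals <- c(88000, 60000, 20000)

# The chart code multiplies injuries by 2 to place them on the accident-count axis.
scale_factor <- 2
plotted_position <- total_injuries * scale_factor

# The secondary axis applies the inverse, so labels read as raw injury counts.
secondary_label <- plotted_position / scale_factor

# Alternative (an assumption, not what the chart uses): derive the factor from
# the data so the tallest line point reaches the tallest bar.
data_driven_factor <- max(accident_totals) / max(total_injuries)

print(plotted_position)
print(secondary_label)
```

Whatever factor is chosen, the transformation passed to the secondary axis has to be its exact inverse, or the right-hand labels will no longer match the plotted line.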

“Failing to Yield Right-of-Way” is the most common primary cause of traffic accidents in Chicago. It is also, by far, the cause that leads to the most injuries for those involved, with 27,640 total injuries. Most of these primary causes can be attributed to human error.

The number of injuries sustained for each cause seems to be higher for causes that probably involve more reckless or distracted driving, or higher speeds. “Following too Closely”, “Failing to Reduce Speed to Avoid Crash”, and “Disregarding Traffic Signals” all have a higher number of total injuries reported. Meanwhile, “Improper Backing” and “Disregarding Stop Sign”, two primary causes that would probably happen at lower speeds or in a neighborhood, have relatively few injuries reported.
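
Total injuries depend partly on how often a cause occurs, so a rate such as injuries per 100 accidents can make causes of different volumes easier to compare. The sketch below shows the calculation in base R. All numbers are invented for illustration and are not taken from the dataset; `cause_summary` and `injuries_per_100` are hypothetical names.

```r
# Invented totals per cause; the real values would come from the dataset.
cause_summary <- data.frame(
  cause     = c("FOLLOWING TOO CLOSELY", "IMPROPER BACKING",
                "DISREGARDING TRAFFIC SIGNALS"),
  accidents = c(80000, 30000, 15000),
  injuries  = c(12000, 900, 9000)
)

# Injuries per 100 accidents puts causes with different volumes on one scale.
cause_summary$injuries_per_100 <-
  100 * cause_summary$injuries / cause_summary$accidents

# Sort so the causes with the highest injury rate come first.
cause_summary <- cause_summary[order(-cause_summary$injuries_per_100), ]
print(cause_summary)
```

In this toy data the cause with the fewest accidents has the highest injury rate, which a chart of totals alone would not show.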

It appears that most accidents happen due to human error, and failing to yield right-of-way while driving is especially dangerous for motorists in Chicago.

Weather, Roadway, and Lighting Conditions

These heatmaps look at the environmental conditions reported during each accident. Weather Condition is the state of the weather at the time of the accident. Lighting Condition is the amount and quality of light present at the time of the accident, listed by time of day. In the second heatmap, Roadway Surface Condition is what was on the road at the time of the accident. Blank cells mean that no accidents with that combination of conditions were reported in the dataset.

#Creating data frame for heatmap. Weather is weather condition at time of accident, lighting is the time of day.
condition_df <- df %>%
  select(WEATHER_CONDITION, LIGHTING_CONDITION) %>%
  group_by(WEATHER_CONDITION, LIGHTING_CONDITION) %>%
  summarise(n = length(WEATHER_CONDITION), .groups = 'keep') %>%
  drop_na(WEATHER_CONDITION, LIGHTING_CONDITION) %>%
  data.frame

#Removing Unknown and Other from data
ConUnknown <- which(condition_df$WEATHER_CONDITION %like any% c("%UNKNOWN%"))
ConUnknown2 <- which(condition_df$LIGHTING_CONDITION %like any% c("%UNKNOWN%"))
ConOther <- which(condition_df$WEATHER_CONDITION %like any% c("%OTHER%"))
ConOther2 <- which(condition_df$LIGHTING_CONDITION %like any% c("%OTHER%"))

condition_df <- condition_df[-c(ConUnknown, ConUnknown2, ConOther, ConOther2), ]

#Creating breaks for heatmap gradient
conbreaks <- c(seq(0, max(condition_df$n), by=50000))

#Building heatmap
chart4 <- ggplot(condition_df, aes(x = LIGHTING_CONDITION, y = WEATHER_CONDITION, fill = n)) +
  geom_tile(color = "black") +
  geom_text(aes(label = comma(n))) +
  theme_bw() +
  labs(title = "Environmental Elements Leading to Accidents", x = "Lighting Condition", y = "Weather Condition",
       fill = "Accident Count") +
  theme(plot.title = element_text(hjust = 0.5), text = element_text(size = 10)) +
  scale_fill_continuous(low = "white", high = "#08519C", labels = comma, breaks = conbreaks) +
  guides(fill = guide_legend(reverse=TRUE, override.aes = list(color="black")))

chart4

Surprisingly, most accidents happen during the day in clear weather. The next most common combination is also clear weather, at night on a lighted road. I had expected accidents to be much more common at night or in bad weather.

The National Safety Council says that night is the most dangerous time to drive, and this is demonstrated somewhat in this visualization. “Darkness” and “Darkness, Lighted Road” do have more accidents for each type of weather than “Dusk” or “Dawn”. The second most common weather and lighting combination, “Darkness, Lighted Road – Clear” still has 292,634 fewer accidents than “Daylight – Clear”.
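
Because clear daylight dominates the raw counts, it can help to look at the share of each lighting condition within a weather condition, alongside the gap between the two largest cells. The base R sketch below uses an invented count matrix; `counts`, `lighting_share`, and `gap` are hypothetical names.

```r
# Invented accident counts by weather (rows) and lighting (columns).
counts <- matrix(c(400, 100, 40, 20,
                    60,  30, 10,  5),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(weather  = c("CLEAR", "RAIN"),
                                 lighting = c("DAYLIGHT", "DARKNESS, LIGHTED ROAD",
                                              "DARKNESS", "DUSK")))

# Share of each lighting condition within a weather condition (rows sum to 1).
lighting_share <- prop.table(counts, margin = 1)

# Gap between the two largest cells, analogous to the comparison in the text.
sorted_cells <- sort(as.vector(counts), decreasing = TRUE)
gap <- sorted_cells[1] - sorted_cells[2]

print(round(lighting_share, 2))
print(gap)
```

Row shares show whether night-time accidents make up a larger fraction in rain than in clear weather, a pattern that raw counts tend to hide.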

Perhaps this is due to an uptick in motorists on the road during the day; I decided to create another chart to take into consideration the potential roadway surface condition.

#creating df with roadway surface and lighting condition
surface_df <- df %>%
  select(ROADWAY_SURFACE_COND, LIGHTING_CONDITION) %>%
  group_by(ROADWAY_SURFACE_COND, LIGHTING_CONDITION) %>%
  summarise(n = length(ROADWAY_SURFACE_COND), .groups = 'keep') %>%
  drop_na(ROADWAY_SURFACE_COND, LIGHTING_CONDITION) %>%
  data.frame

#Removing Unknown and Other from data
ConUnknown3 <- which(surface_df$ROADWAY_SURFACE_COND %like any% c("%UNKNOWN%"))
ConUnknown4 <- which(surface_df$LIGHTING_CONDITION %like any% c("%UNKNOWN%"))
ConOther3 <- which(surface_df$ROADWAY_SURFACE_COND %like any% c("%OTHER%"))
ConOther4 <- which(surface_df$LIGHTING_CONDITION %like any% c("%OTHER%"))

surface_df <- surface_df[-c(ConUnknown3, ConUnknown4, ConOther3, ConOther4), ]

#Creating breaks for heatmap gradient
conbreaks5 <- c(seq(0, max(surface_df$n), by=50000))

#Building heatmap
chart9 <- ggplot(surface_df, aes(x = LIGHTING_CONDITION, y = ROADWAY_SURFACE_COND, fill = n)) +
  geom_tile(color = "black") +
  geom_text(aes(label = comma(n))) +
  theme_bw() +
  labs(title = "Roadway Conditions Leading to Accidents", x = "Lighting Condition", y = "Road Surface Condition",
       fill = "Accident Count") +
  theme(plot.title = element_text(hjust = 0.5), text = element_text(size = 10)) +
  scale_fill_continuous(low = "white", high = "#08519C", labels = comma, breaks = conbreaks5) +
  guides(fill = guide_legend(reverse=TRUE, override.aes = list(color="black")))

chart9

Once again, most accidents happen during the day in dry conditions. This is not surprising in terms of the data. Since most accidents happen in clear weather, it stands to reason that the road would be dry. The next most common roadway condition for accidents to occur is when the road is wet. This data lines up with the previous heatmap quite well.

Most Severe Injuries Sustained in Traffic Accidents

This donut chart shows the percentage of the most severe injuries sustained in a traffic accident for the years 2015-2024. This chart does not count up each injury reported in the accident, only the most severe.

#CHART 5: DONUT OF MOST SEVERE INJURY----
#Counting up most severe injuries into a data frame
top_injuries <- count(df, MOST_SEVERE_INJURY)
top_injuries <- drop_na(top_injuries)
top_injuries <- top_injuries[order(-top_injuries$n), ] #sorting by count, largest first

#Creating dataframe for donut chart (including percentages for pie labels)
severe_df <- df %>%
  select(MOST_SEVERE_INJURY, CRASH_DATE) %>%
  mutate(year = year(mdy_hms(CRASH_DATE))) %>%
  group_by(year, MOST_SEVERE_INJURY) %>%
  summarise(n=length(MOST_SEVERE_INJURY), .groups = 'keep') %>%
  group_by(year) %>%
  mutate(percent_of_total = round(100*n/sum(n),1)) %>%
  ungroup() %>%
  drop_na(MOST_SEVERE_INJURY) %>%
  data.frame()

#Removing 2013 and 2014 from the data frame because there isn't much data
RidRows <- which(severe_df$year %like any% c("2013", "2014"))


severe_df <- severe_df[-RidRows, ]

#Creating donut chart using plotly
library(plotly)
chart5 <- plot_ly(severe_df, labels = ~MOST_SEVERE_INJURY, values = ~n) %>%
  add_pie(hole=0.6) %>%
  layout(title = "Injuries from Chicago Traffic Accidents (2015-2024)") %>%
  layout(annotations=list(text=paste0("Total Number Reported: \n", scales::comma(sum(severe_df$n))), 
                          showarrow=FALSE))

chart5

Out of a total of 793,196 reports, the vast majority of accidents actually had “No indication of Injury” or injuries that were “Reported, Not Evident”. Actual incapacitating, non-incapacitating, or fatal injuries made up a very small percentage of the total. So, most accidents in Chicago from 2015-2024 did not lead to injury.

This may be because the speed limits posted in cities are often lower to account for traffic and population density. Lower speeds are less likely to cause massive damage in an accident.

Cost of Damage per Year

This trellis pie chart breaks down the amount of damage caused by traffic accidents each year. The dataset recorded damage by repair cost for vehicles involved in each accident.

#Counting up damage to put into data frame
top_damage <- count(df, DAMAGE)
top_damage <- drop_na(top_damage)
top_damage <- top_damage[order(-top_damage$n), ] #sorting by count, largest first


#Creating dataframe for trellis pies (including percentages for pie labels)
damage_df <- df %>%
  select(DAMAGE, CRASH_DATE) %>%
  mutate(year = year(mdy_hms(CRASH_DATE))) %>%
  group_by(year, DAMAGE) %>%
  summarise(n=length(DAMAGE), .groups = 'keep') %>%
  group_by(year) %>%
  mutate(percent_of_total = round(100*n/sum(n),1)) %>%
  ungroup() %>%
  drop_na(DAMAGE) %>%
  data.frame()

#Removing 2013 and 2014 from the data frame because there isn't much data
RidRows2 <- which(damage_df$year %like any% c("2013", "2014"))


damage_df <- damage_df[-RidRows2, ]


#Changing order of damage amount legend so largest value is on top of the legend
damage_df$DAMAGE <- factor(damage_df$DAMAGE, levels = c("OVER $1,500", "$501 - $1,500", "$500 OR LESS"))

#Creating pie charts
chart6 <- ggplot(data = damage_df, aes(x = "", y = n, fill = DAMAGE)) +
  geom_bar(stat = "identity", position = "fill") +
  coord_polar(theta = "y", start = 0) +
  labs(fill = "Damage Amount", x = NULL, y = NULL, title = "Amount of Damage from Chicago Traffic Accidents by Year") +
  theme_bw() +
  theme(plot.title = element_text(hjust =0.5),
        axis.text = element_blank(), 
        axis.ticks = element_blank(),
        panel.grid = element_blank()) +
  facet_wrap(~year, ncol=4, nrow=4) +
  scale_fill_brewer(palette = "Blues") +
  geom_text(aes(x=1.8, label = paste0(percent_of_total, "%")),
            size = 4,
            position = position_fill(vjust = 0.5))
chart6

Most traffic accidents led to over $1,500 of damage in every year shown. That share increases each year, reaching a remarkable 77.2% of all accidents in 2024 (which covers January only). The fact that over half of all accidents cause over $1,500 of damage every year is most likely due to increasingly complex vehicle technology, supply chain issues, and inflation, all of which have been affecting the industry for years now.
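
The claim that the top damage bracket's share rises each year can be checked directly from the per-year counts. The following base R sketch uses invented counts; `damage`, `share_over`, and `rises_every_year` are hypothetical names.

```r
# Invented counts per year and damage bracket.
damage <- data.frame(
  year    = rep(c(2021, 2022, 2023), each = 3),
  bracket = rep(c("$500 OR LESS", "$501 - $1,500", "OVER $1,500"), times = 3),
  n       = c(10, 25, 65,  9, 21, 70,  8, 20, 72)
)

# Total accidents per year, used as the denominator.
year_total <- tapply(damage$n, damage$year, sum)

# Percentage of each year's accidents that fall in the top bracket.
over_1500  <- damage[damage$bracket == "OVER $1,500", ]
share_over <- 100 * over_1500$n /
  as.numeric(year_total[as.character(over_1500$year)])

# TRUE only if the share rises in every consecutive year.
rises_every_year <- all(diff(share_over) > 0)

print(share_over)
print(rises_every_year)
```

Using `diff()` turns "increases each year" into a single testable condition rather than a visual judgment across the pie charts.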

This aspect does not have any effect on whether or not an accident actually happens, but damage to vehicles involved is one of the major outcomes of a traffic accident. This and the injury visualization provide insight into the aftermath of most traffic accidents in Chicago. It appears that cars are becoming more expensive to repair and maintain as the years go by.

Top 10 Most Common Speed Limits per Year

This trellis bar chart shows the top 10 posted speed limits by accident count, broken down by year. As with the top 10 streets chart, there were many more posted speed limits included in the data. I decided to look at the top 10 because many of the others had very few accidents. This chart has 2013, 2014, and 2024 removed because they lacked enough entries to properly visualize.

#Making table for top 10 speed limits with the most crashes
top_speed <- data.frame(count(df, POSTED_SPEED_LIMIT))

top_speed <- top_speed[order(top_speed$n, decreasing = TRUE), ]

top_speed <- top_speed$POSTED_SPEED_LIMIT[1:10]

#Making data frame for bar chart
speed_count <- df %>%
  filter(POSTED_SPEED_LIMIT %in% top_speed) %>%
  select(CRASH_DATE, POSTED_SPEED_LIMIT) %>%
  mutate(year = year(mdy_hms(CRASH_DATE))) %>%
  group_by(POSTED_SPEED_LIMIT, year) %>%
  summarise(n = length(POSTED_SPEED_LIMIT), .groups = 'keep') %>%
  data.frame()

#Dropping 2013, 2014, and 2024 due to not enough data
DropYears <- which(speed_count$year %like any% c("%2013%", "%2014%", "%2024%"))
speed_count <- speed_count[-c(DropYears), ]


speed_count$year <- as.factor(speed_count$year)
speed_count$POSTED_SPEED_LIMIT <- as.factor(speed_count$POSTED_SPEED_LIMIT)


#Making trellis bar plot
chart7 <- ggplot(data = speed_count, aes(x = POSTED_SPEED_LIMIT, y = n, fill = year)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme_bw() +
  coord_flip() +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = comma) +
  labs(title = "Top 10 Most Common Posted Speed Limits in Chicago Accidents", 
       x = "Posted Speed Limit (MPH)",
       y = "Accident Count",
       fill = "Year") +
  scale_fill_brewer(palette = "Paired", name = "Year", guide = guide_legend(reverse = TRUE)) +
  facet_wrap(~year, ncol=3, nrow=3)

chart7

Probably unsurprisingly, the most accidents in Chicago happen at a posted speed limit of 30 miles per hour each year. It’s the most common speed limit on urban streets, and it is the default speed limit for urban areas under Illinois law. Since this data only looks at accidents within the city, 30 mph and similar speed limits are what I would expect to see represented the most.

This does not mean that every car involved in the accident was traveling at 30 mph, just that it was the posted speed limit in the area of the crash. Most Chicago accidents happen on major urban roads in the city, so this tracks with previous visualizations.

Posted speed limit, per this visualization, does not appear to have any special effect on Chicago accidents.

First Crash Type per Year

This heatmap shows the number of accidents attributed to each crash type included in the E-Crash system. All crash types are included in this data, and any blank spaces mean that there are no recorded accidents for that year and crash type in the data.

type_df <- df %>%
  select(FIRST_CRASH_TYPE, CRASH_DATE) %>%
  mutate(year = year(mdy_hms(CRASH_DATE))) %>%
  group_by(FIRST_CRASH_TYPE, year) %>%
  summarise(n = length(FIRST_CRASH_TYPE), .groups = 'keep') %>%
  drop_na(FIRST_CRASH_TYPE, year) %>%
  data.frame

#Removing 2013 and 2014 from dataframe due to not enough data
RidRows4 <- which(type_df$year %like any% c("%2013%", "%2014%"))

type_df <- type_df[-c(RidRows4), ]

#Making year a factor
type_df$year <- as.factor(type_df$year)

#Creating breaks for heatmap gradient
conbreaks2 <- c(seq(0, max(type_df$n), by=5000))

#Building heatmap
chart8 <- ggplot(type_df, aes(x = year, y = FIRST_CRASH_TYPE, fill = n)) +
  geom_tile(color = "black") +
  geom_text(aes(label = comma(n))) +
  theme_bw() +
  labs(title = "Types of Traffic Accidents in Chicago", x = "Year", y = "Crash Type",
       fill = "Accident Count") +
  theme(plot.title = element_text(hjust = 0.5), text = element_text(size = 10)) +
  scale_fill_continuous(low = "white", high = "cadetblue4", labels = comma, breaks = conbreaks2) +
  guides(fill = guide_legend(reverse=TRUE, override.aes = list(color="black")))

chart8

The most common crash types are “Rear End”, “Turning”, “Parked Motor Vehicle”, and “Sideswipe Same Direction”, all of which would most likely be more common in an urban area with a lot of traffic, street parking, one-way streets, and a grid layout. These crash types are consistent with the most common primary causes shown in “Total Injuries by Primary Cause of Traffic Accidents”. Most primary causes are a result of a motorist driving distracted or failing to obey traffic laws, and the crash types can be traced back to them. For example, “Rear End” crashes usually happen because another driver was “Following too Closely”.
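
The link between crash types and primary causes can be examined with a cross-tabulation of the two columns. The sketch below shows the idea in base R on a few invented records. `records`, `cross`, and `top_cause` are hypothetical names, and the real analysis would use FIRST_CRASH_TYPE and PRIM_CONTRIBUTORY_CAUSE.

```r
# Invented records pairing a crash type with its recorded primary cause.
records <- data.frame(
  crash_type = c("REAR END", "REAR END", "REAR END", "TURNING",
                 "TURNING", "PARKED MOTOR VEHICLE"),
  cause      = c("FOLLOWING TOO CLOSELY", "FOLLOWING TOO CLOSELY",
                 "FAILING TO REDUCE SPEED", "FAILING TO YIELD",
                 "FAILING TO YIELD", "IMPROPER BACKING")
)

# Cross-tabulate crash type (rows) against cause (columns).
cross <- table(records$crash_type, records$cause)

# For each crash type, the cause recorded most often.
top_cause <- apply(cross, 1, function(row) names(row)[which.max(row)])
print(top_cause)
```

A table like this would show whether, in the real data, “Rear End” crashes are in fact most often recorded with “Following too Closely”.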

Busy streets, ignoring road rules, and distracted drivers lead to more traffic accidents in Chicago’s urban environment.

Conclusions

After analyzing each visualization, it appears that the major factors that lead to the most traffic accidents in the city of Chicago are large streets with high traffic levels; a tourist season from approximately May to October, which means more people on the road; and human error, including distracted driving. The most common primary causes and crash types are what would be expected in an urban environment versus driving on the highway. Drivers in Chicago should be especially careful to obey road signs and right-of-way rules, avoid following other cars too closely, and watch out for parked cars on city streets.