2023-02-19

Introduction

Loading the data

  • First, I loaded the data: the Arizona accidents file into data and the CSV file with information about the Arizona counties into counties.
data <- read.csv("AZ_Accidents.csv")
counties <- read.csv("azcounties.csv")

Libraries

  • Next, I imported the following libraries to be used in my project:
library(tidyverse)
library(sf)
library(mapview)
library(ggplot2)
library(dplyr)
library(lubridate)
library(reshape2)

Transforming the Data

  • I converted the Start_Time variable from a character string (chr) to a Date.
  • I created a Weekday variable from the converted Start_Time variable.
data$Start_Time <- mdy(data$Start_Time)
data$Weekday <- lubridate::wday(data$Start_Time, label = TRUE)
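
Since mdy() returns NA for any string it cannot parse, it can be worth counting the failures right after the conversion. A minimal sketch on made-up values (the toy vector is hypothetical and not taken from AZ_Accidents.csv):

```r
library(lubridate)

# Hypothetical month/day/year strings; the last one is deliberately malformed.
toy <- c("09/27/2016", "08/05/2016", "not a date")

# quiet = TRUE suppresses the "failed to parse" warning; failures become NA.
parsed <- mdy(toy, quiet = TRUE)

# Number of values that could not be parsed.
n_failed <- sum(is.na(parsed))

# Day of the week as a number (1 = Sunday with lubridate's default week start).
weekday_num <- wday(parsed)
```

Rows that fail to parse carry an NA through to Weekday, so they need to be dropped or handled before any by-day summary.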

First look at the Dataset

data %>%
  select(Severity, Start_Time, Start_Lat, Start_Lng, County, Weather_Condition) %>%
  top_n(5)
## Selecting by Weather_Condition
##   Severity Start_Time Start_Lat Start_Lng   County      Weather_Condition
## 1        2 2016-09-27  33.66886 -112.1284 Maricopa        Widespread Dust
## 2        2 2016-09-27  33.54276 -112.1124 Maricopa        Widespread Dust
## 3        2 2016-08-05  33.38471 -111.9676 Maricopa Thunderstorms and Rain
## 4        2 2016-06-23  33.46216 -112.0236 Maricopa        Widespread Dust
## 5        2 2016-07-18  33.46188 -112.0524 Maricopa Thunderstorms and Rain
## 6        2 2016-07-19  33.29180 -111.9056 Maricopa        Widespread Dust
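
The six rows above come from top_n(5) ranking on the last selected column (hence the "Selecting by Weather_Condition" message) and keeping ties. If the goal is simply to peek at the first few rows, something like the following sketch may be closer to the intent. The toy data frame is made up for illustration:

```r
library(dplyr)

# Hypothetical stand-in for the accidents data.
toy <- data.frame(
  Severity = c(2, 3, 2, 2, 4, 2, 3),
  Weather_Condition = c("Fair", "Rain", "Fair", "Cloudy", "Fair", "Rain", "Fair")
)

# First five rows in file order, with no ranking and no ties.
first_rows <- toy %>%
  select(Severity, Weather_Condition) %>%
  slice_head(n = 5)
```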

Questions

  • Question: Where do most accidents in Arizona happen, and what could be some of the reasons they occur in a certain area?
  • Other Questions:
  • Do certain cities or counties see more crashes than others?
  • Is there a time of year or a specific day(s) of the week when more accidents occur?
  • Does inclement weather have any effect on the number or severity of crashes?
  • Do accidents that are more serious occur on different days than less severe accidents?

Accidents by County

data %>%
  group_by(County) %>%
  summarize(Accidents = n()) %>%
  mutate(County = fct_reorder(County, desc(Accidents)))
## # A tibble: 15 × 2
##    County     Accidents
##    <fct>          <int>
##  1 Apache           215
##  2 Cochise          110
##  3 Coconino         563
##  4 Gila             231
##  5 Graham            35
##  6 Greenlee          13
##  7 La Paz           181
##  8 Maricopa       10830
##  9 Mohave           477
## 10 Navajo           222
## 11 Pima            4396
## 12 Pinal            541
## 13 Santa Cruz        33
## 14 Yavapai          766
## 15 Yuma              85
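
Note that fct_reorder() only changes the factor levels, which matters for plotting order; the printed rows stay in alphabetical order by County. To print the table sorted by accident count, one option is count() with sort = TRUE. A sketch on a made-up County column:

```r
library(dplyr)

# Hypothetical county labels, one row per accident.
toy <- data.frame(
  County = c("Pima", "Maricopa", "Maricopa", "Yuma", "Maricopa", "Pima")
)

# count() groups and tallies in one step; sort = TRUE puts the largest first.
by_county <- toy %>%
  count(County, name = "Accidents", sort = TRUE)
```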

Accidents per 1,000 People

merged <- merge(x=data, y=counties, by="County")

merged %>%
  group_by(County) %>%
  mutate(accidents_per_1000 = (n() / population) * 1000) %>%
  distinct(County, population, accidents_per_1000)
## # A tibble: 15 × 3
## # Groups:   County [15]
##    County     population accidents_per_1000
##    <chr>           <int>              <dbl>
##  1 Apache          66473              3.23 
##  2 Cochise        125092              0.879
##  3 Coconino       144942              3.88 
##  4 Gila            53211              4.34 
##  5 Graham          38145              0.918
##  6 Greenlee         9542              1.36 
##  7 La Paz          16845             10.7  
##  8 Maricopa      4367186              2.48 
##  9 Mohave         211274              2.26 
## 10 Navajo         106609              2.08 
## 11 Pima          1035063              4.25 
## 12 Pinal          420625              1.29 
## 13 Santa Cruz      47463              0.695
## 14 Yavapai        233789              3.28 
## 15 Yuma           202944              0.419
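
merge() performs an inner join by default, so a county whose name is spelled differently in the two files would drop out without any message. A sketch of the same rate calculation with an explicit check for unmatched counties, using made-up numbers (the toy populations are not real figures):

```r
library(dplyr)

# Hypothetical accident records and county populations.
toy_accidents <- data.frame(County = c("A", "A", "A", "B", "C"))
toy_counties <- data.frame(County = c("A", "B"), population = c(1500, 500))

# Counties that appear in the accident data but have no population row.
unmatched <- toy_accidents %>%
  anti_join(toy_counties, by = "County") %>%
  distinct(County)

# One row per county: accident count, then rate per 1,000 people.
rates <- toy_accidents %>%
  count(County, name = "Accidents") %>%
  inner_join(toy_counties, by = "County") %>%
  mutate(accidents_per_1000 = Accidents / population * 1000)
```

Summarizing to one row per county before joining also avoids carrying the population column on every accident record.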

Map of Car Accidents in La Paz County

data %>%
  filter(County == "La Paz") %>%
  mapview(xcol = "Start_Lng", ycol = "Start_Lat", zcol = "Severity", crs = 4269, grid = FALSE, label = "Description")

I-10 vs. Not I-10 Code

  • How many crashes in La Paz County were on the I-10 and how many were not?
data <- data %>%
  # str_detect() already returns TRUE/FALSE (NA when Street is missing)
  mutate(i_10 = as.factor(str_detect(Street, "I-10")))
data %>%
  filter(County == "La Paz") %>%
  group_by(i_10) %>%
  summarize(Accidents = n()) %>%
  ggplot(aes(x=i_10, y=Accidents)) + 
  geom_bar(stat="identity", fill="lightblue") +
  labs(x="Accident on the I-10", title="La Paz Accidents (2016-2021)") +
  geom_text(aes(label=Accidents), vjust= 1.5, color="white")
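
The pattern "I-10" is matched as a regular expression anywhere in Street, and rows with a missing Street come back as NA rather than FALSE. A small sketch of both behaviors follows. The street names are made up, and the word-boundary variant is an optional tightening, not what the code above uses:

```r
library(stringr)

# Hypothetical street values, including a missing one.
streets <- c("I-10 E", "I-17 N", NA, "I-105")

# Plain pattern: also matches "I-105" because "I-10" is a substring of it.
plain <- str_detect(streets, "I-10")

# With a trailing word boundary, only the exact route number matches.
bounded <- str_detect(streets, "I-10\\b")
```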

I-10 vs. Not I-10

  • How many crashes in La Paz County were on the I-10 and how many were not?

Month of La Paz Accidents Code

  • In which months did the I-10 accidents in La Paz County happen?
data <- data %>%
  mutate(Month = as.factor(month(Start_Time)))
  
data %>%
  filter(County == "La Paz" & i_10 == TRUE) %>%
  group_by(Month) %>%
  summarize(Accidents = n()) %>%
  ggplot(aes(x=Month, y=Accidents)) + 
  geom_bar(stat="identity", fill="lightblue") +
  geom_text(aes(label=Accidents), vjust=1.2, size=2.9, color="white") +
  labs(title="I-10 Accidents by Month in La Paz County")

Month of La Paz Accidents

  • In which months did the I-10 accidents in La Paz County happen?

Accidents in July (I-10 vs. Not I-10) Code

  • Of the July accidents in La Paz County, how many were on the I-10 and how many were not?
data %>%
  filter(County == "La Paz" & Month == "7") %>%
  group_by(i_10) %>%
  summarize(Accidents = n()) %>%
  ggplot(aes(x=i_10, y=Accidents)) + 
  geom_bar(stat="identity", fill="lightblue") +
  geom_text(aes(label=Accidents), vjust= 1.5, color="white") +
  labs(x="Accident on the I-10", title="La Paz Accidents in July (2016-2021)")

Accidents in July (I-10 vs. Not I-10)

  • Of the July accidents in La Paz County, how many were on the I-10 and how many were not?

Accidents by Severity

ggplot(data, aes(factor(Severity), fill=factor(Severity))) + geom_bar(stat = "count") + 
    labs(x = "", y = "Accidents", title = "Accidents by Severity") +
    scale_fill_discrete(name = "Severity")

Accidents by Severity and Time of Day

ggplot(data, aes(factor(Severity), fill=factor(Severity))) + geom_bar(stat = "count") +
  facet_grid(~Sunrise_Sunset) +
  labs(x="Severity", y="Accidents") +
  scale_fill_discrete(name = "Severity")
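
Because the facets show raw counts, the panels differ mostly in overall height. One way to compare the severity mix directly is to compute shares within each Sunrise_Sunset group. A sketch with made-up rows (the "Day" and "Night" values are assumed labels):

```r
library(dplyr)

# Hypothetical records: four daytime and two nighttime accidents.
toy <- data.frame(
  Sunrise_Sunset = c("Day", "Day", "Day", "Day", "Night", "Night"),
  Severity = c(2, 2, 2, 3, 2, 3)
)

# Count each (group, severity) pair, then divide by the group total.
severity_share <- toy %>%
  count(Sunrise_Sunset, Severity) %>%
  group_by(Sunrise_Sunset) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()
```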

Accidents by Day of the Week Code

  • On which day(s) of the week, if any, do more accidents occur?
data_grouped <- data %>%
  filter(!is.na(Weekday)) %>% # drops about 172 records with a missing Weekday
  group_by(Weekday) %>%
  summarize(count = n())

ggplot(data_grouped, aes(Weekday, count, fill = Weekday)) + 
  geom_bar(stat = "identity", show.legend = FALSE) + 
  labs(x = "", y = "Accidents")

Accidents by Day of the Week

  • On which day(s) of the week, if any, do more accidents occur?

Accidents by Day and Severity Code

  • Do accidents that are more serious occur on different days than less severe accidents?
data %>% 
  group_by(Weekday, Severity) %>%
  summarize(count = n(), .groups = 'keep') %>%
  filter(!is.na(Weekday) & Severity > 2) %>%
  ggplot(aes(Weekday, count, fill = Severity)) + 
  geom_bar(stat = "identity", show.legend = FALSE) + 
  facet_grid(~Severity) +
  labs(x = "", y = "Accidents")

Accidents by Day and Severity

  • Do accidents that are more serious occur on different days than less severe accidents?

Top Weather Conditions for Accidents Code

  • Does inclement weather have an effect on the number of accidents?
data %>%
  group_by(Weather_Condition) %>%
  summarize(count = n()) %>%
  mutate(Weather_Condition = fct_reorder(Weather_Condition, desc(count))) %>%
  slice_max(count, n = 6) %>% # rank explicitly by count; ties are kept
  ggplot(aes(Weather_Condition, count)) + 
  geom_bar(stat = "identity", fill="lightblue") + 
  geom_text(aes(label=count), vjust=1.3, color="white", size=3) +
  labs(x = "", y = "Accidents") + 
  theme(axis.text = element_text(size = 6))

Top Weather Conditions for Accidents

  • Does inclement weather have an effect on the number of accidents?

Map of Serious Car Accidents in Tempe Code

data %>% 
  filter(City == "Tempe", Severity > 2) %>%
  mapview(xcol = "Start_Lng", ycol = "Start_Lat", zcol = "Severity", 
          crs = 4269, grid = FALSE, label = "Description")

Map of Serious Car Accidents in Tempe

Map of Serious Car Accidents in Phoenix

Map of Serious Car Accidents in Scottsdale

Map of Serious Car Accidents in Mesa

Map of Serious Car Accidents in Chandler

Map of Serious Car Accidents in Prescott

Findings for Car Accidents in Arizona

  • La Paz County has the highest accident rate, at 10.7 per 1,000 people. Most of those accidents are presumed to involve travelers between Los Angeles and Phoenix.
  • The monthly pattern of accidents in La Paz County lines up with school breaks, so the two may be related.
  • Most accidents have a low severity.
  • There is no strong relationship between accident severity and time of day (the Sunrise_Sunset variable).
  • Weekends have far fewer accidents than weekdays, and Friday has the most accidents of any day.
  • In this data, bad weather is not associated with more car accidents.
  • Serious car accidents tend to happen on highways rather than on other roads.