It is a known fact that USA is a large land mass where personal vehicles are the primary mode of commute. Many people indulge in interstate travel and on doing that most of them are exposed to road accidents. This assignment will help us to understand the States which are more vulnerable to road accidents. The data has been obtained from https://www-fars.nhtsa.dot.gov/States/StatesCrashesAndAllVictims.aspx. Where we look into the road fatalities across US states from the yr. 2018.
1.1 The initial design challenge was to assemble the data together in a format that can be readily used. The data had to be selected from multible tables and then arranged to the desireable format. 1.2 The data had to be selected from multible tables and then arranged to the desireable format.
1.3 For interactivity ggplot2 isn’t the right package. Hence a more suitable package like plotly that allows user engagement had to be selected.
Below sketched designs show the states with high fatalities in terms of road accidents
First and foremost we need to install all the required packages for the desired visualizations
Next the data is imported to R using read_csv function. As previously mentioned, all the data cleaning and formatting was done on Excel before being subjected to R manipulation
setwd('C:\\Users\\Santosh Maruwada\\Documents\\SMU Study materials\\Term 3\\Visual Analytics\\Assignment\\Assignment 5')
data1 <- read_csv("USFatalities.csv")
## Parsed with column specification:
## cols(
## State = col_character(),
## `Total deaths` = col_double(),
## `Total percent` = col_double(),
## `Driver deaths` = col_double(),
## `Percent of driver deaths` = col_double(),
## `Passenger deaths` = col_double(),
## `Percent of passenger deaths` = col_double(),
## `Motorcyclist deaths` = col_double(),
## `Percent of motorcyclists deaths` = col_double(),
## `Pedestrian deaths` = col_double(),
## `Percent of pedestrian deaths` = col_double(),
## `Cyclists deaths` = col_double(),
## `Percent of cyclists deaths` = col_double(),
## `Others deaths` = col_double(),
## `Percent of others deaths` = col_double()
## )
data1
## Warning: `...` is not empty.
##
## We detected these problematic arguments:
## * `needs_dots`
##
## These dots only exist to allow future extensions and should be empty.
## Did you misspecify an argument?
## # A tibble: 51 x 15
## State `Total deaths` `Total percent` `Driver deaths` `Percent of dri~
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Alab~ 953 100 593 62.2
## 2 Alas~ 80 100 36 45
## 3 Ariz~ 1010 100 406 40.2
## 4 Arka~ 516 100 308 59.7
## 5 Cali~ 3563 100 1425 40
## 6 Colo~ 632 100 297 47
## 7 Conn~ 294 100 133 45.2
## 8 Dela~ 111 100 47 42.3
## 9 Dist~ 31 100 5 16.1
## 10 Flor~ 3133 100 1224 39.1
## # ... with 41 more rows, and 10 more variables: `Passenger deaths` <dbl>,
## # `Percent of passenger deaths` <dbl>, `Motorcyclist deaths` <dbl>, `Percent
## # of motorcyclists deaths` <dbl>, `Pedestrian deaths` <dbl>, `Percent of
## # pedestrian deaths` <dbl>, `Cyclists deaths` <dbl>, `Percent of cyclists
## # deaths` <dbl>, `Others deaths` <dbl>, `Percent of others deaths` <dbl>
Just for the sake of having a personal choice of colour palette, the following colours have been selected to be implemented to our visualizations
mycolors <- c(" dark blue", "gold", "aqua", "orange red", "dark green")
We use Plot_ly to look into the statewise distribution of total fatalities by road accident in the US.
p1 <- data1 %>%
group_by(State) %>%
summarise(count = sum(`Total deaths`)) %>%
plot_ly(x = ~State,y = ~count,type = 'bar') %>%
layout(xaxis = list(title = "Fatalities By State"),
yaxis = list(title = 'Deaths'))
## `summarise()` ungrouping output (override with `.groups` argument)
p1
## Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
## Please use `arrange()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
As we can see the state of Texas has the highest fatality count compared to any other state followed by California.
We use a lolipop chart to identify the top 10 states with highest fatalities where we create data frames by state which are arranged in descending order.
lollipop <- data1 %>%
select(State, `Total deaths`) %>%
group_by(State) %>%
summarise(Total_Count = sum(`Total deaths`)) %>%
filter(Total_Count>1010) %>%
arrange(desc(Total_Count)) %>%
mutate(`Total deaths`=fct_reorder(State, (Total_Count)))
## `summarise()` ungrouping output (override with `.groups` argument)
ggplot(lollipop, aes(x=`Total deaths`, y=Total_Count, label=Total_Count))+geom_segment(aes(x=`Total deaths`, xend=`Total deaths`, y=0, yend=Total_Count), color="grey")+geom_point(size=7, color="skyblue")+coord_flip()+geom_text(color="black", size=3)+labs(title="Top 10 states with highest deaths in the US - 2018", y="Deaths", x=" ", caption=" ")+theme(plot.caption = element_text(hjust = 1, face = "italic"))
As we can see, the majority of accidents occur in Texas, California and Florida. Whereas the rest in the top 10 do not even add up to the top three fatalities combined
We’d further like to see the distribution of the fatality rate amongst the top 5 states and we create a pie chart using plot_ly function as shown below
p2 <- data1 %>%
group_by(State) %>%
summarise(count = sum(`Total deaths`)) %>%
filter(count>1400) %>%
plot_ly(labels = ~State, values = ~count, marker = list(colors = mycolors)) %>%
add_pie(hole = 0.1) %>%
layout(xaxis = list(zeroline = F,showline = F,showticklabels = F,showgrid = F),
yaxis = list(zeroline = F,showline = F,showticklabels=F,showgrid=F))
## `summarise()` ungrouping output (override with `.groups` argument)
p2
To further reflect the fatality figure over an interactive map, we use the highchart() function with map navigation function included. This allows us to visualise total road fatalites by state in the year 2018 by hovering the cursor over any given state.
car <- data1 %>%
group_by(State) %>%
summarize(total = sum(`Total deaths`))
## `summarise()` ungrouping output (override with `.groups` argument)
highchart() %>%
hc_title(text = "Road Fatalities in US, 2018") %>%
hc_subtitle(text = " ") %>%
hc_add_series_map(usgeojson, car,name = "State",value = "total",joinBy = c("woename", "State")) %>%
hc_mapNavigation(enabled = T)
We’d like to understand the distribution of two wheeler deaths in the US and whether does it reflect similar patterns to that of the total fatalities by state in the US. We use a lolipop chart again where we create data frames and arrange the states in descending order to obtain the following visualization
lollipop1 <- data1 %>%
select(State, `Motorcyclist deaths`) %>%
group_by(State) %>%
summarise(Total_Count1 = sum(`Motorcyclist deaths`)) %>%
filter(Total_Count1>120) %>%
arrange(desc(Total_Count1)) %>%
mutate(`Motorcyclist deaths`=fct_reorder(State, (Total_Count1)))
## `summarise()` ungrouping output (override with `.groups` argument)
ggplot(lollipop1, aes(x=`Motorcyclist deaths`, y=Total_Count1, label=Total_Count1))+geom_segment(aes(x=`Motorcyclist deaths`, xend=`Motorcyclist deaths`, y=0, yend=Total_Count1), color="grey")+geom_point(size=7, color="skyblue")+coord_flip()+geom_text(color="black", size=3)+labs(title="Motorcyclists Fatlities in the US", y="Deaths",x=" ", caption=" ")+theme(plot.caption = element_text(hjust = 1, face = "italic"))
The above chart indicates that Florida has higher 2 wheeler deaths than California and Texas althought the total fatalities due to road accidents is lower for Florida compared to the two
It’s important to note that a road accident might involve innocent pedestrians without any fault of theirs. Pedestrians are generally found around crowded areas. Therefore, it will be interesting to see the distribution of pedestrian deaths across the US where we consider the pedestrian percentage death which is a fraction of the total fatalities caused due to road accidents. We use a map to visuazlize the data as shown below
car1 <- data1 %>%
group_by(State) %>%
summarize(total = sum(`Percent of pedestrian deaths`))
## `summarise()` ungrouping output (override with `.groups` argument)
highchart() %>%
hc_title(text = "Percentage of pedestrian deaths out of Total Fatalities") %>%
hc_subtitle(text = " ") %>%
hc_add_series_map(usgeojson, car1,name = "State",value = "total",joinBy = c("woename", "State")) %>%
hc_mapNavigation(enabled = T)
The above visualization indicates the Hawai, New Jersey and New York have the highest pedestrian deaths where around 30% of total fatalities are pedestrian which is alarming.
The following two maps have been selected for the final visualization
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
Insights:
The National Highway Traffic Safety Administration must take note of the top 3 states contributing to 60% of the total fatalities by making interstate travel and expressways safer. Probably higher rate of patrolling and speed restrictions need to be imposed in order to curb the fatality rate. It is even noted that New York, New Jersey and Hawai have high rate of pedestrian deaths, therefore NHTS has to prioritze the safety of pedestrians by creating safe passageways and to keep fast moving traffic away from crowded areas.