1. Overview

It is a known fact that USA is a large land mass where personal vehicles are the primary mode of commute. Many people indulge in interstate travel and on doing that most of them are exposed to road accidents. This assignment will help us to understand the States which are more vulnerable to road accidents. The data has been obtained from https://www-fars.nhtsa.dot.gov/States/StatesCrashesAndAllVictims.aspx. Where we look into the road fatalities across US states from the yr. 2018.

2. Data Design and Solution Challenges

1.1 The initial design challenge was to assemble the data together in a format that can be readily used. The data had to be selected from multible tables and then arranged to the desireable format. 1.2 The data had to be selected from multible tables and then arranged to the desireable format.

1.3 For interactivity ggplot2 isn’t the right package. Hence a more suitable package like plotly that allows user engagement had to be selected.

3. Proposed Design

Below sketched designs show the states with high fatalities in terms of road accidents

4. Description of steps involved

First and foremost we need to install all the required packages for the desired visualizations

Next the data is imported to R using read_csv function. As previously mentioned, all the data cleaning and formatting was done on Excel before being subjected to R manipulation

setwd('C:\\Users\\Santosh Maruwada\\Documents\\SMU Study materials\\Term 3\\Visual Analytics\\Assignment\\Assignment 5')
data1 <- read_csv("USFatalities.csv")

## Parsed with column specification:
## cols(
##   State = col_character(),
##   `Total deaths` = col_double(),
##   `Total percent` = col_double(),
##   `Driver deaths` = col_double(),
##   `Percent of driver deaths` = col_double(),
##   `Passenger deaths` = col_double(),
##   `Percent of passenger deaths` = col_double(),
##   `Motorcyclist deaths` = col_double(),
##   `Percent of motorcyclists deaths` = col_double(),
##   `Pedestrian deaths` = col_double(),
##   `Percent of pedestrian deaths` = col_double(),
##   `Cyclists deaths` = col_double(),
##   `Percent of cyclists deaths` = col_double(),
##   `Others deaths` = col_double(),
##   `Percent of others deaths` = col_double()
## )

data1

## Warning: `...` is not empty.
## 
## We detected these problematic arguments:
## * `needs_dots`
## 
## These dots only exist to allow future extensions and should be empty.
## Did you misspecify an argument?

## # A tibble: 51 x 15
##    State `Total deaths` `Total percent` `Driver deaths` `Percent of dri~
##    <chr>          <dbl>           <dbl>           <dbl>            <dbl>
##  1 Alab~            953             100             593             62.2
##  2 Alas~             80             100              36             45  
##  3 Ariz~           1010             100             406             40.2
##  4 Arka~            516             100             308             59.7
##  5 Cali~           3563             100            1425             40  
##  6 Colo~            632             100             297             47  
##  7 Conn~            294             100             133             45.2
##  8 Dela~            111             100              47             42.3
##  9 Dist~             31             100               5             16.1
## 10 Flor~           3133             100            1224             39.1
## # ... with 41 more rows, and 10 more variables: `Passenger deaths` <dbl>,
## #   `Percent of passenger deaths` <dbl>, `Motorcyclist deaths` <dbl>, `Percent
## #   of motorcyclists deaths` <dbl>, `Pedestrian deaths` <dbl>, `Percent of
## #   pedestrian deaths` <dbl>, `Cyclists deaths` <dbl>, `Percent of cyclists
## #   deaths` <dbl>, `Others deaths` <dbl>, `Percent of others deaths` <dbl>

Just for the sake of having a personal choice of colour palette, the following colours have been selected to be implemented to our visualizations

mycolors <- c(" dark blue", "gold", "aqua", "orange red", "dark green")

We use Plot_ly to look into the statewise distribution of total fatalities by road accident in the US.

p1 <- data1 %>%
         group_by(State) %>%
         summarise(count = sum(`Total deaths`)) %>%
         plot_ly(x = ~State,y = ~count,type = 'bar') %>%
layout(xaxis = list(title = "Fatalities By State"),
yaxis = list(title = 'Deaths'))

## `summarise()` ungrouping output (override with `.groups` argument)

p1

## Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
## Please use `arrange()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.

As we can see the state of Texas has the highest fatality count compared to any other state followed by California.

We use a lolipop chart to identify the top 10 states with highest fatalities where we create data frames by state which are arranged in descending order.

lollipop <- data1 %>%
  select(State, `Total deaths`) %>%
  group_by(State) %>%
  summarise(Total_Count = sum(`Total deaths`)) %>%
  filter(Total_Count>1010) %>%
  arrange(desc(Total_Count)) %>%
  mutate(`Total deaths`=fct_reorder(State, (Total_Count)))

## `summarise()` ungrouping output (override with `.groups` argument)

ggplot(lollipop, aes(x=`Total deaths`, y=Total_Count, label=Total_Count))+geom_segment(aes(x=`Total deaths`, xend=`Total deaths`, y=0, yend=Total_Count), color="grey")+geom_point(size=7, color="skyblue")+coord_flip()+geom_text(color="black", size=3)+labs(title="Top 10 states with highest deaths in the US - 2018", y="Deaths", x=" ",  caption=" ")+theme(plot.caption = element_text(hjust = 1, face = "italic"))

As we can see, the majority of accidents occur in Texas, California and Florida. Whereas the rest in the top 10 do not even add up to the top three fatalities combined

We’d further like to see the distribution of the fatality rate amongst the top 5 states and we create a pie chart using plot_ly function as shown below

p2 <- data1 %>%
         group_by(State) %>%
         summarise(count = sum(`Total deaths`)) %>%
         filter(count>1400) %>%
         plot_ly(labels = ~State, values = ~count, marker = list(colors = mycolors)) %>%
         add_pie(hole = 0.1) %>%
         layout(xaxis = list(zeroline = F,showline = F,showticklabels = F,showgrid = F),
                yaxis = list(zeroline = F,showline = F,showticklabels=F,showgrid=F))

## `summarise()` ungrouping output (override with `.groups` argument)

p2

To further reflect the fatality figure over an interactive map, we use the highchart() function with map navigation function included. This allows us to visualise total road fatalites by state in the year 2018 by hovering the cursor over any given state.

car <- data1 %>%
         group_by(State) %>%
         summarize(total = sum(`Total deaths`))

## `summarise()` ungrouping output (override with `.groups` argument)

highchart() %>%
         hc_title(text = "Road Fatalities in US, 2018") %>%
         hc_subtitle(text = " ") %>%
         hc_add_series_map(usgeojson, car,name = "State",value = "total",joinBy = c("woename", "State")) %>%
         hc_mapNavigation(enabled = T)

We’d like to understand the distribution of two wheeler deaths in the US and whether does it reflect similar patterns to that of the total fatalities by state in the US. We use a lolipop chart again where we create data frames and arrange the states in descending order to obtain the following visualization

lollipop1 <- data1 %>%
  select(State, `Motorcyclist deaths`) %>%
  group_by(State) %>%
  summarise(Total_Count1 = sum(`Motorcyclist deaths`)) %>%
  filter(Total_Count1>120) %>%
  arrange(desc(Total_Count1)) %>%
  mutate(`Motorcyclist deaths`=fct_reorder(State, (Total_Count1)))

## `summarise()` ungrouping output (override with `.groups` argument)

ggplot(lollipop1, aes(x=`Motorcyclist deaths`, y=Total_Count1, label=Total_Count1))+geom_segment(aes(x=`Motorcyclist deaths`, xend=`Motorcyclist deaths`, y=0, yend=Total_Count1), color="grey")+geom_point(size=7, color="skyblue")+coord_flip()+geom_text(color="black", size=3)+labs(title="Motorcyclists Fatlities in the US", y="Deaths",x=" ",  caption=" ")+theme(plot.caption = element_text(hjust = 1, face = "italic"))

The above chart indicates that Florida has higher 2 wheeler deaths than California and Texas althought the total fatalities due to road accidents is lower for Florida compared to the two

It’s important to note that a road accident might involve innocent pedestrians without any fault of theirs. Pedestrians are generally found around crowded areas. Therefore, it will be interesting to see the distribution of pedestrian deaths across the US where we consider the pedestrian percentage death which is a fraction of the total fatalities caused due to road accidents. We use a map to visuazlize the data as shown below

car1 <- data1 %>%
         group_by(State) %>%
         summarize(total = sum(`Percent of pedestrian deaths`))

## `summarise()` ungrouping output (override with `.groups` argument)

highchart() %>%
         hc_title(text = "Percentage of pedestrian deaths out of Total  Fatalities") %>%
         hc_subtitle(text = " ") %>%
         hc_add_series_map(usgeojson, car1,name = "State",value = "total",joinBy = c("woename", "State")) %>%
         hc_mapNavigation(enabled = T)

The above visualization indicates the Hawai, New Jersey and New York have the highest pedestrian deaths where around 30% of total fatalities are pedestrian which is alarming.

5. Final Visualization

The following two maps have been selected for the final visualization

## `summarise()` ungrouping output (override with `.groups` argument)

## `summarise()` ungrouping output (override with `.groups` argument)

6. Insights and Implications

Insights:

The total fatalities in the state of Texas is highest, closely followed by California and Florida and the three account for nearly 60% of total fatalities in the US
Florida has the highest 2 wheeler accidents occuring, significantly higher than any other state.
As seen above, Texas, Florida and California do not have high pedestrian deaths which indicates that most the accidents happening around these 3 states with highest fatalities must be during interstate travel or on expressways

The National Highway Traffic Safety Administration must take note of the top 3 states contributing to 60% of the total fatalities by making interstate travel and expressways safer. Probably higher rate of patrolling and speed restrictions need to be imposed in order to curb the fatality rate. It is even noted that New York, New Jersey and Hawai have high rate of pedestrian deaths, therefore NHTS has to prioritze the safety of pedestrians by creating safe passageways and to keep fast moving traffic away from crowded areas.

Assignment 5