Assignment - 6

This assignment was built to support the final project for Data 608 - Knowledge and Visual Analytics.

The final project uses the City of Chicago's open data on traffic crashes, available at https://data.cityofchicago.org/Transportation/Traffic-Crashes-Crashes/85ca-t3if. The data contains crash details, including important parameters such as crash date, posted speed limit, weather condition, number of lanes, and trafficway type.

As part of this visualization project, we will build a user-friendly visualization in R Shiny using Plotly to show how the frequency and seriousness of crashes vary with important parameters such as the month of the year, posted speed limit, number of lanes, etc.

Below I have prepared some visualizations using plotly that I will reuse in a Shiny app, giving end users multiple options for exploring how Chicago crashes are distributed across various categories.

  1. Loading the required packages.
library(RSocrata)
# plyr is loaded before dplyr, as plyr's own startup message recommends,
# so that dplyr's verbs are not masked by plyr's versions.
library(plyr)
library(dplyr)
library(plotly)
library(kableExtra)
  2. Loading the data into R
chicago_accidents <- read.socrata("https://data.cityofchicago.org/resource/85ca-t3if.json")
  3. Displaying the dimensions, column names and first five rows
print(dim(chicago_accidents))
## [1] 362077     49
colnames(chicago_accidents)
##  [1] "rd_no"                         "crash_date"                   
##  [3] "posted_speed_limit"            "traffic_control_device"       
##  [5] "device_condition"              "weather_condition"            
##  [7] "lighting_condition"            "first_crash_type"             
##  [9] "trafficway_type"               "alignment"                    
## [11] "roadway_surface_cond"          "road_defect"                  
## [13] "report_type"                   "crash_type"                   
## [15] "intersection_related_i"        "damage"                       
## [17] "date_police_notified"          "prim_contributory_cause"      
## [19] "sec_contributory_cause"        "street_no"                    
## [21] "street_direction"              "street_name"                  
## [23] "beat_of_occurrence"            "num_units"                    
## [25] "most_severe_injury"            "injuries_total"               
## [27] "injuries_fatal"                "injuries_incapacitating"      
## [29] "injuries_non_incapacitating"   "injuries_reported_not_evident"
## [31] "injuries_no_indication"        "injuries_unknown"             
## [33] "crash_hour"                    "crash_day_of_week"            
## [35] "crash_month"                   "latitude"                     
## [37] "longitude"                     "location.type"                
## [39] "location.coordinates"          "lane_cnt"                     
## [41] "hit_and_run_i"                 "statements_taken_i"           
## [43] "crash_date_est_i"              "private_property_i"           
## [45] "photos_taken_i"                "work_zone_i"                  
## [47] "work_zone_type"                "dooring_i"                    
## [49] "workers_present_i"
chicago_accidents %>% head(5) %>% kable() %>% kable_styling()
| rd_no | crash_date | posted_speed_limit | traffic_control_device | device_condition | weather_condition | lighting_condition | first_crash_type | trafficway_type | alignment | roadway_surface_cond | road_defect | report_type | crash_type | intersection_related_i | damage | date_police_notified | prim_contributory_cause | sec_contributory_cause | street_no | street_direction | street_name | beat_of_occurrence | num_units | most_severe_injury | injuries_total | injuries_fatal | injuries_incapacitating | injuries_non_incapacitating | injuries_reported_not_evident | injuries_no_indication | injuries_unknown | crash_hour | crash_day_of_week | crash_month | latitude | longitude | location.type | location.coordinates | lane_cnt | hit_and_run_i | statements_taken_i | crash_date_est_i | private_property_i | photos_taken_i | work_zone_i | work_zone_type | dooring_i | workers_present_i |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AJ101349 | 1483369980 | 30 | NO CONTROLS | NO CONTROLS | CLEAR | DAYLIGHT | SIDESWIPE SAME DIRECTION | PARKING LOT | STRAIGHT AND LEVEL | DRY | NO DEFECTS | NOT ON SCENE (DESK REPORT) | NO INJURY / DRIVE AWAY | N | $501 - $1,500 | 1483373700 | IMPROPER LANE USAGE | NOT APPLICABLE | 450 | E | 35TH ST | 211 | 2 | NO INDICATION OF INJURY | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 9 | 2 | 1 | 41.831296077 | -87.614925683 | Point | c(-87.614925683354, 41.831296076845) | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| AJ103671 | 1483541100 | 35 | NO CONTROLS | NO CONTROLS | CLEAR | DAYLIGHT | SIDESWIPE SAME DIRECTION | NOT DIVIDED | STRAIGHT AND LEVEL | DRY | NO DEFECTS | NOT ON SCENE (DESK REPORT) | NO INJURY / DRIVE AWAY | NA | $501 - $1,500 | 1483542600 | UNABLE TO DETERMINE | ANIMAL | 7144 | N | RIDGE BLVD | 2411 | 2 | NO INDICATION OF INJURY | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 8 | 4 | 1 | 42.012292006 | -87.683227658 | Point | c(-87.683227657543, 42.012292006227) | 2 | Y | NA | NA | NA | NA | NA | NA | NA | NA |
| AJ114251 | 1484323200 | 30 | TRAFFIC SIGNAL | FUNCTIONING PROPERLY | CLOUDY/OVERCAST | DAYLIGHT | SIDESWIPE SAME DIRECTION | NOT DIVIDED | STRAIGHT AND LEVEL | ICE | NO DEFECTS | ON SCENE | NO INJURY / DRIVE AWAY | Y | OVER $1,500 | 1484323380 | IMPROPER LANE USAGE | IMPROPER OVERTAKING/PASSING | 700 | S | CICERO AVE | 1533 | 2 | NO INDICATION OF INJURY | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 10 | 6 | 1 | 41.87205805 | -87.745079343 | Point | c(-87.745079343101, 41.872058050459) | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| AJ123519 | 1484961180 | 15 | STOP SIGN/FLASHER | FUNCTIONING PROPERLY | CLEAR | DARKNESS, LIGHTED ROAD | ANGLE | NOT DIVIDED | STRAIGHT AND LEVEL | DRY | NO DEFECTS | NOT ON SCENE (DESK REPORT) | NO INJURY / DRIVE AWAY | NA | OVER $1,500 | 1484961900 | FAILING TO YIELD RIGHT-OF-WAY | UNABLE TO DETERMINE | 8700 | S | HARPER AVE | 412 | 2 | NO INDICATION OF INJURY | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 19 | 6 | 1 | 41.736802907 | -87.587057965 | Point | c(-87.587057964619, 41.736802906927) | 2 | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| AJ390611 | 1502668800 | 30 | NO CONTROLS | NO CONTROLS | CLEAR | DARKNESS | PARKED MOTOR VEHICLE | ONE-WAY | STRAIGHT AND LEVEL | DRY | NO DEFECTS | ON SCENE | INJURY AND / OR TOW DUE TO CRASH | NA | OVER $1,500 | 1502731800 | IMPROPER BACKING | UNABLE TO DETERMINE | 8454 | S | COLFAX AVE | 423 | 2 | NO INDICATION OF INJURY | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 19 | 1 | 8 | 41.741094906 | -87.561352197 | Point | c(-87.561352197263, 41.741094905698) | 1 | N | NA | NA | NA | NA | NA | NA | NA | NA |
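Columns fetched from a JSON endpoint may arrive as character vectors rather than numbers. Whether that is the case for this dataset is an assumption, not something the output above confirms; running `str(chicago_accidents)` would settle it. A minimal sketch, using a small hypothetical data frame `toy` in place of the real data, of converting such columns before comparing them numerically:

```r
# Hypothetical stand-in for a few columns of chicago_accidents,
# assuming the values arrived as character strings.
toy <- data.frame(
  injuries_fatal = c("0", "1", "0"),
  crash_month    = c("1", "1", "8"),
  lane_cnt       = c("2", NA, "10"),
  stringsAsFactors = FALSE
)

# Columns that should be numeric (names taken from the column list above).
num_cols <- c("injuries_fatal", "crash_month", "lane_cnt")

# Convert each listed column in place; NA values stay NA.
toy[num_cols] <- lapply(toy[num_cols], as.numeric)

str(toy)
```

The conversion matters for ordering comparisons: as character strings, "10" sorts before "2", whereas the numeric values compare as expected.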
  4. Plotting a few graphs. Each graph below fixes the values of a few variables. In the final Shiny project the user will choose these values from drop-down menus, but for now they are hard-coded for the visualizations displayed below:

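Plotting total crashes by month:

# Derive a "YYYY-MM" key from the first seven characters of the crash date.
chicago_accidents$crash_month_year <- substr(chicago_accidents$crash_date, 1, 7)

# Count crashes per month, keep the months after September 2015,
# and draw the counts as a line chart.
plyr::count(chicago_accidents, "crash_month_year") %>% 
  subset(crash_month_year > '2015-09') %>% 
  plot_ly(x = ~crash_month_year, y = ~freq, mode = 'lines', type = 'scatter')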
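The monthly key can also be built with `format()`, which states the intended date pattern explicitly instead of relying on character positions. The sketch below uses base R only and three made-up timestamps in place of `chicago_accidents$crash_date`; the variable names `crash_date` and `monthly` are illustrative.

```r
# Hypothetical crash timestamps standing in for chicago_accidents$crash_date.
crash_date <- as.POSIXct(
  c("2017-01-02 09:13:00", "2017-01-04 08:45:00", "2017-08-13 19:00:00"),
  tz = "UTC"
)

# Build a "YYYY-MM" key explicitly from the date-time value.
crash_month_year <- format(crash_date, "%Y-%m")

# Count crashes per month; as.data.frame() gives one row per month.
monthly <- as.data.frame(table(crash_month_year), stringsAsFactors = FALSE)
names(monthly) <- c("crash_month_year", "freq")

# "YYYY-MM" strings sort chronologically, so a plain string comparison
# selects every month after the cutoff.
monthly[monthly$crash_month_year > "2015-09", ]
```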

Plotting the distribution of non-fatal crashes (injuries_fatal == 0) by road surface condition for a single month, here January:

# Distribution of non-fatal January crashes by road surface condition.
# Note the comparison operator `==`: with a single `=`, subset() treats
# the expression as a named argument and does not filter any rows.

chicago_accidents %>% subset(injuries_fatal == 0) %>%
  subset(crash_month == 1) %>%
  plyr::count("roadway_surface_cond") %>%
  plot_ly(x = ~roadway_surface_cond, y = ~freq, type = 'bar') %>%
  layout(yaxis = list(title = 'value'), barmode = 'stack')

Along similar lines, the distribution of fatal crashes (injuries_fatal > 0):

chicago_accidents %>% subset(injuries_fatal > 0) %>%
  subset(crash_month == 1) %>%
  plyr::count("roadway_surface_cond") %>%
  plot_ly(x = ~roadway_surface_cond, y = ~freq, type = 'bar') %>%
  layout(yaxis = list(title = 'value'), barmode = 'stack')
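The fatal and non-fatal pipelines above differ only in the filter on `injuries_fatal`, and both hard-code the month. One way to fold them together is a small helper that takes both choices as arguments. This is a sketch under assumptions, not the project's actual code: the name `count_by_surface` and the data frame `toy` are hypothetical, and base R is used in place of `plyr::count`.

```r
# Hypothetical helper: count crashes per road surface condition for one
# month, restricted to either fatal or non-fatal crashes.
count_by_surface <- function(data, fatal, month) {
  fatalities <- as.numeric(data$injuries_fatal)
  keep <- if (fatal) fatalities > 0 else fatalities == 0
  keep <- keep & as.numeric(data$crash_month) == month
  keep[is.na(keep)] <- FALSE  # rows with missing values are left out
  as.data.frame(
    table(roadway_surface_cond = data$roadway_surface_cond[keep]),
    responseName = "freq",
    stringsAsFactors = FALSE
  )
}

# Made-up rows standing in for chicago_accidents.
toy <- data.frame(
  injuries_fatal = c(0, 0, 1, 0),
  crash_month = c(1, 1, 1, 8),
  roadway_surface_cond = c("DRY", "ICE", "DRY", "DRY"),
  stringsAsFactors = FALSE
)

count_by_surface(toy, fatal = FALSE, month = 1)
```

The returned data frame has the same `roadway_surface_cond` and `freq` columns as the output of `plyr::count`, so it could be passed to the same `plot_ly` call.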

We will use the above graphs to present plots based on what the user chooses: fatal or non-fatal crashes, and then the month of the year to analyze.
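A rough sketch of how those two choices might be wired into Shiny is given below. It is an illustration under assumptions rather than the final app: the input ids `severity` and `month`, the output id `surface_plot`, and the stand-in data frame `toy` are all hypothetical, and the real app would use `chicago_accidents`.

```r
library(shiny)
library(plotly)

# Hypothetical stand-in data; the real app would use chicago_accidents.
toy <- data.frame(
  injuries_fatal = c(0, 0, 1, 0, 1),
  crash_month = c(1, 1, 1, 8, 8),
  roadway_surface_cond = c("DRY", "ICE", "DRY", "DRY", "WET"),
  stringsAsFactors = FALSE
)

# Two drop-downs (severity and month) above a single plotly bar chart.
ui <- fluidPage(
  selectInput("severity", "Crash type", choices = c("Non-fatal", "Fatal")),
  selectInput("month", "Month", choices = setNames(1:12, month.name)),
  plotlyOutput("surface_plot")
)

server <- function(input, output) {
  output$surface_plot <- renderPlotly({
    # Apply the user's severity and month choices as row filters.
    is_fatal <- toy$injuries_fatal > 0
    keep <- (if (input$severity == "Fatal") is_fatal else !is_fatal) &
      toy$crash_month == as.numeric(input$month)
    counts <- as.data.frame(
      table(roadway_surface_cond = toy$roadway_surface_cond[keep]),
      responseName = "freq",
      stringsAsFactors = FALSE
    )
    plot_ly(counts, x = ~roadway_surface_cond, y = ~freq, type = "bar")
  })
}

# Launch the app only when run in an interactive session.
if (interactive()) shinyApp(ui, server)
```

Because the filtering happens inside `renderPlotly`, the chart is recomputed whenever either drop-down changes.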