The Norfolk Data Science Meetup group provided a dataset of bicycle accidents involving motor vehicles.

They previewed it during one of their monthly meetings.

As a cycling enthusiast, I thought it would be interesting to explore this data file. What time do most accidents occur? Is the number of accidents declining? Was the driver distracted? Do most accidents happen at intersections?

Fortunately, I have not been involved in any accidents with a motor vehicle. I have, however, been on the receiving end of impatient and disrespectful drivers. I try my best to avoid motor vehicles as much as possible, but that can be difficult when you live in Virginia Beach, Virginia’s most populous city.

It’s time to put on our explorer hats. Let’s load the data, write some code, and learn about bicycle accidents in Hampton Roads and the Eastern Shore of Virginia.

Gather

Creativity requires input, and that’s what research is. You’re gathering material with which to build. [Gene Luen Yang]

Two datasets are available on the web site. The Summary_Trend_data.csv file appears to be a subset of Summary_O_data.csv. There is no description of the datasets or codebook describing the columns.
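
One way to test the "appears to be a subset" impression is to compare column names between the two files. The following is only a minimal sketch: it uses two small made-up data frames in place of the real CSV files, and is_column_subset() is a hypothetical helper written for this illustration, not a function from any package.

```r
# Hypothetical helper: TRUE when every column name of `small`
# also appears among the column names of `big`.
is_column_subset <- function(small, big) {
  all(names(small) %in% names(big))
}

# Toy stand-ins for the two CSV files (values are made up for illustration).
# check.names = FALSE keeps the spaces in the column names.
big_df <- data.frame(
  `Crash Year`  = c(2011, 2012),
  `Day Of Week` = c("Tuesday", "Friday"),
  LAT           = c(37, 37),
  check.names   = FALSE
)
small_df <- big_df[, c("Crash Year", "Day Of Week")]

# Is every column of the smaller frame present in the larger one?
is_column_subset(small_df, big_df)

# Which columns exist only in the larger frame?
setdiff(names(big_df), names(small_df))
```

With the real files, the same two calls would be applied to the data frames returned by read_csv(). Matching column names alone would not prove that the rows match as well.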

Let’s explore the larger file, Summary_O_data.csv.

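# load packages: readr for read_csv(), dplyr for glimpse() and sample_n()
library(tidyverse)

# read dataset
bikes.raw <- read_csv("https://raw.githubusercontent.com/NorfolkDataSci/carCrashesWithBikes/master/Summary_O_data.csv")
# fix the random seed so the sample_n() call further down is reproducible
set.seed(206)
# number of records
nrow(bikes.raw)
[1] 1328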

Examine

Concentrate all your thoughts upon the work at hand. The sun’s rays do not burn until brought to a focus. [Alexander Graham Bell, scientist and inventor]

# columns
names(bikes.raw)
  [1] "Access Control"                               "Alcohol Notalcohol"                          
  [3] "Area Type1"                                   "A Crash"                                     
  [5] "A People"                                     "Basetypedesc"                                
  [7] "Begin Node Dsc"                               "Belted Unbelted"                             
  [9] "Bikeage"                                      "Bikegen"                                     
 [11] "Bikeinjurytype"                               "Bikevehiclenumber"                           
 [13] "Bike Nonbike"                                 "BMP"                                         
 [15] "B Crash"                                      "B People"                                    
 [17] "Carspeedlimit"                                "Collision Type"                              
 [19] "Comm Cargo Body Type Cd"                      "Comm Vehicle Body Type Cd"                   
 [21] "Cotedrouteid"                                 "Coted Mp"                                    
 [23] "Count App"                                    "CRASH_DT (copy)"                             
 [25] "Crash Dt"                                     "Crash Event Type Dsc"                        
 [27] "Crash Military Tm"                            "Crash Severity"                              
 [29] "Crash Year"                                   "Curbgutterdesc"                              
 [31] "C Crash"                                      "C People"                                    
 [33] "Juris Name Used"                              "Area Type Used"                              
 [35] "First Harmful Event of Entire Crash"          "MAINLINE"                                    
 [37] "Time Slicing Used"                            "Phy_Juris_Nm"                                
 [39] "Offset-Ft"                                    "Intersection Analysis"                       
 [41] "Clear"                                        "Calculation_9640323132123359"                
 [43] "Day Of Week"                                  "Deer Nodeer"                                 
 [45] "Direction Of Travel Cd"                       "Distracted Notdistracted"                    
 [47] "VSP"                                          "TOTAL CRASH"                                 
 [49] "Document Nbr"                                 "Driverage"                                   
 [51] "Drivergen"                                    "Driverinjurytype"                            
 [53] "Drivervehiclenumber"                          "Driver Action Type Cd"                       
 [55] "Driver Airbag Deployment"                     "Driver Alcohol Test Type Cd"                 
 [57] "Driver Condition Type Cd"                     "Driver Distraction Type Cd"                  
 [59] "Driver Drinking Type Cd"                      "Driver Drug Use Cd"                          
 [61] "Driver Ejected From Vehicle"                  "Driver Ems Transport Ind"                    
 [63] "Driver Fled Scene Ind"                        "Driver Safety Equip Used"                    
 [65] "Driver Vis Obscured Type Cd"                  "Drowsy Notdrowsy"                            
 [67] "Drug Nodrug"                                  "EMP"                                         
 [69] "End Node"                                     "End Node Dsc"                                
 [71] "End Offset"                                   "Facility"                                    
 [73] "FAC"                                          "First Crash Event Cd"                        
 [75] "First Harmful Event"                          "Fourth Crash Event Cd"                       
 [77] "Functionalclass"                              "FUN"                                         
 [79] "Govcondesc"                                   "Gr Nogr"                                     
 [81] "Hitrun Not Hitrun"                            "Initial Veh Impact Area Cd"                  
 [83] "Injury Crashes"                               "Intersection Type"                           
 [85] "Int Doc"                                      "Jurtype"                                     
 [87] "K_CRASH"                                      "K People"                                    
 [89] "LAT (copy)"                                   "LAT"                                         
 [91] "Leftshoulderwidth"                            "Length"                                      
 [93] "Lgtruck Nonlgtruck"                           "Light Condition"                             
 [95] "Located Unlocated"                            "LON (copy)"                                  
 [97] "LON"                                          "MAINLINE (group)"                            
 [99] "Mainline Yn"                                  "Medianleftshoulderwidth"                     
[101] "Medianrightshoulderwidth"                     "Mediantypedesc"                              
[103] "Medianwidthmax"                               "Medianwidthmin"                              
[105] "Median Type"                                  "Most Harmful Crash Event Cd"                 
[107] "Motor Nonmotor"                               "Node"                                        
[109] "Node Info"                                    "Node Totaadt2011"                            
[111] "Node Totaadt2012"                             "Node Totaadt2013"                            
[113] "Node Totaadt2014"                             "Node Totaadt2015"                            
[115] "Numberoflane"                                 "Number of Records"                           
[117] "Offset"                                       "Ownership"                                   
[119] "Passage"                                      "Passgen"                                     
[121] "Passinjurytype"                               "Passvehiclenumber"                           
[123] "Pass Airbag Deployment"                       "Pass Ejected From Vehicle"                   
[125] "Pass Ems Transport Ind"                       "Pass Safety Equip Used"                      
[127] "Pavementconditionvalue"                       "Pavementroughnessvalue"                      
[129] "Pavementwidth"                                "Pdo Crash"                                   
[131] "Total People Killed & Injured"                "Pedage"                                      
[133] "Pedestrians Injured"                          "Pedestrians Killed"                          
[135] "Pedgen"                                       "Pedinjurytype"                               
[137] "Pednumber"                                    "Ped Action"                                  
[139] "Ped Al Test"                                  "Ped Cond"                                    
[141] "Ped Drink"                                    "Ped Drug"                                    
[143] "Ped Nonped"                                   "Ped Rflct"                                   
[145] "Persons Injured"                              "Persons Killed"                              
[147] "Physical Juris Nm"                            "PJR"                                         
[149] "VSP_Used"                                     "Rd Type"                                     
[151] "Relation To Roadway"                          "Rightshoulderwidth"                          
[153] "RNS_MP (0.25 mi bin)"                         "Rns Mp"                                      
[155] "Roadway Alignment"                            "Roadway Defect"                              
[157] "Roadway Description"                          "Roadway Surface Cond"                        
[159] "Roadway Surface Type"                         "Route Or Street Nm"                          
[161] "Rte Category Cd"                              "Rte Cat"                                     
[163] "Rte Nm"                                       "Ruralurbandesc"                              
[165] "School Zone"                                  "Second Crash Event Cd"                       
[167] "Segtotaadt2011"                               "Segtotaadt2012"                              
[169] "Segtotaadt2013"                               "Segtotaadt2014"                              
[171] "Segtotaadt2015"                               "Senior Notsenior"                            
[173] "Sidewalkdesc"                                 "Speed Before"                                
[175] "Speed Max Safe"                               "Speed Notspeed"                              
[177] "Speed Posted"                                 "Start Node"                                  
[179] "Start Offset"                                 "Summons Issued Cd"                           
[181] "Surfacedesc"                                  "Third Crash Event Cd"                        
[183] "Time Slicing"                                 "Total Crashes including Property Damage Only"
[185] "Total Crash"                                  "Traffic Control Type"                        
[187] "Trfc Ctrl Status Type"                        "Truckcommr"                                  
[189] "Vehiclenumber"                                "Vehicle Body Type Cd"                        
[191] "Vehicle Make Nm"                              "Vehicle Maneuver Type Cd"                    
[193] "Vehicle Model Nm"                             "Vehicle Year Nbr"                            
[195] "District"                                     "District_Used"                               
[197] "Weather Condition"                            "Work Zone Location"                          
[199] "Work Zone Related"                            "Work Zone Type"                              
[201] "Young Notyoung"                              
glimpse(bikes.raw)
Observations: 1,328
Variables: 201
$ Access Control                               <chr> "No Access Control", "No Access Control", "No Ac...
$ Alcohol Notalcohol                           <chr> "Not ALCIHOL", "Not ALCIHOL", "Not ALCIHOL", "No...
$ Area Type1                                   <chr> "Urban", "Urban", "Urban", "Urban", "Urban", "Ur...
$ A Crash                                      <int> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, ...
$ A People                                     <int> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, ...
$ Basetypedesc                                 <chr> "Bituminous Concrete (Black Base)", "Bituminous ...
$ Begin Node Dsc                               <chr> "US-00058(B)/", "FAIRFIELD BLVD(L)/", "REPUBLIC ...
$ Belted Unbelted                              <chr> "Not UNBELTED", "Not UNBELTED", "Not UNBELTED", ...
$ Bikeage                                      <int> 49, 25, 17, 14, 44, 20, NA, 20, 19, 61, 17, 21, ...
$ Bikegen                                      <chr> "Male", "Male", "Male", "Male", "Not Provided", ...
$ Bikeinjurytype                               <chr> "B", "B", "C", "C", "C", "C", "B", "C", "A", "C"...
$ Bikevehiclenumber                            <int> 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 2, 2, 2, ...
$ Bike Nonbike                                 <chr> "BIKE", "BIKE", "BIKE", "BIKE", "BIKE", "BIKE", ...
$ BMP                                          <dbl> 13.43, 11.52, 0.76, NA, 25.65, NA, NA, 2.15, 2.7...
$ B Crash                                      <int> 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, ...
$ B People                                     <int> 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, ...
$ Carspeedlimit                                <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Collision Type                               <chr> "2. Angle", "2. Angle", "2. Angle", "2. Angle", ...
$ Comm Cargo Body Type Cd                      <chr> "Not ProvidedNot Provided", "Not Provided,Not Pr...
$ Comm Vehicle Body Type Cd                    <chr> "Not ProvidedNot Provided", "Not Provided,Not Pr...
$ Cotedrouteid                                 <chr> "SR00190", "SR00190", "13400009", NA, "US00017",...
$ Coted Mp                                     <dbl> 13.96, 11.53, 0.76, NA, 25.66, NA, NA, 2.15, 2.7...
$ Count App                                    <int> 3, 4, 4, NA, 3, 4, NA, 4, 3, NA, 3, 3, 3, 4, 4, ...
$ CRASH_DT (copy)                              <chr> "1/4/2011", "1/6/2011", "1/6/2011", "1/8/2011", ...
$ Crash Dt                                     <chr> "1/4/2011", "1/6/2011", "1/6/2011", "1/8/2011", ...
$ Crash Event Type Dsc                         <chr> "20. Motor Vehicle In Transport", "22. Bicycle",...
$ Crash Military Tm                            <int> 610, 1715, 1930, 1624, 1330, 1923, 1145, 1005, 1...
$ Crash Severity                               <chr> "B.Visiible Injury", "B.Visiible Injury", "C.Non...
$ Crash Year                                   <int> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, ...
$ Curbgutterdesc                               <chr> "Left and Right sides", "Left and Right sides", ...
$ C Crash                                      <int> 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, ...
$ C People                                     <int> 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, ...
$ Juris Name Used                              <chr> "Virginia Beach", "Virginia Beach", "Virginia Be...
$ Area Type Used                               <chr> "Urban", "Urban", "Urban", "Urban", "Urban", "Ur...
$ First Harmful Event of Entire Crash          <chr> "20. Motor Vehicle In Transport", "22. Bicycle",...
$ MAINLINE                                     <chr> "MAIN", "MAIN", "MAIN", "MAIN", "MAIN", "MAIN", ...
$ Time Slicing Used                            <chr> "6AM TO 9AM", "3PM TO 6PM", "6PM TO 9PM", "3PM T...
$ Phy_Juris_Nm                                 <chr> "134.Virginia Beach", "134.Virginia Beach", "134...
$ Offset-Ft                                    <dbl> 1716, 2101, 898, NA, 655, 0, NA, 0, 48, NA, 697,...
$ Intersection Analysis                        <chr> "Intersection", "Intersection", "Intersection", ...
$ Clear                                        <chr> "Reset Filters", "Reset Filters", "Reset Filters...
$ Calculation_9640323132123359                 <chr> "VDOT_OTHER", "VDOT_OTHER", "VDOT_OTHER", "VDOT_...
$ Day Of Week                                  <chr> "Tuesday", "Thursday", "Thursday", "Saturday", "...
$ Deer Nodeer                                  <chr> "Not DEER", "Not DEER", "Not DEER", "Not DEER", ...
$ Direction Of Travel Cd                       <chr> "SouthEast", "North,South", "North,East", "South...
$ Distracted Notdistracted                     <chr> NA, NA, "DISTRACTED", NA, "DISTRACTED", NA, NA, ...
$ VSP                                          <int> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, ...
$ TOTAL CRASH                                  <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
$ Document Nbr                                 <int> 111050057, 111050063, 111050064, 111640067, 1205...
$ Driverage                                    <int> 62, NA, 73, 40, NA, 30, 55, 26, 20, 47, NA, 31, ...
$ Drivergen                                    <chr> "Male", "Male", "Male", "Male", "Not Provided", ...
$ Driverinjurytype                             <chr> "B,PDO", "B,PDO", "PDO,C", "PDO,C", "PDO,C", "PD...
$ Drivervehiclenumber                          <dbl> 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, ...
$ Driver Action Type Cd                        <chr> "28. Driving Without Lights,1. No Improper Actio...
$ Driver Airbag Deployment                     <chr> "3. Unavailable/Not Applicable,1. Deployed - Fro...
$ Driver Alcohol Test Type Cd                  <chr> "Not Applicable,Not Applicable", "Not Applicable...
$ Driver Condition Type Cd                     <chr> "1. No Defects,1. No Defects", "1. No Defects,9....
$ Driver Distraction Type Cd                   <chr> "Not Applicable,Not Applicable", "Not Applicable...
$ Driver Drinking Type Cd                      <chr> "1. Had Not Been Drinking,1. Had Not Been Drinki...
$ Driver Drug Use Cd                           <chr> "Not Applicable,Not Applicable", "2. No,3. Unkno...
$ Driver Ejected From Vehicle                  <chr> "2. Partially Ejected,1. Not Ejected", "1. Not E...
$ Driver Ems Transport Ind                     <chr> "No,NotProvided", "No,NotProvided", "NotProvided...
$ Driver Fled Scene Ind                        <chr> "No,No", "No,Yes", "No,No", "No,No", "Yes,No", "...
$ Driver Safety Equip Used                     <chr> "8. No Restraint Used,3. Lap and Shoulder Belt",...
$ Driver Vis Obscured Type Cd                  <chr> "1. Not Obscured,13. Other", "1. Not Obscured,No...
$ Drowsy Notdrowsy                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Drug Nodrug                                  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ EMP                                          <int> 14, 12, 1, NA, 26, NA, NA, 2, 3, NA, 3, 0, 1, 1,...
$ End Node                                     <int> 541186, 541148, 541257, NA, 483086, NA, NA, 5411...
$ End Node Dsc                                 <chr> "LAVENDER LANE(R)/", "SR-00165(B)/", "134-08714(...
$ End Offset                                   <dbl> 0.00, 0.00, 0.00, NA, 0.00, NA, NA, 0.00, 0.00, ...
$ Facility                                     <int> 1, 1, 1, NA, 0, 1, NA, 1, 1, NA, 0, 0, 0, 1, 2, ...
$ FAC                                          <chr> "1.Divided, no control of access", "1.Divided, n...
$ First Crash Event Cd                         <chr> "20,22", "20,22", "20,20", "22,20", "22,22", "22...
$ First Harmful Event                          <chr> "1. On Roadway", "1. On Roadway", "1. On Roadway...
$ Fourth Crash Event Cd                        <chr> "0,0", "0,0", "0,0", "0,0", "0,0", "0,0", "0,0",...
$ Functionalclass                              <chr> "H", "H", "H", NA, "E", "H", NA, "E", "H", NA, "...
$ FUN                                          <chr> "H.Urban Minor Arterial", "H.Urban Minor Arteria...
$ Govcondesc                                   <chr> "Urban Extensions - Primary Routes", "Urban Exte...
$ Gr Nogr                                      <chr> "Not GUARDRAIL", "Not GUARDRAIL", "Not GUARDRAIL...
$ Hitrun Not Hitrun                            <chr> "Not HIT_RUN", "HIT_RUN", "Not HIT_RUN", "Not HI...
$ Initial Veh Impact Area Cd                   <chr> "2,12", "3,1", "12,3", "12,9", "1,6", "12,4", "1...
$ Injury Crashes                               <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, ...
$ Intersection Type                            <chr> "2. Two Approaches", "4. Four Approaches", "4. F...
$ Int Doc                                      <int> 111050057, 111050063, 111050064, NA, 120585198, ...
$ Jurtype                                      <chr> "C", "C", "C", "C", "C", "C", "C", "C", "C", "C"...
$ K_CRASH                                      <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, ...
$ K People                                     <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, ...
$ LAT (copy)                                   <dbl> 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, ...
$ LAT                                          <dbl> 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, ...
$ Leftshoulderwidth                            <int> 0, 0, 0, NA, 0, NA, NA, 0, 0, NA, 0, 0, 0, 0, 10...
$ Length                                       <dbl> 0.64, 0.59, 0.19, NA, 0.13, NA, NA, 0.23, 0.03, ...
$ Lgtruck Nonlgtruck                           <chr> "Not LRTRUCK", "Not LRTRUCK", "Not LRTRUCK", "No...
$ Light Condition                              <chr> "4. Darkness - Road Lighted", "3. Dusk", "4. Dar...
$ Located Unlocated                            <chr> "LOCATED", "LOCATED", "LOCATED", "UNLOCATED", "L...
$ LON (copy)                                   <dbl> -76, -76, -76, -76, -76, -76, -76, -76, -76, -76...
$ LON                                          <dbl> -76, -76, -76, -76, -76, -76, -76, -76, -76, -76...
$ MAINLINE (group)                             <chr> "MAINLINE", "MAINLINE", "MAINLINE", "MAINLINE", ...
$ Mainline Yn                                  <chr> "MAIN", "MAIN", "MAIN", NA, "MAIN", NA, NA, "MAI...
$ Medianleftshoulderwidth                      <int> 0, 0, 0, NA, 0, NA, NA, 6, 0, NA, 0, 0, 0, 0, 0,...
$ Medianrightshoulderwidth                     <int> 0, 0, 0, NA, 0, NA, NA, 6, 0, NA, 0, 0, 0, 0, 0,...
$ Mediantypedesc                               <chr> "Curbed Grass", "Curbed Grass", "Curbed Grass", ...
$ Medianwidthmax                               <int> 16, 16, 16, NA, 0, NA, NA, 16, 14, NA, 0, 0, 0, ...
$ Medianwidthmin                               <int> 8, 4, 4, NA, 0, NA, NA, 4, 3, NA, 0, 0, 0, 16, 1...
$ Median Type                                  <chr> "Divided Roadway", "Divided Roadway", "Divided R...
$ Most Harmful Crash Event Cd                  <chr> "20,22", "20,22", "20,20", "22,20", "22,22", "22...
$ Motor Nonmotor                               <chr> "Not MOTORCYCLE", "Not MOTORCYCLE", "Not MOTORCY...
$ Node                                         <int> 541188, 541174, 541066, NA, 483086, 541275, NA, ...
$ Node Info                                    <chr> "541188.  SR00190       SR00190      13408749", ...
$ Node Totaadt2011                             <int> 20258, 48566, 63023, NA, 27156, 57323, NA, 57661...
$ Node Totaadt2012                             <int> 18296, 44998, 61504, NA, 28053, 59152, NA, 55396...
$ Node Totaadt2013                             <int> 18464, 45413, 62556, NA, 26395, 59698, NA, 55735...
$ Node Totaadt2014                             <int> 17795, 43765, 60804, NA, 24772, 57532, NA, 54348...
$ Node Totaadt2015                             <int> 17531, 44158, 62575, NA, 25667, 59080, NA, 54198...
$ Numberoflane                                 <int> 4, 4, 4, NA, 4, NA, NA, 6, 4, NA, 4, 2, 2, 4, 4,...
$ Number of Records                            <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
$ Offset                                       <dbl> 0.325, 0.398, 0.170, NA, 0.124, 0.000, NA, 0.000...
$ Ownership                                    <chr> "PRI_URBAN", "PRI_URBAN", "SEC_URBAN", "SEC_URBA...
$ Passage                                      <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Passgen                                      <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Passinjurytype                               <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Passvehiclenumber                            <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Pass Airbag Deployment                       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Pass Ejected From Vehicle                    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Pass Ems Transport Ind                       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Pass Safety Equip Used                       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Pavementconditionvalue                       <dbl> 3, 0, 0, NA, 0, NA, NA, 0, 0, NA, 0, 0, 0, 0, 0,...
$ Pavementroughnessvalue                       <int> 0, 265, 0, NA, 188, NA, NA, 115, 0, NA, 0, 0, 0,...
$ Pavementwidth                                <int> 54, 54, 52, NA, 56, NA, NA, 76, 52, NA, 52, 0, 3...
$ Pdo Crash                                    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
$ Total People Killed & Injured                <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
$ Pedage                                       <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Pedestrians Injured                          <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
$ Pedestrians Killed                           <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
$ Pedgen                                       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Pedinjurytype                                <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Pednumber                                    <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Ped Action                                   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Ped Al Test                                  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Ped Cond                                     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Ped Drink                                    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Ped Drug                                     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Ped Nonped                                   <chr> "Not PED", "Not PED", "Not PED", "Not PED", "Not...
$ Ped Rflct                                    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Persons Injured                              <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, ...
$ Persons Killed                               <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, ...
$ Physical Juris Nm                            <chr> "Virginia Beach", "Virginia Beach", "Virginia Be...
$ PJR                                          <int> 134, 134, 134, 121, 124, 134, 131, 134, 134, 122...
$ VSP_Used                                     <int> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, ...
$ Rd Type                                      <chr> "NOT-RD", "NOT-RD", "NOT-RD", "NOT-RD", "NOT-RD"...
$ Relation To Roadway                          <chr> "15. Other Crossing (Crossing for Bikes, School,...
$ Rightshoulderwidth                           <int> 0, 0, 0, NA, 0, NA, NA, 0, 0, NA, 0, 0, 0, 0, 10...
$ RNS_MP (0.25 mi bin)                         <dbl> 13.75, 11.50, 0.75, NA, 25.50, 6.50, NA, 2.00, 2...
$ Rns Mp                                       <dbl> 13.95, 11.53, 0.75, NA, 25.66, 6.68, NA, 2.14, 2...
$ Roadway Alignment                            <chr> "1. Straight - Level", "1. Straight - Level", "2...
$ Roadway Defect                               <chr> "1. No Defects", "1. No Defects", "1. No Defects...
$ Roadway Description                          <chr> "1. Two-Way, Not Divided", "2. Two-Way, Divided,...
$ Roadway Surface Cond                         <chr> "1. Dry", "1. Dry", "1. Dry", "1. Dry", "2. Wet"...
$ Roadway Surface Type                         <chr> "2. Blacktop, Asphalt, Bituminous", "2. Blacktop...
$ Route Or Street Nm                           <chr> "527 n witchduck rd", "kempsvillr rd", "799 firs...
$ Rte Category Cd                              <chr> "STPRI", "STPRI", "URB", "UNKWN", "USPRI", "URB"...
$ Rte Cat                                      <chr> "SR", "SR", "UR", "SC", "US", "UR", "SC", "SR", ...
$ Rte Nm                                       <chr> "R-VA   SR00190WB", "R-VA   SR00190WB", "R-VA134...
$ Ruralurbandesc                               <chr> "Urbanized (Population 200,000 and over)", "Urba...
$ School Zone                                  <chr> "3. No", "3. No", "3. No", "3. No", "3. No", "3....
$ Second Crash Event Cd                        <chr> "0,0", "0,0", "0,0", "0,0", "0,0", "0,0", "0,0",...
$ Segtotaadt2011                               <int> 18442, 31529, 36525, NA, 24625, NA, NA, 50491, 1...
$ Segtotaadt2012                               <int> 16730, 29632, 34557, NA, 25523, NA, NA, 47214, 1...
$ Segtotaadt2013                               <int> 16884, 29906, 34876, NA, 26422, NA, NA, 47482, 1...
$ Segtotaadt2014                               <int> 16272, 28820, 33610, NA, 22408, NA, NA, 46377, 1...
$ Segtotaadt2015                               <int> 15297, 27832, 35142, NA, 23257, NA, NA, 46649, 1...
$ Senior Notsenior                             <chr> "Not SENIOR", "Not SENIOR", "SENIOR", "Not SENIO...
$ Sidewalkdesc                                 <chr> "Left and Right sides", "Left and Right sides", ...
$ Speed Before                                 <chr> "5,5", "2,0", "30,3", "5,5", "0,5", "30,0", "5,5...
$ Speed Max Safe                               <chr> "5,5", "35,0", "35,35", "35,0", "35,35", "45,0",...
$ Speed Notspeed                               <chr> "Not SPEED", "Not SPEED", "Not SPEED", "SPEED", ...
$ Speed Posted                                 <chr> "35,25", "35,0", "35,35", "35,0", "35,35", "45,0...
$ Start Node                                   <int> 541088, 541176, 734938, NA, 483087, NA, NA, 5412...
$ Start Offset                                 <dbl> 0.00, 0.00, 0.00, NA, 0.14, NA, NA, 0.00, 0.00, ...
$ Summons Issued Cd                            <chr> "2. No,2. No", "2. No,Not Provided", "Not Provid...
$ Surfacedesc                                  <chr> "0", "0", "0", NA, "0", NA, NA, "0", "0", NA, "0...
$ Third Crash Event Cd                         <chr> "0,0", "0,0", "0,0", "0,0", "0,0", "0,0", "0,0",...
$ Time Slicing                                 <chr> "6AM TO 9AM", "3PM TO 6PM", "6PM TO 9PM", "3PM T...
$ Total Crashes including Property Damage Only <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
$ Total Crash                                  <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
$ Traffic Control Type                         <chr> "4. Stop Sign", "3. Traffic Signal", "3. Traffic...
$ Trfc Ctrl Status Type                        <chr> "6. No Traffic Control Device Present", "1. Yes ...
$ Truckcommr                                   <chr> "Not a Parkway - Trucks and Commercial Vehicles ...
$ Vehiclenumber                                <dbl> 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, ...
$ Vehicle Body Type Cd                         <chr> "9. Bicycle,1. Passenger car", "9. Bicycle,1. Pa...
$ Vehicle Make Nm                              <chr> "huffy,ford", "giant", "dodge,fuju", "DODGE", NA...
$ Vehicle Maneuver Type Cd                     <chr> "1. Going Straight Ahead,3. Making Left Turn", "...
$ Vehicle Model Nm                             <chr> "bicycle,focus", "defy 3", "journey,bicycle", "R...
$ Vehicle Year Nbr                             <dbl> 2009, 2009, 20092007, 1998, 0, 2007, 20072000, 2...
$ District                                     <int> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, ...
$ District_Used                                <chr> "5.Hampton Roads", "5.Hampton Roads", "5.Hampton...
$ Weather Condition                            <chr> "1. No Adverse Condition (Clear/Cloudy)", "1. No...
$ Work Zone Location                           <chr> "Not Provided", "Not Provided", "Not Provided", ...
$ Work Zone Related                            <chr> "2. No", "2. No", "2. No", "2. No", "2. No", "2....
$ Work Zone Type                               <chr> "Not Provided", "Not Provided", "Not Provided", ...
$ Young Notyoung                               <chr> "Not YOUNG", "Not YOUNG", "YOUNG", "Not YOUNG", ...
sample_n(bikes.raw, 20)

After reviewing the list of columns, I’ve decided to concentrate on the following:

| Column | New Column | Notes |
|---|---|---|
| [2] “Alcohol Notalcohol” | AlcoholRelated | factor variable of Yes/No, discrete-nominal |
| [9] “Bikeage” | BikerAge | continuous |
| [10] “Bikegen” | BikerGender | discrete-nominal |
| [18] “Collision Type” | CollisionType | consider split into cd and description, discrete-nominal |
| [25] “Crash Dt” | CrashDt | continuous |
| [26] “Crash Event Type Dsc” | CrashEventTypeDesc | consider split into cd and description, discrete-nominal |
| [27] “Crash Military Tm” | CrashTime | continuous |
| [28] “Crash Severity” | crashSeverity | consider split into cd and description, discrete-ordinal |
| [29] “Crash Year” | CrashYear | continuous |
| [37] “Time Slicing Used” | TimeFrame | discrete-ordinal |
| [40] “Intersection Analysis” | IntersectionAnalysis | discrete-nominal |
| [43] “Day Of Week” | DayOfWeek | discrete-ordinal |
| [45] “Direction Of Travel Cd” | DirectionOfTravelCd | discrete-nominal |
| [46] “Distracted Notdistracted” | Distracted | discrete-nominal |
| [49] “Document Nbr” | DocNbr | unique identifier, discrete-nominal, identification variable |
| [50] “Driverage” | DriverAge | continuous |
| [51] “Drivergen” | DriverGender | discrete-nominal |
| [54] “Driver Action Type Cd” | DriverActionTypeCd | discrete-nominal |
| [56] “Driver Alcohol Test Type Cd” | DriverAlcoholTestTypeCd | discrete-nominal |
| [58] “Driver Distraction Type Cd” | DriverDistractionTypeCd | discrete-nominal |
| [59] “Driver Drinking Type Cd” | DriverDrinkingTypeCd | discrete-nominal |
| [60] “Driver Drug Use Cd” | DriverDrugUseCd | discrete-nominal |
| [61] “Driver Ejected From Vehicle” | DriverEjectedFromVehicle | discrete-nominal |
| [65] “Driver Vis Obscured Type Cd” | DriverVisObscuredTypeCd | discrete-nominal |
| [81] “Hitrun Not Hitrun” | HitAndRun | Yes/No, discrete-nominal |
| [84] “Intersection Type” | IntersectionType | discrete-nominal |
| [90] “LAT” | lat | continuous |
| [94] “Light Condition” | LightCondition | discrete-nominal |
| [97] “LON” | lng | continuous |
| [145] “Persons Injured” | NumInjured | continuous |
| [146] “Persons Killed” | NumKilled | continuous |
| [158] “Roadway Surface Cond” | RoadwaySurfaceCond | discrete-nominal |
| [190] “Vehicle Body Type Cd” | VehicleBodyTypeCd | discrete-nominal |
| [192] “Vehicle Maneuver Type Cd” | VehicleManeuverTypeCd | discrete-nominal |
| [197] “Weather Condition” | WeatherCond | discrete-ordinal |

Why these columns? Many columns were excluded because:

  1. They provided no meaningful information, e.g. [1] “Access Control”, [41] “Clear”
  2. I didn’t think the information was relevant for this analysis, e.g. [191] “Vehicle Make Nm”
  3. I had no clue what the information meant, e.g. [14] “BMP”, [74] “First Crash Event Cd”
  4. I felt the information was redundant, meaningless, or could be redefined in some other way, e.g. [53] “Drivervehiclenumber”, [70] “End Node Dsc”, [201] “Young Notyoung”
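
The bracketed numbers above are column positions in names(bikes.raw). Positions shift if the source file ever gains or loses a column, so the same kind of selection can also be expressed by name. A minimal sketch on a toy data frame (the `toy` frame, its values, and the `keep` lookup are illustrative and not part of the original analysis):

```r
# Toy stand-in for bikes.raw; only the column names mirror the real file.
toy <- data.frame("Alcohol Notalcohol" = c("Not ALCOHOL", "ALCOHOL")
                  , "Bikeage" = c(28, 41)
                  , "BMP" = c(NA, NA)
                  , check.names = FALSE
                  , stringsAsFactors = FALSE)

# lookup of the form: new name = original name
keep <- c(AlcoholRelated = "Alcohol Notalcohol"
          , BikerAge = "Bikeage")

picked <- toy[, unname(keep), drop = FALSE]  # select by name, in the order listed
names(picked) <- names(keep)                 # apply the new names
names(picked)
```

A misspelled original name fails loudly with an "undefined columns selected" error, whereas a wrong position silently picks a different column.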
bikeDF <- bikes.raw[, c(2,9,10,18,25:29,37,40,43,45,46,49:51,54,56
                      ,58:61,65,81,84,90,94,97,145,146,158,190,192,197)]
# rename columns
colnames(bikeDF) <- c("AlcoholRelated", "BikerAge", "BikerGender"
                  , "CollisionType", "CrashDt", "CrashEventTypeDesc", "CrashTime"
                  , "crashSeverity", "CrashYear"
                  , "TimeFrame", "IntersectionAnalysis", "DayOfWeek", "DirectionOfTravelCd"
                  , "Distracted", "DocNbr", "DriverAge"
                  , "DriverGender", "DriverActionTypeCd", "DriverAlcoholTestTypeCd"
                  , "DriverDistractionTypeCd", "DriverDrinkingTypeCd"
                  , "DriverDrugUseCd", "DriverEjectedFromVehicle"
                  , "DriverVisObscuredTypeCd", "HitAndRun", "IntersectionType"
                  , "lat", "LightCondition"
                  , "lng", "NumInjured", "NumKilled", "RoadwaySurfaceCond"
                  , "VehicleBodyTypeCd", "VehicleManeuverTypeCd"
                  , "WeatherCond")
summary(bikeDF)
 AlcoholRelated        BikerAge  BikerGender        CollisionType        CrashDt         
 Length:1328        Min.   : 4   Length:1328        Length:1328        Length:1328       
 Class :character   1st Qu.:19   Class :character   Class :character   Class :character  
 Mode  :character   Median :28   Mode  :character   Mode  :character   Mode  :character  
                    Mean   :33                                                           
                    3rd Qu.:49                                                           
                    Max.   :85                                                           
                    NA's   :26                                                           
 CrashEventTypeDesc   CrashTime    crashSeverity        CrashYear     TimeFrame        
 Length:1328        Min.   :   0   Length:1328        Min.   :2011   Length:1328       
 Class :character   1st Qu.:1110   Class :character   1st Qu.:2012   Class :character  
 Mode  :character   Median :1501   Mode  :character   Median :2013   Mode  :character  
                    Mean   :1421                      Mean   :2013                     
                    3rd Qu.:1753                      3rd Qu.:2014                     
                    Max.   :2356                      Max.   :2016                     
                                                                                       
 IntersectionAnalysis  DayOfWeek         DirectionOfTravelCd  Distracted            DocNbr         
 Length:1328          Length:1328        Length:1328         Length:1328        Min.   :110810052  
 Class :character     Class :character   Class :character    Class :character   1st Qu.:121615168  
 Mode  :character     Mode  :character   Mode  :character    Mode  :character   Median :132482536  
                                                                                Mean   :133483626  
                                                                                3rd Qu.:142960095  
                                                                                Max.   :161960036  
                                                                                                   
   DriverAge        DriverGender       DriverActionTypeCd DriverAlcoholTestTypeCd DriverDistractionTypeCd
 Min.   :       1   Length:1328        Length:1328        Length:1328             Length:1328            
 1st Qu.:      27   Class :character   Class :character   Class :character        Class :character       
 Median :      41   Mode  :character   Mode  :character   Mode  :character        Mode  :character       
 Mean   :   58970                                                                                        
 3rd Qu.:      57                                                                                        
 Max.   :65323423                                                                                        
 NA's   :172                                                                                             
 DriverDrinkingTypeCd DriverDrugUseCd    DriverEjectedFromVehicle DriverVisObscuredTypeCd
 Length:1328          Length:1328        Length:1328              Length:1328            
 Class :character     Class :character   Class :character         Class :character       
 Mode  :character     Mode  :character   Mode  :character         Mode  :character       
                                                                                         
                                                                                         
                                                                                         
                                                                                         
  HitAndRun         IntersectionType        lat     LightCondition          lng        NumInjured 
 Length:1328        Length:1328        Min.   :37   Length:1328        Min.   :-78   Min.   :0.0  
 Class :character   Class :character   1st Qu.:37   Class :character   1st Qu.:-76   1st Qu.:1.0  
 Mode  :character   Mode  :character   Median :37   Mode  :character   Median :-76   Median :1.0  
                                       Mean   :37                      Mean   :-76   Mean   :1.1  
                                       3rd Qu.:37                      3rd Qu.:-76   3rd Qu.:1.0  
                                       Max.   :38                      Max.   :-75   Max.   :4.0  
                                                                                                  
   NumKilled    RoadwaySurfaceCond VehicleBodyTypeCd  VehicleManeuverTypeCd WeatherCond       
 Min.   :0.00   Length:1328        Length:1328        Length:1328           Length:1328       
 1st Qu.:0.00   Class :character   Class :character   Class :character      Class :character  
 Median :0.00   Mode  :character   Mode  :character   Mode  :character      Mode  :character  
 Mean   :0.01                                                                                 
 3rd Qu.:0.00                                                                                 
 Max.   :1.00                                                                                 
                                                                                              

One thing that stands out is the DriverAge column. It appears to contain invalid data. Where in the world would you find drivers ranging in age from 1 to 65323423?

Let’s get a frequency count of the driver ages.

table(bikeDF$DriverAge)

       1        2        3        4        9       15       16       17       18       19       20 
       1        2        2        1        1        1        5        9       18       17       27 
      21       22       23       24       25       26       27       28       29       30       31 
      28       31       34       28       23       35       27       21       21       31       24 
      32       33       34       35       36       37       38       39       40       41       42 
      24       21       19       18       22       16       23       10       19       24       20 
      43       44       45       46       47       48       49       50       51       52       53 
      17       16       19       20       13       16       30       17       19       25       15 
      54       55       56       57       58       59       60       61       62       63       64 
      14       17       19       15       16       18       16        9       16       13       19 
      65       66       67       68       69       70       71       72       73       74       75 
       7       16       10        9        8        8        7        8       13        8        5 
      76       77       78       79       80       81       82       83       84       85       86 
       5        5        5        4        6        3        6        2        1        4        3 
      87       89       90       91       96       97     1921     2056     2154     2337     2447 
       2        2        1        2        1        1        1        1        1        1        1 
    2633     2736     3039     4640     4823     6057     6347     6577     6767     8144   272730 
       1        1        1        1        1        1        1        1        1        1        1 
  281837   315035   421414   571715   871414 65323423 
       1        1        1        1        1        1 

Those ages over 1000 appear to be several ages run together. For example, 1921 is actually two ages, 19 and 21. A larger number like 315035 is actually three ages: 31, 50, and 35.
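
The pairing rule can be sketched as a small helper that cuts the digits into chunks of two. The name `split_ages` is hypothetical, and the sketch assumes every age in a group has exactly two digits, which the data does not guarantee:

```r
# Split a concatenated value such as 315035 into two-digit ages.
# Assumption: each age in the group is two digits long, so a group holding
# a one-digit or three-digit age cannot be recovered this way.
split_ages <- function(x) {
  digits <- sprintf("%.0f", x)               # plain digits, no scientific notation
  starts <- seq(1, nchar(digits), by = 2)    # start position of each pair
  as.numeric(substring(digits, starts, starts + 1))
}

split_ages(1921)      # 19 21
split_ages(315035)    # 31 50 35
split_ages(65323423)  # 65 32 34 23
```

The helper handles one value at a time; it could be applied to a whole column with lapply().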

Given this observation, create a DriverAge2 column which contains the revised driver’s age. For ages under 100, it holds the age as recorded. For values of 1000 or more, it holds the first two digits, that is, the first age in the group.

bikeDF <- bikeDF %>% 
  mutate(DriverAge2 = ifelse(DriverAge >= 1000
                             # take the first two digits of this row's own value
                             , as.numeric(substr(as.character(DriverAge), 1, 2))
                             , DriverAge))

Here’s a frequency count of the revised driver ages.

table(bikeDF$DriverAge2)

 1  2  3  4  9 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 
 1  2  2  1  1  1  5  9 18 17 27 28 31 34 28 23 35 27 21 21 31 24 24 21 19 18 22 16 23 10 19 24 20 17 16 
45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 
19 20 13 16 30 17 19 25 15 14 17 19 15 16 18 16  9 16 13 19  7 16 10  9  8  8  7  8 13  8  5  5  5  5  4 
80 81 82 83 84 85 86 87 89 90 91 96 97 
 6  3  6  2  1  4  3  2  2  1  2  1  1 
summary(bikeDF$DriverAge2)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
      1      27      41      43      56      97     194 

This looks more sensible.

Frequency of bicycle accidents by year. The latest crash in the data is dated 2016-05-31, so the 2016 count covers only part of the year.

table(bikeDF$CrashYear)

2011 2012 2013 2014 2015 2016 
 252  285  264  245  221   61 
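
To put a number on the year-to-year movement, the counts printed above can be turned into percent changes. This sketch types the counts in by hand and leaves out 2016, since the latest crash date in the data is 2016-05-31 and that year is therefore incomplete:

```r
# yearly counts copied from the table above (2016 omitted as a partial year)
crashes <- c("2011" = 252, "2012" = 285, "2013" = 264, "2014" = 245, "2015" = 221)

# percent change relative to the previous year
pct_change <- 100 * diff(crashes) / head(crashes, -1)
round(pct_change, 1)
```

With these counts, 2012 is about 13% above 2011, and each later full year is roughly 7 to 10% below the year before it.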

Prep

It is not enough that we do our best; sometimes we must do what is required. [Winston Churchill, former British prime minister]
# re-code variables
bikeDF <- bikeDF %>% 
  mutate(AlcoholRelated2 = ifelse(grepl("^Not", AlcoholRelated, ignore.case=TRUE), "No", "Yes"));
bikeDF <- bikeDF %>% 
  mutate(HitAndRun2 = ifelse(grepl("^Not", HitAndRun, ignore.case=TRUE), "No", "Yes"));
bikeDF <- bikeDF %>% 
  mutate(Intersection = ifelse(grepl("^Not", IntersectionAnalysis, ignore.case=TRUE), "No", "Yes"));
bikeDF <- bikeDF %>% 
  mutate(Distracted = ifelse(grepl("^Distracted", Distracted, ignore.case=TRUE), "Yes", "No"));
bikeDF <- bikeDF %>% 
  # drop the bicycle entry and the separating comma, keeping the other vehicle's body type
  mutate(VehicleBodyType = stringi::stri_trim_both(gsub("(9\\. Bicycle)|,","",VehicleBodyTypeCd)))
# Use median age for blank age values
bikeDF <- bikeDF %>%
  mutate(BikerAge = ifelse(is.na(BikerAge), median(BikerAge, na.rm = TRUE), BikerAge)
         , DriverAge = ifelse(is.na(DriverAge), median(DriverAge, na.rm = TRUE), DriverAge)
         , DriverAge2 = ifelse(is.na(DriverAge2), median(DriverAge2, na.rm = TRUE), DriverAge2))
# factor variables
bikeDF$DayOfWeek <- factor(bikeDF$DayOfWeek
                           , levels = c("Sunday"
                                       , "Monday"
                                       , "Tuesday"
                                       , "Wednesday"
                                       , "Thursday"
                                       , "Friday"
                                       , "Saturday"))
bikeDF$WeatherCond <- factor(bikeDF$WeatherCond
                             , levels = c("1. No Adverse Condition (Clear/Cloudy)"
                                          , "3. Fog"
                                          , "4. Mist"
                                          , "5. Rain"
                                          , "6. Snow"
                                          , "9. Other"
                                          , "11. Severe Crosswinds"))
bikeDF$RoadwaySurfaceCond <- factor(bikeDF$RoadwaySurfaceCond
                                    , levels = c("1. Dry"
                                                 , "2. Wet"
                                                 , "3. Snowy"
                                                 , "7. Other"
                                                 , "10. Slush"
                                                 , "11. Sand, Dirt, Gravel"))
bikeDF$TimeFrame <- factor(bikeDF$TimeFrame
                           , levels = c("0AM TO 3AM"
                                        , "3AM TO 6AM"
                                        , "6AM TO 9AM"
                                        , "9AM TO 12PM"
                                        , "12PM TO 3PM"
                                        , "3PM TO 6PM"
                                        , "6PM TO 9PM"
                                        , "9PM TO 12AM"))
# format time as a zero-padded, 4-digit 24-hour string (e.g. 45 becomes "0045")
bikeDF$CrashTime2 <- sprintf("%04d", as.integer(bikeDF$CrashTime))
# combine crash date and time into a single date-time column
bikeDF$CrashDt2 <- as.POSIXct(strptime(paste(bikeDF$CrashDt, bikeDF$CrashTime2)
                                       , format="%m/%d/%Y %H%M"))
bikeDF$CrashMon <- month(bikeDF$CrashDt2)
bikeDF <- bikeDF %>% 
  separate(DriverActionTypeCd, c("DriverActionTypeCd_1", "DriverActionTypeCd_2")
           , ","
           , extra = "merge")
Too few values at 1 locations: 1170
# re-code gender fields -- either Male, Female, or Not Provided
bikeDF <- bikeDF %>%
  mutate(DriverGender2 = derivedFactor(
    "Female" = grepl("^Female", DriverGender, ignore.case=TRUE),
    "Male" = grepl("^Male", DriverGender, ignore.case=TRUE),
    .method = "first",
    .default = "Not Provided"
    ))
bikeDF <- bikeDF %>%
  mutate(BikerGender2 = derivedFactor(
    "Female" = grepl("^Female", BikerGender, ignore.case=TRUE),
    "Male" = grepl("^Male", BikerGender, ignore.case=TRUE),
    .method = "first",
    .default = "Not Provided"
    ))
# updated data w/ re-coded values and new columns
sample_n(bikeDF, 20)
summary(bikeDF$CrashDt2)
                 Min.               1st Qu.                Median                  Mean 
"2011-01-04 06:10:00" "2012-05-17 10:48:15" "2013-07-20 00:27:00" "2013-08-04 06:02:17" 
              3rd Qu.                  Max.                  NA's 
"2014-10-08 16:10:30" "2016-05-31 17:40:00"                   "4" 

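The CrashDt2 column depends on the crash time being read as a four-digit HHMM string. Because the raw value is numeric, times before 10:00 lose their leading zeros, and the padding step has to restore them. A small sketch of that step on made-up values (the `times` vector, the date, and the UTC time zone are assumptions for illustration):

```r
times <- c(0L, 45L, 610L, 1501L)

# pad to four digits so "%H%M" always sees two hour digits and two minute digits
padded <- sprintf("%04d", times)
padded

stamp <- as.POSIXct(strptime(paste("01/04/2011", padded)
                             , format = "%m/%d/%Y %H%M"
                             , tz = "UTC"))
format(stamp, "%Y-%m-%d %H:%M")
```

Without full padding, a value such as 45 would reach strptime() as fewer than four digits and could be parsed as the wrong time or as NA.
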
Table 1: Age Distribution Breakdown


| Age | Bicyclist | Driver |
|---|---|---|
| under 18 | 277 | 22 |
| …under 5 | 3 | 6 |
| …5 to 17 | 274 | 16 |
| 18 to 44 | 650 | 808 |
| …18 to 24 | 281 | 183 |
| …25 to 44 | 369 | 625 |
| 45 to 64 | 358 | 346 |
| 65 and over | 43 | 152 |
| 16 and over | 1128 | 1320 |
| 18 and over | 1051 | 1306 |
| 21 and over | 902 | 1244 |
| 62 and over | 65 | 200 |
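
The helper functions behind Table 1 are not shown in this post. As a rough sketch of how such brackets can be counted, base R's cut() assigns each age to a band. The `ages` vector below is made up, and the band edges simply follow the rows of Table 1:

```r
ages <- c(4, 12, 17, 18, 24, 25, 44, 45, 64, 65, 80)

# intervals are right-closed, e.g. (4, 17] holds ages 5 through 17
bands <- cut(ages
             , breaks = c(-Inf, 4, 17, 24, 44, 64, Inf)
             , labels = c("under 5", "5 to 17", "18 to 24"
                          , "25 to 44", "45 to 64", "65 and over"))
table(bands)

# open-ended rows such as "21 and over" are simple comparisons
sum(ages >= 21)
```

The overlapping rows in Table 1 (16 and over, 18 and over, and so on) cannot come from a single cut() call, which is why the sketch counts them separately.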

Create

Imagination is more important than knowledge. For knowledge is limited, whereas imagination embraces the entire world. [Albert Einstein, theoretical physicist]

Were drivers distracted?

ggplot(data = bikeDF, aes(x = CrashYear, fill = Distracted)) +
  geom_bar(position = "dodge") +
  labs(title = "Frequency Of Distracted Drivers", x = "Year") +
  guides(fill=guide_legend(title="Distracted?")) +
  facet_grid(Distracted ~ .)
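
Raw counts are hard to compare across years when the yearly totals differ, so the share of crashes flagged as distracted may be a useful companion to the chart. A sketch on a made-up data frame (`toy` is illustrative; with the real data the inputs would be bikeDF$CrashYear and bikeDF$Distracted):

```r
toy <- data.frame(CrashYear = c(2011, 2011, 2011, 2011, 2012, 2012)
                  , Distracted = c("Yes", "No", "No", "No", "Yes", "No")
                  , stringsAsFactors = FALSE)

# percent of crashes per year where Distracted is "Yes"
pct_distracted <- 100 * tapply(toy$Distracted == "Yes", toy$CrashYear, mean)
pct_distracted
```

In the toy data both years have one distracted crash, yet the shares differ (25% and 50%) because the yearly totals differ.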

Were drivers impaired?

ggplot(data = bikeDF, aes(x = CrashYear, fill = AlcoholRelated2)) +
  geom_bar(position = "dodge") +
  labs(title = "Frequency Of Alcohol Related Accidents", x = "Year") +
  guides(fill=guide_legend(title="Alcohol Related?")) +
  facet_grid(AlcoholRelated2 ~ .)

Do drivers flee the scene of the accident?

ggplot(data = bikeDF, aes(x = CrashYear, fill = HitAndRun2)) +
  geom_bar(position = "dodge") +
  labs(title = "Frequency Of Hit & Run Related Accidents", x = "Year") +
  guides(fill=guide_legend(title="Hit & Run?")) 

The DirectionOfTravelCd column appears to describe the bicyclist’s route, that is, the direction travelled before the accident.

dplyr::count(bikeDF, DirectionOfTravelCd) %>%
  arrange(-n) %>%
  mutate(DirectionOfTravelCd = factor(DirectionOfTravelCd, DirectionOfTravelCd)) %>%
  ggplot(mapping = aes(x = DirectionOfTravelCd, y = n)) +
  geom_bar(stat="identity") +
  coord_flip() +
  labs(title = "Accidents Based Upon Direction Of Travel", x = "Direction of Travel", y = "count")

When do most bicycle accidents occur?

ggplot(data = bikeDF, mapping = aes(x = CrashDt2, y = CrashTime)) + 
  geom_point(shape=1, alpha = 0.4) +  # hollow circles
  theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=0)) +
  geom_smooth() + # smoothing line w/ 95% confidence region
  facet_grid(~ DayOfWeek) +
  labs(title = "Accidents Breakdown By Day Of Week, Year, And Time"
       , x = "Year"
       , y = "Time Of Accident (24-hour)")
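
One caveat with plotting CrashTime directly: the value is an HHMM number, so the axis jumps from 1259 to 1300 and the values 60 to 99 within each hour never occur. Converting to decimal hours gives an evenly spaced scale. A minimal sketch (the helper name `to_decimal_hour` is hypothetical):

```r
# convert an HHMM number such as 945 into hours since midnight (9.75)
to_decimal_hour <- function(hhmm) {
  (hhmm %/% 100) + (hhmm %% 100) / 60
}

to_decimal_hour(c(0, 945, 1501, 2356))
```

The result could be mapped to the y aesthetic in place of CrashTime, for example aes(y = to_decimal_hour(CrashTime)).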

ggplot(data = bikeDF, mapping = aes(x = factor(CrashMon), y = CrashTime)) + 
  geom_boxplot() +
  facet_wrap(~ DayOfWeek, nrow = 3) +
  labs(title = "Accident Breakdown By Day Of Week, Month, And Time"
       , x = "Month"
       , y = "Time Of Accident (24-hour)")

Does weather play a role in bicycle accidents?

ggplot(data = bikeDF, mapping = aes(x = CrashYear)) + 
  geom_bar() + 
  facet_wrap(~ WeatherCond) +
  labs(title = "Frequency By Weather Condition", x = "Year")

Do road surface conditions affect the number of bicycle accidents?

ggplot(data = bikeDF, mapping = aes(x = CrashYear)) + 
  geom_bar() + 
  facet_wrap(~ RoadwaySurfaceCond) +
  labs(title = "Frequency By Road Surface Condition", x = "Year")

Are bicycle accidents more prevalent during the day, dusk, or evening?

ggplot(data = bikeDF, mapping = aes(x = CrashYear)) + 
  geom_bar() + 
  facet_wrap(~ LightCondition) +
  labs(title = "Frequency By Lighting Conditions", x = "Year")

Do intersections influence bicycle accidents?

ggplot(data = bikeDF, aes(x = CrashYear, fill = IntersectionAnalysis)) +
  geom_bar(position = "dodge") +
  labs(title = "Frequency Of Intersection Related Accidents", x = "Year") +
  guides(fill=guide_legend(title=""))

ggplot(data = bikeDF, mapping = aes(x = CrashYear)) + 
  geom_bar() + 
  facet_wrap(~ IntersectionType) +
  labs(title = "Frequency By Intersection Type", x = "Year")

ggplot(data = bikeDF, mapping = aes(x = CrashDt2, y = CrashTime)) + 
  geom_point(shape=2, alpha = 0.4) +
  theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=0)) +
  geom_smooth() + # smoothing line w/ 95% confidence region
  facet_grid(~ IntersectionType) +
  labs(title = "Accidents Breakdown By Intersection Type", x = "Year")

# tabulate Intersection Type and Intersection Analysis columns
knitr::kable(table(bikeDF$IntersectionType, bikeDF$IntersectionAnalysis))
| | Intersection | Not Intersection |
|---|---|---|
| 1. Not at Intersection | 323 | 92 |
| 2. Two Approaches | 53 | 24 |
| 3. Three Approaches | 185 | 82 |
| 4. Four Approaches | 428 | 134 |
| 5. Five-Point, or More | 5 | 1 |
| 6. Roundabout | 1 | 0 |
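
To read the cross-tabulation as shares rather than counts, the table can be rebuilt by hand and each row total divided by the grand total. The counts are copied from the table above; the object names are illustrative:

```r
# counts copied from the cross-tabulation above
tab <- matrix(c(323,  92
                ,  53,  24
                , 185,  82
                , 428, 134
                ,   5,   1
                ,   1,   0)
              , ncol = 2, byrow = TRUE
              , dimnames = list(c("1. Not at Intersection", "2. Two Approaches"
                                  , "3. Three Approaches", "4. Four Approaches"
                                  , "5. Five-Point, or More", "6. Roundabout")
                                , c("Intersection", "Not Intersection")))

by_type <- rowSums(tab)               # crashes per intersection type
share   <- 100 * by_type / sum(tab)   # percent of all crashes
round(share, 1)
```

By Intersection Type, 415 of the 1328 crashes (about 31%) are coded as not at an intersection, and four-approach intersections account for 562 (about 42%).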

What role does gender play in bicycle accidents?

ggplot(data = bikeDF, mapping = aes(x = CrashYear)) + 
  geom_bar() + 
  facet_wrap(~ BikerGender2) +
  labs(title = "Frequency By Biker Gender", x = "Year")

ggplot(data = bikeDF, mapping = aes(x = CrashYear)) + 
  geom_bar() + 
  facet_wrap(~ DriverGender2) +
  labs(title = "Frequency By Driver Gender", x = "Year")

During which time frame do most bicycle accidents occur?

ggplot(data = bikeDF, mapping = aes(x = CrashYear)) + 
  geom_bar() + 
  facet_wrap(~ TimeFrame) +
  labs(title = "Frequency By Accident Time Frame", x = "Year")

ggplot(data = bikeDF, mapping = aes(x = factor(CrashMon), y = CrashTime)) + 
  geom_boxplot() +
  facet_wrap(~ DayOfWeek, nrow = 3) +
  stat_summary(fun.y=mean, geom="point", shape=5, size=1) + # plot mean using diamond shape
  labs(title = "Accident Breakdown By Day Of Week And Month"
       , x = "Month"
       , y = "Time Of Accident (24-hour)")

Age analysis.

ggplot(data = bikeDF, mapping = aes(x = BikerAge)) + 
  geom_histogram(bins=70, color="white") + 
  facet_wrap(~ TimeFrame) +
  labs(title = "Frequency Of Biker's Age By Time Frame", x = "Age")

bikeDF %>%
  group_by(CrashMon, TimeFrame) %>%
  summarize(avg_biker_age = mean(BikerAge)
            , std_dev_biker = sd(BikerAge, na.rm = TRUE)
            , avg_driver_age = mean(DriverAge2)
            , std_driver_age = sd(DriverAge2, na.rm = TRUE)
            , num_accidents = n())

Now let’s show where the bicycle accidents occurred.

The map clusters bicycle accidents according to the zoom level. The number inside each circle is the total number of incidents in that area. Clicking a cluster zooms into that area and splits it into smaller clusters or individual incidents, depending on how far you have zoomed in. Small circles represent individual incident reports.

leaflet(bikeDF) %>%
  addTiles() %>%
  addCircleMarkers(clusterOptions = markerClusterOptions()
                   , popup=paste("<b>("
                                 , bikeDF$BikerAge
                                 , " year old "
                                 , bikeDF$BikerGender2
                                 , " hit by "
                                 , bikeDF$DriverAge2
                                 , " year old "
                                 , bikeDF$DriverGender2
                                 , ")</b> "
                                 , "<br/>"
                                 , bikeDF$CrashDt2
                                 , "<br/>"
                                 , bikeDF$DayOfWeek
                                 , " between "
                                 , bikeDF$TimeFrame
                                 , "<br/>"
                                 , bikeDF$CollisionType
                                 , " collision at "
                                 , bikeDF$IntersectionType
                                 , "<br/>"
                                 , "Driver "
                                 , paste(bikeDF$DriverActionTypeCd_1, bikeDF$DriverActionTypeCd_2)
                                 )) 
Assuming 'lng' and 'lat' are longitude and latitude, respectively
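
paste() puts a space between every argument, which makes the spacing inside the popup hard to control. As an alternative sketch, sprintf() builds the same kind of string from a template. The values below are made up, and `make_popup` is a hypothetical helper rather than part of the original code:

```r
# build the first line of a popup from a fixed template
make_popup <- function(biker_age, biker_gender, driver_age, driver_gender) {
  sprintf("<b>(%s year old %s hit by %s year old %s)</b>"
          , biker_age, biker_gender, driver_age, driver_gender)
}

make_popup(c(28, 16), c("Male", "Female"), c(41, 67), c("Female", "Male"))
```

Because sprintf() is vectorized, passing whole columns returns one popup string per row.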

Endnotes

Memory usage.

sort(sapply(ls()
            , function(x) {object.size(get(x))}
            )
     ) 
  getBikeAccidentCnt getDriverAccidentCnt               bikeDF            bikes.raw 
               20808                20808               579760              2549352 

Loaded packages.

sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] mosaic_0.14.4     Matrix_1.2-8      mosaicData_0.14.0 lattice_0.20-34   lubridate_1.6.0  
 [6] leaflet_1.1.0     dplyr_0.5.0       purrr_0.2.2       readr_1.0.0       tidyr_0.6.1      
[11] tibble_1.2        ggplot2_2.2.1     tidyverse_1.1.1  

loaded via a namespace (and not attached):
 [1] reshape2_1.4.2   splines_3.3.2    haven_1.0.0      colorspace_1.3-2 htmltools_0.3.5  yaml_2.1.14     
 [7] base64enc_0.1-3  foreign_0.8-67   DBI_0.5-1        modelr_0.1.0     readxl_0.1.1     plyr_1.8.4      
[13] stringr_1.2.0    munsell_0.4.3    gtable_0.2.0     rvest_0.3.2      htmlwidgets_0.8  psych_1.6.12    
[19] evaluate_0.10    labeling_0.3     knitr_1.15.1     forcats_0.2.0    httpuv_1.3.3     crosstalk_1.0.0 
[25] curl_2.3         parallel_3.3.2   highr_0.6        broom_0.4.2      Rcpp_0.12.9      xtable_1.8-2    
[31] scales_0.4.1     backports_1.0.5  jsonlite_1.2     mime_0.5         gridExtra_2.2.1  mnormt_1.5-5    
[37] hms_0.3          digest_0.6.12    stringi_1.1.2    shiny_1.0.0      grid_3.3.2       rprojroot_1.2   
[43] tools_3.3.2      magrittr_1.5     lazyeval_0.2.0   ggdendro_0.1-20  MASS_7.3-45      rsconnect_0.7   
[49] xml2_1.1.1       assertthat_0.1   rmarkdown_1.3    httr_1.2.1       R6_2.2.0         nlme_3.1-131    

Report generated: 2017-03-01 22:48:13
