The Norfolk Data Science Meetup group provided the dataset of bicycle accidents involving motor vehicles.
They previewed it during one of their monthly meetings.
As a cycling enthusiast, I thought it would be interesting to explore this data file. What time do most accidents occur? Is the number of accidents declining? Was the driver distracted? Do most accidents happen at intersections?
Fortunately, I have never been involved in an accident with a motor vehicle. I have, however, been on the receiving end of impatient and disrespectful drivers. I try my best to avoid motor vehicles as much as possible, but this can be difficult when you live in Virginia Beach, Virginia's most populous city.
It’s time to put our explorer hat on. Let’s load the data and write some code. Time to learn about bicycle accidents in Hampton Roads and the Eastern Shore of Virginia.
Creativity requires input, and that’s what research is. You’re gathering material with which to build. [Gene Luen Yang]
Two datasets are available on the website. The Summary_Trend_data.csv file appears to be a subset of Summary_O_data.csv. Neither file comes with a description or a codebook describing the columns.
Let’s explore the latter file.
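# packages: tidyverse (readr, dplyr, tidyr, ggplot2), lubridate (month), mosaic (derivedFactor)
library(tidyverse)
library(lubridate)
library(mosaic)
# read dataset
bikes.raw <- read_csv("https://raw.githubusercontent.com/NorfolkDataSci/carCrashesWithBikes/master/Summary_O_data.csv")
set.seed(206)  # makes the sample_n() calls reproducible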
# number of records
nrow(bikes.raw)
[1] 1328
Concentrate all your thoughts upon the work at hand. The sun’s rays do not burn until brought to a focus. [Alexander Graham Bell, scientist and inventor]
# columns
names(bikes.raw)
[1] "Access Control" "Alcohol Notalcohol"
[3] "Area Type1" "A Crash"
[5] "A People" "Basetypedesc"
[7] "Begin Node Dsc" "Belted Unbelted"
[9] "Bikeage" "Bikegen"
[11] "Bikeinjurytype" "Bikevehiclenumber"
[13] "Bike Nonbike" "BMP"
[15] "B Crash" "B People"
[17] "Carspeedlimit" "Collision Type"
[19] "Comm Cargo Body Type Cd" "Comm Vehicle Body Type Cd"
[21] "Cotedrouteid" "Coted Mp"
[23] "Count App" "CRASH_DT (copy)"
[25] "Crash Dt" "Crash Event Type Dsc"
[27] "Crash Military Tm" "Crash Severity"
[29] "Crash Year" "Curbgutterdesc"
[31] "C Crash" "C People"
[33] "Juris Name Used" "Area Type Used"
[35] "First Harmful Event of Entire Crash" "MAINLINE"
[37] "Time Slicing Used" "Phy_Juris_Nm"
[39] "Offset-Ft" "Intersection Analysis"
[41] "Clear" "Calculation_9640323132123359"
[43] "Day Of Week" "Deer Nodeer"
[45] "Direction Of Travel Cd" "Distracted Notdistracted"
[47] "VSP" "TOTAL CRASH"
[49] "Document Nbr" "Driverage"
[51] "Drivergen" "Driverinjurytype"
[53] "Drivervehiclenumber" "Driver Action Type Cd"
[55] "Driver Airbag Deployment" "Driver Alcohol Test Type Cd"
[57] "Driver Condition Type Cd" "Driver Distraction Type Cd"
[59] "Driver Drinking Type Cd" "Driver Drug Use Cd"
[61] "Driver Ejected From Vehicle" "Driver Ems Transport Ind"
[63] "Driver Fled Scene Ind" "Driver Safety Equip Used"
[65] "Driver Vis Obscured Type Cd" "Drowsy Notdrowsy"
[67] "Drug Nodrug" "EMP"
[69] "End Node" "End Node Dsc"
[71] "End Offset" "Facility"
[73] "FAC" "First Crash Event Cd"
[75] "First Harmful Event" "Fourth Crash Event Cd"
[77] "Functionalclass" "FUN"
[79] "Govcondesc" "Gr Nogr"
[81] "Hitrun Not Hitrun" "Initial Veh Impact Area Cd"
[83] "Injury Crashes" "Intersection Type"
[85] "Int Doc" "Jurtype"
[87] "K_CRASH" "K People"
[89] "LAT (copy)" "LAT"
[91] "Leftshoulderwidth" "Length"
[93] "Lgtruck Nonlgtruck" "Light Condition"
[95] "Located Unlocated" "LON (copy)"
[97] "LON" "MAINLINE (group)"
[99] "Mainline Yn" "Medianleftshoulderwidth"
[101] "Medianrightshoulderwidth" "Mediantypedesc"
[103] "Medianwidthmax" "Medianwidthmin"
[105] "Median Type" "Most Harmful Crash Event Cd"
[107] "Motor Nonmotor" "Node"
[109] "Node Info" "Node Totaadt2011"
[111] "Node Totaadt2012" "Node Totaadt2013"
[113] "Node Totaadt2014" "Node Totaadt2015"
[115] "Numberoflane" "Number of Records"
[117] "Offset" "Ownership"
[119] "Passage" "Passgen"
[121] "Passinjurytype" "Passvehiclenumber"
[123] "Pass Airbag Deployment" "Pass Ejected From Vehicle"
[125] "Pass Ems Transport Ind" "Pass Safety Equip Used"
[127] "Pavementconditionvalue" "Pavementroughnessvalue"
[129] "Pavementwidth" "Pdo Crash"
[131] "Total People Killed & Injured" "Pedage"
[133] "Pedestrians Injured" "Pedestrians Killed"
[135] "Pedgen" "Pedinjurytype"
[137] "Pednumber" "Ped Action"
[139] "Ped Al Test" "Ped Cond"
[141] "Ped Drink" "Ped Drug"
[143] "Ped Nonped" "Ped Rflct"
[145] "Persons Injured" "Persons Killed"
[147] "Physical Juris Nm" "PJR"
[149] "VSP_Used" "Rd Type"
[151] "Relation To Roadway" "Rightshoulderwidth"
[153] "RNS_MP (0.25 mi bin)" "Rns Mp"
[155] "Roadway Alignment" "Roadway Defect"
[157] "Roadway Description" "Roadway Surface Cond"
[159] "Roadway Surface Type" "Route Or Street Nm"
[161] "Rte Category Cd" "Rte Cat"
[163] "Rte Nm" "Ruralurbandesc"
[165] "School Zone" "Second Crash Event Cd"
[167] "Segtotaadt2011" "Segtotaadt2012"
[169] "Segtotaadt2013" "Segtotaadt2014"
[171] "Segtotaadt2015" "Senior Notsenior"
[173] "Sidewalkdesc" "Speed Before"
[175] "Speed Max Safe" "Speed Notspeed"
[177] "Speed Posted" "Start Node"
[179] "Start Offset" "Summons Issued Cd"
[181] "Surfacedesc" "Third Crash Event Cd"
[183] "Time Slicing" "Total Crashes including Property Damage Only"
[185] "Total Crash" "Traffic Control Type"
[187] "Trfc Ctrl Status Type" "Truckcommr"
[189] "Vehiclenumber" "Vehicle Body Type Cd"
[191] "Vehicle Make Nm" "Vehicle Maneuver Type Cd"
[193] "Vehicle Model Nm" "Vehicle Year Nbr"
[195] "District" "District_Used"
[197] "Weather Condition" "Work Zone Location"
[199] "Work Zone Related" "Work Zone Type"
[201] "Young Notyoung"
glimpse(bikes.raw)
Observations: 1,328
Variables: 201
$ Access Control <chr> "No Access Control", "No Access Control", "No Ac...
$ Alcohol Notalcohol <chr> "Not ALCIHOL", "Not ALCIHOL", "Not ALCIHOL", "No...
$ Area Type1 <chr> "Urban", "Urban", "Urban", "Urban", "Urban", "Ur...
$ A Crash <int> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, ...
$ A People <int> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, ...
$ Basetypedesc <chr> "Bituminous Concrete (Black Base)", "Bituminous ...
$ Begin Node Dsc <chr> "US-00058(B)/", "FAIRFIELD BLVD(L)/", "REPUBLIC ...
$ Belted Unbelted <chr> "Not UNBELTED", "Not UNBELTED", "Not UNBELTED", ...
$ Bikeage <int> 49, 25, 17, 14, 44, 20, NA, 20, 19, 61, 17, 21, ...
$ Bikegen <chr> "Male", "Male", "Male", "Male", "Not Provided", ...
$ Bikeinjurytype <chr> "B", "B", "C", "C", "C", "C", "B", "C", "A", "C"...
$ Bikevehiclenumber <int> 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 2, 2, 2, ...
$ Bike Nonbike <chr> "BIKE", "BIKE", "BIKE", "BIKE", "BIKE", "BIKE", ...
$ BMP <dbl> 13.43, 11.52, 0.76, NA, 25.65, NA, NA, 2.15, 2.7...
$ B Crash <int> 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, ...
$ B People <int> 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, ...
$ Carspeedlimit <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Collision Type <chr> "2. Angle", "2. Angle", "2. Angle", "2. Angle", ...
$ Comm Cargo Body Type Cd <chr> "Not ProvidedNot Provided", "Not Provided,Not Pr...
$ Comm Vehicle Body Type Cd <chr> "Not ProvidedNot Provided", "Not Provided,Not Pr...
$ Cotedrouteid <chr> "SR00190", "SR00190", "13400009", NA, "US00017",...
$ Coted Mp <dbl> 13.96, 11.53, 0.76, NA, 25.66, NA, NA, 2.15, 2.7...
$ Count App <int> 3, 4, 4, NA, 3, 4, NA, 4, 3, NA, 3, 3, 3, 4, 4, ...
$ CRASH_DT (copy) <chr> "1/4/2011", "1/6/2011", "1/6/2011", "1/8/2011", ...
$ Crash Dt <chr> "1/4/2011", "1/6/2011", "1/6/2011", "1/8/2011", ...
$ Crash Event Type Dsc <chr> "20. Motor Vehicle In Transport", "22. Bicycle",...
$ Crash Military Tm <int> 610, 1715, 1930, 1624, 1330, 1923, 1145, 1005, 1...
$ Crash Severity <chr> "B.Visiible Injury", "B.Visiible Injury", "C.Non...
$ Crash Year <int> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, ...
$ Curbgutterdesc <chr> "Left and Right sides", "Left and Right sides", ...
$ C Crash <int> 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, ...
$ C People <int> 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, ...
$ Juris Name Used <chr> "Virginia Beach", "Virginia Beach", "Virginia Be...
$ Area Type Used <chr> "Urban", "Urban", "Urban", "Urban", "Urban", "Ur...
$ First Harmful Event of Entire Crash <chr> "20. Motor Vehicle In Transport", "22. Bicycle",...
$ MAINLINE <chr> "MAIN", "MAIN", "MAIN", "MAIN", "MAIN", "MAIN", ...
$ Time Slicing Used <chr> "6AM TO 9AM", "3PM TO 6PM", "6PM TO 9PM", "3PM T...
$ Phy_Juris_Nm <chr> "134.Virginia Beach", "134.Virginia Beach", "134...
$ Offset-Ft <dbl> 1716, 2101, 898, NA, 655, 0, NA, 0, 48, NA, 697,...
$ Intersection Analysis <chr> "Intersection", "Intersection", "Intersection", ...
$ Clear <chr> "Reset Filters", "Reset Filters", "Reset Filters...
$ Calculation_9640323132123359 <chr> "VDOT_OTHER", "VDOT_OTHER", "VDOT_OTHER", "VDOT_...
$ Day Of Week <chr> "Tuesday", "Thursday", "Thursday", "Saturday", "...
$ Deer Nodeer <chr> "Not DEER", "Not DEER", "Not DEER", "Not DEER", ...
$ Direction Of Travel Cd <chr> "SouthEast", "North,South", "North,East", "South...
$ Distracted Notdistracted <chr> NA, NA, "DISTRACTED", NA, "DISTRACTED", NA, NA, ...
$ VSP <int> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, ...
$ TOTAL CRASH <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
$ Document Nbr <int> 111050057, 111050063, 111050064, 111640067, 1205...
$ Driverage <int> 62, NA, 73, 40, NA, 30, 55, 26, 20, 47, NA, 31, ...
$ Drivergen <chr> "Male", "Male", "Male", "Male", "Not Provided", ...
$ Driverinjurytype <chr> "B,PDO", "B,PDO", "PDO,C", "PDO,C", "PDO,C", "PD...
$ Drivervehiclenumber <dbl> 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, ...
$ Driver Action Type Cd <chr> "28. Driving Without Lights,1. No Improper Actio...
$ Driver Airbag Deployment <chr> "3. Unavailable/Not Applicable,1. Deployed - Fro...
$ Driver Alcohol Test Type Cd <chr> "Not Applicable,Not Applicable", "Not Applicable...
$ Driver Condition Type Cd <chr> "1. No Defects,1. No Defects", "1. No Defects,9....
$ Driver Distraction Type Cd <chr> "Not Applicable,Not Applicable", "Not Applicable...
$ Driver Drinking Type Cd <chr> "1. Had Not Been Drinking,1. Had Not Been Drinki...
$ Driver Drug Use Cd <chr> "Not Applicable,Not Applicable", "2. No,3. Unkno...
$ Driver Ejected From Vehicle <chr> "2. Partially Ejected,1. Not Ejected", "1. Not E...
$ Driver Ems Transport Ind <chr> "No,NotProvided", "No,NotProvided", "NotProvided...
$ Driver Fled Scene Ind <chr> "No,No", "No,Yes", "No,No", "No,No", "Yes,No", "...
$ Driver Safety Equip Used <chr> "8. No Restraint Used,3. Lap and Shoulder Belt",...
$ Driver Vis Obscured Type Cd <chr> "1. Not Obscured,13. Other", "1. Not Obscured,No...
$ Drowsy Notdrowsy <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Drug Nodrug <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ EMP <int> 14, 12, 1, NA, 26, NA, NA, 2, 3, NA, 3, 0, 1, 1,...
$ End Node <int> 541186, 541148, 541257, NA, 483086, NA, NA, 5411...
$ End Node Dsc <chr> "LAVENDER LANE(R)/", "SR-00165(B)/", "134-08714(...
$ End Offset <dbl> 0.00, 0.00, 0.00, NA, 0.00, NA, NA, 0.00, 0.00, ...
$ Facility <int> 1, 1, 1, NA, 0, 1, NA, 1, 1, NA, 0, 0, 0, 1, 2, ...
$ FAC <chr> "1.Divided, no control of access", "1.Divided, n...
$ First Crash Event Cd <chr> "20,22", "20,22", "20,20", "22,20", "22,22", "22...
$ First Harmful Event <chr> "1. On Roadway", "1. On Roadway", "1. On Roadway...
$ Fourth Crash Event Cd <chr> "0,0", "0,0", "0,0", "0,0", "0,0", "0,0", "0,0",...
$ Functionalclass <chr> "H", "H", "H", NA, "E", "H", NA, "E", "H", NA, "...
$ FUN <chr> "H.Urban Minor Arterial", "H.Urban Minor Arteria...
$ Govcondesc <chr> "Urban Extensions - Primary Routes", "Urban Exte...
$ Gr Nogr <chr> "Not GUARDRAIL", "Not GUARDRAIL", "Not GUARDRAIL...
$ Hitrun Not Hitrun <chr> "Not HIT_RUN", "HIT_RUN", "Not HIT_RUN", "Not HI...
$ Initial Veh Impact Area Cd <chr> "2,12", "3,1", "12,3", "12,9", "1,6", "12,4", "1...
$ Injury Crashes <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, ...
$ Intersection Type <chr> "2. Two Approaches", "4. Four Approaches", "4. F...
$ Int Doc <int> 111050057, 111050063, 111050064, NA, 120585198, ...
$ Jurtype <chr> "C", "C", "C", "C", "C", "C", "C", "C", "C", "C"...
$ K_CRASH <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, ...
$ K People <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, ...
$ LAT (copy) <dbl> 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, ...
$ LAT <dbl> 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, ...
$ Leftshoulderwidth <int> 0, 0, 0, NA, 0, NA, NA, 0, 0, NA, 0, 0, 0, 0, 10...
$ Length <dbl> 0.64, 0.59, 0.19, NA, 0.13, NA, NA, 0.23, 0.03, ...
$ Lgtruck Nonlgtruck <chr> "Not LRTRUCK", "Not LRTRUCK", "Not LRTRUCK", "No...
$ Light Condition <chr> "4. Darkness - Road Lighted", "3. Dusk", "4. Dar...
$ Located Unlocated <chr> "LOCATED", "LOCATED", "LOCATED", "UNLOCATED", "L...
$ LON (copy) <dbl> -76, -76, -76, -76, -76, -76, -76, -76, -76, -76...
$ LON <dbl> -76, -76, -76, -76, -76, -76, -76, -76, -76, -76...
$ MAINLINE (group) <chr> "MAINLINE", "MAINLINE", "MAINLINE", "MAINLINE", ...
$ Mainline Yn <chr> "MAIN", "MAIN", "MAIN", NA, "MAIN", NA, NA, "MAI...
$ Medianleftshoulderwidth <int> 0, 0, 0, NA, 0, NA, NA, 6, 0, NA, 0, 0, 0, 0, 0,...
$ Medianrightshoulderwidth <int> 0, 0, 0, NA, 0, NA, NA, 6, 0, NA, 0, 0, 0, 0, 0,...
$ Mediantypedesc <chr> "Curbed Grass", "Curbed Grass", "Curbed Grass", ...
$ Medianwidthmax <int> 16, 16, 16, NA, 0, NA, NA, 16, 14, NA, 0, 0, 0, ...
$ Medianwidthmin <int> 8, 4, 4, NA, 0, NA, NA, 4, 3, NA, 0, 0, 0, 16, 1...
$ Median Type <chr> "Divided Roadway", "Divided Roadway", "Divided R...
$ Most Harmful Crash Event Cd <chr> "20,22", "20,22", "20,20", "22,20", "22,22", "22...
$ Motor Nonmotor <chr> "Not MOTORCYCLE", "Not MOTORCYCLE", "Not MOTORCY...
$ Node <int> 541188, 541174, 541066, NA, 483086, 541275, NA, ...
$ Node Info <chr> "541188. SR00190 SR00190 13408749", ...
$ Node Totaadt2011 <int> 20258, 48566, 63023, NA, 27156, 57323, NA, 57661...
$ Node Totaadt2012 <int> 18296, 44998, 61504, NA, 28053, 59152, NA, 55396...
$ Node Totaadt2013 <int> 18464, 45413, 62556, NA, 26395, 59698, NA, 55735...
$ Node Totaadt2014 <int> 17795, 43765, 60804, NA, 24772, 57532, NA, 54348...
$ Node Totaadt2015 <int> 17531, 44158, 62575, NA, 25667, 59080, NA, 54198...
$ Numberoflane <int> 4, 4, 4, NA, 4, NA, NA, 6, 4, NA, 4, 2, 2, 4, 4,...
$ Number of Records <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
$ Offset <dbl> 0.325, 0.398, 0.170, NA, 0.124, 0.000, NA, 0.000...
$ Ownership <chr> "PRI_URBAN", "PRI_URBAN", "SEC_URBAN", "SEC_URBA...
$ Passage <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Passgen <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Passinjurytype <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Passvehiclenumber <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Pass Airbag Deployment <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Pass Ejected From Vehicle <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Pass Ems Transport Ind <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Pass Safety Equip Used <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Pavementconditionvalue <dbl> 3, 0, 0, NA, 0, NA, NA, 0, 0, NA, 0, 0, 0, 0, 0,...
$ Pavementroughnessvalue <int> 0, 265, 0, NA, 188, NA, NA, 115, 0, NA, 0, 0, 0,...
$ Pavementwidth <int> 54, 54, 52, NA, 56, NA, NA, 76, 52, NA, 52, 0, 3...
$ Pdo Crash <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
$ Total People Killed & Injured <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
$ Pedage <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Pedestrians Injured <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
$ Pedestrians Killed <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
$ Pedgen <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Pedinjurytype <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Pednumber <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Ped Action <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Ped Al Test <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Ped Cond <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Ped Drink <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Ped Drug <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Ped Nonped <chr> "Not PED", "Not PED", "Not PED", "Not PED", "Not...
$ Ped Rflct <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ Persons Injured <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, ...
$ Persons Killed <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, ...
$ Physical Juris Nm <chr> "Virginia Beach", "Virginia Beach", "Virginia Be...
$ PJR <int> 134, 134, 134, 121, 124, 134, 131, 134, 134, 122...
$ VSP_Used <int> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, ...
$ Rd Type <chr> "NOT-RD", "NOT-RD", "NOT-RD", "NOT-RD", "NOT-RD"...
$ Relation To Roadway <chr> "15. Other Crossing (Crossing for Bikes, School,...
$ Rightshoulderwidth <int> 0, 0, 0, NA, 0, NA, NA, 0, 0, NA, 0, 0, 0, 0, 10...
$ RNS_MP (0.25 mi bin) <dbl> 13.75, 11.50, 0.75, NA, 25.50, 6.50, NA, 2.00, 2...
$ Rns Mp <dbl> 13.95, 11.53, 0.75, NA, 25.66, 6.68, NA, 2.14, 2...
$ Roadway Alignment <chr> "1. Straight - Level", "1. Straight - Level", "2...
$ Roadway Defect <chr> "1. No Defects", "1. No Defects", "1. No Defects...
$ Roadway Description <chr> "1. Two-Way, Not Divided", "2. Two-Way, Divided,...
$ Roadway Surface Cond <chr> "1. Dry", "1. Dry", "1. Dry", "1. Dry", "2. Wet"...
$ Roadway Surface Type <chr> "2. Blacktop, Asphalt, Bituminous", "2. Blacktop...
$ Route Or Street Nm <chr> "527 n witchduck rd", "kempsvillr rd", "799 firs...
$ Rte Category Cd <chr> "STPRI", "STPRI", "URB", "UNKWN", "USPRI", "URB"...
$ Rte Cat <chr> "SR", "SR", "UR", "SC", "US", "UR", "SC", "SR", ...
$ Rte Nm <chr> "R-VA SR00190WB", "R-VA SR00190WB", "R-VA134...
$ Ruralurbandesc <chr> "Urbanized (Population 200,000 and over)", "Urba...
$ School Zone <chr> "3. No", "3. No", "3. No", "3. No", "3. No", "3....
$ Second Crash Event Cd <chr> "0,0", "0,0", "0,0", "0,0", "0,0", "0,0", "0,0",...
$ Segtotaadt2011 <int> 18442, 31529, 36525, NA, 24625, NA, NA, 50491, 1...
$ Segtotaadt2012 <int> 16730, 29632, 34557, NA, 25523, NA, NA, 47214, 1...
$ Segtotaadt2013 <int> 16884, 29906, 34876, NA, 26422, NA, NA, 47482, 1...
$ Segtotaadt2014 <int> 16272, 28820, 33610, NA, 22408, NA, NA, 46377, 1...
$ Segtotaadt2015 <int> 15297, 27832, 35142, NA, 23257, NA, NA, 46649, 1...
$ Senior Notsenior <chr> "Not SENIOR", "Not SENIOR", "SENIOR", "Not SENIO...
$ Sidewalkdesc <chr> "Left and Right sides", "Left and Right sides", ...
$ Speed Before <chr> "5,5", "2,0", "30,3", "5,5", "0,5", "30,0", "5,5...
$ Speed Max Safe <chr> "5,5", "35,0", "35,35", "35,0", "35,35", "45,0",...
$ Speed Notspeed <chr> "Not SPEED", "Not SPEED", "Not SPEED", "SPEED", ...
$ Speed Posted <chr> "35,25", "35,0", "35,35", "35,0", "35,35", "45,0...
$ Start Node <int> 541088, 541176, 734938, NA, 483087, NA, NA, 5412...
$ Start Offset <dbl> 0.00, 0.00, 0.00, NA, 0.14, NA, NA, 0.00, 0.00, ...
$ Summons Issued Cd <chr> "2. No,2. No", "2. No,Not Provided", "Not Provid...
$ Surfacedesc <chr> "0", "0", "0", NA, "0", NA, NA, "0", "0", NA, "0...
$ Third Crash Event Cd <chr> "0,0", "0,0", "0,0", "0,0", "0,0", "0,0", "0,0",...
$ Time Slicing <chr> "6AM TO 9AM", "3PM TO 6PM", "6PM TO 9PM", "3PM T...
$ Total Crashes including Property Damage Only <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
$ Total Crash <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
$ Traffic Control Type <chr> "4. Stop Sign", "3. Traffic Signal", "3. Traffic...
$ Trfc Ctrl Status Type <chr> "6. No Traffic Control Device Present", "1. Yes ...
$ Truckcommr <chr> "Not a Parkway - Trucks and Commercial Vehicles ...
$ Vehiclenumber <dbl> 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, ...
$ Vehicle Body Type Cd <chr> "9. Bicycle,1. Passenger car", "9. Bicycle,1. Pa...
$ Vehicle Make Nm <chr> "huffy,ford", "giant", "dodge,fuju", "DODGE", NA...
$ Vehicle Maneuver Type Cd <chr> "1. Going Straight Ahead,3. Making Left Turn", "...
$ Vehicle Model Nm <chr> "bicycle,focus", "defy 3", "journey,bicycle", "R...
$ Vehicle Year Nbr <dbl> 2009, 2009, 20092007, 1998, 0, 2007, 20072000, 2...
$ District <int> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, ...
$ District_Used <chr> "5.Hampton Roads", "5.Hampton Roads", "5.Hampton...
$ Weather Condition <chr> "1. No Adverse Condition (Clear/Cloudy)", "1. No...
$ Work Zone Location <chr> "Not Provided", "Not Provided", "Not Provided", ...
$ Work Zone Related <chr> "2. No", "2. No", "2. No", "2. No", "2. No", "2....
$ Work Zone Type <chr> "Not Provided", "Not Provided", "Not Provided", ...
$ Young Notyoung <chr> "Not YOUNG", "Not YOUNG", "YOUNG", "Not YOUNG", ...
sample_n(bikes.raw, 20)
After reviewing the list of columns, I’ve decided to concentrate on the following:
| Column | New Column | Notes |
|---|---|---|
| [2] “Alcohol Notalcohol” | AlcoholRelated | factor variable of Yes/No, discrete-nominal |
| [9] “Bikeage” | BikerAge | continuous |
| [10] “Bikegen” | BikerGender | discrete-nominal |
| [18] “Collision Type” | CollisionType | consider split into cd and description, discrete-nominal |
| [25] “Crash Dt” | CrashDt | continuous |
| [26] “Crash Event Type Dsc” | CrashEventTypeDesc | consider split into cd and description, discrete-nominal |
| [27] “Crash Military Tm” | CrashTime | continuous |
| [28] “Crash Severity” | crashSeverity | consider split into cd and description, discrete-ordinal |
| [29] “Crash Year” | CrashYear | continuous |
| [37] “Time Slicing Used” | TimeFrame | discrete-ordinal |
| [40] “Intersection Analysis” | IntersectionAnalysis | discrete-nominal |
| [43] “Day Of Week” | DayOfWeek | discrete-ordinal |
| [45] “Direction Of Travel Cd” | DirectionOfTravelCd | discrete-nominal |
| [46] “Distracted Notdistracted” | Distracted | discrete-nominal |
| [49] “Document Nbr” | DocNbr | unique identifier, discrete-nominal, identification variable |
| [50] “Driverage” | DriverAge | continuous |
| [51] “Drivergen” | DriverGender | discrete-nominal |
| [54] “Driver Action Type Cd” | DriverActionTypeCd | discrete-nominal |
| [56] “Driver Alcohol Test Type Cd” | DriverAlcoholTestTypeCd | discrete-nominal |
| [58] “Driver Distraction Type Cd” | DriverDistractionTypeCd | discrete-nominal |
| [59] “Driver Drinking Type Cd” | DriverDrinkingTypeCd | discrete-nominal |
| [60] “Driver Drug Use Cd” | DriverDrugUseCd | discrete-nominal |
| [61] “Driver Ejected From Vehicle” | DriverEjectedFromVehicle | discrete-nominal |
| [65] “Driver Vis Obscured Type Cd” | DriverVisObscuredTypeCd | discrete-nominal |
| [81] “Hitrun Not Hitrun” | HitAndRun | Yes/No, discrete-nominal |
| [84] “Intersection Type” | IntersectionType | discrete-nominal |
| [90] “LAT” | lat | continuous |
| [94] “Light Condition” | LightCondition | discrete-nominal |
| [97] “LON” | lng | continuous |
| [145] “Persons Injured” | NumInjured | discrete (count) |
| [146] “Persons Killed” | NumKilled | discrete (count) |
| [158] “Roadway Surface Cond” | RoadwaySurfaceCond | discrete-nominal |
| [190] “Vehicle Body Type Cd” | VehicleBodyTypeCd | discrete-nominal |
| [192] “Vehicle Maneuver Type Cd” | VehicleManeuverTypeCd | discrete-nominal |
| [197] “Weather Condition” | WeatherCond | discrete-ordinal |
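Why these columns? Judging from the glimpse() output, many of the excluded columns duplicate another column (for example, “CRASH_DT (copy)” and “Crash Dt”), are mostly NA (the pedestrian and passenger fields), or hold the same value in every row shown (“VSP”, “TOTAL CRASH”).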
bikeDF <- bikes.raw[, c(2,9,10,18,25:29,37,40,43,45,46,49:51,54,56
,58:61,65,81,84,90,94,97,145,146,158,190,192,197)]
# rename columns
colnames(bikeDF) <- c("AlcoholRelated", "BikerAge", "BikerGender"
, "CollisionType", "CrashDt", "CrashEventTypeDesc", "CrashTime"
, "crashSeverity", "CrashYear"
, "TimeFrame", "IntersectionAnalysis", "DayOfWeek", "DirectionOfTravelCd"
, "Distracted", "DocNbr", "DriverAge"
, "DriverGender", "DriverActionTypeCd", "DriverAlcoholTestTypeCd"
, "DriverDistractionTypeCd", "DriverDrinkingTypeCd"
, "DriverDrugUseCd", "DriverEjectedFromVehicle"
, "DriverVisObscuredTypeCd", "HitAndRun", "IntersectionType"
, "lat", "LightCondition"
, "lng", "NumInjured", "NumKilled", "RoadwaySurfaceCond"
, "VehicleBodyTypeCd", "VehicleManeuverTypeCd"
, "WeatherCond")
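Selecting columns by position works, but it will quietly pick up the wrong fields if the CSV's column order ever changes. A minimal base R sketch of a name-based alternative, assuming a lookup vector of new = original names (col_map and select_and_rename are hypothetical names, and only three of the 35 columns are listed):

```r
# Hypothetical lookup vector: new name = original column name.
# Only three of the 35 columns are listed here for brevity.
col_map <- c(AlcoholRelated = "Alcohol Notalcohol",
             BikerAge       = "Bikeage",
             BikerGender    = "Bikegen")

select_and_rename <- function(df, map) {
  # fail loudly instead of silently selecting the wrong column
  missing_cols <- setdiff(unname(map), names(df))
  if (length(missing_cols) > 0) {
    stop("Columns not found: ", paste(missing_cols, collapse = ", "))
  }
  out <- df[, unname(map), drop = FALSE]  # select by name, in map order
  names(out) <- names(map)                # apply the new names
  out
}

# Toy data frame standing in for bikes.raw
toy <- data.frame(check.names = FALSE,
                  "Bikeage" = c(49L, 25L),
                  "Alcohol Notalcohol" = c("Not ALCIHOL", "Not ALCIHOL"),
                  "Bikegen" = c("Male", "Male"))
select_and_rename(toy, col_map)
```

With all 35 pairs filled in, select_and_rename(bikes.raw, col_map) would replace both the index vector and the colnames() assignment, and it stops with an error when a name is missing rather than returning the wrong column.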
summary(bikeDF)
AlcoholRelated BikerAge BikerGender CollisionType CrashDt
Length:1328 Min. : 4 Length:1328 Length:1328 Length:1328
Class :character 1st Qu.:19 Class :character Class :character Class :character
Mode :character Median :28 Mode :character Mode :character Mode :character
Mean :33
3rd Qu.:49
Max. :85
NA's :26
CrashEventTypeDesc CrashTime crashSeverity CrashYear TimeFrame
Length:1328 Min. : 0 Length:1328 Min. :2011 Length:1328
Class :character 1st Qu.:1110 Class :character 1st Qu.:2012 Class :character
Mode :character Median :1501 Mode :character Median :2013 Mode :character
Mean :1421 Mean :2013
3rd Qu.:1753 3rd Qu.:2014
Max. :2356 Max. :2016
IntersectionAnalysis DayOfWeek DirectionOfTravelCd Distracted DocNbr
Length:1328 Length:1328 Length:1328 Length:1328 Min. :110810052
Class :character Class :character Class :character Class :character 1st Qu.:121615168
Mode :character Mode :character Mode :character Mode :character Median :132482536
Mean :133483626
3rd Qu.:142960095
Max. :161960036
DriverAge DriverGender DriverActionTypeCd DriverAlcoholTestTypeCd DriverDistractionTypeCd
Min. : 1 Length:1328 Length:1328 Length:1328 Length:1328
1st Qu.: 27 Class :character Class :character Class :character Class :character
Median : 41 Mode :character Mode :character Mode :character Mode :character
Mean : 58970
3rd Qu.: 57
Max. :65323423
NA's :172
DriverDrinkingTypeCd DriverDrugUseCd DriverEjectedFromVehicle DriverVisObscuredTypeCd
Length:1328 Length:1328 Length:1328 Length:1328
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
HitAndRun IntersectionType lat LightCondition lng NumInjured
Length:1328 Length:1328 Min. :37 Length:1328 Min. :-78 Min. :0.0
Class :character Class :character 1st Qu.:37 Class :character 1st Qu.:-76 1st Qu.:1.0
Mode :character Mode :character Median :37 Mode :character Median :-76 Median :1.0
Mean :37 Mean :-76 Mean :1.1
3rd Qu.:37 3rd Qu.:-76 3rd Qu.:1.0
Max. :38 Max. :-75 Max. :4.0
NumKilled RoadwaySurfaceCond VehicleBodyTypeCd VehicleManeuverTypeCd WeatherCond
Min. :0.00 Length:1328 Length:1328 Length:1328 Length:1328
1st Qu.:0.00 Class :character Class :character Class :character Class :character
Median :0.00 Mode :character Mode :character Mode :character Mode :character
Mean :0.01
3rd Qu.:0.00
Max. :1.00
One thing that stands out is the DriverAge column. It appears to contain invalid data. Where in the world would you find drivers ranging in age from 1 to 65,323,423?
Let’s get a frequency count of the driver ages.
table(bikeDF$DriverAge)
1 2 3 4 9 15 16 17 18 19 20
1 2 2 1 1 1 5 9 18 17 27
21 22 23 24 25 26 27 28 29 30 31
28 31 34 28 23 35 27 21 21 31 24
32 33 34 35 36 37 38 39 40 41 42
24 21 19 18 22 16 23 10 19 24 20
43 44 45 46 47 48 49 50 51 52 53
17 16 19 20 13 16 30 17 19 25 15
54 55 56 57 58 59 60 61 62 63 64
14 17 19 15 16 18 16 9 16 13 19
65 66 67 68 69 70 71 72 73 74 75
7 16 10 9 8 8 7 8 13 8 5
76 77 78 79 80 81 82 83 84 85 86
5 5 5 4 6 3 6 2 1 4 3
87 89 90 91 96 97 1921 2056 2154 2337 2447
2 2 1 2 1 1 1 1 1 1 1
2633 2736 3039 4640 4823 6057 6347 6577 6767 8144 272730
1 1 1 1 1 1 1 1 1 1 1
281837 315035 421414 571715 871414 65323423
1 1 1 1 1 1
Ages over 1000 appear to be several ages run together. For example, 1921 is actually two ages, 19 and 21. A larger number such as 315035 is actually three ages: 31, 50, and 35.
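This reading is easy to spot-check with a small helper that cuts a value into two-digit chunks. It's a sketch (split_ages is a hypothetical name) that assumes every embedded age has exactly two digits, which holds for the values in the table above but would break on a single-digit or three-digit age:

```r
# Split a concatenated age value into two-digit ages.
# Assumes an even number of digits, each age exactly two digits long.
split_ages <- function(x) {
  s <- as.character(as.integer(x))        # integer avoids "1e+05"-style output
  starts <- seq(1, nchar(s), by = 2)      # 1, 3, 5, ...
  as.integer(substring(s, starts, starts + 1))
}

split_ages(1921)    # 19 21
split_ages(315035)  # 31 50 35
```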
Given this observation, create a DriverAge2 column that contains the revised driver's age. For ages under 1000, it keeps the recorded age; for ages of 1000 or more, the intent is to keep the first pair of digits.
bikeDF <- bikeDF %>% mutate(DriverAge2 =
ifelse(DriverAge >= 1000
, as.numeric(substr(as.character(max(bikeDF$DriverAge))
, 1
, 2)
)
, DriverAge)
)
Here’s a frequency count of the revised driver ages.
table(bikeDF$DriverAge2)
1 2 3 4 9 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
1 2 2 1 1 1 5 9 18 17 27 28 31 34 28 23 35 27 21 21 31 24 24 21 19 18 22 16 23 10 19 24 20 17 16
45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
19 20 13 16 30 17 19 25 15 14 17 19 15 16 18 16 9 16 13 19 7 16 10 9 8 8 7 8 13 8 5 5 5 5 4
80 81 82 83 84 85 86 87 89 90 91 96 97
6 3 6 2 1 4 3 2 2 1 2 1 1
summary(bikeDF$DriverAge2)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
1 27 41 43 56 97 194
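This looks more sensible, with one catch: the summary reports 194 missing values versus 172 for DriverAge, so the 22 concatenated ages became NA instead of keeping their first pair of digits. The ifelse() branch takes the first two characters of max(bikeDF$DriverAge), a single value for the whole column rather than each row's own age, and because that column contains NAs, max() returns NA.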
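A per-row version, as a sketch: first_age (a hypothetical helper) reads each row's own value instead of the column maximum, and leaves missing ages as NA:

```r
# For values >= 1000 (assumed to be concatenated two-digit ages),
# keep the row's own first two digits; otherwise keep the age as is.
first_age <- function(age) {
  age_chr <- as.character(as.integer(age))
  ifelse(!is.na(age) & age >= 1000,
         as.integer(substr(age_chr, 1, 2)),
         as.integer(age))
}

first_age(c(62L, NA, 1921L, 315035L))  # 62 NA 19 31
```

Used as mutate(DriverAge2 = first_age(DriverAge)), this should leave only the original 172 missing values; the frequency table and summary above would change accordingly.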
Frequency of bicycle accidents by year.
table(bikeDF$CrashYear)
2011 2012 2013 2014 2015 2016
252 285 264 245 221 61
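To put a number on "declining", here is a quick sketch of the year-over-year percentage change using the counts from the table above. It leaves out 2016, which is a partial year (the latest crash date in the data is May 31, 2016):

```r
# Crash counts by year, copied from the table above.
crashes <- c(`2011` = 252, `2012` = 285, `2013` = 264,
             `2014` = 245, `2015` = 221, `2016` = 61)

# Compare complete years only.
full_years <- crashes[c("2011", "2012", "2013", "2014", "2015")]
pct_change <- round(100 * diff(full_years) / head(full_years, -1), 1)
pct_change  # 2012: 13.1, 2013: -7.4, 2014: -7.2, 2015: -9.8
```

Counts rose in 2012 and then fell each year through 2015.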
It is not enough that we do our best; sometimes we must do what is required. [Winston Churchill, former British prime minister]
# re-code variables
bikeDF <- bikeDF %>%
mutate(AlcoholRelated2 = ifelse(grepl("^Not", AlcoholRelated, ignore.case=TRUE), "No", "Yes"))
bikeDF <- bikeDF %>%
mutate(HitAndRun2 = ifelse(grepl("^Not", HitAndRun, ignore.case=TRUE), "No", "Yes"))
bikeDF <- bikeDF %>%
mutate(Intersection = ifelse(grepl("^Not", IntersectionAnalysis, ignore.case=TRUE), "No", "Yes"))
# Distracted is "DISTRACTED" or NA in the rows shown; grepl() returns FALSE for NA, so NA becomes "No"
bikeDF <- bikeDF %>%
mutate(Distracted = ifelse(grepl("^Distracted", Distracted, ignore.case=TRUE), "Yes", "No"))
bikeDF <- bikeDF %>%
mutate(VehicleBodyType = stringi::stri_trim_both(gsub("(9. test)|,","",VehicleBodyTypeCd)))
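The "^Not" recodes share one pattern with a subtle edge case: grepl() returns FALSE for NA, so a missing value in AlcoholRelated or HitAndRun would be coded "Yes". A sketch of a helper that makes the NA handling explicit (yes_no_flag and its arguments are hypothetical; "ALCOHOL" below is a made-up input):

```r
# Convert "Not X"-style flags to "Yes"/"No", with explicit NA handling.
yes_no_flag <- function(x, no_pattern = "^Not", na_value = NA_character_) {
  out <- ifelse(grepl(no_pattern, x, ignore.case = TRUE), "No", "Yes")
  out[is.na(x)] <- na_value   # without this, NA would have become "Yes"
  out
}

yes_no_flag(c("Not ALCIHOL", "ALCOHOL", NA))       # "No" "Yes" NA
yes_no_flag(c("DISTRACTED", NA), na_value = "No")  # "Yes" "No"
```

The second call mirrors the Distracted recode, where a missing flag is treated as "No".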
# replace missing ages with the column median
bikeDF <- bikeDF %>%
mutate(BikerAge = ifelse(is.na(BikerAge), median(BikerAge, na.rm = TRUE), BikerAge))
bikeDF <- bikeDF %>%
mutate(DriverAge = ifelse(is.na(DriverAge), median(DriverAge, na.rm = TRUE), DriverAge))
bikeDF <- bikeDF %>%
mutate(DriverAge2 = ifelse(is.na(DriverAge2), median(DriverAge2, na.rm = TRUE), DriverAge2))
# factor variables
bikeDF$DayOfWeek <- factor(bikeDF$DayOfWeek
, levels = c("Sunday"
, "Monday"
, "Tuesday"
, "Wednesday"
, "Thursday"
, "Friday"
, "Saturday"))
bikeDF$WeatherCond <- factor(bikeDF$WeatherCond
, levels = c("1. No Adverse Condition (Clear/Cloudy)"
, "3. Fog"
, "4. Mist"
, "5. Rain"
, "6. Snow"
, "9. Other"
, "11. Severe Crosswinds"))
bikeDF$RoadwaySurfaceCond <- factor(bikeDF$RoadwaySurfaceCond
, levels = c("1. Dry"
, "2. Wet"
, "3. Snowy"
, "7. Other"
, "10. Slush"
, "11. Sand, Dirt, Gravel"))
bikeDF$TimeFrame <- factor(bikeDF$TimeFrame
, levels = c("0AM TO 3AM"
, "3AM TO 6AM"
, "6AM TO 9AM"
, "9AM TO 12PM"
, "12PM TO 3PM"
, "3PM TO 6PM"
, "6PM TO 9PM"
, "9PM TO 12AM"))
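factor() sets any value that is missing from the levels list to NA without a warning, so a code left out of these hand-written lists would silently drop out of the charts. A small guard, sketched with a hypothetical check_levels helper and a made-up "12. Ice" value:

```r
# Warn about values that factor() would silently turn into NA.
check_levels <- function(x, levels) {
  unexpected <- setdiff(unique(x[!is.na(x)]), levels)
  if (length(unexpected) > 0) {
    warning("Values not in levels: ", paste(unexpected, collapse = "; "))
  }
  factor(x, levels = levels)
}

surface <- c("1. Dry", "2. Wet", "12. Ice")   # "12. Ice" is made up
f <- suppressWarnings(check_levels(surface, c("1. Dry", "2. Wet")))
is.na(f)  # FALSE FALSE TRUE
```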
# format time in 24-hour format
bikeDF$CrashTime2 <- as.character(ifelse(bikeDF$CrashTime < 1000
, paste0("0", bikeDF$CrashTime), bikeDF$CrashTime))
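The ifelse() above pads only three-digit times. A time under 100 stays too short: 45 becomes "045", which "%H%M" parses as 04:05 rather than 00:45, and 0 becomes "00", which leaves nothing for %M, so strptime() returns NA. That may account for some of the 4 NA values in the CrashDt2 summary. A sketch that pads every time to four digits with sprintf() (pad_hhmm is a hypothetical name; UTC is used only to keep the example independent of the local time zone):

```r
# Zero-pad HHMM integers to four characters: 45 -> "0045", 0 -> "0000".
pad_hhmm <- function(tm) sprintf("%04d", as.integer(tm))

times <- c(0L, 45L, 610L, 1715L)
pad_hhmm(times)  # "0000" "0045" "0610" "1715"

# Combine with the date and parse.
dates <- c("1/4/2011", "1/4/2011", "1/4/2011", "1/6/2011")
parsed <- as.POSIXct(strptime(paste(dates, pad_hhmm(times)),
                              format = "%m/%d/%Y %H%M", tz = "UTC"))
format(parsed, "%Y-%m-%d %H:%M")
```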
# combine the date and zero-padded time into a date-time (local time zone)
bikeDF$CrashDt2 <- as.POSIXct(strptime(paste(bikeDF$CrashDt, bikeDF$CrashTime2)
, format="%m/%d/%Y %H%M"))
bikeDF$CrashMon <- month(bikeDF$CrashDt2)
bikeDF <- bikeDF %>%
separate(DriverActionTypeCd, c("DriverActionTypeCd_1", "DriverActionTypeCd_2")
, ","
, extra = "merge")
Too few values at 1 locations: 1170
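The "Too few values" message means that row 1170 had a single driver action code, so separate() filled DriverActionTypeCd_2 with NA. With extra = "merge", any codes after the second stay together in the second column. A base R sketch of the same split on the first comma (split_first is a hypothetical name, and "x,y,z" is a made-up value):

```r
# Keep the text before the first comma; merge everything after it.
# Records with no comma get NA in the second column.
split_first <- function(x) {
  has_comma <- grepl(",", x, fixed = TRUE)
  first <- sub(",.*$", "", x)
  rest  <- ifelse(has_comma, sub("^[^,]*,", "", x), NA_character_)
  data.frame(first = first, rest = rest, stringsAsFactors = FALSE)
}

codes <- c("28. Driving Without Lights,1. No Improper Action",
           "1. No Improper Action",
           "x,y,z")
split_first(codes)
```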
# re-code gender fields -- either Male, Female, or Not Provided
bikeDF <- bikeDF %>%
mutate(DriverGender2 = derivedFactor(
"Female" = grepl("^Female", DriverGender, ignore.case=TRUE),
"Male" = grepl("^Male", DriverGender, ignore.case=TRUE),
.method = "first",
.default = "Not Provided"
))
bikeDF <- bikeDF %>%
mutate(BikerGender2 = derivedFactor(
"Female" = grepl("^Female", BikerGender, ignore.case=TRUE),
"Male" = grepl("^Male", BikerGender, ignore.case=TRUE),
.method = "first",
.default = "Not Provided"
))
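derivedFactor() comes from the mosaic package. With .method = "first", the first condition that matches a value decides its level, and .default covers everything else, including NA. The same rule in base R, as a sketch (recode_gender is a hypothetical name):

```r
# First matching pattern wins; anything else (including NA) is "Not Provided".
recode_gender <- function(x) {
  out <- ifelse(grepl("^Female", x, ignore.case = TRUE), "Female",
         ifelse(grepl("^Male", x, ignore.case = TRUE), "Male", "Not Provided"))
  factor(out, levels = c("Female", "Male", "Not Provided"))
}

recode_gender(c("Male", "Female", "Not Provided", NA, "female"))
```

Because both patterns are anchored with ^, "Female" never matches "^Male", so the order of the two checks does not change the result here.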
# updated data w/ re-coded values and new columns
sample_n(bikeDF, 20)
summary(bikeDF$CrashDt2)
Min. 1st Qu. Median Mean
"2011-01-04 06:10:00" "2012-05-17 10:48:15" "2013-07-20 00:27:00" "2013-08-04 06:02:17"
3rd Qu. Max. NA's
"2014-10-08 16:10:30" "2016-05-31 17:40:00" "4"
Table 1: Age Distribution Breakdown
| Age | Bicyclist | Driver |
|---|---|---|
| under 18 | 277 | 22 |
| …under 5 | 3 | 6 |
| …5 to 17 | 274 | 16 |
| 18 to 44 | 650 | 808 |
| …18 to 24 | 281 | 183 |
| …25 to 44 | 369 | 625 |
| 45 to 64 | 358 | 346 |
| 65 and over | 43 | 152 |
| 16 and over | 1128 | 1320 |
| 18 and over | 1051 | 1306 |
| 21 and over | 902 | 1244 |
| 62 and over | 65 | 200 |
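The code behind Table 1 is not shown here (the memory listing at the end mentions helpers named `getBikeAccidentCnt` and `getDriverAccidentCnt`). As a hedged sketch of one way to get such counts, `cut()` bins ages into the mutually exclusive brackets, and the overlapping "and over" rows are plain cumulative counts. The ages below are made up, and the breakpoints assume whole-number ages:

```r
# Hypothetical ages
ages <- c(3, 12, 17, 18, 24, 25, 44, 45, 64, 65, 80)

# Mutually exclusive brackets matching the non-overlapping rows of Table 1
age_group <- cut(ages,
                 breaks = c(-Inf, 4, 17, 24, 44, 64, Inf),
                 labels = c("under 5", "5 to 17", "18 to 24",
                            "25 to 44", "45 to 64", "65 and over"))
print(table(age_group))

# Overlapping rows such as "18 and over" are simple threshold counts
print(sum(ages >= 18))
```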
Imagination is more important than knowledge. For knowledge is limited, whereas imagination embraces the entire world. [Albert Einstein, theoretical physicist]
ggplot(data = bikeDF, aes(x = CrashYear, fill = Distracted)) +
geom_bar(position = "dodge") +
labs(title = "Frequency Of Distracted Drivers", x = "Year") +
guides(fill=guide_legend(title="Distracted?")) +
facet_grid(Distracted ~ .)
ggplot(data = bikeDF, aes(x = CrashYear, fill = AlcoholRelated2)) +
geom_bar(position = "dodge") +
labs(title = "Frequency Of Alcohol Related Accidents", x = "Year") +
guides(fill=guide_legend(title="Alcohol Related?")) +
facet_grid(AlcoholRelated2 ~ .)
ggplot(data = bikeDF, aes(x = CrashYear, fill = HitAndRun2)) +
geom_bar(position = "dodge") +
labs(title = "Frequency Of Hit & Run Related Accidents", x = "Year") +
guides(fill=guide_legend(title="Hit & Run?"))
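Raw yearly counts mix two effects: the overall number of crashes and the share that involved a distracted driver, alcohol, or a hit and run. Dividing each year's counts by that year's total separates the two. A base-R sketch with made-up rows; the "Yes"/"No" coding is an assumption, not necessarily how the dataset codes these columns:

```r
# Hypothetical per-crash records: year and whether the driver was distracted
year       <- c(2011, 2011, 2011, 2012, 2012, 2013, 2013, 2013, 2013)
distracted <- c("Yes", "No", "No", "Yes", "Yes", "No", "No", "No", "Yes")

counts <- table(year, distracted)
print(counts)

# Share of each answer within a year (each row sums to 1)
shares <- prop.table(counts, margin = 1)
print(round(shares, 2))
```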
dplyr::count(bikeDF, DirectionOfTravelCd) %>%
arrange(-n) %>%
mutate(DirectionOfTravelCd = factor(DirectionOfTravelCd, DirectionOfTravelCd)) %>%
ggplot(mapping = aes(x = DirectionOfTravelCd, y = n)) +
geom_bar(stat="identity") +
coord_flip() +
labs(title = "Accidents Based Upon Direction Of Travel", x = "Direction of Travel", y = "count")
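The `mutate(DirectionOfTravelCd = factor(DirectionOfTravelCd, DirectionOfTravelCd))` step works because ggplot2 draws a discrete axis in factor-level order, so using the already-sorted values as levels fixes the bar order. The same idea in base R, with made-up directions:

```r
# Hypothetical direction-of-travel values
direction <- c("North", "South", "North", "East", "North", "South")

# Most frequent first
counts <- sort(table(direction), decreasing = TRUE)

# Use the sorted names as factor levels so plots keep this order
direction_f <- factor(direction, levels = names(counts))
print(levels(direction_f))
```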
ggplot(data = bikeDF, mapping = aes(x = CrashDt2, y = CrashTime)) +
geom_point(shape=1, alpha = 0.4) + # hollow circles
theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=0)) +
geom_smooth() + # smoothing line w/ 95% confidence region
facet_grid(~ DayOfWeek) +
labs(title = "Accidents Breakdown By Day Of Week, Year, And Time"
, x = "Year"
, y = "Time Of Accident (24-hour)")
ggplot(data = bikeDF, mapping = aes(x = factor(CrashMon), y = CrashTime)) +
geom_boxplot() +
facet_wrap(~ DayOfWeek, nrow = 3) +
labs(title = "Accident Breakdown By Day Of Week, Month, And Time"
, x = "Month"
, y = "Time Of Accident (24-hour)")
ggplot(data = bikeDF, mapping = aes(x = CrashYear)) +
geom_bar() +
facet_wrap(~ WeatherCond) +
labs(title = "Frequency By Weather Condition", x = "Year")
ggplot(data = bikeDF, mapping = aes(x = CrashYear)) +
geom_bar() +
facet_wrap(~ RoadwaySurfaceCond) +
labs(title = "Frequency By Road Surface Condition", x = "Year")
ggplot(data = bikeDF, mapping = aes(x = CrashYear)) +
geom_bar() +
facet_wrap(~ LightCondition) +
labs(title = "Frequency By Lighting Conditions", x = "Year")
ggplot(data = bikeDF, aes(x = CrashYear, fill = IntersectionAnalysis)) +
geom_bar(position = "dodge") +
labs(title = "Frequency Of Intersection Related Accidents", x = "Year") +
guides(fill=guide_legend(title=""))
ggplot(data = bikeDF, mapping = aes(x = CrashYear)) +
geom_bar() +
facet_wrap(~ IntersectionType) +
labs(title = "Frequency By Intersection Type", x = "Year")
ggplot(data = bikeDF, mapping = aes(x = CrashDt2, y = CrashTime)) +
geom_point(shape=2, alpha = 0.4) +
theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=0)) +
geom_smooth() + # smoothing line w/ 95% confidence region
facet_grid(~ IntersectionType) +
labs(title = "Accidents Breakdown By Intersection Type", x = "Year")
# tabulate Intersection Type and Intersection Analysis columns
knitr::kable(table(bikeDF$IntersectionType, bikeDF$IntersectionAnalysis))
| Intersection Type | Intersection | Not Intersection |
|---|---|---|
| 1. Not at Intersection | 323 | 92 |
| 2. Two Approaches | 53 | 24 |
| 3. Three Approaches | 185 | 82 |
| 4. Four Approaches | 428 | 134 |
| 5. Five-Point, or More | 5 | 1 |
| 6. Roundabout | 1 | 0 |
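Reading the cross-tab as row percentages makes the intersection types easier to compare. It also highlights that most crashes typed "1. Not at Intersection" are nonetheless coded "Intersection" in `IntersectionAnalysis`, so the two columns do not always agree. A base-R sketch using the counts from the table above:

```r
# Counts copied from the Intersection Type x Intersection Analysis table
tab <- matrix(c(323,  92,
                 53,  24,
                185,  82,
                428, 134,
                  5,   1,
                  1,   0),
              ncol = 2, byrow = TRUE,
              dimnames = list(
                IntersectionType = c("1. Not at Intersection", "2. Two Approaches",
                                     "3. Three Approaches", "4. Four Approaches",
                                     "5. Five-Point, or More", "6. Roundabout"),
                IntersectionAnalysis = c("Intersection", "Not Intersection")))

# Row percentages: within each intersection type, the share in each column
row_pct <- round(100 * prop.table(tab, margin = 1), 1)
print(row_pct)
```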
ggplot(data = bikeDF, mapping = aes(x = CrashYear)) +
geom_bar() +
facet_wrap(~ BikerGender2) +
labs(title = "Frequency By Biker Gender", x = "Year")
ggplot(data = bikeDF, mapping = aes(x = CrashYear)) +
geom_bar() +
facet_wrap(~ DriverGender2) +
labs(title = "Frequency By Driver Gender", x = "Year")
ggplot(data = bikeDF, mapping = aes(x = CrashYear)) +
geom_bar() +
facet_wrap(~ TimeFrame) +
labs(title = "Frequency By Accident Time Frame", x = "Year")
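The dataset already provides `TimeFrame`, but the same three-hour buckets can be derived from the HHMM `CrashTime`, for example to check that the two columns agree. This sketch assumes each window includes its start hour and excludes its end hour (3:00 AM falls in "3AM TO 6AM"); with no codebook, the dataset's own boundary rule is unknown:

```r
# Hypothetical HHMM crash times
crash_time <- c(45L, 610L, 1159L, 1200L, 2359L)

frame_labels <- c("0AM TO 3AM", "3AM TO 6AM", "6AM TO 9AM", "9AM TO 12PM",
                  "12PM TO 3PM", "3PM TO 6PM", "6PM TO 9PM", "9PM TO 12AM")

# Hour of day (0-23), then integer-divide by 3 to get a window index 0-7
hour <- crash_time %/% 100
time_frame <- factor(frame_labels[hour %/% 3 + 1], levels = frame_labels)
print(time_frame)
```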
ggplot(data = bikeDF, mapping = aes(x = factor(CrashMon), y = CrashTime)) +
geom_boxplot() +
facet_wrap(~ DayOfWeek, nrow = 3) +
stat_summary(fun.y=mean, geom="point", shape=5, size=1) + # plot mean using diamond shape
labs(title = "Accident Breakdown By Day Of Week And Month"
, x = "Month"
, y = "Time Of Accident (24-hour)")
ggplot(data = bikeDF, mapping = aes(x = BikerAge)) +
geom_histogram(bins=70, color="white") +
facet_wrap(~ TimeFrame) +
labs(title = "Frequency Of Biker's Age By Time Frame", x = "Age")
# average and standard deviation of biker and driver ages by month and time frame
bikeDF %>%
group_by(CrashMon, TimeFrame) %>%
summarize(avg_biker_age = mean(BikerAge, na.rm = TRUE)
, sd_biker_age = sd(BikerAge, na.rm = TRUE)
, avg_driver_age = mean(DriverAge2, na.rm = TRUE)
, sd_driver_age = sd(DriverAge2, na.rm = TRUE)
, num_accidents = n())
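In a grouped summary, `mean()` and `sd()` should treat missing ages the same way; without `na.rm = TRUE`, a single missing age turns that group's mean into `NA`. A small base-R illustration with made-up rows:

```r
# Hypothetical rows: crash month and biker age, with one missing age
crashes <- data.frame(CrashMon = c(1, 1, 1, 2, 2),
                      BikerAge = c(20, 30, NA, 40, 50))

# Without na.rm, month 1's mean is NA because of the single missing age
mean_naive <- tapply(crashes$BikerAge, crashes$CrashMon, mean)

# With na.rm = TRUE, mean and sd both skip the missing value
mean_age <- tapply(crashes$BikerAge, crashes$CrashMon, mean, na.rm = TRUE)
sd_age   <- tapply(crashes$BikerAge, crashes$CrashMon, sd, na.rm = TRUE)
n_rows   <- tapply(crashes$BikerAge, crashes$CrashMon, length)  # counts rows, like n()

print(rbind(mean_naive, mean_age, sd_age, n_rows))
```

Note that `n()` (and `length()` here) still counts the row with the missing age.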
# interactive map of crash locations; nearby markers cluster when zoomed out
leaflet(bikeDF) %>%
addTiles() %>%
addCircleMarkers(clusterOptions = markerClusterOptions()
, popup = paste0("<b>("
, bikeDF$BikerAge
, "-year-old "
, bikeDF$BikerGender2
, " hit by "
, bikeDF$DriverAge2
, "-year-old "
, bikeDF$DriverGender2
, ")</b>"
, "<br/>"
, bikeDF$CrashDt2
, "<br/>"
, bikeDF$DayOfWeek
, " between "
, bikeDF$TimeFrame
, "<br/>"
, bikeDF$CollisionType
, " collision at "
, bikeDF$IntersectionType
, "<br/>"
, "Driver "
, paste(bikeDF$DriverActionTypeCd_1, bikeDF$DriverActionTypeCd_2)
))
Assuming 'lng' and 'lat' are longitude and latitude, respectively
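Each marker's popup is just an HTML string, and because `paste0()` is vectorized over its arguments, one call builds a popup per row. This sketch uses made-up values. It also shows one way to avoid a literal "NA" in the popup text for rows, like the one flagged by the `separate()` warning, that have no second driver action:

```r
# Hypothetical values standing in for bikeDF columns
biker_age  <- c(34, 17)
biker_gen  <- c("Male", "Female")
driver_age <- c(52, 41)
driver_gen <- c("Female", "Not Provided")

# One HTML popup string per crash
popups <- paste0("<b>(", biker_age, "-year-old ", biker_gen,
                 " hit by ", driver_age, "-year-old ", driver_gen, ")</b>")
print(popups[1])

# Hypothetical split action codes; the first row has no second action
action_1 <- c("code1", "code2")
action_2 <- c(NA, "code3")

# Skip the second code when it is missing instead of printing "NA"
driver_action <- ifelse(is.na(action_2), action_1, paste(action_1, action_2))
print(driver_action)
```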
Memory usage.
sort(sapply(ls()
, function(x) {object.size(get(x))}
)
)
getBikeAccidentCnt getDriverAccidentCnt bikeDF bikes.raw
20808 20808 579760 2549352
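`object.size()` reports bytes, which gets hard to read for larger objects. A sketch of the same listing, sorted largest first and converted to kilobytes; `small` and `large` are throwaway objects created only so the example has something to measure:

```r
# A few throwaway objects so the listing is not empty
small <- 1:10
large <- numeric(1e5)   # 100,000 doubles, roughly 800 KB

# Size in bytes of every object in the current environment
sizes <- sapply(ls(), function(x) object.size(get(x)))
sizes <- sort(sizes, decreasing = TRUE)

# Convert bytes to kilobytes for readability
print(round(sizes / 1024, 1))
```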
Loaded packages.
sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] mosaic_0.14.4 Matrix_1.2-8 mosaicData_0.14.0 lattice_0.20-34 lubridate_1.6.0
[6] leaflet_1.1.0 dplyr_0.5.0 purrr_0.2.2 readr_1.0.0 tidyr_0.6.1
[11] tibble_1.2 ggplot2_2.2.1 tidyverse_1.1.1
loaded via a namespace (and not attached):
[1] reshape2_1.4.2 splines_3.3.2 haven_1.0.0 colorspace_1.3-2 htmltools_0.3.5 yaml_2.1.14
[7] base64enc_0.1-3 foreign_0.8-67 DBI_0.5-1 modelr_0.1.0 readxl_0.1.1 plyr_1.8.4
[13] stringr_1.2.0 munsell_0.4.3 gtable_0.2.0 rvest_0.3.2 htmlwidgets_0.8 psych_1.6.12
[19] evaluate_0.10 labeling_0.3 knitr_1.15.1 forcats_0.2.0 httpuv_1.3.3 crosstalk_1.0.0
[25] curl_2.3 parallel_3.3.2 highr_0.6 broom_0.4.2 Rcpp_0.12.9 xtable_1.8-2
[31] scales_0.4.1 backports_1.0.5 jsonlite_1.2 mime_0.5 gridExtra_2.2.1 mnormt_1.5-5
[37] hms_0.3 digest_0.6.12 stringi_1.1.2 shiny_1.0.0 grid_3.3.2 rprojroot_1.2
[43] tools_3.3.2 magrittr_1.5 lazyeval_0.2.0 ggdendro_0.1-20 MASS_7.3-45 rsconnect_0.7
[49] xml2_1.1.1 assertthat_0.1 rmarkdown_1.3 httr_1.2.1 R6_2.2.0 nlme_3.1-131