Project 1, Part 2

Author

Aminata Diatta

Introduction

The data sets I chose contain details about crashes reported in Montgomery County. The first data set has over 100,000 entries with 39 columns, including both categorical and numerical variables. The second data set is similar but focuses only on non-motorists. I included non-motorists because, while drivers face risks, pedestrians and cyclists are also exposed to accidents. To better understand these two data sets, I decided to combine them. This combined data set will help answer key questions for my project.

Join the Crash Reporting Drivers and Non-Motorists datasets

library(tidyverse)
Warning: package 'tidyverse' was built under R version 4.3.3
Warning: package 'ggplot2' was built under R version 4.3.3
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.0     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)

Load datasets

crash_rep <- read.csv("C:/Users/satad/Downloads/Crash_Reporting_-_Drivers_Data_20250318.csv")
non_motorists <- read.csv("C:/Users/satad/Downloads/Crash_Reporting_-_Non-Motorists_Data_20250318 (1).csv")
colnames(crash_rep)
 [1] "Report.Number"                 "Local.Case.Number"            
 [3] "Agency.Name"                   "ACRS.Report.Type"             
 [5] "Crash.Date.Time"               "Route.Type"                   
 [7] "Road.Name"                     "Cross.Street.Name"            
 [9] "Off.Road.Description"          "Municipality"                 
[11] "Related.Non.Motorist"          "Collision.Type"               
[13] "Weather"                       "Surface.Condition"            
[15] "Light"                         "Traffic.Control"              
[17] "Driver.Substance.Abuse"        "Non.Motorist.Substance.Abuse" 
[19] "Person.ID"                     "Driver.At.Fault"              
[21] "Injury.Severity"               "Circumstance"                 
[23] "Driver.Distracted.By"          "Drivers.License.State"        
[25] "Vehicle.ID"                    "Vehicle.Damage.Extent"        
[27] "Vehicle.First.Impact.Location" "Vehicle.Body.Type"            
[29] "Vehicle.Movement"              "Vehicle.Going.Dir"            
[31] "Speed.Limit"                   "Driverless.Vehicle"           
[33] "Parked.Vehicle"                "Vehicle.Year"                 
[35] "Vehicle.Make"                  "Vehicle.Model"                
[37] "Latitude"                      "Longitude"                    
[39] "Location"                     
colnames(non_motorists)
 [1] "Report.Number"                "Local.Case.Number"           
 [3] "Agency.Name"                  "ACRS.Report.Type"            
 [5] "Crash.Date.Time"              "Route.Type"                  
 [7] "Road.Name"                    "Cross.Street.Name"           
 [9] "Off.Road.Description"         "Municipality"                
[11] "Related.Non.Motorist"         "Collision.Type"              
[13] "Weather"                      "Surface.Condition"           
[15] "Light"                        "Traffic.Control"             
[17] "Driver.Substance.Abuse"       "Non.Motorist.Substance.Abuse"
[19] "Person.ID"                    "Pedestrian.Type"             
[21] "Pedestrian.Movement"          "Pedestrian.Actions"          
[23] "Pedestrian.Location"          "At.Fault"                    
[25] "Injury.Severity"              "Safety.Equipment"            
[27] "Latitude"                     "Longitude"                   
[29] "Location"                    

Perform an inner join to combine the datasets on “Report.Number”

combined_data <- inner_join(crash_rep, non_motorists, by = "Report.Number")
Warning in inner_join(crash_rep, non_motorists, by = "Report.Number"): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 73 of `x` matches multiple rows in `y`.
ℹ Row 3739 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.
View(combined_data)
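
The many-to-many warning above arises because a single report number can appear on several driver rows and several non-motorist rows, so the join returns every pairing. A minimal sketch of how to inspect and acknowledge this, using small made-up tables (`drivers`, `walkers`, and their values are hypothetical, not the crash data):

```r
library(dplyr)

# Toy tables (hypothetical values): one row per driver, one row per non-motorist
drivers <- data.frame(Report.Number = c("A1", "A1", "B2"),
                      Vehicle.ID    = c("v1", "v2", "v3"))
walkers <- data.frame(Report.Number = c("A1", "A1", "B2"),
                      Person.ID     = c("p1", "p2", "p3"))

# Count how many rows each report contributes on each side of the join
count(drivers, Report.Number)
count(walkers, Report.Number)

# Declaring the relationship silences the warning once the duplication is understood
joined <- inner_join(drivers, walkers, by = "Report.Number",
                     relationship = "many-to-many")

# Report A1 yields 2 x 2 = 4 rows and B2 yields 1, so 5 rows in total
nrow(joined)
```

Because each driver is paired with each non-motorist from the same report, counts taken from the joined table describe driver and non-motorist pairs rather than individual crashes.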

Load libraries

# dplyr and ggplot2 are already attached through library(tidyverse), and stats
# ships with base R, so no install.packages() calls are needed here.
# Load libraries
library(dplyr)
library(ggplot2)
library(stats)
# Check data structure
str(combined_data)
'data.frame':   6561 obs. of  67 variables:
 $ Report.Number                 : chr  "EJ7879003C" "MCP3012000S" "MCP2962008G" "MCP2539001S" ...
 $ Local.Case.Number.x           : chr  "230048975" "16016902" "230065146" "210033537" ...
 $ Agency.Name.x                 : chr  "Gaithersburg Police Depar" "Montgomery County Police" "Montgomery County Police" "Montgomery County Police" ...
 $ ACRS.Report.Type.x            : chr  "Injury Crash" "Injury Crash" "Property Damage Crash" "Property Damage Crash" ...
 $ Crash.Date.Time.x             : chr  "08/11/2023 06:00:00 PM" "04/07/2016 07:42:00 AM" "11/08/2023 02:05:00 PM" "08/27/2021 09:15:00 PM" ...
 $ Route.Type.x                  : chr  "" "" "County" "County" ...
 $ Road.Name.x                   : chr  "" "" "FATHER HURLEY BLVD" "BATTERY LA" ...
 $ Cross.Street.Name.x           : chr  "" "" "CRYSTAL ROCK DR" "KEYSTONE AVE" ...
 $ Off.Road.Description.x        : chr  "1 N SUMMIT DRIVE" "ASPEN HILL SHOPPING CENTER PARKING LOT" "" "" ...
 $ Municipality.x                : chr  "" "" "N/A" "N/A" ...
 $ Related.Non.Motorist.x        : chr  "PEDESTRIAN" "PEDESTRIAN" "BICYCLIST" "PEDESTRIAN" ...
 $ Collision.Type.x              : chr  "SINGLE VEHICLE" "N/A" "OTHER" "OTHER" ...
 $ Weather.x                     : chr  "CLEAR" "RAINING" "CLEAR" "CLOUDY" ...
 $ Surface.Condition.x           : chr  "" "" "DRY" "DRY" ...
 $ Light.x                       : chr  "DAYLIGHT" "DAYLIGHT" "DAYLIGHT" "DARK LIGHTS ON" ...
 $ Traffic.Control.x             : chr  "NO CONTROLS" "NO CONTROLS" "TRAFFIC SIGNAL" "NO CONTROLS" ...
 $ Driver.Substance.Abuse.x      : chr  "NONE DETECTED" "NONE DETECTED" "NONE DETECTED" "UNKNOWN" ...
 $ Non.Motorist.Substance.Abuse.x: chr  "NONE DETECTED" "NONE DETECTED" "NONE DETECTED" "NONE DETECTED" ...
 $ Person.ID.x                   : chr  "05FB7597-2EE7-4226-B6D6-DA0F41600023" "410CCA55-C124-40A3-B6EF-1FBA1B3491A0" "77F8A1B6-7D43-4F1E-ACF4-27ED26B6A406" "B42022EB-B2A2-4232-A880-B791DE81E009" ...
 $ Driver.At.Fault               : chr  "Unknown" "Yes" "Yes" "Yes" ...
 $ Injury.Severity.x             : chr  "NO APPARENT INJURY" "NO APPARENT INJURY" "NO APPARENT INJURY" "NO APPARENT INJURY" ...
 $ Circumstance                  : chr  "N/A" "N/A, WET" "N/A" "N/A" ...
 $ Driver.Distracted.By          : chr  "UNKNOWN" "INATTENTIVE OR LOST IN THOUGHT" "LOOKED BUT DID NOT SEE" "UNKNOWN" ...
 $ Drivers.License.State         : chr  "MD" "DE" "MD" "MD" ...
 $ Vehicle.ID                    : chr  "EDF71D99-8A86-4CCD-BD22-3F32076A134A" "24D3ACC8-096D-433E-B728-7081FE416D1B" "C87A81B6-2F85-47A2-B1A1-48BF4B4476CC" "0CA33983-6632-4093-8002-0170CF155111" ...
 $ Vehicle.Damage.Extent         : chr  "NO DAMAGE" "NO DAMAGE" "SUPERFICIAL" "NO DAMAGE" ...
 $ Vehicle.First.Impact.Location : chr  "UNKNOWN" "ONE OCLOCK" "ONE OCLOCK" "TWELVE OCLOCK" ...
 $ Vehicle.Body.Type             : chr  "PASSENGER CAR" "PASSENGER CAR" "PASSENGER CAR" "" ...
 $ Vehicle.Movement              : chr  "MOVING CONSTANT SPEED" "MOVING CONSTANT SPEED" "STARTING FROM LANE" "MOVING CONSTANT SPEED" ...
 $ Vehicle.Going.Dir             : chr  "Unknown" "North" "North" "East" ...
 $ Speed.Limit                   : int  15 15 35 25 30 30 30 20 25 25 ...
 $ Driverless.Vehicle            : chr  "No" "No" "No" "No" ...
 $ Parked.Vehicle                : chr  "No" "No" "No" "No" ...
 $ Vehicle.Year                  : int  2018 2014 2017 0 2017 2017 2011 2023 2017 2010 ...
 $ Vehicle.Make                  : chr  "RAM" "HONDA" "CHRYSLER" "UNKNOWN" ...
 $ Vehicle.Model                 : chr  "TK" "CIVIC" "300" "UNKNOWN" ...
 $ Latitude.x                    : num  39.1 39.1 39.2 39 39.2 ...
 $ Longitude.x                   : num  -77.2 -77.1 -77.3 -77.1 -77.2 ...
 $ Location.x                    : chr  "(39.14587303, -77.19194047)" "(39.07721738, -77.0799011)" "(39.19356016, -77.2699489)" "(38.99194433, -77.09801817)" ...
 $ Local.Case.Number.y           : num  2.3e+08 1.6e+07 2.3e+08 2.1e+08 2.3e+08 ...
 $ Agency.Name.y                 : chr  "Gaithersburg Police Depar" "Montgomery County Police" "Montgomery County Police" "Montgomery County Police" ...
 $ ACRS.Report.Type.y            : chr  "Injury Crash" "Injury Crash" "Property Damage Crash" "Property Damage Crash" ...
 $ Crash.Date.Time.y             : chr  "08/11/2023 06:00:00 PM" "04/07/2016 07:42:00 AM" "11/08/2023 02:05:00 PM" "08/27/2021 09:15:00 PM" ...
 $ Route.Type.y                  : chr  "" "" "County" "County" ...
 $ Road.Name.y                   : chr  "" "" "FATHER HURLEY BLVD" "BATTERY LA" ...
 $ Cross.Street.Name.y           : chr  "" "" "CRYSTAL ROCK DR" "KEYSTONE AVE" ...
 $ Off.Road.Description.y        : chr  "1 N SUMMIT DRIVE" "ASPEN HILL SHOPPING CENTER PARKING LOT" "" "" ...
 $ Municipality.y                : chr  "" "" "N/A" "N/A" ...
 $ Related.Non.Motorist.y        : chr  "PEDESTRIAN" "PEDESTRIAN" "BICYCLIST" "PEDESTRIAN" ...
 $ Collision.Type.y              : chr  "SINGLE VEHICLE" "N/A" "OTHER" "OTHER" ...
 $ Weather.y                     : chr  "CLEAR" "RAINING" "CLEAR" "CLOUDY" ...
 $ Surface.Condition.y           : chr  "" "" "DRY" "DRY" ...
 $ Light.y                       : chr  "DAYLIGHT" "DAYLIGHT" "DAYLIGHT" "DARK LIGHTS ON" ...
 $ Traffic.Control.y             : chr  "NO CONTROLS" "NO CONTROLS" "TRAFFIC SIGNAL" "NO CONTROLS" ...
 $ Driver.Substance.Abuse.y      : chr  "NONE DETECTED" "NONE DETECTED" "NONE DETECTED" "UNKNOWN" ...
 $ Non.Motorist.Substance.Abuse.y: chr  "NONE DETECTED" "NONE DETECTED" "NONE DETECTED" "NONE DETECTED" ...
 $ Person.ID.y                   : chr  "1BEB4A89-BB27-4E48-BDAC-C321435D71E3" "EB0FE60D-7D1D-481E-B9D6-F74CC581B516" "F7E4DB06-2A8F-4F33-8266-9A051B325DF9" "E730705F-83D1-49EA-A592-5961363A2E53" ...
 $ Pedestrian.Type               : chr  "PEDESTRIAN" "PEDESTRIAN" "BICYCLIST" "PEDESTRIAN" ...
 $ Pedestrian.Movement           : chr  "Cross/Enter at Intersection" "Walking/Cycling on Sidewalk" "Cross/Enter at Intersection" "Cross/Enter not at Intersection" ...
 $ Pedestrian.Actions            : chr  "INATTENTIVE" "NO IMPROPER ACTIONS" "NO IMPROPER ACTIONS" "N/A" ...
 $ Pedestrian.Location           : chr  "ON ROADWAY NOT AT CROSSWALK" "ON ROADWAY AT CROSSWALK" "AT INTERSECTION MARKED CROSSWALK" "ON ROADWAY AT CROSSWALK" ...
 $ At.Fault                      : chr  "Unknown" "No" "No" "No" ...
 $ Injury.Severity.y             : chr  "POSSIBLE INJURY" "SUSPECTED MINOR INJURY" "NO APPARENT INJURY" "NO APPARENT INJURY" ...
 $ Safety.Equipment              : chr  "NONE" "NONE" "NONE" "N/A" ...
 $ Latitude.y                    : num  39.1 39.1 39.2 39 39.2 ...
 $ Longitude.y                   : num  -77.2 -77.1 -77.3 -77.1 -77.2 ...
 $ Location.y                    : chr  "(39.14587303, -77.19194047)" "(39.07721738, -77.0799011)" "(39.19356016, -77.2699489)" "(38.99194433, -77.09801817)" ...
# Summarize the data
summary(combined_data)
 Report.Number      Local.Case.Number.x Agency.Name.x      ACRS.Report.Type.x
 Length:6561        Length:6561         Length:6561        Length:6561       
 Class :character   Class :character    Class :character   Class :character  
 Mode  :character   Mode  :character    Mode  :character   Mode  :character  
                                                                             
                                                                             
                                                                             
 Crash.Date.Time.x  Route.Type.x       Road.Name.x        Cross.Street.Name.x
 Length:6561        Length:6561        Length:6561        Length:6561        
 Class :character   Class :character   Class :character   Class :character   
 Mode  :character   Mode  :character   Mode  :character   Mode  :character   
                                                                             
                                                                             
                                                                             
 Off.Road.Description.x Municipality.x     Related.Non.Motorist.x
 Length:6561            Length:6561        Length:6561           
 Class :character       Class :character   Class :character      
 Mode  :character       Mode  :character   Mode  :character      
                                                                 
                                                                 
                                                                 
 Collision.Type.x    Weather.x         Surface.Condition.x   Light.x         
 Length:6561        Length:6561        Length:6561         Length:6561       
 Class :character   Class :character   Class :character    Class :character  
 Mode  :character   Mode  :character   Mode  :character    Mode  :character  
                                                                             
                                                                             
                                                                             
 Traffic.Control.x  Driver.Substance.Abuse.x Non.Motorist.Substance.Abuse.x
 Length:6561        Length:6561              Length:6561                   
 Class :character   Class :character         Class :character              
 Mode  :character   Mode  :character         Mode  :character              
                                                                           
                                                                           
                                                                           
 Person.ID.x        Driver.At.Fault    Injury.Severity.x  Circumstance      
 Length:6561        Length:6561        Length:6561        Length:6561       
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
 Driver.Distracted.By Drivers.License.State  Vehicle.ID       
 Length:6561          Length:6561           Length:6561       
 Class :character     Class :character      Class :character  
 Mode  :character     Mode  :character      Mode  :character  
                                                              
                                                              
                                                              
 Vehicle.Damage.Extent Vehicle.First.Impact.Location Vehicle.Body.Type 
 Length:6561           Length:6561                   Length:6561       
 Class :character      Class :character              Class :character  
 Mode  :character      Mode  :character              Mode  :character  
                                                                       
                                                                       
                                                                       
 Vehicle.Movement   Vehicle.Going.Dir   Speed.Limit    Driverless.Vehicle
 Length:6561        Length:6561        Min.   : 0.00   Length:6561       
 Class :character   Class :character   1st Qu.:20.00   Class :character  
 Mode  :character   Mode  :character   Median :30.00   Mode  :character  
                                       Mean   :26.02                     
                                       3rd Qu.:35.00                     
                                       Max.   :55.00                     
 Parked.Vehicle      Vehicle.Year  Vehicle.Make       Vehicle.Model     
 Length:6561        Min.   :   0   Length:6561        Length:6561       
 Class :character   1st Qu.:2005   Class :character   Class :character  
 Mode  :character   Median :2012   Mode  :character   Mode  :character  
                    Mean   :1844                                        
                    3rd Qu.:2016                                        
                    Max.   :9999                                        
   Latitude.x     Longitude.x      Location.x        Local.Case.Number.y
 Min.   :38.55   Min.   :-79.18   Length:6561        Min.   :1.705e+04  
 1st Qu.:39.02   1st Qu.:-77.18   Class :character   1st Qu.:1.705e+08  
 Median :39.06   Median :-77.10   Mode  :character   Median :1.901e+08  
 Mean   :39.07   Mean   :-77.11                      Mean   :1.612e+08  
 3rd Qu.:39.12   3rd Qu.:-77.04                      3rd Qu.:2.200e+08  
 Max.   :39.43   Max.   :-76.91                      Max.   :2.400e+09  
 Agency.Name.y      ACRS.Report.Type.y Crash.Date.Time.y  Route.Type.y      
 Length:6561        Length:6561        Length:6561        Length:6561       
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
 Road.Name.y        Cross.Street.Name.y Off.Road.Description.y
 Length:6561        Length:6561         Length:6561           
 Class :character   Class :character    Class :character      
 Mode  :character   Mode  :character    Mode  :character      
                                                              
                                                              
                                                              
 Municipality.y     Related.Non.Motorist.y Collision.Type.y  
 Length:6561        Length:6561            Length:6561       
 Class :character   Class :character       Class :character  
 Mode  :character   Mode  :character       Mode  :character  
                                                             
                                                             
                                                             
  Weather.y         Surface.Condition.y   Light.y          Traffic.Control.y 
 Length:6561        Length:6561         Length:6561        Length:6561       
 Class :character   Class :character    Class :character   Class :character  
 Mode  :character   Mode  :character    Mode  :character   Mode  :character  
                                                                             
                                                                             
                                                                             
 Driver.Substance.Abuse.y Non.Motorist.Substance.Abuse.y Person.ID.y       
 Length:6561              Length:6561                    Length:6561       
 Class :character         Class :character               Class :character  
 Mode  :character         Mode  :character               Mode  :character  
                                                                           
                                                                           
                                                                           
 Pedestrian.Type    Pedestrian.Movement Pedestrian.Actions Pedestrian.Location
 Length:6561        Length:6561         Length:6561        Length:6561        
 Class :character   Class :character    Class :character   Class :character   
 Mode  :character   Mode  :character    Mode  :character   Mode  :character   
                                                                              
                                                                              
                                                                              
   At.Fault         Injury.Severity.y  Safety.Equipment     Latitude.y   
 Length:6561        Length:6561        Length:6561        Min.   :38.55  
 Class :character   Class :character   Class :character   1st Qu.:39.02  
 Mode  :character   Mode  :character   Mode  :character   Median :39.06  
                                                          Mean   :39.07  
                                                          3rd Qu.:39.12  
                                                          Max.   :39.43  
  Longitude.y      Location.y       
 Min.   :-79.18   Length:6561       
 1st Qu.:-77.18   Class :character  
 Median :-77.10   Mode  :character  
 Mean   :-77.11                     
 3rd Qu.:-77.04                     
 Max.   :-76.91                     
# Function to rename columns
rename_columns <- function(name) {
  name <- gsub("\\.", "_", name)   # Replace "." with "_"
  name <- gsub("_x$", "1", name)   # Replace "_x" with "1"
  name <- gsub("_y$", "2", name)   # Replace "_y" with "2"
  tolower(name)                    # Convert to lowercase
}

# Apply the renaming function to column names
colnames(combined_data) <- sapply(colnames(combined_data), rename_columns)

# Check new column names
colnames(combined_data)
 [1] "report_number"                 "local_case_number1"           
 [3] "agency_name1"                  "acrs_report_type1"            
 [5] "crash_date_time1"              "route_type1"                  
 [7] "road_name1"                    "cross_street_name1"           
 [9] "off_road_description1"         "municipality1"                
[11] "related_non_motorist1"         "collision_type1"              
[13] "weather1"                      "surface_condition1"           
[15] "light1"                        "traffic_control1"             
[17] "driver_substance_abuse1"       "non_motorist_substance_abuse1"
[19] "person_id1"                    "driver_at_fault"              
[21] "injury_severity1"              "circumstance"                 
[23] "driver_distracted_by"          "drivers_license_state"        
[25] "vehicle_id"                    "vehicle_damage_extent"        
[27] "vehicle_first_impact_location" "vehicle_body_type"            
[29] "vehicle_movement"              "vehicle_going_dir"            
[31] "speed_limit"                   "driverless_vehicle"           
[33] "parked_vehicle"                "vehicle_year"                 
[35] "vehicle_make"                  "vehicle_model"                
[37] "latitude1"                     "longitude1"                   
[39] "location1"                     "local_case_number2"           
[41] "agency_name2"                  "acrs_report_type2"            
[43] "crash_date_time2"              "route_type2"                  
[45] "road_name2"                    "cross_street_name2"           
[47] "off_road_description2"         "municipality2"                
[49] "related_non_motorist2"         "collision_type2"              
[51] "weather2"                      "surface_condition2"           
[53] "light2"                        "traffic_control2"             
[55] "driver_substance_abuse2"       "non_motorist_substance_abuse2"
[57] "person_id2"                    "pedestrian_type"              
[59] "pedestrian_movement"           "pedestrian_actions"           
[61] "pedestrian_location"           "at_fault"                     
[63] "injury_severity2"              "safety_equipment"             
[65] "latitude2"                     "longitude2"                   
[67] "location2"                    
View(combined_data)

1- Is there a relationship between weather conditions and the likelihood of severe injury in a crash?

str(combined_data$injury_severity1)  
 chr [1:6561] "NO APPARENT INJURY" "NO APPARENT INJURY" ...
table(combined_data$injury_severity1)  # Check unique values  

                               No Apparent Injury       NO APPARENT INJURY 
                      79                      713                     5474 
         Possible Injury          POSSIBLE INJURY   Suspected Minor Injury 
                      10                      141                       19 
  SUSPECTED MINOR INJURY Suspected Serious Injury SUSPECTED SERIOUS INJURY 
                     106                        1                       18 
combined_data$injury_severity1 <- tolower(combined_data$injury_severity1)  # Convert to lowercase
combined_data$injury_severity1 <- trimws(combined_data$injury_severity1)  # Remove extra spaces

# Check unique values again
table(combined_data$injury_severity1)

                               no apparent injury          possible injury 
                      79                     6187                      151 
  suspected minor injury suspected serious injury 
                     125                       19 
combined_data$injury_severity1 <- factor(combined_data$injury_severity1, 
                                         levels = c("no apparent injury", 
                                                    "possible injury", 
                                                    "suspected minor injury", 
                                                    "suspected serious injury"), 
                                         ordered = TRUE)
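# Caution: "SEVERE INJURY" is not one of the factor levels defined above,
# so this comparison yields 0 for every non-missing row.
combined_data$injury_severity_binary <- ifelse(combined_data$injury_severity1 == "SEVERE INJURY", 1, 0)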

Recoding injury severity into a binary variable (0 = non-severe, 1 = severe) simplifies the analysis and makes the outcome suitable for logistic regression. It lets the model focus on the factors associated with severe injuries.
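
For the binary variable to contain any 1s, the comparison has to use a label that exists after cleaning; the factor levels defined above do not include "SEVERE INJURY". A minimal sketch on made-up labels, assuming that "suspected serious injury" is the category meant by "severe" (that mapping, and the `severity` and `severe` names, are assumptions for illustration):

```r
# Toy labels standing in for injury_severity1 (hypothetical values, not the crash data)
severity <- c("NO APPARENT INJURY", "Suspected Serious Injury",
              "possible injury ", "", "SUSPECTED MINOR INJURY")

# Same cleaning steps as in the report: lowercase, then trim whitespace
severity <- trimws(tolower(severity))

# Labels missing from `levels` (here the blank string) become NA
severity <- factor(severity,
                   levels = c("no apparent injury", "possible injury",
                              "suspected minor injury", "suspected serious injury"),
                   ordered = TRUE)

# Compare against a level that actually exists; NA rows stay NA
severe <- ifelse(severity == "suspected serious injury", 1, 0)

# Confirm that both classes are present before modelling
table(severe, useNA = "ifany")
```

Tabulating the recoded variable right away is a cheap check that the outcome is not constant.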

# Perform logistic regression with weather conditions as predictor
log_reg_model <- glm(injury_severity_binary ~ weather1, 
                     data = combined_data, 
                     family = binomial)
Warning: glm.fit: algorithm did not converge
# Display the model summary
summary(log_reg_model)

Call:
glm(formula = injury_severity_binary ~ weather1, family = binomial, 
    data = combined_data)

Coefficients:
                                            Estimate Std. Error z value
(Intercept)                               -2.657e+01  2.056e+05       0
weather1BLOWING SNOW                      -9.504e-22  2.908e+05       0
weather1Clear                             -9.506e-22  2.061e+05       0
weather1CLEAR                             -1.487e-13  2.057e+05       0
weather1Cloudy                            -9.501e-22  2.102e+05       0
weather1CLOUDY                            -9.503e-22  2.062e+05       0
weather1Fog, Smog, Smoke                  -9.501e-22  4.112e+05       0
weather1FOGGY                             -9.503e-22  2.192e+05       0
weather1Freezing Rain Or Freezing Drizzle -9.504e-22  3.251e+05       0
weather1N/A                               -9.504e-22  2.063e+05       0
weather1OTHER                             -9.503e-22  2.299e+05       0
weather1Rain                              -9.506e-22  2.093e+05       0
weather1RAINING                           -9.503e-22  2.061e+05       0
weather1SEVERE WINDS                      -9.504e-22  2.411e+05       0
weather1SLEET                             -9.504e-22  2.908e+05       0
weather1Snow                              -9.507e-22  2.457e+05       0
weather1SNOW                              -9.504e-22  2.160e+05       0
weather1Unknown                           -9.505e-22  2.601e+05       0
weather1UNKNOWN                           -9.504e-22  2.153e+05       0
weather1WINTRY MIX                        -9.504e-22  2.266e+05       0
                                          Pr(>|z|)
(Intercept)                                      1
weather1BLOWING SNOW                             1
weather1Clear                                    1
weather1CLEAR                                    1
weather1Cloudy                                   1
weather1CLOUDY                                   1
weather1Fog, Smog, Smoke                         1
weather1FOGGY                                    1
weather1Freezing Rain Or Freezing Drizzle        1
weather1N/A                                      1
weather1OTHER                                    1
weather1Rain                                     1
weather1RAINING                                  1
weather1SEVERE WINDS                             1
weather1SLEET                                    1
weather1Snow                                     1
weather1SNOW                                     1
weather1Unknown                                  1
weather1UNKNOWN                                  1
weather1WINTRY MIX                               1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 0.0000e+00  on 6481  degrees of freedom
Residual deviance: 3.7606e-08  on 6462  degrees of freedom
  (79 observations deleted due to missingness)
AIC: 40

Number of Fisher Scoring iterations: 25

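Comments:

The model did not converge, so it cannot describe any relationship between weather conditions and injury severity. The null deviance of 0, the near-zero estimates, and the extremely large standard errors all point to an outcome with no variation: injury_severity_binary is 0 for every non-missing row, because the recode compares against "SEVERE INJURY", a label that does not occur in the cleaned data. The weather categories are also split by inconsistent capitalization and wording (for example "Clear" and "CLEAR", or "Rain" and "RAINING"), and several of them are rare.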
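
Two checks before fitting would surface these problems early: merging weather labels that differ only in case or wording, and confirming that the outcome takes both values. A minimal sketch on made-up data (the `toy` data frame, its values, and the choice to merge "RAIN" into "RAINING" are assumptions for illustration):

```r
# Toy data (hypothetical values, not the crash data)
toy <- data.frame(
  weather = c("Clear", "CLEAR", "Rain", "RAINING", "clear ", "RAINING", "CLEAR", "Rain"),
  severe  = c(0, 1, 0, 1, 0, 0, 0, 1)
)

# Collapse case and whitespace variants of the same label
toy$weather <- toupper(trimws(toy$weather))

# Merge labels that describe the same condition (assumed mapping)
toy$weather[toy$weather == "RAIN"] <- "RAINING"

# A logistic regression needs both 0s and 1s in the outcome
outcome_counts <- table(toy$severe)
print(outcome_counts)

if (length(outcome_counts) == 2) {
  fit <- glm(severe ~ weather, data = toy, family = binomial)
  print(coef(summary(fit)))
}
```

With real data, very rare weather categories may still produce unstable estimates, so grouping them into an "other" level is one option worth considering.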

Check the unique levels of injury_severity1

levels(combined_data$injury_severity1)
[1] "no apparent injury"       "possible injury"         
[3] "suspected minor injury"   "suspected serious injury"

Graph 1: Bar plot of injury severity by weather condition

ggplot(combined_data, aes(x = weather1, fill = injury_severity1)) +
  geom_bar(position = "fill") +  # Show the proportions of each injury type
  labs(title = "Injury Severity by Weather Condition",
       x = "Weather Condition",
       y = "Proportion of Injury Severity") +
  scale_fill_manual(values = c("no apparent injury" = "lightblue", 
                               "possible injury" = "yellow", 
                               "suspected minor injury" = "orange",
                               "suspected serious injury" = "red")) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

The logistic regression above cannot answer this question. Because the outcome variable had no variation (every non-missing row was coded 0), the model failed to converge, and the near-zero coefficients and p-values of 1 are artifacts of that failure rather than evidence that weather has no effect. The weather labels also appear in forms that differ only in case or wording (for example "Clear" and "CLEAR", or "Rain" and "RAINING"), which splits each condition across several categories. The bar plot gives a descriptive view of how injury severity is distributed within each weather condition, but it does not test for a relationship. Other factors, such as driver behavior or road conditions, may be more influential than weather alone, but this remains a hypothesis until the model is refit with a correctly coded outcome.
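
The proportions drawn in the bar plot can also be read as numbers from a row-normalized cross-tabulation. A minimal sketch on made-up data (the `toy` data frame and its values are hypothetical):

```r
# Toy data (hypothetical values, not the crash data)
toy <- data.frame(
  weather  = c("CLEAR", "CLEAR", "CLEAR", "RAINING", "RAINING", "SNOW"),
  severity = c("no apparent injury", "no apparent injury", "suspected serious injury",
               "possible injury", "no apparent injury", "suspected minor injury")
)

# Counts of each severity level within each weather condition
counts <- table(toy$weather, toy$severity)

# Row proportions: the same quantity that geom_bar(position = "fill") draws
row_props <- prop.table(counts, margin = 1)
print(round(row_props, 2))

# Row totals show how many observations stand behind each bar
print(rowSums(counts))
```

Checking the row totals alongside the proportions helps avoid over-reading bars that rest on only a handful of crashes.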

Check unique values in the related_non_motorist2 column

# related_non_motorist2 is the non-motorists copy of Related.Non.Motorist
unique(combined_data$related_non_motorist2)
 [1] "PEDESTRIAN"                                                                            
 [2] "BICYCLIST"                                                                             
 [3] "OTHER CONVEYANCE"                                                                      
 [4] "OTHER"                                                                                 
 [5] "OTHER PEDALCYCLIST"                                                                    
 [6] "OTHER, PEDESTRIAN"                                                                     
 [7] "MACHINE OPERATOR/RIDER"                                                                
 [8] "OTHER, OTHER CONVEYANCE"                                                               
 [9] "BICYCLIST, OTHER"                                                                      
[10] "BICYCLIST, PEDESTRIAN"                                                                 
[11] "OTHER CONVEYANCE, PEDESTRIAN"                                                          
[12] "IN ANIMAL-DRAWN VEH"                                                                   
[13] "Pedestrian"                                                                            
[14] "Unknown Type Of Non-Motorist"                                                          
[15] "Scooter (electric)"                                                                    
[16] "Cyclist (non-electric)"                                                                
[17] "Other Pedestrian (person in a building, skater, personal conveyance, etc.), Pedestrian"
[18] "Other Pedestrian (person in a building, skater, personal conveyance, etc.)"            
[19] "Cyclist (Electric)"                                                                    
[20] "Scooter (non-Electric)"                                                                
[21] "Wheelchair (electric)"                                                                 
[22] "Occupant of Motor Vehicle Not in Transport"                                            
[23] "Occupant Of a Non-Motor Vehicle Transportation Device"                                 
[24] "Wheelchair (non-electric)"                                                             
[25] "Unknown"                                                                               
[26] "Occupant of Motor Vehicle Not in Transport, Pedestrian"                                
[27] "Unknown, Wheelchair (electric)"                                                        

Count the frequency of each type of non-motorist

table(combined_data$related_non_motorist2)

                                                                             BICYCLIST 
                                                                                  1191 
                                                                      BICYCLIST, OTHER 
                                                                                     8 
                                                                 BICYCLIST, PEDESTRIAN 
                                                                                     7 
                                                                    Cyclist (Electric) 
                                                                                    24 
                                                                Cyclist (non-electric) 
                                                                                   123 
                                                                   IN ANIMAL-DRAWN VEH 
                                                                                     1 
                                                                MACHINE OPERATOR/RIDER 
                                                                                    40 
                                 Occupant Of a Non-Motor Vehicle Transportation Device 
                                                                                     3 
                                            Occupant of Motor Vehicle Not in Transport 
                                                                                     8 
                                Occupant of Motor Vehicle Not in Transport, Pedestrian 
                                                                                     2 
                                                                                 OTHER 
                                                                                   264 
                                                                      OTHER CONVEYANCE 
                                                                                    87 
                                                          OTHER CONVEYANCE, PEDESTRIAN 
                                                                                     2 
                                                                    OTHER PEDALCYCLIST 
                                                                                    26 
            Other Pedestrian (person in a building, skater, personal conveyance, etc.) 
                                                                                    12 
Other Pedestrian (person in a building, skater, personal conveyance, etc.), Pedestrian 
                                                                                     3 
                                                               OTHER, OTHER CONVEYANCE 
                                                                                     2 
                                                                     OTHER, PEDESTRIAN 
                                                                                    19 
                                                                            Pedestrian 
                                                                                   586 
                                                                            PEDESTRIAN 
                                                                                  4092 
                                                                    Scooter (electric) 
                                                                                    39 
                                                                Scooter (non-Electric) 
                                                                                     4 
                                                                               Unknown 
                                                                                     2 
                                                          Unknown Type Of Non-Motorist 
                                                                                     7 
                                                        Unknown, Wheelchair (electric) 
                                                                                     2 
                                                                 Wheelchair (electric) 
                                                                                     6 
                                                             Wheelchair (non-electric) 
                                                                                     1 
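The frequency table lists "Pedestrian" and "PEDESTRIAN" as separate categories because the labels differ only in letter case. A minimal sketch of merging them before counting is shown below; the vector `non_motorist_type` is a small made-up stand-in for the real column.

```r
# Toy vector standing in for combined_data$related_non_motorist2 (illustrative values)
non_motorist_type <- c("PEDESTRIAN", "Pedestrian", "BICYCLIST", "Pedestrian", "PEDESTRIAN")

# Raw counts treat "Pedestrian" and "PEDESTRIAN" as different categories
raw_counts <- table(non_motorist_type)

# Converting everything to one case merges them before counting
clean_counts <- table(toupper(non_motorist_type))

print(raw_counts)
print(clean_counts)
```

Labels that differ in wording, not just in case (for example "BICYCLIST" and "Cyclist (non-electric)"), still need to be grouped by pattern.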

Create a new column indicating if it is a driver or non-motorist

combined_data$person_type <- ifelse(!is.na(combined_data$injury_severity1), "Driver", "Non-motorist")

2- Do crashes that occur at night or in low light conditions result in more severe injuries?

combined_data <- combined_data %>%
  mutate(light1 = tolower(light1))
library(ggplot2)

# Create a violin plot for light conditions vs. injury severity
ggplot(combined_data, aes(x = light1, y = as.numeric(injury_severity_binary))) +
  geom_violin(fill = "skyblue", alpha = 0.7) + 
  geom_jitter(width = 0.2, alpha = 0.4, color = "purple") +  
  labs(title = "Injury Severity by Light Condition",
       x = "Light Condition",
       y = "Injury Severity (0 = Non-severe, 1 = Severe)") +
  theme_minimal() +
  coord_flip()  # Flip for better readability
Warning: Removed 79 rows containing non-finite outside the scale range
(`stat_ydensity()`).
Warning: Removed 79 rows containing missing values or values outside the scale range
(`geom_point()`).

Comments :

The graph shows that most severe injuries are recorded in daylight or in the dark with streetlights on. Fewer are recorded in complete darkness. This is plausible: some neighborhoods do not have streetlights, and at night most people are asleep, so there are fewer drivers on the road. Around 7 AM and in the evening, when traffic is heavy, accidents may increase due to rush hour. The pattern matches real-life situations, as severe accidents mostly happen during the day and early evening.
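Because most crashes happen in daylight, the number of severe injuries and the rate of severe injuries can tell different stories. The sketch below computes both per light condition. The data frame `toy` is made up; only the column names `light1` and `injury_severity_binary` follow the report.

```r
library(dplyr)

# Toy data (assumption): light condition and a 0/1 severe-injury flag
toy <- data.frame(
  light1 = c("daylight", "daylight", "daylight",
             "dark lights on", "dark lights on", "dark no lights"),
  injury_severity_binary = c(0, 1, 0, 1, 0, NA)
)

severity_by_light <- toy %>%
  filter(!is.na(injury_severity_binary)) %>%    # drop rows with missing severity
  group_by(light1) %>%
  summarise(
    crashes     = n(),                          # records per light condition
    prop_severe = mean(injury_severity_binary)  # share of records flagged severe
  )

print(severity_by_light)
```

A higher `prop_severe` in dark conditions, even with fewer crashes overall, would indicate that night crashes are more severe on average.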

combined_data$related_non_motorist2 <- tolower(combined_data$related_non_motorist2)
combined_data$non_motorist_group <- case_when(
  str_detect(combined_data$related_non_motorist2, "pedestrian") ~ "Pedestrian",
  str_detect(combined_data$related_non_motorist2, "bicyclist|cyclist|pedalcyclist") ~ "Bicyclist",
  str_detect(combined_data$related_non_motorist2, "scooter") ~ "Scooter",
  str_detect(combined_data$related_non_motorist2, "wheelchair") ~ "Wheelchair",
  str_detect(combined_data$related_non_motorist2, "other") ~ "Other",
  TRUE ~ "Unknown"
)
table(combined_data$non_motorist_group, combined_data$weather1)
            
             Blowing Snow BLOWING SNOW Clear CLEAR Cloudy CLOUDY
  Bicyclist             0            0   121   945     18    118
  Other                 0            2     0   248      0     26
  Pedestrian            3            1   460  2787     47    388
  Scooter               0            0    36     0      4      0
  Unknown               0            0    15    28      4      3
  Wheelchair            0            0     7     0      1      0
            
             Fog, Smog, Smoke FOGGY Freezing Rain Or Freezing Drizzle  N/A
  Bicyclist                 0     2                                 0   92
  Other                     0     1                                 0   36
  Pedestrian                1    18                                 2  331
  Scooter                   0     0                                 0    0
  Unknown                   0     1                                 0    5
  Wheelchair                0     0                                 0    0
            
             OTHER Rain RAINING SEVERE WINDS SLEET Snow SNOW Unknown UNKNOWN
  Bicyclist      0    7      61            1     0    1    0       0       6
  Other          0    0      28            2     0    0    0       0       3
  Pedestrian    12   79     517            5     3    6   29       5      22
  Scooter        0    3       0            0     0    0    0       0       0
  Unknown        0    0       4            0     0    1    0       0       0
  Wheelchair     0    0       0            0     0    0    0       1       0
            
             WINTRY MIX
  Bicyclist           0
  Other               7
  Pedestrian          7
  Scooter             0
  Unknown             0
  Wheelchair          0
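The grouping step depends on the order of the conditions in `case_when()`: each label receives the result of the first condition that matches. The sketch below applies the same kind of rules to a handful of labels from the earlier `unique()` output; the vector `labels` is just an illustration.

```r
library(dplyr)
library(stringr)

# A few labels from the unique() listing, already lower-cased
labels <- c("bicyclist, pedestrian", "other, pedestrian", "scooter (electric)",
            "other conveyance", "machine operator/rider")

# case_when() returns the result of the first condition that is TRUE,
# so a label containing "pedestrian" never reaches the later branches
group <- case_when(
  str_detect(labels, "pedestrian") ~ "Pedestrian",
  str_detect(labels, "bicyclist|cyclist|pedalcyclist") ~ "Bicyclist",
  str_detect(labels, "scooter") ~ "Scooter",
  str_detect(labels, "other") ~ "Other",
  TRUE ~ "Unknown"
)

print(data.frame(labels, group))
```

With this ordering, a mixed label such as "bicyclist, pedestrian" is counted as a pedestrian, and labels that match no pattern, such as "machine operator/rider", fall into "Unknown".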

TEST 2: CHI-SQUARE TEST

weather_crash_table <- table(combined_data$weather1, combined_data$non_motorist_group)
chisq.test(weather_crash_table)
Warning in chisq.test(weather_crash_table): Chi-squared approximation may be
incorrect

    Pearson's Chi-squared test

data:  weather_crash_table
X-squared = 784, df = 95, p-value < 2.2e-16

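p-value < 2.2e-16: Since the p-value is extremely small (far below 0.05), we reject the null hypothesis that weather condition and non-motorist type are independent. In other words, the mix of non-motorist types involved in crashes differs across weather conditions. Note that R warns the chi-squared approximation may be incorrect, because many cells in the table have very small counts, so this result should be interpreted with caution.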
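The warning "Chi-squared approximation may be incorrect" appears when some expected cell counts are small. One possible way to check this and to get a p-value that does not rely on the approximation is sketched below. The table `toy_table` uses made-up counts and is not the report's `weather_crash_table`.

```r
# Toy contingency table with some very small cells (made-up counts)
toy_table <- matrix(c(120, 30, 2,
                      400, 90, 1,
                       15,  0, 3),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(weather = c("CLEAR", "RAIN", "SNOW"),
                                    group   = c("Pedestrian", "Bicyclist", "Scooter")))

# Expected counts under independence; values below 5 trigger the warning
expected    <- suppressWarnings(chisq.test(toy_table))$expected
small_cells <- sum(expected < 5)

# Monte Carlo p-value that does not rely on the chi-squared approximation
set.seed(1)
sim_test <- chisq.test(toy_table, simulate.p.value = TRUE, B = 2000)

print(small_cells)
print(sim_test$p.value)
```

Merging rare weather categories (and upper- and lower-case duplicates) before building the table is another way to reduce the number of small cells.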

combined_data$weather1 <- toupper(combined_data$weather1)

3-Do you think weather conditions play a role in accidents involving non-motorists?

library(ggplot2)

ggplot(combined_data, aes(x = weather1, fill = non_motorist_group)) +
  geom_bar(position = "fill") +
  theme_minimal() +
  labs(title = "Proportion of Non-Motorist Crashes by Weather Condition",
       x = "Weather Condition",
       y = "Proportion of Crashes",
       fill = "Non-Motorist Type") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Comments :

The graph shows that pedestrians account for the largest share of non-motorist crashes. People who are outside a vehicle are more exposed, whatever the type of accident. After pedestrians, bicyclists are the next largest group, which is not surprising, since the roads they share with vehicles make them more vulnerable to crashes.

4- How does the vehicle make affect the likelihood of a crash occurring under different speed limits?

# Create a stacked bar plot to visualize the number of crashes by vehicle make and speed limit
ggplot(combined_data %>%
         filter(vehicle_make %in% c("TOYOTA", "GMC", "FORD", "CHEVROLET", "DODG", "LEXUS")),
       aes(x = vehicle_make, fill = as.factor(speed_limit))) +
  geom_bar() +
  labs(title = "Stacked Number of Crashes by Vehicle Make and Speed Limit",
       x = "Vehicle Make",
       y = "Number of Crashes",
       fill = "Speed Limit") +
  theme_minimal() +  # Apply the complete theme first so it does not reset later tweaks
  theme(axis.text.x = element_text(angle = 90, hjust = 1))  # Rotate x-axis labels

Comments :

It is not surprising that Toyota has the highest number of accidents. This is not because Toyota vehicles are inherently more prone to accidents, but because they are more commonly owned in Maryland. Many adults recommend Toyotas to teenagers, which likely increases the number of these vehicles on the road. On the other hand, makes like Dodge and Lexus appear in fewer accidents simply because fewer people drive them.
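Since raw counts mostly reflect how many vehicles of each make are on the road, one way to compare makes is to look at shares within each make. The sketch below computes, for each make, the share of its crashes at each speed limit. The data frame `toy` is made up; only the column names `vehicle_make` and `speed_limit` follow the report.

```r
library(dplyr)

# Toy data (assumption): a common make and a less common make
toy <- data.frame(
  vehicle_make = c("TOYOTA", "TOYOTA", "TOYOTA", "TOYOTA", "LEXUS", "LEXUS"),
  speed_limit  = c(25, 25, 35, 35, 25, 35)
)

share_by_make <- toy %>%
  count(vehicle_make, speed_limit) %>%   # crashes per make and speed limit
  group_by(vehicle_make) %>%
  mutate(share = n / sum(n)) %>%         # share within each make
  ungroup()

print(share_by_make)
```

In this toy example the two makes have very different totals but identical shares, which is the pattern expected if make has no effect on where crashes occur.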

5-Are there specific areas with a high concentration of crashes involving non-motorists?

# Filter non-motorist crashes (e.g., pedestrians, cyclists)
non_motorist_crashes <- combined_data %>%
  filter(!is.na(related_non_motorist1)) %>%
  group_by(location1) %>%
  summarise(non_motorist_count = n())

# Filter driver crashes
driver_crashes <- combined_data %>%
  filter(is.na(related_non_motorist1)) %>%
  group_by(location1) %>%
  summarise(driver_count = n())
# Merge non-motorist and driver crashes by location
crash_counts <- merge(non_motorist_crashes, driver_crashes, by = "location1", all = TRUE)
crash_counts[is.na(crash_counts)] <- 0  # Replace NAs with 0
# Create a heatmap for non-motorist crashes 
ggplot(combined_data, aes(x = longitude1, y = latitude1)) +
  geom_bin2d(bins = 30) +  # Adjust the number of bins for heatmap
  scale_fill_gradientn(colors = c("pink", "yellow", "blue", "black")) +  # Custom colors
  labs(title = "Heatmap of Non-Motorist Crash Locations",
       x = "Longitude",
       y = "Latitude",
       fill = "Crash Density") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, size = 14, face = "bold"))

Comments :

This heatmap visualizes the density of non-motorist crashes based on their geographic locations, using longitude and latitude coordinates. The map area is divided into a 30 by 30 grid of bins, where each bin represents a specific area, and the number of crashes in each bin is color-coded. The gradient moves from pink for lower-density areas to black for the highest crash concentrations, with yellow and blue in between. This allows for a clear identification of high-risk areas where non-motorist accidents frequently occur. The plot is styled with a minimal theme for clarity, and the title is centered and bolded to enhance readability.
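To make the binning idea concrete, the sketch below counts points on a 30 by 30 grid by hand, which is roughly what a two-dimensional binned heatmap does before coloring the cells. The coordinates are randomly generated for illustration, and the exact bin edges ggplot2 uses may differ slightly.

```r
set.seed(42)

# Made-up coordinates (assumption), not the crash locations
lon <- runif(500, min = -77.5, max = -76.9)
lat <- runif(500, min = 38.9, max = 39.4)

n_bins <- 30

# Split each axis into 30 equal-width intervals, then cross-tabulate
lon_bin     <- cut(lon, breaks = n_bins)
lat_bin     <- cut(lat, breaks = n_bins)
grid_counts <- table(lat_bin, lon_bin)

print(dim(grid_counts))   # number of rows and columns in the grid
print(sum(grid_counts))   # every point falls in exactly one cell
print(max(grid_counts))   # count in the densest cell
```

The densest cell corresponds to the darkest tile in the heatmap, so its row and column intervals give the latitude and longitude range of the hotspot.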

In addition, when the densest area is located on a real map, it corresponds to Rockville. So the city where non-motorists are most exposed to crashes is Rockville.

6- How does the presence of alcohol or drugs affect crash severity for drivers and non-motorists?

# Check unique values in the driver substance column
unique(combined_data$driver_substance_abuse1)
 [1] "NONE DETECTED"                                      
 [2] "UNKNOWN"                                            
 [3] "N/A"                                                
 [4] "ALCOHOL PRESENT"                                    
 [5] "ALCOHOL CONTRIBUTED"                                
 [6] "ILLEGAL DRUG CONTRIBUTED"                           
 [7] "MEDICATION PRESENT"                                 
 [8] "COMBINED SUBSTANCE PRESENT"                         
 [9] "ILLEGAL DRUG PRESENT"                               
[10] "OTHER"                                              
[11] "Not Suspect of Alcohol Use, Not Suspect of Drug Use"
[12] "Unknown, Unknown"                                   
[13] "Suspect of Alcohol Use, Not Suspect of Drug Use"    
[14] "Suspect of Alcohol Use, Unknown"                    
[15] "Not Suspect of Alcohol Use, Unknown"                
# Check unique values in the non-motorist substance column
unique(combined_data$non_motorist_substance_abuse1)
 [1] "NONE DETECTED"                                                                                                                                                
 [2] "N/A"                                                                                                                                                          
 [3] "ALCOHOL PRESENT"                                                                                                                                              
 [4] "ALCOHOL CONTRIBUTED"                                                                                                                                          
 [5] "ILLEGAL DRUG CONTRIBUTED"                                                                                                                                     
 [6] "UNKNOWN"                                                                                                                                                      
 [7] "N/A, NONE DETECTED"                                                                                                                                           
 [8] "ALCOHOL CONTRIBUTED, ALCOHOL PRESENT"                                                                                                                         
 [9] "COMBINED SUBSTANCE PRESENT"                                                                                                                                   
[10] "ILLEGAL DRUG PRESENT"                                                                                                                                         
[11] "OTHER"                                                                                                                                                        
[12] "NONE DETECTED, UNKNOWN"                                                                                                                                       
[13] "N/A, UNKNOWN"                                                                                                                                                 
[14] "MEDICATION PRESENT"                                                                                                                                           
[15] "COMBINATION CONTRIBUTED"                                                                                                                                      
[16] "ALCOHOL PRESENT, NONE DETECTED"                                                                                                                               
[17] "Not Suspect of Alcohol Use, Suspect of Drug Use"                                                                                                              
[18] "Not Suspect of Alcohol Use, Not Suspect of Drug Use"                                                                                                          
[19] "Suspect of Alcohol Use, Suspect of Drug Use"                                                                                                                  
[20] "Suspect of Alcohol Use, Not Suspect of Drug Use"                                                                                                              
[21] "Unknown, Unknown"                                                                                                                                             
[22] "Not Suspect of Alcohol Use, Not Suspect of Drug Use, Not Suspect of Alcohol Use, Not Suspect of Drug Use"                                                     
[23] "Not Suspect of Alcohol Use, Unknown"                                                                                                                          
[24] "Not Suspect of Alcohol Use, Not Suspect of Drug Use, Not Suspect of Alcohol Use, Not Suspect of Drug Use, Not Suspect of Alcohol Use, Not Suspect of Drug Use"
[25] "Suspect of Alcohol Use, Unknown"                                                                                                                              
[26] "Unknown, Not Suspect of Drug Use"                                                                                                                             
# Check the frequency of substances for drivers
table(combined_data$driver_substance_abuse1)

                                ALCOHOL CONTRIBUTED 
                                                 22 
                                    ALCOHOL PRESENT 
                                                 54 
                         COMBINED SUBSTANCE PRESENT 
                                                  2 
                           ILLEGAL DRUG CONTRIBUTED 
                                                  2 
                               ILLEGAL DRUG PRESENT 
                                                  4 
                                 MEDICATION PRESENT 
                                                  3 
                                                N/A 
                                               1012 
                                      NONE DETECTED 
                                               3939 
Not Suspect of Alcohol Use, Not Suspect of Drug Use 
                                                678 
                Not Suspect of Alcohol Use, Unknown 
                                                  2 
                                              OTHER 
                                                  1 
    Suspect of Alcohol Use, Not Suspect of Drug Use 
                                                  3 
                    Suspect of Alcohol Use, Unknown 
                                                  4 
                                            UNKNOWN 
                                                700 
                                   Unknown, Unknown 
                                                135 
# Check the frequency of substances for non-motorists
table(combined_data$non_motorist_substance_abuse1)

                                                                                                                                          ALCOHOL CONTRIBUTED 
                                                                                                                                                           39 
                                                                                                                         ALCOHOL CONTRIBUTED, ALCOHOL PRESENT 
                                                                                                                                                            2 
                                                                                                                                              ALCOHOL PRESENT 
                                                                                                                                                          147 
                                                                                                                               ALCOHOL PRESENT, NONE DETECTED 
                                                                                                                                                            3 
                                                                                                                                      COMBINATION CONTRIBUTED 
                                                                                                                                                            2 
                                                                                                                                   COMBINED SUBSTANCE PRESENT 
                                                                                                                                                            1 
                                                                                                                                     ILLEGAL DRUG CONTRIBUTED 
                                                                                                                                                            2 
                                                                                                                                         ILLEGAL DRUG PRESENT 
                                                                                                                                                            7 
                                                                                                                                           MEDICATION PRESENT 
                                                                                                                                                            4 
                                                                                                                                                          N/A 
                                                                                                                                                         1203 
                                                                                                                                           N/A, NONE DETECTED 
                                                                                                                                                           53 
                                                                                                                                                 N/A, UNKNOWN 
                                                                                                                                                            2 
                                                                                                                                                NONE DETECTED 
                                                                                                                                                         4045 
                                                                                                                                       NONE DETECTED, UNKNOWN 
                                                                                                                                                            4 
                                                                                                          Not Suspect of Alcohol Use, Not Suspect of Drug Use 
                                                                                                                                                          670 
                                                     Not Suspect of Alcohol Use, Not Suspect of Drug Use, Not Suspect of Alcohol Use, Not Suspect of Drug Use 
                                                                                                                                                           54 
Not Suspect of Alcohol Use, Not Suspect of Drug Use, Not Suspect of Alcohol Use, Not Suspect of Drug Use, Not Suspect of Alcohol Use, Not Suspect of Drug Use 
                                                                                                                                                           18 
                                                                                                              Not Suspect of Alcohol Use, Suspect of Drug Use 
                                                                                                                                                            2 
                                                                                                                          Not Suspect of Alcohol Use, Unknown 
                                                                                                                                                            4 
                                                                                                                                                        OTHER 
                                                                                                                                                            1 
                                                                                                              Suspect of Alcohol Use, Not Suspect of Drug Use 
                                                                                                                                                           24 
                                                                                                                  Suspect of Alcohol Use, Suspect of Drug Use 
                                                                                                                                                            3 
                                                                                                                              Suspect of Alcohol Use, Unknown 
                                                                                                                                                           11 
                                                                                                                                                      UNKNOWN 
                                                                                                                                                          224 
                                                                                                                             Unknown, Not Suspect of Drug Use 
                                                                                                                                                            5 
                                                                                                                                             Unknown, Unknown 
                                                                                                                                                           31 

Prepare the data for logistic regression for drivers

# Prepare the data for logistic regression for drivers
# Create a binary variable for substance abuse (1 = substance, 0 = no substance)
combined_data$driver_substance_binary <- ifelse(combined_data$driver_substance_abuse1 %in% c("Alcohol", "Drugs"), 1, 0)

# For non-motorists
combined_data$non_motorist_substance_binary <- ifelse(combined_data$non_motorist_substance_abuse1 %in% c("Alcohol", "Drugs"), 1, 0)

# remove na
combined_data_cleaned <- combined_data[!is.na(combined_data$injury_severity1) & 
                                         !is.na(combined_data$driver_substance_binary) & 
                                         !is.na(combined_data$non_motorist_substance_binary), ]

logistic regression for driver_substance_abuse

driver_model <- glm(injury_severity1 ~ driver_substance_binary , 
                    data = combined_data_cleaned, 
                    family = binomial)
summary(driver_model)

Call:
glm(formula = injury_severity1 ~ driver_substance_binary, family = binomial, 
    data = combined_data_cleaned)

Coefficients: (1 not defined because of singularities)
                        Estimate Std. Error z value Pr(>|z|)    
(Intercept)             -3.04323    0.05959  -51.07   <2e-16 ***
driver_substance_binary       NA         NA      NA       NA    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2399.4  on 6481  degrees of freedom
Residual deviance: 2399.4  on 6481  degrees of freedom
AIC: 2401.4

Number of Fisher Scoring iterations: 5

Comments :

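After fitting the model, I could not obtain a coefficient for driver_substance_binary. The output reports it as "not defined because of singularities", which means the variable takes only one value in the cleaned data, so the model has nothing to compare. The likely cause is that none of the substance abuse categories exactly match "Alcohol" or "Drugs", which leaves the indicator at 0 in every row, rather than N/A values that could not be removed.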
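The singularity message can be reproduced on a small example. The following is a minimal sketch, assuming a made-up data frame `toy` whose category labels are hypothetical and not taken from the crash data. It shows that a predictor with a single value yields an NA coefficient, and how `table()` and a pattern match with `grepl()` could be used to check the coding before fitting.

```r
# Hypothetical toy data: the labels are illustrative, not the real categories
toy <- data.frame(
  injury = c(0, 1, 0, 0, 1, 0, 1, 0),
  substance = c("Alcohol Present", "None Detected", "Unknown",
                "Alcohol Contributed", "None Detected", "Unknown",
                "Illegal Drug Present", "None Detected")
)

# Exact matching finds no rows, so the indicator is constant (all 0)
toy$exact <- ifelse(toy$substance %in% c("Alcohol", "Drugs"), 1, 0)
table(toy$exact)

# Pattern matching picks up any label that mentions alcohol or drug
toy$pattern <- as.integer(grepl("alcohol|drug", toy$substance, ignore.case = TRUE))
table(toy$pattern)

# A constant predictor gives an NA coefficient; a varying one does not
fit_exact <- glm(injury ~ exact, data = toy, family = binomial)
fit_pattern <- glm(injury ~ pattern, data = toy, family = binomial)
coef(fit_exact)
coef(fit_pattern)
```

Running `table()` on the binary column before calling `glm()` is a quick way to confirm that both 0 and 1 actually occur.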

# Logistic regression for non-motorist substance abuse and injury severity
non_motorist_model <- glm(injury_severity1 ~ non_motorist_substance_binary,
                          data = combined_data, 
                          family = binomial)
summary(non_motorist_model)

Call:
glm(formula = injury_severity1 ~ non_motorist_substance_binary, 
    family = binomial, data = combined_data)

Coefficients: (1 not defined because of singularities)
                              Estimate Std. Error z value Pr(>|z|)    
(Intercept)                   -3.04323    0.05959  -51.07   <2e-16 ***
non_motorist_substance_binary       NA         NA      NA       NA    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2399.4  on 6481  degrees of freedom
Residual deviance: 2399.4  on 6481  degrees of freedom
  (79 observations deleted due to missingness)
AIC: 2401.4

Number of Fisher Scoring iterations: 5
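Because the predictor was dropped, both fits reduce to an intercept-only model. As a small worked example, the intercept reported above can be converted from log-odds to a probability with `plogis()`. What a value of 1 in injury_severity1 represents depends on how that column was coded earlier, so the interpretation here is an assumption.

```r
# Intercept (log-odds) copied from the model summary above
intercept <- -3.04323

# Convert log-odds to odds and to a probability
baseline_odds <- exp(intercept)
baseline_prob <- plogis(intercept)  # same as 1 / (1 + exp(-intercept))

round(baseline_prob, 3)
```

This gives a baseline probability of roughly 4.6% for the outcome coded as 1, across all rows used in the fit.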

7- Can we determine which agency responds the most just by looking at a map?

# Install the leaflet package only if it is not already installed
if (!requireNamespace("leaflet", quietly = TRUE)) {
  install.packages("leaflet", repos = "https://cloud.r-project.org/")
}
# Load the leaflet package
library(leaflet)
Warning: package 'leaflet' was built under R version 4.3.3
library(dplyr)

# Create a cleaned dataset for driver-related crashes
driver_data <- combined_data %>%
  filter(!is.na(latitude1) & !is.na(longitude1)) %>%
  select(local_case_number1, location1, latitude1, longitude1)

# Create a cleaned dataset for non-motorist-related crashes
non_motorist_data <- combined_data %>%
  filter(!is.na(latitude2) & !is.na(longitude2)) %>%
  select(local_case_number2, location2, latitude2, longitude2)

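# Create a leaflet map for visualization
leaflet() %>%
  addTiles() %>%  # Add default OpenStreetMap tiles
  addCircles(data = driver_data, lat = ~latitude1, lng = ~longitude1,
             color = "blue", radius = 50, popup = ~paste("Driver Accident: ", local_case_number1)) %>%
  # Plot non-motorist crashes in red so that the map matches the legend
  addCircles(data = non_motorist_data, lat = ~latitude2, lng = ~longitude2,
             color = "red", radius = 50, popup = ~paste("Non-Motorist Accident: ", local_case_number2)) %>%
  addLegend("bottomright", colors = c("blue", "red"), labels = c("Driver Accidents", "Non-Motorist Accidents"),
            title = "Accident Type")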
unique(combined_data$agency_name1)
 [1] "Gaithersburg Police Depar" "Montgomery County Police" 
 [3] "Rockville Police Departme" "Takoma Park Police Depart"
 [5] "Maryland-National Capital" "MONTGOMERY"               
 [7] "ROCKVILLE"                 "MCPARK"                   
 [9] "GAITHERSBURG"              "TAKOMA"                   
# Define colors for the agencies (based on their names)
agency_colors <- c(
  "Gaithersburg Police Depar" = "yellow", 
  "Montgomery County Police" = "green", 
  "Rockville Police Departme" = "purple", 
  "Takoma Park Police Depart" = "orange", 
  "Maryland-National Capital" = "red", 
  "MONTGOMERY" = "pink", 
  "ROCKVILLE" = "grey", 
  "MCPARK" = "lightblue", 
  "GAITHERSBURG" = "blue", 
  "TAKOMA" = "white"
)

# Create a cleaned dataset for driver-related crashes with agency information
driver_data <- combined_data %>%
  filter(!is.na(latitude1) & !is.na(longitude1)) %>%
  select(local_case_number1, location1, latitude1, longitude1, agency_name1)

# Create a cleaned dataset for non-motorist-related crashes with agency information
non_motorist_data <- combined_data %>%
  filter(!is.na(latitude2) & !is.na(longitude2)) %>%
  select(local_case_number2, location2, latitude2, longitude2, agency_name1)

# Create a leaflet map for visualization with agency-based coloring
# unname() drops the names from the color lookup, which avoids the jsonlite named-vector warning
leaflet() %>%
  addTiles() %>%  # Add default OpenStreetMap tiles
  addCircles(data = driver_data, lat = ~latitude1, lng = ~longitude1,
             color = ~unname(agency_colors[agency_name1]), radius = 50, 
             popup = ~paste("Driver Accident: ", local_case_number1, "<br>Agency: ", agency_name1)) %>%
  addCircles(data = non_motorist_data, lat = ~latitude2, lng = ~longitude2,
             color = ~unname(agency_colors[agency_name1]), radius = 50, 
             popup = ~paste("Non-Motorist Accident: ", local_case_number2, "<br>Agency: ", agency_name1)) %>%
  addLegend("bottomright", colors = c("yellow", "green", "purple", "orange", "red", 
                                      "pink", "grey", "lightblue", "blue", "white"), 
            labels = c("Gaithersburg Police", "Montgomery County Police", "Rockville Police", 
                       "Takoma Park Police", "Maryland-National Capital", "MONTGOMERY", 
                       "ROCKVILLE", "MCPARK", "GAITHERSBURG", "TAKOMA"),
            title = "Agency")
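A map alone makes it hard to judge which agency responds the most, because overlapping circles hide the counts. A simple count per agency can complement it. The sketch below is a minimal example on a hypothetical vector of agency labels. It assumes that the short upper-case labels (for example "MONTGOMERY") and the truncated long labels (for example "Montgomery County Police") refer to the same agency, and that "MCPARK" corresponds to "Maryland-National Capital"; both assumptions would need to be checked against the data documentation.

```r
# Hypothetical sample of agency labels in the two spellings seen above
agency <- c("Montgomery County Police", "MONTGOMERY", "ROCKVILLE",
            "Rockville Police Departme", "MONTGOMERY", "TAKOMA",
            "Gaithersburg Police Depar", "MONTGOMERY")

# Assumed mapping from each raw label to a single agency name
agency_lookup <- c(
  "Gaithersburg Police Depar" = "Gaithersburg Police",
  "GAITHERSBURG"              = "Gaithersburg Police",
  "Montgomery County Police"  = "Montgomery County Police",
  "MONTGOMERY"                = "Montgomery County Police",
  "Rockville Police Departme" = "Rockville Police",
  "ROCKVILLE"                 = "Rockville Police",
  "Takoma Park Police Depart" = "Takoma Park Police",
  "TAKOMA"                    = "Takoma Park Police",
  "Maryland-National Capital" = "Maryland-National Capital",
  "MCPARK"                    = "Maryland-National Capital"
)

# Harmonize the labels, then count and sort from most to least frequent
agency_clean <- unname(agency_lookup[agency])
agency_counts <- sort(table(agency_clean), decreasing = TRUE)
agency_counts

# Name of the agency with the most records in this toy sample
names(agency_counts)[1]
```

With the real data, `agency` would be replaced by `combined_data$agency_name1`, and the harmonized column could also drive the map colors so that each agency appears under one color.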

8- What factors influence the frequency of accidents on specific road types?

library(dplyr)

# Create a summary table with the accident count per road type
accident_count_by_road <- combined_data %>%
  group_by(route_type1) %>%
  summarize(accident_count = n())

# Check the structure and data
head(accident_count_by_road)
# A tibble: 6 × 2
  route_type1     accident_count
  <chr>                    <int>
1 ""                        1437
2 "Bicycle Route"             10
3 "County"                  2018
4 "County Route"             313
5 "Crossover"                  2
6 "Government"                43
# Chi-Square Test for association between Road Type and Weather
weather_road_table <- table(combined_data$route_type1, combined_data$weather1)

# Perform Chi-Square Test
chi_square_result <- chisq.test(weather_road_table)
Warning in chisq.test(weather_road_table): Chi-squared approximation may be
incorrect
print(chi_square_result)

    Pearson's Chi-squared test

data:  weather_road_table
X-squared = 984.63, df = 252, p-value < 2.2e-16
# Check the expected frequencies (reusing the test result computed above)
chi_square_result$expected
                        
                         BLOWING SNOW        CLEAR       CLOUDY
                         1.3141289438 1017.7928669 133.38408779
  Bicycle Route          0.0091449474    7.0827618   0.92821216
  County                 1.8454503887 1429.3013260 187.31321445
  County Route           0.2862368541  221.6904435  29.05304070
  Crossover              0.0018289895    1.4165524   0.18564243
  Government             0.0393232739   30.4558756   3.99131230
  Government Route       0.0027434842    2.1248285   0.27846365
  Interstate (State)     0.0064014632    4.9579332   0.64974851
  Local Route            0.0237768633   18.4151806   2.41335162
  Maryland (State)       1.5820759031 1225.3177869 160.58070416
  Maryland (State) Route 0.1655235482  128.1979881  16.80064015
  Municipality           0.4307270233  333.5980796  43.71879287
  Municipality Route     0.0676726109   52.4124371   6.86877000
  Other Public Roadway   0.0484682213   37.5386374   4.91952446
  Private Route          0.0100594422    7.7910380   1.02103338
  Ramp                   0.0192043896   14.8737997   1.94924554
  Service Road           0.0009144947    0.7082762   0.09282122
  Spur                   0.0027434842    2.1248285   0.27846365
  US (State)             0.1435756744  111.1993599  14.57293096
                        
                         FOG, SMOG, SMOKE       FOGGY
                             0.2190214906 4.818472794
  Bicycle Route              0.0015241579 0.033531474
  County                     0.3075750648 6.766651425
  County Route               0.0477061424 1.049535132
  Crossover                  0.0003048316 0.006706295
  Government                 0.0065538790 0.144185338
  Government Route           0.0004572474 0.010059442
  Interstate (State)         0.0010669105 0.023472032
  Local Route                0.0039628105 0.087181832
  Maryland (State)           0.2636793172 5.800944978
  Maryland (State) Route     0.0275872580 0.606919677
  Municipality               0.0717878372 1.579332419
  Municipality Route         0.0112787685 0.248132907
  Other Public Roadway       0.0080780369 0.177716811
  Private Route              0.0016765737 0.036884621
  Ramp                       0.0032007316 0.070416095
  Service Road               0.0001524158 0.003353147
  Spur                       0.0004572474 0.010059442
  US (State)                 0.0239292791 0.526444140
                        
                         FREEZING RAIN OR FREEZING DRIZZLE          N/A
                                              0.4380429813 101.62597165
  Bicycle Route                               0.0030483158   0.70720927
  County                                      0.6151501296 142.71483006
  County Route                                0.0954122847  22.13565005
  Crossover                                   0.0006096632   0.14144185
  Government                                  0.0131077580   3.04099985
  Government Route                            0.0009144947   0.21216278
  Interstate (State)                          0.0021338211   0.49504649
  Local Route                                 0.0079256211   1.83874409
  Maryland (State)                            0.5273586344 122.34720317
  Maryland (State) Route                      0.0551745161  12.80048773
  Municipality                                0.1435756744  33.30955647
  Municipality Route                          0.0225575370   5.23334857
  Other Public Roadway                        0.0161560738   3.74820911
  Private Route                               0.0033531474   0.77793019
  Ramp                                        0.0064014632   1.48513946
  Service Road                                0.0003048316   0.07072093
  Spur                                        0.0009144947   0.21216278
  US (State)                                  0.0478585581  11.10318549
                        
                               OTHER        RAIN      RAINING SEVERE WINDS
                         2.628257888 19.49291267 133.60310928  1.752171925
  Bicycle Route          0.018289895  0.13565005   0.92973632  0.012193263
  County                 3.690900777 27.37418077 187.62078951  2.460600518
  County Route           0.572473708  4.24584667  29.10074684  0.381649139
  Crossover              0.003657979  0.02713001   0.18594726  0.002438653
  Government             0.078646548  0.58329523   3.99786618  0.052431032
  Government Route       0.005486968  0.04069502   0.27892090  0.003657979
  Interstate (State)     0.012802926  0.09495504   0.65081542  0.008535284
  Local Route            0.047553727  0.35269014   2.41731443  0.031702484
  Maryland (State)       3.164151806 23.46745923 160.84438348  2.109434537
  Maryland (State) Route 0.331047096  2.45526597  16.82822740  0.220698064
  Municipality           0.861454047  6.38911751  43.79058070  0.574302698
  Municipality Route     0.135345222  1.00381039   6.88004877  0.090230148
  Other Public Roadway   0.096936443  0.71894528   4.92760250  0.064624295
  Private Route          0.020118884  0.14921506   1.02270995  0.013412590
  Ramp                   0.038408779  0.28486511   1.95244627  0.025605853
  Service Road           0.001828989  0.01356501   0.09297363  0.001219326
  Spur                   0.005486968  0.04069502   0.27892090  0.003657979
  US (State)             0.287151349  2.12970584  14.59686023  0.191434233
                        
                                SLEET         SNOW      UNKNOWN  WINTRY MIX
                         0.6570644719  8.103795153  8.103795153 3.066300869
  Bicycle Route          0.0045724737  0.056393842  0.056393842 0.021338211
  County                 0.9227251943 11.380277397 11.380277397 4.306050907
  County Route           0.1431184271  1.765127267  1.765127267 0.667885993
  Crossover              0.0009144947  0.011278768  0.011278768 0.004267642
  Government             0.0196616369  0.242493522  0.242493522 0.091754306
  Government Route       0.0013717421  0.016918153  0.016918153 0.006401463
  Interstate (State)     0.0032007316  0.039475690  0.039475690 0.014936747
  Local Route            0.0118884316  0.146623990  0.146623990 0.055479348
  Maryland (State)       0.7910379515  9.756134736  9.756134736 3.691510440
  Maryland (State) Route 0.0827617741  1.020728547  1.020728547 0.386221613
  Municipality           0.2153635117  2.656149977  2.656149977 1.005029721
  Municipality Route     0.0338363054  0.417314434  0.417314434 0.157902759
  Other Public Roadway   0.0242341107  0.298887365  0.298887365 0.113092516
  Private Route          0.0050297211  0.062033227  0.062033227 0.023472032
  Ramp                   0.0096021948  0.118427069  0.118427069 0.044810242
  Service Road           0.0004572474  0.005639384  0.005639384 0.002133821
  Spur                   0.0013717421  0.016918153  0.016918153 0.006401463
  US (State)             0.0717878372  0.885383326  0.885383326 0.335009907

Comments :

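The Chi-squared test shows a statistically significant association between road type and weather conditions among the recorded crashes, as the p-value is very small (p < 2.2e-16). On its own, this does not show that either factor changes the number of accidents; it only shows that the mix of weather conditions differs across road types. There is also a warning that the Chi-squared approximation may be incorrect. This happens when expected counts are small, and the expected frequencies above show many cells far below 5 because some road types and weather conditions have very few accidents.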
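One way to deal with small expected counts is to compute the p-value by Monte Carlo simulation instead of relying on the large-sample approximation. The sketch below is a minimal example on a hypothetical contingency table (the counts are invented for illustration); it uses the `simulate.p.value` and `B` arguments of `chisq.test()`.

```r
set.seed(1)  # make the simulated p-value reproducible

# Hypothetical contingency table with two sparse road types
tab <- matrix(c(120, 40, 15,
                 90, 35, 10,
                  2,  0,  1,
                  1,  1,  0),
              nrow = 4, byrow = TRUE,
              dimnames = list(road = c("County", "State", "Ramp", "Spur"),
                              weather = c("Clear", "Rain", "Snow")))

# Share of cells whose expected count is below 5
expected <- suppressWarnings(chisq.test(tab))$expected
mean(expected < 5)

# Monte Carlo p-value based on 2000 simulated tables
sim_test <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)
sim_test$p.value
```

Another option, not shown here, would be to merge rare road types or weather conditions into broader categories before running the test.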

Conclusion :

This data set, which included crash reports for both drivers and non-motorists, allowed me to explore various aspects of accidents and gain a deeper understanding of their impact. Through my analysis, I was able to answer several of my questions, although some remain unanswered. My goal was to highlight the significant risks that crashes pose to people’s lives, not only for drivers and passengers but also for pedestrians and other non-motorists who may become victims of road accidents. Additionally, I aimed to raise awareness about the dangers of substances such as alcohol and drugs, which can impair judgment and reaction times, ultimately leading to tragic consequences.

Beyond the statistics and numbers, this data set made me realize that accidents are more than just events recorded in reports; they represent real people, families, and communities affected by tragedy. It made me think beyond what we see on the surface and consider the long-term emotional, physical, and financial consequences that accidents can have. Understanding these patterns is crucial in working toward safer roads and encouraging responsible behavior among all road users. This research reinforced the importance of preventive measures and the need for better awareness to reduce accidents and save lives.

Url :

Crash Reporting - Incidents Data at https://data.montgomerycountymd.gov/Public-Safety/Crash-Reporting-Incidents-Data/bhju-22kf
Crash Reporting - Non-Motorists Data at https://data.montgomerycountymd.gov/Public-Safety/Crash-Reporting-Non-Motorists-Data/n7fk-dce5