Data 205 Project 1

Author

Aminata Sata Diatta

Introduction:

The data sets I chose contain details about crashes reported in Montgomery County. The first data set has over 100,000 entries with 39 columns, including both categorical and numerical variables. The second data set is similar but focuses only on non-motorists. I included non-motorists because, while drivers face risks, pedestrians and cyclists are also exposed to accidents. To better understand these two data sets, I decided to combine them. This combined data set will help answer key questions for my project.

Load the libraries and data sets

library(tidyverse)
Warning: package 'tidyverse' was built under R version 4.3.3
Warning: package 'ggplot2' was built under R version 4.3.3
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.0     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
crash_rep <- read.csv("C:/Users/satad/Downloads/Crash_Reporting_-_Drivers_Data_20250318.csv")
non_motorists <- read.csv("C:/Users/satad/Downloads/Crash_Reporting_-_Non-Motorists_Data_20250318 (1).csv")

I will start by inspecting the column names of both data sets, which I rename later to make my work easier.

colnames(crash_rep)
 [1] "Report.Number"                 "Local.Case.Number"            
 [3] "Agency.Name"                   "ACRS.Report.Type"             
 [5] "Crash.Date.Time"               "Route.Type"                   
 [7] "Road.Name"                     "Cross.Street.Name"            
 [9] "Off.Road.Description"          "Municipality"                 
[11] "Related.Non.Motorist"          "Collision.Type"               
[13] "Weather"                       "Surface.Condition"            
[15] "Light"                         "Traffic.Control"              
[17] "Driver.Substance.Abuse"        "Non.Motorist.Substance.Abuse" 
[19] "Person.ID"                     "Driver.At.Fault"              
[21] "Injury.Severity"               "Circumstance"                 
[23] "Driver.Distracted.By"          "Drivers.License.State"        
[25] "Vehicle.ID"                    "Vehicle.Damage.Extent"        
[27] "Vehicle.First.Impact.Location" "Vehicle.Body.Type"            
[29] "Vehicle.Movement"              "Vehicle.Going.Dir"            
[31] "Speed.Limit"                   "Driverless.Vehicle"           
[33] "Parked.Vehicle"                "Vehicle.Year"                 
[35] "Vehicle.Make"                  "Vehicle.Model"                
[37] "Latitude"                      "Longitude"                    
[39] "Location"                     
colnames(non_motorists)
 [1] "Report.Number"                "Local.Case.Number"           
 [3] "Agency.Name"                  "ACRS.Report.Type"            
 [5] "Crash.Date.Time"              "Route.Type"                  
 [7] "Road.Name"                    "Cross.Street.Name"           
 [9] "Off.Road.Description"         "Municipality"                
[11] "Related.Non.Motorist"         "Collision.Type"              
[13] "Weather"                      "Surface.Condition"           
[15] "Light"                        "Traffic.Control"             
[17] "Driver.Substance.Abuse"       "Non.Motorist.Substance.Abuse"
[19] "Person.ID"                    "Pedestrian.Type"             
[21] "Pedestrian.Movement"          "Pedestrian.Actions"          
[23] "Pedestrian.Location"          "At.Fault"                    
[25] "Injury.Severity"              "Safety.Equipment"            
[27] "Latitude"                     "Longitude"                   
[29] "Location"                    

Perform an inner join to combine the datasets on “Report.Number”

combined_data <- inner_join(crash_rep, non_motorists, by = "Report.Number")
Warning in inner_join(crash_rep, non_motorists, by = "Report.Number"): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 73 of `x` matches multiple rows in `y`.
ℹ Row 3739 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.
View(combined_data)
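The many-to-many warning above means that one report number can appear several times in both data sets, so the join returns every driver/non-motorist pairing for that report. The sketch below uses two small made-up data frames (`toy_drivers` and `toy_non_motorists` are hypothetical, not the project data) to show how the row count grows and how duplicated keys can be listed.

```r
library(dplyr)

# Hypothetical toy data: report A1 has two drivers and two non-motorists
toy_drivers <- data.frame(
  Report.Number = c("A1", "A1", "B2"),
  Vehicle.ID    = c("v1", "v2", "v3")
)
toy_non_motorists <- data.frame(
  Report.Number = c("A1", "A1", "C3"),
  Person.ID     = c("p1", "p2", "p3")
)

# Declaring the relationship silences the warning (dplyr >= 1.1.0)
toy_joined <- inner_join(
  toy_drivers, toy_non_motorists,
  by = "Report.Number",
  relationship = "many-to-many"
)

# Report A1 yields 2 x 2 = 4 rows; B2 and C3 have no match and are dropped
nrow(toy_joined)

# List report numbers that occur more than once in the drivers table
dup_keys <- toy_drivers %>%
  count(Report.Number) %>%
  filter(n > 1)
dup_keys
```

Because each pairing becomes its own row, counts taken from the joined data describe driver/non-motorist pairs rather than individual crashes.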

Load libraries

# dplyr and ggplot2 are already attached by library(tidyverse) above, and
# stats ships with base R, so no install.packages() calls are needed here.
library(dplyr)
library(ggplot2)

Rename the columns:

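rename_columns <- function(name) {
  name <- gsub("\\.", "_", name)   # replace dots with underscores
  name <- gsub("_x$", "1", name)   # join suffix .x (drivers data) becomes 1
  name <- gsub("_y$", "2", name)   # join suffix .y (non-motorists data) becomes 2
  tolower(name)                    # return the name in lowercase
}
colnames(combined_data) <- sapply(colnames(combined_data), rename_columns)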

View(combined_data)
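To see what the renaming does, the short sketch below applies the same function to three sample names. The `.x` and `.y` endings are the default suffixes that `inner_join()` adds to columns present in both data sets; the vector `sample_names` is only an illustration.

```r
# Same renaming logic as above, repeated so the example runs on its own
rename_columns <- function(name) {
  name <- gsub("\\.", "_", name)
  name <- gsub("_x$", "1", name)
  name <- gsub("_y$", "2", name)
  tolower(name)
}

# Illustrative names: two shared columns with join suffixes, plus the join key
sample_names <- c("Weather.x", "Injury.Severity.y", "Report.Number")

renamed <- vapply(sample_names, rename_columns, character(1), USE.NAMES = FALSE)
renamed
# "weather1" "injury_severity2" "report_number"
```

This is why the later code refers to `weather1` and `injury_severity1` (values from the drivers data) and `related_non_motorist2` (values from the non-motorists data).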

1- Is there a relationship between weather conditions and injury in a crash?

table(combined_data$weather1)

                     Blowing Snow                      BLOWING SNOW 
                                3                                 3 
                            Clear                             CLEAR 
                              639                              4008 
                           Cloudy                            CLOUDY 
                               74                               535 
                 Fog, Smog, Smoke                             FOGGY 
                                1                                22 
Freezing Rain Or Freezing Drizzle                               N/A 
                                2                               464 
                            OTHER                              Rain 
                               12                                89 
                          RAINING                      SEVERE WINDS 
                              610                                 8 
                            SLEET                              Snow 
                                3                                 8 
                             SNOW                           Unknown 
                               29                                 6 
                          UNKNOWN                        WINTRY MIX 
                               31                                14 
table(combined_data$injury_severity1)

                               No Apparent Injury       NO APPARENT INJURY 
                      79                      713                     5474 
         Possible Injury          POSSIBLE INJURY   Suspected Minor Injury 
                      10                      141                       19 
  SUSPECTED MINOR INJURY Suspected Serious Injury SUSPECTED SERIOUS INJURY 
                     106                        1                       18 

Convert to lowercase and trim whitespace

# Lowercase and trim in a single step so that both changes are kept
combined_data$weather_clean <- tolower(trimws(combined_data$weather1))

# Now recode the values properly
combined_data$weather_clean <- dplyr::recode(combined_data$weather_clean,
  "clear" = "CLEAR",
  "cloudy" = "CLOUDY",
  "rain" = "RAIN",
  "raining" = "RAIN",
  "snow" = "SNOW",
  "blowing snow" = "SNOW",
  "fog, smog, smoke" = "FOG",
  "foggy" = "FOG",
  "severe winds" = "WIND",
  "freezing rain or freezing drizzle" = "FREEZING RAIN",
  "sleet" = "FREEZING RAIN",
  "wintry mix" = "FREEZING RAIN",
  "n/a" = NA_character_,
  "unknown" = NA_character_,
  .default = "Other"
)

Remove rows with missing weather values

combined_data <- combined_data[!is.na(combined_data$weather_clean), ]
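The cleaning steps above (normalize case, recode to a short label, drop missing values) can be checked on a handful of values before running them on the full data. This is a minimal sketch with a made-up vector `raw_weather`; it uses `dplyr::recode()` as in the code above, although `dplyr::case_match()` is the newer alternative.

```r
library(dplyr)

# Made-up raw values with mixed case, stray spaces, and an unlisted category
raw_weather <- c(" Clear ", "RAINING", "rain", "N/A", "hail")

# Step 1: normalize case and whitespace
norm_weather <- tolower(trimws(raw_weather))

# Step 2: map known labels, send "n/a" to NA, everything else to "Other"
clean_weather <- recode(norm_weather,
  "clear"   = "CLEAR",
  "rain"    = "RAIN",
  "raining" = "RAIN",
  "n/a"     = NA_character_,
  .default  = "Other"
)
clean_weather

# Step 3: drop the missing values
kept_weather <- clean_weather[!is.na(clean_weather)]
kept_weather
```

Note that any label not listed explicitly ends up in "Other", so a misspelled key silently moves rows into that group.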

Recode the injury severity labels

combined_data$injury_clean <- tolower(combined_data$injury_severity1)
combined_data$injury_clean <- trimws(combined_data$injury_clean)

combined_data$injury_clean <- dplyr::recode(combined_data$injury_clean,
  "no apparent injury" = "No Injury",
  "possible injury" = "Possible",
  "suspected minor injury" = "Minor",
  "suspected serious injury" = "Serious",
  .default = "Other"
)
table(combined_data$weather_clean, combined_data$injury_clean)
               
                Minor No Injury Other Possible Serious
  CLEAR            99      4355    63      117      13
  CLOUDY            7       580     8       12       2
  FOG               0        22     0        1       0
  FREEZING RAIN     1        17     0        0       1
  Other             0        11     0        1       0
  RAIN             11       673     6        7       2
  SNOW              1        41     1        0       0
  WIND              1         7     0        0       0

Proportion and percentage of injury

# Get the proportion of each injury category
injury_props <- prop.table(table(combined_data$injury_clean))

# Convert to data frame for better display or plotting
injury_df <- as.data.frame(injury_props)

# rename columns for clarity
colnames(injury_df) <- c("Injury_Severity", "Proportion")

# Multiply proportions by 100 to get percentages
injury_df$Percentage <- round(injury_df$Proportion * 100, 2)

# Print the result
print(injury_df)
  Injury_Severity  Proportion Percentage
1           Minor 0.019801980       1.98
2       No Injury 0.941584158      94.16
3           Other 0.012871287       1.29
4        Possible 0.022772277       2.28
5         Serious 0.002970297       0.30

Create the bar plot that represents injury severity by weather condition

# Reorder the injury severity levels
combined_data$injury_clean <- factor(combined_data$injury_clean, 
                                     levels = c("Serious", "Possible", "Other", "Minor", "No Injury"))

# Create the bar plot
ggplot(combined_data, aes(x = weather_clean, fill = injury_clean)) +
  geom_bar(position = "fill") +
  labs(
    title = "Injury Severity by Weather Condition",
    x = "Weather Condition",
    y = "Proportion",
    fill = "Injury Severity"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Analysis :

The bar plot shows the relationship between weather conditions and injury severity in traffic accidents. Each bar represents a different weather condition, and the colors show the proportion of injury types. The pink color, which represents “No Injury,” dominates most weather conditions. Some weather types, like “WIND” and “FREEZING RAIN,” show more of the other colors, meaning more injuries. Overall, the graph helps us compare how often different types of injuries happen depending on the weather.

Interpretation :

Based on the bar plot, windy weather shows the largest share of minor injuries (1 of 8 crashes). Possible injuries appear under fog, cloudy, and other weather conditions, and freezing rain shows the largest share of serious injuries (1 of 19 crashes). The mix of injury types therefore differs by weather condition, although the windy, foggy, and freezing-rain groups contain very few crashes, so these differences should be read with caution. According to this result, weather may be one of the factors that influence crash outcomes.
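The proportions drawn in the bar plot can also be read as numbers. The sketch below re-enters the counts from the weather-by-injury table printed earlier and converts each row to percentages with `prop.table()`; the object names `counts`, `row_totals`, and `row_pct` are introduced here only for illustration.

```r
# Counts copied from the weather-by-injury table above
counts <- matrix(
  c(99, 4355, 63, 117, 13,
     7,  580,  8,  12,  2,
     0,   22,  0,   1,  0,
     1,   17,  0,   0,  1,
     0,   11,  0,   1,  0,
    11,  673,  6,   7,  2,
     1,   41,  1,   0,  0,
     1,    7,  0,   0,  0),
  nrow = 8, byrow = TRUE,
  dimnames = list(
    weather = c("CLEAR", "CLOUDY", "FOG", "FREEZING RAIN",
                "Other", "RAIN", "SNOW", "WIND"),
    injury  = c("Minor", "No Injury", "Other", "Possible", "Serious")
  )
)

# Number of crashes behind each bar
row_totals <- rowSums(counts)
row_totals

# Percentage of each injury category within a weather condition
row_pct <- round(100 * prop.table(counts, margin = 1), 2)
row_pct
```

Looking at `row_totals` next to `row_pct` makes it easier to judge how much weight a percentage deserves: a single crash in the WIND row already accounts for 12.5% of that bar.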

2- Light conditions vs. injury severity

Before we do that, we create a binary column where 1 means that the injury was serious and 0 means any other injury category.

combined_data$injury_severity_binary <- ifelse(combined_data$injury_clean == "Serious", 1, 0)
table(combined_data$injury_clean)

  Serious  Possible     Other     Minor No Injury 
       18       138        78       120      5706 
injury_percentages <- combined_data %>%
  count(injury_clean) %>%
  mutate(percentage = (n / sum(n)) * 100)

# Create bar chart
ggplot(injury_percentages, aes(x = injury_clean, y = percentage, fill = injury_clean)) +
  geom_bar(stat = "identity") +
  labs(title = "Injury Severity Distribution",
       x = "Injury Category",
       y = "Percentage (%)") +
  theme_minimal() +
  theme(
    legend.position = "none",
    panel.grid.major = element_blank(),  
    panel.grid.minor = element_blank(), 
    panel.background = element_blank(),  
    plot.background = element_blank()  ) +
  geom_text(aes(label = sprintf("%.2f%%", percentage)), vjust = -0.5)

Analysis :

The graph above shows the percentage of different injury levels. After doing the inner join, we can see that 94% of the accidents resulted in “No injury,” while only 0.30% were serious. These percentages will help explain the results we find later and give us a sense of how the data is distributed.
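As a quick check, the percentages quoted above can be reproduced from the category counts printed by `table(combined_data$injury_clean)`. The vector name `injury_counts` is introduced here only for this sketch.

```r
# Counts copied from the injury_clean table above
injury_counts <- c(
  "Serious"   = 18,
  "Possible"  = 138,
  "Other"     = 78,
  "Minor"     = 120,
  "No Injury" = 5706
)

# Share of each category, in percent, rounded to two decimals
injury_pct <- round(100 * injury_counts / sum(injury_counts), 2)
injury_pct
```

The share of serious injuries is also the mean of the 0/1 column `injury_severity_binary`, since averaging an indicator gives the proportion of ones.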

Convert the light column to lowercase

combined_data2 <- combined_data %>%
  mutate(light1 = tolower(light1))

Create a heatmap for light conditions vs. injury severity

Let's start by standardizing the values in the light column.

heatmap_data <- combined_data2 %>%
  count(light1, injury_clean)

unique(combined_data2$light1)
 [1] "daylight"                 "dark lights on"          
 [3] "n/a"                      "dawn"                    
 [5] "dark no lights"           "dusk"                    
 [7] "dark -- unknown lighting" "unknown"                 
 [9] "other"                    "dark - not lighted"      
[11] "dark - lighted"           "dark - unknown lighting" 
combined_data$light1_clean <- combined_data2$light1

combined_data$light1_clean <- recode(combined_data$light1_clean,
  "dark - unknown lighting" = "unknown",
  "dark - lighted" = "dark lights on",
  "dark -- unknown lighting" = "unknown" ,
  "dark - not lighted" = "dark no lights"
)

combined_data3 <- combined_data %>%
  filter(light1_clean != "n/a")

table(combined_data$light1_clean)

dark lights on dark no lights           dawn       daylight           dusk 
          1549            240            115           3899            127 
           n/a          other        unknown 
            40             16             74 

Heatmap of Injury Severity by Light Condition

heatmap_data_norm <- combined_data3 %>%
  count(light1_clean, injury_clean) %>%
  group_by(light1_clean) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

ggplot(heatmap_data_norm, aes(x = light1_clean, y = injury_clean, fill = prop)) +
  geom_tile(color = "black") +
  scale_fill_gradient(low = "yellow", high = "red") +
  labs(
    title = "Heatmap of Injury Severity by Light Condition",
    x = "Light Condition",
    y = "Injury Severity",
    fill = "Proportion"
  ) +
  theme(
    panel.grid.major = element_blank(),  
    panel.grid.minor = element_blank(), 
    panel.background = element_blank(),  
    plot.background = element_blank(),
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

Analysis :

The heatmap titled “Heatmap of Injury Severity by Light Condition” illustrates the proportion of each injury severity under various lighting conditions during accidents. The x-axis represents the light conditions (such as daylight, dark with no lights, and dusk), while the y-axis indicates injury severity levels ranging from “No Injury” to “Serious.” The colors range from yellow (low proportion) to red (high proportion). A notable pattern is that the “No Injury” row consistently shows the darkest red across nearly all light conditions, indicating that the majority of reported incidents resulted in no injuries, regardless of lighting. Meanwhile, all other injury categories, such as “Minor,” “Possible,” or “Serious,” show much lower proportions (mostly yellow), suggesting they are less common.

Interpretation:

Given that “No Injury” dominates the dataset, the heatmap primarily highlights how low the frequency of injuries is across all light conditions, rather than identifying conditions that are more dangerous. This imbalance in the dataset can mask smaller trends in actual injury rates. For instance, although it is possible that poor lighting (like “dark no lights”) may contribute to more severe injuries, the heatmap cannot clearly show that due to the overwhelming presence of non-injury cases.
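One possible way to look past this imbalance is to drop the “No Injury” rows and recompute the proportions among injured cases only. The sketch below is not part of the original analysis, and the counts in `toy_light` are invented purely to show the calculation; they are not results from the crash data.

```r
library(dplyr)

# Invented counts for illustration only (not taken from the crash data)
toy_light <- data.frame(
  light  = c("daylight", "daylight", "daylight",
             "dark no lights", "dark no lights", "dark no lights"),
  injury = c("No Injury", "Minor", "Serious",
             "No Injury", "Minor", "Serious"),
  n      = c(90, 8, 2, 40, 6, 4)
)

# Keep injured cases only, then normalize within each light condition
injured_only <- toy_light %>%
  filter(injury != "No Injury") %>%
  group_by(light) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

injured_only
```

With the dominant category removed, the color scale of a heatmap built from `prop` would spread over the remaining categories instead of being saturated by a single row. The trade-off is that the groups become smaller, so each proportion rests on fewer crashes.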

3- Do weather conditions play a role in accidents involving non-motorists?

table(combined_data$weather1)

                     Blowing Snow                      BLOWING SNOW 
                                3                                 3 
                            Clear                             CLEAR 
                              639                              4008 
                           Cloudy                            CLOUDY 
                               74                               535 
                 Fog, Smog, Smoke                             FOGGY 
                                1                                22 
Freezing Rain Or Freezing Drizzle                             OTHER 
                                2                                12 
                             Rain                           RAINING 
                               89                               610 
                     SEVERE WINDS                             SLEET 
                                8                                 3 
                             Snow                              SNOW 
                                8                                29 
                       WINTRY MIX 
                               14 

Recode the weather1 column

combined_data$weather1 <- tolower(trimws(combined_data$weather1))

combined_data$weather1_clean <- recode(combined_data$weather1,
  "clear" = "CLEAR",
  "cloudy" = "CLOUDY",
  "rain" = "RAIN",
  "raining" = "RAIN",
  "blowing snow" = "SNOW",
  # weather1 was lowercased above, so the capitalized key "Snow" never
  # matches; "snow" rows fall through to .default and are counted as OTHER
  "Snow" = "SNOW",
  "sleet" = "RAIN",
  "freezing rain or freezing drizzle" = "RAIN",
  "foggy" = "FOG",
  "fog, smog, smoke" = "FOG",
  "severe winds" = "WIND",
  "wintry mix" = "RAIN",
  .default = "OTHER")
table(combined_data$weather1_clean)

 CLEAR CLOUDY    FOG  OTHER   RAIN   SNOW   WIND 
  4647    609     23     49    718      6      8 

Let's cross-tabulate the non-motorist type against the recoded weather condition.

table(combined_data$related_non_motorist2, combined_data$weather1_clean)
                                                                                        
                                                                                         CLEAR
  BICYCLIST                                                                                920
  BICYCLIST, OTHER                                                                           6
  BICYCLIST, PEDESTRIAN                                                                      7
  Cyclist (Electric)                                                                        19
  Cyclist (non-electric)                                                                   102
  IN ANIMAL-DRAWN VEH                                                                        0
  MACHINE OPERATOR/RIDER                                                                    28
  Occupant Of a Non-Motor Vehicle Transportation Device                                      1
  Occupant of Motor Vehicle Not in Transport                                                 6
  Occupant of Motor Vehicle Not in Transport, Pedestrian                                     2
  OTHER                                                                                    184
  OTHER CONVEYANCE                                                                          64
  OTHER CONVEYANCE, PEDESTRIAN                                                               2
  OTHER PEDALCYCLIST                                                                        19
  Other Pedestrian (person in a building, skater, personal conveyance, etc.)                12
  Other Pedestrian (person in a building, skater, personal conveyance, etc.), Pedestrian     3
  OTHER, OTHER CONVEYANCE                                                                    0
  OTHER, PEDESTRIAN                                                                         15
  Pedestrian                                                                               443
  PEDESTRIAN                                                                              2763
  Scooter (electric)                                                                        33
  Scooter (non-Electric)                                                                     3
  Unknown                                                                                    1
  Unknown Type Of Non-Motorist                                                               7
  Unknown, Wheelchair (electric)                                                             2
  Wheelchair (electric)                                                                      4
  Wheelchair (non-electric)                                                                  1
                                                                                        
                                                                                         CLOUDY
  BICYCLIST                                                                                 116
  BICYCLIST, OTHER                                                                            0
  BICYCLIST, PEDESTRIAN                                                                       0
  Cyclist (Electric)                                                                          4
  Cyclist (non-electric)                                                                     14
  IN ANIMAL-DRAWN VEH                                                                         0
  MACHINE OPERATOR/RIDER                                                                      3
  Occupant Of a Non-Motor Vehicle Transportation Device                                       1
  Occupant of Motor Vehicle Not in Transport                                                  2
  Occupant of Motor Vehicle Not in Transport, Pedestrian                                      0
  OTHER                                                                                      19
  OTHER CONVEYANCE                                                                            7
  OTHER CONVEYANCE, PEDESTRIAN                                                                0
  OTHER PEDALCYCLIST                                                                          2
  Other Pedestrian (person in a building, skater, personal conveyance, etc.)                  0
  Other Pedestrian (person in a building, skater, personal conveyance, etc.), Pedestrian      0
  OTHER, OTHER CONVEYANCE                                                                     0
  OTHER, PEDESTRIAN                                                                           4
  Pedestrian                                                                                 47
  PEDESTRIAN                                                                                384
  Scooter (electric)                                                                          4
  Scooter (non-Electric)                                                                      0
  Unknown                                                                                     1
  Unknown Type Of Non-Motorist                                                                0
  Unknown, Wheelchair (electric)                                                              0
  Wheelchair (electric)                                                                       1
  Wheelchair (non-electric)                                                                   0
                                                                                        
                                                                                          FOG
  BICYCLIST                                                                                 2
  BICYCLIST, OTHER                                                                          0
  BICYCLIST, PEDESTRIAN                                                                     0
  Cyclist (Electric)                                                                        0
  Cyclist (non-electric)                                                                    0
  IN ANIMAL-DRAWN VEH                                                                       1
  MACHINE OPERATOR/RIDER                                                                    0
  Occupant Of a Non-Motor Vehicle Transportation Device                                     0
  Occupant of Motor Vehicle Not in Transport                                                0
  Occupant of Motor Vehicle Not in Transport, Pedestrian                                    0
  OTHER                                                                                     1
  OTHER CONVEYANCE                                                                          0
  OTHER CONVEYANCE, PEDESTRIAN                                                              0
  OTHER PEDALCYCLIST                                                                        0
  Other Pedestrian (person in a building, skater, personal conveyance, etc.)                0
  Other Pedestrian (person in a building, skater, personal conveyance, etc.), Pedestrian    0
  OTHER, OTHER CONVEYANCE                                                                   0
  OTHER, PEDESTRIAN                                                                         0
  Pedestrian                                                                                1
  PEDESTRIAN                                                                               18
  Scooter (electric)                                                                        0
  Scooter (non-Electric)                                                                    0
  Unknown                                                                                   0
  Unknown Type Of Non-Motorist                                                              0
  Unknown, Wheelchair (electric)                                                            0
  Wheelchair (electric)                                                                     0
  Wheelchair (non-electric)                                                                 0
                                                                                        
                                                                                         OTHER
  BICYCLIST                                                                                  0
  BICYCLIST, OTHER                                                                           0
  BICYCLIST, PEDESTRIAN                                                                      0
  Cyclist (Electric)                                                                         0
  Cyclist (non-electric)                                                                     1
  IN ANIMAL-DRAWN VEH                                                                        0
  MACHINE OPERATOR/RIDER                                                                     0
  Occupant Of a Non-Motor Vehicle Transportation Device                                      1
  Occupant of Motor Vehicle Not in Transport                                                 0
  Occupant of Motor Vehicle Not in Transport, Pedestrian                                     0
  OTHER                                                                                      0
  OTHER CONVEYANCE                                                                           0
  OTHER CONVEYANCE, PEDESTRIAN                                                               0
  OTHER PEDALCYCLIST                                                                         0
  Other Pedestrian (person in a building, skater, personal conveyance, etc.)                 0
  Other Pedestrian (person in a building, skater, personal conveyance, etc.), Pedestrian     0
  OTHER, OTHER CONVEYANCE                                                                    0
  OTHER, PEDESTRIAN                                                                          0
  Pedestrian                                                                                 6
  PEDESTRIAN                                                                                41
  Scooter (electric)                                                                         0
  Scooter (non-Electric)                                                                     0
  Unknown                                                                                    0
  Unknown Type Of Non-Motorist                                                               0
  Unknown, Wheelchair (electric)                                                             0
  Wheelchair (electric)                                                                      0
  Wheelchair (non-electric)                                                                  0
                                                                                        
                                                                                         RAIN
  BICYCLIST                                                                                61
  BICYCLIST, OTHER                                                                          0
  BICYCLIST, PEDESTRIAN                                                                     0
  Cyclist (Electric)                                                                        1
  Cyclist (non-electric)                                                                    6
  IN ANIMAL-DRAWN VEH                                                                       0
  MACHINE OPERATOR/RIDER                                                                    4
  Occupant Of a Non-Motor Vehicle Transportation Device                                     0
  Occupant of Motor Vehicle Not in Transport                                                0
  Occupant of Motor Vehicle Not in Transport, Pedestrian                                    0
  OTHER                                                                                    30
  OTHER CONVEYANCE                                                                          5
  OTHER CONVEYANCE, PEDESTRIAN                                                              0
  OTHER PEDALCYCLIST                                                                        0
  Other Pedestrian (person in a building, skater, personal conveyance, etc.)                0
  Other Pedestrian (person in a building, skater, personal conveyance, etc.), Pedestrian    0
  OTHER, OTHER CONVEYANCE                                                                   0
  OTHER, PEDESTRIAN                                                                         0
  Pedestrian                                                                               81
  PEDESTRIAN                                                                              527
  Scooter (electric)                                                                        2
  Scooter (non-Electric)                                                                    1
  Unknown                                                                                   0
  Unknown Type Of Non-Motorist                                                              0
  Unknown, Wheelchair (electric)                                                            0
  Wheelchair (electric)                                                                     0
  Wheelchair (non-electric)                                                                 0
                                                                                        
                                                                                         SNOW
  BICYCLIST                                                                                 0
  BICYCLIST, OTHER                                                                          0
  BICYCLIST, PEDESTRIAN                                                                     0
  Cyclist (Electric)                                                                        0
  Cyclist (non-electric)                                                                    0
  IN ANIMAL-DRAWN VEH                                                                       0
  MACHINE OPERATOR/RIDER                                                                    0
  Occupant Of a Non-Motor Vehicle Transportation Device                                     0
  Occupant of Motor Vehicle Not in Transport                                                0
  Occupant of Motor Vehicle Not in Transport, Pedestrian                                    0
  OTHER                                                                                     1
  OTHER CONVEYANCE                                                                          1
  OTHER CONVEYANCE, PEDESTRIAN                                                              0
  OTHER PEDALCYCLIST                                                                        0
  Other Pedestrian (person in a building, skater, personal conveyance, etc.)                0
  Other Pedestrian (person in a building, skater, personal conveyance, etc.), Pedestrian    0
  OTHER, OTHER CONVEYANCE                                                                   0
  OTHER, PEDESTRIAN                                                                         0
  Pedestrian                                                                                3
  PEDESTRIAN                                                                                1
  Scooter (electric)                                                                        0
  Scooter (non-Electric)                                                                    0
  Unknown                                                                                   0
  Unknown Type Of Non-Motorist                                                              0
  Unknown, Wheelchair (electric)                                                            0
  Wheelchair (electric)                                                                     0
  Wheelchair (non-electric)                                                                 0
                                                                                        
                                                                                         WIND
  BICYCLIST                                                                                 1
  BICYCLIST, OTHER                                                                          0
  BICYCLIST, PEDESTRIAN                                                                     0
  Cyclist (Electric)                                                                        0
  Cyclist (non-electric)                                                                    0
  IN ANIMAL-DRAWN VEH                                                                       0
  MACHINE OPERATOR/RIDER                                                                    0
  Occupant Of a Non-Motor Vehicle Transportation Device                                     0
  Occupant of Motor Vehicle Not in Transport                                                0
  Occupant of Motor Vehicle Not in Transport, Pedestrian                                    0
  OTHER                                                                                     0
  OTHER CONVEYANCE                                                                          0
  OTHER CONVEYANCE, PEDESTRIAN                                                              0
  OTHER PEDALCYCLIST                                                                        0
  Other Pedestrian (person in a building, skater, personal conveyance, etc.)                0
  Other Pedestrian (person in a building, skater, personal conveyance, etc.), Pedestrian    0
  OTHER, OTHER CONVEYANCE                                                                   2
  OTHER, PEDESTRIAN                                                                         0
  Pedestrian                                                                                0
  PEDESTRIAN                                                                                5
  Scooter (electric)                                                                        0
  Scooter (non-Electric)                                                                    0
  Unknown                                                                                   0
  Unknown Type Of Non-Motorist                                                              0
  Unknown, Wheelchair (electric)                                                            0
  Wheelchair (electric)                                                                     0
  Wheelchair (non-electric)                                                                 0

Let’s load the libraries:

Note: I asked ChatGPT for help with the code that inserts the background image.

non_motorist_weather <- combined_data %>%
  filter(!is.na(weather1_clean), !is.na(related_non_motorist2)) %>%
  group_by(weather1_clean) %>%
  summarise(count = n())

library(jpeg)
Warning: package 'jpeg' was built under R version 4.3.3
library(grid)

img <- readJPEG("C:/Users/satad/Pictures/Screenshots/car-accident-data205.jpg")
g <- rasterGrob(img, width = unit(1, "npc"), height = unit(1, "npc"), interpolate = TRUE)

A bar graph of the number of accidents involving non-motorists, by weather condition:

ggplot(non_motorist_weather, aes(x = weather1_clean, y = count)) +
  annotation_custom(g, xmin = -Inf, xmax = Inf, ymin = -Inf, ymax = Inf) +
  geom_col(fill = "darkred", alpha = 0.8) +
  labs(
    title = "Non-Motorist Accidents by Weather Condition",
    x = "Weather Condition",
    y = "Number of Accidents"
  ) +
  theme_minimal(base_size = 14) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    panel.background = element_rect(fill = NA),
    plot.background = element_rect(fill = NA)
  )

Analysis :

The bar chart represents the number of non-motorist accidents under different weather conditions. The tallest bar is for clear weather, showing that most non-motorist accidents happened when the weather was clear. Rain and cloudy conditions also had some non-motorist accidents, but far fewer than clear weather. Fog, snow, wind, and other weather types had very low accident counts.

Interpretation :

The chart suggests that clear weather does not prevent accidents involving non-motorists. In fact, more of these accidents happen during clear weather, possibly because more people are outside walking or biking when the weather is nice. Bad weather like fog, snow, and wind shows fewer non-motorist crashes, which may be because fewer people are outside in those conditions. This means that while dangerous weather can affect driving, non-motorist accidents are more common when the weather is good. Drivers should still be alert for pedestrians and bicyclists even when the weather is clear.

Let’s do a chi-square test:

# Simplify to yes/no: was a non-motorist involved
combined_data$non_motorist_involved <- ifelse(!is.na(combined_data$related_non_motorist2), "Yes", "No")

weather_vs_non_motorist <- table(combined_data$weather1_clean, combined_data$non_motorist_involved)
chisq.test(weather_vs_non_motorist)

    Chi-squared test for given probabilities

data:  weather_vs_non_motorist
X-squared = 19912, df = 6, p-value < 2.2e-16

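Analysis :

The chi-square test was meant to check the relationship between weather conditions and non-motorist involvement in crashes. The output header, however, reads "Chi-squared test for given probabilities", which is what R prints for a goodness-of-fit test. chisq.test() falls back to that test when the table has only one row or one column, which suggests that every row of combined_data ended up in a single non_motorist_involved category. What was actually tested is whether the crashes are spread evenly across the seven weather categories. The test returned a chi-square value of 19912 with 6 degrees of freedom and a p-value below 2.2e-16.

Interpretation :

Because p < 0.05, we can reject the idea that non-motorist crashes are evenly distributed across weather conditions: the counts are heavily concentrated in clear weather, likely because more people are outside. This result does not, by itself, show that weather changes the chance that a crash involves a non-motorist; that would require a table with both "Yes" and "No" columns. Weather is still a useful factor to keep in view when studying traffic safety for non-motorists, but the pattern here may reflect how often each weather condition occurs as much as any effect of the weather itself.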
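As a rough sketch of how chisq.test() chooses between a test of independence and a goodness-of-fit test, the toy example below builds a two-way table and a one-column table and prints the method selected for each. It uses base R only; the data frame `toy` and all of its values are made up for illustration and are not taken from the crash data.

```r
# Toy data (made up for illustration, not from the crash files)
toy <- data.frame(
  weather = c("CLEAR", "CLEAR", "CLEAR", "RAIN", "RAIN", "SNOW", "CLEAR", "RAIN"),
  non_motorist_involved = c("Yes", "No", "No", "Yes", "No", "No", "Yes", "No")
)

# Two-way table: weather in rows, Yes/No in columns
two_way <- table(toy$weather, toy$non_motorist_involved)

# A test of independence needs at least two rows and two columns
stopifnot(all(dim(two_way) >= 2))

# suppressWarnings(): the tiny toy counts trigger a small-sample warning
independence <- suppressWarnings(chisq.test(two_way))
print(independence$method)

# If every row falls in one category, the table has a single column and
# chisq.test() treats it as a vector of counts (goodness-of-fit test)
one_column <- table(toy$weather, rep("Yes", nrow(toy)))
goodness_of_fit <- suppressWarnings(chisq.test(one_column))
print(goodness_of_fit$method)
```

Checking dim() on the table before calling chisq.test() is a cheap way to confirm which of the two tests will be run.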

Standardize the non-motorist labels:

# Lower-case and trim the labels so spellings that differ only by case are merged
combined_data$non_motorist_clean <- tolower(trimws(combined_data$related_non_motorist2))

# Collapse the labels into a few groups; anything not listed goes to "OTHER"
combined_data$non_motorist_clean <- dplyr::recode(combined_data$non_motorist_clean,
  "pedestrian" = "PEDESTRIAN",
  "bicyclist" = "BICYCLIST",
  "cyclist (electric)" = "BICYCLIST",
  "cyclist (non-electric)" = "BICYCLIST",
  "scooter (electric)" = "SCOOTER",
  "scooter (non-electric)" = "SCOOTER",
  "wheelchair (electric)" = "WHEELCHAIR",
  "wheelchair (non-electric)" = "WHEELCHAIR",
  # Note: section 5 groups "MACHINE OPERATOR/RIDER" under "Other" instead
  "machine operator/rider" = "WHEELCHAIR",
  "other" = "OTHER",
  .default = "OTHER"
)

Bubble chart of non-motorist crash proportions by weather condition

# Count non-motorist crashes for each weather condition and non-motorist type
dot_data <- combined_data %>%
  filter(!is.na(non_motorist_clean), !is.na(weather1_clean))
dot_data_summary <- dot_data %>%
  group_by(weather1_clean, non_motorist_clean) %>%
  summarise(crash_count = n()) %>%
  ungroup()
`summarise()` has grouped output by 'weather1_clean'. You can override using
the `.groups` argument.
# Calculate the total number of crashes
total_crashes <- sum(dot_data_summary$crash_count)

# Share of all non-motorist crashes in each weather/type combination
dot_data_summary <- dot_data_summary %>%
  mutate(proportion = crash_count / total_crashes)

# Bubble size and fill both map to the 'proportion' column
ggplot(dot_data_summary, aes(x = weather1_clean, y = non_motorist_clean, size = proportion, fill = proportion)) +
  geom_point(alpha = 0.9, shape = 21, color = "black") + 
  scale_size_area(max_size = 18) +
  scale_fill_gradient(low = "lightyellow", high = "red") +
  labs(
    title = "Proportion of Non-Motorist Crashes by Weather Condition",
    x = "Weather Condition",
    y = "Non-Motorist Type",
    size = "Proportion",
    fill = "Proportion"
  ) +
  guides(size = "none") +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    plot.title = element_text(face = "bold", hjust = 0.5),
    panel.background = element_rect(fill = "white", color = NA),
    plot.background = element_rect(fill = "white", color = NA),
    panel.grid = element_blank(),
    axis.line = element_blank(),
    panel.border = element_blank()
  )

Analysis :

The bubble chart titled “Proportion of Non-Motorist Crashes by Weather Condition” illustrates the distribution of crashes involving various types of non-motorists such as pedestrians, bicyclists, and wheelchair users under different weather conditions. The x-axis represents weather conditions (clear, cloudy, rain), and the y-axis shows non-motorist types. Each bubble’s size and color intensity indicate the proportion of crashes: larger and redder bubbles reflect higher proportions. Notably, pedestrians involved in crashes under clear weather conditions show the largest and darkest bubble, indicating they account for the highest proportion of non-motorist crashes, especially in good weather.

Interpretation:

This visualization suggests that clear weather is associated with the highest number of non-motorist crashes, which may be due to increased outdoor activity and road use. Pedestrians are the most frequently involved group, likely because they are more often present on roadways than other non-motorist types. The low number of crashes in poor weather like fog, snow, and wind could indicate reduced non-motorist presence or more cautious driving in such conditions; it is expected that fewer crashes are recorded when fewer people go out. The data highlights the need for enhanced pedestrian safety measures, especially in favorable weather when road use is higher.

4- Vehicle and Speed Limit

Let’s see the vehicle makes recorded in the data:

unique(combined_data$vehicle_make)
  [1] "RAM"                  "HONDA"                "CHRYSLER"            
  [4] "UNKNOWN"              "HYUN"                 "CHEVROLET"           
  [7] "HYUNDAI"              "TOYTA"                "FORD"                
 [10] "THOM"                 "TOYOTA"               "TOYT"                
 [13] "ACUR"                 "FRHT"                 "CHEVY"               
 [16] "DODGE"                "PETERBUILT"           "TESLA"               
 [19] "ACURA"                "VOLVO"                "INFI"                
 [22] "BUICK"                "VOLK"                 "NISSAN"              
 [25] "VOLKSWAGON"           "VOLKSWAGEN"           "CADILLAC"            
 [28] "PONTIAC"              "LEXUS"                "CHEV"                
 [31] "BMW"                  "LINCOLN"              "GMC"                 
 [34] "KIA"                  "NISS"                 "TOY"                 
 [37] "SUB"                  "HOND"                 "N/A"                 
 [40] "TBU"                  "ISU"                  "DODG"                
 [43] "AUDI"                 "MERCEDES"             "MITS"                
 [46] "PORS"                 "MAZDA"                "LEXS"                
 [49] "JEEP"                 "MERZ"                 "MERCEDES BENZ"       
 [52] "SUBA"                 "LAND"                 "VOLV0"               
 [55] "MACK"                 "VOLV"                 "BUIC"                
 [58] "SUBARU"               "VW"                   "FREIGHTLINER"        
 [61] "MERC"                 "THOMAS"               "MISSAN"              
 [64] "GILL"                 "LINC"                 "MERCEDEZ"            
 [67] "CHEVROLETE"           "HINO"                 "MAZD"                
 [70] "IHON"                 "MERCURY"              "SUZUKI"              
 [73] "HYUNDA"               "KW"                   "UU"                  
 [76] "CHRY"                 "MERCEDEZ BENZ"        "SATURN"              
 [79] "SCION"                "ORIO"                 "JAGUAR"              
 [82] "MITSUBISHI"           "PONT"                 "CRV"                 
 [85] "HYUNDIA"              "TOOTA"                "TELSA"               
 [88] "HYUND"                "CHEVEROLET"           "LEXU"                
 [91] "5N1AT2MV9HC830432"    "CADI"                 "LANDROVER"           
 [94] "TOTOTA"               "TOYO"                 "VOLKS"               
 [97] "ISUZU"                "OLDS"                 "TESL"                
[100] "FRIGHT"               "FREIGHT"              "INTERNATIONAL"       
[103] "LONG"                 "NEW FLYER"            "U/K"                 
[106] "SUV"                  "SILVER"               "CRANE CARRIER"       
[109] "TOYORA"               "GILLIAN"              "TOYOT"               
[112] "NISSIAN"              "ELDO"                 "HUYANDAI"            
[115] "HUMMER"               "GRUMAN"               "GEO"                 
[118] "INFINITI"             "NFWL"                 "CHRYSTLER"           
[121] "TOYOYA"               "MNNI"                 "OTHER"               
[124] "GM"                   "FRT"                  "FOOD"                
[127] "JAGU"                 "SPAR"                 "GILLIG"              
[130] "PORSCHE"              "JTDEPRAEXLJ110150"    "SPARTAN"             
[133] "HARLEY DAVIDSON"      "LNDR"                 "UNK"                 
[136] "LUCID"                "XX"                   "LEX"                 
[139] "GILL BU"              "LAND ROVER"           "SUBURU"              
[142] "CHRYS"                "MITUS"                "KENWORTH"            
[145] "GILLIAM"              "PETERBILT"            "UNKOWN"              
[148] "BUS"                  "LEXSUS"               "GENERAL MOTORS"      
[151] "LEXIS"                "MADZA"                "ORION"               
[154] "SUBU"                 "HYUDAI"               "HYUANDAI"            
[157] "MAZDA 5"              "HONDAA"               "SATU"                
[160] "SONATA"               "HYUNDI"               "TOY0TA"              
[163] "TSMR"                 "NOVA"                 "CHEVORLET"           
[166] "ISUZ"                 "STERLING"             "MINI"                
[169] "THOM BUS"             "OLDSMOBILE"           "STLG"                
[172] "HUYNDAI"              "00"                   "RIDE-ON"             
[175] "CHRYSLE"              "5FNYF7H90KB004194"    "ED4"                 
[178] "MASERATI"             "NABI"                 "HYNDAI"              
[181] "SUZI"                 "THMS"                 "POSTAL TRUCK"        
[184] "MER"                  "TTOYOTA"              "MERCEDES-BENZ"       
[187] "MINI COOPER"          "INFINTY"              "BWM"                 
[190] "HARLEY"               "99"                   "CASCADIA"            
[193] "VOLKWAGEN"            "LINCON"               "TOYTOA"              
[196] "CEV"                  "GILG"                 "GEN"                 
[199] "MITZUBISHI"           "DUCATI"               "DODG3"               
[202] "CRCA"                 "SPRM"                 "NEO"                 
[205] "ACCURA"               "FIAT"                 "MARIN"               
[208] "NSSAN"                "JOHN DEERE"           "HUMM"                
[211] "ATHEY"                "FREI"                 "STAR"                
[214] "PTRB"                 "MITZ"                 "VNHL"                
[217] "GILLIS"               "CHRYVAL2008"          "FRAIGHT"             
[220] ""                     "THOMAS BUILT"         "BTL"                 
[223] "CATERPILLAR"          "GENESIS"              "SPARTAN MOTORS CHASS"

Standardize the vehicle make names:

# Collapse common misspellings and abbreviations into one label per make;
# every make not listed here is grouped under "OTHER".
combined_data$vehicle_make <- with(combined_data, case_when(
  vehicle_make %in% c("TOYOTA", "TOYOT", "TOY0TA", "TOYORA", "TOYTOA", "TOYO", "TOYT", "TOYTA", "TOOTA", "TOTOTA", "TOYOYA", "TTOYOTA", "TOY") ~ "TOYOTA",
  vehicle_make %in% c("CHEVROLET", "CHEV", "CHEVY", "CHEVEROLET", "CHEVROLETE", "CHEVORLET", "CEV") ~ "CHEVROLET",
  vehicle_make %in% c("FORD") ~ "FORD",
  vehicle_make %in% c("DODGE", "DODG", "DODG3") ~ "DODGE",
  vehicle_make %in% c("LEXUS", "LEXU", "LEXS", "LEXSUS", "LEX", "LEXIS") ~ "LEXUS",
  vehicle_make %in% c("HONDA", "HOND", "HONDAA", "IHON") ~ "HONDA",
  vehicle_make %in% c("HYUNDAI", "HYUN", "HYUNDA", "HYUANDAI", "HUYANDAI", "HUYNDAI", "HYUNDIA", "HYUND", "HYNDAI", "HYUDAI", "HYUNDI") ~ "HYUNDAI",
  vehicle_make %in% c("NISSAN", "NISS", "NISSIAN", "MISSAN", "NSSAN") ~ "NISSAN",
  TRUE ~ "OTHER"
))
unique(combined_data$vehicle_make)
[1] "OTHER"     "HONDA"     "HYUNDAI"   "CHEVROLET" "FORD"      "TOYOTA"   
[7] "DODGE"     "NISSAN"    "LEXUS"    
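Long hand-written lists of spellings are easy to get slightly wrong, so a lookup table is one possible alternative. The sketch below uses base R only; `make_variants`, `lookup`, and `standardize_make()` are hypothetical names, and the list covers just a few spellings for illustration rather than every variant in the data.

```r
# Hypothetical lookup: each canonical make with some spellings seen for it
make_variants <- list(
  TOYOTA = c("TOYOTA", "TOYT", "TOYTA", "TOOTA"),
  HONDA  = c("HONDA", "HOND", "HONDAA"),
  NISSAN = c("NISSAN", "NISS", "NSSAN")
)

# Flatten into a named vector: names are raw spellings, values are canonical makes
lookup <- setNames(
  rep(names(make_variants), lengths(make_variants)),
  unlist(make_variants)
)

standardize_make <- function(x) {
  cleaned <- toupper(trimws(x))   # ignore case and stray spaces
  out <- unname(lookup[cleaned])  # NA when the spelling is not in the lookup
  out[is.na(out)] <- "OTHER"      # unmatched makes go to "OTHER"
  out
}

standardize_make(c("toyt", " HONDAA", "NSSAN", "TESLA", ""))
```

Keeping the spellings in a data structure instead of inside case_when() makes it easier to compare the list against unique(combined_data$vehicle_make) and spot variants that were missed.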
# mean
mean_speed <- mean(combined_data$speed_limit, na.rm = TRUE)

# median
median_speed <- median(combined_data$speed_limit, na.rm = TRUE)

# minimum
min_speed <- min(combined_data$speed_limit, na.rm = TRUE)

#  maximum
max_speed <- max(combined_data$speed_limit, na.rm = TRUE)

# Print the results
cat("Mean Speed Limit: ", mean_speed, "\n")
Mean Speed Limit:  26.08663 
cat("Median Speed Limit: ", median_speed, "\n")
Median Speed Limit:  30 
cat("Minimum Speed Limit: ", min_speed, "\n")
Minimum Speed Limit:  0 
cat("Maximum Speed Limit: ", max_speed, "\n")
Maximum Speed Limit:  55 

Analysis :

The mean speed limit in the dataset is approximately 26.09 mph, and the median speed limit is 30 mph. This suggests that while many roads have typical speed limits around 30 mph, the overall average is pulled down slightly due to the presence of very low-speed zones. The minimum speed limit of 0 mph likely corresponds to specific areas like parking lots, private driveways, or non-road zones such as sidewalks or alleyways where vehicles are either prohibited or not expected to move. These areas are not meant for regular traffic flow and are exceptions in the dataset.
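To illustrate how zero entries can pull the mean below the median, here is a small sketch with made-up speed limits (the vector `speed` is an assumption for illustration, not the crash data). It compares the summary with and without the 0 mph entries.

```r
# Made-up speed limits, including two 0 mph entries
speed <- c(0, 0, 25, 30, 30, 35, 40)

mean_all   <- mean(speed)    # pulled down by the zeros
median_all <- median(speed)  # unaffected by how small the low values are

# Same summary restricted to roads with a posted limit above 0
posted      <- speed[speed > 0]
mean_posted <- mean(posted)

cat("Mean (all):", round(mean_all, 2), "\n")
cat("Median (all):", median_all, "\n")
cat("Mean (limit > 0):", mean_posted, "\n")
```

Reporting both versions makes it clear how much of the gap between the mean and the median comes from the 0 mph records.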

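I will select the top 10 vehicle makes (after the recoding above only nine categories remain, including OTHER, so all of them are kept). To make the graph interactive, I got help from ChatGPT, which suggested loading the plotly library and using ggplotly().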

library(plotly)

Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':

    last_plot
The following object is masked from 'package:stats':

    filter
The following object is masked from 'package:graphics':

    layout
top_makes <- combined_data %>%
  count(vehicle_make, sort = TRUE) %>%
  slice_head(n = 10) %>%
  pull(vehicle_make)

heatmap_data <- combined_data %>%
  filter(
    vehicle_make %in% top_makes,
    !is.na(speed_limit),
    !is.na(vehicle_year),
    vehicle_year <= 2025
  ) %>%
  group_by(vehicle_make, speed_limit) %>%
  summarise(
    count = n(),
    min_year = min(vehicle_year),
    max_year = max(vehicle_year),
    .groups = "drop"
  ) %>%
  group_by(vehicle_make) %>%
  mutate(
    total_make_crashes = sum(count),
    proportion = count / total_make_crashes,
    year_range = paste0("Oldest: ", min_year, " | Newest: ", max_year)
  ) %>%
  ungroup()


p <- ggplot(heatmap_data, aes(
  x = factor(speed_limit),
  y = vehicle_make,
  fill = proportion,
  text = paste("Speed Limit of the area:", speed_limit,
               "<br>Make:", vehicle_make,
               "<br>", year_range,
               "<br>Proportion of Crashes:", round(proportion, 5))
)) +
  geom_tile(color = "red") +
  scale_fill_gradient(low = "yellow", high = "red") +
  labs(
    title = "Crash Proportion by Vehicle Make and Speed Limit",
    x = "Speed Limit (MPH)",
    y = "Vehicle Make",
    fill = "Crash Proportion"
  ) +
  theme_minimal() +
  theme(
    panel.background = element_rect(fill = "white", color = NA),
    plot.background = element_rect(fill = "white", color = NA),
    legend.background = element_rect(fill = "white", color = NA),
    legend.key = element_rect(fill = "white", color = NA),
    axis.text.x = element_text(angle = 45, hjust = 1),
    plot.title = element_text(hjust = 0.5), 
    panel.grid = element_blank(),
    axis.line = element_blank()
  )
ggplotly(p, tooltip = "text")

Analysis :

This chart shows how crashes are distributed across speed limits for each car brand. The x-axis shows the speed limits (from 0 to 55 mph), and the y-axis lists car brands like Toyota, Honda, and Ford. The colors show the share of each brand's crashes that happened at each speed limit: red means a larger share, and yellow means a smaller one. Most crashes happen at 25 to 35 mph, especially for cars like Honda, Ford, and Hyundai. These speeds are common on city streets where there is a lot of traffic and people walking around.

Interpretation :

The crash proportions decrease at both very low (0–15 mph) and high (45–55 mph) speed limits. This could suggest that either fewer crashes happen at these speeds or that there is less driving activity in those zones. The data also reveals that certain brands like Honda and Ford have a larger share of their crashes in the 25–35 mph range, which may be linked to their high usage in urban areas. On the other hand, brands like Lexus or Hyundai show some gaps, possibly due to fewer recorded crashes or less representation in the dataset. Overall, this heatmap shows that crashes are most concentrated in moderate-speed areas, where traffic congestion and interactions between vehicles and pedestrians are more frequent.
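The fill in the heatmap is a within-make proportion, not a raw count. As a minimal sketch of that calculation, the example below uses base R's prop.table() on made-up rows (the data frame `toy` is an assumption for illustration); `margin = 1` divides each row by its own total, so every make sums to 1.

```r
# Made-up crashes: one row per crash
toy <- data.frame(
  vehicle_make = c("HONDA", "HONDA", "HONDA", "FORD", "FORD"),
  speed_limit  = c(25, 25, 35, 25, 40)
)

# Counts of crashes by make (rows) and speed limit (columns)
counts <- table(toy$vehicle_make, toy$speed_limit)

# Share of each make's crashes at each speed limit (rows sum to 1)
within_make <- prop.table(counts, margin = 1)

print(round(within_make, 2))
```

Because each row is scaled separately, a dark cell means that speed limit accounts for a large share of that make's crashes, even if the make has few crashes overall.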

5- Are there specific areas with a high concentration of crashes involving non-motorists?

Let’s see the types of non-motorists we have here:

unique(combined_data$related_non_motorist2)
 [1] "PEDESTRIAN"                                                                            
 [2] "BICYCLIST"                                                                             
 [3] "OTHER CONVEYANCE"                                                                      
 [4] "OTHER"                                                                                 
 [5] "OTHER PEDALCYCLIST"                                                                    
 [6] "OTHER, PEDESTRIAN"                                                                     
 [7] "MACHINE OPERATOR/RIDER"                                                                
 [8] "OTHER, OTHER CONVEYANCE"                                                               
 [9] "BICYCLIST, OTHER"                                                                      
[10] "BICYCLIST, PEDESTRIAN"                                                                 
[11] "OTHER CONVEYANCE, PEDESTRIAN"                                                          
[12] "IN ANIMAL-DRAWN VEH"                                                                   
[13] "Pedestrian"                                                                            
[14] "Unknown Type Of Non-Motorist"                                                          
[15] "Scooter (electric)"                                                                    
[16] "Cyclist (non-electric)"                                                                
[17] "Other Pedestrian (person in a building, skater, personal conveyance, etc.), Pedestrian"
[18] "Other Pedestrian (person in a building, skater, personal conveyance, etc.)"            
[19] "Cyclist (Electric)"                                                                    
[20] "Scooter (non-Electric)"                                                                
[21] "Wheelchair (electric)"                                                                 
[22] "Occupant of Motor Vehicle Not in Transport"                                            
[23] "Occupant Of a Non-Motor Vehicle Transportation Device"                                 
[24] "Wheelchair (non-electric)"                                                             
[25] "Unknown"                                                                               
[26] "Occupant of Motor Vehicle Not in Transport, Pedestrian"                                
[27] "Unknown, Wheelchair (electric)"                                                        

Let’s rename them to make them simpler:

combined_data4 <- combined_data %>%
  mutate(
    related_non_motorist2 = recode(related_non_motorist2,
                                   "PEDESTRIAN" = "Pedestrian",
                                   "Pedestrian" = "Pedestrian",
                                   "BICYCLIST" = "Bicyclist",
                                   "Cyclist (non-electric)" = "Bicyclist",  
                                   "Cyclist (Electric)" = "Bicyclist",      
                                   "Scooter (electric)" = "Scooter",
                                   "Scooter (non-Electric)" = "Scooter",
                                   "Wheelchair (electric)" = "Wheelchair",
                                   "Wheelchair (non-electric)" = "Wheelchair",
                                   "Unknown Type Of Non-Motorist" = "Unknown",
                                   "Unknown" = "Unknown",
                                   "OTHER CONVEYANCE" = "Other",  
                                   "OTHER" = "Other",
                                   "MACHINE OPERATOR/RIDER" = "Other",  
                                   "IN ANIMAL-DRAWN VEH" = "Unknown",  
                                   "Other Pedestrian (person in a building, skater, personal conveyance, etc.)" = "Pedestrian",  
                                   "Other Pedestrian (person in a building, skater, personal conveyance, etc.), Pedestrian" = "Pedestrian",  
                                   "Other, PEDESTRIAN" = "Pedestrian",
                                   "OTHER, OTHER CONVEYANCE" = "Other",
                                   "BICYCLIST, OTHER" = "Bicyclist",
                                   "BICYCLIST, PEDESTRIAN" = "Bicyclist",
                                   "OTHER CONVEYANCE, PEDESTRIAN" = "Pedestrian",
                                   "Occupant of Motor Vehicle Not in Transport" = "Unknown", 
                                   "Occupant Of a Non-Motor Vehicle Transportation Device" = "Unknown",
                                   "Unknown, Wheelchair (electric)" = "Unknown",
                                   .default = "Other",
    )
  )

What route types do we have in this data set?

unique(combined_data4$route_type2)
 [1] ""                       "County"                 "Other Public Roadway"  
 [4] "Maryland (State)"       "Municipality"           "US (State)"            
 [7] "Ramp"                   "Government"             "Interstate (State)"    
[10] "Service Road"           "Maryland (State) Route" "Private Route"         
[13] "County Route"           "Local Route"            "Municipality Route"    
[16] "Bicycle Route"          "Crossover"              "Government Route"      
[19] "Spur"                  
table(combined_data4$related_non_motorist2)

 Bicyclist      Other Pedestrian    Scooter    Unknown Wheelchair 
      1260        391       4337         43         23          6 

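Second cleaning attempt. The first pass already sent every unmatched label to "Other" through .default (including "OTHER, PEDESTRIAN", which the first pass spelled "Other, PEDESTRIAN"), so none of the original labels remain in combined_data4. As a result, this second recode has nothing left to match and the counts in the table below are unchanged: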

combined_data5 <- combined_data4 %>%
  mutate(
    related_non_motorist2 = recode(related_non_motorist2,
      "Occupant of Motor Vehicle Not in Transport, Pedestrian" = "Pedestrian",
      "OTHER PEDALCYCLIST" = "Bicyclist",  
      "OTHER, PEDESTRIAN" = "Pedestrian",  
      "OTHER CONVEYANCE, PEDESTRIAN" = "Pedestrian", 
      "OTHER" = "Other", 
      "MACHINE OPERATOR/RIDER" = "Other", 
      "IN ANIMAL-DRAWN VEH" = "Unknown",  
      "Unknown Type Of Non-Motorist" = "Unknown",  
      "Unknown" = "Unknown",  
      "Scooter (electric)" = "Scooter",  
      "Scooter (non-Electric)" = "Scooter",  
      "Wheelchair (electric)" = "Wheelchair",  
      "Wheelchair (non-electric)" = "Wheelchair", 
      "Pedestrian" = "Pedestrian",  
      "BICYCLIST" = "Bicyclist",  
      "Cyclist (non-electric)" = "Bicyclist",  
      "Cyclist (Electric)" = "Bicyclist",  
      "BICYCLIST, OTHER" = "Bicyclist",  
      "BICYCLIST, PEDESTRIAN" = "Bicyclist" 
    )
  )


table(combined_data5$related_non_motorist2)

 Bicyclist      Other Pedestrian    Scooter    Unknown Wheelchair 
      1260        391       4337         43         23          6 
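One way to avoid exact-spelling and case problems is to match on keywords in a single pass. The sketch below uses base R only; `simplify_non_motorist()` is a hypothetical helper, and the precedence order (cyclist over pedestrian, unknown over wheelchair) is an assumption that mirrors the groupings chosen above. Labels without any keyword fall into "Other", which differs from the mapping above for a few labels such as "IN ANIMAL-DRAWN VEH".

```r
simplify_non_motorist <- function(x) {
  key <- tolower(trimws(x))            # case-insensitive matching
  out <- rep("Other", length(key))     # default group
  # Later assignments override earlier ones, so order sets the precedence
  out[grepl("wheelchair", key)] <- "Wheelchair"
  out[grepl("scooter", key)]    <- "Scooter"
  out[grepl("pedestrian", key)] <- "Pedestrian"
  out[grepl("cyclist", key)]    <- "Bicyclist"  # bicyclist, cyclist, pedalcyclist
  out[grepl("unknown", key)]    <- "Unknown"
  out[is.na(x)] <- NA                  # keep missing values missing
  out
}

raw_labels <- c("PEDESTRIAN", "Pedestrian", "OTHER, PEDESTRIAN",
                "OTHER PEDALCYCLIST", "Cyclist (Electric)",
                "BICYCLIST, PEDESTRIAN", "Unknown, Wheelchair (electric)",
                "OTHER CONVEYANCE", NA)

simplified <- simplify_non_motorist(raw_labels)
print(table(simplified, useNA = "ifany"))
```

Running the helper on the original related_non_motorist2 column, rather than on an already recoded copy, keeps all of the cleaning in one step.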

Let’s do the same for route_type

combined_data6 <- combined_data5 %>%
  mutate(
    route_type2 = recode(route_type2,
                         "County" = "County",  # Keep County as it is
                         "Other Public Roadway" = "Public Roads",
                         "Maryland (State)" = "Maryland State",  
                         "US (State)" = "Maryland State",  
                         "Municipality" = "Public Roads",
                         "Ramp" = "Government", 
                         "Government" = "Government", 
                         "Interstate (State)" = "Maryland State",
                         "Service Road" = "Public Roads",
                         "Maryland (State) Route" = "Maryland State",
                         "Private Route" = "Public Roads",  
                         "County Route" = "County",  
                         "Local Route" = "County",  
                         "Municipality Route" = "Public Roads",
                         "Bicycle Route" = "Bicycle Route", 
                         "Crossover" = "Bicycle Route",  
                         "Government Route" = "Government",  
                         "Spur" = "Unknown",  
                         "Private Roads" = "Public Roads",  
                         "Special Routes" = "Public Roads")
    )
unique(combined_data6$route_type2)
[1] ""               "County"         "Public Roads"   "Maryland State"
[5] "Government"     "Bicycle Route"  "Unknown"       
combined_data6 <- combined_data6 %>%
  # recode() matches on argument names, and an empty string cannot be a name,
  # so blank values are replaced directly; the other labels were already grouped above
  mutate(route_type2 = if_else(route_type2 == "", "Unknown", route_type2))
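A named lookup vector is another way to group labels, and it also copes with blank entries, which cannot be matched by name. The sketch below uses a few hypothetical sample values and a shortened lookup; `route_raw`, `route_lookup`, and `route_grouped` are illustrative names, not objects from this project. Anything missing from the lookup, including blanks and NA, falls back to "Unknown".

```r
# Hypothetical sample of raw route types, including a blank and a missing value
route_raw <- c("County", "Maryland (State)", "", "Ramp", "Spur", NA)

# Shortened lookup for illustration: names are raw labels, values are grouped labels
route_lookup <- c(
  "County"           = "County",
  "Maryland (State)" = "Maryland State",
  "Ramp"             = "Government"
)

# Index the lookup by raw label; labels not in the lookup (and "" or NA) give NA
route_grouped <- unname(route_lookup[route_raw])

# Send everything that was not matched to "Unknown"
route_grouped[is.na(route_grouped)] <- "Unknown"

route_grouped
```

The design choice here is that unmatched labels are made visible as "Unknown" instead of passing through unchanged, which makes gaps in the lookup easier to spot.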

Non-Motorists vs. Route Type

library(scales)

Attaching package: 'scales'
The following object is masked from 'package:purrr':

    discard
The following object is masked from 'package:readr':

    col_factor
non_motorist_data <- combined_data6 %>%
  filter(!is.na(route_type2), route_type2 != "", 
         !is.na(related_non_motorist2), related_non_motorist2 != "")

route_type_counts <- non_motorist_data %>%
  count(route_type2, related_non_motorist2)

route_type_counts <- route_type_counts %>%
  group_by(route_type2) %>%
  mutate(
    total_crashes = sum(n),
    proportion = n / total_crashes
  ) %>%
  ungroup() %>%
  arrange(route_type2)

ggplot(route_type_counts, aes(
    x = proportion,
    y = factor(route_type2, levels = unique(route_type2)),
    fill = related_non_motorist2
  )) +
  geom_col(color = "white") +
  scale_x_continuous(labels = percent_format(accuracy = 1)) +
  labs(
    title = "Proportion of Non-Motorist Crash Types by Route Type",
    x = "Proportion of Crashes",
    y = "Route Type",
    fill = "Non-Motorist Type"
  ) +
  coord_cartesian(expand = FALSE) +
  theme_minimal() +
  theme(
    panel.background = element_rect(fill = "white", color = NA),
    plot.background  = element_rect(fill = "white", color = NA),
    legend.background = element_rect(fill = "white", color = NA),
    legend.key        = element_rect(fill = "white", color = NA),
    axis.text.y       = element_text(size = 10),
    axis.text.x       = element_text(size = 9),
    plot.title        = element_text(face = "bold", hjust = 0.5)
  )
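As a cross-check on the proportions used in the plot, the same within-route shares can be computed in base R with `prop.table()`. This is a minimal sketch on a small made-up data frame (`toy` is hypothetical, not the project data); with `margin = 1` each row, here each route type, sums to 1.

```r
# Hypothetical toy data with the same two columns used in the plot
toy <- data.frame(
  route_type2 = c("County", "County", "County", "Government", "Government"),
  related_non_motorist2 = c("Pedestrian", "Pedestrian", "Bicyclist",
                            "Bicyclist", "Pedestrian")
)

# Cross-tabulate route type (rows) by non-motorist type (columns)
counts <- table(toy$route_type2, toy$related_non_motorist2)

# margin = 1 divides each cell by its row total, giving shares within a route type
row_props <- prop.table(counts, margin = 1)

round(row_props, 2)
```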

Analysis:

This graph shows the proportion of non-motorist crashes by route type. The x-axis gives the proportion of crashes and the y-axis the route type: bicycle route, county, government, Maryland state, and public roads. On bicycle routes, crashes most often involve bicyclists and pedestrians, with bicyclists involved in 50% of the crashes on these routes. On county roads, pedestrians are the most exposed. Government-owned roads also show a high share of bicycle-related incidents, with 50% of crashes involving bicyclists. Finally, on both public roads and Maryland state roads, crashes mostly involve pedestrians.

Interpretation:

The data show that crash risk varies with the type of roadway and the users involved. Bicycle routes and government-owned roads have the highest share of crashes involving bicyclists, 50% on each, indicating a need for better bike safety measures. In contrast, county roads, public roads, and Maryland state roads show a greater risk for pedestrians, suggesting that pedestrian safety infrastructure such as sidewalks, crosswalks, and signals may be lacking or insufficient.

6- How does the presence of alcohol or drugs affect crash severity for drivers?

unique(combined_data$driver_substance_abuse2)
 [1] "NONE DETECTED"                                                                                                             
 [2] "UNKNOWN"                                                                                                                   
 [3] "N/A"                                                                                                                       
 [4] "NONE DETECTED, UNKNOWN"                                                                                                    
 [5] "ALCOHOL PRESENT"                                                                                                           
 [6] "ALCOHOL PRESENT, NONE DETECTED"                                                                                            
 [7] "ALCOHOL CONTRIBUTED"                                                                                                       
 [8] "N/A, NONE DETECTED"                                                                                                        
 [9] "ILLEGAL DRUG CONTRIBUTED"                                                                                                  
[10] "ALCOHOL CONTRIBUTED, N/A"                                                                                                  
[11] "ALCOHOL PRESENT, N/A"                                                                                                      
[12] "MEDICATION PRESENT"                                                                                                        
[13] "COMBINED SUBSTANCE PRESENT, NONE DETECTED"                                                                                 
[14] "ALCOHOL CONTRIBUTED, NONE DETECTED"                                                                                        
[15] "MEDICATION PRESENT, NONE DETECTED"                                                                                         
[16] "ILLEGAL DRUG PRESENT"                                                                                                      
[17] "COMBINED SUBSTANCE PRESENT"                                                                                                
[18] "OTHER"                                                                                                                     
[19] "Not Suspect of Alcohol Use, Not Suspect of Drug Use"                                                                       
[20] "Unknown, Unknown"                                                                                                          
[21] "Not Suspect of Alcohol Use, Not Suspect of Drug Use, Unknown, Unknown"                                                     
[22] "Not Suspect of Alcohol Use, Not Suspect of Drug Use, Not Suspect of Alcohol Use, Not Suspect of Drug Use"                  
[23] "Not Suspect of Alcohol Use, Not Suspect of Drug Use, Not Suspect of Alcohol Use, Not Suspect of Drug Use, Unknown, Unknown"
[24] "Suspect of Alcohol Use, Not Suspect of Drug Use"                                                                           
[25] "Suspect of Alcohol Use, Unknown"                                                                                           
[26] "Not Suspect of Alcohol Use, Unknown"                                                                                       
[27] "Unknown, Unknown, Unknown, Unknown"                                                                                        

Recode into two categories: 2 = substance use, 1 = no substance use detected (including unknown and N/A values)

combined_data$substance_use_category <- ifelse(
  combined_data$driver_substance_abuse2 %in% c(
    "ALCOHOL PRESENT", 
    "ALCOHOL CONTRIBUTED", 
    "ILLEGAL DRUG PRESENT", 
    "ILLEGAL DRUG CONTRIBUTED", 
    "COMBINED SUBSTANCE PRESENT", 
    "MEDICATION PRESENT", 
    "ALCOHOL PRESENT, NONE DETECTED", 
    "ALCOHOL CONTRIBUTED, NONE DETECTED", 
    "COMBINED SUBSTANCE PRESENT, NONE DETECTED", 
    "MEDICATION PRESENT, NONE DETECTED"
  ), 
  2,
  1)
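The exact-match list above only catches the labels it names, so combined labels from the `unique()` output such as "ALCOHOL PRESENT, N/A" or "Suspect of Alcohol Use, Unknown" fall into category 1. A pattern-based classifier is one possible alternative. The sketch below is an assumption about how such a rule could look, not the method used for the counts in this section; `classify_substance` and `examples` are hypothetical names.

```r
# Hypothetical helper: 2 = some substance indication, 1 = everything else
classify_substance <- function(x) {
  x_up <- toupper(x)
  # Drop explicit negatives first so "NOT SUSPECT OF ..." cannot match below
  x_clean <- gsub("NOT SUSPECT OF (ALCOHOL|DRUG) USE", "", x_up)
  positive <- grepl(
    "(ALCOHOL|DRUG|MEDICATION|SUBSTANCE) (PRESENT|CONTRIBUTED)|SUSPECT OF (ALCOHOL|DRUG) USE",
    x_clean
  )
  ifelse(positive, 2, 1)
}

# A few labels taken from the unique() output above
examples <- c(
  "NONE DETECTED",
  "ALCOHOL PRESENT, N/A",
  "Suspect of Alcohol Use, Unknown",
  "Not Suspect of Alcohol Use, Not Suspect of Drug Use"
)

data.frame(label = examples, category = classify_substance(examples))
```

Using this rule on the full data would change the category counts, so the table and odds ratio in this section still refer to the explicit list.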

Let’s take a look at the new table

table(combined_data$substance_use_category)

   1    2 
5972   88 
head(combined_data[, c("driver_substance_abuse2", "substance_use_category")])
  driver_substance_abuse2 substance_use_category
1           NONE DETECTED                      1
2           NONE DETECTED                      1
3           NONE DETECTED                      1
4                 UNKNOWN                      1
5                     N/A                      1
6                     N/A                      1
unique(combined_data$injury_severity1)
[1] "NO APPARENT INJURY"       "POSSIBLE INJURY"         
[3] "SUSPECTED MINOR INJURY"   "SUSPECTED SERIOUS INJURY"
[5] "No Apparent Injury"       ""                        
[7] "Suspected Minor Injury"   "Possible Injury"         
[9] "Suspected Serious Injury"

Code 1 for a suspected serious or possible injury, and 0 for all other values (no apparent injury, suspected minor injury, or blank)

combined_data$severe_injury <- ifelse(
  combined_data$injury_severity1 %in% c(
    "SUSPECTED SERIOUS INJURY", 
    "Suspected Serious Injury", 
    "POSSIBLE INJURY", 
    "Possible Injury"
  ),
  1,  
  0 
)
table(combined_data$injury_severity1, combined_data$severe_injury)
                          
                              0    1
                             78    0
  No Apparent Injury        709    0
  NO APPARENT INJURY       4997    0
  Possible Injury             0   10
  POSSIBLE INJURY             0  128
  Suspected Minor Injury     18    0
  SUSPECTED MINOR INJURY    102    0
  Suspected Serious Injury    0    1
  SUSPECTED SERIOUS INJURY    0   17
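The table lists each severity level twice because the labels differ only in letter case. One way to merge them, shown here as a small sketch on hypothetical sample values (`injury_raw` and `injury_clean` are illustrative names), is to normalize the case before tabulating and to treat blanks as missing.

```r
# Hypothetical sample of labels in mixed case, plus a blank entry
injury_raw <- c("NO APPARENT INJURY", "No Apparent Injury",
                "Possible Injury", "POSSIBLE INJURY", "")

# Trim stray whitespace and convert everything to upper case
injury_clean <- toupper(trimws(injury_raw))

# Treat blank labels as missing instead of as their own category
injury_clean[injury_clean == ""] <- NA

table(injury_clean, useNA = "ifany")
```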

2×2 table: substance use category (rows) by severe injury (columns)

table_data <- table(combined_data$substance_use_category, combined_data$severe_injury)
table_data
   
       0    1
  1 5824  148
  2   80    8

Calculate the odds ratio:

odds_ratio <- (8 / 80) / (148 / 5824)
odds_ratio
[1] 3.935135
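The odds ratio can also be computed from the table cells instead of typed-in numbers, and an approximate 95% confidence interval gives a sense of its precision. The sketch below rebuilds the 2×2 counts shown above as a matrix (`crash_tab` is a hypothetical name) and uses a Wald interval on the log scale, which is my assumption for illustration and not part of the original analysis. With only 8 cases in one cell, an exact method such as `fisher.test()` may be a better choice.

```r
# Counts copied from the 2x2 table above: rows = substance category, cols = severe injury
crash_tab <- matrix(
  c(5824, 148,
      80,   8),
  nrow = 2, byrow = TRUE,
  dimnames = list(substance = c("1", "2"), severe = c("0", "1"))
)

# Odds of a severe injury within each substance category
odds_substance    <- crash_tab["2", "1"] / crash_tab["2", "0"]
odds_no_substance <- crash_tab["1", "1"] / crash_tab["1", "0"]
odds_ratio <- odds_substance / odds_no_substance

# Wald standard error of log(OR): square root of the sum of reciprocal cell counts
se_log_or <- sqrt(sum(1 / crash_tab))
ci_95 <- exp(log(odds_ratio) + c(-1, 1) * qnorm(0.975) * se_log_or)

odds_ratio
ci_95
```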

Comments :

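Based on the results, drivers in the substance-use group (alcohol, drugs, or medication) had about 3.9 times the odds of a crash coded as a severe injury compared with drivers in the other group. An odds ratio above 1 indicates a positive association between substance use and injury severity, and the further above 1, the stronger the association. Because only 8 of the 88 substance-use cases were coded as severe, this estimate should be read with caution.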

7- Are we able to locate accidents that involve non-motorists from the map?

library(leaflet)
Warning: package 'leaflet' was built under R version 4.3.3
map <- leaflet(data = combined_data) %>%
  addProviderTiles("CartoDB.Positron") %>%  
  addMarkers(
    ~longitude1, ~latitude1,  
    popup = ~road_name1,      
    label = ~road_name1,      
    clusterOptions = markerClusterOptions()  
  ) %>%
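  # na.rm = TRUE keeps the map center defined even if some coordinates are missing
  setView(lng = mean(combined_data$longitude1, na.rm = TRUE), lat = mean(combined_data$latitude1, na.rm = TRUE), zoom = 10)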


map
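Rows without coordinates cannot be placed on a map, so it can help to filter them out before building it. This is a minimal base R sketch on made-up points (`toy_points`, `map_data`, and `center` are hypothetical names); the filtered data frame and center could then be passed to `leaflet()` and `setView()` in the same way as above.

```r
# Hypothetical crash locations, one of them with a missing latitude
toy_points <- data.frame(
  road_name1 = c("A ST", "B AVE", "C RD"),
  latitude1  = c(39.05, NA, 39.10),
  longitude1 = c(-77.05, -77.10, -77.20)
)

# Keep only rows where both coordinates are present
has_coords <- complete.cases(toy_points[, c("latitude1", "longitude1")])
map_data <- toy_points[has_coords, ]

# Center of the remaining points, usable as the initial map view
center <- c(lng = mean(map_data$longitude1), lat = mean(map_data$latitude1))

nrow(map_data)
center
```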

Comments :

The map shows the 6,059 crash cases I worked with after performing the inner join. It helps identify where these accidents happened and highlights the areas with the highest frequency of crashes. According to the map, places like Wheaton, Silver Spring, and Aspen Hill have a high number of accidents. This likely reflects not only the population density in these areas but also heavy traffic, major intersections, and commercial activity, which increase the risk of crashes.

Conclusion :

This project highlights the factors linked to crash risk in Montgomery County, including light conditions, weather, speed limits, vehicle types, non-motorist involvement, and substance use. Most crashes happened in clear weather, in daylight, and at moderate speed limits (25–35 mph), which are typical of busy city roads. Even though these are normal driving conditions, the high number of vehicles and pedestrians increases the chance of accidents.

Pedestrians were the most affected non-motorists, especially in clear weather. Vehicle makes such as Honda, Ford, and Hyundai appeared in more crashes, likely due to their popularity. Crashes were fewer in low-speed zones like parking lots and neighborhoods (0–15 mph) and on highways (45–55 mph). A large majority of crashes (94%) involved no injuries, showing that many incidents were minor but still important.

Substance use was a contributing factor in some crashes, adding serious risk for all road users. Even a small number of drivers under the influence can cause major harm. These findings suggest that the biggest crash risks come not just from weather or road types but from driver behavior, including speeding and substance use. To reduce accidents, Montgomery County should focus on traffic control, driver awareness programs, and stronger enforcement against impaired driving.