The data sets I chose contain details about crashes reported in Montgomery County. The first data set has over 100,000 entries with 39 columns, including both categorical and numerical variables. The second data set is similar but focuses only on non-motorists. I included non-motorists because, while drivers face risks, pedestrians and cyclists are also exposed to accidents. To better understand these two data sets, I decided to combine them. This combined data set will help answer key questions for my project.
Join the crash reporting and non-motorist datasets
library(tidyverse)
Warning: package 'tidyverse' was built under R version 4.3.3
Warning: package 'ggplot2' was built under R version 4.3.3
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.0 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Perform an inner join to combine the datasets on “Report.Number”
combined_data <- inner_join(crash_rep, non_motorists, by = "Report.Number")
Warning in inner_join(crash_rep, non_motorists, by = "Report.Number"): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 73 of `x` matches multiple rows in `y`.
ℹ Row 3739 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
"many-to-many"` to silence this warning.
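The warning above appears when a report number occurs on several rows in both tables, so the join pairs every matching row on one side with every matching row on the other. The following is a minimal sketch with made-up report numbers (not taken from the real data). It uses base R `merge()` as a stand-in for `inner_join()` so that it runs without extra packages.

```r
# Toy tables with hypothetical report numbers (not from the real data).
crash_toy <- data.frame(
  Report.Number = c("R1", "R1", "R2"),
  driver = c("driver A", "driver B", "driver C")
)
non_motorist_toy <- data.frame(
  Report.Number = c("R1", "R1", "R3"),
  person = c("pedestrian 1", "pedestrian 2", "cyclist 1")
)

# Inner join on the key: R1 has 2 rows on each side, so it yields 2 x 2 = 4 rows.
# R2 and R3 have no partner in the other table and are dropped.
joined_toy <- merge(crash_toy, non_motorist_toy, by = "Report.Number")
nrow(joined_toy)

# Counting rows per key before joining shows where duplication will occur.
table(crash_toy$Report.Number)
```

In the combined data, a crash with two drivers and two non-motorists would therefore appear four times, which matters whenever rows are counted as crashes.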
View(combined_data)
Install packages, and load libraries
install.packages("dplyr")
Warning: package 'dplyr' is in use and will not be installed
install.packages("ggplot2")
Warning: package 'ggplot2' is in use and will not be installed
install.packages("stats")
Warning: package 'stats' is in use and will not be installed
Report.Number Local.Case.Number.x Agency.Name.x ACRS.Report.Type.x
Length:6561 Length:6561 Length:6561 Length:6561
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
Crash.Date.Time.x Route.Type.x Road.Name.x Cross.Street.Name.x
Length:6561 Length:6561 Length:6561 Length:6561
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
Off.Road.Description.x Municipality.x Related.Non.Motorist.x
Length:6561 Length:6561 Length:6561
Class :character Class :character Class :character
Mode :character Mode :character Mode :character
Collision.Type.x Weather.x Surface.Condition.x Light.x
Length:6561 Length:6561 Length:6561 Length:6561
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
Traffic.Control.x Driver.Substance.Abuse.x Non.Motorist.Substance.Abuse.x
Length:6561 Length:6561 Length:6561
Class :character Class :character Class :character
Mode :character Mode :character Mode :character
Person.ID.x Driver.At.Fault Injury.Severity.x Circumstance
Length:6561 Length:6561 Length:6561 Length:6561
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
Driver.Distracted.By Drivers.License.State Vehicle.ID
Length:6561 Length:6561 Length:6561
Class :character Class :character Class :character
Mode :character Mode :character Mode :character
Vehicle.Damage.Extent Vehicle.First.Impact.Location Vehicle.Body.Type
Length:6561 Length:6561 Length:6561
Class :character Class :character Class :character
Mode :character Mode :character Mode :character
Vehicle.Movement Vehicle.Going.Dir Speed.Limit Driverless.Vehicle
Length:6561 Length:6561 Min. : 0.00 Length:6561
Class :character Class :character 1st Qu.:20.00 Class :character
Mode :character Mode :character Median :30.00 Mode :character
Mean :26.02
3rd Qu.:35.00
Max. :55.00
Parked.Vehicle Vehicle.Year Vehicle.Make Vehicle.Model
Length:6561 Min. : 0 Length:6561 Length:6561
Class :character 1st Qu.:2005 Class :character Class :character
Mode :character Median :2012 Mode :character Mode :character
Mean :1844
3rd Qu.:2016
Max. :9999
Latitude.x Longitude.x Location.x Local.Case.Number.y
Min. :38.55 Min. :-79.18 Length:6561 Min. :1.705e+04
1st Qu.:39.02 1st Qu.:-77.18 Class :character 1st Qu.:1.705e+08
Median :39.06 Median :-77.10 Mode :character Median :1.901e+08
Mean :39.07 Mean :-77.11 Mean :1.612e+08
3rd Qu.:39.12 3rd Qu.:-77.04 3rd Qu.:2.200e+08
Max. :39.43 Max. :-76.91 Max. :2.400e+09
Agency.Name.y ACRS.Report.Type.y Crash.Date.Time.y Route.Type.y
Length:6561 Length:6561 Length:6561 Length:6561
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
Road.Name.y Cross.Street.Name.y Off.Road.Description.y
Length:6561 Length:6561 Length:6561
Class :character Class :character Class :character
Mode :character Mode :character Mode :character
Municipality.y Related.Non.Motorist.y Collision.Type.y
Length:6561 Length:6561 Length:6561
Class :character Class :character Class :character
Mode :character Mode :character Mode :character
Weather.y Surface.Condition.y Light.y Traffic.Control.y
Length:6561 Length:6561 Length:6561 Length:6561
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
Driver.Substance.Abuse.y Non.Motorist.Substance.Abuse.y Person.ID.y
Length:6561 Length:6561 Length:6561
Class :character Class :character Class :character
Mode :character Mode :character Mode :character
Pedestrian.Type Pedestrian.Movement Pedestrian.Actions Pedestrian.Location
Length:6561 Length:6561 Length:6561 Length:6561
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
At.Fault Injury.Severity.y Safety.Equipment Latitude.y
Length:6561 Length:6561 Length:6561 Min. :38.55
Class :character Class :character Class :character 1st Qu.:39.02
Mode :character Mode :character Mode :character Median :39.06
Mean :39.07
3rd Qu.:39.12
Max. :39.43
Longitude.y Location.y
Min. :-79.18 Length:6561
1st Qu.:-77.18 Class :character
Median :-77.10 Mode :character
Mean :-77.11
3rd Qu.:-77.04
Max. :-76.91
# Function to rename columns
rename_columns <- function(name) {
  name <- gsub("\\.", "_", name)  # Replace "." with "_"
  name <- gsub("_x$", "1", name)  # Replace a trailing "_x" with "1"
  name <- gsub("_y$", "2", name)  # Replace a trailing "_y" with "2"
  tolower(name)                   # Convert to lowercase
}
# Apply the renaming function to column names
colnames(combined_data) <- sapply(colnames(combined_data), rename_columns)
# Check new column names
colnames(combined_data)
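To make the effect of the renaming rules concrete, here is a small sketch that applies the same three substitutions to a few column names taken from the summary output. The vector `example_names` is only an illustration. With these rules, `Weather.x` becomes `weather1` and `Latitude.y` becomes `latitude2`.

```r
# Same renaming rules as in the report, restated so the example runs on its own.
rename_columns <- function(name) {
  name <- gsub("\\.", "_", name)  # dots become underscores
  name <- gsub("_x$", "1", name)  # trailing "_x" (first table) becomes "1"
  name <- gsub("_y$", "2", name)  # trailing "_y" (second table) becomes "2"
  tolower(name)
}

# A few column names from the joined data, used here as an illustration.
example_names <- c("Report.Number", "Weather.x", "Latitude.y", "Driver.At.Fault")

# unname() drops the names that sapply() attaches to its result.
new_names <- unname(sapply(example_names, rename_columns))
new_names
```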
Recoding Injury Severity into a binary variable (0 = non-severe, 1 = severe) simplifies the analysis and gives the outcome the two-level form that logistic regression requires. It allows us to focus on the factors that contribute to severe injuries.
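The report does not show the recoding step itself, so the following is only a sketch of one way it could be done. It assumes that "suspected serious injury" is the only level treated as severe. That cut-off is an assumption made for illustration, and the vector `severity` is made-up sample input.

```r
# Sample values using the injury severity labels that appear in the data.
severity <- c("no apparent injury", "possible injury",
              "suspected minor injury", "suspected serious injury", NA)

# Assumption: only this level counts as "severe".
severe_levels <- c("suspected serious injury")

# 1 = severe, 0 = non-severe, NA stays NA so missing values are not miscounted.
injury_severity_binary <- ifelse(is.na(severity),
                                 NA_integer_,
                                 as.integer(severity %in% severe_levels))
injury_severity_binary
```

Keeping missing values as `NA` rather than 0 avoids treating unrecorded severities as non-severe.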
# Perform logistic regression with weather conditions as predictor
log_reg_model <- glm(injury_severity_binary ~ weather1, data = combined_data, family = binomial)
The model did not converge, meaning it struggled to find a reliable relationship between weather conditions and injury severity. The extremely large standard errors and near-zero estimates suggest possible data issues, such as low variation in injury severity or highly imbalanced weather categories.
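One way to look for the imbalance described above is to cross-tabulate the outcome against the predictor before fitting. A weather category in which every crash has the same outcome gives the model nothing to estimate and tends to produce very large standard errors. The sketch below uses made-up data. The data frame `toy` and its values are hypothetical.

```r
# Made-up data: "SNOW" has only non-severe crashes.
toy <- data.frame(
  weather = c(rep("CLEAR", 6), rep("RAIN", 4), rep("SNOW", 3)),
  severe  = c(0, 0, 0, 0, 1, 1,   0, 0, 0, 1,   0, 0, 0)
)

# Rows = weather category, columns = outcome (0 or 1).
counts <- table(toy$weather, toy$severe)
counts

# Weather categories with an empty cell (no severe or no non-severe cases).
sparse_levels <- rownames(counts)[apply(counts == 0, 1, any)]
sparse_levels
```

Categories flagged this way could be merged into a broader group or dropped before the model is fitted.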
Check the unique levels of injury_severity1
levels(combined_data$injury_severity1)
[1] "no apparent injury" "possible injury"
[3] "suspected minor injury" "suspected serious injury"
Graph 1 : bar plot of injury severity by weather condition
ggplot(combined_data, aes(x = weather1, fill = injury_severity1)) +
  geom_bar(position = "fill") +  # Show the proportions of each injury type
  labs(title = "Injury Severity by Weather Condition",
       x = "Weather Condition",
       y = "Proportion of Injury Severity") +
  scale_fill_manual(values = c("no apparent injury" = "lightblue",
                               "possible injury" = "yellow",
                               "suspected minor injury" = "orange",
                               "suspected serious injury" = "red")) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
The analysis does not show a clear relationship between weather conditions and injury severity. The most frequently occurring conditions in the data were “Clear,” “Cloudy,” and “Rain.” In the logistic regression model with weather as the predictor, none of these conditions was statistically significant in predicting severe injury, as indicated by the high p-values. This suggests that other factors, such as driver behavior or road conditions, may be more influential in determining injury severity than weather alone. Weather may still matter, but it does not appear to be the primary cause of severe injuries in these crashes.
Check unique values in the RelatedNonMotorist column
# Check unique values in the Related.Non.Motorist column
unique(combined_data$related_non_motorist2)
[1] "PEDESTRIAN"
[2] "BICYCLIST"
[3] "OTHER CONVEYANCE"
[4] "OTHER"
[5] "OTHER PEDALCYCLIST"
[6] "OTHER, PEDESTRIAN"
[7] "MACHINE OPERATOR/RIDER"
[8] "OTHER, OTHER CONVEYANCE"
[9] "BICYCLIST, OTHER"
[10] "BICYCLIST, PEDESTRIAN"
[11] "OTHER CONVEYANCE, PEDESTRIAN"
[12] "IN ANIMAL-DRAWN VEH"
[13] "Pedestrian"
[14] "Unknown Type Of Non-Motorist"
[15] "Scooter (electric)"
[16] "Cyclist (non-electric)"
[17] "Other Pedestrian (person in a building, skater, personal conveyance, etc.), Pedestrian"
[18] "Other Pedestrian (person in a building, skater, personal conveyance, etc.)"
[19] "Cyclist (Electric)"
[20] "Scooter (non-Electric)"
[21] "Wheelchair (electric)"
[22] "Occupant of Motor Vehicle Not in Transport"
[23] "Occupant Of a Non-Motor Vehicle Transportation Device"
[24] "Wheelchair (non-electric)"
[25] "Unknown"
[26] "Occupant of Motor Vehicle Not in Transport, Pedestrian"
[27] "Unknown, Wheelchair (electric)"
Count the frequency of each type of non-motorist
table(combined_data$related_non_motorist2)
BICYCLIST
1191
BICYCLIST, OTHER
8
BICYCLIST, PEDESTRIAN
7
Cyclist (Electric)
24
Cyclist (non-electric)
123
IN ANIMAL-DRAWN VEH
1
MACHINE OPERATOR/RIDER
40
Occupant Of a Non-Motor Vehicle Transportation Device
3
Occupant of Motor Vehicle Not in Transport
8
Occupant of Motor Vehicle Not in Transport, Pedestrian
2
OTHER
264
OTHER CONVEYANCE
87
OTHER CONVEYANCE, PEDESTRIAN
2
OTHER PEDALCYCLIST
26
Other Pedestrian (person in a building, skater, personal conveyance, etc.)
12
Other Pedestrian (person in a building, skater, personal conveyance, etc.), Pedestrian
3
OTHER, OTHER CONVEYANCE
2
OTHER, PEDESTRIAN
19
Pedestrian
586
PEDESTRIAN
4092
Scooter (electric)
39
Scooter (non-Electric)
4
Unknown
2
Unknown Type Of Non-Motorist
7
Unknown, Wheelchair (electric)
2
Wheelchair (electric)
6
Wheelchair (non-electric)
1
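The table above mixes upper-case and mixed-case labels for the same kind of road user (for example "PEDESTRIAN" and "Pedestrian"), so the categories need to be collapsed before they can be compared. The sketch below shows one possible grouping. The helper `group_non_motorist` and the three groups it returns are assumptions made for illustration, not the grouping used in the report.

```r
# A few labels copied from the frequency table.
labels <- c("PEDESTRIAN", "Pedestrian", "BICYCLIST", "Cyclist (Electric)",
            "Scooter (electric)", "OTHER, PEDESTRIAN", "Unknown")

# Hypothetical helper: normalise case, then match on keywords.
# Labels naming both a cyclist and a pedestrian fall into "Pedestrian"
# because that pattern is checked first (an arbitrary choice).
group_non_motorist <- function(x) {
  x <- toupper(x)
  ifelse(grepl("PEDESTRIAN", x), "Pedestrian",
         ifelse(grepl("CYCLIST", x), "Bicyclist", "Other"))
}

non_motorist_group <- group_non_motorist(labels)
table(non_motorist_group)
```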
Create a new column indicating if it is a driver or non-motorist
library(ggplot2)
# Create a violin plot for light conditions vs. injury severity
ggplot(combined_data, aes(x = light1, y = as.numeric(injury_severity_binary))) +
  geom_violin(fill = "skyblue", alpha = 0.7) +
  geom_jitter(width = 0.2, alpha = 0.4, color = "purple") +
  labs(title = "Injury Severity by Light Condition",
       x = "Light Condition",
       y = "Injury Severity (0 = Non-severe, 1 = Severe)") +
  theme_minimal() +
  coord_flip()  # Flip for better readability
Warning: Removed 79 rows containing non-finite outside the scale range
(`stat_ydensity()`).
Warning: Removed 79 rows containing missing values or values outside the scale range
(`geom_point()`).
Comments :
The graph shows that severe injuries occur most often in daylight or where streetlights are on, and fewer occur in complete darkness. Some neighborhoods have no streetlights at night, which is normal. At night most people are asleep, so there are fewer drivers on the road. Around 7 AM and in the evening, when traffic is heavy, crashes may increase because of rush hour. This pattern matches everyday experience: severe crashes mostly happen during the day and early evening.
p-value < 2.2e-16: Because the p-value is far below 0.05, we reject the null hypothesis. This indicates a statistically significant association between weather conditions and crashes involving non-motorists, although it does not by itself show that weather causes those crashes.
3-Do you think weather conditions play a role in accidents involving non-motorists?
library(ggplot2)
ggplot(combined_data, aes(x = weather1, fill = non_motorist_group)) +
  geom_bar(position = "fill") +
  theme_minimal() +
  labs(title = "Proportion of Non-Motorist Crashes by Weather Condition",
       x = "Weather Condition",
       y = "Proportion of Crashes",
       fill = "Non-Motorist Type") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Comments :
The graph shows that pedestrians make up the largest share of non-motorist crashes, so they appear to be the most at risk. People outside a vehicle, no matter the type of accident, are more exposed. After pedestrians, bicyclists are also at high risk, which is not surprising: the roads they use make them more vulnerable to crashes.
4- How does the vehicle model affect the likelihood of a crash occurring under different speed limits?
# Create a stacked bar plot to visualize the number of crashes by vehicle make and speed limit
ggplot(combined_data %>%
         filter(vehicle_make %in% c("TOYOTA", "GMC", "FORD", "CHEVROLET", "DODG", "LEXUS")),
       aes(x = vehicle_make, fill = as.factor(speed_limit))) +
  geom_bar() +
  labs(title = "Stacked Number of Crashes by Vehicle Model and Speed Limit",
       x = "Vehicle Make",
       y = "Number of Crashes",
       fill = "Speed Limit") +
  # Apply the complete theme first; adding it last would reset the axis text rotation
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))  # Rotate x-axis labels
Comments :
It is not surprising that Toyota has the highest number of accidents. This is not because Toyota vehicles are inherently more prone to accidents, but because they are more commonly owned in Maryland. Many adults recommend Toyotas to teenagers, which likely increases the number of these vehicles on the road. On the other hand, makes like Dodge and Lexus appear in fewer accidents simply because fewer people drive them.
5-Are there specific areas with a high concentration of crashes involving non-motorists?
# Merge non-motorist and driver crashes by location
crash_counts <- merge(non_motorist_crashes, driver_crashes, by = "location1", all = TRUE)
crash_counts[is.na(crash_counts)] <- 0  # Replace NAs with 0
# Create a heatmap for non-motorist crashes
ggplot(combined_data, aes(x = longitude1, y = latitude1)) +
  geom_bin2d(bins = 30) +  # Adjust the number of bins for heatmap
  scale_fill_gradientn(colors = c("pink", "yellow", "blue", "black")) +  # Custom colors
  labs(title = "Heatmap of Non-Motorist Crash Locations",
       x = "Longitude",
       y = "Latitude",
       fill = "Crash Density") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, size = 14, face = "bold"))
Comments :
This heatmap visualizes the density of non-motorist crashes based on their geographic locations, using longitude and latitude coordinates. The map area is divided into a grid of 30 bins along each axis, where each cell represents a specific area, and the crash count in each cell is color-coded. The gradient moves from pink for lower-density areas to black for the highest crash concentrations, with yellow and blue in between. This makes it possible to identify high-risk areas where non-motorist accidents frequently occur. The plot is styled with a minimal theme for clarity, and the title is centered and bolded to enhance readability.
In addition, when the densest area is located on a real map, it corresponds to Rockville. So the city where non-motorists are the most exposed to crashes is Rockville.
6- How does the presence of alcohol or drugs affect crash severity for drivers and non-motorists?
# Check unique values in the driver substance column
unique(combined_data$driver_substance_abuse1)
[1] "NONE DETECTED"
[2] "UNKNOWN"
[3] "N/A"
[4] "ALCOHOL PRESENT"
[5] "ALCOHOL CONTRIBUTED"
[6] "ILLEGAL DRUG CONTRIBUTED"
[7] "MEDICATION PRESENT"
[8] "COMBINED SUBSTANCE PRESENT"
[9] "ILLEGAL DRUG PRESENT"
[10] "OTHER"
[11] "Not Suspect of Alcohol Use, Not Suspect of Drug Use"
[12] "Unknown, Unknown"
[13] "Suspect of Alcohol Use, Not Suspect of Drug Use"
[14] "Suspect of Alcohol Use, Unknown"
[15] "Not Suspect of Alcohol Use, Unknown"
# Check unique values in the non-motorist substance column
unique(combined_data$non_motorist_substance_abuse1)
[1] "NONE DETECTED"
[2] "N/A"
[3] "ALCOHOL PRESENT"
[4] "ALCOHOL CONTRIBUTED"
[5] "ILLEGAL DRUG CONTRIBUTED"
[6] "UNKNOWN"
[7] "N/A, NONE DETECTED"
[8] "ALCOHOL CONTRIBUTED, ALCOHOL PRESENT"
[9] "COMBINED SUBSTANCE PRESENT"
[10] "ILLEGAL DRUG PRESENT"
[11] "OTHER"
[12] "NONE DETECTED, UNKNOWN"
[13] "N/A, UNKNOWN"
[14] "MEDICATION PRESENT"
[15] "COMBINATION CONTRIBUTED"
[16] "ALCOHOL PRESENT, NONE DETECTED"
[17] "Not Suspect of Alcohol Use, Suspect of Drug Use"
[18] "Not Suspect of Alcohol Use, Not Suspect of Drug Use"
[19] "Suspect of Alcohol Use, Suspect of Drug Use"
[20] "Suspect of Alcohol Use, Not Suspect of Drug Use"
[21] "Unknown, Unknown"
[22] "Not Suspect of Alcohol Use, Not Suspect of Drug Use, Not Suspect of Alcohol Use, Not Suspect of Drug Use"
[23] "Not Suspect of Alcohol Use, Unknown"
[24] "Not Suspect of Alcohol Use, Not Suspect of Drug Use, Not Suspect of Alcohol Use, Not Suspect of Drug Use, Not Suspect of Alcohol Use, Not Suspect of Drug Use"
[25] "Suspect of Alcohol Use, Unknown"
[26] "Unknown, Not Suspect of Drug Use"
# Check the frequency of substances for drivers
table(combined_data$driver_substance_abuse1)
ALCOHOL CONTRIBUTED
22
ALCOHOL PRESENT
54
COMBINED SUBSTANCE PRESENT
2
ILLEGAL DRUG CONTRIBUTED
2
ILLEGAL DRUG PRESENT
4
MEDICATION PRESENT
3
N/A
1012
NONE DETECTED
3939
Not Suspect of Alcohol Use, Not Suspect of Drug Use
678
Not Suspect of Alcohol Use, Unknown
2
OTHER
1
Suspect of Alcohol Use, Not Suspect of Drug Use
3
Suspect of Alcohol Use, Unknown
4
UNKNOWN
700
Unknown, Unknown
135
# Check the frequency of substances for non-motorists
table(combined_data$non_motorist_substance_abuse1)
ALCOHOL CONTRIBUTED
39
ALCOHOL CONTRIBUTED, ALCOHOL PRESENT
2
ALCOHOL PRESENT
147
ALCOHOL PRESENT, NONE DETECTED
3
COMBINATION CONTRIBUTED
2
COMBINED SUBSTANCE PRESENT
1
ILLEGAL DRUG CONTRIBUTED
2
ILLEGAL DRUG PRESENT
7
MEDICATION PRESENT
4
N/A
1203
N/A, NONE DETECTED
53
N/A, UNKNOWN
2
NONE DETECTED
4045
NONE DETECTED, UNKNOWN
4
Not Suspect of Alcohol Use, Not Suspect of Drug Use
670
Not Suspect of Alcohol Use, Not Suspect of Drug Use, Not Suspect of Alcohol Use, Not Suspect of Drug Use
54
Not Suspect of Alcohol Use, Not Suspect of Drug Use, Not Suspect of Alcohol Use, Not Suspect of Drug Use, Not Suspect of Alcohol Use, Not Suspect of Drug Use
18
Not Suspect of Alcohol Use, Suspect of Drug Use
2
Not Suspect of Alcohol Use, Unknown
4
OTHER
1
Suspect of Alcohol Use, Not Suspect of Drug Use
24
Suspect of Alcohol Use, Suspect of Drug Use
3
Suspect of Alcohol Use, Unknown
11
UNKNOWN
224
Unknown, Not Suspect of Drug Use
5
Unknown, Unknown
31
Prepare the data for logistic regression for drivers
# Prepare the data for logistic regression for drivers
# Create a binary variable for substance abuse (1 = substance, 0 = no substance)
combined_data$driver_substance_binary <- ifelse(combined_data$driver_substance_abuse1 %in% c("Alcohol", "Drugs"), 1, 0)
# For non-motorists
combined_data$non_motorist_substance_binary <- ifelse(combined_data$non_motorist_substance_abuse1 %in% c("Alcohol", "Drugs"), 1, 0)
# Remove NA
combined_data_cleaned <- combined_data[!is.na(combined_data$injury_severity1) &
                                         !is.na(combined_data$driver_substance_binary) &
                                         !is.na(combined_data$non_motorist_substance_binary), ]
logistic regression for driver_substance_abuse
driver_model <- glm(injury_severity1 ~ driver_substance_binary, data = combined_data_cleaned, family = binomial)
summary(driver_model)
Call:
glm(formula = injury_severity1 ~ driver_substance_binary, family = binomial,
data = combined_data_cleaned)
Coefficients: (1 not defined because of singularities)
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.04323 0.05959 -51.07 <2e-16 ***
driver_substance_binary NA NA NA NA
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 2399.4 on 6481 degrees of freedom
Residual deviance: 2399.4 on 6481 degrees of freedom
AIC: 2401.4
Number of Fisher Scoring iterations: 5
Comments :
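After fitting the model, I could not obtain a coefficient for the substance variable: the output reports it as NA ("not defined because of singularities"). This happens when the predictor has the same value in every row. Here the binary variable is 0 everywhere, because none of the recorded categories (for example "ALCOHOL PRESENT") is exactly "Alcohol" or "Drugs". The N/A entries in the data are therefore not the main cause.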
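The substance columns listed earlier hold labels such as "ALCOHOL PRESENT" or "Suspect of Alcohol Use, Unknown", none of which equals "Alcohol" or "Drugs" exactly, so an exact match leaves the indicator at 0 for every row. The sketch below shows one possible alternative based on pattern matching. It assumes that any label containing "PRESENT", "CONTRIBUTED" or an un-negated "Suspect of" counts as substance involvement. That rule is an assumption for illustration, and `recode_substance` is a hypothetical helper, not code from the report.

```r
# Hypothetical helper: 1 = some substance involvement recorded, 0 = otherwise.
recode_substance <- function(x) {
  x <- toupper(x)
  # Remove the negated phrases first so "NOT SUSPECT OF ALCOHOL USE" is not counted.
  cleaned <- gsub("NOT SUSPECT OF (ALCOHOL|DRUG) USE", "", x)
  as.integer(grepl("PRESENT|CONTRIBUTED|SUSPECT OF", cleaned))
}

# Labels copied from the unique values shown in the report.
sample_labels <- c("NONE DETECTED",
                   "ALCOHOL PRESENT",
                   "ILLEGAL DRUG CONTRIBUTED",
                   "Not Suspect of Alcohol Use, Not Suspect of Drug Use",
                   "Suspect of Alcohol Use, Unknown",
                   "Not Suspect of Alcohol Use, Suspect of Drug Use",
                   "UNKNOWN")

substance_binary <- recode_substance(sample_labels)
table(substance_binary)
```

Note that this sketch codes "UNKNOWN" and "N/A" as 0. Treating them as missing instead would be an equally defensible choice.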
# Logistic regression for non-motorist substance abuse and injury severity
non_motorist_model <- glm(injury_severity1 ~ non_motorist_substance_binary,
                          data = combined_data, family = binomial)
summary(non_motorist_model)
Call:
glm(formula = injury_severity1 ~ non_motorist_substance_binary,
family = binomial, data = combined_data)
Coefficients: (1 not defined because of singularities)
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.04323 0.05959 -51.07 <2e-16 ***
non_motorist_substance_binary NA NA NA NA
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 2399.4 on 6481 degrees of freedom
Residual deviance: 2399.4 on 6481 degrees of freedom
(79 observations deleted due to missingness)
AIC: 2401.4
Number of Fisher Scoring iterations: 5
7-Are we able to determine which agency responds the most just by looking at a map?
# Install the necessary package if not already installed
install.packages("leaflet", repos = "https://cloud.r-project.org/")
Installing package into 'C:/Users/satad/AppData/Local/R/win-library/4.3'
(as 'lib' is unspecified)
package 'leaflet' successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\satad\AppData\Local\Temp\RtmpEJTz1v\downloaded_packages
chooseCRANmirror(graphics = FALSE, ind = 1)  # This sets the CRAN mirror
install.packages("leaflet")
Installing package into 'C:/Users/satad/AppData/Local/R/win-library/4.3'
(as 'lib' is unspecified)
package 'leaflet' successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\satad\AppData\Local\Temp\RtmpEJTz1v\downloaded_packages
# Load the leaflet package
library(leaflet)
Warning: package 'leaflet' was built under R version 4.3.3
library(dplyr)
# Create a cleaned dataset for driver-related crashes
driver_data <- combined_data %>%
  filter(!is.na(latitude1) & !is.na(longitude1)) %>%
  select(local_case_number1, location1, latitude1, longitude1)
# Create a cleaned dataset for non-motorist-related crashes
non_motorist_data <- combined_data %>%
  filter(!is.na(latitude2) & !is.na(longitude2)) %>%
  select(local_case_number2, location2, latitude2, longitude2)
# Create a leaflet map for visualization
leaflet() %>%
  addTiles() %>%  # Add default OpenStreetMap tiles
  addCircles(data = driver_data, lat = ~latitude1, lng = ~longitude1,
             color = "blue", radius = 50,
             popup = ~paste("Driver Accident: ", local_case_number1)) %>%
  # Non-motorist layer, so the red legend entry has matching points on the map
  addCircles(data = non_motorist_data, lat = ~latitude2, lng = ~longitude2,
             color = "red", radius = 50,
             popup = ~paste("Non-Motorist Accident: ", local_case_number2)) %>%
  addLegend("bottomright", colors = c("blue", "red"),
            labels = c("Driver Accidents", "Non-Motorist Accidents"),
            title = "Accident Type")
# Define colors for the agencies (based on their names)
agency_colors <- c("Gaithersburg Police Depar" = "yellow",
                   "Montgomery County Police" = "green",
                   "Rockville Police Departme" = "purple",
                   "Takoma Park Police Depart" = "orange",
                   "Maryland-National Capital" = "red",
                   "MONTGOMERY" = "pink",
                   "ROCKVILLE" = "grey",
                   "MCPARK" = "lightblue",
                   "GAITHERSBURG" = "blue",
                   "TAKOMA" = "white")
# Create a cleaned dataset for driver-related crashes with agency information
driver_data <- combined_data %>%
  filter(!is.na(latitude1) & !is.na(longitude1)) %>%
  select(local_case_number1, location1, latitude1, longitude1, agency_name1)
# Create a cleaned dataset for non-motorist-related crashes with agency information
non_motorist_data <- combined_data %>%
  filter(!is.na(latitude2) & !is.na(longitude2)) %>%
  select(local_case_number2, location2, latitude2, longitude2, agency_name1)
# Create a leaflet map for visualization with agency-based coloring
leaflet() %>%
  addTiles() %>%  # Add default OpenStreetMap tiles
  addCircles(data = driver_data, lat = ~latitude1, lng = ~longitude1,
             color = ~agency_colors[agency_name1], radius = 50,
             popup = ~paste("Driver Accident: ", local_case_number1, "<br>Agency: ", agency_name1)) %>%
  addCircles(data = non_motorist_data, lat = ~latitude2, lng = ~longitude2,
             color = ~agency_colors[agency_name1], radius = 50,
             popup = ~paste("Non-Motorist Accident: ", local_case_number2, "<br>Agency: ", agency_name1)) %>%
  addLegend("bottomright",
            colors = c("yellow", "green", "purple", "orange", "red", "pink", "grey", "lightblue", "blue", "white"),
            labels = c("Gaithersburg Police", "Montgomery County Police", "Rockville Police", "Takoma Park Police", "Maryland-National Capital", "MONTGOMERY", "ROCKVILLE", "MCPARK", "GAITHERSBURG", "TAKOMA"),
            title = "Agency")
Input to asJSON(keep_vec_names=TRUE) is a named vector. In a future version of jsonlite, this option will not be supported, and named vectors will be translated into arrays instead of objects. If you want JSON object output, please use a named list instead. See ?toJSON.
8- What factors influence the frequency of accidents on specific road types?
library(dplyr)
# Create a new column for accident count per road type
accident_count_by_road <- combined_data %>%
  group_by(route_type1) %>%
  summarize(accident_count = n())
# Check the structure and data
head(accident_count_by_road)
# Chi-Square Test for association between Road Type and Weather
weather_road_table <- table(combined_data$route_type1, combined_data$weather1)
# Perform Chi-Square Test
chi_square_result <- chisq.test(weather_road_table)
Warning in chisq.test(weather_road_table): Chi-squared approximation may be
incorrect
# Check the expected frequencies
chi_square_result <- chisq.test(weather_road_table)
Warning in chisq.test(weather_road_table): Chi-squared approximation may be
incorrect
chi_square_result$expected
BLOWING SNOW CLEAR CLOUDY
1.3141289438 1017.7928669 133.38408779
Bicycle Route 0.0091449474 7.0827618 0.92821216
County 1.8454503887 1429.3013260 187.31321445
County Route 0.2862368541 221.6904435 29.05304070
Crossover 0.0018289895 1.4165524 0.18564243
Government 0.0393232739 30.4558756 3.99131230
Government Route 0.0027434842 2.1248285 0.27846365
Interstate (State) 0.0064014632 4.9579332 0.64974851
Local Route 0.0237768633 18.4151806 2.41335162
Maryland (State) 1.5820759031 1225.3177869 160.58070416
Maryland (State) Route 0.1655235482 128.1979881 16.80064015
Municipality 0.4307270233 333.5980796 43.71879287
Municipality Route 0.0676726109 52.4124371 6.86877000
Other Public Roadway 0.0484682213 37.5386374 4.91952446
Private Route 0.0100594422 7.7910380 1.02103338
Ramp 0.0192043896 14.8737997 1.94924554
Service Road 0.0009144947 0.7082762 0.09282122
Spur 0.0027434842 2.1248285 0.27846365
US (State) 0.1435756744 111.1993599 14.57293096
FOG, SMOG, SMOKE FOGGY
0.2190214906 4.818472794
Bicycle Route 0.0015241579 0.033531474
County 0.3075750648 6.766651425
County Route 0.0477061424 1.049535132
Crossover 0.0003048316 0.006706295
Government 0.0065538790 0.144185338
Government Route 0.0004572474 0.010059442
Interstate (State) 0.0010669105 0.023472032
Local Route 0.0039628105 0.087181832
Maryland (State) 0.2636793172 5.800944978
Maryland (State) Route 0.0275872580 0.606919677
Municipality 0.0717878372 1.579332419
Municipality Route 0.0112787685 0.248132907
Other Public Roadway 0.0080780369 0.177716811
Private Route 0.0016765737 0.036884621
Ramp 0.0032007316 0.070416095
Service Road 0.0001524158 0.003353147
Spur 0.0004572474 0.010059442
US (State) 0.0239292791 0.526444140
FREEZING RAIN OR FREEZING DRIZZLE N/A
0.4380429813 101.62597165
Bicycle Route 0.0030483158 0.70720927
County 0.6151501296 142.71483006
County Route 0.0954122847 22.13565005
Crossover 0.0006096632 0.14144185
Government 0.0131077580 3.04099985
Government Route 0.0009144947 0.21216278
Interstate (State) 0.0021338211 0.49504649
Local Route 0.0079256211 1.83874409
Maryland (State) 0.5273586344 122.34720317
Maryland (State) Route 0.0551745161 12.80048773
Municipality 0.1435756744 33.30955647
Municipality Route 0.0225575370 5.23334857
Other Public Roadway 0.0161560738 3.74820911
Private Route 0.0033531474 0.77793019
Ramp 0.0064014632 1.48513946
Service Road 0.0003048316 0.07072093
Spur 0.0009144947 0.21216278
US (State) 0.0478585581 11.10318549
OTHER RAIN RAINING SEVERE WINDS
2.628257888 19.49291267 133.60310928 1.752171925
Bicycle Route 0.018289895 0.13565005 0.92973632 0.012193263
County 3.690900777 27.37418077 187.62078951 2.460600518
County Route 0.572473708 4.24584667 29.10074684 0.381649139
Crossover 0.003657979 0.02713001 0.18594726 0.002438653
Government 0.078646548 0.58329523 3.99786618 0.052431032
Government Route 0.005486968 0.04069502 0.27892090 0.003657979
Interstate (State) 0.012802926 0.09495504 0.65081542 0.008535284
Local Route 0.047553727 0.35269014 2.41731443 0.031702484
Maryland (State) 3.164151806 23.46745923 160.84438348 2.109434537
Maryland (State) Route 0.331047096 2.45526597 16.82822740 0.220698064
Municipality 0.861454047 6.38911751 43.79058070 0.574302698
Municipality Route 0.135345222 1.00381039 6.88004877 0.090230148
Other Public Roadway 0.096936443 0.71894528 4.92760250 0.064624295
Private Route 0.020118884 0.14921506 1.02270995 0.013412590
Ramp 0.038408779 0.28486511 1.95244627 0.025605853
Service Road 0.001828989 0.01356501 0.09297363 0.001219326
Spur 0.005486968 0.04069502 0.27892090 0.003657979
US (State) 0.287151349 2.12970584 14.59686023 0.191434233
SLEET SNOW UNKNOWN WINTRY MIX
0.6570644719 8.103795153 8.103795153 3.066300869
Bicycle Route 0.0045724737 0.056393842 0.056393842 0.021338211
County 0.9227251943 11.380277397 11.380277397 4.306050907
County Route 0.1431184271 1.765127267 1.765127267 0.667885993
Crossover 0.0009144947 0.011278768 0.011278768 0.004267642
Government 0.0196616369 0.242493522 0.242493522 0.091754306
Government Route 0.0013717421 0.016918153 0.016918153 0.006401463
Interstate (State) 0.0032007316 0.039475690 0.039475690 0.014936747
Local Route 0.0118884316 0.146623990 0.146623990 0.055479348
Maryland (State) 0.7910379515 9.756134736 9.756134736 3.691510440
Maryland (State) Route 0.0827617741 1.020728547 1.020728547 0.386221613
Municipality 0.2153635117 2.656149977 2.656149977 1.005029721
Municipality Route 0.0338363054 0.417314434 0.417314434 0.157902759
Other Public Roadway 0.0242341107 0.298887365 0.298887365 0.113092516
Private Route 0.0050297211 0.062033227 0.062033227 0.023472032
Ramp 0.0096021948 0.118427069 0.118427069 0.044810242
Service Road 0.0004572474 0.005639384 0.005639384 0.002133821
Spur 0.0013717421 0.016918153 0.016918153 0.006401463
US (State) 0.0717878372 0.885383326 0.885383326 0.335009907
Comments :
The chi-squared test indicates that road type and weather condition are associated in the crash records, as the p-value is very small. This means the relationship is statistically significant. However, there is a warning saying the approximation may not be accurate. The warning appears because many combinations of road type and weather have very few accidents: most of the expected frequencies listed above are below 1, while the chi-squared approximation generally needs expected counts of about 5 or more in each cell.
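A common way to handle sparse tables like this is to check how many expected counts fall below 5 and, if many do, to compute the p-value by simulation instead of relying on the large-sample approximation. The sketch below does this on a small made-up table. The counts in `toy_table` are hypothetical, and `simulate.p.value` and `B` are standard arguments of `chisq.test()`.

```r
# Made-up counts: two common road types and one rare one.
toy_table <- matrix(
  c(120, 40, 3,
     90, 35, 1,
      2,  1, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    road = c("County", "Maryland (State)", "Ramp"),
    weather = c("CLEAR", "RAIN", "SNOW")
  )
)

# Expected counts under independence; values below 5 trigger the warning.
expected <- suppressWarnings(chisq.test(toy_table))$expected
share_small <- mean(expected < 5)  # fraction of cells with expected count < 5
share_small

# Monte Carlo p-value, which does not rely on the large-sample approximation.
set.seed(1)
sim_result <- chisq.test(toy_table, simulate.p.value = TRUE, B = 2000)
sim_result$p.value
```

Another option is to merge rare road types or weather conditions into broader groups before running the test.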
Conclusion :
This data set, which included crash reports for both drivers and non-motorists, allowed me to explore various aspects of accidents and gain a deeper understanding of their impact. Through my analysis, I was able to answer several of my questions, although some remain unanswered. My goal was to highlight the significant risks that crashes pose to people’s lives, not only for drivers and passengers but also for pedestrians and other non-motorists who may become victims of road accidents. Additionally, I aimed to raise awareness about the dangers of substances such as alcohol and drugs, which can impair judgment and reaction times, ultimately leading to tragic consequences.
Beyond the statistics and numbers, this data set made me realize that accidents are more than just events recorded in reports; they represent real people, families, and communities affected by tragedy. It made me think beyond what we see on the surface and consider the long-term emotional, physical, and financial consequences that accidents can have. Understanding these patterns is crucial in working toward safer roads and encouraging responsible behavior among all road users. This research reinforced the importance of preventive measures and the need for better awareness to reduce accidents and save lives.