In this project you will apply all the techniques we have studied so far including mapping.
The following file is the Montgomery County file of traffic crashes by driver. We will be working with this data for the rest of the semester. The point of this project is for you to use the tools at your disposal to perform an exploratory data analysis of the data and report your findings in a qmd document. This is not intended to be an in-depth analysis; it is simply an exercise in looking at and thinking about the data. In particular, you should be thinking about the questions you might ask and answer. Steps for Part 1: Create a new Quarto project in a separate folder on your system. Download the data set below into your project files; you should make a data subdirectory in the project directory to hold this data. Be sure to understand all the data fields, how they are related, and what they can tell you. Write your conclusions and use whatever means you have to substantiate them.
Loading everything in
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.4.4 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(leaflet)
Warning: package 'leaflet' was built under R version 4.3.3
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
dat <- vroom(...)
problems(dat)
Rows: 172105 Columns: 43
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (38): Report Number, Agency Name, ACRS Report Type, Crash Date/Time, Rou...
dbl (5): Local Case Number, Speed Limit, Vehicle Year, Latitude, Longitude
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
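The import call itself does not appear in the rendered output above. As a minimal sketch of how it could look, assuming the CSV was saved under the hypothetical path `data/crash_data.csv` and that column names were converted to snake_case (the later plots use names such as `speed_limit` and `acrs_report_type`), something like the following would work. The helper `to_snake()` is illustrative and not necessarily what was used here.

```r
library(readr)

# Hypothetical location of the downloaded file inside the project's data folder.
crash_path <- "data/crash_data.csv"

# Illustrative helper: "Speed Limit" -> "speed_limit",
# "Crash Date/Time" -> "crash_date_time".
to_snake <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", "_", x)  # collapse spaces and punctuation into "_"
  gsub("^_+|_+$", "", x)           # trim leading and trailing underscores
}

# Guarded so the sketch runs even when the file is not present.
if (file.exists(crash_path)) {
  moco_crashes <- read_csv(crash_path, show_col_types = FALSE)
  # Inspect the rows behind the "parsing issues" warning before going further.
  print(problems(moco_crashes))
  names(moco_crashes) <- to_snake(names(moco_crashes))
}
```

Calling `problems()` right after the read is worth the extra line, since the warning above indicates that some rows did not match the guessed column types.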
Speculate on how you might use this data and about who might be interested in it.
Most obviously, this dataset should tell us where concentrations of crashes exist and which roads have the most crashes. It would be useful in determining whether weather conditions play a role in crashes, in addition to road conditions and visibility. It could help determine whether the extent of injury can be predicted from the type of crash. It would allow us to see which vehicle types, makes, or models are most often involved in crashes. Finally, it would be useful in seeing where cyclists and pedestrians are often involved in crashes.
The most obvious entity that would appreciate this dataset is the MOCO Department of Transportation, which could see where the most accident-prone areas are and how those crashes happen. More police presence might then be directed to those areas to maintain safe roadways. Another group that might be interested is the MOCO park service, particularly for crashes involving animals. Areas with many animal-related crashes could be of concern to the park service, perhaps in managing the population of deer, an animal that can cause crashes. Tesla could also be interested: they would want to know whether any of their driverless vehicles are getting into crashes and, if so, which factors could be affecting those crashes. City management officials could find useful information about where pedestrians are most at risk of being struck by a vehicle.
ggplot(moco_crashes, aes(x = speed_limit, color = acrs_report_type)) +
  geom_density()
This plot shows that fatal crashes seem to be centered on roads with a 40 mph speed limit. Injury crashes and property damage crashes, on the other hand, are closely aligned with each other, with a mode at a 35 mph speed limit.
Let's take a look at how often pedestrians are involved in fatal crashes
ggplot(moco_fatal, aes(x = speed_limit, color = related_non_motorist)) +
  geom_density()
This graph shows that pedestrians are involved in deadly crashes to a similar extent as other related non-motorists, except bicyclists, who seem to have more involvement. However, the distribution of deadly crashes across speed limits seems to differ for each type of related non-motorist. Pedestrians specifically seem to be involved in deadly crashes most often when the speed limit is 35 to 40 mph.
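Each density curve is scaled so that its area is 1, so the curves compare the shape of the speed-limit distributions rather than how many crashes each group has. A count table is one way to check the "extent of involvement" reading directly. The sketch below uses a small made-up table; the filter value `"Fatal Crash"` is an assumption about how `moco_fatal` was built, since that step is not shown.

```r
library(dplyr)

# Toy stand-in for the crash table; every value here is made up for illustration.
toy_crashes <- tibble::tibble(
  acrs_report_type = c("Fatal Crash", "Fatal Crash", "Injury Crash",
                       "Fatal Crash", "Property Damage Crash"),
  related_non_motorist = c("PEDESTRIAN", "BICYCLIST", "PEDESTRIAN",
                           "PEDESTRIAN", NA),
  speed_limit = c(35, 40, 30, 40, 25)
)

# Assumed definition of the fatal subset (the label "Fatal Crash" is a guess).
toy_fatal <- toy_crashes |>
  filter(acrs_report_type == "Fatal Crash")

# Raw counts per non-motorist type and speed limit, largest first.
fatal_counts <- toy_fatal |>
  count(related_non_motorist, speed_limit, name = "n_crashes") |>
  arrange(desc(n_crashes))

fatal_counts
```

Running the same `count()` on the real `moco_fatal` would show whether bicyclists really account for more fatal crashes or whether their curve simply has a sharper peak.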
Leaflet to see concentrations of deadly pedestrian crashes
Assuming "longitude" and "latitude" are longitude and latitude, respectively
Two big hot spots I see are on Veirs Mill Road and Rockville Pike. Those roads have speed limits of around 35 to 45 mph, which matches well with the previous visualization. Other clusters include University Blvd East, clusters near Adelphi, and a couple of clusters on Georgia Ave. When clicking around, most of these crashes don't seem to be associated with alcohol or other substance abuse. Furthermore, in most of the crashes the driver seems not to be at fault, which I find interesting.
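The map code is not shown in the rendered output. A minimal sketch of a comparable leaflet map follows, using made-up coordinates; the popup fields `driver_at_fault` and `driver_substance_abuse` are assumed column names, chosen because the discussion above mentions clicking on crashes to see fault and substance information.

```r
library(leaflet)

# Toy rows standing in for fatal pedestrian crashes; coordinates are made up.
toy_ped_fatal <- data.frame(
  latitude = c(39.05, 39.08),
  longitude = c(-77.10, -77.15),
  driver_at_fault = c("No", "Yes"),
  driver_substance_abuse = c("NONE DETECTED", "NONE DETECTED")
)

# Naming lng/lat explicitly avoids the "Assuming ..." message leaflet prints
# when it has to guess the coordinate columns.
ped_map <- leaflet(toy_ped_fatal) |>
  addTiles() |>
  addCircleMarkers(
    lng = ~longitude,
    lat = ~latitude,
    radius = 4,
    popup = ~paste0("At fault: ", driver_at_fault,
                    "<br>Substance: ", driver_substance_abuse)
  )

# Evaluate `ped_map` on its own line in a Quarto chunk to render the widget.
```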
ggplot(blinding_lights) +
  geom_bar(aes(x = weather, fill = driver_at_fault), position = "dodge") +
  coord_flip() +
  theme_classic() +
  labs(
    title = "Are Headlights Blinding us?",
    subtitle = "Head On crashes at night with lights on",
    x = "Weather Condition",
    y = "Number",
    fill = "Driver at fault?",
    caption = "https://data.montgomerycountymd.gov/Public-Safety/Crash-Reporting-Drivers-Data/mmzv-x632/about_data"
  )
I can infer that drivers who were not at fault may not have been able to see, perhaps because they were blinded by the other vehicle's headlights in the head-on crash. However, in every weather condition except blowing snow, drivers are still more likely to be at fault than not. This suggests that headlights may not actually be blinding people to a significant degree.
# A tibble: 3 × 2
driver_at_fault observed
<chr> <int>
1 No 396
2 Unknown 24
3 Yes 580
Expected values
## no
.4354*1000
[1] 435.4
## number of observations in the subset
## unknown
.02731*1000
[1] 27.31
## yes
.5372*1000
[1] 537.2
The results of the test show significant evidence at α = .03 that at least one proportion of drivers at fault in the blinding lights subset differs from the corresponding proportion in the main data set.
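The test call is not shown in the rendered output. One way to run a goodness-of-fit test like this in base R is `chisq.test()` with the observed counts and the main-dataset proportions from the tables above. This is a sketch under that assumption, not necessarily the exact call used. The three proportions sum to 0.99991 because of rounding, so `rescale.p = TRUE` is set to let R normalize them.

```r
# Observed counts in the blinding-lights subset (from the tibble above).
observed <- c(No = 396, Unknown = 24, Yes = 580)

# Proportions of driver_at_fault in the main data set (from the expected-value
# calculations above); rounded, so they do not sum to exactly 1.
overall_props <- c(No = 0.4354, Unknown = 0.02731, Yes = 0.5372)

# Chi-square goodness-of-fit test: do the subset counts match the
# main-dataset proportions?
gof <- chisq.test(x = observed, p = overall_props, rescale.p = TRUE)

gof$expected   # expected counts under the main-dataset proportions
gof$statistic  # chi-square statistic on 2 degrees of freedom
gof$p.value    # compare against the chosen significance level
```

The largest gaps between observed and expected counts are in the "No" and "Yes" categories, which is where most of the test statistic comes from.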
Discussion:
The results of this project show some interesting patterns, but there is also a lot more I could explore given more time. The fatal crash data could be helpful in reducing pedestrian deaths. City officials could use that information to see where new speed cameras might be needed, or where added crosswalks would let people cross the road safely. For the blinding lights question, my analysis suggests that headlights are perhaps not a major problem and that drivers are still responsible for most of those head-on crashes at night. Despite that, the data could still be useful: it might be a good idea to refine how we teach people to drive at night. Overall, a number of other factors and variables, such as road condition and weather, could also be responsible for pedestrian deaths and for the head-on crashes at night.