Car Crashes, Car Brands and those Behind the Wheel

Author

Ben Shilling

Introduction

A long time ago in a small town in Upstate New York, my 2005 Ford Focus ZX3 and I were ripping around the winding roads of our lovely small town until an immovable foe appeared in front of me: a tree that seemed to materialize as I came racing around a corner. Here my Ford Focus ZX3, known as the “shitbox” by friends and family alike, met its end.

The shitbox met its end that day, but from it a very hard lesson was learned: maybe drifting a front-wheel-drive car on a wet road is not a great idea. Now, with my ego in check and a half-baked degree in Business Analytics, I wanted to see whether my crash on that fateful day in 2021 was just another statistic or something out of the ordinary. Once we learn a little about these crashes, we can expand the analysis to something more.

Using crash data from NYC Open Data and the NYC Open API, we will learn about crashes and all the people and vehicles involved in them. These basic results brought up another question: what are the personality traits of the drivers involved in the crashes that lead to the highest proportion of deaths? Using about 40,000 comments from Reddit threads discussing drivers of the safest and deadliest car brands (by ratio of deaths to crashes), we will see which personality traits are associated with the most dangerous accidents.

What is going on NYC?

Every day there are about 600 reported car accidents within the confines of NYC. NYC and its great Open Data department track each vehicle, person and crash, along with all of the details that come with them. Using two sources from NYC Open Data and one from the NYC Open API, I compiled a list of 262,848 unique crashes as well as all of the information related to the vehicles and people involved. Using the NYC Open API I collected vehicle information on 6 car brands: BMW, Nissan, Honda, Chevy, Kia and Subaru. Using the API allowed me to create a data set tailored to the brands I wanted to study. These are well-known brands, but they do not represent the entirety of the car accidents available. This data set contains information on 236,386 accidents involving 1,070,535 cars.

Taking a quick look at our summaries by Borough, Year, State and Gender, these factors do not make much of a difference in whether an accident involves an injury or a death. When looking at summary population statistics, more likely than not things will smooth out at the macro level. In order to understand differences in crash outcomes, the analysis will have to get more granular than general location, state, and gender. Year looks different, but this could be the result of sampling bias. Since this data is still being collected today, NYC may be backlogged and unable to keep up with daily crashes. If so, the more harmful crashes get uploaded before the rest of the crashes are recorded, which would inflate the death and injury rates for recent years. One topic that I wanted to explore was how car brand affected these figures.

Borough              Crashes  Deaths  Injuries  % Deadly Crashes  % Injury Rate
BROOKLYN              200921     266     87110              0.13          43.36
QUEENS                185699     198     74529              0.11          40.13
MANHATTAN             108226      81     25124              0.07          23.21
BRONX                 101596     126     42859              0.12          42.19
STATEN ISLAND          28385      25     15863              0.09          55.89

Year                 Crashes  Deaths  Injuries  % Deadly Crashes  % Injury Rate
2024                   13862      43     12472              0.31          89.97
2023                   19630      40     20384              0.20         103.84
2022                   19601      45     14747              0.23          75.24
2021                   22522      75     17320              0.33          76.90
2020                   38070     258     28312              0.68          74.37
2019                  150937      76     64138              0.05          42.49
2018                  181000     236     70540              0.13          38.97
2017                  192368     204     70274              0.11          36.53
2016                  432148     437    171321              0.10          39.64

State Registration   Crashes  Deaths  Injuries  % Deadly Crashes  % Injury Rate
NY                    881343    1139    380807              0.13          43.21
NJ                     67788      92     28528              0.14          42.08
PA                     27972      56     14416              0.20          51.54
CT                     16558       6      8056              0.04          48.65
FL                     15645      34      7112              0.22          45.46
VA                      6823      22      3102              0.32          45.46
MA                      6305      32      2857              0.51          45.31
MD                      5964       0      3453              0.00          57.90
NC                      5954       0      3126              0.00          52.50
GA                      4500       8      2606              0.18          57.91

Gender               Crashes  Deaths  Injuries  % Deadly Crashes  % Injury Rate
M                     624191     882    267264              0.14          42.82
F                     352380     471    173204              0.13          49.15
U                      65353      44     21010              0.07          32.15
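
The two percentage columns in the tables above are consistent with dividing deaths and injuries by the number of crashes and multiplying by 100. The sketch below reproduces that arithmetic for the borough table. It is written in Python for illustration only; the function name `crash_rates` and the dictionary layout are assumptions of this example, not the code used for the original analysis.

```python
# Borough counts copied from the table above: (crashes, deaths, injuries).
borough_counts = {
    "BROOKLYN": (200921, 266, 87110),
    "QUEENS": (185699, 198, 74529),
    "MANHATTAN": (108226, 81, 25124),
    "BRONX": (101596, 126, 42859),
    "STATEN ISLAND": (28385, 25, 15863),
}


def crash_rates(crashes, deaths, injuries):
    """Return (deaths per 100 crashes, injuries per 100 crashes), rounded to 2 places.

    Hypothetical helper: the name and signature are illustrative only.
    """
    return round(100 * deaths / crashes, 2), round(100 * injuries / crashes, 2)


rates = {name: crash_rates(*counts) for name, counts in borough_counts.items()}

for name, (pct_deadly, pct_injury) in rates.items():
    print(f"{name:<14} {pct_deadly:5.2f} {pct_injury:7.2f}")
```

Because injuries are counted per person rather than per crash, the injury rate can exceed 100, as it does for 2023.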

Looking at another summary table, we see that BMW not only outpaces every other brand in deaths, it also accounts for a remarkably high proportion of the total crashes. BMW, being a luxury brand, should not be as accessible to everyone, meaning that a smaller population of cars on the road is accounting for the deadliest crashes. What is interesting is that BMW's injury rate does not stand out nearly as much as its death rate. So what could possibly be causing this extreme difference in deaths without the same difference in injuries?

Make                 Crashes  Deaths  Injuries  % Deadly Crashes  % Injury Rate
BMW                   204273     423     90193              0.21          44.15
Nissan                201885     111     85812              0.05          42.51
Honda                 204992     168     83907              0.08          40.93
Chevy                 208167     278     78345              0.13          37.64
Kia                   123682     225     71326              0.18          57.67
Subaru                127546     209     60017              0.16          47.06

Looking deeper into BMW, we can begin to see what its drivers are doing to produce this extremely high death rate. BMW accidents, like most accidents, occur because drivers are not paying attention. These accidents are usually minor and non-lethal, but when we look at accidents related to speed, dangerous maneuvers and substance use, BMW absolutely clears Honda in these categories. One can imagine that having more accidents attributed to these factors will lead to a higher death rate.

Reason for Crash                             BMW  Honda  Difference  % Difference
Driver Inattention/Distraction             50427  50665        -238         -0.47
Close Proximity/Improper Maneuvers         41182  40018        1164          2.91
Improper Passing or Lane Usage             20671  21096        -425         -2.01
Failure to Yield/Traffic Control Issues    17199  19340       -2141        -11.07
Vehicle Defects                             8432   8266         166          2.01
Unsafe Speed                                6859   4907        1952         39.78
Aggressive/Reckless Driving                 6050   4900        1150         23.47
Driver Health/Experience Issues             4555   4543          12          0.26
Substance Involvement                       4008   3149         859         27.28
Road/Environmental Conditions               2662   2030         632         31.13
Non-Electronic Distractions                 1423   1547        -124         -8.02
Fatigue/Drowsiness                          1071   1303        -232        -17.81
Pedestrian/Bicyclist Error                   541    639         -98        -15.34
Electronic Distractions                      182    266         -84        -31.58
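
The Difference and % Difference columns above match a comparison that uses Honda as the baseline: the difference is BMW minus Honda, and the percentage is that difference divided by the Honda count. A minimal Python sketch of that calculation follows. The helper name `compare_counts` is hypothetical, and only three rows from the table are included.

```python
# A few rows copied from the table above: reason -> (BMW count, Honda count).
reason_counts = {
    "Unsafe Speed": (6859, 4907),
    "Substance Involvement": (4008, 3149),
    "Failure to Yield/Traffic Control Issues": (17199, 19340),
}


def compare_counts(bmw, honda):
    """Return (difference, percent difference) with Honda as the baseline.

    Hypothetical helper for illustration; not the author's original code.
    """
    difference = bmw - honda
    pct_difference = round(100 * difference / honda, 2)
    return difference, pct_difference


comparison = {reason: compare_counts(*pair) for reason, pair in reason_counts.items()}

for reason, (difference, pct_difference) in comparison.items():
    print(f"{reason:<40} {difference:>6} {pct_difference:>7.2f}")
```

Using Honda as the denominator means a positive percentage reads as "BMW has this many percent more crashes of this type than Honda."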

Given the clear difference between the types of accidents that BMW drivers get into and the types that Honda drivers get into, the difference in the death numbers makes sense. To understand the differences between the two types of drivers, looking at personality traits can prove helpful. There are stereotypes of BMW drivers being rich, arrogant and unpersonable. To test this we need a lot of people saying a lot of things about our drivers, and what better place to find that than Reddit?

What are the Sentiments on Honda and BMW Drivers?

Reddit, as many know, is a forum where people can go and discuss anything, from politics to sword parrying. For any group of people with anything in common, there is a group of people willing to discuss it. Using Reddit I made two searches, “BMW Drivers” and “Honda Drivers.” From these searches I pulled every available comment from the posts made on these topics. From these two searches alone, 39,204 unique comments were harvested, giving me access to 433,908 score-able words. The comments come from 168 unique subreddits, notably r/IdiotsInCars, r/BMW, r/Honda, r/Cars and, because how can one not have it when talking about driving, r/driving. In total, 16,211 unique redditors added their input. With all of this data compiled, a simple positive-negative valence analysis can be run.

In order to conduct a valence analysis I needed to use a lexicon. I decided to combine 2 lexicons, “bing” and “nrc.” The nrc lexicon tags words with emotions as well as positive or negative sentiment, while the bing lexicon only uses positive and negative. Many words that I found interesting and relevant to the data set, like “douche” or “asshole,” were not found in one lexicon but were found in the other. Many of the words used in a public forum are not going to be the nicest, but since the text is raw and unfiltered, a general opinion can be extrapolated if the evidence shows it. With the same sentiment categories and roughly double the words, we can find how the world views each group.
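
To make the lexicon step concrete, here is a small Python sketch of combining two word lists and counting positive and negative words in a comment. Everything in it is an assumption made for illustration: the two toy dictionaries are invented stand-ins (the real bing and nrc lexicons are far larger, and which word belongs to which list here is made up), and the choice to keep the first lexicon's label when both contain a word is one possible rule, not necessarily the one used in the analysis.

```python
import re
from collections import Counter

# Invented toy stand-ins for the two lexicons (word -> "positive"/"negative").
LEXICON_A_TOY = {"reliable": "positive", "asshole": "negative", "slow": "negative"}
LEXICON_B_TOY = {"love": "positive", "douche": "negative", "slow": "negative"}


def combine_lexicons(*lexicons):
    """Merge lexicons into one dict; on overlap, the earliest lexicon's label wins.

    The tie-breaking rule is an assumption of this sketch.
    """
    combined = {}
    for lexicon in lexicons:
        for word, sentiment in lexicon.items():
            combined.setdefault(word, sentiment)
    return combined


def valence_counts(comment, lexicon):
    """Count how many score-able words in a comment are positive or negative."""
    words = re.findall(r"[a-z']+", comment.lower())
    return Counter(lexicon[word] for word in words if word in lexicon)


lexicon = combine_lexicons(LEXICON_A_TOY, LEXICON_B_TOY)
counts = valence_counts("I love my Honda, reliable but slow", lexicon)
print(counts["positive"], counts["negative"])
```

Words that appear in neither lexicon are simply skipped, which is why only a subset of each comment counts as "score-able."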

When looking at the negative versus positive analysis for each group, we can see that there is no real difference between the sentiments for the two brands. One thing to note is that many of the comments in a thread about a brand's drivers may not be about the driver at all but about the brand or the car. One issue people may have with Hondas is that, since they are mass-production cars, they are cheaper, have worse performance, and do not carry the same status that a BMW does. In order to find a general difference between the brands, we will need to compare the most common words associated with each brand.

In the two graphs, each of the words is in the lexicons, is considered score-able and shows up at least 200 times in the 40,000 comments. The word “shit” is the most common negative word for both brands, but it shows up a lot more for BMW than it does for Honda. What we can pull from this is that BMW has fewer negative words associated with it, but those negative words are used more often. People describe Honda with words like “slow”, “issues”, “damage” and “cheap.” More likely than not, these refer to the car itself. For BMW, people use words like “asshole”, “idiot”, “hate”, “worst”, and “douche”, which refer more to the people owning the vehicles.

When comparing positive words, we can see that “reliable”, “pretty” and “love” are common when referring to Honda, while for BMW “fast” and “money” represent the car in a positive light. The word “expensive” shows that the general sentiment is that BMWs are expensive while Hondas are cheap. Based on this we cannot really understand what the sentiment is toward the drivers themselves. To do this we will have to play with the comments a little more.

Both graphs are on the same scale

Each of the words shown in these graphs appeared directly after the word “driver” in the Reddit comments. The top 5 words used to describe BMW drivers negatively are “shit”, “asshole”, “bad”, “worst”, and “idiot.” Other notable words used to describe BMW drivers are “expensive”, “douche”, “fast” and “aggressive.” Comparing these words to the crash reasons listed in the NYC crashes section, it turns out that the behavior seen in the crash data is reflected in public opinion.
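
The "word directly after driver" step can be sketched as a simple adjacent-pair scan over each comment. The Python below is only an illustration: the function name `words_after_driver`, the decision to also match the plural “drivers”, and the sample comments and score-able set are all assumptions of this example rather than details from the analysis.

```python
import re
from collections import Counter


def words_after_driver(comments, scoreable_words):
    """Count score-able words that directly follow "driver" or "drivers".

    Matching the plural form is an assumption of this sketch.
    """
    counts = Counter()
    for comment in comments:
        tokens = re.findall(r"[a-z']+", comment.lower())
        # Walk over adjacent token pairs: (current word, the word after it).
        for current, following in zip(tokens, tokens[1:]):
            if current in ("driver", "drivers") and following in scoreable_words:
                counts[following] += 1
    return counts


# Invented sample comments and a tiny score-able set, for illustration only.
sample_comments = [
    "Every BMW driver aggressive and rude",
    "bad driver bad day",
    "Honda drivers are fine",
]
after_driver = words_after_driver(sample_comments, {"aggressive", "bad", "fine"})
print(after_driver.most_common())
```

In the third sample comment the word after “drivers” is “are”, which is not score-able, so “fine” is not counted even though it describes the drivers. This shows one limitation of looking only at the immediately following word.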

For Honda the most common negative word is “bad”, and the negative words that are common for BMW show up significantly less often. Even the positive words show up less, except for “fun.” It seems that the general consensus on Honda drivers is that they are not the problem; rather, some people take issue with the car itself. BMW is living up to the expectations set by the NYC data, and because of this we can say that maybe there is something different about BMW drivers compared to Honda drivers.

Conclusion

Using NYC Open Data and Reddit comments, we were able to find that car make is correlated with the severity of crashes. Since cars are machines, there has to be something about the people driving them that causes their crashes to be more severe. Through the analysis it became apparent that BMW stands out not only in the types of crashes but also in their deadliness. While injuries are prevalent, there is not much of a difference between BMW and other brands in terms of injuries. What the NYC data told us was that BMW drivers are involved in more accidents attributed to dangerous maneuvers, speeding and substance use than Honda drivers, and Honda had one of the lowest ratios of deaths to crashes.

Based on the public sentiment toward both groups of drivers, we could see that people in general do believe the stereotypes associated with BMW drivers: arrogant and unpleasant behavior that leads to more severe crashes. Because people view these groups so differently, and the crash data points the same way, individual driving habits may be more predictable than once thought. This has implications that can hurt or benefit you based on something as seemingly arbitrary as car brand.