Survival and fatalities in car accidents

Every year, drivers throughout the world are killed or injured in road traffic. Young drivers run a greater risk everywhere, and this problem is still largely unsolved. Better understanding of the underlying processes could, however, be a useful tool in preventive endeavors. (“Reducing Crashes and Injuries Among Young Drivers: What Kind of Prevention Should We Be Focusing on?” 2006)

This research aims to help reduce fatalities in future car accidents by pinpointing the key factors that improve an occupant's chances of surviving a crash. Understanding which factors influence the probability of death can raise driver awareness, encourage safer driving habits, and inform vehicle and highway safety regulations, all of which can improve survival rates. The observations include the estimated impact speed, the age of the occupant, the age of the vehicle, whether the vehicle was equipped with airbags, and whether seat belts were fastened.

Hypotheses

1. People who wear seat belts are more likely to survive a car crash.

2. Increasing the speed of the vehicle increases the odds of death.

3. Older occupants are more likely to die in a car crash.

Data

The “nassCDS” dataset contains US data for 1997-2002 from police-reported car crashes in which there was a harmful event (to people or property) and from which at least one vehicle was towed. Data are restricted to front-seat occupants and include only a subset of the variables recorded.

Description of the variables taken or derived from the “nassCDS” dataset

  1. dvcat: ordered factor of estimated impact speed, with levels 1-9km/h, 10-24, 25-39, 40-54, 55+
  2. dead: factor with levels alive, dead
  3. airbag: factor with levels airbag, none
  4. seatbelt: factor with levels belted, none
  5. sex: factor with levels f (female), m (male)
  6. ageOFocc: age of the occupant in years
  7. yearacc: year of the accident
  8. yearVeh: model year of the vehicle; a numeric vector
  9. carage: age of the car in years, computed below as yearacc - yearVeh
  10. speed: numeric recoding of “dvcat”, created below: 1 = 1-9km/h, 2 = 10-24, 3 = 25-39, 4 = 40-54, 5 = 55+

Loading the packages

library(DAAG)
library(Cite)
library(texreg)
library(stargazer)
library(ggthemes)
library(plotly)
library(Zelig)
library(dplyr)
library(ggplot2)

Overview of the data

nassCDS2 <- nassCDS %>%
  # drop rows with a missing vehicle model year
  filter(!is.na(yearVeh)) %>%
  mutate(carage = yearacc - yearVeh,  # age of the car at the time of the crash
         seatbelt = as.factor(seatbelt),
         sex = as.factor(sex),
         alive = ifelse(dead == "alive", 1, 0)) %>%  # 1 = survived, 0 = died
  select(alive, ageOFocc, carage, seatbelt, sex, airbag, dvcat)

#Creating a new variable "speed"
s <- NA
nassCDS2$speed <- s
nassCDS2$speed[nassCDS2$dvcat == "1-9km/h"] <- 1
nassCDS2$speed[nassCDS2$dvcat == "10-24"] <- 2
nassCDS2$speed[nassCDS2$dvcat == "25-39"] <- 3
nassCDS2$speed[nassCDS2$dvcat == "40-54"] <- 4
nassCDS2$speed[nassCDS2$dvcat == "55+"] <- 5
head(nassCDS2)
##   alive ageOFocc carage seatbelt sex airbag dvcat speed
## 1     1       26      7   belted   f   none 25-39     3
## 2     1       72      2   belted   f airbag 10-24     2
## 3     1       69      9     none   f   none 10-24     2
## 4     1       53      2   belted   f airbag 25-39     3
## 5     1       32      9   belted   f   none 25-39     3
## 6     1       22     12   belted   f   none 40-54     4
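The recoding above assigns each speed band by hand. As a minimal sketch of a more compact alternative, the integer codes of an ordered factor give the same 1 to 5 values, provided the factor levels are stored in the order listed in the variable description. The toy factor `dvcat_toy` below is made up for illustration and is not part of the dataset.

```r
# Levels in the order given in the variable description (assumed to match nassCDS)
dvcat_levels <- c("1-9km/h", "10-24", "25-39", "40-54", "55+")

# Hypothetical toy values standing in for nassCDS2$dvcat
dvcat_toy <- factor(c("25-39", "10-24", "55+", "1-9km/h"),
                    levels = dvcat_levels, ordered = TRUE)

# as.integer() returns the position of each value among the levels
speed_toy <- as.integer(dvcat_toy)
print(speed_toy)
```

On the real data this would read `nassCDS2$speed <- as.integer(nassCDS2$dvcat)`. It is worth checking `levels(nassCDS2$dvcat)` first, since the result depends entirely on the level order.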

Running the logistic regression models

## FIRST regression model: seat belt use only
m1 <- glm(alive ~ seatbelt, family = binomial, data = nassCDS2)
summary(m1)

## SECOND regression model: adds impact speed, sex and age of the occupant
m2 <- glm(alive ~ seatbelt + dvcat + sex + ageOFocc, family = binomial, data = nassCDS2)
summary(m2)

## THIRD regression model: adds airbag and the seat belt by speed interaction
m3 <- glm(alive ~ seatbelt * dvcat + sex + ageOFocc + airbag, family = binomial, data = nassCDS2)
summary(m3)
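For reference, the first model can be written out as a logit equation. This is a sketch in notation introduced here, not taken from the original analysis: p_i is the probability that occupant i survives, and belted_i is assumed to equal 1 for a belted occupant and 0 otherwise.

```latex
% Model 1: alive ~ seatbelt, with a binomial family and logit link
\log\frac{p_i}{1 - p_i} = \beta_0 + \beta_1\,\mathrm{belted}_i,
\qquad p_i = \Pr(\mathrm{alive}_i = 1)

% Inverse link: survival probability implied by the linear predictor
p_i = \frac{1}{1 + \exp\!\bigl(-(\beta_0 + \beta_1\,\mathrm{belted}_i)\bigr)}
```

Models 2 and 3 have the same form, with further terms added to the right-hand side for speed, sex, age, airbag and the seat belt by speed interaction.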

Putting the results of regression in a table

Model 3 has the highest log likelihood (and so the lowest deviance) and the lowest AIC of the three models, which suggests it fits the data best. Lower values of AIC and BIC indicate a better fit; the table reports AIC.
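The comparison can be checked by hand from the log likelihoods reported in the regression table. The sketch below recomputes AIC as 2k - 2 logLik and adds likelihood-ratio tests for the nested pairs. The parameter counts `k` are an assumption based on counting the coefficients implied by each formula (intercept included), and the likelihood-ratio test is an addition for illustration, not part of the original analysis.

```r
# Log likelihoods copied from the regression table
loglik <- c(m1 = -4594.741, m2 = -3645.210, m3 = -3614.350)

# Number of fitted coefficients per model, intercept included (assumed counts)
k <- c(m1 = 2, m2 = 8, m3 = 13)

# AIC = 2k - 2 * logLik
aic <- 2 * k - 2 * loglik
print(aic)

# Likelihood-ratio statistics for the nested comparisons m1 vs m2 and m2 vs m3
lr_stat <- 2 * diff(loglik)
lr_df   <- diff(k)
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
print(data.frame(lr_stat, lr_df, p_value))
```

The recomputed AIC values match the Akaike Inf. Crit. row of the table, which supports the assumed parameter counts. With the fitted objects available, `AIC(m1, m2, m3)` and `anova(m1, m2, m3, test = "Chisq")` give the same comparison directly.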

# dvcat is an ordered factor, so glm() fits polynomial contrasts (linear,
# quadratic, cubic, quartic) instead of one dummy variable per speed band.
# Labels follow the coefficient order: seatbelt, dvcat, sex, ageOFocc, airbag, interactions.
stargazer(m1, m2, m3, type = "html",
          title = "Summary of Regression Results", align = TRUE,
          covariate.labels = c("Seatbelt (belted)",
                               "Speed: linear", "Speed: quadratic",
                               "Speed: cubic", "Speed: quartic",
                               "Sex (male)", "Age", "Airbag",
                               "Seatbelt x Speed: linear", "Seatbelt x Speed: quadratic",
                               "Seatbelt x Speed: cubic", "Seatbelt x Speed: quartic"))
Summary of Regression Results
Dependent variable: alive

                                 (1)          (2)          (3)
Seatbelt (belted)             1.261***     0.958***       3.734
                              (0.061)      (0.067)     (32.114)
Speed: linear                             -3.578***    -2.424***
                                           (0.370)      (0.379)
Speed: quadratic                            -0.295      -0.758**
                                           (0.314)      (0.323)
Speed: cubic                                 0.305       0.343*
                                           (0.197)      (0.208)
Speed: quartic                              -0.124       -0.104
                                           (0.097)      (0.114)
Sex (male)                                  -0.037       -0.038
                                           (0.067)      (0.067)
Age                                       -0.033***    -0.033***
                                           (0.002)      (0.002)
Airbag                                                    0.060
                                                        (0.066)
Seatbelt x Speed: linear                                 -9.074
                                                      (101.554)
Seatbelt x Speed: quadratic                               6.556
                                                       (85.829)
Seatbelt x Speed: cubic                                  -3.394
                                                       (50.777)
Seatbelt x Speed: quartic                                 1.205
                                                       (19.192)
Constant                      2.326***     4.063***     3.708***
                              (0.040)      (0.151)      (0.153)
Observations                   26,216       26,216       26,216
Log Likelihood             -4,594.741   -3,645.210   -3,614.350
Akaike Inf. Crit.           9,193.482    7,306.420    7,254.700
Note: *p<0.1; **p<0.05; ***p<0.01
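Logit coefficients are easier to read once converted to odds ratios and probabilities. Model 1 contains only the seat belt term, so the single slope in column (1), 1.261, is the seat belt coefficient and 2.326 is the constant. The sketch below works through the conversion. It assumes that unbelted occupants are the reference group, which is consistent with the positive sign of the coefficient; the variable names are introduced here for illustration.

```r
# Coefficients of model 1 (alive ~ seatbelt), copied from column (1) of the table
b0     <- 2.326  # constant: log odds of survival in the reference group (assumed unbelted)
b_belt <- 1.261  # seat belt coefficient: change in the log odds of survival

# Odds ratio: multiplicative change in the odds of survival when belted
odds_ratio <- exp(b_belt)

# Predicted survival probabilities via the inverse logit, plogis()
p_unbelted <- plogis(b0)
p_belted   <- plogis(b0 + b_belt)

print(round(c(odds_ratio = odds_ratio,
              p_unbelted = p_unbelted,
              p_belted   = p_belted), 3))
```

Under these assumptions the odds of survival are roughly 3.5 times higher for belted occupants, and the predicted survival probability rises from about 91% to about 97%. With the fitted model available, `exp(coef(m1))` gives the odds ratios directly.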

Graphs showing relationships between variables

Older people suffer more in car accidents

ggplot(nassCDS2, aes (x=ageOFocc, y=alive)) + geom_smooth() + theme_solarized() + labs(x="Age", y="Alive", title="Relationship between Age and Survival")
## `geom_smooth()` using method = 'gam'

Younger people tend to have newer cars

#Graph 1
ggplot(nassCDS2, aes(x = ageOFocc, y = carage)) + geom_density2d() + labs(x="Age", y="Carage", title="Relationship between the age of the occupant and age of the car")

#Graph 2
# `fun` replaces the deprecated `fun.y` argument (ggplot2 3.3.0 and later)
ggplot(nassCDS2, mapping = aes(x = carage, y = ageOFocc)) + geom_point(color = "grey") + stat_summary(fun = "mean", colour = "blue", geom = "line")

Younger people tend to drive faster

ggplot(nassCDS2, aes(x = ageOFocc, y = speed)) + geom_area(color = "orange") + theme_stata() + labs(x="Age", y="Speed", title="Relationship between the age of the occupant and speed of driving")

An overview of factors in fatal car crashes

The graph shows only fatal accidents and distinguishes seat belt use by color. It plots the speed category against the age of the car for each fatal crash. Fatal accidents occurred more often at higher speeds, and more of them involved newer cars. Even some occupants travelling very slowly did not survive when their seat belts were not fastened. The graph also suggests that occupants of older cars are less likely to wear seat belts than occupants of newer cars.

# alive is numeric (1 = survived, 0 = died), so compare it with the number 0
factors <- nassCDS2 %>%
  filter(seatbelt %in% c("belted", "none") & alive == 0)

factors_plot <- ggplot(data = factors, aes(x = carage , y = speed)) +
  geom_line(aes(color = seatbelt), size = 2)

factors_plot
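The claim about seat belt use in older cars is hard to read off a line plot, and a table of proportions is one way to check it. The sketch below uses a small made-up data frame, `toy`, in place of the `factors` data, and an arbitrary 8-year cutoff to split car age into two groups. Both are assumptions for illustration only, and the proportions it prints say nothing about the real data.

```r
# Hypothetical toy rows standing in for the fatal crashes in `factors` (values are made up)
toy <- data.frame(
  carage   = c(1, 2, 3, 12, 15, 9, 4, 11),
  seatbelt = factor(c("belted", "belted", "none", "none",
                      "none", "belted", "belted", "none"),
                    levels = c("none", "belted"))
)

# Split car age into two groups; the 8-year cutoff is arbitrary
toy$car_group <- cut(toy$carage, breaks = c(-Inf, 8, Inf),
                     labels = c("0-8 years", "9+ years"))

# Row-wise proportions: share of unbelted and belted occupants within each group
belt_share <- prop.table(table(toy$car_group, toy$seatbelt), margin = 1)
print(round(belt_share, 2))
```

Replacing `toy` with `factors` would apply the same calculation to the fatal crashes. Running it on all of `nassCDS2` would show whether the pattern also holds outside fatal accidents.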

Relationship between the driving category and age in fatal accidents

# alive is numeric (1 = survived, 0 = died), so compare it with the number 0
speeding <- nassCDS2 %>%
  filter(dvcat %in% c("1-9km/h", "10-24", "25-39", "40-54", "55+") & alive == 0)

speeding_plot <- ggplot(data = speeding, aes(x = ageOFocc)) +
  geom_bar(aes(color = dvcat), size = 2)

speeding_plot
## Warning: position_stack requires non-overlapping x intervals

The driving category and age in fatal accidents - interactive

ggplotly(speeding_plot)
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`
## Warning: position_stack requires non-overlapping x intervals

Conclusion

In this research I found that all my hypotheses were supported. Seat belt use increases the chances of surviving a car crash, older occupants face a higher fatality risk, and higher speed increases the risk of death.

The research also shows the relationships between the age of the occupant and the speed of driving, and between the age of the occupant and the age of the car.

Berg, Hans-Yngve. 2006. “Reducing Crashes and Injuries Among Young Drivers: What Kind of Prevention Should We Be Focusing on?” Injury Prevention 12 (suppl 1): i15–i18.