Victorian Car Accidents involving Pedestrians

Inspired by Danny Cunningham’s article SUVs are Killing People, this notebook investigates accidents involving cars and pedestrians in the state of Victoria, Australia. The key question is: what injury risk do different types of common passenger vehicles pose to pedestrians in car accidents?

The Federal Chamber of Automotive Industries reports an uptrend in sales of larger passenger vehicles in Australia. Its March 2023 report demonstrates this, with SUV sales far outweighing passenger vehicle sales. The Guardian also has a great visual showing the trend over the years.

Public data on Victorian car accidents is available from VicRoads and from Data Vic.

Loading & cleaning the crash data

We first load in the three relevant tables from the Data Vic Crash Stats dataset: Accident, Person & Vehicle. The crash statistics data ranges from January 2006 to October 2020.

We will only look at accidents involving pedestrians and vehicles labelled as ‘Car’, ‘Station Wagon’ or ‘Utility’. We also keep only accidents in speed zones of 70 and below, both to rule out data entry errors and to focus on roads where pedestrians are likely to be present.

The same make & model is often labelled inconsistently as a car, station wagon or utility across crash records, so we categorise each make & model by its most frequent label. E.g. the Toyota Hilux is labelled as a car, station wagon and utility vehicle in the vehicle table depending on the individual crash id. Since it is labelled as a utility vehicle more often, we will categorise it as a utility vehicle.

library(tidyverse)
library(lubridate)

accident <- read_csv("ACCIDENT.csv") %>% 
  select(-c(DIRECTORY, EDITION, PAGE, 
            GRID_REFERENCE_X, GRID_REFERENCE_Y, NODE_ID)) %>%
  mutate(ACCIDENTDATE = lubridate::dmy(ACCIDENTDATE), # Parsing dates
         SPEED_ZONE = as.numeric(SPEED_ZONE)) 

person <- read_csv("PERSON.csv") %>%
  select(-c(VEHICLE_ID, SEATING_POSITION, HELMET_BELT_WORN, EJECTED_CODE)) %>%
  filter(ROAD_USER_TYPE == 1) # 1 being pedestrians

vehicle <- read_csv("VEHICLE.csv") %>%
  select(-c(INITIAL_DIRECTION, CONSTRUCTION_TYPE, FUEL_TYPE, FINAL_DIRECTION, TRAILER_TYPE,
            VEHICLE_COLOUR_1, VEHICLE_COLOUR_2, CAUGHT_FIRE, LAMPS, OWNER_POSTCODE, TOWED_AWAY_FLAG)) %>%
  mutate(Make_Model = paste(gsub(" ", "", VEHICLE_MAKE), VEHICLE_MODEL))

# List of Vehicles of interest
veh_interest <- c('Car', 'Station Wagon', 'Utility')

# Creating a combined table from accident, person, vehicle
pedestrians <- left_join(accident, person, by = 'ACCIDENT_NO') %>%
  left_join(., vehicle, by = 'ACCIDENT_NO') %>%
  filter(ACCIDENT_TYPE == 2,  # 2 being struck pedestrians
         `Vehicle Type Desc` %in% veh_interest,
         SPEED_ZONE <= 70) 

# Pedestrian data frame with Vehicle Type Desc reassigned by label frequency
# (n is the number of rows per label across the whole table, not per make & model)
p_recat <- pedestrians %>%
  group_by(`Vehicle Type Desc`) %>%
  summarise(n = n()) %>%
  left_join(pedestrians, by = c("Vehicle Type Desc")) %>%
  group_by(Make_Model) %>%
  arrange(desc(n), .by_group = TRUE) %>%
  mutate(`Vehicle Type Desc` = first(`Vehicle Type Desc`)) %>%
  select(-n)

p_recat
# A tibble: 14,016 × 61
# Groups:   Make_Model [1,558]
   Vehicle …¹ ACCID…² ACCIDENT…³ ACCID…⁴ ACCID…⁵ Accid…⁶ DAY_O…⁷ Day W…⁸ DCA_C…⁹
   <chr>      <chr>   <date>     <time>    <dbl> <chr>     <dbl> <chr>     <dbl>
 1 Car        T20180… 2018-04-28 21:40         2 Struck…       7 Saturd…     100
 2 Utility    T20180… 2018-01-08 21:47         2 Struck…       2 Monday      102
 3 Car        T20070… 2006-12-16 17:00         2 Struck…       7 Saturd…     100
 4 Car        T20110… 2011-04-28 17:20         2 Struck…       5 Thursd…     102
 5 Car        T20160… 2016-08-20 21:45         2 Struck…       7 Saturd…     100
 6 Car        T20110… 2011-12-13 12:35         2 Struck…       3 Tuesday     100
 7 Car        T20080… 2008-09-04 15:40         2 Struck…       5 Thursd…     107
 8 Car        T20090… 2009-03-24 10:00         2 Struck…       3 Tuesday     100
 9 Car        T20100… 2010-09-11 17:10         2 Struck…       7 Saturd…     103
10 Car        T20110… 2011-06-25 10:08         2 Struck…       7 Saturd…     102
# … with 14,006 more rows, 52 more variables: `DCA Description` <chr>,
#   LIGHT_CONDITION <dbl>, `Light Condition Desc` <chr>, NO_OF_VEHICLES <dbl>,
#   NO_PERSONS <dbl>, NO_PERSONS_INJ_2 <dbl>, NO_PERSONS_INJ_3 <dbl>,
#   NO_PERSONS_KILLED <dbl>, NO_PERSONS_NOT_INJ <dbl>, POLICE_ATTEND <dbl>,
#   ROAD_GEOMETRY <dbl>, `Road Geometry Desc` <chr>, SEVERITY <dbl>,
#   SPEED_ZONE <dbl>, PERSON_ID <chr>, SEX <chr>, AGE <dbl>, `Age Group` <chr>,
#   INJ_LEVEL <dbl>, `Inj Level Desc` <chr>, ROAD_USER_TYPE <dbl>, …
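
The "most frequent label per make & model" rule described earlier can be illustrated on a small toy table. This is a minimal base R sketch, not the notebook's own pipeline: the rows, the column names `vehicle_type` and `vehicle_type_recat`, and the helper `most_frequent()` are all made up for illustration.

```r
# Toy rows (hypothetical, not taken from the crash data)
toy <- data.frame(
  Make_Model   = c(rep("TOYOTA HILUX", 5), rep("BMW X5", 3)),
  vehicle_type = c("Utility", "Utility", "Utility", "Car", "Station Wagon",
                   "Station Wagon", "Station Wagon", "Car"),
  stringsAsFactors = FALSE
)

# Most frequent label in a vector (ties go to the label that sorts first)
most_frequent <- function(labels) names(which.max(table(labels)))

# One majority label per make & model
majority_label <- tapply(toy$vehicle_type, toy$Make_Model, most_frequent)

# Overwrite each row's label with the majority label of its make & model
toy$vehicle_type_recat <- unname(majority_label[toy$Make_Model])

# Share of rows that already carried the majority label, per make & model
agreement <- tapply(toy$vehicle_type == toy$vehicle_type_recat,
                    toy$Make_Model, mean)

print(majority_label)
print(round(agreement, 2))
```

The `agreement` vector is a quick check on how consistent the original labels were: a value near 1 means the make & model was almost always recorded under a single vehicle type.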

A quick look at the vehicle make/model and its categorisation:

library(reactable)

make_model <- p_recat %>% group_by(Make_Model, `Vehicle Type Desc`) %>% 
       summarise(n = n()) %>% arrange(desc(n)) %>% 
       pivot_wider(names_from = `Vehicle Type Desc`, values_from = n)

reactable::reactable(make_model)

Exploring the data

Most pedestrian accidents in the data involve vehicles labelled as cars. The counts per vehicle type and injury level are outlined below:

inj_factor <- c("Not injured", "Other injury","Serious injury","Fatality")

injury <- p_recat %>%
  mutate(`Inj Level Desc` = factor(`Inj Level Desc`, levels=inj_factor)) %>%
  filter(!is.na(`Inj Level Desc`)) %>%
  select(`Vehicle Type Desc`, `Inj Level Desc`) %>%
  group_by(`Vehicle Type Desc`, `Inj Level Desc`) %>%
  summarise(n = n())


injury_tbl <- injury %>%
  pivot_wider(names_from = `Inj Level Desc`, values_from = n) %>%
  rowwise() %>%
  mutate(sum = sum(c_across(where(is.numeric)), na.rm = T)) %>%
  ungroup() %>%
  arrange(desc(sum))

injury_tbl
# A tibble: 3 × 6
  `Vehicle Type Desc` `Not injured` `Other injury` Serious injur…¹ Fatal…²   sum
  <chr>                       <int>          <int>           <int>   <int> <int>
1 Car                           240           7225            5290     298 13053
2 Station Wagon                  14            412             305      25   756
3 Utility                         6             92              83       4   185
# … with abbreviated variable names ¹`Serious injury`, ²Fatality

library(plotly)

injury_bar <- injury %>% ggplot(aes(x = `Vehicle Type Desc`, y = n, fill = `Inj Level Desc`)) +
  geom_col(position = "fill") + ylab("proportion")


plotly::ggplotly(injury_bar)
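
The proportions shown in the stacked bar chart can also be read off numerically. The sketch below re-enters the counts from the injury table by hand (the data frame `injury_counts` and its column names are our own, chosen for illustration) and computes the share of fatalities and serious injuries per vehicle type.

```r
# Counts copied from the injury table printed above
injury_counts <- data.frame(
  vehicle_type   = c("Car", "Station Wagon", "Utility"),
  not_injured    = c(240, 14, 6),
  other_injury   = c(7225, 412, 92),
  serious_injury = c(5290, 305, 83),
  fatality       = c(298, 25, 4)
)

# Row totals across the four injury levels
injury_counts$total <- rowSums(injury_counts[, -1])

# Raw (unadjusted) shares per vehicle type
injury_counts$fatal_share   <- injury_counts$fatality / injury_counts$total
injury_counts$serious_share <- injury_counts$serious_injury / injury_counts$total

print(injury_counts[, c("vehicle_type", "total", "fatal_share", "serious_share")])
```

These raw fatality shares come out at roughly 2.3% for cars, 3.3% for station wagons and 2.2% for utility vehicles. They do not control for the speed zone, which the regressions below do.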

Linear regression estimating the effect of vehicle type & speed limit on pedestrian fatalities:

model_df <- p_recat %>%
  select(ACCIDENT_NO, VEHICLE_MAKE, VEHICLE_MODEL, Make_Model, `Vehicle Type Desc`,
         VEHICLE_BODY_STYLE ,`Inj Level Desc`, INJ_LEVEL, TARE_WEIGHT, SPEED_ZONE,
         `Light Condition Desc`, VEHICLE_YEAR_MANUF) %>%
  mutate(is_station_wagon = `Vehicle Type Desc` == 'Station Wagon',
         is_utility = `Vehicle Type Desc` == 'Utility',
         is_car = `Vehicle Type Desc` == 'Car',
         is_fatal = INJ_LEVEL == 1,           # 1 being fatality
         is_serious_injury = INJ_LEVEL == 2)  # 2 being serious injury

# Checking Station Wagon Categorisation
model_df %>%
  filter(`Vehicle Type Desc` == 'Station Wagon') %>%
  group_by(Make_Model) %>%
  summarise(n = n()) %>%
  arrange(desc(n))
# A tibble: 342 × 2
   Make_Model        n
   <chr>         <int>
 1 BMW X5           30
 2 KIA CARNIV       24
 3 HONDA CR-V       23
 4 LROV DISCOV      21
 5 FORD ESCAPE      18
 6 JEEP WRANGL      13
 7 JEEP GRDCHK      12
 8 LEXUS RX350      12
 9 JEEP PATRIO      10
10 DAIHAT TERIOS     9
# … with 332 more rows

lm_fatal <- lm(is_fatal ~ `Vehicle Type Desc` + SPEED_ZONE, data = model_df)
summary(lm_fatal)

Call:
lm(formula = is_fatal ~ `Vehicle Type Desc` + SPEED_ZONE, data = model_df)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.05824 -0.03154 -0.01708 -0.01708  1.01184 

Coefficients:
                                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)                      -0.0552147  0.0087942  -6.279 3.52e-10 ***
`Vehicle Type Desc`Station Wagon  0.0122435  0.0056395   2.171   0.0299 *  
`Vehicle Type Desc`Utility       -0.0008477  0.0111534  -0.076   0.9394    
SPEED_ZONE                        0.0014458  0.0001611   8.976  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1506 on 13990 degrees of freedom
  (22 observations deleted due to missingness)
Multiple R-squared:  0.005961,  Adjusted R-squared:  0.005748 
F-statistic: 27.96 on 3 and 13990 DF,  p-value: < 2.2e-16

Linear regression estimating the effect of vehicle type & speed limit on pedestrian serious injuries:

lm_serious <- lm(is_serious_injury ~ `Vehicle Type Desc` + SPEED_ZONE, data = model_df)
summary(lm_serious)

Call:
lm(formula = is_serious_injury ~ `Vehicle Type Desc` + SPEED_ZONE, 
    data = model_df)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.5363 -0.4377 -0.3838  0.5623  0.7239 

Coefficients:
                                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)                      0.1145696  0.0285615   4.011 6.07e-05 ***
`Vehicle Type Desc`Station Wagon 0.0056356  0.0183156   0.308    0.758    
`Vehicle Type Desc`Utility       0.0447211  0.0362232   1.235    0.217    
SPEED_ZONE                       0.0053854  0.0005231  10.294  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4892 on 13990 degrees of freedom
  (22 observations deleted due to missingness)
Multiple R-squared:  0.00762,   Adjusted R-squared:  0.007408 
F-statistic: 35.81 on 3 and 13990 DF,  p-value: < 2.2e-16

Findings

Pedestrian fatalities:

The linear regression estimates that station wagons have a statistically significant effect (p < 0.05) on pedestrian fatalities. The model estimates that being struck by a station wagon adds about 1.22 percentage points (coefficient 0.0122) to a pedestrian's risk of fatality compared to a car, holding the speed zone constant. The effect of utility vehicles is not significant (p = 0.94), so it is inconclusive whether utility vehicles affect pedestrian fatalities differently from cars.
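
To make the coefficients concrete, the fitted fatality model can be evaluated by hand. This is a rough sketch that copies the estimates from the `summary(lm_fatal)` output above; the vector `coef_fatal` and the helper `predict_fatal()` are hypothetical names introduced here, not part of the notebook.

```r
# Estimates copied from the summary(lm_fatal) output
coef_fatal <- c(
  intercept     = -0.0552147,
  station_wagon =  0.0122435,
  utility       = -0.0008477,
  speed_zone    =  0.0014458
)

# Fitted probability of fatality for a given speed zone and vehicle type
predict_fatal <- function(speed_zone, vehicle_type = "Car") {
  shift <- switch(vehicle_type,
    "Car"           = 0,  # reference category
    "Station Wagon" = coef_fatal[["station_wagon"]],
    "Utility"       = coef_fatal[["utility"]],
    stop("unknown vehicle type: ", vehicle_type)
  )
  coef_fatal[["intercept"]] + shift + coef_fatal[["speed_zone"]] * speed_zone
}

p_car   <- predict_fatal(50, "Car")
p_wagon <- predict_fatal(50, "Station Wagon")

print(c(car = p_car, station_wagon = p_wagon, difference = p_wagon - p_car))
```

In a 50 km/h zone the fitted fatality risk is about 1.7% for a car and about 2.9% for a station wagon, a gap of 1.22 percentage points. Because a straight line is fitted to a yes/no outcome, the fitted value can fall below zero at low speed zones (for example `predict_fatal(30)` is negative), so predictions at the edges of the data should be read with care.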

To see the makes and models of station wagons/utility vehicles, please refer to the make and model table above, which can be sorted by station wagon or utility vehicle counts by clicking on the column headers.

The model also estimates a significant positive effect of the speed zone (speed limit) on fatalities: each additional 10 km/h of speed limit is associated with roughly 1.4 percentage points of additional fatality risk.

Pedestrian serious injuries:

The linear regression estimates that neither station wagons (p = 0.76) nor utility vehicles (p = 0.22) have a statistically significant effect, compared to cars, on the probability that a struck pedestrian is seriously injured.

Similarly to the fatality model, this model estimates a significant positive effect of the speed zone on serious injuries.
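
One way to see why these effects are inconclusive is to turn the estimates and standard errors into approximate 95% intervals. The sketch below copies the numbers from the `summary(lm_serious)` output and uses a normal approximation, which is an assumption on our part (reasonable with about 14,000 observations); the data frame `serious` is a name made up for this example.

```r
# Estimates and standard errors copied from the summary(lm_serious) output
serious <- data.frame(
  term      = c("Station Wagon", "Utility"),
  estimate  = c(0.0056356, 0.0447211),
  std_error = c(0.0183156, 0.0362232)
)

# Normal-approximation critical value for a two-sided 95% interval
z <- qnorm(0.975)

serious$lower <- serious$estimate - z * serious$std_error
serious$upper <- serious$estimate + z * serious$std_error

# TRUE when the interval contains zero, i.e. the effect is inconclusive
serious$spans_zero <- serious$lower < 0 & serious$upper > 0

print(serious)
```

Both intervals contain zero, so the data are compatible with station wagons and utility vehicles being either slightly safer or noticeably more harmful than cars in terms of serious injuries.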

Summary

The linear model estimates that station wagons increase the risk of fatality by about 1.22 percentage points compared to cars. This may seem small; however, given the shift towards SUVs and larger vehicles in Australia, it may result in more pedestrian deaths.

In the data, cars currently outweigh station wagons in the number of accidents involving pedestrians (Note: the vehicles listed in the data as station wagons are mainly SUVs - see the make and model table). However, we may see this shift, with station wagons becoming more frequently involved in pedestrian accidents.

Could the shift in consumer demand towards larger vehicles sacrifice pedestrian safety for driver/passenger safety in Victoria?

Key Constraint:

Vehicle descriptions rely on the data entered in each individual crash report. This means vehicle descriptions may vary across reports, and they may also evolve over time with shifts in the car industry (e.g. Station Wagons in the 2000s may refer to cars more like the Holden Commodore Wagon, whereas in the 2010s they may refer to cars more like the Toyota RAV4).
This constraint is addressed by categorising vehicles based on the frequency of their label per make and model.