Nova Scotia Road Safety Intelligence System

Provincial Corridor Model Validation

Author

Gavin Shklanka & Rachel Kodi

Published

March 17, 2026

---
title: "Nova Scotia Road Safety Intelligence System"
subtitle: "Provincial Corridor Model Validation"
author: "Gavin Shklanka & Rachel Kodi"
date: today
format:
  html:
    embed-resources: true
    toc: true
    toc-depth: 3
    theme: cosmo
    code-fold: true
    code-summary: "See 4 Urself"
    code-tools: true
    df-print: paged
execute:
  echo: true
  warning: false
  message: false
---




The Research Question

> **What factors are associated with higher motor vehicle collision severity on provincial highways in Nova Scotia, and how do traffic exposure and adverse weather conditions interact to amplify collision risk?**

This report evaluates an enhanced **severity-conditional-on-collision** modeling pipeline. The goal is not to predict whether a collision will occur, but rather to assess which recorded collisions are more likely to be severe.
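
The distinction can be written in probability terms. The notation here is introduced only for illustration and does not come from the project code: $Y = 1$ marks a severe outcome, $C = 1$ marks that a collision occurred, and $x$ is the feature vector for a record.

$$
\hat{p}(x) \approx P(Y = 1 \mid C = 1,\, x),
\qquad
P(Y = 1,\, C = 1 \mid x) = P(Y = 1 \mid C = 1,\, x)\, P(C = 1 \mid x)
$$

The pipeline targets only the first factor on the right-hand side. An unconditional severe-collision risk would also need the occurrence term $P(C = 1 \mid x)$, which is not estimated here.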

Executive Summary

This project develops a machine learning-based road safety intelligence system for provincial-corridor collisions in Nova Scotia.

Key findings:

* XGBoost produced the strongest discrimination (**AUC = 0.642**)
* Logistic Regression provided a transparent baseline (**AUC = 0.604**)
* Random Forest underperformed the Logistic Regression baseline (**AUC = 0.574**)
* Weather and exposure variables, especially **temperature**, **wind speed**, and **traffic volume**, were the strongest predictors
* Overall performance remained moderate because severe and non-severe collisions overlap heavily in feature space

**Bottom line:** this system is best interpreted as a **risk prioritization tool**, not a deterministic prediction engine.

Modeling Roadmap

The modeling process followed four steps:

1. Examine the data structure before fitting models
2. Train three candidate models of increasing flexibility
3. Compare performance on a held-out test set
4. Translate results into plain-language policy meaning
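
Step 3 relies on a held-out test set. A minimal sketch of a random split is shown here, assuming simulated data: the `collisions` data frame, its columns, and the 80/20 ratio are placeholders chosen for illustration, not the project's actual data or settings.

```r
# Minimal sketch of a random train/test split (base R only).
# ASSUMPTIONS: `collisions`, its columns, and the 80/20 ratio are
# hypothetical placeholders, not the project's real data or settings.
set.seed(42)

n <- 200
collisions <- data.frame(
  temp_c   = rnorm(n, mean = 5, sd = 10),
  wind_kph = runif(n, min = 0, max = 60),
  severe   = rbinom(n, size = 1, prob = 0.24)
)

# Draw 80% of the row indices at random for training; the rest is held out
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
train <- collisions[train_idx, ]
test  <- collisions[-train_idx, ]

c(train = nrow(train), test = nrow(test))
```

Because the indices are drawn without replacement, no record appears in both sets.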

::: {.callout-note}
The pre-model diagnostics in step 1 matter because they set expectations: if severe and non-severe collisions overlap heavily in feature space, even a well-tuned model will achieve only moderate discrimination.
:::

Pre-Model Diagnostics

Class Imbalance


::: {.cell}

```{.r .cell-code}
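# Packages needed for tibble(), the %>% pipe, mutate(), kbl(), and kable_styling()
library(tibble)
library(dplyr)
library(kableExtra)

# Class counts for the provincial corridor subset
class_tbl <- tibble(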
  Class = c("No", "Yes"),
  Count = c(1240, 387),
  Share = c(0.762, 0.238)
)

class_tbl %>%
  mutate(Share = scales::percent(Share, accuracy = 0.1)) %>%
  kbl(caption = "Collision severity distribution — provincial corridor subset") %>%
  kable_styling(full_width = FALSE)
```

::: {.cell-output-display}
`````{=html}
<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
<caption>Collision severity distribution — provincial corridor subset</caption>
 <thead>
  <tr>
   <th style="text-align:left;"> Class </th>
   <th style="text-align:right;"> Count </th>
   <th style="text-align:left;"> Share </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> No </td>
   <td style="text-align:right;"> 1240 </td>
   <td style="text-align:left;"> 76.2% </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Yes </td>
   <td style="text-align:right;"> 387 </td>
   <td style="text-align:left;"> 23.8% </td>
  </tr>
</tbody>
</table>

`````

:::
:::

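::: {.cell}

```{.r .cell-code}
knitr::include_graphics("001_4_Section_4_Exploratory_Diagnostics_figure.png")
```

::: {.cell-output-display}
![Collision severity distribution — provincial corridor subset](001_4_Section_4_Exploratory_Diagnostics_figure.png)
:::
:::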

The dataset is imbalanced: about 24% of collisions are severe. That makes accuracy a weak performance measure, so the evaluation focuses on ROC/AUC.
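
The weakness of accuracy can be checked with a short calculation. The sketch uses only the two class counts reported in the table above; the variable names are illustrative.

```r
# Counts taken from the class table above
counts <- c(No = 1240, Yes = 387)

# A trivial classifier that always predicts the majority class ("No")
baseline_accuracy <- max(counts) / sum(counts)

# It never predicts "Yes", so it finds none of the severe collisions
baseline_recall_severe <- 0 / counts[["Yes"]]

round(baseline_accuracy, 3)  # 0.762
```

A rule that ignores every feature already reaches about 76% accuracy while detecting no severe collisions, which is why a ranking metric such as AUC is more informative here.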

Continuous Feature Distributions

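::: {.cell}

```{.r .cell-code}
knitr::include_graphics("002_4_Section_4_Exploratory_Diagnostics_figure.png")
```

::: {.cell-output-display}
![Enhanced continuous feature distributions by severity](002_4_Section_4_Exploratory_Diagnostics_figure.png)
:::
:::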

The density plots show substantial overlap between severe and non-severe collisions. In practical terms, this means the classes are not cleanly separable using a simple rule.

Correlation Structure

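::: {.cell}

```{.r .cell-code}
knitr::include_graphics("003_4_Section_4_Exploratory_Diagnostics_figure.png")
```

::: {.cell-output-display}
![Feature correlation matrix](003_4_Section_4_Exploratory_Diagnostics_figure.png)
:::
:::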

Key structural observations:

* Traffic variables cluster strongly together
* Engineered interaction terms behave as expected
* Weather variables are not perfectly collinear

This supports the use of tree-based models, which can handle correlated predictors and nonlinear interactions more flexibly than a linear model.
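
One way to make the clustering of traffic variables concrete is to list predictor pairs whose correlation exceeds a cutoff. This is a sketch on simulated data: `aadt` and `truck_aadt` are hypothetical column names for traffic measures, and the 0.8 cutoff is an arbitrary choice for illustration.

```r
# Sketch: flag strongly correlated predictor pairs (base R only).
# ASSUMPTIONS: the data are simulated and the 0.8 cutoff is arbitrary;
# `aadt` and `truck_aadt` are hypothetical names for traffic measures.
set.seed(1)
n <- 300
aadt <- rlnorm(n, meanlog = 9, sdlog = 0.5)
features <- data.frame(
  aadt       = aadt,
  truck_aadt = aadt * runif(n, 0.08, 0.12),  # built from aadt, so correlated
  temp_c     = rnorm(n, 5, 10),
  wind_kph   = runif(n, 0, 60)
)

cor_mat <- cor(features)

# Keep each pair once (upper triangle) and list those above the cutoff
high <- which(abs(cor_mat) > 0.8 & upper.tri(cor_mat), arr.ind = TRUE)
data.frame(
  var1 = rownames(cor_mat)[high[, "row"]],
  var2 = colnames(cor_mat)[high[, "col"]],
  r    = cor_mat[high]
)
```

In this toy setup only the two traffic columns are flagged, mirroring the pattern described for the real correlation matrix.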

Candidate Models

Logistic Regression

Logistic Regression was used as the baseline because it is interpretable and provides a clean benchmark for binary classification.

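::: {.cell}

```{.r .cell-code}
logistic_tbl <- tibble(
  Model = "Logistic Regression",
  AUC = 0.604,
  Interpretation = "Transparent baseline with modest discrimination"
)

logistic_tbl %>%
  kbl(caption = "Logistic Regression summary") %>%
  kable_styling(full_width = FALSE)
```

::: {.cell-output-display}
`````{=html}
<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
<caption>Logistic Regression summary</caption>
 <thead>
  <tr>
   <th style="text-align:left;"> Model </th>
   <th style="text-align:right;"> AUC </th>
   <th style="text-align:left;"> Interpretation </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Logistic Regression </td>
   <td style="text-align:right;"> 0.604 </td>
   <td style="text-align:left;"> Transparent baseline with modest discrimination </td>
  </tr>
</tbody>
</table>

`````

:::
:::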

Interpretation: This model detects some signal, but the relationship between predictors and severity is not cleanly linear. It performs better than random guessing, but not strongly enough to serve as a standalone operational model.

Plain-language takeaway: This is the “basic benchmark” model. It gives a sensible starting point, but it does not capture enough complexity to separate severe from non-severe collisions well.
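
For readers who want to see what such a baseline looks like in code, the sketch below fits a logistic regression with base R's `glm()`. It assumes simulated data: the column names echo variables mentioned in this report, but the formula and coefficients are illustrative and are not the project's fitted model.

```r
# Sketch of a logistic regression baseline with base R's glm().
# ASSUMPTIONS: the data are simulated; the column names echo variables
# mentioned in this report, but the formula is illustrative only.
set.seed(7)
n <- 500
toy <- data.frame(
  temp_c   = rnorm(n, 5, 10),
  wind_kph = runif(n, 0, 60),
  aadt     = rlnorm(n, 9, 0.5)
)

# Simulate a weak signal so that the two classes overlap heavily
lin <- -1.2 + 0.01 * toy$wind_kph - 0.01 * toy$temp_c
toy$severe <- rbinom(n, size = 1, prob = plogis(lin))

fit <- glm(severe ~ temp_c + wind_kph + log(aadt),
           data = toy, family = binomial())

# Predicted probability of a severe outcome for each collision record
p_hat <- predict(fit, newdata = toy, type = "response")
summary(p_hat)
```

The fitted coefficients are on the log-odds scale, which is what makes this model easy to read as a benchmark.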

Random Forest

Random Forest was used to test whether nonlinear decision rules and interaction effects would improve performance beyond the linear baseline.

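::: {.cell}

```{.r .cell-code}
rf_tbl <- tibble(
  Model = "Random Forest",
  AUC = 0.574,
  Interpretation = "Flexible nonlinear model, but weaker held-out discrimination"
)

rf_tbl %>%
  kbl(caption = "Random Forest summary") %>%
  kable_styling(full_width = FALSE)
```

::: {.cell-output-display}
`````{=html}
<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
<caption>Random Forest summary</caption>
 <thead>
  <tr>
   <th style="text-align:left;"> Model </th>
   <th style="text-align:right;"> AUC </th>
   <th style="text-align:left;"> Interpretation </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Random Forest </td>
   <td style="text-align:right;"> 0.574 </td>
   <td style="text-align:left;"> Flexible nonlinear model, but weaker held-out discrimination </td>
  </tr>
</tbody>
</table>

`````

:::
:::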

Interpretation: Although Random Forest can model nonlinear effects, its held-out performance was weaker than Logistic Regression in this version of the dataset.

Plain-language takeaway: Adding flexibility alone did not guarantee better results. A more complicated model is not always a better model.

XGBoost

XGBoost was used as the most advanced candidate model because boosting can focus iteratively on harder-to-classify cases and capture more complex structure.

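::: {.cell}

```{.r .cell-code}
xgb_tbl <- tibble(
  Model = "XGBoost",
  AUC = 0.642,
  Interpretation = "Best-performing model on held-out discrimination"
)

xgb_tbl %>%
  kbl(caption = "XGBoost summary") %>%
  kable_styling(full_width = FALSE)
```

::: {.cell-output-display}
`````{=html}
<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
<caption>XGBoost summary</caption>
 <thead>
  <tr>
   <th style="text-align:left;"> Model </th>
   <th style="text-align:right;"> AUC </th>
   <th style="text-align:left;"> Interpretation </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> XGBoost </td>
   <td style="text-align:right;"> 0.642 </td>
   <td style="text-align:left;"> Best-performing model on held-out discrimination </td>
  </tr>
</tbody>
</table>

`````

:::
:::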

Interpretation: XGBoost achieved the strongest ranking performance of the three models, suggesting that severe collision risk is influenced by nonlinear combinations of weather, traffic exposure, and collision context.

Plain-language takeaway: This was the strongest model, but it is still not “predicting the future perfectly.” It is better understood as a tool for flagging higher-risk cases.

Final Comparative Evaluation

Test-Set Comparison

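::: {.cell}

```{.r .cell-code}
metrics_df <- tibble(
  Model = c("Logistic Regression", "Random Forest", "XGBoost"),
  `AUC-ROC` = c(0.604, 0.574, 0.642),
  Conclusion = c(
    "Transparent baseline",
    "Flexible but weaker generalization",
    "Best overall discrimination"
  )
)

metrics_df %>%
  arrange(desc(`AUC-ROC`)) %>%
  kbl(caption = "Model comparison — held-out test set") %>%
  kable_styling(full_width = FALSE)
```

::: {.cell-output-display}
`````{=html}
<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
<caption>Model comparison — held-out test set</caption>
 <thead>
  <tr>
   <th style="text-align:left;"> Model </th>
   <th style="text-align:right;"> AUC-ROC </th>
   <th style="text-align:left;"> Conclusion </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> XGBoost </td>
   <td style="text-align:right;"> 0.642 </td>
   <td style="text-align:left;"> Best overall discrimination </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Logistic Regression </td>
   <td style="text-align:right;"> 0.604 </td>
   <td style="text-align:left;"> Transparent baseline </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Random Forest </td>
   <td style="text-align:right;"> 0.574 </td>
   <td style="text-align:left;"> Flexible but weaker generalization </td>
  </tr>
</tbody>
</table>

`````

:::
:::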

ROC Comparison

::: {.cell}

```{.r .cell-code}
knitr::include_graphics("004_6_3_6_3_XGBoost_figure.png")
```

::: {.cell-output-display}
![ROC curves — enhanced provincial corridor model](004_6_3_6_3_XGBoost_figure.png)
:::
:::

XGBoost leads the final comparison, followed by Logistic Regression, then Random Forest.

Overall interpretation: The results suggest that severe collision prediction is feasible at a modest discrimination level. The main practical value of the system is in risk ranking and corridor monitoring, not exact event prediction.
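
The link between AUC and risk ranking can be made explicit: AUC equals the probability that a randomly chosen severe collision receives a higher score than a randomly chosen non-severe one. The sketch below computes it with the rank (Mann-Whitney) formulation. `auc_rank` is a helper written for this illustration, and the labels and scores are made-up toy values, not model output.

```r
# AUC via the rank (Mann-Whitney) formulation, base R only.
# `auc_rank` is a helper defined here for illustration; the scores and
# labels below are made-up toy values, not model output.
auc_rank <- function(score, label) {
  r <- rank(score)                 # average ranks handle tied scores
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

label <- c(0, 0, 1, 0, 1, 1, 0, 1)
score <- c(0.10, 0.35, 0.40, 0.45, 0.50, 0.30, 0.20, 0.80)

auc_rank(score, label)  # 0.8125: 13 of the 16 severe/non-severe pairs are ordered correctly
```

Read this way, an AUC of 0.642 means the best model orders a severe/non-severe pair correctly roughly 64% of the time, compared with 50% for random ordering.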

Variable Importance

::: {.cell}

```{.r .cell-code}
knitr::include_graphics("005_6_3_6_3_XGBoost_figure.png")
```

::: {.cell-output-display}
![Variable importance — enhanced predictor stack](005_6_3_6_3_XGBoost_figure.png)
:::
:::

Across Random Forest and XGBoost, the strongest variables were:

* temp_c
* wind_kph
* n_vehicles
* traffic exposure measures such as AADT and truck share
* interaction terms involving traffic and visibility or precipitation

Interpretation

* Weather conditions appear to shape the severity context of collisions
* Traffic volume acts as an exposure amplifier
* XGBoost appears to capture these interactions more effectively than the other models
* Behavioral indicators contributed less than expected, which may reflect weaker signal quality or underreporting

Conclusion: Severe collision risk in this subset appears to be driven more by environmental and exposure conditions than by isolated behavioral indicators.
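
Importance rankings of this kind can be cross-checked with a model-agnostic method. The sketch below uses permutation importance, which is a stand-in technique and not necessarily the measure behind the figure above: each feature is shuffled in turn and the resulting drop in AUC is recorded. The data are simulated, `noise` is a deliberately irrelevant column, and a logistic regression is used only to keep the example in base R.

```r
# Permutation importance sketch (base R): shuffle one feature at a time and
# measure how much the AUC drops. This is a model-agnostic stand-in, not the
# importance measure reported above. Data and column names are simulated.
set.seed(11)
n <- 600
toy <- data.frame(
  temp_c   = rnorm(n, 5, 10),
  wind_kph = runif(n, 0, 60),
  noise    = rnorm(n)                 # irrelevant by construction
)
toy$severe <- rbinom(n, 1, plogis(-1.5 + 0.04 * toy$wind_kph))

# Rank-based AUC helper (illustrative)
auc_rank <- function(score, label) {
  r <- rank(score)
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

fit <- glm(severe ~ temp_c + wind_kph + noise, data = toy, family = binomial())
base_auc <- auc_rank(predict(fit, toy, type = "response"), toy$severe)

# AUC drop after breaking the link between one feature and the outcome
perm_drop <- sapply(c("temp_c", "wind_kph", "noise"), function(v) {
  shuffled <- toy
  shuffled[[v]] <- sample(shuffled[[v]])
  base_auc - auc_rank(predict(fit, shuffled, type = "response"), toy$severe)
})

sort(perm_drop, decreasing = TRUE)
```

In this simulation only `wind_kph` drives the outcome, so it shows the largest drop. On real data, correlated predictors such as the traffic measures can share importance, so rankings should be read with that in mind.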

What the Results Mean

This is a severity classification model among already-observed collisions.

That means:

* it does not estimate where collisions will happen in the first place
* it does estimate which collisions are more likely to be severe once a collision has occurred
* threshold choice reflects policy tradeoffs, not a universal “correct” cutoff
* route-level traffic exposure improves realism, but is still an approximation
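
The threshold tradeoff can be illustrated with a few cutoffs applied to the same scores. This is a sketch on made-up toy values: `metrics_at` is a hypothetical helper, and the cutoffs are arbitrary.

```r
# Sensitivity and specificity at several cutoffs (base R only).
# The labels and scores are toy values; `metrics_at` is a hypothetical helper.
label <- c(0, 0, 1, 0, 1, 1, 0, 1)
score <- c(0.10, 0.35, 0.40, 0.45, 0.50, 0.30, 0.20, 0.80)

metrics_at <- function(cutoff) {
  pred <- as.integer(score >= cutoff)   # flag as severe at or above the cutoff
  c(
    cutoff      = cutoff,
    sensitivity = sum(pred == 1 & label == 1) / sum(label == 1),
    specificity = sum(pred == 0 & label == 0) / sum(label == 0)
  )
}

tradeoff <- t(sapply(c(0.25, 0.40, 0.60), metrics_at))
tradeoff
```

Lowering the cutoff catches more severe collisions at the cost of more false alarms, and raising it does the reverse. Which balance is acceptable depends on the cost of a missed severe case relative to the cost of an unnecessary intervention.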

Meta-Cognitive Reflection: What We Learned in Plain Language

At a simple level, this project showed that serious collisions are hard to predict cleanly because many things are happening at once.

Even after adding weather, road context, traffic exposure, and crash-structure features, the severe and non-severe cases still overlap a lot. That means the models can find patterns, but the patterns are not strong enough to create near-perfect separation.

What this taught us is:

* a better model does not remove uncertainty
* more features do not automatically mean clearer predictions
* traffic and weather often carry more stable signal than expected
* the most useful outcome is not “perfect prediction,” but better prioritization

In plain terms, the system is best thought of as a way to say:

“These conditions look more dangerous than average, so they deserve more attention.”

That is a realistic and defensible use of machine learning in a public-safety setting.

Limitations

1. **Severity conditional on collision occurrence.** The model classifies severity among observed collisions; it is not a full occurrence-risk model.

2. **Route-level exposure approximation.** AADT and truck share were joined at the route level, not at exact segment-hour resolution.

3. **Approximate weather assignment.** Weather was assigned using nearest-station and same-hour matching.

4. **Random split validation.** The evaluation used a random train/test split rather than a temporal or corridor holdout.

5. **Moderate predictive ceiling.** Collision severity contains substantial randomness and unobserved context.
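
The fourth limitation points to a stricter alternative, a temporal holdout. A minimal sketch follows, assuming simulated records: the `collision_date` column, the date range, and the cutoff date are hypothetical, since the study period is not stated in this report.

```r
# Sketch of a temporal holdout: train on earlier collisions, test on later.
# ASSUMPTIONS: `collision_date`, the date range, and the cutoff date are
# hypothetical; the report does not state the study period.
set.seed(3)
n <- 365
collisions <- data.frame(
  collision_date = as.Date("2020-01-01") + sample(0:729, n, replace = TRUE),
  severe         = rbinom(n, size = 1, prob = 0.24)
)

cutoff <- as.Date("2021-07-01")
train <- collisions[collisions$collision_date <  cutoff, ]
test  <- collisions[collisions$collision_date >= cutoff, ]

# Every training record precedes every test record, so later conditions
# cannot leak into the fitted model
max(train$collision_date) < min(test$collision_date)
```

A corridor holdout would follow the same pattern, with the filter applied to a route identifier instead of a date.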

Policy Use

This system is most useful for:

* identifying higher-risk corridor conditions
* supporting monitoring and intervention prioritization
* informing enforcement, signage, seasonal planning, and roadway review

It should not be interpreted as a deterministic crash prediction tool.

Optional Code Appendix

Readers can expand the code throughout this report using the “See 4 Urself” toggles. The report is presentation-weighted and does not rerun the training pipeline; the example below shows the kind of lightweight reporting object that backs the summary tables.

::: {.cell}

```{.r .cell-code}
# Example reporting object used in this presentation-style .qmd
tibble(
  Model = c("Logistic Regression", "Random Forest", "XGBoost"),
  AUC = c(0.604, 0.574, 0.642)
)
```

:::

LLM Usage Disclosure

Claude and ChatGPT were used to help structure the R/Quarto workflow, improve report organization, and refine interpretive phrasing. All final analytical claims, metrics, and project-specific outputs were reviewed by the authors.

