Nova Scotia Road Safety Intelligence System

Provincial Corridor Model Validation

Author

Gavin Shklanka & Rachel Kodi

Published

March 17, 2026

---
title: "Nova Scotia Road Safety Intelligence System"
subtitle: "Provincial Corridor Model Validation"
author: "Gavin Shklanka & Rachel Kodi"
date: today
format:
  html:
    embed-resources: true
    toc: true
    toc-depth: 3
    theme: cosmo
    code-fold: true
    code-summary: "See 4 Urself"
    code-tools: true
    df-print: paged
execute:
  echo: true
  warning: false
  message: false
---




The Research Question

> **What factors are associated with higher motor vehicle collision severity on provincial highways in Nova Scotia, and how do traffic exposure and adverse weather conditions interact to amplify collision risk?**

This report evaluates an enhanced **severity-conditional-on-collision** modeling pipeline. The goal is not to predict whether a collision will occur, but rather to assess which recorded collisions are more likely to be severe.

Executive Summary

This project develops a machine-learning-based road safety intelligence system for collisions on Nova Scotia's provincial corridors.

Key findings:

* XGBoost produced the strongest discrimination (**AUC = 0.642**)
* Logistic Regression provided a transparent baseline (**AUC = 0.604**)
* Random Forest underperformed relative to expectations (**AUC = 0.574**)
* Weather and exposure variables, especially **temperature**, **wind speed**, and **traffic volume**, were the strongest predictors
* Overall performance remained moderate because severe and non-severe collisions overlap heavily in feature space

**Bottom line:** this system is best interpreted as a **risk prioritization tool**, not a deterministic prediction engine.

Modeling Roadmap

The modeling process followed four steps:

1. Examine the data structure before fitting models
2. Train three candidate models of increasing flexibility
3. Compare performance on a held-out test set
4. Translate results into plain-language policy meaning

::: {.callout-note}
The pre-model diagnostics in step 1 matter because if severe and non-severe collisions overlap heavily, even well-tuned models can reach only moderate discrimination.
:::

Pre-Model Diagnostics

Class Imbalance


::: {.cell}

```{.r .cell-code}
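library(dplyr)       # tibble(), mutate(), %>%
library(kableExtra)  # kbl(), kable_styling()

# Derive shares from the counts so the two columns cannot disagree
class_tbl <- tibble(
  Class = c("No", "Yes"),
  Count = c(1240, 387)
) %>%
  mutate(Share = Count / sum(Count))  # 0.762 and 0.238

class_tbl %>%
  mutate(Share = scales::percent(Share, accuracy = 0.1)) %>%
  kbl(caption = "Collision severity distribution — provincial corridor subset") %>%
  kable_styling(full_width = FALSE)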
```

::: {.cell-output-display}
`````{=html}
<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
<caption>Collision severity distribution — provincial corridor subset</caption>
 <thead>
  <tr>
   <th style="text-align:left;"> Class </th>
   <th style="text-align:right;"> Count </th>
   <th style="text-align:left;"> Share </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> No </td>
   <td style="text-align:right;"> 1240 </td>
   <td style="text-align:left;"> 76.2% </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Yes </td>
   <td style="text-align:right;"> 387 </td>
   <td style="text-align:left;"> 23.8% </td>
  </tr>
</tbody>
</table>

`````
:::
:::


```r
knitr::include_graphics("001_4_Section_4_Exploratory_Diagnostics_figure.png")
```

Collision severity distribution — provincial corridor subset

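The dataset is imbalanced: about 24% of collisions (387 of 1,627) are severe. A model that labels every collision as non-severe would already be 76.2% accurate without identifying a single severe case, so accuracy is a weak performance measure here. The evaluation therefore focuses on ROC/AUC, which measures how well a model ranks severe collisions above non-severe ones.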
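A minimal base-R sketch of the majority-class trap, rebuilt only from the class counts in the severity table: the "always non-severe" rule scores about 76% accuracy but catches no severe collisions and has no ranking ability. The helper `auc_rank()` is a hypothetical name introduced for illustration; it computes AUC with the Mann-Whitney rank identity.

```r
# Class labels rebuilt from the severity table: 1240 non-severe (0), 387 severe (1)
y <- c(rep(0L, 1240), rep(1L, 387))

# Hypothetical helper: AUC via the Mann-Whitney rank identity.
# rank() assigns average ranks to ties, so tied scores count as half a "win".
auc_rank <- function(score, y) {
  r <- rank(score)
  n_pos <- sum(y == 1)
  n_neg <- sum(y == 0)
  (sum(r[y == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Majority-class rule: predict "not severe" for every collision,
# which is equivalent to giving every collision the same risk score
pred_class <- rep(0L, length(y))
score      <- rep(0, length(y))

accuracy    <- mean(pred_class == y)          # about 0.762
sensitivity <- mean(pred_class[y == 1] == 1)  # 0: no severe collision caught
auc         <- auc_rank(score, y)             # 0.5: no ranking ability

round(c(accuracy = accuracy, sensitivity = sensitivity, AUC = auc), 3)
```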

Continuous Feature Distributions


```r
knitr::include_graphics("002_4_Section_4_Exploratory_Diagnostics_figure.png")
```

Enhanced continuous feature distributions by severity

The density plots show substantial overlap between severe and non-severe collisions. In practical terms, no simple threshold on any single feature cleanly separates the two classes.
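One way to put a number on this overlap is the two-sample Kolmogorov-Smirnov statistic D for each feature. D is the largest gap between the two classes' empirical CDFs, which equals the best TPR minus FPR any single-threshold rule on that feature can reach (0 means complete overlap, 1 means perfect separation). The sketch below uses simulated data: `temp_c` and `wind_kph` mirror column names in this report, but the values and effect sizes are invented for illustration.

```r
set.seed(42)

# Simulated stand-in data (NOT the project data): heavily overlapping classes
n <- 1627
severe <- rbinom(n, 1, 0.238)
sim <- data.frame(
  severe   = severe,
  temp_c   = rnorm(n, mean = 5 - 1.5 * severe, sd = 8),  # small shift only
  wind_kph = rnorm(n, mean = 25 + 3 * severe, sd = 12)
)

# KS statistic D per feature: max |ECDF_severe - ECDF_nonsevere|
ks_d <- sapply(c("temp_c", "wind_kph"), function(v) {
  unname(ks.test(sim[[v]][sim$severe == 1],
                 sim[[v]][sim$severe == 0])$statistic)
})

round(ks_d, 3)  # values near 0 indicate heavy overlap
```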

Correlation Structure


```r
knitr::include_graphics("003_4_Section_4_Exploratory_Diagnostics_figure.png")
```

Key structural observations:

* Traffic variables cluster strongly together
* Engineered interaction terms behave as expected
* Weather variables are not perfectly collinear

This supports the use of tree-based models, which can handle correlated predictors and nonlinear interactions more flexibly than a linear model.
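A quick way to surface clusters like the traffic block is to list every predictor pair whose absolute Pearson correlation exceeds a cutoff. This is a sketch on simulated columns: `aadt`, `log_aadt`, `truck_share`, and `precip_mm` are hypothetical names, and the 0.7 cutoff is an arbitrary illustrative choice.

```r
set.seed(1)
n <- 500
aadt <- rlnorm(n, meanlog = 8, sdlog = 0.6)  # hypothetical traffic volume

# Simulated predictors: three traffic columns that move together, two that do not
sim <- data.frame(
  aadt        = aadt,
  log_aadt    = log(aadt),
  truck_share = 0.1 + 0.02 * log(aadt) + rnorm(n, sd = 0.005),
  temp_c      = rnorm(n, 5, 8),
  precip_mm   = rexp(n, rate = 0.5)
)

# Hypothetical helper: predictor pairs with |r| above the cutoff
high_cor_pairs <- function(df, cutoff = 0.7) {
  cm  <- cor(df)
  # upper.tri() keeps each pair once and drops the diagonal
  idx <- which(abs(cm) > cutoff & upper.tri(cm), arr.ind = TRUE)
  data.frame(
    var1 = rownames(cm)[idx[, 1]],
    var2 = colnames(cm)[idx[, 2]],
    r    = round(cm[idx], 3)
  )
}

pairs_found <- high_cor_pairs(sim)
pairs_found
```

Tree ensembles can usually tolerate such clusters for prediction, although importance scores tend to be split across the correlated members.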

Candidate Models

Logistic Regression

Logistic Regression was used as the baseline because it is interpretable and provides a clean benchmark for binary classification.


```r
logistic_tbl <- tibble(
  Model = "Logistic Regression",
  AUC = 0.604,
  Interpretation = "Transparent baseline with modest discrimination"
)

logistic_tbl %>%
  kbl(caption = "Logistic Regression summary") %>%
  kable_styling(full_width = FALSE)
```

Logistic Regression summary
| Model | AUC | Interpretation |
|:------|----:|:---------------|
| Logistic Regression | 0.604 | Transparent baseline with modest discrimination |

Interpretation: This model detects some signal, but a linear, additive specification captures only part of the relationship between predictors and severity. It performs better than random guessing (AUC = 0.5), but not well enough to serve as a standalone operational model.

Plain-language takeaway: This is the “basic benchmark” model. It gives a sensible starting point, but it does not capture enough complexity to separate severe from non-severe collisions well.
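A minimal sketch of why this baseline is easy to explain: base R's `glm()` with `family = binomial` fits the model, and exponentiated coefficients are odds ratios per one-unit change in a predictor. The data, the two predictors, and the 70/30 split ratio are assumptions for illustration on simulated values; they are not the report's estimates or its exact pipeline.

```r
set.seed(2026)
n <- 2000
sim <- data.frame(
  temp_c   = rnorm(n, 5, 8),
  wind_kph = rnorm(n, 25, 12)
)
# Simulated truth on the log-odds scale (illustrative only)
eta <- -1.2 - 0.04 * sim$temp_c + 0.03 * sim$wind_kph
sim$severe <- rbinom(n, 1, plogis(eta))

# Random train/test split; the 70/30 ratio is an assumption
train_idx <- sample(n, size = 0.7 * n)
train <- sim[train_idx, ]
test  <- sim[-train_idx, ]

fit <- glm(severe ~ temp_c + wind_kph, data = train, family = binomial)

# Odds ratio per one-unit increase in each predictor
odds_ratios <- exp(coef(fit))
round(odds_ratios, 3)

# Predicted probabilities on held-out rows, the input to ROC/AUC
test_prob <- predict(fit, newdata = test, type = "response")
```

An odds ratio above 1 (here, for `wind_kph`) means higher values raise the odds of a severe outcome, holding the other predictor fixed.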

Random Forest

Random Forest was used to test whether nonlinear decision rules and interaction effects would improve performance beyond the linear baseline.


```r
rf_tbl <- tibble(
  Model = "Random Forest",
  AUC = 0.574,
  Interpretation = "Flexible nonlinear model, but weaker held-out discrimination"
)

rf_tbl %>%
  kbl(caption = "Random Forest summary") %>%
  kable_styling(full_width = FALSE)
```

Random Forest summary
| Model | AUC | Interpretation |
|:------|----:|:---------------|
| Random Forest | 0.574 | Flexible nonlinear model, but weaker held-out discrimination |

Interpretation: Although Random Forest can model nonlinear effects, its held-out performance was weaker than Logistic Regression in this version of the dataset.

Plain-language takeaway: Adding flexibility alone did not guarantee better results. A more complicated model is not always a better model.

XGBoost

XGBoost was used as the most advanced candidate model. Gradient boosting builds trees sequentially, fitting each new tree to the errors of the current ensemble, so it concentrates on harder-to-classify cases and can capture more complex structure.


```r
xgb_tbl <- tibble(
  Model = "XGBoost",
  AUC = 0.642,
  Interpretation = "Best-performing model on held-out discrimination"
)

xgb_tbl %>%
  kbl(caption = "XGBoost summary") %>%
  kable_styling(full_width = FALSE)
```

XGBoost summary
| Model | AUC | Interpretation |
|:------|----:|:---------------|
| XGBoost | 0.642 | Best-performing model on held-out discrimination |

Interpretation: XGBoost achieved the strongest ranking performance of the three models, suggesting that severe collision risk is influenced by nonlinear combinations of weather, traffic exposure, and collision context.

Plain-language takeaway: This was the strongest model, but it is still not “predicting the future perfectly.” It is better understood as a tool for flagging higher-risk cases.
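To illustrate the boosting idea described above, here is a deliberately small base-R sketch of gradient boosting for log-loss with depth-1 trees (stumps). It is not XGBoost, which adds second-order updates, regularization, and deeper trees; because stumps are additive, this toy version cannot model interactions. All helper names (`fit_stump()`, `boost_logistic()`, `log_loss()`) and the simulated data are hypothetical.

```r
# Hypothetical helper: best single-split regression stump fitted to residuals r,
# searching the decile cut points of each column of x
fit_stump <- function(x, r) {
  best <- list(sse = Inf)
  for (j in seq_len(ncol(x))) {
    cuts <- unique(quantile(x[, j], probs = seq(0.1, 0.9, by = 0.1), names = FALSE))
    for (cut in cuts) {
      left <- x[, j] <= cut
      if (all(left) || !any(left)) next
      m_left  <- mean(r[left])
      m_right <- mean(r[!left])
      sse <- sum((r[left] - m_left)^2) + sum((r[!left] - m_right)^2)
      if (sse < best$sse) {
        best <- list(sse = sse, j = j, cut = cut, m_left = m_left, m_right = m_right)
      }
    }
  }
  best
}

predict_stump <- function(s, x) ifelse(x[, s$j] <= s$cut, s$m_left, s$m_right)

# y - p is the negative gradient of log-loss with respect to the log-odds f,
# so each stump is fitted to what the ensemble still gets wrong
boost_logistic <- function(x, y, n_rounds = 100, eta = 0.1) {
  f <- rep(qlogis(mean(y)), length(y))  # start at the base-rate log-odds
  for (m in seq_len(n_rounds)) {
    s <- fit_stump(x, y - plogis(f))
    f <- f + eta * predict_stump(s, x)  # shrunken step toward the correction
  }
  f
}

# Average log-loss for labels y and log-odds f
log_loss <- function(y, f) {
  -mean(y * plogis(f, log.p = TRUE) + (1 - y) * plogis(-f, log.p = TRUE))
}

# Simulated data with threshold effects (illustrative only, not project data)
set.seed(7)
n <- 1000
x <- cbind(temp_c = rnorm(n, 5, 8), wind_kph = rnorm(n, 25, 12))
y <- rbinom(n, 1, plogis(-1.5 + 1.2 * (x[, "temp_c"] < 0) + 0.8 * (x[, "wind_kph"] > 35)))

f_hat <- boost_logistic(x, y)
ll <- c(baseline = log_loss(y, rep(qlogis(mean(y)), n)), boosted = log_loss(y, f_hat))
round(ll, 4)  # training log-loss falls as stumps are added
```

This shows training fit only; as the report's comparison stresses, held-out performance is what matters.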

Final Comparative Evaluation

Test-Set Comparison


```r
metrics_df <- tibble(
  Model = c("Logistic Regression", "Random Forest", "XGBoost"),
  `AUC-ROC` = c(0.604, 0.574, 0.642),
  Conclusion = c(
    "Transparent baseline",
    "Flexible but weaker generalization",
    "Best overall discrimination"
  )
)

metrics_df %>%
  arrange(desc(`AUC-ROC`)) %>%
  kbl(caption = "Model comparison — held-out test set") %>%
  kable_styling(full_width = FALSE)
```

Model comparison — held-out test set
| Model | AUC-ROC | Conclusion |
|:------|--------:|:-----------|
| XGBoost | 0.642 | Best overall discrimination |
| Logistic Regression | 0.604 | Transparent baseline |
| Random Forest | 0.574 | Flexible but weaker generalization |

ROC Comparison


```r
knitr::include_graphics("004_6_3_6_3_XGBoost_figure.png")
```

ROC curves — enhanced provincial corridor model

XGBoost leads the final comparison, followed by Logistic Regression, then Random Forest.

Overall interpretation: The results suggest that severe collision prediction is feasible at a modest discrimination level. The main practical value of the system is in risk ranking and corridor monitoring, not exact event prediction.
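To make "risk ranking" concrete, a common check is lift in the top-scored slice: the severe rate among the highest-risk k% of collisions divided by the overall severe rate. A sketch with a hypothetical helper `top_k_lift()` on simulated scores; the 10% slice is an illustrative choice, not a cutoff used in this project.

```r
# Hypothetical helper: lift of the top `k` share of risk scores
top_k_lift <- function(score, y, k = 0.10) {
  n_top <- ceiling(k * length(score))
  top <- order(score, decreasing = TRUE)[seq_len(n_top)]
  mean(y[top]) / mean(y)
}

set.seed(11)
n <- 1627
y <- rbinom(n, 1, 0.238)
score <- rnorm(n, mean = 0.5 * y)  # weakly informative simulated score

top_k_lift(score, y, k = 0.10)     # above 1 means the top slice is enriched
```

A lift of 2 would mean the flagged slice contains severe collisions at twice the overall rate, which is the kind of gain a prioritization tool needs even when AUC is modest.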

Variable Importance


```r
knitr::include_graphics("005_6_3_6_3_XGBoost_figure.png")
```

Variable importance — enhanced predictor stack

Across Random Forest and XGBoost, the strongest variables were:

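* `temp_c` (temperature)
* `wind_kph` (wind speed)
* `n_vehicles` (number of vehicles involved)
* traffic exposure measures such as AADT (annual average daily traffic) and truck share
* interaction terms involving traffic and visibility or precipitation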
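Interaction terms like those in the list above can be written directly in an R model formula, where `a * b` expands to both main effects plus their product. The column names `log_aadt` and `precip_flag` are hypothetical; the project's actual engineered terms are not shown in this report.

```r
# Two hypothetical rows: log traffic volume and a wet-weather indicator
df <- data.frame(
  log_aadt    = c(8.0, 9.5),
  precip_flag = c(0, 1)
)

# `*` in a formula = main effects + interaction (product) column
mm <- model.matrix(~ log_aadt * precip_flag, data = df)
mm
```

The `log_aadt:precip_flag` column equals `log_aadt` when `precip_flag` is 1 and 0 otherwise, so a linear model can give traffic volume a different slope in wet conditions. Tree ensembles can find such interactions without the explicit column.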

Interpretation

* Weather conditions appear to shape the severity context of collisions
* Traffic volume acts as an exposure amplifier
* XGBoost appears to capture these interactions more effectively than the other models
* Behavioral indicators contributed less than expected, which may reflect weaker signal quality or underreporting

Conclusion: Severe collision risk in this subset appears to be driven more by environmental and exposure conditions than by isolated behavioral indicators.

What the Results Mean

This is a severity classification model among already-observed collisions.

That means:

* it does not estimate where collisions will happen in the first place
* it does estimate which collisions are more likely to be severe once a collision has occurred
* threshold choice reflects policy tradeoffs, not a universal “correct” cutoff
* route-level traffic exposure improves realism, but is still an approximation
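The threshold tradeoff can be tabulated directly: raising the probability cutoff flags fewer collisions, lowering sensitivity and raising specificity. A sketch with a hypothetical helper `threshold_table()` on simulated predicted probabilities; the cutoffs are illustrative, not recommendations.

```r
# Hypothetical helper: confusion-matrix rates at each probability cutoff
threshold_table <- function(prob, y, cutoffs = c(0.2, 0.3, 0.4, 0.5)) {
  do.call(rbind, lapply(cutoffs, function(t) {
    flag <- prob >= t
    data.frame(
      cutoff      = t,
      flagged     = mean(flag),                          # share of collisions flagged
      sensitivity = sum(flag & y == 1) / sum(y == 1),    # severe cases caught
      specificity = sum(!flag & y == 0) / sum(y == 0)    # non-severe cases cleared
    )
  }))
}

set.seed(3)
n <- 1627
y <- rbinom(n, 1, 0.238)
prob <- plogis(-1.3 + 0.6 * y + rnorm(n, sd = 0.8))  # simulated predicted probabilities

threshold_table(prob, y)
```

Which row to use depends on the cost of missing a severe collision versus the cost of reviewing a flagged one, which is a policy decision rather than a modeling one.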

Meta-Cognitive Reflection: What We Learned in Plain Language

At a simple level, this project showed that serious collisions are hard to predict cleanly because many things are happening at once.

Even after adding weather, road context, traffic exposure, and crash-structure features, the severe and non-severe cases still overlap a lot. That means the models can find patterns, but the patterns are not strong enough to create near-perfect separation.

What this taught us is:

* a better model does not remove uncertainty
* more features do not automatically mean clearer predictions
* traffic and weather often carry more stable signal than expected
* the most useful outcome is not “perfect prediction,” but better prioritization

In plain terms, the system is best thought of as a way to say:

“These conditions look more dangerous than average, so they deserve more attention.”

That is a realistic and defensible use of machine learning in a public-safety setting.

Limitations

* **Severity conditional on collision occurrence.** The model classifies severity among observed collisions; it is not a full occurrence-risk model.
* **Route-level exposure approximation.** AADT and truck share were joined at the route level, not at exact segment-hour resolution.
* **Approximate weather assignment.** Weather was assigned using nearest-station, same-hour matching.
* **Random split validation.** The evaluation used a random train/test split rather than a temporal or corridor holdout.
* **Moderate predictive ceiling.** Collision severity contains substantial randomness and unobserved context.
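With a random split, collisions from the same corridor can appear in both the training and test sets, which may flatter held-out scores. A corridor holdout assigns whole routes to one side. A minimal sketch, assuming a hypothetical `route_id` column and simulated route labels (the report does not show the project's route identifiers); a temporal holdout would follow the same pattern with a date cutoff instead.

```r
# Hypothetical helper: hold out a random subset of whole routes
corridor_split <- function(route_id, test_share = 0.3) {
  routes <- unique(route_id)
  test_routes <- sample(routes, size = max(1, round(test_share * length(routes))))
  is_test <- route_id %in% test_routes
  list(train = which(!is_test), test = which(is_test))
}

# Simulated route labels for 1,627 collisions on 20 hypothetical routes
set.seed(5)
route_id <- sample(sprintf("Route %02d", 1:20), 1627, replace = TRUE)

idx <- corridor_split(route_id)

# No route appears on both sides of the split
length(intersect(route_id[idx$train], route_id[idx$test]))  # 0
```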

Policy Use

This system is most useful for:

* identifying higher-risk corridor conditions
* supporting monitoring and intervention prioritization
* informing enforcement, signage, seasonal planning, and roadway review

It should not be interpreted as a deterministic crash prediction tool.

Optional Code Appendix

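Readers can expand the code throughout this report using the “See 4 Urself” toggles. The report displays stored results rather than rerunning the full training pipeline; a lightweight example of the reporting objects behind its tables is shown below.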


```r
# Example reporting object used in this presentation-style .qmd
library(tibble)

tibble(
  Model = c("Logistic Regression", "Random Forest", "XGBoost"),
  AUC = c(0.604, 0.574, 0.642)
)
```

LLM Usage Disclosure

Claude and ChatGPT were used to help structure the R/Quarto workflow, improve report organization, and refine interpretive phrasing. All final analytical claims, metrics, and project-specific outputs were reviewed by the authors.

