Nova Scotia Road Safety Intelligence System

Provincial Corridor Model Validation

Author

Gavin Shklanka & Rachel Kodi

Published

March 17, 2026

The Research Question

What factors are associated with higher motor vehicle collision severity on provincial highways in Nova Scotia, and how do traffic exposure and adverse weather conditions interact to amplify collision risk?

This report evaluates an enhanced severity-conditional-on-collision modeling pipeline. The goal is not to predict whether a collision will occur, but rather to assess which recorded collisions are more likely to be severe.

Executive Summary

This project develops a machine learning-based road safety intelligence system for provincial-corridor collisions in Nova Scotia.

Key findings:

  • XGBoost produced the strongest discrimination (AUC = 0.642)
  • Logistic Regression provided a transparent baseline (AUC = 0.604)
  • Random Forest underperformed relative to expectations (AUC = 0.574)
  • Weather and exposure variables — especially temperature, wind speed, and traffic volume — were the strongest predictors
  • Overall performance remained moderate because severe and non-severe collisions overlap heavily in feature space

Bottom line: this system is best interpreted as a risk prioritization tool, not a deterministic prediction engine.

Modeling Roadmap

The modeling process followed four steps:

  1. Examine the data structure before fitting models
  2. Train three candidate models of increasing flexibility
  3. Compare performance on a held-out test set
  4. Translate results into plain-language policy meaning
Note

These diagnostics matter because if severe and non-severe collisions overlap heavily, even strong models will achieve only moderate discrimination — which is the expected result for this kind of severity-conditional problem.

Pre-Model Diagnostics

Class Imbalance

See 4 Urself
if (has_data) {
  class_tbl <- analysis_df %>%
    count(severe) %>%
    mutate(share = scales::percent(n / sum(n), accuracy = 0.1))

  class_tbl %>%
    kbl(caption = "Collision severity distribution — provincial corridor subset") %>%
    kable_styling(full_width = FALSE)
} else {
  cat("Data not loaded — class balance table unavailable in this render.")
}
Data not loaded — class balance table unavailable in this render.
See 4 Urself
if (file.exists("001_4_Section_4_Exploratory_Diagnostics_figure.png")) {
  knitr::include_graphics("001_4_Section_4_Exploratory_Diagnostics_figure.png")
}

Collision severity distribution — provincial corridor subset

The dataset is imbalanced: severe collisions represent a meaningful but minority share of observed cases (approximately 21.8%). This makes raw accuracy a weak performance measure, so the evaluation focuses on ROC/AUC.

Continuous Feature Distributions

See 4 Urself
if (file.exists("002_4_Section_4_Exploratory_Diagnostics_figure.png")) {
  knitr::include_graphics("002_4_Section_4_Exploratory_Diagnostics_figure.png")
}

Enhanced continuous feature distributions by severity

The density plots show substantial overlap between severe and non-severe collisions. In practical terms, this means the classes are not cleanly separable using a simple rule.

Correlation Structure

See 4 Urself
if (file.exists("003_4_Section_4_Exploratory_Diagnostics_figure.png")) {
  knitr::include_graphics("003_4_Section_4_Exploratory_Diagnostics_figure.png")
}

Feature correlation matrix

Key structural observations:

  • Traffic variables cluster strongly together
  • Engineered interaction terms behave as expected
  • Weather variables are not perfectly collinear

This supports the use of tree-based models, which can handle correlated predictors and nonlinear interactions more flexibly than a linear model.

Temperature and Severe Collision Rate

Note

Interpretation note: The SHAP importance bar shows that temp_c is an influential predictor, but does not indicate whether cold or warm temperatures are more dangerous. Temperature likely acts as a context variable — reflecting freeze-thaw cycles, black ice conditions, slush, or seasonal exposure patterns — rather than a simple directional effect. The banded analysis below is the appropriate tool to assess whether severe outcomes are concentrated in very cold, near-freezing, mild, or warmer conditions.

See 4 Urself
if (!has_temp) {
  cat("Temperature-banded analysis is not available in this render because the loaded dataset does not contain `temp_c`. Load the enriched weather-joined CSV to enable this section.")
} else {
  temp_profile <- analysis_df %>%
    filter(!is.na(temp_c), !is.na(severity_raw)) %>%
    mutate(
      temp_band = cut(
        temp_c,
        breaks = c(-Inf, -10, -5, 0, 5, 10, 15, 20, 25, Inf),
        labels = c("< -10", "-10 to -5", "-5 to 0", "0 to 5",
                   "5 to 10", "10 to 15", "15 to 20", "20 to 25", "> 25"),
        right = FALSE
      )
    ) %>%
    group_by(temp_band) %>%
    summarise(
      collisions        = n(),
      severe_collisions = sum(severity_raw == 1, na.rm = TRUE),
      severe_rate       = mean(severity_raw == 1, na.rm = TRUE),
      .groups = "drop"
    )

  temp_profile %>%
    mutate(severe_rate_pct = scales::percent(severe_rate, accuracy = 0.1)) %>%
    select(temp_band, collisions, severe_collisions, severe_rate_pct) %>%
    kbl(
      caption = "Severe collision rate by temperature band",
      col.names = c("Temperature Band (°C)", "Total Collisions",
                    "Severe Collisions", "Severe Rate")
    ) %>%
    kable_styling(full_width = FALSE)
}
Temperature-banded analysis is not available in this render because the loaded dataset does not contain `temp_c`. Load the enriched weather-joined CSV to enable this section.
See 4 Urself
if (has_temp) {
  ggplot(temp_profile, aes(x = temp_band, y = severe_rate)) +
    geom_col(fill = "#2C7BB6", alpha = 0.85) +
    geom_hline(
      yintercept = mean(analysis_df$severity_raw == 1, na.rm = TRUE),
      linetype = "dashed", colour = "#D7191C", linewidth = 0.8
    ) +
    annotate(
      "text", x = Inf, y = mean(analysis_df$severity_raw == 1, na.rm = TRUE),
      label = "  Overall severe rate", hjust = 1.05, vjust = -0.5,
      colour = "#D7191C", size = 3.2
    ) +
    scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
    labs(
      title    = "Severe collision rate by temperature band",
      subtitle = "Dashed line = overall severe rate. Deviations indicate temperature-specific severity elevation.",
      x        = "Temperature band (°C)",
      y        = "Severe collision rate"
    ) +
    theme(axis.text.x = element_text(angle = 30, hjust = 1))
}

Interpreting this chart: Bands that rise clearly above the dashed reference line represent temperature conditions associated with elevated severe outcomes in this dataset. Bands near or below the reference line do not show elevated risk. This chart does not establish causation; it identifies where temperature context correlates with severity in the observed records.

Candidate Models

Logistic Regression

Logistic Regression was used as the baseline because it is interpretable and provides a clean benchmark for binary classification.

See 4 Urself
logistic_tbl <- tibble(
  Model          = "Logistic Regression",
  AUC            = 0.604,
  Interpretation = "Transparent baseline with modest discrimination"
)

logistic_tbl %>%
  kbl(caption = "Logistic Regression summary") %>%
  kable_styling(full_width = FALSE)
Logistic Regression summary
Model AUC Interpretation
Logistic Regression 0.604 Transparent baseline with modest discrimination

Interpretation: This model detects meaningful signal, but the relationship between predictors and severity is not cleanly linear. It performs better than random guessing, but not strongly enough to serve as a standalone operational model.

Plain-language takeaway: This is the “basic benchmark” model. It gives a sensible starting point, but does not capture enough complexity to separate severe from non-severe collisions well.

Random Forest

Random Forest was used to test whether nonlinear decision rules and interaction effects would improve performance beyond the linear baseline.

See 4 Urself
rf_tbl <- tibble(
  Model          = "Random Forest",
  AUC            = 0.574,
  Interpretation = "Flexible nonlinear model, but weaker held-out discrimination"
)

rf_tbl %>%
  kbl(caption = "Random Forest summary") %>%
  kable_styling(full_width = FALSE)
Random Forest summary
Model AUC Interpretation
Random Forest 0.574 Flexible nonlinear model, but weaker held-out discrimination

Interpretation: Although Random Forest can model nonlinear effects, its held-out performance was weaker than Logistic Regression in this version of the dataset.

Plain-language takeaway: Adding flexibility alone did not guarantee better results. A more complicated model is not always a better model.

XGBoost

XGBoost was used as the most advanced candidate because boosting can focus iteratively on harder-to-classify cases and capture more complex structure.

See 4 Urself
xgb_tbl <- tibble(
  Model          = "XGBoost",
  AUC            = 0.642,
  Interpretation = "Best-performing model on held-out discrimination"
)

xgb_tbl %>%
  kbl(caption = "XGBoost summary") %>%
  kable_styling(full_width = FALSE)
XGBoost summary
Model AUC Interpretation
XGBoost 0.642 Best-performing model on held-out discrimination

Interpretation: XGBoost achieved the strongest ranking performance of the three models, suggesting that severe collision risk is influenced by nonlinear combinations of weather, traffic exposure, and collision context.

Plain-language takeaway: This was the strongest model, but it is still not predicting the future with certainty. It is better understood as a tool for flagging higher-risk cases for closer attention.

Final Comparative Evaluation

Test-Set Comparison

See 4 Urself
metrics_df <- tibble(
  Model       = c("Logistic Regression", "Random Forest", "XGBoost"),
  `AUC-ROC`   = c(0.604, 0.574, 0.642),
  Conclusion  = c(
    "Transparent baseline",
    "Flexible but weaker generalization",
    "Best overall discrimination"
  )
)

metrics_df %>%
  arrange(desc(`AUC-ROC`)) %>%
  kbl(caption = "Model comparison — held-out test set") %>%
  kable_styling(full_width = FALSE)
Model comparison — held-out test set
Model AUC-ROC Conclusion
XGBoost 0.642 Best overall discrimination
Logistic Regression 0.604 Transparent baseline
Random Forest 0.574 Flexible but weaker generalization

ROC Comparison

See 4 Urself
if (file.exists("004_6_3_6_3_XGBoost_figure.png")) {
  knitr::include_graphics("004_6_3_6_3_XGBoost_figure.png")
}

ROC curves — enhanced provincial corridor model

XGBoost leads the final comparison, followed by Logistic Regression, then Random Forest.

Overall interpretation: The results suggest that severe collision prediction is feasible at a modest discrimination level. The main practical value of the system is in risk ranking and corridor monitoring, not exact event prediction.

Variable Importance

See 4 Urself
if (file.exists("005_6_3_6_3_XGBoost_figure.png")) {
  knitr::include_graphics("005_6_3_6_3_XGBoost_figure.png")
}

Variable importance — enhanced predictor stack

Across Random Forest and XGBoost, the strongest predictors included:

  • temp_c
  • wind_kph
  • n_vehicles
  • Traffic exposure measures such as AADT and truck share
  • Interaction terms involving traffic volume with visibility or precipitation

Interpretation

  • Weather conditions appear to shape the severity context of collisions
  • Traffic volume acts as an exposure amplifier
  • XGBoost appears to capture these interactions more effectively than the other models
  • Behavioural indicators contributed less than expected, which may reflect weaker signal quality or underreporting
Important

Important: A high SHAP importance for temp_c means the model relies heavily on temperature as a discriminating feature. It does not establish whether cold or warm temperatures are the primary driver of severity elevation. The temperature-banded analysis in the previous section is the appropriate tool for directional interpretation.

Conclusion: Severe collision risk in this subset appears driven more by environmental and exposure conditions than by isolated behavioural indicators.

What the Results Mean

This is a severity-classification model among already-observed collisions.

That means:

  • it does not estimate where collisions will happen in the first place
  • it does estimate which collisions are more likely to be severe once a collision has occurred
  • threshold choice reflects policy tradeoffs, not a universal “correct” cutoff
  • route-level traffic exposure improves realism, but is still an approximation

Meta-Cognitive Reflection: What We Learned in Plain Language

At a simple level, this project showed that serious collisions are hard to predict cleanly because many factors are occurring simultaneously.

Even after adding weather, road context, traffic exposure, and crash-structure features, severe and non-severe cases still overlap substantially. The models can identify patterns, but those patterns are not strong enough to achieve near-perfect separation.

Key lessons:

  • a better model does not remove uncertainty
  • more features do not automatically produce clearer predictions
  • traffic and weather often carry more stable signal than expected
  • the most useful outcome is not “perfect prediction,” but better prioritization

In plain terms, the system is best understood as a way to say:

“These conditions look more dangerous than average, so they deserve more attention.”

That is a realistic and defensible use of machine learning in a public-safety context.

Limitations

This model should not be interpreted as establishing causal relationships or attributing institutional responsibility. It is a severity-classification system built on observed collision records and approximate exposure and context variables.

Important constraints include:

  • severity is modeled conditional on collision occurrence, not full collision risk
  • route-level traffic exposure is an approximation, not exact segment-hour exposure
  • weather assignment is based on nearest-station matching and may miss local variation
  • behavioural variables may be weaker partly because they are underreported or inconsistently captured
  • moderate AUC performance indicates useful ranking ability, not high-certainty forecasting

Accordingly, the model is best used as a policy-support and prioritization layer, not as a standalone decision rule.

Policy Use

This system is most useful for:

  • identifying higher-risk corridor conditions
  • supporting monitoring and intervention prioritization
  • informing enforcement, signage, seasonal planning, and roadway review

It should not be interpreted as a deterministic crash prediction tool.

Optional Code Appendix

Readers can expand the code throughout this report using the “See 4 Urself” toggles. A lightweight example of the modeling workflow is shown below.

See 4 Urself
# Summary of reported model performance
tibble(
  Model = c("Logistic Regression", "Random Forest", "XGBoost"),
  AUC   = c(0.604, 0.574, 0.642)
) %>%
  arrange(desc(AUC))

Conclusion

Across the three candidate models, XGBoost delivered the strongest held-out performance, but overall discrimination remained moderate. This indicates that severe collision outcomes are partly predictable, though not cleanly separable, within the available provincial-corridor feature set.

The strongest signals came from environmental and exposure-related variables — especially temperature, wind speed, traffic volume, and interaction effects involving visibility and precipitation. Isolated behavioural indicators contributed less than expected, which may reflect weaker signal quality, underreporting, or limited measurement precision.

Taken together, the results support a practical interpretation of the system as a risk-prioritization tool rather than a deterministic prediction engine. Its most defensible public-sector value lies in helping provincial and municipal decision-makers identify corridor conditions that deserve closer monitoring, more responsive intervention, and more targeted resource allocation.

Further Implications for Provincial Road Safety Planning

The model suggests that severe-collision management may benefit from shifting away from static road-safety interpretation toward condition-sensitive corridor monitoring.

Three implications stand out:

  1. Environmental exposure should be integrated more directly into corridor safety review. Weather-related severity patterns suggest that seasonal and same-day operating conditions matter materially.

  2. Traffic exposure should be treated as a severity amplifier. Higher-volume corridors may warrant stronger prioritization when adverse conditions are present.

  3. Provincial and municipal agencies may benefit from a ranked-risk framework. Instead of treating all corridors equally, the model supports targeting attention toward conditions and segments where severity risk appears elevated.

These findings create a foundation for future work in corridor scoring, intervention targeting, and cross-jurisdiction safety coordination.

LLM Usage Disclosure

Claude and ChatGPT were used to help structure the R/Quarto workflow, improve report organization, and refine interpretive phrasing. All final analytical claims, metrics, and project-specific outputs were reviewed by the authors.