Alberta Traffic Collision Analysis (2022)
DAT 301 - Project 1
Author: Rudy
Date: 2025-03-27

1. Introduction

This report analyzes monthly traffic collision data from Alberta in 2022. The objective is to discover trends and draw conclusions that can support future safety measures across Alberta’s highways and city streets.

Traffic collisions are influenced by numerous variables, including weather, driver behavior, enforcement intensity, and traffic volume. Alberta is a distinctive case because of its geographic and climatic diversity, which ranges from snowy winters to busy summer travel seasons.

Understanding the monthly and seasonal distribution of traffic collisions can help policymakers implement timely interventions and road safety campaigns. This project uses a manually extracted subset of the provincial traffic data to demonstrate how data analysis can support real-world decision-making.

2. Data Source and Description

The dataset was manually constructed using figures from the 2022 Alberta Traffic Collision Statistics PDF available on the Alberta Open Government Portal.

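library(tidyverse)  # read_csv(), glimpse(), pivot_longer(), ggplot2, %>%
library(skimr)      # skim()
library(knitr)      # kable()
collision_data <- read_csv("C:\\Users\\iamru\\OneDrive\\Documents\\dat301\\Alberta_Collision_Monthly_2022.csv")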
## Rows: 12 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Month
## dbl (3): Fatal_Collisions, Major_Injury_Collisions, Total_Collisions
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(collision_data)
## Rows: 12
## Columns: 4
## $ Month                   <chr> "January", "February", "March", "April", "May"…
## $ Fatal_Collisions        <dbl> 14, 14, 8, 14, 26, 23, 36, 32, 29, 25, 18, 12
## $ Major_Injury_Collisions <dbl> 134, 112, 116, 114, 163, 184, 225, 242, 214, 2…
## $ Total_Collisions        <dbl> 10759, 9120, 8588, 7576, 7874, 8987, 8996, 881…
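
Because the CSV sits at a local path, the table can also be rebuilt inline for reproducibility. The sketch below is one way to do that in base R, using the values listed in Appendix A. The sanity checks at the end are an illustrative addition for a hand-entered table, not part of the original workflow.

```r
# Rebuild the 12-row dataset inline from the Appendix A values (base R only).
collision_data <- data.frame(
  Month = month.name,
  Fatal_Collisions = c(14, 14, 8, 14, 26, 23, 36, 32, 29, 25, 18, 12),
  Major_Injury_Collisions = c(134, 112, 116, 114, 163, 184, 225, 242, 214, 217, 135, 174),
  Total_Collisions = c(10759, 9120, 8588, 7576, 7874, 8987, 8996, 8817, 8967, 9316, 12887, 15034)
)

# Illustrative checks: 12 unique months, no missing values,
# and no severity count larger than the monthly total.
stopifnot(
  nrow(collision_data) == 12,
  !anyDuplicated(collision_data$Month),
  !anyNA(collision_data),
  all(collision_data$Fatal_Collisions <= collision_data$Total_Collisions),
  all(collision_data$Major_Injury_Collisions <= collision_data$Total_Collisions)
)
```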

2.1 Column Descriptions

  • Month: The calendar month
  • Fatal_Collisions: Collisions that resulted in at least one fatality
  • Major_Injury_Collisions: Serious injury collisions that led to hospitalization
  • Total_Collisions: All recorded collisions (including minor and property damage)
summary(collision_data)
##     Month           Fatal_Collisions Major_Injury_Collisions Total_Collisions
##  Length:12          Min.   : 8.00    Min.   :112.0           Min.   : 7576   
##  Class :character   1st Qu.:14.00    1st Qu.:129.5           1st Qu.: 8760   
##  Mode  :character   Median :20.50    Median :168.5           Median : 8992   
##                     Mean   :20.92    Mean   :169.2           Mean   : 9743   
##                     3rd Qu.:26.75    3rd Qu.:214.8           3rd Qu.: 9677   
##                     Max.   :36.00    Max.   :242.0           Max.   :15034
skim(collision_data)
Data summary
Name                    collision_data
Number of rows          12
Number of columns       4
_______________________
Column type frequency:
  character             1
  numeric               3
________________________
Group variables         None

Variable type: character

skim_variable  n_missing  complete_rate  min  max  empty  n_unique  whitespace
Month                  0              1    3    9      0        12           0

Variable type: numeric

skim_variable            n_missing  complete_rate     mean       sd    p0      p25     p50      p75   p100  hist
Fatal_Collisions                 0              1    20.92     8.85     8    14.00    20.5    26.75     36  ▃▇▂▆▃
Major_Injury_Collisions          0              1   169.17    47.24   112   129.50   168.5   214.75    242  ▇▂▃▂▅
Total_Collisions                 0              1  9743.42  2165.73  7576  8759.75  8991.5  9676.75  15034  ▇▂▁▁▁

3. Monthly Breakdown and Visualizations

collision_data$Month <- factor(collision_data$Month, levels = month.name)
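
The conversion to a factor matters because character month names would otherwise be ordered alphabetically on the plot axis. A small base R illustration, using a hypothetical three-month vector rather than the full dataset:

```r
# Character month names sort alphabetically, not chronologically.
months_chr <- c("March", "January", "February")
alphabetical <- sort(months_chr)

# A factor whose levels follow month.name (built into R) sorts in calendar order.
months_fct <- factor(months_chr, levels = month.name)
chronological <- as.character(sort(months_fct))

print(alphabetical)
print(chronological)
```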

3.1 Total Collisions by Month

ggplot(collision_data, aes(x = Month, y = Total_Collisions)) +
  geom_col(fill = "steelblue") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Total Traffic Collisions by Month", y = "Number of Collisions", x = "Month")

Interpretation:
December reported the highest number of traffic collisions (15,034), likely due to challenging winter driving conditions. January and November also show elevated collision counts, possibly related to icy roads and holiday travel. The lowest month was April, indicating more favorable driving conditions and fewer disruptions.

3.2 Fatal vs Major Injury Collisions

collision_data_long <- collision_data %>%
  pivot_longer(cols = c(Fatal_Collisions, Major_Injury_Collisions), names_to = "Type", values_to = "Count")
ggplot(collision_data_long, aes(x = Month, y = Count, group = Type, color = Type)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 2) +
  labs(title = "Fatal vs Major Injury Collisions by Month", x = "Month", y = "Count") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Interpretation:
July had the most fatal collisions (36), while August had the most major-injury collisions (242). These summer peaks might be attributed to increased traffic during vacation season, more vulnerable road users (pedestrians, cyclists, motorcyclists), and possibly driver fatigue on long trips.
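
One way to see why the summer peaks stand out is to express fatal collisions relative to all collisions in the same month. The sketch below does this in base R with the values from Appendix A. The per-1,000 rate (`fatal_per_1000`) is an illustrative measure introduced here, not one reported in the source statistics.

```r
fatal <- c(14, 14, 8, 14, 26, 23, 36, 32, 29, 25, 18, 12)
total <- c(10759, 9120, 8588, 7576, 7874, 8987, 8996, 8817, 8967, 9316, 12887, 15034)
names(fatal) <- names(total) <- month.name

# Fatal collisions per 1,000 recorded collisions, by month (illustrative measure).
fatal_per_1000 <- round(1000 * fatal / total, 2)
print(fatal_per_1000)

highest_rate_month <- names(which.max(fatal_per_1000))
lowest_rate_month <- names(which.min(fatal_per_1000))
```

On these figures the rate is highest in July and lowest in December, even though December has the most collisions overall.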

5. Boxplots by Season

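collision_data %>%
  # Season is not defined earlier in this report; meteorological seasons are assumed here
  mutate(Season = case_when(
    Month %in% month.name[c(12, 1, 2)] ~ "Winter", Month %in% month.name[3:5] ~ "Spring",
    Month %in% month.name[6:8] ~ "Summer", TRUE ~ "Fall")) %>%
  pivot_longer(cols = c(Fatal_Collisions, Major_Injury_Collisions), names_to = "Type", values_to = "Count") %>%
  ggplot(aes(x = Season, y = Count, fill = Type)) +
  geom_boxplot() +
  labs(title = "Collision Severity by Season")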

Interpretation:
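The boxplots show that summer months tend to have higher medians than the other seasons for both fatal and major-injury collisions, reinforcing the seasonal risk pattern. Because each season covers only a few monthly observations, the spread within each box should be read with caution.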
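
The numbers behind the boxplots can be checked directly. The sketch below assumes meteorological seasons (December to February as winter, March to May as spring, and so on), since the report does not state how `Season` was defined. A different grouping would give different medians.

```r
fatal <- c(14, 14, 8, 14, 26, 23, 36, 32, 29, 25, 18, 12)
major <- c(134, 112, 116, 114, 163, 184, 225, 242, 214, 217, 135, 174)

# Assumed season for January through December (meteorological convention).
season <- c("Winter", "Winter", "Spring", "Spring", "Spring", "Summer",
            "Summer", "Summer", "Fall", "Fall", "Fall", "Winter")
season <- factor(season, levels = c("Winter", "Spring", "Summer", "Fall"))

# Median monthly count per season for each severity type.
median_fatal <- tapply(fatal, season, median)
median_major <- tapply(major, season, median)
print(rbind(Fatal = median_fatal, Major_Injury = median_major))
```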

6. Percentage Change Month to Month

collision_data <- collision_data %>%
  arrange(match(Month, month.name)) %>%
  mutate(Pct_Change = round(100 * (Total_Collisions - lag(Total_Collisions)) / lag(Total_Collisions), 1))

kable(collision_data[, c("Month", "Total_Collisions", "Pct_Change")])
Month      Total_Collisions  Pct_Change
January               10759          NA
February               9120       -15.2
March                  8588        -5.8
April                  7576       -11.8
May                    7874         3.9
June                   8987        14.1
July                   8996         0.1
August                 8817        -2.0
September              8967         1.7
October                9316         3.9
November              12887        38.3
December              15034        16.7

Interpretation:
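The largest month-over-month increases occur in November (+38.3%) and December (+16.7%), as winter sets in, with a further jump in June (+14.1%) at the start of the summer travel season. The months just after the summer holidays change only modestly (September +1.7%, October +3.9%). This timing can help predict when to ramp up road safety messaging.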
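
The percentage change in the table is each month's total minus the previous month's total, divided by the previous month's total. A base R equivalent of the `lag()` calculation, sketched with the totals from the table:

```r
total <- c(10759, 9120, 8588, 7576, 7874, 8987, 8996, 8817, 8967, 9316, 12887, 15034)

# diff() gives the month-over-month differences; head(total, -1) is the previous
# month's value. January has no previous month in this dataset, so it is NA.
pct_change <- c(NA, round(100 * diff(total) / head(total, -1), 1))
names(pct_change) <- month.name
print(pct_change)
```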

7. Statistical Measures

collision_data %>%
  summarise(
    Mean_Fatal = mean(Fatal_Collisions),
    Median_Fatal = median(Fatal_Collisions),
    Max_Total = max(Total_Collisions),
    Min_Total = min(Total_Collisions)
  )

Interpretation:
The average number of fatal collisions per month was about 21 (20.92), with July (36) well above that average. Understanding these ranges helps contextualize what counts as an “exceptional” month.
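
One simple way to make "exceptional" concrete is a z-score: how many standard deviations a month sits from the yearly mean. The sketch below flags months more than one standard deviation above the mean. The cutoff of 1 is an arbitrary choice made for illustration, not a threshold used in the report.

```r
fatal <- c(14, 14, 8, 14, 26, 23, 36, 32, 29, 25, 18, 12)
names(fatal) <- month.name

# Standardize each month against the 12-month mean and sample standard deviation.
z_fatal <- (fatal - mean(fatal)) / sd(fatal)

# Illustrative cutoff: more than 1 standard deviation above the mean.
exceptional_months <- names(fatal)[z_fatal > 1]
print(round(z_fatal, 2))
print(exceptional_months)
```

With the 2022 figures this flags July and August.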

8. Context: Road Safety and Driver Behavior

Alberta’s traffic patterns are influenced by multiple factors. Research from Transport Canada and the Alberta Ministry of Transportation has shown that:

Behavioral trends such as texting while driving, fatigue, and seasonal inexperience (e.g., summer drivers in rural areas) must be addressed in tandem with enforcement.

Alberta also sees a wide variety of road conditions across urban and rural areas. City driving introduces congestion and pedestrian risk, while highways present speeding and fatigue hazards. Understanding these environments is key for tailoring enforcement and safety campaigns.

9. Policy Implications and Practical Interventions

To reduce fatalities and injuries, Alberta’s municipalities and transport authorities can consider:

Infrastructure improvements could include smart signage, rumble strips at rural intersections, and improved lighting in high-collision areas. Public campaigns on distracted driving, speeding, and vehicle maintenance should align with these monthly trends.

10. Future Work and Recommendations

To build on this exploratory analysis, future projects could:

Additionally, deeper behavioral data (e.g., seatbelt use, texting, alcohol) could be gathered from law enforcement databases to enhance predictive models and personalize campaigns. A more automated data pipeline would also allow near real-time dashboards for transport agencies.

11. Conclusion

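This report identified seasonal and monthly trends in Alberta’s 2022 collision data: total collisions peaked in November and December, while fatal and major-injury collisions peaked in July and August. These patterns reinforce the importance of proactive planning and seasonal preparation.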

With access to a larger and more granular dataset, the analysis could be expanded to include predictive modeling and spatial clustering.

12. References

Appendix A: Full Dataset

kable(collision_data[, 1:4])
Month      Fatal_Collisions  Major_Injury_Collisions  Total_Collisions
January                  14                      134             10759
February                 14                      112              9120
March                     8                      116              8588
April                    14                      114              7576
May                      26                      163              7874
June                     23                      184              8987
July                     36                      225              8996
August                   32                      242              8817
September                29                      214              8967
October                  25                      217              9316
November                 18                      135             12887
December                 12                      174             15034

Appendix B: Additional Figures

ggplot(collision_data, aes(x = Fatal_Collisions)) +
  geom_histogram(binwidth = 2, fill = "darkred", color = "white") +
  labs(title = "Histogram of Fatal Collisions", x = "Fatal Collisions", y = "Frequency")

ggplot(collision_data, aes(x = Total_Collisions)) +
  geom_density(fill = "lightblue") +
  labs(title = "Density Plot of Total Collisions")