Alberta Traffic Collision Analysis (2022)
DAT 301 - Project 1
Author: Rudy
Date: 2025-03-27
This report analyzes monthly traffic collision data from Alberta in 2022. The objective is to discover trends and draw conclusions that can support future safety measures across Alberta’s highways and city streets.
Traffic accidents are influenced by numerous variables, including weather, driver behavior, enforcement intensity, and traffic volume. Alberta is an instructive case because of its geographic and climate diversity and its strong seasonal swings, from snowy winters to busy summer travel seasons.
Understanding the monthly and seasonal distribution of traffic collisions can help policymakers implement timely interventions and road safety campaigns. This project uses a manually extracted subset of the provincial traffic data to demonstrate how data analysis can support real-world decision-making.
The dataset was manually constructed using figures from the 2022 Alberta Traffic Collision Statistics PDF available on the Alberta Open Government Portal.
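The figures behind the CSV are listed in full in the appendix table at the end of this report. As a minimal sketch, assuming those appendix values match the CSV exactly, the same 12-row table can be rebuilt directly in base R so the analysis can run without the file. `collision_data_inline` is a hypothetical name, chosen so it does not overwrite the `collision_data` object read from the CSV.

```r
# Hypothetical inline copy of Alberta_Collision_Monthly_2022.csv, typed from the
# appendix table (January to December). Base R only; no packages required.
collision_data_inline <- data.frame(
  Month = month.name,
  Fatal_Collisions = c(14, 14, 8, 14, 26, 23, 36, 32, 29, 25, 18, 12),
  Major_Injury_Collisions = c(134, 112, 116, 114, 163, 184, 225, 242, 214, 217, 135, 174),
  Total_Collisions = c(10759, 9120, 8588, 7576, 7874, 8987, 8996, 8817, 8967, 9316, 12887, 15034)
)

# Quick checks against the summary() output: structure and mean monthly total
str(collision_data_inline)
round(mean(collision_data_inline$Total_Collisions), 2)
```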
library(tidyverse); library(skimr); library(knitr)  # readr/dplyr/tidyr/ggplot2, skim(), kable()
collision_data <- read_csv("Alberta_Collision_Monthly_2022.csv")  # expects the CSV in the working directory
## Rows: 12 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Month
## dbl (3): Fatal_Collisions, Major_Injury_Collisions, Total_Collisions
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(collision_data)
## Rows: 12
## Columns: 4
## $ Month <chr> "January", "February", "March", "April", "May"…
## $ Fatal_Collisions <dbl> 14, 14, 8, 14, 26, 23, 36, 32, 29, 25, 18, 12
## $ Major_Injury_Collisions <dbl> 134, 112, 116, 114, 163, 184, 225, 242, 214, 2…
## $ Total_Collisions <dbl> 10759, 9120, 8588, 7576, 7874, 8987, 8996, 881…
summary(collision_data)
##     Month           Fatal_Collisions Major_Injury_Collisions Total_Collisions
##  Length:12          Min.   : 8.00    Min.   :112.0           Min.   : 7576
##  Class :character   1st Qu.:14.00    1st Qu.:129.5           1st Qu.: 8760
##  Mode  :character   Median :20.50    Median :168.5           Median : 8992
##                     Mean   :20.92    Mean   :169.2           Mean   : 9743
##                     3rd Qu.:26.75    3rd Qu.:214.8           3rd Qu.: 9677
##                     Max.   :36.00    Max.   :242.0           Max.   :15034
skim(collision_data)
Name | collision_data |
---|---|
Number of rows | 12 |
Number of columns | 4 |
Column type frequency: | |
character | 1 |
numeric | 3 |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
Month | 0 | 1 | 3 | 9 | 0 | 12 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
Fatal_Collisions | 0 | 1 | 20.92 | 8.85 | 8 | 14.00 | 20.5 | 26.75 | 36 | ▃▇▂▆▃ |
Major_Injury_Collisions | 0 | 1 | 169.17 | 47.24 | 112 | 129.50 | 168.5 | 214.75 | 242 | ▇▂▃▂▅ |
Total_Collisions | 0 | 1 | 9743.42 | 2165.73 | 7576 | 8759.75 | 8991.5 | 9676.75 | 15034 | ▇▂▁▁▁ |
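Beyond the descriptive summaries above, a few explicit checks can confirm the table is complete before plotting. The sketch below uses base R and the figures from the appendix table. The last check rests on an assumption the source does not state: that fatal and major injury collisions are counted within `Total_Collisions`.

```r
# Monthly figures as listed in the appendix table (January to December)
fatal <- c(14, 14, 8, 14, 26, 23, 36, 32, 29, 25, 18, 12)
major <- c(134, 112, 116, 114, 163, 184, 225, 242, 214, 217, 135, 174)
total <- c(10759, 9120, 8588, 7576, 7874, 8987, 8996, 8817, 8967, 9316, 12887, 15034)

checks <- c(
  twelve_months   = length(total) == 12,
  no_missing      = !anyNA(c(fatal, major, total)),
  non_negative    = all(c(fatal, major, total) >= 0),
  # Assumption: fatal and major-injury collisions are subsets of the monthly total
  severe_le_total = all(fatal + major <= total)
)
print(checks)

# Cross-check against skim(), which reported sd = 8.85 (fatal) and 2165.73 (total)
round(sd(fatal), 2)
round(sd(total), 2)
```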
collision_data$Month <- factor(collision_data$Month, levels = month.name)
ggplot(collision_data, aes(x = Month, y = Total_Collisions)) +
  geom_col(fill = "steelblue") +  # geom_col() is geom_bar(stat = "identity")
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Total Traffic Collisions by Month", y = "Number of Collisions", x = "Month")
Interpretation:
December reported the highest number of traffic collisions (15,034),
likely reflecting difficult winter driving conditions. January (10,759)
and November (12,887) were also elevated, possibly because of icy roads
and holiday travel. April was the lowest month (7,576), which is
consistent with easier spring driving conditions, although monthly
totals alone cannot establish the cause.
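To put the December peak and April low in proportion, each month's share of the 2022 total (116,921 collisions) can be computed. A minimal base-R sketch using the monthly totals from the data:

```r
# Monthly totals from the dataset, named by month
total <- c(10759, 9120, 8588, 7576, 7874, 8987, 8996, 8817, 8967, 9316, 12887, 15034)
names(total) <- month.name

# Percentage of the annual total contributed by each month
share_pct <- round(100 * total / sum(total), 1)
head(sort(share_pct, decreasing = TRUE), 3)  # three highest-share months

names(which.max(total))               # month with the most collisions
names(which.min(total))               # month with the fewest collisions
round(max(total) / min(total), 2)     # busiest month relative to the quietest
```

December alone accounts for about 12.9% of the year's collisions, nearly twice April's count.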
collision_data_long <- collision_data %>%
  pivot_longer(cols = c(Fatal_Collisions, Major_Injury_Collisions),
               names_to = "Type", values_to = "Count")
ggplot(collision_data_long, aes(x = Month, y = Count, group = Type, color = Type)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 2) +
  labs(title = "Fatal vs Major Injury Collisions by Month", x = "Month", y = "Count") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Interpretation:
July had the most fatal collisions (36), while August had the most major
injury collisions (242). These summer peaks may reflect heavier vacation
traffic, more vulnerable road users (pedestrians, cyclists,
motorcyclists), and possibly driver fatigue on long trips; the dataset
itself does not record these factors.
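`pivot_longer()` reshapes the two severity columns into a single `Count` column keyed by `Type`, which is what lets ggplot draw one line per type. The sketch below builds an equivalent long table in base R (its rows are stacked by type, whereas `pivot_longer()` interleaves them month by month) and then confirms the peak month for each type. `collision_long` and `peak_month` are illustrative names, not objects from the report.

```r
fatal <- c(14, 14, 8, 14, 26, 23, 36, 32, 29, 25, 18, 12)
major <- c(134, 112, 116, 114, 163, 184, 225, 242, 214, 217, 135, 174)

# Long format: one row per Month x Type, 24 rows in total
collision_long <- data.frame(
  Month = factor(rep(month.name, times = 2), levels = month.name),
  Type  = rep(c("Fatal_Collisions", "Major_Injury_Collisions"), each = 12),
  Count = c(fatal, major)
)

# For each Type, the Month with the largest Count
peak_month <- sapply(split(collision_long, collision_long$Type), function(d) {
  as.character(d$Month[which.max(d$Count)])
})
peak_month
```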
collision_data <- collision_data %>%
  mutate(Season = case_when(
    Month %in% c("December", "January", "February") ~ "Winter",
    Month %in% c("March", "April", "May") ~ "Spring",
    Month %in% c("June", "July", "August") ~ "Summer",
    TRUE ~ "Fall"  # September, October, November
  ))
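The `case_when()` mapping above uses meteorological seasons. The same mapping can be written as arithmetic on the month number: `(m %% 12) %/% 3` sends December, January, and February to 0, March to May to 1, June to August to 2, and September to November to 3. A minimal base-R sketch; `month_to_season()` is a hypothetical helper, not part of the report's code:

```r
# Hypothetical helper: map full English month names to meteorological seasons
month_to_season <- function(month_name) {
  m <- match(month_name, month.name)        # 1 = January ... 12 = December; NA if unrecognized
  seasons <- c("Winter", "Spring", "Summer", "Fall")
  seasons[(m %% 12) %/% 3 + 1]              # Dec/Jan/Feb -> 1, Mar-May -> 2, Jun-Aug -> 3, Sep-Nov -> 4
}

month_to_season(month.name)
```

One difference in behavior: the `TRUE ~ "Fall"` fallback in `case_when()` labels any unmatched value, including a misspelled month, as Fall, whereas this version returns `NA`, which makes data-entry errors easier to spot.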
seasonal_summary <- collision_data %>%
  group_by(Season) %>%
  summarise(across(Fatal_Collisions:Total_Collisions, sum))
kable(seasonal_summary)
Season | Fatal_Collisions | Major_Injury_Collisions | Total_Collisions |
---|---|---|---|
Fall | 72 | 566 | 31170 |
Spring | 48 | 393 | 24038 |
Summer | 91 | 651 | 26800 |
Winter | 40 | 420 | 34913 |
Interpretation:
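- Winter has the most total collisions (34,913), consistent with
slippery road conditions.
- Summer has the most fatal (91) and major injury (651) collisions,
possibly because of higher speeds and heavier traffic, even though its
total collision count (26,800) is below both winter's and fall's.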
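Winter's lead in total collisions and summer's lead in severe ones can be compared on one scale by dividing each season's fatal and major injury counts by its total collisions. This is a rate per collision, not per unit of traffic: the dataset has no traffic-volume figures (such as vehicle-kilometres), so it cannot separate the speed and volume explanations above. A base-R sketch that reproduces the seasonal sums with `rowsum()` and then computes the rates:

```r
fatal  <- c(14, 14, 8, 14, 26, 23, 36, 32, 29, 25, 18, 12)
major  <- c(134, 112, 116, 114, 163, 184, 225, 242, 214, 217, 135, 174)
total  <- c(10759, 9120, 8588, 7576, 7874, 8987, 8996, 8817, 8967, 9316, 12887, 15034)
season <- c("Winter", "Winter", "Spring", "Spring", "Spring", "Summer",
            "Summer", "Summer", "Fall", "Fall", "Fall", "Winter")

# rowsum() adds rows sharing a group label, like group_by() + summarise(sum)
seasonal <- rowsum(cbind(Fatal = fatal, Major = major, Total = total), group = season)

# Severe collisions per 1,000 collisions (each row divided by its own Total)
rates <- round(1000 * seasonal[, c("Fatal", "Major")] / seasonal[, "Total"], 2)
colnames(rates) <- c("Fatal_per_1000", "Major_per_1000")
cbind(seasonal, rates)
```

On these figures summer has about 3.4 fatal collisions per 1,000 collisions against about 1.1 in winter, so a summer collision in 2022 was roughly three times as likely to be fatal.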
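collision_data %>%
  pivot_longer(cols = c(Fatal_Collisions, Major_Injury_Collisions),
               names_to = "Type", values_to = "Count") %>%
  mutate(Season = factor(Season, levels = c("Winter", "Spring", "Summer", "Fall"))) %>%
  ggplot(aes(x = Season, y = Count, fill = Type)) +
  geom_boxplot() +
  facet_wrap(~ Type, scales = "free_y") +  # separate y-axes: fatal counts are far smaller
  labs(title = "Collision Severity by Season")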
Interpretation:
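The boxplots show that summer months sit higher than the other seasons
for both fatal and major injury collisions, with the highest seasonal
medians (32 fatal and 225 major injury collisions), reinforcing the
seasonal risk. Each box summarizes only three months, so the spreads
should be read with caution.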
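Because each seasonal box is drawn from only three monthly values, its quartiles are interpolated between those three points. The sketch below computes R's default (type 7) quantiles, the same `quantile()` call ggplot2's boxplot statistic relies on for unweighted data, for fatal collisions in each season; the season labels follow the `case_when()` mapping above, and `fatal_q` is an illustrative name.

```r
fatal  <- c(14, 14, 8, 14, 26, 23, 36, 32, 29, 25, 18, 12)
season <- c("Winter", "Winter", "Spring", "Spring", "Spring", "Summer",
            "Summer", "Summer", "Fall", "Fall", "Fall", "Winter")

table(season)  # three months per season

# Min, quartiles, and max of fatal collisions within each season (rows = seasons)
fatal_q <- t(sapply(split(fatal, season), quantile,
                    probs = c(0, 0.25, 0.5, 0.75, 1)))
fatal_q
```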
collision_data <- collision_data %>%
  arrange(match(Month, month.name)) %>%  # calendar order, so lag() refers to the previous month
  mutate(Pct_Change = round(100 * (Total_Collisions - lag(Total_Collisions)) / lag(Total_Collisions), 1))
kable(collision_data[, c("Month", "Total_Collisions", "Pct_Change")])
Month | Total_Collisions | Pct_Change |
---|---|---|
January | 10759 | NA |
February | 9120 | -15.2 |
March | 8588 | -5.8 |
April | 7576 | -11.8 |
May | 7874 | 3.9 |
June | 8987 | 14.1 |
July | 8996 | 0.1 |
August | 8817 | -2.0 |
September | 8967 | 1.7 |
October | 9316 | 3.9 |
November | 12887 | 38.3 |
December | 15034 | 16.7 |
Interpretation:
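The largest month-over-month increases occur in November (+38.3%) and
December (+16.7%) as winter sets in, with a smaller rise at the start of
summer in June (+14.1%). The steepest declines are in February (-15.2%)
and April (-11.8%). These swings can help predict when to ramp up road
safety messaging.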
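The `Pct_Change` column follows the usual month-over-month formula, 100 × (T_t - T_{t-1}) / T_{t-1}, where T_t is the total for month t. The base-R sketch below computes it with `diff()` and ranks the largest rises and falls; January is `NA` because December 2021 is not in the dataset. `pct_change` is an illustrative name.

```r
total <- c(10759, 9120, 8588, 7576, 7874, 8987, 8996, 8817, 8967, 9316, 12887, 15034)

# diff() gives T_t - T_{t-1}; head(total, -1) gives T_{t-1} for February..December
pct_change <- c(NA, round(100 * diff(total) / head(total, -1), 1))
names(pct_change) <- month.name

head(sort(pct_change, decreasing = TRUE), 3)  # largest increases (sort() drops the NA)
head(sort(pct_change), 3)                     # largest decreases
```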
collision_data %>%
  summarise(
    Mean_Fatal = mean(Fatal_Collisions),
    Median_Fatal = median(Fatal_Collisions),
    Max_Total = max(Total_Collisions),
    Min_Total = min(Total_Collisions)
  )
Interpretation:
Fatal collisions averaged about 21 per month (mean 20.92, median 20.5),
and July (36) was well above that level. Monthly totals ranged from
7,576 (April) to 15,034 (December). These ranges help define what
counts as an “exceptional” month.
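One common way to make "exceptional" precise is Tukey's fences: a value is flagged if it lies more than 1.5 × IQR below the first quartile or above the third. The 1.5 multiplier is a convention, not something the source prescribes, and `tukey_flags()` is a hypothetical helper. A base-R sketch:

```r
fatal <- c(14, 14, 8, 14, 26, 23, 36, 32, 29, 25, 18, 12)
total <- c(10759, 9120, 8588, 7576, 7874, 8987, 8996, 8817, 8967, 9316, 12887, 15034)
names(fatal) <- month.name
names(total) <- month.name

# Hypothetical helper: names of values outside [Q1 - k*IQR, Q3 + k*IQR]
tukey_flags <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75))  # type 7, the same quartiles summary() reports
  iqr <- q[[2]] - q[[1]]
  names(x)[x < q[[1]] - k * iqr | x > q[[2]] + k * iqr]
}

tukey_flags(fatal)  # fatal collisions
tukey_flags(total)  # total collisions
```

With these data no month is flagged for fatal collisions (the upper fence is 45.875), while November and December exceed the upper fence for total collisions (11,052.25).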
Alberta’s traffic patterns are influenced by multiple factors. Research from Transport Canada and the Alberta Ministry of Transportation has shown that:
Behavioral trends such as texting while driving, fatigue, and seasonal inexperience (e.g., summer drivers in rural areas) must be addressed in tandem with enforcement.
Alberta also sees a wide variety of road conditions across urban and rural areas. City driving introduces congestion and pedestrian risk, while highways present speeding and fatigue hazards. Understanding these environments is key for tailoring enforcement and safety campaigns.
To reduce fatalities and injuries, Alberta’s municipalities and transport authorities can consider:
Infrastructure improvements could include smart signage, rumble strips at rural intersections, and improved lighting in high-collision areas. Public campaigns on distracted driving, speeding, and vehicle maintenance should align with these monthly trends.
To build on this exploratory analysis, future projects could:
Additionally, deeper behavioral data (e.g., seatbelt use, texting, alcohol) could be gathered from law enforcement databases to enhance predictive models and personalize campaigns. A more automated data pipeline would also allow near real-time dashboards for transport agencies.
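This report identified clear seasonal and monthly patterns in Alberta’s 2022 collision data: total collisions peak in November and December, while fatal and major injury collisions peak in summer (July and August). These patterns support proactive, season-specific planning.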
With access to a larger and more granular dataset, the analysis could be expanded to include predictive modeling and spatial clustering.
kable(collision_data[, 1:4])
Month | Fatal_Collisions | Major_Injury_Collisions | Total_Collisions |
---|---|---|---|
January | 14 | 134 | 10759 |
February | 14 | 112 | 9120 |
March | 8 | 116 | 8588 |
April | 14 | 114 | 7576 |
May | 26 | 163 | 7874 |
June | 23 | 184 | 8987 |
July | 36 | 225 | 8996 |
August | 32 | 242 | 8817 |
September | 29 | 214 | 8967 |
October | 25 | 217 | 9316 |
November | 18 | 135 | 12887 |
December | 12 | 174 | 15034 |
ggplot(collision_data, aes(x = Fatal_Collisions)) +
  geom_histogram(binwidth = 2, fill = "darkred", color = "white") +
  labs(title = "Histogram of Fatal Collisions", x = "Fatal Collisions", y = "Frequency")
ggplot(collision_data, aes(x = Total_Collisions)) +
  geom_density(fill = "lightblue") +
  labs(title = "Density Plot of Total Collisions")