month_day | avg_fatalities | z_score | overall_mean | overall_sd |
---|---|---|---|---|
1-1 | 156.6800 | 4.398594 | 124.8929 | 7.226650 |
5-23 | 160.6000 | 2.841433 | 146.9510 | 4.803573 |
7-4 | 176.0000 | 2.630590 | 155.1665 | 7.919725 |
3-16 | 140.8000 | 2.619854 | 130.3265 | 3.997761 |
11-1 | 167.4000 | 2.570047 | 149.9387 | 6.794170 |
4-3 | 148.4000 | 2.395102 | 139.0240 | 3.914656 |
7-3 | 174.0000 | 2.378056 | 155.1665 | 7.919725 |
2-29 | 137.1429 | 2.217328 | 126.5635 | 4.771198 |
9-4 | 164.4800 | 2.090657 | 153.2707 | 5.361633 |
10-31 | 168.6400 | 2.053682 | 156.3923 | 5.963797 |
Fatal Car Accidents in the U.S.
Introduction
Traffic fatalities remain one of America’s most persistent public health challenges. While advances in vehicle safety technology and traffic engineering have reduced overall fatality rates, certain days of the year consistently demonstrate higher risks for deadly crashes. This case study examines temporal patterns in U.S. fatal accidents, seeking to identify the most dangerous days on American roadways and the factors that may contribute to these elevated risks.
This case study is part of TidyTuesday, a weekly social data project for data enthusiasts.
Questions
Can you detect any correlations between fatal car crashes and particular days of the year?
What are the most dangerous days of the year for fatal car crashes in the United States?
What other factors might help analyze the data in more detail? You can use the cleaning script to download the full dataset.
Key Visualizations
Correlations by month and day
Most dangerous days of the year
Geographical analysis of the three most dangerous days
Temporal and demographic analysis extension
# Summary
Summer and fall have relatively high numbers of fatal car accidents, coinciding with major holidays.
Independence Day and Halloween are the days on which fatal accidents are most likely to happen.
On these holidays, the most dangerous areas for fatal accidents are concentrated in the southwestern U.S.
Accidents involving seniors tend to happen during the daytime, while accidents involving younger people tend to happen at night.
The Data, Data Dictionary
daily_accidents.csv
variable | class | description |
---|---|---|
date | date | Date of the accident. |
fatalities_count | integer | Total count of fatalities. |
Data preparation - create calendar features
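A minimal Python sketch of this step, using only the standard library. It assumes `daily_accidents.csv` has exactly the two columns from the data dictionary, with `date` in ISO `YYYY-MM-DD` format; the function name `add_calendar_features` and the `"7-4"` style `month_day` key (chosen to match the z-score table) are illustrative, not the original analysis code.

```python
import csv
import io
from datetime import date


def add_calendar_features(rows):
    """Attach year, month, day and a 'M-D' key to each daily record."""
    out = []
    for row in rows:
        d = date.fromisoformat(row["date"])  # assumes ISO dates such as 1992-07-04
        out.append({
            "date": d,
            "fatalities_count": int(row["fatalities_count"]),
            "year": d.year,
            "month": d.month,
            "day": d.day,
            "month_day": f"{d.month}-{d.day}",  # same key style as the z-score table
        })
    return out


# Toy stand-in for daily_accidents.csv (made-up counts, for illustration only).
# With the real file: open("daily_accidents.csv", newline="") instead of StringIO.
sample = io.StringIO("date,fatalities_count\n1992-07-04,100\n1993-07-04,120\n")
records = add_calendar_features(csv.DictReader(sample))
print(records[0]["month_day"], records[0]["year"])
```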
1. Calculate average fatalities for each day of the year
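A high z-score, such as the value of about 4.4 for January 1, means that day's average lies far above the mean it is compared with. In the table above, days from the same month share the same overall_mean and overall_sd, so the comparison appears to be made within each month. In a normal distribution, roughly 99.7% of values fall within 3 standard deviations of the mean; a z-score of 4.4 lies well outside this range, so the day can be considered an outlier.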
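The averaging and scoring can be sketched as follows. The sketch assumes, based on the table rather than the original code, that each day's average is scored against the mean and sample standard deviation (`statistics.stdev`) of all daily averages in its month; the function name and record layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def daily_z_scores(records):
    """Average fatalities per calendar day, then z-score each day within its month."""
    by_day = defaultdict(list)
    for r in records:
        by_day[(r["month"], r["day"])].append(r["fatalities_count"])
    avg = {key: mean(vals) for key, vals in by_day.items()}

    # Assumption: the reference mean/SD come from the daily averages of the same month
    by_month = defaultdict(list)
    for (m, _d), value in avg.items():
        by_month[m].append(value)
    month_stats = {m: (mean(v), stdev(v)) for m, v in by_month.items()}

    rows = []
    for (m, d), value in avg.items():
        mu, sd = month_stats[m]
        rows.append({"month_day": f"{m}-{d}", "avg_fatalities": value,
                     "z_score": (value - mu) / sd,
                     "overall_mean": mu, "overall_sd": sd})
    return sorted(rows, key=lambda row: row["z_score"], reverse=True)


# Toy January records (made-up counts): daily averages 12, 10 and 8
toy = [{"month": 1, "day": 1, "fatalities_count": 10},
       {"month": 1, "day": 1, "fatalities_count": 14},
       {"month": 1, "day": 2, "fatalities_count": 10},
       {"month": 1, "day": 3, "fatalities_count": 8}]
print(daily_z_scores(toy)[0])

# Cross-check against the first row of the z-score table
print(round((156.68 - 124.8929) / 7.226650, 2))
```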
2. Visualization of average fatalities across the year
Create a continuous day-of-year variable for plotting
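The ranking table's Day of Year column (185 for July 4, 304 for October 31) matches a non-leap year, so this sketch maps every month-day onto a fixed non-leap reference year. Placing February 29 at 59.5 is an assumption made here so that the leap day still gets its own position on the x-axis.

```python
from datetime import date

# Assumption: non-leap reference year, consistent with the ranking table
REFERENCE_YEAR = 2021


def day_of_year(month, day):
    """Continuous day-of-year position for a calendar day, shared across all years."""
    if (month, day) == (2, 29):
        return 59.5  # between Feb 28 (59) and Mar 1 (60) in a non-leap year
    return date(REFERENCE_YEAR, month, day).timetuple().tm_yday


print(day_of_year(7, 4), day_of_year(10, 31), day_of_year(2, 29))
```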
3. Highlight statistically significant days
Confidence level | Critical z-value (two-sided) |
---|---|
90% | 1.645 |
95% | 1.96 |
99% | 2.576 |
99.5% | 2.81 |
99.9% | 3.29 |
Define the significance threshold
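A minimal sketch of deriving these critical values with Python's `statistics.NormalDist` and applying one as a threshold to the z-scores listed in the top-10 table. The 99% level is an illustrative choice, not necessarily the one used in the original plot. The values are two-sided; if only unusually high days matter, a one-sided cut-off (for example 1.645 at 95%) is an alternative.

```python
from statistics import NormalDist


def critical_z(confidence):
    """Two-sided critical value of the standard normal for a confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


# z-scores copied from the top-10 z-score table
z_scores = {"1-1": 4.398594, "5-23": 2.841433, "7-4": 2.630590, "3-16": 2.619854,
            "11-1": 2.570047, "4-3": 2.395102, "7-3": 2.378056, "2-29": 2.217328,
            "9-4": 2.090657, "10-31": 2.053682}

threshold = critical_z(0.99)  # about 2.576
significant = [day for day, z in z_scores.items() if z > threshold]
print(round(threshold, 3), significant)
```

At the 99% level, November 1 (z of about 2.570) falls just short of the cut-off, which shows how sensitive the highlighted set is to the chosen level.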
Plot with significant days highlighted
4. Calendar heatmap visualization
Prepare data for calendar heatmap
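One simple way to shape the data for a calendar heatmap is a 12 x 31 month-by-day grid. This is a sketch under that assumption; the grid layout and function name are hypothetical, and the rendering itself is left to whatever plotting library is used.

```python
def calendar_grid(avg_by_day):
    """Build a 12 x 31 grid (rows = months, columns = days) for a calendar heatmap.

    Cells for dates that never occur (e.g. February 30) or that have no data stay
    None, so the plot can leave them blank.
    """
    grid = [[None] * 31 for _ in range(12)]
    for (month, day), value in avg_by_day.items():
        grid[month - 1][day - 1] = value
    return grid


# Three averages from the ranking table; every other cell stays None in this sketch
grid = calendar_grid({(7, 4): 176.00, (7, 3): 174.00, (10, 31): 168.64})
print(grid[6][2:4], grid[1][29])
```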
Ranking
Day
Rank | Date | Avg. Fatalities | % of Max | Season | Special Day | Day of Year |
---|---|---|---|---|---|---|
1 | July 04 | 176.00 | 100.0 | Summer | Independence Day | 185 |
2 | July 03 | 174.00 | 98.9 | Summer | | 184 |
3 | October 31 | 168.64 | 95.8 | Fall | Halloween | 304 |
4 | August 06 | 168.52 | 95.8 | Summer | | 218 |
5 | October 11 | 167.52 | 95.2 | Fall | | 284 |
6 | November 01 | 167.40 | 95.1 | Fall | Month Start | 305 |
7 | August 08 | 166.72 | 94.7 | Summer | | 220 |
8 | October 09 | 165.72 | 94.2 | Fall | | 282 |
9 | August 02 | 165.56 | 94.1 | Summer | | 214 |
10 | October 16 | 164.76 | 93.6 | Fall | | 289 |
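The Rank and % of Max columns can be reproduced by sorting the daily averages and dividing each by the largest one. A minimal sketch; the function name and output fields are hypothetical.

```python
def rank_days(avg_by_day, top_n=10):
    """Rank calendar days by average fatalities, with each as a % of the maximum."""
    ranked = sorted(avg_by_day.items(), key=lambda item: item[1], reverse=True)[:top_n]
    top_value = ranked[0][1]
    return [
        {"rank": i, "month_day": day, "avg": value,
         "pct_of_max": round(100 * value / top_value, 1)}
        for i, (day, value) in enumerate(ranked, start=1)
    ]


# A few averages from the ranking table
avgs = {"7-4": 176.00, "7-3": 174.00, "10-31": 168.64, "10-16": 164.76}
for row in rank_days(avgs, top_n=3):
    print(row)
```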
Month
5. Identify specific notable dates and holidays
Define common US holidays and notable dates
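Fixed-date holidays can be matched directly on month and day, but holidays such as Memorial Day, Labor Day and Thanksgiving move every year, so averaging by calendar date spreads their effect over several dates. A sketch that handles both kinds; the holiday list is illustrative and not necessarily the one used in the original analysis.

```python
import calendar
from datetime import date, timedelta

# Illustrative fixed-date holidays keyed by (month, day)
FIXED_HOLIDAYS = {
    (1, 1): "New Year's Day",
    (7, 4): "Independence Day",
    (10, 31): "Halloween",
    (12, 25): "Christmas Day",
}


def nth_weekday(year, month, weekday, n):
    """n-th occurrence of a weekday (Monday = 0) in a month."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset + 7 * (n - 1))


def last_weekday(year, month, weekday):
    """Last occurrence of a weekday (Monday = 0) in a month."""
    last = date(year, month, calendar.monthrange(year, month)[1])
    return last - timedelta(days=(last.weekday() - weekday) % 7)


def floating_holidays(year):
    """Holidays whose calendar date changes from year to year."""
    return {
        last_weekday(year, 5, calendar.MONDAY): "Memorial Day",
        nth_weekday(year, 9, calendar.MONDAY, 1): "Labor Day",
        nth_weekday(year, 11, calendar.THURSDAY, 4): "Thanksgiving",
    }


print(floating_holidays(2020))
```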
6. Analyze day-to-day variations
Calculate day-to-day changes in fatality counts
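Day-to-day changes are the differences between consecutive daily averages once they are sorted by day of year; the largest absolute changes point to abrupt jumps such as the run-up to a holiday. A minimal sketch with hypothetical function names.

```python
def day_to_day_changes(series):
    """Change in average fatalities from each calendar day to the next.

    `series` is a list of (label, value) pairs already sorted by day of year.
    """
    return [(label, curr - prev)
            for (_, prev), (label, curr) in zip(series, series[1:])]


def largest_jumps(series, k=5):
    """The k day-to-day changes with the largest absolute size."""
    return sorted(day_to_day_changes(series), key=lambda c: abs(c[1]), reverse=True)[:k]


# July 3 and July 4 averages from the ranking table
print(day_to_day_changes([("7-3", 174.00), ("7-4", 176.00)]))
```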
7. Weekly patterns within months
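One reading of this step is to compare days of the week within each month. A sketch under that assumption, working on (date, fatalities_count) pairs; the function name and toy input are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def weekday_by_month(records):
    """Mean daily fatalities for each (month, weekday) pair; weekday 0 = Monday."""
    groups = defaultdict(list)
    for d, count in records:
        groups[(d.month, d.weekday())].append(count)
    return {key: mean(vals) for key, vals in groups.items()}


# Toy input with made-up counts: two Saturdays and one Monday in July 2020
toy = [(date(2020, 7, 4), 100), (date(2020, 7, 11), 120), (date(2020, 7, 6), 80)]
print(weekday_by_month(toy))
```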
8. Analyzing specific day patterns (e.g., first day of month, last day of month)
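Flagging the first and last day of each month only needs the month length, which `calendar.monthrange` returns with leap years already accounted for. A minimal sketch; the labels are hypothetical.

```python
import calendar
from datetime import date


def month_position(d):
    """Label a date as the first day, last day, or another day of its month."""
    if d.day == 1:
        return "first"
    if d.day == calendar.monthrange(d.year, d.month)[1]:
        return "last"  # monthrange handles 28- and 29-day Februaries
    return "other"


print(month_position(date(2020, 2, 29)), month_position(date(2021, 2, 28)),
      month_position(date(2021, 11, 1)))
```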
9. Seasonal patterns analysis
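A sketch of the seasonal grouping, assuming Northern Hemisphere meteorological seasons (December to February is winter, and so on). This assumption agrees with the Season column of the ranking table, where July and August are Summer and October and November are Fall.

```python
from collections import defaultdict
from statistics import mean


def season(month):
    """Meteorological season (Northern Hemisphere) for a month number 1-12."""
    if month in (12, 1, 2):
        return "Winter"
    if month in (3, 4, 5):
        return "Spring"
    if month in (6, 7, 8):
        return "Summer"
    return "Fall"  # September to November


def season_means(avg_by_day):
    """Mean of the daily averages within each season; keys are (month, day) tuples."""
    groups = defaultdict(list)
    for (month, _day), value in avg_by_day.items():
        groups[season(month)].append(value)
    return {name: mean(values) for name, values in groups.items()}


print(season_means({(7, 4): 176.00, (8, 6): 168.52, (10, 31): 168.64, (11, 1): 167.40}))
```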
10. Statistical tests for particular days
Let’s run t-tests comparing specific days to all other days!
- Dates with tall red bars: these holidays show a difference from other days that would be very unlikely if fatalities on those dates were no different from the rest of the year. The taller the bar, the stronger the evidence that something unusual happens on that date.
- Dates with short gray bars: these holidays do not show convincing evidence of fatality patterns that differ from regular days.
- Ordering: the dates are ordered by p-value (from least to most significant), which makes it easy to see which holidays show the strongest patterns.
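The write-up does not say which t-test variant was used. This sketch assumes Welch's two-sample t-test, which does not require equal variances, comparing the yearly counts on one date against counts on all other days. Only the statistic and degrees of freedom are computed with the standard library; the p-value needs the t distribution.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic and degrees of freedom (unequal variances).

    a: fatality counts on one date (one value per year); b: counts on all other days.
    """
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# If SciPy is available, scipy.stats.ttest_ind(a, b, equal_var=False)
# returns the same t statistic together with its p-value.
t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t, 4), round(df, 4))
```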
11. Visualize difference from yearly average
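A sketch of the quantity behind this plot, assuming the "yearly average" is the unweighted mean of all calendar-day averages (so February 29 counts as one day like any other). Both the absolute and the percentage difference are returned; the function name is hypothetical.

```python
from statistics import mean


def difference_from_yearly_average(avg_by_day):
    """Absolute and percentage difference of each day's average from the mean of all days."""
    yearly = mean(avg_by_day.values())
    return {
        day: (value - yearly, 100 * (value - yearly) / yearly)
        for day, value in avg_by_day.items()
    }


# Toy averages: the yearly mean is 100
print(difference_from_yearly_average({"a": 90, "b": 110}))
```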
Arrange all plots in a grid for a comprehensive view
Holiday and special days analysis
Key findings to questions
Can you detect any correlations between fatal car crashes and particular days of the year?
- As the ranking shows, fatal crashes peak on Independence Day (July 4) and on the day before it (July 3).
- Halloween (October 31) and the day after it (November 1) also rank among the most dangerous days.
- Measured by z-score, New Year’s Day stands out most strongly: its average of 156.68 fatalities lies about 4.4 standard deviations above its reference mean of 124.89.
What are the most dangerous days of the year for fatal car crashes in the United States?
- July 4, Independence Day, is the most dangerous day of the year in the United States, with an average of 176 fatalities.
What other factors might help analyze the data in more detail? You can use the cleaning script to download the full dataset.
So far, July 3, July 4, and October 31 stand out as the days with the highest numbers of fatal car accidents. Let’s examine them from demographic, geographic, and temporal points of view.
Temporal and demographic analysis extension
Compare accidents during the daytime and at night
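A sketch of this comparison. It assumes the full dataset from the cleaning script provides the hour of each crash and the age of the person involved; those fields, the 06:00 to 17:59 daytime window and the age cut-offs are all hypothetical choices made for illustration, and the original analysis may define them differently.

```python
from collections import Counter


def time_of_day(hour):
    """Hypothetical split: 06:00-17:59 counts as daytime, everything else as night."""
    return "day" if 6 <= hour < 18 else "night"


def age_group(age):
    """Hypothetical age bins; the original analysis may use different cut-offs."""
    if age < 25:
        return "young (<25)"
    if age < 65:
        return "adult (25-64)"
    return "senior (65+)"


def crosstab(crashes):
    """Count crashes per (age group, time of day); `crashes` holds (hour, age) pairs."""
    return Counter((age_group(age), time_of_day(hour)) for hour, age in crashes)


# Toy records, not real data
print(crosstab([(23, 19), (2, 22), (10, 70), (14, 80), (12, 40)]))
```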
Geographical analysis
Overall, the southwestern part of the U.S. has recorded the highest numbers of fatal car accidents since 1992.