Fatal Car Accidents in the U.S.

Author

S.Matsumoto

Published

May 7, 2025

Introduction

Traffic fatalities remain one of America’s most persistent public health challenges. While advances in vehicle safety technology and traffic engineering have reduced overall fatality rates, certain days of the year consistently demonstrate higher risks for deadly crashes. This case study examines temporal patterns in U.S. fatal accidents, seeking to identify the most dangerous days on American roadways and the factors that may contribute to these elevated risks.

This case study is based on a TidyTuesday dataset. TidyTuesday is a weekly social data project for data enthusiasts.

Question

  • Can you detect any correlations between fatal car crashes and particular days of the year?

  • What are the most dangerous days of the year for fatal car crashes in the United States?

  • What other factors might help analyze the data in more detail? You can use the cleaning script to download the full dataset.

Key Visualizations

Correlations by month and day

Most dangerous days of the year

Ranked by average fatalities per day from 1992 to 2024.

Geographical analysis of the three most dangerous days

Temporal and Demographical analysis extension

Summary

  • Fatal car accidents are relatively frequent in summer and fall, largely around major holidays.

  • Independence Day and Halloween are the days on which fatal accidents are most likely to happen.

  • The southwestern U.S. has more high-risk areas where fatal accidents occur on these holidays.

  • Drivers in the senior age group are more likely to be involved in accidents during the daytime, while drivers in the younger age group are more likely to be involved in accidents at night.

The Data, Data Dictionary

daily_accidents.csv

| variable | class | description |
|---|---|---|
| date | date | Date of the accident. |
| fatalities_count | integer | Total count of fatalities. |

Data preparation - create calendar features

1. Calculate average fatalities for each day of the year

| month_day | avg_fatalities | z_score | overall_mean | overall_sd |
|---|---|---|---|---|
| 1-1 | 156.6800 | 4.398594 | 124.8929 | 7.226650 |
| 5-23 | 160.6000 | 2.841433 | 146.9510 | 4.803573 |
| 7-4 | 176.0000 | 2.630590 | 155.1665 | 7.919725 |
| 3-16 | 140.8000 | 2.619854 | 130.3265 | 3.997761 |
| 11-1 | 167.4000 | 2.570047 | 149.9387 | 6.794170 |
| 4-3 | 148.4000 | 2.395102 | 139.0240 | 3.914656 |
| 7-3 | 174.0000 | 2.378056 | 155.1665 | 7.919725 |
| 2-29 | 137.1429 | 2.217328 | 126.5635 | 4.771198 |
| 9-4 | 164.4800 | 2.090657 | 153.2707 | 5.361633 |
| 10-31 | 168.6400 | 2.053682 | 156.3923 | 5.963797 |

A high z-score, such as the 4.40 for January 1, means the data point lies far above the average. In a normal distribution, roughly 99.7% of values fall within 3 standard deviations of the mean, so a z-score of 4.40 lies well outside this range and can be treated as an outlier.
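In the z-score table, `overall_mean` and `overall_sd` repeat for days in the same month (7-3 and 7-4 share both values). This suggests that each z-score compares a day's average against the mean and standard deviation of the daily averages within its month. The sketch below follows that reading. It is an assumption about the method, not the original code. The function names and the use of the sample standard deviation are also illustrative choices.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, stdev


def day_of_year_averages(records):
    """Average fatalities for each (month, day) across all years.

    `records` is an iterable of (date, fatalities_count) pairs.
    """
    by_day = defaultdict(list)
    for d, count in records:
        by_day[(d.month, d.day)].append(count)
    return {key: mean(values) for key, values in by_day.items()}


def monthly_z_scores(day_avgs):
    """z-score of each day's average against all daily averages of its month."""
    by_month = defaultdict(list)
    for (month, _day), avg in day_avgs.items():
        by_month[month].append(avg)

    scores = {}
    for (month, day), avg in day_avgs.items():
        month_mean = mean(by_month[month])
        month_sd = stdev(by_month[month])  # assumption: sample standard deviation
        scores[(month, day)] = (avg - month_mean) / month_sd
    return scores


# Toy data for illustration only (not from daily_accidents.csv).
toy = [
    (date(2001, 1, 1), 10.0), (date(2002, 1, 1), 20.0),
    (date(2001, 1, 2), 5.0), (date(2002, 1, 2), 5.0),
    (date(2001, 1, 3), 10.0), (date(2002, 1, 3), 0.0),
]
toy_avgs = day_of_year_averages(toy)
toy_z = monthly_z_scores(toy_avgs)
```

The same arithmetic applied to the first table row gives (156.6800 - 124.8929) / 7.226650, which is about 4.3986 and matches the reported `z_score`.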

2. Visualization of average fatalities across the year

Create a continuous day-of-year variable for plotting
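A minimal sketch of this step, assuming the `month_day` labels look like `7-4` and that a non-leap reference year is used. That choice matches the Day of Year values in the ranking table (July 04 is day 185). The placement of February 29 is an assumption made here for plotting, not something the source specifies.

```python
from datetime import date

REFERENCE_YEAR = 2001  # assumed non-leap reference year


def day_of_year(month_day):
    """Convert an 'M-D' label such as '7-4' to a day-of-year number."""
    month, day = (int(part) for part in month_day.split("-"))
    if (month, day) == (2, 29):
        # Assumption: place Feb 29 halfway between Feb 28 (59) and Mar 1 (60).
        return 59.5
    return date(REFERENCE_YEAR, month, day).timetuple().tm_yday


print(day_of_year("7-4"), day_of_year("10-31"))
```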

3. Highlight statistically significant days

From confidence level to z-score.
A confidence interval does not definitively state the most likely outcome, but it provides a range of plausible values. The confidence level is the share of such intervals that would contain the true population value if we repeatedly sampled and constructed an interval each time.

| Confidence level | Z-score |
|---|---|
| 90% | 1.645 |
| 95% | 1.96 |
| 99% | 2.576 |
| 99.5% | 2.81 |
| 99.9% | 3.29 |
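The z-scores listed for each confidence level are two-sided critical values of the standard normal distribution. Up to rounding, they can be reproduced with Python's standard library, as in the short sketch below. The function name `critical_z` is illustrative.

```python
from statistics import NormalDist


def critical_z(confidence):
    """Two-sided critical z-value for a confidence level such as 0.95."""
    # Half of the remaining probability sits in each tail.
    return NormalDist().inv_cdf((1 + confidence) / 2)


for level in (0.90, 0.95, 0.99, 0.995, 0.999):
    print(f"{level:.1%}  {critical_z(level):.3f}")
```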

Define significance threshold
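The source does not state which cutoff was used. The sketch below assumes the two-sided 95% level (|z| > 1.96) and applies it to the ten z-scores listed in step 1. The names `Z_THRESHOLD` and `significant_days` are hypothetical.

```python
Z_THRESHOLD = 1.96  # assumed cutoff: two-sided 95% level

# z-scores copied from the table in step 1
z_scores = {
    "1-1": 4.398594, "5-23": 2.841433, "7-4": 2.630590, "3-16": 2.619854,
    "11-1": 2.570047, "4-3": 2.395102, "7-3": 2.378056, "2-29": 2.217328,
    "9-4": 2.090657, "10-31": 2.053682,
}


def significant_days(scores, threshold=Z_THRESHOLD):
    """Days whose |z| exceeds the threshold, strongest first."""
    flagged = [day for day, z in scores.items() if abs(z) > threshold]
    return sorted(flagged, key=lambda day: -abs(scores[day]))


print(significant_days(z_scores))         # 95% level
print(significant_days(z_scores, 2.576))  # stricter 99% level
```

With the 95% cutoff all ten listed days are flagged. With the 99% cutoff only 1-1, 5-23, 7-4, and 3-16 remain, so the choice of threshold noticeably changes which days are highlighted.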

Plot with significant days highlighted

4. Calendar heatmap visualization

Prepare data for calendar heatmap

Ranking

Day


| Rank | Date | Avg. Fatalities | % of Max | Season | Special Day | Day of Year |
|---|---|---|---|---|---|---|
| 1 | July 04 | 176.00 | 100.0 | Summer | Independence Day | 185 |
| 2 | July 03 | 174.00 | 98.9 | Summer | | 184 |
| 3 | October 31 | 168.64 | 95.8 | Fall | Halloween | 304 |
| 4 | August 06 | 168.52 | 95.8 | Summer | | 218 |
| 5 | October 11 | 167.52 | 95.2 | Fall | | 284 |
| 6 | November 01 | 167.40 | 95.1 | Fall | Month Start | 305 |
| 7 | August 08 | 166.72 | 94.7 | Summer | | 220 |
| 8 | October 09 | 165.72 | 94.2 | Fall | | 282 |
| 9 | August 02 | 165.56 | 94.1 | Summer | | 214 |
| 10 | October 16 | 164.76 | 93.6 | Fall | | 289 |
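The Rank and % of Max columns can be derived from the daily averages alone. A minimal sketch follows, using four averages taken from the ranking. The function name `rank_days` is illustrative and is not the original code.

```python
def rank_days(day_avgs, top_n=10):
    """Rank days by average fatalities and express each as % of the maximum."""
    ordered = sorted(day_avgs.items(), key=lambda item: item[1], reverse=True)
    top_value = ordered[0][1]
    return [
        (rank, day, avg, round(100 * avg / top_value, 1))
        for rank, (day, avg) in enumerate(ordered[:top_n], start=1)
    ]


# A few averages from the ranking, given in arbitrary order.
sample = {
    "October 31": 168.64,
    "July 03": 174.00,
    "November 01": 167.40,
    "July 04": 176.00,
}
rows = rank_days(sample)
for row in rows:
    print(row)
```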

Month

5. Identify specific notable dates and holidays

Define common US holidays and notable dates
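One way to attach labels like those in the Special Day column is a lookup keyed by month and day. The sketch below is an assumption about the approach. The holiday list is a small illustrative subset of fixed-date holidays, and the "Month Start" fallback mirrors the label shown for November 01. Holidays that move from year to year (for example Thanksgiving) would need year-specific dates and are left out.

```python
# Illustrative subset of fixed-date holidays; the original list is not shown.
FIXED_DATE_HOLIDAYS = {
    (1, 1): "New Year's Day",
    (7, 4): "Independence Day",
    (10, 31): "Halloween",
    (12, 25): "Christmas Day",
}


def special_day(month, day):
    """Return a label for notable dates, or an empty string otherwise."""
    if (month, day) in FIXED_DATE_HOLIDAYS:
        return FIXED_DATE_HOLIDAYS[(month, day)]
    if day == 1:
        return "Month Start"  # assumed rule, inferred from the November 01 row
    return ""


print(special_day(7, 4), "|", special_day(11, 1), "|", special_day(8, 6))
```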

6. Analyze day-to-day variations

Calculate day-to-day changes in fatality counts
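A minimal sketch of this calculation, assuming the input is a list of (label, average) pairs in calendar order. The function name and the percent-change column are illustrative additions.

```python
def day_to_day_changes(series):
    """Absolute and percent change from each day to the next.

    `series` is a list of (label, value) pairs in calendar order.
    """
    changes = []
    for (_prev_label, prev), (label, cur) in zip(series, series[1:]):
        diff = cur - prev
        changes.append((label, diff, 100 * diff / prev))
    return changes


# Averages taken from the ranking table for two adjacent dates.
july = [("July 03", 174.00), ("July 04", 176.00)]
changes = day_to_day_changes(july)
print(changes)
```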

7. Weekly patterns within months

8. Analyzing specific day patterns (e.g., first day of month, last day of month)

9. Seasonal patterns analysis
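The ranking table labels July and August as Summer and October and November as Fall, which is consistent with meteorological seasons. The sketch below assumes that definition. The source does not state the mapping explicitly.

```python
def season(month):
    """Meteorological season for a month number (1-12), Northern Hemisphere."""
    if month in (12, 1, 2):
        return "Winter"
    if month in (3, 4, 5):
        return "Spring"
    if month in (6, 7, 8):
        return "Summer"
    if month in (9, 10, 11):
        return "Fall"
    raise ValueError(f"month must be 1-12, got {month}")


print([season(m) for m in (1, 4, 7, 10)])
```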

10. Statistical tests for particular days

Let’s run t-tests comparing specific days to all other days!
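The exact test variant is not shown in the source. As a hedged illustration, the sketch below computes Welch's unequal-variance t statistic and its approximate degrees of freedom for one date's yearly counts against counts from other days. The p-value is omitted because it needs a t-distribution, which the standard library does not provide. The sample numbers are made up for the example.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    se2_a = variance(sample_a) / n_a  # squared standard error of mean A
    se2_b = variance(sample_b) / n_b  # squared standard error of mean B
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a**2 / (n_a - 1) + se2_b**2 / (n_b - 1)
    )
    return t_stat, df


# Made-up counts: one date across three years vs. five other days.
holiday_counts = [10, 12, 14]
other_counts = [1, 2, 3, 4, 5]
t_stat, df = welch_t(holiday_counts, other_counts)
print(round(t_stat, 2), round(df, 2))
```

A large positive t statistic means the chosen date has more fatalities than other days by a margin that is large relative to the year-to-year noise.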

  • Dates with tall red bars: These holidays have fatality patterns that are very unlikely to be due to random chance. The taller the bar, the stronger the evidence that something unusual happens on that date.
  • Dates with short gray bars: These holidays don’t show convincing evidence of different fatality patterns compared to regular days.
  • The ordering: The dates are ordered by p-value (least significant to most significant), which helps visualize which holidays show the strongest patterns.

11. Visualize difference from yearly average

Arrange all plots in a grid for comprehensive view

Holiday and special days analysis

Key findings to questions

Can you detect any correlations between fatal car crashes and particular days of the year?

  • As the ranking shows, fatal car crashes are most frequent on Independence Day (the 4th of July) and on the day before it.
  • Fatal crashes are also frequent on Halloween and on the day after it.
  • Judged by z-score, New Year’s Day is also a day on which fatal accidents are unusually likely.

What are the most dangerous days of the year for fatal car crashes in the United States?

  • The 4th of July, Independence Day, is the most dangerous day of the year in the United States.

What other factors might help analyze the data in more detail? You can use the cleaning script to download the full dataset.

So far, the 3rd and 4th of July and the 31st of October are the days with the highest numbers of fatal car accidents. Let’s look at the data from demographic, geographic, and temporal points of view.

Temporal and Demographical analysis extension

Check the accidents between day-time and night-time

Geographical analysis

Overall, the southwestern part of the U.S. has recorded the highest number of fatal car accidents since 1992.