An analysis of accident rates for Santiam Pass.

Summary

The Oregon Department of Transportation regularly publishes live road reports via Twitter as a public service. The record created by these tweets can be used to reconstruct the times and locations of accidents on sections of road and to correlate them with road conditions.

This analysis can be of benefit to drivers. The “average” citizen has few ways to learn about road safety in any quantitative way. It is especially useful to careful or experienced drivers, who might become entangled in accidents caused by others. Avoiding times when these accidents are most frequent can increase personal safety.

The analysis focuses on a specific location: US Highway 20 at Santiam Pass (milepost 79), a 4800 foot (about 1460 meter) mountain pass in the Cascade Range of Oregon. As the main route from the Central Oregon city of Bend to the Willamette Valley cities of Portland, Eugene, and Salem, Highway 20 has high traffic year round and is the site of frequent accidents.

This analysis also examines reports of snow-covered road surfaces.

This analysis establishes the accident frequency and density for an 11-mile stretch of road and looks at the correlation of snow and ice with traffic accidents. I show there is a high rate of accidents on the Pass (a little under one per week), with rates varying by weekday by as much as a factor of four. I also show that the probability of an accident in adverse conditions is significantly higher than when there are no adverse conditions.

This analysis does not correlate accident rate with traffic volume, since those data are unavailable on Twitter.

## [1] 2014-11-16 2014-11-16 2014-11-15 2014-11-15 2014-11-15
## 482 Levels: 2012-11-01 2012-11-02 2012-11-09 2012-11-10 ... 2014-11-16

The data analyzed cover the dates from 2012-11-01 to 2014-11-16. There are 1500 tweets during this period.

Filtering on the criteria “Santiam Pass Summit” and “crash”, and taking complete cases, reduces the number of crash data points to 99. During the same period there were 181 days with snow.
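The filtering step can be sketched in Python as follows. This is a minimal illustration, not the code used for the analysis: the tweet records, their field names, and the helper functions `is_pass_crash` and `is_complete` are all hypothetical, and the real feed may word its reports differently.

```python
# Hypothetical tweets for illustration only; not taken from the real feed.
tweets = [
    {"date": "2014-11-15", "text": "US20 Santiam Pass Summit: crash, expect delays"},
    {"date": "2014-11-15", "text": "US20 Santiam Pass Summit: snow on roadway"},
    {"date": "2014-11-16", "text": "US20 Sisters: crash cleared"},
    {"date": None, "text": "US20 Santiam Pass Summit: crash"},
]


def is_pass_crash(tweet):
    """True when the text mentions both the summit and a crash."""
    text = tweet["text"].lower()
    return "santiam pass summit" in text and "crash" in text


def is_complete(tweet):
    """True when no field is missing (a 'complete case')."""
    return all(value is not None for value in tweet.values())


# Keep only complete records that match both criteria.
crashes = [t for t in tweets if is_complete(t) and is_pass_crash(t)]
print(len(crashes), "crash record(s) kept")
```

Matching on lowercased text keeps the filter insensitive to capitalization; a record with a missing date is dropped even when its text matches.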

Summary statistics for the raw data

Over the period of study there were 0.83 accidents per week, meaning a little less than one accident per week (roughly one every eight or nine days) when averaged over the entire record. Since the accidents occurred over the relatively short distance of 11 miles, the accident density is 3.92 accidents per year per mile. This compares to a total of 267 crashes reported along the 123 miles of highway covered by the TripCheckUS20B data, or 1.06 accidents per year per mile. Subsetting the data to separate Santiam Pass from the rest of Highway 20, the accident density for the remaining 112 miles drops to 0.78 accidents per year per mile. This means the accident density on the 11-mile stretch of road at Santiam Pass is, on average, about 5.0 times higher than on the rest of Highway 20. Note that this does not preclude the existence of other hotspots.
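The arithmetic behind these figures can be checked with a short Python sketch. It uses only the values reported above. The 52-week year is an assumption, chosen because it reproduces the reported density, and the function name `density_per_year_mile` is illustrative.

```python
WEEKS_PER_YEAR = 52  # assumption: reproduces the reported density


def density_per_year_mile(accidents_per_week, miles):
    """Convert a weekly accident rate into accidents per year per mile."""
    return accidents_per_week * WEEKS_PER_YEAR / miles


# Santiam Pass stretch: reported weekly rate over 11 miles.
pass_density = density_per_year_mile(0.8290713, 11)

# Rest of Highway 20 (112 miles), density as reported in the text.
rest_density = 0.7829744

ratio = pass_density / rest_density   # how much denser the Pass is
days_between = 7 / 0.8290713          # average days between accidents

print(f"Pass density: {pass_density:.2f} accidents/year/mile")
print(f"Ratio to rest of highway: {ratio:.2f}")
print(f"Average days between accidents: {days_between:.1f}")
```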

During this period there were 181 days with snow reported on the road over a total of 745 days. The rest of this analysis will look at how this affects the accident rate.

Timeline of accidents

The graph below shows a timeline of accidents, with the location of the accident (measured in distance from the summit) represented by the y-axis. Red data points represent crash data, while blue data points represent days when snow was reported in the feed.

When do accidents occur?

We can also look at the correlation of accidents with the day of the week. This correlation might be expected because traffic patterns change between weekdays and weekends.

The histogram shows substantial variation depending on the day of the week. For instance, Friday has about four times as many accidents as Wednesday.

The mean number of accidents per weekday is 12.57 and the standard deviation is 6.02. The minimum number of accidents is 6 and the maximum is 22, a ratio of about 3.7:1.

Parenthetically, the number of snow days shows no significant correlation with the weekday. This is what you’d expect, but it’s a good check of the data. The mean number of snow days per weekday is 25.86. For reference, the z statistic for the minimum is -1.28 and for the maximum is 1.37.

Hence, the day of the week plays a substantial role in the number of accidents.
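A weekday tabulation of this kind can be sketched in Python as below. The crash dates are hypothetical placeholders, not records from the feed, so the printed summary does not reproduce the figures above; only the final ratio uses the reported minimum and maximum.

```python
from collections import Counter
from datetime import date
from statistics import mean, stdev

# Hypothetical crash dates for illustration; the real dates come from the feed.
crash_dates = [
    date(2014, 11, 14),  # Friday
    date(2014, 11, 14),  # Friday
    date(2014, 11, 12),  # Wednesday
    date(2014, 11, 16),  # Sunday
    date(2014, 11, 7),   # Friday
]

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# date.weekday() returns 0 for Monday through 6 for Sunday.
counts = Counter(DAY_NAMES[d.weekday()] for d in crash_dates)

# Include days with zero crashes so the summary covers all seven weekdays.
per_day = [counts.get(name, 0) for name in DAY_NAMES]

print(dict(zip(DAY_NAMES, per_day)))
print("mean:", mean(per_day), "sd:", stdev(per_day))

# Ratio of the reported maximum (22) to the reported minimum (6).
reported_ratio = 22 / 6
print("reported max/min ratio:", round(reported_ratio, 2))
```

Filling in zero-count days matters: leaving them out would inflate the mean and shrink the spread.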

Where do accidents occur?

This plot shows the distribution of accidents as measured by the distance from Santiam Pass Summit. In this specific case, the number of accidents on the west side outweighs the number on the east, and the distribution appears strongly skewed if one takes the mode as its center.

The median accident location is 2 miles west of the summit. The mean distance is 1.58 miles, with a standard deviation of 2.65 miles about the mean. The data show a skewed distribution.

Is snow a significant contributing factor to accident risk?

We can calculate rough estimates of the accident rate (0.133 crashes per day) and the probability of snow (0.243 per day) during the same period.


Per this analysis, in a period of 746 days there were:
+ 99 accidents
+ 181 days with snow
+ 53 days with snow and accidents
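As a rough plausibility check on the counts listed above, one can ask how many days would show both snow and an accident if the two were independent. The Python sketch below makes one assumption not stated in the source: the 99 accidents fall on 99 distinct days, so they can be treated as accident days.

```python
# Counts from the list above.
total_days = 746
accident_days = 99    # assumption: each accident falls on a distinct day
snow_days = 181
observed_both = 53

p_accident = accident_days / total_days
p_snow = snow_days / total_days

# Under independence, P(snow and accident) = P(snow) * P(accident).
expected_both = p_accident * p_snow * total_days

excess = observed_both / expected_both

print(f"Expected days with both, if independent: {expected_both:.1f}")
print(f"Observed days with both: {observed_both}")
print(f"Observed / expected: {excess:.2f}")
```

Under these assumptions the observed count is a bit more than twice the independence expectation, which points the same way as the formal test.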

Using a binomial model, the coefficient relating accidents to snow is 0.414, with a confidence interval of 0.298 to 0.567. Since the confidence interval does not include zero, we reject the null hypothesis and conclude that snow is a significant contributor to accidents.
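The same counts can be arranged as a 2×2 table and summarized with an odds ratio. The sketch below is an independent cross-check, not the model fitted above. It assumes the 99 accidents fall on distinct days, and it uses a Wald (normal approximation) interval rather than a profile-likelihood interval. The odds of an accident on a snow day come out at 53/128 ≈ 0.414, which matches the reported coefficient. That suggests, though the source does not confirm it, that the reported value may be on the odds scale, where the no-effect value is 1 rather than 0.

```python
import math

# Counts from the list above. Assumption: each of the 99 accidents falls on
# a distinct day, so accidents can be treated as "accident days".
total_days, accident_days, snow_days, both = 746, 99, 181, 53

a = both                         # snow, accident
b = snow_days - both             # snow, no accident
c = accident_days - both         # no snow, accident
d = total_days - snow_days - c   # no snow, no accident

snow_odds = a / b                # odds of an accident on a snow day
clear_odds = c / d               # odds of an accident on a snow-free day
odds_ratio = snow_odds / clear_odds

# Wald 95% interval on the log odds ratio (normal approximation).
se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
low = math.exp(math.log(odds_ratio) - 1.96 * se)
high = math.exp(math.log(odds_ratio) + 1.96 * se)

print(f"Odds of accident, snow day: {snow_odds:.3f}")
print(f"Odds of accident, no snow:  {clear_odds:.3f}")
print(f"Odds ratio: {odds_ratio:.2f} (95% CI {low:.2f} to {high:.2f})")
```

Under these assumptions the interval for the odds ratio lies entirely above 1, which is consistent with the conclusion that snow is associated with a higher accident risk.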