Analysis of winter road safety probabilities for Santiam Pass, Oregon

Summary

It’s hard for the average person to find historical information about road safety, such as accident rates by location, time of day, and weather conditions.

However, the Oregon Department of Transportation (ODOT) regularly publishes live road reports via Twitter as a public service. In effect, this creates a record of conditions and traffic incidents that can be mined.

This analysis repurposes the ODOT Twitter feed to reconstruct historical events on a specific section of road. As an example, we look at accident rates using the last 3000 tweets from the Tripcheck Highway 20B Twitter account. The focus area is US Highway 20 at Santiam Pass (milepost 79), a 4800-foot (about 1460-meter) mountain pass in the Cascade Range of Oregon.

As the main route from the Central Oregon city of Bend to the Willamette Valley cities of Portland, Eugene, and Salem, Highway 20 carries heavy traffic year-round and is the site of frequent accidents.

Getting the Tweets

The Twitter feed for this analysis is downloaded by a separate R program and stored locally in a .csv file.

Data Cleaning

The data analyzed cover the dates from 2012-01-19 to 2016-01-25. There are 3000 tweets during this period.

Data are cleaned by searching the tweet text for the strings “crash” and “snow” and then filtering for the location “Santiam Pass Summit”. Timestamps are converted to decimal hours (AM and PM) and are also assigned a “timeperiod.” Below is an example of a very simple “text filter” function.

trip_incident_filter <- function(data_df, incident = "crash") {
  ## FUNCTION: filter tweets for incidents
  ## accepts: data frame with a `text` column to search, and a search string
  ## returns: the rows whose text contains the search string (case-insensitive)

  ## filter incidents; note that `incident` is treated as a regular expression
  hwy_inc <- data_df[grep(tolower(incident), tolower(data_df$text)), ]

  return(hwy_inc)
}
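As a rough illustration of how the crash, snow, and location filters described above might be chained, the sketch below applies the same matching approach to a few made-up rows that imitate the tweet format. The toy data, the location string "santiam pass smt" (taken from the abbreviated form seen in the tweets), and the additions of `fixed = TRUE` and `drop = FALSE` are assumptions for this example, not part of the original function.

```r
# Toy rows imitating the tweet format; illustrative text, not real ODOT records
tweets <- data.frame(
  text = c(
    "US20, santiam pass smt, Crash, use caution",
    "US20, tombstone, Packed snow, Carry chains/Traction tires",
    "US20, santiam pass smt, Packed snow, Carry chains/Traction tires",
    "US20, 1 Mi N of Bend, Unconfirmed, An unconfirmed report of a crash"
  ),
  stringsAsFactors = FALSE
)

# Same idea as trip_incident_filter, with two assumed tweaks:
#   fixed = TRUE  matches the search string literally rather than as a regex
#   drop = FALSE  keeps the result a data frame even with a single column
trip_incident_filter <- function(data_df, incident = "crash") {
  hits <- grep(tolower(incident), tolower(data_df$text), fixed = TRUE)
  data_df[hits, , drop = FALSE]
}

# Chain the filters: incident type first, then location
crashes      <- trip_incident_filter(tweets, "crash")
pass_crashes <- trip_incident_filter(crashes, "santiam pass smt")
pass_snow    <- trip_incident_filter(
  trip_incident_filter(tweets, "snow"), "santiam pass smt"
)
```

Because each call returns a data frame with the same columns, the filters can be applied in any order and give the same rows.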

A few rows of the cleaned data, including the engineered dayperiod column, are shown below.

row | created | text | dayperiod
1 | 2016-01-25 03:05:47 | US20, 1 Mi N of Bend, Unconfirmed, An unconfirmed report of a crash has been received, use caution…. https://t.co/JQVp0bZ2Ik | night
2 | 2016-01-24 18:01:52 | US20 , tombstone, Slush or snow pack, Carry chains/Traction tires… https://t.co/6K2PHCZKLG | midday
3 | 2016-01-24 18:01:52 | US20 , santiam pass smt, Slush or snow pack, Carry chains/Traction tires… https://t.co/vuOylEMp31 | midday
4 | 2016-01-24 12:05:50 | US20 , santiam pass smt, Packed snow, Carry chains/Traction tires… https://t.co/vuOylEMp31 | morning
5 | 2016-01-24 04:43:50 | US20 , tombstone, Packed snow, Carry chains/Traction tires… https://t.co/6K2PHCZKLG | night
6 | 2016-01-24 04:29:50 | US20 , santiam pass smt, Slush or snow pack, Carry chains/Traction tires… https://t.co/vuOylEMp31 | night
7 | 2016-01-23 18:07:49 | US20 , santiam pass smt, Packed snow, Carry chains/Traction tires… https://t.co/vuOylEMp31 | midday
8 | 2016-01-23 12:19:50 | US20 , santiam pass smt, Slush or snow pack, Carry chains/Traction tires… https://t.co/vuOylEMp31 | morning
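The conversion from timestamps to decimal hours and a period-of-day label could be sketched roughly as follows. The timestamps, the cut points, and the label set here are hypothetical. The boundaries and any time zone handling used to produce the dayperiod values in the table above are not stated, so this sketch is not expected to reproduce them.

```r
# Hypothetical timestamps in the same format as the "created" column
created <- as.POSIXct(
  c("2016-01-24 02:30:00", "2016-01-24 08:15:00",
    "2016-01-24 13:45:30", "2016-01-24 21:00:00"),
  format = "%Y-%m-%d %H:%M:%S", tz = "UTC"
)

# Decimal hour of day = hours + minutes/60 + seconds/3600
lt <- as.POSIXlt(created, tz = "UTC")
dec_hour <- lt$hour + lt$min / 60 + lt$sec / 3600

# Assumed period boundaries (left-closed intervals): [0,6), [6,12), [12,18), [18,24)
dayperiod <- cut(
  dec_hour,
  breaks = c(0, 6, 12, 18, 24),
  labels = c("night", "morning", "midday", "evening"),
  right = FALSE
)

data.frame(created, dec_hour, dayperiod)
```

Using `right = FALSE` makes each interval include its lower bound, so a tweet at exactly 06:00:00 falls in the second period rather than the first.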

Data Analysis

Highway 20

First let’s look at the accident data for the entire Highway 20B Twitter feed.
A histogram of accidents versus hour of the day shows what might be expected: low levels overnight, a sharp increase at the start of the morning commute, and then a steady rise through the workday to a peak at 6:00 PM, followed by a sharp decline into late evening.
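A histogram of this kind can be sketched in base R by binning decimal hours into one bin per clock hour. The crash times below are made-up values chosen only to show the mechanics; they are not taken from the ODOT feed.

```r
# Hypothetical crash times in decimal hours (illustrative, not ODOT data)
crash_hours <- c(1.5, 7.2, 8.9, 12.4, 15.0, 17.3, 17.8, 18.1, 18.6, 18.9, 22.0)

# One left-closed bin per clock hour: [0,1), [1,2), ..., [23,24]
h <- hist(
  crash_hours,
  breaks = 0:24,
  right  = FALSE,
  main   = "Crash reports by hour of day",
  xlab   = "Hour of day",
  ylab   = "Number of crash tweets"
)

# Starting hour of the busiest bin
peak_hour <- h$breaks[which.max(h$counts)]
```

With these illustrative values the busiest bin starts at hour 18, which is how a 6:00 PM peak would show up in the real data.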

The accident rates above show little day-to-day variation. Another visualization makes the variations from hour to hour and day to day clearer.
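One possible way to view hour-to-hour and day-to-day variation together is a cross-tabulation of crash counts by day of week and hour, drawn as a heat map. This is only a sketch: the timestamps are hypothetical, and reading "day to day" as day of week is an assumption about the intended visualization.

```r
# Hypothetical crash timestamps (illustrative, not ODOT data)
created <- as.POSIXct(
  c("2016-01-23 17:10:00", "2016-01-23 17:40:00",
    "2016-01-24 08:05:00", "2016-01-25 18:20:00"),
  format = "%Y-%m-%d %H:%M:%S", tz = "UTC"
)
lt <- as.POSIXlt(created, tz = "UTC")

# Fixed factor levels so empty hours and days still appear in the table
hour <- factor(lt$hour, levels = 0:23)
wday <- factor(lt$wday, levels = 0:6,
               labels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))

# Rows are days of the week, columns are hours of the day
counts <- table(wday, hour)

# Heat map: image() expects z with one row per x value, hence the transpose
z <- t(unclass(counts))
image(0:23, 1:7, z, xlab = "Hour of day", ylab = "", yaxt = "n")
axis(2, at = 1:7, labels = rownames(counts), las = 1)
```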