Project Description

This project uses data from an unpublished master’s paper by Carl Hoffstedt. The data relate the automobile accident rate, in accidents per million vehicle miles, to several potential explanatory terms. They cover 39 sections of large highways in the state of Minnesota in 1973. I am attempting to analyze the data to determine how these variables relate to the accident rate on each highway section.

After the following analysis, my conclusion is that the accident rate is most directly related to the speed limit: as the speed limit increased from 40 mph, there was a direct and continuous reduction in the accident rate.

MN Highway Dataset Analysis

Dataset Imported into HWYdf dataframe

DataFrame Descriptive Statistics

library(readr)
HWYdf <- read_csv("HWYdataset.csv")
## Warning: Missing column names filled in: 'X1' [1]
## Parsed with column specification:
## cols(
##   X1 = col_integer(),
##   rate = col_double(),
##   len = col_double(),
##   adt = col_integer(),
##   trks = col_integer(),
##   sigs1 = col_double(),
##   slim = col_integer(),
##   shld = col_integer(),
##   lane = col_integer(),
##   acpt = col_double(),
##   itg = col_double(),
##   lwid = col_integer(),
##   htype = col_character()
## )
head(HWYdf)
## # A tibble: 6 x 13
##      X1  rate   len   adt  trks  sigs1  slim  shld  lane  acpt   itg  lwid
##   <int> <dbl> <dbl> <int> <int>  <dbl> <int> <int> <int> <dbl> <dbl> <int>
## 1     1  4.58  4.99    69     8 0.200     55    10     8   4.6  1.2     12
## 2     2  2.86 16.1     73     8 0.0621    60    10     4   4.4  1.43    12
## 3     3  3.02  9.75    49    10 0.103     60    10     4   4.7  1.54    12
## 4     4  2.29 10.6     61    13 0.0939    65    10     6   3.8  0.94    12
## 5     5  1.61 20.0     28    12 0.0500    70    10     4   2.2  0.65    12
## 6     6  6.87  5.97    30     6 2.01      55    10     4  24.8  0.34    12
## # ... with 1 more variable: htype <chr>
summary(HWYdf)
##        X1            rate            len              adt       
##  Min.   : 1.0   Min.   :1.610   Min.   : 2.960   Min.   : 1.00  
##  1st Qu.:10.5   1st Qu.:2.630   1st Qu.: 7.995   1st Qu.: 5.00  
##  Median :20.0   Median :3.050   Median :11.390   Median :13.00  
##  Mean   :20.0   Mean   :3.933   Mean   :12.884   Mean   :19.62  
##  3rd Qu.:29.5   3rd Qu.:4.595   3rd Qu.:17.800   3rd Qu.:24.00  
##  Max.   :39.0   Max.   :9.230   Max.   :40.090   Max.   :73.00  
##       trks            sigs1              slim         shld       
##  Min.   : 6.000   Min.   :0.04545   Min.   :40   Min.   : 1.000  
##  1st Qu.: 8.000   1st Qu.:0.08738   1st Qu.:50   1st Qu.: 4.000  
##  Median : 9.000   Median :0.17666   Median :55   Median : 8.000  
##  Mean   : 9.333   Mean   :0.51072   Mean   :55   Mean   : 6.872  
##  3rd Qu.:11.000   3rd Qu.:0.71515   3rd Qu.:60   3rd Qu.:10.000  
##  Max.   :15.000   Max.   :2.78933   Max.   :70   Max.   :10.000  
##       lane            acpt            itg              lwid      
##  Min.   :2.000   Min.   : 2.20   Min.   :0.0000   Min.   :10.00  
##  1st Qu.:2.000   1st Qu.: 6.95   1st Qu.:0.0000   1st Qu.:12.00  
##  Median :2.000   Median :10.30   Median :0.1300   Median :12.00  
##  Mean   :3.128   Mean   :12.16   Mean   :0.2964   Mean   :11.95  
##  3rd Qu.:4.000   3rd Qu.:14.60   3rd Qu.:0.3600   3rd Qu.:12.00  
##  Max.   :8.000   Max.   :53.00   Max.   :1.5400   Max.   :13.00  
##     htype          
##  Length:39         
##  Class :character  
##  Mode  :character  
##                    
##                    
## 
dim(HWYdf)
## [1] 39 13
names(HWYdf)
##  [1] "X1"    "rate"  "len"   "adt"   "trks"  "sigs1" "slim"  "shld" 
##  [9] "lane"  "acpt"  "itg"   "lwid"  "htype"

DataFrame Descriptives using Plots & Histograms

## Loading required package: ggplot2

display a histogram representing the distribution of the number of highway lanes

ggplot(data=HWYdf) + geom_histogram(aes(x=lane),binwidth = .6)

the above histogram shows that 37 of the 39 highways have 2 or 4 lanes, with a nearly equal split between 2-lane and 4-lane highways

display a bar chart representing the distribution of the highway types/funding sources

ggplot(data=HWYdf) + geom_bar(mapping = aes(x=htype),width = .6)

the above bar chart shows that 32 of the 39 highways belong to just 2 types: MA & PA

display a bar chart representing the distribution of the speed limits

ggplot(data=HWYdf) + geom_bar(mapping = aes(x=slim),width = .6)

the above bar chart shows that 33 of the 39 highways have one of 3 speed limits: 50, 55 & 60
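The counts read off the charts above can also be checked numerically with `table()`. The following is a minimal sketch, assuming only the six rows printed by `head(HWYdf)`; `hwy_head` and `count_in()` are illustrative names that are not part of the original analysis, so the numbers it produces describe those six rows, not all 39 sections.

```r
# Lane and speed-limit columns for the six rows shown by head(HWYdf).
hwy_head <- data.frame(
  lane = c(8, 4, 4, 6, 4, 4),
  slim = c(55, 60, 60, 65, 70, 55)
)

# Hypothetical helper: how many entries of x take one of the given values.
count_in <- function(x, values) sum(x %in% values)

# Frequency table, i.e. the numbers a bar chart draws as bar heights.
lane_counts <- table(hwy_head$lane)
print(lane_counts)

# Sections with 2 or 4 lanes, and sections posted at 50, 55 or 60 mph.
n_2_or_4 <- count_in(hwy_head$lane, c(2, 4))
n_50_to_60 <- count_in(hwy_head$slim, c(50, 55, 60))
```

Run against the full data frame, `count_in(HWYdf$lane, c(2, 4))` and `count_in(HWYdf$slim, c(50, 55, 60))` would give the counts that the text reads off the plots.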

display a histogram representing the distribution of the accident rates per million vehicle miles

ggplot(HWYdf,aes(rate)) + geom_histogram(binwidth=2)

the above histogram shows that the majority of the Minnesota highway sections had between 1 and 5 accidents per million vehicle miles
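The share of sections falling in a given rate range can be computed directly instead of being read off the bars. This sketch uses only the six rates printed by `head(HWYdf)`, and the bin edges at 1, 3, 5, ... are an assumption chosen to line up with the range discussed above; `rate_head`, `rate_bins` and `share_1_to_5` are illustrative names.

```r
# Accident rates (per million vehicle miles) for the six rows shown by head(HWYdf).
rate_head <- c(4.58, 2.86, 3.02, 2.29, 1.61, 6.87)

# Bins of width 2 with edges at 1, 3, 5, ... (assumed edges, left-closed).
breaks <- seq(1, 11, by = 2)
rate_bins <- cut(rate_head, breaks = breaks, right = FALSE)

# Number of sections per bin, the same quantity a histogram draws.
bin_counts <- table(rate_bins)
print(bin_counts)

# Share of sections with a rate of at least 1 and below 5.
share_1_to_5 <- mean(rate_head >= 1 & rate_head < 5)
```

Applying the same expression to the full column, `mean(HWYdf$rate >= 1 & HWYdf$rate < 5)`, would put a number on "the majority".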

calculate aggregates to look at accident rates grouped by highway type

aggregate(rate ~ htype,HWYdf,mean)
##   htype     rate
## 1   FAI 2.872000
## 2    MA 4.870000
## 3    MC 3.585000
## 4    PA 3.608421

the above shows that the MA highways have the highest mean accident rate

calculate aggregates to look at accident rates grouped by highway speed limits

aggregate(rate ~ slim,HWYdf,mean)
##   slim     rate
## 1   40 9.230000
## 2   45 7.283333
## 3   50 4.055714
## 4   55 3.985333
## 5   60 2.750000
## 6   65 2.290000
## 7   70 1.610000

we see that the highways with the lowest speed limits have the highest accident rates

we see that mean accident rates decline steadily as the speed limit increases
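The steady decline can be checked on the group means themselves. The sketch below copies the seven means from the `aggregate()` output and tests whether they fall at every step; the names `slim_means`, `strictly_decreasing`, `rho` and `slope` are illustrative, and the fitted line is an unweighted fit to the means, not a model of the 39 individual sections.

```r
# Mean accident rate per speed limit, copied from the aggregate() output.
slim_means <- data.frame(
  slim = c(40, 45, 50, 55, 60, 65, 70),
  rate = c(9.230000, 7.283333, 4.055714, 3.985333, 2.750000, 2.290000, 1.610000)
)

# TRUE if every step up in speed limit comes with a lower mean rate.
strictly_decreasing <- all(diff(slim_means$rate) < 0)

# Spearman rank correlation of the seven means; -1 means perfectly
# monotone decreasing.
rho <- cor(slim_means$slim, slim_means$rate, method = "spearman")

# Unweighted least-squares line through the group means (ignores how many
# sections contribute to each mean).
fit <- lm(rate ~ slim, data = slim_means)
slope <- unname(coef(fit)["slim"])
```

Because only 6 of the 39 sections have speed limits outside 50 to 60 mph, the means at the extremes rest on very few sections. A fit to the individual sections, for example `lm(rate ~ slim, data = HWYdf)`, would weight the groups by their size.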

calculate aggregates to look at average speed limits grouped by highway type

aggregate(slim ~ htype,HWYdf,mean)
##   htype     slim
## 1   FAI 62.00000
## 2    MA 51.53846
## 3    MC 57.50000
## 4    PA 55.26316

we see that the highway type MA has the lowest average speed limit but the highest average accident rate
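Placing the two sets of group means side by side makes this comparison explicit. The sketch below copies the values from the two `aggregate()` outputs; `by_type`, `lowest_limit_type` and `highest_rate_type` are illustrative names that do not appear in the original analysis.

```r
# Group means by highway type, copied from the two aggregate() outputs.
by_type <- data.frame(
  htype = c("FAI", "MA", "MC", "PA"),
  rate  = c(2.872000, 4.870000, 3.585000, 3.608421),
  slim  = c(62.00000, 51.53846, 57.50000, 55.26316)
)

# Sort by mean speed limit so both columns can be compared row by row.
by_type <- by_type[order(by_type$slim), ]
print(by_type)

# Type with the lowest mean speed limit and type with the highest mean rate.
lowest_limit_type <- as.character(by_type$htype[which.min(by_type$slim)])
highest_rate_type <- as.character(by_type$htype[which.max(by_type$rate)])

# TRUE if the mean rate falls at every step up in mean speed limit.
rate_falls_with_limit <- all(diff(by_type$rate) < 0)
```

On the full data, `aggregate(cbind(rate, slim) ~ htype, HWYdf, mean)` returns both columns in a single call. Since highway type and speed limit vary together, these means alone cannot separate the effect of one from the other.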

Including Plots


display a scatterplot of accident rate against the number of highway lanes

ggplot(HWYdf,aes(x=lane,y=rate)) + geom_point()

this scatterplot shows that the 8-lane highway had a high accident rate of nearly 5 accidents per million vehicle miles. Some 2- and 4-lane highways had very high accident rates as well. The 6-lane highway had a low accident rate of less than 2.5 accidents per million vehicle miles.
