This project uses data from an unpublished master’s paper by Carl Hoffstedt. The data relate the automobile accident rate, in accidents per million vehicle miles, to several potential explanatory variables. They cover 39 sections of large highways in the state of Minnesota in 1973. I analyze the data to draw conclusions about how these variables relate to the accident rate on each highway section.
After the following analysis, my conclusion is that the accident rate is most directly related to the speed limit: as the speed limit increased from 40 mph to 70 mph, the mean accident rate fell at every step.
library(readr)
HWYdf <- read_csv("HWYdataset.csv")
## Warning: Missing column names filled in: 'X1' [1]
## Parsed with column specification:
## cols(
## X1 = col_integer(),
## rate = col_double(),
## len = col_double(),
## adt = col_integer(),
## trks = col_integer(),
## sigs1 = col_double(),
## slim = col_integer(),
## shld = col_integer(),
## lane = col_integer(),
## acpt = col_double(),
## itg = col_double(),
## lwid = col_integer(),
## htype = col_character()
## )
head(HWYdf)
## # A tibble: 6 x 13
## X1 rate len adt trks sigs1 slim shld lane acpt itg lwid
## <int> <dbl> <dbl> <int> <int> <dbl> <int> <int> <int> <dbl> <dbl> <int>
## 1 1 4.58 4.99 69 8 0.200 55 10 8 4.6 1.2 12
## 2 2 2.86 16.1 73 8 0.0621 60 10 4 4.4 1.43 12
## 3 3 3.02 9.75 49 10 0.103 60 10 4 4.7 1.54 12
## 4 4 2.29 10.6 61 13 0.0939 65 10 6 3.8 0.94 12
## 5 5 1.61 20.0 28 12 0.0500 70 10 4 2.2 0.65 12
## 6 6 6.87 5.97 30 6 2.01 55 10 4 24.8 0.34 12
## # ... with 1 more variable: htype <chr>
summary(HWYdf)
## X1 rate len adt
## Min. : 1.0 Min. :1.610 Min. : 2.960 Min. : 1.00
## 1st Qu.:10.5 1st Qu.:2.630 1st Qu.: 7.995 1st Qu.: 5.00
## Median :20.0 Median :3.050 Median :11.390 Median :13.00
## Mean :20.0 Mean :3.933 Mean :12.884 Mean :19.62
## 3rd Qu.:29.5 3rd Qu.:4.595 3rd Qu.:17.800 3rd Qu.:24.00
## Max. :39.0 Max. :9.230 Max. :40.090 Max. :73.00
## trks sigs1 slim shld
## Min. : 6.000 Min. :0.04545 Min. :40 Min. : 1.000
## 1st Qu.: 8.000 1st Qu.:0.08738 1st Qu.:50 1st Qu.: 4.000
## Median : 9.000 Median :0.17666 Median :55 Median : 8.000
## Mean : 9.333 Mean :0.51072 Mean :55 Mean : 6.872
## 3rd Qu.:11.000 3rd Qu.:0.71515 3rd Qu.:60 3rd Qu.:10.000
## Max. :15.000 Max. :2.78933 Max. :70 Max. :10.000
## lane acpt itg lwid
## Min. :2.000 Min. : 2.20 Min. :0.0000 Min. :10.00
## 1st Qu.:2.000 1st Qu.: 6.95 1st Qu.:0.0000 1st Qu.:12.00
## Median :2.000 Median :10.30 Median :0.1300 Median :12.00
## Mean :3.128 Mean :12.16 Mean :0.2964 Mean :11.95
## 3rd Qu.:4.000 3rd Qu.:14.60 3rd Qu.:0.3600 3rd Qu.:12.00
## Max. :8.000 Max. :53.00 Max. :1.5400 Max. :13.00
## htype
## Length:39
## Class :character
## Mode :character
##
##
##
dim(HWYdf)
## [1] 39 13
names(HWYdf)
## [1] "X1" "rate" "len" "adt" "trks" "sigs1" "slim" "shld"
## [9] "lane" "acpt" "itg" "lwid" "htype"
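The import warning above shows that the first column of the CSV has no header, so readr named it X1; its summary (values 1 to 39) suggests it is only a row counter. Below is a minimal sketch of dropping such a column before analysis. It uses the six rows printed by head() so that it runs without the CSV file. The names `hwy_head` and `hwy_clean` are hypothetical, and treating X1 as a plain index is an assumption.

```r
# First six rows as printed by head(HWYdf) above (a subset of the columns).
# hwy_head is a hypothetical name used only for this illustration.
hwy_head <- data.frame(
  X1   = 1:6,
  rate = c(4.58, 2.86, 3.02, 2.29, 1.61, 6.87),
  slim = c(55, 60, 60, 65, 70, 55),
  lane = c(8, 4, 4, 6, 4, 4)
)

# Assumption: X1 is just a row counter, so it carries no information.
is_row_index <- identical(hwy_head$X1, seq_len(nrow(hwy_head)))

# Drop the index column so it does not appear in summaries or models.
hwy_clean <- if (is_row_index) hwy_head[, names(hwy_head) != "X1"] else hwy_head
names(hwy_clean)
```

On the full data set the same idea would be `HWYdf$X1 <- NULL`, after confirming that the column really is a row index.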
library(ggplot2)
ggplot(data=HWYdf) + geom_histogram(aes(x=lane),binwidth = .6)
### The histogram above shows that 37 of the 39 highways have 2 or 4 lanes,
### split nearly equally between the two.
ggplot(data=HWYdf) + geom_bar(mapping = aes(x=htype),width = .6)
### The bar chart above shows that 32 of the 39 highways are of two types:
### PA (19 sections) and MA (13 sections).
ggplot(data=HWYdf) + geom_bar(mapping = aes(x=slim),width = .6)
### The bar chart above shows that 33 of the 39 highways have one of three
### speed limits: 50, 55, or 60 mph.
ggplot(HWYdf,aes(rate)) + geom_histogram(binwidth=2)
### The histogram above shows that most of the highway sections had between
### 1 and 5 accidents per million vehicle miles.
aggregate(rate ~ htype,HWYdf,mean)
## htype rate
## 1 FAI 2.872000
## 2 MA 4.870000
## 3 MC 3.585000
## 4 PA 3.608421
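As a consistency check on the table above, the overall mean rate should equal the mean of the group means weighted by the number of sections of each type. The sketch below assumes section counts of 5, 13, 2, and 19; these counts are not printed anywhere in this report and are inferred from the group means, so they should be treated as an assumption. The names `type_means`, `type_counts`, and `overall` are hypothetical.

```r
# Mean accident rate by highway type, copied from the aggregate() output above.
type_means <- c(FAI = 2.872, MA = 4.870, MC = 3.585, PA = 3.608421)

# ASSUMPTION: number of sections per type. These counts are not printed in
# the report; they are inferred so that the group means fit the 39 sections.
type_counts <- c(FAI = 5, MA = 13, MC = 2, PA = 19)

# The overall mean is the count-weighted mean of the group means.
overall <- weighted.mean(type_means, w = type_counts)
round(overall, 3)
```

The result should agree with the mean of `rate` reported by summary(HWYdf). On the full data set, `table(HWYdf$htype)` gives the actual counts directly.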
aggregate(rate ~ slim,HWYdf,mean)
## slim rate
## 1 40 9.230000
## 2 45 7.283333
## 3 50 4.055714
## 4 55 3.985333
## 5 60 2.750000
## 6 65 2.290000
## 7 70 1.610000
aggregate(slim ~ htype,HWYdf,mean)
## htype slim
## 1 FAI 62.00000
## 2 MA 51.53846
## 3 MC 57.50000
## 4 PA 55.26316
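The claim that the accident rate falls steadily as the speed limit rises can be checked directly from the group means above. The sketch below rebuilds that table by hand, so it runs without the CSV file. The names `slim_means`, `strictly_decreasing`, and `rho` are hypothetical, and the optional linear fit at the end is an added illustration, not part of the original analysis.

```r
# Mean accident rate at each speed limit, copied from the
# aggregate(rate ~ slim, ...) output above.
slim_means <- data.frame(
  slim = c(40, 45, 50, 55, 60, 65, 70),
  rate = c(9.230000, 7.283333, 4.055714, 3.985333, 2.750000, 2.290000, 1.610000)
)

# TRUE if every step up in speed limit lowers the mean accident rate.
strictly_decreasing <- all(diff(slim_means$rate) < 0)

# Rank correlation of the group means; -1 means a perfectly monotone decrease.
rho <- cor(slim_means$slim, slim_means$rate, method = "spearman")

# Only if the full data set has been loaded: a simple linear fit gives the
# average change in accident rate per 1 mph of speed limit.
if (exists("HWYdf")) {
  fit <- lm(rate ~ slim, data = HWYdf)
  print(coef(fit))
}
```

This describes an association only. The table of mean speed limit by highway type shows that FAI highways have both the highest mean speed limit and the lowest mean accident rate, so highway type may account for part of the pattern.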
ggplot(HWYdf,aes(x=lane,y=rate)) + geom_point()
### This scatterplot shows that the 8-lane highway had a fairly high accident rate
### of nearly 5 accidents per million vehicle miles.
### Some 2- and 4-lane highways also had very high accident rates.
### The 6-lane highway had a low accident rate of less than 2.5.