According to the 2014 record of Traffic Fatalities by State in USA, how is the alcohol limit affects the fatality.
Does the fatality increases as the number of driver increase or will it decrease?
Has the DUI Law of BAC legal limit has decreased the number of drivers on the street of USA?
Also predicting the model, on how number of Fatality is affected with the legal limit?
https://crashstats.nhtsa.dot.gov/Api/Public/ViewPublication/812231
Used the tools in R such as str() and summary() to describe the original dataset you imported.
Looking at the str(TFAC) the dataframe has 51 observations for States (including Washington District of Columbia) with datatype Character and 12 variables/Columns with datatype integer for different Blood Alcohol Content level.
Summary describes the fatalities for each category of BAC level with Min value, Max Value, Median and 1st & 3rd Quartile.
library(readr)
TFAC <- read_csv("C:/Users/senet/Desktop/COLLEGE/St.Martins Classes/CSC 475 Design Business Intelligence Software/Project/TFAC.csv")
## Warning: Duplicated column names deduplicated: 'PCT' => 'PCT_1' [6], 'PCT'
## => 'PCT_2' [8], 'PCT' => 'PCT_3' [10], 'PCT' => 'PCT_4' [12]
## Parsed with column specification:
## cols(
## State = col_character(),
## TotalFatalities = col_integer(),
## BAC.00 = col_integer(),
## PCT = col_integer(),
## BAC0.01to0.07 = col_integer(),
## PCT_1 = col_integer(),
## BAC.08plus = col_integer(),
## PCT_2 = col_integer(),
## BAC.15plus = col_integer(),
## PCT_3 = col_integer(),
## BAC.01plus = col_integer(),
## PCT_4 = col_integer()
## )
View(TFAC)
str(TFAC)
## Classes 'tbl_df', 'tbl' and 'data.frame': 51 obs. of 12 variables:
## $ State : chr "Alabama" "Alaska" "Arizona" "Arkansas" ...
## $ TotalFatalities: int 820 73 770 466 3074 488 248 121 23 2494 ...
## $ BAC.00 : int 508 45 505 302 2017 300 135 70 17 1686 ...
## $ PCT : int 62 61 66 65 66 62 54 58 74 68 ...
## $ BAC0.01to0.07 : int 48 6 48 27 171 26 17 3 1 122 ...
## $ PCT_1 : int 6 9 6 6 6 5 7 2 5 5 ...
## $ BAC.08plus : int 264 22 199 135 882 160 97 49 5 685 ...
## $ PCT_2 : int 32 30 26 29 29 33 39 40 21 27 ...
## $ BAC.15plus : int 180 17 14 94 602 112 63 26 4 470 ...
## $ PCT_3 : int 22 23 18 20 20 23 26 22 16 19 ...
## $ BAC.01plus : int 312 29 246 162 1053 187 113 51 6 807 ...
## $ PCT_4 : int 38 39 32 35 34 38 46 42 26 32 ...
## - attr(*, "spec")=List of 2
## ..$ cols :List of 12
## .. ..$ State : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ TotalFatalities: list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ BAC.00 : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ PCT : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ BAC0.01to0.07 : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ PCT_1 : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ BAC.08plus : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ PCT_2 : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ BAC.15plus : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ PCT_3 : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ BAC.01plus : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ PCT_4 : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## ..$ default: list()
## .. ..- attr(*, "class")= chr "collector_guess" "collector"
## ..- attr(*, "class")= chr "col_spec"
summary(TFAC)
## State TotalFatalities BAC.00 PCT
## Length:51 Min. : 23.0 Min. : 17.0 Min. :51.00
## Class :character 1st Qu.: 208.5 1st Qu.: 130.0 1st Qu.:61.50
## Mode :character Median : 462.0 Median : 292.0 Median :64.00
## Mean : 640.7 Mean : 409.0 Mean :64.02
## 3rd Qu.: 795.0 3rd Qu.: 506.5 3rd Qu.:67.00
## Max. :3538.0 Max. :2017.0 Max. :76.00
## BAC0.01to0.07 PCT_1 BAC.08plus PCT_2
## Min. : 1.00 Min. : 1.000 Min. : 5.0 Min. :20.00
## 1st Qu.: 11.50 1st Qu.: 4.000 1st Qu.: 58.0 1st Qu.:27.50
## Median : 26.00 Median : 5.000 Median : 134.0 Median :29.00
## Mean : 34.69 Mean : 5.451 Mean : 195.6 Mean :30.31
## 3rd Qu.: 46.50 3rd Qu.: 6.000 3rd Qu.: 234.0 3rd Qu.:33.00
## Max. :224.00 Max. :12.000 Max. :1446.0 Max. :41.00
## BAC.15plus PCT_3 BAC.01plus PCT_4
## Min. : 4.0 Min. :14.00 Min. : 6.0 Min. :24.00
## 1st Qu.: 38.5 1st Qu.:19.00 1st Qu.: 71.5 1st Qu.:32.50
## Median : 89.0 Median :21.00 Median : 162.0 Median :35.00
## Mean :131.9 Mean :21.12 Mean : 230.0 Mean :35.69
## 3rd Qu.:161.5 3rd Qu.:23.00 3rd Qu.: 283.0 3rd Qu.:38.00
## Max. :974.0 Max. :30.00 Max. :1671.0 Max. :49.00
Describing the steps taken to get from the original dataset to the final dataset used for your analysis..
The dataset was imported from the above link from PDF and saved in Microsoft Excel in .CSV file. It is easier in .CSV format for the data manupulation. The dataset did not had any missing values/NA neither needed to be converted into different data type variable. After saving the dataframe which is TFAC in the default drive in .CSV file, the data was imported using following command in R as shown above.
Looking at the question, there are many possible factors that can cause the road fatalities such as Distracted driving, safety precautionary measure,etc.As a logical answer for the question, anyone would be able to answer it but looking my dataset with fatalities as a function of BAC, it might give different result. Therefore I would like to compare three variables BAC at 0.00, BAC at 0.08(legal limit) and BAC at 0.15plus (as max) again the fatalities. Showing a graphical representation below:
x<-cbind(TFAC$TotalFatalities)
#x1<-cbind(TFAC$TotalFatalities,TFAC$TotalFatalities,TFAC$TotalFatalities)
y<-cbind(TFAC$BAC.00,TFAC$BAC.08plus,TFAC$BAC.15plus)
matplot(x,y, type="p")
cor(x,y)
## [,1] [,2] [,3]
## [1,] 0.9902969 0.9739005 0.9667227
#matplot(x1,y, type="p")
#cor(x1,y)
Looking at the graph, the model has a positive slope which as predicted was a positive. That means the number of fatalities increases as there are more drivers on the street which in this case also proved that the fatalities increased more or less regardless of the DUI cases.
With the Given data set, I will try to create a Linear model of three graphs making Fatalities as a function of Blood Alcohol Content. 1)Fatalities vs BAC.00(Basically no alcohol consumsion) 2)Fatalities vs BAC.08(Legal Limit) 3)Fatalities vs BAC.15(Over the legal limit)
Also verify the Correlation Coefficient
plot(TFAC$BAC.00~TFAC$TotalFatalities)
Test1<-lm(TFAC$BAC.00~TFAC$TotalFatalities)
abline(Test1)
cor(TFAC$TotalFatalities,TFAC$BAC.00)
## [1] 0.9902969
plot(TFAC$BAC.08plus~TFAC$TotalFatalities)
Test2<-lm(TFAC$BAC.08plus~TFAC$TotalFatalities)
abline(Test2)
cor(TFAC$TotalFatalities,TFAC$BAC.08plus)
## [1] 0.9739005
plot(TFAC$BAC.15plus~TFAC$TotalFatalities)
Test3<-lm(TFAC$BAC.15plus~TFAC$TotalFatalities)
abline(Test3)
cor(TFAC$TotalFatalities,TFAC$BAC.15plus)
## [1] 0.9667227
As seen between the Correlation Coefficient, the slope gets lesser as BAC level increases. As there are other covariates that might be a factor in fatalities excluding BAC level but if we consider only this data frame, decreasing slope can be a factor of decreasing number of drivers and decrease number of fatalities as law get stricter. The changing laws on BAC has an impact on the number of fatalities.
g1<-lm(TotalFatalities~BAC.00+BAC.08plus,TFAC)
summary(g1)
##
## Call:
## lm(formula = TotalFatalities ~ BAC.00 + BAC.08plus, data = TFAC)
##
## Residuals:
## Min 1Q Median 3Q Max
## -20.263 -3.926 -1.302 3.685 27.075
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.248461 1.428165 0.874 0.386
## BAC.00 1.024719 0.006657 153.923 <2e-16 ***
## BAC.08plus 1.126600 0.011975 94.081 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.288 on 48 degrees of freedom
## Multiple R-squared: 0.9999, Adjusted R-squared: 0.9999
## F-statistic: 2.304e+05 on 2 and 48 DF, p-value: < 2.2e-16
plot(g1)
new3 = data.frame(BAC.00=1864,BAC.08plus=1446)
new4 = data.frame(BAC.00=1864,BAC.08plus=1447)
Pred3 = predict(g1,new3)
Pred4 = predict(g1,new4)
Pred4-Pred3
## 1
## 1.1266
In above expression “g1<-lm(TotalFatalities~BAC.00+BAC.08plus,TFAC)”TotalFatalities is response or dependent variable where as BAC levels are treated as independent variable.
In the above data, TotalFatalities are taken from the state of Texas which has total fatalities of 1864 out of which 1446 has alcohol limit of BAC.08Plus. When predicting this model, if there is single person of BAC.08 is added in fatalities, keeping the total Fatalities constant, there is an increase of 1.1266 in Total Fatalities in that states. Now, even though a person can be either 1 or 2 (in intergers and not in decimals), the model predicts that there is more chance of fatalities when alcohol is consumed over the legal limit.
library(party)
## Loading required package: grid
## Loading required package: mvtnorm
## Loading required package: modeltools
## Loading required package: stats4
## Loading required package: strucchange
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: sandwich
Fatal_ctree<-ctree(TotalFatalities~BAC.00+BAC.08plus, TFAC)
print(Fatal_ctree)
##
## Conditional inference tree with 4 terminal nodes
##
## Response: TotalFatalities
## Inputs: BAC.00, BAC.08plus
## Number of observations: 51
##
## 1) BAC.00 <= 648; criterion = 1, statistic = 49.034
## 2) BAC.00 <= 302; criterion = 1, statistic = 42.448
## 3) BAC.00 <= 146; criterion = 1, statistic = 26.474
## 4)* weights = 15
## 3) BAC.00 > 146
## 5)* weights = 13
## 2) BAC.00 > 302
## 6)* weights = 16
## 1) BAC.00 > 648
## 7)* weights = 7
plot(Fatal_ctree)
plot(Fatal_ctree, type="simple")