Issue Description

According to the 2014 record of Traffic Fatalities by State in USA, how is the alcohol limit affects the fatality.

Questions

  1. Does the fatality increases as the number of driver increase or will it decrease?

  2. Has the DUI Law of BAC legal limit has decreased the number of drivers on the street of USA?

  3. Also predicting the model, on how number of Fatality is affected with the legal limit?

Data Source

http://www.cdc.gov/

Description of the Data

Used the tools in R such as str() and summary() to describe the original dataset you imported.

Looking at the str(TFAC) the dataframe has 51 observations for States (including Washington District of Columbia) with datatype Character and 12 variables/Columns with datatype integer for different Blood Alcohol Content level.

Summary describes the fatalities for each category of BAC level with Min value, Max Value, Median and 1st & 3rd Quartile.

library(readr)
TFAC <- read_csv("C:/Users/senet/Desktop/COLLEGE/St.Martins Classes/CSC 475 Design Business Intelligence Software/Project/TFAC.csv")
## Warning: Duplicated column names deduplicated: 'PCT' => 'PCT_1' [6], 'PCT'
## => 'PCT_2' [8], 'PCT' => 'PCT_3' [10], 'PCT' => 'PCT_4' [12]
## Parsed with column specification:
## cols(
##   State = col_character(),
##   TotalFatalities = col_integer(),
##   BAC.00 = col_integer(),
##   PCT = col_integer(),
##   BAC0.01to0.07 = col_integer(),
##   PCT_1 = col_integer(),
##   BAC.08plus = col_integer(),
##   PCT_2 = col_integer(),
##   BAC.15plus = col_integer(),
##   PCT_3 = col_integer(),
##   BAC.01plus = col_integer(),
##   PCT_4 = col_integer()
## )
View(TFAC)
str(TFAC)
## Classes 'tbl_df', 'tbl' and 'data.frame':    51 obs. of  12 variables:
##  $ State          : chr  "Alabama" "Alaska" "Arizona" "Arkansas" ...
##  $ TotalFatalities: int  820 73 770 466 3074 488 248 121 23 2494 ...
##  $ BAC.00         : int  508 45 505 302 2017 300 135 70 17 1686 ...
##  $ PCT            : int  62 61 66 65 66 62 54 58 74 68 ...
##  $ BAC0.01to0.07  : int  48 6 48 27 171 26 17 3 1 122 ...
##  $ PCT_1          : int  6 9 6 6 6 5 7 2 5 5 ...
##  $ BAC.08plus     : int  264 22 199 135 882 160 97 49 5 685 ...
##  $ PCT_2          : int  32 30 26 29 29 33 39 40 21 27 ...
##  $ BAC.15plus     : int  180 17 14 94 602 112 63 26 4 470 ...
##  $ PCT_3          : int  22 23 18 20 20 23 26 22 16 19 ...
##  $ BAC.01plus     : int  312 29 246 162 1053 187 113 51 6 807 ...
##  $ PCT_4          : int  38 39 32 35 34 38 46 42 26 32 ...
##  - attr(*, "spec")=List of 2
##   ..$ cols   :List of 12
##   .. ..$ State          : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ TotalFatalities: list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   .. ..$ BAC.00         : list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   .. ..$ PCT            : list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   .. ..$ BAC0.01to0.07  : list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   .. ..$ PCT_1          : list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   .. ..$ BAC.08plus     : list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   .. ..$ PCT_2          : list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   .. ..$ BAC.15plus     : list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   .. ..$ PCT_3          : list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   .. ..$ BAC.01plus     : list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   .. ..$ PCT_4          : list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   ..$ default: list()
##   .. ..- attr(*, "class")= chr  "collector_guess" "collector"
##   ..- attr(*, "class")= chr "col_spec"
summary(TFAC)
##     State           TotalFatalities      BAC.00            PCT       
##  Length:51          Min.   :  23.0   Min.   :  17.0   Min.   :51.00  
##  Class :character   1st Qu.: 208.5   1st Qu.: 130.0   1st Qu.:61.50  
##  Mode  :character   Median : 462.0   Median : 292.0   Median :64.00  
##                     Mean   : 640.7   Mean   : 409.0   Mean   :64.02  
##                     3rd Qu.: 795.0   3rd Qu.: 506.5   3rd Qu.:67.00  
##                     Max.   :3538.0   Max.   :2017.0   Max.   :76.00  
##  BAC0.01to0.07        PCT_1          BAC.08plus         PCT_2      
##  Min.   :  1.00   Min.   : 1.000   Min.   :   5.0   Min.   :20.00  
##  1st Qu.: 11.50   1st Qu.: 4.000   1st Qu.:  58.0   1st Qu.:27.50  
##  Median : 26.00   Median : 5.000   Median : 134.0   Median :29.00  
##  Mean   : 34.69   Mean   : 5.451   Mean   : 195.6   Mean   :30.31  
##  3rd Qu.: 46.50   3rd Qu.: 6.000   3rd Qu.: 234.0   3rd Qu.:33.00  
##  Max.   :224.00   Max.   :12.000   Max.   :1446.0   Max.   :41.00  
##    BAC.15plus        PCT_3         BAC.01plus         PCT_4      
##  Min.   :  4.0   Min.   :14.00   Min.   :   6.0   Min.   :24.00  
##  1st Qu.: 38.5   1st Qu.:19.00   1st Qu.:  71.5   1st Qu.:32.50  
##  Median : 89.0   Median :21.00   Median : 162.0   Median :35.00  
##  Mean   :131.9   Mean   :21.12   Mean   : 230.0   Mean   :35.69  
##  3rd Qu.:161.5   3rd Qu.:23.00   3rd Qu.: 283.0   3rd Qu.:38.00  
##  Max.   :974.0   Max.   :30.00   Max.   :1671.0   Max.   :49.00

Cleaning and Preparation

Describing the steps taken to get from the original dataset to the final dataset used for your analysis..

The dataset was imported from the above link from PDF and saved in Microsoft Excel in .CSV file. It is easier in .CSV format for the data manupulation. The dataset did not had any missing values/NA neither needed to be converted into different data type variable. After saving the dataframe which is TFAC in the default drive in .CSV file, the data was imported using following command in R as shown above.

Final Results

  1. Does the fatality increases as the number of driver increase or will it decrease?

Looking at the question, there are many possible factors that can cause the road fatalities such as Distracted driving, safety precautionary measure,etc.As a logical answer for the question, anyone would be able to answer it but looking my dataset with fatalities as a function of BAC, it might give different result. Therefore I would like to compare three variables BAC at 0.00, BAC at 0.08(legal limit) and BAC at 0.15plus (as max) again the fatalities. Showing a graphical representation below:

x<-cbind(TFAC$TotalFatalities)
#x1<-cbind(TFAC$TotalFatalities,TFAC$TotalFatalities,TFAC$TotalFatalities)
y<-cbind(TFAC$BAC.00,TFAC$BAC.08plus,TFAC$BAC.15plus)
matplot(x,y, type="p")

cor(x,y)
##           [,1]      [,2]      [,3]
## [1,] 0.9902969 0.9739005 0.9667227
#matplot(x1,y, type="p")
#cor(x1,y)

Looking at the graph, the model has a positive slope which as predicted was a positive. That means the number of fatalities increases as there are more drivers on the street which in this case also proved that the fatalities increased more or less regardless of the DUI cases.

  1. Has the DUI Law of BAC legal limit has decreased the number of drivers on the street of USA?

With the Given data set, I will try to create a Linear model of three graphs making Fatalities as a function of Blood Alcohol Content. 1)Fatalities vs BAC.00(Basically no alcohol consumsion) 2)Fatalities vs BAC.08(Legal Limit) 3)Fatalities vs BAC.15(Over the legal limit)

Also verify the Correlation Coefficient

plot(TFAC$BAC.00~TFAC$TotalFatalities)
Test1<-lm(TFAC$BAC.00~TFAC$TotalFatalities)
abline(Test1)

cor(TFAC$TotalFatalities,TFAC$BAC.00)
## [1] 0.9902969
plot(TFAC$BAC.08plus~TFAC$TotalFatalities)
Test2<-lm(TFAC$BAC.08plus~TFAC$TotalFatalities)
abline(Test2)

cor(TFAC$TotalFatalities,TFAC$BAC.08plus)
## [1] 0.9739005
plot(TFAC$BAC.15plus~TFAC$TotalFatalities)
Test3<-lm(TFAC$BAC.15plus~TFAC$TotalFatalities)
abline(Test3)

cor(TFAC$TotalFatalities,TFAC$BAC.15plus)
## [1] 0.9667227

As seen between the Correlation Coefficient, the slope gets lesser as BAC level increases. As there are other covariates that might be a factor in fatalities excluding BAC level but if we consider only this data frame, decreasing slope can be a factor of decreasing number of drivers and decrease number of fatalities as law get stricter. The changing laws on BAC has an impact on the number of fatalities.

  1. Also predicting the model, on how number of Fatality is affected with the legal limit?
g1<-lm(TotalFatalities~BAC.00+BAC.08plus,TFAC)
summary(g1)
## 
## Call:
## lm(formula = TotalFatalities ~ BAC.00 + BAC.08plus, data = TFAC)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -20.263  -3.926  -1.302   3.685  27.075 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.248461   1.428165   0.874    0.386    
## BAC.00      1.024719   0.006657 153.923   <2e-16 ***
## BAC.08plus  1.126600   0.011975  94.081   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.288 on 48 degrees of freedom
## Multiple R-squared:  0.9999, Adjusted R-squared:  0.9999 
## F-statistic: 2.304e+05 on 2 and 48 DF,  p-value: < 2.2e-16
plot(g1)

new3 = data.frame(BAC.00=1864,BAC.08plus=1446)
new4 = data.frame(BAC.00=1864,BAC.08plus=1447) 

Pred3 = predict(g1,new3)
Pred4 = predict(g1,new4)
Pred4-Pred3
##      1 
## 1.1266

In above expression “g1<-lm(TotalFatalities~BAC.00+BAC.08plus,TFAC)”TotalFatalities is response or dependent variable where as BAC levels are treated as independent variable.

In the above data, TotalFatalities are taken from the state of Texas which has total fatalities of 1864 out of which 1446 has alcohol limit of BAC.08Plus. When predicting this model, if there is single person of BAC.08 is added in fatalities, keeping the total Fatalities constant, there is an increase of 1.1266 in Total Fatalities in that states. Now, even though a person can be either 1 or 2 (in intergers and not in decimals), the model predicts that there is more chance of fatalities when alcohol is consumed over the legal limit.

library(party)
## Loading required package: grid
## Loading required package: mvtnorm
## Loading required package: modeltools
## Loading required package: stats4
## Loading required package: strucchange
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: sandwich
Fatal_ctree<-ctree(TotalFatalities~BAC.00+BAC.08plus, TFAC)
print(Fatal_ctree)
## 
##   Conditional inference tree with 4 terminal nodes
## 
## Response:  TotalFatalities 
## Inputs:  BAC.00, BAC.08plus 
## Number of observations:  51 
## 
## 1) BAC.00 <= 648; criterion = 1, statistic = 49.034
##   2) BAC.00 <= 302; criterion = 1, statistic = 42.448
##     3) BAC.00 <= 146; criterion = 1, statistic = 26.474
##       4)*  weights = 15 
##     3) BAC.00 > 146
##       5)*  weights = 13 
##   2) BAC.00 > 302
##     6)*  weights = 16 
## 1) BAC.00 > 648
##   7)*  weights = 7
plot(Fatal_ctree)

plot(Fatal_ctree, type="simple")