The final project requirements have been identified in the document located at the following link: https://bbhosted.cuny.edu/bbcswebdav/pid-42265237-dt-content-rid-347468186_1/courses/SPS01_DATA_607_01_1199_1/607-Final%281%29.pdf
Mario Pena
Ajay Arora
We enjoyed collaboration on the 3rd project and decieded to continue to work together for the final project. We established communication over #Slack and spoke over the phone to introduce ourselves properly. Furthermore, we divided the work effort in half. The work effort consisted of Data Preparation, Data Cleaning, Data Analysis, and Conclusion.
We both decided to communicate via email and/or #Slack. In addition, we establihed a Final Project group folder on GitHub.
The data was acquired at https://www-nrd.nhtsa.dot.gov/database/veh/veh.htm. (https://www-nrd.nhtsa.dot.gov/database/VSR/Download.aspx?tstno=&curno=&database=v&name=Vehdb-export&format=export)
Specifically, Which vehicle or set of vehicles are the safest from a head and leg injury perspective.
Head Injury Criterion, Left Femur Load, Right Femur Load
Based on the recommendations following this section, provided by the National Highway Traffic Safety Administration, we want to perform analysis to determine which vehicles have the best and worst injury severity ratings based upon the criteria listed below and using the following attributes.
vehicle make, Vehicle year, Vehicle body type, and occupant location (Left-front-seat, Right-front-seat).
The National Highway Traffic Safety Administration makes their recommendation using a star rating system. A greater number of Stars mean Safer Cars. 5-Star Safety Ratings measure the crashworthiness and rollover safety of vehicles. Five stars is the highest rating, one is the lowest. However, some of the vehicles getting a 5-star rating could be cost prohibitive for some consumers and provide a small variety of choices. In this instance, a more detailed analysis of the injury criteria and attributes are warranted.
National Highway Traffic Safety Administration Recommendations
National Highway Traffic Safety Administration Recommendations
The National Highway Traffic Safety Administration Recommendations provide a basis for vehicle safety for crashworthiness and rollover. The recommendation image provides a baseline for head, leg and chest injury. In summary, the closer the value is to absolute 0, the less likely the occupant will have a severe injury.
#Loading Libraries
library(DBI)
## Warning: package 'DBI' was built under R version 3.5.3
library("knitr")
library("tidyverse")
## Warning: package 'tidyverse' was built under R version 3.5.3
## -- Attaching packages ---------------------------------------------------------------------------------------------------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.2.1 v purrr 0.3.0
## v tibble 2.1.3 v dplyr 0.8.3
## v tidyr 0.8.2 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.4.0
## Warning: package 'ggplot2' was built under R version 3.5.3
## Warning: package 'tibble' was built under R version 3.5.3
## Warning: package 'readr' was built under R version 3.5.3
## Warning: package 'dplyr' was built under R version 3.5.3
## Warning: package 'forcats' was built under R version 3.5.3
## -- Conflicts ------------------------------------------------------------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library("stringr")
library("plotly")
## Warning: package 'plotly' was built under R version 3.5.3
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library("htmlTable")
## Warning: package 'htmlTable' was built under R version 3.5.3
library("stringr")
library("stats")
library("scales")
## Warning: package 'scales' was built under R version 3.5.3
##
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
##
## discard
## The following object is masked from 'package:readr':
##
## col_factor
library("viridis")
## Warning: package 'viridis' was built under R version 3.5.3
## Loading required package: viridisLite
##
## Attaching package: 'viridis'
## The following object is masked from 'package:scales':
##
## viridis_pal
library("wordcloud")
## Warning: package 'wordcloud' was built under R version 3.5.3
## Loading required package: RColorBrewer
#data <- read.csv(file="https://raw.githubusercontent.com/AjayArora35/Data-607-Group-Project-3/master/Data%20Science%20Software%20Skills.csv", header=TRUE, stringsAsFactors = FALSE)
#data2C <- read.csv(file="https://raw.githubusercontent.com/AjayArora35/Data-607-Group-Project-3/master/Data%20Science%20Computing%20Skills.csv", header=TRUE, stringsAsFactors = FALSE)
cn <- dbConnect(drv = RMySQL::MySQL(),
username = "admin",
password = "Data__607",
host = "database-1.cxdov2mcmzlo.us-east-2.rds.amazonaws.com",
port = 3306,
dbname = "data607finalproject")
data <- dbGetQuery(cn, "SELECT
RD.MAKED
,MAX(`Head Injury Criterion`) AS HIC
,MAX(`Left Femur Load`) AS LFL
,MAX(`Right Femur Load`) AS RFL
,MIN(`Head Injury Criterion`) AS min_HIC
,MIN(`Left Femur Load`) AS min_LFL
,MIN(`Right Femur Load`) AS min_RFL
FROM
(SELECT v.MAKED
, v.YEAR
, AVG(convert(o.HIC, SIGNED INTEGER)) AS `Head Injury Criterion`
, AVG(convert(o.LFEM , SIGNED INTEGER)) AS `Left Femur Load`
, AVG(convert(o.RFEM , SIGNED INTEGER)) AS `Right Femur Load`
FROM data607finalproject.test t
inner join data607finalproject.veh v
on(v.TSTNO = t.TSTNO)
inner join data607finalproject.occ o
on(o.TSTNO = v.TSTNO)
inner join data607finalproject.rest r
on(r.TSTNO = v.TSTNO)
AND (r.VEHNO = v.VEHNO)
WHERE v.MAKED != 'NHTSA'
AND v.MAKED NOT IN ('MCI')
AND v.YEAR != ''
AND v.YEAR != 0
AND (o.LFEM != '' AND o.LFEM != 0)
AND (o.RFEM != '' AND o.RFEM != 0)
AND r.DEPLOYD = 'DEPLOYED PROPERLY'
GROUP BY
v.MAKED
,v.YEAR
) AS RD
GROUP BY
RD.MAKED
ORDER BY
RD.MAKED")
## Warning in .local(conn, statement, ...): Decimal MySQL column 1 imported as
## numeric
## Warning in .local(conn, statement, ...): Decimal MySQL column 2 imported as
## numeric
## Warning in .local(conn, statement, ...): Decimal MySQL column 3 imported as
## numeric
## Warning in .local(conn, statement, ...): Decimal MySQL column 4 imported as
## numeric
## Warning in .local(conn, statement, ...): Decimal MySQL column 5 imported as
## numeric
## Warning in .local(conn, statement, ...): Decimal MySQL column 6 imported as
## numeric
head(data)
## MAKED HIC LFL RFL min_HIC min_LFL min_RFL
## 1 ACURA 723.1429 -400.7500 -329.5000 152.750 -6788.000 -7036.750
## 2 AUDI 627.5000 -269.1667 -199.6667 63.400 -6409.500 -5085.500
## 3 BMW 9999.0000 -999.0000 -999.0000 40.250 -6962.500 -7828.000
## 4 BUICK 1273.5000 -647.2222 -767.7000 149.000 -7157.500 -7034.750
## 5 CADILLAC 1120.0000 -565.0000 -704.5000 53.500 -5409.500 -5183.250
## 6 CHEVROLET 1282.7679 -956.6154 -1216.8462 96.375 -6328.364 -5758.714
data$HIC <- as.numeric(gsub(",","", data$HIC))
data$LFL <- as.numeric(gsub(",","", data$LFL))
data$RFL <- as.numeric(gsub(",","", data$RFL))
data$min_HIC <- as.numeric(gsub(",","", data$min_HIC))
data$min_LFL <- as.numeric(gsub(",","", data$min_LFL))
data$min_RFL <- as.numeric(gsub(",","", data$min_RFL))
data$HIC <- trunc(data$HIC)
data$LFL <- trunc(data$LFL)
data$RFL <- trunc(data$RFL)
data$min_HIC <- trunc(data$min_HIC)
data$min_LFL <- trunc(data$min_LFL)
data$minRFL <- trunc(data$min_RFL)
The Head Injury Criterion (HIC) is a measure of the likelihood of head injury arising from an impact. The summarized data for HCI was generated by taking the maximum of HCI value for each vehicle make.
The HIC can be used to assess safety related to vehicles, personal protective gear, and sport equipment.
Normally the variable is derived from the measurements of an accelerometer mounted at the center of mass of a crash test dummy’s head, when the dummy is exposed to crash forces. This means that the HIC includes the effects of head acceleration and the duration of the acceleration. Large accelerations may be tolerated for very short times.
At a HIC of 1000, there is an 18% probability of a severe head injury, a 55% probability of a serious injury and a 90% probability of a moderate head injury to the average adult.
Measured in integers, 0 to 9,999, HIC is the computed value of the head injury criterion, based on the resultant acceleration pulse for the head center of gravity. (https://www.intmath.com/applications-integration/hic-part2.php) Generally, experts agree that Head Injury Criterion (HIC) values above 1000 are life threatening.
grid1 <- ggplot(data = data,aes(x=reorder(data$MAKED, data$HIC), y=data$HIC, fill = viridis(49), )) +
theme(legend.position = "none", axis.text.y = element_text(size=6), axis.text.x = element_text(size=7)) +
geom_bar(stat = "identity") +
geom_label(aes(label=data$HIC), position = position_dodge(width = 0.5), size = 2.4, label.padding = unit(0.04, "lines"), label.size = 0.15, inherit.aes = TRUE)+
labs(title = "Maximum Head Injury Criterion", x = "Vehicles", y = "Severity")+
coord_flip()
grid1
Measured as an integer, LFEM indicates the maximum compression load for the left femur listed as a negative number.
grid2 <- ggplot(data = data,aes(x=reorder(data$MAKED, data$LFL), y=data$LFL, fill = viridis(49), )) +
theme(legend.position = "none", axis.text.y = element_text(size=6), axis.text.x = element_text(size=7)) +
geom_bar(stat = "identity") +
geom_label(aes(label=data$LFL), position = position_dodge(width = 0.5), size = 2.4, label.padding = unit(0.04, "lines"), label.size = 0.15, inherit.aes = TRUE)+
labs(title = " Maximum Left Femur Peak Load", x = "Vehicles", y = "Severity")+
coord_flip()
grid2
Measured as an integer, RFEM indicates the maximum compression load for the right femur listed as a negative number.
grid3 <- ggplot(data = data,aes(x=reorder(data$MAKED, data$RFL), y=data$RFL, fill = viridis(49), )) +
theme(legend.position = "none", axis.text.y = element_text(size=6), axis.text.x = element_text(size=7)) +
geom_bar(stat = "identity") +
geom_label(aes(label=data$RFL), position = position_dodge(width = 0.5), size = 2.4, label.padding = unit(0.04, "lines"), label.size = 0.15, inherit.aes = TRUE)+
labs(title = " Maximum Right Femur Peak Load", x = "Vehicles", y = "Severity")+
coord_flip()
grid3
grid4 <- ggplot(data = data,aes(x=reorder(data$MAKED, data$min_HIC), y=data$min_HIC, fill = viridis(49), )) +
theme(legend.position = "none", axis.text.y = element_text(size=6), axis.text.x = element_text(size=7)) +
geom_bar(stat = "identity") +
geom_label(aes(label=data$min_HIC), position = position_dodge(width = 0.5), size = 2.4, label.padding = unit(0.04, "lines"), label.size = 0.15, inherit.aes = TRUE)+
labs(title = "Minimum Head Injury Criterion", x = "Vehicles", y = "Severity")+
coord_flip()
grid4
grid5 <- ggplot(data = data,aes(x=reorder(data$MAKED, data$min_LFL), y=data$min_LFL, fill = viridis(49), )) +
theme(legend.position = "none", axis.text.y = element_text(size=6), axis.text.x = element_text(size=7)) +
geom_bar(stat = "identity") +
geom_label(aes(label=data$min_LFL), position = position_dodge(width = 0.5), size = 2.4, label.padding = unit(0.04, "lines"), label.size = 0.15, inherit.aes = TRUE)+
labs(title = " Minimum Left Femur Peak Load", x = "Vehicles", y = "Severity")+
coord_flip()
grid5
grid6 <- ggplot(data = data,aes(x=reorder(data$MAKED, data$min_RFL), y=data$min_RFL, fill = viridis(49), )) +
theme(legend.position = "none", axis.text.y = element_text(size=6), axis.text.x = element_text(size=7)) +
geom_bar(stat = "identity") +
geom_label(aes(label=data$min_RFL), position = position_dodge(width = 0.5), size = 2.4, label.padding = unit(0.04, "lines"), label.size = 0.15, inherit.aes = TRUE)+
labs(title = " Minimum Right Femur Peak Load", x = "Vehicles", y = "Severity")+
coord_flip()
grid6
The summarized view of the data for HCI, LFEM and RFEM does not reveal the details necessary for making a more informed decision about a vehicle. For example, suppose a potential vehicle buyer wanted to know these values based on the year, type of vehicle, location of the occputant, etc. The summarized views do not provide this level of detail, so it becomes necessary to retrieve data with year and type of vehicle, among other attributes.
#data <- read.csv(file="https://raw.githubusercontent.com/AjayArora35/Data-607-Group-Project-3/master/Data%20Science%20Software%20Skills.csv", header=TRUE, stringsAsFactors = FALSE)
#data2C <- read.csv(file="https://raw.githubusercontent.com/AjayArora35/Data-607-Group-Project-3/master/Data%20Science%20Computing%20Skills.csv", header=TRUE, stringsAsFactors = FALSE)
cn <- dbConnect(drv = RMySQL::MySQL(),
username = "admin",
password = "Data__607",
host = "database-1.cxdov2mcmzlo.us-east-2.rds.amazonaws.com",
port = 3306,
dbname = "data607finalproject")
data2 <- dbGetQuery(cn, "SELECT v.MAKED
, v.YEAR
, v.BODYD
, o.OCCLOCD
, o.HIC AS HIC
, o.LFEM AS LFL
, o.RFEM AS RFL
FROM data607finalproject.test t
inner join data607finalproject.veh v
on(v.TSTNO = t.TSTNO)
inner join data607finalproject.occ o
on(o.TSTNO = v.TSTNO)
inner join data607finalproject.rest r
on(r.TSTNO = t.TSTNO)
WHERE v.MAKED != 'NHTSA'
AND v.MAKED NOT IN ('MCI', 'OTHER')
AND (o.LFEM != '' AND o.LFEM != 0)
AND (o.RFEM != '' AND o.RFEM != 0)
AND (r.DEPLOYD = N'DEPLOYED PROPERLY')
AND (o.HIC != '' AND o.HIC != 0)
AND (v.YEAR != 0 and v.YEAR != '')
GROUP BY
v.MAKED
, v.YEAR
, v.BODYD
, o.OCCLOCD
, o.HIC
, o.LFEM
, o.RFEM
ORDER BY v.MAKED ")
head(data2)
## MAKED YEAR BODYD OCCLOCD HIC LFL RFL
## 1 ACURA 1988 FOUR DOOR SEDAN LEFT FRONT SEAT 284 -6886 -7344
## 2 ACURA 1988 FOUR DOOR SEDAN RIGHT FRONT SEAT 387 -2931 -5494
## 3 ACURA 1992 FOUR DOOR SEDAN LEFT FRONT SEAT 601 -9230 -8229
## 4 ACURA 1992 FOUR DOOR SEDAN LEFT FRONT SEAT 897 -1326 -5534
## 5 ACURA 1992 FOUR DOOR SEDAN LEFT FRONT SEAT 914 -3007 -7371
## 6 ACURA 1992 FOUR DOOR SEDAN RIGHT FRONT SEAT 433 -2140 -1401
data2$HIC <- as.numeric(gsub(",","", data2$HIC))
data2$LFL <- as.numeric(gsub(",","", data2$LFL))
data2$RFL <- as.numeric(gsub(",","", data2$RFL))
data2$YEAR <- as.numeric(data2$YEAR)
data2$BODYD <- as.factor(data2$BODYD)
data2$MAKED <- as.factor(data2$MAKED)
data2$OCCLOCD <- as.factor(data2$OCCLOCD)
data2$HIC <- trunc(data2$HIC)
data2$LFL <- trunc(data2$LFL)
data2$RFL <- trunc(data2$RFL)
In order to simplify our analysis we have decided to separete head injury criterion, left/right femur peak load and compare them against the attributes we thought were most relevant. We will add a column that averages the HCI, LFEM and RFEM by make, year, body type of the vehicle and occupant location in the vehicle.
The vehicle make that has the highest frequency of crash test observations in our data is Ford, and is then followed by Toyota, Honda, Chevrolet, Dodge, and Nissan to name a few.
attrMake <- data2 %>% group_by(MAKED) %>% summarise("Average HIC" = mean(HIC), Count = n())
ggplot(attrMake, aes(x=reorder(MAKED, -Count), y=Count)) + geom_bar(stat="identity", width = 0.5, fill = "tomato2") + labs(x = "Vehicle Make", y = "Frequency", title = "Distribution of Crash Test Observations by Vehicle Make") + theme(axis.text.x = element_text(angle = 60, hjust = 1, size=8))
attrMake_lfl <- data2 %>% group_by(MAKED) %>% summarise("Average LFL" = mean(LFL), Count = n())
attrMake_rfl <- data2 %>% group_by(MAKED) %>% summarise("Average RFL" = mean(RFL), Count = n())
head(attrMake)
## # A tibble: 6 x 3
## MAKED `Average HIC` Count
## <fct> <dbl> <int>
## 1 ACURA 384. 87
## 2 AUDI 278. 45
## 3 BMW 742. 71
## 4 BUICK 348. 78
## 5 CADILLAC 395. 87
## 6 CHEVROLET 515. 611
In the graphs below we can see from the crash tests that among the safest vehicle makes from a head and leg injury perspective we find Porsche, Tesla, Mini and Volkswagen among others.
ggplot(attrMake, aes(x=reorder(MAKED, `Average HIC`), y=`Average HIC`)) + geom_bar(stat="identity", width = 0.5, fill = viridis(49)) + labs(x = "Car Make", y = "AVG HIC", title = "Average HIC by Car Make") + theme(axis.text.x = element_text(angle = 60, hjust = 1, size=8))
ggplot(attrMake_lfl, aes(x=reorder(MAKED, -`Average LFL`), y=`Average LFL`)) + geom_bar(stat="identity", width = 0.5, fill = viridis(49)) + labs(x = "Car Make", y = "AVG LFL", title = "Average LFL by Car Make") + theme(axis.text.x = element_text(angle = 60, hjust = 1, size=8))
ggplot(attrMake_rfl, aes(x=reorder(MAKED, -`Average RFL`), y=`Average RFL`)) + geom_bar(stat="identity", width = 0.5, fill = viridis(49)) + labs(x = "Car Make", y = "AVG RFL", title = "Average RFL by Car Make") + theme(axis.text.x = element_text(angle = 60, hjust = 1, size=8))
attrYear <- data2 %>% group_by(YEAR) %>% summarise("Average HIC" = mean(HIC), Count = n())
attrYear_lfl <- data2 %>% group_by(YEAR) %>% summarise("Average LFL" = mean(LFL), Count = n())
attrYear_rfl <- data2 %>% group_by(YEAR) %>% summarise("Average RFL" = mean(RFL), Count = n())
head(attrYear)
## # A tibble: 6 x 3
## YEAR `Average HIC` Count
## <dbl> <dbl> <int>
## 1 1981 991 2
## 2 1982 667 2
## 3 1984 90.2 4
## 4 1986 662 2
## 5 1987 444. 4
## 6 1988 531. 15
During 1999 The National Highway Traffic Safety Administration planned for upgrading the Federal Motor Vehicle Safety Standard (FMVSS). They added new crash specifications that required the use of additional dummies of various sizes as well as additional performance criteria that appropriately represent head injury thresholds.
ggplot(attrYear, aes(x=YEAR, y=`Average HIC`)) + geom_line(color = "tomato2") + labs(x = "Year", y = "AVG HIC", title = "Average HIC by Year") + geom_point(color = "tomato2")
ggplot(attrYear_lfl, aes(x=YEAR, y=`Average LFL`)) + geom_line(color = "tomato2") + labs(x = "Year", y = "AVG LFL", title = "Average LFL by Year") + geom_point(color = "tomato2")
ggplot(attrYear_rfl, aes(x=YEAR, y=`Average RFL`)) + geom_line(color = "tomato2") + labs(x = "Year", y = "AVG RFL", title = "Average RFL by Year") + geom_point(color = "tomato2")
attrBody <- data2 %>% group_by(BODYD) %>% filter (BODYD != "OTHER") %>% summarise("Average HIC" = mean(HIC), Count = n())
attrBody_lfl <- data2 %>% group_by(BODYD) %>% filter (BODYD != "OTHER") %>% summarise("Average LFL" = mean(LFL), Count = n())
attrBody_rfl <- data2 %>% group_by(BODYD) %>% filter (BODYD != "OTHER") %>% summarise("Average RFL" = mean(RFL), Count = n())
head(attrBody)
## # A tibble: 6 x 3
## BODYD `Average HIC` Count
## <fct> <dbl> <int>
## 1 4 DOOR PICKUP 407. 181
## 2 CONVERTIBLE 976. 62
## 3 EXTENDED CAB PICKUP 489. 114
## 4 FIVE DOOR HATCHBACK 403. 221
## 5 FOUR DOOR SEDAN 467. 2551
## 6 MINIVAN 454. 134
According to the graphs below, among the safest vehicle body types from a head and leg injury perspective we find that three door coupes, five door hatchbacks, and 4 door pickups have the lowest injury averages.
Please Note: The category “OTHER” for body type has been removed as it is not clear what type of vehicles are included
ggplot(attrBody, aes(x=reorder(BODYD, `Average HIC`), y=`Average HIC`)) + geom_bar(stat="identity", width = 0.5, fill = magma(15)) + labs(x = "Body Type", y = "AVG HIC", title = "Average HIC by Car Body Type") + theme(axis.text.x = element_text(angle = 60, hjust = 1, size=8))
ggplot(attrBody_lfl, aes(x=reorder(BODYD, -`Average LFL`), y=`Average LFL`)) + geom_bar(stat="identity", width = 0.5, fill = magma(15)) + labs(x = "Body Type", y = "AVG LFL", title = "Average LFL by Car Body Type") + theme(axis.text.x = element_text(angle = 60, hjust = 1, size=8))
ggplot(attrBody_rfl, aes(x=reorder(BODYD, -`Average RFL`), y=`Average RFL`)) + geom_bar(stat="identity", width = 0.5, fill = magma(15)) + labs(x = "Body Type", y = "AVG RFL", title = "Average RFL by Car Body Type") + theme(axis.text.x = element_text(angle = 60, hjust = 1, size=8))
attrOcc <- data2 %>% group_by(OCCLOCD) %>% summarise("Average HIC" = mean(HIC), Count = n())
attrOcc_lfl <- data2 %>% group_by(OCCLOCD) %>% summarise("Average LFL" = mean(LFL), Count = n())
attrOcc_rfl <- data2 %>% group_by(OCCLOCD) %>% summarise("Average RFL" = mean(RFL), Count = n())
head(attrOcc)
## # A tibble: 6 x 3
## OCCLOCD `Average HIC` Count
## <fct> <dbl> <int>
## 1 CENTER REAR SEAT 712. 21
## 2 CENTER THIRD SEAT 871 4
## 3 LEFT FRONT SEAT 389. 2923
## 4 LEFT REAR SEAT 1389. 147
## 5 LEFT THIRD SEAT 1045. 4
## 6 RIGHT FRONT SEAT 458. 2364
According to the graphs below, one of the safest places to sit in a vehicle in the unfortunate event of a crash is the left front seat. In other words, you are the most safe when you are the driver. The second safest place to sit is in the front passenger seat.
ggplot(attrOcc, aes(x=reorder(OCCLOCD, `Average HIC`), y=`Average HIC`)) + geom_bar(stat="identity", width = 0.5, fill = plasma(8)) + labs(x = "Occupant Location", y = "AVG HIC", title = "Average HIC by Occupant Location") + theme(axis.text.x = element_text(angle = 60, hjust = 1, size=8))
ggplot(attrOcc_lfl, aes(x=reorder(OCCLOCD, -`Average LFL`), y=`Average LFL`)) + geom_bar(stat="identity", width = 0.5, fill = plasma(8)) + labs(x = "Occupant Location", y = "AVG LFL", title = "Average LFL by Occupant Location") + theme(axis.text.x = element_text(angle = 60, hjust = 1, size=8))
ggplot(attrOcc_rfl, aes(x=reorder(OCCLOCD, -`Average RFL`), y=`Average RFL`)) + geom_bar(stat="identity", width = 0.5, fill = plasma(8)) + labs(x = "Occupant Location", y = "AVG RFL", title = "Average RFL by Occupant Location") + theme(axis.text.x = element_text(angle = 60, hjust = 1, size=8))
Below we have another perspective for head injury criterion by occupant location looking at the data through boxplots. We can see that our boxplots agree with our analysis above, the two safest places to sit in a vehicle are in the driver seat and front passenger seat.
qplot(OCCLOCD, HIC, data = data2, geom= "boxplot", fill = OCCLOCD) + labs(x = "Occupant Location", y = "HIC", title = "Head Injury Criterion by Occupant Location") + theme(axis.text.x = element_text(angle = 60, hjust = 1, size=8)) + theme(legend.position = "none")
We can also look at our data by comparing vehicle make, body type and head injury criterion in the same graph in order to have a better visualization of our aggregate data.
Please Note: We have removed an observation (Chevrolet convertible) that was very far from the rest of the data in order to make the graph more readable. Additionally, the category “OTHER” for body type has been removed as it is not clear what type of vehicles are included
attrMakeBody <- data2 %>% group_by(MAKED, BODYD) %>% filter (MAKED != "CHEVROLET" | BODYD != "CONVERTIBLE", BODYD != "OTHER") %>% summarise("Average HIC" = mean(HIC), Count = n())
ggplot(attrMakeBody, aes(MAKED, `Average HIC`)) + geom_point(aes(color = BODYD)) + labs(x = "Vehicle Make", y = "AVG HIC", title = "Average HIC by Vehicle Make and Body Type") + theme(axis.text.x = element_text(angle = 60, hjust = 1, size=6), legend.title = element_text(size = 8), legend.text = element_text(size = 6)) + scale_color_discrete(name = "Body Type")
The data obtained from the more granular query has many more rows with repeating fields of data. To get a better understanding of this data, we will transform into a easier visual model.
# library(treemap)
#
# treemap(data2,
# index=c("MAKED", "BODYD"),
# vSize="YEAR",
# #vColor="GNI",
# #type="YEAR"
# )
library(data.tree)
## Warning: package 'data.tree' was built under R version 3.5.3
#data(GNI2014)
#(GNI2014)
data2$pathString <- paste("Vehicle_Make",
data2$YEAR,
data2$MAKED,
data2$BODYD,
data2$OCCLOCD,
sep = "/")
data2tree <- as.Node(data2)
#print(data2tree, "HIC", "LFL", "RFL", limit = 200)
#plot(population)
#library(networkD3)
#acmeNetwork <- ToDataFrameNetwork(data2tree, "name")
#simpleNetwork(acmeNetwork[-3], fontSize = 12)
#plot(as.dendrogram(data2tree), center = TRUE)
# works: print(data2tree, "HIC", "LFL", "RFL")
Reasons for transformation: It is increasingly becoming difficult to view the results because of the number of records in the resulset. Every time another variable of interest is added it exponentially increases the output of records. The difficulty becomes in “visually” consuming the data without removing any of the results. The following data that has been transformed into a tree structure and is still too large to consume “visually”. So, we decided to ask our questions using subsets of the data as depicted below.
#SetGraphStyle(data2tree, rankdir = "LR", dpi="70")
#SetEdgeStyle(data2tree, arrowhead = "vee", color = "grey35", penwidth = "20")
#SetNodeStyle(data2tree, style = "filled,rounded", shape = "box", fillcolor = "GreenYellow",
# fontname = "helvetica", tooltip = GetDefaultTooltip, fontsize = "36", height="0.75")
#SetNodeStyle(data2tree$ACURA, fillcolor = "LightBlue", penwidth = "5px")
#plot(data2tree$`2019`$ACURA, output = "graph")
#plot(as.dendrogram(data2tree), center = TRUE)
print(data2tree, "HIC", "LFL", "RFL")
## levelName HIC LFL RFL
## 1 Vehicle_Make NA NA NA
## 2 ¦--1988 NA NA NA
## 3 ¦ ¦--ACURA NA NA NA
## 4 ¦ ¦ °--FOUR DOOR SEDAN NA NA NA
## 5 ¦ ¦ ¦--LEFT FRONT SEAT 284 -6886 -7344
## 6 ¦ ¦ °--RIGHT FRONT SEAT 387 -2931 -5494
## 7 ¦ ¦--DODGE NA NA NA
## 8 ¦ ¦ °--THREE DOOR HATCHBACK NA NA NA
## 9 ¦ ¦ ¦--LEFT FRONT SEAT 194 -3011 -6397
## 10 ¦ ¦ °--RIGHT FRONT SEAT 319 -4408 -3670
## 11 ¦ ¦--FORD NA NA NA
## 12 ¦ ¦ °--FOUR DOOR SEDAN NA NA NA
## 13 ¦ ¦ ¦--LEFT FRONT SEAT 456 -5089 -5378
## 14 ¦ ¦ °--RIGHT FRONT SEAT 561 -2971 -2322
## 15 ¦ ¦--OLDSMOBILE NA NA NA
## 16 ¦ ¦ °--FOUR DOOR SEDAN NA NA NA
## 17 ¦ ¦ ¦--LEFT FRONT SEAT 709 -6010 -9399
## 18 ¦ ¦ °--RIGHT FRONT SEAT 539 -2318 -2620
## 19 ¦ °--VOLVO NA NA NA
## 20 ¦ °--FOUR DOOR SEDAN NA NA NA
## 21 ¦ ¦--LEFT FRONT SEAT 519 -5084 -7976
## 22 ¦ °--RIGHT FRONT SEAT 445 -3216 -2820
## 23 ¦--1992 NA NA NA
## 24 ¦ ¦--ACURA NA NA NA
## 25 ¦ ¦ °--FOUR DOOR SEDAN NA NA NA
## 26 ¦ ¦ ¦--LEFT FRONT SEAT 914 -3007 -7371
## 27 ¦ ¦ °--RIGHT FRONT SEAT 660 -2940 -1517
## 28 ¦ ¦--BMW NA NA NA
## 29 ¦ ¦ °--FOUR DOOR SEDAN NA NA NA
## 30 ¦ ¦ ¦--LEFT FRONT SEAT 705 -5418 -5196
## 31 ¦ ¦ °--RIGHT FRONT SEAT 698 -3127 -2077
## 32 ¦ ¦--CADILLAC NA NA NA
## 33 ¦ ¦ °--FOUR DOOR SEDAN NA NA NA
## 34 ¦ ¦ ¦--LEFT FRONT SEAT 598 -4706 -3407
## 35 ¦ ¦ °--RIGHT FRONT SEAT 900 -4809 -3180
## 36 ¦ ¦--CHEVROLET NA NA NA
## 37 ¦ ¦ °--FOUR DOOR SEDAN NA NA NA
## 38 ¦ ¦ °--LEFT FRONT SEAT 960 -9190 -5480
## 39 ¦ ¦--DODGE NA NA NA
## 40 ¦ ¦ °--VAN NA NA NA
## 41 ¦ ¦ ¦--LEFT FRONT SEAT 407 -2046 -6944
## 42 ¦ ¦ °--RIGHT FRONT SEAT 427 -6210 -2607
## 43 ¦ ¦--FORD NA NA NA
## 44 ¦ ¦ ¦--CONVERTIBLE NA NA NA
## 45 ¦ ¦ ¦ ¦--LEFT FRONT SEAT 811 -5556 -5298
## 46 ¦ ¦ ¦ °--RIGHT FRONT SEAT 129 -1134 -574
## 47 ¦ ¦ ¦--FOUR DOOR SEDAN NA NA NA
## 48 ¦ ¦ ¦ ¦--LEFT FRONT SEAT 907 -6401 -3763
## 49 ¦ ¦ ¦ °--RIGHT FRONT SEAT 331 -6067 -3759
## 50 ¦ ¦ °--VAN NA NA NA
## 51 ¦ ¦ ¦--LEFT FRONT SEAT 698 -6797 -3545
## 52 ¦ ¦ °--RIGHT FRONT SEAT 723 -4212 -1521
## 53 ¦ ¦--GEO NA NA NA
## 54 ¦ ¦ °--THREE DOOR HATCHBACK NA NA NA
## 55 ¦ ¦ ¦--LEFT FRONT SEAT 75 -6121 -6005
## 56 ¦ ¦ °--RIGHT FRONT SEAT 613 -1517 -1459
## 57 ¦ ¦--HONDA NA NA NA
## 58 ¦ ¦ ¦--FOUR DOOR SEDAN NA NA NA
## 59 ¦ ¦ ¦ ¦--LEFT FRONT SEAT 612 -2574 -4664
## 60 ¦ ¦ ¦ °--RIGHT FRONT SEAT 712 -3630 -63
## 61 ¦ ¦ ¦--THREE DOOR HATCHBACK NA NA NA
## 62 ¦ ¦ ¦ ¦--LEFT FRONT SEAT 302 -7908 -5954
## 63 ¦ ¦ ¦ °--RIGHT FRONT SEAT 119 -1214 -819
## 64 ¦ ¦ °--TWO DOOR SEDAN NA NA NA
## 65 ¦ ¦ ¦--LEFT FRONT SEAT 510 -3327 -2851
## 66 ¦ ¦ °--RIGHT FRONT SEAT 555 -1570 -2509
## 67 ¦ ¦--MITSUBISHI NA NA NA
## 68 ¦ ¦ °--FOUR DOOR SEDAN NA NA NA
## 69 ¦ ¦ ¦--LEFT FRONT SEAT 679 -6788 -3487
## 70 ¦ ¦ °--RIGHT FRONT SEAT 472 -1993 -3692
## 71 ¦ ¦--NISSAN NA NA NA
## 72 ¦ ¦ °--FOUR DOOR SEDAN NA NA NA
## 73 ¦ ¦ ¦--LEFT FRONT SEAT 818 -6219 -3043
## 74 ¦ ¦ °--RIGHT FRONT SEAT 907 -4284 -4559
## 75 ¦ ¦--OLDSMOBILE NA NA NA
## 76 ¦ ¦ °--FOUR DOOR SEDAN NA NA NA
## 77 ¦ ¦ ¦--LEFT FRONT SEAT 473 -4777 -5849
## 78 ¦ ¦ °--RIGHT FRONT SEAT 829 -5035 -5058
## 79 ¦ ¦--PLYMOUTH NA NA NA
## 80 ¦ ¦ °--VAN NA NA NA
## 81 ¦ ¦ ¦--LEFT FRONT SEAT 426 -1793 -3625
## 82 ¦ ¦ °--RIGHT FRONT SEAT 175 -231 -2202
## 83 ¦ ¦--PONTIAC NA NA NA
## 84 ¦ ¦ °--FOUR DOOR SEDAN NA NA NA
## 85 ¦ ¦ ¦--LEFT FRONT SEAT 360 -6122 -6291
## 86 ¦ ¦ °--RIGHT FRONT SEAT 768 -6668 -3134
## 87 ¦ ¦--SAAB NA NA NA
## 88 ¦ ¦ °--FOUR DOOR SEDAN NA NA NA
## 89 ¦ ¦ °--LEFT FRONT SEAT 361 -4786 -5534
## 90 ¦ ¦--TOYOTA NA NA NA
## 91 ¦ ¦ °--FOUR DOOR SEDAN NA NA NA
## 92 ¦ ¦ ¦--LEFT FRONT SEAT 428 -6183 -7139
## 93 ¦ ¦ °--RIGHT FRONT SEAT 649 -1361 -1210
## 94 ¦ °--VOLVO NA NA NA
## 95 ¦ °--FOUR DOOR SEDAN NA NA NA
## 96 ¦ ¦--LEFT FRONT SEAT 583 -4448 -4448
## 97 ¦ ¦--LEFT REAR SEAT 630 -4448 -4448
## 98 ¦ °--RIGHT FRONT SEAT 835 -4333 -205
## 99 ¦--1993 NA NA NA
## 100 ¦ °--... 22 nodes w/ 96 sub NA NA NA
## 101 °--... 34 nodes w/ 4642 sub NA NA NA
subset(data2, (data2$HIC >= 1 & data2$HIC <= 10) & (data2$LFL >= -10 & data2$LFL <= -1) & (data2$RFL >= -10 & data2$RFL <= -1), select=c(MAKED, YEAR, BODYD, OCCLOCD, HIC, LFL, RFL))
## MAKED YEAR BODYD OCCLOCD HIC LFL RFL
## 4524 SMART 2008 TWO DOOR COUPE LEFT FRONT SEAT 8 -6 -5
## 5083 TOYOTA 2007 FOUR DOOR SEDAN LEFT FRONT SEAT 1 -6 -5
subset(data2, (data2$HIC >= 11 & data2$HIC <= 50) & (data2$LFL >= -50 & data2$LFL <= -11) & (data2$RFL >= -50 & data2$RFL <= -11), select=c(MAKED, YEAR, BODYD, OCCLOCD, HIC, LFL, RFL))
## MAKED YEAR BODYD OCCLOCD HIC LFL RFL
## 167 BMW 2004 OTHER LEFT FRONT SEAT 11 -42 -26
## 169 BMW 2004 OTHER LEFT FRONT SEAT 15 -28 -47
## 173 BMW 2008 FOUR DOOR SEDAN LEFT FRONT SEAT 21 -30 -41
## 182 BMW 2009 TWO DOOR COUPE LEFT FRONT SEAT 27 -23 -15
## 336 CADILLAC 2010 FOUR DOOR SEDAN LEFT FRONT SEAT 18 -40 -46
## 781 CHEVROLET 2007 FOUR DOOR SEDAN LEFT FRONT SEAT 20 -40 -46
## 824 CHEVROLET 2010 TWO DOOR COUPE LEFT FRONT SEAT 11 -27 -31
## 836 CHEVROLET 2011 FOUR DOOR SEDAN LEFT FRONT SEAT 11 -37 -39
## 900 CHEVROLET 2014 FOUR DOOR SEDAN LEFT FRONT SEAT 12 -35 -36
## 902 CHEVROLET 2014 FOUR DOOR SEDAN LEFT FRONT SEAT 22 -17 -16
## 1406 DODGE 2007 PICKUP TRUCK LEFT FRONT SEAT 15 -28 -21
## 1413 DODGE 2008 FIVE DOOR HATCHBACK LEFT FRONT SEAT 13 -40 -33
## 1431 DODGE 2009 OTHER LEFT FRONT SEAT 14 -26 -27
## 1487 FIAT 2014 OTHER LEFT FRONT SEAT 16 -17 -22
## 1972 FORD 2005 OTHER LEFT FRONT SEAT 28 -48 -45
## 2096 FORD 2008 OTHER LEFT FRONT SEAT 17 -26 -22
## 2219 FORD 2013 FOUR DOOR SEDAN LEFT FRONT SEAT 17 -45 -40
## 2333 GMC 2008 OTHER LEFT FRONT SEAT 12 -39 -34
## 2815 HONDA 2008 OTHER LEFT FRONT SEAT 20 -43 -45
## 2833 HONDA 2010 FOUR DOOR SEDAN LEFT FRONT SEAT 28 -27 -29
## 2838 HONDA 2010 OTHER LEFT FRONT SEAT 24 -37 -37
## 2879 HONDA 2013 FOUR DOOR SEDAN LEFT FRONT SEAT 27 -43 -33
## 3071 HYUNDAI 2007 OTHER LEFT FRONT SEAT 50 -45 -41
## 3213 JAGUAR 2005 FOUR DOOR SEDAN LEFT FRONT SEAT 21 -12 -17
## 3257 JEEP 2004 OTHER LEFT FRONT SEAT 28 -23 -37
## 3410 KIA 2011 OTHER LEFT FRONT SEAT 12 -17 -12
## 3430 KIA 2014 OTHER LEFT FRONT SEAT 20 -34 -15
## 3442 KIA 2015 OTHER LEFT FRONT SEAT 19 -39 -34
## 3572 LINCOLN 2010 FOUR DOOR SEDAN LEFT FRONT SEAT 15 -11 -37
## 3643 MAZDA 2007 OTHER LEFT FRONT SEAT 41 -34 -47
## 3678 MAZDA 2014 FOUR DOOR SEDAN LEFT FRONT SEAT 19 -20 -21
## 3698 MAZDA 2015 FOUR DOOR SEDAN LEFT FRONT SEAT 26 -37 -41
## 4113 NISSAN 2005 4 DOOR PICKUP LEFT FRONT SEAT 14 -36 -45
## 4162 NISSAN 2008 OTHER LEFT FRONT SEAT 30 -37 -40
## 4176 NISSAN 2009 FOUR DOOR SEDAN LEFT FRONT SEAT 27 -30 -38
## 4180 NISSAN 2009 OTHER LEFT FRONT SEAT 17 -34 -23
## 4188 NISSAN 2010 OTHER LEFT FRONT SEAT 17 -15 -42
## 4203 NISSAN 2011 OTHER LEFT FRONT SEAT 34 -36 -34
## 4232 NISSAN 2014 FOUR DOOR SEDAN LEFT FRONT SEAT 16 -21 -15
## 4233 NISSAN 2014 FOUR DOOR SEDAN LEFT FRONT SEAT 19 -36 -34
## 4235 NISSAN 2014 FOUR DOOR SEDAN LEFT FRONT SEAT 34 -16 -11
## 4236 NISSAN 2014 FOUR DOOR SEDAN LEFT FRONT SEAT 34 -45 -47
## 4413 PORSCHE 2006 TWO DOOR COUPE LEFT FRONT SEAT 15 -37 -45
## 4451 SAAB 2006 FOUR DOOR SEDAN LEFT FRONT SEAT 25 -13 -27
## 4513 SATURN 2008 OTHER LEFT FRONT SEAT 24 -26 -18
## 4704 SUZUKI 2010 FOUR DOOR SEDAN LEFT FRONT SEAT 22 -46 -40
## 4989 TOYOTA 2004 FOUR DOOR SEDAN LEFT FRONT SEAT 31 -26 -34
## 5085 TOYOTA 2007 FOUR DOOR SEDAN LEFT FRONT SEAT 20 -50 -44
## 5086 TOYOTA 2007 FOUR DOOR SEDAN LEFT FRONT SEAT 24 -37 -42
## 5105 TOYOTA 2008 OTHER LEFT FRONT SEAT 16 -40 -41
## 5133 TOYOTA 2009 OTHER LEFT FRONT SEAT 11 -30 -40
## 5135 TOYOTA 2009 OTHER LEFT FRONT SEAT 49 -50 -47
## 5160 TOYOTA 2010 PICKUP TRUCK LEFT FRONT SEAT 12 -24 -15
## 5176 TOYOTA 2011 FOUR DOOR SEDAN LEFT FRONT SEAT 13 -46 -48
## 5181 TOYOTA 2011 FOUR DOOR SEDAN LEFT FRONT SEAT 28 -35 -27
## 5182 TOYOTA 2011 FOUR DOOR SEDAN LEFT FRONT SEAT 32 -12 -13
## 5282 TOYOTA 2015 FOUR DOOR SEDAN LEFT FRONT SEAT 12 -36 -35
## 5390 VOLKSWAGEN 2007 FOUR DOOR SEDAN LEFT FRONT SEAT 20 -36 -34
## 5392 VOLKSWAGEN 2007 FOUR DOOR SEDAN LEFT FRONT SEAT 28 -19 -39
## 5506 VOLVO 2007 OTHER LEFT FRONT SEAT 47 -12 -18
## 5514 VOLVO 2011 FOUR DOOR SEDAN LEFT FRONT SEAT 46 -45 -39
subset(data2, (data2$HIC >= 51 & data2$HIC <= 100) & (data2$LFL >= -100 & data2$LFL <= -51) & (data2$RFL >= -100 & data2$RFL <= -51), select=c(MAKED, YEAR, BODYD, OCCLOCD, HIC, LFL, RFL))
## MAKED YEAR BODYD OCCLOCD HIC LFL RFL
## 2019 FORD 2006 FOUR DOOR SEDAN LEFT REAR SEAT 74 -99 -94
## 2031 FORD 2006 FOUR DOOR SEDAN RIGHT FRONT SEAT 65 -76 -52
## 2825 HONDA 2009 OTHER LEFT FRONT SEAT 92 -57 -54
## 2853 HONDA 2011 OTHER LEFT FRONT SEAT 54 -82 -79
## 3345 KIA 2005 OTHER LEFT FRONT SEAT 53 -98 -79
## 4158 NISSAN 2008 FOUR DOOR SEDAN LEFT FRONT SEAT 61 -75 -86
## 5406 VOLKSWAGEN 2009 OTHER LEFT FRONT SEAT 79 -61 -54
In the first output, we select a couple of vehicles with the least amount of injury to the head and legs. In addition, two additional ranges of vehicles are provided, where minimal head and leg injuries were reported. These display results with vehicle year, vehicle body type, and occupant location in the vehicle. The occupant location, is the test dummy used to measure different crash impact forces.
Difficulty Encountered: As mentioned above, visualizing the large number of rows is difficult with any ease. This presented challenges in displaying the results so a consumer can easily find their vehicle of choice. So, we decided to present the results in subsets of the overall results.
References:
https://www.intmath.com/applications-integration/hic-part2.php
https://www.nhtsa.gov/sites/nhtsa.dot.gov/files/vehdb-v4.pdf
https://www.nhtsa.gov/sites/nhtsa.dot.gov/files/rev_criteria_0.pdf
https://www.nhtsa.gov/sites/nhtsa.dot.gov/files/rev_criteria_0.pdf
https://en.wikipedia.org > wiki > Head_injury_criterion