This project involves data analysis using the R programming language, focusing on a data set related to fatal accidents involving Tesla vehicles. The data set includes detailed information about incidents such as driver involvement, pedestrian or cyclist presence, the status of Tesla’s Autopilot feature, and descriptions of the events leading to the accidents.
The primary goal of the analysis is to uncover patterns and insights surrounding the circumstances of these fatal events. By processing, transforming, and visualizing the data, the project aims to understand how factors like Autopilot usage, driver error, and pedestrian involvement contribute to these accidents.
By identifying key trends, this project can contribute to a broader understanding of Tesla-related accidents and potentially inform future improvements in vehicle safety and public policy.
Libraries (in R terminology, packages) are collections of functions, data, and documentation developed to extend R’s base capabilities. They provide tools for tasks such as visualization, data manipulation, and building web apps.
Libraries used in this project are:
shiny -> interactive web apps
leaflet -> interactive maps
dplyr -> data manipulation (filter, select, mutate…)
ggplot2 -> data visualization
rnaturalearth -> geospatial data for countries
rnaturalearthdata -> data used by rnaturalearth
sf -> simple features (spatial vector data)
countrycode -> country name conversion
maps -> map drawing
lubridate -> date and time operations
tools -> utility functions
plotly -> interactive plots
library(shiny)
library(leaflet)
library(dplyr)
library(ggplot2)
library(rnaturalearth)
library(rnaturalearthdata)
library(sf)
library(countrycode)
library(maps)
library(lubridate)
library(tools)
library(plotly)
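Before the library() calls run, it can help to confirm that every package is installed. The following is a minimal sketch, assuming a hypothetical helper named missing_packages() that is not part of the original project. RColorBrewer is included because the pie chart code calls RColorBrewer::brewer.pal() without loading the package.

```r
# Hypothetical helper: returns the names of packages that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

required <- c("shiny", "leaflet", "dplyr", "ggplot2", "rnaturalearth",
              "rnaturalearthdata", "sf", "countrycode", "maps",
              "lubridate", "tools", "plotly", "RColorBrewer")

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
}
```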
The data set was loaded into a variable named “data” with the read.csv() function. Inspection showed that some columns held nothing useful for the analysis, so these columns were removed:
Deceased.4 -> it is a blank column
Source -> column contained links from where the data was taken
Note -> mostly blank, contains irrelevant data
The filtered data set is stored in the variable “data_filtered”.
data <- read.csv("C:\\Users\\rinod\\Documents\\faks\\erasmus+\\data_processing\\tesla_project\\Tesla Deaths - Deaths (3).csv")
data_filtered <- data[, !(names(data) %in% c("Deceased.4", "Source", "Note"))]
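One way to find columns like Deceased.4 is to measure how much of each column is blank. The sketch below uses a made-up toy data frame (not the real CSV) and treats NA and whitespace-only values as blank. The names toy and blank_share are illustrative assumptions.

```r
# Toy data frame standing in for the real CSV (values are made up).
toy <- data.frame(
  Case = 1:4,
  Note = c("", " ", NA, "see report"),
  Deceased.4 = c(NA, "", "", " "),
  stringsAsFactors = FALSE
)

# Share of blank (NA or whitespace-only) values per column.
blank_share <- vapply(toy, function(col) {
  mean(is.na(col) | trimws(as.character(col)) == "")
}, numeric(1))

blank_share

# Drop columns that are entirely blank, using the same %in% pattern as above.
fully_blank <- names(blank_share)[blank_share == 1]
toy_filtered <- toy[, !(names(toy) %in% fully_blank), drop = FALSE]
names(toy_filtered)
```

Columns that are only mostly blank, like Note here, still need a manual decision, which matches how the columns were chosen in this project.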
For the analysis, some columns had to be reformatted or split in order to obtain usable data.
One of those columns is “Date”, which holds values such as “8/16/2022”. In this form the value is a plain string that cannot be grouped or ordered in time, so it is parsed into a date, and “Year” and “Month” are extracted from it to build a timeline of accidents.
Two new variables, “monthly_counts” and “yearly_counts”, store the number of accidents per month and per year. The data is grouped by month or year and the rows in each group are counted, which later feeds the timeline plots.
data_filtered$Date <- mdy(data_filtered$Date)
data_filtered$Year <- year(data_filtered$Date)
data_filtered$Month <- floor_date(data_filtered$Date, "month")
# number of accidents per month
monthly_counts <- data_filtered %>%
  group_by(Month) %>%
  summarise(Accidents = n()) %>%
  arrange(Month)
# number of accidents per year (rows without a parsed date are dropped)
yearly_counts <- data_filtered %>%
  filter(!is.na(Year)) %>%
  group_by(Year) %>%
  summarise(Fatal_Accidents = n()) %>%
  arrange(Year)
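The date handling can be tried on a few made-up values to see what each step returns. This is a small sketch using the same lubridate and dplyr functions. The toy data frame is an assumption for illustration, and count() is used as a shorthand for the group_by() and summarise() pair.

```r
library(lubridate)
library(dplyr)

# Made-up dates in the same month/day/year format as the "Date" column.
toy <- data.frame(Date = c("8/16/2022", "8/2/2022", "12/30/2021"),
                  stringsAsFactors = FALSE)

toy$Date  <- mdy(toy$Date)                  # "8/16/2022" becomes the Date 2022-08-16
toy$Year  <- year(toy$Date)                 # 2022, 2022, 2021
toy$Month <- floor_date(toy$Date, "month")  # first day of the month, e.g. 2022-08-01

# count() groups by Month and counts the rows in each group.
toy_monthly <- toy %>% count(Month, name = "Accidents")
toy_monthly
```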
The “Deaths” column shows how many people died in each accident. For visualization, the data is grouped by the number of deaths and the accidents in each group are counted. The result is stored in the variable “death_counts”.
death_counts <- data_filtered %>%
  filter(!is.na(Deaths)) %>%
  group_by(Deaths) %>%
  summarise(Accidents = n()) %>%
  arrange(Deaths)
Autopilot is Tesla’s driver-assistance software, which can steer, accelerate, and brake the car while the driver is expected to stay attentive and ready to take over. Whether Autopilot was engaged is therefore an important factor when examining an accident. The values were divided into two groups: “Yes”, which indicates that Autopilot was claimed to be engaged at the moment of the accident, and “No”, which indicates that Autopilot was not involved. The summary is stored in the variable “autopilot_summary”.
data_filtered$AutoPilot.claimed[data_filtered$AutoPilot.claimed == " - "] <- "No"
data_filtered$AutoPilot.claimed[data_filtered$AutoPilot.claimed != "No"] <- "Yes"
autopilot_summary <- data_filtered %>%
  filter(!is.na(AutoPilot.claimed)) %>%
  group_by(AutoPilot.claimed) %>%
  summarise(Count = n())
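The recoding above depends on the placeholder being exactly " - ", including the surrounding spaces. A slightly more tolerant variant trims whitespace first. This is a sketch only: recode_autopilot() is a hypothetical helper and the input values are made up.

```r
# Hypothetical helper: maps the placeholder "-" to "No" and any other
# non-missing value to "Yes"; missing values stay NA.
recode_autopilot <- function(x) {
  cleaned <- trimws(as.character(x))
  ifelse(cleaned == "-", "No", "Yes")
}

toy <- c(" - ", "-", "1", NA, "2")   # made-up values
recoded <- recode_autopilot(toy)
recoded
table(recoded, useNA = "ifany")
```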
The “Description” column holds a free-text explanation of each accident. Because almost every description is different, the values had to be generalized. All accidents where a Tesla crashed into a moving or stationary obstacle were labeled “Tesla Crash”. If a Tesla hit a pedestrian or a cyclist, the value was changed to “Pedestrian Involved”. The remaining accidents, where the driver misused the Tesla and caused the accident, were labeled “Driver Error”. The pedestrian and driver-error cases are identified by their case numbers, listed below.
pedestrian_cases <- c(
179, 172, 170, 160, 150, 139, 129, 116, 106, 84,
71, 70, 63, 39, 36, 28, 26, 14, 9, 2,
251, 247, 228, 227, 224, 222, 191, 188, 183
)
driver_error_cases <- c(
171, 165, 149, 120, 117, 99,
81, 56, 45, 42, 27, 252, 239, 234, 232,
220, 219, 215, 207, 205, 189, 185
)
data_filtered$Description[data_filtered$Description != " "] <- "Tesla Crash"
data_filtered$Description[data_filtered$Case.. %in% pedestrian_cases] <- "Pedestrian Involved"
data_filtered$Description[data_filtered$Case.. %in% driver_error_cases] <- "Driver Error"
description_counts <- data_filtered %>%
  filter(Description %in% c("Pedestrian Involved", "Tesla Crash", "Driver Error")) %>%
  group_by(Description) %>%
  summarise(Count = n()) %>%
  ungroup()
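The categories above were assigned by hand through case numbers. As a possible cross-check, the same three labels can be approximated with keyword matching. This is a sketch under stated assumptions, not the method used in this project: classify_description() and its keyword lists are hypothetical, and a keyword rule will misclassify descriptions that a manual review would catch.

```r
# Hypothetical keyword-based classifier for free-text descriptions.
# Every description defaults to "Tesla Crash"; keywords then override it.
# "Driver Error" is applied last, so it wins when both patterns match,
# which mirrors the assignment order used above.
classify_description <- function(desc) {
  d <- tolower(desc)
  category <- rep("Tesla Crash", length(d))
  category[grepl("pedestrian|cyclist|bicycl", d)] <- "Pedestrian Involved"
  category[grepl("dui|drunk|street rac", d)] <- "Driver Error"
  category
}

examples <- c(
  "Tesla car crashes into tree, burns",
  "Tesla hits motorcycle",
  "Tesla hits pedestrian",
  "DUI crash on highway, Tesla catches fire"
)

data.frame(Description = examples, Category = classify_description(examples))
```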
A key part of the analysis is whether the deceased was a Tesla occupant or someone else. A Tesla occupant is a person who was driving a Tesla or was a passenger in one. Everyone else, such as pedestrians, cyclists, or occupants of other vehicles, is counted under “Others”.
data_filtered$Tesla.driver[data_filtered$Tesla.driver == " - "] <- "0"
data_filtered$Tesla.occupant[data_filtered$Tesla.occupant == " - "] <- "0"
data_filtered$Other.vehicle[data_filtered$Other.vehicle == " - "] <- "0"
data_filtered$Cyclists..Peds[data_filtered$Cyclists..Peds == " - "] <- "0"
data_filtered$Tesla.driver <- as.integer(data_filtered$Tesla.driver)
data_filtered$Tesla.occupant <- as.integer(data_filtered$Tesla.occupant)
data_filtered$Other.vehicle <- as.integer(data_filtered$Other.vehicle)
data_filtered$Cyclists..Peds <- as.integer(data_filtered$Cyclists..Peds)
tesla_total <- sum(data_filtered$Tesla.driver, data_filtered$Tesla.occupant, na.rm = TRUE)
others_total <- sum(data_filtered$Other.vehicle, data_filtered$Cyclists..Peds, na.rm = TRUE)
bar_data <- data.frame(
  Category = c("Tesla Occupants", "Others"),
  Count = c(tesla_total, others_total)
)
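The four count columns all go through the same two steps: replace the placeholder with zero, then convert to integer. The repeated lines can be folded into one helper applied to every column. The sketch below assumes a hypothetical to_count() function and made-up toy values.

```r
# Hypothetical helper: turns the "-" placeholder into 0 and converts to integer.
to_count <- function(x) {
  x <- trimws(as.character(x))
  x[x == "-"] <- "0"
  as.integer(x)
}

# Made-up rows with the same column names as the real data.
toy <- data.frame(
  Tesla.driver   = c("1", " - ", "1"),
  Tesla.occupant = c(" - ", " - ", " - "),
  Other.vehicle  = c(" - ", "2", " - "),
  Cyclists..Peds = c("1", " - ", " - "),
  stringsAsFactors = FALSE
)

# Apply the conversion to every count column in one step.
count_cols <- names(toy)
toy[count_cols] <- lapply(toy[count_cols], to_count)

toy_tesla_total  <- sum(toy$Tesla.driver, toy$Tesla.occupant, na.rm = TRUE)
toy_others_total <- sum(toy$Other.vehicle, toy$Cyclists..Peds, na.rm = TRUE)
c(Tesla = toy_tesla_total, Others = toy_others_total)
```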
To count the deceased who are recorded by name, the columns Deceased.1, Deceased.2 and Deceased.3 were combined and blank entries were removed.
deceased_combined <- c(data_filtered$Deceased.1,
data_filtered$Deceased.2,
data_filtered$Deceased.3)
deceased_clean <- na.omit(trimws(deceased_combined))
deceased_clean <- deceased_clean[deceased_clean != "" & !is.na(deceased_clean)]
After the data has been cleaned, filtered, and formatted, the analysis can be performed. It covers the number of deaths, the types of accidents, and the geographic distribution of the accidents.
The timeline plot below shows the number of accidents per month. The peak is May 2022, with a total of 14 accidents. A rising trend is visible across the whole graph, which likely reflects the growing number of Tesla vehicles on the road over the years.
#accidents per month
p_timeline <- ggplot(monthly_counts, aes(x = Month, y = Accidents, group = 1,
                                         text = paste0("Month: ", Month, "<br>Accidents: ", Accidents))) +
  geom_line(color = "red", size = 1.2) +
  geom_point(color = "darkred") +
  labs(title = "Tesla Accidents Over Time (Monthly)",
       x = "Month",
       y = "Number of Accidents") +
  theme_minimal()
ggplotly(p_timeline, tooltip = "text")
The timeline plot below shows the number of accidents per year. The peak is 2021, with a total of 58 accidents. The yearly view shows the same rising trend, which likely reflects the growing number of Tesla vehicles on the road.
#accidents per year
plot_ly(data = yearly_counts,
        x = ~Year,
        y = ~Fatal_Accidents,
        type = 'scatter',
        mode = 'lines+markers',
        line = list(color = "darkred"),
        marker = list(size = 6),
        text = ~paste("Year:", Year, "<br>Accidents:", Fatal_Accidents),
        hoverinfo = 'text') %>%
  layout(title = "Tesla Accidents per Year",
         xaxis = list(title = "Year", tickformat = "d"),
         yaxis = list(title = "Number of Accidents"))
The number of deaths in one accident indicates the severity of the accident. The pie chart below shows the distribution of the number of deaths per accident. Most accidents (almost 84%) had a single fatality, so accidents with multiple fatalities are comparatively rare. Accidents involving more than one deceased account for 41 deaths.
#number of deaths per accident
plot_ly(death_counts,
        labels = ~Deaths,
        values = ~Accidents,
        type = 'pie',
        textinfo = 'label+percent',
        hoverinfo = 'label+value+percent',
        marker = list(colors = RColorBrewer::brewer.pal(n = nrow(death_counts), name = "Reds"))) %>%
  layout(title = "Number of Deaths per Tesla Accident")
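The share of single-fatality accidents and the number of deaths in multi-fatality accidents both follow from a table shaped like death_counts. The sketch below shows the arithmetic on made-up numbers. The toy_counts values are hypothetical and do not reproduce the project's results.

```r
# Made-up table with the same columns as death_counts.
toy_counts <- data.frame(Deaths = c(1, 2, 3), Accidents = c(8, 1, 1))

# Share of accidents in each group (what the pie chart displays).
toy_counts$Share <- toy_counts$Accidents / sum(toy_counts$Accidents)
single_share <- toy_counts$Share[toy_counts$Deaths == 1]

# Deaths in accidents with more than one fatality:
# each group contributes Deaths * Accidents.
multi <- toy_counts[toy_counts$Deaths > 1, ]
multi_deaths <- sum(multi$Deaths * multi$Accidents)

c(single_share = single_share, multi_deaths = multi_deaths)
```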
Knowing whether Autopilot was involved can narrow down an investigation, since the reliability of the technology is still debated. The pie chart below compares accidents where Autopilot was claimed to be active with accidents where it was not. In 224 of 254 accidents Autopilot was not claimed to be on, which leaves 30 accidents (about 12%) with Autopilot involved. These counts alone do not show how safe the technology is, because the data set does not say how much driving happens with and without Autopilot.
plot_ly(autopilot_summary,
        labels = ~AutoPilot.claimed,
        values = ~Count,
        type = "pie",
        textinfo = "label+percent",
        hoverinfo = "label+value+percent",
        marker = list(colors = c("steelblue", "firebrick"))
) %>%
  layout(title = "Tesla Fatal Accidents: Autopilot Claimed (Yes vs No)")
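The percentages on the pie chart can be reproduced from the two counts quoted in the text (224 accidents without Autopilot out of 254). A short sketch of that arithmetic, with illustrative variable names:

```r
total_accidents <- 254   # total quoted in the text
no_autopilot    <- 224   # accidents without Autopilot, quoted in the text

counts <- c(No = no_autopilot, Yes = total_accidents - no_autopilot)
shares <- round(100 * counts / sum(counts), 1)

counts   # No = 224, Yes = 30
shares   # No = 88.2, Yes = 11.8 (percent)
```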
The description of an accident can help determine who was at fault. The descriptions are free text with almost no repetition, so for a meaningful analysis they had to be grouped into a few categories. Examples of values in the “Description” column:
Tesla car crashes into tree, burns
Tesla hits motorcycle
Tesla hits pedestrian
DUI crash on highway, Tesla catches fire
Over 250 different descriptions have been formatted into three categories:
Tesla crash - Tesla car crashed into an obstacle or a vehicle crashed into Tesla
Pedestrian involved - Tesla hit a pedestrian
Driver Error - Accident was the driver’s fault (DUI, misuse…)
The results show that almost 80% of the accidents were Tesla crashes. Accidents involving pedestrians account for more than 11%, while in almost 9% of accidents the driver was at fault.
#type of accident
plot_ly(description_counts,
        labels = ~Description,
        values = ~Count,
        type = "pie",
        textinfo = "label+percent",
        hoverinfo = "label+value+percent",
        marker = list(colors = c("darkorange", "darkred", "steelblue"))
) %>%
  layout(title = "Tesla Accidents by Description")
Comparing deaths inside Tesla vehicles with deaths of other traffic participants can suggest whether these accidents are more deadly for Tesla occupants or for others on the road. The two totals are close, with a slightly higher number of deaths among other participants.
153 people died as other traffic participants (occupants of other vehicles, cyclists, or pedestrians).
140 people died inside a Tesla vehicle, either as the driver or as a passenger.
plot_ly(bar_data,
        x = ~Category,
        y = ~Count,
        type = 'bar',
        text = ~paste(Count),
        hoverinfo = 'text',
        marker = list(color = c("firebrick", "steelblue"))) %>%
  layout(title = "Fatalities: Tesla Occupants vs Others",
         xaxis = list(title = ""),
         yaxis = list(title = "Number of Deaths"))
Shiny is an R package that allows users to build interactive web applications directly from R, without requiring knowledge of HTML, CSS, or JavaScript. It provides an intuitive framework to connect user inputs with reactive visual outputs, making it ideal for presenting data analysis in an engaging and dynamic format.
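The link between an input and a reactive output can be seen in a very small app. The following sketch is independent of the dashboard in this project. The object names (ui_demo, server_demo, app_demo) and the slider range are assumptions chosen for illustration.

```r
library(shiny)

# Minimal UI: one slider input and one text output (names are illustrative).
ui_demo <- fluidPage(
  sliderInput("year", "Year:", min = 2013, max = 2023, value = 2020, sep = ""),
  textOutput("selected")
)

# The render function re-runs automatically whenever input$year changes.
server_demo <- function(input, output) {
  output$selected <- renderText({
    paste("Selected year:", input$year)
  })
}

app_demo <- shinyApp(ui = ui_demo, server = server_demo)

# Launch only in an interactive session so that scripts do not block.
if (interactive()) runApp(app_demo)
```

Moving the slider changes input$year, and Shiny re-evaluates renderText() without any explicit event handling code.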
In this project, Shiny is used to build an interactive dashboard that visualizes Tesla-related fatal accidents geographically. The app is designed to allow users to explore the data through two maps:
World map -> Displays a global overview of Tesla-related deaths.
Darker shade of red indicates higher number of deaths by country
The USA stands out as the country with the most Tesla accidents, with 185
After the USA, China has the most Tesla accidents of any single country, with 13
US states map -> Focuses specifically on the United States, showing the number of Tesla fatalities by state
Darker shade of red indicates higher number of deaths by state
California, with 79 deaths, and Florida, with 26, stand out as the states with the most Tesla deaths in the USA
#shiny
data_filtered$Country_clean <- countrycode(data_filtered$Country, origin = "country.name", destination = "country.name")
data_filtered$Country_clean[data_filtered$Country_clean == "United States"] <- "United States of America"
data_filtered$State <- toupper(trimws(data_filtered$State))
data_filtered$State_full <- state.name[match(data_filtered$State, state.abb)]
ui <- fluidPage(
  titlePanel("Tesla Fatal Accidents Dashboard"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("year_range", "Select Year Range:",
                  min = min(data_filtered$Year, na.rm = TRUE),
                  max = max(data_filtered$Year, na.rm = TRUE),
                  value = c(2016, 2025), sep = ""),
      checkboxInput("include_usa", "Include USA in World Map", TRUE),
      checkboxInput("highlight_states", "Highlight California & Florida", TRUE)
    ),
    mainPanel(
      tabsetPanel(
        tabPanel("World Map", leafletOutput("worldMap", height = 500)),
        tabPanel("US Map", leafletOutput("usMap", height = 500))
      )
    )
  )
)
server <- function(input, output) {
  # rows within the year range selected on the slider
  filtered_data <- reactive({
    data_filtered %>% filter(Year >= input$year_range[1], Year <= input$year_range[2])
  })
  output$worldMap <- renderLeaflet({
    country_counts <- filtered_data() %>%
      group_by(Country_clean) %>%
      summarise(Accidents = n())
    world <- ne_countries(scale = "medium", returnclass = "sf")
    world_data <- left_join(world, country_counts, by = c("name" = "Country_clean"))
    if (!input$include_usa) {
      world_data <- world_data[world_data$name != "United States of America", ]
    }
    # the palette is scaled without the USA so that other countries stay distinguishable
    non_usa_vals <- world_data$Accidents[world_data$name != "United States of America"]
    pal <- colorNumeric(palette = "Reds", domain = non_usa_vals, na.color = "lightgray")
    getColor <- function(country, count) {
      if (country == "United States of America") {
        return("black")
      } else if (is.na(count)) {
        return("lightgray")
      } else {
        return(pal(count))
      }
    }
    world_data$fillColor <- mapply(getColor, world_data$name, world_data$Accidents)
    leaflet(world_data) %>%
      addTiles() %>%
      addPolygons(
        fillColor = ~fillColor,
        weight = 1,
        color = "white",
        fillOpacity = 0.7,
        label = ~paste(name, ": ", ifelse(is.na(Accidents), "No data", Accidents)),
        highlightOptions = highlightOptions(color = "black", weight = 2, bringToFront = TRUE)
      ) %>%
      addLegend(pal = pal, values = non_usa_vals, opacity = 0.7,
                title = "Tesla Accidents", position = "bottomright")
  })
  output$usMap <- renderLeaflet({
    state_counts <- filtered_data() %>%
      group_by(State_full) %>%
      summarise(Accidents = n())
    us_states_map <- map("state", fill = TRUE, plot = FALSE)
    us_states_sf <- st_as_sf(us_states_map)
    us_states_sf$State_full <- tools::toTitleCase(us_states_sf$ID)
    map_data <- left_join(us_states_sf, state_counts, by = "State_full")
    # the palette is scaled without California and Florida, which are drawn in black when highlighted
    palette_domain <- map_data$Accidents[!map_data$State_full %in% c("California", "Florida")]
    palUSA <- colorNumeric("Reds", domain = palette_domain, na.color = "lightgray")
    map_data$fillColor <- mapply(function(state, count) {
      if (input$highlight_states && state %in% c("California", "Florida")) {
        return("black")
      } else if (is.na(count)) {
        return("lightgray")
      } else {
        return(palUSA(count))
      }
    }, map_data$State_full, map_data$Accidents)
    leaflet(map_data) %>%
      addTiles() %>%
      addPolygons(
        fillColor = ~fillColor,
        weight = 1,
        color = "white",
        fillOpacity = 0.7,
        label = ~paste(State_full, ": ", ifelse(is.na(Accidents), "No data", Accidents)),
        highlightOptions = highlightOptions(color = "black", weight = 2, bringToFront = TRUE)
      ) %>%
      addLegend(pal = palUSA,
                values = palette_domain,
                opacity = 0.7,
                title = "Tesla Accidents by State",
                position = "bottomright")
  })
}
shinyApp(ui = ui, server = server)
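The state lookup used before the app relies on R's built-in state.abb and state.name vectors, which cover the 50 states only. Abbreviations outside that set, such as DC, come back as NA and therefore never reach the US map. The sketch below shows this on made-up input values. The variable names are illustrative.

```r
# Made-up abbreviations with inconsistent case and whitespace.
raw_states <- c(" ca", "FL", "tx ", "DC")

clean <- toupper(trimws(raw_states))
full  <- state.name[match(clean, state.abb)]
full
# "California" "Florida" "Texas" NA

# Abbreviations that did not match any of the 50 states.
unmatched <- clean[is.na(full)]
unmatched
# "DC"
```

Checking unmatched on the real column would show how many rows are silently left out of the state counts.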
In honor of:
## [1] "James T. Penner" "Douglas David Rockacy"
## [3] "Jeremy Caballero" "Mike Vargas Sr."
## [5] "Kaleb D. Dawson" "James Clinton Davies"
## [7] "Peggy K. Agaplou" "Katryn R. Fisher"
## [9] "Fredrick Scheffler II" "Koelby Edlund"
## [11] "Ms. Au" "Daniel Sincavage"
## [13] "David M. Baum" "Wayne Walter Swanson Jr."
## [15] "Joseph Lucero" "Ruiju Ma"
## [17] "Jose Elizarraraz Bravo" "Yu Nan"
## [19] "Christopher Liang" "Terry L. Siegel"
## [21] "Mitchell Moreno Jr." "Armin W."
## [23] "Kyle Germann" "Ariyanna Parsad"
## [25] "Jean Louis" "Samuel James Morson"
## [27] "William Varner" "Crystal Stash"
## [29] "Yoon Hong-geun" "Kevin Gallardo"
## [31] "Matthew Hathaway" "David Alan Brown"
## [33] "Claudia Günther" "Gary Marchi"
## [35] "Zachary Stringer" "Norma Jean Nixon"
## [37] "Mike Cochlin" "Arthur Henry Oliver"
## [39] "Vladmir Chen" "Zhang"
## [41] "Sarina Astorga" "Augustina Lillie"
## [43] "Jenna Monet" "Charles Barancik"
## [45] "Emelio Perez" "Robert Krottner"
## [47] "Nina Mogster" "Joan Fernando Valerio Jimenez"
## [49] "Marjorie Hill" "Wayne Marbury"
## [51] "Daniel J. Rodkey" "Kaity West"
## [53] "Janet Genao" "Omar Awan"
## [55] "Rod Rhines" "James Accurso"
## [57] "Keith Leund" "Edgar Martinez"
## [59] "Walter Huang" "Joel Michael"
## [61] "Greg Dolphin" "Bryant Gonzalez"
## [63] "James Scott" "Casey Speckman"
## [65] "Joshua Brown" "Gao Yaning"
## [67] "Tim Devine" "Peter Kleis"
## [69] "Louis Francis Thoelecke" "Joshua Slot"
## [71] "Alberto Casique-Salinas" "Alex Francisco Cuesta"
## [73] "Kyle Rieger" "Crystal McCallum"
## [75] "Sarina Astorga" "Madyson James"
## [77] "Luis Elizarraraz Bravo" "Robert G. Bailly"
## [79] "Jazmine Marquez" "Everette Talbot"
## [81] "Henry H. Abrahams" "Frau Sarah"
## [83] "Liam Cochlin" "Barrett Riley"
## [85] "Kevin McCarthy" "Armando Garcia-Gonzales"
## [87] "Andrew James Chaves" "Kiyvon Martin"
## [89] "Claudias Mutter Sylvia" "Quinn Cochlin"