2025-03-25

Our Purpose

This dataset contains information on vehicle thefts in 2015, detailing the total number of thefts for various makes and models across different states. We analyzed it to give insurance companies meaningful insights into vehicle thefts by state, make, and model. This information can help companies assess which vehicles carry a higher theft risk and guide their coverage decisions.

Graph 1: Total Thefts by Make/Model

Analysis of Graph 1

The chart shows which vehicle makes and models had the most thefts in 2015. The distribution is heavily skewed: a small number of models account for a large share of total thefts. Following the Pareto principle, 20% of the models account for 80% of total thefts, which underscores the high risk attached to these specific models. The models with the most thefts are Ford Pickups, Chevy Pickups, Honda Accords, Honda Civics, and Toyota Camrys. Insurance companies looking at this chart may charge higher rates to cover these vehicles.
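The Pareto claim can be checked directly by ranking models by thefts and computing the cumulative share. The following is a minimal sketch on made-up data: the toy counts are placeholders, not values from the 2015 dataset, and only the column names Make.Model and Thefts are taken from the report's own code.

```r
library(dplyr)

# Toy data for illustration only; these counts are NOT from the 2015 dataset.
toy <- data.frame(
  Make.Model = c("Model A", "Model B", "Model C", "Model D", "Model E"),
  Thefts     = c(500, 300, 100, 60, 40)
)

# Rank models by total thefts and track the cumulative share of all thefts.
pareto <- toy %>%
  group_by(Make.Model) %>%
  summarise(TotalThefts = sum(Thefts, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(TotalThefts)) %>%
  mutate(TheftShare = cumsum(TotalThefts) / sum(TotalThefts))

# Number of models that make up the top 20% (rounded up to at least one model).
top_n_models <- ceiling(0.20 * nrow(pareto))

# Share of all thefts covered by that top 20% of models.
top20_share <- pareto$TheftShare[top_n_models]
```

Running the same steps on the real data would show how close the top 20% of models actually come to 80% of thefts.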

Graph 2: Year vs. Make, focusing on top 5 makes

Analysis of Graph 2

If insurance companies want another variable for a more accurate picture of thefts, they can use this graph, which combines model year and make. Because the dataset contains many different makes, the graph shows only the top 5 as an illustration, but the summary statistics below are for the dataset as a whole. Since these 5 makes account for the most thefts, the graph reflects those statistics well.
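One way to restrict a plot to the top 5 makes is to total thefts per make, keep the five largest, and filter the original rows. This is a sketch on made-up data, not necessarily how the graph was produced; the make names and counts are hypothetical, while word(Make.Model, 1) mirrors the report's own code for extracting the make.

```r
library(dplyr)
library(stringr)

# Toy data for illustration only; names and counts are hypothetical.
toy <- data.frame(
  Make.Model = c("Alpha One", "Beta Two", "Gamma Three", "Delta Four",
                 "Epsilon Five", "Zeta Six", "Alpha Seven"),
  Thefts     = c(70, 60, 50, 40, 30, 20, 10)
)

# Take the first word of Make.Model as the make.
toy <- toy %>% mutate(Make = word(Make.Model, 1))

# Total thefts per make, then keep the five makes with the most thefts.
top_makes <- toy %>%
  group_by(Make) %>%
  summarise(TotalThefts = sum(Thefts, na.rm = TRUE), .groups = "drop") %>%
  slice_max(TotalThefts, n = 5) %>%
  pull(Make)

# Rows belonging to the top 5 makes, ready for plotting.
top_only <- toy %>% filter(Make %in% top_makes)
```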

Min: The oldest model year recorded in the dataset is 1989.
1st Quartile: 25% of the vehicles have a model year of 1998 or earlier.
Median: The median year is 2002, meaning half of the vehicles are from 2002 or earlier.
Mean: The average model year is 2003, slightly higher than the median, suggesting the data is skewed toward more recent years.
3rd Quartile: 75% of the vehicles have a model year of 2006 or earlier.
Max: The newest model year is 2015.
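A five-number summary and mean like the one above can be computed with base R's quantile() and mean(). The sketch below uses a made-up vector of model years rather than the real data, and it is an assumption that the report computed its statistics this way; the final comparison shows the mean-versus-median check behind the skew remark.

```r
# Toy model years for illustration only; not the real dataset.
toy_years <- c(1989, 1996, 1998, 2000, 2002, 2004, 2006, 2010, 2015)

# Min, quartiles, and max (R's default quantile method, type 7).
q <- quantile(toy_years, probs = c(0, 0.25, 0.5, 0.75, 1))

# Arithmetic mean of the model years.
avg_year <- mean(toy_years)

# A mean above the median hints at a longer tail toward recent years.
mean_above_median <- avg_year > q[["50%"]]
```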

Focusing on this graph specifically, we see that newer Toyota cars are more frequent targets for theft.

R Code of Graph 3

library(dplyr)
library(plotly)
library(stringr)

# Assumes stolen_cars is already loaded with columns State, Make.Model, and Thefts.
# Convert theft counts to numeric and take the first word of Make.Model as the make.
stolen_cars <- stolen_cars %>%
  mutate(
    Thefts = as.numeric(Thefts),
    Make = word(Make.Model, 1)
  )

# Total thefts for each state and make combination.
state_make <- stolen_cars %>%
  group_by(State, Make) %>%
  summarise(TotalThefts = sum(Thefts, na.rm = TRUE)) %>%
  ungroup()

# Bubble chart: one marker per state and make, sized and colored by total thefts.
plot_ly(
  data = state_make,
  x = ~State,
  y = ~Make,
  type = 'scatter',
  mode = 'markers',
  marker = list(
    size = ~log(TotalThefts + 1) * 3,  # scale marker size using a logarithmic transform
    color = ~TotalThefts,
    colorscale = 'Viridis',
    showscale = TRUE,
    opacity = 0.8
  ),
  # Hover text; <br> starts a new line in the tooltip.
  text = ~paste("State:", State, "<br>Make:", Make, "<br>Total Thefts:", TotalThefts)
) %>%
  layout(
    title = "Total Thefts of Each Make in Each State",
    xaxis = list(
      title = "State",
      type = "category",
      tickangle = 45,
      automargin = TRUE,
      domain = c(0, 1)
    ),
    yaxis = list(title = "Make"),
    margin = list(l = 0, r = 50, t = 50, b = 100)
  )

Graph 3: Total Thefts of Make in each State

Analysis of Graph 3

Insurance companies that want state-level theft information to better estimate rates for covering certain vehicles can use this graph. It shows that Chevrolets and Hondas have a high volume of thefts in a small number of states.

Graph 4: Make vs Year vs Total Thefts

Analysis of Graph 4

In our final graph, we include a third variable, combining make, model year, and total thefts. Plotting total thefts on the z axis makes it easy to spot any make and year combination with a high theft volume. The highest total for a specific make and year is Ford in 2006, with 2,022 thefts. Insurance companies can spot these high-volume combinations and decide quickly whether to cover such vehicles and, if so, how much to charge.
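The make and year combination with the most thefts can also be found without reading it off the plot. The sketch below aggregates a made-up table and picks the largest cell; the column name Year and all values are assumptions for illustration, since the report does not show how the year column is named.

```r
library(dplyr)

# Toy data for illustration only; Year is an assumed column name.
toy <- data.frame(
  Make   = c("Make A", "Make A", "Make B", "Make B", "Make A"),
  Year   = c(2005, 2006, 2006, 2006, 2006),
  Thefts = c(10, 40, 25, 30, 5)
)

# Total thefts for each make and model year combination.
make_year <- toy %>%
  group_by(Make, Year) %>%
  summarise(TotalThefts = sum(Thefts, na.rm = TRUE), .groups = "drop")

# The single combination with the highest total.
top_cell <- make_year %>% slice_max(TotalThefts, n = 1)
```

The make_year table has the same shape a 3D plot needs: one row per make and year, with the total as the height.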

Conclusion

Our analysis provided valuable insights into vehicle theft patterns across different states, makes, and model years. The visualizations highlighted the most frequently stolen models and identified trends that could help insurance companies assess risk more effectively. These findings can guide decisions on coverage policies and potential premium adjustments.