Crime Statistics in the United States, 1979 - 2019

Introduction

Crime and policing have a long and complex history in the United States. In this project we examine trends in different categories of crime since 1979.

Crime has tended to decrease over time in the United States, all the way back to colonial times; however, there have been intermittent disruptions to this trend. In particular, violent crime rates increased from the 1960s and peaked in the early 1990s. We will focus on the violent crime spike of the 1990s to try to develop a deeper understanding of events.

Description of Data

We will be examining the following U.S. Crimes dataset from Kaggle.com:

https://www.kaggle.com/tunguz/us-estimated-crimes

This data is derived from the FBI Summary Reporting System. It reflects the estimates that the FBI has traditionally included in its annual publications.

The dataset includes the following columns:

  • year

  • population

  • state_abbr

  • state_name

as well as yearly state-by-state and nationwide counts, from 1979-2019, for occurrences of crimes in the following categories:

  • violent_crime

  • homicide

  • rape_legacy($)

  • rape_revised($)

  • robbery

  • aggravated_assault

  • property_crime

  • burglary

  • larceny

  • motor_vehicle_theft

$ - In 2013, the FBI updated its working definition of rape from a “forcible” act to an act performed “without consent”. For more information see: https://ucr.fbi.gov/crime-in-the-u.s/2018/crime-in-the-u.s.-2018/topic-pages/rape

Preprocessing

library(tidyverse)
library(ggthemes)
library(scales)
library(RColorBrewer)
library(tools)
us_crimes <- readr::read_csv("estimated_crimes_1979_2019.csv")

The file has NA in the name and abbreviation columns for the rows corresponding to the whole country, so we’ll fill those in:

us_crimes <- us_crimes %>% mutate(state_name = ifelse((is.na(state_name)), "United States", state_name))
us_crimes <- us_crimes %>% mutate(state_abbr = ifelse((is.na(state_abbr)), "US", state_abbr))

Where else is data missing?

sum(is.na(us_crimes$violent_crime))
[1] 0
sum(is.na(us_crimes$homicide))
[1] 0
sum(is.na(us_crimes$rape_legacy))
[1] 156
sum(is.na(us_crimes$rape_revised))
[1] 2116
sum(is.na(us_crimes$robbery))
[1] 0
sum(is.na(us_crimes$aggravated_assault))
[1] 0
sum(is.na(us_crimes$property_crime))
[1] 0
sum(is.na(us_crimes$burglary))
[1] 0
sum(is.na(us_crimes$larceny))
[1] 0
sum(is.na(us_crimes$motor_vehicle_theft))
[1] 0
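
The column-by-column checks above could also be collapsed into a single call. The sketch below is one way to do that with dplyr's across(). Here toy_crimes is a made-up stand-in for us_crimes (the values are invented) so that the snippet runs on its own.

```r
library(dplyr)

# Toy stand-in for us_crimes; the values are invented for illustration only.
toy_crimes <- tibble(
  year         = c(2012, 2013, 2014),
  homicide     = c(10, 12, 11),
  rape_legacy  = c(5, 6, NA),
  rape_revised = c(NA, 8, 9)
)

# Count the missing values in every column in one pass.
na_counts <- toy_crimes %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

print(na_counts)
```

With the real data, swapping toy_crimes for us_crimes should give the same counts as the individual calls. In base R, colSums(is.na(us_crimes)) is an equivalent one-liner.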

Because the FBI updated its definition of rape, these two columns have missing entries for the years in which the corresponding definition was not in effect.

Rape statistics are also missing from 2017 onward:

us_crimes %>%
  select(year, rape_legacy, rape_revised) %>%
  filter(!((is.na(rape_legacy)) & (is.na(rape_revised)))) %>%
  arrange(desc(year)) %>%
  head(1)

With all that being said, this report will for the most part not be focused on the rape statistics. I do think that analyzing the consequences of this change would have made a great topic for the investigation; I was simply not aware of it when I began researching. However, I will definitely keep it in mind for a future project.

The dataset gives us total counts and the population, so we will divide each count by the population to get the per-capita rate for each annual crime tally:

us_crimes <- us_crimes %>%
  mutate("aggravated_assault_rate" = aggravated_assault / population,
         "property_crime_rate" = property_crime / population,
         "violent_crime_rate" = violent_crime / population,
         "robbery_rate" = robbery / population,
         "motor_vehicle_theft_rate" = motor_vehicle_theft / population,
         "larceny_rate" = larceny / population,
         "homicide_rate" = homicide / population,
         "burglary_rate" = burglary / population,
         "rape_legacy_rate" = rape_legacy / population,
         "rape_revised_rate" = rape_revised / population)

We’ll look at this in Tableau later, so we’ll save a new .csv file with the extra columns:

readr::write_csv(us_crimes, "us_crimes_with_rates.csv")
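
The rate columns computed earlier are written out one line per category. A more compact alternative, sketched here under the assumption that every rate is simply count / population, uses across() with the .names argument. The names toy_crimes and crime_cols are hypothetical and the numbers are invented.

```r
library(dplyr)

# Toy stand-in for us_crimes; the values are invented for illustration only.
toy_crimes <- tibble(
  population = c(1000, 2000),
  homicide   = c(1, 4),
  robbery    = c(10, 30)
)

# Hypothetical helper vector listing the count columns to convert.
crime_cols <- c("homicide", "robbery")

# Divide each listed column by population and store it as <column>_rate.
toy_rates <- toy_crimes %>%
  mutate(across(all_of(crime_cols), ~ .x / population, .names = "{.col}_rate"))

print(toy_rates)
```

This keeps the original count columns and appends homicide_rate and robbery_rate, which mirrors the naming used in the explicit mutate() call.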

Exploratory plots

US Population 1979 - 2019

The population of the United States changed dramatically over the last few decades. Let’s take a quick look at how it changed, to put our data in a broader context:

whole_country <- us_crimes %>%
  filter(state_name != "United States") %>%
  group_by(year) %>%
  summarise(total_population = sum(population))
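
Because the file also carries a "United States" row for each year, one possible sanity check is to compare the summed state populations against that national row. The sketch below does this on invented toy data. The names toy_crimes, state_totals, national and population_check are hypothetical.

```r
library(dplyr)

# Toy stand-in for us_crimes; the values are invented for illustration only.
toy_crimes <- tibble(
  year       = c(2018, 2018, 2018, 2019, 2019, 2019),
  state_name = c("United States", "A", "B", "United States", "A", "B"),
  population = c(300, 100, 200, 330, 110, 220)
)

# Sum the state rows per year, as in the whole_country step.
state_totals <- toy_crimes %>%
  filter(state_name != "United States") %>%
  group_by(year) %>%
  summarise(total_population = sum(population))

# Pull out the national rows for comparison.
national <- toy_crimes %>%
  filter(state_name == "United States") %>%
  select(year, national_population = population)

# A non-zero difference would flag a year where the two disagree.
population_check <- state_totals %>%
  inner_join(national, by = "year") %>%
  mutate(difference = total_population - national_population)

print(population_check)
```

Whether the real file agrees year by year is not something this sketch establishes; it only shows how the comparison could be made.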

Here is how the population of the United States has changed since 1979:

whole_country %>%
  ggplot(aes(x = year, y = total_population)) +
  geom_line() +
  labs(title = "U.S. Population 1979-2019") +
  ylab("Population") +
  xlab("Year") +
  # theme_hc() is a complete theme, so it goes before the theme() tweak
  theme_hc() +
  theme(plot.title = element_text(hjust = 0.5))

The population has increased substantially over the range of data in this set. It is interesting to note that decreasing crime rates have coincided with an increasing population. All else being equal, one might guess the opposite would be the case.

Incidence and Rate of Crime by Category

Let us now examine the trends in rates of each category of crime in the dataset:

whole_country_long <- us_crimes %>%
  filter(state_name != "United States") %>%
  pivot_longer(names_to = "category",
               values_to = "count",
               cols = c("violent_crime",
                        "homicide",
                        "rape_legacy",
                        "rape_revised",
                        "robbery",
                        "aggravated_assault",
                        "burglary",
                        "larceny",
                        "motor_vehicle_theft"))
whole_country_long <- whole_country_long %>%
  mutate(category = toTitleCase(str_replace_all(category, "_", " ")))
whole_country_long %>%
  ggplot(aes(x = year, y = count)) +
  # fun = sum adds up the state counts; with no fun, stat = 'summary' plots the mean
  geom_line(aes(group = category, color = category), stat = 'summary', fun = sum, size = 1.5) +
  labs(title = "U.S. Crime Incidents By Category, 1979 - 2019") +
  ylab("Total (raw count)") +
  scale_color_brewer(palette = "RdYlGn", name = "Category") +
  theme_hc()

We can observe a few things from this plot. Larceny and burglary have occurred consistently at higher numbers than violent crimes, which is perhaps not so surprising. Larceny is categorically more common than all other crimes across the years in our data, and has also decreased substantially since 1990.

However, if we focus on the line for violent crime, we can see that this category has been relatively consistent across the years in our data. Though it has decreased from the peak in the 1990s, the count appears to have remained at about the same level or even increased slightly overall over the years.

whole_country_rates_long <- us_crimes %>%
  filter(state_name != "United States") %>%
  pivot_longer(names_to = "category",
               values_to = "rate",
               cols = c(
                        "violent_crime_rate",
                        "homicide_rate",
                        "rape_legacy_rate",
                        "rape_revised_rate",
                        "robbery_rate",
                        "aggravated_assault_rate",
                        "burglary_rate",
                        "larceny_rate",
                        "motor_vehicle_theft_rate"))

Plotting the rates over time for each category:

whole_country_rates_long <- whole_country_rates_long %>%
  mutate(category = toTitleCase(str_replace_all(category, "_", " ")))
whole_country_rates_long %>%
  ggplot(aes(x = year, y = rate)) +
  labs(title = "U.S. Crime Rates By Category, 1979 - 2019") +
  ylab("Rate (per capita)") +
  # with no fun, stat = 'summary' defaults to the mean, so each line is the average of the state rates
  geom_line(aes(group = category, color = category), stat = 'summary', size = 1.5) +
  scale_color_brewer(palette = "RdYlGn", name = "Category") +
  theme_hc()

This is similar to the last plot, but looking at the rates makes it a bit more clear that larceny and burglary have really trended down significantly over these years. The peak in violent crime in the 1990s is also a bit more obvious when looking at the rates as opposed to the counts.
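
One caveat about the rate plot: when stat = 'summary' is given no summary function, ggplot2 falls back to the mean, so each line is the unweighted average of the state rates rather than the national rate. The toy example below (invented numbers, hypothetical names) shows how the two can differ when states vary a lot in size.

```r
library(dplyr)

# Two invented states of very different size.
toy_states <- tibble(
  state         = c("Small", "Large"),
  population    = c(1e6, 9e6),
  violent_crime = c(10000, 18000)
)

# Unweighted mean of the per-state rates (what the default summary would plot).
mean_of_rates <- mean(toy_states$violent_crime / toy_states$population)

# Pooled rate: total incidents divided by total population.
pooled_rate <- sum(toy_states$violent_crime) / sum(toy_states$population)

print(c(mean_of_rates = mean_of_rates, pooled_rate = pooled_rate))
```

In this toy case the small state's high rate pulls the unweighted mean well above the pooled rate. If the national rate is the quantity of interest, the "United States" rows of us_crimes already carry it, since the rate columns were computed before those rows were filtered out.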

Let’s visualize things from a slightly different perspective. Here is a bar chart of the total number of crimes in each category, summed over all states and the whole time period:

whole_country_long %>%
  group_by(category) %>%
  # na.rm = TRUE keeps the two rape categories from summing to NA
  summarise(total = sum(count, na.rm = TRUE)) %>%
  ggplot(aes(x = category, y = total)) +
  geom_col(fill = "cadetblue4", color = "cadetblue4") +
  labs(title = "U.S. Crime Incidents By Category, 1979 - 2019 (total)") +
  ylab("Total (raw count)") +
  xlab("Category") +
  # theme_hc() is a complete theme, so it goes before the axis-text tweak
  theme_hc() +
  theme(axis.text.x = element_text(angle = 45, vjust = 0.5))

Violent Crime in the United States, 2015-2019

Histogram of States’ Average Violent Crime Rate, 2015-2019

It is important to understand the trends that have been taking place in our country, but we may also be interested in what is going on more recently. Let’s look at which states had the most violent crime over the last five years in the set, 2015-2019.

violent_crime_last_5_years <- us_crimes %>%
  select(year, state_name, violent_crime_rate) %>%
  filter(year >= 2015) %>%
  filter(!(is.na(state_name))) %>%
  filter(state_name != "United States")

We will plot the average rate of violent crime over the five-year period for each state (and D.C.) as a histogram:

last_5_dist <- violent_crime_last_5_years %>%
  group_by(state_name) %>%
  summarise(avg = mean(violent_crime_rate))

last_5_dist %>%
  ggplot(aes(x = avg)) +
  geom_histogram(bins = 15, fill = "cadetblue3", color = "cadetblue4") +
  labs(title = "Average Violent Crime Rate per State, 2015-2019") +
  ylab("No. of States") +
  xlab("Violent Crime Rate") + 
  theme_hc()

Distribution and Outlier Analysis

The distribution is somewhat close to normal, but noticeably right-skewed.

Let’s see what the top states are:

last_5_dist %>%
  arrange(desc(avg))

Let’s see if D.C. and the other top states are outliers:

summary(last_5_dist$avg)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
0.001203 0.002683 0.003633 0.003855 0.004564 0.011052 
Q3 <- 0.004564
Q1 <- 0.002683
IQR <- Q3 - Q1

lf <- Q1 - 1.5*IQR
uf <- Q3 + 1.5*IQR
lf
[1] -0.0001385
uf
[1] 0.0073855
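
The fences above were built from quartiles copied by hand out of the summary() output. As a sketch of how the same 1.5 × IQR rule could be applied without retyping numbers, the snippet below computes the quartiles with quantile() and filters the rows outside the fences. The data frame toy_dist and its values are invented, and the variable names are hypothetical.

```r
library(dplyr)

# Toy stand-in for last_5_dist; the values are invented for illustration only.
toy_dist <- tibble(
  state_name = c("A", "B", "C", "D", "E", "F", "G"),
  avg        = c(0.0020, 0.0025, 0.0030, 0.0035, 0.0040, 0.0045, 0.0110)
)

# First and third quartiles (default quantile type, the same one summary() uses).
q <- quantile(toy_dist$avg, probs = c(0.25, 0.75), names = FALSE)
iqr_width   <- q[2] - q[1]
lower_fence <- q[1] - 1.5 * iqr_width
upper_fence <- q[2] + 1.5 * iqr_width

# Keep only the rows that fall outside the fences.
outliers <- toy_dist %>%
  filter(avg < lower_fence | avg > upper_fence)

print(outliers)
```

Naming the width iqr_width rather than IQR also avoids shadowing the built-in stats::IQR() function.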

D.C., Alaska, and New Mexico are all statistical outliers for violent crime over the last five years. In my opinion, D.C. does not really belong in this analysis because it is only one city. I suspect that while we observe very high crime rates in D.C. compared to the states, we might see similar rates in, say, Los Angeles and New York City, as opposed to California and New York as a whole.

On the other hand, Alaska and New Mexico do not even include large urban centers, so I was initially surprised by this result. What I found upon researching it a bit is that violent crime rates are especially high among the Native American population; a very serious issue that seems to receive unfortunately little attention from the media and political leadership. [1]

Visualizing U.S. Violent Crime with Tableau

The following interactive Tableau visualization shows the quantity and rate of violent crime incidents for each state in the dataset.

https://public.tableau.com/profile/daniel.lefevre#!/vizhome/Data110Final2/Dashboard1

Looking at all states, we can easily identify DC as the outlier in terms of rate, and we can also see that it accounted for a relatively small share of the violent crimes that took place.

We can turn individual states on and off using the filter on the right. Here is DC vs. New York for the years in the dataset:

Thoughts and Summary

I was aware going into this project that violent crime had decreased since the middle of the century, and that it had peaked in the 1990s. Still, as with many of these projects, I find it interesting to see these trends in a more concrete way.

One question that came to me while working on the project is: why are drug crimes not included in the dataset? If I had thought of this earlier in the project, I might have tried to access the SRS directly and see if the data was simply left out from the dataset or what the explanation is there. It is fairly well known that the U.S. prison population has grown drastically over the last few decades. In fact, our incarcerated population increased by a factor of about 7 from 1980 to 2018.[3] It seems difficult to square a decrease in crime with a drastic increase in the prison population.

In this project, we looked at crime statistics in the USA from 1979 to 2019. We focused mainly on violent crime, and examined the distribution of events as well as the surge in violent crime during the 1990s. We found that while Washington D.C. is a statistical outlier in terms of crime rate and was especially so during the early-to-mid nineties, the surge in violence largely occurred in states like California, New York, and a few others.

I also learned about violence among Native American populations, which is a very complex and tragic issue. I may try to investigate this as another topic for future research.

One thing I struggled with during this project was that I wanted to show the y-axis of the population charts in “00 million” format instead of scientific notation. I tried to figure this out for quite some time but could not get it to work. I also would have liked to display the state names on the tiles in the Tableau visualization, but it seemed that the issue here was just that the tiles are too small and Tableau will not show the label.
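
For the axis-label problem, one approach that may work is the label_number() helper from the scales package (already loaded in this project), passed to scale_y_continuous(). The sketch below uses invented round numbers rather than the real population figures. The names toy_population, millions and population_plot are hypothetical.

```r
library(ggplot2)
library(scales)

# Invented round numbers, used only to demonstrate the axis formatting.
toy_population <- data.frame(
  year             = c(1980, 2000, 2019),
  total_population = c(200e6, 250e6, 300e6)
)

# Labeller that rescales raw counts to millions and appends a suffix.
millions <- label_number(accuracy = 1, scale = 1e-6, suffix = " million")

population_plot <- ggplot(toy_population, aes(x = year, y = total_population)) +
  geom_line() +
  scale_y_continuous(labels = millions) +
  labs(title = "U.S. Population (toy data)", x = "Year", y = "Population")

# The labeller can also be called directly to preview the formatting.
print(millions(toy_population$total_population))
```

Adding the same scale_y_continuous(labels = ...) layer to the real population plot should replace the scientific notation, assuming a version of scales recent enough to provide label_number().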

[1] https://www.lifedaily.com/story/high-crime-rates-ignored-cases-us-indian-reservation-force-residents-take-action/

[2] https://ucr.fbi.gov/crime-in-the-u.s/2013/crime-in-the-u.s.-2013/violent-crime/rape

[3] https://www.sentencingproject.org/wp-content/uploads/2020/08/Trends-in-US-Corrections.pdf

