The topic I chose for my final project is crime in the DMV (the District of Columbia, Maryland, and Virginia) from 1960 to 2019. I look at both the totals and the rates for property and violent crimes. I chose this topic because I am extremely interested in criminal justice and love to see what stories the data tell. Keep in mind that crime data covers only the crimes that are reported and might not represent the full picture. The dataset I use was found on GitHub, in the CORGIS collection (https://corgis-edu.github.io/corgis/csv/state_crime/). It was built from public information from the Uniform Crime Reporting Statistics, in collaboration with the U.S. Department of Justice and the Federal Bureau of Investigation. Before we get started, here are the descriptions of my key variables as provided by the authors on GitHub.
Data.Population - The number of people living in this state at the time the report was created.
Data.Rates.Property.All - Rates are the number of reported offenses per 100,000 population. This property reflects all of the Property-related crimes, including burglaries, larcenies, and motor crimes.
Data.Rates.Violent.All - Rates are the number of reported offenses per 100,000 population. This property reflects all of the Violent crimes, including assaults, murders, rapes, and robberies.
Data.Totals.Property.All - This property reflects all of the Property-related crimes, including burglaries, larcenies, and motor crimes.
Data.Totals.Violent.All - This property reflects all of the Violent crimes, including assaults, murders, rapes, and robberies.
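The rate definition above can be checked by hand. The sketch below recomputes the two 1960 District of Columbia rates from the population and totals printed in the previews in this report. The helper name `rate_per_100k` is my own and is not part of the dataset.

```r
# Hypothetical helper (not from the dataset): reported offenses per 100,000 residents.
rate_per_100k <- function(total, population) {
  total / population * 100000
}

# District of Columbia, 1960, values taken from the previews in this report.
dc_population <- 763956
dc_property_total <- 16495
dc_violent_total <- 4230

property_rate <- rate_per_100k(dc_property_total, dc_population)
violent_rate <- rate_per_100k(dc_violent_total, dc_population)

round(property_rate)  # 2159, in line with the property rate shown for DC in 1960
round(violent_rate)   # 554, in line with the violent rate shown for DC in 1960
```

Because rates divide by population, they allow a fairer comparison between a small jurisdiction like DC and larger states than raw totals do.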
As always, I first load my libraries and then my CSV file. I then use head to preview the dataset.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(highcharter)
## Warning: package 'highcharter' was built under R version 4.3.1
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
library(patchwork)
## Warning: package 'patchwork' was built under R version 4.3.1
library(cowplot)
## Warning: package 'cowplot' was built under R version 4.3.1
##
## Attaching package: 'cowplot'
##
## The following object is masked from 'package:patchwork':
##
## align_plots
##
## The following object is masked from 'package:lubridate':
##
## stamp
library(dplyr)
state_crime <- read_csv("state_crime.csv")
## Rows: 3115 Columns: 21
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): State
## dbl (20): Year, Data.Population, Data.Rates.Property.All, Data.Rates.Propert...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(state_crime)
## # A tibble: 6 × 21
## State Year Data.Population Data.Rates.Property.All Data.Rates.Property.Bu…¹
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Alabama 1960 3266740 1035. 356.
## 2 Alabama 1961 3302000 986. 339.
## 3 Alabama 1962 3358000 1067 349.
## 4 Alabama 1963 3347000 1151. 377.
## 5 Alabama 1964 3407000 1359. 467.
## 6 Alabama 1965 3462000 1393. 474.
## # ℹ abbreviated name: ¹Data.Rates.Property.Burglary
## # ℹ 16 more variables: Data.Rates.Property.Larceny <dbl>,
## # Data.Rates.Property.Motor <dbl>, Data.Rates.Violent.All <dbl>,
## # Data.Rates.Violent.Assault <dbl>, Data.Rates.Violent.Murder <dbl>,
## # Data.Rates.Violent.Rape <dbl>, Data.Rates.Violent.Robbery <dbl>,
## # Data.Totals.Property.All <dbl>, Data.Totals.Property.Burglary <dbl>,
## # Data.Totals.Property.Larceny <dbl>, Data.Totals.Property.Motor <dbl>, …
First I change the column names to lowercase and replace the periods with underscores. I use head to make sure everything worked.
colnames(state_crime) <- tolower(names(state_crime))
colnames(state_crime) <- str_replace_all(colnames(state_crime), "\\.", "_")
head(state_crime)
## # A tibble: 6 × 21
## state year data_population data_rates_property_all data_rates_property_bu…¹
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Alabama 1960 3266740 1035. 356.
## 2 Alabama 1961 3302000 986. 339.
## 3 Alabama 1962 3358000 1067 349.
## 4 Alabama 1963 3347000 1151. 377.
## 5 Alabama 1964 3407000 1359. 467.
## 6 Alabama 1965 3462000 1393. 474.
## # ℹ abbreviated name: ¹data_rates_property_burglary
## # ℹ 16 more variables: data_rates_property_larceny <dbl>,
## # data_rates_property_motor <dbl>, data_rates_violent_all <dbl>,
## # data_rates_violent_assault <dbl>, data_rates_violent_murder <dbl>,
## # data_rates_violent_rape <dbl>, data_rates_violent_robbery <dbl>,
## # data_totals_property_all <dbl>, data_totals_property_burglary <dbl>,
## # data_totals_property_larceny <dbl>, data_totals_property_motor <dbl>, …
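The renaming step can be tried on its own with a handful of column names. This is a minimal sketch that uses four of the original names from the first preview; `fixed(".")` is an alternative to the escaped regex `"\\."` and is my substitution, not what the code above uses.

```r
library(stringr)

# A few of the original column names, copied from the first preview.
original_names <- c("State", "Year", "Data.Population", "Data.Rates.Property.All")

# Lowercase first, then swap each literal period for an underscore.
# fixed(".") matches a plain period, so no regex escaping is needed.
clean_names <- str_replace_all(tolower(original_names), fixed("."), "_")
clean_names
```

Both forms should give the same result here; the escaped regex and `fixed()` differ only in how the period is matched.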
I then filter by the DMV and preview the results.
filtered_state_crime <- state_crime %>%
filter(state %in% c("District of Columbia", "Maryland", "Virginia"))
head(filtered_state_crime)
## # A tibble: 6 × 21
## state year data_population data_rates_property_…¹ data_rates_property_…²
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 District … 1960 763956 2159. 600.
## 2 District … 1961 763956 2237. 642.
## 3 District … 1962 784000 2227. 641.
## 4 District … 1963 798000 2612 875.
## 5 District … 1964 808000 3122. 1103.
## 6 District … 1965 803000 3497 1231.
## # ℹ abbreviated names: ¹data_rates_property_all, ²data_rates_property_burglary
## # ℹ 16 more variables: data_rates_property_larceny <dbl>,
## # data_rates_property_motor <dbl>, data_rates_violent_all <dbl>,
## # data_rates_violent_assault <dbl>, data_rates_violent_murder <dbl>,
## # data_rates_violent_rape <dbl>, data_rates_violent_robbery <dbl>,
## # data_totals_property_all <dbl>, data_totals_property_burglary <dbl>,
## # data_totals_property_larceny <dbl>, data_totals_property_motor <dbl>, …
Next I select the columns I want to focus on, using head to once again check my work.
filtered_state_crime <- filtered_state_crime %>%
select(state, year, data_population, data_rates_property_all, data_rates_violent_all, data_totals_property_all, data_totals_violent_all)
head(filtered_state_crime)
## # A tibble: 6 × 7
## state year data_population data_rates_property_…¹ data_rates_violent_all
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 District … 1960 763956 2159. 554.
## 2 District … 1961 763956 2237. 588.
## 3 District … 1962 784000 2227. 606.
## 4 District … 1963 798000 2612 594
## 5 District … 1964 808000 3122. 633.
## 6 District … 1965 803000 3497 723.
## # ℹ abbreviated name: ¹data_rates_property_all
## # ℹ 2 more variables: data_totals_property_all <dbl>,
## # data_totals_violent_all <dbl>
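The row filter and the column selection can also be written as one pipeline. The sketch below runs on a two-row stand-in data frame (`toy_crime`, a name I made up) whose values are the rounded 1960 figures printed in the previews, so it runs without the CSV file.

```r
library(dplyr)

# Toy stand-in for state_crime; values are the rounded 1960 figures printed above.
toy_crime <- data.frame(
  state = c("Alabama", "District of Columbia"),
  year = c(1960, 1960),
  data_population = c(3266740, 763956),
  data_rates_property_burglary = c(356, 600)
)

dmv <- c("District of Columbia", "Maryland", "Virginia")

# Keep only DMV rows, then keep only the columns of interest.
dmv_only <- toy_crime %>%
  filter(state %in% dmv) %>%
  select(state, year, data_population)
dmv_only
```

Chaining the two steps avoids overwriting the same object twice, though keeping them separate, as above, makes each step easier to inspect.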
First I show a summary of my variables. I don’t need to analyze the state or year variables because they only identify each observation. For the remaining five variables, I chose to make box plots.
From my boxplots I find potential outliers in two variables: the property crime rate and the violent crime rate.
summary(filtered_state_crime)
## state year data_population data_rates_property_all
## Length:180 Min. :1960 Min. : 519000 Min. :1469
## Class :character 1st Qu.:1975 1st Qu.: 721250 1st Qu.:2752
## Mode :character Median :1990 Median :4385000 Median :4023
## Mean :1990 Mean :3874850 Mean :4296
## 3rd Qu.:2004 3rd Qu.:5701108 3rd Qu.:5177
## Max. :2019 Max. :8535519 Max. :9512
## data_rates_violent_all data_totals_property_all data_totals_violent_all
## Min. : 151.3 Min. : 16495 Min. : 4230
## 1st Qu.: 305.0 1st Qu.: 47831 1st Qu.:10164
## Median : 641.5 Median :153503 Median :16749
## Mean : 842.3 Mean :134370 Mean :19616
## 3rd Qu.:1250.6 3rd Qu.:206980 3rd Qu.:26474
## Max. :2921.8 Max. :267625 Max. :49757
As you can see, no potential outliers are present.
boxplot(filtered_state_crime$data_population,
ylab = "Population")
As you can see, two potential outliers are identified.
boxplot(filtered_state_crime$data_rates_property_all,
ylab = "Property rate")
As you can see, two potential outliers are identified.
boxplot(filtered_state_crime$data_rates_violent_all,
ylab = "Violent rate")
As you can see, no potential outliers are present.
boxplot(filtered_state_crime$data_totals_property_all,
ylab = "Property Total")
As you can see, no potential outliers are present.
boxplot(filtered_state_crime$data_totals_violent_all,
ylab = "Violent Total")
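The points a boxplot draws separately are the values lying more than 1.5 times the box length beyond the box, and `boxplot.stats()` returns those values so they can be listed rather than read off the picture. A minimal sketch on made-up numbers:

```r
# Made-up values for illustration only; not taken from the crime data.
x <- c(10, 12, 11, 13, 12, 14, 11, 40)

# boxplot.stats() applies the same whisker rule that boxplot() uses by default
# (1.5 times the box length) and returns the flagged values in $out.
flagged <- boxplot.stats(x)$out
flagged
```

Applied to this report, `boxplot.stats(filtered_state_crime$data_rates_violent_all)$out` would list the flagged violent-rate values, which could then be matched back to their state and year.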
I first want to make a plot showing the general population over time.
popplot <- ggplot(filtered_state_crime, aes(x = year, y = data_population)) +
geom_point()
popplot
I add labels and a title and change the shape of the points to reflect the state they represent. I also change the Y axis scale and switch the theme to black and white.
popplot1 <- ggplot(filtered_state_crime, aes(x = year, y = data_population)) +
geom_point(shape = ifelse(filtered_state_crime$state == "Maryland", 17,
ifelse(filtered_state_crime$state == "Virginia", 18, 16))) +
scale_y_continuous(labels = scales::comma, breaks = seq(0, max(filtered_state_crime$data_population), 1000000)) +
labs(x = "Year", y = "Population", title = "DMV Population by Year") +
theme_bw()
popplot1
Lastly, I add a legend to my graph so it can be read.
popplot2 <- ggplot(filtered_state_crime, aes(x = year, y = data_population)) +
geom_point(aes(shape = state)) +
scale_y_continuous(labels = scales::comma, breaks = seq(0, max(filtered_state_crime$data_population), 1000000)) +
labs(x = "Year", y = "Population", title = "DMV Population by Year") +
theme_bw() +
scale_shape_manual(values = c("District of Columbia" = 16, "Maryland" = 17, "Virginia" = 18), guide = guide_legend(title = "State Shapes"))
popplot2
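The tick labels in these population plots print full counts (for example 1,000,000). If the axis should read in millions instead, one option is `scales::label_number()` with a scale factor. This is a sketch of that alternative, not what the plots above use; the object name `millions` is my own.

```r
library(scales)

# label_number() returns a labelling function; scale = 1e-6 converts counts to millions,
# and accuracy = 0.1 keeps one decimal place.
millions <- label_number(scale = 1e-6, accuracy = 0.1)

# Smallest and largest populations from the summary table in this report.
millions(c(519000, 8535519))
```

In a plot this would be passed as `scale_y_continuous(labels = label_number(scale = 1e-6, accuracy = 0.1))`.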
I gather the property and violent totals and store them under crime_type and crime_total.
gathered_data1 <- filtered_state_crime %>%
select(state, year, starts_with("data_totals_")) %>%
pivot_longer(cols = starts_with("data_totals_"), names_to = "crime_type", values_to = "crime_total")
head(gathered_data1)
## # A tibble: 6 × 4
## state year crime_type crime_total
## <chr> <dbl> <chr> <dbl>
## 1 District of Columbia 1960 data_totals_property_all 16495
## 2 District of Columbia 1960 data_totals_violent_all 4230
## 3 District of Columbia 1961 data_totals_property_all 17093
## 4 District of Columbia 1961 data_totals_violent_all 4491
## 5 District of Columbia 1962 data_totals_property_all 17458
## 6 District of Columbia 1962 data_totals_violent_all 4750
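To see what `pivot_longer()` does in isolation, the sketch below reshapes just the 1960 and 1961 District of Columbia totals from the preview above. The `names_prefix` argument is an optional extra I added to strip the shared prefix; the code above does not use it.

```r
library(tidyr)

# Two rows of totals copied from the District of Columbia preview above.
wide <- data.frame(
  state = "District of Columbia",
  year = c(1960, 1961),
  data_totals_property_all = c(16495, 17093),
  data_totals_violent_all = c(4230, 4491)
)

# Each totals column becomes its own row; names_prefix drops the shared
# "data_totals_" prefix so crime_type holds only the distinguishing part.
long <- pivot_longer(
  wide,
  cols = starts_with("data_totals_"),
  names_to = "crime_type",
  names_prefix = "data_totals_",
  values_to = "crime_total"
)
long
```

Two wide rows with two value columns become four long rows, one per state, year, and crime type.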
I gather the property and violent rates and store them under crime_type and crime_rate.
gathered_data <- filtered_state_crime %>%
select(state, year, starts_with("data_rates_")) %>%
pivot_longer(cols = starts_with("data_rates_"), names_to = "crime_type", values_to = "crime_rate")
head(gathered_data)
## # A tibble: 6 × 4
## state year crime_type crime_rate
## <chr> <dbl> <chr> <dbl>
## 1 District of Columbia 1960 data_rates_property_all 2159.
## 2 District of Columbia 1960 data_rates_violent_all 554.
## 3 District of Columbia 1961 data_rates_property_all 2237.
## 4 District of Columbia 1961 data_rates_violent_all 588.
## 5 District of Columbia 1962 data_rates_property_all 2227.
## 6 District of Columbia 1962 data_rates_violent_all 606.
I chose to change the names for crime rate here for a cleaner appearance.
modified_gathered_data <- gathered_data %>%
mutate(crime_type = case_when(crime_type == "data_rates_property_all" ~ "Property Crime Rate",
crime_type == "data_rates_violent_all" ~ "Violent Crime Rate", TRUE ~ crime_type))
head(modified_gathered_data)
## # A tibble: 6 × 4
## state year crime_type crime_rate
## <chr> <dbl> <chr> <dbl>
## 1 District of Columbia 1960 Property Crime Rate 2159.
## 2 District of Columbia 1960 Violent Crime Rate 554.
## 3 District of Columbia 1961 Property Crime Rate 2237.
## 4 District of Columbia 1961 Violent Crime Rate 588.
## 5 District of Columbia 1962 Property Crime Rate 2227.
## 6 District of Columbia 1962 Violent Crime Rate 606.
I chose to change the names for total crime here for a cleaner appearance.
modified_gathered_data1 <- gathered_data1 %>%
mutate(crime_type = case_when(crime_type == "data_totals_property_all" ~ "Total Property Crime",
crime_type == "data_totals_violent_all" ~ "Total Violent Crime", TRUE ~ crime_type))
head(modified_gathered_data1)
## # A tibble: 6 × 4
## state year crime_type crime_total
## <chr> <dbl> <chr> <dbl>
## 1 District of Columbia 1960 Total Property Crime 16495
## 2 District of Columbia 1960 Total Violent Crime 4230
## 3 District of Columbia 1961 Total Property Crime 17093
## 4 District of Columbia 1961 Total Violent Crime 4491
## 5 District of Columbia 1962 Total Property Crime 17458
## 6 District of Columbia 1962 Total Violent Crime 4750
I start with graphing the crime rate data with the rate per 100,000 being the Y axis and the year being the X axis. I chose to color by crime type so a clear difference is seen.
p1 <- ggplot(modified_gathered_data, aes(x = year, y = crime_rate, color = crime_type)) +
geom_point() +
labs(x = "Year", y = "Crime Rate per 100,000") +
ggtitle("Crime Rates in the DMV by Year")
p1
I then change the shape of the points to reflect the states they represent. After that, I change the color of the points and the theme of the graph. I chose red for violent crimes and blue for property crimes because I feel that red is normally used to indicate more severity.
p2 <- ggplot(modified_gathered_data, aes(x = year, y = crime_rate, color = crime_type, shape = state)) +
geom_point() +
scale_shape_manual(values = c("District of Columbia" = 16, "Maryland" = 17, "Virginia" = 18)) +
labs(x = "Year", y = "Crime Rate per 100,000", shape = "State", color = "Crime Type") +
ggtitle("Crime Rates in the DMV by Year") +
scale_color_manual(values = c("blue", "red")) +
theme_light()
p2
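The plot above passes an unnamed color vector, so ggplot2 matches the colors to the crime types in alphabetical order (property first, violent second). A named vector makes that pairing explicit. The sketch below shows the idea on a small stand-in data frame (`toy_rates`, a made-up name) built from the rounded 1960 and 1961 DC rates printed earlier.

```r
library(ggplot2)

# Tiny stand-in for modified_gathered_data; rates are the rounded DC values printed above.
toy_rates <- data.frame(
  year = c(1960, 1960, 1961, 1961),
  crime_type = c("Property Crime Rate", "Violent Crime Rate",
                 "Property Crime Rate", "Violent Crime Rate"),
  crime_rate = c(2159, 554, 2237, 588)
)

# Naming the color vector ties each color to a label instead of to alphabetical order.
crime_colors <- c("Property Crime Rate" = "blue", "Violent Crime Rate" = "red")

p_named <- ggplot(toy_rates, aes(x = year, y = crime_rate, color = crime_type)) +
  geom_point() +
  scale_color_manual(values = crime_colors)
p_named
```

With named values, renaming a crime type later cannot silently swap the red and blue assignments.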
I then add trend lines to indicate trends over time.
p3 <- ggplot(modified_gathered_data, aes(x = year, y = crime_rate, color = crime_type, shape = state)) +
geom_point() +
geom_smooth(se = FALSE) +
scale_shape_manual(values = c("District of Columbia" = 16, "Maryland" = 17, "Virginia" = 18)) +
labs(x = "Year", y = "Crime Rate per 100,000", shape = "State", color = "Crime Type") +
ggtitle("Crime Rates in the DMV by Year") +
scale_color_manual(values = c("blue", "red")) +
theme_light()
p3
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
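The message above says the trend lines come from loess, a local regression smoother. A minimal sketch of the same kind of fit using `stats::loess()` directly, on a made-up series rather than the crime data:

```r
# Made-up series for illustration: a rise and fall over 1960-2019, not real crime data.
toy <- data.frame(year = 1960:2019)
toy$rate <- 1000 + 50 * (toy$year - 1960) - 0.9 * (toy$year - 1960)^2

# Fit a local regression of rate on year; span = 0.75 is loess()'s default
# and controls how much of the data each local fit uses.
fit <- loess(rate ~ year, data = toy, span = 0.75)
toy$smoothed <- predict(fit)

head(toy)
```

Because shape = state sits in the top-level aes() of the plot, ggplot2 groups by both crime type and state, so the plot draws one curve per state within each crime type rather than one curve per color.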
I then use facet wrap to split the graphs so you can focus on one type of crime at a time.
p4 <- ggplot(modified_gathered_data, aes(x = year, y = crime_rate, color = crime_type, shape = state)) +
geom_point() +
geom_smooth(se = FALSE) +
scale_shape_manual(values = c("District of Columbia" = 16, "Maryland" = 17, "Virginia" = 18)) +
labs(x = "Year", y = "Crime Rate per 100,000", shape = "State", color = "Crime Type") +
ggtitle("Crime Rates in the DMV by Year") +
scale_color_manual(values = c("blue", "red")) +
theme_light() +
facet_wrap(~ crime_type, ncol = 2) +
guides(color = "none")
p4
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
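Property rates run several times higher than violent rates, so a shared Y axis compresses the violent panel. One possible variation, which the plot above does not use, is to give each facet its own Y axis with `scales = "free_y"`. The sketch uses a stand-in data frame (`toy_rates`, a made-up name) with the rounded 1960 to 1962 DC rates printed earlier.

```r
library(ggplot2)

# Stand-in for modified_gathered_data; rounded DC rates from the previews above.
toy_rates <- data.frame(
  year = rep(c(1960, 1961, 1962), times = 2),
  crime_type = rep(c("Property Crime Rate", "Violent Crime Rate"), each = 3),
  crime_rate = c(2159, 2237, 2227, 554, 588, 606)
)

# scales = "free_y" lets each panel choose its own Y range.
p_free <- ggplot(toy_rates, aes(x = year, y = crime_rate)) +
  geom_point() +
  facet_wrap(~ crime_type, ncol = 2, scales = "free_y")
p_free
```

The trade-off is that panels can no longer be compared by eye on a common scale, so the shared axis may still be the better choice when the point is how far apart the two rates are.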
Next I graph the total crime data, with the total crime on the Y axis and the year on the X axis. I again color by crime type so a clear difference is seen.
p5 <- ggplot(modified_gathered_data1, aes(x = year, y = crime_total, color = crime_type)) +
geom_point() +
labs(x = "Year", y = "Crime Totals") +
ggtitle("Crime Totals in the DMV by Year")
p5
I then change the shape of the points to reflect the states they represent. After that, I change the color of the points and the theme of the graph. I chose red for violent crimes and blue for property crimes because I feel that red is normally used to indicate more severity. I also set the Y axis breaks at every 50,000.
p6 <- ggplot(modified_gathered_data1, aes(x = year, y = crime_total, color = crime_type, shape = state)) +
geom_point() +
scale_y_continuous(breaks = seq(0, max(gathered_data1$crime_total), 50000), labels = scales::comma) +
scale_shape_manual(values = c("District of Columbia" = 16, "Maryland" = 17, "Virginia" = 18)) +
labs(x = "Year", y = "Crime Totals", shape = "State", color = "Crime Type") +
ggtitle("Crime Totals in the DMV by Year") +
scale_color_manual(values = c("blue", "red")) +
theme_light()
p6
I then add trend lines to indicate trends over time.
p7 <- ggplot(modified_gathered_data1, aes(x = year, y = crime_total, color = crime_type, shape = state)) +
geom_point() +
geom_smooth(se = FALSE) +
scale_y_continuous(breaks = seq(0, max(gathered_data1$crime_total), 50000), labels = scales::comma) +
scale_shape_manual(values = c("District of Columbia" = 16, "Maryland" = 17, "Virginia" = 18)) +
labs(x = "Year", y = "Crime Totals", shape = "State", color = "Crime Type") +
ggtitle("Crime Totals in the DMV by Year") +
scale_color_manual(values = c("blue", "red")) +
theme_light()
p7
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
I then use facet wrap to split the graphs so you can focus on one type of crime at a time.
p8 <- ggplot(modified_gathered_data1, aes(x = year, y = crime_total, color = crime_type, shape = state)) +
geom_point() +
geom_smooth(se = FALSE) +
scale_y_continuous(breaks = seq(0, max(gathered_data1$crime_total), 50000), labels = scales::comma) +
scale_shape_manual(values = c("District of Columbia" = 16, "Maryland" = 17, "Virginia" = 18)) +
labs(x = "Year", y = "Crime Totals", shape = "State", color = "Crime Type") +
ggtitle("Crime Totals in the DMV by Year") +
scale_color_manual(values = c("blue", "red")) +
theme_light() +
facet_wrap(~ crime_type, ncol = 2) +
guides(color = "none")
p8
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
I now use cowplot to combine my population graph and my crime rates graph.
combined_plot <- plot_grid(popplot1, p3, ncol = 2)
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
combined_plot
I now use cowplot to combine my population graph and my crime totals graph.
combined_plot1 <- plot_grid(popplot1, p7, ncol = 2)
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
combined_plot1
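The patchwork package is loaded at the top of this report but not used afterwards. As an alternative to `plot_grid()`, a side-by-side layout can be sketched with patchwork's `+` operator. The two small plots below are stand-ins built from the 1960 to 1962 DC values printed earlier, and the object names are my own.

```r
library(ggplot2)
library(patchwork)

# Stand-in data: DC population and property totals for 1960-1962 from the previews above.
toy <- data.frame(
  year = c(1960, 1961, 1962),
  population = c(763956, 763956, 784000),
  property_total = c(16495, 17093, 17458)
)

left <- ggplot(toy, aes(x = year, y = population)) + geom_point()
right <- ggplot(toy, aes(x = year, y = property_total)) + geom_point()

# Adding two ggplots with patchwork attached places them in one layout.
side_by_side <- left + right + plot_layout(ncol = 2)
side_by_side
```

Either package gives a two-column figure; the choice mostly comes down to which syntax reads better.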
I now use highcharter to make my crime rates information interactive.
hc <- highchart() %>%
hc_xAxis(categories = unique(modified_gathered_data$year)) %>%
hc_yAxis(title = list(text = "Crime Rates per 100,000")) %>%
hc_add_series(data = modified_gathered_data, type = "scatter",
hcaes(x = year, y = crime_rate, group = state, color = crime_type)) %>%
hc_legend(enabled = TRUE) %>%
hc_title(text = "Crime Rates in the DMV by Year") %>%
hc_tooltip(pointFormat = "Year: {point.x}<br/>Crime Rate: {point.y}<br/>Crime Type: {point.crime_type}")
hc
I now use highcharter to make my crime totals information interactive.
hc1 <- highchart() %>%
hc_xAxis(categories = unique(modified_gathered_data1$year)) %>%
hc_yAxis(title = list(text = "Crime Totals")) %>%
hc_add_series(data = modified_gathered_data1, type = "scatter",
hcaes(x = year, y = crime_total, group = state, color = crime_type)) %>%
hc_legend(enabled = TRUE) %>%
hc_title(text = "Crime Totals in the DMV by Year") %>%
hc_tooltip(pointFormat = "Year: {point.x}<br/>Crime Total: {point.y}<br/>Crime Type: {point.crime_type}")
hc1
I think that when you combine the crime rates graph and the crime totals graph with the population growth graph, you get very interesting insights. Comparing the crime totals graph with the population graph, you would expect to see a gradual increase in both total property crime and total violent crime for Virginia and Maryland, and you would expect DC to stay about the same because its population did not grow significantly. However, what you see is that Maryland and Virginia’s totals spike sharply starting in the 1960s and continuing until about the mid-1990s. DC saw a slight increase in the number of property crimes in the first decade, but it does not follow the same trend as Virginia and Maryland. For violent crimes, Maryland shows a gradual increase until around 1995, while Virginia and DC remain rather constant.
However, when looking at the crime rates versus the population graph, you see that DC’s rate of crimes per 100,000 is drastically higher than Virginia’s or Maryland’s. This is true for both property and violent crime. The increase in crime is disproportionate to how much the population grew. While DC’s population remained relatively constant, up until about 1996 it saw drastic increases in property crime and steady increases in violent crime. It is also interesting to note how little Virginia’s violent crime changed over this time span.
It is interesting to think about what factors might have led to this drastic increase in crime from roughly 1960 to 1995, and then what might have led to the decline in crime across the board. The three decades leading into 1960 saw a dramatic fall in both property and violent crime, and 1960 marks the start of a more than three-decade uptrend in crime overall. The crime explosion is often linked to demographic shifts among post-World War II baby boomers, but I would like to keep exploring this data and topic further.
Some interesting factors that might have played a role in the decrease in crime starting around 1995-2000 are better economic opportunities, less alcohol consumption, and more modern policing techniques.
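Turning points such as "the mid-90s" can be read off the graphs, but they can also be computed. The sketch below finds the peak year per state with `slice_max()`. The numbers are invented purely to show the mechanics and are not values from the dataset; `toy_totals` and `peak_years` are made-up names.

```r
library(dplyr)

# Invented numbers for illustration only; they are not values from the dataset.
toy_totals <- data.frame(
  state = rep(c("Maryland", "Virginia"), each = 4),
  year = rep(c(1960, 1980, 1995, 2019), times = 2),
  crime_total = c(100, 250, 300, 150, 80, 200, 220, 130)
)

# For each state, keep the single row with the largest total: its peak year.
peak_years <- toy_totals %>%
  group_by(state) %>%
  slice_max(crime_total, n = 1) %>%
  ungroup()
peak_years
```

On the real data, grouping `modified_gathered_data1` by state and crime_type in the same way would give the peak year for each of the six series.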
Sources:
https://corgis-edu.github.io/corgis/csv/state_crime/
https://ucr.fbi.gov/crime-in-the-u.s/2019/crime-in-the-u.s.-2019/downloads/download-printable-files
https://bjs.ojp.gov/content/pub/pdf/aus8009.pdf
https://bjs.ojp.gov/content/pub/pdf/htus8008.pdf
https://www.disastercenter.com/crime/dccrime.htm
https://quod.lib.umich.edu/h/humfig/11217607.0002.206/--decivilization-in-the-1960s?rgn=main;view=fulltext