This analysis explores Maryland crime data through the lens of the state’s jurisdictions. It reviews overall crime counts, geographic data, crime trends over time, and the distribution of crime types, then takes a deeper dive into the most prevalent form of crime.
Two datasets were incorporated into this analysis:
The first contains data about Maryland crimes, including columns for jurisdiction, year, population, and crime counts split by type, as well as a grand total of crimes for each row. This dataset is called “Maryland Crime Data by County (1975-Present)” and was downloaded from kaggle.com.
The second contains geographic information about the different Maryland counties, including latitude and longitude. This dataset is called “US County Boundaries” and was downloaded from opendatasoft.com.
setwd("/Users/andrewkadish/Desktop/Loyola/DS736/")
library(dplyr)
library(data.table)
library(ggplot2)
library(scales)
library(ggrepel)
library(plotly)
library(leaflet)
library(stringr)
crimes <- data.frame(fread('MD_Crime_Data.csv')) # pulls the file into a data frame
# gets rid of percent and per 100,000 people columns in 'crimes', easier to work with
to_drop <- names(crimes) %like% 'PERCENT' | names(crimes) %like% 'PEOPLE' # Boolean list
ind_to_drop <- which(to_drop) # index list for TRUE
cols_to_drop <- names(crimes)[ind_to_drop] # column names for indices
crimes <- crimes[, !names(crimes) %in% cols_to_drop] # rewrites 'crimes'
# pulling in geographic coordinate data for Maryland counties
geo_data <- data.frame(fread('us-county-boundaries.csv'))
Below are graphical representations of the data from the two datasets listed above, arranged to give insight into crime patterns and to connect them to the Maryland jurisdictions measured. Click the different tabs to delve into the different layers of the analysis.
# creating a new data frame for the overall crime totals, in decreasing order, for bar chart
overall <- crimes %>%
select(JURISDICTION, GRAND.TOTAL) %>%
group_by(JURISDICTION) %>%
summarise(total = sum(GRAND.TOTAL), .groups = 'keep') %>%
data.frame()
overall <- overall[order(overall$total, decreasing = TRUE),]
# plots the first bar chart for ordered grand totals of crimes
ggplot(overall, aes(x = reorder(JURISDICTION, total), y = total)) +
geom_bar(stat = 'identity', position = 'stack',
color = 'black', fill = 'darkred') +
coord_flip() +
geom_text(data = overall,
aes(x = JURISDICTION,
y = total,
label=scales::comma(total),
fill=NULL),
hjust=-0.1, size = 3) +
scale_y_continuous(labels = comma,
breaks = seq(0, max(overall$total)+250000, 500000),
limits = c(0, max(overall$total)+100000)) +
labs(x = 'Maryland Jurisdiction',
y = 'Total Crimes (All-Time)',
title = 'All-Time Crime Counts by Maryland Jurisdiction') +
theme_light() +
theme(plot.title = element_text(hjust = 0.5))
The above graph is a bar chart representation of total crimes in different Maryland jurisdictions, which includes all Maryland counties and Baltimore City, from the year 1975 to the year 2020. It has been ordered from highest to lowest overall crime count in order to provide insight into which jurisdictions see the most crime.
The results of this analysis show that Baltimore City, the only jurisdiction that is a city and not a county, has the highest overall crime count. Prince George’s County is the county with the highest overall crime, followed by Baltimore County, Montgomery County, Anne Arundel County, and Howard County.
Baltimore City and these top 5 counties will be examined by geographic location in order to glean further insight into crime trends.
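Placing jurisdictions on a map depends on the jurisdiction names matching exactly between the crime data and the geographic data. The following is a minimal sketch of how such a check might look, using two small hypothetical data frames (`crime_names` and `geo_names` are illustrative stand-ins, not the real datasets):

```r
library(dplyr)

# Hypothetical stand-ins for the jurisdiction names in each dataset
crime_names <- data.frame(
  JURISDICTION = c('Baltimore City', 'Howard County', 'Kent County'),
  stringsAsFactors = FALSE
)
geo_names <- data.frame(
  JURISDICTION = c('Baltimore city', 'Howard County', 'Kent County'),
  stringsAsFactors = FALSE
)

# Rows in the crime data with no exact-match partner in the geographic data
unmatched <- anti_join(crime_names, geo_names, by = 'JURISDICTION')
print(unmatched$JURISDICTION)

# Comparing lower-cased names shows whether the mismatch is only a case difference
case_only <- tolower(unmatched$JURISDICTION) %in% tolower(geo_names$JURISDICTION)
print(case_only)
```

An empty `unmatched` result would mean every jurisdiction can be joined as-is. A name that fails only on case can be corrected before joining.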
# updating the geographic data to only contain the desired columns
geo_data <- geo_data %>%
select(NAMELSAD, CLASSFP, INTPTLAT, INTPTLON) %>%
data.frame()
# rewriting the class column to the corresponding jurisdiction type
geo_data$CLASSFP <- ifelse(geo_data$CLASSFP == 'C7',
'City', # C7 is type city
'County') # H1 is type county
# updating 'Baltimore city' as the case doesn't match the other data set's value
geo_data$NAMELSAD <- ifelse(geo_data$NAMELSAD == 'Baltimore city',
'Baltimore City',
geo_data$NAMELSAD)
# updating the column names
colnames(geo_data) <- c('JURISDICTION', 'Type', 'Lat', 'Long')
maps <- left_join(overall, geo_data, by = 'JURISDICTION') # joining on JURISDICTION
# top 5 counties by total crimes ('maps' keeps the decreasing order of 'overall')
t5c <- head(maps$JURISDICTION[maps$Type == 'County'], 5)
leaflet() %>%
addTiles() %>%
setView(lng = -76.8, lat = 39.15, zoom = 9.25) %>%
addCircles(maps$Long, maps$Lat, stroke = FALSE,
popup = paste('<strong>', maps$JURISDICTION, '</strong>',
'<br>',
'Total Crimes: ',
scales::comma(maps$total)),
label = maps$JURISDICTION,
radius = 7*sqrt(maps$total),
color = ifelse(maps$Type == 'City',
'maroon',
ifelse(maps$JURISDICTION %in% t5c,
'red',
'coral')),
fillOpacity = 0.5) %>%
addLegend('bottomleft',
colors = c('maroon', 'red', 'coral'),
labels = c('City',
'Top-5 County by Total Crimes',
'Lower-Crime County'),
title = 'Jurisdiction Type<br>
<span style="white-space: nowrap;">
<i>Radius ~ Total Crimes</i>
</span>',
opacity = 0.5)
The map above is centered on the 5 counties identified as having the top overall crime counts, as well as the city of Baltimore. Different colors denote the category of each jurisdiction: city, county with high relative crime, or county with low relative crime. The area of each circle marker is proportional to the overall crime count, since the radius is scaled by the square root of the count.
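The relationship between marker size and crime count can be checked with a little arithmetic. The sketch below uses hypothetical totals (not values from the dataset) with the same `7 * sqrt(total)` scaling as the map code, and shows that quadrupling the total doubles the radius and quadruples the area:

```r
# Hypothetical crime totals chosen so the ratios are easy to read
totals <- c(small = 100000, medium = 400000, large = 1600000)

# Same scaling as the map: radius grows with the square root of the total
radius <- 7 * sqrt(totals)

# Circle area is pi * r^2, so area per crime is the same constant for every marker
area <- pi * radius^2
area_per_crime <- area / totals

print(radius / radius[['small']])  # radius ratios
print(area / area[['small']])      # area ratios match the ratios of the totals
print(area_per_crime)              # identical across markers
```

Scaling the radius by the square root is a common choice for proportional-symbol maps because readers tend to compare markers by area rather than by width.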
A clear conclusion from this geographical visualization is that higher crime counts are somewhat influenced by proximity to large urban centers. Baltimore City, the highest-crime jurisdiction, is itself the largest, most heavily populated urban center in Maryland. Looking outside the city, all the counties that follow in overall crime count are relatively close to a large urban center, be that Baltimore City or Washington, D.C. D.C. is unmarked as it is not in Maryland, but it is the next-largest relevant urban center outside of Baltimore City, and seems to exert influence on the crime counts of the surrounding areas.
As Howard County is relatively low in overall crime counts compared to the top 5 jurisdictions, it is excluded from the further deep-dives (see “Overall Crime Counts by Jurisdiction” for a clearer visual of this).
t5j <- head(overall$JURISDICTION, 5) # top 5 crime-riddled jurisdictions
# new data frame for top 5 crime-riddled jurisdictions
j_top5tot <- filter(crimes, JURISDICTION %in% t5j)
# finds the highest value for each of the top 5 jurisdictions, for line chart
peaks <- j_top5tot %>%
select(YEAR, GRAND.TOTAL, JURISDICTION) %>%
group_by(JURISDICTION) %>%
filter(GRAND.TOTAL == max(GRAND.TOTAL)) %>%
data.frame()
# plots line chart with top 5 jurisdictions by crime count, includes peaks and associated values
ggplot(j_top5tot, aes(x = YEAR, y = GRAND.TOTAL, group = JURISDICTION)) +
geom_line(aes(color = JURISDICTION), linewidth = 1) + # 'linewidth' replaces 'size' for lines (ggplot2 >= 3.4.0)
geom_point(data = peaks, shape = 21, size = 3, color = 'black', fill = 'white') +
geom_label_repel(data = peaks, aes(label = stringr::str_wrap(paste('Year: ',
YEAR,
'Crimes: ',
scales::comma(GRAND.TOTAL)),
14)),
box.padding = 1, point.padding = 0.4, fontface = 'bold',
size = 3, color = 'black', fill = alpha(c("white"),0.7)) +
labs(x = 'Year', y = 'Total Crimes (w/ Peaks)',
title = 'Maryland Crime Trends by Jurisdiction (Top 5)', color = 'Jurisdiction') +
scale_x_continuous(breaks = seq(min(j_top5tot$YEAR), max(j_top5tot$YEAR), 5)) +
scale_y_continuous(labels = comma,
breaks = seq(10000, max(j_top5tot$GRAND.TOTAL)+50000, 10000)) +
theme_light() +
theme(plot.title = element_text(hjust = 0.5))
The above line chart gives an indication of how crime counts have changed over time in the top 5 crime-riddled jurisdictions. Geometric points and labels are included to indicate the time at which crime counts peaked for each.
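Finding a peak per jurisdiction by filtering on the group maximum returns every row that ties for the maximum, which would place more than one label on a line. The sketch below uses hypothetical data (the `toy` data frame and its values are illustrative only) to contrast that approach with `dplyr::slice_max()` and `with_ties = FALSE`, which keeps a single row per group:

```r
library(dplyr)

# Hypothetical yearly totals; 'B County' hits the same maximum in two years
toy <- data.frame(
  JURISDICTION = rep(c('A County', 'B County'), each = 3),
  YEAR = rep(c(1990, 1995, 2000), times = 2),
  GRAND.TOTAL = c(500, 800, 600, 300, 700, 700),
  stringsAsFactors = FALSE
)

# Filtering on the group maximum keeps every row equal to that maximum
peaks_all <- toy %>%
  group_by(JURISDICTION) %>%
  filter(GRAND.TOTAL == max(GRAND.TOTAL)) %>%
  data.frame()

# Alternative: keep exactly one peak row per jurisdiction, even on ties
peaks_one <- toy %>%
  group_by(JURISDICTION) %>%
  slice_max(GRAND.TOTAL, n = 1, with_ties = FALSE) %>%
  data.frame()

print(nrow(peaks_all))
print(nrow(peaks_one))
```

Which behavior is preferable depends on the chart: showing all tied years is more complete, while a single row per jurisdiction keeps the labels uncluttered.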
As expected, Baltimore City has the highest number of crimes recorded in a single year, with its peak in 1995. Following this point, however, crimes in Baltimore City took a sharp downturn. Most jurisdictions experienced similar downward trends after their peaks. This was not as clearly the case for Anne Arundel County, which remained relatively stagnant for a longer period afterward, but this jurisdiction still saw a significant decrease from its peak in 1975. By 2005, these top jurisdictions were all generally moving downward in their overall crime counts. However, falling counts do not necessarily indicate a reduction in the crime rate, as they may be attributable to population decreases over time. To provide clarity, population trends are analyzed using the graph below:
# gets the peak values for population
pop_peaks <- j_top5tot %>%
select(YEAR, POPULATION, JURISDICTION) %>%
group_by(JURISDICTION) %>%
filter(POPULATION == max(POPULATION)) %>%
data.frame()
# plots line chart with top 5 jurisdictions by population, includes peaks and associated values
ggplot(j_top5tot, aes(x = YEAR, y = POPULATION, group = JURISDICTION)) +
geom_line(aes(color = JURISDICTION), linewidth = 1) + # 'linewidth' replaces 'size' for lines (ggplot2 >= 3.4.0)
geom_point(data = pop_peaks, shape = 21, size = 3, color = 'black', fill = 'white') +
geom_label_repel(data = pop_peaks, aes(label = stringr::str_wrap(paste('Year: ',
YEAR,
'Population: ',
scales::comma(POPULATION)),
20)),
box.padding = 1, point.padding = 0.4, fontface = 'bold',
size = 3, color = 'black', fill = alpha(c("white"),0.7)) +
labs(x = 'Year', y = 'Population (w/ Peaks)',
title = 'Population Trends for Top 5 Crime-Riddled Jurisdictions', color = 'Jurisdiction') +
scale_x_continuous(breaks = seq(min(j_top5tot$YEAR), max(j_top5tot$YEAR), 5)) +
scale_y_continuous(labels = comma) +
theme_light() +
theme(plot.title = element_text(hjust = 0.5))
From the above graph, it is clear that the population of most of the top 5 crime-riddled jurisdictions actually increased over time, with Baltimore City being the exception. This indicates that there is some legitimate crime reduction going on in the counties, whereas this may not necessarily be the case for Baltimore City, as fewer people are likely to mean fewer potential crimes.
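One way to separate a change in crime from a change in population is to compare raw counts with a population-adjusted rate. The sketch below is illustrative only: the jurisdiction names and figures are hypothetical, and crimes per 100,000 residents is assumed as the rate convention.

```r
# Hypothetical figures for two jurisdictions at two points in time
toy <- data.frame(
  JURISDICTION = c('Shrinking City', 'Shrinking City', 'Growing County', 'Growing County'),
  YEAR = c(1995, 2020, 1995, 2020),
  GRAND.TOTAL = c(90000, 45000, 40000, 30000),
  POPULATION = c(720000, 600000, 700000, 900000),
  stringsAsFactors = FALSE
)

# Crimes per 100,000 residents puts jurisdictions of different sizes on one scale
toy$rate_per_100k <- toy$GRAND.TOTAL / toy$POPULATION * 100000

# Percent change between the first and last value of a vector
pct_change <- function(x) (x[length(x)] - x[1]) / x[1] * 100

# Change in raw counts versus change in the population-adjusted rate
count_change <- tapply(toy$GRAND.TOTAL, toy$JURISDICTION, pct_change)
rate_change  <- tapply(toy$rate_per_100k, toy$JURISDICTION, pct_change)

print(round(count_change, 1))
print(round(rate_change, 1))
```

In this toy data the shrinking jurisdiction's rate falls less than its count (40% versus 50%), while the growing jurisdiction's rate falls more than its count (about 41.7% versus 25%). This is the pattern the population chart hints at.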
# creating individual data frames for each crime type, creating a type column
t5murder <- crimes %>%
filter(JURISDICTION %in% t5j) %>% # checks to see that the jurisdiction is in the top 5
select(JURISDICTION, YEAR, POPULATION, MURDER) %>%
group_by(JURISDICTION, YEAR, POPULATION) %>%
summarise(n = sum(MURDER), .groups = 'keep') %>%
mutate(Type = 'Murder') %>%
data.frame()
t5rape <- crimes %>%
filter(JURISDICTION %in% t5j) %>% # checks to see that the jurisdiction is in the top 5
select(JURISDICTION, YEAR, POPULATION, RAPE) %>%
group_by(JURISDICTION, YEAR, POPULATION) %>%
summarise(n = sum(RAPE), .groups = 'keep') %>%
mutate(Type = 'Rape') %>%
data.frame()
t5rob <- crimes %>%
filter(JURISDICTION %in% t5j) %>% # checks to see that the jurisdiction is in the top 5
select(JURISDICTION, YEAR, POPULATION, ROBBERY) %>%
group_by(JURISDICTION, YEAR, POPULATION) %>%
summarise(n = sum(ROBBERY), .groups = 'keep') %>%
mutate(Type = 'Robbery') %>%
data.frame()
t5agg <- crimes %>%
filter(JURISDICTION %in% t5j) %>%
select(JURISDICTION, YEAR, POPULATION, AGG..ASSAULT) %>%
group_by(JURISDICTION, YEAR, POPULATION) %>%
summarise(n = sum(AGG..ASSAULT), .groups = 'keep') %>%
mutate(Type = 'Agg. Assault') %>%
data.frame()
t5be <- crimes %>%
filter(JURISDICTION %in% t5j) %>%
select(JURISDICTION, YEAR, POPULATION, B...E) %>%
group_by(JURISDICTION, YEAR, POPULATION) %>%
summarise(n = sum(B...E), .groups = 'keep') %>%
mutate(Type = 'Breaking & Entering') %>%
data.frame()
t5larc <- crimes %>%
filter(JURISDICTION %in% t5j) %>%
select(JURISDICTION, YEAR, POPULATION, LARCENY.THEFT) %>%
group_by(JURISDICTION, YEAR, POPULATION) %>%
summarise(n = sum(LARCENY.THEFT), .groups = 'keep') %>%
mutate(Type = 'Larceny') %>%
data.frame()
t5mv <- crimes %>%
filter(JURISDICTION %in% t5j) %>%
select(JURISDICTION, YEAR, POPULATION, M.V.THEFT) %>%
group_by(JURISDICTION, YEAR, POPULATION) %>%
summarise(n = sum(M.V.THEFT), .groups = 'keep') %>%
mutate(Type = 'MV Theft') %>%
data.frame()
# create a new data frame combining the new t5 data frames
crime_types <- rbind(t5murder, t5rape, t5rob, t5agg, t5be, t5larc, t5mv)
# keep only the rows for 2020
crime_types_2020 <- filter(crime_types, YEAR == 2020)
# creating a trellis chart for the crime types in each top jurisdiction in 2020
plot_ly(textposition = 'inside', values = ~n, labels = ~Type) %>%
add_pie(data = crime_types_2020[crime_types_2020$JURISDICTION == 'Anne Arundel County',],
title = 'Anne Arundel County',
domain = list(row = 0, column = 0),
hovertemplate = 'Type: %{label}<br>Count: %{value:,.0f}<br>Percentage: %{percent}<extra></extra>') %>%
add_pie(data = crime_types_2020[crime_types_2020$JURISDICTION == 'Baltimore City',],
title = 'Baltimore City',
domain = list(row = 0, column = 1),
hovertemplate = 'Type: %{label}<br>Count: %{value:,.0f}<br>Percentage: %{percent}<extra></extra>') %>%
add_pie(data = crime_types_2020[crime_types_2020$JURISDICTION == 'Baltimore County',],
title = 'Baltimore County',
domain = list(row = 0, column = 2),
hovertemplate = 'Type: %{label}<br>Count: %{value:,.0f}<br>Percentage: %{percent}<extra></extra>') %>%
add_pie(data = crime_types_2020[crime_types_2020$JURISDICTION == 'Montgomery County',],
title = 'Montgomery County',
domain = list(row = 1, column = 0),
hovertemplate = 'Type: %{label}<br>Count: %{value:,.0f}<br>Percentage: %{percent}<extra></extra>') %>%
add_pie(data = crime_types_2020[crime_types_2020$JURISDICTION == "Prince George's County",],
title = "Prince George's County",
domain = list(row = 1, column = 1),
hovertemplate = 'Type: %{label}<br>Count: %{value:,.0f}<br>Percentage: %{percent}<extra></extra>') %>%
layout(title = 'Crime Type Distribution in Top Jurisdictions',
showlegend = TRUE, grid = list(rows = 2, columns = 3))
From the above trellis chart, the top crime category for all top jurisdictions in 2020 was larceny, with the highest percentage seen in Montgomery County. The runner-up for all jurisdictions was breaking and entering, followed by other types of robbery or aggravated assault. Rape and murder were the least common crimes across the board.
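The percentages shown in each pie are each crime type's count divided by the jurisdiction's total. The sketch below computes the same shares directly in a table, using hypothetical counts in the same long layout (`JURISDICTION`, `Type`, `n`) as the data behind the chart:

```r
library(dplyr)

# Hypothetical 2020 counts, one row per jurisdiction and crime type
toy <- data.frame(
  JURISDICTION = rep(c('A County', 'B County'), each = 3),
  Type = rep(c('Larceny', 'Breaking & Entering', 'Robbery'), times = 2),
  n = c(600, 300, 100, 450, 350, 200),
  stringsAsFactors = FALSE
)

# Share of each crime type within its own jurisdiction
shares <- toy %>%
  group_by(JURISDICTION) %>%
  mutate(share = n / sum(n)) %>%
  ungroup() %>%
  data.frame()

# The most common type per jurisdiction, with its share
top_type <- shares %>%
  group_by(JURISDICTION) %>%
  slice_max(share, n = 1, with_ties = FALSE) %>%
  data.frame()

print(top_type)
```

A table like `top_type` makes it easier to compare the leading category across jurisdictions than reading percentages off separate pies.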
While this measure captures the proportion of crimes of a certain type, the next visual will take a deeper dive into the common top crime category: larceny.
# creating a heat map of larceny rates in the different top jurisdictions from 2010 onward
larc_2010s <- crime_types[crime_types$YEAR >= 2010 & crime_types$Type == 'Larceny',]
larc_2010s$larc_rate <- larc_2010s$n / larc_2010s$POPULATION
larc_2010s$YEAR <- as.factor(larc_2010s$YEAR)
ggplot(larc_2010s, aes(x = YEAR, y = JURISDICTION, fill = larc_rate)) +
geom_tile(color = 'black') +
geom_text(aes(label = scales::percent(larc_rate, accuracy = 0.01))) +
labs(title = 'Larceny Rates Across Jurisdictions w/ Top Crime Counts',
x = 'Year', y = 'Jurisdiction',
fill = 'Larceny Rate') +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5)) +
scale_fill_gradient(low = 'white', high = 'darkred')
The above heat map, which focuses further on larceny rates across the top 5 crime-riddled jurisdictions, demonstrates (unsurprisingly) that Baltimore City consistently has the highest rates of larceny, hovering close to 3%, or ~3 larceny counts for every 100 people. The exception is 2010, when the highest rate belonged by a slim margin to Prince George’s County.
It is also apparent that Prince George’s County, along with Anne Arundel County, saw a steady decrease in larceny rates across the 2010s. Other jurisdictions did not see as strong or as consistent a downward trend, though they did tend to end lower than they started. This suggests that the larceny figures behind the trellis chart for “2020 Crime Type Distribution” reflect rates that are likely the lowest of this period for all jurisdictions, and may indicate that these areas are experiencing success with the reduction of larceny as a top crime category. Because the heat map measures larcenies per resident rather than larceny as a share of all crimes, it does not by itself show whether larceny’s share of total crime has changed.
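How steady a decline is can also be put into numbers rather than read off the tile colors. The sketch below is one possible approach, not the method used in this analysis. It uses hypothetical rates, fits a least-squares slope per jurisdiction, and counts year-over-year decreases. `YEAR` is kept numeric here, whereas the heat map converts it to a factor for plotting.

```r
# Hypothetical larceny rates (larcenies per resident) for two jurisdictions
toy <- data.frame(
  JURISDICTION = rep(c('A County', 'B County'), each = 5),
  YEAR = rep(2010:2014, times = 2),
  larc_rate = c(0.030, 0.028, 0.026, 0.024, 0.022,   # falls every year
                0.025, 0.027, 0.024, 0.026, 0.023),  # noisy, slightly lower at the end
  stringsAsFactors = FALSE
)

# Least-squares slope of rate against year: average change in rate per year
slope_per_year <- sapply(split(toy, toy$JURISDICTION), function(d) {
  unname(coef(lm(larc_rate ~ YEAR, data = d))[['YEAR']])
})

# Count of year-over-year decreases, a rough gauge of how consistent the decline is
n_decreases <- sapply(split(toy, toy$JURISDICTION), function(d) {
  sum(diff(d$larc_rate[order(d$YEAR)]) < 0)
})

print(signif(slope_per_year, 3))
print(n_decreases)
```

In this toy data both jurisdictions end lower than they started, but only 'A County' declines in every year. That is the distinction drawn above between a steady decrease and a weaker, less consistent one.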
The analysis conducted in the above charts yields a couple of key conclusions: crime is most abundant in and around large urban centers, and overall crime has decreased somewhat steadily over the years in the areas where it is most prominent. For Baltimore City, this may be due in part to a reduction in population over the years. For the other measured counties, in which populations have risen, the crime count reduction seems to be happening in earnest.