As the COVID-19 pandemic continues, I’ve had much more time to play around with data. Like many people, I was interested in looking at the coronavirus data for myself, especially with regard to my community and the communities of my family and friends.

Once quarantine procedures were put in place, I was also interested in seeing how this affected other areas of life, particularly crime. Baltimore City is unfortunately known to have a high crime rate, and I was curious whether the state-wide quarantine procedures put into effect in March would have any effect on the level of crime reported in the city.

If there was an effect, which types of crimes were affected, and how? And, if there was a change in crime rate, was there any way to really prove that it was due to COVID-19?

Data Sources

All data used for this project is free and publicly available. Here are my sources:

R Packages

The following R packages were used in the project:

library(tidyverse)
library(gridExtra)
library(knitr)
library(kableExtra)

This Year’s Data

I first uploaded data about reported crimes published by the Baltimore Police Department. Here is a sample of the data:

#March crime data from BPD Part 1 Victim Based Crime Data
crime <- read.csv("C:/Users/Morganak/Documents/R/Projects/COVID-19/crime_data/BPD_Part_1_Victim_Based_Crime_Data_mar2020.csv", stringsAsFactors = TRUE)

head(crime)
##   ï..CrimeDate CrimeTime CrimeCode           Location    Description
## 1   03/01/2020  20:00:00        4E  1600 N CHESTER ST COMMON ASSAULT
## 2   03/01/2020  03:00:00        7A   700 MELVILLE AVE     AUTO THEFT
## 3   03/01/2020  19:00:00        6E    200 S HILTON ST        LARCENY
## 4   03/01/2020  14:00:00        6G       800 GLADE CT        LARCENY
## 5   03/01/2020  12:40:00       8FV 3700 ELLERSLIE AVE          ARSON
## 6   03/01/2020  14:57:00        4B      500 E 26TH ST   AGG. ASSAULT
##   Inside.Outside Weapon Post  District         Neighborhood Longitude
## 1              I   <NA>  331   EASTERN        BROADWAY EAST -76.58853
## 2              O   <NA>  515  NORTHERN              WAVERLY -76.60595
## 3              O   <NA>  835 SOUTHWEST CARROLL-SOUTH HILTON -76.67207
## 4              I   <NA>  913  SOUTHERN             BROOKLYN -76.60081
## 5              I   FIRE  515  NORTHERN EDNOR GARDENS-LAKESI -76.60505
## 6              I  KNIFE  513  NORTHERN       BETTER WAVERLY -76.60935
##   Latitude Location.1           Premise vri_name1 Total.Incidents
## 1 39.30885         NA ROW/TOWNHOUSE-OCC                         1
## 2 39.33158         NA            STREET                         1
## 3 39.28278         NA            STREET                         1
## 4 39.22955         NA ROW/TOWNHOUSE-OCC                         1
## 5 39.33527         NA ROW/TOWNHOUSE-OCC                         1
## 6 39.31896         NA ROW/TOWNHOUSE-VAC                         1

Next, I did some slight reformatting…

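# The first column was read in as "ï..CrimeDate" (a byte-order-mark artifact), so rename it
names(crime)[1]<-"Date"
# Convert the date strings (e.g. "03/01/2020") to Date objects
crime$Date<-as.Date(crime$Date, format = "%m/%d/%Y")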

…followed by some exploring.

The first thing I wanted to see was the number of total crimes reported during March, 2020:

#summary tables of crimes
crime_count<-crime %>% group_by(Description) %>% tally()
sum(crime_count$n)#total crimes in March
## [1] 2632

That’s a lot of crime!

Next, I was interested in seeing the types of crimes that were being tracked, and how many crimes of each type had been reported:

kable(crime_count, col.names = c("Crime Type", "Count")) %>% kable_styling(bootstrap_options = "striped", full_width = FALSE)
Crime Type            Count
AGG. ASSAULT            401
ARSON                     8
AUTO THEFT              225
BURGLARY                291
COMMON ASSAULT          565
HOMICIDE                 16
LARCENY                 543
LARCENY FROM AUTO       209
RAPE                     11
ROBBERY - CARJACKING     35
ROBBERY - COMMERCIAL     43
ROBBERY - RESIDENCE      40
ROBBERY - STREET        196
SHOOTING                 49

Kind of hard to see which types stand out.
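One quick way to make the table itself easier to scan is to sort it by count and add each type's share of the total. Here is a minimal sketch in base R, with the counts re-entered by hand from the table above (the `counts_2020` and `sorted_counts` names and the `Share` column are my own additions for illustration):

```r
# Counts copied by hand from the March 2020 table above
counts_2020 <- data.frame(
  Description = c("AGG. ASSAULT", "ARSON", "AUTO THEFT", "BURGLARY",
                  "COMMON ASSAULT", "HOMICIDE", "LARCENY", "LARCENY FROM AUTO",
                  "RAPE", "ROBBERY - CARJACKING", "ROBBERY - COMMERCIAL",
                  "ROBBERY - RESIDENCE", "ROBBERY - STREET", "SHOOTING"),
  n = c(401, 8, 225, 291, 565, 16, 543, 209, 11, 35, 43, 40, 196, 49)
)

# Percentage of all reported crimes, rounded to one decimal place
counts_2020$Share <- round(100 * counts_2020$n / sum(counts_2020$n), 1)

# Largest categories first
sorted_counts <- counts_2020[order(-counts_2020$n), ]
head(sorted_counts, 3)
```

Sorted this way, common assault, larceny, and aggravated assault come out on top. With the real data, the same two steps could be applied to `crime_count` directly.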

Here is the same info in an easier-to-read graph:

#Crime by type
# Refer to columns by bare name inside aes(); color each bar by crime type
p<-ggplot(crime_count, aes(x = Description, y = n, fill = Description))+
  geom_col(show.legend = FALSE) +
  labs(title="March 2020", subtitle= "Source: data.baltimorecity.gov", x=NULL, y="Count") + coord_flip()
p

As you can see, certain crime types really stand out.

I was also curious to see how many crimes were occurring per day, regardless of type:

#Restructuring data to obtain daily counts
daily_crime<-crime %>% group_by(Date) %>% tally()# number crimes occurring by date

p20<-ggplot(data=daily_crime, aes(x=Date, y=n)) + geom_line(color = "#00AFBB", size = 2)+
  ggtitle("Baltimore Crimes Reported for March, 2020") + ylab("Crimes Reported") + xlab(NULL)+
  ylim(0,160) + 
  scale_x_date(breaks=as.Date(c("2020-03-01", "2020-03-05", "2020-03-10", "2020-03-15", "2020-03-20","2020-03-25")), date_labels = "%m/%d")
p20

Maryland began enforcing social distancing more strictly in mid-March. Many people began to work from home, and the governor closed schools and state offices.

At a glance, the upward crime trend does seem to stop in mid-March. But what was last year like? Maybe crime always decreases at that time of year.
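To go beyond eyeballing the line, one option is to compare the average number of crimes per day before and after a cutoff date. The sketch below uses made-up counts, not the BPD data, and the March 16 cutoff is only my assumption for "mid-March"; the `example_daily`, `cutoff`, and `period_means` names are hypothetical:

```r
# Made-up daily counts, for illustration only (NOT the BPD data)
example_daily <- data.frame(
  Date = seq(as.Date("2020-03-01"), as.Date("2020-03-28"), by = "day"),
  n = c(rep(100, 15), rep(80, 13))
)

# Assumed cutoff; swap in whichever date marks the change of interest
cutoff <- as.Date("2020-03-16")

# Label each day as falling before the cutoff or on/after it
example_daily$Period <- ifelse(example_daily$Date < cutoff, "before", "after")

# Mean reported crimes per day in each period
period_means <- tapply(example_daily$n, example_daily$Period, mean)
period_means
```

The same steps should work on `daily_crime`, since it has the same `Date` and `n` columns. A difference in averages would still not say anything about the cause.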

Last Year’s Data

To see what things were like last year, I pulled the same data, but this time filtered by March 2019. Then I ran the same code as above:

crime19<-read.csv("C:/Users/Morganak/Documents/R/Projects/COVID-19/crime_data/BPD_Part_1_Victim_Based_Crime_Data_mar2019.csv", stringsAsFactors = TRUE)

names(crime19)[1]<-"Date"
crime19$Date<-as.Date(crime19$Date, format = "%m/%d/%Y")

#summary tables of crimes in 2019
crime19_count<-crime19 %>% group_by(Description) %>% tally()
sum(crime19_count$n)#total crimes in March 2019
## [1] 3465

I can already see that the total number of crimes reported in March 2019 differs greatly from the number reported this year. Much more crime was reported in March 2019 than in March 2020.
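The raw totals are not quite a like-for-like comparison, though: as noted further down, the March 2020 data has no crimes reported after March 28th, while March 2019 covers the full month. Here is a rough sketch of a per-day comparison, assuming 28 days of data for 2020 and 31 for 2019 (the day counts are my assumption, not something read from the files):

```r
total_2019 <- 3465  # total for March 2019, from above
total_2020 <- 2632  # total for March 2020, from above

days_2019 <- 31  # assumed: a full month of data
days_2020 <- 28  # assumed: data ends on March 28

per_day_2019 <- total_2019 / days_2019
per_day_2020 <- total_2020 / days_2020

# Percent change in the raw totals vs. percent change in the daily average
pct_change_total <- 100 * (total_2020 - total_2019) / total_2019
pct_change_daily <- 100 * (per_day_2020 - per_day_2019) / per_day_2019

round(c(total = pct_change_total, daily = pct_change_daily), 1)
```

Under those assumptions, the total is about 24% lower and the daily average about 16% lower, so the drop is smaller per day than the raw totals suggest, but it is still a drop. With the actual data, `length(unique(crime$Date))` would give the real number of days covered.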

What about other crime stats? I’ll skip right to the bar plot this time.

# Refer to columns by bare name inside aes(); color each bar by crime type
p2<-ggplot(crime19_count, aes(x = Description, y = n, fill = Description))+
  geom_col(show.legend = FALSE) +
  labs(title="March 2019", subtitle= "Source: data.baltimorecity.gov", x=NULL, y="Count") + coord_flip()

require(gridExtra)
grid.arrange(p2,p, ncol=2)

The 2020 plot doesn’t look much different from 2019. My impression is that the types of crimes being committed haven’t changed much between last year and the present.

So, how do overall crime levels compare between years?

To see that, I made a line graph for March 2019 and put that next to the March 2020 line graph for quick comparison.

daily_crime19<-crime19 %>% group_by(Date) %>% tally()# number crimes occurring by date

#Side-by-Side Line graphs March 2019 and 2020 total crime levels
p19<-ggplot(data=daily_crime19, aes(x=Date, y=n)) + geom_line(color = "#00AFBB", size = 2)+
  ggtitle("March, 2019") + ylab("Crimes Reported") + xlab(NULL)+
  ylim(0,160) +
  scale_x_date(breaks=as.Date(c("2019-03-01", "2019-03-05", "2019-03-10", "2019-03-15", "2019-03-20", "2019-03-25")), date_labels = "%m/%d")

p20<-ggplot(data=daily_crime, aes(x=Date, y=n)) + geom_line(color = "#00AFBB", size = 2)+
  ggtitle("March, 2020") + ylab("Crimes Reported") + xlab(NULL)+
  ylim(0,160) + 
  scale_x_date(breaks=as.Date(c("2020-03-01", "2020-03-05", "2020-03-10", "2020-03-15", "2020-03-20","2020-03-25")), date_labels = "%m/%d")
               
require(gridExtra)
grid.arrange(p19,p20, ncol=2)

There definitely appears to be a difference between the end of March in 2019 and the end of March this year. However, I don’t really like how the x-axis breaks are spread out differently at the end of each month.

March 2019 had crimes reported after March 28th, but March 2020 didn’t, so the x-axis breaks are not evenly spaced.
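Another way to handle the mismatch would be to make the missing days explicit, by joining the daily counts onto a full calendar for the month. This is only a sketch with made-up counts, not the BPD data, and the `example_daily`, `full_month`, and `padded` names are hypothetical:

```r
# Made-up counts that stop on March 28 (NOT the BPD data)
example_daily <- data.frame(
  Date = seq(as.Date("2020-03-01"), as.Date("2020-03-28"), by = "day"),
  n = 90
)

# Every calendar day in March 2020
full_month <- data.frame(
  Date = seq(as.Date("2020-03-01"), as.Date("2020-03-31"), by = "day")
)

# Left join: days with no data get NA instead of being silently absent
padded <- merge(full_month, example_daily, by = "Date", all.x = TRUE)
tail(padded, 4)
```

Passing the same `limits` to `scale_x_date()` in both plots is another option for giving the two panels matching axes.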

I want people to focus on the content of the graph, not be distracted by the axis breaks, so I decided to try plotting both years on the same graph:

#new data without the year so differences on same days can be seen
daily_crime$Date<-format(daily_crime$Date, format="%m/%d")
daily_crime19$Date<-format(daily_crime19$Date, format="%m/%d")
all<-daily_crime19 %>% full_join(daily_crime, by = "Date")
names(all)[2]<-"Count2019"
names(all)[3]<-"Count2020"
all$Date<-as.Date(all$Date,"%m/%d")
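One thing to be aware of in that last line: when `as.Date()` parses a string with no year in it, R fills in the current year, so the dates behind the combined plot depend on when the code is run. That is harmless for a March-only comparison with month/day axis labels, but here is a small sketch of pinning the year explicitly instead (the `month_day`, `plot_year`, and `pinned` names are mine):

```r
month_day <- c("03/01", "03/15", "03/28")

# Prepend a fixed year so the result does not depend on today's date
plot_year <- 2020
pinned <- as.Date(paste(plot_year, month_day, sep = "/"), format = "%Y/%m/%d")
pinned
```

Pinning the year would matter more if the comparison ever included late February, since February 29 only parses in a leap year.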

Since both lines are on the same plot, it’s a good idea to choose colors that are color-blind friendly, as 1 in 20 people experience some form of colorblindness.

I used the color-picker by David Nichols that can be found at Coloring for Colorblindness.

colors<-c("2019"="#10A0BA" , "2020"="#D81B60")

# Date is already a Date column, so map columns by bare name inside aes()
p<-ggplot(data=all, aes(x=Date)) + geom_line(aes(y=Count2019, color="2019")) +
  geom_line(aes(y=Count2020, color="2020")) +
  scale_color_manual(values=colors, name=NULL) +
  labs(title="Baltimore Crime: March 2019 vs. March 2020", subtitle= "Source: data.baltimorecity.gov", x=NULL, y="Reported Crimes") 
p

Conclusion

The final graph is definitely easier to read! Well, sort of. If you wanted to compare the exact differences between dates, this would not be the graph of choice. It’s a little hard to read at the beginning of the month, when the crime rates are more similar.
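For exact day-by-day comparisons, a difference column is probably a better tool than the plot. Here is a minimal sketch using made-up counts in the same shape as the combined table (the `example_all` name, the `Diff` column, and the numbers are all hypothetical):

```r
# Made-up counts in the same shape as the combined table (NOT the BPD data)
example_all <- data.frame(
  Date = as.Date(c("2020-03-01", "2020-03-02", "2020-03-03")),
  Count2019 = c(110, 95, 120),
  Count2020 = c(105, 100, NA)  # NA: no 2020 data for that day
)

# Positive values mean more crimes were reported in 2020 than in 2019
example_all$Diff <- example_all$Count2020 - example_all$Count2019
example_all
```

Days that are missing from one year come out as `NA` rather than as a misleading number.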

But it gets the point across pretty quickly. There is a clear difference in the trend of Baltimore crime last year (blue line) and this year (red line).

Can that change be attributed directly to COVID-19? Probably not directly. But I’d be curious to see how the data compares to earlier years, as well as whether the downward trend continues through the summer.