Introduction

I moved to Atlanta in the summer of 2013 for graduate school and immediately fell in love with the city. With its Southern hospitality and pride, lots of trees, hilly roads, winding streets, and a ton of cultural activities, Atlanta is a vibrant city with lots to offer. However, one particularly annoying 💩 aspect of living in Atlanta (and this is true of any large city, really) is its relatively high crime rate: 1,433 per 100,000 residents, as reported by Forbes.

My personal interest in crime in this city stems from having had my car broken into several times over the last couple of years. I was able to dig up some crime statistics on the Atlanta Police Department website and decided to spend a few minutes mapping them. Then I thought, “Hmm, this could make for a nice #rstats tutorial.” So here is a mini geospatial data mapping exercise.

Note: In the absence of population data per district (and because I’m too busy and lazy to dig them up), we will focus only on how to make a base map and add a layer showing where crimes were reported. Because the data are not normalized by population, we will not be able to reach any meaningful conclusions about per-capita crime rates in different parts of the city or compare areas against each other. So, with that out of the way, let’s get moving.

Data and Libraries

I downloaded the COBRA-YTD2017 dataset from the Atlanta Police Department website and loaded the following libraries:

library(tidyverse)
library(ggmap)
library(readxl)
library(kableExtra)
library(knitr)

Descriptives and Visualization

Let’s load our dataset and take a closer look at it.

# Read the second worksheet of the downloaded workbook
atl_crime_data <- read_excel("ATL_CRIME_2017.xlsx", sheet=2)
glimpse(atl_crime_data)
## Observations: 26,759
## Variables: 23
## $ MI_PRINX          <chr> "8924155", "8924156", "8924157", "8924158", ...
## $ offense_id        <chr> "173650072", "173650102", "173650144", "1736...
## $ rpt_date          <chr> "12/31/2017", "12/31/2017", "12/31/2017", "1...
## $ occur_date        <chr> "12/30/2017", "12/18/2017", "12/30/2017", "1...
## $ occur_time        <chr> "23:15:00", "13:00:00", "22:01:00", "20:00:0...
## $ poss_date         <chr> "12/31/2017", "12/30/2017", "12/31/2017", "1...
## $ poss_time         <chr> "00:30:00", "22:00:00", "01:00:00", "01:06:0...
## $ beat              <chr> "510", "501", "303", "507", "409", "612", "6...
## $ apt_office_prefix <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ apt_office_num    <chr> NA, NA, NA, NA, NA, NA, "13", NA, NA, "8", N...
## $ location          <chr> "43 JESSE HILL JR DR NE", "1169 ATLANTIC DR ...
## $ MinOfucr          <chr> "0640", "0640", "0640", "0640", "0640", "065...
## $ MinOfibr_code     <chr> "2305", "2305", "2305", "2305", "2305", "230...
## $ dispo_code        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ MaxOfnum_victims  <chr> "2", "1", "1", "1", "2", "1", "1", "1", "1",...
## $ Shift             <chr> "Morn", "Unk", "Morn", "Eve", "Morn", "Morn"...
## $ `Avg Day`         <chr> "Sat", "Unk", "Sat", "Sat", "Sun", "Sun", "S...
## $ loc_type          <chr> "13", "13", "18", "18", "18", "18", "26", "1...
## $ UC2_Literal       <chr> "LARCENY-FROM VEHICLE", "LARCENY-FROM VEHICL...
## $ neighborhood      <chr> "Downtown", "Home Park", "Mechanicsville", "...
## $ npu               <chr> "M", "E", "V", "M", "R", "W", "W", "M", "W",...
## $ x                 <chr> "-84.380129999999994", "-84.397450000000006"...
## $ y                 <chr> "33.75582", "33.786740000000002", "33.7376",...

Hmm, there were 26,759 police reports generated from incidents that took place in 2017 across the 6 APD zones. I am primarily interested in vehicle crimes, but before I can filter by crime type, I have to do some recoding. Variables x and y hold the longitude and latitude of the location where each crime occurred, while UC2_Literal holds the type of crime. I am also going to recode Avg Day, the variable that logs the day of the week on which the crime occurred. All four are character variables. We first need to convert the coordinates to numeric variables, and the crime type and weekday to factors.

# Coordinates become numbers; crime type and weekday become factors
atl_crime_data <- atl_crime_data %>%
  mutate(long=as.numeric(x),
         lat=as.numeric(y),
         type=as.factor(UC2_Literal),
         days=as.factor(`Avg Day`))
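One caveat about this step: as.numeric() turns any string it cannot parse into NA, usually with a warning. Some reports may therefore end up without usable coordinates, and the conclusion notes that rows with missing data were removed. Below is a minimal sketch of that check, run on a small hypothetical tibble rather than the real file. The first two coordinate strings are copied from the glimpse() output above; the blank and NA entries are made up to stand in for missing locations.

```r
library(dplyr)

# Hypothetical toy rows standing in for atl_crime_data. The first two
# coordinate pairs are copied from the glimpse() output; rows 3 and 4
# are invented to mimic missing or blank locations.
toy <- tibble(
  x = c("-84.380129999999994", "-84.397450000000006", "", NA),
  y = c("33.75582", "33.786740000000002", "33.7376", NA)
)

toy_clean <- toy %>%
  mutate(long = suppressWarnings(as.numeric(x)),  # unparseable -> NA
         lat  = suppressWarnings(as.numeric(y))) %>%
  filter(!is.na(long), !is.na(lat))              # keep mappable rows only

# How many rows were dropped because a coordinate was missing?
nrow(toy) - nrow(toy_clean)  # 2
```

On the real data, the same filter(!is.na(long), !is.na(lat)) line can be applied to atl_crime_data after the recoding step.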

Now, let’s see how many of each type of crime occurred in 2017, and then plot the frequencies by day of the week and crime type.

kable(count(atl_crime_data, type, sort=TRUE), "html", col.names=c("Crime Type", "Frequency")) %>%
  kable_styling(bootstrap_options="striped", full_width=FALSE)
Crime Type             Frequency
LARCENY-FROM VEHICLE        9840
LARCENY-NON VEHICLE         6589
AUTO THEFT                  3197
BURGLARY-RESIDENCE          2635
AGG ASSAULT                 2024
ROBBERY-PEDESTRIAN          1126
BURGLARY-NONRES              758
RAPE                         226
ROBBERY-COMMERCIAL           157
ROBBERY-RESIDENCE            132
HOMICIDE                      75

Interesting! Larceny from vehicles, which is a fancy way of saying car break-ins, is the most frequently reported type of crime: 9,840 of the 26,759 reports to the APD.
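To put "most frequent" in numbers, the frequencies in the table can be turned into shares of all reports. Here is a small sketch that simply copies the counts from the table above into a named vector:

```r
# Frequencies copied from the count table above (2017 COBRA data)
crime_counts <- c(
  "LARCENY-FROM VEHICLE" = 9840, "LARCENY-NON VEHICLE" = 6589,
  "AUTO THEFT" = 3197, "BURGLARY-RESIDENCE" = 2635,
  "AGG ASSAULT" = 2024, "ROBBERY-PEDESTRIAN" = 1126,
  "BURGLARY-NONRES" = 758, "RAPE" = 226,
  "ROBBERY-COMMERCIAL" = 157, "ROBBERY-RESIDENCE" = 132,
  "HOMICIDE" = 75
)

total <- sum(crime_counts)                # 26759, matching glimpse()
share <- round(100 * crime_counts / total, 1)
share[["LARCENY-FROM VEHICLE"]]           # 36.8 (% of all reports)

# Combined share of the two vehicle categories mapped later on
vehicle <- sum(crime_counts[c("LARCENY-FROM VEHICLE", "AUTO THEFT")])
round(100 * vehicle / total, 1)           # 48.7
```

With the full data frame loaded, count(atl_crime_data, type, sort=TRUE) %>% mutate(share=100 * n / sum(n)) should give the same percentages without retyping anything.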

atl_crime_data %>%
  count(days, type, name="freq") %>%    # reports per day and crime type
  ggplot(aes(reorder(days, -freq), freq)) +
  geom_col(aes(fill=type), position="dodge", width=0.8, color="black") +
  labs(x="Day of Week", y="Frequency", fill="Crime Type",
       title="Crime by Day of the Week")

Looks like Saturday is the peak day for overall crime. Okay, this is nice, but let’s move on to some city-level maps. First, we will create the base map on its own, before adding any longitude and latitude data.
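The bars are dodged by crime type, so "peak day for overall crime" is easiest to confirm by counting reports per day with no type split. The sketch below also gives the weekdays a calendar order instead of the frequency order used by reorder(). It runs on a small hypothetical tibble. Only "Sat", "Sun", and "Unk" are visible in the glimpse() output, so the other day labels are assumed to follow the same three-letter pattern. Any label missing from day_levels would silently become NA.

```r
library(dplyr)

# Hypothetical stand-in for atl_crime_data (one row per report)
toy <- tibble(
  `Avg Day` = c("Sat", "Sat", "Sat", "Fri", "Fri", "Mon", "Unk"),
  type      = c("AUTO THEFT", "LARCENY-FROM VEHICLE", "AGG ASSAULT",
                "LARCENY-FROM VEHICLE", "AUTO THEFT",
                "LARCENY-FROM VEHICLE", "AUTO THEFT")
)

# Calendar order; "Unk" (day unknown) is kept as its own level.
# Labels other than Sat/Sun/Unk are assumed abbreviations.
day_levels <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Unk")

day_totals <- toy %>%
  mutate(days = factor(`Avg Day`, levels = day_levels)) %>%
  count(days, name = "freq", sort = TRUE)  # totals across all crime types

day_totals
```

On the real data, count(atl_crime_data, days, sort=TRUE) gives the overall ranking directly.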

atlanta_map <- qmap("atlanta",
                    zoom=12,
                    source="stamen",
                    maptype="toner",
                    color="bw")

atlanta_map
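A heads-up if you run this with a current ggmap release: the qmap() call above may no longer work as written, for two reasons:

- qmap("atlanta", ...) has to geocode the word "atlanta". Newer ggmap versions route that lookup through Google, which needs a registered Google API key.
- Stamen's tiles are now served by Stadia Maps. Recent ggmap releases provide get_stadiamap() and register_stadiamaps() for this.

One possible workaround, sketched below under those assumptions, is to build the bounding box from the crime coordinates themselves and request Stamen toner tiles through Stadia. padded_bbox() is a hypothetical helper written for this example; ggmap's make_bbox() does much the same job. STADIA_API_KEY is an assumed environment-variable name. The download only runs when that key, ggmap, and atl_crime_data are all available.

```r
# Hypothetical helper: pad the coordinate range by a fraction f on each
# side and return it in the left/bottom/right/top form ggmap expects.
padded_bbox <- function(lon, lat, f = 0.05) {
  lon_range <- range(lon, na.rm = TRUE)
  lat_range <- range(lat, na.rm = TRUE)
  dx <- f * diff(lon_range)
  dy <- f * diff(lat_range)
  c(left  = lon_range[1] - dx, bottom = lat_range[1] - dy,
    right = lon_range[2] + dx, top    = lat_range[2] + dy)
}

# Only try to download tiles when a Stadia Maps key is set and the
# recoded crime data are loaded (STADIA_API_KEY is an assumed name).
stadia_key <- Sys.getenv("STADIA_API_KEY")
if (nzchar(stadia_key) && requireNamespace("ggmap", quietly = TRUE) &&
    exists("atl_crime_data")) {
  ggmap::register_stadiamaps(stadia_key, write = FALSE)
  atl_box <- padded_bbox(atl_crime_data$long, atl_crime_data$lat)
  atlanta_map <- ggmap::ggmap(
    ggmap::get_stadiamap(atl_box, zoom = 12, maptype = "stamen_toner")
  )
}
```

If the data contain stray coordinates far outside the city, the box (and the number of tiles fetched at zoom 12) will grow accordingly. Trimming such outliers before computing the box may be worthwhile.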

Our base map looks good in black and white. We will stick to this simple theme so our colored crime data can really pop. You can check the ggmap documentation for more sophisticated base plot themes if you decide to get more creative. Now we are ready to add a second layer to our plot. But first, let’s filter our dataset by crime types of interest.

atl_crime_vehicles <- atl_crime_data %>%
  select(long, lat, type) %>%
  filter(type %in% c("AUTO THEFT", "LARCENY-FROM VEHICLE"))

atlanta_map + geom_point(data=atl_crime_vehicles,
                         aes(x=long, y=lat, color=type),
                         alpha=0.2,
                         size=0.8) +
  labs(color="Crime Type") +
  theme(legend.position="bottom",
        legend.text=element_text(size=7)) 

We can repeat the above steps to map some of the more serious crimes in the Atlanta area, such as homicides, home burglaries, and aggravated assaults, and, finally, all crime combined.
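Rather than copying the filter-and-plot code once per map, the repeated steps could be wrapped in a small function. This is a sketch, not the code used for the original maps. select_crimes() and crime_points() are hypothetical helpers, and they assume the long, lat, and type columns created earlier. Passing types=NULL keeps every crime type, which covers the all-crime map.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: keep rows of the requested crime types that have
# usable coordinates (types = NULL keeps every type).
select_crimes <- function(data, types = NULL) {
  pts <- data %>%
    select(long, lat, type) %>%
    filter(!is.na(long), !is.na(lat))
  if (!is.null(types)) {
    pts <- filter(pts, type %in% types)
  }
  pts
}

# Hypothetical helper: the same point layer used for the vehicle map
crime_points <- function(data, types = NULL, alpha = 0.2, size = 0.8) {
  geom_point(data = select_crimes(data, types),
             aes(x = long, y = lat, color = type),
             alpha = alpha, size = size)
}

# Toy check with made-up rows (not real reports)
toy <- tibble(
  long = c(-84.38, -84.40, NA, -84.41),
  lat  = c(33.75, 33.79, 33.74, NA),
  type = c("HOMICIDE", "AUTO THEFT", "HOMICIDE", "HOMICIDE")
)
nrow(select_crimes(toy, "HOMICIDE"))  # 1
nrow(select_crimes(toy))              # 2
```

Usage would then look like atlanta_map + crime_points(atl_crime_data, "HOMICIDE") + labs(color="Crime Type"), with "BURGLARY-RESIDENCE" or "AGG ASSAULT" swapped in, or no types at all for every crime.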

Conclusions

A few words of caution: these maps depict only the crimes reported in the APD log books. How many more go unreported, we can’t be sure. Also, sexual assault crimes don’t come with longitude and latitude information (as they shouldn’t), to protect the survivors, so they are not depicted here. A number of rows with missing data were also removed. Finally, as noted by another data nerd from my neighborhood, “failing to normalize the data by population leads us to the wrong conclusion. It is not a surprise that crimes–including car crimes–occur more often where there are more people and less often where there are fewer people. Much of Atlanta’s south and west are relatively sparsely populated.” In other words, because the data are not normalized by population, there is not much we can say in terms of cross-comparison.
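The normalization the quote calls for is a simple rate: reports divided by residents, scaled to a common base such as the per-100,000 figure quoted in the introduction. The sketch below uses two made-up areas to show how a ranking by raw counts can flip once population is taken into account. The counts and populations are purely illustrative, since no population data were used in this post.

```r
# Reports per 100,000 residents
rate_per_100k <- function(reports, population) {
  1e5 * reports / population
}

# Purely hypothetical numbers for two areas, for illustration only
areas <- data.frame(
  area       = c("A", "B"),
  reports    = c(300, 120),
  population = c(30000, 6000)
)
areas$rate <- rate_per_100k(areas$reports, areas$population)

# A has more reports (300 vs 120), but B has the higher rate
# (2000 vs 1000 per 100,000 residents)
areas
```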