I recently moved to Denver and therefore thought it would be interesting to analyze the crime that occurs here.Denver (both County and City) provides public crime data on their website (https://www.denvergov.org/opendata/dataset/city-and-county-of-denver-crime). Although there are many claims one could make from this dataset, my goal is - via analysis - to either support or refute conjectures made about crime in the Denver area.
Per the City and County of Denver’s website,
“This
dataset includes criminal offenses in the City and County of Denver for
the previous five calendar years plus the current year to date. The data
is based on the national Incident Based Reporting System (NIBRS) which
includes all victims of person crimes and all crimes within an incident.
The data is dynamic, which allows for additions, deletions, and/or
modifications at any time, resulting in more accurate information in the
database. Due to continuous data entry, the number of records in
subsequent extractions are subject to change. Crime data is updated
Monday through Friday.
I’ve read through several articles from NPR and CPR. This
article for reference (https://www.cpr.org/2022/03/10/colorado-crime-rates/)
claims that crime began rising substantially before the pandemic began
in 2020 and that trend still continues to this day. Below, I’ve outlined
the takeaways from this article that I will address in my analysis.
‘Violent crime’ includes murder, aggravated assault, sexual
assault, and robbery has been on the rise from 2019 to 2021.
From
2019 to 2021, murder has increased by 47 percent (Colorado Bureau of
Investigation).
Property crime has increased by 20 percent.
Auto theft has increased by 86 percent.
The FBI analyzed
data from 2019 to 2020 and found that Colorado had seen the
fourth-highest increase in crimes in the United States. The top three
states were Pennsylvania, South Dakota, and Utah.
Per Niel Oza (who reached out to the City of Denver for clarification/variable definitions), I’ve included definitions of the variables in the crime dataset (https://www.kaggle.com/code/neilb4yourking/analyzing-denver-s-crime-data/notebook).
Variable Explanation:
OFFENSE_ID is a
unique identifier for each offense. It is generated by concatenating
INCIDENT_ID, OFFENSE_CODE, and OFFENSE_CODE_EXTENSION. It provides a
unique identifier for each offense.
INCIDENT_ID is an identifier for an occurence of
offenses. Most OFFENSE_ID’s have unique INCIDENT_ID’s, but when a person
commits multiple offenses at once, e.g. liquor possession and heroine
possession, multiple OFFENSE_ID’s will be generated from the
INCIDENT_ID.
OFFENSE_CODE is a unique
identifier for a particular type of offense. Things such as criminal
mischief, trespassing, larceny, etc. all have different OFFENSE_CODE
values to identify them.
OFFENSE_CODE_EXTENSION are used to describe a
subset of another type crime. For example criminal_mischief-motor
vehicle and criminal_mischief-other have the same OFFENSE_CODE but
different extensions to differentiate them.
OFFENSE_TYPE_ID provides the basic name for the
offense. Each combination of OFFENSE_CODE and OFFENSE_EXTENSION
reference a unique crime. Contents of this column include things such as
theft-shoplift, criminal-trespassing, and threats-to-injure.
OFFENSE_CATEGORY_ID provides a more general
categorization for crimes. For example, theft-shoplift and
theft-from-bldg are both forms of larceny.
FIRST_OCCURENCE_DATE is the first possible
date/time of the offense. If the time of the offense is known, the
LAST_OCCURENCE_DATE will have value NaN. If the time is not known,
FIRST_OCCURENCE_DATE will note the first possible time for the offense,
and LAST_OCCURENCE_DATE will be last possible time of the offense. This
commonly occurs with burglaries, where the exact time of the offense may
not be known, but a range of time is known.
LAST_OCCURENCE_DATE will be NaN if the exact time
of the offense is known and will be an actual time if only a range of
possible times is known. In the latter case, it will be the last
possible time the offense could have occured.
REPORTED_DATE is the time at which the offense was
reported to the police.
INCIDENT_ADDRESS
provides the location of the offense. Not all entries have a value
for this column for privacy reasons.
GEO_LON and
GEO_LAT are the latitudes and longitudes of the location of
the offense.
GEO_X and GEO_Y are the
state plane (city of Denver standard projection) for the offense
location. Functionally similar to GEO_LON and GEO_LAT.
DISTRICT_ID is the district in charge of handling
the offense.
PRECINCT_ID is the
precinct in charge of handling the offense.
NEIGHBORHOOD_ID is the neighborhood the offense
occurred in.
IS_CRIME states whether
the offense was a crime.
IS_TRAFFIC
states whether the offense was a traffic incident.
Sys.setenv(PATH = paste(Sys.getenv("PATH"), "path_to_pandoc", sep=.Platform$path.sep))
denver_crime_dataset_3 <- utils::read.csv("/Users/bethanyleach/crime-3.csv")
denver_offense_codes <- utils::read.csv("/Users/bethanyleach/offense_codes.csv")
denver_crime_revised_3 <- as_tibble(denver_crime_dataset_3)
str(denver_crime_revised_3)
## tibble [499,468 × 19] (S3: tbl_df/tbl/data.frame)
## $ incident_id : num [1:499468] 2.02e+10 2.02e+07 2.02e+07 2.02e+07 2.02e+07 ...
## $ offense_id : num [1:499468] 2.02e+16 2.02e+13 2.02e+13 2.02e+13 2.02e+13 ...
## $ OFFENSE_CODE : int [1:499468] 2999 2999 2999 2999 2999 2999 2999 2999 2999 2999 ...
## $ OFFENSE_CODE_EXTENSION: int [1:499468] 0 0 0 0 0 0 0 0 0 0 ...
## $ OFFENSE_TYPE_ID : chr [1:499468] "criminal-mischief-other" "criminal-mischief-other" "criminal-mischief-other" "criminal-mischief-other" ...
## $ OFFENSE_CATEGORY_ID : chr [1:499468] "public-disorder" "public-disorder" "public-disorder" "public-disorder" ...
## $ FIRST_OCCURRENCE_DATE : chr [1:499468] "1/4/2022 11:30:00 AM" "1/3/2022 6:45:00 AM" "1/3/2022 1:00:00 AM" "1/3/2022 7:47:00 PM" ...
## $ LAST_OCCURRENCE_DATE : chr [1:499468] "1/4/2022 12:00:00 PM" "" "" "" ...
## $ REPORTED_DATE : chr [1:499468] "1/4/2022 8:36:00 PM" "1/3/2022 11:01:00 AM" "1/3/2022 6:11:00 AM" "1/3/2022 9:12:00 PM" ...
## $ INCIDENT_ADDRESS : chr [1:499468] "128 S CANOSA CT" "650 15TH ST" "919 E COLFAX AVE" "2345 W ALAMEDA AVE" ...
## $ GEO_X : num [1:499468] 3135366 3142454 3147484 3136478 3169237 ...
## $ GEO_Y : num [1:499468] 1685410 1696151 1694898 1684414 1705800 ...
## $ GEO_LON : num [1:499468] -105 -105 -105 -105 -105 ...
## $ GEO_LAT : num [1:499468] 39.7 39.7 39.7 39.7 39.8 ...
## $ DISTRICT_ID : int [1:499468] 4 6 6 4 5 6 3 6 3 1 ...
## $ PRECINCT_ID : int [1:499468] 411 611 621 411 512 621 312 623 311 123 ...
## $ NEIGHBORHOOD_ID : chr [1:499468] "valverde" "cbd" "north-capitol-hill" "valverde" ...
## $ IS_CRIME : int [1:499468] 1 1 1 1 1 1 1 1 1 1 ...
## $ IS_TRAFFIC : int [1:499468] 0 0 0 0 0 0 0 0 0 0 ...
denver_offense_codes_revised <- as_tibble(denver_offense_codes)
str(denver_offense_codes_revised)
## tibble [299 × 9] (S3: tbl_df/tbl/data.frame)
## $ OBJECTID : int [1:299] 1 2 3 4 5 6 7 8 9 10 ...
## $ OFFENSE_CODE : int [1:299] 2804 2804 2901 2902 2903 2999 2999 2999 3501 3503 ...
## $ OFFENSE_CODE_EXTENSION: int [1:299] 1 2 0 0 0 0 1 2 0 0 ...
## $ OFFENSE_TYPE_ID : chr [1:299] "stolen-property-possession" "fraud-possess-financial-device" "damaged-prop-bus" "criminal-mischief-private" ...
## $ OFFENSE_TYPE_NAME : chr [1:299] "Possession of stolen property" "Possession of a financial device" "Damaged business property" "Criminal mischief to private property" ...
## $ OFFENSE_CATEGORY_ID : chr [1:299] "all-other-crimes" "all-other-crimes" "public-disorder" "public-disorder" ...
## $ OFFENSE_CATEGORY_NAME : chr [1:299] "All Other Crimes" "All Other Crimes" "Public Disorder" "Public Disorder" ...
## $ IS_CRIME : int [1:299] 1 1 1 1 1 1 1 1 1 1 ...
## $ IS_TRAFFIC : int [1:299] 0 0 0 0 0 0 0 0 0 0 ...
denver_offense_codes_revised$OFFENSE_CODE <- as.character(denver_offense_codes_revised$OFFENSE_CODE )
denver_crime_revised_3$OFFENSE_CODE <- as.character(denver_crime_revised_3$OFFENSE_CODE )
denver_crimes_codes_joined_3 <- inner_join(denver_crime_revised_3, denver_offense_codes_revised,
by = c("OFFENSE_CODE", "OFFENSE_CODE_EXTENSION", "OFFENSE_TYPE_ID",
"OFFENSE_CATEGORY_ID", "IS_CRIME", "IS_TRAFFIC"))
denver_crime_datetime_separate_3 <- denver_crimes_codes_joined_3 %>%
mutate(FIRST_OCCURRENCE_DATE= mdy_hms(FIRST_OCCURRENCE_DATE),
day = day(FIRST_OCCURRENCE_DATE),
month = month(FIRST_OCCURRENCE_DATE),
year = year(FIRST_OCCURRENCE_DATE),
dayofweek = wday(FIRST_OCCURRENCE_DATE),
minute = wday(FIRST_OCCURRENCE_DATE),
second = second(FIRST_OCCURRENCE_DATE))
The dataset I obtained consisted of 471,428 rows of offenses that were reported from 1/2/2017 to 6/12/2022. 77.6% of said offenses were labeled as a crime and 22.3% as traffic incidents. This difference in percentages is not so black and white because the data was aggregated in order to determine which were crimes and which were traffic related.
#Crime vs Traffic Percentage Graph
unique(denver_crimes_codes_joined_3$OFFENSE_CATEGORY_NAME)
## [1] "Public Disorder" "Drug & Alcohol"
## [3] "Sexual Assault" "All Other Crimes"
## [5] "Traffic Accident" "Robbery"
## [7] "Other Crimes Against Persons" "Aggravated Assault"
## [9] "Arson" "Burglary"
## [11] "Larceny" "Theft from Motor Vehicle"
## [13] "Auto Theft" "White Collar Crime"
## [15] "Murder"
denver_crimes_codes_joined_is_isnt_crime_3 <- denver_crimes_codes_joined_3
is_crime_traffic_3 <- denver_crimes_codes_joined_is_isnt_crime_3 %>%
group_by(IS_CRIME, IS_TRAFFIC) %>%
tally() %>%
complete(IS_TRAFFIC, fill = list(n=0)) %>%
plyr::mutate(percentage = n / sum(n) * 100)
crime_traffic_percentage_plot <- ggplot(is_crime_traffic_3, aes(IS_TRAFFIC, percentage, fill = IS_CRIME)) +
geom_bar(stat = 'identity', position = 'dodge') +
xlab("Crime Traffic") +
theme(legend.position="none", axis.ticks.x = element_blank(),
axis.text.x = element_blank(),
axis.title.x = element_text(angle = 0)) +
ylab("Percentage") + ggtitle("Percentage of Crimes vs Traffic Incidents")
crime_traffic_percentage_plot
While only two bars were graphed, the IS_CRIME and IS_TRAFFIC
columns are used to group the rows of data into three categories:
non-traffic criminal offenses, non-criminal traffic offenses, and
criminal traffic offenses. In this dataset, there were only 283
observations in the criminal traffic offenses category. This is such a
small percentage, so I omitted it from the plot.
denver_crime_datetime_separate_3$date <- as.Date(denver_crime_datetime_separate_3$FIRST_OCCURRENCE_DATE)
denver_crime_datetime_separate_3$time <- format(as.POSIXct(denver_crime_datetime_separate_3$FIRST_OCCURRENCE_DATE),
format = "%H:%M:%S")
first_500_denver_crime_datetime_separate_3 <- denver_crime_datetime_separate_3[1:500,]
first_500_denver_crime_datetime_separate_3$time <- as.character(first_500_denver_crime_datetime_separate_3$time)
first_500_denver_crime_datetime_separate_3
## # A tibble: 500 × 30
## incident_id offense_id OFFENSE_CODE OFFENSE_CODE_EXTENSION OFFENSE_TYPE_ID
## <dbl> <dbl> <chr> <int> <chr>
## 1 20226000193 2.02e16 2999 0 criminal-mischief…
## 2 20223319 2.02e13 2999 0 criminal-mischief…
## 3 20223093 2.02e13 2999 0 criminal-mischief…
## 4 20224000 2.02e13 2999 0 criminal-mischief…
## 5 20223956 2.02e13 2999 0 criminal-mischief…
## 6 20223903 2.02e13 2999 0 criminal-mischief…
## 7 20223899 2.02e13 2999 0 criminal-mischief…
## 8 20223888 2.02e13 2999 0 criminal-mischief…
## 9 20228085 2.02e13 2999 0 criminal-mischief…
## 10 20224563 2.02e13 2999 0 criminal-mischief…
## # … with 490 more rows, and 25 more variables: OFFENSE_CATEGORY_ID <chr>,
## # FIRST_OCCURRENCE_DATE <dttm>, LAST_OCCURRENCE_DATE <chr>,
## # REPORTED_DATE <chr>, INCIDENT_ADDRESS <chr>, GEO_X <dbl>, GEO_Y <dbl>,
## # GEO_LON <dbl>, GEO_LAT <dbl>, DISTRICT_ID <int>, PRECINCT_ID <int>,
## # NEIGHBORHOOD_ID <chr>, IS_CRIME <int>, IS_TRAFFIC <int>, OBJECTID <int>,
## # OFFENSE_TYPE_NAME <chr>, OFFENSE_CATEGORY_NAME <chr>, day <int>,
## # month <int>, year <int>, dayofweek <int>, minute <int>, second <int>, …
datatable(first_500_denver_crime_datetime_separate_3, options = list(pageLength = 25,scrollX='400px'))
denver_crime_datetime_separate_3 <- denver_crime_datetime_separate_3 %>%
filter(GEO_LON >= -105.3218 & GEO_LON <= -104.6096839)
denver_cols_map_crime_2 <- denver_crime_datetime_separate_3 %>%
filter(IS_TRAFFIC == 0)
Per the variable definitions noted above, it’s important to understand the difference between incident_id and offense_id. For each crime a person commits, an offense_id is generated. This means that if a person steals, is in possession of drugs, and commits murder, three different offense_ids will be created and similarly, there will be three instances of that same incident_id. This is the case for 11.5% of the incident_ids in this dataset. Therefore, I’ve chosen to acknowledge each row of data as a single crime. In a future criminal analysis, I’d like to delve into the crimes with multiple offense_ids in order to find which violations took place at the same time.
denver_cols_map_crime_2$summary_box <- paste("<b>Incident #: </b>", denver_cols_map_crime_2$incident_id,
"<br>", "<b>Incident Address: </b>", denver_cols_map_crime_2$INCIDENT_ADDRESS,
"<br>", "<b>Category: </b>", denver_cols_map_crime_2$OFFENSE_CATEGORY_ID,
"<br>", "<b>Day of the week: </b>", denver_cols_map_crime_2$dayofweek,
"<br>", "<b>Date: </b>", denver_cols_map_crime_2$date,
"<br>", "<b>Time: </b>", denver_cols_map_crime_2$time,
"<br>", "<b>Denver Neighborhood: </b>", denver_cols_map_crime_2$NEIGHBORHOOD_ID,
"<br>", "<b>Denver Police district ID #: </b>", denver_cols_map_crime_2$DISTRICT_ID,
"<br>", "<b>Longitude: </b>", denver_cols_map_crime_2$GEO_LON,
"<br>", "<b>Latitude: </b>", denver_cols_map_crime_2$GEO_LAT)
#Denver Crime subset - 40,000 samples - capacity of OpenStreetMap
denver_crime_subset <- denver_cols_map_crime_2[1:40000, ]
denver_crime_subset$summary_box <- paste("<b>Incident #: </b>", denver_crime_subset$incident_id,
"<br>", "<b>Incident Address: </b>", denver_crime_subset$INCIDENT_ADDRESS,
"<br>", "<b>Category: </b>", denver_crime_subset$OFFENSE_CATEGORY_ID,
"<br>", "<b>Day of the week: </b>", denver_crime_subset$dayofweek,
"<br>", "<b>Date: </b>", denver_crime_subset$date,
"<br>", "<b>Time: </b>", denver_crime_subset$time,
"<br>", "<b>Denver Neighborhood: </b>", denver_crime_subset$NEIGHBORHOOD_ID,
"<br>", "<b>Denver Police district ID #: </b>", denver_crime_subset$DISTRICT_ID,
"<br>", "<b>Longitude: </b>", denver_crime_subset$GEO_LON,
"<br>", "<b>Latitude: </b>", denver_crime_subset$GEO_LAT)
leaflet() %>%
addProviderTiles(providers$OpenStreetMap, group = "OSM") %>%
addTiles('http://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png')%>%
addMarkers(lng = denver_crime_subset$GEO_LON,
lat = denver_crime_subset$GEO_LAT,
popup = denver_crime_subset$summary_box,
clusterOptions = markerClusterOptions())