Introduction

I recently moved to Denver, so I thought it would be interesting to analyze the crime that occurs here. The City and County of Denver publishes public crime data on its website (https://www.denvergov.org/opendata/dataset/city-and-county-of-denver-crime). Although there are many claims one could make from this dataset, my goal is to use analysis to either support or refute conjectures made about crime in the Denver area.

Denver’s Crime Data:

Per the City and County of Denver’s website,
“This dataset includes criminal offenses in the City and County of Denver for the previous five calendar years plus the current year to date. The data is based on the national Incident Based Reporting System (NIBRS) which includes all victims of person crimes and all crimes within an incident. The data is dynamic, which allows for additions, deletions, and/or modifications at any time, resulting in more accurate information in the database. Due to continuous data entry, the number of records in subsequent extractions are subject to change. Crime data is updated Monday through Friday.”

Analysis Questions:

I’ve read through several articles from NPR and CPR. One CPR article (https://www.cpr.org/2022/03/10/colorado-crime-rates/) claims that crime began rising substantially before the pandemic began in 2020 and that the trend continues to this day. Below, I’ve outlined the takeaways from this article that I will address in my analysis.
‘Violent crime’, which includes murder, aggravated assault, sexual assault, and robbery, has been on the rise from 2019 to 2021.
From 2019 to 2021, murder has increased by 47 percent (Colorado Bureau of Investigation).
Property crime has increased by 20 percent.
Auto theft has increased by 86 percent.
The FBI analyzed data from 2019 to 2020 and found that Colorado had seen the fourth-highest increase in crimes in the United States. The top three states were Pennsylvania, South Dakota, and Utah.
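
Each of the percentage claims above is a change relative to a baseline year. As a minimal sketch of that arithmetic (the helper name `pct_change` and the counts are hypothetical, chosen only to illustrate the formula; they are not figures from the article or from Denver's dataset):

```r
# Percent change from a baseline count to a later count.
# `pct_change` is a hypothetical helper, not part of any package.
pct_change <- function(baseline, later) {
  100 * (later - baseline) / baseline
}

# Made-up counts for illustration only
example_2019 <- 200
example_2021 <- 294
example_change <- pct_change(example_2019, example_2021)
example_change
```

The same function could be applied to yearly totals per offense category once the data is grouped by year.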

Variables Included in the Dataset:

Per Niel Oza (who reached out to the City of Denver for clarification/variable definitions), I’ve included definitions of the variables in the crime dataset (https://www.kaggle.com/code/neilb4yourking/analyzing-denver-s-crime-data/notebook).

Variable Explanation:
OFFENSE_ID is a unique identifier for each offense. It is generated by concatenating INCIDENT_ID, OFFENSE_CODE, and OFFENSE_CODE_EXTENSION.
INCIDENT_ID is an identifier for an occurrence of offenses. Most OFFENSE_IDs have unique INCIDENT_IDs, but when a person commits multiple offenses at once, e.g. liquor possession and heroin possession, multiple OFFENSE_IDs will be generated from the same INCIDENT_ID.
OFFENSE_CODE is a unique identifier for a particular type of offense. Things such as criminal mischief, trespassing, larceny, etc. all have different OFFENSE_CODE values to identify them.
OFFENSE_CODE_EXTENSION describes a subset of a broader type of crime. For example, criminal_mischief-motor vehicle and criminal_mischief-other have the same OFFENSE_CODE but different extensions to differentiate them.
OFFENSE_TYPE_ID provides the basic name for the offense. Each combination of OFFENSE_CODE and OFFENSE_CODE_EXTENSION references a unique crime. Contents of this column include values such as theft-shoplift, criminal-trespassing, and threats-to-injure.
OFFENSE_CATEGORY_ID provides a more general categorization for crimes. For example, theft-shoplift and theft-from-bldg are both forms of larceny.
FIRST_OCCURRENCE_DATE is the first possible date/time of the offense. If the time of the offense is known, LAST_OCCURRENCE_DATE will have the value NaN. If the time is not known, FIRST_OCCURRENCE_DATE will note the first possible time for the offense, and LAST_OCCURRENCE_DATE will be the last possible time of the offense. This commonly occurs with burglaries, where the exact time of the offense may not be known, but a range of time is known.
LAST_OCCURRENCE_DATE will be NaN if the exact time of the offense is known and will be an actual time if only a range of possible times is known. In the latter case, it will be the last possible time the offense could have occurred.
REPORTED_DATE is the time at which the offense was reported to the police.
INCIDENT_ADDRESS provides the location of the offense. Not all entries have a value for this column for privacy reasons.
GEO_LON and GEO_LAT are the longitude and latitude of the location of the offense.
GEO_X and GEO_Y are the state plane (city of Denver standard projection) for the offense location. Functionally similar to GEO_LON and GEO_LAT.
DISTRICT_ID is the district in charge of handling the offense.
PRECINCT_ID is the precinct in charge of handling the offense.
NEIGHBORHOOD_ID is the neighborhood the offense occurred in.
IS_CRIME states whether the offense was a crime.
IS_TRAFFIC states whether the offense was a traffic incident.
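
To make the OFFENSE_ID definition concrete, here is a minimal sketch of the concatenation. The helper `make_offense_id` is hypothetical, and the zero-padding widths (four digits for OFFENSE_CODE, two for OFFENSE_CODE_EXTENSION) are my assumption rather than something the City of Denver documents.

```r
# Hypothetical helper: build an OFFENSE_ID-style key as a string.
# Assumption: OFFENSE_CODE is zero-padded to 4 digits and
# OFFENSE_CODE_EXTENSION to 2 digits (not confirmed by the data provider).
make_offense_id <- function(incident_id, offense_code, offense_code_extension) {
  paste0(
    as.character(incident_id),
    sprintf("%04d", as.integer(offense_code)),
    sprintf("%02d", as.integer(offense_code_extension))
  )
}

# Pass the incident ID as a string so no digits are lost
example_id <- make_offense_id("20226000193", 2999, 0)
example_id

# Doubles cannot represent every integer above 2^53, so very long
# numeric IDs are safer stored as character than as numeric
precision_lost <- (2^53 + 1) == 2^53
precision_lost
```

Because the longest IDs have 17 digits, it may be worth reading both ID columns as character, for example through the `colClasses` argument of `read.csv`.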

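# Packages used throughout this report. plyr is loaded before dplyr so that
# dplyr's verbs (mutate, summarize) are the ones found by unqualified calls.
library(plyr)
library(dplyr)
library(tidyr)
library(tibble)
library(lubridate)
library(forcats)
library(ggplot2)
library(scales)
library(DT)
library(leaflet)
library(ggmap)
library(ggrepel)

Sys.setenv(PATH = paste(Sys.getenv("PATH"), "path_to_pandoc", sep=.Platform$path.sep))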

denver_crime_dataset_3 <- utils::read.csv("/Users/bethanyleach/crime-3.csv")
denver_offense_codes <- utils::read.csv("/Users/bethanyleach/offense_codes.csv")
denver_crime_revised_3 <- as_tibble(denver_crime_dataset_3)

str(denver_crime_revised_3)
## tibble [499,468 × 19] (S3: tbl_df/tbl/data.frame)
##  $ incident_id           : num [1:499468] 2.02e+10 2.02e+07 2.02e+07 2.02e+07 2.02e+07 ...
##  $ offense_id            : num [1:499468] 2.02e+16 2.02e+13 2.02e+13 2.02e+13 2.02e+13 ...
##  $ OFFENSE_CODE          : int [1:499468] 2999 2999 2999 2999 2999 2999 2999 2999 2999 2999 ...
##  $ OFFENSE_CODE_EXTENSION: int [1:499468] 0 0 0 0 0 0 0 0 0 0 ...
##  $ OFFENSE_TYPE_ID       : chr [1:499468] "criminal-mischief-other" "criminal-mischief-other" "criminal-mischief-other" "criminal-mischief-other" ...
##  $ OFFENSE_CATEGORY_ID   : chr [1:499468] "public-disorder" "public-disorder" "public-disorder" "public-disorder" ...
##  $ FIRST_OCCURRENCE_DATE : chr [1:499468] "1/4/2022 11:30:00 AM" "1/3/2022 6:45:00 AM" "1/3/2022 1:00:00 AM" "1/3/2022 7:47:00 PM" ...
##  $ LAST_OCCURRENCE_DATE  : chr [1:499468] "1/4/2022 12:00:00 PM" "" "" "" ...
##  $ REPORTED_DATE         : chr [1:499468] "1/4/2022 8:36:00 PM" "1/3/2022 11:01:00 AM" "1/3/2022 6:11:00 AM" "1/3/2022 9:12:00 PM" ...
##  $ INCIDENT_ADDRESS      : chr [1:499468] "128 S CANOSA CT" "650 15TH ST" "919 E COLFAX AVE" "2345 W ALAMEDA AVE" ...
##  $ GEO_X                 : num [1:499468] 3135366 3142454 3147484 3136478 3169237 ...
##  $ GEO_Y                 : num [1:499468] 1685410 1696151 1694898 1684414 1705800 ...
##  $ GEO_LON               : num [1:499468] -105 -105 -105 -105 -105 ...
##  $ GEO_LAT               : num [1:499468] 39.7 39.7 39.7 39.7 39.8 ...
##  $ DISTRICT_ID           : int [1:499468] 4 6 6 4 5 6 3 6 3 1 ...
##  $ PRECINCT_ID           : int [1:499468] 411 611 621 411 512 621 312 623 311 123 ...
##  $ NEIGHBORHOOD_ID       : chr [1:499468] "valverde" "cbd" "north-capitol-hill" "valverde" ...
##  $ IS_CRIME              : int [1:499468] 1 1 1 1 1 1 1 1 1 1 ...
##  $ IS_TRAFFIC            : int [1:499468] 0 0 0 0 0 0 0 0 0 0 ...
denver_offense_codes_revised <- as_tibble(denver_offense_codes)
str(denver_offense_codes_revised)
## tibble [299 × 9] (S3: tbl_df/tbl/data.frame)
##  $ OBJECTID              : int [1:299] 1 2 3 4 5 6 7 8 9 10 ...
##  $ OFFENSE_CODE          : int [1:299] 2804 2804 2901 2902 2903 2999 2999 2999 3501 3503 ...
##  $ OFFENSE_CODE_EXTENSION: int [1:299] 1 2 0 0 0 0 1 2 0 0 ...
##  $ OFFENSE_TYPE_ID       : chr [1:299] "stolen-property-possession" "fraud-possess-financial-device" "damaged-prop-bus" "criminal-mischief-private" ...
##  $ OFFENSE_TYPE_NAME     : chr [1:299] "Possession of stolen property" "Possession of a financial device" "Damaged business property" "Criminal mischief to private property" ...
##  $ OFFENSE_CATEGORY_ID   : chr [1:299] "all-other-crimes" "all-other-crimes" "public-disorder" "public-disorder" ...
##  $ OFFENSE_CATEGORY_NAME : chr [1:299] "All Other Crimes" "All Other Crimes" "Public Disorder" "Public Disorder" ...
##  $ IS_CRIME              : int [1:299] 1 1 1 1 1 1 1 1 1 1 ...
##  $ IS_TRAFFIC            : int [1:299] 0 0 0 0 0 0 0 0 0 0 ...
denver_offense_codes_revised$OFFENSE_CODE <- as.character(denver_offense_codes_revised$OFFENSE_CODE )
denver_crime_revised_3$OFFENSE_CODE <- as.character(denver_crime_revised_3$OFFENSE_CODE )

denver_crimes_codes_joined_3 <- inner_join(denver_crime_revised_3, denver_offense_codes_revised, 
                                           by = c("OFFENSE_CODE", "OFFENSE_CODE_EXTENSION", "OFFENSE_TYPE_ID", 
                                                  "OFFENSE_CATEGORY_ID", "IS_CRIME", "IS_TRAFFIC"))

denver_crime_datetime_separate_3 <- denver_crimes_codes_joined_3 %>%
  mutate(FIRST_OCCURRENCE_DATE= mdy_hms(FIRST_OCCURRENCE_DATE),
         day = day(FIRST_OCCURRENCE_DATE),
         month = month(FIRST_OCCURRENCE_DATE),
         year = year(FIRST_OCCURRENCE_DATE),
         dayofweek = wday(FIRST_OCCURRENCE_DATE),
         minute = minute(FIRST_OCCURRENCE_DATE),
         second = second(FIRST_OCCURRENCE_DATE))
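
As a quick check of what the lubridate helpers return, here is a small sketch that parses one timestamp in the same format as FIRST_OCCURRENCE_DATE. The list name `parts` and its element names are only for illustration. Note that `wday()` numbers days starting from Sunday = 1 by default, so any day-of-week labels applied later need to follow the same numbering.

```r
library(lubridate)

# One timestamp in the same month/day/year AM/PM format as the dataset
ts <- mdy_hms("1/4/2022 11:30:00 AM")

parts <- list(
  year   = year(ts),
  month  = month(ts),
  day    = day(ts),
  hour   = hour(ts),
  minute = minute(ts),
  # wday() counts from Sunday = 1 unless week_start is changed
  dayofweek_sunday_first = wday(ts),
  dayofweek_monday_first = wday(ts, week_start = 1)
)
str(parts)
```

January 4, 2022 was a Tuesday, so the two day-of-week values differ by one depending on which day the week is taken to start on.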

Data ‘Dive’:

The dataset I obtained consisted of 471,428 rows of offenses reported from 1/2/2017 to 6/12/2022. Of these, 77.6% were labeled as crimes and 22.3% as traffic incidents. The split is not entirely black and white, because the IS_CRIME and IS_TRAFFIC flags had to be combined in order to determine which offenses were crimes and which were traffic related.

#Crime vs Traffic Percentage Graph
unique(denver_crimes_codes_joined_3$OFFENSE_CATEGORY_NAME)
##  [1] "Public Disorder"              "Drug & Alcohol"              
##  [3] "Sexual Assault"               "All Other Crimes"            
##  [5] "Traffic Accident"             "Robbery"                     
##  [7] "Other Crimes Against Persons" "Aggravated Assault"          
##  [9] "Arson"                        "Burglary"                    
## [11] "Larceny"                      "Theft from Motor Vehicle"    
## [13] "Auto Theft"                   "White Collar Crime"          
## [15] "Murder"
denver_crimes_codes_joined_is_isnt_crime_3 <- denver_crimes_codes_joined_3 

is_crime_traffic_3 <- denver_crimes_codes_joined_is_isnt_crime_3 %>%
  group_by(IS_CRIME, IS_TRAFFIC) %>%
  tally() %>%
  complete(IS_TRAFFIC, fill = list(n=0)) %>%
  plyr::mutate(percentage = n / sum(n) * 100)

crime_traffic_percentage_plot <- ggplot(is_crime_traffic_3, aes(IS_TRAFFIC, percentage, fill = IS_CRIME)) + 
  geom_bar(stat = 'identity', position = 'dodge') + 
  xlab("Crime                             Traffic") + 
  theme(legend.position="none", axis.ticks.x = element_blank(), 
        axis.text.x = element_blank(), 
        axis.title.x = element_text(angle = 0)) + 
  ylab("Percentage") + ggtitle("Percentage of Crimes vs Traffic Incidents")

crime_traffic_percentage_plot

While only two bars were graphed, the IS_CRIME and IS_TRAFFIC columns group the rows of data into three categories: non-traffic criminal offenses, non-criminal traffic offenses, and criminal traffic offenses. In this dataset, there were only 283 observations in the criminal traffic offenses category. Because this is such a small share of the data, I omitted it from the plot.
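
The three-way grouping can be sketched in base R as follows. The data frame `toy` and its values are made up for illustration, and the label strings are my own wording for the three categories.

```r
# Toy data standing in for the IS_CRIME / IS_TRAFFIC columns (values are made up)
toy <- data.frame(
  IS_CRIME   = c(1, 1, 1, 0, 1, 1, 0, 1, 1, 1),
  IS_TRAFFIC = c(0, 0, 0, 1, 1, 0, 1, 0, 0, 0)
)

# Combine the two flags into one descriptive label per row
toy$offense_group <- ifelse(
  toy$IS_CRIME == 1 & toy$IS_TRAFFIC == 0, "non-traffic criminal",
  ifelse(toy$IS_CRIME == 0 & toy$IS_TRAFFIC == 1, "non-criminal traffic",
         ifelse(toy$IS_CRIME == 1 & toy$IS_TRAFFIC == 1, "criminal traffic", "neither"))
)

# Share of rows in each group, in percent of all rows
group_share <- 100 * prop.table(table(toy$offense_group))
group_share
```

Computing the shares over all rows, rather than within each IS_CRIME group, is what makes the percentages add up to 100 across the categories.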

denver_crime_datetime_separate_3$date <- as.Date(denver_crime_datetime_separate_3$FIRST_OCCURRENCE_DATE) 

denver_crime_datetime_separate_3$time <- format(as.POSIXct(denver_crime_datetime_separate_3$FIRST_OCCURRENCE_DATE),    
                                                format = "%H:%M:%S")

first_500_denver_crime_datetime_separate_3 <- denver_crime_datetime_separate_3[1:500,]
first_500_denver_crime_datetime_separate_3$time <- as.character(first_500_denver_crime_datetime_separate_3$time) 
first_500_denver_crime_datetime_separate_3
## # A tibble: 500 × 30
##    incident_id offense_id OFFENSE_CODE OFFENSE_CODE_EXTENSION OFFENSE_TYPE_ID   
##          <dbl>      <dbl> <chr>                         <int> <chr>             
##  1 20226000193    2.02e16 2999                              0 criminal-mischief…
##  2    20223319    2.02e13 2999                              0 criminal-mischief…
##  3    20223093    2.02e13 2999                              0 criminal-mischief…
##  4    20224000    2.02e13 2999                              0 criminal-mischief…
##  5    20223956    2.02e13 2999                              0 criminal-mischief…
##  6    20223903    2.02e13 2999                              0 criminal-mischief…
##  7    20223899    2.02e13 2999                              0 criminal-mischief…
##  8    20223888    2.02e13 2999                              0 criminal-mischief…
##  9    20228085    2.02e13 2999                              0 criminal-mischief…
## 10    20224563    2.02e13 2999                              0 criminal-mischief…
## # … with 490 more rows, and 25 more variables: OFFENSE_CATEGORY_ID <chr>,
## #   FIRST_OCCURRENCE_DATE <dttm>, LAST_OCCURRENCE_DATE <chr>,
## #   REPORTED_DATE <chr>, INCIDENT_ADDRESS <chr>, GEO_X <dbl>, GEO_Y <dbl>,
## #   GEO_LON <dbl>, GEO_LAT <dbl>, DISTRICT_ID <int>, PRECINCT_ID <int>,
## #   NEIGHBORHOOD_ID <chr>, IS_CRIME <int>, IS_TRAFFIC <int>, OBJECTID <int>,
## #   OFFENSE_TYPE_NAME <chr>, OFFENSE_CATEGORY_NAME <chr>, day <int>,
## #   month <int>, year <int>, dayofweek <int>, minute <int>, second <int>, …
datatable(first_500_denver_crime_datetime_separate_3, options = list(pageLength = 25,scrollX='400px'))
denver_crime_datetime_separate_3 <- denver_crime_datetime_separate_3 %>%
  filter(GEO_LON >= -105.3218 & GEO_LON <= -104.6096839)

denver_cols_map_crime_2 <- denver_crime_datetime_separate_3 %>%
  filter(IS_TRAFFIC == 0)

Understanding Incident ID and Offense ID:

Per the variable definitions noted above, it’s important to understand the difference between incident_id and offense_id. An offense_id is generated for each crime a person commits. This means that if a person steals, is in possession of drugs, and commits murder in a single incident, three different offense_ids will be created, and the same incident_id will appear on all three rows. This is the case for 11.5% of the incident_ids in this dataset. Therefore, I’ve chosen to treat each row of data as a single crime. In a future analysis, I’d like to delve into the incidents with multiple offense_ids in order to find which violations took place at the same time.
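
One way to measure how often an incident carries several offenses is to count rows per incident_id. The sketch below uses a made-up table (`toy_offenses`) rather than the Denver data, so the resulting percentage is illustrative only.

```r
# Toy rows: one row per offense, several rows can share an incident_id
toy_offenses <- data.frame(
  incident_id = c("A", "A", "A", "B", "C", "C", "D"),
  offense     = c("theft", "drug possession", "murder", "theft",
                  "trespassing", "criminal mischief", "burglary")
)

# Number of offense rows recorded under each incident
offenses_per_incident <- table(toy_offenses$incident_id)

# Share of incidents that have more than one offense, in percent
multi_offense_share <- 100 * mean(offenses_per_incident > 1)
multi_offense_share

# The incidents themselves, for a closer look at co-occurring offenses
multi_ids <- names(offenses_per_incident)[offenses_per_incident > 1]
toy_offenses[toy_offenses$incident_id %in% multi_ids, ]
```

The final filter is the starting point for the follow-up idea of studying which violations tend to occur together.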

denver_cols_map_crime_2$summary_box <- paste("<b>Incident #: </b>", denver_cols_map_crime_2$incident_id,
                                           "<br>", "<b>Incident Address: </b>", denver_cols_map_crime_2$INCIDENT_ADDRESS,
                                           "<br>", "<b>Category: </b>", denver_cols_map_crime_2$OFFENSE_CATEGORY_ID,
                                           "<br>", "<b>Day of the week: </b>", denver_cols_map_crime_2$dayofweek,
                                           "<br>", "<b>Date: </b>", denver_cols_map_crime_2$date,
                                           "<br>", "<b>Time: </b>", denver_cols_map_crime_2$time,
                                           "<br>", "<b>Denver Neighborhood: </b>", denver_cols_map_crime_2$NEIGHBORHOOD_ID,
                                           "<br>", "<b>Denver Police district ID #: </b>", denver_cols_map_crime_2$DISTRICT_ID,
                                           "<br>", "<b>Longitude: </b>", denver_cols_map_crime_2$GEO_LON,
                                           "<br>", "<b>Latitude: </b>", denver_cols_map_crime_2$GEO_LAT)

#Denver Crime subset - 40,000 samples - capacity of OpenStreetMap
denver_crime_subset <- denver_cols_map_crime_2[1:40000, ]

denver_crime_subset$summary_box <- paste("<b>Incident #: </b>", denver_crime_subset$incident_id,
                                           "<br>", "<b>Incident Address: </b>", denver_crime_subset$INCIDENT_ADDRESS,
                                           "<br>", "<b>Category: </b>", denver_crime_subset$OFFENSE_CATEGORY_ID,
                                           "<br>", "<b>Day of the week: </b>", denver_crime_subset$dayofweek,
                                           "<br>", "<b>Date: </b>", denver_crime_subset$date,
                                           "<br>", "<b>Time: </b>", denver_crime_subset$time,
                                           "<br>", "<b>Denver Neighborhood: </b>", denver_crime_subset$NEIGHBORHOOD_ID,
                                           "<br>", "<b>Denver Police district ID #: </b>", denver_crime_subset$DISTRICT_ID,
                                           "<br>", "<b>Longitude: </b>", denver_crime_subset$GEO_LON,
                                           "<br>", "<b>Latitude: </b>", denver_crime_subset$GEO_LAT)
  leaflet() %>%
    addProviderTiles(providers$OpenStreetMap, group = "OSM") %>%
    addTiles('http://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png')%>%
    addMarkers(lng = denver_crime_subset$GEO_LON,
               lat = denver_crime_subset$GEO_LAT, 
               popup = denver_crime_subset$summary_box,
               clusterOptions = markerClusterOptions())

Mapping Crime in Denver via OpenStreetMap

In order to render this RMarkdown report, I had to reduce the crime data set from 383,000 rows to 40,000 by keeping only the first 40,000 rows. On the map, roughly two thirds of the crimes are committed in downtown Denver. This is consistent with a later plot in this report, which shows that Five Points (downtown) is among the neighborhoods where the most crimes take place.
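
Taking the first 40,000 rows is only representative if the file is in no particular order. If the rows turn out to be sorted (for example by offense code), a random sample may be a safer subset. The sketch below shows the idea on a made-up table; `toy_crimes`, the seed, and the sample size are all assumptions for illustration.

```r
# Stand-in for the full crime table (the real one has about 383,000 rows)
set.seed(42)  # arbitrary seed so the sample is reproducible
toy_crimes <- data.frame(
  row_id  = 1:1000,
  GEO_LON = runif(1000, min = -105.1, max = -104.7),
  GEO_LAT = runif(1000, min = 39.6, max = 39.9)
)

# Draw 100 rows at random, without replacement, instead of taking rows 1:100
sample_size <- 100
sampled_rows <- sample(nrow(toy_crimes), size = sample_size)
toy_subset <- toy_crimes[sampled_rows, ]

nrow(toy_subset)
```

With the real data, `sample_size` would be 40,000 and the sampled rows would feed the same leaflet call.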

#DAILY CRIME FROM 1/2/17-6/12/22
df_crime_over_time_2 <- denver_cols_map_crime_2 %>%
  group_by(date) %>%
  dplyr::summarize(total = n()) %>%
  arrange(date)

df_crime_over_time_plot_2 <- ggplot(df_crime_over_time_2, aes(x = date, y = total)) +
  geom_line(color = "purple", size = 0.05) +
  geom_smooth(color = "navy") +
  scale_x_date(breaks = date_breaks("1 year"), labels = date_format("%Y")) +
  xlab("Date of Crime (Year)") + ylab("Number of Crimes Committed") + 
  ggtitle("Denver: Daily Number of Crimes Committed from 1/2/2017 - 6/12/2022") +
  theme(axis.text.x = element_text(angle=30, vjust=.5, hjust=1))

df_crime_over_time_plot_2
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

Explanation

The daily crime trend is fairly consistent from 2017 until mid 2020 with a noticeable uptick in daily crime from 2020-2022. This analysis only includes 2022 data from January through mid June.

#TOTAL CRIMES PER YEAR
df_crime_by_year_2 <- denver_cols_map_crime_2 %>%
  group_by(year) %>%
  dplyr::summarize(total = n()) %>%
  arrange(year)

df_crime_year_plot_2 <- ggplot(df_crime_by_year_2, aes(year, total)) + 
  geom_bar(stat = "identity", position = "dodge") + 
  ggtitle("Total Number of Crimes per Year") +
  scale_x_continuous(breaks = df_crime_by_year_2$year)

df_crime_year_plot_2

Explanation

Similar to the previous graph which outlined daily crime from 2017-2022, this graph (outlining total crimes per year) shows the total number of crimes was roughly the same amount from 2017-2020 with an obvious uptick of crimes committed in 2021. Again, the total crime data for 2022 is low because this analysis only includes data from January through mid-June 2022.

#TOTAL CRIMES BY MONTH
df_crime_by_month_2 <- denver_cols_map_crime_2 %>%
  group_by(month) %>%
  dplyr::summarize(total = n()) %>%
  arrange(month)

month_format <- c("January","February","March","April","May","June", "July", "August",
                  "September", "October", "November", "December")

df_crime_by_month_2$month <- factor(df_crime_by_month_2$month, label = revalue(month_format))

month_crime_plot_2 <- ggplot(df_crime_by_month_2, aes(month, total, fill = month)) +
  geom_bar(stat = "identity", position = "dodge") + 
  ggtitle("Total crimes committed by month") + 
  theme(axis.text.x = element_text(angle=45, vjust=.8, hjust=1))

month_crime_plot_2

Explanation

As previously stated, this dataset only includes data through mid-June 2022. January through June are therefore counted across six years (2017-2022), while July through December are counted across only five (2017-2021). This makes the graph inconclusive: the noticeably higher totals from January through June could simply be an artifact of the extra half year of data rather than a real seasonal pattern.
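
One way to put the months on equal footing is to divide each monthly total by the number of years that month appears in the data. The sketch below counts those years from the date range of the dataset; the totals in `toy_totals` are hypothetical and only show the normalization step.

```r
# First day of every month covered by the data (1/2/2017 through 6/12/2022)
month_starts <- seq(as.Date("2017-01-01"), as.Date("2022-06-01"), by = "month")

# How many calendar years contribute to each month number (position 1 = January)
years_per_month <- tabulate(as.integer(format(month_starts, "%m")), nbins = 12)
years_per_month

# Hypothetical monthly totals, only to show the normalization step
toy_totals <- c(600, 600, 600, 600, 600, 600, 500, 500, 500, 500, 500, 500)
avg_per_month <- toy_totals / years_per_month
avg_per_month
```

In this made-up case the raw totals look higher in the first half of the year, but the per-year averages are identical. June 2022 is also only a partial month, which a fuller treatment would need to account for.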

#TOTAL CRIMES BY DAY OF THE MONTH
df_crime_day_of_month_2 <- denver_cols_map_crime_2 %>%
  group_by(day) %>%
  dplyr::summarize(total = n()) %>%
  arrange(day)

df_crime_day_of_month_plot_2 <- ggplot(df_crime_day_of_month_2, aes(day, total)) + 
  geom_bar(stat = "identity", position = "dodge") +
  ggtitle("Total crimes committed by day of the month")

df_crime_day_of_month_plot_2

Explanation

Based on this graph, substantially more crimes are recorded on the first day of the month and noticeably fewer on the last day. Part of the drop at the end is expected, since only seven months of the year have a 31st day. One possible explanation for the spike on the first is that a percentage of the crimes committed on the last day of the month end up being added to the system on the first day of the next month.
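
Because not every month has a 29th, 30th, or 31st, raw totals by day of the month are not directly comparable. A minimal sketch of counting how often each day number occurs in the period covered by the data, so that totals can be expressed per occurrence (the total in `toy_total_on_31st` is hypothetical):

```r
# Every calendar date in the period covered by the data
all_dates <- seq(as.Date("2017-01-02"), as.Date("2022-06-12"), by = "day")

# How many times each day-of-month number occurs (position 31 = the 31st)
day_counts <- tabulate(as.integer(format(all_dates, "%d")), nbins = 31)

day_counts[1]   # occurrences of the 1st
day_counts[31]  # occurrences of the 31st

# Hypothetical total, to show crimes per occurrence of a given day number
toy_total_on_31st <- 3800
toy_total_on_31st / day_counts[31]
```

Dividing each bar by its `day_counts` entry would show whether the first of the month still stands out once the calendar effect is removed.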

#CRIME BY WEEK/MONTH OF THE YEAR
denver_cols_map_crime_2$week_year <- strftime(denver_cols_map_crime_2$date, format = "%V")

week_year_crime_occurred_2 <- denver_cols_map_crime_2
year_format <- c("2017","2018","2019","2020","2021","2022")
week_year_crime_occurred_2$year <- factor(week_year_crime_occurred_2$year, label = revalue(year_format))

week_year_crime_occurred_2<- denver_cols_map_crime_2 %>%
  group_by(week_year, year) %>%
  dplyr::summarize(total = n()) %>%
  arrange(year)
## `summarise()` has grouped output by 'week_year'. You can override using the
## `.groups` argument.
week_year_crime_occurred_2
## # A tibble: 286 × 3
## # Groups:   week_year [53]
##    week_year  year total
##    <chr>     <int> <int>
##  1 01         2017  1028
##  2 02         2017  1219
##  3 03         2017  1273
##  4 04         2017  1328
##  5 05         2017  1291
##  6 06         2017  1277
##  7 07         2017  1325
##  8 08         2017  1172
##  9 09         2017  1253
## 10 10         2017  1263
## # … with 276 more rows
date_a <- seq(as.Date("2016-12-28"), 
             by = "week", 
             to = as.Date("2022-06-15"))
# Calendar year of each weekly date
yearinfo_a <- year(date_a)

date_year_a <- data.frame(date_a, yearinfo_a)
colnames(date_year_a) <- c("Date", "Year")

week_crime_occurred_plot_2 <- ggplot(date_year_a, aes(Date, week_year_crime_occurred_2$total)) +
  geom_line()+ scale_x_date(breaks = "1 year", date_labels = "%Y",
                            limits = c(as.Date("2017-01-01"), as.Date("2022-06-14")),
                            expand = c(0,0)) + xlab("Weeks") + ylab("Total Crimes") +
  ggtitle("Total Crimes Committed Per Week from 1/2/2017 - 6/12/2022")

week_crime_occurred_plot_2
## Warning: Removed 2 row(s) containing missing values (geom_path).

month_crime_occurred_2 <- denver_cols_map_crime_2 %>%
  group_by(month, year) %>%
  dplyr::summarize(total = n()) %>%
  arrange(year)
## `summarise()` has grouped output by 'month'. You can override using the
## `.groups` argument.
month_crime_occurred_2
## # A tibble: 66 × 3
## # Groups:   month [12]
##    month  year total
##    <int> <int> <int>
##  1     1  2017  5272
##  2     2  2017  4994
##  3     3  2017  5688
##  4     4  2017  5527
##  5     5  2017  5882
##  6     6  2017  6003
##  7     7  2017  5979
##  8     8  2017  6497
##  9     9  2017  5848
## 10    10  2017  5737
## # … with 56 more rows
date_b <- seq(as.Date("2017-01-01"), 
             by = "month", 
             to = as.Date("2022-06-15"))
# Calendar year of each monthly date
yearinfo_month_b <- year(date_b)

date_year_b <- data.frame(date_b, yearinfo_month_b)
colnames(date_year_b) <- c("Month", "Year")

month_crime_occurred_plot_b <- ggplot(date_year_b, aes(Month, month_crime_occurred_2$total)) +
  geom_line()+ scale_x_date(breaks = "1 year", date_labels = "%Y",
                            limits = c(as.Date("2017-01-01"), as.Date("2022-06-14")),
                            expand = c(0,0)) + xlab("Months") + ylab("Total Crimes") +
  ggtitle("Total Crimes Committed Per Month from 1/2/2017 - 6/12/2022")

month_crime_occurred_plot_b

Explanation

The first graph shows the trend of crimes committed by week per year. This graph also shows a substantial increase in crimes committed from 2020 onward.
The second graph shows the trend of crimes committed by month per year. Again, the pattern of monthly crimes is similar from 2017-2019, followed by an uptick from 2020-2022.
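
The weekly plot pairs a separately generated sequence of dates with the weekly totals by position, which only works if both happen to be in the same order and of the same length. As an alternative sketch, each crime date can be mapped to the start of its own week before counting, so every total carries its date with it. The dates in `toy_dates` are made up.

```r
# Toy crime dates (made up); in the report these would come from the date column
toy_dates <- as.Date(c("2017-01-02", "2017-01-03", "2017-01-08",
                       "2017-01-09", "2017-01-15", "2017-01-16"))

# cut() with breaks = "week" labels each date with the Monday that starts its week
week_start <- as.Date(as.character(cut(toy_dates, breaks = "week")))

# One row per week, with the week's own start date attached to its total
counts <- table(week_start)
weekly_totals <- data.frame(
  week_start = as.Date(names(counts)),
  total      = as.integer(counts)
)
weekly_totals
```

A data frame like `weekly_totals` can be passed straight to ggplot with `week_start` on the x axis, and the same idea works for months with `breaks = "month"`.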

#Crime total by neighborhood
df_crime_denver_neighborhood_2 <- denver_cols_map_crime_2 %>%
  group_by(NEIGHBORHOOD_ID) %>%
  dplyr::summarize(total = n()) %>%
  slice_max(order_by = total, n = 10)

df_crime_denver_neighborhood_plot_2 <- ggplot(df_crime_denver_neighborhood_2, aes(x = reorder(NEIGHBORHOOD_ID, 
                                                                                          -total), y = total, fill = NEIGHBORHOOD_ID)) + 
  geom_bar(stat = 'identity', position = 'dodge') + 
  xlab("Denver Neighborhoods") + 
  ylab("Total Crimes") + ggtitle("Denver Neighborhoods Where the Majority of Crimes Occur") +
  theme(legend.position = "none", axis.text.x = element_text(angle=45, vjust=0.9, hjust=1))

df_crime_denver_neighborhood_plot_2

#Map of Denver Neighborhoods
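# Read the Google Maps API key from an environment variable rather than hard-coding it.
# GOOGLE_MAPS_API_KEY is an assumed variable name; set it to your own key before running.
register_google(key = Sys.getenv("GOOGLE_MAPS_API_KEY"))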

denver_area_map <- get_map("Denver",zoom=11)
## Source : https://maps.googleapis.com/maps/api/staticmap?center=Denver&zoom=11&size=640x640&scale=2&maptype=terrain&language=en-EN&key=xxx-4GX9jd61Pg
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Denver&key=xxx-4GX9jd61Pg
denver_area_map_bw <- get_map("Denver",zoom=11, color="bw")
## Source : https://maps.googleapis.com/maps/api/staticmap?center=Denver&zoom=11&size=640x640&scale=2&maptype=terrain&language=en-EN&key=xxx-4GX9jd61Pg
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Denver&key=xxx-4GX9jd61Pg
google_denver <- ggmap(denver_area_map)

denver_neigh_unique <- denver_cols_map_crime_2

cc_denver_neigh_unique <- denver_neigh_unique[complete.cases(denver_neigh_unique), ]
# Keep one row per neighborhood (NEIGHBORHOOD_ID is column 17)
denver_neigh_unique <- cc_denver_neigh_unique[!duplicated(cc_denver_neigh_unique$NEIGHBORHOOD_ID), ]

crime_full_set_com_case <- denver_cols_map_crime_2[complete.cases(denver_cols_map_crime_2), ]

neighborhoods_de <- structure(list(neighborhoods = c("Five Points", "Central Park", "Capitol Hill",
                                                     "Central Business District", "Montbello", "Union Station", 
                                                     "Civic Center", "East Colfax", "Gateway Green Valley Ranch",
                                                     "Lincoln Park"), 
                                   lon = c(-104.9740, -104.8800, -104.9813, -104.9934, -104.8434, 
                                           -105.0032, -104.9926, -104.8945, -104.7691, -104.9984),
                                   lat = c(39.75705, 39.75722, 39.73353, 39.74365, 39.78309,
                                           39.75033, 39.73308, 39.74649, 39.78255, 39.72606)), 
                              class = "data.frame", .Names = c("neighborhood", "lon", "lat"), row.names = c(NA, -10L))

options(ggrepel.max.overlaps = Inf)
denver_center <- c(lon = -104.9, lat = 39.74)
denver_center_map <- get_map(denver_center, zoom=11, scale=1, color = "bw")
## Source : https://maps.googleapis.com/maps/api/staticmap?center=39.74,-104.9&zoom=11&size=640x640&scale=1&maptype=terrain&language=en-EN&key=xxx-4GX9jd61Pg
ggmap(denver_center_map)

#subset for mapping
denver_cols_map_crime_samples <- denver_cols_map_crime_2[1:50000, ]

colored_crime_hotspot_map <- ggmap(denver_center_map) + geom_point(data = denver_cols_map_crime_samples, 
                                                                   aes(x = GEO_LON, y = GEO_LAT, color = NEIGHBORHOOD_ID)) +
  theme(legend.position = "none") +
  ggrepel::geom_label_repel(data = neighborhoods_de, mapping =  
                              aes(x = lon, y = lat, label = neighborhood),
                            box.padding = 2, point.padding = 0.0005, fontface = 'bold') +
  ggtitle("Labeled Map Highlighting High-Crime Neighborhoods in Denver")

colored_crime_hotspot_map
## Warning: Removed 322 rows containing missing values (geom_point).
## Warning in min(x): no non-missing arguments to min; returning Inf
## Warning in max(x): no non-missing arguments to max; returning -Inf
## Warning in min(x): no non-missing arguments to min; returning Inf
## Warning in max(x): no non-missing arguments to max; returning -Inf

Explanation

Above is a map of the Denver-area neighborhoods where the majority of crimes occur. Most crimes take place downtown, and a large portion also take place east of downtown near Aurora.

#Map of Denver Police Districts
districts_de <- structure(list(neighborhoods = c("1", "2", "3", "4", "5", "6","7"), 
                               lon = c(-105.0270, -104.9223, -104.9242, -105.0255, -104.8243, 
                                       -104.9733, -104.6985),
                               lat = c(39.77254, 39.75212, 39.69802, 39.68374, 39.78835,
                                       39.74014, 39.84231)), 
                          class = "data.frame", .Names = c("police_district", "lon", "lat"), row.names = c(NA, -7L))


denver_police_district_map <- ggmap(denver_center_map) + geom_point(data = denver_cols_map_crime_samples,
                                                                    # factor() gives each district its own discrete color
                                                                    aes(x = GEO_LON, y = GEO_LAT, color = factor(DISTRICT_ID))) +
  theme(legend.position = "none") + 
  ggrepel::geom_label_repel(data = districts_de, mapping =  
                              aes(x = lon, y = lat, label = police_district),
                            box.padding = 2, point.padding = 0.0005, fontface = 'bold') +
  ggtitle("Map of Police Districts in Denver")

denver_police_district_map
## Warning: Removed 322 rows containing missing values (geom_point).
## Warning in min(x): no non-missing arguments to min; returning Inf
## Warning in max(x): no non-missing arguments to max; returning -Inf
## Warning in min(x): no non-missing arguments to min; returning Inf
## Warning in max(x): no non-missing arguments to max; returning -Inf

#crime time of day - prep for heat map 
get_hour_of_day <- function(x) {
  return (as.numeric(strsplit(x,":")[[1]][1]))
} 

time_day_crime_occurrence_2 <- denver_cols_map_crime_2 %>%
  mutate(Hour = sapply(time, get_hour_of_day)) %>%
  group_by(dayofweek, Hour) %>%
  dplyr::summarize(total = n())
## `summarise()` has grouped output by 'dayofweek'. You can override using the
## `.groups` argument.
# wday() numbers days 1-7 starting on Sunday by default, so the labels start with Sunday
day_of_week_format <- c("Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday")
hour_order_format <- c(paste(c(12,1:11),"AM"), paste(c(12,1:11),"PM"))

time_day_crime_occurrence_2$dayofweek <- factor(time_day_crime_occurrence_2$dayofweek, label = revalue(day_of_week_format))

time_day_crime_occurrence_2$Hour <- factor(time_day_crime_occurrence_2$Hour, level = 0:23, label = hour_order_format)

#heatmap plot crime 
crime.heatmap.2 <- ggplot(data = time_day_crime_occurrence_2,
                          mapping = aes(x = Hour, y = dayofweek, fill = total)) +
  geom_tile() +
  ggtitle("Denver: Number of Crimes from 2017-2022 by Time the Crime Occurred") +
  xlab("Hour of the Day") + ylab("Day of the Week") +
  theme(legend.position = "right", axis.text.x = element_text(angle=45, vjust=0.9, hjust=1))

crime.heatmap.2

Explanation

This heat map takes a moment to read, but it shows that most crimes occur late Saturday and late Sunday night (around midnight), daily around noon, and mid-afternoon on Saturday.
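
The grid behind a heat map like this is a count of offenses for every combination of weekday and hour. A minimal base R sketch of building that grid, with day labels derived from the timestamps themselves so they cannot drift out of step with the numbering (the timestamps in `toy_times` are made up):

```r
# Toy timestamps (made up), parsed in UTC so the example does not depend on the local time zone
toy_times <- as.POSIXct(
  c("2022-01-01 23:15:00", "2022-01-01 23:50:00", "2022-01-02 00:05:00",
    "2022-01-04 11:30:00", "2022-01-04 12:10:00"),
  format = "%Y-%m-%d %H:%M:%S", tz = "UTC"
)

lt <- as.POSIXlt(toy_times, tz = "UTC")

# POSIXlt stores the weekday as 0 = Sunday ... 6 = Saturday,
# so the label vector below starts with Sunday
day_labels <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday")
day_of_week <- factor(day_labels[lt$wday + 1], levels = day_labels)
hour_of_day <- factor(lt$hour, levels = 0:23)

# Rows are days, columns are hours: the same grid the heat map colors in
day_hour_counts <- table(day_of_week, hour_of_day)
day_hour_counts["Saturday", "23"]
```

January 1, 2022 was a Saturday, so the two late-evening timestamps land in the Saturday row at hour 23.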

#Pie Chart grouping all-other-crimes and other-crimes-against-persons
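# Note: `time` holds minutes since midnight (hour * 60 + minute). as_hms() reads that
# number as seconds, so 11:30 prints as 11'30". The cut() breaks use the same unit.
denver_cols_map_crime_cleaned_2 <- denver_cols_map_crime_2 %>%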
  mutate(time = hms::as_hms(hour(FIRST_OCCURRENCE_DATE)*60 + minute(FIRST_OCCURRENCE_DATE)),
         date = date(FIRST_OCCURRENCE_DATE),
         time_division = cut(as.numeric(time), breaks = c(0,6*60,12*60,18*60,23*60+59),
                             labels=c("00-06","06-12","12-18","18-00"),
                             include.lowest = TRUE))

denver_cols_map_crime_cleaned_2 %>% 
  dplyr::select(incident_id, FIRST_OCCURRENCE_DATE, date, time, time_division)
## # A tibble: 383,319 × 5
##    incident_id FIRST_OCCURRENCE_DATE date       time   time_division
##          <dbl> <dttm>                <date>     <time> <fct>        
##  1 20226000193 2022-01-04 11:30:00   2022-01-04 11'30" 06-12        
##  2    20223319 2022-01-03 06:45:00   2022-01-03 06'45" 06-12        
##  3    20223093 2022-01-03 01:00:00   2022-01-03 01'00" 00-06        
##  4    20224000 2022-01-03 19:47:00   2022-01-03 19'47" 18-00        
##  5    20223956 2022-01-03 17:06:00   2022-01-03 17'06" 12-18        
##  6    20223903 2022-01-03 16:40:00   2022-01-03 16'40" 12-18        
##  7    20223899 2022-01-03 16:19:00   2022-01-03 16'19" 12-18        
##  8    20223888 2022-01-03 16:15:00   2022-01-03 16'15" 12-18        
##  9    20228085 2022-01-05 19:00:00   2022-01-05 19'00" 18-00        
## 10    20224563 2022-01-04 04:30:00   2022-01-04 04'30" 00-06        
## # … with 383,309 more rows
denver_cols_map_crime_cleaned_2 %>% 
  group_by(time_division) %>% 
  dplyr::summarize(total_crimes=n())
## # A tibble: 4 × 2
##   time_division total_crimes
##   <fct>                <int>
## 1 00-06                71690
## 2 06-12                78842
## 3 12-18               117060
## 4 18-00               115727
crime_types_2 <- denver_cols_map_crime_cleaned_2 %>%
  group_by(OFFENSE_CATEGORY_NAME) %>%
  dplyr::summarize(total_crimes = n()) %>%
  arrange(desc(total_crimes))

denver_cols_map_crime_cleaned_2 <- denver_cols_map_crime_cleaned_2 %>%
  mutate(
    crime=fct_recode(OFFENSE_CATEGORY_NAME,
                     "Crimes - Other" = "All Other Crimes",
                     "Crimes - Other" = "Other Crimes Against Persons",
                     "Theft - Motor Vehicle" = "Auto Theft",
                     "Theft - Motor Vehicle" = "Theft from Motor Vehicle",
                     "Public Disorder" = "Public Disorder", 
                     "Arson" = "Arson",
                     "Larceny" = "Larceny", 
                     "Murder" = "Murder", 
                     "Drugs/Alcohol" = "Drug & Alcohol",
                     "Robbery" = "Robbery", 
                     "Aggravated Assault" = "Aggravated Assault",
                     "Burglary" = "Burglary", 
                     "White Collar Crime" = "White Collar Crime"
    ))

denver_cols_map_crime_cleaned_3 <- denver_cols_map_crime_cleaned_2 %>%
  group_by(crime) %>%
  dplyr::summarize(total_crimes=n()) %>%
  arrange(desc(total_crimes))

#hard-coded totals per combined category (they sum to the 383,319 rows shown above)
denver_cols_map_crime_cleaned_4b <- tribble(
  ~crime,  ~total_crimes,
   "Crimes - Other",   100633,
   "Theft - Motor Vehicle",   98456,
   "Public Disorder",   53559,
   "Larceny",    52556,
   "Burglary",    25588,
   "Drugs/Alcohol",   23568,
   "Aggravated Assault",  14615,
   "Robbery", 6646,
   "White Collar Crime",   6550,
   "Arson",  767,
   "Murder", 381)

denver_cols_map_crime_cleaned_4b$crime <- fct_reorder(denver_cols_map_crime_cleaned_4b$crime, denver_cols_map_crime_cleaned_4b$total_crimes)

denver_cols_map_crime_cleaned_4b <- denver_cols_map_crime_cleaned_4b[order(denver_cols_map_crime_cleaned_4b$total_crimes, decreasing = TRUE), ]

my_labels <- tibble(x.breaks = seq(1, 1.35, length.out = 11),
                    y.breaks = cumsum(denver_cols_map_crime_cleaned_4b$total_crimes) - denver_cols_map_crime_cleaned_4b$total_crimes/2, 
                    #count and percentage on separate lines; sep is an argument of paste(), not of percent()
                    labels = paste(denver_cols_map_crime_cleaned_4b$total_crimes,
                                   scales::percent(denver_cols_map_crime_cleaned_4b$total_crimes/sum(denver_cols_map_crime_cleaned_4b$total_crimes), accuracy = 0.1),
                                   sep = '\n'),
                    crime = denver_cols_map_crime_cleaned_4b$crime)


dist_categ_crimes <- ggplot(denver_cols_map_crime_cleaned_4b, aes(x = 1, y = total_crimes, fill = crime)) +
  ggtitle(paste("Distribution of the Different Categories of Crime")) +
  geom_bar(stat="identity", color='black') + 
  coord_polar(theta='y') + 
  guides(fill=guide_legend(override.aes=list(colour=NA)))+ 
  theme(axis.ticks=element_blank(),  # the axis ticks
        axis.title=element_blank(),  # the axis labels
        axis.text.y=element_blank(), # the 0.75, 1.00, 1.25 labels.
        axis.text.x = element_blank(), 
        panel.grid = element_blank()) +
  geom_label_repel(data = my_labels, aes(x = x.breaks, y = y.breaks, 
                                        label = labels, fill = crime),
                   label.padding = unit(0.1, "lines"),
                   size = 3.5,
                   show.legend = FALSE,
                   inherit.aes = FALSE)

dist_categ_crimes

Explanation

The largest category of crimes was ‘other’ (26.3%), followed closely by ‘theft - motor vehicle’ (25.7%). This is consistent with crime reports showing that car theft is prevalent in most urban communities. The fact that over one quarter of the collected data was listed as ‘other’ is significant because it skews how this data-set is interpreted. In other words, those entering the crime data didn’t categorize the offense specifically, which means that fewer conclusions can be drawn about crimes committed in Denver.
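
The two percentages quoted here can be reproduced from the category totals used for the pie chart. The short base R sketch below copies those totals into a named vector (`total_crimes` and `share_pct` are names chosen for this illustration) and computes each category's share.

```r
# Category totals copied from the tribble used for the pie chart.
total_crimes <- c(
  "Crimes - Other" = 100633, "Theft - Motor Vehicle" = 98456,
  "Public Disorder" = 53559, "Larceny" = 52556, "Burglary" = 25588,
  "Drugs/Alcohol" = 23568, "Aggravated Assault" = 14615, "Robbery" = 6646,
  "White Collar Crime" = 6550, "Arson" = 767, "Murder" = 381
)

# Share of each category as a percentage, rounded to one decimal place.
share_pct <- round(100 * total_crimes / sum(total_crimes), 1)

sum(total_crimes)                                        # 383319
share_pct[c("Crimes - Other", "Theft - Motor Vehicle")]  # 26.3 and 25.7
```

The grand total matches the 383,319 rows of the crime table, so the hard-coded totals cover every record.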

##Exploratory Analysis in diving deeper - TRAFFIC mapping
denver_cols_map_traffic_2 <- denver_crime_datetime_separate_3 %>%
  filter(IS_CRIME == 0)

denver_cols_map_traffic_2$summary_box <- paste("<b>Incident #: </b>", denver_cols_map_traffic_2$incident_id,
                                             "<br>", "<b>Incident Address: </b>", denver_cols_map_traffic_2$INCIDENT_ADDRESS,
                                             "<br>", "<b>Category: </b>", denver_cols_map_traffic_2$OFFENSE_CATEGORY_ID,
                                             "<br>", "<b>Day of the week: </b>", denver_cols_map_traffic_2$dayofweek,
                                             "<br>", "<b>Date: </b>", denver_cols_map_traffic_2$date,
                                             "<br>", "<b>Time: </b>", denver_cols_map_traffic_2$time,
                                             "<br>", "<b>Denver Neighborhood: </b>", denver_cols_map_traffic_2$NEIGHBORHOOD_ID,
                                             "<br>", "<b>Denver Police district ID #: </b>", denver_cols_map_traffic_2$DISTRICT_ID,
                                             "<br>", "<b>Longitude: </b>", denver_cols_map_traffic_2$GEO_LON,
                                             "<br>", "<b>Latitude: </b>", denver_cols_map_traffic_2$GEO_LAT)

#subset of traffic incidents: the first 40,000 rows, under the roughly 50,000 markers the OpenStreetMap map can handle
denver_traffic_subset <- denver_cols_map_traffic_2[1:40000, ]
#summary_box is already a column of denver_cols_map_traffic_2, so the subset carries the popup text with it
leaflet() %>%
  addProviderTiles(providers$OpenStreetMap, group = "OSM") %>%
  addTiles('http://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png')%>%
  addMarkers(lng = denver_traffic_subset$GEO_LON,
             lat = denver_traffic_subset$GEO_LAT, 
             popup = denver_traffic_subset$summary_box,
             clusterOptions = markerClusterOptions())

Mapping Traffic Offenses in Denver via OpenStreetMap

In order to render this RMarkdown report, I had to reduce the traffic data set from roughly 111,000 rows to 40,000. Roughly two thirds of the mapped traffic offenses occur in downtown Denver. This makes sense: later in the report, I show graphically that Five Points (downtown) accounts for the largest share of crimes.
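
The subset mapped here is the first 40,000 rows, so its geographic mix depends on how the table happens to be ordered. One alternative, sketched below as an assumption rather than as what this report does, is to draw the rows at random. `sample_rows` is a hypothetical helper, and `toy_incidents` is invented data with made-up coordinates.

```r
# Draw up to n rows at random (without replacement) from a data frame.
# Setting the seed makes the draw repeatable between renders.
sample_rows <- function(df, n, seed = 1) {
  set.seed(seed)
  keep <- sample(nrow(df), size = min(n, nrow(df)))
  df[keep, , drop = FALSE]
}

# Invented stand-in for the traffic table.
toy_incidents <- data.frame(
  incident_id = 1:1000,
  GEO_LON = runif(1000, -105.1, -104.7),
  GEO_LAT = runif(1000, 39.6, 39.9)
)

toy_subset <- sample_rows(toy_incidents, n = 400)
nrow(toy_subset)
```

A random draw keeps the subset roughly proportional to the full table, which matters when reading shares such as "two thirds downtown" off the map.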

#TRAFFIC FROM 1/2/17-6/12/22
df_traffic_over_time <- denver_cols_map_traffic_2 %>%
        group_by(date) %>%
        dplyr::summarize(total = n()) %>%
        arrange(date)

df_traffic_over_time_plot <- ggplot(df_traffic_over_time, aes(x = date, y = total)) +
        geom_line(color = "purple", size = 0.05) +
        geom_smooth(color = "navy") +
        scale_x_date(breaks = date_breaks("1 year"), labels = date_format("%Y")) +
        xlab("Date of Incident (Year)") + ylab("Number of Traffic Incidents") + 
        ggtitle("Denver: Daily Number of Traffic Incidents Committed from 1/2/2017 - 6/12/2022") +
        theme(axis.text.x = element_text(angle=30, vjust=.5, hjust=1))

df_traffic_over_time_plot
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

Explanation

The daily traffic incident trend is fairly consistent from 2017 until late 2019, when a noticeable drop in daily offenses begins and lasts through mid-2020. This coincides with the beginning of the Covid-19 pandemic, when people were forced to work from home and weren’t driving daily. There was a gradual uptick from the last third of 2020 through the first third of 2021, and then a slight decline through June 2022. Because this analysis only includes 2022 data from January through mid-June, it’s possible that this last decline would level out if I looked at the data-set again at the end of 2022.
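
To put a number on a dip like the one the smoothed line suggests, the daily counts could be averaged by year. This is a minimal sketch on invented data with the same two columns as df_traffic_over_time (date and total); the values and the names `toy_daily` and `yearly_mean` are illustrative only.

```r
# Toy daily counts: a higher level in 2019, a lower one in 2020 (invented).
toy_daily <- data.frame(
  date  = as.Date(c("2019-03-01", "2019-03-02", "2020-04-01", "2020-04-02")),
  total = c(70, 66, 40, 44)
)

# Average number of incidents per day within each calendar year.
toy_daily$year <- format(toy_daily$date, "%Y")
yearly_mean <- aggregate(total ~ year, data = toy_daily, FUN = mean)
print(yearly_mean)
```

For the partial 2022 data, a per-day average like this stays comparable across years, whereas a yearly sum would not.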

#traffic offenses NEIGHBORHOOD
denver_cols_map_traffic_2_neighborhood <- denver_cols_map_traffic_2 %>%
  group_by(NEIGHBORHOOD_ID) %>%
  dplyr::summarize(total = n()) %>%
  slice_max(order_by = total, n = 10)

denver_cols_map_traffic_2_neighborhood_plot <- ggplot(denver_cols_map_traffic_2_neighborhood,
                                                      aes(x = reorder(NEIGHBORHOOD_ID, -total), y = total, fill = NEIGHBORHOOD_ID)) + 
  geom_bar(stat = 'identity', position = 'dodge') + 
  xlab("Denver Neighborhoods") + 
  ylab("Total Traffic Offenses") + ggtitle("Denver: Top 10 Neighborhoods by Number of Traffic Offenses") +
  theme(legend.position = "none", axis.text.x = element_text(angle=45, vjust=0.9, hjust=1))

denver_cols_map_traffic_2_neighborhood_plot

Explanation

This plot once again shows that the majority of traffic offenses occur downtown. The difference between this and the crime plot is that most crimes happened in Five Points, whereas most traffic incidents occurred in Central Park and Baker. This makes sense, as both neighborhoods are close to the freeway and heavily traveled.

#Heat map of Time - Traffic Accidents
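#grepl() is a built-in R function that returns TRUE wherever the pattern is found in a string,
#so this keeps every OFFENSE_TYPE_ID containing "traffic-accident", including sub-types such as traffic-accident-hit-and-run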
denver_cols_map_traffic_info <- denver_cols_map_traffic_2 %>%
        filter(grepl("traffic-accident", OFFENSE_TYPE_ID))

denver_cols_map_traffic_info_daily <- denver_cols_map_traffic_info %>%
        group_by(date) %>%
        dplyr::summarize(total = n()) %>%
        arrange(date)

df_traffic_over_time_plot <- ggplot(denver_cols_map_traffic_info_daily, aes(x = date, y = total)) +
        geom_line(color = "purple", size = 0.05) +
        geom_smooth(color = "navy") +
        scale_x_date(breaks = date_breaks("1 year"), labels = date_format("%Y")) +
        xlab("Year of Traffic Accident") + ylab("Number of Traffic Accidents") + 
        ggtitle("Denver: Daily Number of Traffic Accidents from 1/2/2017 - 6/12/2022") +
        theme(axis.text.x = element_text(angle=30, vjust=.5, hjust=1))

get_hour_of_day <- function(x) {
        return (as.numeric(strsplit(x,":")[[1]][1]))
} 

time_day_traffic_accident <- denver_cols_map_traffic_info %>%
        mutate(Hour = sapply(time, get_hour_of_day)) %>%
        group_by(dayofweek, Hour) %>%
        dplyr::summarize(total = n())
## `summarise()` has grouped output by 'dayofweek'. You can override using the
## `.groups` argument.
day_of_week_format <- c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday", "Sunday")
hour_order_format <- c(paste(c(12,1:11),"AM"), paste(c(12,1:11),"PM"))


time_day_traffic_accident$dayofweek <- factor(time_day_traffic_accident$dayofweek, label = revalue(day_of_week_format))
time_day_traffic_accident$Hour <- factor(time_day_traffic_accident$Hour, level = 0:23, label = hour_order_format)

##HEATMAP PLOT TRAFFIC ACCIDENTS__________________________
traffic.heatmap <- ggplot(data = time_day_traffic_accident,
                          mapping = aes(x = Hour, y = dayofweek, fill = total)) +
  geom_tile() +
  ggtitle("Denver: Number of Traffic Accidents from 1/2/2017 - 6/12/2022") +
  xlab("Hour of the Day") + ylab("Day of the Week") + 
  theme(legend.position = "right", axis.text.x = element_text(angle=45, vjust=0.9, hjust=1))

traffic.heatmap

Explanation

This heat map is much easier to decipher than the crime version. Most traffic accidents occur between 7 and 8 AM and between 3 and 5 PM, Tuesday through Friday. This makes sense, since both time frames coincide with daily commuter traffic.
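
When reading either heat map, it is worth confirming that the day labels on the y-axis match the underlying values. In base R, `factor(x, labels = ...)` renames the sorted unique values of `x`, while `factor(x, levels = ...)` keeps each value's name and only sets the order. Whether this matters for the plots here depends on what the dayofweek column contains, which is not shown in this section; the toy vector below simply assumes full day names.

```r
days <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
x <- c("Friday", "Monday", "Sunday")   # toy day-name values

# labels = ... renames the sorted unique values ("Friday", "Monday", "Sunday")
# with the entries supplied, in that order, so every name here changes.
relabelled <- factor(x, labels = days[1:3])

# levels = ... keeps each value's own name and only fixes the display order.
reordered <- factor(x, levels = days)

as.character(relabelled)  # "Monday" "Tuesday" "Wednesday"
as.character(reordered)   # "Friday" "Monday" "Sunday"
```

A quick `table()` of the column before and after the conversion is a simple way to check that no day has been renamed.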

#TRAFFIC INCIDENT Investigation other than 'Traffic Accident'
#filter out traffic-accident

denver_crimes_codes_joined_is_isnt_crime_3 <- denver_crimes_codes_joined_3 %>%
        filter(IS_TRAFFIC ==1, OFFENSE_TYPE_ID != "traffic-accident") %>%
        group_by(OFFENSE_TYPE_ID) %>%
        dplyr::summarize(total = n())

#summarize() keeps only OFFENSE_TYPE_ID and total, so the remaining sub-types are inspected through OFFENSE_TYPE_ID
unique(denver_crimes_codes_joined_is_isnt_crime_3$OFFENSE_TYPE_ID)
## [1] "traf-vehicular-assault"       "traf-vehicular-homicide"     
## [3] "traffic-accident-dui-duid"    "traffic-accident-hit-and-run"
denver_crimes_codes_joined_is_isnt_crime_3_wo_hom <- denver_crimes_codes_joined_is_isnt_crime_3 %>%
        filter(OFFENSE_TYPE_ID != "traf-vehicular-homicide", OFFENSE_TYPE_ID != "traf-vehicular-assault")

denver_crimes_codes_joined_is_isnt_crime_3_wo_hom$percentage <- 
        denver_crimes_codes_joined_is_isnt_crime_3_wo_hom$total / 
        sum(denver_crimes_codes_joined_is_isnt_crime_3_wo_hom$total)

pie_traffic_misc_2 <- ggplot(data = denver_crimes_codes_joined_is_isnt_crime_3_wo_hom, aes(x="", y=percentage, fill=OFFENSE_TYPE_ID)) +
        geom_col(color = "black") + 
        coord_polar("y", start=0) +
        geom_text(aes(x=1.6, label=paste0(round(percentage*100), "%")),
                  position = position_stack(vjust=0.5)) +
        theme(panel.background = element_blank(),
              axis.line = element_blank(),
              axis.text = element_blank(),
              axis.ticks = element_blank(),
              axis.title = element_blank(),
              plot.title = element_text(hjust = 0.5, size = 14)) +
        ggtitle("Subtypes of Traffic Accidents - Roughly One Third \n of the 111,000 Accidents in the Dataset") +
        scale_fill_discrete(name = "Miscellaneous Traffic Type")

pie_traffic_misc_2

Explanation

I explored the original data-set of all traffic incidents and found that roughly 73,000 of the 111,000 incidents were filed as ‘traffic accidents’. I wanted to explore the other sub-types of traffic incidents. I removed two sub-types (vehicular homicide and vehicular assault) because together they accounted for only 300 of the 37,000 remaining incidents. The resulting pie chart shows that, of this subset, the large majority (92%) are hit-and-runs and the remainder (8%) are DUI-related.