Introduction

This data set records reported crime in Chicago, Illinois, from 2001 to 2021. The data is sourced from the Chicago Police Department’s CLEAR (Citizen Law Enforcement Analysis and Reporting) system. It contains 22 variables and about 7.5 million rows. This project uses the data set to create visualizations that not only answer questions such as “What is the most common type of crime in Chicago?” but also raise questions that cannot be answered with this data set alone.

Basic Descriptive Statistics

As previously stated, the data set has about 7.5 million rows and 22 variables, such as Date, Description, and Location. The variables' data types include character, integer, and logical. The variables used to create the visualizations include Date, Primary Type, and District. As seen in the plots below, the year with the most crime was 2002 and the year with the least was 2021. The most common type of crime is theft and the least common is domestic violence. District 4 recorded the most crimes and District 21 the fewest.
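
The mix of column types described above can be checked directly in R. The following is a minimal sketch on a small made-up data frame: `toy` and all of its values are hypothetical stand-ins, not rows from the actual file. The same `vapply()` call could be pointed at the full table once it is loaded.

```r
# Hypothetical three-row stand-in for the crime table; all values are invented.
toy <- data.frame(
  Date     = c("01/05/2015 01:30:00 PM", "03/17/2016 09:00:00 AM", "07/04/2017 11:45:00 PM"),
  District = c(4L, 9L, 21L),
  Arrest   = c(TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# First class of every column, returned as a named character vector.
column_types <- vapply(toy, function(col) class(col)[1], character(1))
print(column_types)

# Number of columns of each type.
type_counts <- table(column_types)
print(type_counts)
```

Taking only the first class keeps the result a plain character vector even for columns that carry more than one class, such as parsed date-times.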

Histogram of Crime in Chicago by Year

The year with the most recorded crime is 2002, with 486,778 incidents. The year with the least recorded crime is 2021, with 203,527. Over the 21 years, the average is 357,541 crimes per year, and 2011 is the year closest to that average.

library(data.table)
library(plyr)
library(dplyr)
library(DescTools)
library(ggplot2)
library(scales)
library(lubridate)
library(ggrepel)
library(RColorBrewer)
library(ggthemes)
library(plotly)
chicago <- fread("~/Desktop/R_datafiles/Crimes 2001 to Present.csv")

# Histogram of Crime in Chicago by Year (2001-2021)
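plot1 <- ggplot(chicago, aes(x = Year)) +
  # one bin per year, matching the binwidth used for the count labels below
  geom_histogram(binwidth = 1, color = "darkblue", fill = "lightblue") +
  labs(title = "Histogram of Crime in Chicago by Year", x = "Year", y = "Count of Crimes") +
  scale_y_continuous(labels = comma) +
  # after_stat(count) replaces the deprecated ..count.. notation (ggplot2 >= 3.3.0)
  stat_bin(binwidth = 1, geom = 'text', color = 'black',
           aes(label = scales::comma(after_stat(count))), vjust = -0.5)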
# x-axis labels
x_axis_labs <- min(chicago$Year):max(chicago$Year)
plot1 <- plot1 + scale_x_continuous(labels = x_axis_labs, breaks = x_axis_labs)
plot1
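
The yearly average and the year closest to it, as quoted above, come down to a few lines of arithmetic on the per-year counts. The following is a minimal base R sketch, assuming a vector of years with one entry per record. `toy_years` is an invented stand-in; the real input would be the `Year` column.

```r
# Hypothetical Year values, one per record; the real input would be chicago$Year.
toy_years <- c(2001, 2001, 2001,
               2002, 2002, 2002, 2002, 2002,
               2003, 2003, 2003, 2003)

# Count records per year and keep the result as a plain named vector.
per_year <- table(toy_years)
counts <- setNames(as.vector(per_year), names(per_year))

# Mean number of records per year.
avg_per_year <- mean(counts)

# Year whose count is closest to that mean.
closest_year <- names(counts)[which.min(abs(counts - avg_per_year))]

# Years with the highest and lowest counts.
busiest_year  <- names(counts)[which.max(counts)]
quietest_year <- names(counts)[which.min(counts)]

print(c(average = avg_per_year))
print(c(closest = closest_year, busiest = busiest_year, quietest = quietest_year))
```

`which.max()` and `which.min()` return the first match, so ties would resolve to the earliest year.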

Heatmap: Citations by Day of the Week

Friday is the most common day for a crime to occur in every year except 2020 and 2021. Sunday is the least common day in 2001-2013, 2016, 2018, and 2019. In 2020, the most common day was Sunday with 31,106 and the least common was Tuesday with 29,035. In 2021, the most common day was Saturday with 29,861 and the least common was Thursday with 27,847.

days_df <- chicago %>%
  select(Date) %>%
  # parse the timestamp once, then derive year and weekday from it
  mutate(parsed = mdy_hms(Date),
         year = year(parsed),
         dayoftheweek = weekdays(parsed, abbreviate = TRUE)) %>%
  group_by(year, dayoftheweek) %>%
  summarise(n = n(), .groups = 'keep') %>%
  data.frame()
days_df <- na.omit(days_df)
days_df$year <- as.factor(days_df$year)

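# weekday order for the y-axis (weekdays() abbreviations assume an English locale)
mylevels <- c("Mon","Tue","Wed","Thu","Fri","Sat","Sun")
days_df$dayoftheweek <- factor(days_df$dayoftheweek, levels = mylevels)

# legend breaks every 5,000 citations
breaks <- seq(0, max(days_df$n), by = 5000)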

# Heatmap: Citations by Day of the Week
plot2 <- ggplot(days_df, aes(x = year, y = dayoftheweek, fill = n)) +
  geom_tile(color = "black") +
  geom_text(aes(label = comma(n))) +
  coord_equal(ratio = 1) +
  labs(title = "Heatmap: Citations by Day of the Week",
       x = "Year",
       y = "Days of the Week",
       fill = "Citation Count") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_discrete(limits = rev(levels(days_df$dayoftheweek))) +
  scale_fill_continuous(labels = comma, low = "white", high = "red", breaks = breaks) +
  guides(fill = guide_legend(reverse = TRUE, override.aes = list(color = "black")))
plot2
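
The most and least common day for each year can also be read off the summary table rather than the heatmap. The following is a minimal base R sketch, assuming a data frame shaped like `days_df` (columns `year`, `dayoftheweek`, `n`). `toy_days` and its counts are invented for illustration and are not the real figures.

```r
# Hypothetical counts in the same shape as days_df; the numbers are invented.
toy_days <- data.frame(
  year         = c(2020, 2020, 2020, 2021, 2021, 2021),
  dayoftheweek = c("Tue", "Fri", "Sun", "Thu", "Fri", "Sat"),
  n            = c(10, 12, 15, 8, 11, 14),
  stringsAsFactors = FALSE
)

# Split by year, then pick the rows with the largest and smallest count in each piece.
by_year <- split(toy_days, toy_days$year)
extremes <- do.call(rbind, lapply(by_year, function(d) {
  data.frame(
    year      = d$year[1],
    most_day  = d$dayoftheweek[which.max(d$n)],
    most_n    = max(d$n),
    least_day = d$dayoftheweek[which.min(d$n)],
    least_n   = min(d$n),
    stringsAsFactors = FALSE
  )
}))
print(extremes)
```

If two days tie within a year, `which.max()` and `which.min()` report only the first one in row order.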

Citations by Hour

The hour with the most citations is 12 p.m., with 428,134. The hour with the fewest citations is 5 a.m., with 101,512. After the early-morning low, the citation count generally rises as the day goes on.

# Citations by Hour (line chart)
timeofday_df <- chicago %>%
  select(Date) %>%
  mutate(hour24 = hour(mdy_hms(Date))) %>%
  group_by(hour24) %>%
  summarise(n = length(Date), .groups = "keep") %>%
  data.frame()
# x axis points at every hour mark
x_axis_labels <- min(timeofday_df$hour24):max(timeofday_df$hour24)

# largest and lowest number of citations by time of day
lg_sm <- timeofday_df %>%
  filter(n==min(n) | n==max(n)) %>%
  data.frame()

# Line Chart with citation count per hour of day, with max and min points labeled
plot3 <- ggplot(timeofday_df, aes(x = hour24, y = n)) +
  geom_line(color = "blue", size = 1) +
  geom_point(shape = 21, size = 4, color = "red", fill = "white") +
  labs(x = "Hour", y = "Citation Count", title = "Citations by Hour") +
  scale_y_continuous(labels = comma) +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_x_continuous(labels = x_axis_labels, breaks = x_axis_labels, minor_breaks = NULL) +
  geom_point(data = lg_sm, aes(x = hour24, y = n), shape = 21, size = 4,fill = "red",color = "red") +
  geom_label_repel(aes(label = ifelse(n==max(n) | n==min(n),scales::comma(n),"")),box.padding = 1, point.padding = 1, size=4, 
                   color='Grey50', segment.color='darkblue')
plot3

Citations by Districts in Chicago

The donut chart shows the share of citations in the 10 districts with the most citations for crime in Chicago from 2001-2021. The district with the most citations is District 4, with 418,227. The district with the fewest citations among the top 10 is District 9, with 416,429. The district with the fewest citations overall is District 21, with 4. Together, the top 10 districts account for around 55.9% of all citations in Chicago. The top 10 districts are also primarily located in the southern area of Chicago.

top_district <- count(chicago, District)
top_district <- na.omit(top_district)
# ordering District count in descending order
top_district <- top_district[order(-n),]
# top 10 Districts
top_district10 <- top_district[1:10]

district_df <- chicago %>%
  select(District, Date) %>%
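  mutate(year = year(mdy_hms(Date)),
         # keep each row's own District; passing top_district10$District as the
         # "yes" value would recycle those 10 values across all rows
         myDistrict = ifelse(District %in% top_district10$District, as.character(District), "Other")) %>%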
  group_by(year, myDistrict) %>%
  summarise(n = length(myDistrict), .groups = 'keep') %>%
  group_by(year) %>%
  mutate(percent_of_total = round(100*n/sum(n),1)) %>%
  ungroup() %>%
  data.frame()

# Donut Chart
plot4 <- plot_ly(district_df, labels = ~myDistrict, values = ~n) %>%
  add_pie(hole = 0.6) %>%
  layout(title = "Citations by Districts in Chicago (2001-2021)",
         annotations = list(text = paste0("Total Citation Count:\n",
                                          scales::comma(sum(district_df$n))),
                            showarrow = FALSE))
plot4
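
The combined share of the top districts is the fraction of records whose district is in the top group. The following is a minimal base R sketch of that grouping, assuming one district label per record. `toy_district` is invented data, and `top_n` is set to 2 here only to keep the example small (the report uses 10).

```r
# Hypothetical district label per record; values are invented.
toy_district <- c("4", "4", "4", "4", "9", "9", "9", "21", "7", "7")

# Count records per district, largest first.
district_counts <- sort(table(toy_district), decreasing = TRUE)

# Labels of the top districts.
top_n <- 2
top_labels <- names(district_counts)[seq_len(top_n)]

# Keep each record's own label if it is in the top group, otherwise "Other".
lumped <- ifelse(toy_district %in% top_labels, toy_district, "Other")

# Percentage of all records that fall in the top group.
top_share <- round(100 * mean(lumped != "Other"), 1)

print(table(lumped))
print(top_share)
```

The `yes` argument of `ifelse()` is the record's own label, so it has the same length as the test and nothing is recycled.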

Number of Crimes in Chicago by Type

The plot includes only the last 8 years of the data set to prevent overcrowding. The most common type of crime from 2014 to 2021 was Theft, with 453,888. The least common type shown on the plot was Robbery, with 76,671. The same pattern holds when the data from 2001-2021 is included.

# number of times a type of crime occurred (alphabetical order)
chicago_type <- count(chicago,`Primary Type`)

# number of times a type of crime occurred (descending order)
chicago_type <- chicago_type[order(chicago_type$n,decreasing = TRUE),]

# top 10 types of crime in Chicago
top_crime <- chicago_type$`Primary Type`[1:10]

# Type of crime, year it occurred, and number of times it occurred (top 10 types only)
chicago_new <- chicago %>%
  filter(`Primary Type` %in% top_crime) %>%
  select(Date, `Primary Type`) %>%
  mutate(year = year(mdy_hms(Date))) %>%
  group_by(`Primary Type`,year) %>%
  summarise(n=length(`Primary Type`), .groups = "keep") %>%
  data.frame()

# Same summary for every crime type outside the top 10, grouped together as "Other"
chicago_other <- chicago %>%
  filter(!`Primary Type` %in% top_crime) %>%
  select(Date) %>%
  mutate(year=year(mdy_hms(Date)), `Primary Type` = "Other") %>%
  group_by(`Primary Type`, year) %>%
  summarise(n=length(`Primary Type`), .groups = 'keep') %>%
  data.frame()

chicago_new <- rbind(chicago_new,chicago_other)

# only selecting the past 8 years for better visualization
chicago_new <- chicago_new[chicago_new$year >= 2014,]

chicago_total <- chicago_new %>%
  select(Primary.Type,n) %>%
  group_by(Primary.Type) %>%
  summarise(tot=sum(n), .groups = 'keep') %>%
  data.frame()
chicago_new$year <- as.factor(chicago_new$year)

# setting maximum values
max_y <- round_any(max(chicago_total$tot), 50000, ceiling)

# Number of Crimes in Chicago by Type of Crime
plot5 <- ggplot(chicago_new, aes(x = reorder(Primary.Type,n,sum), y = n, fill = year)) +
  geom_bar(stat = "identity", position = position_stack(reverse = TRUE)) +
  coord_flip() +
  labs(title = "Number of Crimes in Chicago by Type", x = "", y = "Crime Count", fill = "Year") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_fill_brewer(palette = "Spectral", guide=guide_legend(reverse = TRUE)) +
  geom_text(data = chicago_total, aes(x = Primary.Type, y = tot, label = scales::comma(tot), fill = NULL),
            hjust=1,size=4) +
  scale_y_continuous(labels = comma,
                     breaks = seq(0, max_y, by=50000),
                     limits = c(0,max_y))

plot5

Conclusion

Overall, this project provides a general understanding of the crime that occurred in the city of Chicago from 2001-2021. Using the visualizations provided, one can argue that the most likely crime would be theft on a Friday at 12 p.m., likely in one of the top 10 districts. This project only contains data about crimes that occurred and were recorded. It could be expanded with data about the people who committed the crimes. Combining this project with data about offenders’ race, economic background, and/or age could offer more insight into crime in Chicago and in other cities.