Chicago Crime EDA

library(readr)
library(lubridate)
## Warning: package 'lubridate' was built under R version 4.3.1
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.3.1
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.3.1
crimedf <- read_csv("~/Desktop/crimedataquery.csv")
## Rows: 29845 Columns: 22
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): case_number, date, block, iucr, primary_type, description, locatio...
## dbl (10): unique_key, beat, district, ward, community_area, x_coordinate, y_...
## lgl  (2): arrest, domestic
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Convert 'date' column to date-time format
crimedf$date <- as.POSIXct(crimedf$date, format="%Y-%m-%d %H:%M:%S", tz="UTC")

# Filtering for the years 2022 and 2023
crimedf_filtered <- filter(crimedf, year(date) %in% c(2022, 2023))

Exploratory Questions

  1. Which types of crime occurred most often in 2022-2023?
  2. What are the top 5 crimes that occurred in 2022-2023?
  3. Which blocks have the highest counts of the leading crime types (deceptive practice and battery)?

Which types of crime occurred most often in 2022-2023?

crime_count_2022_2023 <- crimedf_filtered %>%
                        group_by(primary_type) %>%
                        summarize(count = n()) %>%
                        arrange(desc(count))

# Optionally, create a bar plot
ggplot(crime_count_2022_2023, aes(x=reorder(primary_type, count), y=count)) +
  geom_bar(stat="identity") +
  coord_flip() +
  labs(title="Crime Types in Chicago (2022-2023)", x="Crime Type", y="Count")

What are the top 5 crimes that occurred in 2022-2023?

From the previous plot, the five most frequent crime types in 2022-2023 were:

  1. deceptive practice
  2. battery
  3. other offense
  4. narcotics
  5. robbery

In other words, fraud-related offenses lead the counts in this dataset. Deceptive practice generally involves misleading a victim, for example a consumer, with false information.
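The ranking above is read off the bar plot. A minimal sketch of pulling the top five rows out programmatically is shown below. It assumes nothing about the real file: `toy_crimes` and its counts are invented for illustration, and `top5` is a hypothetical name.

```r
library(dplyr)

# Toy data, invented for illustration only (NOT the real Chicago counts)
toy_crimes <- tibble(
  primary_type = c(rep("DECEPTIVE PRACTICE", 7), rep("BATTERY", 6),
                   rep("OTHER OFFENSE", 5), rep("NARCOTICS", 4),
                   rep("ROBBERY", 3), rep("THEFT", 2), rep("ASSAULT", 1))
)

top5 <- toy_crimes %>%
  # count() is shorthand for group_by() + summarize(n()) + arrange(desc())
  count(primary_type, name = "count", sort = TRUE) %>%
  slice_head(n = 5)  # keep the five most frequent types

print(top5)
```

On the real data, the same two steps could be applied to `crimedf_filtered` in place of `toy_crimes`.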

Which blocks have the highest counts of the leading crime types?

# "DECEPTIVE PRACTICE" is the most frequent crime type in the filtered data
deceptive_crimes <- filter(crimedf_filtered, primary_type == "DECEPTIVE PRACTICE")

deceptive_crime_by_block <- deceptive_crimes %>%
  group_by(block) %>%
  summarize(count = n()) %>%
  arrange(desc(count)) %>%
  head(1)  # Keep only the top block

# Print the result
print(deceptive_crime_by_block)
## # A tibble: 1 × 2
##   block            count
##   <chr>            <int>
## 1 001XX N STATE ST    22

This block has the highest number of deceptive practice incidents in Chicago in this dataset. To look at the second most frequent crime type, we find the block with the most ‘battery’ incidents.

battery_crimes <- filter(crimedf_filtered, primary_type == "BATTERY")

battery_crime_by_block <- battery_crimes %>%
  group_by(block) %>%
  summarize(count = n()) %>%
  arrange(desc(count)) %>%
  head(1)  # Assuming you want the top block

# Print the result
print(battery_crime_by_block)
## # A tibble: 1 × 2
##   block               count
##   <chr>               <int>
## 1 006XX S CENTRAL AVE     6

This block has the highest number of battery incidents in Chicago in this dataset. These crimes were committed in 2022-2023.
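The two lookups above repeat the same filter, group, and sort steps for each crime type. As a hedged sketch, the top block for every crime type can be found in a single pass. The `toy_incidents` table and its block labels are made up for illustration, and `top_block_by_type` is a hypothetical name.

```r
library(dplyr)

# Toy incidents, invented for illustration (block labels are placeholders)
toy_incidents <- tibble(
  primary_type = c("BATTERY", "BATTERY", "BATTERY",
                   "DECEPTIVE PRACTICE", "DECEPTIVE PRACTICE",
                   "DECEPTIVE PRACTICE", "DECEPTIVE PRACTICE"),
  block = c("BLOCK A", "BLOCK A", "BLOCK B",
            "BLOCK C", "BLOCK C", "BLOCK C", "BLOCK D")
)

top_block_by_type <- toy_incidents %>%
  count(primary_type, block, name = "count") %>%   # incidents per type and block
  group_by(primary_type) %>%
  slice_max(count, n = 1, with_ties = FALSE) %>%   # one top block per type
  ungroup()

print(top_block_by_type)
```

Setting `with_ties = FALSE` keeps exactly one block per type. Leaving it at the default would return every block tied for first place, which may be preferable when counts are small.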

Temporal Analysis near the Top Blocks

For a temporal analysis, we focus on the two blocks identified above:

  1. 001XX N STATE ST - DECEPTIVE PRACTICE - top block in 2022-2023; we want to go back at least 3 years
  2. 006XX S CENTRAL AVE - BATTERY - top block in 2022-2023; we want to go back at least 3 years
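One way to look back several years for a single block is to count incidents per year for that block and crime type. The sketch below is only an illustration under stated assumptions: `toy_history`, its dates, and the `BLOCK A` label are invented, and the 2020 cutoff is an arbitrary stand-in for "at least 3 years back".

```r
library(dplyr)
library(lubridate)

# Toy history, invented for illustration (not taken from the real file)
toy_history <- tibble(
  date = as.POSIXct(c("2020-03-01 10:00:00", "2021-06-15 12:30:00",
                      "2021-09-09 08:00:00", "2022-01-20 17:45:00",
                      "2023-05-05 09:15:00", "2023-11-11 23:00:00"),
                    format = "%Y-%m-%d %H:%M:%S", tz = "UTC"),
  block = c("BLOCK A", "BLOCK A", "BLOCK A", "BLOCK B", "BLOCK A", "BLOCK A"),
  primary_type = "DECEPTIVE PRACTICE"
)

yearly_counts <- toy_history %>%
  filter(block == "BLOCK A",
         primary_type == "DECEPTIVE PRACTICE",
         year(date) >= 2020) %>%               # assumed look-back cutoff
  count(year = year(date), name = "count")     # incidents per calendar year

print(yearly_counts)
```

Years with no incidents do not appear in the result at all, so a gap in the `year` column means zero incidents for that block, not missing data.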

Temporal Analysis

library(readr)
library(dplyr)
library(lubridate)
library(leaflet)
## Warning: package 'leaflet' was built under R version 4.3.1
library(leaflet.extras)
library(cluster)
## Warning: package 'cluster' was built under R version 4.3.1

# Load and preprocess data
crimedf <- read_csv("~/Desktop/crimedataquery.csv")
## Rows: 29845 Columns: 22
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): case_number, date, block, iucr, primary_type, description, locatio...
## dbl (10): unique_key, beat, district, ward, community_area, x_coordinate, y_...
## lgl  (2): arrest, domestic
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
crimedf$date <- as.POSIXct(crimedf$date, format="%Y-%m-%d %H:%M:%OS", tz="UTC")
crimedf_filtered <- filter(crimedf, year(date) >= 2019)

# Clean data: Remove rows with NA/NaN/Inf in latitude or longitude
crimedf_filtered <- crimedf_filtered %>% 
  filter(!is.na(latitude) & !is.na(longitude) & 
         !is.infinite(latitude) & !is.infinite(longitude))

# K-means clustering
set.seed(123) # For reproducibility
coords <- crimedf_filtered %>% select(latitude, longitude)
kmeans_result <- kmeans(coords, centers = 5) # Adjust 'centers' as needed

# Add cluster information to the data
crimedf_filtered$cluster <- kmeans_result$cluster

# Create a leaflet map
map <- leaflet(crimedf_filtered) %>% addTiles()

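# Map each cluster to a color; leaflet expects color values, not a raw factor
pal <- colorFactor("Set1", domain = factor(crimedf_filtered$cluster))

# Add clustered points to the map
map <- map %>% addCircleMarkers(
  lat = ~latitude, 
  lng = ~longitude, 
  color = ~pal(factor(cluster)), 
  popup = ~paste("Cluster:", cluster)
)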

# Add a heatmap layer
map <- map %>% addHeatmap(
    lat = ~latitude, 
    lng = ~longitude, 
    intensity = ~1, 
    blur = 20, 
    max = 0.05, 
    radius = 15
)

# Render the map
map

Here is a map of crimes reported in Chicago since 2019. K-means clustering on latitude and longitude groups the incidents into five geographic clusters, which helps concentrate attention on specific hot zones. Each marker is labeled with its cluster, and the heatmap layer shows where incidents are most dense.
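The choice of five clusters is adjustable. One common heuristic for picking the number of centers is the elbow method: run k-means for several values of k and look for the point where the total within-cluster sum of squares stops dropping sharply. The sketch below uses synthetic coordinates generated on the spot (`toy_coords` is hypothetical and not the real data), so it illustrates the procedure only and says nothing about the best k for the Chicago file.

```r
set.seed(123)  # For reproducibility

# Synthetic coordinates: three artificial groups, invented for illustration
toy_coords <- rbind(
  cbind(rnorm(50, 41.75, 0.01), rnorm(50, -87.65, 0.01)),
  cbind(rnorm(50, 41.88, 0.01), rnorm(50, -87.63, 0.01)),
  cbind(rnorm(50, 41.95, 0.01), rnorm(50, -87.75, 0.01))
)
colnames(toy_coords) <- c("latitude", "longitude")

# Total within-cluster sum of squares for k = 1..6
k_values <- 1:6
wss <- sapply(k_values, function(k) {
  kmeans(toy_coords, centers = k, nstart = 10)$tot.withinss
})

# The "elbow" is where adding more clusters stops reducing wss by much
plot(k_values, wss, type = "b",
     xlab = "Number of clusters (k)",
     ylab = "Total within-cluster sum of squares")
```

On the real data, `toy_coords` would be replaced by the `coords` table of latitude and longitude. Note that k-means on raw degrees treats latitude and longitude as a flat plane, which is a reasonable approximation over a single city.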