Chicagocrime

Chicago Crime EDA

library(readr)
library(lubridate)

## Warning: package 'lubridate' was built under R version 4.3.1

## 
## Attaching package: 'lubridate'

## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

library(dplyr)

## Warning: package 'dplyr' was built under R version 4.3.1

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggplot2)

## Warning: package 'ggplot2' was built under R version 4.3.1

crimedf <- read_csv("~/Desktop/crimedataquery.csv")

## Rows: 29845 Columns: 22

## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): case_number, date, block, iucr, primary_type, description, locatio...
## dbl (10): unique_key, beat, district, ward, community_area, x_coordinate, y_...
## lgl  (2): arrest, domestic
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# Convert 'date' column to date-time format
crimedf$date <- as.POSIXct(crimedf$date, format="%Y-%m-%d %H:%M:%S", tz="UTC")

# Filtering for the years 2022 and 2023
crimedf_filtered <- filter(crimedf, year(date) %in% c(2022, 2023))

Exploratory Questions

Which types of crime occurred most often in 2022-2023?
What are the top 5 crimes that occurred in 2022-2023?
Which block has the highest number of incidents for the top crime types?

Which types of crime occurred most often in 2022-2023?

crime_count_2022_2023 <- crimedf_filtered %>%
                        group_by(primary_type) %>%
                        summarize(count = n()) %>%
                        arrange(desc(count))

# Optionally, create a bar plot
ggplot(crime_count_2022_2023, aes(x=reorder(primary_type, count), y=count)) +
  geom_bar(stat="identity") +
  coord_flip() +
  labs(title="Crime Types in Chicago (2022-2023)", x="Crime Type", y="Count")

What are the top 5 crimes that occurred in 2022-2023?

From the previous plot, the top 5 crimes committed in 2022-2023 were:

- deceptive practice
- battery
- other offense
- narcotics
- robbery

In other words, fraud-related crimes lead the counts for this period. Deceptive practice offenses typically mislead a victim by providing false information.
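The top 5 list above is read off the bar plot. A minimal sketch of how the same ranking could be pulled out in code is shown here. The helper name `top_crime_types` and the toy data are hypothetical and only illustrate the idea; in the report the function would be applied to `crimedf_filtered`.

```r
library(dplyr)

# Hypothetical helper: return the n most frequent crime types in a data frame
# that has a 'primary_type' column.
top_crime_types <- function(df, n = 5) {
  df %>%
    count(primary_type, name = "count") %>%        # one row per crime type
    slice_max(count, n = n, with_ties = FALSE)     # keep the n largest counts
}

# Toy data for illustration only (not taken from the Chicago dataset)
toy <- tibble(primary_type = c("BATTERY", "BATTERY", "THEFT",
                               "ROBBERY", "BATTERY", "THEFT"))
top2 <- top_crime_types(toy, n = 2)
print(top2)
```

With the real data, `top_crime_types(crimedf_filtered)` should return the five rows at the top of `crime_count_2022_2023`.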

Which block has the highest number of incidents for the top crime types?

# Keep only the most common crime type (deceptive practice)
deceptive_crimes <- filter(crimedf_filtered, primary_type == "DECEPTIVE PRACTICE")

# Count incidents per block and keep the block with the most incidents
deceptive_crime_by_block <- deceptive_crimes %>%
  group_by(block) %>%
  summarize(count = n()) %>%
  arrange(desc(count)) %>%
  head(1)  # Keep only the top block

# Print the result
print(deceptive_crime_by_block)

## # A tibble: 1 × 2
##   block            count
##   <chr>            <int>
## 1 001XX N STATE ST    22

This block has the highest number of deceptive practice incidents in the data (22 in 2022-2023). Battery is the second most common crime type, so next we look for the block with the most 'battery' incidents.

battery_crimes <- filter(crimedf_filtered, primary_type == "BATTERY")

battery_crime_by_block <- battery_crimes %>%
  group_by(block) %>%
  summarize(count = n()) %>%
  arrange(desc(count)) %>%
  head(1)  # Assuming you want the top block

# Print the result
print(battery_crime_by_block)

## # A tibble: 1 × 2
##   block               count
##   <chr>               <int>
## 1 006XX S CENTRAL AVE     6

This block has the highest number of battery incidents in the data (6 in 2022-2023).
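The two lookups above repeat the same filter-and-count steps for one crime type at a time. As a rough sketch, the top block for every crime type could be found in a single pass. The helper name `top_block_by_type` and the toy data are hypothetical; the column names `primary_type` and `block` follow the dataset used in this report.

```r
library(dplyr)

# Hypothetical helper: for each crime type, return the block with the most incidents.
top_block_by_type <- function(df) {
  df %>%
    count(primary_type, block, name = "count") %>%   # incidents per type and block
    group_by(primary_type) %>%
    slice_max(count, n = 1, with_ties = FALSE) %>%   # top block within each type
    ungroup()
}

# Toy data for illustration only (not taken from the Chicago dataset)
toy_blocks <- tibble(
  primary_type = c("BATTERY", "BATTERY", "BATTERY", "THEFT"),
  block        = c("A", "A", "B", "C")
)
top_blocks <- top_block_by_type(toy_blocks)
print(top_blocks)
```

Ties are broken arbitrarily here (`with_ties = FALSE`); setting `with_ties = TRUE` would keep every block that shares the top count.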

Temporal Analysis of the Top Blocks

For the temporal analysis, we focus on the two blocks identified above.

We use monthly counts to identify trends, patterns, and seasonal variations. The two cases are broken down by the following parameters:

YEAR CRIME COMMITTED - BLOCK - TIME WINDOW AROUND THAT AREA - TYPE OF CRIME

2022-2023 - 001XX N STATE ST - go back at least 3 years - DECEPTIVE PRACTICE
2022-2023 - 006XX S CENTRAL AVE - go back at least 3 years - BATTERY

Previous Trends

crimedf$date <- as.POSIXct(crimedf$date, format="%Y-%m-%d %H:%M:%S", tz="UTC")

# Case I: DECEPTIVE PRACTICE at 001XX N STATE ST
deceptive_practice_data <- filter(crimedf, primary_type == "DECEPTIVE PRACTICE",
                                  block == "001XX N STATE ST", year(date) >= 2019)

# Aggregate by month
monthly_deceptive <- deceptive_practice_data %>%
  mutate(month = floor_date(date, "month")) %>%
  group_by(month) %>%
  summarize(count = n())

# Plotting
ggplot(monthly_deceptive, aes(x = month, y = count)) +
  geom_line() +
  labs(title = "Monthly Trend of Deceptive Practice at 001XX N STATE ST (2019-2023)",
       x = "Month", y = "Count of Crimes")

There has been a sudden spike in 2023.

# Case II: BATTERY at 006XX S CENTRAL AVE
battery_data <- filter(crimedf, primary_type == "BATTERY",
                       block == "006XX S CENTRAL AVE", year(date) >= 2019)

# Aggregate by month
monthly_battery <- battery_data %>%
  mutate(month = floor_date(date, "month")) %>%
  group_by(month) %>%
  summarize(count = n())

# Plotting
ggplot(monthly_battery, aes(x = month, y = count)) +
  geom_line() +
  labs(title = "Monthly Trend of Battery at 006XX S CENTRAL AVE (2019-2023)",
       x = "Month", y = "Count of Crimes")

There were sudden spikes towards the end of 2021 and in early 2022.
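One caveat when reading these line plots: the monthly aggregation only produces rows for months with at least one incident, so `geom_line()` connects across empty months and a quiet period can look like a steady trend. A minimal sketch of filling the missing months with zeros follows. It assumes the `tidyr` package, which the report does not load, and the helper name `monthly_counts_filled` and the toy dates are hypothetical.

```r
library(dplyr)
library(lubridate)
library(tidyr)   # assumption: tidyr is installed; it is not used elsewhere in this report

# Hypothetical helper: monthly incident counts with zero rows for empty months.
monthly_counts_filled <- function(df) {
  df %>%
    mutate(month = floor_date(date, "month")) %>%
    count(month, name = "count") %>%
    # Add every month between the first and last observed month
    complete(month = seq(min(month), max(month), by = "month"),
             fill = list(count = 0L)) %>%
    arrange(month)
}

# Toy data for illustration only: two incidents in January, none in February, one in March
toy_dates <- tibble(date = as.POSIXct(c("2022-01-15", "2022-01-20", "2022-03-02"),
                                      tz = "UTC"))
filled <- monthly_counts_filled(toy_dates)
print(filled)
```

Passing `battery_data` or `deceptive_practice_data` to this helper and plotting the result would show the empty months as zeros instead of interpolated segments.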

Spatial Analysis

library(readr)
library(dplyr)
library(lubridate)
library(leaflet)

## Warning: package 'leaflet' was built under R version 4.3.1

library(leaflet.extras)
library(cluster)

## Warning: package 'cluster' was built under R version 4.3.1

# Load and preprocess data
crimedf <- read_csv("~/Desktop/crimedataquery.csv")

## Rows: 29845 Columns: 22
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): case_number, date, block, iucr, primary_type, description, locatio...
## dbl (10): unique_key, beat, district, ward, community_area, x_coordinate, y_...
## lgl  (2): arrest, domestic
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

crimedf$date <- as.POSIXct(crimedf$date, format="%Y-%m-%d %H:%M:%OS", tz="UTC")
crimedf_filtered <- filter(crimedf, year(date) >= 2019)

# Clean data: Remove rows with NA/NaN/Inf in latitude or longitude
crimedf_filtered <- crimedf_filtered %>% 
  filter(!is.na(latitude) & !is.na(longitude) & 
         !is.infinite(latitude) & !is.infinite(longitude))

# K-means clustering
set.seed(123) # For reproducibility
coords <- crimedf_filtered %>% select(latitude, longitude)
kmeans_result <- kmeans(coords, centers = 5) # Adjust 'centers' as needed

# Add cluster information to the data
crimedf_filtered$cluster <- kmeans_result$cluster

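# Create a leaflet map
map <- leaflet(crimedf_filtered) %>% addTiles()

# Map each cluster id to a color; leaflet expects color values, not factor labels
pal <- colorFactor("Set1", domain = sort(unique(crimedf_filtered$cluster)))

# Add clustered points to the map, colored by cluster
map <- map %>% addCircleMarkers(
  lat = ~latitude, 
  lng = ~longitude, 
  color = ~pal(cluster), 
  popup = ~paste("Cluster:", cluster)
)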

# Add a heatmap layer
map <- map %>% addHeatmap(
    lat = ~latitude, 
    lng = ~longitude, 
    intensity = ~1, 
    blur = 20, 
    max = 0.05, 
    radius = 15
)

# Render the map
map

The darker the zone, the more of a crime hotspot it is.
The lighter shades indicate areas with less crime.

Here is a visualization of crime locations across Chicago. It is interesting to see how k-means can be used to group incidents so that we can concentrate on specific hot zones. The clustering uses 5 centers on latitude and longitude. The markers are labeled by cluster, and the heatmap reflects the density of incidents, with every incident weighted equally (arrests are not used in the coloring).
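The choice of 5 clusters is a starting point rather than a tuned value. One common way to check it is the elbow heuristic: run k-means for several values of k and look for the point where the total within-cluster sum of squares stops dropping sharply. The sketch below uses simulated coordinates for illustration only; in the report, `coords` would be the latitude/longitude columns selected from `crimedf_filtered`.

```r
set.seed(123)  # For reproducibility

# Simulated coordinates for illustration only: two well-separated groups of points
coords_toy <- data.frame(
  latitude  = c(rnorm(50, 41.75, 0.01), rnorm(50, 41.90, 0.01)),
  longitude = c(rnorm(50, -87.60, 0.01), rnorm(50, -87.70, 0.01))
)

# Total within-cluster sum of squares for k = 1..6
ks  <- 1:6
wss <- sapply(ks, function(k) {
  kmeans(coords_toy, centers = k, nstart = 10)$tot.withinss
})

# Elbow plot: pick the k after which the curve flattens
plot(ks, wss, type = "b",
     xlab = "Number of clusters (k)",
     ylab = "Total within-cluster sum of squares")
```

Note that k-means treats latitude and longitude as plain Euclidean coordinates, which is a reasonable approximation at city scale but is still an assumption.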

Chicagocrime_rstudio

2024-01-05