library(readr)
library(lubridate)
## Warning: package 'lubridate' was built under R version 4.3.1
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.3.1
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.3.1
crimedf <- read_csv("~/Desktop/crimedataquery.csv")
## Rows: 29845 Columns: 22
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): case_number, date, block, iucr, primary_type, description, locatio...
## dbl (10): unique_key, beat, district, ward, community_area, x_coordinate, y_...
## lgl (2): arrest, domestic
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Convert 'date' column to date-time format
crimedf$date <- as.POSIXct(crimedf$date, format="%Y-%m-%d %H:%M:%S", tz="UTC")
# Filtering for the years 2022 and 2023
crimedf_filtered <- filter(crimedf, year(date) %in% c(2022, 2023))
crime_count_2022_2023 <- crimedf_filtered %>%
  group_by(primary_type) %>%
  summarize(count = n()) %>%
  arrange(desc(count))
# Optionally, create a bar plot
ggplot(crime_count_2022_2023, aes(x = reorder(primary_type, count), y = count)) +
  geom_col() +
  coord_flip() +
  labs(title = "Crime Types in Chicago (2022-2023)", x = "Crime Type", y = "Count")
From the previous chart, the five most frequently reported crime types in 2022-2023 were:
- deceptive practice
- battery
- other offense
- narcotics
- robbery

In other words, fraudulent crimes were the most frequently reported category in this period; showing that they are on the rise would require comparing counts across years. These crimes typically involve misleading a consumer by giving them false information.
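The bar chart pools 2022 and 2023 into a single count, so on its own it cannot show whether a category is rising. A minimal sketch of a year-over-year comparison follows; it runs on a small made-up tibble (`toy_crimes`, hypothetical) standing in for `crimedf_filtered`, and uses only `dplyr` and `lubridate`, which the analysis already loads.

```r
library(dplyr)
library(lubridate)

# Toy stand-in for crimedf_filtered; these rows are made up for illustration
toy_crimes <- tibble(
  date = as.POSIXct(c("2022-03-01 10:00:00", "2022-07-04 22:15:00",
                      "2023-01-09 08:30:00", "2023-05-20 14:00:00",
                      "2023-11-11 19:45:00"), tz = "UTC"),
  primary_type = c("DECEPTIVE PRACTICE", "BATTERY", "DECEPTIVE PRACTICE",
                   "DECEPTIVE PRACTICE", "BATTERY")
)

# Count incidents per crime type and year, then compare each year with the previous one
yearly_change <- toy_crimes %>%
  count(primary_type, year = year(date)) %>%
  group_by(primary_type) %>%
  arrange(year, .by_group = TRUE) %>%
  mutate(change = n - dplyr::lag(n)) %>%
  ungroup()

print(yearly_change)
```

On the real data, passing `crimedf_filtered` instead of `toy_crimes` gives one row per crime type and year; a positive `change` for DECEPTIVE PRACTICE in 2023 would support the claim that fraud is increasing.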
deceptive_crimes <- filter(crimedf_filtered, primary_type == "DECEPTIVE PRACTICE")
deceptive_crime_by_block <- deceptive_crimes %>%
  group_by(block) %>%
  summarize(count = n()) %>%
  arrange(desc(count)) %>%
  head(1) # Keep only the block with the most incidents
# Print the result
print(deceptive_crime_by_block)
## # A tibble: 1 × 2
## block count
## <chr> <int>
## 1 001XX N STATE ST 22
This block, 001XX N STATE ST, recorded the most deceptive-practice incidents in the dataset (22 in 2022-2023). To examine the second most common crime type, we repeat the same query for the block with the most ‘battery’ incidents.
battery_crimes <- filter(crimedf_filtered, primary_type == "BATTERY")
battery_crime_by_block <- battery_crimes %>%
  group_by(block) %>%
  summarize(count = n()) %>%
  arrange(desc(count)) %>%
  head(1) # Keep only the block with the most incidents
# Print the result
print(battery_crime_by_block)
## # A tibble: 1 × 2
## block count
## <chr> <int>
## 1 006XX S CENTRAL AVE 6
006XX S CENTRAL AVE recorded the most battery incidents in the dataset (6 in 2022-2023).
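The deceptive-practice and battery queries differ only in the crime type. As a sketch, a small helper (`top_blocks()`, a hypothetical name not used elsewhere in this analysis) can wrap the same filter, count, and sort steps; it is shown here on made-up rows rather than the real data.

```r
library(dplyr)

# Hypothetical helper: the n_blocks blocks with the most incidents of one crime type
top_blocks <- function(df, crime_type, n_blocks = 1) {
  df %>%
    filter(primary_type == crime_type) %>%
    count(block, name = "count", sort = TRUE) %>%  # sorted by count, descending
    slice_head(n = n_blocks)
}

# Toy data (made-up rows) standing in for crimedf_filtered
toy_crimes <- tibble(
  block = c("001XX N STATE ST", "001XX N STATE ST", "006XX S CENTRAL AVE",
            "006XX S CENTRAL AVE", "006XX S CENTRAL AVE", "001XX N STATE ST"),
  primary_type = c("DECEPTIVE PRACTICE", "DECEPTIVE PRACTICE", "BATTERY",
                   "BATTERY", "DECEPTIVE PRACTICE", "BATTERY")
)

top_deceptive <- top_blocks(toy_crimes, "DECEPTIVE PRACTICE")
top_battery <- top_blocks(toy_crimes, "BATTERY")
print(top_deceptive)
print(top_battery)
```

With the real data, `top_blocks(crimedf_filtered, "BATTERY")` should reproduce the battery result above, and `n_blocks = 5` returns the five busiest blocks instead of one.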
For a temporal analysis, we use time series analysis to identify trends, patterns, and seasonal variations in the two cases found above: deceptive practice at 001XX N STATE ST and battery at 006XX S CENTRAL AVE. Each case is broken down by the following parameters:
- year the crime was committed
- block
- evolution over time in that area
- type of crime
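The seasonal variation mentioned above can be checked by pooling incidents from every year and counting them by calendar month. A minimal sketch on made-up timestamps (`toy_dates` is hypothetical); `.drop = FALSE` keeps months with no incidents as explicit zeros.

```r
library(dplyr)
library(lubridate)

# Made-up incident timestamps standing in for one block's records
toy_dates <- tibble(
  date = as.POSIXct(c("2021-01-15 09:00:00", "2021-07-02 18:30:00",
                      "2022-07-19 21:10:00", "2022-12-03 12:00:00",
                      "2023-07-08 16:45:00"), tz = "UTC")
)

# Pool all years and count incidents per calendar month to expose seasonality
seasonal_profile <- toy_dates %>%
  mutate(month_of_year = month(date, label = TRUE)) %>%  # ordered factor Jan..Dec
  count(month_of_year, .drop = FALSE)                    # keep empty months as 0

print(seasonal_profile)
```

With only a few dozen incidents per block, such a profile is noisy, so it is best read as a hint of seasonality rather than evidence of it.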
# Case I: DECEPTIVE PRACTICE at 001XX N STATE ST
deceptive_practice_data <- filter(crimedf, primary_type == "DECEPTIVE PRACTICE",
                                  block == "001XX N STATE ST", year(date) >= 2019)
# Aggregate by month
monthly_deceptive <- deceptive_practice_data %>%
  mutate(month = floor_date(date, "month")) %>%
  group_by(month) %>%
  summarize(count = n())
# Plotting
ggplot(monthly_deceptive, aes(x = month, y = count)) +
  geom_line() +
  labs(title = "Monthly Trend of Deceptive Practice at 001XX N STATE ST (2019-2023)",
       x = "Month", y = "Count of Crimes")
There has been a sudden spike in 2023.
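`summarize()` only returns months that contain at least one incident, so months with zero crimes are absent and `geom_line()` draws straight across them, which can make a change look more gradual or more sudden than it was. A sketch that fills those gaps with zeros using `tidyr::complete()` (tidyr is an extra dependency the original does not load), on made-up timestamps:

```r
library(dplyr)
library(lubridate)
library(tidyr)

# Made-up incidents: nothing happens in February or March 2023
toy_incidents <- tibble(
  date = as.POSIXct(c("2023-01-05 10:00:00", "2023-01-20 14:00:00",
                      "2023-04-11 09:30:00"), tz = "UTC")
)

monthly <- toy_incidents %>%
  mutate(month = floor_date(date, "month")) %>%
  count(month, name = "count") %>%
  # Insert the missing months with a count of 0 so the line drops to zero
  complete(month = seq(min(month), max(month), by = "month"),
           fill = list(count = 0L))

print(monthly)
```

The same `complete()` step could be appended to `monthly_deceptive` and `monthly_battery` before plotting.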
# Case II: BATTERY at 006XX S CENTRAL AVE
battery_data <- filter(crimedf, primary_type == "BATTERY",
                       block == "006XX S CENTRAL AVE", year(date) >= 2019)
# Aggregate by month
monthly_battery <- battery_data %>%
  mutate(month = floor_date(date, "month")) %>%
  group_by(month) %>%
  summarize(count = n())
# Plotting
ggplot(monthly_battery, aes(x = month, y = count)) +
  geom_line() +
  labs(title = "Monthly Trend of Battery at 006XX S CENTRAL AVE (2019-2023)",
       x = "Month", y = "Count of Crimes")
There were sudden spikes towards the end of 2021 and in early 2022.
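"Sudden spike" is judged by eye from the plot. One simple way to make it explicit, which is an assumption rather than part of the original analysis, is to flag months whose count exceeds the series mean by more than two standard deviations. A sketch on made-up monthly counts (`toy_monthly` is hypothetical):

```r
library(dplyr)

# Made-up monthly counts standing in for monthly_battery
toy_monthly <- tibble(
  month = seq(as.Date("2021-01-01"), by = "month", length.out = 12),
  count = c(1, 0, 1, 2, 1, 0, 1, 1, 0, 1, 9, 1)
)

# Assumed rule: a spike is a month more than 2 SD above the mean count
spikes <- toy_monthly %>%
  mutate(is_spike = count > mean(count) + 2 * sd(count)) %>%
  filter(is_spike)

print(spikes)
```

A single large spike also inflates the standard deviation it is compared against, so this rule is conservative and tends to flag only the most extreme months.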
library(readr)
library(dplyr)
library(lubridate)
library(leaflet)
## Warning: package 'leaflet' was built under R version 4.3.1
library(leaflet.extras)
library(cluster)
## Warning: package 'cluster' was built under R version 4.3.1
# Load and preprocess data
crimedf <- read_csv("~/Desktop/crimedataquery.csv")
## Rows: 29845 Columns: 22
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): case_number, date, block, iucr, primary_type, description, locatio...
## dbl (10): unique_key, beat, district, ward, community_area, x_coordinate, y_...
## lgl (2): arrest, domestic
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
crimedf$date <- as.POSIXct(crimedf$date, format="%Y-%m-%d %H:%M:%OS", tz="UTC")
crimedf_filtered <- filter(crimedf, year(date) >= 2019)
# Clean data: keep only rows with finite latitude and longitude (drops NA, NaN, Inf)
crimedf_filtered <- crimedf_filtered %>%
  filter(is.finite(latitude), is.finite(longitude))
# K-means clustering
set.seed(123) # For reproducibility
coords <- crimedf_filtered %>% select(latitude, longitude)
kmeans_result <- kmeans(coords, centers = 5) # Adjust 'centers' as needed
# Add cluster information to the data
crimedf_filtered$cluster <- kmeans_result$cluster
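`centers = 5` is a fixed choice. The `cluster` package is loaded above but not used; its `silhouette()` function offers one common way to compare candidate values of k, via the average silhouette width (higher is better). A sketch on synthetic, well-separated points rather than crime data:

```r
library(cluster)

set.seed(42)
# Synthetic coordinates: three well-separated blobs (made up, not crime data)
coords <- rbind(
  cbind(rnorm(30, 0, 0.3), rnorm(30, 0, 0.3)),
  cbind(rnorm(30, 5, 0.3), rnorm(30, 5, 0.3)),
  cbind(rnorm(30, 0, 0.3), rnorm(30, 5, 0.3))
)
d <- dist(coords)  # all pairwise Euclidean distances

# Average silhouette width for each candidate k
candidate_k <- 2:6
avg_sil <- sapply(candidate_k, function(k) {
  km <- kmeans(coords, centers = k, nstart = 25)  # several random starts
  mean(silhouette(km$cluster, d)[, "sil_width"])
})

best_k <- candidate_k[which.max(avg_sil)]
print(data.frame(k = candidate_k, avg_silhouette = round(avg_sil, 3)))
print(best_k)
```

`dist()` stores every pairwise distance, which needs gigabytes of memory for tens of thousands of points, so on the real `coords` this would have to run on a random sample of rows (for example via `dplyr::slice_sample()`). `nstart = 25` reruns k-means from several random starts and keeps the best fit, which makes the clusters less sensitive to initialization.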
# Create a leaflet map
map <- leaflet(crimedf_filtered) %>% addTiles()
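# Map each cluster number to a distinct color (leaflet needs color values, not a factor)
pal <- colorFactor(palette = "Set1", domain = crimedf_filtered$cluster)
# Add clustered points to the map, with a legend for the cluster colors
map <- map %>%
  addCircleMarkers(
    lat = ~latitude,
    lng = ~longitude,
    color = ~pal(cluster),
    popup = ~paste("Cluster:", cluster)
  ) %>%
  addLegend(pal = pal, values = ~cluster, title = "Cluster")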
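# Add a heatmap layer showing incident density
map <- map %>% addHeatmap(
  lat = ~latitude,
  lng = ~longitude,
  intensity = ~1, # every incident carries the same weight
  blur = 20,      # amount of blur applied to each point
  max = 0.05,     # maximum point intensity (maps to the hottest color)
  radius = 15     # radius of each heatmap point
)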
# Render the map
map
This map shows the spatial distribution of crimes in Chicago since 2019. K-means clustering groups the incidents into five geographic clusters, which helps us concentrate on specific crime hot zones: each point is colored by its cluster, and the heatmap layer shows where incidents are most concentrated. The number of clusters (5) was chosen by hand and can be adjusted.
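The map colors points by k-means cluster, not by arrests. If arrest patterns are the quantity of interest, one option is to summarize the `arrest` column per cluster. A sketch on made-up rows (`toy_clustered` is hypothetical):

```r
library(dplyr)

# Made-up rows standing in for crimedf_filtered after clustering
toy_clustered <- tibble(
  cluster = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  arrest  = c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE)
)

# Number of incidents and share ending in an arrest, per k-means cluster
arrests_by_cluster <- toy_clustered %>%
  group_by(cluster) %>%
  summarize(incidents = n(),
            arrest_rate = mean(arrest, na.rm = TRUE)) %>%
  arrange(desc(arrest_rate))

print(arrests_by_cluster)
```

Because `arrest` is read as a logical column (see the column specification above), `mean()` returns the share of incidents that ended in an arrest; on the real data, `crimedf_filtered` can be passed in place of `toy_clustered`.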