DSCI 511 Spring 2023 Final Project

Introduction

Chicago is often regarded as one of the most violent cities in the United States. As a resident of the Chicago suburbs, I was curious to investigate the trends and patterns of homicide and non-fatal shootings in the city over the years. To answer my research questions, I utilized the Violence Reduction - Victims of Homicides and Non-Fatal Shootings dataset from the Chicago Data Portal site. In this analysis, I will discuss the data, my preprocessing techniques, filtering methods, programming language choice, and visualizations.

Questions

The main research question I hope to answer is whether the overall rate of homicide and non-fatal shootings in Chicago has increased or decreased since 1991. Additionally, I want to determine if there have been any significant fluctuations in the number of homicides or non-fatal shootings over the years. Furthermore, I want to find out if certain areas of Chicago have experienced more homicides or non-fatal shootings than others. Finally, I want to examine if the frequency of non-fatal shootings has increased since 2010 and if there are certain times of year when homicides or non-fatal shootings are more likely to occur in Chicago.

Data

The Violence Reduction - Victims of Homicides and Non-Fatal Shootings dataset covers a period from 1991 to the present day and includes individual-level homicide and non-fatal shooting victimizations. The data is classified by determining whether the victimization is present in the Chicago Police Department’s homicide data table or shooting data tables. The dataset is refreshed daily with approximately a 48-hour lag.

Preprocessing

To preprocess the data, I parsed the DATE column as a date-time value and derived new columns for the month, time of day, hour, and year. This allowed for easier analysis and visualization of the data across different time frames. Additionally, I checked for and removed any missing or inconsistent data entries to ensure the accuracy of the dataset.
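As a minimal sketch of this step, the following base R snippet parses two made-up timestamps in the same month/day/year 12-hour layout and derives the month, hour, and year columns. The sample rows, the data frame name `sample_df`, and the explicit `tz = "America/Chicago"` are assumptions for illustration; the project code itself relies on the session's default time zone.

```r
# Hypothetical sample rows; the real data comes from murder.csv
sample_df <- data.frame(
  DATE = c("07/04/2020 11:30:00 PM", "01/15/2011 02:05:00 AM"),
  stringsAsFactors = FALSE
)

# Parse the 12-hour timestamps (time zone given explicitly here as an assumption)
sample_df$DATE <- as.POSIXct(sample_df$DATE,
                             format = "%m/%d/%Y %I:%M:%S %p",
                             tz = "America/Chicago")

# Derive character columns for month, hour (24-hour clock), and year
sample_df$MONTH1 <- format(sample_df$DATE, "%m")
sample_df$HOUR1  <- format(sample_df$DATE, "%H")
sample_df$YEAR   <- format(sample_df$DATE, "%Y")

print(sample_df)
```

Because format() returns character strings, the derived columns are text rather than numbers, so a comparison such as YEAR >= "2010" is a string comparison. That works for four-digit years but is worth keeping in mind.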

Filtering

To filter the data, I used a few different methods. One was to keep only incidents classified as “HOMICIDE.” Another was to keep only incidents that occurred from 2010 to the present. For the breakdown by secondary incident type, I also excluded incidents where the INCIDENT_IUCR_SECONDARY column was empty or NA. Additionally, I created a new variable called “repeat_offenders,” which keeps only case numbers that appear more than once in the data, that is, incidents with more than one recorded victim.
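The filters described here can be sketched on a tiny made-up table. This is an illustration in base R rather than the dplyr code used later in the report; the data frame `toy` and its four rows are hypothetical.

```r
# Hypothetical rows using the same column names as the dataset
toy <- data.frame(
  INCIDENT_PRIMARY = c("HOMICIDE", "BATTERY", "HOMICIDE", "BATTERY"),
  YEAR = c("2009", "2010", "2016", "2021"),
  INCIDENT_IUCR_SECONDARY = c("FIRST DEGREE MURDER", "", NA, "AGGRAVATED - HANDGUN"),
  stringsAsFactors = FALSE
)

# Keep only homicides
homicides <- toy[toy$INCIDENT_PRIMARY == "HOMICIDE", ]

# Keep only incidents from 2010 onward (YEAR is stored as text, so convert first)
since_2010 <- toy[as.integer(toy$YEAR) >= 2010, ]

# Drop rows whose secondary incident type is missing or empty
has_secondary <- !is.na(since_2010$INCIDENT_IUCR_SECONDARY) &
  since_2010$INCIDENT_IUCR_SECONDARY != ""
with_secondary <- since_2010[has_secondary, ]

nrow(homicides)
nrow(since_2010)
nrow(with_secondary)
```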

Programming Language

I used R for this project. R is a powerful programming language specifically designed for data analysis and statistics. It has a wide range of libraries and packages that make it easy to manipulate and visualize large datasets. Additionally, R’s syntax is easy to read and understand, making it a great language for both beginners and experienced programmers.

Visualizations

To explore the data, I used various types of visualizations such as bar charts, stacked bar charts, leaflet maps, small multiples, pie charts, and choropleth maps. These visualizations allowed me to identify trends, outliers, and relationships within the data. For example, I created a bar chart to show the total number of homicides and non-fatal shootings over the years. I also used a leaflet map to display the geographical distribution of homicides and non-fatal shootings in Chicago. These visualizations helped me communicate my findings to others in a clear and concise manner.

Conclusion

In conclusion, the analysis of the “Violence Reduction - Victims of Homicides and Non-Fatal Shootings” dataset from the City of Chicago’s data portal site has provided insights into the incidence of homicides and non-fatal shootings in Chicago from 1991 to the present day. Since 2010, the data shows an increase in violent crime in the city, with a peak of 4354 incidents in 2016. There was a sharp increase in the number of incidents from 2019 to 2020, going from 2689 to 4164 incidents, suggesting that efforts to combat violence in the city need to be continued and strengthened. Certain types of violent crimes, such as Aggravated Handgun and First Degree Murder cases, have remained consistently high over the years. The data also shows that certain times, such as summer and weekends, are more likely to see violent crime. Additionally, certain areas of Chicago have experienced more homicides or non-fatal shootings than others. As for the racial distribution of victims in various types of violent crimes, Black individuals are overrepresented in most categories. Overall, the analysis provides valuable information for law enforcement agencies and policymakers to develop strategies and interventions to reduce the incidence of violence in the city. The findings partially answer the questions posed, with the data showing an increase in violent crime since 2010 and certain patterns and trends that could be used to develop effective crime prevention strategies.

# Loading the necessary libraries
library(leaflet)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0     ✔ stringr 1.5.0
## ✔ purrr   1.0.1     ✔ tibble  3.2.1
## ✔ readr   2.1.4     ✔ tidyr   1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(sf)
## Linking to GEOS 3.9.3, GDAL 3.5.2, PROJ 8.2.1; sf_use_s2() is TRUE
library(reshape2)
## 
## Attaching package: 'reshape2'
## 
## The following object is masked from 'package:tidyr':
## 
##     smiths
library(scales)
## 
## Attaching package: 'scales'
## 
## The following object is masked from 'package:purrr':
## 
##     discard
## 
## The following object is masked from 'package:readr':
## 
##     col_factor
library(shape)
library(patchwork)
# Loading the dataset
murder <- read.csv("murder.csv")

Visualization of Chicago Homicides and Non-Fatal Shootings by Year

In this section, we first convert the DATE column to POSIXct format to work with date-time information. Then we extract the month, time, hour, and year from the DATE column into separate columns. Finally, we create a bar chart using the ggplot2 package to visualize the number of homicides and non-fatal shootings in Chicago by year.

# Convert the DATE column to POSIXct (date-time) format
murder$DATE <- as.POSIXct(murder$DATE, format = "%m/%d/%Y %I:%M:%S %p")

# Create new columns for the month, the time of day, and the hour
murder$MONTH1 <- format(murder$DATE, "%m")
murder$TIME <- format(murder$DATE, "%I:%M:%S %p")
murder$HOUR1 <- format(murder$DATE, format = "%H")
# Create a new column for just the year
murder$YEAR <- format(murder$DATE, "%Y")



ggplot(murder, aes(x=YEAR)) +
  geom_bar(width = 0.8, color = "blue") +
  labs(x = "Year", y = "Count", title = "Chicago Homicides and Non-Fatal Shootings by Year") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

The visualization shows the number of homicides and non-fatal shootings in Chicago from 1991 to 2023. We can see a general downward trend in the number of incidents, with a spike in 2010. However, it is important to note that this spike may be due to the fact that data on non-fatal shootings was not collected until 2010. Therefore, the increase in the number of incidents in 2010 may be attributed to this change in data collection rather than a true increase in violence.

Chicago Homicides by Year

Here is a plot showing the number of homicides in Chicago by year, using the data from the murder.csv dataset. This plot only includes incidents classified as homicides.

murder_Homicide <- murder %>% filter(INCIDENT_PRIMARY == "HOMICIDE")

ggplot(murder_Homicide, aes(x=YEAR)) +
  geom_bar(width = 0.8, color = "blue") +
  labs(x = "Year", y = "Count", title = "Chicago Homicides by Year") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

Looking at the graph, we can see that Chicago has experienced a significant number of homicides over the years. The highest number of homicides occurred in 2021, with 1033 homicides, followed by 1992, with 948 homicides. While there have been fluctuations in the number of homicides each year, it is concerning to see that the number has remained high in recent years. This emphasizes the need for continued efforts to address the root causes of violence in the city and to improve public safety measures.

Chicago Homicides and Non-Fatal Shootings from 2010 to Present

murder_2010_present <- murder %>% filter(YEAR >= "2010")

ggplot(murder_2010_present, aes(x=YEAR)) +
  geom_bar(width = 0.8, color = "blue") +
  labs(x = "Year", y = "Count", title = "Chicago Homicides & non-fatal shootings by Year (2010- present)") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

The plot shows the number of homicides and non-fatal shootings in Chicago from 2010 to the present. The data shows an increase in violent crime in the city from 2010 to a peak of 4354 incidents in 2016. It is also worth noting the sharp increase from 2019 to 2020, from 2689 incidents to 4164 incidents (roughly 55%). This suggests that efforts to combat violence in the city need to be continued and strengthened to address this concerning trend.
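To put the 2019 to 2020 change in perspective, the relative increase can be worked out directly from the two counts quoted above. This is a small arithmetic sketch; the variable names are chosen here for illustration.

```r
# Incident counts quoted in the text
count_2019 <- 2689
count_2020 <- 4164

# Relative change from 2019 to 2020, in percent
pct_change <- (count_2020 - count_2019) / count_2019 * 100

round(pct_change, 1)
```

An increase of 1475 incidents on a base of 2689 is a rise of about 55 percent in a single year.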

Chicago Homicides and Non-Fatal Shootings by Year, Broken Down by Secondary Incident

# Keep only rows with a non-missing, non-empty secondary incident type
murder_2010_present_filtered <- murder_2010_present %>%
  filter(!is.na(INCIDENT_IUCR_SECONDARY) & INCIDENT_IUCR_SECONDARY != "")

# Create stacked bar plot
p <- ggplot(murder_2010_present_filtered, aes(x=YEAR, fill=INCIDENT_IUCR_SECONDARY)) +
  geom_bar(width = 0.8, color = "white") +
  labs(x = "Year", y = "Count", title = "Chicago Homicides & non-fatal shootings by Year (2010- present)") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5),
        legend.position = "none")

# Create legend plot
legend <- ggplot(murder_2010_present_filtered, aes(x="", y="", fill=INCIDENT_IUCR_SECONDARY)) +
  geom_bar(stat="identity", width=1, color="white") +
  labs(fill = "Incident IUCR Secondary") +
  theme_void() +
  theme(legend.text=element_text(size=5),
        plot.margin=unit(c(1, 1, 1, 3), "cm"))

# Display the legend plot and the stacked bar plot separately

legend 

p

The number of Aggravated Handgun cases has been consistently high, with slight dips in 2013 and 2019, but otherwise staying above 1,500 cases per year.

It reached 3,233 cases in 2016 and remained high in 2020 with 2,988 cases and in 2021 with 3,265 cases. The numbers for 2022 and 2023 appear significantly lower than in previous years, but it may be too early to draw conclusions about the current year.

First Degree Murder cases have also been consistently high, with 892 cases in 2016 and more than 500 cases every year except 2023 so far. The counts in 2020 and 2021 were particularly high, with 988 and 1014 cases respectively.

Attempt Armed Handgun cases have been relatively low compared to the other categories, with increases in 2016 and 2020 to 71 and 53 cases respectively.

Aggravated Domestic Battery - Handgun cases have been relatively low, with increases in 2015 and 2019 to 15 and 22 cases respectively, and have remained stable since then.

Aggravated Other Firearm cases have been consistently low, with a peak of 46 cases in 2010 and fewer than 30 cases in other years. The number of Armed Handgun cases has been relatively stable, staying below 100 cases per year, with 91 cases in 2016.

Chicago Homicides and non-fatal shootings by month

This visualization is a bar chart showing the count of Chicago homicides and non-fatal shootings by month, from 2010 to present. The x-axis represents the month and the y-axis represents the count.

ggplot(murder_2010_present, aes(x=MONTH)) +
  geom_bar(width = 0.8, color = "blue") +
  labs(x = "MONTH", y = "Count", title = "Chicago Homicides and non-Fatal Shootings by Month") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

murder_count <- murder_2010_present %>%
  count(MONTH)

Looking at the data, it appears that July has the highest count with 5082, followed by June with 4658. The lowest count is in February with 1933. Overall, the data seems to show an increase in counts from the beginning of the year until the middle of the year, with a decrease in the latter half of the year.

Chicago Homicides and Non-Fatal Shootings by Day of the Week

# DATE is already POSIXct at this point, so it can be used directly
murder_2010_present <- murder %>% 
  filter(YEAR >= "2010") %>% 
  mutate(date = DATE,
         day_of_week = factor(weekdays(date), 
                              levels = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")),
         day_of_month = day(date))

ggplot(murder_2010_present, aes(x = day_of_week)) +
  geom_bar(width = 0.8, color = "blue") +
  labs(x = "Day of the week", y = "Count", title = "Chicago Homicides & non-fatal shootings by day of the week (2010- present)") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

The bar graph above shows the count of homicides and non-fatal shootings in Chicago by day of the week from 2010 to the present. The highest number of incidents occurred on Saturdays, followed by Sundays and Fridays, while the lowest number occurred on Mondays. This information could be useful for law enforcement agencies and policymakers to develop strategies and interventions to reduce the incidence of violence on weekends.
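The ordering of the bars depends on the factor levels, not on the order in which days appear in the data. The following is a small base R sketch with four hypothetical dates; it uses the numeric weekday stored by as.POSIXlt() instead of weekdays(), which is an assumption made here to keep the example independent of the system locale.

```r
# Hypothetical incident dates (2020-05-31 and 2020-07-05 fall on a Sunday)
dates <- as.Date(c("2020-05-31", "2020-06-01", "2020-07-04", "2020-07-05"))

day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")

# POSIXlt stores the weekday as 0 (Sunday) to 6 (Saturday), independent of locale
dow <- factor(day_names[as.POSIXlt(dates)$wday + 1], levels = day_names)

# Counts are reported in the order of the factor levels, Sunday first
table(dow)
```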

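Top Days for Homicides and Non-Fatal Shootings in Chicago

# Count incidents per calendar date.
# Note: DATE is POSIXct here, so as.Date() ignores any `format` argument and
# converts using UTC by default.
murder_counts <- murder %>%
  select(DATE) %>%
  mutate(date = as.Date(DATE)) %>%
  count(date, sort = TRUE)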

head(murder_counts, n = 15)
##          date  n
## 1  2020-05-31 65
## 2  2020-07-05 53
## 3  2020-06-22 48
## 4  2021-07-05 48
## 5  2020-06-01 41
## 6  2017-08-20 40
## 7  2021-08-08 40
## 8  2015-07-05 39
## 9  2020-06-20 38
## 10 2017-06-11 37
## 11 2019-06-01 36
## 12 2018-08-05 35
## 13 2020-06-28 35
## 14 2021-08-15 35
## 15 2022-07-24 35
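One caveat applies when reading this table. The DATE column is a POSIXct date-time, and as.Date() on a POSIXct value converts using UTC unless a time zone is supplied, so a late-evening incident can be counted under the following calendar date. The sketch below illustrates the effect with a single made-up timestamp; the `America/Chicago` time zone is an assumption, since the report does not state which time zone the session used.

```r
# Hypothetical late-evening timestamp on July 4 (Chicago local time)
x <- as.POSIXct("2020-07-04 23:30:00", tz = "America/Chicago")

# Default conversion uses UTC, where it is already July 5
date_utc <- as.Date(x)

# Supplying the time zone, or formatting first, keeps the local calendar date
date_local <- as.Date(x, tz = "America/Chicago")
date_from_format <- as.Date(format(x, "%Y-%m-%d"))

date_utc
date_local
date_from_format
```

If the local calendar date is what matters, either of the last two forms is a possible alternative; the counts in the table above were produced with the default conversion.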

Possible Reasons for the High-Count Days

Curious about what was happening around these dates, I searched online. Here are some possible explanations for the ten highest-count days.

  1. May 31, 2020: 65 shooting victims - On this day, there were widespread protests and unrest in many cities across the United States, including Chicago, following the death of George Floyd. The Chicago police reported a surge in violence that weekend, with more than 100 people shot, and 18 killed in the city.

  2. July 5, 2020: 53 shooting victims - The Fourth of July holiday weekend in 2020 was marked by a spike in violence across the city, with more than 100 people shot, and 14 killed in Chicago.

  3. June 22, 2020: 48 shooting victims - The weekend of June 19-22, 2020 saw a sharp increase in gun violence in Chicago, with more than 100 people shot, and 14 killed. This surge in violence was attributed to gang conflicts and disputes over drug territory.

  4. July 5, 2021: 48 shooting victims - The Fourth of July weekend in 2021 also saw a spike in violence, with more than 100 people shot, and 19 killed in Chicago.

  5. June 1, 2020: 41 shooting victims - This was the day after the first large protests in Chicago in response to the death of George Floyd. There were reports of looting and violence in some areas of the city.

  6. August 20, 2017: 40 shooting victims - This was one of the bloodiest weekends in Chicago in recent years, with more than 60 people shot, and 12 killed. The violence was attributed to gang activity and drug-related disputes.

  7. August 8, 2021: 40 shooting victims - This was another violent weekend in Chicago, with more than 70 people shot, and 11 killed. The spike in violence was attributed to disputes between rival gangs.

  8. July 5, 2015: 39 shooting victims - The Fourth of July weekend in 2015 was also marked by a surge in gun violence, with more than 60 people shot, and 10 killed in Chicago.

  9. June 20, 2020: 38 shooting victims - This was another violent weekend in Chicago, with more than 100 people shot, and 14 killed. The violence was attributed to gang conflicts and disputes over drug territory.

  10. June 11, 2017: 37 shooting victims - This was one of the deadliest weekends in Chicago in recent years, with more than 50 people shot, and 8 killed. The violence was attributed to gang activity and drug-related disputes.

Chicago Homicides and Non-Fatal Shootings by Hour

ggplot(murder_2010_present, aes(x=HOUR1)) +
  geom_bar(width = 0.8, color = "blue") +
  labs(x = "HOUR", y = "Count", title = "Chicago Homicides and non-Fatal Shootings by Hour") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

The visualization shows the count of homicides and non-fatal shootings by hour in Chicago from 2010 to the present. The highest number of incidents occurs between 7 PM and 11 PM. The count starts to decrease after midnight and reaches its lowest point in the early morning hours, between 5 AM and 8 AM.
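The bars appear in chronological order because HOUR1 is built with the 24-hour "%H" format, and text labels on a discrete axis are sorted alphabetically. A short sketch with three made-up timestamps shows why a 12-hour label would not sort the same way; the vector names are chosen here for illustration.

```r
# Hypothetical timestamps at 1 PM, 1 AM, and 11 AM
times <- as.POSIXct(c("2020-01-01 13:00:00",
                      "2020-01-01 01:00:00",
                      "2020-01-01 11:00:00"), tz = "UTC")

label_12h <- format(times, "%I:%M %p")  # 12-hour clock with AM/PM
label_24h <- format(times, "%H:%M")     # 24-hour clock

# Alphabetical order of 24-hour labels matches chronological order
sort(label_24h)

# Alphabetical order of 12-hour labels does not: 1 PM sorts before 11 AM
sort(label_12h)
```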

Chicago Homicides and Non-Fatal Shootings by Season

murder_2010_present$MONTH_DAY <- month(murder_2010_present$DATE) * 100 + day(murder_2010_present$DATE)
murder_2010_present$SEASON <- cut(murder_2010_present$MONTH_DAY,
                                  breaks = c(0, 321, 621, 922, 1222, 1231), 
                                  labels = c("winter", "spring", "summer", "fall", "winter"))
# Group by season and month
count_season <- murder_2010_present %>%
  group_by(SEASON, MONTH) %>%
  summarise(count = n()) %>%
  arrange(SEASON, MONTH)
## `summarise()` has grouped output by 'SEASON'. You can override using the
## `.groups` argument.
ggplot(murder_2010_present, aes(x=SEASON)) +
  geom_bar(width = 0.8, color = "blue") +
  labs(x = "SEASON", y = "Count", title = "Chicago Homicides and non-Fatal \n Shootings by Season") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

The plot displays the count of Chicago homicides and non-fatal shootings by season, using data from 2010 to the present. The x-axis represents the four seasons (winter, spring, summer, and fall) and the y-axis shows the count of incidents. The bar chart shows that the number of incidents is highest in the summer, followed by spring, fall, and winter. This trend could potentially be explained by various factors, such as weather, social activities, and increases in gang activity during the summer months. Overall, this plot provides insight into the seasonal patterns of violent crime in Chicago and could potentially be useful for informing crime prevention strategies.
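The season assignment used for this plot can be traced on a few sample values. In the sketch below the month-day keys are hypothetical, while the breaks and labels are the ones used in the code above. Because cut() builds right-closed intervals by default, March 21 (key 321) still falls in winter, and the repeated "winter" label merges the start and end of the year into a single level.

```r
# Hypothetical month-day keys: Jan 1, Mar 21, Mar 22, Jul 4, Oct 1, Dec 25
month_day <- c(101, 321, 322, 704, 1001, 1225)

# Same breaks and labels as in the report; intervals are (lower, upper]
season <- cut(month_day,
              breaks = c(0, 321, 621, 922, 1222, 1231),
              labels = c("winter", "spring", "summer", "fall", "winter"))

data.frame(month_day, season)
```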

Racial Distribution by Injury Type

ggplot(murder_2010_present, aes(x="", fill=RACE)) +
  geom_bar(width=1, position="fill") +
  facet_wrap(~INCIDENT_PRIMARY, ncol=3) +
  coord_polar(theta="y") +
  theme_void() +
  labs(title="Racial distribution by injury type")

The pie charts present the racial distribution of victims in various types of violent crimes in Chicago from 2010 to the present. Each pie chart represents a different type of violent crime, and the slices of the pie represent the proportion of victims of each race.

Looking at the Battery category, we observe that the majority of victims are Black, comprising almost 75% of the total. White Hispanics come next at approximately 16%, while all other races fall below 5%.

In the Criminal Sexual Assault category, only one victim was reported, and that victim was Black.

In the Homicide category, Black victims make up the majority, accounting for almost 80%, followed by White Hispanics at approximately 15%. All other races fall below 5%.

In the Non-Fatal category, only two victims were reported, and both were Black.

Lastly, in the Robbery category, Black victims make up the majority at around 75%, followed by White Hispanics at around 15%. All other races fall below 5%.
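The percentages behind each pie are row-wise proportions: within each incident type, the share of victims in each racial group. A minimal base R sketch of that calculation follows; the six rows and the race codes in `toy` are made up for illustration and do not reproduce the figures above.

```r
# Hypothetical victim records
toy <- data.frame(
  INCIDENT_PRIMARY = c("HOMICIDE", "HOMICIDE", "HOMICIDE", "HOMICIDE",
                       "BATTERY", "BATTERY"),
  RACE = c("BLK", "BLK", "BLK", "WWH", "BLK", "WWH"),
  stringsAsFactors = FALSE
)

# Cross-tabulate, then convert each row (incident type) to proportions
counts <- table(toy$INCIDENT_PRIMARY, toy$RACE)
shares <- prop.table(counts, margin = 1)

round(shares * 100, 1)
```

Note that a category with very few records, such as one with a single victim, will always show 100 percent for one group, so small categories should be read with caution.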

Homicides and Non-Fatal Shootings by Year and Race

ggplot(murder_2010_present, aes(x=YEAR, fill=RACE)) +
  geom_bar(width = 0.8, color = "blue") +
  labs(x = "Year", y = "Count", title = "Homicides and non-fatal shootings \n from 2010 to 2023") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

Looking at the data, we can see that there have been fluctuations in the number of homicides and non-fatal shootings in Chicago from 2010 to 2023, with Black individuals being the most represented racial group in these categories. In 2010, there were 1,912 reported incidents involving Black victims, which increased over the years, peaking in 2016 with 3,252 reported incidents. However, from 2016 onwards, there appears to be a decline in the number of incidents involving Black victims.

On the other hand, the number of incidents involving White victims, whether White Hispanic or White non-Hispanic, has remained relatively consistent over the years, with some slight fluctuations. For example, in 2012, there was a peak in the number of incidents involving White victims, with 71 reported incidents. However, the number of incidents involving White victims is much lower than the number involving Black victims.

It is also important to note that there are some categories where there are only a small number of incidents reported for certain racial groups, such as the Asian/Pacific Islander category or the “I” category, which stands for “American Indian/Alaskan Native”. Therefore, we should be cautious in making conclusions about the trends for these groups based on this limited data.

ggplot(murder_2010_present, aes(x=YEAR, fill=INCIDENT_PRIMARY )) +
  geom_bar(width = 0.8, color = "blue") +
  labs(x = "Year", y = "Count", title = "Homicides and non-fatal shootings \n from 2010 to 2023") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

The plot shows the count of homicide and non-fatal shooting incidents from the year 2010 to the present year (2023). The plot uses stacked bars, where each bar represents a year, and the height of each bar represents the total count of incidents in that year. The bars are divided into different segments, with each segment representing a specific type of incident. The x-axis shows the year, and the y-axis shows the count of incidents.

Looking at the plot, we can see that the count of homicide incidents has fluctuated over the years, with some years showing an increase in the count, and others showing a decrease. For example, the count of homicide incidents was 515 in the year 2010, increased to a peak of 994 in the year 2020, and then decreased to 187 in the current year (2023).

Similarly, the count of non-fatal shooting incidents has also fluctuated over the years, but the overall trend seems to be increasing. We can see that there were only two non-fatal shooting incidents in the year 2011, but this count increased to 901 in the year 2016, which is the highest count in the plot. Since then, the count has decreased slightly but still remains higher than the earlier years.

murder_2010_present$AREA <- as.factor(murder_2010_present$AREA)
ggplot(murder_2010_present, aes(x=YEAR, fill=AREA)) +
  geom_bar(position="stack", color="blue") +
  labs(x="Year", y="Count", title="Homicides and non-fatal shootings \n from 2010 to 2023 by Area") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

count_DISTRICT <- murder_2010_present %>%
  group_by(YEAR, DISTRICT) %>%
  summarise(count = n()) %>%
  arrange(YEAR, desc(count))
## `summarise()` has grouped output by 'YEAR'. You can override using the
## `.groups` argument.
print(count_DISTRICT)
## # A tibble: 309 × 3
## # Groups:   YEAR [14]
##    YEAR  DISTRICT count
##    <chr>    <int> <int>
##  1 2010         7   290
##  2 2010        11   246
##  3 2010         3   242
##  4 2010         8   236
##  5 2010         5   212
##  6 2010         6   206
##  7 2010        25   205
##  8 2010         4   201
##  9 2010        10   200
## 10 2010         9   184
## # ℹ 299 more rows
# Copy the 2010-present data for mapping
murder_data <- murder_2010_present
# Convert coordinates to numeric format
murder_data$Longitude <- as.numeric(murder_2010_present$LONGITUDE)
murder_data$Latitude <- as.numeric(murder_2010_present$LATITUDE)

# Create a color palette (defined here, but not used by addMarkers() below)
pal <- colorFactor(
  palette = c("red", "green", "blue", "purple", "orange"), 
  domain = murder_data$INCIDENT_PRIMARY
)

# Create the leaflet map with clustered markers; the popup shows the incident type
m <- leaflet(murder_data) %>% 
  addTiles() %>% 
  addMarkers(lng = ~Longitude, lat = ~Latitude, 
             popup = ~as.character(INCIDENT_PRIMARY),
             clusterOptions = markerClusterOptions())
## Warning in validateCoords(lng, lat, funcName): Data contains 1 rows with either
## missing or invalid lat/lon values and will be ignored
# Display the map
m

This visualization shows a map of the locations where homicides and non-fatal shootings occurred between 2010 and the present. The map is created using leaflet, a popular package for interactive web mapping in R. Each marker on the map represents one victimization record. Although a color palette keyed to the incident type is defined in the code, the markers drawn by addMarkers() use the default icon, so the incident type is shown in the pop-up rather than by color.

The map allows the user to zoom in and out and pan around to explore the different areas where incidents occurred. When the user clicks a marker, a pop-up appears with the incident type. The markers also cluster together to improve the readability of the map, particularly in areas where there are many incidents close together.

Map of Chicago Homicides and Non-Fatal Shootings from 2010 to 2023

# Read in the shapefile
chicago_shapefile <- st_read("geo_export_efbb67bd-8cda-48e8-bd76-453248b0863c.shp")
## Reading layer `geo_export_efbb67bd-8cda-48e8-bd76-453248b0863c' from data source `C:\Users\kelse\OneDrive - Saint Mary's College\DSCI511\Final Project\geo_export_efbb67bd-8cda-48e8-bd76-453248b0863c.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 77 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -87.94011 ymin: 41.64454 xmax: -87.52414 ymax: 42.02304
## Geodetic CRS:  WGS84(DD)
# Count the number of incidents per community area for each year
murders_by_year_and_community <- murder_2010_present %>% 
  group_by(YEAR, COMMUNITY_AREA) %>% 
  summarize(num_incidents = n()) %>% 
  ungroup()
## `summarise()` has grouped output by 'YEAR'. You can override using the
## `.groups` argument.
# Join the murder data to the shapefile
chicago_shapefile_with_data <- chicago_shapefile %>% 
  left_join(murders_by_year_and_community, by = c("community" = "COMMUNITY_AREA"))

# Calculate the breaks and labels based on the range of the data
data_range <- range(chicago_shapefile_with_data$num_incidents, na.rm = TRUE)
# Round the breaks so the legend labels are whole numbers
breaks <- round(seq(data_range[1], data_range[2], length.out = 10))
labels <- prettyNum(breaks, big.mark = ",")

# Create a ggplot object with small multiples for each year
ggplot(chicago_shapefile_with_data) +
  geom_sf(aes(fill = num_incidents)) +
  scale_fill_gradient(name = "Number of Incidents", 
                      low = "#FFFFCC", high = "#800026", 
                      breaks = breaks, labels = labels) +
  facet_wrap(~YEAR, ncol = 4) +
  labs(title = "Chicago Homicides and Non-Fatal Shootings 2010 - Present", x = "Longitude", y = "Latitude") +
  theme_void()

This visualization shows the number of homicides and non-fatal shootings per community area in Chicago from 2010 to present. It uses a choropleth map, where each community area is shaded based on the number of incidents reported in that area. The map is split into small multiples, with each year shown in a separate panel.

The darker shades of red indicate a higher number of incidents, while the lighter shades indicate a lower number. The legend on the right shows the corresponding number of incidents for each shade of red.

This visualization allows us to quickly identify areas of Chicago that have higher numbers of incidents and how those numbers have changed over the years. We can also see that some community areas consistently have higher numbers of incidents than others.
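The legend values come from ten evenly spaced breaks between the smallest and largest per-area counts. A short sketch of that calculation is shown below with made-up counts; rounding the breaks is an addition made here so that the legend labels are whole numbers, since evenly spaced values between two integers are usually fractional.

```r
# Hypothetical per-area incident counts, including one area with no data
num_incidents <- c(1, 12, 40, 150, NA)

# Range of the observed counts, ignoring missing values
data_range <- range(num_incidents, na.rm = TRUE)

# Ten evenly spaced breaks, rounded to whole numbers for readable labels
breaks <- round(seq(data_range[1], data_range[2], length.out = 10))
labels <- prettyNum(breaks, big.mark = ",")

breaks
labels
```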

Here are some observations:

  1. The neighborhood of Austin has consistently had the highest number of incidents, and its count in 2016 was significantly higher than in other years.

  2. The neighborhoods of West Englewood, Humboldt Park, Roseland, and Englewood have consistently been among the neighborhoods with the highest number of incidents.

  3. The neighborhoods of North Lawndale and Greater Grand Crossing have shown an increase in recent years, with both neighborhoods recording a higher number of incidents in 2021 than in any other year in the dataset.

  4. Overall, the neighborhoods with the highest number of incidents have remained consistent over the years, indicating the persistence of crime in certain areas of the city.

Proportion of Victims in Incidents with Multiple Victims

# Each row is one victim, so a CASE_NUMBER that appears more than once is an
# incident with multiple victims. Count victims in multi-victim and single-victim cases.

repeat_offenders <- murder %>% 
  group_by(CASE_NUMBER) %>% 
  summarise(num_occurrences = n()) %>% 
  filter(num_occurrences > 1)


num_repeat_offenders <- sum(murder$CASE_NUMBER %in% repeat_offenders$CASE_NUMBER)
num_single_offenders <- nrow(murder) - num_repeat_offenders

# Calculate the percentage of repeat offenders and single offenders
percent_repeat_offenders <- num_repeat_offenders / nrow(murder) * 100
percent_single_offenders <- num_single_offenders / nrow(murder) * 100

# Create a data frame for the pie chart
pie_data <- data.frame(
  category = c("Victim in a multi-victim incident", "Victim in a single-victim incident"),
  percent = c(percent_repeat_offenders, percent_single_offenders)
)

# Create the pie chart with percentages displayed in labels
ggplot(pie_data, aes(x="", y=percent, fill=category)) + 
  geom_bar(width = 1, stat = "identity", color = "white") +
  coord_polar("y", start=0) +
  ggtitle("Proportion of Victims\nInvolved in \nMulti-Victim Murders/Shootings") +
  scale_fill_brewer(palette = "Set2") +
  labs(fill = "Category") +
  theme_void() +
  theme(
    legend.position = c(0.85, 0.15),
    legend.background = element_blank(),
    legend.text = element_text(size = 12),
    plot.title = element_text(size = 14, face = "bold")
  ) +
  geom_text(aes(label = paste0(round(percent), "%")), position = position_stack(vjust = 0.5))
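
The counting logic behind this pie chart can be checked on a small made-up example. This is only a sketch in base R: the case numbers are hypothetical, and it uses table() in place of the dplyr pipeline. Since each row of the dataset is one victim, a case number that appears more than once marks an incident with several victims.

```r
# Hypothetical victim records: case A1 has 2 victims, C3 has 3, B2 and D4 have 1
toy <- data.frame(CASE_NUMBER = c("A1", "A1", "B2", "C3", "C3", "C3", "D4"),
                  stringsAsFactors = FALSE)

# Number of rows (victims) per case
case_counts <- table(toy$CASE_NUMBER)

# Case numbers that appear more than once
multi_cases <- names(case_counts[case_counts > 1])

# Share of all victim rows that belong to a multi-victim case
in_multi <- toy$CASE_NUMBER %in% multi_cases
pct_multi <- mean(in_multi) * 100

multi_cases
round(pct_multi, 1)
```

Note that the percentage is a share of victim rows, not a share of cases: here two of the four cases have multiple victims, but five of the seven rows belong to them.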

# Create a 24-hour time-of-day label from DATE so the x-axis sorts chronologically
# (TIME is a character column, so format() cannot re-parse it)
murder_2010_present$TimeOfDay <- format(murder_2010_present$DATE, format = "%H:%M")

# plot the data
ggplot(murder_2010_present, aes(x = TimeOfDay, group = INCIDENT_PRIMARY, color = INCIDENT_PRIMARY)) +
  geom_line(stat = "count") +
  labs(x = "Time", y = "Count of Incidents") +
  scale_color_manual(values = c("BATTERY" = "red", 
                                "HOMICIDE" = "green", 
                                "NON-FATAL" = "purple", 
                                "CRIMINAL SEXUAL ASSAULT" = "blue", 
                                "ROBBERY" = "yellow"))