To guide the creation of my visualizations, I narrowed my focus to five research questions: the first briefly explores crime in Chicago overall, and the remaining four analyse domestic crime specifically.
Research Question 1: What are the most common crimes in Chicago, and when do they occur?
Research Question 2: How many domestic incidents were recorded per month in 2021, in which months are they most frequent, and what types of crime are they?
Research Question 3: At what time of day do domestic incidents occur most frequently?
Research Question 4: What share of domestic battery incidents end in arrest?
Research Question 5: Which areas of Chicago have the highest concentration of domestic battery incidents?
The data exploration below helped me develop these research questions.
# Column Names
colnames(dfcc)
## [1] "ID" "Case Number" "Date"
## [4] "Block" "IUCR" "Primary Type"
## [7] "Description" "Location Description" "Arrest"
## [10] "Domestic" "Beat" "District"
## [13] "Ward" "Community Area" "FBI Code"
## [16] "X Coordinate" "Y Coordinate" "Year"
## [19] "Updated On" "Latitude" "Longitude"
## [22] "Location"
# Data Preview
head(dfcc)
## ID Case Number Date Block IUCR
## 1: 12260346 JE102126 01/03/2021 01:23:00 PM 070XX S EGGLESTON AVE 0486
## 2: 12263464 JE105797 01/03/2021 06:59:00 AM 080XX S YALE AVE 0820
## 3: 12259990 JE101773 01/03/2021 12:20:00 AM 056XX W WASHINGTON BLVD 0486
## 4: 12260669 JE102509 01/03/2021 08:47:00 PM 057XX S RACINE AVE 2022
## 5: 25702 JE102438 01/03/2021 08:09:00 PM 068XX S STONY ISLAND AVE 0110
## 6: 12260241 JE101923 01/03/2021 08:54:00 AM 106XX S YATES AVE 0560
## Primary Type Description Location Description Arrest Domestic
## 1: BATTERY DOMESTIC BATTERY SIMPLE APARTMENT FALSE TRUE
## 2: THEFT $500 AND UNDER RESIDENCE FALSE FALSE
## 3: BATTERY DOMESTIC BATTERY SIMPLE APARTMENT FALSE TRUE
## 4: NARCOTICS POSSESS - COCAINE STREET TRUE FALSE
## 5: HOMICIDE FIRST DEGREE MURDER STREET FALSE FALSE
## 6: ASSAULT SIMPLE CHA APARTMENT FALSE FALSE
## Beat District Ward Community Area FBI Code X Coordinate Y Coordinate Year
## 1: 732 7 6 68 08B 1174496 1858251 2021
## 2: 623 6 17 44 06 1176011 1851718 2021
## 3: 1513 15 29 25 08B 1138722 1900183 2021
## 4: 713 7 16 67 18 1169298 1866822 2021
## 5: 332 3 5 43 01A 1188038 1860051 2021
## 6: 434 4 7 51 08A 1194343 1834995 2021
## Updated On Latitude Longitude Location
## 1: 01/16/2021 03:49:23 PM 41.76644 -87.63596 (41.766435144, -87.635963997)
## 2: 01/16/2021 03:49:23 PM 41.74847 -87.63061 (41.748473982, -87.630606588)
## 3: 01/16/2021 03:49:23 PM 41.88222 -87.76608 (41.88222427, -87.766076162)
## 4: 01/16/2021 03:49:23 PM 41.79007 -87.65477 (41.79006908, -87.654768679)
## 5: 01/10/2021 03:51:53 PM 41.77106 -87.58627 (41.771062488, -87.586270811)
## 6: 01/16/2021 03:49:23 PM 41.70215 -87.56398 (41.702154047, -87.563980453)
# Data Types
str(dfcc)
## Classes 'data.table' and 'data.frame': 203527 obs. of 22 variables:
## $ ID : int 12260346 12263464 12259990 12260669 25702 12260241 12260534 12260693 12260810 12262250 ...
## $ Case Number : chr "JE102126" "JE105797" "JE101773" "JE102509" ...
## $ Date : chr "01/03/2021 01:23:00 PM" "01/03/2021 06:59:00 AM" "01/03/2021 12:20:00 AM" "01/03/2021 08:47:00 PM" ...
## $ Block : chr "070XX S EGGLESTON AVE" "080XX S YALE AVE" "056XX W WASHINGTON BLVD" "057XX S RACINE AVE" ...
## $ IUCR : chr "0486" "0820" "0486" "2022" ...
## $ Primary Type : chr "BATTERY" "THEFT" "BATTERY" "NARCOTICS" ...
## $ Description : chr "DOMESTIC BATTERY SIMPLE" "$500 AND UNDER" "DOMESTIC BATTERY SIMPLE" "POSSESS - COCAINE" ...
## $ Location Description: chr "APARTMENT" "RESIDENCE" "APARTMENT" "STREET" ...
## $ Arrest : logi FALSE FALSE FALSE TRUE FALSE FALSE ...
## $ Domestic : logi TRUE FALSE TRUE FALSE FALSE FALSE ...
## $ Beat : int 732 623 1513 713 332 434 1231 332 2433 235 ...
## $ District : int 7 6 15 7 3 4 12 3 24 2 ...
## $ Ward : int 6 17 29 16 5 7 28 5 48 5 ...
## $ Community Area : int 68 44 25 67 43 51 28 43 77 41 ...
## $ FBI Code : chr "08B" "06" "08B" "18" ...
## $ X Coordinate : int 1174496 1176011 1138722 1169298 1188038 1194343 1167677 1188317 1165620 NA ...
## $ Y Coordinate : int 1858251 1851718 1900183 1866822 1860051 1834995 1895707 1859560 1941643 NA ...
## $ Year : int 2021 2021 2021 2021 2021 2021 2021 2021 2021 2021 ...
## $ Updated On : chr "01/16/2021 03:49:23 PM" "01/16/2021 03:49:23 PM" "01/16/2021 03:49:23 PM" "01/16/2021 03:49:23 PM" ...
## $ Latitude : num 41.8 41.7 41.9 41.8 41.8 ...
## $ Longitude : num -87.6 -87.6 -87.8 -87.7 -87.6 ...
## $ Location : chr "(41.766435144, -87.635963997)" "(41.748473982, -87.630606588)" "(41.88222427, -87.766076162)" "(41.79006908, -87.654768679)" ...
## - attr(*, ".internal.selfref")=<externalptr>
# Data set summary
summary(dfcc)
## ID Case Number Date Block
## Min. : 25699 Length:203527 Length:203527 Length:203527
## 1st Qu.:12341290 Class :character Class :character Class :character
## Median :12422488 Mode :character Mode :character Mode :character
## Mean :12373419
## 3rd Qu.:12502208
## Max. :12585007
##
## IUCR Primary Type Description Location Description
## Length:203527 Length:203527 Length:203527 Length:203527
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## Arrest Domestic Beat District Ward
## Mode :logical Mode :logical Min. : 111 Min. : 1.00 Min. : 1.00
## FALSE:179467 FALSE:159186 1st Qu.: 611 1st Qu.: 6.00 1st Qu.: 9.00
## TRUE :24060 TRUE :44341 Median :1024 Median :10.00 Median :24.00
## Mean :1150 Mean :11.27 Mean :23.12
## 3rd Qu.:1723 3rd Qu.:17.00 3rd Qu.:34.00
## Max. :2535 Max. :31.00 Max. :50.00
## NA's :11
## Community Area FBI Code X Coordinate Y Coordinate
## Min. : 1.00 Length:203527 Min. :1091242 Min. :1813909
## 1st Qu.:23.00 Class :character 1st Qu.:1153356 1st Qu.:1858123
## Median :32.00 Mode :character Median :1166965 Median :1891198
## Mean :37.15 Mean :1165109 Mean :1885758
## 3rd Qu.:55.00 3rd Qu.:1176822 3rd Qu.:1909222
## Max. :77.00 Max. :1205119 Max. :1951499
## NA's :1448 NA's :1448
## Year Updated On Latitude Longitude
## Min. :2021 Length:203527 Min. :41.64 Min. :-87.94
## 1st Qu.:2021 Class :character 1st Qu.:41.77 1st Qu.:-87.71
## Median :2021 Mode :character Median :41.86 Median :-87.66
## Mean :2021 Mean :41.84 Mean :-87.67
## 3rd Qu.:2021 3rd Qu.:41.91 3rd Qu.:-87.63
## Max. :2021 Max. :42.02 Max. :-87.52
## NA's :1448 NA's :1448
## Location
## Length:203527
## Class :character
## Mode :character
##
##
##
##
# Different Locations of Crime
unique(dfcc$"Location Description")
## [1] "APARTMENT"
## [2] "RESIDENCE"
## [3] "STREET"
## [4] "CHA APARTMENT"
## [5] "RESIDENCE - GARAGE"
## [6] "SMALL RETAIL STORE"
## [7] "BARBERSHOP"
## [8] "CAR WASH"
## [9] "CONVENIENCE STORE"
## [10] "HOTEL / MOTEL"
## [11] "AUTO"
## [12] "VEHICLE NON-COMMERCIAL"
## [13] "RESIDENCE - YARD (FRONT / BACK)"
## [14] "GAS STATION"
## [15] "ATM (AUTOMATIC TELLER MACHINE)"
## [16] "COMMERCIAL / BUSINESS OFFICE"
## [17] "DEPARTMENT STORE"
## [18] "PARK PROPERTY"
## [19] "GROCERY FOOD STORE"
## [20] NA
## [21] "CTA TRAIN"
## [22] "RESIDENCE - PORCH / HALLWAY"
## [23] "PARKING LOT / GARAGE (NON RESIDENTIAL)"
## [24] "RESTAURANT"
## [25] "SIDEWALK"
## [26] "TAVERN / LIQUOR STORE"
## [27] "POLICE FACILITY / VEHICLE PARKING LOT"
## [28] "ALLEY"
## [29] "OTHER (SPECIFY)"
## [30] "DRUG STORE"
## [31] "FACTORY / MANUFACTURING BUILDING"
## [32] "CONSTRUCTION SITE"
## [33] "VEHICLE - OTHER RIDE SHARE SERVICE (LYFT, UBER, ETC.)"
## [34] "CTA PLATFORM"
## [35] "AIRPORT TERMINAL UPPER LEVEL - NON-SECURE AREA"
## [36] "APPLIANCE STORE"
## [37] "KENNEL"
## [38] "CTA BUS STOP"
## [39] "AIRPORT TERMINAL LOWER LEVEL - NON-SECURE AREA"
## [40] "BAR OR TAVERN"
## [41] "GOVERNMENT BUILDING / PROPERTY"
## [42] "ATHLETIC CLUB"
## [43] "HOSPITAL BUILDING / GROUNDS"
## [44] "DRIVEWAY - RESIDENTIAL"
## [45] "CHURCH / SYNAGOGUE / PLACE OF WORSHIP"
## [46] "FEDERAL BUILDING"
## [47] "NURSING / RETIREMENT HOME"
## [48] "VEHICLE - COMMERCIAL"
## [49] "CTA BUS"
## [50] "CTA PARKING LOT / GARAGE / OTHER PROPERTY"
## [51] "LIBRARY"
## [52] "MEDICAL / DENTAL OFFICE"
## [53] "CTA STATION"
## [54] "WAREHOUSE"
## [55] "VACANT LOT / LAND"
## [56] "FOREST PRESERVE"
## [57] "AIRPORT TERMINAL UPPER LEVEL - SECURE AREA"
## [58] "VEHICLE - DELIVERY TRUCK"
## [59] "COLLEGE / UNIVERSITY - GROUNDS"
## [60] "BANK"
## [61] "SCHOOL - PUBLIC BUILDING"
## [62] "OTHER RAILROAD PROPERTY / TRAIN DEPOT"
## [63] "CHA PARKING LOT / GROUNDS"
## [64] "AIRPORT BUILDING NON-TERMINAL - NON-SECURE AREA"
## [65] "DAY CARE CENTER"
## [66] "CURRENCY EXCHANGE"
## [67] "CLEANING STORE"
## [68] "AUTO / BOAT / RV DEALERSHIP"
## [69] "SCHOOL - PUBLIC GROUNDS"
## [70] "JAIL / LOCK-UP FACILITY"
## [71] "FIRE STATION"
## [72] "AIRPORT EXTERIOR - NON-SECURE AREA"
## [73] "AIRPORT TERMINAL LOWER LEVEL - SECURE AREA"
## [74] "AIRPORT EXTERIOR - SECURE AREA"
## [75] "HIGHWAY / EXPRESSWAY"
## [76] "AIRPORT BUILDING NON-TERMINAL - SECURE AREA"
## [77] "ABANDONED BUILDING"
## [78] "VEHICLE - COMMERCIAL: ENTERTAINMENT / PARTY BUS"
## [79] "AIRPORT PARKING LOT"
## [80] "AIRCRAFT"
## [81] "CHA HALLWAY / STAIRWELL / ELEVATOR"
## [82] "SCHOOL - PRIVATE GROUNDS"
## [83] "SCHOOL - PRIVATE BUILDING"
## [84] "AIRPORT VENDING ESTABLISHMENT"
## [85] "PAWN SHOP"
## [86] "CEMETARY"
## [87] "CHA ELEVATOR"
## [88] "TAXICAB"
## [89] "OTHER COMMERCIAL TRANSPORTATION"
## [90] "NEWSSTAND"
## [91] "CTA TRACKS - RIGHT OF WAY"
## [92] "SPORTS ARENA / STADIUM"
## [93] "VACANT LOT"
## [94] "YARD"
## [95] "BOWLING ALLEY"
## [96] "DRIVEWAY"
## [97] "COIN OPERATED MACHINE"
## [98] "HOUSE"
## [99] "ELEVATOR"
## [100] "MOVIE HOUSE / THEATER"
## [101] "BARBER SHOP/BEAUTY SALON"
## [102] "BRIDGE"
## [103] "GAS STATION DRIVE/PROP."
## [104] "PARKING LOT"
## [105] "GARAGE"
## [106] "AIRPORT TRANSPORTATION SYSTEM (ATS)"
## [107] "POOL ROOM"
## [108] "COLLEGE / UNIVERSITY - RESIDENCE HALL"
## [109] "PORCH"
## [110] "BOAT / WATERCRAFT"
## [111] "LAKEFRONT / WATERFRONT / RIVERBANK"
## [112] "CREDIT UNION"
## [113] "ANIMAL HOSPITAL"
## [114] "AIRPORT/AIRCRAFT"
## [115] "VEHICLE - COMMERCIAL: TROLLEY BUS"
## [116] "VESTIBULE"
## [117] "RETAIL STORE"
## [118] "LIQUOR STORE"
## [119] "CHA PARKING LOT"
## [120] "AIRPORT TERMINAL MEZZANINE - NON-SECURE AREA"
## [121] "HOSPITAL"
## [122] "CTA \"\"L\"\" TRAIN"
## [123] "HALLWAY"
## [124] "GANGWAY"
## [125] "MOTEL"
## [126] "CLUB"
## [127] "SCHOOL YARD"
# Number of Community Areas
length(unique(dfcc$"Community Area"))
## [1] 77
# Number of Locations
length(unique(dfcc$"Location"))
## [1] 106325
# Number of Blocks
length(unique(dfcc$"Block"))
## [1] 26849
# Number of Districts
length(unique(dfcc$"District"))
## [1] 23
The first visualization is a set of bar charts, one for each of the ten most common crime types in 2021, showing the number of incidents per month. Theft, battery, criminal damage, assault, and deceptive practice are the most frequent crime types, and crime appears to increase during the warmer months.
# Data Transformation & Edited DF for Visualization use
library(dplyr)      # filter(), select(), count(), group_by(), summarise()
library(lubridate)  # mdy_hms(), month(), hour()
library(ggplot2)
library(scales)     # comma() axis labels

# Parse "MM/DD/YYYY hh:mm:ss AM/PM" strings into date-times
dfcc$Date <- mdy_hms(dfcc$Date)
# Month as a factor with all twelve levels, labelled Jan..Dec
# (safer than overwriting levels(), which misaligns if a month is absent)
dfcc$month <- factor(month(dfcc$Date), levels = 1:12, labels = month.abb)
dfcc$PrimaryType <- as.factor(dfcc$`Primary Type`)

# Rank crime types by total incidents and keep the ten most frequent
top_crimes <- dfcc %>%
  dplyr::count(PrimaryType, name = "tot", sort = TRUE) %>%
  data.frame()
top_crime_types <- top_crimes$PrimaryType[1:10]

# Incident count per month for each of the top ten crime types
crime_month <- dfcc %>%
  select(PrimaryType, month) %>%
  filter(PrimaryType %in% top_crime_types) %>%
  dplyr::count(PrimaryType, month) %>%
  data.frame()

p1 <- ggplot(crime_month, aes(x = month, y = n, fill = PrimaryType)) +
  geom_col(position = "dodge") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = comma) +
  labs(title = "Multiple Bar Charts - Number of Incidents per Month per Crime",
       x = "Months", y = "Incident Count") +
  scale_fill_brewer(palette = "Paired") +
  facet_wrap(~PrimaryType, ncol = 5, nrow = 2)
p1
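The top-ten selection and the month-by-type tally can also be written in base R, which makes the two steps explicit. This is a minimal sketch on a small, made-up data frame (the `crimes` data and the `top_n_types()` helper are hypothetical, not part of the original analysis). Keeping all twelve month levels on the factor means a month with no incidents shows up as a zero instead of silently disappearing.

```r
# Hypothetical toy data standing in for dfcc; these are NOT real Chicago counts.
crimes <- data.frame(
  PrimaryType = c("THEFT", "THEFT", "BATTERY", "THEFT", "BATTERY",
                  "ASSAULT", "NARCOTICS", "BATTERY", "THEFT", "ASSAULT"),
  month = factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3), levels = 1:12, labels = month.abb),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the n most frequent values, most frequent first
top_n_types <- function(types, n) {
  counts <- sort(table(types), decreasing = TRUE)
  names(counts)[seq_len(min(n, length(counts)))]
}

top2 <- top_n_types(crimes$PrimaryType, 2)

# Cross-tabulate type x month for the top types only; all 12 months are kept
# because `month` is a factor with 12 levels (unused months count as 0)
monthly <- with(subset(crimes, PrimaryType %in% top2), table(PrimaryType, month))

print(top2)
print(monthly)
```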
With the second visualization, the focus shifts to domestic crime. The chart covers the ten most common domestic crime types: each column is labelled with the total number of domestic incidents for that month and is broken down by crime type. Domestic battery is the most common type of domestic incident year-round, and it also increases during the warmer months.
# Domestic Violence Data Frame
# Keep only incidents flagged as domestic. data.frame() makes the column names
# syntactic, so "Primary Type" becomes Primary.Type below.
dom_df <- dfcc %>%
  filter(Domestic) %>%
  select("Date", "Block", "IUCR", "Primary Type", "Description", "Location Description",
         "Arrest", "Domestic", "Beat", "District", "Ward", "Community Area", "X Coordinate",
         "Y Coordinate", "Location", "Latitude", "Longitude", "month") %>%
  data.frame()

# Rank domestic crime types by total incidents; keep the ten most frequent
tot_by_crimes <- dom_df %>%
  dplyr::count(Primary.Type, name = "tot", sort = TRUE)
highest_crimes <- tot_by_crimes$Primary.Type[1:10]

# Monthly counts for the top ten domestic crime types (the stacked segments)
high_crime <- dom_df %>%
  filter(Primary.Type %in% highest_crimes) %>%
  dplyr::count(Primary.Type, month, name = "tot")

# Monthly total of ALL domestic incidents, used for the column labels
# (includes crime types outside the top ten)
dom_month_tot <- dom_df %>%
  dplyr::count(month, name = "tot")

# linewidth requires ggplot2 >= 3.4.0; older versions use size instead
p2 <- ggplot(high_crime, aes(x = month, y = tot, fill = Primary.Type)) +
  geom_col(linewidth = .2, color = 'grey') +
  labs(title = "Domestic Crimes Per Month (2021)", x = "Month", y = "Number of Crimes") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5),
        legend.title = element_text(color = "slategray", size = 10),
        legend.text = element_text(color = "slategray", size = 6)) +
  guides(fill = guide_legend(title = "Type of Crime")) +
  geom_text(data = dom_month_tot, aes(x = month, y = tot, label = scales::comma(tot)),
            inherit.aes = FALSE, vjust = -0.5, size = 4, color = 'slategray') +
  scale_fill_brewer(palette = "Paired")
p2
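In this chart the stacked segments cover only the ten most common domestic crime types, while the label above each column comes from `dom_month_tot`, which counts all domestic incidents, so a label can sit above the top of its bar. A minimal base R sketch on made-up data (the `dom` data frame and `top_types` vector are hypothetical) separates the two totals and computes each type's share within its month.

```r
# Hypothetical toy domestic records; NOT the real 2021 data.
dom <- data.frame(
  type  = c("BATTERY", "BATTERY", "ASSAULT", "BATTERY", "OTHER OFFENSE",
            "BATTERY", "ASSAULT", "CRIMINAL DAMAGE"),
  month = factor(c("Jan", "Jan", "Jan", "Feb", "Feb", "Feb", "Feb", "Feb"),
                 levels = c("Jan", "Feb")),
  stringsAsFactors = FALSE
)
top_types <- c("BATTERY", "ASSAULT")  # hypothetical "top" types for the example

counts <- table(dom$type, dom$month)                  # type x month counts
top_counts <- counts[top_types, , drop = FALSE]       # stacked segments only

all_total <- colSums(counts)      # what the column labels report
top_total <- colSums(top_counts)  # actual height of each stacked column

# Share of each top type within its month's stacked column (columns sum to 1)
composition <- prop.table(top_counts, margin = 2)

print(rbind(all_total, top_total))
print(round(composition, 2))
```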
Following on from the monthly counts, I looked more closely at the hour of day at which domestic incidents occur. The count peaks at around midnight (hour 0) and reaches its low between about 5:00 and 6:00am; from about 9:00am onward, the hourly count stays at a similar level.
library(ggrepel)  # geom_label_repel()

# Count domestic incidents by hour of day (0 = the midnight hour)
dom_hours <- dom_df %>%
  dplyr::mutate(hour = hour(Date)) %>%
  dplyr::count(hour, name = "n_incidents") %>%
  data.frame()

# Busiest and quietest hours, highlighted and labelled on the plot
hi_lo <- dom_hours %>%
  filter(n_incidents == min(n_incidents) | n_incidents == max(n_incidents))

p3 <- ggplot(dom_hours, aes(x = hour, y = n_incidents)) +
  geom_line(color = 'black') +
  geom_point(shape = 21, size = 4, color = 'white', fill = 'darkgreen') +
  scale_x_continuous(breaks = 0:23) +
  labs(title = "Domestic Incidents Per Hour",
       x = "Hour", y = "Total Domestic Incidents",
       caption = "Hour 0 = 12:00am") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5), plot.caption = element_text(hjust = 0.5)) +
  geom_point(data = hi_lo, shape = 21, size = 4, color = 'blue', fill = 'white') +
  geom_label_repel(data = hi_lo, aes(label = scales::comma(n_incidents)),
                   box.padding = 1, point.padding = 1,
                   size = 4, color = 'grey50', segment.color = 'darkblue')
p3
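One detail worth checking when reading the hourly line: a grouped summary only returns hours that actually occur in the data, so an hour with zero incidents would be missing from the plot rather than shown as the minimum. A minimal base R sketch, using hypothetical timestamps in place of `dom_df$Date`, tallies all 24 hours explicitly and then finds the busiest and quietest hours.

```r
# Hypothetical timestamps standing in for dom_df$Date (NOT real incidents).
stamps <- as.POSIXct(c("2021-01-03 00:05:00", "2021-01-03 00:40:00",
                       "2021-01-04 00:15:00", "2021-01-04 05:30:00",
                       "2021-01-05 13:20:00", "2021-01-05 13:45:00",
                       "2021-01-06 21:10:00"), tz = "UTC")

# Hour of day, 0-23 (the same quantity lubridate::hour() returns)
hours <- as.POSIXlt(stamps)$hour

# Tally every hour, including empty ones: bin 1 is hour 0, bin 24 is hour 23
per_hour <- tabulate(hours + 1L, nbins = 24)
names(per_hour) <- 0:23

peak_hour    <- as.integer(names(per_hour)[which.max(per_hour)])
trough_hours <- as.integer(names(per_hour)[per_hour == min(per_hour)])

print(per_hour[per_hour > 0])
cat("peak hour:", peak_hour, "\n")
```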
Building on the second visualization, I looked more closely at domestic battery, asking how many of the recorded incidents ended in an arrest. The plot shows a large gap between the number of incidents and the number of arrests each month.
# Monthly domestic battery incidents and how many of them ended in an arrest.
# Arrest is logical, so sum(Arrest) counts the TRUE values. Computing both
# columns in one summarise keeps them aligned by month (no cbind() needed,
# which would misalign rows if a month had no arrests).
all_battery <- dom_df %>%
  filter(Primary.Type == "BATTERY") %>%
  group_by(month) %>%
  dplyr::summarise(tot = n(), tot_arrest = sum(Arrest), .groups = 'drop') %>%
  data.frame()

p4 <- ggplot(data = all_battery, aes(x = month)) +
  geom_col(aes(y = tot), linewidth = .5, color = 'black', fill = 'lightgrey') +
  geom_line(aes(y = tot_arrest, group = 1)) +
  geom_point(aes(y = tot_arrest), shape = 21, fill = "slategray", color = "blue", size = 4) +
  labs(title = "Monthly Domestic Battery Incidents vs. Arrests",
       x = "Month", y = "Count",
       caption = "Bars show total domestic battery incidents; points show how many of them ended in arrest.") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5), plot.caption = element_text(hjust = 0.5))
p4
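The chart answers the arrest question visually; the same per-month summary also gives the arrest share as a number. A minimal base R sketch on hypothetical records (the `battery` data frame is made up): because `Arrest` is logical, `sum()` counts arrests and `mean()` gives the proportion directly, and computing incidents and arrests from the same grouping keeps them aligned month by month.

```r
# Hypothetical domestic battery records; NOT the real 2021 data.
battery <- data.frame(
  month  = factor(c("Jan", "Jan", "Jan", "Jan", "Feb", "Feb", "Feb"),
                  levels = c("Jan", "Feb")),
  Arrest = c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE)
)

# Both tallies use the same grouping, so they stay aligned by month
incidents <- tapply(battery$Arrest, battery$month, length)
arrests   <- tapply(battery$Arrest, battery$month, sum)  # TRUE counts as 1

arrest_share  <- arrests / incidents     # per-month proportion
overall_share <- mean(battery$Arrest)    # share across all months

print(round(arrest_share, 3))
print(round(overall_share, 3))
```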
The last visualization plots the latitude and longitude of every domestic battery incident in 2021, colored by police district. A few areas are so dense with incidents that they appear almost solid.
Taken together, these visualizations make me curious to compare the 2021 data with the previous year and with pre-COVID years, looking at both domestic incident rates and where domestic incidents occur.
library(RColorBrewer)  # brewer.pal()

dom_battery <- dom_df %>%
  filter(Primary.Type == "BATTERY")

# Coordinates and district of every domestic battery incident. Rows without
# coordinates are dropped here; ggplot2 would otherwise drop them with a warning.
district_gps <- dom_battery %>%
  filter(!is.na(Latitude), !is.na(Longitude)) %>%
  select(Latitude, Longitude, District) %>%
  data.frame()

# Stretch the 12-colour "Paired" palette to enough colours for every district
ncols <- 25
map_palette <- colorRampPalette(brewer.pal(12, "Paired"))(ncols)

p5 <- ggplot(data = district_gps, aes(x = Longitude, y = Latitude, colour = factor(District))) +
  geom_point(size = .6) +
  scale_color_manual(name = "DISTRICT", values = map_palette) +
  labs(title = "Map of Chicago",
       caption = "Points on the map are the locations of domestic battery incidents in 2021.") +
  theme_light() +
  guides(colour = guide_legend(ncol = 3)) +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        axis.title.y = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank(),
        legend.box.background = element_rect(colour = "darkgrey"),
        legend.title = element_text(hjust = 0.5, color = 'darkgrey'),
        legend.position = c(.85, .75),
        plot.title = element_text(hjust = 0.5),
        plot.caption = element_text(hjust = 0.5))
p5
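The map shows where incidents cluster but not which district has the most. A per-district tally turns that into a ranking; note that these are raw counts, and a per-capita rate would need population figures, which this data set does not include. A minimal base R sketch on hypothetical points (the `pts` data frame is made up), first dropping rows with missing coordinates, the same rows ggplot2 cannot place on a map:

```r
# Hypothetical incident points; NOT real 2021 locations.
pts <- data.frame(
  District  = c(7, 7, 7, 6, 6, 15, 4),
  Latitude  = c(41.77, 41.78, 41.76, 41.75, NA, 41.88, 41.70),
  Longitude = c(-87.64, -87.65, -87.63, -87.63, -87.62, -87.77, -87.56)
)

# Keep only rows that can be placed on the map
mapped <- pts[!is.na(pts$Latitude) & !is.na(pts$Longitude), ]

# Incident count per district, largest first (raw counts, not per-capita rates)
by_district <- sort(table(mapped$District), decreasing = TRUE)
print(by_district)
```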