DataWranglingProject

Introduction

Before the riot of 2001, Cincinnati’s overall crime rate was dropping dramatically and had reached its lowest point since 1992. After the riot violent crime increased. Reflecting national trends, crime rates in Cincinnati have dropped in all categories from 2006 to 2010.

In 2011, using FBI data, the CQ Press ranked Cincinnati the 16th most dangerous city in the United States.However, the FBI web site recommends against using its data for rankings, since there are many factors that influence crime rates. Also, the CQ Press did not include Chicago, Illinois in its ranking.[6]

According to the Hamilton County Prosecutor, the vast majority of people being murdered are inside the city and are African American. In 2009, 44 black men and 11 black women were murdered, but no whites.[8] According to the Prosecutor, people almost always commit murders inside their racial classifications, and there is a subset in the underclass of Cincinnati which “commit a lot of violent crime and tend to be black,”[8] similar to national trends.

We have analysed the data to understand how crime rates have changed by time and location.

The final conclusions of the analysis will answer:

The number of crimes
Violence of offense
Weapons Used
Days of the week
Most dangerous locations
Gender which is targetted the most
Age bracket

We will be leveraging this information using various libraries in R and answer the question: Is Cincinnati a safe place as it was earlier?

Packages Required

The following packages are used :

library(tidyr) #for data cleaning
library(tidyverse) #to install the required packages
library(readr) #to import the csv file
library(dplyr) #to manipulate the data
library(ggplot2) #for plotting different graphs
library(knitr) #for R markdown
library(leaflet) #for visulaizing the data on map

Data Cleaning and Preparation

DATA SOURCE

I am using the crime dataset for the city of Cincinnati to explore the situation of crimes in the city

The link to the data source is here

PURPOSE OF THE DATA

The purpose is to identify crime rate and analyze different factors that lead to the crime. The original data contains 314000 rows and 39 columns and for the years 1991 to 2018.

DATA CLEANING

The first step is to import the data from a CSV file. #Importing data

#Import CSV file
crime_data <- read.csv('crimedata.csv')

On initial analysis on the data I found that I could use the date reported column and split it into two columns. The two columns would represent the date of crime and the time of occurrence.

#Date of crime and time of crime

Date_Rpt <- substr(crime_data$DATE_FROM, 1, 22)
Date_Rpt <- (strptime(Date_Rpt, '%m/%d/%Y %I:%M:%S %p'))


Time_of_rpt <- substr(crime_data$DATE_FROM, 12, 22)
Time_of_rpt<- as.difftime(Time_of_rpt,'%I:%M:%S %p', units="hours")
Date_Rpt <- substr(crime_data$DATE_FROM, 1, 10)

crime_data$DATE_FROM <- Date_Rpt
crime_data <- cbind(crime_data,  Time_of_rpt)

Now I converted the data into a table and filtered out the required columns for analysis

#Converting to a table
crime_data <- tbl_df(crime_data)
#selecting the required columns
cd_cleandata <- select(crime_data,
                   INSTANCEID,  
                   DATE_FROM,
                   Time_of_rpt,
                   OFFENSE,
                   DAYOFWEEK,
                   SNA_NEIGHBORHOOD,    
                   WEAPONS,
                   LONGITUDE_X, 
                   LATITUDE_X,  
                   VICTIM_AGE,  
                   VICTIM_GENDER
                  
)

Providing the summary of the cleaned dataset

summary(cd_cleandata)

##                                 INSTANCEID      DATE_FROM        
##  C2CA9A5A-5C3C-419F-BFE1-F6650C936EFF:   127   Length:355379     
##  932627CA-318A-4B93-8FE3-57914736D1F6:   126   Class :character  
##  456C6412-EC72-4F59-B1B4-5980631026AD:    84   Mode  :character  
##  3EFCADA0-F622-403C-9B64-50E23EC5F00F:    80                     
##  46504773-3D1B-48A2-96F6-74BD8EBE5F73:    64                     
##  1AF3B2F3-411A-4A01-A3A3-1E0F0DDAD73E:    60                     
##  (Other)                             :354838                     
##  Time_of_rpt                                OFFENSE      
##  Length:355379     THEFT                        :120315  
##  Class :difftime   CRIMINAL DAMAGING/ENDANGERING: 43536  
##  Mode  :numeric    ASSAULT                      : 38225  
##                    BURGLARY                     : 28817  
##                    AGGRAVATED ROBBERY           : 18516  
##                    BREAKING AND ENTERING        : 17148  
##                    (Other)                      : 88822  
##      DAYOFWEEK            SNA_NEIGHBORHOOD 
##  FRIDAY   :52163   N/A            : 44776  
##  SATURDAY :51460   WESTWOOD       : 32163  
##  SUNDAY   :50299   EAST PRICE HILL: 23103  
##  MONDAY   :50214   WEST PRICE HILL: 18588  
##  WEDNESDAY:49728   AVONDALE       : 17911  
##  TUESDAY  :49450   CUF            : 14680  
##  (Other)  :52065   (Other)        :204158  
##                                              WEAPONS      
##  99 - NONE                                       :141022  
##  99--NONE                                        : 70866  
##  40 - PERSONAL WEAPONS (HANDS, FEET, TEETH, ETC.): 41219  
##  U - UNKNOWN                                     : 24493  
##  40--PERSONAL WEAPONS (HANDS, FEET, TEETH, ETC.) : 16932  
##  12 - HANDGUN                                    : 15743  
##  (Other)                                         : 45104  
##   LONGITUDE_X       LATITUDE_X      VICTIM_AGE   
##  Min.   :-84.82   Min.   :39.05   18-25  :72619  
##  1st Qu.:-84.57   1st Qu.:39.12   31-40  :58768  
##  Median :-84.52   Median :39.14   UNKNOWN:57537  
##  Mean   :-84.52   Mean   :39.14   41-50  :43554  
##  3rd Qu.:-84.49   3rd Qu.:39.16   26-30  :41978  
##  Max.   :-84.25   Max.   :39.36   51-60  :35961  
##  NA's   :45864    NA's   :45864   (Other):44962  
##               VICTIM_GENDER   
##                      : 54361  
##  F - FEMALE          :    25  
##  FEMALE              :164757  
##  M - MALE            :    18  
##  MALE                :135901  
##  NON-PERSON (BUSINESS:    51  
##  UNKNOWN             :   266

Removing missing values and blank fields from the dataset.

#cleaning to remove na
cd_cleandata <- na.omit(cd_cleandata)
cd_cleandata <- with(cd_cleandata, cd_cleandata[!(INSTANCEID == "" | is.na(INSTANCEID)), ])
cd_cleandata <- with(cd_cleandata, cd_cleandata[!(DAYOFWEEK == "" | is.na(DAYOFWEEK)), ])
cd_cleandata <- with(cd_cleandata, cd_cleandata[!(SNA_NEIGHBORHOOD == "" | is.na(SNA_NEIGHBORHOOD)), ])
cd_cleandata <- with(cd_cleandata, cd_cleandata[!(OFFENSE == "" | is.na(OFFENSE)), ])
cd_cleandata <- with(cd_cleandata, cd_cleandata[!(VICTIM_AGE == "" | is.na(VICTIM_AGE)), ])
cd_cleandata <- with(cd_cleandata, cd_cleandata[!(VICTIM_AGE == "UNKNOWN" | is.na(VICTIM_AGE)), ])
cd_cleandata <- with(cd_cleandata, cd_cleandata[!(VICTIM_GENDER == "" | is.na(VICTIM_GENDER)), ])

Combining the similar kinds of crimes to on and combinig the weapons used to find the unique weapons used for the crime. Also, grouped juvenile and under 18 years of age into one group called Under 18.

#cleaning offense column
cd_cleandata$OFFENSE <- gsub(".*HARRASS.*", "HARRASS", (cd_cleandata$OFFENSE), perl = FALSE)
cd_cleandata$OFFENSE <- gsub(".*MENACING.*", "MENACING", (cd_cleandata$OFFENSE), perl = FALSE)
cd_cleandata$OFFENSE <- gsub(".*FORGERY.*", "FORGERY", (cd_cleandata$OFFENSE), perl = FALSE)
cd_cleandata$OFFENSE <- gsub(".*ROBBERY.*", "ROBBERY", (cd_cleandata$OFFENSE), perl = FALSE)
cd_cleandata$OFFENSE <- gsub(".*MURDER.*", "MURDER", (cd_cleandata$OFFENSE), perl = FALSE)
cd_cleandata$OFFENSE <- gsub(".*ABDUCTION.*", "ABDUCTION", (cd_cleandata$OFFENSE), perl = FALSE)
cd_cleandata$OFFENSE <- gsub(".*SEX.*", "SEXUAL CRIME", (cd_cleandata$OFFENSE), perl = FALSE)
cd_cleandata$OFFENSE <- gsub(".*INTIMID.*", "INTIMIDATION", (cd_cleandata$OFFENSE), perl = FALSE)
cd_cleandata$OFFENSE <- gsub(".*KIDNAPPING.*", "KIDNAPPING", (cd_cleandata$OFFENSE), perl = FALSE)
cd_cleandata$OFFENSE <- gsub(".*PUBLIC INDECENCY.*", "PUBLIC INDECENCY", (cd_cleandata$OFFENSE), perl = FALSE)
cd_cleandata$OFFENSE <- gsub(".*ARSON.*", "ARSON", (cd_cleandata$OFFENSE), perl = FALSE)
cd_cleandata$OFFENSE <- gsub(".*UNAUTHORIZED USE.*", "UNAUTHORIZED USE", (cd_cleandata$OFFENSE), perl = FALSE)
cd_cleandata$OFFENSE <- gsub(".*PATIENT ABUSE.*", "PATIENT ABUSE", (cd_cleandata$OFFENSE), perl = FALSE)
cd_cleandata$OFFENSE <- gsub(".*TELEPHONE HARRASSMENT.*", "TELEPHONE HARRASSMENT", (cd_cleandata$OFFENSE), perl = FALSE)
cd_cleandata$OFFENSE <- gsub(".*VIOL.*", "VIOLATE PROTECTION ORDER", (cd_cleandata$OFFENSE), perl = FALSE)
cd_cleandata$OFFENSE <- gsub(".*CREDIT CARD.*", "CREDIT CARD FRAUD", (cd_cleandata$OFFENSE), perl = FALSE)
cd_cleandata$OFFENSE <- gsub(".*VANDALISM.*", "VANDALISM", (cd_cleandata$OFFENSE), perl = FALSE)
cd_cleandata$OFFENSE <- gsub(".*THEFT.*", "THEFT", (cd_cleandata$OFFENSE), perl = FALSE)
cd_cleandata$OFFENSE <- gsub(".*CRIMINAL.*", "CRIMINAL MISCHIEF", (cd_cleandata$OFFENSE), perl = FALSE)
cd_cleandata$OFFENSE <- gsub(".*DISORDERLY CONDUCT.*", "DISORDERLY CONDUCT", (cd_cleandata$OFFENSE), perl = FALSE)
cd_cleandata$OFFENSE <- gsub(".*ENDANGERING CHILDREN.*", "ENDANGERING CHILDREN", (cd_cleandata$OFFENSE), perl = FALSE)
cd_cleandata$OFFENSE <- gsub(".*ASSAULT.*", "ASSAULT", (cd_cleandata$OFFENSE), perl = FALSE)
cd_cleandata$OFFENSE <- gsub(".*BURGLARY.*", "BURGLARY", (cd_cleandata$OFFENSE), perl = FALSE)
cd_cleandata$OFFENSE <- gsub(".*RAPE.*", "RAPE", (cd_cleandata$OFFENSE), perl = FALSE)
cd_cleandata$VICTIM_AGE <- gsub(".*JUVENILE.*", "UNDER 18", (cd_cleandata$VICTIM_AGE), perl = FALSE)

#Cleaning weapons column

cd_cleandata$WEAPONS <- gsub(".*11.*", "FIREARM", (cd_cleandata$WEAPONS), perl = FALSE)
cd_cleandata$WEAPONS <- gsub(".*12.*", "HANDGUN", (cd_cleandata$WEAPONS), perl = FALSE)
cd_cleandata$WEAPONS <- gsub(".*13.*", "RIFLE", (cd_cleandata$WEAPONS), perl = FALSE)
cd_cleandata$WEAPONS <- gsub(".*14.*", "SHOTGUN", (cd_cleandata$WEAPONS), perl = FALSE)
cd_cleandata$WEAPONS <- gsub(".*15.*", "FIREARM", (cd_cleandata$WEAPONS), perl = FALSE)
cd_cleandata$WEAPONS <- gsub(".*16.*", "FIREARM", (cd_cleandata$WEAPONS), perl = FALSE)
cd_cleandata$WEAPONS <- gsub(".*17.*", "FIREARM", (cd_cleandata$WEAPONS), perl = FALSE)
cd_cleandata$WEAPONS <- gsub(".*18.*", "BB AND PELLET GUNS", (cd_cleandata$WEAPONS), perl = FALSE)
cd_cleandata$WEAPONS <- gsub(".*20.*", "KNIFE/CUTTING INSTRUMENT", (cd_cleandata$WEAPONS), perl = FALSE)
cd_cleandata$WEAPONS <- gsub(".*30.*", "BLUNT OBJECT", (cd_cleandata$WEAPONS), perl = FALSE)
cd_cleandata$WEAPONS <- gsub(".*35.*", "MOTOR VEHICLE", (cd_cleandata$WEAPONS), perl = FALSE)
cd_cleandata$WEAPONS <- gsub(".*40.*", "PERSONAL WEAPON", (cd_cleandata$WEAPONS), perl = FALSE)
cd_cleandata$WEAPONS <- gsub(".*60.*", "EXPLOSIVES", (cd_cleandata$WEAPONS), perl = FALSE)
cd_cleandata$WEAPONS <- gsub(".*70.*", "DRUGS", (cd_cleandata$WEAPONS), perl = FALSE)
cd_cleandata$WEAPONS <- gsub(".*80.*", "OTHER WEAPONS", (cd_cleandata$WEAPONS), perl = FALSE)
cd_cleandata$WEAPONS <- gsub(".*U.*", "UNKNOWN", (cd_cleandata$WEAPONS), perl = FALSE)
cd_cleandata$WEAPONS <- gsub(".*65.*", "FIRE/INCENDIARY DEVICE", (cd_cleandata$WEAPONS), perl = FALSE)
cd_cleandata$WEAPONS <- gsub(".*50.*", "POISON", (cd_cleandata$WEAPONS), perl = FALSE)
cd_cleandata$WEAPONS <- gsub(".*99.*", "NONE", (cd_cleandata$WEAPONS), perl = FALSE)
cd_cleandata$WEAPONS <- gsub(".*85.*", "ASPHYXIATION", (cd_cleandata$WEAPONS), perl = FALSE)

Final Dataset

The data set I used to analyze the problem is as follows:

kable(cd_cleandata[1:10,], caption = "Preview - Clean Data")

Preview - Clean Data
INSTANCEID	DATE_FROM	Time_of_rpt	OFFENSE	DAYOFWEEK	SNA_NEIGHBORHOOD	WEAPONS	LONGITUDE_X	LATITUDE_X	VICTIM_AGE	VICTIM_GENDER
D6D7D173-E416-4571-AF34-A767ACF810D9	03/16/2015	15.033333 hours	MENACING	MONDAY	WALNUT HILLS	NONE	-84.49080	39.11996	26-30	FEMALE
2A569674-9603-4ED9-82BB-783B843B94C9	06/23/2011	10.333333 hours	ASSAULT	THURSDAY	AVONDALE	PERSONAL WEAPON	-84.49155	39.14167	41-50	MALE
5B95E5DC-E35D-4F86-B77C-02F25C950329	06/25/2012	15.583333 hours	ASSAULT	MONDAY	COLLEGE HILL	FIREARM	-84.54596	39.20102	18-25	MALE
B908DB49-C51D-442B-ADFD-B6E64696071B	03/13/2018	16.750000 hours	ASSAULT	TUESDAY	WEST END	PERSONAL WEAPON	-84.52521	39.11082	41-50	FEMALE
66F569F5-4C8E-4848-BD42-D0F8DE4D42B0	05/19/2013	14.500000 hours	BURGLARY	SUNDAY	MADISONVILLE	FIREARM	-84.39035	39.15479	18-25	FEMALE
3B4450F2-F94B-4BFB-8B0A-3FF04DBD73C4	07/24/2015	0.000000 hours	ENDANGERING CHILDREN	FRIDAY	EVANSTON	UNKNOWN	-84.47835	39.13468	UNDER 18	MALE
B095D1B2-9089-4CA4-85B2-A2657FB82793	09/05/2012	1.366667 hours	ROBBERY	WEDNESDAY	OVER-THE-RHINE	NONE	-84.51276	39.11108	31-40	MALE
FD5706CD-7E2F-4CED-8C73-2E3AB142EF8C	09/03/2015	1.166667 hours	ASSAULT	THURSDAY	OVER-THE-RHINE	PERSONAL WEAPON	-84.51640	39.11505	41-50	FEMALE
417FB2EC-1405-4F31-802B-2D3D7C9A47C3	08/31/2014	18.000000 hours	THEFT	SUNDAY	MT. AUBURN	NONE	-84.51008	39.11733	18-25	MALE
A6B8356E-5A44-425A-8076-80BA963D09AC	09/25/2010	14.500000 hours	ASSAULT	SATURDAY	WALNUT HILLS	PERSONAL WEAPON	-84.49454	39.12494	31-40	MALE

Exploratory Analysis

The exploratory data analysis is used to find the following answers:

The number of crimes
Violence of offense
Weapons Used
Days of the week
Most dangerous locations
Gender which is targetted the most
Age bracket

#total number of crimes
length(unique(cd_cleandata$INSTANCEID))

## [1] 201198

#Violence of offense
cd_cleandata    %>% count(OFFENSE,  sort    = TRUE) %>% top_n(5)

## Selecting by n

## # A tibble: 5 x 2
##   OFFENSE               n
##   <chr>             <int>
## 1 THEFT             76142
## 2 ASSAULT           42003
## 3 CRIMINAL MISCHIEF 32744
## 4 BURGLARY          30155
## 5 ROBBERY           18469

#Weapons used
cd_cleandata    %>% count(WEAPONS,  sort    = TRUE) %>% filter(WEAPONS != "NONE" & WEAPONS!= "UNKNOWN") %>% top_n(5)

## Selecting by n

## # A tibble: 5 x 2
##   WEAPONS                                         n
##   <chr>                                       <int>
## 1 PERSONAL WEAPON                             46461
## 2 FIREARM                                      7918
## 3 OTHER WEAPONS                                2474
## 4 PERSONAL WEAPONS (HANDS, FEET, TEETH, ETC.)  1094
## 5 MOTOR VEHICLE                                 829

#unsafe days of the week
cd_cleandata    %>% count(DAYOFWEEK,    sort    = TRUE) %>% top_n(5)

## Selecting by n

## # A tibble: 5 x 2
##   DAYOFWEEK     n
##   <fct>     <int>
## 1 SATURDAY  37445
## 2 FRIDAY    37005
## 3 SUNDAY    37004
## 4 MONDAY    36286
## 5 WEDNESDAY 35881

#unsafe neighborhoods
cd_cleandata    %>% 
  count(SNA_NEIGHBORHOOD, sort = TRUE) %>% top_n(5)

## Selecting by n

## # A tibble: 5 x 2
##   SNA_NEIGHBORHOOD     n
##   <fct>            <int>
## 1 WESTWOOD         23857
## 2 EAST PRICE HILL  19441
## 3 WEST PRICE HILL  16044
## 4 AVONDALE         15413
## 5 CUF              12816

#Which gender is targeted more
cd_cleandata    %>% count(VICTIM_GENDER,    sort    = TRUE) %>% top_n(1)

## Selecting by n

## # A tibble: 1 x 2
##   VICTIM_GENDER      n
##   <fct>          <int>
## 1 FEMALE        141323

#age group
cd_cleandata    %>% count(VICTIM_AGE,   sort    = TRUE) %>% top_n(5)

## Selecting by n

## # A tibble: 5 x 2
##   VICTIM_AGE     n
##   <chr>      <int>
## 1 18-25      62000
## 2 31-40      50983
## 3 41-50      37963
## 4 26-30      36088
## 5 51-60      31584

Creating a bar plot to see the distribution of the crime over the years. The plot shows that the crime rates were high in the years 2010 to 2013 and after that it decreased in the further years.

#adding year column to the dataset
year = (substr(cd_cleandata$DATE_FROM,7,11))
cd_cleandata <- cbind(cd_cleandata,  year)

#plotting barplot
barplot(table(substr(cd_cleandata$DATE_FROM, 7, 11)),
        main = "Crimes committed by Year",
        xlab = "Year",
        ylab = "Total Crimes",
        col = "cyan")

Filtering the offenses by type.

#5 offense types

cd_cleandata %>%
  filter(cd_cleandata$OFFENSE == "THEFT" |
           cd_cleandata$OFFENSE == "MENACING"|
           cd_cleandata$OFFENSE == "ASSAULT"|
           cd_cleandata$OFFENSE == "BURGLARY"|
           cd_cleandata$OFFENSE == "CRIMINAL MISCHIEF") %>%
  ggplot(aes(factor(OFFENSE), fill = OFFENSE)) + 
  geom_bar(stat="count", position = "dodge")+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))+
  labs( y = "Count of crimes",
        x = "Offense type",
        title = "Top 5 offenses in Cincinnati")+
  guides(fill=guide_legend(title="Crime in Cincinnati"))

Here we can see that menacing has the lowest count and theft has the highest number of occurrence.

Using the leaflet package I am using the map functionality to find out the count of crimes per location.

#changing neighborhood to factor

cd_cleandata$SNA_NEIGHBORHOOD <- as.factor(cd_cleandata$SNA_NEIGHBORHOOD)

#creating function for visualization by location


mapforlocation <- function(location){
  subdata <- filter(cd_cleandata, location == cd_cleandata$SNA_NEIGHBORHOOD)
  occurence_map <- leaflet() %>%
    addTiles()%>%
    addMarkers(lng = subdata$LONGITUDE_X,
               lat = subdata$LATITUDE_X, 
               clusterOptions = markerClusterOptions()
    )
  
}

map <- mapforlocation(cd_cleandata$SNA_NEIGHBORHOOD)
map

Here we clearly see that the neighborhood Over the Rhine has the highest number of crimes.

Next I tried to find the trend to see the top 5 SNA neighborhoods with the highest number of crimes.

# finding most dangerous crimes by SNA neighborhood
cd_cleandata %>%
  filter(cd_cleandata$SNA_NEIGHBORHOOD == "WESTWOOD" |
           cd_cleandata$SNA_NEIGHBORHOOD == "WEST PRICE HILL"|
           cd_cleandata$SNA_NEIGHBORHOOD == "EAST PRICE HILL"|
           cd_cleandata$SNA_NEIGHBORHOOD == "OVER-THE-RHINE"|
           cd_cleandata$SNA_NEIGHBORHOOD == "AVONDALE") %>%
  ggplot(aes(factor(SNA_NEIGHBORHOOD), fill = SNA_NEIGHBORHOOD)) + 
  geom_bar(stat="count", position = "dodge",color ="blue")+
  theme(axis.text.x = element_text(angle = 30, hjust = 1))+
  labs( y = "Total number of crimes",
        x = "Neighborhood",
        title = "Unsafe Neighborhoods")

In the below plot we can see that the age group 31-40 is the most unsafe age group and the neighbohood has the highest number of crimes in that ge group.

#graph for age group and neighborhood

cd_cleandata %>% filter(cd_cleandata$SNA_NEIGHBORHOOD == "WESTWOOD"|
                       cd_cleandata$SNA_NEIGHBORHOOD == "WEST PRICE HILL"|
                       cd_cleandata$SNA_NEIGHBORHOOD == "EAST PRICE HILL"|
                       cd_cleandata$SNA_NEIGHBORHOOD == "OVER-THE-RHINE"|
                       cd_cleandata$SNA_NEIGHBORHOOD == "AVONDALE") %>%
  group_by(VICTIM_AGE, SNA_NEIGHBORHOOD)%>%
  tally()%>%
  ggplot(aes(x = VICTIM_AGE, y = n, group = SNA_NEIGHBORHOOD, color = SNA_NEIGHBORHOOD)) + 
  geom_line()+
  labs(y = "Number of Crimes",
       x = "Age Group",
       title = "Most Dangerous locations",
       color = "Neighborhoods")

Summary

I worked to analyze the Cincinnati crime dataset to find trends in the data and determine the parameters which lead to the criminal activity in the city.

I addressed this problem by using the data columns like neighborhood and the number of crimes in that area. I also use the offense types to find the fators which lead to those crimes.

The methodology I used was that I created a barplot, map, histogram and trend lines to compare the various factors and thereby gaining the insights.

This analysis helped to find the most dangerous locations in the city which can help the police department to increase the security measures in those areas. We also found out the criminal activity against the different age groups which can help the police to focus people in that age group which seem to be the working class age. We also see that theft is the most comon type of crime and thereby security measure against thefts can be focused more. We also see that the crimes has been reducing over the years which is a good indicator for the police department.

Through my analysis the police department can enforce more security in the nieghborhood Avondale. The city can conduct more awareness for the people in the age group 31-40 about how to be safe and avoid such situations. To reduce the number of thefts various programs can be conducted to make common people aware about how to take care of their belongings.

This project can be further improved by working on the less utilized columns like gender of the victim, weapons used in the crime, bias of the criminal and times of the crime to get more deeper insights.