Introduction:

This project focuses on anticipating crime by victim gender, age and race in the city of Cincinnati, in order to raise public safety awareness among citizens and to target educational materials.

Objective:

I will use the City of Cincinnati Police Crime dataset to anticipate crime by gender, age and race. The results can support targeted educational materials across the city that raise public safety awareness, and can help the citizens of Cincinnati understand which offenses are more prevalent for a given age group and race.

Introduction to dataset:

I will be using the City of Cincinnati Police Crime dataset, which has about 355k rows (355,379 in the imported copy) and 40 columns. Of these columns, my response variable will be “VICTIM_GENDER” and my predictor variables will be OFFENSE, VICTIM_AGE and VICTIM_RACE. Since VICTIM_GENDER is a categorical variable, the task is a classification problem, which suits the tree-based methods proposed later.

Packages Required:

To analyze this data, we will use the following R packages:

library(dplyr)  # data manipulation verbs and glimpse()
library(rio)    # import() for reading the CSV file

dplyr provides a set of verbs for data manipulation: mutate() adds new variables that are functions of existing variables; select() picks variables based on their names; filter() picks cases based on their values; summarise() reduces multiple values down to a single summary; and arrange() changes the ordering of the rows.

These verbs all combine naturally with group_by(), which allows you to perform any operation “by group”. dplyr is designed to abstract over how the data is stored. rio is used for its import() function, which reads the CSV file.
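As a minimal sketch of how these verbs chain together, the example below uses a small made-up data frame (`toy_crimes`, which is not the real dataset) whose column names mirror the crime data. All values are invented for illustration.

```r
library(dplyr)

# Hypothetical toy data; values are invented for illustration only.
toy_crimes <- data.frame(
  OFFENSE       = c("THEFT", "ASSAULT", "THEFT", "ROBBERY", "THEFT", "ASSAULT"),
  VICTIM_GENDER = c("FEMALE", "MALE", "MALE", "MALE", "FEMALE", "FEMALE"),
  TOTALSUSPECTS = c(1, 2, NA, 3, 1, 1),
  stringsAsFactors = FALSE
)

offense_summary <- toy_crimes %>%
  filter(!is.na(TOTALSUSPECTS)) %>%              # keep rows with a known suspect count
  mutate(MULTI_SUSPECT = TOTALSUSPECTS > 1) %>%  # add a derived variable
  group_by(VICTIM_GENDER) %>%                    # operate "by group"
  summarise(
    incidents     = n(),                         # rows per group
    multi_suspect = sum(MULTI_SUSPECT),          # incidents with more than one suspect
    .groups = "drop"
  ) %>%
  arrange(desc(incidents))                       # order groups by incident count

print(offense_summary)
```

Each verb takes a data frame and returns a data frame, which is what lets the steps be piped one after another.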

Data Preparation:

The data set comes from the Cincinnati Police Crime Data.

Incidents are the records of reported crimes, collated by an agency for management. Incidents are typically housed in a Records Management System (RMS) that stores agency-wide data about law enforcement operations. This dataset contains 40 columns and about 355k rows (355,379 in the import below). It was created on November 15, 2017 and most recently updated on April 4, 2019.

The steps below import and clean the data:

crime_data <- import("C:/Users/Ankeeta/Desktop/city_of_cincinnati_police_data_initiative_crime_incidents.csv")
glimpse(crime_data)
## Observations: 355,379
## Variables: 40
## $ INSTANCEID                     <chr> "D6D7D173-E416-4571-AF34-A767AC...
## $ INCIDENT_NO                    <chr> "159006170", "41103934", "21104...
## $ DATE_REPORTED                  <chr> "03/16/2015 05:19:00 PM +0000",...
## $ DATE_FROM                      <chr> "03/16/2015 03:02:00 PM +0000",...
## $ DATE_TO                        <chr> "03/16/2015 03:05:00 PM +0000",...
## $ CLSD                           <chr> "J--CLOSED", "J--CLOSED", "D--V...
## $ UCR                            <int> 802, 810, 1493, 401, 810, 550, ...
## $ DST                            <chr> "4", "4", "2", "5", "1", "2", "...
## $ BEAT                           <chr> "2", "3", "3", "4", "3", "3", "...
## $ OFFENSE                        <chr> "AGGRAVATED MENACING", "ASSAULT...
## $ LOCATION                       <chr> "48-PARKING LOT", "02-MULTI FAM...
## $ THEFT_CODE                     <chr> "", "", "", "", "", "", "", "",...
## $ FLOOR                          <chr> "", "", "", "", "", "2 - FIRST ...
## $ SIDE                           <chr> "", "", "", "", "", "1 - FRONT"...
## $ OPENING                        <chr> "", "", "", "", "", "1 - DOOR",...
## $ HATE_BIAS                      <chr> "N--NO BIAS/NOT APPLICABLE", "N...
## $ DAYOFWEEK                      <chr> "MONDAY", "THURSDAY", "SATURDAY...
## $ RPT_AREA                       <chr> "45", "365", "138", "439", "203...
## $ CPD_NEIGHBORHOOD               <chr> "WALNUT HILLS", "AVONDALE", "MA...
## $ SNA_NEIGHBORHOOD               <chr> "WALNUT HILLS", "AVONDALE", "MA...
## $ WEAPONS                        <chr> "99 - NONE", "40--PERSONAL WEAP...
## $ DATE_OF_CLEARANCE              <chr> "03/28/2015 12:00:00 AM +0000",...
## $ HOUR_FROM                      <int> 152, 1020, 240, 1535, 1645, 143...
## $ HOUR_TO                        <int> 155, 1030, 244, 1540, 1712, 144...
## $ ADDRESS_X                      <chr> "21XX FULTON AV", "34XX READING...
## $ LONGITUDE_X                    <dbl> -84.49080, -84.49155, -84.39439...
## $ LATITUDE_X                     <dbl> 39.11996, 39.14167, 39.15156, 3...
## $ VICTIM_AGE                     <chr> "26-30", "41-50", "UNKNOWN", "1...
## $ VICTIM_RACE                    <chr> "BLACK", "BLACK", "", "BLACK", ...
## $ VICTIM_ETHNICITY               <chr> "NOT OF HISPANIC ORIG", "NOT OF...
## $ VICTIM_GENDER                  <chr> "FEMALE", "MALE", "", "MALE", "...
## $ SUSPECT_AGE                    <chr> "51-60", "UNKNOWN", "18-25", "1...
## $ SUSPECT_RACE                   <chr> "BLACK", "BLACK", "BLACK", "BLA...
## $ SUSPECT_ETHNICITY              <chr> "NOT OF HISPANIC ORIG", "NOT OF...
## $ SUSPECT_GENDER                 <chr> "MALE", "MALE", "FEMALE", "MALE...
## $ TOTALNUMBERVICTIMS             <int> 1, 1, 1, 2, 1, 2, 1, 2, 1, 2, 2...
## $ TOTALSUSPECTS                  <int> 1, 1, 2, 4, NA, 2, 1, 2, NA, NA...
## $ UCR_GROUP                      <chr> "PART 2 MINOR", "PART 2 MINOR",...
## $ COMMUNITY_COUNCIL_NEIGHBORHOOD <chr> "WALNUT HILLS", "AVONDALE", "MA...
## $ ZIP                            <dbl> 2.233473e-319, 2.234115e-319, 2...
summary(crime_data)
##   INSTANCEID        INCIDENT_NO        DATE_REPORTED     
##  Length:355379      Length:355379      Length:355379     
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##   DATE_FROM           DATE_TO              CLSD                UCR        
##  Length:355379      Length:355379      Length:355379      Min.   :   0.0  
##  Class :character   Class :character   Class :character   1st Qu.: 552.0  
##  Mode  :character   Mode  :character   Mode  :character   Median : 600.0  
##                                                           Mean   : 803.9  
##                                                           3rd Qu.: 810.0  
##                                                           Max.   :2761.0  
##                                                           NA's   :71      
##      DST                BEAT             OFFENSE         
##  Length:355379      Length:355379      Length:355379     
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##    LOCATION          THEFT_CODE           FLOOR          
##  Length:355379      Length:355379      Length:355379     
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##      SIDE             OPENING           HATE_BIAS        
##  Length:355379      Length:355379      Length:355379     
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##   DAYOFWEEK           RPT_AREA         CPD_NEIGHBORHOOD  
##  Length:355379      Length:355379      Length:355379     
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##  SNA_NEIGHBORHOOD     WEAPONS          DATE_OF_CLEARANCE    HOUR_FROM     
##  Length:355379      Length:355379      Length:355379      Min.   :   0.0  
##  Class :character   Class :character   Class :character   1st Qu.: 130.0  
##  Mode  :character   Mode  :character   Mode  :character   Median : 230.0  
##                                                           Mean   : 816.9  
##                                                           3rd Qu.:1623.0  
##                                                           Max.   :2359.0  
##                                                           NA's   :7       
##     HOUR_TO        ADDRESS_X          LONGITUDE_X       LATITUDE_X   
##  Min.   :   0.0   Length:355379      Min.   :-84.82   Min.   :39.05  
##  1st Qu.: 145.0   Class :character   1st Qu.:-84.57   1st Qu.:39.12  
##  Median : 815.0   Mode  :character   Median :-84.52   Median :39.14  
##  Mean   : 922.3                      Mean   :-84.52   Mean   :39.14  
##  3rd Qu.:1651.0                      3rd Qu.:-84.49   3rd Qu.:39.16  
##  Max.   :2359.0                      Max.   :-84.25   Max.   :39.36  
##  NA's   :1316                        NA's   :45864    NA's   :45864  
##   VICTIM_AGE        VICTIM_RACE        VICTIM_ETHNICITY  
##  Length:355379      Length:355379      Length:355379     
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##  VICTIM_GENDER      SUSPECT_AGE        SUSPECT_RACE      
##  Length:355379      Length:355379      Length:355379     
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##  SUSPECT_ETHNICITY  SUSPECT_GENDER     TOTALNUMBERVICTIMS TOTALSUSPECTS   
##  Length:355379      Length:355379      Min.   :  1.000    Min.   : 1.00   
##  Class :character   Class :character   1st Qu.:  1.000    1st Qu.: 1.00   
##  Mode  :character   Mode  :character   Median :  1.000    Median : 1.00   
##                                        Mean   :  1.433    Mean   : 1.64   
##                                        3rd Qu.:  1.000    3rd Qu.: 2.00   
##                                        Max.   :127.000    Max.   :16.00   
##                                        NA's   :146        NA's   :194149  
##   UCR_GROUP         COMMUNITY_COUNCIL_NEIGHBORHOOD      ZIP   
##  Length:355379      Length:355379                  Min.   :0  
##  Class :character   Class :character               1st Qu.:0  
##  Mode  :character   Mode  :character               Median :0  
##                                                    Mean   :0  
##                                                    3rd Qu.:0  
##                                                    Max.   :0  
## 

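The summary shows NA values in UCR (71), HOUR_FROM (7), HOUR_TO (1,316), LONGITUDE_X and LATITUDE_X (45,864 each), TOTALNUMBERVICTIMS (146) and TOTALSUSPECTS (194,149). Character columns such as BEAT and RPT_AREA report no NA count in summary(), although the glimpse() output shows that some character columns contain blank strings or “UNKNOWN”. The ZIP column summarises to all zeros, which suggests it was not parsed as intended.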
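A per-column breakdown makes it easier to see where missing values sit. The sketch below runs on a small invented data frame (`toy`, a stand-in for crime_data) and counts both true NA values and the blank or “UNKNOWN” strings that is.na() does not flag. Treating “UNKNOWN” as missing is an assumption made for illustration.

```r
# Hypothetical toy data standing in for crime_data; values are invented.
toy <- data.frame(
  UCR           = c(802L, NA, 810L, 401L),
  TOTALSUSPECTS = c(1L, NA, NA, 2L),
  VICTIM_AGE    = c("26-30", "UNKNOWN", "", "41-50"),
  VICTIM_GENDER = c("FEMALE", "", "MALE", "MALE"),
  stringsAsFactors = FALSE
)

# NA count per column: is.na() only flags true NA values.
na_per_column <- colSums(is.na(toy))

# Count of blank or "UNKNOWN" entries per character column,
# which is.na() does not report.
is_chr <- vapply(toy, is.character, logical(1))
blank_per_column <- vapply(
  toy[is_chr],
  function(x) sum(x %in% c("", "UNKNOWN")),
  integer(1)
)

print(na_per_column)
print(blank_per_column)
```

The same two calls could be pointed at crime_data to list which columns drive the overall NA total.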

sum(is.na(crime_data))
## [1] 287417
crime_data_cleaned <- na.omit(crime_data)
sum(is.na(crime_data_cleaned))
## [1] 0
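na.omit() drops every row that has an NA in any column, including columns that are not used for modelling. One alternative, shown here only as a sketch on invented data and not as the approach taken in this report, is to keep the modelling columns first and then drop incomplete rows, which can retain more observations.

```r
# Hypothetical toy data; values are invented for illustration only.
toy <- data.frame(
  OFFENSE       = c("THEFT", "ASSAULT", "ROBBERY", "THEFT"),
  VICTIM_GENDER = c("FEMALE", "MALE", NA, "MALE"),
  TOTALSUSPECTS = c(1L, NA, 2L, NA),
  stringsAsFactors = FALSE
)

# Approach used in this report: drop every row with an NA in any column.
all_columns <- na.omit(toy)

# Alternative: keep only the modelling columns, then drop incomplete rows.
model_columns <- na.omit(toy[, c("OFFENSE", "VICTIM_GENDER")])

cat("Rows kept, all columns:      ", nrow(all_columns), "\n")
cat("Rows kept, modelling columns:", nrow(model_columns), "\n")
```

In the toy data, two rows are lost only because TOTALSUSPECTS is missing, a column the model would not use.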
summary(crime_data_cleaned)
##   INSTANCEID        INCIDENT_NO        DATE_REPORTED     
##  Length:137065      Length:137065      Length:137065     
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##   DATE_FROM           DATE_TO              CLSD                UCR        
##  Length:137065      Length:137065      Length:137065      Min.   :   0.0  
##  Class :character   Class :character   Class :character   1st Qu.: 551.0  
##  Mode  :character   Mode  :character   Mode  :character   Median : 701.0  
##                                                           Mean   : 844.2  
##                                                           3rd Qu.: 862.0  
##                                                           Max.   :2761.0  
##      DST                BEAT             OFFENSE         
##  Length:137065      Length:137065      Length:137065     
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##    LOCATION          THEFT_CODE           FLOOR          
##  Length:137065      Length:137065      Length:137065     
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##      SIDE             OPENING           HATE_BIAS        
##  Length:137065      Length:137065      Length:137065     
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##   DAYOFWEEK           RPT_AREA         CPD_NEIGHBORHOOD  
##  Length:137065      Length:137065      Length:137065     
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##  SNA_NEIGHBORHOOD     WEAPONS          DATE_OF_CLEARANCE    HOUR_FROM   
##  Length:137065      Length:137065      Length:137065      Min.   :   0  
##  Class :character   Class :character   Class :character   1st Qu.: 140  
##  Mode  :character   Mode  :character   Mode  :character   Median : 645  
##                                                           Mean   : 913  
##                                                           3rd Qu.:1730  
##                                                           Max.   :2359  
##     HOUR_TO      ADDRESS_X          LONGITUDE_X       LATITUDE_X   
##  Min.   :   0   Length:137065      Min.   :-84.82   Min.   :39.05  
##  1st Qu.: 170   Class :character   1st Qu.:-84.57   1st Qu.:39.12  
##  Median :1055   Mode  :character   Median :-84.52   Median :39.14  
##  Mean   :1027                      Mean   :-84.52   Mean   :39.14  
##  3rd Qu.:1820                      3rd Qu.:-84.49   3rd Qu.:39.16  
##  Max.   :2359                      Max.   :-84.26   Max.   :39.36  
##   VICTIM_AGE        VICTIM_RACE        VICTIM_ETHNICITY  
##  Length:137065      Length:137065      Length:137065     
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##  VICTIM_GENDER      SUSPECT_AGE        SUSPECT_RACE      
##  Length:137065      Length:137065      Length:137065     
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##  SUSPECT_ETHNICITY  SUSPECT_GENDER     TOTALNUMBERVICTIMS TOTALSUSPECTS   
##  Length:137065      Length:137065      Min.   : 1.000     Min.   : 1.000  
##  Class :character   Class :character   1st Qu.: 1.000     1st Qu.: 1.000  
##  Mode  :character   Mode  :character   Median : 1.000     Median : 1.000  
##                                        Mean   : 1.375     Mean   : 1.611  
##                                        3rd Qu.: 1.000     3rd Qu.: 2.000  
##                                        Max.   :15.000     Max.   :16.000  
##   UCR_GROUP         COMMUNITY_COUNCIL_NEIGHBORHOOD      ZIP   
##  Length:137065      Length:137065                  Min.   :0  
##  Class :character   Class :character               1st Qu.:0  
##  Mode  :character   Mode  :character               Median :0  
##                                                    Mean   :0  
##                                                    3rd Qu.:0  
##                                                    Max.   :0
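# Keep the response and predictor columns (positions 10, 28, 29 and 31), selected by name
crime_data_final <- select(crime_data_cleaned, OFFENSE, VICTIM_AGE, VICTIM_RACE, VICTIM_GENDER)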
summary(crime_data_final)
##    OFFENSE           VICTIM_AGE        VICTIM_RACE       
##  Length:137065      Length:137065      Length:137065     
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##  VICTIM_GENDER     
##  Length:137065     
##  Class :character  
##  Mode  :character
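# Note: attach() followed by these assignments creates standalone factor copies
# in the global environment; the columns of crime_data_final remain character,
# as the summaries below show.
attach(crime_data_final)
VICTIM_AGE <- as.factor(VICTIM_AGE)
VICTIM_RACE <- as.factor(VICTIM_RACE)
VICTIM_GENDER <- as.factor(VICTIM_GENDER)
OFFENSE <- as.factor(OFFENSE)
# 70/30 train/test split, with a fixed seed so the split is reproducible
smp_size <- floor(0.7 * nrow(crime_data_final))
set.seed(25000)
train <- sample(seq_len(nrow(crime_data_final)), size = smp_size)
crime_train_data <- crime_data_final[train,]
crime_test_data <- crime_data_final[-train,]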
summary(crime_train_data)
##    OFFENSE           VICTIM_AGE        VICTIM_RACE       
##  Length:95945       Length:95945       Length:95945      
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##  VICTIM_GENDER     
##  Length:95945      
##  Class :character  
##  Mode  :character
summary(crime_test_data)
##    OFFENSE           VICTIM_AGE        VICTIM_RACE       
##  Length:41120       Length:41120       Length:41120      
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##  VICTIM_GENDER     
##  Length:41120      
##  Class :character  
##  Mode  :character
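The summaries above still report the four columns as character, because as.factor() was applied to standalone variables and not to the data frame. A minimal sketch of converting the columns inside the data frame before splitting is given below. It uses an invented `toy_final` data frame and dplyr's across(), and is one possible approach, not the code that produced the output above.

```r
library(dplyr)

# Hypothetical toy data; values are invented for illustration only.
toy_final <- data.frame(
  OFFENSE       = rep(c("THEFT", "ASSAULT"), times = 5),
  VICTIM_AGE    = rep(c("18-25", "26-30", "31-40", "41-50", "51-60"), times = 2),
  VICTIM_RACE   = rep(c("BLACK", "WHITE"), each = 5),
  VICTIM_GENDER = rep(c("FEMALE", "MALE"), times = 5),
  stringsAsFactors = FALSE
)

# Convert every column to a factor inside the data frame itself,
# so the change carries over to the train and test subsets.
toy_final <- toy_final %>%
  mutate(across(everything(), as.factor))

# Same 70/30 split pattern as in the report.
set.seed(25000)
smp_size  <- floor(0.7 * nrow(toy_final))
train_idx <- sample(seq_len(nrow(toy_final)), size = smp_size)
toy_train <- toy_final[train_idx, ]
toy_test  <- toy_final[-train_idx, ]

str(toy_train)
```

Converting before the split also means the training and test sets share the same factor levels.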
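At this point the dataset contains no NA values and is split 70/30 into training (95,945 rows) and test (41,120 rows) sets. Blank and “UNKNOWN” entries, such as those visible in the glimpse() output, are not NA and may still be present in the character columns.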

Proposed Exploratory Data Analysis:

From the dataset above, my target is to analyse crime by victim gender, age, race and ethnicity. Because the response variable is categorical, I expect decision tree and random forest analyses to be appropriate. The goal is to identify the offense, age group, race and ethnicity that are most prevalent among male and female victims in Cincinnati. For example, I expect outcomes like the following:

Male - aged 30-40, race - Black, ethnicity - Black American, most prevalent offense - robbery; or

Female - aged 18-30, race - American, ethnicity - American, most prevalent offense - sexual harassment.
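The proposed analysis could be sketched roughly as follows. This is only an illustration under stated assumptions: the data frame `toy` is simulated with an invented rule linking offense to victim gender, the rpart package is assumed as the decision tree implementation (the report does not name one), and the random forest step is not shown.

```r
library(dplyr)
library(rpart)  # assumed decision tree package; not named in the report

# Simulated toy data; the relationship between OFFENSE and VICTIM_GENDER
# is invented so that the tree has a signal to find.
set.seed(1)
n <- 200
toy <- data.frame(
  OFFENSE     = factor(sample(c("THEFT", "ASSAULT", "ROBBERY"), n, replace = TRUE)),
  VICTIM_AGE  = factor(sample(c("18-25", "26-30", "31-40"), n, replace = TRUE)),
  VICTIM_RACE = factor(sample(c("BLACK", "WHITE"), n, replace = TRUE))
)
p_female <- ifelse(toy$OFFENSE == "THEFT", 0.8, 0.3)
toy$VICTIM_GENDER <- factor(ifelse(runif(n) < p_female, "FEMALE", "MALE"))

# Most prevalent offense for each victim gender.
top_offense <- toy %>%
  count(VICTIM_GENDER, OFFENSE, name = "incidents") %>%
  group_by(VICTIM_GENDER) %>%
  slice_max(incidents, n = 1, with_ties = FALSE) %>%
  ungroup()
print(top_offense)

# Classification tree with VICTIM_GENDER as the response.
fit <- rpart(
  VICTIM_GENDER ~ OFFENSE + VICTIM_AGE + VICTIM_RACE,
  data = toy,
  method = "class"
)

# Predicted class for each row, compared with the actual class.
pred <- predict(fit, newdata = toy, type = "class")
print(table(predicted = pred, actual = toy$VICTIM_GENDER))
```

On the real data, the model would be fitted on crime_train_data and evaluated on crime_test_data, with the predictors converted to factors first.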