Advanced Analytics Homework #4

Determining Which Racial Group Has A Higher Probabiliy Rate of Arrest Using NYC Arrest Data from NYC Open Data

For this homework assignment, I’ve gone ahead and used the NYC Arrest Data from NYC Open Data to observe which racial group has the least probability rate of committing a crime within the NYC area. The variables used for the analysis are:

Variables
LAW CAT CD
PERP RACE
PERP SEX
AGE GROUP

Importing Data

library(readr)
NYPD_Arrest_Data_Year_to_Date_1_ <- read_csv("C:/Users/RArev/Desktop/NYPD_Arrest_Data__Year_to_Date_ (1).csv")

## Parsed with column specification:
## cols(
##   ARREST_KEY = col_double(),
##   ARREST_DATE = col_character(),
##   PD_CD = col_double(),
##   PD_DESC = col_character(),
##   KY_CD = col_double(),
##   OFNS_DESC = col_character(),
##   LAW_CODE = col_character(),
##   LAW_CAT_CD = col_character(),
##   ARREST_BORO = col_character(),
##   ARREST_PRECINCT = col_double(),
##   JURISDICTION_CODE = col_double(),
##   AGE_GROUP = col_character(),
##   PERP_SEX = col_character(),
##   PERP_RACE = col_character(),
##   X_COORD_CD = col_double(),
##   Y_COORD_CD = col_double(),
##   Latitude = col_double(),
##   Longitude = col_double()
## )

crime1 <- NYPD_Arrest_Data_Year_to_Date_1_

Wrangling the Data

crime1 <- crime1 %>%
  mutate(LAW_CAT_CD = revalue(LAW_CAT_CD, c("M" = "0", "F" = "1", "V" = "0"))) %>%
  mutate(AGE_GROUP = revalue(AGE_GROUP, c("<18" = "Adolescent", "18-24" = "Young Adult", "25-44" = "Adult", "45-64" = "Older Adult", "65+" = "Senior"))) %>%
  mutate(PERP_RACE = revalue(PERP_RACE, c("WHITE HISPANIC" = "LATINO", "BLACK HISPANIC" = "LATINO", "UNKNOWN" = "OTHER", "ASIAN / PACIFIC ISLANDER" = "OTHER", "AMERICAN INDIAN/ALASKAN NATIVE" = "OTHER" )))

crime1 <- rename(crime1, c(LAW_CAT_CD = "WAS_CRIME_COMMITTED", PERP_RACE = "RACE", PERP_SEX = "SEX"))

crime1 <- crime1 %>%
  mutate(SEX = as.factor(SEX)) %>%
  mutate(WAS_CRIME_COMMITTED = as.factor(WAS_CRIME_COMMITTED)) %>%
  mutate(RACE = as.factor(RACE)) %>%
  mutate(AGE_GROUP = as.factor(AGE_GROUP)) %>%
  na.omit()%>%
  select(WAS_CRIME_COMMITTED, RACE, SEX, AGE_GROUP)

crime1$CRIME <- crime1$WAS_CRIME_COMMITTED

Data Wrangling

The dependent variable (Was Crime Committed) was recoded into a binary variable. The original variable contained three values, each representing M for misdemeanor, V for violation, F for Felony. For the purpose of this analysis, F will be the only value regarded as Crime Committed. The Character variables being used (see above), were all character variable that were converted to factor variables and coded using the mutate and revalue functions.

Index

LAW CAT CD

renamed to Was Crime Committed
M (misdemeanor) = 0
V (violation) = 0
F (Felony) = 1
changed from character to factor variable

PERP RACE

renamed to Race
WHITE HISPANIC = LATINO
BLACK HISPANIC = LATINO
BLACK
WHITE
ASIAN / PACIFIC ISLANDER = OTHER
AMERICAN INDIAN/ALASKAN NATIVE = OTHER
UNKNOWN = OTHER
changed from character to factor variable

PERP SEX

renamed to Sex
M = Male
F = Female
changed from character to factor variable

AGE GROUP

<18 = adolescent
18 - 14 = young adult
25 - 44 = adult
44 - 64 = older adult
65 + = senior

Viewing Modified Data

crime2 <- crime1
crime2 <- crime2 %>%
  select(WAS_CRIME_COMMITTED, RACE, AGE_GROUP, SEX, everything()) %>%
  select(-CRIME)
head(crime2)

## # A tibble: 6 x 4
##   WAS_CRIME_COMMITTED RACE   AGE_GROUP   SEX  
##   <fct>               <fct>  <fct>       <fct>
## 1 0                   LATINO Young Adult M    
## 2 1                   BLACK  Adult       M    
## 3 0                   LATINO Adult       M    
## 4 1                   BLACK  Older Adult M    
## 5 0                   LATINO Young Adult M    
## 6 0                   LATINO Adult       M

Model 1 - Race

The first model determines the log odds of which of the three (Latino, White, Other) has the least probability of committing a crime

m0 <- glm(WAS_CRIME_COMMITTED ~ RACE, family = binomial, data = crime2)
summary(m0)

## 
## Call:
## glm(formula = WAS_CRIME_COMMITTED ~ RACE, family = binomial, 
##     data = crime2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.9601  -0.9601  -0.8750   1.4115   1.5594  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.535171   0.008322 -64.307  < 2e-16 ***
## RACELATINO  -0.227462   0.013134 -17.318  < 2e-16 ***
## RACEOTHER   -0.197354   0.024898  -7.927 2.25e-15 ***
## RACEWHITE   -0.328965   0.019373 -16.980  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 167465  on 130590  degrees of freedom
## Residual deviance: 166990  on 130587  degrees of freedom
## AIC: 166998
## 
## Number of Fisher Scoring iterations: 4

Model 2 - Race & Sex

The second model includes the variable of sex into the log odds which determines which sex is most likely to commit a crime.

m1 <- glm(WAS_CRIME_COMMITTED ~ RACE + SEX, family = binomial, data = crime2)
summary(m1)

## 
## Call:
## glm(formula = WAS_CRIME_COMMITTED ~ RACE + SEX, family = binomial, 
##     data = crime2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.9748  -0.8989  -0.8880   1.3945   1.6367  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.71285    0.01540 -46.284  < 2e-16 ***
## RACELATINO  -0.22997    0.01315 -17.495  < 2e-16 ***
## RACEOTHER   -0.20026    0.02492  -8.037 9.19e-16 ***
## RACEWHITE   -0.32285    0.01939 -16.648  < 2e-16 ***
## SEXM         0.21563    0.01563  13.793  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 167465  on 130590  degrees of freedom
## Residual deviance: 166797  on 130586  degrees of freedom
## AIC: 166807
## 
## Number of Fisher Scoring iterations: 4

Model 3 - Race, Sex & Age Group

The third model introduces the variable of Age group into the log odds which along with Race, not only determines which age group is least likely to commit a crime, but also assists in determining which racial group is least likely to commit a crime

m2 <- glm(WAS_CRIME_COMMITTED ~ RACE + SEX + AGE_GROUP, family = binomial, data = crime2)
summary(m2)

## 
## Call:
## glm(formula = WAS_CRIME_COMMITTED ~ RACE + SEX + AGE_GROUP, family = binomial, 
##     data = crime2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.1641  -0.9368  -0.8577   1.4029   1.7431  
## 
## Coefficients:
##                      Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          -0.26043    0.02756  -9.451  < 2e-16 ***
## RACELATINO           -0.23072    0.01318 -17.499  < 2e-16 ***
## RACEOTHER            -0.19230    0.02496  -7.703 1.33e-14 ***
## RACEWHITE            -0.29447    0.01949 -15.106  < 2e-16 ***
## SEXM                  0.22901    0.01568  14.604  < 2e-16 ***
## AGE_GROUPAdult       -0.48462    0.02558 -18.948  < 2e-16 ***
## AGE_GROUPOlder Adult -0.56476    0.02773 -20.367  < 2e-16 ***
## AGE_GROUPSenior      -0.71723    0.06291 -11.401  < 2e-16 ***
## AGE_GROUPYoung Adult -0.44758    0.02718 -16.468  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 167465  on 130590  degrees of freedom
## Residual deviance: 166355  on 130582  degrees of freedom
## AIC: 166373
## 
## Number of Fisher Scoring iterations: 4

Perfoming Likekihood ratio test using ANOVA Chisquare test and lmtest

Both a chi-square anova test and lmtest were conducted to determine which model would be best used for further analysis. Thus far both test indicate that model 2 is the best to be used for continual analysis.

anova(m0, m1, m2, test = "Chisq")

## Analysis of Deviance Table
## 
## Model 1: WAS_CRIME_COMMITTED ~ RACE
## Model 2: WAS_CRIME_COMMITTED ~ RACE + SEX
## Model 3: WAS_CRIME_COMMITTED ~ RACE + SEX + AGE_GROUP
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
## 1    130587     166990                          
## 2    130586     166797  1   193.69 < 2.2e-16 ***
## 3    130582     166355  4   441.69 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

lmtest::lrtest(m0, m1, m2)

## Likelihood ratio test
## 
## Model 1: WAS_CRIME_COMMITTED ~ RACE
## Model 2: WAS_CRIME_COMMITTED ~ RACE + SEX
## Model 3: WAS_CRIME_COMMITTED ~ RACE + SEX + AGE_GROUP
##   #Df LogLik Df  Chisq Pr(>Chisq)    
## 1   4 -83495                         
## 2   5 -83398  1 193.69  < 2.2e-16 ***
## 3   9 -83178  4 441.69  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Model 2 Regression Table - Race & Sex

Model 2 indicates that between the sexes, males have higher odds of committing a crime by .22 units. Between racial groups, both Latino and White racial groups are least likely to commit crimes as the log of odds demonstrates units falling between between -.23 and -.33. When comparing this with model 1, model 2 clearly suggests that males are much more likely in committing a crime as the log of odds units increased to .22. However, it also suggest a slight decreases in the likelihood that Whites and Latinos are to commit a crime.

Statistical models
	Model 1	Model 2
(Intercept)	-0.54^***	-0.71^***
	(0.01)	(0.02)
LATINO	-0.23^***	-0.23^***
	(0.01)	(0.01)
OTHER	-0.20^***	-0.20^***
	(0.02)	(0.02)
WHITE	-0.33^***	-0.32^***
	(0.02)	(0.02)
SEXM		0.22^***
		(0.02)
AIC	166998.49	166806.80
BIC	167037.61	166855.70
Log Likelihood	-83495.25	-83398.40
Deviance	166990.49	166796.80
Num. obs.	130591	130591
p < 0.001, p < 0.01, p < 0.05

Graph 1

Graph 1 provides a visual reprenation on the likeliness of a crime being committed between sexes. Here, the graph clearly indicates that the probability of males committing a crime are .375 higher than a females odds at .323

visreg(m2, "SEX", scale = "response")

Graph 2

Graph 2 shows a visual representation of the likeliness of a crime being committed when sexes are grouped by age. This graph not only reinforces the notion that males have a higher probability of committing a crime, that they also more likely to commit a crime during adolescences.

visreg(m2, "SEX", by = "AGE_GROUP", scale = "response")

Graph 3

The final graph, graph 3 provides a visual representation of the likeliness of a crime being committed when sexes are group by racial groups. Unlike the regression table provided above, graph 3 indicates that black males have a higher probability (nearly three times higher) of committing a crime when compared to the other racial groups. Furthermore, this phenonemon is not only isloted to black males as the graph also indicates that black females are also shown as having a higher probability of committing a crime when compared to the other racial groups.

visreg(m2, "RACE", by = "SEX", scale = "response")