For this homework assignment, I’ve gone ahead and used the NYC Arrest Data from NYC Open Data to observe which racial group has the least probability rate of committing a crime within the NYC area. The variables used for the analysis are:
library(readr)
NYPD_Arrest_Data_Year_to_Date_1_ <- read_csv("C:/Users/RArev/Desktop/NYPD_Arrest_Data__Year_to_Date_ (1).csv")
## Parsed with column specification:
## cols(
## ARREST_KEY = col_double(),
## ARREST_DATE = col_character(),
## PD_CD = col_double(),
## PD_DESC = col_character(),
## KY_CD = col_double(),
## OFNS_DESC = col_character(),
## LAW_CODE = col_character(),
## LAW_CAT_CD = col_character(),
## ARREST_BORO = col_character(),
## ARREST_PRECINCT = col_double(),
## JURISDICTION_CODE = col_double(),
## AGE_GROUP = col_character(),
## PERP_SEX = col_character(),
## PERP_RACE = col_character(),
## X_COORD_CD = col_double(),
## Y_COORD_CD = col_double(),
## Latitude = col_double(),
## Longitude = col_double()
## )
crime1 <- NYPD_Arrest_Data_Year_to_Date_1_
crime1 <- crime1 %>%
mutate(LAW_CAT_CD = revalue(LAW_CAT_CD, c("M" = "0", "F" = "1", "V" = "0"))) %>%
mutate(AGE_GROUP = revalue(AGE_GROUP, c("<18" = "Adolescent", "18-24" = "Young Adult", "25-44" = "Adult", "45-64" = "Older Adult", "65+" = "Senior"))) %>%
mutate(PERP_RACE = revalue(PERP_RACE, c("WHITE HISPANIC" = "LATINO", "BLACK HISPANIC" = "LATINO", "UNKNOWN" = "OTHER", "ASIAN / PACIFIC ISLANDER" = "OTHER", "AMERICAN INDIAN/ALASKAN NATIVE" = "OTHER" )))
crime1 <- rename(crime1, c(LAW_CAT_CD = "WAS_CRIME_COMMITTED", PERP_RACE = "RACE", PERP_SEX = "SEX"))
crime1 <- crime1 %>%
mutate(SEX = as.factor(SEX)) %>%
mutate(WAS_CRIME_COMMITTED = as.factor(WAS_CRIME_COMMITTED)) %>%
mutate(RACE = as.factor(RACE)) %>%
mutate(AGE_GROUP = as.factor(AGE_GROUP)) %>%
na.omit()%>%
select(WAS_CRIME_COMMITTED, RACE, SEX, AGE_GROUP)
crime1$CRIME <- crime1$WAS_CRIME_COMMITTED
The dependent variable (Was Crime Committed) was recoded into a binary variable. The original variable contained three values, each representing M for misdemeanor, V for violation, F for Felony. For the purpose of this analysis, F will be the only value regarded as Crime Committed. The Character variables being used (see above), were all character variable that were converted to factor variables and coded using the mutate and revalue functions.
crime2 <- crime1
crime2 <- crime2 %>%
select(WAS_CRIME_COMMITTED, RACE, AGE_GROUP, SEX, everything()) %>%
select(-CRIME)
head(crime2)
## # A tibble: 6 x 4
## WAS_CRIME_COMMITTED RACE AGE_GROUP SEX
## <fct> <fct> <fct> <fct>
## 1 0 LATINO Young Adult M
## 2 1 BLACK Adult M
## 3 0 LATINO Adult M
## 4 1 BLACK Older Adult M
## 5 0 LATINO Young Adult M
## 6 0 LATINO Adult M
The first model determines the log odds of which of the three (Latino, White, Other) has the least probability of committing a crime
m0 <- glm(WAS_CRIME_COMMITTED ~ RACE, family = binomial, data = crime2)
summary(m0)
##
## Call:
## glm(formula = WAS_CRIME_COMMITTED ~ RACE, family = binomial,
## data = crime2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.9601 -0.9601 -0.8750 1.4115 1.5594
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.535171 0.008322 -64.307 < 2e-16 ***
## RACELATINO -0.227462 0.013134 -17.318 < 2e-16 ***
## RACEOTHER -0.197354 0.024898 -7.927 2.25e-15 ***
## RACEWHITE -0.328965 0.019373 -16.980 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 167465 on 130590 degrees of freedom
## Residual deviance: 166990 on 130587 degrees of freedom
## AIC: 166998
##
## Number of Fisher Scoring iterations: 4
The second model includes the variable of sex into the log odds which determines which sex is most likely to commit a crime.
m1 <- glm(WAS_CRIME_COMMITTED ~ RACE + SEX, family = binomial, data = crime2)
summary(m1)
##
## Call:
## glm(formula = WAS_CRIME_COMMITTED ~ RACE + SEX, family = binomial,
## data = crime2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.9748 -0.8989 -0.8880 1.3945 1.6367
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.71285 0.01540 -46.284 < 2e-16 ***
## RACELATINO -0.22997 0.01315 -17.495 < 2e-16 ***
## RACEOTHER -0.20026 0.02492 -8.037 9.19e-16 ***
## RACEWHITE -0.32285 0.01939 -16.648 < 2e-16 ***
## SEXM 0.21563 0.01563 13.793 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 167465 on 130590 degrees of freedom
## Residual deviance: 166797 on 130586 degrees of freedom
## AIC: 166807
##
## Number of Fisher Scoring iterations: 4
The third model introduces the variable of Age group into the log odds which along with Race, not only determines which age group is least likely to commit a crime, but also assists in determining which racial group is least likely to commit a crime
m2 <- glm(WAS_CRIME_COMMITTED ~ RACE + SEX + AGE_GROUP, family = binomial, data = crime2)
summary(m2)
##
## Call:
## glm(formula = WAS_CRIME_COMMITTED ~ RACE + SEX + AGE_GROUP, family = binomial,
## data = crime2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.1641 -0.9368 -0.8577 1.4029 1.7431
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.26043 0.02756 -9.451 < 2e-16 ***
## RACELATINO -0.23072 0.01318 -17.499 < 2e-16 ***
## RACEOTHER -0.19230 0.02496 -7.703 1.33e-14 ***
## RACEWHITE -0.29447 0.01949 -15.106 < 2e-16 ***
## SEXM 0.22901 0.01568 14.604 < 2e-16 ***
## AGE_GROUPAdult -0.48462 0.02558 -18.948 < 2e-16 ***
## AGE_GROUPOlder Adult -0.56476 0.02773 -20.367 < 2e-16 ***
## AGE_GROUPSenior -0.71723 0.06291 -11.401 < 2e-16 ***
## AGE_GROUPYoung Adult -0.44758 0.02718 -16.468 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 167465 on 130590 degrees of freedom
## Residual deviance: 166355 on 130582 degrees of freedom
## AIC: 166373
##
## Number of Fisher Scoring iterations: 4
Both a chi-square anova test and lmtest were conducted to determine which model would be best used for further analysis. Thus far both test indicate that model 2 is the best to be used for continual analysis.
anova(m0, m1, m2, test = "Chisq")
## Analysis of Deviance Table
##
## Model 1: WAS_CRIME_COMMITTED ~ RACE
## Model 2: WAS_CRIME_COMMITTED ~ RACE + SEX
## Model 3: WAS_CRIME_COMMITTED ~ RACE + SEX + AGE_GROUP
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 130587 166990
## 2 130586 166797 1 193.69 < 2.2e-16 ***
## 3 130582 166355 4 441.69 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
lmtest::lrtest(m0, m1, m2)
## Likelihood ratio test
##
## Model 1: WAS_CRIME_COMMITTED ~ RACE
## Model 2: WAS_CRIME_COMMITTED ~ RACE + SEX
## Model 3: WAS_CRIME_COMMITTED ~ RACE + SEX + AGE_GROUP
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 4 -83495
## 2 5 -83398 1 193.69 < 2.2e-16 ***
## 3 9 -83178 4 441.69 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Model 2 indicates that between the sexes, males have higher odds of committing a crime by .22 units. Between racial groups, both Latino and White racial groups are least likely to commit crimes as the log of odds demonstrates units falling between between -.23 and -.33. When comparing this with model 1, model 2 clearly suggests that males are much more likely in committing a crime as the log of odds units increased to .22. However, it also suggest a slight decreases in the likelihood that Whites and Latinos are to commit a crime.
| Model 1 | Model 2 | ||
|---|---|---|---|
| (Intercept) | -0.54*** | -0.71*** | |
| (0.01) | (0.02) | ||
| LATINO | -0.23*** | -0.23*** | |
| (0.01) | (0.01) | ||
| OTHER | -0.20*** | -0.20*** | |
| (0.02) | (0.02) | ||
| WHITE | -0.33*** | -0.32*** | |
| (0.02) | (0.02) | ||
| SEXM | 0.22*** | ||
| (0.02) | |||
| AIC | 166998.49 | 166806.80 | |
| BIC | 167037.61 | 166855.70 | |
| Log Likelihood | -83495.25 | -83398.40 | |
| Deviance | 166990.49 | 166796.80 | |
| Num. obs. | 130591 | 130591 | |
| p < 0.001, p < 0.01, p < 0.05 | |||
Graph 1 provides a visual reprenation on the likeliness of a crime being committed between sexes. Here, the graph clearly indicates that the probability of males committing a crime are .375 higher than a females odds at .323
visreg(m2, "SEX", scale = "response")
Graph 2 shows a visual representation of the likeliness of a crime being committed when sexes are grouped by age. This graph not only reinforces the notion that males have a higher probability of committing a crime, that they also more likely to commit a crime during adolescences.
visreg(m2, "SEX", by = "AGE_GROUP", scale = "response")
The final graph, graph 3 provides a visual representation of the likeliness of a crime being committed when sexes are group by racial groups. Unlike the regression table provided above, graph 3 indicates that black males have a higher probability (nearly three times higher) of committing a crime when compared to the other racial groups. Furthermore, this phenonemon is not only isloted to black males as the graph also indicates that black females are also shown as having a higher probability of committing a crime when compared to the other racial groups.
visreg(m2, "RACE", by = "SEX", scale = "response")