NYPD Complaints Data - Project 2-3

Disclaimer: This is a relatively rough analysis and should not be taken for any political or legislative agenda. To fully and completely understand the underlying effects of crime and gain a 100% accurate assessment of the dataset it will require further work.

import libraries

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(tidyr)
library(ggplot2)

## Warning: package 'ggplot2' was built under R version 3.4.2

Load CSV

nypd_complaints_data = read.csv("NYPD_Complaint_Data_Historic.csv")

Remove unneccessary columns for analysis

nypd_complaints_df = nypd_complaints_data[,c(-1,-3,-4,-5,-7,-9,-16,-18,-19,-20)]

nypd_complaints_df = nypd_complaints_df[,-4]

Sort by Burrough

nypd_complaints_df = nypd_complaints_df[order(nypd_complaints_df$BORO_NM),]

Duration of days between incident and report for all available dates

duration = as.Date(nypd_complaints_df$RPT_DT,"%m/%d/%Y")-as.Date(nypd_complaints_df$CMPLNT_FR_DT,"%m/%d/%Y")

averagereportingduration = mean(as.numeric(as.character(duration[!is.na(duration)])))

print(paste("The average duration, for all available dates, between the date of a crime's occurence and it's reporting is",round(averagereportingduration,0),"days.",sep=" "))

## [1] "The average duration, for all available dates, between the date of a crime's occurence and it's reporting is 15 days."

Delay in reporting of each crime. This analysis extrapolates the missing duration values, and it does not create a fully unique list of crimes. This is a relatively rough high level overview of the data.

nypd_complaints_df$Duration = as.numeric(as.character(duration))
nypd_complaints_df$KYCD = nypd_complaints_data$KY_CD
nypd_complaints_df$PoliceDesc = nypd_complaints_data$PD_DESC
crimedurationdf = nypd_complaints_df[,c("OFNS_DESC","KYCD","PoliceDesc","Duration")]
crimedurationdf = crimedurationdf[order(crimedurationdf$OFNS_DESC),]
crimedurationdf = crimedurationdf %>% fill(Duration)
missingvals = which(crimedurationdf$OFNS_DESC=="")
crimedurationdf$OFNS_DESC = as.character(crimedurationdf$OFNS_DESC)
crimedurationdf[missingvals,"OFNS_DESC"] = list(as.character(crimedurationdf[missingvals,"PoliceDesc"]))
crimedurationdf[which(crimedurationdf$OFNS_DESC==""),"OFNS_DESC"] = "Other"
options(warn = -1)
averagedurationbycrime = aggregate(crimedurationdf,by = list(crimedurationdf$OFNS_DESC),FUN=mean)
averagedurationbycrime = averagedurationbycrime[,c(-2,-3,-4)]

Delays greater than 100

print(subset(averagedurationbycrime,Duration>100))

##                                         Group.1 Duration
## 94                                       FRAUDS 104.8310
## 120             KIDNAPPING AND RELATED OFFENSES 442.0000
## 124 LARCENY,GRAND BY ACQUIRING LOST CREDIT CARD 170.0769
## 204             MURDER & NON-NEGL. MANSLAUGHTER 203.3359
## 241                                        RAPE 251.3853
## 281                                  SEX CRIMES 147.5966
## 297                                 THEFT-FRAUD 174.8864

print(paste("The crime that has the longest delay between reporting and occurrence is",subset(averagedurationbycrime,Duration == max(Duration))[1],"with an average delay of",subset(averagedurationbycrime,Duration == max(Duration))[,2],"days"))

## [1] "The crime that has the longest delay between reporting and occurrence is KIDNAPPING AND RELATED OFFENSES with an average delay of 442 days"

Crime by Burrough: Broad overview. Using the tail function, I saw that most entries with their Jurisdiction containing “N.Y. Police Dept” belonged to Staten Island; thus I filled all NA Burrough entries with “Staten Island” as over 90% of NA fields had a coinciding jurisidiction of “N.Y. Police Dept”.

burroughdf = nypd_complaints_df[,c("BORO_NM","LAW_CAT_CD")]
missingburroughval = which(burroughdf$BORO_NM=="")
burroughdf[missingburroughval,"BORO_NM"] = "STATEN ISLAND"
frequencytablebroad = table(burroughdf$LAW_CAT_CD)

print(frequencytablebroad)

## 
##      FELONY MISDEMEANOR   VIOLATION 
##     1717265     3180644      682126

par("mar")

## [1] 5.1 4.1 4.1 2.1

par(mar=c(11,11,11,11))
getOption("scipen")

## [1] 0

opt <- options("scipen" = 20)
getOption("scipen")

## [1] 20

Crime by Burrough: Analysis of frequency of occurrence for Felonies, Misdemeanors, and Violations

freqtableburr = table(burroughdf)
freqtableburr = as.data.frame(freqtableburr)
freqtableburr = subset(freqtableburr,BORO_NM != "")

print(freqtableburr)

##          BORO_NM  LAW_CAT_CD   Freq
## 2          BRONX      FELONY 325145
## 3       BROOKLYN      FELONY 541841
## 4      MANHATTAN      FELONY 422189
## 5         QUEENS      FELONY 370347
## 6  STATEN ISLAND      FELONY  57743
## 8          BRONX MISDEMEANOR 737972
## 9       BROOKLYN MISDEMEANOR 919218
## 10     MANHATTAN MISDEMEANOR 771325
## 11        QUEENS MISDEMEANOR 593282
## 12 STATEN ISLAND MISDEMEANOR 158847
## 14         BRONX   VIOLATION 146530
## 15      BROOKLYN   VIOLATION 205844
## 16     MANHATTAN   VIOLATION 138246
## 17        QUEENS   VIOLATION 141992
## 18 STATEN ISLAND   VIOLATION  49514

par("mar")

## [1] 5.1 4.1 4.1 2.1

par(mar=c(7,7,7,7))

Percentage of Crimes Completed

completiondf = subset(nypd_complaints_data,CRM_ATPT_CPTD_CD =="COMPLETED")
completionrate = (length(completiondf$CRM_ATPT_CPTD_CD)/length(nypd_complaints_data$CRM_ATPT_CPTD_CD))*100

print(paste("Of the total number of attempted crimes,the completion rate is",round(completionrate,2),"%"))

## [1] "Of the total number of attempted crimes,the completion rate is 98.28 %"

Granular View: Crimes with the greatest delay between reporting and occurrence by Burrough

granburroughdf = burroughdf
granburroughdf$Offense = nypd_complaints_df$OFNS_DESC
granburroughdf$LAW_CAT_CD = nypd_complaints_df$LAW_CAT_CD
granmissingvals = which(granburroughdf$OFNS_DESC=="")
granburroughdf$Offense = as.character(granburroughdf$Offense)
granburroughdf[granmissingvals,"Offense"] = list(as.character(granburroughdf[granmissingvals,"LAW_CAT_CD"]))
granburroughdf = granburroughdf[,-2]

topitems = unlist(subset(averagedurationbycrime,Duration>100)[1],use.names = FALSE)
granburroughdftopitems = granburroughdf[which(granburroughdf$Offense %in% topitems),]
freqgranburrough = table(granburroughdftopitems)
freqgranburrough = as.data.frame(freqgranburrough)
freqgranburrough = subset(freqgranburrough, BORO_NM != "")
freqgranburrough = subset(freqgranburrough, Offense != "")
freqgranburrough = subset(freqgranburrough, Offense != "KIDNAPPING AND RELATED OFFENSES")

Conclusion: Of the 5 Burroughs, Brooklyn seems to experience the greatest frequency of crime. In addition, general sex crimes, theft, and fraud have occurred at an exceptionally high frequency for the period of 2006-2016. Moreover, of the 5 burroughs, Staten Island seems to experience the lowest frequency of crime, especially when considering the big movers.

NYPD Complaints Data - Project 2-3

Hector Santana

October 8, 2017

Disclaimer: This is a relatively rough analysis and should not be taken for any political or legislative agenda. To fully and completely understand the underlying effects of crime and gain a 100% accurate assessment of the dataset it will require further work.

import libraries

Load CSV

Remove unneccessary columns for analysis

Sort by Burrough

Duration of days between incident and report for all available dates

Delay in reporting of each crime. This analysis extrapolates the missing duration values, and it does not create a fully unique list of crimes. This is a relatively rough high level overview of the data.

Delays greater than 100

Crime by Burrough: Analysis of frequency of occurrence for Felonies, Misdemeanors, and Violations

Percentage of Crimes Completed

Granular View: Crimes with the greatest delay between reporting and occurrence by Burrough