Disclaimer: This is a relatively rough analysis and should not be taken for any political or legislative agenda. To fully and completely understand the underlying effects of crime and gain a 100% accurate assessment of the dataset it will require further work.
import libraries
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.4.2
Load CSV
nypd_complaints_data = read.csv("NYPD_Complaint_Data_Historic.csv")
Remove unneccessary columns for analysis
nypd_complaints_df = nypd_complaints_data[,c(-1,-3,-4,-5,-7,-9,-16,-18,-19,-20)]
nypd_complaints_df = nypd_complaints_df[,-4]
Sort by Burrough
nypd_complaints_df = nypd_complaints_df[order(nypd_complaints_df$BORO_NM),]
Duration of days between incident and report for all available dates
duration = as.Date(nypd_complaints_df$RPT_DT,"%m/%d/%Y")-as.Date(nypd_complaints_df$CMPLNT_FR_DT,"%m/%d/%Y")
averagereportingduration = mean(as.numeric(as.character(duration[!is.na(duration)])))
print(paste("The average duration, for all available dates, between the date of a crime's occurence and it's reporting is",round(averagereportingduration,0),"days.",sep=" "))
## [1] "The average duration, for all available dates, between the date of a crime's occurence and it's reporting is 15 days."
Delay in reporting of each crime. This analysis extrapolates the missing duration values, and it does not create a fully unique list of crimes. This is a relatively rough high level overview of the data.
nypd_complaints_df$Duration = as.numeric(as.character(duration))
nypd_complaints_df$KYCD = nypd_complaints_data$KY_CD
nypd_complaints_df$PoliceDesc = nypd_complaints_data$PD_DESC
crimedurationdf = nypd_complaints_df[,c("OFNS_DESC","KYCD","PoliceDesc","Duration")]
crimedurationdf = crimedurationdf[order(crimedurationdf$OFNS_DESC),]
crimedurationdf = crimedurationdf %>% fill(Duration)
missingvals = which(crimedurationdf$OFNS_DESC=="")
crimedurationdf$OFNS_DESC = as.character(crimedurationdf$OFNS_DESC)
crimedurationdf[missingvals,"OFNS_DESC"] = list(as.character(crimedurationdf[missingvals,"PoliceDesc"]))
crimedurationdf[which(crimedurationdf$OFNS_DESC==""),"OFNS_DESC"] = "Other"
options(warn = -1)
averagedurationbycrime = aggregate(crimedurationdf,by = list(crimedurationdf$OFNS_DESC),FUN=mean)
averagedurationbycrime = averagedurationbycrime[,c(-2,-3,-4)]
Delays greater than 100
print(subset(averagedurationbycrime,Duration>100))
## Group.1 Duration
## 94 FRAUDS 104.8310
## 120 KIDNAPPING AND RELATED OFFENSES 442.0000
## 124 LARCENY,GRAND BY ACQUIRING LOST CREDIT CARD 170.0769
## 204 MURDER & NON-NEGL. MANSLAUGHTER 203.3359
## 241 RAPE 251.3853
## 281 SEX CRIMES 147.5966
## 297 THEFT-FRAUD 174.8864
print(paste("The crime that has the longest delay between reporting and occurrence is",subset(averagedurationbycrime,Duration == max(Duration))[1],"with an average delay of",subset(averagedurationbycrime,Duration == max(Duration))[,2],"days"))
## [1] "The crime that has the longest delay between reporting and occurrence is KIDNAPPING AND RELATED OFFENSES with an average delay of 442 days"
Crime by Burrough: Broad overview. Using the tail function, I saw that most entries with their Jurisdiction containing “N.Y. Police Dept” belonged to Staten Island; thus I filled all NA Burrough entries with “Staten Island” as over 90% of NA fields had a coinciding jurisidiction of “N.Y. Police Dept”.
burroughdf = nypd_complaints_df[,c("BORO_NM","LAW_CAT_CD")]
missingburroughval = which(burroughdf$BORO_NM=="")
burroughdf[missingburroughval,"BORO_NM"] = "STATEN ISLAND"
frequencytablebroad = table(burroughdf$LAW_CAT_CD)
print(frequencytablebroad)
##
## FELONY MISDEMEANOR VIOLATION
## 1717265 3180644 682126
par("mar")
## [1] 5.1 4.1 4.1 2.1
par(mar=c(11,11,11,11))
getOption("scipen")
## [1] 0
opt <- options("scipen" = 20)
getOption("scipen")
## [1] 20

Crime by Burrough: Analysis of frequency of occurrence for Felonies, Misdemeanors, and Violations
freqtableburr = table(burroughdf)
freqtableburr = as.data.frame(freqtableburr)
freqtableburr = subset(freqtableburr,BORO_NM != "")
print(freqtableburr)
## BORO_NM LAW_CAT_CD Freq
## 2 BRONX FELONY 325145
## 3 BROOKLYN FELONY 541841
## 4 MANHATTAN FELONY 422189
## 5 QUEENS FELONY 370347
## 6 STATEN ISLAND FELONY 57743
## 8 BRONX MISDEMEANOR 737972
## 9 BROOKLYN MISDEMEANOR 919218
## 10 MANHATTAN MISDEMEANOR 771325
## 11 QUEENS MISDEMEANOR 593282
## 12 STATEN ISLAND MISDEMEANOR 158847
## 14 BRONX VIOLATION 146530
## 15 BROOKLYN VIOLATION 205844
## 16 MANHATTAN VIOLATION 138246
## 17 QUEENS VIOLATION 141992
## 18 STATEN ISLAND VIOLATION 49514
par("mar")
## [1] 5.1 4.1 4.1 2.1
par(mar=c(7,7,7,7))

Percentage of Crimes Completed
completiondf = subset(nypd_complaints_data,CRM_ATPT_CPTD_CD =="COMPLETED")
completionrate = (length(completiondf$CRM_ATPT_CPTD_CD)/length(nypd_complaints_data$CRM_ATPT_CPTD_CD))*100
print(paste("Of the total number of attempted crimes,the completion rate is",round(completionrate,2),"%"))
## [1] "Of the total number of attempted crimes,the completion rate is 98.28 %"
Granular View: Crimes with the greatest delay between reporting and occurrence by Burrough
granburroughdf = burroughdf
granburroughdf$Offense = nypd_complaints_df$OFNS_DESC
granburroughdf$LAW_CAT_CD = nypd_complaints_df$LAW_CAT_CD
granmissingvals = which(granburroughdf$OFNS_DESC=="")
granburroughdf$Offense = as.character(granburroughdf$Offense)
granburroughdf[granmissingvals,"Offense"] = list(as.character(granburroughdf[granmissingvals,"LAW_CAT_CD"]))
granburroughdf = granburroughdf[,-2]
topitems = unlist(subset(averagedurationbycrime,Duration>100)[1],use.names = FALSE)
granburroughdftopitems = granburroughdf[which(granburroughdf$Offense %in% topitems),]
freqgranburrough = table(granburroughdftopitems)
freqgranburrough = as.data.frame(freqgranburrough)
freqgranburrough = subset(freqgranburrough, BORO_NM != "")
freqgranburrough = subset(freqgranburrough, Offense != "")
freqgranburrough = subset(freqgranburrough, Offense != "KIDNAPPING AND RELATED OFFENSES")

Conclusion: Of the 5 Burroughs, Brooklyn seems to experience the greatest frequency of crime. In addition, general sex crimes, theft, and fraud have occurred at an exceptionally high frequency for the period of 2006-2016. Moreover, of the 5 burroughs, Staten Island seems to experience the lowest frequency of crime, especially when considering the big movers.