Intro

Commissioner William Bratton and his executive staff could take satisfaction in the progress made by the New York Police Department (NYPD) toward the goals they had set at the outset of 1994 to reduce major crimes in the City. Their efforts had produced better results than even some of them had expected, better even than those portrayed in the popular television drama carrying the Department’s name. Bratton revolutionized the way policing was handled in America: the introduction of the Compstat system used data as never before and allowed the NYPD to fight crime in a new way. The department shifted its focus away from a reactive protocol and toward prevention. The purpose of this analysis is to compare the decrease in crime during Bratton’s first term as commissioner (1994-1996) with crime statistics from his second term (2014-2016).

Data

The data used comes from Kaggle and covers crimes reported in 2014 and 2015 in all five boroughs of New York City. Because data from Bratton’s first term is not widely available, I converted a PDF file showing overall crime totals for the years in question. With this in mind, the analysis focuses primarily on 2014 and 2015, for which incident-level data is available.

Package Load

library(tidyr)      
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
library(knitr)   
library(leaflet)    
library(ggplot2)    
library(gridExtra)
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
library(jsonlite)

Historic Analysis

Data load - Historic

JSON_URL <- "https://raw.githubusercontent.com/Fyoun123/Data607/master/Final%20Project/RAW/NYPD.json" # File was converted from PDF.
json_df <- as.data.frame(fromJSON(JSON_URL))
colnames(json_df) <- json_df[1, ]  # The first row holds the header
json_df <- json_df[-1, ]           # Remove the header row
json_df                            # Check the result
##                City    1993    1995  + - %
## 2          New York 600,346 444,758 -25.9%
## 3       Los Angeles 312,790 266,204 -14.9%
## 4           Chicago       *       *      *
## 5           Houston 141,179 131,602  -6.8%
## 6      Philadelphia  97,659 108,278 +10.9%
## 7         San Diego  85,227  64,235 -24.6%
## 8           Phoenix  96,476 118,126 +22.4%
## 9            Dallas 110,803  98,624 -11.0%
## 10          Detroit 122,329 119,065  -2.7%
## 11      San Antonio  97,671  79,931 -18.2%
## 12         San Jose  36,743  36,096  -1.8%
## 13     Indianapolis  33,530  30,775  -8.2%
## 14        Las Vegas  48,367  60,178 +24.4%
## 15    San Francisco  67,345  60,474 -10.2%
## 16        Baltimore  91,920  94,855  +3.2%
## 17     Jacksonville  67,494  61,129  -9.4%
## 18         Columbus  58,604  58,715  +1.9%
## 19        Milwaukee  50,435  52,679  +4.4%
## 20          Memphis  62,150  65,597  +5.5%
## 21 Washington, D.C.  66,758  67,402  +1.0%
## 22          El Paso  46,738  41,692 -10.8%
## 23           Boston  55,555  52,278  -6.0%
## 24          Seattle  62,679  55,507 -11.4%
## 25        Nashville  55,500  56,090  +1.0%
## 26           Austin  51,468  42,586 -17.6%
## 27           Denver  39,796  34,769 -12.6%
## 28        Cleveland  40,006  38,665  -3.4%
## 29      New Orleans  52,773  53,399  +1.2%
## 30       Fort Worth  49,801  39,667 -20.3%

Data Clean - Historic

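NY_Crime <- filter(json_df, City == "New York")       # Keep only the New York row
U_P <- gather(NY_Crime, year, value, `1993`:`1995`)   # Reshape the two year columns into long format
U_P$`+ - %` <- NULL                                   # Drop the percent-change and city columns
U_P$City <- NULL
U_P$value <- as.numeric(gsub(",", "", U_P$value))     # "600,346" -> 600346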
U_P
##   year  value
## 1 1993 600346
## 2 1995 444758

Visual representation of Crime decrease

# Basic barplot
ggplot(data = U_P, aes(x = year, y = value)) +
  geom_bar(stat = "identity") +
  scale_y_continuous(name = "Crime", labels = scales::comma)

According to the sources used, crime in New York decreased by 25.9% between 1993 and 1995, during William Bratton’s first term as commissioner. These numbers later became highly controversial, as critics argued that the decrease had been inflated to overstate Bratton’s impact.
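As a sanity check on the PDF-to-JSON conversion, the reported percentage changes can be recomputed from the raw totals. The sketch below is a minimal base-R example that assumes a few rows hand-copied from the table above; the data frame `hist_crime` and the helper `to_num()` are hypothetical names, not part of the original analysis.

```r
# A few rows hand-copied from the converted table above (hypothetical subset);
# totals are still comma-formatted strings, as they come out of the JSON file.
hist_crime <- data.frame(
  City     = c("New York", "Los Angeles", "Houston", "Philadelphia"),
  y1993    = c("600,346", "312,790", "141,179", "97,659"),
  y1995    = c("444,758", "266,204", "131,602", "108,278"),
  reported = c("-25.9%", "-14.9%", "-6.8%", "+10.9%"),
  stringsAsFactors = FALSE
)

# Strip thousands separators, percent signs and leading "+" before converting
to_num <- function(x) as.numeric(gsub("[,%+]", "", x))

# Recompute the 1993 -> 1995 percent change, rounded like the PDF column
hist_crime$computed <- round(
  100 * (to_num(hist_crime$y1995) - to_num(hist_crime$y1993)) / to_num(hist_crime$y1993),
  1
)

# Flag whether the recomputed value agrees with the reported one
hist_crime$matches <- abs(hist_crime$computed - to_num(hist_crime$reported)) < 1e-9
hist_crime[, c("City", "computed", "reported", "matches")]
```

If a converted row ever failed this check, it would point to a transcription error in the PDF conversion rather than to a real change in crime.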

Column Description

(read.csv("Crime_Column_Description.csv"))
##               Column
## 1         CMPLNT_NUM
## 2       CMPLNT_FR_DT
## 3       CMPLNT_FR_TM
## 4       CMPLNT_TO_DT
## 5       CMPLNT_TO_TM
## 6             RPT_DT
## 7              KY_CD
## 8          OFNS_DESC
## 9              PD_CD
## 10           PD_DESC
## 11  CRM_ATPT_CPTD_CD
## 12        LAW_CAT_CD
## 13        JURIS_DESC
## 14           BORO_NM
## 15       ADDR_PCT_CD
## 16 LOC_OF_OCCUR_DESC
## 17     PREM_TYP_DESC
## 18          PARKS_NM
## 19        HADEVELOPT
## 20        X_COORD_CD
## 21        Y_COORD_CD
## 22          Latitude
## 23         Longitude
##                                                                                                                                      Description
## 1                                                                                           Randomly generated persistent ID for each complaint 
## 2                                       Exact date of occurrence for the reported event (or starting date of occurrence, if CMPLNT_TO_DT exists)
## 3                                       Exact time of occurrence for the reported event (or starting time of occurrence, if CMPLNT_TO_TM exists)
## 4                                                       Ending date of occurrence for the reported event, if exact time of occurrence is unknown
## 5                                                       Ending time of occurrence for the reported event, if exact time of occurrence is unknown
## 6                                                                                                             Date event was reported to police 
## 7                                                                                                        Three digit offense classification code
## 8                                                                                             Description of offense corresponding with key code
## 9                                                                         Three digit internal classification code (more granular than Key Code)
## 10                                    Description of internal classification corresponding with PD code (more granular than Offense Description)
## 11                                 Indicator of whether crime was successfully completed or attempted, but failed or was interrupted prematurely
## 12                                                                                             Level of offense: felony, misdemeanor, violation 
## 13 Jurisdiction responsible for incident. Either internal, like Police, Transit, and Housing; or external, like Correction, Port Authority, etc.
## 14                                                                                        The name of the borough in which the incident occurred
## 15                                                                                                   The precinct in which the incident occurred
## 16                                             Specific location of occurrence in or around the premises; inside, opposite of, front of, rear of
## 17                                                                      Specific description of premises; grocery store, residence, street, etc.
## 18                                        Name of NYC park, playground or greenspace of occurrence, if applicable (state parks are not included)
## 19                                                                                Name of NYCHA housing development of occurrence, if applicable
## 20                                     X-coordinate for New York State Plane Coordinate System, Long Island Zone, NAD 83, units feet (FIPS 3104)
## 21                                     Y-coordinate for New York State Plane Coordinate System, Long Island Zone, NAD 83, units feet (FIPS 3104)
## 22                                                      Latitude coordinate for Global Coordinate System, WGS 1984, decimal degrees (EPSG 4326) 
## 23                                                      Longitude coordinate for Global Coordinate System, WGS 1984, decimal degrees (EPSG 4326)

Data load - 2014 & 2015

Import CSV and rename columns so they are more readily understandable.

NYPD <- read.csv("NYPD_Complaint_Data_Historic.csv")
colnames(NYPD) <- c("crime_id","occurance_date","occurance_time","ending_date","ending_time","reported_date",
                    "offense_classification_code","offense_classification_description","internal_classification_code","internal_classification_description",
                    "crime_status","level_of_offense","type_of_jurisdiction","borough","precienct","specific_location","type_of_location",
                    "park_name","housing_name","x_coordinate","y_coordinate","latitude","longitude","location")
# Drop the complaint ID, numeric codes, park/housing names, State Plane x/y and the combined location column
NYPD <- NYPD[, -c(1, 7, 9, 18, 19, 20, 21, 24)]
head(NYPD)
##   occurance_date occurance_time ending_date ending_time reported_date
## 1     12/31/2015       23:45:00                            12/31/2015
## 2     12/31/2015       23:36:00                            12/31/2015
## 3     12/31/2015       23:30:00                            12/31/2015
## 4     12/31/2015       23:30:00                            12/31/2015
## 5     12/31/2015       23:25:00  12/31/2015    23:30:00    12/31/2015
## 6     12/31/2015       23:18:00  12/31/2015    23:25:00    12/31/2015
##   offense_classification_description internal_classification_description
## 1                            FORGERY      FORGERY,ETC.,UNCLASSIFIED-FELO
## 2    MURDER & NON-NEGL. MANSLAUGHTER                                    
## 3                    DANGEROUS DRUGS      CONTROLLED SUBSTANCE,INTENT TO
## 4       ASSAULT 3 & RELATED OFFENSES                           ASSAULT 3
## 5       ASSAULT 3 & RELATED OFFENSES                           ASSAULT 3
## 6                     FELONY ASSAULT            ASSAULT 2,1,UNCLASSIFIED
##   crime_status level_of_offense type_of_jurisdiction   borough precienct
## 1    COMPLETED           FELONY     N.Y. POLICE DEPT     BRONX        44
## 2    COMPLETED           FELONY     N.Y. POLICE DEPT    QUEENS       103
## 3    COMPLETED           FELONY     N.Y. POLICE DEPT MANHATTAN        28
## 4    COMPLETED      MISDEMEANOR     N.Y. POLICE DEPT    QUEENS       105
## 5    COMPLETED      MISDEMEANOR     N.Y. POLICE DEPT MANHATTAN        13
## 6    ATTEMPTED           FELONY     N.Y. POLICE DEPT  BROOKLYN        71
##   specific_location type_of_location latitude longitude
## 1            INSIDE   BAR/NIGHT CLUB 40.82885 -73.91666
## 2           OUTSIDE                  40.69734 -73.78456
## 3                              OTHER 40.80261 -73.94505
## 4            INSIDE  RESIDENCE-HOUSE 40.65455 -73.72634
## 5          FRONT OF            OTHER 40.73800 -73.98789
## 6          FRONT OF       DRUG STORE 40.66502 -73.95711

Data cleaning

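We parse occurance_date into more granular time-related columns (year, month, day, weekday, etc.) and compute how long each crime took to be reported. We then keep only rows whose occurrence year is 2014 or 2015, dropping stray rows whose occurrence dates fall in earlier years.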

# Derive time-based features; note that weekdays() returns day names in the session's locale
NYPD <- NYPD %>%
  mutate(occurance_year = year(mdy(occurance_date)),
         occurance_month = month(mdy(occurance_date)),
         occurance_day = day(mdy(occurance_date)),
         occurance_weekdays = weekdays(mdy(occurance_date)),
         diff_reported.occurance = difftime(mdy(reported_date), mdy(occurance_date), units = "days"),
         occurance_date_time = as.POSIXct(paste(occurance_date, occurance_time), format = "%m/%d/%Y %H:%M:%S"),
         ending_date_time = as.POSIXct(paste(ending_date, ending_time), format = "%m/%d/%Y %H:%M:%S"),
         diff_ending.occurance = round(difftime(ending_date_time, occurance_date_time, units = "hours"), digits = 2),
         weekends = ifelse(occurance_weekdays %in% c("Sunday", "Saturday"), "Yes", "No")
         ) %>%
  filter(occurance_year %in% c(2014, 2015))  # Keep only crimes that occurred in 2014 or 2015
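To see what each derived column holds, the same feature engineering can be reproduced in base R on a couple of rows. This is a sketch under assumptions: the two complaint rows are hypothetical, and the weekend flag uses the ISO weekday number from `format(x, "%u")` instead of `weekdays()`, because `weekdays()` returns day names in the session's locale and would not match `"Saturday"`/`"Sunday"` under, say, a French locale.

```r
# Two hypothetical complaint rows in the dataset's m/d/Y and H:M:S formats
rows <- data.frame(
  occurance_date = c("12/25/2015", "12/26/2015"),
  occurance_time = c("23:30:00", "10:15:00"),
  reported_date  = c("12/31/2015", "12/26/2015"),
  stringsAsFactors = FALSE
)

occ    <- as.Date(rows$occurance_date, format = "%m/%d/%Y")
rep_dt <- as.Date(rows$reported_date,  format = "%m/%d/%Y")

rows$occurance_year  <- as.integer(format(occ, "%Y"))
rows$occurance_month <- as.integer(format(occ, "%m"))
rows$occurance_day   <- as.integer(format(occ, "%d"))

# Delay between occurrence and report, in days
rows$diff_reported.occurance <- as.numeric(difftime(rep_dt, occ, units = "days"))

# ISO weekday: 1 = Monday ... 7 = Sunday, independent of locale
iso_wday <- as.integer(format(occ, "%u"))
rows$weekends <- ifelse(iso_wday >= 6, "Yes", "No")

# Combined date-time stamp, as in occurance_date_time
rows$occurance_date_time <- as.POSIXct(paste(rows$occurance_date, rows$occurance_time),
                                       format = "%m/%d/%Y %H:%M:%S")
rows
```

Here 12/25/2015 was a Friday (reported six days later) and 12/26/2015 a Saturday (reported the same day), so only the second row is flagged as a weekend.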

Total crime numbers for 2014 and 2015

NYPD1 <- NYPD %>%
  group_by(occurance_year) %>%
  summarise(total_crime = n())                               # One row per complaint
NYPD1$total_crime <- as.numeric(NYPD1$total_crime)
NYPD1$occurance_year <- as.character(NYPD1$occurance_year)   # Treat years as discrete bar labels
ggplot(data = NYPD1, aes(x = occurance_year, y = total_crime)) +
  geom_bar(stat = "identity") + scale_y_continuous(name = "Crime", labels = scales::comma)

# Side by side: 2014-2015 complaint totals (left) and 1993/1995 historic totals (right)
H1 <- ggplot(data = NYPD1, aes(x = occurance_year, y = total_crime)) +
  geom_bar(stat = "identity") + scale_y_continuous(name = "Crime", labels = scales::comma)
H2 <- ggplot(data = U_P, aes(x = year, y = value)) +
  geom_bar(stat = "identity") + scale_y_continuous(name = "Crime", labels = scales::comma)
grid.arrange(H1, H2, ncol = 2)

NYPD1
## # A tibble: 2 x 2
##   occurance_year total_crime
##   <chr>                <dbl>
## 1 2014                490363
## 2 2015                468576

Crime in NYC has declined overall over the past two decades, though not by an entirely staggering amount. Total reported crime fell from 490,363 in 2014 to 468,576 in 2015, a decrease of about 4.4%. However, the 2015 total is about 5.4% higher than the reported 1995 total. This might be due to less “creative” reporting of Compstat data over the past decade. The comparison should be read with some caution, since the 2014-2015 complaint data covers felonies, misdemeanors and violations and may not count offenses the same way as the historic PDF totals.
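The percentages quoted above follow directly from the four totals. A minimal sketch, with the totals copied by hand from the outputs above and a hypothetical helper `pct_change()`:

```r
# Totals from the historic table (1993, 1995) and from NYPD1 (2014, 2015)
totals <- c("1993" = 600346, "1995" = 444758, "2014" = 490363, "2015" = 468576)

# Hypothetical helper: percent change from one year's total to another's
pct_change <- function(from, to) {
  round(100 * (totals[[to]] - totals[[from]]) / totals[[from]], 1)
}

pct_change("2014", "2015")  # change within Bratton's second term
pct_change("1995", "2015")  # 2015 vs. the end of the first-term comparison
pct_change("1993", "2015")  # 2015 vs. the pre-Bratton baseline
```

This gives -4.4 for 2014 to 2015, 5.4 for 1995 to 2015 and -21.9 for 1993 to 2015, which is the sense in which crime has fallen "over the past two decades" even though it is above the 1995 figure.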

Current Crime analysis

NYPD2014 <- filter(NYPD, occurance_year == 2014)
NYPD2015 <- filter(NYPD, occurance_year == 2015)
T20141 <- NYPD2014 %>%
     ggplot(aes(x = as.factor(occurance_month))) +
     geom_bar() +
     ggtitle("Monthly New York Crime - 2014") + xlab("Month") + ylab("Number of crimes")
T20151 <- NYPD2015 %>%
     ggplot(aes(x = as.factor(occurance_month))) +
     geom_bar() +
     ggtitle("Monthly New York Crime - 2015") + xlab("Month") + ylab("Number of crimes")
grid.arrange(T20141, T20151, ncol = 2)

# Share of each month's crimes that occurred in 2014 vs. 2015
NYPD %>%
     ggplot(aes(x = as.factor(occurance_month), fill = as.factor(occurance_year))) +
     geom_bar(position = "fill") +
     xlab("Month") + ylab("Proportion") +
     scale_fill_discrete(guide = guide_legend(title = "Year"))

Here we see a clear seasonal pattern: crime rises as the weather becomes warmer and slows down as winter temperatures arrive, although no temperature data is included here to measure that relationship directly. There is a sharp decline in crime in January, likely due to ramped-up policing efforts around the New Year.
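Because months differ in length, raw monthly totals slightly understate short months such as February. A minimal base-R sketch, assuming the occurrence dates are available as `Date` objects from a single calendar year, converts monthly counts into average crimes per day; the function `per_day_by_month()` and the one-crime-per-day test data are hypothetical and not part of the original analysis.

```r
# Average incidents per calendar day in each month, so 28-day and 31-day
# months can be compared directly. Assumes all dates fall in one year.
per_day_by_month <- function(dates) {
  # Count incidents per month; months with no incidents get a 0
  counts <- table(factor(as.integer(format(dates, "%m")), levels = 1:12))
  yr <- as.integer(format(dates[1], "%Y"))
  # First day of each month plus 1 January of the following year
  month_starts <- seq(as.Date(sprintf("%d-01-01", yr)), by = "month", length.out = 13)
  days_in_month <- as.numeric(diff(month_starts))  # 31, 28 (or 29), 31, 30, ...
  setNames(as.numeric(counts) / days_in_month, month.abb)
}

# Hypothetical data: exactly one incident on every day of 2015
one_per_day <- seq(as.Date("2015-01-01"), as.Date("2015-12-31"), by = "day")
table(format(one_per_day, "%m"))  # raw monthly counts differ: 31, 28, 31, ...
per_day_by_month(one_per_day)     # but the daily rate is 1 in every month
```

Applied to the 2014 and 2015 occurrence dates, a per-day view would show whether a month's low total reflects fewer crimes per day or simply fewer days.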

Comparison of different boroughs

NYPD20142 <- NYPD2014 %>%
  ggplot(aes(x = borough, fill = level_of_offense)) +
  geom_bar(position = "dodge") +  ggtitle("Borough - 2014") 

NYPD20152 <- NYPD2015 %>%
  ggplot(aes(x = borough, fill = level_of_offense)) +
  geom_bar(position = "dodge") + ggtitle("Borough - 2015") 

grid.arrange(NYPD20142, NYPD20152, ncol = 1)

Decrease in crime in boroughs

(NYPDB1 <- NYPD2014 %>% 
     group_by(borough) %>% 
     summarise(total_crime = length(occurance_date)))
## # A tibble: 5 x 2
##   borough       total_crime
##   <fct>               <int>
## 1 BRONX              106016
## 2 BROOKLYN           148453
## 3 MANHATTAN          113096
## 4 QUEENS             100087
## 5 STATEN ISLAND       22711
(NYPDB2 <- NYPD2015 %>% 
     group_by(borough) %>% 
     summarise(total_crime = length(occurance_date)))
## # A tibble: 5 x 2
##   borough       total_crime
##   <fct>               <int>
## 1 BRONX              102950
## 2 BROOKLYN           140351
## 3 MANHATTAN          110580
## 4 QUEENS              92981
## 5 STATEN ISLAND       21714
borough.crime <- NYPD %>% 
  group_by(borough) %>% 
  summarise(n_borough = n())
borough.crime2014 <- NYPD2014 %>% 
  group_by(borough) %>% 
  summarise(n_borough = n())
borough.crime2015 <- NYPD2015 %>% 
  group_by(borough) %>% 
  summarise(n_borough = n())
borough.crime2014 %>%
  ggplot(aes(x = borough, y = n_borough)) +
  geom_bar(stat = "identity") +
  ggtitle("Number of crimes by borough - 2014") +
  xlab("Borough") + ylab("Number of crimes") +
  geom_text(aes(label = n_borough), vjust = -.5)  # Print each bar's count above it

borough.crime2015 %>%
  ggplot(aes(x = borough, y = n_borough)) +
  geom_bar(stat = "identity") +
  ggtitle("Number of crimes by borough - 2015") +
  xlab("Borough") + ylab("Number of crimes") +
  geom_text(aes(label = n_borough), vjust = -.5)  # Print each bar's count above it
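The size of the drop in each borough can be read off the two summary tables above. A minimal base-R sketch, with the borough totals copied by hand from those tables (the data frame name `borough_totals` is hypothetical):

```r
# Borough totals copied from the 2014 and 2015 summary tables above
borough_totals <- data.frame(
  borough    = c("BRONX", "BROOKLYN", "MANHATTAN", "QUEENS", "STATEN ISLAND"),
  total_2014 = c(106016, 148453, 113096, 100087, 22711),
  total_2015 = c(102950, 140351, 110580, 92981, 21714),
  stringsAsFactors = FALSE
)

# Percent change from 2014 to 2015, rounded to one decimal place
borough_totals$pct_change <- round(
  100 * (borough_totals$total_2015 - borough_totals$total_2014) / borough_totals$total_2014,
  1
)

# Largest decrease first
borough_totals[order(borough_totals$pct_change), ]
```

Every borough recorded fewer crimes in 2015; the drop ranges from about 2.2% in Manhattan to about 7.1% in Queens, and the borough totals sum to the citywide 490,363 and 468,576.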

Types of crime

T1A <- NYPD2014 %>%
  group_by(offense_classification_description) %>%
  summarise(n_class = n()) %>%
  arrange(desc(n_class)) %>%
  head(n = 5) %>%
  ggplot(aes(x = reorder(offense_classification_description, -n_class), y = n_class)) +
  geom_bar(stat = "identity") +
  ggtitle("Top 5 crime types in New York - 2014") +
  xlab("Offense classification") + ylab("Number of crimes") +
  geom_text(aes(label = n_class), vjust = 0.5) +  # Label each bar with its count
  coord_flip()
T1B <- NYPD2015 %>%
  group_by(offense_classification_description) %>%
  summarise(n_class = n()) %>%
  arrange(desc(n_class)) %>%
  head(n = 5) %>%
  ggplot(aes(x = reorder(offense_classification_description, -n_class), y = n_class)) +
  geom_bar(stat = "identity") +
  ggtitle("Top 5 crime types in New York - 2015") +
  xlab("Offense classification") + ylab("Number of crimes") +
  geom_text(aes(label = n_class), vjust = 0.5) +  # Label each bar with its count
  coord_flip()
grid.arrange(T1A, T1B, ncol = 1)

War against drugs

NYPD2014DRUGS <- filter(NYPD2014, offense_classification_description == "DANGEROUS DRUGS")
DRUG.distribution2014 <- sample_n(NYPD2014DRUGS, 10000)  # Random sample of 10,000 drug complaints

# Stamen tiles are now served by Stadia Maps; on recent leaflet versions this
# provider may need to be "Stadia.StamenTonerLite" instead
leaflet(data = DRUG.distribution2014) %>%
  addProviderTiles("Stamen.TonerLite",
                   group = "Toner",
                   options = providerTileOptions(minZoom = 1, maxZoom = 100)) %>%
  addCircleMarkers(~longitude, ~latitude, radius = 0.0001, color = "orange", fillOpacity = .00001)
## Warning in validateCoords(lng, lat, funcName): Data contains 214 rows with
## either missing or invalid lat/lon values and will be ignored
NYPD2015DRUGS <- filter(NYPD2015, offense_classification_description == "DANGEROUS DRUGS")
DRUG.distribution2015 <- sample_n(NYPD2015DRUGS, 10000)  # Random sample of 10,000 drug complaints

leaflet(data = DRUG.distribution2015) %>%
  addProviderTiles("Stamen.TonerLite",
                   group = "Toner",
                   options = providerTileOptions(minZoom = 1, maxZoom = 100)) %>%
  addCircleMarkers(~longitude, ~latitude, radius = 0.0001, color = "red", fillOpacity = .00001)
## Warning in validateCoords(lng, lat, funcName): Data contains 40 rows with
## either missing or invalid lat/lon values and will be ignored
include_graphics("S1.png")

The shaded portion of the image marks the areas targeted by the war on drugs as of 1993. Compared with the 2014 and 2015 maps, those areas still show residual effects of the past, as they remain drug hotspots.

Conclusion

William Bratton has had a major impact on the way crime is fought today: he led a data revolution that provided insights into fighting crime that were not possible in the past. Although “creative” reporting may have contributed to the large impact attributed to his first tenure as NYC commissioner, his second term was still impactful, with crime decreasing in every borough between 2014 and 2015.