Part 1 - Introduction

Goal

I am seeking to understand the rates of criminal complaints by borough and then compare this with regional macro factors such as unemployment and income levels to determine if there is a correlation amongst them.

Motivation for performing the analysis

My sister just graduated with a master’s degree in education and is about to list the school of preference by borough. I would like to conduct this analysis to help her determine which borough is safest for her to be teaching.

Data Details

Data for this analysis is made up of three components.

  • The first dataset represents a breakdown of every criminal complaint report filed in NYC by the NYPD up until December 2018. Each record represents a criminal complaint in NYC and includes information about the types of crime (ex. felony, misdemeanor, etc. ), the location (by city borough), and time of enforcement. The data is from the New York State website.

  • The second dataset represents all unemployment statistics for NYC from the New York State website.

  • The third dataset represents income and poverty estimates for NYC by borough from the United States Census Bureau.

Part 2 - Data

Load Data

# Raw JSON Url for updated information of NYC Reported Compliants.
download_data <- function(){
    
    # Defining our limit of compliants
    limit = 50000
    url <- "https://data.cityofnewyork.us/resource/7x9x-zpz6.json?"
    url <- paste(url, '&$limit=', limit, sep='')
    
    # Extract data into JSON data frame from API query
    Raw <- fromJSON(url)
    
    # Return extracted data
    return(Raw)
}
# Update complaint Table
complaint_Raw <- download_data()

Data Chacteristics

Dates

  • First reported complaint : 2010-03-25.

  • Last reported complaint: 2018-12-31.

  • Total number of days passed since first and last reported complaint: 3203.

  • Total number of years passed since first and last reported complaint: 9.

Amount of data collected

  • Total number of records referring to complaint in NYC: 49948.

Procedure to Find summary yearly Values

# we are generatin colum for each daily complaint
complaint_Raw$num_complaint <- 1
Totalcomplaint <- complaint_Raw$num_complaint %>% sum()
yearlycomplaintdata <- complaint_Raw %>%
                    group_by(Year) %>%
                    summarise(`No. complaint` = sum(num_complaint),
                               Percentage = paste(round((`No. complaint`/Totalcomplaint)*100,2),"%"),
                              `Monthly Average` = round(`No. complaint`/12,0),
                              `Daily Average` = round(`No. complaint`/365,0)
                              )  %>%
                    arrange(desc(`Year`))
kable(yearlycomplaintdata)
Year No. complaint Percentage Monthly Average Daily Average
2018 48777 97.66 % 4065 134
2017 934 1.87 % 78 3
2016 93 0.19 % 8 0
2015 57 0.11 % 5 0
2014 32 0.06 % 3 0
2013 18 0.04 % 2 0
2012 10 0.02 % 1 0
2011 13 0.03 % 1 0
2010 14 0.03 % 1 0

Procedure to Find summary monthly Values

monthlycomplaintdata <- complaint_Raw %>%
                    group_by(Year, Month, MonthYM) %>%
                    summarise(`No. complaint` = sum(num_complaint),
                              Percentage = paste(round((`No. complaint`/Totalcomplaint)*100,2),"%"),
                              `Daily Average` = round(`No. complaint`/30,0)
                              )  %>%
                    arrange(desc(`MonthYM`))
monthlycomplaintdata_1 <- monthlycomplaintdata
monthlycomplaintdata <- subset(monthlycomplaintdata, select = -c(Year, MonthYM))
kable(head(monthlycomplaintdata,20))
Month No. complaint Percentage Daily Average
Dec, 2018 3532 7.07 % 118
Nov, 2018 3712 7.43 % 124
Oct, 2018 4098 8.2 % 137
Sep, 2018 4195 8.4 % 140
Aug, 2018 4431 8.87 % 148
Jul, 2018 4394 8.8 % 146
Jun, 2018 4379 8.77 % 146
May, 2018 4443 8.9 % 148
Apr, 2018 3970 7.95 % 132
Mar, 2018 3960 7.93 % 132
Feb, 2018 3765 7.54 % 126
Jan, 2018 3898 7.8 % 130
Dec, 2017 379 0.76 % 13
Nov, 2017 139 0.28 % 5
Oct, 2017 98 0.2 % 3
Sep, 2017 70 0.14 % 2
Aug, 2017 48 0.1 % 2
Jul, 2017 35 0.07 % 1
Jun, 2017 29 0.06 % 1
May, 2017 22 0.04 % 1

Procedure to Find summary yearly Values by the borough

boroughyearlycomplaintdata <- complaint_Raw %>%
                    group_by(boro_nm) %>%
                    summarise(`No. complaint` = sum(num_complaint),
                              Percentage = paste(round((`No. complaint`/Totalcomplaint)*100,2),"%"),
                              `Monthly Average` = round(`No. complaint`/12,0),
                              `Daily Average` = round(`No. complaint`/365,0)
                              )  %>%
                    arrange(desc(`boro_nm`))
kable(boroughyearlycomplaintdata)
boro_nm No. complaint Percentage Monthly Average Daily Average
STATEN ISLAND 2259 4.52 % 188 6
QUEENS 9620 19.26 % 802 26
MANHATTAN 12192 24.41 % 1016 33
BROOKLYN 14708 29.45 % 1226 40
BRONX 11093 22.21 % 924 30
NA 76 0.15 % 6 0

Procedure to Find summary monthly Values by the borough

boroughmonthlycomplaintdata <- complaint_Raw %>%
                    group_by(Year, Month, MonthYM, boro_nm) %>%
                    summarise(`No. complaint` = sum(num_complaint),
                              Percentage = paste(round((`No. complaint`/Totalcomplaint)*100,2),"%"),
                              Percentage_int = round((`No. complaint`/Totalcomplaint)*100,2),
                              `Daily Average` = round(`No. complaint`/30,0)
                              )  %>%
                    arrange(desc(`MonthYM`))
boroughmonthlycomplaintdata_1 <- boroughmonthlycomplaintdata
boroughmonthlycomplaintdata <- subset(boroughmonthlycomplaintdata, select = -c(Year, MonthYM))
kable(head(boroughmonthlycomplaintdata,20))
Month boro_nm No. complaint Percentage Percentage_int Daily Average
Dec, 2018 NA 6 0.01 % 0.01 0
Dec, 2018 BRONX 774 1.55 % 1.55 26
Dec, 2018 BROOKLYN 1009 2.02 % 2.02 34
Dec, 2018 MANHATTAN 890 1.78 % 1.78 30
Dec, 2018 QUEENS 695 1.39 % 1.39 23
Dec, 2018 STATEN ISLAND 158 0.32 % 0.32 5
Nov, 2018 NA 5 0.01 % 0.01 0
Nov, 2018 BRONX 790 1.58 % 1.58 26
Nov, 2018 BROOKLYN 1114 2.23 % 2.23 37
Nov, 2018 MANHATTAN 919 1.84 % 1.84 31
Nov, 2018 QUEENS 724 1.45 % 1.45 24
Nov, 2018 STATEN ISLAND 160 0.32 % 0.32 5
Oct, 2018 NA 2 0 % 0.00 0
Oct, 2018 BRONX 933 1.87 % 1.87 31
Oct, 2018 BROOKLYN 1221 2.44 % 2.44 41
Oct, 2018 MANHATTAN 999 2 % 2.00 33
Oct, 2018 QUEENS 772 1.55 % 1.55 26
Oct, 2018 STATEN ISLAND 171 0.34 % 0.34 6
Sep, 2018 NA 6 0.01 % 0.01 0
Sep, 2018 BRONX 950 1.9 % 1.90 32

Procedure to Find summary daily Values by the borough

boroughcomplaintdata <- complaint_Raw %>%
                    group_by(Year, Month, MonthYM, boro_nm) %>%
                    summarise(`No. complaint` = sum(num_complaint))  %>%
                    arrange(desc(`MonthYM`))
                    
boroughcomplaintdata <- spread(data = boroughcomplaintdata,
                             key = boro_nm,
                             value = `No. complaint`
                             )

colnames(boroughcomplaintdata)[9] <- "Not Available"
kable(head(boroughcomplaintdata,20))
Year Month MonthYM BRONX BROOKLYN MANHATTAN QUEENS STATEN ISLAND Not Available
2010 Apr, 2010 2010/04 1 NA 1 NA NA NA
2010 Dec, 2010 2010/12 NA 2 1 NA NA NA
2010 Feb, 2010 2010/02 NA 1 NA NA NA NA
2010 Jan, 2010 2010/01 1 1 2 NA NA NA
2010 Jun, 2010 2010/06 1 2 NA NA NA NA
2010 Oct, 2010 2010/10 NA NA NA NA 1 NA
2011 Apr, 2011 2011/04 NA NA NA NA 1 NA
2011 Feb, 2011 2011/02 NA 1 NA NA NA NA
2011 Jan, 2011 2011/01 NA 1 1 2 1 NA
2011 Mar, 2011 2011/03 1 NA 1 NA NA NA
2011 May, 2011 2011/05 NA 1 NA NA NA NA
2011 Oct, 2011 2011/10 NA NA 1 NA NA NA
2011 Sep, 2011 2011/09 1 NA 1 NA NA NA
2012 Aug, 2012 2012/08 NA 1 NA NA NA NA
2012 Dec, 2012 2012/12 1 NA NA NA NA NA
2012 Feb, 2012 2012/02 NA NA NA 1 NA NA
2012 Jan, 2012 2012/01 NA NA NA NA 1 NA
2012 Jul, 2012 2012/07 NA 1 NA NA 1 NA
2012 May, 2012 2012/05 1 NA NA NA NA NA
2012 Oct, 2012 2012/10 NA NA NA 1 NA NA

Yearly summary

Table 1: Summary yearly table of complaints in NYC.
Year No. complaint Percentage Monthly Average Daily Average
2018 48777 97.66 % 4065 134
2017 934 1.87 % 78 3
2016 93 0.19 % 8 0
2015 57 0.11 % 5 0
2014 32 0.06 % 3 0
2013 18 0.04 % 2 0
2012 10 0.02 % 1 0
2011 13 0.03 % 1 0
2010 14 0.03 % 1 0
Table 1.1: Poorly forecasted trend based on a current daily average for the current year.
Year Current.Reported Curent.Daily.Avg Forecasted.Month.Avg Forecasted.Yearly.Avg Previous.Year Difference
2018 48777 372 11160 135780 934 134846

Yearly Summary Graph

Monthly summary

In this section, I have listed our given data into monthly results, with hopes of finding some patterns; one of these patterns could be related to the season of the year for example.

Table 2: Summary monthly table of criminal complaints in NYC.
Month No. complaint Percentage Daily Average
Dec, 2018 3532 7.07 % 118
Nov, 2018 3712 7.43 % 124
Oct, 2018 4098 8.2 % 137
Sep, 2018 4195 8.4 % 140
Aug, 2018 4431 8.87 % 148
Jul, 2018 4394 8.8 % 146
Jun, 2018 4379 8.77 % 146
May, 2018 4443 8.9 % 148
Apr, 2018 3970 7.95 % 132
Mar, 2018 3960 7.93 % 132
Feb, 2018 3765 7.54 % 126
Jan, 2018 3898 7.8 % 130
Dec, 2017 379 0.76 % 13
Nov, 2017 139 0.28 % 5
Oct, 2017 98 0.2 % 3
Sep, 2017 70 0.14 % 2
Aug, 2017 48 0.1 % 2
Jul, 2017 35 0.07 % 1
Jun, 2017 29 0.06 % 1
May, 2017 22 0.04 % 1

I’m going to show only Date from 2017 monthly visulation

Part 3 - Statistics by Borough

Borough yearly summary

Table 3: Borough summary yearly table of complaints in NYC.
boro_nm No. complaint Percentage Monthly Average Daily Average
BROOKLYN 14708 29.45 % 1226 40
MANHATTAN 12192 24.41 % 1016 33
BRONX 11093 22.21 % 924 30
QUEENS 9620 19.26 % 802 26
STATEN ISLAND 2259 4.52 % 188 6
NA 76 0.15 % 6 0

Borough yearly summary Graph

Borough monthly summary

Table 4: Borough summary monthly table of complaints in NYC.
Month BRONX BROOKLYN MANHATTAN QUEENS STATEN ISLAND Not Available Total
Dec, 2018 774 1009 890 695 158 6 3532
Nov, 2018 790 1114 919 724 160 5 3712
Oct, 2018 933 1221 999 772 171 2 4098
Sep, 2018 950 1201 1009 851 178 6 4195
Aug, 2018 992 1338 1052 822 221 6 4431
Jul, 2018 1025 1299 1002 851 205 12 4394
Jun, 2018 934 1283 1075 893 186 8 4379
May, 2018 956 1346 1043 859 232 7 4443
Apr, 2018 876 1141 986 770 189 8 3970
Mar, 2018 891 1182 995 739 148 5 3960

Bronx borough

Queens borough

Brooklyn borough

Manahttan borough

Staten Island borough

Unknown borough

Borough Criminal Complaint Description summary

Table 10: Borough summary table of contributing factors at the time of complaint involving cyclists in NYC.
ofns_desc BRONX BROOKLYN MANHATTAN QUEENS STATEN ISLAND Not Available Total
PETIT LARCENY 1767 2580 2943 1673 344 0 9307
HARRASSMENT 2 1804 2250 1539 1542 486 0 7621
ASSAULT 3 & RELATED OFFENSES 1575 1633 1104 1115 236 0 5663
CRIMINAL MISCHIEF & RELATED OF 1096 1478 1011 1077 269 0 4931
GRAND LARCENY 706 1166 1826 830 137 1 4666
OFF. AGNST PUB ORD SENSBLTY & 483 727 515 408 132 0 2265
FELONY ASSAULT 621 645 395 427 71 0 2159
DANGEROUS DRUGS 571 488 407 93 59 0 1618
ROBBERY 353 464 311 259 39 0 1426
MISCELLANEOUS PENAL LAW 204 566 189 376 83 0 1418
BURGLARY 254 445 280 255 31 0 1265
DANGEROUS WEAPONS 267 290 177 166 30 0 930
OFFENSES AGAINST PUBLIC ADMINI 290 176 203 117 46 0 832
SEX CRIMES 157 282 216 144 23 0 822
VEHICLE AND TRAFFIC LAWS 187 220 103 198 31 0 739
GRAND LARCENY OF MOTOR VEHICLE 135 168 75 176 21 0 575
INTOXICATED & IMPAIRED DRIVING 91 183 87 126 73 0 560
FORGERY 80 228 141 93 13 0 555
THEFT-FRAUD 53 125 156 113 27 0 474
CRIMINAL TRESPASS 48 98 80 89 26 0 341
FRAUDS 37 59 107 46 23 0 272
RAPE 44 62 42 44 12 0 204
POSSESSION OF STOLEN PROPERTY 31 68 42 43 3 0 187
UNAUTHORIZED USE OF A VEHICLE 25 45 20 73 7 0 170
OFFENSES INVOLVING FRAUD 40 28 40 14 3 0 125
OTHER OFFENSES RELATED TO THEF 26 35 27 24 7 0 119
ADMINISTRATIVE CODE 37 41 12 20 5 0 115
OFFENSES AGAINST THE PERSON 20 38 27 22 8 0 115
ARSON 26 33 19 22 4 0 104
MURDER & NON-NEGL. MANSLAUGHTER 0 0 0 0 0 75 75
NYS LAWS-UNCLASSIFIED FELONY 16 12 25 9 4 0 66
OTHER STATE LAWS (NON PENAL LA 7 13 8 3 2 0 33
BURGLAR’S TOOLS 3 9 15 3 0 0 30
THEFT OF SERVICES 8 8 11 2 0 0 29
FRAUDULENT ACCOSTING 2 3 13 2 0 0 20
GAMBLING 1 7 5 2 0 0 15
ALCOHOLIC BEVERAGE CONTROL LAW 4 6 2 2 0 0 14
KIDNAPPING & RELATED OFFENSES 4 6 2 2 0 0 14
AGRICULTURE & MRKTS LAW-UNCLASSIFIED 5 2 2 3 1 0 13
PETIT LARCENY OF MOTOR VEHICLE 4 2 5 1 0 0 12
PROSTITUTION & RELATED OFFENSES 5 3 0 2 0 0 10
JOSTLING 2 0 6 0 0 0 8
OFFENSES RELATED TO CHILDREN 0 5 0 3 0 0 8
OFFENSES AGAINST PUBLIC SAFETY 2 3 0 1 0 0 6
DISORDERLY CONDUCT 0 3 2 0 0 0 5
ENDAN WELFARE INCOMP 2 1 0 0 0 0 3
CHILD ABANDONMENT/NON SUPPORT 0 1 0 0 1 0 2
OTHER STATE LAWS 0 0 2 0 0 0 2
HOMICIDE-NEGLIGENT,UNCLASSIFIE 0 1 0 0 0 0 1
KIDNAPPING 0 0 0 0 1 0 1
LOITERING/GAMBLING (CARDS, DIC 0 1 0 0 0 0 1
NEW YORK CITY HEALTH CODE 0 1 0 0 0 0 1
0 0 0 0 0 1 0 1

Part 4 - Crime Description

p <- plot_ly(data = head(boroughcomplaintdescdata_1), 
             y = ~ofns_desc,
             x = ~Total,
             text = ~paste("Total of mentions:", Total),
             type = 'bar',
             orientation = 'h') %>%
  layout(yaxis = list(title = "Crime Description"),
         xaxis = list(title = "Number of Crime Complaints."),
         margin = list(b = 100, l = 200, r = 70)) %>%
  layout(showlegend = FALSE, 
         title = " Top 10 Crime Complaint in NYC of all times.", 
         autosize = TRUE
         ) %>%
  layout(autosize = F, width = 800, height = 500)
## Warning: Specifying width/height in layout() is now deprecated.
## Please specify in ggplotly() or plot_ly()
ggplotly(p)

MANHATTAN borough

p <- plot_ly(data = head(boroughcomplaintdescdata_1,10), 
             y = ~ofns_desc,
             x = ~MANHATTAN,
             text = ~paste("Total of mentions:", Total),
             type = 'bar',
             orientation = 'h') %>%
  layout(yaxis = list(title = "Crime Description"),
         xaxis = list(title = "Number of Crime Complaints."),
         margin = list(b = 100, l = 200, r = 80)) %>%
  layout(showlegend = FALSE, 
         title = "Top 10 Crime Complaint in MANHATTAN of all times.", 
         autosize = TRUE
         ) %>%
  layout(autosize = F, width = 800, height = 500)
## Warning: Specifying width/height in layout() is now deprecated.
## Please specify in ggplotly() or plot_ly()
ggplotly(p)

Bronx borough

p <- plot_ly(data = head(boroughcomplaintdescdata_1,10), 
             y = ~ofns_desc,
             x = ~BRONX,
             text = ~paste("Total of mentions:", Total),
             type = 'bar',
             orientation = 'h') %>%
  layout(yaxis = list(title = "Crime Description"),
         xaxis = list(title = "Number of Crime Complaints."),
         margin = list(b = 100, l = 200, r = 80)) %>%
  layout(showlegend = FALSE, 
         title = " Top 10 Crime Complaint in BRONX of all times.", 
         autosize = TRUE
         ) %>%
  layout(autosize = F, width = 800, height = 500)
## Warning: Specifying width/height in layout() is now deprecated.
## Please specify in ggplotly() or plot_ly()
ggplotly(p)

Staten Island borough

p <- plot_ly(data = head(boroughcomplaintdescdata_1,10), 
             y = ~ofns_desc,
             x = ~`STATEN ISLAND`,
             text = ~paste("Total of mentions:", Total),
             type = 'bar',
             orientation = 'h') %>%
  layout(yaxis = list(title = "Crime Description"),
         xaxis = list(title = "Number of Crime Complaints."),
         margin = list(b = 100, l = 200, r = 80)) %>%
  layout(showlegend = FALSE, 
         title = " Top 10 Crime Complaint in STATEN ISLAND of all times.", 
         autosize = TRUE
         ) %>%
  layout(autosize = F, width = 800, height = 500)
## Warning: Specifying width/height in layout() is now deprecated.
## Please specify in ggplotly() or plot_ly()
ggplotly(p)

Brooklyn borough

p <- plot_ly(data = head(boroughcomplaintdescdata_1,10), 
             y = ~ofns_desc,
             x = ~BROOKLYN,
             text = ~paste("Total of mentions:", Total),
             type = 'bar',
             orientation = 'h') %>%
  layout(yaxis = list(title = "Crime Description"),
         xaxis = list(title = "Number of Crime Complaints."),
         margin = list(b = 100, l = 200, r = 80)) %>%
  layout(showlegend = FALSE, 
         title = " Top 10 Crime Complaint in BROOKLYN of all times.", 
         autosize = TRUE
         ) %>%
  layout(autosize = F, width = 800, height = 500)
## Warning: Specifying width/height in layout() is now deprecated.
## Please specify in ggplotly() or plot_ly()
ggplotly(p)

Queens borough

p <- plot_ly(data = head(boroughcomplaintdescdata_1,10), 
             y = ~ofns_desc,
             x = ~QUEENS,
             text = ~paste("Total of mentions:", Total),
             type = 'bar',
             orientation = 'h') %>%
  layout(yaxis = list(title = "Crime Description"),
         xaxis = list(title = "Number of Crime Complaints."),
         margin = list(b = 100, l = 200, r = 80)) %>%
  layout(showlegend = FALSE, 
         title = " Top 10 Crime Complaint in QUEENS of all times.", 
         autosize = TRUE
         ) %>%
  layout(autosize = TRUE, width = 800, height = 500)
## Warning: Specifying width/height in layout() is now deprecated.
## Please specify in ggplotly() or plot_ly()
ggplotly(p)

PART 5 - Observed patterns

we can quickly observe a few interesting patterns of contributing factors reported as crime descriptions as follows:

  • The top 10 crime complaint descriptions all over the 5 boroughs, that is Petit Larceny & Related Offenses , Harresment, Robbery, Assault 3 & Related Offenses , Criminal Miscief,Grand Larceny , Off. Agnst Pub ord Sensblty, Fellony Assault, MISCELLANEOUS PENAL LAW, Dangerous Drugs are the top criminal complaints descriptions all over the five boroughs.

  • MANHATTAN seems to present a very high number of crime complaints Petit Larceny & Related Offenses contrary to the BRONX , STATEN ISLAND ,Queens for example.

  • Brooklyn and Manhattan stand out due to Dangerous Drugs, doubling up Queens and about three times more than the Staten Island boroughs.

  • BROOKLYN present a high number of Roobery mentions compared to the other boroughs.

PART 6 - Modeling

Here, I’m going to find out whether Bronx borough County compalint rates correlated with Bronx (regional data) macro factors such as Unemployment rate and poverty.

Macro Factor Data Load

rawpoverty<-read.csv(url("https://raw.githubusercontent.com/omerozeren/DATA607/master/FINAL_PROJECT/Poverty_Data.csv"))
names(rawpoverty)[4]<-"county"
unemployment<-read.csv(url("https://raw.githubusercontent.com/omerozeren/DATA607/master/FINAL_PROJECT/Local_Area_Unemployment_Data.csv"))
names(unemployment)[1]<-"county"

Borough Crime Complaint Data

BRONX_data<-boroughmonthlycomplaintdata_1[boroughmonthlycomplaintdata_1$boro_nm=='BRONX',]
QUEENS_data<-boroughmonthlycomplaintdata_1[boroughmonthlycomplaintdata_1$boro_nm=='QUEENS',]
STATEN_ISLAND_data<-boroughmonthlycomplaintdata_1[boroughmonthlycomplaintdata_1$boro_nm=='STATEN ISLAND',]
MANHATTAN_data<-boroughmonthlycomplaintdata_1[boroughmonthlycomplaintdata_1$boro_nm=='MANHATTAN',]
BROOKLYN_data<-boroughmonthlycomplaintdata_1[boroughmonthlycomplaintdata_1$boro_nm=='BROOKLYN',]

Tidy Poverty Macro Factor Data

# 2013 values
pov13 <- rawpoverty %>% filter(Year == 2013) %>% select(county,pct = `All.Ages.in.Poverty.Percent`, inc = `Median.Household.Income.in.Dollars`)

# 2016 values
pov16 <- rawpoverty %>% filter(Year == 2016) %>% select(county,pct = `All.Ages.in.Poverty.Percent`, inc = `Median.Household.Income.in.Dollars`)

# Fix income values by removing $ and ,
pov13$inc <- as.numeric(str_replace_all(pov13$inc,"\\$|,",""))
pov16$inc <- as.numeric(str_replace_all(pov16$inc,"\\$|,",""))

# Combine our data frames
poverty <- inner_join(pov13, pov16, by=c("county" = "county"),
                      suffix = c("_2013","_2016"))

# Compute changes and impute interim values
poverty <- poverty %>%
  mutate(povChg = pct_2016 - pct_2013, incChg = inc_2016 - inc_2013,
         povIncrement = povChg / 3, incIncrement = incChg / 3,
         pct_2014 = pct_2013 + povIncrement, pct_2015 = pct_2016 - povIncrement,
         inc_2014 = inc_2013 + incIncrement, inc_2015 = inc_2016 - incIncrement)

# Tidy the data
poverty <- poverty %>% 
  gather(key="year",
         value="value",
         pct_2013,pct_2014,pct_2015,pct_2016,
         inc_2013,inc_2014,inc_2015,inc_2016)

poverty <- poverty %>% separate(year,c("measure","yr"),"_")

poverty <- poverty %>% spread(key="measure",value="value") %>% select(county, year = yr, income = inc, poverty = pct)

poverty$year <- as.numeric(poverty$year)

# Adjust income to 1,000's scale
poverty$income <- round(poverty$income/1000,2)

kable(head(poverty,20))
county year income poverty
United States 2013 52.25 15.8
New York 2013 57.26 16.0
Albany County (NY) 2013 55.78 13.7
Allegany County (NY) 2013 41.85 16.7
Bronx County (NY) 2013 33.08 30.7
Broome County (NY) 2013 45.14 17.7
Cattaraugus County (NY) 2013 40.94 18.9
Cayuga County (NY) 2013 48.98 14.2
Chautauqua County (NY) 2013 40.47 19.1
Chemung County (NY) 2013 45.34 17.0
Chenango County (NY) 2013 44.33 16.8
Clinton County (NY) 2013 45.90 15.7
Columbia County (NY) 2013 57.09 11.8
Cortland County (NY) 2013 46.16 14.6
Delaware County (NY) 2013 43.38 17.8
Dutchess County (NY) 2013 69.30 9.5
Erie County (NY) 2013 51.12 15.1
Essex County (NY) 2013 46.34 11.7
Franklin County (NY) 2013 42.67 22.4
Fulton County (NY) 2013 42.26 15.7

Joint Macro Economics Datas

To support our analysis, we will join our data frames so we have all the variables we will be using in a single data frame.

library(zoo)
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
BRONX_data<-boroughmonthlycomplaintdata_1[boroughmonthlycomplaintdata_1$boro_nm=='BRONX',]
BRONX_data<-na.omit(BRONX_data)
names(BRONX_data)[4]<-"county"
unemployment$county<-as.character(unemployment$county)
bronx_unemp<-unemployment[unemployment$county =='Bronx County',]
bronx_poverty <- poverty[poverty$county == "Bronx County (NY)",] 
#Ordering the data
bronx_unemp<-bronx_unemp[order(bronx_unemp$Year, bronx_unemp$Month,decreasing = TRUE), ]
macro_data<- inner_join(bronx_poverty,bronx_unemp,by=c("year" = "Year"))
macro_data$year <- BRONX_data$Year[1:48]

Tidy Model Data

model_data <- macro_data %>%
  group_by(year, Month) %>%
  summarise(average_income = mean(income),
            average_poverty = mean(poverty),
            average_un_rate = mean(Unemployment.Rate))
#model_data$year<-as.character(model_data$year)
model_data<-model_data[order(model_data$year, model_data$Month,decreasing = TRUE), ]
model_data$Criminal_Rate <- BRONX_data$Percentage_int[1:48]

Analysis

Let’s build a model to see how the variables relate to the Bronx crime compliant rate.

# Linear model for 2013
model<- lm(Criminal_Rate ~ average_income + average_poverty + average_un_rate,data=model_data)
summary(model)
## 
## Call:
## lm(formula = Criminal_Rate ~ average_income + average_poverty + 
##     average_un_rate, data = model_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.76305 -0.35172  0.01825  0.35840  0.56284 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -9.886e+03  2.683e+03  -3.685 0.000624 ***
## average_income   1.034e+02  2.809e+01   3.680 0.000633 ***
## average_poverty  2.106e+02  5.712e+01   3.686 0.000621 ***
## average_un_rate  2.422e-01  7.796e-02   3.107 0.003307 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4263 on 44 degrees of freedom
## Multiple R-squared:  0.7279, Adjusted R-squared:  0.7094 
## F-statistic: 39.24 on 3 and 44 DF,  p-value: 1.691e-12
plot(model)

PART 7 - Conclusion and Recommendation:

  • After modeling analysis, it is remarkably true that there is statistically a significant linear relationship between the criminal rate of only the BRONX borough and income, unemployment, or poverty rates. I could not locate income, unemployment, or poverty rates data for the remaining four boroughs

  • A high-level crime description within all of NYC (inclusive of all five boroughs- Queens, Manhattan, Bronx, Staten Island, Brooklyn) displayed that the following complaints were the highest from 2010-2018- petit larceny, harassment, assault & related offenses, and criminal mischief.

  • The crime descriptions broken out by borough for the highest number of complaints:

  • Petit Larceny: Manhattan, Brooklyn, Bronx, Queens, Staten Island

  • Harassment: Brooklyn, Bronx, Manhattan, Queens, Staten Island

  • Assault & Related Offenses: Bronx, Brooklyn, Manhattan, Queens, Staten Island

  • Criminal Mischief: Brooklyn, Bronx, Manhattan, Queens, Staten Island

  • Overall Staten Island has the least criminal complaints amongst the five boroughs. As a result, I recommend my sister to list the following boroughs as her top three preferences to teach: Staten Island, Queens, Manhattan