Final Project- DATA 607

Part 1 - Introduction

Goal

I am seeking to understand the rates of criminal complaints by borough and then compare this with regional macro factors such as unemployment and income levels to determine if there is a correlation amongst them.

Motivation for performing the analysis

My sister just graduated with a master’s degree in education and is about to list the school of preference by borough. I would like to conduct this analysis to help her determine which borough is safest for her to be teaching.

Data Details

Data for this analysis is made up of three components.

The first dataset represents a breakdown of every criminal complaint report filed in NYC by the NYPD up until December 2018. Each record represents a criminal complaint in NYC and includes information about the types of crime (ex. felony, misdemeanor, etc. ), the location (by city borough), and time of enforcement. The data is from the New York State website.
The second dataset represents all unemployment statistics for NYC from the New York State website.
The third dataset represents income and poverty estimates for NYC by borough from the United States Census Bureau.

Part 2 - Data

Load Data

# Raw JSON Url for updated information of NYC Reported Compliants.
download_data <- function(){
    
    # Defining our limit of compliants
    limit = 50000
    url <- "https://data.cityofnewyork.us/resource/7x9x-zpz6.json?"
    url <- paste(url, '&$limit=', limit, sep='')
    
    # Extract data into JSON data frame from API query
    Raw <- fromJSON(url)
    
    # Return extracted data
    return(Raw)
}

# Update complaint Table
complaint_Raw <- download_data()

Data Chacteristics

Dates

First reported complaint : 2010-03-25.
Last reported complaint: 2018-12-31.
Total number of days passed since first and last reported complaint: 3203.
Total number of years passed since first and last reported complaint: 9.

Amount of data collected

Total number of records referring to complaint in NYC: 49948.

Procedure to Find summary yearly Values

# we are generatin colum for each daily complaint
complaint_Raw$num_complaint <- 1
Totalcomplaint <- complaint_Raw$num_complaint %>% sum()
yearlycomplaintdata <- complaint_Raw %>%
                    group_by(Year) %>%
                    summarise(`No. complaint` = sum(num_complaint),
                               Percentage = paste(round((`No. complaint`/Totalcomplaint)*100,2),"%"),
                              `Monthly Average` = round(`No. complaint`/12,0),
                              `Daily Average` = round(`No. complaint`/365,0)
                              )  %>%
                    arrange(desc(`Year`))
kable(yearlycomplaintdata)

Year	No. complaint	Percentage	Monthly Average	Daily Average
2018	48777	97.66 %	4065	134
2017	934	1.87 %	78	3
2016	93	0.19 %	8	0
2015	57	0.11 %	5	0
2014	32	0.06 %	3	0
2013	18	0.04 %	2	0
2012	10	0.02 %	1	0
2011	13	0.03 %	1	0
2010	14	0.03 %	1	0

Procedure to Find summary monthly Values

monthlycomplaintdata <- complaint_Raw %>%
                    group_by(Year, Month, MonthYM) %>%
                    summarise(`No. complaint` = sum(num_complaint),
                              Percentage = paste(round((`No. complaint`/Totalcomplaint)*100,2),"%"),
                              `Daily Average` = round(`No. complaint`/30,0)
                              )  %>%
                    arrange(desc(`MonthYM`))
monthlycomplaintdata_1 <- monthlycomplaintdata
monthlycomplaintdata <- subset(monthlycomplaintdata, select = -c(Year, MonthYM))
kable(head(monthlycomplaintdata,20))

Month	No. complaint	Percentage	Daily Average
Dec, 2018	3532	7.07 %	118
Nov, 2018	3712	7.43 %	124
Oct, 2018	4098	8.2 %	137
Sep, 2018	4195	8.4 %	140
Aug, 2018	4431	8.87 %	148
Jul, 2018	4394	8.8 %	146
Jun, 2018	4379	8.77 %	146
May, 2018	4443	8.9 %	148
Apr, 2018	3970	7.95 %	132
Mar, 2018	3960	7.93 %	132
Feb, 2018	3765	7.54 %	126
Jan, 2018	3898	7.8 %	130
Dec, 2017	379	0.76 %	13
Nov, 2017	139	0.28 %	5
Oct, 2017	98	0.2 %	3
Sep, 2017	70	0.14 %	2
Aug, 2017	48	0.1 %	2
Jul, 2017	35	0.07 %	1
Jun, 2017	29	0.06 %	1
May, 2017	22	0.04 %	1

Procedure to Find summary yearly Values by the borough

boroughyearlycomplaintdata <- complaint_Raw %>%
                    group_by(boro_nm) %>%
                    summarise(`No. complaint` = sum(num_complaint),
                              Percentage = paste(round((`No. complaint`/Totalcomplaint)*100,2),"%"),
                              `Monthly Average` = round(`No. complaint`/12,0),
                              `Daily Average` = round(`No. complaint`/365,0)
                              )  %>%
                    arrange(desc(`boro_nm`))
kable(boroughyearlycomplaintdata)

boro_nm	No. complaint	Percentage	Monthly Average	Daily Average
STATEN ISLAND	2259	4.52 %	188	6
QUEENS	9620	19.26 %	802	26
MANHATTAN	12192	24.41 %	1016	33
BROOKLYN	14708	29.45 %	1226	40
BRONX	11093	22.21 %	924	30
NA	76	0.15 %	6	0

Procedure to Find summary monthly Values by the borough

boroughmonthlycomplaintdata <- complaint_Raw %>%
                    group_by(Year, Month, MonthYM, boro_nm) %>%
                    summarise(`No. complaint` = sum(num_complaint),
                              Percentage = paste(round((`No. complaint`/Totalcomplaint)*100,2),"%"),
                              Percentage_int = round((`No. complaint`/Totalcomplaint)*100,2),
                              `Daily Average` = round(`No. complaint`/30,0)
                              )  %>%
                    arrange(desc(`MonthYM`))
boroughmonthlycomplaintdata_1 <- boroughmonthlycomplaintdata
boroughmonthlycomplaintdata <- subset(boroughmonthlycomplaintdata, select = -c(Year, MonthYM))
kable(head(boroughmonthlycomplaintdata,20))

Month	boro_nm	No. complaint	Percentage	Percentage_int	Daily Average
Dec, 2018	NA	6	0.01 %	0.01	0
Dec, 2018	BRONX	774	1.55 %	1.55	26
Dec, 2018	BROOKLYN	1009	2.02 %	2.02	34
Dec, 2018	MANHATTAN	890	1.78 %	1.78	30
Dec, 2018	QUEENS	695	1.39 %	1.39	23
Dec, 2018	STATEN ISLAND	158	0.32 %	0.32	5
Nov, 2018	NA	5	0.01 %	0.01	0
Nov, 2018	BRONX	790	1.58 %	1.58	26
Nov, 2018	BROOKLYN	1114	2.23 %	2.23	37
Nov, 2018	MANHATTAN	919	1.84 %	1.84	31
Nov, 2018	QUEENS	724	1.45 %	1.45	24
Nov, 2018	STATEN ISLAND	160	0.32 %	0.32	5
Oct, 2018	NA	2	0 %	0.00	0
Oct, 2018	BRONX	933	1.87 %	1.87	31
Oct, 2018	BROOKLYN	1221	2.44 %	2.44	41
Oct, 2018	MANHATTAN	999	2 %	2.00	33
Oct, 2018	QUEENS	772	1.55 %	1.55	26
Oct, 2018	STATEN ISLAND	171	0.34 %	0.34	6
Sep, 2018	NA	6	0.01 %	0.01	0
Sep, 2018	BRONX	950	1.9 %	1.90	32

Procedure to Find summary daily Values by the borough

boroughcomplaintdata <- complaint_Raw %>%
                    group_by(Year, Month, MonthYM, boro_nm) %>%
                    summarise(`No. complaint` = sum(num_complaint))  %>%
                    arrange(desc(`MonthYM`))
                    
boroughcomplaintdata <- spread(data = boroughcomplaintdata,
                             key = boro_nm,
                             value = `No. complaint`
                             )

colnames(boroughcomplaintdata)[9] <- "Not Available"
kable(head(boroughcomplaintdata,20))

Year	Month	MonthYM	BRONX	BROOKLYN	MANHATTAN	QUEENS	STATEN ISLAND	Not Available
2010	Apr, 2010	2010/04	1	NA	1	NA	NA	NA
2010	Dec, 2010	2010/12	NA	2	1	NA	NA	NA
2010	Feb, 2010	2010/02	NA	1	NA	NA	NA	NA
2010	Jan, 2010	2010/01	1	1	2	NA	NA	NA
2010	Jun, 2010	2010/06	1	2	NA	NA	NA	NA
2010	Oct, 2010	2010/10	NA	NA	NA	NA	1	NA
2011	Apr, 2011	2011/04	NA	NA	NA	NA	1	NA
2011	Feb, 2011	2011/02	NA	1	NA	NA	NA	NA
2011	Jan, 2011	2011/01	NA	1	1	2	1	NA
2011	Mar, 2011	2011/03	1	NA	1	NA	NA	NA
2011	May, 2011	2011/05	NA	1	NA	NA	NA	NA
2011	Oct, 2011	2011/10	NA	NA	1	NA	NA	NA
2011	Sep, 2011	2011/09	1	NA	1	NA	NA	NA
2012	Aug, 2012	2012/08	NA	1	NA	NA	NA	NA
2012	Dec, 2012	2012/12	1	NA	NA	NA	NA	NA
2012	Feb, 2012	2012/02	NA	NA	NA	1	NA	NA
2012	Jan, 2012	2012/01	NA	NA	NA	NA	1	NA
2012	Jul, 2012	2012/07	NA	1	NA	NA	1	NA
2012	May, 2012	2012/05	1	NA	NA	NA	NA	NA
2012	Oct, 2012	2012/10	NA	NA	NA	1	NA	NA

Yearly summary

Table 1: Summary yearly table of complaints in NYC.
Year	No. complaint	Percentage	Monthly Average	Daily Average
2018	48777	97.66 %	4065	134
2017	934	1.87 %	78	3
2016	93	0.19 %	8	0
2015	57	0.11 %	5	0
2014	32	0.06 %	3	0
2013	18	0.04 %	2	0
2012	10	0.02 %	1	0
2011	13	0.03 %	1	0
2010	14	0.03 %	1	0

Table 1.1: Poorly forecasted trend based on a current daily average for the current year.
Year	Current.Reported	Curent.Daily.Avg	Forecasted.Month.Avg	Forecasted.Yearly.Avg	Previous.Year	Difference
2018	48777	372	11160	135780	934	134846

Yearly Summary Graph

Monthly summary

In this section, I have listed our given data into monthly results, with hopes of finding some patterns; one of these patterns could be related to the season of the year for example.

Table 2: Summary monthly table of criminal complaints in NYC.
Month	No. complaint	Percentage	Daily Average
Dec, 2018	3532	7.07 %	118
Nov, 2018	3712	7.43 %	124
Oct, 2018	4098	8.2 %	137
Sep, 2018	4195	8.4 %	140
Aug, 2018	4431	8.87 %	148
Jul, 2018	4394	8.8 %	146
Jun, 2018	4379	8.77 %	146
May, 2018	4443	8.9 %	148
Apr, 2018	3970	7.95 %	132
Mar, 2018	3960	7.93 %	132
Feb, 2018	3765	7.54 %	126
Jan, 2018	3898	7.8 %	130
Dec, 2017	379	0.76 %	13
Nov, 2017	139	0.28 %	5
Oct, 2017	98	0.2 %	3
Sep, 2017	70	0.14 %	2
Aug, 2017	48	0.1 %	2
Jul, 2017	35	0.07 %	1
Jun, 2017	29	0.06 %	1
May, 2017	22	0.04 %	1

I’m going to show only Date from 2017 monthly visulation

Part 3 - Statistics by Borough

Borough yearly summary

Table 3: Borough summary yearly table of complaints in NYC.
boro_nm	No. complaint	Percentage	Monthly Average	Daily Average
BROOKLYN	14708	29.45 %	1226	40
MANHATTAN	12192	24.41 %	1016	33
BRONX	11093	22.21 %	924	30
QUEENS	9620	19.26 %	802	26
STATEN ISLAND	2259	4.52 %	188	6
NA	76	0.15 %	6	0

Borough yearly summary Graph

Borough monthly summary

Table 4: Borough summary monthly table of complaints in NYC.
Month	BRONX	BROOKLYN	MANHATTAN	QUEENS	STATEN ISLAND	Not Available	Total
Dec, 2018	774	1009	890	695	158	6	3532
Nov, 2018	790	1114	919	724	160	5	3712
Oct, 2018	933	1221	999	772	171	2	4098
Sep, 2018	950	1201	1009	851	178	6	4195
Aug, 2018	992	1338	1052	822	221	6	4431
Jul, 2018	1025	1299	1002	851	205	12	4394
Jun, 2018	934	1283	1075	893	186	8	4379
May, 2018	956	1346	1043	859	232	7	4443
Apr, 2018	876	1141	986	770	189	8	3970
Mar, 2018	891	1182	995	739	148	5	3960

Bronx borough

Queens borough

Brooklyn borough

Manahttan borough

Staten Island borough

Unknown borough

Borough Criminal Complaint Description summary

Table 10: Borough summary table of contributing factors at the time of complaint involving cyclists in NYC.
ofns_desc	BRONX	BROOKLYN	MANHATTAN	QUEENS	STATEN ISLAND	Not Available	Total
PETIT LARCENY	1767	2580	2943	1673	344	0	9307
HARRASSMENT 2	1804	2250	1539	1542	486	0	7621
ASSAULT 3 & RELATED OFFENSES	1575	1633	1104	1115	236	0	5663
CRIMINAL MISCHIEF & RELATED OF	1096	1478	1011	1077	269	0	4931
GRAND LARCENY	706	1166	1826	830	137	1	4666
OFF. AGNST PUB ORD SENSBLTY &	483	727	515	408	132	0	2265
FELONY ASSAULT	621	645	395	427	71	0	2159
DANGEROUS DRUGS	571	488	407	93	59	0	1618
ROBBERY	353	464	311	259	39	0	1426
MISCELLANEOUS PENAL LAW	204	566	189	376	83	0	1418
BURGLARY	254	445	280	255	31	0	1265
DANGEROUS WEAPONS	267	290	177	166	30	0	930
OFFENSES AGAINST PUBLIC ADMINI	290	176	203	117	46	0	832
SEX CRIMES	157	282	216	144	23	0	822
VEHICLE AND TRAFFIC LAWS	187	220	103	198	31	0	739
GRAND LARCENY OF MOTOR VEHICLE	135	168	75	176	21	0	575
INTOXICATED & IMPAIRED DRIVING	91	183	87	126	73	0	560
FORGERY	80	228	141	93	13	0	555
THEFT-FRAUD	53	125	156	113	27	0	474
CRIMINAL TRESPASS	48	98	80	89	26	0	341
FRAUDS	37	59	107	46	23	0	272
RAPE	44	62	42	44	12	0	204
POSSESSION OF STOLEN PROPERTY	31	68	42	43	3	0	187
UNAUTHORIZED USE OF A VEHICLE	25	45	20	73	7	0	170
OFFENSES INVOLVING FRAUD	40	28	40	14	3	0	125
OTHER OFFENSES RELATED TO THEF	26	35	27	24	7	0	119
ADMINISTRATIVE CODE	37	41	12	20	5	0	115
OFFENSES AGAINST THE PERSON	20	38	27	22	8	0	115
ARSON	26	33	19	22	4	0	104
MURDER & NON-NEGL. MANSLAUGHTER	0	0	0	0	0	75	75
NYS LAWS-UNCLASSIFIED FELONY	16	12	25	9	4	0	66
OTHER STATE LAWS (NON PENAL LA	7	13	8	3	2	0	33
BURGLAR’S TOOLS	3	9	15	3	0	0	30
THEFT OF SERVICES	8	8	11	2	0	0	29
FRAUDULENT ACCOSTING	2	3	13	2	0	0	20
GAMBLING	1	7	5	2	0	0	15
ALCOHOLIC BEVERAGE CONTROL LAW	4	6	2	2	0	0	14
KIDNAPPING & RELATED OFFENSES	4	6	2	2	0	0	14
AGRICULTURE & MRKTS LAW-UNCLASSIFIED	5	2	2	3	1	0	13
PETIT LARCENY OF MOTOR VEHICLE	4	2	5	1	0	0	12
PROSTITUTION & RELATED OFFENSES	5	3	0	2	0	0	10
JOSTLING	2	0	6	0	0	0	8
OFFENSES RELATED TO CHILDREN	0	5	0	3	0	0	8
OFFENSES AGAINST PUBLIC SAFETY	2	3	0	1	0	0	6
DISORDERLY CONDUCT	0	3	2	0	0	0	5
ENDAN WELFARE INCOMP	2	1	0	0	0	0	3
CHILD ABANDONMENT/NON SUPPORT	0	1	0	0	1	0	2
OTHER STATE LAWS	0	0	2	0	0	0	2
HOMICIDE-NEGLIGENT,UNCLASSIFIE	0	1	0	0	0	0	1
KIDNAPPING	0	0	0	0	1	0	1
LOITERING/GAMBLING (CARDS, DIC	0	1	0	0	0	0	1
NEW YORK CITY HEALTH CODE	0	1	0	0	0	0	1
0	0	0	0	0	1	0	1

Part 4 - Crime Description

p <- plot_ly(data = head(boroughcomplaintdescdata_1), 
             y = ~ofns_desc,
             x = ~Total,
             text = ~paste("Total of mentions:", Total),
             type = 'bar',
             orientation = 'h') %>%
  layout(yaxis = list(title = "Crime Description"),
         xaxis = list(title = "Number of Crime Complaints."),
         margin = list(b = 100, l = 200, r = 70)) %>%
  layout(showlegend = FALSE, 
         title = " Top 10 Crime Complaint in NYC of all times.", 
         autosize = TRUE
         ) %>%
  layout(autosize = F, width = 800, height = 500)

## Warning: Specifying width/height in layout() is now deprecated.
## Please specify in ggplotly() or plot_ly()

ggplotly(p)

MANHATTAN borough

p <- plot_ly(data = head(boroughcomplaintdescdata_1,10), 
             y = ~ofns_desc,
             x = ~MANHATTAN,
             text = ~paste("Total of mentions:", Total),
             type = 'bar',
             orientation = 'h') %>%
  layout(yaxis = list(title = "Crime Description"),
         xaxis = list(title = "Number of Crime Complaints."),
         margin = list(b = 100, l = 200, r = 80)) %>%
  layout(showlegend = FALSE, 
         title = "Top 10 Crime Complaint in MANHATTAN of all times.", 
         autosize = TRUE
         ) %>%
  layout(autosize = F, width = 800, height = 500)

## Warning: Specifying width/height in layout() is now deprecated.
## Please specify in ggplotly() or plot_ly()

ggplotly(p)

Bronx borough

p <- plot_ly(data = head(boroughcomplaintdescdata_1,10), 
             y = ~ofns_desc,
             x = ~BRONX,
             text = ~paste("Total of mentions:", Total),
             type = 'bar',
             orientation = 'h') %>%
  layout(yaxis = list(title = "Crime Description"),
         xaxis = list(title = "Number of Crime Complaints."),
         margin = list(b = 100, l = 200, r = 80)) %>%
  layout(showlegend = FALSE, 
         title = " Top 10 Crime Complaint in BRONX of all times.", 
         autosize = TRUE
         ) %>%
  layout(autosize = F, width = 800, height = 500)

## Warning: Specifying width/height in layout() is now deprecated.
## Please specify in ggplotly() or plot_ly()

ggplotly(p)

Staten Island borough

p <- plot_ly(data = head(boroughcomplaintdescdata_1,10), 
             y = ~ofns_desc,
             x = ~`STATEN ISLAND`,
             text = ~paste("Total of mentions:", Total),
             type = 'bar',
             orientation = 'h') %>%
  layout(yaxis = list(title = "Crime Description"),
         xaxis = list(title = "Number of Crime Complaints."),
         margin = list(b = 100, l = 200, r = 80)) %>%
  layout(showlegend = FALSE, 
         title = " Top 10 Crime Complaint in STATEN ISLAND of all times.", 
         autosize = TRUE
         ) %>%
  layout(autosize = F, width = 800, height = 500)

## Warning: Specifying width/height in layout() is now deprecated.
## Please specify in ggplotly() or plot_ly()

ggplotly(p)

Brooklyn borough

p <- plot_ly(data = head(boroughcomplaintdescdata_1,10), 
             y = ~ofns_desc,
             x = ~BROOKLYN,
             text = ~paste("Total of mentions:", Total),
             type = 'bar',
             orientation = 'h') %>%
  layout(yaxis = list(title = "Crime Description"),
         xaxis = list(title = "Number of Crime Complaints."),
         margin = list(b = 100, l = 200, r = 80)) %>%
  layout(showlegend = FALSE, 
         title = " Top 10 Crime Complaint in BROOKLYN of all times.", 
         autosize = TRUE
         ) %>%
  layout(autosize = F, width = 800, height = 500)

## Warning: Specifying width/height in layout() is now deprecated.
## Please specify in ggplotly() or plot_ly()

ggplotly(p)

Queens borough

p <- plot_ly(data = head(boroughcomplaintdescdata_1,10), 
             y = ~ofns_desc,
             x = ~QUEENS,
             text = ~paste("Total of mentions:", Total),
             type = 'bar',
             orientation = 'h') %>%
  layout(yaxis = list(title = "Crime Description"),
         xaxis = list(title = "Number of Crime Complaints."),
         margin = list(b = 100, l = 200, r = 80)) %>%
  layout(showlegend = FALSE, 
         title = " Top 10 Crime Complaint in QUEENS of all times.", 
         autosize = TRUE
         ) %>%
  layout(autosize = TRUE, width = 800, height = 500)

## Warning: Specifying width/height in layout() is now deprecated.
## Please specify in ggplotly() or plot_ly()

ggplotly(p)

PART 5 - Observed patterns

we can quickly observe a few interesting patterns of contributing factors reported as crime descriptions as follows:

The top 10 crime complaint descriptions all over the 5 boroughs, that is Petit Larceny & Related Offenses , Harresment, Robbery, Assault 3 & Related Offenses , Criminal Miscief,Grand Larceny , Off. Agnst Pub ord Sensblty, Fellony Assault, MISCELLANEOUS PENAL LAW, Dangerous Drugs are the top criminal complaints descriptions all over the five boroughs.
MANHATTAN seems to present a very high number of crime complaints Petit Larceny & Related Offenses contrary to the BRONX , STATEN ISLAND ,Queens for example.
Brooklyn and Manhattan stand out due to Dangerous Drugs, doubling up Queens and about three times more than the Staten Island boroughs.
BROOKLYN present a high number of Roobery mentions compared to the other boroughs.

PART 6 - Modeling

Here, I’m going to find out whether Bronx borough County compalint rates correlated with Bronx (regional data) macro factors such as Unemployment rate and poverty.

Macro Factor Data Load

rawpoverty<-read.csv(url("https://raw.githubusercontent.com/omerozeren/DATA607/master/FINAL_PROJECT/Poverty_Data.csv"))
names(rawpoverty)[4]<-"county"
unemployment<-read.csv(url("https://raw.githubusercontent.com/omerozeren/DATA607/master/FINAL_PROJECT/Local_Area_Unemployment_Data.csv"))
names(unemployment)[1]<-"county"

Borough Crime Complaint Data

BRONX_data<-boroughmonthlycomplaintdata_1[boroughmonthlycomplaintdata_1$boro_nm=='BRONX',]
QUEENS_data<-boroughmonthlycomplaintdata_1[boroughmonthlycomplaintdata_1$boro_nm=='QUEENS',]
STATEN_ISLAND_data<-boroughmonthlycomplaintdata_1[boroughmonthlycomplaintdata_1$boro_nm=='STATEN ISLAND',]
MANHATTAN_data<-boroughmonthlycomplaintdata_1[boroughmonthlycomplaintdata_1$boro_nm=='MANHATTAN',]
BROOKLYN_data<-boroughmonthlycomplaintdata_1[boroughmonthlycomplaintdata_1$boro_nm=='BROOKLYN',]

Tidy Poverty Macro Factor Data

# 2013 values
pov13 <- rawpoverty %>% filter(Year == 2013) %>% select(county,pct = `All.Ages.in.Poverty.Percent`, inc = `Median.Household.Income.in.Dollars`)

# 2016 values
pov16 <- rawpoverty %>% filter(Year == 2016) %>% select(county,pct = `All.Ages.in.Poverty.Percent`, inc = `Median.Household.Income.in.Dollars`)

# Fix income values by removing $ and ,
pov13$inc <- as.numeric(str_replace_all(pov13$inc,"\\$|,",""))
pov16$inc <- as.numeric(str_replace_all(pov16$inc,"\\$|,",""))

# Combine our data frames
poverty <- inner_join(pov13, pov16, by=c("county" = "county"),
                      suffix = c("_2013","_2016"))

# Compute changes and impute interim values
poverty <- poverty %>%
  mutate(povChg = pct_2016 - pct_2013, incChg = inc_2016 - inc_2013,
         povIncrement = povChg / 3, incIncrement = incChg / 3,
         pct_2014 = pct_2013 + povIncrement, pct_2015 = pct_2016 - povIncrement,
         inc_2014 = inc_2013 + incIncrement, inc_2015 = inc_2016 - incIncrement)

# Tidy the data
poverty <- poverty %>% 
  gather(key="year",
         value="value",
         pct_2013,pct_2014,pct_2015,pct_2016,
         inc_2013,inc_2014,inc_2015,inc_2016)

poverty <- poverty %>% separate(year,c("measure","yr"),"_")

poverty <- poverty %>% spread(key="measure",value="value") %>% select(county, year = yr, income = inc, poverty = pct)

poverty$year <- as.numeric(poverty$year)

# Adjust income to 1,000's scale
poverty$income <- round(poverty$income/1000,2)

kable(head(poverty,20))

county	year	income	poverty
United States	2013	52.25	15.8
New York	2013	57.26	16.0
Albany County (NY)	2013	55.78	13.7
Allegany County (NY)	2013	41.85	16.7
Bronx County (NY)	2013	33.08	30.7
Broome County (NY)	2013	45.14	17.7
Cattaraugus County (NY)	2013	40.94	18.9
Cayuga County (NY)	2013	48.98	14.2
Chautauqua County (NY)	2013	40.47	19.1
Chemung County (NY)	2013	45.34	17.0
Chenango County (NY)	2013	44.33	16.8
Clinton County (NY)	2013	45.90	15.7
Columbia County (NY)	2013	57.09	11.8
Cortland County (NY)	2013	46.16	14.6
Delaware County (NY)	2013	43.38	17.8
Dutchess County (NY)	2013	69.30	9.5
Erie County (NY)	2013	51.12	15.1
Essex County (NY)	2013	46.34	11.7
Franklin County (NY)	2013	42.67	22.4
Fulton County (NY)	2013	42.26	15.7

Joint Macro Economics Datas

To support our analysis, we will join our data frames so we have all the variables we will be using in a single data frame.

library(zoo)

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

BRONX_data<-boroughmonthlycomplaintdata_1[boroughmonthlycomplaintdata_1$boro_nm=='BRONX',]
BRONX_data<-na.omit(BRONX_data)
names(BRONX_data)[4]<-"county"
unemployment$county<-as.character(unemployment$county)
bronx_unemp<-unemployment[unemployment$county =='Bronx County',]
bronx_poverty <- poverty[poverty$county == "Bronx County (NY)",] 
#Ordering the data
bronx_unemp<-bronx_unemp[order(bronx_unemp$Year, bronx_unemp$Month,decreasing = TRUE), ]
macro_data<- inner_join(bronx_poverty,bronx_unemp,by=c("year" = "Year"))
macro_data$year <- BRONX_data$Year[1:48]

Tidy Model Data

model_data <- macro_data %>%
  group_by(year, Month) %>%
  summarise(average_income = mean(income),
            average_poverty = mean(poverty),
            average_un_rate = mean(Unemployment.Rate))
#model_data$year<-as.character(model_data$year)
model_data<-model_data[order(model_data$year, model_data$Month,decreasing = TRUE), ]
model_data$Criminal_Rate <- BRONX_data$Percentage_int[1:48]

Analysis

Let’s build a model to see how the variables relate to the Bronx crime compliant rate.

# Linear model for 2013
model<- lm(Criminal_Rate ~ average_income + average_poverty + average_un_rate,data=model_data)
summary(model)

## 
## Call:
## lm(formula = Criminal_Rate ~ average_income + average_poverty + 
##     average_un_rate, data = model_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.76305 -0.35172  0.01825  0.35840  0.56284 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -9.886e+03  2.683e+03  -3.685 0.000624 ***
## average_income   1.034e+02  2.809e+01   3.680 0.000633 ***
## average_poverty  2.106e+02  5.712e+01   3.686 0.000621 ***
## average_un_rate  2.422e-01  7.796e-02   3.107 0.003307 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4263 on 44 degrees of freedom
## Multiple R-squared:  0.7279, Adjusted R-squared:  0.7094 
## F-statistic: 39.24 on 3 and 44 DF,  p-value: 1.691e-12

plot(model)

PART 7 - Conclusion and Recommendation:

After modeling analysis, it is remarkably true that there is statistically a significant linear relationship between the criminal rate of only the BRONX borough and income, unemployment, or poverty rates. I could not locate income, unemployment, or poverty rates data for the remaining four boroughs
A high-level crime description within all of NYC (inclusive of all five boroughs- Queens, Manhattan, Bronx, Staten Island, Brooklyn) displayed that the following complaints were the highest from 2010-2018- petit larceny, harassment, assault & related offenses, and criminal mischief.
The crime descriptions broken out by borough for the highest number of complaints:
Petit Larceny: Manhattan, Brooklyn, Bronx, Queens, Staten Island
Harassment: Brooklyn, Bronx, Manhattan, Queens, Staten Island
Assault & Related Offenses: Bronx, Brooklyn, Manhattan, Queens, Staten Island
Criminal Mischief: Brooklyn, Bronx, Manhattan, Queens, Staten Island
Overall Staten Island has the least criminal complaints amongst the five boroughs. As a result, I recommend my sister to list the following boroughs as her top three preferences to teach: Staten Island, Queens, Manhattan

PART 9 - References

NYPD Complaint Data - https://data.cityofnewyork.us/Public-Safety/NYPD-Complaint-Data-Current-Year-To-Date-/5uac-w243
Unemployment Data- https://data.ny.gov/Economic-Development/Local-Area-Unemployment-Statistics-Beginning-1976/5hyu-bdh8
Income and Poverty Estimates - https://www.census.gov/data-tools/demo/saipe/saipe.html?s_appName=saipe&map_yearSelector=2013&map_geoSelector=aa_c&s_state=36&s_year=2016,2013

Final Project- DATA 607

OMER OZEREN

Part 1 - Introduction

Goal

Motivation for performing the analysis

Data Details

Part 2 - Data

Load Data

Data Chacteristics

Dates

Amount of data collected

Procedure to Find summary yearly Values

Procedure to Find summary monthly Values

Procedure to Find summary yearly Values by the borough

Procedure to Find summary monthly Values by the borough

Procedure to Find summary daily Values by the borough

Yearly summary

Yearly Summary Graph

Monthly summary

Part 3 - Statistics by Borough

Borough yearly summary

Borough yearly summary Graph

Borough monthly summary

Bronx borough

Queens borough

Brooklyn borough

Manahttan borough

Staten Island borough

Unknown borough

Borough Criminal Complaint Description summary

Part 4 - Crime Description

MANHATTAN borough

Bronx borough

Staten Island borough

Brooklyn borough

Queens borough

PART 5 - Observed patterns

PART 6 - Modeling

Macro Factor Data Load

Borough Crime Complaint Data

Tidy Poverty Macro Factor Data

Joint Macro Economics Datas

Tidy Model Data

Analysis

PART 7 - Conclusion and Recommendation:

PART 9 - References