I am seeking to understand the rates of criminal complaints by borough and then compare this with regional macro factors such as unemployment and income levels to determine if there is a correlation amongst them.
My sister just graduated with a master’s degree in education and is about to list the school of preference by borough. I would like to conduct this analysis to help her determine which borough is safest for her to be teaching.
Data for this analysis is made up of three components.
The first dataset represents a breakdown of every criminal complaint report filed in NYC by the NYPD up until December 2018. Each record represents a criminal complaint in NYC and includes information about the types of crime (ex. felony, misdemeanor, etc. ), the location (by city borough), and time of enforcement. The data is from the New York State website.
The second dataset represents all unemployment statistics for NYC from the New York State website.
The third dataset represents income and poverty estimates for NYC by borough from the United States Census Bureau.
# Raw JSON Url for updated information of NYC Reported Compliants.
download_data <- function(){
# Defining our limit of compliants
limit = 50000
url <- "https://data.cityofnewyork.us/resource/7x9x-zpz6.json?"
url <- paste(url, '&$limit=', limit, sep='')
# Extract data into JSON data frame from API query
Raw <- fromJSON(url)
# Return extracted data
return(Raw)
}
# Update complaint Table
complaint_Raw <- download_data()
First reported complaint : 2010-03-25.
Last reported complaint: 2018-12-31.
Total number of days passed since first and last reported complaint: 3203.
Total number of years passed since first and last reported complaint: 9.
# we are generatin colum for each daily complaint
complaint_Raw$num_complaint <- 1
Totalcomplaint <- complaint_Raw$num_complaint %>% sum()
yearlycomplaintdata <- complaint_Raw %>%
group_by(Year) %>%
summarise(`No. complaint` = sum(num_complaint),
Percentage = paste(round((`No. complaint`/Totalcomplaint)*100,2),"%"),
`Monthly Average` = round(`No. complaint`/12,0),
`Daily Average` = round(`No. complaint`/365,0)
) %>%
arrange(desc(`Year`))
kable(yearlycomplaintdata)
Year | No. complaint | Percentage | Monthly Average | Daily Average |
---|---|---|---|---|
2018 | 48777 | 97.66 % | 4065 | 134 |
2017 | 934 | 1.87 % | 78 | 3 |
2016 | 93 | 0.19 % | 8 | 0 |
2015 | 57 | 0.11 % | 5 | 0 |
2014 | 32 | 0.06 % | 3 | 0 |
2013 | 18 | 0.04 % | 2 | 0 |
2012 | 10 | 0.02 % | 1 | 0 |
2011 | 13 | 0.03 % | 1 | 0 |
2010 | 14 | 0.03 % | 1 | 0 |
monthlycomplaintdata <- complaint_Raw %>%
group_by(Year, Month, MonthYM) %>%
summarise(`No. complaint` = sum(num_complaint),
Percentage = paste(round((`No. complaint`/Totalcomplaint)*100,2),"%"),
`Daily Average` = round(`No. complaint`/30,0)
) %>%
arrange(desc(`MonthYM`))
monthlycomplaintdata_1 <- monthlycomplaintdata
monthlycomplaintdata <- subset(monthlycomplaintdata, select = -c(Year, MonthYM))
kable(head(monthlycomplaintdata,20))
Month | No. complaint | Percentage | Daily Average |
---|---|---|---|
Dec, 2018 | 3532 | 7.07 % | 118 |
Nov, 2018 | 3712 | 7.43 % | 124 |
Oct, 2018 | 4098 | 8.2 % | 137 |
Sep, 2018 | 4195 | 8.4 % | 140 |
Aug, 2018 | 4431 | 8.87 % | 148 |
Jul, 2018 | 4394 | 8.8 % | 146 |
Jun, 2018 | 4379 | 8.77 % | 146 |
May, 2018 | 4443 | 8.9 % | 148 |
Apr, 2018 | 3970 | 7.95 % | 132 |
Mar, 2018 | 3960 | 7.93 % | 132 |
Feb, 2018 | 3765 | 7.54 % | 126 |
Jan, 2018 | 3898 | 7.8 % | 130 |
Dec, 2017 | 379 | 0.76 % | 13 |
Nov, 2017 | 139 | 0.28 % | 5 |
Oct, 2017 | 98 | 0.2 % | 3 |
Sep, 2017 | 70 | 0.14 % | 2 |
Aug, 2017 | 48 | 0.1 % | 2 |
Jul, 2017 | 35 | 0.07 % | 1 |
Jun, 2017 | 29 | 0.06 % | 1 |
May, 2017 | 22 | 0.04 % | 1 |
boroughyearlycomplaintdata <- complaint_Raw %>%
group_by(boro_nm) %>%
summarise(`No. complaint` = sum(num_complaint),
Percentage = paste(round((`No. complaint`/Totalcomplaint)*100,2),"%"),
`Monthly Average` = round(`No. complaint`/12,0),
`Daily Average` = round(`No. complaint`/365,0)
) %>%
arrange(desc(`boro_nm`))
kable(boroughyearlycomplaintdata)
boro_nm | No. complaint | Percentage | Monthly Average | Daily Average |
---|---|---|---|---|
STATEN ISLAND | 2259 | 4.52 % | 188 | 6 |
QUEENS | 9620 | 19.26 % | 802 | 26 |
MANHATTAN | 12192 | 24.41 % | 1016 | 33 |
BROOKLYN | 14708 | 29.45 % | 1226 | 40 |
BRONX | 11093 | 22.21 % | 924 | 30 |
NA | 76 | 0.15 % | 6 | 0 |
boroughmonthlycomplaintdata <- complaint_Raw %>%
group_by(Year, Month, MonthYM, boro_nm) %>%
summarise(`No. complaint` = sum(num_complaint),
Percentage = paste(round((`No. complaint`/Totalcomplaint)*100,2),"%"),
Percentage_int = round((`No. complaint`/Totalcomplaint)*100,2),
`Daily Average` = round(`No. complaint`/30,0)
) %>%
arrange(desc(`MonthYM`))
boroughmonthlycomplaintdata_1 <- boroughmonthlycomplaintdata
boroughmonthlycomplaintdata <- subset(boroughmonthlycomplaintdata, select = -c(Year, MonthYM))
kable(head(boroughmonthlycomplaintdata,20))
Month | boro_nm | No. complaint | Percentage | Percentage_int | Daily Average |
---|---|---|---|---|---|
Dec, 2018 | NA | 6 | 0.01 % | 0.01 | 0 |
Dec, 2018 | BRONX | 774 | 1.55 % | 1.55 | 26 |
Dec, 2018 | BROOKLYN | 1009 | 2.02 % | 2.02 | 34 |
Dec, 2018 | MANHATTAN | 890 | 1.78 % | 1.78 | 30 |
Dec, 2018 | QUEENS | 695 | 1.39 % | 1.39 | 23 |
Dec, 2018 | STATEN ISLAND | 158 | 0.32 % | 0.32 | 5 |
Nov, 2018 | NA | 5 | 0.01 % | 0.01 | 0 |
Nov, 2018 | BRONX | 790 | 1.58 % | 1.58 | 26 |
Nov, 2018 | BROOKLYN | 1114 | 2.23 % | 2.23 | 37 |
Nov, 2018 | MANHATTAN | 919 | 1.84 % | 1.84 | 31 |
Nov, 2018 | QUEENS | 724 | 1.45 % | 1.45 | 24 |
Nov, 2018 | STATEN ISLAND | 160 | 0.32 % | 0.32 | 5 |
Oct, 2018 | NA | 2 | 0 % | 0.00 | 0 |
Oct, 2018 | BRONX | 933 | 1.87 % | 1.87 | 31 |
Oct, 2018 | BROOKLYN | 1221 | 2.44 % | 2.44 | 41 |
Oct, 2018 | MANHATTAN | 999 | 2 % | 2.00 | 33 |
Oct, 2018 | QUEENS | 772 | 1.55 % | 1.55 | 26 |
Oct, 2018 | STATEN ISLAND | 171 | 0.34 % | 0.34 | 6 |
Sep, 2018 | NA | 6 | 0.01 % | 0.01 | 0 |
Sep, 2018 | BRONX | 950 | 1.9 % | 1.90 | 32 |
boroughcomplaintdata <- complaint_Raw %>%
group_by(Year, Month, MonthYM, boro_nm) %>%
summarise(`No. complaint` = sum(num_complaint)) %>%
arrange(desc(`MonthYM`))
boroughcomplaintdata <- spread(data = boroughcomplaintdata,
key = boro_nm,
value = `No. complaint`
)
colnames(boroughcomplaintdata)[9] <- "Not Available"
kable(head(boroughcomplaintdata,20))
Year | Month | MonthYM | BRONX | BROOKLYN | MANHATTAN | QUEENS | STATEN ISLAND | Not Available |
---|---|---|---|---|---|---|---|---|
2010 | Apr, 2010 | 2010/04 | 1 | NA | 1 | NA | NA | NA |
2010 | Dec, 2010 | 2010/12 | NA | 2 | 1 | NA | NA | NA |
2010 | Feb, 2010 | 2010/02 | NA | 1 | NA | NA | NA | NA |
2010 | Jan, 2010 | 2010/01 | 1 | 1 | 2 | NA | NA | NA |
2010 | Jun, 2010 | 2010/06 | 1 | 2 | NA | NA | NA | NA |
2010 | Oct, 2010 | 2010/10 | NA | NA | NA | NA | 1 | NA |
2011 | Apr, 2011 | 2011/04 | NA | NA | NA | NA | 1 | NA |
2011 | Feb, 2011 | 2011/02 | NA | 1 | NA | NA | NA | NA |
2011 | Jan, 2011 | 2011/01 | NA | 1 | 1 | 2 | 1 | NA |
2011 | Mar, 2011 | 2011/03 | 1 | NA | 1 | NA | NA | NA |
2011 | May, 2011 | 2011/05 | NA | 1 | NA | NA | NA | NA |
2011 | Oct, 2011 | 2011/10 | NA | NA | 1 | NA | NA | NA |
2011 | Sep, 2011 | 2011/09 | 1 | NA | 1 | NA | NA | NA |
2012 | Aug, 2012 | 2012/08 | NA | 1 | NA | NA | NA | NA |
2012 | Dec, 2012 | 2012/12 | 1 | NA | NA | NA | NA | NA |
2012 | Feb, 2012 | 2012/02 | NA | NA | NA | 1 | NA | NA |
2012 | Jan, 2012 | 2012/01 | NA | NA | NA | NA | 1 | NA |
2012 | Jul, 2012 | 2012/07 | NA | 1 | NA | NA | 1 | NA |
2012 | May, 2012 | 2012/05 | 1 | NA | NA | NA | NA | NA |
2012 | Oct, 2012 | 2012/10 | NA | NA | NA | 1 | NA | NA |
Year | No. complaint | Percentage | Monthly Average | Daily Average |
---|---|---|---|---|
2018 | 48777 | 97.66 % | 4065 | 134 |
2017 | 934 | 1.87 % | 78 | 3 |
2016 | 93 | 0.19 % | 8 | 0 |
2015 | 57 | 0.11 % | 5 | 0 |
2014 | 32 | 0.06 % | 3 | 0 |
2013 | 18 | 0.04 % | 2 | 0 |
2012 | 10 | 0.02 % | 1 | 0 |
2011 | 13 | 0.03 % | 1 | 0 |
2010 | 14 | 0.03 % | 1 | 0 |
Year | Current.Reported | Curent.Daily.Avg | Forecasted.Month.Avg | Forecasted.Yearly.Avg | Previous.Year | Difference |
---|---|---|---|---|---|---|
2018 | 48777 | 372 | 11160 | 135780 | 934 | 134846 |
In this section, I have listed our given data into monthly results, with hopes of finding some patterns; one of these patterns could be related to the season of the year for example.
Month | No. complaint | Percentage | Daily Average |
---|---|---|---|
Dec, 2018 | 3532 | 7.07 % | 118 |
Nov, 2018 | 3712 | 7.43 % | 124 |
Oct, 2018 | 4098 | 8.2 % | 137 |
Sep, 2018 | 4195 | 8.4 % | 140 |
Aug, 2018 | 4431 | 8.87 % | 148 |
Jul, 2018 | 4394 | 8.8 % | 146 |
Jun, 2018 | 4379 | 8.77 % | 146 |
May, 2018 | 4443 | 8.9 % | 148 |
Apr, 2018 | 3970 | 7.95 % | 132 |
Mar, 2018 | 3960 | 7.93 % | 132 |
Feb, 2018 | 3765 | 7.54 % | 126 |
Jan, 2018 | 3898 | 7.8 % | 130 |
Dec, 2017 | 379 | 0.76 % | 13 |
Nov, 2017 | 139 | 0.28 % | 5 |
Oct, 2017 | 98 | 0.2 % | 3 |
Sep, 2017 | 70 | 0.14 % | 2 |
Aug, 2017 | 48 | 0.1 % | 2 |
Jul, 2017 | 35 | 0.07 % | 1 |
Jun, 2017 | 29 | 0.06 % | 1 |
May, 2017 | 22 | 0.04 % | 1 |
I’m going to show only Date from 2017 monthly visulation
boro_nm | No. complaint | Percentage | Monthly Average | Daily Average |
---|---|---|---|---|
BROOKLYN | 14708 | 29.45 % | 1226 | 40 |
MANHATTAN | 12192 | 24.41 % | 1016 | 33 |
BRONX | 11093 | 22.21 % | 924 | 30 |
QUEENS | 9620 | 19.26 % | 802 | 26 |
STATEN ISLAND | 2259 | 4.52 % | 188 | 6 |
NA | 76 | 0.15 % | 6 | 0 |
Month | BRONX | BROOKLYN | MANHATTAN | QUEENS | STATEN ISLAND | Not Available | Total |
---|---|---|---|---|---|---|---|
Dec, 2018 | 774 | 1009 | 890 | 695 | 158 | 6 | 3532 |
Nov, 2018 | 790 | 1114 | 919 | 724 | 160 | 5 | 3712 |
Oct, 2018 | 933 | 1221 | 999 | 772 | 171 | 2 | 4098 |
Sep, 2018 | 950 | 1201 | 1009 | 851 | 178 | 6 | 4195 |
Aug, 2018 | 992 | 1338 | 1052 | 822 | 221 | 6 | 4431 |
Jul, 2018 | 1025 | 1299 | 1002 | 851 | 205 | 12 | 4394 |
Jun, 2018 | 934 | 1283 | 1075 | 893 | 186 | 8 | 4379 |
May, 2018 | 956 | 1346 | 1043 | 859 | 232 | 7 | 4443 |
Apr, 2018 | 876 | 1141 | 986 | 770 | 189 | 8 | 3970 |
Mar, 2018 | 891 | 1182 | 995 | 739 | 148 | 5 | 3960 |
ofns_desc | BRONX | BROOKLYN | MANHATTAN | QUEENS | STATEN ISLAND | Not Available | Total |
---|---|---|---|---|---|---|---|
PETIT LARCENY | 1767 | 2580 | 2943 | 1673 | 344 | 0 | 9307 |
HARRASSMENT 2 | 1804 | 2250 | 1539 | 1542 | 486 | 0 | 7621 |
ASSAULT 3 & RELATED OFFENSES | 1575 | 1633 | 1104 | 1115 | 236 | 0 | 5663 |
CRIMINAL MISCHIEF & RELATED OF | 1096 | 1478 | 1011 | 1077 | 269 | 0 | 4931 |
GRAND LARCENY | 706 | 1166 | 1826 | 830 | 137 | 1 | 4666 |
OFF. AGNST PUB ORD SENSBLTY & | 483 | 727 | 515 | 408 | 132 | 0 | 2265 |
FELONY ASSAULT | 621 | 645 | 395 | 427 | 71 | 0 | 2159 |
DANGEROUS DRUGS | 571 | 488 | 407 | 93 | 59 | 0 | 1618 |
ROBBERY | 353 | 464 | 311 | 259 | 39 | 0 | 1426 |
MISCELLANEOUS PENAL LAW | 204 | 566 | 189 | 376 | 83 | 0 | 1418 |
BURGLARY | 254 | 445 | 280 | 255 | 31 | 0 | 1265 |
DANGEROUS WEAPONS | 267 | 290 | 177 | 166 | 30 | 0 | 930 |
OFFENSES AGAINST PUBLIC ADMINI | 290 | 176 | 203 | 117 | 46 | 0 | 832 |
SEX CRIMES | 157 | 282 | 216 | 144 | 23 | 0 | 822 |
VEHICLE AND TRAFFIC LAWS | 187 | 220 | 103 | 198 | 31 | 0 | 739 |
GRAND LARCENY OF MOTOR VEHICLE | 135 | 168 | 75 | 176 | 21 | 0 | 575 |
INTOXICATED & IMPAIRED DRIVING | 91 | 183 | 87 | 126 | 73 | 0 | 560 |
FORGERY | 80 | 228 | 141 | 93 | 13 | 0 | 555 |
THEFT-FRAUD | 53 | 125 | 156 | 113 | 27 | 0 | 474 |
CRIMINAL TRESPASS | 48 | 98 | 80 | 89 | 26 | 0 | 341 |
FRAUDS | 37 | 59 | 107 | 46 | 23 | 0 | 272 |
RAPE | 44 | 62 | 42 | 44 | 12 | 0 | 204 |
POSSESSION OF STOLEN PROPERTY | 31 | 68 | 42 | 43 | 3 | 0 | 187 |
UNAUTHORIZED USE OF A VEHICLE | 25 | 45 | 20 | 73 | 7 | 0 | 170 |
OFFENSES INVOLVING FRAUD | 40 | 28 | 40 | 14 | 3 | 0 | 125 |
OTHER OFFENSES RELATED TO THEF | 26 | 35 | 27 | 24 | 7 | 0 | 119 |
ADMINISTRATIVE CODE | 37 | 41 | 12 | 20 | 5 | 0 | 115 |
OFFENSES AGAINST THE PERSON | 20 | 38 | 27 | 22 | 8 | 0 | 115 |
ARSON | 26 | 33 | 19 | 22 | 4 | 0 | 104 |
MURDER & NON-NEGL. MANSLAUGHTER | 0 | 0 | 0 | 0 | 0 | 75 | 75 |
NYS LAWS-UNCLASSIFIED FELONY | 16 | 12 | 25 | 9 | 4 | 0 | 66 |
OTHER STATE LAWS (NON PENAL LA | 7 | 13 | 8 | 3 | 2 | 0 | 33 |
BURGLAR’S TOOLS | 3 | 9 | 15 | 3 | 0 | 0 | 30 |
THEFT OF SERVICES | 8 | 8 | 11 | 2 | 0 | 0 | 29 |
FRAUDULENT ACCOSTING | 2 | 3 | 13 | 2 | 0 | 0 | 20 |
GAMBLING | 1 | 7 | 5 | 2 | 0 | 0 | 15 |
ALCOHOLIC BEVERAGE CONTROL LAW | 4 | 6 | 2 | 2 | 0 | 0 | 14 |
KIDNAPPING & RELATED OFFENSES | 4 | 6 | 2 | 2 | 0 | 0 | 14 |
AGRICULTURE & MRKTS LAW-UNCLASSIFIED | 5 | 2 | 2 | 3 | 1 | 0 | 13 |
PETIT LARCENY OF MOTOR VEHICLE | 4 | 2 | 5 | 1 | 0 | 0 | 12 |
PROSTITUTION & RELATED OFFENSES | 5 | 3 | 0 | 2 | 0 | 0 | 10 |
JOSTLING | 2 | 0 | 6 | 0 | 0 | 0 | 8 |
OFFENSES RELATED TO CHILDREN | 0 | 5 | 0 | 3 | 0 | 0 | 8 |
OFFENSES AGAINST PUBLIC SAFETY | 2 | 3 | 0 | 1 | 0 | 0 | 6 |
DISORDERLY CONDUCT | 0 | 3 | 2 | 0 | 0 | 0 | 5 |
ENDAN WELFARE INCOMP | 2 | 1 | 0 | 0 | 0 | 0 | 3 |
CHILD ABANDONMENT/NON SUPPORT | 0 | 1 | 0 | 0 | 1 | 0 | 2 |
OTHER STATE LAWS | 0 | 0 | 2 | 0 | 0 | 0 | 2 |
HOMICIDE-NEGLIGENT,UNCLASSIFIE | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
KIDNAPPING | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
LOITERING/GAMBLING (CARDS, DIC | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
NEW YORK CITY HEALTH CODE | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
p <- plot_ly(data = head(boroughcomplaintdescdata_1),
y = ~ofns_desc,
x = ~Total,
text = ~paste("Total of mentions:", Total),
type = 'bar',
orientation = 'h') %>%
layout(yaxis = list(title = "Crime Description"),
xaxis = list(title = "Number of Crime Complaints."),
margin = list(b = 100, l = 200, r = 70)) %>%
layout(showlegend = FALSE,
title = " Top 10 Crime Complaint in NYC of all times.",
autosize = TRUE
) %>%
layout(autosize = F, width = 800, height = 500)
## Warning: Specifying width/height in layout() is now deprecated.
## Please specify in ggplotly() or plot_ly()
ggplotly(p)
p <- plot_ly(data = head(boroughcomplaintdescdata_1,10),
y = ~ofns_desc,
x = ~MANHATTAN,
text = ~paste("Total of mentions:", Total),
type = 'bar',
orientation = 'h') %>%
layout(yaxis = list(title = "Crime Description"),
xaxis = list(title = "Number of Crime Complaints."),
margin = list(b = 100, l = 200, r = 80)) %>%
layout(showlegend = FALSE,
title = "Top 10 Crime Complaint in MANHATTAN of all times.",
autosize = TRUE
) %>%
layout(autosize = F, width = 800, height = 500)
## Warning: Specifying width/height in layout() is now deprecated.
## Please specify in ggplotly() or plot_ly()
ggplotly(p)
p <- plot_ly(data = head(boroughcomplaintdescdata_1,10),
y = ~ofns_desc,
x = ~BRONX,
text = ~paste("Total of mentions:", Total),
type = 'bar',
orientation = 'h') %>%
layout(yaxis = list(title = "Crime Description"),
xaxis = list(title = "Number of Crime Complaints."),
margin = list(b = 100, l = 200, r = 80)) %>%
layout(showlegend = FALSE,
title = " Top 10 Crime Complaint in BRONX of all times.",
autosize = TRUE
) %>%
layout(autosize = F, width = 800, height = 500)
## Warning: Specifying width/height in layout() is now deprecated.
## Please specify in ggplotly() or plot_ly()
ggplotly(p)
p <- plot_ly(data = head(boroughcomplaintdescdata_1,10),
y = ~ofns_desc,
x = ~`STATEN ISLAND`,
text = ~paste("Total of mentions:", Total),
type = 'bar',
orientation = 'h') %>%
layout(yaxis = list(title = "Crime Description"),
xaxis = list(title = "Number of Crime Complaints."),
margin = list(b = 100, l = 200, r = 80)) %>%
layout(showlegend = FALSE,
title = " Top 10 Crime Complaint in STATEN ISLAND of all times.",
autosize = TRUE
) %>%
layout(autosize = F, width = 800, height = 500)
## Warning: Specifying width/height in layout() is now deprecated.
## Please specify in ggplotly() or plot_ly()
ggplotly(p)
p <- plot_ly(data = head(boroughcomplaintdescdata_1,10),
y = ~ofns_desc,
x = ~BROOKLYN,
text = ~paste("Total of mentions:", Total),
type = 'bar',
orientation = 'h') %>%
layout(yaxis = list(title = "Crime Description"),
xaxis = list(title = "Number of Crime Complaints."),
margin = list(b = 100, l = 200, r = 80)) %>%
layout(showlegend = FALSE,
title = " Top 10 Crime Complaint in BROOKLYN of all times.",
autosize = TRUE
) %>%
layout(autosize = F, width = 800, height = 500)
## Warning: Specifying width/height in layout() is now deprecated.
## Please specify in ggplotly() or plot_ly()
ggplotly(p)
p <- plot_ly(data = head(boroughcomplaintdescdata_1,10),
y = ~ofns_desc,
x = ~QUEENS,
text = ~paste("Total of mentions:", Total),
type = 'bar',
orientation = 'h') %>%
layout(yaxis = list(title = "Crime Description"),
xaxis = list(title = "Number of Crime Complaints."),
margin = list(b = 100, l = 200, r = 80)) %>%
layout(showlegend = FALSE,
title = " Top 10 Crime Complaint in QUEENS of all times.",
autosize = TRUE
) %>%
layout(autosize = TRUE, width = 800, height = 500)
## Warning: Specifying width/height in layout() is now deprecated.
## Please specify in ggplotly() or plot_ly()
ggplotly(p)
we can quickly observe a few interesting patterns of contributing factors reported as crime descriptions as follows:
The top 10 crime complaint descriptions all over the 5 boroughs, that is Petit Larceny & Related Offenses , Harresment, Robbery, Assault 3 & Related Offenses , Criminal Miscief,Grand Larceny , Off. Agnst Pub ord Sensblty, Fellony Assault, MISCELLANEOUS PENAL LAW, Dangerous Drugs are the top criminal complaints descriptions all over the five boroughs.
MANHATTAN seems to present a very high number of crime complaints Petit Larceny & Related Offenses contrary to the BRONX , STATEN ISLAND ,Queens for example.
Brooklyn and Manhattan stand out due to Dangerous Drugs, doubling up Queens and about three times more than the Staten Island boroughs.
BROOKLYN present a high number of Roobery mentions compared to the other boroughs.
Here, I’m going to find out whether Bronx borough County compalint rates correlated with Bronx (regional data) macro factors such as Unemployment rate and poverty.
rawpoverty<-read.csv(url("https://raw.githubusercontent.com/omerozeren/DATA607/master/FINAL_PROJECT/Poverty_Data.csv"))
names(rawpoverty)[4]<-"county"
unemployment<-read.csv(url("https://raw.githubusercontent.com/omerozeren/DATA607/master/FINAL_PROJECT/Local_Area_Unemployment_Data.csv"))
names(unemployment)[1]<-"county"
BRONX_data<-boroughmonthlycomplaintdata_1[boroughmonthlycomplaintdata_1$boro_nm=='BRONX',]
QUEENS_data<-boroughmonthlycomplaintdata_1[boroughmonthlycomplaintdata_1$boro_nm=='QUEENS',]
STATEN_ISLAND_data<-boroughmonthlycomplaintdata_1[boroughmonthlycomplaintdata_1$boro_nm=='STATEN ISLAND',]
MANHATTAN_data<-boroughmonthlycomplaintdata_1[boroughmonthlycomplaintdata_1$boro_nm=='MANHATTAN',]
BROOKLYN_data<-boroughmonthlycomplaintdata_1[boroughmonthlycomplaintdata_1$boro_nm=='BROOKLYN',]
# 2013 values
pov13 <- rawpoverty %>% filter(Year == 2013) %>% select(county,pct = `All.Ages.in.Poverty.Percent`, inc = `Median.Household.Income.in.Dollars`)
# 2016 values
pov16 <- rawpoverty %>% filter(Year == 2016) %>% select(county,pct = `All.Ages.in.Poverty.Percent`, inc = `Median.Household.Income.in.Dollars`)
# Fix income values by removing $ and ,
pov13$inc <- as.numeric(str_replace_all(pov13$inc,"\\$|,",""))
pov16$inc <- as.numeric(str_replace_all(pov16$inc,"\\$|,",""))
# Combine our data frames
poverty <- inner_join(pov13, pov16, by=c("county" = "county"),
suffix = c("_2013","_2016"))
# Compute changes and impute interim values
poverty <- poverty %>%
mutate(povChg = pct_2016 - pct_2013, incChg = inc_2016 - inc_2013,
povIncrement = povChg / 3, incIncrement = incChg / 3,
pct_2014 = pct_2013 + povIncrement, pct_2015 = pct_2016 - povIncrement,
inc_2014 = inc_2013 + incIncrement, inc_2015 = inc_2016 - incIncrement)
# Tidy the data
poverty <- poverty %>%
gather(key="year",
value="value",
pct_2013,pct_2014,pct_2015,pct_2016,
inc_2013,inc_2014,inc_2015,inc_2016)
poverty <- poverty %>% separate(year,c("measure","yr"),"_")
poverty <- poverty %>% spread(key="measure",value="value") %>% select(county, year = yr, income = inc, poverty = pct)
poverty$year <- as.numeric(poverty$year)
# Adjust income to 1,000's scale
poverty$income <- round(poverty$income/1000,2)
kable(head(poverty,20))
county | year | income | poverty |
---|---|---|---|
United States | 2013 | 52.25 | 15.8 |
New York | 2013 | 57.26 | 16.0 |
Albany County (NY) | 2013 | 55.78 | 13.7 |
Allegany County (NY) | 2013 | 41.85 | 16.7 |
Bronx County (NY) | 2013 | 33.08 | 30.7 |
Broome County (NY) | 2013 | 45.14 | 17.7 |
Cattaraugus County (NY) | 2013 | 40.94 | 18.9 |
Cayuga County (NY) | 2013 | 48.98 | 14.2 |
Chautauqua County (NY) | 2013 | 40.47 | 19.1 |
Chemung County (NY) | 2013 | 45.34 | 17.0 |
Chenango County (NY) | 2013 | 44.33 | 16.8 |
Clinton County (NY) | 2013 | 45.90 | 15.7 |
Columbia County (NY) | 2013 | 57.09 | 11.8 |
Cortland County (NY) | 2013 | 46.16 | 14.6 |
Delaware County (NY) | 2013 | 43.38 | 17.8 |
Dutchess County (NY) | 2013 | 69.30 | 9.5 |
Erie County (NY) | 2013 | 51.12 | 15.1 |
Essex County (NY) | 2013 | 46.34 | 11.7 |
Franklin County (NY) | 2013 | 42.67 | 22.4 |
Fulton County (NY) | 2013 | 42.26 | 15.7 |
To support our analysis, we will join our data frames so we have all the variables we will be using in a single data frame.
library(zoo)
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
BRONX_data<-boroughmonthlycomplaintdata_1[boroughmonthlycomplaintdata_1$boro_nm=='BRONX',]
BRONX_data<-na.omit(BRONX_data)
names(BRONX_data)[4]<-"county"
unemployment$county<-as.character(unemployment$county)
bronx_unemp<-unemployment[unemployment$county =='Bronx County',]
bronx_poverty <- poverty[poverty$county == "Bronx County (NY)",]
#Ordering the data
bronx_unemp<-bronx_unemp[order(bronx_unemp$Year, bronx_unemp$Month,decreasing = TRUE), ]
macro_data<- inner_join(bronx_poverty,bronx_unemp,by=c("year" = "Year"))
macro_data$year <- BRONX_data$Year[1:48]
model_data <- macro_data %>%
group_by(year, Month) %>%
summarise(average_income = mean(income),
average_poverty = mean(poverty),
average_un_rate = mean(Unemployment.Rate))
#model_data$year<-as.character(model_data$year)
model_data<-model_data[order(model_data$year, model_data$Month,decreasing = TRUE), ]
model_data$Criminal_Rate <- BRONX_data$Percentage_int[1:48]
Let’s build a model to see how the variables relate to the Bronx crime compliant rate.
# Linear model for 2013
model<- lm(Criminal_Rate ~ average_income + average_poverty + average_un_rate,data=model_data)
summary(model)
##
## Call:
## lm(formula = Criminal_Rate ~ average_income + average_poverty +
## average_un_rate, data = model_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.76305 -0.35172 0.01825 0.35840 0.56284
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9.886e+03 2.683e+03 -3.685 0.000624 ***
## average_income 1.034e+02 2.809e+01 3.680 0.000633 ***
## average_poverty 2.106e+02 5.712e+01 3.686 0.000621 ***
## average_un_rate 2.422e-01 7.796e-02 3.107 0.003307 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4263 on 44 degrees of freedom
## Multiple R-squared: 0.7279, Adjusted R-squared: 0.7094
## F-statistic: 39.24 on 3 and 44 DF, p-value: 1.691e-12
plot(model)
After modeling analysis, it is remarkably true that there is statistically a significant linear relationship between the criminal rate of only the BRONX borough and income, unemployment, or poverty rates. I could not locate income, unemployment, or poverty rates data for the remaining four boroughs
A high-level crime description within all of NYC (inclusive of all five boroughs- Queens, Manhattan, Bronx, Staten Island, Brooklyn) displayed that the following complaints were the highest from 2010-2018- petit larceny, harassment, assault & related offenses, and criminal mischief.
The crime descriptions broken out by borough for the highest number of complaints:
Petit Larceny: Manhattan, Brooklyn, Bronx, Queens, Staten Island
Harassment: Brooklyn, Bronx, Manhattan, Queens, Staten Island
Assault & Related Offenses: Bronx, Brooklyn, Manhattan, Queens, Staten Island
Criminal Mischief: Brooklyn, Bronx, Manhattan, Queens, Staten Island
Overall Staten Island has the least criminal complaints amongst the five boroughs. As a result, I recommend my sister to list the following boroughs as her top three preferences to teach: Staten Island, Queens, Manhattan
NYPD Complaint Data - https://data.cityofnewyork.us/Public-Safety/NYPD-Complaint-Data-Current-Year-To-Date-/5uac-w243
Unemployment Data- https://data.ny.gov/Economic-Development/Local-Area-Unemployment-Statistics-Beginning-1976/5hyu-bdh8
Income and Poverty Estimates - https://www.census.gov/data-tools/demo/saipe/saipe.html?s_appName=saipe&map_yearSelector=2013&map_geoSelector=aa_c&s_state=36&s_year=2016,2013