I want to measure the rate of criminal complaints in each borough and compare it with factors such as unemployment and income levels to determine whether they are correlated.
My sister just graduated with a master’s degree in education and is about to list her preferred schools by borough. I would like to conduct this analysis to help her determine which borough is the safest to teach in.
Data for this analysis is made up of three components.
The first dataset is a breakdown of every criminal complaint report filed in NYC by the NYPD up to October 2018. Each record represents one criminal complaint and includes the type of crime (e.g., felony, misdemeanor), the location (by city borough), and the time of enforcement. The data comes from the NYC Open Data portal.
The second dataset represents all unemployment statistics for NYC from the New York State website.
The third dataset represents income and poverty estimates for NYC by borough from the United States Census Bureau.
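As a rough illustration of how these three components could end up in one SQL database, here is a minimal sketch using SQLite. Several things in it are assumptions rather than part of the project so far: the JSON endpoint is derived from the dataset id in the NYPD complaint link, the column names (cmplnt_num, boro_nm, law_cat_cd, cmplnt_fr_dt) should be checked against the live dataset, and the function names are hypothetical. The Excel helper assumes pandas and openpyxl are installed.

```python
import json
import sqlite3
import urllib.parse
import urllib.request

# Assumed SODA endpoint, built from the dataset id in the source link.
COMPLAINTS_URL = "https://data.cityofnewyork.us/resource/5uac-w243.json"


def fetch_complaints(limit=1000, offset=0):
    """Fetch one page of complaint records as a list of dicts (network call)."""
    query = urllib.parse.urlencode({"$limit": limit, "$offset": offset})
    with urllib.request.urlopen(f"{COMPLAINTS_URL}?{query}", timeout=30) as resp:
        return json.load(resp)


def store_complaints(conn, records):
    """Insert complaint records, keeping only the columns used later."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS complaints ("
        "cmplnt_num TEXT PRIMARY KEY, boro_nm TEXT, "
        "law_cat_cd TEXT, cmplnt_fr_dt TEXT)"
    )
    rows = [
        (r.get("cmplnt_num"), r.get("boro_nm"),
         r.get("law_cat_cd"), r.get("cmplnt_fr_dt"))
        for r in records
    ]
    # INSERT OR REPLACE keeps re-runs from duplicating a complaint.
    conn.executemany("INSERT OR REPLACE INTO complaints VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def store_excel(conn, path, table):
    """Copy the first sheet of an Excel file into a SQL table."""
    import pandas as pd  # imported lazily; only this helper needs pandas

    pd.read_excel(path).to_sql(table, conn, if_exists="replace", index=False)


def complaints_by_borough(conn):
    """Return (borough, complaint_count) pairs, highest count first."""
    return conn.execute(
        "SELECT boro_nm, COUNT(*) AS n FROM complaints "
        "WHERE boro_nm IS NOT NULL GROUP BY boro_nm ORDER BY n DESC, boro_nm"
    ).fetchall()
```

A typical run would open a connection with sqlite3.connect("nyc.db"), call store_complaints(conn, fetch_complaints()) page by page, and call store_excel once for each Excel file. Raw counts from complaints_by_borough are only a first look; a fair comparison between boroughs would still need to divide by population.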
Write code to pull the crime data from the API, load the unemployment statistics and the income and poverty estimates from Excel, and store all three in SQL.
Use a supervised learning approach (Support Vector Machine, Gradient Boosting) to identify which borough has the most criminal activity and rank the boroughs. I will, however, likely look into unsupervised methods as part of the process as well.
Split the data into training and test sets, and tune a model to detect the rate of criminal activity.
Download images for all pages identified and visually confirm that I got what I wanted.
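The split-and-tune step above might look roughly like the sketch below, which assumes scikit-learn and frames the problem as regression on a complaint rate. That framing, the 25% hold-out, the small parameter grid, and the function name are all assumptions, not decisions already made for this project. The data is synthetic placeholder data, not real NYC figures; the two feature columns only stand in for inputs such as an unemployment rate and an income level.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split


def tune_gradient_boosting(X, y, seed=0):
    """Hold out 25% for testing, grid-search on the rest, return (model, test R^2)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    grid = GridSearchCV(
        GradientBoostingRegressor(random_state=seed),
        param_grid={"n_estimators": [50, 100], "max_depth": [2, 3]},
        cv=3,  # 3-fold cross-validation on the training portion only
    )
    grid.fit(X_train, y_train)
    # The test set is touched once, after tuning, to score the best model.
    return grid.best_estimator_, grid.score(X_test, y_test)


# Synthetic placeholder data, NOT real NYC figures.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0.0, 0.1, size=200)

model, test_r2 = tune_gradient_boosting(X, y)
print(f"held-out R^2: {test_r2:.2f}")
```

A Support Vector Machine could be tried the same way by swapping in sklearn.svm.SVR with its own parameter grid. With only five boroughs, the rows would have to be something finer than one row per borough (for example one row per borough and month) for a train/test split to be meaningful.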
NYPD Complaint Data - https://data.cityofnewyork.us/Public-Safety/NYPD-Complaint-Data-Current-Year-To-Date-/5uac-w243
Unemployment Data - https://data.ny.gov/Economic-Development/Local-Area-Unemployment-Statistics-Beginning-1976/5hyu-bdh8
Income and Poverty Estimates - https://www.census.gov/data-tools/demo/saipe/saipe.html?s_appName=saipe&map_yearSelector=2013&map_geoSelector=aa_c&s_state=36&s_year=2016,2013