DATA608 - Final Project Proposal
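library(RSocrata)    # read.socrata() for the NYC Open Data SODA API
library(tidyverse)   # read_csv(), the %>% pipe, glimpse()
library(plotly)      # interactive plots
library(blsAPI)
library(jsonlite)
library(knitr)       # kable() tables
library(kableExtra)  # kable() styling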
Objective
This project aims to create an interactive Shiny dashboard that draws on the NYPD Shooting Incident Data, retrieved from NYC Open Data through the Socrata Open Data API (SODA), and presents a multi-angle view of the data, including time series, location, and other informative attributes. The purpose of the project is to develop an interactive data visualization app using a public SODA API and dynamic visualization tools such as plotly.
Data Introduction
Data Source
1. NYPD Shooting Incident Data (Year 2020)
List of every shooting incident that occurred in NYC during the current calendar year. This data is manually extracted every quarter and reviewed by the Office of Management Analysis and Planning before being posted on the NYPD website. Each record represents a shooting incident in NYC and includes information about the event and the location and time of occurrence, as well as suspect and victim demographics. The public can use this data to explore the nature of police enforcement activity.
URL: https://data.cityofnewyork.us/Public-Safety/NYPD-Shooting-Incident-Data-Year-To-Date-/5ucz-vwe8
SODA API: https://data.cityofnewyork.us/resource/5ucz-vwe8.csv
2. NYPD Shooting Incident Data (Historic)
List of every shooting incident that occurred in NYC from 2006 through 2019.
URL: https://data.cityofnewyork.us/Public-Safety/NYPD-Shooting-Incident-Data-Historic-/833y-fsy8
SODA API: https://data.cityofnewyork.us/resource/833y-fsy8.csv
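Because SODA endpoints accept SoQL query parameters such as $select and $where, an extract can be narrowed on the server before it is downloaded. The following is a minimal base-R sketch, assuming a hypothetical helper build_soda_url() and an illustrative Bronx-only filter. It is not part of the original project code.

```r
# Hypothetical helper: append URL-encoded SoQL parameters to a SODA endpoint
build_soda_url <- function(base_url, params) {
  # Percent-encode every value, including spaces, commas, quotes and '=' signs
  encoded <- vapply(params, utils::URLencode, character(1), reserved = TRUE)
  query <- paste0(names(params), "=", encoded, collapse = "&")
  paste0(base_url, "?", query)
}

# Illustrative query: a few columns of the historic data, Bronx incidents only
bronx_url <- build_soda_url(
  "https://data.cityofnewyork.us/resource/833y-fsy8.csv",
  list(`$select` = "incident_key, occur_date, boro",
       `$where`  = "boro = 'BRONX'")
)
print(bronx_url)
```

RSocrata's read.socrata() should accept a URL that carries SoQL parameters like this one. Filtering on the server keeps downloads small if the dashboard later lets users pick a borough.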
Data Dictionary
# Load the data dictionary kept in the project repository and render it as a table
read_csv('https://raw.githubusercontent.com/oggyluky11/DATA608-SPRING-2021/main/Final-Project/NYPD_Shooting_Incident_Data_Dictionary.csv') %>%
  kable()
Column Name | Column Description |
---|---|
INCIDENT_KEY | Randomly generated persistent ID for each incident |
OCCUR_DATE | Exact date of the shooting incident |
OCCUR_TIME | Exact time of the shooting incident |
BORO | Borough where the shooting incident occurred |
PRECINCT | Precinct where the shooting incident occurred |
JURISDICTION_CODE | Jurisdiction where the shooting incident occurred. Jurisdiction codes 0 (Patrol), 1 (Transit) and 2 (Housing) represent NYPD, whilst codes 3 and above represent non-NYPD jurisdictions |
LOCATION_DESC | Location of the shooting incident |
STATISTICAL_MURDER_FLAG | Shooting resulted in the victim’s death which would be counted as a murder |
PERP_AGE_GROUP | Perpetrator’s age within a category |
PERP_SEX | Perpetrator’s sex description |
PERP_RACE | Perpetrator’s race description |
VIC_AGE_GROUP | Victim’s age within a category |
VIC_SEX | Victim’s sex description |
VIC_RACE | Victim’s race description |
X_COORD_CD | Midblock X-coordinate for New York State Plane Coordinate System, Long Island Zone, NAD 83, units feet (FIPS 3104) |
Y_COORD_CD | Midblock Y-coordinate for New York State Plane Coordinate System, Long Island Zone, NAD 83, units feet (FIPS 3104) |
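The JURISDICTION_CODE description above amounts to a small lookup table. The sketch below shows one way to turn the codes into readable labels for chart legends. It assumes a hypothetical helper jurisdiction_label(), and the label "Non-NYPD" for codes 3 and above is our own naming.

```r
# Hypothetical helper: map numeric jurisdiction codes to readable labels
jurisdiction_label <- function(code) {
  # Codes 0-2 are NYPD jurisdictions, per the data dictionary
  labels <- c("0" = "Patrol", "1" = "Transit", "2" = "Housing")
  out <- unname(labels[as.character(code)])
  # Codes 3 and above are non-NYPD; missing codes stay NA
  out[!is.na(code) & code >= 3] <- "Non-NYPD"
  out
}

jurisdiction_label(c(0L, 1L, 2L, 5L))
# returns c("Patrol", "Transit", "Housing", "Non-NYPD")
```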
Data Exploration
# Historic data, 2006 - 2019
shooting_url1 <- 'https://data.cityofnewyork.us/resource/833y-fsy8.csv'
shooting_data1 <- read.socrata(shooting_url1)
# Year-to-date data, 2020
shooting_url2 <- 'https://data.cityofnewyork.us/resource/5ucz-vwe8.csv'
shooting_data2 <- read.socrata(shooting_url2)
# Stack the two extracts into one data frame (rbind matches columns by name)
shooting_data <- shooting_data1 %>%
  rbind(shooting_data2)
glimpse(shooting_data)
## Rows: 23,568
## Columns: 19
## $ incident_key <int> 109548290, 74428638, 65440608, 163842829, 8...
## $ occur_date <dttm> 2014-04-11, 2010-09-01, 2009-09-05, 2017-0...
## $ occur_time <chr> "20:14:00", "22:00:00", "9:58:00", "21:45:0...
## $ boro <chr> "STATEN ISLAND", "MANHATTAN", "BROOKLYN", "...
## $ precinct <int> 122, 23, 77, 44, 75, 52, 23, 105, 103, 73, ...
## $ jurisdiction_code <int> 2, 2, 0, 0, 0, 0, 2, 0, 0, 2, 0, 0, 0, 0, 0...
## $ location_desc <chr> "MULTI DWELL - PUBLIC HOUS", "MULTI DWELL -...
## $ statistical_murder_flag <chr> "false", "false", "false", "true", "true", ...
## $ perp_age_group <chr> "", "18-24", "18-24", "25-44", "", "UNKNOWN...
## $ perp_sex <chr> "", "M", "M", "M", "", "M", "U", "", "F", "...
## $ perp_race <chr> "", "BLACK HISPANIC", "BLACK", "BLACK", "",...
## $ vic_age_group <chr> "18-24", "18-24", "18-24", "25-44", "45-64"...
## $ vic_sex <chr> "M", "M", "M", "M", "M", "M", "M", "M", "M"...
## $ vic_race <chr> "BLACK", "BLACK", "BLACK", "BLACK", "BLACK ...
## $ x_coord_cd <chr> "956246", "1000102", "997761", "1008325", "...
## $ y_coord_cd <chr> "153097", "229680", "186202", "246288", "18...
## $ latitude <dbl> 40.58686, 40.79709, 40.67776, 40.84265, 40....
## $ longitude <dbl> -74.10083, -73.94275, -73.95129, -73.91299,...
## $ geocoded_column <chr> "POINT (-74.10082545299997 40.5868559840000...
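The glimpse() output shows several columns that need tidying before plotting:

- statistical_murder_flag holds the strings "true" and "false".
- The perpetrator fields use empty strings for missing values.
- The state-plane coordinates arrive as text.
- Date and time are stored in separate columns.

The following is a minimal base-R sketch of that cleanup. It assumes a hypothetical helper clean_shootings() and uses a two-row sample copied from the first and third records shown above. The America/New_York time zone is an assumption.

```r
# Hypothetical cleaning helper for the combined shooting_data frame
clean_shootings <- function(df) {
  # "true"/"false" strings -> logical
  df$statistical_murder_flag <- df$statistical_murder_flag == "true"
  # Empty strings in the perpetrator fields -> NA
  for (col in c("perp_age_group", "perp_sex", "perp_race")) {
    df[[col]][df[[col]] == ""] <- NA
  }
  # State-plane coordinates stored as text -> numeric (feet)
  df$x_coord_cd <- as.numeric(df$x_coord_cd)
  df$y_coord_cd <- as.numeric(df$y_coord_cd)
  # Combine the date part of occur_date with occur_time (assumed NYC local time)
  df$occur_datetime <- as.POSIXct(
    paste(format(df$occur_date, "%Y-%m-%d"), df$occur_time),
    format = "%Y-%m-%d %H:%M:%S", tz = "America/New_York"
  )
  df
}

# Two-row sample mirroring records 1 and 3 of the glimpse() output
sample_rows <- data.frame(
  occur_date = as.POSIXct(c("2014-04-11", "2009-09-05"), tz = "UTC"),
  occur_time = c("20:14:00", "9:58:00"),
  statistical_murder_flag = c("false", "false"),
  perp_age_group = c("", "18-24"),
  perp_sex = c("", "M"),
  perp_race = c("", "BLACK"),
  x_coord_cd = c("956246", "997761"),
  y_coord_cd = c("153097", "186202"),
  stringsAsFactors = FALSE
)
cleaned <- clean_shootings(sample_rows)
```

With occur_datetime in place, an hour-of-day breakdown reduces to format(cleaned$occur_datetime, "%H"). The latitude and longitude columns are already numeric and are the natural inputs for the map plots.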
Expected Project Deliverable
The interactive dashboard will be built with Shiny Dashboard and will consist of two sections (slides).
Section 1 will visualize incidents over time and by location, using vertical and horizontal count plots, pie charts and maps. See the draft below.
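The count plots in Section 1 need incidents aggregated by period and by location. The following base-R sketch assumes a hypothetical helper count_by() and uses made-up toy values rather than real counts. The plotly calls run only if plotly is installed.

```r
# Hypothetical helper: frequency table of a grouping vector as a data frame
count_by <- function(x, name) {
  counts <- as.data.frame(table(x), responseName = "n", stringsAsFactors = FALSE)
  names(counts)[1] <- name
  counts
}

# Toy inputs (illustrative only, not taken from the NYPD data)
toy_dates <- as.POSIXct(c("2019-01-05", "2019-01-20", "2019-02-11", "2020-07-04"), tz = "UTC")
toy_boro  <- c("BROOKLYN", "BRONX", "BROOKLYN", "QUEENS")

# Incidents per calendar month; "%Y-%m" strings sort chronologically
monthly <- count_by(format(toy_dates, "%Y-%m"), "month")
# Incidents per borough, largest first for a horizontal bar chart
by_boro <- count_by(toy_boro, "boro")
by_boro <- by_boro[order(-by_boro$n), ]

if (requireNamespace("plotly", quietly = TRUE)) {
  # Horizontal count plot of incidents per borough
  boro_plot <- plotly::plot_ly(by_boro, x = ~n, y = ~boro,
                               type = "bar", orientation = "h")
  # Monthly time series
  month_plot <- plotly::plot_ly(monthly, x = ~month, y = ~n,
                                type = "scatter", mode = "lines+markers")
}
```

table() only reports months that actually occur, so a real time series would need to fill months with zero incidents before plotting. Otherwise gaps would be drawn as straight lines.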
Section 2 will visualize the other informative categorical attributes in the data, using vertical and horizontal count plots and pie charts. See the draft below.
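The following is one possible skeleton for the two-section layout, assuming the shinydashboard and plotly packages. The tab names, output IDs and the helpers build_dashboard_ui() and build_server() are hypothetical placeholders. Only one chart per section is wired up, and the server expects the cleaned shooting_data frame.

```r
# Hypothetical UI: one sidebar menu item per dashboard section
build_dashboard_ui <- function() {
  shinydashboard::dashboardPage(
    shinydashboard::dashboardHeader(title = "NYPD Shooting Incidents"),
    shinydashboard::dashboardSidebar(
      shinydashboard::sidebarMenu(
        shinydashboard::menuItem("Time & Location", tabName = "time_location"),
        shinydashboard::menuItem("Other Attributes", tabName = "attributes")
      )
    ),
    shinydashboard::dashboardBody(
      shinydashboard::tabItems(
        # Section 1: time series and locations
        shinydashboard::tabItem(tabName = "time_location",
                                plotly::plotlyOutput("boro_plot")),
        # Section 2: other categorical attributes
        shinydashboard::tabItem(tabName = "attributes",
                                plotly::plotlyOutput("vic_age_plot"))
      )
    )
  )
}

# Hypothetical server factory; `shootings` is the cleaned shooting data frame
build_server <- function(shootings) {
  function(input, output, session) {
    # Section 1: vertical count plot of incidents per borough
    output$boro_plot <- plotly::renderPlotly({
      counts <- as.data.frame(table(boro = shootings$boro), responseName = "n")
      plotly::plot_ly(counts, x = ~boro, y = ~n, type = "bar")
    })
    # Section 2: pie chart of incidents per victim age group
    output$vic_age_plot <- plotly::renderPlotly({
      counts <- as.data.frame(table(age = shootings$vic_age_group), responseName = "n")
      plotly::plot_ly(counts, labels = ~age, values = ~n, type = "pie")
    })
  }
}

# Usage in an interactive session:
# shiny::shinyApp(build_dashboard_ui(), build_server(shooting_data))
```

Wrapping the server in a factory keeps the data download outside the app, so the SODA request runs once at startup rather than per session.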