1 NYC Open Data Analysis (311 Service Requests)

NYC Open Data is a city initiative and a powerful tool that promotes transparency and fosters civic innovation, with the aim of improving the quality of life for millions of New Yorkers. Every day, NYC311 receives hundreds of thousands of requests from New Yorkers for the many services the city provides through its agencies. City officials respond to these requests and try to close the issues as soon as they can. The data is publicly available and accessible online.

2 Motivation

NYC 311 data is available from 2010 onwards and is truly a big data analytics dataset. I would like to take this opportunity to understand the NYC311 service request data better by examining the numerous types of requests and how city agencies handle them.

3 Data Sources

The 311 service request data can be downloaded from the NYC Open Data portal.

4 Goal

  • Use both the exported CSV data and the APIs, choosing whichever fits the task
  • Wrangle the wide NYC 311 dataset
  • Visualize the data using ggplot and other visualization techniques
  • Perform statistical analysis
  • Analyze the data to find trends based on request attributes:
    • Types of request
    • Request status (closed, open, pending, etc.) and closure time
  • Design a Shiny app to create an interactive dashboard
  • Try to create a complaint map and visualize the most and least common complaints for historical and daily data

5 Data Acquisition (BIG Data Analytics Platform)

A fully automated data pipeline was built to acquire data directly from the NYC Open Data portal. The data was moved to Amazon EMR (Elastic MapReduce, Hadoop) for big data analytics.

  • Fired up an AWS EC2 instance with Hadoop, Spark and R
  • Loaded gigabytes of 311 data as a CSV with the automated data pipeline
  • Moved the CSV into HDFS (Hadoop Distributed File System)
  • Used sparklyr to read the data efficiently and crunched it using Spark functions and dplyr
  • Transformed the data and loaded the aggregated datasets into Amazon RDS (MySQL)
  • Extracted map data (latitude and longitude) and loaded it into MongoDB (http://mlab.com)
  • Extracted real-time data using the Socrata API
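The transform-and-aggregate step above can be sketched in a few lines. The original pipeline used sparklyr and dplyr on Spark; the version below is only a small Python stand-in that runs on an in-memory CSV sample, and the function name `aggregate_by_agency` is hypothetical. It derives the year from CreatedDate and counts requests per agency, borough and year, which is the shape of the aggregated tables shown later in this report.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Timestamp layout used in the exported CSV, e.g. "01/10/2014 02:43:00 PM".
DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"

# Three rows taken from the sample printed in the EDA section (subset of columns).
SAMPLE_CSV = """ukey,CreatedDate,Agency,Borough
27117472,01/10/2014 12:00:00 AM,HPD,BROOKLYN
27117473,01/10/2014 12:00:00 AM,HPD,BRONX
27117475,01/10/2014 02:43:00 PM,DOT,QUEENS
"""


def aggregate_by_agency(csv_file):
    """Count requests per (Agency, Borough, year of CreatedDate)."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        created = datetime.strptime(row["CreatedDate"], DATE_FORMAT)
        counts[(row["Agency"], row["Borough"], created.year)] += 1
    return counts


totals = aggregate_by_agency(io.StringIO(SAMPLE_CSV))
for (agency, borough, year), total in sorted(totals.items()):
    print(agency, borough, year, total)
```

On the full dataset the same grouping would run inside Spark rather than in a single Python process; the sketch only illustrates the logic.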

6 EDA (Exploratory Data Analysis)

The dataset has over 15 million data points and 14 features. Let’s take a look at the dataset and try to gain some insight.

Here is the list of the 20 NYC agencies available in the dataset:

##  [1] "HPD"   "DOT"   "DOB"   "DOHMH" "NYPD"  "DCA"   "DSNY"  "DEP"  
##  [9] "DPR"   "TLC"   "DOE"   "DHS"   "DFTA"  "3-1-1" "DOITT" "EDC"  
## [17] "DOF"   "NYCEM" "FDNY"  "HRA"

Let’s understand the dataset first and explore the available features.

## [1] 15153259
ukey | CreatedDate | ClosedDate | Agency | ComplaintType | zipcode | Status | Borough | DataChannel | Latitude | Longitude | Cryear | Crmonth | Crday
27117472 | 01/10/2014 12:00:00 AM | 01/13/2014 12:00:00 AM | HPD | HEATING | 11203 | Closed | BROOKLYN | PHONE | 40.64660 | -73.94623 | 2014 | 01 | 10
27117473 | 01/10/2014 12:00:00 AM | 01/15/2014 12:00:00 AM | HPD | HEATING | 10457 | Closed | BRONX | PHONE | 40.84935 | -73.89262 | 2014 | 01 | 10
27117474 | 01/10/2014 12:00:00 AM | 01/12/2014 12:00:00 AM | HPD | HEATING | 10455 | Closed | BRONX | PHONE | 40.81825 | -73.90657 | 2014 | 01 | 10
27117475 | 01/10/2014 02:43:00 PM | 01/12/2014 12:05:00 PM | DOT | Traffic Signal Condition | 11432 | Closed | QUEENS | UNKNOWN | 40.70399 | -73.79855 | 2014 | 01 | 10
27117476 | 01/10/2014 12:00:00 AM | 01/13/2014 12:00:00 AM | HPD | HEATING | 11432 | Closed | QUEENS | ONLINE | 40.71407 | -73.77787 | 2014 | 01 | 10
27117477 | 01/10/2014 12:00:00 AM | 01/13/2014 12:00:00 AM | HPD | HEATING | 11412 | Closed | QUEENS | PHONE | 40.69932 | -73.76677 | 2014 | 01 | 10

6.2 NYC 311 Complaints Handled by NYC Agencies

The plot below shows that NYPD and HPD are the top two agencies by number of 311 complaints handled.

So, let’s explore the NYPD 311 data and see which borough has the most complaints. All boroughs appear to have an upward trend, with Brooklyn and Queens topping the list.

Agency | Borough | Cryear | TotalCount
NYPD | BROOKLYN | 2017 | 239700
NYPD | BROOKLYN | 2016 | 216561
HPD | BROOKLYN | 2014 | 206773
HPD | BROOKLYN | 2015 | 202321
HPD | BROOKLYN | 2012 | 200007
HPD | BROOKLYN | 2013 | 195273
NYPD | QUEENS | 2017 | 194131
HPD | BRONX | 2014 | 194127
HPD | BRONX | 2016 | 191845
HPD | BROOKLYN | 2016 | 191133

After analyzing the NYPD and HPD complaints in detail, it appears that complaints handled by NYPD show a very clear upward trend, whereas the distribution of HPD's percentage change over time appears to be approximately normal.
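The percentage change referred to here can be computed year over year from the yearly totals. The sketch below is a Python illustration rather than the R code behind the report; it uses only the HPD Brooklyn totals listed in the table above, and the helper name `percent_change` is hypothetical.

```python
# Yearly totals for HPD complaints in Brooklyn, copied from the table above.
hpd_brooklyn = {2012: 200007, 2013: 195273, 2014: 206773, 2015: 202321, 2016: 191133}


def percent_change(yearly_counts):
    """Return {year: % change versus the previous year in the mapping}."""
    years = sorted(yearly_counts)
    changes = {}
    for previous, current in zip(years, years[1:]):
        before, after = yearly_counts[previous], yearly_counts[current]
        changes[current] = 100.0 * (after - before) / before
    return changes


for year, change in percent_change(hpd_brooklyn).items():
    print(f"{year}: {change:+.2f}%")
```

For these five years the changes alternate in sign rather than moving steadily in one direction, which is consistent with the contrast drawn against the NYPD trend.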

6.3 Most Common Complaints

Noise - Residential and HEAT/HOT WATER are clearly the top two complaints in recent years.

6.4 Least Common Complaints

X-Ray Machine/Equipment and Bottled Water rank lowest in the list of least common complaints.
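Rankings of this kind come down to counting each complaint type and sorting the counts. The following is a minimal Python sketch of that idea, not the report's own R code; the six-item list is a hypothetical miniature sample that stands in for the ComplaintType column.

```python
from collections import Counter

# Hypothetical miniature sample; the real column holds about 15 million values.
complaint_types = [
    "Noise - Residential", "HEAT/HOT WATER", "Noise - Residential",
    "Bottled Water", "HEAT/HOT WATER", "Noise - Residential",
]

counts = Counter(complaint_types)
ranked = counts.most_common()      # sorted from most to least frequent

most_common = ranked[:2]           # top of the ranking
least_common = ranked[-1:]         # bottom of the ranking

print("Most common:", most_common)
print("Least common:", least_common)
```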

6.5 Heatmap of the most common complaints

Below is the borough-wise heatmap of 311 complaints. As expected, the Noise - Residential and HEAT/HOT WATER complaint types clearly have upward trends, whereas the street condition and unsanitary condition complaint types are comparatively low in number and appear consistent over the last 3 years.

6.6 Complaint Status

The analysis shows that the majority of complaints are closed and only a few are open or pending. The plot also makes it clear that some ageing complaints need attention.

ComplaintType | Status | Cryear | TotalCount
Noise - Residential | Closed | 2017 | 229017
HEAT/HOT WATER | Closed | 2016 | 227107
HEAT/HOT WATER | Closed | 2015 | 225194
Noise - Residential | Closed | 2016 | 220964
HEAT/HOT WATER | Closed | 2017 | 212547
Noise - Residential | Closed | 2015 | 206406
Noise - Residential | Closed | 2014 | 192267
HEATING | Closed | 2013 | 190398
HEATING | Closed | 2012 | 181094
Noise - Residential | Closed | 2013 | 151144

6.7 Same-Day Complaint Closures

The plot below shows the top 100 complaint types that were closed on the same day by their respective agencies.

  • Blocked Driveway and Illegal Parking in Queens and Brooklyn appear to be the most common complaint types closed on the same day
  • Only Manhattan appears to have homeless person assistance complaints
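One way to flag a same-day closure is to compare the calendar dates of CreatedDate and ClosedDate. The Python sketch below assumes that "same day" means the same calendar date (not "within 24 hours"), which the report does not state explicitly; the function name `closed_same_day` and the second example row are hypothetical.

```python
from datetime import datetime

# Timestamp layout used in the dataset, e.g. "01/10/2014 02:43:00 PM".
DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def closed_same_day(created_date, closed_date):
    """True when a request was closed on the calendar day it was created."""
    created = datetime.strptime(created_date, DATE_FORMAT)
    closed = datetime.strptime(closed_date, DATE_FORMAT)
    return created.date() == closed.date()


# Row taken from the sample data: created 01/10, closed 01/12.
print(closed_same_day("01/10/2014 02:43:00 PM", "01/12/2014 12:05:00 PM"))
# Hypothetical row: created and closed on 01/10.
print(closed_same_day("01/10/2014 09:15:00 AM", "01/10/2014 04:30:00 PM"))
```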

6.8 Data Channel

The plot below makes it evident that Phone is the most commonly used data channel for NYC 311 complaints. The Online data channel also appears to have an upward trend.

7 Some Statistical Analysis

For the most common complaint type, HEAT/HOT WATER, the box plot below clearly indicates that the Bronx has the highest number of such complaints, whereas the spread between the 1st quartile and the median (the lower half of the interquartile range) is wide for both Manhattan and Queens.

ComplaintType | Borough | Cryear | TotalCount
HEAT/HOT WATER | BRONX | 2016 | 74197
HEAT/HOT WATER | BRONX | 2015 | 71864
HEAT/HOT WATER | BROOKLYN | 2016 | 69572
HEAT/HOT WATER | BRONX | 2017 | 68713
Noise - Residential | BROOKLYN | 2016 | 68300
HEAT/HOT WATER | BROOKLYN | 2015 | 67874
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1134   16944   36885   36700   54259   74197
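The interquartile range (IQR) is the distance between the 1st and 3rd quartiles. As a rough Python illustration (the report itself used R), the sketch below computes quartiles for the six counts listed in the table above and, separately, the IQR implied by the printed summary. The six values are only the top rows, so their quartiles are not expected to match the summary, which covers the full aggregated data.

```python
import statistics

# The six yearly counts listed in the table above (not the full data behind the summary).
counts = [74197, 71864, 69572, 68713, 68300, 67874]

# method="inclusive" interpolates between order statistics; to my understanding
# this corresponds to the default quantile rule (type 7) used by R's summary().
q1, median, q3 = statistics.quantiles(counts, n=4, method="inclusive")
print("Q1:", q1, "Median:", median, "Q3:", q3, "IQR:", q3 - q1)

# IQR implied by the printed summary: 3rd Qu. minus 1st Qu.
summary_iqr = 54259 - 16944
print("IQR from the summary:", summary_iqr)
```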

8 Real Time Data: NYC 311

The real-time data was fetched from the Open Data Socrata API using the RSocrata library. The size of the resulting data frame is shown below.

## [1] 56316
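A fetch like this boils down to an HTTP request with SoQL query parameters. The Python sketch below only builds such a URL and does not call the network. The endpoint (dataset ID `erm2-nwe9`), the `created_date` field name, the cutoff date and the function name `build_query_url` are all assumptions made for illustration; the report does not say which query was actually issued through RSocrata.

```python
from urllib.parse import urlencode

# Assumed endpoint for the 311 Service Requests dataset on NYC Open Data.
BASE_URL = "https://data.cityofnewyork.us/resource/erm2-nwe9.json"


def build_query_url(since, limit=1000):
    """Build a SoQL query URL for requests created on or after `since`."""
    params = {
        "$where": f"created_date >= '{since}'",   # filter on creation time
        "$order": "created_date DESC",            # newest requests first
        "$limit": limit,                          # cap the number of rows
    }
    return f"{BASE_URL}?{urlencode(params)}"


# Hypothetical cutoff date, chosen only for the example.
print(build_query_url("2017-12-01T00:00:00"))
```

The resulting URL can be passed to any HTTP client; RSocrata's `read.socrata` accepts a URL of this general form.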

9 Shiny App

The Shiny app is available at https://niteenk.shinyapps.io/NYC311_DATA607/. It is an interactive data visualization of NYC maps and complaint trends.

10 Conclusion

  • The number of complaints has risen since 2010 and shows an upward trend. For 2016-17:
    • Staten Island has the highest increase in the number of complaints, almost 17 percentage points (from 16.65% to 33.59%)
    • Manhattan has ~10% fewer complaints registered
    • The Bronx has the lowest increase, ~3%
  • NYPD and HPD handle the most complaints
  • Most and least common complaints:
    • Noise - Residential and HEAT/HOT WATER are the two most common complaints
    • X-Ray Machine/Equipment and Bottled Water rank lowest among the least common complaints
  • Same-day complaint closure:
    • Blocked Driveway and Illegal Parking in Queens and Brooklyn top the list of same-day complaint closures
  • Phone and Online are the most common data channels for complaints
  • In the interactive Shiny app, the NYPD (2015) cluster map analysis shows an upward trend