1 NYC Open Data Analysis (311 Service Requests)

NYC Open Data is a city initiative and a powerful tool that ensures transparency and fosters civic innovation, helping to improve the quality of life for millions of New Yorkers. Every day, NYC311 receives hundreds of thousands of requests from New Yorkers about the many types of services the city provides through its agencies. City officials respond to these requests and try to close the issues as soon as they can. The data is publicly available and accessible online.

2 Motivation

NYC 311 data is available from 2010 onwards, and it is truly a BIG Data analytics dataset. I would like to take this opportunity to understand the NYC311 service request data better by exploring the numerous types of requests and how city agencies handle them.

3 Data Sources

The 311 service request data can be downloaded from the NYC Open Data portal.
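Since one goal is to use the portal's API as well as CSV exports, here is a minimal sketch of composing a Socrata (SODA) query URL that counts one year's requests per agency on the server side. It assumes the dataset id `erm2-nwe9` ("311 Service Requests from 2010 to Present") and the API field names `agency` and `created_date`; verify both on the portal. The helper name `build_311_query` is hypothetical, and the sketch only builds the URL without fetching it.

```python
from urllib.parse import urlencode

# Assumption: SODA endpoint for the 311 dataset; confirm the id on the portal.
BASE_URL = "https://data.cityofnewyork.us/resource/erm2-nwe9.json"


def build_311_query(year, limit=1000):
    """Return a SODA URL that counts one year's requests per agency.

    Uses the SoQL parameters $select, $where, $group, $order and $limit.
    The field names (agency, created_date) are assumed API column names.
    """
    params = {
        "$select": "agency, count(*) AS n",
        "$where": (
            f"created_date >= '{year}-01-01T00:00:00' "
            f"AND created_date < '{year + 1}-01-01T00:00:00'"
        ),
        "$group": "agency",
        "$order": "n DESC",
        "$limit": limit,
    }
    # urlencode percent-encodes "$" and ":" and turns spaces into "+".
    return f"{BASE_URL}?{urlencode(params)}"


print(build_311_query(2017))
```

Pushing the group-by to the API keeps the download small; the full CSV export is the better fit when row-level detail is needed.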

4 Goal

  • Make use of both data exported in CSV format and the APIs, using each as appropriate
  • Wrangle the wide NYC 311 dataset
  • Visualize using ggplot and other visualization techniques
  • Perform statistical analysis
  • Analyze the data to find trends based on request attributes:
    • Types of request
    • Request status (closed, open, pending, etc.) and closing time
  • Try to create a complaint map and visualize the most and least common complaints for historical and daily data

5 Data Acquisition (BIG Data Analytics Platform)

A fully automated data pipeline was built to acquire data directly from the NYC Open Data portal. The data was moved to Amazon EMR (Elastic MapReduce, Hadoop) for Big Data analytics.

  • Fired up an AWS EC2 instance with Hadoop, Spark and R
  • Loaded gigabytes of 311 data as a CSV with the automated data pipeline
  • Moved the CSV into HDFS (Hadoop Distributed File System)
  • Used sparklyr to read the data efficiently and crunched it using Spark functions and dplyr
  • Transformed the data and loaded the aggregated datasets into Amazon RDS (MySQL)
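The transform step above was done with sparklyr and dplyr on Spark. As a language-neutral illustration of the same group-by-count logic, here is a plain-Python sketch over a tiny in-memory sample; it is not the author's pipeline. The column names follow the 311 CSV export, while the timestamp format and the sample rows are assumptions made for illustration.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Assumed timestamp layout of the CSV export, e.g. "01/03/2016 10:15:00 PM".
DATE_FMT = "%m/%d/%Y %I:%M:%S %p"

# Hypothetical rows in the shape of the export (a subset of its columns).
SAMPLE_CSV = """Created Date,Agency,Complaint Type,Borough,Status
01/03/2016 10:15:00 PM,NYPD,Blocked Driveway,QUEENS,Closed
02/11/2016 08:02:00 AM,HPD,HEAT/HOT WATER,BRONX,Closed
03/09/2017 01:30:00 PM,NYPD,Noise - Residential,BROOKLYN,Closed
03/10/2017 11:45:00 PM,NYPD,Noise - Residential,BROOKLYN,Open
"""


def aggregate_by_year_borough_agency(lines):
    """Count requests per (year, borough, agency) from CSV lines."""
    counts = Counter()
    for row in csv.DictReader(lines):
        year = datetime.strptime(row["Created Date"], DATE_FMT).year
        counts[(year, row["Borough"], row["Agency"])] += 1
    return counts


summary = aggregate_by_year_borough_agency(io.StringIO(SAMPLE_CSV))
# Each (year, borough, agency, count) line is one row of an aggregate
# table, the kind of small result that could then be loaded into MySQL.
for (year, borough, agency), n in sorted(summary.items()):
    print(year, borough, agency, n)
```

Aggregating before loading keeps the relational store small: the 20-million-row detail stays in HDFS and only summary tables move to RDS.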

6 EDA (Exploratory Data Analysis)

The dataset has over 20 million data points and 14 features. Let’s take a look at the dataset and try to gain some insight.

Here is the list of the 20 NYC agencies available in the dataset:

Let’s first understand the dataset and explore the available features.

6.2 NYC 311 Complaints Handled by NYC Agencies

The plot below shows that NYPD and HPD are clearly the top two agencies by number of 311 complaints handled.

So, let’s explore the NYPD 311 data and see which borough has the most complaints. All boroughs appear to have an upward trend, with Brooklyn and Queens topping the list.

A closer analysis of NYPD and HPD complaints shows that complaints handled by NYPD have a very clear upward trend, whereas the distribution of HPD's % change over time appears to be roughly normal.
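The "% change over the period of time" discussed above is a year-over-year percentage change. A minimal sketch follows; the helper name `pct_change` is hypothetical and the yearly counts are made-up placeholders, not values from the dataset.

```python
def pct_change(counts_by_year):
    """Year-over-year % change, keyed by the later year of each pair.

    Assumes every year's count is non-zero (it is used as a divisor).
    """
    years = sorted(counts_by_year)
    return {
        cur: round(
            100 * (counts_by_year[cur] - counts_by_year[prev]) / counts_by_year[prev],
            2,
        )
        for prev, cur in zip(years, years[1:])
    }


# Hypothetical yearly complaint counts for one agency (illustrative only).
nypd = {2014: 200, 2015: 250, 2016: 300, 2017: 330}
print(pct_change(nypd))  # {2015: 25.0, 2016: 20.0, 2017: 10.0}
```

A series of these yearly values is what a histogram or density plot would summarize when judging whether the % changes look normally distributed.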

6.3 Most Common Complaints

Noise - Residential and HEAT/HOT WATER are clearly the top two complaints in recent years.

6.4 Least Common Complaints

X-Ray Machine/Equipment and Bottled Water are the lowest-ranked complaint types.
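Both rankings can come from a single frequency table. A minimal sketch using `collections.Counter`; `rank_complaints` is a hypothetical helper and the complaint counts are illustrative only.

```python
from collections import Counter


def rank_complaints(complaint_types, k=2):
    """Return the k most and k least common complaint types with counts."""
    # most_common() with no argument returns every item, sorted by count
    # (descending); ties keep the order in which types were first seen.
    ranked = Counter(complaint_types).most_common()
    return ranked[:k], ranked[-k:]


# Hypothetical complaint-type column (counts are illustrative only).
types = (
    ["Noise - Residential"] * 5
    + ["HEAT/HOT WATER"] * 4
    + ["Street Condition"] * 3
    + ["Bottled Water"] * 2
    + ["X-Ray Machine/Equipment"]
)
most, least = rank_complaints(types)
print(most)   # [('Noise - Residential', 5), ('HEAT/HOT WATER', 4)]
print(least)  # [('Bottled Water', 2), ('X-Ray Machine/Equipment', 1)]
```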

6.5 Heatmap of the most common complaints

Below is the borough-wise heatmap of 311 complaints. As expected, the Noise - Residential and HEAT/HOT WATER complaint types clearly have upward trends, whereas the street condition and unsanitary condition complaint types are comparatively low in number and appear consistent over the last 3 years.
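A heatmap like this is drawn from a borough × year count matrix. The sketch below builds that matrix for one complaint type; the helper `heatmap_matrix` and the sample records are hypothetical, and the plotting itself (ggplot in the original analysis) is left out.

```python
from collections import defaultdict


def heatmap_matrix(records, complaint_type):
    """Build a borough x year count matrix for one complaint type.

    records: iterable of (year, borough, complaint_type) tuples.
    """
    cells = defaultdict(int)
    for year, borough, ctype in records:
        if ctype == complaint_type:
            cells[(borough, year)] += 1
    boroughs = sorted({b for b, _ in cells})
    years = sorted({y for _, y in cells})
    # Missing (borough, year) pairs become 0 so every row has the same length.
    matrix = [[cells.get((b, y), 0) for y in years] for b in boroughs]
    return boroughs, years, matrix


# Hypothetical records, for illustration only.
records = [
    (2016, "BRONX", "HEAT/HOT WATER"),
    (2016, "BRONX", "HEAT/HOT WATER"),
    (2017, "BRONX", "HEAT/HOT WATER"),
    (2017, "QUEENS", "HEAT/HOT WATER"),
    (2017, "QUEENS", "Noise - Residential"),
]
boroughs, years, matrix = heatmap_matrix(records, "HEAT/HOT WATER")
print(years)  # [2016, 2017]
for borough, row in zip(boroughs, matrix):
    print(borough, row)  # BRONX [2, 1], then QUEENS [0, 1]
```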

6.6 Complaint Status

The analysis shows that the majority of complaints are closed and only a few are open or pending. The plot also makes it clear that some ageing complaints need attention.
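Both observations reduce to two small computations: a count per status, and a filter for requests that are not closed and older than some threshold. In this sketch the 30-day threshold, the helper `status_summary` and the sample requests are all hypothetical; any status other than "Closed" is treated as still open.

```python
from collections import Counter
from datetime import datetime


def status_summary(requests, as_of, min_age_days=30):
    """Count requests per status and list ids of ageing unclosed ones."""
    statuses = Counter(r["status"] for r in requests)
    ageing = [
        r["id"]
        for r in requests
        # Anything not closed and at least min_age_days old counts as ageing.
        if r["status"] != "Closed" and (as_of - r["created"]).days >= min_age_days
    ]
    return statuses, ageing


# Hypothetical requests, for illustration only.
requests = [
    {"id": 1, "status": "Closed", "created": datetime(2017, 1, 5)},
    {"id": 2, "status": "Open", "created": datetime(2017, 3, 1)},
    {"id": 3, "status": "Pending", "created": datetime(2017, 12, 20)},
    {"id": 4, "status": "Closed", "created": datetime(2017, 6, 1)},
]
statuses, ageing = status_summary(requests, as_of=datetime(2017, 12, 31))
print(statuses)  # Counter({'Closed': 2, 'Open': 1, 'Pending': 1})
print(ageing)    # [2]
```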

6.7 Same-Day Complaint Closures

The plot below shows the top 100 complaint types that were closed on the same day by their respective agencies.

  • Blocked Driveway and Illegal Parking in Queens and Brooklyn appear to be the most common complaint types closed on the same day
  • Only Manhattan appears to have homeless person assistance complaints
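A same-day closure can be defined as a request whose closed date falls on the same calendar date as its created date. The sketch below counts such closures per (borough, complaint type); the helper `same_day_closures` and the sample requests are hypothetical, and requests without a closed date are skipped.

```python
from collections import Counter
from datetime import datetime


def same_day_closures(requests):
    """Rank (borough, complaint_type) pairs by number of same-day closures."""
    counts = Counter()
    for r in requests:
        closed = r["closed"]
        # Compare calendar dates, not a 24-hour window.
        if closed is not None and closed.date() == r["created"].date():
            counts[(r["borough"], r["complaint_type"])] += 1
    return counts.most_common()


# Hypothetical requests, for illustration only.
requests = [
    {"borough": "QUEENS", "complaint_type": "Blocked Driveway",
     "created": datetime(2017, 5, 1, 9, 0), "closed": datetime(2017, 5, 1, 13, 30)},
    {"borough": "QUEENS", "complaint_type": "Blocked Driveway",
     "created": datetime(2017, 5, 2, 8, 0), "closed": datetime(2017, 5, 2, 10, 0)},
    {"borough": "BROOKLYN", "complaint_type": "Illegal Parking",  # next day
     "created": datetime(2017, 5, 1, 23, 0), "closed": datetime(2017, 5, 2, 1, 0)},
    {"borough": "BROOKLYN", "complaint_type": "Illegal Parking",
     "created": datetime(2017, 5, 3, 7, 0), "closed": datetime(2017, 5, 3, 7, 45)},
    {"borough": "MANHATTAN", "complaint_type": "Homeless Person Assistance",
     "created": datetime(2017, 5, 4, 12, 0), "closed": None},
]
print(same_day_closures(requests)[:100])  # top 100 pairs at most
```

Using calendar dates means a request opened at 11 PM and closed at 1 AM does not count, even though it was resolved within two hours; a `closed - created <= timedelta(days=1)` check is an alternative definition.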

7 Some Statistical Analysis

For the most common complaint type, HEAT/HOT WATER, the box plot below clearly indicates that the Bronx has the most such complaints, whereas the gap between the median and the first quartile is large for both Manhattan and Queens.
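The quantities a box plot encodes are the first quartile (Q1), the median and the third quartile (Q3); the interquartile range is IQR = Q3 − Q1, while the median − Q1 gap is the lower half of the box. A sketch using `statistics.quantiles` (Python 3.8+) with `method="inclusive"`, which as far as we know matches R's default `quantile()` interpolation (type 7) that ggplot2 box plots use. The helper `box_stats` and the per-borough counts are hypothetical.

```python
import statistics


def box_stats(values):
    """Q1, median, Q3, IQR and the median - Q1 gap for one borough."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": q3 - q1,
        "median_minus_q1": median - q1,
    }


# Hypothetical daily HEAT/HOT WATER complaint counts (illustrative only).
heat = {
    "BRONX": [2, 4, 4, 5, 7, 9, 10],
    "QUEENS": [1, 1, 2, 3, 3],
}
for borough, counts in heat.items():
    print(borough, box_stats(counts))
```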

8 Conclusion

  • The number of complaints has gone up since 2010 and shows an upward trend. For 2016-17:
    • For Staten Island, the growth in complaints rose from 16.65% to 33.59%, an increase of almost 17 percentage points
    • Manhattan registered ~10% fewer complaints
    • The lowest increase, ~3%, was seen in the Bronx
  • Most complaints are handled by NYPD and HPD
  • Most and least common complaints:
    • Noise - Residential and HEAT/HOT WATER are the top two complaint types
    • X-Ray Machine/Equipment and Bottled Water are the lowest-ranked complaint types
  • Same-day complaint closure:
    • Blocked Driveway and Illegal Parking in Queens and Brooklyn top the list of same-day complaint closures