NYC 311 complaint analysis

Write-up on visualization

Parameters of dataset

All data points in the NYC 311 dataset come from New York City.

The data source is the Socrata API, which pulls the latest NYC 311 data. Because the dataset is updated daily, each API call returns the most recent records.

For the Twitter dataset, the 1,100 most recent tweets mentioning NYC 311 are pulled.

Some of the main data points for the NYC 311 dataset are:

“agency”
“agency_name”
“complaint_type”
“descriptor”
“incident_zip”
“incident_address”
“street_name”
“cross_street_1”
“cross_street_2”
“intersection_street_1”
“intersection_street_2”
“status”
“community_board”
“borough”
“x_coordinate_state_plane”
“y_coordinate_state_plane”
“open_data_channel_type”
“park_facility_name”
“park_borough”
“latitude”
“longitude”
“location”
“resolution_description”
“resolution_action_updated_date”

About NYC Open Data

Beginning in 2010, NYC launched NYC Open Data, an initiative to expose government data. Its catalog offers access to a repository of government-produced, machine-readable data sets in an effort to “improve the accessibility, transparency, and accountability of City government.”

What the dataset shows and why it is important

NYC 311’s mission is to provide the public with quick, easy access to all New York City government services and information while offering the best customer service. It helps agencies improve service delivery by allowing them to focus on their core missions and manage their workload efficiently.

NYC 311 data is updated daily and is provided by DoITT, where I am currently pursuing my internship. I therefore wanted to apply the visualization concepts studied in Data 608 to analyze this dataset.

Aim

  • To analyze and visualize issues across New York City (including Manhattan, Queens, Brooklyn, the Bronx, and Staten Island) by the frequency of incidents reported in each area.

  • To analyze NYC 311 service requests and their resolutions through text mining.

  • To explore and analyze NYC 311 service requests (historical datasets) to understand diverse patterns, recurring themes and trends, and community satisfaction levels derived from resolution categories and timing.

  • To perform sentiment analysis with the Syuzhet package on tweets mentioning NYC 311, determining the emotions in “nyc311” tweets, especially during the virus outbreak, and to visualize the results.

Import libraries

Load all the necessary packages

Load the data using the Socrata API
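As a rough sketch of what such a request looks like, the snippet below only builds a Socrata (SODA) query URL. It assumes `erm2-nwe9` is the dataset id of NYC 311 Service Requests and uses the SODA `$limit` parameter to cap the number of rows returned. `build_soda_url` is a hypothetical helper, not the code used in this analysis, and actually reading the rows needs network access.

```r
# Hypothetical helper: build a SODA (Socrata Open Data API) query URL.
# Assumptions: "erm2-nwe9" is the NYC 311 Service Requests dataset id and
# the "$limit" query parameter caps how many rows one request returns.
build_soda_url <- function(dataset_id = "erm2-nwe9",
                           limit = 1000L,
                           domain = "data.cityofnewyork.us",
                           ext = "csv") {
  paste0("https://", domain, "/resource/", dataset_id, ".", ext,
         "?$limit=", as.integer(limit))
}

url <- build_soda_url()
print(url)
# Reading the rows needs network access, so it is left commented out:
# nyc311 <- read.csv(url, stringsAsFactors = FALSE)
```

Pulling more than one batch of rows would need a larger `limit` or paging with the SODA `$offset` parameter.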

Inspect the dataset loaded with the Socrata API

## [1] "data.frame"

Display the column names and number of rows

##  [1] "unique_key"                     "created_date"                  
##  [3] "agency"                         "agency_name"                   
##  [5] "complaint_type"                 "descriptor"                    
##  [7] "location_type"                  "incident_zip"                  
##  [9] "incident_address"               "street_name"                   
## [11] "cross_street_1"                 "cross_street_2"                
## [13] "intersection_street_1"          "intersection_street_2"         
## [15] "city"                           "landmark"                      
## [17] "status"                         "community_board"               
## [19] "bbl"                            "borough"                       
## [21] "x_coordinate_state_plane"       "y_coordinate_state_plane"      
## [23] "open_data_channel_type"         "park_facility_name"            
## [25] "park_borough"                   "latitude"                      
## [27] "longitude"                      "location"                      
## [29] ":@computed_region_efsh_h5xi"    ":@computed_region_f5dn_yrer"   
## [31] ":@computed_region_yeji_bk3q"    ":@computed_region_92fq_4b7q"   
## [33] ":@computed_region_sbqj_enih"    "closed_date"                   
## [35] "resolution_description"         "resolution_action_updated_date"
## [37] "address_type"                   "facility_type"                 
## [39] "taxi_pick_up_location"
## [1] 1000

First 5 rows of dataset

Data Exploration and Visualization

1) Top 50 most common complaint types

As the chart above shows, the Noise - Residential complaint type has the highest number of service requests, followed by Noise - Street/Sidewalk.
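The counting behind a "top N complaint types" chart can be sketched in base R as follows. `top_complaints` is a hypothetical helper, and the toy vector stands in for the `complaint_type` column; it is not the real NYC 311 extract.

```r
# Count service requests per complaint type and return the n most common.
top_complaints <- function(complaint_type, n = 50) {
  tab <- table(complaint_type)                     # counts per type
  counts <- setNames(as.integer(tab), names(tab))  # plain named integer vector
  counts <- sort(counts, decreasing = TRUE)        # most frequent first
  head(counts, n)
}

# Toy data for illustration only
complaints <- c("Noise - Residential", "Noise - Residential",
                "Noise - Residential", "Noise - Street/Sidewalk",
                "Noise - Street/Sidewalk", "Illegal Parking")
top_complaints(complaints, n = 2)
```

The resulting named vector can be passed directly to `barplot()` or turned into a data frame for ggplot2.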

2) Most common complaint types by borough and status

Number of complaints by borough and status:

The graph above shows that the Bronx, Brooklyn, and Manhattan each have over 100 complaints with Closed status, which suggests good progress by NYC 311 in resolving complaints.
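The borough-by-status counts come from a simple cross-tabulation. The sketch below (a hypothetical helper on toy data, assuming the status label is exactly "Closed") also computes the share of each borough's complaints that are closed. Because boroughs receive different numbers of requests, that share can be easier to compare than raw counts.

```r
# Cross-tabulate complaints by borough and status, then compute the
# share of each borough's complaints whose status is "Closed".
borough_status_summary <- function(borough, status) {
  counts <- table(borough, status)          # rows: borough, columns: status
  total <- rowSums(counts)
  closed <- if ("Closed" %in% colnames(counts)) {
    as.vector(counts[, "Closed"])
  } else {
    rep(0, nrow(counts))
  }
  data.frame(borough = rownames(counts),
             total = unname(total),
             closed = closed,
             closed_share = unname(closed / total),
             stringsAsFactors = FALSE)
}

# Toy data for illustration only
borough <- c("BRONX", "BRONX", "BRONX", "QUEENS", "QUEENS")
status  <- c("Closed", "Closed", "Open", "Closed", "Open")
borough_status_summary(borough, status)
```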

Service Request Resolutions Tidying and Analysis - Using Tidytext

In this section, we analyze the words used most frequently in service request resolution descriptions.

We use the tidytext package for this purpose.

Most frequent words used in NYC311 Service Requests

The following step also filters out NA values so that they are not included in the tokenized_resolutions dataset.

## Joining, by = "word"
## Rows: 281
## Columns: 3
## Groups: borough [5]
## $ borough <chr> "BRONX", "BRONX", "BRONX", "BRONX", "BRONX", "BRONX", "BRONX"…
## $ word    <chr> "act", "action", "additional", "arrival", "attempt", "complai…
## $ n       <int> 1, 65, 2, 3, 32, 128, 95, 32, 32, 32, 32, 100, 5, 4, 21, 60, …
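The tokenize, filter, and count steps of this section can be approximated in base R as below. This is only a sketch: `count_words_by_borough` is hypothetical, its stop-word list is a tiny illustrative subset rather than tidytext's `stop_words`, and the regular-expression tokenizer is simpler than tidytext's `unnest_tokens()`.

```r
# Tokenize text, drop NA rows and stop words, and count words per borough.
count_words_by_borough <- function(borough, text,
                                   stop_words = c("the", "and", "a", "to", "of", "was")) {
  keep <- !is.na(text)                            # drop NA descriptions first
  borough <- borough[keep]
  text <- text[keep]
  tokens <- strsplit(tolower(text), "[^a-z']+")   # split on non-letter runs
  words <- data.frame(borough = rep(borough, lengths(tokens)),
                      word = unlist(tokens),
                      stringsAsFactors = FALSE)
  words <- words[words$word != "" & !(words$word %in% stop_words), ]
  counts <- aggregate(list(n = rep(1L, nrow(words))),
                      by = words[c("borough", "word")], FUN = sum)
  counts <- counts[order(counts$borough, counts$word), ]
  rownames(counts) <- NULL
  counts
}

# Toy data for illustration only
borough <- c("BRONX", "BRONX", "QUEENS", "QUEENS")
text <- c("The Police Department responded to the complaint.", NA,
          "The police responded and took action.", "Police department reviewed.")
res <- count_words_by_borough(borough, text)
res
```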

Inspect the internal structure of tokenized_resolutions

## tibble [281 × 3] (S3: grouped_df/tbl_df/tbl/data.frame)
##  $ borough: chr [1:281] "BRONX" "BRONX" "BRONX" "BRONX" ...
##  $ word   : chr [1:281] "act" "action" "additional" "arrival" ...
##  $ n      : int [1:281] 1 65 2 3 32 128 95 32 32 32 ...
##  - attr(*, "groups")= tibble [5 × 2] (S3: tbl_df/tbl/data.frame)
##   ..$ borough: chr [1:5] "BRONX" "BROOKLYN" "MANHATTAN" "QUEENS" ...
##   ..$ .rows  :List of 5
##   .. ..$ : int [1:40] 1 2 3 4 5 6 7 8 9 10 ...
##   .. ..$ : int [1:64] 41 42 43 44 45 46 47 48 49 50 ...
##   .. ..$ : int [1:78] 105 106 107 108 109 110 111 112 113 114 ...
##   .. ..$ : int [1:59] 183 184 185 186 187 188 189 190 191 192 ...
##   .. ..$ : int [1:40] 242 243 244 245 246 247 248 249 250 251 ...
##   ..- attr(*, ".drop")= logi TRUE

Let’s look at the first few rows of tokenized_resolutions

Now let’s look at the top 25 most frequent words used in complaint resolutions in each of the 5 boroughs:

As the plot above shows, the most frequently used word in all 5 boroughs is police, followed by department, complaint, and responded.

Determining the terms truly characteristic of SRs in each borough with text mining (TF-IDF)

In information retrieval, tf–idf or TFIDF, short for term frequency–inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus.
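With each borough treated as one document, tf is a word's count divided by the total word count in that borough, and idf is the natural log of (number of boroughs / number of boroughs whose text contains the word). The base-R sketch below follows these definitions, which, to my understanding, are the same ones tidytext's `bind_tf_idf()` uses. `tf_idf` is a hypothetical helper, it assumes each (document, term) pair appears only once, and the counts are toy values.

```r
# TF-IDF over (document, term, n) counts; each borough is one "document".
tf_idf <- function(document, term, n) {
  doc_total <- ave(n, document, FUN = sum)             # words per document
  tf <- n / doc_total                                  # term frequency
  n_docs <- length(unique(document))                   # number of documents N
  docs_with_term <- ave(rep(1, length(term)), term, FUN = sum)  # document frequency
  idf <- log(n_docs / docs_with_term)                  # natural-log idf
  data.frame(document, term, n, tf, idf, tf_idf = tf * idf,
             stringsAsFactors = FALSE)
}

# Toy counts for illustration only
counts <- data.frame(borough = c("BRONX", "BRONX", "QUEENS", "QUEENS"),
                     word = c("police", "reviewed", "police", "reported"),
                     n = c(4L, 2L, 3L, 1L), stringsAsFactors = FALSE)
scores <- with(counts, tf_idf(borough, word, n))
scores[order(-scores$tf_idf), ]
```

A word that appears in every borough, like police in this analysis, gets idf = log(1) = 0 and therefore a tf-idf of 0, however frequent it is. This is why TF-IDF surfaces borough-specific words rather than the most common ones.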

Presenting characteristic terms for SRs by borough

Let’s analyze the distinctive words used in each borough

As we can infer from the graph above, the most distinctive words for each borough are:

  1. Bronx: reviewed followed by provided

  2. Manhattan: unable followed by premises

  3. Brooklyn: reviewed followed by provided

  4. Queens: reported followed by city

  5. Staten Island: violation followed by time

Map Analysis

Now let’s analyze the NYC 311 data using maps

Preparing and tidying up the data for map plotting

Create counts_filtered, a dataset that keeps only the rows in which the complaint count is greater than 2
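The grouping used for the map counts is not shown here, so the sketch below assumes complaints are counted per ZIP code and complaint type and then filtered to counts greater than 2. `filter_counts` is a hypothetical helper, and the data are toy values.

```r
# Assumed grouping: count complaints per ZIP code and complaint type,
# then keep only groups with more than `min_count` complaints.
filter_counts <- function(zip, complaint_type, min_count = 2) {
  counts <- aggregate(list(n = rep(1L, length(zip))),
                      by = list(incident_zip = zip, complaint_type = complaint_type),
                      FUN = sum)
  counts[counts$n > min_count, , drop = FALSE]
}

# Toy data for illustration only
zip  <- c("10001", "10001", "10001", "10002", "10002", "10001")
type <- c(rep("Noise - Residential", 3), rep("Illegal Parking", 3))
counts_filtered <- filter_counts(zip, type)
counts_filtered
```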

Service Request Resolutions Tidying and Analysis - Using TM

The tm package provides a framework for text mining in R; this section uses it.

Load required libraries

Filtering dataset to the most relevant Complaint Types

Cleaning up non-standard characters (encoding conversion)
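One way to do such a conversion is base R's `iconv()`. The sketch below assumes UTF-8 input and simply drops characters that have no ASCII equivalent (`sub = ""`); transliteration is an alternative. `clean_text` is a hypothetical helper. A function like this could be wrapped with `tm::content_transformer()` before building the corpus.

```r
# Convert text to ASCII, dropping non-convertible characters, then tidy spaces.
clean_text <- function(x) {
  x <- iconv(x, from = "UTF-8", to = "ASCII", sub = "")  # drop non-ASCII bytes
  x <- gsub("[[:space:]]+", " ", x)                       # collapse whitespace runs
  trimws(x)                                               # trim leading/trailing spaces
}

clean_text(c("Noise \u2013 Residential  ", "Caf\u00e9 closed"))
```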

Creating the corpus and the term-document matrix (TDM)

## <<TermDocumentMatrix (terms: 121, documents: 998)>>
## Non-/sparse entries: 8179/112579
## Sparsity           : 93%
## Maximal term length: 15
## Weighting          : term frequency (tf)

Removing sparse terms with sparse = 0.8 (terms that are empty, i.e. occur 0 times, in at least 80% of the documents are removed)
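To make the sparsity rule concrete, here is a base-R sketch on a small dense matrix (rows are terms, columns are documents). It keeps a term only if it appears in more than `ncol * (1 - sparse)` documents, which is my reading of how `tm::removeSparseTerms()` works. `remove_sparse_terms` is hypothetical, and the matrix is toy data.

```r
# Keep only terms whose sparsity (share of documents where they are absent)
# is below `sparse`, i.e. terms present in > ncol * (1 - sparse) documents.
remove_sparse_terms <- function(tdm, sparse) {
  stopifnot(sparse > 0, sparse < 1)
  doc_freq <- rowSums(tdm > 0)                  # documents containing each term
  keep <- doc_freq > ncol(tdm) * (1 - sparse)
  tdm[keep, , drop = FALSE]
}

# Toy term-document matrix: 3 terms x 5 documents
tdm <- rbind(police  = c(1, 2, 1, 1, 3),
             fix     = c(0, 1, 0, 1, 0),
             hydrant = c(0, 0, 1, 0, 0))
rownames(remove_sparse_terms(tdm, sparse = 0.7))  # police, fix
rownames(remove_sparse_terms(tdm, sparse = 0.5))  # police
```

Applied to this analysis, with 998 documents and sparse = 0.8 a term must appear in more than 998 × 0.2 = 199.6 documents, i.e. at least 200. That is consistent with the TDM shrinking from 121 terms to 10.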

## <<TermDocumentMatrix (terms: 10, documents: 998)>>
## Non-/sparse entries: 5414/4566
## Sparsity           : 46%
## Maximal term length: 11
## Weighting          : term frequency (tf)

Top terms by frequency (mentioned at least 50 times)

## [1] 10

Displaying top terms

##  [1] "action"      "available"   "complaint"   "condition"   "department" 
##  [6] "fix"         "information" "police"      "responded"   "took"

Find the top associations for the top terms using findAssocs() with a lower correlation limit of 0.4. This shows the most consistent term association patterns in the service requests.

## $action
## 
## 
##                 x
## ----------  -----
## fix          0.88
## took         0.88
## responded    0.59
## police       0.54
## 
## $available
## 
## 
##                   x
## ------------  -----
## information    0.84
## 
## $complaint
## 
## 
##             x
## -------  ----
## police    0.4
## 
## $condition
## 
## 
##           x
## -----  ----
## fix     0.6
## took    0.6
## 
## $department
## 
## 
##                 x
## ----------  -----
## police       0.84
## responded    0.82
## fix          0.43
## took         0.43

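As per the association figures above, action is correlated with:

  1. fix (0.88)

  2. took (0.88)

  3. responded (0.59)

  4. police (0.54)

These values are correlation coefficients between term counts across documents, not percentages. The associations of the other words can be interpreted in the same way.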
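findAssocs() reports, for a given term, the correlation between that term's counts and every other term's counts across documents, keeping those at or above `corlimit`. The base-R sketch below shows the idea on a small dense matrix. `term_assocs` is hypothetical, the matrix is toy data, and tm's exact rounding and ordering may differ.

```r
# Correlate one term's document counts with each other term's counts and
# keep correlations at or above corlimit, rounded to 2 decimals.
term_assocs <- function(tdm, term, corlimit) {
  target <- tdm[term, ]
  others <- tdm[rownames(tdm) != term, , drop = FALSE]
  r <- apply(others, 1, function(v) cor(target, v))   # Pearson correlation per term
  r <- round(r[!is.na(r) & r >= corlimit], 2)
  sort(r, decreasing = TRUE)
}

# Toy term-document matrix: 4 terms x 5 documents
tdm <- rbind(action = c(1, 0, 1, 0, 1),
             fix    = c(1, 0, 1, 0, 1),
             took   = c(1, 0, 1, 1, 1),
             police = c(1, 1, 1, 0, 0))
term_assocs(tdm, "action", corlimit = 0.4)  # fix 1.00, took 0.61
```

Here police correlates with action at only about 0.17, so it falls below the 0.4 limit and is dropped.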

Creating a word cloud for the top terms in the SRs

NYC311 Tweets Analysis

Now, let’s analyze the NYC 311 tweets.

Data Collection and Exploration

API setup (application name and security context). The commands are commented out and the keys are masked.

Search for and collect 1,100 tweets that mention the “nyc311” service in any way (hashtag, user, follower, etc.)

@KGRLogic Good afternoon, thank you for reaching out. Please DM me with details on the type of inspection you requested. Thanks! https://t.co/hDTCua1AH9
@willardk Good afternoon, please send us a DM so I may ask you a few questions about this food delivery. Thank you! https://t.co/hDTCu9JZPB
@pixistik04 Good evening. You can report a fire hydrant that’s open online here: https://t.co/XGvdzZiRb8 or by sending us a DM for help with reporting. https://t.co/hDTCu9JZPB
@immichaelmorgan @NYCMayor @NYCMayorsOffice Good evening. You can get information and guidance about DMV service changes online at https://t.co/KXNs23W0Ry or you can reach out to them by phone at (718) 966-6155 Monday through Friday from 8:30 AM to 4 PM. Thanks!
@andrewPnelson2 Hi, all non-essential construction in NYC has been halted. DOB created a Real-Time Essential Construction Map, which shows the location of allowed essential construction sites in NYC. If a worksite isn’t on the map, DM us to file a report. https://t.co/hDTCu9JZPB

Sample of users tweeting about “nyc311”

name | location | followers_count | friends_count
---- | -------- | --------------- | -------------
New York City 311 | New York City | 346557 | 238
Yalaisa Wright | United States | 75 | 295
Boerum Hill Neighbors | Brooklyn, NY | 518 | 1702
Kevin | New York, NY | 221 | 451
Nicholas F |  | 382 | 1330
Eagle One 🇺🇸🦅 | ’Merica | 277 | 307

Let’s plot the time series of “nyc311” tweets (last 7-9 days)

The graph above is interesting: the number of tweets alternately rose and fell from May 03 to May 08, but dropped sharply between May 10 and May 12. COVID-19 may be one of the main reasons.
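Behind a tweets-over-time plot is a count of tweets per day. The base-R sketch below assumes a POSIXct `created_at` column (as returned by rtweet) and UTC day boundaries. `tweets_per_day` is hypothetical, and the timestamps are toy values.

```r
# Count tweets per calendar day (UTC) from their creation timestamps.
tweets_per_day <- function(created_at) {
  days <- as.Date(created_at, tz = "UTC")   # drop the time of day
  counts <- table(days)
  data.frame(day = as.Date(names(counts)), n = as.integer(counts))
}

# Toy timestamps for illustration only
created_at <- as.POSIXct(c("2020-05-03 10:00:00", "2020-05-03 18:30:00",
                           "2020-05-04 09:15:00"), tz = "UTC")
tweets_per_day(created_at)
```

The time zone matters: a tweet sent late in the evening in New York already falls on the next UTC day. Days with no tweets are also absent from the table rather than shown as zero.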

Sentiment Analysis - Syuzhet Package

Syuzhet scores text with the NRC lexicon on 10 categories: eight emotions (anger, anticipation, disgust, fear, joy, sadness, surprise, trust) and two sentiments (negative and positive).

Let’s determine the emotions in “nyc311” tweets

nyc311_tweets_txt | anger | anticipation | disgust | fear | joy | sadness | surprise | trust | negative | positive
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
@KGRLogic Good afternoon, thank you for reaching out. Please DM me with details on the type of inspection you requested. Thanks! https://t.co/hDTCua1AH9 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1
@willardk Good afternoon, please send us a DM so I may ask you a few questions about this food delivery. Thank you! https://t.co/hDTCu9JZPB | 0 | 2 | 0 | 0 | 2 | 0 | 1 | 2 | 0 | 3
@pixistik04 Good evening. You can report a fire hydrant that’s open online here: https://t.co/XGvdzZiRb8 or by sending us a DM for help with reporting. https://t.co/hDTCu9JZPB | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1

Sentiment Scoring

The core idea of a sentiment score is to put the number of positive terms in a text in relation to the number of negative terms.
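A minimal illustration of this idea scores each text as (positive words − negative words) and labels it positive, negative, or neutral. This is not Syuzhet's code, which uses full lexicons through functions such as `get_sentiment()`. The tiny word lists and the `score_sentiment` helper are made up for illustration.

```r
# Score each text as (#positive words - #negative words) with a tiny,
# made-up lexicon, then label it positive / negative / neutral.
score_sentiment <- function(text,
                            positive = c("good", "thank", "thanks", "help"),
                            negative = c("disgusting", "unhealthy", "danger", "sorry")) {
  words <- strsplit(tolower(text), "[^a-z]+")
  pos <- vapply(words, function(w) sum(w %in% positive), integer(1))
  neg <- vapply(words, function(w) sum(w %in% negative), integer(1))
  score <- pos - neg
  label <- ifelse(score > 0, "positive", ifelse(score < 0, "negative", "neutral"))
  data.frame(score = score, label = label, stringsAsFactors = FALSE)
}

tweets <- c("Good afternoon, thank you for reaching out. Thanks!",
            "It's a disgusting and unhealthy situation.",
            "Parking meters will remain in effect.")
score_sentiment(tweets)
```

Text with no lexicon matches scores 0 and is labelled neutral. This also happens to text in a language the lexicon does not cover.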

Let’s have a look at positive tweets

@KGRLogic Good afternoon, thank you for reaching out. Please DM me with details on the type of inspection you requested. Thanks! https://t.co/hDTCua1AH9
@willardk Good afternoon, please send us a DM so I may ask you a few questions about this food delivery. Thank you! https://t.co/hDTCu9JZPB
@pixistik04 Good evening. You can report a fire hydrant that’s open online here: https://t.co/XGvdzZiRb8 or by sending us a DM for help with reporting. https://t.co/hDTCu9JZPB
@immichaelmorgan @NYCMayor @NYCMayorsOffice Good evening. You can get information and guidance about DMV service changes online at https://t.co/KXNs23W0Ry or you can reach out to them by phone at (718) 966-6155 Monday through Friday from 8:30 AM to 4 PM. Thanks!
@andrewPnelson2 Hi, all non-essential construction in NYC has been halted. DOB created a Real-Time Essential Construction Map, which shows the location of allowed essential construction sites in NYC. If a worksite isn’t on the map, DM us to file a report. https://t.co/hDTCu9JZPB

Most Positive Tweet

## [1] "@PQuinceNYC @HelenRosenthal @NYCDOB Good morning, please send us a Direct Message. We have a few questions to clarify what is happening at the construction site to ensure that we file the correct report. Thank you. https://t.co/hDTCu9JZPB"

Let’s have a look at negative tweets

.@NYCDHS’s Code Blue is in effect until tomorrow, Sunday, May 10 at 8:00 AM. If you see a homeless person outside in these frigid temperatures, please call us at 311. https://t.co/jEaQyOxlxc
@domiruiz02 Hi, thank you for your tweets. Call 911 to report an emergency situation or condition that might cause danger to life or personal property and to report a medical or health-related emergency: https://t.co/Gf62x24xHN.
@DiamondVMedia @NYC_DOT @Pollytrott @NYCSpeakerCoJo @BPEricAdams @NYCMayor @NYGovCuomo Good morning, if the potholes are dangerous and likely to cause an accident, call 911. You can report potholes at https://t.co/MkR064QHhv or DM me and I’ll file for you. https://t.co/hDTCu9JZPB
UPDATE: #NYCASP rules are suspended through Sunday, May 17.

#NYCASP resumes Monday, May 18 through Sunday, May 24 for a citywide clean sweep.

#NYCASP rules will then be suspended again through Sunday, June 7.

Parking meters will remain in effect.

Follow @NYCASP for more. https://t.co/Qfh0v6R3Ia
@megshashin @NYCMayor @NYCMayorsOffice Good morning, we’re sorry to hear about your experience. If you believe you’ve been discriminated against, you can file a complaint with NYC Commission on Human Rights at https://t.co/FvPrtMXcLe or send us a DM. https://t.co/hDTCu9JZPB

Most Negative Tweet

## [1] "@nyc311 @NYPD13Pct @CarlinaRivera I reported a recurring homeless condition in Gramercy. It was referred to the NYPD and subsequently closed as “non crime corrected” It’s a disgusting and unhealthy situation.  He’s  defecating on the sidewalk. 333 East 23 Street b/t 1st and 2nd https://t.co/Huc4taBHkc"

Let’s now look at neutral tweets

(Translated from Spanish)
#NYCASP Alternate side parking rules are suspended today, Saturday, May 9, through Tuesday, May 12. Parking meters will remain in effect. Follow @NYCASP and download the mobile app to receive alerts directly on your phone: https://t.co/9GSt3VfwSg https://t.co/8oFLl91kmn
#NYCASP Alternate side parking rules are suspended today, Wednesday, May 13. Parking meters will remain in effect.

Follow @NYCASP and download the mobile app to receive alerts directly on your phone: https://t.co/9GSt3VfwSg
#NYCASP Alternate side parking rules are suspended today, Thursday, May 7, through Tuesday, May 12. Parking meters will remain in effect. Follow @NYCASP and download the mobile app to receive alerts directly on your phone: https://t.co/9GSt3VfwSg https://t.co/7OmeoBoGbv
#NYCASP Alternate side parking rules are suspended today, Tuesday, May 12. Parking meters will remain in effect. Follow @NYCASP and download the mobile app to receive alerts directly on your phone: https://t.co/9GSt3VfwSg https://t.co/bbZezAr66f
#NYCASP Alternate side parking rules are suspended today, Wednesday, May 6, through Tuesday, May 12. Parking meters will remain in effect. Follow @NYCASP and download the mobile app to receive alerts directly on your phone: https://t.co/9GSt3VfwSg https://t.co/TMiRuFsyUf

Total tweets by sentiment, plotted with the plotly package

Conclusion

Based on all the analyses performed, the NYC311 service is a very popular and reliable channel and resource for NYC communities to raise awareness among local agencies and citizen service providers about many topics important to the well-being of society.

I was able to identify overall themes and topics affecting the main boroughs within the NY metro area. More importantly, I was able to narrow down the characteristic themes and patterns that are more prevalent in each one, giving an idea of the specific challenges, needs, and local dynamics each borough community experiences on a daily basis.

In terms of sentiment analysis of the “nyc311” tweets, the majority express a positive sentiment. Surprisingly, relatively few complaints or negative mentions are raised through the Twitter channel; the NYC311 service also uses it to provide resolution advice, status updates, and redirection guidance to its users and followers.

Issues faced while creating this analytics project

  1. Twitter developer account - The process of getting permission to create a Twitter developer account has been modified and upgraded; it now requires much finer details, which Twitter then reviews. It took 3 days to explain how and where I would use the NYC 311 Twitter data, but I finally got permission to create an app with the Twitter developer account.

  2. Map plotting in R Markdown - The code to plot maps of the NYC area, with SR statistics overlaid in multiple facets by complaint type, worked perfectly in the RStudio console. When I ran the same code within R Markdown, it threw an error: it did not support facets and did not overlay the SR statistics. I added a picture of the correct plot right after the affected code section as a reference.