NYC Subway entrances

John Kellogg

2019-10-06

Found by: John Kellogg

Client Comments:

  • From a not “clean” aspect, I have found the New York subway entrances file which does need cleaning and reprocessing ( https://data.cityofnewyork.us/Transportation/Subway-Entrances/drex-xx56). The Line column is all over the place and needs individual lines split into separate columns. The Name column could also be split up into separate columns. A cool graphic could then be utilized marking the locations on a map.
  • Oh man, that data needs some major clean up. One of the columns called line has what looks like date month name followed by one digit day. And then in the same column just a number. No clue what that columns is for but that would be hard to group or aggregate. (John Suh)

Analyst Comments and Actions:

The GitHub link contained a downloadable CSV, which was loaded into a raw data frame. Two clients have given feedback on the project; each request is addressed in its own section where possible. The first six rows of the raw data are shown below.

##   OBJECTID                               URL
## 1     1734 http://web.mta.info/nyct/service/
## 2     1735 http://web.mta.info/nyct/service/
## 3     1736 http://web.mta.info/nyct/service/
## 4     1737 http://web.mta.info/nyct/service/
## 5     1738 http://web.mta.info/nyct/service/
## 6     1739 http://web.mta.info/nyct/service/
##                                      NAME
## 1 Birchall Ave & Sagamore St at NW corner
## 2 Birchall Ave & Sagamore St at NE corner
## 3 Morris Park Ave & 180th St at NW corner
## 4 Morris Park Ave & 180th St at NW corner
## 5       Boston Rd & 178th St at SW corner
## 6  Boston Rd & E Tremont Ave at NW corner
##                                        the_geom  LINE
## 1  POINT (-73.86835600032798 40.84916900104506) 5-Feb
## 2  POINT (-73.86821300022677 40.84912800131844) 5-Feb
## 3  POINT (-73.87349900050798 40.84122300105249) 5-Feb
## 4   POINT (-73.8728919997833 40.84145300067447) 5-Feb
## 5  POINT (-73.87962300013866 40.84081500075867) 5-Feb
## 6 POINT (-73.88000500027815 40.840434000875874) 5-Feb

Client Comments:

“The Line column is all over the place and needs individual lines split into separate columns.”

Analyst Comments and Actions:

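The Line column has been cleaned of Excel’s autoformatting, which had converted hyphen-separated line lists into dates (the “5-Feb” values in the raw data above most likely stand for 2-5, i.e. the 2 and 5 trains). The next task is tidying, starting with splitting the subway lines into individual columns.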

## Warning: Expected 11 pieces. Missing pieces filled with `NA` in 1905
## rows [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
## 20, ...].
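
The warning above is the one tidyr's separate() emits when a row yields fewer pieces than the number of target columns; most entrances serve far fewer than 11 lines. The report does not show its code, so the following is only a minimal sketch of the two steps under stated assumptions: that Excel turned values such as "2-5" into "5-Feb", that lines are hyphen-separated, and that the target columns are called Line1 to Line11. The helper fix_excel_date() and the toy data are hypothetical.

```r
library(tidyr)

# Hypothetical helper: undo Excel's date autoformatting, e.g. "5-Feb" -> "2-5".
# Assumes Excel read "month-day" input and displayed it as "day-Mon".
fix_excel_date <- function(x) {
  pattern <- paste0("^[0-9]{1,2}-(", paste(month.abb, collapse = "|"), ")$")
  is_date <- grepl(pattern, x)
  parts <- strsplit(x[is_date], "-", fixed = TRUE)
  day <- vapply(parts, `[`, character(1), 1)
  mon <- match(vapply(parts, `[`, character(1), 2), month.abb)
  x[is_date] <- paste(mon, day, sep = "-")
  x
}

# Toy data standing in for the real file (values are illustrative only).
entrances <- data.frame(
  OBJECTID = c(1734, 1735, 1736),
  LINE = c("5-Feb", "A-C-E", "7"),
  stringsAsFactors = FALSE
)

entrances$LINE <- fix_excel_date(entrances$LINE)

# Split into one column per line; fill = "right" pads short rows with NA
# on the right without raising the "Missing pieces" warning.
line_cols <- paste0("Line", 1:11)
lines_wide <- separate(entrances, LINE, into = line_cols, sep = "-",
                       fill = "right", remove = FALSE)
```

Keeping the original LINE column (remove = FALSE) makes it easy to compare the split columns against the source value while checking the result.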

Analyst Comments and Actions:

The second step in the tidying process splits each entrance name into separate fields (Main Street, Cross Street, Corner).

## Warning: Expected 3 pieces. Missing pieces filled with `NA` in 1927
## rows [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
## 20, ...].
## Warning: Expected 2 pieces. Additional pieces discarded in 18 rows [362,
## 371, 389, 517, 756, 757, 758, 759, 760, 1275, 1276, 1358, 1359, 1442, 1443,
## 1444, 1851, 1914].
## Warning: Expected 2 pieces. Missing pieces filled with `NA` in 3 rows
## [1227, 1228, 1924].
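
These warnings show that a small number of names do not follow the usual "Main & Cross at Corner" pattern. As a hedged sketch (not the report's actual code, which is not shown), the split can be done in two passes with tidyr's separate(), stating explicitly what should happen to irregular rows. The column names Streets, Main_Street and Cross_Street and the third toy row are assumptions made for illustration.

```r
library(tidyr)

# Toy data: two names from the raw output plus one made-up irregular name.
entrances <- data.frame(
  NAME = c("Birchall Ave & Sagamore St at NW corner",
           "Boston Rd & 178th St at SW corner",
           "Example Plaza"),  # hypothetical name with no "&" or "at"
  stringsAsFactors = FALSE
)

# Pass 1: split the street pair from the corner description.
# extra = "merge" keeps any additional pieces instead of discarding them;
# fill = "right" puts NA in Corner when " at " is absent.
step1 <- separate(entrances, NAME, into = c("Streets", "Corner"),
                  sep = " at ", extra = "merge", fill = "right",
                  remove = FALSE)

# Pass 2: split the street pair into main and cross street.
located <- separate(step1, Streets, into = c("Main_Street", "Cross_Street"),
                    sep = " & ", extra = "merge", fill = "right")

# Drop the trailing word "corner" so only the compass code remains.
located$Corner <- sub(" corner$", "", located$Corner)
```

With extra and fill set, irregular names no longer trigger warnings, so rows with NA in Cross_Street or Corner should be reviewed by hand afterwards.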

Client Comments:

“A cool graphic could then be utilized marking the locations on a map.”

Analyst Comments and Actions:

Using leaflet, a map of all the entrances with labels was created.
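
The map code itself is not shown in this report. A minimal sketch of one way to build such a map is given here, assuming the coordinates are parsed from the the_geom column (longitude first, then latitude, as in the raw output above) and the NAME column is used as the label. The column names lon and lat and the choice of circle markers are assumptions.

```r
library(leaflet)

# Two rows copied from the raw output above.
entrances <- data.frame(
  NAME = c("Birchall Ave & Sagamore St at NW corner",
           "Birchall Ave & Sagamore St at NE corner"),
  the_geom = c("POINT (-73.86835600032798 40.84916900104506)",
               "POINT (-73.86821300022677 40.84912800131844)"),
  stringsAsFactors = FALSE
)

# Strip the "POINT (...)" wrapper, then split into longitude and latitude.
coords <- sub("^POINT \\((.*)\\)$", "\\1", entrances$the_geom)
xy <- strsplit(coords, " ", fixed = TRUE)
entrances$lon <- as.numeric(vapply(xy, `[`, character(1), 1))
entrances$lat <- as.numeric(vapply(xy, `[`, character(1), 2))

# One circle marker per entrance; the label appears on hover.
entrance_map <- leaflet(entrances) %>%
  addTiles() %>%
  addCircleMarkers(lng = ~lon, lat = ~lat, label = ~NAME, radius = 4)

# In an interactive session or an R Markdown chunk, printing the object
# renders the map: print(entrance_map)
```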

## Warning: funs() is soft deprecated as of dplyr 0.8.0
## Please use a list of either functions or lambdas: 
## 
##   # Simple named list: 
##   list(mean = mean, median = median)
## 
##   # Auto named with `tibble::lst()`: 
##   tibble::lst(mean, median)
## 
##   # Using lambdas
##   list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
## This warning is displayed once per session.