NYC Subway entrances

John Kellogg

2019-10-06

NYC Subway entrances

Found by: John Kellogg

Client Comments:

From a not “clean” aspect, I have found the New York subway entrances file which does need cleaning and reprocessing ( https://data.cityofnewyork.us/Transportation/Subway-Entrances/drex-xx56). The Line column is all over the place and needs individual lines split into separate columns. The Name column could also be split up into separate columns. A cool graphic could then be utilized marking the locations on a map.
Oh man, that data needs some major clean up. One of the columns called line has what looks like date month name followed by one digit day. And then in the same column just a number. No clue what that columns is for but that would be hard to group or aggregate. (John Suh)

Analyst Comments and Actions:

Github link contained a downloadable csv which was loaded into a raw dataframe. Two separate clients have given feedback into the project. Each ask will be broken into separate sections where possible.

# dowload file from github, save it locally in your home directory
download <- download.file('https://raw.githubusercontent.com/kelloggjohnd/DATA607/master/DOITT_SUBWAY_ENTRANCE.csv', destfile = "subway_entrance.csv", method = "wininet") 

# manipulate the data into a data frame
data_raw <- data.frame(read.csv(file = "subway_entrance.csv", header = TRUE, sep = ","))

# QC step
head(data_raw)

##   OBJECTID                               URL
## 1     1734 http://web.mta.info/nyct/service/
## 2     1735 http://web.mta.info/nyct/service/
## 3     1736 http://web.mta.info/nyct/service/
## 4     1737 http://web.mta.info/nyct/service/
## 5     1738 http://web.mta.info/nyct/service/
## 6     1739 http://web.mta.info/nyct/service/
##                                      NAME
## 1 Birchall Ave & Sagamore St at NW corner
## 2 Birchall Ave & Sagamore St at NE corner
## 3 Morris Park Ave & 180th St at NW corner
## 4 Morris Park Ave & 180th St at NW corner
## 5       Boston Rd & 178th St at SW corner
## 6  Boston Rd & E Tremont Ave at NW corner
##                                        the_geom  LINE
## 1  POINT (-73.86835600032798 40.84916900104506) 5-Feb
## 2  POINT (-73.86821300022677 40.84912800131844) 5-Feb
## 3  POINT (-73.87349900050798 40.84122300105249) 5-Feb
## 4   POINT (-73.8728919997833 40.84145300067447) 5-Feb
## 5  POINT (-73.87962300013866 40.84081500075867) 5-Feb
## 6 POINT (-73.88000500027815 40.840434000875874) 5-Feb

Client Comments:

“The Line column is all over the place and needs individual lines split into separate columns.”

Analyst Comments and Actions:

With the data loaded, we can now start the process to clean the data. The first chunk will adjust the Line column. It appears Excel auto-formatted some of the fields into a date.

# Keep the ObjectID for ease in later joins
subway_line_raw <- subset(data_raw, select = c(OBJECTID,LINE)) 

# SRT replace functions.  I chose to keep these seperate instead of a Tidyr package for ease in QC.
subway_line_raw$LINE <- str_replace_all(subway_line_raw$LINE, "1/2/2003", "1-2-3")
subway_line_raw$LINE <- str_replace_all(subway_line_raw$LINE, "Feb", "2")
subway_line_raw$LINE <- str_replace_all(subway_line_raw$LINE, "Mar", "3")
subway_line_raw$LINE <- str_replace_all(subway_line_raw$LINE, "Apr", "4")
subway_line_raw$LINE <- str_replace_all(subway_line_raw$LINE, "2005", "5")
subway_line_raw$LINE <- str_replace_all(subway_line_raw$LINE, "2006", "6")
subway_line_raw$LINE <- str_replace_all(subway_line_raw$LINE, "2003", "3")

# QC step
head(subway_line_raw)

##   OBJECTID LINE
## 1     1734  5-2
## 2     1735  5-2
## 3     1736  5-2
## 4     1737  5-2
## 5     1738  5-2
## 6     1739  5-2

Analyst Comments and Actions:

The Line column has been cleaned of Excel’s autoformatting. We now need to start the tidying up. First is getting the Subway lines data into individual columns.

# Seperate the lines column into seperate columns 
subway_lines<- subway_line_raw %>% 
  separate(LINE,c("Line1","Line2","Line3","Line4","Line5","Line6","Line7","Line8","Line9","Line10","Line11"), sep = "-", convert = FALSE)

## Warning: Expected 11 pieces. Missing pieces filled with `NA` in 1905
## rows [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
## 20, ...].

Analyst Comments and Actions:

Second step in the tidying process: entrance locations into separate fields (Main Street, Cross Street, Corner)

location_raw <- subset(data_raw, select = c(OBJECTID,NAME))

location_raw <- location_raw %>% 
  separate(NAME, c("Street", "Cross_Street", "Second_street"), sep = "&")%>%
  separate(Cross_Street, c("Cross_Street", "Corner"), sep = "at")

## Warning: Expected 3 pieces. Missing pieces filled with `NA` in 1927
## rows [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
## 20, ...].

## Warning: Expected 2 pieces. Additional pieces discarded in 18 rows [362,
## 371, 389, 517, 756, 757, 758, 759, 760, 1275, 1276, 1358, 1359, 1442, 1443,
## 1444, 1851, 1914].

## Warning: Expected 2 pieces. Missing pieces filled with `NA` in 3 rows
## [1227, 1228, 1924].

Location<- location_raw[, c("OBJECTID","Street", "Cross_Street", "Second_street","Corner")]

Analyst Comments and Actions:

Third step in the tidying: separate the ‘Lat’ and ‘Long’ coordinates into separate columns

geo_raw <- subset(data_raw, select = c(OBJECTID,the_geom))
geo_raw <- geo_raw %>% separate(the_geom, c("trash", "Lat", "Long"), sep = " ")
geo_raw$Lat<- gsub("[(]", "", geo_raw$Lat)
geo_raw$Long<- gsub("[)]", "", geo_raw$Long)
geo_data <- subset(geo_raw, select = c(OBJECTID, Lat,Long))

Analyst Comments and Actions:

Forth and last step in the tidying: combine all the above subsets into one master Data Frame.

Subway <- subset(data_raw, select = c(OBJECTID, URL))
Subway <- Subway %>%
  full_join(subway_lines,by = c("OBJECTID" = "OBJECTID"), copy = FALSE) %>%
  full_join(geo_data,by = c("OBJECTID" = "OBJECTID"), copy = FALSE) %>%
  full_join(Location,by = c("OBJECTID" = "OBJECTID"), copy = FALSE)

Client Comments:

“A cool graphic could then be utilized marking the locations on a map.”

Analyst Comments and Actions:

Using leaflet, a map of all the entrances with labels was created.

# Subset of the Positioning coordinates
Subway_points<- Subway%>%
  unite(Name, Street:Cross_Street, sep = "_")%>%
    select(Lat, Long, Name)%>%
      mutate_at(vars(matches("Lat|Long")),funs(as.numeric))

## Warning: funs() is soft deprecated as of dplyr 0.8.0
## Please use a list of either functions or lambdas: 
## 
##   # Simple named list: 
##   list(mean = mean, median = median)
## 
##   # Auto named with `tibble::lst()`: 
##   tibble::lst(mean, median)
## 
##   # Using lambdas
##   list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
## This warning is displayed once per session.

leaflet(Subway_points) %>% 
  addTiles() %>%
  addCircles(lng = ~Lat, lat = ~Long, weight = 1,popup = ~Name) %>%
  addProviderTiles("CartoDB.Positron")%>%
  setView(-73.98, 40.75, zoom = 13)