Facilities from open data

This is a short markdown on the locations of NYC public facilities, using data from NYC Open Data. It shows how I cleaned the data and what the cleaned data includes.

library(sf)        # st_read()
library(tidyverse) # %>%, select(), filter(), count(), tibble()

# read shapefiles ------------------------------------------------------------------------
oddir <- "~/R/local-data/NYC social service shpfiles/"
files <- list.files(oddir,
                    recursive = TRUE)

# facilities shapefile: the .shp file under the Facilities folder
fac <- paste0(oddir,
              files[grepl("Fac.*\\.shp$", files)])
fac <- st_read(fac)
## Reading layer `geo_export_c9d7853c-b6d8-473e-839c-bac278d6e681' from data source `C:\Users\kiram\OneDrive\Documents\R\local-data\NYC social service shpfiles\Facilities-Database\geo_export_c9d7853c-b6d8-473e-839c-bac278d6e681.shp' using driver `ESRI Shapefile'
## Simple feature collection with 33084 features and 36 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -74.25392 ymin: 40.49974 xmax: -73.70193 ymax: 40.91391
## geographic CRS: WGS84(DD)
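A sketch of the same file lookup using `full.names = TRUE` and an escaped dot, so that only files ending in `.shp` match. It runs against a throwaway folder under `tempdir()`; the folder and file names are hypothetical stand-ins, not the real export names.

```r
# Throwaway folder tree with hypothetical file names
root <- file.path(tempdir(), "shp-demo")
shp_dir <- file.path(root, "Facilities-Database")
dir.create(shp_dir, recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(shp_dir, c("geo_export_demo.shp",
                                           "geo_export_demo.dbf",
                                           "geo_export_demo.shp.xml"))))

# full.names = TRUE returns usable paths, so no paste0() is needed.
# The regex is applied to the full path because list.files()'s own
# `pattern` argument only sees file names, not the folder name.
all_files <- list.files(root, recursive = TRUE, full.names = TRUE)
shp <- all_files[grepl("Fac.*\\.shp$", all_files)]
shp
```

The `\\.shp$` anchor skips sidecar files such as `.shp.xml`, which an unanchored `shp` would also pick up.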

Data overview

colnames(fac)
##  [1] "address"    "addressnum" "bbl"        "bin"        "boro"      
##  [6] "borocode"   "capacity"   "captype"    "censtract"  "city"      
## [11] "commboard"  "council"    "datasource" "facdomain"  "facgroup"  
## [16] "facname"    "facsubgrp"  "factype"    "latitude"   "longitude" 
## [21] "nta"        "opabbrev"   "opname"     "optype"     "overabbrev"
## [26] "overagency" "overlevel"  "policeprct" "proptype"   "schooldist"
## [31] "servarea"   "streetname" "uid"        "xcoord"     "ycoord"    
## [36] "zipcode"    "geometry"
fac %>% tibble()
## # A tibble: 33,084 x 37
##    address addressnum    bbl    bin boro  borocode capacity captype censtract
##    <chr>   <chr>       <dbl>  <dbl> <chr>    <dbl>    <dbl> <chr>   <chr>    
##  1 2869 C~ 2869       4.04e9 4.09e6 Quee~        4        0 <NA>    062500   
##  2 117 WE~ 117        1.01e9 1.03e6 Manh~        1        0 <NA>    018100   
##  3 80-45 ~ 80-45      4.08e9 4.54e6 Quee~        4        0 <NA>    156700   
##  4 888 FO~ 888        3.05e9 3.33e6 Broo~        3        0 <NA>    107000   
##  5 25 BEA~ 25         1.00e9 1.00e6 Manh~        1        0 <NA>    000900   
##  6 1150 F~ 1150       5.02e9 5.13e6 Stat~        5        0 <NA>    027301   
##  7 280 PL~ 280        1.02e9 1.05e6 Manh~        1        0 <NA>    017800   
##  8 1 BATT~ 1          1.00e9 1.00e6 Manh~        1        0 <NA>    000900   
##  9 885 SE~ 885        1.01e9 1.04e6 Manh~        1        0 <NA>    009000   
## 10 55 BRO~ 55         1.00e9 1.00e6 Manh~        1        0 <NA>    001300   
## # ... with 33,074 more rows, and 28 more variables: city <chr>,
## #   commboard <dbl>, council <dbl>, datasource <chr>, facdomain <chr>,
## #   facgroup <chr>, facname <chr>, facsubgrp <chr>, factype <chr>,
## #   latitude <dbl>, longitude <dbl>, nta <chr>, opabbrev <chr>, opname <chr>,
## #   optype <chr>, overabbrev <chr>, overagency <chr>, overlevel <chr>,
## #   policeprct <dbl>, proptype <chr>, schooldist <chr>, servarea <chr>,
## #   streetname <chr>, uid <chr>, xcoord <dbl>, ycoord <dbl>, zipcode <chr>,
## #   geometry <POINT [°]>
#colnames(fac)[13:27] %>%
#  map( ~count(tibble(fac),
#              !!rlang::sym(.)))

#tibble(fac) %>%
#  count(facgroup, facsubgrp) %>% View()

I think the facility group (`facgroup`) and subgroup (`facsubgrp`) columns are likely the best way to filter down to what we're interested in.

Full detail: [interactive chart not reproduced here]

Trim to selected types

# trim to selected columns
fac <- fac %>%
    select(address,
           matches("^fac|^op"))

# Kindergartens and day cares are left out
plausible.to.keep <- c("CAMPS",
                       # "HEALTH CARE", # mental health and chemical dependency?
                       "HUMAN SERVICES", # school, workforce dev
                       "LIBRARIES",
                       # "VOCATIONAL OR PROPRIETARY SCHOOLS",
                       # "PARKS",
                       "SCHOOLS", # K-12? does it make sense to filter further?
                       "YOUTH SERVICES"
                       )

plausible.to.keep <- plausible.to.keep %>% paste(collapse = "|")


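# keep rows whose facgroup contains any of the groups above;
# grepl() matches substrings, so "SCHOOLS" also keeps any longer group name containing it
fac <- fac %>%
  filter(grepl(plausible.to.keep, facgroup))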

# also drop Human Services subgroups matching IMMIGRANT, DISABILITIES, SENIOR or SOUP
fac <- fac %>%
  filter(!(facgroup %in% "HUMAN SERVICES" &
             grepl("IMMIGRANT|DISABILITIES|SENIOR|SOUP", facsubgrp)))
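Because `grepl()` matches substrings, the collapsed pattern `"CAMPS|HUMAN SERVICES|LIBRARIES|SCHOOLS|YOUTH SERVICES"` keeps any `facgroup` that merely contains one of those words; the commented-out `"VOCATIONAL OR PROPRIETARY SCHOOLS"`, for instance, would still match `"SCHOOLS"`. The sketch below compares that with an anchored pattern and with exact matching via `%in%`, on a short hypothetical vector of group names. Which behavior is wanted depends on the exact `facgroup` labels in the data, so it is worth checking them (e.g. with the commented-out `count(facgroup, facsubgrp)` call) before switching.

```r
keep <- c("CAMPS", "HUMAN SERVICES", "LIBRARIES", "SCHOOLS", "YOUTH SERVICES")

# Hypothetical facgroup values, for illustration only
groups <- c("LIBRARIES", "SCHOOLS", "VOCATIONAL OR PROPRIETARY SCHOOLS", "HEALTH CARE")

# Substring match, as in the filter above
substring_hit <- grepl(paste(keep, collapse = "|"), groups)

# Anchored alternation: the whole value must equal one of the options
anchored_hit <- grepl(paste0("^(", paste(keep, collapse = "|"), ")$"), groups)

# Exact matching without any regex
exact_hit <- groups %in% keep

data.frame(groups, substring_hit, anchored_hit, exact_hit)
```

`%in%` also sidesteps regex escaping if any labels contain characters such as parentheses.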

Possible further trims?

# for example, it's easy to get the sub-type (factype) for schools
tibble(fac) %>% 
  filter(grepl("SCHOOLS", facgroup)) %>% 
  count(factype)
## # A tibble: 34 x 2
##    factype                                                   n
##    <chr>                                                 <int>
##  1 APPROVED PRIVATE SCHOOLS FOR SWD                          9
##  2 CHARTER SCHOOL                                          381
##  3 ELEMENTARY SCHOOL - CHARTER                              78
##  4 ELEMENTARY SCHOOL - NON-PUBLIC                          678
##  5 ELEMENTARY SCHOOL - PUBLIC                              637
##  6 ELEMENTARY SCHOOL - PUBLIC, SPECIAL EDUCATION             1
##  7 GED-ALTERNATIVE HIGH SCHOOL EQUIVALENCY PREP PROGRAMS   129
##  8 HIGH SCHOOL - CHARTER                                    26
##  9 HIGH SCHOOL - NON-PUBLIC                                146
## 10 HIGH SCHOOL - PUBLIC                                    390
## # ... with 24 more rows
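If a further trim were wanted, `factype` could be filtered the same way as the subgroups. As a minimal sketch, assuming the goal were to keep only public schools, the labels shown above can be separated with a fixed-string match; the vector here just copies a few of those labels.

```r
# A few factype labels copied from the school count above
factypes <- c("CHARTER SCHOOL",
              "ELEMENTARY SCHOOL - CHARTER",
              "ELEMENTARY SCHOOL - NON-PUBLIC",
              "ELEMENTARY SCHOOL - PUBLIC",
              "ELEMENTARY SCHOOL - PUBLIC, SPECIAL EDUCATION",
              "HIGH SCHOOL - NON-PUBLIC",
              "HIGH SCHOOL - PUBLIC")

# " - PUBLIC" (with spaces) matches the public labels but not
# "- NON-PUBLIC", where there is no space before PUBLIC
is_public <- grepl(" - PUBLIC", factypes, fixed = TRUE)
factypes[is_public]
```

On the real data, something like `fac %>% filter(!grepl("SCHOOLS", facgroup) | grepl(" - PUBLIC", factype, fixed = TRUE))` would keep every non-school group plus public schools only. Note that it would also drop charter schools and GED programs, which may not be intended.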

What is left after cleaning:

The map staggers overlapping points, since facilities are occasionally double-counted: for example, under Youth Services the same building can house both an after-school program and an SYEP (Summer Youth Employment Program) site. It would be possible to parse these out and handle them differently.
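One possible way to handle the double-counting, sketched on hypothetical rows: count programs per address, then collapse to one row per address with the program types pasted together. Matching on address text is crude (spelling variants will not line up); matching on the building ID (`bin`) or on identical point geometry would likely be more robust, though `bin` is dropped by the column trim above and would need to be kept.

```r
# Hypothetical rows: one address hosting two programs
sites <- data.frame(
  address = c("100 MAIN ST", "100 MAIN ST", "200 PARK AVE"),
  factype = c("AFTER-SCHOOL PROGRAM", "SYEP", "PUBLIC LIBRARY"),
  stringsAsFactors = FALSE
)

# How many programs share each address?
n_per_address <- table(sites$address)

# One row per address, with the distinct program types pasted together
collapsed <- aggregate(factype ~ address, data = sites,
                       FUN = function(x) paste(sort(unique(x)), collapse = "; "))
collapsed
```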

Final notes

Data downloaded from NYC Open Data: https://data.cityofnewyork.us/City-Government/Facilities-Database-Shapefile/2fpa-bnsx

Metadata: https://www1.nyc.gov/assets/planning/download/pdf/data-maps/open-data/colp_metadata.pdf

From NYC Open Data: “Facilities Database - Shapefile: The City Planning Facilities Database (FacDB) aggregates information about 35,000+ public and private facilities and program sites that are owned, operated, funded, licensed or certified by a City, State, or Federal agency in the City of New York. It captures facilities that generally help to shape quality of life in the city’s neighborhoods, including schools, day cares, parks, libraries, public safety services, youth programs, community centers, health clinics, workforce development programs, transitional housing, and solid waste and transportation infrastructure sites. To facilitate analysis and mapping, the data is available in comma-separated values (CSV) file format, ESRI Shapefile, and GeoJSON. The data is also complemented with a new interactive web map that enables users to easily filter the data for their needs. Users are strongly encouraged to read the database documentation, particularly with regard to analytical limitations. For data dictionary, please follow this link. All previously released versions of this data are available at BYTES of the BIG APPLE - Archive.”