This is a short R Markdown document on the locations of NYC public facilities, using the Facilities Database from NYC Open Data. It shows how I cleaned the data and what it includes.
```r
library(sf)     # st_read()
library(dplyr)  # %>%, tibble(), select(), filter(), count()

# read shps ------------------------------------------------------------------------------
oddir <- "~/R/local-data/NYC social service shpfiles/"
files <- list.files(oddir,
                    recursive = TRUE)
# facilities shpfile
fac <- paste0(oddir,
              files[grepl("Fac.*shp$", files)])
fac <- st_read(fac)
## Reading layer `geo_export_c9d7853c-b6d8-473e-839c-bac278d6e681' from data source `C:\Users\kiram\OneDrive\Documents\R\local-data\NYC social service shpfiles\Facilities-Database\geo_export_c9d7853c-b6d8-473e-839c-bac278d6e681.shp' using driver `ESRI Shapefile'
## Simple feature collection with 33084 features and 36 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: -74.25392 ymin: 40.49974 xmax: -73.70193 ymax: 40.91391
## geographic CRS: WGS84(DD)
colnames(fac)
## [1] "address" "addressnum" "bbl" "bin" "boro"
## [6] "borocode" "capacity" "captype" "censtract" "city"
## [11] "commboard" "council" "datasource" "facdomain" "facgroup"
## [16] "facname" "facsubgrp" "factype" "latitude" "longitude"
## [21] "nta" "opabbrev" "opname" "optype" "overabbrev"
## [26] "overagency" "overlevel" "policeprct" "proptype" "schooldist"
## [31] "servarea" "streetname" "uid" "xcoord" "ycoord"
## [36] "zipcode" "geometry"
fac %>% tibble()
## # A tibble: 33,084 x 37
## address addressnum bbl bin boro borocode capacity captype censtract
## <chr> <chr> <dbl> <dbl> <chr> <dbl> <dbl> <chr> <chr>
## 1 2869 C~ 2869 4.04e9 4.09e6 Quee~ 4 0 <NA> 062500
## 2 117 WE~ 117 1.01e9 1.03e6 Manh~ 1 0 <NA> 018100
## 3 80-45 ~ 80-45 4.08e9 4.54e6 Quee~ 4 0 <NA> 156700
## 4 888 FO~ 888 3.05e9 3.33e6 Broo~ 3 0 <NA> 107000
## 5 25 BEA~ 25 1.00e9 1.00e6 Manh~ 1 0 <NA> 000900
## 6 1150 F~ 1150 5.02e9 5.13e6 Stat~ 5 0 <NA> 027301
## 7 280 PL~ 280 1.02e9 1.05e6 Manh~ 1 0 <NA> 017800
## 8 1 BATT~ 1 1.00e9 1.00e6 Manh~ 1 0 <NA> 000900
## 9 885 SE~ 885 1.01e9 1.04e6 Manh~ 1 0 <NA> 009000
## 10 55 BRO~ 55 1.00e9 1.00e6 Manh~ 1 0 <NA> 001300
## # ... with 33,074 more rows, and 28 more variables: city <chr>,
## # commboard <dbl>, council <dbl>, datasource <chr>, facdomain <chr>,
## # facgroup <chr>, facname <chr>, facsubgrp <chr>, factype <chr>,
## # latitude <dbl>, longitude <dbl>, nta <chr>, opabbrev <chr>, opname <chr>,
## # optype <chr>, overabbrev <chr>, overagency <chr>, overlevel <chr>,
## # policeprct <dbl>, proptype <chr>, schooldist <chr>, servarea <chr>,
## # streetname <chr>, uid <chr>, xcoord <dbl>, ycoord <dbl>, zipcode <chr>,
## # geometry <POINT [°]>
# Exploratory: tabulate the categorical columns (map() is from purrr)
# colnames(fac)[13:27] %>%
#   map(~ count(tibble(fac),
#               !!rlang::sym(.)))
# tibble(fac) %>%
#   count(facgroup, facsubgrp) %>% View()
```
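The lookup above assumes that exactly one file matches `Fac.*shp$`; if the pattern matched zero or several files, `st_read()` would receive an unusable path. Note also that the pattern is tested against the *relative path*: in the output above, "Fac" comes from the folder name `Facilities-Database`, so `list.files(pattern = ...)`, which only tests file names, would not find it. A minimal base-R sketch that keeps the relative-path matching but fails loudly otherwise (the helper name `find_one_file` is hypothetical):

```r
# Hypothetical helper: return the single file under `dir` whose path
# (relative to `dir`) matches `pattern`; stop if there are 0 or 2+ matches.
find_one_file <- function(dir, pattern) {
  rel  <- list.files(dir, recursive = TRUE)  # paths relative to `dir`
  hits <- rel[grepl(pattern, rel)]           # match on the relative path, as above
  if (length(hits) != 1) {
    stop(sprintf("expected 1 file matching '%s', found %d",
                 pattern, length(hits)))
  }
  file.path(sub("/+$", "", dir), hits)       # avoid a doubled slash
}
```

Usage would be `fac <- st_read(find_one_file(oddir, "Fac.*shp$"))`.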
The facility group and subgroup columns (`facgroup` and `facsubgrp`) look like the best fields for filtering down to the facilities we're interested in.
```r
# trim to selected columns
fac <- fac %>%
  select(address,
         matches("^fac|^op"))
# Kindergartens and day cares are left out.
plausible.to.keep <- c("CAMPS",
                       # "HEALTH CARE",  # mental health and chemical dependency?
                       "HUMAN SERVICES", # school, workforce dev
                       "LIBRARIES",
                       # "VOCATIONAL OR PROPRIETARY SCHOOLS",
                       # "PARKS",
                       "SCHOOLS",        # K-12? does it make sense to filter further?
                       "YOUTH SERVICES"
)
# grepl() matches substrings, so e.g. "SCHOOLS" matches any group name containing it
plausible.to.keep <- plausible.to.keep %>% paste(collapse = "|")
fac <- fac %>%
  filter(grepl(plausible.to.keep, facgroup))
# also drop some Human Services subgroups
fac <- fac %>%
  filter(!(facgroup %in% "HUMAN SERVICES" &
           grepl("IMMIGRANT|DISABILITIES|SENIOR|SOUP", facsubgrp)))
# sub-types are easy to get, e.g. facility types within the school groups
tibble(fac) %>%
  filter(grepl("SCHOOLS", facgroup)) %>%
  count(factype)
## # A tibble: 34 x 2
## factype n
## <chr> <int>
## 1 APPROVED PRIVATE SCHOOLS FOR SWD 9
## 2 CHARTER SCHOOL 381
## 3 ELEMENTARY SCHOOL - CHARTER 78
## 4 ELEMENTARY SCHOOL - NON-PUBLIC 678
## 5 ELEMENTARY SCHOOL - PUBLIC 637
## 6 ELEMENTARY SCHOOL - PUBLIC, SPECIAL EDUCATION 1
## 7 GED-ALTERNATIVE HIGH SCHOOL EQUIVALENCY PREP PROGRAMS 129
## 8 HIGH SCHOOL - CHARTER 26
## 9 HIGH SCHOOL - NON-PUBLIC 146
## 10 HIGH SCHOOL - PUBLIC 390
## # ... with 24 more rows
```
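Because `grepl()` matches substrings, the pattern built from `plausible.to.keep` keeps any group whose name merely contains a kept word. If a group such as "VOCATIONAL OR PROPRIETARY SCHOOLS" exists (it appears, commented out, in the vector above), it would still be kept via "SCHOOLS". A minimal base-R sketch of anchoring the alternation to the start of the string; the group names below are hypothetical toy values, not checked against the real `facgroup` column:

```r
# Hypothetical toy group names; the real facgroup values may differ.
groups <- c("SCHOOLS (K-12)", "VOCATIONAL OR PROPRIETARY SCHOOLS",
            "LIBRARIES", "HEALTH CARE", "YOUTH SERVICES")
keep <- c("CAMPS", "HUMAN SERVICES", "LIBRARIES", "SCHOOLS", "YOUTH SERVICES")

# Unanchored, as in the filter above: a kept word may appear anywhere
unanchored <- paste(keep, collapse = "|")
# Anchored: the group name must *start* with one of the kept names
anchored <- paste0("^(", paste(keep, collapse = "|"), ")")

groups[grepl(unanchored, groups)]
# -> "SCHOOLS (K-12)", "VOCATIONAL OR PROPRIETARY SCHOOLS", "LIBRARIES", "YOUTH SERVICES"
groups[grepl(anchored, groups)]
# -> "SCHOOLS (K-12)", "LIBRARIES", "YOUTH SERVICES"
```

If the exact group names are known, `filter(facgroup %in% keep)` avoids regular expressions altogether.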
The map staggers the points because some facilities are double-counted: for example, in Youth Services the same building can house both an after-school program and an SYEP (Summer Youth Employment Program) location. It would be possible to parse these out and handle them differently.
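One possible way to handle the double-counting (not what the map above does) is to flag rows that share a location and collapse them into one row per site. A minimal base-R sketch on hypothetical toy rows; grouping on exact `address` strings is an assumption, and inconsistent address formatting would call for matching on coordinates instead:

```r
# Hypothetical toy rows standing in for the filtered facilities table
fac_toy <- data.frame(
  address   = c("1 EXAMPLE ST", "1 EXAMPLE ST", "2 SAMPLE AVE", "3 TEST PL"),
  facgroup  = c("YOUTH SERVICES", "YOUTH SERVICES", "LIBRARIES", "SCHOOLS"),
  facsubgrp = c("SUMMER YOUTH EMPLOYMENT", "AFTER-SCHOOL PROGRAMS",
                "PUBLIC LIBRARIES", "PUBLIC K-12 SCHOOLS"),
  stringsAsFactors = FALSE
)

# Number of facility rows at each address (repeated on every row)
fac_toy$n_at_address <- ave(seq_len(nrow(fac_toy)), fac_toy$address, FUN = length)
shared <- fac_toy[fac_toy$n_at_address > 1, ]  # the double-counted sites

# One row per address, listing every subgroup found there
collapsed <- aggregate(facsubgrp ~ address, data = fac_toy,
                       FUN = function(s) paste(sort(unique(s)), collapse = "; "))
```

Each site would then be drawn once with all of its programs listed. On the sf object itself the geometry would also have to be carried along, for example with dplyr's `group_by()` and `summarise()`, which sf supports; I haven't tried this on the full dataset.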
Data downloaded from NYC Open Data: https://data.cityofnewyork.us/City-Government/Facilities-Database-Shapefile/2fpa-bnsx

Metadata: https://www1.nyc.gov/assets/planning/download/pdf/data-maps/open-data/colp_metadata.pdf
From the Open Data page for "Facilities Database - Shapefile": “The City Planning Facilities Database (FacDB) aggregates information about 35,000+ public and private facilities and program sites that are owned, operated, funded, licensed or certified by a City, State, or Federal agency in the City of New York. It captures facilities that generally help to shape quality of life in the city’s neighborhoods, including schools, day cares, parks, libraries, public safety services, youth programs, community centers, health clinics, workforce development programs, transitional housing, and solid waste and transportation infrastructure sites. To facilitate analysis and mapping, the data is available in comma-separated values (CSV) file format, ESRI Shapefile, and GeoJSON. The data is also complemented with a new interactive web map that enables users to easily filter the data for their needs. Users are strongly encouraged to read the database documentation, particularly with regard to analytical limitations. All previously released versions of this data are available at BYTES of the BIG APPLE - Archive.”