library(readr)
library(tidyverse)
The PLUTO data we’ll use to layer the Property Valuation and Assessment Data. Note it stands for “Primary Land Use Tax Lot Output”.
https://www.nyc.gov/content/planning/pages/resources/datasets/mappluto-pluto-change#overview
“Extensive land use and geographic data at the tax lot level in comma–separated values (csv) file format. The PLUTO files contain more than seventy fields derived from data maintained by city agencies.”
Here we load the PLUTO data that we previously downloaded from the website above.
We identified three columns with parsing issues so are pulling them in as characters.
# Original read of the data
#df <- read_csv("~/Documents/D698/pluto_25v1_1.csv",
# col_types = cols(zonedist3 = col_character(), zonedist4 =
# col_character(), spdist2 = col_character(),
# .default = col_guess()))
# Using RDS files to make loading in subsequent chunks faster
#saveRDS(df, file="df_cc3.rds")
df <- readRDS("~/Documents/D698/df_cc3.rds")
#No more columns with parsing issues
# code to review parsing issues if any
#problems <- problems(df)
#problems %>%
# distinct(col) %>%
# pull(col)
Here we filter the data to only look at properties in the 10019 zip code.
We can use Block-Lot-Exemption to merge this PLUTO data with the DOF’s Property Valuation and Assessment Data.
There is one PLUTO record per Block-Lot so no need to subset on year.
We could try to use zonedist1
to select for just the
Class 2 properties however any non Class 2 properties will drop off when
we merge the two databases.
We initially subset the dataset to focus on zipcode 10019 but there were only 1,049 records and upon inspection, 25% of zip codes are missing so we opted to filter on Manhattan instead, which has zero missing.
df2 <- df %>%
filter(borough == "MN")
df %>%
summarise(
total_rows = n(),
missing_count = sum(is.na(zipcode)),
missing_pct = mean(is.na(zipcode)) * 100
)
## # A tibble: 1 × 3
## total_rows missing_count missing_pct
## <int> <int> <dbl>
## 1 858629 2132 0.248
df %>%
summarise(
total_rows = n(),
missing_count = sum(is.na(borough)),
missing_pct = mean(is.na(borough)) * 100
)
## # A tibble: 1 × 3
## total_rows missing_count missing_pct
## <int> <int> <dbl>
## 1 858629 0 0
Here we save the data for later
#No more writing to csv
#write.csv(df2, file="PVADex.csv")
saveRDS(df2, file="PLUTO.rds")
Here are the fields for reference
t(head(df2, n=1))
## [,1]
## borough "MN"
## block "1371"
## lot "30"
## cd "106"
## bct2020 "1010601"
## bctcb2020 "10106010001"
## ct2010 "106.01"
## cb2010 "0001"
## schooldist NA
## council "5"
## zipcode NA
## firecomp NA
## policeprct NA
## healthcenterdistrict NA
## healtharea NA
## sanitboro NA
## sanitdistrict NA
## sanitsub NA
## address "EAST 55 STREET"
## zonedist1 "R10"
## zonedist2 NA
## zonedist3 NA
## zonedist4 NA
## overlay1 NA
## overlay2 NA
## spdist1 NA
## spdist2 NA
## spdist3 NA
## ltdheight NA
## splitzone "N"
## bldgclass "T2"
## landuse "07"
## easements "0"
## ownertype "C"
## ownername "NYC DEPARTMENT OF SMALL BUSINESS SERVICES"
## lotarea "13950"
## bldgarea "0"
## comarea NA
## resarea NA
## officearea NA
## retailarea NA
## garagearea NA
## strgearea NA
## factryarea NA
## otherarea NA
## areasource "7"
## numbldgs "0"
## numfloors NA
## unitsres "0"
## unitstotal "0"
## lotfront "61"
## lotdepth "45"
## bldgfront "0"
## bldgdepth "0"
## ext "N"
## proxcode "0"
## irrlotcode "Y"
## lottype "9"
## bsmtcode "0"
## assessland "33840"
## assesstot "33840"
## exempttot "33840"
## yearbuilt "0"
## yearalter1 "0"
## yearalter2 "0"
## histdist NA
## landmark NA
## builtfar "0"
## residfar "10"
## commfar "0"
## facilfar "10"
## borocode "1"
## bbl "1013710030"
## condono NA
## tract2010 "010601"
## xcoord "995125"
## ycoord "214596"
## zonemap "8d"
## zmcode NA
## sanborn "106E069"
## taxmap "10507"
## edesignum NA
## appbbl NA
## appdate NA
## plutomapid "4"
## firm07_flag "1"
## pfirm15_flag "1"
## version "25v1.1"
## dcpedited "TRUE"
## latitude "40.75569"
## longitude "-73.96075"
## notes NA