CUNY MSDS Capstone Project

library(readr)
library(tidyverse)

PLUTO

The PLUTO data we’ll use to layer the Property Valuation and Assessment Data. Note it stands for “Primary Land Use Tax Lot Output”.

https://www.nyc.gov/content/planning/pages/resources/datasets/mappluto-pluto-change#overview

“Extensive land use and geographic data at the tax lot level in comma–separated values (csv) file format. The PLUTO files contain more than seventy fields derived from data maintained by city agencies.”

Load Data

Here we load the PLUTO data that we previously downloaded from the website above.

We identified three columns with parsing issues so are pulling them in as characters.

# Original read of the data
#df <- read_csv("~/Documents/D698/pluto_25v1_1.csv",
#               col_types = cols(zonedist3 = col_character(), zonedist4 =
#                           col_character(), spdist2 = col_character(),
#                           .default = col_guess()))

# Using RDS files to make loading in subsequent chunks faster
#saveRDS(df, file="df_cc3.rds")
df <- readRDS("~/Documents/D698/df_cc3.rds")

#No more columns with parsing issues
# code to review parsing issues if any
#problems <- problems(df)

#problems %>%
#  distinct(col) %>%
#  pull(col)

Subset the data

Here we filter the data to only look at properties in the 10019 zip code.

We can use Block-Lot-Exemption to merge this PLUTO data with the DOF’s Property Valuation and Assessment Data.

There is one PLUTO record per Block-Lot so no need to subset on year.

We could try to use zonedist1 to select for just the Class 2 properties however any non Class 2 properties will drop off when we merge the two databases.

We initially subset the dataset to focus on zipcode 10019 but there were only 1,049 records and upon inspection, 25% of zip codes are missing so we opted to filter on Manhattan instead, which has zero missing.

df2 <- df %>%
  filter(borough == "MN")

df %>%
  summarise(
    total_rows = n(),
    missing_count = sum(is.na(zipcode)),
    missing_pct = mean(is.na(zipcode)) * 100
  )

## # A tibble: 1 × 3
##   total_rows missing_count missing_pct
##        <int>         <int>       <dbl>
## 1     858629          2132       0.248

df %>%
  summarise(
    total_rows = n(),
    missing_count = sum(is.na(borough)),
    missing_pct = mean(is.na(borough)) * 100
  )

## # A tibble: 1 × 3
##   total_rows missing_count missing_pct
##        <int>         <int>       <dbl>
## 1     858629             0           0

Write Data

Here we save the data for later

#No more writing to csv
#write.csv(df2, file="PVADex.csv")

saveRDS(df2, file="PLUTO.rds")

Reference

Here are the fields for reference

t(head(df2, n=1))

##                      [,1]                                       
## borough              "MN"                                       
## block                "1371"                                     
## lot                  "30"                                       
## cd                   "106"                                      
## bct2020              "1010601"                                  
## bctcb2020            "10106010001"                              
## ct2010               "106.01"                                   
## cb2010               "0001"                                     
## schooldist           NA                                         
## council              "5"                                        
## zipcode              NA                                         
## firecomp             NA                                         
## policeprct           NA                                         
## healthcenterdistrict NA                                         
## healtharea           NA                                         
## sanitboro            NA                                         
## sanitdistrict        NA                                         
## sanitsub             NA                                         
## address              "EAST 55 STREET"                           
## zonedist1            "R10"                                      
## zonedist2            NA                                         
## zonedist3            NA                                         
## zonedist4            NA                                         
## overlay1             NA                                         
## overlay2             NA                                         
## spdist1              NA                                         
## spdist2              NA                                         
## spdist3              NA                                         
## ltdheight            NA                                         
## splitzone            "N"                                        
## bldgclass            "T2"                                       
## landuse              "07"                                       
## easements            "0"                                        
## ownertype            "C"                                        
## ownername            "NYC DEPARTMENT OF SMALL BUSINESS SERVICES"
## lotarea              "13950"                                    
## bldgarea             "0"                                        
## comarea              NA                                         
## resarea              NA                                         
## officearea           NA                                         
## retailarea           NA                                         
## garagearea           NA                                         
## strgearea            NA                                         
## factryarea           NA                                         
## otherarea            NA                                         
## areasource           "7"                                        
## numbldgs             "0"                                        
## numfloors            NA                                         
## unitsres             "0"                                        
## unitstotal           "0"                                        
## lotfront             "61"                                       
## lotdepth             "45"                                       
## bldgfront            "0"                                        
## bldgdepth            "0"                                        
## ext                  "N"                                        
## proxcode             "0"                                        
## irrlotcode           "Y"                                        
## lottype              "9"                                        
## bsmtcode             "0"                                        
## assessland           "33840"                                    
## assesstot            "33840"                                    
## exempttot            "33840"                                    
## yearbuilt            "0"                                        
## yearalter1           "0"                                        
## yearalter2           "0"                                        
## histdist             NA                                         
## landmark             NA                                         
## builtfar             "0"                                        
## residfar             "10"                                       
## commfar              "0"                                        
## facilfar             "10"                                       
## borocode             "1"                                        
## bbl                  "1013710030"                               
## condono              NA                                         
## tract2010            "010601"                                   
## xcoord               "995125"                                   
## ycoord               "214596"                                   
## zonemap              "8d"                                       
## zmcode               NA                                         
## sanborn              "106E069"                                  
## taxmap               "10507"                                    
## edesignum            NA                                         
## appbbl               NA                                         
## appdate              NA                                         
## plutomapid           "4"                                        
## firm07_flag          "1"                                        
## pfirm15_flag         "1"                                        
## version              "25v1.1"                                   
## dcpedited            "TRUE"                                     
## latitude             "40.75569"                                 
## longitude            "-73.96075"                                
## notes                NA

CUNY MSDS Capstone Project

Taxation Equity for NYC Co-ops

PK O’Flaherty

Coding Chunk 3

2025-05-23

PLUTO

Load Data

Subset the data

Write Data

Reference