library(readr)
library(tidyverse)

Property Valuation and Assessment Data Expanded

Here is the expanded version of the DOF’s Property Valuation and Assessment Data set.

https://catalog.data.gov/dataset/property-valuation-and-assessment-data-tax-classes-1234

“Real Estate Assessment Property data. Data represent NYC properties assessments for purpose to calculate Property Tax, Grant eligible properties Exemptions and/or Abatements. Data collected and entered into the system by various City employee, like Property Assessors, Property Exemption specialists, ACRIS reporting, Department of Building reporting, etc..”

Note, You can go to the follow website to enter any borough block and lot number to see a map of it and the surrounding area. For example Block 1046 in Manhattan is between 55th and 56th Street and between 8th and 9th Avenue, and Lot 23 is a specific lot in that block. Our primary key, since we are only looking at data in the borough of Manhattan is the combination of the Block and Lot number.

https://propertyinformationportal.nyc.gov/




Load Data

Here we load the expanded data that we previously downloaded from the website above.

Of note there is data up to at least 2024 which may mean we ignore the unexpanded data entirely which only went up to 2018/19.

In this case BBLE was not present but there were seven columns with parsing errors due to the default column type not being expansive enough. They are NOAV VALREF ZIP_CODE YRBUILT_FLAG APPT_EASE PROTEST_1 PROTEST_OLD (columns 71 70 109 118 93 116 78). Reading the data took over 24 minutes and the number of parsing errors was 0.2% (16,523/8,151,370) so we’re letting it be with notation. This is confirmed with hindsight as only ZIP_CODE is included in our merged and subsetted data. We can continue with this load as we are looking at all of Manhattan now.

To save knitting time we tried saving our df as an rds file and read from there for speed but it didn’t work so we abandoned that but keep it in case we have to revisit this again.

df1 <- read_csv("~/Documents/D698/Property_Valuation_and_Assessment_Data_Tax_Classes_1_2_3_4.csv")
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
##   dat <- vroom(...)
##   problems(dat)
## Rows: 8151370 Columns: 139
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (31): PARID, EASEMENT, SUBIDENT-REUC, IDENT, SUBIDENT, PYTAXCLASS, TENT...
## dbl (102): BORO, BLOCK, LOT, RECTYPE, YEAR, ROLL_SECTION, SECVOL, PYMKTLAND,...
## lgl   (6): VALREF, YRBUILT_FLAG, APPT_EASE, CONDO_SFX3, PROTEST_2, ATTORNEY_...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#saveRDS(df1, file="df_cc2.rds")
#df1 <- readRDS("~/Documents/D698/df_cc2.rds")

Here we pull the columns with parsing issues.

We checked and none of the zip code parse issues were related to 10019, but now we’re talking all of Manhattan. We don’t need to rerun for zip code because we know we can get location data from coordinates in the PLUTO dataset.

# code to review parsing issues if any
problems <- problems(df1)

problems %>%
  distinct(col) %>%
  pull(col)
## [1]  71  70 109 118  93 116  78
names(df1)[c(71, 70, 109, 118, 93, 116, 78)]
## [1] "VALREF"       "NOAV"         "APPT_EASE"    "PROTEST_OLD"  "YRBUILT_FLAG"
## [6] "PROTEST_1"    "ZIP_CODE"
problems %>%
  filter(col==78)
## # A tibble: 122 × 5
##       row   col expected actual     file                                        
##     <int> <int> <chr>    <chr>      <chr>                                       
##  1 418097    78 a double 10465-1271 /Users/pkoflaherty/Documents/D698/Property_…
##  2 849989    78 a double 11368-4627 /Users/pkoflaherty/Documents/D698/Property_…
##  3 849991    78 a double 11368-4249 /Users/pkoflaherty/Documents/D698/Property_…
##  4 849992    78 a double 11368-4759 /Users/pkoflaherty/Documents/D698/Property_…
##  5 849994    78 a double 11368-4758 /Users/pkoflaherty/Documents/D698/Property_…
##  6 849996    78 a double 11368-4157 /Users/pkoflaherty/Documents/D698/Property_…
##  7 856318    78 a double 11375-2859 /Users/pkoflaherty/Documents/D698/Property_…
##  8 912776    78 a double 11356-1729 /Users/pkoflaherty/Documents/D698/Property_…
##  9 915897    78 a double 11354-1966 /Users/pkoflaherty/Documents/D698/Property_…
## 10 929833    78 a double 11354-4000 /Users/pkoflaherty/Documents/D698/Property_…
## # ℹ 112 more rows



Subset the data

Here we filter the data to only look at non-condo, Class 2 properties in Manhattan.

df2 <- df1 %>%
  filter(BORO == "1", CURTAXCLASS == "2", is.na(CONDO_Number))

We are also going to limit to the 2025 year since it is the last whole year with 26,761 records while 2026 only has 13,355 so far, but we’ll save that for a subseqent code chunk.

# Count the number of each YEAR
df2 %>%
  count(YEAR)
## # A tibble: 4 × 2
##    YEAR     n
##   <dbl> <int>
## 1  2023 26839
## 2  2024 26788
## 3  2025 26761
## 4  2026 13355



Write Data

Here we save the data for later

write.csv(df2, file="PVADex.csv")



Reference

Here are the fields for reference

t(head(df2, n=1))
##                        [,1]               
## PARID                  "1000110014"       
## BORO                   "1"                
## BLOCK                  "11"               
## LOT                    "14"               
## EASEMENT               NA                 
## SUBIDENT-REUC          NA                 
## RECTYPE                "1"                
## YEAR                   "2023"             
## IDENT                  NA                 
## SUBIDENT               NA                 
## ROLL_SECTION           NA                 
## SECVOL                 "101"              
## PYMKTLAND              "1690000"          
## PYMKTTOT               "10440000"         
## PYACTLAND              "760500"           
## PYACTTOT               "4698000"          
## PYACTEXTOT             "14760"            
## PYTRNLAND              "760500"           
## PYTRNTOT               "4923750"          
## PYTRNEXTOT             "14760"            
## PYTXBTOT               "4698000"          
## PYTXBEXTOT             "14760"            
## PYTAXCLASS             "2"                
## TENMKTLAND             "1690000"          
## TENMKTTOT              "12121000"         
## TENACTLAND             "760500"           
## TENACTTOT              "5454450"          
## TENACTEXTOT            "12300"            
## TENTRNLAND             "760500"           
## TENTRNTOT              "5022210"          
## TENTRNEXTOT            "12300"            
## TENTXBTOT              "5022210"          
## TENTXBEXTOT            "12300"            
## TENTAXCLASS            "2"                
## CBNMKTLAND             "1690000"          
## CBNMKTTOT              "12121000"         
## CBNACTLAND             "760500"           
## CBNACTTOT              "5454450"          
## CBNACTEXTOT            "11950"            
## CBNTRNLAND             "760500"           
## CBNTRNTOT              "5022210"          
## CBNTRNEXTOT            "11950"            
## CBNTXBTOT              "5022210"          
## CBNTXBEXTOT            "11950"            
## CBNTAXCLASS            "2"                
## FINMKTLAND             "1690000"          
## FINMKTTOT              "12121000"         
## FINACTLAND             "760500"           
## FINACTTOT              "5454450"          
## FINACTEXTOT            "11950"            
## FINTRNLAND             "760500"           
## FINTRNTOT              "5022210"          
## FINTRNEXTOT            "11950"            
## FINTXBTOT              "5022210"          
## FINTXBEXTOT            "11950"            
## FINTAXCLASS            "2"                
## CURMKTLAND             "1690000"          
## CURMKTTOT              "12121000"         
## CURACTLAND             "760500"           
## CURACTTOT              "5454450"          
## CURACTEXTOT            "11950"            
## CURTRNLAND             "760500"           
## CURTRNTOT              "5022210"          
## CURTRNEXTOT            "11950"            
## CURTXBTOT              "5022210"          
## CURTXBEXTOT            "11950"            
## CURTAXCLASS            "2"                
## PERIOD                 "3"                
## NEWDROP                "0"                
## NOAV                   "0"                
## VALREF                 NA                 
## BLDG_CLASS             "D4"               
## OWNER                  "BEAVER TOWERS INC"
## ZONING                 "C5-5"             
## HOUSENUM_LO            "26"               
## HOUSENUM_HI            "28"               
## STREET_NAME            "BEAVER STREET"    
## ZIP_CODE               "10004"            
## GEPSUPPORT_RC          "00"               
## STCODE                 "1.1295e+10"       
## LOT_FRT                "53.46"            
## LOT_DEP                "56.33"            
## LOT_IRREG              "I"                
## BLD_FRT                "53"               
## BLD_DEP                "56"               
## BLD_EXT                "N"                
## BLD_STORY              "19"               
## CORNER                 NA                 
## LAND_AREA              "2907"             
## NUM_BLDGS              "1"                
## YRBUILT                "1909"             
## YRBUILT_RANGE          "0"                
## YRBUILT_FLAG           NA                 
## YRALT1                 "1989"             
## YRALT1_RANGE           "0"                
## YRALT2                 "0"                
## YRALT2_RANGE           "0"                
## COOP_APTS              "19"               
## UNITS                  "20"               
## REUC_REF               NA                 
## APTNO                  NA                 
## COOP_NUM               "100001"           
## CPB_BORO               "1"                
## CPB_DIST               "1"                
## APPT_DATE              NA                 
## APPT_BORO              NA                 
## APPT_BLOCK             NA                 
## APPT_LOT               NA                 
## APPT_EASE              NA                 
## CONDO_Number           NA                 
## CONDO_SFX1             NA                 
## CONDO_SFX2             NA                 
## CONDO_SFX3             NA                 
## UAF_LAND               "0"                
## UAF_BLDG               "0"                
## PROTEST_1              "1"                
## PROTEST_2              NA                 
## PROTEST_OLD            "1"                
## ATTORNEY_GROUP1        "14"               
## ATTORNEY_GROUP2        NA                 
## ATTORNEY_GROUP_OLD     "14"               
## GROSS_SQFT             "51387"            
## HOTEL_AREA_GROSS       "0"                
## OFFICE_AREA_GROSS      "0"                
## RESIDENTIAL_AREA_GROSS "49087"            
## RETAIL_AREA_GROSS      "2300"             
## LOFT_AREA_GROSS        "0"                
## FACTORY_AREA_GROSS     "0"                
## WAREHOUSE_AREA_GROSS   "0"                
## STORAGE_AREA_GROSS     "0"                
## GARAGE_AREA            "0"                
## OTHER_AREA_GROSS       "0"                
## REUC_DESCRIPTION       NA                 
## EXTRACRDT              "05/17/2022"       
## PYTAXFLAG              "A"                
## TENTAXFLAG             "T"                
## CBNTAXFLAG             "T"                
## FINTAXFLAG             "T"                
## CURTAXFLAG             "T"