library(readr)
library(tidyverse)

Property Valuation and Assessment Data Expanded

Here is the expanded version of the DOF’s Property Valuation and Assessment Data set.

https://catalog.data.gov/dataset/property-valuation-and-assessment-data-tax-classes-1234

“Real Estate Assessment Property data. Data represent NYC properties assessments for purpose to calculate Property Tax, Grant eligible properties Exemptions and/or Abatements. Data collected and entered into the system by various City employee, like Property Assessors, Property Exemption specialists, ACRIS reporting, Department of Building reporting, etc..”

Note, You can go to the follow website to enter any borough block and lot number to see a map of it and the surrounding area. For example Block 1046 in Manhattan is between 55th and 56th Street and between 8th and 9th Avenue, and Lot 23 is a specific lot in that block. Our primary key, since we are only looking at data in the borough of Manhattan is the combination of the Block and Lot number.

https://propertyinformationportal.nyc.gov/




Load Data

Here we load the expanded data that we previously downloaded from the website above.

Of note there is data up to at least 2024 which may mean we ignore the unexpanded data entirely which only went up to 2018/19.

In this case BBLE was not present but there were seven columns with parsing errors due to the default column type not being expansive enough. They are NOAV VALREF ZIP_CODE YRBUILT_FLAG APPT_EASE PROTEST_1 PROTEST_OLD. Reading the data took over 24 minutes and the number of parsing errors was 0.2% (16,523/8,151,370) so we’re letting it be with notation.

To save knitting time we tried saving our df as an rds file and read from there for speed but the knit failed so we undid it.

df <- read_csv("~/Documents/D698/Property_Valuation_and_Assessment_Data_Tax_Classes_1_2_3_4.csv")
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
##   dat <- vroom(...)
##   problems(dat)
## Rows: 8151370 Columns: 139
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (31): PARID, EASEMENT, SUBIDENT-REUC, IDENT, SUBIDENT, PYTAXCLASS, TENT...
## dbl (102): BORO, BLOCK, LOT, RECTYPE, YEAR, ROLL_SECTION, SECVOL, PYMKTLAND,...
## lgl   (6): VALREF, YRBUILT_FLAG, APPT_EASE, CONDO_SFX3, PROTEST_2, ATTORNEY_...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#saveRDS(df, file="df_cc2.rds")

#df <- readRDS("df_cc2.rds")

Here we pull the columns with parsing issues.

# code to review parsing issues if any
problems <- problems(df)

problems %>%
  distinct(col) %>%
  pull(col)
## [1]  71  70 109 118  93 116  78



Subset the data

Here we filter the data to only look at Class 2 properties in the 10019 zip code.

df2 <- df %>%
  filter(ZIP_CODE == "10019", PYTAXCLASS == "2")

We were going to limit to the latest year, however it looks like 2026 is incomplete but 2023-2025 are whole. We could keep just 2025 but it may be that 2023 and 2024 give us more data to train.

# Count the number of each YEAR
df2 %>%
  count(YEAR)
## # A tibble: 4 × 2
##    YEAR     n
##   <dbl> <int>
## 1  2023 17786
## 2  2024 17870
## 3  2025 17742
## 4  2026  8969



Write Data

Here we save the data for later

write.csv(df2, file="PVADex.csv")



Reference

Here are the fields for reference

t(head(df2, n=1))
##                        [,1]                     
## PARID                  "1006380019"             
## BORO                   "1"                      
## BLOCK                  "638"                    
## LOT                    "19"                     
## EASEMENT               NA                       
## SUBIDENT-REUC          NA                       
## RECTYPE                "1"                      
## YEAR                   "2023"                   
## IDENT                  NA                       
## SUBIDENT               NA                       
## ROLL_SECTION           NA                       
## SECVOL                 "210"                    
## PYMKTLAND              "4e+06"                  
## PYMKTTOT               "10058000"               
## PYACTLAND              "1800000"                
## PYACTTOT               "4526100"                
## PYACTEXTOT             "130823"                 
## PYTRNLAND              "1800000"                
## PYTRNTOT               "4655880"                
## PYTRNEXTOT             "134505"                 
## PYTXBTOT               "4526100"                
## PYTXBEXTOT             "130823"                 
## PYTAXCLASS             "2"                      
## TENMKTLAND             "4e+06"                  
## TENMKTTOT              "10740000"               
## TENACTLAND             "1800000"                
## TENACTTOT              "4833000"                
## TENACTEXTOT            "58116"                  
## TENTRNLAND             "1800000"                
## TENTRNTOT              "4770900"                
## TENTRNEXTOT            "57401"                  
## TENTXBTOT              "4770900"                
## TENTXBEXTOT            "57401"                  
## TENTAXCLASS            "2"                      
## CBNMKTLAND             "4e+06"                  
## CBNMKTTOT              "10740000"               
## CBNACTLAND             "1800000"                
## CBNACTTOT              "4833000"                
## CBNACTEXTOT            "195745"                 
## CBNTRNLAND             "1800000"                
## CBNTRNTOT              "4770900"                
## CBNTRNEXTOT            "190309"                 
## CBNTXBTOT              "4770900"                
## CBNTXBEXTOT            "190309"                 
## CBNTAXCLASS            "2"                      
## FINMKTLAND             "4e+06"                  
## FINMKTTOT              "10740000"               
## FINACTLAND             "1800000"                
## FINACTTOT              "4833000"                
## FINACTEXTOT            "195745"                 
## FINTRNLAND             "1800000"                
## FINTRNTOT              "4770900"                
## FINTRNEXTOT            "190309"                 
## FINTXBTOT              "4770900"                
## FINTXBEXTOT            "190309"                 
## FINTAXCLASS            "2"                      
## CURMKTLAND             "4e+06"                  
## CURMKTTOT              "10740000"               
## CURACTLAND             "1800000"                
## CURACTTOT              "4833000"                
## CURACTEXTOT            "195745"                 
## CURTRNLAND             "1800000"                
## CURTRNTOT              "4770900"                
## CURTRNEXTOT            "190309"                 
## CURTXBTOT              "4770900"                
## CURTXBEXTOT            "190309"                 
## CURTAXCLASS            "2"                      
## PERIOD                 "3"                      
## NEWDROP                "0"                      
## NOAV                   "0"                      
## VALREF                 NA                       
## BLDG_CLASS             "C6"                     
## OWNER                  "WVH HOUSING CORPORATION"
## ZONING                 "C1-6A"                  
## HOUSENUM_LO            "720"                    
## HOUSENUM_HI            "732"                    
## STREET_NAME            "WEST 11 STREET"         
## ZIP_CODE               "10019"                  
## GEPSUPPORT_RC          "42"                     
## STCODE                 "1.3399e+10"             
## LOT_FRT                "196.58"                 
## LOT_DEP                "258.5"                  
## LOT_IRREG              "I"                      
## BLD_FRT                "44"                     
## BLD_DEP                "46"                     
## BLD_EXT                "N"                      
## BLD_STORY              "5"                      
## CORNER                 "NW"                     
## LAND_AREA              "39334"                  
## NUM_BLDGS              "10"                     
## YRBUILT                "1939"                   
## YRBUILT_RANGE          "0"                      
## YRBUILT_FLAG           NA                       
## YRALT1                 "0"                      
## YRALT1_RANGE           "0"                      
## YRALT2                 "0"                      
## YRALT2_RANGE           "0"                      
## COOP_APTS              "100"                    
## UNITS                  "100"                    
## REUC_REF               NA                       
## APTNO                  NA                       
## COOP_NUM               "103987"                 
## CPB_BORO               "1"                      
## CPB_DIST               "2"                      
## APPT_DATE              NA                       
## APPT_BORO              NA                       
## APPT_BLOCK             NA                       
## APPT_LOT               NA                       
## APPT_EASE              NA                       
## CONDO_Number           NA                       
## CONDO_SFX1             NA                       
## CONDO_SFX2             NA                       
## CONDO_SFX3             NA                       
## UAF_LAND               "0"                      
## UAF_BLDG               "0"                      
## PROTEST_1              "1"                      
## PROTEST_2              NA                       
## PROTEST_OLD            "1"                      
## ATTORNEY_GROUP1        "21"                     
## ATTORNEY_GROUP2        NA                       
## ATTORNEY_GROUP_OLD     "21"                     
## GROSS_SQFT             "118350"                 
## HOTEL_AREA_GROSS       "0"                      
## OFFICE_AREA_GROSS      "0"                      
## RESIDENTIAL_AREA_GROSS "118350"                 
## RETAIL_AREA_GROSS      "0"                      
## LOFT_AREA_GROSS        "0"                      
## FACTORY_AREA_GROSS     "0"                      
## WAREHOUSE_AREA_GROSS   "0"                      
## STORAGE_AREA_GROSS     "0"                      
## GARAGE_AREA            "0"                      
## OTHER_AREA_GROSS       "0"                      
## REUC_DESCRIPTION       NA                       
## EXTRACRDT              "05/17/2022"             
## PYTAXFLAG              "A"                      
## TENTAXFLAG             "T"                      
## CBNTAXFLAG             "T"                      
## FINTAXFLAG             "T"                      
## CURTAXFLAG             "T"