library(readr)
library(tidyverse)
Here is the expanded version of the DOF’s Property Valuation and Assessment Data set.
https://catalog.data.gov/dataset/property-valuation-and-assessment-data-tax-classes-1234
“Real Estate Assessment Property data. Data represent NYC properties assessments for purpose to calculate Property Tax, Grant eligible properties Exemptions and/or Abatements. Data collected and entered into the system by various City employee, like Property Assessors, Property Exemption specialists, ACRIS reporting, Department of Building reporting, etc..”
Note, You can go to the follow website to enter any borough block and lot number to see a map of it and the surrounding area. For example Block 1046 in Manhattan is between 55th and 56th Street and between 8th and 9th Avenue, and Lot 23 is a specific lot in that block. Our primary key, since we are only looking at data in the borough of Manhattan is the combination of the Block and Lot number.
https://propertyinformationportal.nyc.gov/
Here we load the expanded data that we previously downloaded from the website above.
Of note there is data up to at least 2024 which may mean we ignore the unexpanded data entirely which only went up to 2018/19.
In this case BBLE
was not present but there were seven
columns with parsing errors due to the default column type not being
expansive enough. They are NOAV
VALREF
ZIP_CODE
YRBUILT_FLAG
APPT_EASE
PROTEST_1
PROTEST_OLD
. Reading the data took
over 24 minutes and the number of parsing errors was 0.2%
(16,523/8,151,370) so we’re letting it be with notation.
To save knitting time we tried saving our df
as an rds
file and read from there for speed but the knit failed so we undid
it.
df <- read_csv("~/Documents/D698/Property_Valuation_and_Assessment_Data_Tax_Classes_1_2_3_4.csv")
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
## dat <- vroom(...)
## problems(dat)
## Rows: 8151370 Columns: 139
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (31): PARID, EASEMENT, SUBIDENT-REUC, IDENT, SUBIDENT, PYTAXCLASS, TENT...
## dbl (102): BORO, BLOCK, LOT, RECTYPE, YEAR, ROLL_SECTION, SECVOL, PYMKTLAND,...
## lgl (6): VALREF, YRBUILT_FLAG, APPT_EASE, CONDO_SFX3, PROTEST_2, ATTORNEY_...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#saveRDS(df, file="df_cc2.rds")
#df <- readRDS("df_cc2.rds")
Here we pull the columns with parsing issues.
# code to review parsing issues if any
problems <- problems(df)
problems %>%
distinct(col) %>%
pull(col)
## [1] 71 70 109 118 93 116 78
Here we filter the data to only look at Class 2 properties in the 10019 zip code.
df2 <- df %>%
filter(ZIP_CODE == "10019", PYTAXCLASS == "2")
We were going to limit to the latest year, however it looks like 2026 is incomplete but 2023-2025 are whole. We could keep just 2025 but it may be that 2023 and 2024 give us more data to train.
# Count the number of each YEAR
df2 %>%
count(YEAR)
## # A tibble: 4 × 2
## YEAR n
## <dbl> <int>
## 1 2023 17786
## 2 2024 17870
## 3 2025 17742
## 4 2026 8969
Here we save the data for later
write.csv(df2, file="PVADex.csv")
Here are the fields for reference
t(head(df2, n=1))
## [,1]
## PARID "1006380019"
## BORO "1"
## BLOCK "638"
## LOT "19"
## EASEMENT NA
## SUBIDENT-REUC NA
## RECTYPE "1"
## YEAR "2023"
## IDENT NA
## SUBIDENT NA
## ROLL_SECTION NA
## SECVOL "210"
## PYMKTLAND "4e+06"
## PYMKTTOT "10058000"
## PYACTLAND "1800000"
## PYACTTOT "4526100"
## PYACTEXTOT "130823"
## PYTRNLAND "1800000"
## PYTRNTOT "4655880"
## PYTRNEXTOT "134505"
## PYTXBTOT "4526100"
## PYTXBEXTOT "130823"
## PYTAXCLASS "2"
## TENMKTLAND "4e+06"
## TENMKTTOT "10740000"
## TENACTLAND "1800000"
## TENACTTOT "4833000"
## TENACTEXTOT "58116"
## TENTRNLAND "1800000"
## TENTRNTOT "4770900"
## TENTRNEXTOT "57401"
## TENTXBTOT "4770900"
## TENTXBEXTOT "57401"
## TENTAXCLASS "2"
## CBNMKTLAND "4e+06"
## CBNMKTTOT "10740000"
## CBNACTLAND "1800000"
## CBNACTTOT "4833000"
## CBNACTEXTOT "195745"
## CBNTRNLAND "1800000"
## CBNTRNTOT "4770900"
## CBNTRNEXTOT "190309"
## CBNTXBTOT "4770900"
## CBNTXBEXTOT "190309"
## CBNTAXCLASS "2"
## FINMKTLAND "4e+06"
## FINMKTTOT "10740000"
## FINACTLAND "1800000"
## FINACTTOT "4833000"
## FINACTEXTOT "195745"
## FINTRNLAND "1800000"
## FINTRNTOT "4770900"
## FINTRNEXTOT "190309"
## FINTXBTOT "4770900"
## FINTXBEXTOT "190309"
## FINTAXCLASS "2"
## CURMKTLAND "4e+06"
## CURMKTTOT "10740000"
## CURACTLAND "1800000"
## CURACTTOT "4833000"
## CURACTEXTOT "195745"
## CURTRNLAND "1800000"
## CURTRNTOT "4770900"
## CURTRNEXTOT "190309"
## CURTXBTOT "4770900"
## CURTXBEXTOT "190309"
## CURTAXCLASS "2"
## PERIOD "3"
## NEWDROP "0"
## NOAV "0"
## VALREF NA
## BLDG_CLASS "C6"
## OWNER "WVH HOUSING CORPORATION"
## ZONING "C1-6A"
## HOUSENUM_LO "720"
## HOUSENUM_HI "732"
## STREET_NAME "WEST 11 STREET"
## ZIP_CODE "10019"
## GEPSUPPORT_RC "42"
## STCODE "1.3399e+10"
## LOT_FRT "196.58"
## LOT_DEP "258.5"
## LOT_IRREG "I"
## BLD_FRT "44"
## BLD_DEP "46"
## BLD_EXT "N"
## BLD_STORY "5"
## CORNER "NW"
## LAND_AREA "39334"
## NUM_BLDGS "10"
## YRBUILT "1939"
## YRBUILT_RANGE "0"
## YRBUILT_FLAG NA
## YRALT1 "0"
## YRALT1_RANGE "0"
## YRALT2 "0"
## YRALT2_RANGE "0"
## COOP_APTS "100"
## UNITS "100"
## REUC_REF NA
## APTNO NA
## COOP_NUM "103987"
## CPB_BORO "1"
## CPB_DIST "2"
## APPT_DATE NA
## APPT_BORO NA
## APPT_BLOCK NA
## APPT_LOT NA
## APPT_EASE NA
## CONDO_Number NA
## CONDO_SFX1 NA
## CONDO_SFX2 NA
## CONDO_SFX3 NA
## UAF_LAND "0"
## UAF_BLDG "0"
## PROTEST_1 "1"
## PROTEST_2 NA
## PROTEST_OLD "1"
## ATTORNEY_GROUP1 "21"
## ATTORNEY_GROUP2 NA
## ATTORNEY_GROUP_OLD "21"
## GROSS_SQFT "118350"
## HOTEL_AREA_GROSS "0"
## OFFICE_AREA_GROSS "0"
## RESIDENTIAL_AREA_GROSS "118350"
## RETAIL_AREA_GROSS "0"
## LOFT_AREA_GROSS "0"
## FACTORY_AREA_GROSS "0"
## WAREHOUSE_AREA_GROSS "0"
## STORAGE_AREA_GROSS "0"
## GARAGE_AREA "0"
## OTHER_AREA_GROSS "0"
## REUC_DESCRIPTION NA
## EXTRACRDT "05/17/2022"
## PYTAXFLAG "A"
## TENTAXFLAG "T"
## CBNTAXFLAG "T"
## FINTAXFLAG "T"
## CURTAXFLAG "T"