library(readr)
library(tidyverse)
Here is the expanded version of the DOF’s Property Valuation and Assessment Data set.
https://catalog.data.gov/dataset/property-valuation-and-assessment-data-tax-classes-1234
“Real Estate Assessment Property data. Data represent NYC properties assessments for purpose to calculate Property Tax, Grant eligible properties Exemptions and/or Abatements. Data collected and entered into the system by various City employee, like Property Assessors, Property Exemption specialists, ACRIS reporting, Department of Building reporting, etc..”
Note, You can go to the follow website to enter any borough block and lot number to see a map of it and the surrounding area. For example Block 1046 in Manhattan is between 55th and 56th Street and between 8th and 9th Avenue, and Lot 23 is a specific lot in that block. Our primary key, since we are only looking at data in the borough of Manhattan is the combination of the Block and Lot number.
https://propertyinformationportal.nyc.gov/
Here we load the expanded data that we previously downloaded from the website above.
Of note there is data up to at least 2024 which may mean we ignore the unexpanded data entirely which only went up to 2018/19.
In this case BBLE
was not present but there were seven
columns with parsing errors due to the default column type not being
expansive enough. They are NOAV
VALREF
ZIP_CODE
YRBUILT_FLAG
APPT_EASE
PROTEST_1
PROTEST_OLD
(columns 71 70 109 118
93 116 78). Reading the data took over 24 minutes and the number of
parsing errors was 0.2% (16,523/8,151,370) so we’re letting it be with
notation. This is confirmed with hindsight as only ZIP_CODE
is included in our merged and subsetted data. We can continue with this
load as we are looking at all of Manhattan now.
To save knitting time we tried saving our df
as an rds
file and read from there for speed but it didn’t work so we abandoned
that but keep it in case we have to revisit this again.
df1 <- read_csv("~/Documents/D698/Property_Valuation_and_Assessment_Data_Tax_Classes_1_2_3_4.csv")
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
## dat <- vroom(...)
## problems(dat)
## Rows: 8151370 Columns: 139
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (31): PARID, EASEMENT, SUBIDENT-REUC, IDENT, SUBIDENT, PYTAXCLASS, TENT...
## dbl (102): BORO, BLOCK, LOT, RECTYPE, YEAR, ROLL_SECTION, SECVOL, PYMKTLAND,...
## lgl (6): VALREF, YRBUILT_FLAG, APPT_EASE, CONDO_SFX3, PROTEST_2, ATTORNEY_...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#saveRDS(df1, file="df_cc2.rds")
#df1 <- readRDS("~/Documents/D698/df_cc2.rds")
Here we pull the columns with parsing issues.
We checked and none of the zip code parse issues were related to 10019, but now we’re talking all of Manhattan. We don’t need to rerun for zip code because we know we can get location data from coordinates in the PLUTO dataset.
# code to review parsing issues if any
problems <- problems(df1)
problems %>%
distinct(col) %>%
pull(col)
## [1] 71 70 109 118 93 116 78
names(df1)[c(71, 70, 109, 118, 93, 116, 78)]
## [1] "VALREF" "NOAV" "APPT_EASE" "PROTEST_OLD" "YRBUILT_FLAG"
## [6] "PROTEST_1" "ZIP_CODE"
problems %>%
filter(col==78)
## # A tibble: 122 × 5
## row col expected actual file
## <int> <int> <chr> <chr> <chr>
## 1 418097 78 a double 10465-1271 /Users/pkoflaherty/Documents/D698/Property_…
## 2 849989 78 a double 11368-4627 /Users/pkoflaherty/Documents/D698/Property_…
## 3 849991 78 a double 11368-4249 /Users/pkoflaherty/Documents/D698/Property_…
## 4 849992 78 a double 11368-4759 /Users/pkoflaherty/Documents/D698/Property_…
## 5 849994 78 a double 11368-4758 /Users/pkoflaherty/Documents/D698/Property_…
## 6 849996 78 a double 11368-4157 /Users/pkoflaherty/Documents/D698/Property_…
## 7 856318 78 a double 11375-2859 /Users/pkoflaherty/Documents/D698/Property_…
## 8 912776 78 a double 11356-1729 /Users/pkoflaherty/Documents/D698/Property_…
## 9 915897 78 a double 11354-1966 /Users/pkoflaherty/Documents/D698/Property_…
## 10 929833 78 a double 11354-4000 /Users/pkoflaherty/Documents/D698/Property_…
## # ℹ 112 more rows
Here we filter the data to only look at non-condo, Class 2 properties in Manhattan.
df2 <- df1 %>%
filter(BORO == "1", CURTAXCLASS == "2", is.na(CONDO_Number))
We are also going to limit to the 2025 year since it is the last whole year with 26,761 records while 2026 only has 13,355 so far, but we’ll save that for a subseqent code chunk.
# Count the number of each YEAR
df2 %>%
count(YEAR)
## # A tibble: 4 × 2
## YEAR n
## <dbl> <int>
## 1 2023 26839
## 2 2024 26788
## 3 2025 26761
## 4 2026 13355
Here we save the data for later
write.csv(df2, file="PVADex.csv")
Here are the fields for reference
t(head(df2, n=1))
## [,1]
## PARID "1000110014"
## BORO "1"
## BLOCK "11"
## LOT "14"
## EASEMENT NA
## SUBIDENT-REUC NA
## RECTYPE "1"
## YEAR "2023"
## IDENT NA
## SUBIDENT NA
## ROLL_SECTION NA
## SECVOL "101"
## PYMKTLAND "1690000"
## PYMKTTOT "10440000"
## PYACTLAND "760500"
## PYACTTOT "4698000"
## PYACTEXTOT "14760"
## PYTRNLAND "760500"
## PYTRNTOT "4923750"
## PYTRNEXTOT "14760"
## PYTXBTOT "4698000"
## PYTXBEXTOT "14760"
## PYTAXCLASS "2"
## TENMKTLAND "1690000"
## TENMKTTOT "12121000"
## TENACTLAND "760500"
## TENACTTOT "5454450"
## TENACTEXTOT "12300"
## TENTRNLAND "760500"
## TENTRNTOT "5022210"
## TENTRNEXTOT "12300"
## TENTXBTOT "5022210"
## TENTXBEXTOT "12300"
## TENTAXCLASS "2"
## CBNMKTLAND "1690000"
## CBNMKTTOT "12121000"
## CBNACTLAND "760500"
## CBNACTTOT "5454450"
## CBNACTEXTOT "11950"
## CBNTRNLAND "760500"
## CBNTRNTOT "5022210"
## CBNTRNEXTOT "11950"
## CBNTXBTOT "5022210"
## CBNTXBEXTOT "11950"
## CBNTAXCLASS "2"
## FINMKTLAND "1690000"
## FINMKTTOT "12121000"
## FINACTLAND "760500"
## FINACTTOT "5454450"
## FINACTEXTOT "11950"
## FINTRNLAND "760500"
## FINTRNTOT "5022210"
## FINTRNEXTOT "11950"
## FINTXBTOT "5022210"
## FINTXBEXTOT "11950"
## FINTAXCLASS "2"
## CURMKTLAND "1690000"
## CURMKTTOT "12121000"
## CURACTLAND "760500"
## CURACTTOT "5454450"
## CURACTEXTOT "11950"
## CURTRNLAND "760500"
## CURTRNTOT "5022210"
## CURTRNEXTOT "11950"
## CURTXBTOT "5022210"
## CURTXBEXTOT "11950"
## CURTAXCLASS "2"
## PERIOD "3"
## NEWDROP "0"
## NOAV "0"
## VALREF NA
## BLDG_CLASS "D4"
## OWNER "BEAVER TOWERS INC"
## ZONING "C5-5"
## HOUSENUM_LO "26"
## HOUSENUM_HI "28"
## STREET_NAME "BEAVER STREET"
## ZIP_CODE "10004"
## GEPSUPPORT_RC "00"
## STCODE "1.1295e+10"
## LOT_FRT "53.46"
## LOT_DEP "56.33"
## LOT_IRREG "I"
## BLD_FRT "53"
## BLD_DEP "56"
## BLD_EXT "N"
## BLD_STORY "19"
## CORNER NA
## LAND_AREA "2907"
## NUM_BLDGS "1"
## YRBUILT "1909"
## YRBUILT_RANGE "0"
## YRBUILT_FLAG NA
## YRALT1 "1989"
## YRALT1_RANGE "0"
## YRALT2 "0"
## YRALT2_RANGE "0"
## COOP_APTS "19"
## UNITS "20"
## REUC_REF NA
## APTNO NA
## COOP_NUM "100001"
## CPB_BORO "1"
## CPB_DIST "1"
## APPT_DATE NA
## APPT_BORO NA
## APPT_BLOCK NA
## APPT_LOT NA
## APPT_EASE NA
## CONDO_Number NA
## CONDO_SFX1 NA
## CONDO_SFX2 NA
## CONDO_SFX3 NA
## UAF_LAND "0"
## UAF_BLDG "0"
## PROTEST_1 "1"
## PROTEST_2 NA
## PROTEST_OLD "1"
## ATTORNEY_GROUP1 "14"
## ATTORNEY_GROUP2 NA
## ATTORNEY_GROUP_OLD "14"
## GROSS_SQFT "51387"
## HOTEL_AREA_GROSS "0"
## OFFICE_AREA_GROSS "0"
## RESIDENTIAL_AREA_GROSS "49087"
## RETAIL_AREA_GROSS "2300"
## LOFT_AREA_GROSS "0"
## FACTORY_AREA_GROSS "0"
## WAREHOUSE_AREA_GROSS "0"
## STORAGE_AREA_GROSS "0"
## GARAGE_AREA "0"
## OTHER_AREA_GROSS "0"
## REUC_DESCRIPTION NA
## EXTRACRDT "05/17/2022"
## PYTAXFLAG "A"
## TENTAXFLAG "T"
## CBNTAXFLAG "T"
## FINTAXFLAG "T"
## CURTAXFLAG "T"