library(readr)
library(tidyverse)

Property Valuation and Assessment Data

We’re taking this data and the expanded version below as our starting point.

https://catalog.data.gov/dataset/property-valuation-and-assessment-data-db7c2

“Real Estate Assessment Property data. The Department of Finance values properties every year as one step in calculating property tax bills.”




Load Data

Here we load the data that we previously downloaded from the website above.

When we ran the first time there were 1,692 rows with parsing issues. It was the first variable BBLE which we assume is going to be a primary key (Building Block Lot Easement) which was being read as a double but then the ones with a letter after the ten digits (the Easement) are getting forced to drop the letter so we have to specify the column types to preserve BBLE

df <- read_csv("~/Documents/D698/Property_Valuation_and_Assessment_Data.csv",
               col_types = cols(BBLE = col_character(), .default = col_guess()))

Here we look at the first ten records and confirm that we see the full BBLE.

# code to review parsing issues if any
#problems(df)

head(df,15)
## # A tibble: 15 × 40
##    BBLE    BORO BLOCK   LOT EASEMENT OWNER BLDGCL TAXCLASS LTFRONT LTDEPTH EXT  
##    <chr>  <dbl> <dbl> <dbl> <chr>    <chr> <chr>  <chr>      <dbl>   <dbl> <chr>
##  1 10001…     1    16  3859 <NA>     CHEN… R4     2              0       0 <NA> 
##  2 10007…     1    73    28 <NA>     NYC … V1     4            183      52 <NA> 
##  3 10007…     1    73    29 <NA>     NYC … Y7     4             90     500 <NA> 
##  4 10002…     1    29  7504 <NA>     <NA>  R0     2             36      73 <NA> 
##  5 10003…     1    36    12 <NA>     NYC … Y7     4            534     604 <NA> 
##  6 10007…     1    73     8 <NA>     NYC … T2     4            238     597 <NA> 
##  7 10017…     1   174  9031 <NA>     <NA>  V1     4             54     165 <NA> 
##  8 10017…     1   176   118 <NA>     <NA>  V1     4              0      50 <NA> 
##  9 10019…     1   198     1 E        NYC … U7     3              0       0 <NA> 
## 10 10020…     1   207     1 E        NYC … U7     3              0       0 <NA> 
## 11 10020…     1   208    19 E        NYC … U7     3              0       0 <NA> 
## 12 10020…     1   209  1001 <NA>     424 … RK     4              0       0 <NA> 
## 13 10067…     1   676     3 G        <NA>  U9     3              0       0 <NA> 
## 14 10067…     1   676     5 <NA>     MTA … V1     4            800     538 <NA> 
## 15 10022…     1   221     1 <NA>     NYC … U1     3            161     461 <NA> 
## # ℹ 29 more variables: STORIES <dbl>, FULLVAL <dbl>, AVLAND <dbl>, AVTOT <dbl>,
## #   EXLAND <dbl>, EXTOT <dbl>, EXCD1 <dbl>, STADDR <chr>, POSTCODE <dbl>,
## #   EXMPTCL <chr>, BLDFRONT <dbl>, BLDDEPTH <dbl>, AVLAND2 <dbl>, AVTOT2 <dbl>,
## #   EXLAND2 <dbl>, EXTOT2 <dbl>, EXCD2 <dbl>, PERIOD <chr>, YEAR <chr>,
## #   VALTYPE <chr>, Borough <chr>, Latitude <dbl>, Longitude <dbl>,
## #   `Community Board` <dbl>, `Council District` <dbl>, `Census Tract` <dbl>,
## #   BIN <dbl>, NTA <chr>, `New Georeferenced Column` <chr>



Subset the data

Before we manipulate the data we need to reduce from nearly 10 million records (9,845,857) by subsetting just the Class 2 properties, and then properties in Manhattan. We’re still at 997,655 records so we going to constrict our data further by limiting to just the zip code of 10019 and are left with 71,246 records.

df2 <- subset(df,TAXCLASS == "2")
df3 <- subset(df2,BORO == 1)
df4 <- subset(df3,POSTCODE == 10019)



Observe Data

Here we look at the first 50 rows and observe that the BBLE is made up of the BORO BLOCK LOT and EASEMENT.

You can go to the follow website to enter any borough block and lot number to see a map of it and the surrounding area. For example Block 1046 in Manhattan is between 55th and 56th Street and between 8th and 9th Avenue, and Lot 23 is a specific lot in that block.

https://propertyinformationportal.nyc.gov/

head(df4,10)
## # A tibble: 10 × 40
##    BBLE    BORO BLOCK   LOT EASEMENT OWNER BLDGCL TAXCLASS LTFRONT LTDEPTH EXT  
##    <chr>  <dbl> <dbl> <dbl> <chr>    <chr> <chr>  <chr>      <dbl>   <dbl> <chr>
##  1 10106…     1  1060  7502 <NA>     <NA>  R0     2            150     200 <NA> 
##  2 10104…     1  1045    61 <NA>     360 … D8     2            122     175 <NA> 
##  3 10063…     1   638    19 <NA>     WVH … C6     2            196     258 <NA> 
##  4 10100…     1  1006  1219 <NA>     <NA>  R4     2              0       0 <NA> 
##  5 10106…     1  1060  7502 <NA>     <NA>  R0     2            150     200 <NA> 
##  6 10104…     1  1045    61 <NA>     360 … D8     2            122     175 <NA> 
##  7 10104…     1  1048  7504 <NA>     EBY … R0     2            163     200 <NA> 
##  8 10104…     1  1043  7502 <NA>     300 … R0     2            150     100 <NA> 
##  9 10104…     1  1047  1054 <NA>     KIM … R4     2              0       0 <NA> 
## 10 10104…     1  1047  1047 <NA>     WU, … R4     2              0       0 <NA> 
## # ℹ 29 more variables: STORIES <dbl>, FULLVAL <dbl>, AVLAND <dbl>, AVTOT <dbl>,
## #   EXLAND <dbl>, EXTOT <dbl>, EXCD1 <dbl>, STADDR <chr>, POSTCODE <dbl>,
## #   EXMPTCL <chr>, BLDFRONT <dbl>, BLDDEPTH <dbl>, AVLAND2 <dbl>, AVTOT2 <dbl>,
## #   EXLAND2 <dbl>, EXTOT2 <dbl>, EXCD2 <dbl>, PERIOD <chr>, YEAR <chr>,
## #   VALTYPE <chr>, Borough <chr>, Latitude <dbl>, Longitude <dbl>,
## #   `Community Board` <dbl>, `Council District` <dbl>, `Census Tract` <dbl>,
## #   BIN <dbl>, NTA <chr>, `New Georeferenced Column` <chr>

And here is just looking at Block/Lot 1046/23 in Manhattan:

df5 <- df4 %>% 
  filter(BLOCK == 1046) %>%
  filter(LOT == 23)
df5
## # A tibble: 9 × 40
##   BBLE     BORO BLOCK   LOT EASEMENT OWNER BLDGCL TAXCLASS LTFRONT LTDEPTH EXT  
##   <chr>   <dbl> <dbl> <dbl> <chr>    <chr> <chr>  <chr>      <dbl>   <dbl> <chr>
## 1 101046…     1  1046    23 <NA>     317 … D4     2             57     100 <NA> 
## 2 101046…     1  1046    23 <NA>     317 … D4     2             57     100 <NA> 
## 3 101046…     1  1046    23 <NA>     317 … D4     2             57     100 <NA> 
## 4 101046…     1  1046    23 <NA>     317 … D4     2             57     100 <NA> 
## 5 101046…     1  1046    23 <NA>     317 … D4     2             57     100 <NA> 
## 6 101046…     1  1046    23 <NA>     317 … D4     2             57     100 <NA> 
## 7 101046…     1  1046    23 <NA>     317 … D4     2             57     100 <NA> 
## 8 101046…     1  1046    23 <NA>     317 … D4     2             57     100 <NA> 
## 9 101046…     1  1046    23 <NA>     317 … D4     2             57     100 <NA> 
## # ℹ 29 more variables: STORIES <dbl>, FULLVAL <dbl>, AVLAND <dbl>, AVTOT <dbl>,
## #   EXLAND <dbl>, EXTOT <dbl>, EXCD1 <dbl>, STADDR <chr>, POSTCODE <dbl>,
## #   EXMPTCL <chr>, BLDFRONT <dbl>, BLDDEPTH <dbl>, AVLAND2 <dbl>, AVTOT2 <dbl>,
## #   EXLAND2 <dbl>, EXTOT2 <dbl>, EXCD2 <dbl>, PERIOD <chr>, YEAR <chr>,
## #   VALTYPE <chr>, Borough <chr>, Latitude <dbl>, Longitude <dbl>,
## #   `Community Board` <dbl>, `Council District` <dbl>, `Census Tract` <dbl>,
## #   BIN <dbl>, NTA <chr>, `New Georeferenced Column` <chr>

I’m concerned about the YEAR column because it looks as if this data hasn’t been updated since “2018/19”. To determine if every building has a “2018/19” record or if I should get a single record for each building that is the most recent YEAR available, I’ve counted the unique Block-lot combinations before and after having subset for the 2018/19 year and get 8,756 versus 8,402.

Since the delta is small, for our purposes for now we will only collect the records that have a year of 2018/19 before we write this to be used in other Code Chunks.

It may be that we abandon this dataset in favor of the expanded version which we will explore next.

# Count how many unique block-lot combinations there are
df4 %>%
  distinct(BLOCK, LOT) %>%
  count()
## # A tibble: 1 × 1
##       n
##   <int>
## 1  8756
# Count how many unique block-lot combinations there for year 2018/19
df4 %>%
  filter(YEAR=='2018/19') %>%
  distinct(BLOCK, LOT) %>%
  count()
## # A tibble: 1 × 1
##       n
##   <int>
## 1  8402
df6 <- df4 %>% 
  filter(YEAR == '2018/19')



Write Data

Here we save the data for later

write.csv(df6, file="PVAD.csv")



Reference

Here are the fields for reference

t(head(df6, n=1))
##                          [,1]                          
## BBLE                     "1010607502"                  
## BORO                     "1"                           
## BLOCK                    "1060"                        
## LOT                      "7502"                        
## EASEMENT                 NA                            
## OWNER                    NA                            
## BLDGCL                   "R0"                          
## TAXCLASS                 "2"                           
## LTFRONT                  "150"                         
## LTDEPTH                  "200"                         
## EXT                      NA                            
## STORIES                  "16"                          
## FULLVAL                  "0"                           
## AVLAND                   "0"                           
## AVTOT                    "0"                           
## EXLAND                   "0"                           
## EXTOT                    "0"                           
## EXCD1                    NA                            
## STADDR                   "425 WEST 50 STREET"          
## POSTCODE                 "10019"                       
## EXMPTCL                  NA                            
## BLDFRONT                 "0"                           
## BLDDEPTH                 "0"                           
## AVLAND2                  NA                            
## AVTOT2                   NA                            
## EXLAND2                  NA                            
## EXTOT2                   NA                            
## EXCD2                    NA                            
## PERIOD                   "FINAL"                       
## YEAR                     "2018/19"                     
## VALTYPE                  "AC-TR"                       
## Borough                  "MANHATTAN"                   
## Latitude                 "40.76388"                    
## Longitude                "-73.98985"                   
## Community Board          "104"                         
## Council District         "3"                           
## Census Tract             "133"                         
## BIN                      "1026676"                     
## NTA                      "Clinton"                     
## New Georeferenced Column "POINT (-73.989853 40.763878)"