West 3rd and MacDougal St

Abstract:

Introduction:

Initially we supposed data science techniques could verify that there are property tax inequalities in NYC. We could do this by building a model that predicts assessed tax values and comparing its predictions to actual tax assessments. If wealthier buildings had larger errors than mid-value and poorer buildings, then we would have evidence they are underassessed.

What we found was a rich world of analysis, tradeoffs and historical issues suggesting that the NYC property tax system shifts a disproportionate tax burden onto the poor and middle classes while leaving wealthier people with a modest burden. Our paper shifted from establishing evidence of inequality to providing a tool for individual buildings to determine if they are overassessed, within the current framework, and whether they have an argument to appeal their taxes.

NYC property taxes were $33.7 billion in 2025, a $1 billion increase compared to 2024, and are the single largest revenue source for the city, representing 44% of all New York City tax revenue. 45% of those property taxes are levied against multifamily buildings. NYC determines how much in property taxes it needs to collect and backs into the tax percent of property value required to meet that objective.

Let’s say there were only two multifamily buildings in NYC, yours and your friend’s across the street, and each had a property value of one dollar. If the city needs to raise 50 cents, it would assess a 25% property tax for the year. Now if your friend had successfully appealed his taxable property value and the city lowered his building’s value to 50 cents, the city would still need to raise 50 cents and so would assess a 33% property tax for the year. You pay 33 cents and your friend pays 17 cents. Over time this preferential tax treatment could allow your friend to save enough, and force you to deplete your savings enough, that your friend ends up buying your building and starts charging you 60 cents a year in rent, covering both buildings’ property tax and allowing your friend to live with more on less. No longer having to work, your friend runs for city council and changes the law so rental buildings pay twice as much in taxes, further lowering his personal tax and justifying a 20 cent rent increase for you.
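The two-building example can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the city sets a single rate so that the rate times the total assessed value equals the amount it needs to raise; the function names are ours and purely illustrative.

```python
def class_tax_rate(levy, assessed_values):
    """Single rate that, applied to every assessed value, raises exactly `levy`."""
    return levy / sum(assessed_values)


def tax_bills(levy, assessed_values):
    """Each building's bill under the single rate."""
    rate = class_tax_rate(levy, assessed_values)
    return [rate * value for value in assessed_values]


# Before the appeal: both buildings valued at $1.00, city needs $0.50.
before = tax_bills(0.50, [1.00, 1.00])
# After the appeal: your friend's building is lowered to $0.50.
after = tax_bills(0.50, [1.00, 0.50])

print([round(b, 2) for b in before])  # [0.25, 0.25]
print([round(b, 2) for b in after])   # [0.33, 0.17]
```

The total collected is 50 cents in both cases; only the split between the two buildings changes.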

It’s a ridiculous example, but the widening of wealth gaps driven in part by property tax inequities is literally happening in the city. Currently rental buildings pay between 4.5 and 6 times the taxes that equivalent non-rental buildings pay. That cost is passed through to renters via high rents, so that roughly a third of the rents paid in NYC go to property taxes, making it harder for renters to save enough to buy their homes and reduce their property tax burdens. We are living in a ridiculous example.

NYC divides property into four tax classes. Tax Class 1 covers 1-3 family homes, think brownstones, and Tax Class 2 covers multi-family homes like co-ops and condos. While Class 1 makes up 47% of total market value and Class 2 makes up 25% of the total taxable property value in the city, only 15% of the property taxes are collected from Class 1 (first class?) and 45% of the property taxes are collected from Class 2 (second class?). It bears emphasizing that these percentages are fixed, so if the taxable value of all class 2 properties dropped by half, 45% of the property taxes would still be collected from class 2 and every class 2 property’s tax liability would stay the same.

Note the remaining 40% of property taxes are collected from utilities (class 3 at 6%) and businesses (class 4 at 34%). Examples of businesses in class 4 are office buildings, stores, hotels, factories and lofts. Lofts are included because they were converted from manufacturing, warehousing or storage space into high-ceiling residences. Together, classes 3 and 4 comprise the remaining 28% of total taxable property in the city, but due to tax abatement programs, COVID-era hotel programs and a disincentive to tax businesses during market downturns, class 4 alone may represent 37% of the market value of the city’s properties, with class 3 raising that by an unknown additional amount shrouded in regulations around utility monopolies.

How to fix the inequalities in the current taxation system in NYC is outside of the scope of this paper. The percentage split between the four classes goes back to when the property tax system was created in 1981 and hasn’t been updated since then to reflect current proportions. Likely class 1 property owners have more political clout and are able to perpetuate their smaller share of the total tax bill. There are additional systemic inequities, such as class 2 buildings being valued as if they were rental buildings, which has the effect of overvaluing cheaper buildings and undervaluing more expensive buildings. This methodology, too, is locked into the 1981 tax reforms and can’t be addressed without major political will and clarity of outcomes.

What we can do is help the roughly 5.75 million residents living in class 2 properties (versus the ~1.98 million living in class 1 properties) determine if they are being treated fairly within the constraints of the current system. We should be able to build a model that predicts the city Department of Finance’s estimate of the value of class 2 properties, and determine if a property is likely under- or overvalued.

Our original hypothesis was that sophisticated owners would have the resources and education to appeal their tax assessments and win. However, the literature shows this has already been established, even after factoring out the systemic tax advantages that owners of luxury units in class 2 properties already enjoy.

Now we try to give owners without the same wealth, accounting and legal resources, networks and education a tool to possibly lower their own tax burden.

Literature Review:

I started out with the premise that sophisticated owners were able to successfully appeal property tax assessments. As I learned about the NYC Department of Finance’s methodology for calculating taxes, I assumed some of the over/under assessment for poor/rich buildings would be due to taking statistical averages. The literature review supported these ideas and suggests that my contribution is in educating readers on how to understand property taxes in NYC and helping them determine if they should appeal their own property taxes.

I restricted articles to those written after 1981 when the current rules for taxation were codified in NYC.

The Invisible Problem

Hayashi (2012) introduces the concept of legal salience, that if people can name how they are injured, then they are more likely to be able to initiate and succeed in getting compensated in court. Property tax assessments have low legal salience, meaning regular property owners can’t easily recognize or articulate when they are being overassessed, especially in an opaque system like NYC’s. As a result all but the most sophisticated property owners are discouraged from the appeals process, reinforcing inequality. With every win by a sophisticated owner, the tax burden is shifted a little more disproportionately onto ordinary owners who are poorer and experience poorer outcomes or complete non-participation in the legal system’s venue for property tax appeals.

Systemic Inequity

In 1975, ‘Hellerstein v. Assessor of Islip’ determined that fractional assessments of full market value for determining property taxes were illegal (Scanlon and Cohen 2011). At the time, tax assessors made determinations individually, which led to idiosyncratic assessments and facile bribery. That court case was about to dramatically shift the property tax burden onto commercial properties. To avoid that disruption, the New York State Legislature wrote S. 7000A into law in 1981, and it remains our taxation system today. However, shifts in population, market value and economic activity in the last 44 years haven’t been reflected in the unchanged tax burden percentages assigned across the four property classes. For example, Class 1 properties (1-3 unit family homes) have an effective tax rate of 0.67% compared to the 3.31% of Class 2 (co-ops, condos and rentals) or the 3.85% of Class 4 (commercial/industrial). So while Class 1 properties account for 49% of property value, they contribute only 15% of tax revenue. Vastly more renters live in Class 2 housing, resulting in those renters paying significantly more in passed-through property tax than the fewer, wealthier renters living in Class 1 housing. While there are co-op and condo tax abatements to bring Class 2 property taxes closer in line to Class 1, these abatements only benefit non-pied-a-terre owners, not renters. Scanlon and Cohen also confirmed that wealthier properties are more likely to appeal assessments, and more likely to win.

This perpetuation of tax advantages for the housing of largely wealthier residents drives a widening of the wealth gap through sale prices as well. Hodge, Komarek & McAllister (2024) found that overassessed lower value properties sold at a 13% discount and underassessed higher value properties sold for a 10% premium. So not only do the assessment inequities redistribute the tax burden regressively, they reshape actual wealth.

Some of this property tax inequity exists in the NYC Department of Finance’s (DOF) property tax calculation methodology as well. The Furman Center (2013) describes how Section 581 of New York State’s Real Property Tax Law values Class 2 properties as if they were rental properties even though they don’t generate rental income. This leads to inaccurate assessments, especially for high-value properties, because no true rental comparables exist and the nearest substitutes are often rent-regulated buildings. For example, in 2012, 50 individual co-op units sold for more than the entire building’s official DOF market value estimate. The Furman Center reiterated that the city’s taxation system is clearly regressive. In the four-class property tax system the city specifies how much property tax will be collected from each class and sets the rates accordingly, so when high-value properties are underassessed it leads to larger increases on low and mid-value properties, shifting the tax burden onto poorer people. The city can only pick one rate per class, chosen so that when applied to the assessed value of every property in that class it produces the amount of taxes the city needs from that class. Again, the tax win of a wealthier property is distributed cent-for-cent across every other property, and lower and mid-value properties are not getting wins.

An additional source of tax inequity may be investor size. Xiao (2022) found nationally that larger investors (100+ units owned) have a tax assessment discount compared to small investors and individual home owners. She found the large investor tax assessment discount was larger in areas with any one of three characteristics: 1) A high tax burden, 2) a high concentration of large investors, and 3) a fairer property tax administration. NYC has both a high tax burden and a large concentration of large investors. It was not clear what made a fairer property tax administration or if New York City was considered one, but it seems like a paradox that a fairer property tax administration would allow for a persistent tax discount for large investors. It may be that large investors are able to get tax abatements in exchange for development promises or are large enough to game the rules in other ways, being ultra sophisticated owners with access to the best resources.

Capitalization Rates’ Role in Horizontal and Vertical Inequities

There are several important, but subtle, concepts: horizontal equity, vertical equity and capitalization rates. The NYC Independent Budget Office (2022) confirms there are horizontal inequities (where similarly priced properties have different assessments) and vertical inequities (where wealthier properties pay proportionally less in property taxes). The DOF uses high capitalization rates (cap rates, rates used to discount future streams of hypothetical rental income to a present day property value) that result in lower market valuations. Since the DOF doesn’t use actual market data this can lead to tax inequities. The DOF can’t change from cap rates to fair market value because they operate under the S7000A law from 1981, however they could lower cap rates for high value properties and raise them for low-value properties to narrow vertical inequities.

An additional example may clarify the terminology of horizontal and vertical equity, and of progressive and regressive tax systems. Horizontal equity is when two people, in two equal homes, pay equal property tax. Vertical equity is when a third person, in a more expensive home, pays more property tax. How much more tax the person in the more expensive home pays determines what kind of tax system you have. A proportional tax system would have the owner of the more expensive home pay the same percentage of home value in taxes as the first two. Research suggests this still puts a disproportionate tax burden on the first two, and that a more equal outcome is achieved if the more expensive home pays a greater percentage of property value as tax. This is a progressive tax system and generally leads to better wealth equality and more stable social outcomes. The opposite is a regressive tax system, where the owner of the wealthier home pays a lower percentage of the home’s value in property tax. New York City’s is a regressive property tax system that exacerbates the wealth gap between haves and have-nots.
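As a small illustration of these definitions, the sketch below compares the effective tax rate (tax paid divided by market value) of a lower-value and a higher-value home. The function names and dollar figures are hypothetical, chosen only to show the three cases.

```python
def effective_rate(market_value, tax_paid):
    """Tax paid as a fraction of market value."""
    return tax_paid / market_value


def classify_system(lower_value_home, higher_value_home, tol=1e-9):
    """Each home is a (market_value, tax_paid) pair.

    Returns 'proportional', 'progressive' or 'regressive' depending on how
    the higher-value home's effective rate compares to the lower-value home's.
    """
    low = effective_rate(*lower_value_home)
    high = effective_rate(*higher_value_home)
    if abs(high - low) <= tol:
        return "proportional"
    return "progressive" if high > low else "regressive"


modest_home = (500_000, 5_000)  # hypothetical: 1.0% effective rate

print(classify_system(modest_home, (2_000_000, 20_000)))  # proportional
print(classify_system(modest_home, (2_000_000, 30_000)))  # progressive
print(classify_system(modest_home, (2_000_000, 10_000)))  # regressive
```

Note that in the regressive case the more expensive home still pays more dollars of tax than the modest home; what makes it regressive is the lower percentage of value paid.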

Options to Improve the Current System

The Independent Budget Office (IBO) has two other suggestions to be more fair. Using median instead of average transactions would bring down artificially high capitalization rates. Imagine a long flat trending up curve that spikes incredibly high over a short distance towards the end on the far right. This is a visual representation of property values in the city. By taking the average of the values to determine the capitalization rate the DOF is using a rate high above the curve, influenced by the outlier wealthy values. If the DOF used median values to determine the capitalization rates then it would be on the curve and wealthier properties would be valued higher in proportion to the median properties compared to when taking the average.
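To make the mean-versus-median point concrete, here is a small sketch with made-up property values shaped like the curve described above: mostly modest values with one extreme outlier. It does not reproduce how the DOF derives capitalization rates; it only shows how far an average can sit from the typical property.

```python
from statistics import mean, median

# Hypothetical property values: a long, gently rising run and one luxury outlier.
values = [
    400_000, 450_000, 500_000, 550_000, 600_000,
    650_000, 700_000, 750_000, 800_000, 80_000_000,
]

avg = mean(values)
mid = median(values)
below_avg = sum(1 for v in values if v < avg)

print(f"mean:   {avg:,.0f}")   # mean:   8,540,000
print(f"median: {mid:,.0f}")   # median: 625,000
print(f"{below_avg} of {len(values)} properties sit below the mean")
```

In this toy data nine of the ten properties fall below the mean, which is the sense in which an average is "high above the curve" while the median stays on it.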

The second IBO suggestion for fairness is to adjust cap rates between property types. For example, to use a slightly lower cap rate for office buildings in a prime midtown location compared to a lower-tier office neighborhood. The DOF would be challenged to adjust capitalization rates based solely on property characteristics, such as building type, location and usage. Attempting to tweak the capitalization rates in order to get fair market outcomes could invite lawsuits for violating equal protection under the Fourteenth Amendment, which requires that similarly situated individuals be treated equally under the law, or for violating property rights protected under the Fifth Amendment’s Takings Clause and due process protections.

Inconsistencies in the City’s Processes

Our task of providing a tool to help buildings produce evidence if they are overassessed relative to their peers is further complicated by inconsistencies in the DOF’s processes. Goor (2017) found that the NYC DOF deviates significantly from its publicized process when calculating property taxes and that property taxes are poorly correlated with land, market and assessed values. Since we will be using similar characteristics as inputs in our models we should expect a higher degree of error due to the DOF’s process inconsistencies.

The Burden of Renting

Rachel Michelle Goor’s 2017 paper “Only the little people pay taxes” had several general findings of note. Many rental buildings near Class 2c luxury condos are subject to rent controls, leading to deep undervaluations for those luxury condos. She determined that property tax exemptions are granted because of people’s ability to organize and lobby the State legislature, not because those tax exemptions were sound tax policy. Her emphasis was on homeowners versus renters: she found homeowners hold 46% of the estimated market value but pay 15% of the total property tax burden (coincident with, but not the same as, the 15% of tax paid by Class 1 properties), while rental properties hold 24% of the estimated market value but pay 37% of the property tax. Per dollar of market value, that is 4.73 times as much tax paid by a rental building as by a homeowner. She stated that renters in NYC were paying six times as much in property taxes as homeowners do (presumably on a different basis), whereas the average outside of NYC is closer to renters paying 1.5 times as much property tax as homeowners.
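The 4.73 figure can be reproduced from the four percentages quoted above. A quick check, assuming the comparison is each group's share of tax divided by its share of market value:

```python
# Shares quoted from Goor (2017) in the paragraph above.
homeowner_value_share, homeowner_tax_share = 0.46, 0.15
rental_value_share, rental_tax_share = 0.24, 0.37

# Tax share per unit of market-value share for each group.
homeowner_burden = homeowner_tax_share / homeowner_value_share
rental_burden = rental_tax_share / rental_value_share

ratio = rental_burden / homeowner_burden
print(round(ratio, 2))  # 4.73
```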

Reform Proposals

A second, highly influential paper for us was Lizzie (Yea Won) Lee’s 2023 “Evaluating the ‘Road to Reform’ for New York City’s Property Tax System”.

Like Goor, Lee used machine learning to identify that the DOF was deviating significantly from its publicly stated processes. It follows that Lee (2023) supports the DOF using machine learning (XGBoost) as a viable tool for improving fairness and transparency in tax assessments by flagging valuation anomalies for further investigation.

Like Goor, Lee found that the DOF’s methodology is not disclosed nor reproducible and so it’s difficult to verify if a DOF valuation is accurate, or how a valuation compares to equivalent properties, or how equitable assessments are between property types. This will directly impact our likelihood of success, but we have to start somewhere, for free, without the need to engage expensive law firms and experts.

Lee characterized NYC’s property tax system as extremely regressive, with low-income neighborhoods often facing higher effective tax rates. She agrees with the recommendations in the 2021 final report of the New York City Advisory Commission on Property Tax Reform that Class 2 be split into buildings with 10 or fewer units and buildings with more than 10 units, and that the city switch to a sales-based market valuation.

A sophisticated criticism she had of the New York City Advisory Commission on Property Tax Reform is that it used quantiles to group properties together to understand impact, which might mean an $800k home and an $80M luxury penthouse both land in the same top quantile despite drastically different impacts. The Commission didn’t specify its quantiles, so it could be treating the top 1% or the top 0.1% the same as the top 25%. This could be used to disingenuously misrepresent the tax changes for different groups, or to present tax increases on the top 0.1% as if they were unfairly borne by ordinary homeowners.

It’s not from the literature review, but let’s take a moment to say that switching to fair market value has its own drawbacks. Without proper smoothing, gentrification would accelerate, as increases in home prices would push property taxes higher for the residents who had been there prior to gentrification.

Lee suggests NYC may be reluctant to reform property tax because it’s one third of NYC revenue and because the city fears high-income outmigration. She encourages NYC to lower income from property tax and discourages carve-out plans, which lower revenue in the hope of stimulating construction but have mixed results.

While her analysis of what should be NYC’s path to property tax reform is outside of the scope of this paper, it helps frame what ordinary property owners can do to achieve more equitable property outcomes for themselves.

Summary

The literature strongly supports that NYC’s property tax system is regressive, opaque, and full of systemic and methodological inequities. The 1981 statute prescribes an outdated division of tax burden that punishes poor and middle-class residents in favor of wealthier, low occupancy or luxury buildings. Inconsistencies in processes further reduce the legal salience of overassessment, that is, residents’ ability to name and confirm the financial injury of being overassessed for property taxes. Our task is not to present a path to reform but to provide a free, data-driven attempt to support an appeal, where justified, for ordinary home owners to reduce their tax assessments to fair levels relative to equivalent neighboring properties.

Referenced Papers

Hayashi, A. T. (2012). The Legal Salience of Taxation. SSRN Electronic Journal.
DOI Link

Scanlon and Cohen (2011) Distribution of the Burden of New York City’s Property Tax - The Furman Center for Real Estate & Urban Policy
PSU Link

Hodge, T. R., Komarek, T. M., & McAllister, A. (2024). A Double Negative: Capitalizing on Assessment Regressivity.
DOI Link

Furman Center Policy Brief (2013) Shifting the Burden: Examining the Undertaxation of Some of the Most Valuable Properties in New York City - The Furman Center for Real Estate & Urban Policy
Furman Center Link

Xiao, S. W. (2022). Investor Scale and Property Taxation.
DOI Link

NYC Independent Budget Office (2022). Does NYC’s Method for Assessing Commercial Property Values Result in Inequities
PDF Link

Goor, R. M. (2017). Only the little people pay taxes.
URI Link

Lee, Lizzie (Yea Won) (2023) Evaluating the ‘Road to Reform’ for New York City’s Property Tax System
DOI Link

Extended Bibliography

New York City Independent Budget Office (?). The Coop/Condo Abatement and Residential Property Tax Reform in New York City
Link

New York City Independent Budget Office (2006). Twenty-Five Years After S7000A: How Property Tax Burdens Have Shifted in New York City
Link

New York City Independent Budget Office (2013). The Coop & Condo Tax Break Has Expired, Giving Albany Chance for Long-Promised Fix
Link

Shi, Boicourt, Ng, et al. (2024). An Assessment of NYC Cooperative Housing’s Climate Vulnerability and Barriers to Adaptation
Link

Cetrino, Benjamin (2014) Classification of Property for Taxation in New York State
Link

Nadine Brozan (2002) For Co-op Complexes, Complex Choices
NYT Archival Link

Berry, C. (2021). An Evaluation of the Residential Property Tax Equity in New York City
PDF Link

Research Question:

In NYC, the property tax liability for a building is ultimately based on a single number: the total market value of the tax lot, as assessed by the Department of Finance (DOF). The final tax number is influenced by tax abatements and exemptions that are based on policy, which while maybe not fair or optimized for a given property, are not appealable. Additionally, transitional formulas cause a delay in the full reflection of changes in the market value assessment in any given year. So, while there is a lot of noise in a building’s final tax liability, the signal is the assessed total market value.

This project aims to model this assessed market value using publicly available DOF data, NYC land use files (PLUTO) and averaged sale price information. By comparing our model’s predicted value to the city’s actual assessed market value for each tax lot, we can provide insight to a building as to whether it is likely over-, under- or appropriately assessed. In this way we can support ordinary homeowners who may not otherwise have the means to know if they are overassessed or not.
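The comparison step could look something like the sketch below. The 10% tolerance band, the function name and the example figures are assumptions for illustration, not values taken from the DOF or from our fitted model.

```python
def assessment_flag(predicted_value, assessed_value, tolerance=0.10):
    """Compare the DOF's assessed market value to a model prediction.

    `tolerance` is a hypothetical band around the prediction inside which
    we treat the assessment as roughly in line with the model.
    """
    if predicted_value <= 0:
        raise ValueError("predicted_value must be positive")
    ratio = assessed_value / predicted_value
    if ratio > 1 + tolerance:
        return "possibly overassessed"
    if ratio < 1 - tolerance:
        return "possibly underassessed"
    return "roughly in line"


# Made-up tax lots: (model prediction, DOF assessed market value).
lots = {
    "lot A": (10_000_000, 12_500_000),
    "lot B": (10_000_000, 10_400_000),
    "lot C": (10_000_000, 7_000_000),
}
for name, (predicted, assessed) in lots.items():
    print(name, assessment_flag(predicted, assessed))
```

A flag like this is only a prompt for a closer look; a model error and a genuine overassessment look the same in the ratio.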

Our research question is:

Can we build a model that predicts the total market value assessed by the DOF? And in so doing provide a tool for co-op boards and residents, especially those without sophisticated legal or accounting resources, to know whether their building may have a good case for appeal?

We suspect that highly sophisticated owners, with the resources to own high-cost or luxury buildings, are likely able to measure and appeal tax assessments to keep their tax liability fair or underassessed, even in addition to the systemic low-tax benefits already enjoyed by high-cost or luxury buildings in the NYC regressive property tax system. These low assessments would appear as higher predicted assessed market values than what was actually assessed.

There is no need for a traditional hypothesis and null hypothesis, since we aren’t running statistical tests on randomized groups. Instead we’re building a predictive model to surface actionable data for buildings considering whether they are suffering the invisible harm of being overassessed in property taxes compared to their peers.

Data and Variables:

We’ve found NYC property tax data from Open Data NYC. discuss what I’ve found.

I’m a little concerned I need to do individual building valuations and that there are multiple buildings on any block/lot combination with no way to primary key individual buildings across the publicly available NYC data

I also need to do a better domain dive of what NYC discloses of their methodology

What we don’t have is a good sophistication flag. I tried pulling educational attainment from the US Census Bureau at the zip code level, but my zip code in NYC has tens of thousands of buildings (citation needed). GH suggested I look at how granular that data is, maybe by voting precinct. I think the sophistication flag is price per square foot for the average apartment in that building: less than $1200 is low, between $1200-$1300 is mid and above $1300/sqft is high. I could use a decision tree to come up with better cutoffs
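The proposed cutoffs can be written down directly. This is only a sketch of the rule as stated in the note above; the function name is hypothetical, and we treat exactly $1,300 per square foot as mid, which is our own reading of the boundary.

```python
def sophistication_flag(price_per_sqft):
    """Bucket a building by average price per square foot (USD).

    Cutoffs follow the draft rule: under 1200 is low, 1200 to 1300 is mid,
    above 1300 is high. These are placeholders until better cutoffs are found.
    """
    if price_per_sqft < 1200:
        return "low"
    if price_per_sqft <= 1300:
        return "mid"
    return "high"


for ppsf in (950, 1250, 2100):
    print(ppsf, sophistication_flag(ppsf))
```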

could possibly do some data visualization with an NYC map

There are 71k 4+ residence units (over how many years?) in the Manhattan zip code of 10019 alone. Zip code doesn’t give us the granularity or resolution we need to view wealth differences in individual buildings

We’re also having an issue with block and lot number not being granular enough. On any given lot in a block there are 1-10 buildings so we need a building ID code that’s consistent across datasets

Domain Dive

NYC Dept of Finance What are class 2 properties?

Say how the assessment ratio of 0.45 exacerbates market inequity. The DOF uses artificially high capitalization rates. Maybe this causes people to think they are getting a deal and they are scared to inquire because they don’t want a reassessment to go against their interest. Hayashi (2012) talks about the low legal salience of overassessed taxation, where it’s hard for someone to understand or quantify the financial injury of overtaxation and so they never appeal their overtaxation in court. This high capitalization rate might be one way the city does that. Owners look at their statements, see the low market valuation for their property, and are scared to contact the DOF for clarification because they mistakenly believe they are underassessed.

The DOF gets Net Operating Income from owner-reported financials. Uses a modeling process to correct for underreporting, missing data and inconsistency.

(NOI is reported gross income minus reported expenses other than property taxes.) lol then at least my building’s NOI = property tax * 1.05.

So the NYC Independent Budget Office (2022) paper addressed vertical and horizontal inequities and how we’re stuck using cap rates by asset type.

The capitalization rate is net operating income divided by market value.

So for example, my building has a net operating income of $1,000,000. Assume that for its class of asset the capitalization rate is 0.18; then the market valuation for my building is $5,555,555. However, if they used actual market data then the capitalization rate might be 0.05, giving a fair market valuation of $20,000,000.
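The arithmetic in this example follows from rearranging the capitalization rate definition: market value equals net operating income divided by the capitalization rate. A minimal sketch using the figures from the example (the function name is ours, and 0.05 is the hypothetical market-based rate mentioned above):

```python
def market_value(noi, cap_rate):
    """Income-capitalization value: net operating income / cap rate."""
    if cap_rate <= 0:
        raise ValueError("cap_rate must be positive")
    return noi / cap_rate


noi = 1_000_000

high_cap_value = market_value(noi, 0.18)  # rate assumed for the asset class
low_cap_value = market_value(noi, 0.05)   # hypothetical market-derived rate

print(f"{high_cap_value:,.0f}")  # 5,555,556 (the text truncates to 5,555,555)
print(f"{low_cap_value:,.0f}")   # 20,000,000
print(f"{low_cap_value / high_cap_value:.1f}x")  # 3.6x
```

The same income produces a valuation 3.6 times larger under the lower rate, which is why the choice of cap rate matters so much for the assessed market value.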

For instance you and I could both own rental buildings next to each other. If yours is better managed, more aesthetic, and attracts tenants paying higher rents, you will have a higher Net Operating Income (NOI) than me and a higher assessed value than me. Market capitalization rates aren’t tied to how efficiently buildings are using their capital. And we can’t assign different market capitalization rates to two buildings with identical property characteristics, otherwise we would be violating equal protection under the law. Now let’s say you don’t report your income and the city applies an average to determine yours, so you end up with the same assessed value as me. Now we have horizontal inequity. (If a building doesn’t report there should be a penalty above average, but how would they detect false reporting?)

“we use statistical modeling to calculate the typical income and expenses for properties similar to yours in size, location, age, and number of units. The process varies depending upon whether your property has more or less than 10 units.“

For class 2 they don’t look at recent sales data; they look at operating income.

If they go off of true operating income then a resident-owner building is going to have significantly less operating income because they only collect maintenance and not rent.
Operating Income - $1,000,000
Capitalization Rate - 0.18
Market Valuation (OI/CR) - $5,555,555
Assessment ratio - 0.45
Assessment - $2,500,000
Tax rate - 12.86% (rounded; the assessed tax below works out at 12.855%)
Assessed tax - $321,375
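The chain from market valuation to tax bill in this example can be sketched as follows. The function name is ours, exemptions, abatements and transitional smoothing are ignored, and the rates are simply the ones quoted in the example rather than an authoritative schedule.

```python
def tax_from_market_value(market_value, assessment_ratio, tax_rate):
    """Return (assessed value, tax), ignoring exemptions and abatements."""
    assessed = market_value * assessment_ratio
    return assessed, assessed * tax_rate


market_value = 1_000_000 / 0.18  # operating income / capitalization rate

assessed, tax = tax_from_market_value(market_value, 0.45, 0.1286)
print(f"{assessed:,.0f}")  # 2,500,000
print(f"{tax:,.0f}")       # 321,500

# The $321,375 figure in the example corresponds to a rate of 12.855%,
# which rounds to the 12.86% shown.
_, tax_unrounded_rate = tax_from_market_value(market_value, 0.45, 0.12855)
print(f"{tax_unrounded_rate:,.0f}")  # 321,375
```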

📌Out of place📌 If assessments were based on sale price data then luxury units with large staff may have lower purchase prices but higher maintenance.

It could also be there is systemic bias(?), for example NYC uses statistical methods to produce assessed property values for tax purposes. That may tend to lump poor, mid and rich co-ops together so that poor buildings are paying a disproportionately larger value of taxes and rich buildings are paying a disproportionately lower value of taxes. Basically we could ask if the statistical methods used by NYC DOF lump all buildings into the mid-rich building category

What research methods do I intend to use, what models, variables and data sets will be used to test hypos and theory?

Statistical validity of the data sets and models.

📌 Compare market valuations to sale price ratios

DATA EXPLORATION
Three values:
Model’s estimated value
The actual value as provided in our data
The formulaic model
We went to the data to look at this. Not sure if the TotVal column is the assessed tax. I can narrow it down to look at my building and compare

Describe each dataset and where it comes from with proper citation. Find a list from NYCDOF with the variable name descriptions. Narrow the number of records to Taxtype=2. Can narrow the database to Manhattan only and that could be enough for the project

But could even initially narrow the database to a particular zipcode in the beginning so that I can build an end to end model with the smallest amount of data and then build up from there.

I also want to use Census Data to get average income per zip code to predict value based on zip code… except NYC is so dense. You have a $1 pizza shop next to a $20 a person Italian restaurant next to a $200 a person Italian restaurant. If you charged all three an average tax based on earnings potential knowing the average income of the street, the $1 pizza shop would be disadvantaged and the $200/person Italian restaurant would be advantaged

Also, check the warnings and issues from the read_csv() to see if there is anything that would give us pause and make us want to find a different route of processing. For example, if there is a parsing issue with a specific delimiter...

Which columns are of importance to us? The basic values we’ll use in our regression. Also: run missing statistics on the columns.

Steps for next time I look this up. Basic regression I want to run. Which columns do I want to focus on. Pick one of the literature review links.

Blurb

I’m doing a domain dive/data dictionary for the PLUTO dataset. I think next I need to add a column to the data dictionary so I know which department each field came from, so I don’t have duplicate fields from DOF mucking up my merged data

I don’t think I’ll get anything out of PLUTO, because presumably everything the DOF uses to calculate assessed market value should be in the DOF data, but there might be some interesting correlations between market value and a non-DOF field in the PLUTO database. Also, the PLUTO data has coordinates so I could do an additional model where nearby properties have a larger impact on predicted market value assessment than distant properties
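One simple way to let nearby properties count for more than distant ones is inverse-distance weighting. The sketch below is one possible approach, not a settled design: the function name, the planar (x, y) coordinates, the per-square-foot values and the `power` parameter are all assumptions for illustration.

```python
import math


def idw_estimate(target_xy, neighbors, power=2.0):
    """Inverse-distance-weighted average of neighbor values.

    `neighbors` is a list of ((x, y), value) pairs in the same planar units
    as `target_xy`. Closer neighbors receive larger weights (1 / d**power).
    """
    if not neighbors:
        raise ValueError("need at least one neighbor")
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in neighbors:
        distance = math.hypot(x - target_xy[0], y - target_xy[1])
        if distance == 0:
            return value  # a neighbor at the same point dominates
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical assessed value per square foot for two neighboring lots.
neighbors = [((0.0, 100.0), 100.0), ((0.0, 1000.0), 200.0)]
estimate = idw_estimate((0.0, 0.0), neighbors)
print(round(estimate, 2))  # 100.99
```

The lot ten times farther away gets one hundredth of the weight here, so the estimate stays close to the nearer lot's value.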

Oh, you mentioned in your letter that we have a lot of older people. As I’m going through this exercise I’m feeling like every coop in the city should audit their exemptions. People over 65 who make less than X should be able to get an exemption, and I don’t know if it’s automatic. Presumably it affects them, but I don’t know how we parcel out exemptions; I’m only aware of that for abatements. Step 2, check the math. We should have a spreadsheet showing that the historical assessed values produce a transitional/final value that’s correct. Maybe NYC never makes a mistake like that, but if we had that spreadsheet and NYC’s number was higher than ours we could point it out. Third is the final assessed market value for each tax year. That’s the number I’m trying to predict. It’s multiplied by 0.45 and goes through some smoothing so it doesn’t go up (or down) too much in any one year, so the final tax number, ignoring any exemptions and abatements, is really a lagging indicator, and the only thing we should check for fairness is the final assessed market value every year. It will look low, and that’s a psychological tool by NYC so that people don’t question it, but it doesn’t matter how low it is; it matters how low it is in relation to comparable properties.

I want to calculate the predicted final market assessed value for all of the lots/parcels in our block and see how they compare to each other.

My goal with the PLUTO data is to see if there are any additional fields that could make an argument for whether someone’s assessed market value is high or low. (I’m really looking at us: do we have a case to appeal our assessed market value?)

I know I also have to look at something called vertical and horizontal equity. If there is another building on our block that has roughly the same square footage of residential space but we pay more taxes than them, that’s horizontal inequity. And if there is a building that’s much more valuable than ours and they pay roughly the same taxes as us, that’s vertical inequity.

Yes, Building Class C# are walkups and Building Class D# are elevator buildings. In the statement for each tax lot there’s a short list of data provided, so I assume that’s the majority of the variables they use to estimate value. I also wonder if they use prevailing interest rates, but their methodology is a black box. They don’t say exactly.

Ah, my third set of data is sales. Presumably that should have a lot to say about buildings’ market value relative to each other. However, Class 2 properties are specifically valued as if they are rental properties, so I think it’s more about number of units, etc. That means poor buildings (low value of unit sales) are being taxed the same as mid buildings (like ours) and the same as luxury buildings with high sales, given they are all the same size. We literally have a regressive property tax system in NYC, and rich people in nice properties pay proportionately less in property taxes. (Actual rental buildings pay like 4.5 times as much in taxes as coops and condos, so most of the rent people pay is pass-through property taxes, and yet most renters think they aren’t paying any property tax!) Rich people have really gamed the system in NYC; no wonder it’s kind of a playground for elites.

List of Data Sources

Property Valuation and Assessment Data

We’re taking this data and the expanded version below as our starting point.

https://catalog.data.gov/dataset/property-valuation-and-assessment-data-db7c2

“Real Estate Assessment Property data. The Department of Finance values properties every year as one step in calculating property tax bills.”

Property Valuation and Assessment Data Expanded

Here is the expanded version of the DOF’s Property Valuation and Assessment Data set.

https://catalog.data.gov/dataset/property-valuation-and-assessment-data-tax-classes-1234

“Real Estate Assessment Property data. Data represent NYC properties assessments for purpose to calculate Property Tax, Grant eligible properties Exemptions and/or Abatements. Data collected and entered into the system by various City employee, like Property Assessors, Property Exemption specialists, ACRIS reporting, Department of Building reporting, etc..”

Rolling Sales Data

We should be able to subset for properties that have sold in the last 12 months to train and test our model.

https://www.nyc.gov/site/finance/property/property-rolling-sales-data.page

“The Department of Finance’s rolling sales files list tax class 1, 2, and 4 properties that have sold in the last 12-month period in New York City. These files include the neighborhood, building type, square footage, and other data.”

PLUTO

We’ll layer the PLUTO data onto the Property Valuation and Assessment Data. Note that PLUTO stands for “Primary Land Use Tax Lot Output”.

https://www.nyc.gov/content/planning/pages/resources/datasets/mappluto-pluto-change#overview

“Extensive land use and geographic data at the tax lot level in comma–separated values (csv) file format. The PLUTO files contain more than seventy fields derived from data maintained by city agencies.”

Potential Additional Data

There are four additional sources of data that could be used to expand the original scope in future versions of the project.

MapPLUTO

MapPLUTO is the spatial version of the PLUTO data using tax lot geometries from the Department of Finance’s Digital Tax Map. This could allow us to use distance as a factor in predicting assessed tax values.

Property Exemption Detail

We are skipping property tax exemption details because we should be able to predict assessments independent of exemptions. For an extra layer of sophistication in a future version of the project we could back out property exemptions before predicting tax values; for our purposes, however, our model should just determine that these properties have a lower than expected tax burden.

“Government, Not-for-Profit and Commercial Exemptions maintained by NYC Department of Finance. The Department of Finance administers a number of benefits for property owners in the form of exemptions and abatements. Exemptions lower the amount of tax one owes by reducing the property’s assessed value. This data contains exemption information for Government, Not-for-Profit and Commercial Exemptions.”

https://catalog.data.gov/dataset/property-exemption-detail

Assessment Actions

Another future extension of the project could be to evaluate the outcomes of legal action to reduce assessments.

“Assessment Actions Actions on Applications for Reducing Assessments or Reclassifying Property. Listed here are Tax Commission actions for reducing assessments or reclassifying property. KEY: YR=Assessment year; B=Borough (1=Manhattan, 2=Bronx, 3=Brooklyn, 4=Queens, 5=Staten Island); TC=Tax Class or subclass. Classification claims. Reductions are expressed in total actual assessed value. For condominiums, actions shown are for representative lots only.”

https://catalog.data.gov/dataset/assessment-actions

American Community Survey

We have data at the census tract level that we could use to layer additional features to identify sophisticated owners, but I think this should be held in reserve as we will already have sale price data. Presumably those who can pay more for their property have greater access to

https://dol.ny.gov/american-community-survey

Data Display

BLDGCL: what do these values mean (R0, D8, C6, R4)? I need to pull out a data dictionary.

I should be putting my summary of work here and this more detailed work in another document

LTFRONT and LTDEPTH to get lot footprint; do the same with BLDFRONT, BLDDEPTH and STORIES to get square footage of the building.

FULLVAL, AVLAND, AVTOT -> go through and do individual calculations to see how these are related.

Issue with one dataset: it only goes up to the 2018-2019 plan year, so I need to grab later years from another, similar dataset.

Can I calculate an assessed value per square foot, compare that to buildings in the same block, and then what about adjacent lots?
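A rough sketch, in Python with pandas, of the footprint and per-square-foot comparison described above. The column names follow the DOF sample record; the four lots are invented, and the block-median comparison is one possible yardstick, not a method taken from the DOF. The sample record in this document has BLDFRONT and BLDDEPTH of 0, so the sketch treats zero dimensions as missing.

```python
import numpy as np
import pandas as pd

# Invented lots on one block; column names follow the DOF sample record.
lots = pd.DataFrame({
    "BLOCK": [16, 16, 16, 16],
    "LOT": [1, 2, 3, 4],
    "BLDFRONT": [25, 50, 40, 0],
    "BLDDEPTH": [80, 80, 100, 0],
    "STORIES": [5, 6, 10, 31],
    "AVTOT": [450000, 1320000, 1800000, 159381],
})

# A zero front or depth is treated as "not recorded", not as a zero-size building.
dims = lots[["BLDFRONT", "BLDDEPTH"]].replace(0, np.nan)

# Rough gross floor area: footprint times stories (ignores setbacks and odd shapes).
lots["est_sqft"] = dims["BLDFRONT"] * dims["BLDDEPTH"] * lots["STORIES"]
lots["av_per_sqft"] = lots["AVTOT"] / lots["est_sqft"]

# Compare each lot with the median of its own block (NaN lots are skipped).
block_median = lots.groupby("BLOCK")["av_per_sqft"].transform("median")
lots["ratio_to_block"] = lots["av_per_sqft"] / block_median

print(lots[["LOT", "est_sqft", "av_per_sqft", "ratio_to_block"]])
```

A ratio well above 1 for lots of similar size would be a first hint of horizontal inequity worth a closer look. Lots with missing dimensions drop out of the comparison rather than showing up as infinite values per square foot.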

List of data I’m looking at: NYC Open Data, NYC DOF. I have a database on appeals information. Can I compare that to the price per square foot and see if wealthy buildings are more likely to appeal? Also, it’s only successful appeals in the database; if I could find a database with unsuccessful appeals, that could be interesting. Basically there are three mechanisms we are looking at for inequity: 1) rich people are better able to appeal, 2) part of the statistical calculation for tax assessments used by NYC DOF penalizes poorer buildings and subsidizes richer buildings by tending to assume an average value, or 3) the price per square foot is distorted by the underlying building mortgage, which shouldn’t affect the taxable value but would affect the sale value, or by high maintenance due to luxury amenities and staff: given two equal units, the one in the building with higher maintenance sells for less. But is that true? Does the price accurately capture the value of having a doorman? Sigh, this also means price per square foot isn’t a great indicator of richness…

I need to start over, grab more datasets and do basic data exploration and evaluating missingness and imputation strategies.

No matter what we’re only looking at tax class 2 buildings. Maybe that can be part of the domain dive section: where we discuss how the DOF assesses tax values.

There’s also the PLUTO database, and that has a building ID, but is that consistent across other databases?

BBLE “1000163859”
BORO “1”
BLOCK “16”
LOT “3859”
EASEMENT NA
OWNER “CHEN, QI TOM”
BLDGCL “R4”
TAXCLASS “2”
LTFRONT “0”
LTDEPTH “0”
EXT NA
STORIES “31”
FULLVAL “354180”
AVLAND “3310”
AVTOT “159381”
EXLAND “3310”
EXTOT “159381”
EXCD1 “6800”
STADDR “1 RIVER TERRACE”
POSTCODE NA
EXMPTCL NA
BLDFRONT “0”
BLDDEPTH “0”
AVLAND2 “3310”
AVTOT2 “148953”
EXLAND2 “3310”
EXTOT2 “148953”
EXCD2 NA
PERIOD “FINAL”
YEAR “2018/19”
VALTYPE “AC-TR”
Borough NA
Latitude NA
Longitude NA
Community Board NA
Council District NA
Census Tract NA
BIN NA
NTA NA
New Georeferenced Column NA

Where can I find the NYC DOF glossary? Is AVTOT Average Total value or Assessed Value Total? Is FULLVAL market value? Is EXTOT exempt value? Is AVTOT2 assessed value total after exemptions? Need to use summary(), look for missing values, use skimr::skim(), and look for outliers and ranges.
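One quick check on these questions, sketched in Python using only the values in the sample record above. It does not settle what the field names mean; it only shows how the numbers in this one record relate to each other, and reading EXTOT as an exempt amount is an assumption until the glossary is found.

```python
# Values copied from the sample record (BBLE 1000163859, 2018/19, FINAL).
record = {
    "FULLVAL": 354180,
    "AVLAND": 3310,
    "AVTOT": 159381,
    "EXTOT": 159381,
    "AVTOT2": 148953,
    "EXTOT2": 148953,
}

# Ratio of each total to FULLVAL.
ratio_avtot = record["AVTOT"] / record["FULLVAL"]
ratio_avtot2 = record["AVTOT2"] / record["FULLVAL"]

# Remainder if EXTOT is an exempt amount (assumption, not a confirmed definition).
remainder = record["AVTOT"] - record["EXTOT"]

print(round(ratio_avtot, 4))   # 0.45
print(round(ratio_avtot2, 4))
print(remainder)               # 0
```

For this record AVTOT is exactly 0.45 of FULLVAL, which is consistent with AVTOT being an assessed total at the 0.45 multiplier mentioned earlier and FULLVAL being the market value it is applied to. AVTOT2 comes out lower. One record is not proof, so the same ratios should be run across the whole file.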

Use Census data to come up with average salary per zipcode to help determine Richness by zipcode…

Statistical Methods:

This section describes the methods you used to analyze the data.

Goal: design and implement a model or set of models to measure or explore these relationships

Ridge Regression

Try linear regression -> I may know what my features are because of the domain dive but if not I could try lasso regression. For a nonlinear model maybe I could do random forest. How would the residuals from a linear vs nonlinear model be a diagnostic in and of itself?

Use a train/test split and k-fold cross-validation.
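A minimal sketch of ridge regression with a held-out test set and 5-fold cross-validation, written in Python with NumPy only so the mechanics are visible. In practice a library implementation would be the usual choice. The features (stories, units, floor area), the coefficients and the noise level are all synthetic assumptions, not DOF data, and alpha = 1.0 is an arbitrary penalty.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge on standardized features; the intercept is not penalized."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Z = (X - mu) / sd
    y_mean = y.mean()
    coef = np.linalg.solve(Z.T @ Z + alpha * np.eye(Z.shape[1]), Z.T @ (y - y_mean))
    return mu, sd, coef, y_mean

def ridge_predict(params, X):
    mu, sd, coef, y_mean = params
    return ((X - mu) / sd) @ coef + y_mean

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

rng = np.random.default_rng(0)
n = 200
# Synthetic stand-ins for stories, units and floor area.
X = np.column_stack([
    rng.integers(2, 30, n),
    rng.integers(4, 200, n),
    rng.uniform(5_000, 200_000, n),
]).astype(float)
# Synthetic target: linear in the features plus noise.
y = 20_000 * X[:, 0] + 5_000 * X[:, 1] + 30 * X[:, 2] + rng.normal(0, 50_000, n)

# Hold out 20% of the rows as a test set.
idx = rng.permutation(n)
test_idx, train_idx = idx[:40], idx[40:]

# 5-fold cross-validation on the training rows only.
folds = np.array_split(train_idx, 5)
cv_scores = []
for k in range(5):
    val = folds[k]
    fit = np.concatenate([folds[j] for j in range(5) if j != k])
    params = ridge_fit(X[fit], y[fit], alpha=1.0)
    cv_scores.append(r2(y[val], ridge_predict(params, X[val])))

# Refit on all training rows and score once on the held-out test set.
final = ridge_fit(X[train_idx], y[train_idx], alpha=1.0)
test_r2 = r2(y[test_idx], ridge_predict(final, X[test_idx]))
print(np.mean(cv_scores), test_r2)
```

The test rows are never touched during cross-validation, so the final score is a check on the whole procedure rather than a number the model was tuned toward.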

Residual Analysis

Analyze residuals across building types.

Visualize anomalies - what we’re expecting to see is greater residuals for wealthy and poor buildings w
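A small sketch of the residual comparison across building types, in Python with pandas. The actual and predicted values are invented for illustration, and grouping by the first letter of BLDGCL is an assumption about how building types might be bucketed.

```python
import pandas as pd

# Invented actual and predicted assessed values for six lots.
df = pd.DataFrame({
    "BLDGCL": ["C6", "C6", "D4", "D4", "R4", "R4"],
    "actual": [450000, 500000, 1300000, 1250000, 160000, 150000],
    "predicted": [430000, 520000, 1400000, 1350000, 158000, 151000],
})

# Positive residual: assessed higher than the model expects.
df["residual"] = df["actual"] - df["predicted"]

# Scale by the prediction so large and small buildings are comparable.
df["pct_residual"] = df["residual"] / df["predicted"]

# Bucket by the first letter of the building class.
df["class_group"] = df["BLDGCL"].str[0]

summary = df.groupby("class_group")["pct_residual"].agg(["mean", "median", "count"])
print(summary)
```

A group whose mean percentage residual sits consistently below zero is assessed lower than the model expects for its characteristics, which is the pattern to look for when comparing poor, mid and wealthy buildings.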

Statistical Validity

How sound are the research design and methods

Is the sample of observations selected for the test reflective of the population? Are the values of the independent variables not dependent on each other? Are there significant confounding or exogenous factors influencing the dependent variable that need to be controlled for in the model?

Internal Validity

Do the data sets and variables accurately represent the phenomena being explored

External validity

Can the results of the study be generalized

Limitations

Potential concerns: If we train the model on historical data and that data already contains the unfairness (richer buildings not paying their fair share), then the model would learn to predict relatively low assessments for higher-end buildings, and their residuals would look normal. A nonlinear model may capture the unfairness anomaly as an expected part of the model. If we use a linear model, and the actual tax assessment methodology is linear, then we should avoid this problem. If we can’t use differences in the residuals to identify the tax anomaly, then we may have to find other aspects of the model to look for taxation anomalies. [Later, go through and standardize the nomenclature]

Model Selection

How did the type of relationships among the variables or the end result influence which statistical or machine learning model was most appropriate?

Consider the scikit-learn algorithm cheat-sheet OR https://www.analyticssteps.com/blogs/5-statistical-data-analysis-techniques-statistical-modelling-machine-learning

Main classifications are: regression / classification / clustering / dimensionality reduction

Model Fit

Discuss my work in relationship to overfitting and underfitting

need to discuss feature importance and partial dependence plots - what other visuals can I interpret?

Data Analysis

Discussion of Results:

One concern is that the model might learn the systemic bias and so maybe that’s another way to look at this. If the model is showing even residuals between poor, mid and rich buildings… I guess that would indicate it’s more systemic unfairness than sophisticated owners being able to appeal.

Also, it’s easy to see with linear regression, but if we have some low-interpretability nonlinear model then I’m not sure what conclusions I’ll be able to draw.

Conclusion:

Final Presentation

Include a link to the youtube presentation ***

The city’s ordinary property owners, from the most vulnerable to the highly educated, don’t even know they’re being overtaxed because the harm is so abstract and technical.

write an article for The Co-operator ***

Include a link to the data repository (GitHub or elsewhere), or where on the NYC websites I can find the data. It should already be processed. Maybe I can have links to other RPubs documents where the data scrubbing is visible ***

Simple website where you enter your building’s Block and Lot and are able to see a Tax Anomaly-ometer where:

Green - fine/under
Yellow - fine
Orange - overpaying
Red - definitely appeal

Maybe the website can also produce a small report that a board member can bring back to their board to discuss.

Share as a marketing tool for Daisy, our property management company.
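A sketch of how the Tax Anomaly-ometer colours could be assigned, in Python. The function name anomaly_color and the cut-offs (0.95, 1.05, 1.20) are hypothetical placeholders; real thresholds would have to come from the spread of the model’s residuals.

```python
def anomaly_color(actual, predicted, bands=(0.95, 1.05, 1.20)):
    """Map actual / predicted assessed value to a colour band.

    The default cut-offs are placeholders, not calibrated thresholds.
    """
    if predicted <= 0:
        raise ValueError("predicted value must be positive")
    ratio = actual / predicted
    low, mid, high = bands
    if ratio < low:
        return "green"   # fine/under
    if ratio < mid:
        return "yellow"  # fine
    if ratio < high:
        return "orange"  # overpaying
    return "red"         # definitely appeal


for actual in (90, 100, 110, 130):
    print(actual, anomaly_color(actual, 100))
```

Using the ratio of actual to predicted value keeps the bands comparable across small and large buildings, and the bands argument makes it easy to swap in thresholds once there is evidence for them.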

make the final presentation off of slides and not the final document. ***

CUNY MSDS Capstone Project

Taxation Equity for NYC Co-ops

PK O’Flaherty

Stage 7 - wrote research question

2025-06-02