Estimating NY State Renter Occupancy Rates by Tract Using PUMA-Level Models

Intro

  • The purpose of this analysis is to see how accurately we can predict renter occupancy (Rentocc) rates at the tract level using models fit to PUMA data.
  • We will produce a separate model for each PUMA, and will use each PUMA's model to estimate the renter occupancy of every tract contained in that PUMA.
  • Technically, we have renter occupancy data available at the tract level.
  • We will make our predictions as though renter occupancy data were unavailable at the tract level.
  • We will then compare predicted renter occupancy to the true renter occupancy rate.

This is possible because:

  • We have PUMA and tract data that represent the same populations.

  • Every variable needed to predict renter occupancy exists in both the PUMA and tract data.
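
The workflow described in this intro can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the PUMA names, the tract records, the single predictor share_under_35, and the stand-in model functions are all hypothetical and do not come from the analysis itself. The point is the shape of the process: look up the model for a tract's PUMA, predict without looking at the tract's own renter occupancy, then compare against the held-out actual value.

```python
# Hypothetical per-PUMA models. In the real analysis each would be a fitted
# regression; here simple stand-in functions map predictors to a rate.
puma_models = {
    "PUMA_A": lambda tract: 0.30 + 0.40 * tract["share_under_35"],
    "PUMA_B": lambda tract: 0.50 + 0.20 * tract["share_under_35"],
}

# Made-up tract records. "actual_rentocc" is held out during prediction.
tracts = [
    {"tract": "T1", "puma": "PUMA_A", "share_under_35": 0.25, "actual_rentocc": 0.45},
    {"tract": "T2", "puma": "PUMA_A", "share_under_35": 0.50, "actual_rentocc": 0.48},
    {"tract": "T3", "puma": "PUMA_B", "share_under_35": 0.40, "actual_rentocc": 0.60},
]


def predict_tracts(tracts, puma_models):
    """Predict each tract with its PUMA's model, then compare to the actual rate."""
    results = []
    for t in tracts:
        model = puma_models[t["puma"]]
        # Drop the held-out outcome so the model cannot see it.
        predictors = {k: v for k, v in t.items() if k != "actual_rentocc"}
        predicted = model(predictors)
        results.append({
            "tract": t["tract"],
            "predicted": predicted,
            "actual": t["actual_rentocc"],
            "error": predicted - t["actual_rentocc"],  # signed error
        })
    return results


results = predict_tracts(tracts, puma_models)
for row in results:
    print(row)
```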

Maps

  • Map of Actual Renter Occupancy shows Renter Occupancy rates as recorded in ACS tract data.
  • Map of Predicted Renter Occupancy shows Renter Occupancy rates as predicted by the model built on PUMA data.

Actual vs Predicted

Use the slider at the center of the map to alternate between actual & predicted values.

  • Left: Actual Renter Occupancy Rate
  • Right: Predicted Renter Occupancy Rate

Actual Renter Occupancy

Predicted Renter Occupancy

Regression Model

The information below describes the model built from PUMA data to predict renter occupancy at the tract level. The model is displayed, and the amount of error in its predictions is described through visualization and summary.

  • PUMA microdata is used to model the relationship between renter occupancy and the race, age, and income of the head of household.

  • Tract data (with all renter occupancy data withheld) is then run through this model, and renter occupancy is estimated for each tract.
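
As a rough illustration of the two steps above, the sketch below applies a logistic model to household groups within a tract and averages the result into a tract-level rate. Everything specific here is an assumption: the coefficient values are invented, only two binary predictors (age_35_plus, income_high) are shown, and the tract is assumed to be available as household counts per predictor combination. A race indicator would enter the linear predictor in the same way.

```python
import math

# Hypothetical coefficients on the log-odds scale. These are NOT the fitted
# values from the report; they only illustrate the mechanics.
COEFS = {
    "intercept": 0.8,
    "age_35_plus": -0.9,
    "income_high": -1.2,
}


def renter_probability(age_35_plus, income_high):
    """Logistic model: probability that a household with these traits rents."""
    z = (COEFS["intercept"]
         + COEFS["age_35_plus"] * age_35_plus
         + COEFS["income_high"] * income_high)
    return 1.0 / (1.0 + math.exp(-z))


def tract_rentocc(cells):
    """Estimate a tract's renter occupancy rate.

    cells: list of (household_count, age_35_plus, income_high) tuples,
    one per predictor combination (an assumed layout for tract data).
    """
    total = sum(n for n, _, _ in cells)
    expected_renters = sum(n * renter_probability(a, i) for n, a, i in cells)
    return expected_renters / total


# Made-up tract: household counts by (age, income) group.
example_tract = [(120, 0, 0), (200, 1, 0), (80, 0, 1), (300, 1, 1)]
estimate = tract_rentocc(example_tract)
print(f"Estimated renter occupancy: {estimate:.3f}")
```

The tract estimate is the household-weighted average of the predicted probabilities, so it always falls between the lowest and highest group probability.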

Model:

Model Accuracy

The logistic regression models predict renter occupancy rates from the race, age, and income of the head of household. Alternative modelling methods (logistic mixed-effects models, spatial microsimulation) may improve accuracy and reduce error. Different and/or additional predictor variables may also improve predictions.

  • Statewide, 46% of households are renter occupied (per tract data).
  • On average, predictions are within +/-9% of the true Renter Occupancy Rate.
  • 90% of predictions are within +/-18% of the true Renter Occupancy Rate.
  • Minimum Error is 0% (Perfect Prediction)
  • Maximum Error is 90%
  • Error is approximately Normally Distributed
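
Summary figures like the ones above can be computed from paired actual and predicted rates. The sketch below is one plausible reading, not necessarily how the report's numbers were produced: it assumes "on average within" means mean absolute error and that the 90% figure is the 90th percentile of absolute error (nearest-rank method). The data are made up.

```python
def summarize_errors(actual, predicted):
    """Summarize absolute prediction error (rates expressed as fractions)."""
    abs_err = sorted(abs(p - a) for a, p in zip(actual, predicted))
    n = len(abs_err)
    # Nearest-rank 90th percentile: smallest value with at least 90% of
    # errors at or below it. Integer arithmetic computes ceil(0.9 * n).
    rank = (9 * n + 9) // 10
    return {
        "mean_abs_error": sum(abs_err) / n,
        "p90_abs_error": abs_err[rank - 1],
        "min_abs_error": abs_err[0],
        "max_abs_error": abs_err[-1],
    }


# Made-up tract-level rates for illustration only.
actual = [0.10, 0.20, 0.35, 0.46, 0.50, 0.62, 0.70, 0.81, 0.90, 0.95]
predicted = [0.18, 0.20, 0.40, 0.44, 0.55, 0.60, 0.61, 0.75, 0.78, 0.70]

summary = summarize_errors(actual, predicted)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```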

Histogram of Prediction Errors

Viz: Predicted vs Actual Correlation

There is a linear relationship between our predictions and the actual values, which is a good sign.
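
One way to put a number on this relationship is the Pearson correlation between actual and predicted rates, together with the slope of a least-squares line of predicted on actual. The sketch below uses made-up values and is not taken from the report. A slope below 1 would mean predictions are compressed toward the middle of the range even when the correlation is high.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ols_slope(x, y):
    """Slope of the least-squares line of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Made-up rates for illustration only.
actual = [0.10, 0.30, 0.50, 0.70, 0.90]
predicted = [0.22, 0.35, 0.48, 0.62, 0.75]

r = pearson_r(actual, predicted)
slope = ols_slope(actual, predicted)
print(f"r = {r:.3f}, slope = {slope:.3f}")
```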

Viz: Overestimating Low Rentocc, Underestimating High Rentocc

Error is not randomly distributed across the range of renter occupancy; its direction depends on the actual rate.

  • This model is likely to over-estimate renter occupancy in those tracts which have very low renter occupancy.

  • This model is likely to under-estimate renter occupancy in those tracts which have very high renter occupancy.
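
A simple check for this pattern is to group tracts by their actual renter occupancy and look at the mean signed error (predicted minus actual) in each group. The sketch below is illustrative only: the cut-offs of 0.2 and 0.8 for "very low" and "very high" are assumptions, and the data are made up. A positive mean in the low group and a negative mean in the high group would match the pattern described above.

```python
def mean_signed_error_by_bin(actual, predicted, low_cut=0.2, high_cut=0.8):
    """Mean of (predicted - actual) for low, middle, and high actual rates."""
    bins = {"low": [], "middle": [], "high": []}
    for a, p in zip(actual, predicted):
        if a < low_cut:
            label = "low"
        elif a > high_cut:
            label = "high"
        else:
            label = "middle"
        bins[label].append(p - a)
    # None marks a bin with no tracts in it.
    return {
        label: (sum(errs) / len(errs) if errs else None)
        for label, errs in bins.items()
    }


# Made-up rates for illustration only.
actual = [0.05, 0.10, 0.15, 0.40, 0.50, 0.60, 0.85, 0.90, 0.95]
predicted = [0.15, 0.18, 0.20, 0.42, 0.49, 0.58, 0.75, 0.78, 0.80]

bias = mean_signed_error_by_bin(actual, predicted)
for label, value in bias.items():
    print(f"{label}: {value:+.3f}")
```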