Predicting Airbnb Prices in NYC

ECON 465 – Final Project

Cemre Nur Hascan

2026-06-04

Economic Motivation

Why Do Short-Term Rental Prices Matter?

  • Housing and rental markets are critical components of urban economics.
  • Pricing behavior reveals how consumers financially value location and privacy.
  • Identifying these price drivers helps explain real estate valuation in highly competitive markets like New York City.

The Economic Question

Core Research Question:

“How do structural features, locational characteristics, and listing engagement predict the short-term rental price of an Airbnb property in New York City?”

Objectives:

  • Compare basic engagement metrics against hard structural economic factors.
  • Determine which variable creates the highest price premium.

Dataset Description

New York City Airbnb Open Data (2019)

  • Source: Kaggle
  • Observations: Filtered dataset removing invalid/zero prices.
  • Target Variable: price (Continuous, USD)
  • Key Predictors:
    • Structural: room_type (Entire home/apt, Private room, Shared room)
    • Locational: neighbourhood_group (Manhattan, Brooklyn, Queens, etc.)
    • Engagement: availability_365, number_of_reviews
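The price filter described above can be sketched as follows. This is a minimal illustration, not the project's actual cleaning script: the five rows are made up, the helper name drop_invalid_prices is hypothetical, and "invalid" is assumed to mean missing or non-positive. Only the column names come from the dataset description.

```python
import pandas as pd

# Made-up sample rows; only the column names follow the dataset description.
listings = pd.DataFrame({
    "price": [150, 0, 89, None, 1200],
    "room_type": ["Entire home/apt", "Private room", "Private room",
                  "Shared room", "Entire home/apt"],
    "neighbourhood_group": ["Manhattan", "Brooklyn", "Queens",
                            "Brooklyn", "Manhattan"],
})

def drop_invalid_prices(df):
    """Keep rows whose price is present and strictly positive (assumed rule)."""
    valid = df["price"].notna() & (df["price"] > 0)
    return df[valid].reset_index(drop=True)

clean = drop_invalid_prices(listings)
print(len(listings), "rows before,", len(clean), "rows after filtering")
```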

Finding from Stage 1: Probability Analysis

The Distribution of Rental Prices

  • Observation: The original price variable was heavily right-skewed. Most properties are affordable, with rare extreme luxury outliers.
  • Action Taken: Applied a Log-Transformation (log_price).
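  • Result: The log-transformed prices are approximately normally distributed, i.e. the original prices are approximately log-normal.
  • Implication: The transformation limits the influence of extreme outliers and makes the approximately normal, constant-variance residuals assumed by linear-regression inference more plausible. Linear regression does not require the target itself to be normal.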
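The effect of the transformation on skewness can be illustrated with a short sketch. The prices here are simulated from a log-normal distribution with arbitrarily chosen parameters, standing in for the real listings, and the skewness helper is written out by hand for the example.

```python
import numpy as np

def skewness(x):
    """Sample skewness: mean cubed z-score (about 0 for symmetric data)."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))

# Simulated stand-in for listing prices (NOT the Kaggle data).
# The parameters 4.7 and 0.7 are arbitrary assumptions for illustration.
rng = np.random.default_rng(42)
price = rng.lognormal(mean=4.7, sigma=0.7, size=5000)
log_price = np.log(price)

skew_raw = skewness(price)
skew_log = skewness(log_price)
print(f"skewness of price:     {skew_raw:.2f}")
print(f"skewness of log_price: {skew_log:.2f}")
```

A strongly positive value for price and a value near zero for log_price is the pattern the slide describes: a long right tail that disappears after taking logs.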

Models Built and Compared

Model 1: Baseline Regression

  • Method: Linear Regression (engagement predictors only)
  • Predictors: availability_365 + number_of_reviews
  • Focus: Can price be predicted from listing engagement and availability alone?

Model 2: Comprehensive Regression

  • Method: Multiple Linear Regression
  • Predictors: availability_365 + number_of_reviews + room_type + neighbourhood_group
  • Focus: Incorporating core structural and spatial economic drivers.
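The two specifications can be sketched as ordinary least squares fits. Everything numeric below is an assumption: the data are synthetic, the effect sizes are invented so the example has structure, and log_price is assumed to be the regression target. The fit_ols helper is hypothetical and uses NumPy's least-squares solver rather than whichever regression routine the project used.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-in for the cleaned listings (NOT the real Kaggle data).
df = pd.DataFrame({
    "availability_365": rng.integers(0, 366, n),
    "number_of_reviews": rng.poisson(20, n),
    "room_type": rng.choice(
        ["Entire home/apt", "Private room", "Shared room"], n),
    "neighbourhood_group": rng.choice(["Manhattan", "Brooklyn", "Queens"], n),
})
# Invented effect sizes, used only to generate illustrative prices.
room = df["room_type"].map(
    {"Entire home/apt": 0.8, "Private room": 0.0, "Shared room": -0.4})
boro = df["neighbourhood_group"].map(
    {"Manhattan": 0.4, "Brooklyn": 0.1, "Queens": 0.0})
df["log_price"] = 4.3 + room + boro + rng.normal(0, 0.4, n)

def fit_ols(X, y):
    """Fit OLS with an intercept; return (coefficients, R-squared)."""
    A = np.column_stack([np.ones(len(y)), X.to_numpy(dtype=float)])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return beta, 1 - resid.var() / y.var()

y = df["log_price"].to_numpy()
engagement = df[["availability_365", "number_of_reviews"]]
# One dummy per category level, dropping the first level as the baseline.
dummies = pd.get_dummies(
    df[["room_type", "neighbourhood_group"]], drop_first=True)

_, r2_model1 = fit_ols(engagement, y)
_, r2_model2 = fit_ols(pd.concat([engagement, dummies], axis=1), y)
print(f"Model 1 R-squared: {r2_model1:.3f}")
print(f"Model 2 R-squared: {r2_model2:.3f}")
```

Because the synthetic prices were generated from room type and borough only, the gap between the two R-squared values here is built in by construction and says nothing about the size of the gap in the real data.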

Why I Chose the Final Model (Model 2)

Model 2 outperformed the baseline for three reasons:

  1. Better Statistical Performance: Adding location and room type lowered the RMSE and raised R-squared substantially relative to Model 1.
  2. Economic Logic: Real estate pricing is fundamentally driven by space and location, not just booking frequency.
  3. Model Stability: 5-fold cross-validation indicated that the model generalizes well to unseen data and shows no sign of overfitting.
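The cross-validation check in point 3 can be sketched by hand. This is one possible implementation under stated assumptions, not the project's code: the kfold_rmse helper is hypothetical, the data are synthetic with invented effect sizes, and the two dummy variables stand in for the full room_type and neighbourhood_group encodings.

```python
import numpy as np

def kfold_rmse(X, y, k=5, seed=0):
    """Out-of-fold RMSE of an OLS fit (with intercept) for each of k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    A = np.column_stack([np.ones(len(y)), X])
    scores = []
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False          # hold this fold out of the fit
        beta, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
        err = y[test_idx] - A[test_idx] @ beta
        scores.append(float(np.sqrt(np.mean(err ** 2))))
    return scores

# Synthetic stand-in data (NOT the Kaggle listings); effects are invented.
rng = np.random.default_rng(1)
n = 1000
reviews = rng.poisson(20, n)
entire_home = rng.integers(0, 2, n)      # 1 = "Entire home/apt"
manhattan = rng.integers(0, 2, n)        # 1 = Manhattan
log_price = 4.3 + 0.8 * entire_home + 0.4 * manhattan + rng.normal(0, 0.4, n)

baseline = kfold_rmse(reviews.reshape(-1, 1), log_price)
full = kfold_rmse(np.column_stack([reviews, entire_home, manhattan]), log_price)
print("baseline mean RMSE:", round(float(np.mean(baseline)), 3))
print("full     mean RMSE:", round(float(np.mean(full)), 3))
```

Each fold's error is computed on rows the model never saw during fitting, so similar RMSE values across the five folds are what "generalizes well" means in practice.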

Main Results & Economic Interpretation

1. The Location Premium

  • Listings located in Manhattan command a significantly higher baseline price than those in Brooklyn or Queens, holding room type and engagement constant.
  • Interpretation: This highlights spatial inequality and concentrated demand in the city center.

2. The Privacy Premium

  • Moving from a “Private room” to an “Entire home/apt” substantially increases the expected price.
  • Interpretation: Guests pay a heavy premium to avoid shared spaces. Structural factors dominate engagement metrics.
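If the regressions are fit on log_price (an assumption: the slides list price as the target but also introduce log_price), a dummy-variable coefficient β translates into a percentage premium of exp(β) − 1 rather than a dollar amount. The sketch below shows the conversion; the helper name pct_premium and the coefficient value 0.5 are hypothetical and are not estimates from this project.

```python
import math

def pct_premium(beta):
    """Percent price difference implied by a dummy coefficient
    in a regression whose target is log(price)."""
    return (math.exp(beta) - 1) * 100

# Hypothetical coefficient, chosen only to illustrate the conversion.
example_beta = 0.5
print(f"{pct_premium(example_beta):.1f}% higher expected price")  # 64.9%
```

For small coefficients the premium is close to 100 × β, but the approximation understates larger effects, which is why the exponential form is used.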

Limitations

  1. Omitted Variable Bias:
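    • The dataset lacks a physical property size variable (square footage). Because size is likely correlated with both price and room type, part of the estimated room-type premium may simply reflect larger properties, which biases the coefficients rather than merely reducing precision.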
  2. Temporal Relevance:
    • The data is from 2019 (pre-COVID-19). It does not capture recent inflation, shifts in tourism, or remote-work housing demands.

Future Improvements & Conclusion

Future Improvements:

  • Incorporate proximity to public transit (e.g., distance to the nearest subway station) as an additional spatial predictor.

Future Research Question:

  • Do highly reviewed places charge a premium, or do they receive many reviews because they are affordably priced?

Conclusion:

  • Structural space and spatial location dominate standard engagement metrics in predicting urban rental pricing.