Predicting Airbnb Prices in NYC

ECON 465 – Final Project

Cemre Nur Hascan

2026-06-04

Economic Motivation

Why Do Short-Term Rental Prices Matter?

Housing and rental markets are critical components of urban economics.
Pricing behavior reveals how consumers financially value location and privacy.
Identifying these price drivers helps explain real estate valuation in highly competitive markets like New York City.

The Economic Question

Core Research Question:

“How do structural features, locational characteristics, and listing engagement predict the short-term rental price of an Airbnb property in New York City?”

Objectives:
- Compare basic engagement metrics against structural and locational factors.
- Determine which variable is associated with the largest price premium.

Dataset Description

New York City Airbnb Open Data (2019)

Source: Kaggle
Observations: Listings with invalid or zero prices were removed before analysis.
Target Variable: price (Continuous, USD)
Key Predictors:
- Structural: room_type (Entire home/apt, Private room, Shared room)
- Locational: neighbourhood_group (Manhattan, Brooklyn, Queens, etc.)
- Engagement: availability_365, number_of_reviews

Finding from Stage 1: Probability Analysis

The Distribution of Rental Prices

Observation: The original price variable was heavily right-skewed. Most properties are affordable, with rare extreme luxury outliers.
Action Taken: Applied a Log-Transformation (log_price).
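Result: The log-transformed prices are approximately normally distributed, which means the original prices are approximately log-normal.
Implication: The transformation reduces the influence of extreme outliers and helps the regression residuals look closer to normal with more stable variance. Linear regression places its normality assumption on the errors, not on the raw outcome, so the transformation is helpful rather than strictly required.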
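As a minimal sketch of this step, the snippet below filters out zero prices, applies the log-transformation, and compares skewness before and after. The prices are synthetic log-normal draws standing in for the Kaggle `price` column, and the helper `skewness` and the distribution parameters are assumptions made only for illustration.

```python
import math
import random
import statistics


def skewness(values):
    """Skewness as the mean cubed z-score (population form)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Synthetic stand-in for the `price` column (NOT the Kaggle data):
# log-normal draws plus a few zero prices to mimic invalid rows.
rng = random.Random(465)
raw_prices = [rng.lognormvariate(4.7, 0.7) for _ in range(5000)] + [0.0] * 10

# Drop invalid/zero prices first, since log(0) is undefined.
prices = [p for p in raw_prices if p > 0]
log_prices = [math.log(p) for p in prices]

skew_raw = skewness(prices)
skew_log = skewness(log_prices)
print(f"skewness of price:     {skew_raw:.2f}")
print(f"skewness of log_price: {skew_log:.2f}")
```

A skewness near zero for `log_price` is consistent with an approximately symmetric distribution, while a large positive value for `price` reflects the long right tail.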

Models Built and Compared

Model 1: Baseline Regression
- Method: Linear regression on two engagement predictors
- Predictors: availability_365 + number_of_reviews
- Focus: Can we predict price based on listing engagement and availability alone?

Model 2: Comprehensive Regression
- Method: Multiple Linear Regression
- Predictors: availability_365 + number_of_reviews + room_type + neighbourhood_group
- Focus: Incorporating core structural and spatial economic drivers.
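The two specifications can be sketched as follows. This is an illustration on synthetic data, not the project's actual code or results: the effect sizes are invented, the target is assumed to be `log_price`, and the helpers `dummies` and `fit_ols` are hypothetical names. Categorical predictors are dummy-coded against a reference level ("Private room" and "Brooklyn" here, chosen arbitrarily).

```python
import numpy as np

rng = np.random.default_rng(465)
n = 2000

# Synthetic stand-in data (NOT the Kaggle file); effect sizes are made up.
availability_365 = rng.integers(0, 366, n)
number_of_reviews = rng.poisson(20, n)
room_type = rng.choice(["Entire home/apt", "Private room", "Shared room"], n)
neighbourhood_group = rng.choice(["Manhattan", "Brooklyn", "Queens"], n)
log_price = (
    4.2
    + 0.0005 * availability_365
    - 0.001 * number_of_reviews
    + 0.7 * (room_type == "Entire home/apt")
    - 0.4 * (room_type == "Shared room")
    + 0.4 * (neighbourhood_group == "Manhattan")
    - 0.15 * (neighbourhood_group == "Queens")
    + rng.normal(0, 0.4, n)
)


def dummies(values, baseline):
    """One 0/1 column per level, omitting the baseline (reference) level."""
    levels = [lv for lv in np.unique(values) if lv != baseline]
    return np.column_stack([(values == lv).astype(float) for lv in levels])


def fit_ols(X, y):
    """Least-squares fit with intercept; returns (coefficients, R^2, RMSE)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    rmse = np.sqrt(np.mean(resid ** 2))
    return beta, r2, rmse


# Model 1: engagement predictors only.
X_m1 = np.column_stack([availability_365, number_of_reviews]).astype(float)
# Model 2: engagement plus dummy-coded room type and borough.
X_m2 = np.column_stack([
    X_m1,
    dummies(room_type, baseline="Private room"),
    dummies(neighbourhood_group, baseline="Brooklyn"),
])

_, r2_m1, rmse_m1 = fit_ols(X_m1, log_price)
_, r2_m2, rmse_m2 = fit_ols(X_m2, log_price)
print(f"Model 1: R^2 = {r2_m1:.3f}, RMSE = {rmse_m1:.3f}")
print(f"Model 2: R^2 = {r2_m2:.3f}, RMSE = {rmse_m2:.3f}")
```

Both metrics here are in-sample and on the log scale. Because Model 1 is nested in Model 2, in-sample R-squared cannot decrease when predictors are added, so an out-of-sample comparison is the more informative check.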

Why I Chose the Final Model (Model 2)

Model 2 outperformed the baseline for three reasons:

Better Statistical Performance: Adding location and room type lowered the RMSE and raised R-squared substantially.
Economic Logic: Real estate pricing is fundamentally driven by space and location, not just booking frequency.
Model Stability: 5-fold cross-validation produced out-of-sample errors close to the in-sample error, suggesting that the model generalizes to unseen listings and is not noticeably overfit.
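The cross-validation check can be sketched as below. This is a hand-rolled 5-fold loop on synthetic data, shown only to make the procedure concrete: the function `kfold_rmse`, the three generic predictors, and the coefficients are assumptions, not the project's actual setup.

```python
import numpy as np


def kfold_rmse(X, y, k=5, seed=0):
    """Return the held-out RMSE of an OLS fit for each of k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    X1 = np.column_stack([np.ones(len(y)), X])  # add intercept column
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit on the k-1 training folds only.
        beta, *_ = np.linalg.lstsq(X1[train], y[train], rcond=None)
        # Score on the fold that was held out.
        resid = y[test] - X1[test] @ beta
        scores.append(float(np.sqrt(np.mean(resid ** 2))))
    return scores


# Synthetic stand-in data (NOT the Kaggle file).
rng = np.random.default_rng(465)
n = 1000
X = rng.normal(size=(n, 3))
y = 4.5 + X @ np.array([0.5, -0.3, 0.2]) + rng.normal(0, 0.4, n)

cv_scores = kfold_rmse(X, y, k=5)

# In-sample RMSE on the full data, for comparison.
X1_full = np.column_stack([np.ones(n), X])
beta_full, *_ = np.linalg.lstsq(X1_full, y, rcond=None)
train_rmse = float(np.sqrt(np.mean((y - X1_full @ beta_full) ** 2)))

print("fold RMSEs:", [round(s, 3) for s in cv_scores])
print(f"mean CV RMSE = {np.mean(cv_scores):.3f}, in-sample RMSE = {train_rmse:.3f}")
```

A mean held-out RMSE close to the in-sample RMSE, with little spread across folds, is the pattern that supports the stability claim. A large gap would point toward overfitting.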

Main Results & Economic Interpretation

1. The Location Premium
- Listings located in Manhattan command a significantly higher baseline price compared to Brooklyn or Queens.
- Interpretation: Highlights spatial inequality and severe demand in the city center.

2. The Privacy Premium
- Moving from a “Private room” to an “Entire home/apt” drastically increases the expected price.
- Interpretation: Tourists pay a heavy premium to avoid shared spaces. Structural factors dominate engagement metrics.
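Assuming the models are fitted on log_price, a dummy-variable coefficient b translates into a percentage premium of (exp(b) - 1) * 100 relative to the reference category. The sketch below shows that conversion. The function name `percent_premium` and the example coefficients are hypothetical and are not the estimates from this project.

```python
import math


def percent_premium(beta):
    """Percent change in price implied by a coefficient in a log-price model."""
    return (math.exp(beta) - 1.0) * 100.0


# Hypothetical coefficients, for illustration only.
for beta in (0.10, 0.40, 0.70):
    print(f"coefficient {beta:.2f} -> {percent_premium(beta):.1f}% premium")
```

For small coefficients the premium is close to 100 times the coefficient, but the gap widens as the coefficient grows, so large premiums should be reported with the exact formula.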

Limitations

Omitted Variable Bias:
- The dataset lacks a physical property size variable (square footage). A high price might simply reflect a larger property, so the room_type and location coefficients may absorb part of the size effect.
Temporal Relevance:
- The data is from 2019 (pre-COVID-19). It does not capture recent inflation, shifts in tourism, or remote-work housing demands.

Future Improvements & Conclusion

Future Improvements:
- Incorporate proximity to public transit (e.g., distance to nearest subway station) as an additional spatial predictor.

Future Research Question:
- Do highly reviewed places charge a premium, or do they receive many reviews because they are affordably priced?

Conclusion:
- Structural space and spatial location are far stronger predictors of urban rental pricing than standard engagement metrics.