ECON 465 Stage 3 Final Report
Economic Question
In this project, I wanted to understand what affects Airbnb prices in New York City. Listing prices vary widely, and I wanted to see whether factors such as reviews, availability, and host activity help explain these differences.
My research question is:
What factors affect Airbnb listing prices in NYC?
Data
For this analysis, I used the NYC Airbnb dataset from Kaggle. First, I imported the data into R and cleaned it by standardizing the variable names and removing rows with missing values.
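The two cleaning steps can be sketched in base R as follows. This is an illustration, not the report's actual code: the small data frame and its column names are hypothetical, and the helper clean_names_basic() is a made-up approximation of what janitor::clean_names() does (lowercase names, non-alphanumeric characters replaced by underscores).

```r
# Hypothetical raw data standing in for the Kaggle CSV (not real listings)
raw <- data.frame(
  "Price" = c(150, 89, NA, 225),
  "Minimum Nights" = c(1, 3, 2, NA),
  "Reviews per Month" = c(0.5, NA, 1.2, 2.0),
  check.names = FALSE
)

# Hypothetical helper approximating janitor::clean_names():
# lowercase, runs of non-alphanumerics -> "_", trim leading/trailing "_"
clean_names_basic <- function(x) {
  nm <- tolower(names(x))
  nm <- gsub("[^a-z0-9]+", "_", nm)
  nm <- gsub("^_|_$", "", nm)
  names(x) <- nm
  x
}

listings <- clean_names_basic(raw)

# Keep only rows with no missing value in any column
listings <- listings[complete.cases(listings), ]

print(names(listings))
cat("rows before:", nrow(raw), " rows after:", nrow(listings), "\n")
```

In this toy example only one of four rows survives, which shows why it is worth comparing the row count before and after dropping missing values.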
The main variable I am trying to predict is price.
The explanatory variables used in the models are:
- minimum_nights
- number_of_reviews
- availability_365
- calculated_host_listings_count
- reviews_per_month
To evaluate the models on data they were not fitted to, I split the data into a training set (80%) and a test set (20%), using a random seed of 465.
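A seeded 80/20 split can be sketched in base R as below. The data frame is a hypothetical stand-in for the cleaned listings; in the tidymodels workflow, rsample's initial_split(listings, prop = 0.8) followed by training() and testing() plays the same role, so this is an illustration of the idea rather than the report's code.

```r
# Hypothetical stand-in for the cleaned data: 100 rows with an id column
listings <- data.frame(id = 1:100, price = seq(50, 545, by = 5))

set.seed(465)  # fix the random number generator so the split is repeatable
n <- nrow(listings)
train_idx <- sample(seq_len(n), size = floor(0.8 * n))  # 80% of row indices

train <- listings[train_idx, ]
test  <- listings[-train_idx, ]  # the remaining 20%

cat("training rows:", nrow(train), " test rows:", nrow(test), "\n")
```

Re-running set.seed(465) before the same sample() call reproduces exactly the same training rows, which is what makes the split reproducible.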
Probability Analysis
When I looked at the distribution of Airbnb prices, I noticed that it was strongly right-skewed. Most listings had moderate prices, but there were some very expensive listings.
After a log transformation of price, the distribution became more symmetric and closer to a normal shape. This suggests that Airbnb prices may approximately follow a log-normal distribution.
This analysis shows that very expensive Airbnb listings exist, but they are much less common than average-priced listings.
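The effect of the log transformation can be checked numerically with a skewness statistic. The sketch below uses simulated log-normal "prices" (the meanlog and sdlog values are arbitrary assumptions, not estimates from the Kaggle data) and a hypothetical hand-written skewness() helper based on the average cubed z-score.

```r
# Simulated right-skewed "prices" (hypothetical; not the Kaggle data)
set.seed(465)
price <- rlnorm(10000, meanlog = 4.7, sdlog = 0.7)

# Hypothetical helper: moment-based sample skewness (mean of cubed z-scores)
skewness <- function(x) {
  z <- (x - mean(x)) / sd(x)
  mean(z^3)
}

skew_raw <- skewness(price)
skew_log <- skewness(log(price))
print(c(raw = skew_raw, log = skew_log))
```

A large positive value for raw prices and a value near zero after logging matches the pattern described above. If any listing has a price of 0, log(price) returns -Inf, so such rows would have to be removed (or log1p() used) before this check.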
Modeling
I built two linear regression models with price as the outcome variable.
Model 1 included:
- minimum_nights
- number_of_reviews
- calculated_host_listings_count
Model 2 included all variables from Model 1 and added:
- availability_365
- reviews_per_month
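The two nested specifications can be sketched with lm() on simulated data that reuses the report's column names. The values and the "true" price equation below are made up purely so the code runs; with tidymodels, the same formulas could be passed to linear_reg() with set_engine("lm").

```r
# Simulated listings with the report's column names (values are made up)
set.seed(465)
n <- 500
train <- data.frame(
  minimum_nights = sample(1:30, n, replace = TRUE),
  number_of_reviews = rpois(n, 20),
  calculated_host_listings_count = sample(1:10, n, replace = TRUE),
  availability_365 = sample(0:365, n, replace = TRUE),
  reviews_per_month = round(runif(n, 0, 5), 2)
)
# Hypothetical data-generating process, only for illustration
train$price <- 100 + 0.2 * train$minimum_nights -
  2 * train$reviews_per_month + rnorm(n, sd = 50)

# Model 1: three predictors
model_1 <- lm(price ~ minimum_nights + number_of_reviews +
                calculated_host_listings_count, data = train)

# Model 2: Model 1 plus availability_365 and reviews_per_month
model_2 <- update(model_1, . ~ . + availability_365 + reviews_per_month)

print(coef(model_2))
```

Because Model 1 is nested in Model 2, Model 2's in-sample R-squared can never be lower than Model 1's; that is why comparing the models on held-out data is the more informative test.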
I compared the models using RMSE and R-squared on the test data. I also used 5-fold cross-validation to check whether the models performed consistently across different subsets of the data.
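The metrics and the cross-validation loop can be sketched in base R. The helpers rmse_basic() and rsq_trad_basic() are hypothetical names for hand-written versions of the formulas, and the one-predictor data set is simulated. In tidymodels, yardstick's rmse() computes the same RMSE, but its rsq() is the squared correlation between truth and prediction, while rsq_trad() is the traditional 1 - SSE/SST; the two can differ on test data.

```r
# Hypothetical helper: root mean squared error
rmse_basic <- function(truth, estimate) sqrt(mean((truth - estimate)^2))

# Hypothetical helper: traditional R-squared, 1 - SSE/SST
rsq_trad_basic <- function(truth, estimate) {
  1 - sum((truth - estimate)^2) / sum((truth - mean(truth))^2)
}

# Simulated training data (hypothetical)
set.seed(465)
n <- 200
train <- data.frame(x = runif(n, 0, 10))
train$price <- 50 + 3 * train$x + rnorm(n, sd = 5)

# 5-fold CV: randomly assign each row to one of 5 equal-sized folds
folds <- sample(rep(1:5, length.out = n))

# Fit on 4 folds, compute RMSE on the held-out fold, repeat for each fold
cv_rmse <- sapply(1:5, function(k) {
  fit <- lm(price ~ x, data = train[folds != k, ])
  held_out <- train[folds == k, ]
  rmse_basic(held_out$price, predict(fit, newdata = held_out))
})

print(round(cv_rmse, 2))
cat("mean CV RMSE:", mean(cv_rmse), "\n")
```

Similar RMSE values across the five folds indicate that performance does not hinge on one particular subset of the data; in tidymodels, vfold_cv(train, v = 5) with fit_resamples() produces the same kind of per-fold summary.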
Results
Model 2 achieved better predictive performance than Model 1, which suggests that the two additional variables carry some information about price.
In Model 2, all explanatory variables are statistically significant.
The estimated coefficients (the change in price associated with a one-unit increase in each variable, holding the others constant) are:
- Minimum nights: +0.171
- Number of reviews: -0.151
- Availability 365: +0.119
- Calculated host listings count: +0.250
- Reviews per month: -2.453
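To make the coefficients concrete, the predicted price difference between two listings is the sum of each coefficient times the difference in that variable. The sketch below uses the Model 2 coefficients reported above and a hypothetical pair of listings: listing B has 30 more available days per year and 2 more reviews per month than listing A, with everything else equal.

```r
# Model 2 coefficients as reported (price change per one-unit increase)
coefs <- c(
  minimum_nights = 0.171,
  number_of_reviews = -0.151,
  availability_365 = 0.119,
  calculated_host_listings_count = 0.250,
  reviews_per_month = -2.453
)

# Hypothetical difference between listing B and listing A
delta <- c(
  minimum_nights = 0,
  number_of_reviews = 0,
  availability_365 = 30,
  calculated_host_listings_count = 0,
  reviews_per_month = 2
)

# Predicted price gap: 30 * 0.119 + 2 * (-2.453)
predicted_gap <- sum(coefs * delta)
print(predicted_gap)  # -1.336
```

The two effects nearly offset each other, leaving a predicted gap of about 1.34 in price units, which illustrates how small these associations are.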
Model 2 is statistically significant overall (F-test, p < 0.001).
However, its explanatory power is limited: the R-squared is approximately 0.01, which means the included variables explain only about 1% of the variation in Airbnb prices.
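A model can be highly significant and still explain very little, because p-values shrink as the sample grows while R-squared does not. The simulation below illustrates this with a weak true effect and a large sample (n = 40,000 is an arbitrary choice, not the size of the report's data).

```r
set.seed(465)
n <- 40000
x <- rnorm(n)
y <- 0.1 * x + rnorm(n)  # weak true effect buried in noise

fit <- summary(lm(y ~ x))
r2 <- fit$r.squared
p_x <- fit$coefficients["x", "Pr(>|t|)"]

print(c(r_squared = r2, p_value = p_x))
```

The slope is significant far below the 0.001 level, yet R-squared stays near 0.01, the same combination reported for Model 2.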
Economic Interpretation
The results suggest that Airbnb prices are related to listing characteristics, although the relationships are weak.
Listings with higher minimum night requirements tend to have slightly higher prices. Listings that are available for more days during the year also tend to charge higher prices.
I found a negative relationship between reviews and price, both for the number of reviews and for reviews per month. One possible reason is that lower-priced listings are booked more often and therefore accumulate more reviews.
Hosts with a larger number of listings tend to charge slightly higher prices, which may reflect greater experience in the Airbnb market.
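With a large sample, even weak relationships can be statistically significant, so significance alone says little about how much of the price variation a model explains. Even though the model is statistically significant, the low R-squared suggests that factors such as location, room type, neighborhood quality, and amenities are probably much more important in determining Airbnb prices.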
Limitations and Reproducibility
Limitations
The model does not include information about location quality, amenities, or room type.
The analysis shows relationships between variables but does not prove causation.
Reproducibility
To make the analysis reproducible:
- I used relative file paths.
- I used set.seed(465) before splitting the data.
- All code and analysis were included in one Quarto document.
- The analysis was completed using the tidyverse, janitor, tidymodels, rsample, and yardstick packages.
AI Use Log
AI Tool Used
ChatGPT
Prompt
“How can I calculate RMSE and R-squared in R using tidymodels?”
How I Used the Output
I used the explanation as a guide while building and evaluating my regression models.
Verification
I tested all code myself in RStudio and made changes when necessary.
Final Reflections
One Improvement
If I had more time, I would include variables related to neighborhood, room type, and amenities because these factors probably have a strong impact on Airbnb prices.
Future Research Question
A question I would like to study in the future is:
How does neighborhood location influence Airbnb prices in different cities?
Conclusion
In this project, I analyzed factors affecting Airbnb prices in New York City using regression analysis. The results showed that listing characteristics have some relationship with price, but the model explains only a small part of price differences. Including more detailed information about properties and locations would likely improve the analysis.