Boston Airbnb Listing Price Predictive Modeling

Edward Harvey & Jake Naughton

12/15/2021

Objectives

Our goal for this project is to use publicly available data on Boston-area Airbnb listings to help a first-time Airbnb host price a new listing accurately. Our project consists of two analyses:

-Using bootstrap CIs to suggest a listing price based on neighborhood and other characteristics

-Using “sentiment analysis” of listing descriptions to identify vocabulary that influences price

Attempting to fit a Gamma distribution via MLE

Airbnb prices in Boston appear to be Gamma distributed.
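As a rough illustration of what a Gamma fit by maximum likelihood involves, the sketch below minimizes the negative log-likelihood with base R's `optim`. The prices are simulated stand-ins, not the Airbnb data, and the names `neg_loglik` and `gamma_mle` are chosen here for illustration only.

```r
set.seed(1)
# Simulated stand-in for listing prices (NOT the Airbnb data)
prices <- rgamma(500, shape = 2, rate = 0.01)

# Negative log-likelihood of a Gamma(shape, rate) model.
# Parameters are optimized on the log scale so both stay positive.
neg_loglik <- function(log_par, x) {
  -sum(dgamma(x, shape = exp(log_par[1]), rate = exp(log_par[2]), log = TRUE))
}

# Method-of-moments estimates as starting values
start <- log(c(mean(prices)^2 / var(prices), mean(prices) / var(prices)))

fit <- optim(start, neg_loglik, x = prices)
gamma_mle <- c(shape = exp(fit$par[1]), rate = exp(fit$par[2]))
gamma_mle
```

Comparing the fitted density with a histogram of the prices (or a Q-Q plot) is one way to judge whether the Gamma assumption is plausible.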

Zooming in on neighborhoods

Unfortunately, prices within individual neighborhoods do not follow a common distribution.

Bootstrap CIs by neighborhood

Bootstrapping provides a non-parametric alternative for analyzing the distribution of neighborhood prices.
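A minimal sketch of a percentile bootstrap CI for the mean price in each neighborhood is shown below. The data frame is synthetic (the neighborhood names are real, the prices are made up), and `boot_mean_ci` is a hypothetical helper, not the function used in this project.

```r
set.seed(42)
# Synthetic example data; prices are made up for illustration
listings <- data.frame(
  neighbourhood = rep(c("Back Bay", "Roslindale"), times = c(80, 40)),
  price = c(rgamma(80, shape = 4, rate = 4 / 300),
            rgamma(40, shape = 4, rate = 4 / 120))
)

# Percentile bootstrap CI for the mean of a numeric vector:
# resample with replacement, recompute the mean, take quantiles
boot_mean_ci <- function(x, n_boot = 2000, level = 0.95) {
  boot_means <- replicate(n_boot, mean(sample(x, replace = TRUE)))
  alpha <- 1 - level
  c(lower = unname(quantile(boot_means, alpha / 2)),
    mean  = mean(x),
    upper = unname(quantile(boot_means, 1 - alpha / 2)))
}

# One interval per neighbourhood (columns = neighbourhoods)
ci_by_nbhd <- sapply(split(listings$price, listings$neighbourhood), boot_mean_ci)
ci_by_nbhd
```

Because the interval comes from the resampled means themselves, no distributional form has to be assumed for the prices.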

Bootstrap CIs for number of bedrooms

The bootstrap method works for other property characteristics as well, such as the number of bedrooms.

Bootstrap CI predictive function for various characteristics

We developed a function to provide bootstrap CIs according to the following characteristics, any combination of which may be specified:

-Neighborhood

-Number of bathrooms, bedrooms and beds

-Property type and room type

-Whether the host is a “superhost”

(price_listing_func(neighbourhood="Back Bay", num_beds = 2, type_room = "Entire home/apt"))
## $lower_bound
##     2.5% 
## 273.8756 
## 
## $mean_estimate
## [1] 293.9816
## 
## $upper_bound
##    97.5% 
## 315.2593 
## 
## $number_of_observations
## [1] 81

Inputs can be vectors. The function returns a warning if fewer than 5 listings match the requested characteristics.

(price_listing_func(neighbourhood="Roslindale", num_bedroom = 2, superhost = TRUE))
## Warning in price_listing_func(neighbourhood = "Roslindale", num_bedroom = 2, :
## Fewer than 5 properties with these characteristics
## $lower_bound
## 2.5% 
##    6 
## 
## $mean_estimate
## [1] 113.549
## 
## $upper_bound
## 97.5% 
##   223 
## 
## $number_of_observations
## [1] 1
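The general shape of such a function (filter on whichever characteristics are supplied, warn when few listings remain, then bootstrap the mean) can be sketched as follows. This is not the project's `price_listing_func`: the helper `sketch_price_ci`, the column names, and the data are all assumptions made for illustration.

```r
set.seed(7)
# Made-up listings table; column names are assumptions
listings <- data.frame(
  neighbourhood = sample(c("Back Bay", "Roslindale", "Fenway"), 300, replace = TRUE),
  bedrooms      = sample(1:3, 300, replace = TRUE),
  superhost     = sample(c(TRUE, FALSE), 300, replace = TRUE),
  price         = round(rgamma(300, shape = 4, rate = 0.02))
)

# Hypothetical helper: each named argument in ... is a filter on that column
sketch_price_ci <- function(data, ..., n_boot = 2000, min_n = 5) {
  filters <- list(...)
  keep <- rep(TRUE, nrow(data))
  for (col in names(filters)) {
    # %in% allows a filter to be a vector of acceptable values
    keep <- keep & data[[col]] %in% filters[[col]]
  }
  prices <- data$price[keep]
  n <- length(prices)
  if (n == 0) stop("No properties with these characteristics")
  if (n < min_n) warning("Fewer than ", min_n, " properties with these characteristics")
  # Resample indices rather than values so a single observation is handled correctly
  boot_means <- replicate(n_boot, mean(prices[sample.int(n, replace = TRUE)]))
  list(
    lower_bound = quantile(boot_means, 0.025),
    mean_estimate = mean(prices),
    upper_bound = quantile(boot_means, 0.975),
    number_of_observations = n
  )
}

result <- sketch_price_ci(listings, neighbourhood = c("Back Bay", "Fenway"), bedrooms = 2)
result
```

With very few matching listings the bootstrap has little to resample from, so an interval returned alongside the warning should be treated with caution.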

Sentiment Analysis

##    Sentiment_words_df.words_test Sentiment_words_df.Freq
## 1                       downtown                     809
## 2                        private                     739
## 3                    restaurants                     682
## 4                          great                     509
## 5                        minutes                     492
## 6                        station                     470
## 7                        walking                     432
## 8                       spacious                     431
## 9                          heart                     403
## 10                         quiet                     392
## 11                     beautiful                     385
## 12                       parking                     368
## 13                          good                     340
## 14                      historic                     319
## 15                        subway                     306

We selected 15 words for a sentiment analysis to test whether the wording of a property's summary is associated with its price. To choose them, we took the most frequently recurring words in the summaries, excluding prepositions, numbers, “Boston,” and similar uninformative terms.
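A word-frequency count of this kind can be sketched in base R as below. The summaries and the exclusion list are made up for illustration; a real analysis would need a fuller list of excluded words.

```r
# Made-up listing summaries for illustration
summaries <- c(
  "Spacious private room in the heart of downtown Boston",
  "Quiet apartment, 5 minutes walking to the subway station",
  "Beautiful historic home with private parking near great restaurants",
  "Private studio downtown, great restaurants nearby"
)

# Lowercase, then split on any run of non-letters (drops numbers and punctuation)
words <- unlist(strsplit(tolower(summaries), "[^a-z]+"))
words <- words[nchar(words) > 0]

# Small hand-picked exclusion list (an assumption, not the project's list)
excluded <- c("in", "the", "of", "to", "with", "near", "boston")
words <- words[!words %in% excluded]

# Frequency table, most common words first
word_freq <- sort(table(words), decreasing = TRUE)
head(word_freq, 5)
```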

Distribution of price by appearance of Sentiment

Bootstrap Intervals for Sentiment
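One way to compare prices for listings that do and do not use a given word is a percentile bootstrap CI for the difference in mean price, sketched below. The data are simulated (listings containing “historic” are given higher prices by construction), and the difference-in-means form is an assumption, not necessarily the comparison used in this project.

```r
set.seed(99)
# Simulated data: summaries mentioning "historic" get a higher simulated price
n <- 200
has_word_true <- rbinom(n, 1, 0.3) == 1
listings <- data.frame(
  summary = ifelse(has_word_true,
                   "Historic brownstone near downtown",
                   "Quiet room near subway"),
  price = rgamma(n, shape = 4, rate = 0.02) + ifelse(has_word_true, 80, 0),
  stringsAsFactors = FALSE
)

# Flag listings whose summary contains the word (whole word, case-insensitive)
word <- "historic"
has_word <- grepl(paste0("\\b", word, "\\b"), listings$summary,
                  ignore.case = TRUE, perl = TRUE)

with_p    <- listings$price[has_word]
without_p <- listings$price[!has_word]

# Resample each group independently and record the difference in means
boot_diff <- replicate(2000, {
  mean(sample(with_p, replace = TRUE)) - mean(sample(without_p, replace = TRUE))
})
diff_ci <- quantile(boot_diff, c(0.025, 0.975))
diff_ci
```

An interval that excludes zero suggests the word is associated with a price difference, though it does not separate the word's effect from that of the neighborhood or of other words in the same summary.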

Conclusions

-It is not appropriate to assume a Gamma distribution for price data when broken down by listing characteristic

-Bootstrap CIs provide a reliable way to estimate prices, provided that only a limited number of characteristics are combined so that enough matching listings remain

-Only about four sentiment words appear to be associated with significantly higher prices

-These four words may be associated with neighborhood characteristics (e.g. “historic”)

-The analysis does not take into account combinations of words (e.g. “historic,” “parking,” and “private”), so there is some overlap in the bootstrap CIs, and certain combinations of words might produce different effects.