Edward Harvey & Jake Naughton
12/15/2021
Our goal for this project is to use publicly available data on Boston-area Airbnb listings to help a first-time Airbnb renter accurately price their new listing. Our project consists of two analyses:
-Using bootstrap CIs to suggest a listing price based on neighborhood and other characteristics
-Using “sentiment analysis” of listing descriptions to identify vocabulary that influences price
Airbnb prices in Boston appear to be Gamma distributed.
Unfortunately, individual neighborhoods do not share a common distribution.
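One quick way to check how well a Gamma distribution describes a set of prices is a method-of-moments fit, which gives shape = mean²/variance and scale = variance/mean. The sketch below is in Python rather than the R used for the original analysis, and it runs on synthetic data; the function name `gamma_moments_fit` is hypothetical, and the source does not say how the Gamma fit was actually assessed.

```python
import random
import statistics

def gamma_moments_fit(prices):
    """Method-of-moments estimates (shape, scale) for a Gamma distribution."""
    mean = statistics.fmean(prices)
    var = statistics.variance(prices)  # sample variance
    return mean ** 2 / var, var / mean

# Synthetic "prices" drawn from a Gamma(shape=2, scale=100), for illustration only.
rng = random.Random(0)
synthetic_prices = [rng.gammavariate(2.0, 100.0) for _ in range(5000)]
shape, scale = gamma_moments_fit(synthetic_prices)
```

Running the same fit separately on each neighborhood's prices would show whether the estimated shape and scale vary widely between neighborhoods.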
Bootstrapping provides a non-parametric alternative for analyzing the distribution of neighborhood prices.
The bootstrap method works for other property characteristics as well, such as the number of bedrooms.
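A percentile bootstrap CI for the mean price can be sketched as follows. This is a minimal Python illustration, not the authors' R code: the function name `bootstrap_mean_ci`, the number of resamples, and the sample prices are all assumptions.

```python
import random
import statistics

def bootstrap_mean_ci(prices, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of `prices`."""
    rng = random.Random(seed)
    n = len(prices)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(prices, k=n)) for _ in range(n_resamples)
    )
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles of the resampled means.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = round((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], statistics.fmean(prices), means[hi_idx]

# Made-up nightly prices, for illustration only.
sample_prices = [85, 120, 150, 95, 300, 210, 175, 130, 99, 250]
lower, mean, upper = bootstrap_mean_ci(sample_prices)
```

Because the interval comes from the resampled means themselves, no distributional assumption about the prices is needed.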
We developed a function to provide bootstrap CIs according to the following characteristics, any combination of which may be specified:
-Neighborhood
-Number of bathrooms, bedrooms and beds
-Property type and room type
-Whether the host is a “superhost”
## $lower_bound
## 2.5%
## 273.8756
##
## $mean_estimate
## [1] 293.9816
##
## $upper_bound
## 97.5%
## 315.2593
##
## $number_of_observations
## [1] 81
## Warning in price_listing_func(neighbourhood = "Roslindale", num_bedroom = 2, :
## Fewer than 5 properties with these characteristics
## $lower_bound
## 2.5%
## 6
##
## $mean_estimate
## [1] 113.549
##
## $upper_bound
## 97.5%
## 223
##
## $number_of_observations
## [1] 1
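The filtering-plus-bootstrap idea behind the output above can be sketched in Python. This is an illustration under stated assumptions, not the authors' `price_listing_func`: the function name `price_listing`, the field names (`neighbourhood`, `bedrooms`, `price`), and the demo listings are hypothetical, and the authors' function may handle small samples differently.

```python
import random
import statistics
import warnings

def price_listing(listings, n_resamples=2000, seed=0, **filters):
    """Bootstrap a 95% CI for the mean price of listings matching `filters`."""
    # Keep only the listings whose fields equal every filter that was given.
    matches = [
        row["price"] for row in listings
        if all(row.get(key) == value for key, value in filters.items())
    ]
    if not matches:
        raise ValueError("No properties with these characteristics")
    if len(matches) < 5:
        warnings.warn("Fewer than 5 properties with these characteristics")
    rng = random.Random(seed)
    # Resample the matching prices with replacement and sort the resampled means.
    means = sorted(
        statistics.fmean(rng.choices(matches, k=len(matches)))
        for _ in range(n_resamples)
    )
    return {
        "lower_bound": means[int(0.025 * n_resamples)],
        "mean_estimate": statistics.fmean(matches),
        "upper_bound": means[int(0.975 * n_resamples) - 1],
        "number_of_observations": len(matches),
    }

# Made-up listings, for illustration only.
demo_listings = [
    {"neighbourhood": "Roslindale", "bedrooms": 2, "price": p}
    for p in (95, 110, 120, 135, 150, 180)
] + [{"neighbourhood": "Back Bay", "bedrooms": 1, "price": 320}]

estimate = price_listing(demo_listings, neighbourhood="Roslindale", bedrooms=2)
```

Any subset of fields can be passed as keyword filters. In this sketch, a filter that matches a single listing still returns a result, but the interval collapses to that one price and a warning is issued.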
## Sentiment_words_df.words_test Sentiment_words_df.Freq
## 1 downtown 809
## 2 private 739
## 3 restaurants 682
## 4 great 509
## 5 minutes 492
## 6 station 470
## 7 walking 432
## 8 spacious 431
## 9 heart 403
## 10 quiet 392
## 11 beautiful 385
## 12 parking 368
## 13 good 340
## 14 historic 319
## 15 subway 306
We selected 15 words for a sentiment analysis to test whether the wording of a property's summary is associated with its price. To choose the words, we took the most frequently occurring words in the summaries, excluding prepositions, numbers, “Boston,” and similar uninformative terms.
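The word-selection step can be sketched in Python as a frequency count over the listing summaries. The exclusion list, the function name `top_words`, and the sample summaries are assumptions; the source only describes the excluded terms as prepositions, numbers, “Boston,” and the like, and does not say whether frequencies count occurrences or listings (this sketch counts occurrences).

```python
import re
from collections import Counter

# Hypothetical exclusion list, standing in for the terms the authors removed.
EXCLUDED = {"in", "of", "to", "from", "with", "the", "a", "and", "is", "boston"}

def top_words(descriptions, n=15):
    """Return the `n` most frequent words across `descriptions`."""
    counts = Counter()
    for text in descriptions:
        # Lowercase and keep alphabetic tokens only, which also drops numbers.
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in EXCLUDED)
    return counts.most_common(n)

# Made-up summaries, for illustration only.
descriptions = [
    "Spacious private room in the heart of downtown Boston",
    "Quiet, private apartment 5 minutes from the subway station",
    "Historic downtown loft with parking",
]
top = top_words(descriptions, n=2)
```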
-It is not appropriate to assume a gamma distribution for price data once the data are broken down by listing characteristic
-Bootstrap CIs provide a reliable way to estimate price when only a limited number of characteristics are combined; very specific combinations can leave too few matching listings
-Only about four sentiment words appear to be associated with significantly higher prices
-These four words may be associated with neighborhood characteristics (e.g. “historic”)
-The analysis does not take into account combinations of words (e.g. “historic,” “parking,” and “private”), so there is some overlap in the bootstrap CIs, and certain combinations of words might produce different effects.
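One way to examine a word, or a combination of words, is to bootstrap the mean price separately for listings whose description contains all of the words and for the remaining listings, then compare the two intervals. The Python sketch below assumes this comparison design, which the source does not spell out; the names `mean_ci` and `price_ci_by_words` and the demo data are hypothetical.

```python
import random
import re
import statistics

def mean_ci(prices, n_resamples=2000, seed=0):
    """95% percentile bootstrap CI for the mean of `prices` (must be non-empty)."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(prices, k=len(prices)))
        for _ in range(n_resamples)
    )
    return means[int(0.025 * n_resamples)], means[int(0.975 * n_resamples) - 1]

def price_ci_by_words(listings, words):
    """CIs for listings containing all `words` and for all other listings."""
    wanted = {w.lower() for w in words}
    with_words, without = [], []
    for description, price in listings:
        tokens = set(re.findall(r"[a-z]+", description.lower()))
        # A listing counts as a match only if every requested word appears.
        (with_words if wanted <= tokens else without).append(price)
    return mean_ci(with_words), mean_ci(without)

# Made-up (description, price) pairs, for illustration only.
demo = [
    ("Historic home with private parking", 300),
    ("Private historic loft, parking included", 320),
    ("Historic brownstone, private entrance, parking", 310),
    ("Parking and private deck in historic district", 290),
    ("Historic, private, parking on site", 305),
    ("Cozy room near station", 100),
    ("Quiet studio", 110),
    ("Small room, good value", 95),
    ("Room near subway", 105),
    ("Basic apartment", 120),
]
ci_with, ci_without = price_ci_by_words(demo, ["historic", "parking", "private"])
```

Intervals that do not overlap suggest a price difference between the two groups, although a word may simply be standing in for a neighborhood or property type.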