As an alternative to hotel accommodations, Airbnb has made it easier for individuals to rent rooms or whole houses to other individuals, creating a new competitive corner of the market and disrupting tourism business models in larger cities (Gutiérrez et al. 2017). Research into the impacts and distribution of Airbnb has focused primarily on European markets, with attention given to larger tourist city centers (Adamiak et al. 2019; Garcia-López et al. 2020). An exception is a study of Los Angeles County, California, that assessed the impact of short-term rentals on housing markets by comparing cities in the county that had implemented restrictions on short-term rentals with those that had not. The study found that these restrictions led to a 2% drop in property values (Koster et al. 2021).
A city with a market for historical tourism, Boston mixes historic tourist sites with residential communities. Boston features 24 economically varied neighborhoods and is heavily touristed, with no active restrictions or ordinances impacting Airbnb listings, making it an ideal candidate for Airbnb market analysis (Meet Boston | Your Official Guide to Boston, n.d.). Using Boston as the region of focus and applying spatial statistical methods, this project asks: which neighborhoods see the highest occupancy rates, and how does this relate to the mean household income of each neighborhood?
Airbnb listing data were sourced from Insideairbnb.com. The data contain listings from January 1st, 2025 to December 27th, 2025. Inside Airbnb compiles the data from public information on the Airbnb website, then verifies and cleans it prior to release. The occupancy rate for each listing is calculated using a model based on the work of Bunn (2024). The resulting model, called the “San Francisco Model”, assumes that approximately 50% of bookings result in a review, which allows the number of bookings to be estimated from the review count of a listing. The estimated bookings are then multiplied by an average stay length to produce an occupancy rate. This value, together with the listing type, is used to produce an occupancy rate for each neighborhood, normalized by listing type. The data set will be filtered so that only the neighborhood, coordinates, occupancy rate, and listing type are retained.
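The occupancy estimate and variable selection described above can be sketched as follows. This is a minimal sketch, not Inside Airbnb's actual implementation: it assumes one reading of the San Francisco Model in which about half of bookings leave a review, and the 3-night average stay, the cap at 100%, the column name `reviews_last_year`, and all listing values are hypothetical. Only `neighbourhood`, `latitude`, `longitude`, and `room_type` are column names taken from the project code.

```r
library(dplyr)

# Toy listings; every value here is made up for illustration
listings_toy <- tibble(
  neighbourhood     = c("Back Bay", "Roxbury", "Back Bay"),
  latitude          = c(42.350, 42.315, 42.349),
  longitude         = c(-71.081, -71.090, -71.079),
  room_type         = c("Entire home/apt", "Private room", "Private room"),
  reviews_last_year = c(20, 8, 90),   # assumed column name
  host_name         = c("A", "B", "C") # example of a column that is not needed
)

review_rate     <- 0.5 # assumed share of bookings that leave a review
avg_stay_nights <- 3   # assumed average stay length, in nights

listings_tidy <- listings_toy %>%
  mutate(
    # reviews -> estimated bookings -> booked nights -> share of the year
    estimated_bookings = reviews_last_year / review_rate,
    occupancy_rate     = pmin(estimated_bookings * avg_stay_nights / 365, 1)
  ) %>%
  # keep only the variables needed for the analysis
  select(neighbourhood, latitude, longitude, room_type, occupancy_rate)

listings_tidy
```

For the first toy listing, 20 reviews imply 40 estimated bookings and 120 booked nights, or an occupancy rate of about 0.33. The third listing would exceed a full year of nights, which is why the sketch caps the rate at 1.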
US Census data were sourced using the tidycensus
package, pulling the 2024 1-year estimates. Mean household
income by census tract, once aggregated, provides the mean
household income for each neighborhood. The tidycensus
package returns a clean data set that can be combined
with other data sets for deeper analysis.
Neighborhood data were sourced from the official City of Boston data hub. This data
set provides geographies for the 24 Boston neighborhoods aligned with
census tracts, allowing for incorporation of tidycensus
data.
While working on this project, my RStudio installation went through an update that I did not notice while preparing this draft. The update has resulted in packages not installing or not functioning properly. As such, I have included the code as far as I have progressed, but I will revert to the previous version and continue the project after this draft.
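# load packages ----
library(sf)       # spatial data (st_read, st_as_sf, st_transform)
library(here)     # project-relative file paths
library(readr)    # read_csv
library(leaflet)  # interactive maps (also provides %>%)
library(ggplot2)  # static maps
# import data ----
neighborhoods <- st_read(here("data", "Boston_Neighborhood_Boundaries_Approximated_by_2020_Census_Tracts.shp"),
                         quiet = TRUE)
listings <- read_csv(here("data", "listings_sum.csv"))
# Compare neighborhoods captured by each data set ----
unique(neighborhoods$neighborho)
unique(listings$neighbourhood)
# Plot initial data ----
# Convert listings to point features (WGS84 longitude/latitude)
listings_plot <- st_as_sf(listings, coords = c("longitude", "latitude"), crs = 4326)
# Plot of Neighborhoods
# leaflet expects WGS84 coordinates and real color values,
# so reproject the polygons and map neighborhood names to a palette
neighborhoods_wgs84 <- st_transform(neighborhoods, 4326)
pal <- colorFactor("viridis", domain = neighborhoods_wgs84$neighborho)
leaflet(data = neighborhoods_wgs84) %>%
  addTiles() %>%
  addPolygons(fillColor = ~ pal(neighborho),
              stroke = FALSE,
              label = ~ neighborho)
# Plot of listings by room type over neighborhood outlines
ggplot() +
  geom_sf(data = listings_plot,
          aes(color = as.factor(room_type))) +
  geom_sf(data = neighborhoods, fill = NA) +
  coord_sf(crs = 4326) +
  theme_void()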
Figure 1 is a flow chart of the processes employed to determine if a
correlation exists between the types and occupancy of Airbnb listings
and the mean household income of each neighborhood. The three data sets
will each require unique data wrangling to produce tidy data ready for
analysis operations. Selecting the Airbnb variables previously
discussed, a new, tidy data set will be produced. This data set will be
combined with Boston neighborhood data to create visual context for
distribution of listings across the city. Following this, US Census data
for census tracts of the Boston area will be pulled using
tidycensus. This data set will be converted into centroid
points for each tract. The mean income for each point will be added
together for all points in each neighborhood and averaged to produce a
mean value for each neighborhood. The resulting data set will be plotted
and combined with point count data from the Airbnb listings data set to
produce a joined table consisting of neighborhoods with variables of
mean household income, number of total listings and listings by type, and
occupancy rates. From this data set, plots will be produced of the number of
listings by type versus the mean household income of each neighborhood.
From these plots, the presence and strength of any correlation will be assessed.
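The workflow described above can be sketched end to end on toy data. This is a minimal sketch under stated assumptions, not the project's final code: the geometries, incomes, and listings are made up, the helper `square()` and all column names other than `room_type` are hypothetical, and Spearman rank correlation is only one possible choice, since the text does not name a correlation method. The toy geometries carry no coordinate reference system; real tract polygons would need a suitable projected CRS before centroids are computed.

```r
library(sf)
library(dplyr)

# Hypothetical helper: an axis-aligned square polygon for toy geometry
square <- function(xmin, ymin, size = 1) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmin + size, ymin), c(xmin + size, ymin + size),
    c(xmin, ymin + size), c(xmin, ymin)
  )))
}

# Three toy neighborhoods, each a 2 x 2 square
neighborhoods_toy <- st_sf(
  name = c("A", "B", "C"),
  geometry = st_sfc(square(0, 0, 2), square(2, 0, 2), square(4, 0, 2))
)

# Two toy census tracts per neighborhood, with made-up mean incomes
tracts_toy <- st_sf(
  mean_income = c(60000, 80000, 90000, 110000, 120000, 160000),
  geometry = st_sfc(square(0, 0), square(1, 1), square(2, 0),
                    square(3, 1), square(4, 0), square(5, 1)),
  agr = "constant"
)

# Tract centroids -> neighborhood they fall in -> neighborhood mean income
neighborhood_income <- tracts_toy %>%
  st_centroid() %>%
  st_join(neighborhoods_toy, join = st_within) %>%
  st_drop_geometry() %>%
  group_by(name) %>%
  summarise(mean_income = mean(mean_income))

# Toy listings as points, joined to neighborhoods and counted by type
listings_pts <- st_as_sf(
  data.frame(
    x = c(0.5, 2.5, 3.5, 4.5, 5.5, 4.2),
    y = c(0.5, 0.5, 1.5, 0.5, 1.5, 1.2),
    room_type = c("Entire home/apt", "Entire home/apt", "Private room",
                  "Entire home/apt", "Entire home/apt", "Private room"),
    occupancy_rate = c(0.30, 0.40, 0.50, 0.60, 0.70, 0.50)
  ),
  coords = c("x", "y")
)

listing_summary <- listings_pts %>%
  st_join(neighborhoods_toy, join = st_within) %>%
  st_drop_geometry() %>%
  group_by(name) %>%
  summarise(
    n_listings     = n(),
    n_entire_home  = sum(room_type == "Entire home/apt"),
    n_private_room = sum(room_type == "Private room"),
    mean_occupancy = mean(occupancy_rate)
  )

# Joined table: one row per neighborhood
neighborhood_table <- left_join(neighborhood_income, listing_summary, by = "name")

# Rank correlation between income and listing count (assumed method)
income_listing_cor <- cor(neighborhood_table$mean_income,
                          neighborhood_table$n_listings,
                          method = "spearman")
```

Averaging tract values without weights treats every tract equally regardless of how many households it contains. Weighting by household count would be an alternative design if that information is pulled alongside income.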