Inside Airbnb: An Exploratory Analysis of Barcelona Listings

Author

Rui Gao

Published

May 8, 2026

1 Introduction

  • Problem statement: How are Airbnb listings distributed across Barcelona, Spain? Are certain neighbourhoods more expensive? Are entire homes being listed more often than private rooms, potentially removing housing from the long-term market?

  • Why it matters: Short-term rentals like Airbnb have been shown to reduce housing availability and increase rents in cities. Barcelona has a well-known housing affordability crisis, making this particularly relevant.

  • Methodology: I will import and clean listing and spatial data, merge them, and perform exploratory data analysis (EDA) using maps, plots, and summary statistics. To address my problem statement, I use listing data sourced from Inside Airbnb (insideairbnb.com), an independent, mission-driven project that scrapes and publishes publicly available Airbnb listing data to promote transparency about the platform’s impact on residential communities. Specifically, I use the Barcelona, Catalonia, Spain data available at https://insideairbnb.com/get-the-data/

    My chosen dataset contains 18,177 Airbnb listings across Barcelona, Spain, including information on listing type, neighbourhood, host activity, availability, and review history. I focus my analysis on five of Barcelona’s most prominent districts: Ciutat Vella, Eixample, Gràcia, Sant Martí, and Sants-Montjuïc, which together represent the city’s most visited and densely populated areas. After importing and cleaning the data using tools from the ‘tidyverse’ collection of packages in R, I will perform exploratory data analysis using summary statistics, data visualizations, and spatial mapping to uncover patterns in listing distribution and room type concentration across Barcelona’s neighbourhoods.

    For visualizations, I will use ‘ggplot2’ and ‘plotly’ to create a mix of bar charts, choropleth maps, and interactive visualizations for deeper exploration of the data.

    I will use the ‘sf’ package to merge the listing summary data with neighbourhood boundary data from a GeoJSON file, allowing me to produce choropleth maps that visualize the geographic concentration of entire home listings across Barcelona’s neighbourhoods.

    Together these techniques allow me to build a comprehensive picture of how Airbnb listings are distributed across Barcelona and to what extent the platform is contributing to the removal of housing from the long-term residential market.

  • Consumers: This analysis is designed to be useful to several different audiences:

    1. City planners and housing policymakers can use the geographic concentration maps and district-level summaries to identify which neighbourhoods are most impacted by short-term rental activity and prioritize those areas for regulatory intervention such as licensing requirements or caps on entire home listings.
    2. Barcelona residents and housing advocates can use this analysis to better understand the scale of Airbnb’s presence in their neighbourhoods and advocate for policies that protect the long-term housing supply.
    3. Tourists / travelers can use this analysis to make more informed decisions about where they choose to stay in Barcelona, and engage in ethical tourism practices. They can understand that booking entire home listings in high-concentration neighbourhoods may contribute to housing insecurity for long-term residents.
    4. Researchers and data scientists can use this analysis as a foundation for more advanced investigation, such as merging Airbnb data with Census income data or housing price indices to more directly measure the relationship between Airbnb concentration and neighbourhood gentrification over time.

2 Packages Required


library(tidyverse)   # data wrangling and visualization
library(sf)          # spatial data and mapping
library(ggplot2)     # static plots
library(plotly)      # interactive plots
library(readr)       # importing CSV data
library(janitor)     # cleaning column names

3 Data Preparation


The data used in this analysis was obtained from Inside Airbnb (https://insideairbnb.com/get-the-data/), an independent, non-commercial project that collects and publishes publicly available data scraped from the Airbnb platform.

From the Barcelona, Spain dataset, my analysis uses three files: a listings file (‘barcelona_listings.csv’), a neighbourhood boundary file (‘neighbourhoods.geojson’), and a neighbourhood lookup table (‘neighbourhoods.csv’).

The raw ‘barcelona_listings.csv’ file contains 18,177 Airbnb listings across Barcelona, with 19 variables capturing each listing’s location, host activity, room type, availability, and review history. Importantly, the ‘price’ column is empty for all 18,177 listings, which rules out any analysis of pricing patterns and is a major limitation of this dataset.
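An empty column can pass silently through a pipeline, so it helps to check missingness per column up front. The base-R sketch below runs on a small hypothetical data frame rather than the real listings file. It reports the percentage of missing values in each column and drops any column with no observed values, as would happen to ‘price’ here. Dropping is only one option; keeping the column and documenting the gap, as this report does, is equally valid.

```r
# Hypothetical toy data frame (NOT the real listings file) with an empty price column
toy_listings <- data.frame(
  id          = 1:4,
  room_type   = c("Entire home/apt", "Private room", "Entire home/apt", "Hotel room"),
  price       = c(NA, NA, NA, NA),
  last_review = as.Date(c("2024-01-15", NA, "2024-03-02", NA))
)

# is.na() on a data frame returns a logical matrix; colMeans() gives the NA share
missing_pct <- round(colMeans(is.na(toy_listings)) * 100, 1)
print(missing_pct)

# columns in which every value is missing carry no information for the analysis
all_missing <- names(toy_listings)[colSums(!is.na(toy_listings)) == 0]
print(all_missing)

# one option: drop them before analysis
toy_clean <- toy_listings[, setdiff(names(toy_listings), all_missing), drop = FALSE]
print(names(toy_clean))
```

Here ‘price’ is 100% missing and ‘last_review’ is 50% missing, so only ‘price’ is dropped.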

Key variables in the raw dataset include:

| Variable | Description |
|----------|-------------|
| ‘id’ | Unique identifier for each listing |
| ‘name’ | Name of the listing |
| ‘host_id’ | Unique identifier for the host |
| ‘neighbourhood_group’ | District the listing belongs to |
| ‘neighbourhood’ | Specific neighbourhood of the listing |
| ‘latitude’ / ‘longitude’ | Geographic coordinates of the listing |
| ‘room_type’ | Type of listing (Entire home, Private room, etc.) |
| ‘price’ | Nightly price of the listing |
| ‘minimum_nights’ | Minimum number of nights required to book |
| ‘number_of_reviews’ | Total number of reviews received |
| ‘availability_365’ | Number of days available in the next 365 days |
| ‘calculated_host_listings_count’ | Total number of listings the host manages |
| ‘last_review’ | Date of the most recent review |
| ‘reviews_per_month’ | Average number of reviews per month |
| ‘license’ | Listing license number if applicable |

# import spatial neighbourhood boundary data
neighbourhoods <- st_read("neighbourhoods.geojson",
                           quiet = TRUE)

# import listings data
listings <- read_csv("barcelona_listings.csv",
                     na = c("", "N/A", "NA"))

# clean, filter and create new variables
listings <- listings |>
  clean_names() |>                                    # standardize names to snake_case
  filter(neighbourhood_group %in% c("Gràcia",          # keep the five study districts
                                    "Eixample",
                                    "Sant Martí",
                                    "Ciutat Vella",
                                    "Sants-Montjuïc")) |>
  mutate(
    room_type = factor(room_type),                     # categorical variables -> factors
    neighbourhood_group = factor(neighbourhood_group),
    neighbourhood = factor(neighbourhood),
    last_review = as.Date(last_review),                # ensure a Date object
    availability_pct = availability_365 / 365 * 100,   # % of the next 365 days available
    is_entire_home = ifelse(room_type == "Entire home/apt",
                            "Entire Home", "Other"),   # entire home vs all other types
    is_commercial = availability_365 > 300,            # available more than 300 days
    is_multi_host = calculated_host_listings_count > 1 # host manages more than one listing
  )

# verify filter and new variables worked correctly
table(listings$neighbourhood_group)

  Ciutat Vella       Eixample         Gràcia     Sant Martí Sants-Montjuïc 
          4120           6659           1548           1634           1766 
table(listings$is_entire_home)

Entire Home       Other 
      10323        5404 
colSums(is.na(listings))
                            id                           name 
                             0                              0 
                       host_id                host_profile_id 
                             0                              0 
                     host_name            neighbourhood_group 
                             4                              0 
                 neighbourhood                       latitude 
                             0                              0 
                     longitude                      room_type 
                             0                              0 
                         price                 minimum_nights 
                         15727                              0 
             number_of_reviews                    last_review 
                             0                           4109 
             reviews_per_month calculated_host_listings_count 
                          4109                              0 
              availability_365          number_of_reviews_ltm 
                             0                              0 
                       license               availability_pct 
                          6044                              0 
                is_entire_home                  is_commercial 
                             0                              0 
                 is_multi_host 
                             0 
# show first 10 rows of cleaned data
print(listings, n = 10)
# A tibble: 15,727 × 23
        id name            host_id host_profile_id host_name neighbourhood_group
     <dbl> <chr>             <dbl>           <dbl> <chr>     <fct>              
 1   18674 "Huge flat for…  7.16e4         1.46e18 Mireia    Eixample           
 2 2031134 "Sagrada Famil…  9.10e6         1.46e18 Miguel    Eixample           
 3 4415694 "DEEP PURPLE -…  1.24e7         1.46e18 Yolanda   Sants-Montjuïc     
 4 4415780 "Double Room \…  4.99e6         1.46e18 Judith    Gràcia             
 5 5064035 "Lovely 2 BD w…  1.74e6         1.46e18 Silvia &… Gràcia             
 6 5064651 "Nice cute roo…  2.62e7         1.47e18 Rabea     Eixample           
 7 6163177 "Piso a 100 mt…  3.20e7         1.46e18 Pablo     Sant Martí         
 8 8414793 "Lugaris Beach…  3.76e5         1.46e18 Xavier    Sant Martí         
 9 9357521 "Three bedroom…  8.13e6         1.46e18 Eva       Gràcia             
10 9358082 "Bright, frien…  4.86e7         1.47e18 Marie-Gé… Sant Martí         
# ℹ 15,717 more rows
# ℹ 17 more variables: neighbourhood <fct>, latitude <dbl>, longitude <dbl>,
#   room_type <fct>, price <lgl>, minimum_nights <dbl>,
#   number_of_reviews <dbl>, last_review <date>, reviews_per_month <dbl>,
#   calculated_host_listings_count <dbl>, availability_365 <dbl>,
#   number_of_reviews_ltm <dbl>, license <chr>, availability_pct <dbl>,
#   is_entire_home <chr>, is_commercial <lgl>, is_multi_host <lgl>

I learned to use ‘clean_names()’ from the ‘janitor’ package, which quickly standardizes all column names to ‘snake_case’ and makes them easier to work with. I then filtered the data to the five districts of interest, since my analysis focuses on the areas of Barcelona most likely to be affected by short-term rental activity. This step reduced the data from 18,177 to 15,727 listings. Finally, I converted the categorical variables to factors so that R treats them appropriately in summaries and visualizations, and made sure ‘last_review’ is stored as a proper Date object.
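The order of the filter and the factor conversion matters. Because ‘factor()’ is applied after ‘filter()’, the factor levels contain only the five retained districts. The base-R sketch below uses a hypothetical toy vector of district names, not the real data. It shows that converting before filtering would leave the dropped districts behind as empty levels, and that ‘droplevels()’ removes them.

```r
# Hypothetical toy district names (NOT the real data)
districts <- c("Eixample", "Gràcia", "Les Corts", "Ciutat Vella", "Horta-Guinardó")
keep <- c("Gràcia", "Eixample", "Sant Martí", "Ciutat Vella", "Sants-Montjuïc")

# Order used in the report: filter first, then convert to a factor
after_filter <- factor(districts[districts %in% keep])
levels(after_filter)

# Opposite order: convert first, then filter. The dropped districts survive
# as empty levels and would appear as zero counts in table() output.
before_filter <- factor(districts)[districts %in% keep]
levels(before_filter)
table(before_filter)

# droplevels() removes the unused levels again
levels(droplevels(before_filter))
```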

4 Exploratory Data Analysis


During cleaning, I also created four new variables to help analyze gentrification pressure:

  • ‘availability_pct’ = expresses availability as a percentage of the next 365 days
  • ‘is_entire_home’ = flags listings that are entire home rentals, since these are most likely to displace long-term residents by removing complete housing units from the residential market.
  • ‘is_commercial’ = flags listings available for more than 300 days per year, which suggests the listing is operated as a full-time commercial rental rather than an occasional home share.
  • ‘is_multi_host’ = flags listings whose host manages more than one listing, which is more consistent with commercial property management than with individual residents renting out their own homes.
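The flag logic can be checked in isolation. The sketch below applies the same rules in base R to four hypothetical toy rows (not rows from the dataset), chosen to sit on either side of the thresholds. A listing with exactly 300 available days is not flagged as commercial, and a host with exactly one listing is not flagged as a multi-listing host.

```r
# Hypothetical toy rows (NOT real listings) straddling the thresholds
toy <- data.frame(
  room_type = c("Entire home/apt", "Private room", "Entire home/apt", "Shared room"),
  availability_365 = c(365, 300, 301, 0),
  calculated_host_listings_count = c(12, 1, 2, 1)
)

toy$availability_pct <- toy$availability_365 / 365 * 100
toy$is_entire_home   <- ifelse(toy$room_type == "Entire home/apt", "Entire Home", "Other")
# comparisons already return TRUE/FALSE, so no ifelse() wrapper is needed
toy$is_commercial    <- toy$availability_365 > 300   # exactly 300 days is NOT commercial
toy$is_multi_host    <- toy$calculated_host_listings_count > 1

print(toy)
```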
# summarize entire home concentration by neighbourhood group
neighbourhood_summary <- listings |>
  group_by(neighbourhood_group) |>
  summarize(
    n_listings = n(),
    n_entire_home = sum(is_entire_home == "Entire Home"),
    pct_entire_home = mean(is_entire_home == "Entire Home") * 100,
    pct_commercial = mean(is_commercial, na.rm = TRUE) * 100,
    pct_multi_host = mean(is_multi_host, na.rm = TRUE) * 100,
    avg_availability = mean(availability_365, na.rm = TRUE)
  ) |>
  arrange(desc(pct_entire_home))

print(neighbourhood_summary)
# A tibble: 5 × 7
  neighbourhood_group n_listings n_entire_home pct_entire_home pct_commercial
  <fct>                    <int>         <int>           <dbl>          <dbl>
1 Sant Martí                1634          1149            70.3           31.7
2 Gràcia                    1548          1059            68.4           33.5
3 Eixample                  6659          4479            67.3           39.7
4 Sants-Montjuïc            1766          1137            64.4           35.4
5 Ciutat Vella              4120          2499            60.7           33.7
# ℹ 2 more variables: pct_multi_host <dbl>, avg_availability <dbl>

The table above summarizes Airbnb listings across the five most popular districts of Barcelona. As a share of each district’s listings, entire homes are most prevalent in Sant Martí (70.3%) and Gràcia (68.4%) and least prevalent in Ciutat Vella (60.7%). In absolute numbers, however, the central tourist districts of Eixample (4,479) and Ciutat Vella (2,499) contain by far the most entire home listings. This suggests that these two districts face the greatest total displacement pressure on long-term residents. The pct_commercial column shows that between 31.7% and 39.7% of listings in each district are available for more than 300 days a year, with the highest share in Eixample. This strongly suggests that these listings are permanent commercial rentals that remove housing stock from Barcelona’s long-term market for locals.
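Shares and counts rank the districts differently, and the numbers printed in the table above are enough to check this. The base-R sketch below copies those counts by hand and recomputes each district’s share. It also computes the pooled share across all five districts, which works out to about 65.6% (10,323 of 15,727 listings).

```r
# Counts copied by hand from the summary table above (five study districts only)
district      <- c("Sant Martí", "Gràcia", "Eixample", "Sants-Montjuïc", "Ciutat Vella")
n_listings    <- c(1634, 1548, 6659, 1766, 4120)
n_entire_home <- c(1149, 1059, 4479, 1137, 2499)

# Per-district share of entire home listings, in percent
pct_entire_home <- round(n_entire_home / n_listings * 100, 1)

# Pooled share: total entire homes divided by total listings, not a mean of shares
overall_pct <- round(sum(n_entire_home) / sum(n_listings) * 100, 1)

# Ranking by share and ranking by count tell different stories
by_share <- district[order(-pct_entire_home)]
by_count <- district[order(-n_entire_home)]

overall_pct
by_share
by_count
```

By share, Sant Martí ranks first and Ciutat Vella last. By count, Eixample and Ciutat Vella rank first and second.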

# creating bar plot of % entire home listings by neighbourhood group
ggplot(neighbourhood_summary, aes(x = reorder(neighbourhood_group, -pct_entire_home),
                                   y = pct_entire_home,
                                   fill = neighbourhood_group)) +
  geom_col() +
  labs(
    title = "Concentration of Entire Home Listings by District",
    x = "District",
    y = "% Entire Home Listings"
  ) +
  theme_minimal() +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = 45, hjust = 1))

The bar plot above ranks each district by the percentage of its listings that are entire home rentals. Districts with higher percentages are of greatest concern from a housing perspective. When an entire home is listed on Airbnb rather than rented to long-term residents, it leaves the residential supply, which can drive up rents in the surrounding area as residents are pushed out. The shares fall within a fairly narrow band, from 60.7% in Ciutat Vella to 70.3% in Sant Martí, so entire homes are the majority in every district. The pressure is far less evenly distributed in absolute terms, though. Eixample and Ciutat Vella together account for 6,978 of the 10,323 entire home listings (about 67.6%), which suggests that gentrification pressure is concentrated in these central districts.
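The bar order in the plot comes from ‘reorder()’. This function sorts factor levels by a summary (the mean, by default) of a second variable, so negating that variable produces a descending order. A minimal base-R check using the shares from the summary table:

```r
# District shares copied from the summary table above
district <- factor(c("Sant Martí", "Gràcia", "Eixample", "Sants-Montjuïc", "Ciutat Vella"))
pct_entire_home <- c(70.3, 68.4, 67.3, 64.4, 60.7)

# Default factor levels are sorted alphabetically
levels(district)

# reorder() sorts levels by mean(-pct_entire_home) within each level,
# so the district with the highest share comes first
ordered_district <- reorder(district, -pct_entire_home)
levels(ordered_district)
```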

# count of each room type by neighbourhood group
room_type_summary <- listings |>
  group_by(neighbourhood_group, room_type) |>
  summarize(n = n(), .groups = "drop") |>
  group_by(neighbourhood_group) |>
  mutate(pct = n / sum(n) * 100)

# stacked bar plot
ggplot(room_type_summary, aes(x = neighbourhood_group,
                               y = pct,
                               fill = room_type)) +
  geom_col() +
  labs(
    title = "Room Type Distribution Across Barcelona Districts",
    x = "District",
    y = "Percentage of Listings",
    fill = "Room Type"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

The stacked bar plot breaks down the composition of listing types within each district. Entire home rentals make up the largest share of listings in all five districts. This suggests that Airbnb is primarily being used to convert residential housing into short-term tourist accommodation, rather than to let residents occasionally rent out a spare room. Private room listings, which are more likely to involve a resident host living on site, make up a clearly smaller share in every district.
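The percentages in the stacked bars are computed within each district, so every bar sums to 100%. The base-R sketch below reproduces that normalization with ‘table()’ and ‘prop.table()’ on a few hypothetical toy listings, not the real data. Setting ‘margin = 1’ divides each count by its row (district) total, which is what the ‘group_by()’ plus ‘mutate(pct = n / sum(n) * 100)’ step does.

```r
# Hypothetical toy listings (NOT the real data)
toy <- data.frame(
  district  = c("Eixample", "Eixample", "Eixample", "Eixample",
                "Gràcia", "Gràcia", "Gràcia"),
  room_type = c("Entire home/apt", "Entire home/apt", "Entire home/apt", "Private room",
                "Entire home/apt", "Private room", "Private room")
)

# Cross-tabulate districts (rows) by room type (columns)
counts <- table(toy$district, toy$room_type)

# margin = 1: divide each count by its row total, i.e. percentages within district
row_pct <- prop.table(counts, margin = 1) * 100
print(round(row_pct, 1))

# every district's percentages add up to 100
rowSums(row_pct)
```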

# perform commercial operator analysis
multi_host_summary <- listings |>
  group_by(neighbourhood_group) |>
  summarize(
    pct_multi_host = mean(is_multi_host) * 100,
    avg_host_listings = mean(calculated_host_listings_count, na.rm = TRUE)
  ) |>
  arrange(desc(pct_multi_host))

print(multi_host_summary)
# A tibble: 5 × 3
  neighbourhood_group pct_multi_host avg_host_listings
  <fct>                        <dbl>             <dbl>
1 Eixample                      84.1              88.9
2 Ciutat Vella                  77.2              48.6
3 Gràcia                        75.8              62.5
4 Sants-Montjuïc                71.0              54.8
5 Sant Martí                    68.1              33.5

Hosts who manage multiple listings are a key indicator of commercial Airbnb activity. A resident renting out their own home or a spare room would typically have only one listing, whereas property management companies manage many as part of their business model. The table above shows that in every district a large majority of listings belong to multi-listing hosts, ranging from 68.1% in Sant Martí to 84.1% in Eixample. Eixample (84.1%) and Ciutat Vella (77.2%) have the highest percentages. This suggests that a significant portion of Barcelona’s Airbnb market is driven by commercial operators rather than residents. It further supports the concern that short-term rentals are reducing the availability of Barcelona’s long-term housing stock.
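Both columns in this table are averaged over listings, not hosts, because ‘calculated_host_listings_count’ repeats the host’s total on each of their listings. A host with many listings therefore counts once per listing. This pulls ‘avg_host_listings’ upward and may help explain why it is much larger than a typical host’s portfolio would suggest. The base-R sketch below uses two hypothetical hosts to contrast listing-weighted and host-weighted versions of both measures.

```r
# Two hypothetical hosts (NOT real data): host A has 3 listings, host B has 1.
# calculated_host_listings_count repeats each host's total on every listing row.
toy <- data.frame(
  host_id = c("A", "A", "A", "B"),
  calculated_host_listings_count = c(3, 3, 3, 1)
)

# Listing-weighted, as in multi_host_summary: one value per listing row
listing_avg       <- mean(toy$calculated_host_listings_count)            # 2.5
listing_pct_multi <- mean(toy$calculated_host_listings_count > 1) * 100  # 75

# Host-weighted: keep one row per host before averaging
hosts          <- toy[!duplicated(toy$host_id), ]
host_avg       <- mean(hosts$calculated_host_listings_count)             # 2
host_pct_multi <- mean(hosts$calculated_host_listings_count > 1) * 100   # 50

c(listing_avg = listing_avg, host_avg = host_avg,
  listing_pct_multi = listing_pct_multi, host_pct_multi = host_pct_multi)
```

Both views are legitimate. The listing-weighted one answers "what share of listings belong to multi-listing hosts", which is the question posed here.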

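# join neighbourhood-level entire home shares onto the boundary polygons;
# neighbourhoods outside the five study districts have no listings and get NA
neighbourhoods_filtered <- neighbourhoods |>
  left_join(
    listings |>
      group_by(neighbourhood) |>
      # percentage of each neighbourhood's listings that are entire homes
      summarize(pct_entire_home = mean(is_entire_home == "Entire Home") * 100) |>
      # convert the factor key to character to match the GeoJSON column
      mutate(neighbourhood = as.character(neighbourhood)),
    by = "neighbourhood"
  )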

# create choropleth map
ggplot(neighbourhoods_filtered) +
  geom_sf(aes(fill = pct_entire_home)) +
  scale_fill_viridis_c(option = "plasma",
                       name = "% Entire Home") +
  labs(
    title = "Concentration of Entire Home Airbnb Listings in Barcelona",
    subtitle = "Higher % suggests greater displacement pressure on long-term residents"
  ) +
  theme_minimal()

The map above provides a spatial visualization of Airbnb’s impact on Barcelona’s housing market. Neighbourhoods shaded toward the yellow end of the plasma scale have a higher concentration of entire home listings and therefore face greater displacement pressure. Neighbourhoods outside the five study districts have no listings after filtering and are drawn in grey. The map reveals clear geographic clustering: certain neighbourhoods within Ciutat Vella and Eixample stand out as hotspots, since they are well-known tourist destinations. This geographic concentration is important because gentrification can spread outward from such core areas over time, gradually pushing long-term residents further from the city center.
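The map depends on left-join semantics. Every boundary polygon is kept, and neighbourhoods with no listings in the filtered data receive NA. The base-R sketch below shows the same behaviour with ‘merge(..., all.x = TRUE)’. It uses hypothetical plain data frames standing in for the GeoJSON attributes, and the share values are made up for illustration.

```r
# Hypothetical stand-ins (NOT the real GeoJSON or results): three boundary rows,
# but shares for only two neighbourhoods because listings were filtered by district
boundaries <- data.frame(
  neighbourhood       = c("el Raval", "la Sagrada Família", "les Corts"),
  neighbourhood_group = c("Ciutat Vella", "Eixample", "Les Corts")
)
shares <- data.frame(
  neighbourhood   = c("el Raval", "la Sagrada Família"),
  pct_entire_home = c(55, 72)   # made-up illustrative values
)

# all.x = TRUE keeps every boundary row, like dplyr::left_join();
# rows without a match get NA, which ggplot2 draws in its NA colour
joined <- merge(boundaries, shares, by = "neighbourhood", all.x = TRUE)
print(joined)
```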

5 Summary

5.1 Problem Statement

This analysis sought to investigate the extent to which Airbnb contributes to gentrification pressures in five of Barcelona’s most prominent districts: Ciutat Vella, Eixample, Gràcia, Sant Martí, and Sants-Montjuïc. Specifically, I asked the following research questions:

  • Which districts have the highest concentration of entire home Airbnb listings, suggesting the greatest displacement pressure on long-term residents?
  • To what extent are Airbnb listings in Barcelona operated by commercial hosts rather than residents occasionally renting their own homes?
  • Are certain neighbourhoods within these districts disproportionately impacted by the concentration of short-term rentals?

5.2 Methodology

To address these research questions, I imported and cleaned listing data from Inside Airbnb (insideairbnb.com), a mission-driven project that collects and publishes data about Airbnb’s impact on residential communities. The dataset contained 18,177 listings across Barcelona, which I filtered to the 15,727 listings in my five districts of interest. After cleaning and tidying the data using tidyverse tools, I created several new variables to support my analysis. These include is_entire_home to flag entire home listings, is_commercial to identify listings available for more than 300 days per year, and is_multi_host to identify listings whose host operates more than one listing. I then performed exploratory data analysis using summary tables, bar plots, stacked bar charts, and choropleth maps to visualize patterns across districts and neighbourhoods.

5.3 Interesting Insights

The analysis revealed several noteworthy findings:

  • Entire home listings make up the majority of listings in all five districts, from 60.7% in Ciutat Vella to 70.3% in Sant Martí, and the neighbourhood map points to individual neighbourhoods with even higher concentrations. This indicates that most Airbnb activity in these areas involves converting residential housing into tourist accommodation rather than occasional home sharing.
  • Eixample and Ciutat Vella hold the largest numbers of entire home listings (4,479 and 2,499) and the highest shares of listings from multi-listing hosts (84.1% and 77.2%). Eixample also has the highest share of listings available more than 300 days a year (39.7%). This aligns with their status as Barcelona’s most visited tourist districts, home to attractions like the Mercat de la Boqueria (Ciutat Vella) and the Sagrada Familia (Eixample).
  • Between 68.1% and 84.1% of listings in each district belong to multi-listing hosts, suggesting that commercial property management companies, rather than individual residents, are driving much of Barcelona’s Airbnb market.

5.4 Implications

These findings have important implications for several groups:

  • City planners and policymakers should consider targeted regulations in high-concentration neighbourhoods, such as caps on entire home listings or licensing requirements for commercial operators, to protect the supply of long-term rental housing.
  • Long-term residents in Ciutat Vella and Eixample face the greatest risk of displacement as housing is converted to short-term tourist accommodation, driving up rents and reducing housing availability.
  • Housing advocates can use geographic concentration data to prioritize which neighbourhoods need the most urgent intervention and support.
  • Tourists should be aware that their Airbnb bookings, particularly entire home rentals in central districts, may be contributing to housing insecurity for Barcelona’s long-term residents.

5.5 Limitations and Future Improvements

There are several important limitations to acknowledge in this analysis:

  • Price data was entirely missing from my dataset, which significantly limited my ability to analyze the financial impact of Airbnb on housing costs. A more complete dataset including nightly prices would allow for a much richer analysis of gentrification pressure.
  • The dataset is a snapshot in time and does not capture how the Airbnb market has changed over the years. Incorporating the reviews dataset to track listing activity over time would allow for a longitudinal analysis showing how gentrification pressure has grown since Airbnb entered Barcelona’s market.
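  • The commercial listing threshold of 300+ days of availability is an approximation. Because ‘availability_365’ counts the days a listing is open for booking in the next year rather than the days it is actually booked, a more precise definition of commercial activity would require additional data such as booking frequency.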
  • No external demographic data was merged, meaning I could not directly identify which neighbourhoods are low-income or most vulnerable to gentrification. Future work could merge this data with Census income data or housing price data from Barcelona’s open data portal to strengthen the gentrification narrative.
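One way to gauge how much the findings depend on the 300-day cutoff is a simple sensitivity check that recomputes the commercial share under several thresholds. The base-R sketch below uses a hypothetical vector of availability values. In the real analysis, the vector would be ‘listings$availability_365’, and the candidate thresholds would be chosen by the analyst.

```r
# Hypothetical availability values (days open for booking in the next 365 days);
# the real analysis would use listings$availability_365 instead
availability_365 <- c(0, 45, 120, 250, 290, 300, 310, 340, 360, 365)

# candidate cutoffs, including the report's 300-day threshold
thresholds <- c(180, 240, 300, 330)

# share of listings flagged as "commercial" under each cutoff (strictly greater than)
pct_commercial <- sapply(thresholds, function(t) mean(availability_365 > t) * 100)
names(pct_commercial) <- paste0(">", thresholds, " days")
print(pct_commercial)
```

If the district ranking stays the same across reasonable cutoffs, the conclusions are less sensitive to the exact threshold.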