This is one of the datasets published by Inside Airbnb for Barcelona, specifically listings, which contains the properties offered for rent. The data were scraped on 20 March 2024. There are 18,519 properties in total, with no missing values in the property type, latitude, or longitude fields, so every property can be mapped and classified by type.
One important caveat: the data cover only the municipality of Barcelona. L'Hospitalet de Llobregat, Sant Adrià de Besòs, Badalona, and the other neighbouring municipalities are not included.
The data can be reviewed here: https://insideairbnb.com/barcelona/
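library(tidyverse)  # dplyr, tidyr and ggplot2 are used below
library(pander)     # pander() renders the tables
url <- "https://raw.githubusercontent.com/juancarraha/mbdds/main/listings.csv"
data <- read.csv(url, sep = ",")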
barrios <- data %>%
  count(neighbourhood_group_cleansed, sort = TRUE) %>%
  mutate(percentage = round(n / sum(n) * 100, 2))
pander(barrios)
| neighbourhood_group_cleansed | n | percentage |
|---|---|---|
| Eixample | 6704 | 36.2 |
| Ciutat Vella | 4266 | 23.04 |
| Sants-Montjuïc | 1933 | 10.44 |
| Sant Martí | 1684 | 9.09 |
| Gràcia | 1556 | 8.4 |
| Sarrià-Sant Gervasi | 974 | 5.26 |
| Horta-Guinardó | 529 | 2.86 |
| Les Corts | 379 | 2.05 |
| Sant Andreu | 288 | 1.56 |
| Nou Barris | 206 | 1.11 |
The original dataset has 75 variables, most of which are fully populated. The following table shows the percentage of missing or empty values in each one.
na_percentages <- data %>%
  summarise(across(everything(), ~ mean(is.na(.x) | .x == "") * 100)) %>%
  pivot_longer(everything(), names_to = "variable", values_to = "na_percentage") %>%
  mutate(na_percentage = round(na_percentage, 2))
pander(na_percentages)
| variable | na_percentage |
|---|---|
| id | 0 |
| listing_url | 0 |
| scrape_id | 0 |
| last_scraped | 0 |
| source | 0 |
| name | 0 |
| description | 3.44 |
| neighborhood_overview | 46.74 |
| picture_url | 0 |
| host_id | 0 |
| host_url | 0 |
| host_name | 0.01 |
| host_since | 0.01 |
| host_location | 23.27 |
| host_about | 36.71 |
| host_response_time | 0.01 |
| host_response_rate | 0.01 |
| host_acceptance_rate | 0.01 |
| host_is_superhost | 0.81 |
| host_thumbnail_url | 0.01 |
| host_picture_url | 0.01 |
| host_neighbourhood | 50.46 |
| host_listings_count | 0.01 |
| host_total_listings_count | 0.01 |
| host_verifications | 0 |
| host_has_profile_pic | 0.01 |
| host_identity_verified | 0.01 |
| neighbourhood | 46.74 |
| neighbourhood_cleansed | 0 |
| neighbourhood_group_cleansed | 0 |
| latitude | 0 |
| longitude | 0 |
| property_type | 0 |
| room_type | 0 |
| accommodates | 0 |
| bathrooms | 20.79 |
| bathrooms_text | 0.06 |
| bedrooms | 10.8 |
| beds | 21.22 |
| amenities | 0 |
| price | 20.86 |
| minimum_nights | 0 |
| maximum_nights | 0 |
| minimum_minimum_nights | 0 |
| maximum_minimum_nights | 0 |
| minimum_maximum_nights | 0 |
| maximum_maximum_nights | 0 |
| minimum_nights_avg_ntm | 0 |
| maximum_nights_avg_ntm | 0 |
| calendar_updated | 100 |
| has_availability | 5.71 |
| availability_30 | 0 |
| availability_60 | 0 |
| availability_90 | 0 |
| availability_365 | 0 |
| calendar_last_scraped | 0 |
| number_of_reviews | 0 |
| number_of_reviews_ltm | 0 |
| number_of_reviews_l30d | 0 |
| first_review | 25.69 |
| last_review | 25.69 |
| review_scores_rating | 25.68 |
| review_scores_accuracy | 25.7 |
| review_scores_cleanliness | 25.69 |
| review_scores_checkin | 25.71 |
| review_scores_communication | 25.69 |
| review_scores_location | 25.71 |
| review_scores_value | 25.71 |
| license | 32.5 |
| instant_bookable | 0 |
| calculated_host_listings_count | 0 |
| calculated_host_listings_count_entire_homes | 0 |
| calculated_host_listings_count_private_rooms | 0 |
| calculated_host_listings_count_shared_rooms | 0 |
| reviews_per_month | 25.69 |
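The percentages above count both `NA` and empty strings as missing. As a minimal illustration of that rule on made-up data (the toy data frame and the helper name `pct_missing` are assumptions, not part of the original analysis), the same calculation can be written in base R:

```r
# Toy data: 'host_about' mixes NA and empty strings; 'price' only has NA.
toy <- data.frame(
  host_about = c("Hi!", "", NA, "Local host"),
  price      = c("$80.00", NA, "$120.00", "$95.00"),
  stringsAsFactors = FALSE
)

# Share of values (in %) that are NA or an empty string.
# For an NA entry, `x == ""` is NA, but TRUE | NA evaluates to TRUE,
# so NA values are still counted as missing.
pct_missing <- function(x) {
  round(mean(is.na(x) | x == "") * 100, 2)
}

missing_share <- sapply(toy, pct_missing)
print(missing_share)
```

In this toy example `host_about` has two missing entries out of four and `price` has one, which is the kind of distinction the table is meant to surface.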
Several property types are recorded. The Airbnb website includes guides and forums that discuss how each property type is defined, along with suggestions for how hosts should classify their listings.
property_type_counts <- data %>%
  group_by(property_type) %>%
  summarise(frequency = n()) %>%
  arrange(desc(frequency))
pander(property_type_counts)
| property_type | frequency |
|---|---|
| Entire rental unit | 9681 |
| Private room in rental unit | 5674 |
| Entire serviced apartment | 465 |
| Room in hotel | 402 |
| Entire condo | 395 |
| Entire loft | 289 |
| Private room in condo | 213 |
| Private room in hostel | 199 |
| Room in boutique hotel | 193 |
| Private room in home | 187 |
| Private room in bed and breakfast | 91 |
| Private room in casa particular | 80 |
| Shared room in hostel | 80 |
| Entire home | 78 |
| Shared room in rental unit | 41 |
| Private room in guest suite | 39 |
| Private room in serviced apartment | 39 |
| Private room in loft | 35 |
| Entire vacation home | 30 |
| Room in hostel | 30 |
| Private room | 24 |
| Entire villa | 23 |
| Entire guest suite | 21 |
| Private room in guesthouse | 20 |
| Entire guesthouse | 19 |
| Entire townhouse | 19 |
| Room in serviced apartment | 18 |
| Boat | 14 |
| Private room in floor | 11 |
| Private room in townhouse | 10 |
| Camper/RV | 9 |
| Tiny home | 9 |
| Room in aparthotel | 8 |
| Casa particular | 7 |
| Entire place | 6 |
| Private room in chalet | 6 |
| Private room in dome | 6 |
| Private room in vacation home | 4 |
| Shared room in bed and breakfast | 4 |
| Shared room in home | 4 |
| Shared room in villa | 4 |
| Entire chalet | 3 |
| Private room in boat | 3 |
| Room in bed and breakfast | 3 |
| Private room in villa | 2 |
| Shared room | 2 |
| Shared room in guesthouse | 2 |
| Shared room in hotel | 2 |
| Shared room in loft | 2 |
| Yurt | 2 |
| Barn | 1 |
| Earthen home | 1 |
| Entire cabin | 1 |
| Hut | 1 |
| Private room in tiny home | 1 |
| Private room in tower | 1 |
| Shared room in boutique hotel | 1 |
| Shared room in condo | 1 |
| Shared room in floor | 1 |
| Shared room in guest suite | 1 |
| Shared room in serviced apartment | 1 |
Here I have broadly recoded the property types into the three general categories that Airbnb recognises:
Entire place
Private room
Shared room
The source can be consulted at the following link: https://www.airbnb.es/help/article/317
data <- data %>%
  mutate(accommodation_type = case_when(
    grepl("^Entire|^Whole|^Complete", property_type, ignore.case = TRUE) ~ "Entire place",
    grepl("^Private room|^Room in|^Casa particular|^Yurt|^Barn|^Earthen home|^Hut", property_type, ignore.case = TRUE) ~ "Private room",
    grepl("^Shared", property_type, ignore.case = TRUE) ~ "Shared room",
    TRUE ~ "Other" # Fallback for property types not matched above
  ))
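To see which property types these patterns leave unclassified, the same rules can be tried on a handful of labels. This is only a base-R sketch of equivalent logic on a small hand-picked vector; the helper name `recode_type` is hypothetical and is not part of the analysis.

```r
# Same three patterns as the recoding step, applied in the same priority order.
recode_type <- function(property_type) {
  is_entire  <- grepl("^Entire|^Whole|^Complete", property_type, ignore.case = TRUE)
  is_private <- grepl("^Private room|^Room in|^Casa particular|^Yurt|^Barn|^Earthen home|^Hut",
                      property_type, ignore.case = TRUE)
  is_shared  <- grepl("^Shared", property_type, ignore.case = TRUE)
  ifelse(is_entire, "Entire place",
         ifelse(is_private, "Private room",
                ifelse(is_shared, "Shared room", "Other")))
}

# A few labels taken from the property type table.
types <- c("Entire rental unit", "Private room in condo", "Room in hotel",
           "Shared room in hostel", "Boat", "Camper/RV", "Tiny home",
           "Casa particular")

recoded   <- recode_type(types)
unmatched <- types[recoded == "Other"]
print(unmatched)
```

Labels that come back as "Other" are the ones that need a manual assignment.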
The result of the recoding can be improved. Some adjustments I propose are the following:
data$accommodation_type[data$property_type == "Entire guest suite"] <- "Private room"
data$accommodation_type[data$property_type %in% c("Casa particular", "Earthen home", "Boat", "Camper/RV", "Tiny home")] <- "Entire place"
cross_tab <- data %>%
  count(accommodation_type, property_type) %>%
  arrange(accommodation_type, desc(n))
pander(cross_tab)
| accommodation_type | property_type | n |
|---|---|---|
| Entire place | Entire rental unit | 9681 |
| Entire place | Entire serviced apartment | 465 |
| Entire place | Entire condo | 395 |
| Entire place | Entire loft | 289 |
| Entire place | Entire home | 78 |
| Entire place | Entire vacation home | 30 |
| Entire place | Entire villa | 23 |
| Entire place | Entire guesthouse | 19 |
| Entire place | Entire townhouse | 19 |
| Entire place | Boat | 14 |
| Entire place | Camper/RV | 9 |
| Entire place | Tiny home | 9 |
| Entire place | Casa particular | 7 |
| Entire place | Entire place | 6 |
| Entire place | Entire chalet | 3 |
| Entire place | Earthen home | 1 |
| Entire place | Entire cabin | 1 |
| Private room | Private room in rental unit | 5674 |
| Private room | Room in hotel | 402 |
| Private room | Private room in condo | 213 |
| Private room | Private room in hostel | 199 |
| Private room | Room in boutique hotel | 193 |
| Private room | Private room in home | 187 |
| Private room | Private room in bed and breakfast | 91 |
| Private room | Private room in casa particular | 80 |
| Private room | Private room in guest suite | 39 |
| Private room | Private room in serviced apartment | 39 |
| Private room | Private room in loft | 35 |
| Private room | Room in hostel | 30 |
| Private room | Private room | 24 |
| Private room | Entire guest suite | 21 |
| Private room | Private room in guesthouse | 20 |
| Private room | Room in serviced apartment | 18 |
| Private room | Private room in floor | 11 |
| Private room | Private room in townhouse | 10 |
| Private room | Room in aparthotel | 8 |
| Private room | Private room in chalet | 6 |
| Private room | Private room in dome | 6 |
| Private room | Private room in vacation home | 4 |
| Private room | Private room in boat | 3 |
| Private room | Room in bed and breakfast | 3 |
| Private room | Private room in villa | 2 |
| Private room | Yurt | 2 |
| Private room | Barn | 1 |
| Private room | Hut | 1 |
| Private room | Private room in tiny home | 1 |
| Private room | Private room in tower | 1 |
| Shared room | Shared room in hostel | 80 |
| Shared room | Shared room in rental unit | 41 |
| Shared room | Shared room in bed and breakfast | 4 |
| Shared room | Shared room in home | 4 |
| Shared room | Shared room in villa | 4 |
| Shared room | Shared room | 2 |
| Shared room | Shared room in guesthouse | 2 |
| Shared room | Shared room in hotel | 2 |
| Shared room | Shared room in loft | 2 |
| Shared room | Shared room in boutique hotel | 1 |
| Shared room | Shared room in condo | 1 |
| Shared room | Shared room in floor | 1 |
| Shared room | Shared room in guest suite | 1 |
| Shared room | Shared room in serviced apartment | 1 |
Most properties are entire places. Even so, private rooms account for close to 40%, and shared rooms for a very small share (under 1%). The categories still to be recoded cover only a small number of properties, so they should not change this distribution significantly.
I suggest we consider creating another variable that distinguishes formal accommodation services, such as hotels and hostels, from accommodation in housing units not designed for overnight stays by tourists or a transient population. The main purpose would be to tell these listings apart from facilities such as hotels if we decide to bring those in through OpenStreetMap.
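One possible way to build such a variable is a keyword match on `property_type`. The following is a rough sketch under stated assumptions: the keyword list, the helper `is_formal`, and the labels "Formal" and "Housing unit" are all hypothetical choices that would need to be agreed on before use.

```r
# Keywords treated as "formal" accommodation. This list is an assumption.
formal_keywords <- c("hotel", "hostel", "bed and breakfast", "serviced apartment")
formal_pattern  <- paste(formal_keywords, collapse = "|")

# TRUE when the property type mentions any of the keywords.
# "hotel" also matches "aparthotel" and "boutique hotel".
is_formal <- function(property_type) {
  grepl(formal_pattern, property_type, ignore.case = TRUE)
}

# A few labels taken from the property type table.
types <- c("Room in hotel", "Shared room in hostel", "Room in aparthotel",
           "Entire rental unit", "Private room in home")

formal_flag <- ifelse(is_formal(types), "Formal", "Housing unit")
print(data.frame(property_type = types, formal_flag = formal_flag))
```

Whether categories such as serviced apartments or bed and breakfasts belong on the formal side is a judgment call, so the keyword list is the part most likely to change.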
tipos <- data %>%
  count(accommodation_type, sort = TRUE) %>%
  mutate(percentage = round(n / sum(n) * 100, 2))
pander(tipos)
| accommodation_type | n | percentage |
|---|---|---|
| Entire place | 11049 | 59.66 |
| Private room | 7324 | 39.55 |
| Shared room | 146 | 0.79 |
Another interesting pair of variables is the minimum and maximum number of nights. Minimum nights takes extremely high values in some cases, and I think it is worth considering removing the outliers and certain very high values. Maximum nights also takes high values, but there that makes more sense.
ggplot(data, aes(x = minimum_nights)) +
  geom_histogram(binwidth = 5, fill = "blue", color = "black", alpha = 0.7) +
  labs(title = "Histogram of minimum nights", x = "Minimum nights", y = "Frequency") +
  theme_minimal()
ggplot(data, aes(y = minimum_nights)) +
  geom_boxplot(fill = "blue", color = "black", alpha = 0.7) +
  facet_wrap(~ accommodation_type) +
  labs(title = "Boxplot of minimum nights", y = "Minimum nights") +
  theme_minimal()
# Values that boxplot.stats() flags as outliers (these are the values themselves, not the cutoffs)
outlier_values <- boxplot.stats(data$minimum_nights)$out
cat("Values flagged as outliers:", outlier_values, "\n")
## Values flagged as outliers: 120 90 180 90 300 186 120 180 180 364 119 90 180 120 90 1124 100 180 180 100 90 360 90 865 100 179 900 90 90 90 90 300 100 150 90 100 100 300 360 360 210 90 80 90 90 100 88 300 1124 360 90 80 90 90 90 365 180 150 90 500 100 90 300 90 120 180 300 90 90 180 90 300 310 90 90 90 90 250 80 90 90 90 86 90 1000 100 100 120 900 120 91 90 90 1000 91 365 90 100 1000 1000 1000 360 90 180 90 180 90 90 365 180 333 365 300 180 90 179 200 150 90 92 100 92 364 120 90 150 90 360 365 365 365 365 365 365 365 365 365 150 180 365 365 365 80 365 99 99 99 99 90 90 150 92 120 120 120 90 120 90 280 190 91 92 92 90 180 120 120 180 90 90 90 90 90 92 90 365 365 365 365 180 90 90 90 365 90 100 80 90
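`boxplot.stats()$out` returns the flagged values themselves rather than the cutoffs. As a minimal sketch on a made-up vector, the cutoffs can be reproduced from the hinges that `fivenum()` returns, using the default coefficient of 1.5; the helper name `outlier_fences` is hypothetical.

```r
# Lower and upper fences: hinges minus/plus coef times the hinge spread.
# This mirrors the rule boxplot.stats() applies with its default coef = 1.5.
outlier_fences <- function(x, coef = 1.5) {
  hinges <- stats::fivenum(x, na.rm = TRUE)[c(2, 4)]
  spread <- diff(hinges)
  c(lower = hinges[1] - coef * spread,
    upper = hinges[2] + coef * spread)
}

# Made-up minimum-night values with two obvious extremes.
x <- c(1, 1, 2, 2, 3, 3, 4, 5, 30, 90)

fences  <- outlier_fences(x)
flagged <- x[x < fences["lower"] | x > fences["upper"]]

print(fences)
print(flagged)
```

On the real column, the upper fence is the single number that separates ordinary stays from the values listed above.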
ggplot(data, aes(x = maximum_nights)) +
  geom_histogram(binwidth = 5, fill = "blue", color = "black", alpha = 0.7) +
  labs(title = "Histogram of maximum nights", x = "Maximum nights", y = "Frequency") +
  theme_minimal()
ggplot(data, aes(y = maximum_nights)) +
  geom_boxplot(fill = "blue", color = "black", alpha = 0.7) +
  facet_wrap(~ accommodation_type) +
  labs(title = "Boxplot of maximum nights", y = "Maximum nights") +
  theme_minimal()
The dataset also includes information on hosts and on whether they have more than one listing. This makes it possible to spot cases where a single building contains several units rented out on Airbnb, or where a host has many properties spread across the city. Looking through the host_id values, I found cases that are businesses or companies dedicated to holiday rentals. This could be of interest later if we look more closely at touristification. For now I do not think it is crucial to work with these variables, but it is worth knowing that the dataset allows these questions to be explored.
I exclude listings with a minimum stay of 80 nights or more, which are the values flagged as outliers.
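As a rough idea of how multi-listing hosts could be identified later, the sketch below counts listings per `host_id` on a made-up data frame. The toy values and the column name `n_listings` are assumptions; the real dataset also ships `calculated_host_listings_count`, which could be used instead.

```r
# Made-up listings: host 10 has three, host 30 has two, the rest have one.
toy <- data.frame(
  id      = 1:7,
  host_id = c(10, 10, 10, 20, 30, 30, 40)
)

# Count listings per host and sort from most to fewest.
listings_per_host <- aggregate(id ~ host_id, data = toy, FUN = length)
names(listings_per_host)[2] <- "n_listings"
listings_per_host <- listings_per_host[order(-listings_per_host$n_listings), ]

# Hosts with more than one listing.
multi_hosts <- listings_per_host[listings_per_host$n_listings > 1, ]
print(multi_hosts)
```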
airbnb_data <- data[data$minimum_nights < 80, c("id", "price", "accommodation_type", "latitude", "longitude")]
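The nightly price is taken to be in US dollars, so it is converted to euros at the exchange rate for the day the data were extracted. The Inside Airbnb data dictionary describes price as being in local currency, so this assumption is worth confirming before relying on the converted values.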
# Strip the currency symbol and thousands separators before converting to numeric
airbnb_data$price <- as.numeric(gsub("[$,]", "", airbnb_data$price))
tipo_cambio <- 0.9148 # USD to EUR exchange rate on 20/03/2024
airbnb_data$precio_euro <- airbnb_data$price * tipo_cambio
write.csv(airbnb_data, "airbnb_data.csv", row.names = FALSE)
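Price strings in this kind of export can carry a thousands separator in addition to the currency symbol. The sketch below, on made-up values, compares removing only the dollar sign with removing both characters; the helper name `parse_price` is hypothetical.

```r
# Remove "$" and "," and convert to numeric; empty strings become NA.
parse_price <- function(price_text) {
  as.numeric(gsub("[$,]", "", price_text))
}

# Made-up price strings, including one with a thousands separator.
raw_prices <- c("$85.00", "$1,200.00", "")

# Removing only the dollar sign leaves "1,200.00", which cannot be coerced.
naive  <- suppressWarnings(as.integer(sub("\\$", "", raw_prices)))
parsed <- parse_price(raw_prices)

print(data.frame(raw_prices, naive, parsed))
```

With the dollar sign alone removed, the listing priced above one thousand is lost as `NA`, so the most expensive listings would silently drop out of any price summary.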