“Every one soon or late comes round by Rome.” - Robert Browning
Rome has beckoned travelers from afar for centuries. If Italy represents romance, Rome stands for intimacy: between its glorious past and urban present, and between its spellbinding art and inspiring culture. No matter how many trips you take, there will always be more to Rome. Unsurprisingly, the city receives millions of tourists each year; it is the 11th most visited city in the world and the 3rd most visited in Europe.
The purpose of this project is to perform exploratory analysis on the AirBnB rental data for Rome and understand the following:
After all, who doesn’t want to travel the world with minimum cost, if not for free?
Since 2008, guests and hosts have used Airbnb to travel in a more unique, personalized way. As part of the Inside Airbnb initiative, this dataset describes the listing activity of homestays in Rome, Italy and is sourced from publicly available information from the Airbnb site.
Compiled up to 8th May 2017, the following Airbnb activity is included in this Rome dataset:
R will be used for data analysis and visualization to identify the most affordable neighbourhoods for a solo or family trip. Listings will be categorized by price and by the number of guests they accommodate, then compared across room type, property type and bed type.
If you are a traveler or planning your next vacation to this exotic city, it will help you find the best neighborhoods to stay in. If not, it might just inspire you to pack your bags!
To perform the analysis to the best of my abilities, I will be using the following R packages:
library(tidyverse)
library(dplyr)
library(stringr)
library(knitr)
library(ggmap)
library(DT)
library(plotly)
library(gridExtra)
Inside Airbnb is an independent, non-commercial set of tools and data that allows you to explore how Airbnb is really being used in cities around the world.
By analyzing publicly available information about a city’s Airbnb listings, Inside Airbnb provides filters and key metrics so you can see how Airbnb is being used to compete with the residential housing market.
The data behind the Inside Airbnb site is sourced from publicly available information from the Airbnb site.
Link to the data is available here
Loading the datasets
listings <- read.csv("data/listings.csv", na.strings=c(""," ","NA"))
Dimensions of the datasets
listings_dim <- dim(listings)
Column names of the datasets
names(listings)
## [1] "id" "listing_url"
## [3] "scrape_id" "last_scraped"
## [5] "name" "summary"
## [7] "space" "description"
## [9] "experiences_offered" "neighborhood_overview"
## [11] "notes" "transit"
## [13] "access" "interaction"
## [15] "house_rules" "thumbnail_url"
## [17] "medium_url" "picture_url"
## [19] "xl_picture_url" "host_id"
## [21] "host_url" "host_name"
## [23] "host_since" "host_location"
## [25] "host_about" "host_response_time"
## [27] "host_response_rate" "host_acceptance_rate"
## [29] "host_is_superhost" "host_thumbnail_url"
## [31] "host_picture_url" "host_neighbourhood"
## [33] "host_listings_count" "host_total_listings_count"
## [35] "host_verifications" "host_has_profile_pic"
## [37] "host_identity_verified" "street"
## [39] "neighbourhood" "neighbourhood_cleansed"
## [41] "neighbourhood_group_cleansed" "city"
## [43] "state" "zipcode"
## [45] "market" "smart_location"
## [47] "country_code" "country"
## [49] "latitude" "longitude"
## [51] "is_location_exact" "property_type"
## [53] "room_type" "accommodates"
## [55] "bathrooms" "bedrooms"
## [57] "beds" "bed_type"
## [59] "amenities" "square_feet"
## [61] "price" "weekly_price"
## [63] "monthly_price" "security_deposit"
## [65] "cleaning_fee" "guests_included"
## [67] "extra_people" "minimum_nights"
## [69] "maximum_nights" "calendar_updated"
## [71] "has_availability" "availability_30"
## [73] "availability_60" "availability_90"
## [75] "availability_365" "calendar_last_scraped"
## [77] "number_of_reviews" "first_review"
## [79] "last_review" "review_scores_rating"
## [81] "review_scores_accuracy" "review_scores_cleanliness"
## [83] "review_scores_checkin" "review_scores_communication"
## [85] "review_scores_location" "review_scores_value"
## [87] "requires_license" "license"
## [89] "jurisdiction_names" "instant_bookable"
## [91] "cancellation_policy" "require_guest_profile_picture"
## [93] "require_guest_phone_verification" "calculated_host_listings_count"
## [95] "reviews_per_month"
**Taking a subset of the data for easier analysis**
A first glance at the dataset shows that it contains many irrelevant and redundant columns. Columns such as “host_picture_url” and “host_name” will not help with this analysis, so we keep only a selected subset of columns.
listings_sub <- listings %>%
select(latitude, longitude,neighbourhood_cleansed, price,accommodates,room_type, property_type, bed_type)
**A look at the first few rows of the data set**
datatable(head(listings_sub, n=10))
Checking for null values
sum(is.na(listings_sub))
## [1] 0
We see that there are no null values present for any variable.
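A single total does not show which columns would be affected if values were missing. The following is a minimal sketch on a made-up data frame (the `toy` values are assumptions, not taken from the Rome data) that counts missing values per column using base R only:

```r
# Toy data frame with a few missing values (illustrative only)
toy <- data.frame(
  price        = c("$58.00", NA, "$150.00"),
  accommodates = c(2, 4, NA),
  room_type    = c("Private room", "Entire home/apt", "Shared room")
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Total across the whole data frame, equivalent to sum(is.na(toy))
total_na <- sum(na_per_column)
print(total_na)
```

A per-column view like this makes it easier to decide whether to drop a column or impute it.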
Since price is an important variable in our analysis, we start with some initial analysis and manipulation of it.
Changing the type of the variable
First we need to fix up the price variable, which is stored as a factor whose labels contain dollar signs, decimal points, and commas.
listings_sub$price[1:5]
## [1] $58.00 $50.00 $150.00 $96.00 $89.00
## 410 Levels: $1,000.00 $1,050.00 $1,400.00 $1,500.00 $1,540.00 ... $999.00
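Changing the price variable to type numeric. Calling as.numeric() directly on a factor returns its internal level codes (1 to 410 here) rather than the prices, which is how the figures reported in the rest of this analysis were produced. The conversion below goes through character and strips the dollar sign and thousands separator first, so re-running it will give different numbers.
# as.character() recovers the labels; gsub() removes "$" and ","
listings_sub$price <- as.numeric(gsub("[$,]", "", as.character(listings_sub$price)))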
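Converting a factor to numeric is a common source of silent errors in R. The sketch below uses a made-up three-value factor (an assumption for illustration, formatted like the price labels shown earlier) to contrast the level codes with the parsed amounts:

```r
# Toy factor in the same "$1,050.00" style as the price column (made-up values)
price <- factor(c("$58.00", "$1,050.00", "$150.00"))

# Direct conversion returns the position of each value among the factor levels
codes <- as.numeric(price)

# Convert to character, drop "$" and ",", then convert to numeric
clean <- as.numeric(gsub("[$,]", "", as.character(price)))

print(codes)
print(clean)
```

The first vector only ranks the labels alphabetically; the second holds the actual amounts.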
Summary of the variable
summary(listings_sub$price)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.0 151.0 304.0 249.5 354.0 410.0
Listing the top 10 neighbourhoods with the maximum listings
listings_sub %>%
group_by(neighbourhood_cleansed) %>%
select(neighbourhood_cleansed) %>%
summarise(count = n())%>%
arrange(desc(count)) %>%
top_n(10, count) %>%
ggplot()+
geom_bar(mapping = aes(x=reorder(neighbourhood_cleansed, count),
y=count),
stat="identity", fill = "light blue") +
coord_flip() +
labs(title="Top 10 neighbourhoods with maximum listings",
x="Neighbourhoods", y="Number of listings") +
theme_minimal()
The neighbourhood of I Centro Storico has the most listings, with close to 12,000 up for rent. Centro Storico is the historical center of this Italian city, which is where most visitors want to spend their time!
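The count-then-rank step behind a chart like this can be checked without any plotting. Here is a minimal base-R sketch, assuming a made-up vector of neighbourhood labels (`"A"`, `"B"`, `"C"` are placeholders, not real neighbourhoods):

```r
# Made-up neighbourhood labels, one per listing
neighbourhood <- c("A", "B", "A", "C", "A", "B")

# Tabulate, then sort from most to fewest listings
counts <- sort(table(neighbourhood), decreasing = TRUE)

# Keep the two most common neighbourhoods
top2 <- head(counts, 2)
print(top2)
```

Printing the sorted table first is a quick sanity check before the same counts are passed to a bar chart.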
Which price range contains the most listings?
listings_sub %>%
count(cut_width(price, 10)) %>%
arrange(desc(n)) %>%
top_n(1, n)
## # A tibble: 1 x 2
## `cut_width(price, 10)` n
## <fctr> <int>
## 1 (385,395] 1757
The most populated bin is the 385 to 395 price range, which contains 1,757 listings.
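Fixed-width binning can also be sketched in base R with explicit break points. The example below is an approximation under stated assumptions: the prices are made up, and the breaks start at 0, whereas `cut_width()` chooses its own bin boundaries.

```r
# Made-up prices (illustrative only)
price <- c(12, 18, 21, 25, 29, 47)

# Bins of width 10: (0,10], (10,20], ..., (40,50]
breaks <- seq(0, 50, by = 10)
bins <- cut(price, breaks = breaks)

# Count listings per bin and pick the fullest one
bin_counts <- table(bins)
fullest <- names(bin_counts)[which.max(bin_counts)]
print(bin_counts)
print(fullest)
```

Setting the breaks explicitly makes the bin edges predictable, which helps when the result is quoted in prose.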
Finding the most expensive rental in Rome
listings_sub %>%
filter(price == max(listings_sub$price)) %>%
select(neighbourhood_cleansed ,property_type,price)
## neighbourhood_cleansed property_type price
## 1 XII Monte Verde Apartment 410
Interestingly, this listing has a pool and is located close to the Vatican. To check the url of this location, click here
Listing the neighbourhoods that have listings priced between the 25th and 75th percentiles
listings_sub %>%
# quantile() is asked for the quartiles explicitly rather than by position
filter(price >= quantile(price, 0.25) & price <= quantile(price, 0.75)) %>%
distinct(neighbourhood_cleansed)
distinct(neighbourhood_cleansed)
## neighbourhood_cleansed
## 1 I Centro Storico
## 2 III Monte Sacro
## 3 IV Tiburtina
## 4 VI Roma delle Torri
## 5 VII San Giovanni/Cinecittà
## 6 VIII Appia Antica
## 7 IX Eur
## 8 X Ostia/Acilia
## 9 XI Arvalia/Portuense
## 10 XII Monte Verde
## 11 XIII Aurelia
## 12 XIV Monte Mario
## 13 XV Cassia/Flaminia
## 14 II Parioli/Nomentano
## 15 V Prenestino/Centocelle
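By default `quantile()` returns five values (0%, 25%, 50%, 75%, 100%), so picking entries by position is easy to get wrong. A small sketch with made-up prices (an assumption for illustration) that requests the quartiles by probability and reads them by name:

```r
# Made-up prices (illustrative only)
price <- c(10, 20, 30, 40, 50, 60, 70, 80, 90)

# The default call returns five named values
default_names <- names(quantile(price))
print(default_names)

# Ask only for the 25th and 75th percentiles
q <- quantile(price, probs = c(0.25, 0.75))

# Keep the prices that fall inside the interquartile range
middle <- price[price >= q[["25%"]] & price <= q[["75%"]]]
print(middle)
```

Reading `q[["25%"]]` and `q[["75%"]]` by name keeps the filter correct even if the requested probabilities change later.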
Categorising the variable price for better understanding
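# Breaks at the 1st and 3rd quartiles and the maximum (151, 354 and 410 in the summary above)
price_breaks <- c(0, quantile(listings_sub$price, c(0.25, 0.75)), max(listings_sub$price))
listings_sub$price_cat <- cut(listings_sub$price, price_breaks,
labels = c("affordable", "moderate", "expensive"))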
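It helps to see how `cut()` treats values on a boundary and values outside the breaks. A minimal sketch, assuming made-up prices and the break points 0, 151, 354 and 410:

```r
# Made-up prices, including two that sit exactly on a break
price <- c(10, 151, 200, 354, 400)

# Intervals are right-closed by default: (0,151], (151,354], (354,410]
price_cat <- cut(price, breaks = c(0, 151, 354, 410),
                 labels = c("affordable", "moderate", "expensive"))
print(data.frame(price, price_cat))

# A value above the last break falls in no interval and becomes NA
outside <- cut(500, breaks = c(0, 151, 354, 410),
               labels = c("affordable", "moderate", "expensive"))
print(is.na(outside))
```

Because values beyond the last break silently become `NA`, it is worth checking for missing categories after binning.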
Total count of each price category
datatable( listings_sub %>%
group_by(price_cat) %>%
summarise(count=n()))
Subsetting the listings under the affordable category
affordable.rentals <- listings_sub %>%
filter(price_cat == "affordable")
Categorising accommodation into 3 types
affordable.rentals$accommodation_cat <- cut(affordable.rentals$accommodates, c(0,1,4,20),
labels = c("single", "family", "group"))
Grouping the average price by accommodation and room type
plot1 <- affordable.rentals %>%
group_by(accommodation_cat) %>%
summarise(mean=mean(price)) %>%
ggplot() +
geom_bar(mapping = aes(x=reorder(accommodation_cat, mean),
y=mean),
stat="identity", fill = "light blue") +
labs(title="Average price by accommodation",
x="Accommodation type ", y="Average price of the listings") +
theme_minimal()
plot2 <- affordable.rentals %>%
group_by(room_type) %>%
summarise(mean=mean(price)) %>%
arrange(desc(mean)) %>%
# top_n(5) %>%
ggplot() +
geom_bar(mapping = aes(x=reorder(room_type, mean),
y=mean),
stat="identity", fill = "light blue") +
labs(title="Average price by room type",
x="Room type ", y="Average price of the listings") +
theme_minimal()
grid.arrange(plot1, plot2, ncol=2)
It can be seen that the highest-priced listings are those for single guests and shared rooms. These are mostly hostels and dormitories.
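The group-and-average step behind these bar charts can be reproduced in base R. The sketch below assumes a made-up `toy` data frame (the prices are invented for illustration) and uses `aggregate()` in place of the dplyr pipeline:

```r
# Made-up listings (illustrative only)
toy <- data.frame(
  room_type = c("Private room", "Shared room", "Private room", "Entire home/apt"),
  price     = c(40, 20, 60, 90)
)

# Mean price per room type
avg <- aggregate(price ~ room_type, data = toy, FUN = mean)

# Order from cheapest to most expensive
avg <- avg[order(avg$price), ]
print(avg)
```

Inspecting the summarised table directly is a useful check on what the bars in the chart represent.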
Grouping the average price by property type
affordable.rentals %>%
group_by(property_type) %>%
summarise(mean=mean(price)) %>%
arrange(desc(mean)) %>%
# top_n(5) %>%
ggplot() +
geom_bar(mapping = aes(x=reorder(property_type, mean),
y=mean),
stat="identity", fill = "light blue") +
coord_flip() +
labs(title="Average Price property wise",
x="Property type ", y="Average price of the listings") +
theme_minimal()
It can be seen that bungalows are the highest-priced property type among the affordable listings in Rome!
Grouping the average price by bed type
affordable.rentals %>%
group_by(bed_type) %>%
summarise(mean=mean(price)) %>%
arrange(desc(mean)) %>%
# top_n(5) %>%
ggplot() +
geom_bar(mapping = aes(x=reorder(bed_type, mean),
y=mean),
stat="identity", fill = "light blue") +
# coord_flip() +
labs(title="Average Price by bed type",
x="Bed type ", y="Average price of the listings") +
theme_minimal()
Diving deeper into the analysis, we will now check which locations are most affordable for a family trip.
Filtering our subsetted data for accommodation of family type
family.affordable <- affordable.rentals %>%
filter(accommodation_cat =="family")
Counting and mapping the locations which accommodate families according to room type
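# Note: with ggmap 3.0 or later, Google map requests need an API key registered via register_google()
map <- get_map(location = 'Rome', maptype = "roadmap", zoom = "auto", scale = "auto")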
locations <- ggmap(map) + geom_point(aes(x = longitude, y = latitude, color = room_type), data = family.affordable, alpha = 0.8)
locations
datatable(family.affordable %>%
group_by(room_type) %>%
select(room_type) %>%
summarise(count = n()))
As can be seen from the map and the counts, most of the listings are of type Entire home/apt or Private room. Only 37 of the listings are shared rooms, which is expected, as families would prefer private accommodation while travelling.
Average price of the room types
(family.affordable %>%
group_by(room_type) %>%
summarise(AveragePrice=round(mean(price), 2)) %>%
arrange(AveragePrice))
## # A tibble: 3 x 2
## room_type AveragePrice
## <fctr> <dbl>
## 1 Entire home/apt 39.55
## 2 Private room 54.91
## 3 Shared room 81.27
Counting and mapping the locations which accommodate families according to bed type
locations <- ggmap(map) + geom_point(aes(x = longitude, y = latitude, color = bed_type), data = family.affordable, alpha = 0.8)
locations
datatable(family.affordable %>%
group_by(bed_type) %>%
select(bed_type) %>%
summarise(count = n()))
As expected, most listings that accommodate families have bed type of Real Bed. Nothing new here!
The affordable neighbourhoods which have the most listings for a family accommodation
datatable(family.affordable %>%
group_by(neighbourhood_cleansed) %>%
select(neighbourhood_cleansed) %>%
summarise(count = n())%>%
arrange(desc(count)) %>%
top_n(5, count))
The affordable neighbourhoods average price for a family accommodation
datatable(family.affordable %>%
group_by(neighbourhood_cleansed) %>%
summarise(AveragePrice=round(mean(price), 2)) %>%
arrange(AveragePrice))
For further analysis, we will now check which locations are most affordable for a solo trip.
Filtering our subsetted data for accommodation of single type
single.affordable <- affordable.rentals %>%
filter(accommodation_cat == "single")
Counting and mapping the locations which accommodate singles according to room type
locations <- ggmap(map) + geom_point(aes(x = longitude, y = latitude, color = room_type), data = single.affordable, alpha = 0.8)
locations
datatable(single.affordable %>%
group_by(room_type) %>%
select(room_type) %>%
summarise(count = n()))
For solo travellers too, private rooms appear to be the most listed!
Average price of the room types
datatable(single.affordable %>%
group_by(room_type) %>%
summarise(AveragePrice=round(mean(price), 2)) %>%
arrange(AveragePrice))
Counting and mapping the locations which accommodate singles according to bed type
locations <- ggmap(map) + geom_point(aes(x = longitude, y = latitude, color = bed_type), data = single.affordable, alpha = 0.8)
locations
datatable(single.affordable %>%
group_by(bed_type) %>%
select(bed_type) %>%
summarise(count = n()))
The affordable neighbourhoods which have the most listings for a single accommodation
datatable(single.affordable %>%
group_by(neighbourhood_cleansed) %>%
select(neighbourhood_cleansed) %>%
summarise(count = n())%>%
arrange(desc(count)) %>%
top_n(5, count))
It can be seen that VII San Giovanni/Cinecittà and I Centro Storico are the neighbourhoods with the most listings!
The affordable neighbourhoods average price for a single accommodation
datatable(single.affordable %>%
group_by(neighbourhood_cleansed) %>%
summarise(AveragePrice=round(mean(price), 2)) %>%
arrange(AveragePrice))
This analysis is mainly to figure out the most affordable and listed neighbourhoods in the city of Rome. The following have been the main insights from this analysis:
Further analysis could examine the variables that affect listing prices and build a model to predict the price of a listing from the amenities offered.
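As a rough idea of what such a model could look like, here is a minimal sketch of a linear regression on a made-up data frame. Everything in it is an assumption for illustration: the `toy` values are invented, and `has_wifi` is a hypothetical 0/1 amenity flag that would first have to be derived from the amenities column.

```r
# Made-up listings; has_wifi is a hypothetical 0/1 amenity indicator
toy <- data.frame(
  price        = c(40, 55, 70, 90, 120, 65, 85, 150),
  accommodates = c(1, 2, 2, 4, 6, 2, 3, 8),
  has_wifi     = c(0, 1, 1, 1, 1, 0, 1, 1)
)

# Ordinary least squares: price explained by capacity and the amenity flag
fit <- lm(price ~ accommodates + has_wifi, data = toy)
print(coef(fit))

# Predicted price for a hypothetical listing for four guests with wifi
pred <- predict(fit, newdata = data.frame(accommodates = 4, has_wifi = 1))
print(pred)
```

A real model would need many more listings, a held-out test set, and a check of whether a linear form suits the price distribution.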