Overview

I continued working with the Airbnb listings data for Amsterdam, the Netherlands, that I used earlier in this class, this time producing more detailed analyses and visualizations.

Import Libraries

I imported libraries that I would be using in my analysis.

library(tidyverse)
## ── Attaching packages ───────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.1     ✔ purrr   0.3.2
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   1.0.0     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0
## ── Conflicts ──────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(janitor)
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
library(scales)
## 
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
## 
##     discard
## The following object is masked from 'package:readr':
## 
##     col_factor
library(ggplot2)

Import Data

I downloaded data from Inside Airbnb for Airbnb listings in Amsterdam, North Holland, the Netherlands and imported it using read_csv().

listings <- read_csv('airbnb_listings.csv')
## Parsed with column specification:
## cols(
##   id = col_double(),
##   name = col_character(),
##   host_id = col_double(),
##   host_name = col_character(),
##   neighbourhood_group = col_logical(),
##   neighbourhood = col_character(),
##   latitude = col_double(),
##   longitude = col_double(),
##   room_type = col_character(),
##   price = col_double(),
##   minimum_nights = col_double(),
##   number_of_reviews = col_double(),
##   last_review = col_date(format = ""),
##   reviews_per_month = col_double(),
##   calculated_host_listings_count = col_double(),
##   availability_365 = col_double()
## )

Analysis of the Different Room Types in Airbnb Listings in Amsterdam

I wanted to see the different types of rooms in the Airbnb listings and how many of each type there are.

First, I created a data frame with counts of each room type.

all_room_types <- listings %>% count(room_type)
all_room_types
## # A tibble: 3 x 2
##   room_type           n
##   <chr>           <int>
## 1 Entire home/apt 16148
## 2 Private room     4118
## 3 Shared room        71

I added a column showing each room type's percentage of all listings.

totalListings <- nrow(listings)
roomtype_percentages <- mutate(all_room_types, percentage=round(n/totalListings*100, digits = 2))
roomtype_percentages
## # A tibble: 3 x 3
##   room_type           n percentage
##   <chr>           <int>      <dbl>
## 1 Entire home/apt 16148      79.4 
## 2 Private room     4118      20.2 
## 3 Shared room        71       0.35
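
The same percentages can also be computed without a separate total variable, by dividing each count by the sum of the counts. The sketch below rebuilds the count table by hand from the output above so that it runs without the CSV file; the names room_counts and room_shares are illustrative and not part of the original analysis.

```r
library(dplyr)

# Counts copied from the count(room_type) output above.
room_counts <- tibble(
  room_type = c("Entire home/apt", "Private room", "Shared room"),
  n = c(16148L, 4118L, 71L)
)

# Dividing each count by the column total gives that room type's share.
room_shares <- room_counts %>%
  mutate(percentage = round(n / sum(n) * 100, digits = 2))

room_shares
```

This matches the nrow(listings) approach because every listing has a room type, so the counts add up to the total number of rows.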

Finally, I made a bar plot representing my findings using ggplot.

mycols <- c("#fbb4ae", "#b3cde3", "#ccebc5")
bp <- ggplot(roomtype_percentages) +
  aes(x = room_type, y = n) +
  # geom_col() already uses the y values as bar heights, so a second geom_bar(stat = "identity") layer is not needed
  geom_col(fill = mycols) +
  scale_y_continuous(labels = comma) +
  geom_text(aes(label=paste(percentage, "%", sep="")), vjust=-0.3, size=3.5) +
  theme_minimal() +
  labs(
    title = "Room Types of Airbnb Listings",
    subtitle = "Amsterdam, North Holland, The Netherlands",
    y = "Number of Listings",
    x = "Room Types",
    caption = "Source: Inside Airbnb"
  )
bp + coord_flip()

My findings show that entire homes/apartments are by far the most common room type among Amsterdam listings (79.4%), while shared rooms are the least common (0.35%).

Analysis of Airbnb Activity in Amsterdam

Reviews left by Airbnb guests after their stay can be used as an indicator of Airbnb activity. I used the review dates of Airbnb listings in Amsterdam to make a time series line plot showing Airbnb activity in the city.

First, I created a data frame with the date of the last review each listing received and the number of listings for each date. Some listings had NA in this column, so I removed those rows using filter().

activity <- listings %>% count(last_review)
activityWithoutNull <- filter(activity, !is.na(last_review))
activityWithoutNull
## # A tibble: 1,368 x 2
##    last_review     n
##    <date>      <int>
##  1 2012-02-13      1
##  2 2012-07-26      1
##  3 2012-07-27      1
##  4 2012-08-06      1
##  5 2012-11-24      1
##  6 2013-01-08      1
##  7 2013-04-01      1
##  8 2013-04-09      1
##  9 2013-05-09      1
## 10 2013-07-02      1
## # … with 1,358 more rows

I plotted this data frame against time to create a time series line graph showing Airbnb activity with ggplot.

activity_graph <- ggplot() + 
  geom_line(aes(y = n, x = last_review), data = activityWithoutNull, stat="identity", color = "#00AFBB") +
  scale_y_continuous(labels = comma) +
  labs(
    title = "Activity of Airbnb Listings",
    subtitle = "Amsterdam, North Holland, The Netherlands",
    y = "Listings by Date of Last Review",
    x = "Year",
    caption = "Source: Inside Airbnb")
activity_graph

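My findings show an overall trend of increasing Airbnb activity over time, with more people using Airbnb from 2016 onward and activity peaking in the latter half of 2018. Because last_review records only each listing's most recent review, recent dates are over-represented, so this is a rough proxy for activity rather than a count of all reviews.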
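
Daily counts like these tend to be noisy, so one option for making the trend easier to read is to aggregate them by month before plotting. The following is only a sketch: daily_counts holds made-up dates and counts in the same shape as activityWithoutNull, and monthly_counts is an illustrative name.

```r
library(dplyr)
library(lubridate)

# Hypothetical daily counts in the same shape as activityWithoutNull.
daily_counts <- tibble(
  last_review = as.Date(c("2018-07-01", "2018-07-15", "2018-08-03",
                          "2018-08-20", "2018-08-31")),
  n = c(2L, 3L, 1L, 4L, 5L)
)

# Round each date down to the first day of its month,
# then add up the counts within each month.
monthly_counts <- daily_counts %>%
  mutate(month = floor_date(last_review, unit = "month")) %>%
  group_by(month) %>%
  summarise(n = sum(n))

monthly_counts
```

With the real data, monthly_counts could be passed to geom_line() with x = month in place of the daily data frame.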

Analysis of Availability of Airbnb Listings in Amsterdam

Airbnb listings vary in their availability throughout the year. Hosts can set up the calendar for their listings so that they are only available for a specific number of days, and listings that are already booked are also unavailable, which further restricts availability.

I wanted to see the number of listings according to the number of days they are available within a year.

First, I created a data frame containing the number of days available to be booked in a year for the listings, from 0 to 365 days, and counted how many listings fell into each availability category.

availability_count <- listings %>% count(availability_365)
availability_count
## # A tibble: 366 x 2
##    availability_365     n
##               <dbl> <int>
##  1                0  9820
##  2                1   299
##  3                2   245
##  4                3   313
##  5                4   338
##  6                5   230
##  7                6   294
##  8                7   248
##  9                8   209
## 10                9   179
## # … with 356 more rows

I used this information to create a line graph showing the number of listings available for the different numbers of days in a year.

availability_graph <- ggplot() + 
  geom_line(aes(y = n, x = availability_365), data = availability_count, stat="identity", color = "steelblue") +
  scale_y_continuous(labels = comma) +
  labs(
    title = "Availability of Airbnb Listings",
    subtitle = "Amsterdam, North Holland, The Netherlands",
    y = "Number of Listings",
    x = "Number of Days Available in a Year",
    caption = "Source: Inside Airbnb")
availability_graph

The plot was heavily skewed because 9,820 of the 20,337 listings, around 48%, have an availability of 0 days. I therefore filtered out these “unavailable” listings to give a clearer picture of the availability of the remaining listings.
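
The share of zero-availability listings can be checked with a short calculation. This sketch hard-codes the two counts from the output above rather than reading the CSV; the variable names are illustrative.

```r
# Counts taken from the outputs above.
zero_availability <- 9820   # listings with availability_365 == 0
total_listings <- 20337     # total number of listings

# Fraction of all listings that are never available.
share_zero <- zero_availability / total_listings

# Format as a percentage with one decimal place.
share_label <- sprintf("%.1f%%", share_zero * 100)
share_label
```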

availability_count_non0 <- filter(availability_count, availability_365 > 0)
availability_count_non0
## # A tibble: 365 x 2
##    availability_365     n
##               <dbl> <int>
##  1                1   299
##  2                2   245
##  3                3   313
##  4                4   338
##  5                5   230
##  6                6   294
##  7                7   248
##  8                8   209
##  9                9   179
## 10               10   189
## # … with 355 more rows

I plotted this information in a line graph showing the trends in availability of the Airbnb listings in Amsterdam.

availability_graph_non0 <- ggplot() + 
  geom_line(aes(y = n, x = availability_365), data = availability_count_non0, stat="identity", color = "steelblue") +
  scale_y_continuous(labels = comma) +
  labs(
    title = "Availability of Airbnb Listings",
    subtitle = "Amsterdam, North Holland, The Netherlands",
    y = "Number of Listings",
    x = "Number of Days Available in a Year",
    caption = "Source: Inside Airbnb")
availability_graph_non0

My findings show that more listings are available short-term than long-term, with an outlying peak of 210 listings having an availability of 189 days.
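
One way to put numbers on the short-term versus long-term comparison is to group the availability values into ranges and add up the listings in each range. The sketch below uses made-up counts in the same shape as availability_count_non0, and the cut-offs (30, 90, and 180 days) are assumptions chosen for illustration, not values from the original analysis.

```r
library(dplyr)

# Hypothetical counts in the same shape as availability_count_non0.
availability_example <- tibble(
  availability_365 = c(1, 15, 45, 120, 200, 365),
  n = c(299L, 150L, 80L, 40L, 30L, 60L)
)

# Assumed ranges: 1-30, 31-90, 91-180, and 181-365 days.
# cut() uses right-closed intervals by default, e.g. (0, 30].
binned <- availability_example %>%
  mutate(bucket = cut(
    availability_365,
    breaks = c(0, 30, 90, 180, 365),
    labels = c("1-30", "31-90", "91-180", "181-365")
  )) %>%
  group_by(bucket) %>%
  summarise(listings = sum(n))

binned
```

Applied to availability_count_non0, the same steps would give one total per range, which could be shown as a bar chart alongside the line graph.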