1 Introduction

AirBNB is a marketplace where travellers can rent apartments and unique homes all over the world. This visualisation aims to provide more information about the distribution of AirBNB listings in Singapore.

The data used for this assignment come from two sources. First, we need the AirBNB rental data, which consist of the price, neighbourhood, room type and many other attributes of each listing. This data can be downloaded from http://insideairbnb.com/get-the-data.html

Next, we need the geospatial boundary data for the map of Singapore, which can easily be found on data.gov.sg. You may refer to this link for more information: https://data.gov.sg/dataset/master-plan-2014-subzone-boundary-no-sea?resource_id=c30bfcc0-7e23-4959-b4d9-c5da5e00af54

1.1 Challenges Faced

1.1.1 Joining data

The polygon coordinates in the geospatial data are defined per subzone, whereas the neighbourhood feature in the AirBNB dataset is based on planning area. Joining the data is still possible, but it causes a problem when plotting the choropleth map. For example, the Outram planning area contains the People’s Park and Chinatown subzones. When plotted with tmap, both Chinatown and People’s Park would each show the value for the whole Outram planning area, which is wrong.

1.1.2 Interactive Tmap and Positioning

As tmap does not allow much customisation, it is difficult to customise the visualisation or to add interactive functions to the plot. It is also difficult to position two interactive plots, a tmap and a plotly chart, side by side, because most R syntax for arranging plots expects objects of the same class (e.g. two plotly charts side by side is possible, but one leaflet map and one plotly chart is very difficult to achieve).

1.1.3 Filter and Legend

Although leaflet has many customisation functions, applying a select filter in leaflet is very difficult. Leaflet supports overlays, which are check boxes for choosing which layers to display, but they are awkward for this purpose. For example, to look at private rooms only, the user has to uncheck all the other boxes and check private rooms. The legend is also very messy, as it includes the legend of every selected check box.

1.2 Solution

1.2.1 Joining data

This can be solved by performing a union of the subzone polygons to obtain the boundary of each entire planning area instead of the individual subzones.

1.2.2 Interactive Tmap and Positioning

As a tmap object can be converted to a leaflet widget (with tmap_leaflet), we can use leaflet functions to add interactivity to the chart. We can also use the htmltools package to solve the alignment issues and to add an overall title to the plot.

1.2.3 Filter and Legend

Leaflet has another option known as base groups. This option is meant for switching between different map backgrounds, but we can build multiple tmaps and set each of them as a base group so that users can see the choropleth map of each room type, one at a time. However, the legend would still be very messy. Hence, I standardised the legend to a fixed style with the same break points for all four maps and displayed only one legend, since the rest are identical.
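To illustrate the difference between overlay groups and base groups, here is a minimal sketch in plain leaflet rather than tmap. The marker coordinates and the object name demo_map are made up for illustration; only the addLayersControl() arguments mirror what is used later in this document.

```r
library(leaflet)

# Two hypothetical layers, one per room type (coordinates are illustrative only)
demo_map <- leaflet() %>%
  addTiles() %>%
  addCircleMarkers(lng = 103.85, lat = 1.29, group = "Private room") %>%
  addCircleMarkers(lng = 103.80, lat = 1.35, group = "Entire home") %>%
  # baseGroups are rendered as radio buttons, so only one group shows at a time.
  # Passing the same names to overlayGroups would give check boxes instead.
  addLayersControl(
    baseGroups = c("Private room", "Entire home"),
    options = layersControlOptions(collapsed = FALSE)
  )
```

With base groups, switching room type takes a single click instead of unchecking every other box.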

1.3 Proposed Sketch

Below is a rough sketch of the proposed visualisation (pardon my handwriting and drawing). It shows a choropleth map alongside a box plot.

knitr::include_graphics("sketch.jpg")

2 Step-by-step Instructions

2.1 Install and load the libraries

First, we need to install and load the libraries required for the analysis and for building the visualisation.

* plotly is used to plot the interactive box plot
* leaflet is needed to add some customisation to tmap
* htmltools is needed to bind the two interactive plots into one visualisation

packages = c('tidyverse','tmap','sf', 'plotly','leaflet','htmltools')
for (p in packages){
  if(!require(p,character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

2.2 Data Wrangling

2.2.1 Loading data

As the data is in CSV format, we use read_csv to load the AirBNB dataset as a data frame. We also shorten the "Entire home/apt" room type label, since it is very long.

airbnb_data <- read_csv("data/listings.csv")
airbnb_data$room_type[airbnb_data$room_type == "Entire home/apt"] <- "Entire home"

Take a glimpse at the data to understand the data structure.

glimpse(airbnb_data)

## Rows: 7,323
## Columns: 16
## $ id                             <dbl> 49091, 50646, 56334, 71609, 71896, 7...
## $ name                           <chr> "COZICOMFORT LONG TERM STAY ROOM 2",...
## $ host_id                        <dbl> 266763, 227796, 266763, 367042, 3670...
## $ host_name                      <chr> "Francesca", "Sujatha", "Francesca",...
## $ neighbourhood_group            <chr> "North Region", "Central Region", "N...
## $ neighbourhood                  <chr> "Woodlands", "Bukit Timah", "Woodlan...
## $ latitude                       <dbl> 1.44255, 1.33235, 1.44246, 1.34541, ...
## $ longitude                      <dbl> 103.7958, 103.7852, 103.7967, 103.95...
## $ room_type                      <chr> "Private room", "Private room", "Pri...
## $ price                          <dbl> 84, 80, 70, 167, 95, 84, 209, 52, 54...
## $ minimum_nights                 <dbl> 180, 90, 6, 90, 90, 90, 1, 90, 90, 1...
## $ number_of_reviews              <dbl> 1, 18, 20, 20, 24, 48, 29, 176, 199,...
## $ last_review                    <date> 2013-10-21, 2014-12-26, 2015-10-01,...
## $ reviews_per_month              <dbl> 0.01, 0.24, 0.18, 0.19, 0.22, 0.43, ...
## $ calculated_host_listings_count <dbl> 2, 1, 2, 8, 8, 8, 8, 3, 3, 4, 4, 7, ...
## $ availability_365               <dbl> 365, 365, 365, 365, 365, 365, 180, 3...

The code below prepares the spatial data.

* First, it reads the shp file
* It then groups the records by planning area name so that the polygons can be merged
* Finally, it merges the subzone polygons into one polygon per planning area

mpsz <- st_read(dsn = "data/geospatial", layer = "MP14_SUBZONE_NO_SEA_PL")%>%
  group_by(PLN_AREA_N) %>%
  summarise(geometry = sf::st_union(geometry))

## Reading layer `MP14_SUBZONE_NO_SEA_PL' from data source `C:\Users\FabianO.O\Desktop\VA\Assignment5\Assignment5\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 323 features and 15 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS:  SVY21
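The effect of the union step can be checked on a toy example. The sketch below is not part of the original analysis: the three unit squares, the helper unit_square and the subzone names are made up, and only the group_by/summarise/st_union pattern is taken from the code above.

```r
library(sf)
library(dplyr)

# Hypothetical helper: a 1 x 1 square with its lower-left corner at (x, y)
unit_square <- function(x, y) {
  st_polygon(list(rbind(c(x, y), c(x + 1, y), c(x + 1, y + 1),
                        c(x, y + 1), c(x, y))))
}

# Three made-up subzones: two adjacent ones in OUTRAM, one in BEDOK
toy_subzones <- st_sf(
  SUBZONE_N  = c("CHINATOWN", "PEOPLE'S PARK", "BEDOK NORTH"),
  PLN_AREA_N = c("OUTRAM", "OUTRAM", "BEDOK"),
  geometry   = st_sfc(unit_square(0, 0), unit_square(1, 0), unit_square(5, 5))
)

# Same pattern as above: dissolve the subzones into planning areas
toy_areas <- toy_subzones %>%
  group_by(PLN_AREA_N) %>%
  summarise(geometry = st_union(geometry))

nrow(toy_areas)     # number of planning areas after the union
st_area(toy_areas)  # area of each merged polygon
```

After the union there should be one row per planning area, so a later join on the planning area name no longer repeats the same value across several subzones.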

2.2.2 Outlier Preparation

As some planning areas have very few listings, it is better to remove them from the visualisation, since they could be misleading. The code chunk below builds the list of neighbourhoods with at least 10 AirBNB listings.

no_outliers<- airbnb_data %>%
  group_by(neighbourhood)%>%
  summarise(count = n())%>%
  filter(count >= 10)
neighbourhood_list <- unique(no_outliers$neighbourhood)

2.2.3 Cleaning and Aggregating Data

The code chunk below performs the following:

* Groups the data by neighbourhood and room type
* Keeps only the records that will be used for analysis
* Uses summarise to get the total count and the mean, median, minimum and maximum price of the AirBNB listings
* Changes all the neighbourhood names to uppercase so that we can join them with the geospatial data later

airbnb_data_clean <- airbnb_data %>%
  group_by(neighbourhood,room_type)%>%
  filter(neighbourhood %in% neighbourhood_list)%>%
  summarise(count = n(),
            avg_price = mean(price),
            median_price = median(price),
            min_price = min(price),
            max_price = max(price))%>%
  mutate_at(.vars = vars(neighbourhood),.funs = funs(toupper))

## `summarise()` regrouping output by 'neighbourhood' (override with `.groups` argument)

## Warning: `funs()` is deprecated as of dplyr 0.8.0.
## Please use a list of either functions or lambdas: 
## 
##   # Simple named list: 
##   list(mean = mean, median = median)
## 
##   # Auto named with `tibble::lst()`: 
##   tibble::lst(mean, median)
## 
##   # Using lambdas
##   list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
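The warning above comes from funs(). As a hedged sketch, and not the code that was actually run, the same steps could be written without the deprecated helper. The toy_listings table is made up so that the example runs on its own, and .groups = "drop" is an assumption that returns an ungrouped result, unlike the grouped tibble shown in this document.

```r
library(dplyr)

# Made-up listings standing in for airbnb_data
toy_listings <- tibble(
  neighbourhood = c("Bedok", "Bedok", "Bishan"),
  room_type     = c("Private room", "Private room", "Entire home"),
  price         = c(90, 110, 160)
)

toy_clean <- toy_listings %>%
  group_by(neighbourhood, room_type) %>%
  summarise(count = n(),
            avg_price = mean(price),
            .groups = "drop") %>%                  # ungrouped result, no regrouping message
  mutate(neighbourhood = toupper(neighbourhood))  # replaces mutate_at(..., funs(toupper))
```

Dropping the groups first matters here, because mutate() refuses to modify a column that is still a grouping variable.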

airbnb_data_clean

## # A tibble: 119 x 7
## # Groups:   neighbourhood [37]
##    neighbourhood room_type    count avg_price median_price min_price max_price
##    <chr>         <chr>        <int>     <dbl>        <dbl>     <dbl>     <dbl>
##  1 ANG MO KIO    Entire home      6     128.           130        60       200
##  2 ANG MO KIO    Private room    37      80.4           59        20       279
##  3 ANG MO KIO    Shared room      2      40             40        25        55
##  4 BEDOK         Entire home     82     250.           150        78      5000
##  5 BEDOK         Hotel room       3     137.           145        95       170
##  6 BEDOK         Private room   234     123.            90        22      3001
##  7 BEDOK         Shared room      5     215            150        14       501
##  8 BISHAN        Entire home     18     193.           160        68       500
##  9 BISHAN        Private room    42     158.            63        38      1350
## 10 BISHAN        Shared room      1      27             27        27        27
## # ... with 109 more rows

The code below checks the unique room types across all AirBNB listings in Singapore.

unique(airbnb_data_clean$room_type)

## [1] "Entire home"  "Private room" "Shared room"  "Hotel room"

2.2.4 Joining data

We need to join the geospatial data and the AirBNB data set together. A right join is used, with the sf object as the first argument, so that the output is an sf object and polygons with no matching listings are dropped from the data set.

airbnb_clean <- right_join(mpsz,airbnb_data_clean, c("PLN_AREA_N" = "neighbourhood" ))
airbnb_clean

## Simple feature collection with 119 features and 7 fields
## geometry type:  GEOMETRY
## dimension:      XY
## bbox:           xmin: 10373.18 ymin: 21678.35 xmax: 44860.41 ymax: 50256.33
## projected CRS:  SVY21
## # A tibble: 119 x 8
##    PLN_AREA_N                  geometry room_type count avg_price median_price
##    <chr>                  <POLYGON [m]> <chr>     <int>     <dbl>        <dbl>
##  1 ANG MO KIO ((31070.93 39048.68, 310~ Entire h~     6     128.           130
##  2 ANG MO KIO ((31070.93 39048.68, 310~ Private ~    37      80.4           59
##  3 ANG MO KIO ((31070.93 39048.68, 310~ Shared r~     2      40             40
##  4 BEDOK      ((39348.48 32017.65, 393~ Entire h~    82     250.           150
##  5 BEDOK      ((39348.48 32017.65, 393~ Hotel ro~     3     137.           145
##  6 BEDOK      ((39348.48 32017.65, 393~ Private ~   234     123.            90
##  7 BEDOK      ((39348.48 32017.65, 393~ Shared r~     5     215            150
##  8 BISHAN     ((30988.88 36251.95, 310~ Entire h~    18     193.           160
##  9 BISHAN     ((30988.88 36251.95, 310~ Private ~    42     158.            63
## 10 BISHAN     ((30988.88 36251.95, 310~ Shared r~     1      27             27
## # ... with 109 more rows, and 2 more variables: min_price <dbl>,
## #   max_price <dbl>
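Because a right join keeps every row of the AirBNB summary, a neighbourhood whose name does not match any polygon would be expected to stay in the result without a usable geometry. A quick way to check for this before joining is an anti join on the names. The two small tables below are made up for illustration and stand in for mpsz and airbnb_data_clean.

```r
library(dplyr)

# Made-up stand-ins for the names on each side of the join
toy_polygons <- tibble(PLN_AREA_N = c("BEDOK", "BISHAN", "OUTRAM"))
toy_summary  <- tibble(neighbourhood = c("BEDOK", "BISHAN", "SOUTHERN ISLANDS"),
                       count = c(324, 61, 12))

# Rows of the summary whose neighbourhood has no polygon with the same name
unmatched <- anti_join(toy_summary, toy_polygons,
                       by = c("neighbourhood" = "PLN_AREA_N"))
unmatched$neighbourhood
```

An empty result means every neighbourhood in the summary has a polygon to draw.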

2.3 Creating Tmap

2.3.1 Create Tmap for Private Room

The code chunk below does the following:

* Assigns the tmap object to the variable private.
* Creates the tmap with tm_shape as the base layer. Note the filter for private rooms: each room type gets its own tmap.
* Applies the fill for the choropleth map. tm_fill colours each polygon according to the number of listings in that planning area.
* popup.vars lets us click on a polygon in the map and see values such as the average and minimum price for that polygon.
* title sets the title of the legend.
* style is fixed, with break points up to 500 (the maximum count is 514). The same break points are used for every room type so that the comparison between them is fair.
* tm_borders outlines each polygon so that we can see the shape of the planning area.

private<-tm_shape(airbnb_clean %>% filter(room_type == "Private room"),name = "Private Room")+
    tm_fill("count",
            popup.vars=c("Count"="count",
                       "Average Price"="avg_price",
                       "Median Price"="median_price",
                       "Min Price" = "min_price",
                       "Max Price" = "max_price"),
            title = "Room Count",
            style="fixed", breaks = c(0,50,100,150,200,250,300,350,400,450,500),
            alpha = 0.7)+
    tm_borders(lwd = 1,
               alpha = 1)+tm_layout(main.title = "Map")

2.3.2 Create Tmap for Entire Home

The code is the same as for the private room map, except that the legend title is replaced by legend.show = FALSE. If this tmap also showed its legend, the final visualisation would be very cluttered, as it would display four different legends on the map. Since all the legends follow the same break points, we do not show the legend here. The same applies to the other two maps.

entire<-tm_shape(airbnb_clean%>% filter(room_type == "Entire home"),name = "Entire home")+
    tm_fill("count",
            popup.vars=c("Count"="count",
                       "Average Price"="avg_price",
                       "Median Price"="median_price",
                       "Min Price" = "min_price",
                       "Max Price" = "max_price"),
            legend.show = FALSE,
            style="fixed", breaks = c(0,50,100,150,200,250,300,350,400,450,500),
            alpha = 0.7)+
    tm_borders(lwd = 1,
               alpha = 1)

2.3.3 Create Tmap for Shared Room

shared<-tm_shape(airbnb_clean%>% filter(room_type == "Shared room"),name = "Shared room")+
    tm_fill("count",
            popup.vars=c("Count"="count",
                       "Average Price"="avg_price",
                       "Median Price"="median_price",
                       "Min Price" = "min_price",
                       "Max Price" = "max_price"),
            legend.show = FALSE,
            style="fixed", breaks = c(0,50,100,150,200,250,300,350,400,450,500),
            alpha = 0.7)+
    tm_borders(lwd = 1,
               alpha = 1)

2.3.4 Create Tmap for Hotel Room

hotel<-tm_shape(airbnb_clean%>% filter(room_type == "Hotel room"),name = "Hotel room")+
    tm_fill("count",
            popup.vars=c("Count"="count",
                       "Average Price"="avg_price",
                       "Median Price"="median_price",
                       "Min Price" = "min_price",
                       "Max Price" = "max_price"),
            legend.show = FALSE,
            style="fixed", breaks = c(0,50,100,150,200,250,300,350,400,450,500),
            alpha = 0.7)+
    tm_borders(lwd = 1,
               alpha = 1)
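Since the four chunks above differ only in the room type and the legend setting, they could be generated by one function. This is a hedged sketch rather than the code used for this document: make_room_map and toy_clean are hypothetical names, the toy values are illustrative, and the tm_fill arguments assume the same tmap syntax as the chunks above.

```r
library(sf)
library(dplyr)
library(tmap)

# Hypothetical helper (not in the original): one choropleth layer per room type
make_room_map <- function(data, room, show_legend = FALSE) {
  tm_shape(data %>% filter(room_type == room), name = room) +
    tm_fill("count",
            popup.vars = c("Count" = "count",
                           "Average Price" = "avg_price",
                           "Median Price" = "median_price",
                           "Min Price" = "min_price",
                           "Max Price" = "max_price"),
            title = "Room Count",
            legend.show = show_legend,
            style = "fixed", breaks = seq(0, 500, by = 50),
            alpha = 0.7) +
    tm_borders(lwd = 1, alpha = 1)
}

# Toy stand-in for airbnb_clean: one square polygon, two room types
square <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
toy_clean <- st_sf(
  room_type    = c("Private room", "Entire home"),
  count        = c(234, 82),
  avg_price    = c(123, 250),
  median_price = c(90, 150),
  min_price    = c(22, 78),
  max_price    = c(3001, 5000),
  geometry     = st_sfc(square, square)
)

# Build one map per room type; only the private room map keeps its legend
room_types <- c("Private room", "Entire home")
room_maps <- lapply(room_types, function(r) {
  make_room_map(toy_clean, r, show_legend = (r == "Private room"))
})
```

Note that the helper uses the room type itself as the layer name, so the names passed to baseGroups would have to match it exactly (for example "Private room" rather than "Private Room").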

2.3.5 Combining Tmap and customisation

The code chunk below combines the four tmap objects created above and adds a layers control that acts as the filter. The base groups contain one entry per room type, matched by the name given in tm_shape, which lets us switch to the choropleth map for that room type. Finally, layersControlOptions(collapsed = FALSE) keeps the control expanded at all times so that users can switch between the choropleth maps easily.

leaflet_map <- tmap_leaflet(c(private,entire,shared,hotel))%>%
  addTiles(attribution = 'Source: AirBNB Insider, Leaflet.')%>%
  addLayersControl(
    baseGroups = c(c("Private Room","Entire home","Shared room","Hotel room")),
    position = "topleft",
    options = layersControlOptions(collapsed = FALSE)
  )

2.4 Create Interactive Boxplot using Plotly

The box plot lets us visualise the distribution of prices across room types, which can help users decide on the best room type for their budget. We create an interactive box plot as follows:

* Create a plotly object from the raw data, before cleaning. This includes the outliers, as the box plot helps us to visualise the variance. The y axis is the price, and the room types are differentiated by colour.
* yaxis sets the visible range. As prices vary widely, partly because of prime locations, the box plot is difficult to read without zooming in.
* title is the plot title.
* The legend is given a horizontal orientation for better readability, and a margin is set so that the axis labels and legend do not overlap.

boxplot <- plot_ly(airbnb_data, y = ~price, color = ~room_type, type = "box")%>%
   layout(yaxis = list(range = c(0, 1000)),
          title = 'AirBNB Price Distribution Across Room Type',
          legend = list(orientation = 'h')) %>%
          layout(autosize = T, margin = list(b=50))

2.5 Final Visualisation with HTML Functions

As described in the challenges, it is very difficult to show a leaflet map and a plotly chart side by side. However, with HTML tags, we can create a “webpage” that combines both charts. You may notice a data source line at the bottom of the chart with no code displayed. This is done with a small shortcut: I created another code chunk containing just the h6 tag and hid its code.

Interactivity:

* Change the filter in the map
* Click on a polygon in the map to see additional information
* Zoom in and out on both the map and the box plot
* Click the legend entries to show or hide individual box plots in the chart

browsable(
  tagList(list(
    tags$h2("AirBNB Room Count and Price Distribution", style = "text-align:center"),
    tags$div(
      style = 'width:50%;display:block;float:left;',
      leaflet_map
    ),
    tags$div(
      style = 'width:50%;display:block;float:left;',
      boxplot
    )
  ))
)

## Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
## Please use `arrange()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.

AirBNB Room Count and Price Distribution

Data Source: planning area map - https://data.gov.sg, AirBNB - http://insideairbnb.com/get-the-data.html
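The hidden chunk that produces the data source line is not shown in this document. A minimal sketch of what it might look like, assuming a chunk with echo=FALSE in the R Markdown source and a hypothetical object name data_source:

```r
library(htmltools)

# In the R Markdown source this would sit in a chunk with echo=FALSE,
# so that only the rendered text appears and not the code.
data_source <- tags$h6(
  "Data Source: planning area map - https://data.gov.sg, ",
  "AirBNB - http://insideairbnb.com/get-the-data.html"
)
data_source
```

Keeping the footer in its own chunk leaves the main browsable() call unchanged.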

3 Short Description

The map helps us to visualise the concentration of AirBNB listings across the planning areas of Singapore, while the box plot shows the distribution of AirBNB prices across the room types.

From the map, we can see that there are relatively fewer shared rooms and hotel rooms listed in the AirBNB data set than private rooms and entire homes. This can be seen by toggling between the radio buttons. Private rooms and entire homes are available in almost all planning areas. Hotel rooms, on the other hand, are mainly found near the centre of Singapore, in areas such as Orchard, Outram and Kallang. For shared and hotel rooms, the number of rentals reaches at most 150 per planning area, whereas for private rooms and entire homes quite a few planning areas have more than 200 listings.
For private rooms, the planning areas with the most rentals are Kallang and Geylang, which lie towards the east side of Singapore. Surrounding planning areas such as Bukit Merah and Bedok have around 200 listings on AirBNB. Similarly, most entire homes are clustered around Novena, Kallang and Geylang. However, their surrounding planning areas have relatively fewer, with around 100 listings each. Hence, although both are clustered around the centre of Singapore, private rooms are more evenly distributed.
From the box plot, we can see that the price variation is greatest for entire homes: prices go as high as 13k, while the variation for hotel and shared rooms is much lower. Comparing the four room types, entire homes have the highest median price, followed by hotel rooms, private rooms and shared rooms. The other quartile values follow the same order. Interestingly, prices for all room types can go as low as $20 per day. This is probably due to promotional events or a lack of customers.
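The quartile comparison read off the box plot can also be computed directly. The sketch below uses a made-up toy_prices table so that it runs on its own. With the real data, airbnb_data would take its place, and the numbers would of course differ.

```r
library(dplyr)

# Made-up prices standing in for airbnb_data
toy_prices <- tibble(
  room_type = rep(c("Entire home", "Shared room"), each = 5),
  price     = c(100, 150, 200, 250, 300, 20, 30, 40, 50, 60)
)

# Quartiles and interquartile range of price for each room type
price_summary <- toy_prices %>%
  group_by(room_type) %>%
  summarise(q1     = quantile(price, 0.25),
            median = median(price),
            q3     = quantile(price, 0.75),
            iqr    = IQR(price),
            .groups = "drop")
price_summary
```

A table like this makes it easy to confirm the ordering of the medians and quartiles without hovering over each box.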

Assignment 5

Fabian

10/30/2020