1. Overview

This assignment visualises Singapore HDB resale prices. The objective is to examine how resale prices are distributed across areas and how the resale prices of different flat types changed from 2017 to March 2021.

Source: https://data.gov.sg/dataset/resale-flat-prices?resource_id=42ff9cfe-abe5-4b54-beda-c88f9bb438ee

1.1 Major data

The dataset contains 1,038,114 rows of data about HDB resale information from 2017 to March 2021.

Variable             Description
Month                Month of the resale transaction
Town                 Town (area) where the flat is located
Flat_type            Number of rooms in the flat
Blocks               Block number
Storey_range         Storey range of the flat
Floor_area           Floor area
Flat_model           Flat model, e.g. New Generation or Improved
Lease_commence_date  Lease commencement date
Resale_price         Resale price

In addition, the geospatial data provided in the Week 11 in-class exercise are used to plot the map of Singapore.

1.2 Challenges

Challenges in designing the line chart:

  • I planned to design a line chart showing the average price change over time for every area. However, there are too many areas (about 30), which makes the chart crowded and the individual lines hard to tell apart.

Hence, I chose to plot lines by flat type instead, so that only 5 lines are shown and their trends are easy to differentiate.

1.3 Proposed sketch design

I proposed to have 3 charts:

  • A line chart that visualises the price trend over the years for different flat types.
  • A map that visualises the geographical distribution of resale prices in Singapore.
  • A treemap that visualises the distribution of units sold and average resale price by area.

Figure: Proposed sketch

2. Step-by-Step Description

2.1 Install and Load R packages

  • dplyr provides a consistent set of verbs that help you solve the most common data manipulation challenges. I used group_by() and summarise() to generate the average resale price.

  • sf provides support for simple features, a standardized way to encode spatial vector data.

  • tmap offers a flexible, layer-based, and easy to use approach to create thematic maps, such as choropleths and bubble maps.

  • plotly is a library that makes interactive, publication-quality graphs.

  • d3treeR is an R htmlwidget for d3.js treemaps. It is designed to integrate seamlessly with the R treemap package or work with traditional nested JSON hierarchies.

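# treemap is included because treemap() is called in section 2.4.3
packages <- c('dplyr', 'sf', 'tmap', 'plotly', 'treemap')

# Install each package only if it is missing, then attach it
for (p in packages){
  if (!require(p, character.only = TRUE)){
    install.packages(p)
  }
  library(p, character.only = TRUE)
}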

d3treeR has to be installed separately from GitHub and then loaded:

install.packages("devtools")
library(devtools)
install_github("timelyportfolio/d3treeR")
library(d3treeR)

2.2 Load Dataset

Both the main dataset and the geospatial data have to be loaded.

data <- read.csv("data.csv")
mpsz <- st_read(dsn = "geospatial", 
                layer = "MP14_SUBZONE_WEB_PL")
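
The month column is assumed here to be text in YYYY-MM form, which is consistent with the filter against "2020-01" in section 2.3.2. A minimal sketch, using a small made-up data frame named toy rather than the real file, of converting such text to a Date so that charts can treat it as a time axis:

```r
# Toy stand-in for the real data; the values are made up for illustration.
toy <- data.frame(
  month = c("2017-01", "2020-06", "2021-03"),
  resale_price = c(300000, 420000, 450000)
)

# Append a day so that as.Date() can parse the "YYYY-MM" strings.
toy$month_date <- as.Date(paste0(toy$month, "-01"))

# Text comparison still works for this format because it sorts chronologically.
recent <- toy[toy$month >= "2020-01", ]
print(recent)
```

The extra column name month_date is hypothetical; the rest of this write-up keeps using the original month column.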

2.3 Data Preparation

2.3.1 Data Wrangling for Line Chart

I plan to plot a line chart of the average resale price of each flat type over time.

  • Firstly, I transform the original dataset into one that groups the data by month and flat type.
data_type <- data %>%
  group_by(month,flat_type)  %>%
  summarize(avg_price = mean(resale_price))
  • Secondly, I extract the average resale price of each flat type as a separate data frame. Then I combine these data frames into a new one and rename the columns for use in plotting the line chart.
room2 <-data_type %>%
  filter(flat_type=='2 ROOM')
room3 <-data_type %>%
  filter(flat_type=='3 ROOM')
room4 <-data_type %>%
  filter(flat_type=='4 ROOM')
room5 <-data_type %>%
  filter(flat_type=='5 ROOM')
executive <-data_type %>%
  filter(flat_type=='EXECUTIVE')
data_byRoom<- data.frame(room2$month,room2$avg_price, room3$avg_price, room4$avg_price, room5$avg_price, executive$avg_price)

colnames(data_byRoom) <- c("month", "tworoom", "threeroom", "fourroom", "fiveroom", "executive")
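
Building the wide table column by column, as above, relies on every flat type having a row for every month, in the same order. A hedged alternative sketch uses pivot_wider() from tidyr (a package that is not loaded in section 2.1, so its use here is an assumption) on made-up data; a month with no sales for a flat type simply becomes NA:

```r
library(dplyr)
library(tidyr)

# Made-up transactions; note there is no "2 ROOM" sale in 2017-02.
toy <- data.frame(
  month = c("2017-01", "2017-01", "2017-01", "2017-02"),
  flat_type = c("2 ROOM", "3 ROOM", "3 ROOM", "3 ROOM"),
  resale_price = c(200000, 300000, 320000, 310000)
)

# Average per month and flat type, then spread flat types into columns.
toy_wide <- toy %>%
  group_by(month, flat_type) %>%
  summarise(avg_price = mean(resale_price), .groups = "drop") %>%
  pivot_wider(names_from = flat_type, values_from = avg_price)

print(toy_wide)
```

The resulting columns are named after the flat types themselves (for example "3 ROOM"), so they would still need renaming to match the names used in the plotting code.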

2.3.2 Data Wrangling for Map

  • In order to get the average resale price of each area based on recent data (January 2020 onwards), I transform the original dataset into a new one by applying filter(), group_by() and summarise() from the dplyr package together with mean().
data_avgprice_byzone <- data %>%
  filter(month >= "2020-01") %>%
  group_by(town)  %>%
  summarize(avg_price = mean(resale_price))
  • left_join() from dplyr is used to join the geographical data and the attribute table, using the planning area name (PLN_AREA_N in the map data and town in the price table) as the common identifier.
data_map <- left_join(mpsz, data_avgprice_byzone, 
                      by = c("PLN_AREA_N" = "town"))
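
Because left_join() keeps every row of mpsz, a town whose name has no exact match in PLN_AREA_N does not appear on the map, and map areas without a matching town get NA. A small sketch for checking the keys before joining; the name vectors below are illustrative stand-ins, not the real column contents:

```r
# Illustrative names only; in practice these would come from
# unique(mpsz$PLN_AREA_N) and unique(data$town).
map_names <- c("ANG MO KIO", "BEDOK", "KALLANG", "DOWNTOWN CORE")
town_names <- c("ANG MO KIO", "BEDOK", "KALLANG/WHAMPOA")

# Towns in the price table that would not appear on the map.
unmatched_towns <- setdiff(town_names, map_names)

# Map areas that would be left without a price (shown as NA).
unpriced_areas <- setdiff(map_names, town_names)

print(unmatched_towns)
print(unpriced_areas)
```

Any names reported by the first check would need to be recoded to a planning area name before the join if they are to be shown on the map.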

2.3.3 Data Wrangling for Tree Map

In order to build an interactive treemap, I construct a dataset with planning area, flat type, the corresponding units sold and the mean resale price.

I calculate units sold and mean resale price via 2 intermediate datasets.

avg_price_bytowntype <- data %>%
  group_by(town, flat_type) %>%
  summarise(avg_p=mean(resale_price))

count_bytowntype <- data %>%
  group_by(town, flat_type) %>%
  count(town)

After that, I combine the variables needed into a final dataset and rename the columns for easy use.

treemapData <-data.frame(avg_price_bytowntype$town, count_bytowntype$flat_type,avg_price_bytowntype$avg_p, count_bytowntype$n)
colnames(treemapData) <- c("town", "flat_type","mean_resale_price", "quantity_sold")
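
Combining two separately computed tables by position assumes that both are sorted identically. As a hedged alternative, both measures can be computed in one summarise() call so that each row is aligned by construction. The sketch uses made-up rows together with the column names from this section:

```r
library(dplyr)

# Made-up transactions for illustration.
toy <- data.frame(
  town = c("BEDOK", "BEDOK", "BEDOK", "YISHUN"),
  flat_type = c("3 ROOM", "3 ROOM", "4 ROOM", "4 ROOM"),
  resale_price = c(300000, 340000, 450000, 400000)
)

# One pass: mean price and number of transactions per town and flat type.
treemap_toy <- toy %>%
  group_by(town, flat_type) %>%
  summarise(
    mean_resale_price = mean(resale_price),
    quantity_sold = n(),
    .groups = "drop"
  )

print(treemap_toy)
```

Here n() counts the rows in each group, which plays the same role as the separate count() step.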

2.4 Data Visualization

2.4.1 Plot The Line Chart

plotly is used to plot an interactive line chart.

plot_ly(data_byRoom,x= ~month) %>%
  add_lines(y = ~tworoom, name='2 room', mode = 'lines') %>%
  add_lines(y = ~threeroom, name='3 room',mode = 'lines')%>%
  add_lines(y = ~fourroom, name='4 room',mode = 'lines')%>%
  add_lines(y = ~fiveroom, name='5 room',mode = 'lines')%>%
  add_lines(y = ~executive, name='executive',mode = 'lines')
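
The same kind of chart can also be drawn from long-format data, with one row per month and flat type, by mapping the flat type to colour. This is a sketch on made-up numbers, not the chart produced for this assignment; the data frame toy_long and its values are assumptions:

```r
library(plotly)

# Made-up monthly averages in long format: one row per month and flat type.
toy_long <- data.frame(
  month = as.Date(c("2020-01-01", "2020-02-01", "2020-01-01", "2020-02-01")),
  flat_type = c("3 ROOM", "3 ROOM", "4 ROOM", "4 ROOM"),
  avg_price = c(300000, 305000, 420000, 430000)
)

# color = ~flat_type draws one line per flat type.
fig <- plot_ly(toy_long, x = ~month, y = ~avg_price, color = ~flat_type,
               type = "scatter", mode = "lines")
```

Evaluating fig in an interactive session or an R Markdown chunk displays the chart. With this layout no separate column per flat type is needed.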

2.4.2 Plot the Singapore Map

tmap is used to plot an interactive map.

tmap_mode("view")
tm_shape(data_map) +
  tm_fill("avg_price", 
          style = "quantile", 
          palette = "Blues",
          title = "Average Resale Price") +
  tm_layout(main.title = "Distribution of Average Price \nby Area",
            main.title.position = "center",
            main.title.size = 1.2,
            frame = TRUE) +
  tm_borders(alpha = 0.5)+
  tm_scale_bar(width = 0.15)

2.4.3 Plot the Tree Map

treemap and d3treeR are used to plot an interactive treemap. d3treeR turns a static treemap into an interactive one.

tm <- treemap(treemapData, index = c("town", "flat_type"),
              vSize = "quantity_sold", vColor = "mean_resale_price",
              type = "value",
              palette = "RdYlBu",
              # legend range spans the observed mean resale prices
              range = range(treemapData$mean_resale_price),
              title = "HDB Resale by Planning Area and Flat Type",
              title.legend = "Mean Resale Price")

d3tree(tm,rootname = "Singapore" )

3. Visualization and Insights

3.1 Line Chart: Price Trend

From the line chart of the average resale price trend over time for different flat types, here are some insights:

  • Executive flats are the most expensive among all HDB flat types.

  • Resale prices were relatively stable from 2017 to the first half of 2020. However, since the second half of 2020, resale prices have been rising for all flat types.

3.2 Map Plot: Average Resale Price For Different Areas In Singapore

## Reading layer `MP14_SUBZONE_WEB_PL' from data source `C:\Users\Zhengnan\Desktop\AY2020-2021 Term 2\Visual for BI\A5\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 323 features and 15 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## Projected CRS: SVY21

From the map plot, here are some insights:

  • The most expensive areas are clustered in the south, while areas in the north are the cheapest.

3.3 Tree Map Plot

My apologies: for reasons I could not identify, the static treemap is shown twice when I knit the R Markdown document, even though I did not intend to display it.

  • Block size represents the units sold
  • Color represents the average resale price

From the treemap, here are some insights:

  • Central areas sold very few units, but their resale prices are relatively high.

  • Sengkang, Punggol, Yishun, Jurong West and Woodlands had the most units sold from 2017 to 2021, but prices in these areas are relatively low.