This assignment visualises Singapore HDB resale prices. The objective is to examine how resale prices are distributed across areas and how the resale prices of different flat types changed from 2017 to March 2021.
Source: https://data.gov.sg/dataset/resale-flat-prices?resource_id=42ff9cfe-abe5-4b54-beda-c88f9bb438ee.
The dataset contains 1,038,114 rows of data about HDB resale information from 2017 to March 2021.
| Variable | Description |
|---|---|
| Month | Month of the resale transaction (YYYY-MM) |
| Town | Area of HDB |
| Flat_type | Number of rooms in the flat |
| Blocks | Block number |
| Storey_range | Storey range of the HDB flat |
| Floor_area | Floor area |
| Flat_model | Flat model, e.g. New Generation or Improved |
| Lease_commence_date | Lease commence date |
| Resale_price | Resale price |
In addition, the geospatial data provided in the Week 11 in-class exercise is used to plot the map of Singapore.
Hence, I chose to plot one line per flat type instead: the line chart then shows only 5 lines, and their trends are easy to tell apart.
I proposed to have 3 charts:
Proposed sketch
dplyr provides a consistent set of verbs that help you solve the most common data manipulation challenges. I used group_by() and summarise() to calculate the average resale price.
sf provides support for simple features, a standardized way to encode spatial vector data.
tmap offers a flexible, layer-based, and easy to use approach to create thematic maps, such as choropleths and bubble maps.
plotly is a library that makes interactive, publication-quality graphs.
d3treeR is an R htmlwidget for d3.js treemaps. It is designed to integrate seamlessly with the R treemap package or work with traditional nested JSON hierarchies.
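# 'treemap' is included because treemap() is called when building the tree map
packages <- c('dplyr', 'sf', 'tmap', 'plotly', 'treemap')
for (p in packages){
  # install the package only if it cannot be loaded
  if (!require(p, character.only = TRUE)){
    install.packages(p)
  }
  library(p, character.only = TRUE)
}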
d3treeR has to be installed separately via:
install.packages("devtools")
library(devtools)
install_github("timelyportfolio/d3treeR")
library(d3treeR)  # load the package once it is installed
Both the main dataset and the geospatial data have to be loaded.
data <- read.csv("data.csv")
mpsz <- st_read(dsn = "geospatial",
layer = "MP14_SUBZONE_WEB_PL")
I plan to plot a line chart of the average resale price of each flat type over time.
# average resale price per month and flat type
data_type <- data %>%
  group_by(month, flat_type) %>%
  summarize(avg_price = mean(resale_price))
# one subset per flat type
room2 <- data_type %>%
  filter(flat_type == '2 ROOM')
room3 <- data_type %>%
  filter(flat_type == '3 ROOM')
room4 <- data_type %>%
  filter(flat_type == '4 ROOM')
room5 <- data_type %>%
  filter(flat_type == '5 ROOM')
executive <- data_type %>%
  filter(flat_type == 'EXECUTIVE')
# combine the subsets column-wise; this needs every flat type to have a row for every month
data_byRoom <- data.frame(room2$month, room2$avg_price, room3$avg_price, room4$avg_price, room5$avg_price, executive$avg_price)
colnames(data_byRoom) <- c("month", "tworoom", "threeroom", "fourroom", "fiveroom", "executive")
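The same kind of wide table could also be built in a single pipeline. The sketch below is an alternative, not the code used in this assignment: it assumes the tidyr package is installed (it is not among the packages loaded earlier) and uses a small made-up data frame, `toy`, in place of the real data.

```r
library(dplyr)
library(tidyr)  # assumption: tidyr is installed; it is not loaded elsewhere in this write-up

# Hypothetical toy data standing in for the HDB resale records
toy <- data.frame(
  month = c("2017-01", "2017-01", "2017-01", "2017-02", "2017-02"),
  flat_type = c("3 ROOM", "3 ROOM", "4 ROOM", "3 ROOM", "4 ROOM"),
  resale_price = c(300000, 320000, 450000, 310000, 460000)
)

toy_wide <- toy %>%
  group_by(month, flat_type) %>%
  summarise(avg_price = mean(resale_price), .groups = "drop") %>%
  # one column per flat type; a month with no sales of a type becomes NA
  pivot_wider(names_from = flat_type, values_from = avg_price)
```

Unlike combining separate vectors with data.frame(), this form does not require every flat type to have a sale in every month.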
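The filter(), group_by() and summarise() functions of the dplyr package, together with base R's mean(), are used to calculate the average resale price of each town from January 2020 onwards.
data_avgprice_byzone <- data %>%
  filter(month >= "2020-01") %>%
  group_by(town) %>%
  summarize(avg_price = mean(resale_price))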
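The month filter works on text rather than dates: strings in YYYY-MM form sort in chronological order, so a plain comparison is enough. A minimal illustration with made-up values, assuming the column is read as character (the default of read.csv() from R 4.0 onwards) rather than as a factor:

```r
# Hypothetical month values in the same YYYY-MM format as the data
months <- c("2019-12", "2020-01", "2020-11", "2021-03")

# String comparison keeps only the months from January 2020 onwards
keep <- months >= "2020-01"
months[keep]
```

If the column were a factor, the comparison would not be meaningful, so converting it with as.character() first would be the safer route.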
left_join() of dplyr is used to join the geographical data and the attribute table, using the planning area name (PLN_AREA_N in the map data and town in the resale data) as the common identifier.
data_map <- left_join(mpsz, data_avgprice_byzone,
                      by = c("PLN_AREA_N" = "town"))
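Because left_join() keeps every row of the left table, any planning area whose name has no exact match in `town` ends up with a missing `avg_price`. One way to list such mismatches is anti_join(). The sketch below uses two small hypothetical tables, `areas` and `prices`, rather than the real data, so the names in it are illustrative only.

```r
library(dplyr)

# Hypothetical stand-ins for the map attribute table and the price summary
areas <- data.frame(PLN_AREA_N = c("BEDOK", "YISHUN", "AREA WITHOUT SALES"))
prices <- data.frame(town = c("BEDOK", "YISHUN", "TOWN NOT ON MAP"),
                     avg_price = c(400000, 380000, 500000))

# Areas on the map with no matching price (avg_price would be NA after the join)
areas_without_price <- anti_join(areas, prices, by = c("PLN_AREA_N" = "town"))

# Towns in the price table that match no area on the map
towns_not_on_map <- anti_join(prices, areas, by = c("town" = "PLN_AREA_N"))
```

Running the same two checks on the map attribute table and the town-level price summary would show which names need recoding before the join.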
In order to build an interactive treemap, I construct a dataset with the planning area, the flat type, the corresponding number of units sold and the mean resale price.
I calculate the units sold and the mean resale price in 2 separate datasets.
avg_price_bytowntype <- data %>%
group_by(town, flat_type) %>%
summarise(avg_p=mean(resale_price))
count_bytowntype <- data %>%
group_by(town, flat_type) %>%
count(town)
After that, I summarise the variables needed into a final dataset and rename the columns for easy use.
treemapData <-data.frame(avg_price_bytowntype$town, count_bytowntype$flat_type,avg_price_bytowntype$avg_p, count_bytowntype$n)
colnames(treemapData) <- c("town", "flat_type","mean_resale_price", "quantity_sold")
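Since both summaries share the same grouping, they could also be computed in one step, which avoids relying on the two tables being in the same row order. A minimal sketch on hypothetical toy data (`toy` is not the real dataset):

```r
library(dplyr)

# Hypothetical toy data standing in for the HDB resale records
toy <- data.frame(
  town = c("BEDOK", "BEDOK", "BEDOK", "YISHUN"),
  flat_type = c("3 ROOM", "3 ROOM", "4 ROOM", "4 ROOM"),
  resale_price = c(300000, 340000, 450000, 400000)
)

toy_treemap <- toy %>%
  group_by(town, flat_type) %>%
  # mean price and number of transactions for each town and flat type
  summarise(mean_resale_price = mean(resale_price),
            quantity_sold = n(),
            .groups = "drop")
```

The result already carries the column names used for the treemap, so no separate renaming step is needed.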
plotly is used to plot an interactive line chart.
plot_ly(data_byRoom,x= ~month) %>%
add_lines(y = ~tworoom, name='2 room', mode = 'lines') %>%
add_lines(y = ~threeroom, name='3 room',mode = 'lines')%>%
add_lines(y = ~fourroom, name='4 room',mode = 'lines')%>%
add_lines(y = ~fiveroom, name='5 room',mode = 'lines')%>%
add_lines(y = ~executive, name='executive',mode = 'lines')
tmap is used to plot an interactive choropleth map.
tmap_mode("view")
tm_shape(data_map) +
tm_fill("avg_price",
style = "quantile",
palette = "Blues",
title = "Average Resale Price") +
tm_layout(main.title = "Distribution of Average Price \nby Area",
main.title.position = "center",
main.title.size = 1.2,
frame = TRUE) +
tm_borders(alpha = 0.5)+
tm_scale_bar(width = 0.15)
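treemap and d3treeR are used to plot an interactive treemap: treemap() builds the static treemap, and d3treeR turns it into an interactive one.
library(treemap)
# Assumption: minSales and maxSales are not defined in the code shown here;
# they are taken to be the range of the mean resale price used for colouring
minSales <- min(treemapData$mean_resale_price)
maxSales <- max(treemapData$mean_resale_price)
tm <- treemap(treemapData, index = c("town", "flat_type"),
              vSize = "quantity_sold", vColor = "mean_resale_price",
              type = "value",
              palette = "RdYlBu",
              range = c(minSales, maxSales),
              title = "HDB Resale by Planning Area and Flat Type",
              title.legend = "Mean Resale Price")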
d3tree(tm,rootname = "Singapore" )
From the line chart of the average resale price trend over time for different flat types, here are some insights:
Executive flats are the most expensive of all HDB flat types.
Resale prices were relatively stable from 2017 to the first half of 2020. However, since the second half of 2020, resale prices have trended upward for all flat types.
## Reading layer `MP14_SUBZONE_WEB_PL' from data source `C:\Users\Zhengnan\Desktop\AY2020-2021 Term 2\Visual for BI\A5\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 323 features and 15 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## Projected CRS: SVY21
From the Map Plot, there are some insights:
My apologies: for reasons I could not identify, the static treemap is shown twice when I knit the R Markdown file, even though I did not intend to show it.
From the Tree Map Plot, there are some insights:
Central areas sold very few units, but their resale prices are relatively high.
Sengkang, Punggol, Yishun, Jurong West and Woodlands sold the most units from 2017 to 2021, but prices in these areas are relatively low.