Shapefile: https://data.gov.sg/dataset/master-plan-2014-planning-area-boundary-web (Urban Redevelopment Authority, 2014)
HDB Resale Flat Prices: https://data.gov.sg/dataset/resale-flat-prices (Housing and Development Board, 2019)
Since 1990, the Housing and Development Board of Singapore (HDB) has been keeping track of the re-selling of public apartment housing into the secondary market. Published on Singapore’s open data portal, data.gov.sg, this dataset will be used alongside the 2014 Urban Redevelopment Authority’s (URA) 2014 Master Plan that defines the boundaries of the various planning areas across Singapore. In order to create the maps in this notebook, the shape files from the 2014 planning area boundary master plan was used to define the polygons representing the different planning areas in Singapore on the maps.
This notebook attempts to make use of the data listed above to reveal some insights into the median resale prices of HDB flats in Singapore from January 2017 to August 2019. This dataset contains the details of the HDB resale transactions, including unit information such as floor area in square meters, number of rooms, address in addition to the resale price.
The original dataset is downloaded from data.gov.sg during August 2019, and hence is missing the latest 2020 data. However, the data has gone through a round of feature engineering in external tools in order to create more relevant information such as distance to nearby amenities, price per square meter area, and latitude / longitude of the unit.
In this visualisation, we will be focusing on making use of geospatial plots, mainly choropleth maps and bubble maps, to visualise the median resale price / median price per square meter of units by Planning Area, total number of transactions in the area over the duration captured within the data. While there will be a few different maps plotted, the final visualization will be one that attempts to emcompass as much useful information in a aesthetically-pleasing and user-friendly map.
There was a need to carry out some data transformation on the dataset to map the data’s ‘Area’ column indicating geographical planning area to the shape file’s area name as specified by URA. There was also a need to process the shape file using the correct Geodetic System (WGS84), as Singapore’s SVY21 is not supported in the libraries used. These coordinates were also used to obtain the accurate latitude and longitude in order to plot other items such as individual sale transactions and the centroids of planning areas.
While the dataset downloaded from data.gov.sg contained plenty of rows, the actual columns of information captured inside was limited, missing information such as price per square meter and geospatial data such as coordinates or latitiude/longitude. There was a need to carry out feature engineering to create these additional data needed for effective visualisation. While some features like price / sq. meter was simple to create, there was a need to query other resources like the SLA OneMap API to get the lat/long of the HDB flats sold in order to plot the transactions on a map.
The dataset is large, consisting of over 57,000 transactions of HDB resale flats across 24 different planning areas in Singapore. The large amount of data meant that plotting every single transaction on a map was not feasible. Instead, sensible aggregation measures were needed to create a map that was not cluttered yet still offered interesting insights. In this case, the data was grouped by planning area in order to reduce the amount of data that was plotted, while taking the median prices to reduce the effect of outlier transactions that go for abnormally high or low prices.
The main chart used is the choropleth map, along with bubble map. The purple colour scale of the individual planning areas indicate the area’s median price per square meter. The intensitity of the purple shading increases with the median price, while the size of the yellow bubble in the area indicates the total number of transactions recorded. In addition to the combined map, users are able to hover over the different planning areas to get an easily-readable tooltip that contains the actual figures of the planning area. As not areas have data, Null values are represented by grey shading.
packages <- c('tidyverse', 'skimr', 'leaflet','rgdal', 'data.table', 'sp', 'cartography')
for (p in packages) {
if(!require(p, character.only = T)) {
install.packages(p)
}
library(p, character.only = T)
}
## Loading required package: tidyverse
## -- Attaching packages -------------------------------------------------------------------------------------------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.0 v purrr 0.3.3
## v tibble 3.0.3 v dplyr 0.8.5
## v tidyr 1.0.2 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.5.0
## -- Conflicts ----------------------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## Loading required package: skimr
## Loading required package: leaflet
## Loading required package: rgdal
## Loading required package: sp
## rgdal: version: 1.4-8, (SVN revision 845)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 2.2.3, released 2017/11/20
## Path to GDAL shared files: G:/R-3.6.3/library/rgdal/gdal
## GDAL binary built with GEOS: TRUE
## Loaded PROJ.4 runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493]
## Path to PROJ.4 shared files: G:/R-3.6.3/library/rgdal/proj
## Linking to sp version: 1.4-1
## Loading required package: data.table
##
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
##
## between, first, last
## The following object is masked from 'package:purrr':
##
## transpose
## Loading required package: cartography
options(scipen=999)
Read in data
hdb_data <- utils::read.csv('./resale_processed.csv')
# Cast factors to strings
hdb_data <- hdb_data %>% mutate_if(is.factor, as.character)
# Inspect dataframe structure
str(hdb_data)
## 'data.frame': 57542 obs. of 34 variables:
## $ town : chr "ANG MO KIO" "ANG MO KIO" "ANG MO KIO" "ANG MO KIO" ...
## $ block : chr "101" "101" "101" "101" ...
## $ street_name : chr "ANG MO KIO AVE 3" "ANG MO KIO AVE 3" "ANG MO KIO AVE 3" "ANG MO KIO AVE 3" ...
## $ floor_area_sqm : num 117 117 117 117 127 106 127 112 106 74 ...
## $ flat_model : chr "Standard" "Standard" "Standard" "Standard" ...
## $ resale_price : num 500000 552000 528000 564000 648000 455000 640000 515000 410000 345000 ...
## $ latitude : num 1.37 1.37 1.37 1.37 1.37 ...
## $ longitude : num 104 104 104 104 104 ...
## $ nearest_mrt_x : chr "TE6 Mayflower" "TE6 Mayflower" "TE6 Mayflower" "TE6 Mayflower" ...
## $ nearest_mrt_now_x : chr "NS16 Ang Mo Kio" "NS16 Ang Mo Kio" "NS16 Ang Mo Kio" "NS16 Ang Mo Kio" ...
## $ timeToMRT_x : num 5.83 5.83 5.83 5.83 5.32 5.32 5.32 5.32 5.32 4.82 ...
## $ distanceToMRT_x : int 486 486 486 486 443 443 443 443 443 401 ...
## $ timeToMRTNow_x : num 14.9 14.9 14.9 14.9 15.4 ...
## $ distanceToMRTNow_x : int 1242 1242 1242 1242 1284 1284 1284 1284 1284 1323 ...
## $ nearest_mall_x : chr "Broadway Plaza" "Broadway Plaza" "Broadway Plaza" "Broadway Plaza" ...
## $ nearest_school_x : chr "ANG MO KIO PRIMARY SCHOOL" "ANG MO KIO PRIMARY SCHOOL" "ANG MO KIO PRIMARY SCHOOL" "ANG MO KIO PRIMARY SCHOOL" ...
## $ nearest_healthcare_x : chr "Thomson Medical Centre" "Thomson Medical Centre" "Thomson Medical Centre" "Thomson Medical Centre" ...
## $ nearest_mrt_y : chr "TE6 Mayflower" "TE6 Mayflower" "TE6 Mayflower" "TE6 Mayflower" ...
## $ nearest_mrt_now_y : chr "NS16 Ang Mo Kio" "NS16 Ang Mo Kio" "NS16 Ang Mo Kio" "NS16 Ang Mo Kio" ...
## $ nearest_mall_y : chr "Broadway Plaza" "Broadway Plaza" "Broadway Plaza" "Broadway Plaza" ...
## $ nearest_school_y : chr "ANG MO KIO PRIMARY SCHOOL" "ANG MO KIO PRIMARY SCHOOL" "ANG MO KIO PRIMARY SCHOOL" "ANG MO KIO PRIMARY SCHOOL" ...
## $ nearest_healthcare_y : chr "Thomson Medical Centre" "Thomson Medical Centre" "Thomson Medical Centre" "Thomson Medical Centre" ...
## $ timeToMall : num 15.1 15.1 15.1 15.1 15.6 ...
## $ distanceToMall : int 1258 1258 1258 1258 1299 1299 1299 1299 1299 1333 ...
## $ timeToSch : num 2.1 2.1 2.1 2.1 2.7 2.7 2.7 2.7 2.7 3.48 ...
## $ distanceToSch : int 174 174 174 174 224 224 224 224 224 289 ...
## $ timeToHealthcare : num 13 13 13 13 13 ...
## $ distanceToHealthcare : int 6561 6561 6561 6561 6561 6561 6561 6561 6561 6563 ...
## $ remaining_lease_mths : int 721 708 699 698 898 882 878 875 875 717 ...
## $ price_per_sqm : num 4274 4718 4513 4821 5102 ...
## $ storey_range_avg : int 2 5 5 11 8 5 5 5 11 5 ...
## $ lg_price_per_sqm_norm: num -0.118 0.299 0.112 0.39 0.629 ...
## $ lg_resale_price_norm : num 0.555 0.856 0.721 0.922 1.345 ...
## $ flat_type_num : int 5 5 5 5 5 4 5 4 4 3 ...
View actual data
head(hdb_data, 15)
Generate Summary Statistics
skim(hdb_data)
| Name | hdb_data |
| Number of rows | 57542 |
| Number of columns | 34 |
| _______________________ | |
| Column type frequency: | |
| character | 14 |
| numeric | 20 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| town | 0 | 1 | 5 | 15 | 0 | 26 | 0 |
| block | 0 | 1 | 1 | 4 | 0 | 2336 | 0 |
| street_name | 0 | 1 | 7 | 20 | 0 | 536 | 0 |
| flat_model | 0 | 1 | 4 | 22 | 0 | 19 | 0 |
| nearest_mrt_x | 0 | 1 | 8 | 26 | 0 | 127 | 0 |
| nearest_mrt_now_x | 0 | 1 | 9 | 26 | 0 | 91 | 0 |
| nearest_mall_x | 0 | 1 | 3 | 31 | 0 | 119 | 0 |
| nearest_school_x | 0 | 1 | 13 | 44 | 0 | 176 | 0 |
| nearest_healthcare_x | 0 | 1 | 16 | 34 | 0 | 36 | 0 |
| nearest_mrt_y | 0 | 1 | 8 | 26 | 0 | 127 | 0 |
| nearest_mrt_now_y | 0 | 1 | 9 | 26 | 0 | 91 | 0 |
| nearest_mall_y | 0 | 1 | 3 | 31 | 0 | 119 | 0 |
| nearest_school_y | 0 | 1 | 13 | 44 | 0 | 176 | 0 |
| nearest_healthcare_y | 0 | 1 | 16 | 34 | 0 | 36 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| floor_area_sqm | 0 | 1 | 97.87 | 24.21 | 31.00 | 82.00 | 96.00 | 113.00 | 249.00 | ▃▇▃▁▁ |
| resale_price | 0 | 1 | 440366.43 | 154821.16 | 160000.00 | 330000.00 | 408000.00 | 515000.00 | 1205000.00 | ▆▇▂▁▁ |
| latitude | 0 | 1 | 1.37 | 0.04 | 1.27 | 1.34 | 1.36 | 1.39 | 1.46 | ▂▆▇▅▃ |
| longitude | 0 | 1 | 103.84 | 0.07 | 103.69 | 103.78 | 103.84 | 103.90 | 103.99 | ▃▆▇▇▃ |
| timeToMRT_x | 0 | 1 | 11.77 | 7.34 | 0.00 | 7.03 | 10.13 | 14.47 | 237.27 | ▇▁▁▁▁ |
| distanceToMRT_x | 0 | 1 | 980.50 | 611.62 | 0.00 | 586.00 | 844.00 | 1204.00 | 19771.00 | ▇▁▁▁▁ |
| timeToMRTNow_x | 0 | 1 | 12.88 | 7.00 | 0.00 | 7.88 | 11.65 | 16.22 | 209.90 | ▇▁▁▁▁ |
| distanceToMRTNow_x | 0 | 1 | 1072.67 | 583.08 | 0.00 | 657.00 | 971.00 | 1352.00 | 17490.00 | ▇▁▁▁▁ |
| timeToMall | 0 | 1 | 12.13 | 8.68 | 0.00 | 7.30 | 10.82 | 15.05 | 266.08 | ▇▁▁▁▁ |
| distanceToMall | 0 | 1 | 1010.72 | 723.18 | 0.00 | 608.00 | 902.00 | 1254.00 | 22174.00 | ▇▁▁▁▁ |
| timeToSch | 0 | 1 | 8.25 | 7.56 | 0.08 | 4.83 | 6.92 | 9.77 | 261.83 | ▇▁▁▁▁ |
| distanceToSch | 0 | 1 | 687.43 | 630.24 | 7.00 | 402.00 | 576.00 | 814.00 | 21821.00 | ▇▁▁▁▁ |
| timeToHealthcare | 0 | 1 | 22.19 | 9.00 | 0.00 | 15.60 | 20.88 | 29.37 | 41.18 | ▂▆▇▇▃ |
| distanceToHealthcare | 0 | 1 | 14058.04 | 6855.14 | 0.00 | 9097.00 | 13265.00 | 19993.00 | 29116.00 | ▅▇▆▇▃ |
| remaining_lease_mths | 0 | 1 | 893.76 | 147.83 | 555.00 | 783.00 | 894.00 | 1000.00 | 1160.00 | ▂▇▆▇▅ |
| price_per_sqm | 0 | 1 | 4529.51 | 1214.16 | 2207.79 | 3728.81 | 4227.27 | 4923.91 | 11808.51 | ▇▇▂▁▁ |
| storey_range_avg | 0 | 1 | 8.66 | 5.80 | 2.00 | 5.00 | 8.00 | 11.00 | 50.00 | ▇▂▁▁▁ |
| lg_price_per_sqm_norm | 0 | 1 | 0.00 | 1.00 | -2.90 | -0.69 | -0.16 | 0.48 | 4.17 | ▁▇▆▂▁ |
| lg_resale_price_norm | 0 | 1 | 0.00 | 1.00 | -2.92 | -0.71 | -0.06 | 0.64 | 3.23 | ▁▆▇▃▁ |
| flat_type_num | 0 | 1 | 4.14 | 0.92 | 1.00 | 3.00 | 4.00 | 5.00 | 6.00 | ▁▅▇▅▂ |
shapefile <- readOGR('./master-plan-2014-planning-area-boundary-web-shp/MP14_PLNG_AREA_WEB_PL.shp')
## OGR data source with driver: ESRI Shapefile
## Source: "G:\OneDrive - Singapore Management University\VA\Assignment 5\master-plan-2014-planning-area-boundary-web-shp\MP14_PLNG_AREA_WEB_PL.shp", layer: "MP14_PLNG_AREA_WEB_PL"
## with 55 features
## It has 12 fields
sg_area <- spTransform(shapefile, '+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs')
latlong <- data.table(coordinates(spTransform(sg_area, '+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs')))
sg_area@data$longitude <- latlong$V1
sg_area@data$latitude <- latlong$V2
hdb_data$town[hdb_data$town == "KALLANG/WHAMPOA"] <- "KALLANG"
Plotting the data on a map
testMap <- leaflet() %>%
addMapPane(name = "maplabels", zIndex = 420) %>%
addTiles() %>%
addProviderTiles("CartoDB.PositronOnlyLabels",
options = leafletOptions(pane = "maplabels")) %>%
addCircleMarkers(data = hdb_data,
radius = 1
)
## Assuming "longitude" and "latitude" are longitude and latitude, respectively
testMap
From the sheer number of blue dots all across the map, we can tell that there are a lot of transactions across the whole of Singapore, and that it is not feasible to plot out every single transaction as the map would be unreadable, even with colour-coding and size hierarchy. There is a need to do aggregation of the data by town (Planning area).
map_data <- setDT(hdb_data)[,list(`Median Resale Price`=median(resale_price), `median price per sq meter`=median(price_per_sqm)), by = .(town), with = TRUE]
map_data$`number of transactions` <- count(hdb_data, town) %>%
rename(`number of transactions` = n) %>% .$`number of transactions`
map_data
Creating the base data for plotting the maps
shape.data <- sg_area@data
shape.data$id <- as.numeric(1:nrow(shape.data))
names(shape.data)[names(shape.data) == "PLN_AREA_N"] <- "town"
new.shape.data <- merge(shape.data, map_data, by = "town", all.x = TRUE)
new.shape.data <- new.shape.data[order(new.shape.data$id,decreasing = FALSE),]
sg_area@data <- new.shape.data
# Prepare colour palette
mypalette <- colorBin(palette=carto.pal('purple.pal', n1 = 10), domain=sg_area@data$`Median Resale Price`, na.color="lightgrey")
# Prepare label text to be shown on hover
label_text <- paste(
"Town: ", sg_area@data$town,"<br/>",
"Median price: ", sg_area@data$`Median Resale Price`, "<br />",
sep="") %>%
lapply(htmltools::HTML)
# Plot choropleth map using leaflet
choroplethMap <- leaflet(sg_area) %>%
addMapPane(name = "maplabels", zIndex = 420) %>%
addTiles() %>%
addProviderTiles("CartoDB.PositronOnlyLabels",
options = leafletOptions(pane = "maplabels")) %>%
addPolygons(
fillColor = ~mypalette(sg_area@data$`Median Resale Price`),
weight = 2,
opacity = 1,
fillOpacity = 0.9,
color = 'grey',
label = label_text
) %>%
leaflet::addLegend(pal = mypalette, values = sg_area@data$`Median Resale Price`, title = "Median Resale Price", position = "bottomright")
choroplethMap
From the output, we realise that taking the entire unit’s resale price may not be an entirely accurate reflection of the prices in the area, as it is possible that the high prices could simply be the result of having larger apartment units. Therefore, we repeat the same chart, but using price per square meter instead.
# Prepare colour palette
mypalette <- colorBin(palette=carto.pal('purple.pal', n1 = 10), domain=sg_area@data$`median price per sq meter`, na.color="lightgrey")
# Prepare label text to be shown on hover
label_text <- paste(
"Town: ", sg_area@data$town,"<br/>",
"Median $ / Sq. M: ", sg_area@data$`median price per sq meter`, "<br />",
sep="") %>%
lapply(htmltools::HTML)
# Plot choropleth map using leaflet
choroplethMap <- leaflet(sg_area) %>%
addMapPane(name = "maplabels", zIndex = 420) %>%
addTiles() %>%
addProviderTiles("CartoDB.PositronOnlyLabels",
options = leafletOptions(pane = "maplabels")) %>%
addPolygons(
fillColor = ~mypalette(sg_area@data$`median price per sq meter`),
weight = 2,
opacity = 1,
fillOpacity = 0.9,
color = 'grey',
label = label_text
) %>%
leaflet::addLegend(pal = mypalette, values = sg_area@data$`median price per sq meter`, title = "Median Price Per Square Meter", position = "bottomright")
choroplethMap
From the plot, we see a slight change in the results, with some towns (planning areas) swapping colour bin, but still generally remaining in their respective high or low ends.
Likewise, another choropleth map can be plotted for the total number of resale flat transactions in the area over the period of time
mypalette <- colorBin(palette=carto.pal('sand.pal', n1 = 10), domain=sg_area@data$`number of transactions`, na.color="lightgrey")
# Prepare label text to be shown on hover
label_text <- paste(
"Town: ", sg_area@data$town,"<br/>",
"Number of transactions: ", sg_area@data$`number of transactions`,
sep="") %>%
lapply(htmltools::HTML)
# Plot choropleth map using leaflet
choroplethMap <- leaflet(sg_area) %>%
addMapPane(name = "maplabels", zIndex = 420) %>%
addTiles() %>%
addProviderTiles("CartoDB.PositronOnlyLabels",
options = leafletOptions(pane = "maplabels")) %>%
addPolygons(
fillColor = ~mypalette(sg_area@data$`number of transactions`),
weight = 2,
opacity = 1,
fillOpacity = 0.9,
color = 'grey',
label = label_text
) %>%
leaflet::addLegend(pal = mypalette, values = sg_area@data$`number of transactions`, title = "Number of transactions", position = "bottomright")
choroplethMap
Both choropleth maps above give useful information that is useful for readers, however it is not convenient to scroll between two different maps, especially when comparing the data of the two different factors (number of transactions, price). Therefore, the solution is to turn one of the choropleth maps into a bubble map, and overlay the bubbles on top of the other choropleth map. The tooltips can also be combined to give a more complete picture at a glance simply by mousing over the respective planning areas.
# Prepare colour palette
mypalette <- colorBin(palette=carto.pal('purple.pal', n1 = 10), domain=sg_area@data$`median price per sq meter`, na.color="lightgrey")
# Prepare label text to be shown on hover
label_text <- paste(
"Town: ", sg_area@data$town,"<br/>",
"Median $ / Sq. M: $", round(sg_area@data$`median price per sq meter`,2), "<br />",
"Median Price: $", sg_area@data$`Median Resale Price`, "<br />",
"Number of transactions: ", sg_area@data$`number of transactions`,
sep="") %>%
lapply(htmltools::HTML)
# Plot choropleth map using leaflet
finalMap <- leaflet(sg_area) %>%
addMapPane(name = "maplabels", zIndex = 420) %>%
addTiles() %>%
addProviderTiles("CartoDB.PositronOnlyLabels",
options = leafletOptions(pane = "maplabels")) %>%
setView(lat = 1.35, lng = 103.8, zoom=11) %>%
addPolygons(
fillColor = ~mypalette(sg_area@data$`median price per sq meter`),
weight = 2,
opacity = 1,
fillOpacity = 0.9,
color = 'grey',
label = label_text
) %>%
addCircleMarkers( data = sg_area@data,
radius = ~sg_area@data$`number of transactions` / 250,
weight = 2,
opacity = 1,
fillOpacity = 0.9,
fillColor = '#ffcc02',
color = "black",
label = label_text,
stroke = TRUE
) %>%
leaflet::addLegend(pal = mypalette, values = sg_area@data$`median price per sq meter`, title = "Median Price Per Square Meter", position = "bottomright")
## Assuming "longitude" and "latitude" are longitude and latitude, respectively
finalMap
Median price per square meter of resale HDB flats were the highest in Queenstown ($6584), followed by Bukit Timah ($6490) and Bukit Merah ($6096). The rankings for overall Median HDB resale prices are slightly different, with Bukit Timah ($757,500) with the highest median resale price by some distance, followed by Bukit Merah ($586,900) and Queenstown ($566,000). This could indicate that the HDB resale units in Bukit Timah are larger, as they command a higher overall sale price at a lower price per square meter. This is also in line with general reasoning as they are located close to the central business district and commerical offices concentrated in the South. It may be worth further exploration into other possible factors that cause these higher prices such as accessibility to public transport, number of schools and other amenities nearby.
The most number of HDB resale transactions occured in Woodlands, Jurong West and Sengkang, with median prices per square meter on the lower end of the scale. In general, The higher the median price per square meter of HDB resale flats in the area, the lower the number of transactions there are in the area, as shown by the small sizes of the yellow bubbles in deep purple areas like Bukit Timah and Marine Parade, with Bukit Merah being an exception where the prices are in the top-3 while having relatively high amounts of transactions still. It could be a reflection of the demand and supply within the area, as prices get driven up due to low supply of sellers.
With more data, we can also do a year-on-year comparison to determine if there have been fluctuations in HDB resale prices over time. Also with a greater range of years, these visualisations can be expanded into a fully-fledged R-Shiny dashboard where users can freely choose which factors to be used on the chart instead of being bound to just price and number of transactions as it is now. Other supporting visualisations can also be created using the raw transaction data, such as distribution of flat types sold within the area (Pie chart of the 3/4/5 room flats), histogram of all sales within the area etc.