Singapore HDB’s Mean Resale Price in 2019
Overview
For this assignment, we will explore the dataset taken from this source:
The purpose of the visualization is to explore Singapore HDB resale flat prices in 2019. We aim to address the following:
What is the mean resale price broken down by town?
Do higher-storey HDB flats command a higher price than lower-storey flats?
An interactive map to visualize the resale price in each town
Finally, an interactive map broken down by region
Major Data and Design Challenges
Key Challenges
1. Data Challenge - Missing Year Data
The source dataset has no separate year field. The date field concatenates the year and month, in the form 2017-01, 2017-02, etc. The challenge is how to extract the year from this format.
2. Design Challenge - No obvious cluster for visualization
There are a total of 26 towns, which makes the dataset difficult to visualize. Using all 26 towns as clusters is not feasible: with that many groups, users would struggle to understand what is being visualized and to draw useful insights.
3. Design Challenge - No obvious common field between aspatial and geospatial datasets
The location field in our aspatial resale data is ‘town’, but there is no ‘town’ field in our geospatial dataset. Since the geospatial dataset has many locational fields, the challenge is to identify the correct common field for the mapping.
Suggestions to Overcome Challenges
| No. | Challenge | Proposed Solution |
|---|---|---|
| 1 | Missing year data | Cleanse the data in Excel: use the LEFT() function to extract the first 4 characters of the date field |
| 2 | No obvious cluster | Break down the towns into regions (Central, East, North, North-East and West). We will then visualize the data by region, which is a more manageable cluster size than planning area or subzone. To get the region information, download the URA Master Plan subzone boundary in shapefile format (i.e. MP14_SUBZONE_WEB_PL) from Data.gov.sg, then map the information accordingly. |
| 3 | No obvious common field | After analysing both the aspatial and geospatial files, we found that the field ‘town’ in the aspatial file can be mapped to the field ‘PLN_AREA_N’ in the geospatial file. We will use a left join to accomplish this. |
Proposed Design
Step-by-Step Guide
Step 1: Load Library
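The functions called in the later steps suggest which packages are needed: st_read() comes from sf, group_by() and summarise() from dplyr, ggplot() from ggplot2, hc_theme_smpl() from highcharter, and tm_shape() from tmap. A minimal sketch of the loading step, where the exact package list is an assumption inferred from those calls:

```r
# Assumed package list, inferred from the functions called in later steps.
pkgs <- c("sf", "tmap", "dplyr", "ggplot2", "highcharter")

# Identify any packages that are not yet installed.
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]

if (length(missing_pkgs) > 0) {
  message("Install first: ", paste(missing_pkgs, collapse = ", "))
} else {
  # Attach every package so its functions can be called unqualified.
  invisible(lapply(pkgs, library, character.only = TRUE))
}
```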
Step 2: Load Data
resale <- read.csv("C:/Users/think/Desktop/1-SMU Term 3/5-Visual Analytics and Applications/Assignment 5/data/aspatial/resale.csv", header = TRUE)
mpsz <- st_read(dsn = "C:/Users/think/Desktop/1-SMU Term 3/5-Visual Analytics and Applications/Assignment 5/data/geospatial", layer = "MP14_SUBZONE_WEB_PL")
## Reading layer `MP14_SUBZONE_WEB_PL' from data source `C:\Users\think\Desktop\1-SMU Term 3\5-Visual Analytics and Applications\Assignment 5\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 323 features and 15 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS: SVY21
Step 3: Extract 2019 dataset and important fields
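One way to do this entirely in R, rather than in Excel, is to take the first four characters of the year-month field and filter on the result. A minimal sketch on made-up rows, assuming the date field is named `month` and that `town`, `storey_range` and `resale_price` are the fields kept for the later steps:

```r
# Toy rows standing in for resale.csv; all values are made up for illustration.
resale_toy <- data.frame(
  month        = c("2018-12", "2019-01", "2019-07", "2020-01"),
  town         = c("BISHAN", "BISHAN", "YISHUN", "YISHUN"),
  storey_range = c("01 TO 03", "10 TO 12", "04 TO 06", "07 TO 09"),
  resale_price = c(500000, 650000, 330000, 350000),
  stringsAsFactors = FALSE
)

# The first four characters of "YYYY-MM" are the year.
resale_toy$year <- as.integer(substr(resale_toy$month, 1, 4))

# Keep only 2019 transactions and the fields used in later steps.
resale2019_toy <- resale_toy[resale_toy$year == 2019,
                             c("town", "storey_range", "resale_price")]
```

The same two lines applied to the full `resale` data frame would yield the 2019 subset used in the following steps.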
Step 4: Visualize the mean resale price broken down by town, in an interactive bar chart
First, prepare the data using this code:
options(highcharter.theme = hc_theme_smpl(tooltip = list(valueDecimals = 2)))
plotmean <- resale2019 %>%
  group_by(town) %>%
  summarise(Mean_Resale_Price = mean(resale_price))
plotmean_sorted <- plotmean[order(plotmean$Mean_Resale_Price), ]
Next, plot the interactive barchart using this code:
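A minimal sketch of such a bar chart, assuming highcharter's hchart() with an hcaes() mapping is used. The toy table, its prices and the chart title are made up for illustration, and the chart is only built if highcharter is installed:

```r
# Toy summary table; the prices are made up for illustration.
plotmean_toy <- data.frame(
  town = c("YISHUN", "BISHAN", "PUNGGOL"),
  Mean_Resale_Price = c(350000, 640000, 450000),
  stringsAsFactors = FALSE
)

# Sort ascending by mean price, as in the preparation step.
plotmean_toy_sorted <- plotmean_toy[order(plotmean_toy$Mean_Resale_Price), ]

# Build the interactive chart only if highcharter is available.
if (requireNamespace("highcharter", quietly = TRUE)) {
  bar_chart <- highcharter::hchart(
    plotmean_toy_sorted, "bar",
    highcharter::hcaes(x = town, y = Mean_Resale_Price)
  )
  bar_chart <- highcharter::hc_title(bar_chart,
                                     text = "Mean HDB resale price by town")
}
```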
Step 5: Visualize whether higher storey HDB commands a higher price, in an interactive heatmap
First, prepare the data using this code:
plotfloor <- resale2019 %>%
  group_by(town, storey_range) %>%
  summarise(Mean_Resale_Price = mean(resale_price))
Next, plot the interactive heatmap using this code:
heatmap <- ggplot(data = plotfloor,
                  mapping = aes(x = town, y = storey_range, fill = Mean_Resale_Price)) +
  geom_tile() +
  labs(title = "Heatmap of HDB breakdown by area and storey", x = "Town", y = "Storey") +
  scale_fill_gradient(name = "Mean Resale Price",
                      low = "#ffedde",
                      high = "#c24b0c") +
  theme(axis.text.x = element_text(angle = 45))
Step 6: Visualize the mean resale price using a Choropleth Map
First, prepare the data using this code:
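A minimal sketch of this preparation, assuming the left join between `PLN_AREA_N` and `town` proposed earlier. Plain data frames with made-up prices stand in for the real objects; with the actual `sf` object, dplyr's left_join() with `by = c("PLN_AREA_N" = "town")` would play the same role while keeping the geometry column.

```r
# Stand-in for the subzone attribute table (geometry omitted for illustration).
mpsz_toy <- data.frame(
  SUBZONE_N  = c("MARYMOUNT", "UPPER THOMSON", "YISHUN EAST", "TUAS BAY"),
  PLN_AREA_N = c("BISHAN", "BISHAN", "YISHUN", "TUAS"),
  REGION_N   = c("CENTRAL REGION", "CENTRAL REGION", "NORTH REGION", "WEST REGION"),
  stringsAsFactors = FALSE
)

# Stand-in for the per-town mean prices; figures are made up.
plotmean_toy <- data.frame(
  town = c("BISHAN", "YISHUN"),
  Mean_Resale_Price = c(640000, 350000),
  stringsAsFactors = FALSE
)

# Left join: keep every subzone, attach the town mean where the names match.
mpsz_resale_toy <- merge(mpsz_toy, plotmean_toy,
                         by.x = "PLN_AREA_N", by.y = "town",
                         all.x = TRUE)

# Subzones in planning areas without a matching town end up with NA.
unmatched <- mpsz_resale_toy$SUBZONE_N[is.na(mpsz_resale_toy$Mean_Resale_Price)]
```

Checking `unmatched` after the join is a quick way to spot town names that do not line up with any planning area name.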
Next, plot an interactive choropleth map using this code:
tm_shape(mpsz_resale2019) +
  tm_fill("Mean_Resale_Price",
          n = 6,
          style = "quantile",
          palette = "Oranges") +
  tm_borders(alpha = 0.5)
For deeper analysis, we further broke down the choropleth map into different regions using this code:
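A minimal sketch of the regional breakdown, assuming the joined object keeps the `REGION_N` field from the subzone layer and that tmap's tm_facets() is used to draw one panel per region. The helper name `plot_by_region` is hypothetical.

```r
# Hypothetical helper: one choropleth panel per region.
# Assumes `sf_obj` is an sf object with Mean_Resale_Price and REGION_N columns.
plot_by_region <- function(sf_obj) {
  tmap::tm_shape(sf_obj) +
    tmap::tm_fill("Mean_Resale_Price",
                  n = 6,
                  style = "quantile",
                  palette = "Oranges") +
    tmap::tm_borders(alpha = 0.5) +
    # Split the map into small multiples, one per region.
    tmap::tm_facets(by = "REGION_N")
}
```

Under these assumptions, calling `plot_by_region(mpsz_resale2019)` after the join step would produce the faceted map.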
Final Visualization
Insight 1: HDB flats in Bukit Timah, Bishan and Central Area are the most expensive
Insight 2.1: Higher storeys command better prices
Insight 2.2: There is little price differentiation by storey in less popular towns
Insight 2.3: Bukit Timah has no high-storey blocks, while Central Area has the most
First, based on the heatmap, it is generally true that higher-storey units command a better price, as shown by the darker colours on the higher storeys.
However, in less popular towns such as Woodlands, Sembawang, Choa Chu Kang, Jurong West, Pasir Ris, Punggol and Hougang, there is not much differentiation in price even for higher-storey units. This is probably because these towns are at the edges of Singapore, so commuting from them to town is inconvenient. There is therefore less demand for those flats, resulting in lower prices.
Finally, the heatmap shows that flats in Bukit Timah are relatively low-rise compared to other towns. The highest storey in Bukit Timah is only 15. This is probably because Bukit Timah is known as an exclusive area with many bungalows and terrace houses. To maintain its “prestige”, the urban planning authorities might have intentionally kept HDB blocks in Bukit Timah low too.
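The storey effect described above can also be checked numerically instead of by eye. A minimal sketch, assuming a table shaped like `plotfloor` (one mean price per town and storey range); the towns and prices below are made up for illustration and are not results from the dataset:

```r
# Toy table in the shape of plotfloor; all values are made up.
plotfloor_toy <- data.frame(
  town = rep(c("TOWN A", "TOWN B"), each = 3),
  storey_range = rep(c("01 TO 03", "04 TO 06", "07 TO 09"), times = 2),
  Mean_Resale_Price = c(400000, 450000, 520000, 300000, 305000, 310000),
  stringsAsFactors = FALSE
)

# Spread between the priciest and cheapest storey range within each town.
spread <- aggregate(Mean_Resale_Price ~ town, data = plotfloor_toy,
                    FUN = function(p) max(p) - min(p))
names(spread)[2] <- "Price_Spread"
```

A small spread corresponds to the near-uniform colour seen for a town in the heatmap.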
Insight 3: HDB flats in the Central region are the most expensive, while those in the North region are the cheapest
Based on the choropleth map below, certain areas are more expensive than others, and prices are usually similar within a surrounding cluster.
When we dive deeper into the map and classify the areas into their respective regions, HDB flats in the Central region are the most expensive.
Central region areas such as Toa Payoh, Kallang and Tanjong Pagar are usually mature towns with many amenities, shopping centers, offices and gym facilities. It is therefore reasonable that they command a higher price for the convenience the location provides.
HDB flats in the North region are the cheapest, as these are usually newer towns like Sembawang with fewer amenities.