1. Overview

This report aims to visualise the geographical distribution of Singapore’s median property prices and resident’s median income by planning area.

1.1. Purpose of Visualisation

The purpose of the visualisation is to understand whether there is any relationship between the median income and median property prices for each planning area. In addition, we will also try to find out if there are any planning areas that are different from the others.

1.2. Data and Design Challenges

The first challenge is with regard to the availability of dataset. We were not able to find datasets on household income by planning areas. However we were able to find working persons aged 15 years and over by planning area and gross monthly income from work, 2015. Please refer to the following link: https://data.gov.sg/dataset/resident-working-persons-aged-15-years-over-by-planning-area-gross-monthly-income-from-work-2015?resource_id=e4c209d3-4a07-426a-baeb-a7026d09241c. We will use this dataset to derive the median income, as a proxy of how affluent each planning area is. We will also need to perform substantial data transformation to attain the median income for each planning area.

We will incorporate the Real Estate Information System (REALIS) dataset for all private property transactions in 2018. The REALIS dataset contains a wide range of information on the Singapore property market, and it is only available from Urban Redevelopment Authority (URA) on subscription.

We will also incorporate the HDB (public housing) resale dataset, and only filter for transactions in 2018. Please refer to the following link: https://data.gov.sg/dataset/resale-flat-prices?resource_id=42ff9cfe-abe5-4b54-beda-c88f9bb438ee.We will combine with with the REALIS dataset to compute the average property prices for each planning area.

We noted that the period of the income dataset is for 2015, and the period of the Realis dataset is for 2018, however we will proceed to study these 2 datasets together as they are the closest that we can find for our use case.

For the planning area boundary of Singapore, we will download the shape file from the following data.gov.sg link: https://data.gov.sg/dataset/master-plan-2014-planning-area-boundary-web?resource_id=2ab23cb2-b1a4-4b1a-a9e1-b9cad0ac159b. The file was last updated on 5 December 2014.

1.3. Proposed Sketch Design

Please see the proposed sketch design. We will create an interactive map, and visualise the median property prices by the shading of each polygon. We will visualise the median income of each planning area by the shading and size of each bubble. We will also have a tooltip which will display additional information about each polygon.

We will also create the same map a second time, but replace median property income with median price per square foot (PSF).


2. Step-by-step Preparation

2.1. Install and Load R Packages

  • sf is a package to encode spatial vector data.
  • tmap is used build thematic map with interaction.
  • tidyverse is a set of packages to perform data wrangling and exploration.
  • readxl is used to read excel files into R.
  • knitr is used to enable integration of certain R codes into HTML.
  • ggplot2 is a data visualisation package for visualisation and statistical programming language R.
  • scales is used to provide methods to determine breaks and labels for axes and legends.

2.2. Load the Data

First we will load the planning area boundary shape file.

## Reading layer `MP14_PLNG_AREA_WEB_PL' from data source `C:\Users\User\Desktop\SMU Term 2\4. Visual G2\A5. Assignment 5 20200807 (Submitted)\ISSS608-Assignment5-KhooWenYong-v3\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 55 features and 12 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS:  SVY21
## Simple feature collection with 55 features and 12 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS:  SVY21
## First 10 features:
##    OBJECTID              PLN_AREA_N PLN_AREA_C CA_IND       REGION_N REGION_C
## 1         1                  BISHAN         BS      N CENTRAL REGION       CR
## 2         2             BUKIT BATOK         BK      N    WEST REGION       WR
## 3         3             BUKIT MERAH         BM      N CENTRAL REGION       CR
## 4         4           BUKIT PANJANG         BP      N    WEST REGION       WR
## 5         5             BUKIT TIMAH         BT      N CENTRAL REGION       CR
## 6         6 CENTRAL WATER CATCHMENT         CC      N   NORTH REGION       NR
## 7         7                  CHANGI         CH      N    EAST REGION       ER
## 8         8              CHANGI BAY         CB      N    EAST REGION       ER
## 9         9           CHOA CHU KANG         CK      N    WEST REGION       WR
## 10       10                CLEMENTI         CL      N    WEST REGION       WR
##             INC_CRC FMEL_UPD_D   X_ADDR   Y_ADDR SHAPE_Leng SHAPE_Area
## 1  BA616285F402846F 2014-12-05 28789.76 37450.89   13517.12    7618921
## 2  FB44C870B04B7F57 2014-12-05 19255.42 37527.65   15234.22   11133256
## 3  738B479882E4EE28 2014-12-05 26865.78 28662.87   29156.29   14462472
## 4  4A9C6E6BAF7BE998 2014-12-05 21287.04 38761.84   15891.85    9019940
## 5  C893AEAD20F42559 2014-12-05 23256.76 34689.00   22492.84   17526654
## 6  52D0068508B0348A 2014-12-05 24424.42 39849.05   30538.25   37147854
## 7  70EAFFC7C76EA5AD 2014-12-05 46307.48 36991.43   31342.12   40940490
## 8  ACFD43D88C72381C 2014-12-05 49502.49 34316.54   18731.56    1829822
## 9  F361929FAF5E9611 2014-12-05 18415.05 40833.42   11518.81    6117294
## 10 8A5BEDB748EAA0AA 2014-12-05 19923.36 33319.17   15024.87    9516228
##                          geometry
## 1  MULTIPOLYGON (((29772.19 38...
## 2  MULTIPOLYGON (((20294.46 39...
## 3  MULTIPOLYGON (((26228.63 30...
## 4  MULTIPOLYGON (((21448.72 41...
## 5  MULTIPOLYGON (((24031.39 36...
## 6  MULTIPOLYGON (((25073.29 43...
## 7  MULTIPOLYGON (((44586.3 417...
## 8  MULTIPOLYGON (((48860.11 34...
## 9  MULTIPOLYGON (((18349.06 43...
## 10 MULTIPOLYGON (((19680.06 31...

We will perform a quick plot to ensure that the map works well.


Next we will load the income dataset.

## Parsed with column specification:
## cols(
##   year = col_double(),
##   level_1 = col_character(),
##   level_2 = col_character(),
##   level_3 = col_character(),
##   value = col_double()
## )

We will select 5 random rows to inspect the dataset.

year level_1 level_2 level_3 value
2015 $3,000 - $3,999 Total Bukit Timah 2.1
2015 $8,000 - $8,999 Total Others 0.9
2015 $2,500 - $2,999 Total Punggol 3.5
2015 $3,000 - $3,999 Total Yishun 18.0
2015 Below $1,000 Total Jurong East 4.3

We will load the REALIS dataset.

## Parsed with column specification:
## cols(
##   `Project Name` = col_character(),
##   Address = col_character(),
##   `No. of Units` = col_double(),
##   `Area (sqm)` = col_double(),
##   `Type of Area` = col_character(),
##   `Transacted Price ($)` = col_double(),
##   `Unit Price ($ psm)` = col_double(),
##   `Unit Price ($ psf)` = col_double(),
##   `Sale Date` = col_character(),
##   `Property Type` = col_character(),
##   Tenure = col_character(),
##   `Type of Sale` = col_character(),
##   `Purchaser Address Indicator` = col_character(),
##   `Postal District` = col_double(),
##   `Postal Sector` = col_double(),
##   `Postal Code` = col_double(),
##   `Planning Region` = col_character(),
##   `Planning Area` = col_character()
## )

We will select 5 random rows and certain specific columns to inspect the dataset.

Project Name Address Unit Price Unit PSF Planning Area
TANGLIN REGENCY 381 Tanglin Road #09-03 1218000 15418 Bukit Merah
MOUNTBATTEN SUITES 861 Mountbatten Road #05-04 1023000 15500 Marine Parade
RIVERCOVE RESIDENCES 32 Anchorvale Lane #06-24 1076200 9965 Sengkang
TIVOLI GRANDE 128 Koon Seng Road #02-17 1002800 13551 Marine Parade
KENT RIDGE HILL RESIDENCES 62 South Buona Vista Road #04-37 1121000 18683 Queenstown

Lastly we will load the HDB resales prices dataset.

## Parsed with column specification:
## cols(
##   month = col_character(),
##   town = col_character(),
##   flat_type = col_character(),
##   block = col_character(),
##   street_name = col_character(),
##   storey_range = col_character(),
##   floor_area_sqm = col_double(),
##   flat_model = col_character(),
##   lease_commence_date = col_double(),
##   remaining_lease = col_character(),
##   resale_price = col_double()
## )

We will select 5 random rows and certain specific columns to inspect the dataset.

month town flat_type floor_area_sqm resale_price
2020-06 JURONG WEST 3 ROOM 68 310000
2017-08 SEMBAWANG 5 ROOM 123 462000
2020-06 BEDOK 4 ROOM 92 342000
2017-05 ANG MO KIO 3 ROOM 75 362000
2020-06 SENGKANG 4 ROOM 90 367000

2.3. Data Wrangling

2.3.1. Preparing the dataset to compute median income by planning area

We will first work on the income dataset. The end goal is to attain the median income for each planning area. Due to the nature of the raw dataset, we will only be able to estimate the median income, and not the mean income, for each planning area. As such, we will need to perform some transformation to first compute the cumulative number of resident at the 50th percentile.

We will select 5 random rows and certain specific columns to inspect the dataset at this stage. Please note that all columns from $2,000 to $10,999 are omitted for ease of viewing.

planning area Below $1,000 $1,000 - $1,499 $1,500 - $1,999 $11,000 - $11,999 $12,000 & Over cum_median_pop
SEMBAWANG 3.2 3.6 3.0 0.7 2.3 23.60
CHOA CHU KANG 7.3 7.8 6.3 1.3 6.5 48.70
JURONG WEST 13.4 13.8 11.6 1.8 7.7 79.25
CLEMENTI 3.8 3.6 2.4 1.0 6.1 23.10
OTHERS 1.1 1.6 1.0 0.5 5.9 10.75

We have computed the cumulative median population amount for each planning area. For the next step, we will copy this table to excel and perform a manual transformation. This is becuase the following steps to attain the median income for each planning area requires some judgement.

Firstly we will use excel to compute the cumulative number of resident at each income bracket, under column U to AI.



Next we will use the cumulative median income population to derive the median income bracket under column S. For example, the cumulative median income population for Ang Mo Kio is 50.65, hence the median income bracket fall into the bracket of $3,000 to $3,999. This is becuase the cumulative income population for the previous bracket is only up to 43.9.



We will estimate the median income of each planning area by using the mid point of each bracket under column T. For example, Ang Mo Kio is in the bracket of $3,000 to $3,999, hence the median income of Ang Mo Kio will be estimated to be $3,500. There is one planning area, Tanglin, whose median income falls into the bracket of $12,000 and over. For this case, we will assume that the median income is $12,000, however the actual value is not determinable, and will be higher than $12,000.

We will read the median income dataset prepared in excel back into R.

We will only retain the required columns in the dataframe, and select 5 random rows for checking.

planning area Median Income
SERANGOON 4500
ANG MO KIO 3500
GEYLANG 3500
SENGKANG 4500
TANGLIN 12000

2.3.2. Preparing the Housing Prices dataset

As the REALIS and HDB dataset is currently by individual transactions, we will combined the 2 dataset and perform an aggregate to compute the median unit price and unit price per square foot (PSF) for each planning area. We will use the median in this case as we do not want our metric to be affected by extreme data points (e.g. extremely high prices and PSF). We will also convert the planning area values to uppercase to perform joins with the shape file later. We will select 5 random rows for checking.

First we will prepare the HDB dataset.

Next we will prepare the REALIS dataset.

Now we will append both datasets, and perform an aggregation to calculate the median price and PSM for each planning area.

Planning Area Median Housing Price Median Housing PSF No of Transactions
2 BEDOK 950000 954.0000 2979
30 ROCHOR 1350000 1620.5000 132
18 KALLANG/WHAMPOA 446250 481.5261 626
40 WOODLANDS 371000 327.8934 1937
20 MARINE PARADE 1650000 1505.0000 636

2.4. Exploring the Dataset

First we will plot a horizontal bar chart of the median income of all planning areas to understand the dataset.


We notice that a substantial amount, 14 of the planning areas have median income of $3.5k, and the remaining generally fall into between $4.5k to $5.5k. There are only 2 exceptional planning areas, Bukit Timah and Tanglin that have significantly higher median income of $9.5k and $12k respectively. From this graph, we can understand that while there is only a slight distribution of median income across the planning areas, apart from Bukit Timah and Tanglin.

Next we will plot a horizontal bar chart to better understand the property prices.


We notice that unlike the median income, there is a spread of median property prices across the planning areas. The highest 2 areas, Orchard and Southern Islands, have median prices of around $3.7m. On the other hand, some of the heartland areas like Yishun and Woodlands have median prices of around $370k. When we perform the map view later, we will get a better understanding on how this translate to the geographical distribution of median prices across Singapore.

We will also take a look at the prices PSF by planning area.


When we consider the price per square foot (PSF), there is also quite a spread across the planning areas. However we notice some slight difference in ranking. For example, areas like River Valley and Orchard take the top spot at around $2.6k PSF. Southern Islands dropped quite a bit to the 12th position at $1.6k PSF. This suggests that houses in Southern Islands, while still relative expensive (by PSF), are generally larger in area by square foot.

Unsurpringly, the areas with lower PSF still belonged to the heartland areas that are far from the central region in Singapore.

2.5. Preparing the Map View

Firstly we will perform the various left_join from the income and housing prices dataset to the map shape file.

We noted that as there are lesser median income data for the planning areas as compared to the housing median prices, we will need to perform an order to arrange the non-Null values at the top of the dataframe. If we omit this step, tmap will display the median income bubbles in the wrong planning area.

Now we will create an interactive map view of median housing price and the median income of each planning area using tmap. We will edit the popup to indicate more information for each bubble and planning area. We will also set the style to quantile as it will provide a breakdown of 5 quantiles for more intuitive viewing.

## tmap mode set to interactive viewing
## Legend for symbol sizes not available in view mode.

The other metric that is commonly used to compare houses is the price per square foot (PSF). As such, we will also plot a map view of the median housing PSF and the median income of each planning area. We will edit the popup to indicate more information for each bubble and planning area.

## tmap mode set to interactive viewing
## Legend for symbol sizes not available in view mode.

3. Final Data Visualisation

In this report, we will study the map view of the median income in 2015, against the median housing prices and PSF in 2018, across the various planning areas. The difference in period is due to the limitation of available dataset, nonethless it can provide good insights. Please note that they are no median income and property prices for certain planning areas due to the limitation of the dataset.

Interactive Map View of Median Housing Prices (2018) and Median Income (2015) by Planning Area
Note 1: Click on planning area to view more information.
Note 2: The actual median income for Tanglin is higher than $12k, however it is not determinable and we will assume it to be $12k for this visualisation.

We first notice that there are central areas, such as Tanglin, Orchard, Newton, River Valley, Downtown Core and Southern Islands at the top quantile of the median housing price. This is expected as they are the prime land in Singapore’s context. In the case of Tanglin, we notice that the median income ($12k) is also in the highest category, which does correspond with the high property prices.

Besides Bukit Timah and Tanglin, the other planning areas generally have a mix of median income from $3.5k to $5.5k regardless of the area.

Another interesting point is that while Punggol is in the lowest housing quantile with median property price of $455k, the residents have slightly higher median income of $5.5. This may suggest that Punggol residents are slighlty more affluent relative to the median housing price in that area.

We also noted that Sungei Kadut and Changi have unusually high median price, but both of them also have an unusually low number of transaction at 5 transactions for the whole of 2018.

Interactive Map View of Median Housing PSF (2018) and Median Income (2015) by Planning Area
Note 1: Click on planning area to view more information.
Note 2: The actual median income for Tanglin is higher than $12k, however it is not determinable and we will assume it to be $12k for this visualisation.
When we further study the map view by PSF, we noted that the higher PSF housing are indeed in the central area. The previous observation on Sungei Kadut and Changi having high property prices might simply be due to a few transactions of large houses in these 2 areas.

Similarly, we notice that some planning areas (i.e. Bishan, Novena, Marine Parade) with slightly higher median income are generally closer the central region, which is in line with the higher property PSF, with the exception of Punggol.

Conclusion and Future Works
Besides Bukit Timah and Tanglin, it is not very conclusive that more affluent residents are staying in the central region, where property prices and PSF are higher. This might be due to the limitation of the available dataset that currently only shows individual workings persons per planning area. For future works, data analysts may use these codes to plot resident household income (if available) from various sources (e.g. full time employment, business) by planning area, which will better present how households income are correlated to property prices.

4. References