1. Major Data & Design Challenges

The transaction details of Singapore private residential property, Jan 1995 to Jun 2020 data series provided by the Urban Redevelopment Authority (URA) could be downloaded from the Real Estate Information System (REALIS) via SMU Library (see Figure 1).

Figure 1: REALIS website

At each time, only 10,000 rows of transaction data could be downloaded. The data is downloaded as 1,000 rows of transaction data per .csv file (see Figure 2).

Figure 2: Multiple .csv Files with up to 1,000 Rows

The other dataset to use would be from the ‘geospatial’ folder provided during Lesson 10 of this course.

The challenges with working with the downloaded data and visualizing it are described in the following sub-sections.

1.1. Need to Merge Separate .csv files

There is a need to merge the separate .csv files, downloaded from REALIS, each containing up to 1,000 rows of transaction data. I used Python to concatenate all the .csv files together into a single .csv file. See Figure 3 for the final merged .csv file. This file contains the transaction data of Singapore private residential property from Jan 1995 to Jun 2020. As this dataset is already in the ‘long’ format, there is no need to transform it from aggregated format. However, more work is needed to transform this ‘long’ format to suitable formats to visualize the data in a map. More details on preparing the data can be found in Section 2.

Figure 3: Final Merged File

1.2. Need to Change Date Format

However, the dataset contained the ‘Sale Date’ in different date formats. As we would like to visualize the data by year, we could transform all of the dates to ‘yyyy’ format in R. More details can be found in Section 2.

1.3. Need to Merge .csv file and Map File

To visualize data on a map, we will need to transform the ‘finalREALIS.csv’ file into dataframes containing required data and merge it with the map dataset from Lesson 10’s geospatial folder. By examining the files, we see that the ‘Planning Area’ column from the ‘finalREALIS.csv’ file is common with the ‘PLN_AREA_N’ column from the map dataset, so there should be no problem joining the datasets. More details on the transformation of the .csv file into suitable dataframes and joining to the map dataset are given in Section 2.3.

1.4. Proposed Design

1.4.1. Overview

There would be an overview visualization to enable the reader to appreciate the total transaction volume and the mean transaction value since Jan 1995 till Jun 2020 by planning regions. Since the names of the planning regions are long, we could display them on the y-axis so that the axis would not be too cluttered. A bar chart sorted in descending counts of total transactions and mean transaction values would be simple for the reader to appreciate how the planning regions compare with one another. See Figure 4 below for the design of the interactive bar charts.

Figure 4: Sketch of Proposed Interactive Overview Bar Charts

1.4.2. Details of Transactions in Latest Year (2019)

Since there is no transaction data for the whole of Year 2020, we will use the transaction data for Year 2019 for visualization. Within 2019, users may be interested to understand how the different planning areas compare with one another in terms of total transaction volume and total transaction value. By visualizing these on the map of Singapore, with appropriate colour legend, the user would find it intuitive to have a quick overview of the comparison. See Figure 5 for the proposed design of the static maps.

Figure 5: Sketch of Proposed Static Maps

If the user is interested, the interactive maps proposed below would enable him/her to mouse over the various parts of the map to display the planning areas and clicking on each area will display its total transaction volume and total transaction value for Year 2019. See Figure 6 for the proposed design of the interactive maps.

Figure 6: Sketch of Interactive Maps

2. Step-by-Step Description on Data Visualization Preparation

2.1. Installation and Loading of Required Libraries

sf, tmap, tidyverse, plotly, gapminder, ggplot2 and htmlwidgets packages were loaded using the following lines:

#installing and loading the required libraries
packages = c('sf', 'tmap', 'tidyverse', 'plotly', 'gapminder', 'ggplot2', 'htmlwidgets')
for (p in packages){
  if(!require(p,character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

2.2. Reading in Datasets

The datasets ‘FinalRealis.csv’ and geospatial dataset were read and imported to R using the following lines:

#importing private residential transaction data from FinalRealis.csv file
transactions = read_csv("data/FinalRealis.csv")

#importing map data from geospatial folder
mpsz = st_read(dsn = "data/geospatial", layer = "MP14_SUBZONE_WEB_PL")

## Reading layer `MP14_SUBZONE_WEB_PL' from data source `C:\SMU MITB\Semester 3\2. ISSS608 Visual Analytics & Applications\Assignments\Assg5\Data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 323 features and 15 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS:  SVY21

2.3. Transform Date Format

The format of Sale Date was changed from “dd-mmm-yyyy” and “mmm-yyy” to “yyyy” format using the multidate function. The dataframe with transformed Sale Date format is passed to ‘trans1’.

multidate <- function(data, formats){
    a<-list()
    for(i in 1:length(formats)){
        a[[i]]<- as.Date(data,format=formats[i])
        a[[1]][!is.na(a[[i]])]<-a[[i]][!is.na(a[[i]])]
        }
    a[[1]]
    }

trans1 = transactions %>%
  mutate(`Sale Date` = format(multidate(`Sale Date`, format = c('%d-%b-%Y', '%b-%Y')), "%Y"))

The dataframe ‘trans1’ can now be used to plot the interactive bar graphs for the overview visualization.

2.4. Preparation of Overview Chart

2.4.1 Preparation of Total Transaction Volume Interactive Bar Chart

We first set the levels in the order that we want. Specifically, we want to sort the planning regions in decreasing counts of transactions.

library(scales)

## set the levels in the order that we want
transSorted <- within(trans1, 
                   `Planning Region` <- factor(`Planning Region`, 
                                      levels=names(sort(table(`Planning Region`), 
                                                        decreasing=FALSE))))

ggplot function is called with data argument as ‘transSorted’. Then the aesthetic mapping ‘x= Planning Region’ is called.

## set the levels in the order that we want
transSorted <- within(trans1, 
                   `Planning Region` <- factor(`Planning Region`, 
                                      levels=names(sort(table(`Planning Region`), 
                                                        decreasing=FALSE))))
transSorted %>%
  ggplot(aes(x= `Planning Region`))

Next, the geom object ‘geom_bar’ is called in order to display a vertical bar chart of count of transactions for various planning regions. Since we wanted the planning regions to be on the y-axis, we added a function ‘coord_flip’ to switch the vertical bar chart to a horizontal bar chart.

## set the levels in the order that we want
transSorted <- within(trans1, 
                   `Planning Region` <- factor(`Planning Region`, 
                                      levels=names(sort(table(`Planning Region`), 
                                                        decreasing=FALSE))))
transSorted %>%
  ggplot(aes(x= `Planning Region`)) + 
  geom_bar() +
  coord_flip()

The above chart has horizontal gridlines which is unnecessary data ink. To remove it, yet retain the vertical major and minor gridlines, we add on the theme() function specifying our design for the vertical major and minor gridlines.

## set the levels in the order that we want
transSorted <- within(trans1, 
                   `Planning Region` <- factor(`Planning Region`, 
                                      levels=names(sort(table(`Planning Region`), 
                                                        decreasing=FALSE))))
transSorted %>%
  ggplot(aes(x= `Planning Region`)) + 
  geom_bar() +
  coord_flip() +
  theme(panel.grid.major.y = element_blank(),   panel.grid.minor.y = element_blank())

We want to change the count axis to a more readable number format, with thousands separated by a comma. We do this by importing a library called ‘scales’, and specifying ‘labels = comma’.

library(scales)

## set the levels in the order that we want
transSorted <- within(trans1, 
                   `Planning Region` <- factor(`Planning Region`, 
                                      levels=names(sort(table(`Planning Region`), 
                                                        decreasing=FALSE))))
transSorted %>%
  ggplot(aes(x= `Planning Region`)) + 
  geom_bar() +
  coord_flip() +
  theme(panel.grid.major.y = element_blank(),   panel.grid.minor.y = element_blank()) +
  scale_y_continuous(labels = comma)

We also want to relabel the x-axis and add in the title for the graph. Hence, labs() functions were added.

library(scales)

## set the levels in the order that we want
transSorted <- within(trans1, 
                   `Planning Region` <- factor(`Planning Region`, 
                                      levels=names(sort(table(`Planning Region`), 
                                                        decreasing=FALSE))))
transSorted %>%
  ggplot(aes(x= `Planning Region`)) + 
  geom_bar() +
  coord_flip() +
  theme(panel.grid.major.y = element_blank(),   panel.grid.minor.y = element_blank()) +
  scale_y_continuous(labels = comma) +
  labs(y = "Total Transaction Volume") +
  labs(title = "Total Transaction Volumes, Jan 1995 - Jun 2020, by Planning Region")

Finally, to make the bar chart an interactive graph, we used ggplotly.

library(scales)

## set the levels in the order that we want
transSorted <- within(trans1, 
                   `Planning Region` <- factor(`Planning Region`, 
                                      levels=names(sort(table(`Planning Region`), 
                                                        decreasing=FALSE))))
ts <- transSorted %>%
  ggplot(aes(x= `Planning Region`)) + 
  geom_bar() +
  coord_flip() +
  theme(panel.grid.major.y = element_blank(),   panel.grid.minor.y = element_blank()) +
  scale_y_continuous(labels = comma) +
  labs(y = "Total Transaction Volume") +
  labs(title = "Total Transaction Volumes, Jan 1995 - Jun 2020, by Planning Region")

ggplotly(ts)

2.4.2 Preparation of Mean Transaction Value Interactive Bar Chart

Similar to the preparation of Total Transaction Volume Interactive Bar Chart, we first call the ggplot function with data argument as ‘trans1’. Then the aesthetic mapping “x = reorder(Planning Region, Transacted Price ($)), y = Transacted Price ($)” is called. This reordering is to ensure that the planning regions were sorted according to descending values of transacted price.

trans1 %>%
  ggplot(aes(x = reorder(`Planning Region`, `Transacted Price ($)`), y = `Transacted Price ($)`))

Next, stat_summary() function is called, with geom = “bar” to display a bar chart of mean transacted prices for each planning region.

trans1 %>%
  ggplot(aes(x = reorder(`Planning Region`, `Transacted Price ($)`), y = `Transacted Price ($)`)) +
  stat_summary(geom = "bar", fun = "mean")

Since we wanted the planning regions to be on the y-axis, we added a function ‘coord_flip’ to switch the vertical bar chart to a horizontal bar chart. The above chart has horizontal gridlines which is unnecessary data ink. To remove it, yet retain the vertical major and minor gridlines, we add on the theme function specifying our design for the vertical major and minor gridlines.

trans1 %>%
  ggplot(aes(x = reorder(`Planning Region`, `Transacted Price ($)`), y = `Transacted Price ($)`)) +
  stat_summary(geom = "bar", fun = "mean") +
  coord_flip() +
  theme(panel.grid.major.y = element_blank(),   panel.grid.minor.y = element_blank())

We want to change the count axis to a more readable number format, with thousands separated by a comma. We do this by importing a library called ‘scales’, and specifying ‘labels = comma’.

library(scales)

trans1 %>%
  ggplot(aes(x = reorder(`Planning Region`, `Transacted Price ($)`), y = `Transacted Price ($)`)) +
  stat_summary(geom = "bar", fun = "mean") +
  theme(panel.grid.major.y = element_blank(),   panel.grid.minor.y = element_blank()) +
  coord_flip() +
  scale_y_continuous(labels = comma)

We also want to relabel the x- and y-axes and add in the title for the graph. Hence, labs() functions were added.

library(scales)

trans1 %>%
  ggplot(aes(x = reorder(`Planning Region`, `Transacted Price ($)`), y = `Transacted Price ($)`)) +
  stat_summary(geom = "bar", fun = "mean") +
  theme(panel.grid.major.y = element_blank(),   panel.grid.minor.y = element_blank()) +
  coord_flip() +
  scale_y_continuous(labels = comma) +
  labs(x = "Planning Region", y = "Total Transaction Value (S$)") +
  labs(title = "Mean Transaction Values (S$), Jan 1995 - Jun 2020, by Planning Region")

Finally, to make the bar chart an interactive graph, we could use ggplotly.

library(scales)

ts2 <- trans1 %>%
  ggplot(aes(x = reorder(`Planning Region`, `Transacted Price ($)`), y = `Transacted Price ($)`)) +
  stat_summary(geom = "bar", fun = "mean") +
  theme(panel.grid.major.y = element_blank(),   panel.grid.minor.y = element_blank()) +
  coord_flip() +
  scale_y_continuous(labels = comma) +
  labs(x = "Planning Region", y = "Total Transaction Value (S$)") +
  labs(title = "Mean Transaction Values (S$), Jan 1995 - Jun 2020, by Planning Region")

ggplotly(ts2)

2.5. Preparation for 2019 Transaction Dataset and Joining to Map Dataset

2.5.1 Preparation of Dataframe for 2019 Total Transaction Volume

Since we only want the data from Year 2019, we first apply a filter to the dataframe ‘trans1’. Also, since the values of ‘PLN_AREA_N’ in the map dataset is in uppercase, we convert the strings in ‘trans1’ column ‘Planning Area’ into uppercase. Next, since we do not need other details except Planning Region, Planning Area, Type of Area, we use the function select() to select these columns only. Finally, we use a combination of group_by() function and tally() function to count the number of unique groups of planning region, planning area and type of area (strata or land). The tally() function will create a column ‘n’ for the counts of unique groupings we had specified under group_by().

trans1 %>%
  filter(`Sale Date` == 2019) %>%
  mutate_at(.vars = vars(`Planning Area`), toupper) %>%
  select(`Planning Region`, `Planning Area`, `Type of Area`) %>%
  group_by(`Planning Region`, `Planning Area`, `Type of Area`) %>%
  tally()

## # A tibble: 68 x 4
## # Groups:   Planning Region, Planning Area [38]
##    `Planning Region` `Planning Area` `Type of Area`     n
##    <chr>             <chr>           <chr>          <int>
##  1 Central Region    BISHAN          Land              54
##  2 Central Region    BISHAN          Strata           388
##  3 Central Region    BUKIT MERAH     Land               3
##  4 Central Region    BUKIT MERAH     Strata          1091
##  5 Central Region    BUKIT TIMAH     Land             137
##  6 Central Region    BUKIT TIMAH     Strata           896
##  7 Central Region    DOWNTOWN CORE   Strata           382
##  8 Central Region    GEYLANG         Land              36
##  9 Central Region    GEYLANG         Strata          1110
## 10 Central Region    KALLANG         Land               6
## # ... with 58 more rows

Next, we use spread() function to transform the rows of “Type of Area” and values “n” into new columns “Land” and “Strata”.

transVol = trans1 %>%
  filter(`Sale Date` == 2019) %>%
  mutate_at(.vars = vars(`Planning Area`), toupper) %>%
  select(`Planning Region`, `Planning Area`, `Type of Area`) %>%
  group_by(`Planning Region`, `Planning Area`, `Type of Area`) %>%
  tally()


transVol = transVol %>%
  spread(`Type of Area`, n)

transVol

## # A tibble: 38 x 4
## # Groups:   Planning Region, Planning Area [38]
##    `Planning Region` `Planning Area`  Land Strata
##    <chr>             <chr>           <int>  <int>
##  1 Central Region    BISHAN             54    388
##  2 Central Region    BUKIT MERAH         3   1091
##  3 Central Region    BUKIT TIMAH       137    896
##  4 Central Region    DOWNTOWN CORE      NA    382
##  5 Central Region    GEYLANG            36   1110
##  6 Central Region    KALLANG             6    422
##  7 Central Region    MARINE PARADE      40    561
##  8 Central Region    MUSEUM             NA     45
##  9 Central Region    NEWTON              4    213
## 10 Central Region    NOVENA             50    525
## # ... with 28 more rows

Since we observed values ‘NA’ in the above dataframe, we replace them by value ‘0’ using the is.na() function.

transVol = trans1 %>%
  filter(`Sale Date` == 2019) %>%
  mutate_at(.vars = vars(`Planning Area`), toupper) %>%
  select(`Planning Region`, `Planning Area`, `Type of Area`) %>%
  group_by(`Planning Region`, `Planning Area`, `Type of Area`) %>%
  tally()

transVol = transVol %>%
  spread(`Type of Area`, n)

transVol[is.na(transVol)] <- 0

transVol

## # A tibble: 38 x 4
## # Groups:   Planning Region, Planning Area [38]
##    `Planning Region` `Planning Area`  Land Strata
##    <chr>             <chr>           <int>  <int>
##  1 Central Region    BISHAN             54    388
##  2 Central Region    BUKIT MERAH         3   1091
##  3 Central Region    BUKIT TIMAH       137    896
##  4 Central Region    DOWNTOWN CORE       0    382
##  5 Central Region    GEYLANG            36   1110
##  6 Central Region    KALLANG             6    422
##  7 Central Region    MARINE PARADE      40    561
##  8 Central Region    MUSEUM              0     45
##  9 Central Region    NEWTON              4    213
## 10 Central Region    NOVENA             50    525
## # ... with 28 more rows

Finally, we would like to have a column summing up the total counts of transaction for each planning area. Hence, we create a new column ‘TOTAL TRANSACTION VOLUME’ using the function mutate().

transVol = trans1 %>%
  filter(`Sale Date` == 2019) %>%
  mutate_at(.vars = vars(`Planning Area`), toupper) %>%
  select(`Planning Region`, `Planning Area`, `Type of Area`) %>%
  group_by(`Planning Region`, `Planning Area`, `Type of Area`) %>%
  tally()

transVol = transVol %>%
  spread(`Type of Area`, n)

transVol[is.na(transVol)] <- 0

transVol = transVol %>%
  mutate(`TOTAL TRANSACTION VOLUME` = `Land` + `Strata`)

transVol

## # A tibble: 38 x 5
## # Groups:   Planning Region, Planning Area [38]
##    `Planning Region` `Planning Area`  Land Strata `TOTAL TRANSACTION VOLUME`
##    <chr>             <chr>           <int>  <int>                      <int>
##  1 Central Region    BISHAN             54    388                        442
##  2 Central Region    BUKIT MERAH         3   1091                       1094
##  3 Central Region    BUKIT TIMAH       137    896                       1033
##  4 Central Region    DOWNTOWN CORE       0    382                        382
##  5 Central Region    GEYLANG            36   1110                       1146
##  6 Central Region    KALLANG             6    422                        428
##  7 Central Region    MARINE PARADE      40    561                        601
##  8 Central Region    MUSEUM              0     45                         45
##  9 Central Region    NEWTON              4    213                        217
## 10 Central Region    NOVENA             50    525                        575
## # ... with 28 more rows

2.5.2 Joining 2019 Total Transaction Volume Dataframe to Map Data

The prepared ‘transVol’ dataframe is left joined to the map data using their common values in “PLN_AREA_N” and “Planning Area” to create a new dataframe ‘mpsz_2019’.

mpsz_2019 <- left_join(mpsz, transVol, by = c("PLN_AREA_N" = "Planning Area"))

mpsz_2019

## Simple feature collection with 323 features and 19 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS:  SVY21
## First 10 features:
##    OBJECTID SUBZONE_NO       SUBZONE_N SUBZONE_C CA_IND      PLN_AREA_N
## 1         1          1    MARINA SOUTH    MSSZ01      Y    MARINA SOUTH
## 2         2          1    PEARL'S HILL    OTSZ01      Y          OUTRAM
## 3         3          3       BOAT QUAY    SRSZ03      Y SINGAPORE RIVER
## 4         4          8  HENDERSON HILL    BMSZ08      N     BUKIT MERAH
## 5         5          3         REDHILL    BMSZ03      N     BUKIT MERAH
## 6         6          7  ALEXANDRA HILL    BMSZ07      N     BUKIT MERAH
## 7         7          9   BUKIT HO SWEE    BMSZ09      N     BUKIT MERAH
## 8         8          2     CLARKE QUAY    SRSZ02      Y SINGAPORE RIVER
## 9         9         13 PASIR PANJANG 1    QTSZ13      N      QUEENSTOWN
## 10       10          7       QUEENSWAY    QTSZ07      N      QUEENSTOWN
##    PLN_AREA_C       REGION_N REGION_C          INC_CRC FMEL_UPD_D   X_ADDR
## 1          MS CENTRAL REGION       CR 5ED7EB253F99252E 2014-12-05 31595.84
## 2          OT CENTRAL REGION       CR 8C7149B9EB32EEFC 2014-12-05 28679.06
## 3          SR CENTRAL REGION       CR C35FEFF02B13E0E5 2014-12-05 29654.96
## 4          BM CENTRAL REGION       CR 3775D82C5DDBEFBD 2014-12-05 26782.83
## 5          BM CENTRAL REGION       CR 85D9ABEF0A40678F 2014-12-05 26201.96
## 6          BM CENTRAL REGION       CR 9D286521EF5E3B59 2014-12-05 25358.82
## 7          BM CENTRAL REGION       CR 7839A8577144EFE2 2014-12-05 27680.06
## 8          SR CENTRAL REGION       CR 48661DC0FBA09F7A 2014-12-05 29253.21
## 9          QT CENTRAL REGION       CR 1F721290C421BFAB 2014-12-05 22077.34
## 10         QT CENTRAL REGION       CR 3580D2AFFBEE914C 2014-12-05 24168.31
##      Y_ADDR SHAPE_Leng SHAPE_Area Planning Region Land Strata
## 1  29220.19   5267.381  1630379.3            <NA>   NA     NA
## 2  29782.05   3506.107   559816.2  Central Region    0    272
## 3  29974.66   1740.926   160807.5  Central Region    0    149
## 4  29933.77   3313.625   595428.9  Central Region    3   1091
## 5  30005.70   2825.594   387429.4  Central Region    3   1091
## 6  29991.38   4428.913  1030378.8  Central Region    3   1091
## 7  30230.86   3275.312   551732.0  Central Region    3   1091
## 8  30222.86   2208.619   290184.7  Central Region    0    149
## 9  29893.78   6571.323  1084792.3  Central Region   12   1032
## 10 30104.18   3454.239   631644.3  Central Region   12   1032
##    TOTAL TRANSACTION VOLUME                       geometry
## 1                        NA MULTIPOLYGON (((31495.56 30...
## 2                       272 MULTIPOLYGON (((29092.28 30...
## 3                       149 MULTIPOLYGON (((29932.33 29...
## 4                      1094 MULTIPOLYGON (((27131.28 30...
## 5                      1094 MULTIPOLYGON (((26451.03 30...
## 6                      1094 MULTIPOLYGON (((25899.7 297...
## 7                      1094 MULTIPOLYGON (((27746.95 30...
## 8                       149 MULTIPOLYGON (((29351.26 29...
## 9                      1044 MULTIPOLYGON (((20996.49 30...
## 10                     1044 MULTIPOLYGON (((24472.11 29...

2.5.3 Preparation of Dataframe for 2019 Total Transacted Value

Since we only want the data from Year 2019, we first apply a filter to the dataframe ‘trans1’. Also, since the values of ‘PLN_AREA_N’ in the map dataset is in uppercase, we convert the strings in column ‘Planning Area’ into uppercase. Next, since we do not need other details except Planning Region, Planning Area, Transacted Price ($), we use the function select() to select these columns only. Finally, we use a combination of group_by() function and summarise() function to sum up all the transacted prices for unique groups of planning region and planning area. The summarise() function will create a column ‘sum’ for the unique groupings we had specified under group_by(). Finally, since the values under column ‘sum’ are large numbers, we could divide it by a billion and create a new column ‘TOTAL TRANSACTION VALUE ($B)’.

transPrice = trans1 %>%
  filter(`Sale Date` == 2019) %>%
  mutate_at(.vars = vars(`Planning Area`), toupper) %>%
  select(`Planning Region`, `Planning Area`, `Transacted Price ($)`) %>%
  group_by(`Planning Region`, `Planning Area`) %>%
  summarise(sum = sum(`Transacted Price ($)`)) %>%
  mutate(`TOTAL TRANSACTION VALUE ($B)` = sum/1000000000)

transPrice

## # A tibble: 38 x 4
## # Groups:   Planning Region [5]
##    `Planning Region` `Planning Area`        sum `TOTAL TRANSACTION VALUE ($B)`
##    <chr>             <chr>                <dbl>                          <dbl>
##  1 Central Region    BISHAN           838306162                         0.838 
##  2 Central Region    BUKIT MERAH     1935610543                         1.94  
##  3 Central Region    BUKIT TIMAH     2918261595                         2.92  
##  4 Central Region    DOWNTOWN CORE   1197842054                         1.20  
##  5 Central Region    GEYLANG         1516302843                         1.52  
##  6 Central Region    KALLANG          655451609                         0.655 
##  7 Central Region    MARINE PARADE   1419921405                         1.42  
##  8 Central Region    MUSEUM            84885550                         0.0849
##  9 Central Region    NEWTON          1073713186                         1.07  
## 10 Central Region    NOVENA          1385798708                         1.39  
## # ... with 28 more rows

2.5.4 Joining 2019 Total Transaction Value Dataframe to Map Data

The prepared ‘transPrice’ dataframe is left joined to the map data using their common values in “PLN_AREA_N” and “Planning Area” to create a new dataframe ‘mpsz_2019Price’.

mpsz_2019Price <- left_join(mpsz, transPrice, by = c("PLN_AREA_N" = "Planning Area"))

mpsz_2019Price

## Simple feature collection with 323 features and 18 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS:  SVY21
## First 10 features:
##    OBJECTID SUBZONE_NO       SUBZONE_N SUBZONE_C CA_IND      PLN_AREA_N
## 1         1          1    MARINA SOUTH    MSSZ01      Y    MARINA SOUTH
## 2         2          1    PEARL'S HILL    OTSZ01      Y          OUTRAM
## 3         3          3       BOAT QUAY    SRSZ03      Y SINGAPORE RIVER
## 4         4          8  HENDERSON HILL    BMSZ08      N     BUKIT MERAH
## 5         5          3         REDHILL    BMSZ03      N     BUKIT MERAH
## 6         6          7  ALEXANDRA HILL    BMSZ07      N     BUKIT MERAH
## 7         7          9   BUKIT HO SWEE    BMSZ09      N     BUKIT MERAH
## 8         8          2     CLARKE QUAY    SRSZ02      Y SINGAPORE RIVER
## 9         9         13 PASIR PANJANG 1    QTSZ13      N      QUEENSTOWN
## 10       10          7       QUEENSWAY    QTSZ07      N      QUEENSTOWN
##    PLN_AREA_C       REGION_N REGION_C          INC_CRC FMEL_UPD_D   X_ADDR
## 1          MS CENTRAL REGION       CR 5ED7EB253F99252E 2014-12-05 31595.84
## 2          OT CENTRAL REGION       CR 8C7149B9EB32EEFC 2014-12-05 28679.06
## 3          SR CENTRAL REGION       CR C35FEFF02B13E0E5 2014-12-05 29654.96
## 4          BM CENTRAL REGION       CR 3775D82C5DDBEFBD 2014-12-05 26782.83
## 5          BM CENTRAL REGION       CR 85D9ABEF0A40678F 2014-12-05 26201.96
## 6          BM CENTRAL REGION       CR 9D286521EF5E3B59 2014-12-05 25358.82
## 7          BM CENTRAL REGION       CR 7839A8577144EFE2 2014-12-05 27680.06
## 8          SR CENTRAL REGION       CR 48661DC0FBA09F7A 2014-12-05 29253.21
## 9          QT CENTRAL REGION       CR 1F721290C421BFAB 2014-12-05 22077.34
## 10         QT CENTRAL REGION       CR 3580D2AFFBEE914C 2014-12-05 24168.31
##      Y_ADDR SHAPE_Leng SHAPE_Area Planning Region        sum
## 1  29220.19   5267.381  1630379.3            <NA>         NA
## 2  29782.05   3506.107   559816.2  Central Region  389821688
## 3  29974.66   1740.926   160807.5  Central Region  548822231
## 4  29933.77   3313.625   595428.9  Central Region 1935610543
## 5  30005.70   2825.594   387429.4  Central Region 1935610543
## 6  29991.38   4428.913  1030378.8  Central Region 1935610543
## 7  30230.86   3275.312   551732.0  Central Region 1935610543
## 8  30222.86   2208.619   290184.7  Central Region  548822231
## 9  29893.78   6571.323  1084792.3  Central Region 1571360137
## 10 30104.18   3454.239   631644.3  Central Region 1571360137
##    TOTAL TRANSACTION VALUE ($B)                       geometry
## 1                            NA MULTIPOLYGON (((31495.56 30...
## 2                     0.3898217 MULTIPOLYGON (((29092.28 30...
## 3                     0.5488222 MULTIPOLYGON (((29932.33 29...
## 4                     1.9356105 MULTIPOLYGON (((27131.28 30...
## 5                     1.9356105 MULTIPOLYGON (((26451.03 30...
## 6                     1.9356105 MULTIPOLYGON (((25899.7 297...
## 7                     1.9356105 MULTIPOLYGON (((27746.95 30...
## 8                     0.5488222 MULTIPOLYGON (((29351.26 29...
## 9                     1.5713601 MULTIPOLYGON (((20996.49 30...
## 10                    1.5713601 MULTIPOLYGON (((24472.11 29...

2.6. Preparation of Static Maps

2.6.1 Preparation of Static Choropleth Maps for Total Transaction Volume and Total Transaction Value for Year 2019

The base map is first drawn using tm_shape(mpsz_2019Price) + tm_polygons().

tm_shape(mpsz_2019Price) +
  tm_polygons()

Since we want the choropleth map to show the geographical distribution of the total transaction value, we assign the target variable ‘TOTAL TRANSACTION VALUE ($B)’ to tm_polygons. We also specified the style to be ‘equal’, and to have a historgram in the legend of this graph.

tm_shape(mpsz_2019Price) +
  tm_polygons(col = "TOTAL TRANSACTION VALUE ($B)", 
              style = "equal",
              legend.hist = TRUE)

Since the legend is interfering with the choropleth map, we use tm_layout() to specify that we want the legend to be outside of the map area. Finally, a scale bar, compass and credits are added in the specified locations in the map area for completeness.

tm_shape(mpsz_2019Price) +
  tm_polygons(col = "TOTAL TRANSACTION VALUE ($B)", 
              style = "equal",
              legend.hist = TRUE) +
  tm_layout(legend.outside = TRUE) +
  tm_scale_bar(position=c("left", "top"), width = 2) +
  tm_compass(position=c("right", "top")) +
  tm_credits("Source: Planning Sub-zone boundary and Singapore Private Residential Transactions from Urban Redevelopment Authorithy (URA)", position = c("left", "bottom"))

Similarly, the choropleth map showing the geographical distribution of total transaction volume for 2019 is created using similar steps as above, except that this time, we specified the style to be ‘fixed’, meaning we specified the breaks we want to bin the data. Here, we bin the total transaction volume by the hundreds up to 700 (maximum value upon examining the data).

tm_shape(mpsz_2019) +
  tm_polygons(col = "TOTAL TRANSACTION VOLUME", 
              style = "fixed",
              breaks = c(1, 101, 201, 301, 401, 501, 601, 700),
              labels = c("1-100", "101-200", "201-300", "301-400", "401-500", "501-600", "601-700"),
              legend.hist = TRUE) +
  tm_layout(legend.outside = TRUE) +
  tm_scale_bar(position=c("left", "top"), width = 2) +
  tm_compass(position=c("right", "top")) +
  tm_credits("Source: Planning Sub-zone boundary and Singapore Private Residential Transactions from Urban Redevelopment Authorithy (URA)", position = c("left", "bottom"))

As we would like to place the 2 choropleth static maps together for comparison, we assign the two choropleth maps as ‘Volume’ and ‘Price’ respectively and use tmap_arrange() function to put them together. This will enable easier comparison.

Volume = tm_shape(mpsz_2019) +
  tm_polygons(col = "TOTAL TRANSACTION VOLUME", 
              style = "fixed",
              breaks = c(1, 101, 201, 301, 401, 501, 601, 700),
              labels = c("1-100", "101-200", "201-300", "301-400", "401-500", "501-600", "601-700"),
              legend.hist = TRUE) +
  tm_layout(legend.outside = TRUE) +
  tm_scale_bar(position=c("left", "top"), width = 2) +
  tm_compass(position=c("right", "top")) +
  tm_credits("Source: Planning Sub-zone boundary and Singapore Private Residential Transactions from Urban Redevelopment Authorithy (URA)", position = c("left", "bottom"))

Price = tm_shape(mpsz_2019Price) +
  tm_polygons(col = "TOTAL TRANSACTION VALUE ($B)", 
              style = "equal",
              legend.hist = TRUE) +
  tm_layout(legend.outside = TRUE) +
  tm_scale_bar(position=c("left", "top"), width = 2) +
  tm_compass(position=c("right", "top")) +
  tm_credits("Source: Planning Sub-zone boundary and Singapore Private Residential Transactions from Urban Redevelopment Authorithy (URA)", position = c("left", "bottom"))

tmap_arrange(Volume, Price)

2.6.2 Preparation of Interactive Choropleth Maps for Total Transaction Volume and Total Transaction Value for Year 2019

In order to change the static choropleth map showing the geographic total transaction volume for Year 2019, we simply call the function tmap_mode(“view”).

tmap_mode("view")

tm_shape(mpsz_2019) +
  tm_polygons(col = "TOTAL TRANSACTION VOLUME", 
              style = "fixed",
              breaks = c(1, 101, 201, 301, 401, 501, 601, 700),
              labels = c("1-100", "101-200", "201-300", "301-400", "401-500", "501-600", "601-700"),
              legend.hist = TRUE) +
  tm_layout(legend.outside = TRUE) +
  tm_scale_bar(position=c("left", "bottom"), width = 2) +
  tm_compass(position=c("right", "bottom")) +
  tm_credits("Source: Planning Sub-zone boundary and Singapore Private Residential Transactions from Urban Redevelopment Authorithy (URA)", position = c("left", "bottom"))

However, in the above map, when we mouse over the planning areas, the polygon id number appeared instead of the planning area names. To show the names of the planning areas when we mouse over the choropleth map, we specify id=“PLN_AREA_N” within tm_polygons(). The interactive map for the total transaction volume in Year 2019 is completed. By mousing over each planning area, the name of the planning area is displayed and the user can click on the planning area to display a callout dialogue box showing the name of the planning area and the total transaction volume for that planning area.

tmap_mode("view")

tm_shape(mpsz_2019) +
  tm_polygons(col = "TOTAL TRANSACTION VOLUME", 
              style = "fixed",
              breaks = c(1, 101, 201, 301, 401, 501, 601, 700),
              labels = c("1-100", "101-200", "201-300", "301-400", "401-500", "501-600", "601-700"),
              legend.hist = TRUE,
              id="PLN_AREA_N") +
  tm_layout(legend.outside = TRUE) +
  tm_scale_bar(position=c("left", "bottom"), width = 2) +
  tm_compass(position=c("right", "bottom")) +
  tm_credits("Source: Planning Sub-zone boundary and Singapore Private Residential Transactions from Urban Redevelopment Authorithy (URA)", position = c("left", "bottom"))

Similarly, we use tmap_mode(“View”) to convert the static choropleth map showing the geographic distribution of total transaction value into an interactive one. To show the names of the planning areas when we mouse over the choropleth map, we specify id=“PLN_AREA_N” within tm_polygons(). The interactive map is now completed. By mousing over each planning area, the name of the planning area is displayed and the user can click on the planning area to display a callout dialogue box showing the name of the planning area and the total transaction value (S$B) for that planning area.

3. Final Data Visualization with Short Descriptions

3.1. Overview from 1995 to 2019

By comparing the two interactive bar charts below for total transaction volumes and mean transacted prices for the period of Jan 1995 to Jun 2020, we find that the Central Region consistently topped and the North Region was at the bottom of both charts. This gives us an insight that the Central Region had the highest activity and mean value in private residential property transactions, while the North Region experienced the lowest activity and mean value in private residential property transactions. This would help a user to understand at a macro-perspective which region is more ‘profitable’ and ‘feasible’ to invest in property. User may mouse over the bar charts for more details on total transaction volumes and mean transacted prices for each region.

3.2. Total Transaction Volume and Value in Year 2019

3.2.1. Comparison of Total Transaction Volume and Value in Year 2019 using Choropleth Maps

The two static choropleth maps display the geographical distribution of Total Transaction Volume and Value in Year 2019. We noticed that there is a planning area within the North-East region that experienced a high volume of transaction, yet relatively low transaction value. To investigate deeper, we could use the interactive choropleth maps shown in Section 3.2.2.

3.2.2. Investigating Deeper using Interactive Choropleth Maps

From the interactive choropleth maps showing the geographic distribution of the total transaction volume by planning area in 2019, we could mouse over the areas we are interested in to check the names of these planning area. For the area in the North-East region mentioned in Section 3.2.1 above, we learnt that this area in Punggol, with transaction volume of 674 (falling under the highest bin of total transaction volume), yet with S$0.769B worth of total transaction value (second lowest bin of total transaction value). This indicated that the properties in Punggol are highly transactable, but yield lower value.

In addition, we observed that areas such as Bedok, Bukit Merah and Bukit Timah all experienced very high volumes of transaction and yielded relatively high transaction values too. This indicated that properties in these areas are very transactable and yield higher value at the same time. These areas are potentially more attractive to people looking to invest in property.

Visual Analytics - Assg 5

Tan Hui Ling

7/25/2020