Data Source: https://data.gov.sg/dataset/hdb-property-information

IS428 Visual Analytics and Applications - Assignment 5 Zhang Yiling 6 April 2021

1. Objectives
Without giving any specific dataset, the objective of this assignment is to incorporate interactivity and/or map(s) in your data visualization design using the appropriate R packages (e.g. ggplot2) that can reveal useful insights.

Assignment Requirements:
The data visualization must be built programmatically using ggplot2 or other appropriate R packages. The deliverable must be prepared using RMarkdown and be knitted into html document.
Scope of work #1: Describe the major data and design challenges faced in accomplishing the task, and how you plan to overcome these challenges with a proposed sketched design. (3 marks)
Scope of work #2: Provide step-by-step description on how the data visualization was prepared by using ggplot2 and other related R packages. (3 marks)
Scope of work #3: The final data visualization and a short description of not more than 350 words. The description must provide at least two useful information revealed by the data visualization. (4 marks)


2. Challenges and Solutions
2.1. Challenge: Dataset is not in the suitable format for plotting

The HDB Property information data set is in a format that is not readily usable to plot charts, while there are too many small groups existing in the dataset and simply plotting them would not be helpful for generating useful insight.

Solution: Even though I can manually modify the dataset to the format that I need using excel, however this will be time consuming and tedious.Therefore, I shall learn the data transformation techniques from websites such as datacamp. Examples include dplyr::mutate(). By using mutate(), I can aggregate the 1room_sold/2room_sold into a smallUnitSold and avoid large number of data columns. With that,I am able to plot them into ternary plot by categorizing them into three different groups.

I will be adopting the method of transforming the data before every plot for this assignment.

2.2. Challenge: Multiple HDB property with 0 total dwelling units.

The HDB property information dataset contains multiple HDB with 0 total_dwelling_units. It is not meaningful to visualize the HDB with 0 total dwelling units on the map.
Solution: Before plotting the charts, at the data transformation stage, I will be excluding the HDB property that have 0 total dwelling units using the dplyr::filter() function. A note/disclaimer will be written in the charts’ subtitle to inform the viewers that HDB property with 0 total dwelling units are not shown.

2.3. Challenge: Column names are not aligned with the input sequence when drawing the HDB(Per Building) Type Distribution across Towns.

I have aggregated the count of HDB type according to different towns and before plotting it into heatmap using highchart package.

Aggregated number of HDB type acorss different towns
I input the 5 HDB type/towns/number of specific HDB type in different towns according to the sequence I have aggregated above. However, I realize that the output was not organized as I expected. The heatmap itself will adjust the position of column and show as below.The values and columns are not aligned.

Original input sequence of HDB type
Misaligned HDB type number Solution: I adjust the input sequence of HDB type and make it aligned with the sequence of number of specific HDB type in each town. The input sequence is shown as below.

Adjust input sequence of HDB type

2.4. Challenge: Fail to access hc_add_series_boxplot function in highcharter package.

During the attempt of plotting HDB Type (Per Building) Overall Distribution using boxplot, I failed to access hc_add_series_boxplot function in highcharter package which enables interactivity.
Solution: I switched to another function called hcboxplot in highcharter package which enable me to draw boxplot but in a different orientation.

2.5. Sketch
The sketch visualizes roughly how the solutions of the challenges will be incorporated.

Solution 1: In order to plot ternary graph under section 3.4,I need to derive/combine multiple groups to have lesser groups without losing the original information.
Solution 2: In order to plot Choropleth map in section 3.3, I need to filter HDB with 0 dwelling units so that non-meaningful data will not show in the map.
Solution 3: In order to plot properly under section 3.12, I need to manually change the column sequence to fit into the heatmap and ensure accuracy.
Solution 4: Under section 3.11, I need to switch to hcboxplot() function in highchart package to draw boxplot showing the HDB Type(Per Building) Overall Distribution but in another orientation.
Girl in a jacket
All other sketches not mentioned above are also included in the attached png.
Girl in a jacket
Girl in a jacket

3. Step-by-step Description

3.1 Setup

This code chunk loads the packages required for analysis.

library(sf)
## Linking to GEOS 3.8.1, GDAL 3.1.4, PROJ 6.3.1
library(tmap)
library(tidyverse)
## ── Attaching packages ────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3     ✓ purrr   0.3.4
## ✓ tibble  3.0.1     ✓ dplyr   1.0.4
## ✓ tidyr   1.1.2     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1
## ── Conflicts ───────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(ggtern)
## Registered S3 methods overwritten by 'ggtern':
##   method           from   
##   grid.draw.ggplot ggplot2
##   plot.ggplot      ggplot2
##   print.ggplot     ggplot2
## --
## Remember to cite, run citation(package = 'ggtern') for further info.
## --
## 
## Attaching package: 'ggtern'
## The following objects are masked from 'package:ggplot2':
## 
##     aes, annotate, ggplot, ggplot_build, ggplot_gtable, ggplotGrob,
##     ggsave, layer_data, theme_bw, theme_classic, theme_dark,
##     theme_gray, theme_light, theme_linedraw, theme_minimal, theme_void
library(highcharter)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## Highcharts (www.highcharts.com) is a Highsoft software product which is
## not free for commercial and Governmental use
library(ggExtra)
library(parcoords)
library(GGally)
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
library(gplots)
## 
## Attaching package: 'gplots'
## The following object is masked from 'package:stats':
## 
##     lowess
library(dplyr)
library(data.table)
## 
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
## 
##     between, first, last
## The following object is masked from 'package:purrr':
## 
##     transpose
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout

3.2. Import Datasets
There are two dataset in this assignments.
MP14_SUBZONE_WEB_PL is obtained from lesson, which contains Singapore geospatial information in SVY21 format.
The HDB property information dataset is obtained from the data.gov.sg(managed by Housing and Developmemt Board). It contains the location of existing HDB blocks, highest floor level, year of completion, type of building and number of HDB flats (breakdown by flat type) per block etc.

## Import Geospatial Data into R 
mpsz <- st_read(dsn = "data/geospatial", 
                layer = "MP14_SUBZONE_WEB_PL")
## Reading layer `MP14_SUBZONE_WEB_PL' from data source `/Users/eva/Desktop/Assignment 5 2/data/geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 323 features and 15 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## Projected CRS: SVY21
HDB_property <- read.csv("hdb_property.csv")
HDB_property = filter(HDB_property,total_dwelling_units>0)

3.3 Visualizing Singapore’s HDB distribution which group by total_dwelling_units of each block. Creating an interactive map application.

mpsz_HDB <- left_join(mpsz, HDB_property, by = c("PLN_AREA_C" = "bldg_contract_town"))
tm_shape(mpsz_HDB)+
  tm_fill("total_dwelling_units", 
          style = "quantile", 
          palette = "Blues", 
          legend.hist = TRUE, 
          legend.is.portrait = TRUE,
          legend.hist.z = 0.1) +
  tm_layout(legend.height = 0.45, 
            legend.width = 0.35,
            legend.outside = FALSE,
            legend.position = c("right", "bottom"),
            frame = FALSE
            )+
  tm_layout(main.title = "Distribution of Dwelling Unit in Planning Towns(HDB with 0 dwelling units are not shown)",
            main.title.position = "center",
            main.title.size = 1.2,
            frame = TRUE)+
  tm_borders(alpha = 0.5)

## [1] "#E3EEF8" "#BFD8EC" "#7FB9DA" "#4090C5" "#1664AB"

3.4. Visualizing the distribution of SmallUnitSold/MediumUnitSold/LargeUnitSold and SmallUnitRent/MediumUnitRent/LargeUnitRent against different HDB blocks by interactive ternary diagram using package plotly.

## Transform the dataset to group different columns into meaningful variables. 

HDB_property<- HDB_property %>%
  mutate(SmallUnitSold = rowSums(.[13:14]))%>%
  mutate(MediumUnitSold = rowSums(.[15])) %>%
  mutate(LargeUnitSold = rowSums(.[16:17]))%>%
  mutate(SmallUnitRent = rowSums(.[22:23]))%>%
  mutate(MediumUnitRent = rowSums(.[24])) %>%
  mutate(LargeUnitRent = rowSums(.[25])) %>%
  mutate(TotalUnitSold = rowSums(.[14:18])) %>%
  mutate(TotalUnitRent = rowSums(.[22:25]))
# reusable function for axis formatting


axis <- function(txt) {
  list(
    title = txt, tickformat = "%", tickfont = list(size = 10)
  )
}

ternaryAxes <- list(
  aaxis = axis("SmallUnitSold"), 
  baxis = axis("MediumUnitSold"), 
  caxis = axis("LargeUnitSold")
)

# Initiating a plotly visualization 
plot_ly(
  HDB_property, 
  a = ~ SmallUnitSold, 
  b = ~ MediumUnitSold, 
  c = ~ LargeUnitSold, 
  type = "scatterternary",
  color = I("red")
)%>%
  layout(
    ternary = ternaryAxes
  )
## No scatterternary mode specifed:
##   Setting the mode to markers
##   Read more about this attribute -> https://plotly.com/r/reference/#scatter-mode
## plot an interactive ternary diagram
# reusable function for axis formatting

axis <- function(txt) {
  list(
    title = txt, tickformat = "%", tickfont = list(size = 10)
  )
}

ternaryAxes <- list(
  aaxis = axis("SmallUnitRent"), 
  baxis = axis("MediumUnitRent"), 
  caxis = axis("LargeUnitRent")
)

# Initiating a plotly visualization 
plot_ly(
  HDB_property, 
  a = ~ SmallUnitRent, 
  b = ~ MediumUnitRent, 
  c = ~ LargeUnitRent, 
  type = "scatterternary",
  color = I("red")
)%>%
  layout(
    ternary = ternaryAxes
  )
## No scatterternary mode specifed:
##   Setting the mode to markers
##   Read more about this attribute -> https://plotly.com/r/reference/#scatter-mode

3.5.Visualizing data through variations in coloring using package gplots. Creating heatmap which shows total_dwelling_units/ SmallUnitSold/ MediumUnitSold/ LargeUnitSold/ SmallUnitRent/MediumUnitRent/LargeUnitRent/TotalUnitSold/TotalUnitRent against different towns.

## Aggregate data which group by bldg_contract_town using package data.table / dplyr 
Heatmap_data<- HDB_property %>%
  select(c(11:12,25:32))
Heatmap_df<- data.frame(Heatmap_data)


setDT(Heatmap_df)
Heatmap_df<- Heatmap_df[,lapply(.SD,sum),by=bldg_contract_town, .SDcols = 2:10]
## Plot the heatmap using package gplots
row.names(Heatmap_df) <- Heatmap_df$bldg_contract_town
## Transforming the data frame into a matrix 
Heatmap_df_matrix <- data.matrix(Heatmap_df)
## Create the heatmap and adjust the font size of labels 
heatmap.2(x=Heatmap_df_matrix,cexRow=0.5,cexCol =0.7)


3.6. Visualizing HDB flat type in different towns using Parallel Coordinate Plot

## Aggregate data which group by bldg_contract_town using package data.table / dplyr 

 pc_data<- HDB_property %>%
  select(c(11,5:9))

pc_df<- data.frame(pc_data)


setDT(pc_df)
pc_df<- pc_df[,lapply(.SD,sum),by=bldg_contract_town, .SDcols = 2:6]
pc_df<-pc_df[order(pc_df$bldg_contract_town),]
## Creating parallel coordinate of HDB flat type in different towns  using package parcoords 


parcoords(
 pc_df[,1:6],
 )

3.7. Visualizing total_dwelling_units/TotalUnitSold/TotalUnitRent in different towns using Parallel Coordinate Plot

## dwelling unit/total sold/ total rent comparison
pc_df1<- Heatmap_df %>%
  select(c(1,2,9:10))
  

pc_df1<-pc_df1[order(pc_df1$bldg_contract_town),]
parcoords(
 pc_df1[,1:4],
 )

3.8. Visualizing ratio of TotalUnitSold/TotalUnitRent in different towns using Parallel Coordinate Plot

pc_df1
##     bldg_contract_town total_dwelling_units TotalUnitSold TotalUnitRent
##  1:                AMK                51312         46765          3861
##  2:                 BB                45092         43426          3359
##  3:                 BD                63031         58531          2650
##  4:                 BH                20072         19016           267
##  5:                 BM                54227         44127          6507
##  6:                 BP                35325         34680           571
##  7:                 BT                 2554          2471            49
##  8:                CCK                49026         47178          1353
##  9:                 CL                26730         25775           784
## 10:                 CT                12003          8887          1386
## 11:                 GL                30829         26803          3755
## 12:                 HG                57272         55142          2506
## 13:                 JE                24122         23359           543
## 14:                 JW                75207         71632          3386
## 15:                KWN                39931         33174          2839
## 16:                 MP                 7859          6534          1351
## 17:                 PG                51704         49029          3527
## 18:                PRC                29654         29267           342
## 19:                 QT                33973         31204          3612
## 20:                 SB                30020         28182          3493
## 21:                SGN                21632         21045           412
## 22:                 SK                69196         67378          2984
## 23:                TAP                73210         69839          1516
## 24:                 TP                40591         36099          4069
## 25:                 WL                69900         65129          3049
## 26:                 YS                65158         62029          2962
##     bldg_contract_town total_dwelling_units TotalUnitSold TotalUnitRent
pc_df2 <- transform(pc_df1, ratio = TotalUnitSold / TotalUnitRent)

pc_df2<- pc_df2%>%
  select(c(1,5))

pc_df2<-pc_df2[order(pc_df2$bldg_contract_town),]
parcoords(
 pc_df2[,1:2],
 )

3.9. Visualizing TotalUnitSold/TotalUnitRent in different towns using scatter plot and plot the histogram on the scatter plot as extension

## scatter plot with histogram 

base<-ggplot(data = pc_df1, aes(x=pc_df1$TotalUnitSold,y=pc_df1$TotalUnitRent))+ geom_point(stat = "identity")+
  labs(x = "TotalUnitSold",y="TotalUnitRent")

 
## plot the histogram on the scatter plot as extension 
  base+
  geom_vline(xintercept = 0,linetype = 'longdash')+
  geom_hline(yintercept = 0, linetype = 'longdash')+
  coord_cartesian(xlim = c(0,100000),
                  ylim = c(0,100000))
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
ggMarginal(base,type="histogram",color = "blue",fill= "light blue")


3.10. Plot the line chart using highcharter package to vizualize the change of TotaluUnitSold and TotalUnitRent in the years.

library(highcharter)

## Transform the data by selecting only year_completed/TotalUnitSold/TotalUnitRent
hg_data<- HDB_property %>%
  select(c(4,31:32))

hg_df<- data.frame(hg_data)
setDT(hg_df)

hg_df<- hg_df[,lapply(.SD,sum),by=year_completed, .SDcols = 2:3]
hg_df<-hg_df[order(hg_df$year_completed),]
highchart() %>%
  hc_add_theme(hc_theme_google()) %>% 
  hc_chart(type = "spline") %>% 
  hc_title(text = "HDB Property Information") %>%
  hc_xAxis(categories = hg_df$year_completed) %>%  
  hc_add_series(name = "TotalUnitSold", data = hg_df$TotalUnitSold) %>%  
  hc_add_series(name = "TotalUnitRent", data = hg_df$TotalUnitRent ) %>%
  hc_colors(colors = c('gray','blue'))

3.11. Plot the boxplot to visualize the HDB flat type overall distribution in Singapore using package highcharter.

# Transform the data by only selecting the HDB flat type 
boxdata<- pc_df[,2:6]
boxdata[1:26]
##     residential commercial market_hawker miscellaneous multistorey_carpark
##  1:         387        104             0           122                   0
##  2:         423         82             0            98                   0
##  3:         519        128             0           137                   0
##  4:         231         72             0            54                   0
##  5:         429        115             0           106                   0
##  6:         338         31             0            82                   0
##  7:          27          6             0            11                   0
##  8:         523         90             0           122                   3
##  9:         200         54             0            62                   0
## 10:          79         44             2            32                   0
## 11:         276         74             0            73                   0
## 12:         590        144             0           145                   0
## 13:         231         67             0            62                   0
## 14:         732        150             0           194                   0
## 15:         299         76             0            80                   0
## 16:          59         12             0            21                   0
## 17:         484         24             0           129                   0
## 18:         431         72             0            73                   0
## 19:         241         41             0            52                   0
## 20:         254         15             0            77                   0
## 21:         232         54             0            58                   1
## 22:         621         39             0           166                   0
## 23:         799        154             0           188                   0
## 24:         315         81             0            88                   0
## 25:         706         91             0           181                   0
## 26:         651        127             0           163                   0
##     residential commercial market_hawker miscellaneous multistorey_carpark
## Transform the input into data table to plot the chart
box_data <- data.table(type =rep(c("residential","commercial ","market_hawker", "miscellaneous ","multistorey_carpark"),each =26),num = c(387,423,519 ,231 ,429, 338 , 27, 523 ,200  ,79 ,276, 590, 231, 732, 299,  59, 484, 431, 241 ,254 ,232, 621, 799, 315 ,706 ,651, 127 ,116 ,162 ,101 ,157,  44 ,  6, 118  ,74  ,56,  98, 164 , 84 ,178, 105,  19  ,41  ,85  ,63 , 32, 67 , 66, 184, 102, 119, 150,10,  1 , 9 , 2 ,14,  1 , 1,  3  ,5 , 7 ,10  ,2  ,3 , 5 , 8 , 2 , 0 , 3 , 6  ,1 , 1 , 1,  2 , 7  ,6,  2,127, 112 ,144 , 56,120 , 87 , 11 ,130 , 67 , 34 , 82, 152,  65 ,205 , 89,  21, 141,  77,  76 , 85, 61, 184 ,195 , 98, 193 ,174, 20 , 33 , 35 , 19 , 52 , 48 ,  3 , 66  ,12 ,  9  ,17,  43 , 16 , 79 , 29   ,7  ,68  ,45  ,34  ,42,14 ,103 , 62 , 32 ,101,31))
## Plot the boxplot using package highcharter 
 

hcboxplot(x = box_data$num, var =box_data$type,name = "type" ) %>%
  hc_title(text = "HDB(Per Block)Overall Distribution",align="center")%>%
  hc_yAxis(title = list(text = "Number of HDB(Per Building)"))%>%
  hc_legend(enabled = FALSE) %>%
  hc_add_theme(hc_theme_ft())
## Warning: 'hcboxplot' is deprecated.
## Use 'data_to_boxplot' instead.
## See help("Deprecated")

3.12. Plot the heatmap to visualize the HDB flat type distribution in different towns using package highcharter.

## create input for heatmap generation 

type <- c( "miscellaneous",  "multistorey_carpatype",  "residential"  ,"commercial", "market_hawker " )

town<- c("AMK","BB","BD","BH","BM","BP","BT","CCK","CL","CT","GL","HG","JE","JW"    ,"KWN","MP","PG","PRC","QT","SB","SGN","SK","TAP","TP","WL","YS")

nums = c(
387,104,0,122,0,
423,82, 0   ,98,0,
519 ,128,   0   ,137,   0,
231,72,0,54,0,
429,115,0,106,0,
338 ,31,    0,  82  ,0,
27, 6,  0,  11, 0,
523,    90, 0,  122,    3,
200 ,54 ,0  ,62,    0,
79  ,44 ,2, 32, 0,
276 ,74 ,0, 73, 0,
590 ,144,   0,  145,    0,
231,    67, 0,62,   0,
732 ,150,   0   ,194,   0,
299 ,76 ,0, 80, 0,
59, 12, 0   ,21,    0,
484,    24, 0   ,129,   0,
431,72, 0   ,73,    0,
241,    41, 0,  52, 0,
254,15, 0,77,   0,
232,    54, 0,  58, 1,
621,39, 0   ,166,   0,
799,    154,    0,  188,    0,
315,    81, 0   ,88,    0,
706 ,91 ,0, 181,    0,
651,    127 ,0  ,163,   0)
 


## Form data table using package data.table 
heatmap_data = data.table(type =rep(type,times = length(town)),
                          town =rep(town,each = length(type)),
                          num = length(type) *length(town))

## Plot the heatmap using package highcharter 
highchart() %>%
  hc_title(text = "HDB(Per Building) Type Distribution across Towns ",align="center")%>%
  hc_xAxis(categories = type) %>%
  hc_yAxis(categories = town)%>%
  hc_colorAxis(min = 0,minColor = "#FFFFFF") %>%
  hc_legend(align = "right",layout = "vertical",margin = 0,verticalAlign = "top",y = 25,symbolHeight = 400) %>%
  hc_tooltip(formatter = JS("function () {return '<b>' + this.series.yAxis.categories[this.point.y] + '</b> has <br><b>' +
                                    this.point.value + '</b> buildings of <br><b>' + this.series.xAxis.categories[this.point.x] + '</b>'+'</b> type <br><b>';}")) %>%
  hc_add_series(name = "Sales per employee",data = heatmap_data,type = "heatmap",
                hcaes(x = type,y = town,value = nums),
                dataLabels = list(enabled = TRUE)) %>%
  hc_add_theme(hc_theme_538())

4. Insights

4.1 Insight 1
Under section 3.4, I found out that in medium-size units being sold occupy for largest proportion among all the units being sold and nearly all the units being rent are small size and large size.

4.2 Insight 2
Under section 3.5, I found out that JW/TAP/SK/WL have the higher total dwelling units than other towns and JW has the highest total units sold.

4.3 Insight 3
Under section 3.8, I found out that PRC has the higest ratio of TotalUnitSold/TotalUnitRent and MP has the lowest ratio.

4.4 Insight 4
Under section 3.10, I found out that year 1984(73119), 1998(36609) and 2017(30187) having the top three numbers of units sold.

4.5 Insight 5
Under section 3.12, I found out that the most common HDB flat types ranked as residential, miscellaneous, commercial, multistory_carpark and market_hawker in descending order. TAP has the highest number of HDB block in residential and commercial type, CT is the only town having HDB block in market_hawker type.CCK and SGN are the only two having HDB block in multistorey_carpark type.