Data Source: https://data.gov.sg/dataset/hdb-property-information
IS428 Visual Analytics and Applications - Assignment 5 Zhang Yiling 6 April 2021
1. Objectives
Without giving any specific dataset, the objective of this assignment is to incorporate interactivity and/or map(s) in your data visualization design using the appropriate R packages (e.g. ggplot2) that can reveal useful insights.
Assignment Requirements:
The data visualization must be built programmatically using ggplot2 or other appropriate R packages. The deliverable must be prepared using RMarkdown and be knitted into html document.
Scope of work #1: Describe the major data and design challenges faced in accomplishing the task, and how you plan to overcome these challenges with a proposed sketched design. (3 marks)
Scope of work #2: Provide step-by-step description on how the data visualization was prepared by using ggplot2 and other related R packages. (3 marks)
Scope of work #3: The final data visualization and a short description of not more than 350 words. The description must provide at least two useful information revealed by the data visualization. (4 marks)
2. Challenges and Solutions
2.1. Challenge: Dataset is not in the suitable format for plotting
The HDB Property information data set is in a format that is not readily usable to plot charts, while there are too many small groups existing in the dataset and simply plotting them would not be helpful for generating useful insight.
Solution: Even though I can manually modify the dataset to the format that I need using excel, however this will be time consuming and tedious.Therefore, I shall learn the data transformation techniques from websites such as datacamp. Examples include dplyr::mutate(). By using mutate(), I can aggregate the 1room_sold/2room_sold into a smallUnitSold and avoid large number of data columns. With that,I am able to plot them into ternary plot by categorizing them into three different groups.
I will be adopting the method of transforming the data before every plot for this assignment.
2.2. Challenge: Multiple HDB property with 0 total dwelling units.
The HDB property information dataset contains multiple HDB with 0 total_dwelling_units. It is not meaningful to visualize the HDB with 0 total dwelling units on the map.
Solution: Before plotting the charts, at the data transformation stage, I will be excluding the HDB property that have 0 total dwelling units using the dplyr::filter() function. A note/disclaimer will be written in the charts’ subtitle to inform the viewers that HDB property with 0 total dwelling units are not shown.
2.3. Challenge: Column names are not aligned with the input sequence when drawing the HDB(Per Building) Type Distribution across Towns.
I have aggregated the count of HDB type according to different towns and before plotting it into heatmap using highchart package.
I input the 5 HDB type/towns/number of specific HDB type in different towns according to the sequence I have aggregated above. However, I realize that the output was not organized as I expected. The heatmap itself will adjust the position of column and show as below.The values and columns are not aligned.
Solution: I adjust the input sequence of HDB type and make it aligned with the sequence of number of specific HDB type in each town. The input sequence is shown as below.
2.4. Challenge: Fail to access hc_add_series_boxplot function in highcharter package.
During the attempt of plotting HDB Type (Per Building) Overall Distribution using boxplot, I failed to access hc_add_series_boxplot function in highcharter package which enables interactivity.
Solution: I switched to another function called hcboxplot in highcharter package which enable me to draw boxplot but in a different orientation.
2.5. Sketch
The sketch visualizes roughly how the solutions of the challenges will be incorporated.
Solution 1: In order to plot ternary graph under section 3.4,I need to derive/combine multiple groups to have lesser groups without losing the original information.
Solution 2: In order to plot Choropleth map in section 3.3, I need to filter HDB with 0 dwelling units so that non-meaningful data will not show in the map.
Solution 3: In order to plot properly under section 3.12, I need to manually change the column sequence to fit into the heatmap and ensure accuracy.
Solution 4: Under section 3.11, I need to switch to hcboxplot() function in highchart package to draw boxplot showing the HDB Type(Per Building) Overall Distribution but in another orientation.
All other sketches not mentioned above are also included in the attached png.
3. Step-by-step Description
3.1 Setup
This code chunk loads the packages required for analysis.
library(sf)
## Linking to GEOS 3.8.1, GDAL 3.1.4, PROJ 6.3.1
library(tmap)
library(tidyverse)
## ── Attaching packages ────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3 ✓ purrr 0.3.4
## ✓ tibble 3.0.1 ✓ dplyr 1.0.4
## ✓ tidyr 1.1.2 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.1
## ── Conflicts ───────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(ggtern)
## Registered S3 methods overwritten by 'ggtern':
## method from
## grid.draw.ggplot ggplot2
## plot.ggplot ggplot2
## print.ggplot ggplot2
## --
## Remember to cite, run citation(package = 'ggtern') for further info.
## --
##
## Attaching package: 'ggtern'
## The following objects are masked from 'package:ggplot2':
##
## aes, annotate, ggplot, ggplot_build, ggplot_gtable, ggplotGrob,
## ggsave, layer_data, theme_bw, theme_classic, theme_dark,
## theme_gray, theme_light, theme_linedraw, theme_minimal, theme_void
library(highcharter)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## Highcharts (www.highcharts.com) is a Highsoft software product which is
## not free for commercial and Governmental use
library(ggExtra)
library(parcoords)
library(GGally)
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
library(gplots)
##
## Attaching package: 'gplots'
## The following object is masked from 'package:stats':
##
## lowess
library(dplyr)
library(data.table)
##
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
##
## between, first, last
## The following object is masked from 'package:purrr':
##
## transpose
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
3.2. Import Datasets
There are two dataset in this assignments.
MP14_SUBZONE_WEB_PL is obtained from lesson, which contains Singapore geospatial information in SVY21 format.
The HDB property information dataset is obtained from the data.gov.sg(managed by Housing and Developmemt Board). It contains the location of existing HDB blocks, highest floor level, year of completion, type of building and number of HDB flats (breakdown by flat type) per block etc.
## Import Geospatial Data into R
mpsz <- st_read(dsn = "data/geospatial",
layer = "MP14_SUBZONE_WEB_PL")
## Reading layer `MP14_SUBZONE_WEB_PL' from data source `/Users/eva/Desktop/Assignment 5 2/data/geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 323 features and 15 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## Projected CRS: SVY21
HDB_property <- read.csv("hdb_property.csv")
HDB_property = filter(HDB_property,total_dwelling_units>0)
3.3 Visualizing Singapore’s HDB distribution which group by total_dwelling_units of each block. Creating an interactive map application.
mpsz_HDB <- left_join(mpsz, HDB_property, by = c("PLN_AREA_C" = "bldg_contract_town"))
tm_shape(mpsz_HDB)+
tm_fill("total_dwelling_units",
style = "quantile",
palette = "Blues",
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(legend.height = 0.45,
legend.width = 0.35,
legend.outside = FALSE,
legend.position = c("right", "bottom"),
frame = FALSE
)+
tm_layout(main.title = "Distribution of Dwelling Unit in Planning Towns(HDB with 0 dwelling units are not shown)",
main.title.position = "center",
main.title.size = 1.2,
frame = TRUE)+
tm_borders(alpha = 0.5)
## [1] "#E3EEF8" "#BFD8EC" "#7FB9DA" "#4090C5" "#1664AB"
3.4. Visualizing the distribution of SmallUnitSold/MediumUnitSold/LargeUnitSold and SmallUnitRent/MediumUnitRent/LargeUnitRent against different HDB blocks by interactive ternary diagram using package plotly.
## Transform the dataset to group different columns into meaningful variables.
HDB_property<- HDB_property %>%
mutate(SmallUnitSold = rowSums(.[13:14]))%>%
mutate(MediumUnitSold = rowSums(.[15])) %>%
mutate(LargeUnitSold = rowSums(.[16:17]))%>%
mutate(SmallUnitRent = rowSums(.[22:23]))%>%
mutate(MediumUnitRent = rowSums(.[24])) %>%
mutate(LargeUnitRent = rowSums(.[25])) %>%
mutate(TotalUnitSold = rowSums(.[14:18])) %>%
mutate(TotalUnitRent = rowSums(.[22:25]))
# reusable function for axis formatting
axis <- function(txt) {
list(
title = txt, tickformat = "%", tickfont = list(size = 10)
)
}
ternaryAxes <- list(
aaxis = axis("SmallUnitSold"),
baxis = axis("MediumUnitSold"),
caxis = axis("LargeUnitSold")
)
# Initiating a plotly visualization
plot_ly(
HDB_property,
a = ~ SmallUnitSold,
b = ~ MediumUnitSold,
c = ~ LargeUnitSold,
type = "scatterternary",
color = I("red")
)%>%
layout(
ternary = ternaryAxes
)
## No scatterternary mode specifed:
## Setting the mode to markers
## Read more about this attribute -> https://plotly.com/r/reference/#scatter-mode
## plot an interactive ternary diagram
# reusable function for axis formatting
axis <- function(txt) {
list(
title = txt, tickformat = "%", tickfont = list(size = 10)
)
}
ternaryAxes <- list(
aaxis = axis("SmallUnitRent"),
baxis = axis("MediumUnitRent"),
caxis = axis("LargeUnitRent")
)
# Initiating a plotly visualization
plot_ly(
HDB_property,
a = ~ SmallUnitRent,
b = ~ MediumUnitRent,
c = ~ LargeUnitRent,
type = "scatterternary",
color = I("red")
)%>%
layout(
ternary = ternaryAxes
)
## No scatterternary mode specifed:
## Setting the mode to markers
## Read more about this attribute -> https://plotly.com/r/reference/#scatter-mode
3.5.Visualizing data through variations in coloring using package gplots. Creating heatmap which shows total_dwelling_units/ SmallUnitSold/ MediumUnitSold/ LargeUnitSold/ SmallUnitRent/MediumUnitRent/LargeUnitRent/TotalUnitSold/TotalUnitRent against different towns.
## Aggregate data which group by bldg_contract_town using package data.table / dplyr
Heatmap_data<- HDB_property %>%
select(c(11:12,25:32))
Heatmap_df<- data.frame(Heatmap_data)
setDT(Heatmap_df)
Heatmap_df<- Heatmap_df[,lapply(.SD,sum),by=bldg_contract_town, .SDcols = 2:10]
## Plot the heatmap using package gplots
row.names(Heatmap_df) <- Heatmap_df$bldg_contract_town
## Transforming the data frame into a matrix
Heatmap_df_matrix <- data.matrix(Heatmap_df)
## Create the heatmap and adjust the font size of labels
heatmap.2(x=Heatmap_df_matrix,cexRow=0.5,cexCol =0.7)
3.6. Visualizing HDB flat type in different towns using Parallel Coordinate Plot
## Aggregate data which group by bldg_contract_town using package data.table / dplyr
pc_data<- HDB_property %>%
select(c(11,5:9))
pc_df<- data.frame(pc_data)
setDT(pc_df)
pc_df<- pc_df[,lapply(.SD,sum),by=bldg_contract_town, .SDcols = 2:6]
pc_df<-pc_df[order(pc_df$bldg_contract_town),]
## Creating parallel coordinate of HDB flat type in different towns using package parcoords
parcoords(
pc_df[,1:6],
)
3.7. Visualizing total_dwelling_units/TotalUnitSold/TotalUnitRent in different towns using Parallel Coordinate Plot
## dwelling unit/total sold/ total rent comparison
pc_df1<- Heatmap_df %>%
select(c(1,2,9:10))
pc_df1<-pc_df1[order(pc_df1$bldg_contract_town),]
parcoords(
pc_df1[,1:4],
)
3.8. Visualizing ratio of TotalUnitSold/TotalUnitRent in different towns using Parallel Coordinate Plot
pc_df1
## bldg_contract_town total_dwelling_units TotalUnitSold TotalUnitRent
## 1: AMK 51312 46765 3861
## 2: BB 45092 43426 3359
## 3: BD 63031 58531 2650
## 4: BH 20072 19016 267
## 5: BM 54227 44127 6507
## 6: BP 35325 34680 571
## 7: BT 2554 2471 49
## 8: CCK 49026 47178 1353
## 9: CL 26730 25775 784
## 10: CT 12003 8887 1386
## 11: GL 30829 26803 3755
## 12: HG 57272 55142 2506
## 13: JE 24122 23359 543
## 14: JW 75207 71632 3386
## 15: KWN 39931 33174 2839
## 16: MP 7859 6534 1351
## 17: PG 51704 49029 3527
## 18: PRC 29654 29267 342
## 19: QT 33973 31204 3612
## 20: SB 30020 28182 3493
## 21: SGN 21632 21045 412
## 22: SK 69196 67378 2984
## 23: TAP 73210 69839 1516
## 24: TP 40591 36099 4069
## 25: WL 69900 65129 3049
## 26: YS 65158 62029 2962
## bldg_contract_town total_dwelling_units TotalUnitSold TotalUnitRent
pc_df2 <- transform(pc_df1, ratio = TotalUnitSold / TotalUnitRent)
pc_df2<- pc_df2%>%
select(c(1,5))
pc_df2<-pc_df2[order(pc_df2$bldg_contract_town),]
parcoords(
pc_df2[,1:2],
)
3.9. Visualizing TotalUnitSold/TotalUnitRent in different towns using scatter plot and plot the histogram on the scatter plot as extension
## scatter plot with histogram
base<-ggplot(data = pc_df1, aes(x=pc_df1$TotalUnitSold,y=pc_df1$TotalUnitRent))+ geom_point(stat = "identity")+
labs(x = "TotalUnitSold",y="TotalUnitRent")
## plot the histogram on the scatter plot as extension
base+
geom_vline(xintercept = 0,linetype = 'longdash')+
geom_hline(yintercept = 0, linetype = 'longdash')+
coord_cartesian(xlim = c(0,100000),
ylim = c(0,100000))
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
ggMarginal(base,type="histogram",color = "blue",fill= "light blue")
3.10. Plot the line chart using highcharter package to vizualize the change of TotaluUnitSold and TotalUnitRent in the years.
library(highcharter)
## Transform the data by selecting only year_completed/TotalUnitSold/TotalUnitRent
hg_data<- HDB_property %>%
select(c(4,31:32))
hg_df<- data.frame(hg_data)
setDT(hg_df)
hg_df<- hg_df[,lapply(.SD,sum),by=year_completed, .SDcols = 2:3]
hg_df<-hg_df[order(hg_df$year_completed),]
highchart() %>%
hc_add_theme(hc_theme_google()) %>%
hc_chart(type = "spline") %>%
hc_title(text = "HDB Property Information") %>%
hc_xAxis(categories = hg_df$year_completed) %>%
hc_add_series(name = "TotalUnitSold", data = hg_df$TotalUnitSold) %>%
hc_add_series(name = "TotalUnitRent", data = hg_df$TotalUnitRent ) %>%
hc_colors(colors = c('gray','blue'))
3.11. Plot the boxplot to visualize the HDB flat type overall distribution in Singapore using package highcharter.
# Transform the data by only selecting the HDB flat type
boxdata<- pc_df[,2:6]
boxdata[1:26]
## residential commercial market_hawker miscellaneous multistorey_carpark
## 1: 387 104 0 122 0
## 2: 423 82 0 98 0
## 3: 519 128 0 137 0
## 4: 231 72 0 54 0
## 5: 429 115 0 106 0
## 6: 338 31 0 82 0
## 7: 27 6 0 11 0
## 8: 523 90 0 122 3
## 9: 200 54 0 62 0
## 10: 79 44 2 32 0
## 11: 276 74 0 73 0
## 12: 590 144 0 145 0
## 13: 231 67 0 62 0
## 14: 732 150 0 194 0
## 15: 299 76 0 80 0
## 16: 59 12 0 21 0
## 17: 484 24 0 129 0
## 18: 431 72 0 73 0
## 19: 241 41 0 52 0
## 20: 254 15 0 77 0
## 21: 232 54 0 58 1
## 22: 621 39 0 166 0
## 23: 799 154 0 188 0
## 24: 315 81 0 88 0
## 25: 706 91 0 181 0
## 26: 651 127 0 163 0
## residential commercial market_hawker miscellaneous multistorey_carpark
## Transform the input into data table to plot the chart
box_data <- data.table(type =rep(c("residential","commercial ","market_hawker", "miscellaneous ","multistorey_carpark"),each =26),num = c(387,423,519 ,231 ,429, 338 , 27, 523 ,200 ,79 ,276, 590, 231, 732, 299, 59, 484, 431, 241 ,254 ,232, 621, 799, 315 ,706 ,651, 127 ,116 ,162 ,101 ,157, 44 , 6, 118 ,74 ,56, 98, 164 , 84 ,178, 105, 19 ,41 ,85 ,63 , 32, 67 , 66, 184, 102, 119, 150,10, 1 , 9 , 2 ,14, 1 , 1, 3 ,5 , 7 ,10 ,2 ,3 , 5 , 8 , 2 , 0 , 3 , 6 ,1 , 1 , 1, 2 , 7 ,6, 2,127, 112 ,144 , 56,120 , 87 , 11 ,130 , 67 , 34 , 82, 152, 65 ,205 , 89, 21, 141, 77, 76 , 85, 61, 184 ,195 , 98, 193 ,174, 20 , 33 , 35 , 19 , 52 , 48 , 3 , 66 ,12 , 9 ,17, 43 , 16 , 79 , 29 ,7 ,68 ,45 ,34 ,42,14 ,103 , 62 , 32 ,101,31))
## Plot the boxplot using package highcharter
hcboxplot(x = box_data$num, var =box_data$type,name = "type" ) %>%
hc_title(text = "HDB(Per Block)Overall Distribution",align="center")%>%
hc_yAxis(title = list(text = "Number of HDB(Per Building)"))%>%
hc_legend(enabled = FALSE) %>%
hc_add_theme(hc_theme_ft())
## Warning: 'hcboxplot' is deprecated.
## Use 'data_to_boxplot' instead.
## See help("Deprecated")
3.12. Plot the heatmap to visualize the HDB flat type distribution in different towns using package highcharter.
## create input for heatmap generation
type <- c( "miscellaneous", "multistorey_carpatype", "residential" ,"commercial", "market_hawker " )
town<- c("AMK","BB","BD","BH","BM","BP","BT","CCK","CL","CT","GL","HG","JE","JW" ,"KWN","MP","PG","PRC","QT","SB","SGN","SK","TAP","TP","WL","YS")
nums = c(
387,104,0,122,0,
423,82, 0 ,98,0,
519 ,128, 0 ,137, 0,
231,72,0,54,0,
429,115,0,106,0,
338 ,31, 0, 82 ,0,
27, 6, 0, 11, 0,
523, 90, 0, 122, 3,
200 ,54 ,0 ,62, 0,
79 ,44 ,2, 32, 0,
276 ,74 ,0, 73, 0,
590 ,144, 0, 145, 0,
231, 67, 0,62, 0,
732 ,150, 0 ,194, 0,
299 ,76 ,0, 80, 0,
59, 12, 0 ,21, 0,
484, 24, 0 ,129, 0,
431,72, 0 ,73, 0,
241, 41, 0, 52, 0,
254,15, 0,77, 0,
232, 54, 0, 58, 1,
621,39, 0 ,166, 0,
799, 154, 0, 188, 0,
315, 81, 0 ,88, 0,
706 ,91 ,0, 181, 0,
651, 127 ,0 ,163, 0)
## Form data table using package data.table
heatmap_data = data.table(type =rep(type,times = length(town)),
town =rep(town,each = length(type)),
num = length(type) *length(town))
## Plot the heatmap using package highcharter
highchart() %>%
hc_title(text = "HDB(Per Building) Type Distribution across Towns ",align="center")%>%
hc_xAxis(categories = type) %>%
hc_yAxis(categories = town)%>%
hc_colorAxis(min = 0,minColor = "#FFFFFF") %>%
hc_legend(align = "right",layout = "vertical",margin = 0,verticalAlign = "top",y = 25,symbolHeight = 400) %>%
hc_tooltip(formatter = JS("function () {return '<b>' + this.series.yAxis.categories[this.point.y] + '</b> has <br><b>' +
this.point.value + '</b> buildings of <br><b>' + this.series.xAxis.categories[this.point.x] + '</b>'+'</b> type <br><b>';}")) %>%
hc_add_series(name = "Sales per employee",data = heatmap_data,type = "heatmap",
hcaes(x = type,y = town,value = nums),
dataLabels = list(enabled = TRUE)) %>%
hc_add_theme(hc_theme_538())
4. Insights
4.1 Insight 1
Under section 3.4, I found out that in medium-size units being sold occupy for largest proportion among all the units being sold and nearly all the units being rent are small size and large size.
4.2 Insight 2
Under section 3.5, I found out that JW/TAP/SK/WL have the higher total dwelling units than other towns and JW has the highest total units sold.
4.3 Insight 3
Under section 3.8, I found out that PRC has the higest ratio of TotalUnitSold/TotalUnitRent and MP has the lowest ratio.
4.4 Insight 4
Under section 3.10, I found out that year 1984(73119), 1998(36609) and 2017(30187) having the top three numbers of units sold.
4.5 Insight 5
Under section 3.12, I found out that the most common HDB flat types ranked as residential, miscellaneous, commercial, multistory_carpark and market_hawker in descending order. TAP has the highest number of HDB block in residential and commercial type, CT is the only town having HDB block in market_hawker type.CCK and SGN are the only two having HDB block in multistorey_carpark type.