Social area analysis is the study of the socioeconomic characteristics of a population. In his paper, Shevky looked into indicators such as social rank, urbanisation and ethnic segregation.
Here, Singapore is the area of study, and we are interested in finding out how to make Singapore a more liveable city. In the Economist Intelligence Unit’s (EIU) 2019 Global Liveability Index, Singapore is ranked 40th amongst the world’s liveable cities. Cities were ranked in terms of stability, healthcare, culture and environment, education and infrastructure.
Singapore is unique because it is a country and a city at the same time. Her limited land means that population density is high. To reduce congestion in the Central Business District (CBD), the Government is developing office clusters in the heartlands, such as the Tampines and Jurong planning areas, where residents can live, work and play.
In addition to the variables required, we will create indicators to understand Singapore’s liveability at the Planning Subzone level. Several Planning Subzones (e.g. Ang Mo Kio Town Centre, Cheng San etc.) make up a Planning Area (e.g. Ang Mo Kio).
Population density is the ratio of the number of people to the land area. We will be using Singapore’s resident (citizens and permanent residents) population as the numerator since SingStat’s data does not include non-residents.
Old-age support ratio is defined here as the ratio of the elderly (65 years and over) to the economically active population (25 to 64 years old). Although the Department of Statistics defines the economically active population as 20 to 64 years old, we will be using 25 to 64 years old instead. This is because a large proportion of Singapore’s 20 to 24 year olds are serving National Service or studying in Institutes of Higher Learning.
As conventionally reported (economically active residents per elderly resident), the old-age support ratio has been declining over the decades. This means that fewer economically active residents are supporting each elderly resident, which could indicate a strain on economic resources. Because the ratio in this analysis is computed as elderly over economically active residents, a higher value here corresponds to a heavier support burden.
Number of shopping amenities per 100 residents indicates the density of shopping facilities to the population. These shopping spaces provide basic necessities and material goods and are thus indicators of the standard of living of a population.
Number of financial services per 100 residents indicates the density of money changers, ATMs and other financial amenities relative to the resident population. These financial services enable residents to engage in financial transactions.
Number of businesses per 100 residents indicates the density of business entities relative to the resident population. These business entities provide employment to the population, and a low density in the heartlands reveals the potential for growth of office clusters in line with Singapore’s office decentralization strategy.
Number of private residences per 100 residents indicates the density of upmarket residential areas relative to the resident population. These upmarket areas provide more choices for Singapore’s growing rich population and make it attractive for expats to live in Singapore.
Number of government services per 100 residents indicates the density of government organizations relative to the resident population. These government organizations, such as Town Councils, serve the population and ensure that Singapore can continue to grow vibrantly.
We will be using three clustering techniques:
Hierarchical clustering is a common method of grouping the most similar clusters together. In particular, the Agglomerative Clustering is a bottom-up approach that combines the two most homogeneous clusters into a bigger cluster at each step. Dissimilarity can be measured by the Ward’s method which minimizes total within-cluster variance.
Next, we will use K-means clustering to group the subzones. A reason for using another clustering method is that both hierarchical clustering and K-means clustering have their shortfalls. On one hand, the K-means algorithm requires the number of clusters to be specified in advance and is sensitive to the starting points. In other words, varying the starting points could result in different clusters each time. On the other hand, a disadvantage of hierarchical clustering is that the clusters are fixed and new observations do not change the cluster groups. K-means makes up for this shortfall.
The spatially-constrained clustering technique will be used in a separate analysis. It is based on pruning a minimum spanning tree which is constructed from the contiguity structure of the spatial units.
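The sensitivity of K-means to its starting points can be illustrated with a minimal sketch in base R. The toy data, the choice of three centres and the number of restarts are all assumptions made for illustration and are not part of the analysis itself.

```r
# Toy data: three loose groups in two dimensions (illustrative only)
set.seed(42)
toy <- rbind(
  matrix(rnorm(40, mean = 0), ncol = 2),
  matrix(rnorm(40, mean = 3), ncol = 2),
  matrix(rnorm(40, mean = 6), ncol = 2)
)

# Ten K-means runs, each from a single random start
single_start_wss <- sapply(1:10, function(i) {
  kmeans(toy, centers = 3, nstart = 1)$tot.withinss
})

# One run that keeps the best of 25 random starts
multi_start_wss <- kmeans(toy, centers = 3, nstart = 25)$tot.withinss

print(round(single_start_wss, 2))
print(round(multi_start_wss, 2))
```

If the single-start values differ from run to run, the algorithm has converged to different partitions. Raising nstart is one common way to reduce that variability.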
In our analysis, we need these packages:
- tidyverse for data wrangling
- sf for data importation and geospatial data wrangling
- tmap, ggplot2 and ggpubr for data visualization
- corrplot for the correlation plot
- spdep for spatial handling
- cluster and heatmaply for clustering analysis
A new library used is factoextra, which enables us to create visualizations for our clustering analysis. Another new library, gridExtra, enables us to arrange multiple plots on the same page.
packages = c('spdep', 'ClustGeo', 'tmap', 'sf', 'ggpubr', 'cluster', 'heatmaply', 'corrplot', 'tidyverse','psych', 'factoextra')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}
library(gridExtra)
population <- read_csv("data/aspatial/respopagesextod2011to2019.csv")
head(population)
## # A tibble: 6 x 7
## PA SZ AG Sex TOD Pop Time
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 1- and 2-Room Flats 0 2011
## 2 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 3-Room Flats 10 2011
## 3 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 4-Room Flats 30 2011
## 4 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 5-Room and Executive F~ 50 2011
## 5 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HUDC Flats (excluding thos~ 0 2011
## 6 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males Landed Properties 0 2011
2019 population data required:
unique(population$AG)
## [1] "0_to_4" "5_to_9" "10_to_14" "15_to_19" "20_to_24"
## [6] "25_to_29" "30_to_34" "35_to_39" "40_to_44" "45_to_49"
## [11] "50_to_54" "55_to_59" "60_to_64" "65_to_69" "70_to_74"
## [16] "75_to_79" "80_to_84" "85_to_89" "90_and_over"
The grouping of the age groups will be direct. We will simply add the various age groups up to get the mega age groups as required.
Next, we total the 2019 population by type of dwelling (TOD). This allows us to see which types of dwelling there are and how to combine them into our desired groups.
population2019 <- population %>%
filter(Time == 2019)
aggregate(population2019$Pop, by=list(Category=population2019$TOD), FUN=sum)
## Category x
## 1 Condominiums and Other Apartments 590110
## 2 HDB 1- and 2-Room Flats 175050
## 3 HDB 3-Room Flats 582250
## 4 HDB 4-Room Flats 1340580
## 5 HDB 5-Room and Executive Flats 1065050
## 6 HUDC Flats (excluding those privatised) 0
## 7 Landed Properties 249980
## 8 Others 30400
Several of these dwelling types may seem foreign. Here are the definitions from SingStat:
Given that the available data is vague on what the “Others” TOD covers, we will drop it. We will then group the remaining TOD as follows:
Indicators required | Corresponding data from population dataframe |
---|---|
HDB1-2RM dwellers | HDB 1- and 2-Room Flats |
HDB3-4RM dwellers | HDB 3-Room Flats + HDB 4-Room Flats |
HDB5RM-Ec dwellers | HDB 5-Room and Executive Flats |
Condominium and apartment dwellers | Condominiums and Other Apartments |
Landed property dwellers | Landed Properties |
population2019 <- population2019 %>%
filter(Pop > 0) %>%
mutate_at(.vars = vars(PA, SZ), .funs = toupper)  # funs() is deprecated; pass the function directly
Steps done:
- Create the YOUNG, ECONACTIVE and AGED columns
age_df <- population2019 %>%
spread(AG, Pop) %>%
mutate(YOUNG = `0_to_4`+`5_to_9`+`10_to_14`+`15_to_19`+`20_to_24`) %>%
mutate(ECONACTIVE = `25_to_29`+`30_to_34`+`35_to_39`+`40_to_44`+`45_to_49`+
`50_to_54`+`55_to_59`+`60_to_64`) %>%
mutate(AGED = `65_to_69`+`70_to_74`+`75_to_79`+`80_to_84`+`85_to_89`+`90_and_over`) %>%
select(`SZ`, `YOUNG`, `ECONACTIVE`, `AGED`)
age_df[is.na(age_df)] <- 0
age_df <- age_df %>%
group_by(`SZ`) %>%
summarize(`YOUNG`=sum(`YOUNG`), `ECONACTIVE`=sum(`ECONACTIVE`), `AGED`=sum(`AGED`))
head(age_df)
## # A tibble: 6 x 4
## SZ YOUNG ECONACTIVE AGED
## <chr> <dbl> <dbl> <dbl>
## 1 ADMIRALTY 4370 8380 890
## 2 ALEXANDRA HILL 2880 7560 2960
## 3 ALEXANDRA NORTH 580 1420 0
## 4 ALJUNIED 8220 24330 7460
## 5 ANAK BUKIT 5850 12330 3400
## 6 ANCHORVALE 15250 27740 3180
The same steps used for the Age Group data are applied to the Type of Dwelling feature, except that we first filter out the ‘HUDC Flats’ (which have no residents in 2019) and ‘Others’ categories.
tod_df <- population2019 %>%
filter(!(TOD %in% c('HUDC Flats (excluding those privatised)','Others'))) %>%  # %in% tests membership; == would recycle the vector
spread(TOD, Pop) %>%
mutate(HDB12RM = `HDB 1- and 2-Room Flats`) %>%
mutate(HDB34RM = `HDB 3-Room Flats`+`HDB 4-Room Flats`) %>%
mutate(HDB5RMEC = `HDB 5-Room and Executive Flats`) %>%
mutate(CONDO = `Condominiums and Other Apartments`) %>%
mutate(LANDED = `Landed Properties`) %>%
select(`SZ`, `HDB12RM`, `HDB34RM`, `HDB5RMEC`, `CONDO`, `LANDED`)
tod_df[is.na(tod_df)] <- 0
tod_df <- tod_df %>%
group_by(`SZ`) %>%
summarize(`HDB12RM`=sum(`HDB12RM`), `HDB34RM`=sum(`HDB34RM`), `HDB5RMEC`=sum(`HDB5RMEC`),
`CONDO`=sum(`CONDO`), `LANDED`=sum(`LANDED`))
head(tod_df)
## # A tibble: 6 x 6
## SZ HDB12RM HDB34RM HDB5RMEC CONDO LANDED
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ADMIRALTY 1140 6580 6220 0 0
## 2 ALEXANDRA HILL 3910 6310 3120 0 0
## 3 ALEXANDRA NORTH 0 0 0 2120 0
## 4 ALJUNIED 2120 18020 6400 11930 460
## 5 ANAK BUKIT 160 60 3170 9020 7600
## 6 ANCHORVALE 1680 17190 22650 5000 0
sf_mpsz <- st_read(dsn = "data/geospatial",
layer = "MP14_SUBZONE_WEB_PL")
## Reading layer `MP14_SUBZONE_WEB_PL' from data source `C:\Users\weich\Downloads\SMU2020\IS415\IS415_Take-home_Ex03\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 323 features and 15 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## proj4string: +proj=tmerc +lat_0=1.366666666666667 +lon_0=103.8333333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +datum=WGS84 +units=m +no_defs
# head(sf_mpsz)
sf_business <- st_read(dsn = "data/geospatial",
layer = "Business")
## Reading layer `Business' from data source `C:\Users\weich\Downloads\SMU2020\IS415\IS415_Take-home_Ex03\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 6550 features and 5 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.6147 ymin: 1.24605 xmax: 104.0044 ymax: 1.4698
## CRS: 4326
# head(sf_business)
sf_financial <- st_read(dsn = "data/geospatial",
layer = "Financial")
## Reading layer `Financial' from data source `C:\Users\weich\Downloads\SMU2020\IS415\IS415_Take-home_Ex03\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 3320 features and 29 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.6256 ymin: 1.24392 xmax: 103.9998 ymax: 1.46247
## CRS: 4326
# head(sf_financial)
sf_govtembassy <- st_read(dsn = "data/geospatial",
layer = "Govt_Embassy")
## Reading layer `Govt_Embassy' from data source `C:\Users\weich\Downloads\SMU2020\IS415\IS415_Take-home_Ex03\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 443 features and 5 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.6282 ymin: 1.24911 xmax: 103.9884 ymax: 1.45765
## CRS: 4326
# head(sf_govtembassy)
sf_privresidential <- st_read(dsn = "data/geospatial",
layer = "Private residential")
## Reading layer `Private residential' from data source `C:\Users\weich\Downloads\SMU2020\IS415\IS415_Take-home_Ex03\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 3604 features and 5 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.6295 ymin: 1.23943 xmax: 103.9749 ymax: 1.45379
## CRS: 4326
# head(sf_privresidential)
sf_shopping <- st_read(dsn = "data/geospatial",
layer = "Shopping")
## Reading layer `Shopping' from data source `C:\Users\weich\Downloads\SMU2020\IS415\IS415_Take-home_Ex03\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 511 features and 5 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.679 ymin: 1.24779 xmax: 103.9644 ymax: 1.4535
## CRS: 4326
# head(sf_shopping)
Some entries have the same POI_ID but have spelling differences (e.g. MINISTRY OF EDUCATION STELLAR CTR vs MINISTRY OF EDUCATION STELLAR CENTRE) or use an abbreviation (e.g. MEWER vs MINISTRY OF ENVIRONMENT AND WATER RESOURCES). We want to remove the duplicated entries.
The duplicated() function has a fromLast argument that returns the first till the second last duplicated entries but not the last if fromLast=TRUE. It returns the second till the last duplicated entries but not the first if fromLast=FALSE. Therefore, we use the OR (“|”) operator to get all the duplicated entries. Use View() to view the whole list in a separate tab.
View(sf_govtembassy[duplicated(sf_govtembassy$POI_ID) | duplicated(sf_govtembassy$POI_ID, fromLast=TRUE), ])
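The behaviour of fromLast can be checked on a small vector. This is a sketch in base R, and the IDs are made up for illustration.

```r
ids <- c("A1", "B2", "A1", "C3", "B2", "A1")

# Flags every repeat after the first occurrence
later_copies <- duplicated(ids)

# Flags every occurrence that has a later copy
earlier_copies <- duplicated(ids, fromLast = TRUE)

# Union: every element whose value appears more than once
all_copies <- later_copies | earlier_copies

print(ids[all_copies])     # all entries involved in a duplication
print(ids[!later_copies])  # first occurrence of each ID only
```

The second print mirrors the de-duplication pattern used on the point layers, which keeps the first record for each POI_ID.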
sf_business <- sf_business[!duplicated(sf_business$POI_ID),]
sf_financial <- sf_financial[!duplicated(sf_financial$POI_ID),]
sf_govtembassy <- sf_govtembassy[!duplicated(sf_govtembassy$POI_ID),]
sf_privresidential <- sf_privresidential[!duplicated(sf_privresidential$POI_ID),]
sf_shopping <- sf_shopping[!duplicated(sf_shopping$POI_ID),]
We create a function that transforms the points to the SVY21 projected coordinate reference system (EPSG:3414) and identifies the Planning Subzone in which each service is located. The function returns a simple feature data frame.
sf_point_in_poly <- function(point, epsg, polygon){
point <- st_transform(point, epsg)
point <- st_join(point, polygon, join=st_intersects) %>%
select(`POI_ID`,`POI_NAME`,`SUBZONE_N`)
return(point)
}
sf_mpsz <- st_transform(sf_mpsz, 3414)
sf_business <- sf_point_in_poly(sf_business, 3414, sf_mpsz)
sf_financial <- sf_point_in_poly(sf_financial, 3414, sf_mpsz)
sf_govtembassy <- sf_point_in_poly(sf_govtembassy, 3414, sf_mpsz)
sf_privresidential <- sf_point_in_poly(sf_privresidential, 3414, sf_mpsz)
sf_shopping <- sf_point_in_poly(sf_shopping, 3414, sf_mpsz)
We will be creating the variables mentioned in the Variables section.
- Population density (POPDENSE) - divide resident population by area (SHAPE_Area)
- Old-age support ratio (OLDS_RATIO) - divide elderly population (AGED) by economically active population (ECONACTIVE)
- Shopping amenities per 100 residents (SHOPPER100) - divide the number of shopping amenities by every 100 people
- Financial services per 100 residents (FINPER100) - divide the number of financial services by every 100 people
- Businesses per 100 residents (BIZPER100) - divide the number of businesses by every 100 people
- Private residences per 100 residents (PRIVRESPER100) - divide the number of private residences by every 100 people
- Government services per 100 residents (GOVTPER100) - divide the number of government services by every 100 people
Steps:
pop_df <- population2019 %>%
group_by(`SZ`) %>%
summarize(`POP`=sum(`Pop`))
sf_mpsz <- sf_mpsz %>%
select(`SUBZONE_N`, `PLN_AREA_N`, `SHAPE_Area`)
sf_mpsz_select <- left_join(sf_mpsz, pop_df, by = c("SUBZONE_N" = "SZ")) %>%
mutate_at(vars(`POP`), ~replace(., is.na(.), 0)) %>%
mutate(POPDENSE = `POP` / `SHAPE_Area` * 1e6)
Divide AGED by ECONACTIVE
sf_mpsz_select <- left_join(sf_mpsz_select, age_df, by = c("SUBZONE_N" = "SZ")) %>%
mutate(OLDS_RATIO = `AGED` / `ECONACTIVE`)
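As a worked example of the two formulas above, here is a short sketch with a hypothetical subzone. All numbers are made up, and SHAPE_Area is assumed to be in square metres, which is why the density is multiplied by 1e6 to express it per square kilometre.

```r
# Hypothetical subzone: values are made up for illustration
pop        <- 12000      # resident population
area_m2    <- 1500000    # SHAPE_Area, assumed to be in square metres
aged       <- 1800       # residents aged 65 and over
econactive <- 7200       # residents aged 25 to 64

popdense   <- pop / area_m2 * 1e6   # residents per square kilometre
olds_ratio <- aged / econactive     # elderly residents per economically active resident

print(popdense)
print(olds_ratio)
```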
Steps:
shop_df <- data.frame(table(sf_shopping$SUBZONE_N)) %>%
rename(`SUBZONE_N` = `Var1`, `shop_count` = `Freq`)
sf_mpsz_select <- left_join(sf_mpsz_select, shop_df) %>%
mutate(SHOPPER100 = `shop_count` / `POP` * 100)
fin_df <- data.frame(table(sf_financial$SUBZONE_N)) %>%
rename(`SUBZONE_N` = `Var1`, `fin_count` = `Freq`)
sf_mpsz_select <- left_join(sf_mpsz_select, fin_df) %>%
mutate(FINPER100 = `fin_count` / `POP` * 100)
biz_df <- data.frame(table(sf_business$SUBZONE_N)) %>%  # count businesses, not shopping amenities
rename(`SUBZONE_N` = `Var1`, `biz_count` = `Freq`)
sf_mpsz_select <- left_join(sf_mpsz_select, biz_df) %>%
mutate(BIZPER100 = `biz_count` / `POP` * 100)
privres_df <- data.frame(table(sf_privresidential$SUBZONE_N)) %>%
rename(`SUBZONE_N` = `Var1`, `privres_count` = `Freq`)
sf_mpsz_select <- left_join(sf_mpsz_select, privres_df) %>%
mutate(PRIVRESPER100 = `privres_count` / `POP` * 100)
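The steps are the same as above, except that we first remove entries whose names contain CONSULATE, EMBASSY or HIGH COMMISSION, since these are foreign missions rather than Singapore government services: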
to_delete <- "CONSULATE|EMBASSY|HIGH COMMISSION"
sf_govtembassy <- sf_govtembassy[!grepl(to_delete, sf_govtembassy$POI_NAME),]  # grepl() stays safe even when nothing matches
govt_df <- data.frame(table(sf_govtembassy$SUBZONE_N)) %>%
rename(`SUBZONE_N` = `Var1`, `govt_count` = `Freq`)
sf_mpsz_select <- left_join(sf_mpsz_select, govt_df) %>%
mutate(GOVTPER100 = `govt_count` / `POP` * 100)
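The count-then-join pattern used for each "per 100 residents" indicator can be sketched in base R. The subzone names, populations and the PER100 column are hypothetical, and merge() stands in for left_join().

```r
# Hypothetical inputs (names and numbers are illustrative only)
poi_subzone <- c("ALPHA", "ALPHA", "BETA", "ALPHA", "BETA")  # subzone of each point
zones <- data.frame(SUBZONE_N = c("ALPHA", "BETA", "GAMMA"),
                    POP = c(2000, 500, 1000),
                    stringsAsFactors = FALSE)

# Count points per subzone
counts <- as.data.frame(table(poi_subzone), stringsAsFactors = FALSE)
names(counts) <- c("SUBZONE_N", "poi_count")

# Left join: subzones without any point get NA, which we set to 0
zones <- merge(zones, counts, by = "SUBZONE_N", all.x = TRUE)
zones$poi_count[is.na(zones$poi_count)] <- 0

# Points per 100 residents
zones$PER100 <- zones$poi_count / zones$POP * 100
print(zones)
```

A subzone with zero population would produce Inf or NaN in the last step, which is why infinite values need to be handled before the correlation analysis.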
sf_mpsz_select <- left_join(sf_mpsz_select, tod_df, by = c("SUBZONE_N" = "SZ"))
sf_mpsz_select[is.na(sf_mpsz_select)] <- 0
Since our analysis has to do with the socioeconomic classes, we are only interested in the Subzones with resident population.
sf_mpsz_select <- sf_mpsz_select %>%
filter(`POP` > 0)
First, we will plot the required indicators in a histogram chart.
YOUNG_hist <- ggplot(data=sf_mpsz_select, aes(x=`YOUNG`)) +
geom_histogram(bins=20, color="black", fill="light blue")
ECONACTIVE_hist <- ggplot(data=sf_mpsz_select, aes(x=`ECONACTIVE`)) +
geom_histogram(bins=20, color="black", fill="light blue")
AGED_hist <- ggplot(data=sf_mpsz_select, aes(x=`AGED`)) +
geom_histogram(bins=20, color="black", fill="light blue")
HDB12RM_hist <- ggplot(data=sf_mpsz_select, aes(x=`HDB12RM`)) +
geom_histogram(bins=20, color="black", fill="light blue")
HDB34RM_hist <- ggplot(data=sf_mpsz_select, aes(x=`HDB34RM`)) +
geom_histogram(bins=20, color="black", fill="light blue")
HDB5RMEC_hist <- ggplot(data=sf_mpsz_select, aes(x=`HDB5RMEC`)) +
geom_histogram(bins=20, color="black", fill="light blue")
CONDO_hist <- ggplot(data=sf_mpsz_select, aes(x=`CONDO`)) +
geom_histogram(bins=20, color="black", fill="light blue")
LANDED_hist <- ggplot(data=sf_mpsz_select, aes(x=`LANDED`)) +
geom_histogram(bins=20, color="black", fill="light blue")
ggarrange(YOUNG_hist, ECONACTIVE_hist, AGED_hist, HDB12RM_hist, HDB34RM_hist,
HDB5RMEC_hist, CONDO_hist, LANDED_hist,
ncol = 4,
nrow = 2)
From the histograms above, we can tell that the data is highly skewed. Although every horizontal axis is in the thousands, the upper limits vary from about 5,000 to 80,000, so the variables do not share a common range.
POPDENSE_hist <- ggplot(data=sf_mpsz_select, aes(x=`POPDENSE`)) +
geom_histogram(bins=20, color="black", fill="light blue")
OLDS_RATIO_hist <- ggplot(data=sf_mpsz_select, aes(x=`OLDS_RATIO`)) +
geom_histogram(bins=20, color="black", fill="light blue")
SHOPPER100_hist <- ggplot(data=sf_mpsz_select, aes(x=`SHOPPER100`)) +
geom_histogram(bins=20, color="black", fill="light blue")
FINPER100_hist <- ggplot(data=sf_mpsz_select, aes(x=`FINPER100`)) +
geom_histogram(bins=20, color="black", fill="light blue")
BIZPER100_hist <- ggplot(data=sf_mpsz_select, aes(x=`BIZPER100`)) +
geom_histogram(bins=20, color="black", fill="light blue")
PRIVRESPER100_hist <- ggplot(data=sf_mpsz_select, aes(x=`PRIVRESPER100`)) +
geom_histogram(bins=20, color="black", fill="light blue")
GOVTPER100_hist <- ggplot(data=sf_mpsz_select, aes(x=`GOVTPER100`)) +
geom_histogram(bins=20, color="black", fill="light blue")
ggarrange(POPDENSE_hist, OLDS_RATIO_hist, SHOPPER100_hist, FINPER100_hist, BIZPER100_hist,
PRIVRESPER100_hist, GOVTPER100_hist,
ncol = 4,
nrow = 2)
From the histograms above, we can tell that the scale of these derived indicators is vastly different from that of the population counts plotted earlier. This means that we have to perform data standardisation before our clustering analysis.
From the above histograms, we can plot the variables that seem to have a similar distribution on the choropleth map.
m1 <- tm_shape(sf_mpsz_select) +
tm_fill(col = "YOUNG",
n = 5,
style = "jenks",
title = "Young population") +
tm_borders(alpha = 0.5)
m2 <- tm_shape(sf_mpsz_select) +
tm_fill(col = "ECONACTIVE",
n = 5,
style = "jenks",
title = "Economically-active pop.") +
tm_borders(alpha = 0.5)
m3 <- tm_shape(sf_mpsz_select) +
tm_fill(col = "AGED",
n = 5,
style = "jenks",
title = "Elderly population") +
tm_borders(alpha = 0.5)
m4 <- tm_shape(sf_mpsz_select) +
tm_fill(col = "CONDO",
n = 5,
style = "jenks",
title = "Condo") +
tm_borders(alpha = 0.5)
tmap_arrange(m1, m2, m3, m4,
asp=NA, nrow=2, ncol=2)
The subzones show a rather similar pattern across the four plots. This could suggest high correlation amongst these variables. For a more mathematical approach, we can draw a correlation plot.
Correlation analysis allows us to understand the bivariate relationship between variables. It shows the direction (positive/negative) of the relationship as well as how strongly the variables are correlated (the higher the absolute correlation coefficient, the stronger the correlation). If two variables are strongly correlated (correlation coefficient >= 0.85), we remove the one that is more correlated with the other variables.
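The 0.85 threshold can also be applied programmatically rather than read off a plot. The following is a minimal sketch in base R on made-up data; the column names A, B and C are hypothetical.

```r
set.seed(1)
n <- 200
a <- rnorm(n)
toy <- data.frame(
  A = a,
  B = a + rnorm(n, sd = 0.1),   # nearly a copy of A
  C = rnorm(n)                  # unrelated to A and B
)

cor_mat <- cor(toy)

# Keep each pair once (upper triangle) and flag |r| >= 0.85
high <- which(abs(cor_mat) >= 0.85 & upper.tri(cor_mat), arr.ind = TRUE)
pairs <- data.frame(
  var1 = rownames(cor_mat)[high[, "row"]],
  var2 = colnames(cor_mat)[high[, "col"]],
  r    = cor_mat[high]
)
print(pairs)
```

Listing the flagged pairs this way makes it easier to count how many strong correlations each variable is involved in before deciding which one to drop.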
to_corr <- sf_mpsz_select %>%
select(`SUBZONE_N`,`YOUNG`,`ECONACTIVE`,`AGED`,`HDB12RM`,`HDB34RM`,`HDB5RMEC`,`CONDO`,`LANDED`,
`POPDENSE`,`OLDS_RATIO`,`SHOPPER100`,`FINPER100`,`BIZPER100`,`PRIVRESPER100`,`GOVTPER100`) %>%
st_set_geometry(NULL)
to_corr <- do.call(data.frame,lapply(to_corr, function(x) replace(x, is.infinite(x),0)))
cluster_vars.cor = cor(to_corr[,2:16])
corrplot.mixed(cluster_vars.cor,
lower = "ellipse",
upper = "number",
tl.pos = "lt",
diag = "l",
tl.col = "black")
From the plot above, we can tell that the YOUNG population is highly correlated with three variables:
- ECONACTIVE population (0.98),
- HDB34RM (0.85), and
- HDB5RMEC (0.91).
The ECONACTIVE population is highly correlated with four variables:
- YOUNG population (0.98),
- AGED population (0.87),
- HDB34RM (0.9), and
- HDB5RMEC (0.89).
Therefore, between the YOUNG and ECONACTIVE variables, we will retain the YOUNG variable.
The HDB34RM variable is highly correlated with three variables:
- YOUNG (0.85),
- ECONACTIVE (0.9), and
- AGED (0.9).
The HDB5RMEC variable is highly correlated with two variables:
- YOUNG (0.91), and
- ECONACTIVE (0.89).
Therefore, between the HDB34RM and HDB5RMEC variables, we will retain the HDB5RMEC variable.
Now, we have the YOUNG and HDB5RMEC variables being highly correlated with each other. Since YOUNG is more correlated with AGED (0.78) than HDB5RMEC is correlated with AGED (0.65), we will remove the YOUNG variable.
The SHOPPER100 variable is highly correlated with three variables:
- FINPER100 (0.88),
- BIZPER100 (1), and
- PRIVRESPER100 (0.9).
Since SHOPPER100 and BIZPER100 have perfect correlation, we will retain either (e.g. SHOPPER100).
The FINPER100 variable is highly correlated with two variables:
- SHOPPER100 (0.88), and
- BIZPER100 (0.88).
The PRIVRESPER100 variable is highly correlated with two variables:
- SHOPPER100 (0.9), and
- BIZPER100 (0.9).
Therefore, between SHOPPER100, FINPER100, and PRIVRESPER100, we will retain FINPER100.
In sum, these are the variables to include in the clustering analysis: AGED, HDB12RM, HDB5RMEC, CONDO, LANDED, POPDENSE, OLDS_RATIO, FINPER100, GOVTPER100.
Once again, the Hierarchical Clustering technique groups the most homogeneous observations together. We can use this to find the most similar clusters and understand what it suggests about Singapore’s liveability.
cluster_vars <- to_corr %>%
select(`SUBZONE_N`,`AGED`, `HDB12RM`, `HDB5RMEC`, `CONDO`, `LANDED`, `POPDENSE`, `OLDS_RATIO`, `FINPER100`, `GOVTPER100`)
row.names(cluster_vars) <- cluster_vars$SUBZONE_N
cluster_vars <- select(cluster_vars, c(2:10))
head(cluster_vars)
## AGED HDB12RM HDB5RMEC CONDO LANDED POPDENSE OLDS_RATIO
## PEARL'S HILL 1700 4520 0 420 0 11843.1719 0.5279503
## BOAT QUAY 0 0 0 50 0 435.3031 0.0000000
## HENDERSON HILL 2790 3930 1270 190 0 22471.1973 0.3780488
## REDHILL 1190 1970 3240 1930 0 27695.3656 0.1900958
## ALEXANDRA HILL 2960 3910 3120 0 0 13373.7225 0.3915344
## BUKIT HO SWEE 3330 3700 1900 540 0 26788.3659 0.4007220
## FINPER100 GOVTPER100
## PEARL'S HILL 0.36199095 0.01508296
## BOAT QUAY 2.85714286 1.42857143
## HENDERSON HILL 0.02989537 0.00000000
## REDHILL 0.11183597 0.00000000
## ALEXANDRA HILL 0.10159652 0.02902758
## BUKIT HO SWEE 0.04059540 0.01353180
As mentioned in the EDA, the scale of the data is highly varied and standardisation is necessary. We will use Z-score standardisation, which rescales each variable to have mean=0 and standard deviation=1. It is preferred over min-max normalisation here as it is less sensitive to outliers.
cluster_vars.z <- scale(cluster_vars)
describe(cluster_vars.z)
## vars n mean sd median trimmed mad min max range skew kurtosis
## AGED 1 234 0 1 -0.26 -0.17 0.81 -0.83 6.40 7.23 2.36 9.34
## HDB12RM 2 234 0 1 -0.65 -0.22 0.00 -0.65 3.42 4.07 1.69 2.12
## HDB5RMEC 3 234 0 1 -0.36 -0.22 0.36 -0.60 5.77 6.37 2.80 9.44
## CONDO 4 234 0 1 -0.40 -0.18 0.62 -0.82 4.64 5.46 1.58 2.67
## LANDED 5 234 0 1 -0.44 -0.24 0.00 -0.44 7.23 7.67 3.93 18.93
## POPDENSE 6 234 0 1 -0.23 -0.08 1.18 -1.21 2.55 3.75 0.55 -0.86
## OLDS_RATIO 7 234 0 1 -0.08 -0.05 1.24 -1.32 2.62 3.94 0.21 -0.97
## FINPER100 8 234 0 1 -0.21 -0.19 0.01 -0.22 10.14 10.36 7.26 58.58
## GOVTPER100 9 234 0 1 -0.19 -0.18 0.00 -0.19 10.96 11.15 7.89 71.05
## se
## AGED 0.07
## HDB12RM 0.07
## HDB5RMEC 0.07
## CONDO 0.07
## LANDED 0.07
## POPDENSE 0.07
## OLDS_RATIO 0.07
## FINPER100 0.07
## GOVTPER100 0.07
r <- ggplot(data=cluster_vars, aes(x= `AGED`)) +
geom_histogram(bins=20, color="black", fill="light blue") +
ggtitle("Before Standardisation")
r2 <- ggplot(data=cluster_vars, aes(x= `OLDS_RATIO`)) +
geom_histogram(bins=20, color="black", fill="light blue")
cluster_vars.z_df <- as.data.frame(cluster_vars.z)
z <- ggplot(data=cluster_vars.z_df, aes(x=`AGED`)) +
geom_histogram(bins=20, color="black", fill="light blue") +
ggtitle("Z-score Standardisation")
z2 <- ggplot(data=cluster_vars.z_df, aes(x=`OLDS_RATIO`)) +
geom_histogram(bins=20, color="black", fill="light blue")
ggarrange(r, z, r2, z2,
ncol = 2, nrow = 2)
As seen from the above chart, the AGED and OLDS_RATIO are of very different scales before standardisation. After Z-score standardisation, they have a similar range.
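To make the transformation concrete, the sketch below computes Z-scores by hand on a made-up vector and compares them with the result of scale(). The numbers are illustrative only.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Z-score by hand: subtract the mean, divide by the sample standard deviation
z_manual <- (x - mean(x)) / sd(x)

# scale() applies the same transformation column by column
z_scale <- as.numeric(scale(x))

print(round(z_manual, 3))
print(all.equal(z_manual, z_scale))
```

Note that standardisation changes the location and spread of a variable but not the shape of its distribution, so a skewed variable remains skewed afterwards.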
The agnes() function enables us to get the agglomerative coefficient, which measures the strength of clustering. Thus, we will use a function that computes the agglomerative coefficient for each clustering method (the closer the value to 1, the stronger the clustering structure).
Steps:
m <- c( "average", "single", "complete", "ward")
names(m) <- c( "average", "single", "complete", "ward")
ac <- function(x) {
agnes(cluster_vars.z, method = x)$ac
}
map_dbl(m, ac)
## average single complete ward
## 0.8896250 0.8561701 0.9350264 0.9662691
Ward gives the strongest clustering structure. Therefore, we will be using the Ward’s method in our hierarchical clustering.
Steps:
proxmat <- dist(cluster_vars.z, method = 'euclidean')
hclust_ward <- hclust(proxmat, method = 'ward.D')
plot(hclust_ward, cex = 0.6)
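One detail that may be worth checking is which Ward variant is intended. As far as the stats::hclust documentation describes, "ward.D2" applies Ward's criterion to Euclidean distances (and is the counterpart of method = "ward" in agnes()), whereas "ward.D" does so only when it is given squared distances. The sketch below compares the variants on made-up data; it is an illustration and not a re-run of the analysis.

```r
set.seed(7)
toy <- matrix(rnorm(60), ncol = 3)   # 20 illustrative observations
d <- dist(toy, method = "euclidean")

hc_d2      <- hclust(d,   method = "ward.D2")  # Ward's criterion on Euclidean distances
hc_d_sq    <- hclust(d^2, method = "ward.D")   # "ward.D" fed squared distances
hc_d_plain <- hclust(d,   method = "ward.D")   # "ward.D" fed unsquared distances

# The first two agree once the heights are put on the same scale
print(all.equal(sqrt(hc_d_sq$height), hc_d2$height))

# Cross-tabulate 3-cluster memberships from the two linkage variants
print(table(ward.D = cutree(hc_d_plain, 3), ward.D2 = cutree(hc_d2, 3)))
```

If the cross-tabulation shows observations off the main pattern, the two variants disagree on some memberships for that data set, and the choice of method matters.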
The fviz_nbclust() function enables us to find the optimal number of clusters with different methods. We will use FUNcluster=hcut for hierarchical clustering.
The Elbow Method plots the total within sum of squares (i.e. intraclass variance) against the number of clusters, k. A low TWSS means that the class is more compact. We get the optimal number of clusters by finding a bend.
fviz_nbclust(cluster_vars.z, hcut, method = "wss")
This Elbow Method plot does not give a clear-cut bend for us to determine the optimal number of clusters. We could turn to the Average Silhouette Method.
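The quantity behind the elbow plot can also be computed by hand, which helps when the bend is hard to see. This is a minimal sketch in base R on made-up data; the helper total_wss() and the use of "ward.D2" linkage are assumptions for illustration.

```r
set.seed(3)
toy <- scale(rbind(
  matrix(rnorm(40, mean = 0), ncol = 2),
  matrix(rnorm(40, mean = 4), ncol = 2)
))
hc <- hclust(dist(toy), method = "ward.D2")

# Total within-cluster sum of squares for a given partition (hypothetical helper)
total_wss <- function(x, groups) {
  sum(sapply(split(as.data.frame(x), groups), function(g) {
    sum(scale(g, center = TRUE, scale = FALSE)^2)
  }))
}

# TWSS for k = 1 to 6 clusters cut from the same tree
wss <- sapply(1:6, function(k) total_wss(toy, cutree(hc, k)))
print(round(wss, 2))
```

Because the partitions are nested cuts of one tree, the values never increase with k. The elbow is the point where the drop from one k to the next becomes small.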
The Average Silhouette Method measures how well each observation lies within its cluster. It plots the average silhouette width against the number of clusters, k. We pick the k at which the average silhouette width is maximised.
fviz_nbclust(cluster_vars.z, hcut, method = "silhouette")
The optimal number of clusters would be k=6, where the average silhouette width is maximised.
Use rect.hclust() and specify the optimal number of clusters, k=6.
plot(hclust_ward, cex = 0.6)
rect.hclust(hclust_ward, k = 6, border = 2:3)
Steps:
sub_grp <- cutree(hclust_ward, k = 6)
ncluster_vars.z_df <- cluster_vars.z_df %>%
mutate(cluster = sub_grp)
fviz_cluster(list(data = cluster_vars.z, cluster = sub_grp), geom = "point")
We can see that the clusters overlap in places on the scatter plot. This suggests that the indicators may not be strong enough to differentiate between the clusters. We can use the interactive cluster heatmap to find out the strength of the variables.
The heatmaply() function allows us to interact with the heat map by hovering our mouse over the plot. The darker the color of the bars, the higher the value of the variables.
heatmaply(cluster_vars.z,
Colv=NA,
dist_method = "euclidean",
hclust_method = "ward.D",
seriate = "OLO",
colors = Blues,
k_row = 6,
margins = c(NA,200,60,NA),
fontsize_row = 4,
fontsize_col = 5,
main="Geographic Segmentation of Singapore by Liveability Indicators",
xlab = "Liveability Indicators",
ylab = "Singapore Planning Subzones"
)
The heatmap does not seem too intuitive for us to understand how the variables affect the clustering. However, we could gather several insights:
We can use qtm() to draw a simple map of the hierarchical clustering to understand the geographic distribution.
sf_ncluster_vars.z_df <- cbind(ncluster_vars.z_df, sf_mpsz_select$geometry)
sf_ncluster_vars.z_df <- st_as_sf(sf_ncluster_vars.z_df)
qtm(sf_ncluster_vars.z_df, "cluster")
From the map above, we can gather that the subzones in a particular cluster are geographically scattered.
The heatmap showed us that the indicators were not very telling across all the subzones in the cluster. For instance, some subzones in cluster 3 have a low number of financial and government services per 100 people while others in this cluster have a high number. Therefore, we can summarize them and take the mean to compare a single number across the indicators.
ncluster_vars.z_df %>%
group_by(`cluster`) %>%
summarize_all(mean)
## # A tibble: 6 x 10
## cluster AGED HDB12RM HDB5RMEC CONDO LANDED POPDENSE OLDS_RATIO FINPER100
## <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 0.792 2.15 0.266 0.458 -0.301 0.815 0.740 -0.198
## 2 2 -0.727 -0.510 -0.463 -0.313 -0.267 -0.755 -0.901 -0.00986
## 3 3 0.941 0.174 1.44 0.448 -0.188 1.24 0.0611 -0.210
## 4 4 -0.833 -0.648 -0.605 -0.781 -0.432 -1.17 -1.32 3.96
## 5 5 0.129 0.110 -0.283 -0.588 -0.394 0.460 1.09 -0.169
## 6 6 0.182 -0.339 -0.408 0.951 1.95 -0.592 0.759 -0.200
## # ... with 1 more variable: GOVTPER100 <dbl>
Analysis of cluster mean will be done below after finding out the significant variables.
The Analysis of Variance (ANOVA) test allows us to decipher whether the differences between means are statistically significant.
Hypothesis:
Confidence level: 95%
We reject the null hypothesis if the p-value is less than the alpha value of 0.05.
res.aov <- aov(cluster ~ ., data = ncluster_vars.z_df)
summary(res.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## AGED 1 11.07 11.07 10.359 0.001480 **
## HDB12RM 1 63.96 63.96 59.852 3.46e-13 ***
## HDB5RMEC 1 36.97 36.97 34.595 1.46e-08 ***
## CONDO 1 0.06 0.06 0.056 0.813087
## LANDED 1 68.00 68.00 63.631 7.68e-14 ***
## POPDENSE 1 30.22 30.22 28.282 2.53e-07 ***
## OLDS_RATIO 1 138.07 138.07 129.197 < 2e-16 ***
## FINPER100 1 14.58 14.58 13.642 0.000278 ***
## GOVTPER100 1 11.55 11.55 10.811 0.001171 **
## Residuals 224 239.39 1.07
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From the ANOVA test, we find that eight out of nine variables are statistically significant at the 95% confidence level, with CONDO being the exception. We can reject H0 for these eight variables and conclude that their means are not the same.
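A common alternative framing is to run one ANOVA per indicator, with the indicator as the response and the cluster label as a factor, so that each test directly compares the indicator's mean across clusters. The sketch below shows this on made-up data; the data frame, its column names and the group sizes are hypothetical.

```r
set.seed(11)
# Hypothetical standardised indicator for three clusters of 30 subzones each
toy <- data.frame(
  cluster   = factor(rep(1:3, each = 30)),
  indicator = c(rnorm(30, mean = -1), rnorm(30, mean = 0), rnorm(30, mean = 1))
)

# One-way ANOVA: does the mean of the indicator differ across clusters?
fit <- aov(indicator ~ cluster, data = toy)
p_value <- summary(fit)[[1]][["Pr(>F)"]][1]

print(summary(fit))
print(p_value < 0.05)
```

With k clusters, the cluster term has k - 1 degrees of freedom, which is a quick way to confirm that the label was treated as a factor rather than as a number.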
Cluster 1
Cluster 1 has a high ageing population and high old-age support ratio. The number of HDB 1-2 Room Flats is significantly higher than the other clusters as well. This could signify that it is a less prosperous zone. However, it also has a high population density.
Since AGED is highly correlated with ECONACTIVE, the size of the economically-active population is high in this cluster. Yet, the number of financial services, and hence of business entities and shopping amenities, per 100 residents is low in this cluster. The Government could turn this cluster into a live-work-play area to decentralize office locations. Given that some of these subzones are located far from the central business district, it could be a viable alternative to explore.
Cluster 3
Cluster 3 has the highest ageing population. However, its old-age support ratio is significantly lower than that of cluster 1. This may create a strain on the economically-active population. Therefore, the Government could pay more attention to this cluster and bring in more elderly-friendly amenities for the elderly to age gracefully.
Cluster 6
Cluster 6 has a high number of condominiums and landed properties. Its population density, however, is markedly low. This may signify an upmarket region where the Government can attract expats to live. Some of these subzones are located far from the Central Business District, which means that upmarket facilities could be built in these subzones to attract this population with high earning power.
Cluster 4
Cluster 4 has a high number of financial and government services per 100 residents. Its population density, however, is significantly low. The Government could look into the accessibility of this area since residents who work here do not live close to their working location.
Cluster 5
Cluster 5 has a high old-age support ratio although its elderly population is not high. This may mean that there is little cause for concern here with regard to the elderly population.
Cluster 2
Cluster 2 scores low on all indicators. It may not be the most important region to look into at the moment.
The Spatially Constrained Clustering technique makes the clusters more geographically compact. It takes into consideration which subzones are neighbours, so that only adjacent subzones are grouped together.
Since the SKATER function only supports sp objects, we have to convert the sf object into an sp object.
sp_cluster_vars.z <- as_Spatial(sf_ncluster_vars.z_df[,1:9])
cluster_vars.nb <- poly2nb(sp_cluster_vars.z)
summary(cluster_vars.nb)
## Neighbour list object:
## Number of regions: 234
## Number of nonzero links: 1198
## Percentage nonzero weights: 2.187888
## Average number of links: 5.119658
## Link number distribution:
##
## 1 2 3 4 5 6 7 8 9
## 4 10 25 36 69 41 33 11 5
## 4 least connected regions:
## 13 204 222 233 with 1 link
## 5 most connected regions:
## 59 66 71 121 131 with 9 links
The coordinates() function returns the centroid of each subzone in the sp object; these centroids are used as the nodes in the plot.
plot(sp_cluster_vars.z, border=grey(.5))
plot(cluster_vars.nb, coordinates(sp_cluster_vars.z), col="blue", add=TRUE)
The nbcosts() function calculates the cost of each edge, which is the pairwise dissimilarity between a subzone and each of its neighbours.
Then, we convert it to a weights object with the nb2listw() function.
lcosts <- nbcosts(cluster_vars.nb, cluster_vars.z)
cluster_vars.w <- nb2listw(cluster_vars.nb, lcosts, style="B")
summary(cluster_vars.w)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 234
## Number of nonzero links: 1198
## Percentage nonzero weights: 2.187888
## Average number of links: 5.119658
## Link number distribution:
##
## 1 2 3 4 5 6 7 8 9
## 4 10 25 36 69 41 33 11 5
## 4 least connected regions:
## 13 204 222 233 with 1 link
## 5 most connected regions:
## 59 66 71 121 131 with 9 links
##
## Weights style: B
## Weights constants summary:
## n nn S0 S1 S2
## B 234 54756 3885.8 34394.62 349721.3
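To make the edge cost concrete, here is a minimal base R sketch of the idea, assuming the default Euclidean method. The attribute matrix z and the neighbour list nb are made-up toy values for four hypothetical subzones; they only mimic the shape of cluster_vars.z and cluster_vars.nb.

```r
# Toy standardised attributes: four hypothetical subzones (rows), two indicators (columns)
z <- rbind(c(0, 0), c(3, 4), c(0, 1), c(6, 8))

# Hypothetical neighbour list: element i holds the row indices of subzone i's neighbours
nb <- list(c(2L, 3L), c(1L, 4L), 1L, 2L)

# Cost of an edge = Euclidean distance between the attribute vectors of its two subzones
edge_costs <- lapply(seq_along(nb), function(i) {
  sapply(nb[[i]], function(j) sqrt(sum((z[i, ] - z[j, ])^2)))
})

edge_costs
```

The cost depends only on how different two neighbouring subzones are in their indicators, not on how far apart they are on the map. Geography enters through the neighbour list, which decides which pairs get a cost at all.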
The mstree() function calculates the minimum spanning tree.
cluster_vars.mst <- mstree(cluster_vars.w)
Checking class and dimensions
class(cluster_vars.mst)
## [1] "mst" "matrix"
dim(cluster_vars.mst)
## [1] 233 3
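The tree has 233 rows because a spanning tree over 234 subzones needs exactly n - 1 edges; each row holds the two end nodes and the cost of one edge. As a rough illustration of what a minimum spanning tree is, the sketch below runs Prim's algorithm on a small made-up cost matrix. It is not the implementation used by mstree(); the helper names add_edge and prim_mst and all the costs are assumptions for illustration.

```r
# Symmetric cost matrix for five hypothetical nodes; Inf marks pairs that are not neighbours
cost <- matrix(Inf, 5, 5)
add_edge <- function(m, i, j, w) { m[i, j] <- w; m[j, i] <- w; m }
cost <- add_edge(cost, 1, 2, 2)
cost <- add_edge(cost, 1, 3, 6)
cost <- add_edge(cost, 2, 3, 3)
cost <- add_edge(cost, 2, 4, 8)
cost <- add_edge(cost, 3, 4, 5)
cost <- add_edge(cost, 4, 5, 1)

# Prim's algorithm: grow the tree from node 1, always adding the cheapest edge
# that links a node already in the tree to a node outside it
prim_mst <- function(cost) {
  n <- nrow(cost)
  in_tree <- c(TRUE, rep(FALSE, n - 1))
  edges <- matrix(NA_real_, n - 1, 3,
                  dimnames = list(NULL, c("from", "to", "cost")))
  for (k in seq_len(n - 1)) {
    sub <- cost[in_tree, !in_tree, drop = FALSE]
    idx <- which(sub == min(sub), arr.ind = TRUE)[1, ]
    from <- which(in_tree)[idx[1]]
    to <- which(!in_tree)[idx[2]]
    edges[k, ] <- c(from, to, cost[from, to])
    in_tree[to] <- TRUE
  }
  edges
}

mst <- prim_mst(cost)
mst
dim(mst)  # n - 1 rows, 3 columns
```

The costlier links (1-3 and 2-4) are left out because every node can already be reached through cheaper edges.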
# The third argument is the number of cuts, not the number of clusters: 5 cuts give 6 clusters
clust <- skater(cluster_vars.mst[,1:2], cluster_vars.z, 5)
ccs <- clust$groups
table(ccs)
## ccs
## 1 2 3 4 5 6
## 217 5 4 2 5 1
This gives us the number of observations in each cluster. Cluster 1 contains the large majority of the subzones (217 of 234).
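Six clusters appear although the value passed to skater() is 5, because removing k edges from a tree always leaves k + 1 connected pieces. The base R sketch below illustrates only that counting argument on a made-up path of seven nodes. The helper name components and the choice of which edges to cut are assumptions; SKATER itself chooses its cuts to keep the resulting groups as homogeneous as possible.

```r
# Hypothetical helper: label the connected components of a graph given its edge list
components <- function(n, edges) {
  label <- seq_len(n)
  repeat {
    changed <- FALSE
    for (e in seq_len(nrow(edges))) {
      ends <- edges[e, ]
      m <- min(label[ends])
      if (any(label[ends] != m)) {
        label[ends] <- m        # both end nodes take the smaller label
        changed <- TRUE
      }
    }
    if (!changed) break         # stop once a full pass changes nothing
  }
  match(label, unique(label))   # renumber the groups as 1, 2, 3, ...
}

# A toy tree: a path through seven nodes, so 6 edges
tree <- rbind(c(1, 2), c(2, 3), c(3, 4), c(4, 5), c(5, 6), c(6, 7))

# Cut two edges (rows 2 and 5) and count the groups that remain
cuts <- c(2, 5)
groups <- components(7, tree[-cuts, , drop = FALSE])
groups
length(unique(groups))  # number of cuts + 1
```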
plot(sp_cluster_vars.z, border=gray(.5))
plot(clust, coordinates(sp_cluster_vars.z), cex.lab=.7,
groups.colors=c("red", "green", "blue", "brown", "pink", "orange"), cex.circles=0.005, add=TRUE)
groups_mat <- as.matrix(clust$groups)
sf_spatialcluster <- cbind(sf_ncluster_vars.z_df, as.factor(groups_mat)) %>%
rename(`SP_CLUSTER`=`as.factor.groups_mat.`)
qtm(sf_spatialcluster, "SP_CLUSTER")
sf_spatialcluster %>%
group_by(`SP_CLUSTER`) %>%
summarize_all(mean)
## Simple feature collection with 6 features and 11 fields
## geometry type: GEOMETRY
## dimension: XY
## bbox: xmin: 6351.521 ymin: 24495.28 xmax: 45539.4 ymax: 49597.42
## CRS: EPSG:3414
## # A tibble: 6 x 12
## SP_CLUSTER AGED HDB12RM HDB5RMEC CONDO LANDED POPDENSE OLDS_RATIO
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 -0.129 -0.0683 -0.0955 -0.0858 -0.0674 -0.0291 -0.0153
## 2 2 3.02 0.762 1.81 2.29 2.37 0.290 0.669
## 3 3 0.914 0.389 3.10 0.986 -0.384 1.79 -0.298
## 4 4 -0.833 -0.648 -0.605 -0.741 -0.435 -1.16 -1.32
## 5 5 2.33 2.28 0.221 1.11 1.12 0.245 1.03
## 6 6 -0.833 -0.648 -0.605 -0.821 -0.435 -1.21 -1.32
## # ... with 4 more variables: FINPER100 <dbl>, GOVTPER100 <dbl>, cluster <dbl>,
## # geometry <GEOMETRY [m]>
p1 <- ggplot(sf_spatialcluster, aes(x=SP_CLUSTER, y=AGED)) +
geom_boxplot() +
coord_flip()
p2 <- ggplot(sf_spatialcluster, aes(x=SP_CLUSTER, y=HDB12RM)) +
geom_boxplot() +
coord_flip()
p3 <- ggplot(sf_spatialcluster, aes(x=SP_CLUSTER, y=HDB5RMEC)) +
geom_boxplot() +
coord_flip()
p4 <- ggplot(sf_spatialcluster, aes(x=SP_CLUSTER, y=CONDO)) +
geom_boxplot() +
coord_flip()
p5 <- ggplot(sf_spatialcluster, aes(x=SP_CLUSTER, y=LANDED)) +
geom_boxplot() +
coord_flip()
p6 <- ggplot(sf_spatialcluster, aes(x=SP_CLUSTER, y=POPDENSE)) +
geom_boxplot() +
coord_flip()
p7 <- ggplot(sf_spatialcluster, aes(x=SP_CLUSTER, y=OLDS_RATIO)) +
geom_boxplot() +
coord_flip()
p8 <- ggplot(sf_spatialcluster, aes(x=SP_CLUSTER, y=FINPER100)) +
geom_boxplot() +
coord_flip()
p9 <- ggplot(sf_spatialcluster, aes(x=SP_CLUSTER, y=GOVTPER100)) +
geom_boxplot() +
coord_flip()
ggarrange(p1,p2,p3,p4,p5,p6,p7,p8,p9,
ncol = 3, nrow = 3)
Cluster 2
Cluster 2 (yellow, e.g. Frankel, Simei) seems to be a mix of rich and poor neighbourhoods, with a moderate number of HDB 1-2 Room Flats but a high number of landed and condominium properties. Since population density is somewhat high and the number of financial services per 100 people is low, more can be done in this area to serve the people. Since FINPER100 is highly correlated with BIZPER100, there is potential to turn this cluster into a live-work-play area in line with the Government’s aim of decentralizing office spaces.
In addition, since the ageing population and old-age support ratio are high, elderly-friendly amenities need to be in place to serve this group.
Cluster 3
Cluster 3 (purple, e.g. Sengkang Town Centre, Punggol Field) is high in population density and in the number of HDB 5 Room Flats and Executive Condominiums. Since the ageing population and old-age support ratio are mid-level, this suggests that the subzones in this cluster are new estates which seek to attract new-generation families. Since the number of financial services per 100 people is low, the number of business entities per 100 people is low as well. In line with the Government’s strategy to develop the northeast region into bustling new towns, the Government could build innovative business clusters which are attractive to the younger economically-active population.
Cluster 4
Cluster 4 (pink, e.g. Boulevard, Somerset) is low on all indicators except for the number of financial and government services per 100 people. As the premium shopping belt of Singapore, it is high in tourist traffic. Expats could be living in this cluster. Since shopping amenities and business entities are high in number in this region, it could rank high in liveability for expats.
Cluster 6
Cluster 6 (orange, e.g. Lim Chu Kang) is low on all indicators except for having the highest number of government services per 100 people. This could be due to the low population.
Cluster 5
Cluster 5 (blue, e.g. Geylang East, Aljunied) has a high elderly population, as well as a high old-age support ratio. Since population density is high and the number of financial services is low, more can be done to bring these financial services closer to the elderly. As some elderly residents may be less educated and less financially literate, an offline approach is needed to serve this population.
Cluster 1
Cluster 1 (green) scores in the mid range for all indicators. Perhaps this cluster could be isolated for further analysis.
Since the optimal number of clusters obtained by hierarchical clustering is 6, we will plot the K-Means on a scatter plot from k=4 to k=7.
k4 <- kmeans(cluster_vars.z, centers = 4, nstart = 25)
k5 <- kmeans(cluster_vars.z, centers = 5, nstart = 25)
k6 <- kmeans(cluster_vars.z, centers = 6, nstart = 25)
k7 <- kmeans(cluster_vars.z, centers = 7, nstart = 25)
# plots to compare
p1 <- fviz_cluster(k4, geom = "point", data = cluster_vars.z) + ggtitle("k = 4")
p2 <- fviz_cluster(k5, geom = "point", data = cluster_vars.z) + ggtitle("k = 5")
p3 <- fviz_cluster(k6, geom = "point", data = cluster_vars.z) + ggtitle("k = 6")
p4 <- fviz_cluster(k7, geom = "point", data = cluster_vars.z) + ggtitle("k = 7")
grid.arrange(p1, p2, p3, p4, nrow = 2)
We can see that as k increases, the overlap between clusters increases. Again, we can use a more mathematical approach to get the optimal number of clusters.
We will use the Average Silhouette Method since it is more clear-cut than the Elbow Method.
fviz_nbclust(cluster_vars.z, kmeans, method = "silhouette")
The optimal number of clusters is 3.
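For readers who want to see what the average silhouette measures, the sketch below computes it by hand in base R on synthetic data. For each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and the silhouette is (b - a) / max(a, b). The function name avg_silhouette and the three made-up blobs are assumptions for illustration, not the subzone data.

```r
# Average silhouette width for a given clustering (hypothetical helper)
avg_silhouette <- function(x, labels) {
  d <- as.matrix(dist(x))
  n <- nrow(d)
  s <- numeric(n)
  for (i in seq_len(n)) {
    own <- labels == labels[i]
    if (sum(own) == 1) next                          # singleton cluster: silhouette is 0
    a <- sum(d[i, own]) / (sum(own) - 1)             # mean distance within own cluster
    b <- min(tapply(d[i, !own], labels[!own], mean)) # nearest other cluster
    s[i] <- (b - a) / max(a, b)
  }
  mean(s)
}

# Synthetic data: three well-separated blobs of 30 points each (NOT the real data)
set.seed(1)
centers <- rbind(c(0, 0), c(10, 10), c(20, 0))
x <- do.call(rbind, lapply(1:3, function(j) {
  cbind(rnorm(30, centers[j, 1]), rnorm(30, centers[j, 2]))
}))

# Average silhouette width for k = 2 to 6; the best k has the highest value
ks <- 2:6
widths <- sapply(ks, function(k) {
  avg_silhouette(x, kmeans(x, centers = k, nstart = 25)$cluster)
})
names(widths) <- ks
widths
best_k <- ks[which.max(widths)]
```

A value close to 1 means points sit much closer to their own cluster than to any other, which is why the k with the highest average is preferred.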
k3 <- kmeans(cluster_vars.z, centers = 3, nstart = 25)
fviz_cluster(k3, geom = "point", data = cluster_vars.z) + ggtitle("k = 3")
Now, we see that the optimal number of clusters, k=3, gives a clear separation between the observations.
Again, summarizing the results gives us a better measure of how the indicators fare amongst the clusters.
kmeans_cluster <- cluster_vars.z_df %>%
mutate(cluster = k3$cluster) %>%
group_by(`cluster`) %>%
summarize_all(mean)
kmeans_cluster
## # A tibble: 3 x 10
## cluster AGED HDB12RM HDB5RMEC CONDO LANDED POPDENSE OLDS_RATIO FINPER100
## <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 -0.568 -0.548 -0.475 -0.0827 0.0932 -0.669 -0.383 -0.0579
## 2 2 -0.833 -0.648 -0.605 -0.781 -0.432 -1.17 -1.32 3.96
## 3 3 0.794 0.755 0.657 0.162 -0.0900 0.947 0.588 -0.205
## # ... with 1 more variable: GOVTPER100 <dbl>
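# Attach the per-subzone k-means labels to the sf object. Binding the 3-row summary table
# to the subzone geometries instead would recycle its rows across the subzones.
# This assumes sf_ncluster_vars.z_df has the same row order as cluster_vars.z, as in the SKATER step.
sf_kmeans_cluster <- cbind(sf_ncluster_vars.z_df, KM_CLUSTER = as.factor(k3$cluster))
qtm(sf_kmeans_cluster, "KM_CLUSTER")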
All three clusters are geographically scattered.
Cluster 2
Cluster 2 is high on the number of financial and government services per 100 residents. Since population density is not high, this may mean an oversupply of such services and relocation could be explored.
Cluster 3
Cluster 3 is high in ageing population and population density. However, the number of financial and government services per 100 residents is not high. This may mean an undersupply of such services and relocation could be explored. Thus, this area seems like a residential area and not a working area. Should the Government explore turning this cluster into a live-work-play area, the Government would need to build more amenities to serve this group. In particular, elderly-friendly services need to be present.
Cluster 1
Cluster 1 scores low on all indicators. This may mean that it is not an immediate cause for concern.
The clustering techniques have allowed us to understand the social spaces in Singapore. In particular, we wanted to understand how Singapore could become a more liveable city by analysing several demographic and level of living indicators.
Since the clustering techniques have given varied conclusions, the technique to use depends on the Government's strategy. For instance, the spatially constrained clustering technique is useful at a macro planning level. However, the other clustering methods can allow the Government to understand the Subzone differences within a Planning Area and create a Planning Area strategy that incorporates these subzones.
The indicators used are limited in scope and do not give us a full picture of whether Singapore is a Liveable City. As mentioned earlier, liveability includes the culture of a place, and even how economically vibrant it is. However, these indicators are not available at the Planning Subzone level.