1 Getting Started

In this take-home exercise, we are tasked with segmenting Singapore at the planning subzone level into homogeneous socioeconomic areas by combining geodemographic data from the Singapore Department of Statistics with urban functions extracted from the geospatial data provided.

1.1 The Data

We are provided the following geospatial datasets:

  • Government including embassy
  • Business
  • Shopping
  • Financial
  • Upmarket residential area

They are all in ESRI shapefile format.

The Business layer also contains Industry points, so we are to extract both Business and Industry from it.

2 Installing & Loading the necessary packages

packages = c('rgdal', 'spdep', 'ClustGeo', 'tmap', 'sf', 'ggpubr', 'cluster', 'heatmaply', 'corrplot', 'psych', 'tidyverse', 'tmaptools', 'factoextra', 'NbClust')
for (p in packages){
  if(!require(p, character.only = TRUE)){ # install only when the package is missing
    install.packages(p)
  }
  library(p, character.only = TRUE)
}

3 Importing Data to R Environment

3.1 Importing Geospatial Subzone Data

mpsz = st_read(dsn = "data/geospatial", layer="MP14_SUBZONE_WEB_PL")
## Reading layer `MP14_SUBZONE_WEB_PL' from data source `C:\Users\jiiireh\Desktop\Take-home_ex03\Take-home_ex03\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 323 features and 15 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS:  SVY21

3.1.1 Data Validity & NA Checking

unique(st_is_valid(mpsz,reason = TRUE))
##  [1] "Valid Geometry"                                           
##  [2] "Ring Self-intersection[27932.3925999999 21982.7971999999]"
##  [3] "Ring Self-intersection[26885.4439000003 26668.3121000007]"
##  [4] "Ring Self-intersection[26920.1689999998 26978.5440999996]"
##  [5] "Ring Self-intersection[15432.4749999996 31319.716]"       
##  [6] "Ring Self-intersection[12861.3828999996 32207.4923]"      
##  [7] "Ring Self-intersection[19681.2353999997 31294.4521999992]"
##  [8] "Ring Self-intersection[41375.108 40432.8588999994]"       
##  [9] "Ring Self-intersection[38542.2260999996 44605.4089000002]"
## [10] "Ring Self-intersection[21702.5623000003 48125.1154999994]"
mpsz[rowSums(is.na(mpsz))!=0,]
## Simple feature collection with 0 features and 15 fields
## bbox:           xmin: NA ymin: NA xmax: NA ymax: NA
## projected CRS:  SVY21
##  [1] OBJECTID   SUBZONE_NO SUBZONE_N  SUBZONE_C  CA_IND     PLN_AREA_N
##  [7] PLN_AREA_C REGION_N   REGION_C   INC_CRC    FMEL_UPD_D X_ADDR    
## [13] Y_ADDR     SHAPE_Leng SHAPE_Area geometry  
## <0 rows> (or 0-length row.names)

As the output shows, mpsz contains several invalid polygons, all of them ring self-intersections. We will repair them with the function st_make_valid(). We can also see that no row contains NA values.

3.1.2 Making mpsz valid

mpsz <- st_make_valid(mpsz)

st_make_valid() attempts to repair invalid geometries with only minimal alterations to the input. No vertices are dropped or moved; the structure of the object is simply rearranged. This works well for data that is clean but invalid, and poorly for data that is both messy and invalid.
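To make the repair step concrete, the sketch below builds a small self-intersecting "bow-tie" polygon, checks it, and repairs it. The coordinates are made up for illustration and are not taken from mpsz; only the sf functions already used in this section are called.

```r
library(sf)

# A "bow-tie" ring that crosses itself; made-up coordinates for illustration
bowtie <- st_sfc(st_polygon(list(rbind(
  c(0, 0), c(1, 1), c(1, 0), c(0, 1), c(0, 0)
))))

st_is_valid(bowtie, reason = TRUE)  # reports the self-intersection

fixed <- st_make_valid(bowtie)      # repair the geometry
st_is_valid(fixed)                  # the repaired geometry passes the check
st_geometry_type(fixed)             # inspect how the ring was restructured
```

The same check-repair-recheck sequence is what sections 3.1.1 to 3.1.3 apply to mpsz.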

3.1.3 Ensuring no invalid data remains

unique(st_is_valid(mpsz,reason = TRUE))
## [1] "Valid Geometry"

3.1.4 Checking CRS of mpsz

st_crs(mpsz)
## Coordinate Reference System:
##   User input: SVY21 
##   wkt:
## PROJCRS["SVY21",
##     BASEGEOGCRS["SVY21[WGS84]",
##         DATUM["World Geodetic System 1984",
##             ELLIPSOID["WGS 84",6378137,298.257223563,
##                 LENGTHUNIT["metre",1]],
##             ID["EPSG",6326]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["Degree",0.0174532925199433]]],
##     CONVERSION["unnamed",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",1.36666666666667,
##             ANGLEUNIT["Degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",103.833333333333,
##             ANGLEUNIT["Degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",1,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",28001.642,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",38744.572,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]]]

We see that mpsz is already in SVY21, whose projection parameters correspond to EPSG:3414, so no transformation is required.
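Although mpsz needs no transformation, the point layers imported in section 5 are reported as WGS 84, so it helps to keep the difference between relabelling and reprojecting in mind. The sketch below is an illustration only: it uses a single made-up point placed at the false easting and northing printed in the WKT above, not any of the exercise data.

```r
library(sf)

# One point at the SVY21 false origin (false easting / northing from the WKT)
origin_svy21 <- st_sfc(st_point(c(28001.642, 38744.572)), crs = 3414)

# st_transform() recomputes the coordinates in the target CRS
origin_wgs84 <- st_transform(origin_svy21, 4326)
st_coordinates(origin_wgs84)  # longitude / latitude of the natural origin

# st_set_crs() only attaches a CRS label; the coordinate values are unchanged
relabelled <- st_set_crs(st_sfc(st_point(c(28001.642, 38744.572))), 3414)
st_coordinates(relabelled)
```

In short, st_set_crs() is appropriate when the coordinates are already in the stated CRS but the label is missing, while st_transform() is needed when layers in different CRSs must be combined.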

3.2 Importing Aspatial Residential 2019 Data

rDwelling <- read_csv ("data/aspatial/respopagesextod2011to2019.csv") %>%
  filter(Time == 2019)
## Parsed with column specification:
## cols(
##   PA = col_character(),
##   SZ = col_character(),
##   AG = col_character(),
##   Sex = col_character(),
##   TOD = col_character(),
##   Pop = col_double(),
##   Time = col_double()
## )

3.2.1 Aspatial Data Wrangling

unique(rDwelling$TOD)
## [1] "HDB 1- and 2-Room Flats"                
## [2] "HDB 3-Room Flats"                       
## [3] "HDB 4-Room Flats"                       
## [4] "HDB 5-Room and Executive Flats"         
## [5] "HUDC Flats (excluding those privatised)"
## [6] "Landed Properties"                      
## [7] "Condominiums and Other Apartments"      
## [8] "Others"

Two types of dwelling (TOD) will not be used in our study: “HUDC Flats (excluding those privatised)” and “Others”. Hence, they need to be removed from the data.

3.2.2 Removing Unused TODs

rDwelling <-rDwelling[(rDwelling$TOD!="HUDC Flats (excluding those privatised)" & rDwelling$TOD!="Others"),]

3.2.3 Ensuring the TODs are removed

unique(rDwelling$TOD)
## [1] "HDB 1- and 2-Room Flats"           "HDB 3-Room Flats"                 
## [3] "HDB 4-Room Flats"                  "HDB 5-Room and Executive Flats"   
## [5] "Landed Properties"                 "Condominiums and Other Apartments"
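The removal in 3.2.2 can also be expressed with %in%, which keeps the list of excluded categories in one place. This is a minimal sketch on a made-up tibble (toy and its Pop values are hypothetical), not a replacement for the code above.

```r
library(dplyr)

# Made-up rows standing in for rDwelling
toy <- tibble(
  TOD = c("HDB 3-Room Flats", "Others", "Landed Properties",
          "HUDC Flats (excluding those privatised)"),
  Pop = c(120, 10, 45, 5)
)

# Dwelling types excluded from the study
unused_tod <- c("HUDC Flats (excluding those privatised)", "Others")

toy_kept <- toy %>%
  filter(!TOD %in% unused_tod)

unique(toy_kept$TOD)
```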

4 Extracting Indicators

4.1 Economically Active

E_Active <- rDwelling %>%
  filter(AG == "25_to_29"| AG == "30_to_34"| AG == "35_to_39"| AG == "40_to_44" | AG == "45_to_49" |AG == "50_to_54"| AG == "55_to_59" | AG == "60_to_64")  %>%
  group_by(SZ = SZ) %>%
  summarize(Pop = sum(Pop))%>%
  mutate_at(.vars = vars(SZ), .funs = funs(toupper))
## Warning: funs() is soft deprecated as of dplyr 0.8.0
## Please use a list of either functions or lambdas: 
## 
##   # Simple named list: 
##   list(mean = mean, median = median)
## 
##   # Auto named with `tibble::lst()`: 
##   tibble::lst(mean, median)
## 
##   # Using lambdas
##   list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
## This warning is displayed once per session.
colnames(E_Active)[1] <- "SUBZONE_N" # Rename SZ to SUBZONE_N for ease of joining
colnames(E_Active)[2] <- "E_Active" # Rename Pop to unique value for joining
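The filter, group, sum and rename pattern above is repeated for every indicator in this section, and funs() triggers the deprecation warning shown. As a hedged alternative, the sketch below wraps the pattern in a helper that avoids funs(). The helper name count_by_subzone and the toy data are hypothetical, and the code assumes dplyr 1.0 or later.

```r
library(dplyr)

# Hypothetical helper: sum Pop per subzone over the rows selected by `keep`
count_by_subzone <- function(data, keep, name) {
  data %>%
    filter({{ keep }}) %>%                        # condition passed unquoted
    group_by(SZ) %>%
    summarise(!!name := sum(Pop), .groups = "drop") %>%
    mutate(SZ = toupper(SZ)) %>%                  # match mpsz's upper-case names
    rename(SUBZONE_N = SZ)                        # ready for joining
}

# Made-up rows standing in for rDwelling
toy <- tibble(
  SZ  = c("Bedok North", "Bedok North", "Yishun East"),
  AG  = c("25_to_29", "65_to_69", "30_to_34"),
  Pop = c(100, 40, 70)
)

working_age <- c("25_to_29", "30_to_34")
toy_active <- count_by_subzone(toy, AG %in% working_age, "E_Active")
toy_active
```

With such a helper, each indicator would reduce to one call with a different condition and column name, for example a TOD condition for the dwelling indicators.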

4.2 Young

Young <- rDwelling %>%
  filter(AG == "0_to_4"| AG == "5_to_9"|AG == "10_to_14"|AG == "15_to_19"| AG == "20_to_24")  %>%
  group_by(SZ = SZ) %>%
  summarize(Pop = sum(Pop))%>%
  mutate_at(.vars = vars(SZ), .funs = funs(toupper))

colnames(Young)[1] <- "SUBZONE_N" # Rename SZ to SUBZONE_N for ease of joining
colnames(Young)[2] <- "Young" # Rename Pop to unique value for joining

4.3 Aged

Aged <- rDwelling %>%
  filter(AG == "65_to_69"| AG == "70_to_74"| AG == "75_to_79"| AG == "80_to_84" | AG == "85_to_89" | AG == "90_and_over")  %>%
  group_by(SZ = SZ) %>%
  summarize(Pop = sum(Pop))%>%
  mutate_at(.vars = vars(SZ), .funs = funs(toupper))

colnames(Aged)[1] <- "SUBZONE_N" # Rename SZ to SUBZONE_N for ease of joining
colnames(Aged)[2] <- "Aged" # Rename Pop to unique value for joining

4.4 Population Density

Pop_SZ <- rDwelling %>%
  group_by(SZ = SZ) %>%
  summarize(Pop = sum(Pop)) %>%
  mutate_at(.vars = vars(SZ), .funs = funs(toupper))

colnames(Pop_SZ)[1] <- "SUBZONE_N"

mpsz_Pop <- left_join(mpsz, Pop_SZ)
## Joining, by = "SUBZONE_N"
## Warning: Column `SUBZONE_N` joining factor and character vector, coercing into
## character vector
mpsz_Pop <- st_set_geometry(mpsz_Pop, NULL) # Dropping the geometry Table

SG_Dens <- sum(mpsz_Pop$Pop) / sum(mpsz_Pop$SHAPE_Area) # Population Density of Singapore

SZ_Pop <- mpsz_Pop %>%
  select("SUBZONE_N","Pop")

Pop_Dens <- mpsz_Pop %>% 
  mutate(dens = Pop / SHAPE_Area) # Population density by subzone, in persons per m^2

Pop_Dens <- Pop_Dens[c(3,16,17)] # Keep only SUBZONE_N, Pop and dens
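Densities in persons per square metre are small numbers that are hard to read. As a worked conversion, the sketch below rescales a few dens values to persons per square kilometre; the three figures are copied from the summary output in section 4.11, and the variable names are hypothetical.

```r
# dens values (persons per m^2) copied from the summary in section 4.11
dens_m2 <- c(median = 0.005857, mean = 0.010662, max = 0.046058)

# 1 km^2 = 1,000,000 m^2, so multiply by 1e6
dens_km2 <- dens_m2 * 1e6
dens_km2
```

The densest subzone therefore holds roughly 46,000 residents per square kilometre.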

4.5 HDB1-2RM Dwellers

HDB_1_2 <- rDwelling %>%
  filter(TOD == "HDB 1- and 2-Room Flats") %>%
  group_by(SZ = SZ) %>%
  summarize(Pop = sum(Pop))%>%
  mutate_at(.vars = vars(SZ), .funs = funs(toupper))

colnames(HDB_1_2)[1] <- "SUBZONE_N" # Rename SZ to SUBZONE_N for ease of joining
colnames(HDB_1_2)[2] <- "HDB_1_2" # Rename Pop to unique value for joining

4.6 HDB3-4RM Dwellers

HDB_3_4 <- rDwelling %>%
  filter(TOD == "HDB 3-Room Flats" | TOD == "HDB 4-Room Flats") %>%
  group_by(SZ = SZ) %>%
  summarize(Pop = sum(Pop))%>%
  mutate_at(.vars = vars(SZ), .funs = funs(toupper))

colnames(HDB_3_4)[1] <- "SUBZONE_N" # Rename SZ to SUBZONE_N for ease of joining
colnames(HDB_3_4)[2] <- "HDB_3_4" # Rename Pop to unique value for joining

4.7 HDB5-EC Dwellers

HDB_5_EC <- rDwelling %>%
  filter(TOD == "HDB 5-Room and Executive Flats") %>%
  group_by(SZ = SZ) %>%
  summarize(Pop = sum(Pop))%>%
  mutate_at(.vars = vars(SZ), .funs = funs(toupper))

colnames(HDB_5_EC)[1] <- "SUBZONE_N" # Rename SZ to SUBZONE_N for ease of joining
colnames(HDB_5_EC)[2] <- "HDB_5_EC" # Rename Pop to unique value for joining

4.8 Condominium & Apartment Dwellers

Condo_Apt <- rDwelling %>%
  filter(TOD == "Condominiums and Other Apartments") %>%
  group_by(SZ = SZ) %>%
  summarize(Pop = sum(Pop))%>%
  mutate_at(.vars = vars(SZ), .funs = funs(toupper))

colnames(Condo_Apt)[1] <- "SUBZONE_N" # Rename SZ to SUBZONE_N for ease of joining
colnames(Condo_Apt)[2] <- "Condo_Apt" # Rename Pop to unique value for joining

4.9 Landed Property Dwellers

LandedProperty <- rDwelling %>%
  filter(TOD == "Landed Properties") %>%
  group_by(SZ = SZ) %>%
  summarize(Pop = sum(Pop))%>%
  mutate_at(.vars = vars(SZ), .funs = funs(toupper))

colnames(LandedProperty)[1] <- "SUBZONE_N" # Rename SZ to SUBZONE_N for ease of joining
colnames(LandedProperty)[2] <- "LandedProperty" # Rename Pop to unique value for joining

4.10 Joining all the counts by Subzone

Indicators <- left_join(E_Active,Young)
## Joining, by = "SUBZONE_N"
Indicators <- left_join(Indicators,Aged)
## Joining, by = "SUBZONE_N"
Indicators <- left_join(Indicators,Pop_Dens)
## Joining, by = "SUBZONE_N"
Indicators <- left_join(Indicators,HDB_1_2)
## Joining, by = "SUBZONE_N"
Indicators <- left_join(Indicators,HDB_3_4)
## Joining, by = "SUBZONE_N"
Indicators <- left_join(Indicators,HDB_5_EC)
## Joining, by = "SUBZONE_N"
Indicators <- left_join(Indicators,Condo_Apt)
## Joining, by = "SUBZONE_N"
Indicators <- left_join(Indicators,LandedProperty)
## Joining, by = "SUBZONE_N"
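The chain of left joins above can also be folded over a list of tables with an explicit join key, which silences the "Joining, by" messages. This is a sketch on made-up tibbles (a, b and c_tbl are hypothetical), shown only to illustrate the pattern.

```r
library(dplyr)

# Made-up indicator tables sharing the SUBZONE_N key
a     <- tibble(SUBZONE_N = c("X", "Y"), E_Active = c(10, 20))
b     <- tibble(SUBZONE_N = c("X", "Y"), Young = c(3, 4))
c_tbl <- tibble(SUBZONE_N = "X", Aged = 1)

# Fold left_join() over the list, always joining on SUBZONE_N
joined <- Reduce(
  function(x, y) left_join(x, y, by = "SUBZONE_N"),
  list(a, b, c_tbl)
)
joined
```

As with the original chain, rows of the first table that have no match in a later table are kept and receive NA in the new column.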

4.11 Viewing the summary statistics of Indicators

summary(Indicators)
##   SUBZONE_N            E_Active         Young            Aged      
##  Length:323         Min.   :    0   Min.   :    0   Min.   :    0  
##  Class :character   1st Qu.:    0   1st Qu.:    0   1st Qu.:    0  
##  Mode  :character   Median : 2790   Median : 1170   Median :  640  
##                     Mean   : 7346   Mean   : 3286   Mean   : 1761  
##                     3rd Qu.:10285   3rd Qu.: 4365   3rd Qu.: 2940  
##                     Max.   :79640   Max.   :34240   Max.   :18600  
##       Pop              dens             HDB_1_2        HDB_3_4     
##  Min.   :     0   Min.   :0.000000   Min.   :   0   Min.   :    0  
##  1st Qu.:     0   1st Qu.:0.000000   1st Qu.:   0   1st Qu.:    0  
##  Median :  4880   Median :0.005857   Median :   0   Median :    0  
##  Mean   : 12393   Mean   :0.010662   Mean   : 542   Mean   : 5953  
##  3rd Qu.: 17035   3rd Qu.:0.019864   3rd Qu.: 605   3rd Qu.: 9705  
##  Max.   :132480   Max.   :0.046058   Max.   :4700   Max.   :75000  
##     HDB_5_EC       Condo_Apt     LandedProperty   
##  Min.   :    0   Min.   :    0   Min.   :    0.0  
##  1st Qu.:    0   1st Qu.:    0   1st Qu.:    0.0  
##  Median :    0   Median :  230   Median :    0.0  
##  Mean   : 3297   Mean   : 1827   Mean   :  773.9  
##  3rd Qu.: 3660   3rd Qu.: 2835   3rd Qu.:  400.0  
##  Max.   :47960   Max.   :16770   Max.   :18820.0

Looking at the summary, we see subzones with a population of 0. Dividing by a population of 0 does not give a usable rate (in R, 0/0 evaluates to NaN), so we will drop the subzones whose population is 0.

4.12 Dropping Subzones with 0 Population

Indicators<-Indicators[!(Indicators$Pop==0),]
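The effect of a zero population on the later scaling step can be checked directly in base R. The data frame below is made up for illustration.

```r
# Made-up subzones: A has no residents, B has 500
toy <- data.frame(
  SUBZONE_N = c("A", "B"),
  Pop       = c(0, 500),
  E_Active  = c(0, 300)
)

# Rate per 1,000 residents: subzone A gives NaN (0/0), no error is raised
toy$E_Active / toy$Pop * 1000

# Same filter as above: keep only subzones with residents
toy_kept <- toy[!(toy$Pop == 0), ]
toy_kept$E_Active / toy_kept$Pop * 1000
```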

4.12.1 Deriving new Indicator variables using dplyr

Using the raw counts in Indicators as they are would bias the analysis towards subzones with larger populations. To overcome this, the code chunk below expresses each count as a rate per 1,000 residents of the subzone.

Indicators_derived <- Indicators %>%
  mutate(`E_Active` = `E_Active`/`Pop`*1000) %>%
  mutate(`Young` = `Young`/`Pop`*1000) %>%
  mutate(`Aged` = `Aged`/`Pop`*1000) %>%
  mutate(`HDB_1_2` = `HDB_1_2`/`Pop`*1000) %>%
  mutate(`HDB_3_4` = `HDB_3_4`/`Pop`*1000) %>%
  mutate(`HDB_5_EC` = `HDB_5_EC`/`Pop`*1000) %>%
  mutate(`Condo_Apt` = `Condo_Apt`/`Pop`*1000) %>%
  mutate(`LandedProperty` = `LandedProperty`/`Pop`*1000)
summary(Indicators_derived)
##   SUBZONE_N            E_Active          Young            Aged      
##  Length:228         Min.   : 512.4   Min.   :  0.0   Min.   :  0.0  
##  Class :character   1st Qu.: 573.0   1st Qu.:218.0   1st Qu.:106.3  
##  Mode  :character   Median : 592.8   Median :254.2   Median :151.0  
##                     Mean   : 601.8   Mean   :247.9   Mean   :150.3  
##                     3rd Qu.: 607.9   3rd Qu.:287.6   3rd Qu.:192.7  
##                     Max.   :1000.0   Max.   :360.0   Max.   :325.9  
##       Pop              dens              HDB_1_2          HDB_3_4     
##  Min.   :    10   Min.   :6.580e-06   Min.   :  0.00   Min.   :  0.0  
##  1st Qu.:  3330   1st Qu.:4.402e-03   1st Qu.:  0.00   1st Qu.:  0.0  
##  Median : 11640   Median :1.222e-02   Median :  0.00   Median :402.7  
##  Mean   : 17557   Mean   :1.510e-02   Mean   : 40.37   Mean   :355.0  
##  3rd Qu.: 26505   3rd Qu.:2.474e-02   3rd Qu.: 47.94   3rd Qu.:606.7  
##  Max.   :132480   Max.   :4.606e-02   Max.   :712.93   Max.   :948.1  
##     HDB_5_EC       Condo_Apt       LandedProperty  
##  Min.   :  0.0   Min.   :   0.00   Min.   :   0.0  
##  1st Qu.:  0.0   1st Qu.:  26.75   1st Qu.:   0.0  
##  Median :144.0   Median : 145.71   Median :   0.0  
##  Mean   :164.1   Mean   : 307.46   Mean   : 133.0  
##  3rd Qu.:259.4   3rd Qu.: 491.72   3rd Qu.: 145.2  
##  Max.   :836.4   Max.   :1000.00   Max.   :1000.0
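The eight mutate() calls above apply the same formula, count / Pop * 1000, to each indicator. As a hedged alternative, the sketch below applies it to several columns at once with across(); it assumes dplyr 1.0 or later and uses a made-up tibble with only three of the indicators.

```r
library(dplyr)

# Made-up counts for two subzones
toy <- tibble(
  SUBZONE_N = c("A", "B"),
  Pop       = c(2000, 500),
  E_Active  = c(1200, 250),
  Young     = c(500, 150),
  Aged      = c(300, 100)
)

rate_cols <- c("E_Active", "Young", "Aged")

# Convert each count to a rate per 1,000 residents
toy_derived <- toy %>%
  mutate(across(all_of(rate_cols), ~ .x / Pop * 1000))
toy_derived
```

For subzone A this gives 600, 250 and 150 per 1,000 residents, which sum to 1,000 because the three age bands cover the whole population.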

4.12.2 Visualising the Difference

We will join both Indicators & Indicators_derived to mpsz for plotting and replace all NA values with 0 for areas without specific indicators.

mpsz_Indicators <- left_join(mpsz, Indicators) #Join Indicators to mpsz for plotting purposes
## Joining, by = "SUBZONE_N"
## Warning: Column `SUBZONE_N` joining factor and character vector, coercing into
## character vector
mpsz_Indicators <- replace(mpsz_Indicators, is.na(mpsz_Indicators),0) # Replace NA with 0 for areas without specific indicators
mpsz_Indicators_derived <- left_join(mpsz,Indicators_derived) #Join Indicators_derived to mpsz for plotting purposes
## Joining, by = "SUBZONE_N"
## Warning: Column `SUBZONE_N` joining factor and character vector, coercing into
## character vector
mpsz_Indicators_derived <- replace(mpsz_Indicators_derived, is.na(mpsz_Indicators_derived),0) # Replace NA with 0 for areas without specific indicators
Indic_Pop_Map <- qtm(mpsz_Indicators, "Pop")
Indic_EA_Map <- qtm(mpsz_Indicators, "E_Active")
Indic_D_EA_Map <- qtm(mpsz_Indicators_derived, "E_Active")

tmap_arrange(Indic_Pop_Map,Indic_EA_Map,Indic_D_EA_Map, nrow = 3)

As we can observe, the E_Active of Indicators_derived no longer appears to follow the distribution pattern of the population.

4.12.3 Plotting the Population Indicators

densMap <- qtm(mpsz_Indicators, "dens")
Young_Indic_Map <- qtm(mpsz_Indicators_derived, "Young")
Aged_Indic_Map <- qtm(mpsz_Indicators_derived, "Aged")


tmap_arrange(densMap,Young_Indic_Map,Aged_Indic_Map, ncol = 2, nrow = 2)

We observe many areas with a high proportion of young residents in the north and north-eastern regions of Singapore.

4.12.4 Plotting the TOD HDB Indicators

HDB_1_2_Indic_Map <- qtm(mpsz_Indicators_derived, "HDB_1_2")
HDB_3_4_Indic_Map <- qtm(mpsz_Indicators_derived, "HDB_3_4")
HDB_5_EC_Indic_Map <- qtm(mpsz_Indicators_derived, "HDB_5_EC")


tmap_arrange(HDB_1_2_Indic_Map,HDB_3_4_Indic_Map,HDB_5_EC_Indic_Map, ncol = 2, nrow = 2)

We see that subzones with a high proportion of HDB_1_2 dwellers are mainly in the central region.

Condo_Apt_Indic_Map <- qtm(mpsz_Indicators_derived, "Condo_Apt")
LandedProperty_Indic_Map <- qtm(mpsz_Indicators_derived, "LandedProperty")

tmap_arrange(Condo_Apt_Indic_Map,LandedProperty_Indic_Map, nrow = 2)

We observe many subzones with a high proportion of Condo_Apt dwellers in the central region of Singapore.

5 Importing Urban Function Geospatial Data

5.1 Business

Business = st_read(dsn = "data/geospatial", layer="Business")
## Reading layer `Business' from data source `C:\Users\jiiireh\Desktop\Take-home_ex03\Take-home_ex03\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 6550 features and 5 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 103.6147 ymin: 1.24605 xmax: 104.0044 ymax: 1.4698
## geographic CRS: WGS 84

5.1.1 Data Validity & NA Checking

unique(st_is_valid(Business,reason = TRUE))
## [1] "Valid Geometry"
Business[rowSums(is.na(Business))!=0,]
## Simple feature collection with 279 features and 5 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 103.6198 ymin: 1.2601 xmax: 103.994 ymax: 1.45162
## geographic CRS: WGS 84
## First 10 features:
##         POI_ID SEQ_NUM FAC_TYPE                         POI_NAME ST_NAME
## 4   1101180212       1     5000   MALAYSIA GARMENT MANUFACTURERS    <NA>
## 13  1001052864       1     5000        MENLO WORLDWIDE LOGISTICS    <NA>
## 113 1141900621       1     5000                     PRIMZ BIZHUB    <NA>
## 228 1097875448       1     5000                     ACER TOWER A    <NA>
## 229 1097875449       1     5000                     ACER TOWER B    <NA>
## 265 1137930245       1     5000              NORTH SPRING BIZHUB    <NA>
## 267 1103814952       1     5000                       AT PUNGGOL    <NA>
## 269 1103814941       1     5000                        ECO-SCAPE    <NA>
## 270 1103814940       1     5000                            PRINZ    <NA>
## 271 1103814939       1     5000 SING SEE SOON FLORAL & LANDSCAPE    <NA>
##                     geometry
## 4   POINT (103.8855 1.33821)
## 13  POINT (103.7509 1.32875)
## 113  POINT (103.805 1.43509)
## 228 POINT (103.7466 1.32665)
## 229 POINT (103.7459 1.32695)
## 265 POINT (103.8431 1.43731)
## 267 POINT (103.9172 1.39312)
## 269  POINT (103.9159 1.3934)
## 270  POINT (103.9159 1.3934)
## 271 POINT (103.9158 1.39326)

We observe no invalid geometries; however, there are NA values, specifically in the ST_NAME column. Street names are not relevant to our study and the column will eventually be dropped, so we can treat this as a non-issue.
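Printing every row that contains an NA shows where the gaps are, but a per-column count makes the conclusion easier to verify. The sketch below uses a made-up data frame whose POI names are borrowed from the output above.

```r
# Made-up attribute table; names borrowed from the Business output above
poi <- data.frame(
  POI_NAME = c("ACER TOWER A", "PRIMZ BIZHUB", "JOHN CHEN"),
  ST_NAME  = c(NA, NA, "LITTLE RD"),
  FAC_TYPE = c(5000, 5000, 5000)
)

# Number of NA values in each column
na_per_column <- colSums(is.na(poi))

# Show only the columns that actually contain NAs
na_per_column[na_per_column > 0]
```

For an sf object such as Business, one option is to drop the geometry first, for example colSums(is.na(st_drop_geometry(Business))).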

5.1.2 Investigating Industry within Business

unique(Business$FAC_TYPE)
## [1] 5000 9991

We see that there are two unique FAC_TYPEs in Business. We will separate them out to find out which FAC_TYPE refers to Industry.

5.1.3 Splitting Business by FAC_TYPE

Business_5000 <- Business %>%
  filter(FAC_TYPE == 5000)

Business_9991 <- Business %>%
  filter(FAC_TYPE == 9991)

glimpse(Business_5000)
## Rows: 6,440
## Columns: 6
## $ POI_ID   <dbl> 1101180209, 1101180210, 1101180211, 1101180212, 1101180213...
## $ SEQ_NUM  <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
## $ FAC_TYPE <int> 5000, 5000, 5000, 5000, 5000, 5000, 5000, 5000, 5000, 5000...
## $ POI_NAME <fct> JOHN CHEN, TROPICAL INDUSTRIAL BUILDING, LIAN CHEONG INDUS...
## $ ST_NAME  <fct> LITTLE RD, LITTLE RD, LITTLE RD, NA, LITTLE RD, LOWER KENT...
## $ geometry <POINT [°]> POINT (103.8856 1.33841), POINT (103.8852 1.33832), ...
glimpse(Business_9991)
## Rows: 110
## Columns: 6
## $ POI_ID   <dbl> 1110491789, 1099992474, 1099992477, 1099992477, 1100464367...
## $ SEQ_NUM  <int> 1, 1, 1, 2, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2...
## $ FAC_TYPE <int> 9991, 9991, 9991, 9991, 9991, 9991, 9991, 9991, 9991, 9991...
## $ POI_NAME <fct> KAKI BUKIT INDUSTRIAL ESTATE, TERRACE FACTORIES TUAS SOUTH...
## $ ST_NAME  <fct> KAKI BUKIT AVE 1, TUAS SOUTH ST 5, TUAS SOUTH ST 5, TUAS S...
## $ geometry <POINT [°]> POINT (103.9042 1.33269), POINT (103.6242 1.29325), ...
B5000_Map <- tm_shape(mpsz)+
  tm_polygons(alpha = 0, border.col = "lightgrey", border.alpha = 1)+
tm_shape(Business_5000)+
  tm_dots(col = "red")+
  tm_layout(title = "Business FAC_TYPE 5000 Distribution",
    title.size = 1,
    title.position = c("center", "top"),
    inner.margins = c(0.06, 0.10, 0.10, 0.08))

B9991_Map <- tm_shape(mpsz)+
  tm_polygons(alpha = 0, border.col = "lightgrey", border.alpha = 1)+
tm_shape(Business_9991)+
  tm_dots(col = "red")+
  tm_layout(title = "Business FAC_TYPE 9991 Distribution",
    title.size = 1,
    title.position = c("center", "top"),
    inner.margins = c(0.06, 0.10, 0.10, 0.08))

tmap_arrange(B5000_Map,B9991_Map, ncol = 2)

Looking into the data, FAC_TYPE 5000 has POI_NAMEs such as “AIA” and “ABBOTT”, an insurance company and a medical device and health care company respectively. Meanwhile, FAC_TYPE 9991 primarily has POI_NAMEs that include “INDUSTRIAL ESTATE” and “WATER FABRICATION”. Therefore, we will regard FAC_TYPE 9991 as our extracted Industry data.
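The name-based reasoning above can be backed by a quick keyword check per FAC_TYPE. The sketch below is illustrative only: the data frame is made up, most names are borrowed from the glimpse() output, and "EXAMPLE INDUSTRIAL ESTATE" is a hypothetical entry.

```r
# Made-up sample of POI names per FAC_TYPE
poi <- data.frame(
  FAC_TYPE = c(5000, 5000, 5000, 5000, 9991, 9991),
  POI_NAME = c("JOHN CHEN", "TROPICAL INDUSTRIAL BUILDING",
               "ACER TOWER A", "PRIMZ BIZHUB",
               "KAKI BUKIT INDUSTRIAL ESTATE",
               "EXAMPLE INDUSTRIAL ESTATE")  # hypothetical name
)

# Flag names containing the keyword, then take the share per FAC_TYPE
is_industrial <- grepl("INDUSTRIAL", poi$POI_NAME, fixed = TRUE)
share_by_type <- tapply(is_industrial, poi$FAC_TYPE, mean)
share_by_type
```

A keyword match is only a heuristic: some FAC_TYPE 5000 names, such as "TROPICAL INDUSTRIAL BUILDING", also contain the keyword, so the shares support the classification rather than prove it.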

5.1.4 Renaming the FAC_TYPE 9991 & 5000

Industry <- Business_9991
Business <- Business_5000

5.2 Financial

Financial = st_read(dsn = "data/geospatial", layer="Financial")
## Reading layer `Financial' from data source `C:\Users\jiiireh\Desktop\Take-home_ex03\Take-home_ex03\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 3320 features and 29 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 103.6256 ymin: 1.24392 xmax: 103.9998 ymax: 1.46247
## geographic CRS: WGS 84

5.2.1 Data Validity & NA Checking

unique(st_is_valid(Financial,reason = TRUE))
## [1] "Valid Geometry"
Financial[rowSums(is.na(Financial))!=0,]
## Simple feature collection with 3320 features and 29 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 103.6256 ymin: 1.24392 xmax: 103.9998 ymax: 1.46247
## geographic CRS: WGS 84
## First 10 features:
##       LINK_ID     POI_ID SEQ_NUM FAC_TYPE            POI_NAME POI_LANGCD
## 1  1170624361 1132324230       1     3578                 UOB        ENG
## 2  1112103842 1132315471       1     3578                POSB        ENG
## 3  1112103842 1132315472       1     3578                 UOB        ENG
## 4  1112103842 1132315473       1     3578                OCBC        ENG
## 5   864687596 1100784924       1     3578                OCBC        ENG
## 6   902073032 1132324170       1     6000             MAYBANK        ENG
## 7   778516217 1141424387       1     6000 ADPOST MONEYCHANGER        ENG
## 8   880495939 1096910285       1     3578                 UOB        ENG
## 9   866996334 1096910292       1     3578                OCBC        ENG
## 10  880495939 1096910286       1     3578            CITIBANK        ENG
##    POI_NMTYPE POI_ST_NUM ST_NUM_FUL ST_NFUL_LC           ST_NAME ST_LANGCD
## 1           B        201       <NA>       <NA>      YISHUN AVE 2       ENG
## 2           B        375       <NA>       <NA>  COMMONWEALTH AVE       ENG
## 3           B        375       <NA>       <NA>  COMMONWEALTH AVE       ENG
## 4           B        375       <NA>       <NA>  COMMONWEALTH AVE       ENG
## 5           B       <NA>       <NA>       <NA> JURONG WEST ST 51       ENG
## 6           B        707       <NA>       <NA>     EAST COAST RD       ENG
## 7           B        163       <NA>       <NA>        TANGLIN RD       ENG
## 8           B       <NA>       <NA>       <NA>              <NA>      <NA>
## 9           B         11       <NA>       <NA>         ARTS LINK       ENG
## 10          B       <NA>       <NA>       <NA>              <NA>      <NA>
##    POI_ST_SD ACC_TYPE   PH_NUMBER CHAIN_ID NAT_IMPORT PRIVATE IN_VICIN
## 1          L     <NA>        <NA>     6919          N       N        N
## 2          R     <NA>        <NA>     6918          N       N        N
## 3          R     <NA>        <NA>     6919          N       N        N
## 4          R     <NA>        <NA>     6920          N       N        N
## 5          R     <NA>        <NA>     6920          N       N        N
## 6          L     <NA> 18006292266     3657          N       N        N
## 7          R     <NA>    67330779        0          N       N        N
## 8          R     <NA>        <NA>     6919          N       N        N
## 9          R     <NA>        <NA>     6920          N       N        N
## 10         R     <NA>        <NA>     1165          N       N        N
##    NUM_PARENT NUM_CHILD PERCFRREF VANCITY_ID
## 1           0         0        NA          0
## 2           0         0        NA          0
## 3           0         0        NA          0
## 4           0         0        NA          0
## 5           0         0        60          0
## 6           0         0        NA          0
## 7           1         0        50          0
## 8           0         0        20          0
## 9           0         0        NA          0
## 10          0         0        20          0
##                                                              ACT_ADDR
## 1                                                                <NA>
## 2                                                                <NA>
## 3                                                                <NA>
## 4                                                                <NA>
## 5  501 JURONG WEST STREET 51                         SINGAPORE 640501
## 6                                                                <NA>
## 7                                                                <NA>
## 8                                                                <NA>
## 9                                                                <NA>
## 10                                                               <NA>
##    ACT_LANGCD            ACT_ST_NAM ACT_ST_NUM ACT_ADMIN ACT_POSTAL
## 1        <NA>                  <NA>       <NA>      <NA>       <NA>
## 2        <NA>                  <NA>       <NA>      <NA>       <NA>
## 3        <NA>                  <NA>       <NA>      <NA>       <NA>
## 4        <NA>                  <NA>       <NA>      <NA>       <NA>
## 5         ENG JURONG WEST STREET 51        501 SINGAPORE     640501
## 6        <NA>                  <NA>       <NA>      <NA>       <NA>
## 7        <NA>                  <NA>       <NA>      <NA>       <NA>
## 8        <NA>                  <NA>       <NA>      <NA>       <NA>
## 9        <NA>                  <NA>       <NA>      <NA>       <NA>
## 10       <NA>                  <NA>       <NA>      <NA>       <NA>
##                    geometry
## 1   POINT (103.833 1.41695)
## 2  POINT (103.7989 1.30211)
## 3  POINT (103.7989 1.30211)
## 4  POINT (103.7989 1.30211)
## 5  POINT (103.7189 1.35016)
## 6  POINT (103.9224 1.31199)
## 7  POINT (103.8242 1.30528)
## 8  POINT (103.7723 1.29608)
## 9  POINT (103.7719 1.29367)
## 10 POINT (103.7723 1.29608)

We observe no invalid geometries; however, there are NA values in many columns. They occur mainly in the street number and name, phone number and address columns. Since the geometry already locates each point, the street and address columns are redundant, and their NA values are a non-issue because those columns are not used in our study.

5.2.2 Investigating Financial FAC_TYPEs

unique(Financial$FAC_TYPE)
## [1] 3578 6000

We see two unique FAC_TYPEs within Financial. We will take a look at the differences between the two.

Financial_3578 <- Financial %>%
  filter(FAC_TYPE == 3578) # FAC_TYPE is numeric, so compare against a number

Financial_6000 <- Financial %>%
  filter(FAC_TYPE == 6000)

F3578_Map <- tm_shape(mpsz)+
  tm_polygons(alpha = 0, border.col = "lightgrey", border.alpha = 1)+
tm_shape(Financial_3578)+
  tm_dots(col = "red")+
  tm_layout(title = "Financial FAC_TYPE 3578 Distribution",
    title.size = 1,
    title.position = c("center", "top"),
    inner.margins = c(0.06, 0.10, 0.10, 0.08))

F6000_Map <- tm_shape(mpsz)+
  tm_polygons(alpha = 0, border.col = "lightgrey", border.alpha = 1)+
tm_shape(Financial_6000)+
  tm_dots(col = "red")+
  tm_layout(title = "Financial FAC_TYPE 6000 Distribution",
    title.size = 1,
    title.position = c("center", "top"),
    inner.margins = c(0.06, 0.10, 0.10, 0.08))

tmap_arrange(F3578_Map,F6000_Map, ncol=2)

From analysing the data and checking the available postal codes, it appears that FAC_TYPE 3578 under Financial holds the locations of ATMs in Singapore, while FAC_TYPE 6000 holds the locations of services such as money changers and banks.
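The dot maps for Business and Financial are built from the same tmap layers, differing only in the points and the title. The sketch below factors that pattern into a function. The name fac_type_map and the toy boundary and points are hypothetical, and the tmap arguments are the ones already used in this document (newer tmap releases may report some of them as deprecated).

```r
library(sf)
library(tmap)

# Hypothetical helper: dot map of one FAC_TYPE over a boundary layer
fac_type_map <- function(boundary, points, fac_type, layer_name) {
  tm_shape(boundary) +
    tm_polygons(alpha = 0, border.col = "lightgrey", border.alpha = 1) +
    tm_shape(points[points$FAC_TYPE == fac_type, ]) +
    tm_dots(col = "red") +
    tm_layout(title = paste(layer_name, "FAC_TYPE", fac_type, "Distribution"),
              title.size = 1,
              title.position = c("center", "top"),
              inner.margins = c(0.06, 0.10, 0.10, 0.08))
}

# Made-up boundary and points standing in for mpsz and Financial
square <- st_sf(geometry = st_sfc(st_polygon(list(rbind(
  c(103.6, 1.2), c(104.0, 1.2), c(104.0, 1.5), c(103.6, 1.5), c(103.6, 1.2)
))), crs = 4326))

pts <- st_sf(
  FAC_TYPE = c(3578, 6000, 3578),
  geometry = st_sfc(st_point(c(103.8, 1.30)),
                    st_point(c(103.9, 1.35)),
                    st_point(c(103.7, 1.40)), crs = 4326)
)

# One map per FAC_TYPE; arrange them side by side when plotting
maps <- lapply(c(3578, 6000), function(ft) fac_type_map(square, pts, ft, "Financial"))
# tmap_arrange(maps[[1]], maps[[2]], ncol = 2)
```

With the real layers, the call would take mpsz as the boundary and Business, Financial or Govt_Embassy as the points.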

5.3 Govt_Embassy

Govt_Embassy = st_read(dsn = "data/geospatial", layer="Govt_Embassy")
## Reading layer `Govt_Embassy' from data source `C:\Users\jiiireh\Desktop\Take-home_ex03\Take-home_ex03\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 443 features and 5 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 103.6282 ymin: 1.24911 xmax: 103.9884 ymax: 1.45765
## geographic CRS: WGS 84

5.3.1 Data Validity & NA Checking

unique(st_is_valid(Govt_Embassy,reason = TRUE))
## [1] "Valid Geometry"
Govt_Embassy[rowSums(is.na(Govt_Embassy))!=0,]
## Simple feature collection with 28 features and 5 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 103.7206 ymin: 1.27341 xmax: 103.8578 ymax: 1.45114
## geographic CRS: WGS 84
## First 10 features:
##         POI_ID SEQ_NUM FAC_TYPE                                 POI_NAME
## 4   1141424338       1     9993                   GENERAL CONSULATE OMAN
## 34  1070436984       1     9993             TAIPEI REPRESENTATIVE OFFICE
## 35  1070436981       1     9525                    MINISTRY OF TRANSPORT
## 36  1070436980       1     9525              CASINO REGULATORY AUTHORITY
## 49  1024547731       1     9525                             SLF BUILDING
## 63  1149038609       1     9525 MEDIA DEVELOPMENT AUTHORITY OF SINGAPORE
## 64  1149038609       2     9525                                      MDA
## 66  1058449988       1     9993             EMBASSY UNITED ARAB EMIRATES
## 116 1083739893       1     9525                         RAFFLES BUILDING
## 117 1083739890       1     9525        NATIONAL PARKS BOARD HEADQUARTERS
##     ST_NAME                 geometry
## 4      <NA>  POINT (103.8578 1.2999)
## 34     <NA> POINT (103.8011 1.27347)
## 35     <NA> POINT (103.8012 1.27341)
## 36     <NA> POINT (103.8012 1.27341)
## 49     <NA> POINT (103.8393 1.33325)
## 63     <NA> POINT (103.7874 1.29881)
## 64     <NA> POINT (103.7874 1.29881)
## 66     <NA>  POINT (103.8578 1.2999)
## 116    <NA> POINT (103.8182 1.31664)
## 117    <NA> POINT (103.8161 1.31599)

We can observe that there are no invalid geometries; however, there are NAs, specifically in the ST_NAME column. As mentioned, the ST_NAME column is not one we will take into consideration for this study, so the NA values within it are a non-issue.

5.3.2 Investigating Govt_Embassy FAC_TYPEs

unique(Govt_Embassy$FAC_TYPE)
## [1] 9993 9525

We see two unique FAC_TYPEs. We will briefly take a look at the differences.

Govt_Embassy_9993 <- Govt_Embassy %>%
  filter(FAC_TYPE == '9993')

Govt_Embassy_9525 <- Govt_Embassy %>%
  filter(FAC_TYPE == '9525')

G9993_Map <- tm_shape(mpsz)+
  tm_polygons(alpha = 0, border.col = "lightgrey", border.alpha = 1)+
tm_shape(Govt_Embassy_9993)+
  tm_dots(col = "red")+
  tm_layout(title = "Govt_Embassy FAC_TYPE 9993 Distribution",
    title.size = 1,
    title.position = c("center", "top"),
    inner.margins = c(0.06, 0.10, 0.10, 0.08))

G9525_Map <- tm_shape(mpsz)+
  tm_polygons(alpha = 0, border.col = "lightgrey", border.alpha = 1)+
tm_shape(Govt_Embassy_9525)+
  tm_dots(col = "red")+
  tm_layout(title = "Govt_Embassy FAC_TYPE 9525 Distribution",
    title.size = 1,
    title.position = c("center", "top"),
    inner.margins = c(0.06, 0.10, 0.10, 0.08))

tmap_arrange(G9993_Map,G9525_Map, ncol=2)

We observe that most FAC_TYPE 9993 locations are in the central region of Singapore, while FAC_TYPE 9525 locations are spread more widely across the island. We will take a look at the data for more insight as to why this may be.

head(Govt_Embassy_9993)
## Simple feature collection with 6 features and 5 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 103.8431 ymin: 1.28113 xmax: 103.8578 ymax: 1.31836
## geographic CRS: WGS 84
##       POI_ID SEQ_NUM FAC_TYPE               POI_NAME      ST_NAME
## 1 1141424380       1     9993   CONSULATE SAN MARINO    CHURCH ST
## 2 1141424404       1     9993           EMBASSY LAOS GOLDHILL PLZ
## 3 1141424402       1     9993       CONSULATE BELIZE     CECIL ST
## 4 1141424338       1     9993 GENERAL CONSULATE OMAN         <NA>
## 5 1001332522       1     9993         EMBASSY NORWAY RAFFLES QUAY
## 6 1001332520       1     9993         EMBASSY PANAMA RAFFLES QUAY
##                   geometry
## 1 POINT (103.8494 1.28343)
## 2 POINT (103.8431 1.31836)
## 3 POINT (103.8493 1.28128)
## 4  POINT (103.8578 1.2999)
## 5 POINT (103.8512 1.28113)
## 6  POINT (103.8512 1.2812)
head(Govt_Embassy_9525)
## Simple feature collection with 6 features and 5 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 103.8455 ymin: 1.27869 xmax: 103.9184 ymax: 1.32688
## geographic CRS: WGS 84
##       POI_ID SEQ_NUM FAC_TYPE                       POI_NAME      ST_NAME
## 1 1192460871       1     9525                MND TOWER BLOCK   MAXWELL RD
## 2 1192460819       1     9525 MND AUDITORIUM & FUNCTION HALL   MAXWELL RD
## 3 1192460843       1     9525          AICARE LINK @ MAXWELL   MAXWELL RD
## 4 1192460783       1     9525   HARMONY IN DIVERSITY GALLERY   MAXWELL RD
## 5 1192460750       1     9525    FAMILY SUPPORT DIVISION MSF   MAXWELL RD
## 6 1194224304       1     9525               LTA BEDOK CAMPUS CHAI CHEE ST
##                   geometry
## 1 POINT (103.8456 1.27869)
## 2 POINT (103.8455 1.27883)
## 3 POINT (103.8455 1.27883)
## 4 POINT (103.8455 1.27883)
## 5 POINT (103.8455 1.27883)
## 6 POINT (103.9184 1.32688)

Looking at the POI_NAMEs, we see that FAC_TYPE 9993 covers the foreign embassies and consulates, while FAC_TYPE 9525 covers government facilities, with POI_NAMEs that include “Town Council”, “Fire Station”, etc.
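
The visual inspection of POI_NAME can be backed by a simple keyword tabulation. The sketch below is an assumption about how one might do this, not a step from the original analysis: it flags names containing "EMBASSY" or "CONSULATE" and cross-tabulates the flag against FAC_TYPE. The sample names are taken from the head() outputs above; the object `poi` is hypothetical.

```r
# A few POI names copied from the head() outputs above.
poi <- data.frame(
  FAC_TYPE = c(9993, 9993, 9993, 9525, 9525),
  POI_NAME = c("CONSULATE SAN MARINO", "EMBASSY LAOS", "GENERAL CONSULATE OMAN",
               "MND TOWER BLOCK", "LTA BEDOK CAMPUS"),
  stringsAsFactors = FALSE
)

# TRUE where the name mentions an embassy or a consulate.
is_diplomatic <- grepl("EMBASSY|CONSULATE", poi$POI_NAME)

# Cross-tabulate facility type against the keyword flag.
tab <- table(FAC_TYPE = poi$FAC_TYPE, diplomatic = is_diplomatic)
print(tab)
```

If the reading of the two facility types is right, nearly all 9993 rows should fall in the TRUE column and nearly all 9525 rows in the FALSE column when this is run on the full layer.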

5.3.3 Private_residential

Private_residential = st_read(dsn = "data/geospatial", layer="Private residential")
## Reading layer `Private residential' from data source `C:\Users\jiiireh\Desktop\Take-home_ex03\Take-home_ex03\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 3604 features and 5 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 103.6295 ymin: 1.23943 xmax: 103.9749 ymax: 1.45379
## geographic CRS: WGS 84

5.3.4 Data Validity & NA Checking

unique(st_is_valid(Private_residential,reason = TRUE))
## [1] "Valid Geometry"
Private_residential[rowSums(is.na(Private_residential))!=0,]
## Simple feature collection with 45 features and 5 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 103.742 ymin: 1.27681 xmax: 103.9495 ymax: 1.44509
## geographic CRS: WGS 84
## First 10 features:
##         POI_ID SEQ_NUM FAC_TYPE               POI_NAME ST_NAME
## 3   1202668778       1     9590 GREENTOPS @ SIMS PLACE    <NA>
## 40  1100618584       1     9590    PARKROYAL RESIDENCE    <NA>
## 70  1202435811       1     9590       FERNVALE GARDENS    <NA>
## 72  1202435810       1     9590         FERNVALE FLORA    <NA>
## 102  995162529       1     9590         SIGNATURE PARK    <NA>
## 231 1149047335       1     9590 SOUTH BEACH RESIDENCES    <NA>
## 236 1192848219       1     9590              J GATEWAY    <NA>
## 287 1023797242       1     9590       GREAT WORLD CITY    <NA>
## 542 1202435848       1     9590              COSTA RIS    <NA>
## 718 1069869725       1     9590            THE LAURELS    <NA>
##                     geometry
## 3   POINT (103.8797 1.31643)
## 40  POINT (103.8609 1.30024)
## 70  POINT (103.8788 1.39261)
## 72  POINT (103.8757 1.39347)
## 102 POINT (103.7699 1.34295)
## 231 POINT (103.8564 1.29458)
## 236  POINT (103.742 1.33585)
## 287 POINT (103.8314 1.29365)
## 542  POINT (103.948 1.36881)
## 718 POINT (103.8374 1.30444)

We observe no invalid geometries, but there are NA values in the ST_NAME column. As concluded above, the NA values in ST_NAME are a non-issue.

5.3.5 Investigating Private_residential FAC_TYPEs

unique(Private_residential$FAC_TYPE)
## [1] 9590

We observe only one FAC_TYPE in Private_residential.

P_Map <- tm_shape(mpsz)+
  tm_polygons(alpha = 0, border.col = "Black", border.alpha = 1)+
tm_shape(Private_residential)+
  tm_dots(col = "red")+
  tm_layout(title = "Private_residential Distribution",
    title.size = 1,
    title.position = c("center", "top"),
    inner.margins = c(0.06, 0.10, 0.10, 0.08))

P_Map

We observe a denser distribution of upmarket residential locations in the central and slightly eastern areas of Singapore than in the other areas. A likely reason is that properties near the Central Business District (CBD) or town areas are usually more lucrative because of their proximity to them.

5.3.6 Shopping

Shopping = st_read(dsn = "data/geospatial", layer="Shopping")
## Reading layer `Shopping' from data source `C:\Users\jiiireh\Desktop\Take-home_ex03\Take-home_ex03\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 511 features and 5 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 103.679 ymin: 1.24779 xmax: 103.9644 ymax: 1.4535
## geographic CRS: WGS 84

5.3.7 Data Validity & NA Checking

unique(st_is_valid(Shopping,reason = TRUE))
## [1] "Valid Geometry"
Shopping[rowSums(is.na(Shopping))!=0,]
## Simple feature collection with 102 features and 5 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 103.679 ymin: 1.25619 xmax: 103.9635 ymax: 1.45123
## geographic CRS: WGS 84
## First 10 features:
##        POI_ID SEQ_NUM FAC_TYPE                                  POI_NAME
## 7  1069767253       1     6512     UNITED SQUARE GOLDHILL PLAZA ENTRANCE
## 8  1069767253       2     6512       UNITED SQUARE GOLDHILL PLZ ENTRANCE
## 9  1039562724       1     6512                                 THE FORUM
## 10 1039562723       1     6512                                WATERFRONT
## 12 1039562756       1     6512                             THE BULL RING
## 18 1069686034       1     6512                                    BUGIS+
## 21 1178047575       1     6512                                  E!AVENUE
## 25 1178800633       1     6512                      TANJONG PAGAR CENTRE
## 27 1201735347       1     6512                         HEARTBEAT @ BEDOK
## 36 1103577748       1     6512 TAMPINES MART-TAMPINES STREET 34 ENTRANCE
##    ST_NAME                 geometry
## 7     <NA> POINT (103.8432 1.31744)
## 8     <NA> POINT (103.8432 1.31744)
## 9     <NA> POINT (103.8205 1.25619)
## 10    <NA> POINT (103.8205 1.25619)
## 12    <NA> POINT (103.8205 1.25619)
## 18    <NA> POINT (103.8539 1.29988)
## 21    <NA> POINT (103.9556 1.37781)
## 25    <NA> POINT (103.8454 1.27721)
## 27    <NA> POINT (103.9326 1.32735)
## 36    <NA> POINT (103.9607 1.35454)

5.3.8 Investigating Shopping FAC_TYPEs

unique(Shopping$FAC_TYPE)
## [1] 6512

Only one unique FAC_TYPE is observed.

head(Shopping)
## Simple feature collection with 6 features and 5 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 103.7127 ymin: 1.28458 xmax: 103.9041 ymax: 1.35375
## geographic CRS: WGS 84
##       POI_ID SEQ_NUM FAC_TYPE                 POI_NAME         ST_NAME
## 1 1132106213       1     6512          SIN MING CENTRE     SIN MING RD
## 2  801758392       1     6512              THE ADELPHI      COLEMAN ST
## 3  842821452       1     6512 BOON LAY SHOPPING CENTRE     BOON LAY PL
## 4 1193779191       1     6512            KATONG SQUARE   EAST COAST RD
## 5  801758399       1     6512           SIM LIM SQUARE ROCHOR CANAL RD
## 6 1001450091       1     6512    PEOPLE'S PARK COMPLEX         PARK RD
##                   geometry
## 1  POINT (103.836 1.35375)
## 2 POINT (103.8515 1.29124)
## 3 POINT (103.7127 1.34672)
## 4   POINT (103.9041 1.305)
## 5 POINT (103.8533 1.30341)
## 6  POINT (103.843 1.28458)

From the POI_NAMES, we can see that Shopping refers to the various shopping centres in Singapore.

S_Map <- tm_shape(mpsz)+
  tm_polygons(alpha = 0, border.col = "Black", border.alpha = 1)+
tm_shape(Shopping)+
  tm_dots(col = "red")+
  tm_layout(title = "Shopping Distribution",
    title.size = 1,
    title.position = c("center", "top"),
    inner.margins = c(0.06, 0.10, 0.10, 0.08))

S_Map

We observe a large number of shopping centres in the Central region of Singapore. These likely cater to people who work there and patronise the shopping centres during meal hours or after work, as well as to leisure shoppers and tourists.

We also see a sparser distribution of shopping centres across the rest of Singapore, catering to the populace living in those parts.

5.4 Transforming Projections of the Urban Functions

st_crs(Business)
## Coordinate Reference System:
##   User input: WGS 84 
##   wkt:
## GEOGCRS["WGS 84",
##     DATUM["World Geodetic System 1984",
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["latitude",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["longitude",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     ID["EPSG",4326]]
st_crs(Industry)
## Coordinate Reference System:
##   User input: WGS 84 
##   wkt:
## GEOGCRS["WGS 84",
##     DATUM["World Geodetic System 1984",
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["latitude",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["longitude",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     ID["EPSG",4326]]
st_crs(Financial)
## Coordinate Reference System:
##   User input: WGS 84 
##   wkt:
## GEOGCRS["WGS 84",
##     DATUM["World Geodetic System 1984",
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["latitude",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["longitude",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     ID["EPSG",4326]]
st_crs(Govt_Embassy)
## Coordinate Reference System:
##   User input: WGS 84 
##   wkt:
## GEOGCRS["WGS 84",
##     DATUM["World Geodetic System 1984",
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["latitude",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["longitude",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     ID["EPSG",4326]]
st_crs(Private_residential)
## Coordinate Reference System:
##   User input: WGS 84 
##   wkt:
## GEOGCRS["WGS 84",
##     DATUM["World Geodetic System 1984",
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["latitude",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["longitude",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     ID["EPSG",4326]]
st_crs(Shopping)
## Coordinate Reference System:
##   User input: WGS 84 
##   wkt:
## GEOGCRS["WGS 84",
##     DATUM["World Geodetic System 1984",
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["latitude",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["longitude",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     ID["EPSG",4326]]

We see that all the urban function layers use EPSG:4326 (WGS 84), a geographic coordinate system. We will therefore transform them to EPSG:3414 (SVY21), the projected coordinate system of mpsz.
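
Printing the full WKT for six layers is verbose. A more compact check, sketched below under the assumption that each layer carries an EPSG code, is to collect `st_crs(x)$epsg` for every layer in one call. The toy layers here are stand-ins built from a single point, not the real data.

```r
library(sf)

# One WGS 84 point (longitude, latitude) used to build small stand-in layers.
pt <- st_sfc(st_point(c(103.8494, 1.28343)), crs = 4326)

# Hypothetical stand-ins for the urban function layers.
layers <- list(
  Business = st_sf(geometry = pt),
  Shopping = st_sf(geometry = pt)
)

# Extract the EPSG code of each layer's CRS.
epsg_codes <- sapply(layers, function(x) st_crs(x)$epsg)
print(epsg_codes)
```

With the real data, the list would hold Business, Industry, Financial, Govt_Embassy, Private_residential and Shopping.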

5.4.1 Converting to EPSG:3414

Business3414 <- st_transform(Business, "+init=EPSG:3414 +datum=WGS84")
Industry3414 <- st_transform(Industry, "+init=EPSG:3414 +datum=WGS84")
Govt_Embassy3414 <- st_transform(Govt_Embassy, "+init=EPSG:3414 +datum=WGS84")
Financial3414 <- st_transform(Financial, "+init=EPSG:3414 +datum=WGS84")
Private_residential3414 <- st_transform(Private_residential, "+init=EPSG:3414 +datum=WGS84")
Shopping3414 <- st_transform(Shopping, "+init=EPSG:3414 +datum=WGS84")

5.4.2 Ensuring CRS is correct

st_crs(Business3414)
## Coordinate Reference System:
##   User input: +init=EPSG:3414 +datum=WGS84 
##   wkt:
## PROJCRS["unknown",
##     BASEGEOGCRS["unknown",
##         DATUM["World Geodetic System 1984",
##             ELLIPSOID["WGS 84",6378137,298.257223563,
##                 LENGTHUNIT["metre",1]],
##             ID["EPSG",6326]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8901]]],
##     CONVERSION["unknown",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",1.36666666666667,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",103.833333333333,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",1,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",28001.642,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",38744.572,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]]]

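The projection parameters above (Transverse Mercator, false easting 28001.642, false northing 38744.572) match EPSG:3414, so our urban functions are now in the SVY21 projection.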
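
Because the target CRS was given as a PROJ string, the WKT above labels the CRS as "unknown" and keeps the WGS 84 datum. As an alternative sketch (not the method used above), st_transform() also accepts the EPSG code directly through its `crs` argument, which keeps the EPSG identifier attached to the result. The two sample points below are taken from the Shopping output shown earlier; the object names `pts` and `pts_3414` are illustrative.

```r
library(sf)

# Two shopping centres in WGS 84 (longitude, latitude), copied from head(Shopping).
pts <- st_sf(
  POI_NAME = c("THE ADELPHI", "SIM LIM SQUARE"),
  geometry = st_sfc(
    st_point(c(103.8515, 1.29124)),
    st_point(c(103.8533, 1.30341)),
    crs = 4326
  )
)

# Reproject by passing the EPSG code for SVY21 directly.
pts_3414 <- st_transform(pts, crs = 3414)

# The result keeps its EPSG identifier and has coordinates in metres.
print(st_crs(pts_3414)$epsg)
coords <- st_coordinates(pts_3414)
print(coords)
```

A layer transformed this way reports the same EPSG code on every run, which makes it easier to compare against other layers before a spatial join.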

5.5 Joining Urban Functions to mpsz & Count UF by Subzone

5.5.1 Business

mpsz_Business <- st_join(mpsz,Business3414)
mpsz_Business <- mpsz_Business[!is.na(mpsz_Business$FAC_TYPE),] #Removing Subzones without UF
mpsz_Business <- mpsz_Business %>% 
  mutate(count = 1) # adding a count column
Business_SZCount <- mpsz_Business %>%
  group_by(SUBZONE_N) %>%
  summarise(Business = sum(count)) # counting UF by Subzone
Business_SZCount <- st_set_geometry(Business_SZCount, NULL) # Dropping the geometry Table
mpsz_Industry <- st_join(mpsz,Industry3414)
mpsz_Industry <- mpsz_Industry[!is.na(mpsz_Industry$FAC_TYPE),] #Removing Subzones without UF
mpsz_Industry <- mpsz_Industry %>% 
  mutate(count = 1) # adding a count column
Industry_SZCount <- mpsz_Industry %>%
  group_by(SUBZONE_N) %>%
  summarise(Industry = sum(count))# counting UF by Subzone
Industry_SZCount <- st_set_geometry(Industry_SZCount, NULL) # Dropping the geometry Table
mpsz_Financial <- st_join(mpsz,Financial3414)
mpsz_Financial <- mpsz_Financial[!is.na(mpsz_Financial$FAC_TYPE),] #Removing Subzones without UF
mpsz_Financial <- mpsz_Financial %>% 
  mutate(count = 1) # adding a count column
Financial_SZCount <- mpsz_Financial %>%
  group_by(SUBZONE_N) %>%
  summarise(Financial = sum(count))# counting UF by Subzone
Financial_SZCount <- st_set_geometry(Financial_SZCount, NULL) # Dropping the geometry Table
mpsz_Govt_Embassy <- st_join(mpsz,Govt_Embassy3414)
mpsz_Govt_Embassy <- mpsz_Govt_Embassy[!is.na(mpsz_Govt_Embassy$FAC_TYPE),] #Removing Subzones without UF
mpsz_Govt_Embassy <- mpsz_Govt_Embassy %>% 
  mutate(count = 1) # adding a count column
Govt_Embassy_SZCount <- mpsz_Govt_Embassy %>%
  group_by(SUBZONE_N) %>%
  summarise(Govt_Embassy = sum(count))# counting UF by Subzone
Govt_Embassy_SZCount <- st_set_geometry(Govt_Embassy_SZCount, NULL) # Dropping the geometry Table
mpsz_Private_residential <- st_join(mpsz,Private_residential3414)
mpsz_Private_residential <- mpsz_Private_residential[!is.na(mpsz_Private_residential$FAC_TYPE),] #Removing Subzones without UF
mpsz_Private_residential <- mpsz_Private_residential %>% 
  mutate(count = 1) # adding a count column
Private_residential_SZCount <- mpsz_Private_residential %>%
  group_by(SUBZONE_N) %>%
  summarise(Private_residential = sum(count))# counting UF by Subzone
Private_residential_SZCount <- st_set_geometry(Private_residential_SZCount, NULL) # Dropping the geometry Table
mpsz_Shopping <- st_join(mpsz,Shopping3414)
mpsz_Shopping <- mpsz_Shopping[!is.na(mpsz_Shopping$FAC_TYPE),] #Removing Subzones without UF
mpsz_Shopping <- mpsz_Shopping %>% 
  mutate(count = 1) # adding a count column
Shopping_SZCount <- mpsz_Shopping %>%
  group_by(SUBZONE_N) %>%
  summarise(Shopping = sum(count))# counting UF by Subzone
Shopping_SZCount <- st_set_geometry(Shopping_SZCount, NULL) # Dropping the geometry Table
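
The six blocks above repeat the same four steps for each urban function. As a hedged alternative, the sketch below wraps the counting in one helper. Note that it swaps in a different technique: st_intersects() with lengths() instead of st_join() with group_by() and summarise(). The helper name `count_by_subzone`, the toy squares and the toy points are all assumptions made for illustration.

```r
library(sf)

# Build a square polygon with its lower-left corner at (x0, y0).
sq <- function(x0, y0, size = 1000) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + size, y0), c(x0 + size, y0 + size),
    c(x0, y0 + size), c(x0, y0)
  )))
}

# Two toy "subzones" and three toy points, all in EPSG:3414 (metres).
zones <- st_sf(
  SUBZONE_N = c("ZONE A", "ZONE B"),
  geometry = st_sfc(sq(0, 0), sq(2000, 0), crs = 3414)
)
pois <- st_sf(
  FAC_TYPE = c(6512, 6512, 6512),
  geometry = st_sfc(
    st_point(c(100, 100)), st_point(c(500, 900)), st_point(c(5000, 5000)),
    crs = 3414
  )
)

# Count the points falling in each polygon; polygons with no points get 0.
count_by_subzone <- function(polys, points, name) {
  counts <- lengths(st_intersects(polys, points))
  out <- data.frame(SUBZONE_N = as.character(polys$SUBZONE_N),
                    stringsAsFactors = FALSE)
  out[[name]] <- counts
  out
}

shopping_count <- count_by_subzone(zones, pois, "Shopping")
print(shopping_count)
```

One design difference from the approach above: this helper returns a row for every subzone, with 0 where no points fall, so the resulting tables would not introduce NA values when joined back to mpsz.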

5.6 Joining all Urban Functions

mpsz_UF <- left_join(mpsz,Business_SZCount)
## Joining, by = "SUBZONE_N"
mpsz_UF <- left_join(mpsz_UF,Industry_SZCount)
## Joining, by = "SUBZONE_N"
mpsz_UF <- left_join(mpsz_UF,Financial_SZCount)
## Joining, by = "SUBZONE_N"
mpsz_UF <- left_join(mpsz_UF,Govt_Embassy_SZCount)
## Joining, by = "SUBZONE_N"
mpsz_UF <- left_join(mpsz_UF,Private_residential_SZCount)
## Joining, by = "SUBZONE_N"
mpsz_UF <- left_join(mpsz_UF,Shopping_SZCount)
## Joining, by = "SUBZONE_N"
summary(mpsz_UF)
##     OBJECTID       SUBZONE_NO               SUBZONE_N     SUBZONE_C   CA_IND 
##  Min.   :  1.0   Min.   : 1.000   ADMIRALTY      :  1   AMSZ01 :  1   N:274  
##  1st Qu.: 81.5   1st Qu.: 2.000   AIRPORT ROAD   :  1   AMSZ02 :  1   Y: 49  
##  Median :162.0   Median : 4.000   ALEXANDRA HILL :  1   AMSZ03 :  1          
##  Mean   :162.0   Mean   : 4.625   ALEXANDRA NORTH:  1   AMSZ04 :  1          
##  3rd Qu.:242.5   3rd Qu.: 6.500   ALJUNIED       :  1   AMSZ05 :  1          
##  Max.   :323.0   Max.   :17.000   ANAK BUKIT     :  1   AMSZ06 :  1          
##                                   (Other)        :317   (Other):317          
##          PLN_AREA_N    PLN_AREA_C               REGION_N   REGION_C 
##  BUKIT MERAH  : 17   BM     : 17   CENTRAL REGION   :134   CR :134  
##  QUEENSTOWN   : 15   QT     : 15   EAST REGION      : 30   ER : 30  
##  ANG MO KIO   : 12   AM     : 12   NORTH-EAST REGION: 48   NER: 48  
##  DOWNTOWN CORE: 12   DT     : 12   NORTH REGION     : 41   NR : 41  
##  TOA PAYOH    : 12   TP     : 12   WEST REGION      : 70   WR : 70  
##  HOUGANG      : 10   HG     : 10                                    
##  (Other)      :245   (Other):245                                    
##              INC_CRC      FMEL_UPD_D             X_ADDR          Y_ADDR     
##  00F5E30B5C9B7AD8:  1   Min.   :2014-12-05   Min.   : 5093   Min.   :19579  
##  013B509B8EDF15BE:  1   1st Qu.:2014-12-05   1st Qu.:21864   1st Qu.:31776  
##  01A4287FB060A0A6:  1   Median :2014-12-05   Median :28465   Median :35113  
##  029BD940F4455194:  1   Mean   :2014-12-05   Mean   :27257   Mean   :36106  
##  0524461C92F35D94:  1   3rd Qu.:2014-12-05   3rd Qu.:31674   3rd Qu.:39869  
##  05FD555397CBEE7A:  1   Max.   :2014-12-05   Max.   :50425   Max.   :49553  
##  (Other)         :317                                                       
##    SHAPE_Leng        SHAPE_Area          Business         Industry    
##  Min.   :  871.5   Min.   :   39438   Min.   :  1.00   Min.   :1.000  
##  1st Qu.: 3709.6   1st Qu.:  628261   1st Qu.:  2.00   1st Qu.:1.000  
##  Median : 5211.9   Median : 1229894   Median :  7.00   Median :1.000  
##  Mean   : 6524.4   Mean   : 2420882   Mean   : 29.81   Mean   :2.245  
##  3rd Qu.: 6942.6   3rd Qu.: 2106483   3rd Qu.: 29.00   3rd Qu.:2.000  
##  Max.   :68083.9   Max.   :69748299   Max.   :308.00   Max.   :8.000  
##                                       NA's   :107      NA's   :274    
##    Financial       Govt_Embassy    Private_residential    Shopping     
##  Min.   :  1.00   Min.   : 1.000   Min.   :  1.00      Min.   : 1.000  
##  1st Qu.:  3.25   1st Qu.: 1.000   1st Qu.:  3.00      1st Qu.: 1.000  
##  Median :  8.00   Median : 2.000   Median :  7.00      Median : 2.000  
##  Mean   : 13.28   Mean   : 3.331   Mean   : 15.08      Mean   : 3.476  
##  3rd Qu.: 16.00   3rd Qu.: 4.000   3rd Qu.: 15.00      3rd Qu.: 4.000  
##  Max.   :134.00   Max.   :19.000   Max.   :217.00      Max.   :31.000  
##  NA's   :73       NA's   :190      NA's   :84          NA's   :176     
##           geometry  
##  MULTIPOLYGON :318  
##  POLYGON      :  5  
##  epsg:NA      :  0  
##  +proj=tmer...:  0  
##                     
##                     
## 

The urban function counts are NA for subzones that contain no points of the respective urban function (for example, 107 subzones have no Business points). We will replace these NA values with 0 to represent that these areas do not have the respective urban functions.

5.6.1 Replacing NA in our mpsz_UF

mpsz_UF <- replace(mpsz_UF, is.na(mpsz_UF),0)
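
The call above replaces NA across the whole table. A more targeted variant, sketched here as an assumption rather than as the method used, restricts the replacement to the count columns so that other attributes are never touched. The toy data frame `counts` and its values are illustrative.

```r
# Toy table of counts per subzone; NA means no points of that type were found.
counts <- data.frame(
  SUBZONE_N = c("A", "B", "C"),
  Business  = c(3, NA, 1),
  Shopping  = c(NA, NA, 2),
  stringsAsFactors = FALSE
)

# Columns that hold urban function counts.
count_cols <- c("Business", "Shopping")

# Replace NA with 0 in the count columns only.
counts[count_cols] <- lapply(counts[count_cols], function(x) {
  x[is.na(x)] <- 0
  x
})
print(counts)
```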

5.6.2 Ensuring no NA remains in our mpsz_UF

mpsz_UF[rowSums(is.na(mpsz_UF))!=0,]
## Simple feature collection with 0 features and 21 fields
## bbox:           xmin: NA ymin: NA xmax: NA ymax: NA
## projected CRS:  SVY21
##  [1] OBJECTID            SUBZONE_NO          SUBZONE_N          
##  [4] SUBZONE_C           CA_IND              PLN_AREA_N         
##  [7] PLN_AREA_C          REGION_N            REGION_C           
## [10] INC_CRC             FMEL_UPD_D          X_ADDR             
## [13] Y_ADDR              SHAPE_Leng          SHAPE_Area         
## [16] Business            Industry            Financial          
## [19] Govt_Embassy        Private_residential Shopping           
## [22] geometry           
## <0 rows> (or 0-length row.names)
summary(mpsz_UF)
##     OBJECTID       SUBZONE_NO               SUBZONE_N     SUBZONE_C   CA_IND 
##  Min.   :  1.0   Min.   : 1.000   ADMIRALTY      :  1   AMSZ01 :  1   N:274  
##  1st Qu.: 81.5   1st Qu.: 2.000   AIRPORT ROAD   :  1   AMSZ02 :  1   Y: 49  
##  Median :162.0   Median : 4.000   ALEXANDRA HILL :  1   AMSZ03 :  1          
##  Mean   :162.0   Mean   : 4.625   ALEXANDRA NORTH:  1   AMSZ04 :  1          
##  3rd Qu.:242.5   3rd Qu.: 6.500   ALJUNIED       :  1   AMSZ05 :  1          
##  Max.   :323.0   Max.   :17.000   ANAK BUKIT     :  1   AMSZ06 :  1          
##                                   (Other)        :317   (Other):317          
##          PLN_AREA_N    PLN_AREA_C               REGION_N   REGION_C 
##  BUKIT MERAH  : 17   BM     : 17   CENTRAL REGION   :134   CR :134  
##  QUEENSTOWN   : 15   QT     : 15   EAST REGION      : 30   ER : 30  
##  ANG MO KIO   : 12   AM     : 12   NORTH-EAST REGION: 48   NER: 48  
##  DOWNTOWN CORE: 12   DT     : 12   NORTH REGION     : 41   NR : 41  
##  TOA PAYOH    : 12   TP     : 12   WEST REGION      : 70   WR : 70  
##  HOUGANG      : 10   HG     : 10                                    
##  (Other)      :245   (Other):245                                    
##              INC_CRC      FMEL_UPD_D             X_ADDR          Y_ADDR     
##  00F5E30B5C9B7AD8:  1   Min.   :2014-12-05   Min.   : 5093   Min.   :19579  
##  013B509B8EDF15BE:  1   1st Qu.:2014-12-05   1st Qu.:21864   1st Qu.:31776  
##  01A4287FB060A0A6:  1   Median :2014-12-05   Median :28465   Median :35113  
##  029BD940F4455194:  1   Mean   :2014-12-05   Mean   :27257   Mean   :36106  
##  0524461C92F35D94:  1   3rd Qu.:2014-12-05   3rd Qu.:31674   3rd Qu.:39869  
##  05FD555397CBEE7A:  1   Max.   :2014-12-05   Max.   :50425   Max.   :49553  
##  (Other)         :317                                                       
##    SHAPE_Leng        SHAPE_Area          Business         Industry     
##  Min.   :  871.5   Min.   :   39438   Min.   :  0.00   Min.   :0.0000  
##  1st Qu.: 3709.6   1st Qu.:  628261   1st Qu.:  0.00   1st Qu.:0.0000  
##  Median : 5211.9   Median : 1229894   Median :  2.00   Median :0.0000  
##  Mean   : 6524.4   Mean   : 2420882   Mean   : 19.94   Mean   :0.3406  
##  3rd Qu.: 6942.6   3rd Qu.: 2106483   3rd Qu.: 14.00   3rd Qu.:0.0000  
##  Max.   :68083.9   Max.   :69748299   Max.   :308.00   Max.   :8.0000  
##                                                                        
##    Financial       Govt_Embassy    Private_residential    Shopping     
##  Min.   :  0.00   Min.   : 0.000   Min.   :  0.00      Min.   : 0.000  
##  1st Qu.:  1.00   1st Qu.: 0.000   1st Qu.:  0.00      1st Qu.: 0.000  
##  Median :  5.00   Median : 0.000   Median :  4.00      Median : 0.000  
##  Mean   : 10.28   Mean   : 1.372   Mean   : 11.16      Mean   : 1.582  
##  3rd Qu.: 13.00   3rd Qu.: 1.000   3rd Qu.: 11.00      3rd Qu.: 1.000  
##  Max.   :134.00   Max.   :19.000   Max.   :217.00      Max.   :31.000  
##                                                                        
##           geometry  
##  MULTIPOLYGON :318  
##  POLYGON      :  5  
##  epsg:NA      :  0  
##  +proj=tmer...:  0  
##                     
##                     
## 

As we observe, no NA remains.

5.6.3 Deriving new Urban Function variables using dplyr

We will not be deriving any new variables from the urban functions. Regardless of subzone area size, the Central Region in Singapore always has a higher number of businesses, industry functions tend to be located further away from housing areas, and shopping and government buildings such as community centres tend to be located around housing or business locations. We therefore join the urban function counts directly to the derived indicators (Indicators_derived).

mpsz_All <- left_join(mpsz_UF,Indicators_derived)
## Joining, by = "SUBZONE_N"
## Warning: Column `SUBZONE_N` joining factor and character vector, coercing into
## character vector
summary(mpsz_All)
##     OBJECTID       SUBZONE_NO      SUBZONE_N           SUBZONE_C   CA_IND 
##  Min.   :  1.0   Min.   : 1.000   Length:323         AMSZ01 :  1   N:274  
##  1st Qu.: 81.5   1st Qu.: 2.000   Class :character   AMSZ02 :  1   Y: 49  
##  Median :162.0   Median : 4.000   Mode  :character   AMSZ03 :  1          
##  Mean   :162.0   Mean   : 4.625                      AMSZ04 :  1          
##  3rd Qu.:242.5   3rd Qu.: 6.500                      AMSZ05 :  1          
##  Max.   :323.0   Max.   :17.000                      AMSZ06 :  1          
##                                                      (Other):317          
##          PLN_AREA_N    PLN_AREA_C               REGION_N   REGION_C 
##  BUKIT MERAH  : 17   BM     : 17   CENTRAL REGION   :134   CR :134  
##  QUEENSTOWN   : 15   QT     : 15   EAST REGION      : 30   ER : 30  
##  ANG MO KIO   : 12   AM     : 12   NORTH-EAST REGION: 48   NER: 48  
##  DOWNTOWN CORE: 12   DT     : 12   NORTH REGION     : 41   NR : 41  
##  TOA PAYOH    : 12   TP     : 12   WEST REGION      : 70   WR : 70  
##  HOUGANG      : 10   HG     : 10                                    
##  (Other)      :245   (Other):245                                    
##              INC_CRC      FMEL_UPD_D             X_ADDR          Y_ADDR     
##  00F5E30B5C9B7AD8:  1   Min.   :2014-12-05   Min.   : 5093   Min.   :19579  
##  013B509B8EDF15BE:  1   1st Qu.:2014-12-05   1st Qu.:21864   1st Qu.:31776  
##  01A4287FB060A0A6:  1   Median :2014-12-05   Median :28465   Median :35113  
##  029BD940F4455194:  1   Mean   :2014-12-05   Mean   :27257   Mean   :36106  
##  0524461C92F35D94:  1   3rd Qu.:2014-12-05   3rd Qu.:31674   3rd Qu.:39869  
##  05FD555397CBEE7A:  1   Max.   :2014-12-05   Max.   :50425   Max.   :49553  
##  (Other)         :317                                                       
##    SHAPE_Leng        SHAPE_Area          Business         Industry     
##  Min.   :  871.5   Min.   :   39438   Min.   :  0.00   Min.   :0.0000  
##  1st Qu.: 3709.6   1st Qu.:  628261   1st Qu.:  0.00   1st Qu.:0.0000  
##  Median : 5211.9   Median : 1229894   Median :  2.00   Median :0.0000  
##  Mean   : 6524.4   Mean   : 2420882   Mean   : 19.94   Mean   :0.3406  
##  3rd Qu.: 6942.6   3rd Qu.: 2106483   3rd Qu.: 14.00   3rd Qu.:0.0000  
##  Max.   :68083.9   Max.   :69748299   Max.   :308.00   Max.   :8.0000  
##                                                                        
##    Financial       Govt_Embassy    Private_residential    Shopping     
##  Min.   :  0.00   Min.   : 0.000   Min.   :  0.00      Min.   : 0.000  
##  1st Qu.:  1.00   1st Qu.: 0.000   1st Qu.:  0.00      1st Qu.: 0.000  
##  Median :  5.00   Median : 0.000   Median :  4.00      Median : 0.000  
##  Mean   : 10.28   Mean   : 1.372   Mean   : 11.16      Mean   : 1.582  
##  3rd Qu.: 13.00   3rd Qu.: 1.000   3rd Qu.: 11.00      3rd Qu.: 1.000  
##  Max.   :134.00   Max.   :19.000   Max.   :217.00      Max.   :31.000  
##                                                                        
##     E_Active          Young            Aged            Pop        
##  Min.   : 512.4   Min.   :  0.0   Min.   :  0.0   Min.   :    10  
##  1st Qu.: 573.0   1st Qu.:218.0   1st Qu.:106.3   1st Qu.:  3330  
##  Median : 592.8   Median :254.2   Median :151.0   Median : 11640  
##  Mean   : 601.8   Mean   :247.9   Mean   :150.3   Mean   : 17557  
##  3rd Qu.: 607.9   3rd Qu.:287.6   3rd Qu.:192.7   3rd Qu.: 26505  
##  Max.   :1000.0   Max.   :360.0   Max.   :325.9   Max.   :132480  
##  NA's   :95       NA's   :95      NA's   :95      NA's   :95      
##       dens            HDB_1_2          HDB_3_4         HDB_5_EC    
##  Min.   :0.00001   Min.   :  0.00   Min.   :  0.0   Min.   :  0.0  
##  1st Qu.:0.00440   1st Qu.:  0.00   1st Qu.:  0.0   1st Qu.:  0.0  
##  Median :0.01222   Median :  0.00   Median :402.7   Median :144.0  
##  Mean   :0.01510   Mean   : 40.37   Mean   :355.0   Mean   :164.1  
##  3rd Qu.:0.02474   3rd Qu.: 47.94   3rd Qu.:606.7   3rd Qu.:259.4  
##  Max.   :0.04606   Max.   :712.93   Max.   :948.1   Max.   :836.4  
##  NA's   :95        NA's   :95       NA's   :95      NA's   :95     
##    Condo_Apt       LandedProperty            geometry  
##  Min.   :   0.00   Min.   :   0.0   MULTIPOLYGON :318  
##  1st Qu.:  26.75   1st Qu.:   0.0   POLYGON      :  5  
##  Median : 145.71   Median :   0.0   epsg:NA      :  0  
##  Mean   : 307.46   Mean   : 133.0   +proj=tmer...:  0  
##  3rd Qu.: 491.72   3rd Qu.: 145.2                      
##  Max.   :1000.00   Max.   :1000.0                      
##  NA's   :95        NA's   :95

The summary shows 95 NA values in each indicator column; these belong to subzones that have no data for the respective indicator. We will replace these NA values with 0 to record that the subzone lacks the indicator.

5.6.4 Replacing NA in our mpsz_All

mpsz_All <- replace(mpsz_All, is.na(mpsz_All),0)
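
As a small illustration of what this call does, the sketch below applies the same replace() and is.na() pattern to a toy data frame. The data frame toy and its values are hypothetical and only stand in for the indicator columns.

```r
# Toy stand-in for the indicator columns (hypothetical values)
toy <- data.frame(
  Young = c(254.2, NA, 218.0),
  Aged  = c(NA, 151.0, 106.3)
)

# is.na() returns a logical matrix marking the missing cells;
# replace() overwrites exactly those cells with 0
toy_filled <- replace(toy, is.na(toy), 0)

print(toy_filled)
print(anyNA(toy_filled))
```

A 0 produced this way cannot be told apart from a genuinely observed 0, so the quartiles of the indicator columns shift downwards once the NA values are filled.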

5.6.5 Ensuring no NA remains in our mpsz_All

mpsz_All[rowSums(is.na(mpsz_All))!=0,]
## Simple feature collection with 0 features and 31 fields
## bbox:           xmin: NA ymin: NA xmax: NA ymax: NA
## projected CRS:  SVY21
##  [1] OBJECTID            SUBZONE_NO          SUBZONE_N          
##  [4] SUBZONE_C           CA_IND              PLN_AREA_N         
##  [7] PLN_AREA_C          REGION_N            REGION_C           
## [10] INC_CRC             FMEL_UPD_D          X_ADDR             
## [13] Y_ADDR              SHAPE_Leng          SHAPE_Area         
## [16] Business            Industry            Financial          
## [19] Govt_Embassy        Private_residential Shopping           
## [22] E_Active            Young               Aged               
## [25] Pop                 dens                HDB_1_2            
## [28] HDB_3_4             HDB_5_EC            Condo_Apt          
## [31] LandedProperty      geometry           
## <0 rows> (or 0-length row.names)
summary(mpsz_All)
##     OBJECTID       SUBZONE_NO      SUBZONE_N           SUBZONE_C   CA_IND 
##  Min.   :  1.0   Min.   : 1.000   Length:323         AMSZ01 :  1   N:274  
##  1st Qu.: 81.5   1st Qu.: 2.000   Class :character   AMSZ02 :  1   Y: 49  
##  Median :162.0   Median : 4.000   Mode  :character   AMSZ03 :  1          
##  Mean   :162.0   Mean   : 4.625                      AMSZ04 :  1          
##  3rd Qu.:242.5   3rd Qu.: 6.500                      AMSZ05 :  1          
##  Max.   :323.0   Max.   :17.000                      AMSZ06 :  1          
##                                                      (Other):317          
##          PLN_AREA_N    PLN_AREA_C               REGION_N   REGION_C 
##  BUKIT MERAH  : 17   BM     : 17   CENTRAL REGION   :134   CR :134  
##  QUEENSTOWN   : 15   QT     : 15   EAST REGION      : 30   ER : 30  
##  ANG MO KIO   : 12   AM     : 12   NORTH-EAST REGION: 48   NER: 48  
##  DOWNTOWN CORE: 12   DT     : 12   NORTH REGION     : 41   NR : 41  
##  TOA PAYOH    : 12   TP     : 12   WEST REGION      : 70   WR : 70  
##  HOUGANG      : 10   HG     : 10                                    
##  (Other)      :245   (Other):245                                    
##              INC_CRC      FMEL_UPD_D             X_ADDR          Y_ADDR     
##  00F5E30B5C9B7AD8:  1   Min.   :2014-12-05   Min.   : 5093   Min.   :19579  
##  013B509B8EDF15BE:  1   1st Qu.:2014-12-05   1st Qu.:21864   1st Qu.:31776  
##  01A4287FB060A0A6:  1   Median :2014-12-05   Median :28465   Median :35113  
##  029BD940F4455194:  1   Mean   :2014-12-05   Mean   :27257   Mean   :36106  
##  0524461C92F35D94:  1   3rd Qu.:2014-12-05   3rd Qu.:31674   3rd Qu.:39869  
##  05FD555397CBEE7A:  1   Max.   :2014-12-05   Max.   :50425   Max.   :49553  
##  (Other)         :317                                                       
##    SHAPE_Leng        SHAPE_Area          Business         Industry     
##  Min.   :  871.5   Min.   :   39438   Min.   :  0.00   Min.   :0.0000  
##  1st Qu.: 3709.6   1st Qu.:  628261   1st Qu.:  0.00   1st Qu.:0.0000  
##  Median : 5211.9   Median : 1229894   Median :  2.00   Median :0.0000  
##  Mean   : 6524.4   Mean   : 2420882   Mean   : 19.94   Mean   :0.3406  
##  3rd Qu.: 6942.6   3rd Qu.: 2106483   3rd Qu.: 14.00   3rd Qu.:0.0000  
##  Max.   :68083.9   Max.   :69748299   Max.   :308.00   Max.   :8.0000  
##                                                                        
##    Financial       Govt_Embassy    Private_residential    Shopping     
##  Min.   :  0.00   Min.   : 0.000   Min.   :  0.00      Min.   : 0.000  
##  1st Qu.:  1.00   1st Qu.: 0.000   1st Qu.:  0.00      1st Qu.: 0.000  
##  Median :  5.00   Median : 0.000   Median :  4.00      Median : 0.000  
##  Mean   : 10.28   Mean   : 1.372   Mean   : 11.16      Mean   : 1.582  
##  3rd Qu.: 13.00   3rd Qu.: 1.000   3rd Qu.: 11.00      3rd Qu.: 1.000  
##  Max.   :134.00   Max.   :19.000   Max.   :217.00      Max.   :31.000  
##                                                                        
##     E_Active          Young            Aged            Pop        
##  Min.   :   0.0   Min.   :  0.0   Min.   :  0.0   Min.   :     0  
##  1st Qu.:   0.0   1st Qu.:  0.0   1st Qu.:  0.0   1st Qu.:     0  
##  Median : 576.6   Median :223.7   Median :111.1   Median :  4880  
##  Mean   : 424.8   Mean   :175.0   Mean   :106.1   Mean   : 12393  
##  3rd Qu.: 600.6   3rd Qu.:271.9   3rd Qu.:177.3   3rd Qu.: 17035  
##  Max.   :1000.0   Max.   :360.0   Max.   :325.9   Max.   :132480  
##                                                                   
##       dens             HDB_1_2          HDB_3_4         HDB_5_EC    
##  Min.   :0.000000   Min.   :  0.00   Min.   :  0.0   Min.   :  0.0  
##  1st Qu.:0.000000   1st Qu.:  0.00   1st Qu.:  0.0   1st Qu.:  0.0  
##  Median :0.005857   Median :  0.00   Median :  0.0   Median :  0.0  
##  Mean   :0.010662   Mean   : 28.50   Mean   :250.6   Mean   :115.9  
##  3rd Qu.:0.019864   3rd Qu.: 28.99   3rd Qu.:504.3   3rd Qu.:207.2  
##  Max.   :0.046058   Max.   :712.93   Max.   :948.1   Max.   :836.4  
##                                                                     
##    Condo_Apt       LandedProperty             geometry  
##  Min.   :   0.00   Min.   :   0.00   MULTIPOLYGON :318  
##  1st Qu.:   0.00   1st Qu.:   0.00   POLYGON      :  5  
##  Median :  45.93   Median :   0.00   epsg:NA      :  0  
##  Mean   : 217.03   Mean   :  93.90   +proj=tmer...:  0  
##  3rd Qu.: 300.34   3rd Qu.:  38.87                      
##  Max.   :1000.00   Max.   :1000.00                      
## 

6 Exploratory Data Analysis

6.1 Using Histograms

We will use histograms to examine the overall distribution of each variable.

E_ActiveHist <- ggplot(data=mpsz_All, 
             aes(x= `E_Active`)) +
  geom_histogram(bins=20, 
                 color="black", 
                 fill="light blue")

YoungHist <- ggplot(data=mpsz_All, 
             aes(x= `Young`)) +
  geom_histogram(bins=20, 
                 color="black", 
                 fill="light blue")

AgedHist <- ggplot(data=mpsz_All, 
             aes(x= `Aged`)) +
  geom_histogram(bins=20, 
                 color="black", 
                 fill="light blue")

densHist <- ggplot(data=mpsz_All, 
             aes(x= `dens`)) +
  geom_histogram(bins=20, 
                 color="black", 
                 fill="light blue")

HDB_1_2Hist <- ggplot(data=mpsz_All, 
             aes(x= `HDB_1_2`)) +
  geom_histogram(bins=20, 
                 color="black", 
                 fill="light blue")

HDB_3_4Hist <- ggplot(data=mpsz_All, 
             aes(x= `HDB_3_4`)) +
  geom_histogram(bins=20, 
                 color="black", 
                 fill="light blue")

HDB_5_ECHist <- ggplot(data=mpsz_All, 
             aes(x= `HDB_5_EC`)) +
  geom_histogram(bins=20, 
                 color="black", 
                 fill="light blue")

Condo_AptHist <- ggplot(data=mpsz_All, 
             aes(x= `Condo_Apt`)) +
  geom_histogram(bins=20, 
                 color="black", 
                 fill="light blue")

LandedPropertyHist <- ggplot(data=mpsz_All, 
             aes(x= `LandedProperty`)) +
  geom_histogram(bins=20, 
                 color="black", 
                 fill="light blue")

BusinessHist <- ggplot(data=mpsz_All, 
             aes(x= `Business`)) +
  geom_histogram(bins=20, 
                 color="black", 
                 fill="light blue")

IndustryHist <- ggplot(data=mpsz_All, 
             aes(x= `Industry`)) +
  geom_histogram(bins=20, 
                 color="black", 
                 fill="light blue")

FinancialHist <- ggplot(data=mpsz_All, 
             aes(x= `Financial`)) +
  geom_histogram(bins=20, 
                 color="black", 
                 fill="light blue")

Govt_EmbassyHist <- ggplot(data=mpsz_All, 
             aes(x= `Govt_Embassy`)) +
  geom_histogram(bins=20, 
                 color="black", 
                 fill="light blue")

Private_residentialHist <- ggplot(data=mpsz_All, 
             aes(x= `Private_residential`)) +
  geom_histogram(bins=20, 
                 color="black", 
                 fill="light blue")

ShoppingHist <- ggplot(data=mpsz_All, 
             aes(x= `Shopping`)) +
  geom_histogram(bins=20, 
                 color="black", 
                 fill="light blue")
ggarrange(E_ActiveHist, YoungHist, AgedHist, densHist, HDB_1_2Hist, HDB_3_4Hist, HDB_5_ECHist, Condo_AptHist, LandedPropertyHist, BusinessHist, IndustryHist, FinancialHist, Govt_EmbassyHist, Private_residentialHist, ShoppingHist,
          ncol = 3, 
          nrow = 5)

We observe that most of our variables here are not normally distributed. We will factor this insight into our standardisation method choice.
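
Since the histogram blocks above differ only in the column name, they could also be generated from a vector of names. The following is a minimal sketch of that idea, assuming ggplot2 is loaded; make_hist() and the toy data frame are hypothetical names introduced for illustration.

```r
library(ggplot2)

# Hypothetical helper: one histogram for a single named column
make_hist <- function(df, var) {
  ggplot(data = df, aes(x = .data[[var]])) +
    geom_histogram(bins = 20, color = "black", fill = "light blue") +
    labs(x = var)
}

# Toy data standing in for mpsz_All (hypothetical values)
set.seed(42)
toy <- data.frame(
  Young    = runif(50, 0, 360),
  Aged     = runif(50, 0, 326),
  Business = rpois(50, 20)
)

# One plot per column name, returned as a named list
hists <- lapply(setNames(names(toy), names(toy)), function(v) make_hist(toy, v))
```

The resulting list can be passed to ggarrange() through its plotlist argument, for example ggarrange(plotlist = hists, ncol = 3, nrow = 1).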

6.2 Correlation Analysis

Before performing cluster analysis, it is important to identify highly correlated input variables and avoid using both variables of such a pair.

We will use the corrplot.mixed() function of the corrplot package to visualise and analyse the correlation of the input variables. We treat pairs whose correlation coefficient has a magnitude of 0.7 or higher as highly correlated; such pairs should not be used together.

We must convert mpsz_All to a data frame before we can use it for our correlation analysis.

6.2.1 Setting mpsz_All as dataframe

We also drop Pop (population) so that it is excluded from the correlation analysis.

mpsz_All_NoPop <- select(mpsz_All, -c("Pop"))
mpsz_All_NoPop_df <- as.data.frame(mpsz_All_NoPop)

7 Urban Functions

cluster_vars.cor = cor(mpsz_All_NoPop_df[,16:21])
corrplot.mixed(cluster_vars.cor,
               lower = "ellipse", 
               upper = "number",
               tl.pos = "lt",
               diag = "l",
               tl.col = "black")

We observe a correlation coefficient of magnitude 0.72 between Financial and Shopping among our urban functions. We will keep Financial and drop Shopping for our clustering analysis moving forward.

8 Indicators

cluster_vars.cor = cor(mpsz_All_NoPop_df[,22:30])
corrplot.mixed(cluster_vars.cor,
               lower = "ellipse", 
               upper = "number",
               tl.pos = "lt",
               diag = "l",
               tl.col = "black")

Next, we observe correlation coefficient magnitudes of 0.87 between E_Active and Young, 0.72 between E_Active and Aged, 0.74 between dens and HDB_3_4, and 0.72 between dens and HDB_5_EC. We will keep Young and Aged in place of E_Active, and HDB_3_4 and HDB_5_EC in place of dens, for our clustering analysis moving forward.
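
Reading the pairs off a correlation plot can also be cross-checked programmatically. The sketch below lists every variable pair whose correlation magnitude reaches a threshold, using base R only. The helper high_cor_pairs() and the toy data are hypothetical; on the real data the function would be applied to the relevant columns of mpsz_All_NoPop_df.

```r
# Toy data standing in for the input variables (hypothetical values)
set.seed(1)
x <- rnorm(100)
toy <- data.frame(
  a = x,
  b = x + rnorm(100, sd = 0.3),  # strongly correlated with a
  c = rnorm(100)                 # independent of a and b
)

# Hypothetical helper: every variable pair whose |r| meets the threshold
high_cor_pairs <- function(df, threshold = 0.7) {
  cm <- cor(df)
  # upper.tri() keeps each pair once and skips the diagonal
  idx <- which(abs(cm) >= threshold & upper.tri(cm), arr.ind = TRUE)
  data.frame(
    var1 = rownames(cm)[idx[, "row"]],
    var2 = colnames(cm)[idx[, "col"]],
    r    = cm[idx],
    stringsAsFactors = FALSE
  )
}

flagged <- high_cor_pairs(toy)
print(flagged)
```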

9 Hierarchical Cluster Analysis

9.1 Extracting Analysis Variables

Next we will perform hierarchical cluster analysis.

To begin, we extract the clustering variables for our analysis from mpsz_All.

cluster_vars <- mpsz_All %>%
  st_set_geometry(NULL) %>%
  select("SUBZONE_N", "Young", "Aged", "HDB_1_2", "HDB_3_4", "HDB_5_EC", "Condo_Apt", "LandedProperty", "Business", "Industry", "Financial", "Govt_Embassy", "Private_residential")

head(cluster_vars,10)
##          SUBZONE_N    Young     Aged  HDB_1_2  HDB_3_4  HDB_5_EC  Condo_Apt
## 1     MARINA SOUTH   0.0000   0.0000   0.0000   0.0000   0.00000    0.00000
## 2     PEARL'S HILL 167.1924 315.4574 712.9338 220.8202   0.00000   66.24606
## 3        BOAT QUAY   0.0000 200.0000   0.0000   0.0000   0.00000 1000.00000
## 4   HENDERSON HILL 195.8146 243.6472 293.7220 597.1599  94.91779   14.20030
## 5          REDHILL 266.4165 146.3415 184.8030 330.2064 303.93996  181.05066
## 6   ALEXANDRA HILL 215.7303 233.7079 292.8839 473.4082 233.70787    0.00000
## 7    BUKIT HO SWEE 193.2203 243.3898 250.8475 583.7288 128.81356   36.61017
## 8      CLARKE QUAY   0.0000   0.0000   0.0000   0.0000   0.00000 1000.00000
## 9  PASIR PANJANG 1 252.2727 122.7273   0.0000   0.0000   0.00000  675.00000
## 10       QUEENSWAY 103.4483 172.4138   0.0000   0.0000   0.00000 1000.00000
##    LandedProperty Business Industry Financial Govt_Embassy Private_residential
## 1               0        0        0         3            0                   0
## 2               0        6        0        25            1                   6
## 3               0       40        0         2            2                   1
## 4               0        0        0         4            0                   5
## 5               0        2        0        12            0                   6
## 6               0       39        1        15            7                   4
## 7               0        6        0         6            4                  11
## 8               0       12        0        19            4                   6
## 9             325       16        0         4            0                  56
## 10              0        0        0         2            0                   1

Next we will replace the row numbers with the subzone names.

row.names(cluster_vars) <- cluster_vars$"SUBZONE_N"
head(cluster_vars,10)
##                       SUBZONE_N    Young     Aged  HDB_1_2  HDB_3_4  HDB_5_EC
## MARINA SOUTH       MARINA SOUTH   0.0000   0.0000   0.0000   0.0000   0.00000
## PEARL'S HILL       PEARL'S HILL 167.1924 315.4574 712.9338 220.8202   0.00000
## BOAT QUAY             BOAT QUAY   0.0000 200.0000   0.0000   0.0000   0.00000
## HENDERSON HILL   HENDERSON HILL 195.8146 243.6472 293.7220 597.1599  94.91779
## REDHILL                 REDHILL 266.4165 146.3415 184.8030 330.2064 303.93996
## ALEXANDRA HILL   ALEXANDRA HILL 215.7303 233.7079 292.8839 473.4082 233.70787
## BUKIT HO SWEE     BUKIT HO SWEE 193.2203 243.3898 250.8475 583.7288 128.81356
## CLARKE QUAY         CLARKE QUAY   0.0000   0.0000   0.0000   0.0000   0.00000
## PASIR PANJANG 1 PASIR PANJANG 1 252.2727 122.7273   0.0000   0.0000   0.00000
## QUEENSWAY             QUEENSWAY 103.4483 172.4138   0.0000   0.0000   0.00000
##                  Condo_Apt LandedProperty Business Industry Financial
## MARINA SOUTH       0.00000              0        0        0         3
## PEARL'S HILL      66.24606              0        6        0        25
## BOAT QUAY       1000.00000              0       40        0         2
## HENDERSON HILL    14.20030              0        0        0         4
## REDHILL          181.05066              0        2        0        12
## ALEXANDRA HILL     0.00000              0       39        1        15
## BUKIT HO SWEE     36.61017              0        6        0         6
## CLARKE QUAY     1000.00000              0       12        0        19
## PASIR PANJANG 1  675.00000            325       16        0         4
## QUEENSWAY       1000.00000              0        0        0         2
##                 Govt_Embassy Private_residential
## MARINA SOUTH               0                   0
## PEARL'S HILL               1                   6
## BOAT QUAY                  2                   1
## HENDERSON HILL             0                   5
## REDHILL                    0                   6
## ALEXANDRA HILL             7                   4
## BUKIT HO SWEE              4                  11
## CLARKE QUAY                4                   6
## PASIR PANJANG 1            0                  56
## QUEENSWAY                  0                   1

We now see that the row names are the subzone names. We can then remove the SUBZONE_N column.

mpsz_All_cVar <- select(cluster_vars, c(2:13))

9.2 Data Standardisation

9.2.1 Min-Max Standardisation

Since our variables are not all normally distributed, we will choose Min-Max Standardisation, as Z-score standardisation is less appropriate for non-normal distributions.

mpsz_All_cVar.std <- normalize(mpsz_All_cVar)
summary(mpsz_All_cVar.std)
##      Young             Aged           HDB_1_2           HDB_3_4      
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.00000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.0000  
##  Median :0.6215   Median :0.3409   Median :0.00000   Median :0.0000  
##  Mean   :0.4861   Mean   :0.3256   Mean   :0.03997   Mean   :0.2643  
##  3rd Qu.:0.7553   3rd Qu.:0.5439   3rd Qu.:0.04067   3rd Qu.:0.5319  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.00000   Max.   :1.0000  
##     HDB_5_EC        Condo_Apt       LandedProperty       Business       
##  Min.   :0.0000   Min.   :0.00000   Min.   :0.00000   Min.   :0.000000  
##  1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.000000  
##  Median :0.0000   Median :0.04593   Median :0.00000   Median :0.006494  
##  Mean   :0.1385   Mean   :0.21703   Mean   :0.09390   Mean   :0.064734  
##  3rd Qu.:0.2477   3rd Qu.:0.30034   3rd Qu.:0.03887   3rd Qu.:0.045455  
##  Max.   :1.0000   Max.   :1.00000   Max.   :1.00000   Max.   :1.000000  
##     Industry         Financial         Govt_Embassy     Private_residential
##  Min.   :0.00000   Min.   :0.000000   Min.   :0.00000   Min.   :0.00000    
##  1st Qu.:0.00000   1st Qu.:0.007463   1st Qu.:0.00000   1st Qu.:0.00000    
##  Median :0.00000   Median :0.037313   Median :0.00000   Median :0.01843    
##  Mean   :0.04257   Mean   :0.076706   Mean   :0.07219   Mean   :0.05142    
##  3rd Qu.:0.00000   3rd Qu.:0.097015   3rd Qu.:0.05263   3rd Qu.:0.05069    
##  Max.   :1.00000   Max.   :1.000000   Max.   :1.00000   Max.   :1.00000
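
For reference, min-max standardisation rescales each column to the range 0 to 1 using (x - min) / (max - min). The sketch below implements that formula in base R on a toy data frame. The helper min_max() and the toy values are hypothetical; this illustrates the formula and is not the source code of normalize().

```r
# Hypothetical helper: rescale one numeric vector to the range [0, 1]
min_max <- function(x) {
  rng <- range(x, na.rm = TRUE)
  # A constant column has max == min; return zeros to avoid dividing by 0
  if (rng[1] == rng[2]) return(rep(0, length(x)))
  (x - rng[1]) / (rng[2] - rng[1])
}

# Toy stand-in for two of the clustering variables (hypothetical values)
toy <- data.frame(Young = c(0, 180, 360), Business = c(0, 77, 308))

# Apply the rescaling column by column
toy_std <- as.data.frame(lapply(toy, min_max))
print(toy_std)
```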

9.3 Using Histograms

We will again use histograms to check the distributions of the standardised values.

YoungHist.std <- ggplot(data=mpsz_All_cVar.std, 
             aes(x= `Young`)) +
  geom_histogram(bins=20, 
                 color="black", 
                 fill="light blue")

AgedHist.std <- ggplot(data=mpsz_All_cVar.std, 
             aes(x= `Aged`)) +
  geom_histogram(bins=20, 
                 color="black", 
                 fill="light blue")

HDB_1_2Hist.std <- ggplot(data=mpsz_All_cVar.std, 
             aes(x= `HDB_1_2`)) +
  geom_histogram(bins=20, 
                 color="black", 
                 fill="light blue")

HDB_3_4Hist.std <- ggplot(data=mpsz_All_cVar.std, 
             aes(x= `HDB_3_4`)) +
  geom_histogram(bins=20, 
                 color="black", 
                 fill="light blue")

HDB_5_ECHist.std <- ggplot(data=mpsz_All_cVar.std, 
             aes(x= `HDB_5_EC`)) +
  geom_histogram(bins=20, 
                 color="black", 
                 fill="light blue")

Condo_AptHist.std <- ggplot(data=mpsz_All_cVar.std, 
             aes(x= `Condo_Apt`)) +
  geom_histogram(bins=20, 
                 color="black", 
                 fill="light blue")

LandedPropertyHist.std <- ggplot(data=mpsz_All_cVar.std, 
             aes(x= `LandedProperty`)) +
  geom_histogram(bins=20, 
                 color="black", 
                 fill="light blue")

BusinessHist.std <- ggplot(data=mpsz_All_cVar.std, 
             aes(x= `Business`)) +
  geom_histogram(bins=20, 
                 color="black", 
                 fill="light blue")

IndustryHist.std <- ggplot(data=mpsz_All_cVar.std, 
             aes(x= `Industry`)) +
  geom_histogram(bins=20, 
                 color="black", 
                 fill="light blue")

FinancialHist.std <- ggplot(data=mpsz_All_cVar.std, 
             aes(x= `Financial`)) +
  geom_histogram(bins=20, 
                 color="black", 
                 fill="light blue")

Govt_EmbassyHist.std <- ggplot(data=mpsz_All_cVar.std, 
             aes(x= `Govt_Embassy`)) +
  geom_histogram(bins=20, 
                 color="black", 
                 fill="light blue")

Private_residentialHist.std <- ggplot(data=mpsz_All_cVar.std, 
             aes(x= `Private_residential`)) +
  geom_histogram(bins=20, 
                 color="black", 
                 fill="light blue")
ggarrange(YoungHist.std, AgedHist.std, HDB_1_2Hist.std, HDB_3_4Hist.std, HDB_5_ECHist.std, Condo_AptHist.std, LandedPropertyHist.std, BusinessHist.std, IndustryHist.std, FinancialHist.std, Govt_EmbassyHist.std, Private_residentialHist.std,
          ncol = 3, 
          nrow = 4)

We observe that our data now range between 0 and 1. However, the shapes of the distributions are unchanged, because min-max standardisation is a linear rescaling.

9.4 Proximity Matrix

proxmat <- dist(mpsz_All_cVar.std, method = 'euclidean')

9.5 Selecting the Optimal Clustering Algorithm

m <- c( "average", "single", "complete", "ward")
names(m) <- c( "average", "single", "complete", "ward")

ac <- function(x) {
  agnes(mpsz_All_cVar.std, method = x)$ac
}

map_dbl(m, ac)
##   average    single  complete      ward 
## 0.8903011 0.8432187 0.9187196 0.9840449

We observe that Ward’s method has the highest agglomerative coefficient and thus provides the strongest clustering structure. Therefore we will use Ward’s method for the hierarchical clustering.
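
The agglomerative coefficient reported by agnes() lies between 0 and 1, with values closer to 1 indicating a stronger clustering structure. The sketch below repeats the comparison on the built-in USArrests data, which is used purely as a stand-in, and replaces map_dbl() with base sapply(). Note that, according to the hclust() documentation, the "ward" method of agnes() corresponds to "ward.D2" in hclust().

```r
library(cluster)  # provides agnes()

# Built-in data set used purely as a stand-in for the standardised variables
x <- scale(USArrests)

methods <- c(average = "average", single = "single",
             complete = "complete", ward = "ward")

# Agglomerative coefficient for each linkage method
ac_vals <- sapply(methods, function(m) agnes(x, method = m)$ac)
print(round(ac_vals, 3))
```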

9.5.1 Using Elbow Method of fviz_nbclust() to find Optimal K

fviz_nbclust(mpsz_All_cVar.std, FUN = hcut, method = "wss")+
     geom_vline(xintercept = 4, linetype = 2)+
   labs(subtitle = "Elbow Method")

For this method we see an elbow at K=4. We will try other methods to determine whether 4 is the optimal number of clusters.
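
To make the quantity behind the elbow plot concrete, the sketch below computes the total within-cluster sum of squares for a range of k using only base R. It uses the built-in USArrests data as a stand-in, and wss_for_k() is a hypothetical helper; it is not how fviz_nbclust() is implemented internally.

```r
# Hypothetical helper: total within-cluster sum of squares for k clusters
wss_for_k <- function(x, k, hc) {
  grp <- cutree(hc, k = k)
  sum(sapply(split(as.data.frame(x), grp), function(g) {
    # Squared deviations of each member from its cluster centroid
    sum(scale(g, center = TRUE, scale = FALSE)^2)
  }))
}

# Built-in data set used purely as a stand-in
x  <- scale(USArrests)
hc <- hclust(dist(x), method = "ward.D2")

wss <- sapply(1:8, function(k) wss_for_k(x, k, hc))
print(round(wss, 2))
```

Plotting wss against k and looking for the point where the decrease flattens is the elbow heuristic used above.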

9.5.2 Using Silhouette Method of fviz_nbclust() to find Optimal K

fviz_nbclust(mpsz_All_cVar.std, FUN=hcut, method="silhouette")+
    labs(subtitle = "Silhouette Method")

This method also suggests 4 as the optimal number of clusters, which we will adopt moving forward.

9.6 Hierarchical Clustering

hclust_ward <- hclust(proxmat, method = 'ward.D2')
plot(hclust_ward, cex = 0.6)
rect.hclust(hclust_ward, k = 4, border = 2:5)

9.7 Visually-Driven Hierarchical Clustering Analysis

9.8 Transforming the data frame into a matrix:

mpsz_All_cVar.std.mat <- data.matrix(mpsz_All_cVar.std)

9.9 Plotting Interactive Cluster Heatmap

heatmaply(mpsz_All_cVar.std.mat,
          Colv=NA,
          dist_method = "euclidean",
          hclust_method = "ward.D2",
          seriate = "OLO",
          colors = Blues,
          k_row = 4,
          margins = c(NA,200,60,NA),
          fontsize_row = 4,
          fontsize_col = 5,
          main="Geographic Segmentation of Singapore by Indicators & Urban Functions",
          xlab = "Indicators & Urban Functions",
          ylab = "Singapore Subzones"
          )

9.10 Mapping the Clusters Formed

groups <- as.factor(cutree(hclust_ward, k=4))
SZ_cluster <- cbind(mpsz_All, as.matrix(groups)) %>%
  rename(`CLUSTER`=`as.matrix.groups.`)

qtm(SZ_cluster, "CLUSTER")

We see that the clusters are very fragmented. This output reflects the limitation of hierarchical cluster analysis, as it is a non-spatial clustering algorithm.

10 Spatially Constrained Clustering - SKATER approach

10.1 Converting into SpatialPolygonsDataFrame

We will convert mpsz_All into a SpatialPolygonsDataFrame, as the SKATER functions used here work with sp objects.

mpsz_All_sp <- as_Spatial(mpsz_All)

10.2 Computing Neighbour List

Next we will compute the neighbour list

mpsz_All_sp.nb <- poly2nb(mpsz_All_sp)
summary(mpsz_All_sp.nb)
## Neighbour list object:
## Number of regions: 323 
## Number of nonzero links: 1934 
## Percentage nonzero weights: 1.853751 
## Average number of links: 5.987616 
## 5 regions with no links:
## 17 18 19 295 302
## Link number distribution:
## 
##  0  1  2  3  4  5  6  7  8  9 10 11 12 14 17 
##  5  2  6 10 26 77 87 51 34 16  3  3  1  1  1 
## 2 least connected regions:
## 16 234 with 1 link
## 1 most connected region:
## 313 with 17 links

10.2.1 Plotting the Neighbour List

plot(mpsz_All_sp, border=grey(.5))
plot(mpsz_All_sp.nb, coordinates(mpsz_All_sp), col = "red", add = TRUE)

We observe five subzones with no neighbours. Since edge costs cannot be calculated for subzones without neighbours, we will have to remove them.

mpsz_All_sp.nb.NZ <- subset(mpsz_All_sp.nb, subset = card(mpsz_All_sp.nb) > 0)
summary(mpsz_All_sp.nb.NZ)
## Neighbour list object:
## Number of regions: 318 
## Number of nonzero links: 1934 
## Percentage nonzero weights: 1.912503 
## Average number of links: 6.081761 
## Link number distribution:
## 
##  1  2  3  4  5  6  7  8  9 10 11 12 14 17 
##  2  6 10 26 77 87 51 34 16  3  3  1  1  1 
## 2 least connected regions:
## 16 234 with 1 link
## 1 most connected region:
## 313 with 17 links

As we can observe, no subzones without neighbours remain.

However, since we removed these subzones, we have to keep track of them and add them back later; otherwise we will not be able to plot our choropleth map.

Looking at the data, the 0 neighbour subzones are indexes 17:19, 295 & 302.
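
The indexes can also be derived rather than read off the summary. The sketch below mimics the idea on a toy neighbour list in base R; all objects in it are hypothetical. With the real objects, the equivalent mask would be card(mpsz_All_sp.nb) > 0. Because subsetting renumbers the remaining regions from 1 to 318, any attribute table used alongside the subsetted list, such as the one passed to nbcosts(), generally needs the same rows removed so that row i still describes region i.

```r
# Toy neighbour list in the spdep convention, where a lone 0L marks a
# region without neighbours (hypothetical data, 6 regions)
nb_toy <- list(c(2L, 3L), c(1L, 3L), c(1L, 2L), 0L, 6L, 5L)

# Number of neighbours per region, mirroring what card() reports
n_links <- vapply(nb_toy, function(v) sum(v > 0L), integer(1))

keep     <- n_links > 0   # regions that stay in the analysis
isolated <- which(!keep)  # indexes to add back later

# Hypothetical attribute table with one row per region
attrs <- data.frame(id = 1:6, Young = c(0.2, 0.4, 0.1, 0.0, 0.9, 0.8))

# Drop the same rows so that row i still describes region i
attrs_kept <- attrs[keep, , drop = FALSE]

print(isolated)
print(attrs_kept)
```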

SZ_add <- function(df, num, index) {
  df[seq(index+1,nrow(df)+1),] <- df[seq(index,nrow(df)),]
  df[index,] <- num
  df
}

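This function inserts a new row with the value num at position index of df and shifts the rows from index onwards down by one.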
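
To see the behaviour of SZ_add() on a small case, the sketch below re-inserts two dropped positions into a toy vector of cluster labels. The data frame kept and its values are hypothetical. Inserting in ascending order of the original indexes matters, because each insertion shifts the rows after it.

```r
# Same helper as above: insert the value num as a new row at position index
SZ_add <- function(df, num, index) {
  df[seq(index + 1, nrow(df) + 1), ] <- df[seq(index, nrow(df)), ]
  df[index, ] <- num
  df
}

# Hypothetical cluster labels for the regions that were kept (1, 3 and 5);
# regions 2 and 4 were dropped before clustering
kept <- data.frame(V1 = c(1, 1, 2))

# Re-insert in ascending order of the original indexes
restored <- SZ_add(kept, 0, 2)
restored <- SZ_add(restored, 0, 4)

print(restored$V1)
```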

10.3 Computing Minimum Spanning Tree

10.3.1 Calculating Edge Costs

lcosts <- nbcosts(mpsz_All_sp.nb.NZ, mpsz_All_cVar.std)
SZ.w <- nb2listw(mpsz_All_sp.nb.NZ, lcosts, style = "B")
summary(SZ.w)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 318 
## Number of nonzero links: 1934 
## Percentage nonzero weights: 1.912503 
## Average number of links: 6.081761 
## Link number distribution:
## 
##  1  2  3  4  5  6  7  8  9 10 11 12 14 17 
##  2  6 10 26 77 87 51 34 16  3  3  1  1  1 
## 2 least connected regions:
## 16 234 with 1 link
## 1 most connected region:
## 313 with 17 links
## 
## Weights style: B 
## Weights constants summary:
##     n     nn       S0       S1       S2
## B 318 101124 1831.917 4129.611 47721.25
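
As an illustration of what an edge cost is, the sketch below computes by hand the Euclidean distance between the attribute vectors of neighbouring regions, which is the default dissimilarity used by nbcosts(). The three-region neighbour list and the attribute matrix are hypothetical.

```r
# Hypothetical attributes: one row per region, two variables
attrs <- matrix(c(0, 0,
                  3, 4,
                  6, 8), ncol = 2, byrow = TRUE)

# Hypothetical neighbour list: region 2 touches regions 1 and 3
nb_toy <- list(2L, c(1L, 3L), 2L)

# For each region, the Euclidean distance to each of its neighbours
edge_costs <- lapply(seq_along(nb_toy), function(i) {
  sapply(nb_toy[[i]], function(j) sqrt(sum((attrs[i, ] - attrs[j, ])^2)))
})

print(edge_costs)
```

Links between similar neighbours receive low costs, so the minimum spanning tree built from these weights tends to connect subzones with similar profiles.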

10.3.2 Computing Minimum Spanning Tree

SZ.mst <- mstree(SZ.w)
class(SZ.mst)
## [1] "mst"    "matrix"
dim(SZ.mst)
## [1] 317   3

10.3.3 Plotting the Minimum Spanning Tree

plot(mpsz_All_sp, border=gray(.5))
plot.mst(SZ.mst, coordinates(mpsz_All_sp), col = "red", cex.lab = 0.7, cex.circles = 0.005, add = TRUE)

10.4 Computing Spatially Constrained Clusters using SKATER Method

We will first compute the clusters with the SKATER method. The third argument of skater() is the number of cuts, so 3 cuts produce 4 clusters.

clusters <- skater(SZ.mst[,1:2], mpsz_All_cVar.std, 3)

Next we will plot the pruned tree showing the 4 clusters

plot(mpsz_All_sp, border = gray(0.5))
plot(clusters, coordinates(mpsz_All_sp), cex.lab = 0.7,      
     groups.colors = c("red", "blue", "green", "brown", "pink"), cex.circles = 0.005, add = TRUE)
## Warning in segments(coords[id1, 1], coords[id1, 2], coords[id2, 1],
## coords[id2, : "add" is not a graphical parameter

10.5 Visualising the Clusters in a Choropleth Map

Before we can display the clusters in a choropleth map, we need to add back the subzones that had no neighbours.

We begin by converting the clusters into a matrix.

groups_mat <- as.matrix(clusters$groups)

Using the function we made earlier, we add back the subzones that were removed, assigning them to cluster 0.

df <- as.data.frame(groups_mat)
df <- SZ_add(df, 0 , 17)
df <- SZ_add(df, 0 , 18)
df <- SZ_add(df, 0 , 19)
df <- SZ_add(df, 0 , 295)
df <- SZ_add(df, 0 , 302)
groups_mat <- as.matrix(df)

10.6 Plotting the Clusters obtained from using the SKATER Method

SZ_spatialcluster <- cbind(SZ_cluster, as.factor(groups_mat)) %>%
  rename(`SP_CLUSTER`=`as.factor.groups_mat.`)
qtm(SZ_spatialcluster, "SP_CLUSTER")

10.7 Cluster Analysis

10.7.1 Separating the Clusters for an Overview of the Differences

The analysis in this section also refers to the pruned tree plotted above.

Cluster0 <- SZ_spatialcluster %>%
  filter(SP_CLUSTER == 0)
Cluster1 <- SZ_spatialcluster %>%
  filter(SP_CLUSTER == 1)
Cluster2 <- SZ_spatialcluster %>%
  filter(SP_CLUSTER == 2)
Cluster3 <- SZ_spatialcluster %>%
  filter(SP_CLUSTER == 3)
Cluster4 <- SZ_spatialcluster %>%
  filter(SP_CLUSTER == 4)
summary(Cluster0)
##     OBJECTID       SUBZONE_NO   SUBZONE_N           SUBZONE_C CA_IND
##  Min.   : 17.0   Min.   :1.0   Length:5           NESZ01 :1   N:5   
##  1st Qu.: 18.0   1st Qu.:2.0   Class :character   SISZ02 :1   Y:0   
##  Median : 19.0   Median :2.0   Mode  :character   SMSZ04 :1         
##  Mean   :130.2   Mean   :2.4                      WISZ02 :1         
##  3rd Qu.:295.0   3rd Qu.:3.0                      WISZ03 :1         
##  Max.   :302.0   Max.   :4.0                      AMSZ01 :0         
##                                                   (Other):0         
##                  PLN_AREA_N   PLN_AREA_C              REGION_N REGION_C
##  WESTERN ISLANDS      :2    WI     :2    CENTRAL REGION   :1   CR :1   
##  NORTH-EASTERN ISLANDS:1    NE     :1    EAST REGION      :0   ER :0   
##  SIMPANG              :1    SI     :1    NORTH-EAST REGION:1   NER:1   
##  SOUTHERN ISLANDS     :1    SM     :1    NORTH REGION     :1   NR :1   
##  ANG MO KIO           :0    AM     :0    WEST REGION      :2   WR :2   
##  BEDOK                :0    BD     :0                                  
##  (Other)              :0    (Other):0                                  
##              INC_CRC    FMEL_UPD_D             X_ADDR          Y_ADDR     
##  5809FC547293EA2D:1   Min.   :2014-12-05   Min.   :15932   Min.   :19579  
##  66E54DD5CE0C71A2:1   1st Qu.:2014-12-05   1st Qu.:21206   1st Qu.:20466  
##  92BC3E09C68F3B52:1   Median :2014-12-05   Median :29815   Median :23413  
##  E69207D4F76DEEA3:1   Mean   :2014-12-05   Mean   :29778   Mean   :30663  
##  F718C723E08FBD51:1   3rd Qu.:2014-12-05   3rd Qu.:31511   3rd Qu.:42613  
##  00F5E30B5C9B7AD8:0   Max.   :2014-12-05   Max.   :50425   Max.   :47245  
##  (Other)         :0                                                       
##    SHAPE_Leng      SHAPE_Area          Business    Industry   Financial
##  Min.   : 5466   Min.   : 1611279   Min.   :0   Min.   :0   Min.   :0  
##  1st Qu.:18704   1st Qu.: 2206319   1st Qu.:0   1st Qu.:0   1st Qu.:0  
##  Median :24759   Median : 4207271   Median :0   Median :0   Median :0  
##  Mean   :27398   Mean   :16047844   Mean   :0   Mean   :0   Mean   :0  
##  3rd Qu.:25627   3rd Qu.: 4963787   3rd Qu.:0   3rd Qu.:0   3rd Qu.:0  
##  Max.   :62436   Max.   :67250563   Max.   :0   Max.   :0   Max.   :0  
##                                                                        
##   Govt_Embassy Private_residential    Shopping    E_Active     Young  
##  Min.   :0     Min.   :0           Min.   :0   Min.   :0   Min.   :0  
##  1st Qu.:0     1st Qu.:0           1st Qu.:0   1st Qu.:0   1st Qu.:0  
##  Median :0     Median :0           Median :0   Median :0   Median :0  
##  Mean   :0     Mean   :0           Mean   :0   Mean   :0   Mean   :0  
##  3rd Qu.:0     3rd Qu.:0           3rd Qu.:0   3rd Qu.:0   3rd Qu.:0  
##  Max.   :0     Max.   :0           Max.   :0   Max.   :0   Max.   :0  
##                                                                       
##       Aged        Pop         dens      HDB_1_2     HDB_3_4     HDB_5_EC
##  Min.   :0   Min.   :0   Min.   :0   Min.   :0   Min.   :0   Min.   :0  
##  1st Qu.:0   1st Qu.:0   1st Qu.:0   1st Qu.:0   1st Qu.:0   1st Qu.:0  
##  Median :0   Median :0   Median :0   Median :0   Median :0   Median :0  
##  Mean   :0   Mean   :0   Mean   :0   Mean   :0   Mean   :0   Mean   :0  
##  3rd Qu.:0   3rd Qu.:0   3rd Qu.:0   3rd Qu.:0   3rd Qu.:0   3rd Qu.:0  
##  Max.   :0   Max.   :0   Max.   :0   Max.   :0   Max.   :0   Max.   :0  
##                                                                         
##    Condo_Apt LandedProperty CLUSTER SP_CLUSTER          geometry
##  Min.   :0   Min.   :0      1:5     0:5        MULTIPOLYGON :5  
##  1st Qu.:0   1st Qu.:0      2:0     1:0        epsg:NA      :0  
##  Median :0   Median :0      3:0     2:0        +proj=tmer...:0  
##  Mean   :0   Mean   :0      4:0     3:0                         
##  3rd Qu.:0   3rd Qu.:0              4:0                         
##  Max.   :0   Max.   :0                                          
## 
sum(Cluster0$SHAPE_Area) / 1e6 # Convert m^2 to km^2 (1 km^2 = 1,000,000 m^2)
## [1] 80.23922

Looking first at cluster 0, we see that it consists mainly of the islands around Singapore, such as Pulau Ubin, Pulau Tekong and the many southern islands. These are the areas without neighbours that we removed earlier, and the summary shows that they are void of all urban functions and indicators. Cluster 0 has a total area of about 80.24 km^2, making it the second largest of the 5 clusters.
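
As a quick unit check, one square kilometre is 1,000 m x 1,000 m = 1,000,000 m^2, so a total in m^2 is divided by 1e6 to obtain km^2. The sketch below is only an illustration: `m2_to_km2()` is a hypothetical helper that is not part of the original analysis, and the cluster 0 total is approximated as the mean SHAPE_Area multiplied by the number of subzones reported in the summary above.

```r
# Hypothetical helper (not in the original analysis): m^2 -> km^2.
m2_to_km2 <- function(area_m2) {
  area_m2 / 1000^2  # 1 km = 1000 m, so 1 km^2 = 1e6 m^2
}

# Approximate cluster 0 total from summary(Cluster0):
# mean SHAPE_Area (16047844 m^2) x 5 subzones.
cluster0_total_m2 <- 16047844 * 5
cluster0_total_km2 <- m2_to_km2(cluster0_total_m2)
print(cluster0_total_km2)
```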

summary(Cluster1)
##     OBJECTID       SUBZONE_NO      SUBZONE_N           SUBZONE_C   CA_IND 
##  Min.   :  9.0   Min.   : 1.000   Length:200         AMSZ01 :  1   N:200  
##  1st Qu.:138.8   1st Qu.: 2.000   Class :character   AMSZ02 :  1   Y:  0  
##  Median :189.5   Median : 4.000   Mode  :character   AMSZ03 :  1          
##  Mean   :188.0   Mean   : 4.415                      AMSZ04 :  1          
##  3rd Qu.:241.2   3rd Qu.: 6.000                      AMSZ05 :  1          
##  Max.   :319.0   Max.   :15.000                      AMSZ06 :  1          
##                                                      (Other):194          
##        PLN_AREA_N    PLN_AREA_C               REGION_N  REGION_C
##  ANG MO KIO : 12   AM     : 12   CENTRAL REGION   :48   CR :48  
##  TOA PAYOH  : 12   TP     : 12   EAST REGION      :30   ER :30  
##  JURONG EAST: 10   JE     : 10   NORTH-EAST REGION:42   NER:42  
##  BUKIT BATOK:  9   BK     :  9   NORTH REGION     :15   NR :15  
##  JURONG WEST:  9   JW     :  9   WEST REGION      :65   WR :65  
##  QUEENSTOWN :  9   QT     :  9                                  
##  (Other)    :139   (Other):139                                  
##              INC_CRC      FMEL_UPD_D             X_ADDR          Y_ADDR     
##  00F5E30B5C9B7AD8:  1   Min.   :2014-12-05   Min.   : 5093   Min.   :26138  
##  013B509B8EDF15BE:  1   1st Qu.:2014-12-05   1st Qu.:19288   1st Qu.:34203  
##  029BD940F4455194:  1   Median :2014-12-05   Median :28511   Median :36586  
##  05FD555397CBEE7A:  1   Mean   :2014-12-05   Mean   :26807   Mean   :37091  
##  0664CA7EF6504AE5:  1   3rd Qu.:2014-12-05   3rd Qu.:34108   3rd Qu.:39822  
##  0ABCF49C51112DC2:  1   Max.   :2014-12-05   Max.   :49502   Max.   :47683  
##  (Other)         :194                                                       
##    SHAPE_Leng      SHAPE_Area          Business         Industry    
##  Min.   : 1634   Min.   :  143138   Min.   :  0.00   Min.   :0.000  
##  1st Qu.: 4296   1st Qu.:  918875   1st Qu.:  0.00   1st Qu.:0.000  
##  Median : 5657   Median : 1444181   Median :  2.00   Median :0.000  
##  Mean   : 7214   Mean   : 2895375   Mean   : 26.15   Mean   :0.455  
##  3rd Qu.: 7563   3rd Qu.: 2408087   3rd Qu.: 19.75   3rd Qu.:0.000  
##  Max.   :68084   Max.   :69748299   Max.   :308.00   Max.   :8.000  
##                                                                     
##    Financial      Govt_Embassy    Private_residential    Shopping     
##  Min.   : 0.00   Min.   : 0.000   Min.   :  0.00      Min.   : 0.000  
##  1st Qu.: 1.00   1st Qu.: 0.000   1st Qu.:  0.00      1st Qu.: 0.000  
##  Median : 4.00   Median : 0.000   Median :  4.00      Median : 0.000  
##  Mean   : 8.74   Mean   : 0.795   Mean   : 12.13      Mean   : 0.935  
##  3rd Qu.:12.00   3rd Qu.: 1.000   3rd Qu.: 12.00      3rd Qu.: 1.000  
##  Max.   :79.00   Max.   :10.000   Max.   :217.00      Max.   :14.000  
##                                                                       
##     E_Active         Young            Aged            Pop        
##  Min.   :  0.0   Min.   :  0.0   Min.   :  0.0   Min.   :     0  
##  1st Qu.:  0.0   1st Qu.:  0.0   1st Qu.:  0.0   1st Qu.:     0  
##  Median :576.0   Median :235.9   Median :124.5   Median :  8160  
##  Mean   :420.0   Mean   :182.7   Mean   :107.4   Mean   : 15002  
##  3rd Qu.:598.5   3rd Qu.:275.4   3rd Qu.:173.5   3rd Qu.: 24755  
##  Max.   :800.0   Max.   :360.0   Max.   :269.0   Max.   :132480  
##                                                                  
##       dens             HDB_1_2          HDB_3_4          HDB_5_EC     
##  Min.   :0.000000   Min.   :  0.00   Min.   :  0.00   Min.   :  0.00  
##  1st Qu.:0.000000   1st Qu.:  0.00   1st Qu.:  0.00   1st Qu.:  0.00  
##  Median :0.006731   Median :  0.00   Median : 69.78   Median : 20.12  
##  Mean   :0.011663   Mean   : 21.42   Mean   :270.96   Mean   :126.61  
##  3rd Qu.:0.022283   3rd Qu.: 28.54   3rd Qu.:555.74   3rd Qu.:237.90  
##  Max.   :0.046058   Max.   :322.17   Max.   :837.65   Max.   :613.50  
##                                                                       
##    Condo_Apt       LandedProperty   CLUSTER SP_CLUSTER          geometry  
##  Min.   :   0.00   Min.   :   0.0   1:58    0:  0      MULTIPOLYGON :197  
##  1st Qu.:   0.00   1st Qu.:   0.0   2:97    1:200      POLYGON      :  3  
##  Median :  36.74   Median :   0.0   3:20    2:  0      epsg:NA      :  0  
##  Mean   : 165.90   Mean   : 125.1   4:25    3:  0      +proj=tmer...:  0  
##  3rd Qu.: 240.78   3rd Qu.: 118.6           4:  0                         
##  Max.   :1000.00   Max.   :1000.0                                         
## 
sum(Cluster1$SHAPE_Area) / 1e6 # Convert m^2 to km^2 (1 km^2 = 1,000,000 m^2)
## [1] 579.0749

Cluster 1 takes up the majority of Singapore's area and spans all five regions, so it has the largest total area at about 579.07 km^2.

summary(Cluster2)
##     OBJECTID       SUBZONE_NO     SUBZONE_N           SUBZONE_C  CA_IND
##  Min.   :259.0   Min.   :1.000   Length:27          CKSZ05 : 1   N:27  
##  1st Qu.:290.5   1st Qu.:2.500   Class :character   CKSZ06 : 1   Y: 0  
##  Median :300.0   Median :5.000   Mode  :character   MDSZ01 : 1         
##  Mean   :298.6   Mean   :4.667                      MDSZ02 : 1         
##  3rd Qu.:308.5   3rd Qu.:6.500                      SBSZ02 : 1         
##  Max.   :323.0   Max.   :9.000                      SBSZ03 : 1         
##                                                     (Other):21         
##          PLN_AREA_N   PLN_AREA_C              REGION_N  REGION_C
##  WOODLANDS    :9    WD     :9    CENTRAL REGION   : 0   CR : 0  
##  SEMBAWANG    :8    SB     :8    EAST REGION      : 0   ER : 0  
##  SIMPANG      :3    SM     :3    NORTH-EAST REGION: 0   NER: 0  
##  CHOA CHU KANG:2    CK     :2    NORTH REGION     :25   NR :25  
##  MANDAI       :2    MD     :2    WEST REGION      : 2   WR : 2  
##  YISHUN       :2    YS     :2                                   
##  (Other)      :1    (Other):1                                   
##              INC_CRC     FMEL_UPD_D             X_ADDR          Y_ADDR     
##  01A4287FB060A0A6: 1   Min.   :2014-12-05   Min.   :18348   Min.   :41594  
##  19529EBD71A301DD: 1   1st Qu.:2014-12-05   1st Qu.:22560   1st Qu.:45530  
##  1ED0377B40E71BDA: 1   Median :2014-12-05   Median :24666   Median :46959  
##  2E2DB30B78E2AC57: 1   Mean   :2014-12-05   Mean   :24720   Mean   :46512  
##  4215C006676A7D38: 1   3rd Qu.:2014-12-05   3rd Qu.:27023   3rd Qu.:48211  
##  42D5F52D334C615F: 1   Max.   :2014-12-05   Max.   :30568   Max.   :49553  
##  (Other)         :21                                                       
##    SHAPE_Leng      SHAPE_Area         Business         Industry     
##  Min.   : 3254   Min.   : 595652   Min.   :  0.00   Min.   :0.0000  
##  1st Qu.: 4428   1st Qu.:1094016   1st Qu.:  0.00   1st Qu.:0.0000  
##  Median : 5520   Median :1576001   Median :  1.00   Median :0.0000  
##  Mean   : 6361   Mean   :2023216   Mean   : 17.07   Mean   :0.3333  
##  3rd Qu.: 7305   3rd Qu.:2225299   3rd Qu.: 10.50   3rd Qu.:0.0000  
##  Max.   :11829   Max.   :7235809   Max.   :173.00   Max.   :3.0000  
##                                                                     
##    Financial       Govt_Embassy    Private_residential    Shopping     
##  Min.   : 0.000   Min.   :0.0000   Min.   : 0.000      Min.   :0.0000  
##  1st Qu.: 0.000   1st Qu.:0.0000   1st Qu.: 0.000      1st Qu.:0.0000  
##  Median : 2.000   Median :0.0000   Median : 1.000      Median :0.0000  
##  Mean   : 5.259   Mean   :0.5926   Mean   : 2.519      Mean   :0.8519  
##  3rd Qu.: 9.000   3rd Qu.:0.5000   3rd Qu.: 3.500      3rd Qu.:1.0000  
##  Max.   :21.000   Max.   :6.0000   Max.   :14.000      Max.   :8.0000  
##                                                                        
##     E_Active         Young            Aged             Pop       
##  Min.   :  0.0   Min.   :  0.0   Min.   :  0.00   Min.   :    0  
##  1st Qu.:  0.0   1st Qu.:  0.0   1st Qu.:  0.00   1st Qu.:    0  
##  Median :569.0   Median :233.5   Median : 75.60   Median : 4320  
##  Mean   :331.3   Mean   :164.9   Mean   : 59.34   Mean   :15767  
##  3rd Qu.:600.6   3rd Qu.:310.4   3rd Qu.: 91.70   3rd Qu.:32610  
##  Max.   :622.6   Max.   :338.3   Max.   :175.52   Max.   :98410  
##                                                                  
##       dens             HDB_1_2          HDB_3_4         HDB_5_EC    
##  Min.   :0.000000   Min.   :  0.00   Min.   :  0.0   Min.   :  0.0  
##  1st Qu.:0.000000   1st Qu.:  0.00   1st Qu.:  0.0   1st Qu.:  0.0  
##  Median :0.001955   Median :  0.00   Median :  0.0   Median :  0.0  
##  Mean   :0.011275   Mean   : 17.17   Mean   :213.2   Mean   :185.9  
##  3rd Qu.:0.020179   3rd Qu.: 31.74   3rd Qu.:443.4   3rd Qu.:430.9  
##  Max.   :0.044356   Max.   :113.47   Max.   :824.5   Max.   :509.6  
##                                                                     
##    Condo_Apt      LandedProperty    CLUSTER SP_CLUSTER          geometry 
##  Min.   :  0.00   Min.   :   0.00   1:12    0: 0       MULTIPOLYGON :26  
##  1st Qu.:  0.00   1st Qu.:   0.00   2:12    1: 0       POLYGON      : 1  
##  Median :  0.00   Median :   0.00   3: 1    2:27       epsg:NA      : 0  
##  Mean   : 54.02   Mean   :  85.28   4: 2    3: 0       +proj=tmer...: 0  
##  3rd Qu.: 36.85   3rd Qu.:   0.00           4: 0                         
##  Max.   :759.56   Max.   :1000.00                                        
## 
sum(Cluster2$SHAPE_Area) / 1e6 # Convert m^2 to km^2 (1 km^2 = 1,000,000 m^2)
## [1] 54.62683

Cluster 2 lies in the North Region of Singapore, with a couple of subzones in the West Region. It has the second smallest total area at about 54.63 km^2.

summary(Cluster3)
##     OBJECTID        SUBZONE_NO      SUBZONE_N           SUBZONE_C  CA_IND
##  Min.   :  1.00   Min.   : 1.000   Length:86          BMSZ01 : 1   N:37  
##  1st Qu.: 27.25   1st Qu.: 2.000   Class :character   BMSZ02 : 1   Y:49  
##  Median : 50.00   Median : 4.000   Mode  :character   BMSZ03 : 1         
##  Mean   : 55.56   Mean   : 5.291                      BMSZ04 : 1         
##  3rd Qu.: 81.75   3rd Qu.: 7.000                      BMSZ05 : 1         
##  Max.   :165.00   Max.   :17.000                      BMSZ06 : 1         
##                                                       (Other):80         
##          PLN_AREA_N   PLN_AREA_C              REGION_N  REGION_C
##  BUKIT MERAH  :17   BM     :17   CENTRAL REGION   :85   CR :85  
##  DOWNTOWN CORE:12   DT     :12   EAST REGION      : 0   ER : 0  
##  ROCHOR       :10   RC     :10   NORTH-EAST REGION: 0   NER: 0  
##  KALLANG      : 6   KL     : 6   NORTH REGION     : 0   NR : 0  
##  NEWTON       : 6   NT     : 6   WEST REGION      : 1   WR : 1  
##  QUEENSTOWN   : 6   QT     : 6                                  
##  (Other)      :29   (Other):29                                  
##              INC_CRC     FMEL_UPD_D             X_ADDR          Y_ADDR     
##  0524461C92F35D94: 1   Min.   :2014-12-05   Min.   :20764   Min.   :25813  
##  06B9FD8607810069: 1   1st Qu.:2014-12-05   1st Qu.:27287   1st Qu.:29577  
##  0D1D1759D7BC6D6C: 1   Median :2014-12-05   Median :28817   Median :30681  
##  0F0735F1BDDF53C7: 1   Mean   :2014-12-05   Mean   :28510   Mean   :30628  
##  0FF1661344C84AED: 1   3rd Qu.:2014-12-05   3rd Qu.:29991   3rd Qu.:31803  
##  0FF5E50B9581D2BE: 1   Max.   :2014-12-05   Max.   :33716   Max.   :33930  
##  (Other)         :80                                                       
##    SHAPE_Leng        SHAPE_Area         Business         Industry     
##  Min.   :  871.6   Min.   :  39438   Min.   : 0.000   Min.   :0.0000  
##  1st Qu.: 2293.0   1st Qu.: 216181   1st Qu.: 1.000   1st Qu.:0.0000  
##  Median : 3006.0   Median : 411359   Median : 4.000   Median :0.0000  
##  Mean   : 3816.2   Mean   : 708805   Mean   : 8.605   Mean   :0.1163  
##  3rd Qu.: 4515.4   3rd Qu.: 877930   3rd Qu.:11.000   3rd Qu.:0.0000  
##  Max.   :17496.2   Max.   :4919132   Max.   :51.000   Max.   :5.0000  
##                                                                       
##    Financial       Govt_Embassy    Private_residential    Shopping     
##  Min.   :  0.00   Min.   : 0.000   Min.   :  0.00      Min.   : 0.000  
##  1st Qu.:  3.00   1st Qu.: 0.000   1st Qu.:  1.00      1st Qu.: 0.000  
##  Median :  8.00   Median : 1.000   Median :  5.00      Median : 1.000  
##  Mean   : 15.71   Mean   : 3.081   Mean   : 12.16      Mean   : 3.302  
##  3rd Qu.: 19.00   3rd Qu.: 4.000   3rd Qu.: 11.00      3rd Qu.: 4.500  
##  Max.   :134.00   Max.   :19.000   Max.   :123.00      Max.   :31.000  
##                                                                        
##     E_Active          Young             Aged            Pop       
##  Min.   :   0.0   Min.   :  0.00   Min.   :  0.0   Min.   :    0  
##  1st Qu.: 517.7   1st Qu.: 16.13   1st Qu.:  0.0   1st Qu.:   60  
##  Median : 581.7   Median :196.91   Median :137.9   Median : 1400  
##  Mean   : 480.0   Mean   :164.27   Mean   :123.2   Mean   : 4005  
##  3rd Qu.: 613.2   3rd Qu.:249.79   3rd Qu.:200.2   3rd Qu.: 7758  
##  Max.   :1000.0   Max.   :330.51   Max.   :325.9   Max.   :19100  
##                                                                   
##       dens              HDB_1_2          HDB_3_4         HDB_5_EC     
##  Min.   :0.0000000   Min.   :  0.00   Min.   :  0.0   Min.   :  0.00  
##  1st Qu.:0.0003167   1st Qu.:  0.00   1st Qu.:  0.0   1st Qu.:  0.00  
##  Median :0.0039049   Median :  0.00   Median :  0.0   Median :  0.00  
##  Mean   :0.0074665   Mean   : 50.53   Mean   :218.8   Mean   : 60.94  
##  3rd Qu.:0.0108508   3rd Qu.: 29.69   3rd Qu.:503.3   3rd Qu.: 87.31  
##  Max.   :0.0439867   Max.   :712.93   Max.   :948.1   Max.   :836.36  
##                                                                       
##    Condo_Apt      LandedProperty   CLUSTER SP_CLUSTER          geometry 
##  Min.   :   0.0   Min.   :  0.00   1:20    0: 0       MULTIPOLYGON :85  
##  1st Qu.:   0.0   1st Qu.:  0.00   2:31    1: 0       POLYGON      : 1  
##  Median : 163.1   Median :  0.00   3:33    2: 0       epsg:NA      : 0  
##  Mean   : 403.9   Mean   : 33.35   4: 2    3:86       +proj=tmer...: 0  
##  3rd Qu.: 993.3   3rd Qu.:  0.00           4: 0                         
##  Max.   :1000.0   Max.   :738.98                                        
## 
sum(Cluster3$SHAPE_Area) / 1e6 # Convert m^2 to km^2 (1 km^2 = 1,000,000 m^2)
## [1] 60.95722

Cluster 3 is in the central-southern part of Singapore and consists mainly of the CBD and town areas, so it is likely to have a large number of businesses and a high level of social activity. It has the third smallest total area at about 60.96 km^2.

summary(Cluster4)
##     OBJECTID     SUBZONE_NO   SUBZONE_N           SUBZONE_C CA_IND
##  Min.   :204   Min.   :2.0   Length:5           HGSZ02 :1   N:5   
##  1st Qu.:231   1st Qu.:2.0   Class :character   HGSZ07 :1   Y:0   
##  Median :260   Median :3.0   Mode  :character   SESZ02 :1         
##  Mean   :248   Mean   :3.6                      SESZ03 :1         
##  3rd Qu.:272   3rd Qu.:4.0                      SESZ04 :1         
##  Max.   :273   Max.   :7.0                      AMSZ01 :0         
##                                                 (Other):0         
##       PLN_AREA_N   PLN_AREA_C              REGION_N REGION_C
##  SENGKANG  :3    SE     :3    CENTRAL REGION   :0   CR :0   
##  HOUGANG   :2    HG     :2    EAST REGION      :0   ER :0   
##  ANG MO KIO:0    AM     :0    NORTH-EAST REGION:5   NER:5   
##  BEDOK     :0    BD     :0    NORTH REGION     :0   NR :0   
##  BISHAN    :0    BK     :0    WEST REGION      :0   WR :0   
##  BOON LAY  :0    BL     :0                                  
##  (Other)   :0    (Other):0                                  
##              INC_CRC    FMEL_UPD_D             X_ADDR          Y_ADDR     
##  5A2D0E9E6B285069:1   Min.   :2014-12-05   Min.   :33930   Min.   :37657  
##  6EDE1DB873D24BDD:1   1st Qu.:2014-12-05   1st Qu.:34219   1st Qu.:39071  
##  986666487FF7CF78:1   Median :2014-12-05   Median :35164   Median :41061  
##  BE2E2BB27D14DC52:1   Mean   :2014-12-05   Mean   :34903   Mean   :40221  
##  F00F5344E293F642:1   3rd Qu.:2014-12-05   3rd Qu.:35222   3rd Qu.:41501  
##  00F5E30B5C9B7AD8:0   Max.   :2014-12-05   Max.   :35978   Max.   :41815  
##  (Other)         :0                                                       
##    SHAPE_Leng     SHAPE_Area         Business      Industry   Financial   
##  Min.   :5112   Min.   :1007410   Min.   :0.0   Min.   :0   Min.   : 1.0  
##  1st Qu.:5216   1st Qu.:1455508   1st Qu.:0.0   1st Qu.:0   1st Qu.: 1.0  
##  Median :5438   Median :1499109   Median :1.0   Median :0   Median :11.0  
##  Mean   :5540   Mean   :1409319   Mean   :1.8   Mean   :0   Mean   :15.8  
##  3rd Qu.:5617   3rd Qu.:1515534   3rd Qu.:3.0   3rd Qu.:0   3rd Qu.:24.0  
##  Max.   :6316   Max.   :1569035   Max.   :5.0   Max.   :0   Max.   :42.0  
##                                                                           
##   Govt_Embassy Private_residential    Shopping       E_Active    
##  Min.   :0.0   Min.   : 8.0        Min.   : 0.0   Min.   :584.9  
##  1st Qu.:0.0   1st Qu.: 9.0        1st Qu.: 0.0   1st Qu.:595.2  
##  Median :0.0   Median : 9.0        Median : 3.0   Median :597.3  
##  Mean   :0.6   Mean   :12.8        Mean   : 3.4   Mean   :596.7  
##  3rd Qu.:0.0   3rd Qu.:13.0        3rd Qu.: 4.0   3rd Qu.:597.5  
##  Max.   :3.0   Max.   :25.0        Max.   :10.0   Max.   :608.8  
##                                                                  
##      Young            Aged             Pop             dens        
##  Min.   :224.4   Min.   : 77.67   Min.   :31760   Min.   :0.02138  
##  1st Qu.:259.1   1st Qu.: 92.32   1st Qu.:32400   1st Qu.:0.03109  
##  Median :295.7   Median :106.81   Median :46610   Median :0.03153  
##  Mean   :281.1   Mean   :122.22   Mean   :46478   Mean   :0.03291  
##  3rd Qu.:298.9   3rd Qu.:143.58   3rd Qu.:60200   3rd Qu.:0.03837  
##  Max.   :327.2   Max.   :190.74   Max.   :61420   Max.   :0.04220  
##                                                                    
##     HDB_1_2          HDB_3_4         HDB_5_EC       Condo_Apt    
##  Min.   : 0.000   Min.   :365.3   Min.   :144.2   Min.   : 60.4  
##  1st Qu.: 7.327   1st Qu.:370.7   1st Qu.:213.3   1st Qu.:107.3  
##  Median :27.741   Median :403.6   Median :468.6   Median :121.9  
##  Mean   :22.409   Mean   :437.1   Mean   :367.3   Mean   :145.9  
##  3rd Qu.:36.044   3rd Qu.:498.7   3rd Qu.:485.9   3rd Qu.:134.2  
##  Max.   :40.932   Max.   :546.9   Max.   :524.6   Max.   :305.7  
##                                                                  
##  LandedProperty    CLUSTER SP_CLUSTER          geometry
##  Min.   :  0.000   1:0     0:0        MULTIPOLYGON :5  
##  1st Qu.:  4.070   2:5     1:0        epsg:NA      :0  
##  Median :  4.153   3:0     2:0        +proj=tmer...:0  
##  Mean   : 27.303   4:0     3:0                         
##  3rd Qu.: 10.390           4:5                         
##  Max.   :117.901                                       
## 
sum(Cluster4$SHAPE_Area) / 1e6 # Convert m^2 to km^2 (1 km^2 = 1,000,000 m^2)
## [1] 7.046595

Cluster 4 lies in the North-East Region of Singapore (Sengkang and Hougang) and has the smallest total area at about 7.05 km^2.
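
To compare the five totals side by side, the areas can be put into one table and sorted. The sketch below is an approximation rather than the original workflow: the data frame `cluster_area` is hypothetical, and its inputs are the subzone counts and the (rounded) mean SHAPE_Area values read off the five summaries above.

```r
# Approximate inputs read off the summaries above (means are rounded there).
cluster_area <- data.frame(
  SP_CLUSTER   = factor(0:4),
  n_subzones   = c(5, 200, 27, 86, 5),
  mean_area_m2 = c(16047844, 2895375, 2023216, 708805, 1409319)
)

# Total area per cluster in km^2 (1 km^2 = 1e6 m^2).
cluster_area$total_km2 <-
  cluster_area$n_subzones * cluster_area$mean_area_m2 / 1e6

# Sort from largest to smallest total area.
cluster_area <- cluster_area[order(cluster_area$total_km2, decreasing = TRUE), ]
print(cluster_area[, c("SP_CLUSTER", "total_km2")], row.names = FALSE)
```

Under these approximations the ordering from largest to smallest is cluster 1, 0, 3, 2 and 4.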

tm_shape(Cluster1) +
  tm_fill("SP_CLUSTER") +
  tm_borders(lwd = 0.1,  alpha = 1) +
  tm_layout(title = "Business", legend.outside = TRUE) +
  tm_bubbles(col = "Business",
             size = "Business",
             border.col = "black",
             border.lwd = 1,
             alpha = 0.7)

10.7.2 Subsetting SZ_spatialcluster into the Variables used for Cluster Analysis

This will allow us to compare the individual traits of each cluster.

BusinessClusters <- SZ_spatialcluster %>%
  select("SUBZONE_N","Business","SP_CLUSTER") %>%
  group_by(SP_CLUSTER) %>%
  summarise(Business = mean(Business))

IndustryClusters <- SZ_spatialcluster %>%
  select("SUBZONE_N","Industry","SP_CLUSTER") %>%
  group_by(SP_CLUSTER) %>%
  summarise(Industry = mean(Industry))

FinancialClusters <- SZ_spatialcluster %>%
  select("SUBZONE_N","Financial","SP_CLUSTER") %>%
  group_by(SP_CLUSTER) %>%
  summarise(Financial = mean(Financial))

Govt_EmbassyClusters <- SZ_spatialcluster %>%
  select("SUBZONE_N","Govt_Embassy","SP_CLUSTER") %>%
  group_by(SP_CLUSTER) %>%
  summarise(Govt_Embassy = mean(Govt_Embassy))

Private_residentialClusters <- SZ_spatialcluster %>%
  select("SUBZONE_N","Private_residential","SP_CLUSTER") %>%
  group_by(SP_CLUSTER) %>%
  summarise(Private_residential = mean(Private_residential))

YoungClusters <- SZ_spatialcluster %>%
  select("SUBZONE_N","Young","SP_CLUSTER") %>%
  group_by(SP_CLUSTER) %>%
  summarise(Young = mean(Young))

AgedClusters <- SZ_spatialcluster %>%
  select("SUBZONE_N","Aged","SP_CLUSTER") %>%
  group_by(SP_CLUSTER) %>%
  summarise(Aged = mean(Aged))

HDB_1_2Clusters <- SZ_spatialcluster %>%
  select("SUBZONE_N","HDB_1_2","SP_CLUSTER") %>%
  group_by(SP_CLUSTER) %>%
  summarise(HDB_1_2 = mean(HDB_1_2))

HDB_3_4Clusters <- SZ_spatialcluster %>%
  select("SUBZONE_N","HDB_3_4","SP_CLUSTER") %>%
  group_by(SP_CLUSTER) %>%
  summarise(HDB_3_4 = mean(HDB_3_4))

HDB_5_ECClusters <- SZ_spatialcluster %>%
  select("SUBZONE_N","HDB_5_EC","SP_CLUSTER") %>%
  group_by(SP_CLUSTER) %>%
  summarise(HDB_5_EC = mean(HDB_5_EC))

Condo_AptClusters <- SZ_spatialcluster %>%
  select("SUBZONE_N","Condo_Apt","SP_CLUSTER") %>%
  group_by(SP_CLUSTER) %>%
  summarise(Condo_Apt = mean(Condo_Apt))

LandedPropertyClusters <- SZ_spatialcluster %>%
  select("SUBZONE_N","LandedProperty","SP_CLUSTER") %>%
  group_by(SP_CLUSTER) %>%
  summarise(LandedProperty = mean(LandedProperty))
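
The per-variable blocks above all follow the same group-and-average pattern, so they could also be written as a single call. The sketch below is one possible alternative, not the approach used in this exercise: it assumes dplyr 1.0.0 or later for `across()`, and `toy` is a small made-up stand-in for SZ_spatialcluster with only three of the variables. On the real sf object, `summarise()` also unions the geometries of each group; dropping the geometry first with `sf::st_drop_geometry()` would avoid that cost when only the table of means is needed.

```r
library(dplyr)

# Made-up stand-in for SZ_spatialcluster (values are illustrative only).
toy <- tibble(
  SP_CLUSTER = factor(c(1, 1, 2, 2)),
  Business   = c(2, 4, 10, 20),
  Industry   = c(0, 1, 0, 3),
  Condo_Apt  = c(100, 300, 0, 50)
)

# One grouped summary: the mean of every listed variable per cluster.
# Each output column keeps the name of its source variable.
cluster_means <- toy %>%
  group_by(SP_CLUSTER) %>%
  summarise(across(c(Business, Industry, Condo_Apt), mean), .groups = "drop")

print(cluster_means)
```

Because `across()` names each result after its source column, this form avoids mislabelled output columns.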

For the analysis below, since we have determined that cluster 0 has a value of zero for every variable, we will not comment on it moving forward.

BusinessClusters
## Simple feature collection with 5 features and 2 fields
## geometry type:  GEOMETRY
## dimension:      XY
## bbox:           xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS:  SVY21
## # A tibble: 5 x 3
##   SP_CLUSTER Business                                                   geometry
##   <fct>         <dbl>                                             <GEOMETRY [m]>
## 1 0              0    MULTIPOLYGON (((17763.39 15889.1, 17758.6 15868.3, 17754.~
## 2 1             26.2  MULTIPOLYGON (((14557.7 30447.21, 14562.89 30443.22, 1457~
## 3 2             17.1  MULTIPOLYGON (((27253.37 41646.98, 27223.41 41646.59, 271~
## 4 3              8.60 MULTIPOLYGON (((26066.69 25744.31, 26074.22 25738.41, 260~
## 5 4              1.8  POLYGON ((34418.46 37253.29, 34371.44 37143.21, 34320.62 ~

Cluster 1 has the highest mean Business count per subzone (26.2), followed by clusters 2 (17.1), 3 (8.60) and 4 (1.8).

IndustryClusters
## Simple feature collection with 5 features and 2 fields
## geometry type:  GEOMETRY
## dimension:      XY
## bbox:           xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS:  SVY21
## # A tibble: 5 x 3
##   SP_CLUSTER Industry                                                   geometry
##   <fct>         <dbl>                                             <GEOMETRY [m]>
## 1 0             0     MULTIPOLYGON (((17763.39 15889.1, 17758.6 15868.3, 17754.~
## 2 1             0.455 MULTIPOLYGON (((14557.7 30447.21, 14562.89 30443.22, 1457~
## 3 2             0.333 MULTIPOLYGON (((27253.37 41646.98, 27223.41 41646.59, 271~
## 4 3             0.116 MULTIPOLYGON (((26066.69 25744.31, 26074.22 25738.41, 260~
## 5 4             0     POLYGON ((34418.46 37253.29, 34371.44 37143.21, 34320.62 ~

Cluster 1 again has the highest mean Industry count (0.455), followed by clusters 2 (0.333) and 3 (0.116), while cluster 4 has none.

FinancialClusters
## Simple feature collection with 5 features and 2 fields
## geometry type:  GEOMETRY
## dimension:      XY
## bbox:           xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS:  SVY21
## # A tibble: 5 x 3
##   SP_CLUSTER Financial                                                  geometry
##   <fct>          <dbl>                                            <GEOMETRY [m]>
## 1 0               0    MULTIPOLYGON (((17763.39 15889.1, 17758.6 15868.3, 17754~
## 2 1               8.74 MULTIPOLYGON (((14557.7 30447.21, 14562.89 30443.22, 145~
## 3 2               5.26 MULTIPOLYGON (((27253.37 41646.98, 27223.41 41646.59, 27~
## 4 3              15.7  MULTIPOLYGON (((26066.69 25744.31, 26074.22 25738.41, 26~
## 5 4              15.8  POLYGON ((34418.46 37253.29, 34371.44 37143.21, 34320.62~

For Financial, cluster 4 has the highest mean count of financial urban functions at 15.8, followed very closely by cluster 3 (15.7), then cluster 1 (8.74) and lastly cluster 2 (5.26).

Govt_EmbassyClusters
## Simple feature collection with 5 features and 2 fields
## geometry type:  GEOMETRY
## dimension:      XY
## bbox:           xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS:  SVY21
## # A tibble: 5 x 3
##   SP_CLUSTER Govt_Embassy                                               geometry
##   <fct>             <dbl>                                         <GEOMETRY [m]>
## 1 0                 0     MULTIPOLYGON (((17763.39 15889.1, 17758.6 15868.3, 17~
## 2 1                 0.795 MULTIPOLYGON (((14557.7 30447.21, 14562.89 30443.22, ~
## 3 2                 0.593 MULTIPOLYGON (((27253.37 41646.98, 27223.41 41646.59,~
## 4 3                 3.08  MULTIPOLYGON (((26066.69 25744.31, 26074.22 25738.41,~
## 5 4                 0.6   POLYGON ((34418.46 37253.29, 34371.44 37143.21, 34320~

Cluster 3 shows a much higher mean count of Govt_Embassy urban functions at 3.08, while every other cluster is below 1; the second highest, cluster 1, is at 0.795. This could be attributed to most of the foreign embassies being in the CBD / Central Region, as seen in the distribution plot earlier.

Private_residentialClusters
## Simple feature collection with 5 features and 2 fields
## geometry type:  GEOMETRY
## dimension:      XY
## bbox:           xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS:  SVY21
## # A tibble: 5 x 3
##   SP_CLUSTER Private_residenti~                                         geometry
##   <fct>                   <dbl>                                   <GEOMETRY [m]>
## 1 0                        0    MULTIPOLYGON (((17763.39 15889.1, 17758.6 15868~
## 2 1                       12.1  MULTIPOLYGON (((14557.7 30447.21, 14562.89 3044~
## 3 2                        2.52 MULTIPOLYGON (((27253.37 41646.98, 27223.41 416~
## 4 3                       12.2  MULTIPOLYGON (((26066.69 25744.31, 26074.22 257~
## 5 4                       12.8  POLYGON ((34418.46 37253.29, 34371.44 37143.21,~

Clusters 4, 3 and 1 are close, with cluster 4 marginally the highest at 12.8, followed by cluster 3 (12.2) and cluster 1 (12.1). Cluster 2 is far lower at 2.52. This matches the private_residential distribution plot shown earlier.

YoungClusters
## Simple feature collection with 5 features and 2 fields
## geometry type:  GEOMETRY
## dimension:      XY
## bbox:           xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS:  SVY21
## # A tibble: 5 x 3
##   SP_CLUSTER Young                                                      geometry
##   <fct>      <dbl>                                                <GEOMETRY [m]>
## 1 0             0  MULTIPOLYGON (((17763.39 15889.1, 17758.6 15868.3, 17754.27 ~
## 2 1           183. MULTIPOLYGON (((14557.7 30447.21, 14562.89 30443.22, 14570.7~
## 3 2           165. MULTIPOLYGON (((27253.37 41646.98, 27223.41 41646.59, 27193.~
## 4 3           164. MULTIPOLYGON (((26066.69 25744.31, 26074.22 25738.41, 26078.~
## 5 4           281. POLYGON ((34418.46 37253.29, 34371.44 37143.21, 34320.62 371~

Cluster 4 shows the highest proportion at 281.0606, while cluster 1 follows at 182.6900. Clusters 2 and 3 are close, with cluster 2 narrowly higher. In the plot above, areas with a high proportion of young residents trended towards the north, which may explain why cluster 4 has the highest proportion of Young.

AgedClusters
## Simple feature collection with 5 features and 2 fields
## geometry type:  GEOMETRY
## dimension:      XY
## bbox:           xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS:  SVY21
## # A tibble: 5 x 3
##   SP_CLUSTER  Aged                                                      geometry
##   <fct>      <dbl>                                                <GEOMETRY [m]>
## 1 0            0   MULTIPOLYGON (((17763.39 15889.1, 17758.6 15868.3, 17754.27 ~
## 2 1          107.  MULTIPOLYGON (((14557.7 30447.21, 14562.89 30443.22, 14570.7~
## 3 2           59.3 MULTIPOLYGON (((27253.37 41646.98, 27223.41 41646.59, 27193.~
## 4 3          123.  MULTIPOLYGON (((26066.69 25744.31, 26074.22 25738.41, 26078.~
## 5 4          122.  POLYGON ((34418.46 37253.29, 34371.44 37143.21, 34320.62 371~

Cluster 3 has the highest proportion of Aged, with cluster 4 a close second. Cluster 2 shows a much lower proportion than the other 3. Cluster 3 may rank highest because its many subzones mix low and high proportions; the summary also shows that cluster 3 has the largest gap between the median and the maximum for Aged.
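
The median-to-maximum gap mentioned above can be checked with a little arithmetic. The sketch below is illustrative only: the data frame `aged` is hypothetical, and its values are the Aged medians and maxima copied from the four cluster summaries earlier in this section.

```r
# Aged median and maximum per cluster, copied from the summaries above.
aged <- data.frame(
  SP_CLUSTER = factor(1:4),
  median     = c(124.5, 75.60, 137.9, 106.81),
  max        = c(269.0, 175.52, 325.9, 190.74)
)

# Gap between the maximum and the median within each cluster.
aged$gap <- aged$max - aged$median

# Cluster with the widest gap.
widest <- aged[which.max(aged$gap), ]
print(aged)
print(widest)
```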

HDB_1_2Clusters
## Simple feature collection with 5 features and 2 fields
## geometry type:  GEOMETRY
## dimension:      XY
## bbox:           xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS:  SVY21
## # A tibble: 5 x 3
##   SP_CLUSTER HDB_1_2                                                    geometry
##   <fct>        <dbl>                                              <GEOMETRY [m]>
## 1 0              0   MULTIPOLYGON (((17763.39 15889.1, 17758.6 15868.3, 17754.2~
## 2 1             21.4 MULTIPOLYGON (((14557.7 30447.21, 14562.89 30443.22, 14570~
## 3 2             17.2 MULTIPOLYGON (((27253.37 41646.98, 27223.41 41646.59, 2719~
## 4 3             50.5 MULTIPOLYGON (((26066.69 25744.31, 26074.22 25738.41, 2607~
## 5 4             22.4 POLYGON ((34418.46 37253.29, 34371.44 37143.21, 34320.62 3~

Cluster 3 shows the highest proportion of HDB_1_2 at 50.5, more than double that of the second highest, cluster 4 (22.4). This can also be seen in the distribution map of HDB_1_2 plotted earlier.

HDB_3_4Clusters
## Simple feature collection with 5 features and 2 fields
## geometry type:  GEOMETRY
## dimension:      XY
## bbox:           xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS:  SVY21
## # A tibble: 5 x 3
##   SP_CLUSTER HDB_3_4                                                    geometry
##   <fct>        <dbl>                                              <GEOMETRY [m]>
## 1 0               0  MULTIPOLYGON (((17763.39 15889.1, 17758.6 15868.3, 17754.2~
## 2 1             271. MULTIPOLYGON (((14557.7 30447.21, 14562.89 30443.22, 14570~
## 3 2             213. MULTIPOLYGON (((27253.37 41646.98, 27223.41 41646.59, 2719~
## 4 3             219. MULTIPOLYGON (((26066.69 25744.31, 26074.22 25738.41, 2607~
## 5 4             437. POLYGON ((34418.46 37253.29, 34371.44 37143.21, 34320.62 3~

For HDB_3_4, cluster 4 leads the other 3 clusters by a wide margin at 437.0574.

HDB_5_ECClusters
## Simple feature collection with 5 features and 2 fields
## geometry type:  GEOMETRY
## dimension:      XY
## bbox:           xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS:  SVY21
## # A tibble: 5 x 3
##   SP_CLUSTER HDB_5_EC                                                   geometry
##   <fct>         <dbl>                                             <GEOMETRY [m]>
## 1 0               0   MULTIPOLYGON (((17763.39 15889.1, 17758.6 15868.3, 17754.~
## 2 1             127.  MULTIPOLYGON (((14557.7 30447.21, 14562.89 30443.22, 1457~
## 3 2             186.  MULTIPOLYGON (((27253.37 41646.98, 27223.41 41646.59, 271~
## 4 3              60.9 MULTIPOLYGON (((26066.69 25744.31, 26074.22 25738.41, 260~
## 5 4             367.  POLYGON ((34418.46 37253.29, 34371.44 37143.21, 34320.62 ~

We see that cluster 4 shows the highest proportion of HDB_5_EC, at almost double that of the second-highest cluster 2 and about six times that of the lowest cluster 3. The low proportions of HDB_5_EC can be observed in the above plot as well.
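
The ratios between clusters can be checked directly from the printed table. The sketch below uses the rounded HDB_5_EC values as displayed above (not the full-precision values), and the variable names are illustrative.

```r
# Rounded HDB_5_EC values as printed in the table above (clusters 1 to 4)
hdb_5_ec <- c("1" = 127, "2" = 186, "3" = 60.9, "4" = 367)

# Rank clusters from highest to lowest
ranked <- sort(hdb_5_ec, decreasing = TRUE)

# Ratio of the leading cluster to the second-highest and to the lowest
ratio_to_second <- ranked[[1]] / ranked[[2]]
ratio_to_lowest <- ranked[[1]] / ranked[[length(ranked)]]

round(c(second = ratio_to_second, lowest = ratio_to_lowest), 2)
```

With these rounded inputs the leading cluster is cluster 4, at roughly 1.97 times cluster 2 and 6.03 times cluster 3.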

Condo_AptClusters
## Simple feature collection with 5 features and 2 fields
## geometry type:  GEOMETRY
## dimension:      XY
## bbox:           xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS:  SVY21
## # A tibble: 5 x 3
##   SP_CLUSTER Business                                                   geometry
##   <fct>         <dbl>                                             <GEOMETRY [m]>
## 1 0               0   MULTIPOLYGON (((17763.39 15889.1, 17758.6 15868.3, 17754.~
## 2 1             166.  MULTIPOLYGON (((14557.7 30447.21, 14562.89 30443.22, 1457~
## 3 2              54.0 MULTIPOLYGON (((27253.37 41646.98, 27223.41 41646.59, 271~
## 4 3             404.  MULTIPOLYGON (((26066.69 25744.31, 26074.22 25738.41, 260~
## 5 4             146.  POLYGON ((34418.46 37253.29, 34371.44 37143.21, 34320.62 ~

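Cluster 3 shows the highest proportion of Condo_Apt among the 4 clusters, at more than double that of the second-highest cluster 1 and more than seven times that of the lowest cluster 2. This can also be seen in the Condo_Apt distribution plot above. Note that the value column is labelled Business in the output above, although the summary appears to be for Condo_Apt.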

LandedPropertyClusters
## Simple feature collection with 5 features and 2 fields
## geometry type:  GEOMETRY
## dimension:      XY
## bbox:           xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS:  SVY21
## # A tibble: 5 x 3
##   SP_CLUSTER Business                                                   geometry
##   <fct>         <dbl>                                             <GEOMETRY [m]>
## 1 0               0   MULTIPOLYGON (((17763.39 15889.1, 17758.6 15868.3, 17754.~
## 2 1             125.  MULTIPOLYGON (((14557.7 30447.21, 14562.89 30443.22, 1457~
## 3 2              85.3 MULTIPOLYGON (((27253.37 41646.98, 27223.41 41646.59, 271~
## 4 3              33.3 MULTIPOLYGON (((26066.69 25744.31, 26074.22 25738.41, 260~
## 5 4              27.3 POLYGON ((34418.46 37253.29, 34371.44 37143.21, 34320.62 ~

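We see that cluster 1 has the highest proportion of LandedProperty, followed by cluster 2, then clusters 3 and 4. This result is to be expected, as in the above plot landed property had low proportions for areas in clusters 3 and 4. As with Condo_Apt, the value column is labelled Business in the output above, although the summary appears to be for LandedProperty.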
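
To compare the five dwelling-type summaries side by side, the printed values can be collected into a single matrix. This is a minimal sketch using the rounded values displayed above for clusters 1 to 4; it assumes the Condo_Apt and LandedProperty rows correspond to the tables whose value column is printed as Business, and the object names are illustrative.

```r
# Rounded per-cluster values as printed above (columns are clusters 1 to 4)
cluster_means <- rbind(
  HDB_1_2        = c(21.4, 17.2, 50.5, 22.4),
  HDB_3_4        = c(271, 213, 219, 437),
  HDB_5_EC       = c(127, 186, 60.9, 367),
  Condo_Apt      = c(166, 54.0, 404, 146),   # assumed: printed under "Business"
  LandedProperty = c(125, 85.3, 33.3, 27.3)  # assumed: printed under "Business"
)
colnames(cluster_means) <- paste0("cluster_", 1:4)

# For each dwelling type, find the cluster with the highest value
leading_cluster <- colnames(cluster_means)[apply(cluster_means, 1, which.max)]
names(leading_cluster) <- rownames(cluster_means)

leading_cluster
```

This reproduces the reading given in the text: cluster 3 leads for HDB_1_2 and Condo_Apt, cluster 4 for HDB_3_4 and HDB_5_EC, and cluster 1 for LandedProperty.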