1 Introduction

In this take home assignment, I will need to delineate using hierarchical clustering technique and spatially constrained clustering technique, then describe the analysis result.

1.1 Data involved in this assignment

Singapore Residents by Planning AreaSubzone, Age Group, Sex and Type of Dwelling, June 2011-2019 (extract the following indicators for Year 2019)
- Economy active population (i.e. age 25-64)
- Young group (age 0-24)
- Aged group (65 and above)
- Population density
- HDB1-2RM
- HDB3-4RM
- HDB5RM-EC
- Condo/Apt
- Landed Properties
URA Master Plan 2014 Planning Subzone boundary
Urban functions geospatial data
- Government including embassy
- Business
- Industry
- Shopping
- Financial
- Upmarket residential area

2 Setting up my environment

The R packages needed for this take home exercise are as follows:

Spatial data handling +sf, and spdep Geospatial analysis package +ClustGeo Attribute data handling +tidyverse, especially readr, ggplot2 and dplyr Choropleth mapping +tmap *cleangeo +facilitate handling and catching rgeos geometry issues +provide an utility to clean the spatial objects

packages <- c('rgdal', 'spdep', 'raster', 'ClustGeo', 'tmap', 'sf', 'ggpubr', 'cluster', 'heatmaply', 'corrplot', 'tidyverse', 'cleangeo', 'psych')

for(p in packages){
  if(!require(p, character.only=T)){
    install.packages(p)
  }
  
  library(p,character.only=T)
}

3 Preparing the necessary data for analysis

3.1 Import population as tibble dataframe

population <- read_csv("data/aspatial/Resident Population by Planning AreaSubzone, Age Group and Sex 2019.csv")

## Parsed with column specification:
## cols(
##   Time = col_double(),
##   PA = col_character(),
##   SZ = col_character(),
##   AG = col_character(),
##   Sex = col_character(),
##   TOD = col_character(),
##   Pop = col_double()
## )

3.2 Import urban functions geospatial data as simple feature Dataframe

business <- st_read(dsn="data/geospatial", layer="Business")

## Reading layer `Business' from data source `D:\Hao Jun\School\[IS415] Geospatial Analytics and Applications\Take-home_Ex03\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 6550 features and 5 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 103.6147 ymin: 1.24605 xmax: 104.0044 ymax: 1.4698
## geographic CRS: WGS 84

financial <- st_read(dsn="data/geospatial", layer="Financial")

## Reading layer `Financial' from data source `D:\Hao Jun\School\[IS415] Geospatial Analytics and Applications\Take-home_Ex03\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 3320 features and 29 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 103.6256 ymin: 1.24392 xmax: 103.9998 ymax: 1.46247
## geographic CRS: WGS 84

embassy <- st_read(dsn="data/geospatial", layer="Govt_Embassy")

## Reading layer `Govt_Embassy' from data source `D:\Hao Jun\School\[IS415] Geospatial Analytics and Applications\Take-home_Ex03\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 443 features and 5 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 103.6282 ymin: 1.24911 xmax: 103.9884 ymax: 1.45765
## geographic CRS: WGS 84

residential <- st_read(dsn="data/geospatial", layer="Private residential")

## Reading layer `Private residential' from data source `D:\Hao Jun\School\[IS415] Geospatial Analytics and Applications\Take-home_Ex03\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 3604 features and 5 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 103.6295 ymin: 1.23943 xmax: 103.9749 ymax: 1.45379
## geographic CRS: WGS 84

shopping <- st_read(dsn="data/geospatial", layer="Shopping")

## Reading layer `Shopping' from data source `D:\Hao Jun\School\[IS415] Geospatial Analytics and Applications\Take-home_Ex03\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 511 features and 5 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 103.679 ymin: 1.24779 xmax: 103.9644 ymax: 1.4535
## geographic CRS: WGS 84

3.3 Import suzbone geospatial data as SpatialPolygonsDataFrame

In order to import the various geospatial data, I will be using readOGR function under rgdal package.

mpsz <- readOGR(dsn="data/geospatial", layer="MP14_SUBZONE_WEB_PL")

## OGR data source with driver: ESRI Shapefile 
## Source: "D:\Hao Jun\School\[IS415] Geospatial Analytics and Applications\Take-home_Ex03\data\geospatial", layer: "MP14_SUBZONE_WEB_PL"
## with 323 features
## It has 15 fields

4 Exploring on the population data

4.1 Glimpse the data type

glimpse(population)

## Rows: 98,192
## Columns: 7
## $ Time <dbl> 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 20...
## $ PA   <chr> "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio", "Ang M...
## $ SZ   <chr> "Ang Mo Kio Town Centre", "Ang Mo Kio Town Centre", "Ang Mo Ki...
## $ AG   <chr> "0_to_4", "0_to_4", "0_to_4", "0_to_4", "0_to_4", "0_to_4", "0...
## $ Sex  <chr> "Males", "Males", "Males", "Males", "Males", "Males", "Males",...
## $ TOD  <chr> "HDB 1- and 2-Room Flats", "HDB 3-Room Flats", "HDB 4-Room Fla...
## $ Pop  <dbl> 0, 10, 10, 20, 0, 0, 50, 0, 0, 10, 10, 20, 0, 0, 40, 0, 0, 10,...

4.2 Sorting population data using dplyr

4.2.1 Derive the 3 new age groups Young, Active, Aged

population_sort <- population %>%
  spread(AG, Pop) %>%
  mutate_at(.vars = vars(PA, SZ), .funs = funs(toupper)) %>%
  group_by(Time, PA, SZ, Sex, TOD) %>%
  mutate(YOUNG=`0_to_4`+`5_to_9`+`10_to_14`+`15_to_19`+`20_to_24`) %>%
  mutate(ACTIVE=`25_to_29`+`30_to_34`+`35_to_39`+`40_to_44`+`45_to_49`+`50_to_54`+`55_to_59`+`60_to_64`) %>%
  mutate(AGED=`65_to_69`+`70_to_74`+`75_to_79`+`80_to_84`+`85_to_89`+`90_and_over`) %>%
  filter(TOD != "Others" && TOD != "HUDC Flats (excluding those privatised)") %>%
  summarise(YOUNG=sum(YOUNG), ACTIVE=sum(ACTIVE), AGED=sum(AGED))

## Warning: funs() is soft deprecated as of dplyr 0.8.0
## Please use a list of either functions or lambdas: 
## 
##   # Simple named list: 
##   list(mean = mean, median = median)
## 
##   # Auto named with `tibble::lst()`: 
##   tibble::lst(mean, median)
## 
##   # Using lambdas
##   list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
## This warning is displayed once per session.

population_sort$TOD <- factor(population_sort$TOD)

glimpse(population_sort)

## Rows: 3,876
## Columns: 8
## Groups: Time, PA, SZ, Sex [646]
## $ Time   <dbl> 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, ...
## $ PA     <chr> "ANG MO KIO", "ANG MO KIO", "ANG MO KIO", "ANG MO KIO", "ANG...
## $ SZ     <chr> "ANG MO KIO TOWN CENTRE", "ANG MO KIO TOWN CENTRE", "ANG MO ...
## $ Sex    <chr> "Females", "Females", "Females", "Females", "Females", "Fema...
## $ TOD    <fct> Condominiums and Other Apartments, HDB 1- and 2-Room Flats, ...
## $ YOUNG  <dbl> 340, 0, 50, 80, 220, 0, 340, 0, 50, 70, 210, 0, 0, 90, 1040,...
## $ ACTIVE <dbl> 600, 0, 160, 200, 530, 0, 510, 0, 140, 160, 480, 0, 0, 230, ...
## $ AGED   <dbl> 80, 0, 50, 80, 220, 0, 60, 0, 20, 60, 170, 0, 0, 180, 1660, ...

In this transformation, I spread out the age group and population count columns. Then, I proceed to mutate the subzone column to uppercase for later analysis with mpsz SpatialPolygonsDataFrame. Followed by combining the age group into 3 groups such as Active (age 25-64), Young (age 0-24), Aged (65 and above). Lastly, is to select the columns that I would like to keep, filter away those dwelling that is not required for this analysis, and change the TOD column into factor for the next phase of transformation.

4.2.2 Combining the 3 room with 4 room as 1 variable and rename the necessary variables

population_sort$TOD <- population_sort$TOD %>%
  fct_collapse("HDB3-4RM" = c("HDB 3-Room Flats", "HDB 4-Room Flats")) %>%
  recode("HDB 1- and 2-Room Flats"="HDB1-2RM", "HDB 5-Room and Executive Flats"="HDB5RM-EC", "Condominiums and Other Apartments"="Condo/Apt")

head(population_sort, n=10)

## # A tibble: 10 x 8
## # Groups:   Time, PA, SZ, Sex [2]
##     Time PA         SZ                  Sex    TOD            YOUNG ACTIVE  AGED
##    <dbl> <chr>      <chr>               <chr>  <fct>          <dbl>  <dbl> <dbl>
##  1  2019 ANG MO KIO ANG MO KIO TOWN CE~ Femal~ Condo/Apt        340    600    80
##  2  2019 ANG MO KIO ANG MO KIO TOWN CE~ Femal~ HDB1-2RM           0      0     0
##  3  2019 ANG MO KIO ANG MO KIO TOWN CE~ Femal~ HDB3-4RM          50    160    50
##  4  2019 ANG MO KIO ANG MO KIO TOWN CE~ Femal~ HDB3-4RM          80    200    80
##  5  2019 ANG MO KIO ANG MO KIO TOWN CE~ Femal~ HDB5RM-EC        220    530   220
##  6  2019 ANG MO KIO ANG MO KIO TOWN CE~ Femal~ Landed Proper~     0      0     0
##  7  2019 ANG MO KIO ANG MO KIO TOWN CE~ Males  Condo/Apt        340    510    60
##  8  2019 ANG MO KIO ANG MO KIO TOWN CE~ Males  HDB1-2RM           0      0     0
##  9  2019 ANG MO KIO ANG MO KIO TOWN CE~ Males  HDB3-4RM          50    140    20
## 10  2019 ANG MO KIO ANG MO KIO TOWN CE~ Males  HDB3-4RM          70    160    60

The next transformation, I am combining the 3 room and 4 room flats variables together using fct_collapse and using recode to rename the values in TOD column.

population_sort <- population_sort %>%
  group_by(Time, PA, SZ, Sex, TOD) %>%
  summarise(YOUNG=sum(YOUNG), ACTIVE=sum(ACTIVE), AGED=sum(AGED))

head(population_sort, n=10)

## # A tibble: 10 x 8
## # Groups:   Time, PA, SZ, Sex [2]
##     Time PA         SZ                  Sex    TOD            YOUNG ACTIVE  AGED
##    <dbl> <chr>      <chr>               <chr>  <fct>          <dbl>  <dbl> <dbl>
##  1  2019 ANG MO KIO ANG MO KIO TOWN CE~ Femal~ Condo/Apt        340    600    80
##  2  2019 ANG MO KIO ANG MO KIO TOWN CE~ Femal~ HDB1-2RM           0      0     0
##  3  2019 ANG MO KIO ANG MO KIO TOWN CE~ Femal~ HDB3-4RM         130    360   130
##  4  2019 ANG MO KIO ANG MO KIO TOWN CE~ Femal~ HDB5RM-EC        220    530   220
##  5  2019 ANG MO KIO ANG MO KIO TOWN CE~ Femal~ Landed Proper~     0      0     0
##  6  2019 ANG MO KIO ANG MO KIO TOWN CE~ Males  Condo/Apt        340    510    60
##  7  2019 ANG MO KIO ANG MO KIO TOWN CE~ Males  HDB1-2RM           0      0     0
##  8  2019 ANG MO KIO ANG MO KIO TOWN CE~ Males  HDB3-4RM         120    300    80
##  9  2019 ANG MO KIO ANG MO KIO TOWN CE~ Males  HDB5RM-EC        210    480   170
## 10  2019 ANG MO KIO ANG MO KIO TOWN CE~ Males  Landed Proper~     0      0     0

Then, lastly combining the rows after renaming “HDB3-4RM” using summarise function.

4.3 Sorting the population data

4.3.1 Get total population in each subzone

population_sort_all <- population_sort %>%
  group_by(SZ) %>%
  summarise(Pop=sum(YOUNG, ACTIVE,AGED))

head(population_sort_all, n=10)

## # A tibble: 10 x 2
##    SZ                       Pop
##    <chr>                  <dbl>
##  1 ADMIRALTY              14110
##  2 AIRPORT ROAD               0
##  3 ALEXANDRA HILL         13350
##  4 ALEXANDRA NORTH         2120
##  5 ALJUNIED               38930
##  6 ANAK BUKIT             22040
##  7 ANCHORVALE             46610
##  8 ANG MO KIO TOWN CENTRE  4880
##  9 ANSON                      0
## 10 BALESTIER              32120

4.3.2 Get total population for each dwelling in each subzone

population_sort_tod <- population_sort %>%
  group_by(SZ, TOD) %>%
  summarise(Pop=sum(YOUNG, ACTIVE, AGED)) %>%
  spread(TOD, Pop)

head(population_sort_tod, n=10)

## # A tibble: 10 x 6
## # Groups:   SZ [10]
##    SZ            `Condo/Apt` `HDB1-2RM` `HDB3-4RM` `HDB5RM-EC` `Landed Properti~
##    <chr>               <dbl>      <dbl>      <dbl>       <dbl>             <dbl>
##  1 ADMIRALTY               0       1140       6750        6220                 0
##  2 AIRPORT ROAD            0          0          0           0                 0
##  3 ALEXANDRA HI~           0       3910       6320        3120                 0
##  4 ALEXANDRA NO~        2120          0          0           0                 0
##  5 ALJUNIED            11930       2120      18020        6400               460
##  6 ANAK BUKIT           9020        160       2090        3170              7600
##  7 ANCHORVALE           5000       1680      17280       22650                 0
##  8 ANG MO KIO T~        1930          0       1120        1830                 0
##  9 ANSON                   0          0          0           0                 0
## 10 BALESTIER            9610       1790      15490        4660               570

4.3.3 Get the 3 age groups in each subzone

population_sort_ageg <- population_sort %>%
  group_by(SZ) %>%
  summarise(YOUNG=sum(YOUNG), ACTIVE=sum(ACTIVE), AGED=sum(AGED))

head(population_sort_ageg, n=10)

## # A tibble: 10 x 4
##    SZ                     YOUNG ACTIVE  AGED
##    <chr>                  <dbl>  <dbl> <dbl>
##  1 ADMIRALTY               4370   8380  1360
##  2 AIRPORT ROAD               0      0     0
##  3 ALEXANDRA HILL          2880   7350  3120
##  4 ALEXANDRA NORTH          580   1420   120
##  5 ALJUNIED                8260  23700  6970
##  6 ANAK BUKIT              5860  12370  3810
##  7 ANCHORVALE             15250  27740  3620
##  8 ANG MO KIO TOWN CENTRE  1360   2780   740
##  9 ANSON                      0      0     0
## 10 BALESTIER               7310  19020  5790

4.4 Combining all the computed variables to each subzone

population_combine <- left_join(population_sort_all, population_sort_tod, by=c("SZ"="SZ"))
population_combine <- left_join(population_combine, population_sort_ageg, by=c("SZ"="SZ"))

head(population_combine, n=10)

## # A tibble: 10 x 10
##    SZ      Pop `Condo/Apt` `HDB1-2RM` `HDB3-4RM` `HDB5RM-EC` `Landed Propert~
##    <chr> <dbl>       <dbl>      <dbl>      <dbl>       <dbl>            <dbl>
##  1 ADMI~ 14110           0       1140       6750        6220                0
##  2 AIRP~     0           0          0          0           0                0
##  3 ALEX~ 13350           0       3910       6320        3120                0
##  4 ALEX~  2120        2120          0          0           0                0
##  5 ALJU~ 38930       11930       2120      18020        6400              460
##  6 ANAK~ 22040        9020        160       2090        3170             7600
##  7 ANCH~ 46610        5000       1680      17280       22650                0
##  8 ANG ~  4880        1930          0       1120        1830                0
##  9 ANSON     0           0          0          0           0                0
## 10 BALE~ 32120        9610       1790      15490        4660              570
## # ... with 3 more variables: YOUNG <dbl>, ACTIVE <dbl>, AGED <dbl>

4.4.1 Derive the new variables by dividing with subzone population

population_combine <- population_combine %>%
  group_by(SZ) %>%
  mutate(`YOUNG_PR`=`YOUNG`/`Pop`) %>%
  mutate(`ACTIVE_PR`=`ACTIVE`/`Pop`) %>%
  mutate(`AGED_PR`=`AGED`/`Pop`)
  
  
population_combine$SZ <- factor(population_combine$SZ)

head(population_combine, n=10)

## # A tibble: 10 x 13
## # Groups:   SZ [10]
##    SZ      Pop `Condo/Apt` `HDB1-2RM` `HDB3-4RM` `HDB5RM-EC` `Landed Propert~
##    <fct> <dbl>       <dbl>      <dbl>      <dbl>       <dbl>            <dbl>
##  1 ADMI~ 14110           0       1140       6750        6220                0
##  2 AIRP~     0           0          0          0           0                0
##  3 ALEX~ 13350           0       3910       6320        3120                0
##  4 ALEX~  2120        2120          0          0           0                0
##  5 ALJU~ 38930       11930       2120      18020        6400              460
##  6 ANAK~ 22040        9020        160       2090        3170             7600
##  7 ANCH~ 46610        5000       1680      17280       22650                0
##  8 ANG ~  4880        1930          0       1120        1830                0
##  9 ANSON     0           0          0          0           0                0
## 10 BALE~ 32120        9610       1790      15490        4660              570
## # ... with 6 more variables: YOUNG <dbl>, ACTIVE <dbl>, AGED <dbl>,
## #   YOUNG_PR <dbl>, ACTIVE_PR <dbl>, AGED_PR <dbl>

5 Detect potential geometry issues for mpsz SpatialPolygonsDataFrame

In order to detect for any geometry issues, I will be using clgeo_CollectionReport function under cleangeo package which gives us more details.

clgeo_CollectionReport(mpsz) %>%
  filter(valid == FALSE)

##             type valid    issue_type error_msg
## 1 rgeos_validity FALSE GEOM_VALIDITY      <NA>
## 2 rgeos_validity FALSE GEOM_VALIDITY      <NA>
## 3 rgeos_validity FALSE GEOM_VALIDITY      <NA>
## 4 rgeos_validity FALSE GEOM_VALIDITY      <NA>
## 5 rgeos_validity FALSE GEOM_VALIDITY      <NA>
## 6 rgeos_validity FALSE GEOM_VALIDITY      <NA>
## 7 rgeos_validity FALSE GEOM_VALIDITY      <NA>
## 8 rgeos_validity FALSE GEOM_VALIDITY      <NA>
## 9 rgeos_validity FALSE GEOM_VALIDITY      <NA>
##                                                                     warning_msg
## 1 Ring Self-intersection at or near point 27932.392599999999 21982.797200000001
## 2 Ring Self-intersection at or near point 26885.443899999998 26668.312099999999
## 3 Ring Self-intersection at or near point 26920.169000000002 26978.544099999999
## 4                   Ring Self-intersection at or near point 15432.475 31319.716
## 5 Ring Self-intersection at or near point 12861.382900000001 32207.492300000002
## 6         Ring Self-intersection at or near point 19681.235400000001 31294.4522
## 7          Ring Self-intersection at or near point 41375.108 40432.858899999999
## 8         Ring Self-intersection at or near point 38542.2261 44605.408900000002
## 9         Ring Self-intersection at or near point 21702.562300000001 48125.1155

The warning messages mostly state that “Ring Self-intersection at or near point xxx”. This would mean that some polygons are inside the overlap area or it’s within an invalid geometry coordinates.

5.1 Viewing the overall report on the mpsz SpatialPolygonsDataFrame

In order to view the result at a glance, I will be using clgeo_SummaryReport function under cleangeo package.

clgeo_SummaryReport(clgeo_CollectionReport(mpsz))

##              type       valid                 issue_type 
##  rgeos_validity:  9   Mode :logical   GEOM_VALIDITY:  9  
##  NA's          :314   FALSE:9         NA's         :314  
##                       TRUE :314

The report show that there is 9 out of 323 spatial objects are not valid. Therefore, I will need to fix these 9 spatial objects.

5.1.1 Process mpsz SpatialPolygonsDataFrame

I will use clgeo_Clean another function under cleangeo package to fix the 9 spatial objects that is not valid from the summary report.

The cleaning strategy used by clgeo_Clean is through polygonation method using re-polygonation intuitive algorithm to rebuild those identifed invalid geometry into a valid geometry.

mpsz.clean <- clgeo_Clean(mpsz)

5.1.2 Viewing the overall report after fixing the invalid spatial objects

clgeo_SummaryReport(clgeo_CollectionReport(mpsz.clean))

##    type      valid         issue_type
##  NA's:323   Mode:logical   NA's:323  
##             TRUE:323

From the summary report, it’s showing that there are 323 spatial objects are now valid. This shows that after applying the polygonation cleaning strategy, it does rebuild the 9 spatial objects into a valid geometries.

5.2 Compute all the subzones area

mpsz.clean$AREA <- gArea(mpsz.clean, byid=T)

Ensure that after the fix on the spatial objects the subzone area matches

any(duplicated(mpsz.clean@data$SHAPE_Area, mpsz.clean@data$AREA))

## [1] FALSE

Units of measurement is in meters square

5.2.1 Extract the subzones area

area <- mpsz.clean@data %>%
  select(SUBZONE_N, AREA)

5.2.2 Compute the population density

population_combine <- left_join(population_combine, area, by=c("SZ"="SUBZONE_N"))

population_combine <- population_combine %>%
  group_by(SZ) %>%
  mutate(DENSITY = (`Pop`/`AREA`))

head(population_combine, n=10)

## # A tibble: 10 x 15
## # Groups:   SZ [10]
##    SZ      Pop `Condo/Apt` `HDB1-2RM` `HDB3-4RM` `HDB5RM-EC` `Landed Propert~
##    <fct> <dbl>       <dbl>      <dbl>      <dbl>       <dbl>            <dbl>
##  1 ADMI~ 14110           0       1140       6750        6220                0
##  2 AIRP~     0           0          0          0           0                0
##  3 ALEX~ 13350           0       3910       6320        3120                0
##  4 ALEX~  2120        2120          0          0           0                0
##  5 ALJU~ 38930       11930       2120      18020        6400              460
##  6 ANAK~ 22040        9020        160       2090        3170             7600
##  7 ANCH~ 46610        5000       1680      17280       22650                0
##  8 ANG ~  4880        1930          0       1120        1830                0
##  9 ANSON     0           0          0          0           0                0
## 10 BALE~ 32120        9610       1790      15490        4660              570
## # ... with 8 more variables: YOUNG <dbl>, ACTIVE <dbl>, AGED <dbl>,
## #   YOUNG_PR <dbl>, ACTIVE_PR <dbl>, AGED_PR <dbl>, AREA <dbl>, DENSITY <dbl>

Population Density = Number of People/Land Area

6 Exploring on the remaining urban functions spatial data

6.1 View the business spatial data

summary(business)

##      POI_ID             SEQ_NUM         FAC_TYPE   
##  Min.   :3.644e+07   Min.   :1.000   Min.   :5000  
##  1st Qu.:9.967e+08   1st Qu.:1.000   1st Qu.:5000  
##  Median :1.097e+09   Median :1.000   Median :5000  
##  Mean   :9.933e+08   Mean   :1.021   Mean   :5084  
##  3rd Qu.:1.108e+09   3rd Qu.:1.000   3rd Qu.:5000  
##  Max.   :1.204e+09   Max.   :3.000   Max.   :9991  
##                                                    
##                        POI_NAME                  ST_NAME    
##  CAMBRIDGE INDUSTRIAL TRUST:   8   TAGORE LN         :  83  
##  DHL                       :   6   JOO KOON CIR      :  80  
##  NATIONAL OILWELL VARCO    :   6   GUL CIR           :  62  
##  ST MICROELECTRONICS       :   6   KAKI BUKIT PL     :  53  
##  CWT                       :   5   KAKI BUKIT IND TER:  52  
##  HALLIBURTON               :   5   (Other)           :5941  
##  (Other)                   :6514   NA's              : 279  
##           geometry   
##  POINT        :6550  
##  epsg:4326    :   0  
##  +proj=long...:   0  
##                      
##                      
##                      
##

Filter and convert the columns to the correct data types

business_reduced <- subset(business, select=-c(POI_ID, SEQ_NUM))
business_reduced$FAC_TYPE <- factor(business_reduced$FAC_TYPE)

summary(business_reduced)

##  FAC_TYPE                          POI_NAME                  ST_NAME    
##  5000:6440   CAMBRIDGE INDUSTRIAL TRUST:   8   TAGORE LN         :  83  
##  9991: 110   DHL                       :   6   JOO KOON CIR      :  80  
##              NATIONAL OILWELL VARCO    :   6   GUL CIR           :  62  
##              ST MICROELECTRONICS       :   6   KAKI BUKIT PL     :  53  
##              CWT                       :   5   KAKI BUKIT IND TER:  52  
##              HALLIBURTON               :   5   (Other)           :5941  
##              (Other)                   :6514   NA's              : 279  
##           geometry   
##  POINT        :6550  
##  epsg:4326    :   0  
##  +proj=long...:   0  
##                      
##                      
##                      
##

business %>%
  group_by(SEQ_NUM, FAC_TYPE) %>%
  tally(sort=TRUE)

## Simple feature collection with 5 features and 3 fields
## geometry type:  MULTIPOINT
## dimension:      XY
## bbox:           xmin: 103.6147 ymin: 1.24605 xmax: 104.0044 ymax: 1.4698
## geographic CRS: WGS 84
## # A tibble: 5 x 4
## # Groups:   SEQ_NUM [3]
##   SEQ_NUM FAC_TYPE     n                                                geometry
##     <int>    <int> <int>                                        <MULTIPOINT [°]>
## 1       1     5000  6320 ((103.6147 1.24605), (103.6198 1.27715), (103.6208 1.2~
## 2       2     5000   117 ((103.624 1.29208), (103.6251 1.29124), (103.6251 1.29~
## 3       1     9991    95 ((103.6237 1.29389), (103.6242 1.29325), (103.6251 1.2~
## 4       2     9991    15 ((103.6237 1.29389), (103.6251 1.29333), (103.633 1.31~
## 5       3     5000     3 ((103.6936 1.35292), (103.7493 1.30213), (103.8451 1.2~

Wondering what is SEQ_NUM, I group by the SEQ_NUM with FAC_TYPE to view association. Based on the POI_NAME, I belive that it’s the way how the FAC_TYPE are being lablel since 5000 or 9991 are also associated to 1 and 2.

head(business %>%
  filter(FAC_TYPE == 5000), n=10)

## Simple feature collection with 10 features and 5 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 103.7354 ymin: 1.2972 xmax: 103.8858 ymax: 1.34032
## geographic CRS: WGS 84
##        POI_ID SEQ_NUM FAC_TYPE                        POI_NAME
## 1  1101180209       1     5000                       JOHN CHEN
## 2  1101180210       1     5000    TROPICAL INDUSTRIAL BUILDING
## 3  1101180211       1     5000 LIAN CHEONG INDUSTRIAL BUILDING
## 4  1101180212       1     5000  MALAYSIA GARMENT MANUFACTURERS
## 5  1101180213       1     5000                         UNIGOLD
## 6  1192316144       1     5000             NUS UNIVERSITY HALL
## 7  1144317654       1     5000           SUITES AT BUKIT TIMAH
## 8  1103507488       1     5000                      TIONG HUAT
## 9  1001052867       1     5000  LEE CHOON GUAN TIMBER MERCHANT
## 10 1001052868       1     5000           WEIGHT BRIDGE SERVICE
##                ST_NAME                 geometry
## 1            LITTLE RD POINT (103.8856 1.33841)
## 2            LITTLE RD POINT (103.8852 1.33832)
## 3            LITTLE RD POINT (103.8852 1.33834)
## 4                 <NA> POINT (103.8855 1.33821)
## 5            LITTLE RD POINT (103.8858 1.33844)
## 6  LOWER KENT RIDGE RD  POINT (103.7777 1.2972)
## 7  JALAN JURONG KECHIL POINT (103.7738 1.34032)
## 8   KALLANG PUDDING RD  POINT (103.879 1.32773)
## 9           PENJURU RD  POINT (103.7354 1.3184)
## 10          PENJURU RD POINT (103.7361 1.31927)

head(business %>%
  filter(FAC_TYPE == 9991), n=10)

## Simple feature collection with 10 features and 5 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 103.6242 ymin: 1.29325 xmax: 103.9686 ymax: 1.3598
## geographic CRS: WGS 84
##        POI_ID SEQ_NUM FAC_TYPE
## 1  1110491789       1     9991
## 2  1099992474       1     9991
## 3  1099992477       1     9991
## 4  1099992477       2     9991
## 5  1100464367       1     9991
## 6  1100464367       2     9991
## 7  1100464368       1     9991
## 8  1100464369       1     9991
## 9  1100464369       2     9991
## 10 1108286608       1     9991
##                                                        POI_NAME
## 1                                  KAKI BUKIT INDUSTRIAL ESTATE
## 2                             TERRACE FACTORIES TUAS SOUTH ST 5
## 3               TERRACE FACTORIES TUAS SOUTH ST 5 EAST ENTRANCE
## 4                             JTC TERRACE FACTORIES TUAS S ST 5
## 5  CHANGI BUSINESS PARK CHANGI BUSINESS PARK CENTRAL 1 ENTRANCE
## 6                                CHANGI BUSINESS PARK CENTRAL 1
## 7                                          CHANGI BUSINESS PARK
## 8      CHANGI BUSINESS PARK CHANGI BUSINESS PARK VISTA ENTRANCE
## 9                           CHANGI BUSINESS PARK VISTA ENTRANCE
## 10                                 ANG MO KIO INDUSTRIAL PARK 1
##                           ST_NAME                 geometry
## 1                KAKI BUKIT AVE 1 POINT (103.9042 1.33269)
## 2                 TUAS SOUTH ST 5 POINT (103.6242 1.29325)
## 3                 TUAS SOUTH ST 5 POINT (103.6251 1.29333)
## 4                 TUAS SOUTH ST 5 POINT (103.6251 1.29333)
## 5  CHANGI BUSINESS PARK CENTRAL 1 POINT (103.9633 1.33283)
## 6  CHANGI BUSINESS PARK CENTRAL 1 POINT (103.9633 1.33283)
## 7  CHANGI BUSINESS PARK CENTRAL 2 POINT (103.9657 1.33445)
## 8        CHANGI BUSINESS PARK VIS POINT (103.9686 1.33649)
## 9        CHANGI BUSINESS PARK VIS POINT (103.9686 1.33649)
## 10          ANG MO KIO IND PARK 1   POINT (103.856 1.3598)

After viewing in detail, there are 2 types of FAC_TYPE, 5000 indicates the business, and 9991 indicates industry. Therefore, I will need to convert the SpatialPointsDataFrame into sf dataframe before I can filter them separately.

6.2 View the embassy spatial data

summary(embassy)

##      POI_ID             SEQ_NUM         FAC_TYPE   
##  Min.   :3.644e+07   Min.   :1.000   Min.   :9525  
##  1st Qu.:1.010e+09   1st Qu.:1.000   1st Qu.:9525  
##  Median :1.058e+09   Median :1.000   Median :9525  
##  Mean   :1.006e+09   Mean   :1.111   Mean   :9651  
##  3rd Qu.:1.113e+09   3rd Qu.:1.000   3rd Qu.:9993  
##  Max.   :1.203e+09   Max.   :2.000   Max.   :9993  
##                                                    
##                             POI_NAME         ST_NAME             geometry  
##  ANG MO KIO TOWN COUNCIL        :  5   MAXWELL RD: 16   POINT        :443  
##  SEMBAWANG-NEE SOON TOWN COUNCIL:  3   THOMSON RD: 12   epsg:4326    :  0  
##  ALJUNIED HOUGANG TOWN COUNCIL  :  2   ORCHARD RD: 11   +proj=long...:  0  
##  ALJUNIED TOWN COUNCIL          :  2   COLLEGE RD: 10                      
##  BISHAN-TOA PAYOH TOWN COUNCIL  :  2   SCOTTS RD :  8                      
##  CENTRAL PROVIDENT FUND BOARD   :  2   (Other)   :358                      
##  (Other)                        :427   NA's      : 28

Filter and convert the columns to the correct data types

embassy_reduced <- subset(embassy, select=-c(POI_ID, SEQ_NUM))
embassy_reduced$FAC_TYPE <- factor(embassy_reduced$FAC_TYPE)

summary(embassy_reduced)

##  FAC_TYPE                              POI_NAME         ST_NAME   
##  9525:324   ANG MO KIO TOWN COUNCIL        :  5   MAXWELL RD: 16  
##  9993:119   SEMBAWANG-NEE SOON TOWN COUNCIL:  3   THOMSON RD: 12  
##             ALJUNIED HOUGANG TOWN COUNCIL  :  2   ORCHARD RD: 11  
##             ALJUNIED TOWN COUNCIL          :  2   COLLEGE RD: 10  
##             BISHAN-TOA PAYOH TOWN COUNCIL  :  2   SCOTTS RD :  8  
##             CENTRAL PROVIDENT FUND BOARD   :  2   (Other)   :358  
##             (Other)                        :427   NA's      : 28  
##           geometry  
##  POINT        :443  
##  epsg:4326    :  0  
##  +proj=long...:  0  
##                     
##                     
##                     
##

head(embassy %>%
  filter(FAC_TYPE == 9525), n=10)

## Simple feature collection with 10 features and 5 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 103.8359 ymin: 1.27869 xmax: 103.9698 ymax: 1.36625
## geographic CRS: WGS 84
##        POI_ID SEQ_NUM FAC_TYPE                       POI_NAME           ST_NAME
## 1  1192460871       1     9525                MND TOWER BLOCK        MAXWELL RD
## 2  1192460819       1     9525 MND AUDITORIUM & FUNCTION HALL        MAXWELL RD
## 3  1192460843       1     9525          AICARE LINK @ MAXWELL        MAXWELL RD
## 4  1192460783       1     9525   HARMONY IN DIVERSITY GALLERY        MAXWELL RD
## 5  1192460750       1     9525    FAMILY SUPPORT DIVISION MSF        MAXWELL RD
## 6  1194224304       1     9525               LTA BEDOK CAMPUS      CHAI CHEE ST
## 7  1104192309       1     9525  ALJUNIED HOUGANG TOWN COUNCIL     HOUGANG AVE 2
## 8  1178905747       1     9525       SINGAPORE PRISON SERVICE   UPP CHANGI RD N
## 9  1058427947       1     9525             OLD POLICE ACADEMY MOUNT PLEASANT RD
## 10 1161806958       1     9525   CENTRAL PROVIDENT FUND BOARD        THOMSON RD
##                    geometry
## 1  POINT (103.8456 1.27869)
## 2  POINT (103.8455 1.27883)
## 3  POINT (103.8455 1.27883)
## 4  POINT (103.8455 1.27883)
## 5  POINT (103.8455 1.27883)
## 6  POINT (103.9184 1.32688)
## 7  POINT (103.8895 1.36625)
## 8  POINT (103.9698 1.35891)
## 9  POINT (103.8359 1.33186)
## 10 POINT (103.8435 1.31962)

head(embassy %>%
  filter(FAC_TYPE == 9993), n=10)

## Simple feature collection with 10 features and 5 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 103.7836 ymin: 1.27958 xmax: 103.8605 ymax: 1.33637
## geographic CRS: WGS 84
##        POI_ID SEQ_NUM FAC_TYPE                 POI_NAME      ST_NAME
## 1  1141424380       1     9993     CONSULATE SAN MARINO    CHURCH ST
## 2  1141424404       1     9993             EMBASSY LAOS GOLDHILL PLZ
## 3  1141424402       1     9993         CONSULATE BELIZE     CECIL ST
## 4  1141424338       1     9993   GENERAL CONSULATE OMAN         <NA>
## 5  1001332522       1     9993           EMBASSY NORWAY RAFFLES QUAY
## 6  1001332520       1     9993           EMBASSY PANAMA RAFFLES QUAY
## 7  1058438674       1     9993 GENERAL CONSULATE GREECE   DUNEARN RD
## 8  1069433720       1     9993      EMBASSY NORTH KOREA     BEACH RD
## 9   995186738       1     9993           EMBASSY TURKEY  SHENTON WAY
## 10 1058455576       1     9993 HIGH COMMISSION MALDIVES   THOMSON RD
##                    geometry
## 1  POINT (103.8494 1.28343)
## 2  POINT (103.8431 1.31836)
## 3  POINT (103.8493 1.28128)
## 4   POINT (103.8578 1.2999)
## 5  POINT (103.8512 1.28113)
## 6   POINT (103.8512 1.2812)
## 7  POINT (103.7836 1.33637)
## 8  POINT (103.8605 1.30008)
## 9  POINT (103.8503 1.27958)
## 10  POINT (103.8441 1.3169)

After viewing in detail, there are 2 types of FAC_TYPE, 9525 indicates the Government building, and 9993 indicates embassy. Therefore, I will need to convert the SpatialPointsDataFrame into sf dataframe before I can filter them separately.

6.3 View the financial spatial data

summary(financial)

##     LINK_ID              POI_ID             SEQ_NUM         FAC_TYPE   
##  Min.   :1.161e+08   Min.   :3.644e+07   Min.   :1.000   Min.   :3578  
##  1st Qu.:8.594e+08   1st Qu.:1.097e+09   1st Qu.:1.000   1st Qu.:3578  
##  Median :9.140e+08   Median :1.113e+09   Median :1.000   Median :3578  
##  Mean   :9.092e+08   Mean   :1.088e+09   Mean   :1.008   Mean   :4397  
##  3rd Qu.:1.046e+09   3rd Qu.:1.132e+09   3rd Qu.:1.000   3rd Qu.:6000  
##  Max.   :1.224e+09   Max.   :1.204e+09   Max.   :2.000   Max.   :6000  
##                                                                        
##      POI_NAME   POI_LANGCD POI_NMTYPE   POI_ST_NUM   ST_NUM_FUL  ST_NFUL_LC 
##  OCBC    :788   ENG:3320   B:3293     1      : 212   29A :   1   ENG :   5  
##  UOB     :577              J:  27     10     :  76   333A:   1   NA's:3315  
##  POSB    :564                         2      :  53   77B :   1              
##  DBS     :282                         11     :  50   7A  :   1              
##  CITIBANK:153                         304    :  50   8A  :   1              
##  MAYBANK : 51                         (Other):2004   NA's:3315              
##  (Other) :905                         NA's   : 875                          
##               ST_NAME     ST_LANGCD   POI_ST_SD ACC_TYPE          PH_NUMBER   
##  ORCHARD RD       : 156   ENG :2926   L:1652    NA's:3320   63396666   :  52  
##  BEACH RD         :  44   NA's: 394   N:  24                63272265   :  16  
##  NORTH BRIDGE RD  :  39               R:1644                18002222121:  15  
##  COLLYER QUAY     :  38                                     18004383333:  13  
##  NEW UPP CHANGI RD:  35                                     18001111111:  11  
##  (Other)          :2614                                     (Other)    : 539  
##  NA's             : 394                                     NA's       :2674  
##     CHAIN_ID     NAT_IMPORT PRIVATE  IN_VICIN   NUM_PARENT    
##  Min.   :    0   N:3320     N:3320   N:3320   Min.   :0.0000  
##  1st Qu.: 2526                                1st Qu.:0.0000  
##  Median : 6918                                Median :0.0000  
##  Mean   : 5121                                Mean   :0.3807  
##  3rd Qu.: 6920                                3rd Qu.:1.0000  
##  Max.   :24982                                Max.   :2.0000  
##                                                               
##    NUM_CHILD           PERCFRREF       VANCITY_ID
##  Min.   :0.0000000   Min.   : 1.00   Min.   :0   
##  1st Qu.:0.0000000   1st Qu.:30.00   1st Qu.:0   
##  Median :0.0000000   Median :50.00   Median :0   
##  Mean   :0.0003012   Mean   :46.87   Mean   :0   
##  3rd Qu.:0.0000000   3rd Qu.:60.00   3rd Qu.:0   
##  Max.   :1.0000000   Max.   :99.00   Max.   :0   
##                      NA's   :1339                
##                                                                ACT_ADDR   
##  1 KIM SENG PROMENADE                              SINGAPORE 237994:   7  
##  3 TEMASEK BOULEVARD                               SINGAPORE 038983:   7  
##  530 LORONG 6 TOA PAYOH                            SINGAPORE 310530:   7  
##  2 JURONG EAST ST 21                               SINGAPORE 609601:   6  
##  3D RIVER VALLEY ROAD                              SINGAPORE 179023:   6  
##  (Other)                                                           : 243  
##  NA's                                                              :3044  
##  ACT_LANGCD               ACT_ST_NAM     ACT_ST_NUM       ACT_ADMIN   
##  ENG : 276   DUNEARN ROAD      :   7   1      :  20   INGAPORE :   1  
##  NA's:3044   KIM SENG PROMENADE:   7   3      :  11   SINGAPORE: 275  
##              LORONG 6 TOA PAYOH:   7   2      :  10   NA's     :3044  
##              PAYA LEBAR ROAD   :   7   50     :   8                   
##              TEMASEK BOULEVARD :   7   530    :   7                   
##              (Other)           : 241   (Other): 220                   
##              NA's              :3044   NA's   :3044                   
##    ACT_POSTAL            geometry   
##  038983 :   7   POINT        :3320  
##  237994 :   7   epsg:4326    :   0  
##  310530 :   7   +proj=long...:   0  
##  609601 :   7                       
##  179023 :   6                       
##  (Other): 242                       
##  NA's   :3044

Filter and convert the columns to the correct data types

financial_reduced <- subset(financial, select=-c(LINK_ID, POI_ID, SEQ_NUM, ST_NAME, POI_LANGCD, POI_NMTYPE, POI_ST_NUM, ST_NUM_FUL, ST_NFUL_LC, ST_LANGCD, POI_ST_SD,  ACC_TYPE, PH_NUMBER, CHAIN_ID, NAT_IMPORT, PRIVATE, IN_VICIN, NUM_PARENT, NUM_CHILD, PERCFRREF, VANCITY_ID, ACT_ADDR, ACT_LANGCD, ACT_ST_NAM, ACT_ST_NUM, ACT_ADMIN, ACT_POSTAL))

financial_reduced$FAC_TYPE <- factor(financial_reduced$FAC_TYPE)

summary(financial_reduced)

##  FAC_TYPE        POI_NAME            geometry   
##  3578:2198   OCBC    :788   POINT        :3320  
##  6000:1122   UOB     :577   epsg:4326    :   0  
##              POSB    :564   +proj=long...:   0  
##              DBS     :282                       
##              CITIBANK:153                       
##              MAYBANK : 51                       
##              (Other) :905

head(financial_reduced %>%
  filter(FAC_TYPE == 3578), n=10)

## Simple feature collection with 10 features and 2 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 103.7189 ymin: 1.29367 xmax: 103.8635 ymax: 1.41695
## geographic CRS: WGS 84
##    FAC_TYPE POI_NAME                 geometry
## 1      3578      UOB  POINT (103.833 1.41695)
## 2      3578     POSB POINT (103.7989 1.30211)
## 3      3578      UOB POINT (103.7989 1.30211)
## 4      3578     OCBC POINT (103.7989 1.30211)
## 5      3578     OCBC POINT (103.7189 1.35016)
## 6      3578      UOB POINT (103.7723 1.29608)
## 7      3578     OCBC POINT (103.7719 1.29367)
## 8      3578 CITIBANK POINT (103.7723 1.29608)
## 9      3578      UOB POINT (103.8635 1.31838)
## 10     3578      UOB POINT (103.7801 1.29684)

head(financial_reduced %>%
  filter(FAC_TYPE == 6000), n=10)

## Simple feature collection with 10 features and 2 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 103.7739 ymin: 1.27715 xmax: 103.9224 ymax: 1.43101
## geographic CRS: WGS 84
##    FAC_TYPE                  POI_NAME                 geometry
## 1      6000                   MAYBANK POINT (103.9224 1.31199)
## 2      6000       ADPOST MONEYCHANGER POINT (103.8242 1.30528)
## 3      6000 HBZ INT'L EXCHANGE CO (S) POINT (103.8512 1.29212)
## 4      6000           RAZINA EXCHANGE POINT (103.8454 1.27721)
## 5      6000                      OCBC POINT (103.7739 1.43101)
## 6      6000              MONEY SUPPLY POINT (103.8468 1.27715)
## 7      6000        SINO MONEY CHANGER POINT (103.9047 1.30211)
## 8      6000   EVERPEACE MONEY CHANGER POINT (103.9047 1.30211)
## 9      6000                      POSB  POINT (103.871 1.35411)
## 10     6000                       DBS POINT (103.8556 1.30219)

After viewing in detail, there are 2 types of FAC_TYPE, 3578 indicates the Banks, and 6000 indicates Money Services Business (MSB). Therefore, I will need to convert the SpatialPointsDataFrame into sf dataframe before I can filter them separately.

6.4 View the residential spatial data

summary(residential)

##      POI_ID             SEQ_NUM         FAC_TYPE   
##  Min.   :3.644e+07   Min.   :1.000   Min.   :9590  
##  1st Qu.:9.968e+08   1st Qu.:1.000   1st Qu.:9590  
##  Median :1.070e+09   Median :1.000   Median :9590  
##  Mean   :1.052e+09   Mean   :1.007   Mean   :9590  
##  3rd Qu.:1.105e+09   3rd Qu.:1.000   3rd Qu.:9590  
##  Max.   :1.204e+09   Max.   :2.000   Max.   :9590  
##                                                    
##                     POI_NAME                 ST_NAME              geometry   
##  BLISSFUL VIEW          :   3   PASIR PANJANG RD :  45   POINT        :3604  
##  CLEMENTI PARK          :   3   UPP EAST COAST RD:  29   epsg:4326    :   0  
##  COMPASSVALE VIEW       :   3   LOR K TELOK KURAU:  26   +proj=long...:   0  
##  KING'S MANSION         :   3   BUKIT TIMAH RD   :  24                       
##  MIDPOINT PROPERTIES    :   3   EAST COAST RD    :  23                       
##  NEE SOON CENTRAL ESTATE:   3   (Other)          :3412                       
##  (Other)                :3586   NA's             :  45

Filter and convert the columns to the correct data types

residential_reduced <- subset(residential, select=-c(POI_ID, SEQ_NUM, ST_NAME))
residential_reduced$FAC_TYPE <- factor(residential_reduced$FAC_TYPE)

summary(residential_reduced)

##  FAC_TYPE                       POI_NAME             geometry   
##  9590:3604   BLISSFUL VIEW          :   3   POINT        :3604  
##              CLEMENTI PARK          :   3   epsg:4326    :   0  
##              COMPASSVALE VIEW       :   3   +proj=long...:   0  
##              KING'S MANSION         :   3                       
##              MIDPOINT PROPERTIES    :   3                       
##              NEE SOON CENTRAL ESTATE:   3                       
##              (Other)                :3586

head(residential_reduced, n=10)

## Simple feature collection with 10 features and 2 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 103.7754 ymin: 1.28119 xmax: 103.9516 ymax: 1.36908
## geographic CRS: WGS 84
##    FAC_TYPE                       POI_NAME                 geometry
## 1      9590 MARINA BAY SERVICED APARTMENTS POINT (103.8526 1.28119)
## 2      9590                 SIN MING VILLE POINT (103.8355 1.35361)
## 3      9590         GREENTOPS @ SIMS PLACE POINT (103.8797 1.31643)
## 4      9590    MOUNTBATTEN DAKOTA CRESCENT POINT (103.8895 1.30834)
## 5      9590                    SINGA COURT POINT (103.9084 1.33037)
## 6      9590            FORESQUE RESIDENCES POINT (103.7754 1.36908)
## 7      9590              TIONG BAHRU COURT POINT (103.8272 1.28439)
## 8      9590            BIRMINGHAM MANSIONS POINT (103.8444 1.31747)
## 9      9590              THOMSON EURO-ASIA POINT (103.8443 1.31787)
## 10     9590                STRATFORD COURT POINT (103.9517 1.32789)

After viewing in detail, there is only 1 type of FAC_TYPE, 9590 indicates residental area.

6.5 View the shopping spatial data

summary(shopping)

##      POI_ID             SEQ_NUM         FAC_TYPE   
##  Min.   :3.644e+07   Min.   :1.000   Min.   :6512  
##  1st Qu.:9.656e+08   1st Qu.:1.000   1st Qu.:6512  
##  Median :1.070e+09   Median :1.000   Median :6512  
##  Mean   :8.934e+08   Mean   :1.108   Mean   :6512  
##  3rd Qu.:1.104e+09   3rd Qu.:1.000   3rd Qu.:6512  
##  Max.   :1.204e+09   Max.   :3.000   Max.   :6512  
##                                                    
##                              POI_NAME             ST_NAME   
##  BUKIT BATOK WEST SHOPPING CENTRE:  2   ORCHARD RD    : 37  
##  CHANGE ALLEY                    :  2   BUKIT TIMAH RD:  7  
##  FARMART CENTRE                  :  2   SCOTTS RD     :  7  
##  FORTUNE CENTRE                  :  2   BEACH RD      :  6  
##  HARBOUR FRONT CENTRE ENTRANCE   :  2   BENCOOLEN ST  :  5  
##  NEW WORLD CENTRE                :  2   (Other)       :347  
##  (Other)                         :499   NA's          :102  
##           geometry  
##  POINT        :511  
##  epsg:4326    :  0  
##  +proj=long...:  0  
##                     
##                     
##                     
##

Filter and convert the columns to the correct data types

shopping_reduced <- subset(shopping, select=-c(POI_ID, SEQ_NUM, ST_NAME))
shopping_reduced$FAC_TYPE <- factor(shopping_reduced$FAC_TYPE)

summary(shopping_reduced)

##  FAC_TYPE                               POI_NAME            geometry  
##  6512:511   BUKIT BATOK WEST SHOPPING CENTRE:  2   POINT        :511  
##             CHANGE ALLEY                    :  2   epsg:4326    :  0  
##             FARMART CENTRE                  :  2   +proj=long...:  0  
##             FORTUNE CENTRE                  :  2                      
##             HARBOUR FRONT CENTRE ENTRANCE   :  2                      
##             NEW WORLD CENTRE                :  2                      
##             (Other)                         :499

head(shopping, n=10)

## Simple feature collection with 10 features and 5 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 103.7127 ymin: 1.25619 xmax: 103.9041 ymax: 1.35375
## geographic CRS: WGS 84
##        POI_ID SEQ_NUM FAC_TYPE                              POI_NAME
## 1  1132106213       1     6512                       SIN MING CENTRE
## 2   801758392       1     6512                           THE ADELPHI
## 3   842821452       1     6512              BOON LAY SHOPPING CENTRE
## 4  1193779191       1     6512                         KATONG SQUARE
## 5   801758399       1     6512                        SIM LIM SQUARE
## 6  1001450091       1     6512                 PEOPLE'S PARK COMPLEX
## 7  1069767253       1     6512 UNITED SQUARE GOLDHILL PLAZA ENTRANCE
## 8  1069767253       2     6512   UNITED SQUARE GOLDHILL PLZ ENTRANCE
## 9  1039562724       1     6512                             THE FORUM
## 10 1039562723       1     6512                            WATERFRONT
##            ST_NAME                 geometry
## 1      SIN MING RD  POINT (103.836 1.35375)
## 2       COLEMAN ST POINT (103.8515 1.29124)
## 3      BOON LAY PL POINT (103.7127 1.34672)
## 4    EAST COAST RD   POINT (103.9041 1.305)
## 5  ROCHOR CANAL RD POINT (103.8533 1.30341)
## 6          PARK RD  POINT (103.843 1.28458)
## 7             <NA> POINT (103.8432 1.31744)
## 8             <NA> POINT (103.8432 1.31744)
## 9             <NA> POINT (103.8205 1.25619)
## 10            <NA> POINT (103.8205 1.25619)

After viewing in detail, there is only 1 type of FAC_TYPE, 6512 indicates shopping centres.

6.6 Conclusion of the urban functions spatial data

OVERALL FAC_TYPE
Business	Industry	Government Building	Embassy	Banks	Money Services Business (MSB)	Residental Area	Shopping Centres
5000	9991	9525	9993	3578	6000	9590	6512

7 Checking the projection system of the spatial data

Using CRS to retrieve the projection information of the geospatial data to ensure that they are projected in same projection system. ## Subzone spatial data

mpsz_3414 <- st_as_sf(mpsz.clean, 3414)
mpsz_3414 <- st_transform(mpsz_3414, 3414)
st_crs(mpsz_3414)

## Coordinate Reference System:
##   User input: EPSG:3414 
##   wkt:
## PROJCRS["SVY21 / Singapore TM",
##     BASEGEOGCRS["SVY21",
##         DATUM["SVY21",
##             ELLIPSOID["WGS 84",6378137,298.257223563,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4757]],
##     CONVERSION["Singapore Transverse Mercator",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",1.36666666666667,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",103.833333333333,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",1,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",28001.642,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",38744.572,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["northing (N)",north,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["easting (E)",east,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     USAGE[
##         SCOPE["unknown"],
##         AREA["Singapore"],
##         BBOX[1.13,103.59,1.47,104.07]],
##     ID["EPSG",3414]]

7.1 Business spatial data

business_reduced <- st_transform(business_reduced, 3414)
st_crs(business_reduced)

## Coordinate Reference System:
##   User input: EPSG:3414 
##   wkt:
## PROJCRS["SVY21 / Singapore TM",
##     BASEGEOGCRS["SVY21",
##         DATUM["SVY21",
##             ELLIPSOID["WGS 84",6378137,298.257223563,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4757]],
##     CONVERSION["Singapore Transverse Mercator",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",1.36666666666667,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",103.833333333333,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",1,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",28001.642,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",38744.572,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["northing (N)",north,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["easting (E)",east,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     USAGE[
##         SCOPE["unknown"],
##         AREA["Singapore"],
##         BBOX[1.13,103.59,1.47,104.07]],
##     ID["EPSG",3414]]

7.2 Embassy spatial data

embassy_reduced <- st_transform(embassy_reduced, 3414)
crs(embassy_reduced)

## [1] "+proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs"

7.3 Financial spatial data

financial_reduced <- st_transform(financial_reduced, 3414)
crs(financial_reduced)

## [1] "+proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs"

7.4 Residential spatial data

residential_reduced <- st_transform(residential_reduced, 3414)
crs(residential_reduced)

## [1] "+proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs"

7.5 Shopping spatial data

shopping_reduced <- st_transform(shopping_reduced, 3414)
crs(shopping_reduced)

## [1] "+proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs"

8 Filter and transform urban functions data

8.1 Business and Industry

Binding the subzone spatial data with business urban functions spatial data

mpsz_business <- bind_cols(business_reduced, mpsz_3414[as.numeric(st_within(business_reduced, mpsz_3414)),]) %>%
  group_by(SUBZONE_N, FAC_TYPE, POI_NAME)

Sort and tally the number of business and industry in each subzone

urban_business <- mpsz_business %>%
  group_by(SUBZONE_N, FAC_TYPE) %>%
  tally() %>%
  spread(FAC_TYPE, n) %>%
  select(SUBZONE_N, `5000`, `9991`)

urban_business <- st_drop_geometry(urban_business)

urban_business <- replace(urban_business, is.na(urban_business), 0)

urban_business <- urban_business %>%
  summarise(`5000`=sum(`5000`), `9991`=sum(`9991`))

8.2 Embassy

Binding the subzone spatial data with Embassy urban functions spatial data

mpsz_embassy <- bind_cols(embassy_reduced, mpsz_3414[as.numeric(st_within(embassy_reduced, mpsz_3414)),]) %>%
  select(SUBZONE_N, FAC_TYPE, POI_NAME, geometry)

Sort and tally the number of government buildings and embassy in each subzone

urban_embassy <- mpsz_embassy %>%
  group_by(SUBZONE_N, FAC_TYPE) %>%
  tally() %>%
  spread(FAC_TYPE, n) %>%
  select(SUBZONE_N, `9525`, `9993`)

urban_embassy <- st_drop_geometry(urban_embassy)

urban_embassy <- replace(urban_embassy, is.na(urban_embassy), 0)

urban_embassy <- urban_embassy %>%
  summarise(`9525`=sum(`9525`), `9993`=sum(`9993`))

8.3 Financial

Binding the subzone spatial data with financial urban functions spatial data

mpsz_financial <- bind_cols(financial_reduced, mpsz_3414[as.numeric(st_within(financial_reduced, mpsz_3414)),]) %>%
  select(SUBZONE_N, FAC_TYPE, POI_NAME, geometry)

Sort and tally the number of banks and MSB in each subzone

urban_financial <- mpsz_financial %>%
  group_by(SUBZONE_N, FAC_TYPE) %>%
  tally() %>%
  spread(FAC_TYPE, n) %>%
  select(SUBZONE_N, `3578`, `6000`)

urban_financial <- st_drop_geometry(urban_financial)

urban_financial <- replace(urban_financial, is.na(urban_financial), 0)

urban_financial <- urban_financial %>%
  summarise(`3578`=sum(`3578`), `6000`=sum(`6000`))

8.4 Residential

Binding the subzone spatial data with residential urban functions spatial data

mpsz_residential <- bind_cols(residential_reduced, mpsz_3414[as.numeric(st_within(residential_reduced, mpsz_3414)),]) %>%
  select(SUBZONE_N, FAC_TYPE, POI_NAME, geometry)

Sort and tally the number of residential in each subzone

urban_residential <- mpsz_residential %>%
  group_by(SUBZONE_N, FAC_TYPE) %>%
  tally() %>%
  spread(FAC_TYPE, n) %>%
  select(SUBZONE_N, `9590`)

urban_residential <- st_drop_geometry(urban_residential)

urban_residential <- replace(urban_residential, is.na(urban_residential), 0)

urban_residential <- urban_residential %>%
  summarise(`9590`=sum(`9590`))

8.5 Shopping

Binding the subzone spatial data with shopping urban functions spatial data

mpsz_shopping <- bind_cols(shopping_reduced, mpsz_3414[as.numeric(st_within(shopping_reduced, mpsz_3414)),]) %>%
  select(SUBZONE_N, FAC_TYPE, POI_NAME, geometry)

Sort and tally the number of shopping in each subzone

urban_shopping <- mpsz_shopping %>%
  group_by(SUBZONE_N, FAC_TYPE) %>%
  tally() %>%
  spread(FAC_TYPE, n) %>%
  select(SUBZONE_N, `6512`)

urban_shopping <- st_drop_geometry(urban_shopping)

urban_shopping <- replace(urban_shopping, is.na(urban_shopping), 0)

urban_shopping <- urban_shopping %>%
  summarise(`6512`=sum(`6512`))

8.6 Merging all the urban functions computed data

mpsz_urban <- merge(urban_business, urban_embassy) 
mpsz_urban <- merge(mpsz_urban, urban_financial)
mpsz_urban <- merge(mpsz_urban, urban_residential)
mpsz_urban <- merge(mpsz_urban, urban_shopping)

mpsz_urban <- mpsz_urban %>%
  rename(`Business`=`5000`, `Industry`=`9991`, `Government`=`9525`, `Embassy`=`9993`, `Banks`=`3578`, `MSB`=`6000`, `Residential`=`9590`, `Shopping`=`6512`)

head(mpsz_urban, n=10)

##                 SUBZONE_N Business Industry Government Embassy Banks MSB
## 1          ALEXANDRA HILL       39        1          6       1    10   5
## 2                ALJUNIED       62        1          2       0    32  10
## 3              ANAK BUKIT       13        0          1       0    27   4
## 4  ANG MO KIO TOWN CENTRE        1        0          1       0    13  16
## 5                   ANSON       15        0          1       2     4   2
## 6               BALESTIER       23        0          6       0    19  12
## 7             BEDOK NORTH        5        0          2       0    46  22
## 8               BENCOOLEN       13        0          1       0     6   9
## 9             BISHAN EAST        1        0          3       0    17   8
## 10              BOAT QUAY       40        0          2       0     0   2
##    Residential Shopping
## 1            4        1
## 2          140        2
## 3           49        3
## 4            8        5
## 5            3        1
## 6          123        6
## 7           29        6
## 8            7        5
## 9           10        2
## 10           1        1

9 Combine subzone with urban functions spatial data

urban_population <- left_join(population_combine, mpsz_urban, by=c("SZ"="SUBZONE_N"))

urban_population <- left_join(mpsz_3414, urban_population, by=c("SUBZONE_N"="SZ")) %>%
  select("SUBZONE_N", "Condo/Apt", "HDB1-2RM", "HDB3-4RM", "HDB5RM-EC", "Landed Properties", "YOUNG_PR", "ACTIVE_PR", "AGED_PR", "DENSITY", "Business", "Industry", "Government", "Embassy", "Banks", "MSB", "Residential", "Shopping")

urban_population <- replace(urban_population, is.na(urban_population), 0)

Select on the columns that are relevant and replace all the NA to 0 due to the no data after combining all the data.

10 Correlation Analysis

10.1 Visualise and analyse the correlation of the input variables

urban_population_df <- as.data.frame(urban_population)
urban_vars.cor = cor(urban_population_df[,2:18])
corrplot.mixed(urban_vars.cor,
               lower = "ellipse", 
               upper = "number",
               tl.pos = "lt",
               diag = "l",
               tl.col = "black", tl.cex=0.9, number.cex=0.5)

The correlation plot above shows that YOUNG_PR and ACTIVE_PR are highly correlated. Therefore, YOUNG_PR should be used in the cluster analysis due to the lower correlated values with other variables.

11 Hierarchy Cluster Analysis

11.1 Extracting analysis variables

from the shan_sf simple feature object into dataframe

cluster_vars <- urban_population_df %>%
  select("SUBZONE_N", "Condo/Apt", "HDB1-2RM", "HDB3-4RM", "HDB5RM-EC", "Landed Properties", "YOUNG_PR", "AGED_PR", "DENSITY", "Business", "Industry", "Government", "Embassy", "Banks", "MSB", "Residential", "Shopping")
head(cluster_vars, n=10)

##          SUBZONE_N Condo/Apt HDB1-2RM HDB3-4RM HDB5RM-EC Landed Properties
## 1     MARINA SOUTH         0        0        0         0                 0
## 2     PEARL'S HILL       420     4520     1400         0                 0
## 3        BOAT QUAY        50        0        0         0                 0
## 4   HENDERSON HILL       190     3930     7990      1270                 0
## 5          REDHILL      1930     1970     3520      3240                 0
## 6   ALEXANDRA HILL         0     3910     6320      3120                 0
## 7    BUKIT HO SWEE       540     3700     8610      1900                 0
## 8      CLARKE QUAY        10        0        0         0                 0
## 9  PASIR PANJANG 1      2970        0        0         0              1430
## 10       QUEENSWAY       290        0        0         0                 0
##     YOUNG_PR   AGED_PR      DENSITY Business Industry Government Embassy Banks
## 1  0.0000000 0.0000000 0.000000e+00        0        0          0       0     0
## 2  0.1671924 0.3154574 1.132514e-02        6        0          1       0    10
## 3  0.0000000 0.2000000 3.109308e-04       40        0          2       0     0
## 4  0.1958146 0.2436472 2.247120e-02        0        0          0       0     0
## 5  0.2664165 0.1463415 2.751469e-02        0        0          0       0     0
## 6  0.2157303 0.2337079 1.295640e-02       39        1          6       1    10
## 7  0.1932203 0.2433898 2.673399e-02        6        0          3       1     5
## 8  0.0000000 0.0000000 3.446082e-05       12        0          3       1    17
## 9  0.2522727 0.1227273 4.056076e-03        0        0          0       0     0
## 10 0.1034483 0.1724138 4.591192e-04        0        0          0       0     0
##    MSB Residential Shopping
## 1    0           0        0
## 2   15           6        3
## 3    2           1        1
## 4    0           0        0
## 5    0           0        0
## 6    5           4        1
## 7    1          11        1
## 8    2           6       10
## 9    0           0        0
## 10   0           0        0

11.2 Replace the row index from row number to township name

row.names(cluster_vars) <- cluster_vars$"SUBZONE_N"
head(cluster_vars, n=10)

##                       SUBZONE_N Condo/Apt HDB1-2RM HDB3-4RM HDB5RM-EC
## MARINA SOUTH       MARINA SOUTH         0        0        0         0
## PEARL'S HILL       PEARL'S HILL       420     4520     1400         0
## BOAT QUAY             BOAT QUAY        50        0        0         0
## HENDERSON HILL   HENDERSON HILL       190     3930     7990      1270
## REDHILL                 REDHILL      1930     1970     3520      3240
## ALEXANDRA HILL   ALEXANDRA HILL         0     3910     6320      3120
## BUKIT HO SWEE     BUKIT HO SWEE       540     3700     8610      1900
## CLARKE QUAY         CLARKE QUAY        10        0        0         0
## PASIR PANJANG 1 PASIR PANJANG 1      2970        0        0         0
## QUEENSWAY             QUEENSWAY       290        0        0         0
##                 Landed Properties  YOUNG_PR   AGED_PR      DENSITY Business
## MARINA SOUTH                    0 0.0000000 0.0000000 0.000000e+00        0
## PEARL'S HILL                    0 0.1671924 0.3154574 1.132514e-02        6
## BOAT QUAY                       0 0.0000000 0.2000000 3.109308e-04       40
## HENDERSON HILL                  0 0.1958146 0.2436472 2.247120e-02        0
## REDHILL                         0 0.2664165 0.1463415 2.751469e-02        0
## ALEXANDRA HILL                  0 0.2157303 0.2337079 1.295640e-02       39
## BUKIT HO SWEE                   0 0.1932203 0.2433898 2.673399e-02        6
## CLARKE QUAY                     0 0.0000000 0.0000000 3.446082e-05       12
## PASIR PANJANG 1              1430 0.2522727 0.1227273 4.056076e-03        0
## QUEENSWAY                       0 0.1034483 0.1724138 4.591192e-04        0
##                 Industry Government Embassy Banks MSB Residential Shopping
## MARINA SOUTH           0          0       0     0   0           0        0
## PEARL'S HILL           0          1       0    10  15           6        3
## BOAT QUAY              0          2       0     0   2           1        1
## HENDERSON HILL         0          0       0     0   0           0        0
## REDHILL                0          0       0     0   0           0        0
## ALEXANDRA HILL         1          6       1    10   5           4        1
## BUKIT HO SWEE          0          3       1     5   1          11        1
## CLARKE QUAY            0          3       1    17   2           6       10
## PASIR PANJANG 1        0          0       0     0   0           0        0
## QUEENSWAY              0          0       0     0   0           0        0

11.2.1 Remove the SUBZONE_N column

subzone_urban <- select(cluster_vars, c(2:17))
head(subzone_urban, n=10)

##                 Condo/Apt HDB1-2RM HDB3-4RM HDB5RM-EC Landed Properties
## MARINA SOUTH            0        0        0         0                 0
## PEARL'S HILL          420     4520     1400         0                 0
## BOAT QUAY              50        0        0         0                 0
## HENDERSON HILL        190     3930     7990      1270                 0
## REDHILL              1930     1970     3520      3240                 0
## ALEXANDRA HILL          0     3910     6320      3120                 0
## BUKIT HO SWEE         540     3700     8610      1900                 0
## CLARKE QUAY            10        0        0         0                 0
## PASIR PANJANG 1      2970        0        0         0              1430
## QUEENSWAY             290        0        0         0                 0
##                  YOUNG_PR   AGED_PR      DENSITY Business Industry Government
## MARINA SOUTH    0.0000000 0.0000000 0.000000e+00        0        0          0
## PEARL'S HILL    0.1671924 0.3154574 1.132514e-02        6        0          1
## BOAT QUAY       0.0000000 0.2000000 3.109308e-04       40        0          2
## HENDERSON HILL  0.1958146 0.2436472 2.247120e-02        0        0          0
## REDHILL         0.2664165 0.1463415 2.751469e-02        0        0          0
## ALEXANDRA HILL  0.2157303 0.2337079 1.295640e-02       39        1          6
## BUKIT HO SWEE   0.1932203 0.2433898 2.673399e-02        6        0          3
## CLARKE QUAY     0.0000000 0.0000000 3.446082e-05       12        0          3
## PASIR PANJANG 1 0.2522727 0.1227273 4.056076e-03        0        0          0
## QUEENSWAY       0.1034483 0.1724138 4.591192e-04        0        0          0
##                 Embassy Banks MSB Residential Shopping
## MARINA SOUTH          0     0   0           0        0
## PEARL'S HILL          0    10  15           6        3
## BOAT QUAY             0     0   2           1        1
## HENDERSON HILL        0     0   0           0        0
## REDHILL               0     0   0           0        0
## ALEXANDRA HILL        1    10   5           4        1
## BUKIT HO SWEE         1     5   1          11        1
## CLARKE QUAY           1    17   2           6       10
## PASIR PANJANG 1       0     0   0           0        0
## QUEENSWAY             0     0   0           0        0

11.3 Data Standardisation

11.3.1 Using Min-Max standardisation

subzone_urban.std <- normalize(subzone_urban)
summary(subzone_urban.std)

##    Condo/Apt          HDB1-2RM         HDB3-4RM         HDB5RM-EC      
##  Min.   :0.00000   Min.   :0.0000   Min.   :0.00000   Min.   :0.00000  
##  1st Qu.:0.00000   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.00000  
##  Median :0.01371   Median :0.0000   Median :0.00000   Median :0.00000  
##  Mean   :0.10894   Mean   :0.1153   Mean   :0.07937   Mean   :0.06875  
##  3rd Qu.:0.16905   3rd Qu.:0.1287   3rd Qu.:0.12940   3rd Qu.:0.07631  
##  Max.   :1.00000   Max.   :1.0000   Max.   :1.00000   Max.   :1.00000  
##  Landed Properties    YOUNG_PR         AGED_PR          DENSITY      
##  Min.   :0.00000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.00000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.00000   Median :0.6215   Median :0.3409   Median :0.1272  
##  Mean   :0.04112   Mean   :0.4861   Mean   :0.3256   Mean   :0.2315  
##  3rd Qu.:0.02125   3rd Qu.:0.7553   3rd Qu.:0.5439   3rd Qu.:0.4313  
##  Max.   :1.00000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##     Business          Industry         Government         Embassy       
##  Min.   :0.00000   Min.   :0.00000   Min.   :0.00000   Min.   :0.00000  
##  1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.00000  
##  Median :0.00000   Median :0.00000   Median :0.00000   Median :0.00000  
##  Mean   :0.01796   Mean   :0.01176   Mean   :0.05015   Mean   :0.02145  
##  3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.00000  
##  Max.   :1.00000   Max.   :1.00000   Max.   :1.00000   Max.   :1.00000  
##      Banks              MSB           Residential         Shopping      
##  Min.   :0.00000   Min.   :0.00000   Min.   :0.00000   Min.   :0.00000  
##  1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.00000  
##  Median :0.00000   Median :0.00000   Median :0.00000   Median :0.00000  
##  Mean   :0.05475   Mean   :0.02409   Mean   :0.03501   Mean   :0.03166  
##  3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.00000  
##  Max.   :1.00000   Max.   :1.00000   Max.   :1.00000   Max.   :1.00000

Min-max standardised clustering variables are now 0 to 1.

11.3.2 Using Z-score standardisation

subzone_urban.z <- scale(subzone_urban)
describe(subzone_urban.z)

##                   vars   n mean sd median trimmed  mad   min   max range  skew
## Condo/Apt            1 323    0  1  -0.56   -0.22 0.12 -0.64  5.25  5.89  1.99
## HDB1-2RM             2 323    0  1  -0.52   -0.25 0.00 -0.52  4.01  4.53  2.18
## HDB3-4RM             3 323    0  1  -0.60   -0.21 0.00 -0.60  6.96  7.56  2.70
## HDB5RM-EC            4 323    0  1  -0.49   -0.24 0.00 -0.49  6.65  7.14  3.33
## Landed Properties    5 323    0  1  -0.36   -0.25 0.00 -0.36  8.42  8.78  4.66
## YOUNG_PR             6 323    0  1   0.40    0.03 0.78 -1.42  1.50  2.93 -0.52
## AGED_PR              7 323    0  1   0.06   -0.05 1.37 -1.23  2.55  3.78  0.08
## DENSITY              8 323    0  1  -0.39   -0.15 0.70 -0.87  2.87  3.74  0.95
## Business             9 323    0  1  -0.25   -0.22 0.00 -0.25 13.80 14.06  9.09
## Industry            10 323    0  1  -0.15   -0.15 0.00 -0.15 12.31 12.46  9.37
## Government          11 323    0  1  -0.34   -0.27 0.00 -0.34  6.38  6.72  4.06
## Embassy             12 323    0  1  -0.21   -0.21 0.00 -0.21  9.36  9.56  6.32
## Banks               13 323    0  1  -0.39   -0.27 0.00 -0.39  6.69  7.08  3.44
## MSB                 14 323    0  1  -0.27   -0.24 0.00 -0.27 10.77 11.04  6.59
## Residential         15 323    0  1  -0.28   -0.24 0.00 -0.28  7.65  7.93  5.33
## Shopping            16 323    0  1  -0.30   -0.24 0.00 -0.30  9.21  9.51  5.81
##                   kurtosis   se
## Condo/Apt             4.33 0.06
## HDB1-2RM              4.26 0.06
## HDB3-4RM             10.59 0.06
## HDB5RM-EC            13.52 0.06
## Landed Properties    26.86 0.06
## YOUNG_PR             -1.38 0.06
## AGED_PR              -1.21 0.06
## DENSITY              -0.33 0.06
## Business            112.31 0.06
## Industry             99.30 0.06
## Government           18.63 0.06
## Embassy              44.36 0.06
## Banks                14.37 0.06
## MSB                  55.13 0.06
## Residential          30.95 0.06
## Shopping             42.94 0.06

11.3.3 Visualising the standardised clustering variables

r <- ggplot(data=subzone_urban, aes(x=`YOUNG_PR`))+
  geom_histogram(bins=20, color="black", fill="light blue")+
  ggtitle("Without Standardisation")

subzone_urban_s_df <- as.data.frame(subzone_urban.std)
s <- ggplot(data=subzone_urban_s_df, aes(x=`YOUNG_PR`))+
  geom_histogram(bins=20, color="black", fill="light blue")+
  ggtitle("Min-Max Standardisation")

subzone_urban_z_df <- as.data.frame(subzone_urban.z)
z <- ggplot(data=subzone_urban_z_df, aes(x=`YOUNG_PR`))+
  geom_histogram(bins=20, color="black", fill="light blue")+
  ggtitle("Z-score Standardisation")

ggarrange(r, s, z, ncol=3, nrow=1)

The overall distribution of the variables doesn’t change much. Therefore, I will not appy any data standarisation.

11.4 Computing proximity matrix using euclidean distance

distance is calcuated based on the information space, not the actual distance

proxmat <- dist(subzone_urban, method="euclidean")
head(proxmat, n=10)

##  [1] 4750.49546   64.10959 8996.33259 5522.11917 8060.18617 9577.25922
##  [7]   26.13427 3296.33131  290.00007    0.00000

11.5 Computing hierarchical clustering using Dendrogram method

With the not normalised euclidean distance proximity matrix

hclust_ward <- hclust(proxmat, method="ward.D")
plot(hclust_ward, cex=0.6)

11.6 Computing proximity matrix using euclidean distance

distance is calcuated based on the information space, not the actual distance

proxmat <- dist(subzone_urban, method="euclidean")
summary(proxmat)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0    3164    9113   12346   17730   91612

11.7 Selecting the optimal clustering algorithm

11.7.1 Compute the agglomerative coefficients

m <- c( "average", "single", "complete", "ward")
names(m) <- c( "average", "single", "complete", "ward")

ac <- function(x) {
  agnes(subzone_urban, method=x)$ac
}

map_dbl(m, ac)

##   average    single  complete      ward 
## 0.9749503 0.9437245 0.9802686 0.9909485

The results show Ward’s method identifies the strongest clustering structure compared to the other 3 clustering algorithm.

11.7.2 Interpreting the dendrograms with a border around the selected clusters

plot(hclust_ward, cex=0.6)
rect.hclust(hclust_ward, k=6, border=2:5)

After experimenting the cluster results, I choose to do a 6-cluster analysis for the social areas.

11.8 Perform visually-driven hiearchical clustering analysis

11.8.1 Transforming shan_ict data frame into a matrix

subzone_urban_mat <- data.matrix(subzone_urban)

11.8.2 Plot an interactive cluster heatmap

heatmaply(normalize(subzone_urban_mat), Colv=NA, dist_method="euclidean", hclust_method="ward.D", seriate="OLO", 
          colors=Blues, k_row=6, margins=c(NA,200,60,NA), fontsize_row=4, fontsize_col=5,
          main="Social Areas of Singapore by Subzone", xlab="Variable Indicators",
          ylab="Subzone in Singapore")

12 Delineate social areas using hierarchical clustering

12.1 Create a 6-cluster model from the calculated hierarchical clustering

groups <- as.factor(cutree(hclust_ward, k=6))

12.2 Visualise the clusters

In order to visualise the clusters, the groups object need to be appended onto urban_population simple feature object

subzone_urban_cluster <- cbind(urban_population, as.matrix(groups)) %>%
  rename(`CLUSTER`=`as.matrix.groups.`)

qtm(subzone_urban_cluster, "CLUSTER")

The subzone choropleth map above display quite dispersed social areas especially those in the East and North East side of Singapore. Therefore, I will proceed with Spatially Constrained Clustering analysis of the social areas.

13 Delineate social areas using spatially constrained clustering technique (SKATER approach)

13.1 Convert urban_population into SpatialPolygonDataFrame

urban_population_sp <- as_Spatial(urban_population)

13.2 Compute the neighbours list from polygon list

urban_population.nb <- poly2nb(urban_population_sp)
summary(urban_population.nb)

## Neighbour list object:
## Number of regions: 323 
## Number of nonzero links: 1934 
## Percentage nonzero weights: 1.853751 
## Average number of links: 5.987616 
## 5 regions with no links:
## 17 18 19 295 302
## Link number distribution:
## 
##  0  1  2  3  4  5  6  7  8  9 10 11 12 14 17 
##  5  2  6 10 26 77 87 51 34 16  3  3  1  1  1 
## 2 least connected regions:
## 16 234 with 1 link
## 1 most connected region:
## 313 with 17 links

Identity which 5 subzones that have no neighbours.

13.2.1 Identity the subzones with no neighbour

urban_population_sp@data[c(17,18,19,295,302),]

##                 SUBZONE_N Condo.Apt HDB1.2RM HDB3.4RM HDB5RM.EC
## 17                 SUDONG         0        0        0         0
## 18                SEMAKAU         0        0        0         0
## 19         SOUTHERN GROUP         0        0        0         0
## 295         PULAU SELETAR         0        0        0         0
## 302 NORTH-EASTERN ISLANDS         0        0        0         0
##     Landed.Properties YOUNG_PR ACTIVE_PR AGED_PR DENSITY Business Industry
## 17                  0        0         0       0       0        0        0
## 18                  0        0         0       0       0        0        0
## 19                  0        0         0       0       0        0        0
## 295                 0        0         0       0       0        0        0
## 302                 0        0         0       0       0        0        0
##     Government Embassy Banks MSB Residential Shopping
## 17           0       0     0   0           0        0
## 18           0       0     0   0           0        0
## 19           0       0     0   0           0        0
## 295          0       0     0   0           0        0
## 302          0       0     0   0           0        0

13.2.2 Remove the subzones from subzone_urban data frame and urban_population_sp

For calculating the neighbour cost

row.names.remove <- c("SUDONG", "SEMAKAU", "SOUTHERN GROUP", "PULAU SELETAR", "NORTH-EASTERN ISLANDS")
subzone_urban <- subzone_urban[!(row.names(subzone_urban) %in% row.names.remove), ]

subzone_urban_cluster <- subzone_urban_cluster[!(row.names(subzone_urban_cluster) %in% row.names.remove), ]

urban_population_sp <- st_as_sf(urban_population_sp)
urban_population_sp <- urban_population_sp[c(-17,-18,-19,-295,-302), ]
urban_population_sp <- as_Spatial(urban_population_sp)

13.3 Re-compute the neighbours list

urban_population.nb <- poly2nb(urban_population_sp)
summary(urban_population.nb)

## Neighbour list object:
## Number of regions: 318 
## Number of nonzero links: 1934 
## Percentage nonzero weights: 1.912503 
## Average number of links: 6.081761 
## Link number distribution:
## 
##  1  2  3  4  5  6  7  8  9 10 11 12 14 17 
##  2  6 10 26 77 87 51 34 16  3  3  1  1  1 
## 2 least connected regions:
## 16 234 with 1 link
## 1 most connected region:
## 313 with 17 links

After removing the 5 subzones, the average number of links increases due to the removal of subzones that aren’t reachable from another subzone.

13.3.1 Plot the neighbours list on top of the subzone area

plot(urban_population_sp, border=grey(.5))
plot(urban_population.nb, coordinates(urban_population_sp), col="blue", add=TRUE)

13.4 Computing minimum spanning tree

13.4.1 Compute the cost of each edge

nbcosts compute the cost of each edge which is the distance between each nodes

lcosts <- nbcosts(urban_population.nb, subzone_urban)

13.4.1.1 Convert the neighbour list to a list weights object with lcosts

B style to make sure the cost values are not row-standardised

urban_population.w <- nb2listw(urban_population.nb, lcosts, style="B")

## Warning in nb2listw(urban_population.nb, lcosts, style = "B"): zero sum general
## weights

summary(urban_population.w)

## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 318 
## Number of nonzero links: 1934 
## Percentage nonzero weights: 1.912503 
## Average number of links: 6.081761 
## Link number distribution:
## 
##  1  2  3  4  5  6  7  8  9 10 11 12 14 17 
##  2  6 10 26 77 87 51 34 16  3  3  1  1  1 
## 2 least connected regions:
## 16 234 with 1 link
## 1 most connected region:
## 313 with 17 links
## 
## Weights style: B 
## Weights constants summary:
##     n     nn       S0           S1           S2
## B 318 101124 20225567 1.026771e+12 1.062439e+13

13.4.2 Computing minimum spanning tree (MST)

urban_population.mst <- mstree(urban_population.w)

13.4.2.1 Check the class and dimension of the MST

class(urban_population.mst)

## [1] "mst"    "matrix"

dim(urban_population.mst)

## [1] 317   3

The minimum spanning tree consists on n-1 edges to traverse all the nodes. Therefore, the dimension is 317 instead of 318.

13.4.2.2 View the first few rows of the minimum spanning tree

head(urban_population.mst)

##      [,1] [,2]       [,3]
## [1,]   48   11   18.11077
## [2,]   11   10  290.00007
## [3,]   10   64  355.87224
## [4,]   11   27 2125.59664
## [5,]   27    9 1568.80878
## [6,]   27   53 2088.28256

13.4.2.3 Plot the observation numbers with the edges and the subzone boundaries

plot(urban_population_sp, border=gray(.5))
plot.mst(urban_population.mst, coordinates(urban_population_sp), col="blue", cex.lab=0.7, cex.circles=0.005, add=TRUE)

The plot shows the observation numbers of the nodes in addition to the edges.

13.5 Compute spatially constrained clusters (SKATER approach)

The number of cuts is one less than the number of clusters

clust6 <- skater(urban_population.mst[,1:2], subzone_urban, 5)

13.5.1 View the SKATER object

str(clust6)

## List of 8
##  $ groups      : num [1:318] 2 2 2 1 1 1 1 2 1 1 ...
##  $ edges.groups:List of 6
##   ..$ :List of 3
##   .. ..$ node: num [1:137] 181 206 262 290 307 300 274 315 283 277 ...
##   .. ..$ edge: num [1:136, 1:3] 290 294 315 300 254 286 302 301 10 11 ...
##   .. ..$ ssw : num 985951
##   ..$ :List of 3
##   .. ..$ node: num [1:94] 152 104 124 80 57 50 81 62 94 54 ...
##   .. ..$ edge: num [1:93, 1:3] 69 99 161 80 113 115 80 133 57 41 ...
##   .. ..$ ssw : num 294562
##   ..$ :List of 3
##   .. ..$ node: num [1:9] 263 270 269 259 215 257 205 260 264
##   .. ..$ edge: num [1:8, 1:3] 263 270 269 263 257 215 263 259 270 269 ...
##   .. ..$ ssw : num 93578
##   ..$ :List of 3
##   .. ..$ node: num [1:3] 179 186 196
##   .. ..$ edge: num [1:2, 1:3] 179 179 186 196 39076 ...
##   .. ..$ ssw : num 58202
##   ..$ :List of 3
##   .. ..$ node: num [1:29] 252 184 249 169 220 190 216 143 256 204 ...
##   .. ..$ edge: num [1:28, 1:3] 169 184 190 143 252 249 182 169 216 256 ...
##   .. ..$ ssw : num 273236
##   ..$ :List of 3
##   .. ..$ node: num [1:46] 158 156 134 187 142 145 183 167 135 110 ...
##   .. ..$ edge: num [1:45, 1:3] 134 187 158 187 235 167 156 154 183 145 ...
##   .. ..$ ssw : num 378966
##  $ not.prune   : NULL
##  $ candidates  : int [1:6] 1 2 3 4 5 6
##  $ ssto        : num 3016706
##  $ ssw         : num [1:6] 3016706 2712071 2529876 2374049 2231770 ...
##  $ crit        : num [1:2] 1 Inf
##  $ vec.crit    : num [1:318] 1 1 1 1 1 1 1 1 1 1 ...
##  - attr(*, "class")= chr "skater"

List structure is the groups vector containing the labels of the cluster that belongs to each observation

13.5.1.1 Check the cluster assignment

ccs6 <- clust6$groups
table(ccs6)

## ccs6
##   1   2   3   4   5   6 
## 137  94   9   3  29  46

13.5.2 Plot the six clusters on top of the township boundaries

plot(urban_population_sp, border=gray(.5))
plot(clust6, coordinates(urban_population_sp), cex.lab=.7, groups.colors=c("red", "green", "blue", "brown", "pink"), cex.circles=0.005, add=TRUE)

## Warning in segments(coords[id1, 1], coords[id1, 2], coords[id2, 1],
## coords[id2, : "add" is not a graphical parameter

## Warning in segments(coords[id1, 1], coords[id1, 2], coords[id2, 1],
## coords[id2, : "add" is not a graphical parameter

## Warning in segments(coords[id1, 1], coords[id1, 2], coords[id2, 1],
## coords[id2, : "add" is not a graphical parameter

## Warning in segments(coords[id1, 1], coords[id1, 2], coords[id2, 1],
## coords[id2, : "add" is not a graphical parameter

## Warning in segments(coords[id1, 1], coords[id1, 2], coords[id2, 1],
## coords[id2, : "add" is not a graphical parameter

## Warning in segments(coords[id1, 1], coords[id1, 2], coords[id2, 1],
## coords[id2, : "add" is not a graphical parameter

13.6 Plot the newly derived clusters

Spatially constraint cluster

groups_mat <- as.matrix(clust6$groups)
subzone_urban_spatialcluster <- cbind(subzone_urban_cluster, as.factor(groups_mat)) %>%
  rename(`SP_CLUSTER`=`as.factor.groups_mat.`)

subzone_urban_spatialcluster.map <- tm_shape(subzone_urban_spatialcluster)+
  tm_fill(col="SP_CLUSTER", n=6, title="SP_CLUSTER")+ 
  tm_legend(legend.position=c("right", "bottom"))+
  tm_layout(outer.margins=0, asp=0)+
  tm_borders(alpha = 0.5)

subzone_urban_cluster.map <- tm_shape(subzone_urban_spatialcluster)+
  tm_fill(col="CLUSTER", n=6, title="CLUSTER")+ 
  tm_legend(legend.position=c("right", "bottom"))+
  tm_layout(outer.margins=0, asp=0)+
  tm_borders(alpha = 0.5)

tmap_arrange(subzone_urban_spatialcluster.map, subzone_urban_cluster.map, asp=0.8, ncol=2)

After the spatially constrained clustering technique, the social areas now are much more clustered than before. From the plot:

Cluster 1 mainly is in the western side.
Cluster 2 is stretching from the west, central and to the eastern side of Singapore.
Cluster 3 is gather towards the north east side of Singapore.
Cluster 4 is has less clustered subzone but it’s in the Eastern side as well.
Cluster 5 is more towards the outer part of Singapore that surrounds the Cluster 1, 3, and 6.
Lastly, Cluster 6 is from the central to the eastern side of Singapore.

Take-home_Ex03

Poon Hao Jun

5/21/2020

1 Introduction

1.1 Data involved in this assignment

2 Setting up my environment

3 Preparing the necessary data for analysis

3.1 Import population as tibble dataframe

3.2 Import urban functions geospatial data as simple feature Dataframe

3.3 Import suzbone geospatial data as SpatialPolygonsDataFrame

4 Exploring on the population data

4.1 Glimpse the data type

4.2 Sorting population data using dplyr

4.2.1 Derive the 3 new age groups Young, Active, Aged

4.2.2 Combining the 3 room with 4 room as 1 variable and rename the necessary variables

4.3 Sorting the population data

4.3.1 Get total population in each subzone

4.3.2 Get total population for each dwelling in each subzone

4.3.3 Get the 3 age groups in each subzone

4.4 Combining all the computed variables to each subzone

4.4.1 Derive the new variables by dividing with subzone population

5 Detect potential geometry issues for mpsz SpatialPolygonsDataFrame

5.1 Viewing the overall report on the mpsz SpatialPolygonsDataFrame

5.1.1 Process mpsz SpatialPolygonsDataFrame

5.1.2 Viewing the overall report after fixing the invalid spatial objects

5.2 Compute all the subzones area

5.2.1 Extract the subzones area

5.2.2 Compute the population density

6 Exploring on the remaining urban functions spatial data

6.1 View the business spatial data

6.2 View the embassy spatial data

6.3 View the financial spatial data

6.4 View the residential spatial data

6.5 View the shopping spatial data

6.6 Conclusion of the urban functions spatial data

7 Checking the projection system of the spatial data

7.1 Business spatial data

7.2 Embassy spatial data

7.3 Financial spatial data

7.4 Residential spatial data

7.5 Shopping spatial data

8 Filter and transform urban functions data

8.1 Business and Industry

8.2 Embassy

8.3 Financial

8.4 Residential

8.5 Shopping

8.6 Merging all the urban functions computed data

9 Combine subzone with urban functions spatial data

10 Correlation Analysis

10.1 Visualise and analyse the correlation of the input variables

11 Hierarchy Cluster Analysis

11.1 Extracting analysis variables

11.2 Replace the row index from row number to township name

11.2.1 Remove the SUBZONE_N column

11.3 Data Standardisation

11.3.1 Using Min-Max standardisation

11.3.2 Using Z-score standardisation

11.3.3 Visualising the standardised clustering variables

11.4 Computing proximity matrix using euclidean distance

11.5 Computing hierarchical clustering using Dendrogram method

11.6 Computing proximity matrix using euclidean distance

11.7 Selecting the optimal clustering algorithm

11.7.1 Compute the agglomerative coefficients

11.7.2 Interpreting the dendrograms with a border around the selected clusters

11.8 Perform visually-driven hiearchical clustering analysis

11.8.1 Transforming shan_ict data frame into a matrix

11.8.2 Plot an interactive cluster heatmap

12 Delineate social areas using hierarchical clustering

12.1 Create a 6-cluster model from the calculated hierarchical clustering

12.2 Visualise the clusters

13 Delineate social areas using spatially constrained clustering technique (SKATER approach)

13.1 Convert urban_population into SpatialPolygonDataFrame

13.2 Compute the neighbours list from polygon list

13.2.1 Identity the subzones with no neighbour

13.2.2 Remove the subzones from subzone_urban data frame and urban_population_sp

13.3 Re-compute the neighbours list

13.3.1 Plot the neighbours list on top of the subzone area

13.4 Computing minimum spanning tree

13.4.1 Compute the cost of each edge