In this take home assignment, I will need to delineate using hierarchical clustering technique and spatially constrained clustering technique, then describe the analysis result.
Singapore Residents by Planning AreaSubzone, Age Group, Sex and Type of Dwelling, June 2011-2019 (extract the following indicators for Year 2019)
Urban functions geospatial data
The R packages needed for this take home exercise are as follows:
Spatial data handling +sf, and spdep Geospatial analysis package +ClustGeo Attribute data handling +tidyverse, especially readr, ggplot2 and dplyr Choropleth mapping +tmap *cleangeo +facilitate handling and catching rgeos geometry issues +provide an utility to clean the spatial objects
packages <- c('rgdal', 'spdep', 'raster', 'ClustGeo', 'tmap', 'sf', 'ggpubr', 'cluster', 'heatmaply', 'corrplot', 'tidyverse', 'cleangeo', 'psych')
for(p in packages){
if(!require(p, character.only=T)){
install.packages(p)
}
library(p,character.only=T)
}
population <- read_csv("data/aspatial/Resident Population by Planning AreaSubzone, Age Group and Sex 2019.csv")
## Parsed with column specification:
## cols(
## Time = col_double(),
## PA = col_character(),
## SZ = col_character(),
## AG = col_character(),
## Sex = col_character(),
## TOD = col_character(),
## Pop = col_double()
## )
business <- st_read(dsn="data/geospatial", layer="Business")
## Reading layer `Business' from data source `D:\Hao Jun\School\[IS415] Geospatial Analytics and Applications\Take-home_Ex03\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 6550 features and 5 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.6147 ymin: 1.24605 xmax: 104.0044 ymax: 1.4698
## geographic CRS: WGS 84
financial <- st_read(dsn="data/geospatial", layer="Financial")
## Reading layer `Financial' from data source `D:\Hao Jun\School\[IS415] Geospatial Analytics and Applications\Take-home_Ex03\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 3320 features and 29 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.6256 ymin: 1.24392 xmax: 103.9998 ymax: 1.46247
## geographic CRS: WGS 84
embassy <- st_read(dsn="data/geospatial", layer="Govt_Embassy")
## Reading layer `Govt_Embassy' from data source `D:\Hao Jun\School\[IS415] Geospatial Analytics and Applications\Take-home_Ex03\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 443 features and 5 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.6282 ymin: 1.24911 xmax: 103.9884 ymax: 1.45765
## geographic CRS: WGS 84
residential <- st_read(dsn="data/geospatial", layer="Private residential")
## Reading layer `Private residential' from data source `D:\Hao Jun\School\[IS415] Geospatial Analytics and Applications\Take-home_Ex03\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 3604 features and 5 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.6295 ymin: 1.23943 xmax: 103.9749 ymax: 1.45379
## geographic CRS: WGS 84
shopping <- st_read(dsn="data/geospatial", layer="Shopping")
## Reading layer `Shopping' from data source `D:\Hao Jun\School\[IS415] Geospatial Analytics and Applications\Take-home_Ex03\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 511 features and 5 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.679 ymin: 1.24779 xmax: 103.9644 ymax: 1.4535
## geographic CRS: WGS 84
In order to import the various geospatial data, I will be using readOGR function under rgdal package.
mpsz <- readOGR(dsn="data/geospatial", layer="MP14_SUBZONE_WEB_PL")
## OGR data source with driver: ESRI Shapefile
## Source: "D:\Hao Jun\School\[IS415] Geospatial Analytics and Applications\Take-home_Ex03\data\geospatial", layer: "MP14_SUBZONE_WEB_PL"
## with 323 features
## It has 15 fields
glimpse(population)
## Rows: 98,192
## Columns: 7
## $ Time <dbl> 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 20...
## $ PA <chr> "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio", "Ang M...
## $ SZ <chr> "Ang Mo Kio Town Centre", "Ang Mo Kio Town Centre", "Ang Mo Ki...
## $ AG <chr> "0_to_4", "0_to_4", "0_to_4", "0_to_4", "0_to_4", "0_to_4", "0...
## $ Sex <chr> "Males", "Males", "Males", "Males", "Males", "Males", "Males",...
## $ TOD <chr> "HDB 1- and 2-Room Flats", "HDB 3-Room Flats", "HDB 4-Room Fla...
## $ Pop <dbl> 0, 10, 10, 20, 0, 0, 50, 0, 0, 10, 10, 20, 0, 0, 40, 0, 0, 10,...
population_sort <- population %>%
spread(AG, Pop) %>%
mutate_at(.vars = vars(PA, SZ), .funs = funs(toupper)) %>%
group_by(Time, PA, SZ, Sex, TOD) %>%
mutate(YOUNG=`0_to_4`+`5_to_9`+`10_to_14`+`15_to_19`+`20_to_24`) %>%
mutate(ACTIVE=`25_to_29`+`30_to_34`+`35_to_39`+`40_to_44`+`45_to_49`+`50_to_54`+`55_to_59`+`60_to_64`) %>%
mutate(AGED=`65_to_69`+`70_to_74`+`75_to_79`+`80_to_84`+`85_to_89`+`90_and_over`) %>%
filter(TOD != "Others" && TOD != "HUDC Flats (excluding those privatised)") %>%
summarise(YOUNG=sum(YOUNG), ACTIVE=sum(ACTIVE), AGED=sum(AGED))
## Warning: funs() is soft deprecated as of dplyr 0.8.0
## Please use a list of either functions or lambdas:
##
## # Simple named list:
## list(mean = mean, median = median)
##
## # Auto named with `tibble::lst()`:
## tibble::lst(mean, median)
##
## # Using lambdas
## list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
## This warning is displayed once per session.
population_sort$TOD <- factor(population_sort$TOD)
glimpse(population_sort)
## Rows: 3,876
## Columns: 8
## Groups: Time, PA, SZ, Sex [646]
## $ Time <dbl> 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, ...
## $ PA <chr> "ANG MO KIO", "ANG MO KIO", "ANG MO KIO", "ANG MO KIO", "ANG...
## $ SZ <chr> "ANG MO KIO TOWN CENTRE", "ANG MO KIO TOWN CENTRE", "ANG MO ...
## $ Sex <chr> "Females", "Females", "Females", "Females", "Females", "Fema...
## $ TOD <fct> Condominiums and Other Apartments, HDB 1- and 2-Room Flats, ...
## $ YOUNG <dbl> 340, 0, 50, 80, 220, 0, 340, 0, 50, 70, 210, 0, 0, 90, 1040,...
## $ ACTIVE <dbl> 600, 0, 160, 200, 530, 0, 510, 0, 140, 160, 480, 0, 0, 230, ...
## $ AGED <dbl> 80, 0, 50, 80, 220, 0, 60, 0, 20, 60, 170, 0, 0, 180, 1660, ...
In this transformation, I spread out the age group and population count columns. Then, I proceed to mutate the subzone column to uppercase for later analysis with mpsz SpatialPolygonsDataFrame. Followed by combining the age group into 3 groups such as Active (age 25-64), Young (age 0-24), Aged (65 and above). Lastly, is to select the columns that I would like to keep, filter away those dwelling that is not required for this analysis, and change the TOD column into factor for the next phase of transformation.
population_sort$TOD <- population_sort$TOD %>%
fct_collapse("HDB3-4RM" = c("HDB 3-Room Flats", "HDB 4-Room Flats")) %>%
recode("HDB 1- and 2-Room Flats"="HDB1-2RM", "HDB 5-Room and Executive Flats"="HDB5RM-EC", "Condominiums and Other Apartments"="Condo/Apt")
head(population_sort, n=10)
## # A tibble: 10 x 8
## # Groups: Time, PA, SZ, Sex [2]
## Time PA SZ Sex TOD YOUNG ACTIVE AGED
## <dbl> <chr> <chr> <chr> <fct> <dbl> <dbl> <dbl>
## 1 2019 ANG MO KIO ANG MO KIO TOWN CE~ Femal~ Condo/Apt 340 600 80
## 2 2019 ANG MO KIO ANG MO KIO TOWN CE~ Femal~ HDB1-2RM 0 0 0
## 3 2019 ANG MO KIO ANG MO KIO TOWN CE~ Femal~ HDB3-4RM 50 160 50
## 4 2019 ANG MO KIO ANG MO KIO TOWN CE~ Femal~ HDB3-4RM 80 200 80
## 5 2019 ANG MO KIO ANG MO KIO TOWN CE~ Femal~ HDB5RM-EC 220 530 220
## 6 2019 ANG MO KIO ANG MO KIO TOWN CE~ Femal~ Landed Proper~ 0 0 0
## 7 2019 ANG MO KIO ANG MO KIO TOWN CE~ Males Condo/Apt 340 510 60
## 8 2019 ANG MO KIO ANG MO KIO TOWN CE~ Males HDB1-2RM 0 0 0
## 9 2019 ANG MO KIO ANG MO KIO TOWN CE~ Males HDB3-4RM 50 140 20
## 10 2019 ANG MO KIO ANG MO KIO TOWN CE~ Males HDB3-4RM 70 160 60
The next transformation, I am combining the 3 room and 4 room flats variables together using fct_collapse and using recode to rename the values in TOD column.
population_sort <- population_sort %>%
group_by(Time, PA, SZ, Sex, TOD) %>%
summarise(YOUNG=sum(YOUNG), ACTIVE=sum(ACTIVE), AGED=sum(AGED))
head(population_sort, n=10)
## # A tibble: 10 x 8
## # Groups: Time, PA, SZ, Sex [2]
## Time PA SZ Sex TOD YOUNG ACTIVE AGED
## <dbl> <chr> <chr> <chr> <fct> <dbl> <dbl> <dbl>
## 1 2019 ANG MO KIO ANG MO KIO TOWN CE~ Femal~ Condo/Apt 340 600 80
## 2 2019 ANG MO KIO ANG MO KIO TOWN CE~ Femal~ HDB1-2RM 0 0 0
## 3 2019 ANG MO KIO ANG MO KIO TOWN CE~ Femal~ HDB3-4RM 130 360 130
## 4 2019 ANG MO KIO ANG MO KIO TOWN CE~ Femal~ HDB5RM-EC 220 530 220
## 5 2019 ANG MO KIO ANG MO KIO TOWN CE~ Femal~ Landed Proper~ 0 0 0
## 6 2019 ANG MO KIO ANG MO KIO TOWN CE~ Males Condo/Apt 340 510 60
## 7 2019 ANG MO KIO ANG MO KIO TOWN CE~ Males HDB1-2RM 0 0 0
## 8 2019 ANG MO KIO ANG MO KIO TOWN CE~ Males HDB3-4RM 120 300 80
## 9 2019 ANG MO KIO ANG MO KIO TOWN CE~ Males HDB5RM-EC 210 480 170
## 10 2019 ANG MO KIO ANG MO KIO TOWN CE~ Males Landed Proper~ 0 0 0
Then, lastly combining the rows after renaming “HDB3-4RM” using summarise function.
population_sort_all <- population_sort %>%
group_by(SZ) %>%
summarise(Pop=sum(YOUNG, ACTIVE,AGED))
head(population_sort_all, n=10)
## # A tibble: 10 x 2
## SZ Pop
## <chr> <dbl>
## 1 ADMIRALTY 14110
## 2 AIRPORT ROAD 0
## 3 ALEXANDRA HILL 13350
## 4 ALEXANDRA NORTH 2120
## 5 ALJUNIED 38930
## 6 ANAK BUKIT 22040
## 7 ANCHORVALE 46610
## 8 ANG MO KIO TOWN CENTRE 4880
## 9 ANSON 0
## 10 BALESTIER 32120
population_sort_tod <- population_sort %>%
group_by(SZ, TOD) %>%
summarise(Pop=sum(YOUNG, ACTIVE, AGED)) %>%
spread(TOD, Pop)
head(population_sort_tod, n=10)
## # A tibble: 10 x 6
## # Groups: SZ [10]
## SZ `Condo/Apt` `HDB1-2RM` `HDB3-4RM` `HDB5RM-EC` `Landed Properti~
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ADMIRALTY 0 1140 6750 6220 0
## 2 AIRPORT ROAD 0 0 0 0 0
## 3 ALEXANDRA HI~ 0 3910 6320 3120 0
## 4 ALEXANDRA NO~ 2120 0 0 0 0
## 5 ALJUNIED 11930 2120 18020 6400 460
## 6 ANAK BUKIT 9020 160 2090 3170 7600
## 7 ANCHORVALE 5000 1680 17280 22650 0
## 8 ANG MO KIO T~ 1930 0 1120 1830 0
## 9 ANSON 0 0 0 0 0
## 10 BALESTIER 9610 1790 15490 4660 570
population_sort_ageg <- population_sort %>%
group_by(SZ) %>%
summarise(YOUNG=sum(YOUNG), ACTIVE=sum(ACTIVE), AGED=sum(AGED))
head(population_sort_ageg, n=10)
## # A tibble: 10 x 4
## SZ YOUNG ACTIVE AGED
## <chr> <dbl> <dbl> <dbl>
## 1 ADMIRALTY 4370 8380 1360
## 2 AIRPORT ROAD 0 0 0
## 3 ALEXANDRA HILL 2880 7350 3120
## 4 ALEXANDRA NORTH 580 1420 120
## 5 ALJUNIED 8260 23700 6970
## 6 ANAK BUKIT 5860 12370 3810
## 7 ANCHORVALE 15250 27740 3620
## 8 ANG MO KIO TOWN CENTRE 1360 2780 740
## 9 ANSON 0 0 0
## 10 BALESTIER 7310 19020 5790
population_combine <- left_join(population_sort_all, population_sort_tod, by=c("SZ"="SZ"))
population_combine <- left_join(population_combine, population_sort_ageg, by=c("SZ"="SZ"))
head(population_combine, n=10)
## # A tibble: 10 x 10
## SZ Pop `Condo/Apt` `HDB1-2RM` `HDB3-4RM` `HDB5RM-EC` `Landed Propert~
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ADMI~ 14110 0 1140 6750 6220 0
## 2 AIRP~ 0 0 0 0 0 0
## 3 ALEX~ 13350 0 3910 6320 3120 0
## 4 ALEX~ 2120 2120 0 0 0 0
## 5 ALJU~ 38930 11930 2120 18020 6400 460
## 6 ANAK~ 22040 9020 160 2090 3170 7600
## 7 ANCH~ 46610 5000 1680 17280 22650 0
## 8 ANG ~ 4880 1930 0 1120 1830 0
## 9 ANSON 0 0 0 0 0 0
## 10 BALE~ 32120 9610 1790 15490 4660 570
## # ... with 3 more variables: YOUNG <dbl>, ACTIVE <dbl>, AGED <dbl>
population_combine <- population_combine %>%
group_by(SZ) %>%
mutate(`YOUNG_PR`=`YOUNG`/`Pop`) %>%
mutate(`ACTIVE_PR`=`ACTIVE`/`Pop`) %>%
mutate(`AGED_PR`=`AGED`/`Pop`)
population_combine$SZ <- factor(population_combine$SZ)
head(population_combine, n=10)
## # A tibble: 10 x 13
## # Groups: SZ [10]
## SZ Pop `Condo/Apt` `HDB1-2RM` `HDB3-4RM` `HDB5RM-EC` `Landed Propert~
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ADMI~ 14110 0 1140 6750 6220 0
## 2 AIRP~ 0 0 0 0 0 0
## 3 ALEX~ 13350 0 3910 6320 3120 0
## 4 ALEX~ 2120 2120 0 0 0 0
## 5 ALJU~ 38930 11930 2120 18020 6400 460
## 6 ANAK~ 22040 9020 160 2090 3170 7600
## 7 ANCH~ 46610 5000 1680 17280 22650 0
## 8 ANG ~ 4880 1930 0 1120 1830 0
## 9 ANSON 0 0 0 0 0 0
## 10 BALE~ 32120 9610 1790 15490 4660 570
## # ... with 6 more variables: YOUNG <dbl>, ACTIVE <dbl>, AGED <dbl>,
## # YOUNG_PR <dbl>, ACTIVE_PR <dbl>, AGED_PR <dbl>
In order to detect for any geometry issues, I will be using clgeo_CollectionReport function under cleangeo package which gives us more details.
clgeo_CollectionReport(mpsz) %>%
filter(valid == FALSE)
## type valid issue_type error_msg
## 1 rgeos_validity FALSE GEOM_VALIDITY <NA>
## 2 rgeos_validity FALSE GEOM_VALIDITY <NA>
## 3 rgeos_validity FALSE GEOM_VALIDITY <NA>
## 4 rgeos_validity FALSE GEOM_VALIDITY <NA>
## 5 rgeos_validity FALSE GEOM_VALIDITY <NA>
## 6 rgeos_validity FALSE GEOM_VALIDITY <NA>
## 7 rgeos_validity FALSE GEOM_VALIDITY <NA>
## 8 rgeos_validity FALSE GEOM_VALIDITY <NA>
## 9 rgeos_validity FALSE GEOM_VALIDITY <NA>
## warning_msg
## 1 Ring Self-intersection at or near point 27932.392599999999 21982.797200000001
## 2 Ring Self-intersection at or near point 26885.443899999998 26668.312099999999
## 3 Ring Self-intersection at or near point 26920.169000000002 26978.544099999999
## 4 Ring Self-intersection at or near point 15432.475 31319.716
## 5 Ring Self-intersection at or near point 12861.382900000001 32207.492300000002
## 6 Ring Self-intersection at or near point 19681.235400000001 31294.4522
## 7 Ring Self-intersection at or near point 41375.108 40432.858899999999
## 8 Ring Self-intersection at or near point 38542.2261 44605.408900000002
## 9 Ring Self-intersection at or near point 21702.562300000001 48125.1155
The warning messages mostly state that “Ring Self-intersection at or near point xxx”. This would mean that some polygons are inside the overlap area or it’s within an invalid geometry coordinates.
In order to view the result at a glance, I will be using clgeo_SummaryReport function under cleangeo package.
clgeo_SummaryReport(clgeo_CollectionReport(mpsz))
## type valid issue_type
## rgeos_validity: 9 Mode :logical GEOM_VALIDITY: 9
## NA's :314 FALSE:9 NA's :314
## TRUE :314
The report show that there is 9 out of 323 spatial objects are not valid. Therefore, I will need to fix these 9 spatial objects.
I will use clgeo_Clean another function under cleangeo package to fix the 9 spatial objects that is not valid from the summary report.
The cleaning strategy used by clgeo_Clean is through polygonation method using re-polygonation intuitive algorithm to rebuild those identifed invalid geometry into a valid geometry.
mpsz.clean <- clgeo_Clean(mpsz)
clgeo_SummaryReport(clgeo_CollectionReport(mpsz.clean))
## type valid issue_type
## NA's:323 Mode:logical NA's:323
## TRUE:323
From the summary report, it’s showing that there are 323 spatial objects are now valid. This shows that after applying the polygonation cleaning strategy, it does rebuild the 9 spatial objects into a valid geometries.
mpsz.clean$AREA <- gArea(mpsz.clean, byid=T)
Ensure that after the fix on the spatial objects the subzone area matches
any(duplicated(mpsz.clean@data$SHAPE_Area, mpsz.clean@data$AREA))
## [1] FALSE
Units of measurement is in meters square
area <- mpsz.clean@data %>%
select(SUBZONE_N, AREA)
population_combine <- left_join(population_combine, area, by=c("SZ"="SUBZONE_N"))
population_combine <- population_combine %>%
group_by(SZ) %>%
mutate(DENSITY = (`Pop`/`AREA`))
head(population_combine, n=10)
## # A tibble: 10 x 15
## # Groups: SZ [10]
## SZ Pop `Condo/Apt` `HDB1-2RM` `HDB3-4RM` `HDB5RM-EC` `Landed Propert~
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ADMI~ 14110 0 1140 6750 6220 0
## 2 AIRP~ 0 0 0 0 0 0
## 3 ALEX~ 13350 0 3910 6320 3120 0
## 4 ALEX~ 2120 2120 0 0 0 0
## 5 ALJU~ 38930 11930 2120 18020 6400 460
## 6 ANAK~ 22040 9020 160 2090 3170 7600
## 7 ANCH~ 46610 5000 1680 17280 22650 0
## 8 ANG ~ 4880 1930 0 1120 1830 0
## 9 ANSON 0 0 0 0 0 0
## 10 BALE~ 32120 9610 1790 15490 4660 570
## # ... with 8 more variables: YOUNG <dbl>, ACTIVE <dbl>, AGED <dbl>,
## # YOUNG_PR <dbl>, ACTIVE_PR <dbl>, AGED_PR <dbl>, AREA <dbl>, DENSITY <dbl>
Population Density = Number of People/Land Area
summary(business)
## POI_ID SEQ_NUM FAC_TYPE
## Min. :3.644e+07 Min. :1.000 Min. :5000
## 1st Qu.:9.967e+08 1st Qu.:1.000 1st Qu.:5000
## Median :1.097e+09 Median :1.000 Median :5000
## Mean :9.933e+08 Mean :1.021 Mean :5084
## 3rd Qu.:1.108e+09 3rd Qu.:1.000 3rd Qu.:5000
## Max. :1.204e+09 Max. :3.000 Max. :9991
##
## POI_NAME ST_NAME
## CAMBRIDGE INDUSTRIAL TRUST: 8 TAGORE LN : 83
## DHL : 6 JOO KOON CIR : 80
## NATIONAL OILWELL VARCO : 6 GUL CIR : 62
## ST MICROELECTRONICS : 6 KAKI BUKIT PL : 53
## CWT : 5 KAKI BUKIT IND TER: 52
## HALLIBURTON : 5 (Other) :5941
## (Other) :6514 NA's : 279
## geometry
## POINT :6550
## epsg:4326 : 0
## +proj=long...: 0
##
##
##
##
Filter and convert the columns to the correct data types
business_reduced <- subset(business, select=-c(POI_ID, SEQ_NUM))
business_reduced$FAC_TYPE <- factor(business_reduced$FAC_TYPE)
summary(business_reduced)
## FAC_TYPE POI_NAME ST_NAME
## 5000:6440 CAMBRIDGE INDUSTRIAL TRUST: 8 TAGORE LN : 83
## 9991: 110 DHL : 6 JOO KOON CIR : 80
## NATIONAL OILWELL VARCO : 6 GUL CIR : 62
## ST MICROELECTRONICS : 6 KAKI BUKIT PL : 53
## CWT : 5 KAKI BUKIT IND TER: 52
## HALLIBURTON : 5 (Other) :5941
## (Other) :6514 NA's : 279
## geometry
## POINT :6550
## epsg:4326 : 0
## +proj=long...: 0
##
##
##
##
business %>%
group_by(SEQ_NUM, FAC_TYPE) %>%
tally(sort=TRUE)
## Simple feature collection with 5 features and 3 fields
## geometry type: MULTIPOINT
## dimension: XY
## bbox: xmin: 103.6147 ymin: 1.24605 xmax: 104.0044 ymax: 1.4698
## geographic CRS: WGS 84
## # A tibble: 5 x 4
## # Groups: SEQ_NUM [3]
## SEQ_NUM FAC_TYPE n geometry
## <int> <int> <int> <MULTIPOINT [°]>
## 1 1 5000 6320 ((103.6147 1.24605), (103.6198 1.27715), (103.6208 1.2~
## 2 2 5000 117 ((103.624 1.29208), (103.6251 1.29124), (103.6251 1.29~
## 3 1 9991 95 ((103.6237 1.29389), (103.6242 1.29325), (103.6251 1.2~
## 4 2 9991 15 ((103.6237 1.29389), (103.6251 1.29333), (103.633 1.31~
## 5 3 5000 3 ((103.6936 1.35292), (103.7493 1.30213), (103.8451 1.2~
Wondering what is SEQ_NUM, I group by the SEQ_NUM with FAC_TYPE to view association. Based on the POI_NAME, I belive that it’s the way how the FAC_TYPE are being lablel since 5000 or 9991 are also associated to 1 and 2.
head(business %>%
filter(FAC_TYPE == 5000), n=10)
## Simple feature collection with 10 features and 5 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.7354 ymin: 1.2972 xmax: 103.8858 ymax: 1.34032
## geographic CRS: WGS 84
## POI_ID SEQ_NUM FAC_TYPE POI_NAME
## 1 1101180209 1 5000 JOHN CHEN
## 2 1101180210 1 5000 TROPICAL INDUSTRIAL BUILDING
## 3 1101180211 1 5000 LIAN CHEONG INDUSTRIAL BUILDING
## 4 1101180212 1 5000 MALAYSIA GARMENT MANUFACTURERS
## 5 1101180213 1 5000 UNIGOLD
## 6 1192316144 1 5000 NUS UNIVERSITY HALL
## 7 1144317654 1 5000 SUITES AT BUKIT TIMAH
## 8 1103507488 1 5000 TIONG HUAT
## 9 1001052867 1 5000 LEE CHOON GUAN TIMBER MERCHANT
## 10 1001052868 1 5000 WEIGHT BRIDGE SERVICE
## ST_NAME geometry
## 1 LITTLE RD POINT (103.8856 1.33841)
## 2 LITTLE RD POINT (103.8852 1.33832)
## 3 LITTLE RD POINT (103.8852 1.33834)
## 4 <NA> POINT (103.8855 1.33821)
## 5 LITTLE RD POINT (103.8858 1.33844)
## 6 LOWER KENT RIDGE RD POINT (103.7777 1.2972)
## 7 JALAN JURONG KECHIL POINT (103.7738 1.34032)
## 8 KALLANG PUDDING RD POINT (103.879 1.32773)
## 9 PENJURU RD POINT (103.7354 1.3184)
## 10 PENJURU RD POINT (103.7361 1.31927)
head(business %>%
filter(FAC_TYPE == 9991), n=10)
## Simple feature collection with 10 features and 5 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.6242 ymin: 1.29325 xmax: 103.9686 ymax: 1.3598
## geographic CRS: WGS 84
## POI_ID SEQ_NUM FAC_TYPE
## 1 1110491789 1 9991
## 2 1099992474 1 9991
## 3 1099992477 1 9991
## 4 1099992477 2 9991
## 5 1100464367 1 9991
## 6 1100464367 2 9991
## 7 1100464368 1 9991
## 8 1100464369 1 9991
## 9 1100464369 2 9991
## 10 1108286608 1 9991
## POI_NAME
## 1 KAKI BUKIT INDUSTRIAL ESTATE
## 2 TERRACE FACTORIES TUAS SOUTH ST 5
## 3 TERRACE FACTORIES TUAS SOUTH ST 5 EAST ENTRANCE
## 4 JTC TERRACE FACTORIES TUAS S ST 5
## 5 CHANGI BUSINESS PARK CHANGI BUSINESS PARK CENTRAL 1 ENTRANCE
## 6 CHANGI BUSINESS PARK CENTRAL 1
## 7 CHANGI BUSINESS PARK
## 8 CHANGI BUSINESS PARK CHANGI BUSINESS PARK VISTA ENTRANCE
## 9 CHANGI BUSINESS PARK VISTA ENTRANCE
## 10 ANG MO KIO INDUSTRIAL PARK 1
## ST_NAME geometry
## 1 KAKI BUKIT AVE 1 POINT (103.9042 1.33269)
## 2 TUAS SOUTH ST 5 POINT (103.6242 1.29325)
## 3 TUAS SOUTH ST 5 POINT (103.6251 1.29333)
## 4 TUAS SOUTH ST 5 POINT (103.6251 1.29333)
## 5 CHANGI BUSINESS PARK CENTRAL 1 POINT (103.9633 1.33283)
## 6 CHANGI BUSINESS PARK CENTRAL 1 POINT (103.9633 1.33283)
## 7 CHANGI BUSINESS PARK CENTRAL 2 POINT (103.9657 1.33445)
## 8 CHANGI BUSINESS PARK VIS POINT (103.9686 1.33649)
## 9 CHANGI BUSINESS PARK VIS POINT (103.9686 1.33649)
## 10 ANG MO KIO IND PARK 1 POINT (103.856 1.3598)
After viewing in detail, there are 2 types of FAC_TYPE, 5000 indicates the business, and 9991 indicates industry. Therefore, I will need to convert the SpatialPointsDataFrame into sf dataframe before I can filter them separately.
summary(embassy)
## POI_ID SEQ_NUM FAC_TYPE
## Min. :3.644e+07 Min. :1.000 Min. :9525
## 1st Qu.:1.010e+09 1st Qu.:1.000 1st Qu.:9525
## Median :1.058e+09 Median :1.000 Median :9525
## Mean :1.006e+09 Mean :1.111 Mean :9651
## 3rd Qu.:1.113e+09 3rd Qu.:1.000 3rd Qu.:9993
## Max. :1.203e+09 Max. :2.000 Max. :9993
##
## POI_NAME ST_NAME geometry
## ANG MO KIO TOWN COUNCIL : 5 MAXWELL RD: 16 POINT :443
## SEMBAWANG-NEE SOON TOWN COUNCIL: 3 THOMSON RD: 12 epsg:4326 : 0
## ALJUNIED HOUGANG TOWN COUNCIL : 2 ORCHARD RD: 11 +proj=long...: 0
## ALJUNIED TOWN COUNCIL : 2 COLLEGE RD: 10
## BISHAN-TOA PAYOH TOWN COUNCIL : 2 SCOTTS RD : 8
## CENTRAL PROVIDENT FUND BOARD : 2 (Other) :358
## (Other) :427 NA's : 28
Filter and convert the columns to the correct data types
embassy_reduced <- subset(embassy, select=-c(POI_ID, SEQ_NUM))
embassy_reduced$FAC_TYPE <- factor(embassy_reduced$FAC_TYPE)
summary(embassy_reduced)
## FAC_TYPE POI_NAME ST_NAME
## 9525:324 ANG MO KIO TOWN COUNCIL : 5 MAXWELL RD: 16
## 9993:119 SEMBAWANG-NEE SOON TOWN COUNCIL: 3 THOMSON RD: 12
## ALJUNIED HOUGANG TOWN COUNCIL : 2 ORCHARD RD: 11
## ALJUNIED TOWN COUNCIL : 2 COLLEGE RD: 10
## BISHAN-TOA PAYOH TOWN COUNCIL : 2 SCOTTS RD : 8
## CENTRAL PROVIDENT FUND BOARD : 2 (Other) :358
## (Other) :427 NA's : 28
## geometry
## POINT :443
## epsg:4326 : 0
## +proj=long...: 0
##
##
##
##
head(embassy %>%
filter(FAC_TYPE == 9525), n=10)
## Simple feature collection with 10 features and 5 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.8359 ymin: 1.27869 xmax: 103.9698 ymax: 1.36625
## geographic CRS: WGS 84
## POI_ID SEQ_NUM FAC_TYPE POI_NAME ST_NAME
## 1 1192460871 1 9525 MND TOWER BLOCK MAXWELL RD
## 2 1192460819 1 9525 MND AUDITORIUM & FUNCTION HALL MAXWELL RD
## 3 1192460843 1 9525 AICARE LINK @ MAXWELL MAXWELL RD
## 4 1192460783 1 9525 HARMONY IN DIVERSITY GALLERY MAXWELL RD
## 5 1192460750 1 9525 FAMILY SUPPORT DIVISION MSF MAXWELL RD
## 6 1194224304 1 9525 LTA BEDOK CAMPUS CHAI CHEE ST
## 7 1104192309 1 9525 ALJUNIED HOUGANG TOWN COUNCIL HOUGANG AVE 2
## 8 1178905747 1 9525 SINGAPORE PRISON SERVICE UPP CHANGI RD N
## 9 1058427947 1 9525 OLD POLICE ACADEMY MOUNT PLEASANT RD
## 10 1161806958 1 9525 CENTRAL PROVIDENT FUND BOARD THOMSON RD
## geometry
## 1 POINT (103.8456 1.27869)
## 2 POINT (103.8455 1.27883)
## 3 POINT (103.8455 1.27883)
## 4 POINT (103.8455 1.27883)
## 5 POINT (103.8455 1.27883)
## 6 POINT (103.9184 1.32688)
## 7 POINT (103.8895 1.36625)
## 8 POINT (103.9698 1.35891)
## 9 POINT (103.8359 1.33186)
## 10 POINT (103.8435 1.31962)
head(embassy %>%
filter(FAC_TYPE == 9993), n=10)
## Simple feature collection with 10 features and 5 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.7836 ymin: 1.27958 xmax: 103.8605 ymax: 1.33637
## geographic CRS: WGS 84
## POI_ID SEQ_NUM FAC_TYPE POI_NAME ST_NAME
## 1 1141424380 1 9993 CONSULATE SAN MARINO CHURCH ST
## 2 1141424404 1 9993 EMBASSY LAOS GOLDHILL PLZ
## 3 1141424402 1 9993 CONSULATE BELIZE CECIL ST
## 4 1141424338 1 9993 GENERAL CONSULATE OMAN <NA>
## 5 1001332522 1 9993 EMBASSY NORWAY RAFFLES QUAY
## 6 1001332520 1 9993 EMBASSY PANAMA RAFFLES QUAY
## 7 1058438674 1 9993 GENERAL CONSULATE GREECE DUNEARN RD
## 8 1069433720 1 9993 EMBASSY NORTH KOREA BEACH RD
## 9 995186738 1 9993 EMBASSY TURKEY SHENTON WAY
## 10 1058455576 1 9993 HIGH COMMISSION MALDIVES THOMSON RD
## geometry
## 1 POINT (103.8494 1.28343)
## 2 POINT (103.8431 1.31836)
## 3 POINT (103.8493 1.28128)
## 4 POINT (103.8578 1.2999)
## 5 POINT (103.8512 1.28113)
## 6 POINT (103.8512 1.2812)
## 7 POINT (103.7836 1.33637)
## 8 POINT (103.8605 1.30008)
## 9 POINT (103.8503 1.27958)
## 10 POINT (103.8441 1.3169)
After viewing in detail, there are 2 types of FAC_TYPE, 9525 indicates the Government building, and 9993 indicates embassy. Therefore, I will need to convert the SpatialPointsDataFrame into sf dataframe before I can filter them separately.
summary(financial)
## LINK_ID POI_ID SEQ_NUM FAC_TYPE
## Min. :1.161e+08 Min. :3.644e+07 Min. :1.000 Min. :3578
## 1st Qu.:8.594e+08 1st Qu.:1.097e+09 1st Qu.:1.000 1st Qu.:3578
## Median :9.140e+08 Median :1.113e+09 Median :1.000 Median :3578
## Mean :9.092e+08 Mean :1.088e+09 Mean :1.008 Mean :4397
## 3rd Qu.:1.046e+09 3rd Qu.:1.132e+09 3rd Qu.:1.000 3rd Qu.:6000
## Max. :1.224e+09 Max. :1.204e+09 Max. :2.000 Max. :6000
##
## POI_NAME POI_LANGCD POI_NMTYPE POI_ST_NUM ST_NUM_FUL ST_NFUL_LC
## OCBC :788 ENG:3320 B:3293 1 : 212 29A : 1 ENG : 5
## UOB :577 J: 27 10 : 76 333A: 1 NA's:3315
## POSB :564 2 : 53 77B : 1
## DBS :282 11 : 50 7A : 1
## CITIBANK:153 304 : 50 8A : 1
## MAYBANK : 51 (Other):2004 NA's:3315
## (Other) :905 NA's : 875
## ST_NAME ST_LANGCD POI_ST_SD ACC_TYPE PH_NUMBER
## ORCHARD RD : 156 ENG :2926 L:1652 NA's:3320 63396666 : 52
## BEACH RD : 44 NA's: 394 N: 24 63272265 : 16
## NORTH BRIDGE RD : 39 R:1644 18002222121: 15
## COLLYER QUAY : 38 18004383333: 13
## NEW UPP CHANGI RD: 35 18001111111: 11
## (Other) :2614 (Other) : 539
## NA's : 394 NA's :2674
## CHAIN_ID NAT_IMPORT PRIVATE IN_VICIN NUM_PARENT
## Min. : 0 N:3320 N:3320 N:3320 Min. :0.0000
## 1st Qu.: 2526 1st Qu.:0.0000
## Median : 6918 Median :0.0000
## Mean : 5121 Mean :0.3807
## 3rd Qu.: 6920 3rd Qu.:1.0000
## Max. :24982 Max. :2.0000
##
## NUM_CHILD PERCFRREF VANCITY_ID
## Min. :0.0000000 Min. : 1.00 Min. :0
## 1st Qu.:0.0000000 1st Qu.:30.00 1st Qu.:0
## Median :0.0000000 Median :50.00 Median :0
## Mean :0.0003012 Mean :46.87 Mean :0
## 3rd Qu.:0.0000000 3rd Qu.:60.00 3rd Qu.:0
## Max. :1.0000000 Max. :99.00 Max. :0
## NA's :1339
## ACT_ADDR
## 1 KIM SENG PROMENADE SINGAPORE 237994: 7
## 3 TEMASEK BOULEVARD SINGAPORE 038983: 7
## 530 LORONG 6 TOA PAYOH SINGAPORE 310530: 7
## 2 JURONG EAST ST 21 SINGAPORE 609601: 6
## 3D RIVER VALLEY ROAD SINGAPORE 179023: 6
## (Other) : 243
## NA's :3044
## ACT_LANGCD ACT_ST_NAM ACT_ST_NUM ACT_ADMIN
## ENG : 276 DUNEARN ROAD : 7 1 : 20 INGAPORE : 1
## NA's:3044 KIM SENG PROMENADE: 7 3 : 11 SINGAPORE: 275
## LORONG 6 TOA PAYOH: 7 2 : 10 NA's :3044
## PAYA LEBAR ROAD : 7 50 : 8
## TEMASEK BOULEVARD : 7 530 : 7
## (Other) : 241 (Other): 220
## NA's :3044 NA's :3044
## ACT_POSTAL geometry
## 038983 : 7 POINT :3320
## 237994 : 7 epsg:4326 : 0
## 310530 : 7 +proj=long...: 0
## 609601 : 7
## 179023 : 6
## (Other): 242
## NA's :3044
Filter and convert the columns to the correct data types
financial_reduced <- subset(financial, select=-c(LINK_ID, POI_ID, SEQ_NUM, ST_NAME, POI_LANGCD, POI_NMTYPE, POI_ST_NUM, ST_NUM_FUL, ST_NFUL_LC, ST_LANGCD, POI_ST_SD, ACC_TYPE, PH_NUMBER, CHAIN_ID, NAT_IMPORT, PRIVATE, IN_VICIN, NUM_PARENT, NUM_CHILD, PERCFRREF, VANCITY_ID, ACT_ADDR, ACT_LANGCD, ACT_ST_NAM, ACT_ST_NUM, ACT_ADMIN, ACT_POSTAL))
financial_reduced$FAC_TYPE <- factor(financial_reduced$FAC_TYPE)
summary(financial_reduced)
## FAC_TYPE POI_NAME geometry
## 3578:2198 OCBC :788 POINT :3320
## 6000:1122 UOB :577 epsg:4326 : 0
## POSB :564 +proj=long...: 0
## DBS :282
## CITIBANK:153
## MAYBANK : 51
## (Other) :905
head(financial_reduced %>%
filter(FAC_TYPE == 3578), n=10)
## Simple feature collection with 10 features and 2 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.7189 ymin: 1.29367 xmax: 103.8635 ymax: 1.41695
## geographic CRS: WGS 84
## FAC_TYPE POI_NAME geometry
## 1 3578 UOB POINT (103.833 1.41695)
## 2 3578 POSB POINT (103.7989 1.30211)
## 3 3578 UOB POINT (103.7989 1.30211)
## 4 3578 OCBC POINT (103.7989 1.30211)
## 5 3578 OCBC POINT (103.7189 1.35016)
## 6 3578 UOB POINT (103.7723 1.29608)
## 7 3578 OCBC POINT (103.7719 1.29367)
## 8 3578 CITIBANK POINT (103.7723 1.29608)
## 9 3578 UOB POINT (103.8635 1.31838)
## 10 3578 UOB POINT (103.7801 1.29684)
head(financial_reduced %>%
filter(FAC_TYPE == 6000), n=10)
## Simple feature collection with 10 features and 2 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.7739 ymin: 1.27715 xmax: 103.9224 ymax: 1.43101
## geographic CRS: WGS 84
## FAC_TYPE POI_NAME geometry
## 1 6000 MAYBANK POINT (103.9224 1.31199)
## 2 6000 ADPOST MONEYCHANGER POINT (103.8242 1.30528)
## 3 6000 HBZ INT'L EXCHANGE CO (S) POINT (103.8512 1.29212)
## 4 6000 RAZINA EXCHANGE POINT (103.8454 1.27721)
## 5 6000 OCBC POINT (103.7739 1.43101)
## 6 6000 MONEY SUPPLY POINT (103.8468 1.27715)
## 7 6000 SINO MONEY CHANGER POINT (103.9047 1.30211)
## 8 6000 EVERPEACE MONEY CHANGER POINT (103.9047 1.30211)
## 9 6000 POSB POINT (103.871 1.35411)
## 10 6000 DBS POINT (103.8556 1.30219)
After viewing in detail, there are 2 types of FAC_TYPE, 3578 indicates the Banks, and 6000 indicates Money Services Business (MSB). Therefore, I will need to convert the SpatialPointsDataFrame into sf dataframe before I can filter them separately.
summary(residential)
## POI_ID SEQ_NUM FAC_TYPE
## Min. :3.644e+07 Min. :1.000 Min. :9590
## 1st Qu.:9.968e+08 1st Qu.:1.000 1st Qu.:9590
## Median :1.070e+09 Median :1.000 Median :9590
## Mean :1.052e+09 Mean :1.007 Mean :9590
## 3rd Qu.:1.105e+09 3rd Qu.:1.000 3rd Qu.:9590
## Max. :1.204e+09 Max. :2.000 Max. :9590
##
## POI_NAME ST_NAME geometry
## BLISSFUL VIEW : 3 PASIR PANJANG RD : 45 POINT :3604
## CLEMENTI PARK : 3 UPP EAST COAST RD: 29 epsg:4326 : 0
## COMPASSVALE VIEW : 3 LOR K TELOK KURAU: 26 +proj=long...: 0
## KING'S MANSION : 3 BUKIT TIMAH RD : 24
## MIDPOINT PROPERTIES : 3 EAST COAST RD : 23
## NEE SOON CENTRAL ESTATE: 3 (Other) :3412
## (Other) :3586 NA's : 45
Filter and convert the columns to the correct data types
residential_reduced <- subset(residential, select=-c(POI_ID, SEQ_NUM, ST_NAME))
residential_reduced$FAC_TYPE <- factor(residential_reduced$FAC_TYPE)
summary(residential_reduced)
## FAC_TYPE POI_NAME geometry
## 9590:3604 BLISSFUL VIEW : 3 POINT :3604
## CLEMENTI PARK : 3 epsg:4326 : 0
## COMPASSVALE VIEW : 3 +proj=long...: 0
## KING'S MANSION : 3
## MIDPOINT PROPERTIES : 3
## NEE SOON CENTRAL ESTATE: 3
## (Other) :3586
head(residential_reduced, n=10)
## Simple feature collection with 10 features and 2 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.7754 ymin: 1.28119 xmax: 103.9516 ymax: 1.36908
## geographic CRS: WGS 84
## FAC_TYPE POI_NAME geometry
## 1 9590 MARINA BAY SERVICED APARTMENTS POINT (103.8526 1.28119)
## 2 9590 SIN MING VILLE POINT (103.8355 1.35361)
## 3 9590 GREENTOPS @ SIMS PLACE POINT (103.8797 1.31643)
## 4 9590 MOUNTBATTEN DAKOTA CRESCENT POINT (103.8895 1.30834)
## 5 9590 SINGA COURT POINT (103.9084 1.33037)
## 6 9590 FORESQUE RESIDENCES POINT (103.7754 1.36908)
## 7 9590 TIONG BAHRU COURT POINT (103.8272 1.28439)
## 8 9590 BIRMINGHAM MANSIONS POINT (103.8444 1.31747)
## 9 9590 THOMSON EURO-ASIA POINT (103.8443 1.31787)
## 10 9590 STRATFORD COURT POINT (103.9517 1.32789)
After viewing in detail, there is only 1 type of FAC_TYPE, 9590 indicates residental area.
summary(shopping)
## POI_ID SEQ_NUM FAC_TYPE
## Min. :3.644e+07 Min. :1.000 Min. :6512
## 1st Qu.:9.656e+08 1st Qu.:1.000 1st Qu.:6512
## Median :1.070e+09 Median :1.000 Median :6512
## Mean :8.934e+08 Mean :1.108 Mean :6512
## 3rd Qu.:1.104e+09 3rd Qu.:1.000 3rd Qu.:6512
## Max. :1.204e+09 Max. :3.000 Max. :6512
##
## POI_NAME ST_NAME
## BUKIT BATOK WEST SHOPPING CENTRE: 2 ORCHARD RD : 37
## CHANGE ALLEY : 2 BUKIT TIMAH RD: 7
## FARMART CENTRE : 2 SCOTTS RD : 7
## FORTUNE CENTRE : 2 BEACH RD : 6
## HARBOUR FRONT CENTRE ENTRANCE : 2 BENCOOLEN ST : 5
## NEW WORLD CENTRE : 2 (Other) :347
## (Other) :499 NA's :102
## geometry
## POINT :511
## epsg:4326 : 0
## +proj=long...: 0
##
##
##
##
Filter and convert the columns to the correct data types
shopping_reduced <- subset(shopping, select=-c(POI_ID, SEQ_NUM, ST_NAME))
shopping_reduced$FAC_TYPE <- factor(shopping_reduced$FAC_TYPE)
summary(shopping_reduced)
## FAC_TYPE POI_NAME geometry
## 6512:511 BUKIT BATOK WEST SHOPPING CENTRE: 2 POINT :511
## CHANGE ALLEY : 2 epsg:4326 : 0
## FARMART CENTRE : 2 +proj=long...: 0
## FORTUNE CENTRE : 2
## HARBOUR FRONT CENTRE ENTRANCE : 2
## NEW WORLD CENTRE : 2
## (Other) :499
head(shopping, n=10)
## Simple feature collection with 10 features and 5 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.7127 ymin: 1.25619 xmax: 103.9041 ymax: 1.35375
## geographic CRS: WGS 84
## POI_ID SEQ_NUM FAC_TYPE POI_NAME
## 1 1132106213 1 6512 SIN MING CENTRE
## 2 801758392 1 6512 THE ADELPHI
## 3 842821452 1 6512 BOON LAY SHOPPING CENTRE
## 4 1193779191 1 6512 KATONG SQUARE
## 5 801758399 1 6512 SIM LIM SQUARE
## 6 1001450091 1 6512 PEOPLE'S PARK COMPLEX
## 7 1069767253 1 6512 UNITED SQUARE GOLDHILL PLAZA ENTRANCE
## 8 1069767253 2 6512 UNITED SQUARE GOLDHILL PLZ ENTRANCE
## 9 1039562724 1 6512 THE FORUM
## 10 1039562723 1 6512 WATERFRONT
## ST_NAME geometry
## 1 SIN MING RD POINT (103.836 1.35375)
## 2 COLEMAN ST POINT (103.8515 1.29124)
## 3 BOON LAY PL POINT (103.7127 1.34672)
## 4 EAST COAST RD POINT (103.9041 1.305)
## 5 ROCHOR CANAL RD POINT (103.8533 1.30341)
## 6 PARK RD POINT (103.843 1.28458)
## 7 <NA> POINT (103.8432 1.31744)
## 8 <NA> POINT (103.8432 1.31744)
## 9 <NA> POINT (103.8205 1.25619)
## 10 <NA> POINT (103.8205 1.25619)
After viewing in detail, there is only 1 type of FAC_TYPE, 6512 indicates shopping centres.
| OVERALL FAC_TYPE | |||||||
|---|---|---|---|---|---|---|---|
| Business | Industry | Government Building | Embassy | Banks | Money Services Business (MSB) | Residental Area | Shopping Centres |
| 5000 | 9991 | 9525 | 9993 | 3578 | 6000 | 9590 | 6512 |
Using CRS to retrieve the projection information of the geospatial data to ensure that they are projected in same projection system. ## Subzone spatial data
mpsz_3414 <- st_as_sf(mpsz.clean, 3414)
mpsz_3414 <- st_transform(mpsz_3414, 3414)
st_crs(mpsz_3414)
## Coordinate Reference System:
## User input: EPSG:3414
## wkt:
## PROJCRS["SVY21 / Singapore TM",
## BASEGEOGCRS["SVY21",
## DATUM["SVY21",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4757]],
## CONVERSION["Singapore Transverse Mercator",
## METHOD["Transverse Mercator",
## ID["EPSG",9807]],
## PARAMETER["Latitude of natural origin",1.36666666666667,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8801]],
## PARAMETER["Longitude of natural origin",103.833333333333,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8802]],
## PARAMETER["Scale factor at natural origin",1,
## SCALEUNIT["unity",1],
## ID["EPSG",8805]],
## PARAMETER["False easting",28001.642,
## LENGTHUNIT["metre",1],
## ID["EPSG",8806]],
## PARAMETER["False northing",38744.572,
## LENGTHUNIT["metre",1],
## ID["EPSG",8807]]],
## CS[Cartesian,2],
## AXIS["northing (N)",north,
## ORDER[1],
## LENGTHUNIT["metre",1]],
## AXIS["easting (E)",east,
## ORDER[2],
## LENGTHUNIT["metre",1]],
## USAGE[
## SCOPE["unknown"],
## AREA["Singapore"],
## BBOX[1.13,103.59,1.47,104.07]],
## ID["EPSG",3414]]
business_reduced <- st_transform(business_reduced, 3414)
st_crs(business_reduced)
## Coordinate Reference System:
## User input: EPSG:3414
## wkt:
## PROJCRS["SVY21 / Singapore TM",
## BASEGEOGCRS["SVY21",
## DATUM["SVY21",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4757]],
## CONVERSION["Singapore Transverse Mercator",
## METHOD["Transverse Mercator",
## ID["EPSG",9807]],
## PARAMETER["Latitude of natural origin",1.36666666666667,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8801]],
## PARAMETER["Longitude of natural origin",103.833333333333,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8802]],
## PARAMETER["Scale factor at natural origin",1,
## SCALEUNIT["unity",1],
## ID["EPSG",8805]],
## PARAMETER["False easting",28001.642,
## LENGTHUNIT["metre",1],
## ID["EPSG",8806]],
## PARAMETER["False northing",38744.572,
## LENGTHUNIT["metre",1],
## ID["EPSG",8807]]],
## CS[Cartesian,2],
## AXIS["northing (N)",north,
## ORDER[1],
## LENGTHUNIT["metre",1]],
## AXIS["easting (E)",east,
## ORDER[2],
## LENGTHUNIT["metre",1]],
## USAGE[
## SCOPE["unknown"],
## AREA["Singapore"],
## BBOX[1.13,103.59,1.47,104.07]],
## ID["EPSG",3414]]
embassy_reduced <- st_transform(embassy_reduced, 3414)
crs(embassy_reduced)
## [1] "+proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs"
financial_reduced <- st_transform(financial_reduced, 3414)
crs(financial_reduced)
## [1] "+proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs"
residential_reduced <- st_transform(residential_reduced, 3414)
crs(residential_reduced)
## [1] "+proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs"
shopping_reduced <- st_transform(shopping_reduced, 3414)
crs(shopping_reduced)
## [1] "+proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs"
Binding the subzone spatial data with business urban functions spatial data
mpsz_business <- bind_cols(business_reduced, mpsz_3414[as.numeric(st_within(business_reduced, mpsz_3414)),]) %>%
group_by(SUBZONE_N, FAC_TYPE, POI_NAME)
Sort and tally the number of business and industry in each subzone
urban_business <- mpsz_business %>%
group_by(SUBZONE_N, FAC_TYPE) %>%
tally() %>%
spread(FAC_TYPE, n) %>%
select(SUBZONE_N, `5000`, `9991`)
urban_business <- st_drop_geometry(urban_business)
urban_business <- replace(urban_business, is.na(urban_business), 0)
urban_business <- urban_business %>%
summarise(`5000`=sum(`5000`), `9991`=sum(`9991`))
Binding the subzone spatial data with Embassy urban functions spatial data
mpsz_embassy <- bind_cols(embassy_reduced, mpsz_3414[as.numeric(st_within(embassy_reduced, mpsz_3414)),]) %>%
select(SUBZONE_N, FAC_TYPE, POI_NAME, geometry)
Sort and tally the number of government buildings and embassy in each subzone
urban_embassy <- mpsz_embassy %>%
group_by(SUBZONE_N, FAC_TYPE) %>%
tally() %>%
spread(FAC_TYPE, n) %>%
select(SUBZONE_N, `9525`, `9993`)
urban_embassy <- st_drop_geometry(urban_embassy)
urban_embassy <- replace(urban_embassy, is.na(urban_embassy), 0)
urban_embassy <- urban_embassy %>%
summarise(`9525`=sum(`9525`), `9993`=sum(`9993`))
Binding the subzone spatial data with financial urban functions spatial data
mpsz_financial <- bind_cols(financial_reduced, mpsz_3414[as.numeric(st_within(financial_reduced, mpsz_3414)),]) %>%
select(SUBZONE_N, FAC_TYPE, POI_NAME, geometry)
Sort and tally the number of banks and MSB in each subzone
urban_financial <- mpsz_financial %>%
group_by(SUBZONE_N, FAC_TYPE) %>%
tally() %>%
spread(FAC_TYPE, n) %>%
select(SUBZONE_N, `3578`, `6000`)
urban_financial <- st_drop_geometry(urban_financial)
urban_financial <- replace(urban_financial, is.na(urban_financial), 0)
urban_financial <- urban_financial %>%
summarise(`3578`=sum(`3578`), `6000`=sum(`6000`))
Binding the subzone spatial data with residential urban functions spatial data
mpsz_residential <- bind_cols(residential_reduced, mpsz_3414[as.numeric(st_within(residential_reduced, mpsz_3414)),]) %>%
select(SUBZONE_N, FAC_TYPE, POI_NAME, geometry)
Sort and tally the number of residential in each subzone
urban_residential <- mpsz_residential %>%
group_by(SUBZONE_N, FAC_TYPE) %>%
tally() %>%
spread(FAC_TYPE, n) %>%
select(SUBZONE_N, `9590`)
urban_residential <- st_drop_geometry(urban_residential)
urban_residential <- replace(urban_residential, is.na(urban_residential), 0)
urban_residential <- urban_residential %>%
summarise(`9590`=sum(`9590`))
Binding the subzone spatial data with shopping urban functions spatial data
mpsz_shopping <- bind_cols(shopping_reduced, mpsz_3414[as.numeric(st_within(shopping_reduced, mpsz_3414)),]) %>%
select(SUBZONE_N, FAC_TYPE, POI_NAME, geometry)
Sort and tally the number of shopping in each subzone
urban_shopping <- mpsz_shopping %>%
group_by(SUBZONE_N, FAC_TYPE) %>%
tally() %>%
spread(FAC_TYPE, n) %>%
select(SUBZONE_N, `6512`)
urban_shopping <- st_drop_geometry(urban_shopping)
urban_shopping <- replace(urban_shopping, is.na(urban_shopping), 0)
urban_shopping <- urban_shopping %>%
summarise(`6512`=sum(`6512`))
mpsz_urban <- merge(urban_business, urban_embassy)
mpsz_urban <- merge(mpsz_urban, urban_financial)
mpsz_urban <- merge(mpsz_urban, urban_residential)
mpsz_urban <- merge(mpsz_urban, urban_shopping)
mpsz_urban <- mpsz_urban %>%
rename(`Business`=`5000`, `Industry`=`9991`, `Government`=`9525`, `Embassy`=`9993`, `Banks`=`3578`, `MSB`=`6000`, `Residential`=`9590`, `Shopping`=`6512`)
head(mpsz_urban, n=10)
## SUBZONE_N Business Industry Government Embassy Banks MSB
## 1 ALEXANDRA HILL 39 1 6 1 10 5
## 2 ALJUNIED 62 1 2 0 32 10
## 3 ANAK BUKIT 13 0 1 0 27 4
## 4 ANG MO KIO TOWN CENTRE 1 0 1 0 13 16
## 5 ANSON 15 0 1 2 4 2
## 6 BALESTIER 23 0 6 0 19 12
## 7 BEDOK NORTH 5 0 2 0 46 22
## 8 BENCOOLEN 13 0 1 0 6 9
## 9 BISHAN EAST 1 0 3 0 17 8
## 10 BOAT QUAY 40 0 2 0 0 2
## Residential Shopping
## 1 4 1
## 2 140 2
## 3 49 3
## 4 8 5
## 5 3 1
## 6 123 6
## 7 29 6
## 8 7 5
## 9 10 2
## 10 1 1
urban_population <- left_join(population_combine, mpsz_urban, by=c("SZ"="SUBZONE_N"))
urban_population <- left_join(mpsz_3414, urban_population, by=c("SUBZONE_N"="SZ")) %>%
select("SUBZONE_N", "Condo/Apt", "HDB1-2RM", "HDB3-4RM", "HDB5RM-EC", "Landed Properties", "YOUNG_PR", "ACTIVE_PR", "AGED_PR", "DENSITY", "Business", "Industry", "Government", "Embassy", "Banks", "MSB", "Residential", "Shopping")
urban_population <- replace(urban_population, is.na(urban_population), 0)
Select on the columns that are relevant and replace all the NA to 0 due to the no data after combining all the data.
urban_population_df <- as.data.frame(urban_population)
urban_vars.cor = cor(urban_population_df[,2:18])
corrplot.mixed(urban_vars.cor,
lower = "ellipse",
upper = "number",
tl.pos = "lt",
diag = "l",
tl.col = "black", tl.cex=0.9, number.cex=0.5)
The correlation plot above shows that YOUNG_PR and ACTIVE_PR are highly correlated. Therefore, YOUNG_PR should be used in the cluster analysis due to the lower correlated values with other variables.
from the shan_sf simple feature object into dataframe
cluster_vars <- urban_population_df %>%
select("SUBZONE_N", "Condo/Apt", "HDB1-2RM", "HDB3-4RM", "HDB5RM-EC", "Landed Properties", "YOUNG_PR", "AGED_PR", "DENSITY", "Business", "Industry", "Government", "Embassy", "Banks", "MSB", "Residential", "Shopping")
head(cluster_vars, n=10)
## SUBZONE_N Condo/Apt HDB1-2RM HDB3-4RM HDB5RM-EC Landed Properties
## 1 MARINA SOUTH 0 0 0 0 0
## 2 PEARL'S HILL 420 4520 1400 0 0
## 3 BOAT QUAY 50 0 0 0 0
## 4 HENDERSON HILL 190 3930 7990 1270 0
## 5 REDHILL 1930 1970 3520 3240 0
## 6 ALEXANDRA HILL 0 3910 6320 3120 0
## 7 BUKIT HO SWEE 540 3700 8610 1900 0
## 8 CLARKE QUAY 10 0 0 0 0
## 9 PASIR PANJANG 1 2970 0 0 0 1430
## 10 QUEENSWAY 290 0 0 0 0
## YOUNG_PR AGED_PR DENSITY Business Industry Government Embassy Banks
## 1 0.0000000 0.0000000 0.000000e+00 0 0 0 0 0
## 2 0.1671924 0.3154574 1.132514e-02 6 0 1 0 10
## 3 0.0000000 0.2000000 3.109308e-04 40 0 2 0 0
## 4 0.1958146 0.2436472 2.247120e-02 0 0 0 0 0
## 5 0.2664165 0.1463415 2.751469e-02 0 0 0 0 0
## 6 0.2157303 0.2337079 1.295640e-02 39 1 6 1 10
## 7 0.1932203 0.2433898 2.673399e-02 6 0 3 1 5
## 8 0.0000000 0.0000000 3.446082e-05 12 0 3 1 17
## 9 0.2522727 0.1227273 4.056076e-03 0 0 0 0 0
## 10 0.1034483 0.1724138 4.591192e-04 0 0 0 0 0
## MSB Residential Shopping
## 1 0 0 0
## 2 15 6 3
## 3 2 1 1
## 4 0 0 0
## 5 0 0 0
## 6 5 4 1
## 7 1 11 1
## 8 2 6 10
## 9 0 0 0
## 10 0 0 0
row.names(cluster_vars) <- cluster_vars$"SUBZONE_N"
head(cluster_vars, n=10)
## SUBZONE_N Condo/Apt HDB1-2RM HDB3-4RM HDB5RM-EC
## MARINA SOUTH MARINA SOUTH 0 0 0 0
## PEARL'S HILL PEARL'S HILL 420 4520 1400 0
## BOAT QUAY BOAT QUAY 50 0 0 0
## HENDERSON HILL HENDERSON HILL 190 3930 7990 1270
## REDHILL REDHILL 1930 1970 3520 3240
## ALEXANDRA HILL ALEXANDRA HILL 0 3910 6320 3120
## BUKIT HO SWEE BUKIT HO SWEE 540 3700 8610 1900
## CLARKE QUAY CLARKE QUAY 10 0 0 0
## PASIR PANJANG 1 PASIR PANJANG 1 2970 0 0 0
## QUEENSWAY QUEENSWAY 290 0 0 0
## Landed Properties YOUNG_PR AGED_PR DENSITY Business
## MARINA SOUTH 0 0.0000000 0.0000000 0.000000e+00 0
## PEARL'S HILL 0 0.1671924 0.3154574 1.132514e-02 6
## BOAT QUAY 0 0.0000000 0.2000000 3.109308e-04 40
## HENDERSON HILL 0 0.1958146 0.2436472 2.247120e-02 0
## REDHILL 0 0.2664165 0.1463415 2.751469e-02 0
## ALEXANDRA HILL 0 0.2157303 0.2337079 1.295640e-02 39
## BUKIT HO SWEE 0 0.1932203 0.2433898 2.673399e-02 6
## CLARKE QUAY 0 0.0000000 0.0000000 3.446082e-05 12
## PASIR PANJANG 1 1430 0.2522727 0.1227273 4.056076e-03 0
## QUEENSWAY 0 0.1034483 0.1724138 4.591192e-04 0
## Industry Government Embassy Banks MSB Residential Shopping
## MARINA SOUTH 0 0 0 0 0 0 0
## PEARL'S HILL 0 1 0 10 15 6 3
## BOAT QUAY 0 2 0 0 2 1 1
## HENDERSON HILL 0 0 0 0 0 0 0
## REDHILL 0 0 0 0 0 0 0
## ALEXANDRA HILL 1 6 1 10 5 4 1
## BUKIT HO SWEE 0 3 1 5 1 11 1
## CLARKE QUAY 0 3 1 17 2 6 10
## PASIR PANJANG 1 0 0 0 0 0 0 0
## QUEENSWAY 0 0 0 0 0 0 0
subzone_urban <- select(cluster_vars, c(2:17))
head(subzone_urban, n=10)
## Condo/Apt HDB1-2RM HDB3-4RM HDB5RM-EC Landed Properties
## MARINA SOUTH 0 0 0 0 0
## PEARL'S HILL 420 4520 1400 0 0
## BOAT QUAY 50 0 0 0 0
## HENDERSON HILL 190 3930 7990 1270 0
## REDHILL 1930 1970 3520 3240 0
## ALEXANDRA HILL 0 3910 6320 3120 0
## BUKIT HO SWEE 540 3700 8610 1900 0
## CLARKE QUAY 10 0 0 0 0
## PASIR PANJANG 1 2970 0 0 0 1430
## QUEENSWAY 290 0 0 0 0
## YOUNG_PR AGED_PR DENSITY Business Industry Government
## MARINA SOUTH 0.0000000 0.0000000 0.000000e+00 0 0 0
## PEARL'S HILL 0.1671924 0.3154574 1.132514e-02 6 0 1
## BOAT QUAY 0.0000000 0.2000000 3.109308e-04 40 0 2
## HENDERSON HILL 0.1958146 0.2436472 2.247120e-02 0 0 0
## REDHILL 0.2664165 0.1463415 2.751469e-02 0 0 0
## ALEXANDRA HILL 0.2157303 0.2337079 1.295640e-02 39 1 6
## BUKIT HO SWEE 0.1932203 0.2433898 2.673399e-02 6 0 3
## CLARKE QUAY 0.0000000 0.0000000 3.446082e-05 12 0 3
## PASIR PANJANG 1 0.2522727 0.1227273 4.056076e-03 0 0 0
## QUEENSWAY 0.1034483 0.1724138 4.591192e-04 0 0 0
## Embassy Banks MSB Residential Shopping
## MARINA SOUTH 0 0 0 0 0
## PEARL'S HILL 0 10 15 6 3
## BOAT QUAY 0 0 2 1 1
## HENDERSON HILL 0 0 0 0 0
## REDHILL 0 0 0 0 0
## ALEXANDRA HILL 1 10 5 4 1
## BUKIT HO SWEE 1 5 1 11 1
## CLARKE QUAY 1 17 2 6 10
## PASIR PANJANG 1 0 0 0 0 0
## QUEENSWAY 0 0 0 0 0
subzone_urban.std <- normalize(subzone_urban)
summary(subzone_urban.std)
## Condo/Apt HDB1-2RM HDB3-4RM HDB5RM-EC
## Min. :0.00000 Min. :0.0000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.01371 Median :0.0000 Median :0.00000 Median :0.00000
## Mean :0.10894 Mean :0.1153 Mean :0.07937 Mean :0.06875
## 3rd Qu.:0.16905 3rd Qu.:0.1287 3rd Qu.:0.12940 3rd Qu.:0.07631
## Max. :1.00000 Max. :1.0000 Max. :1.00000 Max. :1.00000
## Landed Properties YOUNG_PR AGED_PR DENSITY
## Min. :0.00000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.00000 Median :0.6215 Median :0.3409 Median :0.1272
## Mean :0.04112 Mean :0.4861 Mean :0.3256 Mean :0.2315
## 3rd Qu.:0.02125 3rd Qu.:0.7553 3rd Qu.:0.5439 3rd Qu.:0.4313
## Max. :1.00000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## Business Industry Government Embassy
## Min. :0.00000 Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000 Median :0.00000 Median :0.00000
## Mean :0.01796 Mean :0.01176 Mean :0.05015 Mean :0.02145
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.00000 Max. :1.00000 Max. :1.00000
## Banks MSB Residential Shopping
## Min. :0.00000 Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000 Median :0.00000 Median :0.00000
## Mean :0.05475 Mean :0.02409 Mean :0.03501 Mean :0.03166
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.00000 Max. :1.00000 Max. :1.00000
Min-max standardised clustering variables are now 0 to 1.
subzone_urban.z <- scale(subzone_urban)
describe(subzone_urban.z)
## vars n mean sd median trimmed mad min max range skew
## Condo/Apt 1 323 0 1 -0.56 -0.22 0.12 -0.64 5.25 5.89 1.99
## HDB1-2RM 2 323 0 1 -0.52 -0.25 0.00 -0.52 4.01 4.53 2.18
## HDB3-4RM 3 323 0 1 -0.60 -0.21 0.00 -0.60 6.96 7.56 2.70
## HDB5RM-EC 4 323 0 1 -0.49 -0.24 0.00 -0.49 6.65 7.14 3.33
## Landed Properties 5 323 0 1 -0.36 -0.25 0.00 -0.36 8.42 8.78 4.66
## YOUNG_PR 6 323 0 1 0.40 0.03 0.78 -1.42 1.50 2.93 -0.52
## AGED_PR 7 323 0 1 0.06 -0.05 1.37 -1.23 2.55 3.78 0.08
## DENSITY 8 323 0 1 -0.39 -0.15 0.70 -0.87 2.87 3.74 0.95
## Business 9 323 0 1 -0.25 -0.22 0.00 -0.25 13.80 14.06 9.09
## Industry 10 323 0 1 -0.15 -0.15 0.00 -0.15 12.31 12.46 9.37
## Government 11 323 0 1 -0.34 -0.27 0.00 -0.34 6.38 6.72 4.06
## Embassy 12 323 0 1 -0.21 -0.21 0.00 -0.21 9.36 9.56 6.32
## Banks 13 323 0 1 -0.39 -0.27 0.00 -0.39 6.69 7.08 3.44
## MSB 14 323 0 1 -0.27 -0.24 0.00 -0.27 10.77 11.04 6.59
## Residential 15 323 0 1 -0.28 -0.24 0.00 -0.28 7.65 7.93 5.33
## Shopping 16 323 0 1 -0.30 -0.24 0.00 -0.30 9.21 9.51 5.81
## kurtosis se
## Condo/Apt 4.33 0.06
## HDB1-2RM 4.26 0.06
## HDB3-4RM 10.59 0.06
## HDB5RM-EC 13.52 0.06
## Landed Properties 26.86 0.06
## YOUNG_PR -1.38 0.06
## AGED_PR -1.21 0.06
## DENSITY -0.33 0.06
## Business 112.31 0.06
## Industry 99.30 0.06
## Government 18.63 0.06
## Embassy 44.36 0.06
## Banks 14.37 0.06
## MSB 55.13 0.06
## Residential 30.95 0.06
## Shopping 42.94 0.06
r <- ggplot(data=subzone_urban, aes(x=`YOUNG_PR`))+
geom_histogram(bins=20, color="black", fill="light blue")+
ggtitle("Without Standardisation")
subzone_urban_s_df <- as.data.frame(subzone_urban.std)
s <- ggplot(data=subzone_urban_s_df, aes(x=`YOUNG_PR`))+
geom_histogram(bins=20, color="black", fill="light blue")+
ggtitle("Min-Max Standardisation")
subzone_urban_z_df <- as.data.frame(subzone_urban.z)
z <- ggplot(data=subzone_urban_z_df, aes(x=`YOUNG_PR`))+
geom_histogram(bins=20, color="black", fill="light blue")+
ggtitle("Z-score Standardisation")
ggarrange(r, s, z, ncol=3, nrow=1)
The overall distribution of the variables doesn’t change much. Therefore, I will not appy any data standarisation.
distance is calcuated based on the information space, not the actual distance
proxmat <- dist(subzone_urban, method="euclidean")
head(proxmat, n=10)
## [1] 4750.49546 64.10959 8996.33259 5522.11917 8060.18617 9577.25922
## [7] 26.13427 3296.33131 290.00007 0.00000
With the not normalised euclidean distance proximity matrix
hclust_ward <- hclust(proxmat, method="ward.D")
plot(hclust_ward, cex=0.6)
distance is calcuated based on the information space, not the actual distance
proxmat <- dist(subzone_urban, method="euclidean")
summary(proxmat)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 3164 9113 12346 17730 91612
m <- c( "average", "single", "complete", "ward")
names(m) <- c( "average", "single", "complete", "ward")
ac <- function(x) {
agnes(subzone_urban, method=x)$ac
}
map_dbl(m, ac)
## average single complete ward
## 0.9749503 0.9437245 0.9802686 0.9909485
The results show Ward’s method identifies the strongest clustering structure compared to the other 3 clustering algorithm.
plot(hclust_ward, cex=0.6)
rect.hclust(hclust_ward, k=6, border=2:5)
After experimenting the cluster results, I choose to do a 6-cluster analysis for the social areas.
subzone_urban_mat <- data.matrix(subzone_urban)
heatmaply(normalize(subzone_urban_mat), Colv=NA, dist_method="euclidean", hclust_method="ward.D", seriate="OLO",
colors=Blues, k_row=6, margins=c(NA,200,60,NA), fontsize_row=4, fontsize_col=5,
main="Social Areas of Singapore by Subzone", xlab="Variable Indicators",
ylab="Subzone in Singapore")