packages = c('rgdal', 'spdep', 'ClustGeo', 'tmap', 'sf', 'ggpubr', 'cluster', 'heatmaply', 'corrplot', 'psych', 'tidyverse','GISTools', 'tibble','ggplot2',"plotly",'factoextra')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}
Importing MP14_SUBZONE_WEB_PL layer and named is as mpsz
mpsz <- st_read(dsn = "data/geospatial", layer = "MP14_SUBZONE_WEB_PL")
## Reading layer `MP14_SUBZONE_WEB_PL' from data source `C:\IS415 Geospatial Analytics and Applications\take-home3\Take_home_Exe03\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 323 features and 15 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## epsg (SRID): NA
## proj4string: +proj=tmerc +lat_0=1.366666666666667 +lon_0=103.8333333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +datum=WGS84 +units=m +no_defs
View the summary of mpsz
summary(mpsz)
## OBJECTID SUBZONE_NO SUBZONE_N SUBZONE_C CA_IND
## Min. : 1.0 Min. : 1.000 ADMIRALTY : 1 AMSZ01 : 1 N:274
## 1st Qu.: 81.5 1st Qu.: 2.000 AIRPORT ROAD : 1 AMSZ02 : 1 Y: 49
## Median :162.0 Median : 4.000 ALEXANDRA HILL : 1 AMSZ03 : 1
## Mean :162.0 Mean : 4.625 ALEXANDRA NORTH: 1 AMSZ04 : 1
## 3rd Qu.:242.5 3rd Qu.: 6.500 ALJUNIED : 1 AMSZ05 : 1
## Max. :323.0 Max. :17.000 ANAK BUKIT : 1 AMSZ06 : 1
## (Other) :317 (Other):317
## PLN_AREA_N PLN_AREA_C REGION_N REGION_C
## BUKIT MERAH : 17 BM : 17 CENTRAL REGION :134 CR :134
## QUEENSTOWN : 15 QT : 15 EAST REGION : 30 ER : 30
## ANG MO KIO : 12 AM : 12 NORTH-EAST REGION: 48 NER: 48
## DOWNTOWN CORE: 12 DT : 12 NORTH REGION : 41 NR : 41
## TOA PAYOH : 12 TP : 12 WEST REGION : 70 WR : 70
## HOUGANG : 10 HG : 10
## (Other) :245 (Other):245
## INC_CRC FMEL_UPD_D X_ADDR Y_ADDR
## 00F5E30B5C9B7AD8: 1 Min. :2014-12-05 Min. : 5093 Min. :19579
## 013B509B8EDF15BE: 1 1st Qu.:2014-12-05 1st Qu.:21864 1st Qu.:31776
## 01A4287FB060A0A6: 1 Median :2014-12-05 Median :28465 Median :35113
## 029BD940F4455194: 1 Mean :2014-12-05 Mean :27257 Mean :36106
## 0524461C92F35D94: 1 3rd Qu.:2014-12-05 3rd Qu.:31674 3rd Qu.:39869
## 05FD555397CBEE7A: 1 Max. :2014-12-05 Max. :50425 Max. :49553
## (Other) :317
## SHAPE_Leng SHAPE_Area geometry
## Min. : 871.5 Min. : 39438 MULTIPOLYGON :323
## 1st Qu.: 3709.6 1st Qu.: 628261 epsg:NA : 0
## Median : 5211.9 Median : 1229894 +proj=tmer...: 0
## Mean : 6524.4 Mean : 2420882
## 3rd Qu.: 6942.6 3rd Qu.: 2106483
## Max. :68083.9 Max. :69748299
##
Based on the content of the requirement, we can remove those irrelevant column by using subset() function and in this case we will only keeping SUBZONE_N, SHAPE_Area and geometry for future analysis.
mpsz <- subset(mpsz, select = -c(1:2,4:14))
mpsz
## Simple feature collection with 323 features and 2 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## epsg (SRID): NA
## proj4string: +proj=tmerc +lat_0=1.366666666666667 +lon_0=103.8333333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +datum=WGS84 +units=m +no_defs
## First 10 features:
## SUBZONE_N SHAPE_Area geometry
## 1 MARINA SOUTH 1630379.3 MULTIPOLYGON (((31495.56 30...
## 2 PEARL'S HILL 559816.2 MULTIPOLYGON (((29092.28 30...
## 3 BOAT QUAY 160807.5 MULTIPOLYGON (((29932.33 29...
## 4 HENDERSON HILL 595428.9 MULTIPOLYGON (((27131.28 30...
## 5 REDHILL 387429.4 MULTIPOLYGON (((26451.03 30...
## 6 ALEXANDRA HILL 1030378.8 MULTIPOLYGON (((25899.7 297...
## 7 BUKIT HO SWEE 551732.0 MULTIPOLYGON (((27746.95 30...
## 8 CLARKE QUAY 290184.7 MULTIPOLYGON (((29351.26 29...
## 9 PASIR PANJANG 1 1084792.3 MULTIPOLYGON (((20996.49 30...
## 10 QUEENSWAY 631644.3 MULTIPOLYGON (((24472.11 29...
Perform a class() function check to see the class of the mpsz.
class(mpsz)
## [1] "sf" "data.frame"
Converting mpsz into SpatialPolygonsDataFrame
mpsz_sp <- as(mpsz,"Spatial")
Importing all urban functions layer and named it respectively.
embassy <- st_read(dsn = "data/geospatial", layer = "Govt_Embassy")
## Reading layer `Govt_Embassy' from data source `C:\IS415 Geospatial Analytics and Applications\take-home3\Take_home_Exe03\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 443 features and 5 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.6282 ymin: 1.24911 xmax: 103.9884 ymax: 1.45765
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
business <- st_read(dsn = "data/geospatial", layer = "Business")
## Reading layer `Business' from data source `C:\IS415 Geospatial Analytics and Applications\take-home3\Take_home_Exe03\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 6550 features and 5 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.6147 ymin: 1.24605 xmax: 104.0044 ymax: 1.4698
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
financial <- st_read(dsn = "data/geospatial", layer = "Financial")
## Reading layer `Financial' from data source `C:\IS415 Geospatial Analytics and Applications\take-home3\Take_home_Exe03\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 3320 features and 29 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.6256 ymin: 1.24392 xmax: 103.9998 ymax: 1.46247
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
residential <- st_read(dsn = "data/geospatial", layer = "Private residential")
## Reading layer `Private residential' from data source `C:\IS415 Geospatial Analytics and Applications\take-home3\Take_home_Exe03\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 3604 features and 5 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.6295 ymin: 1.23943 xmax: 103.9749 ymax: 1.45379
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
shopping <- st_read(dsn = "data/geospatial", layer = "Shopping")
## Reading layer `Shopping' from data source `C:\IS415 Geospatial Analytics and Applications\take-home3\Take_home_Exe03\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 511 features and 5 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.679 ymin: 1.24779 xmax: 103.9644 ymax: 1.4535
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
Using head() function to view the details of the embassy,business,financial, residential and shopping.
head(embassy)
## Simple feature collection with 6 features and 5 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.8431 ymin: 1.27869 xmax: 103.8578 ymax: 1.31836
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## POI_ID SEQ_NUM FAC_TYPE POI_NAME ST_NAME
## 1 1141424380 1 9993 CONSULATE SAN MARINO CHURCH ST
## 2 1141424404 1 9993 EMBASSY LAOS GOLDHILL PLZ
## 3 1141424402 1 9993 CONSULATE BELIZE CECIL ST
## 4 1141424338 1 9993 GENERAL CONSULATE OMAN <NA>
## 5 1192460871 1 9525 MND TOWER BLOCK MAXWELL RD
## 6 1192460819 1 9525 MND AUDITORIUM & FUNCTION HALL MAXWELL RD
## geometry
## 1 POINT (103.8494 1.28343)
## 2 POINT (103.8431 1.31836)
## 3 POINT (103.8493 1.28128)
## 4 POINT (103.8578 1.2999)
## 5 POINT (103.8456 1.27869)
## 6 POINT (103.8455 1.27883)
head(business)
## Simple feature collection with 6 features and 5 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.7777 ymin: 1.2972 xmax: 103.8858 ymax: 1.33844
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## POI_ID SEQ_NUM FAC_TYPE POI_NAME
## 1 1101180209 1 5000 JOHN CHEN
## 2 1101180210 1 5000 TROPICAL INDUSTRIAL BUILDING
## 3 1101180211 1 5000 LIAN CHEONG INDUSTRIAL BUILDING
## 4 1101180212 1 5000 MALAYSIA GARMENT MANUFACTURERS
## 5 1101180213 1 5000 UNIGOLD
## 6 1192316144 1 5000 NUS UNIVERSITY HALL
## ST_NAME geometry
## 1 LITTLE RD POINT (103.8856 1.33841)
## 2 LITTLE RD POINT (103.8852 1.33832)
## 3 LITTLE RD POINT (103.8852 1.33834)
## 4 <NA> POINT (103.8855 1.33821)
## 5 LITTLE RD POINT (103.8858 1.33844)
## 6 LOWER KENT RIDGE RD POINT (103.7777 1.2972)
head(financial)
## Simple feature collection with 6 features and 29 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.7189 ymin: 1.30211 xmax: 103.9224 ymax: 1.41695
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## LINK_ID POI_ID SEQ_NUM FAC_TYPE POI_NAME POI_LANGCD POI_NMTYPE
## 1 1170624361 1132324230 1 3578 UOB ENG B
## 2 1112103842 1132315471 1 3578 POSB ENG B
## 3 1112103842 1132315472 1 3578 UOB ENG B
## 4 1112103842 1132315473 1 3578 OCBC ENG B
## 5 864687596 1100784924 1 3578 OCBC ENG B
## 6 902073032 1132324170 1 6000 MAYBANK ENG B
## POI_ST_NUM ST_NUM_FUL ST_NFUL_LC ST_NAME ST_LANGCD POI_ST_SD
## 1 201 <NA> <NA> YISHUN AVE 2 ENG L
## 2 375 <NA> <NA> COMMONWEALTH AVE ENG R
## 3 375 <NA> <NA> COMMONWEALTH AVE ENG R
## 4 375 <NA> <NA> COMMONWEALTH AVE ENG R
## 5 <NA> <NA> <NA> JURONG WEST ST 51 ENG R
## 6 707 <NA> <NA> EAST COAST RD ENG L
## ACC_TYPE PH_NUMBER CHAIN_ID NAT_IMPORT PRIVATE IN_VICIN NUM_PARENT
## 1 <NA> <NA> 6919 N N N 0
## 2 <NA> <NA> 6918 N N N 0
## 3 <NA> <NA> 6919 N N N 0
## 4 <NA> <NA> 6920 N N N 0
## 5 <NA> <NA> 6920 N N N 0
## 6 <NA> 18006292266 3657 N N N 0
## NUM_CHILD PERCFRREF VANCITY_ID
## 1 0 NA 0
## 2 0 NA 0
## 3 0 NA 0
## 4 0 NA 0
## 5 0 60 0
## 6 0 NA 0
## ACT_ADDR ACT_LANGCD
## 1 <NA> <NA>
## 2 <NA> <NA>
## 3 <NA> <NA>
## 4 <NA> <NA>
## 5 501 JURONG WEST STREET 51 SINGAPORE 640501 ENG
## 6 <NA> <NA>
## ACT_ST_NAM ACT_ST_NUM ACT_ADMIN ACT_POSTAL
## 1 <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA>
## 5 JURONG WEST STREET 51 501 SINGAPORE 640501
## 6 <NA> <NA> <NA> <NA>
## geometry
## 1 POINT (103.833 1.41695)
## 2 POINT (103.7989 1.30211)
## 3 POINT (103.7989 1.30211)
## 4 POINT (103.7989 1.30211)
## 5 POINT (103.7189 1.35016)
## 6 POINT (103.9224 1.31199)
head(residential)
## Simple feature collection with 6 features and 5 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.7754 ymin: 1.28119 xmax: 103.9084 ymax: 1.36908
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## POI_ID SEQ_NUM FAC_TYPE POI_NAME ST_NAME
## 1 1132324282 1 9590 MARINA BAY SERVICED APARTMENTS MARINA BLVD
## 2 1132106212 1 9590 SIN MING VILLE JALAN TODAK
## 3 1202668778 1 9590 GREENTOPS @ SIMS PLACE <NA>
## 4 1099690099 1 9590 MOUNTBATTEN DAKOTA CRESCENT DAKOTA CRES
## 5 995195128 1 9590 SINGA COURT JALAN SINGA
## 6 1176000954 1 9590 FORESQUE RESIDENCES PETIR RD
## geometry
## 1 POINT (103.8526 1.28119)
## 2 POINT (103.8355 1.35361)
## 3 POINT (103.8797 1.31643)
## 4 POINT (103.8895 1.30834)
## 5 POINT (103.9084 1.33037)
## 6 POINT (103.7754 1.36908)
head(shopping)
## Simple feature collection with 6 features and 5 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.7127 ymin: 1.28458 xmax: 103.9041 ymax: 1.35375
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## POI_ID SEQ_NUM FAC_TYPE POI_NAME ST_NAME
## 1 1132106213 1 6512 SIN MING CENTRE SIN MING RD
## 2 801758392 1 6512 THE ADELPHI COLEMAN ST
## 3 842821452 1 6512 BOON LAY SHOPPING CENTRE BOON LAY PL
## 4 1193779191 1 6512 KATONG SQUARE EAST COAST RD
## 5 801758399 1 6512 SIM LIM SQUARE ROCHOR CANAL RD
## 6 1001450091 1 6512 PEOPLE'S PARK COMPLEX PARK RD
## geometry
## 1 POINT (103.836 1.35375)
## 2 POINT (103.8515 1.29124)
## 3 POINT (103.7127 1.34672)
## 4 POINT (103.9041 1.305)
## 5 POINT (103.8533 1.30341)
## 6 POINT (103.843 1.28458)
Using duplicated function to check any duplicate rows, since the POI_ID is an unique identifier, therefore we will perform a check on POI_ID instead of other variables.
In this data set, geometry is a requirement for our data analysis. Therefore, we will perform a NA check on geometry column. If there is any NA value in the geometry column, it will return TRUE, else return FALSE.
any(duplicated(embassy$POI_ID))
## [1] TRUE
any(is.na(embassy$geometry))
## [1] FALSE
any(duplicated(embassy$POI_ID))
## [1] TRUE
any(is.na(embassy$geometry))
## [1] FALSE
any(duplicated(business$POI_ID))
## [1] TRUE
any(is.na(business$geometry))
## [1] FALSE
any(duplicated(financial$POI_ID))
## [1] TRUE
any(is.na(financial$geometry))
## [1] FALSE
any(duplicated(residential$POI_ID))
## [1] TRUE
any(is.na(residential$geometry))
## [1] FALSE
any(duplicated(shopping$POI_ID))
## [1] TRUE
any(is.na(shopping$geometry))
## [1] FALSE
After performing the check on Business, there are 2 different FAC_TYPE which we can further split the data into an additional urban function called “Industry”. Based on the description of the business, we can identify that “FAC_TYPE” 9991 can be further classify as Industry and “FAC_TYPE” 5000 can be remain as Business.
industry <- business %>%
filter(FAC_TYPE == "9991")
business <- business %>%
filter(FAC_TYPE == "5000")
head(industry)
## Simple feature collection with 6 features and 5 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.6242 ymin: 1.29325 xmax: 103.9632 ymax: 1.33283
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## POI_ID SEQ_NUM FAC_TYPE
## 1 1110491789 1 9991
## 2 1099992474 1 9991
## 3 1099992477 1 9991
## 4 1099992477 2 9991
## 5 1100464367 1 9991
## 6 1100464367 2 9991
## POI_NAME
## 1 KAKI BUKIT INDUSTRIAL ESTATE
## 2 TERRACE FACTORIES TUAS SOUTH ST 5
## 3 TERRACE FACTORIES TUAS SOUTH ST 5 EAST ENTRANCE
## 4 JTC TERRACE FACTORIES TUAS S ST 5
## 5 CHANGI BUSINESS PARK CHANGI BUSINESS PARK CENTRAL 1 ENTRANCE
## 6 CHANGI BUSINESS PARK CENTRAL 1
## ST_NAME geometry
## 1 KAKI BUKIT AVE 1 POINT (103.9042 1.33269)
## 2 TUAS SOUTH ST 5 POINT (103.6242 1.29325)
## 3 TUAS SOUTH ST 5 POINT (103.6251 1.29333)
## 4 TUAS SOUTH ST 5 POINT (103.6251 1.29333)
## 5 CHANGI BUSINESS PARK CENTRAL 1 POINT (103.9633 1.33283)
## 6 CHANGI BUSINESS PARK CENTRAL 1 POINT (103.9633 1.33283)
any(duplicated(industry$POI_ID))
## [1] TRUE
any(is.na(industry$geometry))
## [1] FALSE
In the above code chunk, we identified that there are duplicated information, one way to filter them is by their SEQ_NUM. In this case, we will be keep those row with SEQ_NUM equal to 1.
embassy <- embassy %>%
filter(SEQ_NUM == 1)
business <- business %>%
filter(SEQ_NUM == 1)
financial <- financial %>%
filter(SEQ_NUM == 1)
residential <- residential %>%
filter(SEQ_NUM == 1)
shopping <- shopping %>%
filter(SEQ_NUM == 1)
industry <- industry %>%
filter(SEQ_NUM == 1)
As we can see the details of each urban fucntions from the code chunk of the view function, we identified that the CRS is not in SV21. Therefore, we will use st_transform function to transform the CRS to SV21 (+init=epsg:3414).
embassy <- st_transform(embassy, CRS("+init=epsg:3414"))
business <- st_transform(business, CRS("+init=epsg:3414"))
financial <- st_transform(financial, CRS("+init=epsg:3414"))
residential <- st_transform(residential, CRS("+init=epsg:3414"))
shopping <- st_transform(shopping, CRS("+init=epsg:3414"))
industry <- st_transform(industry, CRS("+init=epsg:3414"))
As we can see the class of each urban function is in sf format.
class(embassy)
## [1] "sf" "data.frame"
class(business)
## [1] "sf" "data.frame"
class(financial)
## [1] "sf" "data.frame"
class(residential)
## [1] "sf" "data.frame"
class(shopping)
## [1] "sf" "data.frame"
class(industry)
## [1] "sf" "data.frame"
Converting all the urban functions into Formal class SpatialPoints format.
embassy_sp <- as(embassy, "Spatial")
embassy_sp <- as(embassy_sp, "SpatialPoints")
business_sp <- as(business, "Spatial")
business_sp<- as(business_sp, "SpatialPoints")
financial_sp <- as(financial, "Spatial")
financial_sp<- as(financial_sp, "SpatialPoints")
residential_sp <- as(residential, "Spatial")
residential_sp<- as(residential_sp, "SpatialPoints")
shopping_sp <- as(shopping, "Spatial")
shopping_sp<- as(shopping_sp, "SpatialPoints")
industry_sp <- as(industry, "Spatial")
industry_sp<- as(industry_sp, "SpatialPoints")
Using tmap function to display the location of each point on the Subzone layer map in order to have an rough idea the location of each point. By plotting the map, we are also can have a rough idea which island or subzone do not have any data points. So in the further we can remove them.
Business_tm <- tm_shape(mpsz_sp) +
tm_polygons(col = "lightgrey") +
tm_shape(business_sp)+
tm_compass()+
tm_bubbles(size = 0.1,alpha = 0.5, col = "brown")+
tm_layout(bg.color = "skyblue",title = "Location of Business Outlet", title.size = 0.4)
Embassy_tm <- tm_shape(mpsz_sp) +
tm_polygons(col = "lightgrey") +
tm_shape(embassy_sp)+
tm_compass()+
tm_bubbles(size = 0.1,alpha = 0.5, col = "brown")+
tm_layout(bg.color = "skyblue",title = "Location of Embassy Outlet", title.size = 0.4)
Industry_tm <- tm_shape(mpsz_sp) +
tm_polygons(col = "lightgrey") +
tm_shape(industry_sp)+
tm_compass()+
tm_bubbles(size = 0.1,alpha = 0.5, col = "brown")+
tm_layout(bg.color = "skyblue",title = "Location of Industry Outlet", title.size = 0.4)
Residential_tm <- tm_shape(mpsz_sp) +
tm_polygons(col = "lightgrey") +
tm_shape(residential_sp)+
tm_compass()+
tm_bubbles(size = 0.1,alpha = 0.5, col = "brown")+
tm_layout(bg.color = "skyblue",title = "Location of Residential Outlet", title.size = 0.4)
Shopping_tm <- tm_shape(mpsz_sp) +
tm_polygons(col = "lightgrey") +
tm_shape(shopping_sp)+
tm_compass()+
tm_bubbles(size = 0.1,alpha = 0.5, col = "brown")+
tm_layout(bg.color = "skyblue",title = "Location of Shopping Outlet", title.size = 0.4)
Financial_tm <- tm_shape(mpsz_sp) +
tm_polygons(col = "lightgrey") +
tm_shape(financial_sp)+
tm_compass()+
tm_bubbles(size = 0.1,alpha = 0.5, col = "brown")+
tm_layout(bg.color = "skyblue",title = "Location of Financial Outlet", title.size = 0.4)
tmap_arrange(Business_tm,Embassy_tm,Industry_tm,Residential_tm,Shopping_tm,Financial_tm)
Using poly.counts function to count the number of points fall within each polygon.
Business <- poly.counts(business_sp,mpsz_sp)
Embassy <- poly.counts(embassy_sp,mpsz_sp)
Industry <- poly.counts(industry_sp,mpsz_sp)
Residential <- poly.counts(residential_sp,mpsz_sp)
Shopping <- poly.counts(shopping_sp,mpsz_sp)
Financial <-poly.counts(financial_sp,mpsz_sp)
After identifying, the number of points fall within each polygon, we will be using cbind function to combine the point data sets with the mpsz data set to make it into a single data set instead of 7 individual data sets.
mpsz_join <- cbind(mpsz,Business,Embassy,Industry,Residential,Shopping,Financial)
summary(mpsz_join)
## SUBZONE_N SHAPE_Area Business Embassy
## ADMIRALTY : 1 Min. : 39438 Min. : 0.00 Min. : 0.00
## AIRPORT ROAD : 1 1st Qu.: 628261 1st Qu.: 0.00 1st Qu.: 0.00
## ALEXANDRA HILL : 1 Median : 1229894 Median : 2.00 Median : 0.00
## ALEXANDRA NORTH: 1 Mean : 2420882 Mean : 19.57 Mean : 1.22
## ALJUNIED : 1 3rd Qu.: 2106483 3rd Qu.: 13.50 3rd Qu.: 1.00
## ANAK BUKIT : 1 Max. :69748299 Max. :303.00 Max. :17.00
## (Other) :317
## Industry Residential Shopping Financial
## Min. :0.0000 Min. : 0.00 Min. : 0.000 Min. : 0.0
## 1st Qu.:0.0000 1st Qu.: 0.00 1st Qu.: 0.000 1st Qu.: 1.0
## Median :0.0000 Median : 4.00 Median : 0.000 Median : 5.0
## Mean :0.2941 Mean : 11.08 Mean : 1.418 Mean : 10.2
## 3rd Qu.:0.0000 3rd Qu.: 11.00 3rd Qu.: 1.000 3rd Qu.: 13.0
## Max. :5.0000 Max. :215.00 Max. :27.000 Max. :132.0
##
## geometry
## MULTIPOLYGON :323
## epsg:NA : 0
## +proj=tmer...: 0
##
##
##
##
Perform a class check on mpsz_join after joining all the points data set to see is that any changes to the class.
class(mpsz_join)
## [1] "sf" "data.frame"
Importing the respopagesextod2011to2019 and named it as pop.
pop <- read_csv("data/aspatial/respopagesextod2011to2019.csv")
Using filter() function, group_by() function, spread() function and mutate() function to combine the data set according to the requirement.
pop2019 <- pop %>%
filter(Time == 2019) %>%
group_by(SZ,TOD)%>%
summarise(Pop = sum(Pop)) %>%
ungroup() %>%
spread(TOD,Pop)%>%
mutate(`HDB1-2RM` = `HDB 1- and 2-Room Flats`) %>%
mutate(`HDB3-4RM` = `HDB 3-Room Flats` + `HDB 4-Room Flats`) %>%
mutate(`HDB5RM-Ec` = `HDB 5-Room and Executive Flats`) %>%
mutate(`Cond_Apt` = `Condominiums and Other Apartments`) %>%
mutate(`Landed` = `Landed Properties`) %>%
mutate(total_dwellers = `HDB1-2RM`+`HDB3-4RM`+`HDB5RM-Ec`+`Cond_Apt`+`Landed`)
pop2019 <- subset(pop2019, select = -c(2:9))
pop2019_1 <- pop %>%
filter(Time == 2019) %>%
group_by(SZ,AG)%>%
summarise(Pop = sum(Pop)) %>%
ungroup() %>%
spread(AG,Pop)%>%
mutate(Economy = `25_to_29` +`30_to_34` +`35_to_39`+`40_to_44`+`45_to_49`+`50_to_54`+`55_to_59`+`60_to_64`) %>%
mutate(Young = `0_to_4`+`5_to_9`+`10_to_14`+`15_to_19`+`20_to_24`) %>%
mutate(Aged = `65_to_69`+`70_to_74`+`75_to_79`+`80_to_84`+`85_to_89`+`90_and_over`) %>%
mutate(total = Economy + Young+ Aged)
pop2019_1 <- subset(pop2019_1, select = -c(2:20))
pop2019 <- left_join(pop2019_1,pop2019)
pop2019 <- pop2019 %>%
mutate_at(.vars = vars(SZ), .funs = funs(toupper))
Using summary function to see the detail of pop2019
summary(pop2019)
## SZ Economy Young Aged
## Length:323 Min. : 0 Min. : 0 Min. : 0
## Class :character 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0
## Mode :character Median : 2840 Median : 1170 Median : 730
## Mean : 7386 Mean : 3296 Mean : 1806
## 3rd Qu.:10285 3rd Qu.: 4365 3rd Qu.: 3000
## Max. :79780 Max. :34260 Max. :18860
## total HDB1-2RM HDB3-4RM HDB5RM-Ec
## Min. : 0 Min. : 0 Min. : 0 Min. : 0
## 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0
## Median : 4890 Median : 0 Median : 0 Median : 0
## Mean : 12487 Mean : 542 Mean : 5953 Mean : 3297
## 3rd Qu.: 17065 3rd Qu.: 605 3rd Qu.: 9705 3rd Qu.: 3660
## Max. :132900 Max. :4700 Max. :75000 Max. :47960
## Cond_Apt Landed total_dwellers
## Min. : 0 Min. : 0.0 Min. : 0
## 1st Qu.: 0 1st Qu.: 0.0 1st Qu.: 0
## Median : 230 Median : 0.0 Median : 4880
## Mean : 1827 Mean : 773.9 Mean : 12393
## 3rd Qu.: 2835 3rd Qu.: 400.0 3rd Qu.: 17035
## Max. :16770 Max. :18820.0 Max. :132480
Performing a class check on pop2019 data set
class(pop2019)
## [1] "tbl_df" "tbl" "data.frame"
using left_join function to combine 2 data set into a single data set and perform a NA check to make sure there is no missing fill.
final <- left_join(mpsz_join,pop2019, by = c( "SUBZONE_N" = "SZ"))
any(is.na(final))
## [1] FALSE
Perform a class check on final
class(final)
## [1] "sf" "data.frame"
The unit of measurement of the values are number of total Population within each polygon for Economy, Young and Aged. Using these values directly will be bais by the underlying total number of population within each polygon.
The unit of measurement of the values are number of total dwellers within each polygon for HDB1-2RM_PR, HDB3-4RM_PR, HDB5RM-Ec_PR, Cond_Apt_PR and Landed_PR. Using these values directly will be bais by the underlying total number of dwellers within each polygon.
In order to overcome this problem, we will derive the penetration rate of each ICT variable by using the code chunk below.
final_derived <- final %>%
mutate(Pop_density = `total`/`SHAPE_Area`) %>%
mutate(Economy_PR = `Economy`/`total`) %>%
mutate(Young_PR = `Young`/`total`) %>%
mutate(Aged_PR = `Aged`/`total`) %>%
mutate(`HDB1-2RM_PR` = `HDB1-2RM`/`total_dwellers`) %>%
mutate(`HDB3-4RM_PR` = `HDB3-4RM`/`total_dwellers`) %>%
mutate(`HDB5RM-Ec_PR` = `HDB5RM-Ec`/`total_dwellers`) %>%
mutate(`Cond_Apt_PR` = `Cond_Apt`/`total_dwellers`) %>%
mutate(`Landed_PR` = `Landed`/`total_dwellers`)
To view the summary of final_derived
summary(final_derived)
## SUBZONE_N SHAPE_Area Business Embassy
## Length:323 Min. : 39438 Min. : 0.00 Min. : 0.00
## Class :character 1st Qu.: 628261 1st Qu.: 0.00 1st Qu.: 0.00
## Mode :character Median : 1229894 Median : 2.00 Median : 0.00
## Mean : 2420882 Mean : 19.57 Mean : 1.22
## 3rd Qu.: 2106483 3rd Qu.: 13.50 3rd Qu.: 1.00
## Max. :69748299 Max. :303.00 Max. :17.00
##
## Industry Residential Shopping Financial
## Min. :0.0000 Min. : 0.00 Min. : 0.000 Min. : 0.0
## 1st Qu.:0.0000 1st Qu.: 0.00 1st Qu.: 0.000 1st Qu.: 1.0
## Median :0.0000 Median : 4.00 Median : 0.000 Median : 5.0
## Mean :0.2941 Mean : 11.08 Mean : 1.418 Mean : 10.2
## 3rd Qu.:0.0000 3rd Qu.: 11.00 3rd Qu.: 1.000 3rd Qu.: 13.0
## Max. :5.0000 Max. :215.00 Max. :27.000 Max. :132.0
##
## Economy Young Aged total
## Min. : 0 Min. : 0 Min. : 0 Min. : 0
## 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0
## Median : 2840 Median : 1170 Median : 730 Median : 4890
## Mean : 7386 Mean : 3296 Mean : 1806 Mean : 12487
## 3rd Qu.:10285 3rd Qu.: 4365 3rd Qu.: 3000 3rd Qu.: 17065
## Max. :79780 Max. :34260 Max. :18860 Max. :132900
##
## HDB1-2RM HDB3-4RM HDB5RM-Ec Cond_Apt
## Min. : 0 Min. : 0 Min. : 0 Min. : 0
## 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0
## Median : 0 Median : 0 Median : 0 Median : 230
## Mean : 542 Mean : 5953 Mean : 3297 Mean : 1827
## 3rd Qu.: 605 3rd Qu.: 9705 3rd Qu.: 3660 3rd Qu.: 2835
## Max. :4700 Max. :75000 Max. :47960 Max. :16770
##
## Landed total_dwellers geometry Pop_density
## Min. : 0.0 Min. : 0 MULTIPOLYGON :323 Min. :0.00000
## 1st Qu.: 0.0 1st Qu.: 0 epsg:NA : 0 1st Qu.:0.00000
## Median : 0.0 Median : 4880 +proj=tmer...: 0 Median :0.00591
## Mean : 773.9 Mean : 12393 Mean :0.01074
## 3rd Qu.: 400.0 3rd Qu.: 17035 3rd Qu.:0.01993
## Max. :18820.0 Max. :132480 Max. :0.04606
##
## Economy_PR Young_PR Aged_PR HDB1-2RM_PR
## Min. :0.05263 Min. :0.0000 Min. :0.0000 Min. :0.00000
## 1st Qu.:0.57064 1st Qu.:0.2146 1st Qu.:0.1075 1st Qu.:0.00000
## Median :0.59023 Median :0.2531 Median :0.1523 Median :0.00000
## Mean :0.59611 Mean :0.2475 Mean :0.1564 Mean :0.04037
## 3rd Qu.:0.60549 3rd Qu.:0.2862 3rd Qu.:0.1942 3rd Qu.:0.04794
## Max. :0.90000 Max. :0.5588 Max. :0.9474 Max. :0.71293
## NA's :89 NA's :89 NA's :89 NA's :95
## HDB3-4RM_PR HDB5RM-Ec_PR Cond_Apt_PR Landed_PR
## Min. :0.0000 Min. :0.0000 Min. :0.00000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.02675 1st Qu.:0.0000
## Median :0.4027 Median :0.1441 Median :0.14571 Median :0.0000
## Mean :0.3550 Mean :0.1641 Mean :0.30746 Mean :0.1330
## 3rd Qu.:0.6067 3rd Qu.:0.2594 3rd Qu.:0.49172 3rd Qu.:0.1452
## Max. :0.9482 Max. :0.8364 Max. :1.00000 Max. :1.0000
## NA's :95 NA's :95 NA's :95 NA's :95
As we can see there are some colums with NA, however this NA is slighly different as when we open up the data set it shows NAn. The Nan is due the calculation. Therefore, we need to use is.nan() function to change to 0.
final_derived$Economy_PR[is.nan(final_derived$Economy_PR)] <- 0
final_derived$Young_PR[is.nan(final_derived$Young_PR)] <- 0
final_derived$Aged_PR[is.nan(final_derived$Aged_PR)] <- 0
final_derived$`HDB1-2RM_PR`[is.nan(final_derived$`HDB1-2RM_PR`)] <- 0
final_derived$`HDB3-4RM_PR`[is.nan(final_derived$`HDB3-4RM_PR`)] <- 0
final_derived$`HDB5RM-Ec_PR`[is.nan(final_derived$`HDB5RM-Ec_PR`)] <- 0
final_derived$Cond_Apt_PR[is.nan(final_derived$Cond_Apt_PR)] <- 0
final_derived$Landed_PR[is.nan(final_derived$Landed_PR)] <- 0
Using ggplot function to plot the graph to see the distribution.
Economy_PR <- ggplot(data=final_derived,
aes(x= `Economy_PR`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
Young_PR <- ggplot(data=final_derived,
aes(x= `Young_PR`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
Aged_PR <- ggplot(data=final_derived,
aes(x= `Aged_PR`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
Pop_density <- ggplot(data=final_derived,
aes(x= `Pop_density`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
`HDB1-2RM_PR` <- ggplot(data=final_derived,
aes(x= `HDB1-2RM_PR`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
`HDB3-4RM_PR` <- ggplot(data=final_derived,
aes(x= `HDB3-4RM_PR`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
`HDB5RM-Ec_PR` <- ggplot(data=final_derived,
aes(x= `HDB5RM-Ec_PR`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
Cond_Apt_PR <- ggplot(data=final_derived,
aes(x= `Cond_Apt_PR`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
Landed_PR <- ggplot(data=final_derived,
aes(x= `Landed_PR`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
`Business` <- ggplot(data=final_derived,
aes(x= `Business`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
`Embassy` <- ggplot(data=final_derived,
aes(x= `Embassy`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
`Financial` <- ggplot(data=final_derived,
aes(x= `Financial`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
`Industry` <- ggplot(data=final_derived,
aes(x= `Industry`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
`Residential` <- ggplot(data=final_derived,
aes(x= `Residential`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
`Shopping` <- ggplot(data=final_derived,
aes(x= `Shopping`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
ggarrange(Economy_PR, Young_PR, Aged_PR, Pop_density, `HDB1-2RM_PR`, `HDB3-4RM_PR`, `HDB5RM-Ec_PR`,Cond_Apt_PR,Landed_PR, Embassy,Business, Financial,Industry,Residential,Shopping,
ncol = 3,
nrow = 5)
When we place the order differently, we can see the class turns up differently. The purpose of switch the position of the variable is to make the data set into data.frame format for cluster_vars.cor input as it only take in data set in data.frame format.
final1 <- left_join(pop2019,mpsz_join,by = c( "SZ" = "SUBZONE_N"))
class(final1)
## [1] "tbl_df" "tbl" "data.frame"
Will repeat as above steps
final_derived1 <- final1 %>%
mutate(Pop_density = `total`/`SHAPE_Area`) %>%
mutate(Economy_PR = `Economy`/`total`) %>%
mutate(Young_PR = `Young`/`total`) %>%
mutate(Aged_PR = `Aged`/`total`) %>%
mutate(`HDB1-2RM_PR` = `HDB1-2RM`/`total_dwellers`) %>%
mutate(`HDB3-4RM_PR` = `HDB3-4RM`/`total_dwellers`) %>%
mutate(`HDB5RM-Ec_PR` = `HDB5RM-Ec`/`total_dwellers`) %>%
mutate(`Cond_Apt_PR` = `Cond_Apt`/`total_dwellers`) %>%
mutate(`Landed_PR` = `Landed`/`total_dwellers`)
final_derived1$Economy_PR[is.nan(final_derived1$Economy_PR)] <- 0
final_derived1$Young_PR[is.nan(final_derived1$Young_PR)] <- 0
final_derived1$Aged_PR[is.nan(final_derived1$Aged_PR)] <- 0
final_derived1$`HDB1-2RM_PR`[is.nan(final_derived1$`HDB1-2RM_PR`)] <- 0
final_derived1$`HDB3-4RM_PR`[is.nan(final_derived1$`HDB3-4RM_PR`)] <- 0
final_derived1$`HDB5RM-Ec_PR`[is.nan(final_derived1$`HDB5RM-Ec_PR`)] <- 0
final_derived1$Cond_Apt_PR[is.nan(final_derived1$Cond_Apt_PR)] <- 0
final_derived1$Landed_PR[is.nan(final_derived1$Landed_PR)] <- 0
Using colnames() function to view the name of each columns.
colnames(final_derived1)
## [1] "SZ" "Economy" "Young" "Aged"
## [5] "total" "HDB1-2RM" "HDB3-4RM" "HDB5RM-Ec"
## [9] "Cond_Apt" "Landed" "total_dwellers" "SHAPE_Area"
## [13] "Business" "Embassy" "Industry" "Residential"
## [17] "Shopping" "Financial" "geometry" "Pop_density"
## [21] "Economy_PR" "Young_PR" "Aged_PR" "HDB1-2RM_PR"
## [25] "HDB3-4RM_PR" "HDB5RM-Ec_PR" "Cond_Apt_PR" "Landed_PR"
To perform a switch column function to help in our further inputs. As we are putting all the relavent columns near to each other.
final_derived1 <- final_derived1[, c(1,2,3, 4,5,7,8,9,10,11,12,13,20,21,22,23,6,24,25,26,27,28,13,14,15,16,17,18)]
colnames(final_derived1)
## [1] "SZ" "Economy" "Young" "Aged"
## [5] "total" "HDB3-4RM" "HDB5RM-Ec" "Cond_Apt"
## [9] "Landed" "total_dwellers" "SHAPE_Area" "Business"
## [13] "Pop_density" "Economy_PR" "Young_PR" "Aged_PR"
## [17] "HDB1-2RM" "HDB1-2RM_PR" "HDB3-4RM_PR" "HDB5RM-Ec_PR"
## [21] "Cond_Apt_PR" "Landed_PR" "Business" "Embassy"
## [25] "Industry" "Residential" "Shopping" "Financial"
To see the different of the class.
class(final_derived)
## [1] "sf" "data.frame"
class(final_derived1)
## [1] "tbl_df" "tbl" "data.frame"
Before we perform cluster analysis, it is important for us to endure that the input variables use are not highly correlated.To visualise and analyse the correlation of the input variables.If we input final_derived as our input value is will show “Error in cor(final_derived[, 14:28]) : ‘x’ must be numeric”
cluster_vars.cor = cor(final_derived1[,14:28])
corrplot.mixed(cluster_vars.cor,
lower = "ellipse",
upper = "number",
tl.pos = "lt",
diag = "l",
tl.col = "black",
number.cex = .5)
#final_derived <- st_as_sf(final_derived)
Since we identify Young_PR and Economy_PR are highly correlated. Therefore, we have to remove one of them. in this case we will be removing Young_PR from our input. However, in order to perform the selection we have to use st_set_geometry to drop the geomoetry columns.
cluster_vars <- final_derived %>%
st_set_geometry(NULL)
cluster_vars <- dplyr :: select(cluster_vars, "SUBZONE_N","Economy_PR", "Aged_PR", "Pop_density", "HDB1-2RM_PR", "HDB3-4RM_PR","HDB5RM-Ec_PR","Cond_Apt_PR", "Landed_PR", "Business", "Embassy", "Financial", "Industry","Residential" ,"Shopping")
To view the summary of cluster_vars.
summary(cluster_vars)
## SUBZONE_N Economy_PR Aged_PR Pop_density
## Length:323 Min. :0.0000 Min. :0.0000 Min. :0.00000
## Class :character 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000
## Mode :character Median :0.5753 Median :0.1240 Median :0.00591
## Mean :0.4319 Mean :0.1133 Mean :0.01074
## 3rd Qu.:0.5991 3rd Qu.:0.1822 3rd Qu.:0.01993
## Max. :0.9000 Max. :0.9474 Max. :0.04606
## HDB1-2RM_PR HDB3-4RM_PR HDB5RM-Ec_PR Cond_Apt_PR
## Min. :0.00000 Min. :0.0000 Min. :0.0000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000
## Median :0.00000 Median :0.0000 Median :0.0000 Median :0.04593
## Mean :0.02850 Mean :0.2506 Mean :0.1159 Mean :0.21703
## 3rd Qu.:0.02899 3rd Qu.:0.5043 3rd Qu.:0.2072 3rd Qu.:0.30034
## Max. :0.71293 Max. :0.9481 Max. :0.8364 Max. :1.00000
## Landed_PR Business Embassy Financial
## Min. :0.00000 Min. : 0.00 Min. : 0.00 Min. : 0.0
## 1st Qu.:0.00000 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 1.0
## Median :0.00000 Median : 2.00 Median : 0.00 Median : 5.0
## Mean :0.09390 Mean : 19.57 Mean : 1.22 Mean : 10.2
## 3rd Qu.:0.03887 3rd Qu.: 13.50 3rd Qu.: 1.00 3rd Qu.: 13.0
## Max. :1.00000 Max. :303.00 Max. :17.00 Max. :132.0
## Industry Residential Shopping
## Min. :0.0000 Min. : 0.00 Min. : 0.000
## 1st Qu.:0.0000 1st Qu.: 0.00 1st Qu.: 0.000
## Median :0.0000 Median : 4.00 Median : 0.000
## Mean :0.2941 Mean : 11.08 Mean : 1.418
## 3rd Qu.:0.0000 3rd Qu.: 11.00 3rd Qu.: 1.000
## Max. :5.0000 Max. :215.00 Max. :27.000
Now, we are creating a new columns is called overview, the purpose of having this columns to sum up all the relevant columns together and to identify which are the subzone polygon with 0 value.
cluster_vars <- cluster_vars %>%
group_by(SUBZONE_N) %>%
mutate(overview = `Economy_PR` + `Aged_PR`+`Pop_density`+ `HDB1-2RM_PR`+ `HDB3-4RM_PR`+`HDB5RM-Ec_PR`+`Cond_Apt_PR`+ `Landed_PR`+ `Business`+ `Embassy`+ `Financial`+ `Industry`+`Residential`+`Shopping`)
After creating a new columns, the overview is added to the last columns.
colnames(cluster_vars)
## [1] "SUBZONE_N" "Economy_PR" "Aged_PR" "Pop_density" "HDB1-2RM_PR"
## [6] "HDB3-4RM_PR" "HDB5RM-Ec_PR" "Cond_Apt_PR" "Landed_PR" "Business"
## [11] "Embassy" "Financial" "Industry" "Residential" "Shopping"
## [16] "overview"
However, to help us save time we can re-arrange it next to SUBZONE_N.
cluster_vars <- cluster_vars[, c(16,1, 2, 3, 4,5,6,7,8,9,10,11,12,13,14,15,16)]
colnames(cluster_vars)
## [1] "overview" "SUBZONE_N" "Economy_PR" "Aged_PR" "Pop_density"
## [6] "HDB1-2RM_PR" "HDB3-4RM_PR" "HDB5RM-Ec_PR" "Cond_Apt_PR" "Landed_PR"
## [11] "Business" "Embassy" "Financial" "Industry" "Residential"
## [16] "Shopping" "overview"
Since there are SUBZONEs with 0 value, in order to prevent any inaccurate analysis due to the polygons with 0 value. ASs there is no information for those polygons, therefore, we will be removing them from our analysis.
cluster_vars <- cluster_vars[-c(13,17,18,19,44,179,191,203,234,252,256,270,274,286,295,298,301,302,303,304,312,318,319),]
Not only removing from the cluster_vars data set, we also need to remove from mpsz. As we will be combining both data set together in the later stage. When the number of rows are different, we are unable to perform join function.
mpsz <- mpsz[-c(13,17,18,19,44,179,191,203,234,252,256,270,274,286,295,298,301,302,303,304,312,318,319),]
We will name the new data set as mpsz_sp1 which is in SpatialPolygon format.
mpsz_sp1 <- as(mpsz,"Spatial")
Removing the overview columns, since we will not be using it anymore.
cluster_vars <- subset(cluster_vars, select = -c(overview))
In the code chunk below, normalize() of heatmaply package is used to stadardisation the clustering variables by using Min-Max method.
cluster_vars_SZ <- normalize(cluster_vars)
Next, we need to change the rows by SUBZONE name instead of row number by using the code chunk below
row.names(cluster_vars) <- cluster_vars$"SUBZONE_N"
Notice that the row number has been replaced into the SUBZONE name.
cluster_vars <- as.data.frame(cluster_vars)
Now, we will delete the SUBZONE_N field by using the code chunk below.
final_output <- dplyr::select(cluster_vars, c(2:15))
summary(final_output,10)
## Economy_PR Aged_PR Pop_density HDB1-2RM_PR
## Min. :0.0000 Min. :0.00000 Min. :0.0000000 Min. :0.00000
## 1st Qu.:0.5363 1st Qu.:0.03755 1st Qu.:0.0002178 1st Qu.:0.00000
## Median :0.5804 Median :0.13316 Median :0.0070916 Median :0.00000
## Mean :0.4650 Mean :0.12199 Mean :0.0115599 Mean :0.03068
## 3rd Qu.:0.6011 3rd Qu.:0.18705 3rd Qu.:0.0216070 3rd Qu.:0.03263
## Max. :0.9000 Max. :0.94737 Max. :0.0460579 Max. :0.71293
## HDB3-4RM_PR HDB5RM-Ec_PR Cond_Apt_PR Landed_PR
## Min. :0.00000 Min. :0.0000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.03132 Median :0.0000 Median :0.06175 Median :0.00000
## Mean :0.26982 Mean :0.1247 Mean :0.23367 Mean :0.10110
## 3rd Qu.:0.53305 3rd Qu.:0.2242 3rd Qu.:0.31995 3rd Qu.:0.05445
## Max. :0.94815 Max. :0.8364 Max. :1.00000 Max. :1.00000
## Business Embassy Financial Industry
## Min. : 0.00 Min. : 0.000 Min. : 0.00 Min. :0.0000
## 1st Qu.: 0.00 1st Qu.: 0.000 1st Qu.: 2.00 1st Qu.:0.0000
## Median : 3.00 Median : 0.000 Median : 5.50 Median :0.0000
## Mean : 21.07 Mean : 1.313 Mean : 10.98 Mean :0.3167
## 3rd Qu.: 15.00 3rd Qu.: 1.000 3rd Qu.: 14.00 3rd Qu.:0.0000
## Max. :303.00 Max. :17.000 Max. :132.00 Max. :5.0000
## Residential Shopping
## Min. : 0.00 Min. : 0.000
## 1st Qu.: 1.00 1st Qu.: 0.000
## Median : 5.00 Median : 0.000
## Mean : 11.93 Mean : 1.527
## 3rd Qu.: 11.25 3rd Qu.: 2.000
## Max. :215.00 Max. :27.000
In the code chunk below, normalize() of heatmaply package is used to stadardisation the clustering variables by using Min-Max method.
The summary() is then used to display the summary statistics of the standardised clustering variables.
final_output.std <- normalize(final_output)
summary(final_output.std)
## Economy_PR Aged_PR Pop_density HDB1-2RM_PR
## Min. :0.0000 Min. :0.00000 Min. :0.000000 Min. :0.00000
## 1st Qu.:0.5959 1st Qu.:0.03964 1st Qu.:0.004729 1st Qu.:0.00000
## Median :0.6449 Median :0.14056 Median :0.153971 Median :0.00000
## Mean :0.5166 Mean :0.12877 Mean :0.250986 Mean :0.04304
## 3rd Qu.:0.6679 3rd Qu.:0.19744 3rd Qu.:0.469127 3rd Qu.:0.04577
## Max. :1.0000 Max. :1.00000 Max. :1.000000 Max. :1.00000
## HDB3-4RM_PR HDB5RM-Ec_PR Cond_Apt_PR Landed_PR
## Min. :0.00000 Min. :0.0000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.03303 Median :0.0000 Median :0.06175 Median :0.00000
## Mean :0.28458 Mean :0.1491 Mean :0.23367 Mean :0.10110
## 3rd Qu.:0.56220 3rd Qu.:0.2681 3rd Qu.:0.31995 3rd Qu.:0.05445
## Max. :1.00000 Max. :1.0000 Max. :1.00000 Max. :1.00000
## Business Embassy Financial Industry
## Min. :0.000000 Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.000000 1st Qu.:0.00000 1st Qu.:0.01515 1st Qu.:0.00000
## Median :0.009901 Median :0.00000 Median :0.04167 Median :0.00000
## Mean :0.069527 Mean :0.07725 Mean :0.08316 Mean :0.06333
## 3rd Qu.:0.049505 3rd Qu.:0.05882 3rd Qu.:0.10606 3rd Qu.:0.00000
## Max. :1.000000 Max. :1.00000 Max. :1.00000 Max. :1.00000
## Residential Shopping
## Min. :0.000000 Min. :0.00000
## 1st Qu.:0.004651 1st Qu.:0.00000
## Median :0.023256 Median :0.00000
## Mean :0.055504 Mean :0.05654
## 3rd Qu.:0.052326 3rd Qu.:0.07407
## Max. :1.000000 Max. :1.00000
After normalization, all the value are fall within 0 to 1 range.
ggplot function will be helping us to display each graph.
Economy_PR <- ggplot(data=final_output.std,
aes(x= `Economy_PR`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
Aged_PR <- ggplot(data=final_output.std,
aes(x= `Aged_PR`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
Pop_density <- ggplot(data=final_output.std,
aes(x= `Pop_density`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
`HDB1-2RM_PR` <- ggplot(data=final_output.std,
aes(x= `HDB1-2RM_PR`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
`HDB3-4RM_PR` <- ggplot(data=final_output.std,
aes(x= `HDB3-4RM_PR`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
`HDB5RM-Ec_PR` <- ggplot(data=final_output.std,
aes(x= `HDB5RM-Ec_PR`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
Cond_Apt_PR <- ggplot(data=final_output.std,
aes(x= `Cond_Apt_PR`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
Landed_PR <- ggplot(data=final_output.std,
aes(x= `Landed_PR`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
`Business` <- ggplot(data=final_output.std,
aes(x= `Business`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
`Embassy` <- ggplot(data=final_output.std,
aes(x= `Embassy`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
`Financial` <- ggplot(data=final_output.std,
aes(x= `Financial`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
`Industry` <- ggplot(data=final_output.std,
aes(x= `Industry`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
`Residential` <- ggplot(data=final_output.std,
aes(x= `Residential`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
`Shopping` <- ggplot(data=final_output.std,
aes(x= `Shopping`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
ggarrange(Economy_PR, Young_PR, Aged_PR, Pop_density, `HDB1-2RM_PR`, `HDB3-4RM_PR`, `HDB5RM-Ec_PR`,Cond_Apt_PR,Landed_PR, Embassy,Business, Financial,Industry,Residential,Shopping,
ncol = 3,
nrow = 5)
To compute the proximity matrix using euclidean method.
proxmat <- dist(final_output.std,method = 'euclidean')
To list the content of proxmat for visual inspection.
To compute the agglomerative coefficients of all hierarchical clustering algorithms.
m <- c( "average", "single", "complete", "ward")
names(m) <- c( "average", "single", "complete", "ward")
ac <- function(x) {
agnes(final_output.std, method = x)$ac
}
map_dbl(m, ac)
## average single complete ward
## 0.8711732 0.8126779 0.9100738 0.9798199
Since ward gives the highest value, therefore in the following code chunk will be using ward instead of others.
Performs hierarchical cluster analysis using ward.D method. The hierarchical clustering output is stored in an object of class hclust which describes the tree produced by the clustering process.
hclust_ward <- hclust(proxmat, method = 'ward.D')
Plot the tree by using plot() of R Graphics as shown in the code chunk below.
plot(hclust_ward, cex = 0.1, )
In order to identify the number of clustering, we will be using fviz_nbclust() function to identify the optimal number of clusters. 10 clusters will be used in the suquence code chunk.
set.seed(123)
fviz_nbclust(final_output.std, hcut, nstart = 25, method = "gap_stat", nboot = 50)+
labs(subtitle = "Gap statistic method")
To draw the dendrogram with a border around the selected clusters by using rect.hclust() of R stats.
plot(hclust_ward, cex = 0.1)
rect.hclust(hclust_ward, k = 10, border = 2:5)
To transform final_output.std data frame into a data matrix.
final_output_mat <- data.matrix(final_output.std)
To build an interactive cluster heatmap.
heatmaply(normalize(final_output_mat),
Colv=NA,
dist_method = "euclidean",
hclust_method = "ward.D",
seriate = "OLO",
colors = Blues,
k_row = 10,
margins = c(NA,200,60,NA),
fontsize_row = 4,
fontsize_col = 5,
main="Geographic Segmentation of Singapore Subzone by geodemographic and urban functions indicators",
xlab = "Geodemographic and Urban Functions Indicators",
ylab = "Planning Subzone of Singapore"
)
To derive a 10-cluster model.
groups <- as.factor(cutree(hclust_ward, k=10))
The code chunk below form the join in three steps:
sg_sf_cluster <- cbind(mpsz, as.matrix(groups)) %>%
rename(`CLUSTER`=`as.matrix.groups.`)
To plot the choropleth map showing the cluster formed.
tm_shape(sg_sf_cluster)+
tm_fill(col = "CLUSTER")+
tm_borders(col = "black") +
tm_layout(outer.bg.color = "lightblue", legend.text.size = 0.55, title = "Hierarchical Clustering", title.size = 1)
Using st_set_geometry to drop the geometry and named it as sg_sf_cluster_all
sg_sf_cluster_all <- sg_sf_cluster %>%
st_set_geometry(NULL)
To perform a left_join with cluster_vars_SZ and the cluster_vars_SZ already in standardised format.
sg_sf_cluster_all <- left_join(sg_sf_cluster_all, cluster_vars_SZ)
Remane all the columns names
sg_sf_cluster_all <- sg_sf_cluster_all %>%
rename("Cl_Economy_PR"= "Economy_PR","Cl_Aged_PR" = "Aged_PR","Cl_Pop_density" = "Pop_density", "Cl_HDB1-2RM_PR"= "HDB1-2RM_PR" , "Cl_HDB3-4RM_PR" = "HDB3-4RM_PR","Cl_HDB5RM-Ec_PR" = "HDB5RM-Ec_PR","Cl_Cond_Apt_PR" = "Cond_Apt_PR", "Cl_Landed_PR" = "Landed_PR", "Cl_Business" = "Business", "Cl_Embassy" ="Embassy" , "Cl_Financial" = "Financial", "Cl_Industry" = "Industry","Cl_Residential" = "Residential" ,"Cl_Shopping" = "Shopping" )
Using pivot_longer() function to pivot the data set.
sg_sf_cluster_all <- sg_sf_cluster_all %>%
pivot_longer( cols = starts_with("Cl_"),
names_to = "variable",
values_to = "value"
)
Using facet_wrap to display the data sets in geom_col.
ggplot(data = sg_sf_cluster_all, aes(y = variable, x = value )) +
geom_col() +
facet_wrap( ~ CLUSTER, nrow = 3,ncol = 4, scales = "free_x") +
theme(strip.text.x = element_text(size = 9,face="bold"), axis.text.y = element_text(size = 5,face="bold"), axis.text.x = element_text(size = 7,face="bold"),panel.background = element_rect(fill = "#BFD5E3", colour = "#6D9EC1",size = 2, linetype = "solid"),
panel.grid.major = element_line(size = 0.5, linetype = 'solid',
colour = "white"),
panel.grid.minor = element_line(size = 0.25, linetype = 'solid',
colour = "white")) +
ggtitle("Detail of Geodemographic and Urban Functions Indicators for each cluster")
Using facet_wrap to display the data sets in geom_boxplot.
ggplot(data = sg_sf_cluster_all, aes(y=variable, x=value), fill = variable) +
geom_boxplot()+
facet_wrap( ~ CLUSTER, nrow = 3,ncol = 4) +
theme(strip.text.x = element_text(size = 12), axis.text.y = element_text(size = 5,face="bold"), axis.text.x = element_text(size = 7,face="bold"),panel.background = element_rect(fill = "#BFD5E3", colour = "#6D9EC1",size = 2, linetype = "solid"),
panel.grid.major = element_line(size = 0.5, linetype = 'solid',
colour = "white"),
panel.grid.minor = element_line(size = 0.25, linetype = 'solid',
colour = "white")) +
ggtitle("Detail of Geodemographic and Urban Functions Indicators for each cluster")
Cluster 1: Cluster 1 is represented by green colour and by looking at detail of Geodemographic and Urban Functions Indicators for each cluster, we can see that business is the main player in this cluster.
Cluster 2: Cluster 2 is represented by green colour and by looking at detail of Geodemographic and Urban Functions Indicators for each cluster, we can see that Economy group and HDB3-4RM are the main player in this cluster.
Cluster 3: Cluster 3 is represented by green colour and by looking at detail of Geodemographic and Urban Functions Indicators for each cluster, we can see that econmoy and cond_apt are the main player in this cluster.
Cluster 4: Cluster 4 is represented by green colour and by looking at detail of Geodemographic and Urban Functions Indicators for each cluster, we can see that HDB3-4RM, Population density and economy are the main player in this cluster.
Cluster 5: Cluster 5 is represented by green colour and by looking at detail of Geodemographic and Urban Functions Indicators for each cluster, we can see that population density, economy, HDB5RM-Ec and HDB3-4Rm are the main player in this cluster.
Cluster 6: Cluster 6 is represented by green colour and by looking at detail of Geodemographic and Urban Functions Indicators for each cluster, we can see that economy and cond_apt are the main player in this cluster.
Cluster 7: Cluster 7 is represented by green colour and by looking at detail of Geodemographic and Urban Functions Indicators for each cluster, we can see that business and industry are the main player in this cluster.
Cluster 8: Cluster 8 is represented by green colour and by looking at detail of Geodemographic and Urban Functions Indicators for each cluster, we can see that shapping, embassy and financial are the main player in this cluster.
Cluster 9: Cluster 9 is represented by green colour and by looking at detail of Geodemographic and Urban Functions Indicators for each cluster, we can see that economy and industry are the main player in this cluster.
Cluster 10: Cluster 10 is represented by green colour and by looking at detail of Geodemographic and Urban Functions Indicators for each cluster, we can see that landed and economy group are the main player in this cluster.
To compute the neighbours list from polygon list.
mpsz.nb <- poly2nb(mpsz_sp1)
summary(mpsz.nb)
## Neighbour list object:
## Number of regions: 300
## Number of nonzero links: 1758
## Percentage nonzero weights: 1.953333
## Average number of links: 5.86
## Link number distribution:
##
## 1 2 3 4 5 6 7 8 9 10 11 12 15
## 1 5 13 34 79 74 47 31 12 1 1 1 1
## 1 least connected region:
## 16 with 1 link
## 1 most connected region:
## 313 with 15 links
We can plot the neighbours list on mpsz_sp1
plot(mpsz_sp1, border=grey(.5))
plot(mpsz.nb, coordinates(mpsz_sp1), col="blue", add=TRUE)
To compute the cost of each edge.
lcosts <- nbcosts(mpsz.nb, final_output)
For each observation, this gives the pairwise dissimilarity between its values on the five variables and the values for the neighbouring observation (from the neighbour list). Basically, this is the notion of a generalised weight for a spatial weights matrix.
Next, We will incorporate these costs into a weights object in the same way as we did in the calculation of inverse of distance weights. In other words, we convert the neighbour list to a list weights object by specifying the just computed lcosts as the weights.
Note that we specify the style as B to make sure the cost values are not row-standardised.
mpsz.w <- nb2listw(mpsz.nb, lcosts, style="B")
summary(mpsz.w)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 300
## Number of nonzero links: 1758
## Percentage nonzero weights: 1.953333
## Average number of links: 5.86
## Link number distribution:
##
## 1 2 3 4 5 6 7 8 9 10 11 12 15
## 1 5 13 34 79 74 47 31 12 1 1 1 1
## 1 least connected region:
## 16 with 1 link
## 1 most connected region:
## 313 with 15 links
##
## Weights style: B
## Weights constants summary:
## n nn S0 S1 S2
## B 300 90000 75606.3 14121307 137048363
The minimum spanning tree is computed by mean of the mstree()
mpsz.mst <- mstree(mpsz.w)
To check its class and dimension
class(mpsz.mst)
## [1] "mst" "matrix"
dim(mpsz.mst)
## [1] 299 3
Display the content of mpsz.mst
head(mpsz.mst)
## [,1] [,2] [,3]
## [1,] 81 74 27.073973
## [2,] 81 119 30.166206
## [3,] 119 161 5.291503
## [4,] 161 117 5.099020
## [5,] 117 118 5.000000
## [6,] 117 15 27.000000
plot(mpsz_sp1, border=gray(.5))
plot.mst(mpsz.mst, coordinates(mpsz_sp1),
col="blue", cex.lab=0.7, cex.circles=0.005, add=TRUE)
To compute the spatially constrained cluster
clust10 <- skater(mpsz.mst[,1:2], final_output, 9)
Examine its contents
str(clust10)
## List of 8
## $ groups : num [1:300] 4 4 4 4 4 4 4 4 4 4 ...
## $ edges.groups:List of 10
## ..$ :List of 3
## .. ..$ node: num [1:6] 81 88 110 99 75 74
## .. ..$ edge: num [1:5, 1:3] 99 88 81 81 110 75 110 88 74 99 ...
## .. ..$ ssw : num 106
## ..$ :List of 3
## .. ..$ node: num [1:7] 153 138 107 89 106 84 135
## .. ..$ edge: num [1:6, 1:3] 153 107 138 106 89 89 138 89 107 84 ...
## .. ..$ ssw : num 258
## ..$ :List of 3
## .. ..$ node: num [1:5] 132 178 152 144 108
## .. ..$ edge: num [1:4, 1:3] 132 178 152 132 178 ...
## .. ..$ ssw : num 223
## ..$ :List of 3
## .. ..$ node: num [1:261] 54 49 57 32 48 23 213 18 102 52 ...
## .. ..$ edge: num [1:260, 1:3] 57 32 48 23 18 213 102 220 52 17 ...
## .. ..$ ssw : num 5259
## ..$ :List of 3
## .. ..$ node: num [1:4] 113 165 162 158
## .. ..$ edge: num [1:3, 1:3] 165 113 165 162 165 ...
## .. ..$ ssw : num 206
## ..$ :List of 3
## .. ..$ node: num [1:2] 283 300
## .. ..$ edge: num [1, 1:3] 283 300 42.1
## .. ..$ ssw : num 42.1
## ..$ :List of 3
## .. ..$ node: num 221
## .. ..$ edge: num[0 , 1:3]
## .. ..$ ssw : num 0
## ..$ :List of 3
## .. ..$ node: num [1:6] 272 275 173 294 70 282
## .. ..$ edge: num [1:5, 1:3] 272 272 275 173 272 275 282 294 70 173 ...
## .. ..$ ssw : num 213
## ..$ :List of 3
## .. ..$ node: num [1:2] 160 105
## .. ..$ edge: num [1, 1:3] 160 105 23.7
## .. ..$ ssw : num 23.7
## ..$ :List of 3
## .. ..$ node: num [1:6] 53 43 40 95 121 128
## .. ..$ edge: num [1:5, 1:3] 121 95 40 53 43 128 121 95 43 40 ...
## .. ..$ ssw : num 135
## $ not.prune : NULL
## $ candidates : int [1:10] 1 2 3 4 5 6 7 8 9 10
## $ ssto : num 11218
## $ ssw : num [1:10] 11218 9371 8666 8195 7723 ...
## $ crit : num [1:2] 1 Inf
## $ vec.crit : num [1:300] 1 1 1 1 1 1 1 1 1 1 ...
## - attr(*, "class")= chr "skater"
To check the cluster assignment
ccs10 <- clust10$groups
ccs10
## [1] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
## [26] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 10 4 4 10 4 4 4 4 4 4 4
## [51] 4 4 10 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 8 4 4 4 1 1
## [76] 4 4 4 4 4 1 4 4 2 4 4 4 1 2 4 4 4 4 4 10 4 4 4 1 4
## [101] 4 4 4 4 9 2 2 3 4 1 4 4 5 4 4 4 4 4 4 4 10 4 4 4 4
## [126] 4 4 10 4 4 4 3 4 4 2 4 4 2 4 4 4 4 4 3 4 4 4 4 4 4
## [151] 4 3 2 4 4 4 4 5 4 9 4 5 4 4 5 4 4 4 4 4 4 4 8 4 4
## [176] 4 4 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
## [201] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 7 4 4 4 4
## [226] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
## [251] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 8 4 4 8
## [276] 4 4 4 4 4 4 8 6 4 4 4 4 4 4 4 4 4 4 8 4 4 4 4 4 6
We can find out how many observations are in each cluster by means of the table command. Parenthetially, we can also find this as the dimension of each vector in the lists contained in edges.groups.
table(ccs10)
## ccs10
## 1 2 3 4 5 6 7 8 9 10
## 6 7 5 261 4 2 1 6 2 6
Plot the pruned tree that shows the ten clusters on top of the subzone area.
plot(mpsz_sp1, border=gray(.5))
plot(clust10, coordinates(mpsz_sp1), cex.lab=.7,
groups.colors=c("red","green","blue", "brown", "pink","yellow","orange","purple","grey","black","coral"), cex.circles=0.005, add=TRUE)
groups_mat <- as.matrix(clust10$groups)
mpsz_sf_spatialcluster <- cbind(sg_sf_cluster, as.factor(groups_mat)) %>%
rename(`SP_CLUSTER`=`as.factor.groups_mat.`)
tm_shape(mpsz_sf_spatialcluster)+
tm_fill(col = "SP_CLUSTER")+
tm_borders(col = "black") +
tm_layout(outer.bg.color = "lightblue", legend.text.size = 0.55, title = "Spatially Constrained Clustering", title.size = 1)
Using st_set_geometry to drop the geometry and named it as mpsz_sf_spatialcluster_all
mpsz_sf_spatialcluster_all <- mpsz_sf_spatialcluster %>%
st_set_geometry(NULL)
To perform a left_join with cluster_vars_SZ and the cluster_vars_SZ already in standardised format.
mpsz_sf_spatialcluster_all <- left_join(mpsz_sf_spatialcluster_all, cluster_vars_SZ)
Remane all the columns names
mpsz_sf_spatialcluster_all <- mpsz_sf_spatialcluster_all %>%
rename("Cl_Economy_PR"= "Economy_PR","Cl_Aged_PR" = "Aged_PR","Cl_Pop_density" = "Pop_density", "Cl_HDB1-2RM_PR"= "HDB1-2RM_PR" , "Cl_HDB3-4RM_PR" = "HDB3-4RM_PR","Cl_HDB5RM-Ec_PR" = "HDB5RM-Ec_PR","Cl_Cond_Apt_PR" = "Cond_Apt_PR", "Cl_Landed_PR" = "Landed_PR", "Cl_Business" = "Business", "Cl_Embassy" ="Embassy" , "Cl_Financial" = "Financial", "Cl_Industry" = "Industry","Cl_Residential" = "Residential" ,"Cl_Shopping" = "Shopping" )
Using pivot_longer() function to pivot the data set.
mpsz_sf_spatialcluster_all <- mpsz_sf_spatialcluster_all %>%
pivot_longer( cols = starts_with("Cl_"),
names_to = "variable",
values_to = "value"
)
Using facet_wrap to display the data sets in geom_col.
ggplot(data = mpsz_sf_spatialcluster_all, aes(y = variable, x = value )) +
geom_col() +
facet_wrap( ~ SP_CLUSTER, nrow = 3,ncol = 4, scales = "free_x") +
theme(strip.text.x = element_text(size = 9,face="bold"), axis.text.y = element_text(size = 5,face="bold"), axis.text.x = element_text(size = 7,face="bold"),panel.background = element_rect(fill = "#BFD5E3", colour = "#6D9EC1",size = 2, linetype = "solid"),
panel.grid.major = element_line(size = 0.5, linetype = 'solid',
colour = "white"),
panel.grid.minor = element_line(size = 0.25, linetype = 'solid',
colour = "white")) +
ggtitle("Detail of Geodemographic and Urban Functions Indicators for each cluster")
Using facet_wrap to display the data sets in geom_boxplot.
ggplot(data = mpsz_sf_spatialcluster_all, aes(y=variable, x=value), fill = variable) +
geom_boxplot()+
facet_wrap( ~ SP_CLUSTER, nrow = 3,ncol = 4) +
theme(strip.text.x = element_text(size = 12), axis.text.y = element_text(size = 5,face="bold"), axis.text.x = element_text(size = 7,face="bold"),panel.background = element_rect(fill = "#BFD5E3", colour = "#6D9EC1",size = 2, linetype = "solid"),
panel.grid.major = element_line(size = 0.5, linetype = 'solid',
colour = "white"),
panel.grid.minor = element_line(size = 0.25, linetype = 'solid',
colour = "white")) +
ggtitle("Detail of Geodemographic and Urban Functions Indicators for each cluster")
Cluster 1: Cluster 1 is represented by green colour and as we can see the location of cluster 1 is located around Jurong West area. By looking at detail of Geodemographic and Urban Functions Indicators for each cluster, we can see that business and industry are the main player in this cluster and the rests are close to 0.
Cluster 2: Cluster 2 is represented by yellow colour and as we can see the location of cluster 3 is located around Jurong East and Clementi areas. By looking at detail of Geodemographic and Urban Functions Indicators for each cluster, we can see that business is the main player in this cluster and the rests are close to 0.
Cluster 3: Cluster 3 is represented by purple colour and as we can see the location of cluster 3 is located around CBD and town areas. By looking at detail of Geodemographic and Urban Functions Indicators for each cluster, we can see that business, economy group and indsutry are the main player in this cluster. However, it is not only limit to those mentioned groups, but the rests is not as signification as the mentioned threes
Cluster 4: Cluster 4 is represented by red colour and as we can see the location of cluster 4 almost entire Singapore. As we know Singapore is very good at geodemographic and urban function planning. Therefore, we are not surprise that when most of the subzone are fall within the same cluster. However, in this cluster there some singificate definately more others such as Economy group, HDB3-4RM,popualtion density and cond_Apt.
Cluster 5: Cluster 5 is represented by blue colour and as we can see the location of cluster 4 is located around Kallang area. By looking at detail of Geodemographic and Urban Functions Indicators for each cluster, we can see that Residential, Economy group and Cond_apt are the main player in this cluster.
Cluster 6: Cluster 6 is represented by orange colour and as we can see the location of cluster 5 Woodlands area. By looking at detail of Geodemographic and Urban Functions Indicators for each cluster, we can see that Business and Industry are the main player in this cluster.
Cluster 7: Cluster 7 is represented by green colour and as we can see the location of cluster 7 is located around Downtown East area. By looking at detail of Geodemographic and Urban Functions Indicators for each cluster, we can see that Aged group and Business are the main player in this cluster.
Cluster 8: Cluster 8 is represented by pink colour and as we can see the location of cluster 8 is located Choa Chu Kang and Tuas areas. By looking at detail of Geodemographic and Urban Functions Indicators for each cluster, we can see that business and Economy group are the main player in this cluster.
Cluster 9: Cluster 9 is represented by blue colour and as we can see the location of cluster 9 is located around Tangling area. By looking at detail of Geodemographic and Urban Functions Indicators for each cluster, we can see that Residential, Economy group and Embassy are the main player in this cluster.
Cluster 10: Cluster 10 is represented by blue colour and as we can see the location of cluster 10 town area. By looking at detail of Geodemographic and Urban Functions Indicators for each cluster, we can see that shopping, HDB3-4RM, financial, Residential, Economy group and Embassy are the main player in this cluster.