In this take-home exercise, we are tasked with segmenting Singapore at the planning subzone level into homogeneous socioeconomic areas. We do this by combining geodemographic data from the Singapore Department of Statistics with urban functions extracted from the geospatial data provided.
We are provided the following geospatial datasets:
They are all in ESRI shapefile format.
The Business layer also contains industry locations, so we will extract both Business and Industry from it.
packages = c('rgdal', 'spdep', 'ClustGeo', 'tmap', 'sf', 'ggpubr', 'cluster', 'heatmaply', 'corrplot', 'psych', 'tidyverse', 'tmaptools', 'factoextra', 'NbClust')
# Install any package that is missing, then load it
for (p in packages) {
if (!require(p, character.only = TRUE)) {
install.packages(p)
}
library(p, character.only = TRUE)
}
mpsz = st_read(dsn = "data/geospatial", layer="MP14_SUBZONE_WEB_PL")
## Reading layer `MP14_SUBZONE_WEB_PL' from data source `C:\Users\jiiireh\Desktop\Take-home_ex03\Take-home_ex03\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 323 features and 15 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS: SVY21
unique(st_is_valid(mpsz,reason = TRUE))
## [1] "Valid Geometry"
## [2] "Ring Self-intersection[27932.3925999999 21982.7971999999]"
## [3] "Ring Self-intersection[26885.4439000003 26668.3121000007]"
## [4] "Ring Self-intersection[26920.1689999998 26978.5440999996]"
## [5] "Ring Self-intersection[15432.4749999996 31319.716]"
## [6] "Ring Self-intersection[12861.3828999996 32207.4923]"
## [7] "Ring Self-intersection[19681.2353999997 31294.4521999992]"
## [8] "Ring Self-intersection[41375.108 40432.8588999994]"
## [9] "Ring Self-intersection[38542.2260999996 44605.4089000002]"
## [10] "Ring Self-intersection[21702.5623000003 48125.1154999994]"
mpsz[rowSums(is.na(mpsz))!=0,]
## Simple feature collection with 0 features and 15 fields
## bbox: xmin: NA ymin: NA xmax: NA ymax: NA
## projected CRS: SVY21
## [1] OBJECTID SUBZONE_NO SUBZONE_N SUBZONE_C CA_IND PLN_AREA_N
## [7] PLN_AREA_C REGION_N REGION_C INC_CRC FMEL_UPD_D X_ADDR
## [13] Y_ADDR SHAPE_Leng SHAPE_Area geometry
## <0 rows> (or 0-length row.names)
As we can observe, there are some invalid polygons (ring self-intersections) in mpsz. We will resolve these with st_make_valid(). We can also see that there are no NA values.
mpsz <- st_make_valid(mpsz)
st_make_valid() attempts to repair invalid geometries with only minimal alterations to the input. No vertices are dropped or moved; the structure of the object is simply rearranged. This works well for clean but invalid data, and poorly for messy and invalid data.
unique(st_is_valid(mpsz,reason = TRUE))
## [1] "Valid Geometry"
st_crs(mpsz)
## Coordinate Reference System:
## User input: SVY21
## wkt:
## PROJCRS["SVY21",
## BASEGEOGCRS["SVY21[WGS84]",
## DATUM["World Geodetic System 1984",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]],
## ID["EPSG",6326]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["Degree",0.0174532925199433]]],
## CONVERSION["unnamed",
## METHOD["Transverse Mercator",
## ID["EPSG",9807]],
## PARAMETER["Latitude of natural origin",1.36666666666667,
## ANGLEUNIT["Degree",0.0174532925199433],
## ID["EPSG",8801]],
## PARAMETER["Longitude of natural origin",103.833333333333,
## ANGLEUNIT["Degree",0.0174532925199433],
## ID["EPSG",8802]],
## PARAMETER["Scale factor at natural origin",1,
## SCALEUNIT["unity",1],
## ID["EPSG",8805]],
## PARAMETER["False easting",28001.642,
## LENGTHUNIT["metre",1],
## ID["EPSG",8806]],
## PARAMETER["False northing",38744.572,
## LENGTHUNIT["metre",1],
## ID["EPSG",8807]]],
## CS[Cartesian,2],
## AXIS["(E)",east,
## ORDER[1],
## LENGTHUNIT["metre",1,
## ID["EPSG",9001]]],
## AXIS["(N)",north,
## ORDER[2],
## LENGTHUNIT["metre",1,
## ID["EPSG",9001]]]]
We see that mpsz is already in SVY21 (its Transverse Mercator parameters match those of EPSG:3414), so no transformation will be required.
rDwelling <- read_csv ("data/aspatial/respopagesextod2011to2019.csv") %>%
filter(Time == 2019)
## Parsed with column specification:
## cols(
## PA = col_character(),
## SZ = col_character(),
## AG = col_character(),
## Sex = col_character(),
## TOD = col_character(),
## Pop = col_double(),
## Time = col_double()
## )
unique(rDwelling$TOD)
## [1] "HDB 1- and 2-Room Flats"
## [2] "HDB 3-Room Flats"
## [3] "HDB 4-Room Flats"
## [4] "HDB 5-Room and Executive Flats"
## [5] "HUDC Flats (excluding those privatised)"
## [6] "Landed Properties"
## [7] "Condominiums and Other Apartments"
## [8] "Others"
Two of the TODs, “HUDC Flats” and “Others”, will not be used in our study, so they need to be removed from the data.
rDwelling <-rDwelling[(rDwelling$TOD!="HUDC Flats (excluding those privatised)" & rDwelling$TOD!="Others"),]
unique(rDwelling$TOD)
## [1] "HDB 1- and 2-Room Flats" "HDB 3-Room Flats"
## [3] "HDB 4-Room Flats" "HDB 5-Room and Executive Flats"
## [5] "Landed Properties" "Condominiums and Other Apartments"
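Equivalently, the excluded categories can be kept in a single vector and dropped with `%in%`, which is easier to extend if more dwelling types need removing. A base-R sketch on a toy data frame (all values hypothetical):

```r
# Toy stand-in for rDwelling (hypothetical values)
dwell <- data.frame(
  SZ  = c("A", "A", "B", "B"),
  TOD = c("HDB 3-Room Flats", "Others",
          "HUDC Flats (excluding those privatised)", "Landed Properties"),
  Pop = c(100, 10, 20, 50),
  stringsAsFactors = FALSE
)

# Dwelling types excluded from the study
excluded_tod <- c("HUDC Flats (excluding those privatised)", "Others")

# Keep only rows whose TOD is not in the exclusion list
dwell <- dwell[!dwell$TOD %in% excluded_tod, ]
print(unique(dwell$TOD))
```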
E_Active <- rDwelling %>%
filter(AG == "25_to_29"| AG == "30_to_34"| AG == "35_to_39"| AG == "40_to_44" | AG == "45_to_49" |AG == "50_to_54"| AG == "55_to_59" | AG == "60_to_64") %>%
group_by(SZ = SZ) %>%
summarize(Pop = sum(Pop))%>%
mutate(SZ = toupper(SZ)) # Upper-case subzone names to match SUBZONE_N in mpsz
colnames(E_Active)[1] <- "SUBZONE_N" # Rename SZ to SUBZONE_N for ease of joining
colnames(E_Active)[2] <- "E_Active" # Rename Pop to unique value for joining
Young <- rDwelling %>%
filter(AG == "0_to_4"| AG == "5_to_9"|AG == "10_to_14"|AG == "15_to_19"| AG == "20_to_24") %>%
group_by(SZ = SZ) %>%
summarize(Pop = sum(Pop))%>%
mutate(SZ = toupper(SZ))
colnames(Young)[1] <- "SUBZONE_N" # Rename SZ to SUBZONE_N for ease of joining
colnames(Young)[2] <- "Young" # Rename Pop to unique value for joining
Aged <- rDwelling %>%
filter(AG == "65_to_69"| AG == "70_to_74"| AG == "75_to_79"| AG == "80_to_84" | AG == "85_to_89" | AG == "90_and_over") %>%
group_by(SZ = SZ) %>%
summarize(Pop = sum(Pop))%>%
mutate(SZ = toupper(SZ))
colnames(Aged)[1] <- "SUBZONE_N" # Rename SZ to SUBZONE_N for ease of joining
colnames(Aged)[2] <- "Aged" # Rename Pop to unique value for joining
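The three age groups above can also be derived from the AG codes themselves instead of listing every band by hand. This is a minimal base-R sketch on toy data. It assumes every AG code starts with its lower age bound, as in "25_to_29" and "90_and_over":

```r
# Toy subset of AG codes with hypothetical populations and subzones
ag  <- c("0_to_4", "20_to_24", "25_to_29", "60_to_64", "65_to_69", "90_and_over")
pop <- c(10, 20, 30, 40, 50, 60)
sz  <- c("A", "A", "A", "B", "B", "B")

# Lower bound of each band is the number before the first underscore
lower <- as.integer(sub("_.*$", "", ag))

# Young 0-24, E_Active 25-64, Aged 65 and over (intervals are right-closed)
band <- cut(lower, breaks = c(-Inf, 24, 64, Inf),
            labels = c("Young", "E_Active", "Aged"))

# Subzone x group totals; empty combinations are filled with 0
totals <- tapply(pop, list(sz, band), sum, default = 0)
print(totals)
```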
Pop_SZ <- rDwelling %>%
group_by(SZ = SZ) %>%
summarize(Pop = sum(Pop)) %>%
mutate(SZ = toupper(SZ))
colnames(Pop_SZ)[1] <- "SUBZONE_N"
mpsz_Pop <- left_join(mpsz, Pop_SZ)
## Joining, by = "SUBZONE_N"
## Warning: Column `SUBZONE_N` joining factor and character vector, coercing into
## character vector
mpsz_Pop <- st_set_geometry(mpsz_Pop, NULL) # Drop the geometry column
SG_Dens <- sum(mpsz_Pop$Pop) / sum(mpsz_Pop$SHAPE_Area) # Population Density of Singapore
SZ_Pop <- mpsz_Pop %>%
select("SUBZONE_N","Pop")
Pop_Dens <- mpsz_Pop %>%
mutate(dens = Pop / SHAPE_Area) %>% # Population density (persons per m^2) by subzone
select(SUBZONE_N, Pop, dens) # Keep only the columns needed for joining
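Note that `SG_Dens` is total population over total area. This is not the same as the average of the subzone densities in `Pop_Dens`. The base-R sketch below uses hypothetical subzones and also converts persons per m² into the more readable persons per km²:

```r
# Hypothetical subzones: residents and area in square metres
pop  <- c(5000, 0, 20000)
area <- c(1e6, 2e6, 5e5)

# Subzone density in persons per square metre (as in the dens column)
dens <- pop / area

# Persons per square kilometre (1 km^2 = 1e6 m^2)
dens_km2 <- dens * 1e6

# Overall density is a ratio of totals (as for SG_Dens), which differs
# from the unweighted mean of the subzone densities
overall <- sum(pop) / sum(area)
mean_of_ratios <- mean(dens)
print(c(overall = overall, mean_of_ratios = mean_of_ratios))
```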
HDB_1_2 <- rDwelling %>%
filter(TOD == "HDB 1- and 2-Room Flats") %>%
group_by(SZ = SZ) %>%
summarize(Pop = sum(Pop))%>%
mutate(SZ = toupper(SZ))
colnames(HDB_1_2)[1] <- "SUBZONE_N" # Rename SZ to SUBZONE_N for ease of joining
colnames(HDB_1_2)[2] <- "HDB_1_2" # Rename Pop to unique value for joining
HDB_3_4 <- rDwelling %>%
filter(TOD == "HDB 3-Room Flats" | TOD == "HDB 4-Room Flats") %>%
group_by(SZ = SZ) %>%
summarize(Pop = sum(Pop))%>%
mutate(SZ = toupper(SZ))
colnames(HDB_3_4)[1] <- "SUBZONE_N" # Rename SZ to SUBZONE_N for ease of joining
colnames(HDB_3_4)[2] <- "HDB_3_4" # Rename Pop to unique value for joining
HDB_5_EC <- rDwelling %>%
filter(TOD == "HDB 5-Room and Executive Flats") %>%
group_by(SZ = SZ) %>%
summarize(Pop = sum(Pop))%>%
mutate(SZ = toupper(SZ))
colnames(HDB_5_EC)[1] <- "SUBZONE_N" # Rename SZ to SUBZONE_N for ease of joining
colnames(HDB_5_EC)[2] <- "HDB_5_EC" # Rename Pop to unique value for joining
Condo_Apt <- rDwelling %>%
filter(TOD == "Condominiums and Other Apartments") %>%
group_by(SZ = SZ) %>%
summarize(Pop = sum(Pop))%>%
mutate(SZ = toupper(SZ))
colnames(Condo_Apt)[1] <- "SUBZONE_N" # Rename SZ to SUBZONE_N for ease of joining
colnames(Condo_Apt)[2] <- "Condo_Apt" # Rename Pop to unique value for joining
LandedProperty <- rDwelling %>%
filter(TOD == "Landed Properties") %>%
group_by(SZ = SZ) %>%
summarize(Pop = sum(Pop))%>%
mutate(SZ = toupper(SZ))
colnames(LandedProperty)[1] <- "SUBZONE_N" # Rename SZ to SUBZONE_N for ease of joining
colnames(LandedProperty)[2] <- "LandedProperty" # Rename Pop to unique value for joining
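The five dwelling-type blocks above repeat the same filter, aggregate and rename steps. As a sketch only, a hypothetical helper `sum_by_subzone()` could factor this out. The helper is not part of the original code; it is written in base R and run here on toy data:

```r
# Hypothetical helper: total population per upper-cased subzone for a set of
# TOD categories, returned with the given indicator column name.
# Note: aggregate() errors if no rows match the requested categories.
sum_by_subzone <- function(df, tods, name) {
  matched <- df[df$TOD %in% tods, ]
  out <- aggregate(Pop ~ SZ, data = matched, FUN = sum)
  out$SZ <- toupper(out$SZ)
  names(out) <- c("SUBZONE_N", name)
  out
}

# Toy dwelling data (hypothetical values)
dwell <- data.frame(
  SZ  = c("Bedok North", "Bedok North", "Tampines East"),
  TOD = c("HDB 3-Room Flats", "HDB 4-Room Flats", "HDB 4-Room Flats"),
  Pop = c(100, 250, 400),
  stringsAsFactors = FALSE
)

hdb_3_4 <- sum_by_subzone(dwell, c("HDB 3-Room Flats", "HDB 4-Room Flats"), "HDB_3_4")
print(hdb_3_4)
```

Each dwelling group would then be one call with its own TOD vector and column name.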
Indicators <- left_join(E_Active,Young)
## Joining, by = "SUBZONE_N"
Indicators <- left_join(Indicators,Aged)
## Joining, by = "SUBZONE_N"
Indicators <- left_join(Indicators,Pop_Dens)
## Joining, by = "SUBZONE_N"
Indicators <- left_join(Indicators,HDB_1_2)
## Joining, by = "SUBZONE_N"
Indicators <- left_join(Indicators,HDB_3_4)
## Joining, by = "SUBZONE_N"
Indicators <- left_join(Indicators,HDB_5_EC)
## Joining, by = "SUBZONE_N"
Indicators <- left_join(Indicators,Condo_Apt)
## Joining, by = "SUBZONE_N"
Indicators <- left_join(Indicators,LandedProperty)
## Joining, by = "SUBZONE_N"
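The chain of `left_join()` calls can also be written as a fold over a list of tables. With the tidyverse loaded, `purrr::reduce()` with `left_join` does this. The base-R sketch below uses toy tables (names and values hypothetical). Unlike `left_join()`, which keeps the row order of its first argument, `merge()` sorts rows by the key:

```r
# Hypothetical per-subzone tables, each with SUBZONE_N plus one indicator
e_active <- data.frame(SUBZONE_N = c("A", "B", "C"), E_Active = c(10, 20, 30),
                       stringsAsFactors = FALSE)
young    <- data.frame(SUBZONE_N = c("A", "B"), Young = c(5, 6),
                       stringsAsFactors = FALSE)
aged     <- data.frame(SUBZONE_N = c("B", "C"), Aged = c(7, 8),
                       stringsAsFactors = FALSE)

# Fold successive left joins over the list; all.x = TRUE keeps every
# subzone from the first table and fills missing indicators with NA
indicators <- Reduce(function(x, y) merge(x, y, by = "SUBZONE_N", all.x = TRUE),
                     list(e_active, young, aged))
print(indicators)
```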
summary(Indicators)
## SUBZONE_N E_Active Young Aged
## Length:323 Min. : 0 Min. : 0 Min. : 0
## Class :character 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0
## Mode :character Median : 2790 Median : 1170 Median : 640
## Mean : 7346 Mean : 3286 Mean : 1761
## 3rd Qu.:10285 3rd Qu.: 4365 3rd Qu.: 2940
## Max. :79640 Max. :34240 Max. :18600
## Pop dens HDB_1_2 HDB_3_4
## Min. : 0 Min. :0.000000 Min. : 0 Min. : 0
## 1st Qu.: 0 1st Qu.:0.000000 1st Qu.: 0 1st Qu.: 0
## Median : 4880 Median :0.005857 Median : 0 Median : 0
## Mean : 12393 Mean :0.010662 Mean : 542 Mean : 5953
## 3rd Qu.: 17035 3rd Qu.:0.019864 3rd Qu.: 605 3rd Qu.: 9705
## Max. :132480 Max. :0.046058 Max. :4700 Max. :75000
## HDB_5_EC Condo_Apt LandedProperty
## Min. : 0 Min. : 0 Min. : 0.0
## 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0.0
## Median : 0 Median : 230 Median : 0.0
## Mean : 3297 Mean : 1827 Mean : 773.9
## 3rd Qu.: 3660 3rd Qu.: 2835 3rd Qu.: 400.0
## Max. :47960 Max. :16770 Max. :18820.0
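Looking at the data, we see subzones with a population of 0. Since each indicator will be divided by the subzone population, these subzones would yield NaN values (0/0 in R) rather than meaningful ratios. We therefore drop subzones where the population is 0, which leaves 228 of the 323 subzones.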
Indicators<-Indicators[!(Indicators$Pop==0),]
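To see why zero-population subzones must be removed before normalising, here is a small base-R illustration with hypothetical counts:

```r
# Hypothetical subzone populations and young residents
pop   <- c(1200, 0, 800)
young <- c(300, 0, 200)

# Dividing by a zero population gives NaN (0/0)
print(young / pop * 1000)

# Drop zero-population subzones first, then compute the rate
keep <- pop != 0
rate <- young[keep] / pop[keep] * 1000
print(rate)
```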
Using the raw counts in Indicators would bias the analysis towards subzones with larger populations. To overcome this, we convert each count into a rate per 1,000 residents of the subzone:
Indicators_derived <- Indicators %>%
mutate(`E_Active` = `E_Active`/`Pop`*1000) %>%
mutate(`Young` = `Young`/`Pop`*1000) %>%
mutate(`Aged` = `Aged`/`Pop`*1000) %>%
mutate(`HDB_1_2` = `HDB_1_2`/`Pop`*1000) %>%
mutate(`HDB_3_4` = `HDB_3_4`/`Pop`*1000) %>%
mutate(`HDB_5_EC` = `HDB_5_EC`/`Pop`*1000) %>%
mutate(`Condo_Apt` = `Condo_Apt`/`Pop`*1000) %>%
mutate(`LandedProperty` = `LandedProperty`/`Pop`*1000)
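A quick sanity check on the derived indicators: the three age groups partition each subzone's population, so their rates per 1,000 should sum to 1,000. The five retained dwelling groups should behave the same way. A base-R sketch with hypothetical counts:

```r
# Hypothetical counts for two subzones
counts <- data.frame(
  Young    = c(300, 50),
  E_Active = c(700, 120),
  Aged     = c(200, 30),
  Pop      = c(1200, 200)
)

# Convert every indicator column to a rate per 1,000 residents
indicator_cols <- c("Young", "E_Active", "Aged")
rates <- counts
rates[indicator_cols] <- counts[indicator_cols] / counts$Pop * 1000

# Rows should sum to 1000 (up to floating-point error)
print(rowSums(rates[indicator_cols]))
```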
summary(Indicators_derived)
## SUBZONE_N E_Active Young Aged
## Length:228 Min. : 512.4 Min. : 0.0 Min. : 0.0
## Class :character 1st Qu.: 573.0 1st Qu.:218.0 1st Qu.:106.3
## Mode :character Median : 592.8 Median :254.2 Median :151.0
## Mean : 601.8 Mean :247.9 Mean :150.3
## 3rd Qu.: 607.9 3rd Qu.:287.6 3rd Qu.:192.7
## Max. :1000.0 Max. :360.0 Max. :325.9
## Pop dens HDB_1_2 HDB_3_4
## Min. : 10 Min. :6.580e-06 Min. : 0.00 Min. : 0.0
## 1st Qu.: 3330 1st Qu.:4.402e-03 1st Qu.: 0.00 1st Qu.: 0.0
## Median : 11640 Median :1.222e-02 Median : 0.00 Median :402.7
## Mean : 17557 Mean :1.510e-02 Mean : 40.37 Mean :355.0
## 3rd Qu.: 26505 3rd Qu.:2.474e-02 3rd Qu.: 47.94 3rd Qu.:606.7
## Max. :132480 Max. :4.606e-02 Max. :712.93 Max. :948.1
## HDB_5_EC Condo_Apt LandedProperty
## Min. : 0.0 Min. : 0.00 Min. : 0.0
## 1st Qu.: 0.0 1st Qu.: 26.75 1st Qu.: 0.0
## Median :144.0 Median : 145.71 Median : 0.0
## Mean :164.1 Mean : 307.46 Mean : 133.0
## 3rd Qu.:259.4 3rd Qu.: 491.72 3rd Qu.: 145.2
## Max. :836.4 Max. :1000.00 Max. :1000.0
We will join both Indicators and Indicators_derived to mpsz for plotting, and replace all NA values with 0 for subzones without a given indicator.
mpsz_Indicators <- left_join(mpsz, Indicators) #Join Indicators to mpsz for plotting purposes
## Joining, by = "SUBZONE_N"
## Warning: Column `SUBZONE_N` joining factor and character vector, coercing into
## character vector
mpsz_Indicators <- replace(mpsz_Indicators, is.na(mpsz_Indicators),0) # Replace NA with 0 for areas without specific indicators
mpsz_Indicators_derived <- left_join(mpsz,Indicators_derived) #Join Indicators_derived to mpsz for plotting purposes
## Joining, by = "SUBZONE_N"
## Warning: Column `SUBZONE_N` joining factor and character vector, coercing into
## character vector
mpsz_Indicators_derived <- replace(mpsz_Indicators_derived, is.na(mpsz_Indicators_derived),0) # Replace NA with 0 for areas without specific indicators
Indic_Pop_Map <- qtm(mpsz_Indicators, "Pop")
Indic_EA_Map <- qtm(mpsz_Indicators, "E_Active")
Indic_D_EA_Map <- qtm(mpsz_Indicators_derived, "E_Active")
tmap_arrange(Indic_Pop_Map,Indic_EA_Map,Indic_D_EA_Map, nrow = 3)
As we can observe, unlike the raw E_Active counts, the derived E_Active rate in Indicators_derived no longer follows the distribution pattern of the population.
densMap <- qtm(mpsz_Indicators, "dens")
Young_Indic_Map <- qtm(mpsz_Indicators_derived, "Young")
Aged_Indic_Map <- qtm(mpsz_Indicators_derived, "Aged")
tmap_arrange(densMap,Young_Indic_Map,Aged_Indic_Map, ncol = 2, nrow = 2)
We observe many subzones with a high proportion of young residents in the north and north-eastern regions of Singapore.
HDB_1_2_Indic_Map <- qtm(mpsz_Indicators_derived, "HDB_1_2")
HDB_3_4_Indic_Map <- qtm(mpsz_Indicators_derived, "HDB_3_4")
HDB_5_EC_Indic_Map <- qtm(mpsz_Indicators_derived, "HDB_5_EC")
tmap_arrange(HDB_1_2_Indic_Map,HDB_3_4_Indic_Map,HDB_5_EC_Indic_Map, ncol = 2, nrow = 2)
We see that subzones with a high proportion of residents in HDB 1- and 2-room flats are mainly in the central region.
Condo_Apt_Indic_Map <- qtm(mpsz_Indicators_derived, "Condo_Apt")
LandedProperty_Indic_Map <- qtm(mpsz_Indicators_derived, "LandedProperty")
tmap_arrange(Condo_Apt_Indic_Map,LandedProperty_Indic_Map, nrow = 2)
We observe a high proportion of residents in condominiums and other apartments in the central region of Singapore.
Business = st_read(dsn = "data/geospatial", layer="Business")
## Reading layer `Business' from data source `C:\Users\jiiireh\Desktop\Take-home_ex03\Take-home_ex03\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 6550 features and 5 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.6147 ymin: 1.24605 xmax: 104.0044 ymax: 1.4698
## geographic CRS: WGS 84
unique(st_is_valid(Business,reason = TRUE))
## [1] "Valid Geometry"
Business[rowSums(is.na(Business))!=0,]
## Simple feature collection with 279 features and 5 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.6198 ymin: 1.2601 xmax: 103.994 ymax: 1.45162
## geographic CRS: WGS 84
## First 10 features:
## POI_ID SEQ_NUM FAC_TYPE POI_NAME ST_NAME
## 4 1101180212 1 5000 MALAYSIA GARMENT MANUFACTURERS <NA>
## 13 1001052864 1 5000 MENLO WORLDWIDE LOGISTICS <NA>
## 113 1141900621 1 5000 PRIMZ BIZHUB <NA>
## 228 1097875448 1 5000 ACER TOWER A <NA>
## 229 1097875449 1 5000 ACER TOWER B <NA>
## 265 1137930245 1 5000 NORTH SPRING BIZHUB <NA>
## 267 1103814952 1 5000 AT PUNGGOL <NA>
## 269 1103814941 1 5000 ECO-SCAPE <NA>
## 270 1103814940 1 5000 PRINZ <NA>
## 271 1103814939 1 5000 SING SEE SOON FLORAL & LANDSCAPE <NA>
## geometry
## 4 POINT (103.8855 1.33821)
## 13 POINT (103.7509 1.32875)
## 113 POINT (103.805 1.43509)
## 228 POINT (103.7466 1.32665)
## 229 POINT (103.7459 1.32695)
## 265 POINT (103.8431 1.43731)
## 267 POINT (103.9172 1.39312)
## 269 POINT (103.9159 1.3934)
## 270 POINT (103.9159 1.3934)
## 271 POINT (103.9158 1.39326)
We observe no invalid geometries; however, there are NA values, specifically in the ST_NAME column. Street names are not significant to our study and the column will be dropped eventually, so these NA values are a non-issue.
unique(Business$FAC_TYPE)
## [1] 5000 9991
We see that there are two unique FAC_TYPEs in Business. We will separate them out to find out which FAC_TYPE refers to Industry.
Business_5000 <- Business %>%
filter(FAC_TYPE == 5000)
Business_9991 <- Business %>%
filter(FAC_TYPE == 9991)
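Instead of filtering each code separately, `split()` separates a table by every FAC_TYPE value in one call. A base-R sketch on toy rows; the POI names are modelled on those printed in this section, but the table itself is hypothetical:

```r
# Hypothetical POI table with a FAC_TYPE code per row
poi <- data.frame(
  POI_NAME = c("AIA", "ABBOTT", "KAKI BUKIT INDUSTRIAL ESTATE", "PRIMZ BIZHUB"),
  FAC_TYPE = c(5000, 5000, 9991, 5000),
  stringsAsFactors = FALSE
)

# split() returns a list with one data frame per FAC_TYPE, named by the code
by_type <- split(poi, poi$FAC_TYPE)
print(names(by_type))
print(sapply(by_type, nrow))
```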
glimpse(Business_5000)
## Rows: 6,440
## Columns: 6
## $ POI_ID <dbl> 1101180209, 1101180210, 1101180211, 1101180212, 1101180213...
## $ SEQ_NUM <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
## $ FAC_TYPE <int> 5000, 5000, 5000, 5000, 5000, 5000, 5000, 5000, 5000, 5000...
## $ POI_NAME <fct> JOHN CHEN, TROPICAL INDUSTRIAL BUILDING, LIAN CHEONG INDUS...
## $ ST_NAME <fct> LITTLE RD, LITTLE RD, LITTLE RD, NA, LITTLE RD, LOWER KENT...
## $ geometry <POINT [°]> POINT (103.8856 1.33841), POINT (103.8852 1.33832), ...
glimpse(Business_9991)
## Rows: 110
## Columns: 6
## $ POI_ID <dbl> 1110491789, 1099992474, 1099992477, 1099992477, 1100464367...
## $ SEQ_NUM <int> 1, 1, 1, 2, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2...
## $ FAC_TYPE <int> 9991, 9991, 9991, 9991, 9991, 9991, 9991, 9991, 9991, 9991...
## $ POI_NAME <fct> KAKI BUKIT INDUSTRIAL ESTATE, TERRACE FACTORIES TUAS SOUTH...
## $ ST_NAME <fct> KAKI BUKIT AVE 1, TUAS SOUTH ST 5, TUAS SOUTH ST 5, TUAS S...
## $ geometry <POINT [°]> POINT (103.9042 1.33269), POINT (103.6242 1.29325), ...
B5000_Map <- tm_shape(mpsz)+
tm_polygons(alpha = 0, border.col = "lightgrey", border.alpha = 1)+
tm_shape(Business_5000)+
tm_dots(col = "red")+
tm_layout(title = "Business FAC_TYPE 5000 Distribution",
title.size = 1,
title.position = c("center", "top"),
inner.margins = c(0.06, 0.10, 0.10, 0.08))
B9991_Map <- tm_shape(mpsz)+
tm_polygons(alpha = 0, border.col = "lightgrey", border.alpha = 1)+
tm_shape(Business_9991)+
tm_dots(col = "red")+
tm_layout(title = "Business FAC_TYPE 9991 Distribution",
title.size = 1,
title.position = c("center", "top"),
inner.margins = c(0.06, 0.10, 0.10, 0.08))
tmap_arrange(B5000_Map,B9991_Map, ncol = 2)
Looking into the data, FAC_TYPE 5000 has POI_NAMES such as “AIA” and “ABBOTT”, an insurance company and a medical device and health care company respectively. Meanwhile, FAC_TYPE 9991 primarily has POI_NAMES that include “INDUSTRIAL ESTATE” and “WATER FABRICATION”. Therefore, we will regard FAC_TYPE 9991 as our extracted industry data.
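The name-based reasoning above can be made a little more systematic by counting how many POI names in each FAC_TYPE contain industry-related keywords. A base-R sketch on toy name vectors; the keyword pattern is an assumption, not an exhaustive definition of industry:

```r
# Toy POI names per FAC_TYPE (hypothetical subsets)
names_5000 <- c("AIA", "ABBOTT", "MENLO WORLDWIDE LOGISTICS", "ACER TOWER A")
names_9991 <- c("KAKI BUKIT INDUSTRIAL ESTATE", "TERRACE FACTORIES TUAS SOUTH",
                "SOME OFFICE")

# Assumed keywords suggesting industrial land use
industry_pattern <- "INDUSTRIAL|FACTOR"

# Share of names matching at least one keyword
share_5000 <- mean(grepl(industry_pattern, names_5000))
share_9991 <- mean(grepl(industry_pattern, names_9991))
print(c(share_5000 = share_5000, share_9991 = share_9991))
```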
Industry <- Business_9991
Business <- Business_5000
Financial = st_read(dsn = "data/geospatial", layer="Financial")
## Reading layer `Financial' from data source `C:\Users\jiiireh\Desktop\Take-home_ex03\Take-home_ex03\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 3320 features and 29 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.6256 ymin: 1.24392 xmax: 103.9998 ymax: 1.46247
## geographic CRS: WGS 84
unique(st_is_valid(Financial,reason = TRUE))
## [1] "Valid Geometry"
Financial[rowSums(is.na(Financial))!=0,]
## Simple feature collection with 3320 features and 29 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.6256 ymin: 1.24392 xmax: 103.9998 ymax: 1.46247
## geographic CRS: WGS 84
## First 10 features:
## LINK_ID POI_ID SEQ_NUM FAC_TYPE POI_NAME POI_LANGCD
## 1 1170624361 1132324230 1 3578 UOB ENG
## 2 1112103842 1132315471 1 3578 POSB ENG
## 3 1112103842 1132315472 1 3578 UOB ENG
## 4 1112103842 1132315473 1 3578 OCBC ENG
## 5 864687596 1100784924 1 3578 OCBC ENG
## 6 902073032 1132324170 1 6000 MAYBANK ENG
## 7 778516217 1141424387 1 6000 ADPOST MONEYCHANGER ENG
## 8 880495939 1096910285 1 3578 UOB ENG
## 9 866996334 1096910292 1 3578 OCBC ENG
## 10 880495939 1096910286 1 3578 CITIBANK ENG
## POI_NMTYPE POI_ST_NUM ST_NUM_FUL ST_NFUL_LC ST_NAME ST_LANGCD
## 1 B 201 <NA> <NA> YISHUN AVE 2 ENG
## 2 B 375 <NA> <NA> COMMONWEALTH AVE ENG
## 3 B 375 <NA> <NA> COMMONWEALTH AVE ENG
## 4 B 375 <NA> <NA> COMMONWEALTH AVE ENG
## 5 B <NA> <NA> <NA> JURONG WEST ST 51 ENG
## 6 B 707 <NA> <NA> EAST COAST RD ENG
## 7 B 163 <NA> <NA> TANGLIN RD ENG
## 8 B <NA> <NA> <NA> <NA> <NA>
## 9 B 11 <NA> <NA> ARTS LINK ENG
## 10 B <NA> <NA> <NA> <NA> <NA>
## POI_ST_SD ACC_TYPE PH_NUMBER CHAIN_ID NAT_IMPORT PRIVATE IN_VICIN
## 1 L <NA> <NA> 6919 N N N
## 2 R <NA> <NA> 6918 N N N
## 3 R <NA> <NA> 6919 N N N
## 4 R <NA> <NA> 6920 N N N
## 5 R <NA> <NA> 6920 N N N
## 6 L <NA> 18006292266 3657 N N N
## 7 R <NA> 67330779 0 N N N
## 8 R <NA> <NA> 6919 N N N
## 9 R <NA> <NA> 6920 N N N
## 10 R <NA> <NA> 1165 N N N
## NUM_PARENT NUM_CHILD PERCFRREF VANCITY_ID
## 1 0 0 NA 0
## 2 0 0 NA 0
## 3 0 0 NA 0
## 4 0 0 NA 0
## 5 0 0 60 0
## 6 0 0 NA 0
## 7 1 0 50 0
## 8 0 0 20 0
## 9 0 0 NA 0
## 10 0 0 20 0
## ACT_ADDR
## 1 <NA>
## 2 <NA>
## 3 <NA>
## 4 <NA>
## 5 501 JURONG WEST STREET 51 SINGAPORE 640501
## 6 <NA>
## 7 <NA>
## 8 <NA>
## 9 <NA>
## 10 <NA>
## ACT_LANGCD ACT_ST_NAM ACT_ST_NUM ACT_ADMIN ACT_POSTAL
## 1 <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA>
## 5 ENG JURONG WEST STREET 51 501 SINGAPORE 640501
## 6 <NA> <NA> <NA> <NA> <NA>
## 7 <NA> <NA> <NA> <NA> <NA>
## 8 <NA> <NA> <NA> <NA> <NA>
## 9 <NA> <NA> <NA> <NA> <NA>
## 10 <NA> <NA> <NA> <NA> <NA>
## geometry
## 1 POINT (103.833 1.41695)
## 2 POINT (103.7989 1.30211)
## 3 POINT (103.7989 1.30211)
## 4 POINT (103.7989 1.30211)
## 5 POINT (103.7189 1.35016)
## 6 POINT (103.9224 1.31199)
## 7 POINT (103.8242 1.30528)
## 8 POINT (103.7723 1.29608)
## 9 POINT (103.7719 1.29367)
## 10 POINT (103.7723 1.29608)
We observe no invalid geometries; however, there are NA values in many columns, and every one of the 3,320 features has at least one. The NA values are mainly in the street number/name, phone number and address columns. Since each point is already located by its geometry, the street name and address columns are redundant and are not used in our study, so these NA values are a non-issue.
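To see at a glance which columns hold the NA values, count them per column. A base-R sketch on a toy attribute table (values modelled on the printout above, table hypothetical):

```r
# Toy attribute table with NA values in some descriptive columns
fin <- data.frame(
  POI_NAME  = c("UOB", "POSB", "OCBC"),
  ST_NAME   = c("YISHUN AVE 2", NA, NA),
  PH_NUMBER = c(NA, NA, "67330779"),
  FAC_TYPE  = c(3578, 3578, 6000),
  stringsAsFactors = FALSE
)

# Number of NA values per column; keep only columns with at least one NA
na_counts <- colSums(is.na(fin))
print(na_counts[na_counts > 0])
```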
unique(Financial$FAC_TYPE)
## [1] 3578 6000
We see two unique FAC_TYPEs within Financial. We will take a look at the differences between the two.
Financial_3578 <- Financial %>%
filter(FAC_TYPE == '3578')
Financial_6000 <- Financial %>%
filter(FAC_TYPE == '6000')
F3578_Map <- tm_shape(mpsz)+
tm_polygons(alpha = 0, border.col = "lightgrey", border.alpha = 1)+
tm_shape(Financial_3578)+
tm_dots(col = "red")+
tm_layout(title = "Financial FAC_TYPE 3578 Distribution",
title.size = 1,
title.position = c("center", "top"),
inner.margins = c(0.06, 0.10, 0.10, 0.08))
F6000_Map <- tm_shape(mpsz)+
tm_polygons(alpha = 0, border.col = "lightgrey", border.alpha = 1)+
tm_shape(Financial_6000)+
tm_dots(col = "red")+
tm_layout(title = "Financial FAC_TYPE 6000 Distribution",
title.size = 1,
title.position = c("center", "top"),
inner.margins = c(0.06, 0.10, 0.10, 0.08))
tmap_arrange(F3578_Map,F6000_Map, ncol=2)
From analysing the data and checking the postal codes available, it appears that FAC_TYPE 3578 points in Financial are ATM locations in Singapore. In contrast, FAC_TYPE 6000 points are services such as money changers and banks.
Govt_Embassy = st_read(dsn = "data/geospatial", layer="Govt_Embassy")
## Reading layer `Govt_Embassy' from data source `C:\Users\jiiireh\Desktop\Take-home_ex03\Take-home_ex03\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 443 features and 5 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.6282 ymin: 1.24911 xmax: 103.9884 ymax: 1.45765
## geographic CRS: WGS 84
unique(st_is_valid(Govt_Embassy,reason = TRUE))
## [1] "Valid Geometry"
Govt_Embassy[rowSums(is.na(Govt_Embassy))!=0,]
## Simple feature collection with 28 features and 5 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.7206 ymin: 1.27341 xmax: 103.8578 ymax: 1.45114
## geographic CRS: WGS 84
## First 10 features:
## POI_ID SEQ_NUM FAC_TYPE POI_NAME
## 4 1141424338 1 9993 GENERAL CONSULATE OMAN
## 34 1070436984 1 9993 TAIPEI REPRESENTATIVE OFFICE
## 35 1070436981 1 9525 MINISTRY OF TRANSPORT
## 36 1070436980 1 9525 CASINO REGULATORY AUTHORITY
## 49 1024547731 1 9525 SLF BUILDING
## 63 1149038609 1 9525 MEDIA DEVELOPMENT AUTHORITY OF SINGAPORE
## 64 1149038609 2 9525 MDA
## 66 1058449988 1 9993 EMBASSY UNITED ARAB EMIRATES
## 116 1083739893 1 9525 RAFFLES BUILDING
## 117 1083739890 1 9525 NATIONAL PARKS BOARD HEADQUARTERS
## ST_NAME geometry
## 4 <NA> POINT (103.8578 1.2999)
## 34 <NA> POINT (103.8011 1.27347)
## 35 <NA> POINT (103.8012 1.27341)
## 36 <NA> POINT (103.8012 1.27341)
## 49 <NA> POINT (103.8393 1.33325)
## 63 <NA> POINT (103.7874 1.29881)
## 64 <NA> POINT (103.7874 1.29881)
## 66 <NA> POINT (103.8578 1.2999)
## 116 <NA> POINT (103.8182 1.31664)
## 117 <NA> POINT (103.8161 1.31599)
We can observe that there are no invalid geometries; however, there are NAs, specifically in the ST_NAME column. As mentioned, since ST_NAME is not used in this study, the NA values within it are a non-issue.
unique(Govt_Embassy$FAC_TYPE)
## [1] 9993 9525
We see two unique FAC_TYPEs. We will briefly take a look at the differences.
Govt_Embassy_9993 <- Govt_Embassy %>%
filter(FAC_TYPE == '9993')
Govt_Embassy_9525 <- Govt_Embassy %>%
filter(FAC_TYPE == '9525')
G9993_Map <- tm_shape(mpsz)+
tm_polygons(alpha = 0, border.col = "lightgrey", border.alpha = 1)+
tm_shape(Govt_Embassy_9993)+
tm_dots(col = "red")+
tm_layout(title = "Govt_Embassy FAC_TYPE 9993 Distribution",
title.size = 1,
title.position = c("center", "top"),
inner.margins = c(0.06, 0.10, 0.10, 0.08))
G9525_Map <- tm_shape(mpsz)+
tm_polygons(alpha = 0, border.col = "lightgrey", border.alpha = 1)+
tm_shape(Govt_Embassy_9525)+
tm_dots(col = "red")+
tm_layout(title = "Govt_Embassy FAC_TYPE 9525 Distribution",
title.size = 1,
title.position = c("center", "top"),
inner.margins = c(0.06, 0.10, 0.10, 0.08))
tmap_arrange(G9993_Map,G9525_Map, ncol=2)
We observe that FAC_TYPE 9993 has most of its locations in the central region of Singapore, while FAC_TYPE 9525 is more spread out across Singapore. We will take a look at the data for more insight into why this may be.
head(Govt_Embassy_9993)
## Simple feature collection with 6 features and 5 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.8431 ymin: 1.28113 xmax: 103.8578 ymax: 1.31836
## geographic CRS: WGS 84
## POI_ID SEQ_NUM FAC_TYPE POI_NAME ST_NAME
## 1 1141424380 1 9993 CONSULATE SAN MARINO CHURCH ST
## 2 1141424404 1 9993 EMBASSY LAOS GOLDHILL PLZ
## 3 1141424402 1 9993 CONSULATE BELIZE CECIL ST
## 4 1141424338 1 9993 GENERAL CONSULATE OMAN <NA>
## 5 1001332522 1 9993 EMBASSY NORWAY RAFFLES QUAY
## 6 1001332520 1 9993 EMBASSY PANAMA RAFFLES QUAY
## geometry
## 1 POINT (103.8494 1.28343)
## 2 POINT (103.8431 1.31836)
## 3 POINT (103.8493 1.28128)
## 4 POINT (103.8578 1.2999)
## 5 POINT (103.8512 1.28113)
## 6 POINT (103.8512 1.2812)
head(Govt_Embassy_9525)
## Simple feature collection with 6 features and 5 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.8455 ymin: 1.27869 xmax: 103.9184 ymax: 1.32688
## geographic CRS: WGS 84
## POI_ID SEQ_NUM FAC_TYPE POI_NAME ST_NAME
## 1 1192460871 1 9525 MND TOWER BLOCK MAXWELL RD
## 2 1192460819 1 9525 MND AUDITORIUM & FUNCTION HALL MAXWELL RD
## 3 1192460843 1 9525 AICARE LINK @ MAXWELL MAXWELL RD
## 4 1192460783 1 9525 HARMONY IN DIVERSITY GALLERY MAXWELL RD
## 5 1192460750 1 9525 FAMILY SUPPORT DIVISION MSF MAXWELL RD
## 6 1194224304 1 9525 LTA BEDOK CAMPUS CHAI CHEE ST
## geometry
## 1 POINT (103.8456 1.27869)
## 2 POINT (103.8455 1.27883)
## 3 POINT (103.8455 1.27883)
## 4 POINT (103.8455 1.27883)
## 5 POINT (103.8455 1.27883)
## 6 POINT (103.9184 1.32688)
Looking at the POI_NAMES, we see that FAC_TYPE 9993 points are foreign embassies and consulates. FAC_TYPE 9525 points, by contrast, are government offices and facilities, with POI_NAMES that include “Town Council”, “Fire Station”, etc.
Private_residential = st_read(dsn = "data/geospatial", layer="Private residential")
## Reading layer `Private residential' from data source `C:\Users\jiiireh\Desktop\Take-home_ex03\Take-home_ex03\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 3604 features and 5 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.6295 ymin: 1.23943 xmax: 103.9749 ymax: 1.45379
## geographic CRS: WGS 84
unique(st_is_valid(Private_residential,reason = TRUE))
## [1] "Valid Geometry"
Private_residential[rowSums(is.na(Private_residential))!=0,]
## Simple feature collection with 45 features and 5 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.742 ymin: 1.27681 xmax: 103.9495 ymax: 1.44509
## geographic CRS: WGS 84
## First 10 features:
## POI_ID SEQ_NUM FAC_TYPE POI_NAME ST_NAME
## 3 1202668778 1 9590 GREENTOPS @ SIMS PLACE <NA>
## 40 1100618584 1 9590 PARKROYAL RESIDENCE <NA>
## 70 1202435811 1 9590 FERNVALE GARDENS <NA>
## 72 1202435810 1 9590 FERNVALE FLORA <NA>
## 102 995162529 1 9590 SIGNATURE PARK <NA>
## 231 1149047335 1 9590 SOUTH BEACH RESIDENCES <NA>
## 236 1192848219 1 9590 J GATEWAY <NA>
## 287 1023797242 1 9590 GREAT WORLD CITY <NA>
## 542 1202435848 1 9590 COSTA RIS <NA>
## 718 1069869725 1 9590 THE LAURELS <NA>
## geometry
## 3 POINT (103.8797 1.31643)
## 40 POINT (103.8609 1.30024)
## 70 POINT (103.8788 1.39261)
## 72 POINT (103.8757 1.39347)
## 102 POINT (103.7699 1.34295)
## 231 POINT (103.8564 1.29458)
## 236 POINT (103.742 1.33585)
## 287 POINT (103.8314 1.29365)
## 542 POINT (103.948 1.36881)
## 718 POINT (103.8374 1.30444)
We observe no invalid geometries, but there are NA values in the ST_NAME column. As with the layers above, the NA values in ST_NAME are a non-issue.
unique(Private_residential$FAC_TYPE)
## [1] 9590
We observe only one FAC_TYPE in Private_residential.
P_Map <- tm_shape(mpsz)+
tm_polygons(alpha = 0, border.col = "Black", border.alpha = 1)+
tm_shape(Private_residential)+
tm_dots(col = "red")+
tm_layout(title = "Private_residential Distribution",
title.size = 1,
title.position = c("center", "top"),
inner.margins = c(0.06, 0.10, 0.10, 0.08))
P_Map
We observe a denser distribution of private residential locations in the central and slightly eastern areas of Singapore than in other areas. This may be because properties near the Central Business District (CBD) or town areas are usually more lucrative due to their proximity to these areas.
Shopping = st_read(dsn = "data/geospatial", layer="Shopping")
## Reading layer `Shopping' from data source `C:\Users\jiiireh\Desktop\Take-home_ex03\Take-home_ex03\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 511 features and 5 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.679 ymin: 1.24779 xmax: 103.9644 ymax: 1.4535
## geographic CRS: WGS 84
unique(st_is_valid(Shopping,reason = TRUE))
## [1] "Valid Geometry"
Shopping[rowSums(is.na(Shopping))!=0,]
## Simple feature collection with 102 features and 5 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.679 ymin: 1.25619 xmax: 103.9635 ymax: 1.45123
## geographic CRS: WGS 84
## First 10 features:
## POI_ID SEQ_NUM FAC_TYPE POI_NAME
## 7 1069767253 1 6512 UNITED SQUARE GOLDHILL PLAZA ENTRANCE
## 8 1069767253 2 6512 UNITED SQUARE GOLDHILL PLZ ENTRANCE
## 9 1039562724 1 6512 THE FORUM
## 10 1039562723 1 6512 WATERFRONT
## 12 1039562756 1 6512 THE BULL RING
## 18 1069686034 1 6512 BUGIS+
## 21 1178047575 1 6512 E!AVENUE
## 25 1178800633 1 6512 TANJONG PAGAR CENTRE
## 27 1201735347 1 6512 HEARTBEAT @ BEDOK
## 36 1103577748 1 6512 TAMPINES MART-TAMPINES STREET 34 ENTRANCE
## ST_NAME geometry
## 7 <NA> POINT (103.8432 1.31744)
## 8 <NA> POINT (103.8432 1.31744)
## 9 <NA> POINT (103.8205 1.25619)
## 10 <NA> POINT (103.8205 1.25619)
## 12 <NA> POINT (103.8205 1.25619)
## 18 <NA> POINT (103.8539 1.29988)
## 21 <NA> POINT (103.9556 1.37781)
## 25 <NA> POINT (103.8454 1.27721)
## 27 <NA> POINT (103.9326 1.32735)
## 36 <NA> POINT (103.9607 1.35454)
unique(Shopping$FAC_TYPE)
## [1] 6512
Only one unique FAC_TYPE is observed.
head(Shopping)
## Simple feature collection with 6 features and 5 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 103.7127 ymin: 1.28458 xmax: 103.9041 ymax: 1.35375
## geographic CRS: WGS 84
## POI_ID SEQ_NUM FAC_TYPE POI_NAME ST_NAME
## 1 1132106213 1 6512 SIN MING CENTRE SIN MING RD
## 2 801758392 1 6512 THE ADELPHI COLEMAN ST
## 3 842821452 1 6512 BOON LAY SHOPPING CENTRE BOON LAY PL
## 4 1193779191 1 6512 KATONG SQUARE EAST COAST RD
## 5 801758399 1 6512 SIM LIM SQUARE ROCHOR CANAL RD
## 6 1001450091 1 6512 PEOPLE'S PARK COMPLEX PARK RD
## geometry
## 1 POINT (103.836 1.35375)
## 2 POINT (103.8515 1.29124)
## 3 POINT (103.7127 1.34672)
## 4 POINT (103.9041 1.305)
## 5 POINT (103.8533 1.30341)
## 6 POINT (103.843 1.28458)
From the POI_NAME column, we can see that Shopping contains the various shopping centres in Singapore.
S_Map <- tm_shape(mpsz)+
tm_polygons(border.col = "Black", border.alpha = 1)+
tm_shape(Shopping)+
tm_dots(col = "red")+
tm_layout(title = "Shopping Distribution",
title.size = 1,
title.position = c("center", "top"),
inner.margins = c(0.06, 0.10, 0.10, 0.08))
S_Map
We observe a large number of shopping centres in the Central Region of Singapore, likely catering to people who work there (during meal breaks or after work) as well as to leisure shoppers and tourists.
Elsewhere in Singapore, shopping centres are more sparsely distributed, likely serving the residents of those areas.
st_crs(Business)
## Coordinate Reference System:
## User input: WGS 84
## wkt:
## GEOGCRS["WGS 84",
## DATUM["World Geodetic System 1984",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["latitude",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["longitude",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4326]]
st_crs(Industry)
## Coordinate Reference System:
## User input: WGS 84
## wkt:
## GEOGCRS["WGS 84",
## DATUM["World Geodetic System 1984",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["latitude",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["longitude",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4326]]
st_crs(Financial)
## Coordinate Reference System:
## User input: WGS 84
## wkt:
## GEOGCRS["WGS 84",
## DATUM["World Geodetic System 1984",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["latitude",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["longitude",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4326]]
st_crs(Govt_Embassy)
## Coordinate Reference System:
## User input: WGS 84
## wkt:
## GEOGCRS["WGS 84",
## DATUM["World Geodetic System 1984",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["latitude",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["longitude",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4326]]
st_crs(Private_residential)
## Coordinate Reference System:
## User input: WGS 84
## wkt:
## GEOGCRS["WGS 84",
## DATUM["World Geodetic System 1984",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["latitude",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["longitude",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4326]]
st_crs(Shopping)
## Coordinate Reference System:
## User input: WGS 84
## wkt:
## GEOGCRS["WGS 84",
## DATUM["World Geodetic System 1984",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["latitude",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["longitude",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4326]]
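Printing the full WKT of every layer is verbose. A more compact check, sketched below under the assumption that the layers are first collected into a named list (the helper `epsg_codes()` and the toy layers are hypothetical), reports only the EPSG code of each layer.

```r
library(sf)

# Hypothetical helper: return the EPSG code of every layer in a named list
epsg_codes <- function(layers) {
  sapply(layers, function(layer) st_crs(layer)$epsg)
}

# Toy stand-ins for the urban function layers: one WGS 84 point each
toy_point <- function(lon, lat) {
  st_sf(geometry = st_sfc(st_point(c(lon, lat)), crs = 4326))
}
toy_layers <- list(Business = toy_point(103.85, 1.29),
                   Shopping = toy_point(103.84, 1.30))

epsg_codes(toy_layers)  # Business and Shopping both report 4326
```

With the real data this would be called as `epsg_codes(list(Business = Business, Industry = Industry, Financial = Financial, Govt_Embassy = Govt_Embassy, Private_residential = Private_residential, Shopping = Shopping))`.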
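All the urban function layers use the geographic coordinate reference system WGS 84 (EPSG:4326), with coordinates in degrees. We therefore transform them to the projected CRS SVY21 / Singapore TM (EPSG:3414), the same SVY21 projection used by mpsz, with coordinates in metres.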
Business3414 <- st_transform(Business, "+init=EPSG:3414 +datum=WGS84")
Industry3414 <- st_transform(Industry, "+init=EPSG:3414 +datum=WGS84")
Govt_Embassy3414 <- st_transform(Govt_Embassy, "+init=EPSG:3414 +datum=WGS84")
Financial3414 <- st_transform(Financial, "+init=EPSG:3414 +datum=WGS84")
Private_residential3414 <- st_transform(Private_residential, "+init=EPSG:3414 +datum=WGS84")
Shopping3414 <- st_transform(Shopping, "+init=EPSG:3414 +datum=WGS84")
st_crs(Business3414)
## Coordinate Reference System:
## User input: +init=EPSG:3414 +datum=WGS84
## wkt:
## PROJCRS["unknown",
## BASEGEOGCRS["unknown",
## DATUM["World Geodetic System 1984",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]],
## ID["EPSG",6326]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8901]]],
## CONVERSION["unknown",
## METHOD["Transverse Mercator",
## ID["EPSG",9807]],
## PARAMETER["Latitude of natural origin",1.36666666666667,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8801]],
## PARAMETER["Longitude of natural origin",103.833333333333,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8802]],
## PARAMETER["Scale factor at natural origin",1,
## SCALEUNIT["unity",1],
## ID["EPSG",8805]],
## PARAMETER["False easting",28001.642,
## LENGTHUNIT["metre",1],
## ID["EPSG",8806]],
## PARAMETER["False northing",38744.572,
## LENGTHUNIT["metre",1],
## ID["EPSG",8807]]],
## CS[Cartesian,2],
## AXIS["(E)",east,
## ORDER[1],
## LENGTHUNIT["metre",1,
## ID["EPSG",9001]]],
## AXIS["(N)",north,
## ORDER[2],
## LENGTHUNIT["metre",1,
## ID["EPSG",9001]]]]
The urban function layers are now in a Transverse Mercator projection with the EPSG:3414 (SVY21 / Singapore TM) parameters, with coordinates in metres.
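The calls above use the legacy `"+init=EPSG:3414"` PROJ string, which PROJ 6 and later deprecate; this is also why the resulting CRS is reported as `"unknown"` with a WGS 84 datum rather than by its EPSG name. A sketch that passes the EPSG code directly and transforms every layer in one pass is shown below (the list `layers` and the toy point are hypothetical). One caveat, which depends on the sf/PROJ versions in use: mpsz carries an SVY21 definition read from its `.prj` file rather than an EPSG code, so `st_join()` may then also require labelling mpsz as EPSG:3414, e.g. with `st_set_crs(mpsz, 3414)`, which relabels rather than reprojects.

```r
library(sf)

# Toy stand-in for an urban function layer: one point in WGS 84 (EPSG:4326)
shops <- st_sf(name = "example",
               geometry = st_sfc(st_point(c(103.85, 1.29)), crs = 4326))

# Transform every layer in a named list to SVY21 / Singapore TM by EPSG code
layers <- list(Shopping = shops)
layers3414 <- lapply(layers, st_transform, crs = 3414)

st_crs(layers3414$Shopping)$epsg    # 3414
st_coordinates(layers3414$Shopping) # easting and northing in metres
```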
mpsz_Business <- st_join(mpsz,Business3414)
mpsz_Business <- mpsz_Business[!is.na(mpsz_Business$FAC_TYPE),] #Removing Subzones without UF
mpsz_Business <- mpsz_Business %>%
mutate(count = 1) # adding a count column
Business_SZCount <- mpsz_Business %>%
group_by(SUBZONE_N) %>%
summarise(Business = sum(count)) # counting UF by Subzone
Business_SZCount <- st_set_geometry(Business_SZCount, NULL) # dropping the geometry column
mpsz_Industry <- st_join(mpsz,Industry3414)
mpsz_Industry <- mpsz_Industry[!is.na(mpsz_Industry$FAC_TYPE),] #Removing Subzones without UF
mpsz_Industry <- mpsz_Industry %>%
mutate(count = 1) # adding a count column
Industry_SZCount <- mpsz_Industry %>%
group_by(SUBZONE_N) %>%
summarise(Industry = sum(count))# counting UF by Subzone
Industry_SZCount <- st_set_geometry(Industry_SZCount, NULL) # dropping the geometry column
mpsz_Financial <- st_join(mpsz,Financial3414)
mpsz_Financial <- mpsz_Financial[!is.na(mpsz_Financial$FAC_TYPE),] #Removing Subzones without UF
mpsz_Financial <- mpsz_Financial %>%
mutate(count = 1) # adding a count column
Financial_SZCount <- mpsz_Financial %>%
group_by(SUBZONE_N) %>%
summarise(Financial = sum(count))# counting UF by Subzone
Financial_SZCount <- st_set_geometry(Financial_SZCount, NULL) # dropping the geometry column
mpsz_Govt_Embassy <- st_join(mpsz,Govt_Embassy3414)
mpsz_Govt_Embassy <- mpsz_Govt_Embassy[!is.na(mpsz_Govt_Embassy$FAC_TYPE),] #Removing Subzones without UF
mpsz_Govt_Embassy <- mpsz_Govt_Embassy %>%
mutate(count = 1) # adding a count column
Govt_Embassy_SZCount <- mpsz_Govt_Embassy %>%
group_by(SUBZONE_N) %>%
summarise(Govt_Embassy = sum(count))# counting UF by Subzone
Govt_Embassy_SZCount <- st_set_geometry(Govt_Embassy_SZCount, NULL) # dropping the geometry column
mpsz_Private_residential <- st_join(mpsz,Private_residential3414)
mpsz_Private_residential <- mpsz_Private_residential[!is.na(mpsz_Private_residential$FAC_TYPE),] #Removing Subzones without UF
mpsz_Private_residential <- mpsz_Private_residential %>%
mutate(count = 1) # adding a count column
Private_residential_SZCount <- mpsz_Private_residential %>%
group_by(SUBZONE_N) %>%
summarise(Private_residential = sum(count))# counting UF by Subzone
Private_residential_SZCount <- st_set_geometry(Private_residential_SZCount, NULL) # dropping the geometry column
mpsz_Shopping <- st_join(mpsz,Shopping3414)
mpsz_Shopping <- mpsz_Shopping[!is.na(mpsz_Shopping$FAC_TYPE),] #Removing Subzones without UF
mpsz_Shopping <- mpsz_Shopping %>%
mutate(count = 1) # adding a count column
Shopping_SZCount <- mpsz_Shopping %>%
group_by(SUBZONE_N) %>%
summarise(Shopping = sum(count))# counting UF by Subzone
Shopping_SZCount <- st_set_geometry(Shopping_SZCount, NULL) # dropping the geometry column
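The six join-filter-count blocks above all compute the number of points of one layer inside each subzone. A shorter alternative, sketched below, uses `st_intersects()` directly: it returns, for each subzone, the indices of the points it contains, and `lengths()` turns that into counts, with 0 for subzones that have no points, so no NA handling is needed afterwards. The helper name `count_uf()` and the toy squares are hypothetical. With the real data, usage would be e.g. `mpsz$Business <- count_uf(mpsz, Business3414)`; because mpsz contains ring self-intersections, running `st_make_valid()` on it first may be necessary.

```r
library(sf)

# Hypothetical helper: number of points from one layer inside each subzone.
# st_intersects() lists the indices of the points in each subzone;
# lengths() converts that list to counts (0 where a subzone has no points).
count_uf <- function(subzones, points) {
  lengths(st_intersects(subzones, points))
}

# Toy data: two adjacent 10 m x 10 m squares in EPSG:3414 and three points
square <- function(x0, y0) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + 10, y0), c(x0 + 10, y0 + 10),
                        c(x0, y0 + 10), c(x0, y0))))
}
subzones <- st_sf(SUBZONE_N = c("A", "B"),
                  geometry = st_sfc(square(0, 0), square(10, 0), crs = 3414))
shops <- st_sf(geometry = st_sfc(st_point(c(2, 2)), st_point(c(5, 5)),
                                 st_point(c(15, 5)), crs = 3414))

subzones$Shopping <- count_uf(subzones, shops)
subzones$Shopping  # 2 1
```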
mpsz_UF <- left_join(mpsz,Business_SZCount)
## Joining, by = "SUBZONE_N"
mpsz_UF <- left_join(mpsz_UF,Industry_SZCount)
## Joining, by = "SUBZONE_N"
mpsz_UF <- left_join(mpsz_UF,Financial_SZCount)
## Joining, by = "SUBZONE_N"
mpsz_UF <- left_join(mpsz_UF,Govt_Embassy_SZCount)
## Joining, by = "SUBZONE_N"
mpsz_UF <- left_join(mpsz_UF,Private_residential_SZCount)
## Joining, by = "SUBZONE_N"
mpsz_UF <- left_join(mpsz_UF,Shopping_SZCount)
## Joining, by = "SUBZONE_N"
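The six `left_join()` calls above can also be folded into one step. The sketch below (the helper `join_counts()` and the toy tables are hypothetical) uses `purrr::reduce()` to join a list of count tables onto a base table in turn; passing `by` explicitly also silences the "Joining, by" messages.

```r
library(dplyr)
library(purrr)

# Hypothetical helper: left-join each count table onto the base table by key
join_counts <- function(base, count_tables, key = "SUBZONE_N") {
  reduce(count_tables,
         function(acc, tbl) left_join(acc, tbl, by = key),
         .init = base)
}

# Toy data: three subzones, two count tables covering only some of them
base <- data.frame(SUBZONE_N = c("A", "B", "C"), stringsAsFactors = FALSE)
business <- data.frame(SUBZONE_N = c("A", "C"), Business = c(3, 1),
                       stringsAsFactors = FALSE)
shopping <- data.frame(SUBZONE_N = "B", Shopping = 2,
                       stringsAsFactors = FALSE)

join_counts(base, list(business, shopping))
```

With the real data this would be `join_counts(mpsz, list(Business_SZCount, Industry_SZCount, Financial_SZCount, Govt_Embassy_SZCount, Private_residential_SZCount, Shopping_SZCount))`; subzones missing from a count table still come out as NA.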
summary(mpsz_UF)
## OBJECTID SUBZONE_NO SUBZONE_N SUBZONE_C CA_IND
## Min. : 1.0 Min. : 1.000 ADMIRALTY : 1 AMSZ01 : 1 N:274
## 1st Qu.: 81.5 1st Qu.: 2.000 AIRPORT ROAD : 1 AMSZ02 : 1 Y: 49
## Median :162.0 Median : 4.000 ALEXANDRA HILL : 1 AMSZ03 : 1
## Mean :162.0 Mean : 4.625 ALEXANDRA NORTH: 1 AMSZ04 : 1
## 3rd Qu.:242.5 3rd Qu.: 6.500 ALJUNIED : 1 AMSZ05 : 1
## Max. :323.0 Max. :17.000 ANAK BUKIT : 1 AMSZ06 : 1
## (Other) :317 (Other):317
## PLN_AREA_N PLN_AREA_C REGION_N REGION_C
## BUKIT MERAH : 17 BM : 17 CENTRAL REGION :134 CR :134
## QUEENSTOWN : 15 QT : 15 EAST REGION : 30 ER : 30
## ANG MO KIO : 12 AM : 12 NORTH-EAST REGION: 48 NER: 48
## DOWNTOWN CORE: 12 DT : 12 NORTH REGION : 41 NR : 41
## TOA PAYOH : 12 TP : 12 WEST REGION : 70 WR : 70
## HOUGANG : 10 HG : 10
## (Other) :245 (Other):245
## INC_CRC FMEL_UPD_D X_ADDR Y_ADDR
## 00F5E30B5C9B7AD8: 1 Min. :2014-12-05 Min. : 5093 Min. :19579
## 013B509B8EDF15BE: 1 1st Qu.:2014-12-05 1st Qu.:21864 1st Qu.:31776
## 01A4287FB060A0A6: 1 Median :2014-12-05 Median :28465 Median :35113
## 029BD940F4455194: 1 Mean :2014-12-05 Mean :27257 Mean :36106
## 0524461C92F35D94: 1 3rd Qu.:2014-12-05 3rd Qu.:31674 3rd Qu.:39869
## 05FD555397CBEE7A: 1 Max. :2014-12-05 Max. :50425 Max. :49553
## (Other) :317
## SHAPE_Leng SHAPE_Area Business Industry
## Min. : 871.5 Min. : 39438 Min. : 1.00 Min. :1.000
## 1st Qu.: 3709.6 1st Qu.: 628261 1st Qu.: 2.00 1st Qu.:1.000
## Median : 5211.9 Median : 1229894 Median : 7.00 Median :1.000
## Mean : 6524.4 Mean : 2420882 Mean : 29.81 Mean :2.245
## 3rd Qu.: 6942.6 3rd Qu.: 2106483 3rd Qu.: 29.00 3rd Qu.:2.000
## Max. :68083.9 Max. :69748299 Max. :308.00 Max. :8.000
## NA's :107 NA's :274
## Financial Govt_Embassy Private_residential Shopping
## Min. : 1.00 Min. : 1.000 Min. : 1.00 Min. : 1.000
## 1st Qu.: 3.25 1st Qu.: 1.000 1st Qu.: 3.00 1st Qu.: 1.000
## Median : 8.00 Median : 2.000 Median : 7.00 Median : 2.000
## Mean : 13.28 Mean : 3.331 Mean : 15.08 Mean : 3.476
## 3rd Qu.: 16.00 3rd Qu.: 4.000 3rd Qu.: 15.00 3rd Qu.: 4.000
## Max. :134.00 Max. :19.000 Max. :217.00 Max. :31.000
## NA's :73 NA's :190 NA's :84 NA's :176
## geometry
## MULTIPOLYGON :318
## POLYGON : 5
## epsg:NA : 0
## +proj=tmer...: 0
##
##
##
The urban function columns are NA for subzones that contain no facilities of that type. We replace these NA values with 0 to represent that such subzones have none of the respective urban function.
mpsz_UF <- replace(mpsz_UF, is.na(mpsz_UF),0)
mpsz_UF[rowSums(is.na(mpsz_UF))!=0,]
## Simple feature collection with 0 features and 21 fields
## bbox: xmin: NA ymin: NA xmax: NA ymax: NA
## projected CRS: SVY21
## [1] OBJECTID SUBZONE_NO SUBZONE_N
## [4] SUBZONE_C CA_IND PLN_AREA_N
## [7] PLN_AREA_C REGION_N REGION_C
## [10] INC_CRC FMEL_UPD_D X_ADDR
## [13] Y_ADDR SHAPE_Leng SHAPE_Area
## [16] Business Industry Financial
## [19] Govt_Embassy Private_residential Shopping
## [22] geometry
## <0 rows> (or 0-length row.names)
summary(mpsz_UF)
## OBJECTID SUBZONE_NO SUBZONE_N SUBZONE_C CA_IND
## Min. : 1.0 Min. : 1.000 ADMIRALTY : 1 AMSZ01 : 1 N:274
## 1st Qu.: 81.5 1st Qu.: 2.000 AIRPORT ROAD : 1 AMSZ02 : 1 Y: 49
## Median :162.0 Median : 4.000 ALEXANDRA HILL : 1 AMSZ03 : 1
## Mean :162.0 Mean : 4.625 ALEXANDRA NORTH: 1 AMSZ04 : 1
## 3rd Qu.:242.5 3rd Qu.: 6.500 ALJUNIED : 1 AMSZ05 : 1
## Max. :323.0 Max. :17.000 ANAK BUKIT : 1 AMSZ06 : 1
## (Other) :317 (Other):317
## PLN_AREA_N PLN_AREA_C REGION_N REGION_C
## BUKIT MERAH : 17 BM : 17 CENTRAL REGION :134 CR :134
## QUEENSTOWN : 15 QT : 15 EAST REGION : 30 ER : 30
## ANG MO KIO : 12 AM : 12 NORTH-EAST REGION: 48 NER: 48
## DOWNTOWN CORE: 12 DT : 12 NORTH REGION : 41 NR : 41
## TOA PAYOH : 12 TP : 12 WEST REGION : 70 WR : 70
## HOUGANG : 10 HG : 10
## (Other) :245 (Other):245
## INC_CRC FMEL_UPD_D X_ADDR Y_ADDR
## 00F5E30B5C9B7AD8: 1 Min. :2014-12-05 Min. : 5093 Min. :19579
## 013B509B8EDF15BE: 1 1st Qu.:2014-12-05 1st Qu.:21864 1st Qu.:31776
## 01A4287FB060A0A6: 1 Median :2014-12-05 Median :28465 Median :35113
## 029BD940F4455194: 1 Mean :2014-12-05 Mean :27257 Mean :36106
## 0524461C92F35D94: 1 3rd Qu.:2014-12-05 3rd Qu.:31674 3rd Qu.:39869
## 05FD555397CBEE7A: 1 Max. :2014-12-05 Max. :50425 Max. :49553
## (Other) :317
## SHAPE_Leng SHAPE_Area Business Industry
## Min. : 871.5 Min. : 39438 Min. : 0.00 Min. :0.0000
## 1st Qu.: 3709.6 1st Qu.: 628261 1st Qu.: 0.00 1st Qu.:0.0000
## Median : 5211.9 Median : 1229894 Median : 2.00 Median :0.0000
## Mean : 6524.4 Mean : 2420882 Mean : 19.94 Mean :0.3406
## 3rd Qu.: 6942.6 3rd Qu.: 2106483 3rd Qu.: 14.00 3rd Qu.:0.0000
## Max. :68083.9 Max. :69748299 Max. :308.00 Max. :8.0000
##
## Financial Govt_Embassy Private_residential Shopping
## Min. : 0.00 Min. : 0.000 Min. : 0.00 Min. : 0.000
## 1st Qu.: 1.00 1st Qu.: 0.000 1st Qu.: 0.00 1st Qu.: 0.000
## Median : 5.00 Median : 0.000 Median : 4.00 Median : 0.000
## Mean : 10.28 Mean : 1.372 Mean : 11.16 Mean : 1.582
## 3rd Qu.: 13.00 3rd Qu.: 1.000 3rd Qu.: 11.00 3rd Qu.: 1.000
## Max. :134.00 Max. :19.000 Max. :217.00 Max. :31.000
##
## geometry
## MULTIPOLYGON :318
## POLYGON : 5
## epsg:NA : 0
## +proj=tmer...: 0
##
##
##
No NA values remain.
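`replace(mpsz_UF, is.na(mpsz_UF), 0)` applies to every column, so a missing value in a non-count field (an identifier, say) would also silently become 0. A sketch that restricts the replacement to named count columns is shown below; the helper `fill_missing_counts()` and the toy table are hypothetical. With the real data it would be called as `fill_missing_counts(mpsz_UF, c("Business", "Industry", "Financial", "Govt_Embassy", "Private_residential", "Shopping"))`.

```r
# Hypothetical helper: replace NA with 0 only in the named count columns,
# leaving identifiers and other attributes untouched
fill_missing_counts <- function(df, cols) {
  for (col in cols) {
    df[[col]][is.na(df[[col]])] <- 0
  }
  df
}

toy <- data.frame(SUBZONE_N = c("A", "B", NA),
                  Business = c(3, NA, 1),
                  Shopping = c(NA, 2, NA),
                  stringsAsFactors = FALSE)
fill_missing_counts(toy, c("Business", "Shopping"))
```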
We do not derive new variables (such as densities) from the urban function counts. Regardless of subzone size, the Central Region tends to have more businesses, industrial functions tend to be located away from housing areas, and shopping centres and government buildings such as community centres tend to be located near housing or business areas.
mpsz_All <- left_join(mpsz_UF,Indicators_derived)
## Joining, by = "SUBZONE_N"
## Warning: Column `SUBZONE_N` joining factor and character vector, coercing into
## character vector
summary(mpsz_All)
## OBJECTID SUBZONE_NO SUBZONE_N SUBZONE_C CA_IND
## Min. : 1.0 Min. : 1.000 Length:323 AMSZ01 : 1 N:274
## 1st Qu.: 81.5 1st Qu.: 2.000 Class :character AMSZ02 : 1 Y: 49
## Median :162.0 Median : 4.000 Mode :character AMSZ03 : 1
## Mean :162.0 Mean : 4.625 AMSZ04 : 1
## 3rd Qu.:242.5 3rd Qu.: 6.500 AMSZ05 : 1
## Max. :323.0 Max. :17.000 AMSZ06 : 1
## (Other):317
## PLN_AREA_N PLN_AREA_C REGION_N REGION_C
## BUKIT MERAH : 17 BM : 17 CENTRAL REGION :134 CR :134
## QUEENSTOWN : 15 QT : 15 EAST REGION : 30 ER : 30
## ANG MO KIO : 12 AM : 12 NORTH-EAST REGION: 48 NER: 48
## DOWNTOWN CORE: 12 DT : 12 NORTH REGION : 41 NR : 41
## TOA PAYOH : 12 TP : 12 WEST REGION : 70 WR : 70
## HOUGANG : 10 HG : 10
## (Other) :245 (Other):245
## INC_CRC FMEL_UPD_D X_ADDR Y_ADDR
## 00F5E30B5C9B7AD8: 1 Min. :2014-12-05 Min. : 5093 Min. :19579
## 013B509B8EDF15BE: 1 1st Qu.:2014-12-05 1st Qu.:21864 1st Qu.:31776
## 01A4287FB060A0A6: 1 Median :2014-12-05 Median :28465 Median :35113
## 029BD940F4455194: 1 Mean :2014-12-05 Mean :27257 Mean :36106
## 0524461C92F35D94: 1 3rd Qu.:2014-12-05 3rd Qu.:31674 3rd Qu.:39869
## 05FD555397CBEE7A: 1 Max. :2014-12-05 Max. :50425 Max. :49553
## (Other) :317
## SHAPE_Leng SHAPE_Area Business Industry
## Min. : 871.5 Min. : 39438 Min. : 0.00 Min. :0.0000
## 1st Qu.: 3709.6 1st Qu.: 628261 1st Qu.: 0.00 1st Qu.:0.0000
## Median : 5211.9 Median : 1229894 Median : 2.00 Median :0.0000
## Mean : 6524.4 Mean : 2420882 Mean : 19.94 Mean :0.3406
## 3rd Qu.: 6942.6 3rd Qu.: 2106483 3rd Qu.: 14.00 3rd Qu.:0.0000
## Max. :68083.9 Max. :69748299 Max. :308.00 Max. :8.0000
##
## Financial Govt_Embassy Private_residential Shopping
## Min. : 0.00 Min. : 0.000 Min. : 0.00 Min. : 0.000
## 1st Qu.: 1.00 1st Qu.: 0.000 1st Qu.: 0.00 1st Qu.: 0.000
## Median : 5.00 Median : 0.000 Median : 4.00 Median : 0.000
## Mean : 10.28 Mean : 1.372 Mean : 11.16 Mean : 1.582
## 3rd Qu.: 13.00 3rd Qu.: 1.000 3rd Qu.: 11.00 3rd Qu.: 1.000
## Max. :134.00 Max. :19.000 Max. :217.00 Max. :31.000
##
## E_Active Young Aged Pop
## Min. : 512.4 Min. : 0.0 Min. : 0.0 Min. : 10
## 1st Qu.: 573.0 1st Qu.:218.0 1st Qu.:106.3 1st Qu.: 3330
## Median : 592.8 Median :254.2 Median :151.0 Median : 11640
## Mean : 601.8 Mean :247.9 Mean :150.3 Mean : 17557
## 3rd Qu.: 607.9 3rd Qu.:287.6 3rd Qu.:192.7 3rd Qu.: 26505
## Max. :1000.0 Max. :360.0 Max. :325.9 Max. :132480
## NA's :95 NA's :95 NA's :95 NA's :95
## dens HDB_1_2 HDB_3_4 HDB_5_EC
## Min. :0.00001 Min. : 0.00 Min. : 0.0 Min. : 0.0
## 1st Qu.:0.00440 1st Qu.: 0.00 1st Qu.: 0.0 1st Qu.: 0.0
## Median :0.01222 Median : 0.00 Median :402.7 Median :144.0
## Mean :0.01510 Mean : 40.37 Mean :355.0 Mean :164.1
## 3rd Qu.:0.02474 3rd Qu.: 47.94 3rd Qu.:606.7 3rd Qu.:259.4
## Max. :0.04606 Max. :712.93 Max. :948.1 Max. :836.4
## NA's :95 NA's :95 NA's :95 NA's :95
## Condo_Apt LandedProperty geometry
## Min. : 0.00 Min. : 0.0 MULTIPOLYGON :318
## 1st Qu.: 26.75 1st Qu.: 0.0 POLYGON : 5
## Median : 145.71 Median : 0.0 epsg:NA : 0
## Mean : 307.46 Mean : 133.0 +proj=tmer...: 0
## 3rd Qu.: 491.72 3rd Qu.: 145.2
## Max. :1000.00 Max. :1000.0
## NA's :95 NA's :95
The geodemographic indicators are NA for the 95 subzones that have no matching records in Indicators_derived. We replace these NA values with 0 to represent that these subzones have no values for the respective indicators.
mpsz_All <- replace(mpsz_All, is.na(mpsz_All),0)
mpsz_All[rowSums(is.na(mpsz_All))!=0,]
## Simple feature collection with 0 features and 31 fields
## bbox: xmin: NA ymin: NA xmax: NA ymax: NA
## projected CRS: SVY21
## [1] OBJECTID SUBZONE_NO SUBZONE_N
## [4] SUBZONE_C CA_IND PLN_AREA_N
## [7] PLN_AREA_C REGION_N REGION_C
## [10] INC_CRC FMEL_UPD_D X_ADDR
## [13] Y_ADDR SHAPE_Leng SHAPE_Area
## [16] Business Industry Financial
## [19] Govt_Embassy Private_residential Shopping
## [22] E_Active Young Aged
## [25] Pop dens HDB_1_2
## [28] HDB_3_4 HDB_5_EC Condo_Apt
## [31] LandedProperty geometry
## <0 rows> (or 0-length row.names)
summary(mpsz_All)
## OBJECTID SUBZONE_NO SUBZONE_N SUBZONE_C CA_IND
## Min. : 1.0 Min. : 1.000 Length:323 AMSZ01 : 1 N:274
## 1st Qu.: 81.5 1st Qu.: 2.000 Class :character AMSZ02 : 1 Y: 49
## Median :162.0 Median : 4.000 Mode :character AMSZ03 : 1
## Mean :162.0 Mean : 4.625 AMSZ04 : 1
## 3rd Qu.:242.5 3rd Qu.: 6.500 AMSZ05 : 1
## Max. :323.0 Max. :17.000 AMSZ06 : 1
## (Other):317
## PLN_AREA_N PLN_AREA_C REGION_N REGION_C
## BUKIT MERAH : 17 BM : 17 CENTRAL REGION :134 CR :134
## QUEENSTOWN : 15 QT : 15 EAST REGION : 30 ER : 30
## ANG MO KIO : 12 AM : 12 NORTH-EAST REGION: 48 NER: 48
## DOWNTOWN CORE: 12 DT : 12 NORTH REGION : 41 NR : 41
## TOA PAYOH : 12 TP : 12 WEST REGION : 70 WR : 70
## HOUGANG : 10 HG : 10
## (Other) :245 (Other):245
## INC_CRC FMEL_UPD_D X_ADDR Y_ADDR
## 00F5E30B5C9B7AD8: 1 Min. :2014-12-05 Min. : 5093 Min. :19579
## 013B509B8EDF15BE: 1 1st Qu.:2014-12-05 1st Qu.:21864 1st Qu.:31776
## 01A4287FB060A0A6: 1 Median :2014-12-05 Median :28465 Median :35113
## 029BD940F4455194: 1 Mean :2014-12-05 Mean :27257 Mean :36106
## 0524461C92F35D94: 1 3rd Qu.:2014-12-05 3rd Qu.:31674 3rd Qu.:39869
## 05FD555397CBEE7A: 1 Max. :2014-12-05 Max. :50425 Max. :49553
## (Other) :317
## SHAPE_Leng SHAPE_Area Business Industry
## Min. : 871.5 Min. : 39438 Min. : 0.00 Min. :0.0000
## 1st Qu.: 3709.6 1st Qu.: 628261 1st Qu.: 0.00 1st Qu.:0.0000
## Median : 5211.9 Median : 1229894 Median : 2.00 Median :0.0000
## Mean : 6524.4 Mean : 2420882 Mean : 19.94 Mean :0.3406
## 3rd Qu.: 6942.6 3rd Qu.: 2106483 3rd Qu.: 14.00 3rd Qu.:0.0000
## Max. :68083.9 Max. :69748299 Max. :308.00 Max. :8.0000
##
## Financial Govt_Embassy Private_residential Shopping
## Min. : 0.00 Min. : 0.000 Min. : 0.00 Min. : 0.000
## 1st Qu.: 1.00 1st Qu.: 0.000 1st Qu.: 0.00 1st Qu.: 0.000
## Median : 5.00 Median : 0.000 Median : 4.00 Median : 0.000
## Mean : 10.28 Mean : 1.372 Mean : 11.16 Mean : 1.582
## 3rd Qu.: 13.00 3rd Qu.: 1.000 3rd Qu.: 11.00 3rd Qu.: 1.000
## Max. :134.00 Max. :19.000 Max. :217.00 Max. :31.000
##
## E_Active Young Aged Pop
## Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0
## 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.: 0
## Median : 576.6 Median :223.7 Median :111.1 Median : 4880
## Mean : 424.8 Mean :175.0 Mean :106.1 Mean : 12393
## 3rd Qu.: 600.6 3rd Qu.:271.9 3rd Qu.:177.3 3rd Qu.: 17035
## Max. :1000.0 Max. :360.0 Max. :325.9 Max. :132480
##
## dens HDB_1_2 HDB_3_4 HDB_5_EC
## Min. :0.000000 Min. : 0.00 Min. : 0.0 Min. : 0.0
## 1st Qu.:0.000000 1st Qu.: 0.00 1st Qu.: 0.0 1st Qu.: 0.0
## Median :0.005857 Median : 0.00 Median : 0.0 Median : 0.0
## Mean :0.010662 Mean : 28.50 Mean :250.6 Mean :115.9
## 3rd Qu.:0.019864 3rd Qu.: 28.99 3rd Qu.:504.3 3rd Qu.:207.2
## Max. :0.046058 Max. :712.93 Max. :948.1 Max. :836.4
##
## Condo_Apt LandedProperty geometry
## Min. : 0.00 Min. : 0.00 MULTIPOLYGON :318
## 1st Qu.: 0.00 1st Qu.: 0.00 POLYGON : 5
## Median : 45.93 Median : 0.00 epsg:NA : 0
## Mean : 217.03 Mean : 93.90 +proj=tmer...: 0
## 3rd Qu.: 300.34 3rd Qu.: 38.87
## Max. :1000.00 Max. :1000.00
##
We use histograms to inspect the overall distribution of each variable.
E_ActiveHist <- ggplot(data=mpsz_All,
aes(x= `E_Active`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
YoungHist <- ggplot(data=mpsz_All,
aes(x= `Young`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
AgedHist <- ggplot(data=mpsz_All,
aes(x= `Aged`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
densHist <- ggplot(data=mpsz_All,
aes(x= `dens`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
HDB_1_2Hist <- ggplot(data=mpsz_All,
aes(x= `HDB_1_2`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
HDB_3_4Hist <- ggplot(data=mpsz_All,
aes(x= `HDB_3_4`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
HDB_5_ECHist <- ggplot(data=mpsz_All,
aes(x= `HDB_5_EC`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
Condo_AptHist <- ggplot(data=mpsz_All,
aes(x= `Condo_Apt`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
LandedPropertyHist <- ggplot(data=mpsz_All,
aes(x= `LandedProperty`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
BusinessHist <- ggplot(data=mpsz_All,
aes(x= `Business`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
IndustryHist <- ggplot(data=mpsz_All,
aes(x= `Industry`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
FinancialHist <- ggplot(data=mpsz_All,
aes(x= `Financial`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
Govt_EmbassyHist <- ggplot(data=mpsz_All,
aes(x= `Govt_Embassy`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
Private_residentialHist <- ggplot(data=mpsz_All,
aes(x= `Private_residential`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
ShoppingHist <- ggplot(data=mpsz_All,
aes(x= `Shopping`)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
ggarrange(E_ActiveHist, YoungHist, AgedHist, densHist, HDB_1_2Hist, HDB_3_4Hist, HDB_5_ECHist, Condo_AptHist, LandedPropertyHist, BusinessHist, IndustryHist, FinancialHist, Govt_EmbassyHist, Private_residentialHist, ShoppingHist,
ncol = 3,
nrow = 5)
Most of the variables are clearly not normally distributed; many are right-skewed, with a large share of zeros. We take this into account when choosing a standardisation method.
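The visual impression from the histograms can be backed by a formal test. The sketch below applies the Shapiro-Wilk test (`shapiro.test()` from base R's stats package) to every column of a plain data frame; the helper `normality_check()` and the toy data are hypothetical. With the real data, the variables would first be extracted with `st_set_geometry(NULL)` and `select()`. `shapiro.test()` needs between 3 and 5000 values and fails on a constant column, and a large p-value only means normality is not rejected.

```r
# Hypothetical helper: Shapiro-Wilk p-value for each numeric column
normality_check <- function(df, alpha = 0.05) {
  p <- vapply(df, function(x) shapiro.test(x)$p.value, numeric(1))
  data.frame(variable = names(p), p_value = p,
             reject_normality = p < alpha, row.names = NULL)
}

# Toy data: one symmetric column and one strongly right-skewed column
set.seed(42)
toy <- data.frame(symmetric = rnorm(200), skewed = rexp(200))
normality_check(toy)
```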
Before clustering, we perform a correlation analysis, since highly correlated input variables carry overlapping information and should not both be used.
We use the corrplot.mixed() function of the corrplot package to visualise the correlations between the input variables. We consider variable pairs with a correlation coefficient magnitude of 0.7 or above to be highly correlated, and avoid using both variables of such a pair.
We drop Pop so that it is not included in the correlation analysis, and convert the result to a data frame before passing column subsets to cor().
mpsz_All_NoPop <- select(mpsz_All, -c("Pop"))
mpsz_All_NoPop_df <- as.data.frame(mpsz_All_NoPop)
cluster_vars.cor = cor(mpsz_All_NoPop_df[,16:21])
corrplot.mixed(cluster_vars.cor,
lower = "ellipse",
upper = "number",
tl.pos = "lt",
diag = "l",
tl.col = "black")
Among the urban functions, Financial and Shopping have a correlation coefficient magnitude of 0.72. We keep Financial and drop Shopping from the clustering analysis.
cluster_vars.cor = cor(mpsz_All_NoPop_df[,22:30])
corrplot.mixed(cluster_vars.cor,
lower = "ellipse",
upper = "number",
tl.pos = "lt",
diag = "l",
tl.col = "black")
Among the geodemographic indicators, E_Active is highly correlated with Young (0.87) and Aged (0.72), and dens is highly correlated with HDB_3_4 (0.74) and HDB_5_EC (0.72). We therefore keep Young and Aged instead of E_Active, and HDB_3_4 and HDB_5_EC instead of dens, for the clustering analysis.
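Reading the highly correlated pairs off the plots works for a handful of variables; the same check can also be done programmatically. The sketch below (the helper `high_cor_pairs()` and the toy matrix are hypothetical) lists every pair in the upper triangle of a correlation matrix whose magnitude reaches the 0.7 threshold. With the real data it would be applied to e.g. `cor(mpsz_All_NoPop_df[, 22:30])`.

```r
# Hypothetical helper: list variable pairs with |r| >= threshold,
# using only the upper triangle so each pair appears once
high_cor_pairs <- function(cor_mat, threshold = 0.7) {
  idx <- which(abs(cor_mat) >= threshold & upper.tri(cor_mat), arr.ind = TRUE)
  data.frame(var1 = rownames(cor_mat)[idx[, "row"]],
             var2 = colnames(cor_mat)[idx[, "col"]],
             r = cor_mat[idx],
             row.names = NULL, stringsAsFactors = FALSE)
}

vars <- c("a", "b", "c")
m <- matrix(c(1,    0.87,  0.2,
              0.87, 1,    -0.75,
              0.2, -0.75,  1),
            nrow = 3, byrow = TRUE, dimnames = list(vars, vars))
high_cor_pairs(m)  # pairs a-b (0.87) and b-c (-0.75)
```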
Next, we perform hierarchical cluster analysis.
To begin, we extract the selected clustering variables from mpsz_All, dropping the geometry.
cluster_vars <- mpsz_All %>%
st_set_geometry(NULL) %>%
select("SUBZONE_N", "Young", "Aged", "HDB_1_2", "HDB_3_4", "HDB_5_EC", "Condo_Apt", "LandedProperty", "Business", "Industry", "Financial", "Govt_Embassy", "Private_residential")
head(cluster_vars,10)
## SUBZONE_N Young Aged HDB_1_2 HDB_3_4 HDB_5_EC Condo_Apt
## 1 MARINA SOUTH 0.0000 0.0000 0.0000 0.0000 0.00000 0.00000
## 2 PEARL'S HILL 167.1924 315.4574 712.9338 220.8202 0.00000 66.24606
## 3 BOAT QUAY 0.0000 200.0000 0.0000 0.0000 0.00000 1000.00000
## 4 HENDERSON HILL 195.8146 243.6472 293.7220 597.1599 94.91779 14.20030
## 5 REDHILL 266.4165 146.3415 184.8030 330.2064 303.93996 181.05066
## 6 ALEXANDRA HILL 215.7303 233.7079 292.8839 473.4082 233.70787 0.00000
## 7 BUKIT HO SWEE 193.2203 243.3898 250.8475 583.7288 128.81356 36.61017
## 8 CLARKE QUAY 0.0000 0.0000 0.0000 0.0000 0.00000 1000.00000
## 9 PASIR PANJANG 1 252.2727 122.7273 0.0000 0.0000 0.00000 675.00000
## 10 QUEENSWAY 103.4483 172.4138 0.0000 0.0000 0.00000 1000.00000
## LandedProperty Business Industry Financial Govt_Embassy Private_residential
## 1 0 0 0 3 0 0
## 2 0 6 0 25 1 6
## 3 0 40 0 2 2 1
## 4 0 0 0 4 0 5
## 5 0 2 0 12 0 6
## 6 0 39 1 15 7 4
## 7 0 6 0 6 4 11
## 8 0 12 0 19 4 6
## 9 325 16 0 4 0 56
## 10 0 0 0 2 0 1
Next, we replace the row numbers with the subzone names.
row.names(cluster_vars) <- cluster_vars$"SUBZONE_N"
head(cluster_vars,10)
## SUBZONE_N Young Aged HDB_1_2 HDB_3_4 HDB_5_EC
## MARINA SOUTH MARINA SOUTH 0.0000 0.0000 0.0000 0.0000 0.00000
## PEARL'S HILL PEARL'S HILL 167.1924 315.4574 712.9338 220.8202 0.00000
## BOAT QUAY BOAT QUAY 0.0000 200.0000 0.0000 0.0000 0.00000
## HENDERSON HILL HENDERSON HILL 195.8146 243.6472 293.7220 597.1599 94.91779
## REDHILL REDHILL 266.4165 146.3415 184.8030 330.2064 303.93996
## ALEXANDRA HILL ALEXANDRA HILL 215.7303 233.7079 292.8839 473.4082 233.70787
## BUKIT HO SWEE BUKIT HO SWEE 193.2203 243.3898 250.8475 583.7288 128.81356
## CLARKE QUAY CLARKE QUAY 0.0000 0.0000 0.0000 0.0000 0.00000
## PASIR PANJANG 1 PASIR PANJANG 1 252.2727 122.7273 0.0000 0.0000 0.00000
## QUEENSWAY QUEENSWAY 103.4483 172.4138 0.0000 0.0000 0.00000
## Condo_Apt LandedProperty Business Industry Financial
## MARINA SOUTH 0.00000 0 0 0 3
## PEARL'S HILL 66.24606 0 6 0 25
## BOAT QUAY 1000.00000 0 40 0 2
## HENDERSON HILL 14.20030 0 0 0 4
## REDHILL 181.05066 0 2 0 12
## ALEXANDRA HILL 0.00000 0 39 1 15
## BUKIT HO SWEE 36.61017 0 6 0 6
## CLARKE QUAY 1000.00000 0 12 0 19
## PASIR PANJANG 1 675.00000 325 16 0 4
## QUEENSWAY 1000.00000 0 0 0 2
## Govt_Embassy Private_residential
## MARINA SOUTH 0 0
## PEARL'S HILL 1 6
## BOAT QUAY 2 1
## HENDERSON HILL 0 5
## REDHILL 0 6
## ALEXANDRA HILL 7 4
## BUKIT HO SWEE 4 11
## CLARKE QUAY 4 6
## PASIR PANJANG 1 0 56
## QUEENSWAY 0 1
The row names are now the subzone names, so we drop the SUBZONE_N column.
mpsz_All_cVar <- select(cluster_vars, c(2:13))
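Since most of our variables are not normally distributed, we choose min-max standardisation over Z-score standardisation, which is better suited to approximately normally distributed variables.
mpsz_All_cVar.std <- normalize(mpsz_All_cVar) # heatmaply::normalize() rescales each column to the range [0, 1]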
summary(mpsz_All_cVar.std)
## Young Aged HDB_1_2 HDB_3_4
## Min. :0.0000 Min. :0.0000 Min. :0.00000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.0000
## Median :0.6215 Median :0.3409 Median :0.00000 Median :0.0000
## Mean :0.4861 Mean :0.3256 Mean :0.03997 Mean :0.2643
## 3rd Qu.:0.7553 3rd Qu.:0.5439 3rd Qu.:0.04067 3rd Qu.:0.5319
## Max. :1.0000 Max. :1.0000 Max. :1.00000 Max. :1.0000
## HDB_5_EC Condo_Apt LandedProperty Business
## Min. :0.0000 Min. :0.00000 Min. :0.00000 Min. :0.000000
## 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.000000
## Median :0.0000 Median :0.04593 Median :0.00000 Median :0.006494
## Mean :0.1385 Mean :0.21703 Mean :0.09390 Mean :0.064734
## 3rd Qu.:0.2477 3rd Qu.:0.30034 3rd Qu.:0.03887 3rd Qu.:0.045455
## Max. :1.0000 Max. :1.00000 Max. :1.00000 Max. :1.000000
## Industry Financial Govt_Embassy Private_residential
## Min. :0.00000 Min. :0.000000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.007463 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.037313 Median :0.00000 Median :0.01843
## Mean :0.04257 Mean :0.076706 Mean :0.07219 Mean :0.05142
## 3rd Qu.:0.00000 3rd Qu.:0.097015 3rd Qu.:0.05263 3rd Qu.:0.05069
## Max. :1.00000 Max. :1.000000 Max. :1.00000 Max. :1.00000
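The min-max rescaling that `normalize()` performs can be written out by hand as x' = (x - min(x)) / (max(x) - min(x)), applied column by column. The sketch below is an illustration of that formula under the assumption that this is what `normalize()` computes, not heatmaply's own source; the helper names `min_max()` and `rescale_columns()` are hypothetical. Using `df[] <- lapply(...)` keeps the row names, which here hold the subzone names.

```r
# Min-max rescaling of one numeric vector to [0, 1]:
#   x' = (x - min(x)) / (max(x) - min(x))
# A constant vector gives NaN because the denominator is 0.
min_max <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Apply column by column; df[] <- keeps row names and the data frame class
rescale_columns <- function(df) {
  df[] <- lapply(df, min_max)
  df
}

toy <- data.frame(Business = c(0, 154, 308), Shopping = c(2, 31, 0),
                  row.names = c("A", "B", "C"))
rescale_columns(toy)
```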
We plot histograms again to check the distributions of the standardised variables.
# One histogram per standardised variable, all with the same styling
vars <- c("Young", "Aged", "HDB_1_2", "HDB_3_4", "HDB_5_EC", "Condo_Apt",
"LandedProperty", "Business", "Industry", "Financial",
"Govt_Embassy", "Private_residential")
hists <- lapply(vars, function(v) {
ggplot(data = mpsz_All_cVar.std, aes(x = .data[[v]])) +
geom_histogram(bins = 20,
color = "black",
fill = "light blue") +
labs(x = v)
})
# Arrange the 12 histograms in a 3 x 4 grid
ggarrange(plotlist = hists,
ncol = 3,
nrow = 4)
All values now lie between 0 and 1, but the overall shapes of the distributions are unchanged. This is expected, because the standardisation only shifts and rescales each variable.
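The summary above (every minimum is 0 and every maximum is 1) suggests the variables were min-max scaled. Assuming that is the case, the short sketch below shows why the histograms keep their shape: min-max scaling is a linear map, so it moves and stretches the axis but preserves the order and relative spacing of the values. The helper minmax() is our own illustration, not necessarily the function used earlier.

```r
# Min-max scaling: a linear map from [min(x), max(x)] onto [0, 1].
# minmax() is an illustrative helper, assumed to mirror the standardisation above.
minmax <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

x <- c(3, 7, 1, 20, 5, 5)
s <- minmax(x)
range(s)   # 0 and 1
cor(x, s)  # 1 (up to rounding): a perfect linear relationship, so the shape is kept
```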
proxmat <- dist(mpsz_All_cVar.std, method = 'euclidean')
m <- c( "average", "single", "complete", "ward")
names(m) <- c( "average", "single", "complete", "ward")
ac <- function(x) {
agnes(mpsz_All_cVar.std, method = x)$ac
}
map_dbl(m, ac)
## average single complete ward
## 0.8903011 0.8432187 0.9187196 0.9840449
Ward's method has the highest agglomerative coefficient (0.984), indicating the strongest clustering structure of the four methods tested. We therefore use Ward's method for the rest of the analysis.
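The cluster package documentation defines the agglomerative coefficient as follows: for each observation i, m(i) is its dissimilarity to the first cluster it is merged with, divided by the dissimilarity of the merger in the final step, and the coefficient is the mean of 1 - m(i). The sketch below computes that quantity from an hclust() tree. ac_from_hclust() is a hypothetical helper, and we assume the merge heights of hclust() match those of agnes() for the linkage used.

```r
# Hypothetical helper: agglomerative coefficient computed from an hclust tree.
ac_from_hclust <- function(hc) {
  n <- nrow(hc$merge) + 1
  h_final <- tail(hc$height, 1)       # dissimilarity of the final merger
  first_h <- numeric(n)               # height at which each observation first merges
  for (step in seq_len(nrow(hc$merge))) {
    for (j in 1:2) {
      id <- hc$merge[step, j]
      if (id < 0) first_h[-id] <- hc$height[step]  # negative ids are singletons
    }
  }
  mean(1 - first_h / h_final)
}

# Toy example: points 0, 1 and 10 with single linkage merge at heights 1 and 9,
# so the coefficient is mean(c(8/9, 8/9, 0)) = 16/27
hc <- hclust(dist(c(0, 1, 10)), method = "single")
ac <- ac_from_hclust(hc)
```

The coefficient tends to grow with the number of observations, so it is best used, as above, to compare linkage methods on the same data rather than across data sets of different sizes.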
fviz_nbclust(mpsz_All_cVar.std, FUN = hcut, method = "wss")+
geom_vline(xintercept = 4, linetype = 2)+
labs(subtitle = "Elbow Method")
The elbow plot bends at k = 4. To check whether 4 is the optimal number of clusters, we also try the silhouette method.
fviz_nbclust(mpsz_All_cVar.std, FUN=hcut, method="silhouette")+
labs(subtitle = "Silhouette Method")
The silhouette method also indicates 4 as the optimal number of clusters, so we adopt k = 4 from here on.
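As a rough illustration of what the silhouette method measures, the sketch below computes the average silhouette width for several values of k from a ward.D2 tree using base R only. For each observation, a is its mean distance to the other members of its own cluster, b is its smallest mean distance to any other cluster, and its silhouette is (b - a) / max(a, b), with singleton clusters given 0, which we believe matches cluster::silhouette(). The helper names are our own, and this is a sketch rather than the exact code behind fviz_nbclust().

```r
# Average silhouette width of a clustering `cl`, given a dist object `d`.
avg_silhouette <- function(d, cl) {
  D <- as.matrix(d)
  s <- sapply(seq_along(cl), function(i) {
    same <- cl == cl[i]
    if (sum(same) == 1) return(0)               # singleton convention
    a <- sum(D[i, same]) / (sum(same) - 1)      # D[i, i] = 0 is excluded by the count
    b <- min(sapply(setdiff(unique(cl), cl[i]),
                    function(k) mean(D[i, cl == k])))
    (b - a) / max(a, b)
  })
  mean(s)
}

# Average silhouette width for each k after cutting one Ward tree
silhouette_by_k <- function(x, ks = 2:6, method = "ward.D2") {
  d <- dist(x)
  hc <- hclust(d, method = method)
  widths <- sapply(ks, function(k) avg_silhouette(d, cutree(hc, k)))
  names(widths) <- ks
  widths
}

# Toy example: three well-separated groups of points on a line
x <- matrix(c(0, 0.1, 0.2, 10, 10.1, 10.2, 20, 20.1, 20.2))
w <- silhouette_by_k(x)
names(which.max(w))  # "3"
```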
# ward.D2 implements Ward's criterion; agnes(method = "ward") corresponds to it
hclust_ward <- hclust(proxmat, method = 'ward.D2')
plot(hclust_ward, cex = 0.6)
rect.hclust(hclust_ward, k = 4, border = 2:5)
mpsz_All_cVar.std.mat <- data.matrix(mpsz_All_cVar.std)
heatmaply(mpsz_All_cVar.std.mat,
Colv=NA,
dist_method = "euclidean",
hclust_method = "ward.D2",
seriate = "OLO",
colors = Blues,
k_row = 4,
margins = c(NA,200,60,NA),
fontsize_row = 4,
fontsize_col = 5,
main="Geographic Segmentation of Singapore by Indicators & Urban Functions",
xlab = "Indicators & Urban Functions",
ylab = "Singapore Subzones"
)
groups <- as.factor(cutree(hclust_ward, k=4))
SZ_cluster <- cbind(mpsz_All, as.matrix(groups)) %>%
rename(`CLUSTER`=`as.matrix.groups.`)
qtm(SZ_cluster, "CLUSTER")
The resulting clusters are highly fragmented. This reflects a limitation of hierarchical cluster analysis: it is a non-spatial clustering algorithm, so subzones in the same cluster need not be next to each other.
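One way to put a number on this fragmentation is to count, for each cluster, how many spatially contiguous pieces it breaks into. The sketch below does this with a breadth-first search over a neighbour list in the spdep convention (one integer vector per region, with 0L marking a region that has no neighbours). count_fragments() is a hypothetical helper; applied to a neighbour list such as mpsz_All_sp.nb (computed below) and the groups labels, a perfectly contiguous clustering would give 1 for every cluster.

```r
# Hypothetical helper: number of contiguous pieces per cluster label.
count_fragments <- function(nb, labels) {
  n <- length(labels)
  seen <- rep(FALSE, n)
  pieces <- c()                      # one entry (the label) per contiguous piece
  for (start in seq_len(n)) {
    if (seen[start]) next
    # Breadth-first search restricted to regions with the same label
    queue <- start
    seen[start] <- TRUE
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      for (w in nb[[v]]) {
        if (w > 0 && !seen[w] && labels[w] == labels[start]) {
          seen[w] <- TRUE
          queue <- c(queue, w)
        }
      }
    }
    pieces <- c(pieces, labels[start])
  }
  ks <- sort(unique(pieces))
  setNames(vapply(ks, function(k) sum(pieces == k), integer(1)), ks)
}

# Toy example: five regions in a row, 1-2-3-4-5
nb <- list(2L, c(1L, 3L), c(2L, 4L), c(3L, 5L), 4L)
count_fragments(nb, c(1, 2, 1, 1, 2))  # each cluster falls into 2 pieces
```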
The SKATER workflow below uses sp objects (for example, coordinates() for the node positions), so we first convert mpsz_All into a SpatialPolygonsDataFrame.
mpsz_All_sp <- as_Spatial(mpsz_All)
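Next, we compute the neighbour list. By default, poly2nb() uses queen contiguity, so subzones that share even a single boundary point count as neighbours.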
mpsz_All_sp.nb <- poly2nb(mpsz_All_sp)
summary(mpsz_All_sp.nb)
## Neighbour list object:
## Number of regions: 323
## Number of nonzero links: 1934
## Percentage nonzero weights: 1.853751
## Average number of links: 5.987616
## 5 regions with no links:
## 17 18 19 295 302
## Link number distribution:
##
## 0 1 2 3 4 5 6 7 8 9 10 11 12 14 17
## 5 2 6 10 26 77 87 51 34 16 3 3 1 1 1
## 2 least connected regions:
## 16 234 with 1 link
## 1 most connected region:
## 313 with 17 links
plot(mpsz_All_sp, border=grey(.5))
plot(mpsz_All_sp.nb, coordinates(mpsz_All_sp), col = "red", add = TRUE)
The summary lists five subzones with no neighbours (17, 18, 19, 295 and 302). Edge costs cannot be calculated for subzones without links, and such subzones cannot be joined into a minimum spanning tree, so we remove them.
mpsz_All_sp.nb.NZ <- subset(mpsz_All_sp.nb, subset = card(mpsz_All_sp.nb) > 0)
summary(mpsz_All_sp.nb.NZ)
## Neighbour list object:
## Number of regions: 318
## Number of nonzero links: 1934
## Percentage nonzero weights: 1.912503
## Average number of links: 6.081761
## Link number distribution:
##
## 1 2 3 4 5 6 7 8 9 10 11 12 14 17
## 2 6 10 26 77 87 51 34 16 3 3 1 1 1
## 2 least connected regions:
## 16 234 with 1 link
## 1 most connected region:
## 313 with 17 links
There are no longer any subzones with zero neighbours.
However, since we removed these subzones, we have to keep track of them and add them back later; otherwise the cluster labels will not line up with the polygons when we plot our choropleth map.
From the summary of mpsz_All_sp.nb, the zero-neighbour subzones are at indices 17, 18, 19, 295 and 302.
SZ_add <- function(df, num, index) {
df[seq(index+1,nrow(df)+1),] <- df[seq(index,nrow(df)),]
df[index,] <- num
df
}
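This function inserts the value num as a new row at position index of df, shifting the existing rows from index onwards down by one.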
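SZ_add() has to be called once per removed subzone, in ascending order of index, for each row to land back in its original position. An alternative sketch writes all the labels back in a single step, assuming a logical vector that is TRUE for the subzones that kept their neighbours. expand_groups() is a hypothetical helper.

```r
# Hypothetical helper: put cluster labels computed on the kept subzones back
# into a vector covering all subzones, using `fill` for the removed ones.
expand_groups <- function(groups, keep, fill = 0) {
  stopifnot(is.logical(keep), sum(keep) == length(groups))
  full <- rep(fill, length(keep))
  full[keep] <- groups
  full
}

# Toy example: subzones 2 and 5 had no neighbours and were dropped
keep <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
expand_groups(c(1, 2, 3), keep)  # 1 0 2 3 0
```

In this workflow the call would be expand_groups(clusters$groups, card(mpsz_All_sp.nb) > 0), and which(card(mpsz_All_sp.nb) == 0) returns the removed indices directly instead of reading them off the summary.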
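# nbcosts() pairs row i of the data with region i of the neighbour list, so the
# data must be subset to the same 318 subzones kept in mpsz_All_sp.nb.NZ
lcosts <- nbcosts(mpsz_All_sp.nb.NZ, mpsz_All_cVar.std[card(mpsz_All_sp.nb) > 0, ])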
SZ.w <- nb2listw(mpsz_All_sp.nb.NZ, lcosts, style = "B")
summary(SZ.w)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 318
## Number of nonzero links: 1934
## Percentage nonzero weights: 1.912503
## Average number of links: 6.081761
## Link number distribution:
##
## 1 2 3 4 5 6 7 8 9 10 11 12 14 17
## 2 6 10 26 77 87 51 34 16 3 3 1 1 1
## 2 least connected regions:
## 16 234 with 1 link
## 1 most connected region:
## 313 with 17 links
##
## Weights style: B
## Weights constants summary:
## n nn S0 S1 S2
## B 318 101124 1831.917 4129.611 47721.25
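nbcosts() turns every neighbour link into an edge cost. With its default Euclidean method, the cost of the link between subzones i and j is, as we understand it, the Euclidean distance between their rows in the attribute data, and nb2listw(..., style = "B") then keeps these costs as the weights. The base R sketch below (hypothetical helper edge_costs()) reproduces that idea, and it also shows why the data rows must line up one-to-one with the regions of the neighbour list.

```r
# Sketch of the assumed nbcosts() computation: for each region i, the Euclidean
# distance between row i of X and the row of each of its neighbours.
edge_costs <- function(nb, X) {
  X <- as.matrix(X)
  lapply(seq_along(nb), function(i) {
    js <- nb[[i]]
    js <- js[js > 0]                 # 0L marks "no neighbours" in spdep
    vapply(js, function(j) sqrt(sum((X[i, ] - X[j, ])^2)), numeric(1))
  })
}

# Toy example: three regions in a row with two attributes each
X <- rbind(c(0, 0), c(3, 4), c(6, 8))
nb <- list(2L, c(1L, 3L), 2L)
costs <- edge_costs(nb, X)  # every link costs 5
```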
SZ.mst <- mstree(SZ.w)
class(SZ.mst)
## [1] "mst" "matrix"
dim(SZ.mst)
## [1] 317 3
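mstree() returns 317 edges because a spanning tree on n = 318 nodes always has n - 1 edges. The sketch below is a plain Prim's algorithm on a dense weight matrix, to show how such a tree is grown: start from one node, then repeatedly add the cheapest edge joining the tree to a node outside it. prim_mst() is a hypothetical helper; we do not claim it matches mstree() line for line, and its cubic running time makes it suitable for illustration only. It also shows why isolated subzones had to be removed earlier: a node with no edges can never be reached.

```r
# Prim's algorithm on a symmetric weight matrix W (Inf = no edge).
# Returns a matrix of (from, to, cost) rows, one per tree edge.
prim_mst <- function(W) {
  n <- nrow(W)
  in_tree <- c(TRUE, rep(FALSE, n - 1))  # grow the tree from node 1
  edges <- matrix(numeric(0), nrow = 0, ncol = 3)
  while (!all(in_tree)) {
    best <- c(NA, NA, Inf)
    # Cheapest edge from a tree node to a node outside the tree
    for (i in which(in_tree)) for (j in which(!in_tree)) {
      if (W[i, j] < best[3]) best <- c(i, j, W[i, j])
    }
    if (is.infinite(best[3])) stop("graph is not connected")
    edges <- rbind(edges, best)
    in_tree[best[2]] <- TRUE
  }
  unname(edges)
}

# Toy example: 4 nodes, 5 weighted edges (from, to, cost)
W <- matrix(Inf, 4, 4)
links <- rbind(c(1, 2, 1), c(2, 3, 2), c(1, 3, 3), c(3, 4, 4), c(2, 4, 5))
for (r in seq_len(nrow(links))) {
  W[links[r, 1], links[r, 2]] <- links[r, 3]
  W[links[r, 2], links[r, 1]] <- links[r, 3]
}
mst <- prim_mst(W)  # 3 edges (n - 1) with total cost 1 + 2 + 4 = 7
```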
plot(mpsz_All_sp, border=gray(.5))
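# coordinates() returns all 323 subzones; keep only the 318 nodes in the tree
plot.mst(SZ.mst, coordinates(mpsz_All_sp)[card(mpsz_All_sp.nb) > 0, ], col = "red", cex.lab = 0.7, cex.circles = 0.005, add = TRUE)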
We first compute the clusters with the SKATER method. Setting ncuts to 3 removes three edges from the minimum spanning tree, which gives four clusters.
# Rows of the data must line up with the 318 nodes of the tree
clusters <- skater(SZ.mst[,1:2], mpsz_All_cVar.std[card(mpsz_All_sp.nb) > 0, ], 3)
Next, we plot the pruned tree showing the 4 clusters.
plot(mpsz_All_sp, border = gray(0.5))
# Use the coordinates of the 318 subzones that are nodes of the tree
plot(clusters, coordinates(mpsz_All_sp)[card(mpsz_All_sp.nb) > 0, ], cex.lab = 0.7,
groups.colors = c("red", "blue", "green", "brown", "pink"), cex.circles = 0.005, add = TRUE)
## Warning in segments(coords[id1, 1], coords[id1, 2], coords[id2, 1],
## coords[id2, : "add" is not a graphical parameter
## Warning in segments(coords[id1, 1], coords[id1, 2], coords[id2, 1],
## coords[id2, : "add" is not a graphical parameter
## Warning in segments(coords[id1, 1], coords[id1, 2], coords[id2, 1],
## coords[id2, : "add" is not a graphical parameter
## Warning in segments(coords[id1, 1], coords[id1, 2], coords[id2, 1],
## coords[id2, : "add" is not a graphical parameter
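skater() chooses which edges to cut by maximising the reduction in within-group dissimilarity; the sketch below covers only the last step, turning a tree with some edges removed into cluster labels. Removing ncuts edges from a tree leaves ncuts + 1 connected pieces, which is why ncuts = 3 yields 4 clusters. tree_components() is a hypothetical helper using simple min-label propagation.

```r
# Label the connected pieces of a tree after cutting the edges at rows `cut`.
# edges: two-column matrix of node ids; n: number of nodes.
tree_components <- function(edges, n, cut) {
  keep <- if (length(cut)) edges[-cut, , drop = FALSE] else edges
  label <- seq_len(n)
  repeat {
    changed <- FALSE
    # Give both ends of every remaining edge the smaller of their two labels
    for (r in seq_len(nrow(keep))) {
      a <- keep[r, 1]
      b <- keep[r, 2]
      m <- min(label[a], label[b])
      if (label[a] != m || label[b] != m) {
        label[a] <- m
        label[b] <- m
        changed <- TRUE
      }
    }
    if (!changed) break
  }
  as.integer(factor(label))  # renumber pieces as 1, 2, 3, ...
}

# Toy example: a path 1-2-3-4-5 with its 2nd and 4th edges cut
edges <- rbind(c(1, 2), c(2, 3), c(3, 4), c(4, 5))
tree_components(edges, n = 5, cut = c(2, 4))  # 1 1 2 2 3
```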
Before we can plot the clusters as a choropleth map, we need to add back the subzones that were removed because they had no neighbours.
We begin by converting the cluster labels into a matrix.
groups_mat <- as.matrix(clusters$groups)
Using SZ_add() from earlier, we insert the removed subzones back at their original positions and assign them to cluster 0.
df <- as.data.frame(groups_mat)
df <- SZ_add(df, 0 , 17)
df <- SZ_add(df, 0 , 18)
df <- SZ_add(df, 0 , 19)
df <- SZ_add(df, 0 , 295)
df <- SZ_add(df, 0 , 302)
groups_mat <- as.matrix(df)
SZ_spatialcluster <- cbind(SZ_cluster, as.factor(groups_mat)) %>%
rename(`SP_CLUSTER`=`as.factor.groups_mat.`)
qtm(SZ_spatialcluster, "SP_CLUSTER")
In this part of the analysis, we also refer back to the pruned tree plotted above.
Cluster0 <- SZ_spatialcluster %>%
filter(SP_CLUSTER == 0)
Cluster1 <- SZ_spatialcluster %>%
filter(SP_CLUSTER == 1)
Cluster2 <- SZ_spatialcluster %>%
filter(SP_CLUSTER == 2)
Cluster3 <- SZ_spatialcluster %>%
filter(SP_CLUSTER == 3)
Cluster4 <- SZ_spatialcluster %>%
filter(SP_CLUSTER == 4)
summary(Cluster0)
## OBJECTID SUBZONE_NO SUBZONE_N SUBZONE_C CA_IND
## Min. : 17.0 Min. :1.0 Length:5 NESZ01 :1 N:5
## 1st Qu.: 18.0 1st Qu.:2.0 Class :character SISZ02 :1 Y:0
## Median : 19.0 Median :2.0 Mode :character SMSZ04 :1
## Mean :130.2 Mean :2.4 WISZ02 :1
## 3rd Qu.:295.0 3rd Qu.:3.0 WISZ03 :1
## Max. :302.0 Max. :4.0 AMSZ01 :0
## (Other):0
## PLN_AREA_N PLN_AREA_C REGION_N REGION_C
## WESTERN ISLANDS :2 WI :2 CENTRAL REGION :1 CR :1
## NORTH-EASTERN ISLANDS:1 NE :1 EAST REGION :0 ER :0
## SIMPANG :1 SI :1 NORTH-EAST REGION:1 NER:1
## SOUTHERN ISLANDS :1 SM :1 NORTH REGION :1 NR :1
## ANG MO KIO :0 AM :0 WEST REGION :2 WR :2
## BEDOK :0 BD :0
## (Other) :0 (Other):0
## INC_CRC FMEL_UPD_D X_ADDR Y_ADDR
## 5809FC547293EA2D:1 Min. :2014-12-05 Min. :15932 Min. :19579
## 66E54DD5CE0C71A2:1 1st Qu.:2014-12-05 1st Qu.:21206 1st Qu.:20466
## 92BC3E09C68F3B52:1 Median :2014-12-05 Median :29815 Median :23413
## E69207D4F76DEEA3:1 Mean :2014-12-05 Mean :29778 Mean :30663
## F718C723E08FBD51:1 3rd Qu.:2014-12-05 3rd Qu.:31511 3rd Qu.:42613
## 00F5E30B5C9B7AD8:0 Max. :2014-12-05 Max. :50425 Max. :47245
## (Other) :0
## SHAPE_Leng SHAPE_Area Business Industry Financial
## Min. : 5466 Min. : 1611279 Min. :0 Min. :0 Min. :0
## 1st Qu.:18704 1st Qu.: 2206319 1st Qu.:0 1st Qu.:0 1st Qu.:0
## Median :24759 Median : 4207271 Median :0 Median :0 Median :0
## Mean :27398 Mean :16047844 Mean :0 Mean :0 Mean :0
## 3rd Qu.:25627 3rd Qu.: 4963787 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0
## Max. :62436 Max. :67250563 Max. :0 Max. :0 Max. :0
##
## Govt_Embassy Private_residential Shopping E_Active Young
## Min. :0 Min. :0 Min. :0 Min. :0 Min. :0
## 1st Qu.:0 1st Qu.:0 1st Qu.:0 1st Qu.:0 1st Qu.:0
## Median :0 Median :0 Median :0 Median :0 Median :0
## Mean :0 Mean :0 Mean :0 Mean :0 Mean :0
## 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0
## Max. :0 Max. :0 Max. :0 Max. :0 Max. :0
##
## Aged Pop dens HDB_1_2 HDB_3_4 HDB_5_EC
## Min. :0 Min. :0 Min. :0 Min. :0 Min. :0 Min. :0
## 1st Qu.:0 1st Qu.:0 1st Qu.:0 1st Qu.:0 1st Qu.:0 1st Qu.:0
## Median :0 Median :0 Median :0 Median :0 Median :0 Median :0
## Mean :0 Mean :0 Mean :0 Mean :0 Mean :0 Mean :0
## 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0
## Max. :0 Max. :0 Max. :0 Max. :0 Max. :0 Max. :0
##
## Condo_Apt LandedProperty CLUSTER SP_CLUSTER geometry
## Min. :0 Min. :0 1:5 0:5 MULTIPOLYGON :5
## 1st Qu.:0 1st Qu.:0 2:0 1:0 epsg:NA :0
## Median :0 Median :0 3:0 2:0 +proj=tmer...:0
## Mean :0 Mean :0 4:0 3:0
## 3rd Qu.:0 3rd Qu.:0 4:0
## Max. :0 Max. :0
##
sum(Cluster0$SHAPE_Area) / 1e6 # Convert m^2 to km^2
## [1] 80.23922
Looking first at cluster 0, we see that it consists mainly of the islands around Singapore, such as Pulau Ubin, Pulau Tekong and the southern islands. These are the subzones without neighbours that we removed earlier, and the summary shows that they have zero values for every urban function and indicator. Its total area of about 80.24 km^2 makes it the second largest of the five clusters.
summary(Cluster1)
## OBJECTID SUBZONE_NO SUBZONE_N SUBZONE_C CA_IND
## Min. : 9.0 Min. : 1.000 Length:200 AMSZ01 : 1 N:200
## 1st Qu.:138.8 1st Qu.: 2.000 Class :character AMSZ02 : 1 Y: 0
## Median :189.5 Median : 4.000 Mode :character AMSZ03 : 1
## Mean :188.0 Mean : 4.415 AMSZ04 : 1
## 3rd Qu.:241.2 3rd Qu.: 6.000 AMSZ05 : 1
## Max. :319.0 Max. :15.000 AMSZ06 : 1
## (Other):194
## PLN_AREA_N PLN_AREA_C REGION_N REGION_C
## ANG MO KIO : 12 AM : 12 CENTRAL REGION :48 CR :48
## TOA PAYOH : 12 TP : 12 EAST REGION :30 ER :30
## JURONG EAST: 10 JE : 10 NORTH-EAST REGION:42 NER:42
## BUKIT BATOK: 9 BK : 9 NORTH REGION :15 NR :15
## JURONG WEST: 9 JW : 9 WEST REGION :65 WR :65
## QUEENSTOWN : 9 QT : 9
## (Other) :139 (Other):139
## INC_CRC FMEL_UPD_D X_ADDR Y_ADDR
## 00F5E30B5C9B7AD8: 1 Min. :2014-12-05 Min. : 5093 Min. :26138
## 013B509B8EDF15BE: 1 1st Qu.:2014-12-05 1st Qu.:19288 1st Qu.:34203
## 029BD940F4455194: 1 Median :2014-12-05 Median :28511 Median :36586
## 05FD555397CBEE7A: 1 Mean :2014-12-05 Mean :26807 Mean :37091
## 0664CA7EF6504AE5: 1 3rd Qu.:2014-12-05 3rd Qu.:34108 3rd Qu.:39822
## 0ABCF49C51112DC2: 1 Max. :2014-12-05 Max. :49502 Max. :47683
## (Other) :194
## SHAPE_Leng SHAPE_Area Business Industry
## Min. : 1634 Min. : 143138 Min. : 0.00 Min. :0.000
## 1st Qu.: 4296 1st Qu.: 918875 1st Qu.: 0.00 1st Qu.:0.000
## Median : 5657 Median : 1444181 Median : 2.00 Median :0.000
## Mean : 7214 Mean : 2895375 Mean : 26.15 Mean :0.455
## 3rd Qu.: 7563 3rd Qu.: 2408087 3rd Qu.: 19.75 3rd Qu.:0.000
## Max. :68084 Max. :69748299 Max. :308.00 Max. :8.000
##
## Financial Govt_Embassy Private_residential Shopping
## Min. : 0.00 Min. : 0.000 Min. : 0.00 Min. : 0.000
## 1st Qu.: 1.00 1st Qu.: 0.000 1st Qu.: 0.00 1st Qu.: 0.000
## Median : 4.00 Median : 0.000 Median : 4.00 Median : 0.000
## Mean : 8.74 Mean : 0.795 Mean : 12.13 Mean : 0.935
## 3rd Qu.:12.00 3rd Qu.: 1.000 3rd Qu.: 12.00 3rd Qu.: 1.000
## Max. :79.00 Max. :10.000 Max. :217.00 Max. :14.000
##
## E_Active Young Aged Pop
## Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0
## 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.: 0
## Median :576.0 Median :235.9 Median :124.5 Median : 8160
## Mean :420.0 Mean :182.7 Mean :107.4 Mean : 15002
## 3rd Qu.:598.5 3rd Qu.:275.4 3rd Qu.:173.5 3rd Qu.: 24755
## Max. :800.0 Max. :360.0 Max. :269.0 Max. :132480
##
## dens HDB_1_2 HDB_3_4 HDB_5_EC
## Min. :0.000000 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.:0.000000 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.00
## Median :0.006731 Median : 0.00 Median : 69.78 Median : 20.12
## Mean :0.011663 Mean : 21.42 Mean :270.96 Mean :126.61
## 3rd Qu.:0.022283 3rd Qu.: 28.54 3rd Qu.:555.74 3rd Qu.:237.90
## Max. :0.046058 Max. :322.17 Max. :837.65 Max. :613.50
##
## Condo_Apt LandedProperty CLUSTER SP_CLUSTER geometry
## Min. : 0.00 Min. : 0.0 1:58 0: 0 MULTIPOLYGON :197
## 1st Qu.: 0.00 1st Qu.: 0.0 2:97 1:200 POLYGON : 3
## Median : 36.74 Median : 0.0 3:20 2: 0 epsg:NA : 0
## Mean : 165.90 Mean : 125.1 4:25 3: 0 +proj=tmer...: 0
## 3rd Qu.: 240.78 3rd Qu.: 118.6 4: 0
## Max. :1000.00 Max. :1000.0
##
sum(Cluster1$SHAPE_Area) / 1e6 # Convert m^2 to km^2
## [1] 579.0749
Cluster 1 covers the majority of Singapore and spans all five regions, so it has by far the largest total area at about 579.07 km^2.
summary(Cluster2)
## OBJECTID SUBZONE_NO SUBZONE_N SUBZONE_C CA_IND
## Min. :259.0 Min. :1.000 Length:27 CKSZ05 : 1 N:27
## 1st Qu.:290.5 1st Qu.:2.500 Class :character CKSZ06 : 1 Y: 0
## Median :300.0 Median :5.000 Mode :character MDSZ01 : 1
## Mean :298.6 Mean :4.667 MDSZ02 : 1
## 3rd Qu.:308.5 3rd Qu.:6.500 SBSZ02 : 1
## Max. :323.0 Max. :9.000 SBSZ03 : 1
## (Other):21
## PLN_AREA_N PLN_AREA_C REGION_N REGION_C
## WOODLANDS :9 WD :9 CENTRAL REGION : 0 CR : 0
## SEMBAWANG :8 SB :8 EAST REGION : 0 ER : 0
## SIMPANG :3 SM :3 NORTH-EAST REGION: 0 NER: 0
## CHOA CHU KANG:2 CK :2 NORTH REGION :25 NR :25
## MANDAI :2 MD :2 WEST REGION : 2 WR : 2
## YISHUN :2 YS :2
## (Other) :1 (Other):1
## INC_CRC FMEL_UPD_D X_ADDR Y_ADDR
## 01A4287FB060A0A6: 1 Min. :2014-12-05 Min. :18348 Min. :41594
## 19529EBD71A301DD: 1 1st Qu.:2014-12-05 1st Qu.:22560 1st Qu.:45530
## 1ED0377B40E71BDA: 1 Median :2014-12-05 Median :24666 Median :46959
## 2E2DB30B78E2AC57: 1 Mean :2014-12-05 Mean :24720 Mean :46512
## 4215C006676A7D38: 1 3rd Qu.:2014-12-05 3rd Qu.:27023 3rd Qu.:48211
## 42D5F52D334C615F: 1 Max. :2014-12-05 Max. :30568 Max. :49553
## (Other) :21
## SHAPE_Leng SHAPE_Area Business Industry
## Min. : 3254 Min. : 595652 Min. : 0.00 Min. :0.0000
## 1st Qu.: 4428 1st Qu.:1094016 1st Qu.: 0.00 1st Qu.:0.0000
## Median : 5520 Median :1576001 Median : 1.00 Median :0.0000
## Mean : 6361 Mean :2023216 Mean : 17.07 Mean :0.3333
## 3rd Qu.: 7305 3rd Qu.:2225299 3rd Qu.: 10.50 3rd Qu.:0.0000
## Max. :11829 Max. :7235809 Max. :173.00 Max. :3.0000
##
## Financial Govt_Embassy Private_residential Shopping
## Min. : 0.000 Min. :0.0000 Min. : 0.000 Min. :0.0000
## 1st Qu.: 0.000 1st Qu.:0.0000 1st Qu.: 0.000 1st Qu.:0.0000
## Median : 2.000 Median :0.0000 Median : 1.000 Median :0.0000
## Mean : 5.259 Mean :0.5926 Mean : 2.519 Mean :0.8519
## 3rd Qu.: 9.000 3rd Qu.:0.5000 3rd Qu.: 3.500 3rd Qu.:1.0000
## Max. :21.000 Max. :6.0000 Max. :14.000 Max. :8.0000
##
## E_Active Young Aged Pop
## Min. : 0.0 Min. : 0.0 Min. : 0.00 Min. : 0
## 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.: 0.00 1st Qu.: 0
## Median :569.0 Median :233.5 Median : 75.60 Median : 4320
## Mean :331.3 Mean :164.9 Mean : 59.34 Mean :15767
## 3rd Qu.:600.6 3rd Qu.:310.4 3rd Qu.: 91.70 3rd Qu.:32610
## Max. :622.6 Max. :338.3 Max. :175.52 Max. :98410
##
## dens HDB_1_2 HDB_3_4 HDB_5_EC
## Min. :0.000000 Min. : 0.00 Min. : 0.0 Min. : 0.0
## 1st Qu.:0.000000 1st Qu.: 0.00 1st Qu.: 0.0 1st Qu.: 0.0
## Median :0.001955 Median : 0.00 Median : 0.0 Median : 0.0
## Mean :0.011275 Mean : 17.17 Mean :213.2 Mean :185.9
## 3rd Qu.:0.020179 3rd Qu.: 31.74 3rd Qu.:443.4 3rd Qu.:430.9
## Max. :0.044356 Max. :113.47 Max. :824.5 Max. :509.6
##
## Condo_Apt LandedProperty CLUSTER SP_CLUSTER geometry
## Min. : 0.00 Min. : 0.00 1:12 0: 0 MULTIPOLYGON :26
## 1st Qu.: 0.00 1st Qu.: 0.00 2:12 1: 0 POLYGON : 1
## Median : 0.00 Median : 0.00 3: 1 2:27 epsg:NA : 0
## Mean : 54.02 Mean : 85.28 4: 2 3: 0 +proj=tmer...: 0
## 3rd Qu.: 36.85 3rd Qu.: 0.00 4: 0
## Max. :759.56 Max. :1000.00
##
sum(Cluster2$SHAPE_Area) / 1e6 # Convert m^2 to km^2
## [1] 54.62683
Cluster 2 lies in the North Region of Singapore, with two subzones in the West Region. Its total area of about 54.63 km^2 is the second smallest of the five clusters.
summary(Cluster3)
## OBJECTID SUBZONE_NO SUBZONE_N SUBZONE_C CA_IND
## Min. : 1.00 Min. : 1.000 Length:86 BMSZ01 : 1 N:37
## 1st Qu.: 27.25 1st Qu.: 2.000 Class :character BMSZ02 : 1 Y:49
## Median : 50.00 Median : 4.000 Mode :character BMSZ03 : 1
## Mean : 55.56 Mean : 5.291 BMSZ04 : 1
## 3rd Qu.: 81.75 3rd Qu.: 7.000 BMSZ05 : 1
## Max. :165.00 Max. :17.000 BMSZ06 : 1
## (Other):80
## PLN_AREA_N PLN_AREA_C REGION_N REGION_C
## BUKIT MERAH :17 BM :17 CENTRAL REGION :85 CR :85
## DOWNTOWN CORE:12 DT :12 EAST REGION : 0 ER : 0
## ROCHOR :10 RC :10 NORTH-EAST REGION: 0 NER: 0
## KALLANG : 6 KL : 6 NORTH REGION : 0 NR : 0
## NEWTON : 6 NT : 6 WEST REGION : 1 WR : 1
## QUEENSTOWN : 6 QT : 6
## (Other) :29 (Other):29
## INC_CRC FMEL_UPD_D X_ADDR Y_ADDR
## 0524461C92F35D94: 1 Min. :2014-12-05 Min. :20764 Min. :25813
## 06B9FD8607810069: 1 1st Qu.:2014-12-05 1st Qu.:27287 1st Qu.:29577
## 0D1D1759D7BC6D6C: 1 Median :2014-12-05 Median :28817 Median :30681
## 0F0735F1BDDF53C7: 1 Mean :2014-12-05 Mean :28510 Mean :30628
## 0FF1661344C84AED: 1 3rd Qu.:2014-12-05 3rd Qu.:29991 3rd Qu.:31803
## 0FF5E50B9581D2BE: 1 Max. :2014-12-05 Max. :33716 Max. :33930
## (Other) :80
## SHAPE_Leng SHAPE_Area Business Industry
## Min. : 871.6 Min. : 39438 Min. : 0.000 Min. :0.0000
## 1st Qu.: 2293.0 1st Qu.: 216181 1st Qu.: 1.000 1st Qu.:0.0000
## Median : 3006.0 Median : 411359 Median : 4.000 Median :0.0000
## Mean : 3816.2 Mean : 708805 Mean : 8.605 Mean :0.1163
## 3rd Qu.: 4515.4 3rd Qu.: 877930 3rd Qu.:11.000 3rd Qu.:0.0000
## Max. :17496.2 Max. :4919132 Max. :51.000 Max. :5.0000
##
## Financial Govt_Embassy Private_residential Shopping
## Min. : 0.00 Min. : 0.000 Min. : 0.00 Min. : 0.000
## 1st Qu.: 3.00 1st Qu.: 0.000 1st Qu.: 1.00 1st Qu.: 0.000
## Median : 8.00 Median : 1.000 Median : 5.00 Median : 1.000
## Mean : 15.71 Mean : 3.081 Mean : 12.16 Mean : 3.302
## 3rd Qu.: 19.00 3rd Qu.: 4.000 3rd Qu.: 11.00 3rd Qu.: 4.500
## Max. :134.00 Max. :19.000 Max. :123.00 Max. :31.000
##
## E_Active Young Aged Pop
## Min. : 0.0 Min. : 0.00 Min. : 0.0 Min. : 0
## 1st Qu.: 517.7 1st Qu.: 16.13 1st Qu.: 0.0 1st Qu.: 60
## Median : 581.7 Median :196.91 Median :137.9 Median : 1400
## Mean : 480.0 Mean :164.27 Mean :123.2 Mean : 4005
## 3rd Qu.: 613.2 3rd Qu.:249.79 3rd Qu.:200.2 3rd Qu.: 7758
## Max. :1000.0 Max. :330.51 Max. :325.9 Max. :19100
##
## dens HDB_1_2 HDB_3_4 HDB_5_EC
## Min. :0.0000000 Min. : 0.00 Min. : 0.0 Min. : 0.00
## 1st Qu.:0.0003167 1st Qu.: 0.00 1st Qu.: 0.0 1st Qu.: 0.00
## Median :0.0039049 Median : 0.00 Median : 0.0 Median : 0.00
## Mean :0.0074665 Mean : 50.53 Mean :218.8 Mean : 60.94
## 3rd Qu.:0.0108508 3rd Qu.: 29.69 3rd Qu.:503.3 3rd Qu.: 87.31
## Max. :0.0439867 Max. :712.93 Max. :948.1 Max. :836.36
##
## Condo_Apt LandedProperty CLUSTER SP_CLUSTER geometry
## Min. : 0.0 Min. : 0.00 1:20 0: 0 MULTIPOLYGON :85
## 1st Qu.: 0.0 1st Qu.: 0.00 2:31 1: 0 POLYGON : 1
## Median : 163.1 Median : 0.00 3:33 2: 0 epsg:NA : 0
## Mean : 403.9 Mean : 33.35 4: 2 3:86 +proj=tmer...: 0
## 3rd Qu.: 993.3 3rd Qu.: 0.00 4: 0
## Max. :1000.0 Max. :738.98
##
sum(Cluster3$SHAPE_Area) / 1e6 # Convert m^2 to km^2
## [1] 60.95722
Cluster 3 lies in the southern part of the Central Region (85 of its 86 subzones) and consists mainly of the CBD and town areas, which are likely to host a large number of businesses and much social activity. Its total area of about 60.96 km^2 places it third of the five clusters by area.
summary(Cluster4)
## OBJECTID SUBZONE_NO SUBZONE_N SUBZONE_C CA_IND
## Min. :204 Min. :2.0 Length:5 HGSZ02 :1 N:5
## 1st Qu.:231 1st Qu.:2.0 Class :character HGSZ07 :1 Y:0
## Median :260 Median :3.0 Mode :character SESZ02 :1
## Mean :248 Mean :3.6 SESZ03 :1
## 3rd Qu.:272 3rd Qu.:4.0 SESZ04 :1
## Max. :273 Max. :7.0 AMSZ01 :0
## (Other):0
## PLN_AREA_N PLN_AREA_C REGION_N REGION_C
## SENGKANG :3 SE :3 CENTRAL REGION :0 CR :0
## HOUGANG :2 HG :2 EAST REGION :0 ER :0
## ANG MO KIO:0 AM :0 NORTH-EAST REGION:5 NER:5
## BEDOK :0 BD :0 NORTH REGION :0 NR :0
## BISHAN :0 BK :0 WEST REGION :0 WR :0
## BOON LAY :0 BL :0
## (Other) :0 (Other):0
## INC_CRC FMEL_UPD_D X_ADDR Y_ADDR
## 5A2D0E9E6B285069:1 Min. :2014-12-05 Min. :33930 Min. :37657
## 6EDE1DB873D24BDD:1 1st Qu.:2014-12-05 1st Qu.:34219 1st Qu.:39071
## 986666487FF7CF78:1 Median :2014-12-05 Median :35164 Median :41061
## BE2E2BB27D14DC52:1 Mean :2014-12-05 Mean :34903 Mean :40221
## F00F5344E293F642:1 3rd Qu.:2014-12-05 3rd Qu.:35222 3rd Qu.:41501
## 00F5E30B5C9B7AD8:0 Max. :2014-12-05 Max. :35978 Max. :41815
## (Other) :0
## SHAPE_Leng SHAPE_Area Business Industry Financial
## Min. :5112 Min. :1007410 Min. :0.0 Min. :0 Min. : 1.0
## 1st Qu.:5216 1st Qu.:1455508 1st Qu.:0.0 1st Qu.:0 1st Qu.: 1.0
## Median :5438 Median :1499109 Median :1.0 Median :0 Median :11.0
## Mean :5540 Mean :1409319 Mean :1.8 Mean :0 Mean :15.8
## 3rd Qu.:5617 3rd Qu.:1515534 3rd Qu.:3.0 3rd Qu.:0 3rd Qu.:24.0
## Max. :6316 Max. :1569035 Max. :5.0 Max. :0 Max. :42.0
##
## Govt_Embassy Private_residential Shopping E_Active
## Min. :0.0 Min. : 8.0 Min. : 0.0 Min. :584.9
## 1st Qu.:0.0 1st Qu.: 9.0 1st Qu.: 0.0 1st Qu.:595.2
## Median :0.0 Median : 9.0 Median : 3.0 Median :597.3
## Mean :0.6 Mean :12.8 Mean : 3.4 Mean :596.7
## 3rd Qu.:0.0 3rd Qu.:13.0 3rd Qu.: 4.0 3rd Qu.:597.5
## Max. :3.0 Max. :25.0 Max. :10.0 Max. :608.8
##
## Young Aged Pop dens
## Min. :224.4 Min. : 77.67 Min. :31760 Min. :0.02138
## 1st Qu.:259.1 1st Qu.: 92.32 1st Qu.:32400 1st Qu.:0.03109
## Median :295.7 Median :106.81 Median :46610 Median :0.03153
## Mean :281.1 Mean :122.22 Mean :46478 Mean :0.03291
## 3rd Qu.:298.9 3rd Qu.:143.58 3rd Qu.:60200 3rd Qu.:0.03837
## Max. :327.2 Max. :190.74 Max. :61420 Max. :0.04220
##
## HDB_1_2 HDB_3_4 HDB_5_EC Condo_Apt
## Min. : 0.000 Min. :365.3 Min. :144.2 Min. : 60.4
## 1st Qu.: 7.327 1st Qu.:370.7 1st Qu.:213.3 1st Qu.:107.3
## Median :27.741 Median :403.6 Median :468.6 Median :121.9
## Mean :22.409 Mean :437.1 Mean :367.3 Mean :145.9
## 3rd Qu.:36.044 3rd Qu.:498.7 3rd Qu.:485.9 3rd Qu.:134.2
## Max. :40.932 Max. :546.9 Max. :524.6 Max. :305.7
##
## LandedProperty CLUSTER SP_CLUSTER geometry
## Min. : 0.000 1:0 0:0 MULTIPOLYGON :5
## 1st Qu.: 4.070 2:5 1:0 epsg:NA :0
## Median : 4.153 3:0 2:0 +proj=tmer...:0
## Mean : 27.303 4:0 3:0
## 3rd Qu.: 10.390 4:5
## Max. :117.901
##
sum(Cluster4$SHAPE_Area) / 1e6 # Convert m^2 to km^2
## [1] 7.046595
Cluster 4 lies in the North-East Region (three subzones in Sengkang and two in Hougang). With a total area of about 7.05 km^2, it is by far the smallest cluster.
tm_shape(Cluster1) +
tm_fill("SP_CLUSTER") +
tm_borders(lwd = 0.1, alpha = 1) +
tm_layout(title = "Business", legend.outside = TRUE) +
tm_bubbles(col = "Business",
size = "Business",
border.col = "black",
border.lwd = 1,
alpha = 0.7)
Next, we compute the mean of each variable within each cluster, which allows us to compare the traits of the clusters.
BusinessClusters <- SZ_spatialcluster %>%
select("SUBZONE_N","Business","SP_CLUSTER") %>%
group_by(SP_CLUSTER) %>%
summarise(Business = mean(Business))
IndustryClusters <- SZ_spatialcluster %>%
select("SUBZONE_N","Industry","SP_CLUSTER") %>%
group_by(SP_CLUSTER) %>%
summarise(Industry = mean(Industry))
FinancialClusters <- SZ_spatialcluster %>%
select("SUBZONE_N","Financial","SP_CLUSTER") %>%
group_by(SP_CLUSTER) %>%
summarise(Financial = mean(Financial))
Govt_EmbassyClusters <- SZ_spatialcluster %>%
select("SUBZONE_N","Govt_Embassy","SP_CLUSTER") %>%
group_by(SP_CLUSTER) %>%
summarise(Govt_Embassy = mean(Govt_Embassy))
Private_residentialClusters <- SZ_spatialcluster %>%
select("SUBZONE_N","Private_residential","SP_CLUSTER") %>%
group_by(SP_CLUSTER) %>%
summarise(Private_residential = mean(Private_residential))
YoungClusters <- SZ_spatialcluster %>%
select("SUBZONE_N","Young","SP_CLUSTER") %>%
group_by(SP_CLUSTER) %>%
summarise(Young = mean(Young))
AgedClusters <- SZ_spatialcluster %>%
select("SUBZONE_N","Aged","SP_CLUSTER") %>%
group_by(SP_CLUSTER) %>%
summarise(Aged = mean(Aged))
HDB_1_2Clusters <- SZ_spatialcluster %>%
select("SUBZONE_N","HDB_1_2","SP_CLUSTER") %>%
group_by(SP_CLUSTER) %>%
summarise(HDB_1_2 = mean(HDB_1_2))
HDB_3_4Clusters <- SZ_spatialcluster %>%
select("SUBZONE_N","HDB_3_4","SP_CLUSTER") %>%
group_by(SP_CLUSTER) %>%
summarise(HDB_3_4 = mean(HDB_3_4))
HDB_5_ECClusters <- SZ_spatialcluster %>%
select("SUBZONE_N","HDB_5_EC","SP_CLUSTER") %>%
group_by(SP_CLUSTER) %>%
summarise(HDB_5_EC = mean(HDB_5_EC))
Condo_AptClusters <- SZ_spatialcluster %>%
select("SUBZONE_N","Condo_Apt","SP_CLUSTER") %>%
group_by(SP_CLUSTER) %>%
summarise(Condo_Apt = mean(Condo_Apt))
LandedPropertyClusters <- SZ_spatialcluster %>%
select("SUBZONE_N","LandedProperty","SP_CLUSTER") %>%
group_by(SP_CLUSTER) %>%
summarise(LandedProperty = mean(LandedProperty))
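The twelve blocks above differ only in the variable name. As a compact alternative, the base R sketch below computes the per-cluster means of several variables in one call with aggregate(). cluster_means() is a hypothetical helper; unlike the sf version above, it works on a plain data frame, so the per-cluster union of geometries is not produced.

```r
# Hypothetical helper: mean of each variable in `vars` per cluster.
cluster_means <- function(df, vars, cluster = "SP_CLUSTER") {
  aggregate(df[vars], by = df[cluster], FUN = mean)
}

# Toy example with three clusters
df <- data.frame(SP_CLUSTER = factor(c(0, 1, 1, 2)),
                 Business = c(0, 10, 20, 5),
                 Aged = c(0, 100, 200, 50))
cluster_means(df, c("Business", "Aged"))
```

On the real data, a call such as cluster_means(sf::st_drop_geometry(SZ_spatialcluster), c("Business", "Industry", "Financial")) would give one row per cluster.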
Since cluster 0 has zero values for every variable, we do not comment on it in the analysis below.
BusinessClusters
## Simple feature collection with 5 features and 2 fields
## geometry type: GEOMETRY
## dimension: XY
## bbox: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS: SVY21
## # A tibble: 5 x 3
## SP_CLUSTER Business geometry
## <fct> <dbl> <GEOMETRY [m]>
## 1 0 0 MULTIPOLYGON (((17763.39 15889.1, 17758.6 15868.3, 17754.~
## 2 1 26.2 MULTIPOLYGON (((14557.7 30447.21, 14562.89 30443.22, 1457~
## 3 2 17.1 MULTIPOLYGON (((27253.37 41646.98, 27223.41 41646.59, 271~
## 4 3 8.60 MULTIPOLYGON (((26066.69 25744.31, 26074.22 25738.41, 260~
## 5 4 1.8 POLYGON ((34418.46 37253.29, 34371.44 37143.21, 34320.62 ~
Cluster 1 has the highest mean number of businesses per subzone (26.2), followed by clusters 2, 3 and 4.
IndustryClusters
## Simple feature collection with 5 features and 2 fields
## geometry type: GEOMETRY
## dimension: XY
## bbox: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS: SVY21
## # A tibble: 5 x 3
## SP_CLUSTER Industry geometry
## <fct> <dbl> <GEOMETRY [m]>
## 1 0 0 MULTIPOLYGON (((17763.39 15889.1, 17758.6 15868.3, 17754.~
## 2 1 0.455 MULTIPOLYGON (((14557.7 30447.21, 14562.89 30443.22, 1457~
## 3 2 0.333 MULTIPOLYGON (((27253.37 41646.98, 27223.41 41646.59, 271~
## 4 3 0.116 MULTIPOLYGON (((26066.69 25744.31, 26074.22 25738.41, 260~
## 5 4 0 POLYGON ((34418.46 37253.29, 34371.44 37143.21, 34320.62 ~
Cluster 1 again has the highest mean, here for industries (0.455), followed by clusters 2, 3 and 4.
FinancialClusters
## Simple feature collection with 5 features and 2 fields
## geometry type: GEOMETRY
## dimension: XY
## bbox: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS: SVY21
## # A tibble: 5 x 3
## SP_CLUSTER Financial geometry
## <fct> <dbl> <GEOMETRY [m]>
## 1 0 0 MULTIPOLYGON (((17763.39 15889.1, 17758.6 15868.3, 17754~
## 2 1 8.74 MULTIPOLYGON (((14557.7 30447.21, 14562.89 30443.22, 145~
## 3 2 5.26 MULTIPOLYGON (((27253.37 41646.98, 27223.41 41646.59, 27~
## 4 3 15.7 MULTIPOLYGON (((26066.69 25744.31, 26074.22 25738.41, 26~
## 5 4 15.8 POLYGON ((34418.46 37253.29, 34371.44 37143.21, 34320.62~
For Financial, cluster 4 has the highest mean at 15.8, followed closely by cluster 3 (15.7), then cluster 1 and lastly cluster 2.
Govt_EmbassyClusters
## Simple feature collection with 5 features and 2 fields
## geometry type: GEOMETRY
## dimension: XY
## bbox: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS: SVY21
## # A tibble: 5 x 3
## SP_CLUSTER Govt_Embassy geometry
## <fct> <dbl> <GEOMETRY [m]>
## 1 0 0 MULTIPOLYGON (((17763.39 15889.1, 17758.6 15868.3, 17~
## 2 1 0.795 MULTIPOLYGON (((14557.7 30447.21, 14562.89 30443.22, ~
## 3 2 0.593 MULTIPOLYGON (((27253.37 41646.98, 27223.41 41646.59,~
## 4 3 3.08 MULTIPOLYGON (((26066.69 25744.31, 26074.22 25738.41,~
## 5 4 0.6 POLYGON ((34418.46 37253.29, 34371.44 37143.21, 34320~
Cluster 3 has a much higher mean for Govt_Embassy urban functions at 3.08, while every other cluster is below 1, the second highest being cluster 1 at 0.795. This could be because most foreign embassies are in the CBD / Central Region, as seen in the distribution plot earlier.
Private_residentialClusters
## Simple feature collection with 5 features and 2 fields
## geometry type: GEOMETRY
## dimension: XY
## bbox: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS: SVY21
## # A tibble: 5 x 3
## SP_CLUSTER Private_residenti~ geometry
## <fct> <dbl> <GEOMETRY [m]>
## 1 0 0 MULTIPOLYGON (((17763.39 15889.1, 17758.6 15868~
## 2 1 12.1 MULTIPOLYGON (((14557.7 30447.21, 14562.89 3044~
## 3 2 2.52 MULTIPOLYGON (((27253.37 41646.98, 27223.41 416~
## 4 3 12.2 MULTIPOLYGON (((26066.69 25744.31, 26074.22 257~
## 5 4 12.8 POLYGON ((34418.46 37253.29, 34371.44 37143.21,~
Clusters 4, 3 and 1 have similar means for Private_residential: cluster 4 is marginally the highest at 12.8, followed by cluster 3 (12.2) and, very close behind, cluster 1 (12.1). Cluster 2 comes last at 2.52. This can also be seen in the distribution plot of Private_residential earlier.
YoungClusters
## Simple feature collection with 5 features and 2 fields
## geometry type: GEOMETRY
## dimension: XY
## bbox: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS: SVY21
## # A tibble: 5 x 3
## SP_CLUSTER Young geometry
## <fct> <dbl> <GEOMETRY [m]>
## 1 0 0 MULTIPOLYGON (((17763.39 15889.1, 17758.6 15868.3, 17754.27 ~
## 2 1 183. MULTIPOLYGON (((14557.7 30447.21, 14562.89 30443.22, 14570.7~
## 3 2 165. MULTIPOLYGON (((27253.37 41646.98, 27223.41 41646.59, 27193.~
## 4 3 164. MULTIPOLYGON (((26066.69 25744.31, 26074.22 25738.41, 26078.~
## 5 4 281. POLYGON ((34418.46 37253.29, 34371.44 37143.21, 34320.62 371~
Cluster 4 has the highest proportion of Young at 281.0606, followed by cluster 1 at 182.6900. Clusters 2 and 3 are close, with cluster 2 narrowly higher. In the earlier plot, subzones with a high proportion of young residents were concentrated towards the north, which may explain why cluster 4 has the highest proportion of Young.
AgedClusters
## Simple feature collection with 5 features and 2 fields
## geometry type: GEOMETRY
## dimension: XY
## bbox: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS: SVY21
## # A tibble: 5 x 3
## SP_CLUSTER Aged geometry
## <fct> <dbl> <GEOMETRY [m]>
## 1 0 0 MULTIPOLYGON (((17763.39 15889.1, 17758.6 15868.3, 17754.27 ~
## 2 1 107. MULTIPOLYGON (((14557.7 30447.21, 14562.89 30443.22, 14570.7~
## 3 2 59.3 MULTIPOLYGON (((27253.37 41646.98, 27223.41 41646.59, 27193.~
## 4 3 123. MULTIPOLYGON (((26066.69 25744.31, 26074.22 25738.41, 26078.~
## 5 4 122. POLYGON ((34418.46 37253.29, 34371.44 37143.21, 34320.62 371~
Cluster 3 has the highest mean proportion of Aged (123.2), with cluster 4 a close second (122.2), while cluster 2 is much lower (59.3) than the other three. The summary also shows that cluster 3 has the highest median (137.9) and maximum (325.9) for Aged, and the largest gap between its median and maximum, so its subzones range from very few to very many elderly residents.
HDB_1_2Clusters
## Simple feature collection with 5 features and 2 fields
## geometry type: GEOMETRY
## dimension: XY
## bbox: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS: SVY21
## # A tibble: 5 x 3
## SP_CLUSTER HDB_1_2 geometry
## <fct> <dbl> <GEOMETRY [m]>
## 1 0 0 MULTIPOLYGON (((17763.39 15889.1, 17758.6 15868.3, 17754.2~
## 2 1 21.4 MULTIPOLYGON (((14557.7 30447.21, 14562.89 30443.22, 14570~
## 3 2 17.2 MULTIPOLYGON (((27253.37 41646.98, 27223.41 41646.59, 2719~
## 4 3 50.5 MULTIPOLYGON (((26066.69 25744.31, 26074.22 25738.41, 2607~
## 5 4 22.4 POLYGON ((34418.46 37253.29, 34371.44 37143.21, 34320.62 3~
Cluster 3 has the highest proportion of HDB_1_2 (50.5), more than double that of the next highest, cluster 4 (22.4). This can also be seen in the distribution map of HDB_1_2 plotted earlier.
HDB_3_4Clusters
## Simple feature collection with 5 features and 2 fields
## geometry type: GEOMETRY
## dimension: XY
## bbox: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS: SVY21
## # A tibble: 5 x 3
## SP_CLUSTER HDB_3_4 geometry
## <fct> <dbl> <GEOMETRY [m]>
## 1 0 0 MULTIPOLYGON (((17763.39 15889.1, 17758.6 15868.3, 17754.2~
## 2 1 271. MULTIPOLYGON (((14557.7 30447.21, 14562.89 30443.22, 14570~
## 3 2 213. MULTIPOLYGON (((27253.37 41646.98, 27223.41 41646.59, 2719~
## 4 3 219. MULTIPOLYGON (((26066.69 25744.31, 26074.22 25738.41, 2607~
## 5 4 437. POLYGON ((34418.46 37253.29, 34371.44 37143.21, 34320.62 3~
For HDB_3_4, cluster 4 leads the other three clusters by a wide margin at 437.0574.
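The size of that lead can be quantified directly from the printed cluster totals. This is a small sketch using the rounded values shown in the output above, with the all-zero cluster 0 left out; the variable names are illustrative.

```r
# Cluster totals for HDB_3_4 as printed above (rounded; cluster 0 excluded)
hdb_3_4 <- c("1" = 271, "2" = 213, "3" = 219, "4" = 437)

ranked <- sort(hdb_3_4, decreasing = TRUE)

# Absolute and relative lead of the top cluster over the runner-up
lead_abs <- unname(ranked[1] - ranked[2])
lead_rel <- unname(ranked[1] / ranked[2])

cat(sprintf("Cluster %s leads cluster %s by %.0f (%.2fx)\n",
            names(ranked)[1], names(ranked)[2], lead_abs, lead_rel))
```

This reports that cluster 4 is ahead of cluster 1 by about 166, roughly 1.6 times the runner-up's total.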
HDB_5_ECClusters
## Simple feature collection with 5 features and 2 fields
## geometry type: GEOMETRY
## dimension: XY
## bbox: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS: SVY21
## # A tibble: 5 x 3
## SP_CLUSTER HDB_5_EC geometry
## <fct> <dbl> <GEOMETRY [m]>
## 1 0 0 MULTIPOLYGON (((17763.39 15889.1, 17758.6 15868.3, 17754.~
## 2 1 127. MULTIPOLYGON (((14557.7 30447.21, 14562.89 30443.22, 1457~
## 3 2 186. MULTIPOLYGON (((27253.37 41646.98, 27223.41 41646.59, 271~
## 4 3 60.9 MULTIPOLYGON (((26066.69 25744.31, 26074.22 25738.41, 260~
## 5 4 367. POLYGON ((34418.46 37253.29, 34371.44 37143.21, 34320.62 ~
Cluster 4 shows the highest proportion of HDB_5_EC, at almost double the second-highest, cluster 2, and roughly six times the lowest, cluster 3. The low proportion of HDB_5_EC in cluster 3's areas is also visible in the plot above.
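The ratios behind these comparisons can be checked from the printed output. This sketch uses the rounded cluster totals shown above and excludes cluster 0, so the results are approximate.

```r
# Cluster totals for HDB_5_EC as printed above (rounded; cluster 0 excluded)
hdb_5_ec <- c("1" = 127, "2" = 186, "3" = 60.9, "4" = 367)

sorted <- sort(hdb_5_ec, decreasing = TRUE)
top <- sorted[1]

# How many times larger the top cluster is than the runner-up and the lowest
ratio_to_second <- unname(top / sorted[2])
ratio_to_lowest <- unname(top / sorted[length(sorted)])

print(round(c(second = ratio_to_second, lowest = ratio_to_lowest), 2))
```

With these values, cluster 4 is about 1.97 times cluster 2 and about 6.03 times cluster 3.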
Condo_AptClusters
## Simple feature collection with 5 features and 2 fields
## geometry type: GEOMETRY
## dimension: XY
## bbox: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS: SVY21
## # A tibble: 5 x 3
## SP_CLUSTER Business geometry
## <fct> <dbl> <GEOMETRY [m]>
## 1 0 0 MULTIPOLYGON (((17763.39 15889.1, 17758.6 15868.3, 17754.~
## 2 1 166. MULTIPOLYGON (((14557.7 30447.21, 14562.89 30443.22, 1457~
## 3 2 54.0 MULTIPOLYGON (((27253.37 41646.98, 27223.41 41646.59, 271~
## 4 3 404. MULTIPOLYGON (((26066.69 25744.31, 26074.22 25738.41, 260~
## 5 4 146. POLYGON ((34418.46 37253.29, 34371.44 37143.21, 34320.62 ~
Cluster 3 shows the highest proportion of Condo_Apt among the four clusters, at more than double the second-highest, cluster 1, and more than seven times the lowest, cluster 2. This is also visible in the Condo_Apt distribution plot above.
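A quick way to compare one cluster against all the others is to divide its total by the whole vector. This sketch uses the rounded values from the printed summary above (cluster 0 excluded); the variable names are illustrative.

```r
# Cluster totals from the printed summary above (rounded; cluster 0 excluded)
condo_apt <- c("1" = 166, "2" = 54.0, "3" = 404, "4" = 146)

# Ratio of cluster 3's total to each cluster's total; the length-1 numerator
# is recycled, and the result keeps the cluster names of the denominator
ratios <- condo_apt["3"] / condo_apt

print(round(ratios, 2))
```

Cluster 3 comes out at about 2.4 times cluster 1, 2.8 times cluster 4 and 7.5 times cluster 2.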
LandedPropertyClusters
## Simple feature collection with 5 features and 2 fields
## geometry type: GEOMETRY
## dimension: XY
## bbox: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS: SVY21
## # A tibble: 5 x 3
## SP_CLUSTER Business geometry
## <fct> <dbl> <GEOMETRY [m]>
## 1 0 0 MULTIPOLYGON (((17763.39 15889.1, 17758.6 15868.3, 17754.~
## 2 1 125. MULTIPOLYGON (((14557.7 30447.21, 14562.89 30443.22, 1457~
## 3 2 85.3 MULTIPOLYGON (((27253.37 41646.98, 27223.41 41646.59, 271~
## 4 3 33.3 MULTIPOLYGON (((26066.69 25744.31, 26074.22 25738.41, 260~
## 5 4 27.3 POLYGON ((34418.46 37253.29, 34371.44 37143.21, 34320.62 ~
Cluster 1 has the highest proportion of LandedProperty, followed by clusters 2, 3 and 4 in that order. This is to be expected, as the plot above shows low proportions of landed property in the areas covered by clusters 3 and 4.