Since 2013, Singapore has an active Open Data initiative. It aims to enhance transparency, public participation, and collaboration in the nation. The nation has big data ambitions and believes that the bountiful pool of available data should be used to gain new insight that will improve economic welfare. With the inception of The Smart Nation vision in 2017, Open Data is seen as a necessary component of this initiative, especially in the promotion of public-private collaborations (co-innovation). However, most of the effort to date tend to focus on hosting online open data portal by various government agency such as data.gov.sg, SLA OneMap, URA SPACE and LTA LTA DataMall , just to mention a few of them. There are very little work on how to integrate information shared by these agencies to gain better understanding on national development or public services issues.
In view of this, we are going to conduct a use-case to demonstrate the potential contribution of geospatial analytics in R to integrate, analyse and communicate the analysis results by using open data provided by different government agencies. The specific objectives of the study are as follows:
packages = c('rgdal', 'maptools', 'raster','spatstat', 'tmap', 'tidyverse','sf','httr', 'rvest')
for (p in packages) {
if(!require(p, character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}I have decided to use additional ECDA Statistics from ECDA to supplement this analysis.
popdata <- read_csv("data/aspatial/respopagesextod2011to2019.csv")## Parsed with column specification:
## cols(
## PA = col_character(),
## SZ = col_character(),
## AG = col_character(),
## Sex = col_character(),
## TOD = col_character(),
## Pop = col_double(),
## Time = col_double()
## )
ecda_stats <- read_csv("data/aspatial/ecda.csv")## Parsed with column specification:
## cols(
## Year = col_double(),
## Childcare_centres = col_double(),
## Childcare_places = col_double(),
## Childcare_enrolment = col_double(),
## C_Avg_fullday_fees = col_double(),
## Kindergarten_centres = col_double(),
## Kindergarten_places = col_double(),
## Kindergarten_enrolment = col_double(),
## K_Avg_fullday_fees = col_double()
## )
summary(popdata)## PA SZ AG
## Length:883728 Length:883728 Length:883728
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
## Sex TOD Pop Time
## Length:883728 Length:883728 Min. : 0.00 Min. :2011
## Class :character Class :character 1st Qu.: 0.00 1st Qu.:2013
## Mode :character Mode :character Median : 0.00 Median :2015
## Mean : 39.83 Mean :2015
## 3rd Qu.: 10.00 3rd Qu.:2017
## Max. :2860.00 Max. :2019
summary(ecda_stats)## Year Childcare_centres Childcare_places Childcare_enrolment
## Min. :2016 Min. :1324 Min. :128207 Min. :101905
## 1st Qu.:2017 1st Qu.:1373 1st Qu.:136571 1st Qu.:107633
## Median :2018 Median :1434 Median :148155 Median :113919
## Mean :2018 Mean :1431 Mean :147688 Mean :113576
## 3rd Qu.:2018 3rd Qu.:1492 3rd Qu.:159271 3rd Qu.:119862
## Max. :2019 Max. :1532 Max. :166235 Max. :124559
## C_Avg_fullday_fees Kindergarten_centres Kindergarten_places
## Min. :1013 Min. :435.0 Min. :45794
## 1st Qu.:1013 1st Qu.:462.0 1st Qu.:50588
## Median :1021 Median :475.5 Median :53773
## Mean :1029 Mean :466.5 Mean :52175
## 3rd Qu.:1037 3rd Qu.:480.0 3rd Qu.:55359
## Max. :1062 Max. :480.0 Max. :55359
## Kindergarten_enrolment K_Avg_fullday_fees
## Min. :58204 Min. :317
## 1st Qu.:60276 1st Qu.:317
## Median :63137 Median :317
## Mean :62446 Mean :327
## 3rd Qu.:65306 3rd Qu.:327
## Max. :65306 Max. :357
For the purpose of this homework, I will be creating a new column known as CC_KDG. This is because the ECDA Website mentions that the main target group for Childcare and Kindergardens in Singapore refers to children aged between 2 years to 7 years of Age. I also calculated a CC_KDG_PCT Column in order to look at the ratio between this group of children and the total population living in the planning subzone.
popdata2019 <- popdata %>%
filter(Time == "2019") %>%
group_by(PA,SZ,AG) %>%
summarise(`POP` = sum(`Pop`))%>%
ungroup() %>%
spread(AG, POP) %>%
mutate(CC_KDG = `0_to_4` + `5_to_9`) %>%
mutate(YOUNG = `0_to_4` + `5_to_9` + `10_to_14` + `15_to_19` + `20_to_24`) %>%
mutate(`ECONOMY ACTIVE` = rowSums(.[7:11]) + rowSums(.[13:15]))%>%
mutate(TOTAL = YOUNG + `ECONOMY ACTIVE` + rowSums(.[16:21])) %>%
mutate(CC_KDG_PCT = 100*CC_KDG/TOTAL) %>%
select(PA, SZ, CC_KDG, YOUNG, `ECONOMY ACTIVE`, `TOTAL`, `CC_KDG_PCT`) Some NAs were introduced in the above dataframe from the process where the CC_KDG_PCT column was created. This is because certain planning subzones had no census data - thus causing a NA to appear when CC_KDG and TOTAL (both 0 in this subzone) are divided by each other. Keep these values for now - no need to process since they are different from the subzones which have no CC_KDG eligible children.
popdata2019[rowSums(is.na(popdata2019))!=0,] Presence of Nulls are introduced during the creation of CC_KDG_PCT.
subset(popdata2019,popdata2019$CC_KDG_PCT == 0)Locations with no CC_KDG Eligible Children have CC_KDG_PCT defined as 0, which is encoded differently from the above locations with no census data.
mpsz <- readOGR(dsn = "data/geospatial", layer="MP14_SUBZONE_WEB_PL")## OGR data source with driver: ESRI Shapefile
## Source: "C:\Users\65918\Desktop\NUS\AY_2021_SEM1\IS415\Take Home Assignments\IS415_Take-Home_EX01\data\geospatial", layer: "MP14_SUBZONE_WEB_PL"
## with 323 features
## It has 15 fields
preschool <- readOGR(dsn = "data/geospatial/pre-schools-location-kml.kml",
layer = "PRESCHOOLS_LOCATION")## OGR data source with driver: KML
## Source: "C:\Users\65918\Desktop\NUS\AY_2021_SEM1\IS415\Take Home Assignments\IS415_Take-Home_EX01\data\geospatial\pre-schools-location-kml.kml", layer: "PRESCHOOLS_LOCATION"
## with 1925 features
## It has 2 fields
cc <- readOGR("data/geospatial/child-care-services-kml.kml")## OGR data source with driver: KML
## Source: "C:\Users\65918\Desktop\NUS\AY_2021_SEM1\IS415\Take Home Assignments\IS415_Take-Home_EX01\data\geospatial\child-care-services-kml.kml", layer: "CHILDCARE"
## with 1545 features
## It has 2 fields
kdg <- readOGR("data/geospatial/kindergartens-kml.kml")## OGR data source with driver: KML
## Source: "C:\Users\65918\Desktop\NUS\AY_2021_SEM1\IS415\Take Home Assignments\IS415_Take-Home_EX01\data\geospatial\kindergartens-kml.kml", layer: "KINDERGARTENS"
## with 448 features
## It has 2 fields
crs(mpsz)## CRS arguments:
## +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1
## +x_0=28001.642 +y_0=38744.572 +datum=WGS84 +units=m +no_defs
crs(preschool)## CRS arguments: +proj=longlat +datum=WGS84 +no_defs
crs(cc)## CRS arguments: +proj=longlat +datum=WGS84 +no_defs
crs(kdg)## CRS arguments: +proj=longlat +datum=WGS84 +no_defs
ps_svy21 <- spTransform(preschool,
CRS("+init=epsg:3414"))
cc_svy21 <- spTransform(cc,
CRS("+init=epsg:3414"))
kdg_svy21 <- spTransform(kdg,
CRS("+init=epsg:3414"))par(mfrow=c(2,2))
plot(ps_svy21, main ="Preschools")
plot(mpsz, main = "Subzones")
plot(cc_svy21, main ="ChildCares")
plot(kdg_svy21, main = "Kindergardens")This helps us gain a preliminary understanding of the Data
ps_svy21## class : SpatialPointsDataFrame
## features : 1925
## extent : 11203.01, 45404.24, 25596.33, 49300.88 (xmin, xmax, ymin, ymax)
## crs : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +units=m +no_defs
## variables : 2
## names : Name, Description
## min values : kml_1, <center><table><tr><th colspan='2' align='center'><em>Attributes</em></th></tr><tr bgcolor="#E3E3F3"> <th>CENTRE_NAME</th> <td>3-IN-1 FAMILY CENTRE</td> </tr><tr bgcolor=""> <th>CENTRE_CODE</th> <td>ST0027</td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESS</th> <td>205, TAMPINES STREET 21, #01 - 1281, S 520205</td> </tr><tr bgcolor=""> <th>POSTAL_CODE</th> <td>520205</td> </tr><tr bgcolor="#E3E3F3"> <th>INC_CRC</th> <td>DF7EC9C2B9C80A00</td> </tr><tr bgcolor=""> <th>FMEL_UPD_D</th> <td>20200812235534</td> </tr></table></center>
## max values : kml_999, <center><table><tr><th colspan='2' align='center'><em>Attributes</em></th></tr><tr bgcolor="#E3E3F3"> <th>CENTRE_NAME</th> <td>Zulfa Kindergarten</td> </tr><tr bgcolor=""> <th>CENTRE_CODE</th> <td>PT9603</td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESS</th> <td>717 Jurong West Street 71 #B1-107 , S640717</td> </tr><tr bgcolor=""> <th>POSTAL_CODE</th> <td>640717</td> </tr><tr bgcolor="#E3E3F3"> <th>INC_CRC</th> <td>332EB5B21175D3B4</td> </tr><tr bgcolor=""> <th>FMEL_UPD_D</th> <td>20200812235535</td> </tr></table></center>
kdg_svy21## class : SpatialPointsDataFrame
## features : 448
## extent : 11909.7, 43395.47, 25596.33, 48562.06 (xmin, xmax, ymin, ymax)
## crs : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +units=m +no_defs
## variables : 2
## names : Name, Description
## min values : kml_1, <center><table><tr><th colspan='2' align='center'><em>Attributes</em></th></tr><tr bgcolor="#E3E3F3"> <th>ADDRESSBLOCKHOUSENUMBER</th> <td></td> </tr><tr bgcolor=""> <th>ADDRESSBUILDINGNAME</th> <td></td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESSFLOORNUMBER</th> <td></td> </tr><tr bgcolor=""> <th>ADDRESSPOSTALCODE</th> <td>085501</td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESSSTREETNAME</th> <td>1E Cantoment Road #03-49 S(085501)</td> </tr><tr bgcolor=""> <th>ADDRESSTYPE</th> <td></td> </tr><tr bgcolor="#E3E3F3"> <th>DESCRIPTION</th> <td>Kindergartens</td> </tr><tr bgcolor=""> <th>HYPERLINK</th> <td></td> </tr><tr bgcolor="#E3E3F3"> <th>LANDXADDRESSPOINT</th> <td>0</td> </tr><tr bgcolor=""> <th>LANDYADDRESSPOINT</th> <td>0</td> </tr><tr bgcolor="#E3E3F3"> <th>NAME</th> <td>PCF Sparkletots Preschool @ Tanjong Pagar-Tiong Bahru Blk 1E (KN)</td> </tr><tr bgcolor=""> <th>PHOTOURL</th> <td></td> </tr><tr bgcolor="#E3E3F3"> <th>INC_CRC</th> <td>446630DA5C23E739</td> </tr><tr bgcolor=""> <th>FMEL_UPD_D</th> <td>20200813015028</td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESSUNITNUMBER</th> <td></td> </tr></table></center>
## max values : kml_99, <center><table><tr><th colspan='2' align='center'><em>Attributes</em></th></tr><tr bgcolor="#E3E3F3"> <th>ADDRESSBLOCKHOUSENUMBER</th> <td></td> </tr><tr bgcolor=""> <th>ADDRESSBUILDINGNAME</th> <td></td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESSFLOORNUMBER</th> <td></td> </tr><tr bgcolor=""> <th>ADDRESSPOSTALCODE</th> <td>98716</td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESSSTREETNAME</th> <td>61 Wishart Road</td> </tr><tr bgcolor=""> <th>ADDRESSTYPE</th> <td></td> </tr><tr bgcolor="#E3E3F3"> <th>DESCRIPTION</th> <td>Kindergartens</td> </tr><tr bgcolor=""> <th>HYPERLINK</th> <td></td> </tr><tr bgcolor="#E3E3F3"> <th>LANDXADDRESSPOINT</th> <td>0</td> </tr><tr bgcolor=""> <th>LANDYADDRESSPOINT</th> <td>0</td> </tr><tr bgcolor="#E3E3F3"> <th>NAME</th> <td>The Capstone Kindergarten</td> </tr><tr bgcolor=""> <th>PHOTOURL</th> <td></td> </tr><tr bgcolor="#E3E3F3"> <th>INC_CRC</th> <td>C4005259C8C947E5</td> </tr><tr bgcolor=""> <th>FMEL_UPD_D</th> <td>20200813015028</td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESSUNITNUMBER</th> <td></td> </tr></table></center>
cc_svy21## class : SpatialPointsDataFrame
## features : 1545
## extent : 11203.01, 45404.24, 25667.6, 49300.88 (xmin, xmax, ymin, ymax)
## crs : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +units=m +no_defs
## variables : 2
## names : Name, Description
## min values : kml_1, <center><table><tr><th colspan='2' align='center'><em>Attributes</em></th></tr><tr bgcolor="#E3E3F3"> <th>ADDRESSBLOCKHOUSENUMBER</th> <td></td> </tr><tr bgcolor=""> <th>ADDRESSBUILDINGNAME</th> <td></td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESSPOSTALCODE</th> <td>018989</td> </tr><tr bgcolor=""> <th>ADDRESSSTREETNAME</th> <td>1, MARINA BOULEVARD, #B1 - 01, ONE MARINA BOULEVARD, SINGAPORE 018989</td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESSTYPE</th> <td></td> </tr><tr bgcolor=""> <th>DESCRIPTION</th> <td></td> </tr><tr bgcolor="#E3E3F3"> <th>HYPERLINK</th> <td></td> </tr><tr bgcolor=""> <th>LANDXADDRESSPOINT</th> <td>0</td> </tr><tr bgcolor="#E3E3F3"> <th>LANDYADDRESSPOINT</th> <td>0</td> </tr><tr bgcolor=""> <th>NAME</th> <td>THE LITTLE SKOOL-HOUSE INTERNATIONAL PTE. LTD.</td> </tr><tr bgcolor="#E3E3F3"> <th>PHOTOURL</th> <td></td> </tr><tr bgcolor=""> <th>ADDRESSFLOORNUMBER</th> <td></td> </tr><tr bgcolor="#E3E3F3"> <th>INC_CRC</th> <td>08F73931F4A691F4</td> </tr><tr bgcolor=""> <th>FMEL_UPD_D</th> <td>20200826094036</td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESSUNITNUMBER</th> <td></td> </tr></table></center>
## max values : kml_999, <center><table><tr><th colspan='2' align='center'><em>Attributes</em></th></tr><tr bgcolor="#E3E3F3"> <th>ADDRESSBLOCKHOUSENUMBER</th> <td></td> </tr><tr bgcolor=""> <th>ADDRESSBUILDINGNAME</th> <td></td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESSPOSTALCODE</th> <td>829646</td> </tr><tr bgcolor=""> <th>ADDRESSSTREETNAME</th> <td>200, PONGGOL SEVENTEENTH AVENUE, SINGAPORE 829646</td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESSTYPE</th> <td></td> </tr><tr bgcolor=""> <th>DESCRIPTION</th> <td>Child Care Services</td> </tr><tr bgcolor="#E3E3F3"> <th>HYPERLINK</th> <td></td> </tr><tr bgcolor=""> <th>LANDXADDRESSPOINT</th> <td>0</td> </tr><tr bgcolor="#E3E3F3"> <th>LANDYADDRESSPOINT</th> <td>0</td> </tr><tr bgcolor=""> <th>NAME</th> <td>RAFFLES KIDZ @ PUNGGOL PTE LTD</td> </tr><tr bgcolor="#E3E3F3"> <th>PHOTOURL</th> <td></td> </tr><tr bgcolor=""> <th>ADDRESSFLOORNUMBER</th> <td></td> </tr><tr bgcolor="#E3E3F3"> <th>INC_CRC</th> <td>379D017BF244B0FA</td> </tr><tr bgcolor=""> <th>FMEL_UPD_D</th> <td>20200826094036</td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESSUNITNUMBER</th> <td></td> </tr></table></center>
ps_sp <- as(ps_svy21, "SpatialPoints")
cc_sp <- as(cc_svy21, "SpatialPoints")
kdg_sp <- as(kdg_svy21, "SpatialPoints")ps_ppp <- as(ps_sp, "ppp")
cc_ppp <- as(cc_sp, "ppp")
kdg_ppp <- as(kdg_sp, "ppp")Examine new ppp format
par(mfrow=c(1,3))
plot(ps_ppp)
plot(cc_ppp)
plot(kdg_ppp)summary(ps_ppp)## Planar point pattern: 1925 points
## Average intensity 2.374419e-06 points per square unit
##
## *Pattern contains duplicated points*
##
## Coordinates are given to 3 decimal places
## i.e. rounded to the nearest multiple of 0.001 units
##
## Window: rectangle = [11203.01, 45404.24] x [25596.33, 49300.88] units
## (34200 x 23700 units)
## Window area = 810725000 square units
summary(cc_ppp)## Planar point pattern: 1545 points
## Average intensity 1.91145e-06 points per square unit
##
## *Pattern contains duplicated points*
##
## Coordinates are given to 3 decimal places
## i.e. rounded to the nearest multiple of 0.001 units
##
## Window: rectangle = [11203.01, 45404.24] x [25667.6, 49300.88] units
## (34200 x 23630 units)
## Window area = 808287000 square units
summary(kdg_ppp)## Planar point pattern: 448 points
## Average intensity 6.195602e-07 points per square unit
##
## *Pattern contains duplicated points*
##
## Coordinates are given to 3 decimal places
## i.e. rounded to the nearest multiple of 0.001 units
##
## Window: rectangle = [11909.7, 43395.47] x [25596.33, 48562.06] units
## (31490 x 22970 units)
## Window area = 723094000 square units
Presence of Duplicate Points - Need to further process data
Find out how many locations have more than one point event
sum(multiplicity(ps_ppp) > 1)## [1] 302
sum(multiplicity(cc_ppp) > 1)## [1] 128
sum(multiplicity(kdg_ppp) > 1)## [1] 100
Duplicated Point Events in All ppp objects
View Duplicate Point Events
tmap_mode('plot')## tmap mode set to plotting
tm_shape(ps_svy21) + tm_dots(alpha = 0.3, size = 0.1)
Some dots appear darker than others - indicating the presence of duplicate point events
Use Jittering Approach to Handle Duplicates
ps_ppp_jit <- rjitter(ps_ppp, retry=TRUE, nsim=1, drop=TRUE)
plot(ps_ppp_jit)cc_ppp_jit <- rjitter(cc_ppp, retry=TRUE, nsim=1, drop=TRUE)
plot(cc_ppp_jit)kdg_ppp_jit <- rjitter(kdg_ppp, retry=TRUE, nsim=1, drop=TRUE)
plot(kdg_ppp_jit)Create owin
mpsz_sp <- as(mpsz, "SpatialPolygons")
mpsz_owin <- as(mpsz_sp, "owin")Combine with jittered points
cc_ppp <- cc_ppp_jit[mpsz_owin]
kdg_ppp <- kdg_ppp_jit[mpsz_owin]Proxy for Demand using the number of CC_KDG Children in Each Subzone & Supply using the Spatial Point Events
Transform mpsz to prepare for left join
mpsz <- st_as_sf(mpsz)
mpsz <- st_transform(mpsz, crs= 3414)test <- st_is_valid(mpsz)
length(which(test==FALSE))## [1] 9
9 polygons are invalid - fix using sf package
mpsz <- st_make_valid(mpsz)Do a Join from the PopData2019 df to the mpsz df
popdata2019 <- popdata2019 %>%
mutate_at(.vars = vars(PA,SZ), .funs = funs(toupper)) %>%
filter(TOTAL > 0) #Filter out records where there is no census for that particular subzone
mpszpop2019 <- left_join(mpsz, popdata2019, by = c("SUBZONE_N" = "SZ"))Visualising CC_KDG numbers by Subzone
tm_shape(mpszpop2019) +
tm_fill("CC_KDG",
n = 6, style = "jenks",
palette = "Blues",
legend.hist.title = "CC/KDG Eligible Children",
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "Distribution of Childcare/Kindergarden Eligible Children by Planning Subzone (Jenks Classification)",
main.title.size = 0.76,
main.title.position = "center",
legend.height = 0.43,
legend.width = 0.35,
legend.outside = FALSE,
legend.position = c("right","bottom"),
frame = FALSE) +
tm_borders(alpha = 0.2)
Above chloropeth may cause differing interpretations depending on the style and number of breaks used, etc. Thus, create a box map to reduce interpretability errors.
Creating Boxbreaks Function
boxbreaks <- function(v,mult=1.5) {
qv <- unname(quantile(v,na.rm = TRUE))
iqr <- qv[4] - qv[2]
upfence <- qv[4] + mult * iqr
lofence <- qv[2] - mult * iqr
# initialize break points vector
bb <- vector(mode="numeric",length=7)
# logic for lower and upper fences
if (lofence < qv[1]) {
# no lower outliers
bb[1] <- lofence
bb[2] <- floor(qv[1])
} else {
bb[2] <- lofence
bb[1] <- qv[1]
}
if(upfence > qv[5]) {
# no upper outliers
bb[7] <- upfence
bb[6] <- ceiling(qv[5])
} else {
bb[6] <- upfence
bb[7] <- qv[5]
}
bb[3:5] <- qv[2:4]
return(bb)
}Create get.var function
get.var <- function(vname,df) {
v <- df[vname] %>% st_set_geometry(NULL)
v <- unname(v[,1])
return(v)
}Plot Boxmap
boxmap <- function(vnam,df,titlesize,legx, legy, legtitle=NA,mtitle="Box Map",mult=1.5){
var <- get.var(vnam,df)
bb <- boxbreaks(var)
tm_shape(df) +
tm_fill(vnam,
title=legtitle,
breaks=bb,
palette="-RdBu",
labels = c("lower outlier", "< 25%", "25% - 50%", "50% - 75%","> 75%", "upper outlier")
) +
tm_fill(vnam,
df %>%
filter(is.na(vnam)),
labels = c("NA")
) +
tm_borders(alpha =0.2) +
tm_layout(main.title = paste("Box Map of ",toString(vnam)," Numbers by Planning Subzone"),
main.title.size = titlesize,
main.title.position = "center",
frame = FALSE,
legend.height = legx,
legend.width = legy
)
}
boxmap("CC_KDG", mpszpop2019, 1, 0.5,0.5)
From the above, we can see that the demand for childcare/kindergarden services varies accordingly. Certain Regions (Those further away from the Central Business District) tend to have a higher demand for the aforementioned services. This makes relatively good sense if we think about it - the CBD is where most people go to work. People will often tend to use Childcare/Kindergarden services nearer to their houses, which causes the distribution to be higher at more residential areas.
Currently the imorted GIS Data does not give us any meaningful interpretation - information is all hidden in the ‘Description’ Column. Upon further reading, this could be the way the data is stored; as an html table for each observation. If we were to parse it, we will be able to get a nice table containing the information of a particular CC/KDG Point Event.
Read PreSchool locations as sf
ps_sf <- read_sf("data/geospatial/pre-schools-location-kml.kml")
cc_sf <- read_sf("data/geospatial/child-care-services-kml.kml")
kdg_sf <- read_sf("data/geospatial/kindergartens-kml.kml")Write Functions to Extract and Bind attributes for each row
slicer <- function(sf,y) {
sf %>%
slice(y) %>%
pull(Description) %>%
read_html() %>%
html_node("table") %>%
html_table(header = TRUE, trim = TRUE, dec = ".", fill = TRUE) %>%
as_tibble(.name_repair = ~ make.names(c("Attribute", "Value"))) %>%
pivot_wider(names_from = Attribute, values_from = Value)
}
flatten <- function(df) {
attrs <- lapply(X = 1:nrow(df),
sf = df,
FUN = slicer)
df_attrs <- df %>%
bind_cols(bind_rows(attrs))
df_attrs <- df_attrs[,c(3:ncol(df_attrs))]
df_attrs_3414 <- st_transform(df_attrs, 3414)
return(df_attrs_3414)
}ps_sf_attrs_3414 <- flatten(ps_sf)
cc_sf_attrs_3414 <- flatten(cc_sf)
kdg_sf_attrs_3414 <- flatten(kdg_sf)Find Number of Point Events in each Planning Subzone
mpszpop2019$`PreSchool Count` <- c(lengths(st_intersects(mpszpop2019, ps_sf_attrs_3414)))
mpszpop2019$`CC Count` <- c(lengths(st_intersects(mpszpop2019, cc_sf_attrs_3414)))
mpszpop2019$`KDG Count` <- c(lengths(st_intersects(mpszpop2019, kdg_sf_attrs_3414)))View Summary Statistics of the new Columns
summary(mpszpop2019$`PreSchool Count`)## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 0.00 3.00 5.96 9.00 58.00
summary(mpszpop2019$`CC Count`)## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 3.000 4.783 7.500 42.000
summary(mpszpop2019$`KDG Count`)## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 1.000 1.387 2.000 17.000
Interestingly, this means that even subzones with no information for population is encoded as having 0 for the 3 Counts. While this may usually make sense, it does not provide an apple for apple comparison when it compare it to the boxmap for preschool demand (which contains a category for Missing Values). Process the Counts further by replacing Count values where Total Population in a Subzone = NA and Count = 0 with NA. This prevents the Count BoxMap from being distorted towards 0, and also prevents subzones with no populations but having preschools from being reset to 0.
Change Values in PreSchool Count Column
vec <- which(mpszpop2019$`PreSchool Count` == 0 & is.na(mpszpop2019$TOTAL)) #Vector where we want to replace PreSchool Count values with 0
mpszpop2019[vec, "PreSchool Count"] <- NA
vec_cc <- which(mpszpop2019$`CC Count` == 0 & is.na(mpszpop2019$TOTAL)) #Vector where we want to replace PreSchool Count values with 0
mpszpop2019[vec, "CC Count"] <- NA
vec_kdg <- which(mpszpop2019$`KDG Count` == 0 & is.na(mpszpop2019$TOTAL)) #Vector where we want to replace PreSchool Count values with 0
mpszpop2019[vec, "KDG Count"] <- NAPlot Boxmap of the PreSchool Count in each Planning Subzone
boxmap("PreSchool Count", mpszpop2019,1, 0.5,0.5)
This map looks rather different from the one showing demand! While certain high demand areas are met with high PreSchool Counts, some areas (such as the NorthEastern Region of Singapore) seem to be lacking in terms of preschools. Let us examine it further in the next section.
Plot Demand and Supply Boxmaps side by side - make a comparison
sup <- boxmap("CC_KDG", mpszpop2019,0.7, 0.42,0.42)
dd <- boxmap("PreSchool Count", mpszpop2019,0.7,0.42,0.42)
tmap_arrange(sup,dd)
Plotting them side by side, we do see that the demand outstrips the supply in certain regions, most prominently in the northeast region of Singapore. Some Areas have greater supply than others (E.g. SouthEast Area)
Plot CC and KDG by Subzones
dd_cc <- boxmap("CC Count", mpszpop2019,0.7,0.42,0.42)
dd_kdg <- boxmap("KDG Count", mpszpop2019,0.7,0.42,0.42)
tmap_arrange(dd_cc, dd_kdg)
From the above, we see that the KDG Count Numbers by Planning Subzones is more skewed than that of the CC Count - presence of more dark red outliers. An observational comparison to the demand boxmap (proxied by number of CC/KDG Eligible Children) show that the number of Childcare tends to be higher in regions with higher demand, while the Number of Kindergardens is not really matching the areas with the highest demand (Especially NorthEastern Region). This is however qualitative - use ECDA figures in next section to do quantitative analysis.
Use ECDA Figures to gauge Demand and Supply
ecda2019 <- ecda_stats[ecda_stats$Year == 2019,]
cc_avg <- ecda2019$Childcare_places / ecda2019$Childcare_centres
kdg_avg <- ecda2019$Kindergarten_places / ecda2019$Kindergarten_centres
mpszpop2019$CC_VACANCIES <- mpszpop2019$`CC Count` * cc_avg
mpszpop2019$KDG_VACANCIES <- mpszpop2019$`KDG Count` * kdg_avg
mpszpop2019$PS_VACANCIES <- mpszpop2019$CC_VACANCIES + mpszpop2019$KDG_VACANCIES # No choice, because we don't have more granular data for demand. Can only compare demand with the vacancies from both KDG and CC
mpszpop2019$NET_DEMAND <- mpszpop2019$CC_KDG - mpszpop2019$PS_VACANCIESPlot chloropleth for NET_DEMAND
tm_shape(mpszpop2019) +
tm_fill("NET_DEMAND", palette = "RdGy") +
tm_layout(frame = FALSE,
main.title = "Chloropleth of NET_DEMAND",
main.title.position = "center") +
tm_borders(alpha = 0.5)## Variable(s) "NET_DEMAND" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
From the above map, we can see that the patterns that we have been more or less matches the qualitative analysis. The darker the areas, the more demand outstrips the supply for preschool services in the planning subzone.This is especially true in the NorthEastern region of Singapore. Other areas with high demand for PreSchool services such as pockets in the west, north and east do not have as positive net_demand as the NE region of Singapore.Regions in pink have more supply than demand.
Goal is to Plot an interactive proportional symbol map. Use the parsed tables from previous section as simply plotting the SpatialPointsDataFrames does not yield meaningful point plots.
tmap_mode("view")## tmap mode set to interactive viewing
tm_shape(ps_svy21)+
tm_bubbles(col = "lightblue",
size = 0.5,
border.col = "black",
border.lwd = 0.5)Performing Joins to ps_sf_attrs_3414 to create IS_KDG Column
ps_sf_attrs_iskdg <- st_join(ps_sf_attrs_3414, cc_sf_attrs_3414, by = c("CENTRE_CODE"))
ps_sf_attrs_iskdg$iskdg <- ""
vec_cc <- which(is.na(ps_sf_attrs_iskdg$NAME)) #Vector where we want to replace iskdg values with Kindergarden label
ps_sf_attrs_iskdg[vec_cc, "iskdg"] <- "Kindergarden" # Assign nulls with Kindergarden since the vector checks for NAs in the "Name" column, which will have data only if the join is successful
ps_sf_attrs_iskdg[which(ps_sf_attrs_iskdg$iskdg != "Kindergarden"), "iskdg"] <- "Childcare" #Assign rest as ChildCare
ps_sf_attrs_iskdg <- ps_sf_attrs_iskdg[, c("geometry", "CENTRE_NAME", "CENTRE_CODE", "ADDRESS", "iskdg")]Plot tmap
tmap_mode("view")## tmap mode set to interactive viewing
tm_shape(ps_sf_attrs_iskdg) +
tm_bubbles(col = "iskdg",
size = 0.5,
border.col = "black",
border.lwd = 0.5) +
tm_facets(by = "iskdg",
nrow = 1,
sync = TRUE)tmap_mode("plot")## tmap mode set to plotting
Not much we can see from the above other than the fact that the presence of PreSchools is highly correlated with residential areas (We see areas like catchment areas, military bases, etc having no preschools). This to me means that the data is not randomly distributed - more clustered. Run 2nd Order Point Pattern Analysis to acertain visual interpretation.
CC_G = Gest(cc_ppp, correction = "border")
plot(CC_G)Perform Complete Spatial Randomness Test
H0 = Distribution of childcare services are randomly distributed H1 = Distribution of childcare services are not randomly distributed
The null hypothesis will be rejected if p-value is smaller than alpha value of 0.01. (Lesser Simulations used as 1000 simulations taking too long)
CC_G.csr <- envelope(cc_ppp, Gest, nsim=99)## Generating 99 simulations of CSR ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
## 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76,
## 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99.
##
## Done.
plot(CC_G.csr)
Our observed G Function lies above that of the envelopes - Reject Null Hypothesis that the distribution of childcare services are randomly distributed. As it also lies abover the envelopes, the distribution can be said to follow that of a clustered pattern.
CC_F = Fest(cc_ppp)
plot(CC_F)Perform Complete Spatial Randomness Test
H0 = Distribution of childcare services are randomly distributed H1 = Distribution of childcare services are not randomly distributed
The null hypothesis will be rejected if p-value is smaller than alpha value of 0.01. (Lesser Simulations used as 1000 simulations taking too long)
CC_F.csr <- envelope(cc_ppp, Fest, nsim=99)## Generating 99 simulations of CSR ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
## 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76,
## 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99.
##
## Done.
plot(CC_F.csr)Our observed F Function lies below that of the envelopes - Reject Null Hypothesis that the distribution of childcare services are randomly distributed. As it also lies below the envelopes, the distribution can be said to follow that of a clustered pattern.
Aborted due to lack of memory
CC_K = Kest(cc_ppp, correction = "Ripley")
plot(CC_K, . -r ~ r, ylab= "K(d)-r", xlab = "d(m)")Perform Complete Spatial Randomness Test H0 = Distribution of childcare services are randomly distributed H1 = Distribution of childcare services are not randomly distributed
The null hypothesis will be rejected if p-value is smaller than alpha value of 0.01. (Lesser Simulations used as 1000 simulations taking too long)
CC_K.csr <- envelope(cc_ppp, Kest, nsim = 99, rank = 1, glocal=TRUE)plot(CC_K.csr, . - r ~ r, xlab="d", ylab="K(d)-r")Aborted due to lack of memory
CC_L = Lest(cc_ppp, correction = "Ripley")
plot(CC_L, . -r ~ r,
ylab= "L(d)-r", xlab = "d(m)")Perform Complete Spatial Randomness Test H0 = Distribution of childcare services are randomly distributed H1 = Distribution of childcare services are not randomly distributed
The null hypothesis will be rejected if p-value is smaller than alpha value of 0.01. (Lesser Simulations used as 1000 simulations taking too long)
CC_L.csr <- envelope(cc_ppp, Lest, nsim = 99, rank = 1, glocal=TRUE)plot(CC_L.csr, . - r ~ r, xlab="d", ylab="L(d)-r")KDG_G = Gest(kdg_ppp, correction = "border")
plot(KDG_G)Perform Complete Spatial Randomness Test
H0 = Distribution of kindergarden services are randomly distributed H1 = Distribution of kindergarden services are not randomly distributed
The null hypothesis will be rejected if p-value is smaller than alpha value of 0.01. (Lesser Simulations used as 1000 simulations taking too long)
KDG_G.csr <- envelope(kdg_ppp, Gest, nsim=99)## Generating 99 simulations of CSR ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
## 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76,
## 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99.
##
## Done.
plot(KDG_G.csr)
Our observed G Function lies above that of the envelopes - Reject Null Hypothesis that the distribution of childcare services are randomly distributed. As it also lies abover the envelopes, the distribution can be said to follow that of a clustered pattern.
KDG_F = Fest(kdg_ppp)
plot(KDG_F)Perform Complete Spatial Randomness Test
H0 = Distribution of childcare services are randomly distributed H1 = Distribution of childcare services are not randomly distributed
The null hypothesis will be rejected if p-value is smaller than alpha value of 0.01. (Lesser Simulations used as 1000 simulations taking too long)
KDG_F.csr <- envelope(kdg_ppp, Fest, nsim=99)## Generating 99 simulations of CSR ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
## 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76,
## 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99.
##
## Done.
plot(KDG_F.csr)Our observed F Function lies below that of the envelopes - Reject Null Hypothesis that the distribution of childcare services are randomly distributed. As it also lies below the envelopes, the distribution can be said to follow that of a clustered pattern.
Aborted due to lack of memory
Rescale cc_ppp for increased interpretability
cc_ppp.km <- rescale(cc_ppp, 1000, "km")Try bw.ppl method since our data is clustered
kde_cc.ppl <- density(cc_ppp.km, sigma=bw.ppl, edge=TRUE, kernel="gaussian")
plot(kde_cc.ppl, main = "bw.ppl")
Density of Childcares is the relatively well evenly distributed in the residential regions.
Try other kernel methods
par(mfrow=c(1,2))
plot(density(cc_ppp.km, sigma=bw.ppl, edge=TRUE, kernel="gaussian"), main="Gaussian")
plot(density(cc_ppp.km, sigma=bw.ppl, edge=TRUE, kernel="epanechnikov"), main="Epanechnikov")par(mfrow=c(1,2))
plot(density(cc_ppp.km, sigma=bw.ppl, edge=TRUE, kernel="quartic"), main="Quartic")
plot(density(cc_ppp.km, sigma=bw.ppl, edge=TRUE, kernel="disc"), main="Disc")kde_cc_adaptive <- adaptive.density(cc_ppp.km, method="kernel")
plot(kde_cc_adaptive)par(mfrow=c(1,2))
plot(kde_cc.ppl, main = "Fixed bandwidth")
plot(kde_cc_adaptive, main = "Adaptive bandwidth")
Use fixed method as Singapore’s Spatial point patterns does not exhibit that much geographical skew distribution.
gridded_kde_cc_bw <- as.SpatialGridDataFrame.im(kde_cc.ppl)
spplot(gridded_kde_cc_bw)kde_cc_bw_raster <- raster(gridded_kde_cc_bw)
kde_cc_bw_raster## class : RasterLayer
## dimensions : 128, 128, 16384 (nrow, ncol, ncell)
## resolution : 0.419757, 0.2695907 (x, y)
## extent : 2.667538, 56.39644, 15.74872, 50.25633 (xmin, xmax, ymin, ymax)
## crs : NA
## source : memory
## names : v
## values : -1.388286e-14, 22.41333 (min, max)
#Assign projection system
projection(kde_cc_bw_raster) <- CRS("+init=EPSG:3414")
kde_cc_bw_raster## class : RasterLayer
## dimensions : 128, 128, 16384 (nrow, ncol, ncell)
## resolution : 0.419757, 0.2695907 (x, y)
## extent : 2.667538, 56.39644, 15.74872, 50.25633 (xmin, xmax, ymin, ymax)
## crs : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +units=m +no_defs
## source : memory
## names : v
## values : -1.388286e-14, 22.41333 (min, max)
tmap_mode('view')## tmap mode set to interactive viewing
tm_basemap("OpenStreetMap")+tm_shape(kde_cc_bw_raster) + tm_raster('v') ## Variable(s) "v" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
Rescale cc_ppp for increased interpretability
kdg_ppp.km <- rescale(kdg_ppp, 1000, "km")Try bw.ppl method since our data is clustered
kde_kdg.ppl <- density(kdg_ppp.km, sigma=bw.ppl, edge=TRUE, kernel="gaussian")
plot(kde_kdg.ppl, main = "bw.ppl")
Density of Kindergardens is the highest in the SouthEastern Region of Singapore.
Try other kernel methods
par(mfrow=c(1,2))
plot(density(kdg_ppp.km, sigma=bw.ppl, edge=TRUE, kernel="gaussian"), main="Gaussian")
plot(density(kdg_ppp.km, sigma=bw.ppl, edge=TRUE, kernel="epanechnikov"), main="Epanechnikov")par(mfrow=c(1,2))
plot(density(kdg_ppp.km, sigma=bw.ppl, edge=TRUE, kernel="quartic"), main="Quartic")
plot(density(kdg_ppp.km, sigma=bw.ppl, edge=TRUE, kernel="disc"), main="Disc")kde_kdg_adaptive <- adaptive.density(kdg_ppp.km, method="kernel")
plot(kde_kdg_adaptive)par(mfrow=c(1,2))
plot(kde_kdg.ppl, main = "Fixed bandwidth")
plot(kde_kdg_adaptive, main = "Adaptive bandwidth")
Use fixed method as Singapore’s Spatial point patterns does not exhibit that much geographical skew distribution.
gridded_kde_kdg_bw <- as.SpatialGridDataFrame.im(kde_kdg.ppl)
spplot(gridded_kde_kdg_bw)kde_kdg_bw_raster <- raster(gridded_kde_kdg_bw)
kde_kdg_bw_raster## class : RasterLayer
## dimensions : 128, 128, 16384 (nrow, ncol, ncell)
## resolution : 0.419757, 0.2695907 (x, y)
## extent : 2.667538, 56.39644, 15.74872, 50.25633 (xmin, xmax, ymin, ymax)
## crs : NA
## source : memory
## names : v
## values : -1.995518e-15, 11.49178 (min, max)
#Assign projection system
projection(kde_kdg_bw_raster) <- CRS("+init=EPSG:3414")
kde_kdg_bw_raster## class : RasterLayer
## dimensions : 128, 128, 16384 (nrow, ncol, ncell)
## resolution : 0.419757, 0.2695907 (x, y)
## extent : 2.667538, 56.39644, 15.74872, 50.25633 (xmin, xmax, ymin, ymax)
## crs : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +units=m +no_defs
## source : memory
## names : v
## values : -1.995518e-15, 11.49178 (min, max)
tmap_mode('view')## tmap mode set to interactive viewing
tm_basemap("OpenStreetMap")+tm_shape(kde_kdg_bw_raster) + tm_raster('v') ## Variable(s) "v" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
KDE Calculates the density of features in a neighbourhood, wheares the Point Maps show the Point Events on the Map.