1 Overview

1.1 Background Context

Since 2013, Singapore has an active Open Data initiative. It aims to enhance transparency, public participation, and collaboration in the nation. The nation has big data ambitions and believes that the bountiful pool of available data should be used to gain new insight that will improve economic welfare. With the inception of The Smart Nation vision in 2017, Open Data is seen as a necessary component of this initiative, especially in the promotion of public-private collaborations (co-innovation). However, most of the effort to date tend to focus on hosting online open data portal by various government agency such as data.gov.sg, SLA OneMap, URA SPACE and LTA LTA DataMall , just to mention a few of them. There are very little work on how to integrate information shared by these agencies to gain better understanding on national development or public services issues.

1.2 Objectives

In view of this, we are going to conduct a use-case to demonstrate the potential contribution of geospatial analytics in R to integrate, analyse and communicate the analysis results by using open data provided by different government agencies. The specific objectives of the study are as follows:

To gain understanding on the supply and demand of childcare and kindergarten services at planning subzone level.
To gain understanding on the geographic distribution of childcare and kindergarten in Singapore.

2 Import Packages & Libraries

packages = c('rgdal', 'maptools', 'raster','spatstat', 'tmap', 'tidyverse','sf','httr', 'rvest')
for (p in packages) {
  if(!require(p, character.only = T)){
  install.packages(p)
  }
  library(p,character.only = T)
}

3 Import Data

3.1 Import Aspatial Data

I have decided to use additional ECDA Statistics from ECDA to supplement this analysis.

popdata <- read_csv("data/aspatial/respopagesextod2011to2019.csv")

## Parsed with column specification:
## cols(
##   PA = col_character(),
##   SZ = col_character(),
##   AG = col_character(),
##   Sex = col_character(),
##   TOD = col_character(),
##   Pop = col_double(),
##   Time = col_double()
## )

ecda_stats <- read_csv("data/aspatial/ecda.csv")

## Parsed with column specification:
## cols(
##   Year = col_double(),
##   Childcare_centres = col_double(),
##   Childcare_places = col_double(),
##   Childcare_enrolment = col_double(),
##   C_Avg_fullday_fees = col_double(),
##   Kindergarten_centres = col_double(),
##   Kindergarten_places = col_double(),
##   Kindergarten_enrolment = col_double(),
##   K_Avg_fullday_fees = col_double()
## )

3.1.1 Get a brief understanding of the aspatial data

summary(popdata)

##       PA                 SZ                 AG           
##  Length:883728      Length:883728      Length:883728     
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##      Sex                TOD                 Pop               Time     
##  Length:883728      Length:883728      Min.   :   0.00   Min.   :2011  
##  Class :character   Class :character   1st Qu.:   0.00   1st Qu.:2013  
##  Mode  :character   Mode  :character   Median :   0.00   Median :2015  
##                                        Mean   :  39.83   Mean   :2015  
##                                        3rd Qu.:  10.00   3rd Qu.:2017  
##                                        Max.   :2860.00   Max.   :2019

summary(ecda_stats)

##       Year      Childcare_centres Childcare_places Childcare_enrolment
##  Min.   :2016   Min.   :1324      Min.   :128207   Min.   :101905     
##  1st Qu.:2017   1st Qu.:1373      1st Qu.:136571   1st Qu.:107633     
##  Median :2018   Median :1434      Median :148155   Median :113919     
##  Mean   :2018   Mean   :1431      Mean   :147688   Mean   :113576     
##  3rd Qu.:2018   3rd Qu.:1492      3rd Qu.:159271   3rd Qu.:119862     
##  Max.   :2019   Max.   :1532      Max.   :166235   Max.   :124559     
##  C_Avg_fullday_fees Kindergarten_centres Kindergarten_places
##  Min.   :1013       Min.   :435.0        Min.   :45794      
##  1st Qu.:1013       1st Qu.:462.0        1st Qu.:50588      
##  Median :1021       Median :475.5        Median :53773      
##  Mean   :1029       Mean   :466.5        Mean   :52175      
##  3rd Qu.:1037       3rd Qu.:480.0        3rd Qu.:55359      
##  Max.   :1062       Max.   :480.0        Max.   :55359      
##  Kindergarten_enrolment K_Avg_fullday_fees
##  Min.   :58204          Min.   :317       
##  1st Qu.:60276          1st Qu.:317       
##  Median :63137          Median :317       
##  Mean   :62446          Mean   :327       
##  3rd Qu.:65306          3rd Qu.:327       
##  Max.   :65306          Max.   :357

3.1.2 Data Manipulation - Reshape popdata df

For the purpose of this homework, I will be creating a new column known as CC_KDG. This is because the ECDA Website mentions that the main target group for Childcare and Kindergardens in Singapore refers to children aged between 2 years to 7 years of Age. I also calculated a CC_KDG_PCT Column in order to look at the ratio between this group of children and the total population living in the planning subzone.

popdata2019 <- popdata %>% 
  filter(Time == "2019") %>%
  group_by(PA,SZ,AG) %>%
  summarise(`POP` = sum(`Pop`))%>%
  ungroup() %>%
  spread(AG, POP) %>%
  mutate(CC_KDG = `0_to_4` + `5_to_9`) %>%
  mutate(YOUNG = `0_to_4` + `5_to_9` + `10_to_14` + `15_to_19` + `20_to_24`) %>%
  mutate(`ECONOMY ACTIVE` = rowSums(.[7:11]) + rowSums(.[13:15]))%>%
  mutate(TOTAL = YOUNG + `ECONOMY ACTIVE` + rowSums(.[16:21])) %>%
  mutate(CC_KDG_PCT = 100*CC_KDG/TOTAL) %>%
  select(PA, SZ, CC_KDG, YOUNG, `ECONOMY ACTIVE`, `TOTAL`, `CC_KDG_PCT`)

3.1.3 Deal with NAs

Some NAs were introduced in the above dataframe from the process where the CC_KDG_PCT column was created. This is because certain planning subzones had no census data - thus causing a NA to appear when CC_KDG and TOTAL (both 0 in this subzone) are divided by each other. Keep these values for now - no need to process since they are different from the subzones which have no CC_KDG eligible children.

popdata2019[rowSums(is.na(popdata2019))!=0,]

Presence of Nulls are introduced during the creation of CC_KDG_PCT.

subset(popdata2019,popdata2019$CC_KDG_PCT == 0)

Locations with no CC_KDG Eligible Children have CC_KDG_PCT defined as 0, which is encoded differently from the above locations with no census data.

3.2 Import Geospatial Data

mpsz <- readOGR(dsn = "data/geospatial", layer="MP14_SUBZONE_WEB_PL")

## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\65918\Desktop\NUS\AY_2021_SEM1\IS415\Take Home Assignments\IS415_Take-Home_EX01\data\geospatial", layer: "MP14_SUBZONE_WEB_PL"
## with 323 features
## It has 15 fields

preschool <- readOGR(dsn = "data/geospatial/pre-schools-location-kml.kml",
                     layer = "PRESCHOOLS_LOCATION")

## OGR data source with driver: KML 
## Source: "C:\Users\65918\Desktop\NUS\AY_2021_SEM1\IS415\Take Home Assignments\IS415_Take-Home_EX01\data\geospatial\pre-schools-location-kml.kml", layer: "PRESCHOOLS_LOCATION"
## with 1925 features
## It has 2 fields

cc <- readOGR("data/geospatial/child-care-services-kml.kml")

## OGR data source with driver: KML 
## Source: "C:\Users\65918\Desktop\NUS\AY_2021_SEM1\IS415\Take Home Assignments\IS415_Take-Home_EX01\data\geospatial\child-care-services-kml.kml", layer: "CHILDCARE"
## with 1545 features
## It has 2 fields

kdg <- readOGR("data/geospatial/kindergartens-kml.kml")

## OGR data source with driver: KML 
## Source: "C:\Users\65918\Desktop\NUS\AY_2021_SEM1\IS415\Take Home Assignments\IS415_Take-Home_EX01\data\geospatial\kindergartens-kml.kml", layer: "KINDERGARTENS"
## with 448 features
## It has 2 fields

3.2.1 Check Projection Systems of GIS Data

crs(mpsz)

## CRS arguments:
##  +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1
## +x_0=28001.642 +y_0=38744.572 +datum=WGS84 +units=m +no_defs

crs(preschool)

## CRS arguments: +proj=longlat +datum=WGS84 +no_defs

crs(cc)

## CRS arguments: +proj=longlat +datum=WGS84 +no_defs

crs(kdg)

## CRS arguments: +proj=longlat +datum=WGS84 +no_defs

3.2.2 Reprojecting PreSchool GIS Data

ps_svy21 <- spTransform(preschool, 
                          CRS("+init=epsg:3414"))
cc_svy21 <- spTransform(cc, 
                          CRS("+init=epsg:3414"))
kdg_svy21 <- spTransform(kdg, 
                          CRS("+init=epsg:3414"))

3.2.3 Examine imported GIS Data

par(mfrow=c(2,2))
plot(ps_svy21, main ="Preschools")
plot(mpsz, main = "Subzones")
plot(cc_svy21, main ="ChildCares")
plot(kdg_svy21, main = "Kindergardens")

3.2.4 Examine Preschool GIS Data

This helps us gain a preliminary understanding of the Data

ps_svy21

## class       : SpatialPointsDataFrame 
## features    : 1925 
## extent      : 11203.01, 45404.24, 25596.33, 49300.88  (xmin, xmax, ymin, ymax)
## crs         : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +units=m +no_defs 
## variables   : 2
## names       :    Name,                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     Description 
## min values  :   kml_1, <center><table><tr><th colspan='2' align='center'><em>Attributes</em></th></tr><tr bgcolor="#E3E3F3"> <th>CENTRE_NAME</th> <td>3-IN-1 FAMILY CENTRE</td> </tr><tr bgcolor=""> <th>CENTRE_CODE</th> <td>ST0027</td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESS</th> <td>205, TAMPINES STREET 21, #01 - 1281, S 520205</td> </tr><tr bgcolor=""> <th>POSTAL_CODE</th> <td>520205</td> </tr><tr bgcolor="#E3E3F3"> <th>INC_CRC</th> <td>DF7EC9C2B9C80A00</td> </tr><tr bgcolor=""> <th>FMEL_UPD_D</th> <td>20200812235534</td> </tr></table></center> 
## max values  : kml_999,    <center><table><tr><th colspan='2' align='center'><em>Attributes</em></th></tr><tr bgcolor="#E3E3F3"> <th>CENTRE_NAME</th> <td>Zulfa Kindergarten</td> </tr><tr bgcolor=""> <th>CENTRE_CODE</th> <td>PT9603</td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESS</th> <td>717 Jurong West Street 71  #B1-107 , S640717</td> </tr><tr bgcolor=""> <th>POSTAL_CODE</th> <td>640717</td> </tr><tr bgcolor="#E3E3F3"> <th>INC_CRC</th> <td>332EB5B21175D3B4</td> </tr><tr bgcolor=""> <th>FMEL_UPD_D</th> <td>20200812235535</td> </tr></table></center>

kdg_svy21

## class       : SpatialPointsDataFrame 
## features    : 448 
## extent      : 11909.7, 43395.47, 25596.33, 48562.06  (xmin, xmax, ymin, ymax)
## crs         : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +units=m +no_defs 
## variables   : 2
## names       :   Name,                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         Description 
## min values  :  kml_1, <center><table><tr><th colspan='2' align='center'><em>Attributes</em></th></tr><tr bgcolor="#E3E3F3"> <th>ADDRESSBLOCKHOUSENUMBER</th> <td></td> </tr><tr bgcolor=""> <th>ADDRESSBUILDINGNAME</th> <td></td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESSFLOORNUMBER</th> <td></td> </tr><tr bgcolor=""> <th>ADDRESSPOSTALCODE</th> <td>085501</td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESSSTREETNAME</th> <td>1E Cantoment Road  #03-49 S(085501)</td> </tr><tr bgcolor=""> <th>ADDRESSTYPE</th> <td></td> </tr><tr bgcolor="#E3E3F3"> <th>DESCRIPTION</th> <td>Kindergartens</td> </tr><tr bgcolor=""> <th>HYPERLINK</th> <td></td> </tr><tr bgcolor="#E3E3F3"> <th>LANDXADDRESSPOINT</th> <td>0</td> </tr><tr bgcolor=""> <th>LANDYADDRESSPOINT</th> <td>0</td> </tr><tr bgcolor="#E3E3F3"> <th>NAME</th> <td>PCF Sparkletots Preschool @ Tanjong Pagar-Tiong Bahru Blk 1E (KN)</td> </tr><tr bgcolor=""> <th>PHOTOURL</th> <td></td> </tr><tr bgcolor="#E3E3F3"> <th>INC_CRC</th> <td>446630DA5C23E739</td> </tr><tr bgcolor=""> <th>FMEL_UPD_D</th> <td>20200813015028</td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESSUNITNUMBER</th> <td></td> </tr></table></center> 
## max values  : kml_99,                                                              <center><table><tr><th colspan='2' align='center'><em>Attributes</em></th></tr><tr bgcolor="#E3E3F3"> <th>ADDRESSBLOCKHOUSENUMBER</th> <td></td> </tr><tr bgcolor=""> <th>ADDRESSBUILDINGNAME</th> <td></td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESSFLOORNUMBER</th> <td></td> </tr><tr bgcolor=""> <th>ADDRESSPOSTALCODE</th> <td>98716</td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESSSTREETNAME</th> <td>61 Wishart Road</td> </tr><tr bgcolor=""> <th>ADDRESSTYPE</th> <td></td> </tr><tr bgcolor="#E3E3F3"> <th>DESCRIPTION</th> <td>Kindergartens</td> </tr><tr bgcolor=""> <th>HYPERLINK</th> <td></td> </tr><tr bgcolor="#E3E3F3"> <th>LANDXADDRESSPOINT</th> <td>0</td> </tr><tr bgcolor=""> <th>LANDYADDRESSPOINT</th> <td>0</td> </tr><tr bgcolor="#E3E3F3"> <th>NAME</th> <td>The Capstone Kindergarten</td> </tr><tr bgcolor=""> <th>PHOTOURL</th> <td></td> </tr><tr bgcolor="#E3E3F3"> <th>INC_CRC</th> <td>C4005259C8C947E5</td> </tr><tr bgcolor=""> <th>FMEL_UPD_D</th> <td>20200813015028</td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESSUNITNUMBER</th> <td></td> </tr></table></center>

cc_svy21

## class       : SpatialPointsDataFrame 
## features    : 1545 
## extent      : 11203.01, 45404.24, 25667.6, 49300.88  (xmin, xmax, ymin, ymax)
## crs         : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +units=m +no_defs 
## variables   : 2
## names       :    Name,                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           Description 
## min values  :   kml_1, <center><table><tr><th colspan='2' align='center'><em>Attributes</em></th></tr><tr bgcolor="#E3E3F3"> <th>ADDRESSBLOCKHOUSENUMBER</th> <td></td> </tr><tr bgcolor=""> <th>ADDRESSBUILDINGNAME</th> <td></td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESSPOSTALCODE</th> <td>018989</td> </tr><tr bgcolor=""> <th>ADDRESSSTREETNAME</th> <td>1, MARINA BOULEVARD, #B1 - 01, ONE MARINA BOULEVARD, SINGAPORE 018989</td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESSTYPE</th> <td></td> </tr><tr bgcolor=""> <th>DESCRIPTION</th> <td></td> </tr><tr bgcolor="#E3E3F3"> <th>HYPERLINK</th> <td></td> </tr><tr bgcolor=""> <th>LANDXADDRESSPOINT</th> <td>0</td> </tr><tr bgcolor="#E3E3F3"> <th>LANDYADDRESSPOINT</th> <td>0</td> </tr><tr bgcolor=""> <th>NAME</th> <td>THE LITTLE SKOOL-HOUSE INTERNATIONAL PTE. LTD.</td> </tr><tr bgcolor="#E3E3F3"> <th>PHOTOURL</th> <td></td> </tr><tr bgcolor=""> <th>ADDRESSFLOORNUMBER</th> <td></td> </tr><tr bgcolor="#E3E3F3"> <th>INC_CRC</th> <td>08F73931F4A691F4</td> </tr><tr bgcolor=""> <th>FMEL_UPD_D</th> <td>20200826094036</td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESSUNITNUMBER</th> <td></td> </tr></table></center> 
## max values  : kml_999,                  <center><table><tr><th colspan='2' align='center'><em>Attributes</em></th></tr><tr bgcolor="#E3E3F3"> <th>ADDRESSBLOCKHOUSENUMBER</th> <td></td> </tr><tr bgcolor=""> <th>ADDRESSBUILDINGNAME</th> <td></td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESSPOSTALCODE</th> <td>829646</td> </tr><tr bgcolor=""> <th>ADDRESSSTREETNAME</th> <td>200, PONGGOL SEVENTEENTH AVENUE, SINGAPORE 829646</td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESSTYPE</th> <td></td> </tr><tr bgcolor=""> <th>DESCRIPTION</th> <td>Child Care Services</td> </tr><tr bgcolor="#E3E3F3"> <th>HYPERLINK</th> <td></td> </tr><tr bgcolor=""> <th>LANDXADDRESSPOINT</th> <td>0</td> </tr><tr bgcolor="#E3E3F3"> <th>LANDYADDRESSPOINT</th> <td>0</td> </tr><tr bgcolor=""> <th>NAME</th> <td>RAFFLES KIDZ @ PUNGGOL PTE LTD</td> </tr><tr bgcolor="#E3E3F3"> <th>PHOTOURL</th> <td></td> </tr><tr bgcolor=""> <th>ADDRESSFLOORNUMBER</th> <td></td> </tr><tr bgcolor="#E3E3F3"> <th>INC_CRC</th> <td>379D017BF244B0FA</td> </tr><tr bgcolor=""> <th>FMEL_UPD_D</th> <td>20200826094036</td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESSUNITNUMBER</th> <td></td> </tr></table></center>

3.2.5 Convert to generic sp format

ps_sp <- as(ps_svy21, "SpatialPoints")
cc_sp <- as(cc_svy21, "SpatialPoints")
kdg_sp <- as(kdg_svy21, "SpatialPoints")

3.2.6 Convert to generic ppp format

ps_ppp <- as(ps_sp, "ppp")
cc_ppp <- as(cc_sp, "ppp")
kdg_ppp <- as(kdg_sp, "ppp")

Examine new ppp format

par(mfrow=c(1,3))
plot(ps_ppp)
plot(cc_ppp)
plot(kdg_ppp)

summary(ps_ppp)

## Planar point pattern:  1925 points
## Average intensity 2.374419e-06 points per square unit
## 
## *Pattern contains duplicated points*
## 
## Coordinates are given to 3 decimal places
## i.e. rounded to the nearest multiple of 0.001 units
## 
## Window: rectangle = [11203.01, 45404.24] x [25596.33, 49300.88] units
##                     (34200 x 23700 units)
## Window area = 810725000 square units

summary(cc_ppp)

## Planar point pattern:  1545 points
## Average intensity 1.91145e-06 points per square unit
## 
## *Pattern contains duplicated points*
## 
## Coordinates are given to 3 decimal places
## i.e. rounded to the nearest multiple of 0.001 units
## 
## Window: rectangle = [11203.01, 45404.24] x [25667.6, 49300.88] units
##                     (34200 x 23630 units)
## Window area = 808287000 square units

summary(kdg_ppp)

## Planar point pattern:  448 points
## Average intensity 6.195602e-07 points per square unit
## 
## *Pattern contains duplicated points*
## 
## Coordinates are given to 3 decimal places
## i.e. rounded to the nearest multiple of 0.001 units
## 
## Window: rectangle = [11909.7, 43395.47] x [25596.33, 48562.06] units
##                     (31490 x 22970 units)
## Window area = 723094000 square units

Presence of Duplicate Points - Need to further process data

3.2.7 Processing Duplicates

Find out how many locations have more than one point event

sum(multiplicity(ps_ppp) > 1)

## [1] 302

sum(multiplicity(cc_ppp) > 1)

## [1] 128

sum(multiplicity(kdg_ppp) > 1)

## [1] 100

Duplicated Point Events in All ppp objects

View Duplicate Point Events

tmap_mode('plot')

## tmap mode set to plotting

tm_shape(ps_svy21) + tm_dots(alpha = 0.3, size = 0.1)

Some dots appear darker than others - indicating the presence of duplicate point events

Use Jittering Approach to Handle Duplicates

ps_ppp_jit <- rjitter(ps_ppp, retry=TRUE, nsim=1, drop=TRUE)
plot(ps_ppp_jit)

cc_ppp_jit <- rjitter(cc_ppp, retry=TRUE, nsim=1, drop=TRUE)
plot(cc_ppp_jit)

kdg_ppp_jit <- rjitter(kdg_ppp, retry=TRUE, nsim=1, drop=TRUE)
plot(kdg_ppp_jit)

Create owin

mpsz_sp <- as(mpsz, "SpatialPolygons")
mpsz_owin <- as(mpsz_sp, "owin")

Combine with jittered points

cc_ppp <- cc_ppp_jit[mpsz_owin]
kdg_ppp <- kdg_ppp_jit[mpsz_owin]

4 Analysing Demand and Supply of CC & KDG Services by Planning Subzone

Proxy for Demand using the number of CC_KDG Children in Each Subzone & Supply using the Spatial Point Events

4.1 Demand

Transform mpsz to prepare for left join

mpsz <- st_as_sf(mpsz)
mpsz <- st_transform(mpsz, crs= 3414)

test <- st_is_valid(mpsz)
length(which(test==FALSE))

## [1] 9

9 polygons are invalid - fix using sf package

mpsz <- st_make_valid(mpsz)

Do a Join from the PopData2019 df to the mpsz df

popdata2019 <- popdata2019 %>%
  mutate_at(.vars = vars(PA,SZ), .funs = funs(toupper)) %>%
  filter(TOTAL > 0) #Filter out records where there is no census for that particular subzone

mpszpop2019 <- left_join(mpsz, popdata2019, by = c("SUBZONE_N" = "SZ"))

Visualising CC_KDG numbers by Subzone

tm_shape(mpszpop2019) +
  tm_fill("CC_KDG",
          n = 6, style = "jenks", 
          palette = "Blues",
          legend.hist.title = "CC/KDG Eligible Children",
          legend.hist = TRUE,
          legend.is.portrait = TRUE,
          legend.hist.z = 0.1) +
  tm_layout(main.title = "Distribution of Childcare/Kindergarden Eligible Children by Planning Subzone (Jenks Classification)",
            main.title.size = 0.76,
            main.title.position = "center",
            legend.height = 0.43,
            legend.width = 0.35,
            legend.outside = FALSE,
            legend.position = c("right","bottom"),
            frame = FALSE) +
  tm_borders(alpha = 0.2)

Above chloropeth may cause differing interpretations depending on the style and number of breaks used, etc. Thus, create a box map to reduce interpretability errors.

Creating Boxbreaks Function

boxbreaks <- function(v,mult=1.5) { 
  qv <- unname(quantile(v,na.rm = TRUE))
  iqr <- qv[4] - qv[2]  
  upfence <- qv[4] + mult * iqr  
  lofence <- qv[2] - mult * iqr
  
  # initialize break points vector  
  bb <- vector(mode="numeric",length=7)
  # logic for lower and upper fences
  if (lofence < qv[1]) {  
    # no lower outliers    
    bb[1] <- lofence    
    bb[2] <- floor(qv[1])  
  } else {    
    bb[2] <- lofence    
    bb[1] <- qv[1]
  }
  
  if(upfence > qv[5]) {
  # no upper outliers   
    bb[7] <- upfence   
    bb[6] <- ceiling(qv[5]) 
  } else {   
    bb[6] <- upfence  
    bb[7] <- qv[5]
  } 
  bb[3:5] <- qv[2:4]
  return(bb)
}

Create get.var function

get.var <- function(vname,df) { 
  v <- df[vname] %>% st_set_geometry(NULL)
  v <- unname(v[,1])
  return(v)
}

Plot Boxmap

boxmap <- function(vnam,df,titlesize,legx, legy, legtitle=NA,mtitle="Box Map",mult=1.5){  
  var  <- get.var(vnam,df)
  bb <- boxbreaks(var)
  tm_shape(df) +    
    tm_fill(vnam,
            title=legtitle,
            breaks=bb,
            palette="-RdBu", 
            labels = c("lower outlier", "< 25%", "25% - 50%", "50% - 75%","> 75%", "upper outlier")
            ) +
    tm_fill(vnam, 
            df %>% 
              filter(is.na(vnam)), 
            labels = c("NA")
            ) +
  tm_borders(alpha =0.2) +
  tm_layout(main.title = paste("Box Map of ",toString(vnam)," Numbers by Planning Subzone"),
            main.title.size = titlesize,
            main.title.position =  "center",
            frame = FALSE,
            legend.height = legx,
            legend.width = legy
            )
}
boxmap("CC_KDG", mpszpop2019, 1, 0.5,0.5)

From the above, we can see that the demand for childcare/kindergarden services varies accordingly. Certain Regions (Those further away from the Central Business District) tend to have a higher demand for the aforementioned services. This makes relatively good sense if we think about it - the CBD is where most people go to work. People will often tend to use Childcare/Kindergarden services nearer to their houses, which causes the distribution to be higher at more residential areas.

4.2 Supply

Currently the imorted GIS Data does not give us any meaningful interpretation - information is all hidden in the ‘Description’ Column. Upon further reading, this could be the way the data is stored; as an html table for each observation. If we were to parse it, we will be able to get a nice table containing the information of a particular CC/KDG Point Event.

Read PreSchool locations as sf

ps_sf <- read_sf("data/geospatial/pre-schools-location-kml.kml")
cc_sf <- read_sf("data/geospatial/child-care-services-kml.kml")
kdg_sf <- read_sf("data/geospatial/kindergartens-kml.kml")

Write Functions to Extract and Bind attributes for each row

slicer <- function(sf,y) {
   sf %>% 
     slice(y) %>%
     pull(Description) %>%
     read_html() %>%
     html_node("table") %>%
     html_table(header = TRUE, trim = TRUE, dec = ".", fill = TRUE) %>%
     as_tibble(.name_repair = ~ make.names(c("Attribute", "Value"))) %>% 
     pivot_wider(names_from = Attribute, values_from = Value)  
}

flatten <- function(df) {
  attrs <- lapply(X = 1:nrow(df), 
                sf = df,
                FUN = slicer)
  
  df_attrs <- df %>% 
                bind_cols(bind_rows(attrs)) 
  df_attrs <- df_attrs[,c(3:ncol(df_attrs))]
  df_attrs_3414 <- st_transform(df_attrs, 3414) 
  return(df_attrs_3414)
}

ps_sf_attrs_3414 <- flatten(ps_sf)
cc_sf_attrs_3414 <- flatten(cc_sf)
kdg_sf_attrs_3414 <- flatten(kdg_sf)

Find Number of Point Events in each Planning Subzone

mpszpop2019$`PreSchool Count` <- c(lengths(st_intersects(mpszpop2019, ps_sf_attrs_3414)))
mpszpop2019$`CC Count` <- c(lengths(st_intersects(mpszpop2019, cc_sf_attrs_3414)))
mpszpop2019$`KDG Count` <- c(lengths(st_intersects(mpszpop2019, kdg_sf_attrs_3414)))

View Summary Statistics of the new Columns

summary(mpszpop2019$`PreSchool Count`)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    0.00    3.00    5.96    9.00   58.00

summary(mpszpop2019$`CC Count`)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   3.000   4.783   7.500  42.000

summary(mpszpop2019$`KDG Count`)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   1.000   1.387   2.000  17.000

Interestingly, this means that even subzones with no information for population is encoded as having 0 for the 3 Counts. While this may usually make sense, it does not provide an apple for apple comparison when it compare it to the boxmap for preschool demand (which contains a category for Missing Values). Process the Counts further by replacing Count values where Total Population in a Subzone = NA and Count = 0 with NA. This prevents the Count BoxMap from being distorted towards 0, and also prevents subzones with no populations but having preschools from being reset to 0.

Change Values in PreSchool Count Column

vec <- which(mpszpop2019$`PreSchool Count` == 0 & is.na(mpszpop2019$TOTAL)) #Vector where we want to replace PreSchool Count values with 0
mpszpop2019[vec, "PreSchool Count"] <- NA

vec_cc <- which(mpszpop2019$`CC Count` == 0 & is.na(mpszpop2019$TOTAL)) #Vector where we want to replace PreSchool Count values with 0
mpszpop2019[vec, "CC Count"] <- NA

vec_kdg <- which(mpszpop2019$`KDG Count` == 0 & is.na(mpszpop2019$TOTAL)) #Vector where we want to replace PreSchool Count values with 0
mpszpop2019[vec, "KDG Count"] <- NA

Plot Boxmap of the PreSchool Count in each Planning Subzone

boxmap("PreSchool Count", mpszpop2019,1, 0.5,0.5)

This map looks rather different from the one showing demand! While certain high demand areas are met with high PreSchool Counts, some areas (such as the NorthEastern Region of Singapore) seem to be lacking in terms of preschools. Let us examine it further in the next section.

4.3 Demand vs Supply

Plot Demand and Supply Boxmaps side by side - make a comparison

sup <- boxmap("CC_KDG", mpszpop2019,0.7, 0.42,0.42) 
dd <- boxmap("PreSchool Count", mpszpop2019,0.7,0.42,0.42) 
tmap_arrange(sup,dd)

Plotting them side by side, we do see that the demand outstrips the supply in certain regions, most prominently in the northeast region of Singapore. Some Areas have greater supply than others (E.g. SouthEast Area)

Plot CC and KDG by Subzones

dd_cc <- boxmap("CC Count", mpszpop2019,0.7,0.42,0.42)
dd_kdg <- boxmap("KDG Count", mpszpop2019,0.7,0.42,0.42)
tmap_arrange(dd_cc, dd_kdg)

From the above, we see that the KDG Count Numbers by Planning Subzones is more skewed than that of the CC Count - presence of more dark red outliers. An observational comparison to the demand boxmap (proxied by number of CC/KDG Eligible Children) show that the number of Childcare tends to be higher in regions with higher demand, while the Number of Kindergardens is not really matching the areas with the highest demand (Especially NorthEastern Region). This is however qualitative - use ECDA figures in next section to do quantitative analysis.

Use ECDA Figures to gauge Demand and Supply

ecda2019 <- ecda_stats[ecda_stats$Year == 2019,]
cc_avg <- ecda2019$Childcare_places / ecda2019$Childcare_centres
kdg_avg <- ecda2019$Kindergarten_places / ecda2019$Kindergarten_centres

mpszpop2019$CC_VACANCIES <- mpszpop2019$`CC Count` * cc_avg
mpszpop2019$KDG_VACANCIES <- mpszpop2019$`KDG Count` * kdg_avg
mpszpop2019$PS_VACANCIES <- mpszpop2019$CC_VACANCIES + mpszpop2019$KDG_VACANCIES # No choice, because we don't have more granular data for demand. Can only compare demand with the vacancies from both KDG and CC
mpszpop2019$NET_DEMAND <- mpszpop2019$CC_KDG - mpszpop2019$PS_VACANCIES

Plot chloropleth for NET_DEMAND

tm_shape(mpszpop2019) +
  tm_fill("NET_DEMAND", palette = "RdGy") +
  tm_layout(frame = FALSE,
            main.title = "Chloropleth of NET_DEMAND",
            main.title.position = "center") +
  tm_borders(alpha = 0.5)

## Variable(s) "NET_DEMAND" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

From the above map, we can see that the patterns that we have been more or less matches the qualitative analysis. The darker the areas, the more demand outstrips the supply for preschool services in the planning subzone.This is especially true in the NorthEastern region of Singapore. Other areas with high demand for PreSchool services such as pockets in the west, north and east do not have as positive net_demand as the NE region of Singapore.Regions in pink have more supply than demand.

5 Point Maps

Goal is to Plot an interactive proportional symbol map. Use the parsed tables from previous section as simply plotting the SpatialPointsDataFrames does not yield meaningful point plots.

tmap_mode("view")

## tmap mode set to interactive viewing

tm_shape(ps_svy21)+
tm_bubbles(col = "lightblue",
           size = 0.5,
           border.col = "black",
           border.lwd = 0.5)

No meaningful data present; use parsed sf tables from previous section

Performing Joins to ps_sf_attrs_3414 to create IS_KDG Column

ps_sf_attrs_iskdg <- st_join(ps_sf_attrs_3414, cc_sf_attrs_3414, by = c("CENTRE_CODE"))
ps_sf_attrs_iskdg$iskdg <- ""

vec_cc <- which(is.na(ps_sf_attrs_iskdg$NAME)) #Vector where we want to replace iskdg values with Kindergarden label
ps_sf_attrs_iskdg[vec_cc, "iskdg"] <- "Kindergarden" # Assign nulls with Kindergarden since the vector checks for NAs in the "Name" column, which will have data only if the join is successful
ps_sf_attrs_iskdg[which(ps_sf_attrs_iskdg$iskdg != "Kindergarden"), "iskdg"] <- "Childcare" #Assign rest as ChildCare

ps_sf_attrs_iskdg <- ps_sf_attrs_iskdg[, c("geometry", "CENTRE_NAME", "CENTRE_CODE", "ADDRESS", "iskdg")]

Plot tmap

tmap_mode("view")

## tmap mode set to interactive viewing

tm_shape(ps_sf_attrs_iskdg) +
  tm_bubbles(col = "iskdg",
             size = 0.5,
             border.col = "black",
             border.lwd = 0.5) +
  tm_facets(by = "iskdg",
            nrow = 1,
            sync = TRUE)

tmap_mode("plot")

## tmap mode set to plotting

Not much we can see from the above other than the fact that the presence of PreSchools is highly correlated with residential areas (We see areas like catchment areas, military bases, etc having no preschools). This to me means that the data is not randomly distributed - more clustered. Run 2nd Order Point Pattern Analysis to acertain visual interpretation.

6 2nd Order Spatial Point Patterns Analysis

6.1 Childcare

6.1.1 G-Function

CC_G = Gest(cc_ppp, correction = "border")
plot(CC_G)

Perform Complete Spatial Randomness Test
H0 = Distribution of childcare services are randomly distributed H1 = Distribution of childcare services are not randomly distributed

The null hypothesis will be rejected if p-value is smaller than alpha value of 0.01. (Lesser Simulations used as 1000 simulations taking too long)

CC_G.csr <- envelope(cc_ppp, Gest, nsim=99)

## Generating 99 simulations of CSR  ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
## 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76,
## 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98,  99.
## 
## Done.

plot(CC_G.csr)

Our observed G Function lies above that of the envelopes - Reject Null Hypothesis that the distribution of childcare services are randomly distributed. As it also lies abover the envelopes, the distribution can be said to follow that of a clustered pattern.

6.1.2 F-Function

CC_F = Fest(cc_ppp)
plot(CC_F)

Perform Complete Spatial Randomness Test
H0 = Distribution of childcare services are randomly distributed H1 = Distribution of childcare services are not randomly distributed

The null hypothesis will be rejected if p-value is smaller than alpha value of 0.01. (Lesser Simulations used as 1000 simulations taking too long)

CC_F.csr <- envelope(cc_ppp, Fest, nsim=99)

## Generating 99 simulations of CSR  ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
## 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76,
## 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98,  99.
## 
## Done.

plot(CC_F.csr)

Our observed F Function lies below that of the envelopes - Reject Null Hypothesis that the distribution of childcare services are randomly distributed. As it also lies below the envelopes, the distribution can be said to follow that of a clustered pattern.

6.1.3 K-Function

Aborted due to lack of memory

CC_K = Kest(cc_ppp, correction = "Ripley")
plot(CC_K, . -r ~ r, ylab= "K(d)-r", xlab = "d(m)")

Perform Complete Spatial Randomness Test H0 = Distribution of childcare services are randomly distributed H1 = Distribution of childcare services are not randomly distributed

The null hypothesis will be rejected if p-value is smaller than alpha value of 0.01. (Lesser Simulations used as 1000 simulations taking too long)

CC_K.csr <- envelope(cc_ppp, Kest, nsim = 99, rank = 1, glocal=TRUE)

plot(CC_K.csr, . - r ~ r, xlab="d", ylab="K(d)-r")

6.1.4 L-Function

Aborted due to lack of memory

CC_L = Lest(cc_ppp, correction = "Ripley")
plot(CC_L, . -r ~ r, 
     ylab= "L(d)-r", xlab = "d(m)")

Perform Complete Spatial Randomness Test H0 = Distribution of childcare services are randomly distributed H1 = Distribution of childcare services are not randomly distributed

The null hypothesis will be rejected if p-value is smaller than alpha value of 0.01. (Lesser Simulations used as 1000 simulations taking too long)

CC_L.csr <- envelope(cc_ppp, Lest, nsim = 99, rank = 1, glocal=TRUE)

plot(CC_L.csr, . - r ~ r, xlab="d", ylab="L(d)-r")

6.2 Kindergarden

6.2.1 G-Function

KDG_G = Gest(kdg_ppp, correction = "border")
plot(KDG_G)

Perform Complete Spatial Randomness Test

H0 = Distribution of kindergarden services are randomly distributed H1 = Distribution of kindergarden services are not randomly distributed

The null hypothesis will be rejected if p-value is smaller than alpha value of 0.01. (Lesser Simulations used as 1000 simulations taking too long)

KDG_G.csr <- envelope(kdg_ppp, Gest, nsim=99)

## Generating 99 simulations of CSR  ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
## 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76,
## 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98,  99.
## 
## Done.

plot(KDG_G.csr)

6.2.2 F-Function

KDG_F = Fest(kdg_ppp)
plot(KDG_F)

Perform Complete Spatial Randomness Test

H0 = Distribution of childcare services are randomly distributed H1 = Distribution of childcare services are not randomly distributed

The null hypothesis will be rejected if p-value is smaller than alpha value of 0.01. (Lesser Simulations used as 1000 simulations taking too long)

KDG_F.csr <- envelope(kdg_ppp, Fest, nsim=99)

## Generating 99 simulations of CSR  ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
## 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76,
## 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98,  99.
## 
## Done.

plot(KDG_F.csr)

6.2.3 K-Function & L-Function

Aborted due to lack of memory

7 Kernel Density Maps

7.1 Childcare

7.1.1 Automatic Bandwidth Selection Method

Rescale cc_ppp for increased interpretability

cc_ppp.km <- rescale(cc_ppp, 1000, "km")

Try bw.ppl method since our data is clustered

kde_cc.ppl <- density(cc_ppp.km, sigma=bw.ppl, edge=TRUE, kernel="gaussian")
plot(kde_cc.ppl, main = "bw.ppl")

Density of Childcares is the relatively well evenly distributed in the residential regions.

Try other kernel methods

par(mfrow=c(1,2))
plot(density(cc_ppp.km, sigma=bw.ppl, edge=TRUE, kernel="gaussian"), main="Gaussian")
plot(density(cc_ppp.km, sigma=bw.ppl, edge=TRUE, kernel="epanechnikov"), main="Epanechnikov")

par(mfrow=c(1,2))
plot(density(cc_ppp.km, sigma=bw.ppl, edge=TRUE, kernel="quartic"), main="Quartic")
plot(density(cc_ppp.km, sigma=bw.ppl, edge=TRUE, kernel="disc"), main="Disc")

7.1.2 Adaptive Bandwidth Selection Method

kde_cc_adaptive <- adaptive.density(cc_ppp.km, method="kernel")
plot(kde_cc_adaptive)

par(mfrow=c(1,2))
plot(kde_cc.ppl, main = "Fixed bandwidth")
plot(kde_cc_adaptive, main = "Adaptive bandwidth")

Use fixed method as Singapore’s Spatial point patterns does not exhibit that much geographical skew distribution.

7.1.3 Convert KDE output to grid

gridded_kde_cc_bw <- as.SpatialGridDataFrame.im(kde_cc.ppl)
spplot(gridded_kde_cc_bw)

7.1.4 Convert gridded output to raster

kde_cc_bw_raster <- raster(gridded_kde_cc_bw)
kde_cc_bw_raster

## class      : RasterLayer 
## dimensions : 128, 128, 16384  (nrow, ncol, ncell)
## resolution : 0.419757, 0.2695907  (x, y)
## extent     : 2.667538, 56.39644, 15.74872, 50.25633  (xmin, xmax, ymin, ymax)
## crs        : NA 
## source     : memory
## names      : v 
## values     : -1.388286e-14, 22.41333  (min, max)

#Assign projection system
projection(kde_cc_bw_raster) <- CRS("+init=EPSG:3414")
kde_cc_bw_raster

## class      : RasterLayer 
## dimensions : 128, 128, 16384  (nrow, ncol, ncell)
## resolution : 0.419757, 0.2695907  (x, y)
## extent     : 2.667538, 56.39644, 15.74872, 50.25633  (xmin, xmax, ymin, ymax)
## crs        : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +units=m +no_defs 
## source     : memory
## names      : v 
## values     : -1.388286e-14, 22.41333  (min, max)

7.1.5 Visualise in tmap

tmap_mode('view')

## tmap mode set to interactive viewing

tm_basemap("OpenStreetMap")+tm_shape(kde_cc_bw_raster) + tm_raster('v')

## Variable(s) "v" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

7.2 Kindergarden

7.2.1 Automatic Bandwidth Selection Method

Rescale cc_ppp for increased interpretability

kdg_ppp.km <- rescale(kdg_ppp, 1000, "km")

Try bw.ppl method since our data is clustered

kde_kdg.ppl <- density(kdg_ppp.km, sigma=bw.ppl, edge=TRUE, kernel="gaussian")
plot(kde_kdg.ppl, main = "bw.ppl")

Density of Kindergardens is the highest in the SouthEastern Region of Singapore.

Try other kernel methods

par(mfrow=c(1,2))
plot(density(kdg_ppp.km, sigma=bw.ppl, edge=TRUE, kernel="gaussian"), main="Gaussian")
plot(density(kdg_ppp.km, sigma=bw.ppl, edge=TRUE, kernel="epanechnikov"), main="Epanechnikov")

par(mfrow=c(1,2))
plot(density(kdg_ppp.km, sigma=bw.ppl, edge=TRUE, kernel="quartic"), main="Quartic")
plot(density(kdg_ppp.km, sigma=bw.ppl, edge=TRUE, kernel="disc"), main="Disc")

7.2.2 Adaptive Bandwidth Selection Method

kde_kdg_adaptive <- adaptive.density(kdg_ppp.km, method="kernel")
plot(kde_kdg_adaptive)

par(mfrow=c(1,2))
plot(kde_kdg.ppl, main = "Fixed bandwidth")
plot(kde_kdg_adaptive, main = "Adaptive bandwidth")

Use fixed method as Singapore’s Spatial point patterns does not exhibit that much geographical skew distribution.

7.2.3 Convert KDE output to grid

gridded_kde_kdg_bw <- as.SpatialGridDataFrame.im(kde_kdg.ppl)
spplot(gridded_kde_kdg_bw)

7.2.4 Convert gridded output to raster

kde_kdg_bw_raster <- raster(gridded_kde_kdg_bw)
kde_kdg_bw_raster

## class      : RasterLayer 
## dimensions : 128, 128, 16384  (nrow, ncol, ncell)
## resolution : 0.419757, 0.2695907  (x, y)
## extent     : 2.667538, 56.39644, 15.74872, 50.25633  (xmin, xmax, ymin, ymax)
## crs        : NA 
## source     : memory
## names      : v 
## values     : -1.995518e-15, 11.49178  (min, max)

#Assign projection system
projection(kde_kdg_bw_raster) <- CRS("+init=EPSG:3414")
kde_kdg_bw_raster

## class      : RasterLayer 
## dimensions : 128, 128, 16384  (nrow, ncol, ncell)
## resolution : 0.419757, 0.2695907  (x, y)
## extent     : 2.667538, 56.39644, 15.74872, 50.25633  (xmin, xmax, ymin, ymax)
## crs        : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +units=m +no_defs 
## source     : memory
## names      : v 
## values     : -1.995518e-15, 11.49178  (min, max)

7.2.5 Visualise in tmap

tmap_mode('view')

## tmap mode set to interactive viewing

tm_basemap("OpenStreetMap")+tm_shape(kde_kdg_bw_raster) + tm_raster('v')

## Variable(s) "v" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

8 Advantages of KDE over Point Maps

KDE Calculates the density of features in a neighbourhood, wheares the Point Maps show the Point Events on the Map.

KDE quantifies and shows us where points may be concentrated in a given area. Given a KDE and a Point Map, we are more likely to be able to easily pick out regions where there are high concentrations of Kindergardens/Childcare as opposed to a Point Map (which can be very hard to see should there be too many points)
KDE allows us to produce a continuous surface from a point layer, and also identify hot spots

9 Limitations

I used Average Enrolment per Kindergarden/Childcare in Section 4.3 as that was the only data set that I could find regarding enrolment. However, enrolment is likely to vary accordingly to the different Kindergarden/Childcare Size.
I was also unable to proxy for demand for Kindergarden and Childcare only. This is because the demand data is not granular enough. Thus I could only compare and contrast the level of demand and supply for preschool services as a whole
The Age Groups according to the ECDA website for Childcare and Kindergarden eligible children is from 2 years to 7 years old. For this analysis, I used children from 0 to 8 years of age from the demographic data. This causes the analysis to be slightly on the more conservative side as opposed to if I had only 2 years to 7 years old data.
I lacked computing power and was unable to compure the F and L functions. However, the other 2 functions seem to point to the fact that the point events are not randomly distributed

IS415_Take-Home_EX1

Sean Lee

20 September 2020

1 Overview

1.1 Background Context

1.2 Objectives

2 Import Packages & Libraries

3 Import Data

3.1 Import Aspatial Data

3.1.1 Get a brief understanding of the aspatial data

3.1.2 Data Manipulation - Reshape popdata df

3.1.3 Deal with NAs

3.2 Import Geospatial Data

3.2.1 Check Projection Systems of GIS Data

3.2.2 Reprojecting PreSchool GIS Data

3.2.3 Examine imported GIS Data

3.2.4 Examine Preschool GIS Data

3.2.5 Convert to generic sp format

3.2.6 Convert to generic ppp format

3.2.7 Processing Duplicates

4 Analysing Demand and Supply of CC & KDG Services by Planning Subzone

4.1 Demand

4.2 Supply

4.3 Demand vs Supply

5 Point Maps

6 2nd Order Spatial Point Patterns Analysis

6.1 Childcare

6.1.1 G-Function

6.1.2 F-Function

6.1.3 K-Function

6.1.4 L-Function

6.2 Kindergarden

6.2.1 G-Function

6.2.2 F-Function

6.2.3 K-Function & L-Function

7 Kernel Density Maps

7.1 Childcare

7.1.1 Automatic Bandwidth Selection Method

7.1.2 Adaptive Bandwidth Selection Method

7.1.3 Convert KDE output to grid

7.1.4 Convert gridded output to raster

7.1.5 Visualise in tmap

7.2 Kindergarden

7.2.1 Automatic Bandwidth Selection Method

7.2.2 Adaptive Bandwidth Selection Method

7.2.3 Convert KDE output to grid

7.2.4 Convert gridded output to raster

7.2.5 Visualise in tmap

8 Advantages of KDE over Point Maps

9 Limitations