IS415 Geospatial Analytics and Application

Instructor: Dr. Kam Tin Seong.

Assoc. Professor of Information Systems (Practice)

1. Overview

Task - analyze the geographic distribution of childcare and kindergarten services in Singapore by using appropriate map analysis & spatial point patterns analysis techniques

Section A: The supply and demand of childcare and kindergarten services by planning subzone

Exploratory Spatial Data Analysis - choropleth map to reveal supply and demand of childcare and kindergarten services at plannning subzone level
Analytics Mapping - reveal relationship between childcare and kindergarten services at planning subzone level
Geocommunication - drawing statistical conclusion from 2 & 3

Section B: Spatial Point Pattern Analysis

Exploratory Spatial Data Analysis - point maps of location of childcare and kindergarten services in separate point maps
2nd order spatial point patterns analysis (null hypothesis, alternative hypothesis, and confidence level)
Kernel density estimation (KDE) maps of childcare and kindergarten services - display on openstreetmap of Singapore, draw statistical conclusions and compare the advantages of KDE over point maps.

1.1. Data sets used

More about the datasets used:

According to Early Childhood Development Agency (ECDA), Singapore preschool landscape comprises both kindergartens and childcare centres.
Kindergartens provide pre-school developmental programme for children from about 2 years to below 7 years of age.
Childcare centres provide child care services and pre-school developmental programmes for children aged between 18 months and below 7 years old.Several centres also provide infant care programmes for infants aged between 2 and 18 months old.
Since the age range overlaps for childcare and kindergarten services, this project will assume that childcare services are for children aged 0 to 2 years, while kindergarten services are for children aged 3 to 6
Since the population data set (Singapore Residents by Planning AreaSubzone, Age Group, Sex and Type of Dwelling, June 2011-2019) has a variable ‘AG’ that have ages binned in 4 years, namely ‘0_to_4’, ‘5_to_9’ and so on, the AG count will be divided in half for the sake of getting the count for children aged 0 to 2 years and children aged 3 to 6.

1.2. Install & Load R packages

R packages used:

sf
tidyverse
ggplot2
ggthemes
plotly
tmap
rgdal
rvest
maptools
mapview
raster
spatstat

packages = c('sf', 'tidyverse', 'ggplot2', 'ggthemes', 'plotly', 'tmap','rgdal','rvest', 'maptools','mapview', 'raster', 'spatstat')
for (p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

2. Section A: Exploratory Spatial Data Analysis

Choropleth mapping to reveal supply and demand of childcare and kindergarten services at plannning subzone level

2.1 Importing the geospatial data

Planning area (study area)

We will read the available data into both sp object and sf object although we will be using the sf object first.

mpsz_sp <- readOGR(dsn = "data/geospatial", layer= "MP14_SUBZONE_WEB_PL")

## OGR data source with driver: ESRI Shapefile 
## Source: "D:\SMU Year 3 Sem 1\IS415 Geospatial Analytics and Application\Take-home_Ex01\data\geospatial", layer: "MP14_SUBZONE_WEB_PL"
## with 323 features
## It has 15 fields

mpsz_sf <- st_read(dsn = "data/geospatial", layer= "MP14_SUBZONE_WEB_PL")

## Reading layer `MP14_SUBZONE_WEB_PL' from data source `D:\SMU Year 3 Sem 1\IS415 Geospatial Analytics and Application\Take-home_Ex01\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 323 features and 15 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS:  SVY21

summary(mpsz_sf)

##     OBJECTID       SUBZONE_NO      SUBZONE_N          SUBZONE_C        
##  Min.   :  1.0   Min.   : 1.000   Length:323         Length:323        
##  1st Qu.: 81.5   1st Qu.: 2.000   Class :character   Class :character  
##  Median :162.0   Median : 4.000   Mode  :character   Mode  :character  
##  Mean   :162.0   Mean   : 4.625                                        
##  3rd Qu.:242.5   3rd Qu.: 6.500                                        
##  Max.   :323.0   Max.   :17.000                                        
##     CA_IND           PLN_AREA_N         PLN_AREA_C          REGION_N        
##  Length:323         Length:323         Length:323         Length:323        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##    REGION_C           INC_CRC            FMEL_UPD_D             X_ADDR     
##  Length:323         Length:323         Min.   :2014-12-05   Min.   : 5093  
##  Class :character   Class :character   1st Qu.:2014-12-05   1st Qu.:21864  
##  Mode  :character   Mode  :character   Median :2014-12-05   Median :28465  
##                                        Mean   :2014-12-05   Mean   :27257  
##                                        3rd Qu.:2014-12-05   3rd Qu.:31674  
##                                        Max.   :2014-12-05   Max.   :50425  
##      Y_ADDR        SHAPE_Leng        SHAPE_Area                geometry  
##  Min.   :19579   Min.   :  871.5   Min.   :   39438   MULTIPOLYGON :323  
##  1st Qu.:31776   1st Qu.: 3709.6   1st Qu.:  628261   epsg:NA      :  0  
##  Median :35113   Median : 5211.9   Median : 1229894   +proj=tmer...:  0  
##  Mean   :36106   Mean   : 6524.4   Mean   : 2420882                      
##  3rd Qu.:39869   3rd Qu.: 6942.6   3rd Qu.: 2106483                      
##  Max.   :49553   Max.   :68083.9   Max.   :69748299

Checking if there are NA values in the planning area data set:

There are no NA values.

any(is.na(mpsz_sf))

## [1] FALSE

Plotting Base Map (study area)

tm_shape(mpsz_sf)+
  tm_basemap('OpenStreetMap')+
  tm_polygons("PLN_AREA_N")+
  tm_text("SUBZONE_N", size = 0.1)+ 
  tm_borders(lwd=0.5)+
  tm_layout(legend.height = 0.45, 
            legend.width = 0.35,
            frame = FALSE) +
  tm_compass(type="8star", size = 2) +
  tm_scale_bar(width = 0.15) +
  tm_credits("www.data.gov.sg", 
             position = c("left", "bottom"))

This is Singapore with the different planning subzones shown, this assignment will be based on this map.

Childcare & Kindergarten services

Firstly, we read the child care & kindergarten KML data into R as a Spatial Object (sp) by using readOGR().

childcare_sp <- readOGR("data/geospatial/child-care-services-kml.kml", "CHILDCARE")

## OGR data source with driver: KML 
## Source: "D:\SMU Year 3 Sem 1\IS415 Geospatial Analytics and Application\Take-home_Ex01\data\geospatial\child-care-services-kml.kml", layer: "CHILDCARE"
## with 1545 features
## It has 2 fields

kindergarten_sp <- readOGR("data/geospatial/kindergartens-kml.kml", "KINDERGARTENS")

## OGR data source with driver: KML 
## Source: "D:\SMU Year 3 Sem 1\IS415 Geospatial Analytics and Application\Take-home_Ex01\data\geospatial\kindergartens-kml.kml", layer: "KINDERGARTENS"
## with 448 features
## It has 2 fields

childcare_sf <- st_read("data/geospatial/child-care-services-kml.kml", "CHILDCARE")

## Reading layer `CHILDCARE' from data source `D:\SMU Year 3 Sem 1\IS415 Geospatial Analytics and Application\Take-home_Ex01\data\geospatial\child-care-services-kml.kml' using driver `KML'
## Simple feature collection with 1545 features and 2 fields
## geometry type:  POINT
## dimension:      XYZ
## bbox:           xmin: 103.6824 ymin: 1.248403 xmax: 103.9897 ymax: 1.462134
## z_range:        zmin: 0 zmax: 0
## geographic CRS: WGS 84

kindergarten_sf <- st_read("data/geospatial/kindergartens-kml.kml", "KINDERGARTENS")

## Reading layer `KINDERGARTENS' from data source `D:\SMU Year 3 Sem 1\IS415 Geospatial Analytics and Application\Take-home_Ex01\data\geospatial\kindergartens-kml.kml' using driver `KML'
## Simple feature collection with 448 features and 2 fields
## geometry type:  POINT
## dimension:      XYZ
## bbox:           xmin: 103.6887 ymin: 1.247759 xmax: 103.9717 ymax: 1.455452
## z_range:        zmin: 0 zmax: 0
## geographic CRS: WGS 84

preschool_sf <- st_read("data/geospatial/pre-schools-location-kml.kml", "PRESCHOOLS_LOCATION")

## Reading layer `PRESCHOOLS_LOCATION' from data source `D:\SMU Year 3 Sem 1\IS415 Geospatial Analytics and Application\Take-home_Ex01\data\geospatial\pre-schools-location-kml.kml' using driver `KML'
## Simple feature collection with 1925 features and 2 fields
## geometry type:  POINT
## dimension:      XYZ
## bbox:           xmin: 103.6824 ymin: 1.247759 xmax: 103.9897 ymax: 1.462134
## z_range:        zmin: 0 zmax: 0
## geographic CRS: WGS 84

To make the data more accurate, we should check the coordinate reference systems first and change the projection systems to EPSG 3414 to make it more accurate. We use st_crs() function to check sf object’s crs and crs() function to check sp object’s crs.

st_crs(mpsz_sf)
st_crs(childcare_sf)
st_crs(kindergarten_sf)
st_crs(preschool_sf)

crs(mpsz_sp)
crs(childcare_sp)
crs(kindergarten_sp)

Some of the sf data are not in SVY21 and EPSG3414 so we will use st_transform() & spTransform() functions to change it to EPSG3414.

mpsz_sf_svy21 <- st_transform(mpsz_sf, 3414)
childcare_sf_svy21 <- st_transform(childcare_sf,3414)
kindergarten_sf_svy21 <- st_transform(kindergarten_sf,3414)
preschool_sf_svy21 <- st_transform(preschool_sf, 3414)

mpsz_sp_svy21 <- spTransform(mpsz_sp, CRS("+init=epsg:3414"))
childcare_sp_svy21 <- spTransform(childcare_sp, CRS("+init=epsg:3414"))
kindergarten_sp_svy21 <- spTransform(kindergarten_sp, CRS("+init=epsg:3414"))

Let’s check the CRS again for the sf objects.

st_crs(mpsz_sf_svy21)
st_crs(childcare_sf_svy21)
st_crs(kindergarten_sf_svy21)
st_crs(preschool_sf_svy21)

Let’s check the CRS again for the sp objects.

crs(mpsz_sp_svy21)

## CRS arguments:
##  +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1
## +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +units=m +no_defs

crs(childcare_sp_svy21)

## CRS arguments:
##  +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1
## +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +units=m +no_defs

crs(kindergarten_sp_svy21)

## CRS arguments:
##  +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1
## +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +units=m +no_defs

Now that projection system is done, let’s move on to displaying the KML file and see what we have.

head(childcare_sf_svy21, n=1)

## Simple feature collection with 1 feature and 2 fields
## geometry type:  POINT
## dimension:      XYZ
## bbox:           xmin: 27976.73 ymin: 45716.7 xmax: 27976.73 ymax: 45716.7
## z_range:        zmin: 0 zmax: 0
## projected CRS:  SVY21 / Singapore TM
##    Name
## 1 kml_1
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Description
## 1 <center><table><tr><th colspan='2' align='center'><em>Attributes</em></th></tr><tr bgcolor="#E3E3F3"> <th>ADDRESSBLOCKHOUSENUMBER</th> <td></td> </tr><tr bgcolor=""> <th>ADDRESSBUILDINGNAME</th> <td></td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESSPOSTALCODE</th> <td>760742</td> </tr><tr bgcolor=""> <th>ADDRESSSTREETNAME</th> <td>742, YISHUN AVENUE 5, #01 - 470, SINGAPORE 760742</td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESSTYPE</th> <td></td> </tr><tr bgcolor=""> <th>DESCRIPTION</th> <td>Child Care Services</td> </tr><tr bgcolor="#E3E3F3"> <th>HYPERLINK</th> <td></td> </tr><tr bgcolor=""> <th>LANDXADDRESSPOINT</th> <td>0</td> </tr><tr bgcolor="#E3E3F3"> <th>LANDYADDRESSPOINT</th> <td>0</td> </tr><tr bgcolor=""> <th>NAME</th> <td>AVERBEL CHILD DEVELOPMENT CENTRE PTE LTD</td> </tr><tr bgcolor="#E3E3F3"> <th>PHOTOURL</th> <td></td> </tr><tr bgcolor=""> <th>ADDRESSFLOORNUMBER</th> <td></td> </tr><tr bgcolor="#E3E3F3"> <th>INC_CRC</th> <td>AEA27114446235CE</td> </tr><tr bgcolor=""> <th>FMEL_UPD_D</th> <td>20200826094036</td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESSUNITNUMBER</th> <td></td> </tr></table></center>
##                       geometry
## 1 POINT Z (27976.73 45716.7 0)

head(kindergarten_sf_svy21, n=1)

## Simple feature collection with 1 feature and 2 fields
## geometry type:  POINT
## dimension:      XYZ
## bbox:           xmin: 28847.97 ymin: 40124.9 xmax: 28847.97 ymax: 40124.9
## z_range:        zmin: 0 zmax: 0
## projected CRS:  SVY21 / Singapore TM
##    Name
## 1 kml_1
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   Description
## 1 <center><table><tr><th colspan='2' align='center'><em>Attributes</em></th></tr><tr bgcolor="#E3E3F3"> <th>ADDRESSBLOCKHOUSENUMBER</th> <td></td> </tr><tr bgcolor=""> <th>ADDRESSBUILDINGNAME</th> <td></td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESSFLOORNUMBER</th> <td></td> </tr><tr bgcolor=""> <th>ADDRESSPOSTALCODE</th> <td>560644</td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESSSTREETNAME</th> <td>644 Ang Mo Kio Ave 4  #01-850 S(560644)</td> </tr><tr bgcolor=""> <th>ADDRESSTYPE</th> <td></td> </tr><tr bgcolor="#E3E3F3"> <th>DESCRIPTION</th> <td>Kindergartens</td> </tr><tr bgcolor=""> <th>HYPERLINK</th> <td></td> </tr><tr bgcolor="#E3E3F3"> <th>LANDXADDRESSPOINT</th> <td>0</td> </tr><tr bgcolor=""> <th>LANDYADDRESSPOINT</th> <td>0</td> </tr><tr bgcolor="#E3E3F3"> <th>NAME</th> <td>PCF Sparkletots Preschool @ Yio Chu Kang Blk 644 (KN)</td> </tr><tr bgcolor=""> <th>PHOTOURL</th> <td></td> </tr><tr bgcolor="#E3E3F3"> <th>INC_CRC</th> <td>904D106E26156265</td> </tr><tr bgcolor=""> <th>FMEL_UPD_D</th> <td>20200813015028</td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESSUNITNUMBER</th> <td></td> </tr></table></center>
##                       geometry
## 1 POINT Z (28847.97 40124.9 0)

head(preschool_sf_svy21, n=1)

## Simple feature collection with 1 feature and 2 fields
## geometry type:  POINT
## dimension:      XYZ
## bbox:           xmin: 13258.34 ymin: 35611.04 xmax: 13258.34 ymax: 35611.04
## z_range:        zmin: 0 zmax: 0
## projected CRS:  SVY21 / Singapore TM
##    Name
## 1 kml_1
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             Description
## 1 <center><table><tr><th colspan='2' align='center'><em>Attributes</em></th></tr><tr bgcolor="#E3E3F3"> <th>CENTRE_NAME</th> <td>BRILLIANT TOTS PTE. LTD.</td> </tr><tr bgcolor=""> <th>CENTRE_CODE</th> <td>PT9334</td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESS</th> <td>610, JURONG WEST STREET 65, #01 - 534, S 640610</td> </tr><tr bgcolor=""> <th>POSTAL_CODE</th> <td>640610</td> </tr><tr bgcolor="#E3E3F3"> <th>INC_CRC</th> <td>0523C7904478A63D</td> </tr><tr bgcolor=""> <th>FMEL_UPD_D</th> <td>20200812235534</td> </tr></table></center>
##                        geometry
## 1 POINT Z (13258.34 35611.04 0)

You can see that all the variables that we need are lumped into the Description variable. Hence, we need to split the Description variable to create new columns.

2.2 Tidying KML data

We use the read_html(), html_node() and html_table() functions from the rvest package to split the table in the Description variable and pivot_wider() function to create new columns of Attribute and Value. Credits: Stack Overflow

childcare KML data

attributes <- lapply(X = 1:nrow(childcare_sf_svy21), 
                     FUN = function(x) {

                       childcare_sf_svy21 %>% 
                         slice(x) %>%
                         pull(Description) %>%
                         read_html() %>%
                         html_node("table") %>%
                         html_table(header = TRUE, trim = TRUE, dec = ".", fill = TRUE) %>%
                         as_tibble(.name_repair = ~ make.names(c("Attribute", "Value"))) %>% 
                         pivot_wider(names_from = Attribute, values_from = Value)

                     })

Next, we create a new dataframe that adds new columns from the different Attributes and Values in the respective columns and deselect Description to make the data tidy.

childcare_attr <- childcare_sf_svy21 %>%
  bind_cols(bind_rows(attributes)) %>%
  dplyr::select(-Description)

Let’s see the new sf childcare dataframe.

head(childcare_attr, n=1)

## Simple feature collection with 1 feature and 16 fields
## geometry type:  POINT
## dimension:      XYZ
## bbox:           xmin: 27976.73 ymin: 45716.7 xmax: 27976.73 ymax: 45716.7
## z_range:        zmin: 0 zmax: 0
## projected CRS:  SVY21 / Singapore TM
##    Name ADDRESSBLOCKHOUSENUMBER ADDRESSBUILDINGNAME ADDRESSPOSTALCODE
## 1 kml_1                                                        760742
##                                   ADDRESSSTREETNAME ADDRESSTYPE
## 1 742, YISHUN AVENUE 5, #01 - 470, SINGAPORE 760742            
##           DESCRIPTION HYPERLINK LANDXADDRESSPOINT LANDYADDRESSPOINT
## 1 Child Care Services                           0                 0
##                                       NAME PHOTOURL ADDRESSFLOORNUMBER
## 1 AVERBEL CHILD DEVELOPMENT CENTRE PTE LTD                            
##            INC_CRC     FMEL_UPD_D ADDRESSUNITNUMBER
## 1 AEA27114446235CE 20200826094036                  
##                       geometry
## 1 POINT Z (27976.73 45716.7 0)

any(is.na(childcare_attr))

## [1] FALSE

kindergarten KML data

attributes <- lapply(X = 1:nrow(kindergarten_sf_svy21), 
                     FUN = function(x) {

                       kindergarten_sf_svy21 %>% 
                         slice(x) %>%
                         pull(Description) %>%
                         read_html() %>%
                         html_node("table") %>%
                         html_table(header = TRUE, trim = TRUE, dec = ".", fill = TRUE) %>%
                         as_tibble(.name_repair = ~ make.names(c("Attribute", "Value"))) %>% 
                         pivot_wider(names_from = Attribute, values_from = Value)

                     })

kindergarten_attr <- kindergarten_sf_svy21 %>%
  bind_cols(bind_rows(attributes)) %>%
  dplyr::select(-Description)

Let’s see the new sf kindergarten dataframe.

head(kindergarten_attr, n=1)

## Simple feature collection with 1 feature and 16 fields
## geometry type:  POINT
## dimension:      XYZ
## bbox:           xmin: 28847.97 ymin: 40124.9 xmax: 28847.97 ymax: 40124.9
## z_range:        zmin: 0 zmax: 0
## projected CRS:  SVY21 / Singapore TM
##    Name ADDRESSBLOCKHOUSENUMBER ADDRESSBUILDINGNAME ADDRESSFLOORNUMBER
## 1 kml_1                                                               
##   ADDRESSPOSTALCODE                       ADDRESSSTREETNAME ADDRESSTYPE
## 1            560644 644 Ang Mo Kio Ave 4  #01-850 S(560644)            
##     DESCRIPTION HYPERLINK LANDXADDRESSPOINT LANDYADDRESSPOINT
## 1 Kindergartens                           0                 0
##                                                    NAME PHOTOURL
## 1 PCF Sparkletots Preschool @ Yio Chu Kang Blk 644 (KN)         
##            INC_CRC     FMEL_UPD_D ADDRESSUNITNUMBER
## 1 904D106E26156265 20200813015028                  
##                       geometry
## 1 POINT Z (28847.97 40124.9 0)

any(is.na(kindergarten_attr))

## [1] FALSE

preschool KML data

attributes <- lapply(X = 1:nrow(preschool_sf_svy21), 
                     FUN = function(x) {

                       preschool_sf_svy21 %>% 
                         slice(x) %>%
                         pull(Description) %>%
                         read_html() %>%
                         html_node("table") %>%
                         html_table(header = TRUE, trim = TRUE, dec = ".", fill = TRUE) %>%
                         as_tibble(.name_repair = ~ make.names(c("Attribute", "Value"))) %>% 
                         pivot_wider(names_from = Attribute, values_from = Value)

                     })

preschool_attr <- preschool_sf_svy21 %>%
  bind_cols(bind_rows(attributes)) %>%
  dplyr::select(-Description)

Let’s see the new sf preschool dataframe.

head(preschool_attr, n=1)

## Simple feature collection with 1 feature and 7 fields
## geometry type:  POINT
## dimension:      XYZ
## bbox:           xmin: 13258.34 ymin: 35611.04 xmax: 13258.34 ymax: 35611.04
## z_range:        zmin: 0 zmax: 0
## projected CRS:  SVY21 / Singapore TM
##    Name              CENTRE_NAME CENTRE_CODE
## 1 kml_1 BRILLIANT TOTS PTE. LTD.      PT9334
##                                           ADDRESS POSTAL_CODE          INC_CRC
## 1 610, JURONG WEST STREET 65, #01 - 534, S 640610      640610 0523C7904478A63D
##       FMEL_UPD_D                      geometry
## 1 20200812235534 POINT Z (13258.34 35611.04 0)

any(is.na(preschool_attr), n=1)

## [1] TRUE

summary(childcare_attr)

##      Name           ADDRESSBLOCKHOUSENUMBER ADDRESSBUILDINGNAME
##  Length:1545        Length:1545             Length:1545        
##  Class :character   Class :character        Class :character   
##  Mode  :character   Mode  :character        Mode  :character   
##  ADDRESSPOSTALCODE  ADDRESSSTREETNAME  ADDRESSTYPE        DESCRIPTION       
##  Length:1545        Length:1545        Length:1545        Length:1545       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##   HYPERLINK         LANDXADDRESSPOINT  LANDYADDRESSPOINT      NAME          
##  Length:1545        Length:1545        Length:1545        Length:1545       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##    PHOTOURL         ADDRESSFLOORNUMBER   INC_CRC           FMEL_UPD_D       
##  Length:1545        Length:1545        Length:1545        Length:1545       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##  ADDRESSUNITNUMBER           geometry   
##  Length:1545        POINT Z      :1545  
##  Class :character   epsg:3414    :   0  
##  Mode  :character   +proj=tmer...:   0

summary(kindergarten_attr)

##      Name           ADDRESSBLOCKHOUSENUMBER ADDRESSBUILDINGNAME
##  Length:448         Length:448              Length:448         
##  Class :character   Class :character        Class :character   
##  Mode  :character   Mode  :character        Mode  :character   
##  ADDRESSFLOORNUMBER ADDRESSPOSTALCODE  ADDRESSSTREETNAME  ADDRESSTYPE       
##  Length:448         Length:448         Length:448         Length:448        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##  DESCRIPTION         HYPERLINK         LANDXADDRESSPOINT  LANDYADDRESSPOINT 
##  Length:448         Length:448         Length:448         Length:448        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##      NAME             PHOTOURL           INC_CRC           FMEL_UPD_D       
##  Length:448         Length:448         Length:448         Length:448        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##  ADDRESSUNITNUMBER           geometry  
##  Length:448         POINT Z      :448  
##  Class :character   epsg:3414    :  0  
##  Mode  :character   +proj=tmer...:  0

summary(preschool_attr)

##      Name           CENTRE_NAME        CENTRE_CODE          ADDRESS         
##  Length:1925        Length:1925        Length:1925        Length:1925       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##  POSTAL_CODE          INC_CRC           FMEL_UPD_D                 geometry   
##  Length:1925        Length:1925        Length:1925        POINT Z      :1925  
##  Class :character   Class :character   Class :character   epsg:3414    :   0  
##  Mode  :character   Mode  :character   Mode  :character   +proj=tmer...:   0

Let’s see the sp objects.

It’s also lumped into the Description variable, but we will leave it first since we alr processed the sf KML data.

head(childcare_sp_svy21, n=1)

##    Name
## 1 kml_1
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Description
## 1 <center><table><tr><th colspan='2' align='center'><em>Attributes</em></th></tr><tr bgcolor="#E3E3F3"> <th>ADDRESSBLOCKHOUSENUMBER</th> <td></td> </tr><tr bgcolor=""> <th>ADDRESSBUILDINGNAME</th> <td></td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESSPOSTALCODE</th> <td>760742</td> </tr><tr bgcolor=""> <th>ADDRESSSTREETNAME</th> <td>742, YISHUN AVENUE 5, #01 - 470, SINGAPORE 760742</td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESSTYPE</th> <td></td> </tr><tr bgcolor=""> <th>DESCRIPTION</th> <td>Child Care Services</td> </tr><tr bgcolor="#E3E3F3"> <th>HYPERLINK</th> <td></td> </tr><tr bgcolor=""> <th>LANDXADDRESSPOINT</th> <td>0</td> </tr><tr bgcolor="#E3E3F3"> <th>LANDYADDRESSPOINT</th> <td>0</td> </tr><tr bgcolor=""> <th>NAME</th> <td>AVERBEL CHILD DEVELOPMENT CENTRE PTE LTD</td> </tr><tr bgcolor="#E3E3F3"> <th>PHOTOURL</th> <td></td> </tr><tr bgcolor=""> <th>ADDRESSFLOORNUMBER</th> <td></td> </tr><tr bgcolor="#E3E3F3"> <th>INC_CRC</th> <td>AEA27114446235CE</td> </tr><tr bgcolor=""> <th>FMEL_UPD_D</th> <td>20200826094036</td> </tr><tr bgcolor="#E3E3F3"> <th>ADDRESSUNITNUMBER</th> <td></td> </tr></table></center>

2.3 Importing the aspatial data (csv file)

Singapore Population

population <- read_csv('data/aspatial/respopagesextod2011to2019.csv')

head(population)

## # A tibble: 6 x 7
##   PA        SZ               AG    Sex   TOD                           Pop  Time
##   <chr>     <chr>            <chr> <chr> <chr>                       <dbl> <dbl>
## 1 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 1- and 2-Room Flats         0  2011
## 2 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 3-Room Flats               10  2011
## 3 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 4-Room Flats               30  2011
## 4 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 5-Room and Executive F~    50  2011
## 5 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HUDC Flats (excluding thos~     0  2011
## 6 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males Landed Properties               0  2011

distinct(population, SZ)

## # A tibble: 323 x 1
##    SZ                    
##    <chr>                 
##  1 Ang Mo Kio Town Centre
##  2 Cheng San             
##  3 Chong Boon            
##  4 Kebun Bahru           
##  5 Sembawang Hills       
##  6 Shangri-La            
##  7 Tagore                
##  8 Townsville            
##  9 Yio Chu Kang          
## 10 Yio Chu Kang East     
## # ... with 313 more rows

Checking if there are NA values in the population data set:

There are no NA values.

any(is.na(population))

## [1] FALSE

population[rowSums(is.na(population))!=0,]

## # A tibble: 0 x 7
## # ... with 7 variables: PA <chr>, SZ <chr>, AG <chr>, Sex <chr>, TOD <chr>,
## #   Pop <dbl>, Time <dbl>

2.4 Combining spatial & aspatial data

To make the choropleth map, we need to first split the AG variable (age) in the population data set to different variable of binned ages.

We are particularly interested in 2 variables, childcare services age, 0 to 2 years old and kindergarten services age, 3 to 6 years old.

pop_preschool <- population %>%
  filter(Time == 2019) %>%
  spread(AG, Pop) %>%
  mutate(`CHILDCARE_POP` = `0_to_4` %/%  2) %>%
  mutate(`KINDERGARTEN_POP` = (`0_to_4` %/%  2) + (`5_to_9` %/%  (2/5)))%>%
  mutate(`YOUNG` = `0_to_4`+`5_to_9`+`10_to_14`+
`15_to_19`+`20_to_24`) %>%
  mutate(`ECONOMICALLY_ACTIVE` = rowSums(.[9:13])+rowSums(.[15:17]))%>%
  mutate(`AGED`= rowSums(.[18:22])) %>%
  mutate(`TOTAL` = rowSums(.[5:22])) %>%
  mutate_at(.vars = vars(PA, SZ, Sex), .funs = funs(toupper))%>%
  dplyr::select(`PA`, `SZ`, `KINDERGARTEN_POP`, `CHILDCARE_POP`, `YOUNG`, `ECONOMICALLY_ACTIVE`, `AGED`, `TOTAL`)  %>%
  filter(`ECONOMICALLY_ACTIVE`> 0)

Since we do not need the data on TOD, the residence type of the population in each planning area, we will create a new dataframe that adds up the population in each subzone.

pop_preschool2 <- data.frame()

index = 1
for (i in 1:nrow(pop_preschool)){
  if (i == 1){ 
    data1 <- pop_preschool[i,]
    df1 <- data.frame(data1)
    pop_preschool2 <- rbind(pop_preschool2, df1)
  } 
  if ((i != 1) & (pop_preschool[i,2] %in% pop_preschool2[,2] == 0)){ 
    data2 <- pop_preschool[i,]
    df2 <- data.frame(data2)
    pop_preschool2 <- rbind(pop_preschool2, df2)
    index = index + 1
  } else{
    for (j in 3:8){
      pop_preschool2[index, j] = pop_preschool2[index,j] + pop_preschool[i, j]
    }
  }
}

From the code above, the first observation of subzone Ang Mo Kio Town Centre is added twice to the final dataframe because of our first condition, so we have to reduce the first row value by the first observation of subzone Ang Mo Kio Town Centre before further manipulating the data.

for (a in 3:8){
  pop_preschool2[1,a] = pop_preschool2[1,a] - pop_preschool[1,a]
}

Then, we will create new columns named:

‘PRESCHOOL_POP’ which is the percentage of the population of children aged 0 to 6 who go for childcare and kindergarten services
‘CHILDCARE_POP_PERCENT’ which is the percentage of the population of children aged 0 to 2 who go for childcare services
‘KINDERGARTEN_POP_PERCENT’ which is the percentage of the population of children aged 3 to 6 who go for kindergarten services

pop_preschool_final <- pop_preschool2 %>%
  mutate(`PRESCHOOL_POP` = ((`CHILDCARE_POP` + `KINDERGARTEN_POP`)/ `TOTAL`)*100)%>%
  mutate(`CHILDCARE_POP_PERCENT` = ((`CHILDCARE_POP`)/ `TOTAL`)*100)%>%
  mutate(`KINDERGARTEN_POP_PERCENT` = ((`KINDERGARTEN_POP`)/ `TOTAL`)*100)%>%
dplyr::select(`PA`, `SZ`, `CHILDCARE_POP`,`CHILDCARE_POP_PERCENT`, `KINDERGARTEN_POP`,`KINDERGARTEN_POP_PERCENT`, `PRESCHOOL_POP`, `YOUNG`, `ECONOMICALLY_ACTIVE`, `AGED`, `TOTAL`)

summary(pop_preschool_final)

##       PA                 SZ            CHILDCARE_POP    CHILDCARE_POP_PERCENT
##  Length:233         Length:233         Min.   :   0.0   Min.   :0.0000       
##  Class :character   Class :character   1st Qu.:  65.0   1st Qu.:0.4448       
##  Mode  :character   Mode  :character   Median : 210.0   Median :0.7490       
##                                        Mean   : 398.2   Mean   :0.8852       
##                                        3rd Qu.: 535.0   3rd Qu.:1.1663       
##                                        Max.   :2875.0   Max.   :4.1201       
##  KINDERGARTEN_POP KINDERGARTEN_POP_PERCENT PRESCHOOL_POP        YOUNG      
##  Min.   :    0    Min.   : 0.000           Min.   : 0.000   Min.   :    0  
##  1st Qu.:  453    1st Qu.: 3.167           1st Qu.: 3.579   1st Qu.:  750  
##  Median : 1527    Median : 5.242           Median : 5.947   Median : 2760  
##  Mean   : 2520    Mean   : 5.755           Mean   : 6.640   Mean   : 4568  
##  3rd Qu.: 3437    3rd Qu.: 8.029           3rd Qu.: 9.130   3rd Qu.: 6520  
##  Max.   :16540    Max.   :20.554           Max.   :24.635   Max.   :34260  
##  ECONOMICALLY_ACTIVE      AGED           TOTAL       
##  Min.   :   30       Min.   :    0   Min.   :  4088  
##  1st Qu.: 1630       1st Qu.:  590   1st Qu.: 13014  
##  Median : 5840       Median : 2510   Median : 29240  
##  Mean   : 9623       Mean   : 3464   Mean   : 33147  
##  3rd Qu.:14660       3rd Qu.: 5120   3rd Qu.: 47260  
##  Max.   :75280       Max.   :28520   Max.   :155628

We can see that ‘pop_preschool_final’ has 233 observations of 11 variables, thus we want to check if it corresponds to the distinct values of subzone areas. From the code, we can see that the distinct value is 233, it corresponds to the length of our final population dataframe.

distinct(pop_preschool, SZ)

## # A tibble: 233 x 1
##    SZ                    
##    <chr>                 
##  1 ANG MO KIO TOWN CENTRE
##  2 CHENG SAN             
##  3 CHONG BOON            
##  4 KEBUN BAHRU           
##  5 SEMBAWANG HILLS       
##  6 SHANGRI-LA            
##  7 TAGORE                
##  8 TOWNSVILLE            
##  9 YIO CHU KANG EAST     
## 10 YIO CHU KANG WEST     
## # ... with 223 more rows

Finally, we will merge the aspatial population data (pop_preschool_final) that we have tidied with the spatial data which is the Master Plan 2014 Planning Area Boundary. For the arguments, we put all.x as TRUE so that all the x dataframe (mpsz_sp_svy21) will be kept even if no matches are found, it’s similar to left_join() function in SQL.

mpsz_preschool <- merge(mpsz_sp_svy21, pop_preschool_final,  by.x = "SUBZONE_N", by.y = "SZ", all.x=TRUE)

We can see that there are 90 NA’s from the population data, meaning that not all the subzone planning area are captured in the population data, they have NA population.

summary(mpsz_preschool)

## Object of class SpatialPolygonsDataFrame
## Coordinates:
##         min      max
## x  2667.538 56396.44
## y 15748.721 50256.33
## Is projected: TRUE 
## proj4string :
## [+proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1
## +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +units=m +no_defs]
## Data attributes:
##   SUBZONE_N            OBJECTID       SUBZONE_NO      SUBZONE_C        
##  Length:323         Min.   :  1.0   Min.   : 1.000   Length:323        
##  Class :character   1st Qu.: 81.5   1st Qu.: 2.000   Class :character  
##  Mode  :character   Median :162.0   Median : 4.000   Mode  :character  
##                     Mean   :162.0   Mean   : 4.625                     
##                     3rd Qu.:242.5   3rd Qu.: 6.500                     
##                     Max.   :323.0   Max.   :17.000                     
##                                                                        
##     CA_IND           PLN_AREA_N         PLN_AREA_C          REGION_N        
##  Length:323         Length:323         Length:323         Length:323        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##    REGION_C           INC_CRC           FMEL_UPD_D            X_ADDR     
##  Length:323         Length:323         Length:323         Min.   : 5093  
##  Class :character   Class :character   Class :character   1st Qu.:21864  
##  Mode  :character   Mode  :character   Mode  :character   Median :28465  
##                                                           Mean   :27257  
##                                                           3rd Qu.:31674  
##                                                           Max.   :50425  
##                                                                          
##      Y_ADDR        SHAPE_Leng        SHAPE_Area            PA           
##  Min.   :19579   Min.   :  871.5   Min.   :   39438   Length:323        
##  1st Qu.:31776   1st Qu.: 3709.6   1st Qu.:  628261   Class :character  
##  Median :35113   Median : 5211.9   Median : 1229894   Mode  :character  
##  Mean   :36106   Mean   : 6524.4   Mean   : 2420882                     
##  3rd Qu.:39869   3rd Qu.: 6942.6   3rd Qu.: 2106483                     
##  Max.   :49553   Max.   :68083.9   Max.   :69748299                     
##                                                                         
##  CHILDCARE_POP    CHILDCARE_POP_PERCENT KINDERGARTEN_POP
##  Min.   :   0.0   Min.   :0.0000        Min.   :    0   
##  1st Qu.:  65.0   1st Qu.:0.4448        1st Qu.:  453   
##  Median : 210.0   Median :0.7490        Median : 1527   
##  Mean   : 398.2   Mean   :0.8852        Mean   : 2520   
##  3rd Qu.: 535.0   3rd Qu.:1.1663        3rd Qu.: 3437   
##  Max.   :2875.0   Max.   :4.1201        Max.   :16540   
##  NA's   :90       NA's   :90            NA's   :90      
##  KINDERGARTEN_POP_PERCENT PRESCHOOL_POP        YOUNG       ECONOMICALLY_ACTIVE
##  Min.   : 0.000           Min.   : 0.000   Min.   :    0   Min.   :   30      
##  1st Qu.: 3.167           1st Qu.: 3.579   1st Qu.:  750   1st Qu.: 1630      
##  Median : 5.242           Median : 5.947   Median : 2760   Median : 5840      
##  Mean   : 5.755           Mean   : 6.640   Mean   : 4568   Mean   : 9623      
##  3rd Qu.: 8.029           3rd Qu.: 9.130   3rd Qu.: 6520   3rd Qu.:14660      
##  Max.   :20.554           Max.   :24.635   Max.   :34260   Max.   :75280      
##  NA's   :90               NA's   :90       NA's   :90      NA's   :90         
##       AGED           TOTAL       
##  Min.   :    0   Min.   :  4088  
##  1st Qu.:  590   1st Qu.: 13014  
##  Median : 2510   Median : 29240  
##  Mean   : 3464   Mean   : 33147  
##  3rd Qu.: 5120   3rd Qu.: 47260  
##  Max.   :28520   Max.   :155628  
##  NA's   :90      NA's   :90

2.5 Choropleth Mapping

Demand - Population

Demand is a lot about population, so we will plot maps of the overall population, preschool age population and describe the spatial patterns.

Map Function for Demand

map_layout <- function(x,y,z){
  tm_shape(x)+
    tm_fill(y, 
            style = 'equal',
            palette = 'Blues',
            legend.hist = TRUE,
            legend.is.portrait = TRUE,
            legend.hist.z = 0.2)+
    tm_borders(alpha = 0.5, col = 'white')+
    tm_layout(main.title = z,
              main.title.position = 'center',
              main.title.size = 1.2,
              legend.height = 0.45, 
              legend.width = 0.35,
              legend.outside = FALSE,
              legend.position = c("right", "bottom"),
              frame = FALSE) 
}

Population by planning subzone area

tm_shape(mpsz_preschool)+
  tm_fill("TOTAL", style = 'pretty')+
  tm_style('natural')+
  tm_layout(main.title = "Overall population of Singapore",
            main.title.position = 'center',
            main.title.size = 1.2,
            legend.height = 0.45, 
            legend.width = 0.35,
            legend.outside = FALSE,
            legend.position = c("right", "bottom"),
            frame = FALSE) +
  tm_borders(alpha = 0.5)

n1 <- order(mpsz_preschool$TOTAL, na.last = TRUE, decreasing = TRUE)[1:4]

((mpsz_preschool$TOTAL)[n1])

## [1] 155628 120259 109526 103208

((mpsz_preschool$PLN_AREA_N)[n1])

## [1] "TAMPINES"  "WOODLANDS" "BEDOK"     "TAMPINES"

((mpsz_preschool$SUBZONE_N)[n1])

## [1] "TAMPINES EAST"  "WOODLANDS EAST" "BEDOK NORTH"    "TAMPINES WEST"

Map description:

From this map, we can see that there are only 4 subzone areas that have a population over 100,000 people and 1 subzone area that has a population over 150,000
The highest populated subzone area is Tampines East with 155,628 people

Population (Preschool)

d1 <- map_layout(mpsz_preschool, "PRESCHOOL_POP", "Overall population of children aged 0 to 6")
d1

n2 <- order(mpsz_preschool$PRESCHOOL_POP, na.last = TRUE, decreasing = TRUE)[1:5]

((mpsz_preschool$PRESCHOOL_POP)[n2])

## [1] 24.63457 24.55684 21.10544 19.74317 19.31104

((mpsz_preschool$PLN_AREA_N)[n2])

## [1] "PUNGGOL"  "PUNGGOL"  "SENGKANG" "SENGKANG" "YISHUN"

((mpsz_preschool$SUBZONE_N)[n2])

## [1] "WATERWAY EAST" "MATILDA"       "FERNVALE"      "ANCHORVALE"   
## [5] "YISHUN EAST"

Map description: From the map, we can see that there are 3 planning area that are in dark blue meaning a high population of children aged 0 to 7 (21.10% - 24.63%). These planning subzone areas are:

Punggol, Waterwest East = 24.63 %
Punggol, Matilda = 24.56 %
Sengkang, Fernvale = 21.10 %

Population (Childcare)

d2 <- map_layout(mpsz_preschool, "CHILDCARE_POP_PERCENT", "Population of children aged 0 to 2 (Childcare services)")
d2

n3 <- order(mpsz_preschool$CHILDCARE_POP_PERCENT, na.last = TRUE, decreasing = TRUE)[1:5]

((mpsz_preschool$CHILDCARE_POP_PERCENT)[n3])

## [1] 4.120092 4.002532 3.377225 3.107460 2.988886

((mpsz_preschool$PLN_AREA_N)[n3])

## [1] "PUNGGOL"  "PUNGGOL"  "PUNGGOL"  "SENGKANG" "SENGKANG"

((mpsz_preschool$SUBZONE_N)[n3])

## [1] "WATERWAY EAST"       "MATILDA"             "PUNGGOL TOWN CENTRE"
## [4] "FERNVALE"            "ANCHORVALE"

Map description: From the map, we can see that there are 3 planning area that are in dark blue meaning a high population of children aged 0 to 2 (3.269% - 4.120%) that go for childcare services. These planning subzone areas are:

Punggol, Waterwest East = 4.12 % %
Punggol, Matilda = 4.00 %
Punggol, Punggol Town Centre = 3.38 %

Population (Kindergarten)

d3 <- map_layout(mpsz_preschool, "KINDERGARTEN_POP_PERCENT", "Population of children aged 3 to 7 (Kindergarten services)")
d3

n4 <- order(mpsz_preschool$KINDERGARTEN_POP_PERCENT, na.last = TRUE, decreasing = TRUE)[1:5]

((mpsz_preschool$KINDERGARTEN_POP_PERCENT)[n4])

## [1] 20.55431 20.51447 17.99798 16.75428 16.40684

((mpsz_preschool$PLN_AREA_N)[n4])

## [1] "PUNGGOL"  "PUNGGOL"  "SENGKANG" "SENGKANG" "YISHUN"

((mpsz_preschool$SUBZONE_N)[n4])

## [1] "MATILDA"       "WATERWAY EAST" "FERNVALE"      "ANCHORVALE"   
## [5] "YISHUN EAST"

Map description: From the map, we can see that there are 3 planning area that are in dark blue meaning a high population of children aged 3 to 7 (17.99% - 20.55%). These planning subzone areas are:

Punggol, Matilda = 20.55 %
Punggol, Waterway East = 20.51 %
Sengkang, Fernvale = 17.99 %

Overall Population (Demand)

#library(grid)

#grid.newpage()
#pushViewport(viewport(layout = grid.layout(nrow = 2, ncol = 2)))
#print(d1, vp = viewport(layout.pos.row = 1, layout.pos.col = 1:2))
#print(d2, vp = viewport(layout.pos.row = 2, layout.pos.col = 1))
#print(d3, vp = viewport(layout.pos.row = 2, layout.pos.col = 2))

tmap_arrange(d1,d2,d3)

Map description:

From this map, we can see that the population of childcare age and kindergarten age does not vary from the expected, the areas with the 2 highest percentage of both kindergarten and childcare population are still Punggol Waterway East and Punggol Matilda
We can also see that the population of children age 0 to 2 are mostly concentrated in the north-east area while the population of children age 3 to 7 are more widely spread, including the east area

Supply - Preschool Density in subzone area

Next, we have to identify the pre-schools in each planning subzone area.

Firstly, let’s see if there are any duplicated points events. There seems to be one duplicated point events from the result of the code block. However when we check if there’s any duplicate in Name which contains the block number and building level, it has no duplicate. So, maybe some childcare centres have the same street name.

nrow(childcare_attr)

## [1] 1545

nrow(distinct(childcare_attr, ADDRESSSTREETNAME))

## [1] 1544

any(duplicated(childcare_attr$ADDRESSSTREETNAME))

## [1] TRUE

any(duplicated(childcare_attr$Name))

## [1] FALSE

There are no duplicate data in the kindergarten KML file.

any(duplicated(kindergarten_attr$ADDRESSSTREETNAME))

## [1] FALSE

any(duplicated(kindergarten_attr$Name))

## [1] FALSE

We use the lengths() function and st_intersects() function to calculate the count of preschools in each planning subzone area. For this, we use the sf dataframe because st_intersect() function does not work with sp objects.

mpsz_preschool$`CHILDCARE_COUNT` <- lengths(st_intersects(mpsz_sf_svy21, childcare_sf_svy21))
mpsz_preschool$`KINDERGARTEN_COUNT` <- lengths(st_intersects(mpsz_sf_svy21, kindergarten_sf_svy21))
mpsz_preschool$`PRESCHOOL_COUNT` <- lengths(st_intersects(mpsz_sf_svy21, preschool_sf_svy21))

mpsz_sf_svy21$`CHILDCARE_COUNT` <- lengths(st_intersects(mpsz_sf_svy21, childcare_sf_svy21))
mpsz_sf_svy21$`KINDERGARTEN_COUNT` <- lengths(st_intersects(mpsz_sf_svy21, kindergarten_sf_svy21))
mpsz_sf_svy21$`PRESCHOOL_COUNT` <- lengths(st_intersects(mpsz_sf_svy21, preschool_sf_svy21))

Calculating density of preschool services

We calculate the area of the planning subzone areas using st_area() from the sf package and it only works on non-Spatial data frame so we have to use the ‘mpsz_sf_svy21’.

mpsz_preschool$AREA <- mpsz_sf_svy21 %>%
  st_area()

mpsz_sf_svy21 <- mpsz_sf_svy21 %>%
  mutate(`AREA` = mpsz_preschool$AREA)

Next, we can calculate the respective childcare centres density and kindergartens density by diving the count over area of each planning subzone. Then, multiply by 1,000,000 because the projection system is in SVY21 and it’s in metres. To convert m^2 to km^2, we have to multiply by 1,000,000. We also add the childcare density & kindergarten density to the sf data frame ‘mpsz_sf_svy21’ for future map plotting or gg2plots.

mpsz_preschool$CHILDCARE_DENSITY <- (mpsz_preschool$CHILDCARE_COUNT/ mpsz_preschool$AREA * 1000000)
mpsz_preschool$KINDERGARTEN_DENSITY <- (mpsz_preschool$KINDERGARTEN_COUNT/ mpsz_preschool$AREA * 1000000)
mpsz_preschool$PRESCHOOL_DENSITY <- ((mpsz_preschool$KINDERGARTEN_COUNT+ mpsz_preschool$CHILDCARE_COUNT)/ mpsz_preschool$AREA * 1000000)



mpsz_sf_svy21$CHILDCARE_DENSITY <- (mpsz_preschool$CHILDCARE_COUNT/ mpsz_preschool$AREA * 1000000)
mpsz_sf_svy21$KINDERGARTEN_DENSITY <- (mpsz_preschool$KINDERGARTEN_COUNT/ mpsz_preschool$AREA * 1000000)
mpsz_sf_svy21$PRESCHOOL_DENSITY <- ((mpsz_preschool$KINDERGARTEN_COUNT+ mpsz_preschool$CHILDCARE_COUNT)/ mpsz_preschool$AREA * 1000000)

#mpsz_sp_svy21 <- mpsz_preschool %>%
#  mutate(`CHILDCARE_DENSITY` = `CHILDCARE_COUNT`/ `AREA` * 1000000) %>%
#  mutate(`KINDERGARTEN_DENSITY` = `KINDERGARTEN_COUNT`/ `AREA` * 1000000)

ggplot(data = mpsz_sf_svy21, aes(x= '', y= as.numeric(`CHILDCARE_DENSITY`)))+
  geom_boxplot()

ggplot(data = mpsz_sf_svy21, aes(x= '', y= as.numeric(`KINDERGARTEN_DENSITY`))) +
  geom_boxplot()

Box plot description: We can see that both boxplots are equally skewed to the left, towards 0 and have some outliers. However, the boxplot for kindergarten_density is more skewed to the left of the graph and to the right of the range (above the middle) compared to the childcare_density data.

Hence, we cannot use quantile method for our tmap plots, since it will be skewed. We will be using ‘pretty’ and ‘equal’ or ‘fixed’ with manually determined breaks.

Number of preschool services

s1 <- tm_shape(mpsz_preschool)+
    tm_fill("PRESCHOOL_COUNT", 
            style = 'equal',
            palette = 'Purples',
            legend.hist = TRUE,
            legend.is.portrait = TRUE,
            legend.hist.z = 0.2)+
    tm_layout(main.title = "Number of preschool services",
              main.title.position = 'center',
              main.title.size = 1.2,
              legend.height = 0.45, 
              legend.width = 0.35,
              legend.outside = FALSE,
              legend.position = c("right", "bottom"),
              frame = FALSE) +
    tm_borders(alpha = 0.5)
s1

Checking out the top 3 planning subzone area with highest preschool count.

n5 <- order(mpsz_preschool$PRESCHOOL_COUNT, na.last = TRUE, decreasing = TRUE)[1:3]

((mpsz_preschool$PRESCHOOL_COUNT)[n5])

## [1] 58 47 31

((mpsz_preschool$PLN_AREA_N)[n5])

## [1] "TAMPINES"  "WOODLANDS" "BEDOK"

((mpsz_preschool$SUBZONE_N)[n5])

## [1] "TAMPINES EAST"  "WOODLANDS EAST" "BEDOK NORTH"

((mpsz_preschool$PRESCHOOL_POP)[n5])

## [1] 12.459198 15.383464  9.702719

Map Description: the areas with highest preschool centres are:

Tampines East: 58 centres with 12.5% of the population being eligible for preschool
Woodlands East: 47 centres with 15.4% of the population being eligible for preschool
Bedok North: 31 centres with 9.7% of the population being eligible for preschool

Checking out the top 3 planning subzone area with highest childcare & kindergarten count.

n6 <- order(mpsz_preschool$CHILDCARE_COUNT, na.last = TRUE, decreasing = TRUE)[1:5]

((mpsz_preschool$CHILDCARE_COUNT)[n6])

## [1] 42 42 27 25 23

((mpsz_preschool$PLN_AREA_N)[n6])

## [1] "TAMPINES"  "WOODLANDS" "SENGKANG"  "BEDOK"     "TAMPINES"

((mpsz_preschool$SUBZONE_N)[n6])

## [1] "TAMPINES EAST"        "WOODLANDS EAST"       "SENGKANG TOWN CENTRE"
## [4] "BEDOK NORTH"          "TAMPINES WEST"

n7 <- order(mpsz_preschool$KINDERGARTEN_COUNT, na.last = TRUE, decreasing = TRUE)[1:3]

((mpsz_preschool$KINDERGARTEN_COUNT)[n7])

## [1] 17 11 10

((mpsz_preschool$PLN_AREA_N)[n7])

## [1] "TAMPINES"      "MARINE PARADE" "GEYLANG"

((mpsz_preschool$SUBZONE_N)[n7])

## [1] "TAMPINES EAST" "KATONG"        "ALJUNIED"

We can see that both the ranks of highest childcare centres are not similar except Tampines East which has the highest overall population in Singapore too.

However, highest count does not always mean good, because it might be too much for a small area and too few for a big area. So, let’s take a look at the density next.

3. Section A: Analytics mapping

Revealing relationship between childcare & kindergarten services at planning subzone level

Map function for density

map_layout_density <- function(x,y,z){
  tm_shape(x)+
    tm_fill(y, 
            style = 'fixed',
            breaks = c(0,1,5,10,15,20),
            palette = "Purples",
            legend.hist = TRUE,
            legend.is.portrait = TRUE,
            legend.hist.z = 0.2)+
    tm_borders(alpha = 0.5, col = 'white')+
    tm_layout(main.title = z,
              main.title.position = 'center',
              main.title.size = 1.2,
              legend.height = 0.45, 
              legend.width = 0.35,
              legend.outside = FALSE,
              legend.position = c("right", "bottom"),
              frame = FALSE) 
}

s2 <- map_layout_density(mpsz_preschool, "PRESCHOOL_DENSITY", "Density of preschool services")


s3 <- map_layout_density(mpsz_preschool, "CHILDCARE_DENSITY", "Density of childcare services")


s4 <- map_layout_density(mpsz_preschool, "KINDERGARTEN_DENSITY", "Density of kindergarten services")

par(mfrow= c(2,2))

s2

s3

s4

n8 <- order(mpsz_preschool$PRESCHOOL_DENSITY, na.last = TRUE, decreasing = TRUE)[1:5]

((mpsz_preschool$PRESCHOOL_DENSITY)[n8])

## Units: [1/m^2]
## [1] 35.60169 34.93134 25.35630 23.89766 23.36775

((mpsz_preschool$PLN_AREA_N)[n8])

## [1] "DOWNTOWN CORE" "MANDAI"        "DOWNTOWN CORE" "SEMBAWANG"    
## [5] "SERANGOON"

((mpsz_preschool$SUBZONE_N)[n8])

## [1] "CECIL"             "MANDAI ESTATE"     "PHILLIP"          
## [4] "SEMBAWANG CENTRAL" "SERANGOON NORTH"

((mpsz_preschool$PRESCHOOL_POP)[n8])

## [1]  0.000000  2.975484        NA 11.749184  8.684998

Map description: From the map, we can see that there are plenty of planning subzone areas with preschool density values of more than 20 per km^2. However, 2 of the areas are in Downtown core, which is not a residential area.

For the relationship between the childcare and kindergarten services, we can see that childcare services are found at the same planning subzone areas and at more planning subzones but which are still neighbouring, nothing too distant.

4. Section A: Drawing statistical conclusion from 2 & 3

General trend: As the childcare/kindergarten service/centres increase, the density also increase, the variation is caused by the difference in area size.

We want to find out whether the preschools are proportional to the population of each planning subzone area so we should plot preschool density vs preschool population.

tmap_mode("view")

## tmap mode set to interactive viewing

tmap_arrange(d1, s2, asp=1, ncol=2)

## legend.postion is used for plot mode. Use view.legend.position in tm_view to set the legend position in view mode.

## legend.postion is used for plot mode. Use view.legend.position in tm_view to set the legend position in view mode.

tmap_mode("plot")

## tmap mode set to plotting

Map Description:

We can see that some areas with high preschool population are served well, with 15-20% of preschool services density such as Woodlands East, Keat Hong and Anchorvale, the north & northeast area.

However, we can also see lacking services in some high populated areas such as Fernvale, Holland road and Frankel, newly rising areas which may need more preschool facilities.

5. Section B: Exploratory Spatial Data Analysis

EDA for point maps of location of childcare and kindergarten services in separate point maps

We have done the following in Section A:

Import planning subzone boundary layer, childcare KML file, kindergarten KML file into R as sp objects using readOGR() function
Check the CRS of each sp object and transform it into EPSG3414 using spTransform() function

5.1 Conversion to ppp object format

Check for any NA values or duplicated values in the dataframe

So, the spatstat package requires the data to be in ppp object form. We need to convert the sp object to ppp object format.

Converting childcare & kindergarten data to ppp object format

childcare_sp1 <- as(childcare_sp_svy21, "SpatialPoints")
childcare_ppp <- as(childcare_sp1, "ppp")

kindergarten_sp1 <- as(kindergarten_sp_svy21, "SpatialPoints")
kindergarten_ppp <- as(kindergarten_sp1, "ppp")

Converting mpsz data to owin object (required by spatstat package)

mpsz_owin <- as(mpsz_sp_svy21, "owin")

5.2 Combining ppp object and owin object (Whole of SG)

mpsz_childcare_ppp = childcare_ppp[mpsz_owin]

mpsz_kindergarten_ppp = kindergarten_ppp[mpsz_owin]

5.3 Visualing study area and point maps

par(mfrow= c(1,2))

plot(mpsz_sp_svy21)
plot(childcare_sp_svy21,lwd= 0.1, add=TRUE)

plot(mpsz_sp_svy21)
plot(kindergarten_sp_svy21,lwd= 0.1, add=TRUE)

par(mfrow = c(1,2))
plot(childcare_ppp)
plot(kindergarten_ppp)

par(mfrow= c(1,2))
plot(mpsz_childcare_ppp)
plot(mpsz_kindergarten_ppp)

Changing the order of variables of the sf dataframe, childcare_attr and kindergarten_attr, so that the names of the centres can be displayed on the interactive map.

childcare_attr2 <- childcare_attr%>%
  dplyr::select(NAME, ADDRESSBLOCKHOUSENUMBER, ADDRESSBUILDINGNAME, ADDRESSPOSTALCODE,ADDRESSSTREETNAME, ADDRESSTYPE, DESCRIPTION, LANDXADDRESSPOINT, LANDYADDRESSPOINT, INC_CRC, FMEL_UPD_D, geometry)

kindergarten_attr2 <- kindergarten_attr %>%
  dplyr::select(NAME, ADDRESSBLOCKHOUSENUMBER, ADDRESSBUILDINGNAME, ADDRESSPOSTALCODE,ADDRESSSTREETNAME, ADDRESSTYPE, DESCRIPTION, LANDXADDRESSPOINT, LANDYADDRESSPOINT, INC_CRC, FMEL_UPD_D, geometry)

tmap_mode("view")

## tmap mode set to interactive viewing

a <- tm_shape(childcare_attr2)+
  tm_basemap("OpenStreetMap")+
  tm_bubbles(col='red', size= 0.2)


b <- tm_shape(kindergarten_attr2)+
  tm_basemap("OpenStreetMap")+
  tm_bubbles(col='blue', size = 0.2)
  
tmap_arrange(a, b, asp=1, ncol=2)

tmap_mode("plot")

## tmap mode set to plotting

After a few runs of the L-function, the author’s laptop does not have enough resources to process the study area of whole of Singapore.

We have been given the option to choose to focus on the following four planning areas instead of the entire country. They are: Sengkang, Bedok, Bukit Batok & Hougang.

SK = mpsz_sp_svy21[mpsz_sp_svy21@data$PLN_AREA_N == "SENGKANG",]
BD = mpsz_sp_svy21[mpsz_sp_svy21@data$PLN_AREA_N == "BEDOK",]
BB = mpsz_sp_svy21[mpsz_sp_svy21@data$PLN_AREA_N == "BUKIT BATOK",]
HG = mpsz_sp_svy21[mpsz_sp_svy21@data$PLN_AREA_N == "HOUGANG",]

Next, we need to convert the planning area into generic sp objects then to owin objects.

SK_sp = as(SK, "SpatialPolygons")
BD_sp = as(BD, "SpatialPolygons")
BB_sp = as(BB, "SpatialPolygons")
HG_sp = as(HG, "SpatialPolygons")

SK_owin = as(SK_sp, "owin")
BD_owin = as(BD_sp, "owin")
BB_owin = as(BB_sp, "owin")
HG_owin = as(HG_sp, "owin")

5.4 Combining ppp object and owin object (4 planning area)

Sengkang Planning Area

childcare_sk_ppp = childcare_ppp[SK_owin]

kindergarten_sk_ppp = kindergarten_ppp[SK_owin]

Bedok Planning Area

childcare_bd_ppp = childcare_ppp[BD_owin]

kindergarten_bd_ppp = kindergarten_ppp[BD_owin]

Bukit Batok Planning Area

childcare_bb_ppp = childcare_ppp[BB_owin]

kindergarten_bb_ppp = kindergarten_ppp[BB_owin]

Hougang Planning Area

childcare_hg_ppp = childcare_ppp[HG_owin]

kindergarten_hg_ppp = kindergarten_ppp[HG_owin]

Rescaling to kilometres (km)

childcare_sk_ppp.km = rescale(childcare_sk_ppp, 1000, "km")
kindergarten_sk_ppp.km = rescale(kindergarten_sk_ppp, 1000, "km")

childcare_bd_ppp.km = rescale(childcare_bd_ppp, 1000, "km")
kindergarten_bd_ppp.km = rescale(kindergarten_bd_ppp, 1000, "km")

childcare_bb_ppp.km = rescale(childcare_bb_ppp, 1000, "km")
kindergarten_bb_ppp.km = rescale(kindergarten_bb_ppp, 1000, "km")

childcare_hg_ppp.km = rescale(childcare_hg_ppp, 1000, "km")
kindergarten_hg_ppp.km = rescale(kindergarten_hg_ppp, 1000, "km")

5.5 Visualing 4 planning areas

Sengkang Planning Area

par(mfrow= c(1,2))
plot(childcare_sk_ppp.km, main = "Sengkang childcare centres")
plot(kindergarten_sk_ppp.km, main = "Sengkang kindergartens")

Map Description:

Sengkang’s childcare centres are quite clustered especially within that area near the end
Sengkang’s kindergartens are not as clustered, since the point events are less frequent
Bedok Planning Area

par(mfrow= c(1,2))
plot(childcare_bd_ppp.km, main = "Bedok childcare centres")
plot(kindergarten_bd_ppp.km, main = "Bedok kindergartens")

Map Description:

Bedok’s childcare centres are quite randomly located even though there seems to be small clusters
Bedok kindergartens’ spatial pattern is random
Bukit Batok Planning Area

par(mfrow= c(1,2))
plot(childcare_bb_ppp.km, main = "Bukit Batok childcare centres")
plot(kindergarten_bb_ppp.km, main = "Bukit Batok kindergartens")

Map Description:

Bukit Batok childcare centres’ spatial pattern is random
Bukit Batok kindergartens’ spatial pattern is random
Hougang Planning Area

par(mfrow= c(1,2))
plot(childcare_hg_ppp.km, main = "Hougang childcare centres")
plot(kindergarten_hg_ppp.km, main = "Hougang kindergartens")

Map Description:

Hougang’s childcare centres are quite randomly located even though there seems to be small clusters
Hougang kindergartens’ spatial pattern is random

6. Section B: 2nd order spatial point pattern analysis

We will be using the L-Function(Besag 1977), to compute the L-function estimation and envelope() function to perform monte carlo simulation test. Both functions are from spatstat package.

6.1 Sengkang Planning Area

6.1.1 Computing L-function estimation (childcare)

We will use L-function, a transformed version of Ripley’s K-function,because it’s the most commonly used function.

L_sk_c = Lest(childcare_sk_ppp, correction = "Ripley")
plot(L_sk_c, .-r ~r,
     ylab= "L(d)-r", xlab= "d(m)")

6.1.2 Complete Spatial Randomness Test (childcare)

To confirm our observed spatial patterns from computing the estimation, we can conduct a hypothesis test.

Null Hypothesis, H0: The distribution of childcare services in Sengkang is randomly distributed.

Alternative Hypothesis, H1:The distribution of childcare services in Sengkang is not randomly distributed.

We set the alpha value to be 0.01. Since it’s a two tailed test, the nsim, number of simulation will be at least 2/0.01 = 200, 199 because it starts from 0.

L_sk_c.csr <- envelope(childcare_sk_ppp, Lest, nsim = 199, rank =1, glocal = TRUE)

Normal way of plotting:

plot(L_sk_c.csr, .-r ~r, xlab="d", ylab= "L(d)-r")

However, we can’t zoom in to see the exact distance when the estimated L-function exceeds the envelope to indicate statistical significance.

So, we will plot a quantum plot that allows us to slide over a range of values and zoom into the estimated L-function. Help is taken from link1, link2.

quantumPlot <- function(x, title, colour=c("#d73027", "#ffffbf", "#91bfdb")){
  #env.data <- as.data.frame(tree.data)
  #env.data <- env.data[-1,]
  df <- as.data.frame(x)
  
  ggquantumplot <- ggplot(df, aes(r, obs-r))+
    geom_line(colour=c("#4d4d4d"))+
    geom_line(aes(r,theo-r), colour="red", linetype = "dashed")+
    geom_ribbon(aes(ymin=lo-r,ymax=hi-r),alpha=0.1, colour=c("#91bfdb")) +
    xlab("Distance r (m)") +
    ylab("L(r)-r") +
    geom_rug(data=df[df$obs > df$hi,], sides="b", colour=colour[1])  +
    geom_rug(data=df[df$obs < df$lo,], sides="b", colour=colour[2]) +
    geom_rug(data=df[df$obs >= df$lo & df$obs <= df$hi,], sides="b", color=colour[3]) +
    theme_tufte()+
    ggtitle(title)
  
  txt1<-"Clustering"
  txt2<-"Regular/Uniform"
  txt3<-"Random"

  if (nrow(df[df$obs > df$hi,])==0){ 
    if (nrow(df[df$obs < df$lo,])==0){ 
      ggplotly(ggquantumplot, dynamicTicks=T) %>%
        style(text = txt3, traces = 4) %>%
        rangeslider() 
    }else if (nrow(df[df$obs >= df$lo & df$obs <= Lcsr_df$hi,])==0){ 
      ggplotly(ggquantumplot, dynamicTicks=T) %>%
        style(text = txt2, traces = 4) %>%
        rangeslider() 
    }else {
      ggplotly(ggquantumplot, dynamicTicks=T) %>%
        style(text = txt2, traces = 4) %>%
        style(text = txt3, traces = 5) %>%
        rangeslider() 
    }
  } else if (nrow(df[df$obs < df$lo,])==0){
    if (nrow(df[df$obs >= df$lo & df$obs <= df$hi,])==0){
      ggplotly(ggquantumplot, dynamicTicks=T) %>%
        style(text = txt1, traces = 4) %>%
        rangeslider() 
    } else{
      ggplotly(ggquantumplot, dynamicTicks=T) %>%
        style(text = txt1, traces = 4) %>%
        style(text = txt3, traces = 5) %>%
        rangeslider()
    }
  } else{
    ggplotly(ggquantumplot, dynamicTicks=T) %>%
      style(text = txt1, traces = 4) %>%
      style(text = txt2, traces = 5) %>%
      style(text = txt3, traces = 6) %>%
      rangeslider()
    }
  }

quantumPlot(L_sk_c.csr, "Pairwise Distance: L function-Sengkang Childcare")

Conclusion:

Until distance ~66m, childcare centres are not statistically significant and we do not have enough evidence to reject the null hypothesis that the distribution is random, although the estimated L-function is almost one line with the upper envelope even before 150m
After distance ~66m/ when the estimated L-function is above the empirical line and envelope, childcare centres are statistically significant & we have enough evidence to reject the null hypothesis. Hence, the distribution of childcare centres in Sengkang is clustered

6.1.3 Computing L-function estimation (kindergarten)

L_sk_k = Lest(kindergarten_sk_ppp, correction = "Ripley")
plot(L_sk_k, .-r ~r,
     ylab= "L(d)-r", xlab= "d(m)")

6.1.4 Complete Spatial Randomness Test (kindergarten)

Null Hypothesis, H0: The distribution of childcare services in Sengkang is randomly distributed.

Alternative Hypothesis, H1:The distribution of childcare services in Sengkang is not randomly distributed.

We set the alpha value to be 0.01. Since it’s a two tailed test, the nsim, number of simulation will be at least 2/0.01 = 200.

L_sk_k.csr <- envelope(kindergarten_sk_ppp, Lest, nsim = 199, rank =1, glocal = TRUE)

quantumPlot(L_sk_k.csr, "Pairwise Distance: L function-Sengkang Kindergarten")

Conclusion:

We can see that the estimated L-function keeps going in and out of the envelope, so from 0 to 188m, the distance is statistically significant there is clustering
Then from 188m to 492m, the distribution is random and it goes to clustering again

6.2 Bedok Planning Area

6.2.1 Computing L-function estimation (childcare)

L_bd_c = Lest(childcare_bd_ppp, correction = "Ripley")
plot(L_bd_c, .-r ~r,
     ylab= "L(d)-r", xlab= "d(m)")

6.2.2 Complete Spatial Randomness Test (childcare)

Null Hypothesis, H0: The distribution of childcare services in Bedok is randomly distributed.

Alternative Hypothesis, H1:The distribution of childcare services in Bedok is not randomly distributed.

We set the alpha value to be 0.01. Since it’s a two tailed test, the nsim, number of simulation will be at least 2/0.01 = 200.

L_bd_c.csr <- envelope(childcare_bd_ppp, Lest, nsim = 199, rank =1, glocal = TRUE)

quantumPlot(L_bd_c.csr, "Pairwise Distance: L function-Bedok Childcare")

Conclusion:

Until distance ~447m, childcare centres are not statistically significant and we do not have enough evidence to reject the null hypothesis
After distance ~447m/ when the estimated L-function is above the empirical line and envelope, childcare centres are statistically significant & we have enough evidence to reject the null hypothesis. Hence, the distribution of childcare centres in Sengkang is clustered

6.2.3 Computing L-function estimation (kindergarten)

L_bd_k = Lest(kindergarten_bd_ppp, correction = "Ripley")
plot(L_bd_k, .-r ~r,
     ylab= "L(d)-r", xlab= "d(m)")

6.2.4 Complete Spatial Randomness Test (kindergarten)

Null Hypothesis, H0: The distribution of childcare services in Sengkang is randomly distributed.

Alternative Hypothesis, H1:The distribution of childcare services in Sengkang is not randomly distributed.

We set the alpha value to be 0.01. Since it’s a two tailed test, the nsim, number of simulation will be at least 2/0.01 = 200.

L_bd_k.csr <- envelope(kindergarten_bd_ppp, Lest, nsim = 199, rank =1, glocal = TRUE)

quantumPlot(L_bd_k.csr, "Pairwise Distance: L function-Bedok Kindergarten")

Conclusion:

This plot is quite different from the rest because from distance 0 to 45m, the estimated L-function is already above the envelope, indicating that they are statistically significant & we have enough evidence to reject the null hypothesis. Hence, the distribution of kindergartens in Bedok is clustered
After distance 45m, the estimated L-function is within the envelope, indicating random distribution

6.3 Bukit Batok Planning Area

6.3.1 Computing L-function estimation (childcare)

L_bb_c = Lest(childcare_bb_ppp, correction = "Ripley")
plot(L_bb_c, .-r ~r,
     ylab= "L(d)-r", xlab= "d(m)")

6.3.2 Complete Spatial Randomness Test (childcare)

Null Hypothesis, H0: The distribution of childcare services in Bukit Batok is randomly distributed.

Alternative Hypothesis, H1:The distribution of childcare services in Bukit Batok is not randomly distributed.

We set the alpha value to be 0.01. Since it’s a two tailed test, the nsim, number of simulation will be at least 2/0.01 = 200.

L_bb_c.csr <- envelope(childcare_bb_ppp, Lest, nsim = 199, rank =1, glocal = TRUE)

quantumPlot(L_bb_c.csr, "Pairwise Distance: L function-Bukit Batok Childcare")

Conclusion:

For childcare centres in Bukit Batok, the estimated L-function all lies within the envelope, indicating that they are statistically insignificant and we don’t have enough evidence to reject the null hypothesis. Hence, the distribution is random with some distances being above the envelope indicating that the distribution is clustered

6.3.3 Computing L-function estimation (kindergarten)

L_bb_k = Lest(kindergarten_bb_ppp, correction = "Ripley")
plot(L_bb_k, .-r ~r,
     ylab= "L(d)-r", xlab= "d(m)")

6.3.4 Complete Spatial Randomness Test (kindergarten)

Null Hypothesis, H0: The distribution of childcare services in Bukit Batok is randomly distributed.

Alternative Hypothesis, H1:The distribution of childcare services in Bukit Batok is not randomly distributed.

We set the alpha value to be 0.01. Since it’s a two tailed test, the nsim, number of simulation will be at least 2/0.01 = 200.

L_bb_k.csr <- envelope(kindergarten_bb_ppp, Lest, nsim = 199, rank =1, glocal = TRUE)

quantumPlot(L_bb_k.csr, "Pairwise Distance: L function-Bukit Batok Kindergarten")

Conclusion:

For this plot, the estimated L-function all lies within the envelope so the distribution of kindergartens in Bukit Batok is random

6.4 Hougang Planning Area

6.4.1 Computing L-function estimation (childcare)

L_hg_c = Lest(childcare_hg_ppp, correction = "Ripley")
plot(L_hg_c, .-r ~r,
     ylab= "L(d)-r", xlab= "d(m)")

6.4.2 Complete Spatial Randomness Test (childcare)

Null Hypothesis, H0: The distribution of childcare services in Hougang is randomly distributed.

Alternative Hypothesis, H1:The distribution of childcare services in Hougang is not randomly distributed.

We set the alpha value to be 0.01. Since it’s a two tailed test, the nsim, number of simulation will be at least 2/0.01 = 200.

L_hg_c.csr <- envelope(childcare_hg_ppp, Lest, nsim = 199, rank =1, glocal = TRUE)

quantumPlot(L_hg_c.csr, "Pairwise Distance: L function-Hougang Childcare")

Conclusion:

Until distance ~674m, childcare centres are not statistically significant and we do not have enough evidence to reject the null hypothesis that the distribution is random, although the estimated L-function is almost one line with the upper envelope even before 150m
After distance ~674m/ when the estimated L-function is above the empirical line and envelope, childcare centres are statistically significant & we have enough evidence to reject the null hypothesis. Hence, the distribution of childcare centres in Sengkang is clustered

6.4.3 Computing L-function estimation (kindergarten)

L_hg_k = Lest(kindergarten_hg_ppp, correction = "Ripley")
plot(L_hg_k, .-r ~r,
     ylab= "L(d)-r", xlab= "d(m)")

6.4.4 Complete Spatial Randomness Test (kindergarten)

Null Hypothesis, H0: The distribution of childcare services in Hougang is randomly distributed.

Alternative Hypothesis, H1:The distribution of childcare services in Hougang is not randomly distributed.

We set the alpha value to be 0.01. Since it’s a two tailed test, the nsim, number of simulation will be at least 2/0.01 = 200.

L_hg_k.csr <- envelope(kindergarten_hg_ppp, Lest, nsim = 199, rank =1, glocal = TRUE)

quantumPlot(L_hg_k.csr, "Pairwise Distance: L function-Hougang Kindergarten")

Conclusion:

This plot is quite different from the rest because from distance 0 to 58m, the estimated L-function is already above the envelope, indicating that they are statistically significant & we have enough evidence to reject the null hypothesis. Hence, the distribution of kindergartens in Bedok is clustered
After distance 58m, the estimated L-function is within the envelope, indicating random distribution

We can see that most of the distributions of childcare centres and kindergartens are equally clustered and random, and a mix of both derived from the quantum plot map.

7. Section B: Kerney Density Estimation (KDE)

To create KDE map, we use the density() function of spatstat and bw.diggle() for automatic bandwidth selection method.

For some of the visualizations, the kernels are manually chosen to visualize the density better.

7.1 Sengkang’s KDE map

KDE of Sengkang’s childcare centres

kde_childcare_sk_bw <- density(childcare_sk_ppp.km, sigma = bw.diggle, edge = TRUE, kernel = "gaussian")

KDE of Sengkang’s kindergartens

kde_kindergartens_sk_bw <- density(kindergarten_sk_ppp.km, sigma = 0.25, edge = TRUE, kernel = "gaussian")

par(mfrow=c(1,2))
plot(kde_childcare_sk_bw, main = "Sengkang's Childcare Density")
plot(kde_kindergartens_sk_bw, main = "Sengkang's Kindergarten Density")

7.2 Bedok’s KDE map

KDE of Bedok’s childcare centres

kde_childcare_bd_bw <- density(childcare_bd_ppp.km, sigma = bw.diggle, edge = TRUE, kernel = "gaussian")

KDE of Bedok’s kindergartens

kde_kindergartens_bd_bw <- density(kindergarten_bd_ppp.km, sigma = bw.diggle, edge = TRUE, kernel = "gaussian")

par(mfrow=c(1,2))
plot(kde_childcare_bd_bw, main = "Bedok's Childcare Density")
plot(kde_kindergartens_bd_bw, main = "Bedok's Kindergarten Density")

7.3 Bukit Batok’s KDE map

KDE of Bukit Batok’s childcare centres

kde_childcare_bb_bw <- density(childcare_bb_ppp.km, sigma = bw.diggle, edge = TRUE, kernel = "gaussian")

KDE of Bukit Batok’s kindergartens

kde_kindergartens_bb_bw <- density(kindergarten_bb_ppp.km, sigma = 0.25, edge = TRUE, kernel = "gaussian")

par(mfrow=c(1,2))
plot(kde_childcare_bb_bw, main = "Bukit Batok's Childcare Density")
plot(kde_kindergartens_bb_bw, main = "Bukit Batok's Kindergarten Density")

7.4 Hougang’s KDE map

KDE of Hougang’s childcare centres

kde_childcare_hg_bw <- density(childcare_hg_ppp.km, sigma = bw.diggle, edge = TRUE, kernel = "gaussian")

KDE of Bukit Batok’s kindergartens

kde_kindergartens_hg_bw <- density(kindergarten_hg_ppp.km, sigma = 0.25, edge = TRUE, kernel = "gaussian")

par(mfrow=c(1,2))
plot(kde_childcare_hg_bw, main = "Hougang's Childcare Density")
plot(kde_kindergartens_hg_bw, main = "Hougang's Kindergarten Density")

7.5 Displaying KDE maps on openstreetmap of Singapore

Firstly, we have to convert the KDE output into grid object using as.SpatialGridDataFrame.im() function so that we can turn it into raster object using raster() function.

7.5.1 Sengkang childcare raster object

gridded_kde_childcare_sk_bw <- as.SpatialGridDataFrame.im(kde_childcare_sk_bw)
kde_childcare_sk_bw_raster <- raster(gridded_kde_childcare_sk_bw)
kde_childcare_sk_bw_raster

## class      : RasterLayer 
## dimensions : 128, 128, 16384  (nrow, ncol, ncell)
## resolution : 0.05660914, 0.0220703  (x, y)
## extent     : 30.12219, 37.36816, 39.76083, 42.58583  (xmin, xmax, ymin, ymax)
## crs        : NA 
## source     : memory
## names      : v 
## values     : -1.133223e-14, 53.42105  (min, max)

Then, we assign the projection system to the raster object using CRS() function so that when we map the raster object, it will return the accurate longtitude & latitude.

crs(kde_childcare_sk_bw_raster) <- CRS('+init=EPSG:3414')

7.5.1.1 Sengkang childcare KDE output in tmap

tmap_mode("view")

## tmap mode set to interactive viewing

tm_shape(kde_childcare_sk_bw_raster) +
  tm_basemap("OpenStreetMap")+
  tm_raster('v')+
  tm_layout(legend.position = c('left', 'bottom'), frame = FALSE)

## legend.postion is used for plot mode. Use view.legend.position in tm_view to set the legend position in view mode.

## Variable(s) "v" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

We will be creating a function to plot the KDE output using tmap package.

tmap_mode("plot")

## tmap mode set to plotting

KDE_tmap <- function(x,y){
  tm_shape(x)+
  tm_basemap("OpenStreetMap")+
  tm_raster(y)+
  tm_layout(legend.height = 0.45, 
            legend.width = 0.35,
            frame = FALSE) +
  tm_compass(type="8star", size = 2) +
  tm_scale_bar(width = 0.15)
}


KDE_tmap(kde_childcare_sk_bw_raster, "v")

## Variable(s) "v" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

7.5.1.2 Sengkang kindergarten KDE output in tmap

gridded_kde_kindergarten_sk_bw <- as.SpatialGridDataFrame.im(kde_kindergartens_sk_bw)
kde_kindergarten_sk_bw_raster <- raster(gridded_kde_kindergarten_sk_bw)

crs(kde_kindergarten_sk_bw_raster) <- CRS('+init=EPSG:3414')

KDE_tmap(kde_kindergarten_sk_bw_raster, "v")

## Variable(s) "v" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

7.5.2 Bedok KDE output in tmap

gridded_kde_childcare_bd_bw <- as.SpatialGridDataFrame.im(kde_childcare_bd_bw)
kde_childcare_bd_bw_raster <- raster(gridded_kde_childcare_bd_bw)

gridded_kde_kindergarten_bd_bw <- as.SpatialGridDataFrame.im(kde_kindergartens_bd_bw)
kde_kindergarten_bd_bw_raster <- raster(gridded_kde_kindergarten_bd_bw)

crs(kde_childcare_bd_bw_raster) <- CRS('+init=EPSG:3414')

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
## datum SVY21 in CRS definition

crs(kde_kindergarten_bd_bw_raster) <- CRS('+init=EPSG:3414')

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
## datum SVY21 in CRS definition

tmap_mode("plot")

## tmap mode set to plotting

par(mfrow=c(1,2))

KDE_tmap(kde_childcare_bd_bw_raster, "v")

KDE_tmap(kde_kindergarten_bd_bw_raster, "v")

7.5.3 Bukit Batok KDE output in tmap

gridded_kde_childcare_bb_bw <- as.SpatialGridDataFrame.im(kde_childcare_bb_bw)
kde_childcare_bb_bw_raster <- raster(gridded_kde_childcare_bb_bw)

gridded_kde_kindergarten_bb_bw <- as.SpatialGridDataFrame.im(kde_kindergartens_bb_bw)
kde_kindergarten_bb_bw_raster <- raster(gridded_kde_kindergarten_bb_bw)

crs(kde_childcare_bb_bw_raster) <- CRS('+init=EPSG:3414')

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
## datum SVY21 in CRS definition

crs(kde_kindergarten_bb_bw_raster) <- CRS('+init=EPSG:3414')

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
## datum SVY21 in CRS definition

tmap_mode("plot")

## tmap mode set to plotting

par(mfrow=c(1,2))

KDE_tmap(kde_childcare_bb_bw_raster, "v")

KDE_tmap(kde_kindergarten_bb_bw_raster, "v")

7.5.4 Hougang KDE output in tmap

gridded_kde_childcare_hg_bw <- as.SpatialGridDataFrame.im(kde_childcare_hg_bw)
kde_childcare_hg_bw_raster <- raster(gridded_kde_childcare_hg_bw)

gridded_kde_kindergarten_hg_bw <- as.SpatialGridDataFrame.im(kde_kindergartens_hg_bw)
kde_kindergarten_hg_bw_raster <- raster(gridded_kde_kindergarten_hg_bw)

crs(kde_childcare_hg_bw_raster) <- CRS('+init=EPSG:3414')

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
## datum SVY21 in CRS definition

crs(kde_kindergarten_hg_bw_raster) <- CRS('+init=EPSG:3414')

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
## datum SVY21 in CRS definition

tmap_mode("plot")

## tmap mode set to plotting

par(mfrow=c(1,2))

KDE_tmap(kde_childcare_hg_bw_raster, "v")

KDE_tmap(kde_kindergarten_hg_bw_raster, "v")

7.6 Advantages of KDE over point maps

The advantages of KDE over point maps are:

each pixel or plot of the land has a density value, making it easy for us to analyze the areas near to the point events
we can differentiate areas within the levels of the KDE, making it easy for us to spot the dense areas

However, point maps also have some aspects which are useful, such as showing the variable text as we hover over the point events, it’s also the accurate location of each point events.

Learnings from the assignment

From the assignment, we can see that not all planning subzone areas are sufficiently served by the preschool facilities (from the preschool density data)
From the assignment, we can see some new planning subzone areas that are more appealing for the new families, from the rising children population in certain planning subzone areas
From the assignment, we can see that some CBD and downtown areas have high preschool density, indicating some trends of maybe parents dropping off their children before going to work. There are other trends that influence our data
From the assignment, we can also see that there are more childcare centres than kindergarten, due to the trend of busy workng parents, also stated in ECDA’s website
The author learnt the difference between sp objects and sf dataframe, and the functions used to handle different data objects
The author learnt how to handle KML file, getting the important information from it
The author learnt the hard way about manipulating csv dataframe from dividing the age variable and creating new columns
The author learnt more about the different spatial point pattern analysis functions and the plotting different kind of maps, be it ggplot, tmap and ggmap
Most importantly, the author becomes more confident in coding geospatial data in R

Take-home_Exercise 01

Erika Aldisa

9/11/2020 (updated: 2020-09-25)

IS415 Geospatial Analytics and Application

Instructor: Dr. Kam Tin Seong.

Assoc. Professor of Information Systems (Practice)

1. Overview

1.1. Data sets used

1.2. Install & Load R packages

2. Section A: Exploratory Spatial Data Analysis

Choropleth mapping to reveal supply and demand of childcare and kindergarten services at plannning subzone level

2.1 Importing the geospatial data

Plotting Base Map (study area)

2.2 Tidying KML data

2.3 Importing the aspatial data (csv file)

2.4 Combining spatial & aspatial data

2.5 Choropleth Mapping

Demand - Population

Map Function for Demand

Population by planning subzone area

Map description:

Population (Preschool)

Map description: From the map, we can see that there are 3 planning area that are in dark blue meaning a high population of children aged 0 to 7 (21.10% - 24.63%). These planning subzone areas are:

Population (Childcare)

Map description: From the map, we can see that there are 3 planning area that are in dark blue meaning a high population of children aged 0 to 2 (3.269% - 4.120%) that go for childcare services. These planning subzone areas are:

Population (Kindergarten)

Map description: From the map, we can see that there are 3 planning area that are in dark blue meaning a high population of children aged 3 to 7 (17.99% - 20.55%). These planning subzone areas are:

Overall Population (Demand)

Map description:

Supply - Preschool Density in subzone area

Calculating density of preschool services

Number of preschool services

Map Description: the areas with highest preschool centres are:

3. Section A: Analytics mapping

Revealing relationship between childcare & kindergarten services at planning subzone level

Map function for density

Map description: From the map, we can see that there are plenty of planning subzone areas with preschool density values of more than 20 per km^2. However, 2 of the areas are in Downtown core, which is not a residential area.

4. Section A: Drawing statistical conclusion from 2 & 3

Map Description:

5. Section B: Exploratory Spatial Data Analysis

5.1 Conversion to ppp object format

5.2 Combining ppp object and owin object (Whole of SG)

5.3 Visualing study area and point maps

After a few runs of the L-function, the author’s laptop does not have enough resources to process the study area of whole of Singapore.

5.4 Combining ppp object and owin object (4 planning area)

5.5 Visualing 4 planning areas

6. Section B: 2nd order spatial point pattern analysis

6.1 Sengkang Planning Area

6.1.1 Computing L-function estimation (childcare)

6.1.2 Complete Spatial Randomness Test (childcare)

6.1.3 Computing L-function estimation (kindergarten)

6.1.4 Complete Spatial Randomness Test (kindergarten)

6.2 Bedok Planning Area

6.2.1 Computing L-function estimation (childcare)

6.2.2 Complete Spatial Randomness Test (childcare)

6.2.3 Computing L-function estimation (kindergarten)

6.2.4 Complete Spatial Randomness Test (kindergarten)

6.3 Bukit Batok Planning Area

6.3.1 Computing L-function estimation (childcare)

6.3.2 Complete Spatial Randomness Test (childcare)

6.3.3 Computing L-function estimation (kindergarten)

6.3.4 Complete Spatial Randomness Test (kindergarten)

6.4 Hougang Planning Area

6.4.1 Computing L-function estimation (childcare)

6.4.2 Complete Spatial Randomness Test (childcare)

6.4.3 Computing L-function estimation (kindergarten)

6.4.4 Complete Spatial Randomness Test (kindergarten)

7. Section B: Kerney Density Estimation (KDE)

7.1 Sengkang’s KDE map

7.2 Bedok’s KDE map

7.3 Bukit Batok’s KDE map

7.4 Hougang’s KDE map

7.5 Displaying KDE maps on openstreetmap of Singapore

7.5.1 Sengkang childcare raster object

7.5.1.1 Sengkang childcare KDE output in tmap

7.5.1.2 Sengkang kindergarten KDE output in tmap

7.5.2 Bedok KDE output in tmap

7.5.3 Bukit Batok KDE output in tmap

7.5.4 Hougang KDE output in tmap

7.6 Advantages of KDE over point maps