1 Introduction

This analysis compares childcare services in 2017 and 2020. It has two parts: (1) a supply and demand analysis of childcare services, and (2) a spatial point pattern analysis of childcare services.

The analysis defines children aged 0 to 6 as the potential beneficiaries of childcare services, because childcare centers under the Early Childhood Development Agency (ECDA) only provide programmes for children below the age of 7.

2 Data

The following data sets will be used in the analysis.

Data	Format	Description	Source
Childcare (2020)	KML	Location of childcare services in 2020	data.gov.sg (https://data.gov.sg/dataset/child-care-services)
Childcare (2017)	Shapefile	Location of childcare services in 2017	Hands-on Exercise Data
Subzone	Shapefile	Master Plan 2014 subzone boundary polygons	data.gov.sg (https://data.gov.sg/dataset/master-plan-2014-subzone-boundary-no-sea)
Population data	CSV	Singapore residents by planning area, subzone, age group, sex and type of dwelling (2011-2019)	Department of Statistics Singapore (https://www.singstat.gov.sg/find-data/search-by-theme/population/geographic-distribution/latest-data)
Crude birth rate	CSV	Number of live births in Singapore each year, per thousand mid-year population	data.gov.sg (https://data.gov.sg/dataset/births-and-fertility-annual?resource_id=2ba37efc-5411-4f1f-aecf-ea2455c9236d)

3 Install and load packages

packages = c('tidyverse', 'tmap', 'sf', 'rgdal', 'maptools', 'raster', 'spatstat', 'plotly', 'tmaptools')
for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}

4 Data import

Import the data and examine the contents of each data set.

4.1 Childcare (2020)

Import as sf

childcare20_sf <- st_read('data/geospatial/child-care-services-kml.kml')

## Reading layer `CHILDCARE' from data source `C:\Users\Xiao Rong\Desktop\School\Geospatial Analytics and Applications\Assignments\Take-Home Exercise 1\IS415_Take-home_Ex01\data\geospatial\child-care-services-kml.kml' using driver `KML'
## Simple feature collection with 1545 features and 2 fields
## geometry type:  POINT
## dimension:      XYZ
## bbox:           xmin: 103.6824 ymin: 1.248403 xmax: 103.9897 ymax: 1.462134
## z_range:        zmin: 0 zmax: 0
## geographic CRS: WGS 84

Glimpse

glimpse(childcare20_sf)

## Rows: 1,545
## Columns: 3
## $ Name        <chr> "kml_1", "kml_2", "kml_3", "kml_4", "kml_5", "kml_6", "...
## $ Description <chr> "<center><table><tr><th colspan='2' align='center'><em>...
## $ geometry    <POINT [°]> POINT Z (103.8331 1.42972 0), POINT Z (103.8138 1...

4.2 Childcare (2017)

Import as sf

childcare17_sf <- st_read(dsn = 'data/geospatial',
                          layer = 'CHILDCARE')

## Reading layer `CHILDCARE' from data source `C:\Users\Xiao Rong\Desktop\School\Geospatial Analytics and Applications\Assignments\Take-Home Exercise 1\IS415_Take-home_Ex01\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 1312 features and 18 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 11203.01 ymin: 25667.6 xmax: 45404.24 ymax: 49300.88
## projected CRS:  SVY21

Glimpse

glimpse(childcare17_sf)

## Rows: 1,312
## Columns: 19
## $ OBJECTID   <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1...
## $ ADDRESSBLO <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ ADDRESSBUI <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ ADDRESSPOS <chr> "387908", "489773", "569880", "520114", "437157", "76092...
## $ ADDRESSSTR <chr> "11 LORONG 37 GEYLANG SINGAPORE 387908", "13 BEDOK RIA P...
## $ ADDRESSTYP <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ DESCRIPTIO <chr> "Child Care Services", "Child Care Services", "Child Car...
## $ HYPERLINK  <chr> "http://www.childcarelink.gov.sg/ccls/chdcentpart/ChdCen...
## $ LANDXADDRE <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ LANDYADDRE <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ NAME       <chr> "FIRST JUNIOR PRESCHOOL", "DISCOVERY KIDZ PRESCHOOL PTE....
## $ PHOTOURL   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ ADDRESSFLO <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ INC_CRC    <chr> "45DBE80EB321A9B5", "B80CD6C30B33E468", "53490A27EDC8D9B...
## $ FMEL_UPD_D <date> 2016-12-23, 2016-12-23, 2016-12-23, 2016-12-23, 2016-12...
## $ ADDRESSUNI <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ X_ADDR     <dbl> 34246.67, 41122.34, 32682.01, 41034.07, 34806.74, 28423....
## $ Y_ADDR     <dbl> 33141.02, 34355.95, 39989.33, 36221.83, 32997.58, 45471....
## $ geometry   <POINT [m]> POINT (34246.67 33141.02), POINT (41122.34 34355.9...

4.3 Subzone

Import as sf

subzone_sf <- st_read(dsn = 'data/geospatial',
                      layer = 'MP14_SUBZONE_NO_SEA_PL')

## Reading layer `MP14_SUBZONE_NO_SEA_PL' from data source `C:\Users\Xiao Rong\Desktop\School\Geospatial Analytics and Applications\Assignments\Take-Home Exercise 1\IS415_Take-home_Ex01\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 323 features and 15 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS:  SVY21

Glimpse

glimpse(subzone_sf)

## Rows: 323
## Columns: 16
## $ OBJECTID   <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1...
## $ SUBZONE_NO <int> 2, 2, 3, 4, 5, 4, 10, 12, 4, 6, 1, 1, 3, 8, 3, 7, 9, 2, ...
## $ SUBZONE_N  <chr> "PEOPLE'S PARK", "BUKIT MERAH", "CHINATOWN", "PHILLIP", ...
## $ SUBZONE_C  <chr> "OTSZ02", "BMSZ02", "OTSZ03", "DTSZ04", "DTSZ05", "OTSZ0...
## $ CA_IND     <chr> "Y", "N", "Y", "Y", "Y", "Y", "N", "Y", "N", "Y", "Y", "...
## $ PLN_AREA_N <chr> "OUTRAM", "BUKIT MERAH", "OUTRAM", "DOWNTOWN CORE", "DOW...
## $ PLN_AREA_C <chr> "OT", "BM", "OT", "DT", "DT", "OT", "BM", "DT", "BM", "D...
## $ REGION_N   <chr> "CENTRAL REGION", "CENTRAL REGION", "CENTRAL REGION", "C...
## $ REGION_C   <chr> "CR", "CR", "CR", "CR", "CR", "CR", "CR", "CR", "CR", "C...
## $ INC_CRC    <chr> "B4120D23006C932A", "1C51019439A68700", "0FF1661344C84AE...
## $ FMEL_UPD_D <date> 2016-05-11, 2016-05-11, 2016-05-11, 2016-05-11, 2016-05...
## $ X_ADDR     <dbl> 28831.78, 26360.80, 29153.97, 29706.72, 29968.62, 29509....
## $ Y_ADDR     <dbl> 29419.65, 29384.14, 29158.04, 29744.91, 29572.76, 29646....
## $ SHAPE_Leng <dbl> 1822.1927, 3074.9632, 4297.5999, 871.5549, 1872.7522, 16...
## $ SHAPE_Area <dbl> 93140.44, 411722.82, 587222.68, 39437.94, 188767.49, 133...
## $ geometry   <MULTIPOLYGON [m]> MULTIPOLYGON (((29099.02 29..., MULTIPOLYGO...

4.4 Population data

Import

population <- read_csv('data/aspatial/respopagesextod2011to2019.csv')

## Parsed with column specification:
## cols(
##   PA = col_character(),
##   SZ = col_character(),
##   AG = col_character(),
##   Sex = col_character(),
##   TOD = col_character(),
##   Pop = col_double(),
##   Time = col_double()
## )

Glimpse

glimpse(population)

## Rows: 883,728
## Columns: 7
## $ PA   <chr> "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio", "Ang M...
## $ SZ   <chr> "Ang Mo Kio Town Centre", "Ang Mo Kio Town Centre", "Ang Mo Ki...
## $ AG   <chr> "0_to_4", "0_to_4", "0_to_4", "0_to_4", "0_to_4", "0_to_4", "0...
## $ Sex  <chr> "Males", "Males", "Males", "Males", "Males", "Males", "Males",...
## $ TOD  <chr> "HDB 1- and 2-Room Flats", "HDB 3-Room Flats", "HDB 4-Room Fla...
## $ Pop  <dbl> 0, 10, 30, 50, 0, 0, 40, 0, 0, 10, 30, 60, 0, 0, 40, 0, 0, 10,...
## $ Time <dbl> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 20...

4.5 Birth rate

Import

birthrate <- read_csv('data/aspatial/crude-birth-rate.csv')

## Parsed with column specification:
## cols(
##   year = col_double(),
##   level_1 = col_character(),
##   value = col_double()
## )

Glimpse

glimpse(birthrate)

## Rows: 59
## Columns: 3
## $ year    <dbl> 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969,...
## $ level_1 <chr> "Crude Birth Rate", "Crude Birth Rate", "Crude Birth Rate",...
## $ value   <dbl> 37.5, 35.2, 33.7, 33.2, 31.6, 29.5, 28.3, 25.6, 23.5, 21.8,...

5 Data wrangling

Pre-process and prepare data for analysis.

5.1 Handle invalid geometries

Ensure that spatial data to be used for analysis has no invalid geometries.

5.1.1 Check for invalid geometries

Subzone

length(which(st_is_valid(subzone_sf) == FALSE))

## [1] 9

There are 9 invalid geometries in the subzone data.

Childcare (2020)

length(which(st_is_valid(childcare20_sf) == FALSE))

## [1] 0

There are no invalid geometries for childcare (2020) data.

Childcare (2017)

length(which(st_is_valid(childcare17_sf) == FALSE))

## [1] 0

There are no invalid geometries for childcare (2017) data.

5.1.2 Create valid representation

Only the subzone data has invalid geometries. Handle the invalid geometries and make them valid.

subzone_sf <- st_make_valid(subzone_sf)
length(which(st_is_valid(subzone_sf) == FALSE))

## [1] 0

The invalid geometries have been repaired, and none remain in the spatial datasets.
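To illustrate what an invalid geometry looks like, the sketch below builds a self-intersecting "bowtie" polygon and repairs it with st_make_valid(). The polygon is toy data and is not taken from the subzone layer. The sketch also uses sum(!st_is_valid(x)) as a shorter way to count invalid geometries, on the assumption that st_is_valid() returns no NA values.

```r
library(sf)

# Toy data (hypothetical): a "bowtie" ring that crosses itself at (0.5, 0.5)
bowtie <- st_sfc(st_polygon(list(rbind(
  c(0, 0), c(1, 1), c(1, 0), c(0, 1), c(0, 0)
))))

# Count invalid geometries before and after the repair
n_invalid_before <- sum(!st_is_valid(bowtie))
repaired <- st_make_valid(bowtie)
n_invalid_after <- sum(!st_is_valid(repaired))
```

The repair may change the geometry type (here a single polygon becomes a multi-part geometry), so it is worth re-checking the result as done above.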

5.2 Handle missing values

Check the population attribute data for missing values, as missing values can impact future calculations.

population[rowSums(is.na(population))!=0,]

## # A tibble: 0 x 7
## # ... with 7 variables: PA <chr>, SZ <chr>, AG <chr>, Sex <chr>, TOD <chr>,
## #   Pop <dbl>, Time <dbl>

There are no missing values in the population data.
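For reference, the same idiom can be tried on a small table that does contain a missing value. The sketch below uses hypothetical toy data (the names toy_population, rows_with_na and na_per_column are illustrative only) and adds a per-column count, which can help locate where missing values sit.

```r
# Toy data (hypothetical), with one missing population count
toy_population <- data.frame(
  SZ  = c("Subzone A", "Subzone B", "Subzone C"),
  Pop = c(10, NA, 30)
)

# Rows containing at least one missing value (same idiom as above)
rows_with_na <- toy_population[rowSums(is.na(toy_population)) != 0, ]

# Number of missing values in each column
na_per_column <- colSums(is.na(toy_population))
```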

5.3 Spatial data preparation: define projection

Prepare spatial data for use in spatial analysis. Define the coordinate reference system (CRS).

This analysis uses spatial data for Singapore.
All spatial data will be projected in EPSG:3414 (SVY21 / Singapore TM), a projected coordinate system for Singapore.
For each spatial dataset, the CRS will be checked, and then assigned or transformed accordingly.

5.3.1 Childcare (2020)

Check CRS

st_crs(childcare20_sf)

## Coordinate Reference System:
##   User input: WGS 84 
##   wkt:
## GEOGCRS["WGS 84",
##     DATUM["World Geodetic System 1984",
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     ID["EPSG",4326]]

Transform CRS

Transform projection of childcare20_sf from WGS84 to EPSG:3414.

childcare20_sf3414 <- st_transform(childcare20_sf, 3414)
st_crs(childcare20_sf3414)

## Coordinate Reference System:
##   User input: EPSG:3414 
##   wkt:
## PROJCRS["SVY21 / Singapore TM",
##     BASEGEOGCRS["SVY21",
##         DATUM["SVY21",
##             ELLIPSOID["WGS 84",6378137,298.257223563,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4757]],
##     CONVERSION["Singapore Transverse Mercator",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",1.36666666666667,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",103.833333333333,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",1,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",28001.642,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",38744.572,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["northing (N)",north,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["easting (E)",east,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     USAGE[
##         SCOPE["unknown"],
##         AREA["Singapore"],
##         BBOX[1.13,103.59,1.47,104.07]],
##     ID["EPSG",3414]]

5.3.2 Childcare (2017)

Check CRS

st_crs(childcare17_sf)

## Coordinate Reference System:
##   User input: SVY21 
##   wkt:
## PROJCRS["SVY21",
##     BASEGEOGCRS["SVY21[WGS84]",
##         DATUM["World Geodetic System 1984",
##             ELLIPSOID["WGS 84",6378137,298.257223563,
##                 LENGTHUNIT["metre",1]],
##             ID["EPSG",6326]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["Degree",0.0174532925199433]]],
##     CONVERSION["unnamed",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",1.36666666666667,
##             ANGLEUNIT["Degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",103.833333333333,
##             ANGLEUNIT["Degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",1,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",28001.642,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",38744.572,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]]]

Assign CRS

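Assign EPSG:3414 as the projection to childcare17_sf. The coordinates are already in SVY21 metres with the same projection parameters as EPSG:3414, but the CRS carries no EPSG code, so the code is assigned with st_set_crs() rather than reprojecting with st_transform().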

childcare17_sf3414 <- st_set_crs(childcare17_sf, 3414)
st_crs(childcare17_sf3414)

## Coordinate Reference System:
##   User input: EPSG:3414 
##   wkt:
## PROJCRS["SVY21 / Singapore TM",
##     BASEGEOGCRS["SVY21",
##         DATUM["SVY21",
##             ELLIPSOID["WGS 84",6378137,298.257223563,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4757]],
##     CONVERSION["Singapore Transverse Mercator",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",1.36666666666667,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",103.833333333333,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",1,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",28001.642,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",38744.572,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["northing (N)",north,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["easting (E)",east,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     USAGE[
##         SCOPE["unknown"],
##         AREA["Singapore"],
##         BBOX[1.13,103.59,1.47,104.07]],
##     ID["EPSG",3414]]
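The difference between transforming and assigning a CRS can be seen on a single point. The sketch below is a toy example, not part of the analysis data: it takes the natural origin of the Singapore TM projection (the latitude and longitude shown in the CRS output above) as a WGS 84 point, then compares st_transform() with st_set_crs(). The object names are illustrative.

```r
library(sf)

# Toy point (hypothetical): the natural origin of the Singapore TM projection,
# given as longitude/latitude in WGS 84
pt_wgs84 <- st_sfc(st_point(c(103.833333333333, 1.36666666666667)), crs = 4326)

# st_transform() reprojects: the coordinates become metres in EPSG:3414
pt_transformed <- st_transform(pt_wgs84, 3414)
xy_transformed <- st_coordinates(pt_transformed)

# st_set_crs() only relabels: the numbers stay as degrees, which would be wrong
# here (sf warns that replacing the CRS does not reproject the data)
pt_relabelled <- suppressWarnings(st_set_crs(pt_wgs84, 3414))
xy_relabelled <- st_coordinates(pt_relabelled)
```

Because the point is the natural origin, the transformed coordinates should lie close to the false easting and false northing (28001.642, 38744.572), while the relabelled coordinates are unchanged. This is why st_transform() suits data stored in degrees, and st_set_crs() suits data whose coordinates are already in SVY21 metres.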

5.3.3 Subzone

Check CRS

st_crs(subzone_sf)

## Coordinate Reference System:
##   User input: SVY21 
##   wkt:
## PROJCRS["SVY21",
##     BASEGEOGCRS["SVY21[WGS84]",
##         DATUM["World Geodetic System 1984",
##             ELLIPSOID["WGS 84",6378137,298.257223563,
##                 LENGTHUNIT["metre",1]],
##             ID["EPSG",6326]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["Degree",0.0174532925199433]]],
##     CONVERSION["unnamed",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",1.36666666666667,
##             ANGLEUNIT["Degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",103.833333333333,
##             ANGLEUNIT["Degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",1,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",28001.642,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",38744.572,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]]]

Assign CRS

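Assign EPSG:3414 as the projection to subzone_sf. The coordinates are already in SVY21 metres, so only the CRS label is set and no reprojection is needed.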

subzone_sf3414 <- st_set_crs(subzone_sf, 3414)
st_crs(subzone_sf3414)

## Coordinate Reference System:
##   User input: EPSG:3414 
##   wkt:
## PROJCRS["SVY21 / Singapore TM",
##     BASEGEOGCRS["SVY21",
##         DATUM["SVY21",
##             ELLIPSOID["WGS 84",6378137,298.257223563,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4757]],
##     CONVERSION["Singapore Transverse Mercator",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",1.36666666666667,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",103.833333333333,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",1,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",28001.642,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",38744.572,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["northing (N)",north,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["easting (E)",east,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     USAGE[
##         SCOPE["unknown"],
##         AREA["Singapore"],
##         BBOX[1.13,103.59,1.47,104.07]],
##     ID["EPSG",3414]]

5.3.4 Plot spatial data

Plot spatial data to check that all data have a consistent projection system.

tm_shape(subzone_sf3414) +
  tm_polygons(border.col='grey') +
  tm_shape(childcare17_sf3414) + 
  tm_dots(col='brown4') +
  tm_shape(childcare20_sf3414) +
  tm_dots(col='lightsteelblue4') +
  tm_layout(frame = FALSE)

The subzone boundaries and childcare points appear to be aligned, so all spatial data are taken to share a consistent projection.
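A visual check can be complemented with a numeric one. The sketch below is one possible approach on toy data, not the check used in this analysis: it confirms that two layers report the same CRS and counts how many polygons each point intersects. The third toy point mimics a longitude/latitude pair that was labelled as EPSG:3414 without being transformed, so it falls outside every polygon. All object names and coordinates are hypothetical.

```r
library(sf)

# Helper (hypothetical) to build an axis-aligned square polygon
square <- function(x0, y0, size = 10) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + size, y0), c(x0 + size, y0 + size),
    c(x0, y0 + size), c(x0, y0)
  )))
}

# Toy zones and points, both labelled EPSG:3414
toy_zones <- st_sf(
  zone = c("A", "B"),
  geometry = st_sfc(square(0, 0), square(10, 0), crs = 3414)
)
toy_points <- st_sf(
  id = 1:3,
  geometry = st_sfc(
    st_point(c(2, 2)), st_point(c(15, 5)), st_point(c(103.8, 1.4)),
    crs = 3414
  )
)

# Both layers should report the same CRS
same_crs <- st_crs(toy_points) == st_crs(toy_zones)

# Number of zones each point intersects; 0 flags a point outside all zones
hits <- lengths(st_intersects(toy_points, toy_zones))
```

On real data, a few points may legitimately fall outside every polygon (for example, near the coastline), so a small number of zero counts is not by itself proof of a projection problem.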

5.4 Pre-process attribute data

5.4.1 Extract 2017 and 2020 population data

Analysis will be performed on child care services data in 2017 and 2020. Hence, population data in 2017 and 2020 will be extracted separately from the population dataset.

Note: Population data in 2019 will be extracted and used as a proxy for the population in 2020, since population information for 2020 has not been made available yet.

population17 <- population %>%
  filter(Time == 2017)

population20 <- population %>%
  filter(Time == 2019)

5.4.2 Tidy data for analysis

Tidy the population dataset for each year, such that for each subzone, there will only be one row corresponding to population information.

Also, create another column calculating the total population for all age groups for each subzone.

2017 population data

population17_tidy <- population17 %>%
  group_by(PA, SZ, AG) %>%
  summarise(`POP` = sum(`Pop`)) %>%
  ungroup() %>%
  pivot_wider(names_from = AG, values_from = POP) %>%
  mutate(TOTAL = rowSums(.[3:21]))

glimpse(population17_tidy)

## Rows: 323
## Columns: 22
## $ PA            <chr> "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio...
## $ SZ            <chr> "Ang Mo Kio Town Centre", "Cheng San", "Chong Boon", ...
## $ `0_to_4`      <dbl> 190, 1120, 850, 680, 190, 590, 280, 960, 0, 170, 0, 9...
## $ `10_to_14`    <dbl> 320, 1150, 1050, 990, 400, 740, 430, 950, 0, 200, 0, ...
## $ `15_to_19`    <dbl> 280, 1280, 1320, 1120, 520, 830, 530, 870, 0, 260, 0,...
## $ `20_to_24`    <dbl> 250, 1460, 1410, 1290, 550, 1010, 650, 1110, 0, 340, ...
## $ `25_to_29`    <dbl> 320, 1850, 1750, 1490, 510, 1040, 700, 1460, 0, 330, ...
## $ `30_to_34`    <dbl> 340, 1960, 1730, 1300, 290, 980, 400, 1430, 0, 220, 0...
## $ `35_to_39`    <dbl> 380, 2340, 1950, 1510, 280, 1080, 440, 1830, 0, 220, ...
## $ `40_to_44`    <dbl> 470, 2230, 1980, 1690, 480, 1230, 560, 1820, 0, 260, ...
## $ `45_to_49`    <dbl> 450, 2180, 1880, 1740, 530, 1240, 580, 1530, 0, 280, ...
## $ `5_to_9`      <dbl> 300, 1080, 920, 810, 320, 660, 340, 1030, 0, 160, 0, ...
## $ `50_to_54`    <dbl> 350, 2250, 2180, 1870, 550, 1440, 640, 1580, 0, 310, ...
## $ `55_to_59`    <dbl> 310, 2140, 2150, 1820, 550, 1430, 730, 1750, 0, 330, ...
## $ `60_to_64`    <dbl> 290, 2240, 2170, 1780, 460, 1350, 640, 1710, 0, 330, ...
## $ `65_to_69`    <dbl> 280, 2060, 2010, 1670, 420, 1140, 470, 1590, 0, 260, ...
## $ `70_to_74`    <dbl> 170, 1210, 1450, 1120, 250, 800, 250, 1130, 0, 150, 0...
## $ `75_to_79`    <dbl> 150, 910, 1110, 890, 220, 640, 220, 970, 0, 110, 0, 7...
## $ `80_to_84`    <dbl> 70, 540, 570, 500, 140, 380, 160, 530, 0, 80, 0, 380,...
## $ `85_to_89`    <dbl> 40, 280, 300, 290, 100, 220, 90, 300, 0, 40, 0, 200, ...
## $ `90_and_over` <dbl> 0, 120, 140, 120, 40, 90, 40, 150, 0, 30, 0, 80, 50, ...
## $ TOTAL         <dbl> 4960, 28400, 26920, 22680, 6800, 16890, 8150, 22700, ...

2020 population data

population20_tidy <- population20 %>%
  group_by(PA, SZ, AG) %>%
  summarise(`POP` = sum(`Pop`)) %>%
  ungroup() %>%
  pivot_wider(names_from = AG, values_from = POP) %>%
  mutate(TOTAL = rowSums(.[3:21]))

glimpse(population20_tidy)

## Rows: 323
## Columns: 22
## $ PA            <chr> "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio...
## $ SZ            <chr> "Ang Mo Kio Town Centre", "Cheng San", "Chong Boon", ...
## $ `0_to_4`      <dbl> 170, 1050, 840, 730, 200, 540, 220, 750, 0, 160, 0, 7...
## $ `10_to_14`    <dbl> 330, 1060, 1020, 1020, 410, 680, 430, 930, 0, 230, 0,...
## $ `15_to_19`    <dbl> 310, 1210, 1150, 1090, 460, 720, 490, 830, 0, 260, 0,...
## $ `20_to_24`    <dbl> 290, 1380, 1380, 1170, 530, 860, 570, 930, 0, 320, 0,...
## $ `25_to_29`    <dbl> 290, 1810, 1600, 1450, 510, 1030, 690, 1370, 0, 330, ...
## $ `30_to_34`    <dbl> 280, 2010, 1930, 1400, 320, 1030, 490, 1370, 0, 240, ...
## $ `35_to_39`    <dbl> 330, 2220, 1820, 1550, 290, 1010, 360, 1520, 0, 240, ...
## $ `40_to_44`    <dbl> 430, 2050, 1900, 1700, 400, 1090, 450, 1700, 0, 250, ...
## $ `45_to_49`    <dbl> 470, 2220, 1840, 1860, 540, 1170, 600, 1550, 0, 330, ...
## $ `5_to_9`      <dbl> 260, 1000, 880, 840, 280, 590, 320, 910, 0, 170, 0, 9...
## $ `50_to_54`    <dbl> 350, 2070, 1980, 1820, 540, 1260, 590, 1400, 0, 290, ...
## $ `55_to_59`    <dbl> 350, 2160, 2150, 1800, 550, 1420, 700, 1630, 0, 350, ...
## $ `60_to_64`    <dbl> 290, 2200, 2160, 1750, 480, 1240, 710, 1680, 0, 360, ...
## $ `65_to_69`    <dbl> 270, 2150, 2080, 1700, 420, 1140, 520, 1590, 0, 250, ...
## $ `70_to_74`    <dbl> 200, 1570, 1670, 1300, 300, 910, 330, 1310, 0, 170, 0...
## $ `75_to_79`    <dbl> 140, 1020, 1190, 900, 240, 670, 220, 960, 0, 110, 0, ...
## $ `80_to_84`    <dbl> 80, 580, 740, 590, 150, 400, 190, 610, 0, 80, 0, 420,...
## $ `85_to_89`    <dbl> 40, 310, 400, 330, 100, 240, 100, 330, 0, 30, 0, 230,...
## $ `90_and_over` <dbl> 10, 150, 200, 140, 50, 120, 60, 180, 0, 30, 0, 100, 6...
## $ TOTAL         <dbl> 4890, 28220, 26930, 23140, 6770, 16120, 8040, 21550, ...
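The tidying steps can be followed more easily on a small table. The sketch below applies the same group, sum and pivot sequence to hypothetical toy data with two subzones and two age groups. It differs from the code above in two ways, both of which are alternatives rather than the method used in this analysis: the total is computed by excluding the PA and SZ columns by name instead of by column position, and .groups = "drop" replaces the separate ungroup() call. Both assume dplyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

# Toy data (hypothetical): several rows per subzone and age group
toy <- tibble(
  PA  = "P",
  SZ  = c("A", "A", "A", "B", "B", "B"),
  AG  = c("0_to_4", "0_to_4", "5_to_9", "0_to_4", "5_to_9", "5_to_9"),
  Pop = c(10, 20, 30, 5, 5, 10)
)

toy_tidy <- toy %>%
  group_by(PA, SZ, AG) %>%
  summarise(POP = sum(Pop), .groups = "drop") %>%        # one row per subzone and age group
  pivot_wider(names_from = AG, values_from = POP) %>%    # one column per age group
  mutate(TOTAL = rowSums(across(-c(PA, SZ))))            # sum every age-group column
```

Selecting by name keeps the total correct even if the number or order of age-group columns changes, whereas a positional range such as 3:21 has to be updated by hand.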

5.4.3 Extract population information on children

Obtain population information on the number of children aged 0 to 6.

Analysis will be conducted on children aged 0 to 6, as they are the potential beneficiaries of childcare services in Singapore.
However, the population data report children in the age groups 0 to 4 and 5 to 9.
As such, the number of children aged 5 and 6 has to be estimated from the number of children in the age group 5 to 9.
This is done by estimating children aged 5 and 6 as a proportion of all children aged 5 to 9.
The crude birth rate (live births per thousand mid-year population) in Singapore for the relevant birth years will be used as a proxy for the number of births when calculating this proportion.

\[\small Percentage\ of\ children\ aged\ 5\ and\ 6\ in\ the\ age\ group\ 5\ to\ 9\ in\ 2017\] \[\small = \frac{(Number\ of\ births\ in\ 2011\ +\ Number\ of\ births\ in\ 2012)}{Number\ of\ births\ from\ 2008\ to\ 2012} \]

\[\small Percentage\ of\ children\ aged\ 5\ and\ 6\ in\ the\ age\ group\ 5\ to\ 9\ in\ 2019\] \[\small = \frac{(Number\ of\ births\ in\ 2013\ +\ Number\ of\ births\ in\ 2014)}{Number\ of\ births\ from\ 2010\ to\ 2014} \]

The function below obtains the number of births (per thousand population) in Singapore for a particular year.

get_birth <- function(yr, df) {
  df1 <- df %>%
    filter(year == yr)
  df1$value
}
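The proportion defined by the formulas above can be worked through by hand. The sketch below uses hypothetical crude birth rates that are flat at 10 per thousand (not the real figures), together with the 2017 counts for Ang Mo Kio Town Centre shown earlier (190 children aged 0 to 4 and 300 aged 5 to 9). The helper get_birth_checked() is an illustrative base R variant of get_birth() that stops if a year is missing or duplicated, rather than silently returning an empty or longer vector.

```r
# Hypothetical crude birth rates (NOT the real figures), flat at 10 per thousand
toy_birthrate <- data.frame(year = 2008:2012, value = rep(10, 5))

# Illustrative variant of get_birth() with a guard on the number of matches
get_birth_checked <- function(yr, df) {
  value <- df$value[df$year == yr]
  stopifnot(length(value) == 1)
  value
}

# Birth rates for the years in which children aged 5 to 9 in 2017 were born
births <- sapply(2008:2012, get_birth_checked, df = toy_birthrate)

# Share of the 5-to-9 group born in 2011 and 2012, i.e. aged 5 and 6 in 2017
perc_toy <- sum(births[4:5]) / sum(births)

# Children aged 0 to 6 = children aged 0 to 4 + estimated children aged 5 and 6
children_0_to_6 <- round(190 + 300 * perc_toy)
```

With flat birth rates the proportion is exactly two years out of five, so 300 children aged 5 to 9 contribute 120 children aged 5 and 6. The real proportion depends on the actual birth rates and will differ slightly from this value.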

The code chunk below:

Processes population data to calculate the number of children aged 0 to 6.
Keeps the total population of each subzone, so that the number of children can later be expressed as a percentage of the total population.

2017 population data

# obtain the proportion of children aged 5 to 9 who are aged 5 or 6 in 2017
perc17 <- (get_birth(2011, birthrate) + get_birth(2012, birthrate)) /
  (get_birth(2008, birthrate) + get_birth(2009, birthrate) + get_birth(2010, birthrate) +
     get_birth(2011, birthrate) + get_birth(2012, birthrate))

# get 2017 population data for children aged 0 to 6 for each subzone
pop17 <- population17_tidy %>%
  mutate(`5_to_6` = `5_to_9` * perc17) %>%
  mutate(`0_to_6` = round(`0_to_4` + `5_to_6`)) %>%
  dplyr::select(PA, SZ, `0_to_6`, TOTAL)
  
glimpse(pop17)

## Rows: 323
## Columns: 4
## $ PA       <chr> "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio", "A...
## $ SZ       <chr> "Ang Mo Kio Town Centre", "Cheng San", "Chong Boon", "Kebu...
## $ `0_to_6` <dbl> 310, 1552, 1218, 1004, 318, 854, 416, 1372, 0, 234, 0, 137...
## $ TOTAL    <dbl> 4960, 28400, 26920, 22680, 6800, 16890, 8150, 22700, 0, 40...

2020 population data

# obtain the proportion of children aged 5 to 9 who are aged 5 or 6 (2019 data, used as the proxy for 2020)
perc20 <- (get_birth(2013, birthrate) + get_birth(2014, birthrate)) /
  (get_birth(2010, birthrate) + get_birth(2011, birthrate) + get_birth(2012, birthrate) +
     get_birth(2013, birthrate) + get_birth(2014, birthrate))

# get 2020 population data for children aged 0 to 6 for each subzone
pop20 <- population20_tidy %>%
  mutate(`5_to_6` = `5_to_9` * perc20) %>%
  mutate(`0_to_6` = round(`0_to_4` + `5_to_6`)) %>%
  dplyr::select(PA, SZ, `0_to_6`, TOTAL)
  
glimpse(pop20)

## Rows: 323
## Columns: 4
## $ PA       <chr> "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio", "A...
## $ SZ       <chr> "Ang Mo Kio Town Centre", "Cheng San", "Chong Boon", "Kebu...
## $ `0_to_6` <dbl> 273, 1448, 1190, 1064, 311, 775, 347, 1112, 0, 228, 0, 115...
## $ TOTAL    <dbl> 4890, 28220, 26930, 23140, 6770, 16120, 8040, 21550, 0, 42...

5.4.4 Join data

Join population attribute data to the spatial subzone data.

2017 subzone-population data

pop17_mutate <- pop17 %>% 
  mutate_at(.vars = vars(PA, SZ),
            .funs = toupper)
subzone17 <- left_join(subzone_sf3414, pop17_mutate, by = c('SUBZONE_N' = 'SZ'))

glimpse(subzone17)

## Rows: 323
## Columns: 19
## $ OBJECTID   <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1...
## $ SUBZONE_NO <int> 2, 2, 3, 4, 5, 4, 10, 12, 4, 6, 1, 1, 3, 8, 3, 7, 9, 2, ...
## $ SUBZONE_N  <chr> "PEOPLE'S PARK", "BUKIT MERAH", "CHINATOWN", "PHILLIP", ...
## $ SUBZONE_C  <chr> "OTSZ02", "BMSZ02", "OTSZ03", "DTSZ04", "DTSZ05", "OTSZ0...
## $ CA_IND     <chr> "Y", "N", "Y", "Y", "Y", "Y", "N", "Y", "N", "Y", "Y", "...
## $ PLN_AREA_N <chr> "OUTRAM", "BUKIT MERAH", "OUTRAM", "DOWNTOWN CORE", "DOW...
## $ PLN_AREA_C <chr> "OT", "BM", "OT", "DT", "DT", "OT", "BM", "DT", "BM", "D...
## $ REGION_N   <chr> "CENTRAL REGION", "CENTRAL REGION", "CENTRAL REGION", "C...
## $ REGION_C   <chr> "CR", "CR", "CR", "CR", "CR", "CR", "CR", "CR", "CR", "C...
## $ INC_CRC    <chr> "B4120D23006C932A", "1C51019439A68700", "0FF1661344C84AE...
## $ FMEL_UPD_D <date> 2016-05-11, 2016-05-11, 2016-05-11, 2016-05-11, 2016-05...
## $ X_ADDR     <dbl> 28831.78, 26360.80, 29153.97, 29706.72, 29968.62, 29509....
## $ Y_ADDR     <dbl> 29419.65, 29384.14, 29158.04, 29744.91, 29572.76, 29646....
## $ SHAPE_Leng <dbl> 1822.1927, 3074.9632, 4297.5999, 871.5549, 1872.7522, 16...
## $ SHAPE_Area <dbl> 93140.44, 411722.82, 587222.68, 39437.94, 188767.49, 133...
## $ PA         <chr> "OUTRAM", "BUKIT MERAH", "OUTRAM", "DOWNTOWN CORE", "DOW...
## $ `0_to_6`   <dbl> 10, 50, 982, 0, 0, 52, 1060, 0, 1184, 0, 0, 364, 0, 638,...
## $ TOTAL      <dbl> 340, 1130, 11410, 0, 0, 1450, 12940, 0, 16250, 0, 0, 778...
## $ geometry   <MULTIPOLYGON [m]> MULTIPOLYGON (((29099.02 29..., MULTIPOLYGO...

2020 subzone-population data

pop20_mutate <- pop20 %>% 
  mutate_at(.vars = vars(PA, SZ),
            .funs = toupper)
subzone20 <- left_join(subzone_sf3414, pop20_mutate, by = c('SUBZONE_N' = 'SZ'))

glimpse(subzone20)

## Rows: 323
## Columns: 19
## $ OBJECTID   <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1...
## $ SUBZONE_NO <int> 2, 2, 3, 4, 5, 4, 10, 12, 4, 6, 1, 1, 3, 8, 3, 7, 9, 2, ...
## $ SUBZONE_N  <chr> "PEOPLE'S PARK", "BUKIT MERAH", "CHINATOWN", "PHILLIP", ...
## $ SUBZONE_C  <chr> "OTSZ02", "BMSZ02", "OTSZ03", "DTSZ04", "DTSZ05", "OTSZ0...
## $ CA_IND     <chr> "Y", "N", "Y", "Y", "Y", "Y", "N", "Y", "N", "Y", "Y", "...
## $ PLN_AREA_N <chr> "OUTRAM", "BUKIT MERAH", "OUTRAM", "DOWNTOWN CORE", "DOW...
## $ PLN_AREA_C <chr> "OT", "BM", "OT", "DT", "DT", "OT", "BM", "DT", "BM", "D...
## $ REGION_N   <chr> "CENTRAL REGION", "CENTRAL REGION", "CENTRAL REGION", "C...
## $ REGION_C   <chr> "CR", "CR", "CR", "CR", "CR", "CR", "CR", "CR", "CR", "C...
## $ INC_CRC    <chr> "B4120D23006C932A", "1C51019439A68700", "0FF1661344C84AE...
## $ FMEL_UPD_D <date> 2016-05-11, 2016-05-11, 2016-05-11, 2016-05-11, 2016-05...
## $ X_ADDR     <dbl> 28831.78, 26360.80, 29153.97, 29706.72, 29968.62, 29509....
## $ Y_ADDR     <dbl> 29419.65, 29384.14, 29158.04, 29744.91, 29572.76, 29646....
## $ SHAPE_Leng <dbl> 1822.1927, 3074.9632, 4297.5999, 871.5549, 1872.7522, 16...
## $ SHAPE_Area <dbl> 93140.44, 411722.82, 587222.68, 39437.94, 188767.49, 133...
## $ PA         <chr> "OUTRAM", "BUKIT MERAH", "OUTRAM", "DOWNTOWN CORE", "DOW...
## $ `0_to_6`   <dbl> 0, 36, 731, 0, 0, 32, 1027, 0, 906, 0, 0, 258, 0, 619, 8...
## $ TOTAL      <dbl> 310, 1100, 10760, 0, 0, 1350, 13030, 0, 15430, 0, 0, 663...
## $ geometry   <MULTIPOLYGON [m]> MULTIPOLYGON (((29099.02 29..., MULTIPOLYGO...
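The upper-casing step matters because the join matches subzone names exactly, including letter case. The sketch below shows this on a toy pair of tables and uses anti_join() to list rows that fail to match, which is one possible way to verify a join and is not a step taken in this analysis. The table names are hypothetical; the two subzone names and their 2017 counts of children are taken from the output above.

```r
library(dplyr)

# Toy tables: upper-case names on the spatial side, title case on the population side
zones <- tibble(SUBZONE_N = c("CHINATOWN", "PHILLIP"))
pop   <- tibble(SZ = c("Chinatown", "Phillip"), `0_to_6` = c(982, 0))

# Without upper-casing, no subzone finds a match
unmatched_raw <- anti_join(zones, pop, by = c("SUBZONE_N" = "SZ"))

# After upper-casing, every subzone matches
pop_upper <- pop %>% mutate(SZ = toupper(SZ))
unmatched_upper <- anti_join(zones, pop_upper, by = c("SUBZONE_N" = "SZ"))

joined <- left_join(zones, pop_upper, by = c("SUBZONE_N" = "SZ"))
```

An empty anti_join() result, together with no missing values in the joined population columns, suggests that every subzone polygon received population data.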

6 Supply-Demand Analysis

The supply and demand of childcare services in 2017 and 2020 will be analysed and compared in this section.
The supply of childcare services is given by the location of childcare services, while the demand for childcare services is given by the number of potential beneficiaries (children aged 0 to 6) of childcare services.

6.1 EDA & Choropleth Mapping

Exploratory data analysis will be conducted, and choropleth mapping will be utilised to analyse the supply and demand of childcare services at the subzone level, in 2017 and 2020. It must be noted that normalised measures are required for choropleth maps. The normalised measures that will be utilised for choropleth mapping are described below.

The demand for childcare services in a subzone will be measured as the number of children aged 0 to 6 as a percentage of the population.

\[\small Demand\ for\ childcare\ in\ subzone = \frac{Number\ of\ children\ aged\ 0\ to\ 6\ in\ the\ subzone}{Total\ population\ of\ subzone}\]

The larger the value for demand, the higher the demand for childcare services in the subzone.
A larger demand value implies a higher percentage of children aged 0 to 6 in the subzone, who are the potential beneficiaries of childcare services.

The supply of childcare in a subzone will be measured as the number of childcare centres per square kilometer of the subzone.

\[\small Supply\ of\ childcare\ in\ subzone = \frac{Number\ of\ childcare\ centers\ in\ the\ subzone}{Total\ area\ of\ subzone\ (km^2)}\]

The larger the value for supply, the higher the supply of childcare services in the subzone.
A larger supply value implies that there are more childcare centers per square kilometer of the subzone to meet the demands of the subzone population.

Demand and supply can also be mapped as a ratio of the number of children aged 0 to 6, to the number of childcare centers in the subzone.

\[\small Children\ to\ Childcare\ Ratio\ of\ subzone = \frac{Number\ of\ children\ aged\ 0\ to\ 6\ at\ subzone}{Number\ of\ childcare\ centers\ at\ subzone}\]

This ratio is a measure of the number of children per childcare center in the subzone.
It provides a measure of whether a subzone is underserved or overserved in terms of childcare services.
The lower the children-to-childcare ratio of a subzone (except -1), the better the demand for childcare services is met in the subzone.
Note: Subzones with children aged 0 to 6 that have no childcare centers located in the subzone will be assigned a value of -1, to indicate that there is no childcare center available to serve the children living in the subzone.

6.1.1 Calculate required data

Calculate the data required for choropleth mapping:

Demand
Supply
Children-to-Childcare Ratio

2017 subzone-population data

# Count number of childcare centers in each subzone (points in polygon)
subzone17$NUM_CHILDCARE <- lengths(st_intersects(subzone17, childcare17_sf3414))

subzone17_ddss <- subzone17 %>%
  # Calculate demand. If there are no children, demand is 0.
  mutate(DEMAND = ifelse(`0_to_6` != 0, `0_to_6` / TOTAL, 0)) %>%
  # Calculate supply. If there are no childcare centers, supply is 0.
  mutate(SUPPLY = ifelse(NUM_CHILDCARE != 0,  
                         round(NUM_CHILDCARE / (SHAPE_Area/1000000), 0), 0)) %>%
  # Calculate children-to-childcare ratio. 
  mutate(CHILD_CENTRE_RATIO = ifelse((NUM_CHILDCARE==0 & `0_to_6`>0), -1,
                                     ifelse((NUM_CHILDCARE==0 & `0_to_6`==0), 0,
                                            round((`0_to_6` / NUM_CHILDCARE), 0))))

glimpse(subzone17_ddss)

## Rows: 323
## Columns: 23
## $ OBJECTID           <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1...
## $ SUBZONE_NO         <int> 2, 2, 3, 4, 5, 4, 10, 12, 4, 6, 1, 1, 3, 8, 3, 7...
## $ SUBZONE_N          <chr> "PEOPLE'S PARK", "BUKIT MERAH", "CHINATOWN", "PH...
## $ SUBZONE_C          <chr> "OTSZ02", "BMSZ02", "OTSZ03", "DTSZ04", "DTSZ05"...
## $ CA_IND             <chr> "Y", "N", "Y", "Y", "Y", "Y", "N", "Y", "N", "Y"...
## $ PLN_AREA_N         <chr> "OUTRAM", "BUKIT MERAH", "OUTRAM", "DOWNTOWN COR...
## $ PLN_AREA_C         <chr> "OT", "BM", "OT", "DT", "DT", "OT", "BM", "DT", ...
## $ REGION_N           <chr> "CENTRAL REGION", "CENTRAL REGION", "CENTRAL REG...
## $ REGION_C           <chr> "CR", "CR", "CR", "CR", "CR", "CR", "CR", "CR", ...
## $ INC_CRC            <chr> "B4120D23006C932A", "1C51019439A68700", "0FF1661...
## $ FMEL_UPD_D         <date> 2016-05-11, 2016-05-11, 2016-05-11, 2016-05-11,...
## $ X_ADDR             <dbl> 28831.78, 26360.80, 29153.97, 29706.72, 29968.62...
## $ Y_ADDR             <dbl> 29419.65, 29384.14, 29158.04, 29744.91, 29572.76...
## $ SHAPE_Leng         <dbl> 1822.1927, 3074.9632, 4297.5999, 871.5549, 1872....
## $ SHAPE_Area         <dbl> 93140.44, 411722.82, 587222.68, 39437.94, 188767...
## $ PA                 <chr> "OUTRAM", "BUKIT MERAH", "OUTRAM", "DOWNTOWN COR...
## $ `0_to_6`           <dbl> 10, 50, 982, 0, 0, 52, 1060, 0, 1184, 0, 0, 364,...
## $ TOTAL              <dbl> 340, 1130, 11410, 0, 0, 1450, 12940, 0, 16250, 0...
## $ geometry           <MULTIPOLYGON [m]> MULTIPOLYGON (((29099.02 29..., MUL...
## $ NUM_CHILDCARE      <int> 0, 4, 6, 1, 2, 1, 3, 0, 2, 0, 0, 5, 0, 2, 1, 9, ...
## $ DEMAND             <dbl> 0.02941176, 0.04424779, 0.08606486, 0.00000000, ...
## $ SUPPLY             <dbl> 0, 10, 10, 25, 11, 8, 7, 0, 6, 0, 0, 9, 0, 3, 3,...
## $ CHILD_CENTRE_RATIO <dbl> -1, 12, 164, 0, 0, 52, 353, 0, 592, 0, 0, 73, 0,...
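As a quick sanity check of the three formulas, the second row of the output above (Bukit Merah: 50 children aged 0 to 6, 1,130 residents, 4 childcare centers, SHAPE_Area of 411,722.82 square meters) can be recomputed by hand. This is only a sketch; the variable names below are illustrative and are not used elsewhere in the analysis.

```r
# Values copied from row 2 of glimpse(subzone17_ddss); names are illustrative
children  <- 50         # `0_to_6`
total_pop <- 1130       # TOTAL
centres   <- 4          # NUM_CHILDCARE
area_m2   <- 411722.82  # SHAPE_Area, in square meters

demand <- children / total_pop                  # share of population aged 0 to 6
supply <- round(centres / (area_m2 / 1e6), 0)   # centers per square kilometer
ratio  <- round(children / centres, 0)          # children per center

print(c(demand = demand, supply = supply, ratio = ratio))
```

The results agree with the DEMAND, SUPPLY and CHILD_CENTRE_RATIO columns for that row. Note that 50 / 4 = 12.5 is rounded to 12 rather than 13, because R's round() rounds an exact half to the nearest even digit.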

2020 subzone-population data

# Count number of childcare centers in each subzone (points in polygon)
subzone20$NUM_CHILDCARE <- lengths(st_intersects(subzone20, childcare20_sf3414))

subzone20_ddss <- subzone20 %>%
  # Calculate demand. If there are no children, demand is 0.
  mutate(DEMAND = ifelse(`0_to_6` != 0, `0_to_6` / TOTAL, 0)) %>%
  # Calculate supply. If there are no childcare centers, supply is 0.
  mutate(SUPPLY = ifelse(NUM_CHILDCARE != 0,  
                         round(NUM_CHILDCARE / (SHAPE_Area/1000000), 0), 0)) %>%
  # Calculate children-to-childcare ratio. 
  mutate(CHILD_CENTRE_RATIO = ifelse((NUM_CHILDCARE==0 & `0_to_6`>0), -1,
                                     ifelse((NUM_CHILDCARE==0 & `0_to_6`==0), 0,
                                            round((`0_to_6` / NUM_CHILDCARE), 0))))

glimpse(subzone20_ddss)

## Rows: 323
## Columns: 23
## $ OBJECTID           <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1...
## $ SUBZONE_NO         <int> 2, 2, 3, 4, 5, 4, 10, 12, 4, 6, 1, 1, 3, 8, 3, 7...
## $ SUBZONE_N          <chr> "PEOPLE'S PARK", "BUKIT MERAH", "CHINATOWN", "PH...
## $ SUBZONE_C          <chr> "OTSZ02", "BMSZ02", "OTSZ03", "DTSZ04", "DTSZ05"...
## $ CA_IND             <chr> "Y", "N", "Y", "Y", "Y", "Y", "N", "Y", "N", "Y"...
## $ PLN_AREA_N         <chr> "OUTRAM", "BUKIT MERAH", "OUTRAM", "DOWNTOWN COR...
## $ PLN_AREA_C         <chr> "OT", "BM", "OT", "DT", "DT", "OT", "BM", "DT", ...
## $ REGION_N           <chr> "CENTRAL REGION", "CENTRAL REGION", "CENTRAL REG...
## $ REGION_C           <chr> "CR", "CR", "CR", "CR", "CR", "CR", "CR", "CR", ...
## $ INC_CRC            <chr> "B4120D23006C932A", "1C51019439A68700", "0FF1661...
## $ FMEL_UPD_D         <date> 2016-05-11, 2016-05-11, 2016-05-11, 2016-05-11,...
## $ X_ADDR             <dbl> 28831.78, 26360.80, 29153.97, 29706.72, 29968.62...
## $ Y_ADDR             <dbl> 29419.65, 29384.14, 29158.04, 29744.91, 29572.76...
## $ SHAPE_Leng         <dbl> 1822.1927, 3074.9632, 4297.5999, 871.5549, 1872....
## $ SHAPE_Area         <dbl> 93140.44, 411722.82, 587222.68, 39437.94, 188767...
## $ PA                 <chr> "OUTRAM", "BUKIT MERAH", "OUTRAM", "DOWNTOWN COR...
## $ `0_to_6`           <dbl> 0, 36, 731, 0, 0, 32, 1027, 0, 906, 0, 0, 258, 0...
## $ TOTAL              <dbl> 310, 1100, 10760, 0, 0, 1350, 13030, 0, 15430, 0...
## $ geometry           <MULTIPOLYGON [m]> MULTIPOLYGON (((29099.02 29..., MUL...
## $ NUM_CHILDCARE      <int> 0, 3, 9, 1, 3, 0, 4, 0, 3, 1, 0, 4, 0, 3, 1, 8, ...
## $ DEMAND             <dbl> 0.00000000, 0.03272727, 0.06793680, 0.00000000, ...
## $ SUPPLY             <dbl> 0, 7, 15, 25, 16, 0, 9, 0, 9, 4, 0, 7, 0, 5, 3, ...
## $ CHILD_CENTRE_RATIO <dbl> 0, 12, 81, 0, 0, -1, 257, 0, 302, 0, 0, 64, 0, 2...
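Both chunks above count childcare centers per subzone with lengths(st_intersects(...)). The toy example below sketches how that idiom behaves, using two made-up square polygons and three made-up points; none of these geometries come from the actual data.

```r
library(sf)

# Helper (hypothetical) that builds a 1 x 1 square with its lower-left corner at (x0, 0)
unit_square <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
}

# Two non-touching "subzones"
zones <- st_sf(name = c("A", "B"),
               geometry = st_sfc(unit_square(0), unit_square(2)))

# Three "childcare centers": two inside A, none inside B, one outside both
pts <- st_sfc(st_point(c(0.2, 0.2)), st_point(c(0.8, 0.5)), st_point(c(5, 5)))

# st_intersects() returns, for each polygon, the indices of the points it contains;
# lengths() turns that list into a count per polygon
zones$NUM_POINTS <- lengths(st_intersects(zones, pts))
print(zones$NUM_POINTS)
```

Polygon A receives a count of 2 and polygon B a count of 0. The point outside both polygons is not counted anywhere, in the same way that a childcare center falling outside every subzone boundary would be ignored.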

6.1.2 Childcare services in 2017

6.1.2.1 Demand analysis

Summary statistics of demand for childcare services in subzones

summary(subzone17_ddss$DEMAND)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.00000 0.00000 0.05114 0.04683 0.06882 0.18537

On average, children aged 0 to 6 make up 4.7% of a subzone's population in 2017.
Across the different subzones, the share of the subzone population aged 0 to 6 ranges from 0% to 18.5%.
There is therefore a variation of demand for childcare services across subzones.

Box plot to visualise extreme values

ggplot(subzone17_ddss, aes(x = '', y = DEMAND)) +
  geom_boxplot() + 
  labs(x='', y='Demand') +
  theme_minimal()

It can be observed that there is one upper outlier for demand of childcare services, which is the subzone where children aged 0 to 6 make up 18.5% of the subzone population (demand=0.185). This subzone has a very high demand for childcare services relative to the rest of the subzones.

Histogram visualising the distribution of demand for childcare services in subzones

ggplot(subzone17_ddss, aes(x = DEMAND)) + 
  geom_histogram(fill = 'darksalmon') +
  labs(title = 'Distribution of demand for childcare services in subzones',
       x = 'Demand for childcare services',
       y = 'Frequency') +
  theme_minimal()

The distribution of demand for childcare services is right-skewed.
It can be noted that there are close to 100 subzones with no demand for childcare services (demand=0). If these subzones still have childcare centers located within the subzone, these areas could be considered overserved since there is no demand.
For the majority of the subzones, children aged 0 to 6 make up less than 10% (demand=0.1) of the total subzone population.

Top 10 subzones with highest demand for childcare services

top10_demand_17 = top_n(subzone17_ddss, 10, DEMAND)

ggplot(top10_demand_17, aes(x=DEMAND, y=reorder(SUBZONE_N, DEMAND), label=round(DEMAND, 2))) +
  geom_col(fill='darksalmon') +
  labs(title='Top 10 subzones with highest demand for childcare',
       x='Demand',
       y='Subzone') +
  geom_text(nudge_x=0.01, colour='gray23', size=3.5) +
  theme_minimal()

Punggol Town Centre has the highest demand for childcare centers in Singapore across all subzones.

Box map

A box map will be used to visualise demand spatially across different subzones in Singapore.

A customised classification scheme for the choropleth map will be constructed using the basic principles of a box plot.
This ensures that data classification is not manipulated, and that the data is visualised accurately to represent the real-world situation.
The box map will enable statistical interpretation of outliers and better identification of subzones that have relatively higher or lower demand compared to the rest of the subzones. In this analysis, data points will be considered outliers if they lie more than 1.5 times the interquartile range below the first quartile (lower outliers) or above the third quartile (upper outliers).
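For instance, applying this rule to the 2017 demand values (first quartile 0, third quartile 0.06882, maximum 0.18537, as printed by summary() earlier) gives the fences below. The inputs are the rounded figures from summary(), so this is an approximate check rather than an exact computation.

\[\small Interquartile\ range = 0.06882 - 0 = 0.06882\]

\[\small Upper\ fence = 0.06882 + 1.5 \times 0.06882 = 0.17205\]

\[\small Lower\ fence = 0 - 1.5 \times 0.06882 = -0.10323\]

The maximum (0.18537) lies above the upper fence, while the minimum (0) lies above the lower fence. This is consistent with the box plot, which shows an upper outlier and no lower outliers.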

The following code chunks are functions to construct the box map.

# To create break points for box map
boxbreaks <- function(v, mult=1.5) {
  qv <- unname(quantile(v))
  iqr <- qv[4] - qv[2]
  # upfence and lofence define the area where points will be defined as outliers
  upfence <- qv[4] + mult * iqr
  lofence <- qv[2] - mult * iqr
  # initialize break points vector
  bb <- vector(mode="numeric",length=7)
  # logic for lower and upper fences
  if (lofence < qv[1]) { # no lower outliers
    bb[1] <- lofence
    bb[2] <- floor(qv[1])
  } else {
    bb[2] <- lofence
    bb[1] <- qv[1]
  }
  if (upfence > qv[5]) { # no upper outliers
    bb[7] <- upfence
    bb[6] <- ceiling(qv[5])
  } else {
    bb[6] <- upfence
    bb[7] <- qv[5]
  }
  bb[3:5] <- qv[2:4]
  return(bb)
}

# Extract variable as vector out of sf dataframe
get.var <- function(vname, df) {
  v <- df[vname] %>% st_set_geometry(NULL)
  v <- unname(v[,1])
  return(v)
}

# Boxmap function
boxmap <- function(vnam, df, mtitle, legtitle=NA, mult=1.5, palette='-RdBu') {
  df1 <- drop_na(df)
  var <- get.var(vnam,df1)
  bb <- boxbreaks(var, mult)  # pass mult through so the argument is not silently ignored
  tm_shape(df) +
    tm_fill(vnam,
            title=legtitle,
            breaks=bb,
            palette=palette,
            labels = c("Lower outlier", "< 25%", "25% - 50%", "50% - 75%","> 75%", "Upper outlier")) +
  tm_borders(lwd=0.1, alpha=1) +
  tm_layout(main.title = mtitle,
            main.title.position = 'center',
            main.title.size = 1,
            frame = FALSE) +
  tm_scale_bar(width = 0.15)
}

# Boxmap function with points overlayed on top of choropleth map
boxmap_pts <- function(vnam, df, pointdf, mtitle, legtitle=NA, mult=1.5, palette='-RdBu') {
  boxmap(vnam, df, mtitle, legtitle=legtitle, mult=mult, palette=palette) +
  tm_shape(pointdf) +
    tm_dots(col="gray23")
}
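To illustrate how the seven break points map onto the six legend classes, the sketch below rebuilds the breaks by hand for a small made-up vector that has one upper outlier and no lower outlier. The vector v is hypothetical, and cut() is used only as a stand-in for the classification that tm_fill() performs (assumed here to use left-closed intervals).

```r
# Hypothetical data: eight ordinary values and one extreme value
v <- c(0, 1, 2, 3, 4, 5, 6, 7, 30)

qv      <- unname(quantile(v))      # min, Q1, median, Q3, max
iqr     <- qv[4] - qv[2]
upfence <- qv[4] + 1.5 * iqr
lofence <- qv[2] - 1.5 * iqr

# Same layout as boxbreaks() for the "no lower outlier, some upper outlier" case
bb <- c(lofence, floor(qv[1]), qv[2:4], upfence, qv[5])

labels <- c("Lower outlier", "< 25%", "25% - 50%", "50% - 75%", "> 75%", "Upper outlier")
classes <- cut(v, breaks = bb, labels = labels, right = FALSE, include.lowest = TRUE)

print(bb)
print(table(classes))
```

Here the breaks come out as -4, 0, 2, 4, 6, 12 and 30. Only the value 30 falls in the "Upper outlier" class, and the "Lower outlier" class is empty because the lower fence (-4) lies below the minimum.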

Demand for childcare services in 2017

dd_boxmap17 <- boxmap("DEMAND", subzone17_ddss, mtitle="Demand of Childcare Services in 2017")
dd_boxmap17

It can be observed that there is one subzone which is an upper outlier (dark red) for demand, in the northeast region of Singapore. There are no lower outliers.
The upper outlier is identified as Punggol Town Centre (as per previous EDA), which has extremely high demand for childcare services as compared to the rest of the subzones.
Subzones in reds have higher demand for childcare than the median subzone demand, while subzones in blues have lower demand than the median subzone demand.
As the intensity of red color increases, the subzone has a relatively higher demand compared to the rest of the subzones. Conversely, as the intensity of blue color increases, the subzone has relatively lower demand compared to the rest of the subzones.
There should therefore be more childcare services located at the subzones coloured red, especially in subzones with more intense red colour.
By observation, there are clusters of subzones with relatively higher demand (dark orange) in each region of Singapore. These clusters are most obvious in the Northeast, North and Central regions.

Overlay locations of childcare services on top of the box map

boxmap_pts("DEMAND", subzone17_ddss, childcare17_sf3414, mtitle='Demand and Location of Childcare Services in 2017')

The black dots represent locations of childcare centers.
It can be visually observed that in general, childcare centers do indeed distribute themselves around subzones with higher demand.
However, there are certain subzones with relatively high demand (coloured in reds) that have very few or no childcare services located in the subzone. These areas are potentially underserved, as children aged 0 to 6 living in these subzones will not have childcare centers located near them.
- These subzones are located in the West, North and East regions of Singapore.
- For the subzone in the West, the lack of childcare centers can be attributed to the fact that the subzone is the Western Water Catchment. As a result, there might be space constraints for setting up childcare centers. Children residing at the subzone will have to go to childcare centers in the neighbouring subzones.

6.1.2.2 Supply analysis

Summary statistics of supply of childcare services in subzones

summary(subzone17_ddss$SUPPLY)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   3.000   3.916   6.500  28.000

On average, there are about 4 childcare centers per square kilometer for a subzone in 2017. The median is 3 childcare centers per square kilometer.
Across the different subzones, the number of childcare centers per square kilometer ranges from 0 to 28.
There is therefore a variation of the supply of childcare services across subzones.

Box plot to visualise extreme values

ggplot(subzone17_ddss, aes(x = '', y = SUPPLY)) +
  geom_boxplot() + 
  labs(x='', y='Supply') +
  theme_minimal()

It can be observed that there are multiple upper outliers for the supply of childcare services. These subzones have a higher supply of childcare services relative to the rest of the subzones, with relatively more childcare centers per square kilometer (more than 15 childcare centers per square kilometer).

Histogram visualising the distribution of supply for childcare services in subzones.

ggplot(subzone17_ddss, aes(x = SUPPLY)) + 
  geom_histogram(fill = 'darkseagreen4') +
  labs(title = 'Distribution of supply for childcare services in subzones',
       x = 'Supply for childcare services',
       y = 'Frequency') +
  theme_minimal()

The distribution of childcare supply is right-skewed.
The majority of subzones have 0 to 10 childcare centers per square kilometer.
There are more than 100 subzones with no supply for childcare services (supply=0). If these subzones have no demand for childcare services (no children aged 0 to 6), then the lack of supply can be justified. Otherwise, the subzones are likely to be underserved due to unmet demand.

Top 10 subzones with highest supply of childcare services

top10_supply_17 = top_n(subzone17_ddss, 10, SUPPLY)

ggplot(top10_supply_17, aes(x=SUPPLY, y=reorder(SUBZONE_N, SUPPLY), label=round(SUPPLY, 2))) +
  geom_col(fill='darkseagreen4') +
  labs(title='Top 10 subzones with highest supply of childcare',
       x='Supply',
       y='Subzone') +
  geom_text(nudge_x=0.8, colour='gray23', size=3.5) +
  theme_minimal()

Mandai Estate has the highest supply of childcare across all subzones in Singapore, with 28 childcare centres per square kilometer.

Box map

Similar to how demand was visualised, a box map will be utilised to visualise supply spatially across different subzones in Singapore. The box map will enable statistical interpretation of outliers and better identification of subzones that have relatively higher or lower supply compared to the rest of the subzones in Singapore.

Supply of childcare services in 2017

ss_boxmap17 <- boxmap("SUPPLY", subzone17_ddss, mtitle="Supply of Childcare Services in 2017")
ss_boxmap17

It can be observed that there are multiple subzones which are upper outliers (dark red) for the supply of childcare centers, in the North and Central regions of Singapore. There is a concentration of childcare services in those subzones, with extremely high supply of childcare centers compared to the rest of Singapore.
There are no lower outliers for supply.
Subzones in reds have higher supply of childcare than the median subzone supply, while subzones in blues have lower supply of childcare than the median subzone supply.
As the intensity of red colour increases, the subzone has relatively higher supply compared to the rest of the subzones. Conversely, as the intensity of blue colour increases, the subzone has a relatively lower supply compared to the rest of the subzones.
Therefore, subzones coloured red, especially those with more intense red colour, will be able to enjoy more childcare services in their subzone, with a higher number of childcare centers per square kilometer.
Groups of adjacent subzones with relatively higher supply (red areas) can be observed, possibly indicating a clustering of childcare services in these areas.

6.1.2.3 Demand & supply analysis

The children-to-childcare ratio is a measure of both demand and supply. The ratio is a measure of the population of children aged 0 to 6 per childcare center available in the subzone, providing an idea of whether a subzone is underserved or overserved.

To understand whether there are sufficient childcare centers to cater to the demand of the subzone population, the capacity of childcare centers has to be considered in conjunction with the children-to-childcare ratio.

As childcare centers in Singapore vary in enrolment capacity depending on the size of the childcare center, a proxy value for the average capacity of a childcare center will be utilised in this analysis. This value will be calculated based on the latest figures for total capacity of childcare centers in Singapore, divided by the total number of childcare centers in Singapore.

\[\small Average\ childcare\ enrolment\ capacity = \frac{Total\ capacity\ in\ childcare\ centres\ in\ Singapore}{Total\ number\ of\ childcare\ centres\ in\ Singapore}\]

Using the latest figures (2018) from the Department of Statistics Singapore, the average childcare capacity in Singapore is 112 children per childcare center.

\[\small Average\ childcare\ enrolment\ capacity = \frac{165,919}{1,486} \approx 112\]

Taking the average enrolment capacity of childcare centers into account, subzones with a children-to-childcare ratio of more than 112 have demand exceeding the capacity of childcare centers available, and are underserved.
Subzones with a children-to-childcare ratio of less than 112 are possibly but not necessarily overserved, as certain childcare centers in Singapore are more specialised and have a smaller enrolment capacity to provide a better learning environment.
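The comparison against the average capacity can be sketched for a single subzone. The example below uses Chinatown in 2017 (982 children aged 0 to 6 and 6 childcare centers, as shown in the output in section 6.1.1). It assumes that every center has exactly the average capacity and that all children attend childcare within their own subzone, which is a simplification. The variable names are illustrative.

```r
# Proxy for the average enrolment capacity of a childcare center
avg_capacity <- round(165919 / 1486)

# Chinatown, 2017 (values from glimpse(subzone17_ddss), row 3)
children <- 982
centres  <- 6

ratio       <- round(children / centres)   # children per childcare center
underserved <- ratio > avg_capacity        # TRUE if demand exceeds average capacity

# Children beyond what the centers could enrol at average capacity
excess_children <- children - centres * avg_capacity

print(c(avg_capacity = avg_capacity, ratio = ratio, excess_children = excess_children))
print(underserved)
```

Under these assumptions the ratio of 164 exceeds the average capacity of 112, so the subzone would be classified as underserved, with roughly 310 children more than its 6 centers could enrol.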

Summary statistics of children-to-childcare ratio

summary(subzone17_ddss$CHILD_CENTRE_RATIO)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    -1.0     0.0    78.0   135.4   214.0   952.0

On average, there are about 135 children per childcare center in a subzone.
This suggests that on average, the demand for childcare centers in Singapore is not matched by the supply of childcare centers, since the average capacity of childcare centers in Singapore is 112. Children aged 0 to 6 in a subzone are on average underserved in terms of childcare services.
Across subzones, number of children per childcare center can go up to 952 children per center.

Box plot to visualise extreme values for children-to-childcare ratio

ggplot(subzone17_ddss, aes(x = '', y = CHILD_CENTRE_RATIO)) +
  geom_boxplot() + 
  labs(x='', y='Children-To-Childcare Ratio') +
  theme_minimal()

There are multiple upper outliers for the children-to-childcare ratio, indicating that these subzones are underserved, with a very high number of children per childcare center (more than 500), which exceeds the average enrolment capacity of a childcare center.

Histogram visualising the distribution of children-to-childcare ratio in subzones

ggplot(subzone17_ddss, aes(x = CHILD_CENTRE_RATIO)) + 
  geom_histogram(fill = 'thistle4') +
  labs(title = 'Distribution of children-to-childcare ratio in subzones',
       x = 'Children-To-Childcare Ratio',
       y = 'Frequency') +
  theme_minimal()

The distribution of children-to-childcare ratio is right-skewed.

Top 10 underserved subzones: highest children-to-childcare ratio

top10_ratio_17 = top_n(subzone17_ddss, 10, CHILD_CENTRE_RATIO)

ggplot(top10_ratio_17, aes(x=CHILD_CENTRE_RATIO, y=reorder(SUBZONE_N, CHILD_CENTRE_RATIO), label=round(CHILD_CENTRE_RATIO))) +
  geom_col(fill='thistle4') +
  labs(title='Top 10 subzones with highest children-to-childcare ratio',
       x='Children-To-Childcare Ratio',
       y='Subzone') +
  geom_text(nudge_x=45, colour='gray23', size=3.5) +
  theme_minimal()

Redhill has the highest children-to-childcare ratio in Singapore across all subzones, and is the most underserved subzone in Singapore in terms of childcare services. It has 952 children per childcare center, which is about 8.5 times the average childcare enrolment capacity.
Even Compassvale, which has the 10th highest children-to-childcare ratio, has 582 children per childcare center, which is about 5 times the average childcare enrolment capacity.

Choropleth Map

To better understand how the demand for childcare services in a subzone is matched by supply, a customised classification scheme will be used to visualise the children-to-childcare ratio spatially across subzones in Singapore.

The values of the children-to-childcare ratio are classified according to the following scheme:

-1 : Subzones with children-to-childcare ratio of -1 have demand for childcare services, but do not have childcare centers located in the subzones.
0 : Subzones with children-to-childcare ratio of 0 do not have demand for childcare services (there are no children aged 0 to 6 in the subzones).
0 - 112 : Well-served or overserved subzones (ratio above 0 and up to 112), with childcare capacity meeting the subzone demand.
Above 112 : Underserved subzones, with demand for childcare services exceeding the average childcare capacity. The number of children aged 0 to 6 per childcare center in the subzone exceeds the average childcare enrolment capacity.

The function below creates the choropleth map with the custom classification scheme.

choropleth_ratio <- function(df, mtitle='Children-to-childcare Ratio', palette='-RdGy') {
  df$category[df$CHILD_CENTRE_RATIO == -1] = "-1"
  df$category[df$CHILD_CENTRE_RATIO == 0] = "0"
  df$category[df$CHILD_CENTRE_RATIO > 0 & df$CHILD_CENTRE_RATIO <= 112] = "0 - 112"
  df$category[df$CHILD_CENTRE_RATIO > 112] = "Above 112"
  
  tm_shape(df) +
    tm_fill('category',
            palette = palette,
            title = 'CHILDREN-TO-CHILDCARE\nRATIO') +
    tm_borders(lwd = 0.1,
               alpha = 1) +
    tm_layout(main.title = mtitle,
              main.title.position = 'center',
              main.title.size = 1,
              frame = FALSE) +
    tm_scale_bar(width = 0.15)
}

Children-to-childcare ratio for childcare services in 2017

ratio_choropleth17 <- choropleth_ratio(subzone17_ddss, mtitle = 'Children-To-Childcare Ratio in 2017')
ratio_choropleth17

Subzones in deep red are underserved subzones in terms of childcare services. These subzones have a demand for childcare services that exceeds the average childcare capacity. This means that there are fewer opportunities for children aged 0 to 6 in these subzones to utilise childcare services.
- There are a lot of subzones in deep red across Singapore, suggesting that a lot of subzones in Singapore have needs for childcare services that are not met, as demand is not matched by supply.
Subzones in dark grey are subzones that do not have childcare centres located in the area to serve the demands of the subzone for childcare services. There are several of these subzones spread across Singapore. The problem is amplified for many of these subzones because neighbouring subzones are coloured in dark red, indicating that there is insufficient supply of childcare services even in neighbouring subzones. Children of these subzones will therefore have to travel further out of their subzones for opportunities to utilise childcare services.
Subzones in light grey have no demand for childcare services, which can be attributed to these subzones being non-residential areas such as water catchment and industrial areas.

Overlay locations of childcare services on top of the choropleth map

choropleth_ratio(subzone17_ddss, mtitle = 'Childcare Locations and Children-To-Childcare Ratio in 2017') +
  tm_shape(childcare17_sf3414) +
  tm_dots()

It can be observed that even though childcare services are generally distributed around the subzones that are in deep red, these subzones are still underserved.
This implies that childcare enrolment capacity is generally insufficient in Singapore. Many subzones are still underserved, and childcare centers have insufficient capacity to accommodate the demand based on the subzone populations of children aged 0 to 6.

Box Map

To conduct more statistically sound comparisons across subzones in Singapore, a box map is utilised to enable statistical interpretation of data.

ratio_boxmap17 <- boxmap("CHILD_CENTRE_RATIO", subzone17_ddss, mtitle="Children-To-Childcare Ratio in 2017", legtitle = 'CHILDREN-TO-CHILDCARE\nRATIO')
ratio_boxmap17

There are multiple subzones that are outliers, which are coloured in dark red. The majority of them are located in the North, Northeast and Central regions of Singapore. These subzones have an extremely high children-to-childcare ratio relative to the rest of the subzones in Singapore, indicating that they are the most underserved subzones in Singapore, where the supply of childcare services is insufficient to cater to the demand.

6.1.3 Childcare services in 2020

Similar to the analysis for childcare services in 2017, the same will be conducted for childcare services in 2020. Differences in observations between 2017 and 2020 will then be compared.

6.1.3.1 Demand analysis

Summary statistics of demand for childcare services in subzones

summary(subzone20_ddss$DEMAND)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.00000 0.00000 0.04769 0.04361 0.06503 0.16909

On average, children aged 0 to 6 make up 4.4% of a subzone's population in 2020. This is a decrease from 2017, where children made up 4.7% of subzone populations on average. Thus, from 2017 to 2020, average subzone demand for childcare services decreased.
Across the different subzones, the percentage of the subzone population where children are aged 0 to 6 ranges from 0% to 16.9%. This range is also reduced from that of 2017, which had a range of 0% to 18.5%.
There is a variation of demand for childcare services across subzones.
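The size of this decrease can be made concrete using the two means printed by summary() (0.04683 in 2017 and 0.04361 in 2020). The figures below are derived from those rounded values, so they are approximate.

\[\small Change\ in\ mean\ demand = 0.04361 - 0.04683 = -0.00322\]

\[\small Relative\ change\ in\ mean\ demand = \frac{0.04361 - 0.04683}{0.04683} \approx -0.069\]

In other words, mean subzone demand fell by about 0.3 percentage points, which is a relative decrease of roughly 6.9% from the 2017 level.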

Box plot to visualise extreme values

ggplot(subzone20_ddss, aes(x = '', y = DEMAND)) +
  geom_boxplot() + 
  labs(x='', y='Demand') +
  theme_minimal()

There is one upper outlier for demand of childcare services, which is the subzone where children aged 0 to 6 make up 16.9% of the subzone population (demand=0.169). This subzone has a very high demand for childcare services relative to the rest of the subzones.

Histogram visualising the distribution of demand for childcare services in subzones

ggplot(subzone20_ddss, aes(x = DEMAND)) + 
  geom_histogram(fill = 'darksalmon') +
  labs(title = 'Distribution of demand for childcare services in subzones',
       x = 'Demand for childcare services',
       y = 'Frequency') +
  theme_minimal()

The distribution of demand remains skewed to the right.
The majority of subzones have a demand for childcare of less than 0.10, where children aged 0 to 6 make up less than 10% of the total subzone population.
It can be noted that there are now more than 100 subzones with no demand for childcare services (demand=0) in 2020, compared to 2017 where the value was less than 100. If these subzones still have childcare centers located within the subzone, these areas could be considered overserved since there is no demand.

Top 10 subzones with highest demand for childcare services

top10_demand_20 = top_n(subzone20_ddss, 10, DEMAND)

ggplot(top10_demand_20, aes(x=DEMAND, y=reorder(SUBZONE_N, DEMAND), label=round(DEMAND, 2))) +
  geom_col(fill='darksalmon') +
  labs(title='Top 10 subzones with highest demand for childcare',
       x='Demand',
       y='Subzone') +
  geom_text(nudge_x=0.01, colour='gray23', size=3.5) +
  theme_minimal()

Punggol Town Centre remains as the subzone with the highest demand for childcare centers in Singapore across all subzones.
There are changes in the top 10 rankings for demand compared to 2017. Subzones such as Sembawang East, Changi West, Brickworks and Tampines North, which were previously not in the top 10, are now among the 10 subzones with the highest demand for childcare services. On the other hand, subzones like Mackenzie, Singapore Polytechnic, Newton Circus and Paterson are no longer in the top 10.

Box map

A box map will be used to visualise demand spatially across different subzones in Singapore.

Demand for childcare services in 2020

dd_boxmap20 <- boxmap("DEMAND", subzone20_ddss, mtitle="Demand of Childcare Services in 2020")
dd_boxmap20

Similar to 2017, Punggol Town Centre (Northeast region) is an upper outlier for demand (dark red), and has extremely high demand for childcare services as compared to the rest of the subzones.
There are no lower outliers.
Subzones in reds have higher demand for childcare than the median subzone demand, while subzones in blues have lower demand than the median subzone demand.
As the intensity of red color increases, the subzone has a relatively higher demand compared to the rest of the subzones. Conversely, as the intensity of blue color increases, the subzone has relatively lower demand compared to the rest of the subzones.
There should therefore be more childcare services located at the subzones coloured red, especially in subzones with more intense red colour.
By observation, there are clusters of subzones with relatively higher demand (dark orange) in each region of Singapore. These clusters are most obvious in the Northeast, North and Central regions.

Overlay locations of childcare services on top of the box map

boxmap_pts("DEMAND", subzone20_ddss, childcare20_sf3414, mtitle='Demand and Location of Childcare Services in 2020')

The black dots represent locations of childcare centers.
In general, childcare centers do indeed distribute themselves around subzones with higher demand.
However, there are still certain subzones with relatively higher demand (coloured red) that have few or no childcare services located in them. These areas are potentially underserved, as children aged 0 to 6 living in these subzones do not have childcare centers near them. They have fewer opportunities to utilise childcare services, and have to travel out of their subzones to attend childcare.

Visualise and compare demand for childcare services in 2017 and 2020

To better compare changes in demand for childcare services between 2017 and 2020, the box maps for demand of childcare services in 2017 and 2020 are placed side by side.

tmap_arrange(dd_boxmap17, dd_boxmap20, ncol=2)

Visually, it can be observed that some subzones in the North, West and Central regions of Singapore, which used to have relatively higher demand (dark orange) in 2017, now have relatively lower demand (light red) in 2020.
- These subzones have been identified to include Nee Soon, Gombak, Mountbatten and Tanjong Rhu.
The converse can also be observed. Certain subzones now have relatively higher demand compared to the rest of the subzones (darker red), as opposed to 2017 (where they were light blue or light red in colour). These subzones are located in the Central, Northeast and East regions of Singapore.
- The subzones have been identified to include Seletar, Tampines West and Swiss Club.

6.1.3.2 Supply analysis

Summary statistics of supply of childcare services in subzones

summary(subzone20_ddss$SUPPLY)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   3.000   4.601   8.000  36.000

On average, there are about 5 childcare centres per square kilometer for a subzone in 2020. This is an increase from 2017, which had about 4 childcare centres per square kilometer for a subzone on average. Thus, from 2017 to 2020, the average supply of childcare centres in subzones increased.
The median is 3 childcare centers per square kilometer.
Across the different subzones, the number of childcare centers per square kilometer ranges from 0 to 36. Compared to 2017 which had a range of 0 to 28, the range of supply across subzones increased.
The supply of childcare services varies across subzones.

Box plot to visualise extreme values

ggplot(subzone20_ddss, aes(x = '', y = SUPPLY)) +
  geom_boxplot() + 
  labs(x='', y='Supply') +
  theme_minimal()

It can be observed that there are four upper outliers for the supply of childcare services. These subzones have a higher supply of childcare services relative to the rest of the subzones, with relatively more childcare centers per square kilometer (more than 20 childcare centers per square kilometer).

Histogram visualising the distribution of supply for childcare services in subzones.

ggplot(subzone20_ddss, aes(x = SUPPLY)) + 
  geom_histogram(fill = 'darkseagreen4') +
  labs(title = 'Distribution of supply for childcare services in subzones',
       x = 'Supply for childcare services',
       y = 'Frequency') +
  theme_minimal()

The distribution of supply of childcare services is right-skewed.
There are more than 100 subzones with no supply for childcare services (supply=0). If these subzones have no demand for childcare services (no children aged 0 to 6), then the lack of supply can be justified. Otherwise, the subzones are likely to be underserved due to unmet demand.
The majority of subzones have 0 to 15 childcare centers per square kilometer.

Top 10 subzones with highest supply of childcare services

top10_supply_20 = top_n(subzone20_ddss, 10, SUPPLY)

ggplot(top10_supply_20, aes(x=SUPPLY, y=reorder(SUBZONE_N, SUPPLY), label=round(SUPPLY, 2))) +
  geom_col(fill='darkseagreen4') +
  labs(title='Top 10 subzones with highest supply of childcare',
       x='Supply',
       y='Subzone') +
  geom_text(nudge_x=0.8, colour='gray23', size=3.5) +
  theme_minimal()

Since 2017, Cecil has overtaken Mandai Estate to become the subzone with the highest supply of childcare across all subzones in Singapore, with 36 childcare centres per square kilometer.
Mandai Estate, which was the subzone with the highest supply of childcare in 2017, also saw its supply increase from 28 to 35 childcare centres per square kilometer.
Since 2017, Raffles Place and Sembawang Central have moved into the top 10 subzones with the highest childcare supply, whereas Anson and Midview have dropped out of the top 10.
In general, the top 10 subzones with highest supply have a larger number of childcare centres per square kilometer in 2020 compared to 2017.

Box map

Similar to how demand was visualised, a box map will be utilised to visualise supply spatially across different subzones in Singapore.

Supply of childcare services in 2020

ss_boxmap20 <- boxmap("SUPPLY", subzone20_ddss, mtitle="Supply of Childcare Services in 2020")
ss_boxmap20

Similar to 2017, there are multiple upper outliers (dark red) for supply of childcare centers in the North and Central regions of Singapore. Childcare services are concentrated in those subzones, which have an extremely high supply of childcare centers compared to the rest of Singapore.
There are no lower outliers for supply.
Subzones in reds have higher supply of childcare than the median subzone supply, while subzones in blues have lower supply of childcare than the median subzone supply.
As the intensity of red colour increases, the subzone has relatively higher supply compared to the rest of the subzones. Conversely, as the intensity of blue colour increases, the subzone has a relatively lower supply compared to the rest of the subzones.
Therefore, subzones coloured red, especially those with more intense red colour, will be able to enjoy more childcare services in their subzone, with a higher number of childcare centers per square kilometer.
By observation, there are groups of adjacent subzones with relatively higher supply (dark orange), which suggests a possible clustering of childcare services at those areas.

Visualise and compare supply of childcare services in 2017 and 2020

To better compare changes in supply of childcare services between 2017 and 2020, the box maps for supply of childcare services in 2017 and 2020 are placed side by side.

tmap_arrange(ss_boxmap17, ss_boxmap20, ncol=2)

Compared to 2017, it can be observed that certain subzones in the East and Central regions now have relatively lower supply of childcare compared to the rest of the subzones. This is denoted by the change in colour from darker red to light red.
- These subzones are identified as Pasir Ris West and Aljunied.
There are also subzones, mainly in the North, Northeast and Central regions, that now have relatively higher supply of childcare compared to the rest of the subzones than they did in 2017. This can be observed from these subzones having a more intense red colour in 2020 than in 2017.
- For example, Sembawang Central in the North region has now become an upper outlier, with an extremely high number of childcare centres per square kilometer compared to the rest of the subzones in Singapore.
- Other subzones which now have relatively higher supply of childcare include Punggol Town Centre (which is an upper outlier for demand and has extremely high demand for childcare services) and Sengkang Town Centre.

6.1.3.3 Demand & supply analysis

Similar to the analysis for 2017 data, the children-to-childcare ratio will be utilised to analyse demand and supply of childcare services. Average childcare capacity in Singapore is taken to be 112 children per centre.

Summary statistics of children-to-childcare ratio

summary(subzone20_ddss$CHILD_CENTRE_RATIO)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    -1.0     0.0    87.0   112.5   183.0   803.0

On average, there are about 113 children per childcare center in a subzone. This is a decrease from 2017, where there were about 135 children per childcare center in a subzone on average. Therefore, in general, demand is better matched by supply in 2020 than in 2017.
This suggests that on average, the demand for childcare centers in Singapore is now matched by supply of childcare centers, since average capacity of childcare centers in Singapore is 112.
Across subzones, the number of children per childcare center can go up to 803 children per center. This is now a lower value compared to 2017, which could go up to 952 children per center.

Box plot to visualise extreme values for children-to-childcare ratio

ggplot(subzone20_ddss, aes(x = '', y = CHILD_CENTRE_RATIO)) +
  geom_boxplot() + 
  labs(x='', y='Children-To-Childcare Ratio') +
  theme_minimal()

There are multiple upper outliers for the children-to-childcare ratio, indicating that these subzones are underserved, with a very high number of children per childcare center (more than 400), which exceeds the average enrolment capacity of a childcare center.

Histogram visualising the distribution of children-to-childcare ratio in subzones

ggplot(subzone20_ddss, aes(x = CHILD_CENTRE_RATIO)) + 
  geom_histogram(fill = 'thistle4') +
  labs(title = 'Distribution of children-to-childcare ratio in subzones',
       x = 'Children-To-Childcare Ratio',
       y = 'Frequency') +
  theme_minimal()

The distribution of children-to-childcare ratio is right-skewed.

Top 10 underserved subzones: highest children-to-childcare ratio

top10_ratio_20 = top_n(subzone20_ddss, 10, CHILD_CENTRE_RATIO)

ggplot(top10_ratio_20, aes(x=CHILD_CENTRE_RATIO, y=reorder(SUBZONE_N, CHILD_CENTRE_RATIO), label=round(CHILD_CENTRE_RATIO))) +
  geom_col(fill='thistle4') +
  labs(title='Top 10 subzones with highest children-to-childcare ratio',
       x='Children-To-Childcare Ratio',
       y='Subzone') +
  geom_text(nudge_x=45, colour='gray23', size=3.5) +
  theme_minimal()

Similar to the results in 2017, Redhill has the highest children-to-childcare ratio across all subzones, and is the most underserved subzone in Singapore in terms of childcare services. However, demand for childcare services in Redhill is now better matched by supply than in 2017. It now has 803 children per childcare center, compared to 952 in 2017. Even so, this ratio is still about 7 times the average childcare enrolment capacity of 112.
Other subzones like Lower Seletar, Matilda, Fernvale and Dover, which remained in the top 10 highest children-to-childcare ratio, also see a decrease in their children-to-childcare ratio compared to their respective 2017 values.

Choropleth Map

The values of the children-to-childcare ratio are classified according to the following scheme:

-1 : Subzones with children-to-childcare ratio of -1 have demand for childcare services, but do not have childcare centers located in the subzones.
0 : Subzones with children-to-childcare ratio of 0 do not have demand for childcare services (there are no children aged 0 to 6 in the subzones).
0 - 111 : Well-served or overserved subzones, with childcare capacity meeting the subzone demand.
Above 112 : Underserved subzones, with demand for childcare services exceeding the average childcare capacity. The number of children aged 0 to 6 per childcare center residing in the subzone exceeds the average childcare enrolment capacity.
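
The coding scheme above can be sketched in base R as follows. This is a minimal illustration rather than the code used to build CHILD_CENTRE_RATIO: the function names are hypothetical, the counts are made up, and treating a ratio exactly equal to the capacity of 112 as served is an assumption.

```r
# Hypothetical helpers; the names and the handling of a ratio exactly equal
# to the capacity are assumptions made for illustration.
child_centre_ratio_sketch <- function(children, centres) {
  ratio <- children / centres                # Inf or NaN where centres == 0
  ratio[children > 0 & centres == 0] <- -1   # demand but no centres
  ratio[children == 0] <- 0                  # no demand
  ratio
}

classify_ratio_sketch <- function(ratio, capacity = 112) {
  out <- rep('Underserved', length(ratio))
  out[ratio <= capacity] <- 'Well-served or overserved'
  out[ratio == 0] <- 'No demand'
  out[ratio == -1] <- 'Demand but no centres'
  out
}

# Made-up counts of children aged 0 to 6 and childcare centres per subzone.
toy_children <- c(803, 0, 50, 224)
toy_centres  <- c(1, 0, 0, 4)
toy_ratio <- child_centre_ratio_sketch(toy_children, toy_centres)
data.frame(children = toy_children, centres = toy_centres,
           ratio = toy_ratio, class = classify_ratio_sketch(toy_ratio))
```

Using the sentinel values -1 and 0 keeps the two special cases in the same numeric column as the ordinary ratios, at the cost of pulling summary statistics such as the mean slightly downwards.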

Children-to-childcare ratio for childcare services in 2020

ratio_choropleth20 <- choropleth_ratio(subzone20_ddss, mtitle = 'Children-To-Childcare Ratio in 2020')
ratio_choropleth20

Subzones in deep red are underserved in terms of childcare services. Demand for childcare services in these subzones exceeds the average childcare capacity. This means that there are fewer opportunities for children aged 0 to 6 in these subzones to utilise childcare services.
- There are still a lot of subzones across Singapore in deep red, suggesting that a lot of subzones in Singapore have needs for childcare services that are not met, as demand is not matched by supply.
Subzones in dark grey are subzones that do not have childcare centres located in the area to serve the demands of the subzone for childcare services. There are several of these subzones spread across Singapore. The problem is amplified for many of these subzones because neighbouring subzones are coloured in dark red, indicating that there is insufficient supply of childcare services even in neighbouring subzones. Children of these subzones will therefore have to travel further out of their subzones for opportunities to utilise childcare services.
Subzones in light grey have no demand for childcare services, which can be attributed to these subzones being non-residential areas such as water catchment and industrial areas.

Compare children-to-childcare ratio in 2017 and 2020

The choropleth maps of the children-to-childcare ratio for 2017 and 2020 are placed side by side for comparison.

tmap_arrange(ratio_choropleth17, ratio_choropleth20, ncol=2)

While there are still many subzones that are underserved (deep red), there are now fewer underserved subzones in 2020 than in 2017, especially in the Northeast, East and West regions of Singapore. This can be observed from the smaller number of subzones in deep red in the 2020 map than in the 2017 map. Furthermore, in the Central region, many subzones which did not have any childcare services to meet their demand in 2017 (dark grey) now have childcare centres built in them, albeit still being underserved (red).
This suggests that there is a general improvement from 2017 in the gap between demand and supply of childcare services in subzones. The demand and supply gap for some of the subzones has been bridged either by a decrease in demand for childcare services (due to a decrease in the number of children aged 0 to 6), or by an increase in supply of childcare services (where more childcare centres have been built). Children who had limited opportunities to utilise childcare services in 2017 now have more opportunities to utilise them.

Box Map

To conduct more statistically sound comparisons across subzones in Singapore, a box map is utilised to enable statistical interpretation of data.

ratio_boxmap20 <- boxmap("CHILD_CENTRE_RATIO", subzone20_ddss, mtitle="Children-To-Childcare Ratio in 2020", legtitle = 'CHILDREN-TO-CHILDCARE\nRATIO')
ratio_boxmap20

There are multiple subzones that are upper outliers, which are coloured in dark red. The majority of them are located in the Northeast, West and Central regions of Singapore. These subzones have extremely high children-to-childcare ratios relative to the rest of the subzones in Singapore, indicating that they are the most underserved subzones in Singapore, where supply of childcare services is insufficient to cater to the demand.

Compare children-to-childcare ratio in 2017 and 2020 with box map

tmap_arrange(ratio_boxmap17, ratio_boxmap20, ncol=2)

Focusing on the upper outliers in dark red, there are now fewer subzones with extremely high children-to-childcare ratios relative to the rest of the subzones in 2020 than in 2017.
In 2017, the majority of subzones with extremely high children-to-childcare ratios were located in the North, Northeast and Central regions of Singapore. In 2020, the majority of them are located in the Northeast, West and Central regions, with fewer of them in the Northeast and Central regions than in 2017.
This suggests that, compared to 2017, there are now fewer subzones which are extremely underserved relative to the rest of the subzones in the same year.

6.2 Spatio-Temporal Analysis

In this section, temporal changes of childcare services from 2017 to 2020 will be analysed spatially.

6.2.1 Summary Statistics

Calculate the difference in number of childcare services for each subzone

subzone1720 <- st_join(subzone17, subzone20,
                       join = st_equals,
                       suffix = c('_2017', '_2020')) %>%
  mutate(DIFF_CHILDCARE = NUM_CHILDCARE_2020 - NUM_CHILDCARE_2017)

summary(subzone1720$DIFF_CHILDCARE)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -4.0000  0.0000  0.0000  0.7214  1.0000 10.0000

From summary statistics, the number of childcare services in each subzone increased on average.
Some subzones gained as many as 10 childcare centres.
Certain subzones saw a decrease in their number of childcare services, by as many as 4 childcare centres.

6.2.2 Dot Distribution Map

The function below creates a dot distribution map.

dotdistmap <- function(polygon_df, point_df, mtitle) {
  tm_shape(polygon_df) +
    tm_fill(col = 'gray94') +
    tm_borders(col = 'gray28', lwd = 0.1, alpha = 0.3) +
    tm_shape(point_df) +
    tm_dots(col = 'tomato3') +
    tm_layout(frame.lwd = 0.1, 
              frame = 'gray31',
              main.title = mtitle,
              main.title.position = 'center',
              main.title.size = 1.5,
              main.title.color = 'gray28')
}

Childcare services in 2017 and 2020

tmap_arrange(dotdistmap(subzone_sf3414, childcare17_sf3414, '2017'), 
             dotdistmap(subzone_sf3414, childcare20_sf3414, '2020'),
             ncol = 2)

Generally, there seems to be more childcare centres in 2020 than in 2017.
The number of childcare centres increased in North and Northeast regions of Singapore.

6.2.3 Point Map

To better visualise the changes in childcare services across subzones, childcare location points for 2017 and 2020 were overlaid on top of each other.

Childcare services in 2017 and 2020

pointmap <- tm_shape(subzone_sf3414) +
  tm_fill(col = 'gray94') +
  tm_borders(col = 'gray28', lwd = 0.1, alpha = 0.3) +
  tm_shape(childcare17_sf3414) +
  tm_symbols(size = 0.1,
             col = 'red1',
             alpha = 0.5, 
             border.lwd = NA) +
  tm_shape(childcare20_sf3414) + 
  tm_symbols(size = 0.1,
             col = 'yellow1',
             alpha = 0.5, 
             border.lwd = NA) +
  tm_add_legend(type = 'symbol',
                title = 'Childcare',
                labels = c('2017', '2020'),
                col = c('red1', 'yellow1'),
                alpha = 0.7,
                border.lwd = 0.01) +
  tm_layout(main.title = 'Childcare Services in 2017 and 2020',
            main.title.position = 'center',
            main.title.size = 1,
            frame = FALSE) +
  tm_scale_bar(width = 0.15)

pointmap

Childcare services in 2017 and 2020 were coloured red and yellow respectively. Their opacity was then reduced, making it possible to see whether childcare location points overlap.
This way, an orange point indicates a childcare centre that existed in 2017 and still exists in 2020. A red point indicates a centre that is no longer present in 2020 and has closed down since 2017. A yellow point indicates a centre that did not exist in 2017 and was newly set up after 2017.
It can be observed that many new childcare services had been set up by 2020, mainly in the North, Northeast and West regions, denoted by the yellow areas. This is consistent with the observations made from the dot distribution map previously.
Most childcare centers that existed in 2017 are still in operation in 2020, denoted by the orange areas.
However, some childcare centres have closed down since 2017, denoted by the red points. These are mainly located in the Central region of Singapore.
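
The visual reading of orange, red and yellow points can also be expressed as a coordinate comparison. The sketch below is a minimal base R illustration on made-up coordinates, assuming that a centre keeps the same location (to the nearest metre) in both years. The name childcare_status_sketch is hypothetical, and a centre that relocated would be counted as one closure plus one new centre.

```r
# Hypothetical helper: label centres by matching coordinates across years.
# Coordinates are assumed to be in metres; rounding to the nearest metre
# before matching is an assumption.
childcare_status_sketch <- function(xy17, xy20, digits = 0) {
  key <- function(xy) paste(round(xy$x, digits), round(xy$y, digits))
  k17 <- key(xy17)
  k20 <- key(xy20)
  list(retained = xy17[k17 %in% k20, ],     # orange: present in both years
       closed   = xy17[!(k17 %in% k20), ],  # red: only in 2017
       new      = xy20[!(k20 %in% k17), ])  # yellow: only in 2020
}

# Made-up locations for illustration only.
toy17 <- data.frame(x = c(100, 200, 300), y = c(100, 200, 300))
toy20 <- data.frame(x = c(100, 200, 400, 500), y = c(100, 200, 400, 500))
toy_status <- childcare_status_sketch(toy17, toy20)
sapply(toy_status, nrow)
```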

6.2.4 Choropleth Map

To analyse the change in childcare services from 2017 to 2020 at a subzone level, a choropleth map will be constructed to find out which subzones increased or decreased their number of childcare services. There will be three categories: (1) Increased, (2) No Change, (3) Decreased.

Construct choropleth map of change in childcare services

subzone1720$diff_category[subzone1720$DIFF_CHILDCARE < 0] = "Decreased"
subzone1720$diff_category[subzone1720$DIFF_CHILDCARE == 0] = "No Change"
subzone1720$diff_category[subzone1720$DIFF_CHILDCARE > 0] = "Increased"
  
change_choropleth <- tm_shape(subzone1720) +
  tm_fill('diff_category',
          palette = c('thistle3', 'darkseagreen', 'gray90'),
          title = 'CHANGE') +
  tm_borders(lwd = 0.1,
             alpha = 1) +
  tm_layout(main.title = "Change in Number of Childcare Services\nFrom 2017 to 2020",
            main.title.position = 'center',
            main.title.size = 1,
            frame = FALSE) +
  tm_scale_bar(width = 0.15)

change_choropleth

Many subzones increased their provision of childcare services (green areas), especially in the Northeast region of Singapore.
However, a few subzones saw a decrease in their number of childcare centres (purple areas), in the North, Central and West regions of Singapore.
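
The three change categories are stored as plain text, so their order (and hence which palette colour each one receives) follows alphabetical order: Decreased, Increased, No Change. A minimal sketch of an alternative is shown below, assuming one prefers to fix the order explicitly with a factor. The name categorise_change_sketch is hypothetical and the differences are made up.

```r
# Hypothetical helper: turn a difference in counts into an ordered category.
categorise_change_sketch <- function(diff) {
  factor(sign(diff),
         levels = c(-1, 0, 1),
         labels = c('Decreased', 'No Change', 'Increased'))
}

# Made-up differences in number of childcare centres per subzone.
toy_diff <- c(-4, 0, 0, 1, 10)
toy_category <- categorise_change_sketch(toy_diff)
table(toy_category)

# With this level order, a palette would be listed in the same order,
# e.g. c('thistle3', 'gray90', 'darkseagreen') (assumption: colours are
# assigned to categories in level order).
```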

To better assess whether changes in childcare services between 2017 and 2020 correspond to the demand of childcare services in subzones, the changes will be compared with demand at a subzone level.

Compare changes in childcare services with demand in 2017

tmap_arrange(change_choropleth, dd_boxmap17, ncol=2)

The majority of the childcare services newly set up by 2020 (green areas on left map) are in areas of higher demand (red areas on right map) based on the 2017 population, possibly explaining the improvement in the demand and supply gap that was observed previously (in the demand and supply analysis of childcare services in 2020).
However, there are also subzones that had high demand (red areas on right map) based on the 2017 population, but did not increase their provision of childcare services (no change or decreased). This is especially true in the North, Central and West regions of Singapore.

Compare changes in childcare services with demand in 2020

tmap_arrange(change_choropleth, dd_boxmap20, ncol=2)

In comparison with demand in 2020, the number of childcare services has increased since 2017 (green areas on left map) in certain subzones with high demand (red areas on right map) based on the population in 2020.
However, there are subzones whose number of childcare services decreased (purple areas on left map), even though they have high demand in 2020 (red areas on right map). This is particularly true in the Central and North regions of Singapore. There are also subzones with no change in the number of childcare services since 2017 (grey areas on left map), even though they have high demand for childcare services in 2020 (red areas on right map).
This suggests that planning for the provision of childcare services may not adapt quickly enough to the needs of the population at a subzone level. Hence, evidence-based approaches such as this one, which help in understanding demand in subzones, should be used in planning for the provision of childcare services.

In conclusion, the analysis and comparison of the demand and supply of childcare services in 2017 and 2020 revealed that there have been improvements in increasing the supply of childcare centres to meet the high demand of subzones. However, more can be done to bridge the gap in the many subzones in Singapore which are still underserved in terms of childcare services.

7 Spatial Point Pattern Analysis

This section analyses the spatial point pattern of childcare services in Singapore.

7.1 Point Symbol Map

Visualise childcare locations in 2017 and 2020 using a point symbol map.

The following function creates the point symbol map.

point_symbol_map <- function(polygon_df, point_df, mtitle)
  tm_shape(polygon_df) +
  tm_fill(col = 'gray94') +
  tm_borders(col = 'gray28', lwd = 0.1, alpha = 0.3) +
  tm_shape(point_df) +
  tm_bubbles(col = 'tomato3',
             size = 0.1,
             alpha = 0.8,
             border.col = 'gray31',
             border.lwd = 0.1) +
  tm_layout(frame = FALSE,
            main.title = mtitle,
            main.title.position = 'center',
            main.title.size = 1) +
  tm_scale_bar(width = 0.15)

7.1.1 Childcare (2017)

point_symbol_map(subzone_sf3414, childcare17_sf3414, 'Childcare Locations in 2017')

The locations of childcare services in 2017 do not seem to be randomly distributed.
By observation, there are signs of clustering at several areas in Singapore, where childcare services are more densely arranged together. These groups of clustered childcare services are distinct and separate from one another.
Several tight clusters can be observed:
- 2 clusters in the North region
- 2 clusters in the West region
- A cluster in the East region
- A cluster in the Northeast region

7.1.2 Childcare (2020)

point_symbol_map(subzone_sf3414, childcare20_sf3414, 'Childcare Locations in 2020')

Similar to the observations for childcare locations in 2017, childcare locations in 2020 also do not seem to be randomly distributed, with signs of clustering at several regions of Singapore.

7.2 Second Order Point Pattern Analysis

In order to statistically verify the above observations that the locations of childcare centres are not randomly distributed and whether there is clustering, second order point pattern analysis will be conducted.

Due to the intensive computational power required to conduct analysis on the whole of Singapore, analysis will be conducted on selected planning areas of Singapore (highlighted in pink in the map below). These study areas are:
- Sengkang
- Hougang
- Bedok
- Bukit Batok

# Flag subzones that belong to the four study planning areas
study_areas <- c('SENGKANG', 'HOUGANG', 'BEDOK', 'BUKIT BATOK')
subzone_sf3414$STUDY_AREA <- ifelse(subzone_sf3414$PLN_AREA_N %in% study_areas, 'Y', 'N')

tm_shape(subzone_sf3414) +
  tm_fill('STUDY_AREA',
          palette = c('grey90', 'rosybrown2')) +
  tm_borders(col = 'gray28', lwd = 0.1, alpha = 0.3) +
    tm_layout(legend.show = FALSE,
              frame = FALSE)

7.2.1 Prepare data

Convert and prepare data into an appropriate format for spatial point pattern analysis.

7.2.1.1 Convert point data into ppp format

Childcare point data (currently an sf object) has to be converted into spatstat’s ppp format.
This will be done through the following format conversions:
sf object > SpatialPointsDataFrame object > SpatialPoints object > ppp object

Childcare (2017)

childcare17_ppp <- childcare17_sf3414 %>%
  as('Spatial') %>%
  as('SpatialPoints') %>%
  as('ppp')

summary(childcare17_ppp)

## Planar point pattern:  1312 points
## Average intensity 1.623186e-06 points per square unit
## 
## *Pattern contains duplicated points*
## 
## Coordinates are given to 3 decimal places
## i.e. rounded to the nearest multiple of 0.001 units
## 
## Window: rectangle = [11203.01, 45404.24] x [25667.6, 49300.88] units
##                     (34200 x 23630 units)
## Window area = 808287000 square units

Childcare (2020)

childcare20_ppp <- childcare20_sf3414 %>%
  as('Spatial') %>%
  as('SpatialPoints') %>%
  as('ppp')

summary(childcare20_ppp)

## Planar point pattern:  1545 points
## Average intensity 1.91145e-06 points per square unit
## 
## *Pattern contains duplicated points*
## 
## Coordinates are given to 3 decimal places
## i.e. rounded to the nearest multiple of 0.001 units
## 
## Window: rectangle = [11203.01, 45404.24] x [25667.6, 49300.88] units
##                     (34200 x 23630 units)
## Window area = 808287000 square units

7.2.1.2 Handle duplicated points

Duplicated points in the childcare data must be handled, as the statistical methodology used assumes a simple spatial point process, in which points cannot be coincident.

Duplicated points will be handled through jittering, where a small random perturbation is added to the points so that no two points occupy exactly the same location.

7.2.1.2.1 Check for duplication

Childcare (2017)

any(duplicated(childcare17_ppp))

## [1] TRUE

sum(multiplicity(childcare17_ppp) > 1)

## [1] 85

There are 85 points that have duplication.

Childcare (2020)

any(duplicated(childcare20_ppp))

## [1] TRUE

sum(multiplicity(childcare20_ppp) > 1)

## [1] 128

There are 128 points that have duplication.
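
The count above includes every point that shares its location with at least one other point, not only the repeated copies. The following base R sketch on made-up coordinates illustrates the difference between the two ways of counting. It mimics the idea of counting points per location and is not spatstat's implementation.

```r
# Made-up coordinates: one location used twice and one used three times.
toy_xy <- data.frame(x = c(1, 1, 2, 3, 3, 3),
                     y = c(1, 1, 2, 3, 3, 3))
toy_key <- paste(toy_xy$x, toy_xy$y)

# Number of points found at each point's location.
n_at_location <- ave(rep(1, nrow(toy_xy)), toy_key, FUN = length)

n_sharing <- sum(n_at_location > 1)   # points that share a location
n_repeats <- sum(duplicated(toy_key)) # copies after the first occurrence
c(sharing = n_sharing, repeats = n_repeats)
```

Here five of the six points share a location with another point, while only three of them are repeats of an earlier point.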

7.2.1.2.2 Jittering

Childcare (2017)

childcare17_ppp_jit <- rjitter(childcare17_ppp,
                               retry = TRUE,
                               nsim = 1,
                               drop = TRUE)
sum(multiplicity(childcare17_ppp_jit) > 1)

## [1] 0

There are no more duplicated points after jittering.

Childcare (2020)

childcare20_ppp_jit <- rjitter(childcare20_ppp,
                               retry = TRUE,
                               nsim = 1,
                               drop = TRUE)
sum(multiplicity(childcare20_ppp_jit) > 1)

## [1] 0

There are no more duplicated points after jittering.
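
The idea behind jittering can be pictured with a short base R sketch. It displaces every point by a random offset drawn uniformly from a small disc. This is an illustration under stated assumptions and not the implementation of rjitter: the name jitter_points_sketch, the radius and the coordinates are hypothetical, and there is no check that displaced points stay inside a study window.

```r
# Hypothetical helper: move each point to a uniformly random position
# within a disc of the given radius around its original location.
jitter_points_sketch <- function(x, y, radius) {
  n <- length(x)
  r <- radius * sqrt(runif(n))      # sqrt gives uniform density over the disc
  theta <- runif(n, 0, 2 * pi)      # random direction
  data.frame(x = x + r * cos(theta),
             y = y + r * sin(theta))
}

set.seed(415)
# Made-up coordinates containing coincident points.
toy_x <- c(10, 10, 20, 20, 20)
toy_y <- c(5, 5, 8, 8, 8)
toy_jit <- jitter_points_sketch(toy_x, toy_y, radius = 0.5)

anyDuplicated(toy_jit)  # 0 means no coincident points remain
max(sqrt((toy_jit$x - toy_x)^2 + (toy_jit$y - toy_y)^2))  # largest displacement
```

The radius controls the trade-off: it must be large enough to separate coincident points, yet small relative to the distances of interest so that the spatial pattern is not distorted.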

7.2.1.3 Define study areas

The analysis of spatial point patterns will be confined to four geographical areas of Singapore: Sengkang, Hougang, Bedok and Bukit Batok.

These study areas will first be extracted from the subzone data.
To represent the polygonal region of the study areas, an owin object of spatstat will be created for each study area.
This is done through the following conversions:
sf object > SpatialPolygonsDataFrame > SpatialPolygons object > owin object
Note: an owin object will also be created for the Singapore boundary, for later analysis in the next section.

Function to extract study area and convert to owin object.

get_owin <- function(subzone, pln_area_n) {
  subzone[subzone$PLN_AREA_N == pln_area_n,] %>%
    as('Spatial') %>%
    as('SpatialPolygons') %>%
    as('owin')
}

Create owin objects for study areas.

Sengkang

sengkang_owin <- get_owin(subzone_sf3414, "SENGKANG")

sengkang_owin

## window: polygonal boundary
## enclosing rectangle: [30122.19, 37368.16] x [39760.83, 42585.83] units

Hougang

hougang_owin <- get_owin(subzone_sf3414, "HOUGANG")

hougang_owin

## window: polygonal boundary
## enclosing rectangle: [32428.23, 36538.74] x [35083.89, 41180.3] units

Bedok

bedok_owin <- get_owin(subzone_sf3414, "BEDOK")

bedok_owin

## window: polygonal boundary
## enclosing rectangle: [34995.38, 42502.87] x [31574.68, 36745.39] units

Bukit Batok

bukitbatok_owin <- get_owin(subzone_sf3414, "BUKIT BATOK")

bukitbatok_owin

## window: polygonal boundary
## enclosing rectangle: [17231.102, 20996.085] x [34961.52, 40182.48] units

Singapore

subzone_owin <- subzone_sf3414 %>%
  as('Spatial') %>%
  as('SpatialPolygons') %>%
  as('owin')

subzone_owin

## window: polygonal boundary
## enclosing rectangle: [2667.54, 56396.44] x [15748.72, 50256.33] units

7.2.1.4 Combine points with study area

Combine the childcare points with the study area.

Sengkang (2017)

childcare17_sengkang_ppp <- childcare17_ppp_jit[sengkang_owin]

Hougang (2017)

childcare17_hougang_ppp <- childcare17_ppp_jit[hougang_owin]

Bedok (2017)

childcare17_bedok_ppp <- childcare17_ppp_jit[bedok_owin]

Bukit Batok (2017)

childcare17_bukitbatok_ppp <- childcare17_ppp_jit[bukitbatok_owin]

Sengkang (2020)

childcare20_sengkang_ppp <- childcare20_ppp_jit[sengkang_owin]

Hougang (2020)

childcare20_hougang_ppp <- childcare20_ppp_jit[hougang_owin]

Bedok (2020)

childcare20_bedok_ppp <- childcare20_ppp_jit[bedok_owin]

Bukit Batok (2020)

childcare20_bukitbatok_ppp <- childcare20_ppp_jit[bukitbatok_owin]

Singapore (2017)

childcare17_sg_ppp <- childcare17_ppp_jit[subzone_owin]

Singapore (2020)

childcare20_sg_ppp <- childcare20_ppp_jit[subzone_owin]

7.2.1.5 Plot spatial data

Visualise ppp objects to ensure that there are no errors in data preparation and conversion.

For childcare services in 2017:

par(mfrow=c(2,2))

plot(childcare17_sengkang_ppp, main = 'Sengkang')
plot(childcare17_hougang_ppp, main = 'Hougang')
plot(childcare17_bedok_ppp, main = 'Bedok')
plot(childcare17_bukitbatok_ppp, main = 'Bukit Batok')

mtext('Childcare Services in 2017', outer=TRUE, side=3, line=-12)

For childcare services in 2020:

par(mfrow=c(2,2))

plot(childcare20_sengkang_ppp, main = 'Sengkang')
plot(childcare20_hougang_ppp, main = 'Hougang')
plot(childcare20_bedok_ppp, main = 'Bedok')
plot(childcare20_bukitbatok_ppp, main = 'Bukit Batok')


mtext('Childcare Services in 2020', outer=TRUE, side=3, line=-12)

7.2.2 Formulate Hypothesis

Hypothesis testing will be conducted utilising second-order statistics (L function), to assess if the observed point pattern for childcare services (in 2017 and 2020) is significantly different from a homogeneous Poisson process (Complete Spatial Randomness, CSR).

Four study areas will be tested: Sengkang, Bedok, Bukit Batok, Hougang.
Null hypothesis, H0: Childcare services in the study area are randomly distributed.
Alternative hypothesis, H1: Childcare services in the study area are not randomly distributed.
The hypothesis will be tested at a significance level of 0.01, with a corresponding confidence level of 99%.

7.2.3 L function

The L function is a second-order statistic that will be used for this analysis.
It is based on Ripley’s K Function, which is a popular second-order statistic.

The general K function is given by:

\[K(r) = \frac{E[N_0(r)]}{\lambda}\]

where:

\(E[N_0(r)]\) is the expected number of events within a distance r from an arbitrary event.
\(\lambda\) is the estimated intensity of the study area (total number of points divided by the area of study region).

The K function is cumulative and increases with distance, because a larger radius encloses more points.
Note, however, that the variance of K(r) increases with r: as the radius grows, the number of points counted within the circle becomes more variable.
The L function stabilises this variance across distances, so it will be used in this analysis.

The L function is a variance-stabilising transformation of the K function:

\[L(r) = \sqrt{\frac{K(r)}{\pi}}\]
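
As a rough illustration of the two formulas above, the sketch below estimates K(r) and L(r) for simulated points in a unit square using base R only. It ignores edge correction (unlike Lest() with Ripley's correction), and the function name naive_K and the simulated data are assumptions made for illustration, not part of the analysis.

```r
# Naive K and L estimates without edge correction (illustrative only)
naive_K <- function(x, y, r, area) {
  n <- length(x)
  lambda <- n / area                     # estimated intensity
  d <- as.matrix(dist(cbind(x, y)))      # pairwise distances
  diag(d) <- Inf                         # exclude each point from its own count
  # average number of other points within distance r of a point, divided by lambda
  sapply(r, function(ri) mean(rowSums(d <= ri)) / lambda)
}

set.seed(1234)
n <- 200
x <- runif(n)                            # simulated random pattern in the unit square
y <- runif(n)
r <- seq(0.01, 0.1, by = 0.01)

K_hat <- naive_K(x, y, r, area = 1)
L_hat <- sqrt(K_hat / pi)                # L(r) = sqrt(K(r) / pi)

# Under CSR, K(r) = pi * r^2, so L(r) - r should be close to 0
round(L_hat - r, 3)
```

Because no edge correction is applied, the estimates are biased slightly downwards at larger r, which is one reason the analysis relies on Lest() instead.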

Methodology and Interpretation

To assess the spatial point pattern, \(L(r) - r\) will be plotted against \(r\).
Under complete spatial randomness (CSR), \(K(r) = \pi r^2\), so \(L(r) = r\) and \(L(r) - r = 0\).
- \(L(r) - r > 0\) implies clustering
- \(L(r) - r < 0\) implies dispersion
To assess whether the observed point pattern is statistically significant, a Monte Carlo hypothesis test for CSR using the L function will be conducted.
- The Monte Carlo test is a randomisation test based on simulations, which computes point-wise critical bands (the randomisation envelope).
- The width of the envelope reflects the variability of the process under the null hypothesis of CSR.
- If the observed \(L(r) - r\) lies outside the randomisation envelope at a given distance, the departure from CSR at that distance is statistically significant.
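
The link between the number of simulations and the significance level can be sketched in base R. For point-wise envelopes built from the minimum and maximum of nsim simulated curves, a two-sided test at a single fixed distance has significance level 2 / (nsim + 1), which is 0.01 for 199 simulations. The toy statistic below (a sample mean standing in for L(r) - r at one fixed r), the observed value and all variable names are assumptions for illustration.

```r
nsim  <- 199                      # number of simulations
nrank <- 1                        # envelope uses the most extreme simulated values
alpha <- 2 * nrank / (nsim + 1)   # two-sided point-wise significance level
alpha

# Toy Monte Carlo test at a single distance:
# the statistic is simulated nsim times under the null hypothesis
set.seed(1234)
sim_stat <- replicate(nsim, mean(rnorm(30)))   # stand-in for simulated L(r) - r values
lo <- min(sim_stat)               # lower envelope
hi <- max(sim_stat)               # upper envelope

obs_stat <- 1.5                   # hypothetical observed value
reject_null <- obs_stat < lo || obs_stat > hi
reject_null
```

Note that this level applies at each distance separately; it is not a simultaneous test over all distances.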

7.2.3.1 Construct functions

The function below computes and plots the L function estimate for a point pattern. Ripley edge correction is used to account for edge effects.

l_estimate <- function(ppp) {
  l_est <- Lest(ppp, correction = 'Ripley')
  
  plot(l_est, 
       . -r ~ r,
       ylab = 'L(r) - r',
       xlab = 'r (metres)',
       main = 'L-Estimate')
}

The function below conducts the Monte Carlo test for CSR with L function, and plots the results.

l_montecarlo <- function(ppp, mtitle) {
  
  # Set seed such that same results will be obtained from MC test every time
  set.seed(1234)
  l_mc <- envelope(ppp, Lest, nsim = 199)
  
  l_data <- as.data.frame(l_mc)
  l_data <- l_data[-1,]
  
  colour <- c("#d73027", "#ffffbf", "#91bfdb")

  test <- ggplot(l_data, aes(x=r, y=obs-r))+
    # plot observed value
    geom_line(colour=c("#4d4d4d"), aes(text = sprintf('L(r) - r: %f \nr: %f', obs-r, r), group=1))+
    # plot simulation envelopes
    geom_ribbon(aes(ymin = lo - r, ymax = hi - r), alpha = 0.1, colour = "#e0e0e0") +
    xlab("Distance, r (metres)") +
    ylab("L(r) - r") +
    # plot expected value, which is equal to 0
    geom_hline(yintercept=0, linetype = "dashed", colour=c("#800000")) +
    # rug marks on the x-axis: observed value above (red), below (yellow) or within (blue) the envelope
    geom_rug(data=l_data[(l_data$obs-l_data$r) > (l_data$hi-l_data$r),], sides="b", colour=colour[1]) +
    geom_rug(data=l_data[(l_data$obs-l_data$r) < (l_data$lo-l_data$r),], sides="b", colour=colour[2]) +
    geom_rug(data=l_data[(l_data$obs-l_data$r) >= (l_data$lo-l_data$r) & (l_data$obs-l_data$r) <= (l_data$hi-l_data$r),], sides="b", color=colour[3]) +
    # turn off all legends
    theme(legend.position="none") +
    ggtitle(sprintf('Childcare Services in %s: L(r) - r with Randomisation Envelope', mtitle))
    
  ggplotly(test, tooltip = "text")
}
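
The interpretation in the following sections depends on the distances at which the observed curve leaves the envelope. As a possible complement to reading these off the plot, the sketch below lists the distances where the observed value exceeds the upper envelope, given a data frame with the same column names (r, obs, lo, hi) as l_data. The helper name and the toy values are hypothetical.

```r
# Hypothetical helper: distances at which obs exceeds the upper envelope
clustered_distances <- function(l_data) {
  l_data$r[l_data$obs > l_data$hi]
}

# Toy data with the same column names as an envelope data frame
toy <- data.frame(
  r   = c(50, 100, 150, 200, 250),
  obs = c(52, 104, 170, 230, 290),
  lo  = c(40,  85, 130, 175, 220),
  hi  = c(60, 115, 165, 225, 280)
)

sig_r <- clustered_distances(toy)
sig_r        # distances with significant clustering in the toy data
min(sig_r)   # smallest such distance
```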

7.2.3.2 Sengkang (2017)

Compute L Function estimate

l_estimate(childcare17_sengkang_ppp)

For distances more than 50m, the \(L(r) - r\) function for childcare services at Sengkang in 2017 (black line) lies above the \(L(r) - r\) function at CSR (red line), suggesting clustering.

To confirm the observed spatial pattern and assess whether the distribution of childcare services at Sengkang in 2017 is significantly different from a homogeneous Poisson process, a Monte Carlo test with the L function is conducted.

Null hypothesis, H0: The distribution of childcare services at Sengkang in 2017 is randomly distributed.
Alternative hypothesis, H1: The distribution of childcare services at Sengkang in 2017 is not randomly distributed.
The hypothesis will be tested at a significance level of 0.01, with a corresponding confidence level of 99%.

Monte Carlo Test with L Function

l_montecarlo(childcare17_sengkang_ppp, 'Sengkang')

## Generating 199 simulations of CSR  ...
## 1, 2, 3, 4.6.8.10.12.14.16.18.20.22.24.26.28.30.32.34.36.38.40
## .42.44.46.48.50.52.54.56.58.60.62.64.66.68.70.72.74.76.78.80
## .82.84.86.88.90.92.94.96.98.100.102.104.106.108.110.112.114.116.118.120
## .122.124.126.128.130.132.134.136.138.140.142.144.146.148.150.152.154.156.158.160
## .162.164.166.168.170.172.174.176.178.180.182.184.186.188.190.192.194.196.198 199.
## 
## Done.

At distances below 105m, \(L(r) - r\) for childcare services at Sengkang in 2017 (black line) is mostly above 0 but lies within the randomisation envelope. Hence, clustering is not statistically significant at distances below 105m.
From 105m, \(L(r) - r\) for childcare services in Sengkang is more than 0 and lies above the randomisation envelope. This suggests statistically significant clustering of childcare services in Sengkang at distances above 105m.
At distances above 105m, there is sufficient evidence to reject the null hypothesis, and the distribution of childcare services at Sengkang in 2017 is not randomly distributed.

7.2.3.3 Sengkang (2020)

Compute L Function estimate

l_estimate(childcare20_sengkang_ppp)

Except for certain distances less than 100m, the \(L(r) - r\) function for childcare services at Sengkang in 2020 (black line) lies above the \(L(r) - r\) function at CSR (red line), suggesting clustering.

To confirm the observed spatial pattern and assess whether the distribution of childcare services at Sengkang in 2020 is significantly different from a homogeneous Poisson process, a Monte Carlo test with the L function is conducted.

Null hypothesis, H0: The distribution of childcare services at Sengkang in 2020 is randomly distributed.
Alternative hypothesis, H1: The distribution of childcare services at Sengkang in 2020 is not randomly distributed.
The hypothesis will be tested at a significance level of 0.01, with a corresponding confidence level of 99%.

Monte Carlo Test with L Function

l_montecarlo(childcare20_sengkang_ppp, 'Sengkang')

## Generating 199 simulations of CSR  ...
## 1, 2, 3, 4.6.8.10.12.14.16.18.20.22.24.26.28.30.32.34.36.38.40
## .42.44.46.48.50.52.54.56.58.60.62.64.66.68.70.72.74.76.78.80
## .82.84.86.88.90.92.94.96.98.100.102.104.106.108.110.112.114.116.118.120
## .122.124.126.128.130.132.134.136.138.140.142.144.146.148.150.152.154.156.158.160
## .162.164.166.168.170.172.174.176.178.180.182.184.186.188.190.192.194.196.198 199.
## 
## Done.

At short distances below 100m, although \(L(r) - r\) for childcare services at Sengkang in 2020 (black line) is mostly above 0, it lies within the randomisation envelope. Hence clustering is not statistically significant at distances below 100m.
For distances above 100m, \(L(r) - r\) for childcare services in Sengkang is more than 0 and lies above the randomisation envelope. This suggests statistically significant clustering of childcare services in Sengkang at distances above 100m.
At distances above 100m, there is sufficient evidence to reject the null hypothesis, and the distribution of childcare services at Sengkang in 2020 is not randomly distributed.

7.2.3.4 Hougang (2017)

Compute L Function estimate

l_estimate(childcare17_hougang_ppp)

Above 100m, the \(L(r) - r\) function for childcare services at Hougang in 2017 (black line) lies above the \(L(r) - r\) function at CSR (red line), suggesting spatial clustering.

To confirm the observed spatial pattern and assess whether the distribution of childcare services at Hougang in 2017 is significantly different from a homogeneous Poisson process, a Monte Carlo test with the L function is conducted.

Null hypothesis, H0: The distribution of childcare services at Hougang in 2017 is randomly distributed.
Alternative hypothesis, H1: The distribution of childcare services at Hougang in 2017 is not randomly distributed.
The hypothesis will be tested at a significance level of 0.01, with a corresponding confidence level of 99%.

Monte Carlo Test with L Function

l_montecarlo(childcare17_hougang_ppp, 'Hougang')

## Generating 199 simulations of CSR  ...
## 1, 2, 3, 4.6.8.10.12.14.16.18.20.22.24.26.28.30.32.34.36.38.40
## .42.44.46.48.50.52.54.56.58.60.62.64.66.68.70.72.74.76.78.80
## .82.84.86.88.90.92.94.96.98.100.102.104.106.108.110.112.114.116.118.120
## .122.124.126.128.130.132.134.136.138.140.142.144.146.148.150.152.154.156.158.160
## .162.164.166.168.170.172.174.176.178.180.182.184.186.188.190.192.194.196.198 199.
## 
## Done.

\(L(r) - r\) for childcare services at Hougang in 2017 (black line) is more than 0, but lies within the randomisation envelope for distances below 500m.
From 500m, the observed \(L(r) - r\) values start to go above the randomisation envelope, but still enter the envelope at certain distances.
It is only at distances above 600m where \(L(r) - r\) for childcare services at Hougang in 2017 is above 0 and outside of the randomisation envelope. This suggests that spatial clustering at distances above 600m is statistically significant.
Above 600m, there is sufficient evidence to reject the null hypothesis, and the distribution of childcare services in Hougang is not randomly distributed.
At distances below 600m, the observed spatial patterns (dispersion or clustering) are not statistically significant and there is insufficient evidence to reject the null hypothesis. The distribution of childcare services in Hougang cannot be distinguished from a random distribution at distances below 600m.

7.2.3.5 Hougang (2020)

Compute L Function estimate

l_estimate(childcare20_hougang_ppp)

Above 100m, the \(L(r) - r\) function for childcare services at Hougang in 2020 (black line) lies above the \(L(r) - r\) function at CSR (red line), suggesting spatial clustering.

To confirm the observed spatial pattern and assess whether the distribution of childcare services at Hougang in 2020 is significantly different from a homogeneous Poisson process, a Monte Carlo test with the L function is conducted.

Null hypothesis, H0: The distribution of childcare services at Hougang in 2020 is randomly distributed.
Alternative hypothesis, H1: The distribution of childcare services at Hougang in 2020 is not randomly distributed.
The hypothesis will be tested at a significance level of 0.01, with a corresponding confidence level of 99%.

Monte Carlo Test with L Function

l_montecarlo(childcare20_hougang_ppp, 'Hougang')

## Generating 199 simulations of CSR  ...
## 1, 2, 3, 4.6.8.10.12.14.16.18.20.22.24.26.28.30.32.34.36.38.40
## .42.44.46.48.50.52.54.56.58.60.62.64.66.68.70.72.74.76.78.80
## .82.84.86.88.90.92.94.96.98.100.102.104.106.108.110.112.114.116.118.120
## .122.124.126.128.130.132.134.136.138.140.142.144.146.148.150.152.154.156.158.160
## .162.164.166.168.170.172.174.176.178.180.182.184.186.188.190.192.194.196.198 199.
## 
## Done.

At distances below 70m, \(L(r) - r\) for childcare services at Hougang in 2020 (black line) is less than 0, but is within the randomisation envelope.
From 70m, observed \(L(r) - r\) values go above 0, but still lie within the randomisation envelope.
At about 480m, the observed \(L(r) - r\) values start to go above the randomisation envelope, but still enter the envelope at certain distances.
It is only at distances above 688m where \(L(r) - r\) for childcare services in Hougang is above 0 and outside of the randomisation envelope. This suggests that spatial clustering at distances above 688m is statistically significant.
Above 688m, there is sufficient evidence to reject the null hypothesis, and the distribution of childcare services at Hougang in 2020 is not randomly distributed.
At distances below 688m, the observed spatial patterns (dispersion or clustering) are not statistically significant and there is insufficient evidence to reject the null hypothesis. The distribution of childcare services at Hougang in 2020 cannot be distinguished from a random distribution at distances below 688m.

7.2.3.6 Bedok (2017)

Compute L Function estimate

l_estimate(childcare17_bedok_ppp)

Except at certain distances, the \(L(r) - r\) function for childcare services at Bedok in 2017 (black line) lies above the \(L(r) - r\) function at CSR (red line), suggesting clustering.

To confirm the observed spatial pattern and assess whether the distribution of childcare services at Bedok in 2017 is significantly different from a homogeneous Poisson process, a Monte Carlo test with the L function is conducted.

Null hypothesis, H0: The distribution of childcare services at Bedok in 2017 is randomly distributed.
Alternative hypothesis, H1: The distribution of childcare services at Bedok in 2017 is not randomly distributed.
The hypothesis will be tested at a significance level of 0.01, with a corresponding confidence level of 99%.

Monte Carlo Test with L Function

l_montecarlo(childcare17_bedok_ppp, 'Bedok')

## Generating 199 simulations of CSR  ...
## 1, 2, 3, 4.6.8.10.12.14.16.18.20.22.24.26.28.30.32.34.36.38.40
## .42.44.46.48.50.52.54.56.58.60.62.64.66.68.70.72.74.76.78.80
## .82.84.86.88.90.92.94.96.98.100.102.104.106.108.110.112.114.116.118.120
## .122.124.126.128.130.132.134.136.138.140.142.144.146.148.150.152.154.156.158.160
## .162.164.166.168.170.172.174.176.178.180.182.184.186.188.190.192.194.196.198 199.
## 
## Done.

\(L(r) - r\) for childcare services at Bedok in 2017 (black line) is above 0, but lies within the randomisation envelope at all distances.
Hence, there is no statistically significant clustering of childcare services at Bedok in 2017.
There is insufficient evidence to reject the null hypothesis, so the distribution of childcare services at Bedok in 2017 cannot be distinguished from a random distribution.

7.2.3.7 Bedok (2020)

Compute L Function estimate

l_estimate(childcare20_bedok_ppp)

The \(L(r) - r\) function for childcare services at Bedok in 2020 (black line) lies above the \(L(r) - r\) function at CSR (red line), suggesting clustering.

To confirm the observed spatial pattern and assess whether the distribution of childcare services at Bedok in 2020 is significantly different from a homogeneous Poisson process, a Monte Carlo test with the L function is conducted.

Null hypothesis, H0: The distribution of childcare services at Bedok in 2020 is randomly distributed.
Alternative hypothesis, H1: The distribution of childcare services at Bedok in 2020 is not randomly distributed.
The hypothesis will be tested at a significance level of 0.01, with a corresponding confidence level of 99%.

Monte Carlo Test with L Function

l_montecarlo(childcare20_bedok_ppp, 'Bedok')

## Generating 199 simulations of CSR  ...
## 1, 2, 3, 4.6.8.10.12.14.16.18.20.22.24.26.28.30.32.34.36.38.40
## .42.44.46.48.50.52.54.56.58.60.62.64.66.68.70.72.74.76.78.80
## .82.84.86.88.90.92.94.96.98.100.102.104.106.108.110.112.114.116.118.120
## .122.124.126.128.130.132.134.136.138.140.142.144.146.148.150.152.154.156.158.160
## .162.164.166.168.170.172.174.176.178.180.182.184.186.188.190.192.194.196.198 199.
## 
## Done.

For Bedok, although \(L(r) - r\) for childcare services in 2020 (black line) is above 0, it lies within the randomisation envelope for most distances, only falling out of the envelope at certain distances.
Hence, it cannot be said that there is statistically significant clustering of childcare services at Bedok in 2020.
There is insufficient evidence to reject the null hypothesis, so the distribution of childcare services at Bedok in 2020 cannot be distinguished from a random distribution.

7.2.3.8 Bukit Batok (2017)

Compute L Function estimate

l_estimate(childcare17_bukitbatok_ppp)

Except for distances below 150m, the \(L(r) - r\) function for childcare services at Bukit Batok in 2017 (black line) lies above the \(L(r) - r\) function at CSR (red line), suggesting clustering.

To confirm the observed spatial pattern and assess whether the distribution of childcare services at Bukit Batok in 2017 is significantly different from a homogeneous Poisson process, a Monte Carlo test with the L function is conducted.

Null hypothesis, H0: The distribution of childcare services at Bukit Batok in 2017 is randomly distributed.
Alternative hypothesis, H1: The distribution of childcare services at Bukit Batok in 2017 is not randomly distributed.
The hypothesis will be tested at a significance level of 0.01, with a corresponding confidence level of 99%.

Monte Carlo Test with L Function

l_montecarlo(childcare17_bukitbatok_ppp, 'Bukit Batok')

## Generating 199 simulations of CSR  ...
## 1, 2, 3, 4.6.8.10.12.14.16.18.20.22.24.26.28.30.32.34.36.38.40
## .42.44.46.48.50.52.54.56.58.60.62.64.66.68.70.72.74.76.78.80
## .82.84.86.88.90.92.94.96.98.100.102.104.106.108.110.112.114.116.118.120
## .122.124.126.128.130.132.134.136.138.140.142.144.146.148.150.152.154.156.158.160
## .162.164.166.168.170.172.174.176.178.180.182.184.186.188.190.192.194.196.198 199.
## 
## Done.

\(L(r) - r\) for childcare services at Bukit Batok in 2017 (black line) is mostly above 0 except at short distances below 80m.
However, the observed \(L(r) - r\) values lie mostly within the randomisation envelope.
Observed values for \(L(r) - r\) are only consistently above the randomisation envelope at distances from 400m to 760m. This suggests that there is statistically significant clustering of childcare services at Bukit Batok in 2017, at distances of 400m to 760m.
At distances of 400m to 760m, there is sufficient evidence to reject the null hypothesis, and the distribution of childcare services at Bukit Batok in 2017 is not randomly distributed.
At all other distances, spatial clustering is not statistically significant and there is insufficient evidence to reject the null hypothesis.

7.2.3.9 Bukit Batok (2020)

Compute L Function estimate

l_estimate(childcare20_bukitbatok_ppp)

Except for certain distances less than 150m, the \(L(r) - r\) function for childcare services at Bukit Batok in 2020 (black line) lies above the \(L(r) - r\) function at CSR (red line), suggesting clustering.

To confirm the observed spatial pattern and assess whether the distribution of childcare services at Bukit Batok in 2020 is significantly different from a homogeneous Poisson process, a Monte Carlo test with the L function is conducted.

Null hypothesis, H0: The distribution of childcare services at Bukit Batok in 2020 is randomly distributed.
Alternative hypothesis, H1: The distribution of childcare services at Bukit Batok in 2020 is not randomly distributed.
The hypothesis will be tested at a significance level of 0.01, with a corresponding confidence level of 99%.

Monte Carlo Test with L Function

l_montecarlo(childcare20_bukitbatok_ppp, 'Bukit Batok')

## Generating 199 simulations of CSR  ...
## 1, 2, 3, 4.6.8.10.12.14.16.18.20.22.24.26.28.30.32.34.36.38.40
## .42.44.46.48.50.52.54.56.58.60.62.64.66.68.70.72.74.76.78.80
## .82.84.86.88.90.92.94.96.98.100.102.104.106.108.110.112.114.116.118.120
## .122.124.126.128.130.132.134.136.138.140.142.144.146.148.150.152.154.156.158.160
## .162.164.166.168.170.172.174.176.178.180.182.184.186.188.190.192.194.196.198 199.
## 
## Done.

\(L(r) - r\) for childcare services in Bukit Batok (black line) is mostly above 0 except at short distances below 110m.
However, the observed \(L(r) - r\) value lies mostly within the randomisation envelope at distances below 550m. Hence spatial clustering is not statistically significant at distances below 550m.
Only from 550m to 880m do the observed \(L(r) - r\) values rise above the randomisation envelope, suggesting statistically significant clustering of childcare services in Bukit Batok at distances of 550m to 880m.
At distances of 550m to 880m, there is sufficient evidence to reject the null hypothesis, and the distribution of childcare services in Bukit Batok is not randomly distributed.

The second-order analysis of the spatial point pattern of childcare services reveals statistically significant clustering in some of the planning areas studied (Sengkang, Hougang and Bukit Batok), but not in Bedok. This may reflect strategic planning of childcare provision by the government or by individual service providers. Further work could examine how accessible these areas of high concentration are to the population of each subzone.

7.3 Kernel Density Estimation

Kernel density estimation (KDE) will be utilised to visualise the spatial patterns of childcare services and identify areas with high concentration of childcare services.

The distribution of childcare services in 2017 and 2020 will be visualised through mapping the varying density of childcare services across the study area.
Similar to the previous analysis, KDE will be used to analyse the following planning areas: Sengkang, Hougang, Bedok and Bukit Batok, as well as the whole of Singapore.
Bandwidth selection:
- For the planning areas, the bandwidth of the kernel density maps will be manually selected based on the cut-off distances at which clustering becomes statistically significant (from the second-order point pattern analysis). This will better highlight the concentration of childcare services in the spatial point pattern.
- For planning areas with no statistically significant clustering, as well as for the analysis of the whole of Singapore, automatic bandwidth selection will be used. The bw.ppl() algorithm will be utilised, which tends to produce more appropriate values when the pattern consists predominantly of tight clusters.
The Gaussian kernel function will be utilised, such that childcare services located further away from a reference point are given less weight.

7.3.1 Rescaling

Rescale data such that density values are computed as the number of childcare points per square kilometre, for easier interpretation.

The spatial data is projected in SVY21 which uses metres as its unit of measurement. Hence the unit of measurement will have to be converted to kilometres.

childcare17_sengkang_ppp_km <- rescale(childcare17_sengkang_ppp, 1000, 'km')
childcare17_hougang_ppp_km <- rescale(childcare17_hougang_ppp, 1000, 'km')
childcare17_bedok_ppp_km <- rescale(childcare17_bedok_ppp, 1000, 'km')
childcare17_bukitbatok_ppp_km <- rescale(childcare17_bukitbatok_ppp, 1000, 'km')

childcare20_sengkang_ppp_km <- rescale(childcare20_sengkang_ppp, 1000, 'km')
childcare20_hougang_ppp_km <- rescale(childcare20_hougang_ppp, 1000, 'km')
childcare20_bedok_ppp_km <- rescale(childcare20_bedok_ppp, 1000, 'km')
childcare20_bukitbatok_ppp_km <- rescale(childcare20_bukitbatok_ppp, 1000, 'km')

childcare17_sg_ppp_km <- rescale(childcare17_sg_ppp, 1000, 'km')
childcare20_sg_ppp_km <- rescale(childcare20_sg_ppp, 1000, 'km')
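
The effect of rescaling on density values can be checked with simple arithmetic: one square kilometre contains 1,000,000 square metres, so a density per square metre is multiplied by 1000^2 to obtain a density per square kilometre. The numbers below are made up for illustration and are not taken from the data.

```r
n_points <- 80          # hypothetical number of childcare centres
area_m2  <- 2.5e6       # hypothetical area in square metres (2.5 sq km)

density_per_m2  <- n_points / area_m2
density_per_km2 <- density_per_m2 * 1000^2    # 1 sq km = 1000 m x 1000 m

# Same result as converting the area to square kilometres first
area_km2 <- area_m2 / 1000^2
all.equal(density_per_km2, n_points / area_km2)

# Bandwidths convert the other way: 105 m is 0.105 km
105 / 1000
```

After rescaling, any bandwidth passed to the KDE must therefore also be expressed in kilometres.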

7.3.2 Construct function

The function below will compute KDE and plot the kernel density map.

kde <- function(ppp, sigma, pln_area, year) {
  kde_map <- density(ppp,
                     sigma = sigma,
                     edge = TRUE,
                     kernel = 'gaussian')

  plot(kde_map, 
       main = sprintf('Childcare Services at %s in %s', pln_area, year),
       box = FALSE)
}

7.3.3 Sengkang (2017)

Bandwidth is defined as 105m, based on results from second-order point pattern analysis.

kde(childcare17_sengkang_ppp_km, 0.105, 'Sengkang', '2017')

Across Sengkang in 2017, there are several locations with childcare services located close to each other.
Towards the east side of Sengkang, however, there is a higher concentration of childcare services (up to 80 childcare centres per square kilometre).

7.3.4 Sengkang (2020)

Bandwidth is defined as 100m, based on results from second-order point pattern analysis.

kde(childcare20_sengkang_ppp_km, 0.1, 'Sengkang', '2020')

Across Sengkang in 2020, there seem to be more areas with childcare centres located close to each other, denoted by brighter areas across Sengkang, compared to 2017.
Similar to 2017, there is a region with higher concentration of childcare services at the east side of Sengkang in 2020.

7.3.5 Hougang (2017)

Bandwidth is defined as 600m, based on results from second-order point pattern analysis.

kde(childcare17_hougang_ppp_km, 0.6, 'Hougang', '2017')

There seem to be two areas at Hougang in 2017 where childcare centres are more concentrated compared to the rest of the region.

7.3.6 Hougang (2020)

Bandwidth is defined as 688m, based on results from second-order point pattern analysis.

kde(childcare20_hougang_ppp_km, 0.688, 'Hougang', '2020')

In 2020, there seems to be only one main region where childcare centres are more concentrated in Hougang.

7.3.7 Bedok (2017)

bw.ppl(childcare17_bedok_ppp_km)

##     sigma 
## 0.4225241

Bandwidth is selected using the bw.ppl() algorithm, which returns approximately 0.423km (423m) in the output above.

kde(childcare17_bedok_ppp_km, bw.ppl, 'Bedok', '2017')

The concentration of childcare centres does not vary much across Bedok, but a slightly higher concentration exists in the middle of Bedok in 2017.

7.3.8 Bedok (2020)

bw.ppl(childcare20_bedok_ppp_km)

##     sigma 
## 0.4684795

Bandwidth is selected using the bw.ppl() algorithm, which returns approximately 0.468km (468m) in the output above.

kde(childcare20_bedok_ppp_km, bw.ppl, 'Bedok', '2020')

Compared to 2017, childcare services at Bedok in 2020 seem to be even more concentrated at the middle of Bedok.

7.3.9 Bukit Batok (2017)

Bandwidth is defined as 400m, based on results from second-order point pattern analysis.

kde(childcare17_bukitbatok_ppp_km, 0.4, 'Bukit Batok', '2017')

The west side of Bukit Batok sees a higher concentration of childcare services in 2017 than the rest of the planning area.

7.3.10 Bukit Batok (2020)

Bandwidth is defined as 550m, based on results from second-order point pattern analysis.

kde(childcare20_bukitbatok_ppp_km, 0.55, 'Bukit Batok', '2020')

Compared to 2017, there is a higher concentration of childcare services at Bukit Batok in 2020 over a larger area at its west side.

Kernel density maps of childcare services in 2017 and 2020 across Singapore will be plotted.

7.3.11 Childcare (2017)

kde_childcare17 <- density(childcare17_sg_ppp_km, sigma = bw.ppl, edge = TRUE, kernel = 'gaussian')
plot(kde_childcare17, main = 'Childcare Services in 2017', box = FALSE)

Three areas in Singapore have a higher concentration of childcare services in 2017 than the rest of the country.
These three areas are located in the Northeast, North and West regions of Singapore. They are denoted by the brighter purple or yellow colours.

7.3.12 Childcare (2020)

kde_childcare20 <- density(childcare20_sg_ppp_km, sigma = bw.ppl, edge = TRUE, kernel = 'gaussian')
plot(kde_childcare20, main = 'Childcare Services in 2020', box = FALSE)

Compared to the kernel density map for 2017, more areas in Singapore have a higher concentration of childcare services in 2020.
Two regions with relatively higher concentration are located in the North and Northeast regions of Singapore. They are denoted by the brighter purple or yellow colours.

7.3.13 KDE on OpenStreetMap

The kernel density maps for Singapore will be plotted on top of OpenStreetMap.

To plot KDE of Singapore on top of OpenStreetMap, tmap will have to be utilised.
The KDE output will have to be converted into raster for usage in tmap.
This can be done through the following format conversions:
KDE output > grid object > raster

7.3.13.1 Convert KDE output into raster

Childcare (2017)

kde_childcare17_raster <- kde_childcare17 %>%
  as.SpatialGridDataFrame.im() %>%
  raster()
kde_childcare17_raster

## class      : RasterLayer 
## dimensions : 128, 128, 16384  (nrow, ncol, ncell)
## resolution : 0.419757, 0.2695907  (x, y)
## extent     : 2.667538, 56.39644, 15.74872, 50.25633  (xmin, xmax, ymin, ymax)
## crs        : NA 
## source     : memory
## names      : v 
## values     : -7.348703e-15, 32.11809  (min, max)

Childcare (2020)

kde_childcare20_raster <- kde_childcare20 %>%
  as.SpatialGridDataFrame.im() %>%
  raster()
kde_childcare20_raster

## class      : RasterLayer 
## dimensions : 128, 128, 16384  (nrow, ncol, ncell)
## resolution : 0.419757, 0.2695907  (x, y)
## extent     : 2.667538, 56.39644, 15.74872, 50.25633  (xmin, xmax, ymin, ymax)
## crs        : NA 
## source     : memory
## names      : v 
## values     : -1.291346e-14, 29.18508  (min, max)

7.3.13.2 Assign projection

Childcare (2017)

projection(kde_childcare17_raster) <- CRS('+init=EPSG:3414')
kde_childcare17_raster

## class      : RasterLayer 
## dimensions : 128, 128, 16384  (nrow, ncol, ncell)
## resolution : 0.419757, 0.2695907  (x, y)
## extent     : 2.667538, 56.39644, 15.74872, 50.25633  (xmin, xmax, ymin, ymax)
## crs        : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +units=m +no_defs 
## source     : memory
## names      : v 
## values     : -7.348703e-15, 32.11809  (min, max)

Childcare (2020)

projection(kde_childcare20_raster) <- CRS('+init=EPSG:3414')
kde_childcare20_raster

## class      : RasterLayer 
## dimensions : 128, 128, 16384  (nrow, ncol, ncell)
## resolution : 0.419757, 0.2695907  (x, y)
## extent     : 2.667538, 56.39644, 15.74872, 50.25633  (xmin, xmax, ymin, ymax)
## crs        : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +units=m +no_defs 
## source     : memory
## names      : v 
## values     : -1.291346e-14, 29.18508  (min, max)
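
One point to check at this step: the KDE rasters were derived from ppp objects rescaled to kilometres, so the extents printed above (roughly 2.67 to 56.40 by 15.75 to 50.26) are in kilometres, whereas the CRS shown uses metres (+units=m). A possible remedy, sketched below under that assumption, is to multiply the extent by 1000 before assigning the projection. The helper name is hypothetical, and the raster step only runs if the raster package is available.

```r
# Hypothetical helper: convert extent coordinates from kilometres to metres
extent_km_to_m <- function(xmin, xmax, ymin, ymax) {
  c(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax) * 1000
}

# Extent of the KDE raster as printed above (in kilometres)
ext_m <- extent_km_to_m(2.667538, 56.39644, 15.74872, 50.25633)
ext_m

# Applying the same idea to a raster object (assumes the raster package is installed)
if (requireNamespace("raster", quietly = TRUE)) {
  r <- raster::raster(nrows = 128, ncols = 128,
                      xmn = 2.667538, xmx = 56.39644,
                      ymn = 15.74872, ymx = 50.25633)
  r <- raster::setExtent(r, raster::extent(ext_m[["xmin"]], ext_m[["xmax"]],
                                           ext_m[["ymin"]], ext_m[["ymax"]]))
  print(raster::extent(r))
}
```

The converted extent agrees with the enclosing rectangle reported for the Singapore owin object, which is expressed in metres. The cell values themselves would remain densities per square kilometre.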

7.3.13.3 Plot KDE output

tmap_mode('view')

## tmap mode set to interactive viewing

tm_basemap('OpenStreetMap') +
tm_shape(kde_childcare17_raster) +
  tm_raster('v') + 
  tm_layout(legend.position = c('right', 'bottom'), 
            frame = FALSE)

## legend.postion is used for plot mode. Use view.legend.position in tm_view to set the legend position in view mode.

## Variable(s) "v" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

7.4 Kernel Density Map vs Point Map

The use of kernel density maps provide several advantages over point maps.

A kernel density estimate smooths the points into a continuous surface of density estimates over a given area. This gives a quantitative value for the concentration of points at any particular location. With point maps, the concentration of points can only be judged qualitatively.
A kernel density map also takes into account distance-weighted counts of points (nearer points count more) when representing the concentration at a particular location. This cannot be achieved by visually inspecting a point map.
- The distance weighting is important because, in the real world, a childcare centre that is further away from a particular location can still potentially serve the population there. Such points should still be taken into account, but given less weight, as people will have to travel further to reach the childcare service. This is exactly what the kernel function accounts for.
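The difference between counting points and weighting them by distance can be sketched in base R. The coordinates, the search radius, the Gaussian kernel and the bandwidth `sigma` below are all illustrative assumptions, not values from this analysis, and no edge correction is applied.

```r
# Hypothetical coordinates (in km) of five childcare centres; illustrative only.
centres <- data.frame(x = c(0.2, 0.5, 1.0, 3.0, 6.0),
                      y = c(0.0, 0.4, 0.0, 0.0, 0.0))

# Location at which the concentration of centres is evaluated.
loc_x <- 0
loc_y <- 0

# Euclidean distance from the location to each centre.
d <- sqrt((centres$x - loc_x)^2 + (centres$y - loc_y)^2)

# Count-based view: every centre inside the radius counts equally,
# and every centre outside it is ignored entirely.
radius <- 1.5
count_within <- sum(d <= radius)

# Kernel-based view: every centre contributes, with a weight that
# decays smoothly with distance (isotropic Gaussian kernel, assumed).
sigma <- 1
w <- exp(-d^2 / (2 * sigma^2)) / (2 * pi * sigma^2)
density_at_location <- sum(w)

data.frame(distance_km = round(d, 2), weight = signif(w, 3))
```

The centres at 3 km and 6 km are dropped by the fixed-radius count but still contribute small weights to the kernel estimate, which is the behaviour described above.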

Take-Home Exercise 1

Geographic Analysis of the Supply and Demand of Childcare Services in Singapore

Xiao Rong Wong
