Harvey Lauw

Overview

Objectives

In view of this, this study uses a case study to demonstrate how geospatial analytics in R can integrate, analyse and communicate analysis results using open data provided by different government agencies. The specific objectives of the study are as follows:

  • To gain an understanding of the supply and demand of childcare services in 2017 and 2020 at the planning subzone level.

  • To gain an understanding of the geographic distribution of childcare services in 2017 and 2020 in Singapore.

Exploratory Spatial Data Analysis

Data wrangling for residents dataset

Children need to attend kindergarten at the age of 5. Hence, it is reasonable to assume that kindergartens take over from childcare services at that age throughout Singapore. A new category, “TODDLER/PRESCHOOL”, is therefore required for ages 0 to 4. A further assumption is that the population does not change from 2019 to 2020, because the dataset only covers population up to 2019. The 2019 population is therefore used in place of 2020 and compared with 2017.
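The regrouping described above can be sketched in base R. This is only an illustration on a toy table: the column names (`SZ`, `AG`, `Pop`), the age-band labels and the cut-offs for the YOUNG, ECONOMY ACTIVE and AGED groups are assumptions made for the example. Only the 0 to 4 band for TODDLER/PRESCHOOL comes from the text.

```r
# Toy population table. Column names and age-band labels are assumptions
# for this sketch, not taken from the report's dataset.
pop <- data.frame(
  SZ  = c("A", "A", "A", "A", "B", "B", "B", "B"),
  AG  = c("0_to_4", "5_to_9", "30_to_34", "70_to_74",
          "0_to_4", "5_to_9", "30_to_34", "70_to_74"),
  Pop = c(120, 150, 900, 200, 80, 60, 400, 310)
)

# Lower bound of each age band, e.g. "30_to_34" -> 30
lower_age <- as.integer(sub("_.*$", "", pop$AG))

# Ages 0-4 form the new TODDLER/PRESCHOOL group (from the text).
# The other three cut-offs (5-24, 25-64, 65+) are assumed for illustration.
pop$GROUP <- cut(
  lower_age,
  breaks = c(-Inf, 4, 24, 64, Inf),
  labels = c("TODDLER/PRESCHOOL", "YOUNG", "ECONOMY ACTIVE", "AGED")
)

# One row per subzone, one column per age group
residents <- xtabs(Pop ~ SZ + GROUP, data = pop)
residents
```

The resulting table has one row per subzone, which is the shape needed to join the counts onto subzone polygons later.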

Exploratory Data Analysis for residents dataset

Linear regression suggests that in both time periods, 2017 and 2020 (using the 2019 population), the YOUNG, ECONOMY ACTIVE and AGED populations are all significant in determining the TODDLER/PRESCHOOL population. However, the absolute values of the coefficient estimates are too similar (roughly 0.30 to 0.43 across both models) to determine which variable has the most impact on the TODDLER/PRESCHOOL population.

## 
## Call:
## lm(formula = `TODDLER/PRESCHOOL` ~ YOUNG + `ECONOMY ACTIVE` + 
##     AGED, data = residents_2017)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1325.30   -67.45   -53.84    74.36  2008.41 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      53.83907   21.64975   2.487   0.0134 *  
## YOUNG            -0.39002    0.03416 -11.417   <2e-16 ***
## `ECONOMY ACTIVE`  0.31396    0.01706  18.406   <2e-16 ***
## AGED             -0.42586    0.02534 -16.803   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 314.9 on 319 degrees of freedom
## Multiple R-squared:  0.8934, Adjusted R-squared:  0.8924 
## F-statistic:   891 on 3 and 319 DF,  p-value: < 2.2e-16
## 
## Call:
## lm(formula = `TODDLER/PRESCHOOL` ~ YOUNG + `ECONOMY ACTIVE` + 
##     AGED, data = residents_2019)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1177.81   -61.53   -46.20    61.25  2074.01 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      46.20042   22.27076   2.074   0.0388 *  
## YOUNG            -0.34795    0.04156  -8.372 1.81e-15 ***
## `ECONOMY ACTIVE`  0.29790    0.02019  14.753  < 2e-16 ***
## AGED             -0.40158    0.02581 -15.560  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 323.4 on 319 degrees of freedom
## Multiple R-squared:  0.889,  Adjusted R-squared:  0.8879 
## F-statistic: 851.4 on 3 and 319 DF,  p-value: < 2.2e-16
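As a rough sketch of how such a model can be fitted and its coefficients compared, the example below uses synthetic data with the same column names as the output above. The data, the effect sizes and the standardisation step are assumptions for illustration; the report itself only shows the unstandardised models.

```r
set.seed(1)
n <- 323  # same number of subzones as in the output above

# Synthetic stand-in for the residents table (values are made up)
toy <- data.frame(
  YOUNG            = rpois(n, 2000),
  `ECONOMY ACTIVE` = rpois(n, 6000),
  AGED             = rpois(n, 1500),
  check.names = FALSE
)
toy$`TODDLER/PRESCHOOL` <- 0.1 * toy$YOUNG +
  0.05 * toy$`ECONOMY ACTIVE` + rnorm(n, sd = 20)

# Same formula as in the report; backticks protect the non-syntactic names
fit_raw <- lm(`TODDLER/PRESCHOOL` ~ YOUNG + `ECONOMY ACTIVE` + AGED, data = toy)
summary(fit_raw)

# Standardising every column puts the coefficients on a common scale,
# which makes their magnitudes easier to compare across predictors
std <- toy
std[] <- lapply(toy, function(col) as.numeric(scale(col)))
fit_std <- lm(`TODDLER/PRESCHOOL` ~ YOUNG + `ECONOMY ACTIVE` + AGED, data = std)
sort(abs(coef(fit_std))[-1], decreasing = TRUE)
```

Raw coefficients depend on the units and spread of each predictor, so standardised coefficients are one possible way to compare influence when the raw estimates look alike.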

Section A: The supply and demand of childcare and kindergarten services by planning subzone

Exploratory Spatial Data Analysis:

Using appropriate EDA and choropleth mapping techniques to reveal the supply and demand of childcare services in 2017 and 2020 at the planning subzone level. Describe the spatial patterns observed.

The relationship between the population counts of the other age groups and the number of TODDLER/PRESCHOOL residents does not change much from 2017 to 2020. Therefore, subsequent geospatial analysis focuses on the TODDLER/PRESCHOOL column together with the location-based columns.

An obvious change in pattern can be seen in the North, North-East and East of Singapore. In 2017, there are multiple clusters of high demand for childcare services, each within a planning area and centred on the one subzone with the highest demand.

Take the Bishan cluster as an example, where the number of TODDLERS/PRESCHOOLERS has decreased. In 2017, the subzones surrounding the cluster appear more spread out, with similar shades of blue on the map. In 2020, the surrounding subzones appear to gain TODDLERS/PRESCHOOLERS, and the darker shades move towards the centre of the cluster, where the darkest subzone lies.

The same can be said for clusters in planning areas such as Bedok, Bukit Panjang, Choa Chu Kang, Jurong East, Kallang, Toa Payoh & Woodlands.

The opposite effect is seen in areas such as Bukit Batok, Bukit Merah & Sembawang, where each cluster shows an overall increase within the same planning subzone.

Use the tmap visualization below to interact with the residents data at the national level. Tooltip guide:

  • 1st line, bolded number = polygon index of the subzone.

  • 2nd line = number of TODDLER/PRESCHOOL in the subzone and year.
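A choropleth of this kind can be sketched with sf and tmap as follows. The 3 x 3 grid, the column names `SUBZONE` and `TODDLER_2020`, and the counts are all made up for illustration. The syntax follows tmap version 3, matching the `tm_raster()` calls later in this report; newer tmap releases may print deprecation messages for `style` and `palette`.

```r
library(sf)
library(tmap)

# Hypothetical 3 x 3 grid of "subzones" in SVY21 (EPSG:3414) coordinates
bbox <- st_bbox(c(xmin = 20000, ymin = 30000, xmax = 23000, ymax = 33000),
                crs = st_crs(3414))
grid <- st_make_grid(st_as_sfc(bbox), n = c(3, 3))

zones <- st_sf(
  SUBZONE      = paste0("SZ", 1:9),
  TODDLER_2020 = c(120, 340, 80, 560, 900, 410, 60, 230, 150),  # made-up counts
  geometry     = grid
)

tmap_mode("plot")  # switch to "view" for an interactive map with tooltips
demand_map <- tm_shape(zones) +
  tm_fill("TODDLER_2020", style = "quantile", palette = "Blues",
          id = "SUBZONE", title = "TODDLER/PRESCHOOL, 2020") +
  tm_borders(alpha = 0.5)

# print(demand_map) draws the map
```

Quantile classes are used here so that each shade covers a similar number of subzones; other classification styles would emphasise different parts of the distribution.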

Analytics mapping

Using appropriate analytics mapping techniques to reveal the temporal changes of the childcare services at the planning subzone level.

From 3.1, four planning areas in Singapore are selected by their increase in demand for childcare services: Sengkang, Bedok, Bukit Batok & Hougang. These serve as examples for studying the supply of childcare services from 2017 to 2020.

Check for presence of duplicated data points

## [1] TRUE
## [1] TRUE

Perform Jittering to handle duplicated points

## [1] FALSE
## [1] FALSE
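The duplicate check and jittering steps can be sketched with spatstat as below. The five points, the unit-square window and the jitter radius of 0.01 are assumptions for illustration; the report's own point patterns and window are not shown in this section.

```r
library(spatstat)

set.seed(42)
win <- owin(c(0, 1), c(0, 1))

# Five toy points, two pairs of which share identical coordinates
x <- c(0.2, 0.2, 0.5, 0.8, 0.8)
y <- c(0.3, 0.3, 0.5, 0.7, 0.7)
pts <- suppressWarnings(ppp(x, y, window = win))  # ppp() warns about duplicates

any(duplicated(pts))       # duplicates present before jittering

# Move every point by a small random displacement; retry = TRUE re-draws
# any point that would land outside the window
jittered <- rjitter(pts, radius = 0.01, retry = TRUE)
any(duplicated(jittered))  # no duplicates after jittering

# Restrict the jittered pattern to a smaller window, e.g. one planning area
sub_win <- owin(c(0, 0.6), c(0, 0.6))
pts_sub <- jittered[sub_win]
npoints(pts_sub)
```

Jittering keeps every record while separating coincident points, at the cost of small positional errors no larger than the chosen radius.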

Combine jittered data points and subzone owin file

The number of Childcare services has increased in Sengkang from 2017 to 2020, and the Childcare service centers are moving towards the center of the Sengkang cluster.

The number of Childcare services has increased in Bedok from 2017 to 2020.

The number of Childcare services has increased in Bukit Batok from 2017 to 2020.

The number of Childcare services has increased in Hougang from 2017 to 2020.

The overall number of Childcare services has increased in Singapore from 2017 to 2020.

Geocommunication

Describe the results of 3.1 and 3.2 and draw statistical conclusions.

Taken together, the results in 3.1 and 3.2 suggest that, as the number of TODDLERS/PRESCHOOLERS continues to increase, the supply of childcare services also increases to keep up with demand in Singapore.

The increase in the number of childcare services does not appear to depend only on the most affected planning areas, as childcare services have increased almost equally across the entire island.

Quadrat Analysis

A test of Complete Spatial Randomness for a given point pattern

The test hypotheses at the 95% confidence level are:

H0 = The childcare services are randomly distributed.

H1 = The childcare services are not randomly distributed.

Results of Chi-squared test: the p-value for childcare services in both 2017 & 2020 is less than 2.2e-16, which is below the significance level of 0.05. Hence, we reject H0 for both 2017 & 2020 and conclude that the childcare services are not randomly distributed.

Results of Monte Carlo test: the p-value for childcare services in both 2017 & 2020 is 0.002, which is still below the significance level of 0.05. Hence, we also reject H0 for both 2017 & 2020 and conclude that the childcare services are not randomly distributed.

## 
##  Chi-squared test of CSR using quadrat counts
## 
## data:  childcare_2017_mpsz_ppp
## X2 = 2216.8, df = 192, p-value < 2.2e-16
## alternative hypothesis: two.sided
## 
## Quadrats: 193 tiles (irregular windows)
## 
##  Chi-squared test of CSR using quadrat counts
## 
## data:  childcare_2020_mpsz_ppp
## X2 = 2627.2, df = 192, p-value < 2.2e-16
## alternative hypothesis: two.sided
## 
## Quadrats: 193 tiles (irregular windows)
## 
##  Conditional Monte Carlo test of CSR using quadrat counts
##  Test statistic: Pearson X2 statistic
## 
## data:  childcare_2017_mpsz_ppp
## X2 = 2216.8, p-value = 0.002
## alternative hypothesis: two.sided
## 
## Quadrats: 193 tiles (irregular windows)
## 
##  Conditional Monte Carlo test of CSR using quadrat counts
##  Test statistic: Pearson X2 statistic
## 
## data:  childcare_2020_mpsz_ppp
## X2 = 2627.2, p-value = 0.002
## alternative hypothesis: two.sided
## 
## Quadrats: 193 tiles (irregular windows)
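For reference, both variants of the quadrat test can be run with `quadrat.test()` from spatstat. The sketch below uses a synthetic clustered pattern on a unit square with a regular 5 x 5 grid, which are assumptions for illustration; the output above reports 193 irregular tiles on the report's own window.

```r
library(spatstat)

set.seed(2020)
# Synthetic clustered pattern: 10 cluster centres with 20 points around each
cx <- runif(10, 0.1, 0.9)
cy <- runif(10, 0.1, 0.9)
x <- rep(cx, each = 20) + runif(200, -0.05, 0.05)
y <- rep(cy, each = 20) + runif(200, -0.05, 0.05)
X <- ppp(x, y, window = owin(c(0, 1), c(0, 1)))

# Chi-squared test of CSR using counts in a 5 x 5 grid of quadrats
qt_chisq <- quadrat.test(X, nx = 5, ny = 5)
qt_chisq

# Conditional Monte Carlo version of the same test, based on 999 simulations
qt_mc <- quadrat.test(X, nx = 5, ny = 5, method = "MonteCarlo", nsim = 999)
qt_mc
```

The Monte Carlo variant does not rely on the chi-squared approximation, which can be unreliable when some quadrats have small expected counts.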

Nearest Neighbour Analysis

A test of aggregation for a spatial point pattern

The test hypotheses at the 95% confidence level are:

H0 = The childcare services are randomly distributed.

H1 = The childcare services are not randomly distributed.

Results of Clark-Evans test: the p-value for childcare services in both 2017 & 2020 is 0.02, which is less than the significance level of 0.05 (with 99 simulations, 0.02 is the smallest two-sided p-value this Monte Carlo test can return). Hence, we reject H0 for both 2017 & 2020 and conclude that the childcare services are not randomly distributed. Because the Clark-Evans index R is below 1 in both years (0.55 in 2017 and 0.54 in 2020), the point patterns exhibit a clustered pattern.

## 
##  Clark-Evans test
##  No edge correction
##  Monte Carlo test based on 99 simulations of CSR with fixed n
## 
## data:  childcare_2017_mpsz_ppp
## R = 0.55149, p-value = 0.02
## alternative hypothesis: two-sided
## 
##  Clark-Evans test
##  No edge correction
##  Monte Carlo test based on 99 simulations of CSR with fixed n
## 
## data:  childcare_2020_mpsz_ppp
## R = 0.53865, p-value = 0.02
## alternative hypothesis: two-sided
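A minimal sketch of the Clark-Evans test with `clarkevans.test()` follows, again on a synthetic clustered pattern (an assumption for illustration). The settings mirror the output above: no edge correction, a two-sided alternative and 99 simulations.

```r
library(spatstat)

set.seed(2020)
# Synthetic clustered pattern: 10 cluster centres with 20 points around each
cx <- runif(10, 0.1, 0.9)
cy <- runif(10, 0.1, 0.9)
X <- ppp(rep(cx, each = 20) + runif(200, -0.05, 0.05),
         rep(cy, each = 20) + runif(200, -0.05, 0.05),
         window = owin(c(0, 1), c(0, 1)))

ce <- clarkevans.test(X, correction = "none",
                      alternative = "two.sided", nsim = 99)
ce

# R < 1 suggests clustering, R > 1 suggests regularity, R near 1 suggests CSR
unname(ce$statistic)
```

The p-value only says whether the pattern departs from randomness; the direction of the departure is read from R itself.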

Section B: Spatial Point Pattern Analysis

Exploratory Spatial Data Analysis

Using point mapping techniques, display the location of childcare services in 2017 and 2020 at the national level. Describe the spatial patterns revealed by their respective distributions.

An obvious point to mention is that few childcare services in 2020 have opened in subzones that had no childcare services in 2017. The number of childcare services, and hence the supply, has clearly increased. Because duplicated points have been handled, the darker shading of the points in the 2020 plot indicates many more overlapping points from different childcare services. This is consistent across the different clusters found in Singapore in 2017.

For example, take the 2017 cluster around Tampines East, which has the highest number of childcare services in its planning area. In 2017, the childcare services are less evenly distributed across the surrounding subzones. In 2020, they are more evenly distributed both in the centre of the cluster and in its surroundings. On top of that, the number of childcare services has increased.

The same can be said for the Yishun planning area with Yishun East as the centre of the cluster, the Woodlands planning area with Woodlands East, the Punggol planning area with Waterway East, etc.

## [1] TRUE
## [1] TRUE
## [1] FALSE
## [1] FALSE

With reference to the spatial point patterns observed in 4.1

Formulate the null hypothesis and alternative hypothesis and select the confidence level.

The test hypotheses at the 95% confidence level are:

H0 = The childcare services in the planning area are randomly distributed.

H1 = The childcare services in the planning area are not randomly distributed.

The null hypothesis will be rejected if the p-value is smaller than the alpha value of 0.05.

Perform the test by using appropriate 2nd order spatial point patterns analysis technique.

For childcare services at Sengkang in both 2017 & 2020, the L(observed) value is greater than its corresponding L(theoretical) value and lies above the upper confidence envelope, so spatial clustering at those distances is statistically significant. We therefore reject the null hypothesis H0. Hence, the childcare services at Sengkang are not randomly distributed in either 2017 or 2020.

## Generating 40 simulations of CSR  ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
## 39,  40.
## 
## Done.

## Generating 40 simulations of CSR  ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
## 39,  40.
## 
## Done.
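The comparison of the observed L function against a simulation envelope can be sketched with `envelope()` and `Lest()` from spatstat. The clustered pattern below is synthetic and the choice of 39 simulations is an assumption for illustration (with the most extreme simulated values as limits, it gives a pointwise two-sided level of 2/40 = 0.05); the console output above shows 40 simulations per run.

```r
library(spatstat)

set.seed(2020)
# Synthetic clustered pattern: 10 cluster centres with 20 points around each
cx <- runif(10, 0.1, 0.9)
cy <- runif(10, 0.1, 0.9)
X <- ppp(rep(cx, each = 20) + runif(200, -0.05, 0.05),
         rep(cy, each = 20) + runif(200, -0.05, 0.05),
         window = owin(c(0, 1), c(0, 1)))

# Pointwise envelope of the L function under CSR
E <- envelope(X, Lest, nsim = 39, verbose = FALSE)

# Distances at which the observed L exceeds the upper envelope,
# i.e. where clustering is significant at the pointwise level
clustered_r <- E$r[E$obs > E$hi]
range(clustered_r)

# plot(E) draws the observed curve, the theoretical curve and the envelope
```

Reading the result distance by distance, as done here, is what allows statements such as "clustered at short range but not at long range" for a given planning area.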

For childcare services at Bedok in 2017, the L(observed) value is greater than its corresponding L(theoretical) value. It moves from under to above the upper confidence envelope but stays mostly above it, so spatial clustering is statistically significant at most distances. We therefore reject the null hypothesis H0. Hence, the childcare services at Bedok are not randomly distributed in 2017.

For childcare services at Bedok in 2020, the L(observed) value is greater than its corresponding L(theoretical) value but lies mostly under the upper confidence envelope, so spatial clustering is not statistically significant at most distances. We therefore fail to reject the null hypothesis H0. Hence, there is insufficient evidence that the childcare services at Bedok depart from a random distribution in 2020.

## Generating 40 simulations of CSR  ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
## 39,  40.
## 
## Done.

## Generating 40 simulations of CSR  ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
## 39,  40.
## 
## Done.

For childcare services at Bukit Batok in 2017, the L(observed) value is greater than its corresponding L(theoretical) value but lies mostly within the confidence envelope, so spatial clustering is not statistically significant. Hence, we fail to reject the null hypothesis H0: there is insufficient evidence that the childcare services at Bukit Batok depart from a random distribution in 2017.

For childcare services at Bukit Batok in 2020, the L(observed) value is greater than its corresponding L(theoretical) value and lies mostly above the upper confidence envelope, so spatial clustering is statistically significant. Hence, we reject the null hypothesis H0: the childcare services at Bukit Batok are not randomly distributed in 2020.

## Generating 40 simulations of CSR  ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
## 39,  40.
## 
## Done.

## Generating 40 simulations of CSR  ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
## 39,  40.
## 
## Done.

For childcare services at Hougang in both 2017 & 2020, the L(observed) value is greater than its corresponding L(theoretical) value and lies mostly above the upper confidence envelope, so spatial clustering is statistically significant. Hence, we reject the null hypothesis H0: the childcare services at Hougang are not randomly distributed in either 2017 or 2020.

## Generating 40 simulations of CSR  ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
## 39,  40.
## 
## Done.

## Generating 40 simulations of CSR  ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
## 39,  40.
## 
## Done.

With reference to the analysis results, draw statistical conclusions

From the hypothesis tests, we can conclude that the siting of childcare services is, in most cases, not random. This could be due to the increase in demand for childcare services as the number of TODDLERS/PRESCHOOLERS has increased from 2017 to 2020.

With reference to the results derived in 4.1 and 4.2:

Using appropriate tmap functions, display the kernel density maps on openstreetmap of Singapore.

# Packages used below (maptools provides as.SpatialGridDataFrame.im)
library(spatstat)
library(maptools)
library(raster)
library(tmap)

# Kernel density estimates with automatic bandwidth selection (bw.diggle)
kde_childcare_2017 <- density(childcare_2017_mpsz_ppp, sigma = bw.diggle, edge = TRUE, kernel = "gaussian")
kde_childcare_2020 <- density(childcare_2020_mpsz_ppp, sigma = bw.diggle, edge = TRUE, kernel = "gaussian")

# Convert Kernel Density Estimation Object into Gridded Object
gridded_kde_childcare_2017 <- as.SpatialGridDataFrame.im(kde_childcare_2017)
gridded_kde_childcare_2020 <- as.SpatialGridDataFrame.im(kde_childcare_2020)

# Convert Gridded Object into Raster
kde_childcare_2017_raster <- raster(gridded_kde_childcare_2017)
kde_childcare_2020_raster <- raster(gridded_kde_childcare_2020)

# Reassign projection system (SVY21, EPSG:3414)
projection(kde_childcare_2017_raster) <- CRS("+init=EPSG:3414")
projection(kde_childcare_2020_raster) <- CRS("+init=EPSG:3414")

# Display with tmap; both maps share the same breaks so the legends are comparable
kde_breaks <- c(0, 0.000005, 0.00001, 0.000015, 0.00002, 0.000025)

kde2017 <- tm_shape(kde_childcare_2017_raster) +
  tm_raster("v", palette = "Oranges",
            title = "2017: v", alpha = 0.7, breaks = kde_breaks) +
  tm_basemap(server = "http://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png") +
  tm_view(set.view = c(lon = 103.814564, lat = 1.361085, zoom = 10.5)) +
  tm_legend(legend.outside = TRUE)

kde2020 <- tm_shape(kde_childcare_2020_raster) +
  tm_raster("v", palette = "Blues",
            title = "2020: v", alpha = 0.7, breaks = kde_breaks) +
  tm_basemap(server = "http://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png") +
  tm_view(set.view = c(lon = 103.814564, lat = 1.361085, zoom = 10.5)) +
  tm_legend(legend.outside = TRUE)

# Interactive, synchronised side-by-side view
tmap_mode("view")
tmap_arrange(kde2017, kde2020, asp = 1, ncol = 2, sync = TRUE)

With reference to the analysis results, draw statistical conclusions.

From the kernel density maps alone, there is no big change across the island as a whole, but clusters of childcare services in locations such as Sengkang and Woodlands show some shift in intensity from 2017. Comparing the L function output of the 4 planning areas across the 2 time periods, 6 of the 8 cases indicate that childcare services are not set up randomly; they appear to be placed where the concentration of TODDLERS/PRESCHOOL is higher.

Compare the advantages of kernel density maps in 4.3.2 over point maps in 4.1.

Kernel density estimation computes the intensity of a point distribution and provides a simple, smoothed connection between the different childcare service locations. Each location contributes a kernel whose spread is set by the bandwidth, and the overlapping kernels of nearby locations add up to give the intensity level shown on the kernel density map.

Point maps, in contrast, only show a general overview of how clustered the childcare service locations are. With point maps we can only give a rough gauge, for example, “Oh I see the North of Singapore has a cluster and maybe the East has a more intense cluster”.

With kernel density maps we can give a better gauge and a better statement, for example, “Oh I see the North of Singapore has a cluster of Childcare services all in one place, but its density value is only 0.000005 lower than that of the cluster in the East”. It provides a quantifiable estimate so that analysts can be sure which cluster is more intense.
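To illustrate how a kernel density surface makes such comparisons quantifiable, the sketch below builds two synthetic clusters of different size and reads the estimated intensity at each cluster centre. The point locations, cluster sizes and the fixed bandwidth `sigma = 0.05` are assumptions for illustration; the report's own maps use `bw.diggle` for bandwidth selection.

```r
library(spatstat)

set.seed(7)
win <- owin(c(0, 1), c(0, 1))

# Two synthetic clusters: a smaller "north" one and a larger "east" one
north <- cbind(rnorm(30, 0.5, 0.03), rnorm(30, 0.8, 0.03))
east  <- cbind(rnorm(60, 0.8, 0.03), rnorm(60, 0.4, 0.03))
X <- ppp(c(north[, 1], east[, 1]), c(north[, 2], east[, 2]), window = win)

# Gaussian kernel density estimate with edge correction
D <- density(X, sigma = 0.05, edge = TRUE, kernel = "gaussian")

# Read the estimated intensity at the two cluster centres
probes <- ppp(c(0.5, 0.8), c(0.8, 0.4), window = win)
intensity_at <- D[probes]
names(intensity_at) <- c("north", "east")
intensity_at

# Difference in intensity between the two clusters (points per unit area)
unname(intensity_at["east"] - intensity_at["north"])
```

A point map of the same data would show two dense blobs; the density surface turns that visual impression into numbers that can be compared directly.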

4.3.4.1 Visualizing all the maps together: Kernel Density Map and Point Map of Childcare Services with the Map of No. of TODDLER/PRESCHOOL on OpenStreetMap.

Notes:

  • The Kernel Density estimation of the Childcare service locations can be identified by the grids on the chart.

  • Areas that are not gridded, and in red, represent the quantity of TODDLERS/PRESCHOOL present in the respective subzones.

Extra insights

(QUEEN) contiguity based neighbours are used instead of (ROOK) because of the complex polygon shapes at the subzone level, where a subzone may touch its neighbour only diagonally, at a corner, rather than along a horizontal or vertical edge. There is no change from 2017 to 2020: in both years the average number of links is approximately 6.

## Neighbour list object:
## Number of regions: 323 
## Number of nonzero links: 1934 
## Percentage nonzero weights: 1.853751 
## Average number of links: 5.987616 
## 5 regions with no links:
## 16 17 18 294 301
## Link number distribution:
## 
##  0  1  2  3  4  5  6  7  8  9 10 11 12 14 17 
##  5  2  6 10 26 77 87 51 34 16  3  3  1  1  1 
## 2 least connected regions:
## 15 233 with 1 link
## 1 most connected region:
## 312 with 17 links

## Neighbour list object:
## Number of regions: 323 
## Number of nonzero links: 1934 
## Percentage nonzero weights: 1.853751 
## Average number of links: 5.987616 
## 5 regions with no links:
## 16 17 18 294 301
## Link number distribution:
## 
##  0  1  2  3  4  5  6  7  8  9 10 11 12 14 17 
##  5  2  6 10 26 77 87 51 34 16  3  3  1  1  1 
## 2 least connected regions:
## 15 233 with 1 link
## 1 most connected region:
## 312 with 17 links
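The difference between Queen and Rook contiguity can be seen on a small example. The sketch below builds a hypothetical 3 x 3 grid with sf and derives both neighbour lists with `spdep::poly2nb()`; the grid is an assumption for illustration and stands in for the subzone polygons.

```r
library(sf)
library(spdep)

# Hypothetical 3 x 3 grid of square "subzones"; cell 5 is the centre cell
bbox <- st_bbox(c(xmin = 0, ymin = 0, xmax = 3, ymax = 3))
grid <- st_make_grid(st_as_sfc(bbox), n = c(3, 3))

# Queen: polygons sharing at least one boundary point are neighbours
nb_queen <- poly2nb(grid, queen = TRUE)
# Rook: polygons must share more than a single boundary point (an edge)
nb_rook <- poly2nb(grid, queen = FALSE)

card(nb_queen)[5]  # centre cell: 8 neighbours, diagonals included
card(nb_rook)[5]   # centre cell: 4 neighbours, edges only

summary(nb_queen)  # same kind of summary as the output above
```

On irregular subzone polygons the gap between the two definitions is usually smaller than on a regular grid, but Queen contiguity still picks up corner contacts that Rook contiguity misses.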