Harvey Lauw

Overview

Objectives

This study presents a use case that demonstrates how geospatial analytics in R can integrate, analyse and communicate results using open data published by different government agencies. The specific objectives of the study are as follows:

To gain an understanding of the supply and demand of childcare services in 2017 and 2020 at the planning subzone level.
To gain an understanding of the geographic distribution of childcare services in 2017 and 2020 in Singapore.

Installing and Launching R Packages

packages = c('rgdal', 'maptools', 'raster','spatstat', 'tmap', 'sf', 'spdep', 'tidyverse')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}

Exploratory Spatial Data Analysis

Importing data

#Importing polygon feature data in shapefile format
mpsz <- readOGR(dsn = "data",
            layer="MP14_SUBZONE_WEB_PL")

#Importing 2017 Childcare data in shapefile format
childcare_2017 <- readOGR(dsn = "data",
                     layer="CHILDCARE")

#Importing 2020 Childcare data in kml format
childcare_2020 <- readOGR("data/child-care-services-kml.kml", "CHILDCARE")

#Importing Residents data from 2011 to 2019, csv file, into r environment
residents <- read_csv("data/aspatial/planning-area-subzone-age-group-sex-and-type-of-dwelling-june-2011-2019.csv")

Data wrangling for residents dataset

Children need to attend kindergarten at the age of 5, so it is reasonable to assume that kindergartens take over from childcare services at that age throughout Singapore. A new category, “TODDLER/PRESCHOOL”, is therefore created for ages 0 to 4. Because the residents data only runs to 2019, we further assume that the population did not change from 2019 to 2020, and the 2019 population is used as a proxy for 2020 when comparing against 2017.

#Aggregating resident data by subzone and age band for 2017 and 2019 (2019 stands in for 2020)

residents_2017 <- residents %>%
  filter(year == 2017) %>%
  group_by(planning_area,subzone,age_group) %>%
  summarise('resident_count' = sum(resident_count)) %>%
  ungroup() %>%
  spread(age_group, resident_count) %>%
  mutate('TODDLER/PRESCHOOL' =rowSums(.[3])) %>%
  mutate('YOUNG' =rowSums(.[4:6]) + rowSums(.[12])) %>%
  mutate('ECONOMY ACTIVE' =rowSums(.[7:11]) + rowSums(.[13:15])) %>%
  mutate('AGED' =rowSums(.[16:21])) %>%
  mutate('TOTAL' =rowSums(.[3:21])) %>%
  select('planning_area','subzone','TODDLER/PRESCHOOL','YOUNG','ECONOMY ACTIVE','AGED','TOTAL')

residents_2019 <- residents %>%
  filter(year == 2019) %>%
  group_by(planning_area,subzone,age_group) %>%
  summarise('resident_count' = sum(resident_count)) %>%
  ungroup() %>%
  spread(age_group, resident_count) %>%
  mutate('TODDLER/PRESCHOOL' =rowSums(.[3])) %>%
  mutate('YOUNG' =rowSums(.[4:6]) + rowSums(.[12])) %>%
  mutate('ECONOMY ACTIVE' =rowSums(.[7:11]) + rowSums(.[13:15])) %>%
  mutate('AGED' =rowSums(.[16:21])) %>%
  mutate('TOTAL' =rowSums(.[3:21])) %>%
  select('planning_area','subzone','TODDLER/PRESCHOOL','YOUNG','ECONOMY ACTIVE','AGED','TOTAL')
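The pipelines above pick age bands by column position after `spread()`, which depends on the alphabetical order of the `age_group` labels. As a minimal sketch on hypothetical toy data (the labels such as `0_to_4` and all counts are assumptions, not values from the report), the same aggregation can be written with named columns, which is easier to audit:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy input with the same column names the report filters on
toy <- tibble(
  subzone        = rep(c("A", "B"), each = 4),
  age_group      = rep(c("0_to_4", "5_to_9", "25_to_29", "65_to_69"), times = 2),
  resident_count = c(10, 20, 100, 30, 5, 15, 80, 40)
)

toy_wide <- toy %>%
  group_by(subzone, age_group) %>%
  summarise(resident_count = sum(resident_count), .groups = "drop") %>%
  # One column per age band, one row per subzone
  pivot_wider(names_from = age_group, values_from = resident_count) %>%
  # Refer to age bands by name instead of by column index
  mutate(
    `TODDLER/PRESCHOOL` = `0_to_4`,
    YOUNG               = `5_to_9`,
    `ECONOMY ACTIVE`    = `25_to_29`,
    AGED                = `65_to_69`,
    TOTAL               = `TODDLER/PRESCHOOL` + YOUNG + `ECONOMY ACTIVE` + AGED
  )

toy_wide
```

Naming the bands makes it obvious which columns feed each category, at the cost of a longer `mutate()` call when there are many bands.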

Exploratory Data Analysis for residents dataset

Linear regression suggests that in both periods (2017 and 2019, the proxy for 2020) the YOUNG, ECONOMY ACTIVE and AGED populations are all significant predictors of the TODDLER/PRESCHOOL population. However, the absolute values of the coefficient estimates are too similar to single out one variable as the most influential on the TODDLER/PRESCHOOL population.

lm.residents_2017 <- lm(`TODDLER/PRESCHOOL`~YOUNG + `ECONOMY ACTIVE` + AGED,
                                 data=residents_2017) 
summary(lm.residents_2017)

## 
## Call:
## lm(formula = `TODDLER/PRESCHOOL` ~ YOUNG + `ECONOMY ACTIVE` + 
##     AGED, data = residents_2017)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1325.30   -67.45   -53.84    74.36  2008.41 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      53.83907   21.64975   2.487   0.0134 *  
## YOUNG            -0.39002    0.03416 -11.417   <2e-16 ***
## `ECONOMY ACTIVE`  0.31396    0.01706  18.406   <2e-16 ***
## AGED             -0.42586    0.02534 -16.803   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 314.9 on 319 degrees of freedom
## Multiple R-squared:  0.8934, Adjusted R-squared:  0.8924 
## F-statistic:   891 on 3 and 319 DF,  p-value: < 2.2e-16

lm.residents_2019 <- lm(`TODDLER/PRESCHOOL`~YOUNG + `ECONOMY ACTIVE` + AGED,
                                 data=residents_2019) 
summary(lm.residents_2019)

## 
## Call:
## lm(formula = `TODDLER/PRESCHOOL` ~ YOUNG + `ECONOMY ACTIVE` + 
##     AGED, data = residents_2019)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1177.81   -61.53   -46.20    61.25  2074.01 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      46.20042   22.27076   2.074   0.0388 *  
## YOUNG            -0.34795    0.04156  -8.372 1.81e-15 ***
## `ECONOMY ACTIVE`  0.29790    0.02019  14.753  < 2e-16 ***
## AGED             -0.40158    0.02581 -15.560  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 323.4 on 319 degrees of freedom
## Multiple R-squared:  0.889,  Adjusted R-squared:  0.8879 
## F-statistic: 851.4 on 3 and 319 DF,  p-value: < 2.2e-16
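Raw coefficients depend on the scale of each predictor, so one hedged way to compare their relative influence is to refit the model on z-scored columns. The sketch below uses simulated toy data (all names and numbers are assumptions, not the residents data) purely to show the mechanics:

```r
set.seed(1)

# Simulated toy data: only 'active' truly drives 'toddler' here
toy <- data.frame(
  young  = rnorm(50, mean = 2000, sd = 500),
  active = rnorm(50, mean = 8000, sd = 2000),
  aged   = rnorm(50, mean = 1500, sd = 400)
)
toy$toddler <- 0.05 * toy$active + rnorm(50, mean = 0, sd = 50)

# Centre and scale every column so coefficients are in standard-deviation units
toy_z <- as.data.frame(scale(toy))

fit_z <- lm(toddler ~ young + active + aged, data = toy_z)
round(coef(fit_z), 3)
```

On standardised data, a larger absolute coefficient indicates a stronger association per standard deviation of the predictor, although correlated predictors can still make the ranking unstable.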

Section A: The supply and demand of childcare and kindergarten services by planning subzone

Exploratory Spatial Data Analysis:

Using appropriate EDA and choropleth mapping techniques to reveal the supply and demand of childcare services in 2017 and 2020 at the planning subzone level. Describe the spatial patterns observed.

#Upper-case the area names so they match SUBZONE_N in the polygon data
residents_2017 <- residents_2017 %>%
  mutate_at(.vars=vars(planning_area,subzone),
            .funs=toupper) 
mpsz_resident_2017 <- mpsz
mpsz_resident_2017@data <- left_join(mpsz_resident_2017@data, residents_2017, 
                              by = c("SUBZONE_N" = "subzone"))

#The 2019 population stands in for 2020
residents_2020 <- residents_2019 %>%
  mutate_at(.vars=vars(planning_area,subzone),
            .funs=toupper) 
mpsz_resident_2020 <- mpsz
mpsz_resident_2020@data <- left_join(mpsz_resident_2020@data, residents_2020, 
                              by = c("SUBZONE_N" = "subzone"))
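A left join silently leaves `NA` where names fail to match, so it can help to list unmatched names before joining. This is a minimal sketch on hypothetical toy tables (the names and counts are made up for illustration), using `dplyr::anti_join()`:

```r
library(dplyr)

# Hypothetical polygon attribute table and residents table
polygons <- tibble(SUBZONE_N = c("MARINA SOUTH", "BEDOK NORTH"))
residents_toy <- tibble(
  subzone  = c("Marina South", "Bedok North", "Unknown Zone"),
  toddlers = c(0, 2500, 10)
) %>%
  mutate(subzone = toupper(subzone))

# Rows of residents_toy with no matching polygon name
unmatched <- anti_join(residents_toy, polygons,
                       by = c("subzone" = "SUBZONE_N"))
unmatched
```

An empty result would suggest that every residents row has a polygon to attach to.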

The relationship between the population of the other age groups and the TODDLER/PRESCHOOL population does not change much from 2017 to 2020. The subsequent geospatial analysis therefore focuses on the TODDLER/PRESCHOOL column together with the location columns.

plot(mpsz_resident_2017$`TODDLER/PRESCHOOL`, mpsz_resident_2017$`ECONOMY ACTIVE` , main = "2017: Economy Active Vs Toddlers/Preschool", col= ifelse(mpsz_resident_2017$`TODDLER/PRESCHOOL` ==0, "red", "black"))

plot(mpsz_resident_2020$`TODDLER/PRESCHOOL`, mpsz_resident_2020$`ECONOMY ACTIVE` , main = "2020: Economy Active Vs Toddlers/Preschool",col= ifelse(mpsz_resident_2020$`TODDLER/PRESCHOOL` ==0, "red", "black"))

A clear change in pattern can be seen in the North, North East and East of Singapore. In 2017, planning areas show multiple clusters of high demand for childcare services, with the subzone at the centre of each cluster having the highest demand.

Take the Bishan cluster as an example, where the number of TODDLER/PRESCHOOL residents has decreased. In 2017, the subzones surrounding the cluster appear more spread out, with similar shades of blue on the map. In 2020, the surrounding subzones appear to have gained TODDLER/PRESCHOOL residents, and the darker shades move towards the centre of the cluster, where the darkest subzone lies.

The same can be said for clusters in planning areas like Bedok, Bukit Panjang, Choa Chu Kang, Jurong East, Kallang, Toa Payoh and Woodlands.

The opposite effect appears in areas like Bukit Batok, Bukit Merah and Sembawang, where each cluster shows an overall increase within the same planning subzone.

tmap_mode("plot")
tm_shape(mpsz_resident_2017) +
  tm_fill("TODDLER/PRESCHOOL",
          style = "quantile",
          palette = "Blues",
          thres.poly = 0) + 
  tm_facets(by="planning_area")

tm_shape(mpsz_resident_2020) +
  tm_fill("TODDLER/PRESCHOOL",
          style = "quantile",
          palette = "Blues",
          thres.poly = 0) + 
  tm_facets(by="planning_area")

Use the tmap visualization below to interact with the residents data at the national level. Tooltip guide:
- 1st line (bold number) = polygon index of the subzone
- 2nd line = number of TODDLER/PRESCHOOL residents in the subzone for that year

map2017 <- tm_shape(mpsz_resident_2017)+
  tm_fill("TODDLER/PRESCHOOL", 
          style = "jenks", 
          palette = "Oranges",
          alpha = 0.7)  +
  tm_basemap(server = "http://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png") + 
  tm_view(set.view = c(lon = 103.814564, lat = 1.361085, zoom = 10.5)) +
  tm_borders(alpha = 0.5)

map2020 <- tm_shape(mpsz_resident_2020)+
  tm_fill("TODDLER/PRESCHOOL", 
          style = "jenks", 
          palette = "Blues",
          alpha = 0.7)  +
  tm_basemap(server = "http://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png") + 
  tm_view(set.view = c(lon = 103.814564, lat = 1.361085, zoom = 10.5)) +
  tm_borders(alpha = 0.5)

tmap_mode("view")
tmap_arrange(map2017, map2020, asp=1, ncol=2, sync = TRUE)

Analytics mapping

Using appropriate analytics mapping techniques to reveal the temporal changes of the childcare services at the planning subzone level.

From 3.1, four planning areas in Singapore are selected based on the increase in demand for childcare services: Sengkang, Bedok, Bukit Batok and Hougang. They serve as examples for studying the supply of childcare services from 2017 to 2020.

#Reading as SpatialPoints & SpatialPolygons
childcare_2017_sp <- as(childcare_2017, "SpatialPoints")
childcare_2020_sp <- as(childcare_2020, "SpatialPoints")
se = mpsz[mpsz@data$PLN_AREA_N == "SENGKANG",]
be = mpsz[mpsz@data$PLN_AREA_N == "BEDOK",]
bb = mpsz[mpsz@data$PLN_AREA_N == "BUKIT BATOK",]
hg = mpsz[mpsz@data$PLN_AREA_N == "HOUGANG",]

se_sp = as(se, "SpatialPolygons")
be_sp = as(be, "SpatialPolygons")
bb_sp = as(bb, "SpatialPolygons")
hg_sp = as(hg, "SpatialPolygons")

se_owin = as(se_sp, "owin")
be_owin = as(be_sp, "owin")
bb_owin = as(bb_sp, "owin")
hg_owin = as(hg_sp, "owin")

childcare_2017_3414 <- spTransform(childcare_2017_sp, 
                               CRS("+init=epsg:3414"))
childcare_2020_3414 <- spTransform(childcare_2020_sp, 
                               CRS("+init=epsg:3414"))
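The reprojection to EPSG:3414 (SVY21) matters because spatstat works in planar coordinates, so distances should be in metres rather than degrees. As a hedged sketch with the `sf` package (already loaded above) and a single hypothetical WGS84 point, the same transformation looks like this:

```r
library(sf)

# Hypothetical point in longitude/latitude (EPSG:4326), roughly central Singapore
pt_wgs84 <- st_sfc(st_point(c(103.8198, 1.3521)), crs = 4326)

# Reproject to SVY21 (EPSG:3414); coordinates are now eastings/northings in metres
pt_svy21 <- st_transform(pt_wgs84, 3414)

xy <- st_coordinates(pt_svy21)
xy
```

This is only an illustration of the coordinate change; the report itself uses `spTransform()` on `sp` objects.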

Check for presence of duplicated data points

#Conversion to spatstat's ppp object format 
childcare_2017_ppp <- as(childcare_2017_3414, "ppp")
any(duplicated(childcare_2017_ppp))

## [1] TRUE

childcare_2020_ppp <- as(childcare_2020_3414, "ppp")
any(duplicated(childcare_2020_ppp))

## [1] TRUE

Perform Jittering to handle duplicated points

childcare_2017_ppp_jit <- rjitter(childcare_2017_ppp, retry=TRUE, nsim=1, drop=TRUE)
any(duplicated(childcare_2017_ppp_jit))

## [1] FALSE

childcare_2020_ppp_jit <- rjitter(childcare_2020_ppp, retry=TRUE, nsim=1, drop=TRUE)
any(duplicated(childcare_2020_ppp_jit))

## [1] FALSE
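To see what jittering does, here is a minimal sketch on a hypothetical three-point pattern in a unit square; the `radius` value is an arbitrary assumption chosen for the toy window:

```r
library(spatstat)
set.seed(42)

# Toy pattern with two coincident points (ppp() warns about the duplicates)
X <- ppp(x = c(0.2, 0.2, 0.7),
         y = c(0.5, 0.5, 0.1),
         window = owin(c(0, 1), c(0, 1)))
any(duplicated(X))

# Displace every point by a small random offset; retry = TRUE keeps
# re-drawing offsets that would push a point outside the window
X_jit <- rjitter(X, radius = 0.01, retry = TRUE)
any(duplicated(X_jit))
npoints(X_jit)
```

Jittering keeps every point, unlike deleting duplicates, but it moves all points slightly, so the radius should stay small relative to the distances being analysed.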

Combine jittered data points and subzone owin file

childcare_2017_se_ppp = childcare_2017_ppp_jit[se_owin]
childcare_2017_be_ppp = childcare_2017_ppp_jit[be_owin]
childcare_2017_bb_ppp = childcare_2017_ppp_jit[bb_owin]
childcare_2017_hg_ppp = childcare_2017_ppp_jit[hg_owin]

childcare_2020_se_ppp = childcare_2020_ppp_jit[se_owin]
childcare_2020_be_ppp = childcare_2020_ppp_jit[be_owin]
childcare_2020_bb_ppp = childcare_2020_ppp_jit[bb_owin]
childcare_2020_hg_ppp = childcare_2020_ppp_jit[hg_owin]

The number of childcare services in Sengkang increased from 2017 to 2020, and the childcare centres are moving towards the centre of the Sengkang cluster.

par(mfrow=c(1,2))
plot(childcare_2017_se_ppp, main = "SengKang 2017", cex = 0.8, markscale = 0.04, 
     bg = rgb(0.1,0.9,0.3,0.5), fg = "black")
plot(childcare_2020_se_ppp, main = "SengKang 2020", cex = 0.8, markscale = 0.04, 
     bg = rgb(0.1,0.9,0.3,0.5), fg = "black")

The number of Childcare services has increased in Bedok from 2017 to 2020.

par(mfrow=c(1,2))
plot(childcare_2017_be_ppp, main = "Bedok 2017", cex = 0.8, markscale = 0.04, 
     bg = rgb(0.1,0.9,0.3,0.5), fg = "black")
plot(childcare_2020_be_ppp, main = "Bedok 2020", cex = 0.8, markscale = 0.04, 
     bg = rgb(0.1,0.9,0.3,0.5), fg = "black")

The number of Childcare services has increased in Bukit Batok from 2017 to 2020.

par(mfrow=c(1,2))
plot(childcare_2017_bb_ppp, main = "Bukit Batok 2017", cex = 0.8, markscale = 0.04, 
     bg = rgb(0.1,0.9,0.3,0.5), fg = "black")
plot(childcare_2020_bb_ppp, main = "Bukit Batok 2020", cex = 0.8, markscale = 0.04, 
     bg = rgb(0.1,0.9,0.3,0.5), fg = "black")

The number of Childcare services has increased in Hougang from 2017 to 2020.

par(mfrow=c(1,2))
plot(childcare_2017_hg_ppp, main = "Hougang 2017", cex = 0.8, markscale = 0.04, 
     bg = rgb(0.1,0.9,0.3,0.5), fg = "black")
plot(childcare_2020_hg_ppp, main = "Hougang 2020", cex = 0.8, markscale = 0.04, 
     bg = rgb(0.1,0.9,0.3,0.5), fg = "black")

The overall number of Childcare services has increased in Singapore from 2017 to 2020.

mpsz_2017_sp <- as(mpsz_resident_2017, "SpatialPolygons")
mpsz_2020_sp <- as(mpsz_resident_2020, "SpatialPolygons")

#Reassigning coordinate system and reprojecting geospatial data as EPSG:3414
mpsz_2017_3414 <- spTransform(mpsz_2017_sp,
                          CRS("+init=epsg:3414"))
mpsz_2020_3414 <- spTransform(mpsz_2020_sp,
                          CRS("+init=epsg:3414"))


#Conversion to spatstat's owin object format
mpsz_2017_sp_owin <- as(mpsz_2017_3414, "owin")
mpsz_2020_sp_owin <- as(mpsz_2020_3414, "owin")

childcare_2017_mpsz_ppp = childcare_2017_ppp_jit[mpsz_2017_sp_owin]
plot(childcare_2017_mpsz_ppp, main = "Singapore Childcare Services 2017")

childcare_2020_mpsz_ppp = childcare_2020_ppp_jit[mpsz_2020_sp_owin]
plot(childcare_2020_mpsz_ppp, main = "Singapore Childcare Services 2020")

Geocommunication

Describe the results of 3.1 and 3.2 and draw statistical conclusions.

Taken together, the results in 3.1 and 3.2 tell us that, as the number of TODDLER/PRESCHOOL residents continues to increase, the supply of childcare services also continues to increase to keep up with demand in Singapore.

The increase in the number of childcare services does not appear to depend only on the most affected planning areas, as childcare services have increased almost equally across the entire island.

Quadrat Analysis

A test of Complete Spatial Randomness for a given point pattern

The test hypotheses at the 95% confidence level are:

H0 = Childcare services are randomly distributed.

H1 = Childcare services are not randomly distributed.

Results of the Chi-squared test: the p-value for childcare services in both 2017 and 2020 is less than 2.2e-16, which is below the significance level of 0.05. Hence, we reject H0 for both years and conclude that childcare services are not randomly distributed.

Results of the Monte Carlo test: the p-value for childcare services in both 2017 and 2020 is 0.002, which is still below the significance level of 0.05. Hence, we also reject H0 for both years and conclude that childcare services are not randomly distributed.

qt_2017_alt <- quadrat.test(childcare_2017_mpsz_ppp, 
                   nx = 20, ny = 15,alternative=c("two.sided"))
qt_2017_alt

## 
##  Chi-squared test of CSR using quadrat counts
## 
## data:  childcare_2017_mpsz_ppp
## X2 = 2216.8, df = 192, p-value < 2.2e-16
## alternative hypothesis: two.sided
## 
## Quadrats: 193 tiles (irregular windows)

qt_2020_alt <- quadrat.test(childcare_2020_mpsz_ppp, 
                   nx = 20, ny = 15,alternative=c("two.sided"))
qt_2020_alt

## 
##  Chi-squared test of CSR using quadrat counts
## 
## data:  childcare_2020_mpsz_ppp
## X2 = 2627.2, df = 192, p-value < 2.2e-16
## alternative hypothesis: two.sided
## 
## Quadrats: 193 tiles (irregular windows)

qt_2017_M_alt <- quadrat.test(childcare_2017_mpsz_ppp, 
             nx = 20, ny = 15,
             method="M",
             nsim=999,
             alternative=c("two.sided"))
qt_2017_M_alt

## 
##  Conditional Monte Carlo test of CSR using quadrat counts
##  Test statistic: Pearson X2 statistic
## 
## data:  childcare_2017_mpsz_ppp
## X2 = 2216.8, p-value = 0.002
## alternative hypothesis: two.sided
## 
## Quadrats: 193 tiles (irregular windows)

qt_2020_M_alt <- quadrat.test(childcare_2020_mpsz_ppp, 
             nx = 20, ny = 15,
             method="M",
             nsim=999,
             alternative=c("two.sided"))
qt_2020_M_alt

## 
##  Conditional Monte Carlo test of CSR using quadrat counts
##  Test statistic: Pearson X2 statistic
## 
## data:  childcare_2020_mpsz_ppp
## X2 = 2627.2, p-value = 0.002
## alternative hypothesis: two.sided
## 
## Quadrats: 193 tiles (irregular windows)
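The X2 statistic reported above is the Pearson chi-squared statistic on quadrat counts. As a minimal sketch on a simulated pattern in a unit square (toy data, not the childcare points), it can be reproduced by hand and compared with `quadrat.test()`:

```r
library(spatstat)
set.seed(1)

# Simulated uniform pattern: 200 points in the unit square
X <- runifpoint(200, win = owin(c(0, 1), c(0, 1)))

# Observed counts in a 4 x 4 grid of equal-area quadrats
counts <- quadratcount(X, nx = 4, ny = 4)

# Under CSR every equal-area quadrat expects the same number of points
expected  <- npoints(X) / length(counts)
x2_manual <- sum((as.vector(counts) - expected)^2 / expected)

qt <- quadrat.test(X, nx = 4, ny = 4)
c(manual = x2_manual, package = unname(qt$statistic))
```

In an irregular window such as the Singapore outline, the tiles have unequal areas, so the expected counts differ per tile; the equal-count shortcut here only holds for the rectangular toy window.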

Nearest Neighbour Analysis

A test of aggregation for a spatial point pattern

The test hypotheses at the 95% confidence level are:

H0 = Childcare services are randomly distributed. H1 = Childcare services are not randomly distributed.

Results of the Clark-Evans test: the p-value for childcare services in both 2017 and 2020 is 0.02, which is less than the significance level of 0.05. Hence, we reject H0 for both years and conclude that childcare services are not randomly distributed. Because the Clark-Evans index R is below 1 in both years (0.55 in 2017 and 0.54 in 2020), the point pattern exhibits clustering.

ce_2017_alt <- clarkevans.test(childcare_2017_mpsz_ppp,
                correction="none",
                clipregion=mpsz_2017_sp_owin,
                alternative=c("two.sided"),
                nsim=99)

ce_2017_alt

## 
##  Clark-Evans test
##  No edge correction
##  Monte Carlo test based on 99 simulations of CSR with fixed n
## 
## data:  childcare_2017_mpsz_ppp
## R = 0.55149, p-value = 0.02
## alternative hypothesis: two-sided

ce_2020_alt <- clarkevans.test(childcare_2020_mpsz_ppp,
                correction="none",
                clipregion=mpsz_2020_sp_owin,
                alternative=c("two.sided"),
                nsim=99)

ce_2020_alt

## 
##  Clark-Evans test
##  No edge correction
##  Monte Carlo test based on 99 simulations of CSR with fixed n
## 
## data:  childcare_2020_mpsz_ppp
## R = 0.53865, p-value = 0.02
## alternative hypothesis: two-sided
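The index R in the output is the ratio of the observed mean nearest-neighbour distance to the value expected under CSR, 1 / (2 * sqrt(lambda)), where lambda is the point intensity. A minimal sketch on a simulated toy pattern (not the childcare data) reproduces the uncorrected index by hand:

```r
library(spatstat)
set.seed(1)

# Simulated uniform pattern in a 10 x 10 window
X <- runifpoint(100, win = owin(c(0, 10), c(0, 10)))

lambda <- npoints(X) / area(Window(X))  # points per unit area
d_obs  <- mean(nndist(X))               # observed mean nearest-neighbour distance
d_exp  <- 1 / (2 * sqrt(lambda))        # expected value under CSR

R_manual <- d_obs / d_exp
R_pkg    <- clarkevans(X, correction = "none")
c(manual = R_manual, package = unname(R_pkg))
```

R below 1 points towards clustering and R above 1 towards regular spacing; without edge correction the index tends to be biased upwards, which is worth keeping in mind when reading the results above.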

Section B: Spatial Point Pattern Analysis

Exploratory Spatial Data Analysis

Using point mapping techniques, display the location of childcare services in 2017 and 2020 at the national level. Describe the spatial patterns revealed by their respective distributions.

One clear observation is that few childcare services in 2020 opened in subzones that had no childcare services in 2017. The number of childcare services, and hence the supply, has clearly increased. Duplicated points have been handled by jittering, so the darker patches of points in the 2020 plot indicate many more overlapping points from different childcare services. This pattern is consistent across the clusters found in Singapore in 2017.

Take the 2017 cluster around Tampines East as an example, which has the highest number of childcare services in its planning area. In 2017, childcare services were less evenly distributed across the surrounding subzones. In 2020, they are more evenly distributed both at the centre of the cluster and in its surroundings. On top of that, the number of childcare services has increased.

The same can be said for clusters in other planning areas: Yishun, with Yishun East at the centre of the cluster; Woodlands, with Woodlands East; Punggol, with Waterway East; and so on.

#Remove subzones with 0 TODDLERS/PRESCHOOLERS
mpsz_resident_2017r <- mpsz
mpsz_resident_2017r@data <- inner_join(mpsz_resident_2017r@data, residents_2017, 
                              by = c("SUBZONE_N" = "subzone"))
mpsz_resident_2017r = mpsz_resident_2017r[mpsz_resident_2017r@data$`TODDLER/PRESCHOOL` != 0,]

mpsz_resident_2020r <- mpsz
mpsz_resident_2020r@data <- inner_join(mpsz_resident_2020r@data, residents_2020, 
                              by = c("SUBZONE_N" = "subzone"))
mpsz_resident_2020r = mpsz_resident_2020r[mpsz_resident_2020r@data$`TODDLER/PRESCHOOL` != 0,]

#Reading as SpatialPoints & SpatialPolygons
childcare_2017_sp <- as(childcare_2017, "SpatialPoints")
childcare_2020_sp <- as(childcare_2020, "SpatialPoints")
mpsz_2017_sp <- as(mpsz_resident_2017r, "SpatialPolygons")
mpsz_2020_sp <- as(mpsz_resident_2020r, "SpatialPolygons")

#Reassigning coordinate system and reprojecting geospatial data as EPSG:3414
mpsz_2017_3414 <- spTransform(mpsz_2017_sp,
                          CRS("+init=epsg:3414"))
mpsz_2020_3414 <- spTransform(mpsz_2020_sp,
                          CRS("+init=epsg:3414"))
childcare_2017_3414 <- spTransform(childcare_2017_sp, 
                               CRS("+init=epsg:3414"))
childcare_2020_3414 <- spTransform(childcare_2020_sp, 
                               CRS("+init=epsg:3414"))

#Conversion to spatstat's ppp object format 
childcare_2017_ppp <- as(childcare_2017_3414, "ppp")
any(duplicated(childcare_2017_ppp))

## [1] TRUE

childcare_2020_ppp <- as(childcare_2020_3414, "ppp")
any(duplicated(childcare_2020_ppp))

## [1] TRUE

#Perform Jittering to handle duplicated points
childcare_2017_ppp_jit <- rjitter(childcare_2017_ppp, retry=TRUE, nsim=1, drop=TRUE)
any(duplicated(childcare_2017_ppp_jit))

## [1] FALSE

childcare_2020_ppp_jit <- rjitter(childcare_2020_ppp, retry=TRUE, nsim=1, drop=TRUE)
any(duplicated(childcare_2020_ppp_jit))

## [1] FALSE

#Conversion to spatstat's owin object format
mpsz_2017_sp_owin <- as(mpsz_2017_3414, "owin")
mpsz_2020_sp_owin <- as(mpsz_2020_3414, "owin")


childcare_2017_mpsz_ppp = childcare_2017_ppp_jit[mpsz_2017_sp_owin]
plot(childcare_2017_mpsz_ppp, main = "Childcare Services in Singapore 2017")

childcare_2020_mpsz_ppp = childcare_2020_ppp_jit[mpsz_2020_sp_owin]
plot(childcare_2020_mpsz_ppp, main = "Childcare Services in Singapore 2020")

With reference to the spatial point patterns observed in 4.1

Formulate the null hypothesis and alternative hypothesis and select the confidence level.

The test hypotheses at the 95% confidence level are:

H0 = Childcare services in the planning area are randomly distributed.

H1 = Childcare services in the planning area are not randomly distributed.

The null hypothesis will be rejected if the p-value is smaller than the alpha value of 0.05.

Perform the test by using appropriate 2nd order spatial point patterns analysis technique.

For the distribution of childcare services in Sengkang in both 2017 and 2020, the observed L value is greater than the corresponding theoretical L value and lies above the upper envelope, so spatial clustering at those distances is statistically significant. We therefore reject the null hypothesis H0: the childcare services in Sengkang are not randomly distributed in either 2017 or 2020.

L_estimate_se_2017.csr <- envelope(childcare_2017_se_ppp, Lest, nsim = 40, nrank = 1)

## Generating 40 simulations of CSR  ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
## 39,  40.
## 
## Done.

plot(L_estimate_se_2017.csr, main = "Sengkang 2017" , . -r ~ r, ylab="L(d)-r",xlab= "d")

L_estimate_se_2020.csr <- envelope(childcare_2020_se_ppp, Lest, nsim = 40, nrank = 1)

## Generating 40 simulations of CSR  ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
## 39,  40.
## 
## Done.

plot(L_estimate_se_2020.csr, main = "Sengkang 2020" , . -r ~ r, ylab="L(d)-r",xlab= "d")
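Two details help when reading these plots: the L function is a rescaling of Ripley's K, L(r) = sqrt(K(r) / pi), and a pointwise envelope built from `nsim` simulations at rank `nrank` corresponds to a two-sided significance level of 2 * nrank / (nsim + 1) at each fixed distance. A minimal sketch on a simulated toy pattern (not the childcare data) checks both:

```r
library(spatstat)
set.seed(1)

# Simulated uniform pattern in the unit square
X <- runifpoint(80, win = owin(c(0, 1), c(0, 1)))

# L is a variance-stabilising transform of K
K <- Kest(X, correction = "isotropic")
L <- Lest(X, correction = "isotropic")
max_diff <- max(abs(L$iso - sqrt(K$iso / pi)))

# Pointwise significance level implied by the envelope settings
nsim  <- 40
nrank <- 1
alpha_pointwise <- 2 * nrank / (nsim + 1)
alpha_pointwise

# Pointwise envelope of L under CSR; plotted as L(r) - r against r
env <- envelope(X, Lest, nsim = nsim, nrank = nrank, verbose = FALSE)
plot(env, . - r ~ r, main = "Toy pattern", ylab = "L(r) - r", xlab = "r")
```

With 40 simulations the pointwise level is a little under 0.05, but because it applies to one distance at a time, a curve that leaves the envelope at some distance is weaker evidence than a single overall test.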

For the distribution of childcare services in Bedok in 2017, the observed L value is greater than the corresponding theoretical L value; it moves from below to above the upper envelope but stays mostly above it, so spatial clustering is statistically significant at most distances. We therefore reject the null hypothesis H0: the childcare services in Bedok are not randomly distributed in 2017.

For the distribution of childcare services in Bedok in 2020, the observed L value is greater than the corresponding theoretical L value but mostly below the upper envelope, so spatial clustering is not statistically significant at most distances. We therefore fail to reject the null hypothesis H0: there is insufficient evidence that the childcare services in Bedok depart from a random distribution in 2020.

L_estimate_be_2017.csr <- envelope(childcare_2017_be_ppp, Lest, nsim = 40, nrank = 1)

## Generating 40 simulations of CSR  ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
## 39,  40.
## 
## Done.

plot(L_estimate_be_2017.csr, main = "Bedok 2017" , . -r ~ r, ylab="L(d)-r",xlab= "d")

L_estimate_be_2020.csr <- envelope(childcare_2020_be_ppp, Lest, nsim = 40, nrank = 1)

## Generating 40 simulations of CSR  ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
## 39,  40.
## 
## Done.

plot(L_estimate_be_2020.csr, main = "Bedok 2020" , . -r ~ r, ylab="L(d)-r",xlab= "d")

For the distribution of childcare services in Bukit Batok in 2017, the observed L value is greater than the corresponding theoretical L value and mostly above its upper envelope, and spatial clustering at those distances is not statistically significant. Hence, we fail to reject the null hypothesis H0 and treat the childcare services in Bukit Batok as randomly distributed in 2017.

For the distribution of childcare services in Bukit Batok in 2020, the observed L value is greater than the corresponding theoretical L value and mostly under its upper envelope, and spatial clustering at those distances is not statistically significant. Hence, we reject the null hypothesis H0: the childcare services in Bukit Batok are not randomly distributed in 2020.

L_estimate_bb_2017.csr <- envelope(childcare_2017_bb_ppp, Lest, nsim = 40, nrank = 1)

## Generating 40 simulations of CSR  ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
## 39,  40.
## 
## Done.

plot(L_estimate_bb_2017.csr, main = "Bukit Batok 2017" , . -r ~ r, ylab="L(d)-r",xlab= "d")

L_estimate_bb_2020.csr <- envelope(childcare_2020_bb_ppp, Lest, nsim = 40, nrank = 1)

## Generating 40 simulations of CSR  ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
## 39,  40.
## 
## Done.

plot(L_estimate_bb_2020.csr, main = "Bukit Batok 2020" , . -r ~ r, ylab="L(d)-r",xlab= "d")

For the distribution of childcare services in Hougang in both 2017 and 2020, the observed L value is greater than the corresponding theoretical L value and mostly above its upper envelope, so spatial clustering at those distances is statistically significant. We therefore reject the null hypothesis H0: the childcare services in Hougang are not randomly distributed in either 2017 or 2020.

L_estimate_hg_2017.csr <- envelope(childcare_2017_hg_ppp, Lest, nsim = 40, nrank = 1)

## Generating 40 simulations of CSR  ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
## 39,  40.
## 
## Done.

plot(L_estimate_hg_2017.csr, main = "Hougang 2017" , . -r ~ r, ylab="L(d)-r",xlab= "d")

L_estimate_hg_2020.csr <- envelope(childcare_2020_hg_ppp, Lest, nsim = 40, nrank = 1)

## Generating 40 simulations of CSR  ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
## 39,  40.
## 
## Done.

plot(L_estimate_hg_2020.csr, main = "Hougang 2020" , . -r ~ r, ylab="L(d)-r",xlab= "d")

With reference to the analysis results, draw statistical conclusions

From the hypothesis tests, we can conclude that the siting of childcare services is, in most cases, not random. This could be due to the increase in demand for childcare services, as the number of TODDLER/PRESCHOOL residents increased from 2017 to 2020.

With reference to the results derived in 4.1 and 4.2:

Derive kernel density maps of childcare services in 2017 and 2020.

Of the automatic bandwidth selection methods, bw.diggle() is used because it is suited to detecting a single tight cluster in the midst of random noise.

childcare_2017_mpsz_ppp.km <- rescale(childcare_2017_mpsz_ppp, 1000, "km")
childcare_2020_mpsz_ppp.km <- rescale(childcare_2020_mpsz_ppp, 1000, "km")
kde_childcare_2017 <- density(childcare_2017_mpsz_ppp.km , sigma=bw.diggle, edge=TRUE, kernel="gaussian") 
plot(kde_childcare_2017, main = "2017 Kernel Density Estimation")

kde_childcare_2020 <- density(childcare_2020_mpsz_ppp.km ,  sigma=bw.diggle, edge=TRUE, kernel="gaussian") 
plot(kde_childcare_2020, main = "2020 Kernel Density Estimation")
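Rescaling from metres to kilometres before estimating the density means the intensity is expressed in points per square kilometre, which is easier to read than points per square metre. A minimal sketch on a simulated toy pattern (window size and point count are assumptions) shows the rescaling and the automatic bandwidth together:

```r
library(spatstat)
set.seed(1)

# Simulated pattern in a 5 km x 4 km window, coordinates in metres
X_m <- runifpoint(150, win = owin(c(0, 5000), c(0, 4000)))

# Divide coordinates by 1000 and relabel the unit as km
X_km <- rescale(X_m, 1000, "km")

# Cross-validated bandwidth (in km), then a Gaussian kernel estimate
sigma_km <- bw.diggle(X_km)
kde <- density(X_km, sigma = sigma_km, edge = TRUE, kernel = "gaussian")

as.numeric(sigma_km)
plot(kde, main = "Toy KDE (points per square km)")
```

A density of 1 point per square kilometre equals 1e-6 points per square metre, which is why estimates computed on metre coordinates show very small values in the legend.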

Using appropriate tmap functions, display the kernel density maps on openstreetmap of Singapore.

kde_childcare_2017 <- density(childcare_2017_mpsz_ppp, sigma=bw.diggle, edge=TRUE, kernel="gaussian") 
kde_childcare_2020 <- density(childcare_2020_mpsz_ppp,  sigma=bw.diggle, edge=TRUE, kernel="gaussian") 

#Convert Kernel Density Estimation Object into Gridded Object
gridded_kde_childcare_2017 <- as.SpatialGridDataFrame.im(kde_childcare_2017)
gridded_kde_childcare_2020 <- as.SpatialGridDataFrame.im(kde_childcare_2020)

#Convert Gridded Object into Raster
kde_childcare_2017_raster <- raster(gridded_kde_childcare_2017)
kde_childcare_2020_raster <- raster(gridded_kde_childcare_2020)

#Reassign projection system
projection(kde_childcare_2017_raster) <- CRS("+init=EPSG:3414")
projection(kde_childcare_2020_raster) <- CRS("+init=EPSG:3414")

#Display with tmap
kde2017 <- tm_shape(kde_childcare_2017_raster) + 
  tm_raster(n=10, palette ="Oranges",
            title="2017: v", alpha =0.7, breaks = c(0,0.000005,0.00001,0.000015,0.00002,0.000025)) + 
  tm_basemap(server = "http://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png") +
  tm_view(set.view = c(lon = 103.814564, lat = 1.361085, zoom = 10.5)) +
  tm_legend(legend.outside=TRUE)

kde2020 <- tm_shape(kde_childcare_2020_raster) + 
  tm_raster("v",n=10, palette ="Blues",
            title="2020: v", alpha =0.7, breaks = c(0,0.000005,0.00001,0.000015,0.00002,0.000025)) +
  tm_basemap(server = "http://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png") + 
  tm_view(set.view = c(lon = 103.814564, lat = 1.361085, zoom = 10.5)) +
  tm_legend(legend.outside=TRUE)

tmap_mode("view")
tmap_arrange(kde2017, kde2020, asp=1, ncol=2, sync = TRUE)

With reference to the analysis results, draw statistical conclusions.

From the kernel density maps alone, there is no large change across the island as a whole, but clusters of childcare services present in 2017 in locations such as Sengkang and Woodlands show some shift in intensity. Comparing the L-function output for the 4 planning areas across the 2 time periods, 6 of the 8 cases indicate that childcare services are not set up randomly and are placed in areas with a higher concentration of TODDLER/PRESCHOOL residents.

Compare the advantages of kernel density maps in 4.3.2 over point maps in 4.1.

Kernel density estimation computes the intensity of a point distribution and gives a simple way to relate the different childcare service locations to one another. Each location spreads its influence over a neighbourhood whose size is set by the bandwidth, and the contributions of nearby locations overlap and add up to give the intensity level shown on the kernel density map.
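The idea can be illustrated with a minimal base-R sketch. This is not the `density()` call from spatstat used above: the point coordinates, the grid and the fixed bandwidth `sigma` are hypothetical values chosen for illustration, and no edge correction or automatic bandwidth selection (such as `bw.diggle`) is applied.

```r
# Minimal base-R sketch of a Gaussian kernel intensity estimate.
# All names and values are illustrative, not taken from the analysis.

# Isotropic 2-D Gaussian kernel with standard deviation sigma
gauss2d <- function(dx, dy, sigma) {
  exp(-(dx^2 + dy^2) / (2 * sigma^2)) / (2 * pi * sigma^2)
}

# Intensity at each grid node = sum of the kernels centred on every point
kde_grid <- function(px, py, gx, gy, sigma) {
  outer(gx, gy, Vectorize(function(x, y) sum(gauss2d(x - px, y - py, sigma))))
}

# Three hypothetical locations (metres): two close together, one isolated
px <- c(1000, 1100, 3000)
py <- c(1000, 1050, 3000)

step  <- 50                          # grid spacing in metres
gx    <- seq(-1000, 5000, by = step)
gy    <- seq(-1000, 5000, by = step)
sigma <- 300                         # fixed bandwidth in metres (assumption)

dens <- kde_grid(px, py, gx, gy, sigma)

# Intensity at the first point of the close pair and at the isolated point
near_pair <- dens[which(gx == 1000), which(gy == 1000)]
isolated  <- dens[which(gx == 3000), which(gy == 3000)]

# Summing intensity times cell area approximately recovers the point count
total <- sum(dens) * step^2

print(c(near_pair = near_pair, isolated = isolated, total = total))
```

The overlapping kernels of the two neighbouring points raise the intensity there above that of the isolated point, which is the effect the kernel density map visualises.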

Point maps, by contrast, only show a general overview of how clustered the childcare service locations are. With point maps we can give only a rough gauge, for example, “Oh I see the North of Singapore has a cluster and maybe the East has a more intense cluster”.

With kernel density maps we can give a better gauge and a more precise statement, for example, “Oh I see the North of Singapore has a cluster of childcare services all in one place, but it has a density value of only 0.000005, lower than the cluster in the East”. The map provides a quantifiable estimate, so analysts can be sure which cluster is more intense.
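To make values such as 0.000005 easier to read, the legend breaks can be converted to a more familiar unit. The sketch below assumes the point pattern coordinates are in metres (the unit of EPSG:3414) and were not rescaled, so each density value is a number of childcare centres per square metre; the variable names are illustrative.

```r
# Legend breaks used in the tm_raster() calls for the kernel density maps
breaks_per_m2 <- c(0, 0.000005, 0.00001, 0.000015, 0.00002, 0.000025)

# Assumption: coordinates are in metres, so values are centres per square metre.
# 1 km^2 = 1,000,000 m^2
breaks_per_km2 <- breaks_per_m2 * 1e6

print(data.frame(per_m2 = breaks_per_m2, per_km2 = breaks_per_km2))
```

Under that assumption, each step of 0.000005 in the legend corresponds to 5 childcare centres per square kilometre.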

4.3.4.1 Visualizing all the maps together: kernel density map, point map of childcare services and map of the number of TODDLER/PRESCHOOL residents, on OpenStreetMap.

Notes:
- The kernel density estimate of the childcare service locations can be identified by the gridded cells on the map.
- Areas that are not gridded and are shaded in red represent the number of TODDLER/PRESCHOOL residents in the respective subzones.

compiled2017 <- tm_shape(kde_childcare_2017_raster) + 
  tm_raster("v",n=10, palette ="Blues",
            title="2017: v", alpha =0.7, breaks = c(0,0.000005,0.00001,0.000015,0.00002,0.000025)) +
  tm_basemap(server = "http://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png") + 
  tm_shape(childcare_2017_3414) +
  tm_dots(size=0.005) +
  tm_view(set.view = c(lon = 103.814564, lat = 1.361085, zoom = 10.5)) +
  tm_legend(legend.outside=TRUE) +
  tm_shape(mpsz_resident_2017) +
  tm_fill("TODDLER/PRESCHOOL", 
          style = "jenks", 
          palette = "Reds",
          alpha = 0.3)  

compiled2020 <- tm_shape(kde_childcare_2020_raster) + 
  tm_raster("v",n=10, palette ="Blues",
            title="2020: v", alpha =0.7, breaks = c(0,0.000005,0.00001,0.000015,0.00002,0.000025)) +
  tm_basemap(server = "http://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png") + 
  tm_shape(childcare_2020_3414) +
  tm_dots(size=0.005) +
  tm_view(set.view = c(lon = 103.814564, lat = 1.361085, zoom = 10.5)) +
  tm_legend(legend.title.size = 0.2,
          legend.text.size = 0.2,legend.outside=TRUE) + 
  tm_shape(mpsz_resident_2020) +
  tm_fill("TODDLER/PRESCHOOL", 
          style = "jenks", 
          palette = "Reds",
          alpha = 0.3) 
tmap_mode("view")
tmap_arrange(compiled2017, compiled2020, asp=1, ncol=2, sync = TRUE)

Extra insights

Queen contiguity-based neighbours are used instead of rook contiguity because of the complex polygon shapes at the subzone level, where a subzone can be closer to a neighbour that touches it diagonally (at a corner) than to one that only shares a horizontal or vertical edge. There is no change from 2017 to 2020: in both years the average number of links is approximately 6, as expected since both layers share the same subzone boundaries.
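The difference between the two definitions is easiest to see on a regular grid. The sketch below uses `cell2nb()` and `card()` from spdep on a hypothetical 3 x 3 grid rather than the subzone polygons; the object names are illustrative.

```r
library(spdep)

# Hypothetical 3 x 3 regular grid of cells (not the subzone data)
nb_rook  <- cell2nb(3, 3, type = "rook")   # neighbours share an edge
nb_queen <- cell2nb(3, 3, type = "queen")  # neighbours share an edge or a corner

# card() returns the number of neighbours of each cell
rook_links  <- card(nb_rook)
queen_links <- card(nb_queen)

# Average number of links per cell under each definition
print(c(rook = mean(rook_links), queen = mean(queen_links)))
```

On this grid the centre cell has 4 rook neighbours but 8 queen neighbours, because queen contiguity also counts the cells that touch only at a corner.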

mpsz_resident_2017_q <- poly2nb(mpsz_resident_2017, queen=TRUE)
summary(mpsz_resident_2017_q)

## Neighbour list object:
## Number of regions: 323 
## Number of nonzero links: 1934 
## Percentage nonzero weights: 1.853751 
## Average number of links: 5.987616 
## 5 regions with no links:
## 16 17 18 294 301
## Link number distribution:
## 
##  0  1  2  3  4  5  6  7  8  9 10 11 12 14 17 
##  5  2  6 10 26 77 87 51 34 16  3  3  1  1  1 
## 2 least connected regions:
## 15 233 with 1 link
## 1 most connected region:
## 312 with 17 links
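The figures in this summary are related by simple arithmetic, which can be checked as a sanity test. The sketch below only re-uses numbers printed in the output above; the variable names are illustrative.

```r
# Values copied from the summary() output above
n_regions <- 323
n_links   <- 1934

# Link number distribution: number of links and how many regions have that many
links_per_region <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 17)
region_count     <- c(5, 2, 6, 10, 26, 77, 87, 51, 34, 16, 3, 3, 1, 1, 1)

# The distribution should account for every region and every link
total_regions <- sum(region_count)
total_links   <- sum(links_per_region * region_count)

# Average number of links per region
avg_links <- n_links / n_regions

# Share of non-zero entries in the 323 x 323 neighbour matrix, in percent
pct_nonzero <- 100 * n_links / n_regions^2

print(c(total_regions = total_regions, total_links = total_links,
        avg_links = avg_links, pct_nonzero = pct_nonzero))
```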

plot(mpsz_resident_2017, main = "Residents 2017", border="lightgrey")
plot(mpsz_resident_2017_q, coordinates(mpsz_resident_2017), pch = 19, cex = 0.6, add = TRUE, col= "red")

mpsz_resident_2020_q <- poly2nb(mpsz_resident_2020, queen=TRUE)
summary(mpsz_resident_2020_q)

## Neighbour list object:
## Number of regions: 323 
## Number of nonzero links: 1934 
## Percentage nonzero weights: 1.853751 
## Average number of links: 5.987616 
## 5 regions with no links:
## 16 17 18 294 301
## Link number distribution:
## 
##  0  1  2  3  4  5  6  7  8  9 10 11 12 14 17 
##  5  2  6 10 26 77 87 51 34 16  3  3  1  1  1 
## 2 least connected regions:
## 15 233 with 1 link
## 1 most connected region:
## 312 with 17 links

plot(mpsz_resident_2020, main = "Residents 2020", border="lightgrey")
plot(mpsz_resident_2020_q, coordinates(mpsz_resident_2020), pch = 19, cex = 0.6, add = TRUE, col= "red")

IS415-Geospatial Analytics and Applications: Take Home Ex1 (Alt)
