Rpubs link: https://rpubs.com/alicesin/TakeHomeEx01
Section A
Part 1: Exploratory Data Analysis
1.1 Load essential packages
packages = c('rgdal', 'sf', 'maptools', 'raster','spatstat', 'tmap', 'tidyverse')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}## Loading required package: rgdal
## Loading required package: sp
## rgdal: version: 1.5-16, (SVN revision 1050)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 3.0.4, released 2020/01/28
## Path to GDAL shared files: C:/Users/ahhli/OneDrive/Documents/R/win-library/4.0/rgdal/gdal
## GDAL binary built with GEOS: TRUE
## Loaded PROJ runtime: Rel. 6.3.1, February 10th, 2020, [PJ_VERSION: 631]
## Path to PROJ shared files: C:/Users/ahhli/OneDrive/Documents/R/win-library/4.0/rgdal/proj
## Linking to sp version:1.4-2
## To mute warnings of possible GDAL/OSR exportToProj4() degradation,
## use options("rgdal_show_exportToProj4_warnings"="none") before loading rgdal.
## Loading required package: sf
## Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
## Loading required package: maptools
## Checking rgeos availability: TRUE
## Loading required package: raster
## Loading required package: spatstat
## Loading required package: spatstat.data
## Loading required package: nlme
##
## Attaching package: 'nlme'
## The following object is masked from 'package:raster':
##
## getData
## Loading required package: rpart
##
## spatstat 1.64-1 (nickname: 'Help you I can, yes!')
## For an introduction to spatstat, type 'beginner'
##
## Note: spatstat version 1.64-1 is out of date by more than 4 months; we recommend upgrading to the latest version.
##
## Attaching package: 'spatstat'
## The following objects are masked from 'package:raster':
##
## area, rotate, shift
## Loading required package: tmap
## Loading required package: tidyverse
## Registered S3 method overwritten by 'cli':
## method from
## print.boxx spatstat
## -- Attaching packages ---------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2 v purrr 0.3.4
## v tibble 3.0.3 v dplyr 1.0.2
## v tidyr 1.1.1 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.5.0
## -- Conflicts ------------------------------------------- tidyverse_conflicts() --
## x dplyr::collapse() masks nlme::collapse()
## x tidyr::extract() masks raster::extract()
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## x dplyr::select() masks raster::select()
1.2 Loading of datasets
1.2.1 Childcare Services dataset
Load childcare as spatial point data frame for both 2017 and 2020 The latest childcare services data for year 2020 was retrieved from data.gov.sg. The dataset was last updated on 29th of August 2020.
## Reading layer `CHILDCARE' from data source `C:\Users\ahhli\OneDrive\Desktop\Geospatial\Take-home-Ex01\IS415_Take-home-Ex01\data' using driver `ESRI Shapefile'
## Simple feature collection with 1312 features and 18 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 11203.01 ymin: 25667.6 xmax: 45404.24 ymax: 49300.88
## projected CRS: SVY21
## Reading layer `CHILDCARE' from data source `C:\Users\ahhli\OneDrive\Desktop\Geospatial\Take-home-Ex01\IS415_Take-home-Ex01\data\child-care-services-kml.kml' using driver `KML'
## Simple feature collection with 1545 features and 2 fields
## geometry type: POINT
## dimension: XYZ
## bbox: xmin: 103.6824 ymin: 1.248403 xmax: 103.9897 ymax: 1.462134
## z_range: zmin: 0 zmax: 0
## geographic CRS: WGS 84
1.2.2 Subzone level Spatial point data
## OGR data source with driver: ESRI Shapefile
## Source: "C:\Users\ahhli\OneDrive\Desktop\Geospatial\Take-home-Ex01\IS415_Take-home-Ex01\data", layer: "MP14_SUBZONE_WEB_PL"
## with 323 features
## It has 15 fields
## Reading layer `MP14_SUBZONE_WEB_PL' from data source `C:\Users\ahhli\OneDrive\Desktop\Geospatial\Take-home-Ex01\IS415_Take-home-Ex01\data' using driver `ESRI Shapefile'
## Simple feature collection with 323 features and 15 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS: SVY21
1.2.3 Singpaore Resident Population data
Load the dataset for Singapore Residents by Planning AreaSubzone, Age Group, Sex and Type of Dwelling, June 2011-2019. This data set was taken from Singapore Department of Statistics.
## Parsed with column specification:
## cols(
## PA = col_character(),
## SZ = col_character(),
## AG = col_character(),
## Sex = col_character(),
## TOD = col_character(),
## Pop = col_double(),
## Time = col_double()
## )
1.2.3.1 Assumptions
According to Early Childhood Development Agency SG, the age group for childcare services vary from above 18months to below 7 years old.
For the purpose of this study, we will be making a few assumptions as follows: 1. The age group that we will be focusing on includes 1 year old children. 2. The population of children is uniformly distributed across all age groups in this particular dataset. 3. As the dataset only contains data up to 2019, we will assume that the data for 2020 is represented by the 2019 data.
1.2.3.1 Data pre-processing
We shall create new dataframes that will represent population data for 2017 and 2020 respectively. - In this process, we will also extract the data that belongs to the age group of interest, i.e. data categorised under age group 1 to 6 (exclusive). - We need to remember to switch all subzone names in this dataset to capital letters for the convenience of matching of columns in the later part of the study. In short, these two dataframes represent the population of children in age group 1 to 6 for each year respectively.
popdata2017 <- popdata %>%
filter(Time == 2017) %>%
group_by(PA, SZ, AG) %>%
summarise('POP' = sum(Pop)) %>%
ungroup() %>%
spread(AG, POP) %>%
mutate(Children = rowSums(.[3]) +rowSums(.[12])) %>%
select('PA', 'SZ','Children')%>%
mutate(Children = Children / 10 * 6) %>%
mutate_at(.vars = "SZ", .funs=toupper) %>%
select(PA, SZ, Children)## `summarise()` regrouping output by 'PA', 'SZ' (override with `.groups` argument)
# popdata2017
popdata2020 <- popdata %>%
filter(Time == 2019) %>%
group_by(PA, SZ, AG) %>%
summarise('POP' = sum(Pop)) %>%
ungroup() %>%
spread(AG, POP) %>%
mutate(Children = rowSums(.[3]) +rowSums(.[12])) %>%
select('PA', 'SZ','Children') %>%
mutate(Children = Children / 10 * 6) %>%
mutate_at(.vars = "SZ", .funs=toupper) %>%
select(PA, SZ, Children)## `summarise()` regrouping output by 'PA', 'SZ' (override with `.groups` argument)
1.3 Checking of CRS
## Coordinate Reference System:
## User input: SVY21
## wkt:
## PROJCRS["SVY21",
## BASEGEOGCRS["SVY21[WGS84]",
## DATUM["World Geodetic System 1984",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]],
## ID["EPSG",6326]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["Degree",0.0174532925199433]]],
## CONVERSION["unnamed",
## METHOD["Transverse Mercator",
## ID["EPSG",9807]],
## PARAMETER["Latitude of natural origin",1.36666666666667,
## ANGLEUNIT["Degree",0.0174532925199433],
## ID["EPSG",8801]],
## PARAMETER["Longitude of natural origin",103.833333333333,
## ANGLEUNIT["Degree",0.0174532925199433],
## ID["EPSG",8802]],
## PARAMETER["Scale factor at natural origin",1,
## SCALEUNIT["unity",1],
## ID["EPSG",8805]],
## PARAMETER["False easting",28001.642,
## LENGTHUNIT["metre",1],
## ID["EPSG",8806]],
## PARAMETER["False northing",38744.572,
## LENGTHUNIT["metre",1],
## ID["EPSG",8807]]],
## CS[Cartesian,2],
## AXIS["(E)",east,
## ORDER[1],
## LENGTHUNIT["metre",1,
## ID["EPSG",9001]]],
## AXIS["(N)",north,
## ORDER[2],
## LENGTHUNIT["metre",1,
## ID["EPSG",9001]]]]
## Coordinate Reference System:
## User input: WGS 84
## wkt:
## GEOGCRS["WGS 84",
## DATUM["World Geodetic System 1984",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["geodetic latitude (Lat)",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["geodetic longitude (Lon)",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4326]]
## Coordinate Reference System:
## User input: SVY21
## wkt:
## PROJCRS["SVY21",
## BASEGEOGCRS["SVY21[WGS84]",
## DATUM["World Geodetic System 1984",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]],
## ID["EPSG",6326]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["Degree",0.0174532925199433]]],
## CONVERSION["unnamed",
## METHOD["Transverse Mercator",
## ID["EPSG",9807]],
## PARAMETER["Latitude of natural origin",1.36666666666667,
## ANGLEUNIT["Degree",0.0174532925199433],
## ID["EPSG",8801]],
## PARAMETER["Longitude of natural origin",103.833333333333,
## ANGLEUNIT["Degree",0.0174532925199433],
## ID["EPSG",8802]],
## PARAMETER["Scale factor at natural origin",1,
## SCALEUNIT["unity",1],
## ID["EPSG",8805]],
## PARAMETER["False easting",28001.642,
## LENGTHUNIT["metre",1],
## ID["EPSG",8806]],
## PARAMETER["False northing",38744.572,
## LENGTHUNIT["metre",1],
## ID["EPSG",8807]]],
## CS[Cartesian,2],
## AXIS["(E)",east,
## ORDER[1],
## LENGTHUNIT["metre",1,
## ID["EPSG",9001]]],
## AXIS["(N)",north,
## ORDER[2],
## LENGTHUNIT["metre",1,
## ID["EPSG",9001]]]]
1.4 Converting of CRS
We would need to convert the CRS of different datasets to follow the projected coordinate system of Singapore, which is EPSG 3414.
1.4.1 Childcare Services Dataset
## Coordinate Reference System:
## User input: EPSG:3414
## wkt:
## PROJCRS["SVY21 / Singapore TM",
## BASEGEOGCRS["SVY21",
## DATUM["SVY21",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4757]],
## CONVERSION["Singapore Transverse Mercator",
## METHOD["Transverse Mercator",
## ID["EPSG",9807]],
## PARAMETER["Latitude of natural origin",1.36666666666667,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8801]],
## PARAMETER["Longitude of natural origin",103.833333333333,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8802]],
## PARAMETER["Scale factor at natural origin",1,
## SCALEUNIT["unity",1],
## ID["EPSG",8805]],
## PARAMETER["False easting",28001.642,
## LENGTHUNIT["metre",1],
## ID["EPSG",8806]],
## PARAMETER["False northing",38744.572,
## LENGTHUNIT["metre",1],
## ID["EPSG",8807]]],
## CS[Cartesian,2],
## AXIS["northing (N)",north,
## ORDER[1],
## LENGTHUNIT["metre",1]],
## AXIS["easting (E)",east,
## ORDER[2],
## LENGTHUNIT["metre",1]],
## USAGE[
## SCOPE["unknown"],
## AREA["Singapore"],
## BBOX[1.13,103.59,1.47,104.07]],
## ID["EPSG",3414]]
## Coordinate Reference System:
## User input: EPSG:3414
## wkt:
## PROJCRS["SVY21 / Singapore TM",
## BASEGEOGCRS["SVY21",
## DATUM["SVY21",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4757]],
## CONVERSION["Singapore Transverse Mercator",
## METHOD["Transverse Mercator",
## ID["EPSG",9807]],
## PARAMETER["Latitude of natural origin",1.36666666666667,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8801]],
## PARAMETER["Longitude of natural origin",103.833333333333,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8802]],
## PARAMETER["Scale factor at natural origin",1,
## SCALEUNIT["unity",1],
## ID["EPSG",8805]],
## PARAMETER["False easting",28001.642,
## LENGTHUNIT["metre",1],
## ID["EPSG",8806]],
## PARAMETER["False northing",38744.572,
## LENGTHUNIT["metre",1],
## ID["EPSG",8807]]],
## CS[Cartesian,2],
## AXIS["northing (N)",north,
## ORDER[1],
## LENGTHUNIT["metre",1]],
## AXIS["easting (E)",east,
## ORDER[2],
## LENGTHUNIT["metre",1]],
## USAGE[
## SCOPE["unknown"],
## AREA["Singapore"],
## BBOX[1.13,103.59,1.47,104.07]],
## ID["EPSG",3414]]
1.4.1 Planning Subzone level Dataset
## Coordinate Reference System:
## User input: EPSG:3414
## wkt:
## PROJCRS["SVY21 / Singapore TM",
## BASEGEOGCRS["SVY21",
## DATUM["SVY21",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4757]],
## CONVERSION["Singapore Transverse Mercator",
## METHOD["Transverse Mercator",
## ID["EPSG",9807]],
## PARAMETER["Latitude of natural origin",1.36666666666667,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8801]],
## PARAMETER["Longitude of natural origin",103.833333333333,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8802]],
## PARAMETER["Scale factor at natural origin",1,
## SCALEUNIT["unity",1],
## ID["EPSG",8805]],
## PARAMETER["False easting",28001.642,
## LENGTHUNIT["metre",1],
## ID["EPSG",8806]],
## PARAMETER["False northing",38744.572,
## LENGTHUNIT["metre",1],
## ID["EPSG",8807]]],
## CS[Cartesian,2],
## AXIS["northing (N)",north,
## ORDER[1],
## LENGTHUNIT["metre",1]],
## AXIS["easting (E)",east,
## ORDER[2],
## LENGTHUNIT["metre",1]],
## USAGE[
## SCOPE["unknown"],
## AREA["Singapore"],
## BBOX[1.13,103.59,1.47,104.07]],
## ID["EPSG",3414]]
1.5 Preparing childcare service data for analysis
- Convert sf dataframe to spatial points dataframe
- Convert Spatial points dataframe to spatial points
- Convert spatial points to ppp format
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded datum Unknown based on WGS84 ellipsoid in CRS definition,
## but +towgs84= values preserved
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded datum Unknown based on WGS84 ellipsoid in CRS definition,
## but +towgs84= values preserved
childcare_2017_sp <- as(childcare_2017_sf, "SpatialPoints")
childcare_2020_sp <- as(childcare_2020_sf, "SpatialPoints")
childcare_2017_ppp <- as(childcare_2017_sp, "ppp")
childcare_2020_ppp <- as(childcare_2020_sp, "ppp")Next we check for duplicates in the ppp objects. Remove if there is.
if(any(duplicated(childcare_2017_ppp)) == TRUE){
childcare_2017_ppp <- rjitter(childcare_2017_ppp, retry=TRUE, nsim=1, drop=TRUE)
}
if(any(duplicated(childcare_2020_ppp)) == TRUE){
childcare_2020_ppp <- rjitter(childcare_2020_ppp, retry=TRUE, nsim=1, drop=TRUE)
}1.5 Planning Subzone level Dataset
## Simple feature collection with 0 features and 15 fields
## bbox: xmin: NA ymin: NA xmax: NA ymax: NA
## projected CRS: SVY21 / Singapore TM
## [1] OBJECTID SUBZONE_NO SUBZONE_N SUBZONE_C CA_IND PLN_AREA_N
## [7] PLN_AREA_C REGION_N REGION_C INC_CRC FMEL_UPD_D X_ADDR
## [13] Y_ADDR SHAPE_Leng SHAPE_Area geometry
## <0 rows> (or 0-length row.names)
## [1] 9
The results above shows that mpsz_sf dataset has 9 invalid values. We will therefore manage these values via st_make_valid() function. Afterwhich, we will check again for invalid values.
## [1] 0
0 invalid values in mpsz_sf dataset as shown above.
Part 2 : Analytical Mapping
2.1 Choropleth Plots for childcare service demand
Before plotting out the choropleth plot, let’s identify the breaks needed for the distribution of children aged 1 to 6. In order to find the most suitable breaks, we look at summary to get an idea of break values.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 0.0 246.0 722.9 978.0 6966.0
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 0.0 252.0 713.1 903.0 6708.0
The choropleth plot below illustrates the demand for childcare services at different planning subzone level in 2017.
choropleth_DD_2017 <- tm_shape(demand2017)+
tm_fill(col = "Children",
breaks = c("246", "722.9", "978", "6966"),
style="jenks",
palette = "Blues",
title = "Demand for Childcare Services in 2017") +
tm_layout(legend.position = c("right", "bottom")) +
tm_borders(alpha = 0.5) +
tmap_style("white")+
tm_shape(childcare_2017_sf)+
tm_symbols(col = "blue", scale = 0.35, alpha = 0.4)## tmap style set to "white"
## other available styles are: "gray", "natural", "cobalt", "col_blind", "albatross", "beaver", "bw", "classic", "watercolor"
As for the demand for childcare services in 2020, take a look at the plot below.
choropleth_DD_2020 <- tm_shape(demand2020)+
tm_fill(col = "Children",
breaks = c("252", "713.1", "903", "6708.0"),
style="jenks",
palette = "Blues",
title = "Demand for Childcare Services in 2020") +
tm_layout(legend.position = c("right", "bottom")) +
tm_borders(alpha = 0.5) +
tmap_style("white") +
tm_shape(childcare_2020_sf)+
tm_symbols(col = "blue", scale = 0.35, alpha = 0.4)## tmap style set to "white"
## other available styles are: "gray", "natural", "cobalt", "col_blind", "albatross", "beaver", "bw", "classic", "watercolor"
For the convenience of comparing these two demand plots, let’s look at them side by side.
Both the choropleth plots show that the childcare centres in Singapore are located within subzones with high population of children under age group 1 to 6 years old reside at.
Section B: Spatial Point Analysis
Part 4: Exploratory Data Analysis
For this section, we will be focusing more in depth on childcare services dataset. Before we delve into the analysis of the dataset, we need to do a number of checks to make sure that the dataset is ready for EDA.
4.1 Checking of duplicates
We have already done the checking of duplicates of the dataset in ppp format. The code chunk below was used earlier on.
4.2 Filter for relevant subzones
As there are subzones with zero population of children aged between 1 to 6 (inclusive), we should filter them so that we can focus on the relevant subzones for this part of the study.
demand2017 <- st_as_sf(demand2017)
demand2017_sp <- as_Spatial(demand2017[demand2017$Children > 0, ]) ## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded datum Unknown based on WGS84 ellipsoid in CRS definition,
## but +towgs84= values preserved
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded datum Unknown based on WGS84 ellipsoid in CRS definition,
## but +towgs84= values preserved
4.3 Preparation of Owin and Combination with childcare ppp points for each of the 4 subzones
For this study, we will narrow down to 4 subzones - SengKang, Bedok, Bukit Batok and Hougang.
4.3.1 Extract out demand for childcare at the 4 planning subzone areas for 2017 and 2020
# 2017
sengkang_2017 <- demand2017_sp[demand2017_sp@data$PLN_AREA_N == "SENGKANG",]
bedok_2017 <- demand2017_sp[demand2017_sp@data$PLN_AREA_N == "BEDOK",]
bukitbatok_2017 <- demand2017_sp[demand2017_sp@data$PLN_AREA_N == "BUKIT BATOK",]
hougang_2017 <- demand2017_sp[demand2017_sp@data$PLN_AREA_N == "HOUGANG",]
#2020
sengkang_2020 <- demand2020_sp[demand2020_sp@data$PLN_AREA_N == "SENGKANG",]
bedok_2020 <- demand2020_sp[demand2020_sp@data$PLN_AREA_N == "BEDOK",]
bukitbatok_2020 <- demand2020_sp[demand2020_sp@data$PLN_AREA_N == "BUKIT BATOK",]
hougang_2020 <- demand2020_sp[demand2020_sp@data$PLN_AREA_N == "HOUGANG",]4.3.2. Convert the current spatial dataframe to spatial polygons
#2017
sengkang_sp_2017 <- as(sengkang_2017, "SpatialPolygons")
bedok_sp_2017 <- as(bedok_2017, "SpatialPolygons")
bukitbatok_sp_2017 <- as(bukitbatok_2017, "SpatialPolygons")
hougang_sp_2017 <- as(hougang_2017, "SpatialPolygons")
#2020
sengkang_sp_2020 <- as(sengkang_2020, "SpatialPolygons")
bedok_sp_2020 <- as(bedok_2020, "SpatialPolygons")
bukitbatok_sp_2020 <- as(bukitbatok_2020, "SpatialPolygons")
hougang_sp_2020 <- as(hougang_2020, "SpatialPolygons")4.3.3 Prepare owins for each subzone
#2017
sengkang_owin_2017 <- as(sengkang_sp_2017, "owin")
bedok_owin_2017 <- as(bedok_sp_2017, "owin")
bukitbatok_owin_2017 <- as(bukitbatok_sp_2017, "owin")
hougang_owin_2017 <- as(hougang_sp_2017, "owin")
#2020
sengkang_owin_2020 <- as(sengkang_sp_2020, "owin")
bedok_owin_2020 <- as(bedok_sp_2020, "owin")
bukitbatok_owin_2020 <- as(bukitbatok_sp_2020, "owin")
hougang_owin_2020 <- as(hougang_sp_2020, "owin")4.3.4 Combine owins together with childcare demand ppp format
#2017
sengkang_2017_ppp = childcare_2017_ppp[sengkang_owin_2017]
bedok_2017_ppp = childcare_2017_ppp[bedok_owin_2017]
bukitbatok_2017_ppp = childcare_2017_ppp[bukitbatok_owin_2017]
hougang_2017_ppp = childcare_2017_ppp[hougang_owin_2017]
#2020
sengkang_2020_ppp = childcare_2017_ppp[sengkang_owin_2020]
bedok_2020_ppp = childcare_2017_ppp[bedok_owin_2020]
bukitbatok_2020_ppp = childcare_2017_ppp[bukitbatok_owin_2020]
hougang_2020_ppp = childcare_2017_ppp[hougang_owin_2020]4.4 SPatial Point Analysis
Let’s compare the spatial point patterns among 4 subzones in year 2017 first.
par(mfrow=c(2,2))
plot(sengkang_2017_ppp)
plot(bedok_2017_ppp)
plot(bukitbatok_2017_ppp)
plot(hougang_2017_ppp)From the plots above, we can observe that each subzone has different pattern in the way the spatial points are clustered.
In year 2017, - Sengkang: The spatial points are relatively more spread out but the east side of Sengkang is slightly more crowded with childcare centres. - Bedok: It seems that there are less childcare centres near the top and bottom edges of this subzone. - Bukitbatok: It is very evident that Bukit batok’s west side is more crowded with childcare centres as compared to its east side. - Hougang: Hougang’s nothern areas are more heavily crowded as compared to its southern areas.
Let’s compare the spatial point patterns observed in year 2020.
par(mfrow=c(2,2))
plot(sengkang_2020_ppp)
plot(bedok_2020_ppp)
plot(bukitbatok_2020_ppp)
plot(hougang_2017_ppp)In year 2020, - Sengkang: The spatial points are more crowded at the west side of seng kang - Bedok: There are a lot more childcare centres at Bedok, except for the north and south edges of this particular subzone. - Bukitbatok: There are more childcare centres at the left side of bukit batok as compared to the rest of the area. - Hougang: The spatial points are less crowded at the south part of Hougang.
Part 5: 2nd Order Spatial Point patterns analysis with the use of hypothesis testing
In class, we learned various functions that can be used to analyse the 2nd order spatial point patterns, such as the G,F and L functions. We choose L function for this analysis because it is a Besag’s transformation of Ripley’s K function.
5.1 In the context of year 2017
H0: The distribution of childcare services at (Sengkang/Bedok/Bukit Batok/Hougang) in 2017 is randomly distributed. H1: The distribution of childcare services at (Sengkang/Bedok/Bukit Batok/Hougang) in 2017 is not randomly distributed. For this study, the confidence interval level of 95% and an a-value of 0.05 will be used.
5.1.1 Sengkang 2017
sengkang_2017_L = Lest(sengkang_2017_ppp, correction = "Ripley")
plot(sengkang_2017_L, . -r ~ r, ylab= "L(d)-r", xlab = "d(m)")## Generating 39 simulations of CSR ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39.
##
## Done.
Based on the model output plot above, the majority of the clustering of childcare services at sengkang in 2017 do not lie within the envelope and this suggests to us that the model output is statistically significant and we reject the null hypothesis.
5.1.2 Bedok 2017
bedok_2017_L = Lest(bedok_2017_ppp, correction = "Ripley")
plot(bedok_2017_L, . -r ~ r, ylab= "L(d)-r", xlab = "d(m)")## Generating 39 simulations of CSR ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39.
##
## Done.
Based on the model output plot above, the majority of the clustering of childcare services at bedok in 2017 lies within the envelope and this suggests to us that the model output is not statistically significant and we fail to reject the null hypothesis.
5.1.3 Bukit Batok 2017
bukitbatok_2017_L = Lest(bukitbatok_2017_ppp, correction = "Ripley")
plot(bukitbatok_2017_L, . -r ~ r, ylab= "L(d)-r", xlab = "d(m)")## Generating 39 simulations of CSR ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39.
##
## Done.
Based on the model output plot above, approximately half of the clustering of childcare services at bukit batok in 2017 do not lie within the envelope and this suggests to us that the model output is statistically significant and we reject the null hypothesis.
5.1.4 Hougang 2017
hougang_2017_L = Lest(hougang_2017_ppp, correction = "Ripley")
plot(hougang_2017_L, . -r ~ r, ylab= "L(d)-r", xlab = "d(m)")## Generating 39 simulations of CSR ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39.
##
## Done.
Based on the model output plot above, the majority of the clustering of childcare services at hougang in 2017 lies within the envelope and this suggests to us that the model output is not statistically significant and we fail to reject the null hypothesis.
5.2 In the context of year 2020
H0: The distribution of childcare services at (Sengkang/Bedok/Bukit Batok/Hougang) in 2020 is randomly distributed. H1: The distribution of childcare services at (Sengkang/Bedok/Bukit Batok/Hougang) in 2020 is not randomly distributed. For this study, the confidence interval level of 95% and an a-value of 0.05 will be used.
5.2.1 Sengkang 2020
sengkang_2020_L = Lest(sengkang_2020_ppp, correction = "Ripley")
plot(sengkang_2020_L, . -r ~ r, ylab= "L(d)-r", xlab = "d(m)")## Generating 39 simulations of CSR ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39.
##
## Done.
Based on the model output plot above, the majority of the clustering of childcare services at sengkang in 2020 do not lie within the envelope and this suggests to us that the model output is statistically significant and we reject the null hypothesis.
5.2.2 Bedok 2020
bedok_2020_L = Lest(bedok_2020_ppp, correction = "Ripley")
plot(bedok_2020_L, . -r ~ r, ylab= "L(d)-r", xlab = "d(m)")## Generating 39 simulations of CSR ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39.
##
## Done.
Based on the model output plot above, the majority of the clustering of childcare services at bedok in 2020 lies within the envelope and this suggests to us that the model output is not statistically significant and we fail to reject the null hypothesis.
5.2.3 Bukit Batok 2020
bukitbatok_2020_L = Lest(bukitbatok_2020_ppp, correction = "Ripley")
plot(bukitbatok_2020_L, . -r ~ r, ylab= "L(d)-r", xlab = "d(m)")## Generating 39 simulations of CSR ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39.
##
## Done.
Based on the model output plot above, approximately half of the clustering of childcare services at bukit batok in 2020 do not lie within the envelope and this suggests to us that the model output is statistically significant and we reject the null hypothesis.
5.2.4 Hougang 2020
hougang_2020_L = Lest(hougang_2020_ppp, correction = "Ripley")
plot(hougang_2020_L, . -r ~ r, ylab= "L(d)-r", xlab = "d(m)")## Generating 39 simulations of CSR ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39.
##
## Done.
Based on the model output plot above, the majority of the clustering of childcare services at hougang in 2020 lies within the envelope and this suggests to us that the model output is not statistically significant and we fail to reject the null hypothesis.
Part 6: Kernel Density Estimation
kde_childcare2017_bw <- density(childcare_2017_ppp, sigma=bw.diggle, edge=TRUE, kernel="gaussian")
kde_childcare2020_bw <- density(childcare_2020_ppp, sigma=bw.diggle, edge=TRUE, kernel="gaussian")
par(mfrow=c(1,2))
plot(kde_childcare2017_bw)
plot(kde_childcare2020_bw)# 2017
childcare_2017_ppp <- rescale(childcare_2017_ppp, 1000, "km")
kde_childcare2017_bw.km <- density(childcare_2017_ppp, sigma=bw.diggle, edge=TRUE, kernel="gaussian")
# 2020
childcare_2020_ppp <- rescale(childcare_2020_ppp, 1000, "km")
kde_childcare2020_bw.km <- density(childcare_2020_ppp, sigma=bw.diggle, edge=TRUE, kernel="gaussian")
# Combine two plots together
par(mfrow=c(1,2))
plot(kde_childcare2017_bw.km)
plot(kde_childcare2020_bw.km)For both 2017 and 2020 childcare data, we will visualise the data on tmap instead. The steps are as following: - First convert the previously achieved KDE into grid format instead. - Next we will have to convert the grid format into raster format - We also need to remember to project the raster format to appropriate projection system, which in this case is EPSG 3414 -
gridded_kde_childcare_2017_bw <-as.SpatialGridDataFrame.im(kde_childcare2017_bw.km)
spplot(gridded_kde_childcare_2017_bw)kde_childcare_2017_bw_raster <- raster(gridded_kde_childcare_2017_bw)
projection(kde_childcare_2017_bw_raster) <- CRS("+init=EPSG:3414")## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
## datum SVY21 in CRS definition
tm_shape(kde_childcare_2017_bw_raster) +
tm_raster("v") +
tm_layout(legend.position = c("right", "bottom"), frame = FALSE)## Variable(s) "v" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
For 2020 childcare data, we will output a tmap and visualise instead.
gridded_kde_childcare_2020_bw <-as.SpatialGridDataFrame.im(kde_childcare2020_bw.km)
kde_childcare_2020_bw_raster <- raster(gridded_kde_childcare_2020_bw)
projection(kde_childcare_2020_bw_raster) <- CRS("+init=EPSG:3414")## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
## datum SVY21 in CRS definition
tm_shape(kde_childcare_2020_bw_raster) +
tm_raster("v") +
tm_layout(legend.position = c("right", "bottom"), frame = FALSE)## Variable(s) "v" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
Advantages of using Kernel Density Map
- The kernel density map leverages on openstreetmap so as compared to spatial point map, it is easier for users to visualise the clusers and analyse them more accurately.
- Unlike spatial point map, colour gradients are used so visualisations are done at greater ease.