In this hands-on exercise, you will gain hands-on experience on how to model geographical accessibility by using R’s geospatial analysis packages.
By the end of this hands-on exercise, you will be able:
Two data sets will be used in this hands-on exercise, they are:
Warning: These three aspatial data are specially collected by Prof. Kam for teaching and research purpose. Students taking IS415 Geospatial Analytics and Applications are allowed to use them for hands-on exercise purpose. Please obtain formal approval from Prof. Kam if you want to use them for other courses or usage.
Before we getting started, it is important for us to install the necessary R packages and launch them into RStudio environment.
The R packages need for this exercise are as follows:
The code chunk below installs and launches these R packages into RStudio environment.
packages = c('tmap', 'SpatialAcc',
'sf', 'ggstatsplot',
'tidyverse')
for(p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p, character.only = T)
}
Notice that with tidyverse, we do not have to install readr, dplyr and ggplots packages separately. In fact, tidyverse also installs other R packages such as tidyr, stringr, forcats, tibble, purrr and magrittr.
The geospatial data used in this hands-on exercise is called MP14_SUBZONE_WEB_PL. It is in ESRI shapefile format. The shapefile consists of URA Master Plan 2014’s planning subzone boundaries. Polygon features are used to represent these geographic boundaries. The GIS data is in svy21 projected coordinates systems.
The code chunk below is used to import MP_SUBZONE_WEB_PL shapefile by using st_read() of sf packages.
mpsz = st_read(dsn = "data/geospatial", layer = "MP14_SUBZONE_WEB_PL")
## Reading layer `MP14_SUBZONE_WEB_PL' from data source `D:\IS415-AY2020-21T1\03-Hands-on Exercises\Hands-on_Ex10\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 323 features and 15 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS: SVY21
The report above shows that the R object used to contain the imported MP14_SUBZONE_WEB_PL shapefile is called mpsz and it is a simple feature object. The geometry type is multipolygon. it is also important to note that mpsz simple feature object does not have EPSG information.
The code chunk below updates the newly imported mpsz with the correct ESPG code (i.e. 3414)
mpsz_svy21 <- st_transform(mpsz, 3414)
After transforming the projection metadata, you can varify the projection of the newly transformed mpsz_svy21 by using st_crs() of sf package.
The code chunk below will be used to varify the newly transformed mpsz_svy21.
st_crs(mpsz_svy21)
## Coordinate Reference System:
## User input: EPSG:3414
## wkt:
## PROJCRS["SVY21 / Singapore TM",
## BASEGEOGCRS["SVY21",
## DATUM["SVY21",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4757]],
## CONVERSION["Singapore Transverse Mercator",
## METHOD["Transverse Mercator",
## ID["EPSG",9807]],
## PARAMETER["Latitude of natural origin",1.36666666666667,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8801]],
## PARAMETER["Longitude of natural origin",103.833333333333,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8802]],
## PARAMETER["Scale factor at natural origin",1,
## SCALEUNIT["unity",1],
## ID["EPSG",8805]],
## PARAMETER["False easting",28001.642,
## LENGTHUNIT["metre",1],
## ID["EPSG",8806]],
## PARAMETER["False northing",38744.572,
## LENGTHUNIT["metre",1],
## ID["EPSG",8807]]],
## CS[Cartesian,2],
## AXIS["northing (N)",north,
## ORDER[1],
## LENGTHUNIT["metre",1]],
## AXIS["easting (E)",east,
## ORDER[2],
## LENGTHUNIT["metre",1]],
## USAGE[
## SCOPE["unknown"],
## AREA["Singapore"],
## BBOX[1.13,103.59,1.47,104.07]],
## ID["EPSG",3414]]
Notice that the EPSG: is indicated as 3414 now.
Next, you will reveal the extent of mpsz_svy21 by using st_bbox() of sf package.
st_bbox(mpsz_svy21) #view extent
## xmin ymin xmax ymax
## 2667.538 15748.721 56396.440 50256.334
The code chunk below performs the following tasks:
hdb <- read_csv("data/aspatial/hdb_data.csv") %>%
mutate(Total = rowSums(.[2:11]))
Next, the code chunk below will be used to convert the data type of Postcode field from numeric to character.
hdb$Postcode <- as.character(hdb$Postcode)
Lastly, st_as_sf() of sf package is used to convert the asptial data.frame into simple point feature data.frame by using the values from X and Y field. Note that the coordinates are read as longitude (i.e. X) and latitude (i.e. Y). The crs argument is used to define the projection system, in this case EPSG: 3414 which is svy21 projected coordinates system of Singapore.
hdb_sf <- st_as_sf(hdb,
coords = c('X', 'Y'),
crs = "+init=epsg:3414")
We can display and print out the summary statistics of fields in hdb_sf point feature data.frame by using the code chunk below.
summary(hdb_sf)
## Postcode 1-room 2-room 3-room
## Length:9978 Min. : 0.000 Min. : 0.000 Min. : 0.00
## Class :character 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.00
## Mode :character Median : 0.000 Median : 0.000 Median : 0.00
## Mean : 3.016 Mean : 4.918 Mean : 24.09
## 3rd Qu.: 0.000 3rd Qu.: 0.000 3rd Qu.: 24.00
## Max. :520.000 Max. :452.000 Max. :528.00
## 4-room 5-room Executive HUDC
## Min. : 0.00 Min. : 0.00 Min. : 0.000 Min. :0.00000
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.000 1st Qu.:0.00000
## Median : 34.00 Median : 1.00 Median : 0.000 Median :0.00000
## Mean : 41.34 Mean : 23.99 Mean : 6.487 Mean :0.01012
## 3rd Qu.: 65.00 3rd Qu.: 45.00 3rd Qu.: 0.000 3rd Qu.:0.00000
## Max. :316.00 Max. :164.00 Max. :135.000 Max. :1.00000
## Multi-generation Studio Apartment Type S1 Type S2
## Min. : 0.00000 Min. : 0.0000 Min. : 0.0000 Min. : 0.00000
## 1st Qu.: 0.00000 1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.: 0.00000
## Median : 0.00000 Median : 0.0000 Median : 0.0000 Median : 0.00000
## Mean : 0.03678 Mean : 0.9015 Mean : 0.1235 Mean : 0.06174
## 3rd Qu.: 0.00000 3rd Qu.: 0.0000 3rd Qu.: 0.0000 3rd Qu.: 0.00000
## Max. :66.00000 Max. :217.0000 Max. :176.0000 Max. :88.00000
## Latitude Longitude Total geometry
## Min. :1.270 Min. :103.7 Min. : 1.0 POINT :9978
## 1st Qu.:1.337 1st Qu.:103.8 1st Qu.: 76.0 epsg:NA : 0
## Median :1.363 Median :103.8 Median : 98.0 +proj=tmer...: 0
## Mean :1.365 Mean :103.8 Mean :104.9
## 3rd Qu.:1.391 3rd Qu.:103.9 3rd Qu.:128.0
## Max. :1.457 Max. :104.0 Max. :584.0
Repeat the steps you had learned in previous section to import hospital data.
hospital <- read_csv("data/aspatial/hospital.csv")
hospital$Postcode <- as.character(hospital$Postcode)
hospital_sf <- st_as_sf(
hospital,
coords = c('X', 'Y'),
crs = "+init=epsg:3414"
)
summary(hospital_sf)
## Postcode Latitude Longitude No Beds
## Length:26 Min. :1.280 Min. :103.7 Min. : 34.0
## Class :character 1st Qu.:1.311 1st Qu.:103.8 1st Qu.: 241.5
## Mode :character Median :1.326 Median :103.8 Median : 375.0
## Mean :1.339 Mean :103.8 Mean : 610.9
## 3rd Qu.:1.366 3rd Qu.:103.9 3rd Qu.: 957.5
## Max. :1.424 Max. :103.9 Max. :2000.0
## geometry
## POINT :26
## epsg:NA : 0
## +proj=tmer...: 0
##
##
##
The code chunk below uses read_cvs() of readr package to import distance_matrix.csv into RStudio. The imported object is a tibble data.frame called dismat.
dismat <- read_csv("data/aspatial/distance_matrix.csv", skip = 0)
Next, the code chunk below is used to convert the postal_hospital field from numeric to character data type.
dismat$postal_hospital <- as.character(dismat$postal_hospital)
Next, the spread() of tidyr package is used to transform the O-D matrix from a thin format into a fat format.
distmat_fat <- dismat %>%
select(postal_hospital, postal_hdb,
Distance) %>%
spread(postal_hospital, Distance)%>%
select(c(-c('postal_hdb')))
Note: Since tidyr version 1.0 a new function called pivot_wider() is introduce. You should use pivot_wider() instead of spread()
The code chunk below is used to add postal_hdb, lat_hdb and long_hdb fields into the fat distance matrix and the output data.frame is called dismat_temp.
dismat_temp <- dismat %>%
inner_join(hdb, by=c("postal_hdb"="Postcode"))%>%
inner_join(hospital, by=c("postal_hospital"="Postcode"))%>%
select(postal_hospital,postal_hdb, lat_hdb, long_hdb, Distance)%>%
spread(postal_hospital,Distance)
The final distance matrix is derived by dropping the postal_hdb, lat_hdb, and long_hdb fields.
dismat_matrix <- dismat_temp %>%
select (-c('postal_hdb',
'lat_hdb','long_hdb'))%>%
data.matrix(rownames.force = NA)
The code chunk below is used to convert the values in the distance matrix from metres to kilometres.
distmat_km<-as.matrix(distmat_fat/1000)
In this section, the ac() of SpatialAcc package will be used to compute a collection of geographical accessibility measure by using Hansen and SAM algorithm.
The code chunk below is used to compute Hansen’s accessibility measure (refer to argument family). It performs the following tasks:
acc_Hansen <- data.frame(ac(hdb$Total,
hospital$`No Beds`,
distmat_km,
d0 = 30000,
power = 2,
family = "Hansen"))
colnames(acc_Hansen) <- "accHansen"
acc_Hansen <- tbl_df(acc_Hansen)
hdb_Hansen <- bind_cols(hdb_sf, acc_Hansen)
The code chunk below is used to compute SAM accessibility measure (refer to argument family). It performs the following tasks:
acc_SAM <- data.frame(ac(hdb$Total,
hospital$`No Beds`,
distmat_km,
d0 = 30000,
power = 2,
family = "SAM"))
colnames(acc_SAM) <- "accSAM"
acc_SAM <- tbl_df(acc_SAM)
hdb_SAM <- bind_cols(hdb_sf, acc_SAM)
sum(is.na(hdb_SAM))
## [1] 0
tmap_mode("view")
tm_shape(hospital_sf) +
tm_symbols(size = 0.1) +
tm_shape(hdb_Hansen) +
tm_bubbles(col = "accHansen",
n = 5,
style = "quantile",
size = 0.001,
border.col = "black",
border.lwd = 1)
In this section, we are going to compare the distribution of Hansen’s accessibility values by URA Planning Region.
Firstly, we need to add the planning region field into hdb_Hansen point feature data.frame by using the code chunk below.
hdb_Hansen <- st_transform(hdb_Hansen, 3414)
hdb_Hansen <- st_join(hdb_Hansen, mpsz_svy21, join = st_intersects)
Next, ggplot() will be used to plot the distribution by using boxplot graphical method.
ggplot(data=hdb_Hansen,
aes(y = accHansen,
x= REGION_N)) +
geom_boxplot() +
geom_point(stat="summary",
fun.y="mean",
colour ="red",
size=4)
To confirm the observation, comfirmatory data analysis can be performed by using ggbetweenstats() of ggstatsplot package of R.
hdb_Hansen$log_accHansen <- log(hdb_Hansen$accHansen)
ggbetweenstats(data = hdb_Hansen,
x = REGION_N,
y = log_accHansen,
pairwise.comparisons = TRUE,
p.adjust.method = "fdr",
title = "Hansen's accessibility values by HDB location and by Region",
caption = "Hansen's values"
)
tmap_mode("view")
tm_shape(hospital_sf) +
tm_symbols(size = 0.1) +
tm_shape(hdb_SAM) +
tm_bubbles(col = "accSAM",
n = 5,
style = "quantile",
size = 0.001,
border.col = "black",
border.lwd = 1)