1 Introduction

In this hands-on exercise, you will gain hands-on experience on how to model geographical accessibility by using R’s geospatial analysis packages.

1.1 Learning Outcome

By the end of this hands-on exercise, you will be able:

  • to import GIS polygon data into R and save them as simple feature data.frame by using appropriate functions of sf package of R;
  • to import aspatial data into R and save them as simple feature data.frame by using appropriate functions of sf package of R;
  • to computer accessibility measure by using Hansen’s potential model and Spatial Accessibility Measure (SAM); and
  • to visualise the accessibility measures by using tmap and ggplot2 packages.

1.2 The data

Two data sets will be used in this hands-on exercise, they are:

  • URA Master Plan subzone boundary in ESRI shapefile format.
  • HDB location data (i.e. hdb_data.csv) in csv file format.
  • Hospital location data (i.e. hospital) in csv format.
  • Distance matrix (i.e. distance_matrix) in csv format. This file consists of shortest travel distance (distance) and travel time (i.e. time) from a hdb location to a hospital location.

Warning: These three aspatial data are specially collected by Prof. Kam for teaching and research purpose. Students taking IS415 Geospatial Analytics and Applications are allowed to use them for hands-on exercise purpose. Please obtain formal approval from Prof. Kam if you want to use them for other courses or usage.

2 Getting Started

2.1 Installing and launching R packages

Before we getting started, it is important for us to install the necessary R packages and launch them into RStudio environment.

The R packages need for this exercise are as follows:

  • Spatial data handling
    • sf
  • Modelling geographical accessibility
    • spatialAcc
  • Attribute data handling
    • tidyverse, especially readr and dplyr
  • thematic mapping
    • tmap
  • Satistical graphic
    • ggplot2

The code chunk below installs and launches these R packages into RStudio environment.

packages = c('tmap', 'SpatialAcc',
             'sf', 'ggstatsplot',
             'tidyverse')

for(p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

Notice that with tidyverse, we do not have to install readr, dplyr and ggplots packages separately. In fact, tidyverse also installs other R packages such as tidyr, stringr, forcats, tibble, purrr and magrittr.

3 Geospatial Data Wrangling

3.1 Importing geospatial data

The geospatial data used in this hands-on exercise is called MP14_SUBZONE_WEB_PL. It is in ESRI shapefile format. The shapefile consists of URA Master Plan 2014’s planning subzone boundaries. Polygon features are used to represent these geographic boundaries. The GIS data is in svy21 projected coordinates systems.

The code chunk below is used to import MP_SUBZONE_WEB_PL shapefile by using st_read() of sf packages.

mpsz = st_read(dsn = "data/geospatial", layer = "MP14_SUBZONE_WEB_PL")
## Reading layer `MP14_SUBZONE_WEB_PL' from data source `D:\IS415-AY2020-21T1\03-Hands-on Exercises\Hands-on_Ex10\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 323 features and 15 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS:  SVY21

The report above shows that the R object used to contain the imported MP14_SUBZONE_WEB_PL shapefile is called mpsz and it is a simple feature object. The geometry type is multipolygon. it is also important to note that mpsz simple feature object does not have EPSG information.

3.2 Updating CRS information

The code chunk below updates the newly imported mpsz with the correct ESPG code (i.e. 3414)

mpsz_svy21 <- st_transform(mpsz, 3414)

After transforming the projection metadata, you can varify the projection of the newly transformed mpsz_svy21 by using st_crs() of sf package.

The code chunk below will be used to varify the newly transformed mpsz_svy21.

st_crs(mpsz_svy21)
## Coordinate Reference System:
##   User input: EPSG:3414 
##   wkt:
## PROJCRS["SVY21 / Singapore TM",
##     BASEGEOGCRS["SVY21",
##         DATUM["SVY21",
##             ELLIPSOID["WGS 84",6378137,298.257223563,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4757]],
##     CONVERSION["Singapore Transverse Mercator",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",1.36666666666667,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",103.833333333333,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",1,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",28001.642,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",38744.572,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["northing (N)",north,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["easting (E)",east,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     USAGE[
##         SCOPE["unknown"],
##         AREA["Singapore"],
##         BBOX[1.13,103.59,1.47,104.07]],
##     ID["EPSG",3414]]

Notice that the EPSG: is indicated as 3414 now.

Next, you will reveal the extent of mpsz_svy21 by using st_bbox() of sf package.

st_bbox(mpsz_svy21) #view extent
##      xmin      ymin      xmax      ymax 
##  2667.538 15748.721 56396.440 50256.334

4 Apsaital Data Handling and Wrangling

4.1 Importing HDB Data

The code chunk below performs the following tasks:

  • importing hdb_data.csv using read_csv function of readr package.
  • using the mutate() and rowSums() functions of dplyr to derive a new variable called Total by summing columns 2 to 11 of the importing hdb_data file.
  • calling the output tibbled data.frame hdb.
hdb <- read_csv("data/aspatial/hdb_data.csv") %>%
mutate(Total = rowSums(.[2:11]))

Next, the code chunk below will be used to convert the data type of Postcode field from numeric to character.

hdb$Postcode <- as.character(hdb$Postcode)

Lastly, st_as_sf() of sf package is used to convert the asptial data.frame into simple point feature data.frame by using the values from X and Y field. Note that the coordinates are read as longitude (i.e. X) and latitude (i.e. Y). The crs argument is used to define the projection system, in this case EPSG: 3414 which is svy21 projected coordinates system of Singapore.

hdb_sf <- st_as_sf(hdb, 
  coords = c('X', 'Y'),
  crs = "+init=epsg:3414")

We can display and print out the summary statistics of fields in hdb_sf point feature data.frame by using the code chunk below.

summary(hdb_sf)
##    Postcode             1-room            2-room            3-room      
##  Length:9978        Min.   :  0.000   Min.   :  0.000   Min.   :  0.00  
##  Class :character   1st Qu.:  0.000   1st Qu.:  0.000   1st Qu.:  0.00  
##  Mode  :character   Median :  0.000   Median :  0.000   Median :  0.00  
##                     Mean   :  3.016   Mean   :  4.918   Mean   : 24.09  
##                     3rd Qu.:  0.000   3rd Qu.:  0.000   3rd Qu.: 24.00  
##                     Max.   :520.000   Max.   :452.000   Max.   :528.00  
##      4-room           5-room         Executive            HUDC        
##  Min.   :  0.00   Min.   :  0.00   Min.   :  0.000   Min.   :0.00000  
##  1st Qu.:  0.00   1st Qu.:  0.00   1st Qu.:  0.000   1st Qu.:0.00000  
##  Median : 34.00   Median :  1.00   Median :  0.000   Median :0.00000  
##  Mean   : 41.34   Mean   : 23.99   Mean   :  6.487   Mean   :0.01012  
##  3rd Qu.: 65.00   3rd Qu.: 45.00   3rd Qu.:  0.000   3rd Qu.:0.00000  
##  Max.   :316.00   Max.   :164.00   Max.   :135.000   Max.   :1.00000  
##  Multi-generation   Studio Apartment      Type S1            Type S2        
##  Min.   : 0.00000   Min.   :  0.0000   Min.   :  0.0000   Min.   : 0.00000  
##  1st Qu.: 0.00000   1st Qu.:  0.0000   1st Qu.:  0.0000   1st Qu.: 0.00000  
##  Median : 0.00000   Median :  0.0000   Median :  0.0000   Median : 0.00000  
##  Mean   : 0.03678   Mean   :  0.9015   Mean   :  0.1235   Mean   : 0.06174  
##  3rd Qu.: 0.00000   3rd Qu.:  0.0000   3rd Qu.:  0.0000   3rd Qu.: 0.00000  
##  Max.   :66.00000   Max.   :217.0000   Max.   :176.0000   Max.   :88.00000  
##     Latitude       Longitude         Total                geometry   
##  Min.   :1.270   Min.   :103.7   Min.   :  1.0   POINT        :9978  
##  1st Qu.:1.337   1st Qu.:103.8   1st Qu.: 76.0   epsg:NA      :   0  
##  Median :1.363   Median :103.8   Median : 98.0   +proj=tmer...:   0  
##  Mean   :1.365   Mean   :103.8   Mean   :104.9                       
##  3rd Qu.:1.391   3rd Qu.:103.9   3rd Qu.:128.0                       
##  Max.   :1.457   Max.   :104.0   Max.   :584.0

4.2 Importing Hospital Data

Repeat the steps you had learned in previous section to import hospital data.

hospital <- read_csv("data/aspatial/hospital.csv")
hospital$Postcode <- as.character(hospital$Postcode)
hospital_sf <- st_as_sf(
  hospital, 
  coords = c('X', 'Y'),
  crs = "+init=epsg:3414"
)
summary(hospital_sf)
##    Postcode            Latitude       Longitude        No Beds      
##  Length:26          Min.   :1.280   Min.   :103.7   Min.   :  34.0  
##  Class :character   1st Qu.:1.311   1st Qu.:103.8   1st Qu.: 241.5  
##  Mode  :character   Median :1.326   Median :103.8   Median : 375.0  
##                     Mean   :1.339   Mean   :103.8   Mean   : 610.9  
##                     3rd Qu.:1.366   3rd Qu.:103.9   3rd Qu.: 957.5  
##                     Max.   :1.424   Max.   :103.9   Max.   :2000.0  
##           geometry 
##  POINT        :26  
##  epsg:NA      : 0  
##  +proj=tmer...: 0  
##                    
##                    
## 

4.3 Importing Distance Matrix

The code chunk below uses read_cvs() of readr package to import distance_matrix.csv into RStudio. The imported object is a tibble data.frame called dismat.

dismat <- read_csv("data/aspatial/distance_matrix.csv", skip = 0)

4.3.1 Converting to character data type.

Next, the code chunk below is used to convert the postal_hospital field from numeric to character data type.

dismat$postal_hospital <- as.character(dismat$postal_hospital)

Next, the spread() of tidyr package is used to transform the O-D matrix from a thin format into a fat format.

distmat_fat <- dismat %>%
  select(postal_hospital, postal_hdb,
         Distance) %>%
  spread(postal_hospital, Distance)%>%
  select(c(-c('postal_hdb')))

Note: Since tidyr version 1.0 a new function called pivot_wider() is introduce. You should use pivot_wider() instead of spread()

The code chunk below is used to add postal_hdb, lat_hdb and long_hdb fields into the fat distance matrix and the output data.frame is called dismat_temp.

dismat_temp <- dismat %>%
  inner_join(hdb, by=c("postal_hdb"="Postcode"))%>%
  inner_join(hospital, by=c("postal_hospital"="Postcode"))%>%
  select(postal_hospital,postal_hdb, lat_hdb, long_hdb, Distance)%>%
  spread(postal_hospital,Distance)

The final distance matrix is derived by dropping the postal_hdb, lat_hdb, and long_hdb fields.

dismat_matrix <- dismat_temp %>%
  select (-c('postal_hdb',
             'lat_hdb','long_hdb'))%>%
  data.matrix(rownames.force = NA)

The code chunk below is used to convert the values in the distance matrix from metres to kilometres.

distmat_km<-as.matrix(distmat_fat/1000)

5 Computing Geographical Accessibility Measures

In this section, the ac() of SpatialAcc package will be used to compute a collection of geographical accessibility measure by using Hansen and SAM algorithm.

5.1 Modelling geographical accessibility using Hansen algorithm

The code chunk below is used to compute Hansen’s accessibility measure (refer to argument family). It performs the following tasks:

  • ac() is used to compute Hansen’s accessibility values. The output object is then convert into a tibble data.frame called acc_Hansen.
  • Perform left-join with hdb_sf simple feature data.frame and called the output data.frame hdb_Hansen.
acc_Hansen <- data.frame(ac(hdb$Total,
                  hospital$`No Beds`,
                  distmat_km, 
                  d0 = 30000,
                  power = 2, 
                  family = "Hansen"))
colnames(acc_Hansen) <- "accHansen"
acc_Hansen <- tbl_df(acc_Hansen)
hdb_Hansen <- bind_cols(hdb_sf, acc_Hansen)

5.2 Modelling accessibility using SAM algorithm

The code chunk below is used to compute SAM accessibility measure (refer to argument family). It performs the following tasks:

  • ac() is used to compute Hansen’s accessibility values. The output object is then convert into a tibble data.frame called acc_SAM.
  • Perform left-join with hdb_sf simple feature data.frame and caled the output data.frame hdb_SAM.
acc_SAM <- data.frame(ac(hdb$Total,
                  hospital$`No Beds`,
                  distmat_km, 
                  d0 = 30000,
                  power = 2, 
                  family = "SAM"))
colnames(acc_SAM) <- "accSAM"
acc_SAM <- tbl_df(acc_SAM)
hdb_SAM <- bind_cols(hdb_sf, acc_SAM)
sum(is.na(hdb_SAM))
## [1] 0

6 Visualising Geograpgical Accessibility Measures

6.1 Mapping Hansen accessibility

tmap_mode("view")
tm_shape(hospital_sf) +
  tm_symbols(size = 0.1) +
tm_shape(hdb_Hansen) + 
  tm_bubbles(col = "accHansen",
             n = 5,
             style = "quantile",
             size = 0.001,
             border.col = "black",
             border.lwd = 1)

6.2 Statistical graphic visualisation

In this section, we are going to compare the distribution of Hansen’s accessibility values by URA Planning Region.

Firstly, we need to add the planning region field into hdb_Hansen point feature data.frame by using the code chunk below.

hdb_Hansen <- st_transform(hdb_Hansen, 3414)
hdb_Hansen <- st_join(hdb_Hansen, mpsz_svy21, join = st_intersects)

Next, ggplot() will be used to plot the distribution by using boxplot graphical method.

ggplot(data=hdb_Hansen, 
       aes(y = accHansen, 
           x= REGION_N)) +
  geom_boxplot() +
  geom_point(stat="summary", 
             fun.y="mean", 
             colour ="red", 
             size=4)

To confirm the observation, comfirmatory data analysis can be performed by using ggbetweenstats() of ggstatsplot package of R.

hdb_Hansen$log_accHansen <- log(hdb_Hansen$accHansen)
ggbetweenstats(data = hdb_Hansen,
               x = REGION_N,
               y = log_accHansen,
               pairwise.comparisons = TRUE,
               p.adjust.method = "fdr",
  title = "Hansen's accessibility values by HDB location and by Region",
  caption = "Hansen's values"
)

6.3 Mapping SAM acccessibility

tmap_mode("view")
tm_shape(hospital_sf) +
  tm_symbols(size = 0.1) +
tm_shape(hdb_SAM) + 
  tm_bubbles(col = "accSAM",
             n = 5,
             style = "quantile",
             size = 0.001,
             border.col = "black",
             border.lwd = 1)