Modelling Geographical Accessibility

IS415 Geospatial Analytics and Application

Instructor: Dr. Kam Tin Seong.

Assoc. Professor of Information Systems (Practice)

1. Overview

We will be learning how to model geographical accessibility of hospitals from HDBs in Singapore by using R’s geospatial analysis packages.

We will compute the accessibility using Hansen’s potential model and Spatial Accessibility Measure (SAM) and visualize it using tmap and ggplot2 packages.

1.1 Dataset

  • Singapore 2014 Master Plan subzone boundary shapefile containing the boundary layer of planning areas and subzones in Singapore, you can find the data here

  • hdb_data.csv containing HDB location data, collected by Prof. Kam

  • hospital.csv containing hospital location data, collected by Prof. Kam

  • distance_matrix.csv containing shortest travel distance and travel time from a HDB location to a hospital location, collected by Prof. Kam

Warning: These three aspatial data are specially collected by Prof. Kam for teaching and research purpose. Students taking IS415 Geospatial Analytics and Applications are allowed to use them for hands-on exercise purpose. Please obtain formal approval from Prof. Kam if you want to use them for other courses or usage.

2. Load/Install necessary R packages

We will be using these R packages:

  • sf - for handling spatial data

  • tmap - for choropleth mapping

  • tidyverse - for tidying attribute/variable data (include tidyr, readr, ggplot2 & dplyr packages)

  • SpatialAcc - for modeling geographical accessibility, providing a set of spatial accessibility measures from a set of locations (demand) to another set of locations (supply), more info can be found here

  • ggstatsplot - for ggplot2 plots with statistical details, more info can be found here

packages = c('sf', 'tmap', 'tidyverse', 'SpatialAcc', 'ggstatsplot')
for (p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

3. Geospatial Data Wrangling

3.1 Importing geospatial data (shapefile) into R

We import the MP14_SUBZONE_WEB_PL shapefile using st_read() function of sf package. The shapefile consists of URA Master Plan 2014’s planning subzone boundaries and is in svy21 projected coordinate reference systems (CRS).

mpsz <- st_read(dsn = "data/geospatial", layer = "MP14_SUBZONE_WEB_PL")
## Reading layer `MP14_SUBZONE_WEB_PL' from data source `D:\SMU Year 3 Sem 1\IS415 Geospatial Analytics and Application\In_class_exercise\In_class_exercise10\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 323 features and 15 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS:  SVY21

mpsz is a simple feature data.frame and it’s geometry type is multipolygon. It is also important to note that mpsz sf object does not have EPSG information.

3.2 Transforming CRS

We will transform the mpsz object to a new dataframe with EPSG 3414 by using st_transform() function. After that, we can verify the CRS by using st_crs() function.

mpsz_svy21 <- st_transform(mpsz, 3414)
st_crs(mpsz_svy21)
## Coordinate Reference System:
##   User input: EPSG:3414 
##   wkt:
## PROJCRS["SVY21 / Singapore TM",
##     BASEGEOGCRS["SVY21",
##         DATUM["SVY21",
##             ELLIPSOID["WGS 84",6378137,298.257223563,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4757]],
##     CONVERSION["Singapore Transverse Mercator",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",1.36666666666667,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",103.833333333333,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",1,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",28001.642,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",38744.572,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["northing (N)",north,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["easting (E)",east,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     USAGE[
##         SCOPE["unknown"],
##         AREA["Singapore"],
##         BBOX[1.13,103.59,1.47,104.07]],
##     ID["EPSG",3414]]

We can also see the extent of mpsz_svy21 by using st_bbox() function of sf package.

st_bbox(mpsz_svy21)
##      xmin      ymin      xmax      ymax 
##  2667.538 15748.721 56396.440 50256.334

4. Aspatial Data Wrangling

4.1 Importing HDB data (csv)

We import the csv file using read_csv() function of readr package.

Then, we will derive a new variable called Total which is the sum of columns 2 to 11 (1-room to Type S1)

hdb <- read_csv("data/aspatial/hdb_data.csv") %>% mutate(Total = rowSums(.[2:11]))
## 
## -- Column specification --------------------
## cols(
##   Postcode = col_double(),
##   `1-room` = col_double(),
##   `2-room` = col_double(),
##   `3-room` = col_double(),
##   `4-room` = col_double(),
##   `5-room` = col_double(),
##   Executive = col_double(),
##   HUDC = col_double(),
##   `Multi-generation` = col_double(),
##   `Studio Apartment` = col_double(),
##   `Type S1` = col_double(),
##   `Type S2` = col_double(),
##   X = col_double(),
##   Y = col_double(),
##   Latitude = col_double(),
##   Longitude = col_double()
## )
hdb
## # A tibble: 9,978 x 17
##    Postcode `1-room` `2-room` `3-room` `4-room` `5-room` Executive  HUDC
##       <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>     <dbl> <dbl>
##  1    85101        0        0        0        0        0         0     0
##  2    85201        0        0        0        0        0         0     0
##  3    85301        0        0        0        0        0         0     0
##  4    85401        0        0        0        0        0         0     0
##  5    85501        0        0        0        0        0         0     0
##  6    85601        0        0        0        0        0         0     0
##  7    85701        0        0        0        0        0         0     0
##  8    50004        0        0      248        0        2         0     0
##  9    50005      430        0        0        0        0         0     0
## 10    50032        0       86       38        3        0         0     0
## # ... with 9,968 more rows, and 9 more variables: `Multi-generation` <dbl>,
## #   `Studio Apartment` <dbl>, `Type S1` <dbl>, `Type S2` <dbl>, X <dbl>,
## #   Y <dbl>, Latitude <dbl>, Longitude <dbl>, Total <dbl>

We will also convert the data type of Postcode variable from numeric/double to character and use st_as_sf() function of sf package to convert the hdb aspatial tibble data.frame to sf data.frame using the values from X(longitude) and Y(latitude) values.

hdb$Postcode <- as.character(hdb$Postcode)

hdb_sf <- st_as_sf(hdb, 
                   coords = c("X", "Y"),
                   crs = "+init=epsg:3414")
summary(hdb_sf)
##    Postcode             1-room            2-room            3-room      
##  Length:9978        Min.   :  0.000   Min.   :  0.000   Min.   :  0.00  
##  Class :character   1st Qu.:  0.000   1st Qu.:  0.000   1st Qu.:  0.00  
##  Mode  :character   Median :  0.000   Median :  0.000   Median :  0.00  
##                     Mean   :  3.016   Mean   :  4.918   Mean   : 24.09  
##                     3rd Qu.:  0.000   3rd Qu.:  0.000   3rd Qu.: 24.00  
##                     Max.   :520.000   Max.   :452.000   Max.   :528.00  
##      4-room           5-room         Executive            HUDC        
##  Min.   :  0.00   Min.   :  0.00   Min.   :  0.000   Min.   :0.00000  
##  1st Qu.:  0.00   1st Qu.:  0.00   1st Qu.:  0.000   1st Qu.:0.00000  
##  Median : 34.00   Median :  1.00   Median :  0.000   Median :0.00000  
##  Mean   : 41.34   Mean   : 23.99   Mean   :  6.487   Mean   :0.01012  
##  3rd Qu.: 65.00   3rd Qu.: 45.00   3rd Qu.:  0.000   3rd Qu.:0.00000  
##  Max.   :316.00   Max.   :164.00   Max.   :135.000   Max.   :1.00000  
##  Multi-generation   Studio Apartment      Type S1            Type S2        
##  Min.   : 0.00000   Min.   :  0.0000   Min.   :  0.0000   Min.   : 0.00000  
##  1st Qu.: 0.00000   1st Qu.:  0.0000   1st Qu.:  0.0000   1st Qu.: 0.00000  
##  Median : 0.00000   Median :  0.0000   Median :  0.0000   Median : 0.00000  
##  Mean   : 0.03678   Mean   :  0.9015   Mean   :  0.1235   Mean   : 0.06174  
##  3rd Qu.: 0.00000   3rd Qu.:  0.0000   3rd Qu.:  0.0000   3rd Qu.: 0.00000  
##  Max.   :66.00000   Max.   :217.0000   Max.   :176.0000   Max.   :88.00000  
##     Latitude       Longitude         Total                geometry   
##  Min.   :1.270   Min.   :103.7   Min.   :  1.0   POINT        :9978  
##  1st Qu.:1.337   1st Qu.:103.8   1st Qu.: 76.0   epsg:NA      :   0  
##  Median :1.363   Median :103.8   Median : 98.0   +proj=tmer...:   0  
##  Mean   :1.365   Mean   :103.8   Mean   :104.9                       
##  3rd Qu.:1.391   3rd Qu.:103.9   3rd Qu.:128.0                       
##  Max.   :1.457   Max.   :104.0   Max.   :584.0

4.2 Importing Hospital data (csv)

Next, we will import the hospital csv data.

hospital <- read_csv("data/aspatial/hospital.csv")
## 
## -- Column specification --------------------
## cols(
##   Postcode = col_double(),
##   X = col_double(),
##   Y = col_double(),
##   Latitude = col_double(),
##   Longitude = col_double(),
##   `No Beds` = col_double()
## )
hospital
## # A tibble: 26 x 6
##    Postcode      X      Y Latitude Longitude `No Beds`
##       <dbl>  <dbl>  <dbl>    <dbl>     <dbl>     <dbl>
##  1   119074 22476. 30756.     1.29      104.      1239
##  2   159964 24289. 30112.     1.29      104.       176
##  3   169608 27711. 29155.     1.28      104.      1785
##  4   188770 30687. 31504.     1.30      104.       380
##  5   217562 30294. 32752.     1.31      104.       121
##  6   228510 28236. 31944.     1.31      104.       345
##  7   229899 29500. 32531.     1.31      104.       830
##  8   258500 26474. 32194.     1.31      104.       258
##  9   289891 25774. 34337.     1.33      104.        34
## 10   307677 28906. 34177.     1.33      104.       190
## # ... with 16 more rows
hospital$Postcode <- as.character(hospital$Postcode)
hospital_sf <- st_as_sf(hospital,
                        coords = c("X", "Y"),
                        crs = "+init=epsg:3414")
summary(hospital_sf)
##    Postcode            Latitude       Longitude        No Beds      
##  Length:26          Min.   :1.280   Min.   :103.7   Min.   :  34.0  
##  Class :character   1st Qu.:1.311   1st Qu.:103.8   1st Qu.: 241.5  
##  Mode  :character   Median :1.326   Median :103.8   Median : 375.0  
##                     Mean   :1.339   Mean   :103.8   Mean   : 610.9  
##                     3rd Qu.:1.366   3rd Qu.:103.9   3rd Qu.: 957.5  
##                     Max.   :1.424   Max.   :103.9   Max.   :2000.0  
##           geometry 
##  POINT        :26  
##  epsg:NA      : 0  
##  +proj=tmer...: 0  
##                    
##                    
## 

4.3 Importing Distance Matrix (csv)

Lastly, we import the distance matrix csv data using the same steps.

dismat <- read_csv("data/aspatial/distance_matrix.csv")
## 
## -- Column specification --------------------
## cols(
##   postal_hospital = col_double(),
##   lat_hospital = col_double(),
##   long_hospital = col_double(),
##   postal_hdb = col_character(),
##   lat_hdb = col_double(),
##   long_hdb = col_double(),
##   Distance = col_double(),
##   Time = col_double()
## )
dismat
## # A tibble: 259,428 x 8
##    postal_hospital lat_hospital long_hospital postal_hdb lat_hdb long_hdb
##              <dbl>        <dbl>         <dbl> <chr>        <dbl>    <dbl>
##  1          119074         1.29          104. 050004        1.28     104.
##  2          159964         1.29          104. 050004        1.28     104.
##  3          169608         1.28          104. 050004        1.28     104.
##  4          188770         1.30          104. 050004        1.28     104.
##  5          217562         1.31          104. 050004        1.28     104.
##  6          228510         1.31          104. 050004        1.28     104.
##  7          229899         1.31          104. 050004        1.28     104.
##  8          258500         1.31          104. 050004        1.28     104.
##  9          289891         1.33          104. 050004        1.28     104.
## 10          307677         1.33          104. 050004        1.28     104.
## # ... with 259,418 more rows, and 2 more variables: Distance <dbl>, Time <dbl>

Note that Distance is in metres and Time is in seconds.

4.3.1 Tidying distance matrix

dismat$postal_hospital <- as.character(dismat$postal_hospital)
dismat_fat <- dismat %>%
  select(postal_hospital, postal_hdb, Distance)
dismat_fat
## # A tibble: 259,428 x 3
##    postal_hospital postal_hdb Distance
##    <chr>           <chr>         <dbl>
##  1 119074          050004         9251
##  2 159964          050004         6035
##  3 169608          050004         1976
##  4 188770          050004         2924
##  5 217562          050004         4500
##  6 228510          050004         4210
##  7 229899          050004         5169
##  8 258500          050004         4959
##  9 289891          050004         9775
## 10 307677          050004         7697
## # ... with 259,418 more rows

We will use pivot_wider() or spread() function of tidyr package to transform the distance-matrix from thin format (long) into fat format (wide).

dismat_fat <- dismat_fat %>%
  spread(postal_hospital, Distance) %>%
  select(-postal_hdb)

dismat_fat
## # A tibble: 9,978 x 26
##    `119074` `159964` `169608` `188770` `217562` `228510` `229899` `258500`
##       <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
##  1     9251     6035     1976     2924     4500     4210     5169     4959
##  2     9355     6114     2056     3027     4604     4289     5273     5062
##  3     8430     5596     1537     2738     4314     3995     4983     4772
##  4     9016     5819     1760     2688     4264     3974     4934     4723
##  5     9136     6027     2387     2435     4271     4077     4940     4854
##  6     9061     5951     2311     2359     4195     4002     4864     4778
##  7     8986     5876     2236     2284     4120     3927     4789     4703
##  8     9036     5926     2286     2334     4170     3977     4839     4753
##  9     8947     5853     2213     2245     4081     3903     4750     4664
## 10     9148     6011     1953     2821     4397     4186     5066     4855
## # ... with 9,968 more rows, and 18 more variables: `289891` <dbl>,
## #   `307677` <dbl>, `308433` <dbl>, `329562` <dbl>, `329563` <dbl>,
## #   `427990` <dbl>, `529889` <dbl>, `529895` <dbl>, `539747` <dbl>,
## #   `544835` <dbl>, `544886` <dbl>, `547530` <dbl>, `569766` <dbl>,
## #   `574623` <dbl>, `609606` <dbl>, `659674` <dbl>, `768024` <dbl>,
## #   `768828` <dbl>

We will add several variables into the fat/wide distance matrix and call it a new data.frame dismat_temp.

dismat_temp <- dismat %>%
  inner_join(hdb, by = c("postal_hdb" = "Postcode")) %>% 
  inner_join(hospital, by = c("postal_hospital" = "Postcode")) %>%
  select(postal_hospital, postal_hdb, lat_hdb, long_hdb, Distance) %>%
  spread(postal_hospital, Distance)

dismat_temp
## # A tibble: 9,870 x 29
##    postal_hdb lat_hdb long_hdb `119074` `159964` `169608` `188770` `217562`
##    <chr>        <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
##  1 100044        1.27     104.     5634     3925     4516     7633    10481
##  2 100045        1.27     104.     5672     3928     4519     7671    10518
##  3 100046        1.27     104.     5548     3803     4394     7546    10394
##  4 100047        1.27     104.     5512     3768     4359     7511    10358
##  5 100048        1.27     104.     5399     3655     4245     7398    10245
##  6 100049        1.27     104.     5411     3666     4257     7410    10257
##  7 100050        1.27     104.     5067     3323     4266     7418    10266
##  8 100051        1.27     104.     5220     3475     4715     7867    10714
##  9 100052        1.27     104.     5300     3555     4348     7500    10348
## 10 100053        1.27     104.     5346     3601     4245     7397    10244
## # ... with 9,860 more rows, and 21 more variables: `228510` <dbl>,
## #   `229899` <dbl>, `258500` <dbl>, `289891` <dbl>, `307677` <dbl>,
## #   `308433` <dbl>, `329562` <dbl>, `329563` <dbl>, `427990` <dbl>,
## #   `529889` <dbl>, `529895` <dbl>, `539747` <dbl>, `544835` <dbl>,
## #   `544886` <dbl>, `547530` <dbl>, `569766` <dbl>, `574623` <dbl>,
## #   `609606` <dbl>, `659674` <dbl>, `768024` <dbl>, `768828` <dbl>

The final distance matrix will be derived by dropping the postal_hdb, lat_hdb and long_hdb variables from dismat_temp, as well as converting the values in the matrix from metres to kilometres.

dismat_matrix <- dismat_temp %>%
  select(-c(postal_hdb, lat_hdb, long_hdb)) %>%
  data.matrix(rownames.force = NA) 

head(dismat_matrix)
##      119074 159964 169608 188770 217562 228510 229899 258500 289891 307677
## [1,]   5634   3925   4516   7633  10481   6299   7786   5570  11181   8700
## [2,]   5672   3928   4519   7671  10518   6302   7823   5608  11219   8738
## [3,]   5548   3803   4394   7546  10394   6177   7699   5484  11095   8613
## [4,]   5512   3768   4359   7511  10358   6141   7664   5448  11059   8578
## [5,]   5399   3655   4245   7398  10245   6028   7550   5335  10946   8465
## [6,]   5411   3666   4257   7410  10257   6040   7562   5347  10958   8477
##      308433 329562 329563 427990 529889 529895 539747 544835 544886 547530
## [1,]  10004   8293   8143  16252  21733  21853  19698  21551  21138  17537
## [2,]  10007   8330   8181  16290  21770  21890  19701  21588  21176  17575
## [3,]   9882   8206   8057  16165  21646  21766  19576  21464  21051  17450
## [4,]   9846   8171   8021  16130  21610  21730  19541  21429  21016  17415
## [5,]   9733   8058   7908  16017  21497  21617  19428  21315  20903  17302
## [6,]   9745   8069   7920  16029  21509  21629  19439  21327  20915  17314
##      569766 574623 609606 659674 768024 768828
## [1,]  17861  10854  12178  14764  25027  25240
## [2,]  17899  10857  12181  14801  25065  25277
## [3,]  17774  10732  12056  14677  24941  25153
## [4,]  17739  10697  12021  14642  24905  25118
## [5,]  17626  10584  11908  14529  24792  25005
## [6,]  17638  10595  11919  14540  24804  25017
dismat_km <- as.matrix(dismat_fat/1000)
head(dismat_km)
##      119074 159964 169608 188770 217562 228510 229899 258500 289891 307677
## [1,]  9.251  6.035  1.976  2.924  4.500  4.210  5.169  4.959  9.775  7.697
## [2,]  9.355  6.114  2.056  3.027  4.604  4.289  5.273  5.062  9.878  7.800
## [3,]  8.430  5.596  1.537  2.738  4.314  3.995  4.983  4.772  9.589  7.510
## [4,]  9.016  5.819  1.760  2.688  4.264  3.974  4.934  4.723  9.539  7.461
## [5,]  9.136  6.027  2.387  2.435  4.271  4.077  4.940  4.854  9.670  7.592
## [6,]  9.061  5.951  2.311  2.359  4.195  4.002  4.864  4.778  9.595  7.517
##      308433 329562 329563 427990 529889 529895 539747 544835 544886 547530
## [1,]  6.298  7.000  7.058  9.302 18.045 18.165 15.992 17.863 17.450 13.849
## [2,]  6.377  7.104  7.162  9.405 18.148 18.268 16.072 17.966 17.554 13.953
## [3,]  6.083  6.814  6.872  9.115 17.858 17.978 15.777 17.676 17.264 13.663
## [4,]  6.062  6.764  6.823  9.066 17.809 17.929 15.756 17.627 17.214 13.613
## [5,]  6.175  6.895  5.722 11.043 17.940 18.060 15.870 17.758 17.345 13.744
## [6,]  6.100  6.820  5.647 11.097 17.864 17.984 15.794 17.683 17.270 13.669
##      569766 574623 609606 659674 768024 768828
## [1,] 14.173  9.816 15.777 18.381 21.339 21.552
## [2,] 14.277  9.896 15.856 18.484 21.443 21.655
## [3,] 13.987  9.601 14.950 17.560 21.153 21.366
## [4,] 13.937  9.581 15.541 18.145 21.104 21.316
## [5,] 14.068  8.397 15.645 18.266 21.234 21.447
## [6,] 13.993  8.322 15.569 18.191 21.159 21.372

5. Computing Geographical Accessibility Measure

In this section, the ac() function of SpatialAcc package will be used to compute a collection of geographical accessibility measures such as Hansen’s potential model and Spatial Accessibility Measure (SAM).

5.1 Modeling accessibility using Hansen’s algorithm

The code chunk below is used to compute Hansen’s accessibility measure (refer to argument family). It performs the following tasks:

  • ac() is used to compute Hansen’s accessibility values. The output object is then converted into a tibble data.frame called acc_Hansen

  • bind columns of “acc_Hansen” with “hdb_sf” sf data.frame and call the output data.frame hdb_Hansen

d0 argument is the threshold distance or time for accessibility, it’s 30 in kilometres and 30,000 in metres.

family argument can be “Hansen”, “SAM” (Default), “2SFCA” and “KD2SFCA”.

acc_Hansen <- data.frame(ac(hdb$Total,
                            hospital$`No Beds`,
                            dismat_km,
                            d0 = 30,
                            power = 2,
                            family = "Hansen"))
colnames(acc_Hansen) <- "accHansen"
acc_Hansen <- tbl_df(acc_Hansen)
hdb_Hansen <- bind_cols(hdb_sf, acc_Hansen)
hdb_Hansen
## Simple feature collection with 9978 features and 16 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 11519.15 ymin: 28097.64 xmax: 45192.3 ymax: 48741.06
## projected CRS:  SVY21 / Singapore TM
## # A tibble: 9,978 x 17
##    Postcode `1-room` `2-room` `3-room` `4-room` `5-room` Executive  HUDC
##  * <chr>       <dbl>    <dbl>    <dbl>    <dbl>    <dbl>     <dbl> <dbl>
##  1 85101           0        0        0        0        0         0     0
##  2 85201           0        0        0        0        0         0     0
##  3 85301           0        0        0        0        0         0     0
##  4 85401           0        0        0        0        0         0     0
##  5 85501           0        0        0        0        0         0     0
##  6 85601           0        0        0        0        0         0     0
##  7 85701           0        0        0        0        0         0     0
##  8 50004           0        0      248        0        2         0     0
##  9 50005         430        0        0        0        0         0     0
## 10 50032           0       86       38        3        0         0     0
## # ... with 9,968 more rows, and 9 more variables: `Multi-generation` <dbl>,
## #   `Studio Apartment` <dbl>, `Type S1` <dbl>, `Type S2` <dbl>, Latitude <dbl>,
## #   Longitude <dbl>, Total <dbl>, geometry <POINT [m]>, accHansen <dbl>

5.2 Modeling accessibility using SAM algorithm

The code chunk below is used to compute SAM accessibility measure (refer to argument family). It performs the following tasks:

  • ac() is used to compute SAM accessibility values. The output object is then converted into a tibble data.frame called acc_SAM

  • bind columns of “acc_SAM” with “hdb_sf” sf data.frame and call the output data.frame hdb_SAM

acc_SAM <- data.frame(ac(hdb$Total,
                            hospital$`No Beds`,
                            dismat_km,
                            d0 = 30,
                            power = 2,
                            family = "SAM"))
colnames(acc_SAM) <- "accSAM"
acc_SAM <- tbl_df(acc_SAM)
hdb_SAM <- bind_cols(hdb_sf, acc_SAM)
hdb_SAM
## Simple feature collection with 9978 features and 16 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 11519.15 ymin: 28097.64 xmax: 45192.3 ymax: 48741.06
## projected CRS:  SVY21 / Singapore TM
## # A tibble: 9,978 x 17
##    Postcode `1-room` `2-room` `3-room` `4-room` `5-room` Executive  HUDC
##  * <chr>       <dbl>    <dbl>    <dbl>    <dbl>    <dbl>     <dbl> <dbl>
##  1 85101           0        0        0        0        0         0     0
##  2 85201           0        0        0        0        0         0     0
##  3 85301           0        0        0        0        0         0     0
##  4 85401           0        0        0        0        0         0     0
##  5 85501           0        0        0        0        0         0     0
##  6 85601           0        0        0        0        0         0     0
##  7 85701           0        0        0        0        0         0     0
##  8 50004           0        0      248        0        2         0     0
##  9 50005         430        0        0        0        0         0     0
## 10 50032           0       86       38        3        0         0     0
## # ... with 9,968 more rows, and 9 more variables: `Multi-generation` <dbl>,
## #   `Studio Apartment` <dbl>, `Type S1` <dbl>, `Type S2` <dbl>, Latitude <dbl>,
## #   Longitude <dbl>, Total <dbl>, geometry <POINT [m]>, accSAM <dbl>

We can check if there are any NA values in hdb_SAM and hdb_Hansen.

sum(is.na(hdb_SAM))
## [1] 0
sum(is.na(hdb_Hansen))
## [1] 0

We can see that there are no NA values! We can move on to visualising the geographical accessibility measures.

6. Visualising Geographical Accessibility Measures

6.1 Mapping Hansen’s potential model

We will change the tmap mode to view for interactive mapping.

tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(hospital_sf) +
  tm_symbols(size = 0.1) +
tm_shape(hdb_Hansen) + 
  tm_bubbles(col = "accHansen",
             n = 5,
             style = "quantile",
             size = 0.001,
             border.col = "black",
             border.lwd = 1)

The darker spatial points have higher accessibility and lighter spatial points have lower accessibility.

We can see that the west and north west region have relatively low poor accessibility to hospital facilities.

6.2 Statistical graphs visualisation

We are going to compare the distribution of Hansen’s accessibility values by planning region.

Firstly, we will add the planning regions from mpsz_svy21 into hdb_Hansen sf point data.frame by using st_join() function of sf package which is a spatial join based on geometry. It will overlap the hospital spatial points with the mpsz_svy21 geometry by using st_intersects.

hdb_Hansen <- st_transform(hdb_Hansen, 3414)
hdb_Hansen <- st_join(hdb_Hansen, mpsz_svy21, join = st_intersects)
hdb_Hansen
## Simple feature collection with 9978 features and 31 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 11519.15 ymin: 28097.64 xmax: 45192.3 ymax: 48741.06
## projected CRS:  SVY21 / Singapore TM
## # A tibble: 9,978 x 32
##    Postcode `1-room` `2-room` `3-room` `4-room` `5-room` Executive  HUDC
##  * <chr>       <dbl>    <dbl>    <dbl>    <dbl>    <dbl>     <dbl> <dbl>
##  1 85101           0        0        0        0        0         0     0
##  2 85201           0        0        0        0        0         0     0
##  3 85301           0        0        0        0        0         0     0
##  4 85401           0        0        0        0        0         0     0
##  5 85501           0        0        0        0        0         0     0
##  6 85601           0        0        0        0        0         0     0
##  7 85701           0        0        0        0        0         0     0
##  8 50004           0        0      248        0        2         0     0
##  9 50005         430        0        0        0        0         0     0
## 10 50032           0       86       38        3        0         0     0
## # ... with 9,968 more rows, and 24 more variables: `Multi-generation` <dbl>,
## #   `Studio Apartment` <dbl>, `Type S1` <dbl>, `Type S2` <dbl>, Latitude <dbl>,
## #   Longitude <dbl>, Total <dbl>, geometry <POINT [m]>, accHansen <dbl>,
## #   OBJECTID <int>, SUBZONE_NO <int>, SUBZONE_N <chr>, SUBZONE_C <chr>,
## #   CA_IND <chr>, PLN_AREA_N <chr>, PLN_AREA_C <chr>, REGION_N <chr>,
## #   REGION_C <chr>, INC_CRC <chr>, FMEL_UPD_D <date>, X_ADDR <dbl>,
## #   Y_ADDR <dbl>, SHAPE_Leng <dbl>, SHAPE_Area <dbl>

Next, we will plot the distribution using geom_boxplot() function of ggplot2 package.

ggplot(data = hdb_Hansen, aes(y = accHansen, x = REGION_N)) +
  geom_boxplot() +
  geom_point(stat = "summary",
             fun.y = "mean",
             colour = "red", 
             size = 4)
## No summary function supplied, defaulting to `mean_se()`

We can observe that the values are skewed such that we can’t see the box plot, the interquartile range and the mean properly, hence we can use log of the accHansen values for the plot.

ggplot(data = hdb_Hansen, aes(y = log(accHansen), x = REGION_N)) +
  geom_boxplot() +
  geom_point(stat = "summary",
             fun.y = "mean",
             colour = "red", 
             size = 4)
## No summary function supplied, defaulting to `mean_se()`

From this boxplot, it is reinforced again that the west region have relatively lower accessibility values and the regions with higher hospital facilities would be the central region & the north-east region.

To confirm the observation, confirmatory data analysis can be performed using ggbetweenstats() function of ggstatsplot package. This is also called the ANOVA test.

hdb_Hansen$log_accHansen <- log(hdb_Hansen$accHansen)
ggbetweenstats(data = hdb_Hansen,
               x = REGION_N,
               y = log_accHansen,
               pairwise.comparisons = TRUE,
               p.adjust.method = "fdr",
               title = "Hansen's accessibility values by HDB location and by Region", 
               caption = "Hansen's values")

Let’s try mapping the boxplot based on one region, the central region and the different planning subzones.

hdb_Hansen_central <- hdb_Hansen %>%
  filter(REGION_C == "CR")
ggplot(data = hdb_Hansen_central, aes(y = log_accHansen, x = PLN_AREA_N)) +
  geom_boxplot(alpha = 0.5) +
  geom_point(stat = "summary",
             fun.y = "mean",
             colour = "red", 
             size = 4) + 
  theme_gray(base_size = 7)
## No summary function supplied, defaulting to `mean_se()`

We can see that the 2 planning areas with highest accessibility values are Rochor and Downtown.

6.3 Mapping SAM accessibility

We will change the tmap mode to view for interactive mapping.

tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(hospital_sf) +
  tm_symbols(size = 0.1) +
tm_shape(hdb_SAM) + 
  tm_bubbles(col = "accSAM",
             n = 5,
             style = "quantile",
             size = 0.001,
             border.col = "black",
             border.lwd = 1)

6.4 SAM Statistical graphs visualisation

We are going to compare the distribution of SAM’s accessibility values by planning region.

We will add the planning regions from mpsz_svy21 into hdb_SAM sf point data.frame by using st_join() function of sf package.

hdb_SAM <- st_transform(hdb_SAM, 3414)
hdb_SAM <- st_join(hdb_SAM, mpsz_svy21, join = st_intersects)
hdb_SAM
## Simple feature collection with 9978 features and 31 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 11519.15 ymin: 28097.64 xmax: 45192.3 ymax: 48741.06
## projected CRS:  SVY21 / Singapore TM
## # A tibble: 9,978 x 32
##    Postcode `1-room` `2-room` `3-room` `4-room` `5-room` Executive  HUDC
##  * <chr>       <dbl>    <dbl>    <dbl>    <dbl>    <dbl>     <dbl> <dbl>
##  1 85101           0        0        0        0        0         0     0
##  2 85201           0        0        0        0        0         0     0
##  3 85301           0        0        0        0        0         0     0
##  4 85401           0        0        0        0        0         0     0
##  5 85501           0        0        0        0        0         0     0
##  6 85601           0        0        0        0        0         0     0
##  7 85701           0        0        0        0        0         0     0
##  8 50004           0        0      248        0        2         0     0
##  9 50005         430        0        0        0        0         0     0
## 10 50032           0       86       38        3        0         0     0
## # ... with 9,968 more rows, and 24 more variables: `Multi-generation` <dbl>,
## #   `Studio Apartment` <dbl>, `Type S1` <dbl>, `Type S2` <dbl>, Latitude <dbl>,
## #   Longitude <dbl>, Total <dbl>, geometry <POINT [m]>, accSAM <dbl>,
## #   OBJECTID <int>, SUBZONE_NO <int>, SUBZONE_N <chr>, SUBZONE_C <chr>,
## #   CA_IND <chr>, PLN_AREA_N <chr>, PLN_AREA_C <chr>, REGION_N <chr>,
## #   REGION_C <chr>, INC_CRC <chr>, FMEL_UPD_D <date>, X_ADDR <dbl>,
## #   Y_ADDR <dbl>, SHAPE_Leng <dbl>, SHAPE_Area <dbl>

Next, we will plot the distribution using geom_boxplot() function of ggplot2 package.

ggplot(data = hdb_SAM, aes(y = log(accSAM), x = REGION_N)) +
  geom_boxplot() +
  geom_point(stat = "summary",
             fun.y = "mean",
             colour = "red", 
             size = 4)
## No summary function supplied, defaulting to `mean_se()`

From SAM boxplot, we can also see similar observations where the west region is also under-served for hospital facilities and have low accessibility values.