We will be learning how to model geographical accessibility of hospitals from HDBs in Singapore by using R’s geospatial analysis packages.
We will compute the accessibility using Hansen’s potential model and Spatial Accessibility Measure (SAM) and visualize it using tmap and ggplot2 packages.
Singapore 2014 Master Plan subzone boundary shapefile containing the boundary layer of planning areas and subzones in Singapore, you can find the data here
hdb_data.csv containing HDB location data, collected by Prof. Kam
hospital.csv containing hospital location data, collected by Prof. Kam
distance_matrix.csv containing shortest travel distance and travel time from a HDB location to a hospital location, collected by Prof. Kam
Warning: These three aspatial data are specially collected by Prof. Kam for teaching and research purpose. Students taking IS415 Geospatial Analytics and Applications are allowed to use them for hands-on exercise purpose. Please obtain formal approval from Prof. Kam if you want to use them for other courses or usage.
We will be using these R packages:
sf - for handling spatial data
tmap - for choropleth mapping
tidyverse - for tidying attribute/variable data (include tidyr, readr, ggplot2 & dplyr packages)
SpatialAcc - for modeling geographical accessibility, providing a set of spatial accessibility measures from a set of locations (demand) to another set of locations (supply), more info can be found here
ggstatsplot - for ggplot2 plots with statistical details, more info can be found here
packages = c('sf', 'tmap', 'tidyverse', 'SpatialAcc', 'ggstatsplot')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p, character.only = T)
}
We import the MP14_SUBZONE_WEB_PL shapefile using st_read() function of sf package. The shapefile consists of URA Master Plan 2014’s planning subzone boundaries and is in svy21 projected coordinate reference systems (CRS).
mpsz <- st_read(dsn = "data/geospatial", layer = "MP14_SUBZONE_WEB_PL")
## Reading layer `MP14_SUBZONE_WEB_PL' from data source `D:\SMU Year 3 Sem 1\IS415 Geospatial Analytics and Application\In_class_exercise\In_class_exercise10\data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 323 features and 15 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS: SVY21
mpsz is a simple feature data.frame and it’s geometry type is multipolygon. It is also important to note that mpsz sf object does not have EPSG information.
We will transform the mpsz object to a new dataframe with EPSG 3414 by using st_transform() function. After that, we can verify the CRS by using st_crs() function.
mpsz_svy21 <- st_transform(mpsz, 3414)
st_crs(mpsz_svy21)
## Coordinate Reference System:
## User input: EPSG:3414
## wkt:
## PROJCRS["SVY21 / Singapore TM",
## BASEGEOGCRS["SVY21",
## DATUM["SVY21",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4757]],
## CONVERSION["Singapore Transverse Mercator",
## METHOD["Transverse Mercator",
## ID["EPSG",9807]],
## PARAMETER["Latitude of natural origin",1.36666666666667,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8801]],
## PARAMETER["Longitude of natural origin",103.833333333333,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8802]],
## PARAMETER["Scale factor at natural origin",1,
## SCALEUNIT["unity",1],
## ID["EPSG",8805]],
## PARAMETER["False easting",28001.642,
## LENGTHUNIT["metre",1],
## ID["EPSG",8806]],
## PARAMETER["False northing",38744.572,
## LENGTHUNIT["metre",1],
## ID["EPSG",8807]]],
## CS[Cartesian,2],
## AXIS["northing (N)",north,
## ORDER[1],
## LENGTHUNIT["metre",1]],
## AXIS["easting (E)",east,
## ORDER[2],
## LENGTHUNIT["metre",1]],
## USAGE[
## SCOPE["unknown"],
## AREA["Singapore"],
## BBOX[1.13,103.59,1.47,104.07]],
## ID["EPSG",3414]]
We can also see the extent of mpsz_svy21 by using st_bbox() function of sf package.
st_bbox(mpsz_svy21)
## xmin ymin xmax ymax
## 2667.538 15748.721 56396.440 50256.334
We import the csv file using read_csv() function of readr package.
Then, we will derive a new variable called Total which is the sum of columns 2 to 11 (1-room to Type S1)
hdb <- read_csv("data/aspatial/hdb_data.csv") %>% mutate(Total = rowSums(.[2:11]))
##
## -- Column specification --------------------
## cols(
## Postcode = col_double(),
## `1-room` = col_double(),
## `2-room` = col_double(),
## `3-room` = col_double(),
## `4-room` = col_double(),
## `5-room` = col_double(),
## Executive = col_double(),
## HUDC = col_double(),
## `Multi-generation` = col_double(),
## `Studio Apartment` = col_double(),
## `Type S1` = col_double(),
## `Type S2` = col_double(),
## X = col_double(),
## Y = col_double(),
## Latitude = col_double(),
## Longitude = col_double()
## )
hdb
## # A tibble: 9,978 x 17
## Postcode `1-room` `2-room` `3-room` `4-room` `5-room` Executive HUDC
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 85101 0 0 0 0 0 0 0
## 2 85201 0 0 0 0 0 0 0
## 3 85301 0 0 0 0 0 0 0
## 4 85401 0 0 0 0 0 0 0
## 5 85501 0 0 0 0 0 0 0
## 6 85601 0 0 0 0 0 0 0
## 7 85701 0 0 0 0 0 0 0
## 8 50004 0 0 248 0 2 0 0
## 9 50005 430 0 0 0 0 0 0
## 10 50032 0 86 38 3 0 0 0
## # ... with 9,968 more rows, and 9 more variables: `Multi-generation` <dbl>,
## # `Studio Apartment` <dbl>, `Type S1` <dbl>, `Type S2` <dbl>, X <dbl>,
## # Y <dbl>, Latitude <dbl>, Longitude <dbl>, Total <dbl>
We will also convert the data type of Postcode variable from numeric/double to character and use st_as_sf() function of sf package to convert the hdb aspatial tibble data.frame to sf data.frame using the values from X(longitude) and Y(latitude) values.
hdb$Postcode <- as.character(hdb$Postcode)
hdb_sf <- st_as_sf(hdb,
coords = c("X", "Y"),
crs = "+init=epsg:3414")
summary(hdb_sf)
## Postcode 1-room 2-room 3-room
## Length:9978 Min. : 0.000 Min. : 0.000 Min. : 0.00
## Class :character 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.00
## Mode :character Median : 0.000 Median : 0.000 Median : 0.00
## Mean : 3.016 Mean : 4.918 Mean : 24.09
## 3rd Qu.: 0.000 3rd Qu.: 0.000 3rd Qu.: 24.00
## Max. :520.000 Max. :452.000 Max. :528.00
## 4-room 5-room Executive HUDC
## Min. : 0.00 Min. : 0.00 Min. : 0.000 Min. :0.00000
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.000 1st Qu.:0.00000
## Median : 34.00 Median : 1.00 Median : 0.000 Median :0.00000
## Mean : 41.34 Mean : 23.99 Mean : 6.487 Mean :0.01012
## 3rd Qu.: 65.00 3rd Qu.: 45.00 3rd Qu.: 0.000 3rd Qu.:0.00000
## Max. :316.00 Max. :164.00 Max. :135.000 Max. :1.00000
## Multi-generation Studio Apartment Type S1 Type S2
## Min. : 0.00000 Min. : 0.0000 Min. : 0.0000 Min. : 0.00000
## 1st Qu.: 0.00000 1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.: 0.00000
## Median : 0.00000 Median : 0.0000 Median : 0.0000 Median : 0.00000
## Mean : 0.03678 Mean : 0.9015 Mean : 0.1235 Mean : 0.06174
## 3rd Qu.: 0.00000 3rd Qu.: 0.0000 3rd Qu.: 0.0000 3rd Qu.: 0.00000
## Max. :66.00000 Max. :217.0000 Max. :176.0000 Max. :88.00000
## Latitude Longitude Total geometry
## Min. :1.270 Min. :103.7 Min. : 1.0 POINT :9978
## 1st Qu.:1.337 1st Qu.:103.8 1st Qu.: 76.0 epsg:NA : 0
## Median :1.363 Median :103.8 Median : 98.0 +proj=tmer...: 0
## Mean :1.365 Mean :103.8 Mean :104.9
## 3rd Qu.:1.391 3rd Qu.:103.9 3rd Qu.:128.0
## Max. :1.457 Max. :104.0 Max. :584.0
Next, we will import the hospital csv data.
hospital <- read_csv("data/aspatial/hospital.csv")
##
## -- Column specification --------------------
## cols(
## Postcode = col_double(),
## X = col_double(),
## Y = col_double(),
## Latitude = col_double(),
## Longitude = col_double(),
## `No Beds` = col_double()
## )
hospital
## # A tibble: 26 x 6
## Postcode X Y Latitude Longitude `No Beds`
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 119074 22476. 30756. 1.29 104. 1239
## 2 159964 24289. 30112. 1.29 104. 176
## 3 169608 27711. 29155. 1.28 104. 1785
## 4 188770 30687. 31504. 1.30 104. 380
## 5 217562 30294. 32752. 1.31 104. 121
## 6 228510 28236. 31944. 1.31 104. 345
## 7 229899 29500. 32531. 1.31 104. 830
## 8 258500 26474. 32194. 1.31 104. 258
## 9 289891 25774. 34337. 1.33 104. 34
## 10 307677 28906. 34177. 1.33 104. 190
## # ... with 16 more rows
hospital$Postcode <- as.character(hospital$Postcode)
hospital_sf <- st_as_sf(hospital,
coords = c("X", "Y"),
crs = "+init=epsg:3414")
summary(hospital_sf)
## Postcode Latitude Longitude No Beds
## Length:26 Min. :1.280 Min. :103.7 Min. : 34.0
## Class :character 1st Qu.:1.311 1st Qu.:103.8 1st Qu.: 241.5
## Mode :character Median :1.326 Median :103.8 Median : 375.0
## Mean :1.339 Mean :103.8 Mean : 610.9
## 3rd Qu.:1.366 3rd Qu.:103.9 3rd Qu.: 957.5
## Max. :1.424 Max. :103.9 Max. :2000.0
## geometry
## POINT :26
## epsg:NA : 0
## +proj=tmer...: 0
##
##
##
Lastly, we import the distance matrix csv data using the same steps.
dismat <- read_csv("data/aspatial/distance_matrix.csv")
##
## -- Column specification --------------------
## cols(
## postal_hospital = col_double(),
## lat_hospital = col_double(),
## long_hospital = col_double(),
## postal_hdb = col_character(),
## lat_hdb = col_double(),
## long_hdb = col_double(),
## Distance = col_double(),
## Time = col_double()
## )
dismat
## # A tibble: 259,428 x 8
## postal_hospital lat_hospital long_hospital postal_hdb lat_hdb long_hdb
## <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 119074 1.29 104. 050004 1.28 104.
## 2 159964 1.29 104. 050004 1.28 104.
## 3 169608 1.28 104. 050004 1.28 104.
## 4 188770 1.30 104. 050004 1.28 104.
## 5 217562 1.31 104. 050004 1.28 104.
## 6 228510 1.31 104. 050004 1.28 104.
## 7 229899 1.31 104. 050004 1.28 104.
## 8 258500 1.31 104. 050004 1.28 104.
## 9 289891 1.33 104. 050004 1.28 104.
## 10 307677 1.33 104. 050004 1.28 104.
## # ... with 259,418 more rows, and 2 more variables: Distance <dbl>, Time <dbl>
Note that Distance is in metres and Time is in seconds.
dismat$postal_hospital <- as.character(dismat$postal_hospital)
dismat_fat <- dismat %>%
select(postal_hospital, postal_hdb, Distance)
dismat_fat
## # A tibble: 259,428 x 3
## postal_hospital postal_hdb Distance
## <chr> <chr> <dbl>
## 1 119074 050004 9251
## 2 159964 050004 6035
## 3 169608 050004 1976
## 4 188770 050004 2924
## 5 217562 050004 4500
## 6 228510 050004 4210
## 7 229899 050004 5169
## 8 258500 050004 4959
## 9 289891 050004 9775
## 10 307677 050004 7697
## # ... with 259,418 more rows
We will use pivot_wider() or spread() function of tidyr package to transform the distance-matrix from thin format (long) into fat format (wide).
dismat_fat <- dismat_fat %>%
spread(postal_hospital, Distance) %>%
select(-postal_hdb)
dismat_fat
## # A tibble: 9,978 x 26
## `119074` `159964` `169608` `188770` `217562` `228510` `229899` `258500`
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 9251 6035 1976 2924 4500 4210 5169 4959
## 2 9355 6114 2056 3027 4604 4289 5273 5062
## 3 8430 5596 1537 2738 4314 3995 4983 4772
## 4 9016 5819 1760 2688 4264 3974 4934 4723
## 5 9136 6027 2387 2435 4271 4077 4940 4854
## 6 9061 5951 2311 2359 4195 4002 4864 4778
## 7 8986 5876 2236 2284 4120 3927 4789 4703
## 8 9036 5926 2286 2334 4170 3977 4839 4753
## 9 8947 5853 2213 2245 4081 3903 4750 4664
## 10 9148 6011 1953 2821 4397 4186 5066 4855
## # ... with 9,968 more rows, and 18 more variables: `289891` <dbl>,
## # `307677` <dbl>, `308433` <dbl>, `329562` <dbl>, `329563` <dbl>,
## # `427990` <dbl>, `529889` <dbl>, `529895` <dbl>, `539747` <dbl>,
## # `544835` <dbl>, `544886` <dbl>, `547530` <dbl>, `569766` <dbl>,
## # `574623` <dbl>, `609606` <dbl>, `659674` <dbl>, `768024` <dbl>,
## # `768828` <dbl>
We will add several variables into the fat/wide distance matrix and call it a new data.frame dismat_temp.
dismat_temp <- dismat %>%
inner_join(hdb, by = c("postal_hdb" = "Postcode")) %>%
inner_join(hospital, by = c("postal_hospital" = "Postcode")) %>%
select(postal_hospital, postal_hdb, lat_hdb, long_hdb, Distance) %>%
spread(postal_hospital, Distance)
dismat_temp
## # A tibble: 9,870 x 29
## postal_hdb lat_hdb long_hdb `119074` `159964` `169608` `188770` `217562`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 100044 1.27 104. 5634 3925 4516 7633 10481
## 2 100045 1.27 104. 5672 3928 4519 7671 10518
## 3 100046 1.27 104. 5548 3803 4394 7546 10394
## 4 100047 1.27 104. 5512 3768 4359 7511 10358
## 5 100048 1.27 104. 5399 3655 4245 7398 10245
## 6 100049 1.27 104. 5411 3666 4257 7410 10257
## 7 100050 1.27 104. 5067 3323 4266 7418 10266
## 8 100051 1.27 104. 5220 3475 4715 7867 10714
## 9 100052 1.27 104. 5300 3555 4348 7500 10348
## 10 100053 1.27 104. 5346 3601 4245 7397 10244
## # ... with 9,860 more rows, and 21 more variables: `228510` <dbl>,
## # `229899` <dbl>, `258500` <dbl>, `289891` <dbl>, `307677` <dbl>,
## # `308433` <dbl>, `329562` <dbl>, `329563` <dbl>, `427990` <dbl>,
## # `529889` <dbl>, `529895` <dbl>, `539747` <dbl>, `544835` <dbl>,
## # `544886` <dbl>, `547530` <dbl>, `569766` <dbl>, `574623` <dbl>,
## # `609606` <dbl>, `659674` <dbl>, `768024` <dbl>, `768828` <dbl>
The final distance matrix will be derived by dropping the postal_hdb, lat_hdb and long_hdb variables from dismat_temp, as well as converting the values in the matrix from metres to kilometres.
dismat_matrix <- dismat_temp %>%
select(-c(postal_hdb, lat_hdb, long_hdb)) %>%
data.matrix(rownames.force = NA)
head(dismat_matrix)
## 119074 159964 169608 188770 217562 228510 229899 258500 289891 307677
## [1,] 5634 3925 4516 7633 10481 6299 7786 5570 11181 8700
## [2,] 5672 3928 4519 7671 10518 6302 7823 5608 11219 8738
## [3,] 5548 3803 4394 7546 10394 6177 7699 5484 11095 8613
## [4,] 5512 3768 4359 7511 10358 6141 7664 5448 11059 8578
## [5,] 5399 3655 4245 7398 10245 6028 7550 5335 10946 8465
## [6,] 5411 3666 4257 7410 10257 6040 7562 5347 10958 8477
## 308433 329562 329563 427990 529889 529895 539747 544835 544886 547530
## [1,] 10004 8293 8143 16252 21733 21853 19698 21551 21138 17537
## [2,] 10007 8330 8181 16290 21770 21890 19701 21588 21176 17575
## [3,] 9882 8206 8057 16165 21646 21766 19576 21464 21051 17450
## [4,] 9846 8171 8021 16130 21610 21730 19541 21429 21016 17415
## [5,] 9733 8058 7908 16017 21497 21617 19428 21315 20903 17302
## [6,] 9745 8069 7920 16029 21509 21629 19439 21327 20915 17314
## 569766 574623 609606 659674 768024 768828
## [1,] 17861 10854 12178 14764 25027 25240
## [2,] 17899 10857 12181 14801 25065 25277
## [3,] 17774 10732 12056 14677 24941 25153
## [4,] 17739 10697 12021 14642 24905 25118
## [5,] 17626 10584 11908 14529 24792 25005
## [6,] 17638 10595 11919 14540 24804 25017
dismat_km <- as.matrix(dismat_fat/1000)
head(dismat_km)
## 119074 159964 169608 188770 217562 228510 229899 258500 289891 307677
## [1,] 9.251 6.035 1.976 2.924 4.500 4.210 5.169 4.959 9.775 7.697
## [2,] 9.355 6.114 2.056 3.027 4.604 4.289 5.273 5.062 9.878 7.800
## [3,] 8.430 5.596 1.537 2.738 4.314 3.995 4.983 4.772 9.589 7.510
## [4,] 9.016 5.819 1.760 2.688 4.264 3.974 4.934 4.723 9.539 7.461
## [5,] 9.136 6.027 2.387 2.435 4.271 4.077 4.940 4.854 9.670 7.592
## [6,] 9.061 5.951 2.311 2.359 4.195 4.002 4.864 4.778 9.595 7.517
## 308433 329562 329563 427990 529889 529895 539747 544835 544886 547530
## [1,] 6.298 7.000 7.058 9.302 18.045 18.165 15.992 17.863 17.450 13.849
## [2,] 6.377 7.104 7.162 9.405 18.148 18.268 16.072 17.966 17.554 13.953
## [3,] 6.083 6.814 6.872 9.115 17.858 17.978 15.777 17.676 17.264 13.663
## [4,] 6.062 6.764 6.823 9.066 17.809 17.929 15.756 17.627 17.214 13.613
## [5,] 6.175 6.895 5.722 11.043 17.940 18.060 15.870 17.758 17.345 13.744
## [6,] 6.100 6.820 5.647 11.097 17.864 17.984 15.794 17.683 17.270 13.669
## 569766 574623 609606 659674 768024 768828
## [1,] 14.173 9.816 15.777 18.381 21.339 21.552
## [2,] 14.277 9.896 15.856 18.484 21.443 21.655
## [3,] 13.987 9.601 14.950 17.560 21.153 21.366
## [4,] 13.937 9.581 15.541 18.145 21.104 21.316
## [5,] 14.068 8.397 15.645 18.266 21.234 21.447
## [6,] 13.993 8.322 15.569 18.191 21.159 21.372
In this section, the ac() function of SpatialAcc package will be used to compute a collection of geographical accessibility measures such as Hansen’s potential model and Spatial Accessibility Measure (SAM).
The code chunk below is used to compute Hansen’s accessibility measure (refer to argument family). It performs the following tasks:
ac() is used to compute Hansen’s accessibility values. The output object is then converted into a tibble data.frame called acc_Hansen
bind columns of “acc_Hansen” with “hdb_sf” sf data.frame and call the output data.frame hdb_Hansen
d0 argument is the threshold distance or time for accessibility, it’s 30 in kilometres and 30,000 in metres.
family argument can be “Hansen”, “SAM” (Default), “2SFCA” and “KD2SFCA”.
acc_Hansen <- data.frame(ac(hdb$Total,
hospital$`No Beds`,
dismat_km,
d0 = 30,
power = 2,
family = "Hansen"))
colnames(acc_Hansen) <- "accHansen"
acc_Hansen <- tbl_df(acc_Hansen)
hdb_Hansen <- bind_cols(hdb_sf, acc_Hansen)
hdb_Hansen
## Simple feature collection with 9978 features and 16 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 11519.15 ymin: 28097.64 xmax: 45192.3 ymax: 48741.06
## projected CRS: SVY21 / Singapore TM
## # A tibble: 9,978 x 17
## Postcode `1-room` `2-room` `3-room` `4-room` `5-room` Executive HUDC
## * <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 85101 0 0 0 0 0 0 0
## 2 85201 0 0 0 0 0 0 0
## 3 85301 0 0 0 0 0 0 0
## 4 85401 0 0 0 0 0 0 0
## 5 85501 0 0 0 0 0 0 0
## 6 85601 0 0 0 0 0 0 0
## 7 85701 0 0 0 0 0 0 0
## 8 50004 0 0 248 0 2 0 0
## 9 50005 430 0 0 0 0 0 0
## 10 50032 0 86 38 3 0 0 0
## # ... with 9,968 more rows, and 9 more variables: `Multi-generation` <dbl>,
## # `Studio Apartment` <dbl>, `Type S1` <dbl>, `Type S2` <dbl>, Latitude <dbl>,
## # Longitude <dbl>, Total <dbl>, geometry <POINT [m]>, accHansen <dbl>
The code chunk below is used to compute SAM accessibility measure (refer to argument family). It performs the following tasks:
ac() is used to compute SAM accessibility values. The output object is then converted into a tibble data.frame called acc_SAM
bind columns of “acc_SAM” with “hdb_sf” sf data.frame and call the output data.frame hdb_SAM
acc_SAM <- data.frame(ac(hdb$Total,
hospital$`No Beds`,
dismat_km,
d0 = 30,
power = 2,
family = "SAM"))
colnames(acc_SAM) <- "accSAM"
acc_SAM <- tbl_df(acc_SAM)
hdb_SAM <- bind_cols(hdb_sf, acc_SAM)
hdb_SAM
## Simple feature collection with 9978 features and 16 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 11519.15 ymin: 28097.64 xmax: 45192.3 ymax: 48741.06
## projected CRS: SVY21 / Singapore TM
## # A tibble: 9,978 x 17
## Postcode `1-room` `2-room` `3-room` `4-room` `5-room` Executive HUDC
## * <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 85101 0 0 0 0 0 0 0
## 2 85201 0 0 0 0 0 0 0
## 3 85301 0 0 0 0 0 0 0
## 4 85401 0 0 0 0 0 0 0
## 5 85501 0 0 0 0 0 0 0
## 6 85601 0 0 0 0 0 0 0
## 7 85701 0 0 0 0 0 0 0
## 8 50004 0 0 248 0 2 0 0
## 9 50005 430 0 0 0 0 0 0
## 10 50032 0 86 38 3 0 0 0
## # ... with 9,968 more rows, and 9 more variables: `Multi-generation` <dbl>,
## # `Studio Apartment` <dbl>, `Type S1` <dbl>, `Type S2` <dbl>, Latitude <dbl>,
## # Longitude <dbl>, Total <dbl>, geometry <POINT [m]>, accSAM <dbl>
We can check if there are any NA values in hdb_SAM and hdb_Hansen.
sum(is.na(hdb_SAM))
## [1] 0
sum(is.na(hdb_Hansen))
## [1] 0
We can see that there are no NA values! We can move on to visualising the geographical accessibility measures.
We will change the tmap mode to view for interactive mapping.
tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(hospital_sf) +
tm_symbols(size = 0.1) +
tm_shape(hdb_Hansen) +
tm_bubbles(col = "accHansen",
n = 5,
style = "quantile",
size = 0.001,
border.col = "black",
border.lwd = 1)
The darker spatial points have higher accessibility and lighter spatial points have lower accessibility.
We can see that the west and north west region have relatively low poor accessibility to hospital facilities.
We are going to compare the distribution of Hansen’s accessibility values by planning region.
Firstly, we will add the planning regions from mpsz_svy21 into hdb_Hansen sf point data.frame by using st_join() function of sf package which is a spatial join based on geometry. It will overlap the hospital spatial points with the mpsz_svy21 geometry by using st_intersects.
hdb_Hansen <- st_transform(hdb_Hansen, 3414)
hdb_Hansen <- st_join(hdb_Hansen, mpsz_svy21, join = st_intersects)
hdb_Hansen
## Simple feature collection with 9978 features and 31 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 11519.15 ymin: 28097.64 xmax: 45192.3 ymax: 48741.06
## projected CRS: SVY21 / Singapore TM
## # A tibble: 9,978 x 32
## Postcode `1-room` `2-room` `3-room` `4-room` `5-room` Executive HUDC
## * <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 85101 0 0 0 0 0 0 0
## 2 85201 0 0 0 0 0 0 0
## 3 85301 0 0 0 0 0 0 0
## 4 85401 0 0 0 0 0 0 0
## 5 85501 0 0 0 0 0 0 0
## 6 85601 0 0 0 0 0 0 0
## 7 85701 0 0 0 0 0 0 0
## 8 50004 0 0 248 0 2 0 0
## 9 50005 430 0 0 0 0 0 0
## 10 50032 0 86 38 3 0 0 0
## # ... with 9,968 more rows, and 24 more variables: `Multi-generation` <dbl>,
## # `Studio Apartment` <dbl>, `Type S1` <dbl>, `Type S2` <dbl>, Latitude <dbl>,
## # Longitude <dbl>, Total <dbl>, geometry <POINT [m]>, accHansen <dbl>,
## # OBJECTID <int>, SUBZONE_NO <int>, SUBZONE_N <chr>, SUBZONE_C <chr>,
## # CA_IND <chr>, PLN_AREA_N <chr>, PLN_AREA_C <chr>, REGION_N <chr>,
## # REGION_C <chr>, INC_CRC <chr>, FMEL_UPD_D <date>, X_ADDR <dbl>,
## # Y_ADDR <dbl>, SHAPE_Leng <dbl>, SHAPE_Area <dbl>
Next, we will plot the distribution using geom_boxplot() function of ggplot2 package.
ggplot(data = hdb_Hansen, aes(y = accHansen, x = REGION_N)) +
geom_boxplot() +
geom_point(stat = "summary",
fun.y = "mean",
colour = "red",
size = 4)
## No summary function supplied, defaulting to `mean_se()`
We can observe that the values are skewed such that we can’t see the box plot, the interquartile range and the mean properly, hence we can use log of the accHansen values for the plot.
ggplot(data = hdb_Hansen, aes(y = log(accHansen), x = REGION_N)) +
geom_boxplot() +
geom_point(stat = "summary",
fun.y = "mean",
colour = "red",
size = 4)
## No summary function supplied, defaulting to `mean_se()`
From this boxplot, it is reinforced again that the west region have relatively lower accessibility values and the regions with higher hospital facilities would be the central region & the north-east region.
To confirm the observation, confirmatory data analysis can be performed using ggbetweenstats() function of ggstatsplot package. This is also called the ANOVA test.
hdb_Hansen$log_accHansen <- log(hdb_Hansen$accHansen)
ggbetweenstats(data = hdb_Hansen,
x = REGION_N,
y = log_accHansen,
pairwise.comparisons = TRUE,
p.adjust.method = "fdr",
title = "Hansen's accessibility values by HDB location and by Region",
caption = "Hansen's values")
Let’s try mapping the boxplot based on one region, the central region and the different planning subzones.
hdb_Hansen_central <- hdb_Hansen %>%
filter(REGION_C == "CR")
ggplot(data = hdb_Hansen_central, aes(y = log_accHansen, x = PLN_AREA_N)) +
geom_boxplot(alpha = 0.5) +
geom_point(stat = "summary",
fun.y = "mean",
colour = "red",
size = 4) +
theme_gray(base_size = 7)
## No summary function supplied, defaulting to `mean_se()`
We can see that the 2 planning areas with highest accessibility values are Rochor and Downtown.
We will change the tmap mode to view for interactive mapping.
tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(hospital_sf) +
tm_symbols(size = 0.1) +
tm_shape(hdb_SAM) +
tm_bubbles(col = "accSAM",
n = 5,
style = "quantile",
size = 0.001,
border.col = "black",
border.lwd = 1)
We are going to compare the distribution of SAM’s accessibility values by planning region.
We will add the planning regions from mpsz_svy21 into hdb_SAM sf point data.frame by using st_join() function of sf package.
hdb_SAM <- st_transform(hdb_SAM, 3414)
hdb_SAM <- st_join(hdb_SAM, mpsz_svy21, join = st_intersects)
hdb_SAM
## Simple feature collection with 9978 features and 31 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 11519.15 ymin: 28097.64 xmax: 45192.3 ymax: 48741.06
## projected CRS: SVY21 / Singapore TM
## # A tibble: 9,978 x 32
## Postcode `1-room` `2-room` `3-room` `4-room` `5-room` Executive HUDC
## * <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 85101 0 0 0 0 0 0 0
## 2 85201 0 0 0 0 0 0 0
## 3 85301 0 0 0 0 0 0 0
## 4 85401 0 0 0 0 0 0 0
## 5 85501 0 0 0 0 0 0 0
## 6 85601 0 0 0 0 0 0 0
## 7 85701 0 0 0 0 0 0 0
## 8 50004 0 0 248 0 2 0 0
## 9 50005 430 0 0 0 0 0 0
## 10 50032 0 86 38 3 0 0 0
## # ... with 9,968 more rows, and 24 more variables: `Multi-generation` <dbl>,
## # `Studio Apartment` <dbl>, `Type S1` <dbl>, `Type S2` <dbl>, Latitude <dbl>,
## # Longitude <dbl>, Total <dbl>, geometry <POINT [m]>, accSAM <dbl>,
## # OBJECTID <int>, SUBZONE_NO <int>, SUBZONE_N <chr>, SUBZONE_C <chr>,
## # CA_IND <chr>, PLN_AREA_N <chr>, PLN_AREA_C <chr>, REGION_N <chr>,
## # REGION_C <chr>, INC_CRC <chr>, FMEL_UPD_D <date>, X_ADDR <dbl>,
## # Y_ADDR <dbl>, SHAPE_Leng <dbl>, SHAPE_Area <dbl>
Next, we will plot the distribution using geom_boxplot() function of ggplot2 package.
ggplot(data = hdb_SAM, aes(y = log(accSAM), x = REGION_N)) +
geom_boxplot() +
geom_point(stat = "summary",
fun.y = "mean",
colour = "red",
size = 4)
## No summary function supplied, defaulting to `mean_se()`
From SAM boxplot, we can also see similar observations where the west region is also under-served for hospital facilities and have low accessibility values.