Introduction

The objectives of this project are:

(1)Investigate in detail the old-age dependency ratio in Singapore by subzones

This would help us better understand the number of elderly aged 65 and above as a proportion of the Economically Active Population (EAP) by subzones. This will help shed light on the spatial distribution of the number of elderly in Singapore and, thus, areas with higher dependency ratios can be identified.

  1. Determine if there is sufficient amenities to cope with the ageing population More importantly, this project also aims to determine if there is sufficient CHAS clinics and elderly care services to cope with the increased demand as a result of our ageing population

There are 4 datasets used in this project:

  1. Title: respopagsex2000to2018.csv (CSV file) (https://www.singstat.gov.sg/find-data/search-by-theme/population/geographic-distribution/latest-data)

  2. Title: ‘Master Plan 2014 Subzone Boundary (No Sea) (SHP)’ Name: MP14_SUBZONE_NO_SEA_PL.shp Source: https://geo.data.gov.sg/mp14-subzone-no-sea-pl/2016/05/11/shp/mp14-subzone-no-sea-pl.zip

  3. Title: CHAS Clinics (KML) Name: chas-clinics-kml.kml Source: https://geo.data.gov.sg/moh-chas-clinics/2020/09/02/kml/moh-chas-clinics.kml

  4. Title: ELDERCARE SERVICES Name: ELDERCARE.shp Source: https://geo.data.gov.sg/eldercare/2016/07/28/kml/eldercare.zip

1. Interactive Map of Aged Dependency Ratio in 2011 and 2018 comparison

2. Interactive Map of percentage change in Aged Dependency Ratio from 2011 to 2015 and between 2015 and 2018 comparison

3. Interactive Map of aged dependency map layered with data of CHAS Clinics and Elderly Care Services (with clustered option)

4. Density plots of amenities to easily identify spatial patterns of amenities distribution

Proposed R Visualizations

Sketches

Planned Sketches

sketch

Challenges and solutions

Describe the major data and design challenges faced in accomplishing the task, and how you plan to overcome these challenges with a proposed sketched design.

Data Challenges 1. Data cleaning of KML files

KML files have description data in HTML formats which require extensive data cleaning in order to be extracted. There are several functions which are found to be useful in the process of data cleaning.

# replace_html() #clean html tags for processing
# strsplit() #split strings based on certain characters or whitespaces (in this case)
# trimws()#trim whitespaces
# do.call()#to combine lists to create dataframes for joining
  1. How to layer all the datasets and show them meaningfully together in a single, interactive map?

Initially, huge challenges were faced in layering of data as I did not know the right method to layer the data. The solution consists of a few crucial step. First, extract Name or Description data from the shape or kml file, together with an identifier (id) to identify each coordinate or polygon. Next, extract coordinates and the corresponding id. Finally, join the two together. The combined objects will form the base layer and, subsequently, functions such as tm_dots could be used to show the coordinates on top of the base layer, with the correct labels showing up, as it was correctly joined on the dataframe previously, corresponding to each coordinate.

Design challenges

How to best show changes in spatial patterns over time? tm_facet offers a side-by-side comparison of the map in years.

How to give a clear overview of the amenities and whether they are placed in critical areas in need?

Layer the maps as successive layers with tm_map. Also, in order to avoid fine-grained dots from being shown when the viewers zoom out of the map. Cluster options could be use within tm_layout to let the dots aggregate into a clustered circle showing the count of amenities in a subzone or region so that viewers can have a clearer overview. At the same time, with tm_leaflet, layers can also be selected/de-selected so that only information viewers are interested in will show up. In addition, a density plot can also be used to give a quick overview of the spatial patterns for each type of amenities for an even clearer summary.

Step-by-Step guide to creating visualizations

Visualisation 1

1. Load packages

library(rgdal)
library(sf)
library(tmap)
library(tidyverse)
library(readr)
library(dplyr)
library(textclean)
library(maptools)
library(tmap)

2. Read and Wrangle Data and obtain two dataframes - one for 2011 and the other for 2018

This part of the code took reference from https://rpubs.com/tskam/Choropleth_Mapping with modifications

agedata2011<-read_csv("data\\aspatial\\respopagsex2000to2018.csv") 

popaged <- function(year){
  x<- agedata2011 %>%
   filter(Time==year)%>%
   pivot_wider(names_from="AG", values_from="Pop") %>% 
   mutate(YOUNG = `0_to_4`+`5_to_9`+`10_to_14`+`15_to_19`+`20_to_24`) %>% 
   mutate(`ECONOMY ACTIVE` = rowSums(.[9:13])+rowSums(.[15:17]))%>%
  mutate(`AGED`=rowSums(.[18:22])) %>%
  mutate(`TOTAL`=rowSums(.[5:22])) %>%
  mutate_at(.vars = vars(PA, SZ), toupper) %>%
  select(`PA`, `SZ`, `YOUNG`, `ECONOMY ACTIVE`, `AGED`, `TOTAL`) %>% 
  filter(`ECONOMY ACTIVE` > 0) %>%
  filter(`AGED` > 0)%>%
  group_by(PA,SZ) %>%
  summarize_all(sum) %>%
  mutate(`AGEDDEPENDENCY` = (`AGED`)/`ECONOMY ACTIVE`)
  return(x)
  
}

agedpop2011<-popaged(2011)
agedpop2018<-popaged(2018)

3. Join dataframes with Shape files

mp2014<-st_read("data\\spatial\\mp2014", layer="MP14_SUBZONE_NO_SEA_PL")
## Reading layer `MP14_SUBZONE_NO_SEA_PL' from data source `C:\Users\Tan\Desktop\Y3S2\IS428_G1_Visual_Analytics_for_Business_Intelligence\A5\data\spatial\mp2014' using driver `ESRI Shapefile'
## Simple feature collection with 323 features and 15 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS:  SVY21
mp_joined_2011 <- left_join(mp2014,agedpop2011, by = c("SUBZONE_N" = "SZ"))
mp_joined_2018 <- left_join(mp2014,agedpop2018, by = c("SUBZONE_N" = "SZ"))

4. Create a function to create a static map for between 2011 and 2018 to compare Aged Dependency Ratio

tmap_mode("view")

staticagedmap <- function(dataset){
  tm_shape(dataset)+
  tm_fill("AGEDDEPENDENCY", 
          style = "quantile",
          legend.hist = TRUE, 
          legend.is.portrait = TRUE,
          legend.hist.z = 0.1,
          popup.vars=c("SUBZONE_N","AGEDDEPENDENCY"),
          id="SUBZONE_N", palette= c("Blues"), na.rm=TRUE) +
  tm_layout(legend.height = 0.45, 
            legend.width = 0.35,
            legend.outside = FALSE,
            legend.position = c("right", "bottom"),
            frame = FALSE) + tm_borders(col="blue") +
    tm_style("cobalt")
}


agedependencymap2011<- staticagedmap(mp_joined_2011)
agedependencymap2018<- staticagedmap(mp_joined_2018)

tmap_arrange(agedependencymap2011,agedependencymap2018)

For Visualisation 1, we cannot really infer any big differences between the two maps. We can try to compare percentage change between 2011 to 2015 and then 2015 to 2018 to see if we can tell the difference in percentage change over the years.

Visualization 2

1. Wrangle data to calculate percentage in aged dependency ratio

agedpop2011<-popaged(2011)
agedpop2015<-popaged(2015)
agedpop2018<-popaged(2018)
change2011<- inner_join(agedpop2011,agedpop2015,by="SZ")%>%
                mutate(change= ((AGEDDEPENDENCY.y-AGEDDEPENDENCY.x)/AGEDDEPENDENCY.x)*100)%>%
                filter(change>=-100 & change<=100)
change2018<- inner_join(agedpop2015,agedpop2018,by="SZ")%>%
                mutate(change= ((AGEDDEPENDENCY.y-AGEDDEPENDENCY.x)/AGEDDEPENDENCY.x)*100)%>%
                filter(change>=-100 & change<=100)

change2011[is.na(change2011)]<-0
change2018[is.na(change2018)]<-0

2. Join dataframe to Master Plan 2014 Shape file which is in between year 2011 to 2018

change2011 <- inner_join(mp2014, change2011, by = c("SUBZONE_N" = "SZ"))
change2018 <- inner_join(mp2014, change2018, by = c("SUBZONE_N" = "SZ"))

3. Create a function to plot change map

This is slightly modified from the function created in visualization 1

tmap_mode("view")

changeagedmap <- function(dataset){
  tm_shape(dataset)+
  tm_fill("change", 
          style = "quantile",
          legend.hist = TRUE, 
          legend.is.portrait = TRUE,
          legend.hist.z = 0.1,
          popup.vars=c("SUBZONE_N","change"),
          id="SUBZONE_N", palette= c("Blues"), na.rm=TRUE) +
  tm_layout(legend.height = 0.45, 
            legend.width = 0.35,
            legend.outside = FALSE,
            legend.position = c("right", "bottom"),
            frame = FALSE) + tm_borders(col="blue") 
}


changedmap2011<- changeagedmap(change2011)
changedmap2018<- changeagedmap(change2011)


tmap_arrange(changedmap2011, changedmap2018)

Disclaimer: Do note that values of percentage change are not accurate as this is based on Master Plan 2014 which may not be reflective of the different subzone areas in Singapore at different time periods.

However, percentage change have been deliberately filtered such that it’s between -100% to 100%.

This is to give viewers a general overview of the change as a comparison in age dependency side-by-side may not reveal much. One can still observe that estates which are older have comparatively faster increase in old aged persons. For example, KAMPONG UBI, with 41.8% change. However, the change percentage should only give a general indication as it is not accurate due to change in subzone boundaries in Master Plan.

Visualisation 3

1. Clean data for chasclinics KML file

Using text clean and for-loops, clean data of kml files and obtain the names of CHAS clinics and their location in Singapore

This data cleaning process was inspired based on https://rpubs.com/ralphtay/645209

However, further work was done to tailor to the unique case.

chasclinics<-st_read("data\\spatial\\chas-clinics\\chas-clinics-kml.kml")
## Reading layer `MOH_CHAS_CLINICS' from data source `C:\Users\Tan\Desktop\Y3S2\IS428_G1_Visual_Analytics_for_Business_Intelligence\A5\data\spatial\chas-clinics\chas-clinics-kml.kml' using driver `KML'
## Simple feature collection with 1167 features and 2 fields
## geometry type:  POINT
## dimension:      XYZ
## bbox:           xmin: 103.5818 ymin: 1.016264 xmax: 103.9903 ymax: 1.456037
## z_range:        zmin: 0 zmax: 0
## geographic CRS: WGS 84
chas1 <- data.frame(chasclinics)
chas2<- data.frame(apply(chas1["Description"], 1, FUN=replace_html))
names(chas2)[1] <- "Description"
splitstringfn <- function(x){
  return(strsplit(x, split="\\s+"))
}

chas3 <- apply(chas2["Description"], 1, FUN="splitstringfn")

location<- list()

for (i in 1:1137){
  x<- paste0(trimws(unlist(chas3[i])))
  for(j in 1:length(x)){
    if(x[j]=="BUILDING_NAME"){
      n<- j-1}

    else if(x[j]=="STREET_NAME"){
      d<-j+1
    }
  }

  location[i]<-""
  for (k in d:n){

      location[i]<- paste(location[i], trimws(unlist(chas3[i])[k]), sep=" ")

  }

}

NAME<-list()
for (i in 1:1137) {
    NAME[i]<- paste0("kml_",i) }

finalchas <- do.call(rbind, Map(data.frame, A=NAME, B=location))
names(finalchas) <- c("NAME", "location")


chas3geo<- chasclinics %>%
  select("Name", "geometry")


Chas_Clinics_Singapore<- full_join(chas3geo, finalchas, by=c("Name"="NAME"))

2. Clean data for Elderly Care Services shape file

elderlycare<-st_read("data\\spatial\\elder", layer="ELDERCARE")
## Reading layer `ELDERCARE' from data source `C:\Users\Tan\Desktop\Y3S2\IS428_G1_Visual_Analytics_for_Business_Intelligence\A5\data\spatial\elder' using driver `ESRI Shapefile'
## Simple feature collection with 133 features and 18 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 14481.92 ymin: 28218.43 xmax: 41665.14 ymax: 46804.9
## projected CRS:  SVY21
df1<-elderlycare %>%
  select("NAME","OBJECTID","geometry")

df2<-elderlycare %>%
  select("OBJECTID","ADDRESSSTR")

Eldercare_Services<- st_join(df1, df2, by=c("OBJECTID"="OBJECTID"))

3. Plot a map with the latest Age Dependency ratio in Singapore based on year 2018’s data with layered data CHAS clinics and Elderly Care Services

joined_data <- left_join(mp2014, agedpop2018, by = c("SUBZONE_N" = "SZ"))


agedependency_with_amenities2018 <- 
  tm_basemap("Esri.WorldTopoMap")+
  tm_shape(joined_data)+
  tm_fill("AGEDDEPENDENCY", 
          style = "quantile",
          legend.hist = TRUE, 
          legend.is.portrait = TRUE,
          legend.hist.z = 0.1,
          popup.vars=c("SUBZONE_N","AGEDDEPENDENCY"),
          palette = "Blues",
          id="SUBZONE_N") +
  tm_layout(legend.height = 0.45, 
            legend.width = 0.35,
            legend.outside = FALSE,
            legend.position = c("right", "bottom"),
            frame = FALSE) + tm_borders(col="blue") +
  tm_shape(Chas_Clinics_Singapore)+
  tm_dots(size=0.05, col="white",shape=0, alpha=0.7, clustering=T)+
  tm_shape(Eldercare_Services)+
  tm_dots(size=0.05,col="yellow",shape=0,alpha=0.7,clustering=T)
agedependency_with_amenities2018
## legend.postion is used for plot mode. Use view.legend.position in tm_view to set the legend position in view mode.
## Symbol shapes other than circles or icons are not supported in view mode.
tmap_mode("view")
## tmap mode set to interactive viewing

Visualisation 4

1. Extract coordinates and plot a density plot using stat_density2dplot

Do this for both CHAS clinics and Elderly Care Services

library(gridExtra)

ggchas<- chasclinics %>%
 mutate(long = st_coordinates(.)[,1],
         lat = st_coordinates(.)[,2])

ggchasg<-ggplot(Chas_Clinics_Singapore, aes(x=ggchas$long,y=ggchas$lat)) + 
  coord_equal() + 
  xlab('Longitude') + 
  ylab('Latitude') + 
  stat_density2d(aes(fill = ..level..), alpha = .5,
                 geom = "polygon", data = Chas_Clinics_Singapore) + 
  scale_fill_viridis_c() + 
  theme(legend.position = 'bottom')


ggelderly<- Eldercare_Services["geometry"] %>%
 mutate(long = st_coordinates(.)[,1],
         lat = st_coordinates(.)[,2])


ggchaseldercare<-ggplot(Eldercare_Services, aes(x=ggelderly$long,y=ggelderly$lat)) + 
  coord_equal() + 
  xlab('Longitude') + 
  ylab('Latitude') + 
  stat_density2d(aes(fill = ..level..), alpha = .5,
                 geom = "polygon", data = Eldercare_Services) + 
  scale_fill_viridis_c() + 
  theme(legend.position = 'bottom')

grid.arrange(ggchasg,ggchaseldercare, ncol=2)

Results revealed from visualization

Key finding 1

From visualization 1, viewers can observe that there is not much of difference in the change of aged dependency ratio as expected. However, it is evident from the visualization that the aged dependency ratio is higher as it tends towards the South Eastern parts of Singapore. In 2011, the higher range is between 0.206 to 0.250 but in 2018, that figure has climbed towards 0.356 and 0.800. Lim Chu Kang, for example, has a ratio of 0.5 which means that for every 2 Economically Active Person, there is 1 who is an elderly aged 65 above. Other subzones with high aged dependency ratios are Holland Drive, Alexandra Hill and Geylang Bahru. For more, one can click on relevant parts of the map in visualization 1 to gain insights. Visualization 2 has a similar trend, places with high aged dependency ratio are also ageing much faster, some of these subzones include Kampong Java, Henderson Hill and Marine Parade in the range between 35.6% to 80% increase in year 2018.

Key finding 2

From visualization 3, viewers can also tell that amenities are sufficient and that the number of elderly care services as well as CHAS clinics increase along with aged dependency ratio. This is good as it suggests that there are adequate amenities to support the demand increase as more population age in certain subzones. For example, at the South of Singapore, there are 77 elderly care services, and this area happens to be the area with the highest aged dependency ratio as well. As the map expands outwards from the central area, the aged dependency ratio decreases and so does the number of CHAS clinics and elderly care services.

Key finding 3

From the density plot in visualization 4, it is obvious that there are 3 clusters of CHAS clinics, one in south and two in the North Eastern area. This confirms the previous insight, but the spatial pattern is more obvious in the density plot. On the right which is elderly care services, there is a similar pattern, there are 2 clusters with the concentrated one in the southern part of Singapore and a less concentrated one expanding from the North Eastern parts of Singapore.