The objectives of this project are:
(1)Investigate in detail the old-age dependency ratio in Singapore by subzones
This would help us better understand the number of elderly aged 65 and above as a proportion of the Economically Active Population (EAP) by subzones. This will help shed light on the spatial distribution of the number of elderly in Singapore and, thus, areas with higher dependency ratios can be identified.
There are 4 datasets used in this project:
Title: respopagsex2000to2018.csv (CSV file) (https://www.singstat.gov.sg/find-data/search-by-theme/population/geographic-distribution/latest-data)
Title: ‘Master Plan 2014 Subzone Boundary (No Sea) (SHP)’ Name: MP14_SUBZONE_NO_SEA_PL.shp Source: https://geo.data.gov.sg/mp14-subzone-no-sea-pl/2016/05/11/shp/mp14-subzone-no-sea-pl.zip
Title: CHAS Clinics (KML) Name: chas-clinics-kml.kml Source: https://geo.data.gov.sg/moh-chas-clinics/2020/09/02/kml/moh-chas-clinics.kml
Title: ELDERCARE SERVICES Name: ELDERCARE.shp Source: https://geo.data.gov.sg/eldercare/2016/07/28/kml/eldercare.zip
1. Interactive Map of Aged Dependency Ratio in 2011 and 2018 comparison
2. Interactive Map of percentage change in Aged Dependency Ratio from 2011 to 2015 and between 2015 and 2018 comparison
3. Interactive Map of aged dependency map layered with data of CHAS Clinics and Elderly Care Services (with clustered option)
4. Density plots of amenities to easily identify spatial patterns of amenities distribution
Proposed R Visualizations
Sketches
sketch
Data Challenges 1. Data cleaning of KML files
KML files have description data in HTML formats which require extensive data cleaning in order to be extracted. There are several functions which are found to be useful in the process of data cleaning.
# replace_html() #clean html tags for processing
# strsplit() #split strings based on certain characters or whitespaces (in this case)
# trimws()#trim whitespaces
# do.call()#to combine lists to create dataframes for joiningInitially, huge challenges were faced in layering of data as I did not know the right method to layer the data. The solution consists of a few crucial step. First, extract Name or Description data from the shape or kml file, together with an identifier (id) to identify each coordinate or polygon. Next, extract coordinates and the corresponding id. Finally, join the two together. The combined objects will form the base layer and, subsequently, functions such as tm_dots could be used to show the coordinates on top of the base layer, with the correct labels showing up, as it was correctly joined on the dataframe previously, corresponding to each coordinate.
Design challenges
How to best show changes in spatial patterns over time? tm_facet offers a side-by-side comparison of the map in years.
How to give a clear overview of the amenities and whether they are placed in critical areas in need?
Layer the maps as successive layers with tm_map. Also, in order to avoid fine-grained dots from being shown when the viewers zoom out of the map. Cluster options could be use within tm_layout to let the dots aggregate into a clustered circle showing the count of amenities in a subzone or region so that viewers can have a clearer overview. At the same time, with tm_leaflet, layers can also be selected/de-selected so that only information viewers are interested in will show up. In addition, a density plot can also be used to give a quick overview of the spatial patterns for each type of amenities for an even clearer summary.
library(rgdal)
library(sf)
library(tmap)
library(tidyverse)
library(readr)
library(dplyr)
library(textclean)
library(maptools)
library(tmap)This part of the code took reference from https://rpubs.com/tskam/Choropleth_Mapping with modifications
agedata2011<-read_csv("data\\aspatial\\respopagsex2000to2018.csv")
popaged <- function(year){
x<- agedata2011 %>%
filter(Time==year)%>%
pivot_wider(names_from="AG", values_from="Pop") %>%
mutate(YOUNG = `0_to_4`+`5_to_9`+`10_to_14`+`15_to_19`+`20_to_24`) %>%
mutate(`ECONOMY ACTIVE` = rowSums(.[9:13])+rowSums(.[15:17]))%>%
mutate(`AGED`=rowSums(.[18:22])) %>%
mutate(`TOTAL`=rowSums(.[5:22])) %>%
mutate_at(.vars = vars(PA, SZ), toupper) %>%
select(`PA`, `SZ`, `YOUNG`, `ECONOMY ACTIVE`, `AGED`, `TOTAL`) %>%
filter(`ECONOMY ACTIVE` > 0) %>%
filter(`AGED` > 0)%>%
group_by(PA,SZ) %>%
summarize_all(sum) %>%
mutate(`AGEDDEPENDENCY` = (`AGED`)/`ECONOMY ACTIVE`)
return(x)
}
agedpop2011<-popaged(2011)
agedpop2018<-popaged(2018)mp2014<-st_read("data\\spatial\\mp2014", layer="MP14_SUBZONE_NO_SEA_PL")## Reading layer `MP14_SUBZONE_NO_SEA_PL' from data source `C:\Users\Tan\Desktop\Y3S2\IS428_G1_Visual_Analytics_for_Business_Intelligence\A5\data\spatial\mp2014' using driver `ESRI Shapefile'
## Simple feature collection with 323 features and 15 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS: SVY21
mp_joined_2011 <- left_join(mp2014,agedpop2011, by = c("SUBZONE_N" = "SZ"))
mp_joined_2018 <- left_join(mp2014,agedpop2018, by = c("SUBZONE_N" = "SZ"))tmap_mode("view")
staticagedmap <- function(dataset){
tm_shape(dataset)+
tm_fill("AGEDDEPENDENCY",
style = "quantile",
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1,
popup.vars=c("SUBZONE_N","AGEDDEPENDENCY"),
id="SUBZONE_N", palette= c("Blues"), na.rm=TRUE) +
tm_layout(legend.height = 0.45,
legend.width = 0.35,
legend.outside = FALSE,
legend.position = c("right", "bottom"),
frame = FALSE) + tm_borders(col="blue") +
tm_style("cobalt")
}
agedependencymap2011<- staticagedmap(mp_joined_2011)
agedependencymap2018<- staticagedmap(mp_joined_2018)
tmap_arrange(agedependencymap2011,agedependencymap2018)For Visualisation 1, we cannot really infer any big differences between the two maps. We can try to compare percentage change between 2011 to 2015 and then 2015 to 2018 to see if we can tell the difference in percentage change over the years.
agedpop2011<-popaged(2011)
agedpop2015<-popaged(2015)
agedpop2018<-popaged(2018)
change2011<- inner_join(agedpop2011,agedpop2015,by="SZ")%>%
mutate(change= ((AGEDDEPENDENCY.y-AGEDDEPENDENCY.x)/AGEDDEPENDENCY.x)*100)%>%
filter(change>=-100 & change<=100)
change2018<- inner_join(agedpop2015,agedpop2018,by="SZ")%>%
mutate(change= ((AGEDDEPENDENCY.y-AGEDDEPENDENCY.x)/AGEDDEPENDENCY.x)*100)%>%
filter(change>=-100 & change<=100)
change2011[is.na(change2011)]<-0
change2018[is.na(change2018)]<-0change2011 <- inner_join(mp2014, change2011, by = c("SUBZONE_N" = "SZ"))
change2018 <- inner_join(mp2014, change2018, by = c("SUBZONE_N" = "SZ"))This is slightly modified from the function created in visualization 1
tmap_mode("view")
changeagedmap <- function(dataset){
tm_shape(dataset)+
tm_fill("change",
style = "quantile",
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1,
popup.vars=c("SUBZONE_N","change"),
id="SUBZONE_N", palette= c("Blues"), na.rm=TRUE) +
tm_layout(legend.height = 0.45,
legend.width = 0.35,
legend.outside = FALSE,
legend.position = c("right", "bottom"),
frame = FALSE) + tm_borders(col="blue")
}
changedmap2011<- changeagedmap(change2011)
changedmap2018<- changeagedmap(change2011)
tmap_arrange(changedmap2011, changedmap2018)Disclaimer: Do note that values of percentage change are not accurate as this is based on Master Plan 2014 which may not be reflective of the different subzone areas in Singapore at different time periods.
However, percentage change have been deliberately filtered such that it’s between -100% to 100%.
This is to give viewers a general overview of the change as a comparison in age dependency side-by-side may not reveal much. One can still observe that estates which are older have comparatively faster increase in old aged persons. For example, KAMPONG UBI, with 41.8% change. However, the change percentage should only give a general indication as it is not accurate due to change in subzone boundaries in Master Plan.
Using text clean and for-loops, clean data of kml files and obtain the names of CHAS clinics and their location in Singapore
This data cleaning process was inspired based on https://rpubs.com/ralphtay/645209
However, further work was done to tailor to the unique case.
chasclinics<-st_read("data\\spatial\\chas-clinics\\chas-clinics-kml.kml")## Reading layer `MOH_CHAS_CLINICS' from data source `C:\Users\Tan\Desktop\Y3S2\IS428_G1_Visual_Analytics_for_Business_Intelligence\A5\data\spatial\chas-clinics\chas-clinics-kml.kml' using driver `KML'
## Simple feature collection with 1167 features and 2 fields
## geometry type: POINT
## dimension: XYZ
## bbox: xmin: 103.5818 ymin: 1.016264 xmax: 103.9903 ymax: 1.456037
## z_range: zmin: 0 zmax: 0
## geographic CRS: WGS 84
chas1 <- data.frame(chasclinics)
chas2<- data.frame(apply(chas1["Description"], 1, FUN=replace_html))
names(chas2)[1] <- "Description"
splitstringfn <- function(x){
return(strsplit(x, split="\\s+"))
}
chas3 <- apply(chas2["Description"], 1, FUN="splitstringfn")
location<- list()
for (i in 1:1137){
x<- paste0(trimws(unlist(chas3[i])))
for(j in 1:length(x)){
if(x[j]=="BUILDING_NAME"){
n<- j-1}
else if(x[j]=="STREET_NAME"){
d<-j+1
}
}
location[i]<-""
for (k in d:n){
location[i]<- paste(location[i], trimws(unlist(chas3[i])[k]), sep=" ")
}
}
NAME<-list()
for (i in 1:1137) {
NAME[i]<- paste0("kml_",i) }
finalchas <- do.call(rbind, Map(data.frame, A=NAME, B=location))
names(finalchas) <- c("NAME", "location")
chas3geo<- chasclinics %>%
select("Name", "geometry")
Chas_Clinics_Singapore<- full_join(chas3geo, finalchas, by=c("Name"="NAME"))elderlycare<-st_read("data\\spatial\\elder", layer="ELDERCARE")## Reading layer `ELDERCARE' from data source `C:\Users\Tan\Desktop\Y3S2\IS428_G1_Visual_Analytics_for_Business_Intelligence\A5\data\spatial\elder' using driver `ESRI Shapefile'
## Simple feature collection with 133 features and 18 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 14481.92 ymin: 28218.43 xmax: 41665.14 ymax: 46804.9
## projected CRS: SVY21
df1<-elderlycare %>%
select("NAME","OBJECTID","geometry")
df2<-elderlycare %>%
select("OBJECTID","ADDRESSSTR")
Eldercare_Services<- st_join(df1, df2, by=c("OBJECTID"="OBJECTID"))joined_data <- left_join(mp2014, agedpop2018, by = c("SUBZONE_N" = "SZ"))
agedependency_with_amenities2018 <-
tm_basemap("Esri.WorldTopoMap")+
tm_shape(joined_data)+
tm_fill("AGEDDEPENDENCY",
style = "quantile",
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1,
popup.vars=c("SUBZONE_N","AGEDDEPENDENCY"),
palette = "Blues",
id="SUBZONE_N") +
tm_layout(legend.height = 0.45,
legend.width = 0.35,
legend.outside = FALSE,
legend.position = c("right", "bottom"),
frame = FALSE) + tm_borders(col="blue") +
tm_shape(Chas_Clinics_Singapore)+
tm_dots(size=0.05, col="white",shape=0, alpha=0.7, clustering=T)+
tm_shape(Eldercare_Services)+
tm_dots(size=0.05,col="yellow",shape=0,alpha=0.7,clustering=T)
agedependency_with_amenities2018## legend.postion is used for plot mode. Use view.legend.position in tm_view to set the legend position in view mode.
## Symbol shapes other than circles or icons are not supported in view mode.
tmap_mode("view")## tmap mode set to interactive viewing
Do this for both CHAS clinics and Elderly Care Services
library(gridExtra)
ggchas<- chasclinics %>%
mutate(long = st_coordinates(.)[,1],
lat = st_coordinates(.)[,2])
ggchasg<-ggplot(Chas_Clinics_Singapore, aes(x=ggchas$long,y=ggchas$lat)) +
coord_equal() +
xlab('Longitude') +
ylab('Latitude') +
stat_density2d(aes(fill = ..level..), alpha = .5,
geom = "polygon", data = Chas_Clinics_Singapore) +
scale_fill_viridis_c() +
theme(legend.position = 'bottom')
ggelderly<- Eldercare_Services["geometry"] %>%
mutate(long = st_coordinates(.)[,1],
lat = st_coordinates(.)[,2])
ggchaseldercare<-ggplot(Eldercare_Services, aes(x=ggelderly$long,y=ggelderly$lat)) +
coord_equal() +
xlab('Longitude') +
ylab('Latitude') +
stat_density2d(aes(fill = ..level..), alpha = .5,
geom = "polygon", data = Eldercare_Services) +
scale_fill_viridis_c() +
theme(legend.position = 'bottom')
grid.arrange(ggchasg,ggchaseldercare, ncol=2)From visualization 1, viewers can observe that there is not much of difference in the change of aged dependency ratio as expected. However, it is evident from the visualization that the aged dependency ratio is higher as it tends towards the South Eastern parts of Singapore. In 2011, the higher range is between 0.206 to 0.250 but in 2018, that figure has climbed towards 0.356 and 0.800. Lim Chu Kang, for example, has a ratio of 0.5 which means that for every 2 Economically Active Person, there is 1 who is an elderly aged 65 above. Other subzones with high aged dependency ratios are Holland Drive, Alexandra Hill and Geylang Bahru. For more, one can click on relevant parts of the map in visualization 1 to gain insights. Visualization 2 has a similar trend, places with high aged dependency ratio are also ageing much faster, some of these subzones include Kampong Java, Henderson Hill and Marine Parade in the range between 35.6% to 80% increase in year 2018.
From visualization 3, viewers can also tell that amenities are sufficient and that the number of elderly care services as well as CHAS clinics increase along with aged dependency ratio. This is good as it suggests that there are adequate amenities to support the demand increase as more population age in certain subzones. For example, at the South of Singapore, there are 77 elderly care services, and this area happens to be the area with the highest aged dependency ratio as well. As the map expands outwards from the central area, the aged dependency ratio decreases and so does the number of CHAS clinics and elderly care services.
From the density plot in visualization 4, it is obvious that there are 3 clusters of CHAS clinics, one in south and two in the North Eastern area. This confirms the previous insight, but the spatial pattern is more obvious in the density plot. On the right which is elderly care services, there is a similar pattern, there are 2 clusters with the concentrated one in the southern part of Singapore and a less concentrated one expanding from the North Eastern parts of Singapore.