R Shiny app: https://ycchoo.shinyapps.io/Assign5/
The following will be a data visualization to reveal the demographic structure of Singapore population by age, gender, type of dwelling and subzone in 2019. It will be built into a Shiny app using ggplot2 and other R packages. The main focus is to provide dropdown boxes to filter Age group and Sub zone for the plots of Gender infographics, TOD pie chart and spatial distribution of Singapore.
The data from https://www.singstat.gov.sg/find-data/search-by-theme/population/geographic-distribution/latest-data comes as a 2011-2019 series. Hence, data cleaning and filtering is required to extract only year 2019 data.
The main variables are Type of Dwelling, Gender and Age Group. The challenge lies adding interaction to the variables and actively plotting them on maps and charts.
The proposed data design will display 3 plots in a infographics for Gender ratio, pie chart for TOD and spatial distribution of Age group. There will be 2 dropdown lists to filter the Age Group and Sub Zone, with total as default for both of them. The filter selection is reactive for the 3 plots.
packages = c('ggplot2', 'shiny', 'DT', 'sf', 'tmap', 'tidyverse', 'plyr', 'png', 'devtools', 'echarts4r', 'ggpubr', 'viridis', 'plotly')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p, character.only = T)
}
#devtools::install_github("JohnCoene/echarts4r.assets")
# Add function to trim numerics to 2 decimal points
specify_decimal <- function(x, k) trimws(format(round(x, k), nsmall=k))
The code chunk below use the st_read() function of sf package to import MP14_SUBZONE_WEB_PL shapfile into R as a simple feature data frame called mpsz.
rawdata <- read_csv("Data/respopagesextod2011to2019.csv")
mpsz <- st_read(dsn = "Data/geospatial",
layer = "MP14_SUBZONE_WEB_PL")
## Reading layer `MP14_SUBZONE_WEB_PL' from data source `C:\Users\Favian\Desktop\ISSS608 Visual Analytics and Applications\Assignment 5\Assign5\Data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 323 features and 15 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS: SVY21
To categorize TOD and Age.
popdataTOD_ <- rawdata %>%
filter(Time == 2019) %>%
select(`SZ`, `AG`, `TOD`, `Pop`)
popdataTOD_ <- aggregate(cbind(Pop) ~ SZ + AG + TOD, popdataTOD_, sum)
popdataTOD_ <- popdataTOD_ %>%
spread(AG, Pop) %>%
mutate(YOUNG = `0_to_4`+`5_to_9`+`10_to_14`+
`15_to_19`+`20_to_24`) %>%
mutate(`ACTIVE` = rowSums(.[7:11])+
rowSums(.[13:15]))%>%
mutate(`AGED`=rowSums(.[16:20])) %>%
mutate(`TOTAL`=rowSums(.[22:24])) %>%
mutate_at(.vars = vars(SZ, TOD), toupper) %>%
select(`SZ`, `TOD`, `YOUNG`, `ACTIVE`, `AGED`,
`TOTAL`)
popdataTOD <- popdataTOD_
for (type in unique(popdataTOD_$TOD)) {
temp <- popdataTOD_ %>%
filter(TOD == type)
temp <- temp %>%rbind(data.frame(SZ='TOTAL', TOD = type, YOUNG = sum(temp$YOUNG), ACTIVE = sum(temp$ACTIVE), AGED = sum(temp$AGED), TOTAL = sum(temp$TOTAL)))
temp <- temp %>% filter(SZ == "TOTAL")
popdataTOD <- popdataTOD %>% rbind(data.frame(temp))
}
To categorize Age groups into 4 bands - Young (<25), Active (25 to 64), Aged (>64), Total (All age)
popdata <- rawdata %>%
filter(Time == 2019) %>%
spread(AG, Pop) %>%
mutate(YOUNG = `0_to_4`+`5_to_9`+`10_to_14`+
`15_to_19`+`20_to_24`) %>%
mutate(`ACTIVE` = rowSums(.[9:13])+
rowSums(.[15:18]))%>%
mutate(`AGED`=rowSums(.[19:24])) %>%
mutate(`TOTAL`=rowSums(.[25:27])) %>%
mutate_at(.vars = vars(PA, SZ), toupper) %>%
select(`PA`, `SZ`, `YOUNG`, `ACTIVE`, `AGED`,
`TOTAL`)
popdata <- aggregate(cbind(YOUNG, ACTIVE, AGED, TOTAL) ~ PA + SZ, popdata, sum) %>% arrange(PA)
#popdata <- popdata[apply(popdata[,-c(1:2)], 1, function(x) !all(x==0)),]
To filter out Male data rows for all age groups
popdataMale <- rawdata %>%
filter(Sex == "Males") %>%
filter(Time == 2019) %>%
spread(AG, Pop) %>%
mutate(YOUNG = `0_to_4`+`5_to_9`+`10_to_14`+
`15_to_19`+`20_to_24`) %>%
mutate(`ACTIVE` = rowSums(.[9:13])+
rowSums(.[15:18]))%>%
mutate(`AGED`=rowSums(.[19:24])) %>%
mutate(`TOTAL`=rowSums(.[25:27])) %>%
mutate_at(.vars = vars(PA, SZ), toupper) %>%
select(`PA`, `SZ`, `YOUNG`, `ACTIVE`, `AGED`,
`TOTAL`)
popdataMale <- aggregate(cbind(YOUNG, ACTIVE, AGED, TOTAL) ~ PA + SZ, popdataMale, sum) %>% arrange(PA)
#popdataMale <- popdataMale[apply(popdataMale[,-c(1:2)], 1, function(x) !all(x==0)),]
popdataMale <- popdataMale %>% rbind(data.frame(PA='TOTAL', SZ='TOTAL', YOUNG = sum(popdataMale$YOUNG), ACTIVE = sum(popdataMale$ACTIVE), AGED = sum(popdataMale$AGED), TOTAL = sum(popdataMale$TOTAL)))
To calculate the Male and Female ratio for all 4 age bands
popdataGender <- popdata
popdataGender <- popdataGender %>% rbind(data.frame(PA='TOTAL', SZ='TOTAL', YOUNG = sum(popdataGender$YOUNG), ACTIVE = sum(popdataGender$ACTIVE), AGED = sum(popdataGender$AGED), TOTAL = sum(popdataGender$TOTAL)))
popdataGender <- popdataGender %>%
mutate(YOUNG_M = popdataMale$YOUNG/`YOUNG`*100) %>%
mutate(ACTIVE_M = popdataMale$ACTIVE/`ACTIVE`*100) %>%
mutate(AGED_M = popdataMale$AGED/`AGED`*100) %>%
mutate(TOTAL_M = popdataMale$TOTAL/`TOTAL`*100) %>%
select(`PA`, `SZ`, `YOUNG_M`, `ACTIVE_M`, `AGED_M`,
`TOTAL_M`)
popdataGender <- popdataGender %>%
mutate(YOUNG_F = 100 - `YOUNG_M`) %>%
mutate(ACTIVE_F = 100 - `ACTIVE_M`) %>%
mutate(AGED_F = 100 - `AGED_M`) %>%
mutate(TOTAL_F = 100 - `TOTAL_M`)
Next, left_join() of dplyr is used to join the geographical data and attribute table using planning subzone name e.g. SUBZONE_N and SZ as the common identifier.
mpsz_popdata <- left_join(mpsz, popdata,
by = c("SUBZONE_N" = "SZ"))
With tmap function according to input variable for selected Age group
tm_shape(mpsz_popdata) +
tm_fill("TOTAL", n=6, style = "pretty", palette = "Blues") +
tm_borders(alpha = 0.5) +
tm_bubbles(size = "TOTAL", col = "REGION_N") +
tm_layout(main.title = paste("Spatial Distribution of ", "TOTAL", " Population in Singapore",sep=""),
main.title.position = "center",
main.title.size = 1.2,
legend.outside = T,
legend.position = c("right","bottom"),
frame = F) +
tm_compass(type = "4star", size = 2) +
tm_scale_bar(width = 0.2)
As a filter for the selections from the R Shiny downdrop list
popdataGenderSelected <- popdataGender %>%
filter(PA =="TOTAL") %>%
select(paste("TOTAL", "_M", sep=""), paste("TOTAL", "_F", sep=""))
With Echarts4r according to input variables for selected Age group and selected Sub Zone
gender = data.frame(gender=c("Male", "Female"), value=c(specify_decimal(popdataGenderSelected[,1], 2), specify_decimal(popdataGenderSelected[,2], 2)),
path = c('path://M18.2629891,11.7131596 L6.8091608,11.7131596 C1.6685112,11.7131596 0,13.032145 0,18.6237673 L0,34.9928467 C0,38.1719847 4.28388932,38.1719847 4.28388932,34.9928467 L4.65591984,20.0216948 L5.74941883,20.0216948 L5.74941883,61.000787 C5.74941883,65.2508314 11.5891201,65.1268798 11.5891201,61.000787 L11.9611506,37.2137775 L13.1110872,37.2137775 L13.4831177,61.000787 C13.4831177,65.1268798 19.3114787,65.2508314 19.3114787,61.000787 L19.3114787,20.0216948 L20.4162301,20.0216948 L20.7882606,34.9928467 C20.7882606,38.1719847 25.0721499,38.1719847 25.0721499,34.9928467 L25.0721499,18.6237673 C25.0721499,13.032145 23.4038145,11.7131596 18.2629891,11.7131596 M12.5361629,1.11022302e-13 C15.4784742,1.11022302e-13 17.8684539,2.38997966 17.8684539,5.33237894 C17.8684539,8.27469031 15.4784742,10.66467 12.5361629,10.66467 C9.59376358,10.66467 7.20378392,8.27469031 7.20378392,5.33237894 C7.20378392,2.38997966 9.59376358,1.11022302e-13 12.5361629,1.11022302e-13',
'path://M28.9624207,31.5315864 L24.4142575,16.4793596 C23.5227152,13.8063773 20.8817445,11.7111088 17.0107398,11.7111088 L12.112691,11.7111088 C8.24168636,11.7111088 5.60080331,13.8064652 4.70917331,16.4793596 L0.149791395,31.5315864 C-0.786976655,34.7595013 2.9373074,35.9147532 3.9192135,32.890727 L8.72689855,19.1296485 L9.2799493,19.1296485 C9.2799493,19.1296485 2.95992025,43.7750224 2.70031069,44.6924335 C2.56498417,45.1567684 2.74553639,45.4852068 3.24205501,45.4852068 L8.704461,45.4852068 L8.704461,61.6700801 C8.704461,64.9659872 13.625035,64.9659872 13.625035,61.6700801 L13.625035,45.360657 L15.5097899,45.360657 L15.4984835,61.6700801 C15.4984835,64.9659872 20.4191451,64.9659872 20.4191451,61.6700801 L20.4191451,45.4852068 L25.8814635,45.4852068 C26.3667633,45.4852068 26.5586219,45.1567684 26.4345142,44.6924335 C26.1636859,43.7750224 19.8436568,19.1296485 19.8436568,19.1296485 L20.3966199,19.1296485 L25.2043926,32.890727 C26.1862111,35.9147532 29.9105828,34.7595013 28.9625083,31.5315864 L28.9624207,31.5315864 Z M14.5617154,0 C17.4960397,0 19.8773132,2.3898427 19.8773132,5.33453001 C19.8773132,8.27930527 17.4960397,10.66906 14.5617154,10.66906 C11.6274788,10.66906 9.24611767,8.27930527 9.24611767,5.33453001 C9.24611767,2.3898427 11.6274788,0 14.5617154,0 L14.5617154,0 Z'))
gender %>%
e_charts(gender) %>%
e_x_axis(splitLine=list(show = FALSE),
axisTick=list(show=FALSE),
axisLine=list(show=FALSE),
axisLabel= list(show=FALSE)) %>%
e_y_axis(max=100,
splitLine=list(show = FALSE),
axisTick=list(show=FALSE),
axisLine=list(show=FALSE),
axisLabel=list(show=FALSE)) %>%
e_color(color = c('#69cce6','#eee')) %>%
e_pictorial(value, symbol = path, z=10, name= 'realValue',
symbolBoundingData= 100, symbolClip= TRUE) %>%
e_pictorial(value, symbol = path, name= 'background',
symbolBoundingData= 100) %>%
e_labels(position = "bottom", offset= c(0, 10),
textStyle =list(fontSize= 20, fontFamily= 'Arial',
fontWeight ='bold',
color= '#69cce6'),
formatter="{@[1]}% {@[0]}") %>%
e_legend(show = FALSE) %>%
e_theme("westeros")
With plot_ly function according to input variables for selected Age group and selected Sub Zone
popdataTODSelected <- popdataTOD %>% filter(SZ == "TOTAL")
plot_ly(popdataTODSelected, labels = ~TOD, values = ~YOUNG, type = 'pie') %>% layout(title = 'Pie chart by Type of Dwelling',
xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))
The data visualization is built as a Shiny app and combines the above 3 plots side by side: Ratio percentage of Gender by the selected Age Group and Sub Zone Percentage count of TOD by the selected Age Group and Sub Zone Spatial distribution on Singapore map by the selected Age Group
The Gender ratio infographics displays the 2 categories of Male and Female, with blue fill as the percentage count. The title will react to the Age Group and Sub Zone selection changes. The title will react to the selected Sub Zone.
The pie chart breaks the population count into respective Type of Dwelling. There are labels of percentage and hovering each pie will have a tooltip to depict the TOD, Count and % Count.
The spatial distribution map will react to the selected Age Group to show their count and proportion on the map.
Selection of AGED group with TOTAL subzones shown a ratio of 45.8% Male to 54.20% of Female. There is almost a 10% difference as opposed to the Total age group of 49.15% Male to 50.85% Female.
This may imply that Female have a longer lifespan than Male in Singapore.
Woodlands East has a comparably high proportion of YOUNG aged group which may suggest a maturing subzone. Furthermore, 40.7% of the AGED, 41.9% of the ACTIVE and 43.8% of the YOUNG stay in HDB 5 rooms and Executive flats.
This may imply that recent BTO projects include a lot of HDB 5 rooms and Executive projects.