R Shiny app: https://ycchoo.shinyapps.io/Assign5/

1. Overview

The following will be a data visualization to reveal the demographic structure of Singapore population by age, gender, type of dwelling and subzone in 2019. It will be built into a Shiny app using ggplot2 and other R packages. The main focus is to provide dropdown boxes to filter Age group and Sub zone for the plots of Gender infographics, TOD pie chart and spatial distribution of Singapore.

1.1. Major data and design challenges

The data from https://www.singstat.gov.sg/find-data/search-by-theme/population/geographic-distribution/latest-data comes as a 2011-2019 series. Hence, data cleaning and filtering is required to extract only year 2019 data.

The main variables are Type of Dwelling, Gender and Age Group. The challenge lies adding interaction to the variables and actively plotting them on maps and charts.

1.2. Sketch of proposed data visualization

The proposed data design will display 3 plots in a infographics for Gender ratio, pie chart for TOD and spatial distribution of Age group. There will be 2 dropdown lists to filter the Age Group and Sub Zone, with total as default for both of them. The filter selection is reactive for the 3 plots.

2. Step-by-step data visualization

2.1. Loading and installing the R packages

packages = c('ggplot2', 'shiny', 'DT', 'sf', 'tmap', 'tidyverse', 'plyr', 'png', 'devtools', 'echarts4r', 'ggpubr', 'viridis', 'plotly')

for (p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

#devtools::install_github("JohnCoene/echarts4r.assets")

# Add function to trim numerics to 2 decimal points
specify_decimal <- function(x, k) trimws(format(round(x, k), nsmall=k))

2.2. Importing Dataset and Geospatial Data into R

The code chunk below use the st_read() function of sf package to import MP14_SUBZONE_WEB_PL shapfile into R as a simple feature data frame called mpsz.

rawdata <- read_csv("Data/respopagesextod2011to2019.csv")
mpsz <- st_read(dsn = "Data/geospatial", 
                layer = "MP14_SUBZONE_WEB_PL")
## Reading layer `MP14_SUBZONE_WEB_PL' from data source `C:\Users\Favian\Desktop\ISSS608 Visual Analytics and Applications\Assignment 5\Assign5\Data\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 323 features and 15 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS:  SVY21

2.3. Add dataset “popdataTOD” based on Type of Dwelling

To categorize TOD and Age.

popdataTOD_ <- rawdata  %>%
  filter(Time == 2019) %>%
  select(`SZ`, `AG`, `TOD`, `Pop`)

popdataTOD_ <- aggregate(cbind(Pop) ~ SZ + AG + TOD, popdataTOD_, sum)

popdataTOD_ <- popdataTOD_ %>%
  spread(AG, Pop) %>%
  mutate(YOUNG = `0_to_4`+`5_to_9`+`10_to_14`+
`15_to_19`+`20_to_24`) %>%
mutate(`ACTIVE` = rowSums(.[7:11])+
rowSums(.[13:15]))%>%
mutate(`AGED`=rowSums(.[16:20])) %>%
mutate(`TOTAL`=rowSums(.[22:24])) %>%
mutate_at(.vars = vars(SZ, TOD), toupper) %>%
select(`SZ`, `TOD`, `YOUNG`, `ACTIVE`, `AGED`, 
       `TOTAL`)

popdataTOD <- popdataTOD_

for (type in unique(popdataTOD_$TOD)) {
temp <- popdataTOD_ %>%
  filter(TOD == type)

temp <- temp %>%rbind(data.frame(SZ='TOTAL', TOD = type, YOUNG = sum(temp$YOUNG), ACTIVE = sum(temp$ACTIVE), AGED = sum(temp$AGED), TOTAL = sum(temp$TOTAL)))

temp <- temp %>% filter(SZ == "TOTAL")

popdataTOD <- popdataTOD %>% rbind(data.frame(temp))
}

2.4. Add dataset “popdataAG”

To categorize Age groups into 4 bands - Young (<25), Active (25 to 64), Aged (>64), Total (All age)

popdata <- rawdata  %>%
  filter(Time == 2019) %>%
  spread(AG, Pop) %>%
  mutate(YOUNG = `0_to_4`+`5_to_9`+`10_to_14`+
`15_to_19`+`20_to_24`) %>%
mutate(`ACTIVE` = rowSums(.[9:13])+
rowSums(.[15:18]))%>%
mutate(`AGED`=rowSums(.[19:24])) %>%
mutate(`TOTAL`=rowSums(.[25:27])) %>%
mutate_at(.vars = vars(PA, SZ), toupper) %>%
select(`PA`, `SZ`, `YOUNG`, `ACTIVE`, `AGED`, 
       `TOTAL`)

popdata <- aggregate(cbind(YOUNG, ACTIVE, AGED, TOTAL) ~ PA + SZ, popdata, sum) %>% arrange(PA)

#popdata <- popdata[apply(popdata[,-c(1:2)], 1, function(x) !all(x==0)),]

2.5. Add dataset “popdataMale”

To filter out Male data rows for all age groups

popdataMale <- rawdata  %>%
  filter(Sex == "Males") %>%
  filter(Time == 2019) %>%
  spread(AG, Pop) %>%
  mutate(YOUNG = `0_to_4`+`5_to_9`+`10_to_14`+
`15_to_19`+`20_to_24`) %>%
mutate(`ACTIVE` = rowSums(.[9:13])+
rowSums(.[15:18]))%>%
mutate(`AGED`=rowSums(.[19:24])) %>%
mutate(`TOTAL`=rowSums(.[25:27])) %>%
mutate_at(.vars = vars(PA, SZ), toupper) %>%
select(`PA`, `SZ`, `YOUNG`, `ACTIVE`, `AGED`, 
       `TOTAL`)

popdataMale <- aggregate(cbind(YOUNG, ACTIVE, AGED, TOTAL) ~ PA + SZ, popdataMale, sum) %>% arrange(PA)

#popdataMale <- popdataMale[apply(popdataMale[,-c(1:2)], 1, function(x) !all(x==0)),]

popdataMale <- popdataMale %>% rbind(data.frame(PA='TOTAL', SZ='TOTAL', YOUNG = sum(popdataMale$YOUNG), ACTIVE = sum(popdataMale$ACTIVE), AGED = sum(popdataMale$AGED), TOTAL = sum(popdataMale$TOTAL)))

2.6. Add dataset “popdataGender”

To calculate the Male and Female ratio for all 4 age bands

popdataGender <- popdata
popdataGender <- popdataGender %>% rbind(data.frame(PA='TOTAL', SZ='TOTAL', YOUNG = sum(popdataGender$YOUNG), ACTIVE = sum(popdataGender$ACTIVE), AGED = sum(popdataGender$AGED), TOTAL = sum(popdataGender$TOTAL)))

popdataGender <- popdataGender %>%
  mutate(YOUNG_M = popdataMale$YOUNG/`YOUNG`*100) %>%
  mutate(ACTIVE_M = popdataMale$ACTIVE/`ACTIVE`*100) %>%
  mutate(AGED_M = popdataMale$AGED/`AGED`*100) %>%
  mutate(TOTAL_M = popdataMale$TOTAL/`TOTAL`*100) %>%
select(`PA`, `SZ`, `YOUNG_M`, `ACTIVE_M`, `AGED_M`, 
       `TOTAL_M`)
popdataGender <- popdataGender %>%
  mutate(YOUNG_F = 100 - `YOUNG_M`) %>%
  mutate(ACTIVE_F = 100 - `ACTIVE_M`) %>%
  mutate(AGED_F = 100 - `AGED_M`) %>%
  mutate(TOTAL_F = 100 - `TOTAL_M`)

2.7. Joining the attribute data and geospatial data

Next, left_join() of dplyr is used to join the geographical data and attribute table using planning subzone name e.g. SUBZONE_N and SZ as the common identifier.

mpsz_popdata <- left_join(mpsz, popdata, 
                              by = c("SUBZONE_N" = "SZ"))

2.8. Plot Spatial Distribution Map

With tmap function according to input variable for selected Age group

        tm_shape(mpsz_popdata) +
            tm_fill("TOTAL", n=6, style = "pretty", palette = "Blues") +
            tm_borders(alpha = 0.5) +
            tm_bubbles(size = "TOTAL", col = "REGION_N") +
            tm_layout(main.title = paste("Spatial Distribution of ", "TOTAL", " Population in Singapore",sep=""),
                      main.title.position = "center",
                      main.title.size = 1.2,
                      legend.outside = T,
                      legend.position = c("right","bottom"),
                      frame = F) +
            tm_compass(type = "4star", size = 2) +
            tm_scale_bar(width = 0.2)

2.9. Add dataset “popdataGenderselected”

As a filter for the selections from the R Shiny downdrop list

    popdataGenderSelected <- popdataGender %>%
        filter(PA =="TOTAL")  %>%
        select(paste("TOTAL", "_M", sep=""), paste("TOTAL", "_F", sep=""))

2.10. Plot Gender ratio infographics

With Echarts4r according to input variables for selected Age group and selected Sub Zone

gender = data.frame(gender=c("Male", "Female"), value=c(specify_decimal(popdataGenderSelected[,1], 2), specify_decimal(popdataGenderSelected[,2], 2)),
                    path = c('path://M18.2629891,11.7131596 L6.8091608,11.7131596 C1.6685112,11.7131596 0,13.032145 0,18.6237673 L0,34.9928467 C0,38.1719847 4.28388932,38.1719847 4.28388932,34.9928467 L4.65591984,20.0216948 L5.74941883,20.0216948 L5.74941883,61.000787 C5.74941883,65.2508314 11.5891201,65.1268798 11.5891201,61.000787 L11.9611506,37.2137775 L13.1110872,37.2137775 L13.4831177,61.000787 C13.4831177,65.1268798 19.3114787,65.2508314 19.3114787,61.000787 L19.3114787,20.0216948 L20.4162301,20.0216948 L20.7882606,34.9928467 C20.7882606,38.1719847 25.0721499,38.1719847 25.0721499,34.9928467 L25.0721499,18.6237673 C25.0721499,13.032145 23.4038145,11.7131596 18.2629891,11.7131596 M12.5361629,1.11022302e-13 C15.4784742,1.11022302e-13 17.8684539,2.38997966 17.8684539,5.33237894 C17.8684539,8.27469031 15.4784742,10.66467 12.5361629,10.66467 C9.59376358,10.66467 7.20378392,8.27469031 7.20378392,5.33237894 C7.20378392,2.38997966 9.59376358,1.11022302e-13 12.5361629,1.11022302e-13',
'path://M28.9624207,31.5315864 L24.4142575,16.4793596 C23.5227152,13.8063773 20.8817445,11.7111088 17.0107398,11.7111088 L12.112691,11.7111088 C8.24168636,11.7111088 5.60080331,13.8064652 4.70917331,16.4793596 L0.149791395,31.5315864 C-0.786976655,34.7595013 2.9373074,35.9147532 3.9192135,32.890727 L8.72689855,19.1296485 L9.2799493,19.1296485 C9.2799493,19.1296485 2.95992025,43.7750224 2.70031069,44.6924335 C2.56498417,45.1567684 2.74553639,45.4852068 3.24205501,45.4852068 L8.704461,45.4852068 L8.704461,61.6700801 C8.704461,64.9659872 13.625035,64.9659872 13.625035,61.6700801 L13.625035,45.360657 L15.5097899,45.360657 L15.4984835,61.6700801 C15.4984835,64.9659872 20.4191451,64.9659872 20.4191451,61.6700801 L20.4191451,45.4852068 L25.8814635,45.4852068 C26.3667633,45.4852068 26.5586219,45.1567684 26.4345142,44.6924335 C26.1636859,43.7750224 19.8436568,19.1296485 19.8436568,19.1296485 L20.3966199,19.1296485 L25.2043926,32.890727 C26.1862111,35.9147532 29.9105828,34.7595013 28.9625083,31.5315864 L28.9624207,31.5315864 Z M14.5617154,0 C17.4960397,0 19.8773132,2.3898427 19.8773132,5.33453001 C19.8773132,8.27930527 17.4960397,10.66906 14.5617154,10.66906 C11.6274788,10.66906 9.24611767,8.27930527 9.24611767,5.33453001 C9.24611767,2.3898427 11.6274788,0 14.5617154,0 L14.5617154,0 Z'))

  gender %>% 
  e_charts(gender) %>% 
  e_x_axis(splitLine=list(show = FALSE), 
           axisTick=list(show=FALSE),
           axisLine=list(show=FALSE),
           axisLabel= list(show=FALSE)) %>%
  e_y_axis(max=100, 
           splitLine=list(show = FALSE),
           axisTick=list(show=FALSE),
           axisLine=list(show=FALSE),
           axisLabel=list(show=FALSE)) %>%
  e_color(color = c('#69cce6','#eee')) %>%
  e_pictorial(value, symbol = path, z=10, name= 'realValue', 
              symbolBoundingData= 100, symbolClip= TRUE) %>% 
  e_pictorial(value, symbol = path, name= 'background', 
              symbolBoundingData= 100) %>% 
  e_labels(position = "bottom", offset= c(0, 10), 
           textStyle =list(fontSize= 20, fontFamily= 'Arial', 
                           fontWeight ='bold', 
                           color= '#69cce6'),
           formatter="{@[1]}% {@[0]}") %>%
  e_legend(show = FALSE) %>%
  e_theme("westeros")

2.11. Plot pie chart by Type of Dwelling

With plot_ly function according to input variables for selected Age group and selected Sub Zone

popdataTODSelected <- popdataTOD %>% filter(SZ == "TOTAL")

plot_ly(popdataTODSelected, labels = ~TOD, values = ~YOUNG, type = 'pie') %>% layout(title = 'Pie chart by Type of Dwelling',
         xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
         yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))

3. The data visualization

3.1. Final data visualization

https://ycchoo.shinyapps.io/Assign5/

3.2. Data visualization description

The data visualization is built as a Shiny app and combines the above 3 plots side by side: Ratio percentage of Gender by the selected Age Group and Sub Zone Percentage count of TOD by the selected Age Group and Sub Zone Spatial distribution on Singapore map by the selected Age Group

The Gender ratio infographics displays the 2 categories of Male and Female, with blue fill as the percentage count. The title will react to the Age Group and Sub Zone selection changes. The title will react to the selected Sub Zone.

The pie chart breaks the population count into respective Type of Dwelling. There are labels of percentage and hovering each pie will have a tooltip to depict the TOD, Count and % Count.

The spatial distribution map will react to the selected Age Group to show their count and proportion on the map.

3.3. Insights description

Selection of AGED group with TOTAL subzones shown a ratio of 45.8% Male to 54.20% of Female. There is almost a 10% difference as opposed to the Total age group of 49.15% Male to 50.85% Female.
This may imply that Female have a longer lifespan than Male in Singapore.

Woodlands East has a comparably high proportion of YOUNG aged group which may suggest a maturing subzone. Furthermore, 40.7% of the AGED, 41.9% of the ACTIVE and 43.8% of the YOUNG stay in HDB 5 rooms and Executive flats.
This may imply that recent BTO projects include a lot of HDB 5 rooms and Executive projects.