IS428 Visual Analytics and Applications - Assignment 5

Overview

The data was obtained from the Singapore Department of Statistics. It shows the distribution of Singapore's resident population (rounded to the nearest 10) across planning areas and subzones for each year from 2011 to 2020, broken down by age group, sex and type of dwelling. The data consists of the following fields:

  • PA: planning area
  • SZ: subzone
  • AG: age group (5-year cohorts, e.g. 0_to_4)
  • Sex: sex
  • TOD: type of dwelling
  • Pop: resident count
  • Time: year

Additional file: the MP14_SUBZONE_WEB_PL shapefile of Master Plan 2014 subzone boundaries, as used in Lesson 11.

Major data and design challenges

There are a few data and design challenges when doing the visualisation:

Use case: My business users have a programme lined up for the elderly population and want to understand how elderly residents are distributed across Singapore's subzones. This lets them market the programme in the subzones with the most elderly residents.

Proposed sketched design

Step-by-Step Guide

Step 1: Install and load packages

packages <- c("tidyverse", "plotly", "scales", "grid", "sf", "tmap")

# Install any package that cannot be loaded, then attach it
for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
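The loop above calls require() on every package, which also attaches it. A minimal alternative sketch, assuming you prefer a single install.packages() call for whatever is missing, checks availability with requireNamespace() first. The helper find_missing() is hypothetical, and it is demonstrated on base packages so that running it installs nothing:

```r
# Hypothetical helper: return the packages that cannot be loaded
find_missing <- function(pkgs) {
  # requireNamespace() returns TRUE/FALSE without attaching the package
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

# "stats" and "utils" ship with R; the third name is deliberately fake
find_missing(c("stats", "utils", "notARealPackage123"))
```

In Step 1 you could then pass find_missing(packages) to install.packages(), skipping the call when the result is empty. Note that grid ships with base R, so it never needs installing.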

Step 2: Load data and extract 2019 data

# Read the 2011-2020 resident population data
demographics <- read_csv("data/aspatial/respopagesextod2011to2020.csv")

# Keep only year 2019
demo_2019 <- filter(demographics, Time == 2019)

Use the code below to view the 2019 demographics data.

demo_2019
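The raw file is in long format, so Time == 2019 keeps every row recorded for that year. A small sketch on invented rows with the same column names (PA, SZ, AG, Sex, TOD, Pop, Time) shows the effect; all values, including the Sex and TOD labels, are made up for illustration:

```r
library(dplyr)

# Invented rows in long format: one row per
# subzone x age group x sex x dwelling type x year
toy_demographics <- tibble(
  PA   = c("Tampines", "Tampines", "Tampines", "Bedok"),
  SZ   = c("Tampines East", "Tampines East", "Tampines East", "Bedok North"),
  AG   = c("0_to_4", "65_to_69", "65_to_69", "65_to_69"),
  Sex  = c("Males", "Males", "Females", "Males"),
  TOD  = c("HDB 4-Room Flats", "HDB 4-Room Flats", "HDB 4-Room Flats", "HDB 3-Room Flats"),
  Pop  = c(100, 200, 250, 80),
  Time = c(2019, 2019, 2019, 2018)
)

# Same filter as Step 2: keep only the 2019 rows
toy_2019 <- filter(toy_demographics, Time == 2019)
toy_2019
```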

Step 3: Load shape file for map

mpsz <- st_read(dsn = "data/geospatial", 
                layer = "MP14_SUBZONE_WEB_PL")
## Reading layer `MP14_SUBZONE_WEB_PL' from data source `/Users/jaslynwong/Visual analytics/Assignment 5/data/geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 323 features and 15 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS:  SVY21

Use the code below to inspect the fields and features in the shape file.

mpsz
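Printing mpsz lists its fields together with the geometry. A sketch of the individual sf calls behind that summary, run on two invented square polygons so that it needs no shapefile; the coordinates and names are made up, and EPSG:3414 (SVY21 / Singapore TM) is assumed as the CRS:

```r
library(sf)

# Hypothetical helper: a square polygon with its lower-left corner at (x0, y0)
square <- function(x0, y0, size = 1000) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + size, y0), c(x0 + size, y0 + size),
                        c(x0, y0 + size), c(x0, y0))))
}

# Two made-up "subzones" with attribute columns named like the shapefile's
toy_mpsz <- st_sf(
  SUBZONE_N = c("SUBZONE A", "SUBZONE B"),
  REGION_N  = c("EAST REGION", "WEST REGION"),
  geometry  = st_sfc(square(30000, 30000), square(20000, 30000), crs = 3414)
)

names(st_drop_geometry(toy_mpsz))  # attribute fields only
st_crs(toy_mpsz)$epsg              # coordinate reference system code
st_geometry_type(toy_mpsz)         # geometry type of each feature
```

The real file reports MULTIPOLYGON rather than POLYGON, as the st_read() output above shows.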

Step 4: Data Preparation

Before creating the map visualisation, we need to perform the following data preparation:

  • Create five new variables: YOUNG, ECONOMY_ACTIVE, MIDDLEAGE, ELDERLY and TOTAL (see TABLE 1).
  • Convert the PA (planning area) and SZ (subzone) names to uppercase so that they match the names in the shape file.
  • Join the shape file with the demographics data.

TABLE 1

| Age Category   | Age cohorts (5-year range) |
|----------------|----------------------------|
| YOUNG          | ‘0_to_4’, ‘5_to_9’, ‘10_to_14’, ‘15_to_19’, ‘20_to_24’ |
| ECONOMY_ACTIVE | ‘25_to_29’, ‘30_to_34’, ‘35_to_39’, ‘40_to_44’ |
| MIDDLEAGE      | ‘45_to_49’, ‘50_to_54’, ‘55_to_59’ |
| ELDERLY        | ‘60_to_64’, ‘65_to_69’, ‘70_to_74’, ‘75_to_79’, ‘80_to_84’, ‘85_to_89’, ‘90_and_over’ |
| TOTAL          | All age categories (YOUNG, ECONOMY_ACTIVE, MIDDLEAGE, ELDERLY) |
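TABLE 1 is effectively a lookup from age cohort (AG) to category. A base-R sketch of that lookup, under the hypothetical name age_category, makes it easy to check that each of the 19 cohorts falls into exactly one category:

```r
# Hypothetical lookup built from TABLE 1: names are AG values, values are categories
age_category <- c(
  "0_to_4" = "YOUNG", "5_to_9" = "YOUNG", "10_to_14" = "YOUNG",
  "15_to_19" = "YOUNG", "20_to_24" = "YOUNG",
  "25_to_29" = "ECONOMY_ACTIVE", "30_to_34" = "ECONOMY_ACTIVE",
  "35_to_39" = "ECONOMY_ACTIVE", "40_to_44" = "ECONOMY_ACTIVE",
  "45_to_49" = "MIDDLEAGE", "50_to_54" = "MIDDLEAGE", "55_to_59" = "MIDDLEAGE",
  "60_to_64" = "ELDERLY", "65_to_69" = "ELDERLY", "70_to_74" = "ELDERLY",
  "75_to_79" = "ELDERLY", "80_to_84" = "ELDERLY", "85_to_89" = "ELDERLY",
  "90_and_over" = "ELDERLY"
)

# Number of 5-year cohorts in each category
table(age_category)

# Look up the category of individual cohorts
unname(age_category[c("10_to_14", "65_to_69")])
```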

4.1 Data wrangling

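The following functions are used in this step:

  • spread() to turn the age groups (AG) into columns,
  • mutate() to create the new variables and mutate_at() to convert PA and SZ to uppercase, and
  • select() and filter() to keep the relevant columns and rows.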

Use the code below to create new variables that follow the groupings in TABLE 1.

demo2019 <- demo_2019 %>%
  # One column per age group (AG), filled with the resident count (Pop)
  spread(AG, Pop) %>%
  # Aggregate the 5-year cohorts into the categories in TABLE 1
  mutate(
    YOUNG = `0_to_4` + `5_to_9` + `10_to_14` + `15_to_19` + `20_to_24`,
    ECONOMY_ACTIVE = `25_to_29` + `30_to_34` + `35_to_39` + `40_to_44`,
    MIDDLEAGE = `45_to_49` + `50_to_54` + `55_to_59`,
    ELDERLY = `60_to_64` + `65_to_69` + `70_to_74` + `75_to_79` +
      `80_to_84` + `85_to_89` + `90_and_over`,
    # Sum of all 19 cohorts
    TOTAL = YOUNG + ECONOMY_ACTIVE + MIDDLEAGE + ELDERLY
  ) %>%
  # Uppercase planning area and subzone names to match the shape file
  mutate_at(.vars = vars(PA, SZ), toupper) %>%
  select(PA, SZ, TOD, YOUNG, ECONOMY_ACTIVE, MIDDLEAGE, ELDERLY, TOTAL) %>%
  # Drop rows with no economically active residents
  filter(ECONOMY_ACTIVE > 0)

Use the code below to see the data with the new variables.

demo2019
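The spread() and mutate() steps are easiest to see on a tiny example. spread() is superseded by pivot_wider() in current tidyr, so the sketch below uses pivot_wider() on invented rows trimmed to three age cohorts; the category sums follow the same pattern as the full code:

```r
library(dplyr)
library(tidyr)

# Invented long-format rows for one subzone (two sexes, one dwelling type)
toy <- tibble(
  SZ  = "Tampines East",
  Sex = rep(c("Males", "Females"), each = 3),
  TOD = "HDB 4-Room Flats",
  AG  = rep(c("0_to_4", "60_to_64", "90_and_over"), times = 2),
  Pop = c(10, 20, 30, 15, 25, 35)
)

toy_wide <- toy %>%
  # One column per age group, equivalent to spread(AG, Pop)
  pivot_wider(names_from = AG, values_from = Pop) %>%
  mutate(
    YOUNG   = `0_to_4`,
    ELDERLY = `60_to_64` + `90_and_over`,
    TOTAL   = YOUNG + ELDERLY
  )
toy_wide
```

Each sex still occupies its own row, which is why the next step sums the rows within each subzone.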

Because each row of demo2019 holds the counts for one sex and one TOD (type of dwelling) within a subzone, each SZ (subzone) appears in multiple rows.

To collapse these into a single row per subzone, use the code below to sum all resident counts within each subzone.

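# One row per subzone: sum every numeric column
# (non-numeric columns such as PA and TOD are dropped)
x <- demo2019 %>%
  group_by(SZ) %>%
  summarize_if(is.numeric, sum, na.rm = TRUE)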
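A sketch of the per-subzone summation on invented rows. summarize_if() is superseded in dplyr 1.0 and later; summarise(across(where(is.numeric), ...)) gives the same result, and non-numeric columns such as TOD are dropped in both cases:

```r
library(dplyr)

# Invented rows: subzone "A" appears twice, "B" once
toy <- tibble(
  SZ      = c("A", "A", "B"),
  TOD     = c("HDB 3-Room Flats", "HDB 4-Room Flats", "HDB 3-Room Flats"),
  ELDERLY = c(40, 60, 10),
  TOTAL   = c(100, 150, 30)
)

toy_by_sz <- toy %>%
  group_by(SZ) %>%
  # Same result as summarize_if(is.numeric, sum, na.rm = TRUE)
  summarise(across(where(is.numeric), ~ sum(.x, na.rm = TRUE)))
toy_by_sz
```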

4.2 Join the demographics data with the shape file

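Use left_join() to join the geographical data with the demographics data on their common identifier, the subzone name (SUBZONE_N in the shape file and SZ in the demographics data). Because mpsz is on the left, the result keeps every subzone polygon and remains an sf object.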

mpsz_x <- left_join(mpsz, x,
                    by = c("SUBZONE_N" = "SZ"))

Use the code below to see the joined data.

mpsz_x
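Any subzone whose name fails to match keeps its polygon but gets NA in the population columns. A quick sketch for spotting such mismatches uses anti_join(); the subzone names and counts below are invented, and the mixed-case "Bedok North" shows why PA and SZ were uppercased earlier:

```r
library(dplyr)

# Invented subzone names on each side of the join
shapes <- tibble(SUBZONE_N = c("TAMPINES EAST", "BEDOK NORTH", "MARINA SOUTH"))
counts <- tibble(SZ = c("TAMPINES EAST", "Bedok North"), ELDERLY = c(500, 300))

# Subzones in the shape data with no matching demographics row
unmatched <- anti_join(shapes, counts, by = c("SUBZONE_N" = "SZ"))
unmatched

# The left join keeps all three subzones; unmatched ones get NA
joined <- left_join(shapes, counts, by = c("SUBZONE_N" = "SZ"))
joined
```

For the real data, the same check could be run as anti_join(st_drop_geometry(mpsz), x, by = c("SUBZONE_N" = "SZ")).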

Step 5: Choropleth map and graduated symbol map using tmap

5.1 Create choropleth map

The code below draws a choropleth map of the total resident count in each subzone using tm_shape() and tm_fill(). tmap_mode("view") makes the map interactive.

If you want a static map instead, use tmap_mode("plot") in place of tmap_mode("view").

tmap_mode("view")
tmap_style("white")

tm_shape(mpsz_x) +
  tm_fill("TOTAL",
          style = "pretty",
          palette = "Blues") +
  tm_layout(legend.outside = TRUE, legend.position = c("right", "bottom")) +
  tm_borders(alpha = 0.5)
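This map classifies TOTAL with style = "pretty", while the later maps use style = "equal". As a rough sketch of the difference, assuming "equal" splits the data range into n intervals of equal width and "pretty" rounds breaks the way base R's pretty() does, the breaks can be computed directly. equal_breaks() is a hypothetical helper and the totals are made up:

```r
# Hypothetical helper: n equal-width classes between the min and max
equal_breaks <- function(x, n = 5) {
  seq(min(x, na.rm = TRUE), max(x, na.rm = TRUE), length.out = n + 1)
}

# Made-up resident totals for a handful of subzones
totals <- c(0, 1230, 5480, 17900, 24000)

equal_breaks(totals, n = 4)  # 0 6000 12000 18000 24000
pretty(totals)               # rounded "nice" break points covering the range
```

Equal intervals make class widths comparable, while pretty breaks are easier to read in a legend; with skewed data either style can leave most subzones in one class.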

5.2 Add a graduated symbol map on top of the choropleth map

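The code below overlays a graduated symbol map of the elderly population in each subzone on the choropleth map using tm_bubbles(). Bubble size represents the elderly count and bubble colour represents the region (REGION_N). Note that this and the following maps classify TOTAL with style = "equal" (equal-interval classes) instead of "pretty".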

tmap_style("white")

tm_shape(mpsz_x) +
  tm_fill("TOTAL",
          style = "equal",
          palette = "Blues") +
  # Graduated symbol layer: bubble size = elderly count, colour = region
  tm_bubbles(size = "ELDERLY", col = "REGION_N") +
  tm_borders(alpha = 0.5)
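tm_bubbles() sizes each bubble from the ELDERLY value. Assuming the usual proportional-symbol convention that bubble area, not radius, is proportional to the value, the radius grows with the square root of the count. bubble_radius() below is a hypothetical helper illustrating that arithmetic, not tmap's internal code:

```r
# Hypothetical helper: radius such that area is proportional to the value,
# with the largest value drawn at max_radius
bubble_radius <- function(x, max_radius = 1) {
  max_radius * sqrt(x / max(x, na.rm = TRUE))
}

# Made-up elderly counts: 4x the elderly gives 2x the radius
elderly <- c(2500, 10000)
bubble_radius(elderly)  # 0.5 1.0
```

This is why a subzone with four times as many elderly residents does not look four times as wide on the map.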

5.3 Change the colour palette of the graduated symbol map

By default, tm_bubbles() colours a categorical variable such as REGION_N with the "cat" palette. In the map above, this makes the bubbles hard to read, because the bubble colour for the West region is very similar to the blue fill of the choropleth map.

The code below switches to a palette that contrasts with the choropleth map by setting palette = "div" inside tm_bubbles().

tmap_style("white")

tm_shape(mpsz_x) +
  tm_fill("TOTAL",
          style = "equal",
          palette = "Blues") +
  # "div" palette contrasts better with the blue choropleth fill
  tm_bubbles(size = "ELDERLY", col = "REGION_N", palette = "div") +
  tm_borders(alpha = 0.5)
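If neither built-in palette name gives enough contrast, the palette argument also accepts a vector of colours. As a sketch, assuming one colour per region for Singapore's five regions, base R's hcl.colors() can generate a qualitative set; "Dark 3" is just one option, not the palette used above:

```r
# Five qualitative colours, one per region (five regions assumed)
region_colours <- hcl.colors(5, palette = "Dark 3")
region_colours
```

The vector would then be passed as tm_bubbles(size = "ELDERLY", col = "REGION_N", palette = region_colours).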

5.4 Change the hover identifier for the graduated symbol map and choropleth map

By default, hovering over a subzone in the map above shows its OBJECTID, which means little to readers. To show the subzone name instead, set id = "SUBZONE_N" inside both tm_fill() and tm_bubbles().

tmap_style("white")

tm_shape(mpsz_x) +
  # id sets the label shown when hovering over a subzone
  tm_fill("TOTAL", id = "SUBZONE_N",
          style = "equal",
          palette = "Blues") +
  tm_bubbles(id = "SUBZONE_N", size = "ELDERLY", col = "REGION_N", palette = "div") +
  tm_borders(alpha = 0.5)

5.5 Add details to the popup when readers click on a subzone

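The code below uses popup.vars to show each subzone's region, elderly count and total resident count in the popup that appears when a reader clicks on it. The names in popup.vars are the labels shown in the popup, and the values are the column names in mpsz_x.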

tmap_style("white")

tm_shape(mpsz_x) +
  tm_fill("TOTAL", id = "SUBZONE_N",
          style = "equal",
          palette = "Blues",
          popup.vars = c("Region" = "REGION_N", "ELDERLY" = "ELDERLY", "TOTAL" = "TOTAL")) +
  # Hovering over a bubble shows its planning area (PLN_AREA_N)
  tm_bubbles(id = "PLN_AREA_N", size = "ELDERLY", col = "REGION_N", palette = "div",
             popup.vars = c("Region" = "REGION_N", "ELDERLY" = "ELDERLY", "TOTAL" = "TOTAL")) +
  tm_borders(alpha = 0.5)

Final data visualisation and description

Elderly Population Distribution in Singapore 2019

Source: Singapore Department of Statistics


Use case: My business users have a programme lined up for the elderly population and want to understand how elderly residents are distributed across Singapore's subzones. This lets them market the programme in the subzones with the most elderly residents.

Overall, the final visualisation shows that the elderly population is widely spread across the Central region of Singapore. However, if my business users want to target a specific location to advertise their programme, I would recommend starting with Tampines in the East region, as the map shows a higher elderly population there than in most other subzones. The programme is therefore likely to reach more elderly residents in Tampines than in a location with fewer elderly residents. Tampines also has the highest total population of all the subzones, so if my business users later advertise a programme aimed at all age groups, Tampines would be a good first location for that as well.
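The recommendation rests on comparing elderly counts across subzones on the map. A sketch of checking that ranking in a table instead, on invented values standing in for st_drop_geometry(mpsz_x); the ELDERLY_SHARE column is an addition for illustration and is not part of the original analysis:

```r
library(dplyr)

# Invented subzone totals, standing in for st_drop_geometry(mpsz_x)
toy_subzones <- tibble(
  SUBZONE_N = c("SUBZONE A", "SUBZONE B", "SUBZONE C", "SUBZONE D"),
  REGION_N  = c("EAST REGION", "CENTRAL REGION", "WEST REGION", "EAST REGION"),
  ELDERLY   = c(9000, 4000, 6500, NA),
  TOTAL     = c(40000, 12000, 30000, NA)
)

# Top subzones by elderly count, with the elderly share of residents
top_elderly <- toy_subzones %>%
  filter(!is.na(ELDERLY)) %>%          # drop subzones with no matched data
  mutate(ELDERLY_SHARE = ELDERLY / TOTAL) %>%
  arrange(desc(ELDERLY)) %>%
  slice_head(n = 3)
top_elderly
```

Ranking by count favours large subzones, whereas the share highlights subzones where elderly residents make up a bigger fraction of the population.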

Reference