IS428 Visual Analytics and Applications - Assignment 5
The data was collected from the Singapore Department of Statistics. It shows the distribution of the resident population (rounded to the nearest 10) across the different areas of Singapore for each year from 2011 to 2020. The distribution is broken down by year, age group, gender and type of dwelling in each planning area. The data consists of the following fields:
Additional file: the MP14_SUBZONE_WEB_PL.shp shapefile used in Lesson 11.
There are a few data and design challenges when doing the visualisation:
The csv file covers the whole period from 2011 to 2020, which is too much data to show at once, so only the 2019 data is used to visualise the demographic structure of Singapore’s population.
There are 19 five-year age cohorts in total (see TABLE 1), which is too many categories for readers to take in. Showing all of them would cause information overload and make the visualisation slow to decipher.
When hovering over the interactive map, the tooltip shows a number instead of the subzone name. A number means nothing to readers who are trying to work out which subzone they are hovering over.
The data supports quite a few visualisations, but showing too many would confuse readers. I therefore needed to come up with a use case to guide the design of the interactive map, so that it is easy to read and useful for my target audience.
Use case: My business users have a programme lined up for the elderly population, and they want to understand the distribution of the elderly in Singapore by subzone. With this, they can market their programme in targeted locations with a higher density of elderly residents.
Proposed sketched design
packages <- c("tidyverse", "plotly", "scales", "grid", "sf", "tmap")
# Install any package that is missing, then load it
for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
demographics <- read_csv("data/aspatial/respopagesextod2011to2020.csv")
# Keep only the records for 2019
demo_2019 <- filter(demographics, Time == 2019)
Use the code below to see demographics data in Singapore for year 2019.
demo_2019
mpsz <- st_read(dsn = "data/geospatial",
layer = "MP14_SUBZONE_WEB_PL")
## Reading layer `MP14_SUBZONE_WEB_PL' from data source `/Users/jaslynwong/Visual analytics/Assignment 5/data/geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 323 features and 15 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS: SVY21
Use the code below to see what fields and data there are in the shape file.
mpsz
Before creating the map visualisation, we will need to perform the following data preparation:
TABLE 1
| Age Category | Age cohorts (5 year range) |
|---|---|
| YOUNG | ‘0_to_4’, ‘5_to_9’, ‘10_to_14’, ‘15_to_19’, ‘20_to_24’ |
| ECONOMY_ACTIVE | ‘25_to_29’, ‘30_to_34’, ‘35_to_39’, ‘40_to_44’ |
| MIDDLEAGE | ‘45_to_49’, ‘50_to_54’, ‘55_to_59’ |
| ELDERLY | ‘60_to_64’, ‘65_to_69’, ‘70_to_74’, ‘75_to_79’, ‘80_to_84’, ‘85_to_89’, ‘90_and_over’ |
| TOTAL | Sum of all age categories (YOUNG, ECONOMY_ACTIVE, MIDDLEAGE, ELDERLY) |
The following functions are used in this step: spread(), mutate(), mutate_at(), select() and filter().
Use the code below to create new variables that follow the grouping in TABLE 1.
demo2019 <- demo_2019 %>%
  # One column per age cohort
  spread(AG, Pop) %>%
  mutate(YOUNG = `0_to_4` + `5_to_9` + `10_to_14` +
           `15_to_19` + `20_to_24`) %>%
  mutate(ECONOMY_ACTIVE = `25_to_29` + `30_to_34` + `35_to_39` +
           `40_to_44`) %>%
  mutate(MIDDLEAGE = `45_to_49` + `50_to_54` + `55_to_59`) %>%
  mutate(ELDERLY = `60_to_64` + `65_to_69` + `70_to_74` + `75_to_79` +
           `80_to_84` + `85_to_89` + `90_and_over`) %>%
  # TOTAL is the sum of the four age categories, i.e. all 19 cohorts
  mutate(TOTAL = YOUNG + ECONOMY_ACTIVE + MIDDLEAGE + ELDERLY) %>%
  # Upper-case the names so that they match SUBZONE_N in the shapefile
  mutate_at(.vars = vars(PA, SZ), toupper) %>%
  select(PA, SZ, TOD, YOUNG, ECONOMY_ACTIVE, MIDDLEAGE, ELDERLY, TOTAL) %>%
  # Drop rows with no economically active residents
  filter(ECONOMY_ACTIVE > 0)
Use the code below to see the data with the new variables.
demo2019
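The grouping step above can be tried on a tiny made-up table before running it on the full data set. The sketch below is not the code used in this assignment: it swaps spread() for pivot_wider(), which tidyr now recommends in its place, and the subzone name ZONE A and all counts are invented for illustration.

```r
library(dplyr)
library(tidyr)

# Made-up long-format rows for one hypothetical subzone (not real census figures)
toy <- tibble(
  SZ  = "ZONE A",
  AG  = c("0_to_4", "25_to_29", "45_to_49", "60_to_64", "65_to_69"),
  Pop = c(10, 40, 30, 20, 10)
)

toy_wide <- toy %>%
  # One column per age cohort, one row per subzone
  pivot_wider(names_from = AG, values_from = Pop) %>%
  mutate(
    # Only two elderly cohorts exist in this toy table
    ELDERLY = `60_to_64` + `65_to_69`,
    TOTAL   = `0_to_4` + `25_to_29` + `45_to_49` + `60_to_64` + `65_to_69`
  )

print(toy_wide)
```

Checking that the category columns add up to the expected totals on a small table like this is a quick way to catch a cohort that was left out of a sum.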
In demo2019, each row holds the counts for one combination of Sex and TOD (type of dwelling), so each SZ (subzone) appears in multiple rows.
To get one row per subzone, use the code below to sum the resident counts within each subzone.
x <- demo2019 %>%
  group_by(SZ) %>%
  summarize_if(is.numeric, sum, na.rm = TRUE)
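To make the effect of this summation concrete, here is a minimal sketch on invented data. It assumes dplyr 1.0 or later and uses summarise() with across(), which dplyr now recommends over summarize_if(); the subzone names, dwelling labels and counts are all hypothetical.

```r
library(dplyr)

# Made-up rows: ZONE A appears twice because it has two dwelling types
toy <- tibble(
  SZ      = c("ZONE A", "ZONE A", "ZONE B"),
  TOD     = c("HDB", "Condo", "HDB"),  # hypothetical dwelling labels
  ELDERLY = c(20, 10, 5),
  TOTAL   = c(100, 50, 40)
)

per_subzone <- toy %>%
  group_by(SZ) %>%
  # Sum every numeric column within each subzone; TOD is dropped
  summarise(across(where(is.numeric), ~ sum(.x, na.rm = TRUE)),
            .groups = "drop")

print(per_subzone)
```

The result has one row per subzone, which is the shape needed before joining to the subzone polygons.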
Use left_join() in this step to join the geographical data with the demographics data using their common identifier (“SUBZONE_N” and “SZ”).
mpsz_x <- left_join(mpsz, x,
                    by = c("SUBZONE_N" = "SZ"))
Use the code below to see the joined data.
mpsz_x
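Because left_join() keeps every row of the left table, any subzone without a matching demographic record ends up with NA counts and no fill on the map. A minimal sketch of how such gaps could be listed is shown below. The two tables are hypothetical stand-ins (shapes for the attribute table of mpsz, counts for x) with invented values.

```r
library(dplyr)

# Hypothetical stand-ins for the real objects
shapes <- tibble(SUBZONE_N = c("ZONE A", "ZONE B", "ZONE C"))
counts <- tibble(SZ = c("ZONE A", "ZONE B"), TOTAL = c(150, 40))

# Same join pattern as above: unmatched subzones get NA
joined <- left_join(shapes, counts, by = c("SUBZONE_N" = "SZ"))

# Subzones in the shapes table that have no demographic record
unmatched <- anti_join(shapes, counts, by = c("SUBZONE_N" = "SZ"))

print(joined)
print(unmatched)
```

With the real data, the same check could be run on the attribute table alone, for example anti_join(st_drop_geometry(mpsz), x, by = c("SUBZONE_N" = "SZ")). Subzones removed earlier by filter(ECONOMY_ACTIVE > 0) would be expected to show up in that list.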
The code below draws a choropleth map of the total resident count in each subzone using tm_shape() and tm_fill(). Use tmap_mode("view") to make the map interactive.
If you want a static map instead, use tmap_mode("plot") in place of tmap_mode("view").
tmap_mode("view")
tm_shape(mpsz_x) +
  tm_fill("TOTAL",
          style = "pretty",
          palette = "Blues") +
  tm_layout(legend.outside = TRUE, legend.position = c("right", "bottom")) +
  tm_borders(alpha = 0.5) +
  # tm_style() styles this map only; tmap_style() would change the global default
  tm_style("white")
The code below uses tm_bubbles() to overlay a proportional symbol map on top of the choropleth map. Bubble size shows the elderly population in each subzone, and bubble colour shows the region.
tm_shape(mpsz_x) +
  tm_fill("TOTAL",
          style = "equal",
          palette = "Blues") +
  # Add this line to overlay the proportional symbol map of the elderly population
  tm_bubbles(size = "ELDERLY", col = "REGION_N") +
  tm_borders(alpha = 0.5) +
  tm_style("white")
By default, the bubbles use palette="cat". In the map above, this palette makes the bubbles a bit difficult to read, because the bubble colour for the West region is quite similar to the fill colour of the choropleth map.
The code below sets palette="div" inside tm_bubbles() so that the bubble colours contrast with the choropleth map.
tm_shape(mpsz_x) +
  tm_fill("TOTAL",
          style = "equal",
          palette = "Blues") +
  # palette = "div" gives bubble colours that contrast with the choropleth fill
  tm_bubbles(size = "ELDERLY", col = "REGION_N", palette = "div") +
  tm_borders(alpha = 0.5) +
  tm_style("white")
By default, hovering over an area on the map above shows its OBJECTID as the identifier. To make the label meaningful to readers, include id="SUBZONE_N" inside tm_fill() and tm_bubbles() so that hovering shows the subzone name (SUBZONE_N) instead of the OBJECTID.
tm_shape(mpsz_x) +
  tm_fill("TOTAL",
          id = "SUBZONE_N",
          style = "equal",
          palette = "Blues") +
  tm_bubbles(id = "SUBZONE_N", size = "ELDERLY", col = "REGION_N", palette = "div") +
  tm_borders(alpha = 0.5) +
  tm_style("white")
The code below uses popup.vars to add details (region and population counts) for each subzone to the popup that appears when an area is clicked.
tm_shape(mpsz_x) +
  tm_fill("TOTAL",
          id = "SUBZONE_N",
          style = "equal",
          palette = "Blues",
          popup.vars = c("Region" = "REGION_N", "ELDERLY" = "ELDERLY", "TOTAL" = "TOTAL")) +
  tm_bubbles(id = "PLN_AREA_N", size = "ELDERLY", col = "REGION_N", palette = "div",
             popup.vars = c("Region" = "REGION_N", "ELDERLY" = "ELDERLY", "TOTAL" = "TOTAL")) +
  tm_borders(alpha = 0.5) +
  tm_style("white")
Use case: My business users have a programme lined up for the elderly population, and they want to understand the distribution of the elderly in Singapore by subzone. With this, they can market their programme in targeted locations with a higher density of elderly residents.
In general, the final visualisation shows that the elderly population is widely spread across the Central region of Singapore. However, if my business users want to target a specific location to advertise their programme, I would recommend starting with Tampines in the East region, which has a larger elderly population. The chances of elderly residents becoming aware of the programme are therefore higher in Tampines than in a location with fewer elderly residents. In addition, Tampines has the highest total population of the areas shown, so if my business users were to advertise another programme aimed at all age groups, they could target Tampines as their first location as well.
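A recommendation like the one above is read off bubble sizes, so a ranked table can serve as a cross-check. The sketch below shows one possible way to do this on invented data. The ELDERLY_SHARE column is an assumption added for illustration (elderly residents as a fraction of all residents) and is not part of the original analysis, and the subzone names and counts are made up.

```r
library(dplyr)

# Made-up per-subzone counts in the same shape as the summarised table x
toy <- tibble(
  SZ      = c("ZONE A", "ZONE B", "ZONE C"),
  ELDERLY = c(30, 5, 60),
  TOTAL   = c(150, 40, 600)
)

ranked <- toy %>%
  # Share of elderly residents in each subzone (hypothetical extra measure)
  mutate(ELDERLY_SHARE = ELDERLY / TOTAL) %>%
  # Largest elderly population first
  arrange(desc(ELDERLY))

print(ranked)
```

In this toy table, the subzone with the most elderly residents is not the one with the highest elderly share, which is why it may be worth looking at both the count and the share before choosing where to advertise.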