0. Introduction

This assignment is interesting because it is completely open to our own imagination and interpretation of how we wish to tackle the problem.

In Assignment 4, where we worked on the population data, there were several areas where displaying the data on a map would have made for a much better visualisation. Since maps were covered in Lecture 10, we will use the same dataset from Assignment 4 to complete the parts of the visualisation that we did not get to do there.

1. Major data and design challenges

Some of the challenges we can expect are as follows:

  • There is a lot of data within the dataset (e.g., where people live, age profile, gender, type of dwelling, etc.), and it is a challenge to present all of the available information to the user. Much of it is geographical and cannot be conveyed effectively in a table or graph, which are the most prevalent forms of visualisation. The most natural solution is to present this information on a map.

  • Since there is so much information available, a single map cannot convey all of it. Therefore, we single out 3 sets of information that we think would interest the reader and plot each of them as a choropleth.

  • In the process of doing this assignment, I originally intended to combine a Shiny app with R Markdown to provide some interactivity. However, renderTmap did not seem to work properly inside the Rmd document (it worked fine in a standalone Shiny app). Therefore, instead of pursuing interactivity via Shiny, we use animation to give the reader more information while keeping the infographic visually appealing. This also addresses one of the problems I highlighted in Assignment 4: when the population pyramid was split into facets by year, it looked extremely cluttered and lost both its appeal and its ability to convey useful information. In this assignment, we instead animate the pyramid, which is both informative (we can see how the population changes over time) and more engaging to look at.

Since a choropleth is very difficult to sketch by hand, we instead provide step-by-step directions for generating them in the next section.

2. Step-by-step flow to generate the visuals in R

In this section, we present the steps used to generate the visuals shown in Section 3.

2.1 Import libraries and data

We start by importing all of the packages needed for data processing and visualisation.

packages <- c('tidyverse', 'gganimate', 'gifski', 'tmap', 'sf', 'png')

# install any package that is missing, then load it
for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}

We import the .csv file as follows:

# read the csv file
pop_data <- read_csv("./respopagesextod2011to2019.csv")

# replace all the underscores in the AG field with spaces
pop_data$AG <- str_replace_all(pop_data$AG, "_", " ")

# preserve the original age-group order (order of first appearance)
pop_data$AG <- factor(pop_data$AG, levels = unique(pop_data$AG))
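The factor step matters because, without explicit levels, factor() sorts the labels alphabetically, which would put "10 to 14" before "5 to 9". The following minimal base-R sketch uses a few hypothetical labels (not read from the dataset) to show the difference. Using unique() as the levels keeps the labels in the order they first appear.

```r
# Hypothetical age-group labels, in the order they appear in the data
ag <- c("0 to 4", "5 to 9", "10 to 14", "0 to 4", "5 to 9", "10 to 14")

# Default factor(): levels are sorted alphabetically
alpha <- factor(ag)
print(levels(alpha))       # "0 to 4" "10 to 14" "5 to 9"

# levels = unique(ag): levels follow the order of first appearance
ordered_ag <- factor(ag, levels = unique(ag))
print(levels(ordered_ag))  # "0 to 4" "5 to 9" "10 to 14"
```

This first-appearance order only gives the right sequence if the rows are already sorted from youngest to oldest age group.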

2.2 Animated population pyramid

Before plotting, we reshape the data and sum the population of each age group by year and sex across all subzones.

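# reshape to wide format: one column per age group
agpop_mutated <- pop_data %>%
  spread(AG, Pop)

# keep only Sex, Time and the age-group columns, sum over all subzones,
# then return to long format (columns 3:21 are the 19 age groups)
pop_data_sumarea <- 
  agpop_mutated[-c(1:2,4)] %>% 
  group_by(Time, Sex) %>%
  summarise_all(list(sum)) %>%
  pivot_longer(
    cols = 3:21,
    names_to = "AG",
    values_to = "Pop"
  )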

# preserve the original age-group order (order of first appearance)
pop_data_sumarea$AG <- factor(pop_data_sumarea$AG, levels = unique(pop_data_sumarea$AG))
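The spread, summarise and pivot_longer round trip above sums each age group by year and sex. For comparison, the same totals can be computed directly in long format with base R's aggregate(). This is an alternative technique, not the approach used above. The small data frame is made up and only reuses the column names SZ, AG, Sex, Time and Pop that appear in the code.

```r
# Toy long-format data using the column names from the code above (values are made up)
toy <- data.frame(
  SZ   = c("A", "B", "A", "B", "A", "B"),
  AG   = c("0 to 4", "0 to 4", "0 to 4", "0 to 4", "5 to 9", "5 to 9"),
  Sex  = c("Males", "Males", "Females", "Females", "Males", "Males"),
  Time = 2019,
  Pop  = c(10, 20, 30, 40, 50, 60)
)

# Sum Pop over subzones for every Time / Sex / AG combination
by_group <- aggregate(Pop ~ Time + Sex + AG, data = toy, FUN = sum)
print(by_group)
```

Staying in long format avoids hard-coded column positions such as 3:21. However, the age-group factor order still has to be applied if AG is not already a factor.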

Generate the animated population pyramid as follows:

# plot the pyramid
pop_pyramid <-
  ggplot(pop_data_sumarea, aes(x=AG, fill=Sex)) + 
  geom_col(data=subset(pop_data_sumarea, Sex=="Females" ), aes(y=Pop) ) +
  geom_col(data=subset(pop_data_sumarea, Sex=="Males" ), aes(y=-Pop) ) +
  scale_y_continuous(breaks = seq(-2000000, 2000000, 100000), 
                     labels = paste0(as.character(c(seq(2000, 0, -100), seq(100, 2000, 100))), "K")) + 
  coord_flip() +
  scale_fill_brewer(palette = "Set1") +
  theme_classic() +
  transition_time(Time) +
  labs(title = "Year: {as.integer(frame_time)}", 
       caption = "Data source: www.singstat.gov.sg") +
  theme(axis.text = element_text(size = 12)) +
  theme(axis.title = element_text(size = 16)) 


pop_py_anim <- animate(pop_pyramid, fps = 10, duration = 10, width = 700, height = 500, start_pause = 10, end_pause = 10)

pop_py_anim
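The pyramid draws males as negative values so their bars extend to the left, and hides the minus sign with a hand-written vector of tick labels. One possible alternative is to pass a labelling function to scale_y_continuous(labels = ...), which ggplot2 accepts. The helper name pyramid_label below is hypothetical, and the sketch only exercises the helper and the sign flip in base R.

```r
# Hypothetical helper: format signed axis values as unsigned thousands ("K")
pyramid_label <- function(x) {
  paste0(abs(x) / 1000, "K")
}

# Males are stored as negative values so that their bars extend to the left
pop <- c(150000, 140000)
sex <- c("Males", "Females")
signed_pop <- ifelse(sex == "Males", -pop, pop)

print(signed_pop)
print(pyramid_label(c(-200000, 0, 100000)))  # "200K" "0K" "100K"
```

In the plot, this would replace the labels vector, e.g. scale_y_continuous(breaks = seq(-2000000, 2000000, 100000), labels = pyramid_label). The tick labels then stay correct if the breaks are changed.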

2.3 Plotting the choropleth

Read the subzone boundary shapefile (layer MP14_SUBZONE_WEB_PL in the geospatial folder):

mpsz <- st_read(dsn = "geospatial", 
                layer = "MP14_SUBZONE_WEB_PL")
## Reading layer `MP14_SUBZONE_WEB_PL' from data source `C:\Users\weige\Google Drive\~MITB\ISSS608 - Visual Analytics\Assignments\Assignment 5\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 323 features and 15 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS:  SVY21

Next, we prepare the data for each subzone that we want to present. We join the dataset with the map data and calculate the following metrics:

  • Average age
  • Percentage of Young, Active and Old residents
  • Population density
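For a single subzone, the three shares reduce to summing consecutive age bands and dividing by the total. This sketch assumes the 19 five-year bands run from youngest to oldest. It uses the split implied by the column indices in the code that follows: the first 5 bands (ages 0 to 24) are Young, the next 8 (25 to 64) are Active, and the last 6 (65 and over) are Old. The counts are made up.

```r
# Made-up population counts for the 19 five-year age bands of one subzone,
# ordered from youngest to oldest
counts <- c(rep(100, 5), rep(200, 8), rep(50, 6))

young  <- sum(counts[1:5])    # bands 1-5   (ages 0-24)
active <- sum(counts[6:13])   # bands 6-13  (ages 25-64)
old    <- sum(counts[14:19])  # bands 14-19 (ages 65 and over)
total  <- young + active + old

shares <- c(Young = young, Active = active, Old = old) / total
print(round(shares, 3))
```

A subzone with no residents has a total of zero, so its shares come out as NaN and would be shown as missing on the map.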

# sum the age-group columns into Young, Active and Old, then compute the total
agpop_mutated <- pop_data %>%
  spread(AG, Pop) %>%
  mutate(Young = rowSums(.[6:10])) %>%
  mutate(Active = rowSums(.[11:18]))  %>%
  mutate(Old = rowSums(.[19:24])) %>%
  mutate(Total = rowSums(.[25:27])) 


# pop_main is used to hold the main data
pop_main <- agpop_mutated[-c(1,3:4)] %>%
  group_by(SZ, Time) %>%
  summarise_all(list(sum))

# calculate percentages
pop_main$Young <- pop_main$Young / pop_main$Total
pop_main$Active <- pop_main$Active / pop_main$Total
pop_main$Old <- pop_main$Old / pop_main$Total

# calculate the average age as a population-weighted mean of the
# age-band midpoints (2.5, 7.5, ..., 92.5)
age_mat <- matrix(1, nrow(pop_main), 1) %*% matrix(seq(from = 2.5, to = 92.5, by = 5), nrow=1, ncol=19)
pop_main$avg_age = rowSums(pop_main[,3:21] * age_mat) / pop_main$Total

# drop all the age-population columns
pop_main <- pop_main[-c(3:21)]

# join the map to the dataset
map_combined <- left_join(mpsz %>% mutate(SUBZONE_N = str_to_title(SUBZONE_N)), 
                          pop_main, 
                          by = c("SUBZONE_N" = "SZ"))

# calculate the population density in residents per square kilometre
# (SHAPE_Area is in square metres, so multiply by 1e6)
map_combined$pop_den = map_combined$Total / map_combined$SHAPE_Area * 1e6

# drop the first two attribute columns, which are not needed
map_combined <- map_combined[-c(1:2)]

# filter down to 2019
map2019 <- map_combined %>% filter(Time==2019)
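st_read() reports the SVY21 projection, which uses metres, so SHAPE_Area is assumed here to be in square metres. Multiplying by 1e6 then converts the density to residents per square kilometre. Here is a quick base-R check with made-up numbers:

```r
# Made-up subzone: 5,000 residents on 2,500,000 square metres (2.5 square km)
total_pop  <- 5000
shape_area <- 2.5e6   # square metres (SVY21 units are metres)

# Residents per square metre, scaled up to residents per square kilometre
pop_den <- total_pop / shape_area * 1e6
print(pop_den)        # 2000
```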

Plotting the choropleth for average age:

tmap_mode("view")


tm_avg_age <- 
  tm_shape(map2019) +
  tm_polygons("avg_age", 
              style = "quantile", 
              palette = "Blues") +
  tm_credits("Data source: www.singstat.gov.sg")

tm_avg_age
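The avg_age values in this map come from the matrix product in Section 2.3. That product is a population-weighted mean of the band midpoints, avg_age = sum(m_i * n_i) / sum(n_i), where m_i is the midpoint and n_i the population of band i. If the last band is open-ended, 92.5 is only an approximation of its midpoint. For one subzone, base R's weighted.mean() gives the same number, as this sketch with made-up counts shows:

```r
# Midpoints of the 19 five-year bands: 2.5, 7.5, ..., 92.5
midpoints <- seq(from = 2.5, to = 92.5, by = 5)

# Made-up counts for one subzone, youngest band first
counts <- c(rep(100, 5), rep(200, 8), rep(50, 6))

# Weighted mean written out explicitly
avg_manual <- sum(midpoints * counts) / sum(counts)

# The same value via base R's weighted.mean()
avg_builtin <- weighted.mean(midpoints, w = counts)

print(c(avg_manual, avg_builtin))
```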

Plotting the choropleth for population density:

tmap_mode("view")

map2019 <- map_combined %>% filter(Time==2019)

tm_popden <- 
  tm_shape(map2019) +
  tm_polygons("pop_den", 
              style = "quantile", 
              palette = "Greens") +
  tm_credits("Data source: www.singstat.gov.sg")


tm_popden
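Both maps use style = "quantile", which places roughly the same number of subzones in each colour class. This is useful for skewed values such as population density. The base-R sketch below imitates the idea with quantile() and cut() on made-up, strongly skewed values. It assumes five classes, chosen to mirror tmap's usual default. It illustrates quantile binning and is not tmap's internal code. An equal-width split is shown for contrast.

```r
# Made-up, strongly skewed values for 20 subzones (e.g. densities)
x <- c(1, 2, 3, 5, 8, 13, 21, 34, 55, 89,
       144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946)

n_classes <- 5

# Quantile breaks at 0%, 20%, ..., 100%
brks <- quantile(x, probs = seq(0, 1, length.out = n_classes + 1))

# Assign each value to a class; include.lowest keeps the minimum in class 1
classes <- cut(x, breaks = brks, include.lowest = TRUE)
print(table(classes))   # 4 subzones in every class

# Equal-width classes for comparison: most values land in the first class
equal <- cut(x, breaks = n_classes)
print(table(equal))
```

With equal-width classes, 16 of the 20 made-up values fall into the lowest class, so most of the map would share one colour. Quantile classes keep the differences between subzones visible.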

3. Singapore Population Statistics

3.1 Population change from 2011 to 2019

Let’s take a look at how our population has changed from 2011 to 2019.

The animation reveals a very interesting phenomenon: the “body” of the pyramid moves upwards as the population ages. It also becomes immediately clear that Singapore’s population is aging rapidly. The top of the pyramid widens much faster than the rest, while the young to middle-aged groups remain more or less the same.