ISSS608 Assignment 5

0. Introduction

This assignment is interesting because it is completely open to our own imagination and interpretation of how to tackle the problem.

In Assignment 4, where we worked on the population data, there were several areas where displaying the data on a map would have made a great visualisation. Since the use of maps was covered in Lecture 10, we will use the same dataset from Assignment 4 to complete the parts of the visualisation that we did not get to do then.

1. Major data and design challenges

Some of the challenges we can expect are as follows:

The dataset contains many variables (e.g., where people live, age profile, gender, type of dwelling), and it is a challenge to present all of the available information to the user. Much of the information is geographical and cannot be conveyed effectively in a table or graph, which are the most prevalent forms of visualisation. The most natural solution is to present the information on a map.
Since so much information is available, a single map cannot convey its richness. Therefore, we will single out 3 sets of information that we think would interest the reader and plot each of them as a choropleth.
While doing this assignment, I originally intended to combine a Shiny app with R Markdown to provide some interactivity. However, renderTmap did not work properly in the Rmd document (it was fine in a standalone Shiny app). Therefore, instead of pursuing interactivity via Shiny, we use animation to give the reader more information while keeping the infographic visually appealing. One problem that I highlighted in Assignment 4 was that the population pyramid, when faceted by year, looked extremely cluttered and lost both its appeal and its ability to convey useful information. In this assignment, we animate the pyramid instead, which is both informative (we can see how the population changes over time) and interesting (animation is more visually appealing).

Since it is difficult to sketch a choropleth by hand, we provide step-by-step directions to generate the choropleths in the next section instead.

2. Step-by-step flow to generate the visuals in R

In this section, we will present the steps used to generate the visuals that will be used in part 3 of the assignment.

2.1 Import libraries and data

We start by importing all of the packages needed for the visualisation and data processing.

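# packages needed for data wrangling, animation and mapping
packages <- c('tidyverse', 'gganimate', 'gifski', 'tmap', 'sf', 'png')

# install any package that is missing, then load it
for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}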

We import the .csv file as follows:

# read the csv file
pop_data <- read_csv("./respopagesextod2011to2019.csv")

# replace all the underscores in the AG field with spaces
pop_data$AG <- str_replace_all(pop_data$AG, "_", " ")

# preserve the original data order
pop_data$AG <- factor(pop_data$AG, levels = unique(pop_data$AG))
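
The ordering step matters because factor() sorts character levels alphabetically by default, which would place "10 to 14" before "5 to 9" on the pyramid axis. The sketch below uses made-up labels (not the real data) to show how taking the levels from unique() keeps the order of first appearance. It assumes the age groups already appear in ascending order in the file.

```r
# Hypothetical age-group labels, in the order they first appear in the data
ag <- c("0 to 4", "5 to 9", "10 to 14", "0 to 4", "5 to 9", "10 to 14")

# Default behaviour: levels are sorted as text, so "10 to 14" is misplaced
default_levels <- levels(factor(ag))

# Taking the levels from unique() keeps the order of first appearance
ag_ordered <- factor(ag, levels = unique(ag))
ordered_levels <- levels(ag_ordered)

print(default_levels)
print(ordered_levels)
```

If the age groups were not already sorted in the file, the levels would have to be listed by hand instead.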

2.2 Animated population pyramid

Aggregate the population by year, sex and age group for the pyramid plot.

agpop_mutated <- pop_data %>%
  spread(AG, Pop)

pop_data_sumarea <- 
  agpop_mutated[-c(1:2,4)] %>% 
  group_by(Time, Sex) %>%
  summarise_all(list(sum)) %>%
  pivot_longer(
    cols = 3:21,
    names_to = "AG",
    values_to = "Pop"
  )

# preserve the original data order
pop_data_sumarea$AG <- factor(pop_data_sumarea$AG, levels = unique(pop_data_sumarea$AG))
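
The reshaping above amounts to totalling Pop over every planning area, subzone and dwelling type for each combination of year, sex and age group. As a rough illustration of that aggregation, the sketch below uses a small made-up table and base R's aggregate() so that it runs without extra packages. The toy values are hypothetical; only the column names follow the dataset.

```r
# Hypothetical rows: two subzones, two years, one age group
toy <- data.frame(
  SZ   = c("A", "A", "B", "B", "A", "B"),
  Time = c(2011, 2011, 2011, 2011, 2012, 2012),
  Sex  = c("Males", "Females", "Males", "Females", "Males", "Males"),
  AG   = rep("0 to 4", 6),
  Pop  = c(10, 20, 30, 40, 50, 60)
)

# Sum Pop over subzones for each year, sex and age group
toy_sum <- aggregate(Pop ~ Time + Sex + AG, data = toy, FUN = sum)
print(toy_sum)
```

With dplyr, the same totals could be obtained with group_by(Time, Sex, AG) followed by summarise(Pop = sum(Pop)), which avoids the wide-then-long round trip.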

Generate the animated population pyramid as follows:

# plot the pyramid
pop_pyramid <-
  ggplot(pop_data_sumarea, aes(x=AG, fill=Sex)) + 
  geom_col(data=subset(pop_data_sumarea, Sex=="Females" ), aes(y=Pop) ) +
  geom_col(data=subset(pop_data_sumarea, Sex=="Males" ), aes(y=-Pop) ) +
  scale_y_continuous(breaks = seq(-2000000, 2000000, 100000), 
                     labels = paste0(as.character(c(seq(2000, 0, -100), seq(100, 2000, 100))), "K")) + 
  coord_flip() +
  scale_fill_brewer(palette = "Set1") +
  theme_classic() +
  transition_time(Time) +
  labs(title = "Year: {as.integer(frame_time)}", 
       caption = "Data source: www.singstat.gov.sg") +
  theme(axis.text = element_text(size = 12)) +
  theme(axis.title = element_text(size = 16)) 


pop_py_anim <- animate(pop_pyramid, fps = 10, duration = 10, width = 700, height = 500, start_pause = 10, end_pause = 10)

pop_py_anim
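
The pyramid is drawn by plotting the male counts as negative values, so the axis labels have to show absolute values for both sides to read as positive counts in thousands. The code above does this with a hand-built label vector. The sketch below shows one possible alternative, a small labelling function (label_abs_k is a hypothetical helper name, not part of ggplot2) applied to a shortened set of breaks.

```r
# Shortened set of breaks for illustration (the plot above uses a wider range)
breaks <- seq(-200000, 200000, 100000)

# Hypothetical helper: show the absolute value in thousands with a "K" suffix
label_abs_k <- function(x) paste0(abs(x) / 1000, "K")

axis_labels <- label_abs_k(breaks)
print(axis_labels)
```

Since scale_y_continuous() accepts a function for its labels argument, a helper like this could be passed directly, which keeps the labels in step with the breaks if the range is changed later.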

2.3 Plotting the choropleth

Read the map file:

mpsz <- st_read(dsn = "geospatial", 
                layer = "MP14_SUBZONE_WEB_PL")

## Reading layer `MP14_SUBZONE_WEB_PL' from data source `C:\Users\weige\Google Drive\~MITB\ISSS608 - Visual Analytics\Assignments\Assignment 5\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 323 features and 15 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS:  SVY21

Prepare the data that we want to present for each subzone. We calculate the following metrics and join them with the map data:

Average age
Percentage of Young, Active and Old residents
Population density
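
To make the three metrics concrete, here is a worked example for a single made-up subzone with only three age bands (the real data has 19). It assumes that each band is represented by its midpoint age and that the area is given in square metres, so multiplying by 1e6 yields residents per square kilometre. All numbers are hypothetical.

```r
# Hypothetical subzone with three 5-year age bands
pop       <- c(100, 300, 100)    # residents in each band
midpoints <- c(2.5, 7.5, 12.5)   # assumed representative age of each band
area_m2   <- 2e6                 # area in square metres

total   <- sum(pop)                        # total residents
avg_age <- sum(pop * midpoints) / total    # population-weighted mean age
share   <- pop / total                     # proportion in each band
pop_den <- total / area_m2 * 1e6           # residents per square kilometre

print(c(total = total, avg_age = avg_age, pop_den = pop_den))
print(share)
```

A subzone with no residents has a total of zero, so its average age and shares come out as NaN. Such subzones need to be handled before plotting.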

# total the Young (0 to 24), Active (25 to 64) and Old (65 and over) age bands
agpop_mutated <- pop_data %>%
  spread(AG, Pop) %>%
  mutate(Young = rowSums(.[6:10])) %>%
  mutate(Active = rowSums(.[11:18]))  %>%
  mutate(Old = rowSums(.[19:24])) %>%
  mutate(Total = rowSums(.[25:27])) 


# pop_main is used to hold the main data
pop_main <- agpop_mutated[-c(1,3:4)] %>%
  group_by(SZ, Time) %>%
  summarise_all(list(sum))

# calculate percentages
pop_main$Young <- pop_main$Young / pop_main$Total
pop_main$Active <- pop_main$Active / pop_main$Total
pop_main$Old <- pop_main$Old / pop_main$Total

# calculate average age
age_mat <- matrix(1, nrow(pop_main), 1) %*% matrix(seq(from = 2.5, to = 92.5, by = 5), nrow=1, ncol=19)
pop_main$avg_age = rowSums(pop_main[,3:21] * age_mat) / pop_main$Total

# drop all the age-population columns
pop_main <- pop_main[-c(3:21)]

# join the map to the dataset
map_combined <- left_join(mpsz %>% mutate(SUBZONE_N = str_to_title(SUBZONE_N)), 
                          pop_main, 
                          by = c("SUBZONE_N" = "SZ"))

# calculate the population density
map_combined$pop_den = map_combined$Total / map_combined$SHAPE_Area * 1e6

map_combined <- map_combined[-c(1:2)]

# filter down to 2019
map2019 <- map_combined %>% filter(Time==2019)
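
Because left_join() keeps every map polygon, a subzone whose name does not match the population data simply receives NA metrics without any warning. A quick check of the join keys before joining can catch this. The sketch below uses made-up name vectors (not the real subzone lists) and compares them in upper case, which avoids relying on title-casing rules for names with punctuation.

```r
# Hypothetical subzone names from the map layer and the population data
map_names <- c("MARINA SOUTH", "PEARL'S HILL", "NORTH-EASTERN ISLANDS")
pop_names <- c("Marina South", "Pearl's Hill")

# Names on the map that have no counterpart in the population data
unmatched <- setdiff(toupper(map_names), toupper(pop_names))
print(unmatched)
```

On data frames, dplyr's anti_join() gives the same kind of check by returning the rows of one table that have no match in the other.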

Plotting the choropleth for average age:

tmap_mode("view")


tm_avg_age <- 
  tm_shape(map2019) +
  tm_polygons("avg_age", 
          style = "quantile", 
          palette = "Blues") +
  tm_credits("Data source: www.singstat.gov.sg")

tm_avg_age
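
The "quantile" style places the class breaks so that each colour class holds roughly the same number of subzones. tmap computes these breaks internally. As an approximation of the idea, the sketch below uses base R's quantile() on made-up average ages, assuming five classes (which, as far as I know, is tmap's default number of classes). The exact breaks tmap produces may differ slightly.

```r
# Hypothetical average ages for ten subzones
avg_age <- c(30, 32, 35, 36, 38, 40, 41, 43, 45, 50)

# Five classes need six break points at the 0%, 20%, ..., 100% quantiles
brks <- quantile(avg_age, probs = seq(0, 1, by = 0.2))

# Assign each subzone to a class and count the members of each class
classes <- cut(avg_age, breaks = brks, include.lowest = TRUE)
class_counts <- table(classes)
print(brks)
print(class_counts)
```

Equal-sized classes make differences between subzones easy to see, but the class widths are uneven, so two subzones in adjacent classes may differ by only a fraction of a year.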

Plotting the choropleth for population density:

tmap_mode("view")

map2019 <- map_combined %>% filter(Time==2019)

tm_popden <- 
  tm_shape(map2019) +
  tm_polygons("pop_den", 
          style = "quantile", 
          palette = "Greens") +
  tm_credits("Data source: www.singstat.gov.sg")


tm_popden

3. Singapore Population Statistics

3.1 Population change from 2011 to 2019

Let’s take a look at how our population has changed from 2011 to 2019.

The animation shows a very interesting phenomenon: the “body” of the pyramid moves upwards as the population ages. It also becomes immediately clear that Singapore’s population is aging rapidly, as the top of the pyramid widens much faster while the young and middle age groups remain more or less the same.

3.2 Population trends by districts in 2019

Focusing on our population in 2019, let us look at the distribution of the average age of the residents in the various districts.

We can see that the average age of the residents is generally higher in the middle of the island, which mostly corresponds to the mature estates. As we move outwards from these areas, the average age generally goes down. At the outskirts of Singapore, in particular in the Western, Northern and North-Eastern parts, we can clearly see the areas with the youngest average age. These areas are likely home to young couples with children, which brings down the average age.

We also see some outliers at the fringes of the island, in particular Lim Chu Kang, Changi Point, Loyang West and Port. It is not clear how these data points were compiled, as they have either a relatively low population count or a high proportion of residents in the middle or old age groups. The data points are as follows:

SUBZONE_N      Young   Active   Old   Pop   Avg Age
Loyang West    0%      5%       95%   190   80.92
Port           0%      80%      20%   50    50.50
Changi Point   15%     69%      17%   480   48.96
Lim Chu Kang   29%     57%      14%   70    48.93

Taking a look at the population density, we have:

The population density reveals an interesting contrast to the average age: the new towns in the outskirts, where most of the younger population lives, are more densely populated. This is most likely because housing in the mature estates is expensive and less affordable for young couples than housing in the new towns. In the mature estates, the older residents would either have bought their homes many years ago when housing was still relatively cheap, or have worked for many years to afford housing in the more central areas.