This assignment is interesting because it leaves the choice of problem and approach entirely open to our imagination and interpretation.
In Assignment 4, where we worked on the population data, there were several places where displaying the data on a map would have made a great visualisation. Since the use of maps was covered in Lecture 10, we will use the same dataset from Assignment 4 to complete the parts of the visualisation that we did not get to do there.
Some of the challenges we can expect are as follows:
The dataset contains a lot of data (e.g., where people live, age profile, gender, type of dwelling), and it is a challenge to present all of the available information to the user. Much of the information is geographical and cannot be conveyed effectively in a table or graph, which are the most prevalent forms of visualisation. The most natural solution is to present the information on a map.
Since there is quite a lot of information available, a single map would not be able to convey its richness. Therefore, we will single out 3 sets of information that we think would be interesting to the reader and plot them as choropleths.
While doing this assignment, I had originally intended to combine a Shiny app with R Markdown to provide some interactivity. However, there seem to be some issues with renderTmap, which did not quite work in Rmd (it was fine as a standalone Shiny app). Therefore, instead of pursuing interactivity via Shiny, we use animation to give the reader more information while keeping the infographic visually appealing. One of the problems I highlighted in Assignment 4 was that the population pyramid, when faceted by year, ended up looking extremely cluttered and lost both its appeal and its ability to convey useful information. In this assignment, we instead animate the pyramid, which is both informative (we can see how the population changes over time) and interesting (animation is more visually appealing).
Since it is really difficult to sketch a choropleth by hand, we will instead provide step-by-step directions to generate them in the next section.
In this section, we will present the steps used to generate the visuals that will be used in part 3 of the assignment.
We start by importing all of the necessary packages for the visualisation and data processing.
packages <- c('tidyverse', 'gganimate', 'gifski', 'tmap', 'sf', 'png')
for (p in packages) {
  # install the package only if it cannot be loaded
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
We import the .csv file as follows:
# read the csv file
pop_data <- read_csv("./respopagesextod2011to2019.csv")
# remove all the underscores from the AG field
pop_data$AG <- str_replace_all(pop_data$AG, "_", " ")
# preserve the original data order (levels follow first appearance in the file)
pop_data$AG <- factor(pop_data$AG, levels = unique(pop_data$AG))
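The factor step above matters because R sorts character levels as text by default, which would place "10 to 14" ahead of "5 to 9". A minimal sketch with a made-up vector (the name `ag` and its values are purely illustrative, not taken from the dataset) shows the difference:

```r
# Hypothetical age-group labels in file order, with repeats
ag <- c("0 to 4", "5 to 9", "10 to 14", "0 to 4", "5 to 9", "10 to 14")

# Default factor(): levels are sorted as text
default_levels <- levels(factor(ag))

# Levels taken from first appearance keep the intended age order
ordered_levels <- levels(factor(ag, levels = unique(ag)))

print(default_levels)
print(ordered_levels)
```

Keeping the levels in file order is what lets the later plots and column positions follow the age bands from youngest to oldest.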
Perform data processing for the pyramid plot.
# reshape to wide form: one column per age group
agpop_mutated <- pop_data %>%
  spread(AG, Pop)
# drop columns 1, 2 and 4 (not needed for the pyramid), sum over all areas,
# then return to long form for plotting
pop_data_sumarea <-
  agpop_mutated[-c(1:2, 4)] %>%
  group_by(Time, Sex) %>%
  summarise_all(list(sum)) %>%
  pivot_longer(
    cols = 3:21,
    names_to = "AG",
    values_to = "Pop"
  )
# preserve the original data order
pop_data_sumarea$AG <- factor(pop_data_sumarea$AG, levels = unique(pop_data_sumarea$AG))
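The processing above goes from long to wide, sums, and then returns to long form. As a hedged alternative sketch (not the approach used in this assignment), the same kind of per-year, per-sex, per-age-group totals can be obtained by grouping the long data directly. The data frame `toy_pop` below is invented for illustration and only mimics the dataset's column names.

```r
library(dplyr)

# Invented long-format data mimicking the dataset's columns
toy_pop <- data.frame(
  SZ   = c("A", "A", "B", "B"),
  AG   = c("0 to 4", "5 to 9", "0 to 4", "5 to 9"),
  Sex  = c("Males", "Males", "Males", "Males"),
  Time = c(2019, 2019, 2019, 2019),
  Pop  = c(10, 20, 30, 40)
)

# Sum the population over all subzones for each year, sex and age group
toy_sum <- toy_pop %>%
  group_by(Time, Sex, AG) %>%
  summarise(Pop = sum(Pop), .groups = "drop")

print(toy_sum)
```

Grouping directly avoids relying on column positions such as `3:21`, at the cost of differing from the wide table that the later steps reuse.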
Generate the animated population pyramid as follows:
# plot the pyramid: females extend to the right, males (negated) to the left
pop_pyramid <-
  ggplot(pop_data_sumarea, aes(x = AG, fill = Sex)) +
  geom_col(data = subset(pop_data_sumarea, Sex == "Females"), aes(y = Pop)) +
  geom_col(data = subset(pop_data_sumarea, Sex == "Males"), aes(y = -Pop)) +
  scale_y_continuous(breaks = seq(-2000000, 2000000, 100000),
                     labels = paste0(as.character(c(seq(2000, 0, -100), seq(100, 2000, 100))), "K")) +
  coord_flip() +
  scale_fill_brewer(palette = "Set1") +
  theme_classic() +
  transition_time(Time) +
  labs(title = "Year: {as.integer(frame_time)}",
       caption = "Data source: www.singstat.gov.sg") +
  theme(axis.text = element_text(size = 12)) +
  theme(axis.title = element_text(size = 16))
# render the animation
pop_py_anim <- animate(pop_pyramid, fps = 10, duration = 10, width = 700, height = 500, start_pause = 10, end_pause = 10)
pop_py_anim
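The `breaks` and `labels` vectors given to `scale_y_continuous()` have to be the same length and line up one to one, otherwise the axis is mislabelled. The sketch below checks this in base R and shows a possible alternative: a small labelling function, here called `abs_k` (a hypothetical helper, not part of the original code), that derives each label from its break.

```r
# Breaks and labels as used for the pyramid's population axis
brks <- seq(-2000000, 2000000, 100000)
lbls <- paste0(as.character(c(seq(2000, 0, -100), seq(100, 2000, 100))), "K")

# Hypothetical helper: build the label from the break itself,
# showing absolute values in thousands
abs_k <- function(x) paste0(abs(x) / 1000, "K")

# Both approaches should agree
stopifnot(length(brks) == length(lbls))
stopifnot(identical(abs_k(brks), lbls))
```

Since `labels` in `scale_y_continuous()` also accepts a function, passing `labels = abs_k` would keep the labels in step with the breaks if the range were changed later.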
Read the map file:
mpsz <- st_read(dsn = "geospatial",
                layer = "MP14_SUBZONE_WEB_PL")
## Reading layer `MP14_SUBZONE_WEB_PL' from data source `C:\Users\weige\Google Drive\~MITB\ISSS608 - Visual Analytics\Assignments\Assignment 5\geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 323 features and 15 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
## projected CRS: SVY21
Prepare the various data for each subzone that we want to present. We join the dataset with the map data and calculate the following metrics:
# summarise the young, active and old groups from the age-band columns
agpop_mutated <- pop_data %>%
  spread(AG, Pop) %>%
  mutate(Young = rowSums(.[6:10])) %>%   # first five age bands
  mutate(Active = rowSums(.[11:18])) %>% # next eight age bands
  mutate(Old = rowSums(.[19:24])) %>%    # last six age bands
  mutate(Total = rowSums(.[25:27]))
# pop_main is used to hold the main data
pop_main <- agpop_mutated[-c(1, 3:4)] %>%
  group_by(SZ, Time) %>%
  summarise_all(list(sum))
# calculate percentages
pop_main$Young <- pop_main$Young / pop_main$Total
pop_main$Active <- pop_main$Active / pop_main$Total
pop_main$Old <- pop_main$Old / pop_main$Total
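# calculate average age as a weighted mean of the five-year band midpoints
# (2.5, 7.5, ..., 92.5), weighted by the population in each band
age_mat <- matrix(1, nrow(pop_main), 1) %*% matrix(seq(from = 2.5, to = 92.5, by = 5), nrow = 1, ncol = 19)
pop_main$avg_age <- rowSums(pop_main[, 3:21] * age_mat) / pop_main$Total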
# drop all the age-population columns
pop_main <- pop_main[-c(3:21)]
# join the map to the dataset
map_combined <- left_join(mpsz %>% mutate(SUBZONE_N = str_to_title(SUBZONE_N)),
                          pop_main,
                          by = c("SUBZONE_N" = "SZ"))
# calculate the population density
map_combined$pop_den <- map_combined$Total / map_combined$SHAPE_Area * 1e6
map_combined <- map_combined[-c(1:2)]
# filter down to 2019
map2019 <- map_combined %>% filter(Time == 2019)
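The average age computed above is a weighted mean: each five-year band is represented by its midpoint and weighted by the number of residents in that band. A small worked example with invented counts for three bands (the matrix `counts` and the zone names are made up) shows the arithmetic:

```r
# Invented head counts for two subzones across three age bands
counts <- matrix(c(10, 20, 30,
                    0,  5, 15),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("Zone A", "Zone B"),
                                 c("0 to 4", "5 to 9", "10 to 14")))

# Midpoints of the three five-year bands
midpoints <- c(2.5, 7.5, 12.5)

# Weighted mean age per row: sum(count * midpoint) / sum(count)
avg_age <- as.vector(counts %*% midpoints) / rowSums(counts)
print(avg_age)
```

For Zone A this is (10 × 2.5 + 20 × 7.5 + 30 × 12.5) / 60 = 550 / 60. Note that a subzone with a total of zero would give NaN, and that using 92.5 for the open-ended oldest band is an approximation.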
Plotting the choropleth for average age:
tmap_mode("view")
tm_avg_age <-
  tm_shape(map2019) +
  tm_polygons("avg_age",
              style = "quantile",
              palette = "Blues") +
  tm_credits("Data source: www.singstat.gov.sg")
tm_avg_age
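With `style = "quantile"`, the class breaks are chosen so that each colour class holds roughly the same number of subzones. The base R sketch below illustrates the idea on invented values, assuming five classes; tmap's own classification may treat boundaries and ties slightly differently.

```r
# Invented average ages for ten areas
ages <- c(30, 32, 35, 36, 38, 40, 41, 43, 45, 50)

# Five classes need six break points at the 0%, 20%, ..., 100% quantiles
brks <- quantile(ages, probs = seq(0, 1, length.out = 6))

# Assign each value to a class; include.lowest keeps the minimum in class 1
classes <- cut(ages, breaks = brks, include.lowest = TRUE)
print(table(classes))
```

Because the classes are defined by rank rather than by equal-width intervals, a few extreme subzones do not compress the rest of the map into a single colour.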
Plotting the choropleth for population density:
tmap_mode("view")
map2019 <- map_combined %>% filter(Time==2019)
tm_popden <-
  tm_shape(map2019) +
  tm_polygons("pop_den",
              style = "quantile",
              palette = "Greens") +
  tm_credits("Data source: www.singstat.gov.sg")
tm_popden
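The `pop_den` values mapped above come from dividing the resident count by `SHAPE_Area` and multiplying by 1e6. Assuming the area is stored in square metres (which is what a metre-based projected CRS would give, but is an assumption here), the result is residents per square kilometre. A quick check with invented numbers:

```r
# Invented subzone: 12,000 residents on 1.5 square kilometres
residents <- 12000
area_m2   <- 1.5e6   # assumed to be in square metres

# Same formula as pop_den: residents per square kilometre
pop_den <- residents / area_m2 * 1e6
print(pop_den)
```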
Let’s take a look at how our population has changed from 2011 to 2019.
The animation shows a very interesting phenomenon: the “body” of the pyramid moves upwards as the population ages. It also becomes immediately clear that Singapore’s population is aging rapidly, as the top of the pyramid widens much faster while the young and middle-aged groups remain more or less the same.
Focusing on our population in 2019, let us look at the distribution of the average age of residents across the various subzones.
We can see that the average age of residents is generally higher in the middle of the island, which mostly corresponds to the mature estates. As we move outwards from these areas, the average age generally falls. The outskirts of Singapore, in particular the Western, Northern and North-Eastern parts, clearly correspond to the areas with the youngest average age. These are likely young couples with kids, which brings down the average age.
We also see some outliers at the fringes of the island, in particular Lim Chu Kang, Changi Point, Loyang West and Port. It is not clear how these data points were compiled, as they have either a relatively low population count or a high share of the middle or old age groups. The data points are as follows:
| SUBZONE_N | Young | Active | Old | Pop | Avg Age |
|---|---|---|---|---|---|
| Loyang West | 0% | 5% | 95% | 190 | 80.92105 |
| Port | 0% | 80% | 20% | 50 | 50.5 |
| Changi Point | 15% | 69% | 17% | 480 | 48.95833 |
| Lim Chu Kang | 29% | 57% | 14% | 70 | 48.92857 |
Taking a look at the population density, we have:
The population density offers an interesting contrast to the average age: the new towns in the outskirts, where the younger population mostly lives, are clearly more densely populated. This is most likely because housing in the mature estates is expensive and less affordable for young couples than housing in the new towns. In the mature estates, the older residents would either have bought their homes many years ago, when housing was still relatively cheap, or have worked for many years to afford housing in the more central areas.