1. Overview

This DataViz assignment is to capture and reveal the changing patterns of demographic composition (i.e. the young (age 0-24), the economically active group (i.e. age 25-64) and the aged group (i.e. 65 and above)) in Singapore by geographical hierarchy (i.e. region and planning area) over time (i.e. 2011-2019).

For this assignment, we will be using a age sex pyramid, bubble plot as well as ternary plot achieve the task.

2.1 Major data and design challenges faced in accomplishing the task

2.2. Ways to overcome the challenges

2.3 Sketch of proposed design

Caption for the picture.

Caption for the picture.

3. Step-by-step description of process

3.1 Installing and launching R packages

This code chunk install the basic tidyverse and plotly packages and load them onto RStudio environment

#packages <- c('tidyverse','plotly','gganimate','gifski','reshape2')
packages <- c('tidyverse','plotly')

for(p in packages){
  if(!require(p, character.only=T)){
    install.packages(p)
  }
  library(p, character.only=T)
}

3.2 Importing Data

This code chunk reads the csv file and pass it into the data frame.

sg_pop <- read_csv("data/respopagesextod2011to2019.csv")
## Parsed with column specification:
## cols(
##   PA = col_character(),
##   SZ = col_character(),
##   AG = col_character(),
##   Sex = col_character(),
##   TOD = col_character(),
##   Pop = col_double(),
##   Time = col_double()
## )

3.3 Data wrangling

For the first graph which is the age sex pyramid, we will be preparing the data frame as below. In the analysis, we will only be looking at the year 2019. Hence, we will be using the filter function and taking selecting the relavant fields.

sg_pop_pyramid <- sg_pop %>%
  mutate(`Time`=as.character(Time)) %>%
  filter(Time=="2019") %>%
  select(-PA,-SZ,-TOD,-Time)

#replacing the string so that it can be arranged correctly
sg_pop_pyramid$AG<-str_replace(as.character(sg_pop_pyramid$AG),"5_to_9","05_to_9")
sg_pop_pyramid_final <- aggregate(Pop~AG+Sex,data=sg_pop_pyramid,FUN=sum)

Grouping the data by PA, Time, AG. This is to sum the total population for the age group at the planning area by year.

#group by dc_year
sg_pop_grp <- sg_pop %>%
  group_by(PA,Time,AG) %>%
  summarise(DC_total=sum(Pop))

We first need to ungroup the dataframe after which we need to transform the the format into a wide format which can be easily ingested by ggplot. After transforming by using the spread function, we sum the population according to their demographic composition. (demographic composition (i.e. the young (age 0-24), the economically active group (i.e. age 25-64) and the aged group (i.e. 65 and above)). This is done by rowSums function.

agpop_mutated <- sg_pop_grp %>%
  ungroup(sg_pop_grp)%>%
  #mutate(`Time` = as.character(Time))%>%
  spread(AG, DC_total) %>%
  mutate(YOUNG = rowSums(.[3:7]))%>%
  mutate(ACTIVE = rowSums(.[8:15]))  %>%
  mutate(OLD = rowSums(.[16:20])) %>%
  mutate(TOTAL = rowSums(.[21:23])) %>%
  #filter(Year == 2018)%>%
  filter(TOTAL > 0)

3.4 Data visualization

We will first have a look at the visualization of the age sex pyramid for Singapore in the year 2019. Here we are using ggplotly to have some interaction with the graph. Users can click to find more information about the data point.

p<-ggplot(sg_pop_pyramid_final,aes(x = AG, y = ifelse(Sex == "Males", yes = -Pop, no = Pop),fill = Sex,
                                   text = paste('Sex: ', Sex, '<br>Pop:',Pop,'<br>Age Group:',AG )))+
  geom_col()+
  coord_flip()+
  scale_y_continuous(labels = abs, limits = max(sg_pop_pyramid_final$Pop)*c(-1,1))+
  list( 
  theme_bw(),
  theme(panel.grid.major.x = element_blank()),
  theme(axis.text.x.top = element_text(size=12)),
  theme(plot.title = element_text(size=14, face = "bold", hjust = 0.5)),
  theme(plot.subtitle = element_text(hjust = 0.5))
) +
  labs(
    x="Age",
    y = "Population",
    title = "Singapore's Age-Sex Pyramid Structure, 2019",
    subtitle = "Age-sex ratio of the population in Singapore seems to be skewed towards middle aged generation")

ggplotly(p,session="knitr",tooltip = "text")

We will be creating an interactive bubble plot to visualize the change in over the past few years. Considering the small space to create the plot, we will just be looking at 4 years over the past 7 years (interval of every 2 years)

For this analysis, we will be comparing the percentage of young against. We can double click on the legend to isolate a planning area. We will be discussing the insights in the next sections.

p <- agpop_mutated %>%
  filter((Time == 2019 | Time == 2017 | Time == 2015 | Time == 2013) & TOTAL > 1000) %>%
  ggplot(aes(x = OLD/TOTAL, y = YOUNG/TOTAL)) + 
  geom_point()

p <- p + facet_grid(Time ~ .)+
  geom_point(aes(color = PA,size = TOTAL), alpha = 0.5)+ 
  scale_size(range = c(0.5, 7))+ # Adjust the range of points size
  theme(plot.title = element_text(hjust = 0.5))+
  ggtitle("Active Group vs Old Group (%)") +
  xlab("Old Group %") + 
  ylab("YOUNG Group %") +
  scale_y_continuous(labels = scales::percent)+
  scale_x_continuous(labels = scales::percent)
  
# Display the plot
fig <- ggplotly(p)
fig

Over here, we will be creating a ternary chart. This is an animated chart where it will show us the movement of age group for the planning area. Similarly, the insights will be discussed in the next section.

# axis layout
axis <- function(title) {
  list(
    title = title,
    titlefont = list(
      size = 20
    ),
    tickfont = list(
      size = 15
    ),
    tickcolor = "rgba(70,130,180,1)",
    ticklen = 5
  )
}

fig <- agpop_mutated %>% 
  plot_ly(frame=~Time,size = ~TOTAL)
fig <- fig %>% add_trace(
    type = 'scatterternary',
    mode = 'markers',
    a = ~YOUNG,
    b = ~ACTIVE,
    c = ~OLD,
    text = ~paste('<br>Planning Area:', PA,
                  '<br>(A) Young pop:', YOUNG,
                  '<br>(B) Active pop:', ACTIVE, 
                  '<br>(C) Old pop:', OLD
                  ),
    opacity=0.4,
    marker = list( 
      symbol = 0,
      color = '#DB7365',
      sizemode="diameter",
      sizeref=4,
      line = list('width' = 2)
    )
  )

fig <- fig %>% layout(
    title = "Ternary Plot",
    ternary = list(
      sum = 100,
      aaxis = list(title="Young"),
      baxis = list(title="Active",min=0),
      caxis = list(title="Old",min=0),
      paper_bgcolor = 'rgb(243, 243, 243)',
      plot_bgcolor = 'rgb(243, 243, 243)'
    )
  )%>%
  animation_slider(
    currentvalue = list(prefix = "Year ", font = list(color="red"))
  ) %>%
    animation_opts(
    2000, redraw = FALSE
  )

fig

4. Insights from the visualization

Caption for the picture.

+ We can futhur confirm that in Singapore majority of the planning area are seeing an increase of the old group over the past years by looking at the ternary plot. If we press the play button, we can see that the bubbles as a group are moving towards the old group and away from the young and active group.

5. Reflection highlight