1. Overview

In this assignment, public demographic data published by the Singapore Department of Statistics for the year 2019 will be analyzed to glean meaningful interpretations from it.

Design
This data will be analyzed at two levels: by age cohort as well as planning area. Planning areas are the main areas of urban planning and census divisions of Singapore as delineated by the Urban Redevelopment Authority.

In addition, the Old Age Support Ratio (OASR), defined as the number of people in the population aged 20 to 64 divided by those above 64, was also calculated and studied at a national and planning area level. In 2015, countries in the Asia/Pacific region had an average OASR of 11 (OECD, 2019). By examining this statistic, it is possible to gauge whether Singapore is indeed facing an aging population compared to countries in the region and also whether the elderly in Singapore are congregating in a certain planning area.

To visualize the population data, it was decided to use population pyramids.Population pyramids are the most popular way of representing demographic data as it is easy to tell at a glance if a population is aging or growing solely from the shape (top or bottom heavy) of the visualization. In addition to a pyramid for the entire Singapore, a grid of pyramids (one pyramid for each planning area) will also be done up as seen in the sketch below to allow examination of the age group distribution in each planning area.

Sketch of population pyramid grid

Challenges

  1. Data Visualization Steps

2.1 Importing/installing packages

The following code loops through the packages defined and installs them if they are not installed. Thereafter, it loads them.

packages = c('tidyverse', 'dplyr', 'gridExtra', 'magick')

for(p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

2.2 Preparing a Population Pyramid for Singapore

dplyr package pipes were used to filter the original dataset to obtain 2019 data only. A new field called Pop2 was created, which contained the population in units of thousands for cleaner presentation. In Pop2, the female population values were changed to negative so that they would be placed on the opposite side of males to create the pyramid structure. The age groups were also recoded to remove the underscores to look more presentable. ggplot was used to create the pyramid, with custom x-axis labels to replace the negative values for female population. The females were coloured pink and the males blue.

pop_data <- read_csv("data/respopagesextod2011to2019.csv") %>% filter(Time==2019) 
pop_data$Pop2 <- ifelse(pop_data$Sex == "Females", -pop_data$Pop/1000, pop_data$Pop/1000)

pop_data_grouped <- pop_data %>% 
  group_by(PA,AG,Sex) %>% 
  summarise(group_size=sum(Pop2)) %>% 
  replace(. == '0_to_4', '0 to 4') %>%
  replace(. == '5_to_9', '5 to 9') %>%
  replace(. == '10_to_14', '10 to 14') %>%
  replace(. == '15_to_19', '15 to 19') %>%
  replace(. == '20_to_24', '20 to 24') %>%
  replace(. == '25_to_29', '25 to 29') %>%
  replace(. == '30_to_34', '30 to 34') %>%
  replace(. == '35_to_39', '35 to 39') %>%
  replace(. == '40_to_44', '40 to 44') %>%
  replace(. == '45_to_49', '45 to 49') %>%
  replace(. == '50_to_54', '50 to 54') %>%
  replace(. == '55_to_59', '55 to 59') %>%
  replace(. == '60_to_64', '60 to 64') %>%
  replace(. == '65_to_69', '65 to 69') %>%
  replace(. == '70_to_74', '70 to 74') %>%
  replace(. == '75_to_79', '75 to 79') %>%
  replace(. == '80_to_84', '80 to 84') %>%
  replace(. == '85_to_89', '85 to 89') %>%
  replace(. == '90_and_over', '90 and over')

pop_data_grouped$AG <- factor(pop_data_grouped$AG, levels=c('0 to 4',
                                                            '5 to 9',
                                                            '10 to 14',
                                                            '15 to 19',
                                                            '20 to 24',
                                                            '25 to 29',
                                                            '30 to 34',
                                                            '35 to 39',
                                                            '40 to 44',
                                                            '45 to 49',
                                                            '50 to 54',
                                                            '55 to 59',
                                                            '60 to 64',
                                                            '65 to 69',
                                                            '70 to 74',
                                                            '75 to 79',
                                                            '80 to 84',
                                                            '85 to 89',
                                                            '90 and over'))



ggplot(pop_data_grouped, aes(x=group_size, y=AG, fill=Sex)) +
  geom_col() +
  labs(title="Population Pyramid of Singapore, 2019", x="Population ('000s)", y="Age Group") +
  scale_x_continuous(breaks = seq(-160, 160, 10), labels = paste0(c(seq(160, 0, -10), seq(10, 160, 10)))) +
  scale_fill_manual(values=c("lightpink", "lightblue")) +
  theme_classic() +
  theme(plot.title = element_text(size=18, hjust=0.5),
        axis.title.x = element_text(size=14),
        axis.title.y = element_text(size=14),
        legend.title = element_text(size=16),
        legend.text = element_text(size=14))

2.3 Population pyramids for each planning area

A list of unique values in the planning area column was looped through. If population data was present for that zone, a population pyramid similar to the one above would be plotted. Else, the planning area is skipped as there is no point plotting an empty graph. A message will be printed to console to inform the reader that the planning area was skipped. In each iteration, the graph is assigned a plot name and finally the gridExtra package is used to put everything together in two columns.

i <- 1

for (zone in unique(pop_data_grouped$PA)) {
  zone_data <- pop_data_grouped %>% filter(PA==zone)
  if (colSums(zone_data[, 4]) != 0) {
    plot <- ggplot(zone_data, aes(x=group_size, y=AG, fill=Sex)) +
      geom_col() +
      labs(title=paste("Population of Singapore (", zone, "), 2019", sep=""), x="Population ('000s)", y="Age Group") +
      scale_x_continuous(breaks = seq(-15, 15, 1), labels = paste0(c(seq(15, 0, -1), seq(1, 15, 1)))) +
      scale_fill_manual(values=c("lightpink", "lightblue")) +
      theme_classic() +
      theme(plot.title = element_text(size=11, hjust=0.5),
            axis.title.x = element_text(size=10),
            axis.title.y = element_text(size=10),
            legend.position = "none")
    plot_name <- paste("plot_", i, sep="")
    i <- i+1
    assign(plot_name, plot)
  } else {
    print(paste("INFO: ", zone, " has no population data and was skipped.", sep=""))
  }
}
## [1] "INFO: Boon Lay has no population data and was skipped."
## [1] "INFO: Central Water Catchment has no population data and was skipped."
## [1] "INFO: Changi Bay has no population data and was skipped."
## [1] "INFO: Marina East has no population data and was skipped."
## [1] "INFO: Marina South has no population data and was skipped."
## [1] "INFO: North-Eastern Islands has no population data and was skipped."
## [1] "INFO: Paya Lebar has no population data and was skipped."
## [1] "INFO: Pioneer has no population data and was skipped."
## [1] "INFO: Simpang has no population data and was skipped."
## [1] "INFO: Straits View has no population data and was skipped."
## [1] "INFO: Tengah has no population data and was skipped."
## [1] "INFO: Tuas has no population data and was skipped."
## [1] "INFO: Western Islands has no population data and was skipped."
grid.arrange(plot_1,
             plot_2,
             plot_3,
             plot_4,
             plot_5,
             plot_6,
             plot_7,
             plot_8,
             plot_9,
             plot_10,
             plot_11,
             plot_12,
             plot_13,
             plot_14,
             plot_15,
             plot_16,
             plot_17,
             plot_18,
             plot_19,
             plot_20,
             plot_21,
             plot_22,
             plot_23,
             plot_24,
             plot_25,
             plot_26,
             plot_27,
             plot_28,
             plot_29,
             plot_30,
             plot_31,
             plot_32,
             plot_33,
             plot_34,
             plot_35,
             plot_36,
             plot_37,
             plot_38,
             plot_39,
             plot_40,
             plot_41,
             plot_42, ncol=3)

2.4 Pictorial representation of Singapore’s OASR

In a bid to escape the repeating mundanity of ggplot, the magick package was used for pictorial representation of data. First, the OASR of Singapore was calculated. Two boolean columns were created to map the data to either 20-64 or >64. The colSums function was then applied on the population value multiplied by each of the boolean variables and the OASR was obtained from the results. The number of men in the visualization below were generated programmatically using a for-loop based on the OASR value, including the decimal value.

pop_data$is_above_64 <- ifelse(pop_data$AG %in% c('65_to_69',
                                                  '70_to_74',
                                                  '75_to_79',
                                                  '80_to_84',
                                                  '85_to_89',
                                                  '90_and_over'), 1, 0)
pop_data$is_20_to_64 <- ifelse(pop_data$AG %in% c('20_to_24',
                                                  '25_to_29',
                                                  '30_to_34',
                                                  '35_to_39',
                                                  '40_to_44',
                                                  '45_to_49',
                                                  '50_to_54',
                                                  '55_to_59',
                                                  '60_to_64'), 1, 0)


singapore_oasr <- colSums(pop_data[, 6] * pop_data[, 10]) / colSums(pop_data[, 6] * pop_data[, 9])

background <- image_read("./img/background.jpg")
old_man_png <- image_read("./img/old_man.png")
output<- image_composite(background, old_man_png)
man_png <- image_read("./img/man.png")

if (floor(singapore_oasr) > 1) {
  for (i in seq(0, floor(singapore_oasr))) {
    output <- image_composite(output, man_png, offset = paste("++",(i-1) * 35))
  }
  output <- image_annotate(output, paste("Singapore has an
  Old Age Support Ratio 
  of", round(singapore_oasr, 2)), size=40, gravity="north", style="italic", location="+0+80")
}

last_man_png <- image_crop(man_png, paste((singapore_oasr-floor(singapore_oasr))*150, "x512+290"))
output <- image_composite(output, last_man_png, offset = paste("+", 290 + (floor(singapore_oasr) + 1) * 30))
output

Icons made by Freepik from www.flaticon.com

2.5 OASR in Singapore by Planning Area

The OASR for each planning area was calculated to 2 decimal places and plotted on a bar chart. A viridis colour scale was used to make the graph more colour-blind friendly. As mentioned at the beginning of the report, neighbouring countries had an average OASR of 11 so a horizontal line was added at that value for easier comparison to the average.

pop_data$old = pop_data$Pop * pop_data$is_above_64
pop_data$active = pop_data$Pop * pop_data$is_20_to_64
pop_data_grouped2 <- pop_data %>%
                    group_by(PA) %>%
                    summarise(OASR=sum(active)/sum(old)) %>%
                    na.omit()

pop_data_grouped2$OASR <- round(pop_data_grouped2$OASR, 2)


ggplot(data=pop_data_grouped2, aes(x=reorder(PA, OASR), y=OASR, fill=OASR)) + 
  geom_col() +
  geom_text(aes(label=OASR), vjust=-0.5, size=3) +
  labs(title="Old Age Support Ratio in Singapore by Planning Area", x="Planning Area", y="Old Age Support Ratio") +
  scale_fill_continuous(type = "viridis") +
  scale_y_continuous(breaks=seq(0,55,5)) +
  theme_classic() +
  geom_hline(yintercept = 11, linetype="longdash", color="orange", size=1) +
  theme(plot.title = element_text(size=16, hjust=0.5),
        axis.title.x = element_text(size=12),
        axis.text.x = element_text(angle=45, hjust=1),
        axis.title.y = element_text(size=12))

  1. Conclusion

Finding #1
Overall as a country, Singapore’s OASR is far below the average in the Asia/Pacific region, which is a strong indicator of an aging population. Examination of the population pyramid showed a narrow bottom and a bulge near the top, which also supports the aging population suggestion. In the next few years, that bulge and narrow bottom will move upwards and the OASR will likely decrease even further.
Finding #2
With the exception of two planning areas, namely Museum and Western Water Catchment, all the other planning areas in Singapore are below the Asia/Pacific average OASR. This indicates that the Museum and Western Water Catchment areas has a higher proportion of economically-active residents.