What Are the Demographics of the Population in Singapore by Planning Area?

IS428 - Visual Analytics for Business Intelligence – Assignment 4

Author: Peng You Xuan

Date: 21 Mar 2021

1. Overview

This makeover visualizes the demographics of Singapore's resident population using a dataset from Singstat that breaks the population down by planning area, subzone, age group, sex and type of dwelling (https://www.singstat.gov.sg/find-data/search-by-theme/population/geographic-distribution/latest-data). The dataset covers 2011 to 2020. This visualization is limited to 2019, which leaves 98,192 rows to work with.

1.1 Sketch of Proposed Data Visualization

2. Major Data and Design Challenges

The first challenge was preparing the dataset, which was new to me. Instead of using the data exactly as downloaded, I had to examine and adjust it (filtering by year, renaming columns, fixing the age-group order and standardising planning-area names) so that the final visualization gives readers useful insights.

Another challenge was learning the geofacet library, which we had not covered in class. geofacet includes two Singapore grids, sg_planning_area_grid1 and sg_planning_area_grid2, so I inspected both before choosing one. Some planning areas in the grids are also named differently from the Singstat dataset, so facet_geo() initially dropped them. Fixing this required editing the grid data so that its planning-area names match the dataset. The problem was most noticeable with sg_planning_area_grid2, which, for example, spells Queenstown as "QUEENTOWN". I used the geofacet vignette as a guide: <https://cran.r-project.org/web/packages/geofacet/vignettes/geofacet.html>.
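Naming mismatches like these can be caught before plotting by comparing the two sets of names directly. The following is a minimal sketch: find_unmatched() is a hypothetical helper, and the short vectors only reuse spellings that appear in this document. They are not the full dataset or grid.

```r
# Hypothetical helper: list planning-area names in the data that have no
# match in a geofacet grid, after converting both sides to uppercase
find_unmatched <- function(data_names, grid_names) {
  data_names <- unique(toupper(data_names))
  setdiff(data_names, toupper(grid_names))
}

# Toy vectors reusing spellings seen in this document
data_pa <- c("Ang Mo Kio", "North-Eastern Islands", "Queenstown")
grid_pa <- c("ANG MO KIO", "NORTH EASTERN ISLANDS", "QUEENTOWN")

find_unmatched(data_pa, grid_pa)
# returns "NORTH-EASTERN ISLANDS" and "QUEENSTOWN"
```

With the real objects, the same call would compare the planning-area column of the dataset with the name column of the chosen grid.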

Lastly, the visualization had to show the demographics of every planning area. Singapore has many planning areas, so keeping the chart readable was difficult: in the first attempt, text such as the planning-area names was too small to read. To overcome this, I researched how to set the font sizes of the titles, axis labels, legend and facet labels through ggplot2's theme().

3. Data Visualization

3.1 Installation of packages required

# Install any missing packages, then load them
packages <- c("tidyverse", "ggplot2", "geofacet")

for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
## Loading required package: tidyverse
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.0.6     v dplyr   1.0.4
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## Loading required package: geofacet
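The loop above installs a package only when require() fails to load it. A similar pattern can be written with requireNamespace(), which checks whether a package is installed without attaching it. This is a sketch of an alternative, not the code used for this assignment, and ensure_packages() is a hypothetical name.

```r
# Hypothetical helper: install only the missing packages, then attach all of them
ensure_packages <- function(pkgs) {
  # requireNamespace() reports whether each package is installed
  # without attaching it to the search path
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!installed]
  if (length(missing) > 0) {
    install.packages(missing)
  }
  # attach every package so its functions can be called directly
  for (p in pkgs) library(p, character.only = TRUE)
  # return the names that had to be installed, without printing them
  invisible(missing)
}
```

With this helper, ensure_packages(c("tidyverse", "ggplot2", "geofacet")) would replace the loop.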

3.2 Data Preparation

  1. Read the CSV file downloaded from Singstat
pop_data <- read_csv("singapore-residents-by-planning-areasubzone-age-group-sex-and-type-of-dwelling-june-20112020/respopagesextod2011to2020.csv")
## 
## -- Column specification --------------------------------------------------------
## cols(
##   PA = col_character(),
##   SZ = col_character(),
##   AG = col_character(),
##   Sex = col_character(),
##   TOD = col_character(),
##   Pop = col_double(),
##   Time = col_double()
## )
pop_data
## # A tibble: 984,656 x 7
##    PA        SZ               AG    Sex    TOD                         Pop  Time
##    <chr>     <chr>            <chr> <chr>  <chr>                     <dbl> <dbl>
##  1 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males  HDB 1- and 2-Room Flats       0  2011
##  2 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males  HDB 3-Room Flats             10  2011
##  3 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males  HDB 4-Room Flats             30  2011
##  4 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males  HDB 5-Room and Executive~    50  2011
##  5 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males  HUDC Flats (excluding th~     0  2011
##  6 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males  Landed Properties             0  2011
##  7 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males  Condominiums and Other A~    40  2011
##  8 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males  Others                        0  2011
##  9 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Femal~ HDB 1- and 2-Room Flats       0  2011
## 10 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Femal~ HDB 3-Room Flats             10  2011
## # ... with 984,646 more rows
  2. Keep only the rows for 2019
pop_data <- pop_data %>% filter(Time == 2019)
pop_data
## # A tibble: 98,192 x 7
##    PA        SZ               AG    Sex    TOD                         Pop  Time
##    <chr>     <chr>            <chr> <chr>  <chr>                     <dbl> <dbl>
##  1 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males  HDB 1- and 2-Room Flats       0  2019
##  2 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males  HDB 3-Room Flats             10  2019
##  3 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males  HDB 4-Room Flats             10  2019
##  4 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males  HDB 5-Room and Executive~    20  2019
##  5 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males  HUDC Flats (excluding th~     0  2019
##  6 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males  Landed Properties             0  2019
##  7 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males  Condominiums and Other A~    50  2019
##  8 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males  Others                        0  2019
##  9 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Femal~ HDB 1- and 2-Room Flats       0  2019
## 10 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Femal~ HDB 3-Room Flats             10  2019
## # ... with 98,182 more rows
  3. Rename the columns so that readers can understand the data more easily
# positional rename: assumes the original column order PA, SZ, AG, Sex, TOD, Pop, Time
names(pop_data) <- c("PlanningArea", "Subzone", "AgeGroup", "Gender", "TypeofDwelling", "Population", "Year")
pop_data
## # A tibble: 98,192 x 7
##    PlanningArea Subzone     AgeGroup Gender TypeofDwelling      Population  Year
##    <chr>        <chr>       <chr>    <chr>  <chr>                    <dbl> <dbl>
##  1 Ang Mo Kio   Ang Mo Kio~ 0_to_4   Males  HDB 1- and 2-Room ~          0  2019
##  2 Ang Mo Kio   Ang Mo Kio~ 0_to_4   Males  HDB 3-Room Flats            10  2019
##  3 Ang Mo Kio   Ang Mo Kio~ 0_to_4   Males  HDB 4-Room Flats            10  2019
##  4 Ang Mo Kio   Ang Mo Kio~ 0_to_4   Males  HDB 5-Room and Exe~         20  2019
##  5 Ang Mo Kio   Ang Mo Kio~ 0_to_4   Males  HUDC Flats (exclud~          0  2019
##  6 Ang Mo Kio   Ang Mo Kio~ 0_to_4   Males  Landed Properties            0  2019
##  7 Ang Mo Kio   Ang Mo Kio~ 0_to_4   Males  Condominiums and O~         50  2019
##  8 Ang Mo Kio   Ang Mo Kio~ 0_to_4   Males  Others                       0  2019
##  9 Ang Mo Kio   Ang Mo Kio~ 0_to_4   Femal~ HDB 1- and 2-Room ~          0  2019
## 10 Ang Mo Kio   Ang Mo Kio~ 0_to_4   Femal~ HDB 3-Room Flats            10  2019
## # ... with 98,182 more rows
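The rename above assumes the columns always arrive in the same order. A sketch of renaming by the original Singstat column names instead, so that a reordered file would still be labelled correctly, is shown below. The lookup vector and rename_by_lookup() are hypothetical; dplyr::rename() is the tidyverse equivalent.

```r
# Hypothetical lookup: original Singstat column code -> readable name
lookup <- c(PA = "PlanningArea", SZ = "Subzone", AG = "AgeGroup", Sex = "Gender",
            TOD = "TypeofDwelling", Pop = "Population", Time = "Year")

# Rename only the columns found in the lookup, matching by name, not position
rename_by_lookup <- function(df, lookup) {
  hit <- names(df) %in% names(lookup)
  names(df)[hit] <- unname(lookup[names(df)[hit]])
  df
}

# Toy data frame with the columns in a different order
toy <- data.frame(Time = 2019, PA = "Bedok", Pop = 10)
names(rename_by_lookup(toy, lookup))
# "Year" "PlanningArea" "Population"
```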
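  4. AgeGroup is a character column, so it sorts alphabetically and "5_to_9" ends up after "45_to_49" instead of directly after "0_to_4". Renaming it to "05_to_9" in a new column, AgeGroup_2, moves it to the start of the axis next to "0_to_4"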
pop_data <- pop_data %>% mutate(AgeGroup_2 = case_when(AgeGroup == "5_to_9" ~ "05_to_9", 
                                                       TRUE ~ as.character(AgeGroup)))
pop_data
## # A tibble: 98,192 x 8
##    PlanningArea Subzone     AgeGroup Gender TypeofDwelling      Population  Year
##    <chr>        <chr>       <chr>    <chr>  <chr>                    <dbl> <dbl>
##  1 Ang Mo Kio   Ang Mo Kio~ 0_to_4   Males  HDB 1- and 2-Room ~          0  2019
##  2 Ang Mo Kio   Ang Mo Kio~ 0_to_4   Males  HDB 3-Room Flats            10  2019
##  3 Ang Mo Kio   Ang Mo Kio~ 0_to_4   Males  HDB 4-Room Flats            10  2019
##  4 Ang Mo Kio   Ang Mo Kio~ 0_to_4   Males  HDB 5-Room and Exe~         20  2019
##  5 Ang Mo Kio   Ang Mo Kio~ 0_to_4   Males  HUDC Flats (exclud~          0  2019
##  6 Ang Mo Kio   Ang Mo Kio~ 0_to_4   Males  Landed Properties            0  2019
##  7 Ang Mo Kio   Ang Mo Kio~ 0_to_4   Males  Condominiums and O~         50  2019
##  8 Ang Mo Kio   Ang Mo Kio~ 0_to_4   Males  Others                       0  2019
##  9 Ang Mo Kio   Ang Mo Kio~ 0_to_4   Femal~ HDB 1- and 2-Room ~          0  2019
## 10 Ang Mo Kio   Ang Mo Kio~ 0_to_4   Femal~ HDB 3-Room Flats            10  2019
## # ... with 98,182 more rows, and 1 more variable: AgeGroup_2 <chr>
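Renaming one label works for this dataset, but it relies on how strings are sorted, and that can vary with the locale. A locale-independent alternative is to build a factor whose levels are ordered numerically. The sketch below assumes every label starts with its lower age bound followed by an underscore (for example "0_to_4", or an open-ended label such as "90_and_over"). order_age_groups() is a hypothetical helper.

```r
# Hypothetical helper: order age-group labels by their numeric lower bound
order_age_groups <- function(age_groups) {
  labels <- unique(age_groups)
  # the text before the first underscore is taken as the lower bound
  lower <- as.integer(sub("_.*$", "", labels))
  factor(age_groups, levels = labels[order(lower)])
}

ages <- c("10_to_14", "0_to_4", "90_and_over", "5_to_9", "45_to_49")
levels(order_age_groups(ages))
# levels in age order: 0_to_4, 5_to_9, 10_to_14, 45_to_49, 90_and_over
```

ggplot2 orders a discrete axis by factor levels, so mapping this factor instead of AgeGroup_2 would keep the groups in age order without renaming any labels.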
  5. Inspect the Singapore planning-area grids that come with geofacet to see what facet_geo() needs: each grid gives every planning area a row and col position, a code and a name
sg_planning_area_grid1
##    row col code                    name
## 1    1   7   WD               Woodlands
## 2    1   9   SE                 Simpang
## 3    1   8   SB               Sembawang
## 4    1  10   SL                 Seletar
## 5    2   8   YS                  Yishun
## 6    2   9   SE                Sengkang
## 7    2  10   PG                 Punggol
## 8    2   7   MD                  Mandai
## 9    3   7   TP               Toa Payoh
## 10   3   5   SK            Sungei Kadut
## 11   3   9   SG               Serangoon
## 12   3  11   PL              Paya Lebar
## 13   3  12   PR               Pasir Ris
## 14   3  14   NI   North Eastern Islands
## 15   3  10   HG                 Hougang
## 16   3   4   CK           Choa Chu Kang
## 17   3   6   BS                  Bishan
## 18   3   8   AM              Ang Mo Kio
## 19   4   4   TH                  Tengah
## 20   4   6   TL                 Tanglin
## 21   4  12   TM                Tampines
## 22   4   8   RC                  Rochor
## 23   4   7   NT                  Newton
## 24   4   3   LK            Lim Chu Kang
## 25   4   9   KL                 Kallang
## 26   4  10   GL                 Geylang
## 27   4  13   CH                  Changi
## 28   4   5   CC Central Water Catchment
## 29   4  11   BD                   Bedok
## 30   5  13   CB              Changi Bay
## 31   5   2   WC Western Water Catchment
## 32   5   7   OR                 Orchard
## 33   5   6   NV                  Novena
## 34   5   8   MU                  Museum
## 35   5  10   MP           Marine Parade
## 36   5   9   ME             Marina East
## 37   5   3   JW             Jurong West
## 38   5   5   BP           Bukit Panjang
## 39   5   4   BK             Bukit Batok
## 40   6   1   TS                    Tuas
## 41   6   6   RV            River Valley
## 42   6   2   PN                 Pioneer
## 43   6   8   MS            Marina South
## 44   6   4   JE             Jurong East
## 45   6   7   DT           Downtown Core
## 46   6   5   BT             Bukit Timah
## 47   6   3   BL                Boon Lay
## 48   7   7   SV            Straits View
## 49   7   6   SR         Singapore River
## 50   7   5   QT              Queenstown
## 51   7   4   CL                Clementi
## 52   8   7   OT                  Outram
## 53   8   6   BM             Bukit Merah
## 54   9   6   SI        Southern Islands
## 55   9   4   WI         Western Islands
sg_planning_area_grid2
##    col row              code              name         name_indo
## 1    6   6        ANG MO KIO        ANG MO KIO        ANG MO KIO
## 2    9  10             BEDOK             BEDOK             BEDOK
## 3    6   9            BISHAN            BISHAN            BISHAN
## 4    3   8       BUKIT BATOK       BUKIT BATOK       BUKIT BATOK
## 5    5  15       BUKIT MERAH       BUKIT MERAH       BUKIT MERAH
## 6    3   7     BUKIT PANJANG     BUKIT PANJANG     BUKIT PANJANG
## 7    4   9       BUKIT TIMAH       BUKIT TIMAH       BUKIT TIMAH
## 8    5  12           CENTRAL           CENTRAL           CENTRAL
## 9    2   4     CHOA CHU KANG     CHOA CHU KANG     CHOA CHU KANG
## 10   3  12          CLEMENTI          CLEMENTI          CLEMENTI
## 11   8  10           GEYLANG           GEYLANG           GEYLANG
## 12   8   6           HOUGANG           HOUGANG           HOUGANG
## 13   2   9       JURONG EAST       JURONG EAST       JURONG EAST
## 14   1   9       JURONG WEST       JURONG WEST       JURONG WEST
## 15   7  11 KALLANG / WHAMPOA KALLANG / WHAMPOA KALLANG / WHAMPOA
## 16   8  13     MARINE PARADE     MARINE PARADE     MARINE PARADE
## 17  11   6         PASIR RIS         PASIR RIS         PASIR RIS
## 18   7   4           PUNGGOL           PUNGGOL           PUNGGOL
## 19   4  13         QUEENTOWN         QUEENTOWN         QUEENTOWN
## 20   4   1         SEMBAWANG         SEMBAWANG         SEMBAWANG
## 21   7   5          SENGKANG          SENGKANG          SENGKANG
## 22   8   7         SERANGOON         SERANGOON         SERANGOON
## 23  10   8          TAMPINES          TAMPINES          TAMPINES
## 24   7  10         TOA PAYOH         TOA PAYOH         TOA PAYOH
## 25   3   2         WOODLANDS         WOODLANDS         WOODLANDS
## 26   6   3            YISHUN            YISHUN            YISHUN
  6. sg_planning_area_grid1 covers 55 planning areas, compared with 26 in sg_planning_area_grid2, so it represents Singapore more completely and is used for this visualization. For consistency with the dataset, its names are converted to uppercase.
library(geofacet)

sg_planning_area_grid1 <- sg_planning_area_grid1 %>% mutate(name = toupper(name))
  7. For the same reason, convert the PlanningArea values in the dataset to uppercase using toupper()
# convert planning area name to upper case 
pop_data <- pop_data %>% mutate(PlanningArea = toupper(PlanningArea))
pop_data
## # A tibble: 98,192 x 8
##    PlanningArea Subzone     AgeGroup Gender TypeofDwelling      Population  Year
##    <chr>        <chr>       <chr>    <chr>  <chr>                    <dbl> <dbl>
##  1 ANG MO KIO   Ang Mo Kio~ 0_to_4   Males  HDB 1- and 2-Room ~          0  2019
##  2 ANG MO KIO   Ang Mo Kio~ 0_to_4   Males  HDB 3-Room Flats            10  2019
##  3 ANG MO KIO   Ang Mo Kio~ 0_to_4   Males  HDB 4-Room Flats            10  2019
##  4 ANG MO KIO   Ang Mo Kio~ 0_to_4   Males  HDB 5-Room and Exe~         20  2019
##  5 ANG MO KIO   Ang Mo Kio~ 0_to_4   Males  HUDC Flats (exclud~          0  2019
##  6 ANG MO KIO   Ang Mo Kio~ 0_to_4   Males  Landed Properties            0  2019
##  7 ANG MO KIO   Ang Mo Kio~ 0_to_4   Males  Condominiums and O~         50  2019
##  8 ANG MO KIO   Ang Mo Kio~ 0_to_4   Males  Others                       0  2019
##  9 ANG MO KIO   Ang Mo Kio~ 0_to_4   Femal~ HDB 1- and 2-Room ~          0  2019
## 10 ANG MO KIO   Ang Mo Kio~ 0_to_4   Femal~ HDB 3-Room Flats            10  2019
## # ... with 98,182 more rows, and 1 more variable: AgeGroup_2 <chr>

3.3 Plotting Visualization

  1. To preview what each facet of the grid map will show, first plot a single chart with all planning areas combined.
ggplot(pop_data, aes(x = AgeGroup_2, y = Population, fill = Gender)) +
  geom_col(width = 1) +   # geom_col() already uses stat = "identity"
  coord_flip()
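geom_col() draws each row of pop_data as its own stacked segment, so this preview is built from 98,192 rows (separate rows for each subzone and dwelling type). Summing first gives one value per planning area, age group and gender with the same bar lengths. The following is a minimal base-R sketch on invented toy rows. The tidyverse equivalent would be group_by() followed by summarise().

```r
# Toy rows shaped like pop_data (values invented for illustration)
toy <- data.frame(
  PlanningArea = c("ANG MO KIO", "ANG MO KIO", "ANG MO KIO", "BEDOK"),
  AgeGroup_2   = c("0_to_4", "0_to_4", "05_to_9", "0_to_4"),
  Gender       = c("Males", "Males", "Females", "Males"),
  Population   = c(10, 20, 30, 40)
)

# One row per planning area, age group and gender, with population summed
summary_tbl <- aggregate(Population ~ PlanningArea + AgeGroup_2 + Gender,
                         data = toy, FUN = sum)
summary_tbl
# 3 rows; ANG MO KIO / 0_to_4 / Males sums to 30
```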

4. Final Visualization

Plot one chart per planning area using facet_geo() from the geofacet library, which places each chart at the approximate position of its planning area on the map of Singapore.

library(geofacet)

ggplot(pop_data, aes(x = AgeGroup_2, y = Population, fill = Gender)) +
  # labels follow the aesthetics: x (age group) is drawn vertically after coord_flip()
  labs(title = "Demographics of Population by Planning Area",
       caption = "Source: Singstat", x = "Age Group", y = "Population") +
  geom_col() +
  theme(axis.title.x = element_text(size = 50),
        axis.title.y = element_text(size = 50),
        title = element_text(size = 80),
        axis.text.x = element_text(size = 10),
        legend.text = element_text(size = 50),
        strip.text.x = element_text(size = 30, face = "bold")) +
  coord_flip() +
  # pass the uppercased grid data frame itself rather than its name as a string
  facet_geo(~ PlanningArea, grid = sg_planning_area_grid1, label = "name")
## Some values in the specified facet_geo column 'PlanningArea' do not
##   match the 'name' column of the specified grid and will be removed:
##   NORTH-EASTERN ISLANDS
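The warning shows that North-Eastern Islands is dropped. The grid spells the name "North Eastern Islands", which becomes "NORTH EASTERN ISLANDS" after toupper(), while the dataset uses "NORTH-EASTERN ISLANDS". The following is a minimal sketch of aligning the grid, assuming the hyphen is the only difference. It edits a two-row toy copy built from the printed sg_planning_area_grid1; in practice, the same assignment would be applied to sg_planning_area_grid1 before facet_geo() is called.

```r
# Toy copy of two rows of sg_planning_area_grid1 (values from the printed grid)
grid <- data.frame(
  row  = c(3, 3),
  col  = c(8, 14),
  code = c("AM", "NI"),
  name = c("Ang Mo Kio", "North Eastern Islands"),
  stringsAsFactors = FALSE
)

# Uppercase to match the planning-area names in pop_data
grid$name <- toupper(grid$name)

# Assumption: the hyphen is the only difference from the Singstat spelling
grid$name[grid$name == "NORTH EASTERN ISLANDS"] <- "NORTH-EASTERN ISLANDS"

# The name reported in the warning is now present in the grid
setdiff("NORTH-EASTERN ISLANDS", grid$name)   # character(0)
```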

5. Findings from the visualization

  1. Most of Singapore's population lives in planning areas in the upper half of the Singapore map layout, where the longer bar charts are concentrated.
  2. Sengkang and Punggol appear to have a large proportion of young residents. Their bars are longest towards the bottom of each chart, and coord_flip() draws the first age groups (the youngest, starting from 0_to_4) at the bottom.
  3. In most planning areas, the population is roughly bell-shaped across age groups, which suggests that most residents are adults rather than children or elderly people.
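Readings like these are based on bar lengths. To check them numerically, one option is to compute the share of each planning area's residents that falls in selected age groups. In the sketch below, the area names and numbers are invented, and the choice of which age groups count as "young" is an assumption.

```r
# Toy data with invented numbers and hypothetical area names (not Singstat figures)
toy <- data.frame(
  PlanningArea = rep(c("AREA A", "AREA B"), each = 3),
  AgeGroup_2   = rep(c("0_to_4", "05_to_9", "70_to_74"), times = 2),
  Population   = c(50, 30, 20, 10, 10, 80)
)
young <- c("0_to_4", "05_to_9")   # age groups treated as "young" here

# Total residents and young residents per planning area
total       <- tapply(toy$Population, toy$PlanningArea, sum)
young_total <- tapply(toy$Population * (toy$AgeGroup_2 %in% young), toy$PlanningArea, sum)

share <- round(young_total / total, 2)
share
# AREA A: 0.8, AREA B: 0.2
```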