Loading of R packages

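# Packages needed for this analysis (dplyr is also attached by tidyverse)
packages <- c('tidyverse', 'dplyr', 'ggpubr', 'knitr')

# Install any package that is missing, then attach it
for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}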

Load CSV and convert the columns to the proper data type for later analysis

sgpop <- read_csv("data/respopagesextod2011to2020.csv")

sgpop$PA <- factor(sgpop$PA)
sgpop$SZ <- factor(sgpop$SZ)
sgpop$AG <- factor(sgpop$AG)
sgpop$Sex <- factor(sgpop$Sex)
sgpop$TOD <- factor(sgpop$TOD)

1. Introduction

The dataset I will be using for this assignment is the Singapore Residents by Planning Area Subzone, Age Group, Sex and Type of Dwelling, June 2011-2019 data series.

1.1 Describe the major data and design challenges faced in accomplishing this assignment

Challenge 1: The large number of categories to be visualised in this dataset

Exploratory data analysis (EDA) shows that there are a total of 55 planning zones consisting of 335 sub zones, and 19 age groups (as shown below). These are too many categories to visualise properly. Therefore, I will reduce the number of categories by grouping the planning zones into 5 regions and reducing the age groups to 6 main age groups. I believe this will make it easier to draw insights from the dataset.

unique(sgpop$PA)
##  [1] Ang Mo Kio              Bedok                   Bishan                 
##  [4] Boon Lay                Bukit Batok             Bukit Merah            
##  [7] Bukit Panjang           Bukit Timah             Central Water Catchment
## [10] Changi                  Changi Bay              Choa Chu Kang          
## [13] Clementi                Downtown Core           Geylang                
## [16] Hougang                 Jurong East             Jurong West            
## [19] Kallang                 Lim Chu Kang            Mandai                 
## [22] Marina East             Marina South            Marine Parade          
## [25] Museum                  Newton                  North-Eastern Islands  
## [28] Novena                  Orchard                 Outram                 
## [31] Pasir Ris               Paya Lebar              Pioneer                
## [34] Punggol                 Queenstown              River Valley           
## [37] Rochor                  Seletar                 Sembawang              
## [40] Sengkang                Serangoon               Simpang                
## [43] Singapore River         Southern Islands        Straits View           
## [46] Sungei Kadut            Tampines                Tanglin                
## [49] Tengah                  Toa Payoh               Tuas                   
## [52] Western Islands         Western Water Catchment Woodlands              
## [55] Yishun                 
## 55 Levels: Ang Mo Kio Bedok Bishan Boon Lay Bukit Batok ... Yishun
head(unique(sgpop$SZ))
## [1] Ang Mo Kio Town Centre Cheng San              Chong Boon            
## [4] Kebun Bahru            Sembawang Hills        Shangri-La            
## 335 Levels: Admiralty Airport Road Alexandra Hill Alexandra North ... Yunnan
unique(sgpop$AG)
##  [1] 0_to_4      5_to_9      10_to_14    15_to_19    20_to_24    25_to_29   
##  [7] 30_to_34    35_to_39    40_to_44    45_to_49    50_to_54    55_to_59   
## [13] 60_to_64    65_to_69    70_to_74    75_to_79    80_to_84    85_to_89   
## [19] 90_and_over
## 19 Levels: 0_to_4 10_to_14 15_to_19 20_to_24 25_to_29 30_to_34 ... 90_and_over
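
The regrouping described above could be sketched as follows. This is only an illustration on a toy tibble, not the code used for the assignment. The age bands are assumed from the bands named in the conclusion (only five can be recovered from the text, so the sixth is left out), and `region_lookup` is a hypothetical, partial lookup table from planning area to region.

```r
library(dplyr)
library(tibble)

# Toy stand-in for sgpop; column names PA, AG and Pop follow the dataset
toy <- tibble(
  PA  = c("Ang Mo Kio", "Bedok", "Woodlands", "Jurong West", "Sengkang"),
  AG  = c("0_to_4", "20_to_24", "30_to_34", "60_to_64", "90_and_over"),
  Pop = c(120, 340, 560, 210, 15)
)

# Hypothetical, partial lookup from planning area to region
region_lookup <- tribble(
  ~PA,           ~Region,
  "Ang Mo Kio",  "North-East",
  "Bedok",       "East",
  "Woodlands",   "North",
  "Jurong West", "West",
  "Sengkang",    "North-East"
)

regrouped <- toy %>%
  mutate(
    # Lower bound of each age label, e.g. "20_to_24" -> 20, "90_and_over" -> 90
    age_lower = as.integer(sub("_.*$", "", AG)),
    # Assumed bands; the original grouping may differ
    AgeBand = case_when(
      age_lower <= 14 ~ "0-14",
      age_lower <= 24 ~ "15-24",
      age_lower <= 54 ~ "25-54",
      age_lower <= 64 ~ "55-64",
      TRUE            ~ "Over 65"
    )
  ) %>%
  left_join(region_lookup, by = "PA")

# Residents per region and age band
regrouped %>% count(Region, AgeBand, wt = Pop, name = "Pop")
```

Keying the bands on the lower bound of each label avoids listing all 19 levels by hand, and a lookup table keeps the region assignment in one place if a planning area needs to be reassigned.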

Challenge 2: Each planning zone has a different number of sub zones

Since each planning zone has a different number of sub zones, raw resident counts are not directly comparable across planning zones. Therefore, I will normalise the resident count so that each planning zone can be compared with the others on a like-for-like basis.

sgpop %>%
  group_by(PA) %>%
  summarise(Count = n_distinct(SZ)) %>%
  arrange(desc(Count))
## # A tibble: 55 x 2
##    PA            Count
##    <fct>         <int>
##  1 Bukit Merah      17
##  2 Queenstown       15
##  3 Downtown Core    13
##  4 Ang Mo Kio       12
##  5 Jurong East      12
##  6 Toa Payoh        12
##  7 Hougang          10
##  8 Rochor           10
##  9 Bukit Batok       9
## 10 Clementi          9
## # ... with 45 more rows
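
The text does not state which normalisation is used, so the sketch below shows two possible options on toy data: the average number of residents per sub zone, and each planning zone's share of all residents. The column names `PerSubzone` and `Share` are illustrative assumptions, not names from the original analysis.

```r
library(dplyr)
library(tibble)

# Toy data: planning zone A has 3 sub zones, B has 2
toy <- tibble(
  PA  = c("A", "A", "A", "B", "B"),
  SZ  = c("A1", "A2", "A3", "B1", "B2"),
  Pop = c(100, 200, 300, 150, 250)
)

pa_summary <- toy %>%
  group_by(PA) %>%
  summarise(
    Residents = sum(Pop),        # total residents in the planning zone
    Subzones  = n_distinct(SZ),  # number of sub zones it contains
    .groups   = "drop"
  ) %>%
  mutate(
    PerSubzone = Residents / Subzones,        # option 1: residents per sub zone
    Share      = Residents / sum(Residents)   # option 2: share of all residents
  )

pa_summary
```

In this toy example A has more residents in total than B, yet both average the same number of residents per sub zone, which is the kind of difference normalisation is meant to expose.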

Challenge 3: Population count of 0

In 2019, a total of 68,193 rows have a population of 0. This could mean either that nobody is staying in those areas (for example, on the outskirts of Singapore) or that the data is missing. To keep the dataset relevant, I have decided to exclude all rows with a population of 0.

sgpop %>%
  filter(Time == "2019", Pop == 0) %>%
  summarise(count = n())
## # A tibble: 1 x 1
##   count
##   <int>
## 1 68193
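
Excluding those rows could look like the sketch below. It runs on a toy tibble rather than on `sgpop`, and the object name `sgpop_2019` is an assumption made for illustration.

```r
library(dplyr)
library(tibble)

# Toy stand-in for sgpop with the same Time and Pop columns
toy <- tibble(
  PA   = c("Changi Bay", "Bedok", "Bedok", "Tuas"),
  Time = c(2019, 2019, 2018, 2019),
  Pop  = c(0, 340, 0, 0)
)

# Keep only 2019 rows with at least one resident
sgpop_2019 <- toy %>%
  filter(Time == 2019, Pop > 0)

nrow(sgpop_2019)
```

Filtering once and storing the result means every later chart works from the same reduced dataset.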

1.2 Proposed sketched design to overcome the challenges

3. The final data visualization and a short description of not more than 350 words. The description must provide at least two useful information revealed by the data visualization.

In conclusion, the top 3 most populated regions in Singapore are Central, West and North-East. The top 3 age groups are 25-54, 0-14 and 55-64, with the over 65 group very close behind. The top 3 types of dwelling are HDB 4-Room, HDB 5-Room, and Condo & Other Apartments. There were also more females than males in Singapore in 2019 (refer to 2.4.6).

Although there are more females than males overall, I found that a few age groups in certain regions of Singapore have slightly more males than females (refer to the graphs under 2.4.8).

For the 0-14 age group, this occurs in the North, East, North-East and Central regions. For the 15-24 age group, it occurs in the North. Finally, for the 55-64 age group, it occurs in the North and West.

From these insights, there are signs that, as of 2019, young males outnumber young females in the North, East, North-East and Central regions. In addition, the older age group (55-64) is very close in size to the young age group (0-14), which suggests that Singapore will soon have more aging residents (55-64 and over 65) than young residents (0-14).

This aging population was discussed in an article published on 17 August 2019, which argued that there is a need to create a new Ministry to look into ageism, as Singapore’s workforce will age rapidly.