1 Overview

Student Name: Zhao Tianyun

Student ID: 01368190

1.1 Understanding task

The raw data consists of Singapore Residents by Planning Area Subzone, Age Group, Sex and Type of Dwelling, from 2011 to 2020. This markdown file aims to reveal the demographic structure of the Singapore population by age cohort and by planning area in 2019.

1.2 Purpose of Visualisation

To show the population structure (age and sex) of Singapore, to visualise how the population is distributed across planning areas, and to uncover the age structure within each planning area.

1.3 Challenges

The major challenge is to tidy up and wrangle the data set properly. The data set also contains type-of-dwelling information and is broken down by planning area, subzone, age group, sex and type of dwelling. I will need to tidy up the data set and remove information that is not necessary for my purpose. Another challenge is that the age cohorts are too finely divided (19 age groups), which makes it difficult to see the bigger picture. Thus, I will group the age groups into three categories, namely the young, the economically active, and the aged, as this will help to reveal underlying patterns. After readying the data set for visualization, the remaining challenge is to design plots that properly serve my purpose. I propose to construct a population pyramid, a bar chart of total population by planning area, and bar charts of different age groups by planning area. You may refer to the picture below for my proposed design.

1.4 Sketch of Proposed DataViz Design

2 DataViz Step-by-Step

Now, we will go through the data wrangling and visualization steps.

Loading required packages

# Load each package, installing it first if it is missing
packages <- c('tidyverse', 'lemon')
for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
    library(p, character.only = TRUE)
  }
}
## Loading required package: tidyverse
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2     v purrr   0.3.4
## v tibble  3.0.3     v dplyr   1.0.2
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.5.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## Loading required package: lemon
## Warning: package 'lemon' was built under R version 4.0.3
## 
## Attaching package: 'lemon'
## The following object is masked from 'package:purrr':
## 
##     %||%
## The following objects are masked from 'package:ggplot2':
## 
##     CoordCartesian, element_render

2.1 Loading dataset

pop <- read_csv('data/respopagesextod2011to2020.csv')
## Parsed with column specification:
## cols(
##   PA = col_character(),
##   SZ = col_character(),
##   AG = col_character(),
##   Sex = col_character(),
##   TOD = col_character(),
##   Pop = col_double(),
##   Time = col_double()
## )

2.2 Understanding dataset

head(pop, 10)
## # A tibble: 10 x 7
##    PA        SZ               AG    Sex    TOD                         Pop  Time
##    <chr>     <chr>            <chr> <chr>  <chr>                     <dbl> <dbl>
##  1 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males  HDB 1- and 2-Room Flats       0  2011
##  2 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males  HDB 3-Room Flats             10  2011
##  3 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males  HDB 4-Room Flats             30  2011
##  4 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males  HDB 5-Room and Executive~    50  2011
##  5 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males  HUDC Flats (excluding th~     0  2011
##  6 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males  Landed Properties             0  2011
##  7 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males  Condominiums and Other A~    40  2011
##  8 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males  Others                        0  2011
##  9 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Femal~ HDB 1- and 2-Room Flats       0  2011
## 10 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Femal~ HDB 3-Room Flats             10  2011
summary(pop)
##       PA                 SZ                 AG                Sex           
##  Length:984656      Length:984656      Length:984656      Length:984656     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##      TOD                 Pop               Time     
##  Length:984656      Min.   :   0.00   Min.   :2011  
##  Class :character   1st Qu.:   0.00   1st Qu.:2013  
##  Mode  :character   Median :   0.00   Median :2016  
##                     Mean   :  39.86   Mean   :2016  
##                     3rd Qu.:  10.00   3rd Qu.:2018  
##                     Max.   :2860.00   Max.   :2020

2.3 Data cleaning and wrangling

2.3.1 Filter year 2019 from dataset

pop2019 <- pop %>% filter(`Time` == 2019)
pop2019
## # A tibble: 98,192 x 7
##    PA        SZ               AG    Sex    TOD                         Pop  Time
##    <chr>     <chr>            <chr> <chr>  <chr>                     <dbl> <dbl>
##  1 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males  HDB 1- and 2-Room Flats       0  2019
##  2 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males  HDB 3-Room Flats             10  2019
##  3 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males  HDB 4-Room Flats             10  2019
##  4 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males  HDB 5-Room and Executive~    20  2019
##  5 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males  HUDC Flats (excluding th~     0  2019
##  6 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males  Landed Properties             0  2019
##  7 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males  Condominiums and Other A~    50  2019
##  8 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males  Others                        0  2019
##  9 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Femal~ HDB 1- and 2-Room Flats       0  2019
## 10 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Femal~ HDB 3-Room Flats             10  2019
## # ... with 98,182 more rows
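The dataset stacks one snapshot per year. Summing `Pop` without first restricting `Time` therefore counts each resident once for every year in which they appear. The base-R sketch below uses a made-up two-year table to show the effect; the area names and counts are hypothetical.

```r
# Hypothetical toy data: the same residents recorded in two yearly snapshots
toy <- data.frame(
  PA   = c("Area A", "Area A", "Area B", "Area B"),
  Pop  = c(100, 110, 50, 55),
  Time = c(2018, 2019, 2018, 2019)
)

# Summing every row mixes the two years together
all_years_total <- sum(toy$Pop)

# Restricting to one year first gives the 2019 head count only
total_2019 <- sum(toy$Pop[toy$Time == 2019])

print(all_years_total)  # 315
print(total_2019)       # 165
```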

2.3.2 Transform the data for age analysis

popage <- pop2019 %>%
  group_by(AG, Sex) %>%
  summarise(sum = sum(Pop))
## `summarise()` regrouping output by 'AG' (override with `.groups` argument)
popage
## # A tibble: 38 x 3
## # Groups:   AG [19]
##    AG       Sex        sum
##    <chr>    <chr>    <dbl>
##  1 0_to_4   Females  90850
##  2 0_to_4   Males    94730
##  3 10_to_14 Females 102550
##  4 10_to_14 Males   105830
##  5 15_to_19 Females 108910
##  6 15_to_19 Males   113730
##  7 20_to_24 Females 122480
##  8 20_to_24 Males   127040
##  9 25_to_29 Females 145960
## 10 25_to_29 Males   142640
## # ... with 28 more rows
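Each age group and sex is further split by subzone and type of dwelling. `group_by(AG, Sex)` followed by `summarise()` collapses those extra dimensions into one total per combination. Base R's `aggregate()` performs the same collapse, as this sketch on hypothetical rows shows:

```r
# Hypothetical rows: one age group and sex split across dwelling types (TOD)
toy <- data.frame(
  AG  = c("0_to_4", "0_to_4", "0_to_4", "0_to_4"),
  Sex = c("Males", "Males", "Females", "Females"),
  TOD = c("HDB 3-Room Flats", "Landed Properties",
          "HDB 3-Room Flats", "Landed Properties"),
  Pop = c(10, 30, 20, 40)
)

# Sum Pop over every TOD within each AG x Sex combination
by_age_sex <- aggregate(Pop ~ AG + Sex, data = toy, FUN = sum)
print(by_age_sex)
```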

2.3.3 Transform the data for planning area analysis

poppa <- pop2019 %>%
  group_by(PA) %>%
  summarise(sum = sum(Pop))
## `summarise()` ungrouping output (override with `.groups` argument)
poppa
## # A tibble: 55 x 2
##    PA                         sum
##    <chr>                    <dbl>
##  1 Ang Mo Kio              164430
##  2 Bedok                   279970
##  3 Bishan                   88230
##  4 Boon Lay                     0
##  5 Bukit Batok             154140
##  6 Bukit Merah             152600
##  7 Bukit Panjang           139700
##  8 Bukit Timah              77720
##  9 Central Water Catchment      0
## 10 Changi                    1790
## # ... with 45 more rows

2.3.4 Transform the data into population by age group for each planning area

# Keep 2019 only (pop2019) and use %in% so each row is matched against the
# whole set of age labels; `==` would recycle the label vector instead
pop_aged <- pop2019 %>%
  filter(AG %in% c("65_to_69", "70_to_74", "75_to_79", "80_to_84", "85_to_89", "90_and_over"))
pop_aged <- pop_aged %>%
  group_by(PA) %>%
  summarise(Pop_sum = sum(Pop))
## `summarise()` ungrouping output (override with `.groups` argument)
pop_active <- pop2019 %>%
  filter(AG %in% c("25_to_29", "30_to_34", "35_to_39", "40_to_44", "45_to_49", "50_to_54", "55_to_59", "60_to_64"))
pop_active <- pop_active %>%
  group_by(PA) %>%
  summarise(Pop_sum = sum(Pop))
## `summarise()` ungrouping output (override with `.groups` argument)
pop_young <- pop2019 %>%
  filter(AG %in% c("0_to_4", "5_to_9", "10_to_14", "15_to_19", "20_to_24"))
pop_young <- pop_young %>%
  group_by(PA) %>%
  summarise(Pop_sum = sum(Pop))
## `summarise()` ungrouping output (override with `.groups` argument)
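Comparing a column with several labels via `==` recycles the label vector element by element. Each row is therefore checked against only one label, and R warns when the lengths do not divide evenly. `%in%` instead tests membership in the whole set. The sketch below uses a hypothetical vector of labels:

```r
# Hypothetical age-group labels, one per row
ag <- c("5_to_9", "0_to_4", "10_to_14", "0_to_4")
young <- c("0_to_4", "5_to_9")

# `==` recycles `young` to length 4, so row 1 is compared with "0_to_4",
# row 2 with "5_to_9", row 3 with "0_to_4" and row 4 with "5_to_9"
eq_match <- ag == young

# `%in%` asks whether each label appears anywhere in `young`
in_match <- ag %in% young

print(eq_match)  # FALSE FALSE FALSE FALSE
print(in_match)  # TRUE TRUE FALSE TRUE
```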

2.4 Data Visualization

2.4.1 Singapore population pyramid (1st visualization)

# Age levels in natural (not alphabetical) order
age_levels <- c("0_to_4", "5_to_9", "10_to_14", "15_to_19", "20_to_24", "25_to_29", "30_to_34", "35_to_39", "40_to_44", "45_to_49", "50_to_54", "55_to_59", "60_to_64", "65_to_69", "70_to_74", "75_to_79", "80_to_84", "85_to_89", "90_and_over")

v1 <- popage %>%
  mutate(AG = factor(AG, levels = age_levels)) %>%
  ggplot(aes(x = AG, y = ifelse(Sex == "Males", -sum, sum), fill = Sex)) +
  geom_bar(stat = "identity") +
  # Males are plotted as negative values; label the axis with absolute counts
  scale_y_continuous(breaks = seq(-200000, 200000, 50000),
                     labels = function(x) scales::comma(abs(x))) +
  ggtitle("Population Pyramid of Singapore") +
  # xlab() labels the AG aesthetic, which coord_flip() draws vertically
  xlab("Age") + ylab("Count") +
  coord_flip()

v1
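Plotting males as negative values places them to the left of the axis after `coord_flip()`. The axis then needs a labeller that drops the minus sign. The base-R sketch below uses the `0_to_4` totals from the `popage` output. `pyramid_labels()` is a hypothetical helper name; a function like it could be passed as `scale_y_continuous(labels = pyramid_labels)`.

```r
# 0_to_4 totals copied from the popage output
toy <- data.frame(
  AG  = c("0_to_4", "0_to_4"),
  Sex = c("Males", "Females"),
  sum = c(94730, 90850)
)

# Males go to the left of the axis (negative), females to the right
toy$signed <- ifelse(toy$Sex == "Males", -toy$sum, toy$sum)

# Axis labeller: magnitudes with thousands separators, no minus sign
pyramid_labels <- function(x) {
  format(abs(x), big.mark = ",", scientific = FALSE, trim = TRUE)
}

print(toy$signed)                            # -94730  90850
print(pyramid_labels(c(-100000, 0, 50000)))  # "100,000" "0" "50,000"
```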

2.4.2 Population based on planning area (2nd visualization)

v2 <- poppa %>% 
    ggplot(aes(x = reorder(PA, -sum), y = sum)) +
    geom_bar(stat = "identity") +
    scale_x_discrete(guide = guide_axis(n.dodge=8)) +
    ggtitle("Population by planning area") +
    xlab("Planning Area") + ylab("Population") +
    coord_cartesian(ylim = c(0,300000)) 
v2
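`reorder(PA, -sum)` turns `PA` into a factor whose levels are sorted by the negated totals, which makes the bars appear in descending order. The base-R sketch below uses four of the 2019 totals listed in `poppa`:

```r
# Four 2019 totals copied from the poppa output
areas <- data.frame(
  PA  = c("Ang Mo Kio", "Bedok", "Bishan", "Changi"),
  sum = c(164430, 279970, 88230, 1790)
)

# stats::reorder() sorts levels by the mean of -sum within each level,
# so the largest total comes first
ranked <- reorder(areas$PA, -areas$sum)
print(levels(ranked))  # "Bedok" "Ang Mo Kio" "Bishan" "Changi"
```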

2.4.3 Visualization of population by different age groups by planning area (3rd visualization)

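aged <- pop_aged %>% 
  ggplot(aes(x = reorder(PA, -Pop_sum), y = Pop_sum)) +
    geom_bar(stat = "identity") +
    scale_x_discrete(guide = guide_axis(n.dodge=8)) +
    ggtitle("Aged population (65 and above) by planning area") +
    xlab("Planning Area") + ylab("Population")

aged

active <- pop_active %>% 
  ggplot(aes(x = reorder(PA, -Pop_sum), y = Pop_sum)) +
    geom_bar(stat = "identity") +
    scale_x_discrete(guide = guide_axis(n.dodge=8)) +
    ggtitle("Economically active population (25 to 64) by planning area") +
    xlab("Planning Area") + ylab("Population")

active

young <- pop_young %>% 
  ggplot(aes(x = reorder(PA, -Pop_sum), y = Pop_sum)) +
    geom_bar(stat = "identity") +
    scale_x_discrete(guide = guide_axis(n.dodge=8)) +
    ggtitle("Young population (0 to 24) by planning area") +
    xlab("Planning Area") + ylab("Population")

young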

3 Description and findings

3.1 Description of visualization

The first visualization is an age-sex pyramid aggregated over all planning areas. It shows the relative size of each age cohort and the gender composition within it.
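The gender composition shown by the pyramid can also be expressed as a number: the sex ratio, conventionally stated as males per 1,000 females. The base-R sketch below uses the five age groups visible in the `popage` output:

```r
# 2019 totals for five age groups, copied from the popage output
counts <- data.frame(
  AG      = c("0_to_4", "10_to_14", "15_to_19", "20_to_24", "25_to_29"),
  Males   = c(94730, 105830, 113730, 127040, 142640),
  Females = c(90850, 102550, 108910, 122480, 145960)
)

# Males per 1,000 females; above 1,000 means males outnumber females
counts$sex_ratio <- round(1000 * counts$Males / counts$Females, 1)
print(counts[, c("AG", "sex_ratio")])
```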

The second visualization is a bar chart showing the total population of planning areas, sorted in descending order. We will be able to rank the planning areas by population.

The third visualization consists of three plots, one for each of three broad age segments: the young group (aged 0 to 24), the economically active group (aged 25 to 64) and the aged group (aged 65 and above). Together, these plots show how the age structure differs between planning areas.
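The three segments can also be derived from the age labels themselves instead of listing every label by hand. This minimal sketch assumes every label starts with its lower bound followed by an underscore, which holds for the labels shown in this dataset (for example `0_to_4` and `90_and_over`). `age_band()` is a hypothetical helper name.

```r
# Map the dataset's age-group labels to the three broad bands
age_band <- function(ag) {
  # Lower bound of each label, e.g. "25_to_29" -> 25, "90_and_over" -> 90
  lower <- as.numeric(sub("_.*$", "", ag))
  # Intervals are (-Inf, 24], (24, 64] and (64, Inf]
  cut(lower,
      breaks = c(-Inf, 24, 64, Inf),
      labels = c("Young (0-24)", "Economically active (25-64)", "Aged (65+)"))
}

print(age_band(c("0_to_4", "20_to_24", "25_to_29", "60_to_64", "65_to_69", "90_and_over")))
```

With dplyr, `mutate(band = age_band(AG))` followed by `group_by(PA, band)` would produce all three summaries in a single pass.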

3.2 Findings

  1. There are generally more males than females in the younger age groups, and more females than males in the older age groups. This suggests that more boys than girls may have been born in recent years, and that females generally have a longer life expectancy than their male counterparts.

  2. Among planning areas, Bedok, Jurong West, Tampines, Woodlands, Sengkang, Hougang, Yishun and Choa Chu Kang have the largest populations, indicating that these are major residential areas. On the other hand, planning areas such as Seletar, Paya Lebar, Pioneer, Tuas and North-Eastern Islands have little or no resident population. This indicates that these planning areas are not planned as residential areas but for other uses, such as industrial parks, military grounds or nature reserves.

  3. Of all the planning areas, Bedok, Bukit Merah and Hougang have large aged populations; these are likely to be older neighborhoods. Jurong West, Sengkang and Bedok have large economically active populations, which may indicate that these neighborhoods are popular among working adults or are convenient for commuting to commercial districts. Last but not least, Woodlands, Jurong West and Bedok are home to large young populations. This may suggest that these planning areas are considered ideal for raising children or have lower housing prices.
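Finding 3 compares absolute counts, so the most populous planning areas tend to rank high in every age group (Bedok appears in all three lists). Dividing each band by the area's total gives its share instead. This separates "many older residents" from "an older population". The sketch below uses hypothetical numbers that are not taken from the dataset:

```r
# Hypothetical counts for two planning areas (NOT from the dataset)
areas <- data.frame(
  PA     = c("Area X", "Area Y"),
  young  = c(60000, 15000),
  active = c(150000, 40000),
  aged   = c(50000, 25000)
)

# Share of each area's residents who are in the aged band
areas$total      <- areas$young + areas$active + areas$aged
areas$aged_share <- areas$aged / areas$total

# Area X has more aged residents, but Area Y has the older population
print(areas[, c("PA", "aged", "aged_share")])
```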