Week08

1.0 Overview

To reveal the demographic structure of Singapore population by Age Cohort and by Planning Area in 2019.

1.1 Major Challenges

The major chalenges in this data is that the data has both “Males” and “Females” data in the same column.
In order to plot the Age-Sex Pyramid, different treatment is required for each “Gender”.
There are planning areas where there are very little residents
To display all over 50 planning areas in a static chart.

1.2 Proposed Solution

Pivot the data to 2 separate columns and convert the “Males” to negative.
Pivot back to a single column where we will have all “Males” data as negetive so to form the left side of the pyramid.
Check the population for each “Planning Area” and remove those that are below 5000 in this study.
Display the plots as facets to allow comparision and remove or group the planning areas according to their population

1.3 Sketch

Sketch

2.0 Installing and Launching R Packages

This code installs and load the neccessary packages required for the analysis below.

packages <- c("tidyverse","reshape2")

for (p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

3.0 Importing and Preparing the Data Set

In this exercise, we will be using the data set of the population data of Singapore from Year 2000 to 2019. the data set can be downloaded here. We downloaded the 2 csv files in this exercise.

3.1 Importing the data

In the code chunk below, read_csv() of readr is used to import both the population csv files, respopagesextod2000to2010.csv and respopagesextod2011to2019.csv into R and parsed it into tibble R data frame format.

pop2000 <- read_csv("Data/respopagesextod2000to2010.csv")

## Parsed with column specification:
## cols(
##   PA = col_character(),
##   SZ = col_character(),
##   AG = col_character(),
##   Sex = col_character(),
##   TOD = col_character(),
##   Pop = col_double(),
##   Time = col_double()
## )

pop2011 <- read_csv("Data/respopagesextod2011to2019.csv")

## Parsed with column specification:
## cols(
##   PA = col_character(),
##   SZ = col_character(),
##   AG = col_character(),
##   Sex = col_character(),
##   TOD = col_character(),
##   Pop = col_double(),
##   Time = col_double()
## )

3.2 Concatenate the File

In the code chunk below, we concatenate the files to a master data frame.

popraw <- rbind(pop2000, pop2011)

3.3 Ordering of Age Groups

In the code chunk below, we determine the order of age group.

desired_order <- c("0_to_4",
                   "5_to_9",
                   "10_to_14",
                   "15_to_19",
                   "20_to_24",
                   "25_to_29",
                   "30_to_34",
                   "35_to_39",
                   "40_to_44",
                   "45_to_49",
                   "50_to_54",
                   "55_to_59",
                   "60_to_64",
                   "65_to_69",
                   "70_to_74",
                   "75_to_79",
                   "80_to_84",
                   "85_to_89",
                   "90_and_over")

popraw$AG <- factor(popraw$AG, levels=desired_order)

3.4 Data Extraction

In the code chunk below, we filter only data from 2019.

pop2019 <- popraw[popraw$Time == "2019",]

3.5 Summary of Data

In the code chunk below, we look at the total population of each planning area in 2019.

total2019 <- pop2019 %>%
  group_by(PA) %>%
  summarise(sum(Pop))

3.6 Removing Non-Residential Area

In the code chunk below, we remove planning areas that has population of less than 5000. We then further split them into below and above 10000.

residential2019 <- total2019[total2019$`sum(Pop)`> 5000, ]

PA2019 <- residential2019$PA

residential2019above <- residential2019[residential2019$`sum(Pop)` >= 100000, ]

PA2019above <- residential2019above$PA

residential2019below <- residential2019[residential2019$`sum(Pop)` < 100000, ]

PA2019below <- residential2019below$PA

3.7 Filtering Data

In the code chunk below, we remove the Non-Residential Areas for our data.

pop2019res <- pop2019 %>%
  filter(PA %in% PA2019)

3.8 Tidy Data for Age Sex Pyramid

In the code chunk below, we split the “Sex” column into 2 new columns, “Males” and “Females”. Convert the Males value to negative to plot Age-Sex Pyramid.

pop2019asp <- pivot_wider(pop2019res, names_from = Sex, values_from = Pop)

pop2019asp$Males <- -1 * pop2019asp$Males

pop2019asp <- pivot_longer(pop2019asp, cols = c("Males", "Females"), names_to = "Sex", values_to = "Pop")

4.0 Data Visualization

4.1 Overall Age Sex Pyramid

From the visualization below, the number of young (0 - 25) are decreasing. This show the characteristics of an aging population.

ggplot(pop2019asp, aes(x = AG, y = Pop, fill = Sex)) +
  geom_col() +
  labs(title = "Singapore Age Sex Pyramid 2019", x = "Age Group", y = "Population") +
  coord_flip() + 
  scale_fill_brewer(palette = "Set1") +  scale_y_continuous(breaks = seq(-150000, 150000, 50000), 
                     labels = paste0(as.character(c(seq(150, 0, -50), seq(50, 150, 50))), "k")) + 
  theme(plot.title = element_text(face = "bold", size = (15), colour="steelblue4", hjust = 0.5),
        legend.title = element_blank(),
        legend.text = element_text(face = "italic", colour="steelblue4"), 
        axis.title = element_text(size = (10), colour = "steelblue4"),
        axis.text = element_text(size = (10), colour = "steelblue4"),
        panel.background = element_blank(),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(),
        panel.border = element_rect(colour = "black", fill=NA, size=1))

4.2 Age Sex Pyramid by Planning Area

The visualizations below show the residential areas as compared to areas with similar population sizes. From the visualization by planning area, there are several areas with relatively smaller population. We will overcome this by splitting the planning areas using a cut-off of 10,000.

ggplot(pop2019asp, aes(x = AG, y = Pop, fill = Sex)) +
  geom_col() +
    labs(title = "Singapore Age Sex Pyramid 2019 by Planning Area", x = "Age Group", y = "Population") +
  coord_flip() + 
  scale_fill_brewer(palette = "Set1") + 
  theme(plot.title = element_text(face = "bold", size = (15), colour="steelblue4", hjust = 0.5),
        legend.title = element_blank(), 
        legend.text = element_text(face = "italic", colour="steelblue4"), 
        axis.title = element_text(size = (10), colour = "steelblue4"),
        axis.text = element_blank(),
        panel.background = element_blank(),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(),
        panel.border = element_rect(colour = "black", fill=NA, size=1)) +
  facet_wrap(PA ~ .)

4.3 Age Sex Pyramid by Planning Area and Population Size

The visualizations below shows the residential areas as compared to areas with similiar population sizes.

The visualizations below show the residential areas as compared to areas with similar population sizes. From the visualization, planning area like Ang Mo Kio and Bedok are experiencing aging population as the number of older people are higher that the active and young. Planning areas like Punggol and Sengkang possess the characteristic of new town. New town are areas which are newly developed. As a result, most of the residents are middle age, most probably they are newlywed. From this visualization, we can guess the number of years where the area was open for residency. For example, Chua Chu Kang, Pasir Ris and Tampines are developed at around the same time period as the mode of the pyramids are similar.

The visualization of the lower population residence planning area, we can see that areas like Newton, Novena, Outran, River Valley, Rochor and Tanglin are having very low population. This is due to their proximity with the central business district, as the land is used for mostly business purposes as a result, we see a lower population.

pop2019aspabove <- pop2019asp %>%
  filter(PA %in% PA2019above)

ggplot(pop2019aspabove, aes(x = AG, y = Pop, fill = Sex)) +
  geom_col() +
    labs(title = "Singapore Age Sex Pyramid 2019 by Planning Area \nPopulation >10000", x = "Age Group", y = "Population") +
  coord_flip() + 
  scale_fill_brewer(palette = "Set1") + 
  theme(plot.title = element_text(face = "bold", size = (15), colour="steelblue4", hjust = 0.5),
        legend.title = element_blank(), 
        legend.text = element_text(face = "italic", colour="steelblue4"), 
        axis.title = element_text(size = (10), colour = "steelblue4"),
        axis.text = element_blank(),
        panel.background = element_blank(),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(),
        panel.border = element_rect(colour = "black", fill=NA, size=1)) +
  facet_wrap(PA ~ .)

pop2019aspbelow <- pop2019asp %>%
  filter(PA %in% PA2019below)

ggplot(pop2019aspbelow, aes(x = AG, y = Pop, fill = Sex)) +
  geom_col() +
    labs(title = "Singapore Age Sex Pyramid 2019 by Planning Area\nPopulation < 10000", x = "Age Group", y = "Population") +
  coord_flip() + 
  scale_fill_brewer(palette = "Set1") + 
  theme(plot.title = element_text(face = "bold", size = (15), colour="steelblue4", hjust = 0.5),
        legend.title = element_blank(), 
        legend.text = element_text(face = "italic", colour="steelblue4"), 
        axis.title = element_text(size = (10), colour = "steelblue4"),
        axis.text = element_blank(),
        panel.background = element_blank(),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(),
        panel.border = element_rect(colour = "black", fill=NA, size=1)) +
  facet_wrap(PA ~ .)

5.0 Relection

R allows dynamic programming which adds flexibility in data manipulation. Tableau will be better for users with no programming background due to the drag and drop interface.
R allows users to add or remove items by coding. Tableau, on the other hand, requires experience in the program to figure out how to manipulate the display.
R markdown allows users to include the whole process in a document. Tableau can only include visualization and needs complementary software to include detailed write-ups.