knitr::opts_chunk$set(echo = TRUE)

This visualization reveals the demographic structure of Singapore's population by age cohort and by planning area in 2019.

1. Install and Load Packages

Introduction of packages

tidyverse: a coherent system of packages used for data manipulation, exploration and visualization.

shiny: used to build interactive web apps straight from R.

ggthemes: an extension of ‘ggplot2’ which provides extra themes, geoms, and scales.

packages <- c('tidyverse', 'shiny', 'ggthemes')

# Install any package that is missing, then attach it
for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
## Loading required package: tidyverse
## -- Attaching packages ---------------------------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.0     v purrr   0.3.3
## v tibble  2.1.3     v dplyr   0.8.4
## v tidyr   1.0.2     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.5.0
## -- Conflicts ------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## Loading required package: shiny
## Loading required package: ggthemes

2. Read file

file <- read.csv('C://population2011to2019.csv', header = TRUE, sep = ',')
dim(file)
## [1] 883728      7

3. Extract certain rows and columns to a new table

The original data has 883,728 rows and 7 columns, but only the 2019 data is used. To shorten the running time, I extract the 2019 rows for each sex separately and combine them into one data frame. To build the population pyramid, the female population is converted to negative values so that males and females can be plotted on opposite sides of a single axis.

male = filter(file, Sex == 'Males', Time == 2019)
female = filter(file, Sex == 'Females', Time == 2019)
female$Population = -1*female$Population
population = rbind(male, female)
head(population)
##   Planning.Area                Subzone Age.Group   Sex
## 1    Ang Mo Kio Ang Mo Kio Town Centre    0_to_4 Males
## 2    Ang Mo Kio Ang Mo Kio Town Centre    0_to_4 Males
## 3    Ang Mo Kio Ang Mo Kio Town Centre    0_to_4 Males
## 4    Ang Mo Kio Ang Mo Kio Town Centre    0_to_4 Males
## 5    Ang Mo Kio Ang Mo Kio Town Centre    0_to_4 Males
## 6    Ang Mo Kio Ang Mo Kio Town Centre    0_to_4 Males
##                          Type.Of.Dwelling Population Time
## 1                 HDB 1- and 2-Room Flats          0 2019
## 2                        HDB 3-Room Flats         10 2019
## 3                        HDB 4-Room Flats         10 2019
## 4          HDB 5-Room and Executive Flats         20 2019
## 5 HUDC Flats (excluding those privatised)          0 2019
## 6                       Landed Properties          0 2019
population$Age.Group <- factor(population$Age.Group,levels=c('0_to_4',
                                       '5_to_9',
                                       "10_to_14",
                                       "15_to_19",
                                       "20_to_24",
                                       "25_to_29",
                                       "30_to_34",
                                       "35_to_39",
                                       "40_to_44",
                                       "45_to_49",
                                       "50_to_54",
                                       "55_to_59",
                                       "60_to_64",
                                       "65_to_69",
                                       "70_to_74",
                                       "75_to_79",
                                       "80_to_84",
                                       "85_to_89",
                                       "90_and_over"))
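
After filtering, the table still holds one row per subzone and dwelling type, so each age group and sex spans many rows. The stacked bars handle this implicitly, but the totals can also be computed explicitly. A minimal sketch using base R's aggregate() on a made-up toy table (the toy values and the name pyramid_totals are illustrative assumptions, not part of the original analysis):

```r
# Toy stand-in for the filtered 2019 table (values are made up for illustration).
toy <- data.frame(
  Age.Group  = c("0_to_4", "0_to_4", "5_to_9", "0_to_4", "5_to_9"),
  Sex        = c("Males", "Males", "Males", "Females", "Females"),
  Population = c(10, 20, 15, -12, -18)
)

# Sum over subzones and dwelling types so each age group / sex pair has one row.
pyramid_totals <- aggregate(Population ~ Age.Group + Sex, data = toy, FUN = sum)
print(pyramid_totals)
```

Plotting the aggregated table gives the same bar lengths as stacking the raw rows, with far fewer rectangles to draw.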

4. Build an overall population pyramid

This chart shows the overall population pyramid in 2019, with gender differentiated by bar color. Although population is displayed horizontally, it is still the y-axis (the axes are flipped by coord_flip()), so scale_y_continuous() is used to set the range of the axis, the step size of the ticks, and the tick labels. Moreover, a title layer and axis labels can be added to the plot.

ggplot(population, aes(x = Age.Group, y = Population, fill = Sex)) +
  # One bar layer is enough: female values are already negative,
  # so they extend to the left of zero (geom_bar() has no 'subset' argument)
  geom_bar(stat = 'identity') +
  # Breaks are in persons; labels show magnitudes in thousands ("k")
  scale_y_continuous(breaks = seq(-200000, 200000, 50000),
                     labels = paste0(as.character(c(seq(200, 0, -50), seq(50, 200, 50))), "k")) +
  coord_flip() +
  # With coord_flip(), x still refers to Age.Group and y to Population
  labs(x = 'Age Group', y = 'Population') +
  scale_fill_brewer(palette = "Set1")

Shiny Application

In the Shiny app, a drop-down menu in the sidebar lets users choose the planning area they are interested in, and the population pyramid for that area is displayed in the main panel.

Create a select list input

Because selectInput() needs a set of values to choose from, I create a new data frame that stores the planning area names and population values.

choice <- data.frame(population$Planning.Area, population$Population)

Demo

library(shiny)

# Define the UI: a drop-down menu in the sidebar and the pyramid in the main panel
ui <- fluidPage(

    # Application title
    titlePanel("Singapore Population Pyramid"),

    # Sidebar with a drop-down menu of planning areas
    sidebarLayout(
        sidebarPanel(
            # One entry per planning area; as.character() guards against factor columns
            selectInput("choice", label = 'Planning Area',
                        choices = unique(as.character(population$Planning.Area)))
        ),

        mainPanel(
            plotOutput("barchart")
        )
    )
)
server<-function(input,output) {
  output$barchart <- renderPlot({
    filter(population, Planning.Area == input$choice) %>%
      ggplot(aes(x = Age.Group, y = Population, fill = Sex))+
      geom_bar(stat = 'identity', width = 0.6)+
      scale_y_continuous()+
      labs(title = 'Singapore Population Pyramid, 2019') +
      coord_flip()+
      theme_tufte() +
      theme(plot.title = element_text(hjust = 1),
            axis.ticks = element_blank())+
      scale_fill_brewer(palette = 'Set1')
  })
}
shinyApp(ui = ui, server = server)
Shiny applications not supported in static R Markdown documents

Challenges and Solutions

1. How to reverse the scale

In Tableau, a population pyramid is created by placing two charts side by side. In ggplot2, it is simpler to draw both sexes in one chart, so I convert the female population into negative values. However, the axis values on the female side of the pyramid are then also negative, which remains to be solved.
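
One possible fix for the negative labels, sketched on made-up toy data: pass abs as the labels function of scale_y_continuous(), so the bars keep their direction while the axis shows magnitudes. The toy table is an illustrative assumption; the plot is only built when ggplot2 is installed.

```r
# Toy totals: females stored as negative values, as in the pyramid data.
toy <- data.frame(
  Age.Group  = factor(c("0_to_4", "5_to_9", "0_to_4", "5_to_9"),
                      levels = c("0_to_4", "5_to_9")),
  Sex        = c("Males", "Males", "Females", "Females"),
  Population = c(30, 25, -28, -24)
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(x = Age.Group, y = Population, fill = Sex)) +
    geom_col() +
    scale_y_continuous(labels = abs) +  # show magnitudes on both sides of zero
    coord_flip()
  # print(p) draws the chart
}
```

Only the labels change; the underlying data stays negative for females, so the bars still point left.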

2. How to create a select menu in Shiny

First, when I set the inputId to the table used to build the plot, the main panel showed an error. So I created a new data frame that is only used to store the options for the drop-down menu. Second, after I created the select list input control, the planning area of every row was displayed in the menu, so I changed ‘choice’ to the unique planning area names. Third, I added a filter so that when users choose a planning area, the population pyramid shows only the demographic structure of that area.
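
The two data steps behind the menu, deduplicating the options and filtering by the selection, can be sketched without running an app. This is a base R illustration on made-up data; filter_areas is a hypothetical helper name, not a Shiny function.

```r
# Toy data; planning area names repeat across rows, as in the real table.
toy <- data.frame(
  Planning.Area = c("Bedok", "Bedok", "Punggol", "Sengkang"),
  Population    = c(10, -12, 20, -18)
)

# One menu entry per planning area (as.character() guards against factor columns).
area_choices <- sort(unique(as.character(toy$Planning.Area)))

# Hypothetical helper: keep the rows for the selected area(s).
# %in% also works when a multi-select input returns several names.
filter_areas <- function(df, areas) {
  df[df$Planning.Area %in% areas, , drop = FALSE]
}

filter_areas(toy, "Bedok")
```

In a server function, the selected value (for example input$choice) would take the place of "Bedok".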

3. How to set axis ticks

Because the population values in the pyramid are displayed horizontally, I first tried to modify scale_x_continuous(), but the axis ticks did not change. I then found that, although it is displayed horizontally, population is still on the y-axis, because the axes are flipped by coord_flip(). So I use scale_y_continuous() to set the range of the axis, the step size of the ticks, and the tick labels.
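
Rather than hard-coding the tick positions, they can be derived from the data. A small base R sketch, assuming a fixed step size; symmetric_breaks is a hypothetical helper and the input values are made up:

```r
# Hypothetical helper: symmetric breaks around zero that cover the largest bar.
symmetric_breaks <- function(values, step) {
  limit <- ceiling(max(abs(values)) / step) * step
  seq(-limit, limit, by = step)
}

# Example: the longest bars reach -180,000 (females) and 120,000 (males).
pyramid_breaks <- symmetric_breaks(c(-180000, 120000), step = 50000)

# Labels show magnitudes in thousands on both sides of zero.
pyramid_labels <- paste0(abs(pyramid_breaks) / 1000, "k")
```

The two vectors can then be passed to scale_y_continuous(breaks = pyramid_breaks, labels = pyramid_labels), which controls the horizontal axis once coord_flip() is applied.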

Insights

These observations disregard data anomalies.

1. Gender Distribution

Overall, there are fewer males than females in Singapore, especially among older people. Among young people and those of economically active age, females only slightly outnumber males, and in certain age groups there are even more males than females.

2. Population Distribution

The distributions of Sengkang and Punggol differ from those of other areas. Because these two areas are new estates with many BTO (Build-To-Order) flats, their residents are mostly people of economically active age and their children, so the population of young residents under 14 in these areas is relatively high.

Comparing R and Tableau

1. Data Analysis

Tableau is suitable for basic data exploration and data visualization: users can generate the most common graphs and perform basic data processing, such as changing data types and joining tables. In R, users have to write their own scripts or find libraries to accomplish a task, but in return R is generally more flexible than Tableau: all data preprocessing and data analysis can be done in R, and complex plots can be created as well.

2. Dynamic updating

Because the analysis in R is a script, the visualization can be regenerated whenever the data changes or is updated: users rerun the same code without modifying it. Tableau does not provide this script-based workflow.
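
To illustrate, the preparation step can be wrapped in a function so that a new year or a refreshed file only changes the arguments, not the code. A minimal base R sketch on made-up data; pyramid_data is a hypothetical helper, and the column names Sex, Population and Time follow the table used above.

```r
# Hypothetical helper: build the pyramid table for any year in the data.
pyramid_data <- function(df, year) {
  one_year <- df[df$Time == year, , drop = FALSE]
  females <- one_year$Sex == "Females"
  # Negate the female counts so they plot to the left of zero.
  one_year$Population[females] <- -one_year$Population[females]
  one_year
}

# Toy data covering two years (values are made up for illustration).
toy <- data.frame(
  Sex        = c("Males", "Females", "Males", "Females"),
  Population = c(10, 12, 11, 13),
  Time       = c(2018, 2018, 2019, 2019)
)

pyramid_data(toy, 2019)
```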

3. R is free