knitr::opts_chunk$set(echo = TRUE)
This visualization shows the demographic structure of Singapore's population in 2019, broken down by age cohort and by planning area.

## 1. Install and Load Packages

#### Introduction of packages

- tidyverse: a coherent system of packages for data manipulation, exploration and visualization.
- shiny: builds interactive web apps straight from R.
- ggthemes: an extension of ‘ggplot2’ that provides extra themes, geoms, and scales.
packages <- c('tidyverse', 'shiny', 'ggthemes')
# Install each package if it is missing, then attach it
for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
## Loading required package: tidyverse
## -- Attaching packages ---------------------------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.0 v purrr 0.3.3
## v tibble 2.1.3 v dplyr 0.8.4
## v tidyr 1.0.2 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.5.0
## -- Conflicts ------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## Loading required package: shiny
## Loading required package: ggthemes
file <- read.csv('C://population2011to2019.csv',header = T, sep = ',')
dim(file)
## [1] 883728 7
The original data has 883,728 rows and 7 columns, but only the 2019 data is used. To shorten the running time, I extract the 2019 records for each sex separately and then combine them into one data frame. To build the population pyramid, the female population is converted to negative values so that males and females can be plotted on a single population axis.
male = filter(file, Sex == 'Males', Time == 2019)
female = filter(file, Sex == 'Females', Time == 2019)
female$Population = -1*female$Population
population = rbind(male, female)
head(population)
## Planning.Area Subzone Age.Group Sex
## 1 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males
## 2 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males
## 3 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males
## 4 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males
## 5 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males
## 6 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males
## Type.Of.Dwelling Population Time
## 1 HDB 1- and 2-Room Flats 0 2019
## 2 HDB 3-Room Flats 10 2019
## 3 HDB 4-Room Flats 10 2019
## 4 HDB 5-Room and Executive Flats 20 2019
## 5 HUDC Flats (excluding those privatised) 0 2019
## 6 Landed Properties 0 2019
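The split-and-negate step above can also be done in a single pass. The sketch below uses base R's ifelse() on a small hypothetical data frame, toy, whose values are made up and which only mimics the Sex, Population and Time columns. It assumes Sex takes only the values 'Males' and 'Females'; if the real file contained other values (such as a total), they would need to be filtered out first.

```r
# Hypothetical toy data (made-up values) mimicking the Sex, Population and Time columns
toy <- data.frame(
  Sex        = c("Males", "Females", "Males", "Females"),
  Population = c(10, 20, 30, 40),
  Time       = c(2019, 2019, 2018, 2018),
  stringsAsFactors = FALSE
)

# Keep only 2019, then negate female counts so their bars extend to the left of zero
toy_2019 <- toy[toy$Time == 2019, ]
toy_2019$Population <- ifelse(toy_2019$Sex == "Females",
                              -toy_2019$Population,
                              toy_2019$Population)
print(toy_2019)
```

With dplyr loaded, the same idea reads as filter(file, Time == 2019) %>% mutate(Population = ifelse(Sex == 'Females', -Population, Population)).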
population$Age.Group <- factor(population$Age.Group,levels=c('0_to_4',
'5_to_9',
"10_to_14",
"15_to_19",
"20_to_24",
"25_to_29",
"30_to_34",
"35_to_39",
"40_to_44",
"45_to_49",
"50_to_54",
"55_to_59",
"60_to_64",
"65_to_69",
"70_to_74",
"75_to_79",
"80_to_84",
"85_to_89",
"90_and_over"))
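The 19 factor levels above follow a regular pattern, so they can be generated rather than typed by hand, which avoids typos that would silently turn rows into NA. This is a sketch that assumes every label in the CSV follows exactly the "lower_to_upper" pattern shown above, with "90_and_over" as the last group; age_levels is a hypothetical name.

```r
# Lower bounds of the 5-year bands: 0, 5, ..., 85 (18 bands)
lower <- seq(0, 85, by = 5)
# "0_to_4", "5_to_9", ..., "85_to_89", followed by the open-ended top band
age_levels <- c(paste0(lower, "_to_", lower + 4), "90_and_over")
print(age_levels)
```

It would then be used as population$Age.Group <- factor(population$Age.Group, levels = age_levels).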
This chart shows the overall population pyramid for 2019, with sex distinguished by bar color. Although the population axis is drawn horizontally, it is actually the y axis (the axes are swapped by coord_flip()), so scale_y_continuous() is used to set the range and spacing of the ticks and the tick labels. A title and axis labels are also added to the plot.
ggplot(population, aes(x = Age.Group, y = Population, fill = Sex)) +
  # A single layer draws both sexes: negative female values extend to the left
  geom_bar(stat = 'identity') +
  # Ticks every 50,000 people; labels show magnitudes in thousands ("k")
  scale_y_continuous(breaks = seq(-200000, 200000, 50000),
                     labels = paste0(c(seq(200, 0, -50), seq(50, 200, 50)), "k")) +
  labs(title = 'Singapore Population Pyramid in 2019',
       x = 'Age Group', y = 'Population') +
  coord_flip() +
  scale_fill_brewer(palette = "Set1")
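The breaks and labels passed to scale_y_continuous() must have the same length and line up one to one. Since the breaks are counts of people, 200,000 is 200 thousand, so a "k" suffix fits. A minimal sketch that derives each label from its break, so the two vectors cannot drift apart (pyramid_breaks and pyramid_labels are hypothetical names):

```r
# Tick positions: -200,000 to 200,000 in steps of 50,000 (9 ticks)
pyramid_breaks <- seq(-200000, 200000, by = 50000)
# abs() drops the minus sign on the female side; dividing by 1000 gives thousands
pyramid_labels <- paste0(abs(pyramid_breaks) / 1000, "k")
print(pyramid_labels)
```

These would be passed as scale_y_continuous(breaks = pyramid_breaks, labels = pyramid_labels).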
In the Shiny app, a drop-down menu in the sidebar lets users choose the planning area they are interested in, and the population pyramid for that area is displayed in the main panel.
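selectInput() takes a vector of choices, so the app below passes the planning-area names directly with unique(population$Planning.Area). I also create a data frame, choice, that pairs each Planning Area name with its population value; the app itself does not use it.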
choice <- data.frame(population$Planning.Area, population$Population)
library(shiny)
# Define the UI: a planning-area drop-down in the sidebar, the pyramid in the main panel
ui <- fluidPage(
  # Application title
  titlePanel("Singapore Population Pyramid"),
  # Sidebar with a drop-down menu of planning areas (one area at a time)
  sidebarLayout(
    sidebarPanel(
      selectInput("choice", label = 'Planning Area',
                  choices = unique(population$Planning.Area))
    ),
    mainPanel(
      plotOutput("barchart")
    )
  )
)
# Redraw the pyramid whenever a different planning area is selected
server <- function(input, output) {
  output$barchart <- renderPlot({
    filter(population, Planning.Area == input$choice) %>%
      ggplot(aes(x = Age.Group, y = Population, fill = Sex)) +
      geom_bar(stat = 'identity', width = 0.6) +
      scale_y_continuous() +
      labs(title = 'Singapore Population Pyramid in 2019') +
      coord_flip() +
      theme_tufte() +
      theme(plot.title = element_text(hjust = 1),
            axis.ticks = element_blank()) +
      scale_fill_brewer(palette = 'Set1')
  })
}
shinyApp(ui = ui, server = server)
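The server's filter step can be pulled out into a plain function and checked without starting Shiny. The sketch below uses %in% rather than ==: both behave the same for a single selected area, but if selectInput() were switched to multiple = TRUE, == would compare element by element and recycle, while %in% keeps every selected area. select_areas is a hypothetical helper, and the population values in toy are made up.

```r
# Hypothetical helper mirroring the server's filter step
select_areas <- function(df, areas) {
  # %in% works for one selected area or several
  df[df$Planning.Area %in% areas, , drop = FALSE]
}

# Hypothetical toy data; the population values are made up
toy <- data.frame(
  Planning.Area = c("Ang Mo Kio", "Punggol", "Sengkang", "Punggol"),
  Population    = c(10, 20, -30, -40),
  stringsAsFactors = FALSE
)

one  <- select_areas(toy, "Punggol")
both <- select_areas(toy, c("Punggol", "Sengkang"))
print(nrow(one))   # 2
print(nrow(both))  # 3
```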
In Tableau, a population pyramid is created by placing two charts side by side. In R, instead of combining two separate charts, I convert the female population values to negative numbers so that both sexes are drawn as one bar chart on a shared axis. However, the female values then appear as negative numbers on the population axis, which remains to be fixed.
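One possible fix for the negative axis values: ggplot2's continuous scales accept a function as labels, which receives the break values and returns the text to display. The sketch below defines a hypothetical labeller, abs_comma, that shows magnitudes only with thousands separators; it works with any breaks, including the default breaks used in the Shiny app.

```r
# Hypothetical labeller: display magnitudes only, so female (negative) values lose the minus sign
abs_comma <- function(x) {
  format(abs(x), big.mark = ",", scientific = FALSE, trim = TRUE)
}
print(abs_comma(c(-150000, 0, 150000)))
```

It would be plugged in as scale_y_continuous(labels = abs_comma); the simpler scale_y_continuous(labels = abs) also removes the sign, just without the separators.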
Because the population values in the pyramid run horizontally, I first tried to modify scale_x_continuous(), but the axis ticks did not change. I then realized that, although this axis is drawn horizontally, it is still the y axis, because coord_flip() swaps the axes. I therefore use scale_y_continuous() to set the range and spacing of the ticks and the tick labels.
Setting aside data anomalies, the main observations are as follows.
Overall, there are fewer males than females in Singapore, especially among older people. Among young people and those of economically active age, women only slightly outnumber men, and in certain age groups men outnumber women.
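The comparison between the sexes can be made explicit with a sex ratio, the number of males per 100 females in each age group; values below 100 mean women outnumber men. The sketch below uses base R's xtabs() on a small hypothetical data frame with made-up counts in the same long format as population (female counts already negated), so its numbers are illustrative only and are not results from the census data.

```r
# Hypothetical toy data (made-up counts); female values are stored as negatives
toy <- data.frame(
  Age.Group  = c("20_to_24", "20_to_24", "80_to_84", "80_to_84"),
  Sex        = c("Males", "Females", "Males", "Females"),
  Population = c(110, -100, 40, -60),
  stringsAsFactors = FALSE
)
toy$Count <- abs(toy$Population)

# Cross-tabulate total counts: one row per age group, one column per sex
tab <- xtabs(Count ~ Age.Group + Sex, data = toy)
# Males per 100 females in each age group
males_per_100_females <- round(100 * tab[, "Males"] / tab[, "Females"], 1)
print(males_per_100_females)
```

On the real data, population would take the place of toy.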
The distributions for Sengkang and Punggol differ from those of other areas. These two areas are new estates with many BTO (Build-To-Order) flats, and most residents are of economically active age together with their children, so the share of young residents aged 14 and under is relatively high.
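The "relatively high share of young residents" can be quantified as the percentage of each planning area's residents in the 0_to_4, 5_to_9 and 10_to_14 groups. The sketch below computes that share with base R's split() and sapply() on a hypothetical data frame whose areas ("Area A", "Area B") and counts are made up; on the real data, population would be used instead.

```r
# Hypothetical toy data: made-up areas and counts; female values stored as negatives
toy <- data.frame(
  Planning.Area = c("Area A", "Area A", "Area B", "Area B"),
  Age.Group     = c("0_to_4", "40_to_44", "0_to_4", "40_to_44"),
  Population    = c(30, -70, 10, -90),
  stringsAsFactors = FALSE
)
under15 <- c("0_to_4", "5_to_9", "10_to_14")
# abs() because female counts are negative in the pyramid data
toy$Count <- abs(toy$Population)
toy$Young <- toy$Age.Group %in% under15

# Percentage of each area's residents aged 14 and under
young_share <- sapply(split(toy, toy$Planning.Area), function(d) {
  100 * sum(d$Count[d$Young]) / sum(d$Count)
})
print(round(young_share, 1))
```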
Tableau is suitable for basic data exploration and visualization: users can generate the most common graphs and perform basic data processing, such as changing data types and joining tables. In R, users have to write their own scripts or find libraries to accomplish anything, but this makes R generally more flexible than Tableau: all data preprocessing and analysis can be done in R, and complex plots can be created as well.
Script-driven, reproducible updates are also easier to achieve in R than in Tableau. When the data changes or is updated, users do not need to change any code; rerunning the script updates the visualization automatically.