The data visualization below aims to reveal the demographic structure of Singapore's population by age cohort and by planning area in 2019.
The dataset can be found at: https://www.singstat.gov.sg/find-data/search-by-theme/population/geographic-distribution/latest-data
packages = c('tidyverse','dplyr')
# Install each package if it is missing, then attach it
for(p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}
## Loading required package: tidyverse
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3 ✓ purrr 0.3.4
## ✓ tibble 3.0.6 ✓ dplyr 1.0.4
## ✓ tidyr 1.1.2 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
raw_dataset <- read_csv("respopagesextod2011to2020.csv")
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## PA = col_character(),
## SZ = col_character(),
## AG = col_character(),
## Sex = col_character(),
## TOD = col_character(),
## Pop = col_double(),
## Time = col_double()
## )
Because AG is a character column, its values sort as text rather than as numbers, so “5_to_9” would be ordered after “45_to_49”. To sort the age groups in ascending order, we rename “5_to_9” to “05_to_09”.
raw_dataset$AG[raw_dataset$AG == "5_to_9"] <- "05_to_09"
head(raw_dataset)
## # A tibble: 6 x 7
## PA SZ AG Sex TOD Pop Time
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 Ang Mo K… Ang Mo Kio Town… 0_to… Males HDB 1- and 2-Room Flats 0 2011
## 2 Ang Mo K… Ang Mo Kio Town… 0_to… Males HDB 3-Room Flats 10 2011
## 3 Ang Mo K… Ang Mo Kio Town… 0_to… Males HDB 4-Room Flats 30 2011
## 4 Ang Mo K… Ang Mo Kio Town… 0_to… Males HDB 5-Room and Executive F… 50 2011
## 5 Ang Mo K… Ang Mo Kio Town… 0_to… Males HUDC Flats (excluding thos… 0 2011
## 6 Ang Mo K… Ang Mo Kio Town… 0_to… Males Landed Properties 0 2011
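The rename above works because the padded label sorts correctly as text. An alternative that does not depend on text sorting is to store the age group as a factor whose levels are ordered by the lower bound of each group. The sketch below is only an illustration on a few labels that appear in this data set; the helper name `order_age_groups` is hypothetical and is not part of the original analysis.

```r
# Hypothetical helper: turn age-group labels into a factor whose levels
# are ordered by the number at the start of each label.
order_age_groups <- function(ag) {
  groups <- unique(ag)
  # Keep only the text before the first underscore, e.g. "45_to_49" -> 45
  lower <- as.integer(sub("_.*$", "", groups))
  factor(ag, levels = groups[order(lower)])
}

ag <- c("45_to_49", "5_to_9", "0_to_4", "10_to_14", "5_to_9")
ag_factor <- order_age_groups(ag)
levels(ag_factor)
# "0_to_4" "5_to_9" "10_to_14" "45_to_49"
```

With a factor, both sorting and the axis order in ggplot2 follow the level order, so the original “5_to_9” label could be kept unchanged.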
To plot the “Age-Sex Pyramid by age group and planning area”, we first filter the 2019 records from the original data set and then select the columns we need: planning area (PA), age group (AG), sex (Sex) and population (Pop). The summarise() function then sums the population for each combination of planning area, age group and sex.
pyramid_withPA <- raw_dataset %>%
  filter(Time == 2019) %>%                  # Time is numeric, so compare with a number
  select(c("PA", "AG", "Sex", "Pop")) %>%
  arrange(AG) %>%
  group_by(PA, AG, Sex) %>%
  summarise(Pop = sum(Pop))                 # total across subzones and dwelling types
## `summarise()` has grouped output by 'PA', 'AG'. You can override using the `.groups` argument.
head(pyramid_withPA)
## # A tibble: 6 x 4
## # Groups: PA, AG [3]
## PA AG Sex Pop
## <chr> <chr> <chr> <dbl>
## 1 Ang Mo Kio 0_to_4 Females 2660
## 2 Ang Mo Kio 0_to_4 Males 2760
## 3 Ang Mo Kio 05_to_09 Females 3110
## 4 Ang Mo Kio 05_to_09 Males 3120
## 5 Ang Mo Kio 10_to_14 Females 3670
## 6 Ang Mo Kio 10_to_14 Males 3710
To plot the overall sex-population pyramid for Singapore in 2019, we first use filter() to select the 2019 data and select() to keep the PA, AG, Sex and Pop columns, as above. The arrange() function sorts the age groups in ascending order. The aggregate() function then splits the data into subsets and computes the total population for each combination of age group and sex.
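sg_pyramid <- raw_dataset %>%
  filter(Time == 2019) %>%                  # Time is numeric, so compare with a number
  select(c("PA", "AG", "Sex", "Pop")) %>%
  arrange(AG)
# Sum Pop for every combination of age group and sex
sex_pop_pyramid <- aggregate(Pop ~ AG + Sex, data = sg_pyramid, FUN = sum)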
head(sex_pop_pyramid)
## AG Sex Pop
## 1 0_to_4 Females 90850
## 2 05_to_09 Females 97040
## 3 10_to_14 Females 102550
## 4 15_to_19 Females 108910
## 5 20_to_24 Females 122480
## 6 25_to_29 Females 145960
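To make the aggregate() step concrete, here is a minimal sketch on a made-up data frame. The numbers are invented for illustration and are not from the SingStat file; only the column names match the data set. The formula `Pop ~ AG + Sex` reads as "summarise Pop within every combination of AG and Sex".

```r
# Made-up rows for illustration only; not taken from the SingStat file
toy <- data.frame(
  AG  = c("0_to_4", "0_to_4", "0_to_4", "05_to_09", "05_to_09"),
  Sex = c("Males", "Males", "Females", "Males", "Females"),
  Pop = c(10, 30, 20, 5, 15)
)

# Pop ~ AG + Sex: sum Pop within every (AG, Sex) combination
totals <- aggregate(Pop ~ AG + Sex, data = toy, FUN = sum)
print(totals)
```

The two "0_to_4" male rows collapse into a single row with Pop equal to 40. In the real data, the rows being collapsed are the different planning areas, subzones and dwelling types.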
We use ggplot2 to plot the visualizations. geom_col() draws the bars. To show the female and male bars on the same axes, we set the aesthetic mappings in aes(): the x aesthetic is AG, and the y aesthetic checks Sex with ifelse(). We multiply the male population by -1 so that the male bars extend in the opposite direction to the female bars. The fill aesthetic colours the bars by sex and produces the legend. coord_flip() swaps the x and y axes so that the bars are horizontal. scale_y_continuous(labels = abs) displays the negative values on the y axis as positive labels, and its limits argument makes the axis symmetric around zero.
pyramidplot <- ggplot(sex_pop_pyramid,
                      aes(x = AG,
                          y = ifelse(Sex == "Males", yes = -Pop, no = Pop),
                          fill = Sex)) +
  geom_col() +
  coord_flip() +
  scale_fill_brewer(palette = "Set1") +
  scale_y_continuous(labels = abs, limits = max(sex_pop_pyramid$Pop) * c(-1, 1)) +
  labs(
    x = "Age Group",
    y = "Population",
    title = "Overview of Singapore Sex-Population Pyramid,2019",
    caption = "Data Source: Singstat.com"
  ) +
  theme(plot.title = element_text(size = 12, face = 'bold'))
pyramidplot
labs() defines the x-axis, y-axis, title and caption labels. scale_fill_brewer() and theme() are used to style the visualization.
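The mirroring used in the pyramid can be checked without drawing anything. The sketch below uses the six Ang Mo Kio rows shown in the head(pyramid_withPA) output earlier and reproduces the three pieces the plot relies on: the sign flip for males, the symmetric axis limits, and the abs() labelling. The variable names are chosen for illustration only.

```r
# The six Ang Mo Kio rows from the head(pyramid_withPA) output
amk <- data.frame(
  AG  = rep(c("0_to_4", "05_to_09", "10_to_14"), each = 2),
  Sex = rep(c("Females", "Males"), times = 3),
  Pop = c(2660, 2760, 3110, 3120, 3670, 3710)
)

# Males become negative so their bars grow to the left of zero
signed_pop <- ifelse(amk$Sex == "Males", yes = -amk$Pop, no = amk$Pop)
signed_pop
# 2660 -2760 3110 -3120 3670 -3710

# Symmetric limits keep zero in the middle of the axis
axis_limits <- max(amk$Pop) * c(-1, 1)
axis_limits
# -3710 3710

# labels = abs hides the minus sign on the tick labels
tick_labels <- abs(c(-3000, 0, 3000))
tick_labels
# 3000 0 3000
```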
ggplot(pyramid_withPA,
       aes(x = AG,
           y = ifelse(test = Sex == "Males", yes = -Pop, no = Pop),
           fill = Sex)) +
  geom_bar(stat = "identity") +
  scale_y_continuous(labels = abs,
                     limits = max(pyramid_withPA$Pop) * c(-1, 1)) +
  coord_flip() +
  labs(x = "Age Group",
       y = "Population",
       title = "Singapore Demographic Structure by Age Group and Planning Area in 2019",
       caption = "Data Source: Singstat.com") +
  scale_fill_brewer(palette = "Set1") +
  theme(axis.text.x = element_blank()) +
  theme(axis.text.y = element_blank()) +
  theme(plot.title = element_text(size = 12, face = 'bold')) +
  theme(axis.title.x = element_text(size = 9), axis.title.y = element_text(size = 9)) +
  facet_wrap(~ PA)
Insight 1: Singapore’s Population in 2019
The overall sex-population pyramid shows that adults aged 25 to 59 make up the largest share of Singapore’s population. The female and male populations are distributed almost equally.
Insight 2: Clusters
The sex-population pyramids by age group and planning area show that some planning areas have similar distributions, so we can group them into clusters.
1st type of cluster: Areas such as Bedok, Bukit Merah, Bukit Panjang, Kallang, Jurong West, Jurong East and Toa Payoh. These planning areas all have a bell-shaped distribution for both females and males: the middle age groups make up the majority of residents, and the youth and senior groups have similar numbers of residents.
2nd type of cluster: Areas such as Punggol and Seng Kang. These planning areas have more young and middle-aged residents than seniors. This may be because they contain many new estates, which younger people prefer to live in.
3rd type of cluster: Areas in the CBD, which have the fewest residents. Compared with areas in the East, West and North such as Bedok and Jurong West, significantly fewer people live in the central CBD area.
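The first two clusters are about the shape of each pyramid, while the third is about its size. One way to check the two separately is to compute the total residents per planning area and, independently, each age group's share within its own planning area. The sketch below is a minimal illustration on invented numbers for two hypothetical areas; it is not part of the original analysis and uses no values from the data set.

```r
# Toy numbers for two hypothetical areas; not taken from the dataset
toy <- data.frame(
  PA  = c("Area A", "Area A", "Area A", "Area B", "Area B", "Area B"),
  AG  = rep(c("0_to_4", "25_to_29", "65_to_69"), times = 2),
  Pop = c(100, 300, 100, 10, 20, 20)
)

# Size: total residents per planning area
totals <- aggregate(Pop ~ PA, data = toy, FUN = sum)
print(totals)

# Shape: each age group's share of its own planning area
toy$Share <- toy$Pop / ave(toy$Pop, toy$PA, FUN = sum)
print(toy)
```

Plotting the share instead of the raw population would make small areas, such as those in the CBD, readable next to large ones, at the cost of hiding the difference in size.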