In 2019, the economically active group (aged 25 to 64) made up the largest part of Singapore's population, accounting for more than half of all residents. It was followed by the economically dependent young group (aged 0 to 14), while the elderly (aged 65 and above) formed the smallest share of these three groups.
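These shares can be checked against the 2019 age-group table printed further down. The sketch below copies those printed counts into plain vectors (an assumption made so it runs without the CSV) and groups them into broad age bands with base R's `cut()` and `tapply()`.

```r
# 2019 resident population by age group, copied from the printed data_pivot table
age_groups <- c("0-4", "5-9", "10-14", "15-19", "20-24", "25-29", "30-34",
                "35-39", "40-44", "45-49", "50-54", "55-59", "60-64",
                "65-69", "70-74", "75-79", "80-84", "85-89", "90-over")
females <- c(90850, 97040, 102550, 108910, 122480, 145960, 153460, 158850,
             157120, 160230, 152750, 153590, 140770, 112900, 79190, 52680,
             36230, 21430, 13730)
males <- c(94730, 101290, 105830, 113730, 127040, 142640, 140360, 142310,
           144130, 151800, 149360, 153850, 138490, 108920, 71450, 42460,
           26230, 12490, 5590)
total <- females + males

# Lower bound of each age group ("90-over" -> 90)
lower <- as.integer(sub("-.*", "", age_groups))

# Assign each age group to a broad band: [0,15), [15,25), [25,65), [65,Inf)
band <- cut(lower, breaks = c(0, 15, 25, 65, Inf), right = FALSE,
            labels = c("0-14", "15-24", "25-64", "65+"))

# Total population and percentage share of each band
band_totals <- tapply(total, band, sum)
band_share <- round(100 * band_totals / sum(total), 1)
band_share
```

With these figures the 25-64 band comes to about 59.1% of residents, 0-14 to about 14.7% and 65+ to about 14.5%. The 15-24 band, which the comparison above does not discuss, comes to about 11.7%.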
There is an obvious residential imbalance across Singapore's planning areas: the most populous are Bedok, Jurong West, Hougang and Choa Chu Kang, while no residents are recorded in Pioneer, Paya Lebar, North-Eastern Islands, Marina East, Changi Bay, Central Water Catchment, etc.
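A ranking like this can be reproduced by summing females and males for each area and sorting the totals. The sketch uses only a handful of areas copied from the printed planning-area table, so it is an illustration rather than a full reproduction.

```r
# 2019 population for a few planning areas, copied from the printed pa_pivot table
pa <- data.frame(
  area    = c("Bedok", "Jurong West", "Hougang", "Choa Chu Kang",
              "Punggol", "Paya Lebar", "Pioneer", "Changi Bay"),
  females = c(144160, 131920, 115760, 95860, 86650, 0, 0, 0),
  males   = c(135810, 133090, 111350, 95240, 84270, 0, 0, 0),
  stringsAsFactors = FALSE
)
pa$total <- pa$females + pa$males

# Rank planning areas from most to least populous
ranked <- pa[order(pa$total, decreasing = TRUE), c("area", "total")]
head(ranked, 3)

# Planning areas with no recorded residents
pa$area[pa$total == 0]
```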
In both chart 1, which shows the population distribution across age groups, and chart 2, which shows the population distribution across planning areas, there is no remarkable difference between the numbers of men and women. For people aged 80 and above, however, women clearly outnumber men; in the 90-and-over group there are more than twice as many women as men.
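The gap among older residents is easier to see as a ratio. This sketch copies the printed counts for ages 70 and over and divides the number of women by the number of men in each group.

```r
# Females and males aged 70 and over, copied from the printed data_pivot table
age_groups <- c("70-74", "75-79", "80-84", "85-89", "90-over")
females <- c(79190, 52680, 36230, 21430, 13730)
males   <- c(71450, 42460, 26230, 12490, 5590)

# Number of women per man in each age group, rounded to 2 decimals
ratio <- round(females / males, 2)
names(ratio) <- age_groups
ratio
```

The ratio rises steadily with age, from about 1.11 at 70-74 to about 2.46 at 90 and over.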
# Install any missing packages, then load them
packages <- c("tidyverse", "reshape2", "data.table")
for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
# Read the resident population data (2011-2019) and keep only the 2019 rows
data_SG <- read.csv("data/respopagesextod2011to2019.csv")
data_SG <- filter(data_SG, Time == 2019)
library("reshape2")
library(data.table)
# Sum the 2019 population by sex (rows) and age group (columns)
data_pivot <- dcast(setDT(data_SG), Sex ~ AG, fun.aggregate = sum, value.var = "Pop")
# Rename the columns to match the order in which dcast returns the age groups
names(data_pivot) <- c("Gender", "0-4", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "5-9", "50-54", "55-59", "60-64", "65-69", "70-74", "75-79", "80-84", "85-89", "90-over")
# Reorder the age groups from youngest to oldest
data_pivot <- data_pivot[, c("Gender", "0-4", "5-9", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "50-54", "55-59", "60-64", "65-69", "70-74", "75-79", "80-84", "85-89", "90-over")]
data_pivot
## Gender 0-4 5-9 10-14 15-19 20-24 25-29 30-34 35-39 40-44 45-49
## 1: Females 90850 97040 102550 108910 122480 145960 153460 158850 157120 160230
## 2: Males 94730 101290 105830 113730 127040 142640 140360 142310 144130 151800
## 50-54 55-59 60-64 65-69 70-74 75-79 80-84 85-89 90-over
## 1: 152750 153590 140770 112900 79190 52680 36230 21430 13730
## 2: 149360 153850 138490 108920 71450 42460 26230 12490 5590
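# Reshape to long format: one row per gender and age group
pop_class <- melt(data_pivot, id.vars = "Gender")
# Negate the male counts so that males are drawn to the left of zero
is_male <- pop_class$Gender == "Males"
pop_class$value[is_male] <- -pop_class$value[is_male]
# Print large numbers without scientific notation
options(scipen = 200)
Pop_agecohort <- ggplot(pop_class, aes(x = variable, y = value, fill = Gender)) +
  geom_bar(stat = "identity", position = "identity") +
  # Dashed reference line at the mean of the signed values, on the population axis
  geom_hline(yintercept = mean(pop_class$value), colour = "red", linetype = "dashed") +
  coord_flip() +
  scale_y_continuous(labels = abs) +
  labs(x = "Age group", y = "Population",
       title = "Demographic structure of population by Age Cohort in 2019")
Pop_agecohort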
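The key step in a population pyramid is negating one sex's counts so that its bars extend to the left of zero. The sketch shows that step on a small long-format table shaped like the output of `melt()`, using the first three age groups from the printed table. Selecting rows with the long table's own `Gender` column avoids relying on R recycling a shorter logical vector, which is what indexing with `data_pivot$Gender` (two rows) would do.

```r
# Long-format table in the shape melt() produces: one row per Gender x age group
# (values copied from the printed data_pivot table, first three age groups only)
pop_long <- data.frame(
  Gender   = rep(c("Females", "Males"), times = 3),
  variable = rep(c("0-4", "5-9", "10-14"), each = 2),
  value    = c(90850, 94730, 97040, 101290, 102550, 105830),
  stringsAsFactors = FALSE
)

# Mirror the male bars by indexing with the long table's own Gender column,
# so the result does not depend on the row order of the wide table
is_male <- pop_long$Gender == "Males"
pop_long$value[is_male] <- -pop_long$value[is_male]

pop_long
```

The mirrored table can then be plotted with `ggplot()` and `coord_flip()`; `scale_y_continuous(labels = abs)` labels the male side of the axis with positive numbers.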
# Sum the 2019 population by sex (rows) and planning area (columns);
# dcast names each column after its planning area, so every area in the data is kept
pa_pivot <- dcast(setDT(data_SG), Sex ~ PA, fun.aggregate = sum, value.var = "Pop")
setnames(pa_pivot, "Sex", "Gender")
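# Note: this replaces the table computed above with a saved copy (data/pa_pivot.csv)
# in which the planning areas are sorted by total population
pa_pivot <- read.csv("data/pa_pivot.csv")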
pa_pivot
## Gender Bedok Jurong.West Hougang Choa.Chu.Kang Punggol Ang.MoKio
## 1 Females 144160 131920 115760 95860 86650 85770
## 2 Males 135810 133090 111350 95240 84270 78660
## Bukit.Batok Bukit.Merah Pasir.Ris Bukit.Panjang Geylang Kallang Queenstown
## 1 78600 79230 75070 70840 55870 51490 51040
## 2 75540 73370 73140 68860 54650 50450 45430
## Clementi Bishan Jurong.East Bukit.Timah Novena Marine.Parade Outram
## 1 48840 45500 39960 41710 25440 24620 9560
## 2 44070 42730 39270 36010 23950 21830 9490
## River.Valley Newton Downtown.Core Mandai Changi Orchard Museum Lim.Chu.Kang
## 1 5490 4140 1300 1070 910 480 240 40
## 2 4690 3860 1200 990 880 420 190 30
## Boon.Lay Central.Water.Catchment Changi.Bay Marina.East Marina.South
## 1 0 0 0 0 0
## 2 0 0 0 0 0
## North.Eastern.Islands Paya.Lebar Pioneer
## 1 0 0 0
## 2 0 0 0
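# Reshape to long format and negate the male counts, as for the age pyramid
pa_class <- melt(setDT(pa_pivot), id.vars = "Gender")
is_male <- pa_class$Gender == "Males"
pa_class$value[is_male] <- -pa_class$value[is_male]
Pop_pa <- ggplot(pa_class, aes(x = variable, y = value, fill = Gender)) +
  geom_bar(stat = "identity", position = "identity") +
  # Reference line at the mean of the signed values, on the population axis
  geom_hline(yintercept = mean(pa_class$value), colour = "red") +
  coord_flip() +
  scale_y_continuous(labels = abs) +
  labs(x = "Planning area", y = "Population",
       title = "Demographic structure of population by Planning Area in 2019")
Pop_pa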
R provides more possibilities for data preprocessing: with R code and its wide range of packages, we can clean and process data efficiently, for example filtering specific rows or recoding values in a column, tasks that are more complicated in Tableau.
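As a small illustration of filtering and recoding in R, the sketch below uses base R (`subset()`, `ifelse()`) so it has no package dependencies; `dplyr::filter()` and `dplyr::recode()` are tidyverse alternatives. The records are made up and only reuse the column names `Time`, `Sex`, `AG` and `Pop` from the census CSV.

```r
# Made-up records that reuse the census column names (Time, Sex, AG, Pop)
records <- data.frame(
  Time = c(2018, 2019, 2019, 2019),
  Sex  = c("Males", "Males", "Females", "Males"),
  AG   = c("0_to_4", "0_to_4", "0_to_4", "90_and_over"),
  Pop  = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Filtering: keep only the 2019 rows
recent <- subset(records, Time == 2019)

# Recoding: replace the Sex labels with single letters
recent$Sex <- ifelse(recent$Sex == "Males", "M", "F")

# Recoding: give the open-ended age group a shorter label
recent$AG[recent$AG == "90_and_over"] <- "90+"
recent
```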
R visualization is also more flexible. In Tableau we are limited to the functions the platform provides rather than user-defined ones, whereas in R we can customize how values are calculated, especially for particular metrics.
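As an example of a user-defined calculation inside a cross-tab, the sketch applies a hypothetical helper, `sum_thousands()`, to every cell of a sex-by-age-group table using base R's `tapply()`. The counts are copied from the printed age-group table.

```r
# Long-format 2019 counts for two age groups, copied from the printed data_pivot table
sex   <- c("Females", "Males", "Females", "Males")
group <- c("85-89", "85-89", "90-over", "90-over")
pop   <- c(21430, 12490, 13730, 5590)

# Hypothetical user-defined aggregation: total in thousands, rounded to 1 decimal
sum_thousands <- function(x) round(sum(x) / 1000, 1)

# Cross-tabulate sex x age group, applying the custom function in each cell
tab <- tapply(pop, list(sex, group), sum_thousands)
tab
```

The same helper could be passed to data.table's `dcast()` through its `fun.aggregate` argument in place of `sum`.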
Code written in RStudio records every step we took, so we can learn from our own previous work and from other people's. Besides, we can rerun the same code with different parameters, which makes our work more efficient.
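Wrapping a step in a function makes it easy to rerun with different parameters. The sketch defines a hypothetical `pop_by_sex()` that repeats the year filter used earlier for any `year`; the toy records are made up.

```r
# Hypothetical helper: total population per sex for a given year, so the same
# analysis can be rerun just by changing the `year` argument
pop_by_sex <- function(data, year) {
  rows <- data[data$Time == year, ]
  tapply(rows$Pop, rows$Sex, sum)
}

# Made-up records with the census column names (Time, Sex, Pop)
records <- data.frame(
  Time = c(2018, 2018, 2019, 2019, 2019),
  Sex  = c("Females", "Males", "Females", "Males", "Males"),
  Pop  = c(5, 4, 7, 3, 2),
  stringsAsFactors = FALSE
)

pop_by_sex(records, 2018)
pop_by_sex(records, 2019)
```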