Author: Penny Tuar
Date: 24th July 2020
This project aims to visualise the trending of demographic structure of Singapore population by age cohort and by planning area from 2011 to 2019. The data used was published on Singapore Department of Statistics Singstat.
Challenge A) There were large number of meaningless data from the dataset, there were 883,728 rows, of which 625,919 rows were records with zero (0) population count.
Solution A) Removed the rows with 0 population count before loading data into R.
Challenge B) There were 55 Planning areas in the dataset, due to the large number of planning areas variables, the charts might look very cluttered in certain chart types.
Solution B) A separate column ‘Region’ with only 5 variables is created with reference to the data from Wikipedia
Challenge C) The demographic characteristics was named using abbreviation such as PA,SZ,AG etc, for visualisation there will be a need to use column names that are understandable to all. Also within the column AG (Age), the variable from 5 to 9 years old was recorded as 5_to_9, this had resulted in error in the values order.
Solution C) Change the abbreviation by renaming the column names and manually indicate the desired_order for the Age variables.
Challenge D) Due to being new to this app, constantly faced with many error messages and some has no real solution such as “Error: attempt to use zero-length variable name”
Solution D) Need to play around with the app more and attempt to change the visualisation many times (this is the real challenge)
alt text
packages <- c('tidyverse','ggplot2')
for (p in packages) {
if (!require (p,character.only = T)){
install.packages(p)
}
library(p,character.only=T)
}
## Loading required package: tidyverse
## -- Attaching packages ----------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.2.1 v purrr 0.3.2
## v tibble 2.1.3 v dplyr 0.8.3
## v tidyr 1.0.0 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.4.0
## -- Conflicts -------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
data <- read.csv ("respopagesextod2011to2019edited.csv")
head(data)
## ï..PA SZ AG Sex
## 1 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males
## 2 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males
## 3 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males
## 4 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males
## 5 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males
## 6 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males
## TOD Pop Time Region
## 1 HDB 1- and 2-Room Flats 0 2011 North-East
## 2 HDB 3-Room Flats 10 2011 North-East
## 3 HDB 4-Room Flats 30 2011 North-East
## 4 HDB 5-Room and Executive Flats 50 2011 North-East
## 5 HUDC Flats (excluding those privatised) 0 2011 North-East
## 6 Landed Properties 0 2011 North-East
names(data)[names(data) == "PA"] <- "Planning_Area"
names(data)[names(data) == "SZ"] <- "Subzone"
names(data)[names(data) == "AG"] <- "Age_Group"
names(data)[names(data) == "TOD"] <- "Housing"
names(data)[names(data) == "Pop"] <- "Population"
names(data)[names(data) == "Time"] <- "Year"
names(data)[names(data) == "Region"] <- "Region"
data <-as.data.frame(data)
head(data)
## ï..PA Subzone Age_Group Sex
## 1 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males
## 2 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males
## 3 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males
## 4 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males
## 5 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males
## 6 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males
## Housing Population Year Region
## 1 HDB 1- and 2-Room Flats 0 2011 North-East
## 2 HDB 3-Room Flats 10 2011 North-East
## 3 HDB 4-Room Flats 30 2011 North-East
## 4 HDB 5-Room and Executive Flats 50 2011 North-East
## 5 HUDC Flats (excluding those privatised) 0 2011 North-East
## 6 Landed Properties 0 2011 North-East
summary(data)
## ï..PA Subzone Age_Group
## Bukit Merah : 46512 Admiralty : 2736 0_to_4 : 46512
## Queenstown : 41040 Airport Road : 2736 10_to_14: 46512
## Ang Mo Kio : 32832 Alexandra Hill : 2736 15_to_19: 46512
## Downtown Core: 32832 Alexandra North: 2736 20_to_24: 46512
## Toa Payoh : 32832 Aljunied : 2736 25_to_29: 46512
## Hougang : 27360 Anak Bukit : 2736 30_to_34: 46512
## (Other) :670320 (Other) :867312 (Other) :604656
## Sex Housing
## Females:441864 Condominiums and Other Apartments :110466
## Males :441864 HDB 1- and 2-Room Flats :110466
## HDB 3-Room Flats :110466
## HDB 4-Room Flats :110466
## HDB 5-Room and Executive Flats :110466
## HUDC Flats (excluding those privatised):110466
## (Other) :220932
## Population Year Region
## Min. : 0.00 Min. :2011 Central :366624
## 1st Qu.: 0.00 1st Qu.:2013 East : 82080
## Median : 0.00 Median :2015 North :112176
## Mean : 39.83 Mean :2015 North-East:131328
## 3rd Qu.: 10.00 3rd Qu.:2017 West :191520
## Max. :2860.00 Max. :2019
##
data$Age_Group <- factor(data$Age_Group, levels=c("0_to_4","5_to_9","10_to_14","15_to_19","20_to_24","25_to_29","30_to_34","35_to_39","40_to_44","45_to_49","50_to_54","55_to_59","60_to_64","65_to_69", "70_to_74", "75_to_79","80_to_84","85_to_89","90_and_over"))
levels(data$Age_Group)
## [1] "0_to_4" "5_to_9" "10_to_14" "15_to_19" "20_to_24"
## [6] "25_to_29" "30_to_34" "35_to_39" "40_to_44" "45_to_49"
## [11] "50_to_54" "55_to_59" "60_to_64" "65_to_69" "70_to_74"
## [16] "75_to_79" "80_to_84" "85_to_89" "90_and_over"
As we only want to compare the data in 2011 and 2019, therefore we use the filter () function to select the relevant year data.
data_py2019 <- data %>%
filter(Year==2019)
data_final2019 <- aggregate(Population~Age_Group+Sex,data=data_py2019,FUN=sum)
head(data_final2019)
## Age_Group Sex Population
## 1 0_to_4 Females 90850
## 2 5_to_9 Females 97040
## 3 10_to_14 Females 102550
## 4 15_to_19 Females 108910
## 5 20_to_24 Females 122480
## 6 25_to_29 Females 145960
data_py2011 <- data %>%
filter(Year==2011)
data_final2011 <- aggregate(Population~Age_Group+Sex,data=data_py2011,FUN=sum)
head(data_final2011)
## Age_Group Sex Population
## 1 0_to_4 Females 92590
## 2 5_to_9 Females 102180
## 3 10_to_14 Females 117300
## 4 15_to_19 Females 128240
## 5 20_to_24 Females 127010
## 6 25_to_29 Females 135680
options(scipen=999)
ggplot(data_final2019, aes(x = Age_Group, fill = Sex,
y = ifelse(test = Sex == "Males",
yes = -Population, no = Population))) +
geom_bar(stat = "identity") +
scale_y_continuous(labels = abs, limits = max(data_final2019$Population) * c(-1,1)) +theme_minimal()+
coord_flip() + labs(title="Age-Sex Population pyramid for 2019",
caption="Source: Singapore Department of Statistics", x = "Age Group", y = "Population")
options(scipen=999)
ggplot(data_final2011, aes(x = Age_Group, fill = Sex,
y = ifelse(test = Sex == "Males",
yes = -Population, no = Population))) +
geom_bar(stat = "identity") +
scale_y_continuous(labels = abs, limits = max(data_final2011$Population) * c(-1,1)) +theme_minimal()+
coord_flip() + labs(title="Age-Sex Population pyramid for 2011",
caption="Source: Singapore Department of Statistics", x = "Age Group", y = "Population")
As we are also interesting to know the pattern across the different planning area, we added the region variables
data_region2019 <- data_py2019 %>%
group_by(Age_Group, Sex, Region) %>%
summarise(Population = sum(Population))
ggplot(data_region2019, aes(x = Age_Group, fill = Sex,
y = ifelse(test = Sex == "Males",
yes = -Population, no = Population))) +
geom_bar(stat = "identity") +
coord_flip() + labs(title="Age,Sex,Region Population pyramid for 2019",
caption="Source: Singapore Department of Statistics and Wikipedia", x = "Age Group", y = "Population")+
scale_y_continuous(breaks=NULL) +
theme_minimal()+
facet_grid(~ Region)
data_region2011 <- data_py2011 %>%
group_by(Age_Group, Sex, Region) %>%
summarise(Population = sum(Population))
ggplot(data_region2011, aes(x = Age_Group, fill = Sex,
y = ifelse(test = Sex == "Males",
yes = -Population, no = Population))) +
geom_bar(stat = "identity") +
coord_flip() + labs(title="Age,Sex,Region Population pyramid for 2011",
caption="Source: Singapore Department of Statistics and Wikipedia", x = "Age Group", y = "Population")+
scale_y_continuous(breaks=NULL) +
theme_minimal()+
facet_grid(~ Region)
Based on the Age-Sex population pyramid, we are able to tell in 2011, there were lesser people who are 90 and above, while this has increased quite a fair bit by 2019. This means that Singaporean could be leading a longer life. Also from the change in shape, we are able to tell that in 2011, the highest population age range is about the 45 to 49 years while in 2019 this has become the 55 to 59 years. This could mean thar in another 10 years or less, the bulk of the population will be the aging population. This could be a warning sign to Singaporean.
Based on the additional factor of Region, the shape of of the population by region is quite inline with the insight mentioned earlier. The point that is worth noting will be that in the north-east region, there seems to be a spike in the overall population as well as increase in younger age as well as young adult in the mid 30s. This is inline with the recents years of development in Sengkang, Punggol and hougang.
Sources
SingStat:(https://www.singstat.gov.sg/find-data/search-by-theme/population/geographic-distribution/latest-data)
Wikipedia:(https://en.wikipedia.org/wiki/Planning_Areas_of_Singapore)