The age groups are sorted alphabetically as character strings rather than from youngest to oldest, and the order cannot be changed manually as in Tableau.
Solution: Create an additional column that takes the characters before the first underscore to get the starting age of each age group (AG), convert it to numeric, then reorder the graph axis by the new column.
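A minimal sketch of this fix on a toy vector, assuming the age-group labels follow the `0_to_4`, `5_to_9`, ..., `90_and_over` pattern; the names `ag`, `ag_start` and `ag_sorted` are illustrative only.

```r
# Toy age-group labels (illustrative, not the real data)
ag <- c("10_to_14", "0_to_4", "90_and_over", "5_to_9")

# Sorting as characters puts "10_to_14" before "5_to_9"
sort(ag)

# Keep the text before the first underscore and convert it to a number
ag_start <- as.numeric(sub("_.*", "", ag))

# Order the labels by their numeric starting age
ag_sorted <- ag[order(ag_start)]
ag_sorted
```

In a ggplot2 aesthetic the same idea is expressed with `reorder(AG, AGstart)`, which orders the age-group labels by the numeric column.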
facet_wrap does not work well with too many facets, as the information on the charts becomes hard to read.
Solution: Resize the figure height and width and adjust the number of columns to minimise clutter.
Axis scales are hard to read when the values are large, e.g. many '0's. The axis labels are not automatically given a more readable form, unlike in Tableau.
Solution: Customise the scales with the scale_y_continuous or scale_x_continuous function, e.g. convert 10000 to 10K to make it more reader-friendly.
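One way to avoid typing the labels by hand is a small labelling function. The sketch below is only an illustration; `label_k` is a hypothetical helper name, not something from the original analysis.

```r
# Hypothetical helper: turn raw counts into "K" labels, e.g. 10000 -> "10K"
label_k <- function(v) paste0(v / 1000, "K")

k_labels <- label_k(c(0, 10000, 250000))
k_labels
```

The `labels` argument of `scale_x_continuous()` and `scale_y_continuous()` also accepts a function, so `scale_y_continuous(labels = label_k)` would derive the labels from whatever breaks are in use instead of relying on a hard-coded vector that has to match them.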
Import the tidyverse package and read the original dataset into pop_data.
library(tidyverse)
pop_data <- read_csv("respopagesextod2011to2020.csv")
Filter the data by Time for the year 2019 and assign the dataset to pop_filterdate.
Use the aggregate function to get the sum of the population count by planning area and assign it to popcount_PA. Then filter popcount_PA further by excluding planning areas with a population count of zero, as they do not provide much insight.
Use the aggregate function to get the sum of the population count by sex for each age group in each planning area and assign it to pop_cleaned. Then filter it further to keep only planning areas with a population of more than zero, which can be done by reusing the planning areas already filtered in popcount_PA. Create an additional column, AGstart, holding the starting age of each age group to facilitate sorting of the axis later on, and convert AGstart from character to numeric.
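# Keep only the 2019 records
pop_filterdate <- filter(pop_data, Time == 2019)
# Total population per planning area (aggregate names the summed column "x")
popcount_PA <- aggregate(pop_filterdate$Pop, by = list(PA = pop_filterdate$PA), FUN = sum)
# Drop planning areas with no residents
popcount_PA <- popcount_PA %>% filter(x != 0)
# Population by planning area, age group and sex
pop_cleaned <- aggregate(pop_filterdate$Pop, by = list(PA = pop_filterdate$PA, AG = pop_filterdate$AG, Sex = pop_filterdate$Sex), FUN = sum)
# Keep populated planning areas and extract the starting age of each age group
pop_cleaned <- pop_cleaned %>%
  filter(PA %in% popcount_PA$PA) %>%
  mutate(AGstart = sub("\\_.*", "", AG))
pop_cleaned$AGstart <- as.numeric(pop_cleaned$AGstart)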
The main aim of the bar chart is to show the population count in each planning area in 2019. Refer to the code below to plot the bar chart. (Note: Planning areas with a population count of zero have already been omitted.)
ggplot(popcount_PA, aes(x = x, y = reorder(PA, x))) +
  geom_bar(stat = 'identity') +
  # Breaks run from 0 to 300,000 in steps of 50,000, so the labels are 0K, 50K, ..., 300K
  scale_x_continuous(breaks = seq(0, 300000, 50000), labels = paste0(seq(0, 300, 50), "K")) +
  ggtitle("Population Count by PA in 2019", subtitle = "Bedok, Jurong West and Tampines top 3 largest planning areas") +
  labs(x = "Population Count", y = "PA", caption = "Source: Singstat Singapore Residents by Planning Area/Subzone, Age Group, Sex and Type of Dwelling, June 2011-2020")
This bar chart, unlike the previous bar chart, provides additional detail in terms of the population count by age group for each planning area in 2019.
ggplot(pop_cleaned, aes(x = reorder(AG, AGstart), y = x)) +
  geom_bar(stat = 'identity') +
  scale_y_continuous(breaks = seq(0, 40000, 5000), labels = paste0(as.character(c(0:8*5)), "K")) +
  labs(y = "Count", x = "Age Group", caption = "Source: Singstat Singapore Residents by Planning Area/Subzone, Age Group, Sex and Type of Dwelling, June 2011-2020") +
  coord_flip() +
  facet_wrap(~PA, ncol = 6) +
  ggtitle("Population Distribution by PA and Age Group in 2019", subtitle = "Punggol & Sengkang top 2 in terms of proportion of residents aged four and below")
ggplot(pop_cleaned, aes(x = reorder(AG, AGstart), y = x, fill = Sex)) +
  geom_bar(data = subset(pop_cleaned, Sex == 'Females'), stat = 'identity') +
  geom_bar(data = subset(pop_cleaned, Sex == 'Males'), stat = 'identity', aes(y = x * (-1))) +
  scale_y_continuous(breaks = seq(-20000, 20000, 5000), labels = paste0(as.character(c(4:0*5, 1:4*5)), "K")) +
  coord_flip() +
  labs(y = "Count", x = "Age Group", caption = "Source: Singstat Singapore Residents by Planning Area/Subzone, Age Group, Sex and Type of Dwelling, June 2011-2020") +
  facet_wrap(~PA, ncol = 6) +
  ggtitle("Population Distribution in PA by Age Group and Sex in 2019")
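The pyramid relies on two tricks: the male counts are negated so their bars extend to the left of zero, and the axis labels show absolute values so both sides read as positive counts. The sketch below illustrates both on made-up numbers; `toy`, `signed` and `label_abs_k` are hypothetical names used only for this example.

```r
# Toy counts for one age group (made-up numbers for illustration only)
toy <- data.frame(Sex = c("Males", "Females"), x = c(12000, 15000))

# Negate the male counts so their bars extend to the left of zero
toy$signed <- ifelse(toy$Sex == "Males", -toy$x, toy$x)

# Hypothetical helper: label both sides of the axis with positive "K" values
label_abs_k <- function(v) paste0(abs(v) / 1000, "K")

pyramid_labels <- label_abs_k(seq(-20000, 20000, 5000))
pyramid_labels
```

The resulting labels mirror around zero in the same way as the hand-written `c(4:0*5, 1:4*5)` vector, but they are computed from the breaks themselves.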
Observations:
Bedok, Jurong West and Tampines are the top 3 planning areas in terms of population size, with Bedok being first, followed by Jurong West then Tampines.
Punggol and Sengkang are relatively younger planning areas. Among all planning areas, they have two of the largest proportions of residents aged 0 to 4. This is further supported by the population pyramids, as seen from their wider bases. They are likely inhabited by young families with children, as seen from the largest age group being 35 to 39 years old and the high proportion of 0 to 4 year olds.
Signs of an aging population are evident across most planning areas, as seen from the narrow bases and wider midsections of the pyramids, with the exception of Punggol and Sengkang. Across all planning areas, women appear to live longer than men: more women than men live to 85 years old and above.
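The comparison of age-group proportions across planning areas can also be checked numerically rather than read off the charts. The sketch below assumes a data frame shaped like pop_cleaned (columns PA, AG and x); `share_of_age_group` is a hypothetical helper, and the toy numbers are made up purely to show the calculation.

```r
# Hypothetical helper: share of each planning area's population in one age group.
# Assumes columns PA (planning area), AG (age group) and x (population count).
share_of_age_group <- function(df, age_group) {
  total <- tapply(df$x, df$PA, sum)                # total population per PA
  in_group <- df[df$AG == age_group, ]
  group <- tapply(in_group$x, in_group$PA, sum)    # population in the age group per PA
  share <- setNames(as.numeric(group[names(total)] / total), names(total))
  share[order(share, decreasing = TRUE)]           # largest share first
}

# Made-up toy data, not the real counts
toy <- data.frame(
  PA = c("A", "A", "B", "B"),
  AG = c("0_to_4", "5_to_9", "0_to_4", "5_to_9"),
  x  = c(10, 30, 20, 20)
)
toy_share <- share_of_age_group(toy, "0_to_4")
toy_share
```

Applied to the real data, a call such as `share_of_age_group(pop_cleaned, "0_to_4")` would rank the planning areas by the proportion of residents in that age group, assuming the label is spelled that way in the dataset.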