This assignment visualizes demographic trends in Singapore's population by age cohort and by planning area from 2011 to 2019. The data was sourced from the Singapore Department of Statistics (https://www.singstat.gov.sg/find-data/search-by-theme/population/geographic-distribution/latest-data).
The major challenge was moving from an interactive, point-and-click platform like Tableau to RStudio, which requires writing code. The following are the challenges faced with the data and design, and the actions taken to overcome them:
2.1 The data had to be cleaned because many planning areas had zero population. These rows were excluded from further analysis.
2.2 Most of the default visualizations were not aesthetically pleasing because they lacked borders and consistently formatted axes. A reusable block of formatting code was therefore created to give every chart a uniform look.
2.3 For most visualizations the data had to be customized individually, which meant dropping the columns not needed for the chart in question. The data was then grouped and aggregated with the dplyr package to create new data frames.
2.4 The data had to be categorized by adding a new column that identifies the region. There are 55 planning areas, which makes trends hard to detect. To reduce the dimensionality, an external data set was downloaded that maps each planning area to a region (e.g. Serangoon falls under the North East region).
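The cleaning, region mapping and aggregation steps in 2.1, 2.3 and 2.4 can be sketched roughly as follows. This is a minimal illustration rather than the code used for the assignment: the `pop` and `region_lookup` tables, their values, and the choice to drop whole planning areas whose total is zero are all assumptions made for the example.

```r
library(dplyr)

# Toy stand-in for the population table (values are made up for illustration).
pop <- tibble(
  PA  = c("Serangoon", "Serangoon", "Bedok", "Western Islands"),
  AG  = c("0_to_4", "5_to_9", "0_to_4", "0_to_4"),
  Pop = c(120, 150, 200, 0)
)

# Hypothetical lookup table mapping each planning area to a region.
region_lookup <- tibble(
  PA     = c("Serangoon", "Bedok", "Western Islands"),
  Region = c("North East", "East", "West")
)

# 2.1: drop planning areas whose total population is zero.
pop_clean <- pop %>%
  group_by(PA) %>%
  filter(sum(Pop) > 0) %>%
  ungroup()

# 2.4: attach the region; 2.3: aggregate to the level needed for a chart.
pop_by_region <- pop_clean %>%
  left_join(region_lookup, by = "PA") %>%
  group_by(Region) %>%
  summarise(Total_Pop = sum(Pop), .groups = "drop")

pop_by_region
```

Keeping the lookup in its own table means the planning-area-to-region mapping can be corrected in one place without touching the population data.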
The following images show the proposed design of the visualizations, which will be built using ggplot.
First, we load the necessary packages: tidyverse, CGPfunctions and plotly.
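The loading step is not reproduced in this write-up, so the following is only a sketch of what it could look like. The package names come from the text above; the check for missing packages and the names `pkgs` and `not_installed` are additions for illustration.

```r
# Packages named in the text; CGPfunctions provides newggslopegraph().
pkgs <- c("tidyverse", "CGPfunctions", "plotly")

# Report anything that is not installed instead of failing part-way through.
not_installed <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(not_installed) > 0) {
  message("Not installed: ", paste(not_installed, collapse = ", "))
}

# Attach the packages that are available.
for (pkg in setdiff(pkgs, not_installed)) {
  library(pkg, character.only = TRUE)
}
```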
As noted earlier, a reusable block of formatting code is defined for visual consistency, as shown below:
Formatting <- list(
  theme_bw(),
  theme(panel.grid.major.x = element_blank()),
  theme(axis.text.x.top = element_text(size = 13)),
  theme(plot.title = element_text(size = 14, face = "bold", hjust = 0.5)),
  theme(plot.subtitle = element_text(hjust = 0.5))
)
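The reason a plain list works here is that ggplot2 lets a list of components be added to a plot with `+`, applying each element in turn. A small sketch on the built-in mtcars data shows the pattern; `demo_format`, `p` and `p_tweaked` are illustrative names and not part of the assignment.

```r
library(ggplot2)

# A list of theme components, in the same spirit as Formatting above.
demo_format <- list(
  theme_bw(),
  theme(plot.title = element_text(face = "bold", hjust = 0.5))
)

# Adding the list applies each of its elements in order.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "Demo") +
  demo_format

# Per-plot tweaks go after the list so that they are not overwritten.
p_tweaked <- p + theme(plot.title = element_text(size = 15))
```

Because the list contains a complete theme (theme_bw()), any theme() call placed before it in a chain is replaced, which is why chart-specific adjustments are best added after it.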
Next, we import the Singapore population and demographic data from the saved CSV file.
setwd('C:\\Users\\Santosh Maruwada\\Documents\\SMU Study materials\\Term 3\\Visual Analytics\\Assignment\\Assignment 4')
raw_data <- read_csv("respopagesextod2011to2019.csv")
## Parsed with column specification:
## cols(
## PA = col_character(),
## SZ = col_character(),
## AG = col_character(),
## Sex = col_character(),
## TOD = col_character(),
## Pop = col_double(),
## Time = col_double(),
## Region = col_character()
## )
This visualization helps us understand how Singapore's population is distributed across age groups, split by sex, for the year 2019. The first step is to prepare the data: Time is converted into a categorical (character) variable and the year 2019 is selected with the filter function. The data is then grouped by age group and sex, and the population is summed within each group.
pyramid <- raw_data %>%
  mutate(Time = as.character(Time)) %>%
  filter(Time == "2019") %>%
  select(AG, Sex, Pop) %>%
  group_by(AG, Sex) %>%
  summarise(Total_Pop = sum(Pop))
To create the pyramid, we divide the total population by 1,000 for readability and rename the 5_to_9 age group to 05_to_09 so that it no longer sorts after the two-digit age groups on the axis. An ifelse statement sends the male population to the negative side of the axis and the female population to the positive side, and coord_flip() makes the bars horizontal. The legend shows the colour assigned to each sex, and the labels show the male and female population (in thousands) for each age group.
pyramid$Total_Pop <- with(pyramid, ifelse(Sex == "Males", -Total_Pop/1000, Total_Pop/1000))
pyramid$AG <- str_replace(as.character(pyramid$AG), "5_to_9", "05_to_09")
pyramid1 <- ggplot(pyramid, aes(x = AG, y = Total_Pop, fill = Sex)) +
  geom_bar(stat = "identity") +
  # symmetric axis limits based on the largest bar on either side
  scale_y_continuous(labels = abs, limits = max(abs(pyramid$Total_Pop)) * c(-1, 1) * 1.1) +
  scale_fill_manual(values = c("#FF66B2", "#FF9933")) + coord_flip() +
  labs(title = "Singapore Population Age-Sex Pyramid - 2019", x = "Age Group", y = "Population (in 000s)") +
  Formatting +
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    plot.title = element_text(hjust = 0.5, size = 15)
  ) +
  geom_text(aes(label = abs(Total_Pop), hjust = ifelse(Sex == "Males", 1.1, -0.1)))
pyramid1
a<-ggplotly(pyramid1,session="knitr")
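Renaming 5_to_9 relies on text sorting, and depending on the collation in use 05_to_09 may still be placed ahead of 0_to_4. One alternative, sketched here on a few illustrative labels rather than on the assignment data, is to order the age groups by their leading number and store that order as factor levels, which ggplot then follows. The names `ag`, `lower_bound` and `ag_ordered` are made up for the example.

```r
# Illustrative age-group labels following the pattern of the AG column.
ag <- c("10_to_14", "0_to_4", "5_to_9", "90_and_over", "15_to_19")

# Take the leading number of each label and use it to order the levels.
lower_bound <- as.integer(sub("^([0-9]+).*$", "\\1", ag))
ag_ordered <- factor(ag, levels = unique(ag[order(lower_bound)]))

levels(ag_ordered)
# "0_to_4" "5_to_9" "10_to_14" "15_to_19" "90_and_over"
```

With this approach the labels on the axis keep their original spelling, and no age group needs to be renamed.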
To understand the distribution of housing types in Singapore, a bar chart (named histogram1 in the code) is created with ggplot by plotting the mean of the Pop column for each type of dwelling (TOD). The visualization shows which housing types most of Singapore's population lives in.
histogram1 <- ggplot(data = raw_data, aes(x = reorder(TOD, -Pop), y = Pop)) +
  stat_summary(geom = "bar", color = "black", fun = mean, fill = "#FF9933") +
  labs(title = "Singapore's Mean Housing Distribution, 2019", x = "Property Type", y = "Mean number of units") +
  theme_minimal() + Formatting
histogram1
b<-ggplotly(histogram1,session="knitr")
To understand how Singapore's population has grown for each sex, a scatterplot is used to plot the population figures over the 2011-2019 period. First, Time is converted to character; the relevant columns are then selected, grouped by year and sex, and summed into a new data frame.
scatterp <- raw_data %>%
  mutate(Time = as.character(Time)) %>%
  select(Time, Sex, Pop) %>%
  group_by(Time, Sex) %>%
  summarise(Total_Pop = sum(Pop))
Once the data frame is created, the ggplot function is used to draw a scatterplot showing the growth in population for each sex.
scatterp1 <- ggplot(scatterp, aes(x = Time, y = Total_Pop/1000, color = Sex)) +
  geom_point(position = "jitter", size = 3) +
  labs(title = "Singapore Residents Population, 2011-2019", x = "Year", y = "Population (in 000s)") +
  theme(plot.title = element_text(hjust = 0.5)) + scale_color_manual(values = c("#FF66B2", "#FF9933")) + Formatting
scatterp1
c<-ggplotly(scatterp1,session="knitr")
Understanding Singapore Population Density across Regions using a Heat Map
A heat map shows how the population is distributed across regions by age group in the year 2019. Each tile represents the population for one combination of age group and region. The geom_tile() function is used to create the visualization. First, Time is converted into character, the year 2019 is selected, and the columns not needed for the visualization are dropped.
heatmap <- raw_data %>%
  mutate(Time = as.character(Time)) %>%
  filter(Time == "2019") %>%
  select(-SZ, -Time, -Sex, -TOD)
heatmap1 <- aggregate(Pop ~ Region + AG, data = heatmap, FUN = sum)
heatmap1$AG <- str_replace(as.character(heatmap1$AG), "5_to_9", "05_to_09")
Using the geom_tile() function, the following visualization is obtained.
heatmapviz1 <- ggplot(heatmap1, aes(AG, Region, fill = Pop)) +
  geom_tile(position = "identity", stat = "identity") + Formatting + labs(title = "Singapore Population Density - 2019", x = "Age Group", y = "Region") +
  theme(axis.text.x = element_text(angle = 90)) +
  theme(axis.text.y = element_text(size = 8))
heatmapviz1
d<-ggplotly(heatmapviz1,session="knitr")
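The heat map data is summed with base R's aggregate() and a formula, whereas the earlier charts use dplyr's group_by() and summarise(). The two approaches give the same totals, as this small sketch on made-up numbers suggests; the `toy` data frame and its values are purely illustrative.

```r
library(dplyr)

# Toy data: two rows share the same Region and AG and should be summed together.
toy <- data.frame(
  Region = c("East", "East", "North East", "North East"),
  AG     = c("0_to_4", "0_to_4", "0_to_4", "5_to_9"),
  Pop    = c(10, 20, 5, 7)
)

# Base R: formula interface, one output row per Region/AG combination.
by_formula <- aggregate(Pop ~ Region + AG, data = toy, FUN = sum)

# dplyr equivalent of the same aggregation.
by_dplyr <- toy %>%
  group_by(Region, AG) %>%
  summarise(Pop = sum(Pop), .groups = "drop")
```

Either form yields one row per tile of the heat map, so the choice is mostly a matter of keeping the style consistent within the report.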
Next, the same geom_tile() approach is used to map the distribution of housing types by region. The data is prepared as follows.
heatmap <- raw_data %>%
  mutate(Time = as.character(Time)) %>%
  filter(Time == "2019") %>%
  select(-SZ, -Time, -Sex)
heatmap2 <- aggregate(Pop ~ Region + TOD, data = heatmap, FUN = sum)
Using the geom_tile() function, the following visualization is obtained.
heatmapviz2<- ggplot(heatmap2,aes(TOD,Region,fill=Pop))+
geom_tile(position = "identity",stat = "identity")+Formatting+labs(title = "Housing Type Density - 2019",x="Housing Type",y="Region")+
theme(axis.text.x = element_text(angle = 90))+
theme(axis.text.y = element_text(size=8))
heatmapviz2
e<-ggplotly(heatmapviz2,session="knitr")
The slope chart shows how the population of each region has changed, allowing us to see where Singapore residents have preferred to live over the 2011-2019 period. The slope graph is produced with the newggslopegraph function from the CGPfunctions library. First, the data is prepared by converting Time to character, selecting the required columns and creating a data frame. Only every second year from 2011 to 2019 is kept, to make the trends easier to see.
popslope <- raw_data %>%
  mutate(Time = as.character(Time)) %>%
  select(Time, Pop, Region) %>%
  filter(Time %in% c("2011", "2013", "2015", "2017", "2019")) %>%
  group_by(Region, Time) %>%
  summarise(Pop = sum(Pop))
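The list of alternating years in the filter above could also be generated instead of typed out, which makes it easier to change the interval later. This is only a small sketch, and `slope_years` is an illustrative name.

```r
# Every second year from 2011 to 2019, as character to match the converted Time column.
slope_years <- as.character(seq(2011, 2019, by = 2))
slope_years
# "2011" "2013" "2015" "2017" "2019"
```

The resulting vector can be used on the right-hand side of %in% in the filter step.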
The newggslopegraph function is then used to obtain the slope graph shown below.
slopegraph1 <- newggslopegraph(popslope, Time, Pop, Region) +
  labs(title = "Singapore Resident Population by Region, 2011-2019", subtitle = "", caption = "", y = "Population") +
  theme(plot.title = element_text(hjust = 0.5, size = 15))
slopegraph1
The visualizations elaborated above are summarized here:
The following insights were drawn from the above visualizations:
The age-sex pyramid supports the view that Singapore has an ageing population, with the majority of the population concentrated in the 25-65 age range. This indicates that the birth rate is declining, which can probably be attributed to rising living costs and educational expenses.
The heat maps show the age group distribution by region. The North East region holds the majority of the lower-middle-aged population, and the heat map of housing type by region shows that the majority of HDB flats are in the North East region. This may indicate that the government is providing housing in this region for the economically active population.
The slope graph indicates an influx of residents settling in the North East region. Read together with the heat map, it suggests that this region is mainly made up of the middle-aged, economically active population (i.e. 30-50 years old). This points to an expansion of housing projects in the region, probably a government initiative to steer the growing resident population towards less occupied regions. Because Singapore has limited land, efficient use of resources and space is vital.