The purpose of Assignment 4 is to provide a better understanding of the demographic structure of Singapore. The data used in this assignment can be obtained from the Department of Statistics: https://www.singstat.gov.sg/find-data/search-by-theme/population/geographic-distribution/latest-data.
There will be four parts to this visualization.
An age-sex pyramid will be used to analyze the age-sex structure of the total population of Singapore as at 2019, as well as that of the four main racial groups in Singapore (Chinese, Malays, Indians and Others).
Filled bar graphs will be used to display the proportion of the three population categories (young, economically active and aged) within the various planning areas and regions in Singapore.
Bar graphs will be used to display how the young, economically active and aged populations have changed from 2011 to 2019 on a year-to-year basis by region.
A dumbbell plot will be used to display how the dependency ratio has changed from 2011 to 2019 across regions as well as planning areas.
The age group split for the young, economically active and aged is as follows: (i) Young: 0 years old to 24 years old (ii) Economically Active: 25 years old to 64 years old (iii) Aged: 65 years old and above
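The split above can be sketched in base R as follows. This is a minimal illustration rather than the code used to prepare the datasets; the helper name `age_category` is hypothetical, and it works on ages in years instead of the five-year brackets found in the raw data.

```r
# Hypothetical helper: map an age in years to one of the three categories.
# Cut points follow the split stated above (0-24, 25-64, 65 and above).
age_category <- function(age) {
  cut(
    age,
    breaks = c(0, 25, 65, Inf),
    labels = c("Young", "Economically Active", "Aged"),
    right = FALSE  # intervals are [0, 25), [25, 65), [65, Inf)
  )
}

as.character(age_category(c(3, 24, 25, 64, 65, 90)))
# "Young" "Young" "Economically Active" "Economically Active" "Aged" "Aged"
```

Because every five-year bracket falls entirely inside one category, the same cut points apply whether the input is an exact age or the lower bound of a bracket.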
Below is the list of libraries used in this visualization assignment.
library(ggpubr)
## Loading required package: ggplot2
library(ggplot2)
library(readr)
library(ggalt)
## Registered S3 methods overwritten by 'ggalt':
## method from
## grid.draw.absoluteGrob ggplot2
## grobHeight.absoluteGrob ggplot2
## grobWidth.absoluteGrob ggplot2
## grobX.absoluteGrob ggplot2
## grobY.absoluteGrob ggplot2
library(ggcorrplot)
library(ggthemes)
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(purrr)
library(readr)
library(readxl)
library(stringr)
library(tibble)
library(tidyr)
library(tidyverse)
## -- Attaching packages ----------------------------------------------------------------------------------------------------------------------------------------- tidyverse 1.3.0 --
## v dplyr 0.8.3 v forcats 0.4.0
## -- Conflicts -------------------------------------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks plotly::filter(), stats::filter()
## x dplyr::lag() masks stats::lag()
library(viridis)
## Loading required package: viridisLite
library(viridisLite)
The raw data was downloaded from the Department of Statistics (https://www.singstat.gov.sg/find-data/search-by-theme/population/geographic-distribution/latest-data) and transformed into five datasets. New columns such as Age_Groups_Category, Category, Category1, Percent, Dependency_Ratio_2011 and Dependency_Ratio_2019 were created either through new calculations or by recoding existing columns.
Five main datasets were used: “AgeGenderRace2”, “Datacombined3”, “Datacombined2”, “DependencyRatioByRegion” and “DependencyRatio2”.
A brief description of each dataset is as follows:
(a) AgeGenderRace2
The columns of the dataset are as follows: Age_Groups, Age_Groups_Category, Gender, Category, Category1, Chinese, Malays, Indians, Others, Total. The data pertains to 2019.
The description of each column is as follows:
Age_Groups are the various age brackets (e.g. 0 to 4 years old, 5 to 9 years old, etc.)
Age_Groups_Category is the category number for each age bracket (for example, the age bracket 0 to 4 years old falls under Age_Groups_Category 1)
Gender indicates whether the row refers to males or females
Category indicates whether the age bracket falls under Young, Economically Active or Aged, as defined in the introduction
Category1 is the category number for Category (for example, Young is labelled as 1)
The columns Chinese, Malays, Indians, Others and Total hold the population counts
(b) Datacombined3
The columns of the dataset are as follows: Planning_Area, Region, Category1, Category, Year, Count, Total, Percent
The description of each column is as follows:
Planning_Area are the various areas in Singapore drawn up by the Urban Redevelopment Authority (e.g. Ang Mo Kio, Yishun, etc.)
Region are the five regions in Singapore: North-East, East, West, Central and North. The mapping of planning areas to regions was obtained from https://en.wikipedia.org/wiki/Planning_Areas_of_Singapore.
Category indicates whether the age bracket falls under Young, Economically Active or Aged, as defined in the introduction
Category1 is the category number for Category (for example, Young is labelled as 1)
Year represents the year, which in this dataset is always 2019
Count is the count for a particular planning area and a specific category (e.g. the count of the economically active in Ang Mo Kio)
Total is the total population of that planning area
Percent is a derived column calculated as Count / Total
(c) Datacombined2
The columns of the dataset are as follows: Planning_Area, Region, Category1, Category, Year, Count
The description of each column is as follows:
Planning_Area are the various areas in Singapore drawn up by the Urban Redevelopment Authority (e.g. Ang Mo Kio, Yishun, etc.)
Region are the five regions in Singapore: North-East, East, West, Central and North. The mapping of planning areas to regions was obtained from https://en.wikipedia.org/wiki/Planning_Areas_of_Singapore.
Category indicates whether the age bracket falls under Young, Economically Active or Aged, as defined in the introduction
Category1 is the category number for Category (for example, Young is labelled as 1)
Year represents the year, which ranges from 2011 to 2019
Count is the count for a particular planning area, a specific category and a specific year (e.g. the count of the economically active in Ang Mo Kio in 2019)
(d) DependencyRatio2
The columns of the dataset are as follows: Planning_Area, Region, Year_2011, Young_2011, Aged_2011, Economically_Active_2011, Dependency_Ratio_2011, Planning_Area_2019, Year_20119, Young_2019, Aged_2019, Economically_Active_2019, Dependency_Ratio_2019
The description of each column is as follows:
Planning_Area are the various areas in Singapore drawn up by the Urban Redevelopment Authority (e.g. Ang Mo Kio, Yishun, etc.). Planning_Area_2019 is the same as Planning_Area
Region are the five regions in Singapore: North-East, East, West, Central and North. The mapping of planning areas to regions was obtained from https://en.wikipedia.org/wiki/Planning_Areas_of_Singapore.
Year_2011 and Year_20119 are columns that hold the years 2011 and 2019 respectively
Young_2011, Aged_2011 and Economically_Active_2011 represent the total count under each category in 2011 for each planning area
Young_2019, Aged_2019 and Economically_Active_2019 represent the total count under each category in 2019 for each planning area
Dependency_Ratio_2011 and Dependency_Ratio_2019 represent the dependency ratio in 2011 and 2019 respectively for each planning area. Dependency_Ratio_2011 is calculated as ((Young_2011 + Aged_2011) / Economically_Active_2011) × 100. Dependency_Ratio_2019 is calculated as ((Young_2019 + Aged_2019) / Economically_Active_2019) × 100
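The dependency ratio formula above can be written as a small function. This is a sketch only: the function name `dependency_ratio` is hypothetical and the counts in the example are made up for illustration, not taken from the data.

```r
# Hypothetical helper implementing the formula described above:
# ((Young + Aged) / Economically Active) x 100
dependency_ratio <- function(young, aged, economically_active) {
  (young + aged) / economically_active * 100
}

# Made-up counts for a single planning area, for illustration only.
dependency_ratio(young = 30000, aged = 15000, economically_active = 75000)
# 60
```

A ratio of 60 would mean that there are 60 young and aged residents for every 100 economically active residents.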
(e) DependencyRatioByRegion
The columns of the dataset are as follows: Region, Average_Dependency_Ratio_2011, Average_Dependency_Ratio_2019
The description of each column is as follows:
Region are the five regions in Singapore: North-East, East, West, Central and North. The mapping of planning areas to regions was obtained from https://en.wikipedia.org/wiki/Planning_Areas_of_Singapore.
Average_Dependency_Ratio_2011 and Average_Dependency_Ratio_2019 are the averages of Dependency_Ratio_2011 and Dependency_Ratio_2019 respectively, by region
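One way to derive such regional averages from planning-area ratios is sketched below with base R's `aggregate`. The data frame `ratios` and its values are made up for illustration; only the column names follow the datasets described above.

```r
# Made-up planning-area ratios, for illustration only.
ratios <- data.frame(
  Region = c("North", "North", "West", "West"),
  Dependency_Ratio_2011 = c(60, 64, 58, 62),
  Dependency_Ratio_2019 = c(61, 67, 60, 66)
)

# Mean of each ratio column within each region.
by_region <- aggregate(
  cbind(Dependency_Ratio_2011, Dependency_Ratio_2019) ~ Region,
  data = ratios,
  FUN = mean
)
names(by_region)[-1] <- c("Average_Dependency_Ratio_2011",
                          "Average_Dependency_Ratio_2019")
by_region
```

Note that this is an unweighted mean of the planning-area ratios, so a small planning area counts as much as a large one. It will generally differ from a ratio computed on regional population totals.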
The last step of data wrangling, applied to all five datasets, was recoding the age groups 0 - 4, 5 - 9, 10 - 14, 15 - 19, 20 - 24, 25 - 29, 30 - 34, 35 - 39, 40 - 44, 45 - 49, 50 - 54, 55 - 59, 60 - 64, 65 - 69, 70 - 74, 75 - 79, 80 - 84 and 85 & Over to 0_to_4, 05_to_9, 10_to_14, 15_to_19, 20_to_24, 25_to_29, 30_to_34, 35_to_39, 40_to_44, 45_to_49, 50_to_54, 55_to_59, 60_to_64, 65_to_69, 70_to_74, 75_to_79, 80_to_84 and 85_and_Over respectively.
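A recoding like the one above could be done with a named lookup vector, as in the sketch below. The name `recode_map` is hypothetical, only four of the eighteen brackets are shown, and this is not necessarily how the recoding was carried out for the assignment.

```r
# Lookup from raw labels to recoded labels, copied from the list above
# (only a few of the 18 brackets are shown here).
recode_map <- c(
  "0 - 4"     = "0_to_4",
  "5 - 9"     = "05_to_9",
  "10 - 14"   = "10_to_14",
  "85 & Over" = "85_and_Over"
)

raw <- c("5 - 9", "85 & Over", "0 - 4", "10 - 14")

# Index the lookup by the raw labels to get the recoded labels.
recoded <- unname(recode_map[raw])
recoded
# "05_to_9" "85_and_Over" "0_to_4" "10_to_14"
```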
library(readr)
setwd("C:/Users/User/Desktop/ASSIGNMENT 4")
AgeGenderRace2 <- read_csv('AgeGenderRace2.csv')
## Parsed with column specification:
## cols(
## Age_Groups = col_character(),
## Age_Groups_Category = col_double(),
## Gender = col_character(),
## Category = col_character(),
## Category1 = col_double(),
## Chinese = col_number(),
## Malays = col_number(),
## Indians = col_number(),
## Others = col_number(),
## Total = col_number()
## )
setwd("C:/Users/User/Desktop/ASSIGNMENT 4")
Datacombined3 <- read_csv('Datacombined3.csv')
## Parsed with column specification:
## cols(
## Planning_Area = col_character(),
## Region = col_character(),
## Category1 = col_double(),
## Category = col_character(),
## Year = col_double(),
## Count = col_double(),
## Total = col_double(),
## Percent = col_double()
## )
setwd("C:/Users/User/Desktop/ASSIGNMENT 4")
Datacombined2 <- read_csv('Datacombined2.csv')
## Parsed with column specification:
## cols(
## Planning_Area = col_character(),
## Region = col_character(),
## Category1 = col_double(),
## Category = col_character(),
## Year = col_double(),
## Count = col_double()
## )
setwd("C:/Users/User/Desktop/ASSIGNMENT 4")
DependencyRatioByRegion <- read_csv('DependencyRatioByRegion.csv')
## Parsed with column specification:
## cols(
## Region = col_character(),
## Average_Dependency_Ratio_2011 = col_double(),
## Average_Dependency_Ratio_2019 = col_double()
## )
setwd("C:/Users/User/Desktop/ASSIGNMENT 4")
DependencyRatio2 <- read_csv('DependencyRatio2.csv')
## Parsed with column specification:
## cols(
## Planning_Area = col_character(),
## Region = col_character(),
## Year_2011 = col_double(),
## Young_2011 = col_double(),
## Aged_2011 = col_double(),
## Economically_Active_2011 = col_double(),
## Dependency_Ratio_2011 = col_double(),
## Planning_Area_2019 = col_character(),
## Year_20119 = col_double(),
## Young_2019 = col_double(),
## Aged_2019 = col_double(),
## Economically_Active_2019 = col_double(),
## Dependency_Ratio_2019 = col_double()
## )
There are many planning areas in Singapore. While it is good to understand the demographic structure of each planning area, it is also useful to get an overview of the demographic structure by region. However, the raw data does not map planning areas to regions. Therefore, the mapping was obtained from https://en.wikipedia.org/wiki/Planning_Areas_of_Singapore and combined with the raw dataset.
There are many age groups in the raw data. For neighbouring age groups such as 0 to 4 and 5 to 9, there is no significant variation between the groups. Therefore, the age groups were recoded into three broader categories (young, economically active and aged) for more useful analysis.
Choosing an appropriate visualization to show age and sex simultaneously. An age-sex population pyramid can be used for this.
Finding a useful and concise way to visualize and analyze the change in demographic structure from 2011 to 2019 by planning area and region. Analyzing the change using either age groups or age categories may present too much information at once. The dependency ratio is more feasible because it takes all age groups into account in a single figure. Therefore, the dependency ratio was calculated for 2011 and 2019 and visualized using a dumbbell plot.
Finding a useful and concise way to visualize and analyze the demographic structure in 2019 by planning area and region. A filled bar plot can be used to compare the proportions of the young, economically active and aged across planning areas and regions.
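The first challenge above, attaching a region to each planning area, amounts to a left join against a lookup table. The sketch below uses base R's `merge`; the data frames `population` and `region_lookup` and the counts are made up for illustration, and the lookup shows only three planning areas.

```r
# Made-up rows standing in for the raw data, which has no Region column.
population <- data.frame(
  Planning_Area = c("Ang Mo Kio", "Yishun", "Tampines"),
  Count = c(100, 200, 300)
)

# Lookup table of planning area to region (a small hand-built excerpt).
region_lookup <- data.frame(
  Planning_Area = c("Ang Mo Kio", "Tampines", "Yishun"),
  Region = c("North-East", "East", "North")
)

# Left join: keep every population row and attach its region.
combined <- merge(population, region_lookup, by = "Planning_Area", all.x = TRUE)
combined
```

With `all.x = TRUE`, a planning area that is missing from the lookup is kept with `NA` as its region, which makes gaps in the mapping easy to spot.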
In order to plot an age-sex pyramid, the female population has to be changed to negative values so that the bars for the two sexes do not overlap. The function coord_flip is also needed to turn the vertical bar graph into a horizontal one.
From the age-sex pyramid, it can be seen that the base of the pyramid is narrow and that it subsequently gets wider, which indicates that Singapore has a higher proportion of older people than of youth. The highest proportion of the population is aged between 40 and 50 years old, while the lowest proportion is aged between 0 and 29 years old.
# The female population has to be changed to negative values
Total1 <- AgeGenderRace2 %>%
  select(Age_Groups, Age_Groups_Category, Gender, Category, Category1, Total)
Total1$Total <- ifelse(Total1$Gender == "Females", -1 * Total1$Total, Total1$Total)
# Plotting the Age-Sex Pyramid
total_cohort <- ggplot(Total1, aes(x = Age_Groups, y = Total, fill = Gender)) +
  geom_bar(data = subset(Total1, Gender == "Females"), stat = "identity") +
  geom_bar(data = subset(Total1, Gender == "Males"), stat = "identity") +
  scale_y_continuous(
    breaks = seq(-150000, 150000, 50000),
    labels = paste0(as.character(c(seq(150, 0, -50), seq(50, 150, 50))))
  ) +
  coord_flip()
# Beautification of the Age-Sex Pyramid
total_cohort +
  ggtitle("2019 Singapore Total Population Age Sex Pyramid") +
  xlab("Age Group") +
  ylab("Population in Thousands") +
  scale_fill_manual(values = c('lightpink2', 'steelblue3')) +
  theme(legend.position = 'right')
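The axis labels in the plot above are typed out by hand so that both sides of the pyramid read as positive thousands. As an alternative sketch, the same labels can be derived from the breaks themselves; the variable names `pyramid_breaks` and `pyramid_labels` are hypothetical.

```r
# Same breaks as in the plot above.
pyramid_breaks <- seq(-150000, 150000, 50000)

# Labels in thousands, with the sign dropped so both sides read as positive.
pyramid_labels <- as.character(abs(pyramid_breaks) / 1000)
pyramid_labels
# "150" "100" "50" "0" "50" "100" "150"
```

Deriving the labels this way keeps them in step with the breaks if the axis range is changed later.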
Before the four age-sex pyramids can be plotted, the column for each race has to be extracted from the data using the select function. The female population is then changed to negative values so that the bars for the two sexes do not overlap. After that, the ggarrange function is used to arrange the four age-sex pyramids in a grid.
When comparing the four age-sex pyramids by race, it can be seen that the Chinese, Indians and Others have a smaller base than the Malays, which means that among all four races, the Malays have a higher proportion of younger people aged between 0 and 29 years old. For ages 60 years old and above, the Chinese have the highest proportion among all four races, followed by Malays, Indians and Others. For ages between 30 and 59 years old, the Chinese have the largest proportion of their population in these age brackets, followed by Indians, Malays and Others.
# Filtering out each race and changing females population to negative values
Chinese1 <- AgeGenderRace2 %>% select (`Age_Groups`,`Age_Groups_Category`, `Gender`,`Category`, `Category1`,`Chinese` )
Chinese1$Chinese <- ifelse(Chinese1$Gender == "Females",-1*Chinese1$Chinese,Chinese1$Chinese)
Malays1 <- AgeGenderRace2 %>% select (`Age_Groups`,`Age_Groups_Category`, `Gender`,`Category`, `Category1`,`Malays` )
Malays1$Malays <- ifelse(Malays1$Gender == "Females",-1*Malays1$Malays,Malays1$Malays)
Indians1 <- AgeGenderRace2 %>% select (`Age_Groups`,`Age_Groups_Category`, `Gender`,`Category`, `Category1`,`Indians` )
Indians1$Indians <- ifelse(Indians1$Gender == "Females",-1*Indians1$Indians ,Indians1$Indians )
Others1 <- AgeGenderRace2 %>% select (`Age_Groups`,`Age_Groups_Category`, `Gender`,`Category`, `Category1`,`Others` )
Others1$Others <- ifelse(Others1$Gender == "Females",-1*Others1$Others ,Others1$Others )
# Plotting the age-sex pyramids and arranging it
ggarrange(
  # Chinese
  ggplot(Chinese1, aes(x = Age_Groups, y = Chinese, fill = Gender)) +
    geom_bar(data = subset(Chinese1, Gender == "Females"), stat = "identity") +
    geom_bar(data = subset(Chinese1, Gender == "Males"), stat = "identity") +
    scale_y_continuous(breaks = seq(-150000, 150000, 50000),
                       labels = paste0(as.character(c(seq(150, 0, -50), seq(50, 150, 50))))) +
    coord_flip() +
    ggtitle("Singapore Chinese Population Pyramid by Age Cohort, 2019") +
    xlab("Age Group") +
    ylab("Population in Thousands") +
    scale_fill_manual(values = c('lightpink2', 'steelblue3')) +
    theme(legend.position = 'right'),
  # Malays
  ggplot(Malays1, aes(x = Age_Groups, y = Malays, fill = Gender)) +
    geom_bar(data = subset(Malays1, Gender == "Females"), stat = "identity") +
    geom_bar(data = subset(Malays1, Gender == "Males"), stat = "identity") +
    scale_y_continuous(breaks = seq(-150000, 150000, 50000),
                       labels = paste0(as.character(c(seq(150, 0, -50), seq(50, 150, 50))))) +
    coord_flip() +
    ggtitle("Singapore Malay Population Pyramid by Age Cohort, 2019") +
    xlab("Age Group") +
    ylab("Population in Thousands") +
    scale_fill_manual(values = c('lightpink2', 'steelblue3')) +
    theme(legend.position = 'right'),
  # Indians
  ggplot(Indians1, aes(x = Age_Groups, y = Indians, fill = Gender)) +
    geom_bar(data = subset(Indians1, Gender == "Females"), stat = "identity") +
    geom_bar(data = subset(Indians1, Gender == "Males"), stat = "identity") +
    scale_y_continuous(breaks = seq(-150000, 150000, 50000),
                       labels = paste0(as.character(c(seq(150, 0, -50), seq(50, 150, 50))))) +
    coord_flip() +
    ggtitle("Singapore Indian Population Pyramid by Age Cohort, 2019") +
    xlab("Age Group") +
    ylab("Population in Thousands") +
    scale_fill_manual(values = c('lightpink2', 'steelblue3')) +
    theme(legend.position = 'right'),
  # Others (Others1 is the base data, matching the layers below)
  ggplot(Others1, aes(x = Age_Groups, y = Others, fill = Gender)) +
    geom_bar(data = subset(Others1, Gender == "Females"), stat = "identity") +
    geom_bar(data = subset(Others1, Gender == "Males"), stat = "identity") +
    scale_y_continuous(breaks = seq(-150000, 150000, 50000),
                       labels = paste0(as.character(c(seq(150, 0, -50), seq(50, 150, 50))))) +
    coord_flip() +
    ggtitle("Singapore Others Population Pyramid by Age Cohort, 2019") +
    xlab("Age Group") +
    ylab("Population in Thousands") +
    scale_fill_manual(values = c('lightpink2', 'steelblue3')) +
    theme(legend.position = 'right'),
  ncol = 2, nrow = 2
)
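The sign flip for females is repeated once per race in the code above. As a sketch of how that repetition could be factored out, the helper below negates any one count column; `negate_females` is a hypothetical name and the counts are made up for illustration.

```r
# Hypothetical helper: return the data with one count column negated for females.
negate_females <- function(df, count_col) {
  is_female <- df$Gender == "Females"
  df[[count_col]] <- ifelse(is_female, -df[[count_col]], df[[count_col]])
  df
}

# Made-up counts, for illustration only.
demo <- data.frame(
  Gender = c("Males", "Females"),
  Chinese = c(120, 130),
  Malays = c(20, 25)
)

# Only the Malays column changes sign for the Females row.
negate_females(demo, "Malays")
```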
A filled bar graph is used to show the proportion of young, economically active and aged in each region. The geom_bar function was used to plot this, with coord_flip to make the bars horizontal.
From the graph below, it can be seen that the proportions of the young, economically active and aged are similar across all 5 regions, except that the Central region has a higher proportion of aged and a lower proportion of young.
# Create breaks to indicate the x-axis indicators
breaks <- c(0, 0.25, 0.5, 0.75, 1)
breaks
## [1] 0.00 0.25 0.50 0.75 1.00
# Plotting filled bar graph
ggplot(Datacombined3, aes(fill = Category, y = Percent, x = Region)) +
  geom_bar(position = "fill", stat = "identity") +
  scale_y_continuous(breaks = breaks, labels = scales::percent(breaks)) +
  labs(x = "Region", y = "% of Population") +
  geom_vline(xintercept = 0.5) +
  coord_flip() +
  ggtitle("Proportion of the young, economically active and aged by region in 2019") +
  # theme_minimal() is a complete theme, so it goes before theme();
  # applied afterwards it would reset the title and tick settings
  theme_minimal(base_size = 10) +
  theme(plot.title = element_text(hjust = .5), axis.ticks = element_blank()) +
  scale_fill_manual(values = c("deepskyblue4", "goldenrod4", "seagreen4"))
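To make the effect of position = "fill" concrete, the sketch below computes the same kind of within-region shares by hand in base R. The data frame `counts` and its values are made up for illustration.

```r
# Made-up counts for two regions, for illustration only.
counts <- data.frame(
  Region = c("North", "North", "North", "West", "West", "West"),
  Category = rep(c("Aged", "Economically Active", "Young"), times = 2),
  Count = c(10, 60, 30, 20, 50, 30)
)

# Share of each category within its region. Each region's shares sum to 1,
# which is the rescaling that position = "fill" applies to stacked bars.
counts$Share <- counts$Count / ave(counts$Count, counts$Region, FUN = sum)
counts
```

Because every bar is rescaled to the same length, a filled bar graph compares composition only and hides differences in total population between regions.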
A filled bar graph is used to show the proportion of young, economically active and aged in each planning area. The geom_bar function was used to plot this, with coord_flip to make the bars horizontal.
It can be seen that areas such as Western Water Catchment, Southern Islands, Seletar, Sengkang, Sembawang, Punggol, Lim Chu Kang and Changi have a higher proportion of young. Planning areas such as Toa Payoh, Sungei Kadut, Serangoon, Rochor, Queenstown, Outram, Marine Parade, Kallang, Jurong East, Geylang, Clementi and Bukit Merah have a higher proportion of aged. Finally, areas such as Southern Islands, Singapore River, Downtown Core and Choa Chu Kang have a higher proportion of economically active.
# Create breaks to indicate the x-axis indicators
breaks <- c(0, 0.25, 0.5, 0.75, 1)
breaks
## [1] 0.00 0.25 0.50 0.75 1.00
# Plotting filled bar graph
ggplot(Datacombined3, aes(fill = Category, y = Percent, x = Planning_Area)) +
  geom_bar(position = "fill", stat = "identity") +
  scale_y_continuous(breaks = breaks, labels = scales::percent(breaks)) +
  labs(x = "Planning Area", y = "% of Population") +
  geom_vline(xintercept = 0.5) +
  coord_flip() +
  ggtitle("Proportion of the young, economically active and aged by planning area in 2019") +
  # theme_minimal() is a complete theme, so it goes before theme();
  # applied afterwards it would reset the title and tick settings
  theme_minimal(base_size = 10) +
  theme(plot.title = element_text(hjust = .5), axis.ticks = element_blank()) +
  scale_fill_manual(values = c("deepskyblue4", "goldenrod4", "seagreen4"))
This part of the data visualization shows how the young, economically active and aged populations changed between 2011 and 2019 by region. The data was first split by region using the filter function. Subsequently, for each region, three bar graphs were plotted for the young, aged and economically active using the facet_grid function.
In the North-East region, the aged and economically active populations have been steadily increasing on a year-to-year basis. The young population increased from 2011 to 2016 but has stagnated since then.
In the East and Central regions, the aged population has been steadily increasing each year. However, the economically active and young populations have been decreasing over the years.
In the West, the aged population has increased over the years, while the young population has decreased and the economically active population has remained the same.
In the North, the aged and economically active populations have increased slightly but steadily over the years, while the young population has decreased slightly but steadily.
# Filtering the data by region using the filter function
NorthEast <- Datacombined2 %>% filter(`Region` == "North-East")
East <- Datacombined2 %>% filter(`Region` == "East")
Central <- Datacombined2 %>% filter(`Region` == "Central")
West <- Datacombined2 %>% filter(`Region` == "West")
North <- Datacombined2 %>% filter(`Region` == "North")
# Plotting bar graph for North East Region for the three population categories (aged, economically active, young)
T1C1NE <- ggplot(NorthEast, aes(Year, Count)) +
  geom_bar(stat = "identity", width = 0.5, fill = "orange3") +
  scale_x_continuous(name = "Category", breaks = 2011:2019, limits = c(2010, 2020)) +
  scale_y_continuous(name = "NE", breaks = seq(100000, 1600000, 100000), limits = c(0, 1600000)) +
  facet_grid(~Category)
T1C1NE
# Plotting bar graph for East Region for the three population categories (aged, economically active, young)
T1C1East <- ggplot(East, aes(Year, Count)) +
  geom_bar(stat = "identity", width = 0.5, fill = "orange3") +
  scale_x_continuous(name = "Category", breaks = 2011:2019, limits = c(2010, 2020)) +
  scale_y_continuous(name = "East", breaks = seq(100000, 1600000, 100000), limits = c(0, 1600000)) +
  facet_grid(~Category)
T1C1East
# Plotting bar graph for Central Region for the three population categories (aged, economically active, young)
T1C1Central <- ggplot(Central, aes(Year, Count)) +
  geom_bar(stat = "identity", width = 0.5, fill = "orange3") +
  scale_x_continuous(name = "Category", breaks = 2011:2019, limits = c(2010, 2020)) +
  scale_y_continuous(name = "Central", breaks = seq(100000, 1600000, 100000), limits = c(0, 1600000)) +
  facet_grid(~Category)
T1C1Central
# Plotting bar graph for West Region for the three population categories (aged, economically active, young)
T1C1West <- ggplot(West, aes(Year, Count)) +
  geom_bar(stat = "identity", width = 0.5, fill = "orange3") +
  scale_x_continuous(name = "Category", breaks = 2011:2019, limits = c(2010, 2020)) +
  scale_y_continuous(name = "West", breaks = seq(100000, 1600000, 100000), limits = c(0, 1600000)) +
  facet_grid(~Category)
T1C1West
# Plotting bar graph for North Region for the three population categories (aged, economically active, young)
T1C1North <- ggplot(North, aes(Year, Count)) +
  geom_bar(stat = "identity", width = 0.5, fill = "orange3") +
  scale_x_continuous(name = "Category", breaks = 2011:2019, limits = c(2010, 2020)) +
  scale_y_continuous(name = "North", breaks = seq(100000, 1600000, 100000), limits = c(0, 1600000)) +
  facet_grid(~Category)
T1C1North
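Each bar above stacks the counts of all planning areas in a region for one year and category, so its height is a regional total. The sketch below shows, in base R and on made-up rows, how the per-region split and those totals could be computed directly; the data frame `demo` only mimics the columns of Datacombined2.

```r
# Made-up rows with the same columns as Datacombined2, for illustration only.
demo <- data.frame(
  Planning_Area = c("A", "B", "A", "B", "C"),
  Region = c("North", "North", "North", "North", "West"),
  Category = c("Young", "Young", "Aged", "Aged", "Young"),
  Year = c(2011, 2011, 2011, 2011, 2011),
  Count = c(100, 150, 40, 60, 80)
)

# One data frame per region, held in a named list.
by_region <- split(demo, demo$Region)
names(by_region)

# Regional totals per category and year: the heights of the stacked bars.
totals <- aggregate(Count ~ Region + Category + Year, data = demo, FUN = sum)
totals
```

Keeping the regions in one list makes it possible to loop over them instead of writing one plotting block per region.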
A dumbbell plot was used for this visualization because it shows the exact dependency ratio in 2011 and in 2019, as well as how the ratio has moved between the two years, by region.
For the West, North-East and Central regions, the dependency ratio was higher in 2019 than in 2011 (with Central having the largest increase). However, for the North and East regions, the dependency ratio was lower in 2019 than in 2011.
# Dumbbell plot for dependency ratio between 2011 and 2019 by region
ggplot() +
  geom_segment(data = DependencyRatioByRegion,
               aes(y = Region, yend = Region, x = 0, xend = 75),
               color = "#b2b2b2", size = 0.15) +
  geom_dumbbell(data = DependencyRatioByRegion,
                aes(y = Region, x = Average_Dependency_Ratio_2011, xend = Average_Dependency_Ratio_2019),
                size = 1.5, color = "#b2b2b2", size_x = 3, size_xend = 3,
                colour_x = 'deepskyblue4', colour_xend = 'goldenrod4') +
  geom_text(data = filter(DependencyRatioByRegion, Region == "West"),
            aes(x = Average_Dependency_Ratio_2011, y = Region, label = "2011"),
            color = 'deepskyblue4', size = 3, vjust = -1.5, fontface = "bold", family = "Lato") +
  geom_text(data = filter(DependencyRatioByRegion, Region == "West"),
            aes(x = Average_Dependency_Ratio_2019, y = Region, label = "2019"),
            color = 'goldenrod4', size = 3, vjust = -1.5, fontface = "bold", family = "Lato") +
  geom_text(data = DependencyRatioByRegion,
            aes(x = Average_Dependency_Ratio_2011, y = Region, label = Average_Dependency_Ratio_2011),
            color = 'deepskyblue4', size = 2.75, vjust = 2.5, family = "Lato") +
  geom_text(data = DependencyRatioByRegion,
            aes(x = Average_Dependency_Ratio_2019, y = Region, label = Average_Dependency_Ratio_2019),
            color = 'goldenrod4', size = 2.75, vjust = 2.5, family = "Lato") +
  labs(x = NULL, y = NULL,
       title = "Dependency Ratio from 2011 to 2019 by region",
       subtitle = "How has it changed?") +
  theme_bw(base_family = "Lato") +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.border = element_blank(),
        axis.ticks = element_blank(),
        axis.text.x = element_blank(),
        plot.title = element_text(size = 16, face = "bold"),
        plot.subtitle = element_text(face = "italic", size = 12, margin = margin(b = 12)),
        plot.caption = element_text(size = 8, margin = margin(t = 12), color = "#7a7d7e"))
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family not
## found in Windows font database
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database
## Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
## font family not found in Windows font database
A dumbbell plot was used for this visualization because it shows the exact dependency ratio in 2011 and in 2019, as well as how the ratio has moved between the two years, by planning area.
For most planning areas, dependency ratios have increased, except for these areas: Woodlands, Western Water Catchment, Tampines, Sungei Kadut, Sembawang, Pasir Ris, Downtown Core, Choa Chu Kang, Changi and Bukit Panjang.
The figures for these planning areas are listed below because they overlap on the plot:
Yishun dependency ratio 2011: 63.51 // Yishun dependency ratio 2019: 63.8
Tampines dependency ratio 2011: 66.75 // Tampines dependency ratio 2019: 66.21
Sembawang dependency ratio 2011: 64.54 // Sembawang dependency ratio 2019: 63.94
Hougang dependency ratio 2011: 65.82 // Hougang dependency ratio 2019: 65.91
Bukit Panjang dependency ratio 2011: 69 // Bukit Panjang dependency ratio 2019: 66.97
Bukit Batok dependency ratio 2011: 62.11 // Bukit Batok dependency ratio 2019: 63.48
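The direction of change for the six planning areas listed above can be checked with a few lines of base R. The figures are copied from the list; the data frame name `overlapping` and the `Change` and `Direction` columns are introduced here for illustration.

```r
# Figures copied from the list above.
overlapping <- data.frame(
  Planning_Area = c("Yishun", "Tampines", "Sembawang",
                    "Hougang", "Bukit Panjang", "Bukit Batok"),
  Dependency_Ratio_2011 = c(63.51, 66.75, 64.54, 65.82, 69.00, 62.11),
  Dependency_Ratio_2019 = c(63.80, 66.21, 63.94, 65.91, 66.97, 63.48)
)

# A positive change means the ratio rose between 2011 and 2019.
overlapping$Change <- round(
  overlapping$Dependency_Ratio_2019 - overlapping$Dependency_Ratio_2011, 2
)
overlapping$Direction <- ifelse(overlapping$Change > 0, "increased", "decreased")
overlapping[, c("Planning_Area", "Change", "Direction")]
```

Tampines, Sembawang and Bukit Panjang come out as decreases, which agrees with the list of exceptions given earlier, while Yishun, Hougang and Bukit Batok show small increases.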
# Dumbbell plot for dependency ratio between 2011 and 2019 by planning area
ggplot() +
  geom_segment(data = DependencyRatio2,
               aes(y = Planning_Area, yend = Planning_Area, x = 0, xend = 120),
               color = "#b2b2b2", size = 0.15) +
  geom_dumbbell(data = DependencyRatio2,
                aes(y = Planning_Area, x = Dependency_Ratio_2011, xend = Dependency_Ratio_2019),
                size = 1.5, color = "#b2b2b2", size_x = 3, size_xend = 3,
                colour_x = 'deepskyblue4', colour_xend = 'goldenrod4') +
  geom_text(data = filter(DependencyRatio2, Planning_Area == "Woodlands"),
            aes(x = Dependency_Ratio_2011, y = Planning_Area, label = "2011"),
            color = 'deepskyblue4', size = 3, vjust = -1.5, fontface = "bold", family = "Lato") +
  geom_text(data = filter(DependencyRatio2, Planning_Area == "Woodlands"),
            aes(x = Dependency_Ratio_2019, y = Planning_Area, label = "2019"),
            color = 'goldenrod4', size = 3, vjust = -1.5, fontface = "bold", family = "Lato") +
  geom_text(data = DependencyRatio2,
            aes(x = Dependency_Ratio_2011, y = Planning_Area, label = Dependency_Ratio_2011),
            color = 'deepskyblue4', size = 2.75, vjust = 2.5, family = "Lato") +
  geom_text(data = DependencyRatio2,
            aes(x = Dependency_Ratio_2019, y = Planning_Area, label = Dependency_Ratio_2019),
            color = 'goldenrod4', size = 2.75, vjust = 2.5, family = "Lato") +
  labs(x = NULL, y = NULL,
       title = "Dependency Ratio from 2011 to 2019 by planning area",
       subtitle = "How has it changed?") +
  theme_bw(base_family = "Lato") +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.border = element_blank(),
        axis.ticks = element_blank(),
        axis.text.x = element_blank(),
        plot.title = element_text(size = 16, face = "bold"),
        plot.subtitle = element_text(face = "italic", size = 12, margin = margin(b = 12)),
        plot.caption = element_text(size = 8, margin = margin(t = 12), color = "#7a7d7e"))
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database
## Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
## font family not found in Windows font database
Overall, the visualizations above give some insight into the demographics of Singapore in 2019 and into how they have changed since 2011, as explained alongside each visualization. Packages in R can be used to create many different kinds of visualizations that present the data in a useful manner, and in this respect R is much stronger than Tableau. Creating these visualizations is also more convenient in R, as the code can be written in a single statement, whereas in Tableau variables have to be constantly dragged onto and removed from the axes. Furthermore, R is able to produce stronger statistical analysis than Tableau. R also has functions for manipulating and filtering the dataset directly after it is loaded, with the modified dataset created and saved within R, which Tableau is unable to do.