Done by Calvin C
- Sep 2021
In a recent newspaper article in February 21, it was reported that Singapore’s Fertility rate fell to a historic low of 1.1 and some of the contributing factors include COVID-19 and rising property prices that delayed marriages and parenthood plans. A falling birth rate together along with a higher life expectancy is increasing the rate that the population is aging. In this study, we will be using the demographics dataset from data.gov.sg to analyse the population changes in Singapore throughout the years and which ethnic group has the biggest influence in the aging population rate.
library(tidyverse)
library(directlabels)
library(pkgbuild)
library(gifski)
library(gganimate)
library(png)
theme_set(theme_light())
Reading the demographics data
df1 <- read.csv("tma_demographics.csv")
head(df1)
Checking out the structure of data
str(df1)
## 'data.frame': 22680 obs. of 4 variables:
## $ year : int 1957 1957 1957 1957 1957 1957 1957 1957 1957 1957 ...
## $ level_1: chr "Total Residents" "Total Residents" "Total Residents" "Total Residents" ...
## $ level_2: chr "0 - 4 Years" "5 - 9 Years" "10 - 14 Years" "15 - 19 Years" ...
## $ value : chr "264727" "218097" "136280" "135679" ...
Reorganizing the data by creating two new columns, ethnicity and gender and dropping aggregated data.
df2<- df1%>%
mutate(level_1 = gsub("Total ", '',df1$level_1))%>%
filter(level_1 == 'Male Malays'|level_1 == "Female Malays"|
level_1 == "Male Chinese"|level_1 == "Female Chinese"|
level_1 == "Male Indians"|level_1 == "Female Indians"|
level_1 == "Other Ethnic Groups (Males)"|level_1 == "Other Ethnic Groups (Females)")%>%
mutate(level_1 = gsub("Other Ethnic Groups \\(Males\\)", 'Male Others', level_1))%>%
mutate(level_1 = gsub("Other Ethnic Groups \\(Females\\)", 'Female Others', level_1)) %>%
separate(level_1, into = c("Gender","Ethnicity"), sep = " ")%>%
mutate(value = gsub("na", 0, value))
names(df2)[4]<- paste("age_band")
names(df2)[5]<- paste("population_size")
df2$population_size <- as.integer(df2$population_size)
head(df2)
To begin, we will be categorizing the age bands to 3 life stages based on national statistical standards:
Formative years – Youth age 19 and below
Working years – Adults between age 20 to 64
Retirement years – Adult’s age 65 and above
# Converting age bank to life stages
df3<- df2%>%
mutate(life_stage = (if_else(
age_band == "0 - 4 Years"|age_band == "5 - 9 Years"|
age_band =="10 - 14 Years"|age_band =="15 - 19 Years", " formative_years - 0 to 19", if_else(
age_band =="20 - 24 Years"|age_band =="25 - 29 Years"|
age_band =="30 - 34 Years"|age_band =="35 - 39 Years"|
age_band =="40 - 44 Years"|age_band =="45 - 49 Years"|
age_band =="50 - 54 Years"|age_band =="55 - 59 Years"|
age_band =="60 - 64 Years", " working_years - 20 to 64",if_else(
age_band =="65 - 69 Years"|age_band =="70 - 74 Years"|
age_band =="75 - 79 Years"|age_band =="80 - 84 Years"|
age_band =="85 - 89 Years"|age_band == "90 Years & Over", "retirement_years - 65 and above",NULL
))))) %>%
na.omit()
#population size by life_stage
df4 <- aggregate(population_size~year+life_stage, df3,sum)
head(df4)
Creating the plot
p.1a <- ggplot(data= df4, mapping = aes(x = year, y = population_size, group= life_stage))
p1a.anim<- p.1a + geom_line(aes(color = life_stage))+
scale_y_log10(labels = scales::label_comma())+
geom_dl(aes(label = life_stage, color = life_stage),method =("first.points"),size =16)+
labs(y = "Population size",
x=" Year",
title = "Population size by life stages"
)+
guides(color= 'none')+
theme_light()+
transition_reveal(year)
animate(p1a.anim, fps=12,renderer = gifski_renderer())
The above shows the movement of the population size changes in Singapore across the years. We can observe that the population size of youth has slowed down before the 1980s and have not been able to revert back to the 1,000,000 mark ever since while advancement in medical care has contributed to the rapid growth of the silver years population by more than 10 times in the last 50 years, from less than 30,000 in 1957 to more than 300,000 by 2019.
#population size growth of each Ethnicity split by life stage
df3a<- aggregate(population_size~year+Ethnicity+life_stage, df3, sum)
p.1b <- ggplot(data=df3a, mapping = aes(x=year, y=population_size, fill=Ethnicity))
p.1b + geom_line(aes(color =Ethnicity))+
scale_y_log10(labels=scales::label_comma())+
facet_wrap("~life_stage")+
geom_dl(aes(label = Ethnicity, color = Ethnicity),method =("smart.grid"),size =16)+
geom_smooth(method=lm, se=F, aes(color = Ethnicity),show.legend = F)+
guides(color = "none")+
labs(y = "Population Size",
x=" Year",
title = "Population size changes of each Ethnicity split by life stage")+
theme_light()
Looking into the different Ethnicity in Singapore, we can see that Chinese holds the highest proportion of population in Singapore. Under formative years, the Chinese ethnic group has a decreasing trend compared to the others which currently still holds an increasing trend.
Lastly, we will be looking at the dependency ratio which is defined as the number of dependents per working adult. We will be looking at two dependent groups, mainly children and youth below age 20 and adults age 65 and above.
Formula for dependency ratio per 100 adults for formative years is: (Formative years population/ working years population)*100
Formula for dependency ratio per 100 adults for Retirement years is: (Retirement years population/ working years population)*100
#Dependency ratio of each ethnicity splot
df3b <- df3a %>%
pivot_wider(names_from = life_stage, values_from = population_size)%>%
mutate(Formative = .[[3]]*100/.[[4]])%>%
mutate(Retirement = .[[5]]*100/.[[4]])
df3b <-df3b %>%
pivot_longer(cols = 6:7, names_to = "dep_ratio_type", values_to = "dep_ratio")%>%
select(year,Ethnicity,dep_ratio_type,dep_ratio)
head(df3b)
p.1c <- ggplot(data=df3b, mapping = aes(x=year, y=dep_ratio))
p.1c + geom_point(aes(color =dep_ratio_type, shape = Ethnicity))+
labs(y = "Dependency Ratio per 100 working adults",
x=" Year",
title = "Dependency Ratio of retirees & children split by ethnicity",
color = "Dependency Ratio type")+
theme_light()
We can see that across all Ethnic group, there is a general increase in the dependency ratio for retirement years which means that population size for retirement years has been growing much quicker than the population size of working adults. On the other hand, dependency ratio for formative years has dropped sharply since its peak before 1970s.
Comparing the changes across all the ethnic groups, the Chinese Ethnic group, represented by the circle shape, has the sharpest rise in the retirement years dependency ratio and the sharpest drop in the formative years’ dependency ratio.
The growth of population in the retirement years is rising much faster than the growth of population of working adults while the population of youth below age 20 is decreasing. Among the 4 Ethnic groups, Chinese is the majority ethnic group and has the greatest influence in the increasing rate of aging population in Singapore due to their rapid increase of dependency ratio for retirements years and rapid decrease of dependency ratio for formative years.
df5 <- df2 %>%
filter(age_band == "0 - 4 Years")%>%
pivot_wider(names_from = Gender, values_from = population_size)
p.2 <- ggplot(data = df5, mapping = aes(x = Female, y = Male))
p.2 + geom_point(mapping = aes(color = Ethnicity))+
geom_abline(intercept=0, slope=1, size =1, alpha = 0.1)+
scale_y_log10(labels = scales::label_comma())+
scale_x_log10(labels = scales::label_comma())+
geom_dl(aes(label = Ethnicity, color = Ethnicity),method =("smart.grid"),size =16)+
guides(color = 'none')+
labs(y = "Boys (aged 0-4)",
x= "Girls (aged 0-4)",
title = "Gender comparison for aged 0-4 across Ethnicity",
subtitle = "Data points are Ethnicity - years"
)+
theme_light()
The above is a scatterplot with each datapoint representing the population size in each distinct year for babies and infants between aged 0 to 4. The faint 45-degree line overlayed represents the point where population size of boys equals to population size of girls. Datapoints above the diagonal line means that there are more boys than girls while datapoints below the diagonal line means that there are more girls than boys in that year.
This graph is a representation of the youngest population of Singapore and looks at the birth rate and the survival rate of children until 4 years old. As Singapore’s infant mortality rate is one of the lowest at 1.8 per 1,000 live birth (Singstat, 2021), we can rule out infant mortality rate from this discussion.
From the graph we can see that for the Chinese ethnic group, all the data points reside clearly above the diagonal line which means that for all time periods captured within the dataset, number of births for baby girls has never exceeded the number of births for baby boys in any particular year. This observation is unsurprising due to the preference of boys in Chinese culture to carry on the family name and family are willing to continue to reproduce until they have a boy.
For the Malay ethnic group, we can also observe a similar trend where data points are above the diagonal line primarily but closer than Chinese ethnic group and there are some years where the ratio of birth between boys and girls is 1:1 when the data point is on the line.
For other 2 ethnic groups, the ratio is quite spread out where there are some years where birth of girls is higher and some years where birth of boys is higher which seem to hint that there is less gender preference in these ethnic groups.
To summarise, the graph clearly shows the preference of baby boys over baby girls in Singapore as the majority of the data points lies above the diagonal line. Certain ethnic race like Chinese and Malay exudes a stronger gender preference compared to the other two as they do not have any datapoints below the diagonal line.
Animated plot showing the data point movement cross the years
p.2 <- ggplot(data = df5, mapping = aes(x = Female, y = Male))
p.2anim <- p.2 + geom_point(mapping = aes(color = Ethnicity))+
geom_abline(intercept=0, slope=1, size =1, alpha = 0.1)+
geom_dl(aes(label = Ethnicity, color = Ethnicity),method =("first.points"),size =16)+
scale_y_log10(labels = scales::label_comma())+
scale_x_log10(labels = scales::label_comma())+
guides(color = 'none')+
labs(y = "Boys (aged 0-4)",
x= "Girls (aged 0-4)",
title = "Gender comparison for aged 0-4 across Ethnicity",
subtitle = "Data points are Ethnicity - years"
)+
theme_light()+
theme(title = element_text(size = 16),
axis.text.x = element_text(size = 12), # x-axis grid text
axis.text.y = element_text(size = 12), # y-axis grid text
axis.title = element_text(size = 14)) + # axis title text
transition_time(year) + # Transition added here
labs(title = "Year: {frame_time}") # Put the title here
animate(p.2anim, fps=12,renderer = gifski_renderer())
The above shows the movement of the baby boy- baby girl ratio of the different ethnic groups across the years. Looking at the data point movements, we can see that the gender preference for boys has been decreasing as time goes by as data points has been moving and even crossing the diagonal line in the later years.