The project is to design a data visualisation that reveals how the population of Singapore is spread across the various planning areas for the year 2019. It also lets us analyse which areas people live in and the type of dwelling they occupy.
Planning Areas with nil values, which add little or no utility to the data, needed to be removed from the dataset. This was done using the filter function.
The age groups come in buckets of 5 years, which is too fine-grained and clutters the plots. We can instead regroup the data into buckets of 10 years, which is more convenient.
We can do this using case_when from the dplyr package.
The dataset contained no coordinates, so it could not be plotted on a map as provided. Latitude and longitude were entered manually in the CSV file, using a third-party website that lists area-wise coordinates.
Sketch.
This data was collected from the Department of Statistics Singapore.
It consists of 7 columns:
PA - Planning Area
SZ - Subzone
AG - Age Group
Sex - Sex
TOD - Type of Dwelling
Pop - Resident Count
Time - Time / Period
We added two more columns for the coordinates (Latitude and Longitude), entering the values manually.
Begin by loading the necessary libraries and adjusting the settings.
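If the coordinates are kept in a separate lookup table instead of being typed into the main CSV, they can be attached with a join. The following is a minimal sketch using base R's merge, not the method used in this project. The `coords` table is hypothetical and its coordinate values are rough placeholders for illustration, not values from the dataset.

```r
# Toy population rows (made-up counts, for illustration only)
pop <- data.frame(
  PA  = c("Ang Mo Kio", "Bedok", "Ang Mo Kio"),
  Pop = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: one row per planning area
# (approximate placeholder coordinates, not taken from the dataset)
coords <- data.frame(
  PA        = c("Ang Mo Kio", "Bedok"),
  Latitude  = c(1.37, 1.32),
  Longitude = c(103.85, 103.93),
  stringsAsFactors = FALSE
)

# Left join: every population row is kept and gains Latitude / Longitude
pop_geo <- merge(pop, coords, by = "PA", all.x = TRUE)
pop_geo
```

With `all.x = TRUE`, a planning area missing from the lookup table would show up with NA coordinates, which makes gaps easy to spot before plotting.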
options(scipen = 99)
options(width = 120)
library(dplyr)
library(ggplot2)
library(leaflet)
library(plotly)
library(gganimate)
library(transformr)
library(ggrepel)
Read the CSV file and look at the first few rows of the data.
data <- read.csv('respopagesextod2011to2019.csv')
data %>% head(10) %>% select(-Latitude, -Longitude)
## PA SZ AG Sex TOD Pop Time
## 1 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males HDB 1- and 2-Room Flats 0 2011
## 2 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males HDB 3-Room Flats 10 2011
## 3 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males HDB 4-Room Flats 30 2011
## 4 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males HDB 5-Room and Executive Flats 50 2011
## 5 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males HUDC Flats (excluding those privatised) 0 2011
## 6 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males Landed Properties 0 2011
## 7 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males Condominiums and Other Apartments 40 2011
## 8 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males Others 0 2011
## 9 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Females HDB 1- and 2-Room Flats 0 2011
## 10 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Females HDB 3-Room Flats 10 2011
# Merge adjacent 5-year age groups into 10-year buckets
data <- data %>%
  mutate(AG = case_when(AG %in% c('0_to_4', '5_to_9') ~ '0_to_9',
                        AG %in% c('10_to_14', '15_to_19') ~ '10_to_19',
                        AG %in% c('20_to_24', '25_to_29') ~ '20_to_29',
                        AG %in% c('30_to_34', '35_to_39') ~ '30_to_39',
                        AG %in% c('40_to_44', '45_to_49') ~ '40_to_49',
                        AG %in% c('50_to_54', '55_to_59') ~ '50_to_59',
                        AG %in% c('60_to_64', '65_to_69') ~ '60_to_69',
                        AG %in% c('70_to_74', '75_to_79') ~ '70_to_79',
                        AG %in% c('80_to_84', '85_to_89') ~ '80_to_89',
                        TRUE ~ AG))  # any other label is kept unchanged
data %>% head(10) %>% select(-Latitude, -Longitude)
## PA SZ AG Sex TOD Pop Time
## 1 Ang Mo Kio Ang Mo Kio Town Centre 0_to_9 Males HDB 1- and 2-Room Flats 0 2011
## 2 Ang Mo Kio Ang Mo Kio Town Centre 0_to_9 Males HDB 3-Room Flats 10 2011
## 3 Ang Mo Kio Ang Mo Kio Town Centre 0_to_9 Males HDB 4-Room Flats 30 2011
## 4 Ang Mo Kio Ang Mo Kio Town Centre 0_to_9 Males HDB 5-Room and Executive Flats 50 2011
## 5 Ang Mo Kio Ang Mo Kio Town Centre 0_to_9 Males HUDC Flats (excluding those privatised) 0 2011
## 6 Ang Mo Kio Ang Mo Kio Town Centre 0_to_9 Males Landed Properties 0 2011
## 7 Ang Mo Kio Ang Mo Kio Town Centre 0_to_9 Males Condominiums and Other Apartments 40 2011
## 8 Ang Mo Kio Ang Mo Kio Town Centre 0_to_9 Males Others 0 2011
## 9 Ang Mo Kio Ang Mo Kio Town Centre 0_to_9 Females HDB 1- and 2-Room Flats 0 2011
## 10 Ang Mo Kio Ang Mo Kio Town Centre 0_to_9 Females HDB 3-Room Flats 10 2011
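As an alternative to listing every pair of labels, the 10-year bucket could be derived from the lower bound of each 5-year label. This is only a sketch of a different technique, not the approach used above: `to_decade_bucket` is a hypothetical helper, and the open-ended label `90_and_over` is an assumption about how the oldest group is named.

```r
# Hypothetical helper: derive a 10-year bucket from a 5-year label such as "25_to_29"
to_decade_bucket <- function(ag) {
  # The leading digits give the lower bound of the 5-year band
  lower  <- as.integer(sub("^([0-9]+).*$", "\\1", ag))
  # Round the lower bound down to the start of its decade
  decade <- (lower %/% 10) * 10
  bucket <- paste0(decade, "_to_", decade + 9)
  # Labels without "_to_" (assumed open-ended, e.g. "90_and_over") stay unchanged
  ifelse(grepl("_to_", ag, fixed = TRUE), bucket, ag)
}

to_decade_bucket(c("0_to_4", "25_to_29", "85_to_89", "90_and_over"))
# "0_to_9" "20_to_29" "80_to_89" "90_and_over"
```

The explicit case_when mapping is easier to audit; the arithmetic version is shorter but depends on every label following the same naming pattern.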
# Total population per planning area and year. After summarise() the data is
# still grouped by PA, so all() keeps only areas populated in every year.
plot <- data %>%
  group_by(PA, Time) %>%
  summarise(total_population = sum(Pop)) %>%
  filter(all(total_population > 0)) %>%
  rename(Planning_Area = PA) %>%
  mutate(Time = factor(Time)) %>%
  ggplot() + aes(Time, total_population, color = Planning_Area, group = Planning_Area) +
  geom_line() + geom_point() +
  xlab('Year')
# Label each line and reveal the series year by year
p <- plot + geom_label_repel(aes(label = Planning_Area), na.rm = TRUE, nudge_x = 1) +
  theme(legend.position = "none") +
  transition_reveal(as.numeric(as.character(Time)))
animate(p, renderer = gifski_renderer(loop = FALSE))
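The code above uses filter(all(total_population > 0)), while later chunks use filter(total_population > 0). The two behave differently on grouped data. The sketch below illustrates the difference on made-up numbers; the `toy` data frame and its values are purely illustrative, and `.groups = "drop_last"` assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Made-up data: area "B" has no residents in 2018
toy <- data.frame(
  PA   = c("A", "A", "B", "B"),
  Time = c(2018, 2019, 2018, 2019),
  Pop  = c(5, 7, 0, 3),
  stringsAsFactors = FALSE
)

totals <- toy %>%
  group_by(PA, Time) %>%
  # drop_last removes the Time grouping, so the result stays grouped by PA
  summarise(total_population = sum(Pop), .groups = "drop_last")

# Group-wise test: drops every row of B, because B has a zero year
all_years <- totals %>% filter(all(total_population > 0))

# Row-wise test: drops only B's 2018 row
row_wise <- totals %>% filter(total_population > 0)

nrow(all_years)  # 2
nrow(row_wise)   # 3
```

For a line chart, the group-wise version avoids lines with missing years; for a single-year map, the row-wise version is sufficient.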
Visualising this on a map makes it easy to see how the population is distributed geographically. We can use the leaflet package for this. We divide the planning areas into three groups by total population (0-50K, 50K-150K, and 150K and above) to see where residents are concentrated. We also remove areas with zero population. Here we show the population distribution for the year 2019.
Hovering over a bubble shows the Planning Area and its population.
# 2019 totals per planning area, with a colour band by population size
df1 <- data %>%
  filter(Time == 2019) %>%
  group_by(PA, Latitude, Longitude) %>%
  summarise(total_population = sum(Pop)) %>%
  filter(total_population > 0) %>%
  rename(lat = Latitude, lng = Longitude) %>%
  mutate(color = cut(total_population, breaks = c(-Inf, 50000, 150000, Inf),
                     labels = c('blue', 'black', 'red')))
# One circle per planning area; the radius grows with the square root of the population
df1 %>%
  leaflet(width = 900, height = 750) %>%
  addTiles() %>%
  addCircles(weight = 1, radius = sqrt(df1$total_population) * 2, popup = df1$PA, color = df1$color,
             label = paste0(df1$PA, ' : ', df1$total_population)) %>%
  addLegend(labels = c("0-50K", "50K-150K", "150K and above"), colors = c("blue", "black", "red"))
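To see how the three population bands and the circle sizes come out, the sketch below applies the same cut() breaks and the same radius formula to a few made-up totals. The numbers are illustrative only and are not taken from the dataset.

```r
# Made-up totals chosen to sit on and around the band boundaries
pop <- c(12000, 50000, 50001, 150000, 280000)

# cut() uses right-closed intervals by default, so 50000 falls in the
# first band and 150000 in the second
band <- cut(pop, breaks = c(-Inf, 50000, 150000, Inf),
            labels = c("0-50K", "50K-150K", "150K and above"))

# Radius proportional to sqrt(pop) makes the circle area proportional to pop
data.frame(pop, band, radius = sqrt(pop) * 2)
```

Scaling the radius by the square root keeps the visual area of each bubble in proportion to the population, so large areas do not look disproportionately big.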
We can also compare the population of each planning area over time. Click a legend entry to hide that year. Double-click a year to view the population of the Planning Areas for that year only.
p <- data %>%
  group_by(PA, Time) %>%
  summarise(total_population = sum(Pop)) %>%
  filter(total_population > 0) %>%
  mutate(Time = factor(Time)) %>%
  ggplot() + aes(PA, total_population, fill = Time) +
  geom_bar(position = "dodge", stat = "identity") +
  xlab('Planning Area') +
  coord_flip()
ggplotly(p)
# Heatmap of population by planning area and dwelling type, one panel per year
data %>%
  group_by(PA, TOD, Time) %>%
  summarise(total_population = sum(Pop)) %>%
  filter(all(total_population > 0)) %>%
  ungroup() %>%
  # Shorten the dwelling labels so they fit on the axis
  mutate(TOD = recode(TOD, "Condominiums and Other Apartments" = "Condo",
                      "HDB 1- and 2-Room Flats" = "1 & 2BHK",
                      "HDB 3-Room Flats" = "3 BHK",
                      "HDB 4-Room Flats" = "4 BHK",
                      "HDB 5-Room and Executive Flats" = "Executive")) %>%
  ggplot() + aes(PA, TOD, fill = total_population) +
  geom_tile() + coord_flip() +
  facet_wrap(. ~ Time) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.1)) +
  xlab('Planning Area') + ylab('Type of Dwelling')
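The recode() call above only lists some dwelling types. As a small illustration of what happens to the rest, the sketch below recodes a short made-up vector; values with no matching replacement are passed through unchanged.

```r
library(dplyr)

# Three dwelling labels; only two of them have a replacement below
tod <- c("HDB 3-Room Flats", "Landed Properties", "Condominiums and Other Apartments")

short <- recode(tod,
                "HDB 3-Room Flats" = "3 BHK",
                "Condominiums and Other Apartments" = "Condo")
short
# "3 BHK" "Landed Properties" "Condo"
```

This is why labels such as "Landed Properties" and "Others" still appear in full on the heatmap axis.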
From the spatial visualisation, it appears that large clusters of the population reside in the East, West and North areas. The South and Central areas seem to be either less densely or scantly populated. The planning authorities seem to have dispersed residential development away from the Central area, possibly with the intention of centralising commercial and office locations.
The Planning Areas of Bedok, Jurong West, Tampines and Woodlands have remained the largest by population, but over the years there has been a marginal dip in the overall population of these areas. In contrast, Sengkang, Punggol and Yishun have grown by a sizeable percentage over the same period. It appears that the planning authorities intended to prevent further congestion and growing density in already populated areas. Instead, they have chosen to develop medium or low populated areas to increase residential habitation. Other Planning Areas, which are medium or scantly populated, have remained so over the same period, with marginal variations either on the upside or the downside.
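The growth and decline described above can be quantified as a percentage change between the first and last year. The following is a minimal sketch on made-up totals for two hypothetical areas; the figures are not from the dataset, and the column names simply mirror those used in the earlier chunks.

```r
library(dplyr)

# Made-up totals for two hypothetical areas (not figures from the dataset)
toy <- data.frame(
  PA               = c("Area A", "Area A", "Area B", "Area B"),
  Time             = c(2011, 2019, 2011, 2019),
  total_population = c(200, 190, 100, 150),
  stringsAsFactors = FALSE
)

growth <- toy %>%
  group_by(PA) %>%
  summarise(
    # Percentage change from 2011 to 2019 within each area
    pct_change = 100 * (total_population[Time == 2019] - total_population[Time == 2011]) /
      total_population[Time == 2011],
    .groups = "drop"
  ) %>%
  arrange(desc(pct_change))

growth
# Area B: 50, Area A: -5
```

Applied to the real totals, a table like this would show which planning areas grew fastest and which declined, complementing the visual impression from the charts.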