1. Introduction:

This project designs a data visualisation that reveals how Singapore's population is spread across its planning areas in 2019. It also lets us analyse which areas people live in and the type of dwelling they occupy.

  1. Challenges to the dataset and its solutions:

Null values

Planning Areas with nil values, which add little or no utility to the data, had to be removed from the dataset. This was done using the filter function.
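
As a minimal sketch of this filtering step, the example below drops planning areas whose total resident count is zero. It assumes the column names PA and Pop described in the Data section; the toy data frame and the names toy and kept are illustrative only and are not taken from the dataset.

```r
library(dplyr)

# Toy data (made-up values): one populated area and one area with no residents
toy <- data.frame(
  PA  = c("Area A", "Area A", "Area B", "Area B"),
  Pop = c(10, 30, 0, 0),
  stringsAsFactors = FALSE
)

# Keep only planning areas whose total resident count is above zero
kept <- toy %>%
  group_by(PA) %>%
  filter(sum(Pop) > 0) %>%
  ungroup()

unique(kept$PA)
```

Filtering on the group total rather than on individual rows keeps zero-count rows that belong to an otherwise populated area.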

Age Group

The age groups come in 5-year buckets, which is too fine-grained and clutters the plots. We instead regroup the data into 10-year buckets, which are more convenient to work with.
We can do this using case_when from the dplyr library.

Coordinates

The dataset had no coordinates, which are needed to plot on a map. They were entered manually into the csv file using a third-party website that provides area-wise coordinates.
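
The coordinates in this project were typed into the csv file by hand. One possible alternative, sketched below, is to keep them in a small lookup table and join it on the planning area. The table coords, the stand-in data pop, and the rounded coordinate values are assumptions for illustration only; they are not the values used in the project.

```r
library(dplyr)

# Hypothetical lookup table: one row per planning area.
# Coordinates are approximate and for illustration only.
coords <- data.frame(
  PA        = c("Ang Mo Kio", "Bedok"),
  Latitude  = c(1.37, 1.32),
  Longitude = c(103.85, 103.93),
  stringsAsFactors = FALSE
)

# Toy stand-in for the population data (made-up counts)
pop <- data.frame(
  PA  = c("Ang Mo Kio", "Bedok", "Bedok"),
  Pop = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Attach the coordinates to every population row by planning area
pop_geo <- left_join(pop, coords, by = "PA")
```

A lookup table keeps one pair of coordinates per area in a single place, so a correction does not have to be repeated on every row of the csv file.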

  1. Proposed DataViz (Sketch Version):

Sketch.

  1. Data :

This data was collected from the Department of Statistics Singapore.

It consists of 7 columns:
PA - Planning Area
SZ - Subzone
AG - Age Group
Sex - Sex
TOD - Type of Dwelling
Pop - Resident Count
Time - Time / Period

We added two more columns for the coordinates (Latitude and Longitude) by entering them manually.

  1. Data Wrangling:

Begin by loading necessary libraries and adjusting the settings.

options(scipen = 99)
options(width = 120)
library(dplyr)
library(ggplot2)
library(leaflet)
library(plotly)
library(gganimate)
library(transformr)
library(ggrepel)

Read the csv file and view the first few rows of data.

data <- read.csv('respopagesextod2011to2019.csv')
data %>% head(10) %>% select(-Latitude, -Longitude)
##            PA                     SZ     AG     Sex                                     TOD Pop Time
## 1  Ang Mo Kio Ang Mo Kio Town Centre 0_to_4   Males                 HDB 1- and 2-Room Flats   0 2011
## 2  Ang Mo Kio Ang Mo Kio Town Centre 0_to_4   Males                        HDB 3-Room Flats  10 2011
## 3  Ang Mo Kio Ang Mo Kio Town Centre 0_to_4   Males                        HDB 4-Room Flats  30 2011
## 4  Ang Mo Kio Ang Mo Kio Town Centre 0_to_4   Males          HDB 5-Room and Executive Flats  50 2011
## 5  Ang Mo Kio Ang Mo Kio Town Centre 0_to_4   Males HUDC Flats (excluding those privatised)   0 2011
## 6  Ang Mo Kio Ang Mo Kio Town Centre 0_to_4   Males                       Landed Properties   0 2011
## 7  Ang Mo Kio Ang Mo Kio Town Centre 0_to_4   Males       Condominiums and Other Apartments  40 2011
## 8  Ang Mo Kio Ang Mo Kio Town Centre 0_to_4   Males                                  Others   0 2011
## 9  Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Females                 HDB 1- and 2-Room Flats   0 2011
## 10 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Females                        HDB 3-Room Flats  10 2011
data <- data %>%
          mutate(AG = case_when(AG %in% c('0_to_4', '5_to_9') ~ '0_to_9',
                                AG %in% c('10_to_14', '15_to_19') ~ '10_to_19',
                                AG %in% c('20_to_24', '25_to_29') ~ '20_to_29',
                                AG %in% c('30_to_34', '35_to_39') ~ '30_to_39',
                                AG %in% c('40_to_44', '45_to_49') ~ '40_to_49',
                                AG %in% c('50_to_54', '55_to_59') ~ '50_to_59',
                                AG %in% c('60_to_64', '65_to_69') ~ '60_to_69',
                                AG %in% c('70_to_74', '75_to_79') ~ '70_to_79',
                                AG %in% c('80_to_84', '85_to_89') ~ '80_to_89',
                                TRUE ~ AG))
data %>% head(10) %>% select(-Latitude, -Longitude)
##            PA                     SZ     AG     Sex                                     TOD Pop Time
## 1  Ang Mo Kio Ang Mo Kio Town Centre 0_to_9   Males                 HDB 1- and 2-Room Flats   0 2011
## 2  Ang Mo Kio Ang Mo Kio Town Centre 0_to_9   Males                        HDB 3-Room Flats  10 2011
## 3  Ang Mo Kio Ang Mo Kio Town Centre 0_to_9   Males                        HDB 4-Room Flats  30 2011
## 4  Ang Mo Kio Ang Mo Kio Town Centre 0_to_9   Males          HDB 5-Room and Executive Flats  50 2011
## 5  Ang Mo Kio Ang Mo Kio Town Centre 0_to_9   Males HUDC Flats (excluding those privatised)   0 2011
## 6  Ang Mo Kio Ang Mo Kio Town Centre 0_to_9   Males                       Landed Properties   0 2011
## 7  Ang Mo Kio Ang Mo Kio Town Centre 0_to_9   Males       Condominiums and Other Apartments  40 2011
## 8  Ang Mo Kio Ang Mo Kio Town Centre 0_to_9   Males                                  Others   0 2011
## 9  Ang Mo Kio Ang Mo Kio Town Centre 0_to_9 Females                 HDB 1- and 2-Room Flats   0 2011
## 10 Ang Mo Kio Ang Mo Kio Town Centre 0_to_9 Females                        HDB 3-Room Flats  10 2011
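
The hand-written case_when mapping above could also be derived from the labels themselves. The sketch below is one way to do that in base R; the helper name to_decade is hypothetical, and the label "90_and_over" is assumed here as an example of an open-ended group that should pass through unchanged.

```r
# Map a 5-year label such as "35_to_39" to its 10-year bucket "30_to_39".
# Labels that do not match the "<n>_to_<m>" pattern are returned unchanged.
to_decade <- function(ag) {
  ag <- as.character(ag)
  is_range <- grepl("^[0-9]+_to_[0-9]+$", ag)
  # Lower bound of the range, i.e. the digits before the first underscore
  lower <- suppressWarnings(as.numeric(sub("_.*$", "", ag)))
  start <- floor(lower / 10) * 10
  ifelse(is_range, paste0(start, "_to_", start + 9), ag)
}

result <- to_decade(c("0_to_4", "35_to_39", "85_to_89", "90_and_over"))
result
```

In a pipeline this could be applied as mutate(AG = to_decade(AG)), which gives the same buckets as the explicit mapping for labels of the form shown in the output above.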

  1. Visualization 1 - Population Trend by Planning Area:

# Total population per planning area and year
plot <-   data %>% 
          group_by(PA, Time) %>% 
          summarise(total_population = sum(Pop)) %>%
          # still grouped by PA here: keep areas that are populated in every year
          filter(all(total_population > 0)) %>%
          rename(Planning_Area = PA) %>%
          mutate(Time = factor(Time)) %>%
          ggplot() + aes(Time, total_population, color = Planning_Area, group = Planning_Area) + 
          geom_line() + geom_point() + 
          xlab('Year')

# Label each line and reveal the series year by year
p <- plot + geom_label_repel(aes(label = Planning_Area),na.rm = TRUE, nudge_x = 1) + theme(legend.position="none") + 
transition_reveal(as.numeric(as.character(Time)))

animate(p, renderer = gifski_renderer(loop = FALSE))

  1. Visualization 2 - Population Distribution - Spatial View:

Visualizing this on a map makes it easy to see how the population is geographically distributed. We can use the leaflet package for this. We divide the planning areas into 3 groups (0-50K, 50K-150K, and 150K and above) to see where the population is concentrated. We also remove areas with 0 population. Here we show the population for the year 2019.

Hovering over a bubble shows the Planning Area and its population.
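
To make the grouping concrete, the short sketch below applies the same breaks and colour labels as the map code to a few made-up totals, and computes the corresponding circle radius. The vector pops holds illustrative values only.

```r
# Made-up population totals, not figures from the dataset
pops <- c(30000, 50000, 120000, 250000)

# cut() uses right-closed intervals by default, so a total of exactly
# 50000 falls in the first band (0-50K)
band <- cut(pops, breaks = c(-Inf, 50000, 150000, Inf),
            labels = c("blue", "black", "red"))
as.character(band)

# Circle radius in metres, using the same square-root scaling as the map
radius <- sqrt(pops) * 2
radius
```

With a radius proportional to the square root of the population, the area of each circle grows in proportion to the population.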

df1 <- data %>%
        filter(Time == 2019) %>%
        group_by(PA, Latitude, Longitude) %>%
        summarise(total_population = sum(Pop)) %>%
        filter(total_population > 0) %>%
        rename(lat = Latitude, lng = Longitude) %>%
        mutate(color = cut(total_population, breaks = c(-Inf, 50000, 150000, Inf), 
               labels = c('blue', 'black', 'red')))
df1 %>%
  leaflet(width = 900, height = 750) %>%
  addTiles() %>%
  addCircles(weight = 1, radius = sqrt(df1$total_population) * 2, popup = df1$PA, color = df1$color, 
             label = paste0(df1$PA, ' : ', df1$total_population)) %>%
  addLegend(labels = c("0-50K", "50K-150K", "150k and above"), colors = c("blue", "black", "red"))

  1. Visualization 3 - Population by Planning Area - Annual Changes:

We can also see the population of each planning area by year. Click on a legend entry to hide that year. Double-click on a year to view the population of the Planning Areas for that year only.

p <- data %>%
    group_by(PA, Time) %>%
    summarise(total_population = sum(Pop)) %>%
    filter(total_population > 0) %>%
    mutate(Time = factor(Time)) %>%
    ggplot() + aes(PA, total_population, fill = Time) + 
    geom_bar(position="dodge", stat="identity") +
    xlab('Planning Area') + 
    coord_flip() 
    
ggplotly(p) 

  1. Visualization 4 - Heatmap: Population by Planning Area & Type of Dwelling:

data %>%
  group_by(PA, TOD, Time) %>%
  summarise(total_population = sum(Pop)) %>%
  filter(all(total_population > 0)) %>%
  ungroup %>%
  mutate(TOD = recode(TOD, "Condominiums and Other Apartments" = "Condo", 
                      "HDB 1- and 2-Room Flats" = "1 & 2BHK", 
                      "HDB 3-Room Flats" = "3 BHK", 
                      "HDB 4-Room Flats" = "4 BHK", 
                      "HDB 5-Room and Executive Flats" = "Executive")) %>%
  ggplot() +  aes(PA, TOD, fill = total_population) + 
  geom_tile() + coord_flip() +
  facet_wrap(.~Time) + 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.1)) + 
  xlab('Planning Area') + ylab('Type of Dwelling')

  1. Insights:

Geographical Diversification

From the spatial visualization, it appears that large clusters of the population reside in the East, West and North areas. The South and Central areas seem to be either less densely or scantly populated. The planning authorities seem to have dispersed residential development away from the Central area, possibly with the intention of centralising commercial and office locations there.

Area-wise density planning

The Planning Areas of Bedok, Jurong West, Tampines and Woodlands have remained the largest by population, although their overall population has dipped marginally over the years. In contrast, Sengkang, Punggol and Yishun have grown by a sizeable percentage over the same period. It appears that the planning authorities intended to avoid further congestion in already densely populated areas and chose instead to develop medium or low populated areas for residential use. Other Planning Areas, which are medium or scantly populated, have remained so over the same period, with marginal variations either upward or downward.
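
Statements about dips and growth like these can be checked by computing the percentage change between the first and last year for each area. The sketch below shows the calculation on made-up totals for two hypothetical areas; the data frame totals and its numbers are illustrative and are not results from the dataset.

```r
library(dplyr)

# Made-up totals for two hypothetical areas; not figures from the dataset
totals <- data.frame(
  PA   = c("Area A", "Area A", "Area B", "Area B"),
  Time = c(2011, 2019, 2011, 2019),
  total_population = c(200000, 190000, 100000, 150000),
  stringsAsFactors = FALSE
)

# Percentage change between the first and last year for each area
change <- totals %>%
  group_by(PA) %>%
  arrange(Time, .by_group = TRUE) %>%
  summarise(pct_change = (last(total_population) - first(total_population)) /
                          first(total_population) * 100)

change
```

Applied to the per-area yearly totals computed for the trend plot, the same steps would give one growth figure per Planning Area.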

  1. References:

https://www.singstat.gov.sg/find-data/search-by-theme/population/geographic-distribution/latestdata
https://www.distancesto.com/coordinates/sg/singapore-latitude-longitude/history/2727.html