Author: Peng You Xuan
Date: 21 Mar 2021
This makeover visualizes the demographics of Singapore's population using a dataset from Singstat that breaks the resident population down by Planning Area, Subzone, Age Group, Sex and Type of Dwelling (https://www.singstat.gov.sg/find-data/search-by-theme/population/geographic-distribution/latest-data). The visualization covers 2019 only, which leaves a total of 98,192 rows of data to work with.
The first challenge for this visualization is data preparation. This was something new: rather than using the dataset as it is, the data has to be inspected and adjusted so that the final visualization provides sufficient insight for readers.
Another challenge is exploring the geofacet library, which was not covered in class. The library contains two grids for Singapore, sg_planning_area_grid1 and sg_planning_area_grid2, so both have to be inspected before one is chosen. Some planning areas are also named differently in the grids than in the dataset, which initially caused the plot not to work properly. The planning area names in the geofacet grid therefore have to be changed to match the dataset. This issue was especially prominent with sg_planning_area_grid2. The following vignette was used as guidance for the geofacet library: <https://cran.r-project.org/web/packages/geofacet/vignettes/geofacet.html>.
Lastly, the requirement was to plot a visualization showing the demographics by planning area. This was difficult because Singapore has many planning areas, and in the initial visualization the text (e.g. the planning area names) was too small to read. To overcome this challenge, research was done on how to adjust the font sizes of the labels and axes appropriately.
packages <- c("tidyverse", "ggplot2", "geofacet")
# Install any package that is missing, then attach it
for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
## Loading required package: tidyverse
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.0.6 v dplyr 1.0.4
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## Loading required package: geofacet
pop_data <- read_csv("singapore-residents-by-planning-areasubzone-age-group-sex-and-type-of-dwelling-june-20112020/respopagesextod2011to2020.csv")
##
## -- Column specification --------------------------------------------------------
## cols(
## PA = col_character(),
## SZ = col_character(),
## AG = col_character(),
## Sex = col_character(),
## TOD = col_character(),
## Pop = col_double(),
## Time = col_double()
## )
pop_data
## # A tibble: 984,656 x 7
## PA SZ AG Sex TOD Pop Time
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 1- and 2-Room Flats 0 2011
## 2 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 3-Room Flats 10 2011
## 3 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 4-Room Flats 30 2011
## 4 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 5-Room and Executive~ 50 2011
## 5 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HUDC Flats (excluding th~ 0 2011
## 6 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males Landed Properties 0 2011
## 7 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males Condominiums and Other A~ 40 2011
## 8 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males Others 0 2011
## 9 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Femal~ HDB 1- and 2-Room Flats 0 2011
## 10 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Femal~ HDB 3-Room Flats 10 2011
## # ... with 984,646 more rows
pop_data <- pop_data %>% filter(Time == 2019)
pop_data
## # A tibble: 98,192 x 7
## PA SZ AG Sex TOD Pop Time
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 1- and 2-Room Flats 0 2019
## 2 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 3-Room Flats 10 2019
## 3 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 4-Room Flats 10 2019
## 4 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 5-Room and Executive~ 20 2019
## 5 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HUDC Flats (excluding th~ 0 2019
## 6 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males Landed Properties 0 2019
## 7 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males Condominiums and Other A~ 50 2019
## 8 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males Others 0 2019
## 9 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Femal~ HDB 1- and 2-Room Flats 0 2019
## 10 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Femal~ HDB 3-Room Flats 10 2019
## # ... with 98,182 more rows
names(pop_data)<-c("PlanningArea","Subzone","AgeGroup","Gender","TypeofDwelling","Population","Year")
pop_data
## # A tibble: 98,192 x 7
## PlanningArea Subzone AgeGroup Gender TypeofDwelling Population Year
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 Ang Mo Kio Ang Mo Kio~ 0_to_4 Males HDB 1- and 2-Room ~ 0 2019
## 2 Ang Mo Kio Ang Mo Kio~ 0_to_4 Males HDB 3-Room Flats 10 2019
## 3 Ang Mo Kio Ang Mo Kio~ 0_to_4 Males HDB 4-Room Flats 10 2019
## 4 Ang Mo Kio Ang Mo Kio~ 0_to_4 Males HDB 5-Room and Exe~ 20 2019
## 5 Ang Mo Kio Ang Mo Kio~ 0_to_4 Males HUDC Flats (exclud~ 0 2019
## 6 Ang Mo Kio Ang Mo Kio~ 0_to_4 Males Landed Properties 0 2019
## 7 Ang Mo Kio Ang Mo Kio~ 0_to_4 Males Condominiums and O~ 50 2019
## 8 Ang Mo Kio Ang Mo Kio~ 0_to_4 Males Others 0 2019
## 9 Ang Mo Kio Ang Mo Kio~ 0_to_4 Femal~ HDB 1- and 2-Room ~ 0 2019
## 10 Ang Mo Kio Ang Mo Kio~ 0_to_4 Femal~ HDB 3-Room Flats 10 2019
## # ... with 98,182 more rows
# Pad "5_to_9" with a leading zero so that it sorts next to "0_to_4"
# rather than among the later age groups
pop_data <- pop_data %>%
  mutate(AgeGroup_2 = case_when(AgeGroup == "5_to_9" ~ "05_to_9",
                                TRUE ~ AgeGroup))
pop_data
## # A tibble: 98,192 x 8
## PlanningArea Subzone AgeGroup Gender TypeofDwelling Population Year
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 Ang Mo Kio Ang Mo Kio~ 0_to_4 Males HDB 1- and 2-Room ~ 0 2019
## 2 Ang Mo Kio Ang Mo Kio~ 0_to_4 Males HDB 3-Room Flats 10 2019
## 3 Ang Mo Kio Ang Mo Kio~ 0_to_4 Males HDB 4-Room Flats 10 2019
## 4 Ang Mo Kio Ang Mo Kio~ 0_to_4 Males HDB 5-Room and Exe~ 20 2019
## 5 Ang Mo Kio Ang Mo Kio~ 0_to_4 Males HUDC Flats (exclud~ 0 2019
## 6 Ang Mo Kio Ang Mo Kio~ 0_to_4 Males Landed Properties 0 2019
## 7 Ang Mo Kio Ang Mo Kio~ 0_to_4 Males Condominiums and O~ 50 2019
## 8 Ang Mo Kio Ang Mo Kio~ 0_to_4 Males Others 0 2019
## 9 Ang Mo Kio Ang Mo Kio~ 0_to_4 Femal~ HDB 1- and 2-Room ~ 0 2019
## 10 Ang Mo Kio Ang Mo Kio~ 0_to_4 Femal~ HDB 3-Room Flats 10 2019
## # ... with 98,182 more rows, and 1 more variable: AgeGroup_2 <chr>
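The zero-padding above relies on how character strings happen to sort, which can depend on the locale. A minimal alternative sketch, assuming every age group label begins with its lower bound (labels such as "90_and_over" are an assumption about the full dataset), is to build a factor whose levels are ordered by that number. The vector age_groups below is a made-up stand-in for the AgeGroup column.

```r
# Made-up sample of age group labels, deliberately out of order
age_groups <- c("10_to_14", "0_to_4", "5_to_9", "90_and_over", "85_to_89")

# Extract the leading number (the lower bound of each age group)
lower_bound <- as.numeric(sub("_.*$", "", age_groups))

# Order the factor levels by that lower bound instead of alphabetically
age_factor <- factor(age_groups,
                     levels = unique(age_groups[order(lower_bound)]))

levels(age_factor)
```

ggplot2 draws a factor in the order of its levels, so a column built this way would likely need no renaming of individual labels.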
sg_planning_area_grid1
## row col code name
## 1 1 7 WD Woodlands
## 2 1 9 SE Simpang
## 3 1 8 SB Sembawang
## 4 1 10 SL Seletar
## 5 2 8 YS Yishun
## 6 2 9 SE Sengkang
## 7 2 10 PG Punggol
## 8 2 7 MD Mandai
## 9 3 7 TP Toa Payoh
## 10 3 5 SK Sungei Kadut
## 11 3 9 SG Serangoon
## 12 3 11 PL Paya Lebar
## 13 3 12 PR Pasir Ris
## 14 3 14 NI North Eastern Islands
## 15 3 10 HG Hougang
## 16 3 4 CK Choa Chu Kang
## 17 3 6 BS Bishan
## 18 3 8 AM Ang Mo Kio
## 19 4 4 TH Tengah
## 20 4 6 TL Tanglin
## 21 4 12 TM Tampines
## 22 4 8 RC Rochor
## 23 4 7 NT Newton
## 24 4 3 LK Lim Chu Kang
## 25 4 9 KL Kallang
## 26 4 10 GL Geylang
## 27 4 13 CH Changi
## 28 4 5 CC Central Water Catchment
## 29 4 11 BD Bedok
## 30 5 13 CB Changi Bay
## 31 5 2 WC Western Water Catchment
## 32 5 7 OR Orchard
## 33 5 6 NV Novena
## 34 5 8 MU Museum
## 35 5 10 MP Marine Parade
## 36 5 9 ME Marina East
## 37 5 3 JW Jurong West
## 38 5 5 BP Bukit Panjang
## 39 5 4 BK Bukit Batok
## 40 6 1 TS Tuas
## 41 6 6 RV River Valley
## 42 6 2 PN Pioneer
## 43 6 8 MS Marina South
## 44 6 4 JE Jurong East
## 45 6 7 DT Downtown Core
## 46 6 5 BT Bukit Timah
## 47 6 3 BL Boon Lay
## 48 7 7 SV Straits View
## 49 7 6 SR Singapore River
## 50 7 5 QT Queenstown
## 51 7 4 CL Clementi
## 52 8 7 OT Outram
## 53 8 6 BM Bukit Merah
## 54 9 6 SI Southern Islands
## 55 9 4 WI Western Islands
sg_planning_area_grid2
## col row code name name_indo
## 1 6 6 ANG MO KIO ANG MO KIO ANG MO KIO
## 2 9 10 BEDOK BEDOK BEDOK
## 3 6 9 BISHAN BISHAN BISHAN
## 4 3 8 BUKIT BATOK BUKIT BATOK BUKIT BATOK
## 5 5 15 BUKIT MERAH BUKIT MERAH BUKIT MERAH
## 6 3 7 BUKIT PANJANG BUKIT PANJANG BUKIT PANJANG
## 7 4 9 BUKIT TIMAH BUKIT TIMAH BUKIT TIMAH
## 8 5 12 CENTRAL CENTRAL CENTRAL
## 9 2 4 CHOA CHU KANG CHOA CHU KANG CHOA CHU KANG
## 10 3 12 CLEMENTI CLEMENTI CLEMENTI
## 11 8 10 GEYLANG GEYLANG GEYLANG
## 12 8 6 HOUGANG HOUGANG HOUGANG
## 13 2 9 JURONG EAST JURONG EAST JURONG EAST
## 14 1 9 JURONG WEST JURONG WEST JURONG WEST
## 15 7 11 KALLANG / WHAMPOA KALLANG / WHAMPOA KALLANG / WHAMPOA
## 16 8 13 MARINE PARADE MARINE PARADE MARINE PARADE
## 17 11 6 PASIR RIS PASIR RIS PASIR RIS
## 18 7 4 PUNGGOL PUNGGOL PUNGGOL
## 19 4 13 QUEENTOWN QUEENTOWN QUEENTOWN
## 20 4 1 SEMBAWANG SEMBAWANG SEMBAWANG
## 21 7 5 SENGKANG SENGKANG SENGKANG
## 22 8 7 SERANGOON SERANGOON SERANGOON
## 23 10 8 TAMPINES TAMPINES TAMPINES
## 24 7 10 TOA PAYOH TOA PAYOH TOA PAYOH
## 25 3 2 WOODLANDS WOODLANDS WOODLANDS
## 26 6 3 YISHUN YISHUN YISHUN
library(geofacet)
sg_planning_area_grid1 <- sg_planning_area_grid1 %>% mutate(name = toupper(name))
# convert planning area name to upper case
pop_data <- pop_data %>% mutate(PlanningArea = toupper(PlanningArea))
pop_data
## # A tibble: 98,192 x 8
## PlanningArea Subzone AgeGroup Gender TypeofDwelling Population Year
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 ANG MO KIO Ang Mo Kio~ 0_to_4 Males HDB 1- and 2-Room ~ 0 2019
## 2 ANG MO KIO Ang Mo Kio~ 0_to_4 Males HDB 3-Room Flats 10 2019
## 3 ANG MO KIO Ang Mo Kio~ 0_to_4 Males HDB 4-Room Flats 10 2019
## 4 ANG MO KIO Ang Mo Kio~ 0_to_4 Males HDB 5-Room and Exe~ 20 2019
## 5 ANG MO KIO Ang Mo Kio~ 0_to_4 Males HUDC Flats (exclud~ 0 2019
## 6 ANG MO KIO Ang Mo Kio~ 0_to_4 Males Landed Properties 0 2019
## 7 ANG MO KIO Ang Mo Kio~ 0_to_4 Males Condominiums and O~ 50 2019
## 8 ANG MO KIO Ang Mo Kio~ 0_to_4 Males Others 0 2019
## 9 ANG MO KIO Ang Mo Kio~ 0_to_4 Femal~ HDB 1- and 2-Room ~ 0 2019
## 10 ANG MO KIO Ang Mo Kio~ 0_to_4 Femal~ HDB 3-Room Flats 10 2019
## # ... with 98,182 more rows, and 1 more variable: AgeGroup_2 <chr>
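Each row of pop_data is one combination of subzone, age group, gender and dwelling type, so a bar for a given age group is stacked from many small rows. One option, shown here only as a sketch on made-up numbers, is to sum the population per planning area, age group and gender before plotting. The tibble toy_pop and the name pop_summary are hypothetical and not part of the original analysis.

```r
library(dplyr)

# Toy rows standing in for pop_data (values are made up for illustration)
toy_pop <- tibble(
  PlanningArea = c("ANG MO KIO", "ANG MO KIO", "ANG MO KIO", "BEDOK"),
  AgeGroup_2   = c("0_to_4", "0_to_4", "0_to_4", "0_to_4"),
  Gender       = c("Males", "Males", "Females", "Males"),
  Population   = c(10, 20, 15, 40)
)

# Collapse subzone and dwelling-type detail into one total per group
pop_summary <- toy_pop %>%
  group_by(PlanningArea, AgeGroup_2, Gender) %>%
  summarise(Population = sum(Population), .groups = "drop")

pop_summary
```

Plotting the summarised table should give the same bar lengths as the stacked rows, while giving ggplot2 far fewer rectangles to draw.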
ggplot(pop_data, aes(AgeGroup_2, Population, fill = Gender)) +
  # geom_col() always uses stat = "identity", so it takes no stat argument
  geom_col(width = 1) +
  coord_flip()
Plot all the charts with the geofacet library. This places each planning area's chart at a grid position that approximates its actual location on the Singapore map.
library(geofacet)
ggplot(pop_data, aes(x = AgeGroup_2, y = Population, fill = Gender)) +
  # labs() names aesthetics, not screen positions: after coord_flip() the
  # x aesthetic (age group) is drawn on the vertical axis
  labs(title = "Demographic of Population by Planning Area",
       caption = "Source: Singstat", x = "Age Group", y = "Population") +
  # geom_col() always uses stat = "identity", so it takes no stat argument
  geom_col() +
  theme(axis.title.x = element_text(size = 50),
        axis.title.y = element_text(size = 50),
        title = element_text(size = 80),
        axis.text.x = element_text(size = 10),
        legend.text = element_text(size = 50),
        strip.text.x = element_text(size = 30, face = "bold")) +
  coord_flip() +
  #facet_wrap(~ Planning Area)
  facet_geo(~ PlanningArea, grid = "sg_planning_area_grid1", label = "name")
## Some values in the specified facet_geo column 'PlanningArea' do not
## match the 'name' column of the specified grid and will be removed:
## NORTH-EASTERN ISLANDS
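The message above shows that one planning area is still spelled differently in the data and in the grid. A small sketch of how such mismatches could be listed and reconciled before plotting is given below. The vectors data_names and grid_names are hypothetical stand-ins for unique(pop_data$PlanningArea) and the upper-cased name column of the grid, and renaming the grid entry is only one possible fix.

```r
# Hypothetical stand-ins for the planning area names in the data and the grid
data_names <- c("ANG MO KIO", "BEDOK", "NORTH-EASTERN ISLANDS")
grid_names <- c("ANG MO KIO", "BEDOK", "NORTH EASTERN ISLANDS")

# Names present in the data but missing from the grid
unmatched <- setdiff(data_names, grid_names)
unmatched

# One possible fix: rename the grid entry to use the data's spelling
grid_names_fixed <- ifelse(grid_names == "NORTH EASTERN ISLANDS",
                           "NORTH-EASTERN ISLANDS",
                           grid_names)

# An empty result means every planning area in the data now has a grid cell
setdiff(data_names, grid_names_fixed)
```

Running a check like this against both grids before calling facet_geo() would show in advance which planning areas would be dropped from the plot.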