Assignment-4-Submission.utf8.md

1. Overview

This project aims to visualise the trending of demographic structure of Singapore population by age cohort and by planning area from 2011 to 2019. The data used was published on Singapore Department of Statistics Singstat.

2. Challenges and Proposed Sketch

2.1 Major Data and Design challenges and proposed solutions

Challenge A) There were large number of meaningless data from the dataset, there were 883,728 rows, of which 625,919 rows were records with zero (0) population count.
Solution A) Removed the rows with 0 population count before loading data into R.

Challenge B) There were 55 Planning areas in the dataset, due to the large number of planning areas variables, the charts might look very cluttered in certain chart types.
Solution B) A separate column ‘Region’ with only 5 variables is created with reference to the data from Wikipedia

Challenge C) The demographic characteristics was named using abbreviation such as PA,SZ,AG etc, for visualisation there will be a need to use column names that are understandable to all. Also within the column AG (Age), the variable from 5 to 9 years old was recorded as 5_to_9, this had resulted in error in the values order.
Solution C) Change the abbreviation by renaming the column names and manually indicate the desired_order for the Age variables.

Challenge D) Due to being new to this app, constantly faced with many error messages and some has no real solution such as “Error: attempt to use zero-length variable name”
Solution D) Need to play around with the app more and attempt to change the visualisation many times (this is the real challenge)

2.2 Proposed Sketch Design

alt text

3. Step-by-step descriptions

3.1 Installation of Packages

packages <- c('tidyverse','ggplot2')

for (p in packages) {
  if (!require (p,character.only = T)){
    install.packages(p)
  }
  library(p,character.only=T)
}

## Loading required package: tidyverse

## -- Attaching packages ----------------------------------- tidyverse 1.2.1 --

## v ggplot2 3.2.1     v purrr   0.3.2
## v tibble  2.1.3     v dplyr   0.8.3
## v tidyr   1.0.0     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0

## -- Conflicts -------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

3.2 Rename the columns variables and fix the Age_Group data error

data <- read.csv ("respopagesextod2011to2019edited.csv")
head(data)

##        ï..PA                     SZ     AG   Sex
## 1 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males
## 2 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males
## 3 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males
## 4 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males
## 5 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males
## 6 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males
##                                       TOD Pop Time     Region
## 1                 HDB 1- and 2-Room Flats   0 2011 North-East
## 2                        HDB 3-Room Flats  10 2011 North-East
## 3                        HDB 4-Room Flats  30 2011 North-East
## 4          HDB 5-Room and Executive Flats  50 2011 North-East
## 5 HUDC Flats (excluding those privatised)   0 2011 North-East
## 6                       Landed Properties   0 2011 North-East

names(data)[names(data) == "PA"] <- "Planning_Area"
names(data)[names(data) == "SZ"] <- "Subzone"
names(data)[names(data) == "AG"] <- "Age_Group"
names(data)[names(data) == "TOD"] <- "Housing"
names(data)[names(data) == "Pop"] <- "Population"
names(data)[names(data) == "Time"] <- "Year"
names(data)[names(data) == "Region"] <- "Region"
data <-as.data.frame(data)
head(data)

##        ï..PA                Subzone Age_Group   Sex
## 1 Ang Mo Kio Ang Mo Kio Town Centre    0_to_4 Males
## 2 Ang Mo Kio Ang Mo Kio Town Centre    0_to_4 Males
## 3 Ang Mo Kio Ang Mo Kio Town Centre    0_to_4 Males
## 4 Ang Mo Kio Ang Mo Kio Town Centre    0_to_4 Males
## 5 Ang Mo Kio Ang Mo Kio Town Centre    0_to_4 Males
## 6 Ang Mo Kio Ang Mo Kio Town Centre    0_to_4 Males
##                                   Housing Population Year     Region
## 1                 HDB 1- and 2-Room Flats          0 2011 North-East
## 2                        HDB 3-Room Flats         10 2011 North-East
## 3                        HDB 4-Room Flats         30 2011 North-East
## 4          HDB 5-Room and Executive Flats         50 2011 North-East
## 5 HUDC Flats (excluding those privatised)          0 2011 North-East
## 6                       Landed Properties          0 2011 North-East

summary(data)

##            ï..PA                   Subzone          Age_Group     
##  Bukit Merah  : 46512   Admiralty      :  2736   0_to_4  : 46512  
##  Queenstown   : 41040   Airport Road   :  2736   10_to_14: 46512  
##  Ang Mo Kio   : 32832   Alexandra Hill :  2736   15_to_19: 46512  
##  Downtown Core: 32832   Alexandra North:  2736   20_to_24: 46512  
##  Toa Payoh    : 32832   Aljunied       :  2736   25_to_29: 46512  
##  Hougang      : 27360   Anak Bukit     :  2736   30_to_34: 46512  
##  (Other)      :670320   (Other)        :867312   (Other) :604656  
##       Sex                                            Housing      
##  Females:441864   Condominiums and Other Apartments      :110466  
##  Males  :441864   HDB 1- and 2-Room Flats                :110466  
##                   HDB 3-Room Flats                       :110466  
##                   HDB 4-Room Flats                       :110466  
##                   HDB 5-Room and Executive Flats         :110466  
##                   HUDC Flats (excluding those privatised):110466  
##                   (Other)                                :220932  
##    Population           Year             Region      
##  Min.   :   0.00   Min.   :2011   Central   :366624  
##  1st Qu.:   0.00   1st Qu.:2013   East      : 82080  
##  Median :   0.00   Median :2015   North     :112176  
##  Mean   :  39.83   Mean   :2015   North-East:131328  
##  3rd Qu.:  10.00   3rd Qu.:2017   West      :191520  
##  Max.   :2860.00   Max.   :2019                      
##

data$Age_Group <- factor(data$Age_Group, levels=c("0_to_4","5_to_9","10_to_14","15_to_19","20_to_24","25_to_29","30_to_34","35_to_39","40_to_44","45_to_49","50_to_54","55_to_59","60_to_64","65_to_69", "70_to_74", "75_to_79","80_to_84","85_to_89","90_and_over"))
levels(data$Age_Group)

##  [1] "0_to_4"      "5_to_9"      "10_to_14"    "15_to_19"    "20_to_24"   
##  [6] "25_to_29"    "30_to_34"    "35_to_39"    "40_to_44"    "45_to_49"   
## [11] "50_to_54"    "55_to_59"    "60_to_64"    "65_to_69"    "70_to_74"   
## [16] "75_to_79"    "80_to_84"    "85_to_89"    "90_and_over"

3.3 Pyramid population chart for 2011 and 2019

As we only want to compare the data in 2011 and 2019, therefore we use the filter () function to select the relevant year data.

data_py2019 <- data %>%
  filter(Year==2019)
data_final2019 <- aggregate(Population~Age_Group+Sex,data=data_py2019,FUN=sum)
  
head(data_final2019)

##   Age_Group     Sex Population
## 1    0_to_4 Females      90850
## 2    5_to_9 Females      97040
## 3  10_to_14 Females     102550
## 4  15_to_19 Females     108910
## 5  20_to_24 Females     122480
## 6  25_to_29 Females     145960

data_py2011 <- data %>%
  filter(Year==2011)
data_final2011 <- aggregate(Population~Age_Group+Sex,data=data_py2011,FUN=sum)
  
head(data_final2011)

##   Age_Group     Sex Population
## 1    0_to_4 Females      92590
## 2    5_to_9 Females     102180
## 3  10_to_14 Females     117300
## 4  15_to_19 Females     128240
## 5  20_to_24 Females     127010
## 6  25_to_29 Females     135680

options(scipen=999)
ggplot(data_final2019, aes(x = Age_Group, fill = Sex,
                 y = ifelse(test = Sex == "Males",
                            yes = -Population, no = Population))) + 
  geom_bar(stat = "identity") +
  scale_y_continuous(labels = abs, limits = max(data_final2019$Population) * c(-1,1)) +theme_minimal()+
  coord_flip() + labs(title="Age-Sex Population pyramid for 2019", 
       caption="Source: Singapore Department of Statistics", x = "Age Group", y = "Population")

options(scipen=999)
ggplot(data_final2011, aes(x = Age_Group, fill = Sex,
                 y = ifelse(test = Sex == "Males",
                            yes = -Population, no = Population))) + 
  geom_bar(stat = "identity") +
  scale_y_continuous(labels = abs, limits = max(data_final2011$Population) * c(-1,1)) +theme_minimal()+
  coord_flip() + labs(title="Age-Sex Population pyramid for 2011", 
       caption="Source: Singapore Department of Statistics", x = "Age Group", y = "Population")

3.4 Adding the Region variables to the Pyramid population chart for 2011 and 2019

As we are also interesting to know the pattern across the different planning area, we added the region variables

data_region2019 <- data_py2019 %>%
  group_by(Age_Group, Sex, Region) %>%
  summarise(Population = sum(Population))
  ggplot(data_region2019, aes(x = Age_Group, fill = Sex, 
                               y = ifelse(test = Sex == "Males",
                            yes = -Population, no = Population))) + 
  geom_bar(stat = "identity") +
  coord_flip() + labs(title="Age,Sex,Region Population pyramid for 2019", 
       caption="Source: Singapore Department of Statistics and Wikipedia", x = "Age Group", y = "Population")+
      scale_y_continuous(breaks=NULL) +
   theme_minimal()+
  facet_grid(~ Region)

data_region2011 <- data_py2011 %>%
  group_by(Age_Group, Sex, Region) %>%
  summarise(Population = sum(Population))
  ggplot(data_region2011, aes(x = Age_Group, fill = Sex, 
                               y = ifelse(test = Sex == "Males",
                            yes = -Population, no = Population))) + 
  geom_bar(stat = "identity") +
  coord_flip() + labs(title="Age,Sex,Region Population pyramid for 2011", 
       caption="Source: Singapore Department of Statistics and Wikipedia", x = "Age Group", y = "Population")+
      scale_y_continuous(breaks=NULL) +
   theme_minimal()+
  facet_grid(~ Region)

4. Useful Information revealed by the data visualisation

Based on the Age-Sex population pyramid, we are able to tell in 2011, there were lesser people who are 90 and above, while this has increased quite a fair bit by 2019. This means that Singaporean could be leading a longer life. Also from the change in shape, we are able to tell that in 2011, the highest population age range is about the 45 to 49 years while in 2019 this has become the 55 to 59 years. This could mean thar in another 10 years or less, the bulk of the population will be the aging population. This could be a warning sign to Singaporean.
Based on the additional factor of Region, the shape of of the population by region is quite inline with the insight mentioned earlier. The point that is worth noting will be that in the north-east region, there seems to be a spike in the overall population as well as increase in younger age as well as young adult in the mid 30s. This is inline with the recents years of development in Sengkang, Punggol and hougang.

Sources
SingStat:(https://www.singstat.gov.sg/find-data/search-by-theme/population/geographic-distribution/latest-data)
Wikipedia:(https://en.wikipedia.org/wiki/Planning_Areas_of_Singapore)

Singapore Population Demographics Trending from 2011 to 2019