Overview
In this visualization, we explore only the 2019 data and aim to answer the following questions:
What is the overall demographic structure broken down by age cohort?
What is the demographic structure broken down by planning area?
Do different genders show a preference for the area they live in?
Major Data and Design Challenges
Key Challenges
1. Data Challenge - Missing Data
One data challenge is the large number of missing entries.
The following planning areas have no population numbers. We treat this as an error in the dataset, as there should be at least some residents living in each of these areas:
| Planning Areas with Missing Data | | |
|---|---|---|
| North-Eastern Islands | Tuas | Boon Lay |
| Western Islands | Changi Bay | Paya Lebar |
| Marina East | Pioneer | Central Water Catchment |
| Tengah | Straits View | Simpang |
| Marina South | | |
The following type of dwelling has no population numbers. We treat this as an error in the dataset, as there should be at least some residents living in this type of dwelling:
| Type of Dwelling with Missing Data |
|---|
| HUDC Flats (excluding those privatised) |
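One way to surface such gaps before cleaning is to total the counts per category and list the categories that sum to zero. The sketch below uses a small made-up data frame; the column name `PA` (planning area) is an assumption, since the raw column names other than `Count` are not shown here.

```r
# Toy data: made-up counts, not figures from the real dataset.
# `PA` is a hypothetical column name for the planning area.
toy <- data.frame(
  PA    = c("Tuas", "Tuas", "Bedok", "Bedok", "Simpang"),
  Count = c(0, 0, 120, 80, 0)
)

# Total population per planning area
totals <- aggregate(Count ~ PA, data = toy, FUN = sum)

# Planning areas whose total is zero are candidates for removal
empty_areas <- as.character(totals$PA[totals$Count == 0])
print(empty_areas)

# Keep only rows from planning areas that have residents
cleaned <- toy[!toy$PA %in% empty_areas, ]
```

The same pattern would apply to the dwelling type column by swapping the grouping variable.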
2. Design Challenge - Large dataset that is hard to visualize
There are a total of 55 planning areas and 323 subzones, which makes this dataset difficult to visualize. Using the 55 planning areas as clusters is not feasible, because users would struggle to understand what is being shown. With that many clusters, it would also be hard for users to draw valuable insights.
Suggestions to Overcome Challenges
| No. | Challenge | Proposed Solution |
|---|---|---|
| 1 | Missing Data | Clean the data and remove the rows that have no population count |
| 2 | Huge Data Set and Huge Cluster Size | Break the planning areas down into regions (Central, East, North, North-East and West) and visualize the data by region, which gives a more manageable cluster size than planning area or subzone. To get the region information, download the URA Master Plan subzone boundary in shapefile format (i.e. MP14_SUBZONE_WEB_PL) from Data.gov.sg, then map each planning area to its region. |
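The mapping step amounts to a join between the population table and a planning-area-to-region lookup. A minimal sketch with a hand-typed lookup follows; the column names `PA` and `Region` and all counts are illustrative assumptions, and in practice the lookup would be derived from the attribute table of MP14_SUBZONE_WEB_PL.

```r
# Hypothetical lookup: planning area -> region
lookup <- data.frame(
  PA     = c("Bedok", "Tampines", "Woodlands"),
  Region = c("East", "East", "North")
)

# Made-up population rows keyed by planning area
pop <- data.frame(
  PA    = c("Bedok", "Tampines", "Woodlands", "Woodlands"),
  Count = c(100, 150, 90, 60)
)

# Left join: keep every population row and attach its region
pop_region <- merge(pop, lookup, by = "PA", all.x = TRUE)

# Any planning area missing from the lookup would show up as NA here
stopifnot(!anyNA(pop_region$Region))

# Population per region
region_totals <- aggregate(Count ~ Region, data = pop_region, FUN = sum)
print(region_totals)
```

Checking for unmatched rows right after the join helps catch spelling differences between the two sources.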
Proposed Design
Step-by-Step Guide
Step 1: Load Packages
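The code in the following steps calls `filter()`, `group_by()`, `summarise()` and the `%>%` pipe, along with `ggplot()`. The original package list is not shown here, so the block below is an assumed minimal set inferred from those calls, and it presumes both packages are already installed.

```r
# Assumed package set, inferred from the functions used in later steps
library(dplyr)    # filter(), group_by(), summarise(), %>%
library(ggplot2)  # ggplot(), geom_bar(), facet_grid(), scale_*()
```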
Step 2: Read Data
Please note that the data has been cleaned in Excel and that the entries without a population count mentioned earlier have been removed. In addition, we create a variable called ‘levelorder’ so that we can arrange the age groups in ascending order later.
data <- read.csv("C:/Users/think/Desktop/1-SMU Term 3/5-Visual Analytics and Applications/Assignment 4/2019data.csv", header = TRUE)
# Age groups in ascending order, used later to order the Age axis
levelorder <- c('0_to_4', '5_to_9', '10_to_14', '15_to_19', '20_to_24', '25_to_29',
                '30_to_34', '35_to_39', '40_to_44', '45_to_49', '50_to_54', '55_to_59',
                '60_to_64', '65_to_69', '70_to_74', '75_to_79', '80_to_84', '85_to_89', '90_and_over')
Step 3: Visualize overall population structure
First, we have to prepare the data using the following code:
# Data extraction for females: total count per age group
dataF <- filter(data, Gender == "Females")
dataF <- aggregate(Count ~ Age, dataF, sum)
dataF$Gender <- "Female"
# Data extraction for males: total count per age group
dataM <- filter(data, Gender == "Males")
dataM <- aggregate(Count ~ Age, dataM, sum)
dataM$Gender <- "Male"
# Combine the two data sets
data_pyramid <- rbind(dataF, dataM)
Then, simply plot the population pyramid using this code:
# Male counts are negated so that the two genders mirror each other around zero
g1c <- ggplot(data_pyramid, aes(x = factor(Age, levelorder), fill = Gender,
                                y = ifelse(test = Gender == "Male", yes = -Count, no = Count))) +
  geom_bar(stat = "identity") +
  # Show absolute values on the axis and keep it symmetric around zero
  scale_y_continuous(labels = abs, limits = max(data_pyramid$Count) * c(-1, 1)) +
  labs(title = "Overview of Singapore Population", x = "Age", y = "Population") +
  coord_flip() +
  scale_fill_brewer(palette = "Set1") +
  theme_minimal()
# Display the plot
g1c
Step 4: Visualize population size in each Region broken down by Age
Simply plot the graph using this code:
data %>%
  group_by(Age, Gender, Region) %>%
  summarise(Population = sum(Count)) %>%
  ggplot(aes(x = factor(Age, levelorder), y = Population, fill = Gender)) +
  geom_bar(stat = "identity") +
  labs(title = "Population Size in each region", x = "Age", y = "Population") +
  coord_flip() +
  scale_fill_brewer(palette = "Set1") +
  scale_y_continuous(breaks = NULL) +
  theme_minimal() +
  facet_grid(~ Region)
Step 5: Visualize population distribution in each Region broken down by Gender
First, use a bar chart to visualize the population size in each region, broken down by gender.
data %>%
  # Age is not needed here: the bars only distinguish Region and Gender
  group_by(Region, Gender) %>%
  summarise(Population = sum(Count)) %>%
  ggplot(aes(x = Region, y = Population, fill = Gender)) +
  geom_bar(stat = "identity") +
  labs(title = "Number of Males and Females in each Region", x = "Region", y = "Population") +
  scale_fill_brewer(palette = "Set1") +
  theme_minimal()
Then, break it down further based on population demographics.
data %>%
  group_by(Age, Gender, Region) %>%
  summarise(Population = sum(Count)) %>%
  ggplot(aes(x = factor(Age, levelorder), y = Population, fill = Gender)) +
  geom_bar(stat = "identity") +
  # The x axis holds the age groups; Region and Gender are shown as facets
  labs(title = "Population Distribution breakdown by Gender and Region", x = "Age", y = "Population") +
  scale_fill_brewer(palette = "Set1") +
  scale_x_discrete(breaks = NULL) +
  theme_minimal() +
  facet_grid(Gender ~ Region)
Final Visualization
Insight 1: Aging Population
The shape of the population pyramid indicates that Singapore is facing an aging population. According to the World Bank, Singapore’s fertility rate was approximately 1.14 births per woman in 2019, which is consistent with an aging population.
Insight 2: Fewer residents in the North and East regions due to fewer planning areas
Based on the graph below, the population sizes of the North and East regions are smaller. This is because fewer planning areas are allocated to the North and East regions, meaning they cover a smaller segment of the Singapore map.
It is also important to note that although the Central region has the largest number of planning areas allocated to it, its population size is similar to that of the North-East and West regions. This is because the Central region is dominated by office and retail buildings. Furthermore, residential property in the Central region is expensive, so fewer people can afford to live there.
Below is a summary of the number of planning areas allocated to each region.
| Region | Number of Planning Areas |
|---|---|
| Central | 33744 |
| East | 6688 |
| North | 7296 |
| North-East | 10640 |
| West | 12768 |
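The figures above are far larger than the 55 planning areas mentioned earlier, so they may be row counts from the data file rather than counts of distinct planning areas. A sketch of how the two differ, and how a distinct count per region could be obtained, is shown below using made-up rows and an assumed column name `PA`.

```r
# Toy rows: made-up counts; `PA` is a hypothetical column name
toy <- data.frame(
  Region = c("East", "East", "East", "North", "North"),
  PA     = c("Bedok", "Bedok", "Tampines", "Woodlands", "Woodlands"),
  Count  = c(10, 20, 30, 40, 50)
)

# Row counts per region: inflated because each planning area appears many times
rows_per_region <- table(toy$Region)
print(rows_per_region)

# Distinct planning areas per region
pa_per_region <- tapply(toy$PA, toy$Region, function(x) length(unique(x)))
print(pa_per_region)
```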
Insight 3: No gender preference in where residents stay
Neither gender shows a preference for any particular region. Each region has an approximately equal number of males and females. Furthermore, the population distributions of males and females in each region mirror each other.
Below is a summary of the number of males and females in each region.
| Gender | Central | East | North | North-East | West |
|---|---|---|---|---|---|
| Female | 480,370 | 351,630 | 289,040 | 473,290 | 466,390 |
| Male | 444,460 | 335,360 | 285,670 | 450,830 | 456,380 |
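To make the comparison concrete, the regional totals and the female share can be worked out directly from the counts in the table above. This is a small arithmetic sketch; the variable names are illustrative.

```r
# Counts copied from the summary table above
regions <- c("Central", "East", "North", "North-East", "West")
female  <- c(480370, 351630, 289040, 473290, 466390)
male    <- c(444460, 335360, 285670, 450830, 456380)

# Total residents and percentage of females per region
total        <- female + male
female_share <- round(100 * female / total, 1)

summary_tbl <- data.frame(Region = regions, Total = total,
                          FemalePct = female_share)
print(summary_tbl)
```

With these figures the female share stays between roughly 50% and 52% in every region, and the North and East totals are visibly smaller than those of the other three regions.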