Student Name: Zhao Tianyun
Student ID: 01368190
The raw data consists of Singapore Residents by Planning Area Subzone, Age Group, Sex and Type of Dwelling, from 2011 to 2020. This markdown file aims to reveal the demographic structure of the Singapore population by age cohort and by planning area in 2019.
The objectives are to show the population structure (age and sex) of Singapore, to visualize the population distribution among planning areas, and to uncover the population structure within the planning areas.
The major challenge is to tidy up and wrangle the data set properly. The data set includes type of dwelling and is categorized by planning area, subzone, age group, sex, and type of dwelling. I will need to tidy up the data set and remove information that is not necessary for my purpose. Another challenge is that the age cohorts are too finely divided, which makes it difficult to see the bigger picture. Thus, I will group the age groups into three categories, namely the young, the economically active, and the aged, as this will help me to visualize underlying patterns. After readying the data set for visualization, the remaining challenge is to come up with plots that properly serve my purpose. I propose to construct a population pyramid, a bar chart of total population by planning area, and bar charts of the different age groups by planning area. You may refer to the picture below for my proposed design.
A caption
Now, we will go through the data wrangling and visualization steps.
## Loading required packages
packages = c('tidyverse', 'lemon')
for (p in packages){
  # Install the package if it is missing, then attach it
  if(!require(p, character.only = TRUE)){
    install.packages(p)
    library(p, character.only = TRUE)
  }
}
## Loading required package: tidyverse
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2 v purrr 0.3.4
## v tibble 3.0.3 v dplyr 1.0.2
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.5.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## Loading required package: lemon
## Warning: package 'lemon' was built under R version 4.0.3
##
## Attaching package: 'lemon'
## The following object is masked from 'package:purrr':
##
## %||%
## The following objects are masked from 'package:ggplot2':
##
## CoordCartesian, element_render
pop <- read_csv('data/respopagesextod2011to2020.csv')
## Parsed with column specification:
## cols(
## PA = col_character(),
## SZ = col_character(),
## AG = col_character(),
## Sex = col_character(),
## TOD = col_character(),
## Pop = col_double(),
## Time = col_double()
## )
head(pop, 10)
## # A tibble: 10 x 7
## PA SZ AG Sex TOD Pop Time
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 1- and 2-Room Flats 0 2011
## 2 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 3-Room Flats 10 2011
## 3 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 4-Room Flats 30 2011
## 4 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 5-Room and Executive~ 50 2011
## 5 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HUDC Flats (excluding th~ 0 2011
## 6 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males Landed Properties 0 2011
## 7 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males Condominiums and Other A~ 40 2011
## 8 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males Others 0 2011
## 9 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Femal~ HDB 1- and 2-Room Flats 0 2011
## 10 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Femal~ HDB 3-Room Flats 10 2011
summary(pop)
## PA SZ AG Sex
## Length:984656 Length:984656 Length:984656 Length:984656
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## TOD Pop Time
## Length:984656 Min. : 0.00 Min. :2011
## Class :character 1st Qu.: 0.00 1st Qu.:2013
## Mode :character Median : 0.00 Median :2016
## Mean : 39.86 Mean :2016
## 3rd Qu.: 10.00 3rd Qu.:2018
## Max. :2860.00 Max. :2020
pop2019 <- pop %>% filter(`Time` == 2019)
pop2019
## # A tibble: 98,192 x 7
## PA SZ AG Sex TOD Pop Time
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 1- and 2-Room Flats 0 2019
## 2 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 3-Room Flats 10 2019
## 3 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 4-Room Flats 10 2019
## 4 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HDB 5-Room and Executive~ 20 2019
## 5 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males HUDC Flats (excluding th~ 0 2019
## 6 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males Landed Properties 0 2019
## 7 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males Condominiums and Other A~ 50 2019
## 8 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Males Others 0 2019
## 9 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Femal~ HDB 1- and 2-Room Flats 0 2019
## 10 Ang Mo K~ Ang Mo Kio Town~ 0_to~ Femal~ HDB 3-Room Flats 10 2019
## # ... with 98,182 more rows
popage <- pop2019 %>%
group_by(AG, Sex) %>%
summarise(sum = sum(Pop))
## `summarise()` regrouping output by 'AG' (override with `.groups` argument)
popage
## # A tibble: 38 x 3
## # Groups: AG [19]
## AG Sex sum
## <chr> <chr> <dbl>
## 1 0_to_4 Females 90850
## 2 0_to_4 Males 94730
## 3 10_to_14 Females 102550
## 4 10_to_14 Males 105830
## 5 15_to_19 Females 108910
## 6 15_to_19 Males 113730
## 7 20_to_24 Females 122480
## 8 20_to_24 Males 127040
## 9 25_to_29 Females 145960
## 10 25_to_29 Males 142640
## # ... with 28 more rows
poppa <- pop2019 %>%
group_by(PA) %>%
summarise(sum = sum(Pop))
## `summarise()` ungrouping output (override with `.groups` argument)
poppa
## # A tibble: 55 x 2
## PA sum
## <chr> <dbl>
## 1 Ang Mo Kio 164430
## 2 Bedok 279970
## 3 Bishan 88230
## 4 Boon Lay 0
## 5 Bukit Batok 154140
## 6 Bukit Merah 152600
## 7 Bukit Panjang 139700
## 8 Bukit Timah 77720
## 9 Central Water Catchment 0
## 10 Changi 1790
## # ... with 45 more rows
# Keep only 2019 rows whose age group is in the aged cohort (%in% tests membership; == would recycle the vector)
pop_aged <- pop2019 %>%
  filter(AG %in% c("65_to_69", "70_to_74", "75_to_79", "80_to_84", "85_to_89", "90_and_over"))
pop_aged <- pop_aged %>%
group_by(PA) %>%
summarise(Pop_sum = sum(Pop))
## `summarise()` ungrouping output (override with `.groups` argument)
# Keep only 2019 rows whose age group is in the economically active cohort
pop_active <- pop2019 %>%
  filter(AG %in% c("25_to_29", "30_to_34", "35_to_39", "40_to_44", "45_to_49", "50_to_54", "55_to_59", "60_to_64"))
pop_active <- pop_active %>%
group_by(PA) %>%
summarise(Pop_sum = sum(Pop))
## `summarise()` ungrouping output (override with `.groups` argument)
# Keep only 2019 rows whose age group is in the young cohort
pop_young <- pop2019 %>%
  filter(AG %in% c("0_to_4", "5_to_9", "10_to_14", "15_to_19", "20_to_24"))
pop_young <- pop_young %>%
group_by(PA) %>%
summarise(Pop_sum = sum(Pop))
## `summarise()` ungrouping output (override with `.groups` argument)
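When several age groups are selected at once, the choice of comparison operator matters. The following is a minimal sketch on a toy vector (the names `ag` and `young` are illustrative and not part of the analysis) of the difference between `==`, which recycles the shorter vector and compares position by position, and `%in%`, which tests each element for membership.

```r
# Toy vector standing in for the AG column; labels follow the data set's naming
ag <- c("0_to_4", "5_to_9", "10_to_14", "0_to_4", "5_to_9", "10_to_14")
young <- c("0_to_4", "5_to_9")

# `==` recycles `young` along `ag`, so only rows that happen to line up match
by_equals <- ag == young

# `%in%` checks every element of `ag` against the whole set in `young`
by_in <- ag %in% young

sum(by_equals)  # number of rows kept by ==
sum(by_in)      # number of rows kept by %in%
```

In this toy case `==` keeps fewer rows than `%in%`, because matching rows are dropped whenever the recycled vector is out of step with the data.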
v1 <- popage %>%
arrange(sum) %>%
mutate(AG = factor(AG, levels=c("0_to_4", "5_to_9", "10_to_14", "15_to_19", "20_to_24", "25_to_29", "30_to_34", "35_to_39", "40_to_44", "45_to_49", "50_to_54", "55_to_59", "60_to_64", "65_to_69", "70_to_74", "75_to_79", "80_to_84", "85_to_89", "90_and_over"))) %>%
ggplot(
aes(x = AG, y = ifelse(Sex == "Males", -sum, sum), fill = Sex)) +
geom_bar(stat = "identity") +
scale_y_continuous(breaks = seq(-200000, 200000, 50000)) +
ggtitle("Population Pyramid of Singapore") +
# AG is mapped to x and the count to y; coord_flip() only swaps where they are drawn
xlab("Age") + ylab("Count") +
coord_flip()
v1
v2 <- poppa %>%
ggplot(aes(x = reorder(PA, -sum), y = sum)) +
geom_bar(stat = "identity") +
scale_x_discrete(guide = guide_axis(n.dodge=8)) +
ggtitle("Population by planning area") +
xlab("Planning Area") + ylab("Population") +
coord_cartesian(ylim = c(0,300000))
v2
aged <- pop_aged %>%
ggplot(aes(x = reorder(PA, -Pop_sum), y = Pop_sum)) +
geom_bar(stat = "identity") +
scale_x_discrete(guide = guide_axis(n.dodge=8))
aged
active <- pop_active %>%
ggplot(aes(x = reorder(PA, -Pop_sum), y = Pop_sum)) +
geom_bar(stat = "identity") +
scale_x_discrete(guide = guide_axis(n.dodge=8))
active
young <- pop_young %>%
ggplot(aes(x = reorder(PA, -Pop_sum), y = Pop_sum)) +
geom_bar(stat = "identity") +
scale_x_discrete(guide = guide_axis(n.dodge=8))
young
The first visualization is an age/sex pyramid summed over all planning areas. In this plot, we will be able to see the share of the population in each age cohort and the gender composition of each cohort.
The second visualization is a bar chart showing the total population of planning areas, sorted in descending order. We will be able to rank the planning areas by population.
The third visualization consists of three plots. The age cohorts are grouped into three segments: ages 0 to 24 form the young group, ages 25 to 64 form the economically active group, and ages 65 and above form the aged group. From these three plots, we will be able to see the different age structures within different planning areas.
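As an alternative to building three separate tables, the three segments could be assigned in one pass. This is only a sketch under stated assumptions: the helper `to_cohort()`, the column name `Cohort`, and the toy rows with made-up counts and placeholder planning areas "A" and "B" are hypothetical and not part of the original analysis.

```r
library(dplyr)

# Hypothetical helper: map an AG label to one of the three segments
to_cohort <- function(ag) {
  young  <- c("0_to_4", "5_to_9", "10_to_14", "15_to_19", "20_to_24")
  active <- c("25_to_29", "30_to_34", "35_to_39", "40_to_44",
              "45_to_49", "50_to_54", "55_to_59", "60_to_64")
  aged   <- c("65_to_69", "70_to_74", "75_to_79", "80_to_84",
              "85_to_89", "90_and_over")
  case_when(
    ag %in% young  ~ "Young",
    ag %in% active ~ "Economically active",
    ag %in% aged   ~ "Aged",
    TRUE           ~ NA_character_  # unknown labels are flagged, not guessed
  )
}

# Toy rows with made-up counts, only to show the shape of the result
toy <- tibble(
  PA  = c("A", "A", "A", "B", "B"),
  AG  = c("0_to_4", "30_to_34", "70_to_74", "20_to_24", "90_and_over"),
  Pop = c(10, 40, 20, 30, 5)
)

# One row per planning area and cohort
cohort_by_pa <- toy %>%
  mutate(Cohort = to_cohort(AG)) %>%
  group_by(PA, Cohort) %>%
  summarise(Pop_sum = sum(Pop), .groups = "drop")

cohort_by_pa
```

Keeping the cohort as a column means a single table can feed a faceted plot, at the cost of one extra `mutate()` step.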
There are generally more males than females in the younger age groups, and more females than males in the older age groups. This suggests that in recent years there may have been more male than female newborns, and that females generally have a longer life expectancy than their male counterparts.
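The balance between the sexes can also be expressed as a number rather than read off the pyramid. The sketch below computes males per 100 females for two age groups, using only counts copied from the `popage` output printed earlier; the names `sex_by_age`, `sex_ratio` and `males_per_100_females` are illustrative, and the remaining age groups are omitted because their counts are not shown in the printed output.

```r
library(dplyr)
library(tidyr)

# Counts copied from the printed `popage` table (2019)
sex_by_age <- tibble(
  AG  = c("0_to_4", "0_to_4", "25_to_29", "25_to_29"),
  Sex = c("Females", "Males", "Females", "Males"),
  sum = c(90850, 94730, 145960, 142640)
)

# One row per age group, with males per 100 females
sex_ratio <- sex_by_age %>%
  pivot_wider(names_from = Sex, values_from = sum) %>%
  mutate(males_per_100_females = round(100 * Males / Females, 1))

sex_ratio
```

A value above 100 means males outnumber females in that age group, and a value below 100 means the reverse.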
Among the planning areas, Bedok, Jurong West, Tampines, Woodlands, Sengkang, Hougang, Yishun and Choa Chu Kang have the largest populations, indicating that these are major residential areas. On the other hand, planning areas such as Seletar, Paya Lebar, Pioneer, Tuas, and North-Eastern Islands have little or no population, indicating that these planning areas are not planned for residential use, but for other uses such as industrial parks, military grounds or nature reserves.
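The largest and the unpopulated planning areas can also be listed directly instead of being read off the bar chart. The sketch below uses only the first ten rows of `poppa` as printed earlier (the full table has 55 rows), so its ranking covers that subset only; the names `poppa_head`, `largest` and `unpopulated` are illustrative.

```r
library(dplyr)

# First ten rows of `poppa` as printed earlier; not the full table
poppa_head <- tibble(
  PA  = c("Ang Mo Kio", "Bedok", "Bishan", "Boon Lay", "Bukit Batok",
          "Bukit Merah", "Bukit Panjang", "Bukit Timah",
          "Central Water Catchment", "Changi"),
  sum = c(164430, 279970, 88230, 0, 154140, 152600, 139700, 77720, 0, 1790)
)

# Three most populous areas within this subset, largest first
largest <- poppa_head %>% slice_max(sum, n = 3)

# Areas with no recorded residents
unpopulated <- poppa_head %>% filter(sum == 0)

largest
unpopulated
```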
Of all the planning areas, Bedok, Bukit Merah and Hougang have sizeable aged populations, which suggests that they are likely to be older neighborhoods. Jurong West, Sengkang and Bedok have large economically active populations, which may indicate that these neighborhoods are popular among working adults or are conveniently connected to commercial districts. Last but not least, Woodlands, Jurong West, and Bedok are home to large numbers of young residents. This may show that these planning areas are considered ideal neighborhoods for raising children, or that they have lower housing prices.