Demography of Hong Kong SAR

Hong Kong is often ranked among the most densely populated cities in the world, with a land area of only about 1,106 km² and a population of close to 7.5 million. It also faces a declining birth rate and an ageing society: it is estimated that by 2023, 26.8% of the population will be over 65 years old. The cost of living is also very high, with particular concern over the affordability of housing and childcare.

Hong Kong is divided into 18 districts for political and administrative purposes, with boundaries drawn along mountains, coastlines and roads. District Council elections return a council for each of the 18 districts. Because each district council and its priorities are distinct, it makes sense to examine Hong Kong's demography district by district. Districts also differ in their degree of urbanisation and in wealth: for example, Sham Shui Po is the poorest district in Hong Kong, with the lowest median household income of all 18 districts. Notably, Hong Kong's population is heavily concentrated along the northern and southern shores of Victoria Harbour, where the main urban areas are located.

Districts in Hong Kong

Aim

In this publication, we aim to uncover the age structure of Hong Kong's population, district by district. The data are retrieved from the 2016 Population By-census - District Profiles (Constituency Areas) dataset, via the official public platform for spatial data in Hong Kong.

The shapefile uses the following district boundaries:

geography of Hong Kong

1. Major Data and Design challenges

Major challenge | Description
Data | Large dataset with many variables (432 rows, 214 columns); we need to sieve through numerous fields to decide which ones to visualise.
Data | Numeric fields are stored as strings, e.g. " 20 000" (a space as thousands separator), "18,000" (a comma) or a bare "-", so the data must be wrangled before it can be used.
Data | Column headers are not intuitive, e.g. age_1; the source website must be consulted to work out what each variable means.
Design | The visualisation may not reflect the exact geography, since the shapefile stores constituency-area borders rather than the actual geographic area.
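The string-to-number challenge can be handled with one small conversion helper. Below is a minimal base-R sketch that assumes the three formats visible in the glimpse() output further down (spaces as in " 12 501", commas as in "18,000", and "-" for an empty cell). It also assumes "-" stands for a nil count and should become zero. parse_census_number() is a hypothetical name and does not come from the original analysis.

```r
# Hypothetical helper (not part of the original analysis): turn census
# strings such as " 12 501", "18,000" or "-" into numbers.
parse_census_number <- function(x) {
  # Remove spaces and commas used as thousands separators
  x <- gsub("[[:space:],]", "", x)
  # Assumption: "-" marks a nil count, so it is read as zero rather than NA
  x[x == "-"] <- "0"
  as.numeric(x)
}

parse_census_number(c(" 12 501", "18,000", "-", "24.1"))
# gives 12501, 18000, 0 and 24.1
```

Returning NA instead of 0 for "-" would be just as defensible if a genuinely missing value needs to look different from a zero count on the map.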

Proposed sketch design


2. Step-by-step description of how the data visualisation was prepared using tmap

Load libraries and dataset

Steps:
1. Set the working directory and load the required libraries
2. Read the Hong Kong population census dataset into df
3. View the dimensions of the dataset, along with the structure of its variables and an overview of each column and its records
4. Check for NA values
5. Read the shapefile into hkgeo

knitr::opts_chunk$set(warning = FALSE, echo = TRUE, eval = TRUE, message = FALSE, results = 'asis')
# Machine-specific path: point this at the folder holding the census CSV and the shapefile
setwd("C:/Users/X Lin/Desktop/Notes/4.1/VA/assgn 5")
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(tidyr)
library(tmap)
## Warning: package 'tmap' was built under R version 4.0.3
library(sf)
## Warning: package 'sf' was built under R version 4.0.3
## Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
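# Declare the UTF-8 encoding so the Chinese-name columns are not garbled on Windows
df <- read.csv("hong kong census.csv", fileEncoding = "UTF-8")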

dim(df)
## [1] 432 214
glimpse(df)
## Rows: 432
## Columns: 214
## $ dc_class          <chr> "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",...
## $ dc                <int> 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 1...
## $ dcca_class        <chr> "A01", "A02", "A03", "A04", "A05", "A06", "A07", ...
## $ dcca              <int> 1101, 1102, 1103, 1104, 1105, 1106, 1107, 1108, 1...
## $ dc_eng            <chr> "Central and Western ", "Central and Western ", "...
## $ ca_eng            <chr> "Chung Wan", "Mid Levels East", "Castle Road", "P...
## $ dcca_eng          <chr> "Central and Western - Chung Wan", "Central and W...
## $ dc_chi            <chr> "中西區 ", "中西區 ", "中西...
## $ ca_chi            <chr> "中環", "半山東", "衛城", "山頂", ...
## $ dcca_chi          <chr> "中西區 - 中環", "中西區 - ...
## $ t_pop             <chr> " 12 501", " 17 009", " 20 058", " 20 263", " 18 ...
## $ pop_m             <chr> " 5 892", " 7 584", " 8 402", " 8 010", " 6 837",...
## $ pop_f             <chr> " 6 609", " 9 425", " 11 656", " 12 253", " 11 19...
## $ sr                <chr> "892", "805", "721", "654", "611", "822", "866", ...
## $ age_1             <chr> " 1 024", " 1 453", " 2 433", " 2 374", " 2 015",...
## $ age_2             <chr> " 1 100", " 1 478", " 1 948", " 1 945", " 2 037",...
## $ age_3             <chr> " 3 650", " 6 170", " 6 564", " 6 717", " 5 492",...
## $ age_4             <chr> " 4 453", " 5 712", " 6 538", " 7 029", " 5 708",...
## $ age_5             <chr> " 2 274", " 2 196", " 2 575", " 2 198", " 2 780",...
## $ t_ma              <dbl> 47.4, 42.8, 42.3, 42.7, 43.0, 43.1, 43.0, 45.5, 4...
## $ ma_m              <dbl> 45.7, 44.3, 42.7, 43.8, 45.2, 44.1, 43.1, 45.1, 4...
## $ ma_f              <dbl> 48.2, 41.4, 42.0, 41.9, 42.5, 42.4, 43.0, 45.9, 4...
## $ born_hk           <chr> " 6 171", " 9 197", " 11 624", " 8 660", " 10 489...
## $ born_chi          <chr> " 2 930", " 2 332", " 2 070", " 2 003", " 2 631",...
## $ born_else         <chr> " 3 400", " 5 480", " 6 364", " 9 600", " 4 912",...
## $ ethn_chi          <chr> " 9 239", " 11 513", " 13 821", " 10 499", " 13 2...
## $ ethn_phi          <chr> "598", " 1 440", " 2 614", " 4 670", " 2 251", "6...
## $ ethn_ind          <chr> "438", "633", "545", "574", "692", "287", "330", ...
## $ ethn_wh           <chr> " 1 507", " 1 929", " 1 692", " 2 656", " 1 034",...
## $ ethn_oth          <chr> "719", " 1 494", " 1 386", " 1 864", "763", "863"...
## $ ms_nm_m           <chr> " 2 323", " 2 581", " 1 977", " 1 905", " 1 933",...
## $ ms_ma_m           <chr> " 2 871", " 3 912", " 4 982", " 4 582", " 3 895",...
## $ ms_win_m          <chr> "88", "58", "106", "78", "99", "232", "211", "163...
## $ ms_div_m          <chr> "44", "173", "111", "99", "105", "237", "136", "7...
## $ ms_sep_m          <chr> "38", "44", "16", "83", "-", "-", "27", "57", "26...
## $ ms_nm_f           <chr> " 1 570", " 2 990", " 3 135", " 3 092", " 3 021",...
## $ ms_m_f            <chr> " 3 138", " 4 858", " 5 954", " 6 796", " 5 897",...
## $ ms_win_f          <chr> "733", "439", "674", "617", "588", " 1 001", "688...
## $ ms_div_f          <chr> "545", "455", "547", "576", "415", "359", "396", ...
## $ ms_sep_f          <chr> "127", "46", "123", "61", "64", "43", "19", "50",...
## $ ul_can            <chr> " 8 117", " 10 508", " 12 210", " 8 179", " 12 21...
## $ ul_put            <chr> " 1 004", "490", "421", "590", "329", "526", "552...
## $ ul_othchi         <chr> "52", "71", "146", "97", "56", "493", "648", "416...
## $ ul_eng            <chr> " 2 647", " 4 571", " 5 357", " 9 281", " 4 200",...
## $ ul_oth            <chr> "467", "916", " 1 120", " 1 439", "597", "283", "...
## $ readchi_ablepctn  <dbl> 73.8, 69.9, 71.0, 54.4, 74.0, 80.5, 92.0, 89.2, 8...
## $ readeng_ablepctn  <dbl> 80.7, 90.3, 93.5, 95.3, 92.3, 77.2, 68.0, 74.8, 8...
## $ writechi_ablepctn <dbl> 72.3, 67.4, 70.0, 53.6, 73.0, 76.7, 89.9, 87.5, 8...
## $ writeeng_ablepctn <dbl> 79.7, 89.3, 92.2, 94.3, 91.7, 72.8, 66.5, 73.3, 7...
## $ edu_prepri        <chr> "493", "340", "396", "355", "371", " 1 054", "682...
## $ edu_pri           <chr> "998", "773", "511", "381", "828", " 1 322", " 1 ...
## $ edu_lsed          <chr> " 1 304", "906", "931", " 1 057", " 1 115", " 1 8...
## $ edu_usec          <chr> " 2 926", " 3 790", " 4 281", " 4 299", " 3 833",...
## $ edu_dip           <chr> "652", " 1 062", "911", " 1 291", " 1 269", "885"...
## $ edu_sub           <chr> "456", "690", "572", "900", "676", "880", "593", ...
## $ edu_deg           <chr> " 4 648", " 7 995", " 10 023", " 9 606", " 7 925"...
## $ pls_same          <chr> "646", "824", " 1 276", "840", " 1 577", " 1 359"...
## $ pls_diff_hk       <chr> "566", "838", " 1 191", " 1 164", "900", "925", "...
## $ pls_diff_kln      <chr> "160", "217", "225", "331", "216", "247", "380", ...
## $ pls_diff_nt       <chr> "63", "110", "126", "63", "112", "99", "148", "15...
## $ s_diff_oth        <chr> "28", "43", "30", "61", "23", "18", "11", "-", "5...
## $ t_lf              <chr> " 7 886", " 10 760", " 12 009", " 12 853", " 10 5...
## $ lf_m              <chr> " 3 867", " 5 013", " 5 287", " 5 236", " 4 029",...
## $ lf_f              <chr> " 4 019", " 5 747", " 6 722", " 7 617", " 6 568",...
## $ lfpr_t            <dbl> 68.7, 69.2, 68.1, 71.8, 66.2, 62.1, 65.0, 65.1, 6...
## $ lfpr_m            <dbl> 72.1, 74.1, 73.5, 77.6, 66.8, 68.2, 70.6, 70.4, 6...
## $ lfpr_f            <dbl> 65.7, 65.4, 64.4, 68.4, 65.8, 57.3, 60.0, 60.2, 6...
## $ t_wp              <chr> " 7 647", " 10 509", " 11 655", " 12 615", " 10 3...
## $ wp_ee             <chr> " 5 972", " 8 809", " 9 770", " 10 353", " 8 747"...
## $ wp_er             <chr> "625", "725", "975", " 1 155", "848", "433", "275...
## $ wp_se             <chr> "946", "893", "757", "850", "760", "578", "697", ...
## $ wp_fw             <chr> "104", "82", "153", "257", "34", "148", "99", "10...
## $ t_nwp             <chr> " 4 854", " 6 500", " 8 403", " 7 648", " 7 643",...
## $ nwp_hm            <chr> "591", "991", " 1 064", " 1 402", "989", "976", "...
## $ nwp_st            <chr> " 1 487", " 2 146", " 3 315", " 3 372", " 3 186",...
## $ nwp_re            <chr> " 1 684", " 2 156", " 2 626", " 1 562", " 2 449",...
## $ nwp_oth           <chr> " 1 092", " 1 207", " 1 398", " 1 312", " 1 019",...
## $ plw_same          <chr> " 3 061", " 3 604", " 3 279", " 3 419", " 3 120",...
## $ plw_diff_hk       <chr> " 1 132", " 1 701", " 2 052", " 1 454", " 1 632",...
## $ plw_diff_kln      <chr> "954", " 1 459", " 1 183", "935", " 1 188", " 1 2...
## $ plw_diff_nt       <chr> "266", "437", "531", "482", "435", "467", "592", ...
## $ plw_diff_oth      <chr> "43", "241", "196", "24", "175", "211", "170", "1...
## $ plw_nofix         <chr> "644", "640", "526", "278", "440", "743", "800", ...
## $ plw_hm            <chr> " 1 289", " 2 072", " 3 398", " 5 585", " 3 122",...
## $ plw_out           <chr> "258", "355", "490", "438", "277", "252", "119", ...
## $ mearn_xfdh_1      <chr> " 1 536", " 2 455", " 3 351", " 4 738", " 3 170",...
## $ mearn_xfdh_2      <chr> "616", "443", "671", " 1 160", "778", "820", "968...
## $ mearn_xfdh_3      <chr> " 1 766", " 1 374", " 1 019", "700", "923", " 2 3...
## $ mearn_xfdh_4      <chr> "956", " 1 188", "840", "576", "833", " 1 217", "...
## $ mearn_xfdh_5      <chr> "564", "669", "645", "435", "639", "806", "596", ...
## $ mearn_xfdh_6      <chr> "796", " 1 322", " 1 004", "918", " 1 054", "934"...
## $ mearn_xfdh_7      <chr> " 1 309", " 2 976", " 3 972", " 3 831", " 2 958",...
## $ t_mmearn          <chr> "18,000", "25,500", "28,000", "13,000", "20,000",...
## $ mmearm_m          <chr> "27,500", "50,000", "60,000", "80,000", "50,000",...
## $ mmearn_f          <chr> "10,000", "12,600", "7,000", "5,000", "7,000", "1...
## $ mearn_xfdhfw_1    <chr> "645", "489", "291", "105", "357", "733", "965", ...
## $ mearn_xfdhfw_2    <chr> "491", "394", "446", "444", "613", "762", "935", ...
## $ mearn_xfdhfw_3    <chr> " 1 766", " 1 374", "996", "639", "923", " 2 309"...
## $ mearn_xfdhfw_4    <chr> "956", " 1 188", "840", "576", "833", " 1 217", "...
## $ mearn_xfdhfw_5    <chr> "564", "669", "645", "435", "639", "806", "596", ...
## $ mearn_xfdhfw_6    <chr> "796", " 1 322", " 1 004", "918", " 1 054", "934"...
## $ mearn_xfdhfw_7    <chr> " 1 309", " 2 976", " 3 972", " 3 831", " 2 958",...
## $ t_mmearn_xfdh     <chr> "20,000", "40,000", "55,000", "75,000", "45,000",...
## $ mmearn_xfdh_m     <chr> "27,500", "50,000", "60,000", "96,750", "51,000",...
## $ mmearn_xfdh_f     <chr> "16,000", "27,000", "45,000", "45,000", "30,000",...
## $ wp_a              <chr> " 1 149", " 2 676", " 3 173", " 3 358", " 2 249",...
## $ wp_b              <chr> "895", " 1 573", " 1 699", " 1 624", " 1 682", " ...
## $ wp_c              <chr> " 1 509", " 2 317", " 1 864", " 1 245", " 1 716",...
## $ wp_d              <chr> "720", "937", "704", "511", "861", " 1 164", " 1 ...
## $ wp_e              <chr> " 1 465", "498", "615", "270", "502", " 1 229", "...
## $ wp_f              <chr> "221", "96", "87", "8", "63", "262", "303", "212"...
## $ wp_g              <chr> "100", "62", "56", "152", "34", "321", "385", "28...
## $ wp_h              <chr> " 1 588", " 2 350", " 3 450", " 5 447", " 3 282",...
## $ wp_i              <chr> "-", "-", "7", "-", "-", "-", "11", "-", "-", "-"...
## $ wp_j              <chr> "47", "228", "156", "239", "152", "129", "249", "...
## $ wp_k              <chr> "164", "385", "165", "34", "281", "420", "384", "...
## $ wp_l              <chr> " 1 651", " 1 625", " 1 634", " 1 713", " 1 429",...
## $ wp_m              <chr> "388", "327", "305", "156", "201", "672", "706", ...
## $ wp_n              <chr> "483", "188", "369", "213", "323", "489", "817", ...
## $ wp_o              <chr> "152", "640", "463", "124", "459", "362", "191", ...
## $ wp_p              <chr> " 1 123", " 2 073", " 1 912", " 2 054", " 1 343",...
## $ wp_q              <chr> " 1 285", " 1 584", " 1 779", " 1 544", " 1 680",...
## $ wp_r              <chr> " 1 023", " 1 111", " 1 375", "911", " 1 367", " ...
## $ wp_s              <chr> " 1 296", " 2 333", " 3 448", " 5 627", " 3 118",...
## $ wp_t              <chr> "35", "15", "49", "-", "36", "29", "30", "30", "7...
## $ whr_1             <chr> "581", "719", "425", "555", "525", "730", "712", ...
## $ whr_2             <chr> "866", "705", "924", "880", "758", "861", "836", ...
## $ whr_3             <chr> " 2 228", " 3 303", " 3 787", " 3 136", " 3 198",...
## $ whr_4             <chr> " 2 159", " 3 024", " 3 444", " 3 952", " 3 293",...
## $ whr_5             <chr> "842", " 1 467", " 1 235", " 1 808", " 1 503", "9...
## $ whr_6             <chr> "971", " 1 291", " 1 840", " 2 284", " 1 112", "7...
## $ dh                <chr> " 5 289", " 6 521", " 6 001", " 5 357", " 5 027",...
## $ dhz_1             <chr> " 2 273", " 2 103", "806", "799", "784", " 1 588"...
## $ dhz_2             <chr> " 1 127", " 1 754", " 1 376", "940", "983", " 1 8...
## $ dhz_3             <chr> "935", " 1 008", " 1 284", "743", " 1 001", " 1 0...
## $ dhz_4             <chr> "422", "810", " 1 159", " 1 092", "925", "955", "...
## $ dhz_5             <chr> "418", "534", "886", "811", "776", "423", "341", ...
## $ dhz_6             <chr> "114", "312", "490", "972", "558", "191", "90", "...
## $ adhz              <dbl> 2.2, 2.5, 3.3, 3.7, 3.4, 2.6, 2.5, 2.4, 3.0, 2.6,...
## $ dhc_1             <chr> "788", " 1 291", " 1 338", "899", "969", " 1 092"...
## $ dhc_2             <chr> " 1 022", " 1 925", " 2 492", " 2 469", " 2 125",...
## $ dhc_3             <chr> "354", "240", "431", "80", "263", "585", "613", "...
## $ dhc_4             <chr> "31", "67", "57", "53", "22", "49", "26", "66", "...
## $ dhc_5             <chr> "173", "83", "158", "110", "240", "131", "180", "...
## $ dhc_6             <chr> "391", "322", "375", "269", "347", "437", "518", ...
## $ dhc_7             <chr> " 2 273", " 2 103", "806", "799", "784", " 1 588"...
## $ dhc_8             <chr> "257", "490", "344", "678", "277", "240", "244", ...
## $ dhi_1             <chr> "777", "598", "455", "245", "392", "568", "709", ...
## $ dhi_2             <chr> "367", "158", "138", "270", "158", "251", "549", ...
## $ dhi_3             <chr> "701", "508", "338", "195", "289", " 1 029", " 1 ...
## $ dhi_4             <chr> "613", "598", "281", "208", "270", "785", " 1 158...
## $ dhi_5             <chr> "564", "412", "288", "292", "291", "600", "632", ...
## $ dhi_6             <chr> "786", "964", "631", "362", "532", "842", "829", ...
## $ dhi_7             <chr> " 1 481", " 3 283", " 3 870", " 3 785", " 3 095",...
## $ ma_hh             <chr> "33,630", "60,000", "90,000", "132,250", "83,000"...
## $ dhi_e1            <chr> "186", "106", "119", "26", "23", "93", "89", "78"...
## $ dhi_e2            <chr> "161", "24", "38", "30", "16", "126", "190", "69"...
## $ dhi_e3            <chr> "560", "292", "210", "62", "144", "787", "978", "...
## $ dhi_e4            <chr> "539", "548", "198", "116", "192", "763", " 1 115...
## $ dhi_e5            <chr> "545", "380", "244", "246", "241", "546", "615", ...
## $ dhi_e6            <chr> "778", "942", "589", "327", "444", "809", "810", ...
## $ dhi_e7            <chr> " 1 434", " 3 190", " 3 734", " 3 729", " 2 970",...
## $ ma_econhh         <chr> "43,000", "70,000", "104,210", "174,460", "100,00...
## $ oq_pub            <chr> "-", "-", "-", "-", "-", "615", " 2 210", "-", "-...
## $ oq_s              <chr> "-", "-", "-", "-", "-", "-", "-", "-", "-", "-",...
## $ oq_pri            <chr> " 4 950", " 6 499", " 6 002", " 5 433", " 4 997",...
## $ oq_non            <chr> "444", "141", "25", "24", "495", "46", "534", "96...
## $ oq_tem            <chr> "13", "15", "-", "-", "4", "2", "-", "1", "9", "1...
## $ dh_pub            <chr> "-", "-", "-", "-", "-", "615", " 2 210", "-", "-...
## $ dh_s              <chr> "-", "-", "-", "-", "-", "-", "-", "-", "-", "-",...
## $ dh_pri            <chr> " 4 856", " 6 384", " 5 976", " 5 339", " 4 986",...
## $ dh_non            <chr> "420", "122", "25", "18", "37", "46", "110", "86"...
## $ dh_tem            <chr> "13", "15", "-", "-", "4", "2", "-", "1", "9", "1...
## $ pop_pub           <chr> "-", "-", "-", "-", "-", " 2 097", " 6 139", "-",...
## $ pop_s             <chr> "-", "-", "-", "-", "-", "-", "-", "-", "-", "-",...
## $ pop_pri           <chr> " 11 838", " 16 702", " 19 983", " 20 095", " 17 ...
## $ pop_non           <chr> "650", "285", "75", "168", "784", "92", "915", "3...
## $ pop_tem           <chr> "13", "22", "-", "-", "16", "2", "-", "1", "9", "...
## $ dh_r1             <chr> "636", "149", "9", "16", "36", "373", "698", "98"...
## $ dh_r2             <chr> " 1 166", " 1 097", "244", "66", "135", " 1 044",...
## $ dh_r3             <chr> " 2 162", " 1 986", "810", "260", "638", " 2 578"...
## $ dh_r4             <chr> "741", " 1 644", " 1 761", "693", " 1 124", " 1 3...
## $ dh_r5             <chr> "257", " 1 050", " 1 640", " 1 418", " 1 603", "4...
## $ dh_r6             <chr> "314", "595", " 1 537", " 2 904", " 1 491", "225"...
## $ dh_r0             <chr> "13", "-", "-", "-", "-", "2", "-", "1", "-", "1"...
## $ dh_ocm            <chr> "624", " 1 385", " 1 444", "840", "975", " 1 166"...
## $ dh_ocwm           <chr> " 1 975", " 2 467", " 2 443", " 1 762", " 2 483",...
## $ dh_st             <chr> " 2 193", " 2 130", " 1 444", " 1 706", " 1 137",...
## $ dh_co             <chr> "20", "16", "-", "-", "20", "22", "21", "48", "42...
## $ dh_rf             <chr> "219", "329", "309", "380", "258", "120", "228", ...
## $ dh_emp            <chr> "258", "194", "361", "669", "154", "93", "162", "...
## $ dhm_1             <chr> "165", "257", "283", "208", "105", "111", "119", ...
## $ dhm_2             <chr> "59", "27", "-", "-", "46", "65", "29", "59", "29...
## $ dhm_3             <chr> "31", "64", "8", "54", "9", "53", "98", "82", "13...
## $ dhm_4             <chr> "-", "48", "-", "44", "20", "97", "85", "207", "1...
## $ dhm_5             <chr> "39", "30", "17", "-", "27", "72", "85", "63", "7...
## $ dhm_6             <chr> "112", "362", "295", "41", "201", "519", "109", "...
## $ dhm_7             <chr> "218", "597", "841", "493", "567", "249", "21", "...
## $ dhm_loan          <chr> "15,500", "20,000", "26,250", "48,000", "25,000",...
## $ dhm_lr            <chr> "24.1", "18.7", "20", "16.2", "16.4", "19.2", "17...
## $ dhr_1             <chr> "147", "127", "75", "263", "61", "628", " 1 511",...
## $ dhr_2             <chr> "408", "34", "119", "31", "66", "318", " 1 015", ...
## $ dhr_3             <chr> "312", "25", "92", "23", "18", "453", "178", "269...
## $ dhr_4             <chr> " 1 604", " 2 154", " 1 519", " 2 058", " 1 166",...
## $ dm_r              <chr> "13,500", "23,000", "34,000", "67,000", "32,000",...
## $ dmr_ir            <dbl> 31.3, 30.9, 24.6, 34.4, 28.0, 30.9, 17.1, 38.2, 3...
## $ fa_m              <int> 40, 58, 93, 183, 94, 43, 32, 37, 45, 34, 34, 35, ...
## $ pm_hk             <chr> "470", "662", " 1 053", " 1 116", "807", " 1 031"...
## $ pm_kln            <chr> "270", "346", "310", "265", "384", "827", "457", ...
## $ pm_nt             <chr> "308", "336", "428", "177", "518", "767", "581", ...
## $ pm_oth            <chr> "46", "16", "75", "79", "51", "149", "119", "51",...
## $ pm_samearea       <chr> " 1 070", " 1 836", " 2 825", " 2 524", " 1 599",...
## $ pm_same           <chr> " 8 353", " 10 453", " 11 744", " 11 936", " 11 8...
## $ pm_out            <chr> " 1 770", " 2 907", " 2 828", " 3 489", " 2 174",...
colSums(is.na(df)) #check NA
##          dc_class                dc        dcca_class              dcca 
##                 0                 0                 0                 0 
##            dc_eng            ca_eng          dcca_eng            dc_chi 
##                 0                 0                 0                 0 
##            ca_chi          dcca_chi             t_pop             pop_m 
##                 0                 0                 0                 0 
##             pop_f                sr             age_1             age_2 
##                 0                 0                 0                 0 
##             age_3             age_4             age_5              t_ma 
##                 0                 0                 0                 0 
##              ma_m              ma_f           born_hk          born_chi 
##                 0                 0                 0                 0 
##         born_else          ethn_chi          ethn_phi          ethn_ind 
##                 0                 0                 0                 0 
##           ethn_wh          ethn_oth           ms_nm_m           ms_ma_m 
##                 0                 0                 0                 0 
##          ms_win_m          ms_div_m          ms_sep_m           ms_nm_f 
##                 0                 0                 0                 0 
##            ms_m_f          ms_win_f          ms_div_f          ms_sep_f 
##                 0                 0                 0                 0 
##            ul_can            ul_put         ul_othchi            ul_eng 
##                 0                 0                 0                 0 
##            ul_oth  readchi_ablepctn  readeng_ablepctn writechi_ablepctn 
##                 0                 0                 0                 0 
## writeeng_ablepctn        edu_prepri           edu_pri          edu_lsed 
##                 0                 0                 0                 0 
##          edu_usec           edu_dip           edu_sub           edu_deg 
##                 0                 0                 0                 0 
##          pls_same       pls_diff_hk      pls_diff_kln       pls_diff_nt 
##                 0                 0                 0                 0 
##        s_diff_oth              t_lf              lf_m              lf_f 
##                 0                 0                 0                 0 
##            lfpr_t            lfpr_m            lfpr_f              t_wp 
##                 0                 0                 0                 0 
##             wp_ee             wp_er             wp_se             wp_fw 
##                 0                 0                 0                 0 
##             t_nwp            nwp_hm            nwp_st            nwp_re 
##                 0                 0                 0                 0 
##           nwp_oth          plw_same       plw_diff_hk      plw_diff_kln 
##                 0                 0                 0                 0 
##       plw_diff_nt      plw_diff_oth         plw_nofix            plw_hm 
##                 0                 0                 0                 0 
##           plw_out      mearn_xfdh_1      mearn_xfdh_2      mearn_xfdh_3 
##                 0                 0                 0                 0 
##      mearn_xfdh_4      mearn_xfdh_5      mearn_xfdh_6      mearn_xfdh_7 
##                 0                 0                 0                 0 
##          t_mmearn          mmearm_m          mmearn_f    mearn_xfdhfw_1 
##                 0                 0                 0                 0 
##    mearn_xfdhfw_2    mearn_xfdhfw_3    mearn_xfdhfw_4    mearn_xfdhfw_5 
##                 0                 0                 0                 0 
##    mearn_xfdhfw_6    mearn_xfdhfw_7     t_mmearn_xfdh     mmearn_xfdh_m 
##                 0                 0                 0                 0 
##     mmearn_xfdh_f              wp_a              wp_b              wp_c 
##                 0                 0                 0                 0 
##              wp_d              wp_e              wp_f              wp_g 
##                 0                 0                 0                 0 
##              wp_h              wp_i              wp_j              wp_k 
##                 0                 0                 0                 0 
##              wp_l              wp_m              wp_n              wp_o 
##                 0                 0                 0                 0 
##              wp_p              wp_q              wp_r              wp_s 
##                 0                 0                 0                 0 
##              wp_t             whr_1             whr_2             whr_3 
##                 0                 0                 0                 0 
##             whr_4             whr_5             whr_6                dh 
##                 0                 0                 0                 0 
##             dhz_1             dhz_2             dhz_3             dhz_4 
##                 0                 0                 0                 0 
##             dhz_5             dhz_6              adhz             dhc_1 
##                 0                 0                 0                 0 
##             dhc_2             dhc_3             dhc_4             dhc_5 
##                 0                 0                 0                 0 
##             dhc_6             dhc_7             dhc_8             dhi_1 
##                 0                 0                 0                 0 
##             dhi_2             dhi_3             dhi_4             dhi_5 
##                 0                 0                 0                 0 
##             dhi_6             dhi_7             ma_hh            dhi_e1 
##                 0                 0                 0                 0 
##            dhi_e2            dhi_e3            dhi_e4            dhi_e5 
##                 0                 0                 0                 0 
##            dhi_e6            dhi_e7         ma_econhh            oq_pub 
##                 0                 0                 0                 0 
##              oq_s            oq_pri            oq_non            oq_tem 
##                 0                 0                 0                 0 
##            dh_pub              dh_s            dh_pri            dh_non 
##                 0                 0                 0                 0 
##            dh_tem           pop_pub             pop_s           pop_pri 
##                 0                 0                 0                 0 
##           pop_non           pop_tem             dh_r1             dh_r2 
##                 0                 0                 0                 0 
##             dh_r3             dh_r4             dh_r5             dh_r6 
##                 0                 0                 0                 0 
##             dh_r0            dh_ocm           dh_ocwm             dh_st 
##                 0                 0                 0                 0 
##             dh_co             dh_rf            dh_emp             dhm_1 
##                 0                 0                 0                 0 
##             dhm_2             dhm_3             dhm_4             dhm_5 
##                 0                 0                 0                 0 
##             dhm_6             dhm_7          dhm_loan            dhm_lr 
##                 0                 0                 0                 0 
##             dhr_1             dhr_2             dhr_3             dhr_4 
##                 0                 0                 0                 0 
##              dm_r            dmr_ir              fa_m             pm_hk 
##                 0                 0                 0                 0 
##            pm_kln             pm_nt            pm_oth       pm_samearea 
##                 0                 0                 0                 0 
##           pm_same            pm_out 
##                 0                 0
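colSums(is.na(df)) reports no missing values in any column. That is partly because most fields are read as character strings, so a "-" entry (visible in columns such as ms_sep_m and wp_i in the glimpse() output) counts as an ordinary value, not as NA. Below is a minimal sketch that counts those placeholders instead. It runs on a small toy data frame whose values are modelled on that output.

```r
# Toy stand-in for three census columns (values modelled on glimpse(df))
toy <- data.frame(
  ms_sep_m = c("38", "44", "16", "-"),
  wp_i     = c("-", "-", "7", "-"),
  t_pop    = c(" 12 501", " 17 009", " 20 058", " 20 263"),
  stringsAsFactors = FALSE
)

# is.na() sees no gaps, because "-" is an ordinary character value
na_counts <- colSums(is.na(toy))

# Counting the "-" placeholders per column reveals the nil entries
dash_counts <- colSums(toy == "-")

na_counts
dash_counts
```

On the full dataset, the same idea would be colSums(df == "-"). Numeric columns simply compare as FALSE.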
hkgeo <- st_read("DC 2015 poly Shapefile/DC_2015_poly Shapefile/GIH3_DC_2015_POLY.shp")
## Reading layer `GIH3_DC_2015_POLY' from data source `C:\Users\X Lin\Desktop\Notes\4.1\VA\assgn 5\DC 2015 poly Shapefile\DC_2015_poly Shapefile\GIH3_DC_2015_POLY.shp' using driver `ESRI Shapefile'
## Simple feature collection with 431 features and 11 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 799186.8 ymin: 799837 xmax: 869854 ymax: 847618
## projected CRS:  Hong Kong 1980 Grid System
hkgeo
## Simple feature collection with 431 features and 11 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 799186.8 ymin: 799837 xmax: 869854 ymax: 847618
## projected CRS:  Hong Kong 1980 Grid System
## First 10 features:
##    DCCA_2015_ DCCA_20151 CACODE                            ENAME
## 1         310        126    Q13                          Kwan Po
## 2         309        133    Q25                       Kwong Ming
## 3         308        128    G17            Hok Yuen Laguna Verde
## 4         306         96    E17 East Tsim Sha Tsui & King's Park
## 5         304         95    E01               Tsim Sha Tsui West
## 6         302        135    E16                 Yau Ma Tei North
## 7         300        129    G23                           Oi Man
## 8         299        130    J27                        Tsui Ping
## 9         298        141    Q14                           Nam On
## 10        295        142    J32                          Ting On
##           CNAME E00_CENTRO E00_CENT_1
## 1          軍寶   845169.5   819389.1
## 2          廣明   844703.6   819318.0
## 3      鶴園海逸   838028.9   819106.4
## 4  尖東及京士柏   835609.4   818067.6
## 5      尖沙咀西   834482.3   818170.5
## 6      油麻地北   835393.3   819717.5
## 7          愛民   836637.1   819242.4
## 8          翠屏   842073.0   819391.3
## 9          南安   845169.5   819389.1
## 10         定安   840698.4   819706.8
##    DISTRICT_T    DISTRICT_E SHAPE_AREA SHAPE_LEN
## 1      西貢區      Sai Kung  308388.62  4277.714
## 2      西貢區      Sai Kung  273555.30  2349.297
## 3    九龍城區  Kowloon City  960988.59  4492.008
## 4    油尖旺區 Yau Tsim Mong 2371378.23  8441.048
## 5    油尖旺區 Yau Tsim Mong 4397466.69 10279.687
## 6    油尖旺區 Yau Tsim Mong  135793.47  1936.740
## 7    九龍城區  Kowloon City  368492.01  4528.530
## 8      觀塘區     Kwun Tong  362912.12  4135.462
## 9      西貢區      Sai Kung  166679.30  2411.632
## 10     觀塘區     Kwun Tong   82454.86  1848.950
##                          geometry
## 1  POLYGON ((845283.5 819328.1...
## 2  POLYGON ((844917.5 819236.8...
## 3  POLYGON ((838847.8 819020.6...
## 4  POLYGON ((836250.3 819250.7...
## 5  POLYGON ((834521 819759.8, ...
## 6  POLYGON ((835998.2 819775.8...
## 7  POLYGON ((836568.8 819776.7...
## 8  POLYGON ((842202.9 819491.2...
## 9  POLYGON ((845508.3 819708.5...
## 10 POLYGON ((841037.1 819565.3...

Data wrangling and manipulation

Steps:
1. Select from df a smaller dataset comprising only the fields we are interested in (constituency area name, total population and the five age groups). The subset will be saved as hkpop
2. Convert the population fields to numeric, since they are currently stored as strings (e.g. "20 000")
3. View the dimensions of the dataset, along with the structure of the variables and an overview of each column and its records
4. Left join the shapefile with hkpop into a new data frame, hkpop_cleaned

library(dplyr)
library(sf)

# Keep the constituency name, total population and the five age-group counts
hkpop <- df %>% select(ca_eng, t_pop, age_1, age_2, age_3, age_4, age_5)

# Numbers are stored as strings with a space as the thousands separator (e.g. "20 000"):
# strip the spaces, then convert every population column to numeric
hkpop <- hkpop %>%
  mutate(across(c(t_pop, age_1, age_2, age_3, age_4, age_5),
                ~ as.numeric(gsub(" ", "", .x))))

# Inspect the dimensions, variable types and a summary of each column
dim(hkpop)
str(hkpop)
summary(hkpop)

# Attach the population figures to the constituency polygons by English name
hkpop_cleaned <- left_join(hkgeo, hkpop, by = c("ENAME" = "ca_eng"))
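The conversion and join above can be sanity-checked on a small example. The base-R sketch below is a hedged illustration, not the original workflow. `toy_df`, `toy_geo` and the helper `to_number()` are hypothetical stand-ins for `df` and `hkgeo`, and `merge(..., all.x = TRUE)` plays the role of dplyr's `left_join()`. The sketch also lists constituencies that found no match, because those rows end up with `NA` population and are drawn as "NA" on the maps.

```r
# Hypothetical helper: strip spaces used as thousands separators, then convert
to_number <- function(x) as.numeric(gsub(" ", "", x, fixed = TRUE))

# Hypothetical toy stand-in for the census table (numbers stored as strings)
toy_df <- data.frame(
  ca_eng = c("Kwan Po", "Kwong Ming", "Oi Man"),
  t_pop  = c("20 000", "18 500", "9 750"),
  age_1  = c("3 000", "2 100", "1 200"),
  stringsAsFactors = FALSE
)

# Convert all population columns in one pass instead of one column at a time
num_cols <- c("t_pop", "age_1")
toy_df[num_cols] <- lapply(toy_df[num_cols], to_number)

# Hypothetical stand-in for the constituency attribute table (geometry omitted)
toy_geo <- data.frame(
  ENAME      = c("Kwan Po", "Kwong Ming", "Tsui Ping"),
  DISTRICT_E = c("Sai Kung", "Sai Kung", "Kwun Tong"),
  stringsAsFactors = FALSE
)

# Left join: keep every constituency, attach population where the names match
joined <- merge(toy_geo, toy_df, by.x = "ENAME", by.y = "ca_eng", all.x = TRUE)

# Constituencies with no matching census row end up with NA population
unmatched <- joined$ENAME[is.na(joined$t_pop)]
print(unmatched)
# [1] "Tsui Ping"
```

If a name is spelled differently in the census table and the shapefile (for example in punctuation or spacing), it shows up in `unmatched`. This gives a quick way to spot join problems before mapping.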

Code for geospatial visualisations

Steps:

Using tmap:

  1. Store the map of total population per constituency area in totalpop, using tm_polygons. This map will be made interactive.
  2. Store the same population map, faceted by the 18 districts, in totalpop_faceted. This map will not be interactive.
  3. For totalpop_faceted, use tm_fill instead of tm_polygons, so that the polygons are filled without border lines.
  4. To map Hong Kong by age group, repeat step 1 with the five age-group columns in place of the total population. This map will not be interactive and will be stored as agepop.
  5. For agepop, change the panel labels to the actual age ranges, since the column headers (e.g. age_1, age_2) are not intuitive.
library(tmap)
knitr::opts_chunk$set(warning = FALSE, error = TRUE, message = FALSE, results = 'asis')

# Total population per constituency area (viewed interactively below)
totalpop <- tm_shape(hkpop_cleaned) + tm_polygons("t_pop", textNA = "NA", title = "Total Population", palette = "Greens") +
  tm_layout(main.title = "Distribution of Population in Hong Kong SAR", main.title.position = "center", main.title.size = 1.5)

# One panel per district; tm_fill fills the polygons without drawing borders
totalpop_faceted <- tm_shape(hkpop_cleaned) + tm_fill("t_pop", textNA = "NA", title = "Total Population") +
  tm_layout(main.title = "Distribution of Population in Hong Kong SAR by District", main.title.position = "center", main.title.size = 1.5) +
  tm_facets(by = "DISTRICT_E")

tmap_mode("plot")

# One panel per age group; panel.labels replaces the cryptic age_1 ... age_5 headers
agepop <- tm_shape(hkpop_cleaned) + tm_polygons(c("age_1", "age_2", "age_3", "age_4", "age_5"), n = 5, palette = "Blues") +
  tm_layout(main.title = "Age distribution in Hong Kong SAR", main.title.position = "center",
            panel.labels = c("Below 15", "15-24", "25-44", "45-64", "Over 65"))
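Because t_pop is a headcount, the totalpop map shows how many people live in each constituency area, not how densely they are packed. Constituency areas also vary widely in size (compare the SHAPE_AREA values printed earlier). The base-R sketch below computes a density column using made-up numbers in a hypothetical `toy` data frame. It assumes SHAPE_AREA is measured in square metres, on the basis that the Hong Kong 1980 Grid coordinates in the shapefile are metre-based.

```r
# Hypothetical toy constituencies (made-up numbers, not census data).
# Assumption: SHAPE_AREA is in square metres, as the HK 1980 Grid is metre-based.
toy <- data.frame(
  ENAME      = c("A", "B", "C"),
  t_pop      = c(18000, 18000, 9000),
  SHAPE_AREA = c(300000, 3000000, 150000)
)

# Convert square metres to square kilometres, then compute residents per km^2
toy$area_km2 <- toy$SHAPE_AREA / 1e6
toy$density  <- toy$t_pop / toy$area_km2

# A and B have the same headcount, but A is ten times as dense;
# C has half of A's headcount yet the same density
print(toy[, c("ENAME", "t_pop", "density")])
```

On the real data, a column computed the same way on hkpop_cleaned could be passed to `tm_polygons()` in place of `"t_pop"`. If the units of SHAPE_AREA are in doubt, `sf::st_area()` is an alternative source for the polygon areas.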

Hong Kong’s overall population

tmap_mode("view")
totalpop

Hong Kong’s population by district

Heavily populated districts:

1. Kwun Tong

Kwun Tong was the first satellite town in Hong Kong's urban area, developed in the territory's early days, and it played a dominant role in shaping Hong Kong's economic development. Today, Kwun Tong remains an attractive area that is conducive to Hong Kong's long-term development. Numerous housing projects have been rolled out in Kwun Tong to provide homes for the growing population of this booming business district.

2. Yau Tsim Mong

The Yau Tsim Mong District comprises urban areas and is home to the Hong Kong Polytechnic University. The bustling district is also a major transport hub, served by the recently completed high-speed rail link that connects Hong Kong to Shenzhen and Guangzhou. Home to some of Hong Kong's largest shopping malls, cultural institutions and night markets, it is no wonder that the district is densely populated.

3. Wong Tai Sin

Wong Tai Sin is a working-class neighbourhood in Kowloon, where 80% of residents live in subsidised housing. The district also has the highest proportion of elderly residents, which is attributed to its convenient transport and full range of facilities that make the area very liveable.

tmap_mode("plot")

totalpop_faceted
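The districts above are picked out visually from the maps. One hedged way to put numbers behind such a ranking is to sum the constituency populations within each district, since every constituency row carries its DISTRICT_E. The sketch uses base R's `aggregate()` on a hypothetical `toy` table. The counts are made up, so the resulting order says nothing about the real districts.

```r
# Hypothetical toy constituency table (made-up counts, not census data)
toy <- data.frame(
  DISTRICT_E = c("Kwun Tong", "Kwun Tong", "Sai Kung", "Yau Tsim Mong", "Yau Tsim Mong"),
  t_pop      = c(17000, 16000, 15000, 14000, 13000),
  stringsAsFactors = FALSE
)

# Sum the constituency populations within each district
by_district <- aggregate(t_pop ~ DISTRICT_E, data = toy, FUN = sum)

# Order districts from most to least populated
by_district <- by_district[order(by_district$t_pop, decreasing = TRUE), ]
print(by_district, row.names = FALSE)
```

On the real data, the analogous (untested) call would be `aggregate(t_pop ~ DISTRICT_E, data = sf::st_drop_geometry(hkpop_cleaned), FUN = sum)`. This ranks districts by headcount rather than density. The formula interface also silently drops rows with NA population, such as constituencies that failed to join.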

Hong Kong’s overall population according to 5 age groups

Several observations can be noted from the following maps:

1. Children (and naturally, families) are more concentrated in the northern part of Hong Kong, particularly around Yuen Long.

This can be attributed to the greater availability of space in the North, which makes the area appealing to young couples looking for homes in which to start a family. A 2019 survey by Hong Kong Baptist University suggested that the North District ranks highest on the city's happiness scale, possibly owing to its larger share of families and its greater green space compared with the urban core.

2. The working-age population (aged 25-64) is most concentrated in the city centre

Central is the central business district of Hong Kong, so it is unsurprising that a large proportion of the working population is based close to where multinational companies have their headquarters. The government headquarters is also located in Central, and the area's proximity to Victoria Harbour allows it to serve effectively as the centre of trade and financial activities.

3. The older population (aged over 65) dominates in the peripheral regions of Hong Kong

Many of Hong Kong's elderly reside on Lantau Island, in the south-west of the territory. As the largest island in Hong Kong and historically home to fishing villages, Lantau Island is an ideal retirement area for the elderly. Parks cover more than half of the island, and the pace of life is much slower than in the city. Privately owned residential developments on the island have also given it a reputation as an expatriate enclave.

agepop
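The agepop panels map raw counts, so a constituency with many residents can look dark in every age panel. Claims such as the older population dominating peripheral regions are easier to check using each age group's share of the constituency total. The base-R sketch below computes those shares on a hypothetical two-row `toy` table with made-up counts. It reuses the agepop panel labels to name the columns. `working_share` and `elderly_share` are illustrative names, not fields in the census data.

```r
# Hypothetical toy constituency table (made-up counts, not census data)
toy <- data.frame(
  ENAME = c("A", "B"),
  t_pop = c(20000, 5000),
  age_1 = c(4000, 500),
  age_2 = c(2000, 500),
  age_3 = c(6000, 1000),
  age_4 = c(6000, 1500),
  age_5 = c(2000, 1500)
)

# Readable labels for the cryptic column names (same labels as the agepop panels)
age_labels <- c(age_1 = "Below 15", age_2 = "15-24", age_3 = "25-44",
                age_4 = "45-64", age_5 = "Over 65")

# Share of each constituency's population in each age group
age_cols <- names(age_labels)
shares <- toy[age_cols] / toy$t_pop
names(shares) <- age_labels[age_cols]

# Working-age (25-64) and elderly (over 65) shares
toy$working_share <- shares[["25-44"]] + shares[["45-64"]]
toy$elderly_share <- shares[["Over 65"]]

# A has more elderly residents by count, but B has the larger elderly share
print(toy[, c("ENAME", "age_5", "working_share", "elderly_share")])
```

On the real data, share columns computed this way on hkpop_cleaned could be mapped instead of the raw counts. A constituency then stands out only if an age group makes up an unusually large part of its population.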