The objective of this assignment is to to reveal the resident population, sex ratio, married and divorced rate of Chinese population by provinces in 2018 through a series of map visualization done by tmap package. The population data is collected from National Bureau of Statistics of China in csv form. The map information is downloaded from https://data.humdata.org/dataset/china-administrative-boundaries in shapefile form.
The population data collected from National Bureau of Statistics of China has been grouped and manipulated in Excel and little wrangling is needed in R. Data of 31 Provinces is included the exception of Hongkong, Macao and Taiwan. Although there are many columns, the columns we are going to plot are “Population” (The resident population of each province at the end of year 2018), “SexRatio” (Calculated from Sample Survey with Female=100), “Divorce”(Crude Divorce Rate % calculated using Registered Divorces), “Marriage” (Crude Marriage Rate % calculated using Registered Marriages with the population base same as Crude Divorce Rate).
For map data, several modifications need to be done. As the data we downloaded separate several Provinces like Anhui, Xinjiang into smaller parts, we need to merge these geometry and dissolve the boundary for the same Province. We are using st_union to solve this problem. However, when I try to add tm_text, it turns out the duplicated text will appear as the data is merged from several geometries. Didn’t find any good solutions, I worked around this issue by plotting a shape using st_combine and another shape using st_union together.
Sketch
packages = c('sf', 'tmap', 'tidyverse', 'ggsci')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}
Use st_read() of sf package and read_csv() function of readr package (one of the tidyverse package)
mpcn <- st_read(dsn = "chn_admbnda_adm1",
layer = "chn_admbnda_adm1_ocha")
popcn <- read_csv("Population.csv")
Check the data.Though there are 34 divisions, it has 57 features.
head(mpcn)
## Simple feature collection with 6 features and 6 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 92.78866 ymin: 23.58468 xmax: 120.4246 ymax: 42.79352
## geographic CRS: WGS 84
## ADM1_EN ADM1_ZH ADM1_PCODE ADM0_EN ADM0_ZH ADM0_PCODE
## 1 Anhui Province 安徽省 CN340 China 中国 CN
## 2 Anhui Province 安徽省 CN341 China 中国 CN
## 3 Beijing Municipality 北京市 CN110 China 中国 CN
## 4 Chongqing Municipality 重庆市 CN500 China 中国 CN
## 5 Fujian Province 福建省 CN350 China 中国 CN
## 6 Gansu province 甘肃省 CN620 China 中国 CN
## geometry
## 1 MULTIPOLYGON (((116.9352 34...
## 2 MULTIPOLYGON (((118.3409 31...
## 3 MULTIPOLYGON (((117.3147 40...
## 4 MULTIPOLYGON (((109.546 31....
## 5 MULTIPOLYGON (((117.2327 23...
## 6 MULTIPOLYGON (((104.1044 37...
Do quick map on it. Several provinces are segmentated.
qtm(mpcn)
class(mpcn)
## [1] "sf" "data.frame"
Create combined geometry using st_combine
mpcn2 =
mpcn %>%
group_by(ADM1_EN) %>%
summarize(geometry = st_combine(geometry))
summary(mpcn2)
## ADM1_EN geometry
## Length:34 MULTIPOLYGON :34
## Class :character epsg:4326 : 0
## Mode :character +proj=long...: 0
Create combined geometry using st_union
mpcn3 = mpcn %>%
group_by(ADM1_EN) %>%
summarize(geometry = st_union(geometry))
summary(mpcn3)
## ADM1_EN geometry
## Length:34 MULTIPOLYGON :22
## Class :character POLYGON :12
## Mode :character epsg:4326 : 0
## +proj=long...: 0
The boundary has been dissolved by using st_union.
m2 = qtm(mpcn2)
m3 = qtm(mpcn3)
tmap_arrange(m2, m3, asp=1, ncol=2)
As we would like to join map data with population data, we’d like to create a common column to join on.Columns“ADM1_EN”, “ADM1_PCODE” and “Region” are included in a dataframe named code.
ADM1_EN = mpcn$ADM1_EN
ADM1_PCODE = mpcn$ADM1_PCODE
Region = popcn$Region
code = data.frame('ADM1_EN' = ADM1_EN, 'ADM1_PCODE' = ADM1_PCODE)
code = code[!duplicated(code[,c("ADM1_EN")]), ]
code2 = popcn[,0:2]
code = left_join(code, code2, by = c('ADM1_PCODE' = "ADM1_PCODE"))
Append code to map data.
mpcn2 = left_join(mpcn2, code,
by = c("ADM1_EN" = "ADM1_EN"))
mpcn3 = left_join(mpcn3, code,
by = c("ADM1_EN" = "ADM1_EN"))
Join with population data.
mpcn_popcn <- left_join(mpcn3, popcn,
by = c("ADM1_PCODE" = "ADM1_PCODE"))
mpcn_popcn$Divorce = mpcn_popcn$Divorce/100
mpcn_popcn$Marriage = mpcn_popcn$Marriage/1000
#tmaptools::palette_explorer()
Plot the resident population for each Province.
g1 = tm_shape(mpcn_popcn) +
tm_fill('Population',palette = "Blues") +
tm_borders()+
tm_layout(title = "Distribution of Resident Population \n by Province in year 2018",
title.position =c("center", "top"),
title.size = 1.2,
legend.outside = F,
legend.text.size = 0.7,
#panel.show = T,
frame = FALSE) +
tm_shape(mpcn2) +
tm_fill(alpha = 0) +
tm_borders(alpha = 0) +
tm_text("Region", size = 0.4, remove.overlap = T) +
tm_layout(frame = FALSE) +
tm_credits("Source: Population data from National Bureau of Statistics of China",
size = 0.5, align = "left")
g1
g2 = tm_shape(mpcn_popcn) +
tm_fill('SexRatio',palette = "BuGn") +
tm_borders()+
tm_layout(title = "Distribution of Sex Ratio \nby Province in year 2018",
title.position = c("center","top"),
title.size = 1.2,
legend.outside = F,
frame = FALSE) +
tm_shape(mpcn2) +
tm_fill(alpha = 0) +
tm_borders(alpha = 0) +
tm_text("Region", size = 0.4, remove.overlap = T) +
tm_layout(frame = FALSE) +
tm_credits("Source: Population data from National Bureau of Statistics of China",
size = 0.5, align = "left")
g2
Higlight area with high Sex ratio
highratio = mpcn_popcn %>%
filter(mpcn_popcn$SexRatio > 107)
mapoutline <- st_union(mpcn)
sexRatioMap = tm_shape(mpcn_popcn) +
tm_fill("white") +
tm_borders("grey", lwd = 0.5) +
tm_shape(mapoutline) +
tm_borders(lwd = 2) +
tm_shape(highratio) +
tm_fill("turquoise3") +
tm_borders() +
tm_layout(frame = T, title = "Provinces with Sex Ratio higher than 107 in 2018 ",
title.size = 0.9, title.position = c(0.02, "top")) +
tm_credits("Source: Population data from National Bureau of Statistics of China",
size = 0.5, align = "right")
sexRatioMap
g3 = tm_shape(mpcn_popcn) +
tm_fill('Divorce',palette = "GnBu") +
tm_borders()+
tm_layout(title = "Distribution of Divorce Rate by \n Province in year 2018",
title.position = c("center","top"),
title.size = 1.2,
legend.outside = F,
frame = FALSE) +
tm_shape(mpcn2) +
tm_fill(alpha = 0) +
tm_borders(alpha = 0) +
tm_text("Region", size = 0.4, remove.overlap = T) +
tm_layout(frame = FALSE) +
tm_credits("Source: Population data from National Bureau of Statistics of China",
size = 0.5, align = "left")
g3
highDivorce = mpcn_popcn %>%
filter(mpcn_popcn$Divorce > 0.04)
lowDivorce = mpcn_popcn %>%
filter(mpcn_popcn$Divorce < 0.02)
Divorce = rbind(highDivorce, lowDivorce)
divorceMap = tm_shape(mpcn_popcn) +
tm_fill("white") +
tm_borders("grey", lwd = 0.5) +
tm_shape(mapoutline) +
tm_borders(lwd = 2) +
tm_shape(highDivorce) +
tm_fill("skyblue") +
tm_borders() +
tm_shape(lowDivorce) +
tm_fill("pink") +
tm_borders() +
tm_layout(frame = T, title = "Provinces with Divorce rate (>4% & <2%) in 2018 ",
title.size = 0.9, title.position = c(0.02, "top")) +
tm_credits("Source: Population data from National Bureau of Statistics of China",
size = 0.5, align = "right")
Pink: Divorce rate < 2%; Blue: Divorce rate > 4%
divorceMap
g4 = tm_shape(mpcn_popcn) +
tm_fill('Marriage',palette = "PuBu") +
tm_borders()+
tm_layout(title = "Distribution of Marriage Rate \n by Province in year 2018",
title.position = c("center","top"),
title.size = 1.2,
legend.outside = F,
frame = FALSE) +
tm_shape(mpcn2) +
tm_fill(alpha = 0) +
tm_borders(alpha = 0) +
tm_text("Region", size = 0.4, remove.overlap = T) +
tm_layout(frame = FALSE) +
tm_credits("Source: Population data from National Bureau of Statistics of China",
size = 0.5, align = "left")
g4
highMarriage = mpcn_popcn %>%
filter(mpcn_popcn$Marriage > 0.08)
lowMarriage = mpcn_popcn %>%
filter(mpcn_popcn$Marriage < 0.06)
marriageMap = tm_shape(mpcn_popcn) +
tm_fill("white") +
tm_borders("grey", lwd = 0.5) +
tm_shape(mapoutline) +
tm_borders(lwd = 2) +
tm_shape(highMarriage) +
tm_fill("pink") +
tm_borders() +
tm_shape(lowMarriage) +
tm_fill("skyblue") +
tm_borders() +
tm_layout(frame = T, title = "Provinces with Marriage rate (>8% & <6%) in 2018 ",
title.size = 0.9, title.position = c(0.02, "top")) +
tm_credits("Source: Population data from National Bureau of Statistics of China",
size = 0.5, align = "right")
Pink: marriage rate >8%; Blue: marriage rate <6%
marriageMap
tmap_arrange(g1, sexRatioMap, divorceMap, marriageMap)
tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(mpcn_popcn) +
tm_fill('ADM1_EN',palette = "Set3",
popup.vars=c("Population"="Population","Sex Ratio" = "SexRatio",
"Divorce Rate" = "Divorce", "Marriage rate" = "Marriage"),
legend.show = F) +
tm_borders(alpha = 0.5)
## Warning: Number of levels of the variable "ADM1_EN" is 34, which is
## larger than max.categories (which is 30), so levels are combined. Set
## tmap_options(max.categories = 34) in the layer function to show all levels.
tmap_mode("plot")
tmap_style("white")
As can be seen from the four graphs, Guangdong and Shandong are most populated provinces in China, while Qinghai, Tibet and Ningxia are less densly populated. Most coastal areas in the south have high sex ratio, which may due to cultural reasons. Though surprisingly, Tianjin and Qinghai also have high sex ratio. Heilongjiang, Jilin, Chonqing and Guizhou have higher divorce rate while big cities like Beijing and Shanghai have moderately high divorce rate. Shandong and Zhejiang have lower marriage rate while Guizhou has the highest marriage rate of 11% in 2018. Details of the map such as layout and style can be improved.Future work can include time series plots and gender differences to better illustrate the situation.