Geographical Segmentation

Introduction

In this exercise, we will gain hands-on experience on how to delineate homogeneous region by using geographically referenced multivariate data. There are two major analysis, namely: 1. hierarchical cluster analysis; and 2. spatially constrained cluster analysis.

Learning Outcome

By the end of this hands-on exercise, we will able: - to convert GIS polygon data into R’s simple feature data.frame by using appropriate functions of sf package of R; - to convert simple feature data.frame into R’s SpatialPolygonDataFrame object by using appropriate sf of package of R; - to perform custer analysis by using hclust() of Base R; - to perform spatially constrained cluster analysis using skater() of Base R; and - to visualise the analysis output by using ggplot2 and tmap package.

Getting Started

The analytical question

In geobusiness and spatial policy, it is a common practice to delineate the market or planning area into homogeneous regions by using multivariate data. In this hands-on exercise, we are interested to delineate Shan State, Myanmar into homogeneous regions by using multiple Information and Communication technology (ICT) measures, namely: Radio, Television, Land line phone, Mobile phone, Computer, and Internet at home.

The data

Two data sets will be used in this study. They are: 1. Myanmar Township Boundary Data (i.e. myanmar_township_boundaries) : This is a GIS data in ESRI shapefile format. It consists of township bondary information of Myanmar. The spatial data are casptured in polygon features. 2. Shan-ICT.csv: This is an extract of The 2014 Myanmar Population and Housing Census Myanmar at the township level. Both data sets are download from Myanmar Information Management Unit (MIMU)

Installing and loading R packages

Before we get started, it is important for us to install the necessary R packages into R and launch these R packages into R environment.

The R packages needed for this exercise are as follows: - Spatial data handling: sf, and spdep - Geospatial analysis package: ClustGeo - Attribute data handling: tidyverse, especially readr, ggplot2 and dplyr - Choropleth mapping: tmap

Import packages

packages = c('rgdal', 'spdep', 'ClustGeo',  'tmap', 'sf', 'ggpubr', 'cluster', 'heatmaply', 'corrplot', 'tidyverse', 'psych')
for (p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
    }
  library(p,character.only = T)
}
## Loading required package: rgdal
## Loading required package: sp
## rgdal: version: 1.4-8, (SVN revision 845)
##  Geospatial Data Abstraction Library extensions to R successfully loaded
##  Loaded GDAL runtime: GDAL 2.4.2, released 2019/06/28
##  Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/3.6/Resources/library/rgdal/gdal
##  GDAL binary built with GEOS: FALSE 
##  Loaded PROJ.4 runtime: Rel. 5.2.0, September 15th, 2018, [PJ_VERSION: 520]
##  Path to PROJ.4 shared files: /Library/Frameworks/R.framework/Versions/3.6/Resources/library/rgdal/proj
##  Linking to sp version: 1.3-2
## Loading required package: spdep
## Loading required package: spData
## To access larger datasets in this package, install the spDataLarge
## package with: `install.packages('spDataLarge',
## repos='https://nowosad.github.io/drat/', type='source')`
## Loading required package: sf
## Linking to GEOS 3.7.2, GDAL 2.4.2, PROJ 5.2.0
## Loading required package: ClustGeo
## Loading required package: tmap
## Loading required package: ggpubr
## Loading required package: ggplot2
## Loading required package: cluster
## Loading required package: heatmaply
## Loading required package: plotly
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
## Loading required package: viridis
## Loading required package: viridisLite
## Registered S3 method overwritten by 'seriation':
##   method         from 
##   reorder.hclust gclus
## 
## ======================
## Welcome to heatmaply version 1.1.0
## 
## Type citation('heatmaply') for how to cite the package.
## Type ?heatmaply for the main documentation.
## 
## The github page is: https://github.com/talgalili/heatmaply/
## Please submit your suggestions and bug-reports at: https://github.com/talgalili/heatmaply/issues
## Or contact: <tal.galili@gmail.com>
## ======================
## Loading required package: corrplot
## corrplot 0.84 loaded
## Loading required package: tidyverse
## ── Attaching packages ────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ tibble  2.1.3     ✓ dplyr   0.8.5
## ✓ tidyr   1.0.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0
## ✓ purrr   0.3.3
## ── Conflicts ───────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks plotly::filter(), stats::filter()
## x dplyr::lag()    masks stats::lag()
## Loading required package: psych
## 
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha

With tidyverse, we do not have to install readr, ggplot2 and dplyr packages seperately. In fact, tidyverse also installes other very useful R packages such as tidyr.

Data Import and Prepatation

Importing geospatial data into R environment

In this section, you will import Myanmar Township Boundary GIS data and its associated attrbiute table into R environment.

The Myanmar Township Boundary GIS data is in ESRI shapefile format. It will be imported into R environment by using the st_read() function of sf.

shan_sf <- st_read(dsn = "data/geospatial", layer = "myanmar_township_boundaries") %>%
  filter(ST %in% c("Shan (East)", "Shan (North)", "Shan (South)"))
## Reading layer `myanmar_township_boundaries' from data source `/Users/theodora/Desktop/IS415/Lesson 10/data/geospatial' using driver `ESRI Shapefile'
## Simple feature collection with 330 features and 14 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 92.17275 ymin: 9.671252 xmax: 101.1699 ymax: 28.54554
## CRS:            4326

The imported township boundary object is called shan_sf. It is saved in simple feature data.frame format. We can view the content of the newly created shan_sf simple features data.frame by using the code chun below.

shan_sf
## Simple feature collection with 55 features and 14 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 96.15107 ymin: 19.29932 xmax: 101.1699 ymax: 24.15907
## CRS:            4326
## First 10 features:
##    OBJECTID           ST ST_PCODE       DT   DT_PCODE        TS  TS_PCODE
## 1       163 Shan (North)   MMR015  Mongmit MMR015D008   Mongmit MMR015017
## 2       203 Shan (South)   MMR014 Taunggyi MMR014D001   Pindaya MMR014006
## 3       240 Shan (South)   MMR014 Taunggyi MMR014D001   Ywangan MMR014007
## 4       106 Shan (South)   MMR014 Taunggyi MMR014D001  Pinlaung MMR014009
## 5        72 Shan (North)   MMR015  Mongmit MMR015D008    Mabein MMR015018
## 6        40 Shan (South)   MMR014 Taunggyi MMR014D001     Kalaw MMR014005
## 7       194 Shan (South)   MMR014 Taunggyi MMR014D001     Pekon MMR014010
## 8       159 Shan (South)   MMR014 Taunggyi MMR014D001  Lawksawk MMR014008
## 9        61 Shan (North)   MMR015  Kyaukme MMR015D003 Nawnghkio MMR015013
## 10      124 Shan (North)   MMR015  Kyaukme MMR015D003   Kyaukme MMR015012
##                  ST_2            LABEL2 SELF_ADMIN ST_RG T_NAME_WIN
## 1  Shan State (North)    Mongmit\n61072       <NA> State   rdk;rdwf
## 2  Shan State (South)    Pindaya\n77769       Danu State     yif;w,
## 3  Shan State (South)    Ywangan\n76933       Danu State      &GmiH
## 4  Shan State (South)  Pinlaung\n162537       Pa-O State  yifavmif;
## 5  Shan State (North)     Mabein\n35718       <NA> State     rbdrf;
## 6  Shan State (South)     Kalaw\n163138       <NA> State       uavm
## 7  Shan State (South)      Pekon\n94226       <NA> State     z,fcHk
## 8  Shan State (South)          Lawksawk       <NA> State   &yfapmuf
## 9  Shan State (North) Nawnghkio\n128357       <NA> State  aemifcsdK
## 10 Shan State (North)   Kyaukme\n172874       <NA> State   ausmufrJ
##              T_NAME_M3     AREA                       geometry
## 1           မိုးမိတ\u103a 2703.611 MULTIPOLYGON (((96.96001 23...
## 2          ပင\u103aးတယ  629.025 MULTIPOLYGON (((96.7731 21....
## 3            ရ\u103dာငံ 2984.377 MULTIPOLYGON (((96.78483 21...
## 4  ပင\u103aလောင\u103aး 3396.963 MULTIPOLYGON (((96.49518 20...
## 5           မဘိမ\u103aး 5034.413 MULTIPOLYGON (((96.66306 24...
## 6                 ကလော 1456.624 MULTIPOLYGON (((96.49518 20...
## 7            ဖယ\u103aခုံ 2073.513 MULTIPOLYGON (((97.14738 19...
## 8   ရပ\u103aစောက\u103a 5145.659 MULTIPOLYGON (((96.94981 22...
## 9    နောင\u103aခ\u103bို 3271.537 MULTIPOLYGON (((96.75648 22...
## 10   က\u103bောက\u103aမဲ 3920.869 MULTIPOLYGON (((96.95498 22...

Notice that sf.data.frame is conformed to Hardy Wickham’s tidy framework.

Since shan_sf is conformed to tidy framework, we can also glimpse() to reveal the data type of it’s fields.

glimpse(shan_sf)
## Observations: 55
## Variables: 15
## $ OBJECTID   <dbl> 163, 203, 240, 106, 72, 40, 194, 159, 61, 124, 71, 155, 10…
## $ ST         <fct> Shan (North), Shan (South), Shan (South), Shan (South), Sh…
## $ ST_PCODE   <fct> MMR015, MMR014, MMR014, MMR014, MMR015, MMR014, MMR014, MM…
## $ DT         <fct> Mongmit, Taunggyi, Taunggyi, Taunggyi, Mongmit, Taunggyi, …
## $ DT_PCODE   <fct> MMR015D008, MMR014D001, MMR014D001, MMR014D001, MMR015D008…
## $ TS         <fct> Mongmit, Pindaya, Ywangan, Pinlaung, Mabein, Kalaw, Pekon,…
## $ TS_PCODE   <fct> MMR015017, MMR014006, MMR014007, MMR014009, MMR015018, MMR…
## $ ST_2       <fct> Shan State (North), Shan State (South), Shan State (South)…
## $ LABEL2     <fct> Mongmit
## 61072, Pindaya
## 77769, Ywangan
## 76933, Pinlaung
## 1625…
## $ SELF_ADMIN <fct> NA, Danu, Danu, Pa-O, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ ST_RG      <fct> State, State, State, State, State, State, State, State, St…
## $ T_NAME_WIN <fct> "rdk;rdwf", "yif;w,", "&GmiH", "yifavmif;", "rbdrf;", "uav…
## $ T_NAME_M3  <fct> မိုးမိတ်, ပင်းတယ, ရွာငံ, ပင်လောင်း, မဘိမ်း, ကလော, ဖယ်ခုံ, …
## $ AREA       <dbl> 2703.611, 629.025, 2984.377, 3396.963, 5034.413, 1456.624,…
## $ geometry   <MULTIPOLYGON [°]> MULTIPOLYGON (((96.96001 23..., MULTIPOLYGON …

Importing aspatial data into R environment

The csv file will be import using read_csv function of readr package.

The code chunks used are shown below:

ict <- read_csv ("data/aspatial/Shan-ICT.csv")
## Parsed with column specification:
## cols(
##   `District Pcode` = col_character(),
##   `District Name` = col_character(),
##   `Township Pcode` = col_character(),
##   `Township Name` = col_character(),
##   `Total households` = col_double(),
##   Radio = col_double(),
##   Television = col_double(),
##   `Land line phone` = col_double(),
##   `Mobile phone` = col_double(),
##   Computer = col_double(),
##   `Internet at home` = col_double()
## )

The imported InfoComm variables extract of The 2014 Myanmar Population and Housing Census Myanmar attribute data set is called ict. It is saved in R’s * tibble data.frame* format.

The code chunk below reveal the summary statistics of ict data.frame.

summary(ict)
##  District Pcode     District Name      Township Pcode     Township Name     
##  Length:55          Length:55          Length:55          Length:55         
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##  Total households     Radio         Television    Land line phone 
##  Min.   : 3318    Min.   :  115   Min.   :  728   Min.   :  20.0  
##  1st Qu.: 8711    1st Qu.: 1260   1st Qu.: 3744   1st Qu.: 266.5  
##  Median :13685    Median : 2497   Median : 6117   Median : 695.0  
##  Mean   :18369    Mean   : 4487   Mean   :10183   Mean   : 929.9  
##  3rd Qu.:23471    3rd Qu.: 6192   3rd Qu.:13906   3rd Qu.:1082.5  
##  Max.   :82604    Max.   :30176   Max.   :62388   Max.   :6736.0  
##   Mobile phone      Computer      Internet at home
##  Min.   :  150   Min.   :  20.0   Min.   :   8.0  
##  1st Qu.: 2037   1st Qu.: 121.0   1st Qu.:  88.0  
##  Median : 3559   Median : 244.0   Median : 316.0  
##  Mean   : 6470   Mean   : 575.5   Mean   : 760.2  
##  3rd Qu.: 7177   3rd Qu.: 507.0   3rd Qu.: 630.5  
##  Max.   :48461   Max.   :6705.0   Max.   :9746.0

Derive new variables using dplyr

The unit of measurement of the values are number of household. Using these values directly will be bais by the underlying total number of households. In general, the townships with relatively higher total number of households will also have higher number of households owning radio, TV, etc.

In order to overcome this problem, we will derive the penetration rate of each ICT variable by using the code chunk below.

ict_derived <- ict %>%
  mutate(`RADIO_PR` = `Radio`/`Total households`*1000) %>%
  mutate(`TV_PR` = `Television`/`Total households`*1000) %>%
  mutate(`LLPHONE_PR` = `Land line phone`/`Total households`*1000) %>%
  mutate(`MPHONE_PR` = `Mobile phone`/`Total households`*1000) %>%
  mutate(`COMPUTER_PR` = `Computer`/`Total households`*1000) %>%
  mutate(`INTERNET_PR` = `Internet at home`/`Total households`*1000) %>%
  rename(`DT_PCODE` =`District Pcode`,`DT`=`District Name`,
         `TS_PCODE`=`Township Pcode`, `TS`=`Township Name`,
         `TT_HOUSEHOLDS`=`Total households`,
         `RADIO`=`Radio`, `TV`=`Television`, 
         `LLPHONE`=`Land line phone`, `MPHONE`=`Mobile phone`,
         `COMPUTER`=`Computer`, `INTERNET`=`Internet at home`) 

Let us review the summary statistics of the newly derived penetration rates using the code chunk below.

summary(ict_derived)
##    DT_PCODE              DT              TS_PCODE              TS           
##  Length:55          Length:55          Length:55          Length:55         
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##  TT_HOUSEHOLDS       RADIO             TV           LLPHONE      
##  Min.   : 3318   Min.   :  115   Min.   :  728   Min.   :  20.0  
##  1st Qu.: 8711   1st Qu.: 1260   1st Qu.: 3744   1st Qu.: 266.5  
##  Median :13685   Median : 2497   Median : 6117   Median : 695.0  
##  Mean   :18369   Mean   : 4487   Mean   :10183   Mean   : 929.9  
##  3rd Qu.:23471   3rd Qu.: 6192   3rd Qu.:13906   3rd Qu.:1082.5  
##  Max.   :82604   Max.   :30176   Max.   :62388   Max.   :6736.0  
##      MPHONE         COMPUTER         INTERNET         RADIO_PR     
##  Min.   :  150   Min.   :  20.0   Min.   :   8.0   Min.   : 21.05  
##  1st Qu.: 2037   1st Qu.: 121.0   1st Qu.:  88.0   1st Qu.:138.95  
##  Median : 3559   Median : 244.0   Median : 316.0   Median :210.95  
##  Mean   : 6470   Mean   : 575.5   Mean   : 760.2   Mean   :215.68  
##  3rd Qu.: 7177   3rd Qu.: 507.0   3rd Qu.: 630.5   3rd Qu.:268.07  
##  Max.   :48461   Max.   :6705.0   Max.   :9746.0   Max.   :484.52  
##      TV_PR         LLPHONE_PR       MPHONE_PR       COMPUTER_PR    
##  Min.   :116.0   Min.   :  2.78   Min.   : 36.42   Min.   : 3.278  
##  1st Qu.:450.2   1st Qu.: 22.84   1st Qu.:190.14   1st Qu.:11.832  
##  Median :517.2   Median : 37.59   Median :305.27   Median :18.970  
##  Mean   :509.5   Mean   : 51.09   Mean   :314.05   Mean   :24.393  
##  3rd Qu.:606.4   3rd Qu.: 69.72   3rd Qu.:428.43   3rd Qu.:29.897  
##  Max.   :842.5   Max.   :181.49   Max.   :735.43   Max.   :92.402  
##   INTERNET_PR     
##  Min.   :  1.041  
##  1st Qu.:  8.617  
##  Median : 22.829  
##  Mean   : 30.644  
##  3rd Qu.: 41.281  
##  Max.   :117.985

Exploratory Data Analysis (EDA)

EDA using statistical graphics

We can plot the distribution of the variables (i.e. Number of households with radio) by using appropriate Exploratory Data Analysis (EDA) as shown in the code chunk below.

Histogram is useful to identify the overall distribution of the data values (i.e. left skew, right skew or normal distribution)

ggplot(data=ict_derived, aes(x=`RADIO`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

Boxplot is useful to detect if there are outliers.

ggplot(data=ict_derived, aes(x=`RADIO`)) +
  geom_boxplot(color="black", fill="light blue")

Next, we will also plotting the distribution of the newly derived variables (i.e. Radio penetration rate) by using the code chunk below.

ggplot(data=ict_derived, aes(x=`RADIO_PR`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

ggplot(data=ict_derived, aes(x=`RADIO_PR`)) +
  geom_boxplot(color="black", fill="light blue")

The code chunks below are used to create the data visualisation. They consist of two main parts. First, we will create the individual histograms using the code chunk below.

radio <- ggplot(data=ict_derived, 
             aes(x= `RADIO_PR`)) +
  geom_histogram(bins=20, 
                 color="black", 
                 fill="light blue")

tv <- ggplot(data=ict_derived, 
             aes(x= `TV_PR`)) +
  geom_histogram(bins=20, 
                 color="black", 
                 fill="light blue")

llphone <- ggplot(data=ict_derived, 
             aes(x= `LLPHONE_PR`)) +
  geom_histogram(bins=20, 
                 color="black", 
                 fill="light blue")

mphone <- ggplot(data=ict_derived, 
             aes(x= `MPHONE_PR`)) +
  geom_histogram(bins=20, 
                 color="black", 
                 fill="light blue")

computer <- ggplot(data=ict_derived, 
             aes(x= `COMPUTER_PR`)) +
  geom_histogram(bins=20, 
                 color="black", 
                 fill="light blue")

internet <- ggplot(data=ict_derived, 
             aes(x= `INTERNET_PR`)) +
  geom_histogram(bins=20, 
                 color="black", 
                 fill="light blue")

Put all histograms together

Use the ggarange() function of ggpubr package is used to group these histograms together.

ggarrange(radio, tv, llphone, mphone, computer, internet, 
          ncol = 3, 
          nrow = 2)

EDA using choropleth map

Joining geospatial data with aspatial data

Before we can prepare the choropleth map, we need to combine both the geospatial data object (i.e. shan_sf) and aspatial data.frame object (i.e. ict_derived) into one. This will be performed by using the left_join function of dplyr package. The shan_sf simple feature data.frame will be used as the base data object and the ict_derived data.frame will be used as the join table.

The code chunks below is used to perform the task. The unique identifier used to join both data objects is TS_PCODE.

shan_sf <- left_join(shan_sf, ict_derived, by=c("TS_PCODE"="TS_PCODE"))
## Warning: Column `TS_PCODE` joining factor and character vector, coercing into
## character vector

The message above shows that TS_CODE field is the common field used to perform the left-join.

It is important to note that there is no new output data been created. Instead, the data fields from ict_derived data frame are now updated into the data frame of shan_sf.

Preparing chloropeth map

To have a quick look at the distribution of Radio penetration rate of Shan State at township level, a choropleth map will be prepared.

The code chunks below are used to prepare the choropleth by using the qtm() function of tmap package

qtm(shan_sf, "RADIO_PR")

In order to reveal that the distribution shown in the choropleth map above are bias to the underlying total number of households at the townships, we will create two choropleth maps, one for the total number of households (i.e. TT_HOUSEHOLDS.map) and one for the total number of household with Radio (RADIO.map) by using the code chunk below.

TT_HOUSEHOLDS.map <- tm_shape(shan_sf) + 
  tm_fill(col = "TT_HOUSEHOLDS",
          n = 5,
          style = "jenks", 
          title = "Total households") + 
  tm_borders(alpha = 0.5) 

RADIO.map <- tm_shape(shan_sf) + 
  tm_fill(col = "RADIO",
          n = 5,
          style = "jenks",
          title = "Number Radio ") + 
  tm_borders(alpha = 0.5) 

tmap_arrange(TT_HOUSEHOLDS.map, RADIO.map,
             asp=NA, ncol=2)

Notice that the choropleth maps above clearly show that townships with relatively larger number ot households are also showing relatively higher number of radio ownership.

Now let us plot the choropleth maps showing the dsitribution of total number of households and Radio penetration rate by using the code chunk below.

tm_shape(shan_sf) +
    tm_polygons(c("TT_HOUSEHOLDS", "RADIO_PR"),
                style="jenks") +
    tm_facets(sync = TRUE, ncol = 2) +
  tm_legend(legend.position = c("right", "bottom"))+
  tm_layout(outer.margins=0, asp=0)

Correlation Analysis

Before we perform cluster analysis, it is important for us to endure that the input variables use are not highly correlated.

In this section, you will learn how to use corrplot.mixed() function of corrplot package to visualise and analyse the correlation of the input variables.

cluster_vars.cor = cor(ict_derived[,12:17])
corrplot.mixed(cluster_vars.cor,
         lower = "ellipse", 
               upper = "number",
               tl.pos = "lt",
               diag = "l",
               tl.col = "black")

The correlation plot above shows that COMPUTER_PR and INTERNET_PR are highly correlated. This suggest that only one of them should be used in the cluster analysis instead of both.

Hierachy Cluster Analysis

Extracting analysis variables

The code chunk below will be used to extract the clustering variables from the shan_sf simple feature object into data.frame

cluster_vars <- shan_sf %>%
  st_set_geometry(NULL) %>%
  select("TS.x", "RADIO_PR", "TV_PR", "LLPHONE_PR", "MPHONE_PR", "COMPUTER_PR")
cluster_vars
##          TS.x  RADIO_PR    TV_PR LLPHONE_PR MPHONE_PR COMPUTER_PR
## 1     Mongmit 286.18517 554.1313  35.306182 260.69440   12.159391
## 2     Pindaya 417.46466 505.1300  19.835841 162.39170   12.881897
## 3     Ywangan 484.52147 260.5734  11.935906 120.28559    4.414650
## 4    Pinlaung 231.64994 541.7189  28.544542 249.49028   13.762547
## 5      Mabein 449.49027 708.6423  72.752549 392.60890   16.450417
## 6       Kalaw 280.76244 611.6204  42.064778 408.79514   29.631601
## 7       Pekon 318.61183 535.8494  39.832703 214.84764   18.970325
## 8    Lawksawk 387.10175 630.0035  31.513657 320.56863   21.766768
## 9   Nawnghkio 349.33590 547.9456  38.449603 323.02011   15.764647
## 10    Kyaukme 210.95485 601.1773  39.582672 372.49304   30.947094
## 11       Muse 175.88008 842.5317 139.551634 712.92878   91.293779
## 12     Laihka 153.35609 590.5575  65.529010 261.43345   27.758817
## 13    Mongnai 182.51928 515.9062   5.141388 267.51285   28.277635
## 14    Mawkmai  97.44231 234.5010   2.780095  36.41924    7.784265
## 15     Kutkai 238.71578 474.0161  62.897378 264.62088   21.070884
## 16    Mongton 193.51808 388.4453  25.129169 290.51198   21.841240
## 17    Mongyai 168.44238 364.2707  20.333133 135.12905    9.378752
## 18  Mongkaing 156.74749 116.0299   6.000490  50.75925    8.388440
## 19     Lashio 235.73893 680.7891  66.685148 463.83909   60.216842
## 20    Mongpan 228.74008 517.2477  21.582734 437.74211   46.485888
## 21     Matman 244.12297 219.4093   8.438819  45.20796    6.027728
## 22  Tachileik 363.35354 759.4214  59.751756 735.43090   92.401937
## 23    Narphan  75.27012 210.0886  42.187690 305.26891    3.277892
## 24   Mongkhet  79.37695 313.3956  29.034268 114.64174    6.853583
## 25     Hsipaw 198.12167 475.8613  32.316784 269.08669   14.588719
## 26   Monghsat 107.67490 460.1169  34.508769 189.51537   22.817273
## 27    Mongmao 120.05744 512.6855  85.495452 424.98803   11.297271
## 28    Nansang 203.99724 585.6969  37.586810 314.63712   30.641998
## 29  Laukkaing  53.26400 778.5117 134.354578 716.74513   35.556180
## 30   Pangsang  97.44765 490.3358  62.569694 440.83757   50.427456
## 31      Namtu 240.01374 630.6159  35.220342 105.05970   19.929559
## 32  Monghpyak 405.02839 503.3252  42.497972 381.83293   25.141930
## 33    Konkyan  21.05456 287.0743 169.351886 316.73380   10.801904
## 34   Mongping 161.90476 330.8817  24.611033 190.75908   11.598303
## 35     Hopong 205.55645 455.8395  32.311550 209.96441   14.477515
## 36 Nyaungshwe 323.70878 554.4870  18.998921 351.19857   15.691701
## 37   Hsihseng 246.77306 505.5689  31.796405 145.09780    9.917305
## 38     Mongla  82.18126 673.1951  86.981567 555.10753   62.596006
## 39      Hseni 257.32235 628.8310  44.825537 431.87028   19.696069
## 40    Kunlong 121.92071 589.7806 181.485758 237.58661   20.785219
## 41     Hopang 127.60443 556.8737  81.746586 487.68817   24.523976
## 42    Namhkan 134.98623 653.5462  91.796237 596.48877   30.162955
## 43   Kengtung 267.50939 637.7874  63.203895 487.32989   47.542234
## 44    Langkho 227.46781 587.1483  74.391989 403.91035   32.308059
## 45    Monghsu 235.95261 464.9495  11.629171 231.06184   12.389958
## 46   Taunggyi 365.30919 755.2661  81.545688 586.66651   81.170403
## 47   Pangwaun  76.24025 366.7406  28.992770 518.14733    6.013315
## 48     Kyethi 268.62150 447.1408  22.357039  89.65285   12.807550
## 49     Loilen 142.92267 552.6099  21.649390 253.00431   32.015411
## 50     Manton 194.32513 188.3379   8.069764  54.79630    3.644410
## 51   Mongyang  71.74622 569.3250 113.944066 469.01868   12.847119
## 52    Kunhing 161.48796 473.0853  73.085339 340.48140   26.039387
## 53  Mongyawng 278.08501 739.2953 121.978196 324.06383   15.326276
## 54    Tangyan 223.86318 453.2753  22.594047 105.23021   12.065335
## 55    Namhsan 230.83668 390.5005  23.090976  97.77128    8.987943

Notice that the final clustering variables list does not include variable INTERNET_PR because it is highly correlated with variable COMPUTER_PR.

Next, we need to change the rows by township name instead of row number by using the code chunk below

row.names(cluster_vars) <- cluster_vars$"TS.x"
cluster_vars
##                  TS.x  RADIO_PR    TV_PR LLPHONE_PR MPHONE_PR COMPUTER_PR
## Mongmit       Mongmit 286.18517 554.1313  35.306182 260.69440   12.159391
## Pindaya       Pindaya 417.46466 505.1300  19.835841 162.39170   12.881897
## Ywangan       Ywangan 484.52147 260.5734  11.935906 120.28559    4.414650
## Pinlaung     Pinlaung 231.64994 541.7189  28.544542 249.49028   13.762547
## Mabein         Mabein 449.49027 708.6423  72.752549 392.60890   16.450417
## Kalaw           Kalaw 280.76244 611.6204  42.064778 408.79514   29.631601
## Pekon           Pekon 318.61183 535.8494  39.832703 214.84764   18.970325
## Lawksawk     Lawksawk 387.10175 630.0035  31.513657 320.56863   21.766768
## Nawnghkio   Nawnghkio 349.33590 547.9456  38.449603 323.02011   15.764647
## Kyaukme       Kyaukme 210.95485 601.1773  39.582672 372.49304   30.947094
## Muse             Muse 175.88008 842.5317 139.551634 712.92878   91.293779
## Laihka         Laihka 153.35609 590.5575  65.529010 261.43345   27.758817
## Mongnai       Mongnai 182.51928 515.9062   5.141388 267.51285   28.277635
## Mawkmai       Mawkmai  97.44231 234.5010   2.780095  36.41924    7.784265
## Kutkai         Kutkai 238.71578 474.0161  62.897378 264.62088   21.070884
## Mongton       Mongton 193.51808 388.4453  25.129169 290.51198   21.841240
## Mongyai       Mongyai 168.44238 364.2707  20.333133 135.12905    9.378752
## Mongkaing   Mongkaing 156.74749 116.0299   6.000490  50.75925    8.388440
## Lashio         Lashio 235.73893 680.7891  66.685148 463.83909   60.216842
## Mongpan       Mongpan 228.74008 517.2477  21.582734 437.74211   46.485888
## Matman         Matman 244.12297 219.4093   8.438819  45.20796    6.027728
## Tachileik   Tachileik 363.35354 759.4214  59.751756 735.43090   92.401937
## Narphan       Narphan  75.27012 210.0886  42.187690 305.26891    3.277892
## Mongkhet     Mongkhet  79.37695 313.3956  29.034268 114.64174    6.853583
## Hsipaw         Hsipaw 198.12167 475.8613  32.316784 269.08669   14.588719
## Monghsat     Monghsat 107.67490 460.1169  34.508769 189.51537   22.817273
## Mongmao       Mongmao 120.05744 512.6855  85.495452 424.98803   11.297271
## Nansang       Nansang 203.99724 585.6969  37.586810 314.63712   30.641998
## Laukkaing   Laukkaing  53.26400 778.5117 134.354578 716.74513   35.556180
## Pangsang     Pangsang  97.44765 490.3358  62.569694 440.83757   50.427456
## Namtu           Namtu 240.01374 630.6159  35.220342 105.05970   19.929559
## Monghpyak   Monghpyak 405.02839 503.3252  42.497972 381.83293   25.141930
## Konkyan       Konkyan  21.05456 287.0743 169.351886 316.73380   10.801904
## Mongping     Mongping 161.90476 330.8817  24.611033 190.75908   11.598303
## Hopong         Hopong 205.55645 455.8395  32.311550 209.96441   14.477515
## Nyaungshwe Nyaungshwe 323.70878 554.4870  18.998921 351.19857   15.691701
## Hsihseng     Hsihseng 246.77306 505.5689  31.796405 145.09780    9.917305
## Mongla         Mongla  82.18126 673.1951  86.981567 555.10753   62.596006
## Hseni           Hseni 257.32235 628.8310  44.825537 431.87028   19.696069
## Kunlong       Kunlong 121.92071 589.7806 181.485758 237.58661   20.785219
## Hopang         Hopang 127.60443 556.8737  81.746586 487.68817   24.523976
## Namhkan       Namhkan 134.98623 653.5462  91.796237 596.48877   30.162955
## Kengtung     Kengtung 267.50939 637.7874  63.203895 487.32989   47.542234
## Langkho       Langkho 227.46781 587.1483  74.391989 403.91035   32.308059
## Monghsu       Monghsu 235.95261 464.9495  11.629171 231.06184   12.389958
## Taunggyi     Taunggyi 365.30919 755.2661  81.545688 586.66651   81.170403
## Pangwaun     Pangwaun  76.24025 366.7406  28.992770 518.14733    6.013315
## Kyethi         Kyethi 268.62150 447.1408  22.357039  89.65285   12.807550
## Loilen         Loilen 142.92267 552.6099  21.649390 253.00431   32.015411
## Manton         Manton 194.32513 188.3379   8.069764  54.79630    3.644410
## Mongyang     Mongyang  71.74622 569.3250 113.944066 469.01868   12.847119
## Kunhing       Kunhing 161.48796 473.0853  73.085339 340.48140   26.039387
## Mongyawng   Mongyawng 278.08501 739.2953 121.978196 324.06383   15.326276
## Tangyan       Tangyan 223.86318 453.2753  22.594047 105.23021   12.065335
## Namhsan       Namhsan 230.83668 390.5005  23.090976  97.77128    8.987943

Notice that the row number has been replaced into the township name.

Now, we will delete the TS.x field by using the code chunk below.

shan_ict <- select(cluster_vars, c(2:6))
head(shan_ict)
##          RADIO_PR    TV_PR LLPHONE_PR MPHONE_PR COMPUTER_PR
## Mongmit  286.1852 554.1313   35.30618  260.6944    12.15939
## Pindaya  417.4647 505.1300   19.83584  162.3917    12.88190
## Ywangan  484.5215 260.5734   11.93591  120.2856     4.41465
## Pinlaung 231.6499 541.7189   28.54454  249.4903    13.76255
## Mabein   449.4903 708.6423   72.75255  392.6089    16.45042
## Kalaw    280.7624 611.6204   42.06478  408.7951    29.63160

Data Standardisation

In general, multiple variables will be used in cluster analysis. It is not unusual their values range are different. In order to avoid the cluster analysis result is baised to clustering variables with large values, it is useful to standardise the input variables before performing cluster analysis.

Min-Max standardisation

In the code chunk below, normalize() of heatmaply package is used to stadardisation the clustering variables by using Min-Max method. The summary() is then used to display the summary statistics of the standardised clustering variables.

shan_ict.std <- normalize(shan_ict)
summary(shan_ict.std)
##     RADIO_PR          TV_PR          LLPHONE_PR       MPHONE_PR     
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.2544   1st Qu.:0.4600   1st Qu.:0.1123   1st Qu.:0.2199  
##  Median :0.4097   Median :0.5523   Median :0.1948   Median :0.3846  
##  Mean   :0.4199   Mean   :0.5416   Mean   :0.2703   Mean   :0.3972  
##  3rd Qu.:0.5330   3rd Qu.:0.6750   3rd Qu.:0.3746   3rd Qu.:0.5608  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##   COMPUTER_PR     
##  Min.   :0.00000  
##  1st Qu.:0.09598  
##  Median :0.17607  
##  Mean   :0.23692  
##  3rd Qu.:0.29868  
##  Max.   :1.00000

Notice that the values range of the Min-max standardised clustering variables are 0-1 now.

Z-score standardisation

Z-score standardisation can be performed easily by using scale() of Base R. The code chunk below will be used to stadardisation the clustering variables by using Z-score method.

shan_ict.z <- scale(shan_ict)
describe(shan_ict.z)
##             vars  n mean sd median trimmed  mad   min  max range  skew kurtosis
## RADIO_PR       1 55    0  1  -0.04   -0.06 0.94 -1.85 2.55  4.40  0.48    -0.27
## TV_PR          2 55    0  1   0.05    0.04 0.78 -2.47 2.09  4.56 -0.38    -0.23
## LLPHONE_PR     3 55    0  1  -0.33   -0.15 0.68 -1.19 3.20  4.39  1.37     1.49
## MPHONE_PR      4 55    0  1  -0.05   -0.06 1.01 -1.58 2.40  3.98  0.48    -0.34
## COMPUTER_PR    5 55    0  1  -0.26   -0.18 0.64 -1.03 3.31  4.34  1.80     2.96
##               se
## RADIO_PR    0.13
## TV_PR       0.13
## LLPHONE_PR  0.13
## MPHONE_PR   0.13
## COMPUTER_PR 0.13

Notice the mean and standard deviation of the Z-score standardised clustering variables are 0 and 1 respectively.

Note: describe() of psych package is used here instead of summary() of Base R because the earlier provides standard deviation.

Warning: Z-score standardisation method should only be used if we would assume all variables come from some normal distribution.

Visualising the standardised clustering variables

Beside reviewing the summary statistics of the standardised clustering variables, it is also a good practice to visualise their distribution graphical.

The code chunk below plot the scaled Radio_PR field.

r <- ggplot(data=ict_derived, 
             aes(x= `RADIO_PR`)) +
  geom_histogram(bins=20, 
                 color="black", 
                 fill="light blue")

shan_ict_s_df <- as.data.frame(shan_ict.std)
s <- ggplot(data=shan_ict_s_df, 
       aes(x=`RADIO_PR`)) +
  geom_histogram(bins=20, 
                 color="black", 
                 fill="light blue") +
  ggtitle("Min-Max Standardisation")

shan_ict_z_df <- as.data.frame(shan_ict.z)
z <- ggplot(data=shan_ict_z_df, 
       aes(x=`RADIO_PR`)) +
  geom_histogram(bins=20, 
                 color="black", 
                 fill="light blue") +
  ggtitle("Z-score Standardisation")

ggarrange(r, s, z,
          ncol = 3,
          nrow = 1)

shan_ict_mat <- data.matrix(shan_ict)

Notice that the overall distribution of the clustering variables will change after the data standardisation. Hence, it is advisible NOT to perform data standardisation if the values range of the clustering variables are not very large.

Computing proximity matrix

In R, many packages provide functions to calculate distance matrix. We will compute the proximity matrix by using dist() of R.

dist() supports six distance proximity calculations, they are: euclidean, maximum, manhattan, canberra, binary and minkowski. The default is euclidean proximity matrix.

The code chunk below is used to compute the proximity matrix using euclidean method.

proxmat <- dist(shan_ict, method = 'euclidean')

The code chunk below can then be used to list the content of proxmat for visual inspection.

proxmat
##              Mongmit   Pindaya   Ywangan  Pinlaung    Mabein     Kalaw
## Pindaya    171.86828                                                  
## Ywangan    381.88259 257.31610                                        
## Pinlaung    57.46286 208.63519 400.05492                              
## Mabein     263.37099 313.45776 529.14689 312.66966                    
## Kalaw      160.05997 302.51785 499.53297 181.96406 198.14085          
## Pekon       59.61977 117.91580 336.50410  94.61225 282.26877 211.91531
## Lawksawk   140.11550 204.32952 432.16535 192.57320 130.36525 140.01101
## Nawnghkio   89.07103 180.64047 377.87702 139.27495 204.63154 127.74787
## Kyaukme    144.02475 311.01487 505.89191 139.67966 264.88283  79.42225
## Muse       563.01629 704.11252 899.44137 571.58335 453.27410 412.46033
## Laihka     141.87227 298.61288 491.83321 101.10150 345.00222 197.34633
## Mongnai    115.86190 258.49346 422.71934  64.52387 358.86053 200.34668
## Mawkmai    434.92968 437.99577 397.03752 398.11227 693.24602 562.59200
## Kutkai      97.61092 212.81775 360.11861  78.07733 340.55064 204.93018
## Mongton    192.67961 283.35574 361.23257 163.42143 425.16902 267.87522
## Mongyai    256.72744 287.41816 333.12853 220.56339 516.40426 386.74701
## Mongkaing  503.61965 481.71125 364.98429 476.29056 747.17454 625.24500
## Lashio     251.29457 398.98167 602.17475 262.51735 231.28227 106.69059
## Mongpan    193.32063 335.72896 483.68125 192.78316 301.52942 114.69105
## Matman     401.25041 354.39039 255.22031 382.40610 637.53975 537.63884
## Tachileik  529.63213 635.51774 807.44220 555.01039 365.32538 373.64459
## Narphan    406.15714 474.50209 452.95769 371.26895 630.34312 463.53759
## Mongkhet   349.45980 391.74783 408.97731 305.86058 610.30557 465.52013
## Hsipaw     118.18050 245.98884 388.63147  76.55260 366.42787 212.36711
## Monghsat   214.20854 314.71506 432.98028 160.44703 470.48135 317.96188
## Mongmao    242.54541 402.21719 542.85957 217.58854 384.91867 195.18913
## Nansang    104.91839 275.44246 472.77637  85.49572 287.92364 124.30500
## Laukkaing  568.27732 726.85355 908.82520 563.81750 520.67373 427.77791
## Pangsang   272.67383 428.24958 556.82263 244.47146 418.54016 224.03998
## Namtu      179.62251 225.40822 444.66868 170.04533 366.16094 307.27427
## Monghpyak  177.76325 221.30579 367.44835 222.20020 212.69450 167.08436
## Konkyan    403.39082 500.86933 528.12533 365.44693 613.51206 444.75859
## Mongping   265.12574 310.64850 337.94020 229.75261 518.16310 375.64739
## Hopong     136.93111 223.06050 352.85844  98.14855 398.00917 264.16294
## Nyaungshwe  99.38590 216.52463 407.11649 138.12050 210.21337  95.66782
## Hsihseng   131.49728 172.00796 342.91035 111.61846 381.20187 287.11074
## Mongla     384.30076 549.42389 728.16301 372.59678 406.09124 260.26411
## Hseni      189.37188 337.98982 534.44679 204.47572 213.61240  38.52842
## Kunlong    224.12169 355.47066 531.63089 194.76257 396.61508 273.01375
## Hopang     281.05362 443.26362 596.19312 265.96924 368.55167 185.14704
## Namhkan    386.02794 543.81859 714.43173 382.78835 379.56035 246.39577
## Kengtung   246.45691 385.68322 573.23173 263.48638 219.47071  88.29335
## Langkho    164.26299 323.28133 507.78892 168.44228 253.84371  67.19580
## Monghsu    109.15790 198.35391 340.42789  80.86834 367.19820 237.34578
## Taunggyi   399.84278 503.75471 697.98323 429.54386 226.24011 252.26066
## Pangwaun   381.51246 512.13162 580.13146 356.37963 523.44632 338.35194
## Kyethi     202.92551 175.54012 287.29358 189.47065 442.07679 360.17247
## Loilen     145.48666 293.61143 469.51621  91.56527 375.06406 217.19877
## Manton     430.64070 402.42888 306.16379 405.83081 674.01120 560.16577
## Mongyang   309.51302 475.93982 630.71590 286.03834 411.88352 233.56349
## Kunhing    173.50424 318.23811 449.67218 141.58836 375.82140 197.63683
## Mongyawng  214.21738 332.92193 570.56521 235.55497 193.49994 173.43078
## Tangyan    195.92520 208.43740 324.77002 169.50567 448.59948 348.06617
## Namhsan    237.78494 228.41073 286.16305 214.33352 488.33873 385.88676
##                Pekon  Lawksawk Nawnghkio   Kyaukme      Muse    Laihka
## Pindaya                                                               
## Ywangan                                                               
## Pinlaung                                                              
## Mabein                                                                
## Kalaw                                                                 
## Pekon                                                                 
## Lawksawk   157.51129                                                  
## Nawnghkio  113.15370  90.82891                                        
## Kyaukme    202.12206 186.29066 157.04230                              
## Muse       614.56144 510.13288 533.68806 434.75768                    
## Laihka     182.23667 246.74469 211.88187 128.24979 526.65211          
## Mongnai    151.60031 241.71260 182.21245 142.45669 571.97975 100.53457
## Mawkmai    416.00669 567.52693 495.15047 512.02846 926.93007 429.96554
## Kutkai     114.98048 224.64646 147.44053 170.93318 592.90743 144.67198
## Mongton    208.14888 311.07742 225.81118 229.28509 634.71074 212.07320
## Mongyai    242.52301 391.26989 319.57938 339.27780 763.91399 264.13364
## Mongkaing  480.23965 625.18712 546.69447 586.05094 995.66496 522.96309
## Lashio     303.80011 220.75270 230.55346 129.95255 313.15288 238.64533
## Mongpan    243.30037 228.54223 172.84425 110.37831 447.49969 210.76951
## Matman     368.25761 515.39711 444.05061 505.52285 929.11283 443.25453
## Tachileik  573.39528 441.82621 470.45533 429.15493 221.19950 549.08985
## Narphan    416.84901 523.69580 435.59661 420.30003 770.40234 392.32592
## Mongkhet   342.08722 487.41102 414.10280 409.03553 816.44931 324.97428
## Hsipaw     145.37542 249.35081 176.09570 163.95741 591.03355 128.42987
## Monghsat   225.64279 352.31496 289.83220 253.25370 663.76026 158.93517
## Mongmao    293.70625 314.64777 257.76465 146.09228 451.82530 185.99082
## Nansang    160.37607 188.78869 151.13185  60.32773 489.35308  78.78999
## Laukkaing  624.82399 548.83928 552.65554 428.74978 149.26996 507.39700
## Pangsang   321.81214 345.91486 287.10769 175.35273 460.24292 214.19291
## Namtu      165.02707 260.95300 257.52713 270.87277 659.16927 185.86794
## Monghpyak  190.93173 142.31691  93.03711 217.64419 539.43485 293.22640
## Konkyan    421.48797 520.31264 439.34272 393.79911 704.86973 351.75354
## Mongping   259.68288 396.47081 316.14719 330.28984 744.44948 272.82761
## Hopong     138.86577 274.91604 204.88286 218.84211 648.68011 157.48857
## Nyaungshwe 139.31874 104.17830  43.26545 126.50414 505.88581 201.71653
## Hsihseng   105.30573 257.11202 209.88026 250.27059 677.66886 175.89761
## Mongla     441.20998 393.18472 381.40808 241.58966 256.80556 315.93218
## Hseni      243.98001 171.50398 164.05304  81.20593 381.30567 204.49010
## Kunlong    249.36301 318.30406 285.04608 215.63037 547.24297 122.68682
## Hopang     336.38582 321.16462 279.84188 154.91633 377.44407 230.78652
## Namhkan    442.77120 379.41126 367.33575 247.81990 238.67060 342.43665
## Kengtung   297.67761 209.38215 208.29647 136.23356 330.08211 258.23950
## Langkho    219.21623 190.30257 156.51662  51.67279 413.64173 160.94435
## Monghsu    113.84636 242.04063 170.09168 200.77712 633.21624 163.28926
## Taunggyi   440.66133 304.96838 344.79200 312.60547 250.81471 425.36916
## Pangwaun   423.81347 453.02765 381.67478 308.31407 541.97887 351.78203
## Kyethi     162.43575 317.74604 267.21607 328.14177 757.16745 255.83275
## Loilen     181.94596 265.29318 219.26405 146.92675 560.43400  59.69478
## Manton     403.82131 551.13000 475.77296 522.86003 941.49778 458.30232
## Mongyang   363.58788 363.37684 323.32123 188.59489 389.59919 229.71502
## Kunhing    213.46379 278.68953 206.15773 145.00266 533.00162 142.03682
## Mongyawng  248.43910 179.07229 220.61209 181.55295 422.37358 211.99976
## Tangyan    167.79937 323.14701 269.07880 306.78359 736.93741 224.29176
## Namhsan    207.16559 362.84062 299.74967 347.85944 778.52971 273.79672
##              Mongnai   Mawkmai    Kutkai   Mongton   Mongyai Mongkaing
## Pindaya                                                               
## Ywangan                                                               
## Pinlaung                                                              
## Mabein                                                                
## Kalaw                                                                 
## Pekon                                                                 
## Lawksawk                                                              
## Nawnghkio                                                             
## Kyaukme                                                               
## Muse                                                                  
## Laihka                                                                
## Mongnai                                                               
## Mawkmai    374.50873                                                  
## Kutkai      91.15307 364.95519                                        
## Mongton    131.67061 313.35220 107.06341                              
## Mongyai    203.23607 178.70499 188.94166 159.79790                    
## Mongkaing  456.00842 133.29995 428.96133 365.50032 262.84016          
## Lashio     270.86983 638.60773 289.82513 347.11584 466.36472 708.65819
## Mongpan    178.09554 509.99632 185.18173 200.31803 346.39710 563.56780
## Matman     376.33870 147.83545 340.86349 303.04574 186.95158 135.51424
## Tachileik  563.95232 919.38755 568.99109 608.76740 750.29555 967.14087
## Narphan    329.31700 273.75350 314.27683 215.97925 248.82845 285.65085
## Mongkhet   275.76855 115.58388 273.91673 223.22828 104.98924 222.60577
## Hsipaw      52.68195 351.34601  51.46282  90.69766 177.33790 423.77868
## Monghsat   125.25968 275.09705 154.32012 150.98053 127.35225 375.60376
## Mongmao    188.29603 485.52853 204.69232 206.57001 335.61300 552.31959
## Nansang     92.79567 462.41938 130.04549 199.58124 288.55962 542.16609
## Laukkaing  551.56800 882.51110 580.38112 604.66190 732.68347 954.11795
## Pangsang   204.25746 484.14757 228.33583 210.77938 343.30638 548.40662
## Namtu      209.35473 427.95451 225.28268 308.71751 278.02761 525.04057
## Monghpyak  253.26470 536.71695 206.61627 258.04282 370.01575 568.21089
## Konkyan    328.82831 339.01411 310.60810 248.25265 287.87384 380.92091
## Mongping   202.99615 194.31049 182.75266 119.86993  65.38727 257.18572
## Hopong      91.53795 302.84362  73.45899 106.21031 124.62791 379.37916
## Nyaungshwe 169.63695 502.99026 152.15482 219.72196 327.13541 557.32112
## Hsihseng   142.36728 329.29477 128.21054 194.64317 162.27126 411.59788
## Mongla     354.10985 686.88950 388.40984 411.06668 535.28615 761.48327
## Hseni      216.81639 582.53670 229.37894 286.75945 408.23212 648.04408
## Kunlong    202.92529 446.53763 204.54010 270.02165 299.36066 539.91284
## Hopang     243.00945 561.24281 263.31986 273.50305 408.73288 626.17673
## Namhkan    370.05669 706.47792 392.48568 414.53594 550.62819 771.39688
## Kengtung   272.28711 632.54638 279.19573 329.38387 460.39706 692.74693
## Langkho    174.67678 531.08019 180.51419 236.70878 358.95672 597.42714
## Monghsu     84.11238 332.07962  62.60859 107.04894 154.86049 400.71816
## Taunggyi   448.55282 810.74692 450.33382 508.40925 635.94105 866.21117
## Pangwaun   312.13429 500.68857 321.80465 257.50434 394.07696 536.95736
## Kyethi     210.50453 278.85535 184.23422 222.52947 137.79420 352.06533
## Loilen      58.41263 388.73386 131.56529 176.16001 224.79239 482.18190
## Manton     391.54062 109.08779 361.82684 310.20581 195.59882  81.75337
## Mongyang   260.39387 558.83162 285.33223 295.60023 414.31237 631.91325
## Kunhing    110.55197 398.43973 108.84990 114.03609 238.99570 465.03971
## Mongyawng  275.77546 620.04321 281.03383 375.22688 445.78964 700.98284
## Tangyan    180.37471 262.66006 166.61820 198.88460 109.08506 348.56123
## Namhsan    218.10003 215.19289 191.32762 196.76188  77.35900 288.66231
##               Lashio   Mongpan    Matman Tachileik   Narphan  Mongkhet
## Pindaya                                                               
## Ywangan                                                               
## Pinlaung                                                              
## Mabein                                                                
## Kalaw                                                                 
## Pekon                                                                 
## Lawksawk                                                              
## Nawnghkio                                                             
## Kyaukme                                                               
## Muse                                                                  
## Laihka                                                                
## Mongnai                                                               
## Mawkmai                                                               
## Kutkai                                                                
## Mongton                                                               
## Mongyai                                                               
## Mongkaing                                                             
## Lashio                                                                
## Mongpan    172.33279                                                  
## Matman     628.11049 494.81014                                        
## Tachileik  311.95286 411.03849 890.12935                              
## Narphan    525.63854 371.13393 312.05193 760.29566                    
## Mongkhet   534.44463 412.17123 203.02855 820.50164 217.28718          
## Hsipaw     290.86435 179.52054 344.45451 576.18780 295.40170 253.80950
## Monghsat   377.86793 283.30992 313.59911 677.09508 278.21548 167.98445
## Mongmao    214.23677 131.59966 501.59903 472.95568 331.42618 375.35820
## Nansang    184.47950 144.77393 458.06573 486.77266 398.13308 360.99219
## Laukkaing  334.65738 435.58047 903.72094 325.06329 708.82887 769.06406
## Pangsang   236.72516 140.23910 506.29940 481.31907 316.30314 375.58139
## Namtu      365.88437 352.91394 416.65397 659.56458 494.36143 355.99713
## Monghpyak  262.09281 187.85699 470.46845 444.04411 448.40651 462.63265
## Konkyan    485.51312 365.87588 392.40306 730.92980 158.82353 254.24424
## Mongping   454.52548 318.47482 201.65224 727.08969 188.64567 113.80917
## Hopong     345.31042 239.43845 291.84351 632.45718 294.40441 212.99485
## Nyaungshwe 201.58191 137.29734 460.91883 445.81335 427.94086 417.08639
## Hsihseng   369.00833 295.87811 304.02806 658.87060 377.52977 256.70338
## Mongla     179.95877 253.20001 708.17595 347.33155 531.46949 574.40292
## Hseni       79.41836 120.66550 564.64051 354.90063 474.12297 481.88406
## Kunlong    295.23103 288.03320 468.27436 595.70536 413.07823 341.68641
## Hopang     170.63913 135.62913 573.55355 403.82035 397.85908 451.51070
## Namhkan    173.27153 240.34131 715.42102 295.91660 536.85519 596.19944
## Kengtung    59.85893 142.21554 613.01033 295.90429 505.40025 531.35998
## Langkho    115.18145  94.98486 518.86151 402.33622 420.65204 428.08061
## Monghsu    325.71557 216.25326 308.13805 605.02113 311.92379 247.73318
## Taunggyi   195.14541 319.81385 778.45810 150.84117 684.20905 712.80752
## Pangwaun   362.45608 232.52209 523.43600 540.60474 264.64997 407.02947
## Kyethi     447.10266 358.89620 233.83079 728.87329 374.90376 233.25039
## Loilen     268.92310 207.25000 406.56282 573.75476 354.79137 284.76895
## Manton     646.66493 507.96808  59.52318 910.23039 280.26395 181.33894
## Mongyang   209.33700 194.93467 585.61776 448.79027 401.39475 445.40621
## Kunhing    255.10832 137.85278 403.66587 532.26397 281.62645 292.49814
## Mongyawng  172.70139 275.15989 601.80824 432.10118 572.76394 522.91815
## Tangyan    429.84475 340.39128 242.78233 719.84066 348.84991 201.49393
## Namhsan    472.04024 364.77086 180.09747 754.03913 316.54695 170.90848
##               Hsipaw  Monghsat   Mongmao   Nansang Laukkaing  Pangsang
## Pindaya                                                               
## Ywangan                                                               
## Pinlaung                                                              
## Mabein                                                                
## Kalaw                                                                 
## Pekon                                                                 
## Lawksawk                                                              
## Nawnghkio                                                             
## Kyaukme                                                               
## Muse                                                                  
## Laihka                                                                
## Mongnai                                                               
## Mawkmai                                                               
## Kutkai                                                                
## Mongton                                                               
## Mongyai                                                               
## Mongkaing                                                             
## Lashio                                                                
## Mongpan                                                               
## Matman                                                                
## Tachileik                                                             
## Narphan                                                               
## Mongkhet                                                              
## Hsipaw                                                                
## Monghsat   121.78922                                                  
## Mongmao    185.99483 247.17708                                        
## Nansang    120.24428 201.92690 164.99494                              
## Laukkaing  569.06099 626.44910 404.00848 480.60074                    
## Pangsang   205.04337 256.37933  57.60801 193.36162 408.04016          
## Namtu      229.44658 231.78673 365.03882 217.61884 664.06286 392.97391
## Monghpyak  237.67919 356.84917 291.88846 227.52638 565.84279 315.11651
## Konkyan    296.74316 268.25060 281.87425 374.70456 635.92043 274.81900
## Mongping   168.92101 140.95392 305.57166 287.36626 708.13447 308.33123
## Hopong      62.86179 100.45714 244.16253 167.66291 628.48557 261.51075
## Nyaungshwe 169.92664 286.37238 230.45003 131.18943 520.24345 257.77823
## Hsihseng   136.54610 153.49551 311.98001 193.53779 670.74564 335.52974
## Mongla     373.47509 429.00536 216.24705 289.45119 202.55831 217.88123
## Hseni      231.48538 331.22632 184.67099 136.45492 391.74585 214.66375
## Kunlong    205.10051 202.31862 224.43391 183.01388 521.88657 258.49342
## Hopang     248.72536 317.64824  78.29342 196.47091 331.67199  92.57672
## Namhkan    382.79302 455.10875 223.32205 302.89487 196.46063 231.38484
## Kengtung   284.08582 383.72138 207.58055 193.67980 351.48520 229.85484
## Langkho    183.05109 279.52329 134.50170  99.39859 410.41270 167.65920
## Monghsu     58.55724 137.24737 242.43599 153.59962 619.01766 260.52971
## Taunggyi   462.31183 562.88102 387.33906 365.04897 345.98041 405.59730
## Pangwaun   298.12447 343.53898 187.40057 326.12960 470.63605 157.48757
## Kyethi     195.17677 190.50609 377.89657 273.02385 749.99415 396.89963
## Loilen      98.04789 118.65144 190.26490  94.23028 535.57527 207.94433
## Manton     359.60008 317.15603 503.79786 476.55544 907.38406 504.75214
## Mongyang   267.10497 312.64797  91.06281 218.49285 326.19219 108.37735
## Kunhing     90.77517 165.38834 103.91040 128.20940 500.41640 123.18870
## Mongyawng  294.70967 364.40429 296.40789 191.11990 454.80044 336.16703
## Tangyan    167.69794 144.59626 347.14183 249.70235 722.40954 364.76893
## Namhsan    194.47928 169.56962 371.71448 294.16284 760.45960 385.65526
##                Namtu Monghpyak   Konkyan  Mongping    Hopong Nyaungshwe
## Pindaya                                                                
## Ywangan                                                                
## Pinlaung                                                               
## Mabein                                                                 
## Kalaw                                                                  
## Pekon                                                                  
## Lawksawk                                                               
## Nawnghkio                                                              
## Kyaukme                                                                
## Muse                                                                   
## Laihka                                                                 
## Mongnai                                                                
## Mawkmai                                                                
## Kutkai                                                                 
## Mongton                                                                
## Mongyai                                                                
## Mongkaing                                                              
## Lashio                                                                 
## Mongpan                                                                
## Matman                                                                 
## Tachileik                                                              
## Narphan                                                                
## Mongkhet                                                               
## Hsipaw                                                                 
## Monghsat                                                               
## Mongmao                                                                
## Nansang                                                                
## Laukkaing                                                              
## Pangsang                                                               
## Namtu                                                                  
## Monghpyak  346.57799                                                   
## Konkyan    478.37690 463.39594                                         
## Mongping   321.66441 354.76537 242.02901                               
## Hopong     206.82668 267.95563 304.49287 134.00139                     
## Nyaungshwe 271.41464 103.97300 432.35040 319.32583 209.32532           
## Hsihseng   131.89940 285.37627 383.49700 199.64389  91.65458  225.80242
## Mongla     483.49434 408.03397 468.09747 512.61580 432.31105  347.60273
## Hseni      327.41448 200.26876 448.84563 395.58453 286.41193  130.86310
## Kunlong    233.60474 357.44661 329.11433 309.05385 219.06817  285.13095
## Hopang     408.24516 304.26577 348.18522 379.27212 309.77356  247.19891
## Namhkan    506.32466 379.50202 481.59596 523.74815 444.13246  333.32428
## Kengtung   385.33554 221.47613 474.82621 442.80821 340.47382  177.75714
## Langkho    305.03473 200.27496 386.95022 343.96455 239.63685  128.26577
## Monghsu    209.64684 232.17823 331.72187 158.90478  43.40665  173.82799
## Taunggyi   518.72748 334.17439 650.56905 621.53039 513.76415  325.09619
## Pangwaun   517.03554 381.95144 263.97576 340.37881 346.00673  352.92324
## Kyethi     186.90932 328.16234 400.10989 187.43974 136.49038  288.06872
## Loilen     194.24075 296.99681 334.19820 231.99959 124.74445  206.40432
## Manton     448.58230 502.20840 366.66876 200.48082 310.58885  488.79874
## Mongyang   413.26052 358.17599 329.39338 387.80686 323.35704  294.29500
## Kunhing    296.43996 250.74435 253.74202 212.59619 145.15617  189.97131
## Mongyawng  262.24331 285.56475 522.38580 455.59190 326.59925  218.12104
## Tangyan    178.69483 335.26416 367.46064 161.67411 106.82328  284.14692
## Namhsan    240.95555 352.70492 352.20115 130.23777 132.70541  315.91750
##             Hsihseng    Mongla     Hseni   Kunlong    Hopang   Namhkan
## Pindaya                                                               
## Ywangan                                                               
## Pinlaung                                                              
## Mabein                                                                
## Kalaw                                                                 
## Pekon                                                                 
## Lawksawk                                                              
## Nawnghkio                                                             
## Kyaukme                                                               
## Muse                                                                  
## Laihka                                                                
## Mongnai                                                               
## Mawkmai                                                               
## Kutkai                                                                
## Mongton                                                               
## Mongyai                                                               
## Mongkaing                                                             
## Lashio                                                                
## Mongpan                                                               
## Matman                                                                
## Tachileik                                                             
## Narphan                                                               
## Mongkhet                                                              
## Hsipaw                                                                
## Monghsat                                                              
## Mongmao                                                               
## Nansang                                                               
## Laukkaing                                                             
## Pangsang                                                              
## Namtu                                                                 
## Monghpyak                                                             
## Konkyan                                                               
## Mongping                                                              
## Hopong                                                                
## Nyaungshwe                                                            
## Hsihseng                                                              
## Mongla     478.66210                                                  
## Hseni      312.74375 226.82048                                        
## Kunlong    231.85967 346.46200 276.19175                              
## Hopang     370.01334 147.02444 162.80878 271.34451                    
## Namhkan    492.09476  77.21355 212.11323 375.73885 146.18632          
## Kengtung   370.72441 202.45004  66.12817 317.14187 164.29921 175.63015
## Langkho    276.27441 229.01675  66.66133 224.52741 134.24847 224.40029
## Monghsu     97.82470 424.51868 262.28462 239.89665 301.84458 431.32637
## Taunggyi   528.14240 297.09863 238.19389 471.29032 329.95252 257.29147
## Pangwaun   433.06326 319.18643 330.70182 392.45403 206.98364 310.44067
## Kyethi      84.04049 556.02500 388.33498 298.55859 440.48114 567.86202
## Loilen     158.84853 338.67408 227.10984 166.53599 242.89326 364.90647
## Manton     334.87758 712.51416 584.63341 479.76855 577.52046 721.86149
## Mongyang   382.59743 146.66661 210.19929 247.22785  69.25859 167.72448
## Kunhing    220.15490 306.47566 206.47448 193.77551 172.96164 314.92119
## Mongyawng  309.51462 315.57550 173.86004 240.39800 290.51360 321.21112
## Tangyan     70.27241 526.80849 373.07575 268.07983 412.22167 542.64078
## Namhsan    125.74240 564.02740 411.96125 310.40560 440.51555 576.42717
##             Kengtung   Langkho   Monghsu  Taunggyi  Pangwaun    Kyethi
## Pindaya                                                               
## Ywangan                                                               
## Pinlaung                                                              
## Mabein                                                                
## Kalaw                                                                 
## Pekon                                                                 
## Lawksawk                                                              
## Nawnghkio                                                             
## Kyaukme                                                               
## Muse                                                                  
## Laihka                                                                
## Mongnai                                                               
## Mawkmai                                                               
## Kutkai                                                                
## Mongton                                                               
## Mongyai                                                               
## Mongkaing                                                             
## Lashio                                                                
## Mongpan                                                               
## Matman                                                                
## Tachileik                                                             
## Narphan                                                               
## Mongkhet                                                              
## Hsipaw                                                                
## Monghsat                                                              
## Mongmao                                                               
## Nansang                                                               
## Laukkaing                                                             
## Pangsang                                                              
## Namtu                                                                 
## Monghpyak                                                             
## Konkyan                                                               
## Mongping                                                              
## Hopong                                                                
## Nyaungshwe                                                            
## Hsihseng                                                              
## Mongla                                                                
## Hseni                                                                 
## Kunlong                                                               
## Hopang                                                                
## Namhkan                                                               
## Kengtung                                                              
## Langkho    107.16213                                                  
## Monghsu    316.91914 221.84918                                        
## Taunggyi   186.28225 288.27478 486.91951                              
## Pangwaun   337.48335 295.38434 343.38498 497.61245                    
## Kyethi     444.26274 350.91512 146.61572 599.57407 476.62610          
## Loilen     282.22935 184.10672 131.55208 455.91617 331.69981 232.32965
## Manton     631.99123 535.95620 330.76503 803.08034 510.79265 272.03299
## Mongyang   217.08047 175.35413 323.95988 374.58247 225.25026 453.86726
## Kunhing    245.95083 146.38284 146.78891 429.98509 229.09986 278.95182
## Mongyawng  203.87199 186.11584 312.85089 287.73864 475.33116 387.71518
## Tangyan    429.95076 332.02048 127.42203 592.65262 447.05580  47.79331
## Namhsan    466.20497 368.20978 153.22576 631.49232 448.58030  68.67929
##               Loilen    Manton  Mongyang   Kunhing Mongyawng   Tangyan
## Pindaya                                                               
## Ywangan                                                               
## Pinlaung                                                              
## Mabein                                                                
## Kalaw                                                                 
## Pekon                                                                 
## Lawksawk                                                              
## Nawnghkio                                                             
## Kyaukme                                                               
## Muse                                                                  
## Laihka                                                                
## Mongnai                                                               
## Mawkmai                                                               
## Kutkai                                                                
## Mongton                                                               
## Mongyai                                                               
## Mongkaing                                                             
## Lashio                                                                
## Mongpan                                                               
## Matman                                                                
## Tachileik                                                             
## Narphan                                                               
## Mongkhet                                                              
## Hsipaw                                                                
## Monghsat                                                              
## Mongmao                                                               
## Nansang                                                               
## Laukkaing                                                             
## Pangsang                                                              
## Namtu                                                                 
## Monghpyak                                                             
## Konkyan                                                               
## Mongping                                                              
## Hopong                                                                
## Nyaungshwe                                                            
## Hsihseng                                                              
## Mongla                                                                
## Hseni                                                                 
## Kunlong                                                               
## Hopang                                                                
## Namhkan                                                               
## Kengtung                                                              
## Langkho                                                               
## Monghsu                                                               
## Taunggyi                                                              
## Pangwaun                                                              
## Kyethi                                                                
## Loilen                                                                
## Manton     419.06087                                                  
## Mongyang   246.76592 585.70558                                        
## Kunhing    130.39336 410.49230 188.89405                              
## Mongyawng  261.75211 629.43339 304.21734 295.35984                    
## Tangyan    196.60826 271.82672 421.06366 249.74161 377.52279          
## Namhsan    242.15271 210.48485 450.97869 270.79121 430.02019  63.67613

Computing hierarchical clustering

In R, there are several packages provide hierarchical clustering function. In this hands-on exercise, hclust() of R stats will be used.

hclust() employed agglomeration method to compute the cluster. Eight clustering algorithms are supported, they are: ward.D, ward.D2, single, complete, average(UPGMA), mcquitty(WPGMA), median(WPGMC) and centroid(UPGMC).

The code chunk below performs hierarchical cluster analysis using ward.D method. The hierarchical clustering output is stored in an object of class hclust which describes the tree produced by the clustering process.

hclust_ward <- hclust(proxmat, method = 'ward.D')

We can then plot the tree by using plot() of R Graphicas as shown in the code chunk below

plot(hclust_ward, cex = 0.6)

Selecting the optimal clustering algorithm

One of the challenge in performing hierarchical clustering is to identify stronger clustering structures. The issue can be solved by using use agnes() function of cluster package. It functions like hclus(), however, with the agnes() function you can also get the agglomerative coefficient, which measures the amount of clustering structure found (values closer to 1 suggest strong clustering structure).

The code chunk below will be used to compute the agglomerative coefficients of all hierarchical clustering algorithms.

m <- c( "average", "single", "complete", "ward")
names(m) <- c( "average", "single", "complete", "ward")

ac <- function(x) {
  agnes(shan_ict, method = x)$ac
}

map_dbl(m, ac)
##   average    single  complete      ward 
## 0.8131144 0.6628705 0.8950702 0.9427730

With reference to the output above, we can see that Ward’s method provides the strongest clustering structure among the four methods assessed. Hence, in the subsequent analysis, only Ward’s method will be used.

Interpreting the dendrograms

In the dendrogram displayed above, each leaf corresponds to one observation. As we move up the tree, observations that are similar to each other are combined into branches, which are themselves fused at a higher height.

The height of the fusion, provided on the vertical axis, indicates the (dis)similarity between two observations. The higher the height of the fusion, the less similar the observations are. Note that, conclusions about the proximity of two observations can be drawn only based on the height where branches containing those two observations first are fused. We cannot use the proximity of two observations along the horizontal axis as a criteria of their similarity.

It’s also possible to draw the dendrogram with a border around the selected clusters by using rect.hclust() of R stats. The argument border is used to specify the border colors for the rectangles.

plot(hclust_ward, cex = 0.6)
rect.hclust(hclust_ward, k = 5, border = 2:5)

### Visually-driven hierarchical clustering analysis In this section, we will learn how to perform visually-driven hiearchical clustering analysis by using heatmaply package.

With heatmaply, we are able to build both highly interactive cluster heatmap or static cluster heatmap.

Transforming the data frame into a matrix

The data was loaded into a data frame, but it has to be a data matrix to make your heatmap.

The code chunk below will be used to transform shan_ict data frame into a data matrix.

shan_ict_mat <- data.matrix(shan_ict)

Plotting interactive cluster heatmap using heatmaply()

In the code chunk below, the heatmaply() is used to build an interactive cluster heatmap.

heatmaply(shan_ict_mat,
          Colv=NA,
          dist_method = "euclidean",
          hclust_method = "ward.D",
          seriate = "OLO",
          colors = Blues,
          k_row = 5,
          margins = c(NA,200,60,NA),
          fontsize_row = 4,
          fontsize_col = 5,
          main="Geographic Segmentation of Shan State by ICT indicators",
          xlab = "ICT Indicators",
          ylab = "Townships of Shan State"
          )
## Warning in doTryCatch(return(expr), name, parentenv, handler): unable to load shared object '/Library/Frameworks/R.framework/Resources/modules//R_X11.so':
##   dlopen(/Library/Frameworks/R.framework/Resources/modules//R_X11.so, 6): Library not loaded: /opt/X11/lib/libSM.6.dylib
##   Referenced from: /Library/Frameworks/R.framework/Resources/modules//R_X11.so
##   Reason: image not found

Mapping the clusters formed

With closed examination of the dendragram above, we have decided to retain five clusters.

cutree() of R Base will be used in the code chunk below to derive a 5-cluster model.

groups <- as.factor(cutree(hclust_ward, k=5))

The output is called groups. It is a list object.

In order to visualise the clusters, the groups object need to be appended onto shan_sf simple feature object.

The code chunk below form the join in three steps: 1. the groups list object will be converted into a matrix; 2. cbind() is used to append groups matrix onto shan_sf to produce an output simple feature object called shan_sf_cluster; and 3. rename of dplyr package is used to rename as.matrix.groups field as CLUSTER.

shan_sf_cluster <- cbind(shan_sf, as.matrix(groups)) %>%
  rename(`CLUSTER`=`as.matrix.groups.`)

Next, qtm() of tmap package is used to plot the choropleth map showing the cluster formed.

qtm(shan_sf_cluster, "CLUSTER")

The choropleth map above reveals the clusters are very fragmented. The is one of the major limitation when non-spatial clustering algorithm such as hierarchical cluster analysis method is used.

Spatially Constrained Clustering - SKATER approach

In this section, you will learn how to derive spatially constrained cluster by using SKATER method.

Converting into SpatialPolygonsDataFrame

First, we need to convert shan_sf into SpatialPolygonDataFrame. This is because SKATER function only support sp objects such as SpatialPolygonDataFrame.

The code chunk below uses as_Spatial() of sf package to convert shan_sf into a SpatialPolygonDataFrame called shan_sp.

shan_sp <- as_Spatial(shan_sf)

Computing Neighbour list

Next, poly2nd() of spdep package will be used to compute the neighbours list from polygon list.

shan.nb <- poly2nb(shan_sp)
summary(shan.nb)
## Neighbour list object:
## Number of regions: 55 
## Number of nonzero links: 264 
## Percentage nonzero weights: 8.727273 
## Average number of links: 4.8 
## Link number distribution:
## 
##  2  3  4  5  6  7  8  9 
##  5  9  7 21  4  3  5  1 
## 5 least connected regions:
## 3 5 7 9 47 with 2 links
## 1 most connected region:
## 8 with 9 links

We can plot the neighbours list on shan_sp by using the code chunk below. Since we now can plot the community area boundaries as well, we plot this graph on top of the map. The first plot command gives the boundaries. This is followed by the plot of the neighbor list object, with coordinates applied to the original SpatialPolygonDataFrame (Shan state township boundaries) to extract the centroids of the polygons. These are used as the nodes for the graph representation. We also set the color to blue and specify add=TRUE to plot the network on top of the boundaries.

plot(shan_sp, border=grey(.5))
plot(shan.nb, coordinates(shan_sp), col="blue", add=TRUE)

Note that if you plot the network first and then the boundaries, some of the areas will be clipped. This is because the plotting area is determined by the characteristics of the first plot. In this example, because the boundary map extends further than the graph, we plot it first.

Computing minimum spanning tree

Calculating edge costs

Next, nbcosts() of spdep package is used to compute the cost of each edge. It is the distance between it nodes. This function compute this distance using a data.frame with observations vector in each node.

The code chunk below is used to compute the cost of each edge.

lcosts <- nbcosts(shan.nb, shan_ict)

For each observation, this gives the pairwise dissimilarity between its values on the five variables and the values for the neighbouring observation (from the neighbour list). Basically, this is the notion of a generalised weight for a spatial weights matrix.

Next, We will incorporate these costs into a weights object in the same way as we did in the calculation of inverse of distance weights. In other words, we convert the neighbour list to a list weights object by specifying the just computed lcosts as the weights.

In order to achieve this, nb2listw() of spdep package is used as shown in the code chunk below.

Note that we specify the style as B to make sure the cost values are not row-standardised

shan.w <- nb2listw(shan.nb, lcosts, style="B")
summary(shan.w)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 55 
## Number of nonzero links: 264 
## Percentage nonzero weights: 8.727273 
## Average number of links: 4.8 
## Link number distribution:
## 
##  2  3  4  5  6  7  8  9 
##  5  9  7 21  4  3  5  1 
## 5 least connected regions:
## 3 5 7 9 47 with 2 links
## 1 most connected region:
## 8 with 9 links
## 
## Weights style: B 
## Weights constants summary:
##    n   nn       S0       S1        S2
## B 55 3025 76267.65 58260785 522016004

Computing minimum spanning tree

The minimum spanning tree is computed by mean of the mstree() of spdep package as shown in the code chunk below.

shan.mst <- mstree(shan.w)

After computing the MST, we can check its class and dimension by using the code chunk below.

class(shan.mst)
## [1] "mst"    "matrix"
dim(shan.mst)
## [1] 54  3

Note that the dimension is 54 and not 55. This is because the minimum spanning tree consists on n-1 edges (links) in order to traverse all the nodes.

We can display the content of shan.mst by using head() as shown in the code chunk below.

head(shan.mst)
##      [,1] [,2]      [,3]
## [1,]   23   47 264.64997
## [2,]   47   27 187.40057
## [3,]   27   30  57.60801
## [4,]   27   41  78.29342
## [5,]   30   51 108.37735
## [6,]   51   38 146.66661

The plot method for the MST include a way to show the observation numbers of the nodes in addition to the edge. As before, we plot this together with the township boundaries. We can see how the initial neighbour list is simplified to just one edge connecting each of the nodes, while passing through all the nodes.

plot(shan_sp, border=gray(.5))
plot.mst(shan.mst, coordinates(shan_sp), 
     col="blue", cex.lab=0.7, cex.circles=0.005, add=TRUE)

Computing spatially constrained clusters using SKATER method

The code chunk below compute the spatially constrained cluster using skater() of spdep package.

clust5 <- skater(shan.mst[,1:2], shan_ict, 4)

The skater() takes three mandatory arguments: - the first two columns of the MST matrix (i.e. not the cost), - the data matrix (to update the costs as units are being grouped), and - the number of cuts. Note: It is set to one less than the number of clusters. So, the value specified is not the number of clusters, but the number of cuts in the graph, one less than the number of custers.

The result of the skater() is an object of class skater. We can examine its contents by using the code chunk below.

str(clust5)
## List of 8
##  $ groups      : num [1:55] 3 3 3 3 3 3 3 3 3 3 ...
##  $ edges.groups:List of 5
##   ..$ :List of 3
##   .. ..$ node: num 23
##   .. ..$ edge: num[0 , 1:3] 
##   .. ..$ ssw : num 0
##   ..$ :List of 3
##   .. ..$ node: num [1:22] 13 48 54 55 45 37 34 16 25 52 ...
##   .. ..$ edge: num [1:21, 1:3] 48 55 54 37 34 16 45 25 13 13 ...
##   .. ..$ ssw : num 3423
##   ..$ :List of 3
##   .. ..$ node: num [1:12] 10 9 6 2 8 1 4 36 46 3 ...
##   .. ..$ edge: num [1:11, 1:3] 2 6 8 1 36 4 6 10 10 8 ...
##   .. ..$ ssw : num 1846
##   ..$ :List of 3
##   .. ..$ node: num [1:2] 44 20
##   .. ..$ edge: num [1, 1:3] 44 20 95
##   .. ..$ ssw : num 95
##   ..$ :List of 3
##   .. ..$ node: num [1:18] 47 53 38 42 15 41 51 43 32 30 ...
##   .. ..$ edge: num [1:17, 1:3] 53 15 42 38 41 51 15 47 15 43 ...
##   .. ..$ ssw : num 3759
##  $ not.prune   : NULL
##  $ candidates  : int [1:5] 1 2 3 4 5
##  $ ssto        : num 12613
##  $ ssw         : num [1:5] 12613 10977 9962 9540 9123
##  $ crit        : num [1:2] 1 Inf
##  $ vec.crit    : num [1:55] 1 1 1 1 1 1 1 1 1 1 ...
##  - attr(*, "class")= chr "skater"

The most interesting component of this list structure is the groups vector containing the labels of the cluster to which each observation belongs (as before, the label itself is arbitary). This is followed by a detailed summary for each of the clusters in the edges.groups list. Sum of squares measures are given as ssto for the total and ssw to show the effect of each of the cuts on the overall criterion.

We can check the cluster assignment by using the conde chunk below.

ccs5 <- clust5$groups
ccs5
##  [1] 3 3 3 3 3 3 3 3 3 3 5 2 2 2 5 2 2 2 5 4 2 5 1 2 2 2 5 2 5 5 2 5 5 2 2 3 2 5
## [39] 5 5 5 5 5 4 2 3 5 2 2 2 5 2 5 2 2

We can find out how many observations are in each cluster by means of the table command. Parenthetially, we can also find this as the dimension of each vector in the lists contained in edges.groups. For example, the first list has node with dimension 12, which is also the number of observations in the first cluster.

table(ccs5)
## ccs5
##  1  2  3  4  5 
##  1 22 12  2 18

Lastly, we can also plot the pruned tree that shows the five clusters on top of the townshop area.

plot(shan_sp, border=gray(.5))
plot(clust5, coordinates(shan_sp), cex.lab=.7,
     groups.colors=c("red","green","blue", "brown", "pink"), cex.circles=0.005, add=TRUE)
## Warning in segments(coords[id1, 1], coords[id1, 2], coords[id2, 1],
## coords[id2, : "add" is not a graphical parameter

## Warning in segments(coords[id1, 1], coords[id1, 2], coords[id2, 1],
## coords[id2, : "add" is not a graphical parameter

## Warning in segments(coords[id1, 1], coords[id1, 2], coords[id2, 1],
## coords[id2, : "add" is not a graphical parameter

## Warning in segments(coords[id1, 1], coords[id1, 2], coords[id2, 1],
## coords[id2, : "add" is not a graphical parameter

Visualising in chloropeth map

The code chunk below is used to plot the newly derived clusters by using SKATER method.

groups_mat <- as.matrix(clust5$groups)
shan_sf_spatialcluster <- cbind(shan_sf_cluster, as.factor(groups_mat)) %>%
  rename(`SP_CLUSTER`=`as.factor.groups_mat.`)
qtm(shan_sf_spatialcluster, "SP_CLUSTER")