1. Overview

Setting the Scene Brazil is the world’s fifth-largest country by area and the sixth most populor. Brazil is classified as an upper-middle income economy by the World Bank and a developing country, with the largest share of global wealth in Latin America. It is considered an advanced emerging economy. It has the ninth largest GDP in the world by nominal, and eighth by PPP measures. Behind all this impressive figures, the spatial development of Brazil is highly unequal as shown in the figure on the right. In 2013, the GDP per capita of the poorest municipality is R$3,190.6. On the other hand, the GDP per capita of the richest municipality is R$314,638. Half of the municipalities have GDP per capita less than R$16,000 and the top 25% municipalities earn R$26,155 and above.

1.1. The Tasks

The specific tasks of the analysis are as follows:

Preparing a series of choropleth maps showing the distribution of spatial specialisation by industry type, 2016 at municipality level.
Delineating industry specialisation clusters by using hierarchical clustering method and display their distribution by using appropriate thematic mapping technique.
Delineating industry specialisation clusters by using spatially constrained clustering method and display their distribution by using appropriate thematic mapping technique.

1.2. The data

For the purpose of this assignments, these two datasets are used.

BRAZIL_CITIES.csv. This data file consists of 81 columns and 5573 rows. Each row representing one municipality.
Data_Dictionary.csv. This file provides meta data of each columns in BRAZIL_CITIES.csv.

2. Install R Packages

Firstly, before we get started, it is important for us to install the necessary R packages into R and launch these R packages into R environment.

These are the R packages used for this take home assignment 3.

Spatial data handling : sf, rgdal and spdep
Attribute data handling : tidyverse, especially readr, ggplot2 and dplyr
Choropleth mapping: tmap
Multivariate data visualisation and analysis: coorplot, ggpubr, and heatmaply
Cluster analysis: cluster

packages = c('sf','rgdal', 'spdep', 'tidyverse', 'readr', 'dplyr', 'tmap', 'corrplot', 'ggpubr', 'heatmaply', 'cluster', 'factoextra', 'NbClust',  'psych', "geobr")
for (p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
    }
  library(p,character.only = T)
}

## Loading required package: sf

## Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1

## Loading required package: rgdal

## Loading required package: sp

## rgdal: version: 1.5-16, (SVN revision 1050)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 3.0.4, released 2020/01/28
## Path to GDAL shared files: C:/Users/THIN SU YEE/Documents/R/win-library/4.0/rgdal/gdal
## GDAL binary built with GEOS: TRUE 
## Loaded PROJ runtime: Rel. 6.3.1, February 10th, 2020, [PJ_VERSION: 631]
## Path to PROJ shared files: C:/Users/THIN SU YEE/Documents/R/win-library/4.0/rgdal/proj
## Linking to sp version:1.4-2
## To mute warnings of possible GDAL/OSR exportToProj4() degradation,
## use options("rgdal_show_exportToProj4_warnings"="none") before loading rgdal.

## Loading required package: spdep

## Loading required package: spData

## To access larger datasets in this package, install the spDataLarge
## package with: `install.packages('spDataLarge',
## repos='https://nowosad.github.io/drat/', type='source')`

## Loading required package: tidyverse

## -- Attaching packages -------------------------------------------------------------------------------------------------------------- tidyverse 1.3.0 --

## v ggplot2 3.3.2     v purrr   0.3.4
## v tibble  3.0.3     v dplyr   1.0.2
## v tidyr   1.1.1     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.5.0

## -- Conflicts ----------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

## Loading required package: tmap

## Loading required package: corrplot

## Warning: package 'corrplot' was built under R version 4.0.3

## corrplot 0.84 loaded

## Loading required package: ggpubr

## Warning: package 'ggpubr' was built under R version 4.0.3

## Loading required package: heatmaply

## Warning: package 'heatmaply' was built under R version 4.0.3

## Loading required package: plotly

## Warning: package 'plotly' was built under R version 4.0.3

## 
## Attaching package: 'plotly'

## The following object is masked from 'package:ggplot2':
## 
##     last_plot

## The following object is masked from 'package:stats':
## 
##     filter

## The following object is masked from 'package:graphics':
## 
##     layout

## Loading required package: viridis

## Loading required package: viridisLite

## 
## ======================
## Welcome to heatmaply version 1.1.1
## 
## Type citation('heatmaply') for how to cite the package.
## Type ?heatmaply for the main documentation.
## 
## The github page is: https://github.com/talgalili/heatmaply/
## Please submit your suggestions and bug-reports at: https://github.com/talgalili/heatmaply/issues
## Or contact: <tal.galili@gmail.com>
## ======================

## Loading required package: cluster

## Loading required package: factoextra

## Warning: package 'factoextra' was built under R version 4.0.3

## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa

## Loading required package: NbClust

## Warning: package 'NbClust' was built under R version 4.0.3

## Loading required package: psych

## Warning: package 'psych' was built under R version 4.0.3

## 
## Attaching package: 'psych'

## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha

## Loading required package: geobr

## Warning: package 'geobr' was built under R version 4.0.3

3. Loading the data

3.1. Importing the data

This code chunk imports BRAZIL_CITIES.csv into R by using read.csv2() and read_delim since csv file is using a semicolon as delimiter.

brazil_cities <- read.csv2("data/aspatial/BRAZIL_CITIES.csv", encoding = 'utf-8')

brazil_data <- read_delim("data/aspatial/BRAZIL_CITIES.csv", delim = ";")

## Parsed with column specification:
## cols(
##   .default = col_double(),
##   CITY = col_character(),
##   STATE = col_character(),
##   AREA = col_number(),
##   REGIAO_TUR = col_character(),
##   CATEGORIA_TUR = col_character(),
##   RURAL_URBAN = col_character(),
##   GVA_MAIN = col_character()
## )

## See spec(...) for full column specifications.

This code chunk checks the content of imported brazil_cities data.

head(brazil_cities, n=5)

##                  CITY STATE CAPITAL IBGE_RES_POP IBGE_RES_POP_BRAS
## 1    Abadia De GoiÃ¡s    GO       0         6876              6876
## 2 Abadia Dos Dourados    MG       0         6704              6704
## 3          AbadiÃ¢nia    GO       0        15757             15609
## 4             AbaetÃ©    MG       0        22690             22690
## 5          Abaetetuba    PA       0       141100            141040
##   IBGE_RES_POP_ESTR IBGE_DU IBGE_DU_URBAN IBGE_DU_RURAL IBGE_POP IBGE_1
## 1                 0    2137          1546           591     5300     69
## 2                 0    2328          1481           847     4154     38
## 3               148    4655          3233          1422    10656    139
## 4                 0    7694          6667          1027    18464    176
## 5                60   31061         19057         12004    82956   1354
##   IBGE_1.4 IBGE_5.9 IBGE_10.14 IBGE_15.59 IBGE_60. IBGE_PLANTED_AREA
## 1      318      438        517       3542      416               319
## 2      207      260        351       2709      589              4479
## 3      650      894       1087       6896      990             10307
## 4      856     1233       1539      11979     2681              1862
## 5     5567     7618       8905      53516     5996             25200
##   IBGE_CROP_PRODUCTION_. IDHM.Ranking.2010  IDHM IDHM_Renda IDHM_Longevidade
## 1                   1843              1689 0.708      0.687             0.83
## 2                  18017              2207  0.69      0.693            0.839
## 3                  33085              2202  0.69      0.671            0.841
## 4                   7502              1994 0.698       0.72            0.848
## 5                 700872              3530 0.628      0.579            0.798
##   IDHM_Educacao         LONG          LAT     ALT PAY_TV FIXED_PHONES     AREA
## 1         0.622 -49.44054783 -16.75881189   893.6    360          842   147.26
## 2         0.563 -47.39683244 -18.48756496  753.12     77          296   881.06
## 3         0.579 -48.71881214 -16.18267186 1017.55    227          720 1,045.13
## 4         0.556 -45.44619142 -19.15584769  644.74   1230         1716 1,817.07
## 5         0.537 -48.88440382 -1.723469863   10.12   3389         1218 1,610.65
##                              REGIAO_TUR CATEGORIA_TUR ESTIMATED_POP
## 1                                                              8583
## 2                   Caminhos Do Cerrado             D          6972
## 3 RegiÃ£o TurÃstica Do Ouro E Cristais             C         19614
## 4                  Lago De TrÃªs Marias             D         23223
## 5                    Araguaia-Tocantins             D        156292
##       RURAL_URBAN GVA_AGROPEC GVA_INDUSTRY GVA_SERVICES GVA_PUBLIC  GVA_TOTAL
## 1          Urbano         6.2     27991.25     74750.32   36915.04   145857.6
## 2 Rural Adjacente    50524.57      25917.7     62689.23   28083.79  167215.28
## 3 Rural Adjacente       42.84      16728.3    138198.58    63396.2  261161.91
## 4          Urbano    113824.6     31002.62       172.33   86081.41  403241.27
## 5          Urbano   140463.72        58610    468128.69   486872.4 1154074.81
##      TAXES        GDP POP_GDP GDP_CAPITA
## 1  20554.2     166.41    8053   20664.57
## 2  12873.5     180.09    7037    25591.7
## 3 26822.58  287984.49   18427    15628.4
## 4 26994.09  430235.36   23574   18250.42
## 5 95180.48 1249255.29  151934    8222.36
##                                                                     GVA_MAIN
## 1                                                           Demais serviÃ§os
## 2                                                           Demais serviÃ§os
## 3                                                           Demais serviÃ§os
## 4                                                           Demais serviÃ§os
## 5 AdministraÃ§Ã£o, defesa, educaÃ§Ã£o e saÃºde pÃºblicas e seguridade social
##   MUN_EXPENDIT COMP_TOT COMP_A COMP_B COMP_C COMP_D COMP_E COMP_F COMP_G COMP_H
## 1     28227691      284      5      1     56      0      2     29    110     26
## 2     17909274      476      6      6     30      1      2     34    190     70
## 3     37513019      288      5      9     26      0      2      7    117     12
## 4           NA      621     18      1     40      0      1     20    303     62
## 5           NA      931      4      2     43      0      1     27    500     16
##   COMP_I COMP_J COMP_K COMP_L COMP_M COMP_N COMP_O COMP_P COMP_Q COMP_R COMP_S
## 1      4      5      0      2     10     12      4      6      6      1      5
## 2     28     11      0      4     15     29      2      9     14      6     19
## 3     57      2      1      0      7     15      3     11      5      1      8
## 4     30      9      6      4     28     27      2     15     19      9     27
## 5     31      6      1      1     22     16      2    155     33     15     56
##   COMP_T COMP_U HOTELS BEDS Pr_Agencies Pu_Agencies Pr_Bank Pu_Bank Pr_Assets
## 1      0      0     NA   NA          NA          NA      NA      NA        NA
## 2      0      0     NA   NA          NA          NA      NA      NA        NA
## 3      0      0      1   34           1           1       1       1  33724584
## 4      0      0     NA   NA           2           2       2       2  44974716
## 5      0      0     NA   NA           2           4       2       4  76181384
##   Pu_Assets Cars Motorcycles Wheeled_tractor UBER MAC WAL.MART POST_OFFICES
## 1        NA 2158        1246               0   NA  NA       NA            1
## 2        NA 2227        1142               0   NA  NA       NA            1
## 3  67091904 2838        1426               0   NA  NA       NA            3
## 4 371922572 6928        2953               0   NA  NA       NA            4
## 5 800078483 5277       25661               0   NA  NA       NA            2

head(brazil_data, n = 5)

## # A tibble: 5 x 81
##   CITY  STATE CAPITAL IBGE_RES_POP IBGE_RES_POP_BR~ IBGE_RES_POP_ES~ IBGE_DU
##   <chr> <chr>   <dbl>        <dbl>            <dbl>            <dbl>   <dbl>
## 1 Abad~ GO          0         6876             6876                0    2137
## 2 Abad~ MG          0         6704             6704                0    2328
## 3 Abad~ GO          0        15757            15609              148    4655
## 4 Abae~ MG          0        22690            22690                0    7694
## 5 Abae~ PA          0       141100           141040               60   31061
## # ... with 74 more variables: IBGE_DU_URBAN <dbl>, IBGE_DU_RURAL <dbl>,
## #   IBGE_POP <dbl>, IBGE_1 <dbl>, `IBGE_1-4` <dbl>, `IBGE_5-9` <dbl>,
## #   `IBGE_10-14` <dbl>, `IBGE_15-59` <dbl>, `IBGE_60+` <dbl>,
## #   IBGE_PLANTED_AREA <dbl>, `IBGE_CROP_PRODUCTION_$` <dbl>, `IDHM Ranking
## #   2010` <dbl>, IDHM <dbl>, IDHM_Renda <dbl>, IDHM_Longevidade <dbl>,
## #   IDHM_Educacao <dbl>, LONG <dbl>, LAT <dbl>, ALT <dbl>, PAY_TV <dbl>,
## #   FIXED_PHONES <dbl>, AREA <dbl>, REGIAO_TUR <chr>, CATEGORIA_TUR <chr>,
## #   ESTIMATED_POP <dbl>, RURAL_URBAN <chr>, GVA_AGROPEC <dbl>,
## #   GVA_INDUSTRY <dbl>, GVA_SERVICES <dbl>, GVA_PUBLIC <dbl>, ` GVA_TOTAL
## #   ` <dbl>, TAXES <dbl>, GDP <dbl>, POP_GDP <dbl>, GDP_CAPITA <dbl>,
## #   GVA_MAIN <chr>, MUN_EXPENDIT <dbl>, COMP_TOT <dbl>, COMP_A <dbl>,
## #   COMP_B <dbl>, COMP_C <dbl>, COMP_D <dbl>, COMP_E <dbl>, COMP_F <dbl>,
## #   COMP_G <dbl>, COMP_H <dbl>, COMP_I <dbl>, COMP_J <dbl>, COMP_K <dbl>,
## #   COMP_L <dbl>, COMP_M <dbl>, COMP_N <dbl>, COMP_O <dbl>, COMP_P <dbl>,
## #   COMP_Q <dbl>, COMP_R <dbl>, COMP_S <dbl>, COMP_T <dbl>, COMP_U <dbl>,
## #   HOTELS <dbl>, BEDS <dbl>, Pr_Agencies <dbl>, Pu_Agencies <dbl>,
## #   Pr_Bank <dbl>, Pu_Bank <dbl>, Pr_Assets <dbl>, Pu_Assets <dbl>, Cars <dbl>,
## #   Motorcycles <dbl>, Wheeled_tractor <dbl>, UBER <dbl>, MAC <dbl>,
## #   `WAL-MART` <dbl>, POST_OFFICES <dbl>

Next, this code chunks imports Data_Dictionary.csv into R by using read.csv2() since the file contains “;” as delimiter.

data_dict <- read.csv2("data/aspatial/Data_Dictionary.csv", )

This code chunk checks the content of imported data_dictionary which contains the metadata of the columns in brazil_cities data.

head(data_dict, n=5)

##            ï..FIELD                   DESCRIPTION REFERENCE UNIT
## 1              CITY              Name of the City               
## 2             STATE             Name of the State               
## 3           CAPITAL         1 if Capital of State               
## 4      IBGE_RES_POP          Resident Population       2010    -
## 5 IBGE_RES_POP_BRAS Resident Population Brazilian      2010    -
##                                  SOURCE X
## 1                                     -  
## 2                                     -  
## 3                                     -  
## 4 https://sidra.ibge.gov.br/tabela/1497  
## 5 https://sidra.ibge.gov.br/tabela/1497

3.2. Downloading boundary file

This code uses read_municipality() of geobr package to obtain a copy of 2016 municipality boundary file.

municipal<- read_municipality(year=2016)

## Using year 2016

## Loading data for the whole country. This might take a few minutes.

## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |===                                                                   |   4%
  |                                                                            
  |=====                                                                 |   7%
  |                                                                            
  |========                                                              |  11%
  |                                                                            
  |==========                                                            |  15%
  |                                                                            
  |=============                                                         |  19%
  |                                                                            
  |================                                                      |  22%
  |                                                                            
  |==================                                                    |  26%
  |                                                                            
  |=====================                                                 |  30%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |==========================                                            |  37%
  |                                                                            
  |=============================                                         |  41%
  |                                                                            
  |===============================                                       |  44%
  |                                                                            
  |==================================                                    |  48%
  |                                                                            
  |====================================                                  |  52%
  |                                                                            
  |=======================================                               |  56%
  |                                                                            
  |=========================================                             |  59%
  |                                                                            
  |============================================                          |  63%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |=================================================                     |  70%
  |                                                                            
  |====================================================                  |  74%
  |                                                                            
  |======================================================                |  78%
  |                                                                            
  |=========================================================             |  81%
  |                                                                            
  |============================================================          |  85%
  |                                                                            
  |==============================================================        |  89%
  |                                                                            
  |=================================================================     |  93%
  |                                                                            
  |===================================================================   |  96%
  |                                                                            
  |======================================================================| 100%

glimpse(municipal)

## Rows: 5,572
## Columns: 5
## $ code_muni    <dbl> 1100015, 1100023, 1100031, 1100049, 1100056, 1100064, ...
## $ name_muni    <chr> "Alta Floresta D'oeste", "Ariquemes", "Cabixi", "Cacoa...
## $ code_state   <chr> "11", "11", "11", "11", "11", "11", "11", "11", "11", ...
## $ abbrev_state <chr> "RO", "RO", "RO", "RO", "RO", "RO", "RO", "RO", "RO", ...
## $ geom         <MULTIPOLYGON [°]> MULTIPOLYGON (((-62.19465 -..., MULTIPOLY...

Observation: Based on the downloaded data, we can see that there are 5572 observations in this municipal boundary file. But not all of those are the interest of the study. Therefore, it is required to filter out unnecessary data and only take in the data that are needed to carry out the analysis of the study.

This code chunks uses read_metro_area() of geobr package to obtain a copy of the metropolitan boundary file.

metropolian <- read_metro_area(year=2016)

## Using year 2016

## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |======================================================================| 100%
## 
Downloading: 770 B     
Downloading: 770 B     
Downloading: 1.9 kB     
Downloading: 1.9 kB     
Downloading: 2.5 kB     
Downloading: 2.5 kB     
Downloading: 2.5 kB     
Downloading: 2.5 kB     
Downloading: 12 kB     
Downloading: 12 kB     
Downloading: 29 kB     
Downloading: 29 kB     
Downloading: 29 kB     
Downloading: 29 kB     
Downloading: 29 kB     
Downloading: 29 kB     
Downloading: 45 kB     
Downloading: 45 kB     
Downloading: 45 kB     
Downloading: 45 kB     
Downloading: 61 kB     
Downloading: 61 kB     
Downloading: 69 kB     
Downloading: 69 kB     
Downloading: 85 kB     
Downloading: 85 kB     
Downloading: 85 kB     
Downloading: 85 kB     
Downloading: 85 kB     
Downloading: 85 kB     
Downloading: 93 kB     
Downloading: 93 kB     
Downloading: 110 kB     
Downloading: 110 kB     
Downloading: 130 kB     
Downloading: 130 kB     
Downloading: 130 kB     
Downloading: 130 kB     
Downloading: 130 kB     
Downloading: 130 kB     
Downloading: 150 kB     
Downloading: 150 kB     
Downloading: 160 kB     
Downloading: 160 kB     
Downloading: 170 kB     
Downloading: 170 kB     
Downloading: 180 kB     
Downloading: 180 kB     
Downloading: 200 kB     
Downloading: 200 kB     
Downloading: 200 kB     
Downloading: 200 kB     
Downloading: 210 kB     
Downloading: 210 kB     
Downloading: 210 kB     
Downloading: 210 kB     
Downloading: 230 kB     
Downloading: 230 kB     
Downloading: 250 kB     
Downloading: 250 kB     
Downloading: 260 kB     
Downloading: 260 kB     
Downloading: 280 kB     
Downloading: 280 kB     
Downloading: 300 kB     
Downloading: 300 kB     
Downloading: 310 kB     
Downloading: 310 kB     
Downloading: 330 kB     
Downloading: 330 kB     
Downloading: 340 kB     
Downloading: 340 kB     
Downloading: 340 kB     
Downloading: 340 kB     
Downloading: 350 kB     
Downloading: 350 kB     
Downloading: 370 kB     
Downloading: 370 kB     
Downloading: 380 kB     
Downloading: 380 kB     
Downloading: 400 kB     
Downloading: 400 kB     
Downloading: 420 kB     
Downloading: 420 kB     
Downloading: 430 kB     
Downloading: 430 kB     
Downloading: 450 kB     
Downloading: 450 kB     
Downloading: 470 kB     
Downloading: 470 kB     
Downloading: 470 kB     
Downloading: 470 kB     
Downloading: 470 kB     
Downloading: 470 kB     
Downloading: 470 kB     
Downloading: 470 kB     
Downloading: 490 kB     
Downloading: 490 kB     
Downloading: 510 kB     
Downloading: 510 kB     
Downloading: 520 kB     
Downloading: 520 kB     
Downloading: 540 kB     
Downloading: 540 kB     
Downloading: 550 kB     
Downloading: 550 kB     
Downloading: 570 kB     
Downloading: 570 kB     
Downloading: 590 kB     
Downloading: 590 kB     
Downloading: 600 kB     
Downloading: 600 kB     
Downloading: 620 kB     
Downloading: 620 kB     
Downloading: 640 kB     
Downloading: 640 kB     
Downloading: 650 kB     
Downloading: 650 kB     
Downloading: 670 kB     
Downloading: 670 kB     
Downloading: 680 kB     
Downloading: 680 kB     
Downloading: 700 kB     
Downloading: 700 kB     
Downloading: 720 kB     
Downloading: 720 kB     
Downloading: 720 kB     
Downloading: 720 kB     
Downloading: 720 kB     
Downloading: 720 kB     
Downloading: 730 kB     
Downloading: 730 kB     
Downloading: 750 kB     
Downloading: 750 kB     
Downloading: 770 kB     
Downloading: 770 kB     
Downloading: 780 kB     
Downloading: 780 kB     
Downloading: 800 kB     
Downloading: 800 kB     
Downloading: 810 kB     
Downloading: 810 kB     
Downloading: 810 kB     
Downloading: 810 kB     
Downloading: 830 kB     
Downloading: 830 kB     
Downloading: 830 kB     
Downloading: 830 kB     
Downloading: 830 kB     
Downloading: 830 kB     
Downloading: 830 kB     
Downloading: 830 kB     
Downloading: 840 kB     
Downloading: 840 kB     
Downloading: 850 kB     
Downloading: 850 kB     
Downloading: 870 kB     
Downloading: 870 kB     
Downloading: 890 kB     
Downloading: 890 kB     
Downloading: 890 kB     
Downloading: 890 kB     
Downloading: 900 kB     
Downloading: 900 kB     
Downloading: 910 kB     
Downloading: 910 kB     
Downloading: 920 kB     
Downloading: 920 kB     
Downloading: 940 kB     
Downloading: 940 kB     
Downloading: 940 kB     
Downloading: 940 kB     
Downloading: 940 kB     
Downloading: 940 kB     
Downloading: 940 kB     
Downloading: 940 kB     
Downloading: 940 kB     
Downloading: 940 kB     
Downloading: 950 kB     
Downloading: 950 kB     
Downloading: 960 kB     
Downloading: 960 kB     
Downloading: 970 kB     
Downloading: 970 kB     
Downloading: 980 kB     
Downloading: 980 kB     
Downloading: 1 MB     
Downloading: 1 MB     
Downloading: 1 MB     
Downloading: 1 MB     
Downloading: 1 MB     
Downloading: 1 MB     
Downloading: 1 MB     
Downloading: 1 MB     
Downloading: 1.1 MB     
Downloading: 1.1 MB     
Downloading: 1.1 MB     
Downloading: 1.1 MB     
Downloading: 1.1 MB     
Downloading: 1.1 MB     
Downloading: 1.1 MB     
Downloading: 1.1 MB     
Downloading: 1.1 MB     
Downloading: 1.1 MB     
Downloading: 1.1 MB     
Downloading: 1.1 MB     
Downloading: 1.1 MB     
Downloading: 1.1 MB     
Downloading: 1.1 MB     
Downloading: 1.1 MB     
Downloading: 1.1 MB     
Downloading: 1.1 MB     
Downloading: 1.1 MB     
Downloading: 1.1 MB     
Downloading: 1.1 MB     
Downloading: 1.1 MB     
Downloading: 1.1 MB     
Downloading: 1.1 MB     
Downloading: 1.1 MB     
Downloading: 1.1 MB     
Downloading: 1.1 MB     
Downloading: 1.1 MB     
Downloading: 1.1 MB     
Downloading: 1.1 MB     
Downloading: 1.1 MB     
Downloading: 1.1 MB     
Downloading: 1.1 MB     
Downloading: 1.1 MB     
Downloading: 1.2 MB     
Downloading: 1.2 MB     
Downloading: 1.2 MB     
Downloading: 1.2 MB     
Downloading: 1.2 MB     
Downloading: 1.2 MB     
Downloading: 1.2 MB     
Downloading: 1.2 MB     
Downloading: 1.2 MB     
Downloading: 1.2 MB     
Downloading: 1.2 MB     
Downloading: 1.2 MB     
Downloading: 1.2 MB     
Downloading: 1.2 MB     
Downloading: 1.2 MB     
Downloading: 1.2 MB     
Downloading: 1.2 MB     
Downloading: 1.2 MB     
Downloading: 1.2 MB     
Downloading: 1.2 MB     
Downloading: 1.2 MB     
Downloading: 1.2 MB     
Downloading: 1.2 MB     
Downloading: 1.2 MB     
Downloading: 1.2 MB     
Downloading: 1.2 MB     
Downloading: 1.2 MB     
Downloading: 1.2 MB     
Downloading: 1.2 MB     
Downloading: 1.2 MB     
Downloading: 1.2 MB     
Downloading: 1.2 MB     
Downloading: 1.3 MB     
Downloading: 1.3 MB     
Downloading: 1.3 MB     
Downloading: 1.3 MB     
Downloading: 1.3 MB     
Downloading: 1.3 MB     
Downloading: 1.3 MB     
Downloading: 1.3 MB     
Downloading: 1.3 MB     
Downloading: 1.3 MB     
Downloading: 1.3 MB     
Downloading: 1.3 MB     
Downloading: 1.3 MB     
Downloading: 1.3 MB     
Downloading: 1.3 MB     
Downloading: 1.3 MB     
Downloading: 1.3 MB     
Downloading: 1.3 MB     
Downloading: 1.3 MB     
Downloading: 1.3 MB     
Downloading: 1.4 MB     
Downloading: 1.4 MB     
Downloading: 1.4 MB     
Downloading: 1.4 MB     
Downloading: 1.4 MB     
Downloading: 1.4 MB     
Downloading: 1.4 MB     
Downloading: 1.4 MB     
Downloading: 1.4 MB     
Downloading: 1.4 MB     
Downloading: 1.4 MB     
Downloading: 1.4 MB     
Downloading: 1.4 MB     
Downloading: 1.4 MB     
Downloading: 1.5 MB     
Downloading: 1.5 MB     
Downloading: 1.5 MB     
Downloading: 1.5 MB     
Downloading: 1.5 MB     
Downloading: 1.5 MB     
Downloading: 1.5 MB     
Downloading: 1.5 MB     
Downloading: 1.5 MB     
Downloading: 1.5 MB     
Downloading: 1.5 MB     
Downloading: 1.5 MB     
Downloading: 1.5 MB     
Downloading: 1.5 MB     
Downloading: 1.5 MB     
Downloading: 1.5 MB     
Downloading: 1.5 MB     
Downloading: 1.5 MB     
Downloading: 1.5 MB     
Downloading: 1.5 MB     
Downloading: 1.5 MB     
Downloading: 1.5 MB     
Downloading: 1.5 MB     
Downloading: 1.5 MB     
Downloading: 1.5 MB     
Downloading: 1.5 MB     
Downloading: 1.5 MB     
Downloading: 1.5 MB     
Downloading: 1.5 MB     
Downloading: 1.5 MB     
Downloading: 1.6 MB     
Downloading: 1.6 MB     
Downloading: 1.6 MB     
Downloading: 1.6 MB     
Downloading: 1.6 MB     
Downloading: 1.6 MB     
Downloading: 1.6 MB     
Downloading: 1.6 MB     
Downloading: 1.6 MB     
Downloading: 1.6 MB     
Downloading: 1.6 MB     
Downloading: 1.6 MB     
Downloading: 1.6 MB     
Downloading: 1.6 MB     
Downloading: 1.6 MB     
Downloading: 1.6 MB     
Downloading: 1.6 MB     
Downloading: 1.6 MB     
Downloading: 1.6 MB     
Downloading: 1.6 MB     
Downloading: 1.6 MB     
Downloading: 1.6 MB     
Downloading: 1.7 MB     
Downloading: 1.7 MB     
Downloading: 1.7 MB     
Downloading: 1.7 MB     
Downloading: 1.7 MB     
Downloading: 1.7 MB     
Downloading: 1.7 MB     
Downloading: 1.7 MB     
Downloading: 1.7 MB     
Downloading: 1.7 MB     
Downloading: 1.7 MB     
Downloading: 1.7 MB     
Downloading: 1.7 MB     
Downloading: 1.7 MB     
Downloading: 1.8 MB     
Downloading: 1.8 MB     
Downloading: 1.8 MB     
Downloading: 1.8 MB     
Downloading: 1.8 MB     
Downloading: 1.8 MB     
Downloading: 1.8 MB     
Downloading: 1.8 MB     
Downloading: 1.8 MB     
Downloading: 1.8 MB     
Downloading: 1.8 MB     
Downloading: 1.8 MB     
Downloading: 1.8 MB     
Downloading: 1.8 MB     
Downloading: 1.8 MB     
Downloading: 1.8 MB     
Downloading: 1.8 MB     
Downloading: 1.8 MB     
Downloading: 1.8 MB     
Downloading: 1.8 MB     
Downloading: 1.9 MB     
Downloading: 1.9 MB     
Downloading: 1.9 MB     
Downloading: 1.9 MB     
Downloading: 1.9 MB     
Downloading: 1.9 MB     
Downloading: 1.9 MB     
Downloading: 1.9 MB     
Downloading: 1.9 MB     
Downloading: 1.9 MB     
Downloading: 1.9 MB     
Downloading: 1.9 MB     
Downloading: 1.9 MB     
Downloading: 1.9 MB     
Downloading: 1.9 MB     
Downloading: 1.9 MB     
Downloading: 1.9 MB     
Downloading: 1.9 MB     
Downloading: 1.9 MB     
Downloading: 1.9 MB     
Downloading: 1.9 MB     
Downloading: 1.9 MB     
Downloading: 1.9 MB     
Downloading: 1.9 MB     
Downloading: 2 MB     
Downloading: 2 MB     
Downloading: 2 MB     
Downloading: 2 MB     
Downloading: 2 MB     
Downloading: 2 MB     
Downloading: 2 MB     
Downloading: 2 MB     
Downloading: 2 MB     
Downloading: 2 MB     
Downloading: 2 MB     
Downloading: 2 MB     
Downloading: 2.1 MB     
Downloading: 2.1 MB     
Downloading: 2.1 MB     
Downloading: 2.1 MB     
Downloading: 2.1 MB     
Downloading: 2.1 MB     
Downloading: 2.1 MB     
Downloading: 2.1 MB     
Downloading: 2.1 MB     
Downloading: 2.1 MB     
Downloading: 2.1 MB     
Downloading: 2.1 MB     
Downloading: 2.1 MB     
Downloading: 2.1 MB     
Downloading: 2.1 MB     
Downloading: 2.1 MB     
Downloading: 2.1 MB     
Downloading: 2.1 MB     
Downloading: 2.1 MB     
Downloading: 2.1 MB     
Downloading: 2.1 MB     
Downloading: 2.1 MB     
Downloading: 2.2 MB     
Downloading: 2.2 MB     
Downloading: 2.2 MB     
Downloading: 2.2 MB     
Downloading: 2.2 MB     
Downloading: 2.2 MB     
Downloading: 2.2 MB     
Downloading: 2.2 MB     
Downloading: 2.2 MB     
Downloading: 2.2 MB     
Downloading: 2.2 MB     
Downloading: 2.2 MB     
Downloading: 2.2 MB     
Downloading: 2.2 MB     
Downloading: 2.2 MB     
Downloading: 2.2 MB     
Downloading: 2.3 MB     
Downloading: 2.3 MB     
Downloading: 2.3 MB     
Downloading: 2.3 MB     
Downloading: 2.3 MB     
Downloading: 2.3 MB     
Downloading: 2.3 MB     
Downloading: 2.3 MB     
Downloading: 2.3 MB     
Downloading: 2.3 MB     
Downloading: 2.3 MB     
Downloading: 2.3 MB     
Downloading: 2.4 MB     
Downloading: 2.4 MB     
Downloading: 2.4 MB     
Downloading: 2.4 MB     
Downloading: 2.4 MB     
Downloading: 2.4 MB     
Downloading: 2.4 MB     
Downloading: 2.4 MB     
Downloading: 2.4 MB     
Downloading: 2.4 MB     
Downloading: 2.4 MB     
Downloading: 2.4 MB     
Downloading: 2.4 MB     
Downloading: 2.4 MB     
Downloading: 2.4 MB     
Downloading: 2.4 MB     
Downloading: 2.4 MB     
Downloading: 2.4 MB     
Downloading: 2.4 MB     
Downloading: 2.4 MB     
Downloading: 2.4 MB     
Downloading: 2.4 MB     
Downloading: 2.5 MB     
Downloading: 2.5 MB     
Downloading: 2.5 MB     
Downloading: 2.5 MB     
Downloading: 2.5 MB     
Downloading: 2.5 MB     
Downloading: 2.5 MB     
Downloading: 2.5 MB     
Downloading: 2.5 MB     
Downloading: 2.5 MB     
Downloading: 2.5 MB     
Downloading: 2.5 MB     
Downloading: 2.5 MB     
Downloading: 2.5 MB     
Downloading: 2.5 MB     
Downloading: 2.5 MB     
Downloading: 2.5 MB     
Downloading: 2.5 MB     
Downloading: 2.5 MB     
Downloading: 2.5 MB     
Downloading: 2.5 MB     
Downloading: 2.5 MB     
Downloading: 2.6 MB     
Downloading: 2.6 MB     
Downloading: 2.6 MB     
Downloading: 2.6 MB     
Downloading: 2.6 MB     
Downloading: 2.6 MB     
Downloading: 2.6 MB     
Downloading: 2.6 MB     
Downloading: 2.6 MB     
Downloading: 2.6 MB     
Downloading: 2.6 MB     
Downloading: 2.6 MB     
Downloading: 2.6 MB     
Downloading: 2.6 MB     
Downloading: 2.6 MB     
Downloading: 2.6 MB     
Downloading: 2.6 MB     
Downloading: 2.6 MB     
Downloading: 2.6 MB     
Downloading: 2.6 MB     
Downloading: 2.6 MB     
Downloading: 2.6 MB     
Downloading: 2.6 MB     
Downloading: 2.6 MB     
Downloading: 2.6 MB     
Downloading: 2.6 MB     
Downloading: 2.6 MB     
Downloading: 2.6 MB     
Downloading: 2.6 MB     
Downloading: 2.6 MB     
Downloading: 2.7 MB     
Downloading: 2.7 MB     
Downloading: 2.7 MB     
Downloading: 2.7 MB     
Downloading: 2.7 MB     
Downloading: 2.7 MB     
Downloading: 2.7 MB     
Downloading: 2.7 MB     
Downloading: 2.7 MB     
Downloading: 2.7 MB     
Downloading: 2.7 MB     
Downloading: 2.7 MB     
Downloading: 2.7 MB     
Downloading: 2.7 MB     
Downloading: 2.7 MB     
Downloading: 2.7 MB     
Downloading: 2.7 MB     
Downloading: 2.7 MB     
Downloading: 2.7 MB     
Downloading: 2.7 MB     
Downloading: 2.7 MB     
Downloading: 2.7 MB     
Downloading: 2.7 MB     
Downloading: 2.7 MB     
Downloading: 2.8 MB     
Downloading: 2.8 MB     
Downloading: 2.8 MB     
Downloading: 2.8 MB     
Downloading: 2.8 MB     
Downloading: 2.8 MB

glimpse(metropolian)

## Rows: 1,331
## Columns: 10
## $ code_muni        <dbl> 2700300, 2701506, 2702009, 2702355, 2702603, 27029...
## $ name_muni        <chr> "Arapiraca", "Campo Grande", "Coité Do Nóia", "Cra...
## $ code_state       <chr> "27", "27", "27", "27", "27", "27", "27", "27", "2...
## $ abbrev_state     <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "A...
## $ name_metro       <chr> "RM Agreste", "RM Agreste", "RM Agreste", "RM Agre...
## $ type             <chr> "RM", "RM", "RM", "RM", "RM", "RM", "RM", "RM", "R...
## $ subdivision      <chr> "NÃO TEM", "NÃO TEM", "NÃO TEM", "NÃO TEM", "NÃO T...
## $ legislation      <chr> "Lei Complementar 27", "Lei Complementar 27", "Lei...
## $ legislation_date <chr> "30.11.2009", "30.11.2009", "30.11.2009", "30.11.2...
## $ geom             <MULTIPOLYGON [°]> MULTIPOLYGON (((-36.66005 -..., MULTI...

Observation: Based on the downloaded data, we can see that there are 1331 observations in this metropolitan boundary file. But not all of those are the interest of the study. Therefore, it is required to filter out unnecessary data and only take in the data that are needed to carry out the analysis of the study.

3.3. Data Wranglings

Based on the observations above, it is necessary to filter out the data to meet the requirement of the study which is only to observe the list of municipals listed under São Paulo Macrometropolis to compute location quotients by industry type, at municipality level. ### 3.3.1. Extracting Municipals of Braganca Paulista city This code chunk will perform the data wrangling to get the list of 10 municipals in Regional Unit of Bragança Paulista city and add a new column name name_metro to the newly prepared muni_state simple feature frame.

muni_state <- municipal %>%
  filter(abbrev_state =="SP" & name_muni %in% c('Atibaia', 'Bom Jesus Dos Perdões', 'Bragança Paulista', 'Joanópolis', 'Nazaré Paulista', 'Pedra Bela', 'Pinhalzinho', 'Piracaia', 'Tuiuti', 'Vargem')) %>%
  mutate(name_metro = 'Regional Unit of Bragança Paulista city')

muni_state

## Simple feature collection with 10 features and 5 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -46.76118 ymin: -23.31611 xmax: -46.04105 ymax: -22.70131
## geographic CRS: SIRGAS 2000
##    code_muni             name_muni code_state abbrev_state
## 1    3504107               Atibaia         35           SP
## 2    3507100 Bom Jesus Dos Perdões         35           SP
## 3    3507605     Bragança Paulista         35           SP
## 4    3525508            Joanópolis         35           SP
## 5    3532405       Nazaré Paulista         35           SP
## 6    3536802            Pedra Bela         35           SP
## 7    3538204           Pinhalzinho         35           SP
## 8    3538600              Piracaia         35           SP
## 9    3554953                Tuiuti         35           SP
## 10   3556354                Vargem         35           SP
##                              geom                              name_metro
## 1  MULTIPOLYGON (((-46.66146 -... Regional Unit of Bragança Paulista city
## 2  MULTIPOLYGON (((-46.47303 -... Regional Unit of Bragança Paulista city
## 3  MULTIPOLYGON (((-46.50858 -... Regional Unit of Bragança Paulista city
## 4  MULTIPOLYGON (((-46.14383 -... Regional Unit of Bragança Paulista city
## 5  MULTIPOLYGON (((-46.26139 -... Regional Unit of Bragança Paulista city
## 6  MULTIPOLYGON (((-46.46284 -... Regional Unit of Bragança Paulista city
## 7  MULTIPOLYGON (((-46.53711 -... Regional Unit of Bragança Paulista city
## 8  MULTIPOLYGON (((-46.34589 -... Regional Unit of Bragança Paulista city
## 9  MULTIPOLYGON (((-46.6655 -2... Regional Unit of Bragança Paulista city
## 10 MULTIPOLYGON (((-46.39475 -... Regional Unit of Bragança Paulista city

Observation: By this data wrangling, muni_state simple feature frame now only contains the 10 municipals of SP state and 5 column names ’code muni, name_muni, code_state, abbrev_state, geom, and name metro.

3.3.2. Extracting Metropolians of Sao Paulo State

This code chunk will perform data wrangling to keep only the code_muni, name_muni, code_state, abbrev_state, name_metro and geom as the study area in the metro_state simple feature frame.

metro_state <- metropolian %>%
  filter(abbrev_state =="SP") %>%
  select(code_muni, name_muni, code_state, abbrev_state, name_metro,geom)

metro_state

## Simple feature collection with 164 features and 5 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -48.40857 ymin: -24.492 xmax: -44.16137 ymax: -22.01305
## geographic CRS: SIRGAS 2000
## First 10 features:
##    code_muni            name_muni code_state abbrev_state
## 1    3508405             Cabreúva         35           SP
## 2    3509601 Campo Limpo Paulista         35           SP
## 3    3524006              Itupeva         35           SP
## 4    3525201               Jarinu         35           SP
## 5    3525904              Jundiaí         35           SP
## 6    3527306             Louveira         35           SP
## 7    3556503      Várzea Paulista         35           SP
## 8    3500600   Águas De São Pedro         35           SP
## 9    3502002            Analândia         35           SP
## 10   3503307               Araras         35           SP
##                                         name_metro
## 1                    Aglomeração Urbana de Jundiaí
## 2                    Aglomeração Urbana de Jundiaí
## 3                    Aglomeração Urbana de Jundiaí
## 4                    Aglomeração Urbana de Jundiaí
## 5                    Aglomeração Urbana de Jundiaí
## 6                    Aglomeração Urbana de Jundiaí
## 7                    Aglomeração Urbana de Jundiaí
## 8  Aglomeração Urbana de Piracicaba-AU- Piracicaba
## 9  Aglomeração Urbana de Piracicaba-AU- Piracicaba
## 10 Aglomeração Urbana de Piracicaba-AU- Piracicaba
##                              geom
## 1  MULTIPOLYGON (((-47.09692 -...
## 2  MULTIPOLYGON (((-46.78614 -...
## 3  MULTIPOLYGON (((-47.04599 -...
## 4  MULTIPOLYGON (((-46.71471 -...
## 5  MULTIPOLYGON (((-46.85676 -...
## 6  MULTIPOLYGON (((-46.91599 -...
## 7  MULTIPOLYGON (((-46.82121 -...
## 8  MULTIPOLYGON (((-47.86297 -...
## 9  MULTIPOLYGON (((-47.67277 -...
## 10 MULTIPOLYGON (((-47.23694 -...

Observation: By this data wrangling, the created metro_state will now only contains the 164 metro that belong to “SP” state along with the 5 columns names; code_muni, name_muni, code_state, abbrev_state, name_metro and geom. It is also found that metro_state and muni_state simple feature frameshave the same number of column names. Therefore, it is possible to merge them to get the study area of the analysis.

3.3.3. Creating Study Area

This code chunk will merge the metro_state and muni_state together into one simple feature to form the study area of the assignment.

study_area <- rbind(metro_state, muni_state)

study_area

## Simple feature collection with 174 features and 5 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -48.40857 ymin: -24.492 xmax: -44.16137 ymax: -22.01305
## geographic CRS: SIRGAS 2000
## First 10 features:
##    code_muni            name_muni code_state abbrev_state
## 1    3508405             Cabreúva         35           SP
## 2    3509601 Campo Limpo Paulista         35           SP
## 3    3524006              Itupeva         35           SP
## 4    3525201               Jarinu         35           SP
## 5    3525904              Jundiaí         35           SP
## 6    3527306             Louveira         35           SP
## 7    3556503      Várzea Paulista         35           SP
## 8    3500600   Águas De São Pedro         35           SP
## 9    3502002            Analândia         35           SP
## 10   3503307               Araras         35           SP
##                                         name_metro
## 1                    Aglomeração Urbana de Jundiaí
## 2                    Aglomeração Urbana de Jundiaí
## 3                    Aglomeração Urbana de Jundiaí
## 4                    Aglomeração Urbana de Jundiaí
## 5                    Aglomeração Urbana de Jundiaí
## 6                    Aglomeração Urbana de Jundiaí
## 7                    Aglomeração Urbana de Jundiaí
## 8  Aglomeração Urbana de Piracicaba-AU- Piracicaba
## 9  Aglomeração Urbana de Piracicaba-AU- Piracicaba
## 10 Aglomeração Urbana de Piracicaba-AU- Piracicaba
##                              geom
## 1  MULTIPOLYGON (((-47.09692 -...
## 2  MULTIPOLYGON (((-46.78614 -...
## 3  MULTIPOLYGON (((-47.04599 -...
## 4  MULTIPOLYGON (((-46.71471 -...
## 5  MULTIPOLYGON (((-46.85676 -...
## 6  MULTIPOLYGON (((-46.91599 -...
## 7  MULTIPOLYGON (((-46.82121 -...
## 8  MULTIPOLYGON (((-47.86297 -...
## 9  MULTIPOLYGON (((-47.67277 -...
## 10 MULTIPOLYGON (((-47.23694 -...

Observation: After merging the data, the data from the 164 municipals from sao_paulo_metro_boundary and 10 municipals from braganca_paulista city are combined together to form the study of the area of the assignment. It contains a total of 174 municipals with 5 column names to describe them.

3.3.4. Extracting Brazil Cities belong to Sao Paulo State

This code chunk will filter by the brazil cities that belong to Sao Paulo state only and its industry data.

brazil_cities_industry <- brazil_data %>%
  filter(STATE == "SP")%>%
  select (CITY,contains(c("COMP")))

brazil_cities_industry

## # A tibble: 645 x 23
##    CITY  COMP_TOT COMP_A COMP_B COMP_C COMP_D COMP_E COMP_F COMP_G COMP_H COMP_I
##    <chr>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
##  1 Adam~     1726    272      0    102      0      4     35    663    102     99
##  2 Adol~       81     10      2      3      0      0      2     33      5      4
##  3 Aguaí      799    149      9     54      0      3     21    333     44     51
##  4 Água~      212     48      0     18      0      0      9     63      7     21
##  5 Água~      881     11      0    168      0      1     41    328     11     92
##  6 Água~      216     60      1     10      0      0      7     68     15     19
##  7 Água~      202      2      1      4      0      1      6     99      3     40
##  8 Agud~      840     46      0     91      0      3     38    365     50     64
##  9 Alam~      137     61      0     14      0      0      4     31      3      8
## 10 Alfr~      107     31      0      9      0      0      3     37      3      3
## # ... with 635 more rows, and 12 more variables: COMP_J <dbl>, COMP_K <dbl>,
## #   COMP_L <dbl>, COMP_M <dbl>, COMP_N <dbl>, COMP_O <dbl>, COMP_P <dbl>,
## #   COMP_Q <dbl>, COMP_R <dbl>, COMP_S <dbl>, COMP_T <dbl>, COMP_U <dbl>

Observation: By filtering by the brazil cities that belong to Sao Paulo state only, the data is reduced from original 5573 observations to 645 observation. The number of columns are also further reduced by only ttaking in the name of city, and the columns that contains the name ‘comp’ which are required in carrying out the analysis of the study area.

3.3.5. Creating Industry Study Area of Brazil

This code chunk will use the left_join() of the dplyr to combine operation between muni metro study_area and brazil_cities_industry data frames to form the industry study area.

industry_study_area <- left_join(study_area, brazil_cities_industry, by = c("name_muni"= "CITY"))

industry_study_area

## Simple feature collection with 174 features and 27 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -48.40857 ymin: -24.492 xmax: -44.16137 ymax: -22.01305
## geographic CRS: SIRGAS 2000
## First 10 features:
##    code_muni            name_muni code_state abbrev_state
## 1    3508405             Cabreúva         35           SP
## 2    3509601 Campo Limpo Paulista         35           SP
## 3    3524006              Itupeva         35           SP
## 4    3525201               Jarinu         35           SP
## 5    3525904              Jundiaí         35           SP
## 6    3527306             Louveira         35           SP
## 7    3556503      Várzea Paulista         35           SP
## 8    3500600   Águas De São Pedro         35           SP
## 9    3502002            Analândia         35           SP
## 10   3503307               Araras         35           SP
##                                         name_metro COMP_TOT COMP_A COMP_B
## 1                    Aglomeração Urbana de Jundiaí     1062     23      2
## 2                    Aglomeração Urbana de Jundiaí     1198     16      0
## 3                    Aglomeração Urbana de Jundiaí     1773     39      3
## 4                    Aglomeração Urbana de Jundiaí      802     29      2
## 5                    Aglomeração Urbana de Jundiaí    16395    198     11
## 6                    Aglomeração Urbana de Jundiaí     1230     34      0
## 7                    Aglomeração Urbana de Jundiaí     1975      8      0
## 8  Aglomeração Urbana de Piracicaba-AU- Piracicaba      202      2      1
## 9  Aglomeração Urbana de Piracicaba-AU- Piracicaba      162     36      3
## 10 Aglomeração Urbana de Piracicaba-AU- Piracicaba     4662    343      5
##    COMP_C COMP_D COMP_E COMP_F COMP_G COMP_H COMP_I COMP_J COMP_K COMP_L COMP_M
## 1     180      0      6     54    375    109     69     12     24     17     30
## 2     151      0      2     76    459     52     81     38      9     18     56
## 3     274      1      9     95    584    101    134     33     25     64     79
## 4      73      0      2     41    298     52     71     27      5      8     35
## 5    1030      2     42    742   5431    833   1152    648    310    458   1292
## 6     145      0      6     70    492    106    102     21      8     24     39
## 7     388      0     13    134    782    137    105     32      9     16     54
## 8       4      0      1      6     99      3     40      5      2      1      9
## 9       8      0      0      6     39     26     20      3      0      4      6
## 10    481      1     15    233   1816    269    300     86     61     69    180
##    COMP_N COMP_O COMP_P COMP_Q COMP_R COMP_S COMP_T COMP_U
## 1      62      2     18     35      4     40      0      0
## 2      73      2     84     31     10     40      0      0
## 3     150      4     61     21     22     74      0      0
## 4      76      2     37     15     11     18      0      0
## 5    1904      9    586    802    230    715      0      0
## 6      65      4     39     19     13     43      0      0
## 7     128      2     66     27     12     62      0      0
## 8      15      2      3      3      2      4      0      0
## 9       4      2      0      2      0      3      0      0
## 10    314      4    112    173     44    156      0      0
##                              geom
## 1  MULTIPOLYGON (((-47.09692 -...
## 2  MULTIPOLYGON (((-46.78614 -...
## 3  MULTIPOLYGON (((-47.04599 -...
## 4  MULTIPOLYGON (((-46.71471 -...
## 5  MULTIPOLYGON (((-46.85676 -...
## 6  MULTIPOLYGON (((-46.91599 -...
## 7  MULTIPOLYGON (((-46.82121 -...
## 8  MULTIPOLYGON (((-47.86297 -...
## 9  MULTIPOLYGON (((-47.67277 -...
## 10 MULTIPOLYGON (((-47.23694 -...

Observation: By merging the data between study_area and brazil_cities_industry data frames, the industry_study_area of this assignment is formed. This simple feature frame contains 174 features with 27 fields which will all be important in performing the analysis of the study to find out the distribution of spatial specialization by industry type, 2016 at municipality level.

3.3.6. Plotting Study Area

This code chunk will plot the study area to check if it is correct

qtm(study_area$geom)

Observation: This is the correct study area of the assignment, thus, will be continue with the further analysis.

3.4. Computing Location Quotient

This code chunk will compute the location quotient of each industry from Comp A to U

location_quotient <- industry_study_area %>%
mutate_all(~replace(., is.na(.), 0))%>%
mutate_at(vars(contains("COMP")), funs(LQ=(./COMP_TOT)/(sum(.)/sum(COMP_TOT))))%>%
select(name_muni,name_metro, contains(c("COMP")),contains(c("LQ")))

## Warning: `funs()` is deprecated as of dplyr 0.8.0.
## Please use a list of either functions or lambdas: 
## 
##   # Simple named list: 
##   list(mean = mean, median = median)
## 
##   # Auto named with `tibble::lst()`: 
##   tibble::lst(mean, median)
## 
##   # Using lambdas
##   list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.

location_quotient

## Simple feature collection with 174 features and 46 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -48.40857 ymin: -24.492 xmax: -44.16137 ymax: -22.01305
## geographic CRS: SIRGAS 2000
## First 10 features:
##               name_muni                                      name_metro
## 1              Cabreúva                   Aglomeração Urbana de Jundiaí
## 2  Campo Limpo Paulista                   Aglomeração Urbana de Jundiaí
## 3               Itupeva                   Aglomeração Urbana de Jundiaí
## 4                Jarinu                   Aglomeração Urbana de Jundiaí
## 5               Jundiaí                   Aglomeração Urbana de Jundiaí
## 6              Louveira                   Aglomeração Urbana de Jundiaí
## 7       Várzea Paulista                   Aglomeração Urbana de Jundiaí
## 8    Águas De São Pedro Aglomeração Urbana de Piracicaba-AU- Piracicaba
## 9             Analândia Aglomeração Urbana de Piracicaba-AU- Piracicaba
## 10               Araras Aglomeração Urbana de Piracicaba-AU- Piracicaba
##    COMP_TOT COMP_A COMP_B COMP_C COMP_D COMP_E COMP_F COMP_G COMP_H COMP_I
## 1      1062     23      2    180      0      6     54    375    109     69
## 2      1198     16      0    151      0      2     76    459     52     81
## 3      1773     39      3    274      1      9     95    584    101    134
## 4       802     29      2     73      0      2     41    298     52     71
## 5     16395    198     11   1030      2     42    742   5431    833   1152
## 6      1230     34      0    145      0      6     70    492    106    102
## 7      1975      8      0    388      0     13    134    782    137    105
## 8       202      2      1      4      0      1      6     99      3     40
## 9       162     36      3      8      0      0      6     39     26     20
## 10     4662    343      5    481      1     15    233   1816    269    300
##    COMP_J COMP_K COMP_L COMP_M COMP_N COMP_O COMP_P COMP_Q COMP_R COMP_S COMP_T
## 1      12     24     17     30     62      2     18     35      4     40      0
## 2      38      9     18     56     73      2     84     31     10     40      0
## 3      33     25     64     79    150      4     61     21     22     74      0
## 4      27      5      8     35     76      2     37     15     11     18      0
## 5     648    310    458   1292   1904      9    586    802    230    715      0
## 6      21      8     24     39     65      4     39     19     13     43      0
## 7      32      9     16     54    128      2     66     27     12     62      0
## 8       5      2      1      9     15      2      3      3      2      4      0
## 9       3      0      4      6      4      2      0      2      0      3      0
## 10     86     61     69    180    314      4    112    173     44    156      0
##    COMP_U COMP_TOT_LQ  COMP_A_LQ COMP_B_LQ COMP_C_LQ COMP_D_LQ COMP_E_LQ
## 1       0           1  1.6778605  3.651562 2.4607702 0.0000000 2.8991794
## 2       0           1  1.0347030  0.000000 1.8299668 0.0000000 0.8566857
## 3       0           1  1.7041523  3.280845 2.2437006 1.5842341 2.6048465
## 4       0           1  2.8014067  4.835360 1.3215134 0.0000000 1.2796877
## 5       0           1  0.9356347  1.300931 0.9121135 0.3426468 1.3145776
## 6       0           1  2.1415407  0.000000 1.7115357 0.0000000 2.5031940
## 7       0           1  0.3138162  0.000000 2.8522508 0.0000000 3.3777275
## 8       0           1  0.7670632  9.598907 0.2874959 0.0000000 2.5403701
## 9       0           1 17.2163080 35.907023 0.7169652 0.0000000 0.0000000
## 10      0           1  5.6999939  2.079557 1.4979451 0.6024983 1.6510771
##    COMP_F_LQ COMP_G_LQ COMP_H_LQ COMP_I_LQ COMP_J_LQ COMP_K_LQ COMP_L_LQ
## 1  1.0260217 1.0891761 2.3423920 1.0040926 0.2172296 0.7488611 0.7166303
## 2  1.2801005 1.1818089 0.9906132 1.0449064 0.6098022 0.2489432 0.6726458
## 3  1.0811904 1.0160042 1.3000805 1.1680065 0.3578223 0.4672463 1.6160023
## 4  1.0315655 1.1461284 1.4797439 1.3681483 0.6472196 0.2065904 0.4465667
## 5  0.9132292 1.0217855 1.1595545 1.0859010 0.7598464 0.6265631 1.2506184
## 6  1.1483657 1.2338187 1.9667916 1.2815757 0.3282286 0.2155259 0.8735281
## 7  1.3690679 1.2213244 1.5831099 0.8216207 0.3114907 0.1510045 0.3626800
## 8  0.5993592 1.5117334 0.3389436 3.0602564 0.4758618 0.3280901 0.2216253
## 9  0.7473491 0.7425761 3.6628230 1.9079376 0.3560152 0.0000000 1.1053905
## 10 1.0084885 1.2015309 1.3168539 0.9944849 0.3546406 0.4335829 0.6625941
##    COMP_M_LQ COMP_N_LQ  COMP_O_LQ COMP_P_LQ COMP_Q_LQ COMP_R_LQ COMP_S_LQ
## 1  0.3859211 0.4593034  2.9852937 0.5309395 0.8911083 0.3054418 0.8125985
## 2  0.6386060 0.4794005  2.6463956 2.1964410 0.6996677 0.6769181 0.7203502
## 3  0.6087236 0.6656025  3.5762909 1.0777504 0.3202562 1.0062523 0.9004581
## 4  0.5962048 0.7455412  3.9530946 1.4451883 0.5057127 1.1122727 0.4842155
## 5  1.0765964 0.9136671  0.8701872 1.1196525 1.3226651 1.1376518 0.9408820
## 6  0.4331729 0.4157581  5.1550925 0.9932454 0.4176721 0.8570994 0.7542301
## 7  0.3735326 0.5098886  1.6052567 1.0468246 0.3696440 0.4927278 0.6772751
## 8  0.6086855 0.5842145 15.6949599 0.4652292 0.4015659 0.8029187 0.4272176
## 9  0.5059855 0.1942573 19.5702586 0.0000000 0.3338120 0.0000000 0.3995276
## 10 0.5274752 0.5298950  1.3600952 0.7525629 1.0033692 0.7653734 0.7219263
##    COMP_T_LQ COMP_U_LQ                           geom
## 1        NaN         0 MULTIPOLYGON (((-47.09692 -...
## 2        NaN         0 MULTIPOLYGON (((-46.78614 -...
## 3        NaN         0 MULTIPOLYGON (((-47.04599 -...
## 4        NaN         0 MULTIPOLYGON (((-46.71471 -...
## 5        NaN         0 MULTIPOLYGON (((-46.85676 -...
## 6        NaN         0 MULTIPOLYGON (((-46.91599 -...
## 7        NaN         0 MULTIPOLYGON (((-46.82121 -...
## 8        NaN         0 MULTIPOLYGON (((-47.86297 -...
## 9        NaN         0 MULTIPOLYGON (((-47.67277 -...
## 10       NaN         0 MULTIPOLYGON (((-47.23694 -...

Observation: From the location quotient simple feature dataframe, it can be seen that newly computed location quotient for comp A to U are added into the dataframe and now it contains 46 fields

3.5. Converting to Spatial Polygon

This code chunk will convert the muni_state to Spatial Polygon

muni_state_sp <- as(muni_state,Class="Spatial")

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded datum Unknown based on GRS80 ellipsoid in CRS definition,
##  but +towgs84= values preserved

muni_state_sp

## class       : SpatialPolygonsDataFrame 
## features    : 10 
## extent      : -46.76118, -46.04105, -23.31611, -22.70131  (xmin, xmax, ymin, ymax)
## crs         : +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs 
## variables   : 5
## names       : code_muni, name_muni, code_state, abbrev_state,                              name_metro 
## min values  :   3504107,   Atibaia,         35,           SP, Regional Unit of Bragança Paulista city 
## max values  :   3556354,    Vargem,         35,           SP, Regional Unit of Bragança Paulista city

Observation: The muni_state simple feature dataframe is being converted to spatial polygon data frame called muni_state_sp.

This code chunk will conver the location_quotient into spatial polygon

location_quotient_sp<- as(location_quotient,Class="Spatial")

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded datum Unknown based on GRS80 ellipsoid in CRS definition,
##  but +towgs84= values preserved

location_quotient_sp

## class       : SpatialPolygonsDataFrame 
## features    : 174 
## extent      : -48.40857, -44.16137, -24.492, -22.01305  (xmin, xmax, ymin, ymax)
## crs         : +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs 
## variables   : 46
## names       :          name_muni,                    name_metro, COMP_TOT, COMP_A, COMP_B, COMP_C, COMP_D, COMP_E, COMP_F, COMP_G, COMP_H, COMP_I, COMP_J, COMP_K, COMP_L, ... 
## min values  : Águas De São Pedro, Aglomeração Urbana de Jundiaí,        0,      0,      0,      0,      0,      0,      0,      0,      0,      0,      0,      0,      0, ... 
## max values  :         Votorantim,                  RM São Paulo,   530446,   1125,     79,  31566,    332,    657,  25222, 150633,  19515,  29290,  38720,  23738,  14003, ...

Observation: The location_quotient simple feature dataframe is being converted to spatial polygon data frame called location_quotient_sp.

3.6 Creating Choropleth Maps by Industry Type

In this step, I will be creating choropleth maps to show the distribution of spatial specialisation by industry type, 2-16 at municipality level.

comp_a_map <- tm_shape(location_quotient_sp) + 
  tm_fill("COMP_A_LQ") +
  tm_borders(alpha = 0.5)+
  tm_layout(main.title = "Agriculture_A_LQ",
            main.title.position = "center",
            main.title.size = 0.8,
            legend.show =  TRUE,
            legend.text.size = 0.6,
            frame = TRUE)

comp_b_map <- tm_shape(location_quotient_sp) + 
  tm_fill("COMP_B_LQ") +
  tm_borders(alpha = 0.5)+
  tm_layout(main.title = "Extractive_B_LQ",
            main.title.position = "center",
            main.title.size = 0.8,
            legend.show =  TRUE,
            legend.text.size = 0.6,
            frame = TRUE)

comp_c_map <- tm_shape(location_quotient_sp) + 
  tm_fill("COMP_C_LQ") +
  tm_borders(alpha = 0.5)+
  tm_layout(main.title = "Transformation_C_LQ",
            main.title.position = "center",
            main.title.size = 0.8,
            legend.show =  TRUE,
            legend.text.size = 0.6,
            frame = TRUE)

comp_d_map <- tm_shape(location_quotient_sp) + 
  tm_fill("COMP_D_LQ") +
  tm_borders(alpha = 0.5)+
  tm_layout(main.title = "Electricity_D_LQ",
            main.title.position = "center",
            main.title.size = 0.8,
            legend.show =  TRUE,
            legend.text.size = 0.6,
            frame = TRUE)

comp_e_map <- tm_shape(location_quotient_sp) + 
  tm_fill("COMP_E_LQ") +
  tm_borders(alpha = 0.5)+
  tm_layout(main.title = "WasteManagment_E_LQ",
            main.title.position = "center",
            main.title.size = 0.8,
            legend.show =  TRUE,
            legend.text.size = 0.6,
            frame = TRUE)

comp_f_map <- tm_shape(location_quotient_sp) + 
  tm_fill("COMP_F_LQ") +
  tm_borders(alpha = 0.5)+
  tm_layout(main.title = "Construction_F_LQ",
            main.title.position = "center",
            main.title.size = 0.8,
            legend.show =  TRUE,
            legend.text.size = 0.6,
            frame = TRUE)

comp_g_map <- tm_shape(location_quotient_sp) + 
  tm_fill("COMP_G_LQ") +
  tm_borders(alpha = 0.5)+
  tm_layout(main.title = "Trade_G_LQ",
            main.title.position = "center",
            main.title.size = 0.8,
            legend.show =  TRUE,
            legend.text.size = 0.6,
            frame = TRUE)

comp_h_map <- tm_shape(location_quotient_sp) + 
  tm_fill("COMP_H_LQ") +
  tm_borders(alpha = 0.5)+
  tm_layout(main.title = "Transport_H_LQ",
            main.title.position = "center",
            main.title.size = 0.8,
            legend.show =  TRUE,
            legend.text.size = 0.6,
            frame = TRUE)

comp_i_map <- tm_shape(location_quotient_sp) + 
  tm_fill("COMP_I_LQ") +
  tm_borders(alpha = 0.5)+
  tm_layout(main.title = "Accommodation_I_LQ",
            main.title.position = "center",
            main.title.size = 0.8,
            legend.show =  TRUE,
            legend.text.size = 0.6,
            frame = TRUE)

comp_j_map <- tm_shape(location_quotient_sp) + 
  tm_fill("COMP_J_LQ") +
  tm_borders(alpha = 0.5)+
  tm_layout(main.title = "ICT_J_LQ",
            main.title.position = "center",
            main.title.size = 0.8,
            legend.show =  TRUE,
            legend.text.size = 0.6,
            frame = TRUE)

comp_k_map <- tm_shape(location_quotient_sp) + 
  tm_fill("COMP_K_LQ") +
  tm_borders(alpha = 0.5)+
  tm_layout(main.title = "Financial_K_LQ",
            main.title.position = "center",
            main.title.size = 0.8,
            legend.show =  TRUE,
            legend.text.size = 0.6,
            frame = TRUE)

comp_l_map <- tm_shape(location_quotient_sp) + 
  tm_fill("COMP_L_LQ") +
  tm_borders(alpha = 0.5)+
  tm_layout(main.title = "RealEstate_L_LQ",
            main.title.position = "center",
            main.title.size = 0.8,
            legend.show =  TRUE,
            legend.text.size = 0.6,
            frame = TRUE)

comp_m_map <- tm_shape(location_quotient_sp) + 
  tm_fill("COMP_M_LQ") +
  tm_borders(alpha = 0.5)+
  tm_layout(main.title = "Scientific_M_LQ",
            main.title.position = "center",
            main.title.size = 0.8,
            legend.show =  TRUE,
            legend.text.size = 0.6,
            frame = TRUE)

comp_n_map <- tm_shape(location_quotient_sp) + 
  tm_fill("COMP_N_LQ") +
  tm_borders(alpha = 0.5)+
  tm_layout(main.title = "Administrative_N_LQ",
            main.title.position = "center",
            main.title.size = 0.8,
            legend.show =  TRUE,
            legend.text.size = 0.6,
            frame = TRUE)

comp_o_map <- tm_shape(location_quotient_sp) + 
  tm_fill("COMP_O_LQ") +
  tm_borders(alpha = 0.5)+
  tm_layout(main.title = "PublicAdm_O_LQ",
            main.title.position = "center",
            main.title.size = 0.8,
            legend.show =  TRUE,
            legend.text.size = 0.6,
            frame = TRUE)

comp_p_map <- tm_shape(location_quotient_sp) + 
  tm_fill("COMP_P_LQ") +
  tm_borders(alpha = 0.5)+
  tm_layout(main.title = "Education_P_LQ",
            main.title.position = "center",
            main.title.size = 0.8,
            legend.show =  TRUE,
            legend.text.size = 0.6,
            frame = TRUE)

comp_q_map <- tm_shape(location_quotient_sp) + 
  tm_fill("COMP_Q_LQ") +
  tm_borders(alpha = 0.5)+
  tm_layout(main.title = "Health_Q_LQ",
            main.title.position = "center",
            main.title.size = 0.8,
            legend.show =  TRUE,
            legend.text.size = 0.6,
            frame = TRUE)

comp_r_map <- tm_shape(location_quotient_sp) + 
  tm_fill("COMP_R_LQ") +
  tm_borders(alpha = 0.5)+
  tm_layout(main.title = "Arts&Culture_R_LQ",
            main.title.position = "center",
            main.title.size = 0.8,
            legend.show =  TRUE,
            legend.text.size = 0.6,
            frame = TRUE)

comp_s_map <- tm_shape(location_quotient_sp) + 
  tm_fill("COMP_S_LQ") +
  tm_borders(alpha = 0.5)+
  tm_layout(main.title = "Service_S_LQ",
            main.title.position = "center",
            main.title.size = 0.8,
            legend.show =  TRUE,
            legend.text.size = 0.6,
            frame = TRUE)

comp_t_map <- tm_shape(location_quotient_sp) + 
  tm_fill("COMP_T_LQ") +
  tm_borders(alpha = 0.5)+
  tm_layout(main.title = "Domestic_T_LQ",
            main.title.position = "center",
            main.title.size = 0.8,
            legend.show =  TRUE,
            legend.text.size = 0.6,
            frame = TRUE)

comp_u_map <- tm_shape(location_quotient_sp) + 
  tm_fill("COMP_U_LQ") +
  tm_borders(alpha = 0.5)+
  tm_layout(main.title = "International_U_LQ",
            main.title.position = "center",
            main.title.size = 0.8,
            legend.show =  TRUE,
            legend.text.size = 0.6,
            frame = TRUE)

This code chunk will show the distribution of spatial specialization by industry type at 2016 municipal level.

tmap_arrange(comp_a_map, comp_b_map, comp_c_map, asp = 0, ncol=3)

## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output

## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output

## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output

tmap_arrange(comp_d_map, comp_e_map, comp_f_map, asp = 0, ncol=3)

## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output

## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output

## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output

tmap_arrange(comp_g_map, comp_h_map, comp_i_map, asp = 0, ncol=3)

## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output

## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output

## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output

tmap_arrange(comp_j_map, comp_k_map, comp_l_map, asp = 0, ncol=3)

## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output

## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output

## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output

tmap_arrange(comp_m_map, comp_n_map, comp_o_map, asp = 0, ncol=3)

## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output

## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output

## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output

tmap_arrange(comp_p_map, comp_q_map, comp_r_map, asp = 0, ncol=3)

## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output

## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output

## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output

tmap_arrange(comp_s_map, comp_t_map, comp_u_map, asp = 0, ncol=3)

## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output

## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output

## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output

Geocommunication: Based on the choropleth maps that created to show the distribution of spatial specialization by industry type at municipal, we can see that Construction_F, Trade_G, RealEstate_L, Education_P, Health_Q, Arts_R, Service_S are widely spread out across Greater Sao Paulo State. Moreover, it can be seen that Agriculture_A industry is highly specialized in a few municipalities of the Sao Paulo State and those municipalities that highly specialized in Agriculture_A industry is clustered around together. However, there is no Domestic_T industry in the Sao Paulo State. Therefore, this industry T will be removed from the study area for the subsequent analysis.

4. Hierarchy Cluster Analysis

4.1. Correlation Analysis

Before we perform cluster analysis, it is important for us to ensure that the cluster variables are not highly correlated.Therefore, correlation analysis needs to be carried out. This code chunk will use coorplot.mixed() function of corrplot package to visualise and analyse the correlation of the input variables.

lq_cluster <- as.data.frame(location_quotient)%>% 
          mutate_all(~replace(., is.na(.), 0)) %>%
          select(name_muni, contains("LQ"), -"COMP_T_LQ", -"COMP_TOT_LQ")

  
cluster_vars.cor = cor(lq_cluster[2:21])
corrplot.mixed(cluster_vars.cor,
         lower = "ellipse", 
               upper = "ellipse",
               tl.pos = "lt",
               diag = "l",
               tl.col = "black")

Observation: Based on the correlation plot, we can see that Financial_K & Professional_M are highly correlated. This suggest only one of industry should be used in the cluster analysis instead of both.

4.2. Extrating clustering variables

The code chunk below will be used to extract the clustering variables lq_cluster simple feature object into data.frame.

lq_cluster_wo_m <- as.data.frame(lq_cluster)%>% 
          mutate_all(~replace(., is.na(.), 0)) %>%
          select(-"COMP_M_LQ")

head(lq_cluster_wo_m,10)

##               name_muni  COMP_A_LQ COMP_B_LQ COMP_C_LQ COMP_D_LQ COMP_E_LQ
## 1              Cabreúva  1.6778605  3.651562 2.4607702 0.0000000 2.8991794
## 2  Campo Limpo Paulista  1.0347030  0.000000 1.8299668 0.0000000 0.8566857
## 3               Itupeva  1.7041523  3.280845 2.2437006 1.5842341 2.6048465
## 4                Jarinu  2.8014067  4.835360 1.3215134 0.0000000 1.2796877
## 5               Jundiaí  0.9356347  1.300931 0.9121135 0.3426468 1.3145776
## 6              Louveira  2.1415407  0.000000 1.7115357 0.0000000 2.5031940
## 7       Várzea Paulista  0.3138162  0.000000 2.8522508 0.0000000 3.3777275
## 8    Águas De São Pedro  0.7670632  9.598907 0.2874959 0.0000000 2.5403701
## 9             Analândia 17.2163080 35.907023 0.7169652 0.0000000 0.0000000
## 10               Araras  5.6999939  2.079557 1.4979451 0.6024983 1.6510771
##    COMP_F_LQ COMP_G_LQ COMP_H_LQ COMP_I_LQ COMP_J_LQ COMP_K_LQ COMP_L_LQ
## 1  1.0260217 1.0891761 2.3423920 1.0040926 0.2172296 0.7488611 0.7166303
## 2  1.2801005 1.1818089 0.9906132 1.0449064 0.6098022 0.2489432 0.6726458
## 3  1.0811904 1.0160042 1.3000805 1.1680065 0.3578223 0.4672463 1.6160023
## 4  1.0315655 1.1461284 1.4797439 1.3681483 0.6472196 0.2065904 0.4465667
## 5  0.9132292 1.0217855 1.1595545 1.0859010 0.7598464 0.6265631 1.2506184
## 6  1.1483657 1.2338187 1.9667916 1.2815757 0.3282286 0.2155259 0.8735281
## 7  1.3690679 1.2213244 1.5831099 0.8216207 0.3114907 0.1510045 0.3626800
## 8  0.5993592 1.5117334 0.3389436 3.0602564 0.4758618 0.3280901 0.2216253
## 9  0.7473491 0.7425761 3.6628230 1.9079376 0.3560152 0.0000000 1.1053905
## 10 1.0084885 1.2015309 1.3168539 0.9944849 0.3546406 0.4335829 0.6625941
##    COMP_N_LQ  COMP_O_LQ COMP_P_LQ COMP_Q_LQ COMP_R_LQ COMP_S_LQ COMP_U_LQ
## 1  0.4593034  2.9852937 0.5309395 0.8911083 0.3054418 0.8125985         0
## 2  0.4794005  2.6463956 2.1964410 0.6996677 0.6769181 0.7203502         0
## 3  0.6656025  3.5762909 1.0777504 0.3202562 1.0062523 0.9004581         0
## 4  0.7455412  3.9530946 1.4451883 0.5057127 1.1122727 0.4842155         0
## 5  0.9136671  0.8701872 1.1196525 1.3226651 1.1376518 0.9408820         0
## 6  0.4157581  5.1550925 0.9932454 0.4176721 0.8570994 0.7542301         0
## 7  0.5098886  1.6052567 1.0468246 0.3696440 0.4927278 0.6772751         0
## 8  0.5842145 15.6949599 0.4652292 0.4015659 0.8029187 0.4272176         0
## 9  0.1942573 19.5702586 0.0000000 0.3338120 0.0000000 0.3995276         0
## 10 0.5298950  1.3600952 0.7525629 1.0033692 0.7653734 0.7219263         0

Observation: By looking at the lq_cluster_wo_m data frame, we can notice that the final clustering variables list does not include variable COMP_M_LQ because it is highly correlated with variable COMP_K_LQ

Next, we need to change the rows by municipality name instead of row number so that we can remove the name_muni column from the data.

This code chunk below will replace the rows with municipal name instead of row number.

row.names(lq_cluster_wo_m) <- lq_cluster_wo_m$"name_muni"
head(lq_cluster_wo_m,10)

##                                 name_muni  COMP_A_LQ COMP_B_LQ COMP_C_LQ
## Cabreúva                         Cabreúva  1.6778605  3.651562 2.4607702
## Campo Limpo Paulista Campo Limpo Paulista  1.0347030  0.000000 1.8299668
## Itupeva                           Itupeva  1.7041523  3.280845 2.2437006
## Jarinu                             Jarinu  2.8014067  4.835360 1.3215134
## Jundiaí                           Jundiaí  0.9356347  1.300931 0.9121135
## Louveira                         Louveira  2.1415407  0.000000 1.7115357
## Várzea Paulista           Várzea Paulista  0.3138162  0.000000 2.8522508
## Águas De São Pedro     Águas De São Pedro  0.7670632  9.598907 0.2874959
## Analândia                       Analândia 17.2163080 35.907023 0.7169652
## Araras                             Araras  5.6999939  2.079557 1.4979451
##                      COMP_D_LQ COMP_E_LQ COMP_F_LQ COMP_G_LQ COMP_H_LQ
## Cabreúva             0.0000000 2.8991794 1.0260217 1.0891761 2.3423920
## Campo Limpo Paulista 0.0000000 0.8566857 1.2801005 1.1818089 0.9906132
## Itupeva              1.5842341 2.6048465 1.0811904 1.0160042 1.3000805
## Jarinu               0.0000000 1.2796877 1.0315655 1.1461284 1.4797439
## Jundiaí              0.3426468 1.3145776 0.9132292 1.0217855 1.1595545
## Louveira             0.0000000 2.5031940 1.1483657 1.2338187 1.9667916
## Várzea Paulista      0.0000000 3.3777275 1.3690679 1.2213244 1.5831099
## Águas De São Pedro   0.0000000 2.5403701 0.5993592 1.5117334 0.3389436
## Analândia            0.0000000 0.0000000 0.7473491 0.7425761 3.6628230
## Araras               0.6024983 1.6510771 1.0084885 1.2015309 1.3168539
##                      COMP_I_LQ COMP_J_LQ COMP_K_LQ COMP_L_LQ COMP_N_LQ
## Cabreúva             1.0040926 0.2172296 0.7488611 0.7166303 0.4593034
## Campo Limpo Paulista 1.0449064 0.6098022 0.2489432 0.6726458 0.4794005
## Itupeva              1.1680065 0.3578223 0.4672463 1.6160023 0.6656025
## Jarinu               1.3681483 0.6472196 0.2065904 0.4465667 0.7455412
## Jundiaí              1.0859010 0.7598464 0.6265631 1.2506184 0.9136671
## Louveira             1.2815757 0.3282286 0.2155259 0.8735281 0.4157581
## Várzea Paulista      0.8216207 0.3114907 0.1510045 0.3626800 0.5098886
## Águas De São Pedro   3.0602564 0.4758618 0.3280901 0.2216253 0.5842145
## Analândia            1.9079376 0.3560152 0.0000000 1.1053905 0.1942573
## Araras               0.9944849 0.3546406 0.4335829 0.6625941 0.5298950
##                       COMP_O_LQ COMP_P_LQ COMP_Q_LQ COMP_R_LQ COMP_S_LQ
## Cabreúva              2.9852937 0.5309395 0.8911083 0.3054418 0.8125985
## Campo Limpo Paulista  2.6463956 2.1964410 0.6996677 0.6769181 0.7203502
## Itupeva               3.5762909 1.0777504 0.3202562 1.0062523 0.9004581
## Jarinu                3.9530946 1.4451883 0.5057127 1.1122727 0.4842155
## Jundiaí               0.8701872 1.1196525 1.3226651 1.1376518 0.9408820
## Louveira              5.1550925 0.9932454 0.4176721 0.8570994 0.7542301
## Várzea Paulista       1.6052567 1.0468246 0.3696440 0.4927278 0.6772751
## Águas De São Pedro   15.6949599 0.4652292 0.4015659 0.8029187 0.4272176
## Analândia            19.5702586 0.0000000 0.3338120 0.0000000 0.3995276
## Araras                1.3600952 0.7525629 1.0033692 0.7653734 0.7219263
##                      COMP_U_LQ
## Cabreúva                     0
## Campo Limpo Paulista         0
## Itupeva                      0
## Jarinu                       0
## Jundiaí                      0
## Louveira                     0
## Várzea Paulista              0
## Águas De São Pedro           0
## Analândia                    0
## Araras                       0

Observation: Based on the content of the lq_cluster_wo_m, it can be seen that the row has been replaced with the name_numi.

Now, this code chunk will delete the name_muni field by using the code chunk below.

new_lq<- select(lq_cluster_wo_m, c(2:20))
head(new_lq, 10)

##                       COMP_A_LQ COMP_B_LQ COMP_C_LQ COMP_D_LQ COMP_E_LQ
## Cabreúva              1.6778605  3.651562 2.4607702 0.0000000 2.8991794
## Campo Limpo Paulista  1.0347030  0.000000 1.8299668 0.0000000 0.8566857
## Itupeva               1.7041523  3.280845 2.2437006 1.5842341 2.6048465
## Jarinu                2.8014067  4.835360 1.3215134 0.0000000 1.2796877
## Jundiaí               0.9356347  1.300931 0.9121135 0.3426468 1.3145776
## Louveira              2.1415407  0.000000 1.7115357 0.0000000 2.5031940
## Várzea Paulista       0.3138162  0.000000 2.8522508 0.0000000 3.3777275
## Águas De São Pedro    0.7670632  9.598907 0.2874959 0.0000000 2.5403701
## Analândia            17.2163080 35.907023 0.7169652 0.0000000 0.0000000
## Araras                5.6999939  2.079557 1.4979451 0.6024983 1.6510771
##                      COMP_F_LQ COMP_G_LQ COMP_H_LQ COMP_I_LQ COMP_J_LQ
## Cabreúva             1.0260217 1.0891761 2.3423920 1.0040926 0.2172296
## Campo Limpo Paulista 1.2801005 1.1818089 0.9906132 1.0449064 0.6098022
## Itupeva              1.0811904 1.0160042 1.3000805 1.1680065 0.3578223
## Jarinu               1.0315655 1.1461284 1.4797439 1.3681483 0.6472196
## Jundiaí              0.9132292 1.0217855 1.1595545 1.0859010 0.7598464
## Louveira             1.1483657 1.2338187 1.9667916 1.2815757 0.3282286
## Várzea Paulista      1.3690679 1.2213244 1.5831099 0.8216207 0.3114907
## Águas De São Pedro   0.5993592 1.5117334 0.3389436 3.0602564 0.4758618
## Analândia            0.7473491 0.7425761 3.6628230 1.9079376 0.3560152
## Araras               1.0084885 1.2015309 1.3168539 0.9944849 0.3546406
##                      COMP_K_LQ COMP_L_LQ COMP_N_LQ  COMP_O_LQ COMP_P_LQ
## Cabreúva             0.7488611 0.7166303 0.4593034  2.9852937 0.5309395
## Campo Limpo Paulista 0.2489432 0.6726458 0.4794005  2.6463956 2.1964410
## Itupeva              0.4672463 1.6160023 0.6656025  3.5762909 1.0777504
## Jarinu               0.2065904 0.4465667 0.7455412  3.9530946 1.4451883
## Jundiaí              0.6265631 1.2506184 0.9136671  0.8701872 1.1196525
## Louveira             0.2155259 0.8735281 0.4157581  5.1550925 0.9932454
## Várzea Paulista      0.1510045 0.3626800 0.5098886  1.6052567 1.0468246
## Águas De São Pedro   0.3280901 0.2216253 0.5842145 15.6949599 0.4652292
## Analândia            0.0000000 1.1053905 0.1942573 19.5702586 0.0000000
## Araras               0.4335829 0.6625941 0.5298950  1.3600952 0.7525629
##                      COMP_Q_LQ COMP_R_LQ COMP_S_LQ COMP_U_LQ
## Cabreúva             0.8911083 0.3054418 0.8125985         0
## Campo Limpo Paulista 0.6996677 0.6769181 0.7203502         0
## Itupeva              0.3202562 1.0062523 0.9004581         0
## Jarinu               0.5057127 1.1122727 0.4842155         0
## Jundiaí              1.3226651 1.1376518 0.9408820         0
## Louveira             0.4176721 0.8570994 0.7542301         0
## Várzea Paulista      0.3696440 0.4927278 0.6772751         0
## Águas De São Pedro   0.4015659 0.8029187 0.4272176         0
## Analândia            0.3338120 0.0000000 0.3995276         0
## Araras               1.0033692 0.7653734 0.7219263         0

Observation: Based on the content of the lq_cluster_wo_m, it can be seen that the column name name_muni has been removed from the data frame.

4.3.Data Standarization

In general, multiple variables will be used in cluster analysis. It is usual that their values range are different. In order to avoid the cluster analysis result is baised to clustering variables with large values, it is necessary to standardise the input variables before performing cluster analysis.

4.3.1. Min-Max Standardization

This code chunk uses normalize() of heatmaply package to standardize the clustering variables by using Min-Max method. Moreover, summary() is also used to display the summary statstic of the standarized clustering variables.

lq_std <- normalize(new_lq)

summary(lq_std)

##    COMP_A_LQ          COMP_B_LQ         COMP_C_LQ        COMP_D_LQ     
##  Min.   :0.000000   Min.   :0.00000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.006057   1st Qu.:0.00000   1st Qu.:0.1922   1st Qu.:0.0000  
##  Median :0.027794   Median :0.01434   Median :0.2964   Median :0.0000  
##  Mean   :0.112124   Mean   :0.06177   Mean   :0.3335   Mean   :0.0141  
##  3rd Qu.:0.115554   3rd Qu.:0.05508   3rd Qu.:0.4438   3rd Qu.:0.0000  
##  Max.   :1.000000   Max.   :1.00000   Max.   :1.0000   Max.   :1.0000  
##    COMP_E_LQ         COMP_F_LQ        COMP_G_LQ        COMP_H_LQ     
##  Min.   :0.00000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.04312   1st Qu.:0.3288   1st Qu.:0.4829   1st Qu.:0.1765  
##  Median :0.08449   Median :0.4534   Median :0.5365   Median :0.2500  
##  Mean   :0.10879   Mean   :0.4503   Mean   :0.5289   Mean   :0.2932  
##  3rd Qu.:0.14406   3rd Qu.:0.5572   3rd Qu.:0.5960   3rd Qu.:0.3586  
##  Max.   :1.00000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##    COMP_I_LQ        COMP_J_LQ         COMP_K_LQ        COMP_L_LQ     
##  Min.   :0.0000   Min.   :0.00000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.2037   1st Qu.:0.02648   1st Qu.:0.0952   1st Qu.:0.1478  
##  Median :0.2356   Median :0.04087   Median :0.1522   Median :0.3109  
##  Mean   :0.2678   Mean   :0.05518   Mean   :0.1703   Mean   :0.3147  
##  3rd Qu.:0.2907   3rd Qu.:0.05853   3rd Qu.:0.2238   3rd Qu.:0.4287  
##  Max.   :1.0000   Max.   :1.00000   Max.   :1.0000   Max.   :1.0000  
##    COMP_N_LQ        COMP_O_LQ          COMP_P_LQ        COMP_Q_LQ     
##  Min.   :0.0000   Min.   :0.000000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.1198   1st Qu.:0.009808   1st Qu.:0.3069   1st Qu.:0.2019  
##  Median :0.1961   Median :0.023043   Median :0.4567   Median :0.3136  
##  Mean   :0.2177   Mean   :0.053088   Mean   :0.4491   Mean   :0.3329  
##  3rd Qu.:0.2760   3rd Qu.:0.055327   3rd Qu.:0.6018   3rd Qu.:0.4269  
##  Max.   :1.0000   Max.   :1.000000   Max.   :1.0000   Max.   :1.0000  
##    COMP_R_LQ        COMP_S_LQ        COMP_U_LQ      
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.00000  
##  1st Qu.:0.3695   1st Qu.:0.2856   1st Qu.:0.00000  
##  Median :0.4905   Median :0.3757   Median :0.00000  
##  Mean   :0.4645   Mean   :0.3790   Mean   :0.01048  
##  3rd Qu.:0.5985   3rd Qu.:0.4554   3rd Qu.:0.00000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.00000

Observation: After standardization, we can notice that the values range of the Min-max standardized clustering variables are 0-1 now.

4.4. Computing proximity matrix

In R, many packages provide function to calculate distance matrix. In this case, dist() of R will be used to compute the proximity matrix.

Dist() supports six distance proximity calculations which are euclidean, maximum, manhattan, canberra, binary and minkowski.

This code chunk will use the euclidean distance to compute the proximity matrix.

proxmat <- dist(new_lq, method = 'euclidean')

4.5. Computing hierarchical clustering

There are serveral packages provided in R for hierarhical clustering function. In this take-home-assignment03, The hclust() of R stats will be used to compute the hierarchical clustering with the use of ‘ward.D’. The hierarchical clustering output will be stored in an object of class hclust which describes the tree produced by the clustering process.

The code chunk will perform hierarchical cluster analysis using ward.D method.

hclust_ward <- hclust(proxmat, method = 'ward.D')
plot(hclust_ward, cex = 0.6)

4.6. Selecting the optimal clustering algorithm

It is taught by our geospatial professor that, one of the challenges in performing hierarchical clustering is to identify stronger clustering structures. This issues can also be solved by using agnes() function of cluster package which provide agglomerative coefficient to measure the amount of clustering strcuture.

It is said that value closer to 1 suggest strong clustering structure.

The code chunk will used to compute the agglomerative coefficients of all hierarchical clustering algorithms.

m <- c( "average", "single", "complete", "ward")
names(m) <- c( "average", "single", "complete", "ward")

ac <- function(x) {
  agnes(new_lq, method = x)$ac
}

map_dbl(m, ac)

##   average    single  complete      ward 
## 0.9562927 0.9510264 0.9580462 0.9728826

Observation: Based on the output, we can see that ward’s method provides the strongest clustering structure among the other 4 methods since its value 0.9728826 is the closer to 1 compare to the other methods. Therefore, ward’s method will be continued to used for the subsequent analysis.

4.7. Determining Optimal Clusters

Another technical challenge face by data analyst in performing clustering analysis is to determine the optimal clusters to retain.

There are three commonly used methods to determine the optimal clusters, they are:

Elbow Method
Average Silhouette Method
Gap Statistic Method

In this take home assignment, Gap Statistic Method will be used to determine the optimal clusters to retain.

4.7.1. Gap Statistic Method

The gap statistic compares the total within intra-cluster variation for different values of k with their expected values under null reference distribution of the data. The estimate of the optimal clusters will be value that maximize the gap statistic (i.e., that yields the largest gap statistic). This means that the clustering structure is far away from the random uniform distribution of points.

This code chunk will use clusGap() of cluster package to compute the gap statistic.

set.seed(12345)
gap_stat <- clusGap(new_lq, FUN = hcut, nstart = 25, K.max = 10, B = 50)
# Print the result
print(gap_stat, method = "firstmax")

## Clustering Gap statistic ["clusGap"] from call:
## clusGap(x = new_lq, FUNcluster = hcut, K.max = 10, B = 50, nstart = 25)
## B=50 simulated reference sets, k = 1..10; spaceH0="scaledPCA"
##  --> Number of clusters (method 'firstmax'): 3
##           logW   E.logW      gap     SE.sim
##  [1,] 6.832387 8.110970 1.278583 0.01630632
##  [2,] 6.563251 7.930109 1.366858 0.02004553
##  [3,] 6.423993 7.830754 1.406761 0.01645589
##  [4,] 6.340767 7.747398 1.406632 0.01674314
##  [5,] 6.291573 7.674303 1.382730 0.01620172
##  [6,] 6.129111 7.611438 1.482327 0.01649416
##  [7,] 6.075579 7.556450 1.480871 0.01636994
##  [8,] 5.998010 7.509528 1.511519 0.01622536
##  [9,] 5.903274 7.467097 1.563822 0.01684157
## [10,] 5.850465 7.428672 1.578207 0.01764693

This code chunk will use fviz_gap_stat() of factoextra package to visualize the plot.

fviz_gap_stat(gap_stat)

Observation: Based on the gap statistic graph, 3 is the recommended number of cluster to choose. It can say that 3 clusters is a logical number of cluster to pick before it increase to next biggest number of cluster which is 6. However, since the data is not very large to use 6 clusters, I will be selecting 3 clusters to perform all analysis.

4.8 Interpreting the dendrograms

In the dendrogram, each leaf corresponds to one observation. As we move up the tree, observations that are similar to each other are combined into branches, which are themselves fused at a higher height.

The height of the fusion, provided on the vertical axis, indicates the (dis)similarity between two observations. The higher the height of the fusion, the less similar the observations are. Note that, conclusions about the proximity of two observations can be drawn only based on the height where branches containing those two observations first are fused. We cannot use the proximity of two observations along the horizontal axis as a criteria of their similarity.

It’s also possible to draw the dendrogram with a border around the selected clusters by using rect.hclust() of R stats. The argument border is used to specify the border colors for the rectangles.

plot(hclust_ward, cex = 0.6)
rect.hclust(hclust_ward, k = 3, border = 2:5)

Observation: Based on the cluster dendrogram, we can see that municipalities with the relatively same heights are formed into 3 clusters of municipalities together. However, we can say that the cluster size are not evenly spread out since the red cluster contains more municipalities compare to the other two clusters as they are relatively small.

4.9. Visually-driven hierarchical clustering analysis

Next, we will perform visually-driven hierarchical clustering analysis by using the heatmaply package since it enable us to build both highly interactive cluster heatmap or static cluster heatmap.

4.9.1. Transforming the data frame into a matrix

new_lq_mat <- data.matrix(new_lq)

4.9.2. Plotting interactive cluster heatmap using heatmaply()

In the code chunk below, the heatmaply() is used to build and plot an interactive cluster heatmap.

heatmaply(normalize(new_lq_mat),
          Colv=NA,
          dist_method = "euclidean",
          hclust_method = "ward.D",
          seriate = "OLO",
          colors = Blues,
          k_row = 3,
          margins = c(NA,200,60,NA),
          fontsize_row = 4,
          fontsize_col = 5,
          main="Geographic Segmentation of Industrial Specialization by location quotients",
          xlab = "Locatiton Quotients",
          ylab = "City"
          )

Observation: based on the output, we can see that Comp_F_LQ, CompG_LQ, COMP_P_LQ, COMP_Q_LQ and COMP_R_LQ has higher location quotients value. As for the cluste, all 3 clusters contain a few high concentrated industries. For example, the red cluster has high concentration of Industry A (Agriculture, LiveStock, Forestry, Fishing and Aquaculture), Green cluster has high concentration of Industry F (Construction) and Industry P (Education), Blue cluster has high concentration of Industry Q(Health Services). However, the green cluster has more high concentrated industries compared to the other two.

4.10. Mapping the clusters formed

We will use cutree to create a 3 cluster model and save it as a list of factors.

groups <- as.factor(cutree(hclust_ward, k=3))

This code chunk will convert the group list objet into a matrix and merge with the spatial dataframe called location_quotient. So that we can plot the map using tmap. After which, rename() of dplyr will be used to rename the matrix groups field as CLUSTER

location_quotient_sf_cluster <- cbind(location_quotient, as.matrix(groups)) %>%
  rename(`CLUSTER`=`as.matrix.groups.`)

This code chunk will quickly plot the municipalities with respect to their cluster group on the map.

qtm(location_quotient_sf_cluster, "CLUSTER")

Observation: Based on the map output, we can see that the clusters are dispersed all over the Sau Paulo State. Therefore, it is hard for us to conclude any information from here. However, we can see that among all other cluster, cluster 1 seems to be abit more cluster and located closer to each other compare to other clusters.

5. Spatially Constrained Clustering - SKATER approach

In this section, i will be making use of skater apporach to derieve spatially constrained cluster.

5.1. Computing Neighbour List

We noticed that Ilhabela municipality is disconnected by water source from the rest of Sau Paulo State. Instead of removing it, we will be including a snap of 0.02 so that every polygon will have at least 1 neighbour.

This code chunk will use poly2nd() of spdep package to compute the neighbours list from polygon list.

location_quotient.nb <- poly2nb(location_quotient_sp)
summary(location_quotient.nb)

## Neighbour list object:
## Number of regions: 174 
## Number of nonzero links: 882 
## Percentage nonzero weights: 2.913199 
## Average number of links: 5.068966 
## 1 region with no links:
## 100
## Link number distribution:
## 
##  0  1  2  3  4  5  6  7  8  9 10 22 
##  1  4 13 23 36 31 29 16  9  6  5  1 
## 4 least connected regions:
## 8 9 29 36 with 1 link
## 1 most connected region:
## 161 with 22 links

This code chunk will plot the municipalities neighbour list using location_quotient_sp.

plot(location_quotient_sp, border=grey(.5))
plot(location_quotient.nb, coordinates(location_quotient_sp), col="hotpink", add=TRUE)

This code chunk will check the name of the municipality with no neighbor.

location_quotient_sp$name_muni[100]

## [1] "Ilhabela"

Observation: Based on the map output of the municipalities neighbour list, we can see that there is a municipality without neighbour. We also found that the name of municipality is Ilhabela. As it has zero neighbor, we will remove it from the analysis.

5.2. Removing Municipality without neighbour

This code chunk will remove the municipality named Ilhabela from the dataset. Then compute the neighbour list again.

new_lq_filtered <- new_lq[row.names(new_lq) != "Ilhabela", , drop = FALSE]

location_quotient_sp_filtered <- location_quotient_sp[row.names(new_lq) != "Ilhabela", , drop = FALSE]

location_quotient_sf_cluster <- location_quotient_sf_cluster[location_quotient_sf_cluster$name_muni != "Ilhabela", , drop = FALSE]

location_quotient.nb <- poly2nb(location_quotient_sp_filtered)
summary(location_quotient.nb)

## Neighbour list object:
## Number of regions: 173 
## Number of nonzero links: 882 
## Percentage nonzero weights: 2.946975 
## Average number of links: 5.098266 
## Link number distribution:
## 
##  1  2  3  4  5  6  7  8  9 10 22 
##  4 13 23 36 31 29 16  9  6  5  1 
## 4 least connected regions:
## 8 9 29 36 with 1 link
## 1 most connected region:
## 161 with 22 links

Observation: Based on the observation, you can see that the municipality with no neighbour is removed and now everyone has aleast 1 neighbour each.

This code chunk will plot the municipalities neighbour list again.

plot(location_quotient_sp_filtered, border=grey(.5))
plot(location_quotient.nb, coordinates(location_quotient_sp_filtered), col="hotpink", add=TRUE)

Observation: Based on the map output of the neighbour list, we can now see that the Ilhabela municipality have been removed thus, there is no municipality without a neighbour and all municipalities are well-connected now.

5.3. Computing minimum spanning tree

Calculating edge costs

Next, nbcosts() of spdep package is used to compute the cost of each edge. It is the distance between it nodes. This function compute this distance using a data.frame with observations vector in each node.

The code chunk used nbcost() of spdep package to compute the cost of each edge.

lcosts <- nbcosts(location_quotient.nb, new_lq_filtered)

Creating Geographically distance weights

This code chunk uses nb2listw() of spdep package to incorporate these costs into a weight object using style B to make sure the cost values are not row-standardized.

location_quotient.w <- nb2listw(location_quotient.nb, lcosts, style="B")
summary(location_quotient.w)

## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 173 
## Number of nonzero links: 882 
## Percentage nonzero weights: 2.946975 
## Average number of links: 5.098266 
## Link number distribution:
## 
##  1  2  3  4  5  6  7  8  9 10 22 
##  4 13 23 36 31 29 16  9  6  5  1 
## 4 least connected regions:
## 8 9 29 36 with 1 link
## 1 most connected region:
## 161 with 22 links
## 
## Weights style: B 
## Weights constants summary:
##     n    nn       S0       S1      S2
## B 173 29929 13813.92 999945.6 8183825

5.3.1. Computing minimum spanning tree

This code chunk used mstree() of spdep package to compute the mean of minimum spanning tree. This is to ensure that all municipality are interconnected.

location_quotient.mst <- mstree(location_quotient.w)

This code chunk will check if the class of the computed MST is matrix or not. This is to ensure that the conversion was correct.

class(location_quotient.mst)

## [1] "mst"    "matrix"

Observation: Based on the output, we can see that the class of location_quotient.mst is matrix. Hence, the conversion was correct.

This code chunk will check the dimension of computed MST. In this case, the resulted dimension should be n-1 number of observations. This is because the minimum spanning tree consists on n-1 edges (links) in order to traverse all the nodes..

dim(location_quotient.mst)

## [1] 172   3

Observation: Based on the output result, we can see that the dimension of computed MST is correct. It is because we have initially 174 observations in the location_quotient. However, after we remove ‘Ilhabela’, it should be 173. And the minimum spanning tree consists of n - 1 edges. Therefore, the dimension of 172 means the spanning tree is correctly computed.

This code chunk plot the minimum spanning tree to see how the initial neighbour list is simplied to just one edge connecting to each of the nodes while passing passing through all other nodes.

plot(location_quotient_sp_filtered, border=gray(.5))
plot.mst(location_quotient.mst, coordinates(location_quotient_sp_filtered), 
     col="hotpink", cex.lab=0.7, cex.circles=0.005, add=TRUE)

5.4. Computing spatially constrained clusters using SKATER method

The skater() method takes three arguments:

the first two columns of the MST matrix (i.e. not the cost),
the data matrix (to update the costs as units are being grouped), and
the number of cuts. Note: It is set to one less than the number of clusters. So, the value specified is not the number of clusters, but the number of cuts in the graph, one less than the number of clusters.

This code chunk uses skater() of spdep package to compute the spatially constrained cluster.

clust3 <- skater(location_quotient.mst[,1:2], new_lq_filtered, method = "euclidean", 2)

class(clust3)

## [1] "skater"

Obsevation: The result of the skater() is an object of class skater.

This code chunk will examine the content of clust6.

str(clust3)

## List of 8
##  $ groups      : num [1:173] 1 1 1 1 1 1 1 1 1 1 ...
##  $ edges.groups:List of 3
##   ..$ :List of 3
##   .. ..$ node: num [1:165] 121 92 111 23 104 151 55 25 154 40 ...
##   .. ..$ edge: num [1:164, 1:3] 111 23 55 25 40 151 121 92 154 147 ...
##   .. ..$ ssw : num 2097
##   ..$ :List of 3
##   .. ..$ node: num [1:7] 97 117 119 88 102 106 90
##   .. ..$ edge: num [1:6, 1:3] 117 97 97 119 117 119 88 117 119 102 ...
##   .. ..$ ssw : num 153
##   ..$ :List of 3
##   .. ..$ node: num 89
##   .. ..$ edge: num[0 , 1:3] 
##   .. ..$ ssw : num 0
##  $ not.prune   : NULL
##  $ candidates  : int [1:3] 1 2 3
##  $ ssto        : num 2654
##  $ ssw         : num [1:3] 2654 2405 2250
##  $ crit        : num [1:2] 1 Inf
##  $ vec.crit    : num [1:173] 1 1 1 1 1 1 1 1 1 1 ...
##  - attr(*, "class")= chr "skater"

This code chunk will check the cluster assignment.

ccs3 <- clust3$groups
ccs3

##   [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [75] 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 2 1 1 1 1 1 1 2 1 1 1 1 2 1 1 1 2 1 1 1 1 1
## [112] 1 1 1 1 1 2 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [149] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

This code chunk will check the number of observation in each cluster by the means of table command.

table(ccs3)

## ccs3
##   1   2   3 
## 165   7   1

This code chunk will plot the pruned tree that shows the 5 clusters across municipality area.

plot(location_quotient_sp_filtered, border=gray(.5))
plot(clust3, coordinates(location_quotient_sp_filtered), cex.lab=.7,
     groups.colors=c("red","green","blue", "brown", "pink"), cex.circles=0.005, add=TRUE)

## Warning in segments(coords[id1, 1], coords[id1, 2], coords[id2, 1],
## coords[id2, : "add" is not a graphical parameter

## Warning in segments(coords[id1, 1], coords[id1, 2], coords[id2, 1],
## coords[id2, : "add" is not a graphical parameter

5.5. Visualising Cluster on Choropleth Map

The code chunk will visualize the derived cluster based on SKATER clustering onto a choropleth map.

groups_mat <- as.matrix(clust3$groups)
location_quotient_sf_sptialcluster <- cbind(location_quotient_sf_cluster, as.factor(groups_mat)) %>%
  rename(`SP_CLUSTER`=`as.factor.groups_mat.`)
qtm(location_quotient_sf_sptialcluster, "SP_CLUSTER")

Observation: Based on the output of the map resulted from skater method, we can see that clusters are not as fragmented as hierarchical cluster method. Instead, the clusters are grouping together together to provide more homogeneity. Thus, it is easier to differentiate between clusters. We can also noticed that the cluster 1 contains the most number of municipalities, whereas cluster 2 and 3 contains lesser municipal within the cluster. Additionally, cluster 3 only contains 1 municipal. This proves that the 3 clusters is still not optimal for this dataset.

5.6. Comparison between hierarchy method and skater method

This code chunk display the spatially constrained clustering hierarchy method and skater method side by side for comparison

hier_clust_map <- qtm(location_quotient_sf_cluster,
                  "CLUSTER") + 
  tm_borders(alpha = 0.5) + 
  tm_layout(main.title = "Hierarchy Method",
            main.title.position = "center",
            legend.show =  TRUE,
            legend.text.size = 0.6,
            frame = TRUE) 

skater_clust_map <- qtm(location_quotient_sf_sptialcluster,
                   "SP_CLUSTER") + 
  tm_borders(alpha = 0.5) + 
  tm_layout(main.title = "Skater Method",
            main.title.position = "center",
            legend.show =  TRUE,
            legend.text.size =0.6,
            frame = TRUE) 

tmap_arrange(hier_clust_map, skater_clust_map,
             asp=0, ncol=2)

## Warning: One tm layer group has duplicated layer types, which are omitted. To
## draw multiple layers of the same type, use multiple layer groups (i.e. specify
## tm_shape prior to each of them).

## Warning: One tm layer group has duplicated layer types, which are omitted. To
## draw multiple layers of the same type, use multiple layer groups (i.e. specify
## tm_shape prior to each of them).

Observation: Based on the two maps, we can say that there are of very different. The cluster in Hierarchy method is very fragmented whereas the cluster in skater method is clustered together with the high homogeneity. However, one thing to take note is that, even though the municipals belong to the cluster changes for the clusters in two different methods, the cluster 3 of hierarchy method and cluster 2 of skater method remain the same.

TakeHomeAssignment_03

Thin Su Yee

11/3/2020